Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

\p{user-defined} should be immune from later Unicode releases #17025

Closed
p5pRT opened this issue May 27, 2019 · 10 comments
Closed

\p{user-defined} should be immune from later Unicode releases #17025

p5pRT opened this issue May 27, 2019 · 10 comments

Comments

@p5pRT
Copy link

p5pRT commented May 27, 2019

Migrated from rt.perl.org#134146 (status was 'open')

Searchable as RT134146$

@p5pRT
Copy link
Author

p5pRT commented May 27, 2019

From perlbug@jgreely.com

Created by perlbug@jgreely.com

This is a bug report for perl from perlbug@​jgreely.com,
generated with the help of perlbug 1.41 running under perl 5.28.1.

-----------------------------------------------------------------
While upgrading a script from Perl 5.20.2 to 5.28.1, I discovered
that there's now a builtin InKana property, so the example in
perlunicode.pod is ignored if you try to use it. Changing the
name to something other than InKana (or IsKana, which was already
in 5.20.2) works.

Example code​:
  perl -e '$_ = "a";/\p{InKana}/'

5.20.2​:
Can't find Unicode property definition "InKana" at -e line 1.

5.28.1​:
(no output)

Perl Info

Flags:
    category=docs
    severity=low

Site configuration information for perl 5.28.1:

Configured by jgreely at Tue Apr  2 22:39:36 PDT 2019.

Summary of my perl5 (revision 5 version 28 subversion 1) configuration:
   
  Platform:
    osname=darwin
    osvers=18.5.0
    archname=darwin-2level
    uname='darwin helloparty.local 18.5.0 darwin kernel version 18.5.0: mon mar 11 20:40:32 pdt 2019; root:xnu-4903.251.3~3release_x86_64 x86_64 '
    config_args='-de -Dprefix=/Users/Shared/perlbrew/perls/perl-5.28.1 -Aeval:scriptdir=/Users/Shared/perlbrew/perls/perl-5.28.1/bin'
    hint=recommended
    useposix=true
    d_sigaction=define
    useithreads=undef
    usemultiplicity=undef
    use64bitint=define
    use64bitall=define
    uselongdouble=undef
    usemymalloc=n
    default_inc_excludes_dot=define
    bincompat5005=undef
  Compiler:
    cc='cc'
    ccflags ='-fno-common -DPERL_DARWIN -mmacosx-version-min=10.14 -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include -DPERL_USE_SAFE_PUTENV'
    optimize='-O3'
    cppflags='-fno-common -DPERL_DARWIN -mmacosx-version-min=10.14 -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include'
    ccversion=''
    gccversion='4.2.1 Compatible Apple LLVM 10.0.1 (clang-1001.0.46.3)'
    gccosandvers=''
    intsize=4
    longsize=8
    ptrsize=8
    doublesize=8
    byteorder=12345678
    doublekind=3
    d_longlong=define
    longlongsize=8
    d_longdbl=define
    longdblsize=16
    longdblkind=3
    ivtype='long'
    ivsize=8
    nvtype='double'
    nvsize=8
    Off_t='off_t'
    lseeksize=8
    alignbytes=8
    prototype=define
  Linker and Libraries:
    ld='cc'
    ldflags =' -mmacosx-version-min=10.14 -fstack-protector-strong -L/usr/local/lib'
    libpth=/usr/local/lib /Library/Developer/CommandLineTools/usr/lib/clang/10.0.1/lib /Library/Developer/CommandLineTools/usr/lib /Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/usr/lib /usr/lib
    libs=-lpthread -lgdbm -ldbm -ldb -ldl -lm -lutil -lc
    perllibs=-lpthread -ldl -lm -lutil -lc
    libc=
    so=dylib
    useshrplib=false
    libperl=libperl.a
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_dlopen.xs
    dlext=bundle
    d_dlsymun=undef
    ccdlflags=' '
    cccdlflags=' '
    lddlflags=' -mmacosx-version-min=10.14 -bundle -undefined dynamic_lookup -L/usr/local/lib -fstack-protector-strong'

Locally applied patches:
    Devel::PatchPerl 1.48


@INC for perl 5.28.1:
    /Users/Shared/perlbrew/perls/perl-5.28.1/lib/site_perl/5.28.1/darwin-2level
    /Users/Shared/perlbrew/perls/perl-5.28.1/lib/site_perl/5.28.1
    /Users/Shared/perlbrew/perls/perl-5.28.1/lib/5.28.1/darwin-2level
    /Users/Shared/perlbrew/perls/perl-5.28.1/lib/5.28.1


Environment for perl 5.28.1:
    DYLD_LIBRARY_PATH (unset)
    HOME=/Users/jgreely
    LANG=en_US.UTF-8
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/Users/Shared/perlbrew/bin:/Users/Shared/perlbrew/perls/perl-5.28.1/bin:/Users/jgreely/.pyenv/shims:/Users/jgreely/.pyenv/bin:/Users/jgreely/.rbenv/shims:/Users/jgreely/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin:/Applications/VMware Fusion.app/Contents/Public:/Library/TeX/texbin:/usr/local/MacGPG2/bin:/usr/local/share/dotnet:/opt/X11/bin:~/.dotnet/tools:/Library/Frameworks/Mono.framework/Versions/Current/Commands:/Users/Shared/Go/bin:/usr/local/plan9/bin:.
    PERLBREW_HOME=/Users/jgreely/.perlbrew
    PERLBREW_MANPATH=/Users/Shared/perlbrew/perls/perl-5.28.1/man
    PERLBREW_PATH=/Users/Shared/perlbrew/bin:/Users/Shared/perlbrew/perls/perl-5.28.1/bin
    PERLBREW_PERL=perl-5.28.1
    PERLBREW_ROOT=/Users/Shared/perlbrew
    PERLBREW_SHELLRC_VERSION=0.86
    PERLBREW_VERSION=0.86
    PERL_BADLANG (unset)
    SHELL=/usr/local/bin/bash

@p5pRT
Copy link
Author

p5pRT commented May 27, 2019

From @jkeenan

On Mon, 27 May 2019 08​:29​:53 GMT, jgreely wrote​:

This is a bug report for perl from perlbug@​jgreely.com,
generated with the help of perlbug 1.41 running under perl 5.28.1.

-----------------------------------------------------------------
While upgrading a script from Perl 5.20.2 to 5.28.1, I discovered
that there's now a builtin InKana property, so the example in
perlunicode.pod is ignored if you try to use it. Changing the
name to something other than InKana (or IsKana, which was already
in 5.20.2) works.

Example code​:
perl -e '$_ = "a";/\p{InKana}/'

5.20.2​:
Can't find Unicode property definition "InKana" at -e line 1.

5.28.1​:
(no output)

Would it be possible for you to provide a patch to pod/perlunicode.pod in the Perl 5 core distribution so that it is clear exactly which example in that document needs revision?

I couldn't locate the specific code example you provided in that document, even though I was able to confirm the difference in behavior between those two perl versions.

Ideally, we would prefer against perl 5 blead, i.e., against pod/perlunicode.pod in a git checkout of core. It would also be good to attach the patch rather than including it inline.

Thank you very much.

--
James E Keenan (jkeenan@​cpan.org)

@p5pRT
Copy link
Author

p5pRT commented May 27, 2019

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented May 27, 2019

From perlbug@jgreely.com

On Mon, 27 May 2019 07​:18​:59 -0700, jkeenan wrote​:

Would it be possible for you to provide a patch to pod/perlunicode.pod
in the Perl 5 core distribution so that it is clear exactly which
example in that document needs revision?

All instances of the string 'InKana' must be replaced with a new name to work under recent Perl versions. I chose 'InHiraganaKatakana', which may be longer than desired for an example. See attachment.

@p5pRT
Copy link
Author

p5pRT commented May 27, 2019

From perlbug@jgreely.com

inkana.patch
diff --git a/pod/perlunicode.pod b/pod/perlunicode.pod
index 8f09a18fca..3e3a5f031c 100644
--- a/pod/perlunicode.pod
+++ b/pod/perlunicode.pod
@@ -1144,7 +1144,7 @@ hexadecimal code points for a range; or a single hexadecimal code point.
 For example, to define a property that covers both the Japanese
 syllabaries (hiragana and katakana), you can define
 
-    sub InKana {
+    sub InHiraganaKatakana {
         return <<END;
     3040\t309F
     30A0\t30FF
@@ -1152,11 +1152,11 @@ syllabaries (hiragana and katakana), you can define
     }
 
 Imagine that the here-doc end marker is at the beginning of the line.
-Now you can use C<\p{InKana}> and C<\P{InKana}>.
+Now you can use C<\p{InHiraganaKatakana}> and C<\P{InHiraganaKatakana}>.
 
 You could also have used the existing block property names:
 
-    sub InKana {
+    sub InHiraganaKatakana {
         return <<'END';
     +utf8::InHiragana
     +utf8::InKatakana
@@ -1167,7 +1167,7 @@ Suppose you wanted to match only the allocated characters,
 not the raw block ranges: in other words, you want to remove
 the unassigned characters:
 
-    sub InKana {
+    sub InHiraganaKatakana {
         return <<'END';
     +utf8::InHiragana
     +utf8::InKatakana
@@ -1177,7 +1177,7 @@ the unassigned characters:
 
 The negation is useful for defining (surprise!) negated classes.
 
-    sub InNotKana {
+    sub InNotHiraganaKatakana {
         return <<'END';
     !utf8::InHiragana
     -utf8::InKatakana
@@ -1186,10 +1186,10 @@ The negation is useful for defining (surprise!) negated classes.
     }
 
 This will match all non-Unicode code points, since every one of them is
-not in Kana.  You can use intersection to exclude these, if desired, as
+not in HiraganaKatakana.  You can use intersection to exclude these, if desired, as
 this modified example shows:
 
-    sub InNotKana {
+    sub InNotHiraganaKatakana {
         return <<'END';
     !utf8::InHiragana
     -utf8::InKatakana

@p5pRT
Copy link
Author

p5pRT commented May 27, 2019

From @jkeenan

On Mon, 27 May 2019 15​:47​:57 GMT, jgreely wrote​:

On Mon, 27 May 2019 07​:18​:59 -0700, jkeenan wrote​:

Would it be possible for you to provide a patch to
pod/perlunicode.pod
in the Perl 5 core distribution so that it is clear exactly which
example in that document needs revision?

All instances of the string 'InKana' must be replaced with a new name
to work under recent Perl versions. I chose 'InHiraganaKatakana',
which may be longer than desired for an example. See attachment.

Patch looks good to me; available in jkeenan/rt-134146-unicode branch.

TonyC, khw, list​: Comments?

Thank you very much.
--
James E Keenan (jkeenan@​cpan.org)

@p5pRT
Copy link
Author

p5pRT commented May 27, 2019

From @khwilliamson

On Mon, 27 May 2019 08​:47​:57 -0700, jgreely wrote​:

On Mon, 27 May 2019 07​:18​:59 -0700, jkeenan wrote​:

Would it be possible for you to provide a patch to
pod/perlunicode.pod
in the Perl 5 core distribution so that it is clear exactly which
example in that document needs revision?

All instances of the string 'InKana' must be replaced with a new name
to work under recent Perl versions. I chose 'InHiraganaKatakana',
which may be longer than desired for an example. See attachment.

Actually, this isn't true. This program shows that you can define your own InKana property

sub InKana {
  return <<END;
0100\t0101
END
}

qr/\p{InKana}/;

When run under blead with -Dr, we get
Final program​:
  1​: ANYOFH[0100-0101] (First UTF-8 byte=\xc4) (3)
  3​: END (0)

However, if instead the program is
qr/\p{InKana}/;

sub InKana {
  return <<END;
0100\t0101
END
}

and run under blead -Dr, we get

Final program​:
  1​: ANYOFH[3001-3003 3008-3011 3013-301F 3030-3035 3037 303C-303D 3099-309C 30A0-30FF 31F0-31FF 32D0-32FE 3300-3357 FE45-FE46 FF61-FF9F 1B000 1B164-1B167] (3)
  3​: END (0)

What is happening here is that perl looks for a user-defined property. If it finds one, it uses it. If it doesn't find it, it looks for an official Unicode property. If it finds one, it uses that. If not, it will defer looking up the property until execution. If it is still undefined at the point it is first needed, it croaks, otherwise it uses it.

My guess is that the program that led to the erroneous conclusion is like the 2nd program above. Declaring sub InKana;
before trying to compile it would cause it to be used instead of the official one.

I think the patch to the documentation would be to add something about this timing issue
--
Karl Williamson

@p5pRT
Copy link
Author

p5pRT commented May 27, 2019

From perlbug@jgreely.com

My program (which worked from at least 5.10.1 through 5.20.2) did indeed reference "/^\p{InKana}+$/" before defining "sub InKana {...}". It feels very un-Perly to have to put a sub at the top of your script in order to get correct behavior, so I'd agree that the documentation needs an update.

@p5pRT
Copy link
Author

p5pRT commented Jun 1, 2019

From @khwilliamson

On 5/27/19 2​:48 PM, J Greely via RT wrote​:

My program (which worked from at least 5.10.1 through 5.20.2) did indeed reference "/^\p{InKana}+$/" before defining "sub InKana {...}". It feels very un-Perly to have to put a sub at the top of your script in order to get correct behavior, so I'd agree that the documentation needs an update.

---
via perlbug​: queue​: perl5 status​: open
https://rt-archive.perl.org/perl5/Ticket/Display.html?id=134146

You have convinced me, without perhaps intending to, that this is a bug.
  I'm working on a fix so that user-defined subs always override an
official property.

@p5pRT
Copy link
Author

p5pRT commented Jun 3, 2019

From @khwilliamson

On Fri, 31 May 2019 21​:41​:45 -0700, public@​khwilliamson.com wrote​:

On 5/27/19 2​:48 PM, J Greely via RT wrote​:

My program (which worked from at least 5.10.1 through 5.20.2) did
indeed reference "/^\p{InKana}+$/" before defining "sub InKana
{...}". It feels very un-Perly to have to put a sub at the top of
your script in order to get correct behavior, so I'd agree that the
documentation needs an update.

---
via perlbug​: queue​: perl5 status​: open
https://rt-archive.perl.org/perl5/Ticket/Display.html?id=134146

You have convinced me, without perhaps intending to, that this is a
bug.
I'm working on a fix so that user-defined subs always override an
official property.

It turns out that this particular instance is symptomatic of another problem. The 'In' prefix is only supposed to be used for Block properties. But it was being accepted for all. This bug has apparently been there from the beginning of such things.

That issue has been fixed by 74333e9, which means no documentation should be changed.

The bottom line is that later Unicode releases with their new property names should not override existing code that uses a particular user-defined property name. I have changed the title of this ticket accordingly.

And it turns out that there is a fairly simple solution that does this. That is to never expand until runtime a property whose name could be a user-defined one. If at that time no appropriate sub has been defined, then look for an official properlty with that name.

This means slower execution if and only if the property name begins with In or Is but only the first time the match is tried.

I will defer fixing this for the time being.
--
Karl Williamson

@toddr toddr removed the khw label Oct 25, 2019
@khwilliamson khwilliamson reopened this Nov 27, 2019
khwilliamson added a commit that referenced this issue Dec 8, 2019
Prior to this patch, they only sometimes overrode.
khwilliamson added a commit that referenced this issue Dec 9, 2019
Prior to this patch, they only sometimes overrode.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants