Missing Unicode property in Perl 5.23.4 #15020

Closed
p5pRT opened this issue Oct 31, 2015 · 8 comments

Comments

p5pRT commented Oct 31, 2015

Migrated from rt.perl.org#126515 (status was 'rejected')

Searchable as RT126515$

p5pRT commented Oct 31, 2015

From j.imrie1@virginmedia.com

Created by @ThePilgrim

Bug in Perl 5.23.4

The Unicode property Block=CJK_Unified_Ideograph will not compile.
This was found by the CPAN testers; see
http://www.cpantesters.org/cpan/report/e9ac4ed2-7e10-11e5-b3a0-9c9ddfbfc7aa

Perl Info

Flags:
    category=core
    severity=high

Site configuration information for perl 5.18.2:

Configured by strawberry-perl at Tue Apr 15 14:38:14 2014.

Summary of my perl5 (revision 5 version 18 subversion 2) configuration:
   
  Platform:
    osname=MSWin32, osvers=6.2, archname=MSWin32-x64-multi-thread
    uname='Win32 strawberry-perl 5.18.2.2 #1 Tue Apr 15 14:36:23 2014 x64'
    config_args='undef'
    hint=recommended, useposix=true, d_sigaction=undef
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=undef, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='gcc', ccflags =' -s -O2 -DWIN32 -DWIN64 -DCONSERVATIVE 
-DPERL_TEXTMODE_SCRIPTS -DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS
-DUSE_PERLIO -fno-strict-aliasing -mms-bitfields',
    optimize='-s -O2',
    cppflags='-DWIN32'
    ccversion='', gccversion='4.7.3', gccosandvers=''
    intsize=4, longsize=4, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long long', ivsize=8, nvtype='double', nvsize=8, Off_t='long
long', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='g++', ldflags ='-s -L"D:\strawberry\perl\lib\CORE"
-L"D:\strawberry\c\lib"'
    libpth=D:\strawberry\c\lib D:\strawberry\c\x86_64-w64-mingw32\lib
D:\strawberry\c\lib\gcc\x86_64-w64-mingw32\4.7.3
    libs=-lmoldname -lkernel32 -luser32 -lgdi32 -lwinspool -lcomdlg32
-ladvapi32 -lshell32 -lole32 -loleaut32 -lnetapi32 -luuid -lws2_32 -lmpr
-lwinmm -lversion -lodbc32 -lodbccp32 -lcomctl32
    perllibs=-lmoldname -lkernel32 -luser32 -lgdi32 -lwinspool
-lcomdlg32 -ladvapi32 -lshell32 -lole32 -loleaut32 -lnetapi32 -luuid
-lws2_32 -lmpr -lwinmm -lversion -lodbc32 -lodbccp32 -lcomctl32
    libc=, so=dll, useshrplib=true, libperl=libperl518.a
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_win32.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' '
    cccdlflags=' ', lddlflags='-mdll -s -L"D:\strawberry\perl\lib\CORE"
-L"D:\strawberry\c\lib"'

Locally applied patches:
    


@INC for perl 5.18.2:
    D:/strawberry/perl/site/lib
    D:/strawberry/perl/vendor/lib
    D:/strawberry/perl/lib
    .


Environment for perl 5.18.2:
    HOME (unset)
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
   
PATH=C:\Windows\system32;C:\Windows;C:\Windows\system32\wbem;C:\Program
Files (x86)\Intel\iCLS Client;C:\Program Files\Intel\iCLS
Client;C:\Program Files\Common Files\Microsoft Shared\Windows
Live;C:\Program Files (x86)\Common Files\Microsoft Shared\Windows
Live;C:\Windows\System32\WindowsPowerShell\v1.0;C:\Program Files
(x86)\Windows Live\Shared;C:\Program Files\Intel\Intel(R) Management
Engine Components\DAL;C:\Program Files\Intel\Intel(R) Management Engine
Components\IPT;C:\Program Files (x86)\Intel\Intel(R) Management Engine
Components\DAL;C:\Program Files (x86)\Intel\Intel(R) Management Engine
Components\IPT;D:\strawberry\c\bin;D:\strawberry\perl\site\bin;D:\strawberry\perl\bin;C:\strawberry\c\bin;C:\strawberry\perl\bin;C:\Program
Files (x86)\NVIDIA Corporation\PhysX\Common;C:\Program Files
(x86)\GNU\GnuPG\pub
    PERL_BADLANG (unset)
    SHELL (unset)


p5pRT commented Oct 31, 2015

From @khwilliamson

There is no such block. The proper block name is plural, as shown in perluniprops:

  \p{Block: CJK_Unified_Ideographs} (Short: \p{Blk=CJK}) (20_992)

--
Karl Williamson

p5pRT commented Oct 31, 2015

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented Oct 31, 2015

@khwilliamson - Status changed from 'open' to 'rejected'

@p5pRT p5pRT closed this as completed Oct 31, 2015

p5pRT commented Oct 31, 2015

From @khwilliamson

On 10/31/2015 05:26 PM, Karl Williamson via RT wrote:

There is no such block. The proper block name is plural, as shown in perluniprops

  \p{Block: CJK_Unified_Ideographs} (Short: \p{Blk=CJK}) (20_992)

And, I haven't looked at the code, but it's very rare code indeed that
should be using the Block property. Only code that is dealing with
Unicode internals has any real business with blocks. They are an
artifact of Unicode itself, and have no relation to any real language
features. For most code \p{Script: Han} would be a better choice. But
as I said, I haven't looked at this code; perhaps it is a legitimate
usage, though misspelled.
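
To illustrate the difference between the two properties, here is a minimal sketch. The helper names in_cjk_block and is_han are made up for this example and are not taken from the module in question. It tests one character from the CJK Unified Ideographs block and one Han character from the separate Extension A block.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the character is in the
# CJK Unified Ideographs block (note the plural name).
sub in_cjk_block { return $_[0] =~ /\p{Block=CJK_Unified_Ideographs}/ ? 1 : 0 }

# Hypothetical helper: true if the character belongs to the Han script,
# whatever block it happens to live in.
sub is_han { return $_[0] =~ /\p{Script=Han}/ ? 1 : 0 }

# U+4E2D sits in the CJK Unified Ideographs block; U+3400 is a Han
# character from the CJK Unified Ideographs Extension A block.
for my $cp (0x4E2D, 0x3400) {
    my $ch = chr $cp;
    printf "U+%04X block=%d han=%d\n", $cp, in_cjk_block($ch), is_han($ch);
}
```

Both characters match the Han script test, but only U+4E2D matches the block test, which is why the Script property is usually the safer choice for code that cares about the writing system rather than Unicode's internal layout.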

p5pRT commented Nov 1, 2015

From @ThePilgrim

Odd that I only get the error with 5.23.4. Previous versions, all the way back to 5.10.1, allow it.

p5pRT commented Nov 1, 2015

From @khwilliamson

Ahhh! Further investigation indicates why. I have just changed the
code so that it checks if a property is valid at compile time.
Previously, if it didn't recognize a property, it waited until runtime
to fail, under the theory that the property could be a user-defined one
whose definition could come later than the regex compilation. But in
fact, only properties whose names begin with 'In' or 'Is' can be
user-defined, so in 5.23.4 I changed it so that all other unknown ones
are rejected at compile time.

If you compile the following on an earlier version of a -DEBUGGING perl,
you'll see something like:

§ perl -Dr -le 'qr/\p{Block=CJK_Unified_Ideograph}/'
  Compiling REx "\p{Block=CJK_Unified_Ideograph}"
  ... lots of irrelevant lines
Final program:
  1: ANYOF[{outside bitmap}+utf8::Block=CJK_Unified_Ideograph] (12)
  12: END (0)

That +utf8::... indicates it is expecting a run-time property of that name.

If you actually try to match against it, like

§ perl -le '"A" =~ qr/\p{Block=CJK_Unified_Ideograph}/'
Can't find Unicode property definition "Block=CJK_Unified_Ideograph" at
-e line 1.

What this means is that this pattern is never actually getting used in
your module. It's effectively dead code, or else you would have gotten
complaints in the past.
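
One way a module's test suite could catch this kind of misspelling on any perl version is sketched below. The helper name property_pattern_ok is hypothetical. The pattern is built from a string at run time so that the failure can be trapped with eval wherever it occurs, and a match is attempted as well, because older perls only complain at that point.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the pattern source compiles and can be
# matched without a property-lookup error, 0 otherwise.
sub property_pattern_ok {
    my ($source) = @_;
    return eval {
        my $re      = qr/$source/;   # 5.23.4 and later die here for unknown properties
        my $matched = "A" =~ $re;    # earlier perls only die here, at match time
        1;
    } ? 1 : 0;
}

for my $source ('\p{Block=CJK_Unified_Ideographs}',
                '\p{Block=CJK_Unified_Ideograph}') {
    printf "%-34s %s\n", $source,
        property_pattern_ok($source) ? "ok" : "rejected";
}
```

The plural name is reported as ok and the singular one as rejected, whether the error is raised at compile time or at match time.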

I knew this change might break something, as is happening here. I
suppose the perldelta should say that certain errors are now
caught at compile time instead of run time, so code that previously ran
but never reached those sections will now fail.

To make sure I'm being clear: a property whose name begins with 'In' or
'Is' (case sensitive) may be a user-defined property. If it isn't
defined at the time of pattern compilation, an error gets raised only if
fetching it at runtime fails. No error gets raised if that pattern or
portion of a pattern never gets matched. All other properties are now
checked at compile time.
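
As a rough sketch of the user-defined case, assume a made-up property named IsSketchVowel (the name and the helper only_vowels are invented for illustration). The pattern is compiled before the property's definition has been seen, and the lookup is deferred until the first match.

```perl
use strict;
use warnings;

# This literal pattern is compiled while the file is being parsed, before
# perl has seen the definition of IsSketchVowel further down.
my $vowels_only = qr/\A\p{IsSketchVowel}+\z/;

# Hypothetical user-defined property: the name must begin with "In" or "Is".
# The sub returns one hexadecimal code point (or tab-separated range) per line.
sub IsSketchVowel {
    return join "\n", qw(61 65 69 6F 75), "";
}

# Hypothetical helper wrapping the match.
sub only_vowels { return $_[0] =~ $vowels_only ? 1 : 0 }

print only_vowels("aeiou"), "\n";   # every character is a, e, i, o or u
print only_vowels("audio"), "\n";   # contains "d", so the match fails
```

Had the property been named without the 'In' or 'Is' prefix, the pattern would be rejected at compile time under the new behaviour instead of waiting for a definition.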

p5pRT commented Nov 2, 2015

From @ThePilgrim

OK, that makes sense. And I'd rather have it fail at compile time than at run time.

Thanks for taking time to investigate.

John
