Undocumented change of UTF-8 delimiters to substitution #15744

Closed
p5pRT opened this issue Dec 2, 2016 · 6 comments

@p5pRT

p5pRT commented Dec 2, 2016

Migrated from rt.perl.org#130242 (status was 'rejected')

Searchable as RT130242$

@p5pRT
Author

p5pRT commented Dec 2, 2016

From @choroba

Created by @choroba

One can use non-ASCII characters as delimiters in s///, but their
behaviour depends on Perl version. In older Perls, you need to double
the middle delimiter (or rather, the pattern and replacement use their
own pair of delimiters). Example session:

  ~ $ perl -v

This is perl 5, version 18, subversion 2 (v5.18.2) built for
x86_64-linux-thread-multi

Copyright 1987-2013, Larry Wall

Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.

Complete documentation for Perl, including FAQ lists, should be found on
this system using "man perl" or "perldoc perl". If you have access to the
Internet, point your browser at http://www.perl.org/, the Perl Home Page.

  ~ $ perl -Mutf8 -wE 'say "a" =~ s¦a¦A¦r'
Substitution replacement not terminated at -e line 1.
[255]
  ~ $ blead/bin/perl5.25.7 -Mutf8 -wE 'say "a" =~ s¦a¦A¦r'
A
  ~ $ perl -Mutf8 -wE 'say "a" =~ s¦a¦¦A¦r'
A
  ~ $ blead/bin/perl5.25.7 -Mutf8 -wE 'say "a" =~ s¦a¦¦A¦r'
Unknown regexp modifier "/A" at -e line 1, near "=~ "
Unrecognized character \x{a6}; marked by <-- HERE after =~ saA<-- HERE near
column 18 at -e line 1.
[255]

I haven't found this change documented in any delta.

Ch.
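
For code that has to run on Perls from both sides of this change, one option is to avoid non-ASCII delimiters altogether. The following is a minimal sketch, not taken from the report: bracketing delimiters such as `{}` always form two separate pairs, so the same source parses identically on 5.18.2 and on blead. The variable name `$result` is illustrative only.

```perl
use strict;
use warnings;
use feature 'say';

# Bracketing delimiters give the pattern and the replacement their own
# pair on every Perl version, so this line does not depend on how wide
# or non-ASCII delimiters are handled.
# The /r modifier (return the result, leave the input alone) needs 5.14+.
my $result = "a" =~ s{a}{A}r;

say $result;
```

This gives up the non-ASCII delimiter rather than working around the version difference, which may or may not be acceptable for the code in question.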

Perl Info

Flags:
     category=docs
     severity=low

This perlbug was built using Perl 5.18.2 - Thu Sep  8 10:08:46 UTC 2016
It is being executed now by  Perl 5.18.2 - Thu Sep  8 10:06:42 UTC 2016.

Site configuration information for perl 5.18.2:

Configured by abuild at Thu Sep  8 10:06:42 UTC 2016.

Summary of my perl5 (revision 5 version 18 subversion 2) configuration:

   Platform:
     osname=linux, osvers=4.1.27-27-default, archname=x86_64-linux-thread-multi
     uname='linux lamb14 4.1.27-27-default #1 smp preempt fri jul 15 12:46:41 
utc 2016 (84ae57e) x86_64 x86_64 x86_64 gnulinux '
     config_args='-ds -e -Dprefix=/usr -Dvendorprefix=/usr -Dinstallusrbinperl 
-Dusethreads -Di_db -Di_dbm -Di_ndbm -Di_gdbm -Dd_dbm_open -Duseshrplib=true 
-Doptimize=-fmessage-length=0 -grecord-gcc-switches -O2 -Wall 
-D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables 
-fasynchronous-unwind-tables -g -Wall -pipe -Accflags=-DPERL_USE_SAFE_PUTENV 
-Dotherlibdirs=/usr/lib/perl5/site_perl 
-Dinc_version_list=5.18.0/x86_64-linux-thread-multi 5.18.0 
5.18.1/x86_64-linux-thread-multi 5.18.1'
     hint=recommended, useposix=true, d_sigaction=define
     useithreads=define, usemultiplicity=define
     useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
     use64bitint=define, use64bitall=define, uselongdouble=undef
     usemymalloc=n, bincompat5005=undef
   Compiler:
     cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DPERL_USE_SAFE_PUTENV 
-fno-strict-aliasing -pipe -fstack-protector -D_LARGEFILE_SOURCE 
-D_FILE_OFFSET_BITS=64',
     optimize='-fmessage-length=0 -grecord-gcc-switches -O2 -Wall 
-D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables 
-fasynchronous-unwind-tables -g -Wall -pipe',
     cppflags='-D_REENTRANT -D_GNU_SOURCE -DPERL_USE_SAFE_PUTENV 
-fno-strict-aliasing -pipe -fstack-protector'
     ccversion='', gccversion='4.8.5', gccosandvers=''
     intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
     d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
     ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', 
lseeksize=8
     alignbytes=8, prototype=define
   Linker and Libraries:
     ld='cc', ldflags =' -L/usr/local/lib64 -fstack-protector'
     libpth=/lib64 /usr/lib64 /usr/local/lib64
     libs=-lm -ldl -lcrypt -lpthread
     perllibs=-lm -ldl -lcrypt -lpthread
     libc=/lib64/libc-2.19.so, so=so, useshrplib=true, libperl=libperl.so
     gnulibc_version='2.19'
   Dynamic Linking:
     dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E 
-Wl,-rpath,/usr/lib/perl5/5.18.2/x86_64-linux-thread-multi/CORE'
     cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib64 
-fstack-protector'

Locally applied patches:



@INC for perl 5.18.2:
     /home/choroba/perl5/lib/perl5/5.18.2/x86_64-linux-thread-multi
     /home/choroba/perl5/lib/perl5/5.18.2
     /home/choroba/perl5/lib/perl5/x86_64-linux-thread-multi
     /home/choroba/perl5/lib/perl5
     /usr/lib/perl5/site_perl/5.18.2/x86_64-linux-thread-multi
     /usr/lib/perl5/site_perl/5.18.2
     /usr/lib/perl5/vendor_perl/5.18.2/x86_64-linux-thread-multi
     /usr/lib/perl5/vendor_perl/5.18.2
     /usr/lib/perl5/5.18.2/x86_64-linux-thread-multi
     /usr/lib/perl5/5.18.2
     /home/choroba/perl5/lib/perl5/5.18.1
     /usr/lib/perl5/site_perl


Environment for perl 5.18.2:
     HOME=/home/choroba
     LANG=en_US.UTF-8
     LANGUAGE (unset)
     LD_LIBRARY_PATH (unset)
     LOGDIR (unset)

PATH=/home/choroba/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/home/choroba/perl5/bin:.
     PERL5LIB=/home/choroba/perl5/lib/perl5
     PERL_BADLANG (unset)
     PERL_LOCAL_LIB_ROOT=/home/choroba/perl5
     PERL_MB_OPT=--install_base "/home/choroba/perl5"
     PERL_MM_OPT=INSTALL_BASE=/home/choroba/perl5
     SHELL=/bin/bash

@p5pRT
Author

p5pRT commented Dec 2, 2016

From @jkeenan

On Fri, 02 Dec 2016 10:35:23 GMT, choroba@cpan.org wrote:

> This is a bug report for perl from choroba@cpan.org,
> generated with the help of perlbug 1.39 running under perl 5.18.2.
>
> -----------------------------------------------------------------
> [Please describe your issue here]
>
> One can use non-ASCII characters as delimiters in s///, but their
> behaviour depends on Perl version. In older Perls, you need to double
> the middle delimiter (or rather, the pattern and replacement use their
> own pair of delimiters). Example session:
>
> ~ $ perl -v
>
> This is perl 5, version 18, subversion 2 (v5.18.2) built for
> x86_64-linux-thread-multi
>
> [snip]
>
> ~ $ perl -Mutf8 -wE 'say "a" =~ s¦a¦A¦r'
> Substitution replacement not terminated at -e line 1.
> [255]
> ~ $ blead/bin/perl5.25.7 -Mutf8 -wE 'say "a" =~ s¦a¦A¦r'
> A
> ~ $ perl -Mutf8 -wE 'say "a" =~ s¦a¦¦A¦r'
> A
> ~ $ blead/bin/perl5.25.7 -Mutf8 -wE 'say "a" =~ s¦a¦¦A¦r'
> Unknown regexp modifier "/A" at -e line 1, near "=~ "
> Unrecognized character \x{a6}; marked by <-- HERE after =~ saA<-- HERE near
> column 18 at -e line 1.
> [255]

Behaviour confirmed. With respect to major versions, the change appeared in 5.20.

> I haven't found this change documented in any delta.

Correct. The actual change in behavior was introduced in:

#####
commit e68dd03
Author: Father Chrysostomos <sprout@cpan.org>
Date: Thu Nov 14 14:29:51 2013 -0800

  [perl #120463] s/// and tr/// with wide delimiters
 
  $ perl -Mutf8 -e 's αaαα'
  Substitution replacement not terminated at -e line 1.
...
#####

Observe:

#####
$ ./perl -v | head -2 | tail -1
This is perl 5, version 19, subversion 6 (v5.19.6 (v5.19.5-289-g5853490)) built for x86_64-linux
$ ./perl -Ilib -Mutf8 -wE 'say "a" =~ s¦a¦A¦r'
Substitution replacement not terminated at -e line 1.

$ ./perl -v | head -2 | tail -1
This is perl 5, version 19, subversion 6 (v5.19.6 (v5.19.5-290-ge68dd03)) built for x86_64-linux
$ ./perl -Ilib -Mutf8 -wE 'say "a" =~ s¦a¦A¦r'
A
#####

In pod/perldelta5200.pod, we have:

#####
C<s///>, C<tr///> and C<y///> now work when a wide character is used as the delimiter. [perl #120463]
#####

... but we don't have any mention of the fact that the "double-the-delimiter" technique was no longer necessary. We were only focusing on "wide characters".

I'm not sure that we need to document the non-necessity of "double-the-delimiter" going forward. I'm also not sure what our policy is on retrospectively correcting perldeltas for major versions that are out of support. List: thoughts?
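
Since the delta does not spell out which spelling each release accepts, here is a rough sketch of how a script could decide at run time. It assumes the cutoff is 5.19.6, the development release in which the builds shown above first include commit e68dd03; `delimiter_style` is a hypothetical helper name, not anything in core.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Perl): given a numeric version in the
# style of $], say which spelling of s/// with a wide or non-ASCII
# delimiter that perl is expected to accept.
sub delimiter_style {
    my ($version) = @_;

    # Assumption: 5.019006 is the first release containing e68dd03,
    # based on the two builds compared above.
    return $version >= 5.019006 ? 'single' : 'doubled';
}

# Report for the perl that is running this script.
printf "perl %vd expects the %s form\n", $^V, delimiter_style($]);
```

With this helper, 5.18.2 maps to the doubled form (`s¦a¦¦A¦`) and 5.20 or later to the single form (`s¦a¦A¦`). Because the two spellings do not both compile on any one perl, each would have to live in a separate file or a string eval chosen by the check.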

Thank you very much.

--
James E Keenan (jkeenan@cpan.org)

@p5pRT
Author

p5pRT commented Dec 2, 2016

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Author

p5pRT commented Dec 5, 2016

From @tonycoz

On Fri, 02 Dec 2016 06:23:20 -0800, jkeenan wrote:

> I'm not sure that we need to document the non-necessity of
> "double-the-delimiter" going forward. I'm also not sure what our policy
> is on retrospectively correcting perldeltas for major versions that are
> out of support. List: thoughts?

The "double the delimiter" trick is a work-around for a bug that e68dd03 fixed.

Per the commit message, you don't actually need to double the delimiter - you could have used a different delimiter instead.

The delta could be changed to describe the previous misbehaviour, but I don't think it's necessary since the ticket already describes that.

Tony

@p5pRT
Copy link
Author

p5pRT commented Dec 29, 2016

From @jkeenan

On Mon, 05 Dec 2016 03:50:08 GMT, tonyc wrote:

> On Fri, 02 Dec 2016 06:23:20 -0800, jkeenan wrote:
>
> > I'm not sure that we need to document the non-necessity of
> > "double-the-delimiter" going forward. I'm also not sure what our
> > policy is on retrospectively correcting perldeltas for major versions
> > that are out of support. List: thoughts?
>
> The "double the delimiter" trick is a work-around for a bug that
> e68dd03 fixed.
>
> Per the commit message, you don't actually need to double the
> delimiter - you could have used a different delimiter instead.
>
> The delta could be changed to describe the previous misbehaviour, but
> I don't think it's necessary since the ticket already describes that.

> Tony

So it appears that there's no bug in perl, nor is there sufficient reason to change an older perldelta. Closing ticket.

Thank you very much.

--
James E Keenan (jkeenan@cpan.org)

@p5pRT
Author

p5pRT commented Dec 29, 2016

@jkeenan - Status changed from 'open' to 'rejected'
