Skip Menu |
Report information
Id: 130679
Status: rejected
Priority: 0/
Queue: perl5

Owner: Nobody
Requestors: khw <khw [at] cpan.org>
Cc:
AdminCc:

Operating System: (no value)
PatchStatus: (no value)
Severity: low
Type: unknown
Perl Version: (no value)
Fixed In: (no value)



Subject: tr''' does not work properly
From: Karl Williamson <khw [...] cpan.org>
To: perlbug [...] perl.org
Date: Mon, 30 Jan 2017 19:24:20 -0700
Download (untitled) / with headers
text/plain 5.6k
This is a bug report for perl from khw@cpan.org, generated with the help of perlbug 1.40 running under perl 5.25.7. ----------------------------------------------------------------- Using a single quote as a delimiter in tr or y does not work properly. The parsing for these commands is split between toke.c and op.c, with the portion in toke.c mostly in S_scan_const. This function does not get called when the delimiter is a single quote; presumably it is expected to be only a double-quotish thing to do. scan_const creates an intermediate form that op.c uses to finish things up. But in this case, there is no intermediate form, and op.c is getting the raw input. This is part of the underlying bug in #130675. op.c is expecting the intermediate form; and the first part of its input matches the intermediate form, right down to the single reserved byte used by that form (which the fuzzer has managed to find). But that reserved byte signals that there should be more to come, and this defective input has none. It so happens that the intermediate form is unchanged from the input for ASCII literal characters that don't have ranges, so this works tr'abc'def' but ranges don't work tr'a-z'A-Z' # translates only the three chars 'a', '-', 'z' I haven't investigated \t, etc. My guess is that it has been broken from the beginning, and it's amazing to me that it has gone undiscovered for so long, requiring a fuzzer, with the red herring of UTF-8 being involved. One possibility to fix this is to simply prohibit single-quotish behavior with tr, by putting in a check somewhere in the parsing. It clearly isn't something that is affecting many, if any, people. Otherwise, the portion of scan_const that deals with this would have to be pulled out into a separate function, called as well from whatever part of toke deals with single quotish. I believe that tr '\xDF'\xFE' should not be evaluated as double-quotish, for example. Since I don't have much knowledge of the parser's overall operation, suggestions are welcome. Or volunteers. ----------------------------------------------------------------- --- Flags: category=core severity=low --- This perlbug was built using Perl 5.25.9 - Thu Jan 19 08:46:21 MST 2017 It is being executed now by Perl 5.25.7 - Fri Nov 18 08:29:04 MST 2016. Site configuration information for perl 5.25.7: Configured by khw at Fri Nov 18 08:29:04 MST 2016. Summary of my perl5 (revision 5 version 25 subversion 7) configuration: Commit id: 51d89e3583b4182c42c21b343376f2286f67fc3b Platform: osname=linux osvers=4.8.0-27-generic archname=x86_64-linux-thread-multi-ld uname='linux khw-inspiron-3558 4.8.0-27-generic #29-ubuntu smp thu oct 20 21:03:13 utc 2016 x86_64 x86_64 x86_64 gnulinux ' config_args='-des -Uversiononly -Dprefix=/home/khw/blead -Dusedevel -D'optimize=-ggdb3' -A'optimize=-ggdb3' -A'optimize=-O0' -Accflags='-DPERL_BOOL_AS_CHAR' -Accflags='-DPERL_EXTERNAL_GLOB' -Dman1dir=none -Dman3dir=none -DDEBUGGING -Dcc=g++ -Dusemorebits -Dusethreads -Accflags='-DNO_MATHOMS'' hint=recommended useposix=true d_sigaction=define useithreads=define usemultiplicity=define use64bitint=define use64bitall=define uselongdouble=define usemymalloc=n bincompat5005=undef Compiler: cc='g++' ccflags ='-D_REENTRANT -D_GNU_SOURCE -DPERL_BOOL_AS_CHAR -DPERL_EXTERNAL_GLOB -DNO_MATHOMS -fwrapv -DDEBUGGING -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64' optimize=' -ggdb3 -O0' cppflags='-D_REENTRANT -D_GNU_SOURCE -DPERL_BOOL_AS_CHAR -DPERL_EXTERNAL_GLOB -DNO_MATHOMS -fwrapv -DDEBUGGING -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include' ccversion='' gccversion='6.2.0 20161005' gccosandvers='' intsize=4 longsize=8 ptrsize=8 doublesize=8 byteorder=12345678 doublekind=3 d_longlong=define longlongsize=8 d_longdbl=define longdblsize=16 longdblkind=3 ivtype='long' ivsize=8 nvtype='long double' nvsize=16 Off_t='off_t' lseeksize=8 alignbytes=16 prototype=define Linker and Libraries: ld='g++' ldflags =' -fstack-protector-strong -L/usr/local/lib' libpth=/usr/include/c++/6 /usr/include/x86_64-linux-gnu/c++/6 /usr/include/c++/6/backward /usr/local/lib /usr/lib/gcc/x86_64-linux-gnu/6/include-fixed /usr/include/x86_64-linux-gnu /usr/lib /lib/x86_64-linux-gnu /lib/../lib /usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib libs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc perllibs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc libc=libc-2.24.so so=so useshrplib=false libperl=libperl.a gnulibc_version='2.24' Dynamic Linking: dlsrc=dl_dlopen.xs dlext=so d_dlsymun=undef ccdlflags='-Wl,-E' cccdlflags='-fPIC' lddlflags='-shared -ggdb3 -ggdb3 -O0 -L/usr/local/lib -fstack-protector-strong' --- @INC for perl 5.25.7: /home/khw/blead/lib/perl5/site_perl/5.25.7/x86_64-linux-thread-multi-ld /home/khw/blead/lib/perl5/site_perl/5.25.7 /home/khw/blead/lib/perl5/5.25.7/x86_64-linux-thread-multi-ld /home/khw/blead/lib/perl5/5.25.7 /home/khw/blead/lib/perl5/site_perl --- Environment for perl 5.25.7: HOME=/home/khw LANG=en_US.UTF-8 LANGUAGE=en_US LD_LIBRARY_PATH (unset) LOGDIR (unset) PATH=/usr/lib/ccache:/home/khw/bin:/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/usr/games:/usr/local/games PERL5OPT=-w PERL_BADLANG (unset) PERL_DIFF_TOOL=wgdiff PERL_POD_PEDANTIC=1 SHELL=/bin/ksh
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 1.3k
On Mon, 30 Jan 2017 18:24:38 -0800, khw wrote: Show quoted text
> It so happens that the intermediate form is unchanged from the input > for > ASCII literal characters that don't have ranges, so this works > > tr'abc'def' > > but ranges don't work > > tr'a-z'A-Z' # translates only the three chars 'a', '-', 'z' > > I haven't investigated \t, etc. > > My guess is that it has been broken from the beginning, and it's > amazing > to me that it has gone undiscovered for so long, requiring a fuzzer, > with the red herring of UTF-8 being involved. > > One possibility to fix this is to simply prohibit single-quotish > behavior with tr, by putting in a check somewhere in the parsing. It > clearly isn't something that is affecting many, if any, people. > > Otherwise, the portion of scan_const that deals with this would have > to > be pulled out into a separate function, called as well from whatever > part of toke deals with single quotish. I believe that > > tr '\xDF'\xFE' > > should not be evaluated as double-quotish, for example. > > Since I don't have much knowledge of the parser's overall operation, > suggestions are welcome. Or volunteers.
We'd need to modify S_tokeq() to handle ranges in single-quotish tr'''. Possibly it could check for broken unicode to avoid #130675. I don't think tr''' should support escapes beyond \' Tony
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 1.7k
On Mon, 30 Jan 2017 20:04:28 -0800, tonyc wrote: Show quoted text
> On Mon, 30 Jan 2017 18:24:38 -0800, khw wrote:
> > It so happens that the intermediate form is unchanged from the input > > for > > ASCII literal characters that don't have ranges, so this works > > > > tr'abc'def' > > > > but ranges don't work > > > > tr'a-z'A-Z' # translates only the three chars 'a', '-', 'z' > > > > I haven't investigated \t, etc. > > > > My guess is that it has been broken from the beginning, and it's > > amazing > > to me that it has gone undiscovered for so long, requiring a fuzzer, > > with the red herring of UTF-8 being involved. > > > > One possibility to fix this is to simply prohibit single-quotish > > behavior with tr, by putting in a check somewhere in the parsing. It > > clearly isn't something that is affecting many, if any, people. > > > > Otherwise, the portion of scan_const that deals with this would have > > to > > be pulled out into a separate function, called as well from whatever > > part of toke deals with single quotish. I believe that > > > > tr '\xDF'\xFE' > > > > should not be evaluated as double-quotish, for example. > > > > Since I don't have much knowledge of the parser's overall operation, > > suggestions are welcome. Or volunteers.
> > We'd need to modify S_tokeq() to handle ranges in single-quotish tr'''. > > Possibly it could check for broken unicode to avoid #130675.
Agreed Show quoted text
> > I don't think tr''' should support escapes beyond \'
Agreed Show quoted text
> > Tony
I don't know why the handling of tr/// is split between op.c and toke.c. I'm now thinking that most of it should be moved to op.c, with the toke.c part merely expanding the single- or double-quotish things. Then there would be no need to have an intermediate form, and I believe the code would be cleaner. -- Karl Williamson
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 2.4k
On Mon, 06 Feb 2017 17:05:19 -0800, khw wrote: Show quoted text
> On Mon, 30 Jan 2017 20:04:28 -0800, tonyc wrote:
> > On Mon, 30 Jan 2017 18:24:38 -0800, khw wrote:
> > > It so happens that the intermediate form is unchanged from the > > > input > > > for > > > ASCII literal characters that don't have ranges, so this works > > > > > > tr'abc'def' > > > > > > but ranges don't work > > > > > > tr'a-z'A-Z' # translates only the three chars 'a', '-', 'z' > > > > > > I haven't investigated \t, etc. > > > > > > My guess is that it has been broken from the beginning, and it's > > > amazing > > > to me that it has gone undiscovered for so long, requiring a > > > fuzzer, > > > with the red herring of UTF-8 being involved. > > > > > > One possibility to fix this is to simply prohibit single-quotish > > > behavior with tr, by putting in a check somewhere in the parsing. > > > It > > > clearly isn't something that is affecting many, if any, people. > > > > > > Otherwise, the portion of scan_const that deals with this would > > > have > > > to > > > be pulled out into a separate function, called as well from > > > whatever > > > part of toke deals with single quotish. I believe that > > > > > > tr '\xDF'\xFE' > > > > > > should not be evaluated as double-quotish, for example. > > > > > > Since I don't have much knowledge of the parser's overall > > > operation, > > > suggestions are welcome. Or volunteers.
> > > > We'd need to modify S_tokeq() to handle ranges in single-quotish > > tr'''. > > > > Possibly it could check for broken unicode to avoid #130675.
> > Agreed >
> > > > I don't think tr''' should support escapes beyond \'
> > Agreed >
> > > > Tony
> > I don't know why the handling of tr/// is split between op.c and > toke.c. I'm now thinking that most of it should be moved to op.c, > with the toke.c part merely expanding the single- or double-quotish > things. Then there would be no need to have an intermediate form, > and I believe the code would be cleaner.
I have figured out why it is split between op.c and tr.c. If op.c didn't get it before it was fully parsed, it wouldn't be able to distinguish between things like an interior hyphen was input as a literal, and hence should be treated as a metacharacter; or if it was the expansion of something like \x2D, in which case it isn't a metacharacter. We have the same problem, and have (mostly) solved it, for regex patterns that are compiled separately, but it's work, and has presented a bunch of bugs over the years -- Karl Williamson
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 195b
This is actually working as designed. I added doc clarifications in 8efc02fdaef53aef61eb0b7569d5df023752913a. See also http://nntp.perl.org/group/perl.perl5.porters/251544 -- Karl Williamson


This service is sponsored and maintained by Best Practical Solutions and runs on Perl.org infrastructure.

For issues related to this RT instance (aka "perlbug"), please contact perlbug-admin at perl.org