Skip Menu |
 
Report information
Id: 60472
Status: resolved
Priority: 0/
Queue: perl5

Owner: Nobody
Requestors: mihara [at] twister.dev.iwa.fujixerox.co.jp
Cc:
AdminCc:

Operating System: netbsd
PatchStatus: (no value)
Severity: low
Type: library
Perl Version: 5.10.0
Fixed In: (no value)



Subject: Module Encode degrades in Perl 5.10
Date: Tue, 11 Nov 2008 14:26:44 +0900 (JST)
To: perlbug [...] perl.org
From: mihara [...] twister.dev.iwa.fujixerox.co.jp (Osamu MIHARA)
Download (untitled) / with headers
text/plain 5.4k
This is a bug report for perl from mihara@twister.dev.iwa.fujixerox.co.jp, generated with the help of perlbug 1.36 running under perl 5.10.0. ----------------------------------------------------------------- [Please enter your report here] I found this bug while playing with Encode::IMAPUTF7 module. Encode::IMAPUTF7 become to have a problem when I upgrade Perl from 5.8 to 5.10. The problem is that if "$1" is fed into "encode()" directly, $1 for subsequent pattern matching is not updated. Here is a small code to produce this. ---- #!/usr/bin/perl use MIME::Base64; use Encode; my $e_utf16 = find_encoding("UTF-16BE"); $str="abcあdef"; my $len=length($str); $re1=qr/(?:[a-z])/; $re2=qr/(?:[^a-z])/; $byte=''; pos($str) = 0; while (pos($str) < $len) { #print pos($str),"\n"; print "\n",pos($str),":",$str,"[$1]\n"; if ($str =~ /\G($re1+)/cg) { print "($1)"; $bytes .= $1; } elsif ($str =~ /\G($re2+)/cg) { my $base64 = encode_base64($e_utf16->encode($1), ''); print "<$1:$base64>"; $bytes .= $1; } else { die "aaa"; } } ----- Please be sure that $str contains Japanese character in UTF-8. when "$1" is fed like this in line 19 my $base64 = encode_base64($e_utf16->encode($1), ''); next pattern machi in line 15 if ($str =~ /\G($re1+)/cg) { cannot update $1. Previous $1 remains. Work around is this. my $tmp = $1; my $base64 = encode_base64($e_utf16->encode($tmp), ''); But inside the $e_utf16->encode(), something may be wrong. Sorry for my poor explanation. Please mail me at osamu.mihara'@'fujixerox.co.jp, if you have further questions. [Please do not change anything below this line] ----------------------------------------------------------------- --- Flags: category=library severity=low --- Site configuration information for perl 5.10.0: Configured by mihara at Thu Nov 6 15:55:43 JST 2008. Summary of my perl5 (revision 5 version 10 subversion 0) configuration: Platform: osname=netbsd, osvers=4.0.1, archname=x86_64-netbsd-thread-multi uname='netbsd twister.dev.iwa.fujixerox.co.jp 4.0.1 netbsd 4.0.1 (generic.mp) #0: tue oct 7 21:00:48 pdt 2008 builds@wb28:homebuildsabnetbsd-4-0-1-releaseamd64200810080053z-objhomebuildsabnetbsd-4-0-1-releasesrcsysarchamd64compilegeneric.mp amd64 ' config_args='-sde -Darchname=x86_64-netbsd -Dcc=cc -Doptimize=-O2 -pthread -I/usr/include -Ui_malloc -Uusemymalloc -Uinstallusrbinperl -Dinstallstyle=lib/perl5 -Dprefix=/usr/pkg -Dsiteprefix=/usr/pkg -Dvendorprefix=/usr/pkg -Dscriptdir=/usr/pkg/lib/perl5/bin -Dsitescript=/usr/pkg/lib/perl5/site_perl/bin -Dvendorscript=/usr/pkg/lib/perl5/vendor_perl/bin -Dsitebin=/usr/pkg/lib/perl5/site_perl/bin -Dvendorbin=/usr/pkg/lib/perl5/vendor_perl/bin -Dprivlib=/usr/pkg/lib/perl5/5.10.0 -Dsitelib=/usr/pkg/lib/perl5/site_perl/5.10.0 -Dvendorlib=/usr/pkg/lib/perl5/vendor_perl/5.10.0 -Dsitelib_stem=/usr/pkg/lib/perl5/site_perl -Dvendorlib_stem=/usr/pkg/lib/perl5/vendor_perl -Dman1ext=1 -Dman1dir=/usr/pkg/lib/perl5/man/man1 -Dsiteman1dir=/usr/pkg/lib/perl5/site_perl/man/man1 -Dvendorman1dir=/usr/pkg/lib/perl5/vendor_perl/man/man1 -Dman3ext=3 -Dman3dir=/usr/pkg/lib/perl5/man/man3 -Dsiteman3dir=/usr/pkg/lib/perl5/site_perl/man/man3 -Dvendorman3dir=/usr/pkg/lib/perl5/vendor_perl/man/man3 - Duseshrplib -Daphostname=/bin/hostname -Dln=/bin/ln -Dsed=/usr/bin/sed -Dsh=/bin/sh -Dissymlink=test -h -Dperl5=false -Duseithreads -Dlibswanted=m crypt ' hint=recommended, useposix=true, d_sigaction=define useithreads=define, usemultiplicity=define useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef use64bitint=define, use64bitall=define, uselongdouble=undef usemymalloc=n, bincompat5005=undef Compiler: cc='cc', ccflags ='-fno-strict-aliasing -pipe -I/usr/pkg/include', optimize='-O2 -pthread -I/usr/include', cppflags='-fno-strict-aliasing -pipe -I/usr/pkg/include' ccversion='', gccversion='4.1.2 20061021 prerelease (NetBSD nb3 20061125)', gccosandvers='' intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678 d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16 ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8 alignbytes=8, prototype=define Linker and Libraries: ld='cc', ldflags ='-Wl,-R/usr/pkg/lib -L/usr/pkg/lib' libpth=/usr/lib /usr/pkg/lib libs=-lm -lcrypt -lpthread perllibs=-lm -lcrypt -lpthread libc=/lib/libc.so, so=so, useshrplib=true, libperl=libperl.so gnulibc_version='' Dynamic Linking: dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E -Wl,-R/usr/pkg/lib/perl5/5.10.0/x86_64-netbsd-thread-multi/CORE' cccdlflags='-DPIC -fPIC ', lddlflags='-Wl,-R/usr/pkg/lib --whole-archive -shared -L/usr/pkg/lib' Locally applied patches: --- @INC for perl 5.10.0: /usr/pkg/lib/perl5/site_perl/5.10.0/x86_64-netbsd-thread-multi /usr/pkg/lib/perl5/site_perl/5.10.0 /usr/pkg/lib/perl5/vendor_perl/5.10.0/x86_64-netbsd-thread-multi /usr/pkg/lib/perl5/vendor_perl/5.10.0 /usr/pkg/lib/perl5/vendor_perl /usr/pkg/lib/perl5/5.10.0/x86_64-netbsd-thread-multi /usr/pkg/lib/perl5/5.10.0 . --- Environment for perl 5.10.0: HOME=/home/mihara LANG=ja_JP.eucJP LANGUAGE (unset) LC_ALL=ja_JP.eucJP LD_LIBRARY_PATH (unset) LOGDIR (unset) PATH=/home/mihara/bin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/X11R6/bin:/usr/local/bin:/usr/local/sbin:/usr/pkg/bin:/usr/pkg/sbin:/usr/games PERL_BADLANG=0 SHELL=/usr/pkg/bin/tcsh
CC: bugs-bitbucket [...] netlabs.develooper.com
Subject: Re: [perl #60472] Module Encode degrades in Perl 5.10
Date: Tue, 11 Nov 2008 18:01:14 +0100
To: perl5-porters [...] perl.org
From: demerphq <demerphq [...] gmail.com>
Download (untitled) / with headers
text/plain 3.7k
2008/11/11 mihara@twister.dev.iwa.fujixerox.co.jp (via RT) <perlbug-followup@perl.org>: Show quoted text
> # New Ticket Created by mihara@twister.dev.iwa.fujixerox.co.jp > # Please include the string: [perl #60472] > # in the subject line of all future correspondence about this issue. > # <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=60472 > > > > > This is a bug report for perl from mihara@twister.dev.iwa.fujixerox.co.jp, > generated with the help of perlbug 1.36 running under perl 5.10.0. > > > ----------------------------------------------------------------- > [Please enter your report here] > I found this bug while playing with Encode::IMAPUTF7 module. > Encode::IMAPUTF7 become to have a problem when I upgrade Perl from 5.8 to 5.10. > The problem is that if "$1" is fed into "encode()" directly, $1 for > subsequent pattern matching is not updated. Here is a small code to produce this. > > > ---- > #!/usr/bin/perl > use MIME::Base64; > use Encode; > my $e_utf16 = find_encoding("UTF-16BE"); > > $str="abcあdef"; > my $len=length($str); > $re1=qr/(?:[a-z])/; > $re2=qr/(?:[^a-z])/; > $byte=''; > pos($str) = 0; > while (pos($str) < $len) { > #print pos($str),"\n"; > print "\n",pos($str),":",$str,"[$1]\n"; > if ($str =~ /\G($re1+)/cg) { > print "($1)"; > $bytes .= $1; > } elsif ($str =~ /\G($re2+)/cg) { > my $base64 = encode_base64($e_utf16->encode($1), ''); > print "<$1:$base64>"; > $bytes .= $1; > } else { > die "aaa"; > } > } > ----- > > Please be sure that $str contains Japanese character in UTF-8. > > when "$1" is fed like this in line 19 > my $base64 = encode_base64($e_utf16->encode($1), ''); > > next pattern machi in line 15 > > if ($str =~ /\G($re1+)/cg) { > > cannot update $1. Previous $1 remains. > > Work around is this. > > my $tmp = $1; > my $base64 = encode_base64($e_utf16->encode($tmp), ''); > > But inside the $e_utf16->encode(), something may be wrong. > > Sorry for my poor explanation. Please mail me at osamu.mihara'@'fujixerox.co.jp, if you have further questions.
Can you attach the code as a file? I dont seem to be able to create the appropriate input string. I change the Japanese character to a question mark i get the following output: D:\dev\perl\ver\p4\win32>..\perl ..\encode_bug.pl 0:abc?def[] (abc) 3:abc?def[abc] <?:AD8=> 4:abc?def[?] (?) D:\dev\perl\ver\p4\win32>perl ..\encode_bug.pl 0:abc?def[] (abc) 3:abc?def[abc] <?:AD8=> 4:abc?def[?] (def) which clearly is different. (the bottom one is perl 5.8.6, the top is a pretty recent blead). If i change it to \x{100} then i get: D:\dev\perl\ver\p4\win32>..\perl ..\encode_bug.pl Wide character in print at ..\encode_bug.pl line 14. 0:abc─Çdef[] (abc) Wide character in print at ..\encode_bug.pl line 14. 3:abc─Çdef[abc] Wide character in print at ..\encode_bug.pl line 20. <─Ç:AQA=> Wide character in print at ..\encode_bug.pl line 14. Wide character in print at ..\encode_bug.pl line 14. 4:abc─Çdef[─Ç] Wide character in print at ..\encode_bug.pl line 16. (─Ç) D:\dev\perl\ver\p4\win32>perl ..\encode_bug.pl Wide character in print at ..\encode_bug.pl line 14. 0:abc─Çdef[] (abc) Wide character in print at ..\encode_bug.pl line 14. 3:abc─Çdef[abc] Modification of a read-only value attempted at ..\encode_bug.pl line 19. some of which is explainable by this being a windows box, and not having a utf8 shell, but it looks like at least one issue (concerning the read only value) is fixed in blead. Which sortof reminds me of something: Encode has an annoying habit of modifying its arguments.... Anyway, obviously there is a bug. Damned if i understand it tho. Cheers, yves -- perl -Mre=debug -e "/just|another|perl|hacker/"
CC: bugs-bitbucket [...] netlabs.develooper.com
Subject: Re: [perl #60472] Module Encode degrades in Perl 5.10
Date: Tue, 11 Nov 2008 18:12:09 +0100
To: perl5-porters [...] perl.org
From: demerphq <demerphq [...] gmail.com>
Download (untitled) / with headers
text/plain 4.6k
2008/11/11 demerphq <demerphq@gmail.com>: Show quoted text
> 2008/11/11 mihara@twister.dev.iwa.fujixerox.co.jp (via RT) > <perlbug-followup@perl.org>:
>> # New Ticket Created by mihara@twister.dev.iwa.fujixerox.co.jp >> # Please include the string: [perl #60472] >> # in the subject line of all future correspondence about this issue. >> # <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=60472 > >> >> >> >> This is a bug report for perl from mihara@twister.dev.iwa.fujixerox.co.jp, >> generated with the help of perlbug 1.36 running under perl 5.10.0. >> >> >> ----------------------------------------------------------------- >> [Please enter your report here] >> I found this bug while playing with Encode::IMAPUTF7 module. >> Encode::IMAPUTF7 become to have a problem when I upgrade Perl from 5.8 to 5.10. >> The problem is that if "$1" is fed into "encode()" directly, $1 for >> subsequent pattern matching is not updated. Here is a small code to produce this. >> >> >> ---- >> #!/usr/bin/perl >> use MIME::Base64; >> use Encode; >> my $e_utf16 = find_encoding("UTF-16BE"); >> >> $str="abcあdef"; >> my $len=length($str); >> $re1=qr/(?:[a-z])/; >> $re2=qr/(?:[^a-z])/; >> $byte=''; >> pos($str) = 0; >> while (pos($str) < $len) { >> #print pos($str),"\n"; >> print "\n",pos($str),":",$str,"[$1]\n"; >> if ($str =~ /\G($re1+)/cg) { >> print "($1)"; >> $bytes .= $1; >> } elsif ($str =~ /\G($re2+)/cg) { >> my $base64 = encode_base64($e_utf16->encode($1), ''); >> print "<$1:$base64>"; >> $bytes .= $1; >> } else { >> die "aaa"; >> } >> }
If i used Devel::Peek to have a look at $1 after each regex match i find this: (first output is blead, second is 5.8.6). So for some reason $1 isnt marked as readonly in 5.10, and what i find REALLY interesting is the last Dump output from both. In particular we see this from 5.8.6: SV = PVMG(0x1a76b2c) at 0x1a98d9c REFCNT = 1 FLAGS = (GMG,SMG,READONLY,pPOK) IV = 0 NV = 0 PV = 0x1a9359c "?"\0 CUR = 1 LEN = 11 MAGIC = 0x1a994c4 MG_VIRTUAL = &PL_vtbl_sv MG_TYPE = PERL_MAGIC_sv(\0) MG_OBJ = 0x1a98d90 MG_LEN = 1 MG_PTR = 0x1a803b4 "1" (def) and in blead: SV = PVMG(0x1ac052c) at 0x1a4dab4 REFCNT = 1 FLAGS = (GMG,SMG,POK,pPOK,UTF8) IV = 0 NV = 0 PV = 0x1a60834 "?"\0 [UTF8 "?"] CUR = 1 LEN = 12 MAGIC = 0x1aa6bb4 MG_VIRTUAL = &PL_vtbl_sv MG_TYPE = PERL_MAGIC_sv(\0) MG_OBJ = 0x1a4dac4 MG_LEN = 1 MG_PTR = 0x1ab4a8c "1" (?) So, outside of the added UTF8 flag (wtf did that come from), and the missing READONLY flag, the dump is basically the same. Which really makes me scratch my head. D:\dev\perl\ver\p4\win32>..\perl ..\encode_bug.pl 0:abc?def[] SV = PVMG(0x1ac052c) at 0x1a4dab4 REFCNT = 1 FLAGS = (GMG,SMG) IV = 0 NV = 0 PV = 0x1a60834 "UCS"\0 CUR = 3 LEN = 12 MAGIC = 0x1aa6bb4 MG_VIRTUAL = &PL_vtbl_sv MG_TYPE = PERL_MAGIC_sv(\0) MG_OBJ = 0x1a4dac4 MG_LEN = 1 MG_PTR = 0x1ab4a8c "1" (abc) 3:abc?def[abc] SV = PVMG(0x1ac052c) at 0x1a4dab4 REFCNT = 1 FLAGS = (GMG,SMG,pPOK) IV = 0 NV = 0 PV = 0x1a60834 "abc"\0 CUR = 3 LEN = 12 MAGIC = 0x1aa6bb4 MG_VIRTUAL = &PL_vtbl_sv MG_TYPE = PERL_MAGIC_sv(\0) MG_OBJ = 0x1a4dac4 MG_LEN = 1 MG_PTR = 0x1ab4a8c "1" <?:AD8=> 4:abc?def[?] SV = PVMG(0x1ac052c) at 0x1a4dab4 REFCNT = 1 FLAGS = (GMG,SMG,POK,pPOK,UTF8) IV = 0 NV = 0 PV = 0x1a60834 "?"\0 [UTF8 "?"] CUR = 1 LEN = 12 MAGIC = 0x1aa6bb4 MG_VIRTUAL = &PL_vtbl_sv MG_TYPE = PERL_MAGIC_sv(\0) MG_OBJ = 0x1a4dac4 MG_LEN = 1 MG_PTR = 0x1ab4a8c "1" (?) D:\dev\perl\ver\p4\win32>perl ..\encode_bug.pl 0:abc?def[] SV = PVMG(0x1a76b2c) at 0x1a98d9c REFCNT = 1 FLAGS = (GMG,SMG,READONLY) IV = 0 NV = 0 PV = 0x1a9359c "UCS"\0 CUR = 3 LEN = 11 MAGIC = 0x1a994c4 MG_VIRTUAL = &PL_vtbl_sv MG_TYPE = PERL_MAGIC_sv(\0) MG_OBJ = 0x1a98d90 MG_LEN = 1 MG_PTR = 0x1a803b4 "1" (abc) 3:abc?def[abc] SV = PVMG(0x1a76b2c) at 0x1a98d9c REFCNT = 1 FLAGS = (GMG,SMG,READONLY,pPOK) IV = 0 NV = 0 PV = 0x1a9359c "abc"\0 CUR = 3 LEN = 11 MAGIC = 0x1a994c4 MG_VIRTUAL = &PL_vtbl_sv MG_TYPE = PERL_MAGIC_sv(\0) MG_OBJ = 0x1a98d90 MG_LEN = 1 MG_PTR = 0x1a803b4 "1" <?:AD8=> 4:abc?def[?] SV = PVMG(0x1a76b2c) at 0x1a98d9c REFCNT = 1 FLAGS = (GMG,SMG,READONLY,pPOK) IV = 0 NV = 0 PV = 0x1a9359c "?"\0 CUR = 1 LEN = 11 MAGIC = 0x1a994c4 MG_VIRTUAL = &PL_vtbl_sv MG_TYPE = PERL_MAGIC_sv(\0) MG_OBJ = 0x1a98d90 MG_LEN = 1 MG_PTR = 0x1a803b4 "1" (def) -- perl -Mre=debug -e "/just|another|perl|hacker/"
CC: bugs-bitbucket [...] netlabs.develooper.com
Subject: Re: [perl #60472] Module Encode degrades in Perl 5.10
Date: Tue, 11 Nov 2008 18:15:42 +0100
To: perl5-porters [...] perl.org
From: demerphq <demerphq [...] gmail.com>
Download (untitled) / with headers
text/plain 1.1k
2008/11/11 demerphq <demerphq@gmail.com>: Show quoted text
> So for some reason $1 isnt marked as readonly in 5.10, and what i find > REALLY interesting is the last Dump output from both. In particular > we see this from 5.8.6: > > SV = PVMG(0x1a76b2c) at 0x1a98d9c > REFCNT = 1 > FLAGS = (GMG,SMG,READONLY,pPOK) > IV = 0 > NV = 0 > PV = 0x1a9359c "?"\0 > CUR = 1 > LEN = 11 > MAGIC = 0x1a994c4 > MG_VIRTUAL = &PL_vtbl_sv > MG_TYPE = PERL_MAGIC_sv(\0) > MG_OBJ = 0x1a98d90 > MG_LEN = 1 > MG_PTR = 0x1a803b4 "1" > (def) > > and in blead: > > SV = PVMG(0x1ac052c) at 0x1a4dab4 > REFCNT = 1 > FLAGS = (GMG,SMG,POK,pPOK,UTF8)
Hmm, im wondering, does the POK mean that we ignore the vtable? Is it possible somehow that Encode is turning on POK and UTF8 incorrectly for some reason? Maybe because the variable isnt READONLY or something? Show quoted text
> IV = 0 > NV = 0 > PV = 0x1a60834 "?"\0 [UTF8 "?"] > CUR = 1 > LEN = 12 > MAGIC = 0x1aa6bb4 > MG_VIRTUAL = &PL_vtbl_sv > MG_TYPE = PERL_MAGIC_sv(\0) > MG_OBJ = 0x1a4dac4 > MG_LEN = 1 > MG_PTR = 0x1ab4a8c "1" > (?)
-- perl -Mre=debug -e "/just|another|perl|hacker/"
CC: perl5-porters [...] perl.org, bugs-bitbucket [...] netlabs.develooper.com
Subject: Re: [perl #60472] Module Encode degrades in Perl 5.10
Date: Tue, 11 Nov 2008 17:20:39 +0000
To: demerphq <demerphq [...] gmail.com>
From: Nicholas Clark <nick [...] ccl4.org>
Download (untitled) / with headers
text/plain 1.1k
On Tue, Nov 11, 2008 at 06:15:42PM +0100, demerphq wrote: Show quoted text
> 2008/11/11 demerphq <demerphq@gmail.com>:
> > So for some reason $1 isnt marked as readonly in 5.10, and what i find > > REALLY interesting is the last Dump output from both. In particular > > we see this from 5.8.6: > > > > SV = PVMG(0x1a76b2c) at 0x1a98d9c > > REFCNT = 1 > > FLAGS = (GMG,SMG,READONLY,pPOK) > > IV = 0 > > NV = 0 > > PV = 0x1a9359c "?"\0 > > CUR = 1 > > LEN = 11 > > MAGIC = 0x1a994c4 > > MG_VIRTUAL = &PL_vtbl_sv > > MG_TYPE = PERL_MAGIC_sv(\0) > > MG_OBJ = 0x1a98d90 > > MG_LEN = 1 > > MG_PTR = 0x1a803b4 "1" > > (def) > > > > and in blead: > > > > SV = PVMG(0x1ac052c) at 0x1a4dab4 > > REFCNT = 1 > > FLAGS = (GMG,SMG,POK,pPOK,UTF8)
> > Hmm, im wondering, does the POK mean that we ignore the vtable? > > Is it possible somehow that Encode is turning on POK and UTF8 > incorrectly for some reason? Maybe because the variable isnt READONLY > or something?
POK shouldn't be on for something with GMG. It should only be pPOK, so that any calls to SvPV() etc call into Perl_sv2pv_flags(), and the get magic is triggered. So that (I think) means that we ignore the vtable, yes. Nicholas Clark
Subject: Re: [perl #60472] Module Encode degrades in Perl 5.10
Date: Tue, 11 Nov 2008 19:45:22 +0100
To: demerphq <demerphq [...] gmail.com>, perl5-porters [...] perl.org, bugs-bitbucket [...] netlabs.develooper.com
From: demerphq <demerphq [...] gmail.com>
Download (untitled) / with headers
text/plain 1.5k
2008/11/11 Nicholas Clark <nick@ccl4.org>: Show quoted text
> On Tue, Nov 11, 2008 at 06:15:42PM +0100, demerphq wrote:
>> 2008/11/11 demerphq <demerphq@gmail.com>:
>> > So for some reason $1 isnt marked as readonly in 5.10, and what i find >> > REALLY interesting is the last Dump output from both. In particular >> > we see this from 5.8.6: >> > >> > SV = PVMG(0x1a76b2c) at 0x1a98d9c >> > REFCNT = 1 >> > FLAGS = (GMG,SMG,READONLY,pPOK) >> > IV = 0 >> > NV = 0 >> > PV = 0x1a9359c "?"\0 >> > CUR = 1 >> > LEN = 11 >> > MAGIC = 0x1a994c4 >> > MG_VIRTUAL = &PL_vtbl_sv >> > MG_TYPE = PERL_MAGIC_sv(\0) >> > MG_OBJ = 0x1a98d90 >> > MG_LEN = 1 >> > MG_PTR = 0x1a803b4 "1" >> > (def) >> > >> > and in blead: >> > >> > SV = PVMG(0x1ac052c) at 0x1a4dab4 >> > REFCNT = 1 >> > FLAGS = (GMG,SMG,POK,pPOK,UTF8)
>> >> Hmm, im wondering, does the POK mean that we ignore the vtable? >> >> Is it possible somehow that Encode is turning on POK and UTF8 >> incorrectly for some reason? Maybe because the variable isnt READONLY >> or something?
> > POK shouldn't be on for something with GMG. > It should only be pPOK, so that any calls to SvPV() etc call into > Perl_sv2pv_flags(), and the get magic is triggered. > > So that (I think) means that we ignore the vtable, yes.
Should I take it that as you didn't mention the READONLY being missing that you have no idea about that? Shouldn't they be readonly? Or am I missing something? It seems reasonable that Perl 5.8.6 dies, and blead doesn't is related. Yves -- perl -Mre=debug -e "/just|another|perl|hacker/"
CC: perl5-porters [...] perl.org, bugs-bitbucket [...] netlabs.develooper.com
Subject: Re: [perl #60472] Module Encode degrades in Perl 5.10
Date: Tue, 11 Nov 2008 18:47:38 +0000
To: demerphq <demerphq [...] gmail.com>
From: Nicholas Clark <nick [...] ccl4.org>
Download (untitled) / with headers
text/plain 397b
On Tue, Nov 11, 2008 at 07:45:22PM +0100, demerphq wrote: Show quoted text
> 2008/11/11 Nicholas Clark <nick@ccl4.org>:
Show quoted text
> Should I take it that as you didn't mention the READONLY being missing > that you have no idea about that? > > Shouldn't they be readonly? Or am I missing something?
I think that they should be, but I missed the difference in the READONLY flag, and have no idea about it. Nicholas Clark
CC: bugs-bitbucket [...] netlabs.develooper.com
Subject: Re: [perl #60472] Module Encode degrades in Perl 5.10
Date: Wed, 12 Nov 2008 10:55:53 +0100
To: perl5-porters [...] perl.org
From: demerphq <demerphq [...] gmail.com>
Download (untitled) / with headers
text/plain 750b
2008/11/11 Nicholas Clark <nick@ccl4.org>: Show quoted text
> On Tue, Nov 11, 2008 at 07:45:22PM +0100, demerphq wrote:
>> 2008/11/11 Nicholas Clark <nick@ccl4.org>:
>
>> Should I take it that as you didn't mention the READONLY being missing >> that you have no idea about that? >> >> Shouldn't they be readonly? Or am I missing something?
> > I think that they should be, but I missed the difference in the READONLY > flag, and have no idea about it.
Turns out that this is deliberate, as some pluggable regex engines allow for a writable $1. Aevar patched this in quite some time ago. So I think at this points its a bug in Encode, but I couldn't figure out how to fix it in the time I had available. Yves -- perl -Mre=debug -e "/just|another|perl|hacker/"
Download (untitled) / with headers
text/plain 578b
According to the docs of Encode::Encoding, <http://search.cpan.org/~dankogai/Encode-2.26/lib/Encode/Encoding.pm>, encode is supposed to change its parameter. You're feeding it a (supposedly) readonly value, $1. So, don't do that. As to why the regex engine no longer updates $1 after such a change, at least in 5.10, where modifying $1 apparently is allowed, that must be a bug. I'm not sure as to what is supposed to happen to the regex, or the string, if you change $1. I think it should be similar to applying s///. For fun, try changing the parameter to encode to $1=$1.
Dave notes: Either Encode is doing something with $1 it shouldn't, or there's something up with $1
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 1.2k
On Thu May 28 07:43:38 2009, nicholas wrote: Show quoted text
> Dave notes: > > Either Encode is doing something with $1 it shouldn't, or there's > something up with $1
The first thing Encode::Unicode::encode_xs() does is call sv_utf8_upgrade(). When $1 was READONLY this would upgrade PVX and set UTF8 without setting POK. With $1 not READONLY this calls SvPV_force() which sets POK, upgrade PVX (if needed), and then set UTF8. This doesn't call mg_set(), so the read-only handling in Perl_reg_numbered_buff_store isn't triggered to produce a read only value error. Encode::Unicode::encode_xs() makes further modifications to the SV later on without calling mg_set(). So three issues: 1) sv_utf8_upgrade() is setting POK via SvPV_force - is this sane for a GMG SV? 2) sv_utf8_upgrade() updates the string, but doesn't call mg_set() - should the caller be doing it? 3) Encode::Unicode::encode_xs() changes the string in other ways, but doesn't call mg_set() when the string (str) is SMG. I suspect Encode::Unicode::encode_xs not calling is a bug, but I'll admit I don't have a strong understanding of magic. I'm not sure what the desired behaviour would be between sv_utf8_upgrade(), SvPV_force() and set magic. This makes me wonder if $1 should be just marked READONLY.
CC: perl5-porters [...] perl.org
Subject: Re: [perl #60472] Module Encode degrades in Perl 5.10
Date: Wed, 8 Jul 2009 22:07:49 +1000
To: Tony Cook via RT <perlbug-followup [...] perl.org>
From: Tony Cook <tony [...] develop-help.com>
Download (untitled) / with headers
text/plain 1.3k
On Mon, Jul 06, 2009 at 07:54:42AM -0700, Tony Cook via RT wrote: Show quoted text
> With $1 not READONLY this calls SvPV_force() which sets POK, upgrade PVX > (if needed), and then set UTF8. This doesn't call mg_set(), so the > read-only handling in Perl_reg_numbered_buff_store isn't triggered to > produce a read only value error.
I've confirmed SvPV_force() will set POK if called on $1 [1]. Since SvPV() simply returns PVX when POK is set, any magic is ignored on later use of $1. So the main questions here is whether SvPV_force() should set SVf_POK on an SV with get magic? I suspect SvPV_force() shouldn't be setting POK, but I don't know what the policy is. Show quoted text
> I'm not sure what the desired behaviour would be between > sv_utf8_upgrade(), SvPV_force() and set magic. This makes me wonder if > $1 should be just marked READONLY.
I'm sure it shouldn't be. -- [1] By calling an XS function defined: void SvPV_force(sv) SV * sv PREINIT: STRLEN len; PPCODE: SvPV_force(sv, len); $ perl -Mblib -MT60472 -MDevel::Peek -le '$_ = "abcd"; /..(.)/; print $1; T60472::SvPV_force($1); Dump($1); /.(.)/; print $1' c SV = PVMG(0x13aaed0) at 0x13e4960 REFCNT = 1 FLAGS = (GMG,SMG,POK,pPOK) IV = 0 NV = 0 PV = 0x13894a0 "c"\0 CUR = 1 LEN = 8 MAGIC = 0x13ee9b0 MG_VIRTUAL = &PL_vtbl_sv MG_TYPE = PERL_MAGIC_sv(\0) MG_OBJ = 0x13e4900 MG_LEN = 1 MG_PTR = 0x13ee8d0 "1" c
CC: Tony Cook via RT <perlbug-followup [...] perl.org>, perl5-porters [...] perl.org
Subject: SvPV_force() (was Re: [perl #60472] Module Encode degrades in Perl 5.10)
Date: Mon, 13 Jul 2009 16:58:28 +0100
To: Tony Cook <tony [...] develop-help.com>
From: Nicholas Clark <nick [...] ccl4.org>
Download (untitled) / with headers
text/plain 1.4k
On Wed, Jul 08, 2009 at 10:07:49PM +1000, Tony Cook wrote: Show quoted text
> On Mon, Jul 06, 2009 at 07:54:42AM -0700, Tony Cook via RT wrote:
> > With $1 not READONLY this calls SvPV_force() which sets POK, upgrade PVX > > (if needed), and then set UTF8. This doesn't call mg_set(), so the > > read-only handling in Perl_reg_numbered_buff_store isn't triggered to > > produce a read only value error.
> > I've confirmed SvPV_force() will set POK if called on $1 [1]. > > Since SvPV() simply returns PVX when POK is set, any magic is ignored > on later use of $1. > > So the main questions here is whether SvPV_force() should set SVf_POK > on an SV with get magic?
The two together are contradictory. So something must give. Show quoted text
> I suspect SvPV_force() shouldn't be setting POK, but I don't know what > the policy is.
I suspect that it should not. I've no idea what would get upset if this changed. The code itself is, well, old. Hover over it: http://perl5.git.perl.org/perl.git/blame/blead:/sv.c#l8314 commit a0d0e21ea6ea90a22318550944fe6cb09ae10cda Author: Larry Wall <lwall@netlabs.com> Date: Mon Oct 17 23:00:00 1994 +0000 The logical alternative would seem to be to turn SvPOK on, but remove magic. But that would de-taint tainted scalars, which feels bad. (Which, really, suggests that tainting is implemented wrongly. I suspect that it would be a lot better re-done as a bit in SvFLAGS(), with manual propagation in the relevant ops. This would also eliminate taint-tunnelling, which has always bothered me) Nicholas Clark
Subject: Re: SvPV_force() (was Re: [perl #60472] Module Encode degrades in Perl 5.10)
Date: Tue, 14 Jul 2009 10:38:52 +1000
To: Tony Cook via RT <perlbug-followup [...] perl.org>, perl5-porters [...] perl.org
From: Tony Cook <tony [...] develop-help.com>
Download (untitled) / with headers
text/plain 1.9k
On Mon, Jul 13, 2009 at 04:58:28PM +0100, Nicholas Clark wrote: Show quoted text
> On Wed, Jul 08, 2009 at 10:07:49PM +1000, Tony Cook wrote:
> > On Mon, Jul 06, 2009 at 07:54:42AM -0700, Tony Cook via RT wrote:
> > > With $1 not READONLY this calls SvPV_force() which sets POK, upgrade PVX > > > (if needed), and then set UTF8. This doesn't call mg_set(), so the > > > read-only handling in Perl_reg_numbered_buff_store isn't triggered to > > > produce a read only value error.
> > > > I've confirmed SvPV_force() will set POK if called on $1 [1]. > > > > Since SvPV() simply returns PVX when POK is set, any magic is ignored > > on later use of $1. > > > > So the main questions here is whether SvPV_force() should set SVf_POK > > on an SV with get magic?
> > The two together are contradictory. So something must give. >
> > I suspect SvPV_force() shouldn't be setting POK, but I don't know what > > the policy is.
> > I suspect that it should not. I've no idea what would get upset if this > changed. The code itself is, well, old. Hover over it: > > http://perl5.git.perl.org/perl.git/blame/blead:/sv.c#l8314 > > commit a0d0e21ea6ea90a22318550944fe6cb09ae10cda > Author: Larry Wall <lwall@netlabs.com> > Date: Mon Oct 17 23:00:00 1994 +0000 > > > The logical alternative would seem to be to turn SvPOK on, but remove magic. > But that would de-taint tainted scalars, which feels bad.
In the specific case in the ticket it would get or set magic from $1 too, which is worse. When I find the time/energy I'll try adjusting Perl_sv_pvn_force_flags to set only pPOK for GMG SVs and see what it breaks. There is also a separate bug in Encode::Unicode::encode_xs - it modifies the value but doesn't apply set magic, which means the set magic for $1 isn't applied and we don't see a "modification of a readonly value" error. Unfortunately, the damage (POK on) is already done to the SV at this point, so if the error is caught by eval, we're still left with a broken $1. Tony
Subject: Re: SvPV_force() (was Re: [perl #60472] Module Encode degrades in Perl 5.10)
Date: Thu, 16 Jul 2009 13:29:33 +1000
To: Tony Cook via RT <perlbug-followup [...] perl.org>, perl5-porters [...] perl.org
From: Tony Cook <tony [...] develop-help.com>
Download (untitled) / with headers
text/plain 2.2k
On Tue, Jul 14, 2009 at 10:38:52AM +1000, Tony Cook wrote: Show quoted text
> When I find the time/energy I'll try adjusting Perl_sv_pvn_force_flags > to set only pPOK for GMG SVs and see what it breaks.
With the following change: --- a/sv.c +++ b/sv.c @@ -8313,7 +8313,10 @@ Perl_sv_pvn_force_flags(pTHX_ SV *const sv, STRLEN *const lp, const I32 flags) SvPVX(sv)[len] = '\0'; } if (!SvPOK(sv)) { - SvPOK_on(sv); /* validate pointer */ + if (SvGMAGICAL(sv)) + SvFLAGS(sv) |= SVp_POK; + else + SvPOK_on(sv); /* validate pointer */ SvTAINT(sv); DEBUG_c(PerlIO_printf(Perl_debug_log, "0x%"UVxf" 2pv(%s)\n", PTR2UV(sv),SvPVX_const(sv))); The only change in tests are: @@ -816,17 +816,11 @@ ext/B/t/terse.................................................ok ext/B/t/xref..................................................ok ext/Compress-Raw-Bzip2/t/000prereq............................ok -ext/Compress-Raw-Bzip2/t/01bzip2..............................# Failed test at ../ext/Compress-Raw-Bzip2/t/01bzip2.t line 461. -# got: 'AppendedSYȀNM^@^@^E<9D><80>@^@@ d*^B<D6>^@ ^@"<B2>4<F5>0<9E> <A6><85>0^@M^G%<F6><80>:<98><CA>z^Oh<DD><FC>]<C9>^T<E1>BC"^A94Appended' -# expected: 'Appended' -FAILED at test 111 +ext/Compress-Raw-Bzip2/t/01bzip2..............................ok ext/Compress-Raw-Bzip2/t/09limitoutput........................ok ext/Compress-Raw-Bzip2/t/99pod................................skipped ext/Compress-Raw-Zlib/t/01version.............................ok -ext/Compress-Raw-Zlib/t/02zlib................................# Failed test at ../ext/Compress-Raw-Zlib/t/02zlib.t line 712. -# got: 'AppendeHT<F0>p<F4>Q<B0>400PH<CE><CF>-(-I-^B^@O<96>^G&Appended' -# expected: 'Appended' -FAILED at test 197 +ext/Compress-Raw-Zlib/t/02zlib................................ok ext/Compress-Raw-Zlib/t/07bufsize.............................ok ext/Compress-Raw-Zlib/t/09limitoutput.........................ok ext/Compress-Raw-Zlib/t/18lvalue..............................ok (and the summary) Copied and pasted from less, hence the <XX>. The inflate code in both classes also forces POK on. Tony
Download (untitled) / with headers
text/plain 447b
On Wed Jul 15 20:30:15 2009, tonyc wrote: Show quoted text
> On Tue, Jul 14, 2009 at 10:38:52AM +1000, Tony Cook wrote:
> > When I find the time/energy I'll try adjusting
> Perl_sv_pvn_force_flags
> > to set only pPOK for GMG SVs and see what it breaks.
This seems like it was a wild goose chase. SvSETMAGIC() will clear the POK flag, and if encode is modified to call it, the code behaves as expected. Pushed to https://rt.cpan.org/Ticket/Display.html?id=51263
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 581b
On Tue Nov 10 00:10:31 2009, tonyc wrote: Show quoted text
> On Wed Jul 15 20:30:15 2009, tonyc wrote:
> > On Tue, Jul 14, 2009 at 10:38:52AM +1000, Tony Cook wrote:
> > > When I find the time/energy I'll try adjusting
> > Perl_sv_pvn_force_flags
> > > to set only pPOK for GMG SVs and see what it breaks.
> > This seems like it was a wild goose chase. SvSETMAGIC() will clear the > POK flag, and if encode is modified to call it, the code behaves as > expected. > > Pushed to https://rt.cpan.org/Ticket/Display.html?id=51263
The specific case described in 60472 should be fixed by Encode 2.38.


This service is sponsored and maintained by Best Practical Solutions and runs on Perl.org infrastructure.

For issues related to this RT instance (aka "perlbug"), please contact perlbug-admin at perl.org