Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Module Encode degrades in Perl 5.10 #9561

Closed
p5pRT opened this issue Nov 11, 2008 · 19 comments
Closed

Module Encode degrades in Perl 5.10 #9561

p5pRT opened this issue Nov 11, 2008 · 19 comments

Comments

@p5pRT
Copy link

p5pRT commented Nov 11, 2008

Migrated from rt.perl.org#60472 (status was 'resolved')

Searchable as RT60472$

@p5pRT
Copy link
Author

p5pRT commented Nov 11, 2008

From mihara@twister.dev.iwa.fujixerox.co.jp

Created by mihara@twister.dev.iwa.fujixerox.co.jp

I found this bug while playing with Encode​::IMAPUTF7 module.
Encode​::IMAPUTF7 become to have a problem when I upgrade Perl from 5.8 to 5.10.
The problem is that if "$1" is fed into "encode()" directly, $1 for
subsequent pattern matching is not updated. Here is a small code to produce this.

----
#!/usr/bin/perl
use MIME​::Base64;
use Encode;
my $e_utf16 = find_encoding("UTF-16BE");

$str="abcあdef";
my $len=length($str);
$re1=qr/(?​:[a-z])/;
$re2=qr/(?​:[^a-z])/;
$byte='';
pos($str) = 0;
while (pos($str) < $len) {
  #print pos($str),"\n";
  print "\n",pos($str),"​:",$str,"[$1]\n";
  if ($str =~ /\G($re1+)/cg) {
  print "($1)";
  $bytes .= $1;
  } elsif ($str =~ /\G($re2+)/cg) {
  my $base64 = encode_base64($e_utf16->encode($1), '');
  print "<$1​:$base64>";
  $bytes .= $1;
  } else {
  die "aaa";
  }
}
-----

Please be sure that $str contains Japanese character in UTF-8.

when "$1" is fed like this in line 19
  my $base64 = encode_base64($e_utf16->encode($1), '');

next pattern machi in line 15

  if ($str =~ /\G($re1+)/cg) {

cannot update $1. Previous $1 remains.

Work around is this.

  my $tmp = $1;
  my $base64 = encode_base64($e_utf16->encode($tmp), '');

But inside the $e_utf16->encode(), something may be wrong.

Sorry for my poor explanation. Please mail me at osamu.mihara'@​'fujixerox.co.jp, if you have further questions.

Perl Info

Flags:
    category=library
    severity=low

Site configuration information for perl 5.10.0:

Configured by mihara at Thu Nov  6 15:55:43 JST 2008.

Summary of my perl5 (revision 5 version 10 subversion 0) configuration:
  Platform:
    osname=netbsd, osvers=4.0.1, archname=x86_64-netbsd-thread-multi
    uname='netbsd twister.dev.iwa.fujixerox.co.jp 4.0.1 netbsd 4.0.1 (generic.mp) #0: tue oct 7 21:00:48 pdt 2008 builds@wb28:homebuildsabnetbsd-4-0-1-releaseamd64200810080053z-objhomebuildsabnetbsd-4-0-1-releasesrcsysarchamd64compilegeneric.mp amd64 '
    config_args='-sde -Darchname=x86_64-netbsd -Dcc=cc -Doptimize=-O2 -pthread -I/usr/include -Ui_malloc -Uusemymalloc -Uinstallusrbinperl -Dinstallstyle=lib/perl5 -Dprefix=/usr/pkg -Dsiteprefix=/usr/pkg -Dvendorprefix=/usr/pkg -Dscriptdir=/usr/pkg/lib/perl5/bin -Dsitescript=/usr/pkg/lib/perl5/site_perl/bin -Dvendorscript=/usr/pkg/lib/perl5/vendor_perl/bin -Dsitebin=/usr/pkg/lib/perl5/site_perl/bin -Dvendorbin=/usr/pkg/lib/perl5/vendor_perl/bin -Dprivlib=/usr/pkg/lib/perl5/5.10.0 -Dsitelib=/usr/pkg/lib/perl5/site_perl/5.10.0 -Dvendorlib=/usr/pkg/lib/perl5/vendor_perl/5.10.0 -Dsitelib_stem=/usr/pkg/lib/perl5/site_perl -Dvendorlib_stem=/usr/pkg/lib/perl5/vendor_perl -Dman1ext=1 -Dman1dir=/usr/pkg/lib/perl5/man/man1 -Dsiteman1dir=/usr/pkg/lib/perl5/site_perl/man/man1 -Dvendorman1dir=/usr/pkg/lib/perl5/vendor_perl/man/man1 -Dman3ext=3 -Dman3dir=/usr/pkg/lib/perl5/man/man3 -Dsiteman3dir=/usr/pkg/lib/perl5/site_perl/man/man3 -Dvendorman3dir=/usr/pkg/lib/perl5/vendor_perl/man/man3 -
 Duseshrplib -Daphostname=/bin/hostname -Dln=/bin/ln -Dsed=/usr/bin/sed -Dsh=/bin/sh -Dissymlink=test -h -Dperl5=false -Duseithreads -Dlibswanted=m crypt '
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-fno-strict-aliasing -pipe -I/usr/pkg/include',
    optimize='-O2 -pthread -I/usr/include',
    cppflags='-fno-strict-aliasing -pipe -I/usr/pkg/include'
    ccversion='', gccversion='4.1.2 20061021 prerelease (NetBSD nb3 20061125)', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='cc', ldflags ='-Wl,-R/usr/pkg/lib  -L/usr/pkg/lib'
    libpth=/usr/lib /usr/pkg/lib
    libs=-lm -lcrypt -lpthread
    perllibs=-lm -lcrypt -lpthread
    libc=/lib/libc.so, so=so, useshrplib=true, libperl=libperl.so
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E  -Wl,-R/usr/pkg/lib/perl5/5.10.0/x86_64-netbsd-thread-multi/CORE'
    cccdlflags='-DPIC -fPIC ', lddlflags='-Wl,-R/usr/pkg/lib --whole-archive -shared  -L/usr/pkg/lib'

Locally applied patches:
    


@INC for perl 5.10.0:
    /usr/pkg/lib/perl5/site_perl/5.10.0/x86_64-netbsd-thread-multi
    /usr/pkg/lib/perl5/site_perl/5.10.0
    /usr/pkg/lib/perl5/vendor_perl/5.10.0/x86_64-netbsd-thread-multi
    /usr/pkg/lib/perl5/vendor_perl/5.10.0
    /usr/pkg/lib/perl5/vendor_perl
    /usr/pkg/lib/perl5/5.10.0/x86_64-netbsd-thread-multi
    /usr/pkg/lib/perl5/5.10.0
    .


Environment for perl 5.10.0:
    HOME=/home/mihara
    LANG=ja_JP.eucJP
    LANGUAGE (unset)
    LC_ALL=ja_JP.eucJP
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/mihara/bin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/X11R6/bin:/usr/local/bin:/usr/local/sbin:/usr/pkg/bin:/usr/pkg/sbin:/usr/games
    PERL_BADLANG=0
    SHELL=/usr/pkg/bin/tcsh

@p5pRT
Copy link
Author

p5pRT commented Nov 11, 2008

From @demerphq

2008/11/11 mihara@​twister.dev.iwa.fujixerox.co.jp (via RT)
<perlbug-followup@​perl.org>​:

# New Ticket Created by mihara@​twister.dev.iwa.fujixerox.co.jp
# Please include the string​: [perl #60472]
# in the subject line of all future correspondence about this issue.
# <URL​: http​://rt.perl.org/rt3/Ticket/Display.html?id=60472 >

This is a bug report for perl from mihara@​twister.dev.iwa.fujixerox.co.jp,
generated with the help of perlbug 1.36 running under perl 5.10.0.

-----------------------------------------------------------------
[Please enter your report here]
I found this bug while playing with Encode​::IMAPUTF7 module.
Encode​::IMAPUTF7 become to have a problem when I upgrade Perl from 5.8 to 5.10.
The problem is that if "$1" is fed into "encode()" directly, $1 for
subsequent pattern matching is not updated. Here is a small code to produce this.

----
#!/usr/bin/perl
use MIME​::Base64;
use Encode;
my $e_utf16 = find_encoding("UTF-16BE");

$str="abcあdef";
my $len=length($str);
$re1=qr/(?​:[a-z])/;
$re2=qr/(?​:[^a-z])/;
$byte='';
pos($str) = 0;
while (pos($str) < $len) {
#print pos($str),"\n";
print "\n",pos($str),"​:",$str,"[$1]\n";
if ($str =~ /\G($re1+)/cg) {
print "($1)";
$bytes .= $1;
} elsif ($str =~ /\G($re2+)/cg) {
my $base64 = encode_base64($e_utf16->encode($1), '');
print "<$1​:$base64>";
$bytes .= $1;
} else {
die "aaa";
}
}
-----

Please be sure that $str contains Japanese character in UTF-8.

when "$1" is fed like this in line 19
my $base64 = encode_base64($e_utf16->encode($1), '');

next pattern machi in line 15

   if \($str =~ /\\G\($re1\+\)/cg\) \{

cannot update $1. Previous $1 remains.

Work around is this.

           my $tmp = $1;
           my $base64 = encode\_base64\($e\_utf16\->encode\($tmp\)\, ''\);

But inside the $e_utf16->encode(), something may be wrong.

Sorry for my poor explanation. Please mail me at osamu.mihara'@​'fujixerox.co.jp, if you have further questions.

Can you attach the code as a file? I dont seem to be able to create
the appropriate input string.

I change the Japanese character to a question mark i get the following output​:

D​:\dev\perl\ver\p4\win32>..\perl ..\encode_bug.pl

0​:abc?def[]
(abc)
3​:abc?def[abc]
<?​:AD8=>
4​:abc?def[?]
(?)
D​:\dev\perl\ver\p4\win32>perl ..\encode_bug.pl

0​:abc?def[]
(abc)
3​:abc?def[abc]
<?​:AD8=>
4​:abc?def[?]
(def)

which clearly is different. (the bottom one is perl 5.8.6, the top is
a pretty recent blead).

If i change it to \x{100}

then i get​:

D​:\dev\perl\ver\p4\win32>..\perl ..\encode_bug.pl

Wide character in print at ..\encode_bug.pl line 14.
0​:abc─Çdef[]
(abc)
Wide character in print at ..\encode_bug.pl line 14.
3​:abc─Çdef[abc]
Wide character in print at ..\encode_bug.pl line 20.
<─Ç​:AQA=>
Wide character in print at ..\encode_bug.pl line 14.
Wide character in print at ..\encode_bug.pl line 14.
4​:abc─Çdef[─Ç]
Wide character in print at ..\encode_bug.pl line 16.
(─Ç)
D​:\dev\perl\ver\p4\win32>perl ..\encode_bug.pl

Wide character in print at ..\encode_bug.pl line 14.
0​:abc─Çdef[]
(abc)
Wide character in print at ..\encode_bug.pl line 14.
3​:abc─Çdef[abc]
Modification of a read-only value attempted at ..\encode_bug.pl line 19.

some of which is explainable by this being a windows box, and not
having a utf8 shell, but it looks like at least one issue (concerning
the read only value) is fixed in blead.

Which sortof reminds me of something​: Encode has an annoying habit of
modifying its arguments....

Anyway, obviously there is a bug. Damned if i understand it tho.

Cheers,
yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Nov 11, 2008

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Nov 11, 2008

From @demerphq

2008/11/11 demerphq <demerphq@​gmail.com>​:

2008/11/11 mihara@​twister.dev.iwa.fujixerox.co.jp (via RT)
<perlbug-followup@​perl.org>​:

# New Ticket Created by mihara@​twister.dev.iwa.fujixerox.co.jp
# Please include the string​: [perl #60472]
# in the subject line of all future correspondence about this issue.
# <URL​: http​://rt.perl.org/rt3/Ticket/Display.html?id=60472 >

This is a bug report for perl from mihara@​twister.dev.iwa.fujixerox.co.jp,
generated with the help of perlbug 1.36 running under perl 5.10.0.

-----------------------------------------------------------------
[Please enter your report here]
I found this bug while playing with Encode​::IMAPUTF7 module.
Encode​::IMAPUTF7 become to have a problem when I upgrade Perl from 5.8 to 5.10.
The problem is that if "$1" is fed into "encode()" directly, $1 for
subsequent pattern matching is not updated. Here is a small code to produce this.

----
#!/usr/bin/perl
use MIME​::Base64;
use Encode;
my $e_utf16 = find_encoding("UTF-16BE");

$str="abcあdef";
my $len=length($str);
$re1=qr/(?​:[a-z])/;
$re2=qr/(?​:[^a-z])/;
$byte='';
pos($str) = 0;
while (pos($str) < $len) {
#print pos($str),"\n";
print "\n",pos($str),"​:",$str,"[$1]\n";
if ($str =~ /\G($re1+)/cg) {
print "($1)";
$bytes .= $1;
} elsif ($str =~ /\G($re2+)/cg) {
my $base64 = encode_base64($e_utf16->encode($1), '');
print "<$1​:$base64>";
$bytes .= $1;
} else {
die "aaa";
}
}

If i used Devel​::Peek to have a look at $1 after each regex match i
find this​: (first output is blead, second is 5.8.6).

So for some reason $1 isnt marked as readonly in 5.10, and what i find
REALLY interesting is the last Dump output from both. In particular
we see this from 5.8.6​:

SV = PVMG(0x1a76b2c) at 0x1a98d9c
  REFCNT = 1
  FLAGS = (GMG,SMG,READONLY,pPOK)
  IV = 0
  NV = 0
  PV = 0x1a9359c "?"\0
  CUR = 1
  LEN = 11
  MAGIC = 0x1a994c4
  MG_VIRTUAL = &PL_vtbl_sv
  MG_TYPE = PERL_MAGIC_sv(\0)
  MG_OBJ = 0x1a98d90
  MG_LEN = 1
  MG_PTR = 0x1a803b4 "1"
(def)

and in blead​:

SV = PVMG(0x1ac052c) at 0x1a4dab4
  REFCNT = 1
  FLAGS = (GMG,SMG,POK,pPOK,UTF8)
  IV = 0
  NV = 0
  PV = 0x1a60834 "?"\0 [UTF8 "?"]
  CUR = 1
  LEN = 12
  MAGIC = 0x1aa6bb4
  MG_VIRTUAL = &PL_vtbl_sv
  MG_TYPE = PERL_MAGIC_sv(\0)
  MG_OBJ = 0x1a4dac4
  MG_LEN = 1
  MG_PTR = 0x1ab4a8c "1"
(?)

So, outside of the added UTF8 flag (wtf did that come from), and the
missing READONLY flag, the dump is basically the same. Which really
makes me scratch my head.

D​:\dev\perl\ver\p4\win32>..\perl ..\encode_bug.pl

0​:abc?def[]
SV = PVMG(0x1ac052c) at 0x1a4dab4
  REFCNT = 1
  FLAGS = (GMG,SMG)
  IV = 0
  NV = 0
  PV = 0x1a60834 "UCS"\0
  CUR = 3
  LEN = 12
  MAGIC = 0x1aa6bb4
  MG_VIRTUAL = &PL_vtbl_sv
  MG_TYPE = PERL_MAGIC_sv(\0)
  MG_OBJ = 0x1a4dac4
  MG_LEN = 1
  MG_PTR = 0x1ab4a8c "1"
(abc)
3​:abc?def[abc]
SV = PVMG(0x1ac052c) at 0x1a4dab4
  REFCNT = 1
  FLAGS = (GMG,SMG,pPOK)
  IV = 0
  NV = 0
  PV = 0x1a60834 "abc"\0
  CUR = 3
  LEN = 12
  MAGIC = 0x1aa6bb4
  MG_VIRTUAL = &PL_vtbl_sv
  MG_TYPE = PERL_MAGIC_sv(\0)
  MG_OBJ = 0x1a4dac4
  MG_LEN = 1
  MG_PTR = 0x1ab4a8c "1"
<?​:AD8=>
4​:abc?def[?]
SV = PVMG(0x1ac052c) at 0x1a4dab4
  REFCNT = 1
  FLAGS = (GMG,SMG,POK,pPOK,UTF8)
  IV = 0
  NV = 0
  PV = 0x1a60834 "?"\0 [UTF8 "?"]
  CUR = 1
  LEN = 12
  MAGIC = 0x1aa6bb4
  MG_VIRTUAL = &PL_vtbl_sv
  MG_TYPE = PERL_MAGIC_sv(\0)
  MG_OBJ = 0x1a4dac4
  MG_LEN = 1
  MG_PTR = 0x1ab4a8c "1"
(?)
D​:\dev\perl\ver\p4\win32>perl ..\encode_bug.pl

0​:abc?def[]
SV = PVMG(0x1a76b2c) at 0x1a98d9c
  REFCNT = 1
  FLAGS = (GMG,SMG,READONLY)
  IV = 0
  NV = 0
  PV = 0x1a9359c "UCS"\0
  CUR = 3
  LEN = 11
  MAGIC = 0x1a994c4
  MG_VIRTUAL = &PL_vtbl_sv
  MG_TYPE = PERL_MAGIC_sv(\0)
  MG_OBJ = 0x1a98d90
  MG_LEN = 1
  MG_PTR = 0x1a803b4 "1"
(abc)
3​:abc?def[abc]
SV = PVMG(0x1a76b2c) at 0x1a98d9c
  REFCNT = 1
  FLAGS = (GMG,SMG,READONLY,pPOK)
  IV = 0
  NV = 0
  PV = 0x1a9359c "abc"\0
  CUR = 3
  LEN = 11
  MAGIC = 0x1a994c4
  MG_VIRTUAL = &PL_vtbl_sv
  MG_TYPE = PERL_MAGIC_sv(\0)
  MG_OBJ = 0x1a98d90
  MG_LEN = 1
  MG_PTR = 0x1a803b4 "1"
<?​:AD8=>
4​:abc?def[?]
SV = PVMG(0x1a76b2c) at 0x1a98d9c
  REFCNT = 1
  FLAGS = (GMG,SMG,READONLY,pPOK)
  IV = 0
  NV = 0
  PV = 0x1a9359c "?"\0
  CUR = 1
  LEN = 11
  MAGIC = 0x1a994c4
  MG_VIRTUAL = &PL_vtbl_sv
  MG_TYPE = PERL_MAGIC_sv(\0)
  MG_OBJ = 0x1a98d90
  MG_LEN = 1
  MG_PTR = 0x1a803b4 "1"
(def)

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Nov 11, 2008

From @demerphq

2008/11/11 demerphq <demerphq@​gmail.com>​:

So for some reason $1 isnt marked as readonly in 5.10, and what i find
REALLY interesting is the last Dump output from both. In particular
we see this from 5.8.6​:

SV = PVMG(0x1a76b2c) at 0x1a98d9c
REFCNT = 1
FLAGS = (GMG,SMG,READONLY,pPOK)
IV = 0
NV = 0
PV = 0x1a9359c "?"\0
CUR = 1
LEN = 11
MAGIC = 0x1a994c4
MG_VIRTUAL = &PL_vtbl_sv
MG_TYPE = PERL_MAGIC_sv(\0)
MG_OBJ = 0x1a98d90
MG_LEN = 1
MG_PTR = 0x1a803b4 "1"
(def)

and in blead​:

SV = PVMG(0x1ac052c) at 0x1a4dab4
REFCNT = 1
FLAGS = (GMG,SMG,POK,pPOK,UTF8)

Hmm, im wondering, does the POK mean that we ignore the vtable?

Is it possible somehow that Encode is turning on POK and UTF8
incorrectly for some reason? Maybe because the variable isnt READONLY
or something?

IV = 0
NV = 0
PV = 0x1a60834 "?"\0 [UTF8 "?"]
CUR = 1
LEN = 12
MAGIC = 0x1aa6bb4
MG_VIRTUAL = &PL_vtbl_sv
MG_TYPE = PERL_MAGIC_sv(\0)
MG_OBJ = 0x1a4dac4
MG_LEN = 1
MG_PTR = 0x1ab4a8c "1"
(?)

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Nov 11, 2008

From @nwc10

On Tue, Nov 11, 2008 at 06​:15​:42PM +0100, demerphq wrote​:

2008/11/11 demerphq <demerphq@​gmail.com>​:

So for some reason $1 isnt marked as readonly in 5.10, and what i find
REALLY interesting is the last Dump output from both. In particular
we see this from 5.8.6​:

SV = PVMG(0x1a76b2c) at 0x1a98d9c
REFCNT = 1
FLAGS = (GMG,SMG,READONLY,pPOK)
IV = 0
NV = 0
PV = 0x1a9359c "?"\0
CUR = 1
LEN = 11
MAGIC = 0x1a994c4
MG_VIRTUAL = &PL_vtbl_sv
MG_TYPE = PERL_MAGIC_sv(\0)
MG_OBJ = 0x1a98d90
MG_LEN = 1
MG_PTR = 0x1a803b4 "1"
(def)

and in blead​:

SV = PVMG(0x1ac052c) at 0x1a4dab4
REFCNT = 1
FLAGS = (GMG,SMG,POK,pPOK,UTF8)

Hmm, im wondering, does the POK mean that we ignore the vtable?

Is it possible somehow that Encode is turning on POK and UTF8
incorrectly for some reason? Maybe because the variable isnt READONLY
or something?

POK shouldn't be on for something with GMG.
It should only be pPOK, so that any calls to SvPV() etc call into
Perl_sv2pv_flags(), and the get magic is triggered.

So that (I think) means that we ignore the vtable, yes.

Nicholas Clark

@p5pRT
Copy link
Author

p5pRT commented Nov 11, 2008

From @demerphq

2008/11/11 Nicholas Clark <nick@​ccl4.org>​:

On Tue, Nov 11, 2008 at 06​:15​:42PM +0100, demerphq wrote​:

2008/11/11 demerphq <demerphq@​gmail.com>​:

So for some reason $1 isnt marked as readonly in 5.10, and what i find
REALLY interesting is the last Dump output from both. In particular
we see this from 5.8.6​:

SV = PVMG(0x1a76b2c) at 0x1a98d9c
REFCNT = 1
FLAGS = (GMG,SMG,READONLY,pPOK)
IV = 0
NV = 0
PV = 0x1a9359c "?"\0
CUR = 1
LEN = 11
MAGIC = 0x1a994c4
MG_VIRTUAL = &PL_vtbl_sv
MG_TYPE = PERL_MAGIC_sv(\0)
MG_OBJ = 0x1a98d90
MG_LEN = 1
MG_PTR = 0x1a803b4 "1"
(def)

and in blead​:

SV = PVMG(0x1ac052c) at 0x1a4dab4
REFCNT = 1
FLAGS = (GMG,SMG,POK,pPOK,UTF8)

Hmm, im wondering, does the POK mean that we ignore the vtable?

Is it possible somehow that Encode is turning on POK and UTF8
incorrectly for some reason? Maybe because the variable isnt READONLY
or something?

POK shouldn't be on for something with GMG.
It should only be pPOK, so that any calls to SvPV() etc call into
Perl_sv2pv_flags(), and the get magic is triggered.

So that (I think) means that we ignore the vtable, yes.

Should I take it that as you didn't mention the READONLY being missing
that you have no idea about that?

Shouldn't they be readonly? Or am I missing something?

It seems reasonable that Perl 5.8.6 dies, and blead doesn't is related.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Nov 11, 2008

From @nwc10

On Tue, Nov 11, 2008 at 07​:45​:22PM +0100, demerphq wrote​:

2008/11/11 Nicholas Clark <nick@​ccl4.org>​:

Should I take it that as you didn't mention the READONLY being missing
that you have no idea about that?

Shouldn't they be readonly? Or am I missing something?

I think that they should be, but I missed the difference in the READONLY
flag, and have no idea about it.

Nicholas Clark

@p5pRT
Copy link
Author

p5pRT commented Nov 12, 2008

From @demerphq

2008/11/11 Nicholas Clark <nick@​ccl4.org>​:

On Tue, Nov 11, 2008 at 07​:45​:22PM +0100, demerphq wrote​:

2008/11/11 Nicholas Clark <nick@​ccl4.org>​:

Should I take it that as you didn't mention the READONLY being missing
that you have no idea about that?

Shouldn't they be readonly? Or am I missing something?

I think that they should be, but I missed the difference in the READONLY
flag, and have no idea about it.

Turns out that this is deliberate, as some pluggable regex engines
allow for a writable $1.

Aevar patched this in quite some time ago.

So I think at this points its a bug in Encode, but I couldn't figure
out how to fix it in the time I had available.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Nov 13, 2008

From bart.lateur@pandora.be

According to the docs of Encode​::Encoding,
<http​://search.cpan.org/~dankogai/Encode-2.26/lib/Encode/Encoding.pm>,
encode is supposed to change its parameter. You're feeding it a
(supposedly) readonly value, $1. So, don't do that.

As to why the regex engine no longer updates $1 after such a change, at
least in 5.10, where modifying $1 apparently is allowed, that must be a
bug. I'm not sure as to what is supposed to happen to the regex, or the
string, if you change $1. I think it should be similar to applying s///.

For fun, try changing the parameter to encode to $1=$1.

@p5pRT
Copy link
Author

p5pRT commented May 28, 2009

From @nwc10

Dave notes​:

Either Encode is doing something with $1 it shouldn't, or there's
something up with $1

@p5pRT
Copy link
Author

p5pRT commented Jul 6, 2009

From @tonycoz

On Thu May 28 07​:43​:38 2009, nicholas wrote​:

Dave notes​:

Either Encode is doing something with $1 it shouldn't, or there's
something up with $1

The first thing Encode​::Unicode​::encode_xs() does is call sv_utf8_upgrade().

When $1 was READONLY this would upgrade PVX and set UTF8 without setting
POK.

With $1 not READONLY this calls SvPV_force() which sets POK, upgrade PVX
(if needed), and then set UTF8. This doesn't call mg_set(), so the
read-only handling in Perl_reg_numbered_buff_store isn't triggered to
produce a read only value error.

Encode​::Unicode​::encode_xs() makes further modifications to the SV later
on without calling mg_set().

So three issues​:

1) sv_utf8_upgrade() is setting POK via SvPV_force - is this sane for a
GMG SV?

2) sv_utf8_upgrade() updates the string, but doesn't call mg_set() -
should the caller be doing it?

3) Encode​::Unicode​::encode_xs() changes the string in other ways, but
doesn't call mg_set() when the string (str) is SMG.

I suspect Encode​::Unicode​::encode_xs not calling is a bug, but I'll
admit I don't have a strong understanding of magic.

I'm not sure what the desired behaviour would be between
sv_utf8_upgrade(), SvPV_force() and set magic. This makes me wonder if
$1 should be just marked READONLY.

@p5pRT
Copy link
Author

p5pRT commented Jul 8, 2009

From @tonycoz

On Mon, Jul 06, 2009 at 07​:54​:42AM -0700, Tony Cook via RT wrote​:

With $1 not READONLY this calls SvPV_force() which sets POK, upgrade PVX
(if needed), and then set UTF8. This doesn't call mg_set(), so the
read-only handling in Perl_reg_numbered_buff_store isn't triggered to
produce a read only value error.

I've confirmed SvPV_force() will set POK if called on $1 [1].

Since SvPV() simply returns PVX when POK is set, any magic is ignored
on later use of $1.

So the main questions here is whether SvPV_force() should set SVf_POK
on an SV with get magic?

I suspect SvPV_force() shouldn't be setting POK, but I don't know what
the policy is.

I'm not sure what the desired behaviour would be between
sv_utf8_upgrade(), SvPV_force() and set magic. This makes me wonder if
$1 should be just marked READONLY.

I'm sure it shouldn't be.

--
[1] By calling an XS function defined​:

void
SvPV_force(sv)
  SV * sv
  PREINIT​:
  STRLEN len;
  PPCODE​:
  SvPV_force(sv, len);

$ perl -Mblib -MT60472 -MDevel​::Peek -le '$_ = "abcd"; /..(.)/; print $1; T60472​::SvPV_force($1); Dump($1); /.(.)/; print $1'
c
SV = PVMG(0x13aaed0) at 0x13e4960
  REFCNT = 1
  FLAGS = (GMG,SMG,POK,pPOK)
  IV = 0
  NV = 0
  PV = 0x13894a0 "c"\0
  CUR = 1
  LEN = 8
  MAGIC = 0x13ee9b0
  MG_VIRTUAL = &PL_vtbl_sv
  MG_TYPE = PERL_MAGIC_sv(\0)
  MG_OBJ = 0x13e4900
  MG_LEN = 1
  MG_PTR = 0x13ee8d0 "1"
c

@p5pRT
Copy link
Author

p5pRT commented Jul 13, 2009

From @nwc10

On Wed, Jul 08, 2009 at 10​:07​:49PM +1000, Tony Cook wrote​:

On Mon, Jul 06, 2009 at 07​:54​:42AM -0700, Tony Cook via RT wrote​:

With $1 not READONLY this calls SvPV_force() which sets POK, upgrade PVX
(if needed), and then set UTF8. This doesn't call mg_set(), so the
read-only handling in Perl_reg_numbered_buff_store isn't triggered to
produce a read only value error.

I've confirmed SvPV_force() will set POK if called on $1 [1].

Since SvPV() simply returns PVX when POK is set, any magic is ignored
on later use of $1.

So the main questions here is whether SvPV_force() should set SVf_POK
on an SV with get magic?

The two together are contradictory. So something must give.

I suspect SvPV_force() shouldn't be setting POK, but I don't know what
the policy is.

I suspect that it should not. I've no idea what would get upset if this
changed. The code itself is, well, old. Hover over it​:

http​://perl5.git.perl.org/perl.git/blame/blead​:/sv.c#l8314

commit a0d0e21
Author​: Larry Wall <lwall@​netlabs.com>
Date​: Mon Oct 17 23​:00​:00 1994 +0000

The logical alternative would seem to be to turn SvPOK on, but remove magic.
But that would de-taint tainted scalars, which feels bad.

(Which, really, suggests that tainting is implemented wrongly. I suspect that
it would be a lot better re-done as a bit in SvFLAGS(), with manual
propagation in the relevant ops. This would also eliminate taint-tunnelling,
which has always bothered me)

Nicholas Clark

@p5pRT
Copy link
Author

p5pRT commented Jul 14, 2009

From @tonycoz

On Mon, Jul 13, 2009 at 04​:58​:28PM +0100, Nicholas Clark wrote​:

On Wed, Jul 08, 2009 at 10​:07​:49PM +1000, Tony Cook wrote​:

On Mon, Jul 06, 2009 at 07​:54​:42AM -0700, Tony Cook via RT wrote​:

With $1 not READONLY this calls SvPV_force() which sets POK, upgrade PVX
(if needed), and then set UTF8. This doesn't call mg_set(), so the
read-only handling in Perl_reg_numbered_buff_store isn't triggered to
produce a read only value error.

I've confirmed SvPV_force() will set POK if called on $1 [1].

Since SvPV() simply returns PVX when POK is set, any magic is ignored
on later use of $1.

So the main questions here is whether SvPV_force() should set SVf_POK
on an SV with get magic?

The two together are contradictory. So something must give.

I suspect SvPV_force() shouldn't be setting POK, but I don't know what
the policy is.

I suspect that it should not. I've no idea what would get upset if this
changed. The code itself is, well, old. Hover over it​:

http​://perl5.git.perl.org/perl.git/blame/blead​:/sv.c#l8314

commit a0d0e21
Author​: Larry Wall <lwall@​netlabs.com>
Date​: Mon Oct 17 23​:00​:00 1994 +0000

The logical alternative would seem to be to turn SvPOK on, but remove magic.
But that would de-taint tainted scalars, which feels bad.

In the specific case in the ticket it would get or set magic from $1
too, which is worse.

When I find the time/energy I'll try adjusting Perl_sv_pvn_force_flags
to set only pPOK for GMG SVs and see what it breaks.

There is also a separate bug in Encode​::Unicode​::encode_xs - it
modifies the value but doesn't apply set magic, which means the set
magic for $1 isn't applied and we don't see a "modification of a
readonly value" error.

Unfortunately, the damage (POK on) is already done to the SV at this
point, so if the error is caught by eval, we're still left with a
broken $1.

Tony

@p5pRT
Copy link
Author

p5pRT commented Jul 16, 2009

From @tonycoz

On Tue, Jul 14, 2009 at 10​:38​:52AM +1000, Tony Cook wrote​:

When I find the time/energy I'll try adjusting Perl_sv_pvn_force_flags
to set only pPOK for GMG SVs and see what it breaks.

With the following change​:

--- a/sv.c
+++ b/sv.c
@​@​ -8313,7 +8313,10 @​@​ Perl_sv_pvn_force_flags(pTHX_ SV *const sv, STRLEN *const lp, const I32 flags)
  SvPVX(sv)[len] = '\0';
  }
  if (!SvPOK(sv)) {
- SvPOK_on(sv); /* validate pointer */
+ if (SvGMAGICAL(sv))
+ SvFLAGS(sv) |= SVp_POK;
+ else
+ SvPOK_on(sv); /* validate pointer */
  SvTAINT(sv);
  DEBUG_c(PerlIO_printf(Perl_debug_log, "0x%"UVxf" 2pv(%s)\n",
  PTR2UV(sv),SvPVX_const(sv)));

The only change in tests are​:

@​@​ -816,17 +816,11 @​@​
ext/B/t/terse.................................................ok
ext/B/t/xref..................................................ok
ext/Compress-Raw-Bzip2/t/000prereq............................ok
-ext/Compress-Raw-Bzip2/t/01bzip2..............................# Failed test at ../ext/Compress-Raw-Bzip2/t/01bzip2.t line 461.
-# got​: 'AppendedSYȀNM^@​^@​^E<9D><80>@​^@​@​ d*^B<D6>^@​ ^@​"<B2>4<F5>0<9E>
<A6><85>0^@​M^G%<F6><80>​:<98><CA>z^Oh<DD><FC>]<C9>^T<E1>BC"^A94Appended'
-# expected​: 'Appended'
-FAILED at test 111
+ext/Compress-Raw-Bzip2/t/01bzip2..............................ok
ext/Compress-Raw-Bzip2/t/09limitoutput........................ok
ext/Compress-Raw-Bzip2/t/99pod................................skipped
ext/Compress-Raw-Zlib/t/01version.............................ok
-ext/Compress-Raw-Zlib/t/02zlib................................# Failed test at ../ext/Compress-Raw-Zlib/t/02zlib.t line 712.
-# got​: 'AppendeHT<F0>p<F4>Q<B0>400PH<CE><CF>-(-I-^B^@​O<96>^G&Appended'
-# expected​: 'Appended'
-FAILED at test 197
+ext/Compress-Raw-Zlib/t/02zlib................................ok
ext/Compress-Raw-Zlib/t/07bufsize.............................ok
ext/Compress-Raw-Zlib/t/09limitoutput.........................ok
ext/Compress-Raw-Zlib/t/18lvalue..............................ok

(and the summary)

Copied and pasted from less, hence the <XX>.

The inflate code in both classes also forces POK on.

Tony

@p5pRT
Copy link
Author

p5pRT commented Nov 10, 2009

From @tonycoz

On Wed Jul 15 20​:30​:15 2009, tonyc wrote​:

On Tue, Jul 14, 2009 at 10​:38​:52AM +1000, Tony Cook wrote​:

When I find the time/energy I'll try adjusting
Perl_sv_pvn_force_flags
to set only pPOK for GMG SVs and see what it breaks.

This seems like it was a wild goose chase. SvSETMAGIC() will clear the
POK flag, and if encode is modified to call it, the code behaves as
expected.

Pushed to https://rt.cpan.org/Ticket/Display.html?id=51263

@p5pRT
Copy link
Author

p5pRT commented Nov 16, 2009

From @tonycoz

On Tue Nov 10 00​:10​:31 2009, tonyc wrote​:

On Wed Jul 15 20​:30​:15 2009, tonyc wrote​:

On Tue, Jul 14, 2009 at 10​:38​:52AM +1000, Tony Cook wrote​:

When I find the time/energy I'll try adjusting
Perl_sv_pvn_force_flags
to set only pPOK for GMG SVs and see what it breaks.

This seems like it was a wild goose chase. SvSETMAGIC() will clear the
POK flag, and if encode is modified to call it, the code behaves as
expected.

Pushed to https://rt.cpan.org/Ticket/Display.html?id=51263

The specific case described in 60472 should be fixed by Encode 2.38.

@p5pRT
Copy link
Author

p5pRT commented Nov 19, 2009

@rgs - Status changed from 'open' to 'resolved'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant