Report information
Id: 112532
Status: resolved
Priority: 0/
Queue: perl5

Owner: jkeenan <jkeenan [at] cpan.org>
Requestors: zzbbyy <zzbbyy [at] gmail.com>
Cc:
AdminCc:

Operating System: Linux
PatchStatus: (no value)
Severity: medium
Type: library
Perl Version: 5.15.9
Fixed In: (no value)



Subject: eval( Dumper( 'Latin1' ) ) can result in malformed UTF8
Date: Thu, 19 Apr 2012 08:24:33 +0200
To: perlbug [...] perl.org
From: Zbigniew Łukasiak <zzbbyy [...] gmail.com>
This is a bug report for perl from zzbbyy@gmail.com,
generated with the help of perlbug 1.39 running under perl 5.15.9.

-----------------------------------------------------------------
Dumper( "\x{f3}" ) returns a string encoded in Latin1.  This results
in malformed UTF8 when it is fed back to eval and 'use utf8' is in
force.
This is illustrated by the following test case:

use strict;
use warnings;

use utf8;

use Devel::Peek;
use Test::More;
use Data::Dumper;
$Data::Dumper::Terse = 1;


my $last = eval( Dumper( "\x{f3}" ) );
is( $last, "\x{f3}", 'eval' );
Dump( $last );

done_testing;

__OUTPUT__

not ok 1 - eval
#   Failed test 'eval'
#   at d.pl line 13.
Wide character in print at
/home/zby/localperl/lib/5.15.9/Test/Builder.pm line 1759.
#          got: 'ó'
#     expected: 'ó'
SV = PV(0x21a4060) at 0x1ed1e40
  REFCNT = 1
  FLAGS = (PADMY,POK,pPOK,UTF8)
  PV = 0x2064050 "\363"\0Malformed UTF-8 character (1 byte, need 4,
after start byte 0xf3) in subroutine entry at d.pl line 14.
 [UTF8 "\x{0}"]
  CUR = 1
  LEN = 16
1..1
# Looks like you failed 1 test of 1.

[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
    category=library
    severity=medium
    module=Data::Dumper
---
Site configuration information for perl 5.15.9:

Configured by zby at Wed Apr 18 21:43:42 CEST 2012.

Summary of my perl5 (revision 5 version 15 subversion 9) configuration:

  Platform:
    osname=linux, osvers=3.0.0-17-generic, archname=x86_64-linux
    uname='linux zby 3.0.0-17-generic #30-ubuntu smp thu mar 8 20:45:39 utc 2012 x86_64 x86_64 x86_64 gnulinux '
    config_args='-des -Dprefix=/home/zby/localperl -Dusedevel'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=undef, usemultiplicity=undef
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2',
    cppflags='-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
    ccversion='', gccversion='4.6.1', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
    libpth=/usr/local/lib /lib/x86_64-linux-gnu /lib/../lib /usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib /usr/lib
    libs=-lnsl -ldl -lm -lcrypt -lutil -lc
    perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
    libc=, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.13'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -O2 -L/usr/local/lib -fstack-protector'

Locally applied patches:

---
@INC for perl 5.15.9:
    /home/zby/localperl/lib/site_perl/5.15.9/x86_64-linux
    /home/zby/localperl/lib/site_perl/5.15.9
    /home/zby/localperl/lib/5.15.9/x86_64-linux
    /home/zby/localperl/lib/5.15.9
    .

---
Environment for perl 5.15.9:
    HOME=/home/zby
    LANG=pl_PL.UTF-8
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/zby/perl5/perlbrew/bin:/home/zby/perl5/perlbrew/perls/perl-5.14.0/bin:/usr/lib/lightdm/lightdm:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
    PERLBREW_PATH=/home/zby/perl5/perlbrew/bin:/home/zby/perl5/perlbrew/perls/perl-5.14.0/bin
    PERLBREW_PERL=perl-5.14.0
    PERLBREW_ROOT=/home/zby/perl5/perlbrew
    PERLBREW_VERSION=0.13
    PERL_BADLANG (unset)
    SHELL=/bin/bash
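
For readers who want to see what the report means by "a string encoded in Latin1", the following is a minimal sketch that looks at the text Dumper produces for this input. The variable name `$dumped` is only illustrative and does not come from the report, and the exact bytes may differ between Data::Dumper versions and settings, so treat it as a way to inspect your own installation rather than a statement of what Data::Dumper always emits.

```perl
use strict;
use warnings;

use Data::Dumper;
$Data::Dumper::Terse = 1;

# Dump the same one-character string used in the report.
# $dumped is an illustrative name, not part of the original test case.
my $dumped = Dumper( "\x{f3}" );

# Report whether Perl stores the dumped text with the internal UTF8 flag.
printf "UTF8 flag on dumped text: %s\n",
    utf8::is_utf8($dumped) ? 'on' : 'off';

# Show the ordinal of every character in the dumped text, in hex.
printf "characters (hex): %s\n",
    join ' ', map { sprintf( '%02x', ord $_ ) } split //, $dumped;
```

If the dumped text contains the single character f3 rather than an escape sequence, an eval that reads its source as UTF-8 has nothing valid to decode at that position, which is the situation the report describes.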
RT-Send-CC: perl5-porters [...] perl.org
Taken from
https://metacpan.org/module/perlunifaq#Data::Dumper-doesnt-restore-the-UTF8-flag-is-it-broken-

---
Data::Dumper doesn't restore the UTF8 flag; is it broken?

No, Data::Dumper's Unicode abilities are as they should be. There have
been some complaints that it should restore the UTF8 flag when the data
is read again with eval. However, you should really not look at the
flag, and nothing indicates that Data::Dumper should break this rule.

Here's what happens: when Perl reads in a string literal, it sticks to
8 bit encoding as long as it can. (But perhaps originally it was
internally encoded as UTF-8, when you dumped it.) When it has to give
that up because other characters are added to the text string, it
silently upgrades the string to UTF-8.

If you properly encode your strings for output, none of this is of your
concern, and you can just eval dumped data as always.
---

Adding "utf8::upgrade( $last );" after the eval() line restores the flag
and will lead to a passing test.

HTH
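
The FAQ's point that the flag is an internal detail can be illustrated with a short sketch. It is separate from the failing test case: it only shows that the same one-character string compares equal whether or not Perl stores it with the UTF8 flag. The variable names `$native` and `$upgraded` are made up for this illustration.

```perl
use strict;
use warnings;

# Two copies of the same one-character string (U+00F3).
my $native   = "\x{f3}";     # Perl keeps this in its 8-bit internal form
my $upgraded = "\x{f3}";
utf8::upgrade($upgraded);    # same character, now stored internally as UTF-8

# The internal flag differs between the two copies ...
printf "flag:   native=%d upgraded=%d\n",
    ( utf8::is_utf8($native)   ? 1 : 0 ),
    ( utf8::is_utf8($upgraded) ? 1 : 0 );

# ... but as character strings they are identical.
printf "equal:  %s\n", ( $native eq $upgraded ? 'yes' : 'no' );
printf "length: %d and %d\n", length($native), length($upgraded);
```

Code that compares or measures strings this way never needs to ask which internal form a value is in, which is why the FAQ advises against looking at the flag.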
RT-Send-CC: perl5-porters [...] perl.org
On Wed Apr 18 23:25:04 2012, zzbbyy wrote:
> Dumper( "\x{f3}" ) returns a string encoded in Latin1. This results
> in malformed UTF8 when it is fed back to eval and 'use utf8' is in
> force.

The real bug here is that ‘eval’ is respecting the ‘use utf8’ from
outside it. Unfortunately, we cannot easily fix that without breaking
code. So, instead, we’ve added the unicode_eval feature in 5.16 (soon
to be released).

If you put ‘use feature "unicode_eval"’ or ‘use v5.15’ (‘use v5.16’
doesn’t yet work in bleadperl, because of the version number) at the top
of your test script, it just works.

That the parser can produce malformed scalars is a separate bug, not
specific to Data::Dumper or eval. It can happen if you put ‘use utf8’
at the top of a file that is not in utf8. I thought that was already
reported, but now I can’t find the ticket.

--
Father Chrysostomos
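
As a rough sketch of the suggestion above, here is the round trip from the report with the unicode_eval feature switched on. It assumes perl 5.16 or later, where that feature is available; the variable `$dumped` and the printed message are additions for illustration, and the Test::More and Devel::Peek calls from the original script are left out to keep it short.

```perl
use strict;
use warnings;
use utf8;
use feature 'unicode_eval';   # assumed available: perl 5.16 or later

use Data::Dumper;
$Data::Dumper::Terse = 1;

# Dump the one-character string, then read it back with eval.
# With unicode_eval in force, eval treats its argument as a string of
# characters, so the surrounding 'use utf8' no longer applies to it.
my $dumped = Dumper( "\x{f3}" );
my $last   = eval $dumped;
die "eval failed: $@" if $@;

printf "round trip %s\n", ( $last eq "\x{f3}" ? 'ok' : 'FAILED' );
```

A ‘use v5.16’ line at the top would enable the same feature as part of its bundle on a released 5.16 or newer perl.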
RT-Send-CC: perl5-porters [...] perl.org
On Thu Apr 19 16:53:05 2012, sprout wrote:
> If you put ‘use feature "unicode_eval"’ or ‘use v5.15’ (‘use v5.16’
> doesn’t yet work in bleadperl, because of the version number) at the top
> of your test script, it just works.

I can confirm that Father C's suggestions work, so the OP's problem is
resolved as of Perl 5.16.

I propose that we close this ticket. If someone wishes to discuss
whether the parser can produce malformed scalars, s/he should open a
separate ticket.

I am taking this ticket for the purpose of closing it in seven days
unless someone has new insights into these issues.

Thank you very much.
Jim Keenan
RT-Send-CC: perl5-porters [...] perl.org
On Sun Jan 27 17:58:38 2013, jkeenan wrote:
> I am taking this ticket for the purpose of closing it in seven days
> unless someone has new insights into these issues.

Hearing no objection in the allotted time, am marking ticket Resolved.

Thank you very much.
Jim Keenan

