eval( Dumper( 'Latin1' ) ) can result in malformed UTF8 #12066

Closed
p5pRT opened this issue Apr 19, 2012 · 9 comments

Comments

@p5pRT

p5pRT commented Apr 19, 2012

Migrated from rt.perl.org#112532 (status was 'resolved')

Searchable as RT112532$

@p5pRT

p5pRT commented Apr 19, 2012

From zzbbyy@gmail.com

Created by zzbbyy@gmail.com

This is a bug report for perl from zzbbyy@gmail.com,
generated with the help of perlbug 1.39 running under perl 5.15.9.

-----------------------------------------------------------------
Dumper( "\x{f3}" ) returns a string encoded in Latin1. This results
in malformed UTF8 when it is fed back to eval and 'use utf8' is in force.
This is illustrated by the following test case:

use strict;
use warnings;

use utf8;

use Devel::Peek;
use Test::More;
use Data::Dumper;
$Data::Dumper::Terse = 1;

my $last = eval( Dumper( "\x{f3}" ) );
is( $last, "\x{f3}", 'eval' );
Dump( $last );

done_testing;

__OUTPUT__

not ok 1 - eval
#   Failed test 'eval'
#   at d.pl line 13.
Wide character in print at
/home/zby/localperl/lib/5.15.9/Test/Builder.pm line 1759.
#          got: '�'
#     expected: 'ó'
SV = PV(0x21a4060) at 0x1ed1e40
 REFCNT = 1
 FLAGS = (PADMY,POK,pPOK,UTF8)
 PV = 0x2064050 "\363"\0Malformed UTF-8 character (1 byte, need 4,
after start byte 0xf3) in subroutine entry at d.pl line 14.
[UTF8 "\x{0}"]
 CUR = 1
 LEN = 16
1..1
# Looks like you failed 1 test of 1.
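
To see what "encoded in Latin1" means here, the dumped string can be inspected directly. The following is a minimal sketch and not part of the original report. It assumes the default Data::Dumper settings apart from Terse and deliberately leaves out 'use utf8', so that only the dump itself is examined. The variable names are illustrative.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Terse = 1;

# One character, U+00F3, written with an escape so this file stays pure ASCII.
my $dumped = Dumper("\x{f3}");

# Render every element of the dumped string as a two-digit hex code.
my $hex = join ' ', map { sprintf('%02X', ord($_)) } split //, $dumped;

print "dumped text as hex: $hex\n";
print "UTF8 flag on dump:  ", (utf8::is_utf8($dumped) ? 'on' : 'off'), "\n";
```

If the dump contains a lone F3 element with the flag off, that is the single Latin1 byte which a later eval under 'use utf8' tries to read as the start of a multi-byte UTF-8 sequence.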

Perl Info

Flags:
   category=library
   severity=medium
   module=Data::Dumper

Site configuration information for perl 5.15.9:

Configured by zby at Wed Apr 18 21:43:42 CEST 2012.

Summary of my perl5 (revision 5 version 15 subversion 9) configuration:

 Platform:
   osname=linux, osvers=3.0.0-17-generic, archname=x86_64-linux
   uname='linux zby 3.0.0-17-generic #30-ubuntu smp thu mar 8 20:45:39
utc 2012 x86_64 x86_64 x86_64 gnulinux '
   config_args='-des -Dprefix=/home/zby/localperl -Dusedevel'
   hint=recommended, useposix=true, d_sigaction=define
   useithreads=undef, usemultiplicity=undef
   useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
   use64bitint=define, use64bitall=define, uselongdouble=undef
   usemymalloc=n, bincompat5005=undef
 Compiler:
   cc='cc', ccflags ='-fno-strict-aliasing -pipe -fstack-protector
-I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
   optimize='-O2',
   cppflags='-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
   ccversion='', gccversion='4.6.1', gccosandvers=''
   intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
   d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
   ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t',
lseeksize=8
   alignbytes=8, prototype=define
 Linker and Libraries:
   ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
   libpth=/usr/local/lib /lib/x86_64-linux-gnu /lib/../lib
/usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib /usr/lib
   libs=-lnsl -ldl -lm -lcrypt -lutil -lc
   perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
   libc=, so=so, useshrplib=false, libperl=libperl.a
   gnulibc_version='2.13'
 Dynamic Linking:
   dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
   cccdlflags='-fPIC', lddlflags='-shared -O2 -L/usr/local/lib
-fstack-protector'

Locally applied patches:



@INC for perl 5.15.9:
   /home/zby/localperl/lib/site_perl/5.15.9/x86_64-linux
   /home/zby/localperl/lib/site_perl/5.15.9
   /home/zby/localperl/lib/5.15.9/x86_64-linux
   /home/zby/localperl/lib/5.15.9
   .


Environment for perl 5.15.9:
   HOME=/home/zby
   LANG=pl_PL.UTF-8
   LANGUAGE (unset)
   LD_LIBRARY_PATH (unset)
   LOGDIR (unset)
   PATH=/home/zby/perl5/perlbrew/bin:/home/zby/perl5/perlbrew/perls/perl-5.14.0/bin:/usr/lib/lightdm/lightdm:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
   PERLBREW_PATH=/home/zby/perl5/perlbrew/bin:/home/zby/perl5/perlbrew/perls/perl-5.14.0/bin
   PERLBREW_PERL=perl-5.14.0
   PERLBREW_ROOT=/home/zby/perl5/perlbrew
   PERLBREW_VERSION=0.13
   PERL_BADLANG (unset)
   SHELL=/bin/bash

@p5pRT

p5pRT commented Apr 19, 2012

zzbbyy@gmail.com - Status changed from 'new' to 'open'

@p5pRT

p5pRT commented Apr 19, 2012

zzbbyy@gmail.com - Status changed from 'open' to 'new'

@p5pRT

p5pRT commented Apr 19, 2012

From willert@gmail.com

Taken from https://metacpan.org/module/perlunifaq#Data::Dumper-doesnt-restore-the-UTF8-flag-is-it-broken-


Data::Dumper doesn't restore the UTF8 flag; is it broken?

No, Data::Dumper's Unicode abilities are as they should be. There have
been some complaints that it should restore the UTF8 flag when the data
is read again with eval. However, you should really not look at the
flag, and nothing indicates that Data::Dumper should break this rule.
Here's what happens: when Perl reads in a string literal, it sticks to 8
bit encoding as long as it can. (But perhaps originally it was
internally encoded as UTF-8, when you dumped it.) When it has to give
that up because other characters are added to the text string, it
silently upgrades the string to UTF-8. If you properly encode your
strings for output, none of this is of your concern, and you can just
eval dumped data as always.


Adding "utf8::upgrade( $last );" after the eval() line restores the flag
and will lead to a passing test. HTH
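
The FAQ's point that the flag is only an internal storage detail can be illustrated with a short sketch. This is an illustration under simple assumptions (no 'use utf8', a well-formed string), not a reproduction of the failing test. The variable names are made up for the example.

```perl
use strict;
use warnings;

my $byte_form = "\x{f3}";       # stored as one byte, UTF8 flag off
my $utf8_form = "\x{f3}";
utf8::upgrade($utf8_form);      # same character, now stored as UTF-8 internally

printf "flags:   %s and %s\n",
    (utf8::is_utf8($byte_form) ? 'on' : 'off'),
    (utf8::is_utf8($utf8_form) ? 'on' : 'off');
print "equal:   ", ($byte_form eq $utf8_form ? 'yes' : 'no'), "\n";
print "lengths: ", length($byte_form), " and ", length($utf8_form), "\n";
```

Both scalars hold the same one-character string, so 'eq' and 'length' agree even though the internal representations differ.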

@p5pRT

p5pRT commented Apr 19, 2012

The RT System itself - Status changed from 'new' to 'open'

@p5pRT

p5pRT commented Apr 19, 2012

From @cpansprout

On Wed Apr 18 23:25:04 2012, zzbbyy wrote:

Dumper( "\x{f3}" ) returns a string encoded in Latin1. This results
in malformed UTF8 when it is fed back to eval and 'use utf8' is in
force.

The real bug here is that ‘eval’ is respecting the ‘use utf8’ from
outside it. Unfortunately, we cannot easily fix that without breaking
code. So, instead, we’ve added the unicode_eval feature in 5.16 (soon
to be released).

If you put ‘use feature "unicode_eval"’ or ‘use v5.15’ (‘use v5.16’
doesn’t yet work in bleadperl, because of the version number) at the top
of your test script, it just works.

That the parser can produce malformed scalars is a separate bug, not
specific to Data::Dumper or eval. It can happen if you put ‘use utf8’
at the top of a file that is not in utf8. I thought that was already
reported, but now I can’t find the ticket.

--

Father Chrysostomos
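
A minimal sketch of the suggestion above, assuming perl 5.16 or later: the original test case reduced to the round trip, with the unicode_eval feature enabled. The variable names are illustrative and the Test::More and Devel::Peek parts are omitted.

```perl
use strict;
use warnings;
use utf8;
use feature 'unicode_eval';   # eval treats its string argument as characters
use Data::Dumper;

$Data::Dumper::Terse = 1;

my $original = "\x{f3}";
my $restored = eval Dumper($original);
die "eval failed: $@" if $@;

print "round trip:  ", ($restored eq $original ? 'ok' : 'FAILED'), "\n";
print "well-formed: ", (utf8::valid($restored) ? 'yes' : 'no'), "\n";
```

With the feature in scope, the 'use utf8' pragma no longer affects how the eval string is decoded, which is why the dumped Latin1 text is read back as the same single character.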

@p5pRT

p5pRT commented Jan 28, 2013

From @jkeenan


I can confirm that Father C's suggestions work, so the OP's problem is
resolved as of Perl 5.16.

I propose that we close this ticket. If someone wishes to discuss
whether the parser can produce malformed scalars, s/he should open a
separate ticket.

I am taking this ticket for the purpose of closing it in seven days
unless someone has new insights into these issues.

Thank you very much.
Jim Keenan

@p5pRT

p5pRT commented Feb 3, 2013

From @jkeenan


Hearing no objection in the allotted time, I am marking the ticket Resolved.

Thank you very much.
Jim Keenan

@p5pRT

p5pRT commented Feb 3, 2013

@jkeenan - Status changed from 'open' to 'resolved'
