Report information
Id: 131263
Status: open
Priority: 0/
Queue: perl5

Owner: Nobody
Requestors: pali [at] cpan.org
Cc:
AdminCc:

Operating System: (no value)
PatchStatus: (no value)
Severity: low
Type: unknown
Perl Version: (no value)
Fixed In: (no value)



Subject: Perl mess with UTF8 flag of GV
Date: Sun, 7 May 2017 01:22:20 +0200
From: pali [...] cpan.org
To: perlbug [...] perl.org
It looks like perl blead mishandles the UTF8 flag of some GVs.

When a scalar with the UTF8 flag is assigned to some *glob and later a
scalar without the UTF8 flag is assigned to the same *glob, the UTF8
flag stays set on the *glob.

Look at the following example:

$str = "\N{U+0080}";
*sym = $str;
(*sym eq "*main::\N{U+0080}") ? print "ok\n" : print "fail: " .
(join " ", map { sprintf "%x", ord($_) } split //, *sym) . "\n";

$str = "\xC3\x80";
*sym = $str;
(*sym eq "*main::\xC3\x80") ? print "ok\n" : print "fail: " .
(join " ", map { sprintf "%x", ord($_) } split //, *sym) . "\n";

Its output is:

ok
fail: 2a 6d 61 69 6e 3a 3a c0

"*main::\xC0"

U+00C0 is encoded in UTF-8 as 0xC3 0x80, which means that *sym has the
UTF8 flag set after the second assignment, even though the data (from
$str) was assigned without the UTF8 flag. So perl mishandles the UTF8
flag.

Calling Dump(*sym) or utf8::is_utf8(*sym) confirms that the UTF8 flag
really is set.
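The check mentioned in the last paragraph can be sketched roughly as follows. This is only an illustration, not code from the report: hex_codepoints is a hypothetical helper, and the loop just repeats the two assignments while printing what utf8::is_utf8 reports for the source scalar and for *sym. What it prints depends on the perl version in use, so no output is shown here.

```perl
use strict;
use warnings;
use Devel::Peek qw(Dump);

# Hypothetical helper (not part of the report): hex code points of a string.
sub hex_codepoints {
    my ($string) = @_;
    return join " ", map { sprintf "%x", ord } split //, $string;
}

our $str;

for my $case ( [ "character string", "\N{U+0080}" ],
               [ "byte string",      "\xC3\x80"   ] ) {
    my ( $label, $value ) = @$case;
    $str = $value;
    {
        # Assigning a plain string to a glob is a symbolic alias;
        # strict refs is relaxed here defensively.
        no strict 'refs';
        *sym = $str;
    }
    my $name = "" . *sym;    # stringify the glob, as the eq comparison does
    printf "%s: source is_utf8=%d, glob is_utf8=%d, name=%s\n",
        $label,
        ( utf8::is_utf8($str) ? 1 : 0 ),
        ( utf8::is_utf8(*sym) ? 1 : 0 ),
        hex_codepoints($name);
}

# Dump writes the internal flags of the glob to STDERR.
Dump(*sym);
```

If the behaviour described above is present, the second line of output would be expected to show a glob flag of 1 even though the source flag is 0.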
RT-Send-CC: perl5-porters [...] perl.org
On Sat, 06 May 2017 16:45:53 -0700, pali@cpan.org wrote:
> It looks like perl blead mishandles the UTF8 flag of some GVs.
>
> When a scalar with the UTF8 flag is assigned to some *glob and later a
> scalar without the UTF8 flag is assigned to the same *glob, the UTF8
> flag stays set on the *glob.
>
> Look at the following example:
>
> $str = "\N{U+0080}";
> *sym = $str;
> (*sym eq "*main::\N{U+0080}") ? print "ok\n" : print "fail: " .
> (join " ", map { sprintf "%x", ord($_) } split //, *sym) . "\n";
>
> $str = "\xC3\x80";
> *sym = $str;
> (*sym eq "*main::\xC3\x80") ? print "ok\n" : print "fail: " .
> (join " ", map { sprintf "%x", ord($_) } split //, *sym) . "\n";
>
> Its output is:
>
> ok
> fail: 2a 6d 61 69 6e 3a 3a c0
>
> "*main::\xC0"
>
> U+00C0 is encoded in UTF-8 as 0xC3 0x80, which means that *sym has the
> UTF8 flag set after the second assignment, even though the data (from
> $str) was assigned without the UTF8 flag. So perl mishandles the UTF8
> flag.
>
> Calling Dump(*sym) or utf8::is_utf8(*sym) confirms that the UTF8 flag
> really is set.

Using this shorter example:

#!perl
*sym = "\N{U+0080}";
*sym eq "*main::\N{U+0080}";
*sym = "\xC3\x80";
(*sym eq "*main::\xC3\x80") ? print "ok\n" : print "fail: " .
(join " ", map { sprintf "%x", ord($_) } split //, *sym) . "\n";
__END__

If you comment out either one of the first two lines, it prints ‘ok’,
so the ‘eq’ comparison has something to do with it.

-- 
Father Chrysostomos
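One way to try the "comment out either line" experiment without editing the script by hand is sketched below. It is an assumption-laden illustration rather than anything from the ticket: run_variant is a hypothetical helper that writes the four statements to a temporary file, comments out the requested ones, and runs the result in a fresh perl (the interpreter named by $^X) so that no glob state leaks between variants. It assumes a Unix-like shell for the 2>&1 redirection.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# The four statements of the shorter example, kept as literal source text.
my @lines = (
    q{*sym = "\N{U+0080}";},
    q{*sym eq "*main::\N{U+0080}";},
    q{*sym = "\xC3\x80";},
    q{(*sym eq "*main::\xC3\x80") ? print "ok\n" : print "fail: " . (join " ", map { sprintf "%x", ord($_) } split //, *sym) . "\n";},
);

# Hypothetical helper: run the example in a fresh perl process with the
# statements at the given zero-based indices commented out.
sub run_variant {
    my %skip = map { $_ => 1 } @_;
    my ( $fh, $file ) = tempfile( SUFFIX => '.pl', UNLINK => 1 );
    print {$fh} map { ( $skip{$_} ? "# " : "" ) . "$lines[$_]\n" } 0 .. $#lines;
    close $fh or die "close failed: $!";
    my $out = qx{"$^X" "$file" 2>&1};
    chomp $out;
    return $out;
}

print "all statements:          ", run_variant(),  "\n";
print "first one commented out:  ", run_variant(0), "\n";
print "second one commented out: ", run_variant(1), "\n";
```

Running each variant in its own process is a deliberate choice here: it avoids having to reason about which glob, if any, still carries a flag from an earlier variant.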

