Skip Menu |
Report information
Id: 131263
Status: pending release
Priority: 0/
Queue: perl5

Owner: Nobody
Requestors: pali [at] cpan.org
Cc:
AdminCc:

Operating System: (no value)
PatchStatus: (no value)
Severity: low
Type: unknown
Perl Version: (no value)
Fixed In: (no value)

Attachments
0001-perl-131263-clear-the-UTF8-flag-on-a-glob-if-it-isn-.patch



Subject: Perl mess with UTF8 flag of GV
Date: Sun, 7 May 2017 01:22:20 +0200
From: pali [...] cpan.org
To: perlbug [...] perl.org
Download (untitled) / with headers
text/plain 925b
It looks like that perl bleed mess with UTF8 flag of some GV. When into some *glob is assigned scalar with UTF8 flag and later to same *glob is assigned scalar without UTF8 flag, then UTF8 flag stay set in *glob. Look at following example: $str = "\N{U+0080}"; *sym = $str; (*sym eq "*main::\N{U+0080}") ? print "ok\n" : print "fail: " . (join " ", map { sprintf "%x", ord($_) } split //, *sym) . "\n"; $str = "\xC3\x80"; *sym = $str; (*sym eq "*main::\xC3\x80") ? print "ok\n" : print "fail: " . (join " ", map { sprintf "%x", ord($_) } split //, *sym) . "\n"; Its output is: ok fail: 2a 6d 61 69 6e 3a 3a c0 "*main::\xC0" U+00C0 is encoded in UTF-8 as 0xC3, 0x80 which means that *sym after second assignment has UTF8 flag, even data (from $str) were assigned without UTF8 flag. And so perl mess with UTF8 flag. Calling Dump(*sym) or utf8::is_utf8(*sym) prove this fact that UTF8 flag is really set.
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 1.4k
On Sat, 06 May 2017 16:45:53 -0700, pali@cpan.org wrote: Show quoted text
> It looks like that perl bleed mess with UTF8 flag of some GV. > > When into some *glob is assigned scalar with UTF8 flag and later to same > *glob is assigned scalar without UTF8 flag, then UTF8 flag stay set in > *glob. > > Look at following example: > > $str = "\N{U+0080}"; > *sym = $str; > (*sym eq "*main::\N{U+0080}") ? print "ok\n" : print "fail: " . > (join " ", map { sprintf "%x", ord($_) } split //, *sym) . "\n"; > > $str = "\xC3\x80"; > *sym = $str; > (*sym eq "*main::\xC3\x80") ? print "ok\n" : print "fail: " . > (join " ", map { sprintf "%x", ord($_) } split //, *sym) . "\n"; > > Its output is: > > ok > fail: 2a 6d 61 69 6e 3a 3a c0 > > "*main::\xC0" > > U+00C0 is encoded in UTF-8 as 0xC3, 0x80 which means that *sym after > second assignment has UTF8 flag, even data (from $str) were assigned > without UTF8 flag. And so perl mess with UTF8 flag. > > Calling Dump(*sym) or utf8::is_utf8(*sym) prove this fact that UTF8 flag > is really set.
Using this shorter example: #!perl *sym = "\N{U+0080}"; *sym eq "*main::\N{U+0080}"; *sym = "\xC3\x80"; (*sym eq "*main::\xC3\x80") ? print "ok\n" : print "fail: " . (join " ", map { sprintf "%x", ord($_) } split //, *sym) . "\n"; __END__ If you comment out either one of the first two lines, it prints ‘ok’, so the ‘eq’ comparison has something to do with it. -- Father Chrysostomos
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 1.6k
On Sun, 07 May 2017 06:45:31 -0700, sprout wrote: Show quoted text
> On Sat, 06 May 2017 16:45:53 -0700, pali@cpan.org wrote:
> > It looks like that perl bleed mess with UTF8 flag of some GV. > > > > When into some *glob is assigned scalar with UTF8 flag and later to > > same > > *glob is assigned scalar without UTF8 flag, then UTF8 flag stay set > > in > > *glob. > > > > Look at following example: > > > > $str = "\N{U+0080}"; > > *sym = $str; > > (*sym eq "*main::\N{U+0080}") ? print "ok\n" : print "fail: " . > > (join " ", map { sprintf "%x", ord($_) } split //, *sym) . "\n"; > > > > $str = "\xC3\x80"; > > *sym = $str; > > (*sym eq "*main::\xC3\x80") ? print "ok\n" : print "fail: " . > > (join " ", map { sprintf "%x", ord($_) } split //, *sym) . "\n"; > > > > Its output is: > > > > ok > > fail: 2a 6d 61 69 6e 3a 3a c0 > > > > "*main::\xC0" > > > > U+00C0 is encoded in UTF-8 as 0xC3, 0x80 which means that *sym after > > second assignment has UTF8 flag, even data (from $str) were assigned > > without UTF8 flag. And so perl mess with UTF8 flag. > > > > Calling Dump(*sym) or utf8::is_utf8(*sym) prove this fact that UTF8 > > flag > > is really set.
> > Using this shorter example: > > #!perl > *sym = "\N{U+0080}"; > *sym eq "*main::\N{U+0080}"; > > *sym = "\xC3\x80"; > (*sym eq "*main::\xC3\x80") ? print "ok\n" : print "fail: " . > (join " ", map { sprintf "%x", ord($_) } split //, *sym) . "\n"; > __END__ > > If you comment out either one of the first two lines, it prints ‘ok’, > so the ‘eq’ comparison has something to do with it.
sv_2pv_flags() would set UTF8 on, but never turned it off. Fixed by the attached. Tony
Subject: 0001-perl-131263-clear-the-UTF8-flag-on-a-glob-if-it-isn-.patch
From fc661be6f8ba0ef072eb3874d6d7cfbe11329781 Mon Sep 17 00:00:00 2001 From: Tony Cook <tony@develop-help.com> Date: Mon, 5 Jun 2017 14:38:14 +1000 Subject: [PATCH] [perl #131263] clear the UTF8 flag on a glob if it isn't UTF8 Previously sv_2pv_flags() would set the UTF8 flag on a glob if it had a UTF8 name, but wouldn't clear it if it didn't. This meant a name change (eg. if assigned another glob) from a UTF8 name to a non-UTF8 name would leave the flag set. --- sv.c | 2 ++ t/op/gv.t | 10 +++++++++- 2 files changed, 11 insertions(+), 1 deletion(-) diff --git a/sv.c b/sv.c index db51175..3296519 100644 --- a/sv.c +++ b/sv.c @@ -3181,6 +3181,8 @@ Perl_sv_2pv_flags(pTHX_ SV *const sv, STRLEN *const lp, const I32 flags) assert(SvPOK(buffer)); if (SvUTF8(buffer)) SvUTF8_on(sv); + else + SvUTF8_off(sv); if (lp) *lp = SvCUR(buffer); return SvPVX(buffer); diff --git a/t/op/gv.t b/t/op/gv.t index be6b0dd..4c2c43f 100644 --- a/t/op/gv.t +++ b/t/op/gv.t @@ -12,7 +12,7 @@ BEGIN { use warnings; -plan(tests => 280); +plan(tests => 282); # type coercion on assignment $foo = 'foo'; @@ -1170,6 +1170,14 @@ SKIP: { is ($? & 127, 0,"[perl #128597] No crash when gp_free calls ckWARN_d"); } +{ + # [perl #131263] + *sym = "\N{U+0080}"; + ok(*sym eq "*main::\N{U+0080}", "utf8 flag properly set"); + *sym = "\xC3\x80"; + ok(*sym eq "*main::\xC3\x80", "utf8 flag properly cleared"); +} + # test gv_try_downgrade() # If a GV can be stored in a stash in a compact, non-GV form, then # whenever ops are freed which reference the GV, an attempt is made to -- 2.1.4
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 1.8k
On Mon, 05 Jun 2017 04:39:12 GMT, tonyc wrote: Show quoted text
> On Sun, 07 May 2017 06:45:31 -0700, sprout wrote:
> > On Sat, 06 May 2017 16:45:53 -0700, pali@cpan.org wrote:
> > > It looks like that perl bleed mess with UTF8 flag of some GV. > > > > > > When into some *glob is assigned scalar with UTF8 flag and later to > > > same > > > *glob is assigned scalar without UTF8 flag, then UTF8 flag stay set > > > in > > > *glob. > > > > > > Look at following example: > > > > > > $str = "\N{U+0080}"; > > > *sym = $str; > > > (*sym eq "*main::\N{U+0080}") ? print "ok\n" : print "fail: " . > > > (join " ", map { sprintf "%x", ord($_) } split //, *sym) . "\n"; > > > > > > $str = "\xC3\x80"; > > > *sym = $str; > > > (*sym eq "*main::\xC3\x80") ? print "ok\n" : print "fail: " . > > > (join " ", map { sprintf "%x", ord($_) } split //, *sym) . "\n"; > > > > > > Its output is: > > > > > > ok > > > fail: 2a 6d 61 69 6e 3a 3a c0 > > > > > > "*main::\xC0" > > > > > > U+00C0 is encoded in UTF-8 as 0xC3, 0x80 which means that *sym after > > > second assignment has UTF8 flag, even data (from $str) were assigned > > > without UTF8 flag. And so perl mess with UTF8 flag. > > > > > > Calling Dump(*sym) or utf8::is_utf8(*sym) prove this fact that UTF8 > > > flag > > > is really set.
> > > > Using this shorter example: > > > > #!perl > > *sym = "\N{U+0080}"; > > *sym eq "*main::\N{U+0080}"; > > > > *sym = "\xC3\x80"; > > (*sym eq "*main::\xC3\x80") ? print "ok\n" : print "fail: " . > > (join " ", map { sprintf "%x", ord($_) } split //, *sym) . "\n"; > > __END__ > > > > If you comment out either one of the first two lines, it prints ‘ok’, > > so the ‘eq’ comparison has something to do with it.
> > sv_2pv_flags() would set UTF8 on, but never turned it off. > > Fixed by the attached. > > Tony >
Tested (in TDD manner) on Linux and two FreeBSD VMs. DWIMs. +1 -- James E Keenan (jkeenan@cpan.org)
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 202b
On Mon, 05 Jun 2017 05:58:28 -0700, jkeenan wrote: Show quoted text
> Tested (in TDD manner) on Linux and two FreeBSD VMs. DWIMs. +1
Thanks for testing it, applied as 1097da16b21fe0a2257dba9937e55c0cca18f7e1. Tony


This service is sponsored and maintained by Best Practical Solutions and runs on Perl.org infrastructure.

For issues related to this RT instance (aka "perlbug"), please contact perlbug-admin at perl.org