Skip Menu |
Report information
Id: 132833
Status: resolved
Priority: 0/
Queue: perl5

Owner: Nobody
Requestors: pali [at] cpan.org
Cc:
AdminCc:

Operating System: (no value)
PatchStatus: (no value)
Severity: low
Type: unknown
Perl Version: (no value)
Fixed In: (no value)



To: perlbug [...] perl.org
Subject: COW bug in :encoding layer
Date: Thu, 8 Feb 2018 21:16:45 +0100
From: pali [...] cpan.org
Download (untitled) / with headers
text/plain 388b
Following code modifies variable $valid: use strict; use warnings; use Data::Dumper; my $valid = "\x61\x00\x00\x00"; my $tmp = $valid; print Dumper $valid; open my $fh, '<:encoding(UTF32-LE)', \$tmp or die; my $str = <$fh>; close $fh; print Dumper $valid; Output is: $VAR1 = 'a'; $VAR1 = ''; This is perl 5, version 24, subversion 1 (v5.24.1) built for x86_64-linux-gnu-thread-multi
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 1.4k
On Thu, 08 Feb 2018 12:16:59 -0800, pali@cpan.org wrote: Show quoted text
> Following code modifies variable $valid: > > use strict; > use warnings; > use Data::Dumper; > > my $valid = "\x61\x00\x00\x00"; > my $tmp = $valid; > print Dumper $valid; > open my $fh, '<:encoding(UTF32-LE)', \$tmp or die; > my $str = <$fh>; > close $fh; > print Dumper $valid; > > Output is: > > $VAR1 = 'a'; > $VAR1 = ''; > > This is perl 5, version 24, subversion 1 (v5.24.1) built for x86_64- > linux-gnu-thread-multi
This appears to be a bug in Unicode::decode() - it's modifying a buffer that doesn't belong to it (SvLEN() for the SV is zero). Adding a call to SvGROW() fixes it: diff --git a/cpan/Encode/Unicode/Unicode.xs b/cpan/Encode/Unicode/Unicode.xs index b3b1d2fea8..2a5c647625 100644 --- a/cpan/Encode/Unicode/Unicode.xs +++ b/cpan/Encode/Unicode/Unicode.xs @@ -328,6 +328,7 @@ CODE: } } if (check && !(check & ENCODE_LEAVE_SRC)) { + SvGROW(str, 1+e-s); if (s < e) { Move(s,SvPVX(str),e-s,U8); SvCUR_set(str,(e-s)); Replacing that block with a call to sv_chop() instead would also fix it, except for what I think is a bug in PerlIO::encoding: /* Create a "dummy" SV to represent the available data from layer below */ if (SvLEN(e->dataSV) && SvPVX_const(e->dataSV)) { Safefree(SvPVX_mutable(e->dataSV)); } since SvPVX(sv) might not point to the beginning of the allocated block with an SvOOK() SV. Tony
Subject: Re: [perl #132833] COW bug in :encoding layer
Date: Mon, 12 Feb 2018 10:02:17 +0100
From: pali [...] cpan.org
To: perlbug-followup [...] perl.org
Download (untitled) / with headers
text/plain 345b
On Sunday 11 February 2018 23:11:26 Tony Cook via RT wrote: Show quoted text
> This appears to be a bug in Unicode::decode() - it's modifying a buffer that doesn't belong to it (SvLEN() for the SV is zero).
But Unicode::decode() calls SvPV_force_nomg(). Is not this enough? And is SvGROW() after SvPV_force really needed? Show quoted text
> Adding a call to SvGROW() fixes it:
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 707b
On Mon, 12 Feb 2018 01:02:37 -0800, pali@cpan.org wrote: Show quoted text
> On Sunday 11 February 2018 23:11:26 Tony Cook via RT wrote:
> > This appears to be a bug in Unicode::decode() - it's modifying a > > buffer that doesn't belong to it (SvLEN() for the SV is zero).
> > But Unicode::decode() calls SvPV_force_nomg(). Is not this enough? And > is SvGROW() after SvPV_force really needed?
I was surprised that it didn't, but it doesn't. It's possible I'm wrong in saying Encode is doing the wrong thing here. SvGROW() (or sv_chop()) will allocate a new buffer and copy the from the original buffer into a new string, which belongs to the SV (rather than the SvLEN()== 0 meaning it belongs to something else). Tony
Subject: Re: [perl #132833] COW bug in :encoding layer
From: pali [...] cpan.org
Date: Tue, 13 Feb 2018 09:21:03 +0100
To: perlbug-followup [...] perl.org
On Monday 12 February 2018 15:21:31 Tony Cook via RT wrote: Show quoted text
> On Mon, 12 Feb 2018 01:02:37 -0800, pali@cpan.org wrote:
> > On Sunday 11 February 2018 23:11:26 Tony Cook via RT wrote:
> > > This appears to be a bug in Unicode::decode() - it's modifying a > > > buffer that doesn't belong to it (SvLEN() for the SV is zero).
> > > > But Unicode::decode() calls SvPV_force_nomg(). Is not this enough? And > > is SvGROW() after SvPV_force really needed?
> > I was surprised that it didn't, but it doesn't. > > It's possible I'm wrong in saying Encode is doing the wrong thing here. > > SvGROW() (or sv_chop()) will allocate a new buffer and copy the from the original buffer into a new string, which belongs to the SV (rather than the SvLEN()== 0 meaning it belongs to something else).
Anyway, exactly same pattern of code is in UTF-8 decoder (in Encode.xs) and is not affected by this bug. It also uses SvPV_force_nomg() and there is no SvGROW() call too. So I cannot believe problem is in missing SvGROW(), otherwise UTF-8 decoder would be affected too...
Date: Tue, 13 Feb 2018 09:46:16 +0000
Subject: Re: [perl #132833] COW bug in :encoding layer
To: pali [...] cpan.org
CC: perlbug-followup [...] perl.org
From: Dave Mitchell <davem [...] iabyn.com>
On Tue, Feb 13, 2018 at 09:21:03AM +0100, pali@cpan.org wrote: Show quoted text
> On Monday 12 February 2018 15:21:31 Tony Cook via RT wrote:
> > On Mon, 12 Feb 2018 01:02:37 -0800, pali@cpan.org wrote:
> > > On Sunday 11 February 2018 23:11:26 Tony Cook via RT wrote:
> > > > This appears to be a bug in Unicode::decode() - it's modifying a > > > > buffer that doesn't belong to it (SvLEN() for the SV is zero).
> > > > > > But Unicode::decode() calls SvPV_force_nomg(). Is not this enough? And > > > is SvGROW() after SvPV_force really needed?
> > > > I was surprised that it didn't, but it doesn't. > > > > It's possible I'm wrong in saying Encode is doing the wrong thing here. > > > > SvGROW() (or sv_chop()) will allocate a new buffer and copy the from the original buffer into a new string, which belongs to the SV (rather than the SvLEN()== 0 meaning it belongs to something else).
> > Anyway, exactly same pattern of code is in UTF-8 decoder (in Encode.xs) > and is not affected by this bug. It also uses SvPV_force_nomg() and > there is no SvGROW() call too. So I cannot believe problem is in missing > SvGROW(), otherwise UTF-8 decoder would be affected too...
SvLEN() is set to zero here in ext/PerlIO-encoding/encoding.xs: SvPV_set(e->dataSV, (char *) ptr); SvLEN_set(e->dataSV, 0); /* Hands off sv.c - it isn't yours */ SvCUR_set(e->dataSV,use); SvPOK_only(e->dataSV); I'm not completely sure how core and other XS code should handle SVLEN()==0 SVs. My gut feeling is that if such an SV ever gets delivered to a block of code which which wishes to GROW the SV, then a bug has been triggered. Clearly if code *was* to GROW such an SV, then at a minimum, the previous "alien" buffer might get leaked. Or the alien-handling code might later interpret the new buffer as an alien buffer and crash badly. So I have a vague felling that Enocde is ok, and that the real bug is in allowing such an SV to reach decode() in the first place. But I haven't looked very closely at the whole system. -- Counsellor Troi states something other than the blindingly obvious. -- Things That Never Happen in "Star Trek" #16
To: perl5-porters [...] perl.org
From: Zefram <zefram [...] fysh.org>
Date: Thu, 15 Feb 2018 20:01:48 +0000
Subject: Re: [perl #132833] COW bug in :encoding layer
Download (untitled) / with headers
text/plain 507b
Show quoted text
>Following code modifies variable $valid:
Actually you don't need to see the modification of $valid to see that there's a bug. The code shouldn't be modifying $tmp, and it exhibits the bug of that modification even without $valid existing. It is a COW bug that the modification to $tmp also modifies $valid, but adding COW logic is not going to be the way to fix this. The underlying bug is that it writes to $tmp at all, which is not a COW bug, and the fix will be to stop it writing to $tmp. -zefram
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 859b
On Thu, 15 Feb 2018 20:02:17 GMT, zefram@fysh.org wrote: Show quoted text
> >Following code modifies variable $valid:
> > Actually you don't need to see the modification of $valid to see that > there's a bug. The code shouldn't be modifying $tmp, and it exhibits > the bug of that modification even without $valid existing. It is a COW > bug that the modification to $tmp also modifies $valid, but adding COW > logic is not going to be the way to fix this. The underlying bug is > that it writes to $tmp at all, which is not a COW bug, and the fix will > be to stop it writing to $tmp. > > -zefram
I.e., with the attachment: ##### $ perl 132833-UTF32-LE-encoding.pl | od -c 0000000 $ V A R 1 = ' a \0 \0 \0 ' ; \n 0000020 $ V A R 1 = ' \0 \0 \0 \0 ' ; \n 0000040 ##### -- James E Keenan (jkeenan@cpan.org)
Subject: 132833-UTF32-LE-encoding.pl
#!/usr/bin/env perl use Data::Dumper; my $tmp = "\x61\x00\x00\x00"; print Dumper $tmp; open my $fh, '<:encoding(UTF32-LE)', \$tmp or die; my $str = <$fh>; close $fh; print Dumper $tmp;
To: Zefram <zefram [...] fysh.org>
Subject: Re: [perl #132833] COW bug in :encoding layer
From: Tony Cook <tony [...] develop-help.com>
Date: Fri, 16 Feb 2018 09:01:55 +1100
CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 960b
On Thu, Feb 15, 2018 at 08:01:48PM +0000, Zefram wrote: Show quoted text
> >Following code modifies variable $valid:
> > Actually you don't need to see the modification of $valid to see that > there's a bug. The code shouldn't be modifying $tmp, and it exhibits > the bug of that modification even without $valid existing. It is a COW > bug that the modification to $tmp also modifies $valid, but adding COW > logic is not going to be the way to fix this. The underlying bug is > that it writes to $tmp at all, which is not a COW bug, and the fix will > be to stop it writing to $tmp.
The issue is where the bug is. PerlIO::encoding creates a SvLEN()==0 variable and passes it to Encode, which calls SvPV_force_nomg() on the sv, and then modifies the buffer. Is passing the len==0 SV through to Encode the error? Or should SvPV_force/sv_force_normal remove len == 0 from the SV? Or should Encode be doing something to prevent the problem? (SvGROW worked for me) Tony
To: perl5-porters [...] perl.org
From: Zefram <zefram [...] fysh.org>
Date: Thu, 15 Feb 2018 23:38:34 +0000
Subject: Re: [perl #132833] COW bug in :encoding layer
Dave Mitchell wrote: Show quoted text
>So I have a vague felling that Enocde is ok, and that the real bug is in >allowing such an SV to reach decode() in the first place.
To be precise: is it legitimate for an SV to have SvLEN==0 without also being SvREADONLY? The documentation doesn't explicitly address this, and in fact is pretty much silent on the SvLEN==0 arrangement. It appears to me that SvPV_force*() are intended to leave the scalar in a state in which its SvPVX is writable. The documentation suggests that it's sufficient, without explicitly addressing SvLEN==0. More convincingly, these macros will croak if the scalar is SvREADONLY. Encode::Unicode therefore seems legit in performing a write after SvPV_force_nomg(). If SvLEN==0 with !SvREADONLY is legitimate, then SvPV_force*() need to check for SvLEN==0 and do something, presumably cloning the string into a new buffer owned by the scalar. If there were no good answer as to what it should do here, that would be a strong argument for this kind of scalar not being permitted. I have a slight worry about whether it's OK to clone the buffer of an SvLEN==0 scalar, but since sv_utf8_upgrade() performs such an operation (even on an SvREADONLY scalar) there's precedent for it. So I think this argument against SvLEN==0 with !SvREADONLY being permitted doesn't apply. Do we have other cases of SvLEN==0 with !SvREADONLY? If not, I'd be inclined to say that it's forbidden, and avoid having to make the SvPV_force*() macros more expensive. Meanwhile, PerlIO::encoding should be setting the flags differently on that scalar. I'm not sure which way round. One option is that it should set SvREADONLY, and should set the ENCODE_LEAVE_SRC bit in the check flag. The other option is that it should clone the string itself so that it doesn't have SvLEN==0. At first glance I can't see whether all the mucking about with e->dataSV depends on the mutation that ->decode performs. PerlIO::encoding isn't ensuring that the ENCODE_LEAVE_SRC bit is clear, so either it doesn't depend on the mutation or that's another bug. -zefram
From: Leon Timmermans <fawaka [...] gmail.com>
To: Zefram <zefram [...] fysh.org>
Subject: Re: [perl #132833] COW bug in :encoding layer
Date: Fri, 16 Feb 2018 02:18:53 +0100
CC: Perl5 Porters <perl5-porters [...] perl.org>
Download (untitled) / with headers
text/plain 2.4k
On Fri, Feb 16, 2018 at 12:38 AM, Zefram <zefram@fysh.org> wrote: Show quoted text
> Dave Mitchell wrote:
>>So I have a vague felling that Enocde is ok, and that the real bug is in >>allowing such an SV to reach decode() in the first place.
> > To be precise: is it legitimate for an SV to have SvLEN==0 without also > being SvREADONLY? The documentation doesn't explicitly address this, > and in fact is pretty much silent on the SvLEN==0 arrangement. > > It appears to me that SvPV_force*() are intended to leave the scalar > in a state in which its SvPVX is writable. The documentation suggests > that it's sufficient, without explicitly addressing SvLEN==0. More > convincingly, these macros will croak if the scalar is SvREADONLY. > Encode::Unicode therefore seems legit in performing a write after > SvPV_force_nomg(). > > If SvLEN==0 with !SvREADONLY is legitimate, then SvPV_force*() need to > check for SvLEN==0 and do something, presumably cloning the string into > a new buffer owned by the scalar. If there were no good answer as to > what it should do here, that would be a strong argument for this kind of > scalar not being permitted. I have a slight worry about whether it's OK > to clone the buffer of an SvLEN==0 scalar, but since sv_utf8_upgrade() > performs such an operation (even on an SvREADONLY scalar) there's > precedent for it. So I think this argument against SvLEN==0 with > !SvREADONLY being permitted doesn't apply. > > Do we have other cases of SvLEN==0 with !SvREADONLY? If not, I'd > be inclined to say that it's forbidden, and avoid having to make the > SvPV_force*() macros more expensive.
In File::Map I use such strings, and writing operations work fine on them iff the operation doesn't change their length. A few other modules do the same, I didn't invent this technique myself. Show quoted text
> Meanwhile, PerlIO::encoding should be setting the flags differently on > that scalar. I'm not sure which way round. One option is that it should > set SvREADONLY, and should set the ENCODE_LEAVE_SRC bit in the check flag. > The other option is that it should clone the string itself so that it > doesn't have SvLEN==0. At first glance I can't see whether all the > mucking about with e->dataSV depends on the mutation that ->decode > performs.
The code is depending on Encode modifying dataSV. Show quoted text
> PerlIO::encoding isn't ensuring that the ENCODE_LEAVE_SRC > bit is clear, so either it doesn't depend on the mutation or that's > another bug.
IME it's generally a safe bet that PerlIO::encoding is doing the confusing thing. Leon
To: Perl5 Porters <perl5-porters [...] perl.org>
From: Zefram <zefram [...] fysh.org>
Date: Fri, 16 Feb 2018 17:25:42 +0000
Subject: Re: [perl #132833] COW bug in :encoding layer
Download (untitled) / with headers
text/plain 804b
Leon Timmermans wrote: Show quoted text
>In File::Map I use such strings, and writing operations work fine on >them iff the operation doesn't change their length.
Ah, thanks. I hadn't thought about the writability of the unowned buffer being used intentionally. Obviously that use of a writable SvLEN==0 scalar is completely incompatible with the behaviour that PerlIO::encoding intends for the scalar that it passes to the encoding object. Based on the File::Map precedent, I conclude that PerlIO::encoding should be cloning the string preemptively. Fixed in commit fed9fe5b48ccdffef9065a03c12c237cc7418de6. FWIW, the File::Map usage seems dubious, because SV-related functions are willing to allocate a new buffer. It means that there isn't any real guarantee that writes will go to the unowned buffer. -zefram
Subject: Re: [perl #132833] COW bug in :encoding layer
Date: Mon, 19 Feb 2018 19:12:22 +0100
To: Zefram <zefram [...] fysh.org>
CC: Perl5 Porters <perl5-porters [...] perl.org>
From: Leon Timmermans <fawaka [...] gmail.com>
Download (untitled) / with headers
text/plain 446b
On Fri, Feb 16, 2018 at 6:25 PM, Zefram <zefram@fysh.org> wrote: Show quoted text
> FWIW, the File::Map usage seems dubious, because SV-related functions > are willing to allocate a new buffer. It means that there isn't any > real guarantee that writes will go to the unowned buffer.
The main innovation of File::Map over previous modules like Sys::Mmap is that it uses set-magic to detect it when that happens, complain loudly and if possible fix it :-) Leon
RT-Send-CC: perl5-porters [...] perl.org
This was fixed in fed9fe5b48ccdffef9065a03c12c237cc7418de6. Tony
Download (untitled) / with headers
text/plain 317b
Thank you for filing this report. You have helped make Perl better. With the release yesterday of Perl 5.28.0, this and 185 other issues have been resolved. Perl 5.28.0 may be downloaded via: https://metacpan.org/release/XSAWYERX/perl-5.28.0 If you find that the problem persists, feel free to reopen this ticket.


This service is sponsored and maintained by Best Practical Solutions and runs on Perl.org infrastructure.

For issues related to this RT instance (aka "perlbug"), please contact perlbug-admin at perl.org