COW bug in :encoding layer #16408

p5pRT · 2018-02-08T20:16:59Z

Migrated from rt.perl.org#132833 (status was 'resolved')

Searchable as RT132833$

p5pRT · 2018-02-08T20:16:59Z

From @pali

Following code modifies variable $valid:

use strict;
use warnings;
use Data::Dumper;

my $valid = "\x61\x00\x00\x00";
my $tmp = $valid;
print Dumper $valid;
open my $fh, '<:encoding(UTF32-LE)', \$tmp or die;
my $str = <$fh>;
close $fh;
print Dumper $valid;

Output is:

$VAR1 = 'a';
$VAR1 = '';

This is perl 5, version 24, subversion 1 (v5.24.1) built for x86_64-linux-gnu-thread-multi

p5pRT · 2018-02-12T07:11:25Z

From @tonycoz

On Thu, 08 Feb 2018 12:16:59 -0800, pali@cpan.org wrote:

Following code modifies variable $valid:

use strict;
use warnings;
use Data::Dumper;

my $valid = "\x61\x00\x00\x00";
my $tmp = $valid;
print Dumper $valid;
open my $fh, '<:encoding(UTF32-LE)', \$tmp or die;
my $str = <$fh>;
close $fh;
print Dumper $valid;

Output is:

$VAR1 = 'a';
$VAR1 = '';

This is perl 5, version 24, subversion 1 (v5.24.1) built for x86_64-
linux-gnu-thread-multi

This appears to be a bug in Unicode::decode() - it's modifying a buffer that doesn't belong to it (SvLEN() for the SV is zero).

Adding a call to SvGROW() fixes it:

Inline Patch

diff --git a/cpan/Encode/Unicode/Unicode.xs b/cpan/Encode/Unicode/Unicode.xs
index b3b1d2fea8..2a5c647625 100644
--- a/cpan/Encode/Unicode/Unicode.xs
+++ b/cpan/Encode/Unicode/Unicode.xs
@@ -328,6 +328,7 @@ CODE:
        }
     }
     if (check && !(check & ENCODE_LEAVE_SRC)) {
+        SvGROW(str, 1+e-s);
        if (s < e) {
            Move(s,SvPVX(str),e-s,U8);
            SvCUR_set(str,(e-s));

Replacing that block with a call to sv_chop() instead would also fix it, except for what I think is a bug in PerlIO::encoding:

/* Create a "dummy" SV to represent the available data from layer below */
if (SvLEN(e->dataSV) && SvPVX_const(e->dataSV)) {
Safefree(SvPVX_mutable(e->dataSV));
}

since SvPVX(sv) might not point to the beginning of the allocated block with an SvOOK() SV.

Tony

p5pRT · 2018-02-12T07:11:26Z

The RT System itself - Status changed from 'new' to 'open'

p5pRT · 2018-02-12T09:02:37Z

From @pali

On Sunday 11 February 2018 23:11:26 Tony Cook via RT wrote:

This appears to be a bug in Unicode::decode() - it's modifying a buffer that doesn't belong to it (SvLEN() for the SV is zero).

But Unicode::decode() calls SvPV_force_nomg(). Is not this enough? And
is SvGROW() after SvPV_force really needed?

Adding a call to SvGROW() fixes it:

p5pRT · 2018-02-12T23:21:31Z

From @tonycoz

On Mon, 12 Feb 2018 01:02:37 -0800, pali@cpan.org wrote:

On Sunday 11 February 2018 23:11:26 Tony Cook via RT wrote:

This appears to be a bug in Unicode::decode() - it's modifying a
buffer that doesn't belong to it (SvLEN() for the SV is zero).

But Unicode::decode() calls SvPV_force_nomg(). Is not this enough? And
is SvGROW() after SvPV_force really needed?

I was surprised that it didn't, but it doesn't.

It's possible I'm wrong in saying Encode is doing the wrong thing here.

SvGROW() (or sv_chop()) will allocate a new buffer and copy the from the original buffer into a new string, which belongs to the SV (rather than the SvLEN()== 0 meaning it belongs to something else).

Tony

p5pRT · 2018-02-13T08:21:20Z

From @pali

On Monday 12 February 2018 15:21:31 Tony Cook via RT wrote:

On Mon, 12 Feb 2018 01:02:37 -0800, pali@cpan.org wrote:

On Sunday 11 February 2018 23:11:26 Tony Cook via RT wrote:

This appears to be a bug in Unicode::decode() - it's modifying a
buffer that doesn't belong to it (SvLEN() for the SV is zero).

But Unicode::decode() calls SvPV_force_nomg(). Is not this enough? And
is SvGROW() after SvPV_force really needed?

I was surprised that it didn't, but it doesn't.

It's possible I'm wrong in saying Encode is doing the wrong thing here.

SvGROW() (or sv_chop()) will allocate a new buffer and copy the from the original buffer into a new string, which belongs to the SV (rather than the SvLEN()== 0 meaning it belongs to something else).

Anyway, exactly same pattern of code is in UTF-8 decoder (in Encode.xs)
and is not affected by this bug. It also uses SvPV_force_nomg() and
there is no SvGROW() call too. So I cannot believe problem is in missing
SvGROW(), otherwise UTF-8 decoder would be affected too...

p5pRT · 2018-02-13T09:46:39Z

From @iabyn

On Tue, Feb 13, 2018 at 09:21:03AM +0100, pali@cpan.org wrote:

On Monday 12 February 2018 15:21:31 Tony Cook via RT wrote:

On Mon, 12 Feb 2018 01:02:37 -0800, pali@cpan.org wrote:

On Sunday 11 February 2018 23:11:26 Tony Cook via RT wrote:

This appears to be a bug in Unicode::decode() - it's modifying a
buffer that doesn't belong to it (SvLEN() for the SV is zero).

But Unicode::decode() calls SvPV_force_nomg(). Is not this enough? And
is SvGROW() after SvPV_force really needed?

I was surprised that it didn't, but it doesn't.

It's possible I'm wrong in saying Encode is doing the wrong thing here.

SvGROW() (or sv_chop()) will allocate a new buffer and copy the from the original buffer into a new string, which belongs to the SV (rather than the SvLEN()== 0 meaning it belongs to something else).

Anyway, exactly same pattern of code is in UTF-8 decoder (in Encode.xs)
and is not affected by this bug. It also uses SvPV_force_nomg() and
there is no SvGROW() call too. So I cannot believe problem is in missing
SvGROW(), otherwise UTF-8 decoder would be affected too...

SvLEN() is set to zero here in ext/PerlIO-encoding/encoding.xs:

SvPV_set(e->dataSV, (char *) ptr);
SvLEN_set(e->dataSV, 0); /* Hands off sv.c - it isn't yours */
SvCUR_set(e->dataSV,use);
SvPOK_only(e->dataSV);

I'm not completely sure how core and other XS code should handle
SVLEN()==0 SVs. My gut feeling is that if such an SV ever gets delivered
to a block of code which which wishes to GROW the SV, then a bug has been
triggered.

Clearly if code *was* to GROW such an SV, then at a minimum, the previous
"alien" buffer might get leaked. Or the alien-handling code might later
interpret the new buffer as an alien buffer and crash badly.

So I have a vague felling that Enocde is ok, and that the real bug is in
allowing such an SV to reach decode() in the first place.

But I haven't looked very closely at the whole system.

--
Counsellor Troi states something other than the blindingly obvious.
-- Things That Never Happen in "Star Trek" #16

p5pRT · 2018-02-15T20:02:17Z

From zefram@fysh.org

Following code modifies variable $valid:

Actually you don't need to see the modification of $valid to see that
there's a bug. The code shouldn't be modifying $tmp, and it exhibits
the bug of that modification even without $valid existing. It is a COW
bug that the modification to $tmp also modifies $valid, but adding COW
logic is not going to be the way to fix this. The underlying bug is
that it writes to $tmp at all, which is not a COW bug, and the fix will
be to stop it writing to $tmp.

-zefram

p5pRT · 2018-02-15T20:39:31Z

From @jkeenan

On Thu, 15 Feb 2018 20:02:17 GMT, zefram@fysh.org wrote:

Following code modifies variable $valid:

Actually you don't need to see the modification of $valid to see that
there's a bug. The code shouldn't be modifying $tmp, and it exhibits
the bug of that modification even without $valid existing. It is a COW
bug that the modification to $tmp also modifies $valid, but adding COW
logic is not going to be the way to fix this. The underlying bug is
that it writes to $tmp at all, which is not a COW bug, and the fix will
be to stop it writing to $tmp.

-zefram

I.e., with the attachment:

#####
$ perl 132833-UTF32-LE-encoding.pl | od -c
0000000 $ V A R 1 = ' a \0 \0 \0 ' ; \n
0000020 $ V A R 1 = ' \0 \0 \0 \0 ' ; \n
0000040
#####
--
James E Keenan (jkeenan@cpan.org)

p5pRT · 2018-02-15T20:39:31Z

From @jkeenan

132833-UTF32-LE-encoding.pl

p5pRT · 2018-02-15T22:02:22Z

From @tonycoz

On Thu, Feb 15, 2018 at 08:01:48PM +0000, Zefram wrote:

Following code modifies variable $valid:

Actually you don't need to see the modification of $valid to see that
there's a bug. The code shouldn't be modifying $tmp, and it exhibits
the bug of that modification even without $valid existing. It is a COW
bug that the modification to $tmp also modifies $valid, but adding COW
logic is not going to be the way to fix this. The underlying bug is
that it writes to $tmp at all, which is not a COW bug, and the fix will
be to stop it writing to $tmp.

The issue is where the bug is.

PerlIO::encoding creates a SvLEN()==0 variable and passes it to
Encode, which calls SvPV_force_nomg() on the sv, and then modifies the
buffer.

Is passing the len==0 SV through to Encode the error?

Or should SvPV_force/sv_force_normal remove len == 0 from the SV?

Or should Encode be doing something to prevent the problem? (SvGROW
worked for me)

Tony

p5pRT · 2018-02-15T23:38:50Z

From zefram@fysh.org

Dave Mitchell wrote:

So I have a vague felling that Enocde is ok, and that the real bug is in
allowing such an SV to reach decode() in the first place.

To be precise: is it legitimate for an SV to have SvLEN==0 without also
being SvREADONLY? The documentation doesn't explicitly address this,
and in fact is pretty much silent on the SvLEN==0 arrangement.

It appears to me that SvPV_force*() are intended to leave the scalar
in a state in which its SvPVX is writable. The documentation suggests
that it's sufficient, without explicitly addressing SvLEN==0. More
convincingly, these macros will croak if the scalar is SvREADONLY.
Encode::Unicode therefore seems legit in performing a write after
SvPV_force_nomg().

If SvLEN==0 with !SvREADONLY is legitimate, then SvPV_force*() need to
check for SvLEN==0 and do something, presumably cloning the string into
a new buffer owned by the scalar. If there were no good answer as to
what it should do here, that would be a strong argument for this kind of
scalar not being permitted. I have a slight worry about whether it's OK
to clone the buffer of an SvLEN==0 scalar, but since sv_utf8_upgrade()
performs such an operation (even on an SvREADONLY scalar) there's
precedent for it. So I think this argument against SvLEN==0 with
!SvREADONLY being permitted doesn't apply.

Do we have other cases of SvLEN==0 with !SvREADONLY? If not, I'd
be inclined to say that it's forbidden, and avoid having to make the
SvPV_force*() macros more expensive.

Meanwhile, PerlIO::encoding should be setting the flags differently on
that scalar. I'm not sure which way round. One option is that it should
set SvREADONLY, and should set the ENCODE_LEAVE_SRC bit in the check flag.
The other option is that it should clone the string itself so that it
doesn't have SvLEN==0. At first glance I can't see whether all the
mucking about with e->dataSV depends on the mutation that ->decode
performs. PerlIO::encoding isn't ensuring that the ENCODE_LEAVE_SRC
bit is clear, so either it doesn't depend on the mutation or that's
another bug.

-zefram

p5pRT · 2018-02-16T01:19:25Z

From @Leont

On Fri, Feb 16, 2018 at 12:38 AM, Zefram <zefram@fysh.org> wrote:

Dave Mitchell wrote:

So I have a vague felling that Enocde is ok, and that the real bug is in
allowing such an SV to reach decode() in the first place.

To be precise: is it legitimate for an SV to have SvLEN==0 without also
being SvREADONLY? The documentation doesn't explicitly address this,
and in fact is pretty much silent on the SvLEN==0 arrangement.

It appears to me that SvPV_force*() are intended to leave the scalar
in a state in which its SvPVX is writable. The documentation suggests
that it's sufficient, without explicitly addressing SvLEN==0. More
convincingly, these macros will croak if the scalar is SvREADONLY.
Encode::Unicode therefore seems legit in performing a write after
SvPV_force_nomg().

If SvLEN==0 with !SvREADONLY is legitimate, then SvPV_force*() need to
check for SvLEN==0 and do something, presumably cloning the string into
a new buffer owned by the scalar. If there were no good answer as to
what it should do here, that would be a strong argument for this kind of
scalar not being permitted. I have a slight worry about whether it's OK
to clone the buffer of an SvLEN==0 scalar, but since sv_utf8_upgrade()
performs such an operation (even on an SvREADONLY scalar) there's
precedent for it. So I think this argument against SvLEN==0 with
!SvREADONLY being permitted doesn't apply.

Do we have other cases of SvLEN==0 with !SvREADONLY? If not, I'd
be inclined to say that it's forbidden, and avoid having to make the
SvPV_force*() macros more expensive.

In File::Map I use such strings, and writing operations work fine on
them iff the operation doesn't change their length. A few other
modules do the same, I didn't invent this technique myself.

Meanwhile, PerlIO::encoding should be setting the flags differently on
that scalar. I'm not sure which way round. One option is that it should
set SvREADONLY, and should set the ENCODE_LEAVE_SRC bit in the check flag.
The other option is that it should clone the string itself so that it
doesn't have SvLEN==0. At first glance I can't see whether all the
mucking about with e->dataSV depends on the mutation that ->decode
performs.

The code is depending on Encode modifying dataSV.

PerlIO::encoding isn't ensuring that the ENCODE_LEAVE_SRC
bit is clear, so either it doesn't depend on the mutation or that's
another bug.

IME it's generally a safe bet that PerlIO::encoding is doing the
confusing thing.

Leon

p5pRT · 2018-02-16T17:25:59Z

From zefram@fysh.org

Leon Timmermans wrote:

In File::Map I use such strings, and writing operations work fine on
them iff the operation doesn't change their length.

Ah, thanks. I hadn't thought about the writability of the unowned
buffer being used intentionally. Obviously that use of a writable
SvLEN==0 scalar is completely incompatible with the behaviour
that PerlIO::encoding intends for the scalar that it passes to the
encoding object. Based on the File::Map precedent, I conclude that
PerlIO::encoding should be cloning the string preemptively. Fixed in
commit fed9fe5.

FWIW, the File::Map usage seems dubious, because SV-related functions
are willing to allocate a new buffer. It means that there isn't any
real guarantee that writes will go to the unowned buffer.

-zefram

p5pRT · 2018-02-19T18:12:52Z

From @Leont

On Fri, Feb 16, 2018 at 6:25 PM, Zefram <zefram@fysh.org> wrote:

FWIW, the File::Map usage seems dubious, because SV-related functions
are willing to allocate a new buffer. It means that there isn't any
real guarantee that writes will go to the unowned buffer.

The main innovation of File::Map over previous modules like Sys::Mmap
is that it uses set-magic to detect it when that happens, complain
loudly and if possible fix it :-)

Leon

p5pRT · 2018-02-21T04:26:11Z

From @tonycoz

This was fixed in fed9fe5.

Tony

p5pRT · 2018-02-21T04:26:12Z

@tonycoz - Status changed from 'open' to 'pending release'

p5pRT · 2018-06-23T15:33:17Z

From @khwilliamson

Thank you for filing this report. You have helped make Perl better.

With the release yesterday of Perl 5.28.0, this and 185 other issues have been
resolved.

Perl 5.28.0 may be downloaded via:
https://metacpan.org/release/XSAWYERX/perl-5.28.0

If you find that the problem persists, feel free to reopen this ticket.

p5pRT · 2018-06-23T15:33:18Z

@khwilliamson - Status changed from 'pending release' to 'resolved'

p5pRT closed this as completed Jun 23, 2018

p5pRT added the Severity Low label Oct 19, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

COW bug in :encoding layer #16408

COW bug in :encoding layer #16408

p5pRT commented Feb 8, 2018

p5pRT commented Feb 8, 2018

p5pRT commented Feb 12, 2018

p5pRT commented Feb 12, 2018

p5pRT commented Feb 12, 2018

p5pRT commented Feb 12, 2018

p5pRT commented Feb 13, 2018

p5pRT commented Feb 13, 2018

p5pRT commented Feb 15, 2018

p5pRT commented Feb 15, 2018

p5pRT commented Feb 15, 2018

p5pRT commented Feb 15, 2018

p5pRT commented Feb 15, 2018

p5pRT commented Feb 16, 2018

p5pRT commented Feb 16, 2018

p5pRT commented Feb 19, 2018

p5pRT commented Feb 21, 2018

p5pRT commented Feb 21, 2018

p5pRT commented Jun 23, 2018

p5pRT commented Jun 23, 2018

COW bug in :encoding layer #16408

COW bug in :encoding layer #16408

Comments

p5pRT commented Feb 8, 2018

p5pRT commented Feb 8, 2018

From @pali

p5pRT commented Feb 12, 2018

From @tonycoz

p5pRT commented Feb 12, 2018

p5pRT commented Feb 12, 2018

From @pali

p5pRT commented Feb 12, 2018

From @tonycoz

p5pRT commented Feb 13, 2018

From @pali

p5pRT commented Feb 13, 2018

From @iabyn

p5pRT commented Feb 15, 2018

From zefram@fysh.org

p5pRT commented Feb 15, 2018

From @jkeenan

p5pRT commented Feb 15, 2018

From @jkeenan

p5pRT commented Feb 15, 2018

From @tonycoz

p5pRT commented Feb 15, 2018

From zefram@fysh.org

p5pRT commented Feb 16, 2018

From @Leont

p5pRT commented Feb 16, 2018

From zefram@fysh.org

p5pRT commented Feb 19, 2018

From @Leont

p5pRT commented Feb 21, 2018

From @tonycoz

p5pRT commented Feb 21, 2018

p5pRT commented Jun 23, 2018

From @khwilliamson

p5pRT commented Jun 23, 2018