Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

COW bug in :encoding layer #16408

Closed
p5pRT opened this issue Feb 8, 2018 · 19 comments
Closed

COW bug in :encoding layer #16408

p5pRT opened this issue Feb 8, 2018 · 19 comments

Comments

@p5pRT
Copy link

p5pRT commented Feb 8, 2018

Migrated from rt.perl.org#132833 (status was 'resolved')

Searchable as RT132833$

@p5pRT
Copy link
Author

p5pRT commented Feb 8, 2018

From @pali

Following code modifies variable $valid​:

use strict;
use warnings;
use Data​::Dumper;

my $valid = "\x61\x00\x00\x00";
my $tmp = $valid;
print Dumper $valid;
open my $fh, '<​:encoding(UTF32-LE)', \$tmp or die;
my $str = <$fh>;
close $fh;
print Dumper $valid;

Output is​:

$VAR1 = 'a';
$VAR1 = '';

This is perl 5, version 24, subversion 1 (v5.24.1) built for x86_64-linux-gnu-thread-multi

@p5pRT
Copy link
Author

p5pRT commented Feb 12, 2018

From @tonycoz

On Thu, 08 Feb 2018 12​:16​:59 -0800, pali@​cpan.org wrote​:

Following code modifies variable $valid​:

use strict;
use warnings;
use Data​::Dumper;

my $valid = "\x61\x00\x00\x00";
my $tmp = $valid;
print Dumper $valid;
open my $fh, '<​:encoding(UTF32-LE)', \$tmp or die;
my $str = <$fh>;
close $fh;
print Dumper $valid;

Output is​:

$VAR1 = 'a';
$VAR1 = '';

This is perl 5, version 24, subversion 1 (v5.24.1) built for x86_64-
linux-gnu-thread-multi

This appears to be a bug in Unicode​::decode() - it's modifying a buffer that doesn't belong to it (SvLEN() for the SV is zero).

Adding a call to SvGROW() fixes it​:

Inline Patch
diff --git a/cpan/Encode/Unicode/Unicode.xs b/cpan/Encode/Unicode/Unicode.xs
index b3b1d2fea8..2a5c647625 100644
--- a/cpan/Encode/Unicode/Unicode.xs
+++ b/cpan/Encode/Unicode/Unicode.xs
@@ -328,6 +328,7 @@ CODE:
        }
     }
     if (check && !(check & ENCODE_LEAVE_SRC)) {
+        SvGROW(str, 1+e-s);
        if (s < e) {
            Move(s,SvPVX(str),e-s,U8);
            SvCUR_set(str,(e-s));

Replacing that block with a call to sv_chop() instead would also fix it, except for what I think is a bug in PerlIO::encoding:

  /* Create a "dummy" SV to represent the available data from layer below */
  if (SvLEN(e->dataSV) && SvPVX_const(e->dataSV)) {
  Safefree(SvPVX_mutable(e->dataSV));
  }

since SvPVX(sv) might not point to the beginning of the allocated block with an SvOOK() SV.

Tony

@p5pRT
Copy link
Author

p5pRT commented Feb 12, 2018

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Feb 12, 2018

From @pali

On Sunday 11 February 2018 23​:11​:26 Tony Cook via RT wrote​:

This appears to be a bug in Unicode​::decode() - it's modifying a buffer that doesn't belong to it (SvLEN() for the SV is zero).

But Unicode​::decode() calls SvPV_force_nomg(). Is not this enough? And
is SvGROW() after SvPV_force really needed?

Adding a call to SvGROW() fixes it​:

@p5pRT
Copy link
Author

p5pRT commented Feb 12, 2018

From @tonycoz

On Mon, 12 Feb 2018 01​:02​:37 -0800, pali@​cpan.org wrote​:

On Sunday 11 February 2018 23​:11​:26 Tony Cook via RT wrote​:

This appears to be a bug in Unicode​::decode() - it's modifying a
buffer that doesn't belong to it (SvLEN() for the SV is zero).

But Unicode​::decode() calls SvPV_force_nomg(). Is not this enough? And
is SvGROW() after SvPV_force really needed?

I was surprised that it didn't, but it doesn't.

It's possible I'm wrong in saying Encode is doing the wrong thing here.

SvGROW() (or sv_chop()) will allocate a new buffer and copy the from the original buffer into a new string, which belongs to the SV (rather than the SvLEN()== 0 meaning it belongs to something else).

Tony

@p5pRT
Copy link
Author

p5pRT commented Feb 13, 2018

From @pali

On Monday 12 February 2018 15​:21​:31 Tony Cook via RT wrote​:

On Mon, 12 Feb 2018 01​:02​:37 -0800, pali@​cpan.org wrote​:

On Sunday 11 February 2018 23​:11​:26 Tony Cook via RT wrote​:

This appears to be a bug in Unicode​::decode() - it's modifying a
buffer that doesn't belong to it (SvLEN() for the SV is zero).

But Unicode​::decode() calls SvPV_force_nomg(). Is not this enough? And
is SvGROW() after SvPV_force really needed?

I was surprised that it didn't, but it doesn't.

It's possible I'm wrong in saying Encode is doing the wrong thing here.

SvGROW() (or sv_chop()) will allocate a new buffer and copy the from the original buffer into a new string, which belongs to the SV (rather than the SvLEN()== 0 meaning it belongs to something else).

Anyway, exactly same pattern of code is in UTF-8 decoder (in Encode.xs)
and is not affected by this bug. It also uses SvPV_force_nomg() and
there is no SvGROW() call too. So I cannot believe problem is in missing
SvGROW(), otherwise UTF-8 decoder would be affected too...

@p5pRT
Copy link
Author

p5pRT commented Feb 13, 2018

From @iabyn

On Tue, Feb 13, 2018 at 09​:21​:03AM +0100, pali@​cpan.org wrote​:

On Monday 12 February 2018 15​:21​:31 Tony Cook via RT wrote​:

On Mon, 12 Feb 2018 01​:02​:37 -0800, pali@​cpan.org wrote​:

On Sunday 11 February 2018 23​:11​:26 Tony Cook via RT wrote​:

This appears to be a bug in Unicode​::decode() - it's modifying a
buffer that doesn't belong to it (SvLEN() for the SV is zero).

But Unicode​::decode() calls SvPV_force_nomg(). Is not this enough? And
is SvGROW() after SvPV_force really needed?

I was surprised that it didn't, but it doesn't.

It's possible I'm wrong in saying Encode is doing the wrong thing here.

SvGROW() (or sv_chop()) will allocate a new buffer and copy the from the original buffer into a new string, which belongs to the SV (rather than the SvLEN()== 0 meaning it belongs to something else).

Anyway, exactly same pattern of code is in UTF-8 decoder (in Encode.xs)
and is not affected by this bug. It also uses SvPV_force_nomg() and
there is no SvGROW() call too. So I cannot believe problem is in missing
SvGROW(), otherwise UTF-8 decoder would be affected too...

SvLEN() is set to zero here in ext/PerlIO-encoding/encoding.xs​:

  SvPV_set(e->dataSV, (char *) ptr);
  SvLEN_set(e->dataSV, 0); /* Hands off sv.c - it isn't yours */
  SvCUR_set(e->dataSV,use);
  SvPOK_only(e->dataSV);

I'm not completely sure how core and other XS code should handle
SVLEN()==0 SVs. My gut feeling is that if such an SV ever gets delivered
to a block of code which which wishes to GROW the SV, then a bug has been
triggered.

Clearly if code *was* to GROW such an SV, then at a minimum, the previous
"alien" buffer might get leaked. Or the alien-handling code might later
interpret the new buffer as an alien buffer and crash badly.

So I have a vague felling that Enocde is ok, and that the real bug is in
allowing such an SV to reach decode() in the first place.

But I haven't looked very closely at the whole system.

--
Counsellor Troi states something other than the blindingly obvious.
  -- Things That Never Happen in "Star Trek" #16

@p5pRT
Copy link
Author

p5pRT commented Feb 15, 2018

From zefram@fysh.org

Following code modifies variable $valid​:

Actually you don't need to see the modification of $valid to see that
there's a bug. The code shouldn't be modifying $tmp, and it exhibits
the bug of that modification even without $valid existing. It is a COW
bug that the modification to $tmp also modifies $valid, but adding COW
logic is not going to be the way to fix this. The underlying bug is
that it writes to $tmp at all, which is not a COW bug, and the fix will
be to stop it writing to $tmp.

-zefram

@p5pRT
Copy link
Author

p5pRT commented Feb 15, 2018

From @jkeenan

On Thu, 15 Feb 2018 20​:02​:17 GMT, zefram@​fysh.org wrote​:

Following code modifies variable $valid​:

Actually you don't need to see the modification of $valid to see that
there's a bug. The code shouldn't be modifying $tmp, and it exhibits
the bug of that modification even without $valid existing. It is a COW
bug that the modification to $tmp also modifies $valid, but adding COW
logic is not going to be the way to fix this. The underlying bug is
that it writes to $tmp at all, which is not a COW bug, and the fix will
be to stop it writing to $tmp.

-zefram

I.e., with the attachment​:

#####
$ perl 132833-UTF32-LE-encoding.pl | od -c
0000000 $ V A R 1 = ' a \0 \0 \0 ' ; \n
0000020 $ V A R 1 = ' \0 \0 \0 \0 ' ; \n
0000040
#####
--
James E Keenan (jkeenan@​cpan.org)

@p5pRT
Copy link
Author

p5pRT commented Feb 15, 2018

@p5pRT
Copy link
Author

p5pRT commented Feb 15, 2018

From @tonycoz

On Thu, Feb 15, 2018 at 08​:01​:48PM +0000, Zefram wrote​:

Following code modifies variable $valid​:

Actually you don't need to see the modification of $valid to see that
there's a bug. The code shouldn't be modifying $tmp, and it exhibits
the bug of that modification even without $valid existing. It is a COW
bug that the modification to $tmp also modifies $valid, but adding COW
logic is not going to be the way to fix this. The underlying bug is
that it writes to $tmp at all, which is not a COW bug, and the fix will
be to stop it writing to $tmp.

The issue is where the bug is.

PerlIO​::encoding creates a SvLEN()==0 variable and passes it to
Encode, which calls SvPV_force_nomg() on the sv, and then modifies the
buffer.

Is passing the len==0 SV through to Encode the error?

Or should SvPV_force/sv_force_normal remove len == 0 from the SV?

Or should Encode be doing something to prevent the problem? (SvGROW
worked for me)

Tony

@p5pRT
Copy link
Author

p5pRT commented Feb 15, 2018

From zefram@fysh.org

Dave Mitchell wrote​:

So I have a vague felling that Enocde is ok, and that the real bug is in
allowing such an SV to reach decode() in the first place.

To be precise​: is it legitimate for an SV to have SvLEN==0 without also
being SvREADONLY? The documentation doesn't explicitly address this,
and in fact is pretty much silent on the SvLEN==0 arrangement.

It appears to me that SvPV_force*() are intended to leave the scalar
in a state in which its SvPVX is writable. The documentation suggests
that it's sufficient, without explicitly addressing SvLEN==0. More
convincingly, these macros will croak if the scalar is SvREADONLY.
Encode​::Unicode therefore seems legit in performing a write after
SvPV_force_nomg().

If SvLEN==0 with !SvREADONLY is legitimate, then SvPV_force*() need to
check for SvLEN==0 and do something, presumably cloning the string into
a new buffer owned by the scalar. If there were no good answer as to
what it should do here, that would be a strong argument for this kind of
scalar not being permitted. I have a slight worry about whether it's OK
to clone the buffer of an SvLEN==0 scalar, but since sv_utf8_upgrade()
performs such an operation (even on an SvREADONLY scalar) there's
precedent for it. So I think this argument against SvLEN==0 with
!SvREADONLY being permitted doesn't apply.

Do we have other cases of SvLEN==0 with !SvREADONLY? If not, I'd
be inclined to say that it's forbidden, and avoid having to make the
SvPV_force*() macros more expensive.

Meanwhile, PerlIO​::encoding should be setting the flags differently on
that scalar. I'm not sure which way round. One option is that it should
set SvREADONLY, and should set the ENCODE_LEAVE_SRC bit in the check flag.
The other option is that it should clone the string itself so that it
doesn't have SvLEN==0. At first glance I can't see whether all the
mucking about with e->dataSV depends on the mutation that ->decode
performs. PerlIO​::encoding isn't ensuring that the ENCODE_LEAVE_SRC
bit is clear, so either it doesn't depend on the mutation or that's
another bug.

-zefram

@p5pRT
Copy link
Author

p5pRT commented Feb 16, 2018

From @Leont

On Fri, Feb 16, 2018 at 12​:38 AM, Zefram <zefram@​fysh.org> wrote​:

Dave Mitchell wrote​:

So I have a vague felling that Enocde is ok, and that the real bug is in
allowing such an SV to reach decode() in the first place.

To be precise​: is it legitimate for an SV to have SvLEN==0 without also
being SvREADONLY? The documentation doesn't explicitly address this,
and in fact is pretty much silent on the SvLEN==0 arrangement.

It appears to me that SvPV_force*() are intended to leave the scalar
in a state in which its SvPVX is writable. The documentation suggests
that it's sufficient, without explicitly addressing SvLEN==0. More
convincingly, these macros will croak if the scalar is SvREADONLY.
Encode​::Unicode therefore seems legit in performing a write after
SvPV_force_nomg().

If SvLEN==0 with !SvREADONLY is legitimate, then SvPV_force*() need to
check for SvLEN==0 and do something, presumably cloning the string into
a new buffer owned by the scalar. If there were no good answer as to
what it should do here, that would be a strong argument for this kind of
scalar not being permitted. I have a slight worry about whether it's OK
to clone the buffer of an SvLEN==0 scalar, but since sv_utf8_upgrade()
performs such an operation (even on an SvREADONLY scalar) there's
precedent for it. So I think this argument against SvLEN==0 with
!SvREADONLY being permitted doesn't apply.

Do we have other cases of SvLEN==0 with !SvREADONLY? If not, I'd
be inclined to say that it's forbidden, and avoid having to make the
SvPV_force*() macros more expensive.

In File​::Map I use such strings, and writing operations work fine on
them iff the operation doesn't change their length. A few other
modules do the same, I didn't invent this technique myself.

Meanwhile, PerlIO​::encoding should be setting the flags differently on
that scalar. I'm not sure which way round. One option is that it should
set SvREADONLY, and should set the ENCODE_LEAVE_SRC bit in the check flag.
The other option is that it should clone the string itself so that it
doesn't have SvLEN==0. At first glance I can't see whether all the
mucking about with e->dataSV depends on the mutation that ->decode
performs.

The code is depending on Encode modifying dataSV.

PerlIO​::encoding isn't ensuring that the ENCODE_LEAVE_SRC
bit is clear, so either it doesn't depend on the mutation or that's
another bug.

IME it's generally a safe bet that PerlIO​::encoding is doing the
confusing thing.

Leon

@p5pRT
Copy link
Author

p5pRT commented Feb 16, 2018

From zefram@fysh.org

Leon Timmermans wrote​:

In File​::Map I use such strings, and writing operations work fine on
them iff the operation doesn't change their length.

Ah, thanks. I hadn't thought about the writability of the unowned
buffer being used intentionally. Obviously that use of a writable
SvLEN==0 scalar is completely incompatible with the behaviour
that PerlIO​::encoding intends for the scalar that it passes to the
encoding object. Based on the File​::Map precedent, I conclude that
PerlIO​::encoding should be cloning the string preemptively. Fixed in
commit fed9fe5.

FWIW, the File​::Map usage seems dubious, because SV-related functions
are willing to allocate a new buffer. It means that there isn't any
real guarantee that writes will go to the unowned buffer.

-zefram

@p5pRT
Copy link
Author

p5pRT commented Feb 19, 2018

From @Leont

On Fri, Feb 16, 2018 at 6​:25 PM, Zefram <zefram@​fysh.org> wrote​:

FWIW, the File​::Map usage seems dubious, because SV-related functions
are willing to allocate a new buffer. It means that there isn't any
real guarantee that writes will go to the unowned buffer.

The main innovation of File​::Map over previous modules like Sys​::Mmap
is that it uses set-magic to detect it when that happens, complain
loudly and if possible fix it :-)

Leon

@p5pRT
Copy link
Author

p5pRT commented Feb 21, 2018

From @tonycoz

This was fixed in fed9fe5.

Tony

@p5pRT
Copy link
Author

p5pRT commented Feb 21, 2018

@tonycoz - Status changed from 'open' to 'pending release'

@p5pRT
Copy link
Author

p5pRT commented Jun 23, 2018

From @khwilliamson

Thank you for filing this report. You have helped make Perl better.

With the release yesterday of Perl 5.28.0, this and 185 other issues have been
resolved.

Perl 5.28.0 may be downloaded via​:
https://metacpan.org/release/XSAWYERX/perl-5.28.0

If you find that the problem persists, feel free to reopen this ticket.

@p5pRT
Copy link
Author

p5pRT commented Jun 23, 2018

@khwilliamson - Status changed from 'pending release' to 'resolved'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant