Bug in &= (string) and/or m// #8192

Closed
p5pRT opened this issue Nov 6, 2005 · 15 comments

p5pRT commented Nov 6, 2005

Migrated from rt.perl.org#37616 (status was 'resolved')

Searchable as RT37616$

p5pRT commented Nov 6, 2005

From anno4000@mailbox.tu-berlin.de

  From: anno4000@mailbox.tu-berlin.de
  Subject:
  Date: 6 November 2005 02:13:14.0 CET
  To: anno4000@mailbox.zrz.tu-berlin.de

This is a bug report for perl from anno@oliva.zrz.tu-berlin.de,
generated with the help of perlbug 1.35 running under perl v5.8.6.

The sequence

  my $str = 'aa';
  $str &= 'a';
  $str =~ /a+$/ or die;

dies, showing that the match fails while it obviously shouldn't. It turns
out that &= returns a string without a trailing zero. The regex engine
appears to rely on the trailing zero, which it shouldn't. "use bytes" makes
no difference. The behavior is the same with perl-5.9.2. Test appended.

Anno

========================================================================
use Test::More tests => 2;

# prepare a string
my $str = 'aa';
$str &= 'a'; # $str is now defective (no trailing NUL in its buffer)
# $str .= ''; # this heals it

is( $str, 'a', "Single 'a' after &="); # passes
ok( $str =~ /a+$/, "Match after &="); # fails


Flags:
  category=core
  severity=low


Site configuration information for perl v5.8.6:

Configured by anno at Sun Jul 24 00:22:57 CEST 2005.

Summary of my perl5 (revision 5 version 8 subversion 6) configuration:
  Platform:
  osname=darwin, osvers=8.2.0, archname=darwin-2level
  uname='darwin oliva 8.2.0 darwin kernel version 8.2.0: fri jun 24 17:46:54 pdt 2005; root:xnu-792.2.4.obj~3release_ppc power macintosh powerpc '
  config_args='-des'
  hint=recommended, useposix=true, d_sigaction=define
  usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
  useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
  use64bitint=undef use64bitall=undef uselongdouble=undef
  usemymalloc=n, bincompat5005=undef
  Compiler:
  cc='cc', ccflags ='-fno-common -DPERL_DARWIN -no-cpp-precomp -fno-strict-aliasing -pipe -I/usr/local/include',
  optimize='-O3',
  cppflags='-no-cpp-precomp -fno-common -DPERL_DARWIN -no-cpp-precomp -fno-strict-aliasing -pipe -I/usr/local/include'
  ccversion='', gccversion='4.0.0 20041026 (Apple Computer, Inc. build 4061)', gccosandvers='darwin8'
  intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=4321
  d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
  ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
  alignbytes=8, prototype=define
  Linker and Libraries:
  ld='env MACOSX_DEPLOYMENT_TARGET=10.3 cc', ldflags =' -L/usr/local/lib'
  libpth=/usr/local/lib /usr/lib
  libs=-ldbm -ldl -lm -lc
  perllibs=-ldl -lm -lc
  libc=/usr/lib/libc.dylib, so=dylib, useshrplib=false, libperl=libperl.a
  gnulibc_version=''
  Dynamic Linking:
  dlsrc=dl_dyld.xs, dlext=bundle, d_dlsymun=undef, ccdlflags=' '
  cccdlflags=' ', lddlflags=' -bundle -undefined dynamic_lookup -L/usr/local/lib'

Locally applied patches:


@INC for perl v5.8.6:
  /Users/anno/lib/perl
  /usr/local/lib/perl5/5.8.6/darwin-2level
  /usr/local/lib/perl5/5.8.6
  /usr/local/lib/perl5/site_perl/5.8.6/darwin-2level
  /usr/local/lib/perl5/site_perl/5.8.6
  /usr/local/lib/perl5/site_perl/5.8.5/darwin-2level
  /usr/local/lib/perl5/site_perl/5.8.5
  /usr/local/lib/perl5/site_perl/5.8.4/darwin-2level
  /usr/local/lib/perl5/site_perl/5.8.4
  /usr/local/lib/perl5/site_perl/5.8.3/darwin-2level
  /usr/local/lib/perl5/site_perl/5.8.3
  /usr/local/lib/perl5/site_perl
  .


Environment for perl v5.8.6:
  DYLD_LIBRARY_PATH (unset)
  HOME=/Users/anno
  LANG (unset)
  LANGUAGE (unset)
  LD_LIBRARY_PATH (unset)
  LOGDIR (unset)
  PATH=/usr/X11R6/bin:/usr/local/bin:/Developer/Tools:/usr/local/bin:/bin:/sbin:/usr/bin:/usr/sbin:/Users/anno/bin
  PERL5LIB=/Users/anno/lib/perl
  PERL_BADLANG (unset)
  SHELL=/bin/tcsh

p5pRT commented Nov 7, 2005

From @ysth

On Sat, Nov 05, 2005 at 05:20:20PM -0800, Anno Siegel wrote:

The sequence

 my $str = 'aa';
 $str &= 'a';
 $str =~ /a+$/ or die;

dies, showing that the match fails while it obviously shouldn't. It turns
out that &= returns a string without a trailing zero. The regex engine
appears to rely on the trailing zero, which it shouldn't. "use bytes" makes
no difference. The behavior is the same with perl-5.9.2. Test appended.

?? If it dies, the match is succeeding. And it succeeds for me from 5.6.2
to 5.9.3. But /^a$/ fails!

I would have expected the &= to leave $str set to the 2 characters:
"a\0", but it seems that stringwise & returns something with the
length of the shorter operand. This makes some kind of sense.

But /^a$/ failing when $str eq "a" is true is obviously a bug.
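
The inconsistency described here can be checked with a short script. This is only a sketch, not part of the original message: it compares the Perl-level view of the string (`eq`, `length`) with a few anchored matches. On a perl that contains the fix discussed later in this thread, every line should presumably report "ok"; on the affected versions the anchored matches were reported to fail.

```perl
use strict;
use warnings;

# Reproduce the state from the report: string AND with a shorter operand.
my $str = 'aa';
$str &= 'a';

# Each entry pairs a label with the result of one check.
my @checks = (
    [ 'eq "a"',      $str eq 'a'        ],
    [ 'length == 1', length($str) == 1  ],
    [ '/^a$/',       scalar($str =~ /^a$/)  ],
    [ '/^a+$/',      scalar($str =~ /^a+$/) ],
    [ '/a+$/',       scalar($str =~ /a+$/)  ],
);

for my $check (@checks) {
    my ($label, $passed) = @$check;
    printf "%-12s %s\n", $label, $passed ? 'ok' : 'NOT ok';
}
```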

p5pRT commented Nov 7, 2005

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented Nov 7, 2005

From @Abigail

On Sun, Nov 06, 2005 at 06:14:20PM -0800, Yitzchak Scott-Thoennes wrote:

On Sat, Nov 05, 2005 at 05:20:20PM -0800, Anno Siegel wrote:

The sequence

 my $str = 'aa';
 $str &= 'a';
 $str =~ /a+$/ or die;

dies, showing that the match fails while it obviously shouldn't. It turns
out that &= returns a string without a trailing zero. The regex engine
appears to rely on the trailing zero, which it shouldn't. "use bytes" makes
no difference. The behavior is the same with perl-5.9.2. Test appended.

?? If it dies, the match is succeeding. And it succeeds for me from 5.6.2
to 5.9.3. But /^a$/ fails!

I would have expected the &= to leave $str set to the 2 characters:
"a\0", but it seems that stringwise & returns something with the
length of the shorter operand. This makes some kind of sense.

It not only makes sense, it's also documented to do it this way:

  If the operands to a binary bitwise op are strings of
  different sizes, | and ^ ops act as though the shorter
  operand had additional zero bits on the right, while the &
  op acts as though the longer operand were truncated to the
  length of the shorter.

  From "Bitwise String Operators" in the "perlop" manual page.

Abigail
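
To make the quoted perlop rule concrete, here is a minimal sketch (not part of the original message) that applies the three string operators to operands of different lengths. It assumes both operands are plain strings and that the `bitwise` feature is not enabled, so the operators act stringwise.

```perl
use strict;
use warnings;

# | and ^ behave as if the shorter operand were padded to "a\0\0".
my $or  = "a" | "xyz";
my $xor = "a" ^ "xyz";

# & behaves as if the longer operand were truncated to "x".
my $and = "a" & "xyz";

# Show each result's length and its bytes in hex.
for my $pair ( [ '|', $or ], [ '^', $xor ], [ '&', $and ] ) {
    my ($op, $result) = @$pair;
    printf "%s  length=%d  bytes=%s\n",
        $op, length($result), unpack('H*', $result);
}
# |  length=3  bytes=79797a
# ^  length=3  bytes=19797a
# &  length=1  bytes=60
```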

p5pRT commented Nov 7, 2005

From anno4000@mailbox.tu-berlin.de

On 07.11.2005, at 03:22, Yitzchak Scott-Thoennes via RT wrote:

On Sat, Nov 05, 2005 at 05:20:20PM -0800, Anno Siegel wrote:

The sequence

 my $str = 'aa';
 $str &= 'a';
 $str =~ /a+$/ or die;

dies, showing that the match fails while it obviously shouldn't. It turns
out that &= returns a string without a trailing zero. The regex engine
appears to rely on the trailing zero, which it shouldn't. "use bytes" makes
no difference. The behavior is the same with perl-5.9.2. Test appended.

?? If it dies, the match is succeeding. And it succeeds for me from 5.6.2
to 5.9.3. But /^a$/ fails!

It's "match or die", so it dies on failure.

I would have expected the &= to leave $str set to the 2 characters:
"a\0", but it seems that stringwise & returns something with the
length of the shorter operand. This makes some kind of sense.

It does, in view of the fact that a bit string is virtually followed by
infinitely many zero bytes (at least as far as vec() is concerned).

But /^a$/ failing when $str eq "a" is true is obviously a bug.

Ah, good, that's a clearer example than my /a+$/, which also fails.

Anno

p5pRT commented Nov 8, 2005

From @ysth

On Mon, Nov 07, 2005 at 09:26:21PM +0100, Anno Siegel wrote:

On 07.11.2005, at 03:22, Yitzchak Scott-Thoennes via RT wrote:

On Sat, Nov 05, 2005 at 05:20:20PM -0800, Anno Siegel wrote:

The sequence

 my $str = 'aa';
 $str &= 'a';
 $str =~ /a+$/ or die;

dies, showing that the match fails while it obviously shouldn't. It turns
out that &= returns a string without a trailing zero. The regex engine
appears to rely on the trailing zero, which it shouldn't. "use bytes" makes
no difference. The behavior is the same with perl-5.9.2. Test appended.

?? If it dies, the match is succeeding. And it succeeds for me from 5.6.2
to 5.9.3. But /^a$/ fails!

It's "match or die", so it dies on failure.

Sorry, momentary confusion on my part.

I would have expected the &= to leave $str set to the 2 characters:
"a\0", but it seems that stringwise & returns something with the
length of the shorter operand. This makes some kind of sense.

It does, in view of the fact that a bit string is virtually followed by
infinitely many zero bytes (at least as far as vec() is concerned).

But /^a$/ failing when $str eq "a" is true is obviously a bug.

Ah, good, that's a clearer example than my /a+$/, which also fails.

Hmm, still can't get that to fail on any version, but /^a$/ and /^a+$/
both do fail. Anyway, since $str is clearly being (correctly) left
as "a", your guess that the regex engine is tripping over there not
being a null character after the a seems quite likely to me as well.

p5pRT commented Nov 8, 2005

From BQW10602@nifty.com

On Mon, 7 Nov 2005 23:15:14 +0100, Abigail <abigail@abigail.nl> wrote

It not only makes sense, it's also documented to do it this way:

   If the operands to a binary bitwise op are strings of
   different sizes, | and ^ ops act as though the shorter
   operand had additional zero bits on the right, while the &
   op acts as though the longer operand were truncated to the
   length of the shorter.

From "Bitwise String Operators" in the "perlop" manual page.

Does this part of perlop just mention that ("a" | "xyz") is same
as ("a\0\0" | "xyz") while ("a" & "xyz") is same as ("a" & "x")?
(see also "ASCII-based examples" following the part)

I don't think "additional zero bits" here mean a NUL character
which a C string is terminated with.

Say, another document, perlguts, mentions as cited below:

  All SVs that contain strings should be terminated with a NUL
  character. If it is not NUL-terminated there is a risk of
  core dumps and corruptions from code which passes the string
  to C functions or system calls which expect a NUL-terminated string.
  Perl's own functions typically add a trailing NUL for this reason.
  Nevertheless, you should be very careful when you pass a string
  stored in an SV to a C function or system call.

Thus I think & operation should add NUL character always.

Regards,
SADAHIRO Tomoyuki
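
The difference between the string Perl code sees and the buffer behind it can be inspected with the core `B` module. The sketch below is not part of the original message; it assumes a perl that already contains a fix for this report, and it only shows that the allocated buffer (`LEN`) is larger than the visible string (`CUR`), which leaves room for the internal terminating NUL. It does not read the NUL byte itself.

```perl
use strict;
use warnings;
use B ();

my $str = 'aa';
$str &= 'a';

# B::svref_2object exposes the internal SV behind $str.
my $sv  = B::svref_2object(\$str);
my $cur = $sv->CUR;   # bytes in use, i.e. what length() reports for a byte string
my $len = $sv->LEN;   # bytes allocated for the PV buffer

printf "length=%d CUR=%d LEN=%d\n", length($str), $cur, $len;
print "buffer has room for a trailing NUL\n" if $len >= $cur + 1;
```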

p5pRT commented Nov 8, 2005

From @rgs

SADAHIRO Tomoyuki wrote:

All SVs that contain strings should be terminated with a NUL
character. If it is not NUL-terminated there is a risk of
core dumps and corruptions from code which passes the string
to C functions or system calls which expect a NUL-terminated string.
Perl's own functions typically add a trailing NUL for this reason.
Nevertheless, you should be very careful when you pass a string
stored in an SV to a C function or system call.

This internal limitation bothers me; but anyway, the simplest fix
is there to ensure a \0 is appended at the end of the PV buffer.

p5pRT commented Nov 8, 2005

From @nwc10

On Tue, Nov 08, 2005 at 01:44:19PM +0100, Rafael Garcia-Suarez wrote:

SADAHIRO Tomoyuki wrote:

All SVs that contain strings should be terminated with a NUL
character. If it is not NUL-terminated there is a risk of
core dumps and corruptions from code which passes the string
to C functions or system calls which expect a NUL-terminated string.
Perl's own functions typically add a trailing NUL for this reason.
Nevertheless, you should be very careful when you pass a string
stored in an SV to a C function or system call.

This internal limitation bothers me; but anyway, the simplest fix
is there to ensure a \0 is appended at the end of the PV buffer.

The limitation that the regexp engine is relying on it definitely bothers me.
It would be a nice bug to fix. I wonder if it's actually simpler to fix than
some of the other long standing regexp bugs.

Nicholas Clark

p5pRT commented Nov 8, 2005

From BQW10602@nifty.com

On Tue, 8 Nov 2005 13:44:19 +0100, Rafael Garcia-Suarez <rgarciasuarez@mandriva.com> wrote

SADAHIRO Tomoyuki wrote:

All SVs that contain strings should be terminated with a NUL
character. If it is not NUL-terminated there is a risk of
core dumps and corruptions from code which passes the string
to C functions or system calls which expect a NUL-terminated string.
Perl's own functions typically add a trailing NUL for this reason.
Nevertheless, you should be very careful when you pass a string
stored in an SV to a C function or system call.

This internal limitation bothers me; but anyway, the simplest fix
is there to ensure a \0 is appended at the end of the PV buffer.

Here is a patch proposed,
but the test is too silly to test a bug relying on a bug.
(\0 in question is outside string...)

SADAHIRO Tomoyuki

Inline Patch
diff -ur perl~patch26045/doop.c perl/doop.c
--- perl~patch26045/doop.c	Mon Oct 31 19:55:18 2005
+++ perl/doop.c	Wed Nov 09 02:03:13 2005
@@ -1174,7 +1174,7 @@
     }
     else if (SvOK(sv) || SvTYPE(sv) > SVt_PVMG) {
 	dc = SvPV_force_nomg_nolen(sv);
-	if (SvCUR(sv) < (STRLEN)len) {
+	if (SvLEN(sv) < (STRLEN)(len + 1)) {
 	    dc = SvGROW(sv, (STRLEN)(len + 1));
 	    (void)memzero(dc + SvCUR(sv), len - SvCUR(sv) + 1);
 	}
@@ -1303,6 +1303,7 @@
 	case OP_BIT_AND:
 	    while (len--)
 		*dc++ = *lc++ & *rc++;
+	    *dc = '\0';
 	    break;
 	case OP_BIT_XOR:
 	    while (len--)
diff -ur perl~patch26045/t/op/bop.t perl/t/op/bop.t
--- perl~patch26045/t/op/bop.t	Wed Dec 22 06:00:08 2004
+++ perl/t/op/bop.t	Wed Nov 09 01:40:34 2005
@@ -15,7 +15,7 @@
 # If you find tests are failing, please try adding names to tests to track
 # down where the failure is, and supply your new names as a patch.
 # (Just-in-time test naming)
-plan tests => 146;
+plan tests => 148;
 
 # numerics
 ok ((0xdead & 0xbeef) == 0x9ead);
@@ -328,4 +328,15 @@
 SKIP: {
   skip "No malloc wrap checks" unless $Config::Config{usemallocwrap};
   like( runperl(prog => 'eval q($#a>>=1); print 1'), "^1\n?" );
+}
+
+# [perl #37616] Bug in &= (string) and/or m//
+{
+    $a = "aa";
+    $a &= "a";
+    ok($a =~ /a+$/, 'ASCII "a" is NUL-terminated');
+
+    $b = "bb\x{100}";
+    $b &= "b";
+    ok($b =~ /b+$/, 'Unicode "b" is NUL-terminated');
 }
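
The Perl-level semantics that the OP_BIT_AND loop implements can be modelled in a few lines of Perl. This is a hedged sketch, not the C code from the patch: `model_and` is a hypothetical helper that ANDs two byte strings position by position and stops at the shorter length. The internal terminating NUL added by the patch has no Perl-level counterpart, so it does not appear here.

```perl
use strict;
use warnings;

# Hypothetical helper: bytewise AND of two strings, truncated to the
# shorter operand, mirroring the documented behaviour of string &.
sub model_and {
    my ($left, $right) = @_;
    my $len = length($left) < length($right) ? length($left) : length($right);
    my $out = '';
    for my $i (0 .. $len - 1) {
        # ord() yields numbers, so & is a numeric AND on one byte here.
        $out .= chr( ord(substr($left, $i, 1)) & ord(substr($right, $i, 1)) );
    }
    return $out;
}

# Compare the model with the builtin operator on a few operand pairs.
for my $pair ( [ 'aa', 'a' ], [ 'AB', 'ab' ], [ 'xyz', 'a' ] ) {
    my ($l, $r) = @$pair;
    my $same = model_and($l, $r) eq ($l & $r);
    printf "%-3s & %-2s : %s\n", $l, $r, $same ? 'model agrees' : 'model differs';
}
```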

p5pRT commented Nov 8, 2005

From @Abigail

On Tue, Nov 08, 2005 at 09:25:35PM +0900, SADAHIRO Tomoyuki wrote:

On Mon, 7 Nov 2005 23:15:14 +0100, Abigail <abigail@abigail.nl> wrote

It not only makes sense, it's also documented to do it this way:

   If the operands to a binary bitwise op are strings of
   different sizes, | and ^ ops act as though the shorter
   operand had additional zero bits on the right, while the &
   op acts as though the longer operand were truncated to the
   length of the shorter.

From "Bitwise String Operators" in the "perlop" manual page.

Does this part of perlop just mention that ("a" | "xyz") is same
as ("a\0\0" | "xyz") while ("a" & "xyz") is same as ("a" & "x")?
(see also "ASCII-based examples" following the part)

Yes, I think it does.

I don't think "additional zero bits" here mean a NUL character
which a C string is terminated with.

But a NUL character is 8 zero bits. And the next line of the text
I quoted is:

  The granularity for such extension or truncation is one or more bytes.

Say, another document, perlguts, mentions as cited below:

All SVs that contain strings should be terminated with a NUL
character. If it is not NUL-terminated there is a risk of
core dumps and corruptions from code which passes the string
to C functions or system calls which expect a NUL-terminated string.
Perl's own functions typically add a trailing NUL for this reason.
Nevertheless, you should be very careful when you pass a string
stored in an SV to a C function or system call.

Thus I think & operation should add NUL character always.

Yes. But we're talking about two different additions of NUL characters.
The ones described in 'perlop' are visible at the Perl level. The
terminating NUL all strings should end with isn't visible at the Perl
level - that's an internals thingy.

Abigail

p5pRT commented Nov 8, 2005

From BQW10602@nifty.com

On Tue, 8 Nov 2005 21:56:36 +0100, Abigail <abigail@abigail.nl> wrote

On Tue, Nov 08, 2005 at 09:25:35PM +0900, SADAHIRO Tomoyuki wrote:

Say, another document, perlguts, mentions as cited below:

All SVs that contain strings should be terminated with a NUL
character. If it is not NUL-terminated there is a risk of
core dumps and corruptions from code which passes the string
to C functions or system calls which expect a NUL-terminated string.
Perl's own functions typically add a trailing NUL for this reason.
Nevertheless, you should be very careful when you pass a string
stored in an SV to a C function or system call.

Thus I think & operation should add NUL character always.

Yes. But we're talking about two different additions of NUL characters.
The ones described in 'perlop' are visible at the Perl level. The
terminating NUL all strings should end with isn't visible at the Perl
level - that's an internals thingy.

Abigail

Yes. I also think perlop is correct, and I don't intend to change it.

The NUL character that I say should be added is
only "an internals thingy."

SADAHIRO Tomoyuki

p5pRT commented Nov 15, 2005

From @rgs

SADAHIRO Tomoyuki wrote:

This internal limitation bothers me; but anyway, the simplest fix
is there to ensure a \0 is appended at the end of the PV buffer.

Here is a patch proposed,
but the test is too silly to test a bug relying on a bug.
(\0 in question is outside string...)

Thanks, applied as change 26136.

diff -ur perl~patch26045/doop.c perl/doop.c
--- perl~patch26045/doop.c Mon Oct 31 19:55:18 2005

p5pRT closed this as completed Nov 15, 2005

p5pRT commented Nov 15, 2005

@rgs - Status changed from 'open' to 'resolved'

p5pRT commented Dec 8, 2005

From @ysth

On Sat, Nov 05, 2005 at 05:20:20PM -0800, Anno Siegel wrote:

The sequence

 my $str = 'aa';
 $str &= 'a';
 $str =~ /a+$/ or die;

dies, showing that the match fails while it obviously shouldn't. It turns
out that &= returns a string without a trailing zero. The regex engine
appears to rely on the trailing zero, which it shouldn't. "use bytes" makes
no difference. The behavior is the same with perl-5.9.2. Test appended.

?? If it dies, the match is succeeding. And it succeeds for me from 5.6.2
to 5.9.3. But /^a$/ fails!

I would have expected the &= to leave $str set to the 2 characters:
"a\0", but it seems that stringwise & returns something with the
length of the shorter operand. This makes some kind of sense.

But /^a$/ failing when $str eq "a" is true is obviously a bug.
