List::Util reduce returns wrong answer because length(string) returns wrong value #13885

Closed
p5pRT opened this issue May 30, 2014 · 16 comments

Comments

@p5pRT

p5pRT commented May 30, 2014

Migrated from rt.perl.org#121992 (status was 'rejected')

Searchable as RT121992$

@p5pRT
Author

p5pRT commented May 30, 2014

From @jikamens

Created by @jikamens

This is a bug report for perl from jik@kamens.us,
generated with the help of perlbug 1.39 running under perl 5.18.2.

-----------------------------------------------------------------

This is a weird bug for which I'm unfortunately not going to be able
to produce code that illustrates it, because it appears to be some
weird memory-related bug that I can only get to occur in my specific
script with my specific input data, which I can't give
you. Nevertheless, I will describe it in as much detail as I can and I
hope that will be enough for you to be able to figure out what's going
on.

I want to use reduce in List::Util to find the longest string in a
list of names. Here's what I'm doing:

  my $longest = reduce { length($a) > length($b) ? $a : $b } @names;

At one point, my script executes this line of code with @names set to
qw(Aaaaaa Aaaaaa-Bbbbbbb Bbbbbbb), and gets the wrong answer:

main::envelope_names(/home/jik/foo/to_sql.pl:305):
305: my $longest = reduce { length($a) > length($b) ? $a : $b } @names;
  DB<2> p ">@names<"
>Aaaaaa Aaaaaa-Bbbbbbb Bbbbbbb<
  DB<3> n
main::envelope_names(/home/jik/foo/to_sql.pl:306):
306: @names = grep($_ ne $longest, @names);
  DB<3> p $longest
Bbbbbbb
  DB<4>

Wow, that's exceedingly weird, don't you think? If I modify the reduce
block so that before returning a result, it prints its two input
strings and their lengths, we see something very bizarre:

main::envelope_names(/home/jik/foo/to_sql.pl:305):
305: my $longest = reduce { my $alen = length($a); my $blen = length($b); print "$a ($alen) $b ($blen)\n"; length($a) > length($b) ? $a : $b } @names;
  DB<2> p ">@names<"
>Aaaaaa Aaaaaa-Bbbbbbb Bbbbbbb<
  DB<3> n
Aaaaaa (6) Aaaaaa-Bbbbbbb (14)
Aaaaaa-Bbbbbbb (6) Bbbbbbb (7)
main::envelope_names(/home/jik/foo/to_sql.pl:306):
306: @names = grep($_ ne $longest, @names);
  DB<3>

Note how the first time the reduce block is called, it correctly
reports that the length of "Aaaaaa-Bbbbbbb" is 14, and the second time
it is called, it reports that the length of the same string is only
6!!

There appears to be something manifestly bizarre about the particular
perl object containing the string Aaaaaa-Bbbbbbb. If I call the
&envelope_names function with exactly the same list of names, but this
time typed by hand, it works:

  DB<<21>> p &envelope_names("Aaaaaa", "Aaaaaa-Bbbbbbb", "Bbbbbbb")
Aaaaaa (6) Aaaaaa-Bbbbbbb (14)
Aaaaaa-Bbbbbbb (14) Bbbbbbb (7)
Aaaaaa-Bbbbbbb

NOTE: The strings in question aren't actually "Aaaaaa",
"Aaaaaa-Bbbbbbb" and "Bbbbbbb". The real strings are people's names in
my data, and it would violate their privacy to post their names in a
public bug report, so I've replaced them with generic strings of the
same length. They're regular ASCII characters, so I really don't think
this makes a difference to the bug report.

Perl Info

Flags:
    category=library
    severity=high
    module=List::Util

Site configuration information for perl 5.18.2:

Configured by Red Hat, Inc. at Tue Jan  7 14:45:19 UTC 2014.

Summary of my perl5 (revision 5 version 18 subversion 2) configuration:
   
  Platform:
    osname=linux, osvers=3.11.9-200.fc19.x86_64, archname=x86_64-linux-thread-multi
    uname='linux buildvm-12.phx2.fedoraproject.org 3.11.9-200.fc19.x86_64 #1 smp wed nov 20 21:22:24 utc 2013 x86_64 x86_64 x86_64 gnulinux '
    config_args='-des -Doptimize=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches  -m64 -mtune=generic -Dccdlflags=-Wl,--enable-new-dtags -Dlddlflags=-shared -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches  -m64 -mtune=generic -Wl,-z,relro  -Dshrpdir=/usr/lib64 -DDEBUGGING=-g -Dversion=5.18.2 -Dmyhostname=localhost -Dperladmin=root@localhost -Dcc=gcc -Dcf_by=Red Hat, Inc. -Dprefix=/usr -Dvendorprefix=/usr -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl5 -Dsitearch=/usr/local/lib64/perl5 -Dprivlib=/usr/share/perl5 -Dvendorlib=/usr/share/perl5/vendor_perl -Darchlib=/usr/lib64/perl5 -Dvendorarch=/usr/lib64/perl5/vendor_perl -Darchname=x86_64-linux-thread-multi -Dlibpth=/usr/local/lib64 /lib64 /usr/lib64 -Duseshrplib -Dusethreads -Duseithreads -Dusedtrace=/usr/bin/dtrace -Duselargefiles -Dd_semctl_semun -Di_db -Ui_ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Duseperlio -Dinstallusrbinperl=n -Ubincompat5005 -Uversiononly -Dpager=/usr/bin/less -isr -Dd_gethostent_r_proto -Ud_endhostent_r_proto -Ud_sethostent_r_proto -Ud_endprotoent_r_proto -Ud_setprotoent_r_proto -Ud_endservent_r_proto -Ud_setservent_r_proto -Dscriptdir=/usr/bin -Dusesitecustomize'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
    ccversion='', gccversion='4.8.2 20131212 (Red Hat 4.8.2-7)', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='gcc', ldflags =' -fstack-protector'
    libpth=/usr/local/lib64 /lib64 /usr/lib64
    libs=-lresolv -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc -lgdbm_compat
    perllibs=-lresolv -lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
    libc=, so=so, useshrplib=true, libperl=libperl.so
    gnulibc_version='2.18'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,--enable-new-dtags'
    cccdlflags='-fPIC', lddlflags='-shared -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -Wl,-z,relro '

Locally applied patches:
    Fedora Patch1: Removes date check, Fedora/RHEL specific
    Fedora Patch3: support for libdir64
    Fedora Patch4: use libresolv instead of libbind
    Fedora Patch5: USE_MM_LD_RUN_PATH
    Fedora Patch6: Skip hostname tests, due to builders not being network capable
    Fedora Patch7: Dont run one io test due to random builder failures
    Fedora Patch9: Fix find2perl to translate ? glob properly (RT#113054)
    Fedora Patch10: Update h2ph(1) documentation (RT#117647)
    Fedora Patch11: Update pod2html(1) documentation (RT#117623)
    Fedora Patch12: Disable ornaments on perl5db AutoTrace tests (RT#118817)
    Fedora Patch14: Do not use system Term::ReadLine::Gnu in tests (RT#118821)
    Fedora Patch15: Define SONAME for libperl.so
    Fedora Patch16: Install libperl.so to -Dshrpdir value
    Fedora Patch18: Fix crash with \\&$glob_copy (RT#119051)
    Fedora Patch19: Fix coreamp.t rand test (RT#118237)
    Fedora Patch20: Reap child in case where exception has been thrown (RT#114722)
    Fedora Patch21: Fix using regular expressions containing multiple code blocks (RT#117917)
    Fedora Patch200: Link XS modules to libperl.so with EU::CBuilder on Linux
    Fedora Patch201: Link XS modules to libperl.so with EU::MM on Linux


@INC for perl 5.18.2:
    /usr/local/lib64/perl5
    /usr/local/share/perl5
    /usr/lib64/perl5/vendor_perl
    /usr/share/perl5/vendor_perl
    /usr/lib64/perl5
    /usr/share/perl5
    .


Environment for perl 5.18.2:
    HOME=/home/jik
    LANG=en_US.utf8
    LANGUAGE (unset)
    LC_COLLATE=C
    LC_MEASUREMENT=en_US.utf8
    LC_MONETARY=en_US.utf8
    LC_NUMERIC=en_US.utf8
    LC_PAPER=en_US.utf8
    LC_TIME=en_US.utf8
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/jik/bin:/home/jik/scripts:/usr/games:/usr/lib64/qt-3.3/bin:/usr/lib64/ccache:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin
    PERL_BADLANG (unset)
    SHELL=/bin/bash

@p5pRT
Author

p5pRT commented May 31, 2014

From @ikegami

On Fri, May 30, 2014 at 1:31 PM, via RT <perlbug-followup@perl.org> wrote:

> main::envelope_names(/home/jik/foo/to_sql.pl:305):
> 305: my $longest = reduce { my $alen = length($a); my $blen =
> length($b); print "$a ($alen) $b ($blen)\n"; length($a) > length($b) ? $a :
> $b } @names;
> DB<2> p ">@names<"
> >Aaaaaa Aaaaaa-Bbbbbbb Bbbbbbb<
> DB<3> n
> Aaaaaa (6) Aaaaaa-Bbbbbbb (14)
> Aaaaaa-Bbbbbbb (6) Bbbbbbb (7)
> main::envelope_names(/home/jik/foo/to_sql.pl:306):
> 306: @names = grep($_ ne $longest, @names);
> DB<3>
>
> Note how the first time the reduce block is called, it correctly
> reports that the length of "Aaaaaa-Bbbbbbb" is 14, and the second time
> it is called, it reports that the length of the same string is only
> 6!!

Looks like a missing mg_get

@p5pRT
Author

p5pRT commented May 31, 2014

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Author

p5pRT commented May 31, 2014

From @ikegami

On Fri, May 30, 2014 at 8:19 PM, Eric Brine <ikegami@adaelis.com> wrote:

> On Fri, May 30, 2014 at 1:31 PM, via RT <perlbug-followup@perl.org> wrote:
>
>> main::envelope_names(/home/jik/foo/to_sql.pl:305):
>> 305: my $longest = reduce { my $alen = length($a); my $blen =
>> length($b); print "$a ($alen) $b ($blen)\n"; length($a) > length($b) ? $a :
>> $b } @names;
>> DB<2> p ">@names<"
>> >Aaaaaa Aaaaaa-Bbbbbbb Bbbbbbb<
>> DB<3> n
>> Aaaaaa (6) Aaaaaa-Bbbbbbb (14)
>> Aaaaaa-Bbbbbbb (6) Bbbbbbb (7)
>> main::envelope_names(/home/jik/foo/to_sql.pl:306):
>> 306: @names = grep($_ ne $longest, @names);
>> DB<3>
>>
>> Note how the first time the reduce block is called, it correctly
>> reports that the length of "Aaaaaa-Bbbbbbb" is 14, and the second time
>> it is called, it reports that the length of the same string is only
>> 6!!
>
> Looks like a missing mg_get

If so, you should be able to work around the problem using

  reduce { ... } map "$_", @names;

@p5pRT
Author

p5pRT commented Jun 1, 2014

From @jikamens

On 05/30/2014 08:21 PM, Eric Brine via RT wrote:

> If so, you should be able to work around the problem using
>
>   reduce { ... } map "$_", @names;

This workaround does not work, i.e., the same problem occurs even when I
do this.

@p5pRT
Author

p5pRT commented Jun 1, 2014

From @ikegami

On Sat, May 31, 2014 at 10:51 PM, Jonathan Kamens <jik@kamens.us> wrote:

> On 05/30/2014 08:21 PM, Eric Brine via RT wrote:
>
>> If so, you should be able to work around the problem using
>>
>>   reduce { ... } map "$_", @names;
>
> This workaround does not work, i.e., the same problem occurs even when I
> do this.

And I wasn't able to reproduce the problem using a tied variable. Are you
able to provide a dump of the variable using Devel::Peek's Dump?
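
For anyone unsure how to capture such a dump, here is a minimal sketch. It assumes placeholder names in place of the real data and calls Dump on $a inside the reduce block; Devel::Peek writes its output to STDERR.

```perl
use strict;
use warnings;

use List::Util  qw( reduce );
use Devel::Peek qw( Dump );

# Placeholder data; substitute the list that triggers the problem.
my @names = qw( Aaaaaa Aaaaaa-Bbbbbbb Bbbbbbb );

my $longest = reduce {
    Dump($a);    # prints the internals of the current accumulator to STDERR
    length($a) > length($b) ? $a : $b;
} @names;

print "$longest\n";
```

With these plain ASCII placeholders the dump will probably show no magic at all; the interesting fields (FLAGS, MAGIC, MG_LEN) should only show up with the data that actually misbehaves.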

@p5pRT
Author

p5pRT commented Jun 2, 2014

From @jikamens

SV = PVMG(0x23a59c0) at 0x2870310
  REFCNT = 1
  FLAGS = (TEMP,SMG,POK,pPOK,UTF8)
  IV = 0
  NV = 0
  PV = 0x281cc10 "Aaaaaa-Bbbbbbb"\0 [UTF8 "Aaaaaa-Bbbbbbb"]
  CUR = 14
  LEN = 16
  MAGIC = 0x2831d90
  MG_VIRTUAL = &PL_vtbl_utf8
  MG_TYPE = PERL_MAGIC_utf8(w)
  MG_LEN = 6

@p5pRT
Author

p5pRT commented Jun 3, 2014

From @demerphq

On 2 June 2014 06:34, Jonathan Kamens <jik@kamens.us> wrote:

> SV = PVMG(0x23a59c0) at 0x2870310
> REFCNT = 1
> FLAGS = (TEMP,SMG,POK,pPOK,UTF8)
> IV = 0
> NV = 0
> PV = 0x281cc10 "Aaaaaa-Bbbbbbb"\0 [UTF8 "Aaaaaa-Bbbbbbb"]
> CUR = 14
> LEN = 16
> MAGIC = 0x2831d90
> MG_VIRTUAL = &PL_vtbl_utf8
> MG_TYPE = PERL_MAGIC_utf8(w)
> MG_LEN = 6

MG_LEN is clearly broken.

Can you do it again before and after the reduce call?

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Author

p5pRT commented Jun 3, 2014

From @ikegami

On Tue, Jun 3, 2014 at 4:26 AM, demerphq <demerphq@gmail.com> wrote:

> On 2 June 2014 06:34, Jonathan Kamens <jik@kamens.us> wrote:
>
>> SV = PVMG(0x23a59c0) at 0x2870310
>> REFCNT = 1
>> FLAGS = (TEMP,SMG,POK,pPOK,UTF8)
>> IV = 0
>> NV = 0
>> PV = 0x281cc10 "Aaaaaa-Bbbbbbb"\0 [UTF8 "Aaaaaa-Bbbbbbb"]
>> CUR = 14
>> LEN = 16
>> MAGIC = 0x2831d90
>> MG_VIRTUAL = &PL_vtbl_utf8
>> MG_TYPE = PERL_MAGIC_utf8(w)
>> MG_LEN = 6
>
> MG_LEN is clearly broken.

Code to reproduce:

use strict;
use warnings;

use List::Util qw( reduce );
use Devel::Peek qw( Dump );

my @names = qw( Aaaaaa Aaaaaa-Bbbbbbb Bbbbbbb );
for (@names) {
  utf8::upgrade($_);
  no warnings 'void';
  length($_);
}

my $longest = reduce { my $alen = length($a); my $blen = length($b);
length($a) > length($b) ? $a : $b } @names;
print("$longest\n");

> Can you do it again before and after the reduce call?

The C<MG_LEN> associated with the elements of C<@names> is correct both
before and after the call to C<reduce>.

@p5pRT
Author

p5pRT commented Jun 4, 2014

From @jikamens

Here's a Dump of the relevant member of the list before the reduce, both
the list member and $a during the reduce, and the relevant member of the
list again after the reduce:

Before list member:
SV = PV(0x43743c0) at 0x436a000
  REFCNT = 1
  FLAGS = (POK,pPOK,UTF8)
  PV = 0x4378e90 "Aaaaaa-Bbbbbbb"\0 [UTF8 "Aaaaaa-Bbbbbbb"]
  CUR = 14
  LEN = 16
During $a:
SV = PVMG(0x3c50400) at 0x4347520
  REFCNT = 1
  FLAGS = (TEMP,SMG,POK,pPOK,UTF8)
  IV = 0
  NV = 0
  PV = 0x43155e0 "Aaaaaa-Bbbbbbb"\0 [UTF8 "Aaaaaa-Bbbbbbb"]
  CUR = 14
  LEN = 16
  MAGIC = 0x4385e10
  MG_VIRTUAL = &PL_vtbl_utf8
  MG_TYPE = PERL_MAGIC_utf8(w)
  MG_LEN = 6
During list member:
SV = PVMG(0x3c501c0) at 0x436a000
  REFCNT = 1
  FLAGS = (SMG,POK,pPOK,UTF8)
  IV = 0
  NV = 0
  PV = 0x4378e90 "Aaaaaa-Bbbbbbb"\0 [UTF8 "Aaaaaa-Bbbbbbb"]
  CUR = 14
  LEN = 16
  MAGIC = 0x43800b0
  MG_VIRTUAL = &PL_vtbl_utf8
  MG_TYPE = PERL_MAGIC_utf8(w)
  MG_LEN = 14
After list member:
SV = PVMG(0x3c501c0) at 0x436a000
  REFCNT = 1
  FLAGS = (SMG,POK,pPOK,UTF8)
  IV = 0
  NV = 0
  PV = 0x4378e90 "Aaaaaa-Bbbbbbb"\0 [UTF8 "Aaaaaa-Bbbbbbb"]
  CUR = 14
  LEN = 16
  MAGIC = 0x43800b0
  MG_VIRTUAL = &PL_vtbl_utf8
  MG_TYPE = PERL_MAGIC_utf8(w)
  MG_LEN = 14

On 06/03/2014 04:27 AM, yves orton via RT wrote:

> On 2 June 2014 06:34, Jonathan Kamens <jik@kamens.us> wrote:
>
>> SV = PVMG(0x23a59c0) at 0x2870310
>> REFCNT = 1
>> FLAGS = (TEMP,SMG,POK,pPOK,UTF8)
>> IV = 0
>> NV = 0
>> PV = 0x281cc10 "Aaaaaa-Bbbbbbb"\0 [UTF8 "Aaaaaa-Bbbbbbb"]
>> CUR = 14
>> LEN = 16
>> MAGIC = 0x2831d90
>> MG_VIRTUAL = &PL_vtbl_utf8
>> MG_TYPE = PERL_MAGIC_utf8(w)
>> MG_LEN = 6
>
> MG_LEN is clearly broken.
>
> Can you do it again before and after the reduce call?
>
> Yves

@p5pRT
Author

p5pRT commented Jun 4, 2014

From @iabyn

I can reproduce the issue on blead now.

  use List::Util qw(reduce);
  @names = ("a\x{100}c", "d\x{101}efgh", 'ijk');
  my $longest = reduce { length($a) > length($b) ? $a : $b } @names;

with a debugging build gives

  $ ./perl -Ilib ~/tmp/p
  Wide character in length at /home/davem/tmp/p line 5.
  panic: sv_len_utf8 cache 3 real 6 for dāefgh at /home/davem/tmp/p line 5.

clearly the utf8 cached length in magic is getting corrupted.

It's been present since 5.10.0.
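
A rough way to check whether an installed List::Util shows the symptom, using the same three strings. This is only a sketch: the "ok"/"affected" labels are made up for illustration, and a debugging build would be expected to panic instead of reaching the final print.

```perl
use strict;
use warnings;

use List::Util qw( reduce );

# Same data as above: two strings containing wide characters and one plain one.
my @names = ("a\x{100}c", "d\x{101}efgh", "ijk");

my $longest = reduce { length($a) > length($b) ? $a : $b } @names;

# The 6-character string should win; any other result suggests that a
# stale cached length was used during the comparison.
my $status = length($longest) == 6 ? "ok" : "affected";

print "List::Util " . $List::Util::VERSION . ": $status\n";
```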

--
Red sky at night - gerroff my land!
Red sky at morning - gerroff my land!
  -- old farmers' sayings #14

@p5pRT
Author

p5pRT commented Jun 17, 2014

From @tonycoz

On Wed Jun 04 03:19:07 2014, davem wrote:

> I can reproduce the issue on blead now.
>
> use List::Util qw(reduce);
> @names = ("a\x{100}c", "d\x{101}efgh", 'ijk');
> my $longest = reduce { length($a) > length($b) ? $a : $b } @names;
>
> with a debugging build gives
>
> $ ./perl -Ilib ~/tmp/p
> Wide character in length at /home/davem/tmp/p line 5.
> panic: sv_len_utf8 cache 3 real 6 for dāefgh at /home/davem/tmp/p line
> 5.
>
> clearly the utf8 cached length in magic is getting corrupted.
>
> It's been present since 5.10.0.

This is a bug in List::Util::reduce().

It uses SvSetSV() to copy SVs around, which doesn't call set magic.

The sequence in Dave's sample would be:

  args[1] copied into return temp (no magic), aliased into $a
  args[2] aliased into $b
  callback called, which calls length on both, setting their length magic
  return value (which is $b) copied into return temp (no magic),
  leaving the length 3 magic
  args[3] aliased into $b
  callback called, which calls length on both, asserting since the temp has
  the 3 length magic but is 6 long

Replacing the 3 SvSetSV() calls with SvSetMagicSV() fixes the problem.
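
For scripts that can't pick up a fixed List::Util, one possible Perl-level workaround is to skip reduce for this particular job and use a plain loop, which only does ordinary assignments. A minimal sketch follows; the helper name longest_string is hypothetical, and on ties it prefers the later element, the same as the reduce block in the report.

```perl
use strict;
use warnings;

# Hypothetical helper: return the longest string in the list, preferring
# the later element when two strings have the same length.
sub longest_string {
    my @strings = @_;
    my $longest;
    for my $s (@strings) {
        $longest = $s
            if !defined $longest || length($s) >= length($longest);
    }
    return $longest;
}

my @names   = ("a\x{100}c", "d\x{101}efgh", "ijk");
my $longest = longest_string(@names);

print length($longest), "\n";    # prints 6
```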

Tony

@p5pRT
Author

p5pRT commented Jun 17, 2014

From @leonerd

On Mon, 16 Jun 2014 17:06:46 -0700
"Tony Cook via RT" <perlbug-followup@perl.org> wrote:

> It uses SvSetSV() to copy SVs around, which doesn't call set magic.
> ...
> Replacing the 3 SvSetSV() calls with SvSetMagicSV() fixes the problem.

Can you throw me that in an 'rt' bug and I'll remember to fix it...

--
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk
http://www.leonerd.org.uk/ | https://metacpan.org/author/PEVANS

@p5pRT
Author

p5pRT commented Jun 17, 2014

From @tonycoz

On Mon Jun 16 17:06:46 2014, tonyc wrote:

> This is a bug in List::Util::reduce().

I think it may already have been reported upstream as

https://rt.cpan.org/Public/Bug/Display.html?id=63211

Tony

@p5pRT
Author

p5pRT commented Oct 8, 2014

From @tonycoz

On Mon Jun 16 17:36:52 2014, tonyc wrote:

> On Mon Jun 16 17:06:46 2014, tonyc wrote:
>
>> This is a bug in List::Util::reduce().
>
> I think it may already have been reported upstream as
>
> https://rt.cpan.org/Public/Bug/Display.html?id=63211

Since this is an upstream bug, I'm rejecting this ticket.

From the note on the CPAN ticket, this will be fixed in the next release of Scalar-List-Utils.

Tony

@p5pRT
Author

p5pRT commented Oct 8, 2014

@tonycoz - Status changed from 'open' to 'rejected'
