UTF8 caching length #12536

Open
p5pRT opened this issue Nov 8, 2012 · 7 comments

p5pRT commented Nov 8, 2012

Migrated from rt.perl.org#115638 (status was 'open')

Searchable as RT115638$

p5pRT commented Nov 8, 2012

From arnon@back2front.ca

Created by arnon@back2front.ca

Using the UTF-8 input layer, string length appears to be cached in some situations. I haven't narrowed it down exactly, but the following script reproduces the problem very reliably:

#!/usr/bin/perl

use open ( ":encoding(UTF-8)", ":std" );
#${^UTF8CACHE} = 0;

my $tst_file = "substr.tst";

if ( fork() )
{
  foreach ( 1 .. 10 )
  {
    my @string = ( substr ( `cat '$tst_file' 2>/dev/null`, 0, -1 ),
                   length ( `cat '$tst_file' 2>/dev/null` ),
                   substr ( "" . `cat '$tst_file' 2>/dev/null`, 0, -1 ),
                   length ( "" . `cat '$tst_file' 2>/dev/null` ) );

    print "DEBUG: " . join ( ", ", @string ) . "\n";
    sleep ( 1 );
  }
}
else
{
  foreach ( 1 .. 10 )
  {
    system ( "echo 12345 > '$tst_file'" );
    sleep ( 1 );
    unlink ( $tst_file );
    sleep ( 1 );
  }
}

Expected result:
DEBUG: 12345, 6, 12345, 6
DEBUG: , 0, , 0

Actual result:
DEBUG: 12345, 6, 12345, 6
DEBUG: , 6, , 0

The problem goes away when either the use open line is commented out or the ${^UTF8CACHE} = 0 line is uncommented.
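
For reference, the second workaround can be sketched on its own. `${^UTF8CACHE}` is the perlvar switch for the internal UTF-8 offset cache (1 is on, 0 is off, -1 cross-checks every cached result). The file name and the `slurp_or_empty` helper below are made up for illustration, and the sketch reads the file with open/readline rather than backticks, so it only shows the switch in use and does not attempt to reproduce the bug.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use open ( ":encoding(UTF-8)", ":std" );

# Turn the internal UTF-8 length/offset cache off (see perlvar).
${^UTF8CACHE} = 0;

# Hypothetical scratch file, not the one used in the report.
my $tst_file = "utf8cache_demo.tst";

# Hypothetical helper: return the file's contents, or "" if it is missing.
sub slurp_or_empty {
    my ($path) = @_;
    open my $fh, "<", $path or return "";
    local $/;                      # slurp mode
    my $data = <$fh>;
    close $fh;
    return defined $data ? $data : "";
}

# Write "12345\n" and measure it while the file exists.
open my $out, ">", $tst_file or die "cannot write $tst_file: $!";
print {$out} "12345\n";
close $out;
my $len_present = length slurp_or_empty($tst_file);

# Remove the file and measure again; nothing stale should be reported.
unlink $tst_file;
my $len_missing = length slurp_or_empty($tst_file);

print "DEBUG: $len_present, $len_missing\n";
```

Disabling the cache trades some speed on long UTF-8 strings for always recomputed lengths, so it is a workaround rather than a fix.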

Perl Info

Flags:
    category=core
    severity=low

This perlbug was built using Perl 5.14.2 in the Fedora build system.
It is being executed now by Perl 5.14.2 - Fri Sep 14 12:05:25 UTC 2012.

Site configuration information for perl 5.14.2:

Configured by Red Hat, Inc. at Fri Sep 14 12:05:25 UTC 2012.

Summary of my perl5 (revision 5 version 14 subversion 2) configuration:
   
  Platform:
    osname=linux, osvers=2.6.32-279.2.1.el6.x86_64, archname=x86_64-linux-thread-multi
    uname='linux buildvm-26.phx2.fedoraproject.org 2.6.32-279.2.1.el6.x86_64 #1 smp thu jul 5 21:08:58 edt 2012 x86_64 x86_64 x86_64 gnulinux '
    config_args='-des -Doptimize=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4  -m64 -mtune=generic -Dccdlflags=-Wl,--enable-new-dtags -Dlddlflags=-shared -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4  -m64 -mtune=generic -Wl,-z,relro  -DDEBUGGING=-g -Dversion=5.14.2 -Dmyhostname=localhost -Dperladmin=root@localhost -Dcc=gcc -Dcf_by=Red Hat, Inc. -Dprefix=/usr -Dvendorprefix=/usr -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl5 -Dsitearch=/usr/local/lib64/perl5 -Dprivlib=/usr/share/perl5 -Dvendorlib=/usr/share/perl5/vendor_perl -Darchlib=/usr/lib64/perl5 -Dvendorarch=/usr/lib64/perl5/vendor_perl -Darchname=x86_64-linux-thread-multi -Dlibpth=/usr/local/lib64 /lib64 /usr/lib64 -Duseshrplib -Dusethreads -Duseithreads -Dusedtrace=/usr/bin/dtrace -Duselargefiles -Dd_semctl_semun -Di_db -Ui_ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Duseperlio -Dinstallusrbinperl=n -Ubincompat5005 -Uversiononly -Dpager=/usr/bin/less -isr -Dd_gethostent_r_proto -Ud_endhostent_r_proto -Ud_sethostent_r_proto -Ud_endprotoent_r_proto -Ud_setprotoent_r_proto -Ud_endservent_r_proto -Ud_setservent_r_proto -Dscriptdir=/usr/bin'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
    ccversion='', gccversion='4.7.0 20120507 (Red Hat 4.7.0-5)', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='gcc', ldflags =' -fstack-protector'
    libpth=/usr/local/lib64 /lib64 /usr/lib64
    libs=-lresolv -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc -lgdbm_compat
    perllibs=-lresolv -lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
    libc=, so=so, useshrplib=true, libperl=libperl.so
    gnulibc_version='2.15'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,--enable-new-dtags -Wl,-rpath,/usr/lib64/perl5/CORE'
    cccdlflags='-fPIC', lddlflags='-shared -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -Wl,-z,relro '

Locally applied patches:
    


@INC for perl 5.14.2:
    /usr/local/lib64/perl5
    /usr/local/share/perl5
    /usr/lib64/perl5/vendor_perl
    /usr/share/perl5/vendor_perl
    /usr/lib64/perl5
    /usr/share/perl5
    .


Environment for perl 5.14.2:
    HOME=/home/arnon
    LANG=en_US.UTF-8
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=.:/home/arnon/applications:/sbin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/bin:/usr/bin
    PERL_BADLANG (unset)
    SHELL=/bin/ksh

p5pRT commented Nov 9, 2012

From @jkeenan

On Wed Nov 07 16:30:43 2012, thewebsi wrote:

[quoted bug report snipped]

I ran your program more than a dozen times. My results were
inconsistent: Sometimes I got the expected results; sometimes not. For
example, here are two, back-to-back runs of the file.

#####
[p5p] 525 $ perl 115638_utf8cache.pl
DEBUG: 12345, 6, 12345, 6
DEBUG: , 6, , 0
DEBUG: 12345, 6, 12345, 6
DEBUG: , 6, , 0
DEBUG: 12345, 6, 12345, 6
DEBUG: , 6, , 0
DEBUG: 12345, 6, 12345, 6
DEBUG: , 6, , 0
DEBUG: 12345, 6, 12345, 6
DEBUG: , 6, , 0
[p5p] 526 $ perl 115638_utf8cache.pl
DEBUG: 12345, 6, 12345, 0
DEBUG: 12345, 6, 12345, 6
DEBUG: , 6, , 0
DEBUG: 12345, 6, 12345, 6
DEBUG: , 6, , 0
DEBUG: , 6, , 0
DEBUG: 12345, 6, 12345, 6
DEBUG: , 6, , 0
DEBUG: 12345, 6, 12345, 6
DEBUG: , 6, , 0
#####

As you indicated, commenting/uncommenting certain lines made the program
consistently deliver the desired results.

Thank you very much.
Jim Keenan

p5pRT commented Nov 9, 2012

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented Nov 9, 2012

From @iabyn

On Thu, Nov 08, 2012 at 05:25:13PM -0800, James E Keenan via RT wrote:

I ran your program more than a dozen times. My results were
inconsistent: Sometimes I got the expected results; sometimes not. For
example, here are two, back-to-back runs of the file.

With a debugging perl (5.14.2, 5.16.0 and blead) I get assorted output,
but always consistently followed by a panic:

  $ ./perl -Ilib /tmp/p
  DEBUG: , 0, , 6
  DEBUG: , 0, , 0
  panic: sv_len_utf8 cache 0 real 6 for 12345

because in debugging builds, the cached length is always checked against
the real length.
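
The same cross-check can be requested on a non-debugging perl: perlvar documents that setting `${^UTF8CACHE}` to -1 makes perl verify every cached result against a linear scan and panic on a mismatch. A minimal sketch, using a made-up sample string; on a healthy string nothing should panic:

```perl
use strict;
use warnings;

# -1 asks perl to check each cached UTF-8 length/offset against a full scan
# and to panic on any discrepancy (see perlvar).
${^UTF8CACHE} = -1;

# Made-up sample: the code point above 0xFF forces the internal UTF-8 encoding.
my $str = "na\x{ef}ve \x{263A}";

my $len  = length($str);              # number of characters, not bytes
my $char = substr($str, 2, 1);        # third character, U+00EF
my $pos  = index($str, "\x{263A}");   # character offset of U+263A

printf "length=%d third=U+%04X smiley_at=%d\n", $len, ord($char), $pos;
```

Adding `${^UTF8CACHE} = -1;` to the top of the reproduction script should therefore surface the stale cache as a panic rather than as a wrong length, though that has not been verified here.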

--
Any [programming] language that doesn't occasionally surprise the
novice will pay for it by continually surprising the expert.
  -- Larry Wall

p5pRT commented Nov 9, 2012

From @cpansprout

On Fri Nov 09 04:47:31 2012, davem wrote:

On Thu, Nov 08, 2012 at 05:25:13PM -0800, James E Keenan via RT wrote:

I ran your program more than a dozen times. My results were
inconsistent: Sometimes I got the expected results; sometimes not. For
example, here are two, back-to-back runs of the file.

With a debugging perl (5.14.2, 5.16.0 and blead) I get assorted output,
but always consistently followed by a panic:

  $ ./perl -Ilib /tmp/p
  DEBUG: , 0, , 6
  DEBUG: , 0, , 0
  panic: sv_len_utf8 cache 0 real 6 for 12345

because in debugging builds, the cached length is always checked against
the real length.

I haven’t looked at this closely, but I suspect the I/O layer needs to
call SvSETMAGIC on some buffer SV.

--

Father Chrysostomos

p5pRT commented Apr 16, 2019

From @khwilliamson

On Wed, 07 Nov 2012 16:30:43 -0800, thewebsi wrote:

[quoted bug report snipped]

This is still a problem, but instead of printing an erroneous line, it now panics:

panic: sv_len_utf8 cache 6 real 0 for at 115638.pl
--
Karl Williamson

@khwilliamson

Still present in 5.35.10; ASan didn't show a problem.
