UTF8 failure with sprintf () #9636

Closed
p5pRT opened this issue Jan 23, 2009 · 8 comments

Comments

@p5pRT

p5pRT commented Jan 23, 2009

Migrated from rt.perl.org#62666 (status was 'resolved')

Searchable as RT62666$

@p5pRT

p5pRT commented Jan 23, 2009

From hmbrand@cpan.org

Created by hmbrand@cpan.org

--8<--- demo.pl
use strict;
use warnings;
#use Data::Peek;

my $v = "\x{20ac} 12,345.00";
#print DPeek ($v), "\n";
my $x = sprintf "%12.12s|", $v;
-->8---

with Data::Peek for debugging:

PV("\342\202\254 12,345.00"\0) [UTF8 "\x{20ac} 12,345.00"]
panic: utf8_mg_pos_cache_update cache 12 real 11 for ? 12,345.00 at demo.pl line 9.

without:

panic: utf8_mg_pos_cache_update cache 12 real 11 for ? 12,345.00 at demo.pl line 9.

Which is a crash.

Perl Info

Flags:
    category=core
    severity=medium

Site configuration information for perl 5.10.0:

Configured by merijn at Tue Dec 18 13:34:32 CET 2007.

Summary of my perl5 (revision 5 version 10 subversion 0) configuration:
  Platform:
    osname=linux, osvers=2.6.22.13-0.3-default, archname=i686-linux-64int
    uname='linux nb09 2.6.22.13-0.3-default #1 smp 20071119 15:02:58 utc i686 i686 i386 gnulinux '
    config_args='-Duse64bitint -des'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=undef, usemultiplicity=undef
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=undef, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-fno-strict-aliasing -pipe -I/pro/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2 -g',
    cppflags='-fno-strict-aliasing -pipe -I/pro/local/include'
    ccversion='', gccversion='4.2.1 (SUSE Linux)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags ='-L/pro/local/lib'
    libpth=/pro/local/lib /lib /usr/lib /usr/local/lib
    libs=-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc
    perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
    libc=/lib/libc-2.6.1.so, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.6.1'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -O2 -L/pro/local/lib'

Locally applied patches:
    


@INC for perl 5.10.0:
    /pro/lib/perl5/5.10.0/i686-linux-64int
    /pro/lib/perl5/5.10.0
    /pro/lib/perl5/site_perl/5.10.0/i686-linux-64int
    /pro/lib/perl5/site_perl/5.10.0
    .


Environment for perl 5.10.0:
    HOME=/home/merijn
    LANG=en_US.UTF8
    LANGUAGE (unset)
    LOGDIR (unset)
    PERL_BADLANG (unset)
    SHELL=/bin/tcsh

@p5pRT

p5pRT commented May 28, 2009

From @nwc10

Dave notes:

A panic in 5.10.0, maint, blead; but not in 5.8.8 or 5.8.9

@p5pRT

p5pRT commented May 28, 2009

@nwc10 - Status changed from 'new' to 'open'

@p5pRT

p5pRT commented May 29, 2009

From p5p@spam.wizbit.be


Binary search:

----Program----
use strict;
use warnings;
#use Data::Peek;

my $v = "\x{20ac} 12,345.00";
#print DPeek ($v), "\n";
my $x = sprintf "%12.12s|", $v;

----Output of .../p0IhoGv/perl-5.9.4@31245/bin/perl----

----EOF ($?='0')----
----Output of .../proQMr6/perl-5.9.4@31246/bin/perl----
panic: utf8_mg_pos_cache_update cache 12 real 11 for € 12,345.00 at /tmp/rt-62666.pl line 7.

----EOF ($?='65280')----
Need a perl between 31245 and 31246

http://public.activestate.com/cgi-bin/perlbrowse/p/31246
Change 31246 by davem@davem-pigeon on 2007/05/20 23:56:30

  delete unused vars PL_av_fetch_sv, PL_hv_fetch_sv
  and fix 'duplicate symbol' warnings from embed.pl
  for utf8cache and sh_path

The only other difference I can find is that SvLEN is 14 on perl-5.8.7
and SvLEN is 16 on perl-5.8.8 but this looks like an intended and
unrelated change (Change 24665)...

Best regards,

Bram

@p5pRT

p5pRT commented May 30, 2009

From alex@chmrr.net

At Fri Jan 23 15:14:55 -0500 2009, hmbrand@cpan.org (via RT) wrote:

panic: utf8_mg_pos_cache_update cache 12 real 11 for ? 12,345.00 at demo.pl line 9.

Attached is a fix for this. Unfortunately, I don't think we can avoid
the added call to sv_len_utf8 without rewriting sv_pos_u2b_cached.

Thanks to Best Practical for volunteering my time to look at some of
these 5.10.1 blockers.
- Alex
--
Networking -- only one letter away from not working
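
For anyone puzzling over the "cache 12 real 11" numbers, here is a minimal C sketch of the byte/character mismatch. It is not Perl's internal code: `utf8_char_count` and `demo_value` are hypothetical names made up for illustration, and well-formed UTF-8 input is assumed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a Perl API): count the characters in a
 * well-formed UTF-8 buffer by skipping continuation bytes (10xxxxxx). */
static size_t utf8_char_count(const char *s, size_t byte_len)
{
    size_t chars = 0;
    for (size_t i = 0; i < byte_len; i++) {
        if (((unsigned char)s[i] & 0xC0) != 0x80)
            chars++;
    }
    return chars;
}

/* The value from the report: U+20AC (three bytes in UTF-8)
 * followed by the ten ASCII characters " 12,345.00". */
static const char demo_value[] = "\xE2\x82\xAC 12,345.00";
```

With this input, `strlen(demo_value)` is 13 while `utf8_char_count(demo_value, 13)` is 11. A precision of 12 is therefore smaller than the byte length but larger than the character count, which appears to be how a character offset of 12 ended up cached for an 11-character string.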

@p5pRT

p5pRT commented May 30, 2009

From alex@chmrr.net

0001-Fix-RT-6266-sv_pos_u2b-expects-to-be-called-with-a-v.patch
From dcba34fa03ca9da9b27871a19948cd516479c892 Mon Sep 17 00:00:00 2001
From: Alex Vandiver <alexmv@mit.edu>
Date: Fri, 29 May 2009 16:21:22 -0400
Subject: [PATCH] Fix [RT#6266] -- sv_pos_u2b expects to be called with a valid character index

sv_pos_u2b, when utf8 position caching is enabled, treats the uoffset
it is given as real, storing it away for later use.  sprintf, here,
passes the byte length of the string, which causes an invalid offset
to be cached.
---
 sv.c            |    5 +++--
 t/op/sprintf2.t |   18 +++++++++++++++++-
 2 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/sv.c b/sv.c
index a2fd856..a10bbf3 100644
--- a/sv.c
+++ b/sv.c
@@ -9444,7 +9444,8 @@ Perl_sv_vcatpvfn(pTHX_ SV *sv, const char *pat, STRLEN patlen, va_list *args, SV
 		if (DO_UTF8(argsv)) {
 		    I32 old_precis = precis;
 		    if (has_precis && precis < elen) {
-			I32 p = precis;
+			I32 ulen = sv_len_utf8(argsv);
+			I32 p = precis > ulen ? ulen : precis;
 			sv_pos_u2b(argsv, &p, 0); /* sticks at end */
 			precis = p;
 		    }
@@ -9459,7 +9460,7 @@ Perl_sv_vcatpvfn(pTHX_ SV *sv, const char *pat, STRLEN patlen, va_list *args, SV
 	    }
 
 	string:
-	    if (has_precis && elen > precis)
+	    if (has_precis && precis < elen)
 		elen = precis;
 	    break;
 
diff --git a/t/op/sprintf2.t b/t/op/sprintf2.t
index 397c19e..765bf68 100644
--- a/t/op/sprintf2.t
+++ b/t/op/sprintf2.t
@@ -6,7 +6,7 @@ BEGIN {
     require './test.pl';
 }   
 
-plan tests => 1295;
+plan tests => 1344;
 
 is(
     sprintf("%.40g ",0.01),
@@ -139,3 +139,19 @@ foreach my $n (2**1e100, -2**1e100, 2**1e100/2**1e100) { # +Inf, -Inf, NaN
     eval { my $f = sprintf("%f", $n); };
     is $@, "", "sprintf(\"%f\", $n)";
 }
+
+# Check unicode vs byte length
+for my $width (1,2,3,4,5,6,7) {
+    for my $precis (1,2,3,4,5,6,7) {
+        my $v = "\x{20ac}\x{20ac}";
+        my $format = "%" . $width . "." . $precis . "s";
+        my $chars = ($precis > 2 ? 2 : $precis);
+        my $space = ($width < 2 ? 0 : $width - $chars);
+        fresh_perl_is(
+            'my $v = "\x{20ac}\x{20ac}"; my $x = sprintf "'.$format.'", $v; $x =~ /^(\s*)(\S*)$/; print "$_" for map {length} $1, $2',
+            "$space$chars",
+            {},
+            q(sprintf ").$format.q(", "\x{20ac}\x{20ac}"),
+        );
+    }
+}
-- 
1.6.3.204.g8c948
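
The clamping idea in the sv.c hunk above can be sketched outside of Perl as follows. This is only an illustration under stated assumptions: `char_to_byte_offset` and `precision_in_bytes` are hypothetical stand-ins rather than the real `sv_pos_u2b` or `sv_len_utf8`, the input is assumed to be well-formed UTF-8, and Perl's position cache is not modeled at all.

```c
#include <stddef.h>

/* Hypothetical stand-in for a character-to-byte offset conversion.
 * Walks at most `uoffset` characters of well-formed UTF-8 and returns
 * the byte offset reached; it stops at byte_len ("sticks at end").
 * *chars_walked receives the number of characters actually walked. */
static size_t char_to_byte_offset(const char *s, size_t byte_len,
                                  size_t uoffset, size_t *chars_walked)
{
    size_t i = 0, chars = 0;
    while (i < byte_len && chars < uoffset) {
        i++;                                    /* lead byte */
        while (i < byte_len && ((unsigned char)s[i] & 0xC0) == 0x80)
            i++;                                /* continuation bytes */
        chars++;
    }
    *chars_walked = chars;
    return i;
}

/* Same shape as the patched code: clamp the requested precision to the
 * character length first, then convert it to a byte count. */
static size_t precision_in_bytes(const char *s, size_t byte_len, size_t precis)
{
    size_t ulen, walked;
    /* Walking with the largest possible offset yields the total length. */
    (void)char_to_byte_offset(s, byte_len, (size_t)-1, &ulen);
    size_t p = precis > ulen ? ulen : precis;
    return char_to_byte_offset(s, byte_len, p, &walked);
}
```

For the string from the report (13 bytes, 11 characters), a precision of 12 is clamped to 11 and maps to all 13 bytes, so the conversion never sees an offset beyond the real character length.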

@p5pRT

p5pRT commented May 30, 2009

@rgs - Status changed from 'open' to 'resolved'

@p5pRT p5pRT closed this as completed May 30, 2009
@p5pRT

p5pRT commented May 30, 2009

From @rgs

2009/5/29 Alex Vandiver <alex@chmrr.net>:

At Fri Jan 23 15:14:55 -0500 2009, hmbrand@cpan.org (via RT) wrote:

panic: utf8_mg_pos_cache_update cache 12 real 11 for ? 12,345.00 at demo.pl line 9.

Attached is a fix for this.  Unfortunately, I don't think we can avoid
the added call to sv_len_utf8 without rewriting sv_pos_u2b_cached.

Thanks, applied as 9ef5ed9.
Do you patch against maint-5.10? I apply patches to blead, and Dave
then cherry-picks them to maint, which is why it is usually better to
patch directly against blead. (Of course, this advice doesn't apply to
maint-only bugs.)

Thanks to Best Practical for volunteering my time to look at some of
these 5.10.1 blockers.

Thanks indeed!
