substr function make illegal answer when using Unicode wide character #7876

Closed
p5pRT opened this issue Apr 14, 2005 · 4 comments


p5pRT commented Apr 14, 2005

Migrated from rt.perl.org#34976 (status was 'resolved')

Searchable as RT34976$


p5pRT commented Apr 14, 2005

From roku@shinjusha.com

Created by roku@shinjusha.com

Dear Sir.

When I ran this script, I received an incorrect result.

use utf8;
my($str) = "kasuテ"; # \x{006B}\x{0061}\x{0073}\x{0075}\x{30C6}
test($str);
sub test {
  my($tmp) = @_;
  my($offset) = 1;
  $len = length($tmp);
  for ($i = 0 ; $i <= $len ; $i++) {
    print substr($tmp, $offset, 2), "\n"; #substr A
    print substr($tmp, $offset, 1), "\n"; #substr B
  }
}

I expected this result:
as
a
as
a
as
...

But I received the result below:
as
as
as
as
....

substr returned an incorrect result.
When substr is applied to a string that mixes wide and narrow characters,
it seems to return the wrong answer.

But if substr A is commented out,
substr B returns the correct result.

If $offset is changed to another value (0 or 2),
substr B also returns the correct result.
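
The report can be condensed into a self-checking script. This is a minimal sketch, not part of the original report: it writes the katakana character as the escape \x{30C6} so the result does not depend on the source file's encoding, and the helper name check_substr is hypothetical. On a perl without the problem, both lines should report "ok".

```perl
use strict;
use warnings;

# Same string as in the report: four ASCII characters followed by
# one wide character (U+30C6, KATAKANA LETTER TE).
my $str = "kasu\x{30C6}";

# Hypothetical helper: run "substr A" and then "substr B" on the
# same string and return both results for comparison.
sub check_substr {
    my ($tmp, $offset) = @_;
    my $two = substr($tmp, $offset, 2);   # substr A
    my $one = substr($tmp, $offset, 1);   # substr B
    return ($two, $one);
}

my ($two, $one) = check_substr($str, 1);
print $two eq 'as' ? "ok" : "not ok", " - substr(\$str, 1, 2)\n";
print $one eq 'a'  ? "ok" : "not ok", " - substr(\$str, 1, 1)\n";
```

Calling substr A before substr B matters here, since the report says substr B alone returns the correct result.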

ActivePerl 5.8.4 does not have this problem.

Please confirm it.

Thank you.

Perl Info

Flags:
    category=core
    severity=medium

Site configuration information for perl v5.8.6:

Configured by ActiveState at Mon Dec 13 09:51:32 2004.

Summary of my perl5 (revision 5 version 8 subversion 6) configuration:
  Platform:
    osname=MSWin32, osvers=4.0, archname=MSWin32-x86-multi-thread
    uname=''
    config_args='undef'
    hint=recommended, useposix=true, d_sigaction=undef
    usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cl', ccflags ='-nologo -Gf -W3 -MD -Zi -DNDEBUG -O1 -DWIN32 -D_CONSOLE -DNO_STRICT -DHAVE_DES_FCRYPT  -DNO_HASH_SEED -DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS -DUSE_PERLIO -DPERL_MSVCRT_READFIX',
    optimize='-MD -Zi -DNDEBUG -O1',
    cppflags='-DWIN32'
    ccversion='', gccversion='', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=undef, longlongsize=8, d_longdbl=define, longdblsize=10
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='__int64', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='link', ldflags ='-nologo -nodefaultlib -debug -opt:ref,icf  -libpath:"C:\Perl\lib\CORE"  -machine:x86'
    libpth=\lib
    libs=  oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib  comdlg32.lib advapi32.lib 
shell32.lib ole32.lib oleaut32.lib  netapi32.lib uuid.lib ws2_32.lib mpr.lib winmm.lib  version.lib 
odbc32.lib odbccp32.lib msvcrt.lib
    perllibs=  oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib  comdlg32.lib advapi32.lib 
shell32.lib ole32.lib oleaut32.lib  netapi32.lib uuid.lib ws2_32.lib mpr.lib winmm.lib  version.lib 
odbc32.lib odbccp32.lib msvcrt.lib
    libc=msvcrt.lib, so=dll, useshrplib=yes, libperl=perl58.lib
    gnulibc_version='undef'
  Dynamic Linking:
    dlsrc=dl_win32.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' '
    cccdlflags=' ', lddlflags='-dll -nologo -nodefaultlib -debug -opt:ref,icf  -libpath:"C:\Perl\lib\CORE"  -machine:x86'

Locally applied patches:
    ACTIVEPERL_LOCAL_PATCHES_ENTRY
    21540 Fix backward-compatibility issues in if.pm
    23565 Wrong MANIFEST.SKIP


@INC for perl v5.8.6:
    C:/Perl/lib
    C:/Perl/site/lib
    .


Environment for perl v5.8.6:
    HOME (unset)
    LANG=ja_JP.SJIS
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=C:\Perl\bin\;C:\kakasi\bin;C:\namazu\bin;C:\Perl\bin\MSWin32-x86-object;C:\Perl\bin;C:\WINNT\system32;C:\WINNT;C:\WINNT\System32\Wbem;C:\Program Files\Common Files\Adaptec Shared\System;C:\Program Files\Microsoft SQL Server\80\Tools\BINN;C:\Program Files\Resource Pro Kit\;c:\bin;c:\cygwin\bin;C:\Program Files\Microsoft Visual Studio\Common\Tools\WinNT;C:\Program Files\Microsoft Visual Studio\Common\MSDev98\Bin;C:\Program Files\Microsoft Visual Studio\Common\Tools;C:\Program Files\Microsoft Visual Studio\VC98\bin
    PERL_BADLANG (unset)
    SHELL (unset)




p5pRT commented Apr 21, 2005

From @iabyn

On Thu, Apr 14, 2005 at 08:54:06AM -0000, Toshiharu Roppongi wrote:

> Dear Sir.
>
> When I ran this script, I received an incorrect result.
>
> use utf8;
> my($str) = "kasuテ"; # \x{006B}\x{0061}\x{0073}\x{0075}\x{30C6}
> test($str);
> sub test {
>   my($tmp) = @_;
>   my($offset) = 1;
>   $len = length($tmp);
>   for ($i = 0 ; $i <= $len ; $i++) {
>     print substr($tmp, $offset, 2), "\n"; #substr A
>     print substr($tmp, $offset, 1), "\n"; #substr B
>   }
> }
>
> I expected this result:
> as
> a
> as
> a
> as
> ...
>
> But I received the result below:
> as
> as
> as
> as
> ....

Thanks for the report. Turns out there was a bug in the code that caches
utf8 byte offsets. Fixed in bleadperl by the change below.
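
For readers wondering why such a cache exists: in a UTF-8 encoded string, character positions and byte positions diverge once a wide character appears, so perl has to translate the character offset and length given to substr into byte positions. The sketch below only illustrates that mapping from Perl code using the core Encode module. It does not reproduce the internal cache in sv.c, and the helper name char_to_byte_offset is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Four ASCII characters, one wide character (U+0100), one more ASCII.
my $str = "abcd\x{100}e";

# Hypothetical helper: byte offset of character position $pos when
# the string is encoded as UTF-8, computed by encoding the prefix
# and counting the resulting bytes.
sub char_to_byte_offset {
    my ($s, $pos) = @_;
    my $prefix = substr($s, 0, $pos);
    return length(encode('UTF-8', $prefix));
}

for my $pos (0 .. length($str)) {
    printf "char offset %d -> byte offset %d\n",
        $pos, char_to_byte_offset($str, $pos);
}
```

Offsets up to 4 map one-to-one. U+0100 takes two bytes in UTF-8, so from character offset 5 onward the byte offset runs one ahead.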

Dave

--
"Emacs isn't a bad OS once you get used to it.
It just lacks a decent editor."

Change 24270 by davem@davem-splatty on 2005/04/21 15:36:14

  [perl #34976] substr uses utf8 length cache incorrectly

Affected files ...

... //depot/perl/sv.c#807 edit
... //depot/perl/t/op/substr.t#31 edit

Differences ...

==== //depot/perl/sv.c#807 (text) ====

@@ -6402,7 +6402,7 @@
     if (lenp) {
         found = FALSE;
         start = s;
-        if (utf8_mg_pos(sv, &mg, &cache, 2, lenp, *lenp + *offsetp, &s, start, send)) {
+        if (utf8_mg_pos(sv, &mg, &cache, 2, lenp, *lenp, &s, start, send)) {
             *lenp -= boffset;
             found = TRUE;
         }

==== //depot/perl/t/op/substr.t#31 (xtext) ====

@@ -1,6 +1,6 @@
 #!./perl

-print "1..190\n";
+print "1..192\n";

 #P = start of string Q = start of substr R = end of substr S = end of string

@@ -658,3 +658,10 @@
     substr($a, -1) &= chr(0xfeff);
     ok 190, $a eq "\xbf";
 }
+
+# [perl #34976] incorrect caching of utf8 substr length
+{
+    my $a = "abcd\x{100}";
+    ok 191, substr($a,1,2) eq 'bc';
+    ok 192, substr($a,1,1) eq 'b';
+}
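
The two new tests use the numbered ok helper that t/op/substr.t defines for itself. As a standalone sketch, the same checks can be written with the core Test::More module. The variable name $s and the two extra checks in the reverse order are additions for illustration and are not part of change 24270.

```perl
use strict;
use warnings;
use Test::More tests => 4;

# Same string as the regression test in the patch: four narrow
# characters followed by one wide character.
my $s = "abcd\x{100}";

# Longer substr first, then the shorter one at the same offset,
# which is the order that triggered the bug in the report.
is(substr($s, 1, 2), 'bc', 'two characters at offset 1');
is(substr($s, 1, 1), 'b',  'one character at offset 1 after the longer substr');

# Illustrative extra checks: the same calls in the opposite order
# on a fresh copy of the string.
my $fresh = "abcd\x{100}";
is(substr($fresh, 1, 1), 'b',  'one character at offset 1 on a fresh string');
is(substr($fresh, 1, 2), 'bc', 'two characters at offset 1 after the shorter substr');
```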


p5pRT commented Apr 21, 2005

The RT System itself - Status changed from 'new' to 'open'

@p5pRT p5pRT closed this as completed Apr 21, 2005

p5pRT commented Apr 21, 2005

@iabyn - Status changed from 'open' to 'resolved'
