chop/~ mangles UTF8 #7040

Closed
p5pRT opened this issue Jan 16, 2004 · 6 comments

p5pRT commented Jan 16, 2004

Migrated from rt.perl.org#24926 (status was 'resolved')

Searchable as RT24926$


p5pRT commented Jan 16, 2004

From nwc@faith.mccarroll.org.uk

Created by @nwc10

perl5.8.3 -le '$a="\0\x{100}"; chop $a; print ord ~$a'
Malformed UTF-8 character (unexpected non-continuation byte 0x00, immediately after start byte 0xff) in ord at -e line 1.
0

Not that it's a new bug:

perl5.8.0 -le '$a="\0\x{100}"; chop $a; print ord ~$a'
Malformed UTF-8 character (unexpected non-continuation byte 0x00, immediately after start byte 0xff) in ord at -e line 1.
0

Perl Info

Flags:
    category=core
    severity=medium

Site configuration information for perl v5.8.3:

Configured by nwc at Wed Jan 14 18:52:08 UTC 2004.

Summary of my perl5 (revision 5.0 version 8 subversion 3) configuration:
  Platform:
    osname=linux, osvers=2.4.20-gentoo-r5, archname=i686-linux
    uname='linux faith 2.4.20-gentoo-r5 #1 thu jun 5 01:30:47 local time zone must be set--see zic manua i686 intel(r) pentium(r) 4 cpu 2.40ghz genuineintel gnulinux '
    config_args='-Dprefix=/home/nwc/sandpit5.8.3/bin -des'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-fno-strict-aliasing -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O3',
    cppflags='-fno-strict-aliasing'
    ccversion='', gccversion='3.2.3 20030422 (Gentoo Linux 1.4 3.2.3-r3, propolice)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -lndbm -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc
    perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
    libc=/lib/libc-2.3.2.so, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.3.2'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
    cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:
    


@INC for perl v5.8.3:
    /home/nwc/sandpit5.8.3/bin/lib/perl5/5.8.3/i686-linux
    /home/nwc/sandpit5.8.3/bin/lib/perl5/5.8.3
    /home/nwc/sandpit5.8.3/bin/lib/perl5/site_perl/5.8.3/i686-linux
    /home/nwc/sandpit5.8.3/bin/lib/perl5/site_perl/5.8.3
    /home/nwc/sandpit5.8.3/bin/lib/perl5/site_perl
    .


Environment for perl v5.8.3:
    HOME=/home/nwc
    LANG (unset)
    LANGUAGE (unset)
    LC_CTYPE=en_GB.ISO-8859-1
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/nwc/bin:/bin:/usr/bin:/usr/local/bin:/opt/bin:/usr/i686-pc-linux-gnu/gcc-bin/3.2:/usr/X11R6/bin:/opt/blackdown-jdk-1.4.1/bin:/opt/blackdown-jdk-1.4.1/jre/bin:/usr/games/bin:/sbin:/usr/sbin:/usr/local/sbin
    PERL_BADLANG (unset)
    SHELL=/bin/bash


p5pRT commented Jan 17, 2004

From @gisle

"nwc@faith.mccarroll.org.uk (via RT)" <perlbug-followup@perl.org> writes:

perl5.8.3 -le '$a="\0\x{100}"; chop $a; print ord ~$a'
Malformed UTF-8 character (unexpected non-continuation byte 0x00, immediately after start byte 0xff) in ord at -e line 1.
0

It's the complement op that is buggy here. This is a fix:

Inline Patch
--- pp.c.5.8.3	2004-01-17 10:07:22.000000000 +0100
+++ pp.c	2004-01-17 10:09:48.000000000 +0100
@@ -2406,6 +2406,7 @@
 	      *result = '\0';
 	      result -= nchar;
 	      sv_setpvn(TARG, (char*)result, nchar);
+	      SvUTF8_off(TARG);
 	  }
 	  Safefree(result);
 	  SETs(TARG);
--- t/op/bop.t.5.8.3	2004-01-17 10:17:10.000000000 +0100
+++ t/op/bop.t	2004-01-17 10:24:51.000000000 +0100
@@ -9,7 +9,7 @@
     @INC = '../lib';
 }
 
-print "1..44\n";
+print "1..46\n";
 
 # numerics
 print ((0xdead & 0xbeef) == 0x9ead ? "ok 1\n" : "not ok 1\n");
@@ -184,3 +184,8 @@
 print ((~ $neg1 == 0) ? "ok 43\n" : "not ok 43\n");
 $neg7 = -7.0;
 print ((~ $neg7 == 6) ? "ok 44\n" : "not ok 44\n");
+
+$a = "\0\x{100}"; chop($a);
+print utf8::is_utf8($a) ? "ok 45\n" : "not ok 45\n";  # make sure UTF8 flag is still there
+$a = ~$a;
+print $a eq "\xFF" ? "ok 46\n" : "not ok 46\n";
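
To check a particular perl against this report, here is a minimal standalone sketch along the lines of the two tests added to t/op/bop.t above. The variable names `$s` and `$c` are illustrative and not from the patch. It assumes `~` is acting as string complement, so the `bitwise` feature must not be enabled.

```perl
use strict;
use warnings;

# Build a two-character string whose second character (U+0100) forces
# the internal UTF-8 representation, then remove that character.
my $s = "\0\x{100}";
chop $s;

# What is left is the single character U+0000; the UTF8 flag is
# expected to remain set, as test 45 in the patch asserts.
print "length:    ", length($s), "\n";
print "utf8 flag: ", (utf8::is_utf8($s) ? "on" : "off"), "\n";

# The string complement of "\0" should be the single character 0xFF.
# On an unpatched 5.8.x, ord() here triggers the "Malformed UTF-8
# character" warning quoted in the report.
my $c = ~$s;
print "complement is \\xFF: ", ($c eq "\xFF" ? "yes" : "no"), "\n";
printf "ord: %d\n", ord $c;
```

On a perl that includes the fix, the last two lines should report `yes` and `255` with no warning.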


p5pRT commented Jan 17, 2004

The RT System itself - Status changed from 'new' to 'open'


p5pRT commented Jan 19, 2004

From @rgs

Gisle Aas wrote:

"nwc@faith.mccarroll.org.uk (via RT)" <perlbug-followup@perl.org> writes:

perl5.8.3 -le '$a="\0\x{100}"; chop $a; print ord ~$a'
Malformed UTF-8 character (unexpected non-continuation byte 0x00, immediately after start byte 0xff) in ord at -e line 1.
0

It's the complement op that is buggy here. This is a fix:

--- pp.c.5.8.3 2004-01-17 10:07:22.000000000 +0100
+++ pp.c 2004-01-17 10:09:48.000000000 +0100

Thanks, applied as #22180.


p5pRT commented Oct 13, 2005

From @smpeters

[rafael - Mon Jan 19 14:52:08 2004]:

Gisle Aas wrote:

"nwc@faith.mccarroll.org.uk (via RT)" <perlbug-followup@perl.org>
writes:

perl5.8.3 -le '$a="\0\x{100}"; chop $a; print ord ~$a'
Malformed UTF-8 character (unexpected non-continuation byte 0x00,
immediately after start byte 0xff) in ord at -e line 1.
0

It's the complement op that is buggy here. This is a fix:

--- pp.c.5.8.3 2004-01-17 10:07:22.000000000 +0100
+++ pp.c 2004-01-17 10:09:48.000000000 +0100

Thanks, applied as #22180.

Great, the applied patch works and this can be closed.


p5pRT commented Oct 13, 2005

@smpeters - Status changed from 'open' to 'resolved'
