Report information
Id: 38722
Status: resolved
Priority: 0/
Queue: perl5

Owner: khw <khw [at]>
Requestors: jgmyers <jgmyers [at]>

Operating System: Linux
PatchStatus: (no value)
Severity: medium
Perl Version: 5.8.7
Fixed In: (no value)

Subject: Perl_utf8n_to_uvuni decodes illegal characters
Date: Mon, 13 Mar 2006 10:58:54 -0800
To: perlbug [...]
From: "John Myers" <jgmyers [...]>
This is a bug report for perl from,
generated with the help of perlbug 1.35 running under perl v5.8.7.

-----------------------------------------------------------------
[Please enter your report here]

As shown by the test program below, Perl_utf8n_to_uvuni will decode
characters that Perl_uvuni_to_utf8_flags considers illegal. The problem
characters are U+FDD0 through U+FDEF, U+FFFE, U+xFFFE for 1 <= x <= 10,
and U+xFFFF for 1 <= x <= 10.

The two functions must agree as to what is an illegal character or
programs that handle untrusted input will have insufficient control over
what perl warnings get thrown.

use Encode;
use strict;
use warnings;

sub trydecode {
    my ($utf8) = (@_);
    my $text = Encode::decode('UTF-8', $utf8, 0);
    printf "%x\n", ord(substr($text, 3, 1));
    $text =~ /\b(?:https?|ftp)/o;
}

trydecode("aaa\xef\xbf\xbebbb"); #fffe
trydecode("aaa\xef\xbf\xbfbbb"); #ffff
trydecode("aaa\xef\xb7\x90bbb"); #fdd0
trydecode("aaa\xf0\x9f\xbf\xbebbb"); #1fffe
trydecode("aaa\xf0\x9f\xbf\xbfbbb"); #1ffff

[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
    category=core
    severity=medium
---
Site configuration information for perl v5.8.7:

Configured by jgmyers at Mon Jul 25 16:01:57 PDT 2005.

Summary of my perl5 (revision 5 version 8 subversion 7) configuration:
  Platform:
    osname=linux, osvers=2.4.21-32.0.1.elsmp, archname=i686-linux-thread-multi
    uname='linux 2.4.21-32.0.1.elsmp #1 smp tue may 17 17:52:23 edt 2005 i686 i686 i386 gnulinux '
    config_args=''
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm',
    optimize='-O2 -g',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING -fno-strict-aliasing -pipe -I/usr/local/include -I/usr/include/gdbm'
    ccversion='', gccversion='3.3.3', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='gcc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc
    perllibs=-lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
    libc=/lib/, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.3.2'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:

---
@INC for perl v5.8.7:
    /u/jgmyers/perl/lib/5.8.7/i686-linux-thread-multi
    /u/jgmyers/perl/lib/5.8.7
    /u/jgmyers/perl/lib/site_perl/5.8.7/i686-linux-thread-multi
    /u/jgmyers/perl/lib/site_perl/5.8.7
    /u/jgmyers/perl/lib/site_perl/5.8.6/i686-linux-thread-multi
    /u/jgmyers/perl/lib/site_perl/5.8.6
    /u/jgmyers/perl/lib/site_perl/5.8.5/i686-linux-thread-multi
    /u/jgmyers/perl/lib/site_perl/5.8.5
    /u/jgmyers/perl/lib/site_perl/5.8.3/i686-linux-thread-multi
    /u/jgmyers/perl/lib/site_perl/5.8.3
    /u/jgmyers/perl/lib/site_perl
    .

---
Environment for perl v5.8.7:
    HOME=/u/jgmyers
    LANG=en_US
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/tools/x/bin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/u/jgmyers/bin
    PERL_BADLANG (unset)
    SHELL=/bin/bash
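
For readers who want to check the byte sequences passed to trydecode in the report by hand, here is a minimal C sketch of plain UTF-8 decoding. The helper name decode_utf8 is hypothetical and this is not the code of Perl_utf8n_to_uvuni. It assumes the caller passes exactly one sequence, and it deliberately leaves out the overlong-form and surrogate checks that a production decoder would need.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only (not perl's decoder).
 * Decodes one UTF-8 sequence of 1 to 4 bytes starting at s.
 * Returns the code point, or UINT32_MAX if the lead byte is invalid,
 * a continuation byte is malformed, or len is too short.
 * Overlong forms and surrogates are NOT rejected in this sketch. */
uint32_t decode_utf8(const unsigned char *s, size_t len)
{
    uint32_t cp;
    size_t need;   /* number of continuation bytes expected */
    size_t i;

    if (len == 0)
        return UINT32_MAX;

    if (s[0] < 0x80)
        return s[0];                                  /* ASCII */
    else if ((s[0] & 0xE0) == 0xC0) { cp = s[0] & 0x1F; need = 1; }
    else if ((s[0] & 0xF0) == 0xE0) { cp = s[0] & 0x0F; need = 2; }
    else if ((s[0] & 0xF8) == 0xF0) { cp = s[0] & 0x07; need = 3; }
    else
        return UINT32_MAX;                            /* invalid lead byte */

    if (len < need + 1)
        return UINT32_MAX;                            /* truncated input */

    for (i = 1; i <= need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return UINT32_MAX;                        /* not 10xxxxxx */
        cp = (cp << 6) | (uint32_t)(s[i] & 0x3F);     /* append 6 payload bits */
    }
    return cp;
}
```

Applied to the report's inputs, the bytes EF BF BE give U+FFFE, EF B7 90 give U+FDD0, and F0 9F BF BE give U+1FFFE. These match the comments next to each trydecode call.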
RT-Send-CC: perl5-porters [...]
The behavior has now been changed so that all of these are silently turned into the Unicode replacement character, U+FFFD. Both routines now recognize the same 66 Unicode non-character code points. If you want a warning, a ticket should be filed against Encode on CPAN.

--Karl Williamson
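
As a reference for the figure of 66, here is a small C sketch of a predicate covering the Unicode non-character code points: U+FDD0 through U+FDEF (32 code points) plus the last two code points of each of the 17 planes (34 code points). The function names are hypothetical. This shows the set as the Unicode Standard defines it and is not a copy of the check in the perl source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if cp is a Unicode noncharacter.
 * The set is U+FDD0..U+FDEF plus U+nFFFE and U+nFFFF for every
 * plane n from 0x0 to 0x10. */
bool is_unicode_nonchar(uint32_t cp)
{
    if (cp > 0x10FFFF)
        return false;                 /* outside the Unicode code space */
    if (cp >= 0xFDD0 && cp <= 0xFDEF)
        return true;                  /* contiguous block of 32 */
    /* Low 16 bits equal to FFFE or FFFF: last two code points of a plane. */
    return (cp & 0xFFFE) == 0xFFFE;
}

/* Counts the noncharacters by brute force over the whole code space. */
unsigned count_unicode_nonchars(void)
{
    unsigned n = 0;
    uint32_t cp;

    for (cp = 0; cp <= 0x10FFFF; cp++) {
        if (is_unicode_nonchar(cp))
            n++;
    }
    return n;
}
```

Here count_unicode_nonchars() returns 66. Note that the set includes U+FFFF, which the report's test program exercises although the prose list does not name it.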
