Perl_utf8n_to_uvuni decodes illegal characters #8370

Closed
p5pRT opened this issue Mar 13, 2006 · 4 comments

p5pRT commented Mar 13, 2006

Migrated from rt.perl.org#38722 (status was 'resolved')

Searchable as RT38722$

p5pRT commented Mar 13, 2006

From jgmyers@proofpoint.com

Created by jgmyers@pong.us.proofpoint.com

As shown by the test program below, Perl_utf8n_to_uvuni will decode
characters that Perl_uvuni_to_utf8_flags considers illegal. The problem
characters are U+FDD0 through U+FDEF, U+FFFE, and U+xFFFE and U+xFFFF
for 1 <= x <= 10 (x in hexadecimal, i.e. planes 1 through 16). The two
functions must agree on what counts as an illegal character; otherwise
programs that handle untrusted input have insufficient control over
which Perl warnings are thrown.

use Encode;
use strict;
use warnings;

sub trydecode {
  my ($utf8) = (@_);
  my $text = Encode::decode('UTF-8', $utf8, 0);

  printf "%x\n", ord(substr($text, 3, 1));

  $text =~ /\b(?:https?|ftp)/o;
}

trydecode("aaa\xef\xbf\xbebbb"); #fffe
trydecode("aaa\xef\xbf\xbfbbb"); #ffff
trydecode("aaa\xef\xb7\x90bbb"); #fdd0
trydecode("aaa\xf0\x9f\xbf\xbebbb"); #1fffe
trydecode("aaa\xf0\x9f\xbf\xbfbbb"); #1ffff
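
For reference, the code points exercised above all belong to the set Unicode calls noncharacters: U+FDD0 through U+FDEF, plus the last two code points of every plane (U+FFFE, U+FFFF, U+1FFFE, ... U+10FFFF). The following is a minimal sketch of a membership test for that set. The helper name `is_noncharacter` is hypothetical and is not part of Perl's API or of Encode; it only illustrates which values the two core functions would need to agree on.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption for illustration, not a Perl core or
# Encode function): returns 1 if $cp is a Unicode noncharacter, else 0.
sub is_noncharacter {
    my ($cp) = @_;
    return 0 if $cp < 0 || $cp > 0x10FFFF;        # outside the Unicode range
    return 1 if $cp >= 0xFDD0 && $cp <= 0xFDEF;   # the contiguous block of 32
    # Last two code points of each plane: low 16 bits are FFFE or FFFF.
    return ( ( $cp & 0xFFFE ) == 0xFFFE ) ? 1 : 0;
}

# The code points encoded by the five test strings above.
for my $cp ( 0xFFFE, 0xFFFF, 0xFDD0, 0x1FFFE, 0x1FFFF ) {
    printf "U+%04X %s\n", $cp,
        is_noncharacter($cp) ? "noncharacter" : "ordinary";
}

# Count every noncharacter in the whole code space.
my $count = 0;
for my $cp ( 0 .. 0x10FFFF ) {
    $count++ if is_noncharacter($cp);
}
print "total: $count\n";    # prints "total: 66" (32 + 2 per plane * 17 planes)
```

A check of this shape could be run on decoded text before passing it on, so that a program handling untrusted input decides for itself how to treat these code points rather than relying on which warnings the decoder emits.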

Perl Info

Flags:
    category=core
    severity=medium

Site configuration information for perl v5.8.7:

Configured by jgmyers at Mon Jul 25 16:01:57 PDT 2005.

Summary of my perl5 (revision 5 version 8 subversion 7) configuration:
  Platform:
    osname=linux, osvers=2.4.21-32.0.1.elsmp, archname=i686-linux-thread-multi
    uname='linux pong.us.proofpoint.com 2.4.21-32.0.1.elsmp #1 smp tue may 17 17:52:23 edt 2005 i686 i686 i386 gnulinux '
    config_args=''
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm',
    optimize='-O2 -g',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING -fno-strict-aliasing -pipe -I/usr/local/include -I/usr/include/gdbm'
    ccversion='', gccversion='3.3.3', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='gcc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc
    perllibs=-lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
    libc=/lib/libc-2.3.2.so, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.3.2'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:
    


@INC for perl v5.8.7:
    /u/jgmyers/perl/lib/5.8.7/i686-linux-thread-multi
    /u/jgmyers/perl/lib/5.8.7
    /u/jgmyers/perl/lib/site_perl/5.8.7/i686-linux-thread-multi
    /u/jgmyers/perl/lib/site_perl/5.8.7
    /u/jgmyers/perl/lib/site_perl/5.8.6/i686-linux-thread-multi
    /u/jgmyers/perl/lib/site_perl/5.8.6
    /u/jgmyers/perl/lib/site_perl/5.8.5/i686-linux-thread-multi
    /u/jgmyers/perl/lib/site_perl/5.8.5
    /u/jgmyers/perl/lib/site_perl/5.8.3/i686-linux-thread-multi
    /u/jgmyers/perl/lib/site_perl/5.8.3
    /u/jgmyers/perl/lib/site_perl
    .


Environment for perl v5.8.7:
    HOME=/u/jgmyers
    LANG=en_US
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    
    PATH=/tools/x/bin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/u/jgmyers/bin
    PERL_BADLANG (unset)
    SHELL=/bin/bash

p5pRT commented Jan 10, 2011

From @khwilliamson

The behavior is now changed so that all of these silently turn into the
Unicode replacement character, U+FFFD. Both routines now know about the
same 66 Unicode non-character code points.

If you want a warning, a ticket should be filed against Encode on CPAN.

--Karl Williamson

p5pRT commented Jan 10, 2011

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented Jan 10, 2011

@khwilliamson - Status changed from 'open' to 'resolved'
