no "Malformed UTF-8 character" warning on single-quoted strings under "use utf8" #14973

Closed
p5pRT opened this issue Oct 9, 2015 · 18 comments

p5pRT commented Oct 9, 2015

Migrated from rt.perl.org#126310 (status was 'resolved')

Searchable as RT126310$

p5pRT commented Oct 9, 2015

From florian.schlichting@fu-berlin.de

This is a bug report for perl from florian.schlichting@fu-berlin.de,
generated with the help of perlbug 1.40 running under perl 5.20.2.


As discovered in the "Malformed UTF-8 character" thread at
http://www.perlmonks.org/?node_id=902060 and isolated by tchrist in a reply at
http://www.perlmonks.org/?displaytype=print;node_id=902212;replies=1, Perl
fails to issue a "Malformed UTF-8 character" warning when running under "use
utf8" IF the string in question is enclosed in single quotes. For double quoted
strings the warning is issued as expected:

% blead -C0 -le 'print qq(print "\xB0C";)' | blead -Mutf8 -CS -l
Malformed UTF-8 character (unexpected continuation byte 0xb0, with no preceding start byte) at - line 1.
C

% blead -C0 -le 'print qq(print \x27\xB0C\x27;)' | blead -Mutf8 -CS -l
#C

This should be fixed so that the warning is issued for single quoted strings as
well, helping to detect incompletely/incorrectly converted scripts.
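
To make the failure concrete: the generated script in the one-liners above contains the single byte 0xB0 followed by `C`. A byte in the range 0x80..0xBF is a continuation byte and cannot start a UTF-8 sequence, which is what the double-quoted case reports. The following is a minimal standalone sketch in C of a strict well-formedness check. `utf8_is_valid` is a hypothetical name chosen for illustration; it is not Perl's internal routine, and Perl's own notion of UTF-8 may be more permissive than this strict check.

```c
#include <stddef.h>

/* Hypothetical helper for illustration (not part of Perl's API).
 * Returns 1 if buf[0..len) is strictly well-formed UTF-8, else 0.
 * Rejects stray continuation bytes, truncated sequences, overlong
 * encodings, surrogates, and code points above U+10FFFF. */
int utf8_is_valid(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = buf[i];
        size_t need;        /* continuation bytes expected after the lead byte */
        unsigned long cp;   /* code point being decoded */
        unsigned long min;  /* smallest code point legal for this length */

        if (c < 0x80) {                     /* ASCII */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            need = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            need = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            need = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return 0;   /* lone continuation byte (such as 0xB0) or invalid lead */
        }

        if (len - i - 1 < need)
            return 0;                       /* sequence truncated */

        for (size_t k = 1; k <= need; k++) {
            unsigned char cc = buf[i + k];
            if ((cc & 0xC0) != 0x80)
                return 0;                   /* not a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }

        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;                       /* overlong, out of range, or surrogate */

        i += need + 1;
    }
    return 1;
}
```

With this sketch, the two bytes 0xB0 `C` are rejected, while 0xC2 0xB0 `C` (the UTF-8 encoding of U+00B0 followed by `C`) is accepted. The result does not depend on which kind of quotes surround the bytes in the source.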



Flags:
  category=core
  severity=medium


Site configuration information for perl 5.20.2:

Configured by Debian Project at Sun May 3 16:16:25 UTC 2015.

Summary of my perl5 (revision 5 version 20 subversion 2) configuration:

  Platform:
  osname=linux, osvers=3.2.0-4-amd64, archname=x86_64-linux-gnu-thread-multi
  uname='linux x86-csail-01 3.2.0-4-amd64 #1 smp debian 3.2.68-1+deb7u1 x86_64 gnulinux '
  config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -D_FORTIFY_SOURCE=2 -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Dldflags= -Wl,-z,relro -Dlddlflags=-shared -Wl,-z,relro -Dcccdlflags=-fPIC -Darchname=x86_64-linux-gnu -Dprefix=/usr -Dprivlib=/usr/share/perl/5.20 -Darchlib=/usr/lib/x86_64-linux-gnu/perl/5.20 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/x86_64-linux-gnu/perl5/5.20 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.20.2 -Dsitearch=/usr/local/lib/x86_64-linux-gnu/perl/5.20.2 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Duse64bitint -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Ud_ualarm -Uusesfio -Uusenm -Ui_libutil -Uversiononly -DDEBUGGING=-g -Doptimize=-O2 -Duseshrplib -Dlibperl=libperl.so.5.20.2 -des'
  hint=recommended, useposix=true, d_sigaction=define
  useithreads=define, usemultiplicity=define
  use64bitint=define, use64bitall=define, uselongdouble=undef
  usemymalloc=n, bincompat5005=undef
  Compiler:
  cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fwrapv -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
  optimize='-O2 -g',
  cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fwrapv -fno-strict-aliasing -pipe -I/usr/local/include'
  ccversion='', gccversion='4.9.2', gccosandvers=''
  intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
  d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
  ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
  alignbytes=8, prototype=define
  Linker and Libraries:
  ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
  libpth=/usr/local/lib /usr/lib/gcc/x86_64-linux-gnu/4.9/include-fixed /usr/include/x86_64-linux-gnu /usr/lib /lib/x86_64-linux-gnu /lib/../lib /usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib
  libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
  perllibs=-ldl -lm -lpthread -lc -lcrypt
  libc=libc-2.19.so, so=so, useshrplib=true, libperl=libperl.so.5.20
  gnulibc_version='2.19'
  Dynamic Linking:
  dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
  cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib -fstack-protector'

Locally applied patches:
  DEBPKG:debian/cpan_definstalldirs - Provide a sensible INSTALLDIRS default for modules installed from CPAN.
  DEBPKG:debian/db_file_ver - http://bugs.debian.org/340047 Remove overly restrictive DB_File version check.
  DEBPKG:debian/doc_info - Replace generic man(1) instructions with Debian-specific information.
  DEBPKG:debian/enc2xs_inc - http://bugs.debian.org/290336 Tweak enc2xs to follow symlinks and ignore missing @INC directories.
  DEBPKG:debian/errno_ver - http://bugs.debian.org/343351 Remove Errno version check due to upgrade problems with long-running processes.
  DEBPKG:debian/libperl_embed_doc - http://bugs.debian.org/186778 Note that libperl-dev package is required for embedded linking
  DEBPKG:fixes/respect_umask - Respect umask during installation
  DEBPKG:debian/writable_site_dirs - Set umask approproately for site install directories
  DEBPKG:debian/extutils_set_libperl_path - EU:MM: set location of libperl.a under /usr/lib
  DEBPKG:debian/no_packlist_perllocal - Don't install .packlist or perllocal.pod for perl or vendor
  DEBPKG:debian/prefix_changes - Fiddle with *PREFIX and variables written to the makefile
  DEBPKG:debian/fakeroot - Postpone LD_LIBRARY_PATH evaluation to the binary targets.
  DEBPKG:debian/instmodsh_doc - Debian policy doesn't install .packlist files for core or vendor.
  DEBPKG:debian/ld_run_path - Remove standard libs from LD_RUN_PATH as per Debian policy.
  DEBPKG:debian/libnet_config_path - Set location of libnet.cfg to /etc/perl/Net as /usr may not be writable.
  DEBPKG:debian/mod_paths - Tweak @INC ordering for Debian
  DEBPKG:debian/module_build_man_extensions - http://bugs.debian.org/479460 Adjust Module::Build manual page extensions for the Debian Perl policy
  DEBPKG:debian/prune_libs - http://bugs.debian.org/128355 Prune the list of libraries wanted to what we actually need.
  DEBPKG:fixes/net_smtp_docs - [rt.cpan.org #36038] http://bugs.debian.org/100195 Document the Net::SMTP 'Port' option
  DEBPKG:debian/perlivp - http://bugs.debian.org/510895 Make perlivp skip include directories in /usr/local
  DEBPKG:debian/deprecate-with-apt - http://bugs.debian.org/747628 Point users to Debian packages of deprecated core modules
  DEBPKG:debian/squelch-locale-warnings - http://bugs.debian.org/508764 Squelch locale warnings in Debian package maintainer scripts
  DEBPKG:debian/skip-upstream-git-tests - Skip tests specific to the upstream Git repository
  DEBPKG:debian/patchlevel - http://bugs.debian.org/567489 List packaged patches for 5.20.2-3+deb8u1 in patchlevel.h
  DEBPKG:debian/skip-kfreebsd-crash - http://bugs.debian.org/628493 [perl #96272] Skip a crashing test case in t/op/threads.t on GNU/kFreeBSD
  DEBPKG:fixes/document_makemaker_ccflags - http://bugs.debian.org/628522 [rt.cpan.org #68613] Document that CCFLAGS should include $Config{ccflags}
  DEBPKG:debian/find_html2text - http://bugs.debian.org/640479 Configure CPAN::Distribution with correct name of html2text
  DEBPKG:debian/perl5db-x-terminal-emulator.patch - http://bugs.debian.org/668490 Invoke x-terminal-emulator rather than xterm in perl5db.pl
  DEBPKG:debian/cpan-missing-site-dirs - http://bugs.debian.org/688842 Fix CPAN::FirstTime defaults with nonexisting site dirs if a parent is writable
  DEBPKG:fixes/memoize_storable_nstore - [rt.cpan.org #77790] http://bugs.debian.org/587650 Memoize::Storable: respect 'nstore' option not respected
  DEBPKG:debian/regen-skip - Skip a regeneration check in unrelated git repositories
  DEBPKG:fixes/regcomp-mips-optim - [perl #122817] http://bugs.debian.org/754054 Downgrade the optimization of regcomp.c on mips and mipsel due to a gcc-4.9 bug
  DEBPKG:debian/makemaker-pasthru - http://bugs.debian.org/758471 Pass LD settings through to subdirectories
  DEBPKG:fixes/perldoc-less-R - [rt.cpan.org #98636] http://bugs.debian.org/758689 Tell the 'less' pager to allow terminal escape sequences
  DEBPKG:fixes/pod_man_reproducible_date - http://bugs.debian.org/759405 Support POD_MAN_DATE in Pod::Man for the left-hand footer
  DEBPKG:fixes/io_uncompress_gunzip_inmemory - http://bugs.debian.org/747363 [rt.cpan.org #95494] Fix gunzip to in-memory file handle
  DEBPKG:fixes/socket_test_recv_fix - http://bugs.debian.org/758718 [perl #122657] Compare recv return value to peername in socket test
  DEBPKG:fixes/hurd_socket_recv_todo - http://bugs.debian.org/758718 [perl #122657] TODO checking the result of recv() on hurd
  DEBPKG:fixes/regexp-performance - [0fa70a0] http://bugs.debian.org/777556 [perl #123743] simpify and speed up /.*.../ handling
  DEBPKG:fixes/failed_require_diagnostics - http://bugs.debian.org/781120 [perl #123270] Report inaccesible file on failed require
  DEBPKG:fixes/array-cloning - http://bugs.debian.org/779357 [perl #124127] [902d169] fix cloning arrays with unused elements
  DEBPKG:fixes/perldb-threads - http://bugs.debian.org/779357 [perl #124127] [41ef2c6] lib/perl5db.pl: Restore noop lock prototype


@INC for perl 5.20.2:
  /etc/perl
  /usr/local/lib/x86_64-linux-gnu/perl/5.20.2
  /usr/local/share/perl/5.20.2
  /usr/lib/x86_64-linux-gnu/perl5/5.20
  /usr/share/perl5
  /usr/lib/x86_64-linux-gnu/perl/5.20
  /usr/share/perl/5.20
  /usr/local/lib/site_perl
  .


Environment for perl 5.20.2:
  HOME=/home/fschlich
  LANG=de_DE@euro
  LANGUAGE (unset)
  LC_CTYPE=de_DE@euro
  LC_MESSAGES=C
  LD_LIBRARY_PATH (unset)
  LOGDIR (unset)
  PATH=/home/fschlich/bin:/home/fschlich/bin:/usr/local/sw/i3-jessie/bin:/usr/local/sw/xfce/stable/bin:/usr/local/bin:/usr/local/sbin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/games
  PERL_BADLANG (unset)
  SHELL=/bin/zsh

p5pRT commented Oct 11, 2015

From @khwilliamson

I have taken this ticket, as I'm about to start work on related things.

p5pRT commented Oct 11, 2015

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented Aug 3, 2016

From @khwilliamson

I intend to fix this, unless the consensus is to not. It involves extra work in the parser of doing a UTF-8 validity check when appropriate on single-quoted strings.

--
Karl Williamson

p5pRT commented Aug 3, 2016

From @cpansprout

On Tue Aug 02 19:58:51 2016, khw wrote:

I intend to fix this, unless the consensus is to not. It involves
extra work in the parser of doing a UTF-8 validity check when
appropriate on single-quoted strings.

If you mean in tokeq or scan_str, I think that’s the wrong place to do it. It sounds as though eval "'...'" will be subject to such extra checks as well, but it is perfectly reasonable to assume that perl strings are already well-formed.

Ideally, under ‘use utf8’, the validation would be done when the input is read from a stream, though I can’t say offhand what is the best way to go about that.

--

Father Chrysostomos

p5pRT commented Aug 3, 2016

From @cpansprout

Probably in Perl_lex_next_chunk or something it calls.

--

Father Chrysostomos
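
The design point in the comments above is that a check applied to each chunk of source as it is read does not need to know whether the bytes sit inside a single-quoted or a double-quoted literal. The following standalone C sketch illustrates that idea only. `first_malformed_offset` is a hypothetical name, the check is a simplified structural one (lead byte plus the expected number of continuation bytes), and it is not the code in `Perl_lex_next_chunk`.

```c
#include <stddef.h>

/* Hypothetical per-chunk check for illustration (not Perl's lexer code).
 * Returns the offset of the first byte that starts a malformed sequence,
 * or -1 if the chunk passes this simplified structural check. */
long first_malformed_offset(const unsigned char *chunk, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = chunk[i];
        size_t need;    /* continuation bytes expected after this lead byte */

        if (c < 0x80)
            need = 0;                       /* ASCII */
        else if (c >= 0xC2 && c <= 0xDF)
            need = 1;
        else if (c >= 0xE0 && c <= 0xEF)
            need = 2;
        else if (c >= 0xF0 && c <= 0xF4)
            need = 3;
        else
            return (long)i;                 /* stray continuation or invalid lead */

        for (size_t k = 1; k <= need; k++) {
            if (i + k >= len || (chunk[i + k] & 0xC0) != 0x80)
                return (long)i;             /* truncated or bad continuation */
        }
        i += need + 1;
    }
    return -1;
}
```

Fed the bytes of `print "\xB0C";` or of `print '\xB0C';` (with `\xB0` as a raw byte), this sketch reports the same offset in both cases, because the quote characters are just ASCII bytes to the checker.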

p5pRT commented Sep 1, 2016

From @khwilliamson

Is the attached patch like what you mean?

--
Karl Williamson

p5pRT commented Sep 1, 2016

From @khwilliamson

0001-Proof-of-concept-to-test-input-for-valid-UTF-8.patch
From c1d8cbda01e0b2f372e9341efeb4e306ec0c043d Mon Sep 17 00:00:00 2001
From: Karl Williamson <khw@cpan.org>
Date: Wed, 31 Aug 2016 21:31:28 -0600
Subject: [PATCH] Proof-of-concept to test input for valid UTF-8.

This will fix #126310, and heaven knows what else.

I think we should die at the first malformed UTF-8 encountered in
parsing.  To try to continue is asking for trouble, and not going to be
DWIM anyway.
---
 toke.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/toke.c b/toke.c
index dbeecd1..eddfb29 100644
--- a/toke.c
+++ b/toke.c
@@ -1339,6 +1339,11 @@ Perl_lex_next_chunk(pTHX_ U32 flags)
     new_bufend_pos = SvCUR(linestr);
     PL_parser->bufend = buf + new_bufend_pos;
     PL_parser->bufptr = buf + bufptr_pos;
+
+    if (UTF && ! is_utf8_string((U8 *) PL_parser->bufptr, PL_parser->bufend - PL_parser->bufptr)) {
+	Perl_croak(aTHX_ "Malformed utf8");
+    }
+
     PL_parser->oldbufptr = buf + oldbufptr_pos;
     PL_parser->oldoldbufptr = buf + oldoldbufptr_pos;
     PL_parser->linestart = buf + linestart_pos;
-- 
2.5.0

p5pRT commented Sep 2, 2016

From @cpansprout

On Wed Aug 31 20:35:02 2016, khw wrote:

Is the attached patch like what you mean?

Yes, that would work.

It would be nice, too, if we could add the ‘near such and such’ that yyerror normally does. Maybe yyerror could have an extra option to croak instead of calling qerror. It already has a flags field.

--

Father Chrysostomos
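
As an illustration of the "near such and such" idea, a fatal message could name the offending byte and show a little of the source text leading up to it. The sketch below is standalone C with a hypothetical function name, `describe_malformed`, and a made-up message format. It is not `yyerror`, and it assumes the caller already knows the offset of the bad byte and that the bytes before it are printable text.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical formatter for illustration (not Perl's yyerror).
 * Writes a message naming the byte at offset `bad` and quoting up to
 * 10 bytes of the source that precede it. Returns snprintf's result. */
int describe_malformed(char *out, size_t outsz,
                       const unsigned char *buf, size_t bad)
{
    size_t ctx = bad < 10 ? bad : 10;   /* bytes of leading context to show */

    return snprintf(out, outsz,
                    "Malformed UTF-8 (byte 0x%02x at offset %lu), near \"%.*s\"",
                    (unsigned)buf[bad], (unsigned long)bad,
                    (int)ctx, (const char *)(buf + bad - ctx));
}
```

For the source bytes of `print '\xB0C';` with the bad byte at offset 7, this produces `Malformed UTF-8 (byte 0xb0 at offset 7), near "print '"`. A real implementation would also have to avoid starting the context in the middle of a multi-byte character, which this sketch does not attempt.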

p5pRT commented Sep 16, 2016

From florian.schlichting@fu-berlin.de

Hi Karl,

thanks for looking into this issue. I tested your patch and can confirm
that it correctly treats single and double quotes the same:

% ./perl -C0 -le 'print qq(print "\xB0C";)' | ./perl -I'lib' -Mutf8 -CS -l
Malformed utf8 at - line 1.

% ./perl -C0 -le 'print qq(print \x27\xB0C\x27;)' | ./perl -I'lib' -Mutf8 -CS -l
Malformed utf8 at - line 1.

However, I feel a little uneasy about dying altogether. Currently Perl
issues just a warning ("Malformed UTF-8 character") and that seems to be
the approach with UTF-8 issues encountered in other places in toke.c as
well. Most of the time, these will be strings displayed to the user, and
they will mostly still be legible even with a few characters garbled or
skipped. Don't you think "complain and carry on" is what users would
expect?

Florian

p5pRT commented Sep 16, 2016

From @khwilliamson

But we are running into segfaults because of trying to keep going in the
face of malformed UTF-8. I'm thinking the lesson should be to give up
when we find it, and this is a reasonable place to start. There are
places where malformed UTF-8 is fatal.

p5pRT commented Sep 16, 2016

From @cpansprout

I agree. If perl keeps going, then even if it does not crash, it will die on those malformed strings later.

--

Father Chrysostomos

p5pRT commented Oct 13, 2016

From @khwilliamson

blead now has improved diagnostics for when malformations occur. I am
thinking that these should be turned on unconditionally when this error
occurs, as we are going to die immediately anyway. Any opposition?

p5pRT commented Dec 23, 2016

From @khwilliamson

This has been fixed in blead by
6cdc5cd
--
Karl Williamson

p5pRT commented Dec 23, 2016

@khwilliamson - Status changed from 'open' to 'pending release'

p5pRT commented May 30, 2017

From @khwilliamson

Thank you for filing this report. You have helped make Perl better.

With the release today of Perl 5.26.0, this and 210 other issues have been
resolved.

Perl 5.26.0 may be downloaded via:
https://metacpan.org/release/XSAWYERX/perl-5.26.0

If you find that the problem persists, feel free to reopen this ticket.

p5pRT commented May 30, 2017

@khwilliamson - Status changed from 'pending release' to 'resolved'
