Report information
Id: 121292
Status: resolved
Priority: 0/
Queue: perl5

Owner: Nobody
Requestors: tonyc <tony [at] develop-help.com>
Cc:
AdminCc:

Operating System: Linux
PatchStatus: (no value)
Severity: low
Type: core
Perl Version: 5.19.10
Fixed In: (no value)



From: tony [...] develop-help.com
Subject: perlunicode claims about a UTF-8 BOM in perl source are incorrect
Date: Fri, 21 Feb 2014 11:41:50 +1100
To: perlbug [...] perl.org
This is a bug report for perl from tony@develop-help.com,
generated with the help of perlbug 1.40 running under perl 5.19.10.

-----------------------------------------------------------------
[Please describe your issue here]

perlunicode claims:

    =item C<BOM>-marked scripts and UTF-16 scripts autodetected

    If a Perl script begins marked with the Unicode C<BOM> (UTF-16LE, UTF16-BE,
    or UTF-8), or if the script looks like non-C<BOM>-marked UTF-16 of either
    endianness, Perl will correctly read in the script as Unicode.
    (C<BOM>less UTF-8 cannot be effectively recognized or differentiated from
    ISO 8859-1 or other eight-bit encodings.)

ie. that the following code, hexdumped:

    00000000  ef bb bf 70 72 69 6e 74  20 22 54 65 73 74 5c 6e  |...print "Test\n|
    00000010  22 3b 0a 70 72 69 6e 74  20 6f 72 64 28 27 ce a3  |";.print ord('..|
    00000020  27 29 2c 20 22 5c 6e 22  3b 0a                    |'), "\n";.|
    0000002a

should be treated as unicode, implying to me at least that it should
act as an implied C<use utf8;>

This doesn't occur:

    tony@mars:.../git/perl2$ ./perl test.pl
    Test
    206

Is the documentation correct, or unclear, or is the behaviour incorrect?

[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
    category=core
    severity=low
---
Site configuration information for perl 5.19.10:

Configured by tony at Fri Feb 21 10:14:52 EST 2014.

Summary of my perl5 (revision 5 version 19 subversion 10) configuration:
  Commit id: 3e63bed3c572617faf16446e7b44b5ea0b78e979
  Platform:
    osname=linux, osvers=3.2.0-4-amd64, archname=x86_64-linux
    uname='linux mars 3.2.0-4-amd64 #1 smp debian 3.2.46-1+deb7u1 x86_64 gnulinux '
    config_args='-des -Dusedevel -Uusedl'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=undef, usemultiplicity=undef
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2',
    cppflags='-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
    ccversion='', gccversion='4.7.2', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='ld', ldflags =' -fstack-protector -L/usr/local/lib'
    libpth=/usr/local/lib /usr/lib/gcc/x86_64-linux-gnu/4.7/include-fixed /usr/include/x86_64-linux-gnu /usr/lib /lib/x86_64-linux-gnu /lib/../lib /usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib
    libs=-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc -lgdbm_compat
    perllibs=-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc -lgdbm_compat
    libc=libc-2.13.so, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.13'
  Dynamic Linking:
    dlsrc=dl_none.xs, dlext=none, d_dlsymun=undef, ccdlflags=''
    cccdlflags='', lddlflags=''

---
@INC for perl 5.19.10:
    lib
    /usr/local/lib/perl5/site_perl/5.19.10/x86_64-linux
    /usr/local/lib/perl5/site_perl/5.19.10
    /usr/local/lib/perl5/5.19.10/x86_64-linux
    /usr/local/lib/perl5/5.19.10
    .

---
Environment for perl 5.19.10:
    HOME=/home/tony
    LANG=en_AU.UTF-8
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/tony/perl5/perlbrew/bin:/home/tony/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
    PERLBREW_BASHRC_VERSION=0.43
    PERLBREW_HOME=/home/tony/.perlbrew
    PERLBREW_PATH=/home/tony/perl5/perlbrew/bin
    PERLBREW_ROOT=/home/tony/perl5/perlbrew
    PERL_BADLANG (unset)
    SHELL=/bin/bash
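
Editorial note on the reported output: the value 206 is the first byte (0xCE) of the two-byte UTF-8 encoding of U+03A3 (GREEK CAPITAL LETTER SIGMA), which is what ord() sees when the literal is treated as bytes rather than characters. The sketch below checks this against the bytes from the hexdump. It is written in Python only as a convenient byte inspector and makes no claim about how perl itself parses the file; the variable names are illustrative.

```python
import codecs

# Bytes of test.pl, copied from the hexdump in the report (0x2a bytes).
script = bytes.fromhex(
    "ef bb bf 70 72 69 6e 74 20 22 54 65 73 74 5c 6e "
    "22 3b 0a 70 72 69 6e 74 20 6f 72 64 28 27 ce a3 "
    "27 29 2c 20 22 5c 6e 22 3b 0a"
)

# The file starts with the three-byte UTF-8 BOM (EF BB BF).
has_utf8_bom = script.startswith(codecs.BOM_UTF8)
body = script[len(codecs.BOM_UTF8):] if has_utf8_bom else script

# Pull out the bytes between the single quotes in ord('...').
start = body.index(b"ord('") + len(b"ord('")
end = body.index(b"'", start)
literal = body[start:end]

as_bytes = literal[0]                       # byte semantics: first byte only
as_chars = ord(literal.decode("utf-8")[0])  # character semantics: code point

print(has_utf8_bom, as_bytes, as_chars)  # True 206 931
```

Under byte semantics the answer is 206, matching the report; 931 is what an implied "use utf8" would have produced.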
RT-Send-CC: perl5-porters [...] perl.org
The documentation is incorrect. We should keep the current behavior of ignoring the BOM and otherwise continuing as normal. The situation is annoying from many angles, but the presence of a BOM changing the behavior of a file that may otherwise be entirely plain ol' ASCII is going to lead to real pain when debugging. I wouldn't want an invisible sequence to start turning on utf8.pm's behavior normally, and even less so as a backward incompatible change. UTF-8 BOMs are best ignored. (Also, best never written out, but that's another matter.)
RT-Send-CC: perl5-porters [...] perl.org
On Wed, 26 Feb 2014 15:45:59 -0800, rjbs wrote:
> The documentation is incorrect. We should keep the current behavior
> of ignoring the BOM and otherwise continuing as normal.
>
> The situation is annoying from many angles, but the presence of a BOM
> changing the behavior of a file that may otherwise be entirely plain
> ol' ASCII is going to lead to real pain when debugging. I wouldn't
> want an invisible sequence to start turning on utf8.pm's behavior
> normally, and even less so as a backward incompatible change. UTF-8
> BOMs are best ignored. (Also, best never written out, but that's
> another matter.)

I looked at the code. If the BOM looks like it is for UTF-16, everything in the file is read as UTF-16 and converted to UTF-8, so it is like a 'use utf8'. If the BOM is in UTF-8, it is simply ignored.
-- 
Karl Williamson
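
Editorial note: the two cases described in this message can be modelled with a small sketch. This is a toy model in Python, not perl's actual tokenizer code; read_source and first_unit are hypothetical helpers, and returning a Python str stands in for "character semantics, as if use utf8 were on", while returning bytes stands in for "byte semantics".

```python
import codecs

def read_source(raw: bytes):
    """Toy model of the behaviour described above (hypothetical helper)."""
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # UTF-16 BOM: the whole file is decoded, so literals become characters.
        # The "utf-16" codec uses the BOM to pick endianness and consumes it.
        return raw.decode("utf-16")
    if raw.startswith(codecs.BOM_UTF8):
        # UTF-8 BOM: the three marker bytes are skipped, nothing else changes.
        return raw[len(codecs.BOM_UTF8):]
    return raw

def first_unit(source) -> int:
    """Value of the first byte (bytes) or first code point (str)."""
    return source[0] if isinstance(source, bytes) else ord(source[0])

sigma = "\u03a3"  # GREEK CAPITAL LETTER SIGMA, as in the report's test.pl
utf8_file = codecs.BOM_UTF8 + sigma.encode("utf-8")
utf16_file = codecs.BOM_UTF16_LE + sigma.encode("utf-16-le")

print(first_unit(read_source(utf8_file)))   # 206
print(first_unit(read_source(utf16_file)))  # 931
```

The same character therefore yields 206 behind a UTF-8 BOM and 931 behind a UTF-16 BOM, which is the asymmetry the original documentation failed to mention.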
RT-Send-CC: perl5-porters [...] perl.org
I have now changed the doc to be accurate about this in commit 27c74dfd9a73dc0baa42c9e37899f741b08b7c4b
-- 
Karl Williamson

Thank you for filing this report. You have helped make Perl better. With the release today of Perl 5.26.0, this and 210 other issues have been resolved. Perl 5.26.0 may be downloaded via: https://metacpan.org/release/XSAWYERX/perl-5.26.0 If you find that the problem persists, feel free to reopen this ticket.

