perlunicode claims about a UTF-8 BOM in perl source are incorrect #13617

Closed
p5pRT opened this issue Feb 21, 2014 · 8 comments

p5pRT commented Feb 21, 2014

Migrated from rt.perl.org#121292 (status was 'resolved')

Searchable as RT121292$


p5pRT commented Feb 21, 2014

From @tonycoz

Created by @tonycoz

perlunicode claims:

=item C<BOM>-marked scripts and UTF-16 scripts autodetected

If a Perl script begins marked with the Unicode C<BOM> (UTF-16LE, UTF16-BE,
or UTF-8), or if the script looks like non-C<BOM>-marked UTF-16 of either
endianness, Perl will correctly read in the script as Unicode.
(C<BOM>less UTF-8 cannot be effectively recognized or differentiated from
ISO 8859-1 or other eight-bit encodings.)

i.e., that the following code, hexdumped:

00000000 ef bb bf 70 72 69 6e 74 20 22 54 65 73 74 5c 6e |...print "Test\n|
00000010 22 3b 0a 70 72 69 6e 74 20 6f 72 64 28 27 ce a3 |";.print ord('..|
00000020 27 29 2c 20 22 5c 6e 22 3b 0a |'), "\n";.|
0000002a

should be treated as Unicode, implying to me at least that it should
act as an implied C<use utf8;>.

This doesn't occur:

tony@mars:.../git/perl2$ ./perl test.pl
Test
206

Is the documentation correct, or unclear, or is the behaviour
incorrect?
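The printed 206 is consistent with the UTF-8 BOM being skipped while the rest of the file is still read one byte per character. The Python snippet below is only a byte-inspection aid (it says nothing about perl's internals): it rebuilds test.pl from the hexdump and shows where 206 comes from, alongside the value the literal would have if it were decoded as UTF-8, as it would be under C<use utf8>.

```python
import codecs

# Bytes of test.pl, copied row by row from the hexdump (0x2a = 42 bytes).
script = bytes.fromhex(
    "ef bb bf 70 72 69 6e 74 20 22 54 65 73 74 5c 6e "
    "22 3b 0a 70 72 69 6e 74 20 6f 72 64 28 27 ce a3 "
    "27 29 2c 20 22 5c 6e 22 3b 0a"
)
assert len(script) == 0x2A
assert script.startswith(codecs.BOM_UTF8)  # ef bb bf

# Extract the bytes between the single quotes in ord('...').
start = script.index(b"'") + 1
literal = script[start:script.index(b"'", start)]

# Read one character per byte (no `use utf8`): ord() sees the first byte.
print(literal[0])                    # 206 (0xce), what perl printed
# Decoded as UTF-8 (as under `use utf8`): a single character.
print(ord(literal.decode("utf-8")))  # 931 (U+03A3 GREEK CAPITAL LETTER SIGMA)
```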

Perl Info

Flags:
    category=core
    severity=low

Site configuration information for perl 5.19.10:

Configured by tony at Fri Feb 21 10:14:52 EST 2014.

Summary of my perl5 (revision 5 version 19 subversion 10) configuration:
  Commit id: 3e63bed3c572617faf16446e7b44b5ea0b78e979
  Platform:
    osname=linux, osvers=3.2.0-4-amd64, archname=x86_64-linux
    uname='linux mars 3.2.0-4-amd64 #1 smp debian 3.2.46-1+deb7u1 x86_64 gnulinux '
    config_args='-des -Dusedevel -Uusedl'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=undef, usemultiplicity=undef
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2',
    cppflags='-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
    ccversion='', gccversion='4.7.2', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='ld', ldflags =' -fstack-protector -L/usr/local/lib'
    libpth=/usr/local/lib /usr/lib/gcc/x86_64-linux-gnu/4.7/include-fixed /usr/include/x86_64-linux-gnu /usr/lib /lib/x86_64-linux-gnu /lib/../lib /usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib
    libs=-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc -lgdbm_compat
    perllibs=-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc -lgdbm_compat
    libc=libc-2.13.so, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.13'
  Dynamic Linking:
    dlsrc=dl_none.xs, dlext=none, d_dlsymun=undef, ccdlflags=''
    cccdlflags='', lddlflags=''



@INC for perl 5.19.10:
    lib
    /usr/local/lib/perl5/site_perl/5.19.10/x86_64-linux
    /usr/local/lib/perl5/site_perl/5.19.10
    /usr/local/lib/perl5/5.19.10/x86_64-linux
    /usr/local/lib/perl5/5.19.10
    .


Environment for perl 5.19.10:
    HOME=/home/tony
    LANG=en_AU.UTF-8
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/tony/perl5/perlbrew/bin:/home/tony/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
    PERLBREW_BASHRC_VERSION=0.43
    PERLBREW_HOME=/home/tony/.perlbrew
    PERLBREW_PATH=/home/tony/perl5/perlbrew/bin
    PERLBREW_ROOT=/home/tony/perl5/perlbrew
    PERL_BADLANG (unset)
    SHELL=/bin/bash


p5pRT commented Feb 26, 2014

From @rjbs

The documentation is incorrect. We should keep the current behavior of ignoring the BOM and otherwise continuing as normal.

The situation is annoying from many angles, but the presence of a BOM changing the behavior of a file that may otherwise be entirely plain ol' ASCII is going to lead to real pain when debugging. I wouldn't want an invisible sequence to start turning on utf8.pm's behavior normally, and even less so as a backward incompatible change. UTF-8 BOMs are best ignored. (Also, best never written out, but that's another matter.)


p5pRT commented Feb 26, 2014

The RT System itself - Status changed from 'new' to 'open'


p5pRT commented Apr 3, 2017

From @khwilliamson

On Wed, 26 Feb 2014 15:45:59 -0800, rjbs wrote:

The documentation is incorrect. We should keep the current behavior
of ignoring the BOM and otherwise continuing as normal.

The situation is annoying from many angles, but the presence of a BOM
changing the behavior of a file that may otherwise be entirely plain
ol' ASCII is going to lead to real pain when debugging. I wouldn't
want an invisible sequence to start turning on utf8.pm's behavior
normally, and even less so as a backward incompatible change. UTF-8
BOMs are best ignored. (Also, best never written out, but that's
another matter.)

I looked at the code. If the BOM looks like it is for UTF-16, everything in the file is read as UTF-16 and converted to UTF-8, so the effect is like 'use utf8'. If the BOM is UTF-8, it is simply ignored.
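The behaviour described above can be modelled roughly as follows. This is a hypothetical Python sketch, not perl's tokenizer code: the function name `classify_perl_source` is made up for illustration, Latin-1 decoding stands in for "each byte is one character" (perl's treatment of source without `use utf8`), and the UTF-32 and BOMless UTF-16 cases that perlunicode also mentions are deliberately not modelled.

```python
from __future__ import annotations

import codecs


def classify_perl_source(raw: bytes) -> tuple[str, bool]:
    """Hypothetical model of the BOM handling described above.

    Returns (source_text, utf8_semantics), where utf8_semantics is True
    when the file behaves as if `use utf8` were in effect.
    """
    if raw.startswith(codecs.BOM_UTF16_LE):
        # UTF-16 BOM: the whole file is decoded from UTF-16, so its
        # characters are real Unicode characters, much like `use utf8`.
        # (A UTF-32LE BOM also starts with ff fe; not handled here.)
        return raw[len(codecs.BOM_UTF16_LE):].decode("utf-16-le"), True
    if raw.startswith(codecs.BOM_UTF16_BE):
        return raw[len(codecs.BOM_UTF16_BE):].decode("utf-16-be"), True
    if raw.startswith(codecs.BOM_UTF8):
        # UTF-8 BOM: the three bytes are skipped and nothing else changes;
        # Latin-1 maps each remaining byte to exactly one character.
        return raw[len(codecs.BOM_UTF8):].decode("latin-1"), False
    # No BOM: bytes are read as-is (BOMless UTF-16 sniffing not modelled).
    return raw.decode("latin-1"), False


if __name__ == "__main__":
    body = "print ord('\u03a3');\n"
    for raw in (codecs.BOM_UTF8 + body.encode("utf-8"),
                codecs.BOM_UTF16_LE + body.encode("utf-16-le")):
        text, utf8 = classify_perl_source(raw)
        # ord() of the first character inside the quotes
        print(utf8, ord(text[text.index("'") + 1]))
    # prints "False 206", then "True 931"
```

The UTF-8 case reproduces the 206 from the original report, which is why adding an explicit `use utf8;` remains the way to get character semantics for UTF-8 source.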
--
Karl Williamson


p5pRT commented Apr 8, 2017

From @khwilliamson

I have now changed the doc to be accurate about this
in commit 27c74df.

--
Karl Williamson


p5pRT commented Apr 8, 2017

@khwilliamson - Status changed from 'open' to 'pending release'


p5pRT commented May 30, 2017

From @khwilliamson

Thank you for filing this report. You have helped make Perl better.

With the release today of Perl 5.26.0, this and 210 other issues have been
resolved.

Perl 5.26.0 may be downloaded via:
https://metacpan.org/release/XSAWYERX/perl-5.26.0

If you find that the problem persists, feel free to reopen this ticket.


p5pRT commented May 30, 2017

@khwilliamson - Status changed from 'pending release' to 'resolved'
