"use bytes" doesn't apply byte semantics to concatenation #7114

Closed
p5pRT opened this issue Feb 19, 2004 · 5 comments

@p5pRT

p5pRT commented Feb 19, 2004

Migrated from rt.perl.org#26905 (status was 'resolved')

Searchable as RT26905$

@p5pRT

p5pRT commented Feb 19, 2004

From @jlokier

Created by @jlokier

The "use bytes" pragma is useful for code which only wants to handle bytes.

substr(), length(), index(), pos() and regex matching all ignore the
UTF-8 flag on strings in the scope of this pragma.

However, string concatenation does not take this pragma into account.
Just like without the pragma, it upgrades strings to UTF-8 if any of
them are UTF-8.

This is quite inconsistent with the algebraic properties expected of
byte strings, such as:

  length(substr($a,0,1).substr($a,1)) == length($a)

Here's an example program which illustrates this:

  $x="\x{100}abc";
  $y="\x{80}def";
  use bytes;
  print length($x), ",", length($y), "\n";
  $z = $x.substr($x,0,1).substr($x,1).$y;
  print length($x), ",", length($y), ",", length($z), "\n";

The program prints:

  5,4
  5,5,17

Those numbers make no sense. In bytes, length($x) is 5 and length($y)
is 4. After the concatenation, the total is 17, when it should
logically be 14.
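
The 17 can be accounted for by the upgrade itself: every non-UTF-8 operand that contains a byte above 0x7F grows by one byte when it is upgraded. The following is a minimal sketch of that arithmetic, not the reporter's code. It avoids "use bytes" entirely and models the upgrade with utf8::encode; the helper name upgraded_len is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: byte length a non-UTF-8 string would have after
# being upgraded to UTF-8 (each byte above 0x7F becomes two bytes).
sub upgraded_len {
    my ($s) = @_;        # work on a copy
    utf8::encode($s);    # Latin-1 characters -> UTF-8 octets
    return length $s;
}

my $x_octets = "\xC4\x80abc";   # the 5 octets that store "\x{100}abc"
my $y        = "\x{80}def";     # 4 octets, not UTF-8

# The byte-wise pieces that substr() yields under "use bytes".
my $head = substr($x_octets, 0, 1);   # "\xC4"
my $tail = substr($x_octets, 1);      # "\x80abc"

# What byte semantics would give: plain sum of the byte lengths.
my $raw_total = length($x_octets) + length($head) + length($tail) + length($y);

# What was observed: $x is already UTF-8 and keeps its 5 bytes,
# while the other three operands are upgraded before being appended.
my $upgraded_total = length($x_octets)
                   + upgraded_len($head)
                   + upgraded_len($tail)
                   + upgraded_len($y);

print "$raw_total,$upgraded_total\n";   # prints "14,17"
```

The three extra bytes come from the three operands that each carry exactly one byte above 0x7F (0xC4, 0x80 and 0x80).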

(This also shows length($y) is modified simply by $y being read,
reported as [perl #26901]. In this case, length($y) is 4 before the
concatenation but 5 after.)

Summary: I think string concatenation should _not_ upgrade non-UTF-8
strings to UTF-8 when they are concatenated inside the scope of "use
bytes". A warning or even an exception may be appropriate.
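
An exception of the kind suggested here can be approximated in user code. This is only a sketch under assumptions: concat_octets is a hypothetical helper, and it works on logical characters via utf8::downgrade rather than on the internal representation that "use bytes" exposes, so it is a different technique from the change being requested.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical helper: concatenate without ever upgrading to UTF-8.
# Dies if an argument holds a character above 0xFF, since such a
# string has no one-byte-per-character form.
sub concat_octets {
    my $out = '';
    for my $piece (@_) {
        my $copy = $piece;
        # FAIL_OK = 1: return false instead of dying on wide characters.
        utf8::downgrade($copy, 1)
            or croak "Wide character in byte-string concatenation";
        $out .= $copy;
    }
    return $out;
}

my $joined = concat_octets("\x{80}def", "\xC4", "\x80abc");
print length($joined), "\n";    # 4 + 1 + 4 = 9

my $ok = eval { concat_octets("\x{100}abc", "\x{80}def"); 1 };
print $ok ? "accepted\n" : "rejected\n";   # prints "rejected"
```

With byte-only inputs the length of the result is the sum of the input lengths, which is the property the report asks for; a string such as "\x{100}abc" is refused rather than silently changing the lengths of its neighbours.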

Perl Info

Flags:
    category=core
    severity=medium

Site configuration information for perl v5.8.0:

Configured by bhcompile at Wed Aug 13 11:45:59 EDT 2003.

Summary of my perl5 (revision 5.0 version 8 subversion 0) configuration:
  Platform:
    osname=linux, osvers=2.4.21-1.1931.2.382.entsmp, archname=i386-linux-thread-multi
    uname='linux str'
    config_args='-des -Doptimize=-O2 -g -pipe -march=i386 -mcpu=i686 -Dmyhostname=localhost -Dperladmin=root@localhost -Dcc=gcc -Dcf_by=Red Hat, Inc. -Dinstallprefix=/usr -Dprefix=/usr -Darchname=i386-linux -Dvendorprefix=/usr -Dsiteprefix=/usr -Dotherlibdirs=/usr/lib/perl5/5.8.0 -Duseshrplib -Dusethreads -Duseithreads -Duselargefiles -Dd_dosuid -Dd_semctl_semun -Di_db -Ui_ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Duseperlio -Dinstallusrbinperl -Ubincompat5005 -Uversiononly -Dpager=/usr/bin/less -isr'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=define use5005threads=undef'
 useithreads=define usemultiplicity=
    useperlio= d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=un uselongdouble=
    usemymalloc=, bincompat5005=undef
  Compiler:
    cc='gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm',
    optimize='',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING -fno-strict-aliasing -I/usr/local/include -I/usr/include/gdbm'
    ccversion='', gccversion='3.2.2 20030222 (Red Hat Linux 3.2.2-5)', gccosandvers=''
gccversion='3.2.2 200302'
    intsize=r, longsize=r, ptrsize=5, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long'
k', ivsize=4'
ivtype='l, nvtype='double'
o_nonbl', nvsize=, Off_t='', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='gcc'
l', ldflags =' -L/u'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -lgdbm -ldb -ldl -lm -lpthread -lc -lcrypt -lutil
    perllibs=
    libc=/lib/libc-2.3.2.so, so=so, useshrplib=true, libperl=libper
    gnulibc_version='2.3.2'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so', d_dlsymun=undef, ccdlflags='-rdynamic -Wl,-rpath,/usr/lib/perl5/5.8.0/i386-linux-thread-multi/CORE'
    cccdlflags='-fPIC'
ccdlflags='-rdynamic -Wl,-rpath,/usr/lib/perl5', lddlflags='s Unicode/Normalize XS/A'

Locally applied patches:
    MAINT18379


@INC for perl v5.8.0:
    /usr/lib/perl5/5.8.0/i386-linux-thread-multi
    /usr/lib/perl5/5.8.0
    /usr/lib/perl5/site_perl/5.8.0/i386-linux-thread-multi
    /usr/lib/perl5/site_perl/5.8.0
    /usr/lib/perl5/site_perl
    /usr/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi
    /usr/lib/perl5/vendor_perl/5.8.0
    /usr/lib/perl5/vendor_perl
    /usr/lib/perl5/5.8.0/i386-linux-thread-multi
    /usr/lib/perl5/5.8.0
    .


Environment for perl v5.8.0:
    HOME=/home/jamie
    LANG=en_GB.UTF-8
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/home/jamie/bin
    PERL_BADLANG (unset)
    SHELL=/bin/bash

@p5pRT

p5pRT commented Feb 23, 2004

From @rgs

SADAHIRO Tomoyuki wrote:

This is because join() internally uses sv_catsv() which
considers bytes.pm.

Here is a patch against perl-current.
After this patch the above example prints:
[snip]

After my patch for pp_hot.c, some tests for Encode fail.

Failed Test  Stat Wstat Total Fail  Failed  List of Failed
t/CJKT.t        1   256    60    1   1.67%  22
t/at-cn.t       2   512    29    2   6.90%  18 20
t/perlio.t      2   512    38    2   5.26%  7-8

This is caused by an unnecessary (I think) declaration of <use bytes>
in Encode::CN::HZ. If E::CN::HZ is fixed, all the tests for Encode
should succeed.

In addition, having perlio_ok always return true is wrong
(it should return false if PerlIO::encoding is not available),
so the default method in Encode::Encoding:: should be used.

Thanks, both patches applied to bleadperl as change #22363.
Note that I've changed the version number of Encode::CN::HZ to
1.05_01. The change to Encode::CN::HZ should probably be made
conditional on perl version >= 5.9.1.

@p5pRT

p5pRT commented Feb 23, 2004

The RT System itself - Status changed from 'new' to 'open'

@p5pRT

p5pRT commented Feb 24, 2004

From dankogai@dan.co.jp

On Feb 23, 2004, at 01:26, Autrijus Tang wrote:

On Sun, Feb 22, 2004 at 06:41:43PM +0900, SADAHIRO Tomoyuki wrote:

After my patch for pp_hot.c, some tests for Encode fail.

Failed Test  Stat Wstat Total Fail  Failed  List of Failed
t/CJKT.t        1   256    60    1   1.67%  22
t/at-cn.t       2   512    29    2   6.90%  18 20
t/perlio.t      2   512    38    2   5.26%  7-8

This is caused by an unnecessary (I think) declaration of <use bytes>
in Encode::CN::HZ. If E::CN::HZ is fixed, all the tests for Encode
should succeed.

As the author of HZ.pm, I think the patch makes perfect sense. :-)

Sorry for my slow response. I was too busy to be online for the last few
days.

I just checked the patch on both 5.8.0 and 5.8.3, and it worked fine, so
it is backward-compatible. There is now no reason not to let your
patch in. I have already applied it in my repository.

Pumpking(s), please go ahead and apply his patch.

Dan the Encode Maintainer

@p5pRT

p5pRT commented Jun 21, 2008

p5p@spam.wizbit.be - Status changed from 'open' to 'resolved'
