string-to-number coercion caching broken by locale #15952

Open
p5pRT opened this issue Apr 15, 2017 · 5 comments
Comments

@p5pRT

p5pRT commented Apr 15, 2017

Migrated from rt.perl.org#131155 (status was 'open')

Searchable as RT131155$

@p5pRT
Author

p5pRT commented Apr 15, 2017

From zefram@fysh.org

Created by zefram@fysh.org

Riffing off the discussion in [perl #130801] of the locale dependence of
number-to-string coercion, and hence why we no longer cache that coercion
in the scalar, I had a look at the converse string-to-number coercion,
and found a bug:

$ LANG=de_DE perl -lwe '$a = "1,50"; { use locale; print 0+$a; } print 0+$a'
1,5
1.5
$ LANG=de_DE perl -lwe '$a = "1,50"; print 0+$a'
Argument "1,50" isn't numeric in addition (+) at -e line 1.
1

Observe that the string-to-number coercion is affected by locale,
accepting comma as a decimal separator iff the locale uses comma in
that way. (Not shown: dot is accepted as a decimal separator regardless
of locale.) The result of the coercion is cached in the scalar, and
subsequent numeric use of the scalar returns the cached value without
recomputing the coercion. Given the locale dependence, this caching
is in principle wrong, because it means that coercions performed under
different locale settings aren't getting their locale-specific results.
That can be seen above, with the non-locale coercion producing a different
result depending on whether a locale-using coercion was earlier performed.

But actually I think the locale dependence here is a mistake. Unlike the
locale dependence of number-to-string coercion, the locale dependence of
this operation doesn't exist in any form in old perls. It appeared from
nowhere in perl 5.19.8, presumably in the same edit that (intentionally)
changed the form of the locale control for number-to-string coercion.
We also have some semantic reliance on the caching for reasons other than
this effect on the value yielded: we only warn once about a non-numeric
value, and we're somewhat open about the influence of an argument having
been used in numeric context on the bitwise operators.

Perl Info

Flags:
    category=core
    severity=low

Site configuration information for perl 5.25.11:

Configured by zefram at Mon Mar 20 22:41:51 GMT 2017.

Summary of my perl5 (revision 5 version 25 subversion 11) configuration:
   
  Platform:
    osname=linux
    osvers=3.16.0-4-amd64
    archname=x86_64-linux-thread-multi
    uname='linux barba.rous.org 3.16.0-4-amd64 #1 smp debian 3.16.7-ckt11-1+deb8u6 (2015-11-09) x86_64 gnulinux '
    config_args='-des -Dprefix=/home/zefram/usr/perl/perl_install/perl-5.25.11-i64-f52 -Duselargefiles -Dusethreads -Uafs -Ud_csh -Uusesfio -Uusenm -Duseshrplib -Dusedevel -Uversiononly -Ui_db'
    hint=recommended
    useposix=true
    d_sigaction=define
    useithreads=define
    usemultiplicity=define
    use64bitint=define
    use64bitall=define
    uselongdouble=undef
    usemymalloc=n
    default_inc_excludes_dot=define
    bincompat5005=undef
  Compiler:
    cc='cc'
    ccflags ='-D_REENTRANT -D_GNU_SOURCE -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2'
    optimize='-O2'
    cppflags='-D_REENTRANT -D_GNU_SOURCE -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include'
    ccversion=''
    gccversion='4.9.2'
    gccosandvers=''
    intsize=4
    longsize=8
    ptrsize=8
    doublesize=8
    byteorder=12345678
    doublekind=3
    d_longlong=define
    longlongsize=8
    d_longdbl=define
    longdblsize=16
    longdblkind=3
    ivtype='long'
    ivsize=8
    nvtype='double'
    nvsize=8
    Off_t='off_t'
    lseeksize=8
    alignbytes=8
    prototype=define
  Linker and Libraries:
    ld='cc'
    ldflags =' -fstack-protector-strong -L/usr/local/lib'
    libpth=/usr/local/lib /usr/lib/gcc/x86_64-linux-gnu/4.9/include-fixed /usr/include/x86_64-linux-gnu /usr/lib /lib/x86_64-linux-gnu /lib/../lib /usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib
    libs=-lpthread -lnsl -ldb -ldl -lm -lcrypt -lutil -lc
    perllibs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc
    libc=libc-2.19.so
    so=so
    useshrplib=true
    libperl=libperl.so
    gnulibc_version='2.19'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs
    dlext=so
    d_dlsymun=undef
    ccdlflags='-Wl,-E -Wl,-rpath,/home/zefram/usr/perl/perl_install/perl-5.25.11-i64-f52/lib/5.25.11/x86_64-linux-thread-multi/CORE'
    cccdlflags='-fPIC'
    lddlflags='-shared -O2 -L/usr/local/lib -fstack-protector-strong'



@INC for perl 5.25.11:
    /home/zefram/usr/perl/perl_install/perl-5.25.11-i64-f52/lib/site_perl/5.25.11/x86_64-linux-thread-multi
    /home/zefram/usr/perl/perl_install/perl-5.25.11-i64-f52/lib/site_perl/5.25.11
    /home/zefram/usr/perl/perl_install/perl-5.25.11-i64-f52/lib/5.25.11/x86_64-linux-thread-multi
    /home/zefram/usr/perl/perl_install/perl-5.25.11-i64-f52/lib/5.25.11


Environment for perl 5.25.11:
    HOME=/home/zefram
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/zefram/usr/perl/perl_install/perl-5.25.11-i64-f52/bin:/home/zefram/usr/perl/util:/home/zefram/pub/x86_64-unknown-linux-gnu/bin:/home/zefram/pub/common/bin:/usr/bin:/bin:/usr/local/bin:/usr/games
    PERL_BADLANG (unset)
    SHELL=/usr/bin/zsh

@p5pRT
Author

p5pRT commented Sep 30, 2017

From @jkeenan

On Sat, 15 Apr 2017 00:44:17 GMT, zefram@fysh.org wrote:

This is a bug report for perl from zefram@fysh.org,
generated with the help of perlbug 1.40 running under perl 5.25.11.

-----------------------------------------------------------------
[Please describe your issue here]

Riffing off the discussion in [perl #130801] of the locale dependence of
number-to-string coercion, and hence why we no longer cache that coercion
in the scalar, I had a look at the converse string-to-number coercion,
and found a bug:

$ LANG=de_DE perl -lwe '$a = "1,50"; { use locale; print 0+$a; } print 0+$a'
1,5
1.5
$ LANG=de_DE perl -lwe '$a = "1,50"; print 0+$a'
Argument "1,50" isn't numeric in addition (+) at -e line 1.
1

Observe that the string-to-number coercion is affected by locale,
accepting comma as a decimal separator iff the locale uses comma in
that way. (Not shown: dot is accepted as a decimal separator regardless
of locale.) The result of the coercion is cached in the scalar, and
subsequent numeric use of the scalar returns the cached value without
recomputing the coercion. Given the locale dependence, this caching
is in principle wrong, because it means that coercions performed under
different locale settings aren't getting their locale-specific results.
That can be seen above, with the non-locale coercion producing a different
result depending on whether a locale-using coercion was earlier performed.

But actually I think the locale dependence here is a mistake. Unlike the
locale dependence of number-to-string coercion, the locale dependence of
this operation doesn't exist in any form in old perls. It appeared from
nowhere in perl 5.19.8, presumably in the same edit that (intentionally)
changed the form of the locale control for number-to-string coercion.
We also have some semantic reliance on the caching for reasons other than
this effect on the value yielded: we only warn once about a non-numeric
value, and we're somewhat open about the influence of an argument having
been used in numeric context on the bitwise operators.

To analyze this problem I added the 'de_DE' locale per the instructions at https://askubuntu.com/questions/76013/how-do-i-add-locale-to-ubuntu-server#76106. I then opened two terminals, each running a different version of perl (via perlbrew).

#####
[p5p] 510 $ perl -v | head -2 | tail -1
This is perl 5, version 18, subversion 4 (v5.18.4) built for x86_64-linux
[p5p] 511 $ LANG=de_DE perl -lwe '$a = "1,50"; { use locale; print 0+$a; } print 0+$a'
Argument "1,50" isn't numeric in addition (+) at -e line 1.
1
1
[p5p] 512 $ LANG=de_DE perl -lwe '$a = "1,50"; print 0+$a'
Argument "1,50" isn't numeric in addition (+) at -e line 1.
1
#####
[tmp] 521 $ perl -v | head -2 | tail -1
This is perl 5, version 26, subversion 0 (v5.26.0) built for x86_64-linux
[tmp] 522 $ LANG=de_DE perl -lwe '$a = "1,50"; { use locale; print 0+$a; } print 0+$a'
1,5
1.5
[tmp] 525 $ LANG=de_DE perl -lwe '$a = "1,50"; print 0+$a'
Argument "1,50" isn't numeric in addition (+) at -e line 1.
1
#####

Are these the results I should have expected for these two versions of perl?

Thank you very much.
--
James E Keenan (jkeenan@cpan.org)

@p5pRT
Author

p5pRT commented Sep 30, 2017

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Author

p5pRT commented Sep 30, 2017

From zefram@fysh.org

James E Keenan via RT wrote:

Are these the results I should have expected for these two versions of perl?

Those results match what I see and what I described in the bug report.

-zefram

@p5pRT
Author

p5pRT commented Oct 1, 2017

From @jkeenan

On Sat, 30 Sep 2017 23:30:45 GMT, zefram@fysh.org wrote:

James E Keenan via RT wrote:

Are these the results I should have expected for these two versions of perl?

Those results match what I see and what I described in the bug report.

-zefram

Thanks. Bisection indicates that the following is the commit where the behavior changed:

#####
commit 2143189
Author: Karl Williamson <khw@cpan.org>
AuthorDate: Sun Jun 1 16:02:24 2014 -0600
Commit: Karl Williamson <khw@cpan.org>
CommitDate: Thu Jun 5 11:23:00 2014 -0600

  Make sure locale set right for radix parsing
 
  I haven't found a test case this fails for in v5.20, but I'm sure there is one. But two commits from now would fail if this wasn't done.

#####

Whether the change in behavior was good, bad, or some mix thereof, is up for discussion.

Thank you very much.

--
James E Keenan (jkeenan@cpan.org)
