Encode from_to() does not return on invalid conversion #9189

Closed
p5pRT opened this issue Jan 16, 2008 · 8 comments

Comments

@p5pRT

p5pRT commented Jan 16, 2008

Migrated from rt.perl.org#49830 (status was 'rejected')

Searchable as RT49830$

@p5pRT

p5pRT commented Jan 16, 2008

From perlbugs2008@j3e.de

Created by perlbugs2008@j3e.de

If from_to() is called with the check parameter Encode::FB_QUIET, it should return undef on errors. With Perl 5.10 this no longer works in the following scenario:

#!/usr/bin/perl
use strict;
use warnings;
use Encode 'from_to';
my $string = "\366"; # this is "o umlaut" in iso-8859-1, invalid utf-8
# from_to() returns undef on error; "== undef" would compare numerically
if (not defined from_to($string, 'utf8', 'utf8', Encode::FB_QUIET)) {
  print "from_to utf8..utf8 returns undef as is should!\n";
} else {
  print "from_to utf8..utf8 of non-UTF-8 strings returns NO error!\n";
  print "foo: $string\n";
}

In this case, Perl 5.10 converts \366 to \357 \277 \275, which is the UTF-8 encoding of U+FFFD (the Unicode replacement character).
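Those three octets are what you get when the malformed byte is replaced by U+FFFD during decoding and the result is encoded back to UTF-8. A minimal sketch, assuming a current Encode where decode() without a CHECK argument substitutes U+FFFD for malformed input:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "\366" (0xF6) is "o umlaut" in ISO-8859-1 but malformed as UTF-8.
my $octets = "\366";

# With no CHECK argument, decode() replaces malformed input with U+FFFD.
my $chars = decode('UTF-8', $octets);
printf "decoded: U+%04X\n", ord $chars;    # decoded: U+FFFD

# Re-encoding U+FFFD as UTF-8 produces the octets \357 \277 \275.
my $reencoded = encode('UTF-8', $chars);
print join(' ', map { sprintf '\\%o', ord } split //, $reencoded), "\n";
```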

With Perl <= 5.8.8, from_to returned undef, which is more reasonable.
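For comparison, passing FB_QUIET to decode() directly does surface the error the report expects: Encode documents that decoding stops at the first malformed sequence and that the data argument is overwritten with the unprocessed remainder. A minimal sketch, assuming that documented CHECK behaviour; `is_valid_utf8_octets` is a hypothetical helper name, not part of Encode:

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not part of Encode). With FB_QUIET, decode() stops at
# the first malformed sequence and overwrites its data argument with the
# unprocessed octets, so anything left over means the input was not UTF-8.
sub is_valid_utf8_octets {
    my ($octets) = @_;    # local copy, because decode() will overwrite it
    Encode::decode('UTF-8', $octets, Encode::FB_QUIET);
    return length($octets) == 0;
}

for my $input ("h\303\266", "\366") {
    my $shown = join ' ', map { sprintf '\\%o', ord } split //, $input;
    printf "%s: %s\n", $shown,
        is_valid_utf8_octets($input) ? 'valid UTF-8' : 'invalid UTF-8';
}
```

Working on a copy keeps the caller's buffer intact; passing `Encode::FB_QUIET | Encode::LEAVE_SRC` would also prevent the overwrite, but then the leftover octets could no longer be inspected.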

Perl Info

Flags:
    category=library
    severity=high

This perlbug was built using Perl 5.10.0 - Sat Jan 12 04:21:23 UTC 2008
It is being executed now by  Perl 5.10.0 - Sat Jan 12 04:15:46 UTC 2008.

Site configuration information for perl 5.10.0:

Configured by abuild at Sat Jan 12 04:15:46 UTC 2008.

Summary of my perl5 (revision 5 version 10 subversion 0) configuration:
  Platform:
    osname=linux, osvers=2.6.23, archname=i586-linux-thread-multi
    uname='linux smetana 2.6.23 #1 smp thu may 17 14:00:09 utc 2007 i686 i686 i386 gnulinux '
    config_args='-ds -e -Dprefix=/usr -Dvendorprefix=/usr -Dinstallusrbinperl -Dusethreads -Di_db -Di_dbm -Di_ndbm -Di_gdbm -Duseshrplib=true -Doptimize=-O2 -march=i586 -mtune=i686 -fmessage-length=0 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector -g -Wall -pipe'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=undef, use64bitall=undef, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBUGGING -fno-strict-aliasing -pipe -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2 -march=i586 -mtune=i686 -fmessage-length=0 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector -g -Wall -pipe',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBUGGING -fno-strict-aliasing -pipe'
    ccversion='', gccversion='4.3.0 20080102 (experimental) [trunk revision 131254]', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =''
    libpth=/lib /usr/lib /usr/local/lib
    libs=-lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
    perllibs=-lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
    libc=/lib/libc-2.7.so, so=so, useshrplib=true, libperl=libperl.so
    gnulibc_version='2.7'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E -Wl,-rpath,/usr/lib/perl5/5.10.0/i586-linux-thread-multi/CORE'
    cccdlflags='-fPIC', lddlflags='-shared -O2 -march=i586 -mtune=i686 -fmessage-length=0 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector -g -Wall -pipe'

Locally applied patches:
    


@INC for perl 5.10.0:
    /usr/lib/perl5/5.10.0/i586-linux-thread-multi
    /usr/lib/perl5/5.10.0
    /usr/lib/perl5/site_perl/5.10.0/i586-linux-thread-multi
    /usr/lib/perl5/site_perl/5.10.0
    /usr/lib/perl5/vendor_perl/5.10.0/i586-linux-thread-multi
    /usr/lib/perl5/vendor_perl/5.10.0
    /usr/lib/perl5/vendor_perl
    .


Environment for perl 5.10.0:
    HOME=/home/bjacke
    LANG=de_DE.UTF-8
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/opt/kde3/bin:/home/bjacke/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/X11R6/bin:/usr/games:/usr/lib/mit/bin:/usr/lib/mit/sbin
    PERL_BADLANG (unset)
    SHELL=/bin/bash

@p5pRT

p5pRT commented May 28, 2009

From @nwc10

Dave notes:

a 5.10.0 regression apparently

@p5pRT

p5pRT commented May 28, 2009

@nwc10 - Status changed from 'new' to 'open'

@p5pRT

p5pRT commented May 28, 2009

From p5p@spam.wizbit.be

(CC'ing the maintainer of Encode: Dan Kogai)

Dan,

Could you please take a look at this bug report and help us determine
if this is intended behaviour, a bug in Encode or a bug in perl?

On Wed Jan 16 04:54:34 2008, perlbugs2008@j3e.de wrote:

-----------------------------------------------------------------
[Please enter your report here]

If from_to() is called with the check parameter Encode::FB_QUIET, it
should return undef on errors. With Perl 5.10 this no longer works
in the following scenario:

#!/usr/bin/perl
use strict;
use warnings;
use Encode 'from_to';
my $string = "\366"; # this is "o umlaut" in iso-8859-1, invalid utf-8
# from_to() returns undef on error; "== undef" would compare numerically
if (not defined from_to($string, 'utf8', 'utf8', Encode::FB_QUIET)) {
  print "from_to utf8..utf8 returns undef as is should!\n";
} else {
  print "from_to utf8..utf8 of non-UTF-8 strings returns NO error!\n";
  print "foo: $string\n";
}

In this case, Perl 5.10 converts \366 to \357 \277 \275.

With Perl <= 5.8.8, from_to returned undef, which is more reasonable.

-----------------------------------------------------------------

perl-5.8.8 contains Encode v2.12
$ perl-5.8.8 rt-49830.pl
from_to utf8..utf8 returns undef as is should!

perl-5.8.9 contains Encode v2.26
$ perl-5.8.9 rt-49830.pl
from_to utf8..utf8 of non-UTF-8 strings returns NO error!

perl-5.9.2 contains Encode v2.09
$ perl-5.9.2 rt-49830.pl
from_to utf8..utf8 returns undef as is should

perl-5.9.3 contains Encode v2.14
$ perl-5.9.3 rt-49830.pl
from_to utf8..utf8 of non-UTF-8 strings returns NO error!

A binary search:
----EOF ($?='0')----
Will binsearch the lower half
Running the prog '/tmp/rt-49830.pl' for installed-perls/perl/peZnlj8/perl-5.9.2@26861/bin/perl and installed-perls/perl/pKNe6tf/perl-5.9.2@26863/bin/perl
----Program----
#!/usr/bin/perl

use Encode 'from_to';
my $string = "\366"; # this is "o umlaut" in iso-8859-1, invalid utf-8
if (not defined from_to($string,utf8,utf8,Encode::FB_QUIET)) {
  print "from_to utf8..utf8 returns undef as is should!\n";
} else {
  print "from_to utf8..utf8 of non-UTF-8 strings returns NO error!\n";
# print "foo: $string\n";
}

----Output of .../peZnlj8/perl-5.9.2@26861/bin/perl----
from_to utf8..utf8 returns undef as is should!

----EOF ($?='0')----
----Output of .../pKNe6tf/perl-5.9.2@26863/bin/perl----
from_to utf8..utf8 of non-UTF-8 strings returns NO error!

----EOF ($?='0')----

http://public.activestate.com/cgi-bin/perlbrowse/p/26863
Change 26863 by rgs@stencil on 2006/01/16 14:09:29

  Upgrade to Encode 2.14

perl-5.9.2@26861 contains Encode v2.12
perl-5.9.2@26863 contains Encode v2.14

Running it with the latest version of Encode (v2.33) on perl-5.8.8:
$ perl /tmp/rt-49830.pl
from_to utf8..utf8 of non-UTF-8 strings returns NO error!

So this looks like a change in behaviour (maybe intended, maybe not) in
Encode and not in perl.

Best regards,

Bram

@p5pRT
Copy link
Author

p5pRT commented May 29, 2009

From @tonycoz

This appears to be deliberate; in particular, see:

http://rt.cpan.org/Public/Bug/Display.html?id=27277

and in Encode.pm:

  Also note that

  from_to($octets, $from, $to, $check);

  is equivalent to

  $octets = encode($to, decode($from, $octets), $check);

  Yes, it does not respect the $check during decoding. It is
  deliberately done that way.
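Given that equivalence, one way to get back the behaviour of Encode versions before 2.14 is to perform the two steps yourself and pass a CHECK value to decode() as well. A minimal sketch: `strict_from_to` is a hypothetical name, not part of Encode, and it uses FB_CROAK inside eval (rather than FB_QUIET) so that any malformed or unmappable input makes it return undef and leaves the caller's buffer unchanged:

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper, not part of Encode: behaves like
# from_to($octets, $from, $to), but also applies FB_CROAK to the
# decoding step, so malformed input yields undef instead of U+FFFD.
sub strict_from_to {
    my (undef, $from, $to) = @_;    # $_[0] is aliased to the caller's buffer
    my $copy = $_[0];               # decode() with a CHECK may modify its source

    my $chars = eval { Encode::decode($from, $copy, Encode::FB_CROAK) };
    return unless defined $chars;   # malformed input: leave $_[0] untouched

    my $octets = eval { Encode::encode($to, $chars, Encode::FB_CROAK) };
    return unless defined $octets;  # character not representable in $to

    $_[0] = $octets;                # modify in place, like from_to()
    return length $octets;          # from_to() also returns the octet length
}

my $string = "\366";    # "o umlaut" in ISO-8859-1, invalid UTF-8
if (not defined strict_from_to($string, 'UTF-8', 'UTF-8')) {
    print "strict_from_to rejected the malformed input\n";
}
```

Decoding into a copy means a failed conversion never leaves the caller's buffer half-converted.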

@p5pRT

p5pRT commented May 29, 2009

From @tonycoz

So the list sees it...

On Fri May 29 05:41:29 2009, tonyc wrote:

This appears to be deliberate; in particular, see:

http://rt.cpan.org/Public/Bug/Display.html?id=27277

and in Encode.pm:

     Also note that

       from_to($octets, $from, $to, $check);

     is equivalent to

       $octets = encode($to, decode($from, $octets), $check);

     Yes, it does not respect the $check during decoding.  It is
     deliberately done that way.

@p5pRT

p5pRT commented May 31, 2009

From p5p@spam.wizbit.be

On Wed Jan 16 04:54:34 2008, perlbugs2008@j3e.de wrote:

This is a bug report for perl from perlbugs2008@j3e.de,
generated with the help of perlbug 1.36 running under perl 5.10.0.

-----------------------------------------------------------------
[Please enter your report here]

If from_to() is called with the check parameter Encode::FB_QUIET, it
should return undef on errors.

This is documented as a feature in Encode.

Marking this bug as rejected.

Best regards,

Bram

@p5pRT

p5pRT commented May 31, 2009

p5p@spam.wizbit.be - Status changed from 'open' to 'rejected'
