Skip Menu |
 
Report information
Id: 38456
Status: resolved
Priority: 0/
Queue: perl5

Owner: Nobody
Requestors: ciaran [at] tnauk.org.uk
Cc:
AdminCc:

Operating System: cygwin
PatchStatus: (no value)
Severity: medium
Type:
  • core
  • PerlIO
  • Unicode
Perl Version: 5.8.7
Fixed In: (no value)

Attachments
0001-binmode-FH-crlf-only-modifies-top-crlf-layer.patch
0001-Update-comment-for-PerlIOCrlf_pushed-wrt-7826b36.patch
0002-Update-PerlIO-docs-for-crlf-wrt-7826b36f.patch



CC: gerrit [...] familiehaase.de, ciaran [...] theblob.org
Subject: The :crlf PerlIO layer doesn't like the :encoding layer.
Date: Tue, 07 Feb 2006 12:16:40 +0000
To: perlbug [...] perl.org
From: Ciaran Hamilton <ciaran [...] tnauk.org.uk>
Download (untitled) / with headers
text/plain 7.4k
This is a bug report for perl from ciaran@tnauk.org.uk, generated with the help of perlbug 1.35 running under perl v5.8.7. ----------------------------------------------------------------- [Please enter your report here] (note: This report was compiled with perlbug, then sent manually with Thunderbird as perlbug had trouble. However, I've made sure that any long lines that matter are NOT wrapped.) There seems to be a problem in the way that the ":crlf" PerlIO layer is implemented. I'm not entirely sure where the problem lies in :crlf, but it doesn't play nicely with ":encoding". I have observed this behaviour in: * Perl v5.8.7, Cygwin (prepackaged) * Perl v5.8.7, Gentoo Linux (dev-lang/perl-5.8.7-r3) * Perl v5.9.1, Cygwin (compiled) Here is a test case: ===CUT HERE=== #!/usr/bin/perl -w open(FILE, ">:encoding(UTF-16):crlf", "test-file"); print FILE "Test \x{A3}45!\n"; print FILE "Test!\n"; close(FILE); ===CUT HERE=== This should generate a valid UTF-16 file , containing "Test £45!\r\nTest!\r\n", where the \r and \n characters represent \015 and \012 respectively, and £ represents the character U+00A3, the British pound symbol. On a standard version of Perl, what actually happens is that I get a message: Show quoted text
> Malformed UTF-8 character (unexpected continuation byte 0xa3, with no > preceding start byte) in null operation at ./utf16-test.pl line 6.
and the pound sign is replaced with a null character. Using a literal (not interpolated) UTF-8 character sequence for U+00A3 (ie. C2 A3) in the file seems to work fine. As noted above, while I am submitting this report from Perl v5.8.7, I have compiled Perl v5.9.1 on Cygwin and observed the same behaviour. The output of "perl5.9.1 -V" is included at the end of the user-modifiable section. One workaround seems to be to install PerlIO::eol from CPAN and replace the ":crlf" layer with ":eol(CRLF)". This works correctly, and test-file contains what it should. I have not mentioned ActiveState Perl in this report as I assume it is not your responsibility. (For the record, ActiveState Perl v5.8.7 seems to treat the layers as if they were reversed, so the inserted CR doesn't get encoded, and thus corrupts the UTF-16 file by inserting a single-byte character instead of a double-byte character.) Thank you. Following is the output of "perl5.9.1 -V": ===CUT HERE=== ciaran@IT-16:~/utf8$ perl5.9.1 -V Summary of my perl5 (revision 5 version 9 subversion 1) configuration: Platform: osname=cygwin, osvers=1.5.19(0.15042), archname=cygwin uname='cygwin_nt-5.1 it-16 1.5.19(0.15042) 2006-01-20 13:28 i686 cygwin ' config_args='-Dmksymlinks' hint=recommended, useposix=true, d_sigaction=define usethreads=undef useithreads=undef usemultiplicity=undef useperlio=define d_sfio=undef uselargefiles=define usesocks=undef use64bitint=undef use64bitall=undef uselongdouble=undef usemymalloc=y, bincompat5005=undef Compiler: cc='gcc', ccflags ='-DPERL_USE_SAFE_PUTENV -fno-strict-aliasing', optimize='-O2', cppflags='-DPERL_USE_SAFE_PUTENV -fno-strict-aliasing' ccversion='', gccversion='3.4.4 (cygming special) (gdc 0.12, using dmd 0.125)', gccosandvers='' intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234 d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12 ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8 alignbytes=8, prototype=define Linker and Libraries: ld='ld2', ldflags =' -L/usr/local/lib' libpth=/usr/local/lib /usr/lib /lib libs=-lgdbm -lcrypt -lgdbm_compat perllibs=-lcrypt -lgdbm_compat libc=/usr/lib/libc.a, so=dll, useshrplib=true, libperl=libperl.a gnulibc_version='' Dynamic Linking: dlsrc=dl_dlopen.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' ' cccdlflags=' ', lddlflags=' -L/usr/local/lib' Characteristics of this binary (from libperl): Compile-time options: USE_LARGE_FILES Built under cygwin Compiled at Feb 7 2006 09:58:18 %ENV: PERL5LIB="C:/HATS/scripts/lib" CYGWIN="" @INC: C /HATS/scripts/lib /usr/local/lib/perl5/5.9.1/cygwin /usr/local/lib/perl5/5.9.1 /usr/local/lib/perl5/site_perl/5.9.1/cygwin /usr/local/lib/perl5/site_perl/5.9.1 /usr/local/lib/perl5/site_perl . ===CUT HERE=== [Please do not change anything below this line] ----------------------------------------------------------------- --- Flags: category=core severity=medium --- Site configuration information for perl v5.8.7: Configured by gerrit at Fri Dec 30 02:40:15 2005. Summary of my perl5 (revision 5 version 8 subversion 7) configuration: Platform: osname=cygwin, osvers=1.5.18(0.13242), archname=cygwin-thread-multi-64int uname='cygwin_nt-5.1 inspiron 1.5.18(0.13242) 2005-07-02 20:30 i686 unknown unknown cygwin ' config_args='-de -Dmksymlinks -Duse64bitint -Dusethreads -Uusemymalloc -Doptimize=-O3 -Dman3ext=3pm -Dusesitecustomize' hint=recommended, useposix=true, d_sigaction=define usethreads=define use5005threads=undef useithreads=define usemultiplicity=define useperlio=define d_sfio=undef uselargefiles=define usesocks=undef use64bitint=define use64bitall=undef uselongdouble=undef usemymalloc=n, bincompat5005=undef Compiler: cc='gcc', ccflags ='-DPERL_USE_SAFE_PUTENV -fno-strict-aliasing -pipe -I/usr/local/include', optimize='-O3', cppflags='-DPERL_USE_SAFE_PUTENV -fno-strict-aliasing -pipe -I/usr/local/include' ccversion='', gccversion='3.4.4 (cygming special) (gdc 0.12, using dmd 0.125)', gccosandvers='' intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=12345678 d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12 ivtype='long long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8 alignbytes=8, prototype=define Linker and Libraries: ld='ld2', ldflags =' -s -L/usr/local/lib' libpth=/usr/local/lib /lib /usr/lib libs=-lgdbm -ldb -lcrypt -lgdbm_compat perllibs=-lcrypt -lgdbm_compat libc=/usr/lib/libc.a, so=dll, useshrplib=true, libperl=libperl.a gnulibc_version='' Dynamic Linking: dlsrc=dl_dlopen.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' -s' cccdlflags=' ', lddlflags=' -s -L/usr/local/lib' Locally applied patches: SPRINTF0 - fixes for sprintf formatting issues - CVE-2005-3962 --- @INC for perl v5.8.7: C /HATS/scripts/lib /usr/lib/perl5/5.8/cygwin /usr/lib/perl5/5.8 /usr/lib/perl5/site_perl/5.8/cygwin /usr/lib/perl5/site_perl/5.8 /usr/lib/perl5/site_perl/5.8/cygwin /usr/lib/perl5/site_perl/5.8 /usr/lib/perl5/vendor_perl/5.8/cygwin /usr/lib/perl5/vendor_perl/5.8 /usr/lib/perl5/vendor_perl/5.8/cygwin /usr/lib/perl5/vendor_perl/5.8 . --- Environment for perl v5.8.7: HOME=/home/ciaran LANG=en_GB LANGUAGE (unset) LD_LIBRARY_PATH (unset) LOGDIR (unset) PATH=/home/ciaran/bin:/usr/bin:/usr/local/bin:/bin:/usr/X11R6/bin:/cygdrive/c/Perl/bin/:/cygdrive/c/PHP/:/cygdrive/c/swig:/cygdrive/c/gnuwin/gnuwin32/bin:/cygdrive/c/WINDOWS/system32:/cygdrive/c/WINDOWS:/cygdrive/c/WINDOWS/System32/Wbem:/cygdrive/c/PROGRA~1/ULTRAE~1:/cygdrive/c/Program Files/MySQL/MySQL Server 5.0/bin:/cygdrive/c/Program Files/Subversion/bin:/cygdrive/c/Program Files/Common Files/GTK/2.0/bin:/cygdrive/c/Program Files/ActiveState Komodo 3.1/:/cygdrive/c/binpath:/usr/bin PERL5LIB=C:/HATS/scripts/lib PERL_BADLANG (unset) SHELL (unset)
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 434b
I've attached a possible fix for this bug. The patch is relative to my patches for #82484 but it doesn't depend on it. Currently, if a «:crlf» layer is given it will first try to (re)enable any crlf layer it can find or else push itself on the stack. This can lead to data corruption. In this patch I've changed it to only check the topmost layer. Testing it would be very welcome, in particular on Windows (not my habitat). Leon
From d65510200bc7d39254ff62c8d4240280dd988276 Mon Sep 17 00:00:00 2001 From: Leon Timmermans <fawaka@gmail.com> Date: Thu, 20 Jan 2011 23:32:28 +0100 Subject: [PATCH] binmode FH, ":crlf" only modifies top crlf layer When pushed on top of the stack, crlf will no longer enable crlf layers lower in the stack. This will prevent unexpected results. --- perlio.c | 5 ++--- t/io/layers.t | 18 +++++++++++++++--- 2 files changed, 17 insertions(+), 6 deletions(-) diff --git a/perlio.c b/perlio.c index 3ce31d1..6ce142e 100644 --- a/perlio.c +++ b/perlio.c @@ -4513,7 +4513,7 @@ PerlIOCrlf_pushed(pTHX_ PerlIO *f, const char *mode, SV *arg, PerlIO_funcs *tab) * any given moment at most one CRLF-capable layer being enabled * in the whole layer stack. */ PerlIO *g = PerlIONext(f); - while (PerlIOValid(g)) { + if (PerlIOValid(g)) { PerlIOl *b = PerlIOBase(g); if (b && b->tab == &PerlIO_crlf) { if (!(b->flags & PERLIO_F_CRLF)) @@ -4521,8 +4521,7 @@ PerlIOCrlf_pushed(pTHX_ PerlIO *f, const char *mode, SV *arg, PerlIO_funcs *tab) S_inherit_utf8_flag(g); PerlIO_pop(aTHX_ f); return code; - } - g = PerlIONext(g); + } } } S_inherit_utf8_flag(f); diff --git a/t/io/layers.t b/t/io/layers.t index dea3d09..b0bcf1e 100644 --- a/t/io/layers.t +++ b/t/io/layers.t @@ -43,7 +43,7 @@ if (${^UNICODE} & 1) { } else { $UTF8_STDIN = 0; } -my $NTEST = 45 - (($DOSISH || !$FASTSTDIO) ? 7 : 0) - ($DOSISH ? 5 : 0) +my $NTEST = 55 - (($DOSISH || !$FASTSTDIO) ? 7 : 0) - ($DOSISH ? 7 : 0) + $UTF8_STDIN; sub PerlIO::F_UTF8 () { 0x00008000 } # from perliol.h @@ -105,7 +105,7 @@ SKIP: { # 5 tests potentially skipped because # DOSISH systems already have a CRLF layer # which will make new ones not stick. - @$expected = grep { $_ ne 'crlf' } @$expected; + splice @$expected, 1, 1 if $expected->[1] eq 'crlf'; } my $n = scalar @$expected; is(scalar @$result, $n, "$id - layers == $n"); @@ -132,13 +132,25 @@ SKIP: { [ qw(stdio crlf) ], "open :crlf"); + binmode(F, ":crlf"); + + check([ PerlIO::get_layers(F) ], + [ qw(stdio crlf) ], + "binmode :crlf"); + binmode(F, ":encoding(cp1047)"); check([ PerlIO::get_layers(F) ], [ qw[stdio crlf encoding(cp1047) utf8] ], ":encoding(cp1047)"); + + binmode(F, ":crlf"); + + check([ PerlIO::get_layers(F) ], + [ qw[stdio crlf encoding(cp1047) utf8 crlf utf8] ], + ":encoding(cp1047):crlf"); - binmode(F, ":pop"); + binmode(F, ":pop:pop"); check([ PerlIO::get_layers(F) ], [ qw(stdio crlf) ], -- 1.7.1
RT-Send-CC: perl5-porters [...] perl.org, fawaka [...] gmail.com
Download (untitled) / with headers
text/plain 591b
On Thu Jan 20 15:04:30 2011, LeonT wrote: Show quoted text
> I've attached a possible fix for this bug. The patch is relative to my > patches for #82484 but it doesn't depend on it. > > Currently, if a «:crlf» layer is given it will first try to (re)enable > any crlf layer it can find or else push itself on the stack. This can > lead to data corruption. In this patch I've changed it to only check the > topmost layer. > > Testing it would be very welcome, in particular on Windows (not my
habitat). Show quoted text
> > Leon
Thank you. I have just applied this as 7826b36. Let’s see whether it breaks Windows. :-)
Subject: Re: [perl #38456] The :crlf PerlIO layer doesn't like the :encoding layer.
Date: Fri, 25 Mar 2011 13:19:45 +0100
To: perlbug-followup [...] perl.org
From: Leon Timmermans <fawaka [...] gmail.com>
Download (untitled) / with headers
text/plain 796b
On Fri, Jan 28, 2011 at 6:43 AM, Father Chrysostomos via RT <perlbug-followup@perl.org> wrote: Show quoted text
> On Thu Jan 20 15:04:30 2011, LeonT wrote:
>> I've attached a possible fix for this bug. The patch is relative to my >> patches for #82484 but it doesn't depend on it. >> >> Currently, if a «:crlf» layer is given it will first try to (re)enable >> any crlf layer it can find or else push itself on the stack. This can >> lead to data corruption. In this patch I've changed it to only check the >> topmost layer. >> >> Testing it would be very welcome, in particular on Windows (not my
> habitat).
>> >> Leon
> > Thank you. I have just applied this as 7826b36. Let’s see whether it > breaks Windows. :-)
Comment and documentation updates with regard to the previous patch. No code changes. Leon

Message body is not shown because sender requested not to inline it.

Message body is not shown because sender requested not to inline it.

RT-Send-CC: perl5-porters [...] perl.org, fawaka [...] gmail.com
Download (untitled) / with headers
text/plain 918b
On Fri Mar 25 05:20:25 2011, LeonT wrote: Show quoted text
> On Fri, Jan 28, 2011 at 6:43 AM, Father Chrysostomos via RT > <perlbug-followup@perl.org> wrote:
> > On Thu Jan 20 15:04:30 2011, LeonT wrote:
> >> I've attached a possible fix for this bug. The patch is relative to my > >> patches for #82484 but it doesn't depend on it. > >> > >> Currently, if a «:crlf» layer is given it will first try to (re)enable > >> any crlf layer it can find or else push itself on the stack. This can > >> lead to data corruption. In this patch I've changed it to only
check the Show quoted text
> >> topmost layer. > >> > >> Testing it would be very welcome, in particular on Windows (not my
> > habitat).
> >> > >> Leon
> > > > Thank you. I have just applied this as 7826b36. Let’s see whether it > > breaks Windows. :-)
> > Comment and documentation updates with regard to the previous patch. > No code changes.
Thank you. Applied as 8dcd593 and 5da08ab.


This service is sponsored and maintained by Best Practical Solutions and runs on Perl.org infrastructure.

For issues related to this RT instance (aka "perlbug"), please contact perlbug-admin at perl.org