Encoding I/O Layer Difference on Windows #15218
From dwheeler@cpan.org:

I have a file to which I've written:

"\xc3\xa5\xc3\xa5\xc3\xa5\x0a"

These bytes correspond to the UTF-8 string "ååå\n". I want to read this file into a scalar, so I wrote this Perl:

sub slurp { my $fn = shift or die "Usage: $0 [file]\n";

This works great on my Mac and on *nix machines, but not on Windows. There it emits a warning:

utf8 "\xA5" does not map to Unicode at try.pl line 6

I can fix it by changing the I/O layer to :raw:encoding(UTF-8). My guess is that it then reads the data in as raw bytes first and only afterwards does the conversion. I can also break it on my Mac by changing the layer to :crlf:encoding(UTF-8). But I don't understand why the encoding layer's parsing of a file with \x0a line endings should vary by platform. Sure, line endings on Windows are typically \x0d\x0a, but why would the I/O layer care when I've told it that the file encoding is UTF-8?

Note that this does not occur with data in memory. This works fine on both platforms:

use Encode qw(decode_utf8);

Is the I/O layer assuming that, because we're on Windows, line endings need to be converted to \r\n before decoding? Is it implicitly using :crlf on Windows? That doesn't seem necessary if I've already told it what encoding to use, and it shouldn't bork that encoding in any event.
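A minimal sketch of the workaround described above. The original body of `slurp` is not preserved in this thread, so the version below is a hypothetical reconstruction, and the temporary file is only there to make the example self-contained. Per the PerlIO documentation, Perl's default layers on Windows include `:crlf`; putting `:raw` first turns that translation off so that `:encoding(UTF-8)` decodes the file's bytes directly.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read a whole file as decoded UTF-8 text.
# ':raw' disables CRLF translation (e.g. the default :crlf on Windows),
# then ':encoding(UTF-8)' decodes the bytes as they appear on disk.
sub slurp {
    my $fn = shift or die "Usage: $0 [file]\n";
    open my $fh, '<:raw:encoding(UTF-8)', $fn
        or die "Cannot open $fn: $!\n";
    local $/;              # undefined record separator: read everything
    my $text = <$fh>;
    close $fh;
    return $text;
}

# Write the same bytes as in the report: "ååå\n" encoded as UTF-8.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} "\xc3\xa5\xc3\xa5\xc3\xa5\x0a";
close $out;

my $text = slurp($path);
printf "%d characters\n", length $text;   # prints "4 characters"
```

The trade-off is that `:raw` also disables CRLF-to-LF translation, so on a file that really does use \r\n line endings the decoded text will keep the \r characters.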
From @arc:

As requested, I've attached a program and test inputs demonstrating that the problem shows up if and only if all of the following are true:

- The :crlf layer is used

I get the following results when running the program in various ways:

$ for mode in ':encoding(UTF-8)' :crlf:utf8 ':crlf:encoding(UTF-8)'; do
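The layer comparison in the shell loop above can also be sketched as a single Perl script. This is an illustrative reconstruction, not the attached program: the temporary file, the `%results` hash, and the extra `:raw:encoding(UTF-8)` row are assumptions added here. Warnings are collected with a `local $SIG{__WARN__}` handler so that each layer stack can be compared side by side. The outcome for the `:crlf` rows depends on the platform and the perl version, so no output is claimed for them.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Bytes from the report: "ååå\n" in UTF-8, with a bare LF line ending.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} "\xc3\xa5\xc3\xa5\xc3\xa5\x0a";
close $out;

my %results;
for my $layers (':encoding(UTF-8)', ':crlf:utf8', ':crlf:encoding(UTF-8)',
                ':raw:encoding(UTF-8)') {
    my @warnings;
    # Collect decode warnings for this layer stack only.
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    open my $fh, "<$layers", $path or die "Cannot open $path: $!\n";
    my $text = do { local $/; <$fh> };   # slurp the whole file
    close $fh;
    my $ok = defined $text && $text eq "\x{e5}\x{e5}\x{e5}\n";
    $results{$layers} = { ok => $ok, warnings => scalar @warnings };
    printf "%-24s %s (%d warning(s))\n",
        $layers, ($ok ? 'ok' : 'MISMATCH'), scalar @warnings;
}
```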
The original p5p thread on this may have some additional information: http://nntp.perl.org/group/perl.perl5.porters/234856
From @arc:

å
@jkeenan - Status changed from 'new' to 'open'
From @jkeenan:

On Fri, 18 Mar 2016 13:19:04 GMT, arc wrote:
After the above post from Aaron, Leon Timmermans added the following, which I'm quoting here to get the current state of the discussion into RT:

#####
Shocking, an issue in :crlf or :encoding…

What happens is that a byte gets read and then unread. For a :perlio layer

The other half would be not to do this read/unread silliness in the first

- using a file that ends in \r\n rather than \n

That actually was the crucial hint to where the problem is located :-)

Leon

Is anyone able to analyze this further?

Thank you very much.
From @Leont:

On Tue, Feb 28, 2017 at 2:25 PM, James E Keenan via RT
I don't think this ticket is in need of analysis; I think it's in need of a fix.

Leon
From cm.perl@abtela.com:

On 28/02/2017 at 14:25, James E Keenan via RT wrote:
To me it looks very much like #120797, in which Leon Timmermans suggested

Regards,
--Christian
Migrated from rt.perl.org#127668 (status was 'open')
Searchable as RT127668