DATA filehandle off on UTF16 source #8754
Comments
From adavies@ptc.com
Created by adavies@ADAVIES13D.ptcnet.ptc.com
If the DATA filehandle is read from a UTF16LE
# %<
print FOUT <<'EOT';
print "START\n";
The above outputs:
START
i.e., the multibyte data is being read out of sequence.
START
END
is output.
Perl Info
From Peter.Dintelmann@dresdner-bank.com
Does your editor start the file with a BOM which
The RT System itself - Status changed from 'new' to 'open'
From guest@guest.guest.xxxxxxxx
On Mon Jan 29 07:14:34 2007, dint wrote:
Yes, it does: the standard 2-byte BOM. I've tried the test without the BOM in the file, and
From Peter.Dintelmann@dresdner-bank.com
Sorry, I missed that you create your UTF-16LE
From @bulk88
On Mon Jan 29 04:07:18 2007, adavies@ptc.com wrote:
Running (my line numbers are different from the quote below)
# Create a UTF16LE encoded test file:
print FOUT <<'EOT';
print "START\n";
on win32 Perl 5.10
__________________________________________________________________
C:\Documents and Settings\Owner\Desktop>
on win32 Perl 5.12
END
C:\Documents and Settings\Owner\Desktop>
END
C:\p517\perl\win32>
I suggest for someone who knows more about
--
From @ikegami
On Wed, Dec 26, 2012 at 4:48 PM, bulk88 via RT <perlbug-followup@perl.org> wrote:
If there was, there still is. Except now it's not off by one, it's off by a
-----BEGIN UPDATED CODE-----
print FOUT <<'EOT';
print FOUT "$_\n" for 1..50;
print "START\n";
-----BEGIN OUTPUT-----
END
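A self-contained reproduction in the spirit of the test scripts quoted above can be sketched as follows. The original scripts survive only in fragments, so the file name `utf16_data_test.pl`, the exact body of the generated script, and the use of `Encode` to produce the UTF-16LE bytes are all assumptions. The sketch writes a BOM-prefixed UTF-16LE script whose `__DATA__` section holds the lines 1 to 50 (as in the updated code), then runs it with the current perl.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical file name; the original report does not preserve it.
my $file = 'utf16_data_test.pl';

# Body of the generated script: print markers around the DATA contents.
my $script = <<'EOT';
print "START\n";
print while <DATA>;
print "END\n";
__DATA__
EOT

# Append 50 numbered data lines, as in the updated test code.
$script .= "$_\n" for 1 .. 50;

# Write a BOM (U+FEFF, bytes FF FE) followed by the UTF-16LE encoded text.
# :raw keeps the encoded bytes untouched (no CRLF translation on Windows).
open my $out, '>:raw', $file or die "Cannot write $file: $!";
print {$out} encode('UTF-16LE', "\x{FEFF}" . $script);
close $out or die "Cannot close $file: $!";

# Run the generated UTF-16LE script with the same perl binary.
system($^X, $file) == 0 or die "Running $file failed: $?";
```

On a perl without this bug, the child script would be expected to print START, the numbers 1 through 50, and END in order; lines between START and END that are missing or out of sequence point to the DATA offset problem discussed in this thread.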
When Perl detects that the file is in UTF-16, it adds a source filter inside toke.c and delivers the file as UTF-8 to the rest of the parser. I suspect this is a units issue: the offset passed to PerlIO to mark where reading should begin is expressed in two-byte units, PerlIO treats it as single bytes, and it ends up multiplied by 2, so reading starts further into the file than it should. @Leont, does this idea point you to where to check whether it's true?
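To make the suspected units mismatch concrete, the sketch below compares, for a short ASCII-only script, the byte offset where DATA begins in the UTF-8 text the parser sees against the same position in the raw UTF-16LE file on disk. The script text is hypothetical, and this only illustrates the size relationship (2 bytes per ASCII character plus a 2-byte BOM); it does not reproduce what toke.c or PerlIO actually do.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical script text up to and including the __DATA__ line (ASCII only).
my $header = "print \"START\\n\";\n__DATA__\n";

# Offset of the first DATA byte in the UTF-8 stream produced
# after the UTF-16 source filter has converted the file.
my $utf8_offset = length encode('UTF-8', $header);

# Offset of the same position in the raw UTF-16LE file:
# a 2-byte BOM plus 2 bytes per ASCII character.
my $utf16_offset = length encode('UTF-16LE', "\x{FEFF}" . $header);

printf "UTF-8 offset: %d bytes\n",    $utf8_offset;
printf "UTF-16LE offset: %d bytes\n", $utf16_offset;
```

Here the two offsets come out as 26 and 54 bytes. Applying an offset measured in one representation to the other lands at the wrong place, off by roughly a factor of 2, which is the kind of mismatch suspected above.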
Migrated from rt.perl.org#41368 (status was 'open')
Searchable as RT41368