Unusual tell results with :crlf layer and multibyte input #8338
From adavies@ptc.com
Created by adavies@ptc.com

The presence or absence of a :crlf layer on a multibyte-encoded input affects the values returned by tell. Below is a test case using utf16le.

# %<
use strict;
use warnings;
my $file = './foobar';
print "#"x30, "\n";
print ">> using encoding utf16le <<\n";
# Create a file in the given encoding
open FOUT, ">:raw:encoding(utf16le):crlf", $file
or die "can't write file: $!\n";
print FOUT "aaa\nbbb\nccc\nddd\neee\n";
print "At end of write tell = ", tell(FOUT), "\n";
close FOUT;
print "File size = ", (-s $file), "\n";
# Read the file back without :crlf and strip the CR by hand
open FIN, "<:raw:encoding(utf16le)", $file
or die "can't read file(2): $!\n";
while (<FIN>) {
s/\015?\012\z/\n/;
print "No crlf => $.:", tell(FIN), ":$_";
}
close FIN;
print "#"x30, "\n";
# Read it back again with :crlf, which translates CRLF to LF
open FIN, "<:raw:encoding(utf16le):crlf", $file
or die "can't read file(1): $!\n";
while (<FIN>) {
print "With crlf => $.:", tell(FIN), ":$_";
}
close FIN;
END { unlink $file }
__END__
Is this a bug? In the documentation for tell:

Cheers,
alex
@Leont, can you comment on this, please?
It's not even a … I'm not quite sure how to fix it. The whole concept of tell is rather dubious when there are several layers of transforming and buffering code between the user and the file.
If I were designing this, I might say that it is the responsibility of each layer to track its position relative to its parent layer. I assume that mechanism isn't there?
Buffering complicates everything. Layers don't read byte-for-byte from the preceding layers but in chunks; therefore the different layers are not necessarily at the same location. The top layer knows how many bytes it has consumed from the preceding stream, but it doesn't necessarily know how to translate that into a physical position. Reading from :unix:perlio:encoding(...):crlf is really more like a pipeline.
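To make the chunking point concrete, here is a toy model of two stacked buffered layers, written in plain Perl. It is not PerlIO code: `ToyLayer`, `read_units`, and the chunk sizes (32 and 16) are hypothetical. Even with no transformation at all, the caller, the upper layer, the lower layer, and the "file" each end up at a different offset after a single 5-unit read.

```perl
use strict;
use warnings;

# Toy model of stacked buffered layers (hypothetical, not PerlIO code).
# Each layer refills its buffer by pulling a whole chunk from the layer below.
package ToyLayer;

sub new {
    my ($class, %args) = @_;
    # below: code ref that returns up to N units ('' at end of data)
    return bless { below => $args{below}, chunk => $args{chunk},
                   buf => '', pulled => 0, consumed => 0 }, $class;
}

sub read_units {
    my ($self, $n) = @_;
    while (length $self->{buf} < $n) {
        my $more = $self->{below}->($self->{chunk});
        last unless length $more;
        $self->{pulled} += length $more;    # how far this layer is into the stream below
        $self->{buf}    .= $more;
    }
    my $out = substr($self->{buf}, 0, $n, '');   # hand out $n units, keep the rest
    $self->{consumed} += length $out;
    return $out;
}

package main;

my $data  = 'x' x 100;                     # stands in for the file contents
my $pos   = 0;                             # "physical" file offset
my $file  = sub { my $s = substr($data, $pos, shift); $pos += length $s; $s };
my $lower = ToyLayer->new(below => $file, chunk => 32);
my $upper = ToyLayer->new(below => sub { $lower->read_units(shift) }, chunk => 16);

$upper->read_units(5);                     # the caller asks for 5 units
printf "caller=%d upper pulled=%d lower pulled=%d file pos=%d\n",
    $upper->{consumed}, $upper->{pulled}, $lower->{pulled}, $pos;
# prints: caller=5 upper pulled=16 lower pulled=32 file pos=32
```

Recovering the caller's offset (5) from the file position (32) means every layer must subtract what it still holds in its buffer; that is easy here only because no layer changes the length of the data.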
Perhaps the other option would be to die and indicate that tell is unsupported except on binary handles? This would potentially also break for a simple UTF-8 layer.
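A user-level variant of the "just die" idea can be sketched without touching core, relaxed to allow at most one transforming layer. `count_transforming` and `checked_tell` are hypothetical helpers, not part of Perl. The sketch assumes that `PerlIO::get_layers` reports an encoding layer with a name starting with `encoding` (e.g. `encoding(utf-16le)`) and the CRLF layer as `crlf`; check the names your perl actually reports before relying on this.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Perl: count the layers that change the
# byte stream. Assumes layer names as reported by PerlIO::get_layers(),
# e.g. "encoding(utf-16le)" for :encoding(...) and "crlf" for :crlf.
sub count_transforming {
    my @layers = @_;
    return scalar grep { /^encoding\b/ || $_ eq 'crlf' } @layers;
}

# Refuse to trust tell() when more than one transforming layer is stacked.
sub checked_tell {
    my ($fh) = @_;
    my $n = count_transforming(PerlIO::get_layers($fh));
    die "tell() unreliable: $n transforming layers on this handle\n" if $n > 1;
    return tell($fh);
}

# Usage: an in-memory handle with no transforming layers passes the check.
open my $fh, '<', \"line 1\nline 2\n" or die "can't open in-memory handle: $!";
my $first = <$fh>;
print "tell = ", checked_tell($fh), "\n";
```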
It isn't; that's the confusing thing. We have a hack that makes it work fine as long as only a single transforming layer is involved.
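One way a single decoding layer can still report a byte offset is to re-encode the characters it has decoded but not yet handed to the caller, and subtract their byte length from the position of the layer below. As far as I understand, PerlIO::encoding does something similar when it flushes a read buffer, but treat that as an assumption; the code below is a toy model using only `Encode::encode`/`decode`, with a hypothetical 16-byte chunk.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Toy model (illustrative, not PerlIO internals): one UTF-16LE decoding layer.
my $disk    = encode('UTF-16LE', "aaa\r\nbbb\r\nccc\r\n");
my $chunk   = 16;                                   # hypothetical buffer size in bytes
my $below   = $chunk;                               # the layer below has advanced 16 bytes
my $decoded = decode('UTF-16LE', substr($disk, 0, $chunk));   # "aaa\r\nbbb"

# The caller reads one line; the rest stays buffered in this layer.
$decoded =~ s/\A(.*?\n)//s;
my $line   = $1;                                    # "aaa\r\n"
my $unread = $decoded;                              # "bbb"

# Re-encode the unread characters to learn how many bytes they occupy below.
my $tell = $below - length(encode('UTF-16LE', $unread));
print "tell = $tell\n";                             # 16 - 6 = 10
```

The result, 10, is exactly the UTF-16LE length of `"aaa\r\n"`, because re-encoding is the exact inverse of what this one layer did to the bytes.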
Right, but the combination of …
If that's the case, then I wonder what the actual use case is, and whether it's common enough that we need to worry about it. Or should we just close this as: "Yes. Yes, it is broken. Don't do that!"
Perhaps we should. I mean, if you're using two layers that each transform the data stream, it becomes pretty much impossible to reliably determine which point in the original data stream corresponds to the transformed data you observe.
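A toy model shows why stacking two transforming layers defeats offset bookkeeping. Here a :crlf-like step sits on top of a UTF-16LE decoder and has already turned every `"\r\n"` into `"\n"`. The 20-byte chunk and variable names are hypothetical; only `Encode::encode`/`decode` are real APIs. Counting leftover characters mixes units, and re-encoding the leftover text still undercounts because the CR it swallowed is gone.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Toy model: UTF-16LE decoder with a CRLF->LF translation stacked on top.
my $disk    = encode('UTF-16LE', "aaa\r\nbbb\r\nccc\r\n");
my $chunk   = 20;                                 # hypothetical buffer size in bytes
my $below   = $chunk;                             # the bottom layer is at byte 20
my $decoded = decode('UTF-16LE', substr($disk, 0, $chunk));   # "aaa\r\nbbb\r\n"
(my $crlf_buf = $decoded) =~ s/\r\n/\n/g;         # top layer holds "aaa\nbbb\n"

# The caller reads one line through the top layer.
$crlf_buf =~ s/\A(.*?\n)//s;                      # consumes "aaa\n"; "bbb\n" remains

my $true_tell   = length encode('UTF-16LE', "aaa\r\n");            # 10
my $by_chars    = $below - length $crlf_buf;                       # 20 - 4 = 16
my $by_reencode = $below - length encode('UTF-16LE', $crlf_buf);   # 20 - 8 = 12
print "true=$true_tell chars=$by_chars reencoded=$by_reencode\n";
```

Neither naive computation recovers 10: the top layer would need to know both the encoding and where the CRs were, which is the information the two transformations between it and the file have thrown away.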
Barring further comments from the original reporter, I'm closing this case.
Migrated from rt.perl.org#38587 (status was 'new')
Searchable as RT38587