seeking on bytes causes broken perl strings #11697
From tchrist@perl.com

Perl's seek and sysseek take off_t arguments and retvals, which are in

% perl -CS -E 'say "E\x{301}" x 50 for 1..100' > sample.utf8
% cat sysseek-enc-test
% perl sysseek-enc sample.utf8 2 4

Notice that the data produced claim to have an initial code point of

% head -1 sample.utf8 | uniquote -x | chop -72

The problem doesn't change if you :%s/sys//g the previous program:

% cat seek-enc-test

Ok, now what?

--tom

Summary of my perl5 (revision 5 version 14 subversion 0) configuration:
Characteristics of this binary (from libperl):
From @doy

On Fri, Oct 14, 2011 at 03:56:49PM -0700, tchrist1 wrote:

For what it's worth (not to say that I agree with the current behavior),

Note the in bytes: even if the filehandle has been set to operate on

I guess the idea is that in order to seek to some location in the middle

-doy
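To make the byte-versus-character mismatch concrete, here is a minimal sketch that is not taken from the original report. It reuses the "E\x{301}" unit from the sample data but works on an in-memory byte string instead of a real filehandle, and the helper name `decodes_cleanly` is hypothetical. It only shows which byte offsets would leave a decoder starting in the middle of a character.

```perl
use strict;
use warnings;
use Encode qw(encode decode FB_CROAK);

# Same unit as the sample file: "E" followed by U+0301 COMBINING ACUTE ACCENT.
# Encoded as UTF-8 each unit is 3 bytes: 0x45, 0xCC, 0x81.
my $octets = encode('UTF-8', "E\x{301}" x 2);

# Hypothetical helper: true if the octets from $offset onward are valid UTF-8.
sub decodes_cleanly {
    my ($bytes, $offset) = @_;
    # Work on a copy, since decode() with a CHECK value may modify its input.
    my $tail = substr($bytes, $offset);
    return eval { decode('UTF-8', $tail, FB_CROAK); 1 } ? 1 : 0;
}

for my $offset (0 .. 3) {
    printf "byte offset %d: %s\n", $offset,
        decodes_cleanly($octets, $offset) ? "valid UTF-8" : "starts mid-character";
}
```

Only offset 2 points at a continuation byte (0x81), so that is the one position from which a character-oriented read would begin with a broken character.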
The RT System itself - Status changed from 'new' to 'open'
From tchrist@perl.com

Yes, I know. It was sysseek and sysread where I got into the problem.

In the meanwhile, I suggest that this at the least be added to sysseek:

B<WARNING>: I<POSITION> is in bytes not characters, no matter whether there

--tom
From @deven

On Fri, Oct 14, 2011 at 7:19 PM, Tom Christiansen <tchrist@perl.com> wrote:

Perhaps it would be better for the warning to talk about the dangers of

I would propose adding special-case logic for UTF-8 for the following

* UTF-8 has variable-length character encodings, so it's impossible to

My proposed solution is simple: On the first character-oriented read after a

Wouldn't this be better than creating corrupted characters in strings that

Deven
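The proposal text is cut off above, so the following is only a sketch of one possible resynchronization strategy and not necessarily what was proposed: after landing on an arbitrary byte offset, skip forward past any UTF-8 continuation bytes (bit pattern 10xxxxxx) until a byte that can start a character is reached. The helper name `next_char_boundary` is hypothetical, and the code operates on a plain byte string rather than on a filehandle.

```perl
use strict;
use warnings;

# Hypothetical helper: advance $offset past any UTF-8 continuation bytes
# (0x80..0xBF) so that it lands on the start of a character, or at the end.
sub next_char_boundary {
    my ($bytes, $offset) = @_;
    my $len = length($bytes);
    while ($offset < $len
        && (ord(substr($bytes, $offset, 1)) & 0xC0) == 0x80) {
        $offset++;
    }
    return $offset;
}

# "E" + U+0301 written out as raw UTF-8 octets, twice over.
my $octets = "E\xCC\x81" x 2;

for my $offset (0 .. 3) {
    printf "requested %d, resynchronized to %d\n",
        $offset, next_char_boundary($octets, $offset);
}
```

A strategy like this silently discards the tail of the character the offset landed in, and it says nothing about combining sequences: offset 1 is a valid character boundary here even though it separates the accent from its base letter.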
From @ap

* Deven T. Corzine <deven@ties.org> [2011-10-17 16:35]:

That will make the read “work” in that it won’t complain and won’t

“Sometimes your read will swallow the first character in the data. We

What is the sense of that?

At the very least if this is done then you’d have to correspondingly

Regards,
From tchrist@perl.com

"A. Pagaltzis via RT" <perlbug-followup@perl.org> wrote

Good point. I agree that it seems dodgy. Perl is pretty careful not to go silently

I just don't know what the devil to do with reads producing broken strings

My stomach has been somewhat unsettled of late, so I don't know that I

I know how the super-duper-over-object-oriented languages would "fix"

But what to do about these malformed characters, eh?

--tom
From @cpansprout

On Mon Oct 17 09:18:48 2011, tom christiansen wrote:

Either
From tchrist@perl.com

"Father Chrysostomos via RT" <perlbug-followup@perl.org> wrote

Speaking of mandatory warnings, I was recently bitten by there not being

--tom
From @ikegami

On Mon, Oct 17, 2011 at 12:40 PM, Tom Christiansen <tchrist@perl.com> wrote:

Because you used a version of Perl from after the depreciation cycle?

If you don't test your script with every major version, you can miss a
From @ikegami

On Mon, Oct 17, 2011 at 1:01 PM, Eric Brine <ikegami@adaelis.com> wrote:

argh, I've seen deprecation misspelled so often, I started doing it!
From tchrist@perl.com

I did? Really? I have 6/8/10/12/14 all installed. I wonder how

Was implicit split a "mandatory" (=default) warning at some point?

Does this mean we plan to soon stop warning people about $* not working anymore?

--tom
From @cpansprout

On Mon Oct 17 10:08:54 2011, tom christiansen wrote:

No. Deprecation warnings became default warnings in 5.12. 5.12 removed

(Shameless plug: If you slap ‘use Classic::Perl;’ on all your old

I doubt it, but you can say ** or &* or %* before mentioning $*, and the
From @deven

On Mon, Oct 17, 2011 at 12:03 PM, Aristotle Pagaltzis <pagaltzis@gmx.de> wrote:

In the scenario where you want to seek into the middle of UTF-8 data, but

A more accurate description would be "if you seek into the middle of a

At the very least if this is done then you’d have to correspondingly

I don't follow your logic here. This isn't about using byte-oriented buffer

Deven
From perl-diddler@tlinx.org

Deven T. Corzine wrote:

I would use a more robust approach.

Go BACK 'n' bytes (where 'n' is the maximum number of bytes needed to

Then go forward and return the file seek value as the position before

???

Not against other optional warnings and 'strict' approaches, but it
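The description above is truncated, so this is only a rough sketch of the general "back up to the start of the character" idea, under stated assumptions: standard UTF-8 with at most 4 bytes per character (the value of 'n' is not given in the text), a plain byte string instead of a filehandle, and a hypothetical helper name `prev_char_boundary`.

```perl
use strict;
use warnings;

# Assumption: standard UTF-8, where a character occupies at most 4 bytes,
# so at most 3 continuation bytes can sit between a lead byte and $offset.
use constant MAX_UTF8_LEN => 4;

# Hypothetical helper: move $offset back to the lead byte of the character
# that contains it, looking at most MAX_UTF8_LEN - 1 bytes behind.
sub prev_char_boundary {
    my ($bytes, $offset) = @_;
    return $offset if $offset >= length($bytes);   # at or past the end
    my $limit = $offset - (MAX_UTF8_LEN - 1);
    $limit = 0 if $limit < 0;
    my $pos = $offset;
    while ($pos > $limit
        && (ord(substr($bytes, $pos, 1)) & 0xC0) == 0x80) {
        $pos--;
    }
    return $pos;
}

# "E" + U+0301 written out as raw UTF-8 octets, twice over.
my $octets = "E\xCC\x81" x 2;

for my $offset (0 .. 5) {
    printf "requested %d, character starts at %d\n",
        $offset, prev_char_boundary($octets, $offset);
}
```

Unlike skipping forward, this discards nothing: a caller could seek to the returned offset and read whole characters from there. If the data is malformed and no lead byte is found within the window, the sketch simply returns the earliest position it examined.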
Migrated from rt.perl.org#101382 (status was 'open')
Searchable as RT101382$