pack propagates UTF8 strings #13989
From @mikelward
I know this is a consequence of an intended change in 5.10, but it breaks existing code in non-obvious ways - there should at least be documentation, if not an option to pack() to change this behavior. If you pass a UTF8 encoded string to pack(), the output of pack() is now a UTF8 encoded string. When the packed data is passed to another program, the entire string gets encoded, including any binary data along with it. Consider the attached program. It packs an integer and a string and sends the result over a message queue. Because the packed string has the UTF8 encoded attribute, it gets encoded, and the received string does not unpack to the correct values. Having pack() handle UTF8 strings the way it does is only useful if the packed data is not sent outside the process, which is sort of the whole reason for pack in the first place...
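A minimal sketch of the behaviour described above, assuming perl 5.10 or later. The record layout (`N a8`) and the helper name `pack_record` are illustrative assumptions, not taken from the attached program. It packs the same four characters twice, once stored as bytes and once upgraded to UTF-8 internally, and reports whether the UTF8 flag ends up on the packed result.

```perl
use strict;
use warnings;

# Hypothetical record layout (an assumption for illustration): a 32-bit
# big-endian integer followed by an 8-character, NUL-padded string field.
sub pack_record {
    my ($id, $name) = @_;
    return pack("N a8", $id, $name);
}

my $plain    = "caf\xe9";   # four characters, all below 0x100, stored as bytes
my $upgraded = $plain;
utf8::upgrade($upgraded);   # same four characters, stored internally as UTF-8

for my $case ([plain => $plain], [upgraded => $upgraded]) {
    my ($label, $name) = @$case;
    my $packed = pack_record(42, $name);
    printf "%-8s input: %d characters, UTF8 flag %s\n",
        $label, length($packed), utf8::is_utf8($packed) ? "on" : "off";
}
```

The two results compare equal with `eq`, because they hold the same characters. Only the internal storage differs, and that difference is what a consumer reading the raw internal buffer would see.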
From @demerphq
On 16 July 2014 22:18, Mike L <perlbug-followup@perl.org> wrote:
FWIW, I concur. $work has been bitten by this more than it should. This change has caused far, far more trouble than it should have, was
Yves
The RT System itself - Status changed from 'new' to 'open'
From @ikegami
Sounds like you're saying there's a bug in msgsnd? If so, the solution is
On Thu, Jul 17, 2014 at 3:27 AM, demerphq <demerphq@gmail.com> wrote:
From @ikegami
On Thu, Jul 17, 2014 at 10:21 AM, Eric Brine <ikegami@adaelis.com> wrote:
Confirmed.
    const char * const mbuf = SvPV_const(mstr, len);
should be
    const char * const mbuf = SvPVbyte_const(mstr, len);
(SvPVbyte_const doesn't currently exist.)
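Until msgsnd itself uses byte semantics, a similar effect can be approximated from Perl code by downgrading a copy of the packed data before handing it to a byte-oriented interface. This is a sketch, not the proposed core patch: `as_bytes` is a hypothetical helper name, and `utf8::downgrade` is called with a true second argument so that failure is reported through the return value.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical helper (not part of perl): return a copy of $data stored as
# plain bytes, or croak if some character does not fit in a single byte.
sub as_bytes {
    my ($data) = @_;
    my $copy = $data;                 # leave the caller's string untouched
    utf8::downgrade($copy, 1)         # true second argument: return false on failure
        or croak "data contains characters above 0xFF and cannot be sent as bytes";
    return $copy;
}

my $name = "caf\xe9";
utf8::upgrade($name);                 # force UTF-8 internal storage
my $packed = pack("N a8", 42, $name);
my $bytes  = as_bytes($packed);

printf "UTF8 flag before: %s, after: %s\n",
    utf8::is_utf8($packed) ? "on" : "off",
    utf8::is_utf8($bytes)  ? "on" : "off";
```

The helper works on a copy so that the caller's string keeps whatever internal representation it had.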
From @demerphq
On 17 July 2014 16:36, Eric Brine <ikegami@adaelis.com> wrote:
That may be. But the behaviour of pack doesn't make any sense to anyone who
Yves
From @mikelward
On Thu Jul 17 13:08:45 2014, demerphq wrote:
My point exactly. If pack('a8') puts 8 characters (but 9 bytes, because they're UTF8 characters), what is msgsnd supposed to do about it? Pack is supposed to let you control the layout of data.
From @ap
* Mike L via RT <perlbug-followup@perl.org> [2014-07-18 01:05]:
Same thing everything is supposed to do about it. Decode the buffer and
I mean, what do you expect pack to do when you give it a UTF8=on string? That’s how strings work in Perl (since 5.8 anyhow). And if msgsnd doesn’t do that then it’s clearly broken, since the data
There’s an argument to be had about whether pack ought to do the same
Regards,
From @mikelward
On Thu Jul 17 18:30:49 2014, aristotle wrote:
Yes, and that's what it does now. That's why I don't think msgsnd is the problem - pack is. pack is supposed to pack things so the user can lay out the bytes the way they want them. If pack is going to produce a UTF8 string as its output, which will then have its bytes manipulated, then pack() is not doing what it was designed to do - lay out the bytes the way the caller specified.
From @ikegami
On Thu, Jul 17, 2014 at 4:08 PM, demerphq <demerphq@gmail.com> wrote:
"Doesn't exhibit optimal performance" is a far cry from "doesn't make any
A non-fatal sv_downgrade on exit should be ok, but I seem to remember you
From @ikegami
On Thu, Jul 17, 2014 at 4:53 PM, Mike L via RT <perlbug-followup@perl.org>
The string only has 8 bytes. Your code even prints them out. msgsnd is
From @ap
* Mike L via RT <perlbug-followup@perl.org> [2014-07-18 07:35]:
You are welcome to think so. The source code says otherwise.
The only catch being that Perl doesn’t have byte strings. (It ought to, I’m inclined to think. But status quo is it doesn’t.)
Regards,
From @demerphq
On 18 July 2014 03:30, Aristotle Pagaltzis <pagaltzis@gmx.de> wrote:
Look, there is a serious bug if by packing a string it *upgrades* the rest
Exactly what used to happen before we introduced this backwards
I wonder if you even understand the problem the OP is talking about, and if
No. It's not how pack worked for a long time. The 5.8 behavior was sane.
This bug is about pack doing the wrong thing. Please don't try to turn it
$ perl -MEncode=encode_utf8 -we'my $str="\x{100}" x 128; my $packed= pack
Why did the length get corrupted?!!! Why was there no warning when it was
This does not work right, and has not worked right ever since we introduce
Yves
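The one-liner above is cut off, so the following is not a reconstruction of it. It is a separate sketch, under the assumption that a length-prefixed template such as `n/a*` is involved, of how a length prefix can appear corrupted once the whole packed result is passed through `encode_utf8`.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

my $str    = "\x{100}" x 128;        # 128 characters, each above 0xFF
my $packed = pack("n/a*", $str);     # template is an assumption, see above

# Read back from the packed character string, the prefix counts characters.
my ($len_before) = unpack("n", $packed);

# Encoding the whole packed string also re-encodes the two prefix
# characters, so the first two bytes no longer hold the original count.
my $encoded     = encode_utf8($packed);
my ($len_after) = unpack("n", $encoded);

printf "prefix before encode_utf8: %d\n", $len_before;
printf "prefix after  encode_utf8: %d\n", $len_after;
printf "characters: %d, encoded bytes: %d\n", length($packed), length($encoded);
```

The second character of the prefix is 0x80, which UTF-8 encodes as two bytes. Reading the first two bytes of the encoded data as a 16-bit integer therefore gives a different value from the 128 that was packed.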
From @demerphq
On 18 July 2014 09:47, Aristotle Pagaltzis <pagaltzis@gmx.de> wrote:
Nonsense. You and others have tried to force this broken view on the rest
Yves
From @demerphq
On 18 July 2014 10:28, demerphq <demerphq@gmail.com> wrote:
My apologies. This particular subject makes me testier than usual and I
I feel very strongly there is a bug here, which I have seen catch multiple
Yves
From @ikegami
On Fri, Jul 18, 2014 at 4:27 AM, demerphq <demerphq@gmail.com> wrote:
Because you encoded it using encode_utf8. Why would you do that?!
How can encode_utf8 know that you're encoding something that isn't meant to
From @ap
* demerphq <demerphq@gmail.com> [2014-07-18 15:25]:
No apologies necessary.
From @rgs
On 16 July 2014 22:18, Mike L <perlbug-followup@perl.org> wrote:
This is a serious bug. It's even possible to generate nonsense output
$ bleadperl -MDevel::Peek -wE 'Dump pack "U0v/a","foo\x{1f0}"'
In my opinion the output of pack should never have the UTF8 flag on.
From @demerphq
On 25 July 2014 13:25, Rafael Garcia-Suarez <rgarciasuarez@gmail.com> wrote:
Exactly. Never ever ever.
Yves
From @ikegami
On Fri, Jul 25, 2014 at 7:25 AM, Rafael Garcia-Suarez <
What you describe is a problem with U0 specifically. U0 is broken.
> In my opinion the output of pack should never have the UTF8 flag on.
You'd break C<< pack("n/a*", "\x{2660}"); >>
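To illustrate the objection, here is a small sketch (assuming perl 5.10 or later) of what that call returns. The packed result contains a character above 0xFF, so it cannot be represented as a plain byte string, and an attempt to downgrade it fails.

```perl
use strict;
use warnings;

my $packed = pack("n/a*", "\x{2660}");

printf "characters: %d, UTF8 flag: %s\n",
    length($packed), utf8::is_utf8($packed) ? "on" : "off";

# Try to turn a copy into a plain byte string without dying on failure.
my $copy = $packed;
my $ok   = utf8::downgrade($copy, 1);
print $ok
    ? "downgrade succeeded\n"
    : "downgrade failed: a character above 0xFF is present\n";
```

A rule that pack output never carries the UTF8 flag would have to decide what happens to such input: die, warn, or encode the character in some way.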
Migrated from rt.perl.org#122308 (status was 'open')
Searchable as RT122308