Inconsistent behaviour when decoding in substitution #15324
Comments
From mbu@jobindex.dk
Created by mbu@jobindex.dk
$x =~ s/(.)/$latin1->decode($1)/ge
These two expressions can in some cases give different results. See attached
I'm fully aware that decoding an already decoded string is not a good idea.
I have reproduced the issue on Perl 5.24.0, 5.22.1 and 5.18.2.
----------------- Example ---------------------------------------
my $latin1 = find_encoding('latin1');
my $x = "\N{U+00C5}\N{U+0080}";
sub version1 {
sub version2 {
sub version3 {
sub version4 {
my $a = $x =~ s/(.)/version1($1)/ger;
Dump($a); # "\xC5\x80"
----------------- Example output --------------------------------
Perl Info
From @iabyn
On Fri, May 13, 2016 at 01:40:57AM -0700, Michael Budde wrote:
The difference is that $latin1->decode directly calls the correct method,
If you directly pass $1 as an arg by reference as perl subs do, rather
sub f {
$1 et al act a bit like tied variables, and you can see similar artifacts --
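The tied-variable-like behaviour of $1 described above can be sketched as follows. This is a minimal illustration, not the code from the ticket: the helper name read_after_match and the test strings are hypothetical. It assumes only that Perl subs receive their arguments as aliases in @_ and that $1 is looked up from the most recent successful match each time it is read.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the ticket): it runs its own successful
# match and then reads its first argument through @_ without copying it.
sub read_after_match {
    "zz" =~ /(z)/;     # a successful match inside the sub
    return $_[0];      # $_[0] is an alias to whatever the caller passed
}

"abc" =~ /(b)/;        # the caller's match; $1 is "b" here

my $plain = read_after_match("b");   # an ordinary string value
my $magic = read_after_match($1);    # the caller's $1 itself

print "plain: $plain\n";
print "magic: $magic\n";
```

With an ordinary string the inner match has no effect on the argument, whereas the aliased $1 is re-read after the inner match and reflects it.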
The RT System itself - Status changed from 'new' to 'open'
From @dcollinsn
Dave, his attachment suggests that it's the other way around. Running his testcase shows me that the "odd one out" is the one corresponding to:
s/(.)/$latin1->decode($1)/ger;
Surely, anything that $latin1->decode($1) does to trample on $1, Encode::decode('latin1', $1) will do as well? In fact, the first thing Encode::decode does is break the pass-by-reference with:
my ( $name, $octets, $check ) = @_;
So either $latin1->decode doesn't break the reference, and modifies $1 by assigning to it, or it performs a match that Encode::decode('latin1', $1) doesn't, and modifies $1 through a regex match.
So, I ran a different testcase, Dumping $1 instead of the return from decode(), and all four testcases had only "\x80" in $1 - so it does not seem that $1 is being polluted by a pattern match in $latin1->decode(), nor does it seem that $1 is being changed by reference within $latin1->decode(), since each case should result in the attached modified test failing in some way.
I'll keep poking this, unless I'm missing something obvious.
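How a leading `my (...) = @_;` breaks the pass-by-reference can be sketched like this. The helper copy_then_match is hypothetical and only mimics the copying line quoted from Encode::decode; it is not Encode's code.

```perl
use strict;
use warnings;

# Hypothetical helper (not Encode's code): it copies its argument first,
# in the same way as `my ( $name, $octets, $check ) = @_;`.
sub copy_then_match {
    my ($arg) = @_;    # snapshot of the value at call time
    "zz" =~ /(z)/;     # a later match inside the sub
    return $arg;       # the copy is unaffected by that match
}

"abc" =~ /(b)/;        # the caller's match; $1 is "b" here
my $copied = copy_then_match($1);

print "copied: $copied\n";
```

Because the value is copied before anything else happens, later matches inside the sub cannot change what it sees.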
From zefram@fysh.org
Dan Collins via RT wrote:
No. $latin1 is an object of the Encode::XS class, so the method
-zefram
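A caller-side way to avoid depending on how the method treats $1 is to copy $1 into a lexical before calling decode. This is only a sketch of a workaround under that assumption, not the patch submitted for this ticket; the variable names $c and $out are illustrative.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

my $latin1 = find_encoding('latin1');
my $x      = "\N{U+00C5}\N{U+0080}";

# Copy $1 into a plain lexical first, so the method receives an ordinary
# scalar rather than the magical capture variable.
my $out = $x =~ s/(.)/my $c = $1; $latin1->decode($c)/ger;

# Print the code points of the result in hex, joined by dots.
printf "%vX\n", $out;
```

The copy costs one extra scalar per match, which is usually negligible next to the decode call itself.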
From @dcollinsn
Thanks, Zefram, that was a nudge in the right direction. I get it, and I think I fixed it. Let me write a test and I'll submit a patch tomorrow.
From @dcollinsn
Submitted to Encode's CPAN maintainers as [cpan #115168].
@iabyn - Status changed from 'open' to 'rejected'
Migrated from rt.perl.org#128143 (status was 'rejected')
Searchable as RT128143$