Module Encode degrades in Perl 5.10 #9561
From mihara@twister.dev.iwa.fujixerox.co.jp
Created by mihara@twister.dev.iwa.fujixerox.co.jp

I found this bug while playing with the Encode::IMAPUTF7 module.

----
$str="abcあdef";

Please make sure that $str contains a Japanese character in UTF-8.

When "$1" is fed like this in line 19, the next pattern match in line 15

if ($str =~ /\G($re1+)/cg) {

cannot update $1. The previous $1 remains.

The workaround is this:

my $tmp = $1;

But something inside $e_utf16->encode() may be wrong. Sorry for my poor explanation. Please mail me at osamu.mihara'@'fujixerox.co.jp if you have further questions.

Perl Info
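The reporter's workaround (copy $1 into a lexical before calling encode()) can be sketched as below. This is a hypothetical reconstruction, not the reporter's script: the original $re1 pattern and the Encode::IMAPUTF7 code were not posted, so the ASCII/non-ASCII split, the helper name encode_runs, and UTF-16BE as the target encoding are all assumptions.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Hypothetical reconstruction of the report's loop: walk the string with
# \G ... /gc, keep ASCII runs as-is and encode non-ASCII runs as UTF-16BE.
my $utf16 = find_encoding('UTF-16BE') or die "UTF-16BE encoding not found";

sub encode_runs {
    my ($str) = @_;
    my @out;
    pos($str) = 0;
    while (pos($str) < length $str) {
        if ($str =~ /\G([\x00-\x7F]+)/gc) {
            push @out, $1;                    # plain ASCII run, used as-is
        }
        elsif ($str =~ /\G([^\x00-\x7F]+)/gc) {
            my $tmp = $1;                     # workaround: hand encode() a copy, not $1
            push @out, unpack 'H*', $utf16->encode($tmp);
        }
        else {
            last;                             # not reached: the two classes cover every character
        }
    }
    return @out;
}

# "abc", U+3042 (HIRAGANA LETTER A), "def"
my @parts = encode_runs("abc\x{3042}def");
print join(' | ', @parts), "\n";
```

Because encode() only ever sees $tmp, any in-place change it makes to its argument (such as the sv_utf8_upgrade() call analysed later in the thread) lands on the copy rather than on the regex capture variable.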
From @demerphq
2008/11/11 mihara@twister.dev.iwa.fujixerox.co.jp (via RT)

Can you attach the code as a file? I don't seem to be able to create

I change the Japanese character to a question mark I get the following output:

D:\dev\perl\ver\p4\win32>..\perl ..\encode_bug.pl
0:abc?def[]
0:abc?def[]

which clearly is different. (the bottom one is perl 5.8.6, the top is

If I change it to \x{100} then I get:

D:\dev\perl\ver\p4\win32>..\perl ..\encode_bug.pl
Wide character in print at ..\encode_bug.pl line 14.
Wide character in print at ..\encode_bug.pl line 14.

some of which is explainable by this being a Windows box, and not

Which sort of reminds me of something: Encode has an annoying habit of

Anyway, obviously there is a bug. Damned if I understand it, though.

Cheers,
The RT System itself - Status changed from 'new' to 'open'
From @demerphq
2008/11/11 demerphq <demerphq@gmail.com>:

If I used Devel::Peek to have a look at $1 after each regex match I

So for some reason $1 isn't marked as readonly in 5.10, and what I find

SV = PVMG(0x1a76b2c) at 0x1a98d9c

and in blead:

SV = PVMG(0x1ac052c) at 0x1a4dab4

So, outside of the added UTF8 flag (wtf did that come from), and the

D:\dev\perl\ver\p4\win32>..\perl ..\encode_bug.pl
0:abc?def[]
0:abc?def[]
From @demerphq
2008/11/11 demerphq <demerphq@gmail.com>:

Hmm, I'm wondering, does the POK mean that we ignore the vtable? Is it possible somehow that Encode is turning on POK and UTF8
From @nwc10
On Tue, Nov 11, 2008 at 06:15:42PM +0100, demerphq wrote:

POK shouldn't be on for something with GMG. So that (I think) means that we ignore the vtable, yes.

Nicholas Clark
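The flags under discussion can also be inspected from Perl itself with the core B module, as an alternative to the Devel::Peek dumps above. A minimal sketch, assuming the perl in use exposes the SVs_GMG and SVf_POK flag constants through B; the helper name flag_report is hypothetical. The exact flag set on $1 differs between perl versions, so only the get-magic bit is treated as reliable.

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper: report which of the flags discussed in this thread
# are set on the scalar behind a reference.
sub flag_report {
    my ($ref) = @_;
    my $flags = B::svref_2object($ref)->FLAGS;
    return {
        GMG => ($flags & B::SVs_GMG()) ? 1 : 0,   # has get-magic: value is fetched via the vtable
        POK => ($flags & B::SVf_POK()) ? 1 : 0,   # public string flag: PVX is trusted as-is
    };
}

"abcd" =~ /..(.)/ or die "no match";
my $report = flag_report(\$1);
print "GMG=$report->{GMG} POK=$report->{POK}\n";
```

If GMG and POK were ever both set on $1, code that uses SvPV() would read the cached PVX and never run the get-magic. That is the state Nicholas says should not occur.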
From @demerphq
2008/11/11 Nicholas Clark <nick@ccl4.org>:

Should I take it that as you didn't mention the READONLY being missing

Shouldn't they be readonly? Or am I missing something? It seems reasonable that the fact that Perl 5.8.6 dies and blead doesn't is related.

Yves
From @nwc10
On Tue, Nov 11, 2008 at 07:45:22PM +0100, demerphq wrote:

I think that they should be, but I missed the difference in the READONLY

Nicholas Clark
From @demerphq
2008/11/11 Nicholas Clark <nick@ccl4.org>:

Turns out that this is deliberate, as some pluggable regex engines

Aevar patched this in quite some time ago.

So I think at this point it's a bug in Encode, but I couldn't figure

Yves
From bart.lateur@pandora.be

According to the docs of Encode::Encoding,

As to why the regex engine no longer updates $1 after such a change, at

For fun, try changing the parameter to encode to $1=$1.
From @nwc10

Dave notes:

Either Encode is doing something with $1 it shouldn't, or there's
From @tonycoz
On Thu May 28 07:43:38 2009, nicholas wrote:

The first thing Encode::Unicode::encode_xs() does is call sv_utf8_upgrade().

When $1 was READONLY this would upgrade PVX and set UTF8 without setting

With $1 not READONLY this calls SvPV_force(), which sets POK, upgrades PVX

Encode::Unicode::encode_xs() makes further modifications to the SV later

So three issues:

1) sv_utf8_upgrade() is setting POK via SvPV_force - is this sane for a
2) sv_utf8_upgrade() updates the string, but doesn't call mg_set() -
3) Encode::Unicode::encode_xs() changes the string in other ways, but

I suspect Encode::Unicode::encode_xs not calling is a bug, but I'll

I'm not sure what the desired behaviour would be between
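The first issue, an encoder upgrading its caller's scalar in place, can be reproduced at the Perl level with utf8::upgrade(), which performs the same in-place conversion as sv_utf8_upgrade(). A minimal sketch; the two helper subs are hypothetical stand-ins for an XS routine that either upgrades the caller's scalar through the @_ alias or works on a private copy.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an XS routine that calls sv_utf8_upgrade()
# directly on its argument: @_ aliases the caller's scalar, so the
# caller's variable is upgraded too.
sub upgrade_in_place { utf8::upgrade($_[0]); return length $_[0] }

# The same operation on a private copy: the caller's scalar is untouched.
sub upgrade_copy { my ($s) = @_; utf8::upgrade($s); return length $s }

my $x = "caf\xe9";    # Latin-1 string, UTF8 flag off
my $y = "caf\xe9";
upgrade_in_place($x);
upgrade_copy($y);

printf "x: %s, y: %s, same characters: %s\n",
    (utf8::is_utf8($x) ? 'UTF8' : 'bytes'),
    (utf8::is_utf8($y) ? 'UTF8' : 'bytes'),
    ($x eq $y ? 'yes' : 'no');
```

On a plain scalar the upgrade is harmless, because the characters do not change. On $1 the same in-place change is what left POK set and the capture variable stale. That is why passing encode() a copy sidesteps the problem.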
From @tonycoz
On Mon, Jul 06, 2009 at 07:54:42AM -0700, Tony Cook via RT wrote:

I've confirmed SvPV_force() will set POK if called on $1 [1].

Since SvPV() simply returns PVX when POK is set, any magic is ignored

So the main question here is whether SvPV_force() should set SVf_POK

I suspect SvPV_force() shouldn't be setting POK, but I don't know what
I'm sure it shouldn't be.

void

$ perl -Mblib -MT60472 -MDevel::Peek -le '$_ = "abcd"; /..(.)/; print $1; T60472::SvPV_force($1); Dump($1); /.(.)/; print $1'
From @nwc10
On Wed, Jul 08, 2009 at 10:07:49PM +1000, Tony Cook wrote:

The two together are contradictory. So something must give.

I suspect that it should not. I've no idea what would get upset if this

http://perl5.git.perl.org/perl.git/blame/blead:/sv.c#l8314

commit a0d0e21

The logical alternative would seem to be to turn SvPOK on, but remove magic.

(Which, really, suggests that tainting is implemented wrongly. I suspect that

Nicholas Clark
From @tonycoz
On Mon, Jul 13, 2009 at 04:58:28PM +0100, Nicholas Clark wrote:

In the specific case in the ticket it would get or set magic from $1

When I find the time/energy I'll try adjusting Perl_sv_pvn_force_flags

There is also a separate bug in Encode::Unicode::encode_xs - it

Unfortunately, the damage (POK on) is already done to the SV at this

Tony
From @tonycoz
On Tue, Jul 14, 2009 at 10:38:52AM +1000, Tony Cook wrote:

With the following change:

--- a/sv.c

The only changes in tests are:

@@ -816,17 +816,11 @@

(and the summary)

Copied and pasted from less, hence the <XX>.

The inflate code in both classes also forces POK on.

Tony
From @tonycoz
On Wed Jul 15 20:30:15 2009, tonyc wrote:

This seems like it was a wild goose chase. SvSETMAGIC() will clear the
From @tonycoz
On Tue Nov 10 00:10:31 2009, tonyc wrote:

The specific case described in 60472 should be fixed by Encode 2.38.
@rgs - Status changed from 'open' to 'resolved'
Migrated from rt.perl.org#60472 (status was 'resolved')
Searchable as RT60472