Substr giving wrong results on $1 with utf8 #12321
From choroba@matfyz.cz
Created by choroba@matfyz.cz
Running substr($1, 0, 1) gives strange results when matching a utf8 The following code should demonstrate the problem: use strict; open my binmode STDOUT, ':utf8'; while (my $line = <>) { open my system "$^X utf2.pl < utf1"; __END__ Perl Info
From @cpansprout
On Mon Aug 06 23:49:16 2012, choroba@matfyz.cz wrote:
It doesn’t have to come from input. Here is a simpler example: "\x{100}" =~ /(.+)/; And here is the output in various perl versions (5.17.3 is actually $ pbpaste|perl5.8.1 What’s happening is that the utf-8 length/pos cache is becoming stale The first substr results in pos information being cached. The second I suspect we need to rethink the way the magic mechanism interacts with It affects tied variables as well: $y = "a\x{100}"; And substr lvalues: $x = "a\x{100}"; And nonexistent hash elements: sub { -- Father Chrysostomos |
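The stale-cache symptom described above can be probed with a small check like the following. This is only a sketch, not the reporter's original reproduction (which is truncated here): the input strings are hypothetical, chosen so that the wide character sits at a different offset each time `$1` is refilled. On a perl that includes the fix, every character read back through `substr($1, ...)` should match the string that was just captured.

```perl
use strict;
use warnings;

# Hypothetical inputs: the wide character (U+0100) moves one position
# to the right in each string, so cached offsets from an earlier match
# would point at the wrong byte if they were reused.
my @inputs = ("\x{100}abc", "a\x{100}bc", "ab\x{100}c");

my @results;
for my $input (@inputs) {
    $input =~ /(.+)/ or next;
    # Read $1 back one character at a time through substr.
    push @results,
        join ',', map { ord substr($1, $_, 1) } 0 .. length($1) - 1;
}

print "$_\n" for @results;
```

Each printed line lists the code points of one capture in order; a line that does not match its input string would indicate that position information from a previous match was reused.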
The RT System itself - Status changed from 'new' to 'open' |
From @cpansprout
I’ve fixed this in commit 7d1328b by resetting the utf8 caches in mg_get -- Father Chrysostomos |
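Since the thread notes that tied variables are affected as well, here is a minimal sketch of the behaviour the fix is meant to guarantee. The `Rotating` class is hypothetical and exists only for this illustration: its FETCH returns a different string on each call, so any character-offset information remembered from the previous value would be wrong for the next one.

```perl
use strict;
use warnings;

package Rotating;    # hypothetical helper class, not from the ticket

sub TIESCALAR {
    my ($class, @values) = @_;
    return bless { values => [@values], next => 0 }, $class;
}

# Return the stored strings in turn, one per FETCH.
sub FETCH {
    my $self  = shift;
    my $count = @{ $self->{values} };
    return $self->{values}[ $self->{next}++ % $count ];
}

sub STORE { }        # writes are ignored in this sketch

package main;

tie my $t, 'Rotating', "\x{100}xyz", "x\x{100}yz";

my $first  = ord substr($t, 1, 1);    # reads "\x{100}xyz"
my $second = ord substr($t, 1, 1);    # reads "x\x{100}yz"

print "$first $second\n";
```

With the caches reset on each magic get, the first read should see `x` (120) and the second should see U+0100 (256).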
@cpansprout - Status changed from 'open' to 'resolved' |
From @nwc10
On Thu, Aug 30, 2012 at 06:20:13PM -0700, Father Chrysostomos via RT wrote:
This seems like a sensible solution. (I can't spot any flaws in the approach) Historically, if something breaks because of tie, it usually also breaks $ cat 114410.pl package UTF8Toggle; use overload '""' => 'stringify', fallback => 1; sub new { sub stringify { package main; my $u = UTF8Toggle->new(" \x{c2}7 "); printf "%d\n", ord substr $u, 1; __END__ I'm not sure what the best fix is here. Given that I'd been wondering if the Nicholas Clark |
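The overloading case can be sketched along the following lines. This is not the `114410.pl` script quoted above, which is truncated here. It is a separate illustration with a hypothetical `ToggleString` class whose stringification alternates between the UTF-8-upgraded and the downgraded form of the same characters. Because both forms contain identical characters, a character-level `substr` should give the same answer whichever form a given call happens to see.

```perl
use strict;
use warnings;

package ToggleString;    # hypothetical class for this sketch

use overload '""' => \&stringify, fallback => 1;

sub new {
    my ($class, $string) = @_;
    return bless { string => $string, upgraded => 0 }, $class;
}

# Alternate the internal encoding of the returned copy on every call.
sub stringify {
    my $self = shift;
    my $copy = $self->{string};
    if ($self->{upgraded} ^= 1) {
        utf8::upgrade($copy);
    }
    else {
        utf8::downgrade($copy);
    }
    return $copy;
}

package main;

my $u = ToggleString->new(" \x{c2}7 ");

# Character 1 is U+00C2 in both encodings of the string.
my @ords = map { ord substr($u, 1, 1) } 1 .. 4;

print "@ords\n";
```

If offsets computed for one encoding were applied to the other, some of the four values would differ from 194.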
From @cpansprout
On Fri Aug 31 02:28:08 2012, nicholas wrote:
As you may have noticed (if not: git log d8f2f09), I went searching In the process of doing so, I noticed this in a few places (this one in if (SvGMAGICAL(bufsv) || SvAMAGIC(bufsv)) { And I slept on it and came to the conclusion that overload couldn’t work I’m also wondering whether it’s even worth creating the utf8 cache to In that case, my sv_len_utf8_nomg should go, and we should have a macro -- Father Chrysostomos |
@cpansprout - Status changed from 'resolved' to 'open' |
From @cpansprout
I’m reopening this as there are still unresolved issues. -- Father Chrysostomos |
From @nwc10
On Fri, Aug 31, 2012 at 08:29:48AM -0700, Father Chrysostomos via RT wrote:
I'm not sure if any code relies for performance on being able to call it pp_index() calls sv_pos_u2b() and sv_pos_b2c() on the same SV. If no op or clearly related code path ends up calling more than one function Nicholas Clark |
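For context on why `pp_index()` would need both conversions: at the Perl level, `index` takes and returns character offsets, while a string holding wide characters is stored as a longer run of bytes. The sketch below shows only that observable difference. The sample string is hypothetical, and nothing here inspects which internal functions are called.

```perl
use strict;
use warnings;

# Hypothetical sample: two wide characters interleaved with ASCII.
my $string = "\x{100}b\x{100}c";

# Length in characters versus length of the UTF-8 encoded form.
my $char_length = length $string;
my $encoded     = $string;
utf8::encode($encoded);
my $byte_length = length $encoded;

# The start offset (1) and the return values are in characters.
my $second_wide = index($string, "\x{100}", 1);
my $c_position  = index($string, "c");

print "$char_length $byte_length $second_wide $c_position\n";
```

The start offset passed in has to be mapped to a byte position before searching, and the byte position of a hit has to be mapped back to characters, which is where a position cache can save work on a plain string.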
From @cpansprout
On Thu Sep 06 06:06:51 2012, nicholas wrote:
Even if it does call it twice, doing so on overloading is still So I think what I suggested is the correct fix. -- Father Chrysostomos |
From @cpansprout
I have resolved the overloading issues in the series of commits ending -- Father Chrysostomos |
@cpansprout - Status changed from 'open' to 'resolved' |
Migrated from rt.perl.org#114410 (status was 'resolved')
Searchable as RT114410