Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pattern utf8ness sticks around globally #12660

Closed
p5pRT opened this issue Dec 20, 2012 · 4 comments
Closed

Pattern utf8ness sticks around globally #12660

p5pRT opened this issue Dec 20, 2012 · 4 comments

Comments

@p5pRT
Copy link

p5pRT commented Dec 20, 2012

Migrated from rt.perl.org#116148 (status was 'resolved')

Searchable as RT116148$

@p5pRT
Copy link
Author

p5pRT commented Dec 20, 2012

From am@jobindex.dk

On Wed, Dec 19, 2012 at 10​:43​:04PM +0100, Anders Melchiorsen wrote​:

use feature 'unicode_strings';

my $x = "\x{263a}";
$x =~ /$x/;

my $text = "Perl";
die if $text !~ /P.*$/i;

The program does nothing (as expected) in perl 5.14.2, but dies in
perl 5.16.2.

On 12/19/2012 04​:02 PM, Dave Mitchell wrote​:

It bisects to

commit 77a6d85
Author​: Karl Williamson <public@​khwilliamson.com>
Date​: Sun Oct 16 12​:47​:21 2011 -0600

 regexec\.c&#8203;: Less work in /i matching

On 20-12-2012 06​:07, Karl Williamson wrote​:

I looked at this a little. The bug isn't from that commit, which is just
exposing an underlying bug. The problem is that for the second match,
UTF_TARGET in regexec.c is wrongly evaluating to TRUE. It is

#define UTF_PATTERN ((PL_reg_flags & RF_utf8) != 0)

so that means that PL_reg_flags is set wrong. It turns out it is
wrong at
the beginning of_re_intuit_start(). Somehow the utf8ness of the first
pattern is getting passed in this global as if it applied to the 2nd
pattern.
Grepping the project's source indicates that the only setting of this
global
is done in regexec.c.

This looks to me like a global that shouldn't be, like PL_regsize,
which Dave
just removed. I didn't see anyplace where it gets initialized, which
would
indicate that once the program sees a UTF-8 pattern, it thinks all
patterns
are UTF-8. That bug seems unlikely to have escaped detection until
now, so
I don't understand.

This is a portion of the regex handling that I know nothing about.
Perhaps
this will help someone who does know this logic better to easily
figure out
the cause and fix.

@p5pRT
Copy link
Author

p5pRT commented Dec 25, 2012

From @iabyn

On Wed, Dec 19, 2012 at 11​:48​:31PM -0800, Anders Melchiorsen wrote​:

On 20-12-2012 06​:07, Karl Williamson wrote​:

I looked at this a little. The bug isn't from that commit, which is
just exposing an underlying bug. The problem is that for the second
match, UTF_TARGET in regexec.c is wrongly evaluating to TRUE. It is

#define UTF_PATTERN ((PL_reg_flags & RF_utf8) != 0)

so that means that PL_reg_flags is set wrong. It turns out it is
wrong at the beginning of_re_intuit_start(). Somehow the utf8ness of
the first pattern is getting passed in this global as if it applied
to the 2nd pattern. Grepping the project's source indicates that the
only setting of this global is done in regexec.c.

This looks to me like a global that shouldn't be, like PL_regsize,
which Dave just removed. I didn't see anyplace where it gets
initialized, which would indicate that once the program sees a UTF-8
pattern, it thinks all patterns are UTF-8. That bug seems unlikely
to have escaped detection until now, so I don't understand.

This is a portion of the regex handling that I know nothing about.
Perhaps this will help someone who does know this logic better to
easily figure out the cause and fix.

Note that it was only intuit that was failing to reset RF_utf8​: a call to
regexec_flags *would* reset it,

Fixed now with commit 4fab19c; which is followed by the four commits up
to 97a9508, which collectively eliminate the PL_reg_flags global.

--
Art is anything that has a label (especially if the label is "untitled 1")

@p5pRT
Copy link
Author

p5pRT commented Dec 25, 2012

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Dec 25, 2012

@iabyn - Status changed from 'open' to 'resolved'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant