:ignorecase doesn't apply to character ranges #4454

Closed
p6rt opened this issue Aug 5, 2015 · 9 comments

Comments

@p6rt

p6rt commented Aug 5, 2015

Migrated from rt.perl.org#125753 (status was 'resolved')

Searchable as RT125753$

@p6rt

p6rt commented Aug 5, 2015

From @hoelzro

For example:

"%E3%81%82" ~~ m:ignorecase/["%" (<[a..f0..9]> ** 2)]+/ && say $/[0]

Only prints 81 and 82, not E3.

A test file is attached.
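
As a cross-check of the expected result, here is a minimal sketch in Python rather than Perl 6. It uses Python's `re` module, which is a different regex engine, so it only illustrates which captures a case-insensitive match over the same input should yield. The names `PAIR` and `hex_pairs` are made up for this illustration.

```python
import re

# Case-insensitive analogue of "%" followed by two characters from a-f or 0-9.
PAIR = re.compile(r"%([a-f0-9]{2})", re.IGNORECASE)

def hex_pairs(text):
    """Return every two-character hex capture that follows a '%'."""
    return PAIR.findall(text)

print(hex_pairs("%E3%81%82"))  # ['E3', '81', '82']
```

All three pairs are captured, including the uppercase E3 that the Perl 6 match above drops.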

@p6rt

p6rt commented Aug 5, 2015

From @hoelzro

test.pl

@p6rt

p6rt commented Aug 5, 2015

From @perlpilot

Also ...

<PerlJam> m: "%E3%81%82" ~~ m:ignorecase/ "%" <[a..f0..9]> / && say $/;
<+camelia> rakudo-moar 0dcbba: OUTPUT«「%8」␤»
<PerlJam> m: "%E3%81%82" ~~ m:ignorecase/ "%" <[a..f]> / && say $/;
<+camelia> rakudo-moar 0dcbba: OUTPUT«「%E」␤»

It seems the presence of 0..9 in the character class makes some difference.

--

-Scott (PerlJam/perlpilot)

@p6rt

p6rt commented Aug 5, 2015

From @perlpilot

<PerlJam> m: '%E3%81%82' ~~ m:ignorecase/['%' (<[abcdef0123456789]> ** 2)]+/ && say $/[0];
<+camelia> rakudo-moar 0dcbba: OUTPUT«「E3」 「81」 「82」␤»
<PerlJam> hoelzro: I guess something is broken with character class ranges?
<jnthn> PerlJam: Other fun question is if you can rely on chr(start)..chr(end) being contiguous implying chr(uc(start))..chr(uc(end)) being
<jnthn> PerlJam: And the answer is surely no because that'd make life way too easy :)
<jnthn> PerlJam: So it'll need a bit of care to fix, but shouldn't be too bad :)
<jnthn> Trouble is though that then you lose the nice charrange optimization
<jnthn> Or at least, if you want to keep it you've got some analysis to do
<jnthn> Anyway, long story short I can see why it didn't get done yet. I agree it probably should be. :)
* PerlJam adds to the ticket for now.
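
The contiguity question raised in the log can be checked directly. The following is a small sketch in Python (not Perl 6, and not how the regex engine works internally); `upper_codepoints` and `is_contiguous` are hypothetical helpers written for this illustration. It uppercases every character in a code point range and tests whether the results still form one unbroken range.

```python
def upper_codepoints(start, end):
    """Sorted, de-duplicated code points produced by uppercasing start..end."""
    result = set()
    for cp in range(ord(start), ord(end) + 1):
        # str.upper() may return more than one character, so collect them all.
        for ch in chr(cp).upper():
            result.add(ord(ch))
    return sorted(result)

def is_contiguous(cps):
    """True if the sorted code points form a single gap-free run."""
    return all(b - a == 1 for a, b in zip(cps, cps[1:]))

print(is_contiguous(upper_codepoints("a", "f")))            # True
print(is_contiguous(upper_codepoints("\u00e0", "\u00ff")))  # False
```

The ASCII range a..f maps cleanly to A..F, but U+00E0..U+00FF does not: U+00F7 (the division sign) has no uppercase form and stays in place, and U+00FF uppercases to U+0178, outside the Latin-1 block. So a case-insensitive range cannot, in general, be implemented by just uppercasing its two endpoints.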

--

-Scott (PerlJam/perlpilot)

@p6rt

p6rt commented Oct 6, 2015

From @jnthn

The bug is actually in the LTM engine, and can also be triggered as:

perl6-m -e "'%E3%81%82' ~~ m:ignorecase/['%' (<[a..f]>|x)]+/ && say $/[0]"

Which wrongly produces no output.

@p6rt

p6rt commented Oct 6, 2015

The RT System itself - Status changed from 'new' to 'open'

@p6rt

p6rt commented Oct 21, 2015

From @smls

This unexpectedly fails to match the capital A:

  ➜ say "Aa1" ~~ /:i <[a..z0..9]>+/
  「a1」

*Without* the second range in the character class, it works as expected:

  ➜ say "Aa1" ~~ /:i <[a..z]>+/
  「Aa」

(This is perl6 version 2015.09-433-g26617f9 built on MoarVM version 2015.09-79-gee9fc2b).
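
The behaviour expected here can be modelled with a short sketch. This is Python, not the Rakudo implementation, and `in_class` and `leading_run` are hypothetical helpers: the model tests each character (and, under ignorecase, its single-character case variants) against every range in the class, so adding a second range should never reduce what the first one matches.

```python
def in_class(ch, ranges, ignorecase=False):
    """True if ch falls in any (lo, hi) range, optionally ignoring case."""
    candidates = {ch}
    if ignorecase:
        # Only single-character case mappings are considered in this sketch.
        candidates |= {v for v in (ch.lower(), ch.upper()) if len(v) == 1}
    return any(lo <= c <= hi for c in candidates for lo, hi in ranges)

def leading_run(text, ranges, ignorecase=False):
    """Longest prefix of text whose characters are all in the class."""
    n = 0
    while n < len(text) and in_class(text[n], ranges, ignorecase):
        n += 1
    return text[:n]

both = [("a", "z"), ("0", "9")]
print(leading_run("Aa1", both, ignorecase=True))          # Aa1
print(leading_run("Aa1", [("a", "z")], ignorecase=True))  # Aa
```

Unlike a regex search, `leading_run` only looks at the start of the string, which is enough to show that "Aa1" should match in full when both ranges are present.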

@p6rt

p6rt commented Nov 11, 2015

From @jnthn

Fixed now, with tests added in S05-metasyntax/charset.t. I also found that charranges did the wrong thing under LTM with ignoremark, and fixed that too (plus tests). And, of course, there is a test for the ignoremark and ignorecase combination for good measure (also passing).

/jnthn

@p6rt

p6rt commented Nov 11, 2015

@jnthn - Status changed from 'open' to 'resolved'

@p6rt p6rt closed this as completed Nov 11, 2015