Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pathological performance of a pattern match #13992

Closed
p5pRT opened this issue Jul 19, 2014 · 4 comments
Closed

Pathological performance of a pattern match #13992

p5pRT opened this issue Jul 19, 2014 · 4 comments

Comments

@p5pRT
Copy link

p5pRT commented Jul 19, 2014

Migrated from rt.perl.org#122331 (status was 'open')

Searchable as RT122331$

@p5pRT
Copy link
Author

p5pRT commented Jul 19, 2014

From @andk

I think this bugreport deserves to be a quiz. Test your intuition.

Sample string to test the regexps against​:

  sprintf "%scould%snot%sopen%s", ("x"x10000)x4;

Regexp 1​:

  qr!.*(?​:could ?not (?​:open|connect|find))!

Regexp 2 (the difference is it's case insensitive)​:

  qr!.*(?i​:could ?not (?​:open|connect|find))!

Note​: neither matches and this is correct.

Question​: which is faster and by how much?

Answer​: regexp 1 is much more than 100 times slower than the regexp 2.

On my machine this program takes 1-2 wallclock seconds​:

  time $p -le 'my $x = sprintf "%scould%snot%sopen%s", ("x"x10000)x4;print $x =~ m!.*(?​:could ?not (?​:open|connect|find))! ? "not ok" : "ok";'

And on all those perls it takes less than 0.01 wallclock seconds when I
add the "i".

Historical evidence​: this is not a regression, at least not since 5.6. I
have witnessed the same slowness on all my perls between 5.6.1 and
bleadperl.

Competition evidence​: with 'use re​::engine​::PCRE;' both regexps are
fast.

Enjoy,
--
andreas

@p5pRT
Copy link
Author

p5pRT commented Jul 21, 2014

From @iabyn

On Sat, Jul 19, 2014 at 02​:21​:03PM -0700, Andreas J. Koenig via RT wrote​:

sprintf "%scould%snot%sopen%s", ("x"x10000)x4;

Regexp 1​:

qr!.*(?​:could ?not (?​:open|connect|find))!

Regexp 2 (the difference is it's case insensitive)​:

qr!.*(?i​:could ?not (?​:open|connect|find))!

The second is faster because it disables intuiting, which causes it to
take a different path through regexec_flags(), which causes it to fail
after the first call to regtry(), rather than re-trying regtry at
positions 1,2,3,... and eventually failing.

I don't know know what the fix is, but this is tied up with how intuit()
and the parent regex_flags() handle all the various flags (like ANCH_MBOL,
and whether that MBOL is a fake one due the presence of a leading '.*'
etc).

Its all a bit of a mess, and I intend to look at this properly when I get
round to doing my second batch of intuit()/regexec_flags() refactoring
sometime this year.

--
Modern art​:
  "That's easy, I could have done that!"
  "Ah, but you didn't!"

@p5pRT
Copy link
Author

p5pRT commented Jul 21, 2014

The RT System itself - Status changed from 'new' to 'open'

@xenu
Copy link
Member

xenu commented May 29, 2020

I can reproduce that on 5.18.4 but I can't on 5.30.2 so it definitely was fixed at some point.

@xenu xenu closed this as completed May 29, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants