Report information
Id: 131822
Status: new
Priority: 0/
Queue: perl5

Owner: Nobody
Requestors: shlomif [at] shlomifish.org
Cc:
AdminCc:

Operating System: (no value)
PatchStatus: (no value)
Severity: low
Type: unknown
Perl Version: (no value)
Fixed In: (no value)



Date: Mon, 31 Jul 2017 22:14:28 +0300
From: Shlomi Fish <shlomif [...] shlomifish.org>
Subject: A multiline regex that starts with /^/m is much slower than the corresponding one that starts with /\n/
To: perlbug [...] perl.org
A multiline regex that starts with /^/m is much slower than the corresponding one that starts with /\n/. Below is Dave Mitchell's analysis:

The code can be reduced to the following:

my $nomatch = <<EOF;
Start 1 =
Boo
End
EOF

my $match = <<EOF;
Start 1 =
End
EOF

$_ = ($nomatch x 10_000) . $match;

my $n = $ARGV[0] ? '^' : '\n';
m/${n}Start [0-9]+ =\nEnd\n/m or die;

$ time perl5260o ~/tmp/p 0; time perl5260o ~/tmp/p 1

real 0m0.004s
user 0m0.002s
sys 0m0.002s

real 0m0.691s
user 0m0.690s
sys 0m0.001s

It's probably down to this in regexec_flags():

/* note that with PREGf_IMPLICIT, intuit can only fail
 * or return the start position, so it's of limited utility.
 * Nevertheless, I made the decision that the potential for
 * quick fail was still worth it - DAPM */

Basically the '^' causes it to (fruitlessly) run intuit at the start of every line; the \n instead causes it to just fbm to the next "\nStart" string.

I may need to revisit that decision. The whole 'pick the next viable start position' logic in regexec_flags() needs an overhaul, and it's on my list of things to do (but not currently near the top).

===========

Please look into fixing it in a future version.
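
For anyone who wants to reproduce this without two separate invocations, both variants can be timed in one process. The following is only a sketch built on the reduced test case above: the time_match helper is a hypothetical name, and using Time::HiRes for wall-clock timing is an assumption, not part of the original analysis. Absolute timings will differ by machine and perl build.

```perl
use strict;
use warnings;
use Time::HiRes ();

# Same data as the reduced test case: many near-misses, then one real match.
my $nomatch = "Start 1 =\nBoo\nEnd\n";
my $match   = "Start 1 =\nEnd\n";
my $text    = ($nomatch x 10_000) . $match;

# Hypothetical helper: run one match attempt with the given prefix
# ('^' or '\n') and return (matched?, elapsed wall-clock seconds).
sub time_match {
    my ($prefix) = @_;
    my $re    = qr/${prefix}Start [0-9]+ =\nEnd\n/m;
    my $t0    = Time::HiRes::time();
    my $found = ($text =~ $re) ? 1 : 0;
    return ($found, Time::HiRes::time() - $t0);
}

for my $prefix ('\n', '^') {
    my ($found, $elapsed) = time_match($prefix);
    printf "%-2s matched=%d elapsed=%.6fs\n", $prefix, $found, $elapsed;
}
```

Both prefixes should report a match; only the elapsed time is expected to differ on affected perls.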

