Report information
Id: 127064
Status: open
Priority: 0/
Queue: perl6

Owner: Nobody
Requestors: jules [at] jules.uk
Cc:
AdminCc:

Severity: (no value)
Tag: (no value)
Platform: (no value)
Patch Status: (no value)
VM: (no value)



Subject: Variable interpolation in regex very slow
Given
    my @lines = "some-text.txt".IO.lines;
    my $s = 'Jules';
(some-text.txt is about 43k lines)

Doing
    my @matching = @lines.grep(/ $s /);
is about 50 times slower than
    my @matching = @lines.grep(/ Jules /);

And if $s happened to contain anything other than literals, so I had to use
    my @matching = @lines.grep(/ <$s> /);
then it's nearly 150 times slower.

    my @matching = @lines.grep($s);
doesn't appear to work. It matches 0 lines but doesn't die.

The lack of Perl5's straightforward variable interpolation in regexes is crippling the speed.
Is there a faster alternative? (other than EVAL to build the regex)

-- 
Jules@Jules.uk
Subject: Re: [perl #127064] Variable interpolation in regex very slow
Date: Wed, 30 Dec 2015 00:04:58 +0100
To: perl6-compiler [...] perl.org
From: Timo Paulssen <timo [...] wakelift.de>
On 12/29/2015 12:46 AM, Jules Field (via RT) wrote:
> # New Ticket Created by Jules Field
> # Please include the string: [perl #127064]
> # in the subject line of all future correspondence about this issue.
> # <URL: https://rt.perl.org/Ticket/Display.html?id=127064 >
>
> Given
>     my @lines = "some-text.txt".IO.lines;
>     my $s = 'Jules';
> (some-text.txt is about 43k lines)
>
> Doing
>     my @matching = @lines.grep(/ $s /);
> is about 50 times slower than
>     my @matching = @lines.grep(/ Jules /);
>
> And if $s happened to contain anything other than literals, so I had to use
>     my @matching = @lines.grep(/ <$s> /);
> then it's nearly 150 times slower.
>
>     my @matching = @lines.grep($s);
> doesn't appear to work. It matches 0 lines but doesn't die.
>
> The lack of Perl5's straightforward variable interpolation in regexes is crippling the speed.
> Is there a faster alternative? (other than EVAL to build the regex)

For now, you can use @lines.grep(*.contains($s)), which will be sufficiently fast.

Ideally, our regex optimizer would turn this simple regex into code that uses .index to find a literal string and construct a match object for that. Or even, if you put a literal "so" in front, turn it into .contains($literal) if it knows that the match object will only be inspected for true/false.

Until then, we ought to be able to make interpolation a bit faster.

- Timo
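
As a rough illustration of the workaround above, the following Perl 6 (Raku) sketch uses a small in-memory list in place of some-text.txt; the sample lines are made up for the example. It also shows why grep($s) matched nothing in the report: grep smartmatches each element against its argument, and a Str smartmatches by whole-string equality, so only a line that is exactly 'Jules' is selected.

```raku
# Hypothetical sample data standing in for "some-text.txt".IO.lines
my @lines = 'Hello Jules', 'nothing to see', 'Jules was here', 'Jules';
my $s = 'Jules';

# Regex with an interpolated variable: correct, but the slow path in this ticket
my @by-regex = @lines.grep(/ $s /);

# Substring test suggested in the reply; no regex engine involved
my @by-contains = @lines.grep(*.contains($s));

# Smartmatch against a Str is whole-string equality, not a substring search
my @by-smartmatch = @lines.grep($s);

say @by-regex.elems;       # 3
say @by-contains.elems;    # 3
say @by-smartmatch.elems;  # 1
```

Note that .contains only answers yes or no; if the match position or captures are needed, a regex (or .index) is still required.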
From: Jules Field <Jules [...] Jules.uk>
Date: Wed, 30 Dec 2015 10:33:30 +0000
To: perl6-bugs-followup [...] perl.org
Subject: Re: [perl #127064] Variable interpolation in regex very slow
On 29/12/2015 23:05, Timo Paulssen via RT wrote:
> For now, you can use @lines.grep(*.contains($s)), which will be
> sufficiently fast.
>
> Ideally, our regex optimizer would turn this simple regex into code
> that uses .index to find a literal string and construct a match object
> for that. Or even, if you put a literal "so" in front, turn it into
> .contains($literal) if it knows that the match object will only be
> inspected for true/false.
>
> Until then, we ought to be able to make interpolation a bit faster.
> - Timo

Many thanks for that. I hadn't thought to use Whatever.

I would ideally also be doing case-insensitive regexps, but they are 50 times slower than case-sensitive ones, even in trivial cases.
Maybe a :adverb for rx// that says "give me static (i.e. Perl5-style) interpolation in this regex"?
I can see the advantage of passing the variables to the regex engine, as then they can change over time.

But that's not something I want to do very often; far more frequently I just need to construct the regex at run-time and have it go as fast as possible.

Just thoughts from a big Perl5 user (e.g. MailScanner is 50k lines of it!).

Jules

-- 
Jules@Jules.UK
Twitter: @JulesFM

'If I were a Brazilian without land or money or the means to feed my children, I would be burning the rain forest too.' - Sting
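
For the case-insensitive searches mentioned here, one regex-free option is to lowercase both sides before calling .contains. This is only a sketch: the sample data and variable names are invented, and comparing .lc forms is an approximation that may not agree with :i matching for every Unicode string.

```raku
# Hypothetical sample lines; only the first and third contain "jules" in some case
my @lines = 'Mail from JULES', 'nothing relevant', 'jules@example.org';
my $s = 'Jules';

# Lowercase the needle once, outside the per-line test
my $needle = $s.lc;

# Lowercase each line and do a plain substring test
my @matching = @lines.grep(*.lc.contains($needle));

say @matching.elems;  # 2
```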
On Thu, 31 Dec 2015 05:39:24 -0800, jules@jules.uk wrote:
> Many thanks for that. I hadn't thought to use Whatever.
>
> I would ideally also be doing case-insensitive regexps, but they are
> 50 times slower than case-sensitive ones, even in trivial cases.
> Maybe a :adverb for rx// that says "give me static (i.e. Perl5-style)
> interpolation in this regex"?
> I can see the advantage of passing the variables to the regex engine,
> as then they can change over time.
>
> But that's not something I want to do very often, far more frequently
> I just need to construct the regex at run-time and have it go as fast
> as possible.
>
> Just thoughts from a big Perl5 user (e.g. MailScanner is 50k lines of
> it!).
>
> Jules

I recently attempted to make interpolating into regexes a little faster. This is what I was using for a benchmark:

    perl6 -e 'my @l = "sm.sql".IO.lines; my $s = "Perl6"; my $t = now; my @m = @l.grep(/ $s /); say @m.elems; say now - $t'

sm.sql is 10k lines, of which 1283 contain the text "Perl6".

This is Rakudo version 2017.09 built on MoarVM version 2017.09.1:
/ $s / took 5.3s and / <$s> / took 16.5s.

This is Rakudo version 2017.09-427-gd23a9ba9d built on MoarVM version 2017.09.1-595-g716f2277f:
/ $s / took 3.2s and / <$s> / took 14.5s.

However, if you give the interpolated variable a Str type constraint, literal interpolation is *much* faster:

    perl6 -e 'my @l = "sm.sql".IO.lines; my Str $s = "Perl6"; my $t = now; my @m = @l.grep(/ $s /); say @m.elems; say now - $t'

This takes only 0.33s.

This is still nowhere near as fast as grep(*.contains($s)) though, which only takes 0.037s.
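
The one-liners above can be combined into a small script that times the three variants side by side. The following is a sketch under stated assumptions: the input is generated in memory rather than read from sm.sql, and the helper name timed and all variable names are invented for the example. Timings will depend on the Rakudo and MoarVM versions in use, so none are shown.

```raku
# Synthetic stand-in for sm.sql: 10k lines, every eighth one containing "Perl6"
my @l = (^10_000).map({ $_ %% 8 ?? "row $_ mentions Perl6" !! "row $_" });

my     $untyped = 'Perl6';   # no type constraint, as in the first one-liner
my Str $typed   = 'Perl6';   # Str type constraint, as in the second one-liner

# Run a block, print how many lines matched and how long it took,
# and return the match count so the variants can be compared
sub timed(Str $label, &code) {
    my $t = now;
    my $n = code().elems;
    say sprintf('%-16s %5d matches in %.3fs', $label, $n, now - $t);
    return $n;
}

my $n-untyped  = timed 'untyped / $s /', { @l.grep(/ $untyped /) };
my $n-typed    = timed 'typed / $s /',   { @l.grep(/ $typed /) };
my $n-contains = timed '.contains',      { @l.grep(*.contains($typed)) };
```

All three variants should report the same number of matches; only the elapsed time is expected to differ.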
On Sun, 15 Oct 2017 05:19:54 -0700, ddgreen@gmail.com wrote:
> I recently attempted to make interpolating into regexes a little
> faster. This is what I was using for a benchmark:
> perl6 -e 'my @l = "sm.sql".IO.lines; my $s = "Perl6"; my $t = now; my @m = @l.grep(/ $s /); say @m.elems; say now - $t'
> sm.sql is 10k lines, of which 1283 contain the text "Perl6".
>
> This is Rakudo version 2017.09 built on MoarVM version 2017.09.1:
> / $s / took 5.3s and / <$s> / took 16.5s.
>
> This is Rakudo version 2017.09-427-gd23a9ba9d built on MoarVM version
> 2017.09.1-595-g716f2277f:
> / $s / took 3.2s and / <$s> / took 14.5s.
>
> However, if you type the string to interpolate it is *much* faster for
> literal interpolation.
> perl6 -e 'my @l = "sm.sql".IO.lines; my Str $s = "Perl6"; my $t = now; my @m = @l.grep(/ $s /); say @m.elems; say now - $t'
> This takes only 0.33s.
>
> This is still nowhere near as fast as grep(*.contains($s)) though,
> which only takes 0.037s.

This is Rakudo version 2017.10-143-g0e50993f4 built on MoarVM version 2017.10-58-gad8618468: / $s / took 2.7s and / <$s> / took 7.0s.
On Tue, 07 Nov 2017 17:10:29 -0800, ddgreen@gmail.com wrote:
> This is Rakudo version 2017.10-143-g0e50993f4 built on MoarVM version
> 2017.10-58-gad8618468:
> / $s / took 2.7s and / <$s> / took 7.0s.

Adding :i (case insensitive adverb), /:i $s / took 3.0s and /:i <$s> / took 7.7s.

