regex metachar tilde fails on .*? content #1255

Closed
p6rt opened this issue Aug 29, 2009 · 5 comments

p6rt commented Aug 29, 2009

Migrated from rt.perl.org#68854 (status was 'resolved')

Searchable as RT68854$

p6rt commented Aug 29, 2009

From payload@lavabit.com

BUG:
$ perl6

say '(foo)' ~~ /'(' ~ ')' .*?/
Unable to parse , couldn't find final ')'
in regex PGE::Grammar::_block21 (<unknown>:1)
called from Main (<unknown>:1)

TEST:
ok( '(foo)' ~~ /'(' ~ ')' .*?/ )

The problem is .*?, I think, because '(foo)' ~~ /'(' ~ ')' 'foo'/
matches. The bug was introduced between June 30 and today, because similar
code is used in http://github.com/krunen/xml/tree/master, last updated
June 30.

parrot rev: 40852
rakudo rev: 7666e92876576ce0dc9b1f5b8a5f035e78a80e81

p6rt commented Aug 30, 2009

From @pmichaud

On Sat, Aug 29, 2009 at 02:45:08PM -0700, Gilbert R. Roehrbein (via RT) wrote:

> BUG:
> $ perl6
>
> say '(foo)' ~~ /'(' ~ ')' .*?/
> Unable to parse , couldn't find final ')'
> in regex PGE::Grammar::_block21 (<unknown>:1)
> called from Main (<unknown>:1)
>
> TEST:
> ok( '(foo)' ~~ /'(' ~ ')' .*?/ )
>
> The problem is .*?, I think, because '(foo)' ~~ /'(' ~ ')' 'foo'/
> matches.

Currently Synopsis 5 is a bit unclear on the handling of backtracking
using the ~ operator in regexes. The current definition says that
something of the form

  '(' ~ ')' <expression>

gets rewritten to be something like

  '(' <expression> [ ')' || <FAILGOAL> ]

Note that there's no way to backtrack into <expression> -- once we've
matched <expression>, we either find the closing token or we fail.

So in the case of the problem regex above, we end up with

  '(' .*? [ ')' || <FAILGOAL> ]

which will match only "()", because there's no possibility of
backtracking into the .*? to find longer strings between the parens.
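
For readers who know other regex engines better, the effect of that rewrite can be approximated in Python's `re` module. This is only an analogy, not PGE or Rakudo code: the lookahead-plus-backreference idiom `(?=(...))\1` stands in for "no backtracking into the expression", and the names `NO_BACKTRACK` and `match_no_backtrack` are made up for illustration.

```python
import re

# Analogy only: (?=(.*?))\1 commits to whatever .*? matched first
# (the empty string), so the engine cannot later extend it to reach
# the closing paren.
NO_BACKTRACK = re.compile(r"\((?=(.*?))\1\)")


def match_no_backtrack(text):
    """Return the matched text, or None if the closer is not found."""
    m = NO_BACKTRACK.search(text)
    return m.group(0) if m else None


print(match_no_backtrack("()"))     # the empty body is followed by ')'
print(match_no_backtrack("(foo)"))  # the frozen empty body is followed by 'f'
```

In this sketch a failed match simply returns `None`; in the ticket, the failure surfaces as the "couldn't find final ')'" error instead.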

At one time I tried changing the definition of ~ so that it
could allow backtracking into the expression

  '(' [ <expression> ')' || <FAILGOAL> ]

but ISTR that I ran into some other issues there and gave up for the
time being.
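
As a rough illustration of what this alternative definition would allow, here is a sketch in Python's `re` module. It is an analogy under stated assumptions, not Rakudo's implementation: an ordinary lazy quantifier plays the role of a backtrackable expression, the hypothetical helper `match_with_goal` raises an error where `<FAILGOAL>` would be called, and only the first opener in the string is tried.

```python
import re

OPENER = re.compile(r"\(")
# The expression and the closer sit in one pattern, so the lazy .*?
# may keep growing until the closing paren is reached.
BODY_AND_CLOSER = re.compile(r"(.*?)\)")


def match_with_goal(text):
    """Match '(' <expression> ')' with backtracking into the expression."""
    opened = OPENER.search(text)
    if opened is None:
        return None  # the opener never matched, so no goal is pending
    rest = BODY_AND_CLOSER.match(text, opened.end())
    if rest is None:
        # Stand-in for <FAILGOAL>: the opener matched but the closer did not.
        raise ValueError("couldn't find final ')'")
    return text[opened.start():rest.end()]


print(match_with_goal("(foo)"))
```

With this shape, a string such as `(foo` still reports the missing closer, while `(foo)` matches as the original reporter expected.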

So, short answer is that I think Rakudo is correctly following the
specification here, but we may need to tweak the specification a bit.

> The bug was introduced between June 30 and today, because similar
> code is used in http://github.com/krunen/xml/tree/master, last updated
> June 30.

AFAIK none of the related code has been changed between June 30 and today,
so I'm guessing something else must be happening there. Looking at the
grammar that is given at that address now, I see

  token comment { '<!--' ~ '-->' <content> }
  token pi { '<?' ~ '?>' <content> }
  token content { .*? }

Given that these are all "token" (no backtracking), that would mean that
the calls to the <content> subrule will only ever match an empty string.
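
The same point can be sketched in Python's `re` module, purely as an analogy and not as a model of how tokens are implemented: `(?=(.*?))\1` freezes the lazy match at its first (empty) result, much as a non-backtracking `<content>` would. The names `COMMENT_TOKEN` and `parse_comment` are hypothetical.

```python
import re

# Analogy only: the body is captured once, lazily, and never revisited,
# so it is always the empty string.
COMMENT_TOKEN = re.compile(r"<!--(?=(.*?))\1-->")


def parse_comment(text):
    """Return the captured comment body, or None if the token fails."""
    m = COMMENT_TOKEN.fullmatch(text)
    return None if m is None else m.group(1)


print(repr(parse_comment("<!---->")))        # empty body, closer follows
print(repr(parse_comment("<!-- note -->")))  # closer is not right after the opener
```

Under this approximation only an empty comment can ever match, which is consistent with the observation above about `<content>`.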

Thanks!

Pm

p6rt commented Aug 30, 2009

The RT System itself - Status changed from 'new' to 'open'

p6rt commented Oct 14, 2012

From @pmichaud

Marking as resolved, Coke++ for noticing.

Pm

p6rt commented Oct 14, 2012

@pmichaud - Status changed from 'open' to 'resolved'

p6rt closed this as completed Oct 14, 2012