Report information
Id: 133541
Status: open
Priority: 0/
Queue: perl6

Owner: Nobody
Requestors: jvs [at] dyumnin.com
Cc:
AdminCc:

Severity: (no value)
Tag: (no value)
Platform: (no value)
Patch Status: (no value)
VM: (no value)



From: Vijayvithal <jvs [...] dyumnin.com>
Subject: Grammer bug <alnum> vs <alpha>
Date: Tue, 25 Sep 2018 16:38:02 +0530
To: rakudobug [...] perl.org
In the attached code, the only difference between the grammars G0 and G1 is the definition of the token 'type': it is defined as <alpha> in one case and as <alnum> in the other.

Since the string being matched is 'sc_in', both the alpha and alnum tokens should have captured it. But we see the following result on execution:

=========== <alnum> Example==============
Nil
=========== <alpha> Example==============
「sc_in<foo> bar」
ruport => 「sc_in」
type => 「sc_in」
alpha => 「s」
alpha => 「c」
alpha => 「_」
alpha => 「i」
alpha => 「n」

The Perl version is:

This is Rakudo Star version 2018.06 built on MoarVM version 2018.06
implementing Perl 6.c.

--
Vijayvithal
Dyumnin Semiconductors
Attachment: gentb.p6 (text/plain, 445b). Message body is not shown because sender requested not to inline it.

From: Brandon Allbery <allbery.b [...] gmail.com>
CC: bugs-bitbucket [...] rt.perl.org
To: perl6-compiler <perl6-compiler [...] perl.org>
Date: Thu, 27 Sep 2018 22:52:06 -0400
Subject: Re: [perl #133541] Grammer bug <alnum> vs <alpha>
"_" is not an alphabetic character. It's allowed in "alnum" because that is by intent what is \w in other regex implementations, which includes "_".

--
brandon s allbery kf8nh
To: Brandon Allbery <allbery.b [...] gmail.com>
CC: perl6-compiler [...] perl.org, bugs-bitbucket [...] rt.perl.org
Subject: Re: [perl #133541] Grammer bug <alnum> vs <alpha>
Date: Fri, 28 Sep 2018 02:26:41 -0700
From: Brent Laabs <bslaabs [...] gmail.com>
Are you sure about that?  Underscore has been part of the specs (synopses) for <alpha> for at least 10 years, probably longer.

 >  "_" ~~ /<alpha>/
「_」
 alpha => 「_」

CC: Brandon Allbery <allbery.b [...] gmail.com>, perl6-compiler [...] perl.org, bugs-bitbucket [...] rt.perl.org
Date: Fri, 28 Sep 2018 09:18:31 -0500
Subject: Re: [perl #133541] Grammer bug <alnum> vs <alpha>
From: "Patrick R. Michaud" <pmichaud [...] pobox.com>
To: Brent Laabs <bslaabs [...] gmail.com>
The issue doesn't seem to be the underscore, because I get the same result even when converting the underscore into a letter ('b'):

$ cat gentb.p6
grammar G0 {
        token TOP {<rport>|<ruport>.*}
        regex rport { <type>}
        rule ruport { <type>}
        #token type {<alpha>+}
        token type {<alnum>+}
}

grammar G1 {
        token TOP {<rport>|<ruport>.*}
        regex rport { <type>}
        rule ruport { <type>}
        token type {<alpha>+}
        #token type {<alnum>+}
}
my $str="scbin<foo> bar";
say "=========== <alnum> Example==============";
say G0.parse($str);
say "=========== <alpha> Example==============";
say G1.parse($str);

$ perl6 gentb.p6
=========== <alnum> Example==============
Nil
=========== <alpha> Example==============
「scbin<foo> bar」
 ruport => 「scbin」
  type => 「scbin」
   alpha => 「s」
   alpha => 「c」
   alpha => 「b」
   alpha => 「i」
   alpha => 「n」
$
Subject: Re: [perl #133541] Grammer bug <alnum> vs <alpha>
Date: Fri, 28 Sep 2018 14:54:04 -0700
To: perl6-bugs-followup [...] perl.org
CC: jvs [...] dyumnin.com
From: Brent Laabs <bslaabs [...] gmail.com>
Golfs to just the top grammar, which is the only one that returns Nil.

grammar Alnum1 {
    token TOP {<alnum>|<alnum>.*}
}
grammar AlnumReversed {
    token TOP {<alnum>.*|<alnum>}
}
grammar Alpha1 {
    token TOP {<alpha>|<alpha>.*}
}
my $rx = rx/^ [<alnum>|<alnum>.*] $/;

my $str="n~";

.say for "=========== <alnum> ==============",
 Alnum1.parse($str),
 "=========== <alnum> (reversed) ===",
 AlnumReversed.parse($str),
 "=========== <alpha> ==============",
 Alpha1.parse($str),
 "=========== Regex   ==============",
 $str ~~ $rx;


To: Brandon Allbery via RT <perl6-bugs-followup [...] perl.org>
From: vijayvithal jahagirdar <jvs [...] dyumnin.com>
Date: Fri, 28 Sep 2018 10:39:49 +0530
Subject: Re: [perl #133541] Grammer bug <alnum> vs <alpha>
This is in conflict with the documentation at https://docs.perl6.org/language/regexes, which states:

<alpha>: Alphabetic characters including _
<alnum>: \w. <alpha> plus <digit>

In my example, '_' matches the alpha regex. As per the specification, everything that matches alpha should match alnum, which in the given example it does not.
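
The subset claim in this message (whatever matches <alpha> should also match <alnum>) can be checked one character at a time, without involving any grammar. The following is only a minimal sketch, assuming a Perl 6 compiler is at hand; the helper name classes-of and the choice of test characters are illustrative and are not part of the attached gentb.p6.

```raku
# Hypothetical helper (not from the ticket): report which of the two
# built-in character classes accept a single character.
sub classes-of(Str $ch) {
    %( alpha => so($ch ~~ /^ <alpha> $/),
       alnum => so($ch ~~ /^ <alnum> $/) )
}

# The characters of 'sc_in', plus one digit for contrast.
for <s c _ i n 7> -> $ch {
    my %c = classes-of($ch);
    say "char=$ch alpha={%c<alpha>} alnum={%c<alnum>}";
}
```

If the documentation quoted above holds, alnum should be True on every line where alpha is True. That would suggest the Nil in the report comes from somewhere other than the character classes themselves.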
Subject: Re: [perl #133541] Grammer bug <alnum> vs <alpha>
Date: Fri, 28 Sep 2018 20:38:07 +0530
From: Vijayvithal <jvs [...] dyumnin.com>
To: "Patrick R. Michaud via RT" <perl6-bugs-followup [...] perl.org>
This issue surfaces because of the token TOP line. If, instead of <rport>|<ruport>, only ruport is used, the test case works for both cases. So it is quite possible that the bug is elsewhere but shows up as a difference between alpha and alnum.

Regards
Vijay
--
Vijayvithal
Dyumnin Semiconductors
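
The observation in this message, that the <alnum> case parses once TOP refers only to ruport, could be reproduced with something like the sketch below. It assumes the G0 definitions shown elsewhere in the thread; the grammar name G0RuportOnly is invented here for illustration.

```raku
# Variant of G0 from the thread with <rport> removed from TOP.
# The grammar name G0RuportOnly is made up for this sketch.
grammar G0RuportOnly {
    token TOP    { <ruport> .* }
    rule  ruport { <type>}
    token type   { <alnum>+ }
}

# Same input string as in the original report.
say G0RuportOnly.parse('sc_in<foo> bar');
```

According to the message above, this variant is expected to produce a match rather than Nil, which points at the alternation in TOP rather than at <alnum> itself.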

Subject: Re: [perl #133541] Grammer bug <alnum> vs <alpha>
Date: Mon, 1 Oct 2018 19:51:37 -0700
To: perl6-bugs-followup [...] perl.org
From: Brent Laabs <bslaabs [...] gmail.com>
Actually, if you change it to <ruport>.*|<rport> -- this will work as you expect.  It's a bug that your version doesn't work, of course.  It does seem to involve <alpha> tangentially, but it is unrelated to underscore.
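
A sketch of the reordering suggested above, applied to the <alnum> grammar from the thread. The grammar name G0Reordered is made up for this example; everything else is taken from the G0 definition posted earlier.

```raku
# G0 from the thread with the two alternatives in TOP swapped.
# The grammar name G0Reordered is made up for this sketch.
grammar G0Reordered {
    token TOP    { <ruport> .* | <rport> }
    regex rport  { <type>}
    rule  ruport { <type>}
    token type   { <alnum>+ }
}

# Same input string as in the original report.
say G0Reordered.parse('sc_in<foo> bar');
```

This is a workaround only. As noted above, the original ordering not working is still considered a bug.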

