Bleadperl v5.13.7-234-g7b98bc4 breaks YAML::XS #10874

Closed
p5pRT opened this issue Dec 4, 2010 · 13 comments

p5pRT commented Dec 4, 2010

Migrated from rt.perl.org#80212 (status was 'resolved')

Searchable as RT80212$

p5pRT commented Dec 4, 2010

From @andk

git bisect​:


  commit 7b98bc4
  Author​: Karl Williamson <public@​khwilliamson.com>
  Date​: Tue Nov 30 18​:10​:37 2010 -0700

  regcomp.c​: utf8 pattern defaults to Unicode semantics

CPAN distros affected​:


  SPROUT/JE-0.051.tar.gz
  INGY/YAML-LibYAML-0.34.tar.gz

  I have patched YAML-LibYAML before so that only the newly broken tests
  get visible. My two patches are available from CPAN as

  ftp​://cpan.cpantesters.org/CPAN/authors/id/A/AN/ANDK/patches/ANDK/patches/YAML-LibYAML-0.34-ANDK-01.patch.gz
  ftp​://cpan.cpantesters.org/CPAN/authors/id/A/AN/ANDK/patches/ANDK/patches/YAML-LibYAML-0.34-ANDK-02.patch.gz

My observation​:


  perl -le 'my $x = "\x{100}"; chop $x; print qr/$x/'
  (?^u​:)

  This doesn't look right to me that an empty string gets the "u" and
  when I try to fix YAML​::XS it gets in the way.

perl -V​:


  Summary of my perl5 (revision 5 version 13 subversion 7) configuration​:
  Commit id​: 7b98bc4
  Platform​:
  osname=linux, osvers=2.6.32-5-amd64, archname=x86_64-linux-ld
  uname='linux k81 2.6.32-5-amd64 #1 smp sat oct 30 14​:18​:21 utc 2010 x86_64 gnulinux '
  config_args='-Dprefix=/home/src/perl/repoperls/installed-perls/perl/v5.13.7-234-g7b98bc4 -Dinstallusrbinperl=n -Uversiononly -Dusedevel -des -Ui_db -Uuseithreads -Duselongdouble'
  hint=recommended, useposix=true, d_sigaction=define
  useithreads=undef, usemultiplicity=undef
  useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
  use64bitint=define, use64bitall=define, uselongdouble=define
  usemymalloc=n, bincompat5005=undef
  Compiler​:
  cc='cc', ccflags ='-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
  optimize='-O2',
  cppflags='-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
  ccversion='', gccversion='4.4.5', gccosandvers=''
  intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
  d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
  ivtype='long', ivsize=8, nvtype='long double', nvsize=16, Off_t='off_t', lseeksize=8
  alignbytes=16, prototype=define
  Linker and Libraries​:
  ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
  libpth=/usr/local/lib /lib /usr/lib /lib64 /usr/lib64
  libs=-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc -lgdbm_compat
  perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
  libc=/lib/libc-2.11.2.so, so=so, useshrplib=false, libperl=libperl.a
  gnulibc_version='2.11.2'
  Dynamic Linking​:
  dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
  cccdlflags='-fPIC', lddlflags='-shared -O2 -L/usr/local/lib -fstack-protector'

  Characteristics of this binary (from libperl)​:
  Compile-time options​: PERL_DONT_CREATE_GVSV PERL_MALLOC_WRAP PERL_USE_DEVEL
  USE_64_BIT_ALL USE_64_BIT_INT USE_LARGE_FILES
  USE_LONG_DOUBLE USE_PERLIO USE_PERL_ATOF
  Built under linux
  Compiled at Dec 4 2010 07​:20​:30
  @​INC​:
  /home/src/perl/repoperls/installed-perls/perl/v5.13.7-234-g7b98bc4/lib/site_perl/5.13.7/x86_64-linux-ld
  /home/src/perl/repoperls/installed-perls/perl/v5.13.7-234-g7b98bc4/lib/site_perl/5.13.7
  /home/src/perl/repoperls/installed-perls/perl/v5.13.7-234-g7b98bc4/lib/5.13.7/x86_64-linux-ld
  /home/src/perl/repoperls/installed-perls/perl/v5.13.7-234-g7b98bc4/lib/5.13.7
  .

--
andreas

p5pRT commented Dec 4, 2010

From @demerphq

On 4 December 2010 09​:11, Andreas J. Koenig via RT
<perlbug-followup@​perl.org> wrote​:

# New Ticket Created by  (Andreas J. Koenig)
# Please include the string​:  [perl #80212]
# in the subject line of all future correspondence about this issue.
# <URL​: http​://rt.perl.org/rt3/Ticket/Display.html?id=80212 >

git bisect​:
-----------
 commit 7b98bc4
 Author​: Karl Williamson <public@​khwilliamson.com>
 Date​:   Tue Nov 30 18​:10​:37 2010 -0700

 regcomp.c​: utf8 pattern defaults to Unicode semantics

CPAN distros affected​:
----------------------

 SPROUT/JE-0.051.tar.gz
 INGY/YAML-LibYAML-0.34.tar.gz

 I have patched YAML-LibYAML before so that only the newly broken tests
 get visible. My two patches are available from CPAN as

 ftp​://cpan.cpantesters.org/CPAN/authors/id/A/AN/ANDK/patches/ANDK/patches/YAML-LibYAML-0.34-ANDK-01.patch.gz
 ftp​://cpan.cpantesters.org/CPAN/authors/id/A/AN/ANDK/patches/ANDK/patches/YAML-LibYAML-0.34-ANDK-02.patch.gz

I can't seem to fetch those.

My observation​:
---------------

 perl -le 'my $x = "\x{100}"; chop $x; print qr/$x/'
 (?^u​:)

 This doesn't look right to me that an empty string gets the "u" and
 when I try to fix YAML​::XS it gets in the way.

Hmm. Edge case. Arguable both ways, but probably should be special
cased as you say.

However, can I ask exactly what is going on here that fails?

You see, it sounds like YAML might be doing something dodgy, or there
could be a bug somewhere in our introspection tools, as there is an
interface for interacting with qr// objects, and has been since 5.10.

If YAML is not using it, then it will have problems as we add
modifiers and things like that, and we should probably think about
fixing that.

If YAML does use it, then it /should/ be robust to these problems.
(Emphasis because it might have been overlooked, which would be good
to know. ;-)

I'm thinking in particular of this:

  use re qw(regexp_pattern);

  my $ref= qr/\x{100}/;
  my ($pat, $mods) = regexp_pattern($ref);
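
As a hedged illustration of what that interface returns (a sketch that is
not part of the original mail; the sample pattern and variable names are
made up here): in list context `re::regexp_pattern` yields the pattern
source and the modifier letters as two separate values, so nothing has to
be parsed out of the `(?^...)` stringification.

```perl
use strict;
use warnings;
use re qw(regexp_pattern);

# Illustrative regexp; any qr// object would do.
my $qr = qr/ab+c/i;

# List context: (pattern source, modifier letters).
my ($pat, $mods) = regexp_pattern($qr);
print "pattern=$pat modifiers=$mods\n";

# A plain string is not a compiled regexp, so list context gives nothing back.
my @none = regexp_pattern("ab+c");
print "plain string gives ", scalar(@none), " values\n";
```

Because the modifiers arrive as their own value, a newly introduced
modifier letter shows up in `$mods` without changing `$pat`.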

Nevertheless, this is fixed in

commit 5b6010b
Author​: Yves Orton <demerphq@​gmail.com>
Date​: Sat Dec 4 15​:26​:38 2010 +0100

  make empty string regexp stringify to the same thing regardless of
unicode flags

thanks for the report.

cheers,
Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented Dec 4, 2010

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented Dec 4, 2010

From @andk

On Sat, 4 Dec 2010 15​:28​:12 +0100, demerphq <demerphq@​gmail.com> said​:

 I have patched YAML-LibYAML before so that only the newly broken tests
 get visible. My two patches are available from CPAN as

 ftp​://cpan.cpantesters.org/CPAN/authors/id/A/AN/ANDK/patches/ANDK/patches/YAML-LibYAML-0.34-ANDK-01.patch.gz
 ftp​://cpan.cpantesters.org/CPAN/authors/id/A/AN/ANDK/patches/ANDK/patches/YAML-LibYAML-0.34-ANDK-02.patch.gz

  > I cant seem to fetch those.

Drats, duplicate copy and paste, hereby corrected.

 ftp​://cpan.cpantesters.org/CPAN/authors/id/A/AN/ANDK/patches/YAML-LibYAML-0.34-ANDK-01.patch.gz
 ftp​://cpan.cpantesters.org/CPAN/authors/id/A/AN/ANDK/patches/YAML-LibYAML-0.34-ANDK-02.patch.gz

My observation​:
---------------

 perl -le 'my $x = "\x{100}"; chop $x; print qr/$x/'
 (?^u​:)

 This doesn't look right to me that an empty string gets the "u" and
 when I try to fix YAML​::XS it gets in the way.

  > Hmm. Edge case. Arguable both ways, but probably should be special
  > cased as you say.

I don't think I said that. I just wanted to present one observation that
shows something problematic. I already regret that I pointed it out,
because it led you to believe that this is the whole story.

  > However, can I ask exactly what is going on here that fails?

The appearance of the "u" switch in qr/.../ is probably not predictable.

So this time I bring you an example that will hopefully stick. This
happens before and after your commit​:

perl -le '
my $x = "x"; print qr/$x/; $x .= "\x{100}"; chop $x; print qr/$x/'
(?^​:x)
(?^u​:x)

I'll leave the rest of your posting uncommented.

--
andreas

p5pRT commented Dec 4, 2010

From @demerphq

On 4 December 2010 21​:15, Andreas J. Koenig
<andreas.koenig.7os6VVqR@​franz.ak.mind.de> wrote​:

On Sat, 4 Dec 2010 15​:28​:12 +0100, demerphq <demerphq@​gmail.com> said​:

 >>  I have patched YAML-LibYAML before so that only the newly broken tests
 >>  get visible. My two patches are available from CPAN as
 >>
 >>  ftp​://cpan.cpantesters.org/CPAN/authors/id/A/AN/ANDK/patches/ANDK/patches/YAML-LibYAML-0.34-ANDK-01.patch.gz
 >>  ftp​://cpan.cpantesters.org/CPAN/authors/id/A/AN/ANDK/patches/ANDK/patches/YAML-LibYAML-0.34-ANDK-02.patch.gz

 > I cant seem to fetch those.

Drats, duplicate copy and paste, hereby corrected.

 ftp​://cpan.cpantesters.org/CPAN/authors/id/A/AN/ANDK/patches/YAML-LibYAML-0.34-ANDK-01.patch.gz
 ftp​://cpan.cpantesters.org/CPAN/authors/id/A/AN/ANDK/patches/YAML-LibYAML-0.34-ANDK-02.patch.gz

 >> My observation​:
 >> ---------------
 >>
 >>  perl -le 'my $x = "\x{100}"; chop $x; print qr/$x/'
 >>  (?^u​:)
 >>
 >>  This doesn't look right to me that an empty string gets the "u" and
 >>  when I try to fix YAML​::XS it gets in the way.

 > Hmm. Edge case. Arguable both ways, but probably should be special
 > cased as you say.

I don't think I said that. I just wanted to present one observation that
shows something problematic. I already regret that I pointed it out,
because it led you to believe that this is the whole story.

 > However, can I ask exactly what is going on here that fails?

The appearance of the "u" switch in qr/.../ is probably not predictable.

So this time I bring you an example that will hopefully stick. This
happens before and after your commit​:

perl -le '
my $x = "x"; print qr/$x/; $x .= "\x{100}"; chop $x; print qr/$x/'
(?^​:x)
(?^u​:x)

This is the expected result.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented Dec 4, 2010

From @demerphq

On 4 December 2010 21​:15, Andreas J. Koenig
<andreas.koenig.7os6VVqR@​franz.ak.mind.de> wrote​:

On Sat, 4 Dec 2010 15​:28​:12 +0100, demerphq <demerphq@​gmail.com> said​:

 >>  I have patched YAML-LibYAML before so that only the newly broken tests
 >>  get visible. My two patches are available from CPAN as
 >>
 >>  ftp​://cpan.cpantesters.org/CPAN/authors/id/A/AN/ANDK/patches/ANDK/patches/YAML-LibYAML-0.34-ANDK-01.patch.gz
 >>  ftp​://cpan.cpantesters.org/CPAN/authors/id/A/AN/ANDK/patches/ANDK/patches/YAML-LibYAML-0.34-ANDK-02.patch.gz

 > I cant seem to fetch those.

Drats, duplicate copy and paste, hereby corrected.

 ftp​://cpan.cpantesters.org/CPAN/authors/id/A/AN/ANDK/patches/YAML-LibYAML-0.34-ANDK-01.patch.gz
 ftp​://cpan.cpantesters.org/CPAN/authors/id/A/AN/ANDK/patches/YAML-LibYAML-0.34-ANDK-02.patch.gz

 >> My observation​:
 >> ---------------
 >>
 >>  perl -le 'my $x = "\x{100}"; chop $x; print qr/$x/'
 >>  (?^u​:)
 >>
 >>  This doesn't look right to me that an empty string gets the "u" and
 >>  when I try to fix YAML​::XS it gets in the way.

 > Hmm. Edge case. Arguable both ways, but probably should be special
 > cased as you say.

I don't think I said that. I just wanted to present one observation that
shows something problematic. I already regret that I pointed it out,
because it led you to believe that this is the whole story.

It seemed to me you did imply it by saying 'that an empty string gets the "u"'.

 > However, can I ask exactly what is going on here that fails?

The appearance of the "u" switch in qr/.../ is probably not predictable.

Umm, actually until my patch it was extremely predictable. Now it is a
little less so. :-)

So this time I bring you an example that will hopefully stick. This
happens before and after your commit​:

perl -le '
my $x = "x"; print qr/$x/; $x .= "\x{100}"; chop $x; print qr/$x/'
(?^​:x)
(?^u​:x)

As I said before this is expected.

The rule is "if the unicode flag is on, or the pattern contains
codepoints larger than 255 then the pattern is unicode".

When you concatenated \x{100} onto $x it was marked as a unicode
string. Simply by chopping the var does not make the flag turn off.

You can see this with Devel​::Peek.
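
A minimal sketch of that behaviour, added for illustration: it uses the
core function `utf8::is_utf8` instead of Devel::Peek so that the flag can
be printed as a plain 0 or 1 (Devel::Peek's `Dump` would show the same
thing as `UTF8` in the FLAGS line). The variable names are assumptions,
and `utf8::downgrade` is shown only as one possible way to clear the flag
again, not as something proposed in this thread.

```perl
use strict;
use warnings;

my $x = "x";
my $flag_before = utf8::is_utf8($x) ? 1 : 0;

$x .= "\x{100}";   # appending a code point above 255 upgrades the string
chop $x;           # removes that character again, but not the flag
my $flag_after = utf8::is_utf8($x) ? 1 : 0;

print "value=$x before=$flag_before after=$flag_after\n";

# A copy with the flag cleared; the characters stay the same.
my $y = $x;
utf8::downgrade($y);
my $flag_down = utf8::is_utf8($y) ? 1 : 0;
print "downgraded copy: flag=$flag_down\n";

# The flagged and unflagged strings can stringify differently as regexps.
print qr/$x/, " ", qr/$y/, "\n";
```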

I'll leave the rest of your posting uncommented.

Well, that's a pity, as it seems to apply directly to some of the changes you made.

my $rx2_ = Load($yaml2);
is ref($rx2_), 'Classy', 'Can Load a blessed regexp';
-is $rx2_, '(?-xism:99999)', 'Loaded blessed regexp value is correct';
-ok "999999999" =~ $rx2_, 'Loaded blessed regexp works';
+$rx2_ = $1 if $rx2_ =~ /^\(\?\^:(.+)\)$/;
+is $rx2_, "(?$_xism\:99999)", 'Loaded blessed regexp value is correct';
+ok "999999999" =~ qr/$rx2_/, 'Loaded blessed regexp works';

Should be rewritten so that you use re​::regexp_pattern on the source,
and then check to see that the result has the same values.

Using is($qr, "string") is inherently not future compatible.
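
One way such a comparison could look, as a rough sketch rather than the
actual YAML::XS test: `same_regexp` is a hypothetical helper, and
`$loaded` merely stands in for whatever a dump-and-load round trip of
`$orig` would return. The check compares what `re::regexp_pattern`
reports for both objects and never looks at the stringified form.

```perl
use strict;
use warnings;
use re qw(regexp_pattern);

# Hypothetical helper: true when both arguments are compiled regexps with
# the same pattern source and the same set of modifier letters.
sub same_regexp {
    my ($left, $right) = @_;
    my @l = regexp_pattern($left);
    my @r = regexp_pattern($right);
    return 0 unless @l == 2 && @r == 2;   # one of them is not a regexp

    # Compare modifiers as a set, so letter order does not matter.
    my @lmods = sort split //, $l[1];
    my @rmods = sort split //, $r[1];
    return 0 unless $l[0] eq $r[0];
    return join('', @lmods) eq join('', @rmods) ? 1 : 0;
}

my $orig   = qr/99999/;
my $loaded = qr/99999/;   # stand-in for the result of a round trip

print same_regexp($orig, $loaded) ? "same\n" : "different\n";
print "behaviour ok\n" if "999999999" =~ $loaded;
```

A behavioural assertion like the last line stays valid even when the
stringification of qr// objects changes.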

What can we do to make this kind of thing easier so that you don't have
to patch this stuff *again* if we add even more modifiers?

cheers,
Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented Dec 5, 2010

From @andk

On Sat, 4 Dec 2010 21​:34​:57 +0100, demerphq <demerphq@​gmail.com> said​:

I don't think I said that. I just wanted to present one observation that
shows something problematic. I already regret that I pointed it out,
because it led you to believe that this is the whole story.

  > It seemed to me did imply it by saying 'that an empty string gets the "u"'.

Not at all.

 > However, can I ask exactly what is going on here that fails?

The appearance of the "u" switch in qr/.../ is probably not predictable.

  > Umm, actually until my patch it was extremely predictable. Now it is a
  > little less so. :-)

You should probably revert it.

So this time I bring you an example that will hopefully stick. This
happens before and after your commit​:

perl -le '
my $x = "x"; print qr/$x/; $x .= "\x{100}"; chop $x; print qr/$x/'
(?^​:x)
(?^u​:x)

  > As I said before this is expected.

  > The rule is "if the unicode flag is on, or the pattern contains
  > codepoints larger than 255 then the pattern is unicode".

Before the "Unicode Bug" was fixed it has been said that the unicode
flag is an internal representation thingy. Has this changed? If an
internal representation leaks out what is it good for? Maybe this should
be explained in the manpage for qr//? Part of fixing "The Unicode
Bug" was, imho, to overcome this heritage, but I may be wrong. I'd be
interested in a statement about where we stand now wrt. the Unicode
flag.

  > When you concatenated \x{100} onto $x it was marked as a unicode
  > string. Simply by chopping the var does not make the flag turn off.

  > You can see this with Devel​::Peek.

If this is not a bug, is it documented? Since your patch did not change
documentation, I suppose it isn't.

I'll leave the rest of your posting uncommented.

  > Well thats a pity as it seems to apply directly to some of the changes you did.

  > my $rx2_ = Load($yaml2);
  > is ref($rx2_), 'Classy', 'Can Load a blessed regexp';
  > -is $rx2_, '(?-xism​:99999)', 'Loaded blessed regexp value is correct';
  > -ok "999999999" =~ $rx2_, 'Loaded blessed regexp works';
  > +$rx2_ = $1 if $rx2_ =~ /^\(\?\^​:(.+)\)$/;
  > +is $rx2_, "(?$_xism\​:99999)", 'Loaded blessed regexp value is correct';
  > +ok "999999999" =~ qr/$rx2_/, 'Loaded blessed regexp works';

  > Should be rewritten so that you use re​::regexp_pattern on the source,
  > and then check to see that the result has the same values.

That would be a very different test. The current test is asserting a
stringification that is inherent in a string, not inherent in a
programming language. Remember, YAML as a marshalling tool should be
programming-language independent.

I cannot speak for the author of the test but I believe he doesn't mind
to rewrite the test from time to time as long as perl documents what it
changes.

  > Using is($qr, "string") is inherently not future compatible.

This seems to have been accepted by the author of the test. How can we
make it "present tense" compatible?

  > What can we do to make this kind of thing easier so that you dont have
  > to patch this stuff *again* if we add even more modifiers?

For one moment, please don't worry about the future of the test, just
help fixing it for now, without a suggestion like resorting to perl to
inspect internal flags. I would not mind if perl would suggest the
existence of the "u" flag in a string is random internal musing, it
would only surprise me a little.

--
andreas

p5pRT commented Dec 5, 2010

From @demerphq

On 5 December 2010 05​:15, Andreas J. Koenig
<andreas.koenig.7os6VVqR@​franz.ak.mind.de> wrote​:

On Sat, 4 Dec 2010 21​:34​:57 +0100, demerphq <demerphq@​gmail.com> said​:

 >> I don't think I said that. I just wanted to present one observation that
 >> has something problematic. I already regret that I did point it out
 >> because it lead you to belief that this is the whole story.

 > It seemed to me did imply it by saying 'that an empty string gets the "u"'.

Not at all.

 >>  > However, can I ask exactly what is going on here that fails?
 >>
 >> The appearance of the "u" switch in qr/.../ is probably not predictable.

 > Umm, actually until my patch it was extremely predictable. Now it is a
 > little less so. :-)

You should probably revert it.

For now, I will not, as it has benefit on its own. If I could easily
do so, I would guarantee that the empty string pattern is represented
as (?:) only.

 >> So this time I bring you an example that will hopefully stick. This
 >> happens before and after your commit​:
 >>
 >> perl -le '
 >> my $x = "x"; print qr/$x/; $x .= "\x{100}"; chop $x; print qr/$x/'
 >> (?^​:x)
 >> (?^u​:x)

 > As I said before this is expected.

 > The rule is  "if the unicode flag is on, or the pattern contains
 > codepoints larger than 255 then the pattern is unicode".

Before the "Unicode Bug" was fixed it has been said that the unicode
flag is an internal representation thingy. Has this changed? If an
internal representation leaks out what is it good for? Maybe this should
be explained in the manpage for qr//? Part of fixing "The Unicode
Bug" was, imho, to overcome this heritage, but I may be wrong. I'd be
interested in a statement about where we stand now wrt. the Unicode
flag.

 > When you concatenated \x{100} onto $x it was marked as a unicode
 > string. Simply by chopping the var does not make the flag turn off.

 > You can see this with Devel​::Peek.

If this is not a bug, is it documented? Since your patch did not change
documentation, I suppose it isn't.

Hmm. Well. Is it documented that you will *only* see msix in a regexp pattern?

Because it is not true, and hasn't been true since 5.10, when 'p' was
added as well.

 >> I'll leave the rest of you posting uncommented.

 > Well thats a pity as it seems to apply directly to some of the changes you did.

 >  my $rx2_ = Load($yaml2);
 >  is ref($rx2_), 'Classy', 'Can Load a blessed regexp';
 > -is $rx2_, '(?-xism​:99999)', 'Loaded blessed regexp value is correct';
 > -ok "999999999" =~ $rx2_, 'Loaded blessed regexp works';
 > +$rx2_ = $1 if $rx2_ =~ /^\(\?\^​:(.+)\)$/;
 > +is $rx2_, "(?$_xism\​:99999)", 'Loaded blessed regexp value is correct';
 > +ok "999999999" =~ qr/$rx2_/, 'Loaded blessed regexp works';

 > Should be rewritten so that you use re​::regexp_pattern on the source,
 > and then check to see that the result has the same values.

That would be a very different test. The current test is asserting a
stringification that is inherent in a string, not inherent in a
programming language.

A) the syntax of Perl's regular expression engine is part of the Perl language.

B) regexps aren't strings - they are *programs*.

Remember, YAML as a marshalling tool should be
programming-language independent.

Well, by using Perl's stringified regex format they seem to have
established a hard dependency on something that Perl considers to be
internal.

Regexes stringify to their pattern not so people can print them out,
although we recognize this is useful for debugging, but rather so they
can be embedded correctly in other patterns.

IMO if it were truly language independent then it would output something like:

  regex($pattern,$modifiers)

And use re::regexp_pattern to do so. And then each language would be
responsible for handling the pattern and modifiers independently,
including ignoring or munging modifiers it doesn't know.
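
A rough sketch of that idea, not a description of how YAML::XS works:
the function names and the hash layout below are hypothetical, and the
rebuild step assumes that every modifier letter returned is one Perl
accepts inside an inline `(?...:...)` group.

```perl
use strict;
use warnings;
use re qw(regexp_pattern);

# Hypothetical dump step: keep pattern and modifiers as separate fields.
sub regexp_to_data {
    my ($qr) = @_;
    my ($pattern, $modifiers) = regexp_pattern($qr);
    return { pattern => $pattern, modifiers => $modifiers };
}

# Hypothetical load step for Perl: apply the modifiers through an inline
# group around the pattern source, then compile the result.
sub data_to_regexp {
    my ($data) = @_;
    my $source = "(?" . $data->{modifiers} . ":" . $data->{pattern} . ")";
    return qr/$source/;
}

my $data = regexp_to_data(qr/hello/i);
my $copy = data_to_regexp($data);

print "pattern=$data->{pattern} modifiers=$data->{modifiers}\n";
print "copy matches\n" if "HELLO world" =~ $copy;
```

Note that the rebuilt object wraps the original pattern in a group, so
calling `regexp_pattern` on `$copy` reports the wrapped source; the copy
is meant to be compared by behaviour or through the dumped fields.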

You cannot even expect the stringified version of a QR to *look* the
same as the pattern fed in. We reserve the right at the regex engine
level to do any kind of transformations we choose in how the program
is represented in order to minimize bugs and ensure that when the
pattern is embedded it behaves as expected. We are entitled to make
that representation look pretty much however we want.

And in fact this has been true from the first day of qr// objects.

So I go back to saying that the only sane test is to create a qr//,
serialize it, then deserialize it and see if it round trips correctly
in that perl.

Testing it does the right things in other languages is out of scope of
this mailing list.

I cannot speak for the author of the test but I believe he doesn't mind
to rewrite the test from time to time as long as perl documents what it
changes.

 > Using is($qr, "string") is inherently not future compatible.

This seems to have been accepted by the author of the test. How can we
make it "present tense" compatible?

Given the amount of activity related to adding new modifiers to Perl's
regex constructs, basically the only way to be present compatible is to
be future compatible.

 > What can we do to make this kind of thing easier so that you dont have
 > to patch this stuff *again* if we add even more modifiers?

For one moment, please don't worry about the future of the test, just
help fixing it for now,

I don't know that I can help with that. I'm actually trying, and have
been trying since my first reply to your mail, but I don't know that I
can fix the broken assumptions baked into what is going on here.

without a suggestion like resorting to perl to
inspect internal flags. I would not mind if perl would suggest the
existence of the "u" flag in a string is random internal musing, it
would only surprise me a little.

It is not.

Unicode data has to behave differently to nonunicode data in patterns.

For instance unicode 'e' has to match a larger set of characters in a
case independent operation than ascii 'e'. Similar with 'u', in
unicode it matches 'ü' and 'U' amongst others. In ascii it does not.

This subtlety was overlooked in the past and taking a unicode regex
and embedding it would "lose" the unicode behavior. This was
considered a bug.
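
To make the difference concrete, here is a small sketch using a Latin-1
letter pair instead of the letters named above. It assumes a perl recent
enough to accept the /u modifier, and a script that enables neither the
`unicode_strings` feature nor a locale. The variable names are
illustrative only.

```perl
use strict;
use warnings;

# e-acute as a single native character; the string is not UTF-8 flagged.
my $target = "\xE9";

my $native  = qr/\xC9/i;    # E-acute, default rules for a non-UTF-8 pattern
my $unicode = qr/\xC9/iu;   # same pattern with Unicode rules requested

my $native_hit  = $target =~ $native  ? 1 : 0;
my $unicode_hit = $target =~ $unicode ? 1 : 0;

# Embedding interpolates the stringified form, which carries the modifiers
# along, so the inner pattern keeps its Unicode rules.
my $embedded     = qr/\A$unicode\z/;
my $embedded_hit = $target =~ $embedded ? 1 : 0;

print "native=$native_hit unicode=$unicode_hit embedded=$embedded_hit\n";
```

Under the default rules only ASCII letters fold for a non-UTF-8 pattern
and target, so the first match fails while the other two succeed.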

I'll try to follow up on this a bit, but basically, if this test code
is going to use Perl's internal embeddable representation of a qr// in
its dumps, it is going to have issues.

And the behaviour of perl with respect to unicode strings is
longstanding. It would appear that YAML is actually unaware of this,
and treats all strings as equal when they are not in Perl. In their
defense almost no serialization tools fully respect the unicode flag
on perl strings.

Anyway, I still do have one more nit on this that I will discuss with
Karl. Specifically, I don't entirely understand why we set the 'u' flag
when the pattern is case sensitive. It may be that we can stop that
from happening, in which case I would guess some of these tests will
be ok.

But the general points remain​:

A) that a regex will stringify to some particular string should not be
documented, and if I have anything to do with it will not be
documented because if we did we would never be able to change
anything. And we reserve the right to munge the pattern internally in
any way that suits us.
B) a compiled regex pattern is no more a string than a Perl subroutine
compiled using string eval is a string.

Let me put this in different terms. Does YAML think that it can use
Deparse to stringify code blocks into existence? Would you consider it
a Perl bug if they did so and we changed the deparse output?

Because that is basically what you are asserting for regexes. And that
is wrong. Regexes are more like compiled subroutines than they are
like strings.

cheers,
Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented Feb 27, 2011

From @cpansprout

I fixed JE as soon as this ticket was posted, so I’m removing it from
the subject.

p5pRT commented Mar 31, 2011

From @khwilliamson

I never understood a large part of the discussion about this bug. But I
did download and test, and can report on the current status. On the
5.14 frozen blead it fails 10 tests, in one file, regexp.t. (attached)
All of them are from the .t expecting a particular regex stringification
that is no longer met. Some are because of the new '^' default modifier
symbol, and the rest are the result of adding /u to indicate Unicode
semantics, and the regex has Unicode semantics.

I don't know how to proceed with this. The changes to the core are
pretty well entrenched. And we have never made any claim as to the
stability of regex stringification. It really is not a good idea to
test for a particular stringification in a regex. Besides the fact that
it isn't necessarily stable, getting the expected stringification may
not indicate that the underlying regex is correct. Tests for actual
behavior are both more accurate, and less subject to the instability of
representation.

I worked some on this module in September. My suggested fix at that
time doesn't work now if I just plug it in. I haven't played with it
to see why not.
--
--Karl Williamson

p5pRT commented Mar 31, 2011

From @khwilliamson

test.out

p5pRT commented Apr 4, 2011

From @obra

YAML​::LibYAML 0.35 now passes tests on blead. I'm resolving this ticket.

p5pRT commented Apr 4, 2011

@obra - Status changed from 'open' to 'resolved'
