New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Data::Dumper misbehaving on integer hash keys wrt useqq and quotekeys #13376
Comments
From @andkWhile analyzing some surprising results of some of my smokers together The following program generates a table that shows that (1) if useperl=0 and quotekeys=0, then integer hash keys are quoted (2) if useperl=1, then integer hash keys are never quoted, ignoring The documentation about quotekeys contains a sentence that surely needs A defined false value will avoid quoting hash keys when it looks Here is the program: perl -le 'use Data::Dumper; use Test::More; print $Data::Dumper::VERSION; For a reference here the output of 2.145 that came with 5.18.1: 2.145 -- |
From @jkeenanOn Sun Oct 27 13:17:02 2013, andreas.koenig.7os6VVqR@franz.ak.mind.de wrote:
Andreas, The most recent commit to the 'dist/Data-Dumper/' directory I can find is this: commit da4b417 Dumper.xs: Update comment You say you ran this test on "bleadperl's Data::Dumper 2.149". Can I assume that the commit above is what you meant when you said you ran this on "bleadperl's Data::Dumper 2.149"? If that is correct, then anyone looking at this ticket can make use of the diff between those two commits which I am attaching. Thank you very much. |
The RT System itself - Status changed from 'new' to 'open' |
From @jkeenanAlso attaching, for diagnostic purposes, a file version of Andreas' inline program. |
From @jkeenan |
From @andk"James E Keenan via RT" <perlbug-followup@perl.org> writes:
Yes. But please note that I said that the bug is not a regression. Older
Because it is not a regression, the diff should not be overrated. -- |
From @jkeenanOn 11/9/13 2:27 PM, Andreas Koenig wrote:
Would you be able to state what the buggy behaviors are (starting from Thank you very much. |
From @andk"James E Keenan via RT" <perlbug-followup@perl.org> writes:
In my original posting I listed exactly two bugs and I cite the posting (1) if useperl=0 and quotekeys=0, then integer hash keys are quoted A third bug is that the documentation about quotekeys leaves a lot to be |
From @tonycozOn Sat Nov 09 20:17:27 2013, andreas.koenig.7os6VVqR@franz.ak.mind.de wrote:
I expect (1) was introduced with the XS Useqq implementation, and that's where the differences between your sample outputs lie. My initial inclinination would be to make the XS code match the Useperl code, since that's been my preference when making other changes to the original patch (and it reduces BBC reports.) I suspect fixing (2) would result in a lot of BBC reports, but I think correctness may win over convenience here. Tony |
From @tonycozOn Mon Nov 25 16:23:13 2013, tonyc wrote:
I've attached a patch that I believe fixes all of the key quoting problems, both with and without quotekeys. To illustrate, I adapted Andreas's program to: #!perl -l With perl 5.18.1 this produces: 2.145 With the attached patch on blead it produces: 2.150 Tony |
From @tonycoz0001-perl-120384-make-hash-key-quoting-compatible-between.patchFrom 38e9c8c0875d68a415b727901a797051e3335942 Mon Sep 17 00:00:00 2001
From: Tony Cook <tony@develop-help.com>
Date: Tue, 10 Dec 2013 14:14:51 +1100
Subject: [perl #120384] make hash key quoting compatible between perl and XS
In particular:
- if quotekeys is set all hash keys are now quoted, previously the perl
code didn't quote numeric keys
- keys of the form ::foo are now quoted by XS as the perl code always did
- XS code quoted "safe" numbers, while the perl code didn't
- perl code didn't quote strings like "1\x{660}", since \x{660}
matches \d
---
dist/Data-Dumper/Dumper.pm | 18 ++++++--
dist/Data-Dumper/Dumper.xs | 100 ++++++++++++++++++++++++----------------
dist/Data-Dumper/t/dumper.t | 14 +++---
dist/Data-Dumper/t/quotekeys.t | 43 ++++++++++++++++-
4 files changed, 120 insertions(+), 55 deletions(-)
diff --git a/dist/Data-Dumper/Dumper.pm b/dist/Data-Dumper/Dumper.pm
index 5b31d2c..1fa0984 100644
--- a/dist/Data-Dumper/Dumper.pm
+++ b/dist/Data-Dumper/Dumper.pm
@@ -10,7 +10,7 @@
package Data::Dumper;
BEGIN {
- $VERSION = '2.149'; # Don't forget to set version and release
+ $VERSION = '2.150'; # Don't forget to set version and release
} # date in POD below!
#$| = 1;
@@ -450,8 +450,15 @@ sub _dump {
() )
{
my $nk = $s->_dump($k, "");
- $nk = $1
- if !$s->{quotekeys} and $nk =~ /^[\"\']([A-Za-z_]\w*)[\"\']$/;
+
+ # _dump doesn't quote numbers of this form
+ if ($s->{quotekeys} && $nk =~ /^(?:0|-?[1-9][0-9]{0,8})\z/) {
+ $nk = $s->{useqq} ? qq("$nk") : qq('$nk');
+ }
+ elsif (!$s->{quotekeys} and $nk =~ /^[\"\']([A-Za-z_]\w*)[\"\']$/) {
+ $nk = $1
+ }
+
$sname = $mname . '{' . $nk . '}';
$out .= $pad . $ipad . $nk . $pair;
@@ -556,7 +563,8 @@ sub _dump {
and ref $ref eq 'VSTRING' || eval{Scalar::Util::isvstring($val)}) {
$out .= sprintf "%vd", $val;
}
- elsif ($val =~ /^(?:0|-?[1-9]\d{0,8})\z/) { # safe decimal number
+ # \d here would treat "1\x{660}" as a safe decimal number
+ elsif ($val =~ /^(?:0|-?[1-9][0-9]{0,8})\z/) { # safe decimal number
$out .= $val;
}
else { # string
@@ -1400,7 +1408,7 @@ modify it under the same terms as Perl itself.
=head1 VERSION
-Version 2.149 (September 20 2013)
+Version 2.150 (December 10 2013)
=head1 SEE ALSO
diff --git a/dist/Data-Dumper/Dumper.xs b/dist/Data-Dumper/Dumper.xs
index 65d37c6..f1d194e 100644
--- a/dist/Data-Dumper/Dumper.xs
+++ b/dist/Data-Dumper/Dumper.xs
@@ -19,7 +19,9 @@
static I32 num_q (const char *s, STRLEN slen);
static I32 esc_q (char *dest, const char *src, STRLEN slen);
static I32 esc_q_utf8 (pTHX_ SV *sv, const char *src, STRLEN slen, I32 do_utf8, I32 useqq);
-static I32 needs_quote(const char *s, STRLEN len);
+static bool globname_needs_quote(const char *s, STRLEN len);
+static bool key_needs_quote(const char *s, STRLEN len);
+static bool safe_decimal_number(const char *p, STRLEN len);
static SV *sv_x (pTHX_ SV *sv, const char *str, STRLEN len, I32 n);
static I32 DD_dump (pTHX_ SV *val, const char *name, STRLEN namelen, SV *retval,
HV *seenhv, AV *postav, I32 *levelp, I32 indent,
@@ -91,19 +93,19 @@ Perl_utf8_to_uvchr_buf(pTHX_ U8 *s, U8 *send, STRLEN *retlen)
#define DD_is_integer(sv) SvIOK(sv)
#endif
-/* does a string need to be protected? */
-static I32
-needs_quote(const char *s, STRLEN len)
+/* does a glob name need to be protected? */
+static bool
+globname_needs_quote(const char *s, STRLEN len)
{
const char *send = s+len;
TOP:
if (s[0] == ':') {
if (++s<send) {
if (*s++ != ':')
- return 1;
+ return TRUE;
}
else
- return 1;
+ return TRUE;
}
if (isIDFIRST(*s)) {
while (++s<send)
@@ -111,12 +113,35 @@ TOP:
if (*s == ':')
goto TOP;
else
- return 1;
+ return TRUE;
}
}
else
- return 1;
- return 0;
+ return TRUE;
+
+ return FALSE;
+}
+
+/* does a hash key need to be quoted (to the left of => ).
+ Previously this used (globname_)needs_quote() which accepted strings
+ like '::foo', but these aren't safe as unquoted keys under strict.
+*/
+static bool
+key_needs_quote(const char *s, STRLEN len) {
+ const char *send = s+len;
+
+ if (safe_decimal_number(s, len)) {
+ return FALSE;
+ }
+ else if (isIDFIRST(*s)) {
+ while (++s<send)
+ if (!isWORDCHAR(*s))
+ return TRUE;
+ }
+ else
+ return TRUE;
+
+ return FALSE;
}
/* Check that the SV can be represented as a simple decimal integer.
@@ -124,10 +149,7 @@ TOP:
* The perl code does this by matching against /^(?:0|-?[1-9]\d{0,8})\z/
*/
static bool
-safe_decimal_number(pTHX_ SV *val) {
- STRLEN len;
- const char *p = SvPV(val, len);
-
+safe_decimal_number(const char *p, STRLEN len) {
if (len == 1 && *p == '0')
return TRUE;
@@ -847,28 +869,24 @@ DD_dump(pTHX_ SV *val, const char *name, STRLEN namelen, SV *retval, HV *seenhv,
sv_catsv(retval, totpad);
sv_catsv(retval, ipad);
- /* old logic was first to check utf8 flag, and if utf8 always
+ /* The (very)
+ old logic was first to check utf8 flag, and if utf8 always
call esc_q_utf8. This caused test to break under -Mutf8,
because there even strings like 'c' have utf8 flag on.
Hence with quotekeys == 0 the XS code would still '' quote
them based on flags, whereas the perl code would not,
based on regexps.
- The perl code is correct.
- needs_quote() decides that anything that isn't a valid
- perl identifier needs to be quoted, hence only correctly
- formed strings with no characters outside [A-Za-z0-9_:]
- won't need quoting. None of those characters are used in
- the byte encoding of utf8, so anything with utf8
- encoded characters in will need quoting. Hence strings
- with utf8 encoded characters in will end up inside do_utf8
- just like before, but now strings with utf8 flag set but
- only ascii characters will end up in the unquoted section.
-
- There should also be less tests for the (probably currently)
- more common doesn't need quoting case.
- The code is also smaller (22044 vs 22260) because I've been
- able to pull the common logic out to both sides. */
- if (quotekeys || needs_quote(key,keylen)) {
+
+ The old logic checked that the string was a valid
+ perl glob name (foo::bar), which isn't safe under
+ strict, and differs from the perl code which only
+ accepts simple identifiers.
+
+ With the fix for [perl #120384] I chose to make
+ their handling of key quoting compatible between XS
+ and perl.
+ */
+ if (quotekeys || key_needs_quote(key,keylen)) {
if (do_utf8 || useqq) {
STRLEN ocur = SvCUR(retval);
nlen = esc_q_utf8(aTHX_ retval, key, klen, do_utf8, useqq);
@@ -1058,7 +1076,7 @@ DD_dump(pTHX_ SV *val, const char *name, STRLEN namelen, SV *retval, HV *seenhv,
#endif
i = 0; else i -= 4;
}
- if (needs_quote(c,i)) {
+ if (globname_needs_quote(c,i)) {
#ifdef GvNAMEUTF8
if (GvNAMEUTF8(val)) {
sv_grow(retval, SvCUR(retval)+2);
@@ -1152,18 +1170,18 @@ DD_dump(pTHX_ SV *val, const char *name, STRLEN namelen, SV *retval, HV *seenhv,
}
#endif
- /* the pure perl and XS non-qq outputs have historically been
- * different in this case, but for useqq, let's try to match
- * the pure perl code.
- * see [perl #74798]
- */
- else if (useqq && safe_decimal_number(aTHX_ val)) {
- sv_catsv(retval, val);
- }
else {
integer_came_from_string:
- c = SvPV(val, i);
- if (DO_UTF8(val) || useqq)
+ c = SvPV(val, i);
+ /* the pure perl and XS non-qq outputs have historically been
+ * different in this case, but for useqq, let's try to match
+ * the pure perl code.
+ * see [perl #74798]
+ */
+ if (useqq && safe_decimal_number(c, i)) {
+ sv_catsv(retval, val);
+ }
+ else if (DO_UTF8(val) || useqq)
i += esc_q_utf8(aTHX_ retval, c, i, DO_UTF8(val), useqq);
else {
sv_grow(retval, SvCUR(retval)+3+2*i); /* 3: ""\0 */
diff --git a/dist/Data-Dumper/t/dumper.t b/dist/Data-Dumper/t/dumper.t
index dbc6d5e..d2af53c 100644
--- a/dist/Data-Dumper/t/dumper.t
+++ b/dist/Data-Dumper/t/dumper.t
@@ -899,11 +899,8 @@ TEST q(Data::Dumper->new([$a])->Dumpxs;)
#};
EOT
-# perl code does keys and values as numbers if possible
-TEST q(Data::Dumper->new([$c])->Dump;);
-# XS code always does them as strings
-$WANT =~ s/ (\d+)/ '$1'/gs;
-TEST q(Data::Dumper->new([$c])->Dumpxs;)
+TEST q(Data::Dumper->new([$c])->Dump;), "sortkeys sub";
+TEST q(Data::Dumper->new([$c])->Dumpxs;), "sort keys sub (XS)"
if $XS;
}
@@ -949,9 +946,10 @@ TEST q(Data::Dumper->new([$c])->Dumpxs;)
#];
EOT
-TEST q(Data::Dumper->new([[$c, $d]])->Dump;);
-$WANT =~ s/ (\d+)/ '$1'/gs;
-TEST q(Data::Dumper->new([[$c, $d]])->Dumpxs;)
+TEST q(Data::Dumper->new([[$c, $d]])->Dump;), "more sortkeys sub";
+# the XS code does number values as strings
+$WANT =~ s/ (\d+)(,?)$/ '$1'$2/gm;
+TEST q(Data::Dumper->new([[$c, $d]])->Dumpxs;), "more sortkeys sub (XS)"
if $XS;
}
diff --git a/dist/Data-Dumper/t/quotekeys.t b/dist/Data-Dumper/t/quotekeys.t
index c633d56..a858828 100644
--- a/dist/Data-Dumper/t/quotekeys.t
+++ b/dist/Data-Dumper/t/quotekeys.t
@@ -15,7 +15,7 @@ BEGIN {
use strict;
use Data::Dumper;
-use Test::More tests => 10;
+use Test::More tests => 18;
use lib qw( ./t/lib );
use Testing qw( _dumptostr );
@@ -90,5 +90,46 @@ sub run_tests_for_quotekeys {
isnt($dumps{'ddqkzero'}, $dumps{'objqkundef'},
"\$Data::Dumper::Quotekeys = undef and = 0 are equivalent");
%dumps = ();
+
+ local $Data::Dumper::Quotekeys = 1;
+ local $Data::Dumper::Sortkeys = 1;
+ local $Data::Dumper::Indent = 0;
+ local $Data::Dumper::Useqq = 0;
+
+ my %qkdata =
+ (
+ 0 => 1,
+ '012345' => 1,
+ 12 => 1,
+ 123456789 => 1,
+ 1234567890 => 1,
+ '::de::fg' => 1,
+ ab => 1,
+ 'hi::12' => 1,
+ "1\x{660}" => 1,
+ );
+
+ is(Dumper(\%qkdata),
+ q($VAR1 = {'0' => 1,'012345' => 1,'12' => 1,'123456789' => 1,'1234567890' => 1,"1\x{660}" => 1,'::de::fg' => 1,'ab' => 1,'hi::12' => 1};),
+ "always quote when quotekeys true");
+
+ {
+ local $Data::Dumper::Useqq = 1;
+ is(Dumper(\%qkdata),
+ q($VAR1 = {"0" => 1,"012345" => 1,"12" => 1,"123456789" => 1,"1234567890" => 1,"1\x{660}" => 1,"::de::fg" => 1,"ab" => 1,"hi::12" => 1};),
+ "always quote when quotekeys true (useqq)");
+ }
+
+ local $Data::Dumper::Quotekeys = 0;
+
+ is(Dumper(\%qkdata),
+ q($VAR1 = {0 => 1,'012345' => 1,12 => 1,123456789 => 1,'1234567890' => 1,"1\x{660}" => 1,'::de::fg' => 1,ab => 1,'hi::12' => 1};),
+ "avoid quotes when quotekeys false");
+ {
+ local $Data::Dumper::Useqq = 1;
+ is(Dumper(\%qkdata),
+ q($VAR1 = {0 => 1,"012345" => 1,12 => 1,123456789 => 1,"1234567890" => 1,"1\x{660}" => 1,"::de::fg" => 1,ab => 1,"hi::12" => 1};),
+ "avoid quotes when quotekeys false (useqq)");
+ }
}
--
1.7.10.4
|
From @tonycozOn Mon Dec 09 19:22:12 2013, tonyc wrote:
Applied as 5b50ddc. Leaving this ticket open for the expected BBC reports. Tony |
From @tonycozOn Tue Dec 17 20:50:43 2013, tonyc wrote:
No BBC reports, closing. Tony |
@tonycoz - Status changed from 'open' to 'resolved' |
Migrated from rt.perl.org#120384 (status was 'resolved')
Searchable as RT120384$
The text was updated successfully, but these errors were encountered: