Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Warning on duplicate key in literal hash definition #14736

Open
p5pRT opened this issue Jun 5, 2015 · 47 comments
Open

Warning on duplicate key in literal hash definition #14736

p5pRT opened this issue Jun 5, 2015 · 47 comments

Comments

@p5pRT
Copy link

p5pRT commented Jun 5, 2015

Migrated from rt.perl.org#125332 (status was 'open')

Searchable as RT125332$

@p5pRT
Copy link
Author

p5pRT commented Jun 5, 2015

From @epa

Created by @epa

  my %h = (a => 1, a => 2);

The first value for key 'a' will immediately be overwritten by the
second. Having them both does nothing useful and almost always
indicates a typo. I suggest perl could warn in the case where

- a hash is initialized with a list given directly in the program text,

- two of the keys of the hash (that is, the even-numbered elements of the list)
  are literal strings in the program text and are the same string.

This bug report is not suggesting a warning on code like

  my %h = (a => 1); my %g = (%h, a => 2);

but only on the case of a single literal hash initializer which
contains the same string twice.

(I suspect this warning would in fact find quite a lot of
cut-and-paste bugs in existing code.)

Perl Info

Flags:
    category=core
    severity=wishlist

Site configuration information for perl 5.10.1:

Configured by Debian Project at Wed Mar  6 15:31:28 UTC 2013.

Summary of my perl5 (revision 5 version 10 subversion 1) configuration:
   
  Platform:
    osname=linux, osvers=3.2.0-4-amd64, archname=x86_64-linux-gnu-thread-multi
    uname='linux madeleine 3.2.0-4-amd64 #1 smp debian 3.2.39-2 x86_64 gnulinux '
    config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -Dcccdlflags=-fPIC -Darchname=x86_64-linux-gnu -Dprefix=/usr -Dprivlib=/usr/share/perl/5.10 -Darchlib=/usr/lib/perl/5.10 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.10.1 -Dsitearch=/usr/local/lib/perl/5.10.1 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Ud_ualarm -Uusesfio -Uusenm -DDEBUGGING=-g -Doptimize=-O2 -Duseshrplib -Dlibperl=libperl.so.5.10.1 -Dd_dosuid -des'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2 -g',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
    ccversion='', gccversion='4.4.5', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib /lib64 /usr/lib64
    libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
    perllibs=-ldl -lm -lpthread -lc -lcrypt
    libc=/lib/libc-2.11.3.so, so=so, useshrplib=true, libperl=libperl.so.5.10.1
    gnulibc_version='2.11.3'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -O2 -g -L/usr/local/lib -fstack-protector'

Locally applied patches:
    DEBPKG:debian/arm_thread_stress_timeout - http://bugs.debian.org/501970 Raise the timeout of ext/threads/shared/t/stress.t to accommodate slower build hosts
    DEBPKG:debian/cpan_config_path - Set location of CPAN::Config to /etc/perl as /usr may not be writable.
    DEBPKG:debian/cpan_definstalldirs - Provide a sensible INSTALLDIRS default for modules installed from CPAN.
    DEBPKG:debian/db_file_ver - http://bugs.debian.org/340047 Remove overly restrictive DB_File version check.
    DEBPKG:debian/doc_info - Replace generic man(1) instructions with Debian-specific information.
    DEBPKG:debian/enc2xs_inc - http://bugs.debian.org/290336 Tweak enc2xs to follow symlinks and ignore missing @INC directories.
    DEBPKG:debian/errno_ver - http://bugs.debian.org/343351 Remove Errno version check due to upgrade problems with long-running processes.
    DEBPKG:debian/extutils_hacks - Various debian-specific ExtUtils changes
    DEBPKG:debian/fakeroot - Postpone LD_LIBRARY_PATH evaluation to the binary targets.
    DEBPKG:debian/instmodsh_doc - Debian policy doesn't install .packlist files for core or vendor.
    DEBPKG:debian/ld_run_path - Remove standard libs from LD_RUN_PATH as per Debian policy.
    DEBPKG:debian/libnet_config_path - Set location of libnet.cfg to /etc/perl/Net as /usr may not be writable.
    DEBPKG:debian/m68k_thread_stress - http://bugs.debian.org/495826 Disable some threads tests on m68k for now due to missing TLS.
    DEBPKG:debian/mod_paths - Tweak @INC ordering for Debian
    DEBPKG:debian/module_build_man_extensions - http://bugs.debian.org/479460 Adjust Module::Build manual page extensions for the Debian Perl policy
    DEBPKG:debian/perl_synopsis - http://bugs.debian.org/278323 Rearrange perl.pod
    DEBPKG:debian/prune_libs - http://bugs.debian.org/128355 Prune the list of libraries wanted to what we actually need.
    DEBPKG:debian/use_gdbm - Explicitly link against -lgdbm_compat in ODBM_File/NDBM_File. 
    DEBPKG:fixes/assorted_docs - http://bugs.debian.org/443733 [384f06a] Math::BigInt::CalcEmu documentation grammar fix
    DEBPKG:fixes/net_smtp_docs - http://bugs.debian.org/100195 [rt.cpan.org #36038] Document the Net::SMTP 'Port' option
    DEBPKG:fixes/processPL - http://bugs.debian.org/357264 [rt.cpan.org #17224] Always use PERLRUNINST when building perl modules.
    DEBPKG:debian/perlivp - http://bugs.debian.org/510895 Make perlivp skip include directories in /usr/local
    DEBPKG:fixes/pod2man-index-backslash - http://bugs.debian.org/521256 Escape backslashes in .IX entries
    DEBPKG:debian/disable-zlib-bundling - Disable zlib bundling in Compress::Raw::Zlib
    DEBPKG:fixes/kfreebsd_cppsymbols - http://bugs.debian.org/533098 [3b910a0] Add gcc predefined macros to $Config{cppsymbols} on GNU/kFreeBSD.
    DEBPKG:debian/cpanplus_definstalldirs - http://bugs.debian.org/533707 Configure CPANPLUS to use the site directories by default.
    DEBPKG:debian/cpanplus_config_path - Save local versions of CPANPLUS::Config::System into /etc/perl.
    DEBPKG:fixes/kfreebsd-filecopy-pipes - http://bugs.debian.org/537555 [16f708c] Fix File::Copy::copy with pipes on GNU/kFreeBSD
    DEBPKG:fixes/anon-tmpfile-dir - http://bugs.debian.org/528544 [perl #66452] Honor TMPDIR when open()ing an anonymous temporary file
    DEBPKG:fixes/abstract-sockets - http://bugs.debian.org/329291 [89904c0] Add support for Abstract namespace sockets.
    DEBPKG:fixes/hurd_cppsymbols - http://bugs.debian.org/544307 [eeb92b7] Add gcc predefined macros to $Config{cppsymbols} on GNU/Hurd.
    DEBPKG:fixes/autodie-flock - http://bugs.debian.org/543731 Allow for flock returning EAGAIN instead of EWOULDBLOCK on linux/parisc
    DEBPKG:fixes/archive-tar-instance-error - http://bugs.debian.org/539355 [rt.cpan.org #48879] Separate Archive::Tar instance error strings from each other
    DEBPKG:fixes/positive-gpos - http://bugs.debian.org/545234 [perl #69056] [c584a96] Fix \\G crash on first match
    DEBPKG:debian/devel-ppport-ia64-optim - http://bugs.debian.org/548943 Work around an ICE on ia64
    DEBPKG:fixes/trie-logic-match - http://bugs.debian.org/552291 [perl #69973] [0abd0d7] Fix a DoS in Unicode processing [CVE-2009-3626]
    DEBPKG:fixes/hppa-thread-eagain - http://bugs.debian.org/554218 make the threads-shared test suite more robust, fixing failures on hppa
    DEBPKG:fixes/crash-on-undefined-destroy - http://bugs.debian.org/564074 [perl #71952] [1f15e67] Fix a NULL pointer dereference when looking for a DESTROY method
    DEBPKG:fixes/tainted-errno - http://bugs.debian.org/574129 [perl #61976] [be1cf43] fix an errno stringification bug in taint mode
    DEBPKG:fixes/safe-upgrade - http://bugs.debian.org/582978 Upgrade Safe.pm to 2.25, fixing CVE-2010-1974
    DEBPKG:fixes/tell-crash - http://bugs.debian.org/578577 [f4817f3] Fix a tell() crash on bad arguments.
    DEBPKG:fixes/format-write-crash - http://bugs.debian.org/579537 [perl #22977] [421f30e] Fix a crash in format/write
    DEBPKG:fixes/arm-alignment - http://bugs.debian.org/289884 [f1c7503] Prevent gcc from optimizing the alignment test away on armel
    DEBPKG:fixes/fcgi-test - Fix a failure in CGI/t/fast.t when FCGI is installed
    DEBPKG:fixes/hurd-ccflags - http://bugs.debian.org/587901 Make hints/gnu.sh append to $ccflags rather than overriding them
    DEBPKG:debian/squelch-locale-warnings - http://bugs.debian.org/508764 Squelch locale warnings in Debian package maintainer scripts
    DEBPKG:fixes/lc-numeric-docs - http://bugs.debian.org/379329 [perl #78452] [903eb63] LC_NUMERIC documentation fixes
    DEBPKG:fixes/lc-numeric-sprintf - http://bugs.debian.org/601549 [perl #78632] [b3fd614] Fix sprintf not to ignore LC_NUMERIC with constants
    DEBPKG:fixes/concat-stack-corruption - http://bugs.debian.org/596105 [perl #78674] [e3393f5] Fix stack pointer corruption in pp_concat() with 'use encoding'
    DEBPKG:fixes/cgi-multiline-header - http://bugs.debian.org/606995 [CVE-2010-2761 CVE-2010-4410 CVE-2010-4411] CGI.pm MIME boundary and multiline header vulnerabilities
    DEBPKG:fixes/casing-taint-cve-2011-1487 - http://bugs.debian.org/622817 [perl #87336] fix unwanted taint laundering in lc(), uc() et al.
    DEBPKG:fixes/safe-reval-rdo-cve-2010-1447 - [PATCH] Wrap by default coderefs returned by rdo and reval
    DEBPKG:fixes/encode-heap-overflow - [PATCH] Fix decode_xs n-byte heap-overflow security bug in
    DEBPKG:fixes/digest_eval_hole - Close the eval \"require $module\" security hole in
    DEBPKG:fixes/unregister_signal_handler - [PATCH] main: Unregister signal handler before destroying my_perl
    DEBPKG:fixes/CVE-2012-5195 - avoid calling memset with a negative count
    DEBPKG:fixes/CVE-2012-5526 - [PATCH 1/4] CR escaping for P3P header
    DEBPKG:fixes/storable-security-warning - [PATCH] add a note about security concerns in Storable
    DEBPKG:fixes/maketext-code-execution - Fix misparsing of maketext strings.
    DEBPKG:fixes/CVE-2013-1667 - [PATCH] Prevent premature hsplit() calls, and only trigger REHASH
    DEBPKG:patchlevel - http://bugs.debian.org/567489 List packaged patches for 5.10.1-17squeeze5 in patchlevel.h


@INC for perl 5.10.1:
    /home/ed/lib/perl/5.10.1
    /home/ed/share/perl/5.10.1
    /etc/perl
    /usr/local/lib/perl/5.10.1
    /usr/local/share/perl/5.10.1
    /usr/lib/perl5
    /usr/share/perl5
    /usr/lib/perl/5.10
    /usr/share/perl/5.10
    /usr/local/lib/site_perl
    /usr/local/lib/perl/5.10.0
    /usr/local/share/perl/5.10.0
    .


Environment for perl 5.10.1:
    HOME=/home/ed
    LANG=en_GB.UTF-8
    LANGUAGE (unset)
    LC_ALL=en_GB.UTF-8
    LD_LIBRARY_PATH=/home/ed/lib:/usr/local/lib
    LOGDIR (unset)
    PATH=/home/ed/bin:/bin:/usr/bin:/usr/local/bin:/usr/bin/X11:/usr/games:/sbin:/usr/sbin
    PERL5LIB=/home/ed/lib/perl/5.10.1:/home/ed/share/perl/5.10.1
    PERL_BADLANG (unset)
    SHELL=/bin/bash

@p5pRT
Copy link
Author

p5pRT commented Jun 5, 2015

From zefram@fysh.org

Ed Avis wrote​:

- two of the keys of the hash (that is, the even-numbered elements of the list)

That's difficult to detect, due to list context.

  my %h = (a => foo(), a => bar());

looks like it has clashing keys, but it may not if foo() returns an
even-length list.

-zefram

@p5pRT
Copy link
Author

p5pRT commented Jun 5, 2015

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Jun 5, 2015

From @epa

Zefram <zefram <at> fysh.org> writes​:

my %h = \(a => foo\(\)\, a => bar\(\)\);

Yes, I agree that this case would have to be excluded from the check.
Only hash initializers where the number of elements is known at compile
time would be statically checked in this way.

(Sidetrack​: it might have been a good idea for => to impose scalar context
on its RHS, but probably too late for that now.)

--
Ed Avis <eda@​waniasset.com>

@p5pRT
Copy link
Author

p5pRT commented Jun 5, 2015

From @leonerd

On Fri, 5 Jun 2015 14​:00​:20 +0000 (UTC)
Ed Avis <eda@​waniasset.com> wrote​:

(Sidetrack​: it might have been a good idea for => to impose scalar
context on its RHS, but probably too late for that now.)

Sidetrack^2​: I sometimes find it useful that => *doesn't* apply scalar
context to RHS, because I do things like

  Future->fail( "HTTP GET failed", http => $request, $response );

That said, if we're going to wish for alternate histories, then maybe

  BARE => SCALAR
  BARE ==> LIST

might be the spelling?

--
Paul "LeoNerd" Evans

leonerd@​leonerd.org.uk
http​://www.leonerd.org.uk/ | https://metacpan.org/author/PEVANS

@p5pRT
Copy link
Author

p5pRT commented Jun 5, 2015

From zefram@fysh.org

Paul "LeoNerd" Evans wrote​:

Sidetrack^2​: I sometimes find it useful that => *doesn't* apply scalar
context to RHS, because I do things like

Future->fail( "HTTP GET failed", http => $request, $response );

Would make no difference in this case, because the RHS, $request, behaves
the same regardless of context. Ed was clearly not proposing for =>
to alter the syntactic list structure.

-zefram

@p5pRT
Copy link
Author

p5pRT commented Jun 5, 2015

From @Tux

On Fri, 5 Jun 2015 17​:11​:26 +0100, "Paul \"LeoNerd\" Evans"
<leonerd@​leonerd.org.uk> wrote​:

On Fri, 5 Jun 2015 14​:00​:20 +0000 (UTC)
Ed Avis <eda@​waniasset.com> wrote​:

(Sidetrack​: it might have been a good idea for => to impose scalar
context on its RHS, but probably too late for that now.)

Sidetrack^2​: I sometimes find it useful that => *doesn't* apply scalar
context to RHS, because I do things like

s/sometimes/very often/

my $string = join "," => $a, $b, @​foo, func (), "TAIL";

Future->fail ("HTTP GET failed", http => $request, $response);

That said, if we're going to wish for alternate histories, then maybe

BARE => SCALAR
BARE ==> LIST

might be the spelling?

NOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO!

--
H.Merijn Brand http​://tux.nl Perl Monger http​://amsterdam.pm.org/
using perl5.00307 .. 5.21 porting perl5 on HP-UX, AIX, and openSUSE
http​://mirrors.develooper.com/hpux/ http​://www.test-smoke.org/
http​://qa.perl.org http​://www.goldmark.org/jeff/stupid-disclaimers/

@p5pRT
Copy link
Author

p5pRT commented Jun 5, 2015

From @leonerd

On Fri, 5 Jun 2015 17​:23​:36 +0100
Zefram <zefram@​fysh.org> wrote​:

Paul "LeoNerd" Evans wrote​:

Sidetrack^2​: I sometimes find it useful that => *doesn't* apply
scalar context to RHS, because I do things like

Future->fail( "HTTP GET failed", http => $request, $response );

Would make no difference in this case, because the RHS, $request,
behaves the same regardless of context. Ed was clearly not proposing
for => to alter the syntactic list structure.

Huh. Ohyes, of course.

In that case, ignore me :)

--
Paul "LeoNerd" Evans

leonerd@​leonerd.org.uk
http​://www.leonerd.org.uk/ | https://metacpan.org/author/PEVANS

@p5pRT
Copy link
Author

p5pRT commented Jun 5, 2015

From @leonerd

On Fri, 5 Jun 2015 18​:32​:36 +0200
"H.Merijn Brand" <h.m.brand@​xs4all.nl> wrote​:

That said, if we're going to wish for alternate histories, then
maybe

BARE => SCALAR
BARE ==> LIST

might be the spelling?

NOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO!

What; you wanted to reserve ==> for the pipeline operator?

  @​items ==> grep { ... } ==> map { ... } ==> my @​result;

perl6-style?

--
Paul "LeoNerd" Evans

leonerd@​leonerd.org.uk
http​://www.leonerd.org.uk/ | https://metacpan.org/author/PEVANS

@p5pRT
Copy link
Author

p5pRT commented Jun 5, 2015

From @Tux

On Fri, 5 Jun 2015 17​:46​:13 +0100, "Paul \"LeoNerd\" Evans"
<leonerd@​leonerd.org.uk> wrote​:

On Fri, 5 Jun 2015 18​:32​:36 +0200
"H.Merijn Brand" <h.m.brand@​xs4all.nl> wrote​:

That said, if we're going to wish for alternate histories, then
maybe

BARE => SCALAR
BARE ==> LIST

might be the spelling?

NOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO!

What; you wanted to reserve ==> for the pipeline operator?

Joke, right?

I do not want => to enforce scalar on the right hand side
what ==> might be used for in the future is open to discussion

@​items ==> grep { ... } ==> map { ... } ==> my @​result;

perl6-style?

I'm also "doing" perl6 on an almost daily basis now, but perl5 still
wins for me. And the rumor goes that *perl5* has weird syntax
experience and not everything is consistent :)

--
H.Merijn Brand http​://tux.nl Perl Monger http​://amsterdam.pm.org/
using perl5.00307 .. 5.21 porting perl5 on HP-UX, AIX, and openSUSE
http​://mirrors.develooper.com/hpux/ http​://www.test-smoke.org/
http​://qa.perl.org http​://www.goldmark.org/jeff/stupid-disclaimers/

@p5pRT
Copy link
Author

p5pRT commented Jun 6, 2015

From moregan@stresscafe.com

my %h = (a => 1, a => 2);

The first value for key 'a' will immediately be overwritten by the
second. Having them both does nothing useful and almost always
indicates a typo.

A lot of work has been done on this by Kevin Ryde in Perl​::Critic​::Policy​::ValuesAndExpressions​::ProhibitDuplicateHashKeys : http​://search.cpan.org/~kryde/Perl-Critic-Pulp-90/

Mike

@p5pRT
Copy link
Author

p5pRT commented Jun 8, 2015

From @iabyn

On Fri, Jun 05, 2015 at 06​:28​:00AM -0700, Ed Avis wrote​:

my %h = \(a => 1\, a => 2\);

The first value for key 'a' will immediately be overwritten by the
second. Having them both does nothing useful and almost always
indicates a typo. I suggest perl could warn in the case where

- a hash is initialized with a list given directly in the program text,

The only way I can think to detect this is, at compile time where
an assignment is seen with a list of constants on the RHS, to create a
temporary hash, then scan the keys of the RHS adding them to the temp
hash, checking for duplicates, then throw the tmp hash away.

Which seems moderately expensive.

Alternatively, perhaps we should enhance the HV API to provide a
hash copy function, then stuff like

  %h1 = %h2;

could be done directly rather than pushing all %h2's keys and values onto
the stack?

Then with

  %h = (constkey, constval, ....);

the (constkey, constval, ...) could be constant-folded to a single hash,
with the runtime code doing a direct hash copy. The warning would then
be easily generated during the constant folding.

--
Music lesson​: a symbiotic relationship whereby a pupil's embellishments
concerning the amount of practice performed since the last lesson are
rewarded with embellishments from the teacher concerning the pupil's
progress over the corresponding period.

@p5pRT
Copy link
Author

p5pRT commented Jun 8, 2015

From @epa

I don't think the compile-time cost of checking for duplicate hash keys
will be significant. Not unless someone is generating huge Perl programs
with hashes containing millions of entries - in which case there are
certainly more efficient ways to do it anyway.

Constant-folding hashes does sound like a good idea, more so if it would
let duplicate keys be detected straightforwardly. Indeed, if there were
some way to declare a hash as read-only, so that its contents come from
the initial declaration and assignment and can never afterwards be changed,
a lot of hash lookups could be constant-folded too. But that is another
sidetrack...

--
Ed Avis <eda@​waniasset.com>

@p5pRT
Copy link
Author

p5pRT commented Jun 8, 2015

From @demerphq

On 8 June 2015 at 11​:23, Ed Avis <eda@​waniasset.com> wrote​:

I don't think the compile-time cost of checking for duplicate hash keys
will be significant. Not unless someone is generating huge Perl programs
with hashes containing millions of entries - in which case there are
certainly more efficient ways to do it anyway.

Constant-folding hashes does sound like a good idea, more so if it would
let duplicate keys be detected straightforwardly. Indeed, if there were
some way to declare a hash as read-only, so that its contents come from
the initial declaration and assignment and can never afterwards be changed,
a lot of hash lookups could be constant-folded too. But that is another
sidetrack...

Just curious, what should happen with code like this​:

my %hash= ( a=> 1, b=>2 );

my %other_hash= ( a=> 0, %hash );

would you expect a warning here? FWIW, personally I wouldn't, as this
is a fairly standard way to set up defaults in initialization.

cheers,
Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Jun 8, 2015

From @epa

demerphq <demerphq <at> gmail.com> writes​:

Just curious, what should happen with code like this​:

my %hash= ( a=> 1, b=>2 );

my %other_hash= ( a=> 0, %hash );

As I mentioned in the original bug report, I do not suggest making any warning
for this case. Only in cases where the two duplicate keys are literally
present in the program text as part of the same hash assignment.

--
Ed Avis <eda@​waniasset.com>

@p5pRT
Copy link
Author

p5pRT commented Jun 8, 2015

From @epa

Thanks for the mention of Perl​::Critic​::Pulp. I do use it. But nobody runs their program through perlcritic every single edit-test cycle. Although particularly heavyweight, controversial or specialized warnings are certainly better in perlcritic, I believe that checking for duplicate literal hash keys in the program is lightweight enough and universally useful enough to merit a warning in perl itself.

@p5pRT
Copy link
Author

p5pRT commented Jun 8, 2015

From zefram@fysh.org

Ed Avis via RT wrote​:

I believe that checking for duplicate literal hash keys in the program
is lightweight enough and universally useful enough to merit a warning
in perl itself.

There's a middle path that you seem to have overlooked​: a compile-time
warning implemented by a module. One could implement it quite easily
by hooking the checking for the aassign op. That's very close to where
one would naturally implement it in the core.

-zefram

@p5pRT
Copy link
Author

p5pRT commented Jun 8, 2015

From @demerphq

On 8 June 2015 at 13​:34, Ed Avis <eda@​waniasset.com> wrote​:

demerphq <demerphq <at> gmail.com> writes​:

Just curious, what should happen with code like this​:

my %hash= ( a=> 1, b=>2 );

my %other_hash= ( a=> 0, %hash );

As I mentioned in the original bug report, I do not suggest making any warning
for this case. Only in cases where the two duplicate keys are literally
present in the program text as part of the same hash assignment.

Sorry I missed that point. Thanks for the explanation. This sounds
like a useful improvement to me, assuming it is doable.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Jun 8, 2015

From @timbunce

On Mon, Jun 08, 2015 at 04​:54​:27AM -0700, Ed Avis via RT wrote​:

Thanks for the mention of Perl​::Critic​::Pulp. I do use it. But
nobody runs their program through perlcritic every single edit-test cycle.

[Just a data point] At $work we run perlcritic in a git commit hook via
https://metacpan.org/pod/App::GitHooks::Plugin::PerlCritic
That works very well for us.

Tim.

@p5pRT
Copy link
Author

p5pRT commented Jun 8, 2015

From @timbunce

On Mon, Jun 08, 2015 at 09​:14​:41AM +0100, Dave Mitchell wrote​:

On Fri, Jun 05, 2015 at 06​:28​:00AM -0700, Ed Avis wrote​:

my %h = \(a => 1\, a => 2\);

The first value for key 'a' will immediately be overwritten by the
second. Having them both does nothing useful and almost always
indicates a typo. I suggest perl could warn in the case where

- a hash is initialized with a list given directly in the program text,

The only way I can think to detect this is, at compile time where
an assignment is seen with a list of constants on the RHS, to create a
temporary hash, then scan the keys of the RHS adding them to the temp
hash, checking for duplicates, then throw the tmp hash away.

Couldn't we just compare the number of elements in the list with the
number of elements that end up in the hash?

Tim.

@p5pRT
Copy link
Author

p5pRT commented Jun 8, 2015

From @epa

Tim Bunce <Tim.Bunce <at> pobox.com> writes​:

Couldn't we just compare the number of elements in the list with the
number of elements that end up in the hash?

That's certainly a start - it doesn't let you report exactly which hash key
was duplicated - but a lot better than nothing.

--
Ed Avis <eda@​waniasset.com>

@p5pRT
Copy link
Author

p5pRT commented Jun 8, 2015

From @Tux

On Mon, 8 Jun 2015 06​:58​:10 -0600, Tim Bunce <Tim.Bunce@​pobox.com>
wrote​:

On Mon, Jun 08, 2015 at 04​:54​:27AM -0700, Ed Avis via RT wrote​:

Thanks for the mention of Perl​::Critic​::Pulp. I do use it. But
nobody runs their program through perlcritic every single edit-test cycle.

[Just a data point] At $work we run perlcritic in a git commit hook via
https://metacpan.org/pod/App::GitHooks::Plugin::PerlCritic
That works very well for us.

I think perlcritic is valuable when manually run every 6 month or so

There are too many edge cases broken to use it in commit hooks. A few
examples I just noticed in the past hour​:

@​_{slice} != @​_


  local %_;
  @​hdr and @​_{@​hdr} = @​$r;

results in

  [4 - Variables​::RequireLocalizedPunctuationVars]
  Magic variable "@​_" should be assigned as "local"
  :@​hdr and @​_{@​hdr} = @​$r;

grep { ... } @​foo and action != void context


  grep { !defined } @​_ and croak ($self->SetDiag (1004));

results in

  [3 - BuiltinFunctions​::ProhibitVoidGrep]
  "grep" used in void context
  :grep { !defined } @​_ and croak ($self->SetDiag (1004));

my ($var) inside sub { } is not the same variable


  $csv->callbacks (after_parse => sub {
  my ($csv, $r) = @​_;

results in

  [3 - Variables​::ProhibitReusedNames]
  Reused variable name in lexical scope​: $csv
  :my ($csv, $r) = @​_;

--
H.Merijn Brand http​://tux.nl Perl Monger http​://amsterdam.pm.org/
using perl5.00307 .. 5.21 porting perl5 on HP-UX, AIX, and openSUSE
http​://mirrors.develooper.com/hpux/ http​://www.test-smoke.org/
http​://qa.perl.org http​://www.goldmark.org/jeff/stupid-disclaimers/

@p5pRT
Copy link
Author

p5pRT commented Jun 8, 2015

From @epa

Zefram <zefram <at> fysh.org> writes​:

checking for duplicate literal hash keys

There's a middle path that you seem to have overlooked​: a compile-time
warning implemented by a module.

Would that be able to warn on C<%h = (a => 1, a => 2)> but keep quiet on
C<%h = (a => 1); %g = (%h, a => 2)>?

--
Ed Avis <eda@​waniasset.com>

@p5pRT
Copy link
Author

p5pRT commented Jun 8, 2015

From zefram@fysh.org

Ed Avis wrote​:

Zefram <zefram <at> fysh.org> writes​:

There's a middle path that you seem to have overlooked​: a compile-time
warning implemented by a module.

Would that be able to warn on C<%h = (a => 1, a => 2)> but keep quiet on
C<%h = (a => 1); %g = (%h, a => 2)>?

Yes. It would actually be quite difficult to link the two statements in
that latter example at compile time. There's nothing offensive about
(%h, a=>2) in isolation, and %h doesn't have its clashing value when
that list is compiled.

-zefram

@p5pRT
Copy link
Author

p5pRT commented Jun 8, 2015

From @rcaputo

On Jun 8, 2015, at 10​:50, Zefram <zefram@​fysh.org> wrote​:

Ed Avis wrote​:

Zefram <zefram <at> fysh.org> writes​:

There's a middle path that you seem to have overlooked​: a compile-time
warning implemented by a module.

Would that be able to warn on C<%h = (a => 1, a => 2)> but keep quiet on
C<%h = (a => 1); %g = (%h, a => 2)>?

Yes. It would actually be quite difficult to link the two statements in
that latter example at compile time. There's nothing offensive about
(%h, a=>2) in isolation, and %h doesn't have its clashing value when
that list is compiled.

All the questions about edge cases imply that people expect this to be a lexically scoped feature not tightly coupled to compile time.

Tying %h to a class that raises an exception on duplicates would work in all the exceptional cases. To limit the feature's scope, untie the hash after initialization.

--
Rocco Caputo <rcaputo@​pobox.com>

@p5pRT
Copy link
Author

p5pRT commented Jun 8, 2015

From zefram@fysh.org

Rocco Caputo wrote​:

All the questions about edge cases imply that people expect this to be
a lexically scoped feature not tightly coupled to compile time.

No, the questions about edge cases imply that it's a semantically
tricky area with many edges and more than one route to implementation.
Most requested versions of the warning have been semantically compile-time
checks, and most discussion about implementation has been based on
implementing it at compile time. I think everyone expects that it would
be under lexically-scoped control, but that's unrelated to all the other
things you mention.

Tying %h to a class that raises an exception on duplicates would work
in all the exceptional cases. To limit the feature's scope, untie the
hash after initialization.

That sounds messy, and would interfere with ordinary use of the hash.

-zefram

@p5pRT
Copy link
Author

p5pRT commented Jun 8, 2015

From @epa

Rocco Caputo <rcaputo <at> pobox.com> writes​:

All the questions about edge cases imply that people expect this to be a
lexically scoped feature not tightly coupled to compile time.

Speaking for myself, that is not the intention. I envisage a strictly
compile-time check of duplicate hash keys in a literal hash assignment.
The check would not need to be lexically scoped because nobody would ever
want to disable it - at least I have not imagined a case where they would.

--
Ed Avis <eda@​waniasset.com>

@p5pRT
Copy link
Author

p5pRT commented Jun 10, 2015

From @jkeenan

On Mon Jun 08 08​:23​:28 2015, eda@​waniasset.com wrote​:

Rocco Caputo <rcaputo <at> pobox.com> writes​:

All the questions about edge cases imply that people expect this to be a
lexically scoped feature not tightly coupled to compile time.

Speaking for myself, that is not the intention. I envisage a strictly
compile-time check of duplicate hash keys in a literal hash assignment.
The check would not need to be lexically scoped because nobody would ever
want to disable it - at least I have not imagined a case where they would.

My sense of the discussion so far in this thread is that, in order to minimize side effects, the number of cases in which the warning would actually be generated would have to be strictly limited to the simplest cases.

If that is so, then I doubt the benefit of this warning would outweigh the cost in human work-hours incurred in​:

* Implementing the warning in the core; I note that no patch was submitted.

* More importantly, the future maintenance cost for the Porters.

How much hand-holding do we have to do?
--
James E Keenan (jkeenan@​cpan.org)

@p5pRT
Copy link
Author

p5pRT commented Jun 11, 2015

From @epa

James E Keenan - you are right, the warning would be limited to the simplest cases (which are also those which are completely uncontroversial, IMHO). I suggest that despite the simple-mindedness of the check it would actually catch a fair few bugs in practice. This is because a programmer will often cut and paste to add a new entry to a configuration hash, and forget to change the hash key. This is just my experience from my own code and reading code written by others.

Although the check is conceptually simple to explain (to reiterate​: when the two duplicate keys are literally present in the source code in the same hash assignment, and when the determination of odd and even positions is also obvious at compile time), I did not have much idea how difficult it would be to implement. If straightforward, it is an easy win; otherwise it may not be worth the development effort (which in any case is decided by whoever does the work, unless others are interested in funding this proposal).

Should the bug be closed as WONTFIX or WONTBOTHER, that is fine of course.

If the perl5-porters would prefer not to receive feature requests which aren't backed by patches, I will stop sending them.

@p5pRT
Copy link
Author

p5pRT commented Jun 11, 2015

From @ikegami

On Fri, Jun 5, 2015 at 12​:23 PM, Zefram <zefram@​fysh.org> wrote​:

Paul "LeoNerd" Evans wrote​:

Sidetrack^2​: I sometimes find it useful that => *doesn't* apply scalar
context to RHS, because I do things like

Future->fail( "HTTP GET failed", http => $request, $response );

Would make no difference in this case

Better example​: system(cat => @​files)

@p5pRT
Copy link
Author

p5pRT commented Jun 11, 2015

From @epa

Things like

  system(cat => @​files)

are interesting but outside the scope of this feature, since it isn't a hash assignment; there isn't the same literal key appearing twice in the sequence; and the number of elements isn't known at compile time.

I would only check fixed lists of k=>v pairs, or (as a slightly enhanced version of the check) lists which contain only k=>v pairs and other hashes as

  %h = (%i, a => 1, b => 2, %j, a => 3)

since in this case the odd-even of each position is known at compile time, even if the total number of elements isn't.

@p5pRT
Copy link
Author

p5pRT commented Jun 11, 2015

From @rcaputo

On Jun 11, 2015, at 11​:03, Ed Avis via RT <perlbug-followup@​perl.org> wrote​:

Things like

system(cat => @​files)

are interesting but outside the scope of this feature, since it isn't a hash assignment; there isn't the same literal key appearing twice in the sequence; and the number of elements isn't known at compile time.

If it were a runtime feature of list assignment to a hash, it would work in all cases, and the very specific case you'd like to happen at compile time could as an optimization.

--
Rocco Caputo <rcaputo@​pobox.com>

@p5pRT
Copy link
Author

p5pRT commented Jun 11, 2015

From @epa

I suggest a runtime warning wouldn't really work because (unless I am
mistaken) it would not be able to distinguish between

  %c = (a => 1);
  %h = (a => 1, a => 2); # pointless
  %h = (%c, a => 2); # normal idiom for overriding some entries

In both cases the RHS of the assignment is the list ('a',1,'a',2) at run
time. You really do need to look at the literal program text to see the
duplicate keys - hence, it must be a compile time check.

--
Ed Avis <eda@​waniasset.com>

@p5pRT
Copy link
Author

p5pRT commented Jun 11, 2015

From @rcaputo

On Jun 11, 2015, at 16​:02, Ed Avis <eda@​waniasset.com> wrote​:

I suggest a runtime warning wouldn't really work because (unless I am
mistaken) it would not be able to distinguish between

%c = (a => 1);
%h = (a => 1, a => 2); # pointless
%h = (%c, a => 2); # normal idiom for overriding some entries

In both cases the RHS of the assignment is the list ('a',1,'a',2) at run
time. You really do need to look at the literal program text to see the
duplicate keys - hence, it must be a compile time check.

If the feature were lexically scoped, or scope-able, one could disable the warning where the duplicates were deliberate. Or, I suppose, enable it where a problem was anticipated. That's why I mentioned lexical scoping before.

--
Rocco Caputo <rcaputo@​pobox.com>

@p5pRT
Copy link
Author

p5pRT commented Jun 12, 2015

From @iabyn

On Thu, Jun 11, 2015 at 02​:33​:41AM -0700, Ed Avis via RT wrote​:

Although the check is conceptually simple to explain (to reiterate​: when
the two duplicate keys are literally present in the source code in the
same hash assignment, and when the determination of odd and even
positions is also obvious at compile time), I did not have much idea how
difficult it would be to implement. If straightforward, it is an easy
win; otherwise it may not be worth the development effort (which in any
case is decided by whoever does the work, unless others are interested
in funding this proposal).

I think the best approach to implementing this is to optimise assigning a
constant list to a hash by constant folding the list into a constant hash.
Then the warning becomes trivial to implement as part of creating the
constant hash.

Should the bug be closed as WONTFIX or WONTBOTHER, that is fine of
course.

I think the ticket should be left open. I'll add constant folding hashes
(and maybe arrays) to my list of things to to.

If the perl5-porters would prefer not to receive feature requests which
aren't backed by patches, I will stop sending them.

Speaking purely for myself, I'm happy for people to suggest features that
they themselves are not in a position to implement.

--
You never really learn to swear until you learn to drive.

@p5pRT
Copy link
Author

p5pRT commented Jun 12, 2015

From zefram@fysh.org

Dave Mitchell wrote​:

I think the best approach to implementing this is to optimise assigning a
constant list to a hash by constant folding the list into a constant hash.

That would miss cases such as

  %h = (a => $a, a => $b);

which I think Ed wants to be covered.

-zefram

@p5pRT
Copy link
Author

p5pRT commented Jun 12, 2015

From @kentfredric

On 6 June 2015 at 01​:28, Ed Avis <perlbug-followup@​perl.org> wrote​:

my %h = \(a => 1\, a => 2\);

The first value for key 'a' will immediately be overwritten by the
second. Having them both does nothing useful and almost always
indicates a typo. I suggest perl could warn in the case where

- a hash is initialized with a list given directly in the program text,

- two of the keys of the hash (that is, the even-numbered elements of the
list)
are literal strings in the program text and are the same string.

Question​:

What would happen here​:

my %h = ( a => 1, @​list, a => 2, @​list );

Given @​list can be an odd sized list, which may in fact expand as​:

my %h = ( a => 1, b => a, 2 => b );

( Apologies if it was covered elsewhere, I didn't see this exact concern
mentioned )

--
Kent

*KENTNL* - https://metacpan.org/author/KENTNL

@p5pRT
Copy link
Author

p5pRT commented Jun 12, 2015

From @epa

my %h = ( a => 1, @​list, a => 2, @​list );

Sadly that would have to be excluded from the check since the
odd-even-ness of each element cannot be determined at
compile time.

There is an interesting issue here, and perhaps a case for
adding a warning when an x=>y pair ends up 'misaligned' so
that x is at an odd position in a hash assignment, but it is
out of scope for this feature request.

Only cases where the number of elements in the list is known
at compile time - or, for a slightly more powerful check, the
odd-even-ness of each element is known at compile time - would
be checkable.

@p5pRT
Copy link
Author

p5pRT commented Jun 12, 2015

From @rgs

On 12 June 2015 at 15​:29, Ed Avis via RT <perlbug-followup@​perl.org> wrote​:

my %h = ( a => 1, @​list, a => 2, @​list );

Sadly that would have to be excluded from the check since the
odd-even-ness of each element cannot be determined at
compile time.

There is an interesting issue here, and perhaps a case for
adding a warning when an x=>y pair ends up 'misaligned' so
that x is at an odd position in a hash assignment, but it is
out of scope for this feature request.

Only cases where the number of elements in the list is known
at compile time - or, for a slightly more powerful check, the
odd-even-ness of each element is known at compile time - would
be checkable.

I was just thinking about
%x = (
  a => 1,
  b => 2,
  ($need_some_more_b ? (b => 3) : ()),
);
where the trick is to use a boolean flag to change the default value of b.

@p5pRT
Copy link
Author

p5pRT commented Jun 12, 2015

From @kentfredric

On 13 June 2015 at 01​:29, Ed Avis via RT <perlbug-followup@​perl.org> wrote​:

Sadly that would have to be excluded from the check since the
odd-even-ness of each element cannot be determined at
compile time.

Can I read that to interpret such that​:

( key1 => value, key2 => value, ARRAY_OR_SUBCALL, .* ) will process key1
and key2, but ignore everything that comes after LIST_OR_SUB?

That is

1​: for hashes that are entirely static declarations, full key analysis can
work
2​: for hashes with part that /could/ be returning a list of values, (
either @​ or x_() ), only the "keys" prior to the indeterminate part would
be analysed, and all parts after the indeterminate part ignored. ( Because
anything that comes after an @​ or & can't be statically determined to be a
key or a value ).

--
Kent

*KENTNL* - https://metacpan.org/author/KENTNL

@p5pRT
Copy link
Author

p5pRT commented Jun 12, 2015

From zefram@fysh.org

Rafael Garcia-Suarez wrote​:

I was just thinking about
%x = (
a => 1,
b => 2,
($need_some_more_b ? (b => 3) : ()),
);
where the trick is to use a boolean flag to change the default value of b.

That seems a perverse way of changing the value ascribed to b.

-zefram

@p5pRT
Copy link
Author

p5pRT commented Jun 12, 2015

From @epa

for hashes with part that /could/ be returning a list of
values, ( either @​ or x_() ), only the "keys" prior to the
indeterminate part would be analysed, and all parts after
the indeterminate part ignored.

That's one option and the one I envisaged. Then the warning is always sound.

I suppose an alternative is just to assume that any subroutine call or array included in the hash initializer will return an even number of elements (since after all, what crazy person would mix => and odd-sized arrays in a hash initializer expr?) and warn on that basis.

That makes the warning technically unsound - since you *could* concoct an example program which triggers it while not having a duplicate hash key - but perhaps more useful in practice.

@p5pRT
Copy link
Author

p5pRT commented Jun 12, 2015

From @rjbs

* Kent Fredric <kentfredric@​gmail.com> [2015-06-12T09​:59​:12]

On 13 June 2015 at 01​:29, Ed Avis via RT <perlbug-followup@​perl.org> wrote​:

Sadly that would have to be excluded from the check since the
odd-even-ness of each element cannot be determined at
compile time.

Can I read that to interpret such that​:

( key1 => value, key2 => value, ARRAY_OR_SUBCALL, .* ) will process key1
and key2, but ignore everything that comes after LIST_OR_SUB?

This is getting out of hand.

If this is going ot happen, it needs to be straightforward and have no false
positives, which is possible.

* scalar constants count as one thing
* $scalar and *glob variables and \references count as one thing
* hash variables count as zero things
* everything else causes the check to be aborted

If the sum of things is even, no warning. If the sum of things is odd,
warning.

--
rjbs

@p5pRT
Copy link
Author

p5pRT commented Jun 12, 2015

From @ikegami

On Thu, Jun 11, 2015 at 3​:58 PM, Rocco Caputo <rcaputo@​pobox.com> wrote​:

If it were a runtime feature of list assignment to a hash, it would work
in all cases, and the very specific case you'd like to happen at compile
time could as an optimization.

Which leads to the reason I wouldn't like it​:

%a = (b => @​c);

would be different than

@​d =(b => @​c);
%a = @​d;

@p5pRT
Copy link
Author

p5pRT commented Aug 3, 2015

From @epa

Ed Avis <eda <at> waniasset.com> writes​:

(Sidetrack​: it might have been a good idea for => to impose scalar context
on its RHS, but probably too late for that now.)

Just to mention an interesting bug that arose because programmers weren't
aware that the RHS of => is actually list context​:

  http​://blog.gerv.net/2014/10/new

--
Ed Avis <eda@​waniasset.com>

@p5pRT
Copy link
Author

p5pRT commented Aug 3, 2015

From @Abigail

On Mon, Aug 03, 2015 at 12​:12​:08PM +0000, Ed Avis wrote​:

Ed Avis <eda <at> waniasset.com> writes​:

(Sidetrack​: it might have been a good idea for => to impose scalar context
on its RHS, but probably too late for that now.)

Just to mention an interesting bug that arose because programmers weren't
aware that the RHS of => is actually list context​:

http&#8203;://blog\.gerv\.net/2014/10/new

Well, it's a very good thing the RHS of => is in list context,
if it were in scalar context,

  %h = (a => 1, b => 2);

would be a one element hash (with key 'a', and value 2).

(This is also directed to the early subthread which suggests that the
RHS of => should be in scalar context).

In the blog post, the code would have failed if => provided scalar
context as well. The code makes *two* incorrect assumptions, not
just one​:

  - => has a higher precedence than ,
  - => provides scalar context to its arguments, where , doesn't.

Just changing => so it provides scalar context on its RHS only helps
someone who wrote​:

  %h = (key => foo());

with foo() returning a multi-element list in list context. (Which
is also the case for a new operator ==> which only differs from
=> by the context it provides to its RHS).

On top of changing the RHS behavior (or introducing a new operator)
you'd also need another level of precedence.

But I've yet to encounter someone who says "well, I'm being baffled
by context all the time, but 24 levels of precedence is a piece of
cake -- I wish there were more".

Abigail

@p5pRT
Copy link
Author

p5pRT commented Aug 3, 2015

From @epa

Abigail <abigail <at> abigail.be> writes​:

http​://blog.gerv.net/2014/10/new

The code makes *two* incorrect assumptions, not just one​:

- => has a higher precedence than ,
- => provides scalar context to its arguments, where , doesn't.

Those assumptions are both wrong but I think they are the mental model of
many Perl programmers when writing hash assignments with =>. At least, to
the extent that the programmer has muddled through with a vague idea of the
semantics. I too imagined that the => created a "pair" and by so doing had
higher precedence than , somehow; and any example or tutorial uses only
scalars or single-item-returning functions on the RHS, so you imagine that
the RHS of => will always be a scalar.

The authors of Bugzilla must be in the top 20% of Perl programmers, and they
got it wrong too.

But I do agree that it is not an easy thing to solve from where Perl is now.
It would be interesting to try tweaking => to bind more tightly than ,
and see what proportion of CPAN code compiles differently. TBH, I doubt that
25 levels of precedence is any more confusing than 24 :=-(

--
Ed Avis <eda@​waniasset.com>

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants