Undocumented weird CORE::glob() behavior #14321

Open
p5pRT opened this issue Dec 10, 2014 · 4 comments

p5pRT commented Dec 10, 2014

Migrated from rt.perl.org#123404 (status was 'open')

Searchable as RT123404$

p5pRT commented Dec 10, 2014

From @kbenson

This is a bug report for perl from kentrak@​gmail.com,
generated with the help of perlbug 1.40 running under perl 5.21.6.


This was spurred from a discussion on Hacker News, and I think my reply
there was fairly succinct. The summary is that the special Perl magic
that makes glob work as an iterator is described inaccurately in the docs
(both for built-in glob and File::Glob), assuming the current behavior
is expected and not a bug. In addition, File::Glob::bsd_glob's scalar
use appears to return the last alpha-sorted item, not the first.

You picked an extremely good built-in as an example. It doesn't function exactly as you've shown it, but there is very weird stuff going on. On the plus side, this is the only time I've seen something quite like this in Perl, so I'm not sure it represents the state of using Perl as a whole very well.

glob appears to do something special, unlike most other Perl functions, where it's just a matter of knowing the context and the API the function exposes. In the versions I just tested (including 5.21.6), glob in fact doesn't quite do what its documentation says, in an apparent effort to "do what you mean". It does not simply iterate through the matching files when used in scalar context; it iterates through them only when called in scalar context from the exact same call site in the program.

Here are all the files we'll test:

  # perl -E 'my @files = glob("*"); say for @files;'
  one.a
  one.b
  two.a
  two.b

Glob acts like an iterator when used in scalar context. Here, each call
returns another file.

  # perl -E 'while ( my $file = glob("*") ) { say $file; } '
  one.a
  one.b
  two.a
  two.b

Glob gives the first file each time it's called in scalar context if the
call is not at the exact same point in the source. This doesn't follow the
docs, which say it will act like an iterator in scalar context.

  # perl -E 'my $file1 = glob("*"); my $file2 = glob("*"); say $file1; say $file2;'
  one.a
  one.a

Again, we see it's not acting like an iterator. Each call is generating
its own list and returning the first file.

  # perl -E 'my $file1 = glob("*"); my $file2 = glob("two*"); say $file1; say $file2;'
  one.a
  two.a

Here, we can see that since the same point in the code is getting hit, glob
is acting like an iterator.

  # perl -E 'sub myglob { my $mask = shift; glob($mask); } my $file1 = myglob("*"); my $file2 = myglob("*"); say $file1; say $file2;'
  one.a
  one.b

Here we see that since the same point in the code is getting hit, glob is
ignoring its input and just acting like an iterator on the first input,
which seems to be what you were trying to show in your example.

  # perl -E 'sub myglob { my $mask = shift; glob($mask); } my $file1 = myglob("*"); my $file2 = myglob("two*"); say $file1; say $file2;'
  one.a
  one.b
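
The two wrapped cases above can be reproduced in one script. This is a minimal sketch, assuming a scratch directory that holds the same four files; the helper names next_match and first_match are hypothetical and not part of Perl or File::Glob. It contrasts a scalar-context call site, which keeps iterating over its first pattern, with a list-context call site, which expands the pattern it is given on every call.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

# Recreate the four test files from the report in a scratch directory.
my $start_dir = getcwd();
my $dir       = tempdir(CLEANUP => 1);
for my $name (qw(one.a one.b two.a two.b)) {
    open my $fh, '>', "$dir/$name" or die "cannot create $name: $!";
    close $fh;
}
chdir $dir or die "cannot chdir to $dir: $!";

# Hypothetical helper: one scalar-context glob call site. Every call through
# it advances the same iterator, so a new pattern is not consulted until the
# matches for the first pattern have been used up.
sub next_match { my $mask = shift; return scalar glob($mask); }

# Hypothetical helper: the list assignment puts glob in list context, so the
# pattern is expanded in full on each call and only the first match is kept.
sub first_match { my $mask = shift; my ($first) = glob($mask); return $first; }

my @scalar_results = (next_match('*'),  next_match('two*'));
my @list_results   = (first_match('*'), first_match('two*'));

print "scalar: @scalar_results\n";    # scalar: one.a one.b
print "list:   @list_results\n";      # list:   one.a two.a

# Leave the scratch directory so it can be removed at exit.
chdir $start_dir or die "cannot chdir back to $start_dir: $!";
```

The list assignment is only one way to sidestep the per-call-site state; it trades the iterator for a full expansion of the pattern on every call.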

That is very weird behavior. I agree it's not consistent with the rest
of the language. I'm not sure it's indicative of the language as a whole
though, as it appears to be due to weird historical implementation details
that are kept for backwards compatibility. The perldoc for glob says it's
implemented using the standard File::Glob module. The perldoc for
File::Glob mentions that it implements the core glob in terms of bsd_glob
(the FreeBSD glob(3) routine, a superset of POSIX glob), which is a function
that can also be exported. In fact, the bsd_glob function, when used, acts
as we would wish, without weird iterator behavior (without iterator behavior
at all, in fact).

  # perl -E 'use File::Glob qw/:bsd_glob/; my $file1 = bsd_glob("*"); my $file2 = bsd_glob("two*"); say $file1; say $file2;'
  two.b
  two.b
  # perl -E 'use File::Glob qw/:bsd_glob/; sub myglob { my $mask = shift; bsd_glob($mask); } my $file1 = myglob("*"); my $file2 = myglob("one*"); say $file1; say $file2;'
  two.b
  one.b

Of course, this means the documentation that says the core glob routine is
implemented using bsd_glob has some glaring omissions.

So, congratulations, you picked an extremely good example and unearthed some
crazy Perl arcana, and possibly a bug (there are some open, long-standing
tickets regarding glob bugs[1][2], which seem to boil down to "we're doing
what we can to make it better, but we're hampered by backwards compatibility
and weird semantics"). I think the documentation is woefully inadequate to
explain what's going on in this case though.

Note that bsd_glob seems to return the last item when used in scalar context,
not the first, which may be a bug.
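
Until the scalar-context result of bsd_glob is documented, calling it in list context avoids the question. The following is a minimal sketch, assuming the same four files in a scratch directory and the default flags of bsd_glob, which return the matches sorted; the variable names are illustrative only.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Glob qw(bsd_glob);
use File::Temp qw(tempdir);

# Recreate the four test files from the report in a scratch directory.
my $start_dir = getcwd();
my $dir       = tempdir(CLEANUP => 1);
for my $name (qw(one.a one.b two.a two.b)) {
    open my $fh, '>', "$dir/$name" or die "cannot create $name: $!";
    close $fh;
}
chdir $dir or die "cannot chdir to $dir: $!";

# List context: the full, sorted list of matches.
my @all = bsd_glob('*');

# List assignment to a one-element list keeps the first match.
my ($first) = bsd_glob('*');

# The "countof" idiom: assign to an empty list, then read the assignment
# in scalar context to get the number of matches.
my $count = () = bsd_glob('*');

print "all:   @all\n";      # all:   one.a one.b two.a two.b
print "first: $first\n";    # first: one.a
print "count: $count\n";    # count: 4

# Leave the scratch directory so it can be removed at exit.
chdir $start_dir or die "cannot chdir back to $start_dir: $!";
```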

  [1]: https://rt-archive.perl.org/perl5/Ticket/Display.html?id=2707

  [2]: https://rt-archive.perl.org/perl5/Ticket/Display.html?id=2713



Flags​:
  category=library
  severity=low
  module=File​::Glob


Site configuration information for perl 5.21.6​:

Configured by root at Thu Dec 4 11​:38​:18 PST 2014.

Summary of my perl5 (revision 5 version 21 subversion 6) configuration​:
 
  Platform​:
  osname=linux, osvers=2.6.32-220.17.1.el6.x86_64, archname=x86_64-linux
  uname='linux stats.ticketfrontier.com 2.6.32-220.17.1.el6.x86_64 #1 smp wed may 16 00​:01​:37 bst 2012 x86_64 x86_64 x86_64 gnulinux '
  config_args='-de -Dprefix=/root/perl5/perlbrew/perls/perl-5.21.6 -Dusedevel -Aeval​:scriptdir=/root/perl5/perlbrew/perls/perl-5.21.6/bin'
  hint=recommended, useposix=true, d_sigaction=define
  useithreads=undef, usemultiplicity=undef
  use64bitint=define, use64bitall=define, uselongdouble=undef
  usemymalloc=n, bincompat5005=undef
  Compiler​:
  cc='cc', ccflags ='-fwrapv -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2',
  optimize='-O2',
  cppflags='-fwrapv -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
  ccversion='', gccversion='4.4.7 20120313 (Red Hat 4.4.7-4)', gccosandvers=''
  intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678, doublekind=3
  d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16, longdblkind=3
  ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
  alignbytes=8, prototype=define
  Linker and Libraries​:
  ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
  libpth=/usr/local/lib /usr/lib /lib/../lib64 /usr/lib/../lib64 /lib /lib64 /usr/lib64 /usr/local/lib64
  libs=-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc
  perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
  libc=libc-2.12.so, so=so, useshrplib=false, libperl=libperl.a
  gnulibc_version='2.12'
  Dynamic Linking​:
  dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
  cccdlflags='-fPIC', lddlflags='-shared -O2 -L/usr/local/lib -fstack-protector'


@​INC for perl 5.21.6​:
  /root/perl5/perlbrew/perls/perl-5.21.6/lib/site_perl/5.21.6/x86_64-linux
  /root/perl5/perlbrew/perls/perl-5.21.6/lib/site_perl/5.21.6
  /root/perl5/perlbrew/perls/perl-5.21.6/lib/5.21.6/x86_64-linux
  /root/perl5/perlbrew/perls/perl-5.21.6/lib/5.21.6
  .


Environment for perl 5.21.6​:
  HOME=/root
  LANG=en_US.UTF-8
  LANGUAGE (unset)
  LD_LIBRARY_PATH (unset)
  LOGDIR (unset)
  PATH=/root/perl5/perlbrew/bin​:/root/perl5/perlbrew/perls/perl-5.21.6/bin​:/opt/ActivePerl-5.14/site/bin​:/opt/ActivePerl-5.14/bin​:/usr/lib64/qt-3.3/bin​:/opt/ActivePerl/site/bin​:/opt/ActivePerl/bin​:/usr/local/sbin​:/usr/local/bin​:/sbin​:/bin​:/usr/sbin​:/usr/bin​:/root/bin
  PERL5LIB=
  PERLBREW_LIB=
  PERLBREW_MANPATH=/root/perl5/perlbrew/perls/perl-5.21.6/man
  PERLBREW_PATH=/root/perl5/perlbrew/bin​:/root/perl5/perlbrew/perls/perl-5.21.6/bin
  PERLBREW_PERL=perl-5.21.6
  PERLBREW_ROOT=/root/perl5/perlbrew
  PERLBREW_SKIP_INIT=1
  PERLBREW_VERSION=0.72
  PERL_BADLANG (unset)
  PERL_LOCAL_LIB_ROOT=
  SHELL=/bin/bash

p5pRT commented Dec 11, 2014

From @cpansprout

On Wed Dec 10 11​:31​:29 2014, kentrak@​gmail.com wrote​:

glob appears to do something special, unlike most other Perl
functions, where it's just a matter of knowing the context and the API
the function exposes. In the versions I just tested (including
5.21.6), glob in fact doesn't quite do what its documentation says,
in an apparent effort to "do what you mean". It does not simply iterate
through the matching files when used in scalar context; it iterates
through them only when called in scalar context from the exact same
call site in the program.

This call-site behaviour has always been present for glob. I wouldn’t consider it a bug. But the documentation is somewhat lacking.

Here are all the files we'll test:

# perl -E 'my @files = glob("*"); say for @files;'
one.a
one.b
two.a
two.b

Glob acts like an iterator when used in scalar context. Here, each call
returns another file.

# perl -E 'while ( my $file = glob("*") ) { say $file; } '
one.a
one.b
two.a
two.b

Glob gives the first file each time it's called in scalar context if the
call is not at the exact same point in the source. This doesn't follow the
docs, which say it will act like an iterator in scalar context.

It is an iterator, but it’s a different iterator for each call site. Honestly, I don’t see how else a glob iterator could behave.
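
For comparison, an iterator whose state is tied to a value rather than to a call site can be built on top of list-context glob. This is a minimal sketch under the same assumptions as the report's test directory; make_glob_iter is a hypothetical helper, not an existing core or File::Glob function.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

# Recreate the four test files from the report in a scratch directory.
my $start_dir = getcwd();
my $dir       = tempdir(CLEANUP => 1);
for my $name (qw(one.a one.b two.a two.b)) {
    open my $fh, '>', "$dir/$name" or die "cannot create $name: $!";
    close $fh;
}
chdir $dir or die "cannot chdir to $dir: $!";

# Hypothetical helper: expand the pattern once in list context and return a
# closure that hands out one match per call, then undef when none are left.
sub make_glob_iter {
    my ($mask) = @_;
    my @matches = glob($mask);
    return sub { return shift @matches };
}

# Two iterators created through the same glob call site stay independent,
# because each closure owns its own copy of the match list.
my $all_files = make_glob_iter('*');
my $two_files = make_glob_iter('two*');

my @seen = ($all_files->(), $two_files->(), $all_files->());
print "@seen\n";    # one.a two.a one.b

# Leave the scratch directory so it can be removed at exit.
chdir $start_dir or die "cannot chdir back to $start_dir: $!";
```

The cost of this design is that the caller has to carry the iterator around, which is exactly what the built-in avoids by keying its state on the call site.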

In fact, the bsd_glob function, when used, acts
as we would wish, without weird iterator behavior (without iterator behavior
at all, in fact).

# perl -E 'use File::Glob qw/:bsd_glob/; my $file1 = bsd_glob("*"); my $file2 = bsd_glob("two*"); say $file1; say $file2;'
two.b
two.b
# perl -E 'use File::Glob qw/:bsd_glob/; sub myglob { my $mask = shift; bsd_glob($mask); } my $file1 = myglob("*"); my $file2 = myglob("one*"); say $file1; say $file2;'
two.b
one.b

Of course, this means the documentation that says the core glob routine is
implemented using bsd_glob has some glaring omissions.

So, congratulations, you picked an extremely good example and unearthed some
crazy Perl arcana, and possibly a bug (there are some open, long-standing
tickets regarding glob bugs[1][2], which seem to boil down to "we're doing
what we can to make it better, but we're hampered by backwards compatibility
and weird semantics").

That pretty much explains it all.

I think the documentation is woefully inadequate to explain what's
going on in this case though.

Note that bsd_glob seems to return the last item when used in scalar context,
not the first, which may be a bug.

File​::Glob was poorly implemented when it was first dragged kicking and screaming into the core in 5.6. Just thinking about it makes my head hurt. :-( It would take me a long time to explain all the foibles.

--

Father Chrysostomos

p5pRT commented Dec 11, 2014

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented Jan 1, 2015

From @kbenson

Yeah, I understand that "fixing" this behavior is a non-starter at this
point; I really sent in the report to illustrate how inadequate the
documentation is. Maybe I'll have time to generate a pull request or
submit a patch (whichever is preferred, I'm sure I can find info on that)
to update the docs on this, so that people at least know what to expect.
