glob() with spaces - documentation unclear #9393

Closed
p5pRT opened this issue Jun 25, 2008 · 13 comments


p5pRT commented Jun 25, 2008

Migrated from rt.perl.org#56348 (status was 'resolved')

Searchable as RT56348$


p5pRT commented Jun 25, 2008

From @epa

Created by @epa

use warnings;
use strict;
use 5.010;
my @got = glob('* x');
say foreach @got;

This unexpectedly matches every file in the current directory, but I
expected it to match only filenames ending in ' x'.

Now, if you read perlfunc(1) it has a note saying 'starting in 5.6...
see File::Glob'. And in File::Glob(1) there is a mention that 'Due to
historical reasons, CORE::glob() will also split its argument on
whitespace, treating it as multiple patterns'.

But this is obscure. The special treatment of space characters should
be there in the main glob() documentation, and it doesn't need to hark
back to the days of perl 5.6. It could mention that
File::Glob::bsd_glob is a safe alternative that won't trip up on
spaces.

Ideally, a space character in the pattern passed to glob() would be a
warning.
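
To make the difference concrete, here is a minimal sketch comparing core glob() with File::Glob's bsd_glob() on the pattern from the report. The scratch directory and the file names (a.txt, 'b x', c.c) are made up for illustration; only glob() and bsd_glob() come from the discussion above.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Glob qw(bsd_glob);
use File::Temp qw(tempdir);

# Scratch directory with hypothetical file names, so the result
# does not depend on whatever is in the current directory.
my $old_cwd = getcwd();
my $dir     = tempdir(CLEANUP => 1);
chdir $dir or die "chdir $dir: $!";
for my $name ('a.txt', 'b x', 'c.c') {
    open my $fh, '>', $name or die "open $name: $!";
    close $fh;
}

# Core glob splits its argument on whitespace: this is really the
# two patterns '*' and 'x', so every non-dot file is returned.
my @core = glob('* x');

# bsd_glob takes the whole string, space included, as one pattern.
my @bsd = bsd_glob('* x');

print "glob:     $_\n" for @core;
print "bsd_glob: $_\n" for @bsd;

chdir $old_cwd or die "chdir $old_cwd: $!";
```

With these files, the bsd_glob() call should return only 'b x', while the core glob() call also returns a.txt and c.c.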

Perl Info

Flags:
    category=core
    severity=low

This perlbug was built using Perl 5.10.0 in the Fedora build system.
It is being executed now by Perl 5.10.0 - Tue Mar 18 15:46:25 EDT 2008.

Site configuration information for perl 5.10.0:

Configured by Red Hat, Inc. at Tue Mar 18 15:46:25 EDT 2008.

Summary of my perl5 (revision 5 version 10 subversion 0) configuration:
  Platform:
    osname=linux, osvers=2.6.18-53.1.6.el5xen, archname=i386-linux-thread-multi
    uname='linux xenbuilder2.fedora.redhat.com 2.6.18-53.1.6.el5xen #1 smp wed jan 16 04:10:44 est 2008 i686 i686 i386 gnulinux '
    config_args='-des -Doptimize=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m32 -march=i386 -mtune=generic -fasynchronous-unwind-tables -Dversion=5.10.0 -Dmyhostname=localhost -Dperladmin=root@localhost -Dcc=gcc -Dcf_by=Red Hat, Inc. -Dinstallprefix=/usr -Dprefix=/usr -Darchname=i386-linux -Dvendorprefix=/usr -Dsiteprefix=/usr -Duseshrplib -Dusethreads -Duseithreads -Duselargefiles -Dd_dosuid -Dd_semctl_semun -Di_db -Ui_ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Duseperlio -Dinstallusrbinperl=n -Ubincompat5005 -Uversiononly -Dpager=/usr/bin/less -isr -Dd_gethostent_r_proto -Ud_endhostent_r_proto -Ud_sethostent_r_proto -Ud_endprotoent_r_proto -Ud_setprotoent_r_proto -Ud_endservent_r_proto -Ud_setservent_r_proto -Dscriptdir=/usr/bin'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=undef, use64bitall=undef, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBUGGING -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm',
    optimize='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m32 -march=i386 -mtune=generic -fasynchronous-unwind-tables',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBUGGING -fno-strict-aliasing -pipe -I/usr/local/include -I/usr/include/gdbm'
    ccversion='', gccversion='4.3.0 20080314 (Red Hat 4.3.0-3)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='gcc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lresolv -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc
    perllibs=-lresolv -lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
    libc=/lib/libc-2.7.90.so, so=so, useshrplib=true, libperl=libperl.so
    gnulibc_version='2.7.90'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E -Wl,-rpath,/usr/lib/perl5/5.10.0/i386-linux-thread-multi/CORE'
    cccdlflags='-fPIC', lddlflags='-shared -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m32 -march=i386 -mtune=generic -fasynchronous-unwind-tables -L/usr/local/lib'

Locally applied patches:
    


@INC for perl 5.10.0:
    /home/eda/lib/perl5/5.10.0/i386-linux-thread-multi
    /home/eda/lib/perl5/5.10.0
    /home/eda/lib/perl5/site_perl/5.10.0/i386-linux-thread-multi
    /home/eda/lib/perl5/site_perl/5.10.0
    /usr/lib/perl5/site_perl/5.10.0/i386-linux-thread-multi
    /usr/lib/perl5/site_perl/5.10.0/i386-linux-thread-multi
    /usr/lib/perl5/site_perl/5.10.0
    /usr/lib/perl5/vendor_perl/5.10.0/i386-linux-thread-multi
    /usr/lib/perl5/vendor_perl/5.10.0/i386-linux-thread-multi
    /usr/lib/perl5/vendor_perl/5.10.0
    /usr/lib/perl5/5.10.0/i386-linux-thread-multi
    /usr/lib/perl5/5.10.0
    /usr/lib/perl5/site_perl/5.10.0/i386-linux-thread-multi
    /usr/lib/perl5/site_perl/5.10.0
    /usr/lib/perl5/site_perl/5.8.8
    /usr/lib/perl5/site_perl/5.8.7
    /usr/lib/perl5/site_perl/5.8.6
    /usr/lib/perl5/site_perl/5.8.5
    /usr/lib/perl5/site_perl
    /usr/lib/perl5/vendor_perl/5.10.0/i386-linux-thread-multi
    /usr/lib/perl5/vendor_perl/5.10.0
    /usr/lib/perl5/vendor_perl/5.8.8
    /usr/lib/perl5/vendor_perl/5.8.7
    /usr/lib/perl5/vendor_perl/5.8.6
    /usr/lib/perl5/vendor_perl/5.8.5
    /usr/lib/perl5/vendor_perl
    .


Environment for perl 5.10.0:
    HOME=/home/eda
    LANG=en_GB.UTF-8
    LANGUAGE (unset)
    LC_COLLATE=C
    LC_CTYPE=en_GB.UTF-8
    LC_MESSAGES=en_GB.UTF-8
    LC_MONETARY=en_GB.UTF-8
    LC_NUMERIC=en_GB.UTF-8
    LC_TIME=en_GB.UTF-8
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/eda/bin:/usr/lib/qt-3.3/bin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin
    PERL5LIB=/home/eda/lib/perl5/5.10.0:/home/eda/lib/perl5/site_perl/5.10.0:/usr/lib/perl5/site_perl/5.10.0/i386-linux-thread-multi:/usr/lib/perl5/site_perl/5.10.0:/usr/lib/perl5/vendor_perl/5.10.0/i386-linux-thread-multi:/usr/lib/perl5/vendor_perl/5.10.0
    PERL_BADLANG (unset)
    SHELL=/bin/bash



p5pRT commented Jun 26, 2008

From @pjf

G'day Ed / p5p,

Ed Avis (via RT) wrote:

But this is obscure. The special treatment of space characters should
be there in the main glob() documentation, and it doesn't need to hark
back to the days of perl 5.6. It could mention that
File::Glob::bsd_glob is a safe alternative that won't trip up on
spaces.

I agree! I've attached a git-ish patch that adds to the documentation.

Ideally, a space character in the pattern passed to glob() would be a
warning.

I disagree here, since I use glob splitting on spaces as a feature. I
regularly find myself writing glob('*.c *.h') and similar space-separated
patterns.
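
As a rough illustration of the usage described here, the sketch below runs the space-separated form next to a single pattern that uses csh-style brace alternation, which avoids whitespace altogether. The scratch directory and file names are hypothetical.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

# Scratch directory with hypothetical source files.
my $old_cwd = getcwd();
my $dir     = tempdir(CLEANUP => 1);
chdir $dir or die "chdir $dir: $!";
for my $name ('main.c', 'util.h', 'notes.txt') {
    open my $fh, '>', $name or die "open $name: $!";
    close $fh;
}

# Two whitespace-separated patterns passed as one argument.
my @by_space = glob('*.c *.h');

# One pattern using brace alternation; no whitespace involved.
my @by_brace = glob('*.{c,h}');

print "by space: @by_space\n";
print "by brace: @by_brace\n";

chdir $old_cwd or die "chdir $old_cwd: $!";
```

Both calls should pick up main.c and util.h and skip notes.txt.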

Cheerio,

  Paul

--
Paul Fenwick <pjf@perltraining.com.au> | http://perltraining.com.au/
Director of Training | Ph: +61 3 9354 6001
Perl Training Australia | Fax: +61 3 9354 2681


p5pRT commented Jun 26, 2008

From @pjf

0001-Expanded-glob-documentation-with-whitespace-behaviou.patch
From a527815a4fc7d6d858bf4e8e06ebff7696b76c88 Mon Sep 17 00:00:00 2001
From: Paul Fenwick <pjf@perltraining.com.au>
Date: Thu, 26 Jun 2008 11:39:59 +1000
Subject: [PATCH] Expanded glob documentation with whitespace behaviour.

	* Mentions glob splits on whitespace.
	* Provided some examples.
	* Mentioned bsd_glob as an alternative.
---
 pod/perlfunc.pod |    8 +++++++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/pod/perlfunc.pod b/pod/perlfunc.pod
index 864699d..5dafe1a 100644
--- a/pod/perlfunc.pod
+++ b/pod/perlfunc.pod
@@ -2348,8 +2348,14 @@ implementing the C<< <*.c> >> operator, but you can use it directly. If
 EXPR is omitted, C<$_> is used.  The C<< <*.c> >> operator is discussed in
 more detail in L<perlop/"I/O Operators">.
 
+Note that C<glob> will split its arguments on whitespace, treating
+each segment as separate pattern.  As such, C<glob('*.c *.h')> would
+match all files with a F<.c> or F<.h> extension.  The expression
+C<glob('.* *')> would match all files in the current working directory.
+
 Beginning with v5.6.0, this operator is implemented using the standard
-C<File::Glob> extension.  See L<File::Glob> for details.
+C<File::Glob> extension.  See L<File::Glob> for details, including
+C<bsd_glob> which does not treat whitespace as a pattern separator.
 
 =item gmtime EXPR
 X<gmtime> X<UTC> X<Greenwich>
-- 
1.5.2.2


p5pRT commented Jun 26, 2008

The RT System itself - Status changed from 'new' to 'open'


p5pRT commented Jun 26, 2008

From ben@morrow.me.uk

Quoth pjf@perltraining.com.au (Paul Fenwick):

Ed Avis (via RT) wrote:

But this is obscure. The special treatment of space characters should
be there in the main glob() documentation, and it doesn't need to hark
back to the days of perl 5.6. It could mention that
File::Glob::bsd_glob is a safe alternative that won't trip up on
spaces.

I agree! I've attached a git-ish patch that adds to the documentation.

Since it recently came up on clpmisc :), it's probably worth mentioning
that glob omits files beginning with '.' on all OSen, even those which
don't treat such files as 'hidden'. Also, a fuller description of the
patterns used (rather than 'it's what csh does') might be useful, since
a lot of Perl programmers won't be familiar with csh nowadays
(thankfully :) ). I realise this is all documented in File::Glob, but
at least an explicit mention that that is the place to look for the
pattern spec would be useful.

Ben

--
  Outside of a dog, a book is a man's best friend.
  Inside of a dog, it's too dark to read.
ben@morrow.me.uk Groucho Marx


p5pRT commented Jun 26, 2008

From @rgs

2008/6/26 Paul Fenwick <pjf@perltraining.com.au>:

G'day Ed / p5p,

Ed Avis (via RT) wrote:

But this is obscure. The special treatment of space characters should
be there in the main glob() documentation, and it doesn't need to hark
back to the days of perl 5.6. It could mention that
File::Glob::bsd_glob is a safe alternative that won't trip up on
spaces.

I agree! I've attached a git-ish patch that adds to the documentation.

Thanks, applied.


p5pRT commented Jun 26, 2008

From @pjf

G'day Ben,

Ben Morrow wrote:

Since it recently came up on clpmisc :), it's probably worth mentioning
that glob omits files beginning with '.' on all OSen, even those which
don't treat such files as 'hidden'.

I did try to allude to that in my patch, with the example of glob(".* *")
matching everything in a directory.
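
A small sketch of the dot-file behaviour under discussion, using a scratch directory and hypothetical file names. Note that the '.*' pattern can also return '.' and '..', so callers usually filter those out.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

# Scratch directory with one dot-file and one ordinary file
# (both names are hypothetical).
my $old_cwd = getcwd();
my $dir     = tempdir(CLEANUP => 1);
chdir $dir or die "chdir $dir: $!";
for my $name ('.hidden', 'visible.txt') {
    open my $fh, '>', $name or die "open $name: $!";
    close $fh;
}

# '*' on its own skips names that begin with a dot.
my @plain = glob('*');

# Two patterns in one argument: '.*' for dot-files, '*' for the rest.
# Drop the '.' and '..' entries if '.*' returned them.
my @all = grep { $_ ne '.' && $_ ne '..' } glob('.* *');

print "plain: @plain\n";
print "all:   @all\n";

chdir $old_cwd or die "chdir $old_cwd: $!";
```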

a lot of Perl programmers won't be familiar with csh nowadays

"What would csh do?"

(thankfully :) ). I realise this is all documented in File::Glob, but
at least an explicit mention that that is the place to look for the
pattern spec would be useful.

Any wording suggestions? ;)

Cheerio,

  Paul

--
Paul Fenwick <pjf@perltraining.com.au> | http://perltraining.com.au/
Director of Training | Ph: +61 3 9354 6001
Perl Training Australia | Fax: +61 3 9354 2681


p5pRT commented Jun 26, 2008

From @Tux

On Thu, 26 Jun 2008 18:17:46 +1000, Paul Fenwick
<pjf@perltraining.com.au> wrote:

G'day Ben,

Ben Morrow wrote:

Since it recently came up on clpmisc :), it's probably worth mentioning
that glob omits files beginning with '.' on all OSen, even those which
don't treat such files as 'hidden'.

I did try to allude to that in my patch, with the example of glob(".* *")
matching everything in a directory.

a lot of Perl programmers won't be familiar with csh nowadays

"What would csh do?"

$ touch foo.c baar.o "foo bar.o"
$ ls *.c *.o
baar.o foo bar.o foo.c xx.c
$ csh
% ls *.c *.o
baar.o foo bar.o foo.c xx.c
%

There is nothing more to it than that

(thankfully :) ). I realise this is all documented in File::Glob, but
at least an explicit mention that that is the place to look for the
pattern spec would be useful.

Any wording suggestions? ;)

--
H.Merijn Brand Amsterdam Perl Mongers http://amsterdam.pm.org/
using & porting perl 5.6.2, 5.8.x, 5.10.x, 5.11.x on HP-UX 10.20, 11.00,
11.11, 11.23, and 11.31, SuSE 10.1, 10.2, and 10.3, AIX 5.2, and Cygwin.
http://mirrors.develooper.com/hpux/ http://www.test-smoke.org/
http://qa.perl.org http://www.goldmark.org/jeff/stupid-disclaimers/


p5pRT commented Jun 26, 2008

From @pjf

H.Merijn Brand wrote:

"What would csh do?"

$ touch foo.c baar.o "foo bar.o"
$ ls *.c *.o
baar.o foo bar.o foo.c xx.c

Sorry, that was a poor attempt at humour on my behalf, alluding to the "What
would X do?" meme that occasionally floats about.

Having said that, given that *, ?, [a-z], and {foo,bar,baz} all seem to work
the same way in both csh and bash, I don't actually know what csh does
differently to bash[1].

Cheerio,

  Paul

--
Paul Fenwick <pjf@perltraining.com.au> | http://perltraining.com.au/
Director of Training | Ph: +61 3 9354 6001
Perl Training Australia | Fax: +61 3 9354 2681


p5pRT commented Jun 26, 2008

From @Tux

On Thu, 26 Jun 2008 20:26:52 +1000, Paul Fenwick
<pjf@perltraining.com.au> wrote:

H.Merijn Brand wrote:

"What would csh do?"

$ touch foo.c baar.o "foo bar.o"
$ ls *.c *.o
baar.o foo bar.o foo.c xx.c

Sorry, that was a poor attempt at humour on my behalf, alluding to the "What
would X do?" meme that occasionally floats about.

Having said that, given that *, ?, [a-z], and {foo,bar,baz} all seem to work
the same way in both csh and bash, I don't actually know what csh does
differently to bash[1].

% ls *.{c,o}
baar.o foo bar.o foo.c xx.c

--
H.Merijn Brand Amsterdam Perl Mongers http://amsterdam.pm.org/
using & porting perl 5.6.2, 5.8.x, 5.10.x, 5.11.x on HP-UX 10.20, 11.00,
11.11, 11.23, and 11.31, SuSE 10.1, 10.2, and 10.3, AIX 5.2, and Cygwin.
http://mirrors.develooper.com/hpux/ http://www.test-smoke.org/
http://qa.perl.org http://www.goldmark.org/jeff/stupid-disclaimers/


p5pRT commented Jul 3, 2008

From @epa

Thanks for fixing the doc.

As for glob('*.c *.h'), this convenience could be given just as well by
allowing glob() to take more than one argument:

  glob('*.c', '*.h')

or if you prefer

  glob(qw(*.c *.h))

At least then perl can know you really wanted two separate patterns and
not one pattern with a space in the middle. Spaces in filenames are
really quite common these days (even on Unix-like systems - any Mac
users here?).

Perl programs do tend to suffer bugs in handling filenames with spaces
in, and this is one of the causes. Better to give a warning and
encourage people to use bsd_glob() instead, or use the suggested new
feature of multi-arg glob() if they really did intend two different
patterns.
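
Until something like a multi-argument glob() exists, a similar effect can be sketched with bsd_glob(), expanding each pattern separately. The helper name glob_each and the file names below are hypothetical and are not part of Perl or File::Glob.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Glob qw(bsd_glob);
use File::Temp qw(tempdir);

# Hypothetical helper: expand each pattern on its own with bsd_glob,
# which does not split on whitespace.
sub glob_each {
    my @patterns = @_;
    return map { bsd_glob($_) } @patterns;
}

# Scratch directory with hypothetical files, one with a space in its name.
my $old_cwd = getcwd();
my $dir     = tempdir(CLEANUP => 1);
chdir $dir or die "chdir $dir: $!";
for my $name ('my notes.txt', 'main.c', 'util.h') {
    open my $fh, '>', $name or die "open $name: $!";
    close $fh;
}

# Two separate patterns; the first one contains a literal space.
my @found = glob_each('* notes.txt', '*.c');

print "found: $_\n" for @found;

chdir $old_cwd or die "chdir $old_cwd: $!";
```

Because each list element is its own pattern, the caller's intent (two patterns versus one pattern with a space) is unambiguous.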


p5pRT commented Nov 6, 2011

From @cpansprout

On Thu Jul 03 08:47:45 2008, ed wrote:

Thanks for fixing the doc.

As for glob('*.c *.h'), this convenience could be given just as well by
allowing glob() to take more than one argument:

  glob('*.c', '*.h')

or if you prefer

  glob(qw(*.c *.h))

At least then perl can know you really wanted two separate patterns and
not one pattern with a space in the middle. Spaces in filenames are
really quite common these days (even on Unix-like systems - any Mac
users here?).

Perl programs do tend to suffer bugs in handling filenames with spaces
in, and this is one of the causes. Better to give a warning and
encourage people to use bsd_glob() instead, or use the suggested new
feature of multi-arg glob() if they really did intend two different
patterns.

While I’d like to make File::Glob’s new :bsd_glob export make glob()
accept a list, there are problems with the special magic attached to the
glob keyword that prevent the prototype from changing.

If we can change that with perl 5.16, I’ll go ahead and make :bsd_glob
support that.

I’m going to mark this as resolved, since the glob documentation in
perlfunc has been expanded recently by Tom Christiansen to warn about
spaces more clearly and recommend quotation marks (a91bb7b).
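
For reference, the quotation-mark approach mentioned here can be sketched as follows: the double quotes sit inside the pattern string itself, so core glob() treats the quoted part as a single pattern. The scratch directory and file names are hypothetical.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

# Scratch directory with hypothetical files, one ending in ' x'.
my $old_cwd = getcwd();
my $dir     = tempdir(CLEANUP => 1);
chdir $dir or die "chdir $dir: $!";
for my $name ('a.txt', 'b x') {
    open my $fh, '>', $name or die "open $name: $!";
    close $fh;
}

# The inner double quotes protect the space from being treated
# as a pattern separator.
my @quoted = glob('"* x"');

print "quoted: $_\n" for @quoted;

chdir $old_cwd or die "chdir $old_cwd: $!";
```

With these files, only 'b x' should come back, which is what the original report expected from glob('* x').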

--

Father Chrysostomos


p5pRT commented Nov 6, 2011

@cpansprout - Status changed from 'open' to 'resolved'
