Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

File::Basename behavior is misleading #6521

Closed
p5pRT opened this issue May 18, 2003 · 8 comments
Closed

File::Basename behavior is misleading #6521

p5pRT opened this issue May 18, 2003 · 8 comments

Comments

@p5pRT
Copy link

p5pRT commented May 18, 2003

Migrated from rt.perl.org#22236 (status was 'resolved')

Searchable as RT22236$

@p5pRT
Copy link
Author

p5pRT commented May 18, 2003

From frederik@ugcs.caltech.edu

Created by frederik@ugcs.caltech.edu

The File​::Basename man page suggests that the basename and dirname
functions are compatible with the Unix shell commands. It is not
important for most purposes that these functions match the commands
exactly, however, for them to be useful for certain things, it is
important to preserve the invariant that $f and
dirname($f)."/".basename($f) refer to the same file, i.e. that when
the two functions are used together, no information about the file
location is lost. This invariant seems to hold for the Unix shell
commands, but not for the File​::Basename versions​:

$ dirname a/
.
$ basename a/
a
$ perl -e 'use File​::Basename; print basename("a/"),"\n"; print dirname("a/"),"\n";'

.
$

Here $f="a/", and dirname($f)."/".basename($f)="./".

Perl Info

Flags:
    category=library
    severity=medium

Site configuration information for perl v5.8.0:

Configured by Debian Project at Mon Feb 17 13:43:44 UTC 2003.

Summary of my perl5 (revision 5.0 version 8 subversion 0) configuration:
  Platform:
    osname=linux, osvers=2.4.19-powerpc-smp, archname=powerpc-linux-thread-multi
    uname='linux voltaire 2.4.19-powerpc-smp #1 smp mon sep 9 09:11:02 edt 2002 ppc unknown unknown gnulinux '
    config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -Dcccdlflags=-fPIC -Darchname=powerpc-linux -Dprefix=/usr -Dprivlib=/usr/share/perl/5.8.0 -Darchlib=/usr/lib/perl/5.8.0 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.8.0 -Dsitearch=/usr/local/lib/perl/5.8.0 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Uusesfio -Uusenm -Duseshrplib -Dlibperl=libperl.so.5.8.0 -Dd_dosuid -des'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fno-strict-aliasing -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O3',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fno-strict-aliasing'
    ccversion='', gccversion='3.2.3 20030210 (Debian prerelease)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=4321
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=8
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lgdbm -ldb -ldl -lm -lpthread -lc -lcrypt
    perllibs=-ldl -lm -lpthread -lc -lcrypt
    libc=/lib/libc-2.3.1.so, so=so, useshrplib=true, libperl=libperl.so.5.8.0
    gnulibc_version='2.3.1'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:
    


@INC for perl v5.8.0:
    /etc/perl
    /usr/local/lib/perl/5.8.0
    /usr/local/share/perl/5.8.0
    /usr/lib/perl5
    /usr/share/perl5
    /usr/lib/perl/5.8.0
    /usr/share/perl/5.8.0
    /usr/local/lib/site_perl
    .


Environment for perl v5.8.0:
    HOME=/home/frederik
    LANG=C
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/frederik/bin:/usr/X11R6/bin:/usr/local/bin:/usr/bin:/bin:/home/frederik/sbin:/usr/X11R6/sbin:/usr/local/sbin:/usr/sbin:/sbin
    PERL_BADLANG (unset)
    SHELL=/bin/bash

@p5pRT
Copy link
Author

p5pRT commented Jul 13, 2003

From sorourke@cs.ucsd.edu

This makes dirname($f)."/".basename($f) point to the same file as $f
when $f ends in "/" (or your platform's directory separator char).
The behavior's now consistent with the shell, i.e.

$f = "/usr/lib//";
basename($f); # => "lib"
dirname($f); # => "/usr"

Tested on Darwin.

/s

@p5pRT
Copy link
Author

p5pRT commented Jul 13, 2003

From sorourke@cs.ucsd.edu

foo.patch
--- /opt/perl/lib/5.8.0/File/Basename.pm.old	Fri Jul 19 19:00:58 2002
+++ /opt/perl/lib/5.8.0/File/Basename.pm	Sun Jul  6 20:57:48 2003
@@ -104,6 +104,11 @@ The basename() routine returns the first
 by calling fileparse() with the same arguments, except that it always
 quotes metacharacters in the given suffixes.  It is provided for
 programmer compatibility with the Unix shell command basename(1).
+Like the dirname() routine, basename() emulates the Unix behavior,
+which may not be consistent with fileparse.  For example, for the
+input file specification F<lib/>, fileparse() returns the empty string
+as the file's basename, while basename() returns F<lib> (without the
+trailing slash(es)).
 
 =item C<dirname>
 
@@ -231,9 +236,19 @@ sub fileparse {
 
 
 #   basename() - returns first element of list returned by fileparse()
+#       on non-UNIX/Win/DOS systems, otherwise returns the result of
+#       basename(1) (see below
 
 sub basename {
   my($name) = shift;
+    my($fstype) = $Fileparse_fstype;
+
+    if ($fstype =~ /MS(DOS|Win32)|os2/i) { 
+        $name =~ s/([^:])[\\\/]*\z/$1/;
+    }
+    elsif ($fstype !~ /MacOS|VMS|AmigaOS/i) {
+        $name =~ s:(.)/*\z:$1:s;
+    }
   (fileparse($name, map("\Q$_\E",@_)))[0];
 }
 

@p5pRT
Copy link
Author

p5pRT commented Jul 6, 2005

From @schwern

[sorourke@​cs.ucsd.edu - Sun Jul 13 12​:04​:02 2003]​:

This makes dirname($f)."/".basename($f) point to the same file as $f
when $f ends in "/" (or your platform's directory separator char).
The behavior's now consistent with the shell, i.e.

$f = "/usr/lib//";
basename($f); # => "lib"
dirname($f); # => "/usr"

My only issue with this patch is it contradicts the existing docs...

  The basename() routine returns the first element of the list pro-
  duced by calling fileparse() with the same arguments, except that
  it always quotes metacharacters in the given suffixes.

Except the existing docs contradict itself.

  It is provided for programmer compatibility with the Unix
  shell command basename(1).

So EITHER it works the same as fileparse() OR it works the same as
basename(1). Given the recent discussion about dirname()'s behavior
(see 36435) concluded that its more important to be compatible with the
shell, I think this change should go in.

Attached is a patch to make basename() strip trailing path seperators
before processing as well as docs to that effect and more tests. I've
also documented the "dirname($path) . '/' . basename($path)"
assertation. It also consolidates all the trailing path seperator
stripping into one function so we don't have repeat the code in
dirname() and basename().

@p5pRT
Copy link
Author

p5pRT commented Jul 6, 2005

From @schwern

fb.patch
--- lib/File/Basename.pm	2005/07/06 16:11:06	1.3
+++ lib/File/Basename.pm	2005/07/06 19:30:56
@@ -22,6 +22,13 @@
 the shell and C functions of the same name.  See each function's documention
 for details.
 
+It is guaranteed that
+
+    # Where $path_separator is / for Unix, \ for Windows, etc...
+    dirname($path) . $path_separator . basename($path);
+
+is equivalent to the original path for all systems but VMS.
+
 =cut
 
 
@@ -172,21 +179,32 @@
     my $filename = basename($path);
     my $filename = basename($path, @suffixes);
 
-C<basename()> works just like C<fileparse()> in scalar context - you only get
-the $filename - except that it always quotes metacharacters in the @suffixes.
+This function is provided for compatibility with the Unix shell command 
+C<basename(1)>.  It does B<NOT> always return the file name portion of a
+path as you might expect.  To be safe, if you want the file name portion of
+a path use C<fileparse()>.
+
+C<basename()> returns the last level of a filepath even if the last
+level is clearly directory.  In effect, its acting like C<pop()> for
+paths.  This differs from fileparse()'s behavior.
+
+    # Both return "bar"
+    basename("/foo/bar");
+    basename("/foo/bar/");
+
+@suffixes work as in fileparse() except all regex metacharacters are
+quoted.
 
     # These two function calls are equivalent.
     my $filename = basename("/foo/bar/baz.txt",  ".txt");
     my $filename = fileparse("/foo/bar/baz.txt", qr/\Q.txt\E/);
 
-This function is provided for compatibility with the Unix shell command 
-C<basename(1)>.
-
 =cut
 
 
 sub basename {
   my($name) = shift;
+  _strip_trailing_sep($name);
   (fileparse($name, map("\Q$_\E",@_)))[0];
 }
 
@@ -251,16 +269,16 @@
     }
     elsif ($type eq 'MacOS') {
 	if( !length($basename) && $dirname !~ /^[^:]+:\z/) {
-	    $dirname =~ s/([^:]):\z/$1/s;
+            _strip_trailing_sep($dirname);
 	    ($basename,$dirname) = fileparse $dirname;
 	}
 	$dirname .= ":" unless $dirname =~ /:\z/;
     }
     elsif (grep { $type eq $_ } qw(MSDOS DOS MSWin32 OS2)) { 
-        $dirname =~ s/([^:])[\\\/]*\z/$1/;
+        _strip_trailing_sep($dirname);
         unless( length($basename) ) {
 	    ($basename,$dirname) = fileparse $dirname;
-	    $dirname =~ s/([^:])[\\\/]*\z/$1/;
+	    _strip_trailing_sep($dirname);
 	}
     }
     elsif ($type eq 'AmigaOS') {
@@ -269,14 +287,30 @@
         $dirname =~ s#[^:/]+\z## unless length($basename);
     }
     else {
-        $dirname =~ s{(.)/*\z}{$1}s;
+        _strip_trailing_sep($dirname);
         unless( length($basename) ) {
 	    ($basename,$dirname) = fileparse $dirname;
-	    $dirname =~ s{(.)/*\z}{$1}s;
+	    _strip_trailing_sep($dirname);
 	}
     }
 
     $dirname;
+}
+
+
+# Strip the trailing path separator.
+sub _strip_trailing_sep  {
+    my $type = $Fileparse_fstype;
+
+    if ($type eq 'MacOS') {
+        $_[0] =~ s/([^:]):\z/$1/s;
+    }
+    elsif (grep { $type eq $_ } qw(MSDOS DOS MSWin32 OS2)) { 
+        $_[0] =~ s/([^:])[\\\/]*\z/$1/;
+    }
+    else {
+        $_[0] =~ s{(.)/*\z}{$1}s;
+    }
 }
 
 
--- lib/File/Basename.t	2005/07/06 16:11:57	1.3
+++ lib/File/Basename.t	2005/07/06 19:24:07
@@ -5,7 +5,7 @@
     @INC = '../lib';
 }
 
-use Test::More tests => 53;
+use Test::More tests => 57;
 
 BEGIN { use_ok 'File::Basename' }
 
@@ -118,6 +118,16 @@
     is(dirname('/perl/'), '/');
     # perl5.003_18 gives '/perl/lib'
     is(dirname('/perl/lib//'), '/perl');
+}
+
+### rt.perl.org 22236
+{
+    is(basename('a/'), 'a');
+    is(basename('/usr/lib//'), 'lib');
+
+    fileparse_set_fstype 'MSWin32';
+    is(basename('a\\'), 'a');
+    is(basename('\\usr\\lib\\\\'), 'lib');
 }
 
 

@p5pRT
Copy link
Author

p5pRT commented Jul 7, 2005

From @steve-m-hay

Michael G Schwern via RT wrote​:

Attached is a patch to make basename() strip trailing path seperators
before processing as well as docs to that effect and more tests. I've
also documented the "dirname($path) . '/' . basename($path)"
assertation. It also consolidates all the trailing path seperator
stripping into one function so we don't have repeat the code in
dirname() and basename().

Thanks - applied as change 25090 (with one or two C<>'s again).


Radan Computational Ltd.

The information contained in this message and any files transmitted with it are confidential and intended for the addressee(s) only. If you have received this message in error or there are any problems, please notify the sender immediately. The unauthorized use, disclosure, copying or alteration of this message is strictly forbidden. Note that any views or opinions presented in this email are solely those of the author and do not necessarily represent those of Radan Computational Ltd. The recipient(s) of this message should check it and any attached files for viruses​: Radan Computational will accept no liability for any damage caused by any virus transmitted by this email.

@p5pRT
Copy link
Author

p5pRT commented Jul 7, 2005

From @steve-m-hay

Patch now applied, as noted above.

@p5pRT
Copy link
Author

p5pRT commented Jul 7, 2005

@steve-m-hay - Status changed from 'open' to 'resolved'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant