File::Find always re-sorts directories last #14156

Open
p5pRT opened this issue Oct 13, 2014 · 5 comments

p5pRT commented Oct 13, 2014

Migrated from rt.perl.org#122968 (status was 'open')

Searchable as RT122968$

p5pRT commented Oct 13, 2014

From @leonerd

Created by @leonerd

This is a bug report for perl from leonerd@​leonerd.org.uk,
generated with the help of perlbug 1.40 running under perl 5.20.0.

-----------------------------------------------------------------

File::Find, even in depth-first mode, even with a preprocess function, will
always visit non-directories before directories. For example:

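  use File::Find;

  find({
      no_chdir   => 1,
      # Sort each directory listing before File::Find processes it
      preprocess => sub {
          my @ret = sort @_;
          print "Sorted dirs: <@ret>\n";
          return @ret;
      },
      # Report every entry in the order File::Find visits it
      wanted => sub {
          my $filename = $_;
          print "Considering $filename...\n";
      },
  }, "tests");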

will yield the output:

  Considering tests...
  Sorted dirs: <. .. 10apidoc 50register.pl 51displayname.pl 60create-room.pl 61room-message.pl 62room-presence.pl>
  Considering tests/50register.pl...
  Considering tests/51displayname.pl...
  Considering tests/60create-room.pl...
  Considering tests/61room-message.pl...
  Considering tests/62room-presence.pl...
  Considering tests/10apidoc...
  Sorted dirs: <. .. 01register.pl>
  Considering tests/10apidoc/01register.pl...

Specifically, even though I have sorted '10apidoc' before '50register.pl', it
has been reordered to come last because it is a directory, after all of the
non-directories. This behaviour is implemented in the logic around here:

  https://metacpan.org/source/SHAY/perl-5.20.1/ext/File-Find/lib/File/Find.pm#L766

and the later if(-d _)-guarded block involving a splice on this array.

This bug therefore comes in two parts:

1) The documentation of File::Find nowhere mentions that it will do this.

2) There is no control over this behaviour, and therefore no way for me to
  ask it not to happen and to respect my chosen sorting order.
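
As an illustration of the ordering being asked for, here is a minimal sketch of a hand-rolled walker that descends into each directory at the point where it appears in the sorted listing. It does not use File::Find at all; `walk_sorted` is a hypothetical helper name, and the sketch assumes plain `sort` is the desired order and that symlinked directories should not be followed. Unlike find(), it does not report the starting directory itself.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of File::Find): call $wanted->($path) for
# every entry under $dir in plain sorted order, descending into each
# directory at the point where it appears in that order.
sub walk_sorted {
    my ($dir, $wanted) = @_;
    opendir(my $dh, $dir) or do { warn "Cannot open $dir: $!"; return };
    my @names = sort grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;

    for my $name (@names) {
        my $path = "$dir/$name";
        $wanted->($path);
        # Recurse into real directories only; skip symlinks to directories
        walk_sorted($path, $wanted) if -d $path && !-l $path;
    }
    return;
}

# Usage, assuming a "tests" directory exists in the current directory
walk_sorted("tests", sub { print "Considering $_[0]...\n" }) if -d "tests";
```

With the layout from the report, this would consider tests/10apidoc and its contents before tests/50register.pl, at the cost of losing everything else File::Find provides.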

Perl Info

Flags:
    category=library
    severity=low
    module=File::Find

Site configuration information for perl 5.20.0:

Configured by Debian Project at Sat Aug 30 06:38:34 UTC 2014.

Summary of my perl5 (revision 5 version 20 subversion 0) configuration:
   
  Platform:
    osname=linux, osvers=3.2.0-4-amd64, archname=x86_64-linux-gnu-thread-multi
    uname='linux brahms 3.2.0-4-amd64 #1 smp debian 3.2.60-1+deb7u3 x86_64 gnulinux '
    config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -D_FORTIFY_SOURCE=2 -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Dldflags= -Wl,-z,relro -Dlddlflags=-shared -Wl,-z,relro -Dcccdlflags=-fPIC -Darchname=x86_64-linux-gnu -Dprefix=/usr -Dprivlib=/usr/share/perl/5.20 -Darchlib=/usr/lib/x86_64-linux-gnu/perl/5.20 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/x86_64-linux-gnu/perl5/5.20 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.20.0 -Dsitearch=/usr/local/lib/x86_64-linux-gnu/perl/5.20.0 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Duse64bitint -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Ud_ualarm -Uusesfio -Uusenm -Ui_libutil -Uversiononly -DDEBUGGING=-g -Doptimize=-O2 -Duseshrplib -Dlibperl=libperl.so.5.20.0 -des'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fwrapv -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2 -g',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fwrapv -fno-strict-aliasing -pipe -I/usr/local/include'
    ccversion='', gccversion='4.9.1', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
    libpth=/usr/local/lib /usr/lib/gcc/x86_64-linux-gnu/4.9/include-fixed /usr/include/x86_64-linux-gnu /usr/lib /lib/x86_64-linux-gnu /lib/../lib /usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib
    libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
    perllibs=-ldl -lm -lpthread -lc -lcrypt
    libc=libc-2.19.so, so=so, useshrplib=true, libperl=libperl.so.5.20.0
    gnulibc_version='2.19'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib -fstack-protector'

Locally applied patches:
    DEBPKG:debian/cpan_definstalldirs - Provide a sensible INSTALLDIRS default for modules installed from CPAN.
    DEBPKG:debian/db_file_ver - http://bugs.debian.org/340047 Remove overly restrictive DB_File version check.
    DEBPKG:debian/doc_info - Replace generic man(1) instructions with Debian-specific information.
    DEBPKG:debian/enc2xs_inc - http://bugs.debian.org/290336 Tweak enc2xs to follow symlinks and ignore missing @INC directories.
    DEBPKG:debian/errno_ver - http://bugs.debian.org/343351 Remove Errno version check due to upgrade problems with long-running processes.
    DEBPKG:debian/libperl_embed_doc - http://bugs.debian.org/186778 Note that libperl-dev package is required for embedded linking
    DEBPKG:fixes/respect_umask - Respect umask during installation
    DEBPKG:debian/writable_site_dirs - Set umask approproately for site install directories
    DEBPKG:debian/extutils_set_libperl_path - EU:MM: set location of libperl.a under /usr/lib
    DEBPKG:debian/no_packlist_perllocal - Don't install .packlist or perllocal.pod for perl or vendor
    DEBPKG:debian/prefix_changes - Fiddle with *PREFIX and variables written to the makefile
    DEBPKG:debian/fakeroot - Postpone LD_LIBRARY_PATH evaluation to the binary targets.
    DEBPKG:debian/instmodsh_doc - Debian policy doesn't install .packlist files for core or vendor.
    DEBPKG:debian/ld_run_path - Remove standard libs from LD_RUN_PATH as per Debian policy.
    DEBPKG:debian/libnet_config_path - Set location of libnet.cfg to /etc/perl/Net as /usr may not be writable.
    DEBPKG:debian/mod_paths - Tweak @INC ordering for Debian
    DEBPKG:debian/module_build_man_extensions - http://bugs.debian.org/479460 Adjust Module::Build manual page extensions for the Debian Perl policy
    DEBPKG:debian/prune_libs - http://bugs.debian.org/128355 Prune the list of libraries wanted to what we actually need.
    DEBPKG:fixes/net_smtp_docs - [rt.cpan.org #36038] http://bugs.debian.org/100195 Document the Net::SMTP 'Port' option
    DEBPKG:debian/perlivp - http://bugs.debian.org/510895 Make perlivp skip include directories in /usr/local
    DEBPKG:debian/deprecate-with-apt - http://bugs.debian.org/747628 Point users to Debian packages of deprecated core modules
    DEBPKG:debian/squelch-locale-warnings - http://bugs.debian.org/508764 Squelch locale warnings in Debian package maintainer scripts
    DEBPKG:debian/skip-upstream-git-tests - Skip tests specific to the upstream Git repository
    DEBPKG:debian/patchlevel - http://bugs.debian.org/567489 List packaged patches for 5.20.0-6 in patchlevel.h
    DEBPKG:debian/skip-kfreebsd-crash - http://bugs.debian.org/628493 [perl #96272] Skip a crashing test case in t/op/threads.t on GNU/kFreeBSD
    DEBPKG:fixes/document_makemaker_ccflags - http://bugs.debian.org/628522 [rt.cpan.org #68613] Document that CCFLAGS should include $Config{ccflags}
    DEBPKG:debian/find_html2text - http://bugs.debian.org/640479 Configure CPAN::Distribution with correct name of html2text
    DEBPKG:debian/hurd_test_skip_stack - http://bugs.debian.org/650175 Disable failing GNU/Hurd tests dist/threads/t/stack.t
    DEBPKG:debian/perl5db-x-terminal-emulator.patch - http://bugs.debian.org/668490 Invoke x-terminal-emulator rather than xterm in perl5db.pl
    DEBPKG:debian/cpan-missing-site-dirs - http://bugs.debian.org/688842 Fix CPAN::FirstTime defaults with nonexisting site dirs if a parent is writable
    DEBPKG:fixes/memoize_storable_nstore - [rt.cpan.org #77790] http://bugs.debian.org/587650 Memoize::Storable: respect 'nstore' option not respected
    DEBPKG:debian/regen-skip - Skip a regeneration check in unrelated git repositories
    DEBPKG:debian/regcomp-mips-optim - http://bugs.debian.org/754054 Downgrade the optimization of regcomp.c on mips and mipsel due to a gcc-4.9 bug
    DEBPKG:debian/makemaker-pasthru - http://bugs.debian.org/758471 Pass LD settings through to subdirectories
    DEBPKG:debian/perldoc-less-R - http://bugs.debian.org/758689 Tell the 'less' pager to allow terminal escape sequences


@INC for perl 5.20.0:
    /home/leo/lib/perl5/x86_64-linux-gnu-thread-multi
    /home/leo/lib/perl5
    /etc/perl
    /usr/local/lib/x86_64-linux-gnu/perl/5.20.0
    /usr/local/share/perl/5.20.0
    /usr/lib/x86_64-linux-gnu/perl5/5.20
    /usr/share/perl5
    /usr/lib/x86_64-linux-gnu/perl/5.20
    /usr/share/perl/5.20
    /usr/local/lib/site_perl
    .


Environment for perl 5.20.0:
    HOME=/home/leo
    LANG=en_GB.utf8
    LANGUAGE=en_GB:en
    LD_LIBRARY_PATH=/home/leo/lib
    LOGDIR (unset)
    PATH=/home/leo/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
    PERL5LIB=/home/leo/lib/perl5
    PERL_BADLANG (unset)
    PERL_MB_OPT=--install_base=/home/leo
    PERL_MM_OPT=INSTALL_BASE=/home/leo
    SHELL=/bin/bash

p5pRT commented Oct 14, 2014

From @jkeenan

I suspect that the problem lies in this part of ext/File-Find/lib/File/Find.pm​:

#####
sub _find_dir($$$) {
    my ($wanted, $p_dir, $nlink) = @_;
...
    @filenames = readdir DIR;
    closedir(DIR);
    @filenames = $pre_process->(@filenames) if $pre_process;
...
    else {
        # This dir has subdirectories.
        $subcount = $nlink - 2;

        # HACK: insert directories at this position. so as to preserve
        # the user pre-processed ordering of files.
        # EG: directory traversal is in user sorted order, not at random.
        my $stack_top = @Stack;

        for my $FN (@filenames) {
            next if $FN =~ $File::Find::skip_pattern;
            if ($subcount > 0 || $no_nlink) {
                # Seen all the subdirs?
                # check for directoriness.
                # stat is faster for a file in the current directory
                $sub_nlink = (lstat ($no_chdir ? $dir_pref . $FN : $FN))[3];

                if (-d _) {
                    --$subcount;
                    $FN =~ s/\.dir\z//i if $Is_VMS;
                    # HACK: replace push to preserve dir traversal order
                    #push @Stack,[$CdLvl,$dir_name,$FN,$sub_nlink];
                    splice @Stack, $stack_top, 0,
                        [$CdLvl,$dir_name,$FN,$sub_nlink];
                }
                else {
                    $name = $dir_pref . $FN; # $File::Find::name
                    $_= ($no_chdir ? $name : $FN); # $_
                    { $wanted_callback->() }; # protect against wild "next"
                }
            }
            else {
                $name = $dir_pref . $FN; # $File::Find::name
                $_= ($no_chdir ? $name : $FN); # $_
                { $wanted_callback->() }; # protect against wild "next"
            }
        }
    }
...
#####
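
The effect of this loop on the reported case can be modelled in a few lines. What follows is a simplified stand-in, not the module's code: `model_visit_order` and the hash-based tree are hypothetical, and the sketch only mimics the two branches of the excerpt (non-directories are handed to the callback immediately, directories are deferred onto a stack and handled after the loop finishes).

```perl
use strict;
use warnings;

# Simplified model of the loop in the excerpt. A directory is a hash ref
# (name => contents); a plain file is any non-reference value.
sub model_visit_order {
    my ($root_name, $root) = @_;
    my @visited;
    my @stack = ([$root_name, $root]);

    while (my $entry = pop @stack) {
        my ($dir_name, $dir) = @$entry;
        push @visited, $dir_name;      # the directory itself is reported first
        my $stack_top = @stack;
        for my $fn (sort keys %$dir) { # sort stands in for the preprocess hook
            my $path = "$dir_name/$fn";
            if (ref $dir->{$fn}) {
                # Directory: deferred, as with the splice onto @Stack
                splice @stack, $stack_top, 0, [$path, $dir->{$fn}];
            }
            else {
                # Non-directory: reported straight away
                push @visited, $path;
            }
        }
    }
    return @visited;
}

my %tests = (
    '10apidoc'         => { '01register.pl' => 1 },
    '50register.pl'    => 1,
    '51displayname.pl' => 1,
);
print "Considering $_...\n" for model_visit_order('tests', \%tests);
```

In this model, tests/10apidoc is reported after both .pl files even though it sorts first, which is the same pattern as in the report.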

This code was added in 2003 in what is now commit 7bd3152 in response to RT #22195 (see attachment).

That's as far as I'll be able to analyze this tonight.

Thank you very much.
--
James E Keenan (jkeenan@​cpan.org)

p5pRT commented Oct 14, 2014

From @jkeenan

rt122968.7bd31527.diff
commit 7bd31527512b13192dec55ed4943599edc17b94a
Author: Jarkko Hietaniemi <jhi@iki.fi>
Date:   Fri May 16 17:56:06 2003 +0000

    Apply the supplied patch for [perl #22195]
    "File::Find, sorted directory traversal order is inverted"
    
    p4raw-id: //depot/perl@19531

diff --git a/lib/File/Find.pm b/lib/File/Find.pm
index c5f2a5a..fc09503 100644
--- a/lib/File/Find.pm
+++ b/lib/File/Find.pm
@@ -7,6 +7,12 @@ our $VERSION = '1.04';
 require Exporter;
 require Cwd;
 
+#
+# Modified to ensure sub-directory traversal order is not inverded by stack
+# push and pops.  That is remains in the same order as in the directory file,
+# or user pre-processing (EG:sorted).
+#
+
 =head1 NAME
 
 File::Find - Traverse a directory tree.
@@ -855,6 +861,11 @@ sub _find_dir($$$) {
 	    # This dir has subdirectories.
 	    $subcount = $nlink - 2;
 
+	    # HACK: insert directories at this position. so as to preserve
+	    # the user pre-processed ordering of files.
+	    # EG: directory traversal is in user sorted order, not at random.
+            my $stack_top = @Stack;
+
 	    for my $FN (@filenames) {
 		next if $FN =~ $File::Find::skip_pattern;
 		if ($subcount > 0 || $no_nlink) {
@@ -866,7 +877,10 @@ sub _find_dir($$$) {
 		    if (-d _) {
 			--$subcount;
 			$FN =~ s/\.dir\z// if $Is_VMS;
-			push @Stack,[$CdLvl,$dir_name,$FN,$sub_nlink];
+			# HACK: replace push to preserve dir traversal order
+			#push @Stack,[$CdLvl,$dir_name,$FN,$sub_nlink];
+			splice @Stack, $stack_top, 0,
+			         [$CdLvl,$dir_name,$FN,$sub_nlink];
 		    }
 		    else {
 			$name = $dir_pref . $FN; # $File::Find::name
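
To see what the patch changes, consider a stack that is filled in sorted order and then drained with pop. The sketch below is a standalone illustration with made-up directory names, assuming (as the commit title suggests) that entries are taken off the end of the stack; it is not taken from the module.

```perl
use strict;
use warnings;

my @dirs = qw(a b c);    # sub-directories in the user's sorted order

# Before the patch: push each directory, then pop to traverse.
my @pushed;
push @pushed, $_ for @dirs;
my @push_order;
while (defined(my $d = pop @pushed)) { push @push_order, $d }

# After the patch: insert each directory at the position the stack top
# had before the loop, so later entries end up deeper in the stack.
my @spliced;
my $stack_top = @spliced;
splice @spliced, $stack_top, 0, $_ for @dirs;
my @splice_order;
while (defined(my $d = pop @spliced)) { push @splice_order, $d }

print "push:   @push_order\n";     # c b a
print "splice: @splice_order\n";   # a b c
```

Either way the directories are only handled once the loop over the listing has finished, so the patch affects the order among directories, not their position relative to non-directories.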

p5pRT commented Oct 14, 2014

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented Oct 14, 2014

From @leonerd

On Mon, 13 Oct 2014 18:51:03 -0700
"James E Keenan via RT" <perlbug-followup@perl.org> wrote:

> This code was added in 2003 in what is now commit 7bd3152 in
> response to RT #22195 (see attachment).
>
> That's as far as I'll be able to analyze this tonight.

Well, not even. RT #22195 was a bug in the already-existing logic
that moved sub-directories to the end of the sort order. The moving
itself already existed at that time.

--
Paul "LeoNerd" Evans

leonerd@​leonerd.org.uk
http://www.leonerd.org.uk/ | https://metacpan.org/author/PEVANS
