File::Find not performing as documented #7913

Closed
p5pRT opened this issue May 17, 2005 · 10 comments

Comments


p5pRT commented May 17, 2005

Migrated from rt.perl.org#35847 (status was 'resolved')

Searchable as RT35847$


p5pRT commented May 17, 2005

From jms@mathras.comcast.net

Created by jms@mathras.comcast.net

This is a bug report for perl from jms@mathras.comcast.net,
generated with the help of perlbug 1.34 running under perl v5.8.3.

-----------------------------------------------------------------

The use of lstat() is not guaranteed in File::Find, contrary
to its documentation.

FAQ 3.4: How do I find which modules are installed on my system?

It shouldn't matter. From "perldoc File::Find" (v. 1.07):

  *     It is guaranteed that an lstat has been called before the
        user's "wanted()" function is called. This enables fast file
        checks involving _.

Not in all cases. lstat() does not always occur in directories
that don't have any subdirectories.

linux% cd ~/bin # Do not run test with . = dir-with-too-many-files
linux% cat ../temp.pl
use File::Find;
$File::Find::dont_use_nlink = $ARGV[0];
my @files;
find sub { push @files, $File::Find::name if -f _ && /\.pm$/ }, @INC;
print join "\n", @files,'';
linux% perl ../temp.pl 0 | wc -l
  434
linux% perl ../temp.pl 1 | wc -l
  943
linux% perl -v
This is perl, v5.8.3 built for i586-linux

linux% diff -u Find.pm.orig Find.pm

Inline Patch
--- Find.pm.orig        2004-02-27 08:31:34.000000000 -0800
+++ Find.pm     2005-05-17 02:39:04.000000000 -0700
@@ -120,8 +120,11 @@
 
 =item *
 
-It is guaranteed that an I<lstat> has been called before the user's
-C<wanted()> function is called. This enables fast file checks involving S< _>.
+Previous versions of File::Find were guaranteed to call an I<lstat>
+before the user's C<wanted()> function was called, but this is no
+longer the case.  Since this depends on File::Find::done_use_nlink, $^O,
+and other factors, fast file checks involving S< _> are not recommended
+unless C<wanted()> calls I<lstat> first.
 
 =item *
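
The patched wording says `_` is only safe when wanted() calls lstat itself. A minimal sketch of that pattern follows, assuming a throwaway directory built with File::Temp instead of @INC so it can run anywhere; the directory layout and file names are illustrative only.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Throwaway tree: one directory holding three plain files, no subdirectories.
my $root = tempdir(CLEANUP => 1);
for my $name (qw(f1.pm f2.pm f3.pm)) {
    open my $fh, '>', "$root/$name" or die "open $root/$name: $!";
    close $fh;
}

my @files;
find(
    sub {
        # Fill the stat cache ourselves rather than relying on File::Find.
        lstat $_ or return;
        push @files, $File::Find::name if -f _ && /\.pm$/;
    },
    $root,
);

print "$_\n" for sort @files;
```

Because the callback performs its own lstat, the result should not depend on the value of `$File::Find::dont_use_nlink`.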
 
Perl Info

Flags:
    category=library
    severity=low

Site configuration information for perl v5.8.3:

Configured by jms at Tue Feb 17 02:18:23 PST 2004.

Summary of my perl5 (revision 5.0 version 8 subversion 3) configuration:
  Platform:
    osname=linux, osvers=2.4.20-28.9, archname=i586-linux
    uname='linux mathras 2.4.20-28.9 #1 thu dec 18 13:46:46 est 2003 i586 i586 i386 gnulinux '
    config_args='-der'
    hint=previous, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-fno-strict-aliasing -I/usr/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm',
    optimize='-O3',
    cppflags='-fno-strict-aliasing -I/usr/include -I/usr/include/gdbm -fno-strict-aliasing -I/usr/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm'
    ccversion='', gccversion='3.2.2 20030222 (Red Hat Linux 3.2.2-5)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/lib'
    libpth=/usr/lib /lib
    libs=-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc
    perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
    libc=/lib/libc-2.3.2.so, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.3.2'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
    cccdlflags='-fpic', lddlflags='-shared -L/usr/lib'

Locally applied patches:
    


@INC for perl v5.8.3:
    /usr/lib/perl5/5.8.3/i586-linux
    /usr/lib/perl5/5.8.3
    /usr/lib/perl5/site_perl/5.8.3/i586-linux
    /usr/lib/perl5/site_perl/5.8.3
    /usr/lib/perl5/site_perl
    .


Environment for perl v5.8.3:
    HOME=/home/jms
    LANG=en_US
    LANGUAGE (unset)
    LANGVAR=en_US
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/usr/bin:/usr/sbin:/bin:/sbin:/usr/local/bin:/usr/local/sbin:/usr/X11R6/bin:/home/jms/bin
    PERL_BADLANG (unset)
    SHELL=/bin/tcsh


p5pRT commented Jul 8, 2005

From @schwern

[jms@mathras.comcast.net - Tue May 17 03:40:07 2005]:

Not in all cases. lstat() does not always occur in directories
that don't have any subdirectories.

linux% cd ~/bin # Do not run test with . = dir-with-too-many-files
linux% cat ../temp.pl
use File::Find;
$File::Find::dont_use_nlink = $ARGV[0];
my @files;
find sub { push @files, $File::Find::name if -f _ && /\.pm$/ }, @INC;
print join "\n", @files,'';
linux% perl ../temp.pl 0 | wc -l
434
linux% perl ../temp.pl 1 | wc -l
943
linux% perl -v
This is perl, v5.8.3 built for i586-linux

I am unable to replicate this problem with 5.8.1, 5.8.6, or
bleadperl@24148. I don't have a 5.8.3 handy to try.


p5pRT commented Jul 8, 2005

The RT System itself - Status changed from 'new' to 'open'


p5pRT commented Jul 10, 2005

From @iabyn

On Fri, Jul 08, 2005 at 10:10:56AM -0000, Michael G Schwern via RT wrote:

I am unable to replicate this problem with either 5.8.1, 5.8.6 or
bleadperl@24148. I don't have a 5.8.3 handy to try.

I can:

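  #!/usr/bin/perl
  use strict;
  use warnings;
  use File::Find;

  # Create a directory that holds only plain files, no subdirectories.
  mkdir 'x', 0777 or die "mkdir x: $!";
  for my $name (qw(f1 f2 f3)) {
      open my $fh, '>', "x/$name" or die "open x/$name: $!";
      close $fh;
  }

  # 0 leaves the nlink shortcut on; 1 forces an lstat of every entry.
  $File::Find::dont_use_nlink = $ARGV[0];
  find sub { printf "%s %s\n", -f _ ? 'file' : 'notf', $File::Find::name }, 'x';
  system 'rm -r x';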

which gives:

  $ ./perl -Ilib /tmp/p1 0
  notf x
  notf x/f2
  notf x/f3
  notf x/f1
  $ ./perl -Ilib /tmp/p1 1
  notf x
  file x/f2
  file x/f3
  file x/f1

It's a problem as far back as 5.003_22 at least.

Basically, when it does the 'nlink check' shortcut on a directory to
determine whether the dir only contains files and no subdirs, it doesn't
bother lstating the individual entries in the dir to determine whether
they're a file or a subdir.

So we either fix the docs, as suggested, or fix the code to stat *every*
entry before calling the wanted function. The latter would defeat the
purpose of the optimisation, so I vote for the former.
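
The shortcut above leans on a property of traditional Unix filesystems: a directory's link count is 2 plus the number of its subdirectories, so a count of exactly 2 suggests there are none. The sketch below is one way to see what the local filesystem reports. `link_count` is a hypothetical helper, not part of File::Find, and the printed numbers depend on the filesystem, since some report directory link counts differently.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: return the hard-link count lstat reports for $path.
sub link_count {
    my ($path) = @_;
    my @st = lstat $path or die "lstat $path: $!";
    return $st[3];    # element 3 of the stat list is nlink
}

my $dir = tempdir(CLEANUP => 1);
my $empty = link_count($dir);

# Add plain files only.
for my $name (qw(f1 f2 f3)) {
    open my $fh, '>', "$dir/$name" or die "open $dir/$name: $!";
    close $fh;
}
my $with_files = link_count($dir);

# Add two subdirectories.
for my $sub (qw(d1 d2)) {
    mkdir "$dir/$sub" or die "mkdir $dir/$sub: $!";
}
my $with_subdirs = link_count($dir);

printf "empty=%d with_files=%d with_subdirs=%d\n",
    $empty, $with_files, $with_subdirs;
```

If the count stays at 2 after adding files and rises only when subdirectories appear, the filesystem has the property the optimisation depends on.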

--
"But Sidley Park is already a picture, and a most amiable picture too.
The slopes are green and gentle. The trees are companionably grouped at
intervals that show them to advantage. The rill is a serpentine ribbon
unwound from the lake peaceably contained by meadows on which the right
amount of sheep are tastefully arranged." -- Lady Croom - Arcadia


p5pRT commented Jul 11, 2005

From @schwern

I still can't reproduce this on OS X.

On Sun, Jul 10, 2005 at 07:31:02PM +0100, Dave Mitchell wrote:

So we either fix the docs, as suggested, or fix the code to stat *every*
entry before calling the wanted function. The latter would defeat the
purpose of the optimisation, so I vote for the former.

I'd be interested to see, given how much work File::Find does anyway,
just how much a performance hit fixing this would be.

--
Michael G Schwern schwern@pobox.com http://www.pobox.com/~schwern
Ahh email, my old friend. Do you know that revenge is a dish that is best
served cold? And it is very cold on the Internet!


p5pRT commented Jul 17, 2005

From @iabyn

On Mon, Jul 11, 2005 at 12:35:36AM -0700, Michael G Schwern wrote:

I still can't reproduce this on OS X.

I guess the filesystem you have doesn't have the nlinks > 2 property.

On Sun, Jul 10, 2005 at 07:31:02PM +0100, Dave Mitchell wrote:

So we either fix the docs, as suggested, or fix the code to stat *every*
entry before calling the wanted function. The latter would defeat the
purpose of the optimisation, so I vote for the former.

I'd be interested to see, given how much work File::Find does anyway,
just how much a performance hit fixing this would be.

It makes a huge difference on my laptop (slow disk):

  use File::Find;
  $File::Find::dont_use_nlink = $ARGV[0];

  my $count=0;
  find(sub { $count++ }, '/usr');
  print "count=$count\n";

running this the first time with arg0 = 0 took about 3 minutes; rerunning
it takes about 10 secs because all the directory reads are cached by the
OS.

Subsequently running with arg0 = 1 (bearing in mind that the directories
are still cached) takes about 6 minutes on both the first and subsequent
runs, presumably because my laptop hasn't got enough free ram to cache
all 300K inodes that have to be read under /usr.

CPU usage is almost zero; the code is completely IO bound.
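
To repeat this kind of comparison without walking /usr, a rough sketch along the following lines may help. `timed_walk` is a hypothetical helper and the tiny temporary tree is an assumption made so the script is safe to run; on a tree this small the timings are dominated by noise, so point it at a larger directory to see a meaningful difference.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);
use Time::HiRes ();

# Hypothetical helper: walk $dir with the given dont_use_nlink setting and
# return the number of entries visited and the elapsed wall-clock seconds.
sub timed_walk {
    my ($dir, $dont_use_nlink) = @_;
    local $File::Find::dont_use_nlink = $dont_use_nlink;
    my $count = 0;
    my $start = Time::HiRes::time();
    find(sub { $count++ }, $dir);
    return ($count, Time::HiRes::time() - $start);
}

# Small throwaway tree: two subdirectories with three files each.
my $root = tempdir(CLEANUP => 1);
for my $sub (qw(a b)) {
    mkdir "$root/$sub" or die "mkdir $root/$sub: $!";
    for my $name (qw(f1 f2 f3)) {
        open my $fh, '>', "$root/$sub/$name" or die "open: $!";
        close $fh;
    }
}

my ($count_nlink, $secs_nlink) = timed_walk($root, 0);
my ($count_lstat, $secs_lstat) = timed_walk($root, 1);

printf "nlink shortcut: %d entries in %.4fs\n", $count_nlink, $secs_nlink;
printf "lstat always:   %d entries in %.4fs\n", $count_lstat, $secs_lstat;
```

Both walks should visit the same number of entries; only the amount of stat work differs.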

Perhaps the best approach would be to document that an lstat is no longer
guaranteed, but add a new option to the find() options hash, 'lstat',
that if true, reinstates the guarantee.
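
Until such an option exists, a caller can get a similar opt-in guarantee with a thin wrapper. This is only a sketch: `find_with_lstat` is a hypothetical function, not part of File::Find, and it assumes the caller passes either a code reference or an options hash with a `wanted` key, as find() itself accepts.

```perl
use strict;
use warnings;
use File::Find ();
use File::Temp qw(tempdir);

# Hypothetical wrapper: same calling convention as find(), but lstat is
# always run on the current entry before the caller's wanted() is invoked.
sub find_with_lstat {
    my ($arg, @dirs) = @_;
    my %opts   = ref($arg) eq 'HASH' ? %$arg : (wanted => $arg);
    my $wanted = $opts{wanted};
    $opts{wanted} = sub {
        lstat $_ or return;    # skip entries that vanished or cannot be read
        $wanted->();
    };
    File::Find::find(\%opts, @dirs);
}

# Throwaway directory with three plain files.
my $root = tempdir(CLEANUP => 1);
for my $name (qw(f1 f2 f3)) {
    open my $fh, '>', "$root/$name" or die "open $root/$name: $!";
    close $fh;
}

my @files;
find_with_lstat(sub { push @files, $File::Find::name if -f _ }, $root);
print "$_\n" for sort @files;
```

Callers who never use `_` keep calling find() directly and pay nothing extra, which matches the intent of making the guarantee optional.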

--
Now is the discount of our winter tent
  -- sign seen outside camping shop


p5pRT commented Jul 17, 2005

From @schwern

On Sun, Jul 17, 2005 at 06:42:40PM +0100, Dave Mitchell wrote:

Perhaps the best approach would be to document that an lstat is no longer
guaranteed, but add a new option to the find() options hash, 'lstat',
that if true, reinstates the guarantee.

Sounds good to me. Seems like a waste to guarantee it if you're not going
to use it anyway.

--
Michael G Schwern schwern@pobox.com http://www.pobox.com/~schwern
Reality is that which, when you stop believing in it, doesn't go away.
  -- Philip K. Dick


p5pRT commented Nov 10, 2005

From @rgs

jms@mathras.comcast.net (via RT) wrote:

The use of lstat() is not guaranteed in File::Find, contrary
to its documentation.

FAQ 3.4: How do I find which modules are installed on my system?

It shouldn't matter. From "perldoc File::Find" (v. 1.07):

  *     It is guaranteed that an lstat has been called before the
        user's "wanted()" function is called. This enables fast file
        checks involving _.

Not in all cases. lstat() does not always occur in directories
that don't have any subdirectories.
...
--- Find.pm.orig 2004-02-27 08:31:34.000000000 -0800
+++ Find.pm 2005-05-17 02:39:04.000000000 -0700

Thanks, doc patch applied as change #26076 to bleadperl.


p5pRT commented Nov 10, 2005

@rgs - Status changed from 'open' to 'resolved'

p5pRT closed this as completed Nov 10, 2005

p5pRT commented Nov 14, 2005

From @rgs

Rafael Garcia-Suarez wrote:

jms@mathras.comcast.net (via RT) wrote:

The use of lstat() is not guaranteed in File::Find, contrary
to its documentation.

FAQ 3.4: How do I find which modules are installed on my system?

It shouldn't matter. From "perldoc File::Find" (v. 1.07):

  *     It is guaranteed that an lstat has been called before the
        user's "wanted()" function is called. This enables fast file
        checks involving _.

Not in all cases. lstat() does not always occur in directories
that don't have any subdirectories.
...
--- Find.pm.orig 2004-02-27 08:31:34.000000000 -0800
+++ Find.pm 2005-05-17 02:39:04.000000000 -0700

Thanks, doc patch applied as change #26076 to bleadperl.

As this documentation appears in a section about follow and follow_fast,
I tweaked it further as follows (cc'ing brian because of the perlfaq
change):

Change 26128 by rgs@bloom on 2005/11/14 15:40:08

  A better fix for [perl #35847] File::Find not performing as documented,
  suggested by Darren Dunham. Includes a fix to the code example that
  uses File::Find in perlfaq3.

Affected files ...

... //depot/perl/lib/File/Find.pm#83 edit
... //depot/perl/pod/perlfaq3.pod#89 edit

Differences ...

==== //depot/perl/lib/File/Find.pm#83 (text) ====

@@ -120,11 +120,10 @@

=item *

-Previous versions of File::Find were guaranteed to call an I<lstat>
-before the user's C<wanted()> function was called, but this is no
-longer the case. Since this depends on C<$File::Find::dont_use_nlink>, $^O,
-and other factors, fast file checks involving C<_> are not recommended
-unless C<wanted()> calls I<lstat> first.
+It is guaranteed that an I<lstat> has been called before the user's
+C<wanted()> function is called. This enables fast file checks involving S<_>.
+Note that this guarantee no longer holds if I<follow> or I<follow_fast>
+are not set.

=item *

==== //depot/perl/pod/perlfaq3.pod#89 (text) ====

@@ -85,10 +85,12 @@
  use File::Find;
  my @files;

- find sub { push @files, $File::Find::name if -f _ && /\.pm$/ },
- @INC;
+ find(
+ sub { push @files, $File::Find::name if -f $File::Find::name && /\.pm$/ },
+ @INC
+ );

- print join "\n", @files;
+ print "$_\n" for @files;

If you simply need to quickly check to see if a module is
available, you can check for its documentation. If you can
