New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
spurious lstat calls in Perl_do_readline / glob #13663
Comments
From @craigberryIn pp_hot.c in Perl_do_readline, each and every line returned by a glob operation is run through the following check: for (t1 = SvPVX_const(sv); *t1; t1++) where sv contains the line returned by calling sv_gets on the file pointer returned by Perl_start_glob. The code is ancient; the basic logic hasn't changed since at least 5.000 twenty years ago. Checking each and every character of each and every filename for its presence in a hard-coded list of pattern match characters is a fair amount of processing, but is also non-portable and, more seriously, causes many extra calls to the relatively slow lstat() in very common use cases on non-Unix platforms. On Windows, any path with a backslash directory delimiter will trigger an unnecessary lstat(). On VMS, any native path with bracket directory delimiters ([]<>) or a semicolon version delimiter (;) will trigger an unnecessary lstat(). Also, on VMS, the dollar sign is a perfectly legal and extremely common filename character, but its presence will trigger an lstat(). I believe the intent of the code quoted above is to detect the case where the shell's glob returns the input pattern verbatim, indicating "no match." The lstat() appears to be a sanity check, as if to say, "not only does this file have pattern characters in its name, it also does not in fact exist." If that is the case, it seems to me a better check would be to compare the first line returned from the glob (and the first line only) with the input pattern for an exact match and skip only that case. That wouldn't work if there are shells that normalize the input pattern in some way before returning it, but I don't know if that's ever the case. As far as I know (which may not be very for, so please correct me) there would never be a reason to do this check on any but the first line of input (I *think* C<IoLINES(io) == 1> would do this but haven't checked). By the time you have two lines returned from a glob operation, you know you have at least two matches, so a check for the "no match" case no longer makes sense. I don't know much about unixy glob implementations, so it's possible the foregoing analysis isn't correct, but I have stepped through and watched the extra lstat() calls, so there is a definite performance bug here as that's I/O to the file header that is completely unnecessary. The following simple example shows that the mere presence of a version specification (;*) on the glob pattern causes a stat call on each filename returned. First with no debugging output: $ perl -we "print qq/$x\n/ while ($x = <*.com;*>);" and now with Perl debugger tracing enabled but also run under the VMS debugger to show where the internal stat routine gets called: $ dbgperl -Dt -we "print qq/$x\n/ while ($x = <*.com;*>);" OpenVMS I64 Debug64 Version V8.4-001 %DEBUG-I-INITIAL, Language: C, Module: PERLMAIN DBG> go EXECUTING... (-e:0) enter Thanks go to Hein van den Heuvel for making me aware of this problem. $ perl -"V" Characteristics of this PERLSHR image: ________________________________________ "... getting out of a sonnet is much more |
From @craigberryOn 3/14/14, 3:51 PM, Craig A . Berry wrote:
What I forgot about when I posted is that this code only kicks in when
Answering my own question, I believe the expectation is that multiple $ echo *.lst *.flirble *.SH So this can't be fixed in the universal fashion I was hoping. The patch This makes a glob of the Perl source tree 60% faster on first iteration So I'll push this soonish unless someone has a better suggestion. Inline Patch--- pp_hot.c.orig 2014-03-14 21:40:45 -0500
+++ pp_hot.c 2014-03-15 18:00:37 -0500
@@ -1698,7 +1698,11 @@ Perl_do_readline(pTHX)
}
}
for (t1 = SvPVX_const(sv); *t1; t1++)
+#ifdef __VMS
+ if (strchr("*%?", *t1))
+#else
if (strchr("$&*(){}[]'\";\\|?<>~`", *t1))
+#endif
break;
if (*t1 && PerlLIO_lstat(SvPVX_const(sv), &PL_statbuf) < 0) {
(void)POPs; /* Unmatched wildcard? Chuck it... */ |
From @jkeenanOn Mon, 17 Mar 2014 12:33:58 GMT, craigberry wrote:
You did commit that code, but the ticket was never closed: ##### Closing now. |
The RT System itself - Status changed from 'new' to 'open' |
@jkeenan - Status changed from 'open' to 'resolved' |
Migrated from rt.perl.org#121440 (status was 'resolved')
Searchable as RT121440$
The text was updated successfully, but these errors were encountered: