Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

getcwd() fails to fail on Cygwin #16366

Open
p5pRT opened this issue Jan 17, 2018 · 9 comments
Open

getcwd() fails to fail on Cygwin #16366

p5pRT opened this issue Jan 17, 2018 · 9 comments

Comments

@p5pRT
Copy link

p5pRT commented Jan 17, 2018

Migrated from rt.perl.org#132733 (status was 'open')

Searchable as RT132733$

@p5pRT
Copy link
Author

p5pRT commented Jan 17, 2018

From zefram@fysh.org

Created by zefram@fysh.org

I'm spinning this off from [perl #132648] because it's got too big.

On Cygwin (not reflected in the perl -V below), it has been found that
multiple implementations of the getcwd() functionality in Cwd return a
non-error result in a situation where the current directory actually
has no resolvable name. Specifically, if the current directory
is rmdir()ed, then these subs return its former name, but stat()
on that name yields ENOENT. getcwd() functions should also yield
ENOENT in that case. This situation is exercised by the test script
dist/PathTools/t/cwd_enoent.t which was added for [perl #132648], so
adding that uncovered this preexisting bug.

Both the XS implementation Cwd​::getcwd() and _backtick_pwd() exhibit
the faulty behaviour. The XS one uses the core's getcwd_sv(), which
therefore also has the same bug. Fixing getcwd_sv() seems necessary,
but due to dual-lifing Cwd​::getcwd() can't rely on the core fix for it
to achieve correct behaviour.

The probable fix, which will be needed in multiple places, is to stat the
pathname returned by pwd(1)/getcwd(3)/getcwd_sv() to check that it refers
to the same file as ".". This extra logic should only be used on Cygwin.

Perl Info

Flags:
    category=library
    severity=low
    module=Cwd

Site configuration information for perl 5.27.7:

Configured by zefram at Thu Dec 21 04:41:17 GMT 2017.

Summary of my perl5 (revision 5 version 27 subversion 7) configuration:
   
  Platform:
    osname=linux
    osvers=3.16.0-4-amd64
    archname=x86_64-linux-thread-multi
    uname='linux barba.rous.org 3.16.0-4-amd64 #1 smp debian 3.16.43-2+deb8u2 (2017-06-26) x86_64 gnulinux '
    config_args='-des -Dprefix=/home/zefram/usr/perl/perl_install/perl-5.27.7-i64-f52 -Duselargefiles -Dusethreads -Uafs -Ud_csh -Uusesfio -Uusenm -Duseshrplib -Dusedevel -Uversiononly -Ui_db'
    hint=recommended
    useposix=true
    d_sigaction=define
    useithreads=define
    usemultiplicity=define
    use64bitint=define
    use64bitall=define
    uselongdouble=undef
    usemymalloc=n
    default_inc_excludes_dot=define
    bincompat5005=undef
  Compiler:
    cc='cc'
    ccflags ='-D_REENTRANT -D_GNU_SOURCE -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2'
    optimize='-O2'
    cppflags='-D_REENTRANT -D_GNU_SOURCE -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include'
    ccversion=''
    gccversion='4.9.2'
    gccosandvers=''
    intsize=4
    longsize=8
    ptrsize=8
    doublesize=8
    byteorder=12345678
    doublekind=3
    d_longlong=define
    longlongsize=8
    d_longdbl=define
    longdblsize=16
    longdblkind=3
    ivtype='long'
    ivsize=8
    nvtype='double'
    nvsize=8
    Off_t='off_t'
    lseeksize=8
    alignbytes=8
    prototype=define
  Linker and Libraries:
    ld='cc'
    ldflags =' -fstack-protector-strong -L/usr/local/lib'
    libpth=/usr/local/lib /usr/lib/gcc/x86_64-linux-gnu/4.9/include-fixed /usr/include/x86_64-linux-gnu /usr/lib /lib/x86_64-linux-gnu /lib/../lib /usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib
    libs=-lpthread -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc -lgdbm_compat
    perllibs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc
    libc=libc-2.19.so
    so=so
    useshrplib=true
    libperl=libperl.so
    gnulibc_version='2.19'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs
    dlext=so
    d_dlsymun=undef
    ccdlflags='-Wl,-E -Wl,-rpath,/home/zefram/usr/perl/perl_install/perl-5.27.7-i64-f52/lib/5.27.7/x86_64-linux-thread-multi/CORE'
    cccdlflags='-fPIC'
    lddlflags='-shared -O2 -L/usr/local/lib -fstack-protector-strong'



@INC for perl 5.27.7:
    /home/zefram/usr/perl/perl_install/perl-5.27.7-i64-f52/lib/site_perl/5.27.7/x86_64-linux-thread-multi
    /home/zefram/usr/perl/perl_install/perl-5.27.7-i64-f52/lib/site_perl/5.27.7
    /home/zefram/usr/perl/perl_install/perl-5.27.7-i64-f52/lib/5.27.7/x86_64-linux-thread-multi
    /home/zefram/usr/perl/perl_install/perl-5.27.7-i64-f52/lib/5.27.7


Environment for perl 5.27.7:
    HOME=/home/zefram
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/zefram/usr/perl/perl_install/perl-5.27.7-i64-f52/bin:/home/zefram/usr/perl/util:/home/zefram/pub/x86_64-unknown-linux-gnu/bin:/home/zefram/pub/common/bin:/usr/bin:/bin:/usr/local/bin:/usr/games
    PERL_BADLANG (unset)
    SHELL=/usr/bin/zsh

@p5pRT
Copy link
Author

p5pRT commented Jan 18, 2018

From Stromeko@nexgo.de

Zefram writes​:

On Cygwin (not reflected in the perl -V below), it has been found that
multiple implementations of the getcwd() functionality in Cwd return a
non-error result in a situation where the current directory actually
has no resolvable name. Specifically, if the current directory
is rmdir()ed, then these subs return its former name, but stat()
on that name yields ENOENT.

This exposes behaviour mandated by the underlying OS or rather the file
system. The directory can't really be deleted until the last process
having it as the cwd exits.

getcwd() functions should also yield ENOENT in that case.

It's perfectly possible even on a fully conformant POSIX system that the
two calls yield diverging results, just not as likely. Also on Linux
you can easily get a string that is not reachable from the same process
(somewhere in Linux 2.x this string will now be prepended with
"(unreachable)", but you must be able to deal with ENOENT anyway). If
you rely on racy behaviour you need to be prepared to get the wrong
result once in a while.

This situation is exercised by the test script
dist/PathTools/t/cwd_enoent.t which was added for [perl #132648], so
adding that uncovered this preexisting bug.

I'd say the bug here is in the assumption made by the test.

Both the XS implementation Cwd​::getcwd() and _backtick_pwd() exhibit
the faulty behaviour. The XS one uses the core's getcwd_sv(), which
therefore also has the same bug. Fixing getcwd_sv() seems necessary,
but due to dual-lifing Cwd​::getcwd() can't rely on the core fix for it
to achieve correct behaviour.

The probable fix, which will be needed in multiple places, is to stat the
pathname returned by pwd(1)/getcwd(3)/getcwd_sv() to check that it refers
to the same file as ".". This extra logic should only be used on Cygwin.

As the maintainer of the official Cygwin Perl I'd be inclined to remove
such a fix since it would likely introduce considerable slowdown for no
real gain.

Regards,
Achim.
--
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

SD adaptations for KORG EX-800 and Poly-800MkII V0.9​:
http​://Synth.Stromeko.net/Downloads.html#KorgSDada

@p5pRT
Copy link
Author

p5pRT commented Jan 18, 2018

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Jan 18, 2018

From zefram@fysh.org

Achim Gratz wrote​:

This exposes behaviour mandated by the underlying OS or rather the file
system. The directory can't really be deleted until the last process
having it as the cwd exits.

Earlier in the discussion on [perl #132648] I thought that might be
what's happening, because of some old knowledge about Windows. But the
test output that Tony provided showed that it's not. As described in
my paragraph that you quoted, stat() yields ENOENT for the directory's
former name. rmdir() is not leaving the directory in place; it *is*
removing it, at least from that location. It doesn't matter whether it's
temporarily getting some other name, because no other name is arising.
getcwd() is objectively erroneous in returning a name that cannot be
used to refer to the directory.

I'd say the bug here is in the assumption made by the test.

Which assumption exactly? The assumption that rmdir(), having indicated
success, would have succeeded in making the directory unreachable
through its original name? That assumption turns out to be true.
Perhaps the assumption that a successful return from getcwd() would mean
it's providing a usable name? That's not so much an assumption as the
feature that's being tested. If it's not warranted to test for that,
then we'd better add some caution to the getcwd() documentation, something
like "the return value of this function doesn't actually mean anything".

As the maintainer of the official Cygwin Perl I'd be inclined to remove
such a fix since it would likely introduce considerable slowdown for no
real gain.

If the decision is that getcwd() remain buggy on Cygwin, then I suppose
we keep the test skip for Cygwin, and document "on Cygwin the return
value of this function doesn't actually mean anything".

-zefram

@p5pRT
Copy link
Author

p5pRT commented Jan 19, 2018

From Stromeko@nexgo.de

Zefram writes​:

Achim Gratz wrote​:

This exposes behaviour mandated by the underlying OS or rather the file
system. The directory can't really be deleted until the last process
having it as the cwd exits.

Earlier in the discussion on [perl #132648] I thought that might be
what's happening, because of some old knowledge about Windows. But the
test output that Tony provided showed that it's not. As described in
my paragraph that you quoted, stat() yields ENOENT for the directory's
former name. rmdir() is not leaving the directory in place; it *is*
removing it, at least from that location.

As always on Windows, things are a bit more complicated than that. I've
discussed this with the Cygwin developers and I don't pretend to
understand all of the details myself, but in a nutshell once you mark
the directory for deletion the handle and the directory name can not be
stat'ed anymore, but the directory is still enlisted in it's former
parent (maybe in order to make it impossible to create a new directory
with the same name), going away only after the last handle to it closes.
You'd actually have to stat that entry (respectively ask Windows for its
file information) to learn that it is no longer valid for almost all
file operations, but that syscall is one of those that are quite
expensive on Windows.

It doesn't matter whether it's temporarily getting some other name,
because no other name is arising. getcwd() is objectively erroneous
in returning a name that cannot be used to refer to the directory.

From my reading of the corresponding man page POSIX doesn't mandate that
the string returned is actually accessible by the time you use it and
makes no explicit guarantee that it was when you made the call.

I'd say the bug here is in the assumption made by the test.

Which assumption exactly? The assumption that rmdir(), having indicated
success, would have succeeded in making the directory unreachable
through its original name? That assumption turns out to be true.

It seems to me you are making the assumption that the getcwd function
creates an error condition and thus returns a NULL pointer rather than a
string immediately when that directory becomes unreachable (which in the
test code you know from the fact that rmdir has successfully returned).
Your further expectation that this error condition ought to be ENOENT is
explicitly mentioned as a GNU/Linux (glibc) extension and there actually
is still "POSIX" fallback code in place in glibc that simply doesn't do
that. POSIX just mandates that you can't make use of the name of a
deleted directory should you happen to still hold onto it, so for
instance you can't create any files there anymore or list its entries.

Perhaps the assumption that a successful return from getcwd() would mean
it's providing a usable name? That's not so much an assumption as the
feature that's being tested.

Yet that feature is a property of the Linux syscall API if even that,
not POSIX. I'm pretty sure you'd also run into trouble on a few network
file systems with that assumption on GNU/Linux.

If it's not warranted to test for that, then we'd better add some
caution to the getcwd() documentation, something like "the return
value of this function doesn't actually mean anything".

That may not be the worst outcome. It actually does mean something, but
just as with most non-atomic, non-transactional file operations, you
have to expect that once in a while you can't actually use the result as
you thought you might and get an error at some later operation that you
thought was guarded by an earlier one.

As the maintainer of the official Cygwin Perl I'd be inclined to remove
such a fix since it would likely introduce considerable slowdown for no
real gain.

If the decision is that getcwd() remain buggy on Cygwin, then I suppose
we keep the test skip for Cygwin, and document "on Cygwin the return
value of this function doesn't actually mean anything".

Well, the question here is if the gain outweighs the cost, given that
Cygwin would try to more closely emulate GNU/Linux behaviour that is
strictly valid only for local filesystems while no races are going on,
so even GNU/Linux applications would be well advised to not rely on it.

Note that for a complete fix it's not enough to add more checks in
getcwd, the real fix would be in readdir and that would entail an extra
syscall for each directory entry.

Regards,
Achim.
--
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

Factory and User Sound Singles for Waldorf Q+, Q and microQ​:
http​://Synth.Stromeko.net/Downloads.html#WaldorfSounds

@p5pRT
Copy link
Author

p5pRT commented Jan 20, 2018

From zefram@fysh.org

Achim Gratz wrote​:

From my reading of the corresponding man page POSIX doesn't mandate that
the string returned is actually accessible by the time you use it

That's a red herring. Obviously concurrent activity can in general
render the result invalid between its generation and its use, but that's
not the issue here. There are no race conditions in this case.

makes no explicit guarantee that it was when you made the call.

"The getcwd() function shall place an absolute pathname of the current
working directory in the array pointed to by buf". If the string doesn't
actually resolve to the directory, then it is not "an absolute pathname
of the current working directory".

-zefram

@p5pRT
Copy link
Author

p5pRT commented Jan 20, 2018

From Stromeko@nexgo.de

Zefram writes​:

That's a red herring. Obviously concurrent activity can in general
render the result invalid between its generation and its use, but that's
not the issue here. There are no race conditions in this case.

Yes there is and that's the reason your test fails. The entry for cwd
in its parent directory doesn't go away until the last file handle to it
closes, which is why you can still access the name through the file
handle to the cwd (that is what getcwd does). That is just how Windows'
file system works and not some odd choice on the part of Cygwin that can
be changed. In fact I've learned it's not the first time that it has
been tried to get papered over in one way or the other. But these
attempts have not been successful, either for not fixing the problem
completely or for creating larger problems someplace else.

"The getcwd() function shall place an absolute pathname of the current
working directory in the array pointed to by buf".

Which it dutifully does. Note the complete absence of _any_ discussion
of what should happen when the cwd gets deleted even in the informative
section. Now, I think it's preposterous to assume the POSIX folks
didn't think about that case, so that omission is quite likely
intentional to allow for system-dependent behaviour.

If the string doesn't actually resolve to the directory, then it is
not "an absolute pathname of the current working directory".

It still does resolve to it and the directory entry is in fact still
there if you go and look it up in the parent directory, you just can't
use that name to do a successful stat or readdir anymore.

Again, just skip that test on Cygwin (or mark it as expected to fail)
since you are making assumptions that are generally expected to be true
(assuming no other process works in the same directory where you do the
testing), but are known not to be in this specific instance. There's
nothing to do about it that wouldn't create a comparatively greater loss
someplace else.

Regards,
Achim.
--
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

Factory and User Sound Singles for Waldorf rackAttack​:
http​://Synth.Stromeko.net/Downloads.html#WaldorfSounds

@p5pRT
Copy link
Author

p5pRT commented Jan 20, 2018

From zefram@fysh.org

Achim Gratz wrote​:

Zefram writes​:

not the issue here. There are no race conditions in this case.

Yes there is and that's the reason your test fails. The entry for cwd
in its parent directory doesn't go away until the last file handle to it
closes,

That's not a race condition. It's not dependent on any concurrent
activity, and adding pauses to the code doesn't affect it.

                     Note the complete absence of \_any\_ discussion

of what should happen when the cwd gets deleted even in the informative
section.

That is an unfortunate omission for getcwd(3), but the situation is
discussed a bit for rmdir(2). There is explicit permission for rmdir()
to yield EBUSY for a directory that is the root or cwd of any process.
I can't say whether it would be feasible for Cygwin to do that, but it
would certainly be more Unix-like and more POSIX conformant for it to
have rmdir() fail in that situation than for rmdir() to partially succeed
while blocking reuse of the name. Note that cwd_enoent.t skips if rmdir()
reports failure.

It still does resolve to it and the directory entry is in fact still
there if you go and look it up in the parent directory, you just can't
use that name to do a successful stat or readdir anymore.

That doesn't sound like "resolving" to me. That the name still exists
in the parent directory is certainly interesting, but if the name can't
be used in file operations then it's just blocking a directory slot.

-zefram

@p5pRT
Copy link
Author

p5pRT commented Jan 21, 2018

From Stromeko@nexgo.de

Zefram writes​:

Yes there is and that's the reason your test fails. The entry for cwd
in its parent directory doesn't go away until the last file handle to it
closes,

That's not a race condition. It's not dependent on any concurrent
activity, and adding pauses to the code doesn't affect it.

Yes it is, just not one you've created and so you don't expect it to be
observable. POSIX mandates that rmdir does a few things that
individually are supposed to be atomic, but makes no offer for atomicity
of the whole thing​:

  If the directory's link count becomes 0 and no process has the
  directory open, the space occupied by the directory shall be freed and
  the directory shall no longer be accessible. If one or more processes
  have the directory open when the last link is removed, the dot and
  dot-dot entries, if present, shall be removed before rmdir() returns
  and no new entries may be created in the directory, but the directory
  shall not be removed until all references to the directory are closed.

On Windows you can observe the point where the . and .. entries have
been removed, but the entry in the parent directory is still there
(because the system call removing the directory returns as soon as it
has marked the directory as deleted and hence made the access to it
impossible).

That is an unfortunate omission for getcwd(3), but the situation is
discussed a bit for rmdir(2). There is explicit permission for rmdir()
to yield EBUSY for a directory that is the root or cwd of any process.

Permission, not mandate. Whether or not "the implementation considers
this an error" is not something that you control and POSIX very wisely
left open.

I can't say whether it would be feasible for Cygwin to do that, but it
would certainly be more Unix-like and more POSIX conformant for it to
have rmdir() fail in that situation than for rmdir() to partially succeed
while blocking reuse of the name. Note that cwd_enoent.t skips if rmdir()
reports failure.

Cygwin is a user-mode layer on top of Windows' Win32 subsytem. There
are things that only the kernel can do correctly and it has a slightly
different idea on this very topic than you do.

You do in general not know whether a directory about to be removed is
the cwd of some random process, even if you'd look it up in the process
table. In addition for Cygwin it's not enough to look for Cygwin
processes either, so you'd need to traverse the Windows process list as
well (which you might not even have the right to do).

That doesn't sound like "resolving" to me. That the name still exists
in the parent directory is certainly interesting, but if the name can't
be used in file operations then it's just blocking a directory slot.

Which might entirely be the point of the exercise, so that you can't go
and create a new directory with the same name and then have an old
process use that as cwd (which on Windows has security implications when
you can plant DLL there and create new processes from the running
application). In fact that is a situation that is discussed as a
possibility in the POSIX getcwd documentation as one of the differences
between implementations using fchdir and chdir.

Regards,
Achim.
--
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

Waldorf MIDI Implementation & additional documentation​:
http​://Synth.Stromeko.net/Downloads.html#WaldorfDocs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants