Option to avoid __FILE__ variable containing relative path #15212

Open
p5pRT opened this issue Mar 2, 2016 · 9 comments

p5pRT commented Mar 2, 2016

Migrated from rt.perl.org#127646 (status was 'open')

Searchable as RT127646$

p5pRT commented Mar 2, 2016

From @hakonhagland

I am writing a module that implements a subroutine that needs to
read the source file of its caller. The filename of the caller can be
obtained from the Perl caller() function, but unfortunately it is
sometimes a relative pathname, which is not relative to the current
directory, but to what *was* the current directory when the file was
loaded. That current directory may very well have changed at the time
my module inspects the variable, and hence it is no longer possible to
recover the absolute pathname of the file.

I believe the caller() function gets the filename from package
variable "__FILE__".

I suspect the reason that __FILE__ is not populated with an absolute
path in these cases is to avoid the call to Cwd::getcwd() when the
module is loaded. Perhaps this call was (or still is) considered too
expensive to be worth it?

If this is the case, would it be possible to introduce an environment
variable, for example PERL5FILEPATH, that if set to "absolute" would
force a call to Cwd::getcwd() and populate __FILE__ with an
absolute path at the time a module is loaded?

On the other hand, if PERL5FILEPATH is not set, or is set to
"relative", things work as they do today. This is the default behavior,
and Cwd::getcwd() is not called.

This should also work for the main program file "$0", and it could
eliminate the need for "$FindBin::Bin".

Why use an environment variable like PERL5FILEPATH? Why not a special
variable like $^FILE_PATH? The reason is that the variable must be set
very early at compile time in order for it to take effect. Most
modules are loaded at compile time, with "use" or with "require" inside
BEGIN {} blocks. Since it is not possible to inject code to be executed
before all BEGIN {} blocks, this approach does not seem viable.

When does __FILE__ become relative? For modules, it will become
relative if @INC contains a relative path name, like './lib',
'./test', or simply the current directory '.'.
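To make the mechanism concrete, here is a minimal sketch, assuming a Unix-like system and a hypothetical throwaway module Demo::Rel created in a temporary directory. It loads the module through the relative @INC entry 'lib' and then changes directory. The recorded name, which is what both __FILE__ and caller() report, stays relative and no longer leads to the file from the new working directory.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

# Build lib/Demo/Rel.pm (a hypothetical module) inside a temporary directory.
my $root = tempdir(CLEANUP => 1);
mkdir "$root/lib"      or die "mkdir: $!";
mkdir "$root/lib/Demo" or die "mkdir: $!";
open(my $fh, '>', "$root/lib/Demo/Rel.pm") or die "open: $!";
print {$fh} "package Demo::Rel;\nsub file { return __FILE__ }\n1;\n";
close($fh) or die "close: $!";

my $start = getcwd();
chdir($root) or die "chdir: $!";
unshift @INC, 'lib';                # a relative @INC entry
require Demo::Rel;                  # found as lib/Demo/Rel.pm, relative to $root

my $file = Demo::Rel::file();       # the name recorded when the file was compiled
chdir($start) or die "chdir: $!";   # the program moves on

print "recorded name: $file\n";
print "readable from here? ", (-e $file ? "yes" : "no"), "\n";
```

The same relative string is stored in %INC, so inspecting `$INC{'Demo/Rel.pm'}` after the chdir() runs into the same problem.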

Of course, if the cost of Cwd::getcwd() is not the reason why __FILE__
can be relative, other measures might be required.

Best regards,
Håkon Hægland

p5pRT commented Mar 2, 2016

From zefram@fysh.org

Hakon Haegland wrote:

> I am writing a module that implements a subroutine that needs to
> read the source file of its caller.

That's not a good idea. Why do you think you need to do this? What are
you trying to achieve? In general, for multiple reasons, you cannot rely
on being able to read the source files of loaded modules subsequent to
their actual loading.

> That current directory may very well have changed at the time
> my module inspects the variable, and hence it is no longer possible to
> recover the absolute pathname of the file.

Normally, if this kind of thing is an issue then the main program is
aware of it, and can make a note of the initial working directory before
changing it. Having this need entirely located in a module suggests
that you're going about your problem in the wrong way.
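A sketch of that approach, with hypothetical names ($INITIAL_CWD, resolve_from_start): the main program records its starting directory in a BEGIN block, and relative names are later resolved against that snapshot with File::Spec->rel2abs. It assumes the files in question were loaded before the program's first chdir(), and that the startup directory could be determined at all.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Spec;

# Snapshot the working directory while the main program is being compiled,
# i.e. before any run-time chdir(). Files loaded through relative @INC
# entries before the first chdir() were found relative to this directory.
our $INITIAL_CWD;
BEGIN { $INITIAL_CWD = getcwd() }

# Hypothetical helper: resolve a file name (as reported by caller() or %INC)
# against the startup directory. Absolute names come back as-is, apart from
# canonpath() clean-up. Only correct for files loaded before the first chdir().
sub resolve_from_start {
    my ($file) = @_;
    return File::Spec->rel2abs($file, $INITIAL_CWD);
}

# Later on, after the program has moved elsewhere:
chdir(File::Spec->tmpdir) or die "chdir: $!";
print resolve_from_start('lib/Some/Module.pm'), "\n";   # hypothetical relative name
```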

> I believe the caller() function gets the filename from package
> variable "__FILE__".

That's not a variable, and not located in a module package. Syntactically
it's essentially a built-in function. It has magical compile-time
behaviour such that the invocation is replaced with a const op containing
the filename. However, you're correct about there being a link: caller()
and __FILE__ ultimately get the filename from the same place.
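A small illustration of that shared source, offered as a sketch: at a given call site, __FILE__ and the file name that caller() reports agree, and a `#line` directive (documented in perlsyn) changes both. The helper name file_of_caller is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the file name that caller() records for
# the statement that called this sub.
sub file_of_caller {
    my (undef, $file) = caller;
    return $file;
}

# __FILE__ is folded into a constant when this statement is compiled;
# caller() looks up the same recorded name at run time.
my @before = (__FILE__, file_of_caller());

# A #line directive changes the recorded name for the code after it,
# and both mechanisms follow it.
# line 200 "pretend.pl"
my @after = (__FILE__, file_of_caller());

print "before: @before\n";
print "after:  @after\n";
```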

> I suspect the reason that __FILE__ is not populated with an absolute
> path in these cases is to avoid the call to Cwd::getcwd() when the
> module is loaded.

No, it's not a desired feature that was sacrificed for performance.
It's more that there's just no convention of making all pathnames
absolute, and indeed it's not always possible to do so correctly.
Given a relative pathname, the correct treatment is in fact to use it
directly as a relative pathname.

> If this is the case, would it be possible to introduce an environment
> variable, for example PERL5FILEPATH,

Bad way to control it. However, if this would fix your case, that shows
that you know at the top level and from the start of the program run
that you'll want absolute pathnames. In that case, you could begin your
program with a stanza that edits @INC (and any other relative pathnames
from which you'll load code) to make all the pathnames absolute.
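A sketch of such a stanza, placed at the top of the main program after any `use lib` lines and before the modules whose names should be absolute. It assumes the startup directory is the one the relative entries were meant to be relative to. It does not help for entries added later, or for modules already loaded before it runs (such as File::Spec itself, which is normally found through an absolute core @INC entry anyway).

```perl
use strict;
use warnings;
use lib 'lib';        # e.g. a relative entry the program adds itself
use File::Spec;

# Rewrite every plain-string @INC entry as an absolute path, resolved against
# the current (startup) directory. References in @INC are loader hooks
# (code refs, array refs or objects), not directories, so they are left alone.
BEGIN {
    @INC = map { ref($_) ? $_ : File::Spec->rel2abs($_) } @INC;
}

# Modules loaded after this point are found, and recorded in %INC, __FILE__
# and caller(), under absolute file names.
```

For a project-local library directory, the common idiom `use FindBin; use lib "$FindBin::Bin/lib";` avoids adding a relative entry in the first place.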

-zefram

p5pRT commented Mar 2, 2016

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented Mar 4, 2016

From @hakonhagland

Hi Zefram. Thanks for the detailed response! I am still not convinced that I
am going about the problem in the wrong way. Please see my comments below.

> What are you trying to achieve?

I am trying to make a small improvement to the CPAN module
"Data::Printer". I wanted to display the variable name automatically
(similarly to "Data::Dumper::Simple"). However, I would like to do
this without using "PadWalker" or any source filter. Instead, I will
try to read the source file using the filename and line number of
the call, as reported by caller(), and then parse that line with PPI
to recover the variable name.
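For illustration only, a minimal sketch of the rereading step this approach depends on, with hypothetical helper names (source_line_at, calling_source_line). It returns undef instead of dying whenever the file cannot be reread, and it does nothing about the parsing side (PPI); it shares every limitation of rereading a source file after it has been loaded.

```perl
use strict;
use warnings;

# Hypothetical helper: return line $lineno of $file without its newline, or
# undef if anything goes wrong: the name does not refer to a readable file
# (e.g. it was deleted, or it is a pseudo-name such as "-e"), or the file
# has fewer lines than requested.
sub source_line_at {
    my ($file, $lineno) = @_;
    return undef unless defined $file && defined $lineno && $lineno > 0;
    open(my $fh, '<', $file) or return undef;
    my $text;
    while (my $l = <$fh>) {
        if ($. == $lineno) { $text = $l; last }
    }
    close $fh;
    chomp $text if defined $text;
    return $text;
}

# Hypothetical debugging helper: fetch the text of the line that caller()
# reports for the call to this sub. Treat undef as "no annotation available".
sub calling_source_line {
    my (undef, $file, $line) = caller;
    return source_line_at($file, $line);
}
```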

My thought was that the module's Pod documentation could mention that
the user can set the environment variable PERL5FILEPATH="absolute" to
make the module work better.

> In general, for multiple reasons, you cannot rely
> on being able to read the source files of loaded modules subsequent to
> their actual loading.

Do you mean if the file has been deleted in the meantime?
I think that is not very likely, so I guess you mean something else?

> It's more that there's just no convention of making all pathnames
> absolute, and indeed it's not always possible to do so correctly.

Cwd::abs_path($filename) should be able to convert a relative
$filename, or am I missing something?


p5pRT commented Mar 4, 2016

From zefram@fysh.org

Hakon Haegland wrote:

> I wanted to display the variable name automatically
> ...
> try to read the source file using the filename and line number of
> the call, as reported by caller(), and then parse that line with PPI
> to recover the variable name.

That is totally the wrong way to go about it. PadWalker and source
filters are two more seriously wrong ways. The right way to determine
the variable name is to hook compilation of the call and examine the
argument ops. See Debug::Show for an example. Of course, anything
like this makes the behaviour quite different from any ordinary
subroutine acting on argument values, such as Data::Printer supplies,
so retrofitting this to Data::Printer is also the wrong thing to do.
This facility belongs in a separate module, which can usefully use
Data::Printer for the heavy lifting.

> Do you mean if the file has been deleted in the meantime?

That's one possibility. It may also have been modified, or its permissions
may have changed, or the process's credentials may have changed, or there
may be an unpredictable error upon the second read. Also, even if you
do get to read the original text, your parsing of it is not the same
process as the original parsing of it for execution purposes. You may
get desynchronised and find the wrong line. There's no guarantee that the
code is using the (subset of) standard grammar that you are able to parse.
You especially can't expect to parse a single code line in isolation,
as you imply. Even if you were to reapply the core parser, it's possible
for the parsing process to depend on compile-time contingencies, such that
it parses differently each time, so you can't match the original parsing.

> Cwd::abs_path($filename) should be able to convert a relative
> $filename, or am I missing something?

It attempts to, but it is not always possible. The current directory
might not be reachable from the process's current root directory.
The permissions on ancestor directories might prevent the discovery of
their absolute paths, or if the absolute path is findable then permissions
might prevent its use. An unpredictable error might frustrate the finding
of the absolute path. Basically, abs_path() is a hack that works in the
most ordinary circumstances but cannot be relied upon in the general case.
This is dictated by fundamental features of the operating system; there
are other operating systems in which absolute paths are always reliably
available.
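Since abs_path() cannot be relied upon in general, code that uses it anyway should expect failure. A minimal sketch with a hypothetical wrapper name (try_abs_path): the exact failure mode of Cwd::abs_path is treated as an assumption here, so an exception, an undef result, and an empty string are all mapped to undef.

```perl
use strict;
use warnings;
use Cwd ();

# Hypothetical wrapper around Cwd::abs_path. How an unresolvable path is
# reported may depend on the platform and on which Cwd implementation is in
# use, so an exception, undef, or an empty string all count as "unknown".
sub try_abs_path {
    my ($file) = @_;
    my $abs = eval { Cwd::abs_path($file) };
    return (defined $abs && length $abs) ? $abs : undef;
}

for my $f ('.', "/no/such/dir/$$/file.pm") {
    my $abs = try_abs_path($f);
    printf "%-30s => %s\n", $f, defined $abs ? $abs : '(could not resolve)';
}
```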

This RT ticket should be rejected.

-zefram

p5pRT commented Mar 7, 2016

From @hakonhagland

> The right way to determine the variable name is to hook compilation
> of the call and examine the argument ops. See Debug::Show for an example.

Thanks for the reference to Debug::Show. Unfortunately, I have limited
knowledge of Perl internals, so it's difficult to judge the
correctness of that module with my current coding skills. My
impression is that I would need to know more about perlguts to go down
that road.

So for the moment I will continue working with PPI (despite its
limitations). It seems to work well in the tests I have performed so
far, and I have not had any problem with caller() giving a wrong __LINE__
either. For the (unlikely) events you mention that would invalidate
the information in __LINE__ and/or __FILE__, I will simply bail out.
The purpose of the module (and in particular, of the extension I have
discussed) is only to provide a debugging aid anyway. If a user of the
module finds that Data::Printer does not work in a case where a module
has been deleted since it was initially loaded, the user can try to
write his own extension to Data::Printer if he needs that functionality.

> This RT ticket should be rejected.

Since Perl has a philosophy of "making hard things possible", I
would propose one last alternative before this is rejected: introduce a
special literal __FILE_ABSPATH__. This would complement __FILE__. The
literal __FILE_ABSPATH__ would contain the absolute path of the current
file at the time it was loaded, if that can be determined; if it is not
possible to obtain the absolute path, it would be undef. Note:
__FILE_ABSPATH__ must also be accessible through caller().

I think that in most practical cases __FILE_ABSPATH__ would be defined,
and that would be much better than not being able to access the path at
all. (Here I am talking about the case where one module
(Data::Printer) tries to access the __FILE__ token of another module
through caller().)


p5pRT commented Mar 7, 2016

From zefram@fysh.org

Hakon Haegland wrote:

> So for the moment I will continue working with PPI (despite its
> limitations).
> ...
> Since Perl has a philosophy of "making hard things possible",

The thing you want *is* possible. You're turning down the way to do
it that would actually work. You're not going to get much traction
with your request for a new feature to ameliorate just one of the many
problems with a very wrong way of doing things.

> special literal
> __FILE_ABSPATH__.
> ...
> must also be accessible through caller().

This would mean that the attempt at abspath() must be performed for every
module file load, regardless of whether anything is interested in it,
which there almost always isn't. The additional cost isn't huge, but
imposing it on all Perl users, even when the feature isn't being used,
would be an unpopular move. Fundamentally it's putting the burden in
the wrong place: your desire to reread a loaded module file is bizarre,
and so your code should be what pays the cost of making that work.

There are better ways to address your case, especially because it's only
for debugging use. You can advise users to favour absolute pathnames
in @INC, in the cases where their program uses chdir() and they want
this context information in the debugging output. You can offer code
to abspath()ify @INC. Thus even this wrong hard thing is possible.
__FILE_ABSPATH__ is not justified.

-zefram

p5pRT commented Mar 7, 2016

From @demerphq

On 7 March 2016 at 05:46, Zefram <zefram@fysh.org> wrote:

> Hakon Haegland wrote:
>
>> So for the moment I will continue working with PPI (despite its
>> limitations).
>> ...
>> Since Perl has a philosophy of "making hard things possible",
>
> The thing you want *is* possible. You're turning down the way to do
> it that would actually work. You're not going to get much traction
> with your request for a new feature to ameliorate just one of the many
> problems with a very wrong way of doing things.
>
>> special literal
>> __FILE_ABSPATH__.
>> ...
>> must also be accessible through caller().
>
> This would mean that the attempt at abspath() must be performed for every
> module file load, regardless of whether anything is interested in it,
> which there almost always isn't. The additional cost isn't huge, but
> imposing it on all Perl users, even when the feature isn't being used,
> would be an unpopular move.

FWIW, I for one would not mind. If __FILE__, or caller, simply started
always being absolute, I think I would consider it an improvement, if
only due to the consistency it afforded. Backwards compat concerns
aside of course.

> Fundamentally it's putting the burden in
> the wrong place: your desire to reread a loaded module file is bizarre,
> and so your code should be what pays the cost of making that work.

I don't know that it's bizarre. I think the use case described in this
ticket is a bit bizarre, but the desire to do so is not so weird. I
have seen code that does this several times, mostly for purposes where
100% formal correctness of doing so doesn't matter, like annotating
backtraces with the actual code in an exception reporting mechanism,
etc.

I also think the cost is so trivial almost nobody would care.

> There are better ways to address your case, especially because it's only
> for debugging use. You can advise users to favour absolute pathnames
> in @INC, in the cases where their program uses chdir() and they want
> this context information in the debugging output. You can offer code
> to abspath()ify @INC. Thus even this wrong hard thing is possible.
> __FILE_ABSPATH__ is not justified.

Personally if someone had a patch that added it, and supporting it
didn't produce measurable impact in practice, I would vote for it.

I guess to roll this yourself you would have to tie @INC, and make
sure that your module was loaded absolutely first. I assume it's
possible to tie @INC.

cheers,
Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented Mar 12, 2016

From @hakonhagland

> If __FILE__, or caller, simply started always being absolute,
> I think I would consider it an improvement, if only due to the
> consistency it afforded. Backwards compat concerns aside of course.

Agreed.

> I guess to roll this yourself you would have to tie @INC, and make
> sure that your module was loaded absolutely first. I assume it's
> possible to tie @INC.

Yes, it is possible to tie @INC; see for example Array::Sticky::INC:

https://metacpan.org/pod/Array::Sticky::INC

But even if @INC was tied, I think it could only affect modules loaded
after my first call to tie(), so there is still the problem that my
module must be loaded absolutely first. I think the requirement to load
my module first is not very practical. The purpose of the module should
be ease of use, in the sense that the user should be able to simply
insert "use Data::Printer" in any module at any place he likes. Charging
the user with the responsibility of loading the module absolutely first
would likely undermine its purpose.

