-e ignores the UTF8 flag #10550
From @cpansprout
#!perl -l
This can happen easily if the file name comes from a URI (URIs always represent byte sequences) and the URI happened to come from a UTF-8 web page.
Flags:
Site configuration information for perl 5.13.3:
Configured by sprout at Thu Aug 12 17:53:37 PDT 2010.
Summary of my perl5 (revision 5 version 13 subversion 3 patch v5.13.3-193-g798ae1b) configuration:
Locally applied patches:
@INC for perl 5.13.3:
Environment for perl 5.13.3: |
From @ikegami
On Sun, Aug 15, 2010 at 5:11 PM, Father Chrysostomos
Bad demo. The arguments for substr are wrong. The following /
#!perl -l
So does this simpler code:
#!perl -l |
The RT System itself - Status changed from 'new' to 'open' |
From dbook@cpan.org
Created by dbook@cpan.org
File test operators like -e and -f appear to be using the
use strict;
use Encode 'encode';
my $filename = "t\x{eb}st";
my $dir = File::Temp->newdir;
my $filepath = catfile $dir, encode('UTF-8', $filename);
ok -e $filepath, "File $filepath exists";
done_testing;
__END__
Perl Info
|
From @Grinnz
This is a duplicate of https://rt.perl.org/Public/Bug/Display.html?id=77242 |
The RT System itself - Status changed from 'new' to 'open' |
…ile::Path. (See: #956233, #956723) This provides relief from runtime errors in Lintian, but does not solve the bugs. It merely makes Lintian useable again. The offending packages sphinx and supysonic no longer abort with runtime errors. Due to a bug in Perl, strings must be "downgraded" before system calls such as stat or open. It is the proper fix [1][2], and should happen in Perl. We simply do so here as triage. [1] Perl/perl5#10550 [2] Perl/perl5#9674 More comprehensive fixes for both bugs are in the works.
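The "downgrade before the system call" triage described above could look roughly like the sketch below. This is not the Lintian code; `downgraded_path` is a hypothetical helper name, and the only real API assumed is the core `utf8::downgrade` function.

```perl
use strict;
use warnings;

# Hypothetical helper (not taken from Lintian): return a copy of a
# byte-string path whose internal UTF8 flag is cleared, so that builtins
# such as stat, open, or -e receive the bytes the caller intended.
sub downgraded_path {
    my ($path) = @_;    # copy, so the caller's string is left untouched
    # With a true second argument, utf8::downgrade returns false instead
    # of dying when the string holds a character above 0xFF.
    utf8::downgrade($path, 1)
        or die "path has characters above 0xFF; encode it to bytes first\n";
    return $path;
}

# A byte string that was upgraded somewhere along the way.
my $name = "\xE2\x80\x99";
utf8::upgrade($name);

my $safe = downgraded_path($name);
printf "flag before: %s, flag after: %s\n",
    (utf8::is_utf8($name) ? "on" : "off"),
    (utf8::is_utf8($safe) ? "on" : "off");
```

The helper refuses strings with characters above 0xFF rather than guessing an encoding, since such a string is character data and has to be encoded explicitly before it can name a file.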
I believe this ticket is unfixable. Take
These bytes comprise U+2019. If one says
I believe that is the correct behavior. And
What the tickets are effectively asking for is for two different UTF8 strings to evaluate to the same thing. I don't think that is advisable. One needs to choose one or the other interpretation, and I believe the one we already have chosen is the better option. |
It's not asking for that. It is asking for two strings with the same contents to evaluate to the same thing. This is an instance of the Unicode bug fixed by https://metacpan.org/pod/Sys::Binmode. |
Like Grinnz said, it's an instance of The Unicode Bug. Every builtin that deals with files still suffers from this bug. |
To be specific to your examples:
This is one example of the bug. The content of that string is "\x{2019}", while the name of the file is "\xE2\x80\x99". Correct functionality would behave like
This is the other direction of this bug. This string is logically identical to the original string, as it was only subjected to an upgrade operation. If you printed it, the output would be the same bytes as with the original string. It has length of 3, just like the original string. And yet, used as a filename, it silently has different behavior. |
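For anyone who wants to see the "other direction" for themselves, here is a minimal sketch. It assumes a filesystem that accepts the byte name "\xE2\x80\x99", and the variable names are mine, not from the original examples. It deliberately prints the file-test results instead of asserting them, since the outcome depends on the perl version and platform.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec::Functions qw(catfile);

# The three bytes from the discussion (the UTF-8 encoding of U+2019).
my $bytes = "\xE2\x80\x99";

# A copy that has only been upgraded: same characters, different
# internal storage.
my $upgraded = $bytes;
utf8::upgrade($upgraded);

printf "equal: %s\n", ($bytes eq $upgraded ? "yes" : "no");
printf "lengths: %d and %d\n", length($bytes), length($upgraded);
printf "UTF8 flag: %s and %s\n",
    (utf8::is_utf8($bytes)    ? "on" : "off"),
    (utf8::is_utf8($upgraded) ? "on" : "off");

# Create a file under the byte name, then test both spellings of the path.
my $dir           = tempdir(CLEANUP => 1);
my $path_bytes    = catfile($dir, $bytes);
my $path_upgraded = catfile($dir, $upgraded);

open my $fh, '>', $path_bytes or die "cannot create $path_bytes: $!";
close $fh;

# On a perl affected by this ticket the second test may report "not found".
printf "-e with byte string:     %s\n", (-e $path_bytes    ? "found" : "not found");
printf "-e with upgraded string: %s\n", (-e $path_upgraded ? "found" : "not found");
```

The first three lines of output show that Perl itself treats the two strings as the same value; only the file test is in a position to disagree.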
OK, I understand your points better now. Do no file systems allow UTF-8 names? What about Windows? Are their filenames not UTF16? |
Different OS/filesystem behavior is one of the biggest reasons it's hard to fix this. Filenames on Unix-like systems are always bytes: the encoding is not enforced, just assumed via the locale environment. The exception is macOS, which also normalizes the UTF-8 in the filesystem. On Windows they are stored in UTF-16 (I believe) but I am not familiar with how Perl's functions interact with that. |
The main functional issue is the inconsistency in behavior, not the encoding; but the differences in filesystem behavior make the encoding an unfortunately relevant issue to any attempts at fixing this. |
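Since the filename on disk is bytes, the usual workaround is to do the character-to-byte conversion yourself, as the earlier test script does with `encode('UTF-8', $filename)`. A small sketch, assuming the filesystem is meant to hold UTF-8 names (that choice is an assumption here; nothing on a Unix-like system enforces it):

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# A character string: four characters, one of them U+00EB.
my $name = "t\x{eb}st";

# The bytes that would actually be handed to the operating system if the
# filesystem is assumed to use UTF-8 names.
my $bytes = encode('UTF-8', $name);

printf "characters: %d, bytes: %d\n", length($name), length($bytes);
printf "bytes in hex: %s\n", join ' ', map { sprintf '%02X', ord } split //, $bytes;

# Names read back from the filesystem (readdir, glob) are bytes as well
# and need decoding before they are treated as text.
my $back = decode('UTF-8', $bytes);
printf "round trip ok: %s\n", ($back eq $name ? "yes" : "no");
```

Encoding at the boundary makes the intended bytes explicit, though on an affected perl a string that gets upgraded afterwards can still hit the inconsistency discussed in this ticket.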
Migrated from rt.perl.org#77242 (status was 'open')
Searchable as RT77242$