rename, chroot etc. ignore internal encoding #10623
Comments
From perlbug@plan9.de
Created by perlbug@plan9.de
This snippet calls rename with two different paths, even though the same
perl -e 'my $x = chr 200; rename $x,0; utf8::encode $x; rename $x,0'
The fact that the internal (basically invisible to a perl program)
The solution is to use the equivalent of SvPVbyte, not SvPV, when passing
A cursory examination of pp_sys shows that at least backtick, open,
All those functions silently throw away the crucial information of how
(When in doubt, it always helps to review the discussion about crypt()
Perl Info
|
From @ikegami
On Sun, Sep 12, 2010 at 5:15 AM, perlbug@plan9.de <perlbug-followup@perl.org
$x and $x after utf8::encode($x) are not the same string. (They're not even
But there is a bug here. $x after utf8::upgrade and $x after utf8::downgrade
$ perl -e'$_=chr(0xE9); utf8::upgrade($_); rename "a",$_'
The solution is to use the equivalent of SvPVbyte, not SvPV, when passing
Correct. |
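A minimal sketch of the upgrade/downgrade point made above, not taken from the ticket itself: two strings that compare equal in Perl can have different internal buffers, and a raw SvPV-style read sees that buffer. The variable names are illustrative, and `use bytes` is used here only to peek at the internal length.

```perl
use strict;
use warnings;

# Two copies of the same one-character string, code point 0xE9.
my $down = chr(0xE9);
my $up   = chr(0xE9);

utf8::upgrade($up);      # internal storage becomes UTF-8; the string value is unchanged
utf8::downgrade($down);  # internal storage is the single byte 0xE9

# At the Perl level the two strings are identical...
my $equal = ($up eq $down) ? 1 : 0;

# ...but the internal buffers have different lengths.
my $up_len   = do { use bytes; length $up };
my $down_len = do { use bytes; length $down };

printf "equal=%d internal bytes: upgraded=%d downgraded=%d\n",
    $equal, $up_len, $down_len;
```

Passing `$up` and `$down` to a function that reads the internal buffer directly would therefore hand the operating system two different byte sequences for what Perl considers one string.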
The RT System itself - Status changed from 'new' to 'open' |
From schmorp@schmorp.de
On Sun, Sep 12, 2010 at 01:23:42PM -0400, Eric Brine <ikegami@adaelis.com> wrote:
Sorry for the late reply, but, again, I never received your mail because
Yes, while condensing the testcase as much as possible I accidentally
-- |
What if this were solved by creating a That way Perl applications could set an I/O layer for |
What scope would that have? |
Global, I guess? Could alternatively make it a pragma, e.g., |
I think global would be wrong, because that means code can't make any assumptions of its own anymore. I immediately recall PHP code full of "if add_slashes is globally enabled do this, otherwise do that" code. |
@Leont Global, yes, feels wrong. But if I could:
… and have that auto-encode the same way |
See also #17094 (comment) (the ticket is about win32, but tony's proposal is for all platforms). |
@xenu For myself, I actually want to go the other way: SvPVbyte rather than SvPVutf8. |
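A rough Perl-level sketch of what SvPVbyte-style handling amounts to, under the assumption that a path is meant to be a byte string: downgrade a copy before use, and refuse strings containing code points above 0xFF. The helper name `path_bytes` is hypothetical and is not part of Perl or of any proposal in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: return a downgraded copy of $path, so the internal
# buffer holds exactly the bytes the string represents. Dies if the string
# contains a code point above 0xFF, which cannot be a single byte.
sub path_bytes {
    my ($path) = @_;
    my $copy = $path;
    utf8::downgrade($copy, 1)
        or die "Wide character in path\n";
    return $copy;
}

my $plain    = "caf\xE9";   # four characters, stored as four bytes
my $upgraded = $plain;
utf8::upgrade($upgraded);   # same string, UTF-8 internal storage

my $from_plain    = path_bytes($plain);
my $from_upgraded = path_bytes($upgraded);

# Both results now have the same internal bytes.
my $same_bytes = do { use bytes; $from_plain eq $from_upgraded } ? 1 : 0;

# A string with a wide character is rejected rather than silently encoded.
my $wide_ok = eval { path_bytes("\x{263A}"); 1 } ? 1 : 0;

print "same_bytes=$same_bytes wide_ok=$wide_ok\n";
```

The point of the sketch is only that the upgraded and downgraded forms of one string end up as the same bytes; choosing an encoding for characters above 0xFF is a separate question.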
On Unix systems, file names are composed of arbitrary bytes, with two having specific values: 0x00 is reserved to denote the end of the string, and 0x2F is the directory separator. ("/" is 0x2F in EBCDIC encodings too!) There's no guarantee that a name is UTF-8 or any other encoding, no matter what the locale says. On Windows file systems, file names are sequences of arbitrary 16-bit values expected to be UTF-16le, but it's surely possible to have unmatched surrogates and invalid characters such as 0xFFFF. If we want Perl to be able to round-trip any file name (e.g. readdir -> rename), there are two options.
Current status:
I would love to see decoded file names (option 2 above), and a pragma would be required to do so, but having to provide an encoding is bad. The correct encoding should be used. The pragma could allow one to specify errors.
|
This deals with the problem of upgraded/downgraded strings meaning different filesystem paths: It doesn’t address Windows, but AFAIK it doesn’t worsen the Windows situation, either. |
Migrated from rt.perl.org#77798 (status was 'open')
Searchable as RT77798$