Unicode readdir bugs #11513
From tchrist@perl.com
I'm really rather unhappy with the what you see isn't Consider this: #!/usr/bin/env perl Run on Linux, I get this nonsense: Ï�Ï�ιγμαÏ� Run on Darwin, I get this, which is even worse: Ï�Ï�ιγμαÏ� *Who* told Perl it was ok to let me blithely use wide characters in Yes, if I make my loop while (my $enc = readdir($dh)) { Then I get στιγμας on Linux and στιγμας on Darwin. But that's nutty, and in several ways. First off, Darwin's case-insensitive filesystem is an idiot, and doesn't But secondly and of greater importance, I should be able to binmode($dh, ":utf8"); or even opendir(my $dh, ":utf8", "."); And not have to deal with this really really stupid encoding business. Is there reason that this is not a bug that should be fixed? And don't even get me started about glob(). It's broken, too. --tom Summary of my perl5 (revision 5 version 14 subversion 0) configuration: Characteristics of this binary (from libperl): |
From @ikegami
On Tue, Jul 19, 2011 at 2:39 PM, tchrist1 <perlbug-followup@perl.org> wrote:
Just like: - Input from STDIN must be decoded. This applies: - Input from @ARGV and file names from builtins must be decoded. You can get away with not doing the fourth because you have a UTF-8 locale - Eric |
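For readers following along, the manual decode/encode step being described might look roughly like the sketch below. It assumes the filesystem stores names as UTF-8 octets (not true everywhere), and the helper names `decoded_dir_entries` and `open_for_write` are made up for illustration; they are not core functions.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: list a directory and decode each entry from
# UTF-8 octets into a Perl character string.
# Assumption: the filesystem hands back UTF-8 encoded names.
sub decoded_dir_entries {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "opendir $dir: $!";
    my @names = map  { decode('UTF-8', $_) }
                grep { $_ ne '.' && $_ ne '..' }
                readdir($dh);
    closedir($dh);
    return @names;
}

# Hypothetical helper: encode a character-string file name to UTF-8
# octets before passing it to open(). $dir is assumed to be octets already.
sub open_for_write {
    my ($dir, $name) = @_;
    my $path = "$dir/" . encode('UTF-8', $name);
    open(my $fh, '>', $path) or die "open $path: $!";
    return $fh;
}
```

Note that a filesystem which normalizes names (as HFS+ on Darwin does) may hand back a different but canonically equivalent string, so comparing names after normalization (for example with Unicode::Normalize) is probably wise.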
The RT System itself - Status changed from 'new' to 'open' |
From tchrist@perl.com +---------------------------------------------------------------------+ I've got six weeks of work to do in that number of days, so I'll be In the meanwhile, I will not in general be reading, let alone answering, #1: Life-and-death situations -- why are you using email for that? #2: Personal family matters of my own relations -- again, try the phone. #3: Issues @work w/my University textmining job *INVOLVING ME PERSONALLY*. #4: Prepping my 4.5h of Unicode talks for next week's conference in Portland. #5: Prepping a kilopage of the Camel Book's 4th ed. for Production by mid-August. Because I will be *disconnecting my laptop from the Internet* so I can 1) once cheerfully between 5-7am MDT (UTC-0600) I'm a morning person, so those are the only two choices you're liable to Thank you for your forbearance. --tom |
From @cpansprout
On Tue Jul 19 11:39:04 2011, tom christiansen wrote:
Almost all (if not all?) Perl functions that take file names have this I would suggest we use a ‘Wide character’ warning, as we have for print Then we also need a pragma to enable Unicode filenames in -e, open, What should we call it? What do we do on systems on which file names *are* just octet sequences Also, what about systems that support Unicode, but for which no one has |
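For comparison, the existing warning on print can be observed with a small sketch like the one below. The in-memory filehandle and the `__WARN__` handler are just illustration choices, not anything proposed in this thread.

```perl
use strict;
use warnings;

my @warnings;
{
    # Collect warnings instead of letting them go to STDERR.
    local $SIG{__WARN__} = sub { push @warnings, @_ };

    # A handle with no encoding layer (an in-memory file here).
    open(my $fh, '>', \my $buf) or die "open: $!";

    # U+03C3 cannot be downgraded to a single byte, so print warns.
    print {$fh} "\x{3c3}";
    close($fh);
}

print "captured: $_" for @warnings;
```

A file name builtin given the same string today raises no such warning, which is the gap being discussed.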
From @ikegami
On Sun, Sep 18, 2011 at 8:40 PM, Father Chrysostomos via RT <
File names are meant to be read as text, so one can't really claim they're Also, what about systems that support Unicode, but for which no one has had the time to implement this? (I’m not going to do VMS, for instance.) Like open -| with multiple args on Windows? |
From @ap
* Eric Brine <ikegami@adaelis.com> [2011-09-19 03:20]:
One could take a page from Python here and use its surrogate escape What this approach effectively does is allow strings to unambiguously I love the idea and it is one of my todos to add this to Encode should It would be a major step forward for Perl. Regards, |
From @cpansprout
On Tue Oct 04 09:02:13 2011, aristotle wrote:
If that happens, then it’s not really text, is it?
No, no, please don’t start using the locale to determine what the file Mac OS X, for instance, stores the encoding in the file system (so each On the other hand, if we keep things completely consistent on a given Also, nobody has answered my question: What do we call the pragma? dbmopen -X chdir chmod chown chroot fcntl glob link lstat mkdir open Those are all file name functions. But what about user and group names? exec, system, syscall, readpipe, bind, connect, getsockopt, shmwrite and
|
From @Hugmeir
On Sun, Oct 23, 2011 at 7:23 PM, Father Chrysostomos via RT <
(Reading the Python thread is still on my TODO list, so I'm not commenting There's a couple of things here being grouped as one. Ignoring *Who* told Perl it was ok to let me blithely use wide characters in
So, first thing: Be like syswrite. -All- syscalls, sans for Second, there should be a way to avoid doing an encode/decode on every use syscalls IN => ":encoding(...)", OUT => ":encoding(...)"; or use syscalls :dir => { IN => ":encoding(...)", OUT => ":encoding(...)" } Or somesuch, which won't solve problems in, say, Windows, but hopefully it Third, require/use/do. I recall Python having some problems with this (if Here's the nicest thing -- I implemented 1 and a prototype of 2 in a couple |
From @cpansprout
On Sun Oct 23 18:26:45 2011, Hugmeir wrote:
(Please, don’t put -deable at the end of a Latin-based word. :-) It’s syswrite seems to be the odd one out. It’s probably using SvPVbyte. With the new pragma, I would suggest fixing the Unicode bug for those
I think it would make things worse, as we would have yet another On the other hand we could provide it with lots of caveats in the
It sounds like a nice idea at first, but I worry about modules |
From @Hugmeir
On Sun, Oct 23, 2011 at 11:44 PM, Father Chrysostomos via RT <
But I like my half-broken english..! Fine :P
That's true, but consider which one of those has the actually useful Also, how often do you actually want to pass the internal form of UTF-8 to
I don't think it would cause any more breakage than when the Fcntl (I'd have little qualms if this were triggered by a 'use VERSION;' though)
Um, I'm not sure I follow. Isn't it as portable as the encode/decode calls use PerlIO::fse;
I was thinking in terms of redefining how the core itself looks for the |
From @cpansprout
On Sun Oct 23 21:00:09 2011, Hugmeir wrote:
Please don’t think I’m trying to pick on you. I just see this misuse so Generally, only the consonants c g k m v m z can have -eable after them, (You don’t know how long I’ve been wanting to bring this up--but now I’m
Several hundred. But those were one-time one-liners.
I think we need to warn, for backward-compatibility. I know there have
That’s my thought, but actual smoke reports tend to sway me quickly.
The whole point of the unicode::filenames pragma is to eliminate the
My initial train of thought was a little muddled. In any case, if perl If some OSes use Aristotle’s approach, then we only need *two* attempts, There are already people using ‘use Mödule’ on OS X. We shouldn’t break
??? |
From @khwilliamson
On 10/23/2011 10:25 PM, Father Chrysostomos via RT wrote:
The macro UTF8_IS_DOWNGRADEABLE_START has been in the core since: It's understandable that this spelling has become enshrined as valid. |
From @cpansprout
On Mon Oct 24 07:48:27 2011, public@khwilliamson.com wrote:
I’m a native English speaker, too, and it bothers me whenever I see it, |
From tchrist@perl.com
Now you know how I feel about “numify”. :( --tom |
From @khwilliamson
On 10/24/2011 09:37 AM, Tom Christiansen wrote:
numify rhymes (the way I pronounce it) with mummify, which is what My grandmother (born 1885, raised on a Wisconsin farm) hated the term I cringe when I hear 'less' when the 'proper' term is 'fewer'. I We are powerless over the vicissitudes of English, whose polyglot Vive le sandwich! |
From tchrist@perl.com
How odd: so was mine. 1919-2010. --tom |
From @ikegami
On Sun, Oct 23, 2011 at 9:26 PM, Brian Fraser <fraserbn@gmail.com> wrote:
When does it make sense to use two different encodings? Are you saying that non-Windows system can't tell you which encoding it is |
From @Leont
On Mon, Oct 24, 2011 at 10:07 PM, Eric Brine <ikegami@adaelis.com> wrote:
Most unices (pretty much all of them except OS X) do not have an Leon |
From tchrist@perl.com
Karl, it isn't about shifting word-use. That's a red herring. That is not the way English has ever worked before in any existing And somebody goofed. That doesn't make it right, or good. It's just like children who get catachrestically named Sure, you can do it. You can do anything. But it looks See also HTTP_REFERER. --tom |
From @ikegami
On Mon, Oct 24, 2011 at 4:12 PM, Leon Timmermans <fawaka@gmail.com> wrote:
Then how come I can read the file names in file selection dialogs on this |
From @ilmari
Eric Brine <ikegami@adaelis.com> writes:
Because the toolkit assumes an encoding, usually UTF-8. See -- |
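A rough sketch of that kind of "assume UTF-8 for display" logic in Perl is below. The Latin-1 fallback is an assumption made for the example, not something stated in this thread, and `display_name` is a made-up helper name.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: turn file name octets into text for display.
# Try strict UTF-8 first; if the octets are not valid UTF-8, fall back
# to ISO-8859-1 (an assumption for this sketch), which accepts any byte.
sub display_name {
    my ($octets) = @_;
    my $text = eval { decode('UTF-8', $octets, FB_CROAK | LEAVE_SRC) };
    return defined $text ? $text : decode('ISO-8859-1', $octets);
}
```

This only yields something printable. Unlike the surrogate escape idea mentioned earlier, the result cannot be reliably turned back into the original octets.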
From @ap
* Tom Christiansen <tchrist@perl.com> [2011-10-24 22:35]:
creat |
From tchrist@perl.com
creat was not caused by not knowing how to spell the word create. But you're right that it is something its inventors came --tom |
From @Hugmeir
On Mon, Oct 24, 2011 at 1:25 AM, Father Chrysostomos via RT <
Actually, how about a CPAN smoke of this? If the extent of the breakage is
Hm.. That's true enough. I was a bit wary of something automatically (Would you consider calling it unicode::syscalls or somesuch? ::filenames
Yeah, you are right. I don't think I fully understand Aristotle's proposal
That probably won't work for the latin-1 range though, and the lack of
Sorry, in-joke. |
Migrated from rt.perl.org#95160 (status was 'open')
Searchable as RT95160$