readdir doesn't understand filenames in a foreign language #6152

Open
p5pRT opened this issue Dec 12, 2002 · 3 comments
Labels
Unicode and System Calls Bad interactions of syscalls and UTF-8

Comments

@p5pRT

p5pRT commented Dec 12, 2002

Migrated from rt.perl.org#19082 (status was 'open')

Searchable as RT19082$

@p5pRT

p5pRT commented Dec 12, 2002

From gshamis@yahoo.com

I am using ActivePerl on Win2K. I asked ActivePerl
about this and they claim it's a general issue, not one
in their distribution.

The issue is this:
Windows 2000 allows files to be named in foreign
languages, such as Russian using the Win1251 encoding.
I wanted to write a Perl script that automatically
transliterates such names to ASCII. However, it seems that readdir
gives me an ASCII rendering of the Win1251 encoding, which I
can't do anything with...

Any info you can give me is appreciated.

--Garry
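
A minimal sketch of the task described above, under two assumptions that are not confirmed anywhere in this thread: that `readdir` hands back the names as Win1251 (cp1251) bytes, and that every character in the names is representable in that code page. The helper names (`translit_name`, `list_transliterated`) and the transliteration table are hypothetical and cover only the 32 basic Russian letters; anything else becomes `_`.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical table: lowercase Cyrillic U+0430..U+044F to Latin.
my @latin = (
    qw(a b v g d e zh z i j k l m n o p r s t u f kh ts ch sh shch),
    '', 'y', '', qw(e yu ya),
);
my @cyrillic = map { chr(0x430 + $_) } 0 .. $#latin;
my %translit;
@translit{@cyrillic} = @latin;

# Map one non-ASCII character to Latin, keeping an initial capital.
sub latin_for {
    my ($ch) = @_;
    my $lat = $translit{ lc($ch) };
    return '_' unless defined $lat;          # not in the table
    return $ch eq lc($ch) ? $lat : ucfirst($lat);
}

# Decode a cp1251 byte string into characters, then transliterate.
sub translit_name {
    my ($bytes) = @_;
    my $chars = decode('cp1251', $bytes);
    $chars =~ s/([^\x00-\x7F])/latin_for($1)/ge;
    return $chars;
}

# Transliterated names of all entries in a directory.
sub list_transliterated {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @names = map { translit_name($_) } grep { !/^\.\.?$/ } readdir $dh;
    closedir $dh;
    return @names;
}
```

If the names contain characters outside the assumed code page, the bytes returned by `readdir` may already have lost information, and no amount of decoding afterwards can recover it.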


@p5pRT

p5pRT commented Jan 17, 2003

From nick.ing-simmons@elixent.com

Garry Shamis <perl5-porters@perl.org> writes:

> # New Ticket Created by Garry Shamis
> # Please include the string: [perl #19082]
> # in the subject line of all future correspondence about this issue.
> # <URL: http://rt.perl.org/rt2/Ticket/Display.html?id=19082 >
>
> I am using ActivePerl on Win2K. I asked ActivePerl
> about this and they claim it's a general issue not one
> in their distribution.
>
> The issue is this:
> Windows2000 allows for files to be named using foreign
> languages -- like Russian using Win1251 encoding
> I wanted to write a Perl script that automatically
> transliterates to ascii. However it seems that readdir
> gives me an ascii of the Win1251 encoding which I
> can't do anything with...

Is Sarathy _still_ sure that wide syscalls are uninteresting?
Surely on NT-ish things on perl5.8 we _could_ always use wide calls
and UTF-8 on input and utf-16le (did I remember that right?) on output?

--
Nick Ing-Simmons
http://www.ni-s.u-net.com/
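
To make the encodings in the question above concrete, here is a hedged sketch of the conversion between Perl character strings and the NUL-terminated UTF-16LE buffers that wide Windows calls work with. It uses only the `Encode` module; the helper names `to_wide` and `from_wide` are hypothetical, and this is not a description of what the perl core does.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Character string -> NUL-terminated UTF-16LE buffer (no byte order mark).
sub to_wide {
    my ($chars) = @_;
    return encode('UTF-16LE', $chars . "\0");
}

# UTF-16LE buffer -> character string, cut at the first NUL terminator.
sub from_wide {
    my ($buf) = @_;
    my $chars = decode('UTF-16LE', $buf);
    $chars =~ s/\0.*\z//s;
    return $chars;
}
```

Each character in the Basic Multilingual Plane takes two bytes, low byte first, so U+043F becomes the bytes `3F 04`.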

@p5pRT

p5pRT commented Jan 19, 2003

From @gsar

On Fri, 17 Jan 2003 16:11:22 GMT, Nick Ing-Simmons wrote:

> Garry Shamis <perl5-porters@perl.org> writes:
>
>> # New Ticket Created by Garry Shamis
>> # Please include the string: [perl #19082]
>> # in the subject line of all future correspondence about this issue.
>> # <URL: http://rt.perl.org/rt2/Ticket/Display.html?id=19082 >
>>
>> I am using ActivePerl on Win2K. I asked ActivePerl
>> about this and they claim it's a general issue not one
>> in their distribution.
>>
>> The issue is this:
>> Windows2000 allows for files to be named using foreign
>> languages -- like Russian using Win1251 encoding
>> I wanted to write a Perl script that automatically
>> transliterates to ascii. However it seems that readdir
>> gives me an ascii of the Win1251 encoding which I
>> can't do anything with...
>
> Is Sarathy _still_ sure that wide syscalls are uninteresting?

On the contrary, I'm pretty sure that they're _very_ interesting to
the windows audience. The current implementation however is severely
broken, since the return code path into perl doesn't know that the
syscall returned UTF-8 data and doesn't do the necessary things to
set the right SV flags. The wide calls should also be autoenabled
when UTF-8 data is passed into these syscalls, which doesn't happen
now. So the code needs to be reworked a bit. Once Jarkko's work
on the Linux/UNIX end settles down, I'll take a look at it.
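
The SV-flag problem described above can be illustrated in plain Perl. This is a sketch, not the core's code path: it assumes a string holding raw UTF-8 octets stands in for what a syscall wrapper returns, and the helper names `describe` and `decode_syscall_result` are hypothetical. `utf8::is_utf8` and `utf8::decode` are built-in functions.

```perl
use strict;
use warnings;

# Report (UTF-8 flag as 0/1, length) for a string.
sub describe {
    my ($s) = @_;
    return (utf8::is_utf8($s) ? 1 : 0, length $s);
}

# Return a character string if the octets are valid UTF-8,
# otherwise hand the bytes back unchanged.
sub decode_syscall_result {
    my ($octets) = @_;
    my $copy = $octets;
    utf8::decode($copy) or return $octets;
    return $copy;
}

# Raw UTF-8 octets for a six-letter Cyrillic name.
my $raw = "\xd0\xbf\xd1\x80\xd0\xb8\xd0\xb2\xd0\xb5\xd1\x82";

my ($flag_raw,   $len_raw)   = describe($raw);
my ($flag_fixed, $len_fixed) = describe(decode_syscall_result($raw));
```

Without the flag, Perl treats the value as 12 separate bytes; after decoding, it is a 6-character string with the flag set.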

> Surely on NT-ish things on perl5.8 we _could_ always use wide calls
> and UTF-8 on input and utf-16le (did I remember that right?) on output?

That's certainly one way. I'm not sure if it will result in
UTF-8 data ending up in places that don't expect to see it
currently (XS extensions primarily) or if it will cause
UTF-8 codepaths to be taken in the regex engine with the
concomitant slowdowns. It seems safe to still do this based
on the -C switch (or some similar) hint.

Sarathy
gsar@ActiveState.com

@p5pRT p5pRT added the Unicode and System Calls Bad interactions of syscalls and UTF-8 label Nov 15, 2019
@xenu xenu removed the Severity Low label Dec 29, 2021