perl::Encode doesn't handle UTF-8 NFD strings #6485
Comments
From debianbugs@j3e.de
Created by debianbugs@j3e.de
The encode function of perl is not able to convert from UTF-8 text that is in normalization form D (NFD). Normalization is handled by Unicode::Normalize. To use encode one has to use the workaround
Bjoern
Perl Info
|
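The workaround referred to in the report is not preserved in this thread. As a minimal sketch of the general idea, assuming the input is "äpfel" as NFD UTF-8 octets and the target is ISO-8859-1 (both chosen here only for illustration), one can compose with Unicode::Normalize before calling encode:

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use Unicode::Normalize qw(NFC);

# "äpfel" in NFD as UTF-8 octets: "a" followed by U+0308 COMBINING DIAERESIS.
my $nfd_octets = "a\xCC\x88pfel";

# Decode the octets into a Perl character string (still decomposed).
my $chars = decode('UTF-8', $nfd_octets);

# Encoding directly: U+0308 has no ISO-8859-1 equivalent, so the result
# cannot contain the single octet 0xE4 for the precomposed character.
my $direct = encode('ISO-8859-1', $chars);

# Sketch of a workaround: compose to NFC first, then encode.
my $composed = encode('ISO-8859-1', NFC($chars));

# Print both results as dot-separated hex octets for comparison.
printf "direct:   %vX\n", $direct;
printf "composed: %vX\n", $composed;
```

Only the composed string yields the single Latin-1 octet for "ä"; whether encode should perform this step itself is the question debated in the rest of the thread.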
From dankogai@dan.co.jp
On Tuesday, May 6, 2003, at 06:20 AM, debianbugs@j3e.de (via RT) wrote:
If perl is an application like, say, a word processor, I would agree
If you want to do it transparently, you can always use Encode::Encoding
package Encode::UTF8::NFD;
sub decode($$;;$){
sub encode($$;;$){
1;
Normalization is not an "easy thing that should be done easily". It is
Dan the Encode Maintainer |
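The bodies of the decode and encode subs above did not survive in this thread. The following is a hedged sketch of how a custom encoding built on Encode::Encoding could combine normalization with transcoding. The encoding name 'utf8-nfd' and the choice to compose on decode and decompose on encode are assumptions made for illustration, not the original code:

```perl
use strict;
use warnings;

package Encode::UTF8::NFD;    # package name taken from the comment above
use parent qw(Encode::Encoding);
use Encode ();
use Unicode::Normalize qw(NFC NFD);

# Register the encoding under a hypothetical name.
__PACKAGE__->Define('utf8-nfd');

# Assumed behavior: UTF-8 octets in, NFC-composed character string out.
sub decode {
    my ($obj, $octets, $chk) = @_;
    my $str = NFC(Encode::decode('UTF-8', $octets));
    $_[1] = '' if $chk;    # Encode::Encoding convention when CHECK is set
    return $str;
}

# Assumed behavior: character string in, NFD-decomposed UTF-8 octets out.
sub encode {
    my ($obj, $str, $chk) = @_;
    my $octets = Encode::encode('UTF-8', NFD($str));
    $_[1] = '' if $chk;    # Encode::Encoding convention when CHECK is set
    return $octets;
}

package main;

# Round trip through the custom encoding.
my $octets = Encode::encode('utf8-nfd', "\x{E4}pfel");
my $chars  = Encode::decode('utf8-nfd', "a\xCC\x88pfel");

printf "encoded octets: %vX\n", $octets;
printf "decoded chars:  %vX\n", $chars;
```

This keeps normalization an explicit, opt-in choice made by naming the encoding, rather than something encode does implicitly.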
From @jhi
And I think it is not. Normalization should not magically be done.
You are assuming the equivalence of (pre)composed characters and
-- |
From debianbugs@j3e.de
On 2003-05-06 at 21:21 +0900 Dan Kogai sent off:
this gives a chance to work around this bug (yes, I think it is).
well, see: from_to claims to convert from encoding1 to encoding2.
Bjoern |
From BQW10602@nifty.com
On Tue, 6 May 2003 14:46:06 +0200 (snip)
You must suffer some information loss
"Normalizability" (normalization behavior) of a legacy
http://www.unicode.org/reports/tr15/#Legacy_Encodings
According to this annex, Latin1 is unnormalizable except in NFC.
In a sense, the legacy encoding is just a *legacy*;
SADAHIRO Tomoyuki |
From nick.ing-simmons@elixent.com
Bjoern Jacke <debianbugs@j3e.de> writes:
Most of perl's encodings are octet-sequence/octet-sequence converters.
Perhaps it makes sense to add a tweak to encode side so that if no encoding
|
From debianbugs@j3e.de
On 2003-05-06 at 15:58 +0300 Jarkko Hietaniemi sent off:
you are not right.
$string = "äpfel";
I still say that this is a bug and encode should be able to convert |
From @jhi
On Tue, May 13, 2003 at 01:43:40PM +0200, Bjoern Jacke wrote:
I am confused. The above prints nothing (no surprise there since the
There is no Encode in the above.
I am sorry but I think you are simply flat out wrong and I do not feel -- |
From nick.ing-simmons@elixent.com
Jarkko Hietaniemi <jhi@iki.fi> writes:
For what it is worth Encode works at character level as well.
If Encode or perl coerced the normalization then these would get lost. -- |
From BQW10602@nifty.com
On Wed, 07 May 2003 08:42:25 +0100
For transcoding/normalization at once, I write a tiny module,
(1) Module name?
http://homepage1.nifty.com/nomenclator/perl/Encode-UnicodeNormalization-0.00.tar.gz
HTML (POD)
Regards, |
@cpansprout - Status changed from 'open' to 'rejected' |
Migrated from rt.perl.org#22111 (status was 'rejected')
Searchable as RT22111$