perl cannot read UTF-16 files with illegal unicode #10934
Comments
From andrew@pimlott.net
Created by andrew@pimlott.net
Create UTF-8 and UTF-16LE files containing the character U+FDD0. (For
    binmode(STDIN, ':encoding(UTF-8)');
The program runs without complaint. With the UTF-16LE file as STDIN, run
    binmode(STDIN, ':encoding(UTF-16LE)');
The program dies with
    UTF-16LE:Unicode character fdd3 is illegal at ./bin/grep_high line 2.
This is a fatal error and I find no way to turn it off except perhaps to
I suggest that this diagnostic be a warning, just like the "is illegal for
Andrew
Perl Info
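Editorial sketch, not part of the original report: one way to inspect such a file without going through the `:encoding(UTF-16LE)` layer is to read the raw bytes and split them into code points by hand with `unpack`. The helper name `utf16le_code_points` is hypothetical, and this is only an illustration under the assumption that the input is little-endian with no BOM; it is not the behaviour of Encode or of any PerlIO layer.

```perl
use strict;
use warnings;

# Hypothetical helper: turn raw UTF-16LE bytes into a list of code points.
# No noncharacter check is applied; a trailing odd byte is silently ignored
# by unpack, and unpaired surrogates are passed through unchanged.
sub utf16le_code_points {
    my ($bytes) = @_;
    my @units = unpack 'v*', $bytes;    # 16-bit little-endian code units
    my @cps;
    while (@units) {
        my $u = shift @units;
        if (   $u >= 0xD800 && $u <= 0xDBFF
            && @units
            && $units[0] >= 0xDC00 && $units[0] <= 0xDFFF)
        {
            # Combine a high/low surrogate pair into one supplementary code point.
            my $lo = shift @units;
            push @cps, 0x10000 + (($u - 0xD800) << 10) + ($lo - 0xDC00);
        }
        else {
            push @cps, $u;
        }
    }
    return @cps;
}

# "A", U+FDD3 (the noncharacter from the report), and one surrogate pair.
my @cps = utf16le_code_points("\x41\x00\xD3\xFD\x3D\xD8\x00\xDE");
printf "U+%04X\n", $_ for @cps;
```

To apply this to standard input, one would read it in binary mode first, for example `binmode(STDIN); my $bytes = do { local $/; <STDIN> };`, and pass `$bytes` to the helper.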
From andrew@pimlott.net
I said U+FDD0 but the example bytes were for U+FDD3. Everything else
Andrew
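Editorial sketch: the byte sequences for U+FDD3 in the two encodings under discussion can be worked out with plain `pack`, without involving Encode at all. The helper name `hex_bytes` is hypothetical and exists only for display.

```perl
use strict;
use warnings;

# Hypothetical display helper: format a byte string as space-separated hex.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', $_ } unpack 'C*', $bytes;
}

my $cp = 0xFDD3;    # the code point the example bytes were for

# UTF-8: three-byte form, valid for code points U+0800..U+FFFF.
my $utf8 = pack 'C3',
    0xE0 | ($cp >> 12),
    0x80 | (($cp >> 6) & 0x3F),
    0x80 | ($cp & 0x3F);

# UTF-16LE: a BMP code point is one 16-bit unit, low byte first.
my $utf16le = pack 'v', $cp;

print "UTF-8:    ", hex_bytes($utf8),    "\n";    # EF B7 93
print "UTF-16LE: ", hex_bytes($utf16le), "\n";    # D3 FD
```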
From @khwilliamson
I'm sorry, but this ticket should instead be against the CPAN module
Note that U+FDD3 and the other non-character code points are not
--Karl Williamson
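Editorial sketch: for readers unsure which code points count as noncharacters, the Unicode Standard defines 66 of them: U+FDD0..U+FDEF, plus the last two code points of each of the 17 planes (U+FFFE, U+FFFF, U+1FFFE, and so on). The function name `is_noncharacter` is hypothetical; it is not an Encode or core Perl API.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $cp is one of the 66 Unicode noncharacters.
sub is_noncharacter {
    my ($cp) = @_;
    return 0 if $cp < 0 || $cp > 0x10FFFF;          # outside the Unicode range
    return 1 if $cp >= 0xFDD0 && $cp <= 0xFDEF;     # the contiguous block of 32
    return ($cp & 0xFFFE) == 0xFFFE ? 1 : 0;        # xxFFFE and xxFFFF in every plane
}

for my $cp (0xFDD0, 0xFDD3, 0xFEFF, 0xFFFE, 0xFFFF, 0x1FFFE, 0x41) {
    printf "U+%04X %s\n", $cp,
        is_noncharacter($cp) ? 'noncharacter' : 'not a noncharacter';
}
```

Note that U+FEFF (the byte order mark) is an assigned character, while its byte-swapped counterpart U+FFFE is a noncharacter.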
The RT System itself - Status changed from 'new' to 'open'
@khwilliamson - Status changed from 'open' to 'rejected'
From andrew@pimlott.net
I will refile the issue with Encode. However, specifying the encoding
- that all encodings treat these characters in the same way as the perl
The last requires that the warning flags be passed on to Encode. This
Interesting point about the security implication. But U+FEFF could as
And it's a little strong to say that Encode is "working as required",
Andrew
From @khwilliamson
On 01/14/2011 03:15 PM, Andrew Pimlott wrote:
I'm sorry that you're being exposed to Perl's internal organizational
As an aside, UTF-16 parallels UTF-8, in that if you use Encode with that
Security is serious business. I'm unhappy that Perl, including the
I would
The default behavior can't be just a warning when a server is facing the
I'm not sure if you were being ironic here, as it is self-contradictory;
Since Encode doesn't know anything about the context, it has to assume
There was no biasing involved. I've been trying to root out all
From andrew@pimlott.net
Excerpts from Karl Williamson's message of Sat Jan 22 13:16:45 -0800 2011:
No prob about that. I've filed the bug with Encode:
I'm just suggesting that some coordination with the core perl maintainers
I don't think that's accurate--see the original example I posted:
    binmode(STDIN, ':encoding(UTF-8)');
Input is EF B7 93, which decodes to U+FDD3, a noncharacter. There is no
That's a fair point, but consider the other side: I write code that is
Even better would be to document the situation clearly:
    As a security precaution, the following encodings consider Unicode
I'd put it differently: perl and Encode have no idea what data is
By the way, to be clear, chr(0xFDD0) does throw a warning, so someone
Oops, I got that one wrong. But according to my understanding, U+FFFE
I appreciate your caution. But it's a judgement call as to how paranoid
Great--clear diagnostics and documentation will make these issues less
Andrew
From @ap
* Andrew Pimlott <andrew@pimlott.net> [2011-01-30 20:10]:
I don’t think compromises are desirable here. And programs that
And that is already the plan for the future. (The :utf8 layer
For things like `chr` the situation is more difficult since perl
Regards,
Migrated from rt.perl.org#81454 (status was 'rejected')
Searchable as RT81454$