Anomalies in handling malformed utf8 input #16504
From @rjbs
Mark Dominus sent me a bug report that he couldn't get perlbug to accept. This is a bug report.
The attached input file “bad” is a one-line summary of an email message
1$ perl -lne 'print if /[ąę]/' bad > /dev/null
5$ perl -lne 'print if /ą/' bad > /dev/null
There are at least two anomalies here. Invocation 4 properly fails. (PERL_UNICODE=39 is equivalent to supplying
Invocation 2 is completely identical, except that the data is delivered on
The complete message header is also attached (msg-hdr.txt), and the
This is perl 5, version 22, subversion 1 (v5.22.1) built for
Please cc me on replies, as I do not regularly read this list.
From @rjbs
6 01/02 " ������Ч�Ŀ�չ�ֳ����ճ�����-����վ<<������:�����
From @rjbs
Summary of my perl5 (revision 5 version 22 subversion 1) configuration:
Characteristics of this binary (from libperl):
From @rjbs
Return-Path: <yjzdjae3560@ywart.com>
From @khwilliamson
On 04/11/2018 11:17 AM, Ricardo SIGNES (via RT) wrote:
Shouldn't 'use utf8' be used?
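Karl's question is about how the program text is read, not the input. A minimal sketch of the difference, assuming PERL_UNICODE is unset and these lines are saved as UTF-8: without use utf8 (-Mutf8 on the command line), the literal ą is read as its two UTF-8 bytes, so [ąę] becomes a class of the three bytes C4, 85 and 99 rather than a class of two characters.

```shell
# Assumes PERL_UNICODE is unset and this text is saved as UTF-8.

# Without "use utf8", the literal is the two bytes C4 85.
perl -le 'print length "ą"'                          # prints 2
# -Mutf8 is the command-line form of "use utf8": one character, U+0105.
perl -Mutf8 -le 'print length "ą"'                   # prints 1

# Without "use utf8", [ąę] is a class of the bytes C4, 85 and 99, so a lone
# 0xC4 byte (octal 304) matches it, while the two-byte string /ą/ does not.
printf '\304\n' | perl -lne 'print "match" if /[ąę]/'   # prints match
printf '\304\n' | perl -lne 'print "match" if /ą/'      # prints nothing
```

This is why /[ąę]/ and /ą/ in the original invocations 1 and 5 are not just "the same test with a character class": without use utf8 they describe different byte patterns.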
The RT System itself - Status changed from 'new' to 'open'
From @Grinnz
On Wed, Apr 11, 2018 at 4:46 PM, Karl Williamson <public@khwilliamson.com> wrote:
Yes. -CAS only sets @ARGV to be interpreted as UTF-8 and :utf8 layers on
-Dan
From @Grinnz
On Wed, Apr 11, 2018 at 4:52 PM, Dan Book <grinnz@gmail.com> wrote:
Using the options -CSD (-CD makes the special ARGV handle used by -n open
-Dan
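A sketch of the -CSD suggestion, under the same assumption that PERL_UNICODE is unset: when -n reads a file named on the command line, it reads through the special ARGV handle rather than STDIN, so -CS alone leaves that data undecoded, while the D flag makes :utf8 the default layer for input and output streams the program opens.

```shell
# Assumes PERL_UNICODE is unset. Write ą (bytes C4 85) to a scratch file.
tmp=$(mktemp)
printf '\304\205\n' > "$tmp"

# -CS covers STDIN/STDOUT/STDERR only; the ARGV handle opened for "$tmp"
# gets no :utf8 layer, so the line is still two bytes.
perl -CS -lne 'print length' "$tmp"     # prints 2

# -CSD also makes :utf8 the default layer for streams the program opens,
# which includes the ARGV handle used by -n.
perl -CSD -lne 'print length' "$tmp"    # prints 1

rm -f "$tmp"
```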
From @rjbs
On Wed, 11 Apr 2018 14:35:20 -0700, grinnz@gmail.com wrote:
I'm not sure this is sufficient explanation. Consider:
~$ cat bad | perl -CAS -Mutf8 -lne 'print if /ąę/'
Our input comes from stdin, and we have used -CS, which means STDIN is assumed to be UTF-8. In both cases, we use -Mutf8. We only see a fatal error in the second case, when we have used a character class instead of a string.
From @khwilliamson
On 04/12/2018 08:10 AM, Ricardo SIGNES via RT wrote:
I'm not sure the file survived the email transfer intact, because I
But I know the reason one fails and the other doesn't. Perl does not
We also don't go out of our way to make validity checks as we execute. That is what is happening here, as you can see if you add -Dr. As an
In the first case, you get this:
Did not find anchored substr "%x{105}%x{119}"...
In this case, we can tell that the match will fail because we first use
The second case is different. In this case we don't do a byte scan, but have to examine the string in
The fix for this is to make :utf8 do validity checking by default.
From @khwilliamson
On 04/12/2018 09:36 AM, Karl Williamson wrote:
I believe it's documented somewhere that you can have inconsistent
From @Grinnz
On Thu, Apr 12, 2018 at 10:10 AM, Ricardo SIGNES via RT <
To clarify, I meant "this should make these examples exhibit the expected
-Dan
From @khwilliamson
I believe this ticket can be rejected, and will do so if I don't hear opinions to the contrary in the next month-ish.
From @khwilliamson
Two months without comment, so I am rejecting as scheduled.
@khwilliamson - Status changed from 'open' to 'rejected'
Migrated from rt.perl.org#133101 (status was 'rejected')
Searchable as RT133101$