A portion of backrefs from s///eg aren't downgraded under use bytes pragma. #10094
Comments
From aklaswad@gmail.com
Created by aklaswad@gmail.com
#!/usr/bin/perl
Expected: all tests succeed (or at least all tests fail).
Observed: only the first test succeeds; the rest fail.
1..3
From zefram@fysh.org
Akira Sawada wrote:
You shouldn't be testing the "utf8" flag. It's an internal implementation There are, unfortunately, some places where string upgrading matters, in -zefram |
The RT System itself - Status changed from 'new' to 'open' |
From mark@exonetric.com
On 19 Jan 2010, at 20:34, Zefram wrote:
I wonder about the case of concatenating a decoded string ("utf8 on") with an This must be deliberate behaviour rather than a bug, but I'm curious what I almost wonder if there shouldn't be more default warnings when this happens. Pointers to archive postings on this point appreciated. - Mark |
From zefram@fysh.org
Mark Blackman wrote:
I think what you mean by "decoded" and "encoded" here is that you have The "utf8" flag has very little to do with the character encoding -zefram |
From mark@exonetric.com
On 19 Jan 2010, at 21:17, Zefram wrote:
Indeed I should not, but this sort of thing does happen rather invisibly I'm wondering if there's any moral support for warning about these cases "warning: concatenating octet and character oriented strings, double Perhaps my situation is too rare to address with changes to core.
Yes, this is what I had deduced and with the help of this article, http://ahinea.com/en/tech/perl-unicode-struggle.html - Mark |
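Editorial illustration (not code from the thread): the double-encoding situation Mark describes can be sketched roughly as follows. This assumes the core Encode module; the variable names and sample data are invented for the example.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical input: the two octets that encode U+00E9 in UTF-8.
my $octets = "\xC3\xA9";

# Decoding turns those two octets into a single character, U+00E9.
my $chars = decode('UTF-8', $octets);

# Concatenation treats both operands as character sequences, so the
# undecoded octets are appended as the characters U+00C3 and U+00A9.
my $mixed = $chars . $octets;

# Encoding the mixed string for output encodes those two characters
# again, which is the "double encoding" effect.
my $output = encode('UTF-8', $mixed);

printf "%d characters, %d octets after encoding\n",
    length($mixed), length($output);
```

No warning is issued at the concatenation, which is what the question about default warnings is getting at.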
From zefram@fysh.org
Mark Blackman wrote:
Perl doesn't distinguish between characters and octets, and retrofitting -zefram |
From mark@exonetric.com
On 19 Jan 2010, at 21:30, Zefram wrote:
the quote from perlunicode, below, 'If strings operating under byte semantics and strings with Unicode suggests to me that Perl does have some way of flagging these - Mark |
From zefram@fysh.org
Mark Blackman wrote:
You're not misreading it, it's misleading you. There's a lot of such -zefram |
From mark@exonetric.com
On 19 Jan 2010, at 21:30, Zefram wrote:
I can still imagine a, probably optional, probably quite slow, test in the a) either operand has non-ASCII octets without a UTF8 flag, and delivers some kind of warning. The potential for a false positive I suspect I'm in a small minority on the "niceness" of that approach. - Mark |
From @greerga
On Tue, 19 Jan 2010, Zefram wrote:
I think that's what encoding::warnings does, right? http://search.cpan.org/~audreyt/encoding-warnings-0.11/lib/encoding/warnings.pm -- |
From mark@exonetric.com
On 19 Jan 2010, at 23:39, George Greer wrote:
thanks, that's what I was looking for. - Mark |
From @ap
* Zefram <zefram@fysh.org> [2010-01-19 22:25]:
Actually it’s not, but it requires cooperative code, and few (In my personal bikeshed, the module would have been CHARDATA and I wonder if putting facilities of this sort into core would be Being able to tell apart character and binary data is a valid Regards, |
From @nwc10
On Tue, Jan 19, 2010 at 10:53:06PM +0000, Mark Blackman wrote:
My brain has been chewing over things like this for some weeks now. I don't have any conclusions yet, other than "it might need 2 flags" Nicholas Clark |
From @khwilliamson
Mark Blackman wrote:
Zefram wrote:
I wrote that piece of the pod, and rereading it, it still makes perfect Mark Blackman wrote:
What I was describing in the pod is that if a string contains octets If that string is concatenated with a string that does have the UTF8 I didn't want to mention the UTF8 flag detail at that point, as I Are there suggestions for improving the wording? And, as far as Mark's point in the last snippet, yes I think such a The problem is finding someone who wants to do this code change. My |
From mark@exonetric.com
On 09/03/2010 15:04, karl williamson wrote:
The concatenation point made sense to me. I interpreted that to imply In particular, this phrase by zefram "Perl doesn't distinguish between characters and octets" seemed to contradict my reading of the pod snippet "If strings operating under byte semantics and strings with Unicode Perhaps Zefram was trying to indicate that the semantics question only
If that character/bytes distinction only applies during concatenation,
Another poster indicated just such a module already exists, I believe, http://search.cpan.org/dist/encoding-warnings/lib/encoding/warnings.pm - Mark |
From zefram@fysh.org
Mark Blackman wrote:
Referring to the UTF-8 flag when describing concatenation is misleading. String concatenation concatenates sequences of characters. That's it. If your string logically consists of octets, rather than characters, Note that in describing concatenation I haven't referred to how the The UTF-8 flag *does* make a difference to semantics in a few situations. -zefram |
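Editorial illustration (not code from the thread): a minimal sketch of the point that concatenation operates on characters regardless of internal representation. It uses the built-in utf8::upgrade and utf8::is_utf8 functions; the variable names and sample strings are invented for the example.

```perl
use strict;
use warnings;

my $left  = "caf\xE9";    # four characters, the last one is U+00E9
my $right = " au lait";

# Concatenate with the left operand in its original representation.
my $plain = $left . $right;

# Concatenate again after upgrading a copy of the left operand.
my $upgraded = $left;
utf8::upgrade($upgraded);
my $via_upgraded = $upgraded . $right;

# The two results hold the same sequence of characters...
print "same characters\n" if $plain eq $via_upgraded;

# ...even though their internal flags differ.
print "internal flags differ\n"
    if utf8::is_utf8($via_upgraded) && !utf8::is_utf8($plain);
```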
From @khwilliamson
I didn't follow much of this, so maybe it will become clear to me after Zefram wrote:
If I understand correctly what you mean, I believe it to be wrong. Octets are not aliased to the latin1 character set. Without the UTF8 The only ways to have those upper-bit-set 128 octets mean Latin1 |
From @khwilliamson
karl williamson wrote:
I realize I could have been clearer. When the UTF8 flag is set, octets
From zefram@fysh.org
karl williamson wrote:
You're describing here the pre-Unicode character semantics, which
Here you're making the fundamental error of confusing the octets used If you're representing a string of octets, they don't magically get If you utf8::upgrade an octet string, the string still contains the Octet values and the Latin-1 characters are the same entities in the sense Now, semantics. In some situations, Perl will still give you the -zefram |
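Editorial illustration (not code from the thread): a small sketch of the distinction between a string's characters and the octets used to store it. It assumes only the built-in utf8:: functions and the core bytes module, loaded with an empty import list so the pragma itself is not enabled; the variable names are invented.

```perl
use strict;
use warnings;
use bytes ();    # makes bytes::length available without enabling "use bytes"

my $original = "caf\xE9";    # four characters, stored as four octets
my $upgraded = $original;
utf8::upgrade($upgraded);    # changes the internal representation only

# Character-level view: identical before and after the upgrade.
printf "characters: %d vs %d\n", length($original), length($upgraded);
printf "last code point: U+%04X vs U+%04X\n",
    ord(substr($original, -1)), ord(substr($upgraded, -1));

# Representation-level view: the upgraded copy needs one more octet,
# because U+00E9 is stored internally as two octets.
printf "stored octets: %d vs %d\n",
    bytes::length($original), bytes::length($upgraded);
```

Code that looks at the stored octets, as "use bytes" does, sees a difference that character-level operations do not.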
From @khwilliamson
Zefram wrote:
But this is not the old version of \w. The current version of \w works
I really think I don't understand you, and vice versa. When you said Certainly when one does a utf8::upgrade, the internal string Perhaps Zefram thinks that Unicode semantics is the default for Perl, or
From zefram@fysh.org
karl williamson wrote:
The level of meaning that you're talking about is a feature of You're still focussing too much on the representation. When you say "bit -zefram |
Migrated from rt.perl.org#72208 (status was 'open')
Searchable as RT72208$