RFE: use heuristic for utf8 usage w/-Mutf8 in PERL5OPT #16551
Comments
From perl-diddler@tlinx.org
Created by perl-diddler@tlinx.org
For some time I had an odd output in one of my programs Once I'd finished development on older module, I simply
C does this: #include <stdio.h>
I can't think of any language that forces *Ideally*, perl wouldn't either. However, some would complain BUT, at the very least... a compromise heuristic could 1) if 0xc2 or 0xc3 followed by another hex byte in the range For some though, that would still let too much incompat slip To that I say, add: 2) if the ENV var PERL5OPT has -Mutf8 in it -- AND if "1" if more safety was wanted, ----------------- Tangential, but related: Additionally, if a config file is This might assist in putting the infamous perl utf8 bug This isn't version related as it happens under perl 5.24.0 Perl Info
From @Grinnz
A couple of things:
1. "Output the same bytes as on input." Nothing in Perl prevents this from occurring, but it's impossible to perform character-aware operations (like matching \w against Unicode word characters) without knowing what encoding to decode the input from.
2. "use utf8;" only affects the source code itself. Perl's treatment of the bytes in the source code is a very different matter from Perl's treatment of input and output bytes. Other operations are required to translate UTF-8 encoding at the STDIN/STDOUT/STDERR, ARGV, and opened-filehandle boundaries, among other things. These three things are covered by -CSAD. See https://metacpan.org/pod/perlrun#-C-[number/list]
3. I disagree with the feasibility of any of the presented heuristics. It's 100% possible for a single-byte encoded file to look like UTF-8.
4. Using the locale to set default utf8 layers was a failed experiment in (I believe) Perl 5.8.0. You can enable this behavior for yourself with -CSADL (or by adding L to your other -C switch arguments, see the link above).
5. A potential way forward to at least default to the behavior of 'use utf8;' (decoding source code as UTF-8) was previously discussed in https://www.nntp.perl.org/group/perl.perl5.porters/2017/10/msg246838.html - I don't think there's any reasonable path to defaulting other handles to set utf8 layers.
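To make the ambiguity in point 3 concrete, here is a minimal sketch. It is written in Python rather than Perl purely as a language-neutral illustration of the bytes involved. The function name `looks_like_utf8` is hypothetical, and its "decodes cleanly as UTF-8" check is an assumed stand-in for the kind of heuristic being proposed, not the exact rule from the original report.

```python
# The two bytes 0xC3 0xA9 are valid in both UTF-8 and Latin-1 (ISO-8859-1),
# but they mean different things in each encoding.
raw = bytes([0xC3, 0xA9])

as_utf8 = raw.decode("utf-8")      # one character: U+00E9 (e with acute)
as_latin1 = raw.decode("latin-1")  # two characters: U+00C3, U+00A9


def looks_like_utf8(data: bytes) -> bool:
    """Hypothetical heuristic: treat data as UTF-8 if it decodes without error."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# A Latin-1 file containing the two characters U+00C3 U+00A9 holds exactly
# these bytes, so the heuristic cannot tell which encoding the author meant.
latin1_file_bytes = "\u00c3\u00a9".encode("latin-1")

print(len(as_utf8), len(as_latin1))
print(latin1_file_bytes == raw)
print(looks_like_utf8(latin1_file_bytes))
```

The heuristic accepts the Latin-1 data as UTF-8 because the byte sequence is well-formed in both encodings; only out-of-band knowledge of the intended encoding resolves the question.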
The RT System itself - Status changed from 'new' to 'open'
Migrated from rt.perl.org#133183 (status was 'open')
Searchable as RT133183$