documentation needed: backticks, qx() return octets, not characters #15249

Closed
p5pRT opened this issue Mar 24, 2016 · 13 comments

Comments

p5pRT commented Mar 24, 2016

Migrated from rt.perl.org#127780 (status was 'resolved')

Searchable as RT127780$

p5pRT commented Mar 24, 2016

From @karenetheridge

This is not surprising when you think about it, but it's a subtle gotcha that can cause a lot of bugs. Any content that is fetched via the backtick operators or qx() comes back as *encoded octets* (since it's just a readpipe), and the user might need to decode it first to get a readable string.

Therefore:

  use utf8;
  my $str = "Les hivers de mon enfance étaient des saisons longues, longues."
  . `bin/chandail --stanza 2`
  . "mais la vraie vie était sur la patinoire.";

...is going to result in a garbage string, because utf8-ENcoded octets will be
mixed in with utf8-DEcoded characters. One needs to wrap a decode('UTF-8',
...) around the output of the command!

We should really mention this in perlop.pod.

(and then later, in the documentation for IPC::System::Simple, Capture::Tiny, etc...)
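
To make the decode step concrete, here is a minimal sketch rather than a definitive recipe. It assumes the command emits UTF-8, that no `-C` switch or PERL_UNICODE setting changes the default input layers, and it uses a child perl (a hypothetical stand-in for `bin/chandail`) that prints the two octets encoding "é".

```perl
use strict;
use warnings;
use Encode qw(decode);

# Path of the running perl; used to build a stand-in external command.
my $perl = $^X;

# Backticks return raw octets: here 0xC3 0xA9, the UTF-8 encoding of U+00E9.
my $octets = `$perl -e "print chr(0xC3), chr(0xA9)"`;

# Decode the octets into a character string before mixing them
# with other character strings.
my $chars = decode('UTF-8', $octets);

printf "octets: length %d\n", length $octets;                       # length 2
printf "chars:  length %d, ord %d\n", length $chars, ord $chars;    # length 1, ord 233
```

Only `$chars` is safe to concatenate with literals written under `use utf8`; concatenating `$octets` directly gives the garbage string described above.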

p5pRT commented Mar 24, 2016

From @tonycoz

On Thu, Mar 24, 2016 at 04:15:00PM -0700, Karen Etheridge wrote:

> This is not surprising when you think about it, but it's a subtle gotcha that can cause a lot of bugs. Any content that is fetched via the backtick operators or qx() comes back as *encoded octets* (since it's just a readpipe), and the user might need to decode it first to get a readable string.
>
> Therefore:
>
>   use utf8;
>   my $str = "Les hivers de mon enfance étaient des saisons longues, longues."
>   . `bin/chandail --stanza 2`
>   . "mais la vraie vie était sur la patinoire.";
>
> ...is going to result in a garbage string, because utf8-ENcoded octets will be
> mixed in with utf8-DEcoded characters. One needs to wrap a decode('UTF-8',
> ...) around the output of the command!
>
> We should really mention this in perlop.pod.
>
> (and then later, in the documentation for IPC::System::Simple, Capture::Tiny, etc...)

You can control decoding of qx() with the open pragma:

tony@mars:~$ perl -le 'use open IN => ":utf8"; $x = `perl -e "binmode STDOUT, q(:utf8); print chr(0x100)"`; print ord $x'
256

Though you really shouldn't use :utf8 for input: it doesn't validate the incoming data, so :encoding(UTF-8) is the safer layer.

Tony

p5pRT commented Mar 24, 2016

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented Apr 13, 2016

From @tonycoz

On Thu Mar 24 16:15:00 2016, ether wrote:

> We should really mention this in perlop.pod.

How about the attached?

Tony

p5pRT commented Apr 13, 2016

From @tonycoz

0001-perl-127780-point-backtick-users-at-the-open-pragma.patch
From 4adc98139ea3ddd24ad19bae9f36338d54bca0de Mon Sep 17 00:00:00 2001
From: Tony Cook <tony@develop-help.com>
Date: Wed, 13 Apr 2016 11:57:19 +1000
Subject: [PATCH] (perl #127780) point backtick users at the open pragma

---
 pod/perlop.pod | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/pod/perlop.pod b/pod/perlop.pod
index 17d24bb..292381f 100644
--- a/pod/perlop.pod
+++ b/pod/perlop.pod
@@ -2316,6 +2316,12 @@ failure modes by inspecting C<$?> like this:
         printf "child exited with value %d\n", $? >> 8;
     }
 
+Use the L<open> pragma to control the I/O layers used when reading the
+output of the command, for example:
+
+  use open IN => "encoding(UTF-8)";
+  my $x = `cmd-producin-utf-8`;
+
 See L</"I/O Operators"> for more discussion.
 
 =item C<qw/I<STRING>/>
-- 
2.1.4

p5pRT commented Apr 13, 2016

From @karenetheridge

Is it 'encoding(UTF-8)', or ':encoding(UTF-8)'? Is there any way of changing the encoding just for a local scope, rather than for the entire program?

p5pRT commented Apr 13, 2016

From @tonycoz

On Tue Apr 12 19:17:27 2016, ether wrote:

> Is it 'encoding(UTF-8)', or ':encoding(UTF-8)'?

Oops, it's :encoding(UTF-8), new patch.

> Is there any way of
> changing the encoding just for a local scope, rather than for the
> entire program?

The open pragma is lexically scoped*:

# utf8.txt contains a UTF-8 \x{100}
$ perl -le '{ use open IN => ":encoding(UTF-8)"; $x = `cat utf8.txt`; print ord $x } $x = `cat utf8.txt`; print ord $x'
256
196

Tony

* mostly, the :std option is global
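
Since the pragma is lexically scoped, one way to confine decoding to a single call site is to put it inside a small helper. The following is only a sketch under assumptions: `capture_utf8` is a hypothetical name, the child perl stands in for `cat utf8.txt` by printing the two octets that encode U+0100, and no `-C` switch or PERL_UNICODE setting is in effect.

```perl
use strict;
use warnings;

# Stand-in command: a child perl that emits 0xC4 0x80 (UTF-8 for U+0100).
my $perl = $^X;
my $cmd  = qq{$perl -e "print chr(0xC4), chr(0x80)"};

# Hypothetical helper: the open pragma applies only inside this sub's body.
sub capture_utf8 {
    my ($command) = @_;
    use open IN => ':encoding(UTF-8)';
    return scalar `$command`;
}

my $decoded = capture_utf8($cmd);   # characters, decoded by the I/O layer
my $raw     = `$cmd`;               # octets, no layer in effect here

printf "inside helper: ord %d\n", ord $decoded;   # 256
printf "outside:       ord %d\n", ord $raw;       # 196
```

The two values match the 256 and 196 printed by the one-liner above: the same command yields characters inside the scope of the pragma and octets outside it.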

p5pRT commented Apr 13, 2016

From @tonycoz

0001-perl-127780-point-backtick-users-at-the-open-pragma.patch
From 397e69fe56aa7085de152c6cabbf124fdaf16b47 Mon Sep 17 00:00:00 2001
From: Tony Cook <tony@develop-help.com>
Date: Wed, 13 Apr 2016 14:18:57 +1000
Subject: [PATCH] (perl #127780) point backtick users at the open pragma

---
 pod/perlop.pod | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/pod/perlop.pod b/pod/perlop.pod
index 17d24bb..33ede83 100644
--- a/pod/perlop.pod
+++ b/pod/perlop.pod
@@ -2316,6 +2316,12 @@ failure modes by inspecting C<$?> like this:
         printf "child exited with value %d\n", $? >> 8;
     }
 
+Use the L<open> pragma to control the I/O layers used when reading the
+output of the command, for example:
+
+  use open IN => ":encoding(UTF-8)";
+  my $x = `cmd-producin-utf-8`;
+
 See L</"I/O Operators"> for more discussion.
 
 =item C<qw/I<STRING>/>
-- 
2.1.4

p5pRT commented Apr 13, 2016

From @karenetheridge

On Tue Apr 12 21:23:27 2016, tonyc wrote:

> The open pragma is lexically scoped*:
> * mostly, the :std option is global

Aha, I didn't realize that. I should read the documentation more thoroughly!

I'm happy with this patch. Thanks!

p5pRT commented May 17, 2016

From @tonycoz

On Tue Apr 12 21:34:01 2016, ether wrote:

> On Tue Apr 12 21:23:27 2016, tonyc wrote:
>
> > The open pragma is lexically scoped*:
> > * mostly, the :std option is global
>
> Aha, I didn't realize that. I should read the documentation more thoroughly!
>
> I'm happy with this patch. Thanks!

Applied as fe43a9c (with s/producin/producing/)

Tony

p5pRT commented May 17, 2016

@tonycoz - Status changed from 'open' to 'pending release'

p5pRT commented May 30, 2017

From @khwilliamson

Thank you for filing this report. You have helped make Perl better.

With the release today of Perl 5.26.0, this and 210 other issues have been
resolved.

Perl 5.26.0 may be downloaded via:
https://metacpan.org/release/XSAWYERX/perl-5.26.0

If you find that the problem persists, feel free to reopen this ticket.

p5pRT commented May 30, 2017

@khwilliamson - Status changed from 'pending release' to 'resolved'
