IO::Handle.read-internal cannot handle fancy Unicode chars on TTY handles #6637

Open
p6rt opened this issue Nov 14, 2017 · 8 comments
Labels: IO, severe (a problem that is encountered frequently, or a problem that needs attention for other reasons), Windows

Comments

@p6rt

p6rt commented Nov 14, 2017

Migrated from rt.perl.org#132441 (status was 'open')

Searchable as RT132441$

@p6rt
Author

p6rt commented Nov 14, 2017

From @lefth

Note: results are from Windows 10. Same results from CMD and from Cmder.
Rakudo version 2017.09-355-g27131ed8d built on MoarVM version 2017.09.1-575-gd4e230a6

Working:
perl6 -e "<a b>».say" # output is as expected

But in the REPL, I tried to run:

<a b>».say

Without Linenoise installed, Rakudo endlessly repeats "> Decoder may not be used concurrently"

With Linenoise installed, I get this error:

Malformed UTF-8 at line 1 col 6
  in sub nativecast at D:\rakudo\share\perl6\sources\51E302443A2C8FF185ABC10CA1E5520EFEE885A1 (NativeCall::Types) line 5
  in method deref at D:\rakudo\share\perl6\sources\51E302443A2C8FF185ABC10CA1E5520EFEE885A1 (NativeCall::Types) line 58
  in sub linenoise at D:\rakudo\share\perl6\site\sources\0BDF8C54D33921FEA066491D8D13C96A7CB144B9 (Linenoise) line 86

I got a similar error when using these characters as quoting operators, such as: q«text».

Note: the obvious workaround is to substitute "<<" and ">>" for "«" and "»".

@p6rt
Author

p6rt commented Nov 15, 2017

From @AlexDaniel

Marked as 「SEVERE」 because it's very unfortunate that non-ascii stuff does not work. I don't know why it happens though, so people should feel free to retag as needed.


@p6rt
Author

p6rt commented Nov 15, 2017

The RT System itself - Status changed from 'new' to 'open'

@p6rt
Author

p6rt commented Nov 16, 2017

From @zoffixznet

On Mon, 13 Nov 2017 20:29:52 -0800, dan@zwell.net wrote:

> With Linenoise installed, I get this error:
>
> Malformed UTF-8 at line 1 col 6

That looks like it might be working correctly, except that cmd.exe wasn't told to use the UTF-8 code page (need to run `chcp 65001`).

> But in the REPL, I tried to run:
>
> <a b>».say
>
> Without Linenoise installed, Rakudo endlessly repeats "> Decoder may not be used concurrently"

On 2017.07 on Win7 with the 65001 code page enabled, the » char doesn't show up at all. It just seems to get removed from the content if I paste it into the terminal.

The default REPL uses `prompt` to read input, and here's another data point: prompt() returns Nil if the data it reads has fancy Unicode chars like » or 「」:

  C:\Users\zoffi>perl6 -e "say prompt"
  »
  Nil

  C:\Users\zoffi>perl6 -e "say prompt"
  42
  42

  C:\Users\zoffi>perl6 -e "say prompt"
  42»
  Nil

  C:\Users\zoffi>perl6 -e "say prompt"
  »42
  Nil

That looks like a good place to start investigating the OP's bug.

@p6rt
Author

p6rt commented Nov 21, 2017

From @zoffixznet

On Thu, 16 Nov 2017 09:53:46 -0800, cpan@zoffix.com wrote:

> The default REPL uses `prompt` to read stuff and here's another datapoint: prompt() returns Nil if the data it reads has fancy Unicode chars like » or 「」:
>
>   C:\Users\zoffi>perl6 -e "say prompt"
>   »
>   Nil

This looks to be a problem in MoarVM. `perl6 -e "say $*IN.read-internal(1000)"` gives an empty buf (Buf[uint8]:0x<>) if the input has fancypants chars.

https://irclog.perlgeek.de/perl6/2017-11-21#i_15478971

@p6rt
Author

p6rt commented Dec 25, 2017

From @zoffixznet

On Thu, 16 Nov 2017 09:53:46 -0800, cpan@zoffix.com wrote:

> On 2017.07 on Win7 with 65001 code page enabled, the » char doesn't show up at all. Just seems to get removed from the content if I paste it into the terminal.

I'm starting to think this might be a limitation of cmd.exe. Though strangely, I'm failing to find anyone mentioning this problem on Google...

For example, Perl 5 has the exact same problem, with the fancy chars getting stripped:

  C:\rakudo>perl -wlE "say '[' . scalar(readline) . ']'"
  e »♥ b
  [e b
  ]

I also cobbled together this C program from MSDN's code examples, and the same problem is present in it as well:

  #include <fcntl.h>
  #include <io.h>
  #include <stdio.h>

  int main(void) {
      unsigned char buffer[60000];
      unsigned int nbytes = sizeof buffer;
      int bytesread, i;

      /* Put stdin in binary mode so the CRT does no text translation */
      if (_setmode(_fileno(stdin), _O_BINARY) == -1)
          perror("Cannot set mode");
      else
          printf("'stdin' successfully changed to binary mode\n");

      /* _read returns an int: -1 on error, 0 at end of file */
      bytesread = _read(_fileno(stdin), buffer, nbytes);
      if (bytesread <= 0) {
          perror("Problem reading file");
          return 1;
      }
      printf("Read %d bytes from file\n", bytesread);

      /* Dump every byte that was read as a decimal value */
      printf("Read this: `");
      for (i = 0; i < bytesread; i++)
          printf("%u ", buffer[i]);
      printf("`\n================\n");

      return 0;
  }

Anything fancy gets read as a NUL byte instead of the proper bytes for that char:

  C:\rakudo>gcc test.c && a.exe
  'stdin' successfully changed to binary mode
  e »♥ b
  Read 8 bytes from file
  Read this: `101 32 0 0 32 98 13 10 `
  ================

  C:\rakudo>

@p6rt
Author

p6rt commented Dec 26, 2017

From @geekosaur

On Mon, Dec 25, 2017 at 1:07 AM, Zoffix Znet via RT <perl6-bugs-followup@perl.org> wrote:

> On Thu, 16 Nov 2017 09:53:46 -0800, cpan@zoffix.com wrote:
>
> > On 2017.07 on Win7 with 65001 code page enabled, the » char doesn't show up at all. Just seems to get removed from the content if I paste it into the terminal.
>
> Starting to think this might be a limitation of cmd.exe. Though strangely, I'm failing to find anyone mentioning this problem on Google...

IIRC this is known, and not really fixable. It's not even cmd.exe but a Windows console mode limitation.

--
brandon s allbery kf8nh sine nomine associates
allbery.b@​gmail.com ballbery@​sinenomine.net
unix, openafs, kerberos, infrastructure, xmonad http://sinenomine.net

@p6rt
Author

p6rt commented Dec 26, 2017

From @geekosaur

On Tue, Dec 26, 2017 at 12:15 AM, Brandon Allbery via RT <perl6-bugs-followup@perl.org> wrote:

> IIRC this is known, and not really fixable. It's not even cmd.exe but a Windows console mode limitation.

Come to think of it, there should be an existing mention of this on the MoarVM bug tracker (the ticket may have been closed as unfixable).


@p6rt added the IO, severe, and Windows labels on Jan 5, 2020