Windows: UTF-8 encoded output in cmd.exe with code page 65001 causes unexpected output #13794
From nanis@cpan.org
Created by nanis@cpan.org

On Windows 8.1 64-bit and Windows Vista 32-bit, using self-built 5.18.2

E.g.:
# Normal
# alpha, beta, gamma
# last octet, 0x7a, seems to be repeated on a separate line
# without a newline, more unexpected octets
# with trailing ascii, last three octets seem to be repeated

For comparison, the following C program, compiled with Microsoft (R) C/C++
#include <stdio.h>
int main(void) {

Further,
C:\Users\sinan\src> type pttt.pl
binmode STDOUT, ':utf8';
print 'αβγxyz', "\n";
C:\Users\sinan\src> perl pttt.pl

Note that piping the Perl scripts through xxd or other programs or saving
C:\Users\sinan\src> perl pttt.pl > ttt
C:\Users\sinan\src> type ttt

More info: http://blog.nu42.com/2014/05/utf-8-ouput-from-perl-and-c-programs-in.html

Also, when the console code page is set to 437, the output from the C
Finally, using `syswrite` with the UTF-8 encoded string also works as
C:\Users\sinan\src> perl -e "syswrite STDOUT, qq{\xce\xb1\xce\xb2\xce\xb3\n
I suspect an interaction between Perl's IO layers and cmd.exe set to code
For code pages, see
65001 utf-8 Unicode (UTF-8)

Thank you,
-- Sinan
Perl Info
|
From @tonycoz
On Thu May 01 18:26:16 2014, nanis@cpan.org wrote:

This is caused by a bug in Windows. When writing to a console set to code page 65001, WriteFile() returns the number of characters written instead of the number of bytes.

So the write loop in PerlIOBuf_flush() is told that only 8 bytes have been written (6 visible, CR, LF) instead of the 11 that actually were, and so it loops and writes the last 3 again (z, CR, LF).

See: http://social.msdn.microsoft.com/Forums/vstudio/en-US/e4b91f49-6f60-4ffe-887a-e18e39250905/possible-bugs-in-writefile-and-crt-unicode-issues?forum=vcgeneral for a thread on how this breaks MSVC console output.

Other languages have the same problem:
haskell: https://ghc.haskell.org/trac/ghc/ticket/4471

As to fixing[1] it, maybe we could add a perlio flag that assumes successful writes are always complete, and set that for the Win32 console.

Tony

[1] working around Microsoft's long-standing bug |
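The double write described above can be reproduced without Windows. The following is a minimal sketch, not Perl's actual code: `buggy_console_write`, `flush_all`, and the `console` buffer are hypothetical names invented for illustration. The fake write stores every byte but reports the number of UTF-8 characters, and the flush loop trusts that value as a byte count.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the console: every byte "written" lands here. */
char console[64];
size_t console_len = 0;

/* Simulated buggy write (an assumption modelled on the report above):
 * it stores all cnt bytes, but returns the number of UTF-8 characters,
 * i.e. the bytes that are not 10xxxxxx continuation bytes. */
int buggy_console_write(const char *buf, size_t cnt)
{
    int chars = 0;
    size_t i;

    assert(console_len + cnt <= sizeof console);
    memcpy(console + console_len, buf, cnt);
    console_len += cnt;

    for (i = 0; i < cnt; i++) {
        if (((unsigned char)buf[i] & 0xC0) != 0x80)
            chars++;
    }
    return chars;
}

/* Generic flush loop: treats the return value as a byte count and
 * retries whatever it believes is still unwritten. */
void flush_all(const char *buf, size_t cnt)
{
    while (cnt > 0) {
        int n = buggy_console_write(buf, cnt);
        if (n <= 0)
            break;
        buf += n;
        cnt -= (size_t)n;
    }
}
```

Calling `flush_all` on the 11 bytes of "αβγxyz" followed by CR LF makes the first write report 8, so the loop sends the remaining 3 bytes (z, CR, LF) a second time and the simulated console ends up holding 14 bytes.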
The RT System itself - Status changed from 'new' to 'open' |
From nanis@cpan.org

Thank you, Tony, I was not aware of this issue. Now, reading the MSDN documentation[1], I see that WriteFile using
Note that, if a pipe is broken during a synchronous write, WriteFile
Therefore, it seems to me, if WriteFile succeeds, there are only two
Therefore, the right thing to do seems to be to always return count
In fact, I just re-built 5.20.0 with this change. However, the
C:\Users\sinan> perl -e "print qq{\xce\xb1\xce\xb2\xce\xb3123}"
I am baffled.

-- Sinan

[1]: http://msdn.microsoft.com/en-us/library/windows/desktop/aa365747%28v=vs.85%29.aspx

On Tue, May 27, 2014 at 8:08 PM, Tony Cook via RT
-- |
From @tonycoz
On Thu, May 29, 2014 at 03:37:34PM -0400, A. Sinan Unur wrote:

PerlIOWin32_write() is part of the :win32 layer, which is incomplete
Win32 uses :unix as the bottom layer for file handles so you change

Tony |
From nanis@cpan.org
On Fri, May 30, 2014 at 1:56 AM, Tony Cook <tony@develop-help.com> wrote:
...
Well, that explains a lot, doesn't it. I focused my attention
I'll look in perlio.* and perliol.h then.

Thank you,
-- |
From @tonycoz
On Tue May 27 17:08:24 2014, tonyc wrote:

Here's a patch that does roughly what I suggested, though at the win32_write() level rather than at the PerlIO level.

Tony |
From @tonycoz
0001-perl-121783-work-around-a-bug-in-WriteFile.patch
From ef02acb1c78894083637626b9cda8d411b923cc2 Mon Sep 17 00:00:00 2001
From: Tony Cook <tony@develop-help.com>
Date: Thu, 16 Oct 2014 12:17:33 +1100
Subject: [perl #121783] work around a bug in WriteFile()
---
win32/win32.c | 33 ++++++++++++++++++++++++++++++++-
1 files changed, 32 insertions(+), 1 deletions(-)
diff --git a/win32/win32.c b/win32/win32.c
index 26d419e..a13522d 100644
--- a/win32/win32.c
+++ b/win32/win32.c
@@ -3322,7 +3322,38 @@ win32_read(int fd, void *buf, unsigned int cnt)
 DllExport int
 win32_write(int fd, const void *buf, unsigned int cnt)
 {
-    return write(fd, buf, cnt);
+    int len = write(fd, buf, cnt);
+    if (len != cnt && len > 0) {
+        /* make sure win32_isatty() doesn't fiddle with
+         * errno/GetLastError()
+         */
+        dSAVE_ERRNO;
+        if (win32_isatty(fd)) {
+            /* WriteFile() to a console returns the number of characters
+             * written to the display rather than the number of bytes
+             * written.
+             *
+             * eg. if the console CP is set to 65001, and the console
+             * font is a TrueType font, writing
+             * "\xce\xb1\xce\xb2\xce\xb3\n" will return 4 instead of 7.
+             *
+             * If the console font is a raster font, it will return the
+             * full count instead, this means we can't reliably convert
+             * the returned character count into a byte count.
+             *
+             * Since WriteConsole() (which WriteFile() appears to be
+             * implemented in terms of) simply fails when supplied too
+             * much data, we assume the same holds for WriteFile().
+             *
+             * So we assume that if anything was written that the entire
+             * buffer was written correctly.
+             */
+            len = cnt;
+        }
+        RESTORE_ERRNO;
+    }
+
+    return len;
 }
 
 DllExport int
--
1.7.4.msysgit.0
|
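The decision the patch makes can be isolated as a pure function, which makes the cases easier to see. This is only a sketch: `adjust_write_result` and its `is_console` parameter are hypothetical names, not part of the patch or of Perl, and the real code asks `win32_isatty()` and preserves errno around that call.

```c
#include <assert.h>

/* Hypothetical helper (not in the patch): given the value returned by the
 * underlying write(), the requested byte count, and whether the descriptor
 * is a console, return the count that should be reported to the caller. */
int adjust_write_result(int len, unsigned int cnt, int is_console)
{
    /* The sketch assumes cnt fits in an int so the cast below is safe. */
    assert(cnt <= 0x7fffffffu);

    if (len > 0 && (unsigned int)len != cnt && is_console) {
        /* A positive short count on a console is taken to mean the whole
         * buffer went out, mirroring the assumption stated in the patch. */
        return (int)cnt;
    }

    /* Errors, zero-length results, full writes, and short writes to
     * non-console descriptors are passed through unchanged. */
    return len;
}
```

With the example from the patch comment, a console write of 7 bytes that reports 4 is adjusted to 7, while the same short count on a file or pipe stays 4 so that a genuine partial write is still retried.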
From @khwilliamson

Tony Cook suggested a patch, but we did not hear back from the OP in more than 2 years.

A. Sinan Unur: What is the status of this ticket for you?
-- |
From @xenu
On Tue, 27 May 2014 17:08:24 -0700, tonyc wrote:

Note that at some point Microsoft started fixing cp65001-related bugs. Right now I only have access to Windows XP and 10 machines, and I can confirm that the WriteFile() bug exists in XP but *not* in Windows 10. I *think* it was fixed in Windows 8, but I can't say for sure right now. |
From nanis@cpan.org
On Fri, Feb 10, 2017 at 6:09 PM, Tomasz Konojacki via RT

Similar situation here. I do have a Vista laptop I can try it out on

-- Sinan |
From @khwilliamson
On Fri, 10 Feb 2017 15:39:07 -0800, nanis@cpan.org wrote:

What is the status of this? |
From nanis@cpan.org

With stock 5.26.1 compiled using Visual Studio 2017, I get
C:\> perl -e "print qq{\xce\xb1\xce\xb2\xce\xb3123}"
so, it looks like the problem does not exist any more. I did not try

Summary of my perl5 (revision 5 version 26 subversion 1) configuration:
Platform:
Characteristics of this binary (from libperl):

On Mon, Mar 12, 2018 at 3:19 PM, Karl Williamson via RT
|
From @khwilliamson
On Mon, 12 Mar 2018 12:31:59 -0700, nanis@cpan.org wrote:

Does anyone object to closing this ticket, then? It's a Windows bug, and Windows has been fixed in modern versions. We shouldn't have to add a workaround for things that MS aren't supporting.
-- |
From @tonycoz
On Mon, 15 Apr 2019 21:10:38 -0700, khw wrote:

The issue is still present in Windows 7, which still receives security updates until January 2020[1]. That said, it's a Microsoft bug which they've fixed in newer versions, so maybe let this ticket die quietly.

Tony

[1] longer if you give Microsoft unspecified amounts of money |
Can we now close this, since Win 7 is at end of life? |
Happy February 2020! |
Migrated from rt.perl.org#121783 (status was 'open')
Searchable as RT121783$