Report information
Id: 75000
Status: open
Priority: 0/
Queue: perl5

Owner: Nobody
Requestors: morozovvs <perlbug [at]>

Operating System: Linux
PatchStatus: (no value)
Severity: medium
  • core
  • OS-interaction
  • Unicode
Perl Version:
  • 5.10.1
  • 5.12.0
  • 5.13.0
Fixed In: (no value)

CC: perlbug [...]
Subject: Unicode symbols damaged in $File::Find::name
Date: Sun, 9 May 2010 21:55:00 +0300 (EEST)
To: perlbug [...]
From: root [...] (root)
This is a bug report for perl from, generated with the help of perlbug 1.39 running under perl 5.10.1.

-----------------------------------------------------------------
[Please describe your issue here]

when executed following code

find(sub {
    return if -d $File::Find::name;
    return if ! /$suffixes$/;
    my $name=$File::Find::name;
    print 'File: ';
    print $_;
    print ' Path: ';
    print $name;
}, $directory);

with folder containing files named with non-latin characters the output of '$name' contains damaged unicode characters.
If $directory also contains non-latin characters only file names are damaged ($directory part is correct)

[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
    category=core
    severity=critical
---
Site configuration information for perl 5.10.1:

Configured by Debian Project at Sun Apr 11 22:31:36 UTC 2010.

Summary of my perl5 (revision 5 version 10 subversion 1) configuration:

  Platform:
    osname=linux, osvers=, archname=i486-linux-gnu-thread-multi
    uname='linux murphy #1 smp fri apr 2 10:32:00 cest 2010 i686 gnulinux '
    config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -Dcccdlflags=-fPIC -Darchname=i486-linux-gnu -Dprefix=/usr -Dprivlib=/usr/share/perl/5.10 -Darchlib=/usr/lib/perl/5.10 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.10.1 -Dsitearch=/usr/local/lib/perl/5.10.1 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Ud_ualarm -Uusesfio -Uusenm -DDEBUGGING=-g -Doptimize=-O2 -Duseshrplib -Dd_dosuid -des'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=undef, use64bitall=undef, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2 -g',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
    ccversion='', gccversion='4.4.3', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib /usr/lib64
    libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
    perllibs=-ldl -lm -lpthread -lc -lcrypt
    libc=/lib/, so=so, useshrplib=true,
    gnulibc_version='2.10.2'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -O2 -g -L/usr/local/lib -fstack-protector'

Locally applied patches:
    DEBPKG:debian/arm_thread_stress_timeout - Raise the timeout of ext/threads/shared/t/stress.t to accommodate slower build hosts
    DEBPKG:debian/cpan_config_path - Set location of CPAN::Config to /etc/perl as /usr may not be writable.
    DEBPKG:debian/cpan_definstalldirs - Provide a sensible INSTALLDIRS default for modules installed from CPAN.
    DEBPKG:debian/db_file_ver - Remove overly restrictive DB_File version check.
    DEBPKG:debian/doc_info - Replace generic man(1) instructions with Debian-specific information.
    DEBPKG:debian/enc2xs_inc - Tweak enc2xs to follow symlinks and ignore missing @INC directories.
    DEBPKG:debian/errno_ver - Remove Errno version check due to upgrade problems with long-running processes.
    DEBPKG:debian/extutils_hacks - Various debian-specific ExtUtils changes
    DEBPKG:debian/fakeroot - Postpone LD_LIBRARY_PATH evaluation to the binary targets.
    DEBPKG:debian/instmodsh_doc - Debian policy doesn't install .packlist files for core or vendor.
    DEBPKG:debian/ld_run_path - Remove standard libs from LD_RUN_PATH as per Debian policy.
    DEBPKG:debian/libnet_config_path - Set location of libnet.cfg to /etc/perl/Net as /usr may not be writable.
    DEBPKG:debian/m68k_thread_stress - Disable some threads tests on m68k for now due to missing TLS.
    DEBPKG:debian/mod_paths - Tweak @INC ordering for Debian
    DEBPKG:debian/module_build_man_extensions - Adjust Module::Build manual page extensions for the Debian Perl policy
    DEBPKG:debian/perl_synopsis - Rearrange perl.pod
    DEBPKG:debian/prune_libs - Prune the list of libraries wanted to what we actually need.
    DEBPKG:debian/use_gdbm - Explicitly link against -lgdbm_compat in ODBM_File/NDBM_File.
    DEBPKG:fixes/assorted_docs - [384f06a] Math::BigInt::CalcEmu documentation grammar fix
    DEBPKG:fixes/net_smtp_docs - [ #36038] Document the Net::SMTP 'Port' option
    DEBPKG:fixes/processPL - [ #17224] Always use PERLRUNINST when building perl modules.
    DEBPKG:debian/perlivp - Make perlivp skip include directories in /usr/local
    DEBPKG:fixes/pod2man-index-backslash - Escape backslashes in .IX entries
    DEBPKG:debian/disable-zlib-bundling - Disable zlib bundling in Compress::Raw::Zlib
    DEBPKG:fixes/kfreebsd_cppsymbols - [3b910a0] Add gcc predefined macros to $Config{cppsymbols} on GNU/kFreeBSD.
    DEBPKG:debian/cpanplus_definstalldirs - Configure CPANPLUS to use the site directories by default.
    DEBPKG:debian/cpanplus_config_path - Save local versions of CPANPLUS::Config::System into /etc/perl.
    DEBPKG:fixes/kfreebsd-filecopy-pipes - [16f708c] Fix File::Copy::copy with pipes on GNU/kFreeBSD
    DEBPKG:fixes/anon-tmpfile-dir - [perl #66452] Honor TMPDIR when open()ing an anonymous temporary file
    DEBPKG:fixes/abstract-sockets - [89904c0] Add support for Abstract namespace sockets.
    DEBPKG:fixes/hurd_cppsymbols - [eeb92b7] Add gcc predefined macros to $Config{cppsymbols} on GNU/Hurd.
    DEBPKG:fixes/autodie-flock - Allow for flock returning EAGAIN instead of EWOULDBLOCK on linux/parisc
    DEBPKG:fixes/archive-tar-instance-error - [ #48879] Separate Archive::Tar instance error strings from each other
    DEBPKG:fixes/positive-gpos - [perl #69056] [c584a96] Fix \\G crash on first match
    DEBPKG:debian/devel-ppport-ia64-optim - Work around an ICE on ia64
    DEBPKG:fixes/trie-logic-match - [perl #69973] [0abd0d7] Fix a DoS in Unicode processing [CVE-2009-3626]
    DEBPKG:fixes/hppa-thread-eagain - make the threads-shared test suite more robust, fixing failures on hppa
    DEBPKG:fixes/crash-on-undefined-destroy - [perl #71952] [1f15e67] Fix a NULL pointer dereference when looking for a DESTROY method
    DEBPKG:fixes/tainted-errno - [perl #61976] [be1cf43] fix an errno stringification bug in taint mode
    DEBPKG:patchlevel - List packaged patches for 5.10.1-12 in patchlevel.h
---
@INC for perl 5.10.1:
    /etc/perl
    /usr/local/lib/perl/5.10.1
    /usr/local/share/perl/5.10.1
    /usr/lib/perl5
    /usr/share/perl5
    /usr/lib/perl/5.10
    /usr/share/perl/5.10
    /usr/local/lib/site_perl
    .
---
Environment for perl 5.10.1:
    HOME=/root
    LANG=ru_RU.UTF-8
    LANGUAGE=ru_RU:ru
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/opt/qtsdk-2010.02/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
    PERL_BADLANG (unset)
    SHELL=/bin/bash
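
The report does not say how $directory was obtained or how the output was encoded, so the following is only a sketch of one way the described symptom (directory part intact, file name damaged) could arise. It assumes that the directory name was a decoded character string, that the file system returned the file name as raw UTF-8 bytes, and that the joined path was then encoded as UTF-8 for output. The variable names and the sample letter U+0416 are illustrative and do not come from the report.

```perl
use strict;
use warnings;

# Assumption: the directory name reached the program as a decoded
# character string (a single character, U+0416).
my $dir = "\x{416}";

# Assumption: the file system hands back the file name as raw bytes,
# here the two-byte UTF-8 encoding of the same letter.
my $name_bytes = "\xD0\x96";

# Joining the two upgrades the byte string as if it were Latin-1,
# so each byte becomes a separate character.
my $path = "$dir/$name_bytes";

# Encoding the path for output then encodes those two characters again.
my $damaged = $path;
utf8::encode($damaged);

# Decoding the file name first keeps it as one character.
my $name_chars = $name_bytes;
utf8::decode($name_chars);
my $intact = "$dir/$name_chars";
utf8::encode($intact);

# Print the resulting bytes in hex, separated by dots.
printf "damaged: %vX\n", $damaged;
printf "intact:  %vX\n", $intact;
```

In the "damaged" line the bytes before the slash match those in the "intact" line, while the bytes after the slash are double-encoded, which is consistent with the behaviour the report describes.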
Subject: Re: [perl #75000] Unicode symbols damaged in $File::Find::name
Date: Mon, 10 May 2010 18:04:27 +0100
To: perl5-porters [...]
From: Dave Mitchell <davem [...]>
On Sun, May 09, 2010 at 11:57:36AM -0700, Vladimir Morozov wrote:
> when executed following code
> find(sub {
>     return if -d $File::Find::name;
>     return if ! /$suffixes$/;
>     my $name=$File::Find::name;
>     print 'File: ';
>     print $_;
>     print ' Path: ';
>     print $name;
> }, $directory);
> with folder containing files named with non-latin characters the output of '$name' contains damaged unicode characters.
> If $directory also contains non-latin characters only file names are damaged ($directory part is correct)

This is a general issue with filenames, and not just restricted to File::Find. For example the following shows that the returned filename string isn't UTF-8 encoded:

    my $f = "file\x{100}";
    open my $fh, '>', $f or die "open: $!\n";
    close $fh;
    my ($newf) = <file*>;
    use Devel::Peek;
    Dump $f;
    Dump $newf;

A workaround (if you know that the filenames are UTF8 encoded) is to UTF-8 decode the returned filename before using it, e.g.:

    my $name = $_;
    utf8::decode($name);

I notice that perltodo.pod has this entry:

    =head2 Unicode and glob()

    Currently glob patterns and filenames returned from File::Glob::glob() are always byte strings. See L</"Virtualize operating system access">.

and perlrun.pod has this entry:

    =item B<-C [I<number/list>]>
    ...
    =for todo
    perltodo mentions Unicode in %ENV and filenames. I guess that these will be options e and f (or F).

-- 
This is a great day for France!
    -- Nixon at Charles De Gaulle's funeral
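
The workaround suggested in the reply can be sketched for the File::Find case roughly as follows. This is a minimal illustration, not a fix for the underlying issue: find_decoded is a hypothetical helper (not part of File::Find), the file name containing U+0416 is made up for the demonstration, and the code assumes a file system that stores names as UTF-8 bytes and returns them unchanged.

```perl
use strict;
use warnings;
use File::Find qw(find);
use File::Temp qw(tempdir);

# Hypothetical helper: walk $dir and return the paths of all
# non-directory entries as decoded character strings.
# Assumption: the file system stores file names as UTF-8 bytes.
sub find_decoded {
    my ($dir) = @_;
    my @paths;
    find({
        no_chdir => 1,    # keep $File::Find::name usable for file tests
        wanted   => sub {
            return if -d $File::Find::name;    # test with the raw bytes
            my $name = $File::Find::name;      # decode a copy only
            utf8::decode($name)
                or warn "not valid UTF-8, left as bytes: $name\n";
            push @paths, $name;
        },
    }, $dir);
    return @paths;
}

# Usage: create one file whose name contains U+0416, then list it.
my $dir   = tempdir(CLEANUP => 1);
my $bytes = "file\x{416}.txt";
utf8::encode($bytes);                  # explicit UTF-8 bytes for open()
open my $fh, '>', "$dir/$bytes" or die "open: $!\n";
close $fh;

my @found = find_decoded($dir);
binmode STDOUT, ':encoding(UTF-8)';    # encode once, on output
print "$_\n" for @found;
```

The file test still uses the undecoded $File::Find::name, because the operating system expects the original bytes. Only the copy kept for display or string processing is decoded.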