Report information
Id: 79214
Status: open
Priority: 0/
Queue: perl5

Owner: Nobody
Requestors: tom christiansen <tchrist [at] perl.com>
Cc:
AdminCc:

Operating System: darwin
PatchStatus: (no value)
Severity: low
Type: debugger
Perl Version: (no value)
Fixed In: (no value)



Subject: Perl debugger hopeless with Unicode program identifiers
Date: Sun, 14 Nov 2010 15:59:30 -0700
To: perlbug [...] perl.org
From: Tom Christiansen <tchrist [...] perl.com>
The Perl debugger does not work with Unicode identifiers.
See the little main() function below.

This command:

    % echo "ácütê" | perl -CS -d -S leo

Yields this kind of garbage:

    Loading DB routines from perl5db.pl version 1.33
    Editor support available.

    Enter h or `h h' for help, or `man perldebug' for more help.

    main::(/Users/tomchristiansen/scripts/leo:38):
    38:     main();
      DB<1> b flip_diacriticals
      DB<2> c
    main::flip_diacriticals(/Users/tomchristiansen/scripts/leo:135):
    135:        binmode(DATA, ":utf8");
      DB<2> T
    Wide character in print at /usr/local/lib/perl5/5.12.2/perl5db.pl line 6789, <> line 1.
    Wide character in print at /usr/local/lib/perl5/5.12.2/perl5db.pl line 5724, <> line 1.
     at /usr/local/lib/perl5/5.12.2/perl5db.pl line 5724
        DB::print_trace('GLOB(0x253060)', 1) called at /usr/local/lib/perl5/5.12.2/perl5db.pl line 2842
        DB::DB called at /Users/tomchristiansen/scripts/leo line 135
        main::flip_diacriticals('êtücá') called at /Users/tomchristiansen/scripts/leo line 121
        main::reverse_mark_flip('ácütê') called at /Users/tomchristiansen/scripts/leo line 57
        main::uʍopəpᴉƨdn('ácütê') called at /Users/tomchristiansen/scripts/leo line 47
        main::main() called at /Users/tomchristiansen/scripts/leo line 38
    $ = main::flip_diacriticals('êtücá') called from file `/Users/tomchristiansen/scripts/leo' line 121
    $ = main::reverse_mark_flip('M-acM-|tM-j') called from file `/Users/tomchristiansen/scripts/leo' line 57
    $ = main::uʍopəpᴉƨdn('M-acM-|tM-j') called from file `/Users/tomchristiansen/scripts/leo' line 47
    . = main::main() called from file `/Users/tomchristiansen/scripts/leo' line 38

I've used -CS on the command line, and I've even used it
before -d.  I have PERL_UNICODE=SA in my shell.  My program
starts like this:

    use 5.010_000;

    use utf8;
    use strict;
    use autodie;
    use warnings qw[ FATAL all ];
    use open     qw[ :std :utf8 ];

I can't think of anything else to do.

Oh wait.  Yes, I can!

    % echo "ácütê" | perl -CS -d -S leo
    Loading DB routines from perl5db.pl version 1.33
    Editor support available.

    Enter h or `h h' for help, or `man perldebug' for more help.

    main::(/Users/tomchristiansen/scripts/leo:38):
    38:     main();
      DB<1> binmode(DB::OUT, ":utf8") || die
      DB<2> b flip_diacriticals
      DB<3> c
    main::flip_diacriticals(/Users/tomchristiansen/scripts/leo:135):
    135:        binmode(DATA, ":utf8");
      DB<3> T
    $ = main::flip_diacriticals('êtücá') called from file `/Users/tomchristiansen/scripts/leo' line 121
    $ = main::reverse_mark_flip('M-acM-|tM-j') called from file `/Users/tomchristiansen/scripts/leo' line 57
    $ = main::uʍopəpᴉƨdn('M-acM-|tM-j') called from file `/Users/tomchristiansen/scripts/leo' line 47
    . = main::main() called from file `/Users/tomchristiansen/scripts/leo' line 38

See, it's still garbage!

What am I supposed to do?

And watch this:

      DB<3> b main::uʍopəpᴉƨdn
    Subroutine main::u not found.

That was entered as

    b main::<TAB>

and it completed to that CRAP.  Heck, even when I type

    b main::uʍopəpᴉƨdn

it ignores me and displays

    b main::uʍopəpᴉƨdn

and then again bitches about

    Subroutine main::u not found.

To add injury to insult, that's illegal UTF-8 up there in its output!
What am I supposed to do about *that*?

It's just totally bollocksed, is what it is.
:(

--tom

Summary of my perl5 (revision 5 version 12 subversion 2) configuration:

  Platform:
    osname=darwin, osvers=9.8.0, archname=darwin-thread-multi-2level
    uname='darwin mac 9.8.0 darwin kernel version 9.8.0: wed jul 15 16:55:01 pdt 2009; root:xnu-1228.15.4~1release_i386 i386 i386 '
    config_args='-Dusethreads -de'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=undef, use64bitall=undef, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-fno-common -DPERL_DARWIN -no-cpp-precomp -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -I/opt/local/include',
    optimize='-O3',
    cppflags='-no-cpp-precomp -fno-common -DPERL_DARWIN -no-cpp-precomp -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -I/opt/local/include'
    ccversion='', gccversion='4.0.1 (Apple Inc. build 5465)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='env MACOSX_DEPLOYMENT_TARGET=10.3 cc', ldflags =' -fstack-protector -L/usr/local/lib -L/opt/local/lib'
    libpth=/usr/local/lib /opt/local/lib /usr/lib
    libs=-ldbm -ldb -ldl -lm -lutil -lc
    perllibs=-ldl -lm -lutil -lc
    libc=/usr/lib/libc.dylib, so=dylib, useshrplib=false, libperl=libperl.a
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=bundle, d_dlsymun=undef, ccdlflags=' '
    cccdlflags=' ', lddlflags=' -bundle -undefined dynamic_lookup -L/usr/local/lib -L/opt/local/lib -fstack-protector'

Characteristics of this binary (from libperl):
  Compile-time options: MULTIPLICITY PERL_DONT_CREATE_GVSV
                        PERL_IMPLICIT_CONTEXT PERL_MALLOC_WRAP USE_ITHREADS
                        USE_LARGE_FILES USE_PERLIO USE_PERL_ATOF
                        USE_REENTRANT_API
  Built under darwin
  Compiled at Oct 21 2010 23:16:39
  %ENV:
    PERL_UNICODE="SA"
  @INC:
    /usr/local/lib/perl5/site_perl/5.12.2/darwin-thread-multi-2level
    /usr/local/lib/perl5/site_perl/5.12.2
    /usr/local/lib/perl5/5.12.2/darwin-thread-multi-2level
    /usr/local/lib/perl5/5.12.2
    /usr/local/lib/perl5/site_perl/5.11.3
    /usr/local/lib/perl5/site_perl/5.10.0
    /usr/local/lib/perl5/site_perl
    .

And here's "leo", the happy program that causes all that:

    % perl -CS -lE 'use utf8;
        say "good job there?";
        say "mais bien sûr";
        say "oh we got àcüté trouble!";
        say "El niño está loco.";
        say "¡qué malo!"' | leo

    ¡olɐɯ ə̗nb!
    ˙oɔol ɐ̗ʇƨə oṵᴉu lƎ
    ¡əlqnoɹʇ ə̗ʇn̤ɔɐ̖ ʇo⅁ əʍ ɥo
    ɹn̬ƨ uəᴉq ƨᴉɐɯ
    ¿əɹəɥʇ qoſ̣ poo⅁

--tom

#!/usr/local/bin/perl -l
#
# leo (leonardo script) - reverse input to ʇndʇno
#
# Tom Christiansen <tchrist@perl.com>
# Sat Nov 13 19:05:43 MST 2010
#
#################################################################

use 5.010_000;

use utf8;
use strict;
use autodie;
use warnings qw[ FATAL all ];
use open     qw[ :std :utf8 ];

use autouse
    "Unicode::Normalize" => qw[ NFD NFC NFKD NFKC ];

use constant BOTH_WAYS => 0;

#################################################################

sub flip_diacriticals($);

# heredoc beaᵘtification routines
sub dequeue($$);
sub strip_qq($);
sub strip_q($);

sub xbrace_quote(@);
sub reverse_mark_flip($);

sub main();

#################################################################

main();
exit();

#################################################################

sub main() {
    sub uʍopəpᴉƨdn($);
    for my $input (reverse <>) {
        chomp $input;
        my $ʇndʇno = uʍopəpᴉƨdn($input);
        say $ʇndʇno;
    }
}

#################################################################

sub uʍopəpᴉƨdn($) {
    my $_ = shift();

    $_ = /[^\x00-\x7F]/   # Unicode?
            ? reverse_mark_flip($_)
            : reverse ($_);

    # this is the best we can do for either case
    s/[Jj]/ſ\x{323}/g;   # long s + combining dot below

    # Placeholders below indicated by □ for chars I haven't
    # yet found an upside-down version of.  This can be deceptive
    # if you don't have one of the normal things in your font set!

    if (BOTH_WAYS) {

        tr  [abcdefghijklmnopqrstuvwxyzɐqɔpəɟ⅁ɥᴉ□ʞlɯuodbɹƨʇnʌʍxλ□]
            [ɐqɔpəɟ⅁ɥᴉ□ʞlɯuodbɹƨʇnʌʍxλ□abcdefghijklmnopqrstuvwxyz];

        tr  [ABCDEFGHIJKLMNOPQRSTUVWXYZɐqƆpƎℲ⅁ɥI□ʞ⅂ƜИOdbᴚƨʇnɅMX⅄□]
            [ɐqƆpƎℲ⅁ɥI□ʞ⅂ƜИOdbᴚƨʇnɅMX⅄□ABCDEFGHIJKLMNOPQRSTUVWXYZ];

    } else {

        tr  [abcdefghijklmnopqrstuvwxyz]
            [ɐqɔpəɟ⅁ɥᴉ□ʞlɯuodbɹƨʇnʌʍxλ□];   # punt to other case
          # [ɐqɔpəɟ□ɥᴉ□ʞlɯuodbɹƨʇnʌʍxλ□];   # missing in the casing

        tr  [ABCDEFGHIJKLMNOPQRSTUVWXYZ]
            [ɐqƆpƎℲ⅁ɥI□ʞ⅂ƜИOdbᴚƨʇnɅMX⅄□];   # punt to other case
          # [□□Ɔ□ƎℲ⅁□I□□⅂ƜИO□□ᴚ□□□ɅMX⅄□];   # missing in the casing

    }

    tr  [-¯_#&'"“”‘’!¡?¿,.]
        [-_¯#⅋'"„□□,¡!¿?ʻ˙];

    tr  [0123456789]
        [0□□ʕ□□9□86];

    # sure wish these next two looked better

    tr  [()<>{}[]]
        [)(><}{\]\[];

    tr#/\\#\\/#;

    # NFC unlikely to be of much help,
    # but one is "supposed" to do this
    return NFC($_);
}

# reverse string by graphemes, inverting all the marks
sub reverse_mark_flip($) {
    my $string = shift();

    # first decompose to pull out grapheme units
    my $nfd = NFD($string);

    # reverse the string by grapheme units
    my @graphemes = $nfd =~ /\X/g;

    # put it back together reversed
    $string = join q[] => reverse @graphemes;

    # if there are marks, we have hard work to do
    if ($string =~ /\pM/) {
        $string = flip_diacriticals($string);
    }

    return $string;
}

# This autoloading stub replaces itself with the real function,
# then jumps directly into its replacement via magic goto.
#
# HEY LIKE I'M SORRY ALREADY, OK!  It's just too hard to get
# this right—and look ok—any other way.  Really, I *tried*.
#
sub flip_diacriticals($) {

    binmode(DATA, ":utf8");
    local $/ = q[];
    my $_;
    my($lhs, $rhs) = ( q[], q[] );
    while (<DATA>) {

        next if m{ \A \s* \# }x;

        my @pair = m{ < ( \p{HexDigit} + ) > }gmx;

        next unless @pair == 2;

        $lhs .= xbrace_quote(        @pair);
        $rhs .= xbrace_quote(reverse @pair);
    }
    my $redefinition = strip_q <<'END_OF_START'

        |Q|
        |Q|     no warnings "redefine";
        |Q|
        |Q|     sub flip_diacriticals($) {
        |Q|         # haven't touched @_ yet
        |Q|         my $string = shift();
        |Q|         $string =~
        |Q|

END_OF_START
                      . strip_qq <<"END_OF_TRANSLITERATION"

        |QQ|
        |QQ|            tr[$lhs]
        |QQ|              [$rhs];
        |QQ|

END_OF_TRANSLITERATION
                      . strip_q <<'END_OF_FUNCTION'

        |Q|
        |Q|         return $string;
        |Q|     }
        |Q|
        |Q|     1;   # eval happiness
        |Q|

END_OF_FUNCTION

    # this ̬ is the end of the eval string build up
    ;   # DO NOT DELETE
    # that ̂ was the end of the eval string build up

    ##say $redefinition;
    eval $redefinition || die;
    goto \&flip_diacriticals;
}

sub dequeue($$) {
    my($leader, $body) = @_;
    $body =~ s/^\s*\Q$leader\E ?//gm;
    return $body;
}

sub strip_q($) {
    my $body = shift();
    return dequeue('|Q|', $body);
}

sub strip_qq($) {
    my $body = shift();
    return dequeue("|QQ|", $body);
}

sub xbrace_quote(@) {
    return join q[] => map { q[\x{] . $_ . q[}] } @_;
}

__END__
 ̈ 776 <0308> COMBINING DIAERESIS
 ̤ 804 <0324> COMBINING DIAERESIS BELOW

 ̃ 771 <0303> COMBINING TILDE
 ̰ 816 <0330> COMBINING TILDE BELOW

 ́ 769 <0301> COMBINING ACUTE ACCENT
 ̗ 791 <0317> COMBINING ACUTE ACCENT BELOW

 ̀ 768 <0300> COMBINING GRAVE ACCENT
 ̖ 790 <0316> COMBINING GRAVE ACCENT BELOW

 ̆ 774 <0306> COMBINING BREVE
 ̯ 815 <032F> COMBINING INVERTED BREVE BELOW

 ̑ 785 <0311> COMBINING INVERTED BREVE
 ̮ 814 <032E> COMBINING BREVE BELOW

 ̭ 813 <032D> COMBINING CIRCUMFLEX ACCENT BELOW
 ̌ 780 <030C> COMBINING CARON

 ̂ 770 <0302> COMBINING CIRCUMFLEX ACCENT
 ̬ 812 <032C> COMBINING CARON BELOW

 ̧ 807 <0327> COMBINING CEDILLA
 ̉ 777 <0309> COMBINING HOOK ABOVE

 ̇ 775 <0307> COMBINING DOT ABOVE
 ̣ 803 <0323> COMBINING DOT BELOW

 ̳ 819 <0333> COMBINING DOUBLE LOW LINE
 ̿ 831 <033F> COMBINING DOUBLE OVERLINE

 ̅ 773 <0305> COMBINING OVERLINE
 ̲ 818 <0332> COMBINING LOW LINE

 ̄ 772 <0304> COMBINING MACRON
 ̱ 817 <0331> COMBINING MACRON BELOW

 ̍ 781 <030D> COMBINING VERTICAL LINE ABOVE
 ̩ 809 <0329> COMBINING VERTICAL LINE BELOW
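The 'M-acM-|tM-j' strings in the T output of the report are consistent with a "meta" rendering in which each character in the range 0x80-0xFF is printed as "M-" followed by the same character with its high bit cleared: á is U+00E1, and 0xE1 & 0x7F is 0x61, the letter a. The sketch below reproduces that transformation under this assumption. The name meta_notation is hypothetical and chosen for illustration; it is not a function in perl5db.pl.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only: render each character in
# the range 0x80-0xFF as "M-" plus that character with its high bit cleared.
sub meta_notation {
    my ($text) = @_;
    $text =~ s/([\x80-\xFF])/sprintf("M-%c", ord($1) & 0x7F)/ge;
    return $text;
}

# "ácütê" in precomposed form, written with escapes to keep this file ASCII.
my $precomposed = "\x{E1}c\x{FC}t\x{EA}";
my $rendered    = meta_notation($precomposed);
print "$rendered\n";    # prints: M-acM-|tM-j

# The reversed string in NFD form: base letters plus combining marks.
# None of its characters fall in 0x80-0xFF, so it passes through untouched.
my $decomposed = "e\x{302}tu\x{308}ca\x{301}";
my $unchanged  = meta_notation($decomposed);
print $unchanged eq $decomposed
    ? "decomposed form untouched\n"
    : "decomposed form altered\n";
```

Under that assumption, the second case would also explain why the flip_diacriticals frame shows its argument readably: leo passes it the NFD form, which holds only ASCII letters and combining marks above U+00FF.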
Subject: Re: [perl #79214] Perl debugger hopeless with Unicode program identifiers
Date: Mon, 15 Nov 2010 19:24:18 +0000
To: perl5-porters [...] perl.org
From: John <j.imrie [...] virginmedia.com>
> Loading DB routines from perl5db.pl version 1.33
> Editor support available.
>
> Enter h or `h h' for help, or `man perldebug' for more help.
>
> main::(/Users/tomchristiansen/scripts/leo:38):
> 38:     main();
>   DB<1> binmode(DB::OUT, ":utf8") || die

Shouldn't binmode(DB::OUT, ":utf8") be binmode($DB::OUT, ":utf8")? If that does not work, make sure that your terminal is expecting UTF-8 characters. For xterm, that means starting it with the -u8 flag. Search for utf8 in the xterm man page for more details.
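The distinction raised here can be checked outside the debugger. The following is a minimal sketch, assuming two unrelated in-memory handles stand in for the bareword DB::OUT and for whatever handle $DB::OUT refers to. The names OUT and $OUT are illustrative, and this thread does not establish whether the two actually differ in a given debugger session. It shows that binmode affects only the handle it is given, that PerlIO::get_layers reports which handle carries the utf8 flag, and that the other handle still warns on wide characters.

```perl
use strict;
use warnings;

# Illustrative stand-ins, not the debugger's real handles.
open(OUT, '>', \my $bareword_buffer)   or die "open OUT: $!";
open(my $OUT, '>', \my $scalar_buffer) or die "open \$OUT: $!";

# Apply the layer to the bareword handle only, as in the transcript.
binmode(OUT, ':utf8') or die "binmode: $!";

# PerlIO::get_layers lists "utf8" as a pseudo-layer when the flag is set.
my $bareword_has_utf8 = grep { defined && $_ eq 'utf8' } PerlIO::get_layers(*OUT);
my $scalar_has_utf8   = grep { defined && $_ eq 'utf8' } PerlIO::get_layers($OUT);

# Collect warnings instead of letting them reach STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

my $wide = "\x{28D}";    # U+028D, the second character of uʍopəpᴉƨdn
print OUT    $wide;      # this handle has :utf8
print {$OUT} $wide;      # this handle has no encoding layer

printf "OUT utf8: %s, \$OUT utf8: %s, warnings: %d\n",
    ($bareword_has_utf8 ? 'yes' : 'no'),
    ($scalar_has_utf8   ? 'yes' : 'no'),
    scalar @warnings;
```

If the debugger writes through a handle other than the one given to binmode, the "Wide character in print" warnings would persist in the same way.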
CC: tchrist1 (via RT) <perlbug-followup [...] perl.org>, bugs-bitbucket [...] rt.perl.org
Subject: Re: [perl #79214] Perl debugger hopeless with Unicode program identifiers
Date: Tue, 16 Nov 2010 06:42:28 +0100
To: perl5-porters [...] perl.org
From: Richard Foley <Richard.Foley [...] rfi.net>
Well, look on the bright side: we can thank the stars (or Larry) that Perl wasn't written by a German, otherwise we'd probably have to deal with things like this:

    ßüb functionname {}

;-)

--
Richard Foley
Ciao - shorter than aufwiedersehen
http://www.rfi.net/

On Sunday 14 November 2010 23:59:56 tchrist1 wrote:
CC: perl5-porters [...] perl.org, tchrist1 <perlbug-followup [...] perl.org>, bugs-bitbucket [...] rt.perl.org
Subject: Re: [perl #79214] Perl debugger hopeless with Unicode program identifiers
Date: Tue, 16 Nov 2010 09:17:15 +0100
To: Richard.Foley [...] rfi.net
From: demerphq <demerphq [...] gmail.com>

Message body is not shown because it is too large.


