utf8-bracket support #11271

Closed
p5pRT opened this issue Apr 21, 2011 · 81 comments

Comments

@p5pRT

p5pRT commented Apr 21, 2011

Migrated from rt.perl.org#89032 (status was 'open')

Searchable as RT89032$

@p5pRT

p5pRT commented Apr 21, 2011

From perl-diddler@tlinx.org

Created by perl-diddler@tlinx.org

I was trying to quote a block of code. Thing is, to do that, you have to
choose a delimiter that's not in the code. I wanted to use a "paired"
operator like some sort of bracket -- but it seems that perl ignores "left
& right" *anything*, unless it is one of 4 "bracket types": round,
angle, square & curly (according to the perlop manpage). One problem
though, *real* angle brackets U+2329 (〈) and U+2330 (〉) don't work.

It's a shame actually, that someone confused less-than and greater-than
with angle brackets (*cough*), since real angle brackets are not
likely to be confused with any perl operator and are unlikely to be
included in most code.

It wouldn't be too hard, I wouldn't think, to pair up "Right & Left"
"THINGS" from unicode, given their symmetric naming. Would it be
reasonable to ask that paired operators/symbols be allowed to be used
in a paired manner in Perl?

At the very least, the manpage (and any other references to the
mathematical operators) should be fixed, since if someplace is going
to claim angle-brackets work, then real angle brackets should be
supported, no? ;-)
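
For reference, a minimal sketch of the behaviour perlop describes for the four ASCII bracketing delimiters: each pair nests, so balanced inner brackets of the same kind do not end the string. The variable names are illustrative only.

```perl
use strict;
use warnings;

# q() does not interpolate, so the quoted "code" is kept verbatim.
# Balanced inner brackets of the same type are allowed inside each form.
my $round  = q(print(join(",", @list)));
my $square = q[$x[0] = $y[1]];
my $curly  = q{if ($ok) { run() }};
my $angle  = q<a <b> c>;

print "$_\n" for $round, $square, $curly, $angle;
```

The catch that prompted this report is that quoted code containing an unbalanced bracket of the chosen type (a lone `<` comparison inside `q<...>`, say) needs a backslash or a different delimiter.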

Perl Info

Flags:
    category=core
    severity=low

This perlbug was built using Perl 5.10.0 - Fri Jul 30 00:12:10 UTC 2010
It is being executed now by  Perl 5.10.0 - Thu Sep 16 16:14:28 UTC 2010.

Site configuration information for perl 5.10.0:

Configured by abuild at Thu Sep 16 16:14:28 UTC 2010.

Summary of my perl5 (revision 5 version 10 subversion 0) configuration:
  Platform:
    osname=linux, osvers=2.6.31, archname=x86_64-linux-thread-multi
    uname='linux build35 2.6.31 #1 smp 2010-01-06 16:07:25 +0100 x86_64 x86_64 x86_64 gnulinux '
    config_args='-ds -e -Dprefix=/usr -Dvendorprefix=/usr -Dinstallusrbinperl -Dusethreads -Di_db -Di_dbm -Di_ndbm -Di_gdbm -Duseshrplib=true -DEBUGGING=both -Doptimize=-fmessage-length=0 -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -Wall -pipe -Accflags=-DPERL_USE_SAFE_PUTENV'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DPERL_USE_SAFE_PUTENV -DDEBUGGING -fno-strict-aliasing -pipe -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-fmessage-length=0 -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -Wall -pipe -g',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DPERL_USE_SAFE_PUTENV -DDEBUGGING -fno-strict-aliasing -pipe'
    ccversion='', gccversion='4.4.1 [gcc-4_4-branch revision 150839]', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib64'
    libpth=/lib64 /usr/lib64 /usr/local/lib64
    libs=-lm -ldl -lcrypt -lpthread
    perllibs=-lm -ldl -lcrypt -lpthread
    libc=/lib64/libc-2.10.1.so, so=so, useshrplib=true, libperl=libperl.so
    gnulibc_version='2.10.1'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E -Wl,-rpath,/usr/lib/perl5/5.10.0/x86_64-linux-thread-multi/CORE'
    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib64'

Locally applied patches:
    


@INC for perl 5.10.0:
    /usr/local/lib/perl/5.8
    /usr/lib/perl5/5.10.0/x86_64-linux-thread-multi
    /usr/lib/perl5/5.10.0
    /usr/lib/perl5/site_perl/5.10.0/x86_64-linux-thread-multi
    /usr/lib/perl5/site_perl/5.10.0
    /usr/lib/perl5/vendor_perl/5.10.0/x86_64-linux-thread-multi
    /usr/lib/perl5/vendor_perl/5.10.0
    /usr/lib/perl5/vendor_perl
    .


Environment for perl 5.10.0:
    HOME=/home/law
    LANG=en_US.UTF-8
    LANGUAGE (unset)
    LC_CTYPE=en_US.UTF-8
    LD_LIBRARY_PATH=/usr/lib64/mpi/gcc/openmpi/lib64
    LOGDIR (unset)
    PATH=.:/sbin:/usr/local/sbin:/usr/lib64/mpi/gcc/openmpi/bin:/home/law/bin:/usr/local/bin:/usr/bin:/bin:/usr/X11R6/bin:/usr/games:/opt/kde3/bin:/usr/lib/mit/bin:/usr/lib/mit/sbin:/usr/lib/qt3/bin:/usr/sbin
    PERL5LIB=/usr/local/lib/perl/5.8
    PERL_BADLANG (unset)
    SHELL=/bin/bash

@p5pRT

p5pRT commented Apr 21, 2011

From @iabyn

On Wed, Apr 20, 2011 at 10:02:45PM -0700, Linda Walsh wrote:

I was trying to quote a block of code. Thing is, to do that, you have to
choose a delimiter that's not in the code. I wanted to use a "paired"
operator like some sort of bracket -- but it seems that perl ignores "left
& right" *anything*, unless it is one of 4 "bracket types": round,
angle, square & curly (according to the perlop manpage). One problem
though, *real* angle brackets U+2329 (〈) and U+2330 (〉) don't work.

I think you meant *U+232A* for the right bracket. But having said that,
I agree it still doesn't work under blead:

$ cat /tmp/p

#!/usr/bin/perl
binmode(STDOUT,':utf8');
print "use utf8; \$x = q\x{2329}abc\x{232A}; print qq{x=[\$x]\\n};\n";

$ ./perl /tmp/p > /tmp/pp

$ cat /tmp/pp

use utf8; $x = q〈abc〉; print qq{x=[$x]\n};

$ ./perl /tmp/pp

Can't find string terminator "�" anywhere before EOF at /tmp/pp line 1.

--
The Enterprise successfully ferries an alien VIP from one place to another
without serious incident.
  -- Things That Never Happen in "Star Trek" #7

@p5pRT

p5pRT commented Apr 21, 2011

The RT System itself - Status changed from 'new' to 'open'

@p5pRT

p5pRT commented Apr 22, 2011

From @khwilliamson

If we were to do this, the criteria should probably be members of the
classes Open and Close Punctuation plus the existing GREATER and LESS
THAN signs. Here are the 72 opening ones:
0028 # '(' LEFT PARENTHESIS
005B # '[' LEFT SQUARE BRACKET
007B # '{' LEFT CURLY BRACKET
0F3A # '༺' TIBETAN MARK GUG RTAGS GYON
0F3C # '༼' TIBETAN MARK ANG KHANG GYON
169B # '᚛' OGHAM FEATHER MARK
201A # '‚' SINGLE LOW-9 QUOTATION MARK
201E # '„' DOUBLE LOW-9 QUOTATION MARK
2045 # '⁅' LEFT SQUARE BRACKET WITH QUILL
207D # '⁽' SUPERSCRIPT LEFT PARENTHESIS
208D # '₍' SUBSCRIPT LEFT PARENTHESIS
2329 # '〈' LEFT-POINTING ANGLE BRACKET
2768 # '❨' MEDIUM LEFT PARENTHESIS ORNAMENT
276A # '❪' MEDIUM FLATTENED LEFT PARENTHESIS ORNAMENT
276C # '❬' MEDIUM LEFT-POINTING ANGLE BRACKET ORNAMENT
276E # '❮' HEAVY LEFT-POINTING ANGLE QUOTATION MARK ORNAMENT
2770 # '❰' HEAVY LEFT-POINTING ANGLE BRACKET ORNAMENT
2772 # '❲' LIGHT LEFT TORTOISE SHELL BRACKET ORNAMENT
2774 # '❴' MEDIUM LEFT CURLY BRACKET ORNAMENT
27C5 # '⟅' LEFT S-SHAPED BAG DELIMITER
27E6 # '⟦' MATHEMATICAL LEFT WHITE SQUARE BRACKET
27E8 # '⟨' MATHEMATICAL LEFT ANGLE BRACKET
27EA # '⟪' MATHEMATICAL LEFT DOUBLE ANGLE BRACKET
27EC # '⟬' MATHEMATICAL LEFT WHITE TORTOISE SHELL BRACKET
27EE # '⟮' MATHEMATICAL LEFT FLATTENED PARENTHESIS
2983 # '⦃' LEFT WHITE CURLY BRACKET
2985 # '⦅' LEFT WHITE PARENTHESIS
2987 # '⦇' Z NOTATION LEFT IMAGE BRACKET
2989 # '⦉' Z NOTATION LEFT BINDING BRACKET
298B # '⦋' LEFT SQUARE BRACKET WITH UNDERBAR
298D # '⦍' LEFT SQUARE BRACKET WITH TICK IN TOP CORNER
298F # '⦏' LEFT SQUARE BRACKET WITH TICK IN BOTTOM CORNER
2991 # '⦑' LEFT ANGLE BRACKET WITH DOT
2993 # '⦓' LEFT ARC LESS-THAN BRACKET
2995 # '⦕' DOUBLE LEFT ARC GREATER-THAN BRACKET
2997 # '⦗' LEFT BLACK TORTOISE SHELL BRACKET
29D8 # '⧘' LEFT WIGGLY FENCE
29DA # '⧚' LEFT DOUBLE WIGGLY FENCE
29FC # '⧼' LEFT-POINTING CURVED ANGLE BRACKET
2E22 # '⸢' TOP LEFT HALF BRACKET
2E24 # '⸤' BOTTOM LEFT HALF BRACKET
2E26 # '⸦' LEFT SIDEWAYS U BRACKET
2E28 # '⸨' LEFT DOUBLE PARENTHESIS
3008 # '〈' LEFT ANGLE BRACKET
300A # '《' LEFT DOUBLE ANGLE BRACKET
300C # '「' LEFT CORNER BRACKET
300E # '『' LEFT WHITE CORNER BRACKET
3010 # '【' LEFT BLACK LENTICULAR BRACKET
3014 # '〔' LEFT TORTOISE SHELL BRACKET
3016 # '〖' LEFT WHITE LENTICULAR BRACKET
3018 # '〘' LEFT WHITE TORTOISE SHELL BRACKET
301A # '〚' LEFT WHITE SQUARE BRACKET
301D # '〝' REVERSED DOUBLE PRIME QUOTATION MARK
FD3E # '﴾' ORNATE LEFT PARENTHESIS
FE17 # '︗' PRESENTATION FORM FOR VERTICAL LEFT WHITE LENTICULAR BRACKET
FE35 # '︵' PRESENTATION FORM FOR VERTICAL LEFT PARENTHESIS
FE37 # '︷' PRESENTATION FORM FOR VERTICAL LEFT CURLY BRACKET
FE39 # '︹' PRESENTATION FORM FOR VERTICAL LEFT TORTOISE SHELL BRACKET
FE3B # '︻' PRESENTATION FORM FOR VERTICAL LEFT BLACK LENTICULAR BRACKET
FE3D # '︽' PRESENTATION FORM FOR VERTICAL LEFT DOUBLE ANGLE BRACKET
FE3F # '︿' PRESENTATION FORM FOR VERTICAL LEFT ANGLE BRACKET
FE41 # '﹁' PRESENTATION FORM FOR VERTICAL LEFT CORNER BRACKET
FE43 # '﹃' PRESENTATION FORM FOR VERTICAL LEFT WHITE CORNER BRACKET
FE47 # '﹇' PRESENTATION FORM FOR VERTICAL LEFT SQUARE BRACKET
FE59 # '﹙' SMALL LEFT PARENTHESIS
FE5B # '﹛' SMALL LEFT CURLY BRACKET
FE5D # '﹝' SMALL LEFT TORTOISE SHELL BRACKET
FF08 # '(' FULLWIDTH LEFT PARENTHESIS
FF3B # '[' FULLWIDTH LEFT SQUARE BRACKET
FF5B # '{' FULLWIDTH LEFT CURLY BRACKET
FF5F # '⦅' FULLWIDTH LEFT WHITE PARENTHESIS
FF62 # '「' HALFWIDTH LEFT CORNER BRACKET
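
A list like the one above can be regenerated with a short script. This is only a sketch: `code_points_matching` is a hypothetical helper name, and the count it prints depends on the Unicode version bundled with the perl that runs it, so it need not be 72.

```perl
use strict;
use warnings;
use charnames ();

binmode(STDOUT, ':encoding(UTF-8)');

# Hypothetical helper: return every code point whose character matches $re.
sub code_points_matching {
    my ($re) = @_;
    no warnings 'utf8';    # keeps older perls quiet about odd code points
    my @found;
    for my $cp (0 .. 0x10FFFF) {
        next if $cp >= 0xD800 && $cp <= 0xDFFF;    # skip surrogates
        push @found, $cp if chr($cp) =~ $re;
    }
    return @found;
}

# General_Category Ps is Open Punctuation; use \p{Pe} for the closing set.
my @open = code_points_matching(qr/\A\p{Ps}\z/);
printf "%d opening characters in this perl's Unicode tables\n", scalar @open;
for my $cp (@open) {
    my $name = charnames::viacode($cp);
    printf "%04X # '%s' %s\n", $cp, chr($cp),
        defined $name ? $name : '(no name)';
}
```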

@p5pRT

p5pRT commented Apr 22, 2011

From @khwilliamson

But perhaps I should have included the initial and final quotes, of
which there are 12 pairs:
00AB # '«' LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
2018 # '‘' LEFT SINGLE QUOTATION MARK
201B # '‛' SINGLE HIGH-REVERSED-9 QUOTATION MARK
201C # '“' LEFT DOUBLE QUOTATION MARK
201F # '‟' DOUBLE HIGH-REVERSED-9 QUOTATION MARK
2039 # '‹' SINGLE LEFT-POINTING ANGLE QUOTATION MARK
2E02 # '⸂' LEFT SUBSTITUTION BRACKET
2E04 # '⸄' LEFT DOTTED SUBSTITUTION BRACKET
2E09 # '⸉' LEFT TRANSPOSITION BRACKET
2E0C # '⸌' LEFT RAISED OMISSION BRACKET
2E1C # '⸜' LEFT LOW PARAPHRASE BRACKET
2E20 # '⸠' LEFT VERTICAL BAR WITH QUILL

Note that some of the first set have the name QUOTATION, but aren't
considered to be quotes.

@p5pRT

p5pRT commented Apr 22, 2011

From tchrist@perl.com

If we were to do this, the criteria should probably be members of the
classes Open and Close Punctuation plus the existing GREATER and LESS
THAN signs. Here are the 72 opening ones:

Except that the pair she had suggested, LEFT- and RIGHT-POINTING ANGLE
QUOTATION MARK, are Initial and Final Punctuation, not Open and Close
Punctuation. There are a dozen of the Pi kind:

  « 00AB LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
  ‘ 2018 LEFT SINGLE QUOTATION MARK
  ‛ 201B SINGLE HIGH-REVERSED-9 QUOTATION MARK
  “ 201C LEFT DOUBLE QUOTATION MARK
  ‟ 201F DOUBLE HIGH-REVERSED-9 QUOTATION MARK
  ‹ 2039 SINGLE LEFT-POINTING ANGLE QUOTATION MARK
  ⸂ 2E02 LEFT SUBSTITUTION BRACKET
  ⸄ 2E04 LEFT DOTTED SUBSTITUTION BRACKET
  ⸉ 2E09 LEFT TRANSPOSITION BRACKET
  ⸌ 2E0C LEFT RAISED OMISSION BRACKET
  ⸜ 2E1C LEFT LOW PARAPHRASE BRACKET
  ⸠ 2E20 LEFT VERTICAL BAR WITH QUILL

Of those, these four are *not* Bidi Mirrored:

  ‘ 2018 LEFT SINGLE QUOTATION MARK
  ‛ 201B SINGLE HIGH-REVERSED-9 QUOTATION MARK
  “ 201C LEFT DOUBLE QUOTATION MARK
  ‟ 201F DOUBLE HIGH-REVERSED-9 QUOTATION MARK
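
The split between mirrored and non-mirrored Pi characters can be checked with Perl's own property tables. A rough sketch, where `pi_by_mirroring` is a made-up helper name and the exact membership depends on the perl's Unicode version:

```perl
use strict;
use warnings;
use charnames ();

binmode(STDOUT, ':encoding(UTF-8)');

# Hypothetical helper: collect Initial Punctuation (Pi) code points, split
# by whether they carry the Bidi_Mirrored property.
sub pi_by_mirroring {
    no warnings 'utf8';    # keeps older perls quiet about odd code points
    my (@mirrored, @plain);
    for my $cp (0 .. 0x10FFFF) {
        next if $cp >= 0xD800 && $cp <= 0xDFFF;    # skip surrogates
        my $ch = chr $cp;
        next unless $ch =~ /\A\p{Pi}\z/;
        if ($ch =~ /\A\p{Bidi_Mirrored}\z/) { push @mirrored, $cp }
        else                                { push @plain,    $cp }
    }
    return (\@mirrored, \@plain);
}

my ($mirrored, $plain) = pi_by_mirroring();
for my $set ([ 'Bidi mirrored' => $mirrored ], [ 'not mirrored' => $plain ]) {
    my ($label, $cps) = @$set;
    print "$label:\n";
    printf "  %s %04X %s\n", chr($_), $_, charnames::viacode($_) for @$cps;
}
```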

I do agree that of the BidiM Symbols, probably only "<" and ">"
should count -- because they already have. I guess you might make
some argument to add the things with the same UCA1 values as those:

  ﹤ FE64 SMALL LESS-THAN SIGN
  ﹥ FE65 SMALL GREATER-THAN SIGN
  < FF1C FULLWIDTH LESS-THAN SIGN
  > FF1E FULLWIDTH GREATER-THAN SIGN

But I dunno. Here are the only BidiM full/halfwidth code points:

  ( FF08 GC=Ps FULLWIDTH LEFT PARENTHESIS
  ) FF09 GC=Pe FULLWIDTH RIGHT PARENTHESIS
  < FF1C GC=Sm FULLWIDTH LESS-THAN SIGN
  > FF1E GC=Sm FULLWIDTH GREATER-THAN SIGN
  [ FF3B GC=Ps FULLWIDTH LEFT SQUARE BRACKET
  ] FF3D GC=Pe FULLWIDTH RIGHT SQUARE BRACKET
  { FF5B GC=Ps FULLWIDTH LEFT CURLY BRACKET
  } FF5D GC=Pe FULLWIDTH RIGHT CURLY BRACKET
  ⦅ FF5F GC=Ps FULLWIDTH LEFT WHITE PARENTHESIS
  ⦆ FF60 GC=Pe FULLWIDTH RIGHT WHITE PARENTHESIS
  「 FF62 GC=Ps HALFWIDTH LEFT CORNER BRACKET
  」 FF63 GC=Pe HALFWIDTH RIGHT CORNER BRACKET

I don't know whether you really want to include the verticals:

  ⸠ 2E20 GC=Pi LEFT VERTICAL BAR WITH QUILL
  ︗ FE17 GC=Ps PRESENTATION FORM FOR VERTICAL LEFT WHITE LENTICULAR BRACKET
  ︵ FE35 GC=Ps PRESENTATION FORM FOR VERTICAL LEFT PARENTHESIS
  ︷ FE37 GC=Ps PRESENTATION FORM FOR VERTICAL LEFT CURLY BRACKET
  ︹ FE39 GC=Ps PRESENTATION FORM FOR VERTICAL LEFT TORTOISE SHELL BRACKET
  ︻ FE3B GC=Ps PRESENTATION FORM FOR VERTICAL LEFT BLACK LENTICULAR BRACKET
  ︽ FE3D GC=Ps PRESENTATION FORM FOR VERTICAL LEFT DOUBLE ANGLE BRACKET
  ︿ FE3F GC=Ps PRESENTATION FORM FOR VERTICAL LEFT ANGLE BRACKET
  ﹁ FE41 GC=Ps PRESENTATION FORM FOR VERTICAL LEFT CORNER BRACKET
  ﹃ FE43 GC=Ps PRESENTATION FORM FOR VERTICAL LEFT WHITE CORNER BRACKET
  ﹇ FE47 GC=Ps PRESENTATION FORM FOR VERTICAL LEFT SQUARE BRACKET

--tom

@p5pRT

p5pRT commented Apr 22, 2011

From tchrist@perl.com

But perhaps I should have included the initial and final quotes, of
which there are 12 pairs:

00AB # '«' LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
2018 # '‘' LEFT SINGLE QUOTATION MARK
201B # '‛' SINGLE HIGH-REVERSED-9 QUOTATION MARK
201C # '“' LEFT DOUBLE QUOTATION MARK
201F # '‟' DOUBLE HIGH-REVERSED-9 QUOTATION MARK
2039 # '‹' SINGLE LEFT-POINTING ANGLE QUOTATION MARK
2E02 # '⸂' LEFT SUBSTITUTION BRACKET
2E04 # '⸄' LEFT DOTTED SUBSTITUTION BRACKET
2E09 # '⸉' LEFT TRANSPOSITION BRACKET
2E0C # '⸌' LEFT RAISED OMISSION BRACKET
2E1C # '⸜' LEFT LOW PARAPHRASE BRACKET
2E20 # '⸠' LEFT VERTICAL BAR WITH QUILL

Note that some of the first set have the name QUOTATION, but aren't
considered to be quotes.

I get only two:

  % unichars -c '\pP' '\P{QMark}' 'NAME =~ /QUOT/'
  ❮ 276E GC=Ps HEAVY LEFT-POINTING ANGLE QUOTATION MARK ORNAMENT
  ❯ 276F GC=Pe HEAVY RIGHT-POINTING ANGLE QUOTATION MARK ORNAMENT

And they aren't from the Pi/Pf set.
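
For anyone without unichars installed, roughly the same query can be written in plain Perl. This is a sketch under assumptions: `quot_named_non_qmark` is a hypothetical name, and newer Unicode versions may return more than the two characters shown above.

```perl
use strict;
use warnings;
use charnames ();

binmode(STDOUT, ':encoding(UTF-8)');

# Hypothetical helper: punctuation whose name contains QUOT but which
# lacks the Quotation_Mark (QMark) property.
sub quot_named_non_qmark {
    no warnings 'utf8';    # keeps older perls quiet about odd code points
    my @found;
    for my $cp (0 .. 0x10FFFF) {
        next if $cp >= 0xD800 && $cp <= 0xDFFF;         # skip surrogates
        my $ch = chr $cp;
        next unless $ch =~ /\A\pP\z/;                   # any punctuation
        next if     $ch =~ /\A\p{Quotation_Mark}\z/;    # drop real quotes
        my $name = charnames::viacode($cp);
        push @found, $cp if defined $name && $name =~ /QUOT/;
    }
    return @found;
}

printf "%s %04X %s\n", chr($_), $_, charnames::viacode($_)
    for quot_named_non_qmark();
```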

--tom

@p5pRT

p5pRT commented Apr 22, 2011

From perl-diddler@tlinx.org

tchrist1 via RT wrote:

If we were to do this, the criteria should probably be members of the
classes Open and Close Punctuation plus the existing GREATER and LESS
THAN signs. Here are the 72 opening ones:

Except that the pair she had suggested, LEFT- and RIGHT-POINTING ANGLE
QUOTATION MARK, are Initial and Final Punctuation, not Open and Close
Punctuation. There are a dozen of the Pi kind:

By "she", do you mean me?

I do like the double angle brackets that are called quotation marks, but
my original note was not about U+00AB but about U+2329 & U+232A (correction
caught by Dave Mitchell) -- they are called left & right angle brackets,
not quotes.

Honestly, when I submitted this, I thought the easiest thing to do would
be to check whether something had "LEFT" or "RIGHT" in its textual
description, then use, um, something like Perl ( :^) ) to look up a textual
description with the opposite word substituted in and, if found, use it as
the complementary character -- basically doing this for any 2 characters
that have a RIGHT & LEFT. That should also obviate the need for any
enumerated table.

Is there something wrong with that 'simple' approach? It would seem
to be the most flexible... (?)
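
A minimal sketch of that name-swapping lookup, using the core charnames module. `partner_by_name` is a hypothetical helper, not anything in perl itself, and it swaps only the first LEFT or RIGHT it finds in the name.

```perl
use strict;
use warnings;
use charnames ();

# Hypothetical helper: find a character's "partner" by swapping LEFT and
# RIGHT in its Unicode name. Returns the partner's code point, or undef.
sub partner_by_name {
    my ($cp) = @_;
    my $name = charnames::viacode($cp);
    return undef unless defined $name;
    my %swap = (LEFT => 'RIGHT', RIGHT => 'LEFT');
    (my $other = $name) =~ s/\b(LEFT|RIGHT)\b/$swap{$1}/
        or return undef;
    no warnings 'utf8';    # some perls warn on unknown names
    return scalar charnames::vianame($other);
}

for my $cp (0x28, 0x2329, 0x232B, 0x41) {
    my $partner = partner_by_name($cp);
    printf "U+%04X -> %s\n", $cp,
        defined $partner ? sprintf('U+%04X', $partner) : 'no partner';
}
```

Note that matching on names alone also pairs characters that are not brackets: U+232B ERASE TO THE LEFT finds U+2326 ERASE TO THE RIGHT.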

@p5pRT

p5pRT commented Apr 22, 2011

From @Tux

On Fri, 22 Apr 2011 00:30:56 -0700, Linda Walsh
<perl-diddler@tlinx.org> wrote:

tchrist1 via RT wrote:

If we were to do this, the criteria should probably be members of the
classes Open and Close Punctuation plus the existing GREATER and LESS
THAN signs. Here are the 72 opening ones:

Except that the pair she had suggested, LEFT- and RIGHT-POINTING ANGLE
QUOTATION MARK, are Initial and Final Punctuation, not Open and Close
Punctuation. There are a dozen of the Pi kind:

By she are you meaning me?

I do like the double angle brackets that are called
quotation marks, but my original note was on not U+00AB, but U+2329 & U+232A
(correction caught by Dave Mitchell) -- they are called left & right
angle brackets, not quotes.

Honestly, when I submitted this, I thought the easiest thing
to do would be to check if something had "LEFT" or "RIGHT" in its
textual description,

That would give you more than you ask for:

LEFT has ± 328 entries, and RIGHT has ± 331
The LEFT list has been included at the end

The list only gets interesting when the "LEFT" code point has a
matching "RIGHT" version. That list is included below. I however think
that Karl and Tom are right that the *property* of the code point has
to be taken into account for "matching tokens"
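
To illustrate taking the property into account, here is a sketch that accepts a name-matched pair only when the categories also line up (Ps with Pe, or Pi with Pf). `closing_partner` is a hypothetical helper, and this is one possible rule, not a proposal for what perl should implement.

```perl
use strict;
use warnings;
use charnames ();

# Hypothetical helper: return the closing partner of an opening character,
# requiring both a LEFT/RIGHT name match and matching General_Category.
sub closing_partner {
    my ($cp) = @_;
    my $ch = chr $cp;
    my $want;
    if    ($ch =~ /\A\p{Ps}\z/) { $want = qr/\A\p{Pe}\z/ }
    elsif ($ch =~ /\A\p{Pi}\z/) { $want = qr/\A\p{Pf}\z/ }
    else                        { return undef }

    my $name = charnames::viacode($cp);
    return undef unless defined $name;
    (my $other = $name) =~ s/\bLEFT\b/RIGHT/ or return undef;

    no warnings 'utf8';    # some perls warn on unknown names
    my $partner = charnames::vianame($other);
    return undef unless defined $partner && chr($partner) =~ $want;
    return $partner;
}

for my $cp (0x2329, 0xAB, 0x22A3, 0x232B) {
    my $partner = closing_partner($cp);
    printf "U+%04X -> %s\n", $cp,
        defined $partner ? sprintf('U+%04X', $partner) : 'rejected';
}
```

With this filter the symbol and box-drawing pairs drop out, because their General_Category is neither Ps nor Pi.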

000028 ( LEFT PARENTHESIS
000029 ) RIGHT PARENTHESIS
00005b [ LEFT SQUARE BRACKET
00005d ] RIGHT SQUARE BRACKET
00007b { LEFT CURLY BRACKET
00007d } RIGHT CURLY BRACKET
0000ab « AQML_IDX LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0000bb » AQMR_IDX RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0002bf ʿ MODIFIER LETTER LEFT HALF RING
0002be ʾ MODIFIER LETTER RIGHT HALF RING
0002c2 ˂ MODIFIER LETTER LEFT ARROWHEAD
0002c3 ˃ MODIFIER LETTER RIGHT ARROWHEAD
0002d3 ˓ MODIFIER LETTER CENTRED LEFT HALF RING
0002d2 ˒ MODIFIER LETTER CENTRED RIGHT HALF RING
0002f1 ˱ MODIFIER LETTER LOW LEFT ARROWHEAD
0002f2 ˲ MODIFIER LETTER LOW RIGHT ARROWHEAD
000318 ̘ COMBINING LEFT TACK BELOW
000319 ̙ COMBINING RIGHT TACK BELOW
00031c ̜ COMBINING LEFT HALF RING BELOW
000339 ̹ COMBINING RIGHT HALF RING BELOW
000351 ͑ COMBINING LEFT HALF RING ABOVE
000357 ͗ COMBINING RIGHT HALF RING ABOVE
000354 ͔ COMBINING LEFT ARROWHEAD BELOW
000355 ͕ COMBINING RIGHT ARROWHEAD BELOW
000706 ܆ SYRIAC COLON SKEWED LEFT
000707 ܇ SYRIAC COLON SKEWED RIGHT
000fd6 ࿖ LEFT-FACING SVASTI SIGN
000fd5 ࿕ RIGHT-FACING SVASTI SIGN
000fd8 ࿘ LEFT-FACING SVASTI SIGN WITH DOTS
000fd7 ࿗ RIGHT-FACING SVASTI SIGN WITH DOTS
001dfe ᷾ COMBINING LEFT ARROWHEAD ABOVE
000350 ͐ COMBINING RIGHT ARROWHEAD ABOVE
002018 ‘ LEFT SINGLE QUOTATION MARK
002019 ’ RIGHT SINGLE QUOTATION MARK
00201c “ LEFT DOUBLE QUOTATION MARK
00201d ” RIGHT DOUBLE QUOTATION MARK
002039 ‹ SINGLE LEFT-POINTING ANGLE QUOTATION MARK
00203a › SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
002045 ⁅ LEFT SQUARE BRACKET WITH QUILL
002046 ⁆ RIGHT SQUARE BRACKET WITH QUILL
00207d ⁽ SUPERSCRIPT LEFT PARENTHESIS
00207e ⁾ SUPERSCRIPT RIGHT PARENTHESIS
00208d ₍ SUBSCRIPT LEFT PARENTHESIS
00208e ₎ SUBSCRIPT RIGHT PARENTHESIS
0020d0 ⃐ COMBINING LEFT HARPOON ABOVE
0020d1 ⃑ COMBINING RIGHT HARPOON ABOVE
0020d6 ⃖ COMBINING LEFT ARROW ABOVE
0020d7 ⃗ COMBINING RIGHT ARROW ABOVE
0020ee ⃮ COMBINING LEFT ARROW BELOW
0020ef ⃯ COMBINING RIGHT ARROW BELOW
0022a3 ⊣ LEFT TACK
0022a2 ⊢ RIGHT TACK
0022c9 ⋉ LEFT NORMAL FACTOR SEMIDIRECT PRODUCT
0022ca ⋊ RIGHT NORMAL FACTOR SEMIDIRECT PRODUCT
0022cb ⋋ LEFT SEMIDIRECT PRODUCT
0022cc ⋌ RIGHT SEMIDIRECT PRODUCT
002308 ⌈ LEFT CEILING
002309 ⌉ RIGHT CEILING
00230a ⌊ LEFT FLOOR
00230b ⌋ RIGHT FLOOR
00230d ⌍ BOTTOM LEFT CROP
00230c ⌌ BOTTOM RIGHT CROP
00230f ⌏ TOP LEFT CROP
00230e ⌎ TOP RIGHT CROP
00231c ⌜ TOP LEFT CORNER
00231d ⌝ TOP RIGHT CORNER
00231e ⌞ BOTTOM LEFT CORNER
00231f ⌟ BOTTOM RIGHT CORNER
002329 〈 LEFT-POINTING ANGLE BRACKET
00232a 〉 RIGHT-POINTING ANGLE BRACKET
00232b ⌫ ERASE TO THE LEFT
002326 ⌦ ERASE TO THE RIGHT
00239b ⎛ LEFT PARENTHESIS UPPER HOOK
00239e ⎞ RIGHT PARENTHESIS UPPER HOOK
00239c ⎜ LEFT PARENTHESIS EXTENSION
00239f ⎟ RIGHT PARENTHESIS EXTENSION
00239d ⎝ LEFT PARENTHESIS LOWER HOOK
0023a0 ⎠ RIGHT PARENTHESIS LOWER HOOK
0023a1 ⎡ LEFT SQUARE BRACKET UPPER CORNER
0023a4 ⎤ RIGHT SQUARE BRACKET UPPER CORNER
0023a2 ⎢ LEFT SQUARE BRACKET EXTENSION
0023a5 ⎥ RIGHT SQUARE BRACKET EXTENSION
0023a3 ⎣ LEFT SQUARE BRACKET LOWER CORNER
0023a6 ⎦ RIGHT SQUARE BRACKET LOWER CORNER
0023a7 ⎧ LEFT CURLY BRACKET UPPER HOOK
0023ab ⎫ RIGHT CURLY BRACKET UPPER HOOK
0023a8 ⎨ LEFT CURLY BRACKET MIDDLE PIECE
0023ac ⎬ RIGHT CURLY BRACKET MIDDLE PIECE
0023a9 ⎩ LEFT CURLY BRACKET LOWER HOOK
0023ad ⎭ RIGHT CURLY BRACKET LOWER HOOK
0023b8 ⎸ LEFT VERTICAL BOX LINE
0023b9 ⎹ RIGHT VERTICAL BOX LINE
0023cb ⏋ DENTISTRY SYMBOL LIGHT VERTICAL AND TOP LEFT
0023be ⎾ DENTISTRY SYMBOL LIGHT VERTICAL AND TOP RIGHT
0023cc ⏌ DENTISTRY SYMBOL LIGHT VERTICAL AND BOTTOM LEFT
0023bf ⎿ DENTISTRY SYMBOL LIGHT VERTICAL AND BOTTOM RIGHT
002510 ┐ BOX DRAWINGS LIGHT DOWN AND LEFT
00250c ┌ BOX DRAWINGS LIGHT DOWN AND RIGHT
002511 ┑ BOX DRAWINGS DOWN LIGHT AND LEFT HEAVY
00250d ┍ BOX DRAWINGS DOWN LIGHT AND RIGHT HEAVY
002512 ┒ BOX DRAWINGS DOWN HEAVY AND LEFT LIGHT
00250e ┎ BOX DRAWINGS DOWN HEAVY AND RIGHT LIGHT
002513 ┓ BOX DRAWINGS HEAVY DOWN AND LEFT
00250f ┏ BOX DRAWINGS HEAVY DOWN AND RIGHT
002518 ┘ BOX DRAWINGS LIGHT UP AND LEFT
002514 └ BOX DRAWINGS LIGHT UP AND RIGHT
002519 ┙ BOX DRAWINGS UP LIGHT AND LEFT HEAVY
002515 ┕ BOX DRAWINGS UP LIGHT AND RIGHT HEAVY
00251a ┚ BOX DRAWINGS UP HEAVY AND LEFT LIGHT
002516 ┖ BOX DRAWINGS UP HEAVY AND RIGHT LIGHT
00251b ┛ BOX DRAWINGS HEAVY UP AND LEFT
002517 ┗ BOX DRAWINGS HEAVY UP AND RIGHT
002524 ┤ BOX DRAWINGS LIGHT VERTICAL AND LEFT
00251c ├ BOX DRAWINGS LIGHT VERTICAL AND RIGHT
002525 ┥ BOX DRAWINGS VERTICAL LIGHT AND LEFT HEAVY
00251d ┝ BOX DRAWINGS VERTICAL LIGHT AND RIGHT HEAVY
002526 ┦ BOX DRAWINGS UP HEAVY AND LEFT DOWN LIGHT
00251e ┞ BOX DRAWINGS UP HEAVY AND RIGHT DOWN LIGHT
002527 ┧ BOX DRAWINGS DOWN HEAVY AND LEFT UP LIGHT
00251f ┟ BOX DRAWINGS DOWN HEAVY AND RIGHT UP LIGHT
002528 ┨ BOX DRAWINGS VERTICAL HEAVY AND LEFT LIGHT
002520 ┠ BOX DRAWINGS VERTICAL HEAVY AND RIGHT LIGHT
002529 ┩ BOX DRAWINGS DOWN LIGHT AND LEFT UP HEAVY
002521 ┡ BOX DRAWINGS DOWN LIGHT AND RIGHT UP HEAVY
00252a ┪ BOX DRAWINGS UP LIGHT AND LEFT DOWN HEAVY
002522 ┢ BOX DRAWINGS UP LIGHT AND RIGHT DOWN HEAVY
00252b ┫ BOX DRAWINGS HEAVY VERTICAL AND LEFT
002523 ┣ BOX DRAWINGS HEAVY VERTICAL AND RIGHT
002555 ╕ BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
002552 ╒ BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
002556 ╖ BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
002553 ╓ BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
002557 ╗ BOX DRAWINGS DOUBLE DOWN AND LEFT
002554 ╔ BOX DRAWINGS DOUBLE DOWN AND RIGHT
00255b ╛ BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
002558 ╘ BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
00255c ╜ BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
002559 ╙ BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
00255d ╝ BOX DRAWINGS DOUBLE UP AND LEFT
00255a ╚ BOX DRAWINGS DOUBLE UP AND RIGHT
002561 ╡ BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
00255e ╞ BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
002562 ╢ BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
00255f ╟ BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
002563 ╣ BOX DRAWINGS DOUBLE VERTICAL AND LEFT
002560 ╠ BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
00256e ╮ BOX DRAWINGS LIGHT ARC DOWN AND LEFT
00256d ╭ BOX DRAWINGS LIGHT ARC DOWN AND RIGHT
00256f ╯ BOX DRAWINGS LIGHT ARC UP AND LEFT
002570 ╰ BOX DRAWINGS LIGHT ARC UP AND RIGHT
002574 ╴ BOX DRAWINGS LIGHT LEFT
002576 ╶ BOX DRAWINGS LIGHT RIGHT
002578 ╸ BOX DRAWINGS HEAVY LEFT
00257a ╺ BOX DRAWINGS HEAVY RIGHT
00258c ▌ LEFT HALF BLOCK
002590 ▐ RIGHT HALF BLOCK
00258f ▏ LEFT ONE EIGHTH BLOCK
002595 ▕ RIGHT ONE EIGHTH BLOCK
002596 ▖ QUADRANT LOWER LEFT
002597 ▗ QUADRANT LOWER RIGHT
002598 ▘ QUADRANT UPPER LEFT
00259d ▝ QUADRANT UPPER RIGHT
002599 ▙ QUADRANT UPPER LEFT AND LOWER LEFT AND LOWER RIGHT
00259f ▟ QUADRANT UPPER RIGHT AND LOWER LEFT AND LOWER RIGHT
0025c0 ◀ BLACK LEFT-POINTING TRIANGLE
0025b6 ▶ BLACK RIGHT-POINTING TRIANGLE
0025c1 ◁ WHITE LEFT-POINTING TRIANGLE
0025b7 ▷ WHITE RIGHT-POINTING TRIANGLE
0025c2 ◂ BLACK LEFT-POINTING SMALL TRIANGLE
0025b8 ▸ BLACK RIGHT-POINTING SMALL TRIANGLE
0025c3 ◃ WHITE LEFT-POINTING SMALL TRIANGLE
0025b9 ▹ WHITE RIGHT-POINTING SMALL TRIANGLE
0025c4 ◄ BLACK LEFT-POINTING POINTER
0025ba ► BLACK RIGHT-POINTING POINTER
0025c5 ◅ WHITE LEFT-POINTING POINTER
0025bb ▻ WHITE RIGHT-POINTING POINTER
0025d0 ◐ CIRCLE WITH LEFT HALF BLACK
0025d1 ◑ CIRCLE WITH RIGHT HALF BLACK
0025d6 ◖ LEFT HALF BLACK CIRCLE
0025d7 ◗ RIGHT HALF BLACK CIRCLE
0025dc ◜ UPPER LEFT QUADRANT CIRCULAR ARC
0025dd ◝ UPPER RIGHT QUADRANT CIRCULAR ARC
0025df ◟ LOWER LEFT QUADRANT CIRCULAR ARC
0025de ◞ LOWER RIGHT QUADRANT CIRCULAR ARC
0025e3 ◣ BLACK LOWER LEFT TRIANGLE
0025e2 ◢ BLACK LOWER RIGHT TRIANGLE
0025e4 ◤ BLACK UPPER LEFT TRIANGLE
0025e5 ◥ BLACK UPPER RIGHT TRIANGLE
0025e7 ◧ SQUARE WITH LEFT HALF BLACK
0025e8 ◨ SQUARE WITH RIGHT HALF BLACK
0025e9 ◩ SQUARE WITH UPPER LEFT DIAGONAL HALF BLACK
002b14 ⬔ SQUARE WITH UPPER RIGHT DIAGONAL HALF BLACK
0025ed ◭ UP-POINTING TRIANGLE WITH LEFT HALF BLACK
0025ee ◮ UP-POINTING TRIANGLE WITH RIGHT HALF BLACK
0025f0 ◰ WHITE SQUARE WITH UPPER LEFT QUADRANT
0025f3 ◳ WHITE SQUARE WITH UPPER RIGHT QUADRANT
0025f1 ◱ WHITE SQUARE WITH LOWER LEFT QUADRANT
0025f2 ◲ WHITE SQUARE WITH LOWER RIGHT QUADRANT
0025f4 ◴ WHITE CIRCLE WITH UPPER LEFT QUADRANT
0025f7 ◷ WHITE CIRCLE WITH UPPER RIGHT QUADRANT
0025f5 ◵ WHITE CIRCLE WITH LOWER LEFT QUADRANT
0025f6 ◶ WHITE CIRCLE WITH LOWER RIGHT QUADRANT
0025f8 ◸ UPPER LEFT TRIANGLE
0025f9 ◹ UPPER RIGHT TRIANGLE
0025fa ◺ LOWER LEFT TRIANGLE
0025ff ◿ LOWER RIGHT TRIANGLE
00261a ☚ BLACK LEFT POINTING INDEX
00261b ☛ BLACK RIGHT POINTING INDEX
00261c ☜ WHITE LEFT POINTING INDEX
00261e ☞ WHITE RIGHT POINTING INDEX
00269f ⚟ THREE LINES CONVERGING LEFT
00269e ⚞ THREE LINES CONVERGING RIGHT
002768 ❨ MEDIUM LEFT PARENTHESIS ORNAMENT
002769 ❩ MEDIUM RIGHT PARENTHESIS ORNAMENT
00276a ❪ MEDIUM FLATTENED LEFT PARENTHESIS ORNAMENT
00276b ❫ MEDIUM FLATTENED RIGHT PARENTHESIS ORNAMENT
00276c ❬ MEDIUM LEFT-POINTING ANGLE BRACKET ORNAMENT
00276d ❭ MEDIUM RIGHT-POINTING ANGLE BRACKET ORNAMENT
00276e ❮ HEAVY LEFT-POINTING ANGLE QUOTATION MARK ORNAMENT
00276f ❯ HEAVY RIGHT-POINTING ANGLE QUOTATION MARK ORNAMENT
002770 ❰ HEAVY LEFT-POINTING ANGLE BRACKET ORNAMENT
002771 ❱ HEAVY RIGHT-POINTING ANGLE BRACKET ORNAMENT
002772 ❲ LIGHT LEFT TORTOISE SHELL BRACKET ORNAMENT
002773 ❳ LIGHT RIGHT TORTOISE SHELL BRACKET ORNAMENT
002774 ❴ MEDIUM LEFT CURLY BRACKET ORNAMENT
002775 ❵ MEDIUM RIGHT CURLY BRACKET ORNAMENT
0027aa ➪ LEFT-SHADED WHITE RIGHTWARDS ARROW
0027a9 ➩ RIGHT-SHADED WHITE RIGHTWARDS ARROW
0027c5 ⟅ LEFT S-SHAPED BAG DELIMITER
0027c6 ⟆ RIGHT S-SHAPED BAG DELIMITER
0027d5 ⟕ LEFT OUTER JOIN
0027d6 ⟖ RIGHT OUTER JOIN
0027de ⟞ LONG LEFT TACK
0027dd ⟝ LONG RIGHT TACK
0027e6 ⟦ MATHEMATICAL LEFT WHITE SQUARE BRACKET
0027e7 ⟧ MATHEMATICAL RIGHT WHITE SQUARE BRACKET
0027e8 ⟨ MATHEMATICAL LEFT ANGLE BRACKET
0027e9 ⟩ MATHEMATICAL RIGHT ANGLE BRACKET
0027ea ⟪ MATHEMATICAL LEFT DOUBLE ANGLE BRACKET
0027eb ⟫ MATHEMATICAL RIGHT DOUBLE ANGLE BRACKET
0027ec ⟬ MATHEMATICAL LEFT WHITE TORTOISE SHELL BRACKET
0027ed ⟭ MATHEMATICAL RIGHT WHITE TORTOISE SHELL BRACKET
0027ee ⟮ MATHEMATICAL LEFT FLATTENED PARENTHESIS
0027ef ⟯ MATHEMATICAL RIGHT FLATTENED PARENTHESIS
00294c ⥌ UP BARB RIGHT DOWN BARB LEFT HARPOON
00294f ⥏ UP BARB RIGHT DOWN BARB RIGHT HARPOON
00294d ⥍ UP BARB LEFT DOWN BARB RIGHT HARPOON
00294f ⥏ UP BARB RIGHT DOWN BARB RIGHT HARPOON
002951 ⥑ UP BARB LEFT DOWN BARB LEFT HARPOON
00294c ⥌ UP BARB RIGHT DOWN BARB LEFT HARPOON
002958 ⥘ UPWARDS HARPOON WITH BARB LEFT TO BAR
002954 ⥔ UPWARDS HARPOON WITH BARB RIGHT TO BAR
002959 ⥙ DOWNWARDS HARPOON WITH BARB LEFT TO BAR
002955 ⥕ DOWNWARDS HARPOON WITH BARB RIGHT TO BAR
002960 ⥠ UPWARDS HARPOON WITH BARB LEFT FROM BAR
00295c ⥜ UPWARDS HARPOON WITH BARB RIGHT FROM BAR
002961 ⥡ DOWNWARDS HARPOON WITH BARB LEFT FROM BAR
00295d ⥝ DOWNWARDS HARPOON WITH BARB RIGHT FROM BAR
00297c ⥼ LEFT FISH TAIL
00297d ⥽ RIGHT FISH TAIL
002983 ⦃ LEFT WHITE CURLY BRACKET
002984 ⦄ RIGHT WHITE CURLY BRACKET
002985 ⦅ LEFT WHITE PARENTHESIS
002986 ⦆ RIGHT WHITE PARENTHESIS
002987 ⦇ Z NOTATION LEFT IMAGE BRACKET
002988 ⦈ Z NOTATION RIGHT IMAGE BRACKET
002989 ⦉ Z NOTATION LEFT BINDING BRACKET
00298a ⦊ Z NOTATION RIGHT BINDING BRACKET
00298b ⦋ LEFT SQUARE BRACKET WITH UNDERBAR
00298c ⦌ RIGHT SQUARE BRACKET WITH UNDERBAR
00298d ⦍ LEFT SQUARE BRACKET WITH TICK IN TOP CORNER
002990 ⦐ RIGHT SQUARE BRACKET WITH TICK IN TOP CORNER
00298f ⦏ LEFT SQUARE BRACKET WITH TICK IN BOTTOM CORNER
00298e ⦎ RIGHT SQUARE BRACKET WITH TICK IN BOTTOM CORNER
002991 ⦑ LEFT ANGLE BRACKET WITH DOT
002992 ⦒ RIGHT ANGLE BRACKET WITH DOT
002997 ⦗ LEFT BLACK TORTOISE SHELL BRACKET
002998 ⦘ RIGHT BLACK TORTOISE SHELL BRACKET
0029a9 ⦩ MEASURED ANGLE WITH OPEN ARM ENDING IN ARROW POINTING UP AND LEFT
0029a8 ⦨ MEASURED ANGLE WITH OPEN ARM ENDING IN ARROW POINTING UP AND RIGHT
0029ab ⦫ MEASURED ANGLE WITH OPEN ARM ENDING IN ARROW POINTING DOWN AND LEFT
0029aa ⦪ MEASURED ANGLE WITH OPEN ARM ENDING IN ARROW POINTING DOWN AND RIGHT
0029ad ⦭ MEASURED ANGLE WITH OPEN ARM ENDING IN ARROW POINTING LEFT AND UP
0029ac ⦬ MEASURED ANGLE WITH OPEN ARM ENDING IN ARROW POINTING RIGHT AND UP
0029af ⦯ MEASURED ANGLE WITH OPEN ARM ENDING IN ARROW POINTING LEFT AND DOWN
0029ae ⦮ MEASURED ANGLE WITH OPEN ARM ENDING IN ARROW POINTING RIGHT AND DOWN
0029b4 ⦴ EMPTY SET WITH LEFT ARROW ABOVE
0029b3 ⦳ EMPTY SET WITH RIGHT ARROW ABOVE
0029d1 ⧑ BOWTIE WITH LEFT HALF BLACK
0029d2 ⧒ BOWTIE WITH RIGHT HALF BLACK
0029d4 ⧔ TIMES WITH LEFT HALF BLACK
0029d5 ⧕ TIMES WITH RIGHT HALF BLACK
0029d8 ⧘ LEFT WIGGLY FENCE
0029d9 ⧙ RIGHT WIGGLY FENCE
0029da ⧚ LEFT DOUBLE WIGGLY FENCE
0029db ⧛ RIGHT DOUBLE WIGGLY FENCE
0029e8 ⧨ DOWN-POINTING TRIANGLE WITH LEFT HALF BLACK
0029e9 ⧩ DOWN-POINTING TRIANGLE WITH RIGHT HALF BLACK
0029fc ⧼ LEFT-POINTING CURVED ANGLE BRACKET
0029fd ⧽ RIGHT-POINTING CURVED ANGLE BRACKET
002a2d ⨭ PLUS SIGN IN LEFT HALF CIRCLE
002a2e ⨮ PLUS SIGN IN RIGHT HALF CIRCLE
002a34 ⨴ MULTIPLICATION SIGN IN LEFT HALF CIRCLE
002a35 ⨵ MULTIPLICATION SIGN IN RIGHT HALF CIRCLE
002acd ⫍ SQUARE LEFT OPEN BOX OPERATOR
002ace ⫎ SQUARE RIGHT OPEN BOX OPERATOR
002ae5 ⫥ DOUBLE VERTICAL BAR DOUBLE LEFT TURNSTILE
0022ab ⊫ DOUBLE VERTICAL BAR DOUBLE RIGHT TURNSTILE
002b15 ⬕ SQUARE WITH LOWER LEFT DIAGONAL HALF BLACK
0025ea ◪ SQUARE WITH LOWER RIGHT DIAGONAL HALF BLACK
002b16 ⬖ DIAMOND WITH LEFT HALF BLACK
002b17 ⬗ DIAMOND WITH RIGHT HALF BLACK
002b30 ⬰ LEFT ARROW WITH SMALL CIRCLE
0021f4 ⇴ RIGHT ARROW WITH SMALL CIRCLE
002b32 ⬲ LEFT ARROW WITH CIRCLED PLUS
0027f4 ⟴ RIGHT ARROW WITH CIRCLED PLUS
002b3f ⬿ WAVE ARROW POINTING DIRECTLY LEFT
002933 ⤳ WAVE ARROW POINTING DIRECTLY RIGHT
002e02 ⸂ LEFT SUBSTITUTION BRACKET
002e03 ⸃ RIGHT SUBSTITUTION BRACKET
002e04 ⸄ LEFT DOTTED SUBSTITUTION BRACKET
002e05 ⸅ RIGHT DOTTED SUBSTITUTION BRACKET
002e09 ⸉ LEFT TRANSPOSITION BRACKET
002e0a ⸊ RIGHT TRANSPOSITION BRACKET
002e0c ⸌ LEFT RAISED OMISSION BRACKET
002e0d ⸍ RIGHT RAISED OMISSION BRACKET
002e1c ⸜ LEFT LOW PARAPHRASE BRACKET
002e1d ⸝ RIGHT LOW PARAPHRASE BRACKET
002e20 ⸠ LEFT VERTICAL BAR WITH QUILL
002e21 ⸡ RIGHT VERTICAL BAR WITH QUILL
002e22 ⸢ TOP LEFT HALF BRACKET
002e23 ⸣ TOP RIGHT HALF BRACKET
002e24 ⸤ BOTTOM LEFT HALF BRACKET
002e25 ⸥ BOTTOM RIGHT HALF BRACKET
002e26 ⸦ LEFT SIDEWAYS U BRACKET
002e27 ⸧ RIGHT SIDEWAYS U BRACKET
002e28 ⸨ LEFT DOUBLE PARENTHESIS
002e29 ⸩ RIGHT DOUBLE PARENTHESIS
002ff8 ⿸ IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM UPPER LEFT
002ff9 ⿹ IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM UPPER RIGHT
003008 〈 LEFT ANGLE BRACKET
003009 〉 RIGHT ANGLE BRACKET
00300a 《 LEFT DOUBLE ANGLE BRACKET
00300b 》 RIGHT DOUBLE ANGLE BRACKET
00300c 「 LEFT CORNER BRACKET
00300d 」 RIGHT CORNER BRACKET
00300e 『 LEFT WHITE CORNER BRACKET
00300f 』 RIGHT WHITE CORNER BRACKET
003010 【 LEFT BLACK LENTICULAR BRACKET
003011 】 RIGHT BLACK LENTICULAR BRACKET
003014 〔 LEFT TORTOISE SHELL BRACKET
003015 〕 RIGHT TORTOISE SHELL BRACKET
003016 〖 LEFT WHITE LENTICULAR BRACKET
003017 〗 RIGHT WHITE LENTICULAR BRACKET
003018 〘 LEFT WHITE TORTOISE SHELL BRACKET
003019 〙 RIGHT WHITE TORTOISE SHELL BRACKET
00301a 〚 LEFT WHITE SQUARE BRACKET
00301b 〛 RIGHT WHITE SQUARE BRACKET
0032a7 ㊧ CIRCLED IDEOGRAPH LEFT
0032a8 ㊨ CIRCLED IDEOGRAPH RIGHT
00a9c1 ꧁ JAVANESE LEFT RERENGGAN
00a9c2 ꧂ JAVANESE RIGHT RERENGGAN
00fd3e ﴾ ORNATE LEFT PARENTHESIS
00fd3f ﴿ ORNATE RIGHT PARENTHESIS
00fe17 ︗ PRESENTATION FORM FOR VERTICAL LEFT WHITE LENTICULAR BRACKET
00fe18 ︘ PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRACKET
00fe20 ︠ COMBINING LIGATURE LEFT HALF
00fe21 ︡ COMBINING LIGATURE RIGHT HALF
00fe22 ︢ COMBINING DOUBLE TILDE LEFT HALF
00fe23 ︣ COMBINING DOUBLE TILDE RIGHT HALF
00fe24 ︤ COMBINING MACRON LEFT HALF
00fe25 ︥ COMBINING MACRON RIGHT HALF
00fe35 ︵ PRESENTATION FORM FOR VERTICAL LEFT PARENTHESIS
00fe36 ︶ PRESENTATION FORM FOR VERTICAL RIGHT PARENTHESIS
00fe37 ︷ PRESENTATION FORM FOR VERTICAL LEFT CURLY BRACKET
00fe38 ︸ PRESENTATION FORM FOR VERTICAL RIGHT CURLY BRACKET
00fe39 ︹ PRESENTATION FORM FOR VERTICAL LEFT TORTOISE SHELL BRACKET
00fe3a ︺ PRESENTATION FORM FOR VERTICAL RIGHT TORTOISE SHELL BRACKET
00fe3b ︻ PRESENTATION FORM FOR VERTICAL LEFT BLACK LENTICULAR BRACKET
00fe3c ︼ PRESENTATION FORM FOR VERTICAL RIGHT BLACK LENTICULAR BRACKET
00fe3d ︽ PRESENTATION FORM FOR VERTICAL LEFT DOUBLE ANGLE BRACKET
00fe3e ︾ PRESENTATION FORM FOR VERTICAL RIGHT DOUBLE ANGLE BRACKET
00fe3f ︿ PRESENTATION FORM FOR VERTICAL LEFT ANGLE BRACKET
00fe40 ﹀ PRESENTATION FORM FOR VERTICAL RIGHT ANGLE BRACKET
00fe41 ﹁ PRESENTATION FORM FOR VERTICAL LEFT CORNER BRACKET
00fe42 ﹂ PRESENTATION FORM FOR VERTICAL RIGHT CORNER BRACKET
00fe43 ﹃ PRESENTATION FORM FOR VERTICAL LEFT WHITE CORNER BRACKET
00fe44 ﹄ PRESENTATION FORM FOR VERTICAL RIGHT WHITE CORNER BRACKET
00fe47 ﹇ PRESENTATION FORM FOR VERTICAL LEFT SQUARE BRACKET
00fe48 ﹈ PRESENTATION FORM FOR VERTICAL RIGHT SQUARE BRACKET
00fe59 ﹙ SMALL LEFT PARENTHESIS
00fe5a ﹚ SMALL RIGHT PARENTHESIS
00fe5b ﹛ SMALL LEFT CURLY BRACKET
00fe5c ﹜ SMALL RIGHT CURLY BRACKET
00fe5d ﹝ SMALL LEFT TORTOISE SHELL BRACKET
00fe5e ﹞ SMALL RIGHT TORTOISE SHELL BRACKET
00ff08 （ FULLWIDTH LEFT PARENTHESIS
00ff09 ） FULLWIDTH RIGHT PARENTHESIS
00ff3b ［ FULLWIDTH LEFT SQUARE BRACKET
00ff3d ］ FULLWIDTH RIGHT SQUARE BRACKET
00ff5b ｛ FULLWIDTH LEFT CURLY BRACKET
00ff5d ｝ FULLWIDTH RIGHT CURLY BRACKET
00ff5f ｟ FULLWIDTH LEFT WHITE PARENTHESIS
00ff60 ｠ FULLWIDTH RIGHT WHITE PARENTHESIS
00ff62 ｢ HALFWIDTH LEFT CORNER BRACKET
00ff63 ｣ HALFWIDTH RIGHT CORNER BRACKET
01d106 𝄆 MUSICAL SYMBOL LEFT REPEAT SIGN
01d107 𝄇 MUSICAL SYMBOL RIGHT REPEAT SIGN
01d14a 𝅊 MUSICAL SYMBOL TRIANGLE NOTEHEAD LEFT WHITE
01d14c 𝅌 MUSICAL SYMBOL TRIANGLE NOTEHEAD RIGHT WHITE
01d14b 𝅋 MUSICAL SYMBOL TRIANGLE NOTEHEAD LEFT BLACK
01d14d 𝅍 MUSICAL SYMBOL TRIANGLE NOTEHEAD RIGHT BLACK
0e0028 � TAG LEFT PARENTHESIS
0e0029 � TAG RIGHT PARENTHESIS
0e005b � TAG LEFT SQUARE BRACKET
0e005d � TAG RIGHT SQUARE BRACKET
0e007b � TAG LEFT CURLY BRACKET
0e007d � TAG RIGHT CURLY BRACKET

then use, um, something like Perl, ( :^) ) to
look up a textual description with the opposite word substituted in and,
if found, use it as the complementary character -- basically doing this
for 2 characters that have a RIGHT & LEFT. That should also obviate the
need for any enumerated table.

Is there something wrong in that 'simple' approach?  It would seem
to be the most flexible... (?)
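
The name-swapping idea described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not anything perl itself does: `mirror_by_name` is a hypothetical helper name, and the rule of skipping names that mention both LEFT and RIGHT is a guess at how the ambiguous cases might be handled. It relies on `charnames::viacode` and `charnames::vianame` from the core `charnames` module.

```perl
use strict;
use warnings;
use charnames ':full';

# Hypothetical helper: given a code point, swap LEFT <-> RIGHT in its
# Unicode name and return the code point carrying the swapped name.
# Returns nothing when there is no usable counterpart.
sub mirror_by_name {
    my ($cp) = @_;
    my $name = charnames::viacode($cp);
    return unless defined $name;

    # Names mentioning both words (e.g. LEFT-TO-RIGHT MARK) are ambiguous.
    return if $name =~ /\bLEFT\b/ && $name =~ /\bRIGHT\b/;

    # Swap the single direction word; give up if the name has neither.
    (my $other = $name) =~ s/\b(LEFT|RIGHT)\b/$1 eq 'LEFT' ? 'RIGHT' : 'LEFT'/e
        or return;

    # vianame yields undef for a name that does not exist
    # (some perls may also warn about the unknown name).
    return charnames::vianame($other);
}
```

A call such as `mirror_by_name(0x27E8)` looks up MATHEMATICAL LEFT ANGLE BRACKET and searches for MATHEMATICAL RIGHT ANGLE BRACKET. The sketch says nothing about whether the two glyphs really are a visual pair, which is the objection raised later in the thread.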

000028 ( LEFT PARENTHESIS
00005b [ LEFT SQUARE BRACKET
00007b { LEFT CURLY BRACKET
0000ab « AQML_IDX LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
00019d Ɲ LATIN CAPITAL LETTER N WITH LEFT HOOK
000272 ɲ LATIN SMALL LETTER N WITH LEFT HOOK
0002bf ʿ MODIFIER LETTER LEFT HALF RING
0002c2 ˂ MODIFIER LETTER LEFT ARROWHEAD
0002d3 ˓ MODIFIER LETTER CENTRED LEFT HALF RING
0002f1 ˱ MODIFIER LETTER LOW LEFT ARROWHEAD
0002ff ˿ MODIFIER LETTER LOW LEFT ARROW
000318 ̘ COMBINING LEFT TACK BELOW
00031a ̚ COMBINING LEFT ANGLE ABOVE
00031c ̜ COMBINING LEFT HALF RING BELOW
000349 ͉ COMBINING LEFT ANGLE BELOW
00034d ͍ COMBINING LEFT RIGHT ARROW BELOW
000351 ͑ COMBINING LEFT HALF RING ABOVE
000354 ͔ COMBINING LEFT ARROWHEAD BELOW
000559 ՙ ARMENIAN MODIFIER LETTER LEFT HALF RING
000706 ܆ SYRIAC COLON SKEWED LEFT
000708 ܈ SYRIAC SUPRALINEAR COLON SKEWED LEFT
000fd6 ࿖ LEFT-FACING SVASTI SIGN
000fd8 ࿘ LEFT-FACING SVASTI SIGN WITH DOTS
001b78 ᭸ BALINESE MUSICAL SYMBOL LEFT-HAND OPEN PANG
001b79 ᭹ BALINESE MUSICAL SYMBOL LEFT-HAND OPEN PUNG
001b7a ᭺ BALINESE MUSICAL SYMBOL LEFT-HAND CLOSED PLAK
001b7b ᭻ BALINESE MUSICAL SYMBOL LEFT-HAND CLOSED PLUK
001b7c ᭼ BALINESE MUSICAL SYMBOL LEFT-HAND OPEN PING
001dae ᶮ MODIFIER LETTER SMALL N WITH LEFT HOOK
001dfe ᷾ COMBINING LEFT ARROWHEAD ABOVE
00200e ‎ LEFT-TO-RIGHT MARK
00200f ‏ RIGHT-TO-LEFT MARK
002018 ‘ LEFT SINGLE QUOTATION MARK
00201c “ LEFT DOUBLE QUOTATION MARK
00202a ‪ LEFT-TO-RIGHT EMBEDDING
00202b ‫ RIGHT-TO-LEFT EMBEDDING
00202d ‭ LEFT-TO-RIGHT OVERRIDE
00202e ‮ RIGHT-TO-LEFT OVERRIDE
002039 ‹ SINGLE LEFT-POINTING ANGLE QUOTATION MARK
002045 ⁅ LEFT SQUARE BRACKET WITH QUILL
00207d ⁽ SUPERSCRIPT LEFT PARENTHESIS
00208d ₍ SUBSCRIPT LEFT PARENTHESIS
0020d0 ⃐ COMBINING LEFT HARPOON ABOVE
0020d6 ⃖ COMBINING LEFT ARROW ABOVE
0020e1 ⃡ COMBINING LEFT RIGHT ARROW ABOVE
0020ee ⃮ COMBINING LEFT ARROW BELOW
002194 ↔ LEFT RIGHT ARROW
0021ad ↭ LEFT RIGHT WAVE ARROW
0021ae ↮ LEFT RIGHT ARROW WITH STROKE
0021ce ⇎ LEFT RIGHT DOUBLE ARROW WITH STROKE
0021d4 ⇔ LEFT RIGHT DOUBLE ARROW
0021f9 ⇹ LEFT RIGHT ARROW WITH VERTICAL STROKE
0021fc ⇼ LEFT RIGHT ARROW WITH DOUBLE VERTICAL STROKE
0021ff ⇿ LEFT RIGHT OPEN-HEADED ARROW
0022a3 ⊣ LEFT TACK
0022c9 ⋉ LEFT NORMAL FACTOR SEMIDIRECT PRODUCT
0022cb ⋋ LEFT SEMIDIRECT PRODUCT
002308 ⌈ LEFT CEILING
00230a ⌊ LEFT FLOOR
00230d ⌍ BOTTOM LEFT CROP
00230f ⌏ TOP LEFT CROP
00231c ⌜ TOP LEFT CORNER
00231e ⌞ BOTTOM LEFT CORNER
002329 〈 LEFT-POINTING ANGLE BRACKET
00232b ⌫ ERASE TO THE LEFT
002367 ⍧ APL FUNCTIONAL SYMBOL LEFT SHOE STILE
00239b ⎛ LEFT PARENTHESIS UPPER HOOK
00239c ⎜ LEFT PARENTHESIS EXTENSION
00239d ⎝ LEFT PARENTHESIS LOWER HOOK
0023a1 ⎡ LEFT SQUARE BRACKET UPPER CORNER
0023a2 ⎢ LEFT SQUARE BRACKET EXTENSION
0023a3 ⎣ LEFT SQUARE BRACKET LOWER CORNER
0023a7 ⎧ LEFT CURLY BRACKET UPPER HOOK
0023a8 ⎨ LEFT CURLY BRACKET MIDDLE PIECE
0023a9 ⎩ LEFT CURLY BRACKET LOWER HOOK
0023b0 ⎰ UPPER LEFT OR LOWER RIGHT CURLY BRACKET SECTION
0023b1 ⎱ UPPER RIGHT OR LOWER LEFT CURLY BRACKET SECTION
0023b8 ⎸ LEFT VERTICAL BOX LINE
0023cb ⏋ DENTISTRY SYMBOL LIGHT VERTICAL AND TOP LEFT
0023cc ⏌ DENTISTRY SYMBOL LIGHT VERTICAL AND BOTTOM LEFT
002510 ┐ BOX DRAWINGS LIGHT DOWN AND LEFT
002511 ┑ BOX DRAWINGS DOWN LIGHT AND LEFT HEAVY
002512 ┒ BOX DRAWINGS DOWN HEAVY AND LEFT LIGHT
002513 ┓ BOX DRAWINGS HEAVY DOWN AND LEFT
002518 ┘ BOX DRAWINGS LIGHT UP AND LEFT
002519 ┙ BOX DRAWINGS UP LIGHT AND LEFT HEAVY
00251a ┚ BOX DRAWINGS UP HEAVY AND LEFT LIGHT
00251b ┛ BOX DRAWINGS HEAVY UP AND LEFT
002524 ┤ BOX DRAWINGS LIGHT VERTICAL AND LEFT
002525 ┥ BOX DRAWINGS VERTICAL LIGHT AND LEFT HEAVY
002526 ┦ BOX DRAWINGS UP HEAVY AND LEFT DOWN LIGHT
002527 ┧ BOX DRAWINGS DOWN HEAVY AND LEFT UP LIGHT
002528 ┨ BOX DRAWINGS VERTICAL HEAVY AND LEFT LIGHT
002529 ┩ BOX DRAWINGS DOWN LIGHT AND LEFT UP HEAVY
00252a ┪ BOX DRAWINGS UP LIGHT AND LEFT DOWN HEAVY
00252b ┫ BOX DRAWINGS HEAVY VERTICAL AND LEFT
00252d ┭ BOX DRAWINGS LEFT HEAVY AND RIGHT DOWN LIGHT
00252e ┮ BOX DRAWINGS RIGHT HEAVY AND LEFT DOWN LIGHT
002531 ┱ BOX DRAWINGS RIGHT LIGHT AND LEFT DOWN HEAVY
002532 ┲ BOX DRAWINGS LEFT LIGHT AND RIGHT DOWN HEAVY
002535 ┵ BOX DRAWINGS LEFT HEAVY AND RIGHT UP LIGHT
002536 ┶ BOX DRAWINGS RIGHT HEAVY AND LEFT UP LIGHT
002539 ┹ BOX DRAWINGS RIGHT LIGHT AND LEFT UP HEAVY
00253a ┺ BOX DRAWINGS LEFT LIGHT AND RIGHT UP HEAVY
00253d ┽ BOX DRAWINGS LEFT HEAVY AND RIGHT VERTICAL LIGHT
00253e ┾ BOX DRAWINGS RIGHT HEAVY AND LEFT VERTICAL LIGHT
002543 ╃ BOX DRAWINGS LEFT UP HEAVY AND RIGHT DOWN LIGHT
002544 ╄ BOX DRAWINGS RIGHT UP HEAVY AND LEFT DOWN LIGHT
002545 ╅ BOX DRAWINGS LEFT DOWN HEAVY AND RIGHT UP LIGHT
002546 ╆ BOX DRAWINGS RIGHT DOWN HEAVY AND LEFT UP LIGHT
002549 ╉ BOX DRAWINGS RIGHT LIGHT AND LEFT VERTICAL HEAVY
00254a ╊ BOX DRAWINGS LEFT LIGHT AND RIGHT VERTICAL HEAVY
002555 ╕ BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
002556 ╖ BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
002557 ╗ BOX DRAWINGS DOUBLE DOWN AND LEFT
00255b ╛ BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
00255c ╜ BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
00255d ╝ BOX DRAWINGS DOUBLE UP AND LEFT
002561 ╡ BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
002562 ╢ BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
002563 ╣ BOX DRAWINGS DOUBLE VERTICAL AND LEFT
00256e ╮ BOX DRAWINGS LIGHT ARC DOWN AND LEFT
00256f ╯ BOX DRAWINGS LIGHT ARC UP AND LEFT
002571 ╱ BOX DRAWINGS LIGHT DIAGONAL UPPER RIGHT TO LOWER LEFT
002572 ╲ BOX DRAWINGS LIGHT DIAGONAL UPPER LEFT TO LOWER RIGHT
002574 ╴ BOX DRAWINGS LIGHT LEFT
002578 ╸ BOX DRAWINGS HEAVY LEFT
00257c ╼ BOX DRAWINGS LIGHT LEFT AND HEAVY RIGHT
00257e ╾ BOX DRAWINGS HEAVY LEFT AND LIGHT RIGHT
002589 ▉ LEFT SEVEN EIGHTHS BLOCK
00258a ▊ LEFT THREE QUARTERS BLOCK
00258b ▋ LEFT FIVE EIGHTHS BLOCK
00258c ▌ LEFT HALF BLOCK
00258d ▍ LEFT THREE EIGHTHS BLOCK
00258e ▎ LEFT ONE QUARTER BLOCK
00258f ▏ LEFT ONE EIGHTH BLOCK
002596 ▖ QUADRANT LOWER LEFT
002598 ▘ QUADRANT UPPER LEFT
002599 ▙ QUADRANT UPPER LEFT AND LOWER LEFT AND LOWER RIGHT
00259a ▚ QUADRANT UPPER LEFT AND LOWER RIGHT
00259b ▛ QUADRANT UPPER LEFT AND UPPER RIGHT AND LOWER LEFT
00259c ▜ QUADRANT UPPER LEFT AND UPPER RIGHT AND LOWER RIGHT
00259e ▞ QUADRANT UPPER RIGHT AND LOWER LEFT
00259f ▟ QUADRANT UPPER RIGHT AND LOWER LEFT AND LOWER RIGHT
0025a7 ▧ SQUARE WITH UPPER LEFT TO LOWER RIGHT FILL
0025a8 ▨ SQUARE WITH UPPER RIGHT TO LOWER LEFT FILL
0025c0 ◀ BLACK LEFT-POINTING TRIANGLE
0025c1 ◁ WHITE LEFT-POINTING TRIANGLE
0025c2 ◂ BLACK LEFT-POINTING SMALL TRIANGLE
0025c3 ◃ WHITE LEFT-POINTING SMALL TRIANGLE
0025c4 ◄ BLACK LEFT-POINTING POINTER
0025c5 ◅ WHITE LEFT-POINTING POINTER
0025d0 ◐ CIRCLE WITH LEFT HALF BLACK
0025d5 ◕ CIRCLE WITH ALL BUT UPPER LEFT QUADRANT BLACK
0025d6 ◖ LEFT HALF BLACK CIRCLE
0025dc ◜ UPPER LEFT QUADRANT CIRCULAR ARC
0025df ◟ LOWER LEFT QUADRANT CIRCULAR ARC
0025e3 ◣ BLACK LOWER LEFT TRIANGLE
0025e4 ◤ BLACK UPPER LEFT TRIANGLE
0025e7 ◧ SQUARE WITH LEFT HALF BLACK
0025e9 ◩ SQUARE WITH UPPER LEFT DIAGONAL HALF BLACK
0025ed ◭ UP-POINTING TRIANGLE WITH LEFT HALF BLACK
0025f0 ◰ WHITE SQUARE WITH UPPER LEFT QUADRANT
0025f1 ◱ WHITE SQUARE WITH LOWER LEFT QUADRANT
0025f4 ◴ WHITE CIRCLE WITH UPPER LEFT QUADRANT
0025f5 ◵ WHITE CIRCLE WITH LOWER LEFT QUADRANT
0025f8 ◸ UPPER LEFT TRIANGLE
0025fa ◺ LOWER LEFT TRIANGLE
00261a ☚ BLACK LEFT POINTING INDEX
00261c ☜ WHITE LEFT POINTING INDEX
00269f ⚟ THREE LINES CONVERGING LEFT
0026d5 ⛕ ALTERNATE ONE-WAY LEFT WAY TRAFFIC
0026d6 ⛖ BLACK TWO-WAY LEFT WAY TRAFFIC
0026d7 ⛗ WHITE TWO-WAY LEFT WAY TRAFFIC
0026d8 ⛘ BLACK LEFT LANE MERGE
0026d9 ⛙ WHITE LEFT LANE MERGE
0026dc ⛜ LEFT CLOSED ENTRY
0026e0 ⛠ RESTRICTED LEFT ENTRY-1
0026e1 ⛡ RESTRICTED LEFT ENTRY-2
002768 ❨ MEDIUM LEFT PARENTHESIS ORNAMENT
00276a ❪ MEDIUM FLATTENED LEFT PARENTHESIS ORNAMENT
00276c ❬ MEDIUM LEFT-POINTING ANGLE BRACKET ORNAMENT
00276e ❮ HEAVY LEFT-POINTING ANGLE QUOTATION MARK ORNAMENT
002770 ❰ HEAVY LEFT-POINTING ANGLE BRACKET ORNAMENT
002772 ❲ LIGHT LEFT TORTOISE SHELL BRACKET ORNAMENT
002774 ❴ MEDIUM LEFT CURLY BRACKET ORNAMENT
0027aa ➪ LEFT-SHADED WHITE RIGHTWARDS ARROW
0027c5 ⟅ LEFT S-SHAPED BAG DELIMITER
0027d4 ⟔ UPPER LEFT CORNER WITH DOT
0027d5 ⟕ LEFT OUTER JOIN
0027da ⟚ LEFT AND RIGHT DOUBLE TURNSTILE
0027db ⟛ LEFT AND RIGHT TACK
0027dc ⟜ LEFT MULTIMAP
0027de ⟞ LONG LEFT TACK
0027e6 ⟦ MATHEMATICAL LEFT WHITE SQUARE BRACKET
0027e8 ⟨ MATHEMATICAL LEFT ANGLE BRACKET
0027ea ⟪ MATHEMATICAL LEFT DOUBLE ANGLE BRACKET
0027ec ⟬ MATHEMATICAL LEFT WHITE TORTOISE SHELL BRACKET
0027ee ⟮ MATHEMATICAL LEFT FLATTENED PARENTHESIS
0027f7 ⟷ LONG LEFT RIGHT ARROW
0027fa ⟺ LONG LEFT RIGHT DOUBLE ARROW
002904 ⤄ LEFT RIGHT DOUBLE ARROW WITH VERTICAL STROKE
002939 ⤹ LEFT-SIDE ARC ANTICLOCKWISE ARROW
00293f ⤿ LOWER LEFT SEMICIRCULAR ANTICLOCKWISE ARROW
002948 ⥈ LEFT RIGHT ARROW THROUGH SMALL CIRCLE
00294a ⥊ LEFT BARB UP RIGHT BARB DOWN HARPOON
00294b ⥋ LEFT BARB DOWN RIGHT BARB UP HARPOON
00294c ⥌ UP BARB RIGHT DOWN BARB LEFT HARPOON
00294d ⥍ UP BARB LEFT DOWN BARB RIGHT HARPOON
00294e ⥎ LEFT BARB UP RIGHT BARB UP HARPOON
002950 ⥐ LEFT BARB DOWN RIGHT BARB DOWN HARPOON
002951 ⥑ UP BARB LEFT DOWN BARB LEFT HARPOON
002958 ⥘ UPWARDS HARPOON WITH BARB LEFT TO BAR
002959 ⥙ DOWNWARDS HARPOON WITH BARB LEFT TO BAR
002960 ⥠ UPWARDS HARPOON WITH BARB LEFT FROM BAR
002961 ⥡ DOWNWARDS HARPOON WITH BARB LEFT FROM BAR
002963 ⥣ UPWARDS HARPOON WITH BARB LEFT BESIDE UPWARDS HARPOON WITH BARB RIGHT
002965 ⥥ DOWNWARDS HARPOON WITH BARB LEFT BESIDE DOWNWARDS HARPOON WITH BARB RIGHT
00296e ⥮ UPWARDS HARPOON WITH BARB LEFT BESIDE DOWNWARDS HARPOON WITH BARB RIGHT
00296f ⥯ DOWNWARDS HARPOON WITH BARB LEFT BESIDE UPWARDS HARPOON WITH BARB RIGHT
00297c ⥼ LEFT FISH TAIL
002983 ⦃ LEFT WHITE CURLY BRACKET
002985 ⦅ LEFT WHITE PARENTHESIS
002987 ⦇ Z NOTATION LEFT IMAGE BRACKET
002989 ⦉ Z NOTATION LEFT BINDING BRACKET
00298b ⦋ LEFT SQUARE BRACKET WITH UNDERBAR
00298d ⦍ LEFT SQUARE BRACKET WITH TICK IN TOP CORNER
00298f ⦏ LEFT SQUARE BRACKET WITH TICK IN BOTTOM CORNER
002991 ⦑ LEFT ANGLE BRACKET WITH DOT
002993 ⦓ LEFT ARC LESS-THAN BRACKET
002995 ⦕ DOUBLE LEFT ARC GREATER-THAN BRACKET
002997 ⦗ LEFT BLACK TORTOISE SHELL BRACKET
00299b ⦛ MEASURED ANGLE OPENING LEFT
0029a0 ⦠ SPHERICAL ANGLE OPENING LEFT
0029a9 ⦩ MEASURED ANGLE WITH OPEN ARM ENDING IN ARROW POINTING UP AND LEFT
0029ab ⦫ MEASURED ANGLE WITH OPEN ARM ENDING IN ARROW POINTING DOWN AND LEFT
0029ad ⦭ MEASURED ANGLE WITH OPEN ARM ENDING IN ARROW POINTING LEFT AND UP
0029af ⦯ MEASURED ANGLE WITH OPEN ARM ENDING IN ARROW POINTING LEFT AND DOWN
0029b4 ⦴ EMPTY SET WITH LEFT ARROW ABOVE
0029ce ⧎ RIGHT TRIANGLE ABOVE LEFT TRIANGLE
0029cf ⧏ LEFT TRIANGLE BESIDE VERTICAL BAR
0029d1 ⧑ BOWTIE WITH LEFT HALF BLACK
0029d4 ⧔ TIMES WITH LEFT HALF BLACK
0029d8 ⧘ LEFT WIGGLY FENCE
0029da ⧚ LEFT DOUBLE WIGGLY FENCE
0029e8 ⧨ DOWN-POINTING TRIANGLE WITH LEFT HALF BLACK
0029fc ⧼ LEFT-POINTING CURVED ANGLE BRACKET
002a1e ⨞ LARGE LEFT TRIANGLE OPERATOR
002a2d ⨭ PLUS SIGN IN LEFT HALF CIRCLE
002a34 ⨴ MULTIPLICATION SIGN IN LEFT HALF CIRCLE
002a84 ⪄ GREATER-THAN OR SLANTED EQUAL TO WITH DOT ABOVE LEFT
002acd ⫍ SQUARE LEFT OPEN BOX OPERATOR
002ade ⫞ SHORT LEFT TACK
002ae3 ⫣ DOUBLE VERTICAL BAR LEFT TURNSTILE
002ae4 ⫤ VERTICAL BAR DOUBLE LEFT TURNSTILE
002ae5 ⫥ DOUBLE VERTICAL BAR DOUBLE LEFT TURNSTILE
002ae6 ⫦ LONG DASH FROM LEFT MEMBER OF DOUBLE VERTICAL
002b04 ⬄ LEFT RIGHT WHITE ARROW
002b0c ⬌ LEFT RIGHT BLACK ARROW
002b15 ⬕ SQUARE WITH LOWER LEFT DIAGONAL HALF BLACK
002b16 ⬖ DIAMOND WITH LEFT HALF BLACK
002b30 ⬰ LEFT ARROW WITH SMALL CIRCLE
002b32 ⬲ LEFT ARROW WITH CIRCLED PLUS
002b3f ⬿ WAVE ARROW POINTING DIRECTLY LEFT
002e02 ⸂ LEFT SUBSTITUTION BRACKET
002e04 ⸄ LEFT DOTTED SUBSTITUTION BRACKET
002e09 ⸉ LEFT TRANSPOSITION BRACKET
002e0c ⸌ LEFT RAISED OMISSION BRACKET
002e1c ⸜ LEFT LOW PARAPHRASE BRACKET
002e20 ⸠ LEFT VERTICAL BAR WITH QUILL
002e22 ⸢ TOP LEFT HALF BRACKET
002e24 ⸤ BOTTOM LEFT HALF BRACKET
002e26 ⸦ LEFT SIDEWAYS U BRACKET
002e28 ⸨ LEFT DOUBLE PARENTHESIS
002ff0 ⿰ IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO RIGHT
002ff2 ⿲ IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO MIDDLE AND RIGHT
002ff7 ⿷ IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM LEFT
002ff8 ⿸ IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM UPPER LEFT
002ffa ⿺ IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM LOWER LEFT
003008 〈 LEFT ANGLE BRACKET
00300a 《 LEFT DOUBLE ANGLE BRACKET
00300c 「 LEFT CORNER BRACKET
00300e 『 LEFT WHITE CORNER BRACKET
003010 【 LEFT BLACK LENTICULAR BRACKET
003014 〔 LEFT TORTOISE SHELL BRACKET
003016 〖 LEFT WHITE LENTICULAR BRACKET
003018 〘 LEFT WHITE TORTOISE SHELL BRACKET
00301a 〚 LEFT WHITE SQUARE BRACKET
0032a7 ㊧ CIRCLED IDEOGRAPH LEFT
00a70d ꜍ MODIFIER LETTER EXTRA-HIGH DOTTED LEFT-STEM TONE BAR
00a70e ꜎ MODIFIER LETTER HIGH DOTTED LEFT-STEM TONE BAR
00a70f ꜏ MODIFIER LETTER MID DOTTED LEFT-STEM TONE BAR
00a710 ꜐ MODIFIER LETTER LOW DOTTED LEFT-STEM TONE BAR
00a711 ꜑ MODIFIER LETTER EXTRA-LOW DOTTED LEFT-STEM TONE BAR
00a712 ꜒ MODIFIER LETTER EXTRA-HIGH LEFT-STEM TONE BAR
00a713 ꜓ MODIFIER LETTER HIGH LEFT-STEM TONE BAR
00a714 ꜔ MODIFIER LETTER MID LEFT-STEM TONE BAR
00a715 ꜕ MODIFIER LETTER LOW LEFT-STEM TONE BAR
00a716 ꜖ MODIFIER LETTER EXTRA-LOW LEFT-STEM TONE BAR
00a9c1 ꧁ JAVANESE LEFT RERENGGAN
00fd3e ﴾ ORNATE LEFT PARENTHESIS
00fe17 ︗ PRESENTATION FORM FOR VERTICAL LEFT WHITE LENTICULAR BRACKET
00fe20 ︠ COMBINING LIGATURE LEFT HALF
00fe22 ︢ COMBINING DOUBLE TILDE LEFT HALF
00fe24 ︤ COMBINING MACRON LEFT HALF
00fe35 ︵ PRESENTATION FORM FOR VERTICAL LEFT PARENTHESIS
00fe37 ︷ PRESENTATION FORM FOR VERTICAL LEFT CURLY BRACKET
00fe39 ︹ PRESENTATION FORM FOR VERTICAL LEFT TORTOISE SHELL BRACKET
00fe3b ︻ PRESENTATION FORM FOR VERTICAL LEFT BLACK LENTICULAR BRACKET
00fe3d ︽ PRESENTATION FORM FOR VERTICAL LEFT DOUBLE ANGLE BRACKET
00fe3f ︿ PRESENTATION FORM FOR VERTICAL LEFT ANGLE BRACKET
00fe41 ﹁ PRESENTATION FORM FOR VERTICAL LEFT CORNER BRACKET
00fe43 ﹃ PRESENTATION FORM FOR VERTICAL LEFT WHITE CORNER BRACKET
00fe47 ﹇ PRESENTATION FORM FOR VERTICAL LEFT SQUARE BRACKET
00fe59 ﹙ SMALL LEFT PARENTHESIS
00fe5b ﹛ SMALL LEFT CURLY BRACKET
00fe5d ﹝ SMALL LEFT TORTOISE SHELL BRACKET
00ff08 （ FULLWIDTH LEFT PARENTHESIS
00ff3b ［ FULLWIDTH LEFT SQUARE BRACKET
00ff5b ｛ FULLWIDTH LEFT CURLY BRACKET
00ff5f ｟ FULLWIDTH LEFT WHITE PARENTHESIS
00ff62 ｢ HALFWIDTH LEFT CORNER BRACKET
01d106 𝄆 MUSICAL SYMBOL LEFT REPEAT SIGN
01d14a 𝅊 MUSICAL SYMBOL TRIANGLE NOTEHEAD LEFT WHITE
01d14b 𝅋 MUSICAL SYMBOL TRIANGLE NOTEHEAD LEFT BLACK
0e0028 � TAG LEFT PARENTHESIS
0e005b � TAG LEFT SQUARE BRACKET
0e007b � TAG LEFT CURLY BRACKET

--
H.Merijn Brand http://tux.nl Perl Monger http://amsterdam.pm.org/
using 5.00307 through 5.12 and porting perl5.13.x on HP-UX 10.20, 11.00,
11.11, 11.23 and 11.31, OpenSuSE 10.1, 11.0 .. 11.3 and AIX 5.2 and 5.3.
http://mirrors.develooper.com/hpux/ http://www.test-smoke.org/
http://qa.perl.org http://www.goldmark.org/jeff/stupid-disclaimers/

@p5pRT
Copy link
Author

p5pRT commented Apr 22, 2011

From jwkrahn@shaw.ca

Linda Walsh wrote​:

# New Ticket Created by Linda Walsh
# Please include the string​: [perl #89032]
# in the subject line of all future correspondence about this issue.
#<URL​: http​://rt.perl.org/rt3/Ticket/Display.html?id=89032>

This is a bug report for perl from perl-diddler@​tlinx.org,
generated with the help of perlbug 1.36 running under perl 5.10.0.

-----------------------------------------------------------------
[Please enter your report here]

I was trying to quote a block of code. Thing is, to do that, you have to
choose a delimiter that's not in the code. I wanted to use a "paired"
operator like some sort of bracket -- but it seems that perl ignores "left
& right" *anything*, unless it is a one of 4 "bracket types"​: round,
angle, square & curly (according to the perlop manpage). One problem
though, *real* angle brackets U+2329 (〈) and U+2330 (〉) don't work.

Have you thought about using a here-document to quote your code​:

my $code = <<CODE_BLOCK;

# your code here
my \$var = 30;

CODE_BLOCK

And if you use single quotes it won't be interpolated​:

my $code = <<'CODE_BLOCK';

# your code here
my $var = 30;

CODE_BLOCK

John
--
Any intelligent fool can make things bigger and
more complex... It takes a touch of genius -
and a lot of courage to move in the opposite
direction. -- Albert Einstein

@p5pRT
Copy link
Author

p5pRT commented Apr 22, 2011

From @Abigail

On Wed, Apr 20, 2011 at 10​:02​:45PM -0700, Linda Walsh wrote​:

# New Ticket Created by Linda Walsh
# Please include the string​: [perl #89032]
# in the subject line of all future correspondence about this issue.
# <URL​: http​://rt.perl.org/rt3/Ticket/Display.html?id=89032 >

This is a bug report for perl from perl-diddler@​tlinx.org,
generated with the help of perlbug 1.36 running under perl 5.10.0.

-----------------------------------------------------------------
[Please enter your report here]

I was trying to quote a block of code. Thing is, to do that, you have to
choose a delimiter that's not in the code. I wanted to use a "paired"
operator like some sort of bracket -- but it seems that perl ignores "left
& right" *anything*, unless it is a one of 4 "bracket types"​: round,
angle, square & curly (according to the perlop manpage). One problem
though, *real* angle brackets U+2329 (〈) and U+2330 (〉) don't work.

If in your block of code, your '{}', '[]' or '()' are balanced, you can
use them as delimiters.

And if your code uses "real angle brackets", it fails to work.
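
As a small illustration of the point about balanced brackets, here is a sketch in which the quoted code contains its own curly brackets. The variable names are made up for the example.

```perl
use strict;
use warnings;

# Balanced curly brackets inside q{} nest, so the inner pair is kept as
# ordinary text and does not end the string early.
my $code = q{
    if ($x) { print "yes\n" }
};

# q{} does not interpolate, so $x and \n stay exactly as written.
print $code;
```

An unbalanced bracket inside the quoted text would still need a backslash, which is where the choice of delimiter starts to matter.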

Abigail

@p5pRT
Copy link
Author

p5pRT commented Apr 22, 2011

@p5pRT
Copy link
Author

p5pRT commented Apr 22, 2011

From tchrist@perl.com

The last part of the list above displayed on my system written right to
left, until I added a second LEFT-TO-RIGHT OVERRIDE character to get it
back on track. That shows some of the perils of blindly using the words
of the names to decide if the delimiter is part of a pair or not.

The list doesn't include "LEFT BAGGAGE", for some reason, where LEFT
probably should have been "UNCLAIMED". Not a lot of thought has gone
into the Unicode names, and so they can be ambiguous.

That's putting it mildly. Another area where I find things have been
left too much to chance is in the primary collation strengths, where
there is no rhyme nor reason about whether something counts as the
same letter or not.

The list probably should be a subset of those characters that have a
mirrored glyph. These are​:

Is there an easier way to pull those out of BidiMirroring.txt than
doing it by hand?

--tom

@p5pRT
Copy link
Author

p5pRT commented Apr 22, 2011

From @Abigail

On Fri, Apr 22, 2011 at 09​:16​:47AM -0600, Karl Williamson wrote​:

The list probably should be a subset of those characters that have a
mirrored glyph. These are​:
0028 0029 # '(' => ')'; LEFT PARENTHESIS => RIGHT PARENTHESIS
0029 0028 # ')' => '('; RIGHT PARENTHESIS => LEFT PARENTHESIS
003C 003E # '<' => '>'; LESS-THAN SIGN => GREATER-THAN SIGN
003E 003C # '>' => '<'; GREATER-THAN SIGN => LESS-THAN SIGN
005B 005D # '[' => ']'; LEFT SQUARE BRACKET => RIGHT SQUARE BRACKET
005D 005B # ']' => '['; RIGHT SQUARE BRACKET => LEFT SQUARE BRACKET
007B 007D # '{' => '}'; LEFT CURLY BRACKET => RIGHT CURLY BRACKET
007D 007B # '}' => '{'; RIGHT CURLY BRACKET => LEFT CURLY BRACKET

No 'd' => 'b' or 'p' => 'q' ? ;-)

And then there are '|' => '|', '!' => '!', and other symmetric glyphs -
but we already can use them as "mirrored" delimiters. They don't nest
though.
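
To make the nesting difference concrete, a short sketch (variable names are illustrative only):

```perl
use strict;
use warnings;

# With a symmetric delimiter the same character opens and closes the
# string, so an embedded copy has to be escaped.
my $escaped = q|a \| b|;

# With a paired delimiter, balanced inner brackets nest without escaping.
my $nested = q(a (b) c);

print "$escaped\n$nested\n";
```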

(I'd pick d/b and p/q over any of the non-ASCII mirrored glyphs; my
terminal/font show most of them fine (as long as I stay away from MacOS);
they're just too hard to enter)

I sometimes wish that Perl would do delimiters as POD does, so one could
write:

  say qq<<< a > b >>>;  # Prints "a > b"

Abigail

@p5pRT
Copy link
Author

p5pRT commented Apr 22, 2011

From perl-diddler@tlinx.org

karl williamson via RT wrote​:

0e005b � TAG LEFT SQUARE BRACKET
0e007b � TAG LEFT CURLY BRACKET


The last part of the list above displayed on my system written right to
left, until I added a second LEFT-TO-RIGHT OVERRIDE character to get it

  Something broken with your system?

  They don't change RtL semantics anywhere I used them.

Where are you seeing this behavior?

The list doesn't include "LEFT BAGGAGE", for some reason, where LEFT
probably should have been "UNCLAIMED". Not a lot of thought has gone
into the Unicode names, and so they can be ambiguous.


  Unless there is a "RIGHT BAGGAGE" to match it up with, I
wouldn't worry.

@p5pRT
Copy link
Author

p5pRT commented Apr 22, 2011

From @khwilliamson

On 04/22/2011 10​:10 AM, Tom Christiansen wrote​:

The list probably should be a subset of those characters that have a
mirrored glyph. These are​:

Is there an easier way to pull those out of BidiMirroring.txt than
doing it by hand?

--tom

It is somewhat easier to use lib/unicore/To/Bmg.pl.
The list I used was from a version of that file that had been compiled
with mktables -annotate.
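
For comparison, pulling the pairs straight out of the Unicode data file is not much work either. The sketch below assumes the usual BidiMirroring.txt layout of `code; mirrored-code # NAME` and parses a few sample lines embedded in the script instead of opening the real file; `parse_bidi_mirroring` is a hypothetical helper name, and this is not how Bmg.pl is actually produced.

```perl
use strict;
use warnings;

# Parse lines in the BidiMirroring.txt layout:
#   0028; 0029 # LEFT PARENTHESIS
# and return a hash ref mapping each code point to its mirror.
sub parse_bidi_mirroring {
    my ($text) = @_;
    my %mirror;
    for my $line (split /\n/, $text) {
        $line =~ s/#.*//;    # drop comments, including the character name
        next unless $line =~ /^\s*([0-9A-Fa-f]+)\s*;\s*([0-9A-Fa-f]+)/;
        $mirror{ hex $1 } = hex $2;
    }
    return \%mirror;
}

# Sample lines in the data-file layout (the real file is much longer).
my $sample = <<'END';
# comment lines and blank lines are skipped

0028; 0029 # LEFT PARENTHESIS
0029; 0028 # RIGHT PARENTHESIS
27E8; 27E9 # MATHEMATICAL LEFT ANGLE BRACKET
END

my $mirror = parse_bidi_mirroring($sample);
printf "U+%04X <-> U+%04X\n", $_, $mirror->{$_}
    for sort { $a <=> $b } keys %$mirror;
```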

@p5pRT
Copy link
Author

p5pRT commented Apr 22, 2011

From @khwilliamson

On 04/22/2011 10​:37 AM, Linda Walsh wrote​:

karl williamson via RT wrote​:

0e005b � TAG LEFT SQUARE BRACKET
0e007b � TAG LEFT CURLY BRACKET
----

The last part of the list above displayed on my system written right
to left, until I added a second LEFT-TO-RIGHT OVERRIDE character to
get it

Something broken with your system?

They don't change RtL semantics anywhere I used them.

Where are you seeing this behavior?

On the email I received from H. Merijn Brand, the RIGHT-TO-LEFT OVERRIDE
character in it caused the remainder of the email to be displayed
mirrored. That seems to me to be the correct behavior, and so I don't
think my system is broken.
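
One way a list-printing script could avoid this effect is to show the bidirectional controls as visible tags instead of emitting them raw. The following is only a sketch; `defang_bidi` is a hypothetical helper, and the set of characters it covers (U+200E, U+200F and U+202A through U+202E) is an assumption about which controls matter here.

```perl
use strict;
use warnings;

# Replace bidirectional formatting controls with a visible U+XXXX tag so
# that a printed character list cannot change the display direction.
sub defang_bidi {
    my ($str) = @_;
    $str =~ s/([\x{200E}\x{200F}\x{202A}-\x{202E}])/sprintf 'U+%04X', ord $1/ge;
    return $str;
}

# The RIGHT-TO-LEFT OVERRIDE is rendered as text, not acted upon.
print defang_bidi("00202e \x{202E} RIGHT-TO-LEFT OVERRIDE"), "\n";
```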

The list doesn't include "LEFT BAGGAGE", for some reason, where LEFT
probably should have been "UNCLAIMED". Not a lot of thought has gone
into the Unicode names, and so they can be ambiguous.
----
Unless there is a "RIGHT BAGGAGE" to match it up with, I
wouldn't worry.

@p5pRT
Copy link
Author

p5pRT commented Apr 22, 2011

From perl-diddler@tlinx.org

karl williamson via RT wrote​:

On 04/22/2011 10​:37 AM, Linda Walsh wrote​:

karl williamson via RT wrote​:

0e005b TAG LEFT SQUARE BRACKET
0e007b TAG LEFT CURLY BRACKET
----

The last part of the list above displayed on my system written right
to left, until I added a second LEFT-TO-RIGHT OVERRIDE character to
get it

On the email I received from H. Merijn Brand, the RIGHT-TO-LEFT OVERRIDE
character in it caused the remainder of the email to be displayed
mirrored. That seems to me to be the correct behavior, and so I don't
think my system is broken.


  What email program do you use?

  FF doesn't display that behavior ...

  Oh, you mean after U+200E/U+200F

I thought you meant it was inherent in the characters in the 2nd part
of the list.

  I'd rule out those characters because they contain
both the 'RIGHT+LEFT' keywords in the description. So a 'dumb'
algorithm looking at it wouldn't know if it was meant to be a right
or a left side of a pair.

  A lot of these objections are trivial details that would be
worked out in coding it up.

  What I gave was a general concept -- not a tested algorithm.
Be reasonable. If you need me to design the whole algorithm, it's
not something I'm going to do off the top of my head.

@p5pRT
Author

p5pRT commented Apr 27, 2011

From @obra

On Wed 20.Apr'11 at 22​:02​:45 -0700, Linda Walsh wrote​:

I was trying to quote a block of code. Thing is, to do that, you have to
choose a delimiter that's not in the code. I wanted to use a "paired"
operator like some sort of bracket -- but it seems that perl ignores "left
& right" *anything*, unless it is a one of 4 "bracket types"​: round,
angle, square & curly (according to the perlop manpage). One problem
though, *real* angle brackets U+2329 (〈) and U+2330 (〉) don't work.

I'd be curious to know if the Perl 6 community can offer us any useful
advice here. I believe that @​Larry went for such a solution. How happy
are they having done it?

-Jesse

@p5pRT
Author

p5pRT commented Apr 27, 2011

From vadim.konovalov@alcatel-lucent.com

From​: Jesse Vincent
On Wed 20.Apr'11 at 22​:02​:45 -0700, Linda Walsh wrote​:

I was trying to quote a block of code. Thing is, to do
that, you have to
choose a delimiter that's not in the code. I wanted to use
a "paired"
operator like some sort of bracket -- but it seems that
perl ignores "left
& right" *anything*, unless it is a one of 4 "bracket types"​: round,
angle, square & curly (according to the perlop manpage).
One problem
though, *real* angle brackets U+2329 (〈) and U+2330 (〉) don't work.

I'd be curious to know if the Perl 6 community can offer us any useful
advice here. I believe that @​Larry went for such a solution.
How happy
are they having done it?

STD.pm and STD.pm6
(http://cpansearch.perl.org/src/SOREAR/STD-20101111/lib/STD.pm
and https://github.com/perl6/std/blob/master/STD.pm6)

have
our %open2close = (
"\x{0028}" => "\x{0029}",
"\x{003C}" => "\x{003E}",
"\x{005B}" => "\x{005D}",
"\x{007B}" => "\x{007D}",
"\x{00AB}" => "\x{00BB}",
"\x{0F3A}" => "\x{0F3B}",
"\x{0F3C}" => "\x{0F3D}",
"\x{169B}" => "\x{169C}",
"\x{2018}" => "\x{2019}",
"\x{201A}" => "\x{2019}",
"\x{201B}" => "\x{2019}",
"\x{201C}" => "\x{201D}",
"\x{201E}" => "\x{201D}",
"\x{201F}" => "\x{201D}",
"\x{2039}" => "\x{203A}",
"\x{2045}" => "\x{2046}",
"\x{207D}" => "\x{207E}",
"\x{208D}" => "\x{208E}",
"\x{2208}" => "\x{220B}",
"\x{2209}" => "\x{220C}",
"\x{220A}" => "\x{220D}",
"\x{2215}" => "\x{29F5}",
"\x{223C}" => "\x{223D}",
"\x{2243}" => "\x{22CD}",
"\x{2252}" => "\x{2253}",
"\x{2254}" => "\x{2255}",
"\x{2264}" => "\x{2265}",
"\x{2266}" => "\x{2267}",
"\x{2268}" => "\x{2269}",
"\x{226A}" => "\x{226B}",
"\x{226E}" => "\x{226F}",
"\x{2270}" => "\x{2271}",
"\x{2272}" => "\x{2273}",
"\x{2274}" => "\x{2275}",
"\x{2276}" => "\x{2277}",
"\x{2278}" => "\x{2279}",
"\x{227A}" => "\x{227B}",
"\x{227C}" => "\x{227D}",
"\x{227E}" => "\x{227F}",
"\x{2280}" => "\x{2281}",
"\x{2282}" => "\x{2283}",
"\x{2284}" => "\x{2285}",
"\x{2286}" => "\x{2287}",
"\x{2288}" => "\x{2289}",
"\x{228A}" => "\x{228B}",
"\x{228F}" => "\x{2290}",
"\x{2291}" => "\x{2292}",
"\x{2298}" => "\x{29B8}",
"\x{22A2}" => "\x{22A3}",
"\x{22A6}" => "\x{2ADE}",
"\x{22A8}" => "\x{2AE4}",
"\x{22A9}" => "\x{2AE3}",
"\x{22AB}" => "\x{2AE5}",
"\x{22B0}" => "\x{22B1}",
"\x{22B2}" => "\x{22B3}",
"\x{22B4}" => "\x{22B5}",
"\x{22B6}" => "\x{22B7}",
"\x{22C9}" => "\x{22CA}",
"\x{22CB}" => "\x{22CC}",
"\x{22D0}" => "\x{22D1}",
"\x{22D6}" => "\x{22D7}",
"\x{22D8}" => "\x{22D9}",
"\x{22DA}" => "\x{22DB}",
"\x{22DC}" => "\x{22DD}",
"\x{22DE}" => "\x{22DF}",
"\x{22E0}" => "\x{22E1}",
"\x{22E2}" => "\x{22E3}",
"\x{22E4}" => "\x{22E5}",
"\x{22E6}" => "\x{22E7}",
"\x{22E8}" => "\x{22E9}",
"\x{22EA}" => "\x{22EB}",
"\x{22EC}" => "\x{22ED}",
"\x{22F0}" => "\x{22F1}",
"\x{22F2}" => "\x{22FA}",
"\x{22F3}" => "\x{22FB}",
"\x{22F4}" => "\x{22FC}",
"\x{22F6}" => "\x{22FD}",
"\x{22F7}" => "\x{22FE}",
"\x{2308}" => "\x{2309}",
"\x{230A}" => "\x{230B}",
"\x{2329}" => "\x{232A}",
"\x{23B4}" => "\x{23B5}",
"\x{2768}" => "\x{2769}",
"\x{276A}" => "\x{276B}",
"\x{276C}" => "\x{276D}",
"\x{276E}" => "\x{276F}",
"\x{2770}" => "\x{2771}",
"\x{2772}" => "\x{2773}",
"\x{2774}" => "\x{2775}",
"\x{27C3}" => "\x{27C4}",
"\x{27C5}" => "\x{27C6}",
"\x{27D5}" => "\x{27D6}",
"\x{27DD}" => "\x{27DE}",
"\x{27E2}" => "\x{27E3}",
"\x{27E4}" => "\x{27E5}",
"\x{27E6}" => "\x{27E7}",
"\x{27E8}" => "\x{27E9}",
"\x{27EA}" => "\x{27EB}",
"\x{2983}" => "\x{2984}",
"\x{2985}" => "\x{2986}",
"\x{2987}" => "\x{2988}",
"\x{2989}" => "\x{298A}",
"\x{298B}" => "\x{298C}",
"\x{298D}" => "\x{298E}",
"\x{298F}" => "\x{2990}",
"\x{2991}" => "\x{2992}",
"\x{2993}" => "\x{2994}",
"\x{2995}" => "\x{2996}",
"\x{2997}" => "\x{2998}",
"\x{29C0}" => "\x{29C1}",
"\x{29C4}" => "\x{29C5}",
"\x{29CF}" => "\x{29D0}",
"\x{29D1}" => "\x{29D2}",
"\x{29D4}" => "\x{29D5}",
"\x{29D8}" => "\x{29D9}",
"\x{29DA}" => "\x{29DB}",
"\x{29F8}" => "\x{29F9}",
"\x{29FC}" => "\x{29FD}",
"\x{2A2B}" => "\x{2A2C}",
"\x{2A2D}" => "\x{2A2E}",
"\x{2A34}" => "\x{2A35}",
"\x{2A3C}" => "\x{2A3D}",
"\x{2A64}" => "\x{2A65}",
"\x{2A79}" => "\x{2A7A}",
"\x{2A7D}" => "\x{2A7E}",
"\x{2A7F}" => "\x{2A80}",
"\x{2A81}" => "\x{2A82}",
"\x{2A83}" => "\x{2A84}",
"\x{2A8B}" => "\x{2A8C}",
"\x{2A91}" => "\x{2A92}",
"\x{2A93}" => "\x{2A94}",
"\x{2A95}" => "\x{2A96}",
"\x{2A97}" => "\x{2A98}",
"\x{2A99}" => "\x{2A9A}",
"\x{2A9B}" => "\x{2A9C}",
"\x{2AA1}" => "\x{2AA2}",
"\x{2AA6}" => "\x{2AA7}",
"\x{2AA8}" => "\x{2AA9}",
"\x{2AAA}" => "\x{2AAB}",
"\x{2AAC}" => "\x{2AAD}",
"\x{2AAF}" => "\x{2AB0}",
"\x{2AB3}" => "\x{2AB4}",
"\x{2ABB}" => "\x{2ABC}",
"\x{2ABD}" => "\x{2ABE}",
"\x{2ABF}" => "\x{2AC0}",
"\x{2AC1}" => "\x{2AC2}",
"\x{2AC3}" => "\x{2AC4}",
"\x{2AC5}" => "\x{2AC6}",
"\x{2ACD}" => "\x{2ACE}",
"\x{2ACF}" => "\x{2AD0}",
"\x{2AD1}" => "\x{2AD2}",
"\x{2AD3}" => "\x{2AD4}",
"\x{2AD5}" => "\x{2AD6}",
"\x{2AEC}" => "\x{2AED}",
"\x{2AF7}" => "\x{2AF8}",
"\x{2AF9}" => "\x{2AFA}",
"\x{2E02}" => "\x{2E03}",
"\x{2E04}" => "\x{2E05}",
"\x{2E09}" => "\x{2E0A}",
"\x{2E0C}" => "\x{2E0D}",
"\x{2E1C}" => "\x{2E1D}",
"\x{2E20}" => "\x{2E21}",
"\x{3008}" => "\x{3009}",
"\x{300A}" => "\x{300B}",
"\x{300C}" => "\x{300D}",
"\x{300E}" => "\x{300F}",
"\x{3010}" => "\x{3011}",
"\x{3014}" => "\x{3015}",
"\x{3016}" => "\x{3017}",
"\x{3018}" => "\x{3019}",
"\x{301A}" => "\x{301B}",
"\x{301D}" => "\x{301E}",
"\x{FD3E}" => "\x{FD3F}",
"\x{FE17}" => "\x{FE18}",
"\x{FE35}" => "\x{FE36}",
"\x{FE37}" => "\x{FE38}",
"\x{FE39}" => "\x{FE3A}",
"\x{FE3B}" => "\x{FE3C}",
"\x{FE3D}" => "\x{FE3E}",
"\x{FE3F}" => "\x{FE40}",
"\x{FE41}" => "\x{FE42}",
"\x{FE43}" => "\x{FE44}",
"\x{FE47}" => "\x{FE48}",
"\x{FE59}" => "\x{FE5A}",
"\x{FE5B}" => "\x{FE5C}",
"\x{FE5D}" => "\x{FE5E}",
"\x{FF08}" => "\x{FF09}",
"\x{FF1C}" => "\x{FF1E}",
"\x{FF3B}" => "\x{FF3D}",
"\x{FF5B}" => "\x{FF5D}",
"\x{FF5F}" => "\x{FF60}",
"\x{FF62}" => "\x{FF63}",
);

This list is useful, but I think an even more complete list has
been established in this thread.

I see "\x{2329}" => "\x{232A}", rather than \x{2330}, though.

Regards,
Vadim.

@p5pRT
Author

p5pRT commented Aug 7, 2011

From @cpansprout

On Aug 2, 2011, at 10​:15 PM, Brian Fraser wrote​:

Moving on, PL_multi_(close|open) are chars, not char*, so any attempt to implement RT#89032 would have to turn those into something more sensible (an SV, perhaps? Though a simple three-element struct would do just fine). Should I change it?

Are we sure it’s even a good idea to allow Unicode paired delimiters? I know we already allow for Unicode identifiers, but it has proven to be problematic, simply because Unicode is a moving target. Every Unicode upgrade changes Perl syntax just slightly. If we allow Unicode paired brackets, that will just aggravate the problem.

Also, it would not be backward-compatible, as these currently work​:

$ perl -Mutf8 -le 'print q «foo«'
foo

perlop states that it is only the four ASCII brackets that are treated specially. That implies that my example works. Since it’s documented, we can’t easily change it without a deprecation cycle, can we?

@p5pRT
Author

p5pRT commented Aug 7, 2011

From perl-diddler@tlinx.org

Father Chrysostomos via RT wrote​:

On Aug 2, 2011, at 10​:15 PM, Brian Fraser wrote​:

Moving on, PL_multi_(close|open) are chars, not char*, so any attempt to implement RT#89032 would have to turn those into something more sensible (an SV, perhaps? Though a simple three-element struct would do just fine). Should I change it?

Are we sure it’s even a good idea to allow Unicode paired delimiters? I know we already allow for Unicode identifiers, but it has proven to be problematic, simply because Unicode is a moving target. Every Unicode upgrade changes Perl syntax just slightly. If we allow Unicode paired brackets, that will just aggravate the problem.

Also, it would not be backward-compatible, as these currently work​:

$ perl -Mutf8 -le 'print q «foo«'
foo

perlop states that it is only the four ASCII brackets that are treated specially. That implies that my example works. Since it’s documented, we can’t easily change it without a deprecation cycle, can we?


use unicode_brackets;

@p5pRT
Author

p5pRT commented Aug 8, 2011

From @Hugmeir

On 8/7/11, Father Chrysostomos <sprout@​cpan.org> wrote​:

Are we sure it’s even a good idea to allow Unicode paired delimiters? I know
we already allow for Unicode identifiers, but it has proven to be
problematic, simply because Unicode is a moving target. Every Unicode
upgrade changes Perl syntax just slightly. If we allow Unicode paired
brackets, that will just aggravate the problem.

Also, it would not be backward-compatible, as these currently work​:

$ perl -Mutf8 -le 'print q «foo«'
foo

perlop states that it is only the four ASCII brackets that are treated
specially. That implies that my example works. Since it’s documented, we
can’t easily change it without a deprecation cycle, can we?

I think this is a valid concern, but I don't think the decision should
be dictated because of the implementation. If adding paired UTF-8
delimiters isn't a good course of action, then don't add them, but the
tokenizer's (in)ability to handle them should be beside the point.

@p5pRT
Author

p5pRT commented Aug 12, 2011

From @nwc10

On Sun, Aug 07, 2011 at 03​:05​:38PM -0700, Linda Walsh wrote​:

Father Chrysostomos via RT wrote​:

On Aug 2, 2011, at 10​:15 PM, Brian Fraser wrote​:

Moving on, PL_multi_(close|open) are chars, not char*, so any attempt to implement RT#89032 would have to turn those into something more sensible (an SV, perhaps? Though a simple three-element struct would do just fine). Should I change it?

Are we sure it's even a good idea to allow Unicode paired delimiters? I know we already allow for Unicode identifiers, but it has proven to be problematic, simply because Unicode is a moving target. Every Unicode upgrade changes Perl syntax just slightly. If we allow Unicode paired brackets, that will just aggravate the problem.

perlop states that it is only the four ASCII brackets that are treated specially. That implies that my example works. Since it's documented, we can't easily change it without a deprecation cycle, can we?
----

use unicode_brackets;

Fails to address the valid concern that Unicode is a moving target - what's
not a paired delimiter this version might become one next version.
And suddenly your program changes meaning underneath you.

Nicholas Clark

@p5pRT
Author

p5pRT commented Aug 12, 2011

From @Hugmeir

On Fri, Aug 12, 2011 at 6​:46 AM, Nicholas Clark <nick@​ccl4.org> wrote​:

use unicode_brackets;

Fails to address the valid concern that Unicode is a moving target - what's
not a paired delimiter this version might become one next version.
And suddenly your program changes meaning underneath you.

And in any case, I think that, if you want to change the syntax, you should
be doing it explicitly, ala​:

use charnames qw( :full );
use paired_delimiters "\N{LEFT-POINTING DOUBLE ANGLE QUOTATION MARK}" =>
"\N{RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK}";

or

use unicode_brackets Unicode => v6;

or somesuch. Though all of this would need a new API -- Might be less wrong
to just wait for (someone|Zefram) to add pluggable operators.

@p5pRT
Author

p5pRT commented Aug 12, 2011

From zefram@fysh.org

Brian Fraser wrote​:

use paired_delimiters "\N{LEFT-POINTING DOUBLE ANGLE QUOTATION MARK}" =>
"\N{RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK}";

Ah, that's reasonably nice.

or somesuch. Though all of this would need a new API -- Might be less wrong
to just wait for (someone|Zefram) to add pluggable operators.

I've been thinking about how to handle delimiters in plugged-in syntax.
I think it's a no-brainer that syntax plugins ought to have some help
in handling Perl's standard forms of delimitation. Syntax plugins
shouldn't have to handle delimiter pairing themselves, they should be
able to ask the core to process that part of the syntax. I imagine
an API function that will skip whitespace, read the next character
(the opening delimiter), and then return the codepoint that will be
the closing delimiter. (There's already a function, commonly used via
Devel​::Declare, that scans an entire delimited string, but as we've
learned with "@​{[]}" this pre-scanning doesn't get along nicely with
nesting syntactic constructs.)

So syntax plugins aren't a solution here, they're another source of
demand for a solution. I think the core should have some support for
delimiter pairing, but that could take the form of another hook that
lets modules plug in arbitrarily-complicated delimiter behaviour.

-zefram
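
To make the imagined API function concrete, here is a rough Perl-level sketch (the real thing would live in the core's C tokenizer). The name closing_delimiter, its arguments, and the optional $extra table are all hypothetical; only the four ASCII pairs come from perlop. It needs perl 5.10 or later for the // operator.

```perl
use strict;
use warnings;

# The four ASCII bracket pairs that perlop documents as paired delimiters.
my %ascii_pair = ( '(' => ')', '[' => ']', '{' => '}', '<' => '>' );

# Hypothetical helper: starting at offset $pos in $src, skip whitespace,
# read the opening delimiter, and return the code point of the closing
# delimiter plus the offset just past the opener. $extra is an optional
# hashref of additional pairs supplied by the caller (e.g. a plugin).
sub closing_delimiter {
    my ($src, $pos, $extra) = @_;
    pos($src) = $pos;
    $src =~ /\G\s*/gc;                       # skip leading whitespace
    return if pos($src) >= length($src);     # nothing left to read
    my $open  = substr $src, pos($src), 1;
    my $close = $ascii_pair{$open}
             // ($extra && $extra->{$open})
             // $open;                       # unpaired: closes with itself
    return (ord($close), pos($src) + 1);
}

my ($cp, $next) = closing_delimiter('q  {foo}', 1);
printf "closing delimiter U+%04X, body starts at offset %d\n", $cp, $next;
```

A delimiter with no entry in either table closes with itself, which matches the current behaviour for everything outside the four ASCII pairs.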

@p5pRT
Author

p5pRT commented Aug 12, 2011

From perl-diddler@tlinx.org

Nicholas Clark via RT wrote​:

use unicode_brackets;

Fails to address the valid concern that Unicode is a moving target - what's
not a paired delimiter this version might become one next version.
And suddenly your program changes meaning underneath you.


  That's not exactly true. It shouldn't happen.

  Characters can't be changed once they are created. They can be
'obviated', but never deleted. They just move on to a replacement char
in a new code area with the new meaning.

  That rule of 'not deleting anything that has been published' was
required to move forward, for concerns exactly like this.

  New delimiters may come, but they'll come out of what are now invalid
ranges or unused planes.

@p5pRT
Author

p5pRT commented Aug 13, 2011

From @nwc10

On Fri, Aug 12, 2011 at 07​:50​:18AM -0700, Linda Walsh wrote​:

Nicholas Clark via RT wrote​:

use unicode_brackets;

Fails to address the valid concern that Unicode is a moving target - what's
not a paired delimiter this version might become one next version.
And suddenly your program changes meaning underneath you.
---
That's not exactly true. It shouldn't happen.

Characters can't be changed once they are created. They can be
'obviated', but never deleted. They just move on to a replacement char
in a new code area with the new meaning.

Character properties can change. U-00B5 was Greek once. It isn't now.
My impression (as an outsider) is that such changes are rare. But they
have happened, so clearly they are not impossible.

So the concern is that if we drive parsing using Unicode properties, then
there will be corner cases where the parsing changes based on which version
of Unicode the parser is based on. And *that* is going to be surprising to
anyone caught by it.

Nicholas Clark

@p5pRT
Author

p5pRT commented Aug 13, 2011

From perl-diddler@tlinx.org

Nicholas Clark via RT wrote:

Character properties can change. U-00B5 was Greek once.


  Could you find a better example? As it still is.
It's been the same since the earliest online reference for 2.0 dating back
to 1994. I don't think Unicode has changed it since its inception in 91.

  Unless you can come up with a more firm example, I'm only willing
to consider this (sorry to use the term) "FUD", as it goes
against the Unicode mission statement. I quote from the book, Fonts
and Encodings (O'Reilly), p61:

  Unwritten principle #11​: permanent stability.

We have taken the liberty of adding an eleventh principle to the official
Unicode principles, one that is important and laden with consequences​:
  [next two lines italicized in text for emphasis].
  *** as soon as a character has been added to the encoding, that
  *** character cannot be removed or altered.
The idea is that documents encoded in Unicode today should not become
unusable a few years hence, as is often the case with word-processing
software documents (such as those produced with MS Word, not to name any
names). Unlike the ten official principles, this one is so scrupulously
respected that Unicode has come to contain a large number of characters
whose use is deprecated by Unicode itself. Even more shocking is that
the name of the character 0x1D0C5 contains an obvious typo (FHTORA
instead of FTHORA); [still true in unicode 6.0, BTW], rather than
correcting it the Consortium has decided to let it stand and to insert a
little note" [... acknowledging it ].


  I'm probably as much an outsider as you (my largest claim to
knowledge about fonts is the book I just mentioned - great reference on
the topic, though a bit dated).

  I don't see any evidence to support the type of changes you express concern
about. Obviously, the universe could end tomorrow, and permanence would be
severely truncated, but they appear to be more stable than perl -- and
perl is pretty stable (compared to BASH, where the maintainer cares nothing
for previous version compat in multiple areas... it's becoming a nightmare --
I keep holding up perl as an example to follow, but my protests fall on deaf
ears...)...

It isn't now.
My impression (as an outsider) is that such changes are rare. But they
have happened, so clearly they are not impossible.


  They may have happened, but the cited example is not one of those cases.

So the concern is that if we drive parsing using Unicode properties, then
there will be corner cases where the parsing changes based on which version
of Unicode the parser is based on. And *that* is going to be surprising to
anyone caught by it.


  Hey -- we can always blame it on them!.. ;-)

@p5pRT
Author

p5pRT commented Aug 13, 2011

From @nwc10

On Sat, Aug 13, 2011 at 02​:02​:32AM -0700, Linda Walsh wrote​:

Nicholas Clark via RT wrote:

Character properties can change. U-00B5 was Greek once.
---
Could you find a better example? As it is still is.
Its been the same since the earliest online reference for 2.0 dating back
to 1994. I don't think unicode has changed it since its inception in 91.

$ ~/Sandpit/583/bin/perl -le '$_ = chr 0xB5; utf8::upgrade $_; print /\p{isGreek}/ ? "Greek!" : "not :-("'
Greek!
$ ~/Sandpit/584/bin/perl -le '$_ = chr 0xB5; utf8::upgrade $_; print /\p{isGreek}/ ? "Greek!" : "not :-("'
not :-(

Unless you can come up with a more firm example, I'm only willing
to consider this (sorry to use the term) "FUD", as it goes
against the Unicode mission statement. I quote from the book, Fonts
and Encodings (O'Reilly), p61:

Unwritten principle #11: permanent stability.

Not FUD. See above. I don't know *why* Perl's implementation changed, but it
did.

Nicholas Clark

@p5pRT
Author

p5pRT commented Jul 14, 2012

From perl-diddler@tlinx.org

chromatic via RT wrote​:

On Thursday, July 12, 2012 08​:13​:46 PM Linda W wrote​:

  Note, however, that this does not always work for quoting Perl code:
        $s = q{ if($a eq "}") ... }; # WRONG
    is a syntax error.

In fact, it's impossible without resorting to special modules, like
"Text​::Balanced" to quote perl code

$s = q{ if($a eq "\}") ... };

... seems to work for me as far back as I tested (5.8.9).

-- c

===
You have a backslash in front of your }.... the manpage version does not.

@p5pRT
Author

p5pRT commented Jul 14, 2012

From perl-diddler@tlinx.org

Dave Mitchell wrote​:

On Thu, Jul 12, 2012 at 08​:13​:46PM -0700, Linda W wrote​:

since as this perl manpage points out​:

  Note, however, that this does not always work for quoting Perl code:
        $s = q{ if($a eq "}") ... }; # WRONG
    is a syntax error.

In fact, it's impossible without resorting to special modules, like
"Text​::Balanced"

And would still be impossible if those new bracketing chars were allowed,
since the new chars would then be legal chars in perl source. So any
robust program that embeds perl code in strings would still need to check,
escape etc.


  No... You didn't read what I wrote...

  The beauty of them is they are not perl operators, so they could safely
  be used to quote perl code and I doubt they are in use for any purpose
  now except in strings, and then only if unicode is enabled.

Absolutely not. Maint branches are for important bug fixes​: the sort of
thing we would have added post-RC0 if only we'd known back then.


  They are for bug fixes... not just 'important post-RC0' ones, unless
things have drastically changed.

  Since « » are in the non-unicode region, it could be argued that it
was an oversight not to include them, and that it could already be
considered a bug. Why weren't they allowed in the first place? They are
in the lower 256, where single bytes are used....

 

@p5pRT
Author

p5pRT commented Jul 14, 2012

From chromatic@wgz.org

On Friday, July 13, 2012 06​:35​:00 PM Linda W wrote​:

You have a backslash in front of your }.... the manpage version does not.

That's why my version works. Later, the same document (perldoc perlop) says​:

  A backslash represents a backslash unless followed by the delimiter or
  another backslash, in which case the delimiter or backslash is
  interpolated.

-- c
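
A small illustration of the perlop rule quoted above, with variable names chosen only for the example: balanced braces nest inside q{} without help, while an unbalanced closing brace needs a backslash, and that backslash is dropped when the string is built.

```perl
use strict;
use warnings;

# Unescaped braces are fine inside q{} as long as they balance.
my $nested = q{ if ($x) { ... } };

# An unbalanced closing brace inside the text must be backslashed;
# the backslash is removed when the string is built.
my $escaped = q{ if($a eq "\}") ... };

print "[$nested]\n";     # [ if ($x) { ... } ]
print "[$escaped]\n";    # [ if($a eq "}") ... ]
```

So the stored string ends up containing the plain "}" that the quoted code wanted, at the cost of having to spot and escape it by hand.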

@p5pRT
Author

p5pRT commented Jul 14, 2012

From @ikegami

On Fri, Jul 13, 2012 at 9​:43 PM, Linda W <perl-diddler@​tlinx.org> wrote​:

No... You didn't read what I wrote...

The beauty of them is they are not perl operators, so they could safely
be used to quote perl code and I doubt they are in use for any purpose
now except in strings, and then only if unicode is enabled.

Except that's totally false. The proposed quotes can most definitely
appear in code (e.g. in string literals and in comments).

@p5pRT
Author

p5pRT commented Jul 15, 2012

From perl-diddler@tlinx.org

Eric Brine wrote​:

On Fri, Jul 13, 2012 at 9​:43 PM, Linda W <perl-diddler@​tlinx.org
<mailto​:perl-diddler@​tlinx.org>> wrote​:

    No... You didn't read what I wrote...

    The beauty of them is they are not perl operators, so they could
    safely be used to quote perl code and I doubt they are in use for
    any purpose now except in strings, and then only if unicode is enabled.

Except that's totally false. The proposed quotes can most definitely
appear in code (e.g. in string literals and in comments).


  Except what you wrote starts out with a false statement. "Totally
false?"
  1) they are not perl operators. True or False?
  2) not used for any purpose except in strings. True or false?
  um... that's 0/2 you got right.

  I think I mentioned they could occur in strings, aren't those string
literals?... or am I missing something?... and yes... comments.. let's see,
I got 75% out of 100, you got 75% wrong, and thought of another use. Congrats!

  What is with your attitude about my posts that causes you to respond
without engaging brain?
The way you're acting, one might almost think you're smitten or something...

  Let me stress this point...and think before claiming it is false.

  Unlike the current paired operators: <>{}()[], «» would not conflict
with any perl operators, and could be used to quote perl code with a little
bit of judgment.

  They'd even work for qr« » and you'd not have to worry about perl
operators being deactivated. Something you can't do with any of the others.

  I did use single quotes for my usage, as someone else suggested as
well -- but neither they nor I initially noted that the comments in that
example become part of the pattern. Perhaps comments are not the first
thing one thinks about as being 'code'...

 

@p5pRT
Author

p5pRT commented Jul 15, 2012

From @arc

Linda W <perl-diddler@​tlinx.org> wrote​:

Unlike the current paired operators: <>{}()[], «» would not
conflict with any perl operators, and could be used to quote
perl code with a little bit of judgment.

Consider this code​:

my $s = q«a«;#»»;

In every current release of Perl 5, that sets $s to "a". Under your
proposal to make «» bracketing characters for pick-your-own-delimiter
quote-likes, it would instead set it to "a«;#»".

So your proposal would change the meaning of valid programs. That's
not an a priori reason not to make the change, but the reasoning for
any decision to do so must be informed by that fact, and by our policy
on when and how to make incompatible changes to the language. In
particular, making such a change to an existing maint series, as you
suggested earlier, would be a clear breach of our maintenance policy.

--
Aaron Crane ** http​://aaroncrane.co.uk/
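
The difference between the two readings can be reproduced with a toy scanner. This is only a sketch, assuming a much simplified model of the tokenizer (no backslash handling); scan_quoted and both pair tables are illustrative names, not anything in the perl source.

```perl
use strict;
use warnings;

# Toy scanner for the body of a q-style literal. $src starts at the opening
# delimiter; $pairs maps opening delimiters to closing ones. A delimiter
# with no entry in $pairs closes with itself and does not nest.
# Backslash escapes are deliberately ignored to keep the sketch short.
sub scan_quoted {
    my ($src, $pairs) = @_;
    my $open  = substr $src, 0, 1;
    my $close = $pairs->{$open} // $open;
    my $depth = 1;
    my $body  = '';
    for my $i (1 .. length($src) - 1) {
        my $ch = substr $src, $i, 1;
        if ($ch eq $close) {    # tested first, so a same-char delimiter ends here
            return $body if --$depth == 0;
        }
        elsif ($ch eq $open) {
            $depth++;
        }
        $body .= $ch;
    }
    return;                     # unterminated literal
}

# The text that follows "q" in the example, written with escapes.
my $src = "\x{AB}a\x{AB};#\x{BB}\x{BB};";

my %ascii_only      = ( '(' => ')', '[' => ']', '{' => '}', '<' => '>' );
my %with_guillemets = ( %ascii_only, "\x{AB}" => "\x{BB}" );

my $current  = scan_quoted($src, \%ascii_only);
my $proposed = scan_quoted($src, \%with_guillemets);

binmode STDOUT, ':encoding(UTF-8)';
print "current rules:  $current\n";
print "proposed rules: $proposed\n";
```

Under the ASCII-only table the second « ends the string, leaving "a"; with «» registered as a pair the inner « nests and the body becomes "a«;#»".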

@p5pRT
Author

p5pRT commented Jul 16, 2012

From perl-diddler@tlinx.org

Aaron Crane via RT wrote​:

Linda W <perl-diddler@​tlinx.org> wrote​:

Unlike the current paired operators: <>{}()[], «» would not
conflict with any perl operators, and could be used to quote
perl code with a little bit of judgment.

Consider this code​:

my $s = q«a«;#»»;

In every current release of Perl 5, that sets $s to "a". Under your
proposal to make «» bracketing characters for pick-your-own-delimiter
quote-likes, it would instead set it to "a«;#»".

---

So did changes in 5.16, 5.14, 5.12...

It wouldn't be the first time...but no notice would be no fun...

so "use curquotes"
and "use uniquotes"
in the next releases, for people to enable simple or full uni-code
quoting...
I thought the simple case might be easier.

Then set a deprecation schedule for curquotes (or not) -- as those are
in the original LATIN1 range (which should be unicode, BTW, as much as it
might hurt, cuz, '0x80-0xff' are all encoded in unicode as 2 bytes! --
claiming they are equivalent to latin1 is wrong).

I think the number of people affected by something like the above would be
near 0. Maybe not 0, but pretty darn close.
You have to really *try* to use those chars, it's not like they are on
most keyboards...

@p5pRT
Author

p5pRT commented Jul 16, 2012

From @doy

On Sun, Jul 15, 2012 at 07​:41​:43PM -0700, Linda W wrote​:

Aaron Crane via RT wrote​:

Linda W <perl-diddler@​tlinx.org> wrote​:

Unlike the current paired operators: <>{}()[], «» would not
conflict with any perl operators, and could be used to quote
perl code with a little bit of judgment.

Consider this code​:

my $s = q«a«;#»»;

In every current release of Perl 5, that sets $s to "a". Under your
proposal to make «» bracketing characters for pick-your-own-delimiter
quote-likes, it would instead set it to "a«;#»".

---
So did changes in 5.16, 5.14, 5.12...

If you read the paragraph directly following the one you just quoted,
you'll see that Aaron was only giving that as a reason why this change
would not go into 5.16.1, not why it won't go in at all.

-doy

@p5pRT
Author

p5pRT commented Jul 16, 2012

From perl-diddler@tlinx.org

Jesse Luehrs via RT wrote​:

On Sun, Jul 15, 2012 at 07​:41​:43PM -0700, Linda W wrote​:

Aaron Crane via RT wrote​:

Linda W <perl-diddler@​tlinx.org> wrote​:

Unlike the current paired operators: <>{}()[], «» would not
conflict with any perl operators, and could be used to quote
perl code with a little bit of judgment.

Consider this code​:

my $s = q«a«;#»»;

In every current release of Perl 5, that sets $s to "a". Under your
proposal to make «» bracketing characters for pick-your-own-delimiter
quote-likes, it would instead set it to "a«;#»".

---

So did changes in 5.16, 5.14, 5.12...

If you read the paragraph directly following the one you just quoted,
you'll see that Aaron was only giving that as a reason why this change
would not go into 5.16.1, not why it won't go in at all.

You mean that addressed by the part you elided?​:

It wouldn't be the first time... but no notice would be no fun...

so "use curquotes"
and "use uniquotes"
in the next releases, for people to enable simple or full uni-code
quoting...
I thought the simple case might be easier.

Then set a deprecation schedule for curquotes (or not) -- as those are
in the original LATIN1 range (which should be unicode, BTW, as much as it
might hurt, cuz, '0x80-0xff' are all encoded in unicode as 2 bytes! --
claiming they are equivalent to latin1 is wrong).

Adding something with a use feature latinquotes or curquotes, in a minor
release, if done correctly, would have no impact on current code.

@p5pRT
Author

p5pRT commented Jul 16, 2012

From @doy

On Mon, Jul 16, 2012 at 02​:20​:41PM -0700, Linda W wrote​:

Jesse Luehrs via RT wrote​:

On Sun, Jul 15, 2012 at 07​:41​:43PM -0700, Linda W wrote​:

Aaron Crane via RT wrote​:

In every current release of Perl 5, that sets $s to "a". Under your
proposal to make «» bracketing characters for pick-your-own-delimiter
quote-likes, it would instead set it to "a«;#»".

---
So did changes in 5.16, 5.14, 5.12...

If you read the paragraph directly following the one you just quoted,
you'll see that Aaron was only giving that as a reason why this change
would not go into 5.16.1, not why it won't go in at all.
You mean that addressed by the part you elided?​:

It wouldn't be the first time... but no notice would be no fun...

so "use curquotes"
and "use uniquotes"
in the next releases, for people to enable simple or full uni-code
quoting...
I thought the simple case might be easier.

Then set a deprecation schedule for curquotes (or not) -- as those are
in the original LATIN1 range (which should be unicode, BTW, as much as it
might hurt, cuz, '0x80-0xff' are all encoded in unicode as 2 bytes! --
claiming they are equivalent to latin1 is wrong).

Adding something with a use feature latinquotes or curquotes, in a
minor release, if done correctly, would have no impact on current code.

Regardless of how little impact it should have, we don't add new
features in point releases (anymore). See perldoc perlpolicy.

-doy

@p5pRT
Author

p5pRT commented Jul 16, 2012

From perl-diddler@tlinx.org

Jesse Luehrs via RT wrote​:

correctly, would have no impact on current code.

Regardless of how little impact it should have, we don't add new
features in point releases (anymore). See perldoc perlpolicy.

====
  I'm glad such policies are in place, so I'm not going to disagree.

@p5pRT
Author

p5pRT commented Jul 16, 2012

From @ikegami

On Mon, Jul 16, 2012 at 5​:20 PM, Linda W <perl-diddler@​tlinx.org> wrote​:

cuz, '0x80-0xff' are all encoded in unicode as 2 bytes!

You mean UTF-8 or UTF-16 or something similar, not Unicode. Unicode is not
an encoding.
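
To make the distinction concrete, here is a minimal sketch using the core Encode module. The choice of U+00E9 and of these four encodings is purely illustrative; the point is only that one code point has different byte lengths depending on which encoding is applied to it.

```perl
use strict;
use warnings;
use Encode qw(encode);

# A single code point from the 0x80-0xFF range (picked for illustration):
# U+00E9 LATIN SMALL LETTER E WITH ACUTE.
my $char = "\x{E9}";

# Encode the same code point several ways and record the byte counts.
my %bytes;
for my $encoding (qw(ISO-8859-1 UTF-8 UTF-16BE UTF-32BE)) {
    $bytes{$encoding} = length encode($encoding, $char);
}

printf "%-10s %d byte(s)\n", $_, $bytes{$_} for sort keys %bytes;
```

The code point itself stays U+00E9 throughout; only the serialized form changes.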

@p5pRT
Author

p5pRT commented Jul 17, 2012

From perl-diddler@tlinx.org

Eric Brine wrote​:

On Mon, Jul 16, 2012 at 5​:20 PM, Linda W <perl-diddler@​tlinx.org
<mailto​:perl-diddler@​tlinx.org>> wrote​:

cuz, '0x80-0xff' are all encoded in unicode as 2 bytes!

You mean UTF-8 or UTF-16 or something similar, not Unicode. Unicode is
not an encoding.
Good point. I should have said "usually, but at least 2 bytes", instead
of "2 bytes". I was referring to the section in the "perlunicode" manpage:

  The [Perl] "Unicode Bug"
  The term, the "Unicode bug" has been applied to an inconsistency [in Perl]
  with the Unicode code points in the [unassigned and (since perl ignores
  locale) 'illegal'] block, that is, between 128 and 255. Without a locale
  specified, unlike all other characters or code points, these characters...
  are raw bytes.
  Without a locale or character interpretation, they have no meaning.
  (The lesson here is to specify "unicode_strings" to avoid the headaches.)[sic]
  [The lesson should be that these will, like the rest of perl, default to
  unicode strings, in accordance with the rules that any bytes in a character
  string with the high bit set are encoded].

  In character semantics they are interpreted as Unicode code points [ -- which
  is ILLEGAL in any Unicode encoding. The justification for this is that
  they will have the same semantics as the US-Centric Latin-1 (ISO-8859-1)].

  In byte semantics, they are considered to be unassigned characters,
  meaning that the only semantics they have is their ordinal numbers, and
  that they are not members of various character classes. None are
  considered to match "\w" for example, but all match "\W".
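
The behaviour described in that last quoted paragraph can be checked directly. This is a minimal sketch, assuming perl 5.14 or later, no locale in effect, and no `use v5.12` (or later) line that would switch on "unicode_strings" globally; the variable names are illustrative only.

```perl
use strict;
use warnings;

# A one-character string holding code point 0xE9, stored natively (not UTF-8 internally).
my $byte_str = "\xE9";

# Default rules: the character is treated as unassigned, so \w does not match.
my $before = $byte_str =~ /\w/ ? 1 : 0;

# Same character, but stored internally as UTF-8: \w now matches.
my $upgraded = $byte_str;
utf8::upgrade($upgraded);
my $after = $upgraded =~ /\w/ ? 1 : 0;

# With the feature enabled, the internal storage no longer matters.
my $with_feature = do {
    use feature 'unicode_strings';
    $byte_str =~ /\w/ ? 1 : 0;
};

print "default rules:    $before\n";
print "after upgrade:    $after\n";
print "unicode_strings:  $with_feature\n";
print "strings compare equal: ", ($byte_str eq $upgraded ? "yes" : "no"), "\n";
```

The two strings compare equal, yet they match differently under the default rules, which is the inconsistency the manpage is naming.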


  I.e. by default they are illegal as characters and due to the mixed handling
of those characters as bytes in some contexts and as characters in others, one
gets arcane and confusing behavior when dealing with this range.

  By default, /I would submit/ that this range should be elevated, at some
point, the way the rest of perl's character handling is done -- and convert it,
internally, to UTF-8 when consistent with the rest of the program (or whatever
handling is specified for the rest of the characters above 255).

  Not doing so broke basic perl functionality from Perl4 days in its ability
to process text (from stdin->stdout)...I tried to tell it not to change anything
by saying STDIN/OUT/ERR were bytes only, but it made no difference.

Most of my email comes through looking like:
Chris Rodr????????????????????????????????
and is unusable as a return/reply address when it should look like​:
Chris Rodri'guez <christina@​email.moc>


But I understand change doesn't come overnight as well...

@p5pRT
Author

p5pRT commented Jul 18, 2012

From tchrist@perl.com

I'm afraid that you're really rather horribly confused about all this.

You have managed to get yourself into a snit because you've unwittingly
conflated logical code points, internal representations, and particular
encodings. Since you've gotten this wrong, nothing that follows from
your false premise is meaningful.

--tom

@p5pRT
Author

p5pRT commented Jul 18, 2012

From perl-diddler@tlinx.org

tchrist1 via RT wrote​:

I'm afraid that you're really rather horribly confused about all this.

You have managed to get yourself into a snit because you've unwittingly
conflated logical code points, internal representations, and particular
encodings. Since you've gotten this wrong, nothing that follows from
your false premise is meaningful.

So you are saying that no matter if my terminology wasn't exactly what you
believe it should be, you really can't figure out what I mean? You really
don't have the ability to parse what someone else is saying? Even when they
don't say it exactly 'right'?

Normally, I have PERL5OPT set to -CSA, "use utf8" in my source and a
UTF-8 environment, but I often don't get consistent results for chars
in the range 0x7f-0xff.

I'll have to think of a different way to explain this ...

@p5pRT
Author

p5pRT commented Jul 18, 2012

From @Leont

On Wed, Jul 18, 2012 at 12​:07 PM, Linda W <perl-diddler@​tlinx.org> wrote​:

So you are saying that no matter if my terminology wasn't exactly what you
believe it should be, you really can't figure out what I mean? You really
don't have the ability to parse what someone else is saying? Even when they
don't say it exactly 'right'?

Because you're making no sense that way.

Normally, I have PERL5OPT set to -CSA, "use utf8" in my source and a
UTF-8 environment, but I often don't get consistent results for chars
in the range 0x7f-0xff.

I'll have to think of a different way to explain this ...

Code examples would be helpful. -CSA manages @​ARGV and
STDIN/STDOUT/STDERR, but not other filehandles, so that might be the
problem, but right now your problem description is far too vague for
any of us to help you.
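
As a minimal sketch of what an explicit layer on a non-standard filehandle looks like (the scratch file and the sample string are hypothetical, and this is not a reproduction of the reported problem):

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical scratch file; any path would do.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh or die "close: $!";

my $name = "\x{192}Register_FStype";    # U+0192 followed by ASCII

# Write through an explicit encoding layer rather than relying on -C.
open(my $out, '>:encoding(UTF-8)', $path) or die "open for write: $!";
print {$out} $name;
close $out or die "close: $!";

# Reading with :raw yields the encoded bytes ...
open(my $raw, '<:raw', $path) or die "open raw: $!";
my $bytes = do { local $/; <$raw> };
close $raw;

# ... while an explicit :encoding(UTF-8) layer yields characters again.
open(my $in, '<:encoding(UTF-8)', $path) or die "open decoded: $!";
my $chars = do { local $/; <$in> };
close $in;

printf "raw read: %d bytes; decoded read: %d characters\n",
    length($bytes), length($chars);
```

The raw read is one byte longer than the decoded read because U+0192 occupies two bytes in UTF-8.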

Leon

@p5pRT
Author

p5pRT commented Jul 18, 2012

From tchrist@perl.com

Linda W <perl-diddler@​tlinx.org> wrote
  on Wed, 18 Jul 2012 02​:07​:16 PDT​:

I'm afraid that you're really rather horribly confused about all this.

You have managed to get yourself into a snit because you've unwittingly
conflated logical code points, internal representations, and particular
encodings. Since you've gotten this wrong, nothing that follows from
your false premise is meaningful.

So you are saying that no matter if my terminology wasn't exactly what you
believe it should be, you really can't figure out what I mean? You
really don't have the ability to parse what someone else is saying?
Even when they don't say it exactly 'right'?

I have no idea what the answers to those particular questions are
in the general case, but certainly in this specific case, I truly
have no idea what in the world you are talking about.

Normally, I have PERL5OPT set to -CSA, "use utf8" in my source and a
UTF-8 environment, but I often don't get consistent results for chars
in the range 0x7f-0xff.

That's still vague.

Are you using unicode_strings in your source?

And are you reading Unicode data?

I'll have to think of a different way to explain this ...

Good idea, that.

--tom

@p5pRT
Author

p5pRT commented Aug 27, 2012

From perl-diddler@tlinx.org

On Wed Jul 18 05​:01​:47 2012, tom christiansen wrote​:

Linda W <perl-diddler@​tlinx.org> wrote
on Wed, 18 Jul 2012 02​:07​:16 PDT​:

I'm afraid that you're really rather horribly confused about all this.

You have managed to get yourself into a snit because you've unwittingly
conflated logical code points, internal representations, and particular
encodings. Since you've gotten this wrong, nothing that follows from
your false premise is meaningful.


Really. Perhaps my perceptions are not always correct, but are you really
so sure yours are always correct?

Normally, I have PERL5OPT set to -CSA, "use utf8" in my source and a
UTF-8 environment, but I often don't get consistent results for chars
in the range 0x7f-0xff.

That's still vague.

Are you using unicode_strings in your source?


I filed a bug that more clearly elucidates what I am seeing as a problem.

You can call it confusion, but, if such exists, it's because someone
thought the chars 127-255 could be left unencoded because they have the
same code point value in UTF8 as in LATIN1. This is my perception of
the bug in perl -- if that is incorrect, please correct me -- i.e.
explain how it is wrong.

Example.

I have "use utf8" in my code and have a sub name using the script 'f':
'ƒ' (U+192). Now you may believe I am confusing codepoint U+192 with the
UTF-8 chars \xc6\x92, but they don't look a thing alike.

Now Perl -- it seems confused, as it thinks the bytes of the UTF-8
encoding of U+192 are themselves code points, even though I have -CSA
set in my PERL5OPT.

When it prints out I see: "Æ�Register_FStype", i.e.
"(U+C6)(U+92)Register_FStype"... The U+C6 and U+92 that were the utf-8
representation of U+192 in my source were incorrectly converted by perl
into UTF-8 AGAIN.. because there is a bug in how perl interprets chars
0x80-0xff -- instead of decoding the \xc6\x92 in my source correctly as
code point U+192, it incorrectly *re-encodes* it into UTF-8 again,
resulting in the byte sequence \xc3\x86\xc2\x92.

So please tell me, who doesn't understand the difference between code
points and their encoding, me? or Perl...

Is this clear enough for you?
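
For readers following the byte sequences above, here is a minimal sketch that reproduces them with the core Encode module. It shows only the arithmetic of encoding an already-encoded byte string a second time; it does not claim to identify where in the reporter's setup the second encoding happened. The `hex_bytes` helper is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: render a byte string as space-separated hex.
sub hex_bytes { join ' ', map { sprintf '%02x', ord } split //, shift }

my $char = "\x{192}";                  # U+0192 LATIN SMALL LETTER F WITH HOOK

# Encoding once gives the two-byte UTF-8 form.
my $once = encode('UTF-8', $char);

# Encoding those bytes again treats each byte as a code point (U+C6, U+92)
# and produces four bytes.
my $twice = encode('UTF-8', $once);

# Decoding the single-encoded bytes recovers the original character.
my $back = decode('UTF-8', $once);

print "encoded once:  ", hex_bytes($once),  "\n";   # c6 92
print "encoded twice: ", hex_bytes($twice), "\n";   # c3 86 c2 92
printf "decoded back:  U+%04X\n", ord $back;        # U+0192
```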

@khwilliamson
Contributor

Fixed by 0b6e3da and preceding commits
