Invalid and tainted utf-8 char crashes perl 5.10.1 in regexp evaluation #9922

Closed
p5pRT opened this issue Oct 22, 2009 · 9 comments

Comments

@p5pRT

p5pRT commented Oct 22, 2009

Migrated from rt.perl.org#69973 (status was 'resolved')

Searchable as RT69973$

@p5pRT

p5pRT commented Oct 22, 2009

From Mark.Martinec@ijs.si

Created by Mark.Martinec@ijs.si

While tracking down the cause of crashes of a perl process that was processing
certain obfuscated spam messages, I found that a UTF-8 character
with a large (and invalid) code point crashes perl 5.10.1
when such a string is matched against a particular regular expression.

This happens on FreeBSD 7.2, using perl as installed from ports
with no special settings.

I reduced the crashing application to a small test case;
here it is:

#!/usr/bin/perl -T
  use strict;

  # Here is an HTML snippet from a malicious/obfuscated mail message.
  # Note the last character has an invalid and huge UTF-8 code
  # (as a result of an unrelated bug in HTML::Parser).
  #
  my $t = '<a>Attention Home&#959&#969n&#1257rs...1&#1109t '.
          'T&#1110&#1084e E&#957&#1257&#1075075</a>';

  $t =~ s/&#(\d+)/chr($1)/ge;    # convert HTML entities to UTF8
  $t .= substr($ENV{PATH},0,0);  # make it tainted

  # show character codes in the resulting string
  print join(", ", map {ord} split(//,$t)), "\n";

  # The following regexp evaluation crashes perl 5.10.1 on FreeBSD.
  # Note that $t must be tainted and must have the UTF8 flag on,
  # otherwise the crash seems to be avoided.

  $t =~ /( |\b)(http:|www\.)/i;

and here is the result (hand-wrapped):

  60, 97, 62, 65, 116, 116, 101, 110, 116, 105, 111, 110, 32, 72, 111,
  109, 101, 959, 969, 110, 1257, 114, 115, 46, 46, 46, 49, 1109, 116,
  32, 84, 1110, 1084, 101, 32, 69, 957, 1257, 1075075, 60, 47, 97, 62
  Segmentation fault​: 11 (core dumped)
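
The two preconditions named in the script's comments (the string is tainted and has the UTF8 flag on) can be checked before the match is attempted. The following is only a sketch: it assumes the core module Scalar::Util is available, the shortened input string is an illustrative stand-in for the full snippet, and the variable names are hypothetical. It does not perform the crashing match.

```perl
use strict;
use warnings;
use Scalar::Util qw(tainted);

# Shortened stand-in for the snippet in the report (illustrative only).
my $t = 'E&#957&#1257&#1075075';

$t =~ s/&#(\d+)/chr($1)/ge;             # same entity conversion as in the report
$t .= substr($ENV{PATH} // '', 0, 0);   # appends nothing; carries taint under -T

my $is_utf8    = utf8::is_utf8($t) ? 1 : 0;   # internal UTF8 flag
my $is_tainted = tainted($t)       ? 1 : 0;   # only ever 1 in taint mode

printf "taint mode: %d, tainted: %d, UTF8 flag: %d, length: %d\n",
    ${^TAINT}, $is_tainted, $is_utf8, length($t);
```

Run it as `perl -T script.pl` to see the taint flag set; without -T the script still runs, but both taint values are reported as 0.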

Here is a backtrace obtained from the core dump
(copied and pasted from the screen, so the 8-bit characters may not be reproduced accurately):

$ gdb -c perl5.10.1.core /usr/local/bin/perl5.10.1
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...
Core was generated by `perl5.10.1'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /usr/local/lib/perl5/5.10.1/mach/CORE/libperl.so...done.
Loaded symbols for /usr/local/lib/perl5/5.10.1/mach/CORE/libperl.so
Reading symbols from /lib/libm.so.5...done.
Loaded symbols for /lib/libm.so.5
Reading symbols from /lib/libcrypt.so.4...done.
Loaded symbols for /lib/libcrypt.so.4
Reading symbols from /lib/libutil.so.7...done.
Loaded symbols for /lib/libutil.so.7
Reading symbols from /lib/libc.so.7...done.
Loaded symbols for /lib/libc.so.7
Reading symbols from /libexec/ld-elf.so.1...done.
Loaded symbols for /libexec/ld-elf.so.1
#0 0x00000000408bb101 in S_regmatch (reginfo=0x7fffffffe590, prog=0x411143a4) at regexec.c​:3049
3049 REXEC_TRIE_READ_CHAR(trie_type, trie, widecharmap, uc,

(gdb) bt
#0 0x00000000408bb101 in S_regmatch (reginfo=0x7fffffffe590, prog=0x411143a4) at regexec.c​:3049
#1 0x00000000408b7b0a in S_regtry (reginfo=0x7fffffffe590, startpos=0x7fffffffe6d8) at regexec.c​:2355
#2 0x00000000408b6a7a in Perl_regexec_flags (prog=0x4114f1a0,
  stringarg=0x4111d6c0 "<a>Attention HomeοÏ\211nÓ©rs...1Ñ\225t TÑ\226мe Eνөô\206\236\203</a>",
  strend=0x4111d6f3 "/a>",
  strbeg=0x4111d6c0 "<a>Attention HomeοÏ\211nÓ©rs...1Ñ\225t TÑ\226мe Eνөô\206\236\203</a>", minend=0,
  sv=0x4113ec48, data=0x0, flags=3) at regexec.c​:2146
#3 0x00000000407864a3 in Perl_pp_match () at pp_hot.c​:1356
#4 0x000000004073fa4c in Perl_runops_debug () at dump.c​:1968
#5 0x00000000406905d8 in S_run_body (oldscope=1) at perl.c​:2431
#6 0x000000004068f9b0 in perl_run (my_perl=0x41102104) at perl.c​:2349
#7 0x0000000000400bf4 in main (argc=3, argv=0x7fffffffea90, env=0x7fffffffeab0) at perlmain.c​:117

(gdb)
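
As a cross-check of the backtrace, the code point 1075075 can be converted to its UTF-8 octets and compared with the escapes gdb prints near the end of stringarg. A minimal sketch, assuming a perl where utf8::encode is available (5.8 or later); the variable names are illustrative:

```perl
use strict;
use warnings;

my $cp  = 1075075;                 # last code point printed by the test case
my $hex = sprintf("%x", $cp);      # hexadecimal form of the code point

my $bytes = chr($cp);
utf8::encode($bytes);              # character string -> UTF-8 octets

# Octets in hex, and as the octal escapes that gdb uses.
my $octets = join(" ", map { sprintf("%02x", ord) } split(//, $bytes));
my $octal  = join("",  map { sprintf("\\%03o", ord) } split(//, $bytes));

print "$hex\n$octets\n$octal\n";
```

This prints 106783 (the value that appears as %x{106783} in the -Dr trace), then f4 86 9e 83, then \364\206\236\203. The last three octets match the \206\236\203 in the backtrace, and 0xf4 is the Latin-1 code of the ô shown just before them.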

Finally, here is perl's debug output, produced with the -Dr command-line option:

Compiling REx "( |\b)(http:|www\.)"
Final program:
  1: OPEN1 (3)
  3: BRANCH (6)
  4: EXACTF < > (8)
  6: BRANCH (FAIL)
  7: BOUND (8)
  8: CLOSE1 (10)
  10: OPEN2 (12)
  12: TRIE-EXACTF[HWhw] (19)
  <http:>
  <www.>
  19: CLOSE2 (21)
  21: END (0)
minlen 4
Omitting $` $& $' support.

EXECUTING...

[...]
Matching REx "( |\b)(http​:|www\.)" against "<a>Attention Home%x{3bf}%x{3c9}n%x{4e9}rs...1%x{455}t T"...
UTF-8 string...
  0 <> <<a>Attenti> | 1​:OPEN1(3)
  0 <> <<a>Attenti> | 3​:BRANCH(6)
  0 <> <<a>Attenti> | 4​: EXACTF < >(8)
Compiling REx "(^|[/\\])warnings\.pmc?$"
rarest char w at 0
Final program​:
  1​: OPEN1 (3)
  3​: BRANCH (5)
  4​: BOL (17)
  5​: BRANCH (FAIL)
  6​: ANYOF[/\\][] (17)
  17​: CLOSE1 (19)
  19​: EXACT <warnings.pm> (23)
  23​: CURLY {0,1} (27)
  25​: EXACT <c> (0)
  27​: EOL (28)
  28​: END (0)
floating "warnings.pm" at 0..1 (checking floating) minlen 11
Guessing start of match in sv for REx "(^|[/\\])warnings\.pmc?$" against "/usr/local/lib/perl5/5.10.1/warnings.pm"
Found floating substr "warnings.pm" at offset 28...
Starting position does not contradict /^/m...
Guessed​: match at offset 27
Matching REx "(^|[/\\])warnings\.pmc?$" against "/warnings.pm"
  27 <.10.1> </warnings.> | 1​:OPEN1(3)
  27 <.10.1> </warnings.> | 3​:BRANCH(5)
  27 <.10.1> </warnings.> | 4​: BOL(17)
  failed...
  27 <.10.1> </warnings.> | 5​:BRANCH(17)
  27 <.10.1> </warnings.> | 6​: ANYOF[/\\][](17)
  28 <10.1/> <warnings.p> | 17​: CLOSE1(19)
  28 <10.1/> <warnings.p> | 19​: EXACT <warnings.pm>(23)
  39 </warnings.pm> <> | 23​: CURLY {0,1}(27)
  EXACT <c> can match 0 times out of 1...
  39 </warnings.pm> <> | 27​: EOL(28)
  39 </warnings.pm> <> | 28​: END(0)
Match successful!
Freeing REx​: "(^|[/\\])warnings\.pmc?$"
Compiling REx "^\s+"
synthetic stclass "ANYOF[\11\12\14\15 ][{unicode_all}]".
Final program​:
  1​: BOL (2)
  2​: PLUS (4)
  3​: SPACE (0)
  4​: END (0)
stclass ANYOF[\11\12\14\15 ][{unicode_all}] anchored(BOL) minlen 1
Compiling REx "\s+$"
rarest char
at 0
Final program​:
  1​: PLUS (3)
  2​: SPACE (0)
  3​: EOL (4)
  4​: END (0)
floating ""$ at 1..2147483647 (checking floating) stclass SPACE plus minlen 1
Compiling REx "(.+)​::"
rarest char : at 0
Final program​:
  1​: OPEN1 (3)
  3​: PLUS (5)
  4​: REG_ANY (0)
  5​: CLOSE1 (7)
  7​: EXACT <​::> (9)
  9​: END (0)
floating "​::" at 1..2147483647 (checking floating) plus minlen 3
Compiling REx "^(?​:\w+)$"
rarest char
at 0
synthetic stclass "ANYOF[0-9A-Z_a-z][{unicode_all}]".
Final program​:
  1​: BOL (2)
  2​: PLUS (4)
  3​: ALNUM (0)
  4​: EOL (5)
  5​: END (0)
floating ""$ at 1..2147483647 (checking floating) stclass ANYOF[0-9A-Z_a-z][{unicode_all}] anchored(BOL) minlen 1
Compiling REx "^Is(?​:\s+|[-_])?"
Final program​:
  1​: BOL (2)
  2​: EXACTF <Is> (4)
  4​: CURLYX[0] {0,1} (23)
  6​: BRANCH (9)
  7​: PLUS (22)
  8​: SPACE (0)
  9​: BRANCH (FAIL)
  10​: ANYOF[\-_][] (22)
  21​: TAIL (22)
  22​: WHILEM (0)
  23​: NOTHING (24)
  24​: END (0)
stclass EXACTF <Is> anchored(BOL) minlen 2
Compiling REx "^(?​:(?​:General(?​:\s+|_)?)?Category|gc)\s*[​:=]\s*"
Final program​:
  1​: BOL (2)
  2​: BRANCH (24)
  3​: CURLYX[0] {0,1} (20)
  5​: EXACTF <General> (8)
  8​: CURLYX[0] {0,1} (18)
  10​: BRANCH (13)
  11​: PLUS (17)
  12​: SPACE (0)
  13​: BRANCH (FAIL)
  14​: EXACTF <_> (17)
  16​: TAIL (17)
  17​: WHILEM (0)
  18​: NOTHING (19)
  19​: WHILEM (0)
  20​: NOTHING (21)
  21​: EXACTF <Category> (28)
  24​: BRANCH (FAIL)
  25​: EXACTF <gc> (28)
  27​: TAIL (28)
  28​: STAR (30)
  29​: SPACE (0)
  30​: ANYOF[​:=][] (41)
  41​: STAR (43)
  42​: SPACE (0)
  43​: END (0)
anchored(BOL) minlen 3
Compiling REx "^(?​:Script|sc)\s*[​:=]\s*"
Final program​:
  1​: BOL (2)
  2​: EXACTF <Sc> (4)
  4​: TRIE-EXACTF[Rr] (10)
  <ript>
  <>
  10​: STAR (12)
  11​: SPACE (0)
  12​: ANYOF[​:=][] (23)
  23​: STAR (25)
  24​: SPACE (0)
  25​: END (0)
stclass EXACTF <Sc> anchored(BOL) minlen 3
Compiling REx "^Block\s*[​:=]\s*"
Final program​:
  1​: BOL (2)
  2​: EXACTF <Block> (5)
  5​: STAR (7)
  6​: SPACE (0)
  7​: ANYOF[​:=][] (18)
  18​: STAR (20)
  19​: SPACE (0)
  20​: END (0)
stclass EXACTF <Block> anchored(BOL) minlen 6
Compiling REx "^([\w\s]+)[​:=]\s*(.*)"
synthetic stclass "ANYOF[\11\12\14\15 0-9A-Z_a-z][+utf8​::IsWord +utf8​::IsSpacePerl]".
Final program​:
  1​: BOL (2)
  2​: OPEN1 (4)
  4​: PLUS (17)
  5​: ANYOF[\11\12\14\15 0-9A-Z_a-z][+utf8​::IsWord +utf8​::IsSpacePerl] (0)
  17​: CLOSE1 (19)
  19​: ANYOF[​:=][] (30)
  30​: STAR (32)
  31​: SPACE (0)
  32​: OPEN2 (34)
  34​: STAR (36)
  35​: REG_ANY (0)
  36​: CLOSE2 (38)
  38​: END (0)
stclass ANYOF[\11\12\14\15 0-9A-Z_a-z][+utf8​::IsWord +utf8​::IsSpacePerl] anchored(BOL) minlen 2
Compiling REx "(?<=[a-z\d])(?​:\s+|[-_])(?=[a-z\d])"
Final program​:
  1​: IFMATCH[-1] (17)
  3​: ANYOF[0-9a-z][+utf8​::IsDigit] (15)
  15​: SUCCEED (0)
  16​: TAIL (17)
  17​: BRANCH (20)
  18​: PLUS (33)
  19​: SPACE (0)
  20​: BRANCH (FAIL)
  21​: ANYOF[\-_][] (33)
  32​: TAIL (33)
  33​: IFMATCH[0] (49)
  35​: ANYOF[0-9a-z][+utf8​::IsDigit] (47)
  47​: SUCCEED (0)
  48​: TAIL (49)
  49​: END (0)
minlen 1
Compiling REx "^To(?​:\w+)$"
rarest char
at 0
rarest char T at 0
Final program​:
  1​: BOL (2)
  2​: EXACT <To> (4)
  4​: PLUS (6)
  5​: ALNUM (0)
  6​: EOL (7)
  7​: END (0)
anchored "To" at 0 floating ""$ at 3..2147483647 (checking anchored) anchored(BOL) minlen 3
Compiling REx "^To(Digit|Fold|Lower|Title|Upper)$"
rarest char
at 0
rarest char T at 0
Final program​:
  1​: BOL (2)
  2​: EXACT <To> (4)
  4​: OPEN1 (6)
  6​: TRIEC-EXACT[DFLTU] (25)
  <Digit>
  <Fold>
  <Lower>
  <Title>
  <Upper>
  25​: CLOSE1 (27)
  27​: EOL (28)
  28​: END (0)
anchored "To" at 0 floating ""$ at 6..7 (checking anchored) anchored(BOL) minlen 6
Compiling REx "^"
Final program​:
  1​: MBOL (2)
  2​: END (0)
stclass END anchored(MBOL) minlen 0
Compiling REx "^[^0-9a-fA-F]"
Final program​:
  1​: BOL (2)
  2​: ANYOF[\0-/​:-@​G-`g-\377][{unicode_all}] (13)
  13​: END (0)
stclass ANYOF[\0-/​:-@​G-`g-\377][{unicode_all}] anchored(BOL) minlen 1
Compiling REx "^([0-9a-fA-F]+)"
synthetic stclass "ANYOF[0-9A-Fa-f][]".
Final program​:
  1​: BOL (2)
  2​: OPEN1 (4)
  4​: PLUS (16)
  5​: ANYOF[0-9A-Fa-f][] (0)
  16​: CLOSE1 (18)
  18​: END (0)
stclass ANYOF[0-9A-Fa-f][] anchored(BOL) minlen 1
Compiling REx "^([0-9a-fA-F]+)"
synthetic stclass "ANYOF[0-9A-Fa-f][]".
Final program​:
  1​: BOL (2)
  2​: OPEN1 (4)
  4​: PLUS (16)
  5​: ANYOF[0-9A-Fa-f][] (0)
  16​: CLOSE1 (18)
  18​: END (0)
stclass ANYOF[0-9A-Fa-f][] anchored(BOL) minlen 1
Compiling REx "\tXXXX$"
rarest char X at 1
Final program​:
  1​: EXACT <\tXXXX> (4)
  4​: MEOL (5)
  5​: END (0)
anchored "%tXXXX"$ at 0 (checking anchored isall) minlen 5
Compiling REx "^([0-9a-fA-F]+)(?​:[\t]([0-9a-fA-F]+)?)(?​:[ \t]([0-9a-fA-F]+)"...
rarest char at 0
synthetic stclass "ANYOF[0-9A-Fa-f][]".
Final program​:
  1​: MBOL (2)
  2​: OPEN1 (4)
  4​: PLUS (16)
  5​: ANYOF[0-9A-Fa-f][] (0)
  16​: CLOSE1 (18)
  18​: EXACT <\t> (20)
  20​: CURLYX[1] {0,1} (39)
  22​: OPEN2 (24)
  24​: PLUS (36)
  25​: ANYOF[0-9A-Fa-f][] (0)
  36​: CLOSE2 (38)
  38​: WHILEM (0)
  39​: NOTHING (40)
  40​: CURLYX[2] {0,1} (70)
  42​: ANYOF[\11 ][] (53)
  53​: OPEN3 (55)
  55​: PLUS (67)
  56​: ANYOF[0-9A-Fa-f][] (0)
  67​: CLOSE3 (69)
  69​: WHILEM (0)
  70​: NOTHING (71)
  71​: END (0)
floating "%t" at 1..2147483647 (checking floating) stclass ANYOF[0-9A-Fa-f][] anchored(MBOL) minlen 2
Compiling REx "^([^0-9a-fA-F\n])(.*)"
synthetic stclass "ANYOF[\0-\11\13-/​:-@​G-`g-\377][{unicode_all}]".
Final program​:
  1​: MBOL (2)
  2​: OPEN1 (4)
  4​: ANYOF[\0-\11\13-/​:-@​G-`g-\377][{unicode_all}] (15)
  15​: CLOSE1 (17)
  17​: OPEN2 (19)
  19​: STAR (21)
  20​: REG_ANY (0)
  21​: CLOSE2 (23)
  23​: END (0)
stclass ANYOF[\0-\11\13-/​:-@​G-`g-\377][{unicode_all}] anchored(MBOL) minlen 1
Compiling REx "[-+!&]"
Final program​:
  1​: ANYOF[!&+\-][] (12)
  12​: END (0)
stclass ANYOF[!&+\-][] minlen 1
Compiling REx "​::"
rarest char : at 0
Final program​:
  1​: EXACT <​::> (3)
  3​: END (0)
anchored "​::" at 0 (checking anchored isall) minlen 2
Compiling REx "^([0-9a-fA-F]+)"
synthetic stclass "ANYOF[0-9A-Fa-f][]".
Final program​:
  1​: BOL (2)
  2​: OPEN1 (4)
  4​: PLUS (16)
  5​: ANYOF[0-9A-Fa-f][] (0)
  16​: CLOSE1 (18)
  18​: END (0)
stclass ANYOF[0-9A-Fa-f][] anchored(BOL) minlen 1
  failed...
  0 <> <<a>Attenti> | 6​:BRANCH(8)
  0 <> <<a>Attenti> | 7​: BOUND(8)
  failed...
  BRANCH failed...
  1 <<> <a>Attentio> | 1​:OPEN1(3)
  1 <<> <a>Attentio> | 3​:BRANCH(6)
  1 <<> <a>Attentio> | 4​: EXACTF < >(8)
  failed...
  1 <<> <a>Attentio> | 6​:BRANCH(8)
  1 <<> <a>Attentio> | 7​: BOUND(8)
  1 <<> <a>Attentio> | 8​: CLOSE1(10)
  1 <<> <a>Attentio> | 10​: OPEN2(12)
  1 <<> <a>Attentio> | 12​: TRIE-EXACTF[HWhw](19)
  1 <<> <a>Attentio> | State​: 1 Accepted​: 0 Charid​: 0 CP​: 61 After State​: 0
  failed...
  BRANCH failed...
  2 <<a> <>Attention> | 1​:OPEN1(3)
  2 <<a> <>Attention> | 3​:BRANCH(6)
  2 <<a> <>Attention> | 4​: EXACTF < >(8)
  failed...
  2 <<a> <>Attention> | 6​:BRANCH(8)
  2 <<a> <>Attention> | 7​: BOUND(8)
  2 <<a> <>Attention> | 8​: CLOSE1(10)
  2 <<a> <>Attention> | 10​: OPEN2(12)
  2 <<a> <>Attention> | 12​: TRIE-EXACTF[HWhw](19)
  2 <<a> <>Attention> | State​: 1 Accepted​: 0 Charid​: 0 CP​: 3e After State​: 0
  failed...
  BRANCH failed...
  3 <<a>> <Attention > | 1​:OPEN1(3)
  3 <<a>> <Attention > | 3​:BRANCH(6)
  3 <<a>> <Attention > | 4​: EXACTF < >(8)
  failed...
  3 <<a>> <Attention > | 6​:BRANCH(8)
  3 <<a>> <Attention > | 7​: BOUND(8)
  3 <<a>> <Attention > | 8​: CLOSE1(10)
  3 <<a>> <Attention > | 10​: OPEN2(12)
  3 <<a>> <Attention > | 12​: TRIE-EXACTF[HWhw](19)
  3 <<a>> <Attention > | State​: 1 Accepted​: 0 Charid​: 0 CP​: 61 After State​: 0
  failed...
  BRANCH failed...
  4 <<a>A> <ttention H> | 1​:OPEN1(3)
  4 <<a>A> <ttention H> | 3​:BRANCH(6)
  4 <<a>A> <ttention H> | 4​: EXACTF < >(8)
  failed...
  4 <<a>A> <ttention H> | 6​:BRANCH(8)
  4 <<a>A> <ttention H> | 7​: BOUND(8)
  failed...
  BRANCH failed...
  5 <<a>At> <tention Ho> | 1​:OPEN1(3)
  5 <<a>At> <tention Ho> | 3​:BRANCH(6)
  5 <<a>At> <tention Ho> | 4​: EXACTF < >(8)
  failed...
  5 <<a>At> <tention Ho> | 6​:BRANCH(8)
  5 <<a>At> <tention Ho> | 7​: BOUND(8)
  failed...
  BRANCH failed...
  6 <a>Att> <ention Hom> | 1​:OPEN1(3)
  6 <a>Att> <ention Hom> | 3​:BRANCH(6)
  6 <a>Att> <ention Hom> | 4​: EXACTF < >(8)
  failed...
  6 <a>Att> <ention Hom> | 6​:BRANCH(8)
  6 <a>Att> <ention Hom> | 7​: BOUND(8)
  failed...
  BRANCH failed...
  7 <>Atte> <ntion Home> | 1​:OPEN1(3)
  7 <>Atte> <ntion Home> | 3​:BRANCH(6)
  7 <>Atte> <ntion Home> | 4​: EXACTF < >(8)
  failed...
  7 <>Atte> <ntion Home> | 6​:BRANCH(8)
  7 <>Atte> <ntion Home> | 7​: BOUND(8)
  failed...
  BRANCH failed...
  8 <Atten> <tion Home> | 1​:OPEN1(3)
  8 <Atten> <tion Home> | 3​:BRANCH(6)
  8 <Atten> <tion Home> | 4​: EXACTF < >(8)
  failed...
  8 <Atten> <tion Home> | 6​:BRANCH(8)
  8 <Atten> <tion Home> | 7​: BOUND(8)
  failed...
  BRANCH failed...
  9 <ttent> <ion Home> | 1​:OPEN1(3)
  9 <ttent> <ion Home> | 3​:BRANCH(6)
  9 <ttent> <ion Home> | 4​: EXACTF < >(8)
  failed...
  9 <ttent> <ion Home> | 6​:BRANCH(8)
  9 <ttent> <ion Home> | 7​: BOUND(8)
  failed...
  BRANCH failed...
  10 <tenti> <on Home> | 1​:OPEN1(3)
  10 <tenti> <on Home> | 3​:BRANCH(6)
  10 <tenti> <on Home> | 4​: EXACTF < >(8)
  failed...
  10 <tenti> <on Home> | 6​:BRANCH(8)
  10 <tenti> <on Home> | 7​: BOUND(8)
  failed...
  BRANCH failed...
  11 <entio> <n Home> | 1​:OPEN1(3)
  11 <entio> <n Home> | 3​:BRANCH(6)
  11 <entio> <n Home> | 4​: EXACTF < >(8)
  failed...
  11 <entio> <n Home> | 6​:BRANCH(8)
  11 <entio> <n Home> | 7​: BOUND(8)
  failed...
  BRANCH failed...
  12 <ntion> < Home> | 1​:OPEN1(3)
  12 <ntion> < Home> | 3​:BRANCH(6)
  12 <ntion> < Home> | 4​: EXACTF < >(8)
  13 <tion > <Home> | 8​: CLOSE1(10)
  13 <tion > <Home> | 10​: OPEN2(12)
  13 <tion > <Home> | 12​: TRIE-EXACTF[HWhw](19)
  13 <tion > <Home> | State​: 1 Accepted​: 0 Charid​: 1 CP​: 68 After State​: 2
  14 <ion H> <ome%x{3bf}> | State​: 2 Accepted​: 0 Charid​: 0 CP​: 6f After State​: 0
  failed...
  12 <ntion> < Home> | 6​:BRANCH(8)
  12 <ntion> < Home> | 7​: BOUND(8)
  12 <ntion> < Home> | 8​: CLOSE1(10)
  12 <ntion> < Home> | 10​: OPEN2(12)
  12 <ntion> < Home> | 12​: TRIE-EXACTF[HWhw](19)
  12 <ntion> < Home> | State​: 1 Accepted​: 0 Charid​: 0 CP​: 20 After State​: 0
  failed...
  BRANCH failed...
  13 <tion > <Home> | 1​:OPEN1(3)
  13 <tion > <Home> | 3​:BRANCH(6)
  13 <tion > <Home> | 4​: EXACTF < >(8)
  failed...
  13 <tion > <Home> | 6​:BRANCH(8)
  13 <tion > <Home> | 7​: BOUND(8)
  13 <tion > <Home> | 8​: CLOSE1(10)
  13 <tion > <Home> | 10​: OPEN2(12)
  13 <tion > <Home> | 12​: TRIE-EXACTF[HWhw](19)
  13 <tion > <Home> | State​: 1 Accepted​: 0 Charid​: 1 CP​: 68 After State​: 2
  14 <ion H> <ome%x{3bf}> | State​: 2 Accepted​: 0 Charid​: 0 CP​: 6f After State​: 0
  failed...
  BRANCH failed...
  14 <ion H> <ome%x{3bf}> | 1​:OPEN1(3)
  14 <ion H> <ome%x{3bf}> | 3​:BRANCH(6)
  14 <ion H> <ome%x{3bf}> | 4​: EXACTF < >(8)
  failed...
  14 <ion H> <ome%x{3bf}> | 6​:BRANCH(8)
  14 <ion H> <ome%x{3bf}> | 7​: BOUND(8)
  failed...
  BRANCH failed...
  15 <on Ho> <me%x{3bf}> | 1​:OPEN1(3)
  15 <on Ho> <me%x{3bf}> | 3​:BRANCH(6)
  15 <on Ho> <me%x{3bf}> | 4​: EXACTF < >(8)
  failed...
  15 <on Ho> <me%x{3bf}> | 6​:BRANCH(8)
  15 <on Ho> <me%x{3bf}> | 7​: BOUND(8)
  failed...
  BRANCH failed...
  16 <n Hom> <e%x{3bf}> | 1​:OPEN1(3)
  16 <n Hom> <e%x{3bf}> | 3​:BRANCH(6)
  16 <n Hom> <e%x{3bf}> | 4​: EXACTF < >(8)
  failed...
  16 <n Hom> <e%x{3bf}> | 6​:BRANCH(8)
  16 <n Hom> <e%x{3bf}> | 7​: BOUND(8)
  failed...
  BRANCH failed...
  17 < Home> <%x{3bf}> | 1​:OPEN1(3)
  17 < Home> <%x{3bf}> | 3​:BRANCH(6)
  17 < Home> <%x{3bf}> | 4​: EXACTF < >(8)
  failed...
  17 < Home> <%x{3bf}> | 6​:BRANCH(8)
  17 < Home> <%x{3bf}> | 7​: BOUND(8)
  failed...
  BRANCH failed...
  19 <ome%x{3bf}> <%x{3c9}n> | 1​:OPEN1(3)
  19 <ome%x{3bf}> <%x{3c9}n> | 3​:BRANCH(6)
  19 <ome%x{3bf}> <%x{3c9}n> | 4​: EXACTF < >(8)
  failed...
  19 <ome%x{3bf}> <%x{3c9}n> | 6​:BRANCH(8)
  19 <ome%x{3bf}> <%x{3c9}n> | 7​: BOUND(8)
  failed...
  BRANCH failed...
  21 <e%x{3bf}%x{3c9}> <n%x{4e9}rs>| 1​:OPEN1(3)
  21 <e%x{3bf}%x{3c9}> <n%x{4e9}rs>| 3​:BRANCH(6)
  21 <e%x{3bf}%x{3c9}> <n%x{4e9}rs>| 4​: EXACTF < >(8)
  failed...
  21 <e%x{3bf}%x{3c9}> <n%x{4e9}rs>| 6​:BRANCH(8)
  21 <e%x{3bf}%x{3c9}> <n%x{4e9}rs>| 7​: BOUND(8)
  failed...
  BRANCH failed...
  22 <%x{3bf}%x{3c9}n> <%x{4e9}rs.>| 1​:OPEN1(3)
  22 <%x{3bf}%x{3c9}n> <%x{4e9}rs.>| 3​:BRANCH(6)
  22 <%x{3bf}%x{3c9}n> <%x{4e9}rs.>| 4​: EXACTF < >(8)
  failed...
  22 <%x{3bf}%x{3c9}n> <%x{4e9}rs.>| 6​:BRANCH(8)
  22 <%x{3bf}%x{3c9}n> <%x{4e9}rs.>| 7​: BOUND(8)
  failed...
  BRANCH failed...
  24 <%x{3c9}n%x{4e9}> <rs...1>| 1​:OPEN1(3)
  24 <%x{3c9}n%x{4e9}> <rs...1>| 3​:BRANCH(6)
  24 <%x{3c9}n%x{4e9}> <rs...1>| 4​: EXACTF < >(8)
  failed...
  24 <%x{3c9}n%x{4e9}> <rs...1>| 6​:BRANCH(8)
  24 <%x{3c9}n%x{4e9}> <rs...1>| 7​: BOUND(8)
  failed...
  BRANCH failed...
  25 <%x{3c9}n%x{4e9}r> <s...1>| 1​:OPEN1(3)
  25 <%x{3c9}n%x{4e9}r> <s...1>| 3​:BRANCH(6)
  25 <%x{3c9}n%x{4e9}r> <s...1>| 4​: EXACTF < >(8)
  failed...
  25 <%x{3c9}n%x{4e9}r> <s...1>| 6​:BRANCH(8)
  25 <%x{3c9}n%x{4e9}r> <s...1>| 7​: BOUND(8)
  failed...
  BRANCH failed...
  26 <n%x{4e9}rs> <...1> | 1​:OPEN1(3)
  26 <n%x{4e9}rs> <...1> | 3​:BRANCH(6)
  26 <n%x{4e9}rs> <...1> | 4​: EXACTF < >(8)
  failed...
  26 <n%x{4e9}rs> <...1> | 6​:BRANCH(8)
  26 <n%x{4e9}rs> <...1> | 7​: BOUND(8)
  26 <n%x{4e9}rs> <...1> | 8​: CLOSE1(10)
  26 <n%x{4e9}rs> <...1> | 10​: OPEN2(12)
  26 <n%x{4e9}rs> <...1> | 12​: TRIE-EXACTF[HWhw](19)
  26 <n%x{4e9}rs> <...1> | State​: 1 Accepted​: 0 Charid​: 6 CP​: 2e After State​: 0
  failed...
  BRANCH failed...
  27 <%x{4e9}rs.> <..1%x{455}>| 1​:OPEN1(3)
  27 <%x{4e9}rs.> <..1%x{455}>| 3​:BRANCH(6)
  27 <%x{4e9}rs.> <..1%x{455}>| 4​: EXACTF < >(8)
  failed...
  27 <%x{4e9}rs.> <..1%x{455}>| 6​:BRANCH(8)
  27 <%x{4e9}rs.> <..1%x{455}>| 7​: BOUND(8)
  failed...
  BRANCH failed...
  28 <%x{4e9}rs..> <.1%x{455}t>| 1​:OPEN1(3)
  28 <%x{4e9}rs..> <.1%x{455}t>| 3​:BRANCH(6)
  28 <%x{4e9}rs..> <.1%x{455}t>| 4​: EXACTF < >(8)
  failed...
  28 <%x{4e9}rs..> <.1%x{455}t>| 6​:BRANCH(8)
  28 <%x{4e9}rs..> <.1%x{455}t>| 7​: BOUND(8)
  failed...
  BRANCH failed...
  29 <rs...> <1%x{455}t > | 1​:OPEN1(3)
  29 <rs...> <1%x{455}t > | 3​:BRANCH(6)
  29 <rs...> <1%x{455}t > | 4​: EXACTF < >(8)
  failed...
  29 <rs...> <1%x{455}t > | 6​:BRANCH(8)
  29 <rs...> <1%x{455}t > | 7​: BOUND(8)
  29 <rs...> <1%x{455}t > | 8​: CLOSE1(10)
  29 <rs...> <1%x{455}t > | 10​: OPEN2(12)
  29 <rs...> <1%x{455}t > | 12​: TRIE-EXACTF[HWhw](19)
  29 <rs...> <1%x{455}t > | State​: 1 Accepted​: 0 Charid​: 0 CP​: 31 After State​: 0
  failed...
  BRANCH failed...
  30 <s...1> <%x{455}t T> | 1​:OPEN1(3)
  30 <s...1> <%x{455}t T> | 3​:BRANCH(6)
  30 <s...1> <%x{455}t T> | 4​: EXACTF < >(8)
  failed...
  30 <s...1> <%x{455}t T> | 6​:BRANCH(8)
  30 <s...1> <%x{455}t T> | 7​: BOUND(8)
  failed...
  BRANCH failed...
  32 <..1%x{455}> <t T%x{456}>| 1​:OPEN1(3)
  32 <..1%x{455}> <t T%x{456}>| 3​:BRANCH(6)
  32 <..1%x{455}> <t T%x{456}>| 4​: EXACTF < >(8)
  failed...
  32 <..1%x{455}> <t T%x{456}>| 6​:BRANCH(8)
  32 <..1%x{455}> <t T%x{456}>| 7​: BOUND(8)
  failed...
  BRANCH failed...
  33 <.1%x{455}t> < T%x{456}>| 1​:OPEN1(3)
  33 <.1%x{455}t> < T%x{456}>| 3​:BRANCH(6)
  33 <.1%x{455}t> < T%x{456}>| 4​: EXACTF < >(8)
  34 <1%x{455}t > <T%x{456}> | 8​: CLOSE1(10)
  34 <1%x{455}t > <T%x{456}> | 10​: OPEN2(12)
  34 <1%x{455}t > <T%x{456}> | 12​: TRIE-EXACTF[HWhw](19)
  34 <1%x{455}t > <T%x{456}> | State​: 1 Accepted​: 0 Charid​: 2 CP​: 74 After State​: 0
  failed...
  33 <.1%x{455}t> < T%x{456}>| 6​:BRANCH(8)
  33 <.1%x{455}t> < T%x{456}>| 7​: BOUND(8)
  33 <.1%x{455}t> < T%x{456}>| 8​: CLOSE1(10)
  33 <.1%x{455}t> < T%x{456}>| 10​: OPEN2(12)
  33 <.1%x{455}t> < T%x{456}>| 12​: TRIE-EXACTF[HWhw](19)
  33 <.1%x{455}t> < T%x{456}>| State​: 1 Accepted​: 0 Charid​: 0 CP​: 20 After State​: 0
  failed...
  BRANCH failed...
  34 <1%x{455}t > <T%x{456}> | 1​:OPEN1(3)
  34 <1%x{455}t > <T%x{456}> | 3​:BRANCH(6)
  34 <1%x{455}t > <T%x{456}> | 4​: EXACTF < >(8)
  failed...
  34 <1%x{455}t > <T%x{456}> | 6​:BRANCH(8)
  34 <1%x{455}t > <T%x{456}> | 7​: BOUND(8)
  34 <1%x{455}t > <T%x{456}> | 8​: CLOSE1(10)
  34 <1%x{455}t > <T%x{456}> | 10​: OPEN2(12)
  34 <1%x{455}t > <T%x{456}> | 12​: TRIE-EXACTF[HWhw](19)
  34 <1%x{455}t > <T%x{456}> | State​: 1 Accepted​: 0 Charid​: 2 CP​: 74 After State​: 0
  failed...
  BRANCH failed...
  35 <%x{455}t T> <%x{456}> | 1​:OPEN1(3)
  35 <%x{455}t T> <%x{456}> | 3​:BRANCH(6)
  35 <%x{455}t T> <%x{456}> | 4​: EXACTF < >(8)
  failed...
  35 <%x{455}t T> <%x{456}> | 6​:BRANCH(8)
  35 <%x{455}t T> <%x{456}> | 7​: BOUND(8)
  failed...
  BRANCH failed...
  37 <t T%x{456}> <%x{43c}e E>| 1​:OPEN1(3)
  37 <t T%x{456}> <%x{43c}e E>| 3​:BRANCH(6)
  37 <t T%x{456}> <%x{43c}e E>| 4​: EXACTF < >(8)
  failed...
  37 <t T%x{456}> <%x{43c}e E>| 6​:BRANCH(8)
  37 <t T%x{456}> <%x{43c}e E>| 7​: BOUND(8)
  failed...
  BRANCH failed...
  39 <T%x{456}%x{43c}> <e E%x{3bd}>| 1​:OPEN1(3)
  39 <T%x{456}%x{43c}> <e E%x{3bd}>| 3​:BRANCH(6)
  39 <T%x{456}%x{43c}> <e E%x{3bd}>| 4​: EXACTF < >(8)
  failed...
  39 <T%x{456}%x{43c}> <e E%x{3bd}>| 6​:BRANCH(8)
  39 <T%x{456}%x{43c}> <e E%x{3bd}>| 7​: BOUND(8)
  failed...
  BRANCH failed...
  40 <%x{456}%x{43c}e> < E%x{3bd}>| 1​:OPEN1(3)
  40 <%x{456}%x{43c}e> < E%x{3bd}>| 3​:BRANCH(6)
  40 <%x{456}%x{43c}e> < E%x{3bd}>| 4​: EXACTF < >(8)
  41 <%x{456}%x{43c}e > <E%x{3bd}>| 8​: CLOSE1(10)
  41 <%x{456}%x{43c}e > <E%x{3bd}>| 10​: OPEN2(12)
  41 <%x{456}%x{43c}e > <E%x{3bd}>| 12​: TRIE-EXACTF[HWhw](19)
  41 <%x{456}%x{43c}e > <E%x{3bd}>| State​: 1 Accepted​: 0 Charid​: 0 CP​: 65 After State​: 0
  failed...
  40 <%x{456}%x{43c}e> < E%x{3bd}>| 6​:BRANCH(8)
  40 <%x{456}%x{43c}e> < E%x{3bd}>| 7​: BOUND(8)
  40 <%x{456}%x{43c}e> < E%x{3bd}>| 8​: CLOSE1(10)
  40 <%x{456}%x{43c}e> < E%x{3bd}>| 10​: OPEN2(12)
  40 <%x{456}%x{43c}e> < E%x{3bd}>| 12​: TRIE-EXACTF[HWhw](19)
  40 <%x{456}%x{43c}e> < E%x{3bd}>| State​: 1 Accepted​: 0 Charid​: 0 CP​: 20 After State​: 0
  failed...
  BRANCH failed...
  41 <%x{456}%x{43c}e > <E%x{3bd}>| 1​:OPEN1(3)
  41 <%x{456}%x{43c}e > <E%x{3bd}>| 3​:BRANCH(6)
  41 <%x{456}%x{43c}e > <E%x{3bd}>| 4​: EXACTF < >(8)
  failed...
  41 <%x{456}%x{43c}e > <E%x{3bd}>| 6​:BRANCH(8)
  41 <%x{456}%x{43c}e > <E%x{3bd}>| 7​: BOUND(8)
  41 <%x{456}%x{43c}e > <E%x{3bd}>| 8​: CLOSE1(10)
  41 <%x{456}%x{43c}e > <E%x{3bd}>| 10​: OPEN2(12)
  41 <%x{456}%x{43c}e > <E%x{3bd}>| 12​: TRIE-EXACTF[HWhw](19)
  41 <%x{456}%x{43c}e > <E%x{3bd}>| State​: 1 Accepted​: 0 Charid​: 0 CP​: 65 After State​: 0
  failed...
  BRANCH failed...
  42 <%x{43c}e E> <%x{3bd}> | 1​:OPEN1(3)
  42 <%x{43c}e E> <%x{3bd}> | 3​:BRANCH(6)
  42 <%x{43c}e E> <%x{3bd}> | 4​: EXACTF < >(8)
  failed...
  42 <%x{43c}e E> <%x{3bd}> | 6​:BRANCH(8)
  42 <%x{43c}e E> <%x{3bd}> | 7​: BOUND(8)
  failed...
  BRANCH failed...
  44 <e E%x{3bd}> <%x{4e9}> | 1​:OPEN1(3)
  44 <e E%x{3bd}> <%x{4e9}> | 3​:BRANCH(6)
  44 <e E%x{3bd}> <%x{4e9}> | 4​: EXACTF < >(8)
  failed...
  44 <e E%x{3bd}> <%x{4e9}> | 6​:BRANCH(8)
  44 <e E%x{3bd}> <%x{4e9}> | 7​: BOUND(8)
  failed...
  BRANCH failed...
  46 <E%x{3bd}%x{4e9}> <%x{106783}>| 1​:OPEN1(3)
  46 <E%x{3bd}%x{4e9}> <%x{106783}>| 3​:BRANCH(6)
  46 <E%x{3bd}%x{4e9}> <%x{106783}>| 4​: EXACTF < >(8)
  failed...
  46 <E%x{3bd}%x{4e9}> <%x{106783}>| 6​:BRANCH(8)
  46 <E%x{3bd}%x{4e9}> <%x{106783}>| 7​: BOUND(8)
  46 <E%x{3bd}%x{4e9}> <%x{106783}>| 8​: CLOSE1(10)
  46 <E%x{3bd}%x{4e9}> <%x{106783}>| 10​: OPEN2(12)
  46 <E%x{3bd}%x{4e9}> <%x{106783}>| 12​: TRIE-EXACTF[HWhw](19)
  46 <E%x{3bd}%x{4e9}> <%x{106783}>| State​: 1 Accepted​: 0
Segmentation fault​: 11 (core dumped)

Perl Info

Flags:
    category=core
    severity=high

Site configuration information for perl 5.10.1:

Configured by mark at Thu Oct 22 21:09:57 CEST 2009.

Summary of my perl5 (revision 5 version 10 subversion 1) configuration:
   
  Platform:
    osname=freebsd, osvers=7.2-release-p2, archname=amd64-freebsd
    uname='freebsd dorothy.ijs.si 7.2-release-p2 freebsd 7.2-release-p2 #0: wed jul 15 15:45:26 cest 2009 lesi@dorothy.ijs.si:usrobjusrsrcsysdorothy amd64 '
    config_args='-sde -Dprefix=/usr/local -Darchlib=/usr/local/lib/perl5/5.10.1/mach -Dprivlib=/usr/local/lib/perl5/5.10.1 -Dman3dir=/usr/local/lib/perl5/5.10.1/perl/man/man3 -Dman1dir=/usr/local/man/man1 -Dsitearch=/usr/local/lib/perl5/site_perl/5.10.1/mach -Dsitelib=/usr/local/lib/perl5/site_perl/5.10.1 -Dscriptdir=/usr/local/bin -Dsiteman3dir=/usr/local/lib/perl5/5.10.1/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Ui_malloc -Ui_iconv -Uinstallusrbinperl -Dcc=cc -Duseshrplib -Dinc_version_list=none -Dccflags=-DAPPLLIB_EXP="/usr/local/lib/perl5/5.10.1/BSDPAN" -Doptimize=-g -DDEBUGGING -Ud_dosuid -Ui_gdbm -Dusethreads=n -Dusemymalloc=n -Duse64bitint'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=undef, usemultiplicity=undef
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-DAPPLLIB_EXP="/usr/local/lib/perl5/5.10.1/BSDPAN" -DHAS_FPSETMASK -DHAS_FLOATINGPOINT_H -DDEBUGGING -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include',
    optimize='-g',
    cppflags='-DAPPLLIB_EXP="/usr/local/lib/perl5/5.10.1/BSDPAN" -DHAS_FPSETMASK -DHAS_FLOATINGPOINT_H -DDEBUGGING -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
    ccversion='', gccversion='4.2.1 20070719  [FreeBSD]', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -Wl,-E  -fstack-protector -L/usr/local/lib'
    libpth=/usr/lib /usr/local/lib
    libs=-lgdbm -lm -lcrypt -lutil
    perllibs=-lm -lcrypt -lutil
    libc=, so=so, useshrplib=true, libperl=libperl.so
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='  -Wl,-R/usr/local/lib/perl5/5.10.1/mach/CORE'
    cccdlflags='-DPIC -fPIC', lddlflags='-shared  -L/usr/local/lib -fstack-protector'

Locally applied patches:
    


@INC for perl 5.10.1:
    /usr/local/lib/perl5/5.10.1/BSDPAN
    /usr/local/lib/perl5/site_perl/5.10.1/mach
    /usr/local/lib/perl5/site_perl/5.10.1
    /usr/local/lib/perl5/5.10.1/mach
    /usr/local/lib/perl5/5.10.1
    .


Environment for perl 5.10.1:
    HOME=/root
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/sbin:/bin:/usr/sbin:/usr/bin:/usr/games:/usr/local/sbin:/usr/local/bin:/home/mark/bin:/usr/local/bin:/usr/local/sbin:/bin:/sbin:/usr/bin:/usr/sbin
    PERL_BADLANG (unset)
    SHELL=/usr/local/bin/bash

@p5pRT

p5pRT commented Oct 23, 2009

From Mark.Martinec@ijs.si

Some additional information on non-vulnerable versions, provided
by Jan iankko Lieskovsky / Red Hat Security Response Team:

This issue affects only perl-5.10.1
(I didn't check perl-5.11.1.tar.gz, though).

I did check perl-{5.8.0, 5.8.5, 5.8.8, 5.10.0}, and the provided
reproducer [2] doesn't crash on these versions, while it cleanly
crashes on upstream perl-5.10.1.tar.gz.
[...]
I have checked both versions of the PoC (with and without this add-on,
the latter being the original PoC) on perl-{5.8.0, 5.8.5, 5.8.8, 5.10.0};
these are not vulnerable.

Both versions (original and modified PoC) crash
with perl-5.10.1.tar.gz.

(Hopefully the above information can also be stated in the
upstream perlbug report, to explicitly mention the vulnerable and non-vulnerable versions.)

@p5pRT

p5pRT commented Oct 23, 2009

From @demerphq

2009/10/22 Mark Martinec <perlbug-followup@​perl.org>​:

# New Ticket Created by  Mark Martinec
# Please include the string​:  [perl #69973]
# in the subject line of all future correspondence about this issue.
# <URL​: http​://rt.perl.org/rt3/Ticket/Display.html?id=69973 >

This is a bug report for perl from Mark.Martinec@​ijs.si,
generated with the help of perlbug 1.39 running under perl 5.10.1.

-----------------------------------------------------------------
[Please describe your issue here]

Tracking down a reason for crashes of a perl process while processing
certain obfuscated spam messages, it turns out that an utf-8 character
with a large (and invalid) codepoint is causing a perl 5.10.1 crash
while matching such string to a particular regular expression.

This is happening on a FreeBSD 7.2, using perl as installed from ports
with no special settings.

Reducing the actual crashing application to a small test case,
here it is​:

#!/usr/bin/perl -T
 use strict;

 # Here is a HTML snippet from a malicious/obfuscated mail message.
 # Note the last character has an invalid and huge UTF-8 code
 # (as a result of an unrelated bug in HTML::Parser).
 #
 my $t = '<a>Attention Home&#959&#969n&#1257rs...1&#1109t '.
         'T&#1110&#1084e E&#957&#1257&#1075075</a>';

 $t =~ s/&#(\d+)/chr($1)/ge;    # convert HTML entities to UTF8
 $t .= substr($ENV{PATH},0,0);  # make it tainted

 # show character codes in the resulting string
 print join(", ", map {ord} split(//,$t)), "\n";

 # The following regexp evaluation crashes perl 5.10.1 on FreeBSD.
 # Note that $t must be tainted and must have the UTF8 flag on,
 # otherwise the crash seems to be avoided.

 $t =~ /( |\b)(http:|www\.)/i;

and here is the result (hand wrapped)​:

 60, 97, 62, 65, 116, 116, 101, 110, 116, 105, 111, 110, 32, 72, 111,
 109, 101, 959, 969, 110, 1257, 114, 115, 46, 46, 46, 49, 1109, 116,
 32, 84, 1110, 1084, 101, 32, 69, 957, 1257, 1075075, 60, 47, 97, 62
 Segmentation fault​: 11 (core dumped)
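
As a cross-check on the output above: the last code point, 1075075, is 0x106783, which lies in Supplementary Private Use Area-B (U+100000..U+10FFFD), so it is unassigned but still inside the Unicode range. The following is a minimal sketch, assuming any reasonably recent perl; the helper name `utf8_hex` is made up for illustration and is not part of the report. It prints the UTF-8 bytes perl stores for that character.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the report): return the UTF-8 bytes
# of a code point as space-separated upper-case hex.
sub utf8_hex {
    my ($cp) = @_;
    my $bytes = chr($cp);
    utf8::encode($bytes);    # in place: characters -> UTF-8 octets
    return join ' ', map { sprintf '%02X', ord($_) } split //, $bytes;
}

printf "U+%X -> %s\n", 1075075, utf8_hex(1075075);   # U+106783 -> F4 86 9E 83
```

The trailing bytes 86 9E 83 correspond to the octal escapes \206\236\203 that appear in the `stringarg` value of the gdb backtrace.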

Here is a backtrace as obtained from a core dump
(cut/pasted from screen, the actual 8-bit characters may be wrong)​:
$ gdb -c perl5.10.1.core /usr/local/bin/perl5.10.1
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...
Core was generated by `perl5.10.1'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /usr/local/lib/perl5/5.10.1/mach/CORE/libperl.so...done.
Loaded symbols for /usr/local/lib/perl5/5.10.1/mach/CORE/libperl.so
Reading symbols from /lib/libm.so.5...done.
Loaded symbols for /lib/libm.so.5
Reading symbols from /lib/libcrypt.so.4...done.
Loaded symbols for /lib/libcrypt.so.4
Reading symbols from /lib/libutil.so.7...done.
Loaded symbols for /lib/libutil.so.7
Reading symbols from /lib/libc.so.7...done.
Loaded symbols for /lib/libc.so.7
Reading symbols from /libexec/ld-elf.so.1...done.
Loaded symbols for /libexec/ld-elf.so.1
#0  0x00000000408bb101 in S_regmatch (reginfo=0x7fffffffe590, prog=0x411143a4) at regexec.c​:3049
3049                            REXEC_TRIE_READ_CHAR(trie_type, trie, widecharmap, uc,

Unfortunately this is just masking the cause; I'm pretty sure the
problem is in utf8.c.

You would have ended up in this code:

    case trie_utf8_fold: \
        if ( foldlen>0 ) { \
            uvc_unfolded = uvc = utf8n_to_uvuni( uscan, UTF8_MAXLEN, &len, uniflags ); \
            foldlen -= len; \
            uscan += len; \
            len=0; \
        } else { \
            uvc_unfolded = uvc = utf8n_to_uvuni( (U8*)uc, UTF8_MAXLEN, &len, uniflags ); \
            uvc = to_uni_fold( uvc, foldbuf, &foldlen ); \
            foldlen -= UNISKIP( uvc ); \
            uscan = foldbuf + UNISKIP( uvc ); \
        } \
        break;

I'm guessing in the second clause, probably in to_uni_fold().

(gdb) bt
#0  0x00000000408bb101 in S_regmatch (reginfo=0x7fffffffe590, prog=0x411143a4) at regexec.c​:3049
#1  0x00000000408b7b0a in S_regtry (reginfo=0x7fffffffe590, startpos=0x7fffffffe6d8) at regexec.c​:2355
#2  0x00000000408b6a7a in Perl_regexec_flags (prog=0x4114f1a0,
   stringarg=0x4111d6c0 "<a>Attention HomeοÏ\211nÓ©rs...1Ñ\225t TÑ\226мe Eνөô\206\236\203</a>",
   strend=0x4111d6f3 "/a>",
   strbeg=0x4111d6c0 "<a>Attention HomeοÏ\211nÓ©rs...1Ñ\225t TÑ\226мe Eνөô\206\236\203</a>", minend=0,
   sv=0x4113ec48, data=0x0, flags=3) at regexec.c​:2146
#3  0x00000000407864a3 in Perl_pp_match () at pp_hot.c​:1356
#4  0x000000004073fa4c in Perl_runops_debug () at dump.c​:1968
#5  0x00000000406905d8 in S_run_body (oldscope=1) at perl.c​:2431
#6  0x000000004068f9b0 in perl_run (my_perl=0x41102104) at perl.c​:2349
#7  0x0000000000400bf4 in main (argc=3, argv=0x7fffffffea90, env=0x7fffffffeab0) at perlmain.c​:117

(gdb)

And lastly, here is a perl debug output using the -Dr command line option​:

Thanks, your report is very complete.

Compiling REx "( |\b)(http​:|www\.)"
Final program​:
  1​: OPEN1 (3)
  3​:   BRANCH (6)
  4​:     EXACTF < > (8)
  6​:   BRANCH (FAIL)
  7​:     BOUND (8)
  8​: CLOSE1 (10)
 10​: OPEN2 (12)
 12​:   TRIE-EXACTF[HWhw] (19)
       <http​:>
       <www.>
 19​: CLOSE2 (21)
 21​: END (0)
minlen 4
Omitting $` $& $' support.

EXECUTING...
[...]
 46 <E%x{3bd}%x{4e9}> <%x{106783}>|  1​:OPEN1(3)
 46 <E%x{3bd}%x{4e9}> <%x{106783}>|  3​:BRANCH(6)
 46 <E%x{3bd}%x{4e9}> <%x{106783}>|  4​:  EXACTF < >(8)
                                   failed...
 46 <E%x{3bd}%x{4e9}> <%x{106783}>|  6​:BRANCH(8)
 46 <E%x{3bd}%x{4e9}> <%x{106783}>|  7​:  BOUND(8)
 46 <E%x{3bd}%x{4e9}> <%x{106783}>|  8​:  CLOSE1(10)
 46 <E%x{3bd}%x{4e9}> <%x{106783}>| 10​:  OPEN2(12)
 46 <E%x{3bd}%x{4e9}> <%x{106783}>| 12​:  TRIE-EXACTF[HWhw](19)
 46 <E%x{3bd}%x{4e9}> <%x{106783}>|      State​:    1 Accepted​:    0

I think the regex engine is the only place that uses the Unicode
folding logic. I'll try to dig further.

cheers,
Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented Oct 23, 2009

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented Oct 23, 2009

From perl@profvince.com

Bisected down to 8902bb0
(http://perl5.git.perl.org/perl.git/commit/8902bb05):

Author: Slaven Rezic <slaven@rezic.de>
Date: Sun Jan 4 17:28:33 2009 +0100

  Another regexp failure with utf8-flagged string and byte-flagged pattern (reminder)

  Date: 17 Nov 2007 16:29:29 +0100
  Message-ID: <87r6iohova.fsf@biokovo-amd64.herceg.de>

  (cherry picked from commit c012444)

Vincent.

p5pRT commented Oct 24, 2009

From @demerphq

2009/10/23 Vincent Pit <perl@​profvince.com>​:

Bisected down to 8902bb0
(http://perl5.git.perl.org/perl.git/commit/8902bb05):

Author: Slaven Rezic <slaven@rezic.de>
Date: Sun Jan 4 17:28:33 2009 +0100

  Another regexp failure with utf8-flagged string and byte-flagged pattern (reminder)

  Date: 17 Nov 2007 16:29:29 +0100
  Message-ID: <87r6iohova.fsf@biokovo-amd64.herceg.de>

  (cherry picked from commit c012444)

Thanks, that helps a lot.
Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented Oct 25, 2009

From @demerphq

2009/10/23 Vincent Pit <perl@​profvince.com>​:

Bisected down to 8902bb0
(http://perl5.git.perl.org/perl.git/commit/8902bb05):

Author: Slaven Rezic <slaven@rezic.de>
Date: Sun Jan 4 17:28:33 2009 +0100

  Another regexp failure with utf8-flagged string and byte-flagged pattern (reminder)

  Date: 17 Nov 2007 16:29:29 +0100
  Message-ID: <87r6iohova.fsf@biokovo-amd64.herceg.de>

  (cherry picked from commit c012444)

The simple fix is to add a guard to the if clause to prevent looking
up chars>255.
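
The guard idea can be sketched in Perl, assuming the lookup goes through a 256-entry fold table indexed by code point. The names `@latin1_fold` and `fold_latin1` are hypothetical, and this is not the regexec.c code. In C an unguarded index past the end of such a table reads out of bounds, whereas Perl would merely return undef, so the sketch only illustrates the shape of the check.

```perl
use strict;
use warnings;

# Hypothetical 256-entry table: maps A-Z to a-z and leaves every
# other byte value unchanged (ASCII-only, non-Unicode folding).
my @latin1_fold = map {
    ($_ >= ord('A') && $_ <= ord('Z')) ? $_ + 32 : $_
} 0 .. 255;

# Guarded lookup: only code points that fit the table are indexed.
# Larger code points are returned unchanged, to be handled elsewhere.
sub fold_latin1 {
    my ($cp) = @_;
    return $cp < 256 ? $latin1_fold[$cp] : $cp;
}

printf "%d %d\n", fold_latin1(ord('H')), fold_latin1(1075075);
```

With the guard in place, the code point 1075075 from the reproducer never reaches the table at all.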

The thing is, the original patch sort of hides a deeper problem. It may
be that the trie optimisation just has to be disabled for case-insensitive
matches, as there doesn't seem to be a way to support the run-time
"decide how to match based on the string AND the pattern" behaviour of
earlier perls in the trie structure without breaking case-insensitive
matches.

For instance, in old perls:

use Test::More;
use Encode;
{
    # more TRIE/AHOCORASICK problems with mixed utf8 / latin-1 and case folding
    for my $chr (181) { # 160 .. 255) {
        my $chr_byte = chr($chr);
        my $chr_utf8 = chr($chr); utf8::upgrade($chr_utf8);
        my $chr_high = chr(0x3bc);
        my $rx = qr{.?(?:$chr_byte|X)}i;
        ok($chr_utf8 =~ $rx, "utf8/latin, codepoint $chr " . encode_utf8($chr_utf8));
        ok($chr_high =~ $rx, "utf8/latin, codepoint $chr " . encode_utf8($chr_utf8));
    }
}
done_testing();

should match. In TRIE'd perls it won't, because under Unicode rules the
following folding rules apply, which do not apply in the non-Unicode
behaviour:

00B5; C; 03BC; # MICRO SIGN
00C0; C; 00E0; # LATIN CAPITAL LETTER A WITH GRAVE
00C1; C; 00E1; # LATIN CAPITAL LETTER A WITH ACUTE
00C2; C; 00E2; # LATIN CAPITAL LETTER A WITH CIRCUMFLEX
00C3; C; 00E3; # LATIN CAPITAL LETTER A WITH TILDE
00C4; C; 00E4; # LATIN CAPITAL LETTER A WITH DIAERESIS
00C5; C; 00E5; # LATIN CAPITAL LETTER A WITH RING ABOVE
00C6; C; 00E6; # LATIN CAPITAL LETTER AE
00C7; C; 00E7; # LATIN CAPITAL LETTER C WITH CEDILLA
00C8; C; 00E8; # LATIN CAPITAL LETTER E WITH GRAVE
00C9; C; 00E9; # LATIN CAPITAL LETTER E WITH ACUTE
00CA; C; 00EA; # LATIN CAPITAL LETTER E WITH CIRCUMFLEX
00CB; C; 00EB; # LATIN CAPITAL LETTER E WITH DIAERESIS
00CC; C; 00EC; # LATIN CAPITAL LETTER I WITH GRAVE
00CD; C; 00ED; # LATIN CAPITAL LETTER I WITH ACUTE
00CE; C; 00EE; # LATIN CAPITAL LETTER I WITH CIRCUMFLEX
00CF; C; 00EF; # LATIN CAPITAL LETTER I WITH DIAERESIS
00D0; C; 00F0; # LATIN CAPITAL LETTER ETH
00D1; C; 00F1; # LATIN CAPITAL LETTER N WITH TILDE
00D2; C; 00F2; # LATIN CAPITAL LETTER O WITH GRAVE
00D3; C; 00F3; # LATIN CAPITAL LETTER O WITH ACUTE
00D4; C; 00F4; # LATIN CAPITAL LETTER O WITH CIRCUMFLEX
00D5; C; 00F5; # LATIN CAPITAL LETTER O WITH TILDE
00D6; C; 00F6; # LATIN CAPITAL LETTER O WITH DIAERESIS
00D8; C; 00F8; # LATIN CAPITAL LETTER O WITH STROKE
00D9; C; 00F9; # LATIN CAPITAL LETTER U WITH GRAVE
00DA; C; 00FA; # LATIN CAPITAL LETTER U WITH ACUTE
00DB; C; 00FB; # LATIN CAPITAL LETTER U WITH CIRCUMFLEX
00DC; C; 00FC; # LATIN CAPITAL LETTER U WITH DIAERESIS
00DD; C; 00FD; # LATIN CAPITAL LETTER Y WITH ACUTE
00DE; C; 00FE; # LATIN CAPITAL LETTER THORN
00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S

I suppose any non-Unicode pattern that doesn't use these can still be
case-insensitively matched with a trie.
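
The folding rules listed above can be observed from Perl itself. This is a minimal sketch, assuming perl 5.16 or later (the `fc` function did not exist in the 5.10 series discussed here); `fold_hex` is a hypothetical helper name. The `unicode_strings` feature is enabled so that Unicode rules apply even to strings that are not UTF-8 flagged internally.

```perl
use strict;
use warnings;
use feature qw(fc unicode_strings);

# Hypothetical helper: case-fold one code point under Unicode rules
# and return the resulting code points as space-separated hex.
sub fold_hex {
    my ($cp) = @_;
    return join ' ', map { sprintf '%04X', ord($_) } split //, fc(chr($cp));
}

for my $cp (0xB5, 0xC0, 0xDF) {
    printf "%04X folds to %s\n", $cp, fold_hex($cp);
}
# 00B5 folds to 03BC
# 00C0 folds to 00E0
# 00DF folds to 0073 0073
```

MICRO SIGN is the awkward case for a byte-oriented table: its fold target, U+03BC, does not fit in a single byte, and SHARP S folds to two characters.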

Hrmph.

Cheers,
Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented Oct 25, 2009

From @demerphq

Resolved by​:

commit 0abd0d7
Author: Yves Orton <demerphq@gmail.com>
Date: Sun Oct 25 20:37:08 2009 +0100

  disable non-unicode case insensitive trie matching

  Also revert 8902bb0 as it merely
  masked one symptom of the deeper problems.

  Also fixes RT #69973, which was a segfault which was exposed by
  8902bb0, see the ticket for further details.

  http://rt.perl.org/rt3//Public/Bug/Display.html?id=69973

  At the core of this is the problem that in unicode matching a bunch
  of code points have case folding rules beyond just A-Z/a-z. Since
  the case folding rules are decided at runtime by the string, we can't
  use the same TRIE tables for both unicode/non-unicode matching.

  Until this is reconciled or some other solution is found, case
  insensitive matching only gets the TRIE optimisation when the
  pattern is unicode.

  From CaseFolding.txt:
 
  00B5; C; 03BC; # MICRO SIGN
  00C0; C; 00E0; # LATIN CAPITAL LETTER A WITH GRAVE
  00C1; C; 00E1; # LATIN CAPITAL LETTER A WITH ACUTE
  00C2; C; 00E2; # LATIN CAPITAL LETTER A WITH CIRCUMFLEX
  00C3; C; 00E3; # LATIN CAPITAL LETTER A WITH TILDE
  00C4; C; 00E4; # LATIN CAPITAL LETTER A WITH DIAERESIS
  00C5; C; 00E5; # LATIN CAPITAL LETTER A WITH RING ABOVE
  00C6; C; 00E6; # LATIN CAPITAL LETTER AE
  00C7; C; 00E7; # LATIN CAPITAL LETTER C WITH CEDILLA
  00C8; C; 00E8; # LATIN CAPITAL LETTER E WITH GRAVE
  00C9; C; 00E9; # LATIN CAPITAL LETTER E WITH ACUTE
  00CA; C; 00EA; # LATIN CAPITAL LETTER E WITH CIRCUMFLEX
  00CB; C; 00EB; # LATIN CAPITAL LETTER E WITH DIAERESIS
  00CC; C; 00EC; # LATIN CAPITAL LETTER I WITH GRAVE
  00CD; C; 00ED; # LATIN CAPITAL LETTER I WITH ACUTE
  00CE; C; 00EE; # LATIN CAPITAL LETTER I WITH CIRCUMFLEX
  00CF; C; 00EF; # LATIN CAPITAL LETTER I WITH DIAERESIS
  00D0; C; 00F0; # LATIN CAPITAL LETTER ETH
  00D1; C; 00F1; # LATIN CAPITAL LETTER N WITH TILDE
  00D2; C; 00F2; # LATIN CAPITAL LETTER O WITH GRAVE
  00D3; C; 00F3; # LATIN CAPITAL LETTER O WITH ACUTE
  00D4; C; 00F4; # LATIN CAPITAL LETTER O WITH CIRCUMFLEX
  00D5; C; 00F5; # LATIN CAPITAL LETTER O WITH TILDE
  00D6; C; 00F6; # LATIN CAPITAL LETTER O WITH DIAERESIS
  00D8; C; 00F8; # LATIN CAPITAL LETTER O WITH STROKE
  00D9; C; 00F9; # LATIN CAPITAL LETTER U WITH GRAVE
  00DA; C; 00FA; # LATIN CAPITAL LETTER U WITH ACUTE
  00DB; C; 00FB; # LATIN CAPITAL LETTER U WITH CIRCUMFLEX
  00DC; C; 00FC; # LATIN CAPITAL LETTER U WITH DIAERESIS
  00DD; C; 00FD; # LATIN CAPITAL LETTER Y WITH ACUTE
  00DE; C; 00FE; # LATIN CAPITAL LETTER THORN
  00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S

p5pRT commented Oct 25, 2009

@demerphq - Status changed from 'open' to 'resolved'
