
memory corruption in regexp matching on an UTF-8 string #10192

Closed
p5pRT opened this issue Feb 22, 2010 · 7 comments

Comments

@p5pRT

p5pRT commented Feb 22, 2010

Migrated from rt.perl.org#72996 (status was 'resolved')

Searchable as RT72996$

@p5pRT
Author

p5pRT commented Feb 22, 2010

From @dolmen

This is a bug report for perl from dolmen@cpan.org,
generated with the help of perlbug 1.36 running under perl 5.10.0.


This is perl 5.10.0 on amd64

This is a case where matching a UTF-8 string (read from a file opened with
'<:utf8') against a regexp corrupts perl's memory and eventually crashes with a core dump.

The test case uses Regexp::Grammars 1.002, which is a pure-Perl module.

See the full report, including the test case, here:
http://rt.cpan.org/Public/Bug/Display.html?id=54819

See also the Ubuntu bug I reported (including the core dump):
https://bugs.launchpad.net/ubuntu/+source/perl/+bug/524817



Flags:
  category=core
  severity=critical


Site configuration information for perl 5.10.0:

Configured by Debian Project at Thu Oct 1 22:36:47 UTC 2009.

Summary of my perl5 (revision 5 version 10 subversion 0) configuration:
  Platform:
  osname=linux, osvers=2.6.24-23-server, archname=x86_64-linux-gnu-thread-multi
  uname='linux crested 2.6.24-23-server #1 smp wed apr 1 22:14:30 utc 2009 x86_64 gnulinux '
  config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -Dcccdlflags=-fPIC -Darchname=x86_64-linux-gnu -Dprefix=/usr -Dprivlib=/usr/share/perl/5.10 -Darchlib=/usr/lib/perl/5.10 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.10.0 -Dsitearch=/usr/local/lib/perl/5.10.0 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Ud_ualarm -Uusesfio -Uusenm -DDEBUGGING=-g -Doptimize=-O2 -Duseshrplib -Dlibperl=libperl.so.5.10.0 -Dd_dosuid -des'
  hint=recommended, useposix=true, d_sigaction=define
  useithreads=define, usemultiplicity=define
  useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
  use64bitint=define, use64bitall=define, uselongdouble=undef
  usemymalloc=n, bincompat5005=undef
  Compiler:
  cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
  optimize='-O2 -g',
  cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fno-strict-aliasing -pipe -I/usr/local/include'
  ccversion='', gccversion='4.4.1', gccosandvers=''
  intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
  d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
  ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
  alignbytes=8, prototype=define
  Linker and Libraries:
  ld='cc', ldflags =' -L/usr/local/lib'
  libpth=/usr/local/lib /lib /usr/lib /lib64 /usr/lib64
  libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
  perllibs=-ldl -lm -lpthread -lc -lcrypt
  libc=/lib/libc-2.10.1.so, so=so, useshrplib=true, libperl=libperl.so.5.10.0
  gnulibc_version='2.10.1'
  Dynamic Linking:
  dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
  cccdlflags='-fPIC', lddlflags='-shared -O2 -g -L/usr/local/lib'

Locally applied patches:
 


@INC for perl 5.10.0:
  /home/dolmen/perl/lib/perl5/x86_64-linux-gnu-thread-multi
  /home/dolmen/perl/lib/perl5
  /home/dolmen/perl/share/perl/5.10.0
  /home/dolmen/perl/share/perl
  /etc/perl
  /usr/local/lib/perl/5.10.0
  /usr/local/share/perl/5.10.0
  /usr/lib/perl5
  /usr/share/perl5
  /usr/lib/perl/5.10
  /usr/share/perl/5.10
  /usr/local/lib/site_perl
  .


Environment for perl 5.10.0:
  HOME=/home/dolmen
  LANG=fr_FR.UTF-8
  LANGUAGE=fr_FR.UTF-8
  LD_LIBRARY_PATH (unset)
  LOGDIR (unset)
  PATH=/home/dolmen/bin:/home/dolmen/perl/bin:/home/dolmen/bin:/home/dolmen/perl/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/home/dolmen/applis/google_appengine
  PERL5LIB=/home/dolmen/perl/lib/perl5:/home/dolmen/perl/share/perl
  PERL_BADLANG (unset)
  SHELL=/bin/bash

@p5pRT
Author

p5pRT commented Mar 15, 2010

From @iabyn

Requester commented in #73516:

Bug #72996 is probably related (or the same?): it is also a case of a
crash in regexp matching of a UTF-8 string.
The code to reproduce it is attached here:
http://rt.cpan.org/Public/Bug/Display.html?id=54819
I'm sorry, I can't test with other perl versions.


@p5pRT
Author

p5pRT commented Mar 15, 2010

@iabyn - Status changed from 'new' to 'open'

@p5pRT
Author

p5pRT commented Jun 7, 2011

From @khwilliamson

I have tried to reproduce this on 5.14.0, which has many regexp UTF-8
fixes. I do not get a segfault. But I don't know what I should get.
It prints out many error messages of the form:
===================> Trying <grammar> from position 3161
  \FAIL <grammar>

And then exits with code 0377. Perl 5.10 is no longer supported.

--Karl Williamson

@p5pRT
Author

p5pRT commented Nov 14, 2012

From @khwilliamson

On Mon Jun 06 21:14:54 2011, khw wrote:

I have tried to reproduce this on 5.14.0, which has many regexp UTF-8
fixes. I do not get a segfault. But I don't know what I should get.
It prints out many error messages of the form:
===================> Trying <grammar> from position 3161
\FAIL <grammar>

And then exits with code 0377. Perl 5.10 is no longer supported.

--Karl Williamson

The OP wrote that this bug is the same as, or related to, #73516, which has
been fixed. I have had no response to that comment on this ticket in
more than a year. I propose closing this ticket if there is no response
within one more month.

--
Karl Williamson

@p5pRT
Author

p5pRT commented Nov 15, 2012

From @dolmen

On Wed Nov 14 09:36:52 2012, khw wrote:

On Mon Jun 06 21:14:54 2011, khw wrote:

I have tried to reproduce this on 5.14.0, which has many regexp UTF-8
fixes. I do not get a segfault. But I don't know what I should get.

I'm the OP. I tried my original test case with perl 5.14.2 and
Regexp::Grammars 1.021.
I can confirm that I no longer get the segfault.

It prints out many error messages of the form:
===================> Trying <grammar> from position 3161
\FAIL <grammar>
And then exits with code 0377.

It looks like Regexp::Grammars is running in debug mode. I wonder why.

But the FAILs come from my grammar, which is invalid: on line 43, a 'T' was
missing before <Time>, which is needed to match ISO 8601 times correctly. Of course,
I could not find that bug at the time because of the segfault.

Thanks for the fixes! I'm closing the ticket.

Olivier Mengué (DOLMEN).

@p5pRT
Author

p5pRT commented Nov 15, 2012

@dolmen - Status changed from 'open' to 'resolved'
