failure when line length approaches 2^30 characters #10212

Open
p5pRT opened this issue Mar 4, 2010 · 10 comments

p5pRT commented Mar 4, 2010

Migrated from rt.perl.org#73306 (status was 'open')

Searchable as RT73306$

p5pRT commented Mar 4, 2010

From mbmiller+m@gmail.com

This bug is highly reproducible. See the attached perlbug -d output, but we
have found the same bug on several other systems.

The problem is that perl fails when line length approaches 2^30 characters
and I use a replacement string that increases that line length. This is
how I first discovered it: I had a large file, about 1.5 GB, and wanted to
add three characters to the beginning of the first line, so I did this...

$ perl -pi -e 'BEGIN{undef $/} ; s/\A/ID /' foo.txt
Substitution loop at -e line 1, <> chunk 1.

...with this result:

$ ls -l foo.txt
-rw-r--r-- 1 mbmiller mcguem 0 2010-02-26 18:26 foo.txt

That is, zero bytes remain from a 1.5 GB file. (I know that I should have
used .bak, but I could easily recreate the file and I wasn't worried.)
After that I set out to find the source of the problem. It seems that
slurp mode treats the file as one long line, but I get the same problem
when not in slurp mode if the line is long enough. I made a file of one
long line (foo.txt) and did this with it:

$ head -c 1073741818 foo.txt | perl -pe 's/./AB/' > bar.txt

That did not fail, so I tried this:

$ head -c 1073741819 foo.txt | perl -pe 's/./AB/' > bar.txt

But that failed (with the same "Substitution ... chunk 1." message as
before). I find that perl fails when the line is 1073741819 characters or
longer and I try to make the line longer by at least one character.
Making an even longer line shorter is not a problem.

Of course, 2^30 = 1073741824, which happens to be exactly 5 bytes greater
than the point where we start to see failure. A friend suggested that
this might be due to perl internally using two numbers per UTF-8 character
and using signed integers to index them, thus pushing up against a 32-bit
limit. I don't know, but thought I'd mention that.

cc: three people who helped me figure this out

Mike

--
Michael B. Miller, Ph.D.
Bioinformatics Specialist
Minnesota Center for Twin and Family Research
Department of Psychology
University of Minnesota

p5pRT commented Mar 4, 2010

From mbmiller+m@gmail.com


Flags​:
  category=
  severity=


This perlbug was built using Perl v5.8.3 - Fri Jul 11 14​:54​:19 UTC 2008
It is being executed now by Perl v5.8.3 - Fri Jul 11 14​:47​:00 UTC 2008.

Site configuration information for perl v5.8.3​:

Configured by abuild at Fri Jul 11 14​:47​:00 UTC 2008.

Summary of my perl5 (revision 5.0 version 8 subversion 3) configuration​:
  Platform​:
  osname=linux, osvers=2.6.5, archname=ia64-linux-thread-multi
  uname='linux regiomontanus 2.6.5 #1 smp thu may 17 14​:00​:09 utc 2007 ia64 ia64 ia64 gnulinux '
  config_args='-ds -e -Dprefix=/usr -Dvendorprefix=/usr -Dinstallusrbinperl -Dusethreads -Di_db -Di_dbm -Di_ndbm -Di_gdbm -Duseshrplib=true -Doptimize=-O2 -fmessage-length=0 -Wall -Wall -pipe'
  hint=recommended, useposix=true, d_sigaction=define
  usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
  useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
  use64bitint=define use64bitall=define uselongdouble=undef
  usemymalloc=n, bincompat5005=undef
  Compiler​:
  cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -fno-strict-aliasing -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
  optimize='-O2 -fmessage-length=0 -Wall -Wall -pipe',
  cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -fno-strict-aliasing'
  ccversion='', gccversion='3.3.3 (SuSE Linux)', gccosandvers=''
  intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
  d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
  ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
  alignbytes=8, prototype=define
  Linker and Libraries​:
  ld='cc', ldflags =''
  libpth=/lib /usr/lib /usr/local/lib
  libs=-lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
  perllibs=-lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
  libc=, so=so, useshrplib=true, libperl=libperl.so
  gnulibc_version='2.3.5'
  Dynamic Linking​:
  dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic -Wl,-rpath,/usr/lib/perl5/5.8.3/ia64-linux-thread-multi/CORE'
  cccdlflags='-fPIC', lddlflags='-shared'

Locally applied patches​:
  SPRINTF0 - fixes for sprintf formatting issues - CVE-2005-3962


@​INC for perl v5.8.3​:
  /usr/lib/perl5/5.8.3/ia64-linux-thread-multi
  /usr/lib/perl5/5.8.3
  /usr/lib/perl5/site_perl/5.8.3/ia64-linux-thread-multi
  /usr/lib/perl5/site_perl/5.8.3
  /usr/lib/perl5/site_perl
  /usr/lib/perl5/vendor_perl/5.8.3/ia64-linux-thread-multi
  /usr/lib/perl5/vendor_perl/5.8.3
  /usr/lib/perl5/vendor_perl
  .


Environment for perl v5.8.3​:
  HOME=/home/msi/mbmiller
  LANG=en_US.UTF-8
  LANGUAGE (unset)
  LC_COLLATE=C
  LD_LIBRARY_PATH=/usr/local/intel/compiler81/lib​:/usr/local/intel/idb81/lib
  LOGDIR (unset)
  PATH=/usr/local/intel/idb81/bin​:/usr/local/intel/compiler81/bin​:/usr/local/bin​:/usr/bin​:/usr/X11R6/bin​:/bin​:/usr/games​:/opt/gnome/bin​:/opt/kde3/bin​:/usr/lib/java/bin​:/usr/local/sw/bin​:/usr/local/moab/bin
  PERL_BADLANG (unset)
  SHELL=/bin/bash

p5pRT commented Mar 7, 2010

From mbmiller+m@gmail.com

Hello Perl Debuggers--

One of our supercomputing staff looked into this and the origin of the
error in the source code. See below.

Mike

Date: Fri, 5 Mar 2010 11:05:35 -0600
From: Mark Nelson
To: mbmiller+m@gmail.com
Subject: [MSI #17056] perl failure when line length approaches 2^30 characters

Hi Mike,

I took a look into this, and basically perl thinks you have an infinite
substitution loop because, as you guessed, there is a 32-bit integer limit
in the source. As to why it breaks a bit earlier than the 32-bit limit, the
code explains all.

From pp_hot.c in the perl src:

    I32 iters = 0;
    I32 maxiters;

    maxiters = 2 * slen + 10;   /* We can match twice at each
                                   position, once with zero-length,
                                   second time with non-zero. */

    if (iters++ > maxiters)
        DIE(aTHX_ "Substitution loop");

So assuming signed 32-bit ints, maxiters overflows once slen reaches
(2^31 - 10) / 2 = 1073741819, which is exactly the threshold you are
seeing in your test.
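
The arithmetic behind that threshold can be checked with a small C sketch. This is not the perl source: the helper names are hypothetical, and the cap is evaluated in 64-bit arithmetic so the sketch itself never overflows. The wrapped value assumes two's-complement 32-bit storage and a non-negative slen below 2^31.

```c
#include <stdint.h>

/* Hypothetical helper: the iteration cap from the excerpt (2 * slen + 10),
   evaluated in 64 bits so that the computation itself cannot overflow. */
int64_t maxiters_wide(int64_t slen)
{
    return 2 * slen + 10;
}

/* Returns 1 if the cap still fits in a signed 32-bit integer, else 0. */
int maxiters_fits_i32(int64_t slen)
{
    return maxiters_wide(slen) <= INT32_MAX;
}

/* The value a two's-complement 32-bit variable would end up holding.
   Assumption: 0 <= slen < 2^31, so a single wrap by 2^32 is enough. */
int64_t maxiters_wrapped(int64_t slen)
{
    int64_t wide = maxiters_wide(slen);
    return wide > INT32_MAX ? wide - 4294967296LL : wide;
}
```

With these helpers, maxiters_fits_i32(1073741818) is true and maxiters_fits_i32(1073741819) is false. At 1073741819 the wrapped cap is -2147483648, so the very first `iters++ > maxiters` comparison (0 against a negative cap) already succeeds and perl reports a substitution loop.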

To do what you want to do, something like this should be a (faster)
workaround:

perl -pi -e 'print "ID " if $. == 1' foo.txt

Mark


p5pRT commented Apr 6, 2010

From mbmiller+m@gmail.com

Hello PerlBug People--

We never received a reply to this message from a month ago. We are
wondering what you thought of it.

Thanks.

Mike

Please note that I am using a new email address: mbmiller+m@gmail.com.
The "+m" is for MCTFR and I use it to give higher priority to MCTFR
messages. My old email address, mbmiller@taxa.epi.umn.edu, will stop
working because that old computer is being retired.

p5pRT commented Apr 6, 2010

From @iabyn

On Mon, Apr 05, 2010 at 03:05:39PM -0500, Mike Miller wrote:

We never received a reply to this message from a month ago. We are
wondering what you thought of it.

Sorry that no-one replied. We tend to have a lot of bug reports and not
enough people to reply to them. An initial glance at your issue would
indicate that it's one of a large number of things in the perl core that
use 32 bits even on 64-bit platforms, and which all need refactoring at
some point, though that isn't necessarily likely to happen in a hurry.

--
"Strange women lying in ponds distributing swords is no basis for a system
of government. Supreme executive power derives from a mandate from the
masses, not from some farcical aquatic ceremony."
  -- Dennis, "Monty Python and the Holy Grail"

p5pRT commented Apr 6, 2010

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented Apr 6, 2010

From @ikegami

On Tue, Apr 6, 2010 at 7:32 AM, Dave Mitchell <davem@iabyn.com> wrote:

On Mon, Apr 05, 2010 at 03:05:39PM -0500, Mike Miller wrote:

We never received a reply to this message from a month ago. We are
wondering what you thought of it.

Sorry that no-one replied. We tend to have a lot of bug reports and not
enough people to reply to them. An initial glance at your issue would
indicate that it's one of a large number of things in the perl core that
use 32 bits even on 64-bit platforms, and which all need refactoring at
some point, though that isn't necessarily likely to happen in a hurry.

Seems the OP actually talks about an infinite loop when

maxiters = 2 * slen + 10;

overflows. It's hard to tell because he didn't include any code or any
description of the symptoms. An overflow check could be added to prevent the
infinite loop.
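
One possible shape for such an overflow check, as a sketch only. The function name is hypothetical, this is not the change applied to perl, and it assumes slen is non-negative. It clamps the cap at INT32_MAX instead of letting the multiplication overflow.

```c
#include <stdint.h>

/* Hypothetical overflow-safe version of "maxiters = 2 * slen + 10".
   If the result would not fit in a signed 32-bit integer, return the
   largest representable cap instead of a wrapped (negative) value.
   Assumption: slen >= 0. */
int32_t safe_maxiters(int64_t slen)
{
    /* 2 * slen + 10 <= INT32_MAX  <=>  slen <= (INT32_MAX - 10) / 2 */
    if (slen > (INT32_MAX - 10) / 2)
        return INT32_MAX;
    return (int32_t)(2 * slen + 10);
}
```

Clamping only removes the spurious failure at the overflow point; the counters remain 32-bit, so widening iters and maxiters would be the more thorough option.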

p5pRT commented Apr 6, 2010

From @ikegami

On Tue, Apr 6, 2010 at 11:48 AM, Mike Miller <mbmiller+m@gmail.com> wrote:

What sort of code are you missing?

My apologies, I was looking at a reply thinking it was the OP.

p5pRT commented Apr 7, 2010

From mbmiller+m@gmail.com

On Tue, 6 Apr 2010, Eric Brine wrote:

On Tue, Apr 6, 2010 at 7:32 AM, Dave Mitchell <davem@iabyn.com> wrote:

On Mon, Apr 05, 2010 at 03:05:39PM -0500, Mike Miller wrote:

We never received a reply to this message from a month ago. We are
wondering what you thought of it.

Sorry that no-one replied. We tend to have a lot of bug reports and not
enough people to reply to them. An initial glance at your issue would
indicate that it's one of a large number of things in the perl core that
use 32 bits even on 64-bit platforms, and which all need refactoring at
some point, though that isn't necessarily likely to happen in a hurry.

Seems the OP actually talks about an infinite loop when

maxiters = 2 * slen + 10;

overflows. It's hard to tell because he didn't include any code or any
description of the symptoms. An overflow check could be added to prevent
the infinite loop.

What sort of code are you missing? My original post, the first message in
this ticket, included perl code. Let me know what else you need.

Thanks.

Mike
