panic: sv_setpvn called with negative strlen -1 [rt.cpan.org #76824] #12071

Closed
p5pRT opened this issue Apr 25, 2012 · 18 comments

p5pRT commented Apr 25, 2012

Migrated from rt.perl.org#112608 (status was 'resolved')

Searchable as RT112608$


p5pRT commented Apr 25, 2012

From @steve-m-hay

Created by @steve-m-hay

Running the program below with the attached utf8.txt input file produces
the following crash:
panic: sv_setpvn called with negative strlen -1 at utf8.pl line 4, <$rh>
line 28.
Close with partial character at utf8.pl line 4, <$rh> line 28.

open my $rh, '<:encoding(UTF-8)', 'utf8.txt' or die $!;
open my $wh, '>:encoding(ISO-8859-1)', 'iso88591.txt' or die $!;
#select((select($wh), $| = 1)[0]);
print $wh $_ while <$rh>;
close $wh;
close $rh;

Obviously the (Greek) characters in the input file cannot be converted
to ISO-8859-1, but perl shouldn't crash.
Uncommenting the select() call strangely makes the crash go away. So
does deleting any single line from the input file.

The crash isn't new. I got the same result with 5.8.9.

Perl Info

Flags:
    category=core
    severity=medium

Site configuration information for perl 5.15.9:

Configured by shay at Wed Apr 25 08:54:19 2012.

Summary of my perl5 (revision 5 version 15 subversion 9) configuration:
  Commit id: 906024c7fead4086ed911b8052d784aa07c2f1e2
  Platform:
    osname=MSWin32, osvers=6.1, archname=MSWin32-x86-multi-thread
    uname=''
    config_args='undef'
    hint=recommended, useposix=true, d_sigaction=undef
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=undef, use64bitall=undef, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cl', ccflags ='-nologo -GF -W3 -MD -Zi -DNDEBUG -O1 -DWIN32
-D_CONSOLE -DNO_STRICT -D_CRT_SECURE_NO_DEPRECATE
-D_CRT_NONSTDC_NO_DEPRECATE  -DPERL_TEXTMODE_SCRIPTS
-DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS -DUSE_PERLIO',
    optimize='-MD -Zi -DNDEBUG -O1',
    cppflags='-DWIN32'
    ccversion='16.00.40219.01', gccversion='', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=undef, longlongsize=8, d_longdbl=define, longdblsize=8
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='__int64',
lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='link', ldflags ='-nologo -nodefaultlib -debug -opt:ref,icf
-libpath:"c:\perl\lib\CORE"  -machine:x86
"/manifestdependency:type='Win32'
name='Microsoft.Windows.Common-Controls' version='6.0.0.0'
processorArchitecture='*' publicKeyToken='6595b64144ccf1df'
language='*'"'
    libpth=\lib
    libs=oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib
comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib
netapi32.lib uuid.lib ws2_32.lib mpr.lib winmm.lib  version.lib
odbc32.lib odbccp32.lib comctl32.lib msvcrt.lib
    perllibs=oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib
comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib
netapi32.lib uuid.lib ws2_32.lib mpr.lib winmm.lib  version.lib
odbc32.lib odbccp32.lib comctl32.lib msvcrt.lib
    libc=msvcrt.lib, so=dll, useshrplib=true, libperl=perl515.lib
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_win32.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' '
    cccdlflags=' ', lddlflags='-dll -nologo -nodefaultlib -debug
-opt:ref,icf  -libpath:"c:\perl\lib\CORE"  -machine:x86
"/manifestdependency:type='Win32'
name='Microsoft.Windows.Common-Controls' version='6.0.0.0'
processorArchitecture='*' publicKeyToken='6595b64144ccf1df'
language='*'"'

Locally applied patches:
    


@INC for perl 5.15.9:
    C:/perl/site/lib
    C:/perl/lib
    .


Environment for perl 5.15.9:
    HOME (unset)
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
 
PATH=C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\
System32\WindowsPowerShell\v1.0\;C:\Program Files (x86)\Microsoft SQL
Server\100\Tools\Binn\;C:\Program Files\Microsoft SQL
Server\100\Tools\Binn\;C:\Program Files\Microsoft SQL
Server\100\DTS\Binn\;C:\Program Files (x86)\Common Files\Roxio
Shared\OEM\12.0\DLLShared;C:\Program Files (x86)\Microsoft Team
Foundation Server 2010 Power Tools\;C:\Program Files (x86)\Microsoft
Team Foundation Server 2010 Power Tools\Best Practices
Analyzer\;C:\perl\bin
    PERL_BADLANG (unset)
    SHELL (unset)



p5pRT commented Apr 25, 2012

From @steve-m-hay

Οί Συνένοχοι
Οι Γενναίοι της Σαμοθράκης
Οι Γερμανοί ξανάρχονται...
Οι Εραστές Του Αιγαίου
Οι Κυνηγοί
Οι Πανκς Τα Κάνουν Όλα
Οι Φανταρίνες
Οικογένεια Παντρευόμαστε
Ολα είναι δρόμος
Ομηρος
Οξυγόνο
Ορατότης μηδέν
π
πάνω, κάτω και πλαγίως
Το Κακό
Το Κακό - Στην Εποχή των Ηρώων
Το κλάμα βγήκε απ'τον παράδεισο
Το κορίτσι με τα μαύρα
Το κορίτσι του λούνα παρκ
Το Ξύλο βγήκε από τον παράδεισο
Το πιο λαμπρό αστέρι
Το Ρεμαλι Της Αθηνας
Το Τανγκό των Χριστουγέννων
Το τελευταίο ψέμμα
Το φιλί της... Ζωής
Το χώμα βάφτηκε κόκκινο
Τοπίο στην ομίχλη
Τριλογία 1: Το Λιβάδι που δακρύζει


p5pRT commented Apr 25, 2012

From @Leont

2012/4/25 Steve Hay <perlbug-followup@perl.org>:

> Running the program below with the attached utf8.txt input file produces
> the following crash:
> panic: sv_setpvn called with negative strlen -1 at utf8.pl line 4, <$rh>
> line 28.
> Close with partial character at utf8.pl line 4, <$rh> line 28.
>
> open my $rh, '<:encoding(UTF-8)', 'utf8.txt' or die $!;
> open my $wh, '>:encoding(ISO-8859-1)', 'iso88591.txt' or die $!;
> #select((select($wh), $| = 1)[0]);
> print $wh $_ while <$rh>;
> close $wh;
> close $rh;
>
> Obviously the (Greek) characters in the input file cannot be converted
> to ISO-8859-1, but perl shouldn't crash.

This crash only seems to happen when combining :crlf with :encoding. I
suspect we need a smaller test-case to make it obvious what's really
happening.

> Uncommenting the select() call strangely makes the crash go away. So
> does deleting any single line from the input file.

Another thing that makes it go away: removing the byte order mark in
your file. This is smelling fishy.

Leon


p5pRT commented Apr 25, 2012

The RT System itself - Status changed from 'new' to 'open'


p5pRT commented Apr 26, 2012

From @khwilliamson

I could not get this to reproduce on my machine with blead. Instead, I
get messages like​:
  "\x{feff}" does not map to iso-8859-1 at test.pl line 4, <$rh> line 28.
  "\x{039f}" does not map to iso-8859-1 at test.pl line 4, <$rh> line 28.
  "\x{03af}" does not map to iso-8859-1 at test.pl line 4, <$rh> line 28.
  "\x{03a3}" does not map to iso-8859-1 at test.pl line 4, <$rh> line 28.
  "\x{03c5}" does not map to iso-8859-1 at test.pl line 4, <$rh> line 28.
  "\x{03bd}" does not map to iso-8859-1 at test.pl line 4, <$rh> line 28.
...
  "\x{03cd}" does not map to iso-8859-1 at test.pl line 5, <$rh> line 28.
  "\x{03b6}" does not map to iso-8859-1 at test.pl line 5, <$rh> line 28.
  "\x{03b5}" does not map to iso-8859-1 at test.pl line 5, <$rh> line 28.
  "\x{03b9}" does not map to iso-8859-1 at test.pl line 5, <$rh> line 28.


p5pRT commented Apr 26, 2012

From @cpansprout

I get the same result on a Mac, both threaded and unthreaded.

A C backtrace would help.

--

Father Chrysostomos


p5pRT commented Apr 26, 2012

From @steve-m-hay

Those messages are, of course, expected given what the program is being asked to do. I get them first too, but then followed by a crash (panic).

Is it an EOL issue? (I'm running on Windows here.) Leon mentioned it only happening when :crlf was combined with :encoding. I still see the crash with

open my $rh, '<:encoding(UTF-8):crlf', 'utf8.txt' or die $!;
open my $wh, '>:encoding(ISO-8859-1):crlf', 'iso88591.txt' or die $!;

but not with

open my $rh, '<:raw:encoding(UTF-8)', 'utf8.txt' or die $!;
open my $wh, '>:raw:encoding(ISO-8859-1)', 'iso88591.txt' or die $!;

Also, when I download the utf8.txt attachment I find that it has got mangled somehow from what I uploaded​: the line endings are now \r\r\n rather than the \r\n original which I uploaded. Furthermore, the program doesn't crash for me if I leave it like that -- I have to convert it back to \r\n before I see the crash again.

I will try to get a backtrace.


p5pRT commented Apr 26, 2012

From @steve-m-hay

[Reposting my last reply on rt.perl.org since my email of it seems to
have disappeared into the ether...]

Backtrace from current bleadperl (906024c):

(In encode_method, sdone is 4294967295, hence iv is -1 in Perl_sv_setpvn)

perl515.dll!Perl_sv_setpvn(interpreter * my_perl, sv * const sv, const char * const ptr, const unsigned int len) Line 4494 C
Encode.dll!encode_method(interpreter * my_perl, const encode_s * enc, const encpage_s * dir, sv * src, int check, unsigned int * offset, sv * term, int * retcode, sv * fallback_cb) Line 266 + 0x19 bytes C
Encode.dll!XS_Encode__XS_encode(interpreter * my_perl, cv * cv) Line 658 + 0x26 bytes C
perl515.dll!Perl_pp_entersub(interpreter * my_perl) Line 2778 + 0x12 bytes C
perl515.dll!Perl_runops_debug(interpreter * my_perl) Line 2119 + 0xf bytes C
perl515.dll!Perl_call_sv(interpreter * my_perl, sv * sv, volatile long flags) Line 2690 + 0x38 bytes C
perl515.dll!Perl_call_method(interpreter * my_perl, const char * methname, long flags) Line 2616 + 0x2d bytes C
encoding.dll!PerlIOEncode_flush(interpreter * my_perl, _PerlIO * * f) Line 424 + 0x11 bytes C
perl515.dll!Perl_PerlIO_flush(interpreter * my_perl, _PerlIO * * f) Line 1727 + 0x10 bytes C
perl515.dll!PerlIOBuf_write(interpreter * my_perl, _PerlIO * * f, const void * vbuf, unsigned int count) Line 4166 + 0xd bytes C
encoding.dll!PerlIOEncode_write(interpreter * my_perl, _PerlIO * * f, const void * vbuf, unsigned int count) Line 593 + 0x15 bytes C
perl515.dll!Perl_PerlIO_write(interpreter * my_perl, _PerlIO * * f, const void * vbuf, unsigned int count) Line 1703 + 0x40 bytes C
perl515.dll!Perl_do_print(interpreter * my_perl, sv * sv, _PerlIO * * fp) Line 1258 + 0x1b bytes C
perl515.dll!Perl_pp_print(interpreter * my_perl) Line 730 + 0x13 bytes C
perl515.dll!Perl_runops_debug(interpreter * my_perl) Line 2119 + 0xf bytes C
perl515.dll!S_run_body(interpreter * my_perl, long oldscope) Line 2402 + 0xf bytes C
perl515.dll!perl_run(interpreter * my_perl) Line 2320 + 0xd bytes C
perl515.dll!RunPerl(int argc, char * * argv, char * * env) Line 270 + 0x9 bytes C++
perl.exe!main(int argc, char * * argv, char * * env) Line 23 + 0x12 bytes C
perl.exe!__tmainCRTStartup() Line 555 + 0x17 bytes C
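
The note above the backtrace (sdone is 4294967295, hence iv is -1) comes down to reading the same 32 bits first as an unsigned length and then as a signed integer. Here is a minimal sketch in C of that reinterpretation, assuming the 32-bit sizes from the reported build (ivsize=4). The type and function names are hypothetical stand-ins, not perl's actual source:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for perl's STRLEN and IV on a 32-bit build. */
typedef uint32_t strlen_t;
typedef int32_t  iv_t;

/* View an unsigned length as a signed value. Converting an out-of-range
 * unsigned value to a signed type is implementation-defined in C; common
 * two's-complement compilers simply reuse the bit pattern. */
iv_t length_as_iv(strlen_t len)
{
    return (iv_t)len;
}

/* Returns 1 if the length would trip a "negative strlen" style check. */
int is_negative_strlen(strlen_t len)
{
    int negative = length_as_iv(len) < 0;
    /* Equivalent view: the value is above the largest positive iv_t. */
    assert(negative == (len > (strlen_t)INT32_MAX));
    return negative;
}
```

Under those assumptions, a length of 4294967295 reads back as -1 on common compilers, which matches the value printed in the panic message.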


p5pRT commented Apr 26, 2012

From @cpansprout

On Thu Apr 26 01:20:30 2012, Steve.Hay@verosoftware.com wrote:

> Father Chrysostomos via RT wrote on 2012-04-26:
>
> > I get the same result on a Mac, both threaded and unthreaded.

I should have read Leon Timmermans’s message more closely. If I use
:crlf:encoding(...) I get the panic, but not with :crlf after.

> > A C backtrace would help.
>
> Those messages are, of course, expected given what the program is
> being asked to do. I get them first too, but then followed by a
> crash (panic).
>
> Is it an EOL issue? (I'm running on Windows here.) Leon mentioned it
> only happening when :crlf was combined with :encoding. I still see
> the crash with
>
> open my $rh, '<:encoding(UTF-8):crlf', 'utf8.txt' or die $!;
> open my $wh, '>:encoding(ISO-8859-1):crlf', 'iso88591.txt' or die $!;
>
> but not with
>
> open my $rh, '<:raw:encoding(UTF-8)', 'utf8.txt' or die $!;
> open my $wh, '>:raw:encoding(ISO-8859-1)', 'iso88591.txt' or die $!;
>
> Also, when I download the utf8.txt attachment I find that it
> has got mangled somehow from what I uploaded: the line endings are
> now \r\r\n rather than the \r\n original which I uploaded.
> Furthermore, the program doesn't crash for me if I leave it like
> that -- I have to convert it back to \r\n before I see the crash
> again.

I think your browser is trying to convert \n to \r\n when you download
it. I downloaded the file and it has \r\n line breaks in it.
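
As an illustration of how \r\n could turn into \r\r\n, here is a small C sketch of a text-mode style conversion that inserts \r before every \n without checking whether one is already there. The browser behaviour is only a guess in this thread, and `naive_lf_to_crlf` is a hypothetical helper written for this example:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Naive conversion: put '\r' in front of every '\n', even when the input
 * already has one. `out` must have room for 2 * strlen(in) + 1 bytes.
 * Returns the number of bytes written, not counting the terminator. */
size_t naive_lf_to_crlf(const char *in, char *out)
{
    size_t n = 0;
    assert(in != NULL && out != NULL);
    for (; *in != '\0'; in++) {
        if (*in == '\n')
            out[n++] = '\r';    /* added unconditionally */
        out[n++] = *in;
    }
    out[n] = '\0';
    return n;
}
```

Feeding it a line that already ends in \r\n gives \r\r\n, while a bare \n becomes a normal \r\n, which would be consistent with the mangled download described above.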

> I will try to get a backtrace.

I see your backtrace has Encode’s XS code in it. I will try to dig deeper.

--

Father Chrysostomos


p5pRT commented Apr 26, 2012

From @cpansprout

This is probably a bug in Encode. I’ve reduced it to a standalone
script (with no input file), attached as foo.text. (I know, it’s not
much smaller, but it’s easier to handle.)

This code in cpan/Encode/Encode.xs:encode_method:

  ENCODE_SET_SRC:
    if (check && !(check & ENCODE_LEAVE_SRC)){
        sdone = SvCUR(src) - (slen+sdone);
        if (sdone) {
            sv_setpvn(src, (char*)s+slen, sdone);
        }

calls sv_setpvn with 4294967295 for the length argument.

Before this chunk of code is reached, sdone has a value of 1025, but src
is only 1024 octets long. slen is 0. So the sdone assignment sets it
to -1, which becomes 4294967295 because sdone is unsigned.

So somehow Encode is losing track of how far it is through the string.

--

Father Chrysostomos


p5pRT commented Apr 26, 2012

From @cpansprout

@_ = (
  "\x{feff}\x{39f}\x{3af} \x{3a3}\x{3c5}\x{3bd}\x{3ad}\x{3bd}\x{3bf}\x{3c7}\x{3bf}\x{3b9}\n",
  "\x{39f}\x{3b9} \x{393}\x{3b5}\x{3bd}\x{3bd}\x{3b1}\x{3af}\x{3bf}\x{3b9} \x{3c4}\x{3b7}\x{3c2} \x{3a3}\x{3b1}\x{3bc}\x{3bf}\x{3b8}\x{3c1}\x{3ac}\x{3ba}\x{3b7}\x{3c2}\n",
  "\x{39f}\x{3b9} \x{393}\x{3b5}\x{3c1}\x{3bc}\x{3b1}\x{3bd}\x{3bf}\x{3af} \x{3be}\x{3b1}\x{3bd}\x{3ac}\x{3c1}\x{3c7}\x{3bf}\x{3bd}\x{3c4}\x{3b1}\x{3b9}...\n",
  "\x{39f}\x{3b9} \x{395}\x{3c1}\x{3b1}\x{3c3}\x{3c4}\x{3ad}\x{3c2} \x{3a4}\x{3bf}\x{3c5} \x{391}\x{3b9}\x{3b3}\x{3b1}\x{3af}\x{3bf}\x{3c5}\n",
  "\x{39f}\x{3b9} \x{39a}\x{3c5}\x{3bd}\x{3b7}\x{3b3}\x{3bf}\x{3af}\n",
  "\x{39f}\x{3b9} \x{3a0}\x{3b1}\x{3bd}\x{3ba}\x{3c2} \x{3a4}\x{3b1} \x{39a}\x{3ac}\x{3bd}\x{3bf}\x{3c5}\x{3bd} \x{38c}\x{3bb}\x{3b1}\n",
  "\x{39f}\x{3b9} \x{3a6}\x{3b1}\x{3bd}\x{3c4}\x{3b1}\x{3c1}\x{3af}\x{3bd}\x{3b5}\x{3c2}\n",
  "\x{39f}\x{3b9}\x{3ba}\x{3bf}\x{3b3}\x{3ad}\x{3bd}\x{3b5}\x{3b9}\x{3b1} \x{3a0}\x{3b1}\x{3bd}\x{3c4}\x{3c1}\x{3b5}\x{3c5}\x{3cc}\x{3bc}\x{3b1}\x{3c3}\x{3c4}\x{3b5}\n",
  "\x{39f}\x{3bb}\x{3b1} \x{3b5}\x{3af}\x{3bd}\x{3b1}\x{3b9} \x{3b4}\x{3c1}\x{3cc}\x{3bc}\x{3bf}\x{3c2}\n",
  "\x{39f}\x{3bc}\x{3b7}\x{3c1}\x{3bf}\x{3c2}\n",
  "\x{39f}\x{3be}\x{3c5}\x{3b3}\x{3cc}\x{3bd}\x{3bf}\n",
  "\x{39f}\x{3c1}\x{3b1}\x{3c4}\x{3cc}\x{3c4}\x{3b7}\x{3c2} \x{3bc}\x{3b7}\x{3b4}\x{3ad}\x{3bd}\n",
  "\x{3c0}\n",
  "\x{3c0}\x{3ac}\x{3bd}\x{3c9}, \x{3ba}\x{3ac}\x{3c4}\x{3c9} \x{3ba}\x{3b1}\x{3b9} \x{3c0}\x{3bb}\x{3b1}\x{3b3}\x{3af}\x{3c9}\x{3c2}\n",
  "\x{3a4}\x{3bf} \x{39a}\x{3b1}\x{3ba}\x{3cc}\n",
  "\x{3a4}\x{3bf} \x{39a}\x{3b1}\x{3ba}\x{3cc} - \x{3a3}\x{3c4}\x{3b7}\x{3bd} \x{395}\x{3c0}\x{3bf}\x{3c7}\x{3ae} \x{3c4}\x{3c9}\x{3bd} \x{397}\x{3c1}\x{3ce}\x{3c9}\x{3bd}\n",
  "\x{3a4}\x{3bf} \x{3ba}\x{3bb}\x{3ac}\x{3bc}\x{3b1} \x{3b2}\x{3b3}\x{3ae}\x{3ba}\x{3b5} \x{3b1}\x{3c0}'\x{3c4}\x{3bf}\x{3bd} \x{3c0}\x{3b1}\x{3c1}\x{3ac}\x{3b4}\x{3b5}\x{3b9}\x{3c3}\x{3bf}\n",
  "\x{3a4}\x{3bf} \x{3ba}\x{3bf}\x{3c1}\x{3af}\x{3c4}\x{3c3}\x{3b9} \x{3bc}\x{3b5} \x{3c4}\x{3b1} \x{3bc}\x{3b1}\x{3cd}\x{3c1}\x{3b1}\n",
  "\x{3a4}\x{3bf} \x{3ba}\x{3bf}\x{3c1}\x{3af}\x{3c4}\x{3c3}\x{3b9} \x{3c4}\x{3bf}\x{3c5} \x{3bb}\x{3bf}\x{3cd}\x{3bd}\x{3b1} \x{3c0}\x{3b1}\x{3c1}\x{3ba}\n",
  "\x{3a4}\x{3bf} \x{39e}\x{3cd}\x{3bb}\x{3bf} \x{3b2}\x{3b3}\x{3ae}\x{3ba}\x{3b5} \x{3b1}\x{3c0}\x{3cc} \x{3c4}\x{3bf}\x{3bd} \x{3c0}\x{3b1}\x{3c1}\x{3ac}\x{3b4}\x{3b5}\x{3b9}\x{3c3}\x{3bf}\n",
  "\x{3a4}\x{3bf} \x{3c0}\x{3b9}\x{3bf} \x{3bb}\x{3b1}\x{3bc}\x{3c0}\x{3c1}\x{3cc} \x{3b1}\x{3c3}\x{3c4}\x{3ad}\x{3c1}\x{3b9}\n",
  "\x{3a4}\x{3bf} \x{3a1}\x{3b5}\x{3bc}\x{3b1}\x{3bb}\x{3b9} \x{3a4}\x{3b7}\x{3c2} \x{391}\x{3b8}\x{3b7}\x{3bd}\x{3b1}\x{3c2}\n",
  "\x{3a4}\x{3bf} \x{3a4}\x{3b1}\x{3bd}\x{3b3}\x{3ba}\x{3cc} \x{3c4}\x{3c9}\x{3bd} \x{3a7}\x{3c1}\x{3b9}\x{3c3}\x{3c4}\x{3bf}\x{3c5}\x{3b3}\x{3ad}\x{3bd}\x{3bd}\x{3c9}\x{3bd}\n",
  "\x{3a4}\x{3bf} \x{3c4}\x{3b5}\x{3bb}\x{3b5}\x{3c5}\x{3c4}\x{3b1}\x{3af}\x{3bf} \x{3c8}\x{3ad}\x{3bc}\x{3bc}\x{3b1}\n",
  "\x{3a4}\x{3bf} \x{3c6}\x{3b9}\x{3bb}\x{3af} \x{3c4}\x{3b7}\x{3c2}... \x{396}\x{3c9}\x{3ae}\x{3c2}\n",
  "\x{3a4}\x{3bf} \x{3c7}\x{3ce}\x{3bc}\x{3b1} \x{3b2}\x{3ac}\x{3c6}\x{3c4}\x{3b7}\x{3ba}\x{3b5} \x{3ba}\x{3cc}\x{3ba}\x{3ba}\x{3b9}\x{3bd}\x{3bf}\n",
  "\x{3a4}\x{3bf}\x{3c0}\x{3af}\x{3bf} \x{3c3}\x{3c4}\x{3b7}\x{3bd} \x{3bf}\x{3bc}\x{3af}\x{3c7}\x{3bb}\x{3b7}\n",
  "\x{3a4}\x{3c1}\x{3b9}\x{3bb}\x{3bf}\x{3b3}\x{3af}\x{3b1} 1​: \x{3a4}\x{3bf} \x{39b}\x{3b9}\x{3b2}\x{3ac}\x{3b4}\x{3b9} \x{3c0}\x{3bf}\x{3c5} \x{3b4}\x{3b1}\x{3ba}\x{3c1}\x{3cd}\x{3b6}\x{3b5}\x{3b9}\n"
  );
open my $wh, '>:crlf:encoding(ISO-8859-1)', \$out or die $!;
print $wh $_ for @_;
close $wh;


p5pRT commented Apr 28, 2012

From @steve-m-hay

Backtrace from current bleadperl (906024c):

(In encode_method, sdone is 4294967295, hence iv is -1 in Perl_sv_setpvn)

perl515.dll!Perl_sv_setpvn(interpreter * my_perl, sv * const sv, const char * const ptr, const unsigned int len) Line 4494 C
Encode.dll!encode_method(interpreter * my_perl, const encode_s * enc, const encpage_s * dir, sv * src, int check, unsigned int * offset, sv * term, int * retcode, sv * fallback_cb) Line 266 + 0x19 bytes C
Encode.dll!XS_Encode__XS_encode(interpreter * my_perl, cv * cv) Line 658 + 0x26 bytes C
perl515.dll!Perl_pp_entersub(interpreter * my_perl) Line 2778 + 0x12 bytes C
perl515.dll!Perl_runops_debug(interpreter * my_perl) Line 2119 + 0xf bytes C
perl515.dll!Perl_call_sv(interpreter * my_perl, sv * sv, volatile long flags) Line 2690 + 0x38 bytes C
perl515.dll!Perl_call_method(interpreter * my_perl, const char * methname, long flags) Line 2616 + 0x2d bytes C
encoding.dll!PerlIOEncode_flush(interpreter * my_perl, _PerlIO * * f) Line 424 + 0x11 bytes C
perl515.dll!Perl_PerlIO_flush(interpreter * my_perl, _PerlIO * * f) Line 1727 + 0x10 bytes C
perl515.dll!PerlIOBuf_write(interpreter * my_perl, _PerlIO * * f, const void * vbuf, unsigned int count) Line 4166 + 0xd bytes C
encoding.dll!PerlIOEncode_write(interpreter * my_perl, _PerlIO * * f, const void * vbuf, unsigned int count) Line 593 + 0x15 bytes C
perl515.dll!Perl_PerlIO_write(interpreter * my_perl, _PerlIO * * f, const void * vbuf, unsigned int count) Line 1703 + 0x40 bytes C
perl515.dll!Perl_do_print(interpreter * my_perl, sv * sv, _PerlIO * * fp) Line 1258 + 0x1b bytes C
perl515.dll!Perl_pp_print(interpreter * my_perl) Line 730 + 0x13 bytes C
perl515.dll!Perl_runops_debug(interpreter * my_perl) Line 2119 + 0xf bytes C
perl515.dll!S_run_body(interpreter * my_perl, long oldscope) Line 2402 + 0xf bytes C
perl515.dll!perl_run(interpreter * my_perl) Line 2320 + 0xd bytes C
perl515.dll!RunPerl(int argc, char * * argv, char * * env) Line 270 + 0x9 bytes C++
perl.exe!main(int argc, char * * argv, char * * env) Line 23 + 0x12 bytes C
perl.exe!__tmainCRTStartup() Line 555 + 0x17 bytes C
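
The note above the backtrace (sdone is 4294967295, hence iv is -1) comes down to unsigned wraparound on a 32-bit build. A minimal sketch of that arithmetic follows. The typedefs and function names are hypothetical stand-ins, not the real perl definitions, and how sdone actually reached that value is not established in this ticket.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for a 32-bit build: an unsigned length type
 * (like STRLEN there) and a signed integer type of the same width. */
typedef uint32_t len32_t;
typedef int32_t  iv32_t;

/* Unsigned subtraction never yields a negative number; it wraps
 * modulo 2^32 instead. */
len32_t unsigned_sub(len32_t a, len32_t b)
{
    return (len32_t)(a - b);
}

/* Models a sanity check that views the length as a signed value.
 * The out-of-range conversion is implementation-defined in C, but
 * GCC, Clang and MSVC all wrap, so 4294967295 becomes -1. */
int looks_negative(len32_t len)
{
    iv32_t iv = (iv32_t)len;
    return iv < 0;
}

/* Usage: going one below zero gives the value seen in the backtrace. */
void demo_wraparound(void)
{
    len32_t sdone = unsigned_sub(0, 1);
    assert(sdone == 4294967295u);
    assert(looks_negative(sdone));
    assert(!looks_negative(28));
}
```

Calling demo_wraparound() from a test harness exercises all three checks.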

p5pRT commented Sep 20, 2016

From @dcollinsn

bee7c57 is the first new commit
commit bee7c57
Author​: Father Chrysostomos <sprout@​cpan.org>
Date​: Fri May 18 17​:02​:39 2012 -0700

  sv.c​: Don’t fiddle with AMAGIC in sv_bless

  Since overloading itself now checks whether caches are up to date, and
  since changes to the stash (@​ISA, methods) turn the flag on and over-
  loading itself turns the flag off when it can, sv_bless no longer
  needs to deal with it at all.

:040000 040000 4a9679f7a036de75ae61b9dda7ebecc3bf5335ba 634b1a9b9c56600fac5f9d9e4335474d106cd95e M lib
:100644 100644 0c940cb434f461f262ab8263d2f9f1552a04bcf2 1e7f4d2e341d79f5306194808fd65530a1f628f6 M sv.c

--
Respectfully,
Dan Collins

p5pRT commented Sep 20, 2016

@dcollinsn - Status changed from 'open' to 'resolved'

@p5pRT p5pRT closed this as completed Sep 20, 2016

p5pRT commented Oct 8, 2016

From bug-Encode@rt.cpan.org

<URL​: https://rt.cpan.org/Ticket/Display.html?id=76824 >

The problem is still there with blead perl as well. There is no crash or sv_setpvn panic anymore, but valgrind shows this error message:

==17627== Conditional jump or move depends on uninitialised value(s)
==17627== at 0x51E0AA​: Perl_utf8n_to_uvchr (utf8.c​:858)
==17627== by 0x663CA14​: encode_method (Encode.xs​:193)
==17627== by 0x663CCF9​: XS_Encode__XS_encode (Encode.xs​:785)

The problem is again in this code from Encode.xs:

  STRLEN clen;
  UV ch =
      utf8n_to_uvuni(s+slen, (SvCUR(src)-slen),
                     &clen, UTF8_ALLOW_ANY|UTF8_CHECK_ONLY);

I suspect that (SvCUR(src)-slen) is incorrect and that something like (tlen-sdone-slen) should be passed instead.

IIRC, s is a pointer to the first C char which is not yet processed in dst, slen is the number of characters processed by the last do_encode() call (in case of problems it can be just one or zero), and SvCUR(src) is the length of the original input string. (tlen-sdone) should be the number of remaining characters in src, not yet processed in dst.

After changing (SvCUR(src)-slen) to (tlen-sdone-slen), valgrind no longer shows the error message...

CCing khw, can you recheck this?

p5pRT commented Oct 18, 2016

From bug-Encode@rt.cpan.org

<URL​: https://rt.cpan.org/Ticket/Display.html?id=76824 >

On Sat Oct 08 06​:33​:55 2016, PALI wrote​:

Problem is still there also with blead perl. No crash or sv_setpvn
panic anymore but valgrind show this error message​:

==17627== Conditional jump or move depends on uninitialised value(s)
==17627== at 0x51E0AA​: Perl_utf8n_to_uvchr (utf8.c​:858)
==17627== by 0x663CA14​: encode_method (Encode.xs​:193)
==17627== by 0x663CCF9​: XS_Encode__XS_encode (Encode.xs​:785)

Problem is again in this code from Encode.xs​:

STRLEN clen;
UV ch =
utf8n_to_uvuni(s+slen, (SvCUR(src)-slen),
&clen, UTF8_ALLOW_ANY|UTF8_CHECK_ONLY);

I suspect that (SvCUR(src)-slen) is really incorrect and something
like (tlen-sdone-slen) should be passed.

IIRC s is pointer to first C char which is not yet processed in dst,
slen is number of characters processed by last do_encode() call (in
case of problems it can be just one or zero) and SvCUR(src) is length
of original input string. (tlen-sdone) should be number of remaining
characters in src, not processed in dst.

With change (SvCUR(src)-slen) to (tlen-sdone-slen) valgrind does not
show error message anymore...

CCing khw, can you recheck this?

The current code is wrong.

The 2nd parameter to utf8n_to_uvuni() is the upper limit of how far it is permissible to look beyond the first byte of the string pointed to by the first parameter. The typical way that the core code uses to handle this type of thing is to save s as s0 upon entrance to the function, and then it's easy to get it right. In this case, one could set s0 after s is adjusted for *offset.

s0 = s;

before the loop. tlen has been calculated to be the number of bytes available in s0, so it should be used instead of SvCUR.

Because at the time of the function call s is slen bytes behind the sequence you want to convert, adjustments have to be made. You could write:

utf8n_to_uvuni(s+slen, tlen - (s + slen - s0), ...

I think this is the most foolproof and maintainable method.
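
The s0 idiom described above can be sketched in plain C, outside of XS. This is only an illustration under stated assumptions: bytes_left() and naive_left() are hypothetical helper names, and the buffer is an ordinary char array rather than an SV.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes that may still be read starting at s + slen, given a buffer
 * that begins at s0 and holds tlen bytes.  Same arithmetic as
 * tlen - (s + slen - s0). */
size_t bytes_left(const char *s0, size_t tlen, const char *s, size_t slen)
{
    size_t used = (size_t)(s - s0) + slen;  /* offset of s + slen from s0 */
    assert(s >= s0 && used <= tlen);        /* must stay inside the buffer */
    return tlen - used;
}

/* The questioned form: whole-string length minus slen.  It ignores how
 * far s has already advanced from the start. */
size_t naive_left(size_t cur, size_t slen)
{
    return cur - slen;
}

/* Usage: a 10-byte buffer, s already advanced 4 bytes, slen of 2. */
void demo_bytes_left(void)
{
    char buf[10] = {0};
    const char *s0 = buf;
    const char *s  = buf + 4;

    assert(bytes_left(s0, sizeof buf, s, 2) == 4);  /* 10 - (4 + 2) */
    assert(naive_left(sizeof buf, 2) == 8);         /* 4 bytes past the end */
}
```

In the demo the naive limit of 8 would let a reader starting at offset 6 run up to offset 14 of a 10-byte buffer, which is the kind of over-read a tool like valgrind reports.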

Note that

slen = tlen - sdone

so pali's solution

(tlen-sdone-slen)

can be rewritten as

tlen - sdone - (tlen -sdone) == 0

which is wrong. Another option would be to calculate and use sleft

p5pRT commented Oct 18, 2016

From bug-Encode@rt.cpan.org

<URL​: https://rt.cpan.org/Ticket/Display.html?id=76824 >

On Tue Oct 18 15​:44​:49 2016, khw wrote​:

Note that

slen = tlen - sdone

I do not think this is true. slen is always modified by do_encode() before the utf8n_to_uvuni() call, and do_encode() sets it to the number of bytes processed in that one do_encode() call, not to tlen - sdone.

so pali's solution

(tlen-sdone-slen)

can be rewritten as

tlen - sdone - (tlen -sdone) == 0

p5pRT commented Oct 18, 2016

From bug-Encode@rt.cpan.org

<URL​: https://rt.cpan.org/Ticket/Display.html?id=76824 >

I have now run all the unit tests in Encode plus the crash test from the first post and compared the (tlen-sdone-slen) and (tlen - (s + slen - s0)) values. The two values are exactly the same before every utf8n_to_uvuni() call.
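
One possible reason the two expressions agree, sketched in plain C: if s always equals s0 + sdone, then s + slen - s0 is sdone + slen and both forms reduce to the same number. That invariant is an assumption made for this sketch, not something verified against Encode.xs here, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Counter-based form: tlen - sdone - slen. */
size_t left_by_counters(size_t tlen, size_t sdone, size_t slen)
{
    return tlen - sdone - slen;
}

/* Pointer-based form: tlen - (s + slen - s0). */
size_t left_by_pointers(const char *s0, size_t tlen,
                        const char *s, size_t slen)
{
    return tlen - (size_t)(s + slen - s0);
}

/* Returns 1 if both forms agree for every (sdone, slen) pair that
 * stays inside a tlen-byte buffer, assuming s == s0 + sdone. */
int forms_agree(const char *s0, size_t tlen)
{
    for (size_t sdone = 0; sdone <= tlen; sdone++) {
        const char *s = s0 + sdone;  /* the assumed invariant */
        for (size_t slen = 0; sdone + slen <= tlen; slen++) {
            if (left_by_counters(tlen, sdone, slen) !=
                left_by_pointers(s0, tlen, s, slen))
                return 0;
        }
    }
    return 1;
}

/* Usage: exhaustive check over a small buffer. */
void demo_forms_agree(void)
{
    char buf[16] = {0};
    assert(forms_agree(buf, sizeof buf));
}
```

If s and sdone were ever updated out of step, the two forms would diverge, and only the pointer-based one would still describe the real distance to the end of the buffer.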
