Built-in decoder is dropping bytes on the floor #6466

Closed
p6rt opened this issue Aug 26, 2017 · 8 comments

p6rt commented Aug 26, 2017

Migrated from rt.perl.org#131961 (status was 'resolved')

Searchable as RT131961

p6rt commented Aug 26, 2017

From @AlexDaniel

The input file for this problem is ≈15 MB, so please bear with the external link:
https://files.progarm.org/golfed.gz (1.6 MB compressed)

Command:
perl6 -ne 'say $++' golfed
# or
perl6 -ne 'say $++' < golfed

Result:
0
1
2
… … …
257568
257569
257570
Malformed UTF-8
  in block <unit> at -e line 1

There's no malformed UTF-8 in the file. And if you don't believe me, try this:

cat golfed | perl6 -ne 'say $++'

There are at least three possible outcomes (it is not as stable as the previous examples):
(*) Fails after 257570, just like in the previous example
(*) Fails after 121712
(*) No error, goes through the whole file just fine
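
One way to double-check the "no malformed UTF-8" claim without involving perl6 is to stream the bytes through an independent incremental decoder. The sketch below is only an illustration in Python (not part of the original report); `is_valid_utf8` and `split_into_chunks` are hypothetical helper names, and the sample string stands in for the real `golfed` file. A correct streaming decoder should give the same verdict however the input happens to be chunked, which is why results that vary from run to run with a pipe point at the reader rather than at the data.

```python
import codecs

def is_valid_utf8(chunks):
    """Return True if the concatenated byte chunks form valid UTF-8."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        for chunk in chunks:
            decoder.decode(chunk)        # incomplete trailing sequences are buffered
        decoder.decode(b"", final=True)  # raises if a sequence is left unfinished
    except UnicodeDecodeError:
        return False
    return True

def split_into_chunks(data, size):
    """Split bytes into consecutive pieces of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]

# Stand-in data with multi-byte characters; the real input would be the file's bytes.
sample = "line ≈ one\nline … two\n".encode("utf-8")

# The verdict must not depend on where the chunk boundaries fall.
results = {size: is_valid_utf8(split_into_chunks(sample, size))
           for size in (1, 2, 3, 7, 4096)}
print(results)
```

To run the same check on a real file, the chunks could come from repeated `read(size)` calls on a file opened in binary mode.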

<geekosaur> sounds more likely to be I/O related than unicode related
<geekosaur> like it's dropping bytes on the floor and if the utf8 decoder was in (or lands in) the middle of a sequence, boom

IRC log: https://irclog.perlgeek.de/perl6-dev/2017-08-26#i_15071860

This issue may be related: https://gist.github.com/coke/3feef738886b1e5af79a1ca636146075
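
To make geekosaur's hypothesis above concrete, here is a small Python sketch (an illustration only, with the hypothetical helpers `drop_byte` and `decodes_cleanly`, and a made-up sample line rather than data from the actual file). It shows that losing one byte from otherwise valid UTF-8 only triggers a decode error when the lost byte belongs to a multi-byte sequence, which would fit an error that shows up at some positions and not at others.

```python
def drop_byte(data, index):
    """Return a copy of `data` with the byte at `index` removed."""
    return data[:index] + data[index + 1:]

def decodes_cleanly(data):
    """Return True if `data` decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# '≈' (U+2248) is encoded as the three bytes E2 89 88.
line = "size ≈15 MB\n".encode("utf-8")
start = line.index(b"\xe2")

intact = decodes_cleanly(line)
# Lead byte lost: the decoder lands on stray continuation bytes.
lost_lead = decodes_cleanly(drop_byte(line, start))
# Continuation byte lost: the sequence is cut short.
lost_continuation = decodes_cleanly(drop_byte(line, start + 1))
# An ASCII byte lost: the text is silently wrong, but still valid UTF-8.
lost_ascii = decodes_cleanly(drop_byte(line, 0))

print(intact, lost_lead, lost_continuation, lost_ascii)
```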

p6rt commented Aug 26, 2017

From @MasterDuke17

Some gdb output:
(gdb) break MVM_exception_throw_adhoc
...
(gdb) bt
#0 MVM_exception_throw_adhoc (tc=tc@entry=0x555555758960, messageFormat=messageFormat@entry=0x7ffff787053a "Malformed UTF-8") at src/core/exceptions.c:721
#1 0x00007ffff7808833 in MVM_string_utf8_decodestream (tc=tc@entry=0x555555758960, ds=ds@entry=0x5555580cfb70, stopper_chars=stopper_chars@entry=0x0, seps=seps@entry=0x5555580cfc90) at src/strings/utf8.c:496
#2 0x00007ffff7804720 in run_decode (eof=0, sep_spec=0x5555580cfc90, stopper_chars=0x0, ds=0x5555580cfb70, tc=0x555555758960) at src/strings/decode_stream.c:115
#3 MVM_string_decodestream_get_until_sep (tc=tc@entry=0x555555758960, ds=ds@entry=0x5555580cfb70, sep_spec=sep_spec@entry=0x5555580cfc90, chomp=1) at src/strings/decode_stream.c:373
#4 0x00007ffff77cbc9f in MVM_decoder_take_line (tc=0x555555758960, decoder=<optimized out>, chomp=<optimized out>, incomplete_ok=<optimized out>) at src/6model/reprs/Decoder.c:259
#5 0x00007ffff42434f9 in ?? ()
#6 0x0000006e0000005b in ?? ()
#7 0x0000555555837010 in ?? ()
#8 0x0000555555758960 in ?? ()
#9 0x00005555560f9c98 in ?? ()
#10 0x00007ffff0253bc0 in ?? ()
#11 0x00007ffff786ff38 in ?? () from //home/dan/Source/perl6/install/lib/libmoar.so
#12 0x0000555558079ba0 in ?? ()
#13 0x00007ffff777860e in MVM_frame_invoke (tc=0x7fffffffd6c0, static_frame=<optimized out>, callsite=0x555555de1938, args=0x5555560f9c58, outer=<optimized out>, code_ref=<optimized out>, spesh_cand=<optimized out>) at src/core/frame.c:551
#14 0x0000555555758960 in ?? ()
#15 0x00005555560f9c98 in ?? ()
#16 0x00007ffff780398c in MVM_jit_enter_code (tc=<optimized out>, cu=<optimized out>, code=<optimized out>) at src/jit/compile.c:146
#17 0x00007ffff77652fe in MVM_interp_run (tc=tc@entry=0x555555758960, initial_invoke=0x0, invoke_data=0x1) at src/core/interp.c:5690
#18 0x00007ffff7835322 in MVM_vm_run_file (instance=0x555555758010, filename=<optimized out>) at src/moar.c:356
#19 0x000055555555541f in main (argc=9, argv=0x7fffffffdd28) at src/main.c:255
(gdb) call MVM_dump_backtrace(tc)
  at SETTING::src/core/Encoding/Decoder/Builtin.pm:32 (/home/dan/Source/perl6/install/share/perl6/runtime/CORE.setting.moarvm:consume-line-chars)
from SETTING::src/core/IO/Handle.pm:235 (/home/dan/Source/perl6/install/share/perl6/runtime/CORE.setting.moarvm:)
from SETTING::src/core/IO/Handle.pm:231 (/home/dan/Source/perl6/install/share/perl6/runtime/CORE.setting.moarvm:get-line-slow-path)
from SETTING::src/core/IO/Handle.pm:226 (/home/dan/Source/perl6/install/share/perl6/runtime/CORE.setting.moarvm:get)
from SETTING::src/core/IO/Handle.pm:388 (/home/dan/Source/perl6/install/share/perl6/runtime/CORE.setting.moarvm:pull-one)
from SETTING::src/core/Iterable.pm:27 (/home/dan/Source/perl6/install/share/perl6/runtime/CORE.setting.moarvm:pull-one)
from SETTING::src/core/Iterable.pm:27 (/home/dan/Source/perl6/install/share/perl6/runtime/CORE.setting.moarvm:pull-one)
from SETTING::src/core/Any-iterable-methods.pm:453 (/home/dan/Source/perl6/install/share/perl6/runtime/CORE.setting.moarvm:sink-all)
from SETTING::src/core/Seq.pm:188 (/home/dan/Source/perl6/install/share/perl6/runtime/CORE.setting.moarvm:sink)
from -e:1 (<ephemeral file>:<unit>)
from -e:1 (<ephemeral file>:<unit-outer>)
from gen/moar/stage2/NQPHLL.nqp:1608 (/home/dan/Source/perl6/install/share/nqp/lib/NQPHLL.moarvm:eval)
from gen/moar/stage2/NQPHLL.nqp:1715 (/home/dan/Source/perl6/install/share/nqp/lib/NQPHLL.moarvm:)
from gen/moar/stage2/NQPHLL.nqp:1712 (/home/dan/Source/perl6/install/share/nqp/lib/NQPHLL.moarvm:command_eval)
from src/Perl6/Compiler.nqp:42 (/home/dan/Source/perl6/install/share/nqp/lib/Perl6/Compiler.moarvm:command_eval)
from gen/moar/stage2/NQPHLL.nqp:1696 (/home/dan/Source/perl6/install/share/nqp/lib/NQPHLL.moarvm:command_line)
from gen/moar/main.nqp:47 (/home/dan/Source/perl6/install/share/perl6/runtime/perl6.moarvm:MAIN)
from gen/moar/main.nqp:38 (/home/dan/Source/perl6/install/share/perl6/runtime/perl6.moarvm:<mainline>)
from <unknown>:1 (/home/dan/Source/perl6/install/share/perl6/runtime/perl6.moarvm:<main>)
from <unknown>:1 (/home/dan/Source/perl6/install/share/perl6/runtime/perl6.moarvm:<entry>)

p6rt commented Aug 26, 2017

The RT System itself - Status changed from 'new' to 'open'

p6rt commented Aug 26, 2017

From @MasterDuke17

And when built with --optimize=0:
(gdb) bt
#0 MVM_exception_throw_adhoc (tc=0x555555758960, messageFormat=0x7ffff78682ee "Malformed UTF-8") at src/core/exceptions.c:721
#1 0x00007ffff77fa86d in MVM_string_utf8_decodestream (tc=0x555555758960, ds=0x5555580cf750, stopper_chars=0x0, seps=0x5555580cf870) at src/strings/utf8.c:496
#2 0x00007ffff77f58a0 in run_decode (tc=0x555555758960, ds=0x5555580cf750, stopper_chars=0x0, sep_spec=0x5555580cf870, eof=0) at src/strings/decode_stream.c:115
#3 0x00007ffff77f62cd in MVM_string_decodestream_get_until_sep (tc=0x555555758960, ds=0x5555580cf750, sep_spec=0x5555580cf870, chomp=1) at src/strings/decode_stream.c:373
#4 0x00007ffff77a9d93 in MVM_decoder_take_line (tc=0x555555758960, decoder=0x5555583822e0, chomp=1, incomplete_ok=0) at src/6model/reprs/Decoder.c:259
#5 0x00007ffff41df4f9 in ?? ()
#6 0x0000555556ecb048 in ?? ()
#7 0x0000555556ecb030 in ?? ()
#8 0x00007ffff7fd4030 in ?? ()
#9 0x00007ffff7867960 in ?? () from //home/dan/Source/perl6/install/lib/libmoar.so
#10 0x0000555555758960 in ?? ()
#11 0x0000000758176180 in ?? ()
#12 0x0000555556ed32c0 in ?? ()
#13 0x00007fffffffc410 in ?? ()
#14 0x00007ffff772ec64 in MVM_p6opaque_read_object (tc=0x5555560f9c90, o=0x7fff002c2d7d, offset=93824994347360) at src/6model/reprs/P6opaque.h:115
#15 0x00007ffff77f4d15 in MVM_jit_enter_code (tc=0x555555758960, cu=0x5555557bf8f0, code=0x7ffff0178c10) at src/jit/compile.c:146
#16 0x00007ffff7728f49 in MVM_interp_run (tc=0x555555758960, initial_invoke=0x7ffff7826b63 <toplevel_initial_invoke>, invoke_data=0x5555557d1010) at src/core/interp.c:5690
#17 0x00007ffff7826cc8 in MVM_vm_run_file (instance=0x555555758010, filename=0x7fffffffe1ca "/home/dan/Source/perl6/install/share/perl6/runtime/perl6.moarvm") at src/moar.c:356
#18 0x00005555555556ab in main (argc=9, argv=0x7fffffffdd28) at src/main.c:255

p6rt commented Aug 31, 2017

From @AlexDaniel

Bisected to rakudo/rakudo@51b63bf

p6rt commented Aug 31, 2017

From @AlexDaniel

I bisected MoarVM manually to MoarVM/MoarVM@a6abd3c

p6rt commented Sep 4, 2017

From @jnthn

It was actually the decoder itself dropping the bytes in a fast path -> slow path transition. Fixed, and a test has been added in S32-io/io-handle.t.
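
For readers unfamiliar with this class of bug, the following Python sketch shows the general idea. It is a toy model and not the MoarVM or Rakudo code: the `LineDecoder` class, its `drop_tail` flag and the sample data are all hypothetical. The fast path decodes a chunk directly when nothing is carried over and the chunk ends on a line boundary; the slow path joins the chunk with bytes saved from earlier reads. If the carried-over bytes are forgotten when switching between the two, the next decode can start in the middle of a multi-byte sequence.

```python
class LineDecoder:
    """Toy line splitter over byte chunks (illustration only)."""

    def __init__(self, drop_tail=False):
        self.pending = b""          # bytes of a line that is not finished yet
        self.drop_tail = drop_tail  # simulate forgetting the carried-over bytes

    def feed(self, chunk):
        """Consume one chunk and return the complete lines it finishes."""
        if not self.pending and chunk.endswith(b"\n"):
            # Fast path: nothing carried over, chunk ends on a line boundary.
            return [l.decode("utf-8") for l in chunk.split(b"\n")[:-1]]
        # Slow path: a line straddles chunks, so join with the saved bytes.
        data = self.pending + chunk
        *complete, rest = data.split(b"\n")
        # Correct behaviour keeps `rest`; the simulated bug throws it away.
        self.pending = b"" if self.drop_tail else rest
        return [l.decode("utf-8") for l in complete]

def read_all(chunks, drop_tail=False):
    """Feed every chunk through a LineDecoder and collect the lines."""
    decoder = LineDecoder(drop_tail)
    lines = []
    for chunk in chunks:
        lines.extend(decoder.feed(chunk))
    return lines

# Each Greek letter is two bytes, so 4-byte chunks split a character in half.
data = "α\nβγ\nδ\n".encode("utf-8")
chunks = [data[i:i + 4] for i in range(0, len(data), 4)]

print(read_all(chunks))  # carried-over bytes preserved
try:
    read_all(chunks, drop_tail=True)
except UnicodeDecodeError as err:
    print("malformed UTF-8:", err.reason)
```

The toy ignores a final line without a trailing newline. The point is only that valid input plus lost carry-over bytes is enough to produce a "Malformed UTF-8" style error.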

p6rt commented Sep 4, 2017

@jnthn - Status changed from 'open' to 'resolved'

p6rt closed this as completed Sep 4, 2017