Report information
Id: 131923
Status: open
Priority: 0
Queue: perl6

Owner: Nobody
Requestors: alex.jakimenko [at] gmail.com
Cc:
AdminCc:

Severity: (no value)
Tag: (no value)
Platform: (no value)
Patch Status: (no value)
VM: (no value)



Subject: Proc::Async.stdout and zero-separated input ($proc.stdout.split("\0") … )
On Fri, 18 Aug 2017 08:35:18 -0700, alex.jakimenko@gmail.com wrote:

Most command line tools support zero-separated input and output (grep -z, find -print0, perl -0, sort -z, xargs -0, sed -z).

And while you can use .stdout.lines to work on things line-by-line, doing the same thing with null-byte separators is significantly harder.

<jnthn> Anyway, it's pretty easy to write yourself
<jnthn> Something like
<jnthn> supply { my $buffer = ''; whenever $stdout { $buffer ~= $_;
        while $buffer.index("\0") -> $idx { emit $buffer.substr(0, $idx);
        $buffer .= substr($idx + 1); } LAST emit $buffer } }

I agree that it is not too hard, but it should be built in.

One could argue that it should be *easier* to do this than to work on stuff line-by-line. People usually don't expect newlines in filenames, but they are legal, and therefore any code that expects paths separated by anything other than NUL is broken. Not sure if we should go so far in trying to get the huffman coding right, but a built-in way to work with data like this would be a great step.
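For concreteness, here is a minimal, self-contained sketch of the pattern quoted above, wrapped around an actual Proc::Async run. The nul-split name and the choice of find -print0 are illustrative only (nothing like this is built in), and the inner loop uses split with a limit of 2 instead of index so that an empty record cannot stall it:

    sub nul-split(Supply $in --> Supply) {
        supply {
            my $buffer = '';
            whenever $in -> $chunk {
                $buffer ~= $chunk;
                # Emit every complete NUL-terminated record currently buffered.
                while $buffer.contains("\0") {
                    my ($record, $rest) = $buffer.split("\0", 2);
                    emit $record;
                    $buffer = $rest;
                }
                # When the stream ends, flush any trailing, unterminated record.
                LAST { emit $buffer if $buffer.chars }
            }
        }
    }

    my $proc = Proc::Async.new('find', '.', '-print0');
    react {
        whenever nul-split($proc.stdout) -> $path {
            say "path: $path";
        }
        whenever $proc.start { done }
    }

Since find -print0 terminates every record with a NUL, the LAST flush normally has nothing left to emit.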
On 2017-08-18 08:40:32, cpan@zoffix.com wrote:

That'd only work for strings, while .split can also split on regexes. I'd say we defer this until Cat (lazy strings) is implemented and then do the full-featured .split and .comb on it. The exact same issue exists in IO::Handle, which currently implements it by slurping the entire file first.
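To make the regex point concrete, here is plain Str.split with both kinds of separator; this is ordinary string behaviour, not anything Proc::Async currently offers on its output supply:

    # Str.split accepts both a literal separator and a regex; a supply-based
    # splitter keyed on a single string (as sketched above) only covers the former.
    say "one\0two\0three".split("\0");    # (one two three)
    say "one1two22three".split(/\d+/);    # (one two three)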
On 2017-08-18 08:54:36, alex.jakimenko@gmail.com wrote:

Another way to do it is to support a custom nl (similarly to how we do 「$*IN.nl-in = 0.chr」 now). Split may be overkill.
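For reference, this knob already exists on ordinary handles, so reading NUL-separated records from a file or from standard input works today; a small sketch (paths.txt is just a made-up example file):

    # Read NUL-separated records from a file by making NUL the input separator.
    my $fh = open 'paths.txt', :nl-in("\0");
    say $_ for $fh.lines;
    $fh.close;

    # The same setting on standard input, as quoted above:
    $*IN.nl-in = 0.chr;
    .say for $*IN.lines;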



On 2017-08-27 07:32:35, alex.jakimenko@gmail.com wrote:

See https://github.com/perl6/doc/issues/1472

Turns out that $proc.lines does the wrong thing, which is probably a bug. We do need nl-in for Proc::Async, and this nl-in should also be the same as in IO::Handle.


I meant $proc.stdout.lines, of course.
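Purely as an illustration of what is being asked for, not an existing interface: if $proc.stdout.lines honoured an nl-in setting the way IO::Handle does, consuming find -print0 might look like this (the :nl-in argument below is hypothetical):

    # HYPOTHETICAL: .lines on the stdout Supply does not accept :nl-in at the
    # time of this report; this only sketches the requested behaviour.
    my $proc = Proc::Async.new('find', '.', '-print0');
    react {
        whenever $proc.stdout.lines(:nl-in("\0")) -> $path {
            say $path;
        }
        whenever $proc.start { done }
    }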




