Proc::Async.stdout and zero-separated input ($proc.stdout.split("\0") … ) #6452

Open
p6rt opened this issue Aug 18, 2017 · 6 comments

p6rt commented Aug 18, 2017

Migrated from rt.perl.org#131923 (status was 'open')

Searchable as RT131923$


p6rt commented Aug 18, 2017

From @AlexDaniel

Most command line tools support zero-separated input and output (grep -z, find -print0, perl -0, sort -z, xargs -0, sed -z).

And while you can use .stdout.lines to work on things line-by-line, doing the same thing with null-byte separators is significantly harder.

<jnthn> Anyway, it's pretty easy to write yourself
<jnthn> Something like
<jnthn> supply { my $buffer = ''; whenever $stdout { $buffer ~= $_; while $buffer.index("\0") -> $idx { emit $buffer.substr(0, $idx); $buffer .= substr($idx + 1); } LAST emit $buffer } }
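Expanded into a reusable sub, the IRC suggestion might look like the sketch below. It is not a built-in: null-records is a hypothetical name, and the code assumes the process output is text that Proc::Async can decode. It differs from the one-liner in two edge cases: .index returns 0, which is false, when the buffer starts with a null byte (an empty record), so the loop tests definedness instead of truth; and the final flush skips the empty remainder left when the output ends with a null byte, as find -print0 output does.

```raku
# Sketch only: `null-records` is a hypothetical helper, not part of Raku.
# It turns a Supply of decoded text chunks (such as Proc::Async.stdout)
# into a Supply that emits one string per NUL-terminated record.
sub null-records(Supply $chunks --> Supply) {
    supply {
        my $buffer = '';
        whenever $chunks {
            $buffer ~= $_;
            # Test definedness, not truth: .index returns 0 (False) for an
            # empty record at the start of the buffer, and Nil when no NUL
            # has arrived yet.
            while defined(my $idx = $buffer.index("\0")) {
                emit $buffer.substr(0, $idx);
                $buffer .= substr($idx + 1);
            }
            # When the input ends, flush a final record that had no NUL
            # terminator; skip the empty remainder after a trailing NUL.
            LAST { emit $buffer if $buffer.chars }
        }
    }
}

# Usage: list the entries of the current directory via `find -print0`.
my $proc = Proc::Async.new('find', '.', '-maxdepth', '1', '-print0');
react {
    whenever null-records($proc.stdout) -> $path {
        say "path: $path.raku()";
    }
    # Report a non-zero exit status, if any.
    whenever $proc.start -> $result {
        note "find exited with code $result.exitcode()" if $result.exitcode;
    }
}
```

Because a react block only exits once every whenever in it has finished, the last records are not lost when the process exits before they have been emitted.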

I agree that it is not too hard, but it should be built in.

One could argue that it should be *easier* to do this than to work on stuff line-by-line. People usually don't expect newlines in filenames, but they are legal, and therefore any code that expects paths to be separated by anything other than null bytes is broken. Not sure if we should go so far in trying to get the Huffman coding right, but a built-in way to work with data like this would be a great step.
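A quick illustration of the filename point, using a hand-written string in place of real find -print0 output: splitting on newlines breaks a legal filename that contains a newline, while splitting on the null byte keeps each path intact.

```raku
# Hand-written stand-in for `find -print0` output: two paths, one of which
# contains a newline. Each path is terminated by a NUL byte.
my $output = "normal.txt\0weird\nname.txt\0";

# Splitting on newlines mixes up the record boundaries.
my @by-line = $output.lines;
say @by-line.raku;

# Splitting on NUL recovers both paths; :skip-empty drops the empty
# string that would otherwise follow the trailing NUL.
my @by-null = $output.split("\0", :skip-empty);
say @by-null.raku;
```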


p6rt commented Aug 18, 2017

From @zoffixznet

In reply to alex.jakimenko@gmail.com, Fri, 18 Aug 2017 08:35:18 -0700.

That would only work for string separators, while .split can also split on regexes. I'd say we defer this until Cat (lazy strings) is implemented and then do the full-featured .split and .comb on it.

The exact same issue exists in IO::Handle, which currently implements .split by slurping the entire file first.


p6rt commented Aug 18, 2017

The RT System itself - Status changed from 'new' to 'open'


p6rt commented Aug 18, 2017

From @AlexDaniel

Another way to do it is to support a custom nl-in (similar to how we do ｢$*IN.nl-in = 0.chr｣ now). Split may be overkill.
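For IO::Handle, the custom-separator idea can already be tried: with nl-in set to the null byte, .lines splits on it and chomps it. The sketch below uses a temporary file with a made-up name in place of real find -print0 output; for standard input the equivalent is the ｢$*IN.nl-in = 0.chr｣ assignment mentioned in the comment.

```raku
# Write NUL-separated records (one containing a newline) to a temporary
# file; the file name is arbitrary and only used for this demonstration.
my $file = $*TMPDIR.add("nul-records-demo-$*PID");
$file.spurt("alpha\0beta gamma\0with\nnewline\0");

# Open with a custom input separator: .lines now splits on NUL and
# chomps it, so the embedded newline stays part of its record.
my $fh = $file.open(:nl-in("\0"));
my @records = $fh.lines;
$fh.close;
$file.unlink;

.raku.say for @records;
```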

In reply to cpan@zoffix.com, 2017-08-18 08:40:32.


p6rt commented Aug 27, 2017

From @AlexDaniel

See Raku/doc#1472

It turns out that $proc.lines does the wrong thing, which is probably a bug. We do need nl-in for Proc::Async, and it should behave the same as nl-in in IO::Handle.
In reply to alex.jakimenko@gmail.com, 2017-08-18 08:54:36.


p6rt commented Aug 27, 2017

From @AlexDaniel

I meant $proc.stdout.lines of course.

In reply to alex.jakimenko@gmail.com, 2017-08-27 07:32:35.
