Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

slurp is mangling newlines, it should not (slurp ‘foo’) #6546

Open
p6rt opened this issue Sep 24, 2017 · 5 comments
Open

slurp is mangling newlines, it should not (slurp ‘foo’) #6546

p6rt opened this issue Sep 24, 2017 · 5 comments

Comments

@p6rt
Copy link

p6rt commented Sep 24, 2017

Migrated from rt.perl.org#132154 (status was 'open')

Searchable as RT132154$

@p6rt
Copy link
Author

p6rt commented Sep 24, 2017

From @AlexDaniel

This was noticed in this PR​: rakudo/rakudo#1171

Basically, slurp is changing \r\n to \n. This may seem like a reasonable thing, except that then it has to also change \r to \n.

However, replacing \r\n seems like a meaningless thing to do anyway. See https://docs.perl6.org/language/traps#Splitting_the_Input_Data_Into_Lines for some context.

Long story short, .lines is already splitting the text using all possible variants of a newline. There's no need to mangle the data in slurp.

For example, let's say you want to split the data strictly by \n. One way to do it is to call .lines on a Handle, then it works fine. But if it happens that you must slurp the whole thing, this is what you have to do​:

â��fooâ��.IO.slurp(​:bin).decode.split(â��\nâ��)

Notice how we use :bin to prevent it from doing the decoding, only to call .decode later anyway. All that just because .slurp itself is doing unnecessary data mangling.

How to resolve this ticket​:
Option 1​: stop doing what we do now. For example, don't do \r\n â�� \n substitution if it's under â��use v6.dâ��
Option 2​: include \r and make sure it acts exactly like .lines.join(â��\nâ��)

@p6rt
Copy link
Author

p6rt commented Sep 25, 2017

From @zoffixznet

Some discussion from IRC​: https://irclog.perlgeek.de/perl6-dev/2017-09-25#i_15212848

@p6rt
Copy link
Author

p6rt commented Sep 28, 2017

From @AlexDaniel

I'm shocked to see \r\n→\n translation being defended without any reasoning. Why are we doing it at all?

“\r\n => \n transformation is done consistently” consistently what? And why? Even .IO.lines splitting on \r\n is wrong because it cuts \r that may be part of the data (which is fine in the middle of a line, but not on the end).

On 2017-09-25 05​:14​:26, cpan@​zoffix.com wrote​:

Some discussion from IRC​: https://irclog.perlgeek.de/perl6-dev/2017-
09-25#i_15212848

@p6rt
Copy link
Author

p6rt commented Sep 29, 2017

From @zoffixznet

On Thu, 28 Sep 2017 13​:56​:49 -0700, alex.jakimenko@​gmail.com wrote​:

I'm shocked to see \r\n�\n translation being defended without any
reasoning.

Well, it's just a pretty common thing to do. Perl did it for ages.
And IIRC we already had tickets on this because Proc​::Async didn't do it.

Why are we doing it at all?

So you could write code that works the same on Window and Linux.

�\r\n => \n transformation is done consistently� consistently what?

Proc​::Async, IO​::Handle, Str.encode, and custom decoders do this
consistently. The code the OP mentions just does it in IO​::Path because
it's a fastpath and the `​:translate-nl` option is (probably?) not fully
exposed to users yet.

Basically, slurp is changing \r\n to \n. This may seem like a
reasonable thing, except that then it has to also change \r to \n.

AFAIK we don't support any platform that uses `\r` as line separator
character. I don't know why Str.lines cuts on lone `\r`

Long story short, .lines is already splitting the text using all
possible variants of a newline. There's no need to mangle the data in
slurp.

This isn't about making input `.lines` friendly tho.

@p6rt
Copy link
Author

p6rt commented Sep 29, 2017

The RT System itself - Status changed from 'new' to 'open'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant