Skip Menu |
Report information
Id: 121870
Status: resolved
Priority: 0/
Queue: perl5

Owner: Nobody
Requestors: mcgrath.martin [at] gmail.com
Cc:
AdminCc:

Operating System: Linux
PatchStatus: (no value)
Severity: Wishlist
Type:
Perl Version: 5.19.12
Fixed In: (no value)



Date: Wed, 14 May 2014 10:11:00 +0100
To: perlbug [...] perl.org
Subject: [PATCH] Replace File::Slurp with Path::Tiny in perlfaq5
From: Martin McGrath <mcgrath.martin [...] gmail.com>
Download (untitled) / with headers
text/plain 4.3k
Apologies if this has already been discussed. I don't subscribe to p5p but have searched the list before posting, and didn't find anything relating to this issue. This patch replaces the example/recommendation of using File::Slurp with Path::Tiny in perlfaq5. The reasoning behind this is an outstanding critical bug with File::Slurp: https://rt.cpan.org/Public/Bug/Display.html?id=83126 This is a bug report for perl from mcgrath.martin@gmail.com, generated with the help of perlbug 1.40 running under perl 5.19.12. diff --git a/cpan/perlfaq/lib/perlfaq5.pod b/cpan/perlfaq/lib/perlfaq5.pod index a2baf16..25d494f 100644 --- a/cpan/perlfaq/lib/perlfaq5.pod +++ b/cpan/perlfaq/lib/perlfaq5.pod @@ -200,7 +200,7 @@ you can fit the whole thing in memory!): print $out $content; -Modules such as L<File::Slurp> and L<Tie::File> can help with that +Modules such as L<Path::Tiny> and L<Tie::File> can help with that too. If you can, however, avoid reading the entire file at once. Perl won't give that memory back to the operating system until the process finishes. @@ -1133,13 +1133,13 @@ C<$DB_RECNO> bindings, which allow you to tie an array to a file so that accessing an element of the array actually accesses the corresponding line in the file. -If you want to load the entire file, you can use the L<File::Slurp> +If you want to load the entire file, you can use the L<Path::Tiny> module to do it in one simple and efficient step: - use File::Slurp; + use Path::Tiny; - my $all_of_it = read_file($filename); # entire file in scalar - my @all_lines = read_file($filename); # one line per element + my $all_of_it = path($filename)->slurp; # entire file in scalar + my @all_lines = path($filename)->lines; # one line per element Or you can read the entire file contents into a scalar like this: --------------1.8.3.2-- --- Flags: category=docs severity=wishlist --- Site configuration information for perl 5.19.12: Configured by marto at Sat May 10 12:15:37 BST 2014. Summary of my perl5 (revision 5 version 19 subversion 12) configuration: Derived from: f78f6d13e94ec2059acb931e3ca33869aa58f1e7 Platform: osname=linux, osvers=3.11.0-20-generic, archname=x86_64-linux uname='linux marto-virtualbox 3.11.0-20-generic #34-ubuntu smp tue apr 1 20:40:25 utc 2014 x86_64 x86_64 x86_64 gnulinux ' config_args='-des -Dusedevel -Dprefix=/home/marto/bleedperl' hint=recommended, useposix=true, d_sigaction=define useithreads=undef, usemultiplicity=undef use64bitint=define, use64bitall=define, uselongdouble=undef usemymalloc=n, bincompat5005=undef Compiler: cc='cc', ccflags ='-fwrapv -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64', optimize='-O2', cppflags='-fwrapv -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include' ccversion='', gccversion='4.8.1', gccosandvers='' intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678 d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16 ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8 alignbytes=8, prototype=define Linker and Libraries: ld='cc', ldflags =' -fstack-protector -L/usr/local/lib' libpth=/usr/local/lib /usr/lib/gcc/x86_64-linux-gnu/4.8/include-fixed /usr/include/x86_64-linux-gnu /usr/lib /lib/x86_64-linux-gnu /lib/../lib /usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib libs=-lnsl -ldl -lm -lcrypt -lutil -lc perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc libc=libc-2.17.so, so=so, useshrplib=false, libperl=libperl.a gnulibc_version='2.17' Dynamic Linking: dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E' cccdlflags='-fPIC', lddlflags='-shared -O2 -L/usr/local/lib -fstack-protector' Locally applied patches: uncommitted-changes --- @INC for perl 5.19.12: /home/marto/bleedperl/lib/site_perl/5.19.12/x86_64-linux /home/marto/bleedperl/lib/site_perl/5.19.12 /home/marto/bleedperl/lib/5.19.12/x86_64-linux /home/marto/bleedperl/lib/5.19.12 . --- Environment for perl 5.19.12: HOME=/home/marto LANG=en_GB.UTF-8 LANGUAGE=en_GB:en LD_LIBRARY_PATH (unset) LOGDIR (unset) PATH=/usr/lib/lightdm/lightdm:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games PERL_BADLANG (unset) SHELL=/bin/bash

Message body is not shown because sender requested not to inline it.

From: Jarkko Hietaniemi <jhi [...] iki.fi>
Subject: Re: [perl #121870] [PATCH] Replace File::Slurp with Path::Tiny in perlfaq5
To: perl5-porters [...] perl.org
Date: Wed, 14 May 2014 10:14:08 -0400
Download (untitled) / with headers
text/plain 1.4k
On Wednesday-201405-14, 5:11, Martin McGrath (via RT) wrote: Show quoted text
> # New Ticket Created by Martin McGrath > # Please include the string: [perl #121870] > # in the subject line of all future correspondence about this issue. > # <URL: https://rt.perl.org/Ticket/Display.html?id=121870 > > > > Apologies if this has already been discussed. I don't subscribe to p5p > but have searched the list before posting, and didn't find anything > relating to this issue. > > This patch replaces the example/recommendation of using File::Slurp > with Path::Tiny in perlfaq5. > > The reasoning behind this is an outstanding critical bug with File::Slurp: > > https://rt.cpan.org/Public/Bug/Display.html?id=83126
In addition to File::Slurp / perlfaq problem, shouldn't this: Show quoted text
> sysread treats any :encoding(...) as effectively :utf8. > Thus, requesting { binmode => ":encoding(UTF-8)" } (e.g. strict UTF-8
compliance) actually results in Perl's lax, insecure utf8 decoding being used instead. Show quoted text
> ... > More importantly, it will interpret any encoding as :utf8, even for
example :encoding(UTF-16). be classified as a serious issue in sysread? Though, I think, one could well argue that trying to read anything else than bytes (be it utf8 of any kind, or any other encoding) with sysread is a dubious proposition to begin with. Show quoted text
> This is a bug report for perl from mcgrath.martin@gmail.com, > generated with the help of perlbug 1.40 running under perl 5.19.12. >
From: Leon Timmermans <fawaka [...] gmail.com>
CC: Perl5 Porters <perl5-porters [...] perl.org>
Subject: Re: [perl #121870] [PATCH] Replace File::Slurp with Path::Tiny in perlfaq5
To: Jarkko Hietaniemi <jhi [...] iki.fi>
Date: Wed, 14 May 2014 16:23:25 +0200
Download (untitled) / with headers
text/plain 922b
On Wed, May 14, 2014 at 4:14 PM, Jarkko Hietaniemi <jhi@iki.fi> wrote:
Show quoted text
In addition to File::Slurp / perlfaq problem, shouldn't this:

> sysread treats any :encoding(...) as effectively :utf8.
> Thus, requesting { binmode => ":encoding(UTF-8)" } (e.g. strict UTF-8
compliance) actually results in Perl's lax, insecure utf8 decoding being
used instead.
> ...
> More importantly, it will interpret any encoding as :utf8, even for example :encoding(UTF-16).

be classified as a serious issue in sysread?

Though, I think, one could well argue that trying to read anything else than bytes (be it utf8 of any kind, or any other encoding) with sysread is a dubious proposition to begin with.

Yes. This is essentially a core bug that's leaking out because File::Slurp is stupid. sysread might as well die in this particular case IMO, but I'm not sure if we can retroactively make it do that.

Leon
From: Ed Avis <eda [...] waniasset.com>
To: perl5-porters [...] perl.org
Date: Wed, 14 May 2014 14:44:55 +0000 (UTC)
Subject: Re: [perl #121870] [PATCH] Replace File::Slurp with Path::Tiny in perlfaq5
Download (untitled) / with headers
text/plain 102b
Interesting. Have you suggested this as a Perl::Critic policy too? -- Ed Avis <eda@waniasset.com>
Date: Wed, 14 May 2014 10:46:51 -0400
To: Leon Timmermans <fawaka [...] gmail.com>
CC: Perl5 Porters <perl5-porters [...] perl.org>
Subject: Re: [perl #121870] [PATCH] Replace File::Slurp with Path::Tiny in perlfaq5
From: Jarkko Hietaniemi <jhi [...] iki.fi>
Download (untitled) / with headers
text/plain 359b
On Wednesday-201405-14, 10:23, Leon Timmermans wrote: Show quoted text
> Yes. This is essentially a core bug that's leaking out because > File::Slurp is stupid. sysread might as well die in this particular case > IMO, but I'm not sure if we can retroactively make it do that.
Probably not for 5.20, harhar... but certainly we could try purposefully breaking this for 5.21.
Subject: Re: [perl #121870] [PATCH] Replace File::Slurp with Path::Tiny in perlfaq5
CC: Perl5 Porters <perl5-porters [...] perl.org>
To: Leon Timmermans <fawaka [...] gmail.com>
Date: Wed, 14 May 2014 10:56:09 -0400
From: Jarkko Hietaniemi <jhi [...] iki.fi>
Download (untitled) / with headers
text/plain 240b
Show quoted text
> Yes. This is essentially a core bug that's leaking out because > File::Slurp is stupid. sysread might as well die in this particular case > IMO, but I'm not sure if we can retroactively make it do that.
I'll file a bug on this. Show quoted text
> Leon
From: Uri Guttman <uri [...] stemsystems.com>
Subject: Re: [perl #121870] [PATCH] Replace File::Slurp with Path::Tiny in perlfaq5
To: perl5-porters [...] perl.org
Date: Wed, 14 May 2014 11:14:26 -0400
Download (untitled) / with headers
text/plain 1.3k
On 05/14/2014 10:23 AM, Leon Timmermans wrote: Show quoted text
> On Wed, May 14, 2014 at 4:14 PM, Jarkko Hietaniemi <jhi@iki.fi> wrote: >
>> In addition to File::Slurp / perlfaq problem, shouldn't this: >>
>>> sysread treats any :encoding(...) as effectively :utf8. >>> Thus, requesting { binmode => ":encoding(UTF-8)" } (e.g. strict UTF-8
>> compliance) actually results in Perl's lax, insecure utf8 decoding being >> used instead.
>>> ... >>> More importantly, it will interpret any encoding as :utf8, even for
>> example :encoding(UTF-16). >> >> be classified as a serious issue in sysread? >> >> Though, I think, one could well argue that trying to read anything else >> than bytes (be it utf8 of any kind, or any other encoding) with sysread is >> a dubious proposition to begin with. >>
> > Yes. This is essentially a core bug that's leaking out because File::Slurp > is stupid. sysread might as well die in this particular case IMO, but I'm > not sure if we can retroactively make it do that.
and how would you go about fixing it in file::slurp? i am not a unicode/utf guru by any stretch (i say i am an ascii bigot! :). would just doing a raw sysread all the time and then converting the buffer as directed by binmode work? any other suggestions for how to read utf files with ease and speed? thanx, uri -- Uri Guttman - The Perl Hunter The Best Perl Jobs, The Best Perl Hackers http://PerlHunter.com
Subject: Re: [perl #121870] [PATCH] Replace File::Slurp with Path::Tiny in perlfaq5
CC: Perl5 Porters <perl5-porters [...] perl.org>
From: Leon Timmermans <fawaka [...] gmail.com>
Date: Wed, 14 May 2014 17:30:29 +0200
To: Uri Guttman <uri [...] stemsystems.com>
Download (untitled) / with headers
text/plain 646b
On Wed, May 14, 2014 at 5:14 PM, Uri Guttman <uri@stemsystems.com> wrote:
Show quoted text
and how would you go about fixing it in file::slurp? i am not a unicode/utf guru by any stretch (i say i am an ascii bigot! :). would just doing a raw sysread all the time and then converting the buffer as directed by binmode work? any other suggestions for how to read utf files with ease and speed?

File::Slurp is nothing more than a workaround for a performance bug in perl. There is no reason «my $foo = do { local $/; <$fh> };» would need to be any slower than what File::Slurp is doing right now. In fact I'm planning to fix just that in 5.21.

Leon
From: Jarkko Hietaniemi <jhi [...] iki.fi>
To: Leon Timmermans <fawaka [...] gmail.com>, Uri Guttman <uri [...] stemsystems.com>
Date: Wed, 14 May 2014 12:34:59 -0400
Subject: Re: [perl #121870] [PATCH] Replace File::Slurp with Path::Tiny in perlfaq5
CC: Perl5 Porters <perl5-porters [...] perl.org>
Download (untitled) / with headers
text/plain 1.2k
On Wednesday-201405-14, 11:30, Leon Timmermans wrote: Show quoted text
> On Wed, May 14, 2014 at 5:14 PM, Uri Guttman <uri@stemsystems.com > <mailto:uri@stemsystems.com>> wrote: > > and how would you go about fixing it in file::slurp? i am not a > unicode/utf guru by any stretch (i say i am an ascii bigot! :). > would just doing a raw sysread all the time and then converting the > buffer as directed by binmode work? any other suggestions for how to > read utf files with ease and speed? > > > File::Slurp is nothing more than a workaround for a performance bug in > perl. There is no reason «my $foo = do { local $/; <$fh> };» would need > to be any slower than what File::Slurp is doing right now. In fact I'm > planning to fix just that in 5.21. > > Leon
FWIW re the UTF-8 validity/strictness: I think we're well within our rights to break things, quoting Standards (via UTF-8 Wikipedia page): RFC 3629 states "Implementations of the decoding algorithm MUST protect against decoding invalid sequences." ... The Unicode Standard requires decoders to "...treat any ill-formed code unit sequence as an error condition. This guarantees that it will neither interpret nor emit an ill-formed code unit sequence."
Date: Wed, 14 May 2014 13:43:42 -0400
To: Jarkko Hietaniemi <jhi [...] iki.fi>
Subject: Re: [perl #121870] [PATCH] Replace File::Slurp with Path::Tiny in perlfaq5
CC: Leon Timmermans <fawaka [...] gmail.com>, Uri Guttman <uri [...] stemsystems.com>, Perl5 Porters <perl5-porters [...] perl.org>
From: David Golden <xdg [...] xdg.me>
Download (untitled) / with headers
text/plain 731b
On Wed, May 14, 2014 at 12:34 PM, Jarkko Hietaniemi <jhi@iki.fi> wrote: Show quoted text
> RFC 3629 states "Implementations of the decoding algorithm MUST protect > against decoding invalid sequences." ... The Unicode Standard requires > decoders to "...treat any ill-formed code unit sequence as an error > condition. This guarantees that it will neither interpret nor emit an > ill-formed code unit sequence."
It's worth considering two separate issues: * invalid UTF-8 encoding * private/nonchar codepoints As I understand it, :utf8 is lax in both respects. I'd be very supportive of fixing the first, even if it breaks backwards compatibility, but don't think we should address the second. -- David Golden <xdg@xdg.me> Twitter/IRC: @xdg
Subject: Re: [perl #121870] [PATCH] Replace File::Slurp with Path::Tiny in perlfaq5
From: "Dr.Ruud" <rvtol+usenet [...] isolution.nl>
Date: Thu, 15 May 2014 02:47:48 +0200
To: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 915b
On 2014-05-14 17:30, Leon Timmermans wrote: Show quoted text
> On Wed, May 14, 2014 at 5:14 PM, Uri Guttman <uri@stemsystems.com > <mailto:uri@stemsystems.com>> wrote:
Show quoted text
> and how would you go about fixing it in file::slurp? i am not a > unicode/utf guru by any stretch (i say i am an ascii bigot! :). > would just doing a raw sysread all the time and then converting the > buffer as directed by binmode work? any other suggestions for how to > read utf files with ease and speed? > > > File::Slurp is nothing more than a workaround for a performance bug in > perl. There is no reason «my $foo = do { local $/; <$fh> };» would need > to be any slower than what File::Slurp is doing right now. In fact I'm > planning to fix just that in 5.21.
Not sure if this is still true for the latest Perls, but you better write it as: my $foo; { local $/; $foo = <$fh> } or it will use twice the memory. -- Ruud
To: perl5-porters [...] perl.org
Date: Wed, 14 May 2014 22:57:41 -0400
From: Uri Guttman <uri [...] stemsystems.com>
Subject: Re: [perl #121870] [PATCH] Replace File::Slurp with Path::Tiny in perlfaq5
Download (untitled) / with headers
text/plain 1.7k
On 05/14/2014 11:30 AM, Leon Timmermans wrote: Show quoted text
> On Wed, May 14, 2014 at 5:14 PM, Uri Guttman <uri@stemsystems.com> wrote: >
>> and how would you go about fixing it in file::slurp? i am not a >> unicode/utf guru by any stretch (i say i am an ascii bigot! :). would just >> doing a raw sysread all the time and then converting the buffer as directed >> by binmode work? any other suggestions for how to read utf files with ease >> and speed? >>
> > File::Slurp is nothing more than a workaround for a performance bug in > perl. There is no reason «my $foo = do { local $/; <$fh> };» would need to > be any slower than what File::Slurp is doing right now. In fact I'm > planning to fix just that in 5.21.
if you think that is all there is to it, then do the workaround. most coders think read_file( $filename) is a much better piece of code than the infamous do/local hack. it isn't just a speed up (and check out the benchmark script to see how much it does blows away do/local) but also has a better api, better error handling options, more flexible ways to get the data back. of course you can do atomic writing on your own. edit_file and edit_file_lines would be trivial for you to write with open/read/write and all the other options. and where is the equivilent of write_file with do local? gonna need at least an open call and a print along with error handling and more. over 500 modules and distros use file::slurp. tell them all to use the do/local thing. i am sure you will be rewarded with gold and jewels. in the mean time, instead of saying it uses sysread and can't handle utf correctly, how about saying how to properly handle it? maybe those 500 modules will like that instead. uri -- Uri Guttman - The Perl Hunter The Best Perl Jobs, The Best Perl Hackers http://PerlHunter.com
Date: Wed, 14 May 2014 15:52:18 +0100
To: perlbug-followup [...] perl.org
Subject: Re: [perl #121870] [PATCH] Replace File::Slurp with Path::Tiny in perlfaq5
From: Martin McGrath <mcgrath.martin [...] gmail.com>
Download (untitled) / with headers
text/plain 218b
On 14 May 2014 15:45, Ed Avis via RT <perlbug-followup@perl.org> wrote: Show quoted text
> Interesting. Have you suggested this as a Perl::Critic policy too? >
Not as yet, I've forked the repo and will work on this tonight. Thanks
To: Uri Guttman <uri [...] stemsystems.com>, perl5-porters [...] perl.org
Date: Thu, 15 May 2014 08:39:28 -0400
Subject: Re: [perl #121870] [PATCH] Replace File::Slurp with Path::Tiny in perlfaq5
From: Jarkko Hietaniemi <jhi [...] iki.fi>
Download (untitled) / with headers
text/plain 402b
On Wednesday-201405-14, 22:57, Uri Guttman wrote: Show quoted text
> in the mean time, instead of saying it uses sysread and can't handle utf > correctly, how about saying how to properly handle it? maybe those 500 > modules will like that instead.
Based on some torture testing I just did, sysread() isn't the only thing broken regarding :encoding(blah)... working on it, but it will be 5.21 before fixes happen.
Date: Thu, 15 May 2014 22:43:14 +1000
To: Uri Guttman <uri [...] stemsystems.com>
CC: perl5-porters [...] perl.org
Subject: Re: [perl #121870] [PATCH] Replace File::Slurp with Path::Tiny in perlfaq5
From: Tony Cook <tony [...] develop-help.com>
Download (untitled) / with headers
text/plain 593b
On Wed, May 14, 2014 at 10:57:41PM -0400, Uri Guttman wrote: Show quoted text
> in the mean time, instead of saying it uses sysread and can't handle > utf correctly, how about saying how to properly handle it? maybe > those 500 modules will like that instead.
Don't use sysread() if there are any layers - since it ignores everything but the :utf8 flag[1] I'd tend to fallback to read(), whether that meets your performance requirements I don't know. Tony [1] in a scary way, eg. if your file is UCS-2BE and you open with :encoding(UCS-2BE), sysread() will (try to) treat that UCS-2BE data as UTF-8 data.
Subject: Re: [perl #121870] [PATCH] Replace File::Slurp with Path::Tiny in perlfaq5
CC: Perl5 Porters <perl5-porters [...] perl.org>
From: Leon Timmermans <fawaka [...] gmail.com>
Date: Thu, 15 May 2014 14:54:09 +0200
To: Uri Guttman <uri [...] stemsystems.com>
Download (untitled) / with headers
text/plain 1.3k
On Thu, May 15, 2014 at 4:57 AM, Uri Guttman <uri@stemsystems.com> wrote:
Show quoted text
if you think that is all there is to it, then do the workaround. most coders think read_file( $filename) is a much better piece of code than the infamous do/local hack. it isn't just a speed up (and check out the benchmark script to see how much it does blows away do/local) but also has a better api, better error handling options, more flexible ways to get the data back. of course you can do atomic writing on your own.

All of that is true, but you don't need hundreds of lines of conceptually buggy code to achieve that.
 
Show quoted text
edit_file and edit_file_lines would be trivial for you to write with open/read/write and all the other options. and where is the equivilent of write_file with do local? gonna need at least an open call and a print along with error handling and more.

edit_file and edit_file_lines are horribly broken by ignoring encodings altogether.
  Show quoted text
over 500 modules and distros use file::slurp. tell them all to use the do/local thing. i am sure you will be rewarded with gold and jewels.

in the mean time, instead of saying it uses sysread and can't handle utf correctly, how about saying how to properly handle it? maybe those 500 modules will like that instead.

Don't use sysread, at least not unless you're absolutely sure it gives the right answer. It's really that simple.

Leon
Date: Thu, 15 May 2014 10:26:15 -0400
To: perl5-porters [...] perl.org
Subject: Re: [perl #121870] [PATCH] Replace File::Slurp with Path::Tiny in perlfaq5
From: Uri Guttman <uri [...] stemsystems.com>
Download (untitled) / with headers
text/plain 787b
On 05/15/2014 08:43 AM, Tony Cook wrote: Show quoted text
> On Wed, May 14, 2014 at 10:57:41PM -0400, Uri Guttman wrote:
>> in the mean time, instead of saying it uses sysread and can't handle >> utf correctly, how about saying how to properly handle it? maybe >> those 500 modules will like that instead.
> > Don't use sysread() if there are any layers - since it ignores > everything but the :utf8 flag[1] > > I'd tend to fallback to read(), whether that meets your performance > requirements I don't know.
so what about sysread in raw mode and then converting the buffer to the requested format? that should bypass any sysread issues and then rely on the utf conversion subs directly. thanx, uri -- Uri Guttman - The Perl Hunter The Best Perl Jobs, The Best Perl Hackers http://PerlHunter.com
From: Uri Guttman <uri [...] stemsystems.com>
Subject: Re: [perl #121870] [PATCH] Replace File::Slurp with Path::Tiny in perlfaq5
CC: Perl5 Porters <perl5-porters [...] perl.org>
To: Leon Timmermans <fawaka [...] gmail.com>
Date: Thu, 15 May 2014 10:31:35 -0400
On 05/15/2014 08:54 AM, Leon Timmermans wrote: Show quoted text
> On Thu, May 15, 2014 at 4:57 AM, Uri Guttman <uri@stemsystems.com> wrote: >
>> if you think that is all there is to it, then do the workaround. most >> coders think read_file( $filename) is a much better piece of code than the >> infamous do/local hack. it isn't just a speed up (and check out the >> benchmark script to see how much it does blows away do/local) but also has >> a better api, better error handling options, more flexible ways to get the >> data back. of course you can do atomic writing on your own.
> > > All of that is true, but you don't need hundreds of lines of conceptually > buggy code to achieve that. >
and there aren't many bug reports on those hundreds of lines. in fact most are utf related. the code is very solid and that isn't just my opinion. Show quoted text
>
>> edit_file and edit_file_lines would be trivial for you to write with >> open/read/write and all the other options. and where is the equivilent of >> write_file with do local? gonna need at least an open call and a print >> along with error handling and more. >>
> > edit_file and edit_file_lines are horribly broken by ignoring encodings > altogether. >
they don't ignore them. they pass on the encoding options to read_file and write_file. Show quoted text
>> >> over 500 modules and distros use file::slurp. tell them all to use the >> do/local thing. i am sure you will be rewarded with gold and jewels. >> >> in the mean time, instead of saying it uses sysread and can't handle utf >> correctly, how about saying how to properly handle it? maybe those 500 >> modules will like that instead. >>
> > Don't use sysread, at least not unless you're absolutely sure it gives the > right answer. It's really that simple.
and you didn't answer the question about calling sysread in raw mode and encoding afterwards. try answering that and being useful here. it isn't my fault that sysread encoding is broken and i wrote file::slurp 11 years ago when sysread was fine and dandy. uri -- Uri Guttman - The Perl Hunter The Best Perl Jobs, The Best Perl Hackers http://PerlHunter.com
To: Uri Guttman <uri [...] stemsystems.com>
Date: Thu, 15 May 2014 09:46:54 -0500
From: "Craig A. Berry" <craig.a.berry [...] gmail.com>
CC: "Perl5 Porters (E-mail)" <perl5-porters [...] perl.org>
Subject: Re: [perl #121870] [PATCH] Replace File::Slurp with Path::Tiny in perlfaq5
Download (untitled) / with headers
text/plain 560b
On Thu, May 15, 2014 at 9:26 AM, Uri Guttman <uri@stemsystems.com> wrote: Show quoted text
> so what about sysread in raw mode and then converting the buffer to the > requested format? that should bypass any sysread issues and then rely on the > utf conversion subs directly.
The main problem I'm aware of is that there is no guarantee the buffer ends on a character boundary of the data are in a multi-byte encoding. The end of the buffer might have a partial character in it and the rest of that character hasn't been read in yet. You can't convert a partial character.
From: Uri Guttman <uri [...] stemsystems.com>
Subject: Re: [perl #121870] [PATCH] Replace File::Slurp with Path::Tiny in perlfaq5
To: perl5-porters [...] perl.org
Date: Thu, 15 May 2014 11:09:24 -0400
Download (untitled) / with headers
text/plain 1.7k
On 05/15/2014 10:46 AM, Craig A. Berry wrote: Show quoted text
> On Thu, May 15, 2014 at 9:26 AM, Uri Guttman <uri@stemsystems.com> wrote: >
>> so what about sysread in raw mode and then converting the buffer to the >> requested format? that should bypass any sysread issues and then rely on the >> utf conversion subs directly.
> > The main problem I'm aware of is that there is no guarantee the buffer > ends on a character boundary of the data are in a multi-byte encoding. > The end of the buffer might have a partial character in it and the > rest of that character hasn't been read in yet. You can't convert a > partial character.
the whole idea of slurp is to read in the entire file into a string. any partial chars at the end at the fault of whatever wrote the file. this conversion wouldn't be done on a block by block basis (and i see your point there as it finally is clear why sysread would be broken if used that way). so again, i will ask the simple question to which i have not gotten a proper answer from all you unicode experts. if file::slurp reads in an ENTIRE file with raw mode sysread into a single string, and THEN converts its to the requested format, will that work? given the actual description of the bug above, i can't see why it won't work. and it will be an EASY fix (patches welcome as slurp is now on github under perlhunter). so i will be blunt here, put up or shut up. i am asking for help on this unicode/utf issue where i am very weak. i know many of you are experts on perl/utf so what is the best way to do this? what subs do i call to do the conversion of a full buffer? i can make sysread do raw mode and delay the binmode option to a later conversion. thanx, uri -- Uri Guttman - The Perl Hunter The Best Perl Jobs, The Best Perl Hackers http://PerlHunter.com
To: Uri Guttman <uri [...] stemsystems.com>
Date: Thu, 15 May 2014 17:31:53 +0200
From: demerphq <demerphq [...] gmail.com>
Subject: Re: [perl #121870] [PATCH] Replace File::Slurp with Path::Tiny in perlfaq5
CC: Perl5 Porteros <perl5-porters [...] perl.org>
Download (untitled) / with headers
text/plain 1.9k
On 15 May 2014 17:09, Uri Guttman <uri@stemsystems.com> wrote:
Show quoted text
On 05/15/2014 10:46 AM, Craig A. Berry wrote:
On Thu, May 15, 2014 at 9:26 AM, Uri Guttman <uri@stemsystems.com> wrote:

so what about sysread in raw mode and then converting the buffer to the
requested format? that should bypass any sysread issues and then rely on the
utf conversion subs directly.

The main problem I'm aware of is that there is no guarantee the buffer
ends on a character boundary of the data are in a multi-byte encoding.
  The end of the buffer might have a partial character in it and the
rest of that character hasn't been read in yet.  You can't convert a
partial character.

the whole idea of slurp is to read in the entire file into a string. any partial chars at the end at the fault of whatever wrote the file. this conversion wouldn't be done on a block by block basis (and i see your point there as it finally is clear why sysread would be broken if used that way).

so again, i will ask the simple question to which i have not gotten a proper answer from all you unicode experts. if file::slurp reads in an ENTIRE file with raw mode sysread into a single string, and THEN converts its to the requested format, will that work? given the actual description of the bug above, i can't see why it won't work. and it will be an EASY fix (patches welcome as slurp is now on github under perlhunter).

so i will be blunt here, put up or shut up. i am asking for help on this unicode/utf issue where i am very weak. i know many of you are experts on perl/utf so what is the best way to do this? what subs do i call to do the conversion of a full buffer? i can make sysread do raw mode and delay the binmode option to a later conversion.


Reading the full file and then calling Encode::decode() on it with the right flags would work fine.

cheers,
Yves 


--
perl -Mre=debug -e "/just|another|perl|hacker/"
Date: Thu, 15 May 2014 12:02:12 -0400
To: demerphq <demerphq [...] gmail.com>
Subject: Re: [perl #121870] [PATCH] Replace File::Slurp with Path::Tiny in perlfaq5
CC: Perl5 Porteros <perl5-porters [...] perl.org>
From: Uri Guttman <uri [...] stemsystems.com>
Download (untitled) / with headers
text/plain 2.2k
On 05/15/2014 11:31 AM, demerphq wrote: Show quoted text
> On 15 May 2014 17:09, Uri Guttman <uri@stemsystems.com> wrote: >
>> On 05/15/2014 10:46 AM, Craig A. Berry wrote: >>
>>> On Thu, May 15, 2014 at 9:26 AM, Uri Guttman <uri@stemsystems.com> wrote: >>> >>> so what about sysread in raw mode and then converting the buffer to the
>>>> requested format? that should bypass any sysread issues and then rely on >>>> the >>>> utf conversion subs directly. >>>>
>>> >>> The main problem I'm aware of is that there is no guarantee the buffer >>> ends on a character boundary of the data are in a multi-byte encoding. >>> The end of the buffer might have a partial character in it and the >>> rest of that character hasn't been read in yet. You can't convert a >>> partial character. >>>
>> >> the whole idea of slurp is to read in the entire file into a string. any >> partial chars at the end at the fault of whatever wrote the file. this >> conversion wouldn't be done on a block by block basis (and i see your point >> there as it finally is clear why sysread would be broken if used that way). >> >> so again, i will ask the simple question to which i have not gotten a >> proper answer from all you unicode experts. if file::slurp reads in an >> ENTIRE file with raw mode sysread into a single string, and THEN converts >> its to the requested format, will that work? given the actual description >> of the bug above, i can't see why it won't work. and it will be an EASY fix >> (patches welcome as slurp is now on github under perlhunter). >> >> so i will be blunt here, put up or shut up. i am asking for help on this >> unicode/utf issue where i am very weak. i know many of you are experts on >> perl/utf so what is the best way to do this? what subs do i call to do the >> conversion of a full buffer? i can make sysread do raw mode and delay the >> binmode option to a later conversion. >> >>
> Reading the full file and then calling Encode::decode() on it with the > right flags would work fine.
yves, thanks for the direct answer. now the question is should i just pass in what i get in the bin_mode option to decode or something else? i have used decode before but i don't recall if it uses the same args as binmode. thanx, uri -- Uri Guttman - The Perl Hunter The Best Perl Jobs, The Best Perl Hackers http://PerlHunter.com
Date: Thu, 15 May 2014 11:09:40 -0500 (CDT)
To: perl5-porters [...] perl.org
Subject: Re: [perl #121870] [PATCH] Replace File::Slurp with Path::Tiny in perlfaq5
From: Dave Rolsky <autarch [...] urth.org>
Download (untitled) / with headers
text/plain 948b
On Thu, 15 May 2014, Uri Guttman wrote: Show quoted text
> thanks for the direct answer. now the question is should i just pass in what > i get in the bin_mode option to decode or something else? i have used decode > before but i don't recall if it uses the same args as binmode.
This will only sort of work. With binmode you can set multiple layers and also do things unrelated to character sets (like gzip encode/decode, line ending translation, etc.). Encode can accept anything that would be passed to the ":encoding(...)" layer, as well as ":utf8", but the two don't really correspond. I suspect a better fix would be to simply _not_ use sysread if binmode is anything besides ":utf8" or ":raw". -dave /*============================================================ http://VegGuide.org http://blog.urth.org Your guide to all that's veg House Absolute(ly Pointless) ============================================================*/
To: perl5-porters [...] perl.org
Date: Thu, 15 May 2014 12:28:24 -0400
From: Uri Guttman <uri [...] stemsystems.com>
Subject: Re: [perl #121870] [PATCH] Replace File::Slurp with Path::Tiny in perlfaq5
Download (untitled) / with headers
text/plain 1.5k
On 05/15/2014 12:09 PM, Dave Rolsky wrote: Show quoted text
> On Thu, 15 May 2014, Uri Guttman wrote: >
>> thanks for the direct answer. now the question is should i just pass >> in what i get in the bin_mode option to decode or something else? i >> have used decode before but i don't recall if it uses the same args as >> binmode.
> > This will only sort of work. With binmode you can set multiple layers > and also do things unrelated to character sets (like gzip encode/decode, > line ending translation, etc.). > > Encode can accept anything that would be passed to the ":encoding(...)" > layer, as well as ":utf8", but the two don't really correspond. > > I suspect a better fix would be to simply _not_ use sysread if binmode > is anything besides ":utf8" or ":raw".
what about adding another option that is passed directly to decode? read() would still fail as files are read in blocks and that would also trigger the same error if a char was partially read. what i can do is always do :raw on sysread and call encode when the binmode is something allowed by decode like utf8 or something. it would ignore it when the mode is requested to be :raw (could be a binary file). i don't see why i can't keep sysread if somehow the user can specify how to decode. an alternative is for callers to always use :raw and then do decode on their own. that could be added to the pod saying if the new decode feature doesn't work for them, roll your own. thanx, uri -- Uri Guttman - The Perl Hunter The Best Perl Jobs, The Best Perl Hackers http://PerlHunter.com
To: Uri Guttman <uri [...] stemsystems.com>
Date: Thu, 15 May 2014 13:16:33 -0400
From: Eric Brine <ikegami [...] adaelis.com>
CC: perl5 porters <perl5-porters [...] perl.org>
Subject: Re: [perl #121870] [PATCH] Replace File::Slurp with Path::Tiny in perlfaq5
Download (untitled) / with headers
text/plain 456b
On Thu, May 15, 2014 at 11:09 AM, Uri Guttman <uri@stemsystems.com> wrote:
Show quoted text
so again, i will ask the simple question to which i have not gotten a proper answer from all you unicode experts. if file::slurp reads in an ENTIRE file with raw mode sysread into a single string, and THEN converts its to the requested format, will that work?

Yes, but it won't handle layers other than :encoding, and it won't handle already-opened handles with layers.

To: perl5-porters [...] perl.org
Date: Thu, 15 May 2014 14:04:39 -0400
From: Uri Guttman <uri [...] stemsystems.com>
Subject: Re: [perl #121870] [PATCH] Replace File::Slurp with Path::Tiny in perlfaq5
Download (untitled) / with headers
text/plain 1.1k
On 05/15/2014 01:16 PM, Eric Brine wrote: Show quoted text
> On Thu, May 15, 2014 at 11:09 AM, Uri Guttman <uri@stemsystems.com> wrote: >
>> so again, i will ask the simple question to which i have not gotten a >> proper answer from all you unicode experts. if file::slurp reads in an >> ENTIRE file with raw mode sysread into a single string, and THEN converts >> its to the requested format, will that work?
> > > Yes, but it won't handle layers other than :encoding, and it won't handle > already-opened handles with layers. >
that can be documented. most callers of read_file won't use it with already open files. maybe for sockets and other handles (e.g. <DATA>) which may not have layers. also with slurping the only common layer i know is already done which is line splitting (works on all platforms). so the decoding is the only issue. if you need more complex stuff, then do it yourself. this is for common basic read in a whole file with utf decoding as needed. and that is by far the common case and what the module is intended for. it isn't meant to replace complex i/o, just make common i/o simpler and faster. thanx, uri -- Uri Guttman - The Perl Hunter The Best Perl Jobs, The Best Perl Hackers http://PerlHunter.com
CC: Perl5 Porters <perl5-porters [...] perl.org>
Subject: Re: [perl #121870] [PATCH] Replace File::Slurp with Path::Tiny in perlfaq5
From: Leon Timmermans <fawaka [...] gmail.com>
Date: Thu, 15 May 2014 20:10:28 +0200
To: Uri Guttman <uri [...] stemsystems.com>
Download (untitled) / with headers
text/plain 1.4k
On Thu, May 15, 2014 at 4:31 PM, Uri Guttman <uri@stemsystems.com> wrote:
Show quoted text
and there aren't many bug reports on those hundreds of lines. in fact most are utf related. the code is very solid and that isn't just my opinion.

It's just *completely* broken with encodings, which is a fairly critical bug. It just happens to get explicit crlf wrong. Its dealing with layers is dubious at best and flat out wrong more often than it should.

Show quoted text
they don't ignore them. they pass on the encoding options to read_file and write_file.

Sorry, it seems I had misread the docs there.
 
Show quoted text
Don't use sysread, at least not unless you're absolutely sure it gives the
right answer. It's really that simple.

and you didn't answer the question about calling sysread in raw mode and encoding afterwards. try answering that and being useful here. it isn't my fault that sysread encoding is broken and i wrote file::slurp 11 years ago when sysread was fine and dandy.

Yes, you can, but why would you go through these complications? What makes you think you'd be faster/better/whatever than

sub  read_file($filename, $encoding = 'utf8') {
  open my $fh, '<:encoding($encoding)', $filename or die "Can't open $filename: $!";
  return do { local $/; <$fh> }
}

It's not that this can't be made to work, it's that a 100+ lines of code doing tricky things for optimizations that can be achieved using significantly less code, and you're proposing to add more kludges to this.

Leon
To: Leon Timmermans <fawaka [...] gmail.com>
Date: Thu, 15 May 2014 14:38:40 -0400
From: Uri Guttman <uri [...] stemsystems.com>
Subject: Re: [perl #121870] [PATCH] Replace File::Slurp with Path::Tiny in perlfaq5
CC: Perl5 Porters <perl5-porters [...] perl.org>
Download (untitled) / with headers
text/plain 3.2k
On 05/15/2014 02:10 PM, Leon Timmermans wrote: Show quoted text
> On Thu, May 15, 2014 at 4:31 PM, Uri Guttman <uri@stemsystems.com> wrote: >
>> and there aren't many bug reports on those hundreds of lines. in fact most >> are utf related. the code is very solid and that isn't just my opinion.
> > > It's just *completely* broken with encodings, which is a fairly critical > bug. It just happens to get explicit crlf wrong. Its dealing with layers is > dubious at best and flat out wrong more often than it should.
and why would you need to explicitly pass in cdlf when it can do line splitting already? and as i said, layers aren't a critical part of the concept. if you want complex i/o do it yourself. slurping is mostly for common text files (think config, source, etc.) which don't need anything but a possible decode. this isn't line by line or some other layered thing. Show quoted text
> > they don't ignore them. they pass on the encoding options to read_file and
>> write_file. >>
> > Sorry, it seems I had misread the docs there.
i am not that dumb as to not pass in those options. the idea of those is to use read_file/write_file as they are working. eating my own dog food. Show quoted text
> >
>> Don't use sysread, at least not unless you're absolutely sure it gives the
>>> right answer. It's really that simple. >>>
>> >> and you didn't answer the question about calling sysread in raw mode and >> encoding afterwards. try answering that and being useful here. it isn't my >> fault that sysread encoding is broken and i wrote file::slurp 11 years ago >> when sysread was fine and dandy.
> > > Yes, you can, but why would you go through these complications? What makes > you think you'd be faster/better/whatever than > > sub read_file($filename, $encoding = 'utf8') { > open my $fh, '<:encoding($encoding)', $filename or die "Can't open > $filename: $!"; > return do { local $/; <$fh> } > }
so do that your self. and what about all the users out there who don't need that and want a simple call? you can say the same thing for so many modules which may have a simple workaround in 6 lines of code but do a whole lot more. so stop your rant on this because slurp is used all over and it isn't going away. don't use it if you don't want to. helping to fix the decode problem is useful as other here have done. Show quoted text
> > It's not that this can't be made to work, it's that a 100+ lines of code > doing tricky things for optimizations that can be achieved using > significantly less code, and you're proposing to add more kludges to this.
tricky things for optimizations? you still haven't really read the code. most of the 'tricky' code is handling options, line splitting on winblows (that is a bad special case), and such. very little other than looping over a LARGE sysread call of 1mb is done for optimization. and run the benchmark already. see the difference in MANY styles of slurping in files. choose a file size, etc. the do/local way is almost the slowest possible way and FUGLY as all hell. i would never allow someone to write that code when i review it. even a sub wrapper on that is better but then you might as well use file::slurp. and yes, i review code for all my work, both consulting and placement so i have some experience on what good code looks like. uri -- Uri Guttman - The Perl Hunter The Best Perl Jobs, The Best Perl Hackers http://PerlHunter.com
Date: Thu, 15 May 2014 16:00:07 -0400
To: Uri Guttman <uri [...] stemsystems.com>
CC: perl5 porters <perl5-porters [...] perl.org>
Subject: Re: [perl #121870] [PATCH] Replace File::Slurp with Path::Tiny in perlfaq5
From: Eric Brine <ikegami [...] adaelis.com>
Download (untitled) / with headers
text/plain 256b
On Thu, May 15, 2014 at 2:04 PM, Uri Guttman <uri@stemsystems.com> wrote:
Show quoted text
also with slurping the only common layer i know is already done which is line splitting (works on all platforms). so the decoding is the only issue

:crlf is also very common.

From: Uri Guttman <uri [...] stemsystems.com>
CC: perl5 porters <perl5-porters [...] perl.org>
Subject: Re: [perl #121870] [PATCH] Replace File::Slurp with Path::Tiny in perlfaq5
To: Eric Brine <ikegami [...] adaelis.com>
Date: Thu, 15 May 2014 16:09:49 -0400
Download (untitled) / with headers
text/plain 725b
On 05/15/2014 04:00 PM, Eric Brine wrote: Show quoted text
> On Thu, May 15, 2014 at 2:04 PM, Uri Guttman <uri@stemsystems.com> wrote: >
>> also with slurping the only common layer i know is already done which is >> line splitting (works on all platforms). so the decoding is the only issue
> > > :crlf is also very common.
from what i see in perldoc perlio it just converts crlf to \n on platforms that need it. read_file has already done that for a long long time. this is true for both files slurped into a scalar and into an array of lines. so that isn't anything i need to be concerned about with layers and decoding. thanx, uri -- Uri Guttman - The Perl Hunter The Best Perl Jobs, The Best Perl Hackers http://PerlHunter.com
To: Eric Brine <ikegami [...] adaelis.com>
Date: Thu, 15 May 2014 22:25:22 +0200
From: Leon Timmermans <fawaka [...] gmail.com>
Subject: Re: [perl #121870] [PATCH] Replace File::Slurp with Path::Tiny in perlfaq5
CC: Uri Guttman <uri [...] stemsystems.com>, perl5 porters <perl5-porters [...] perl.org>
Download (untitled) / with headers
text/plain 610b
On Thu, May 15, 2014 at 10:00 PM, Eric Brine <ikegami@adaelis.com> wrote:
Show quoted text
On Thu, May 15, 2014 at 2:04 PM, Uri Guttman <uri@stemsystems.com> wrote:
also with slurping the only common layer i know is already done which is line splitting (works on all platforms). so the decoding is the only issue

:crlf is also very common.

«binmode => ':crlf'» completely broken on File::Slurp. On Unix it's a no-op, on Windows it will turn crlf translation *off*. Because:

${$buf_ref} =~ s/\015\012/\n/g if $is_win32 && !$opts->{'binmode'} ;

Emulating PerlIO layers really is a deep rabbit hole.

Leon
CC: Eric Brine <ikegami [...] adaelis.com>, perl5 porters <perl5-porters [...] perl.org>
Subject: Re: [perl #121870] [PATCH] Replace File::Slurp with Path::Tiny in perlfaq5
From: Uri Guttman <uri [...] stemsystems.com>
Date: Thu, 15 May 2014 16:43:03 -0400
To: Leon Timmermans <fawaka [...] gmail.com>
Download (untitled) / with headers
text/plain 1.1k
On 05/15/2014 04:25 PM, Leon Timmermans wrote: Show quoted text
> On Thu, May 15, 2014 at 10:00 PM, Eric Brine <ikegami@adaelis.com> wrote: >
>> On Thu, May 15, 2014 at 2:04 PM, Uri Guttman <uri@stemsystems.com> wrote: >>
>>> also with slurping the only common layer i know is already done which is >>> line splitting (works on all platforms). so the decoding is the only issue
>> >> >> :crlf is also very common. >>
> > «binmode => ':crlf'» completely broken on File::Slurp. On Unix it's a > no-op, on Windows it will turn crlf translation *off*. Because: > > ${$buf_ref} =~ s/\015\012/\n/g if $is_win32 && !$opts->{'binmode'} ; > > Emulating PerlIO layers really is a deep rabbit hole.
so you don't pass in :crlf to slurp. is that so hard to understand?? it isn't NEEDED. and it is emulated just fine and it works. no reports on that feature in a long time. try to find something constructive to say or just keep quiet on this. i get it. you don't like the module. good for you. doesn't mean you have to keep ripping it with unhelpful comments and such. crlf is handled - period. no need to deal with the :crlf layer. next? uri -- Uri Guttman - The Perl Hunter The Best Perl Jobs, The Best Perl Hackers http://PerlHunter.com
Date: Thu, 15 May 2014 15:50:23 -0500
CC: perl5 porters <perl5-porters [...] perl.org>
Subject: Re: [perl #121870] [PATCH] Replace File::Slurp with Path::Tiny in perlfaq5
From: David Nicol <davidnicol [...] gmail.com>
Download (untitled) / with headers
text/plain 151b


Just to be difficult, has anyone benchmarked

   sub read_file($) {  `/bin/cat \Q$_[0]\E` }  # or "type" on windows

as an alternative, recently?

CC: perl5 porters <perl5-porters [...] perl.org>
Subject: Re: [perl #121870] [PATCH] Replace File::Slurp with Path::Tiny in perlfaq5
From: Eric Brine <ikegami [...] adaelis.com>
Date: Fri, 16 May 2014 00:33:40 -0400
To: Uri Guttman <uri [...] stemsystems.com>
Download (untitled) / with headers
text/plain 470b
On Thu, May 15, 2014 at 4:43 PM, Uri Guttman <uri@stemsystems.com> wrote:
Show quoted text
crlf is handled - period.

Yes, but you were talking of breaking it by providing an implementation that only supported :encoding. I simply reminded you that you also needed to support :crlf too.

Show quoted text
i get it. you don't like the module. good for you. doesn't mean you have to keep ripping it with unhelpful comments and such.

Yeah, we hate it so much we just had to try to help you fix it???

Subject: Re: [perl #121870] [PATCH] Replace File::Slurp with Path::Tiny in perlfaq5
CC: perl5 porters <perl5-porters [...] perl.org>
From: Eric Brine <ikegami [...] adaelis.com>
Date: Fri, 16 May 2014 00:46:59 -0400
To: David Nicol <davidnicol [...] gmail.com>
Download (untitled) / with headers
text/plain 996b
On Thu, May 15, 2014 at 4:50 PM, David Nicol <davidnicol@gmail.com> wrote:
Show quoted text

Just to be difficult, has anyone benchmarked

   sub read_file($) {  `/bin/cat \Q$_[0]\E` }  # or "type" on windows

as an alternative, recently?

Backticks read the same way C<read> does (8K at a time), so it's slower than using C<read> to read from a file. (There's quite a number of other issues too, in general and as a File::Slurp implementation).

use Benchmark qw( cmpthese );

sub _read {
   my $buf;
   open(my $fh, '<', $ARGV[0]) or die $!;
   my $sz = -s $fh;
   read($fh, $buf, $sz);
}

sub _sysread {
   my $buf;
   open(my $fh, '<', $ARGV[0]) or die $!;
   my $sz = -s $fh;
   sysread($fh, $buf, $sz);
}

cmpthese (-5, {
   qx => sub { my $buf = `/bin/cat \Q$ARGV[0]\E`; 1 },
   read => \&_read,
   sysread => \&_read,
});


          Rate      qx sysread    read
qx      59.7/s      --    -87%    -87%
sysread  451/s    655%      --     -1%
read     455/s    662%      1%      --

 1.5MB file

To: Eric Brine <ikegami [...] adaelis.com>
Date: Fri, 16 May 2014 01:35:54 -0400
From: Uri Guttman <uri [...] stemsystems.com>
CC: perl5 porters <perl5-porters [...] perl.org>
Subject: Re: [perl #121870] [PATCH] Replace File::Slurp with Path::Tiny in perlfaq5
Download (untitled) / with headers
text/plain 821b
On 05/16/2014 12:33 AM, Eric Brine wrote: Show quoted text
> On Thu, May 15, 2014 at 4:43 PM, Uri Guttman <uri@stemsystems.com> wrote: >
>> crlf is handled - period.
> > > Yes, but you were talking of breaking it by providing an implementation > that only supported :encoding. I simply reminded you that you also needed > to support :crlf too. >
how does it need to be supported more than it is now? it can be ignored if passed in as crlf conversion always happens now on winblows. Show quoted text
> i get it. you don't like the module. good for you. doesn't mean you have to
>> keep ripping it with unhelpful comments and such.
> > > Yeah, we hate it so much we just had to try to help you fix it??? >
i was referring to leon, not you. thanx, uri -- Uri Guttman - The Perl Hunter The Best Perl Jobs, The Best Perl Hackers http://PerlHunter.com
Subject: Re: [perl #121870] [PATCH] Replace File::Slurp with Path::Tiny in perlfaq5
From: Uri Guttman <uri [...] stemsystems.com>
Date: Fri, 16 May 2014 01:38:25 -0400
To: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 1.5k
On 05/16/2014 12:46 AM, Eric Brine wrote: Show quoted text
> On Thu, May 15, 2014 at 4:50 PM, David Nicol <davidnicol@gmail.com> wrote: >
>> >> Just to be difficult, has anyone benchmarked >> >> sub read_file($) { `/bin/cat \Q$_[0]\E` } # or "type" on windows >> >> as an alternative, recently? >>
> > Backticks read the same way C<read> does (8K at a time), so it's slower > than using C<read> to read from a file. (There's quite a number of other > issues too, in general and as a File::Slurp implementation). > > use Benchmark qw( cmpthese ); > > sub _read { > my $buf; > open(my $fh, '<', $ARGV[0]) or die $!; > my $sz = -s $fh; > read($fh, $buf, $sz); > } > > sub _sysread { > my $buf; > open(my $fh, '<', $ARGV[0]) or die $!; > my $sz = -s $fh; > sysread($fh, $buf, $sz); > } > > cmpthese (-5, { > qx => sub { my $buf = `/bin/cat \Q$ARGV[0]\E`; 1 }, > read => \&_read, > sysread => \&_read, > });
is that a mistake with both read and sysread calling _read? Show quoted text
> > > Rate qx sysread read > qx 59.7/s -- -87% -87% > sysread 451/s 655% -- -1% > read 455/s 662% 1% --
those last two are basically the same result as they call the same code. sysread should be much faster than read. file::slurp has a good benchmark script in extras which can easily be added to. it has many variations of slurping already in there and you can set the file size and other things. thanx, uri -- Uri Guttman - The Perl Hunter The Best Perl Jobs, The Best Perl Hackers http://PerlHunter.com
Date: Fri, 16 May 2014 11:28:23 +0100
To: Perl5 Porters <perl5-porters [...] perl.org>
Subject: Re: [perl #121870] [PATCH] Replace File::Slurp with Path::Tiny in perlfaq5
From: Smylers <Smylers [...] stripey.com>
Download (untitled) / with headers
text/plain 1.5k
Leon Timmermans writes: Show quoted text
> There is no reason «my $foo = do { local $/; <$fh> };» would need to > be any slower than what File::Slurp is doing right now. In fact I'm > planning to fix just that in 5.21.
That's great to hear, Leon. As well as speeding up people using that idiom directly, it'll also help users of Path::Class and Path::Tiny, both of which have -> slurp methods which are implemented using it. Uri Guttman writes: Show quoted text
> On 05/15/2014 02:10 PM, Leon Timmermans wrote: >
> > sub read_file($filename, $encoding = 'utf8') { > > open my $fh, '<:encoding($encoding)', $filename or die "Can't open > > $filename: $!"; > > return do { local $/; <$fh> } > > }
> > so do that your self. and what about all the users out there who don't > need that and want a simple call?
Indeed, File::Slurp provides a convenient interface for many users, who will wish to use it for that reason regardless of any efficiency concerns. I interpreted Leon's suggestion above not as something users should do _instead_ of using read_file, but as a potential _implementation_ of read_file in File::Slurp (or at least the nub of such). Once using <$fh> has been sped up sufficiently, one possibility would be forFile::Slurp to switch to using that, thereby simplifying its implementation and avoiding any of the issues with using sysread. Best wishes Smylers -- Why drug companies shouldn't be allowed to hide research results that they don't like: http://gu.com/p/3zat8 — UK gov wasted millions on Tamiflu Sign the AllTrials petition: http://www.alltrials.net/
To: perl5-porters [...] perl.org
Date: Fri, 16 May 2014 11:41:09 +0100
From: Smylers <Smylers [...] stripey.com>
Subject: Re: [perl #121870] [PATCH] Replace File::Slurp with Path::Tiny in perlfaq5
Download (untitled) / with headers
text/plain 369b
I just wrote: Show quoted text
> As well as speeding up people using that idiom directly, ...
Erm, I meant speeding up their code, not actually speeding up the people. Smylers -- Why drug companies shouldn't be allowed to hide research results that they don't like: http://gu.com/p/3zat8 — UK gov wasted millions on Tamiflu Sign the AllTrials petition: http://www.alltrials.net/
Date: Fri, 16 May 2014 13:38:21 +0200
To: Uri Guttman <uri [...] stemsystems.com>
CC: Eric Brine <ikegami [...] adaelis.com>, perl5 porters <perl5-porters [...] perl.org>
Subject: Re: [perl #121870] [PATCH] Replace File::Slurp with Path::Tiny in perlfaq5
From: Leon Timmermans <fawaka [...] gmail.com>
Download (untitled) / with headers
text/plain 782b
On Thu, May 15, 2014 at 10:43 PM, Uri Guttman <uri@stemsystems.com> wrote:
Show quoted text
so you don't pass in :crlf to slurp. is that so hard to understand?? it isn't NEEDED. and it is emulated just fine and it works. no reports on that feature in a long time.

You need it to (portably) open text files created on Windows. That is not an obscure use-case.
 
Show quoted text
try to find something constructive to say or just keep quiet on this. i get it. you don't like the module. good for you. doesn't mean you have to keep ripping it with unhelpful comments and such. crlf is handled - period. no need to deal with the :crlf layer. next?

How is pointing out bugs "unhelpful comments"? How is offering solutions (like not using sysread when that's not appropriate) not "constructive"?

Leon
Date: Fri, 16 May 2014 08:12:31 -0400
To: Uri Guttman <uri [...] stemsystems.com>
CC: perl5 porters <perl5-porters [...] perl.org>
Subject: Re: [perl #121870] [PATCH] Replace File::Slurp with Path::Tiny in perlfaq5
From: Eric Brine <ikegami [...] adaelis.com>
Download (untitled) / with headers
text/plain 337b
On Fri, May 16, 2014 at 1:38 AM, Uri Guttman <uri@stemsystems.com> wrote:
Show quoted text
is that a mistake with both read and sysread calling _read?

Of course. Was super duper tired. Fixed:

          Rate      qx    read sysread
qx      62.3/s      --    -86%    -91%
read     430/s    591%      --    -40%
sysread  718/s   1052%     67%      --

Subject: Re: [perl #121870] [PATCH] Replace File::Slurp with Path::Tiny in perlfaq5
From: Uri Guttman <uri [...] stemsystems.com>
Date: Fri, 16 May 2014 10:03:15 -0400
To: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 653b
On 05/16/2014 07:38 AM, Leon Timmermans wrote: Show quoted text
> On Thu, May 15, 2014 at 10:43 PM, Uri Guttman <uri@stemsystems.com> wrote: >
>> so you don't pass in :crlf to slurp. is that so hard to understand?? it >> isn't NEEDED. and it is emulated just fine and it works. no reports on that >> feature in a long time.
> > > You need it to (portably) open text files created on Windows. That is not > an obscure use-case.
so it is a trivial fix to check for :crlf, do the same line conversion as now and not set the actual layer. this isn't rocket surgery. uri -- Uri Guttman - The Perl Hunter The Best Perl Jobs, The Best Perl Hackers http://PerlHunter.com
Date: Fri, 16 May 2014 10:29:32 -0400
To: Eric Brine <ikegami [...] adaelis.com>
CC: perl5 porters <perl5-porters [...] perl.org>
Subject: Re: [perl #121870] [PATCH] Replace File::Slurp with Path::Tiny in perlfaq5
From: Uri Guttman <uri [...] stemsystems.com>
Download (untitled) / with headers
text/plain 1.3k
On 05/16/2014 08:12 AM, Eric Brine wrote: Show quoted text
> On Fri, May 16, 2014 at 1:38 AM, Uri Guttman <uri@stemsystems.com> wrote: >
>> is that a mistake with both read and sysread calling _read?
> > > Of course. Was super duper tired. Fixed: > > Rate qx read sysread > qx 62.3/s -- -86% -91% > read 430/s 591% -- -40% > sysread 718/s 1052% 67% -- >
more like what i have seen. that is a lot of speed to make up. i don't see how do/local or read can get to sysread speed given the i/o layers overhead. but as i said, even with a speedup, the slurp api is cleaner. and replacing the implementation isn't an issue as i said most of the code is handling options, errors and such. little code other than sysread and large read sizes are optimizations. i even found that just the overhead of option and error checking and handling is noticeable in benchmarks. a plain sysread in the benchmark script beats out read_file for just that reason. i keep telling you all to look at and run the benchmark. also the most common homebrew code i have seen is more like this: $text = join( '', <$fh> ) ; the use of $/ is not well known so advocating it is not a good solution IMNSHO. thanx, uri -- Uri Guttman - The Perl Hunter The Best Perl Jobs, The Best Perl Hackers http://PerlHunter.com
CC: Eric Brine <ikegami [...] adaelis.com>, perl5 porters <perl5-porters [...] perl.org>
Subject: Re: [perl #121870] [PATCH] Replace File::Slurp with Path::Tiny in perlfaq5
From: Leon Timmermans <fawaka [...] gmail.com>
Date: Fri, 16 May 2014 17:06:40 +0200
To: Uri Guttman <uri [...] stemsystems.com>
Download (untitled) / with headers
text/plain 908b
On Fri, May 16, 2014 at 4:29 PM, Uri Guttman <uri@stemsystems.com> wrote:
Show quoted text
more like what i have seen. that is a lot of speed to make up. i don't see how do/local or read can get to sysread speed given the i/o layers overhead.

It's not so much layer overhead per se, but buffering overhead in a specific case that doesn't need buffering. Turning it off will remarkably increase the performance. Something along the lines of:

sub read_file ($filename, $layer = ':unix') {
  open my $fh, "<:$layer", $filename or die "Couldn't open $filename: $!";
  read $fh, my $buffer, -s $fh or die "Couldn't read $filename: $!";
  return $buffer;
}

is just as fast as the code in File::Slurp (the read itself is the bottleneck, there isn't much left to optimize).

As soon as you do anything that uses a buffer this performance benefits goes away. This includes crlf translation, line based IO and decoding.

Leon
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 1.4k
On Wed May 14 02:11:21 2014, mcgrath.martin@gmail.com wrote: Show quoted text
> Apologies if this has already been discussed. I don't subscribe to p5p > but have searched the list before posting, and didn't find anything > relating to this issue. > > This patch replaces the example/recommendation of using File::Slurp > with Path::Tiny in perlfaq5. > > The reasoning behind this is an outstanding critical bug with > File::Slurp: > > https://rt.cpan.org/Public/Bug/Display.html?id=83126 >
Discussion in this RT appears to have shifted to how best to fix File::Slurp. File::Slurp is not a core module, so discussion of how to improve it would normally have been conducted in its own bug queue (presumably, https://rt.cpan.org//Dist/Display.html?Queue=File-Slurp). I realize that shutting off discussion in p5p at this point would be Canutian, but can I at least pose these questions: Do we want to modify perlfaq5 in the manner suggested by the OP two days ago? Do we want to modify perlfaq5 at all? If we do, does this go into 5.20? My two cents: File::Slurp is a well-known library. I can recall lightning talks Uri gave about it at YAPCs a decade ago. Whatever problems it has can be hashed out, preferably in its own bug queue and don't warrant its deletion from perlfaq5. "Mend it, don't end it." Perhaps some discussion of Path::Tiny's slurping capabilities ought to be added to perlfaq5. And, no, this does not go into 5.20. Thank you very much. Jim Keenan
Subject: Re: [perl #121870] [PATCH] Replace File::Slurp with Path::Tiny in perlfaq5
CC: ;, perl5-porters [...] perl.org
Date: Fri, 16 May 2014 19:15:16 -0400
To: James E Keenan via RT <perlbug-followup [...] perl.org>
From: Ricardo Signes <perl.p5p [...] rjbs.manxome.org>
Download (untitled) / with headers
text/plain 125b
* James E Keenan via RT <perlbug-followup@perl.org> [2014-05-16T17:56:32] Show quoted text
> If we do, does this go into 5.20?
No. -- rjbs
Download signature.asc
application/pgp-signature 473b

Message body not shown because it is not plain text.

Subject: Re: [perl #121870] [PATCH] Replace File::Slurp with Path::Tiny in perlfaq5
CC: Eric Brine <ikegami [...] adaelis.com>, perl5 porters <perl5-porters [...] perl.org>
From: David Golden <xdg [...] xdg.me>
Date: Fri, 16 May 2014 22:53:19 -0400
To: Uri Guttman <uri [...] stemsystems.com>
On Fri, May 16, 2014 at 10:29 AM, Uri Guttman <uri@stemsystems.com> wrote: Show quoted text
> more like what i have seen. that is a lot of speed to make up. i don't see > how do/local or read can get to sysread speed given the i/o layers overhead.
FWIW, if you open the file with :unix, then you don't get :perlio and then you don't have the buffering. At that point, read() on the file size (for ordinary files), does sysread looping in C -- i.e. adding up bytes read in chunks in C and not doing byte arithmetic and logic using perl ops. In my benchmarking, that was as fast or faster in some cases. I've tried to quickly reconstruct that here: Benchmark: https://gist.github.com/dagolden/c4dad291cd0b7156f268 Source: https://gist.github.com/dagolden/973bcfa11816e2d96c65 (Small was 50k, medium was 500k, large was 5000k) I'm not claiming it's an entirely fair benchmark, but it does show a way to get even faster performance in some circumstances. Path::Tiny has a bunch of method call overhead and abstraction (plus locking) that the others don't have, which is why it's a bit slower. David
Date: Sat, 17 May 2014 00:14:33 -0400
To: perl5-porters [...] perl.org
Subject: Re: [perl #121870] [PATCH] Replace File::Slurp with Path::Tiny in perlfaq5
From: Uri Guttman <uri [...] stemsystems.com>
Download (untitled) / with headers
text/plain 1.7k
On 05/16/2014 10:53 PM, David Golden wrote: Show quoted text
> On Fri, May 16, 2014 at 10:29 AM, Uri Guttman <uri@stemsystems.com> wrote:
>> more like what i have seen. that is a lot of speed to make up. i don't see >> how do/local or read can get to sysread speed given the i/o layers overhead.
> > FWIW, if you open the file with :unix, then you don't get :perlio and > then you don't have the buffering. At that point, read() on the file > size (for ordinary files), does sysread looping in C -- i.e. adding up > bytes read in chunks in C and not doing byte arithmetic and logic > using perl ops.
and why would that be better than calling sysread directly? given the large read size of 1mb (most text files are way below that) you only get one loop and little overhead. i benchmarked a variation which does only one read on smaller files but the overhead of checking that was seen (compared to a bare sysread with no extra code). Show quoted text
> > In my benchmarking, that was as fast or faster in some cases. I've > tried to quickly reconstruct that here: > > Benchmark: https://gist.github.com/dagolden/c4dad291cd0b7156f268 > > Source: https://gist.github.com/dagolden/973bcfa11816e2d96c65 > > (Small was 50k, medium was 500k, large was 5000k) > > I'm not claiming it's an entirely fair benchmark, but it does show a > way to get even faster performance in some circumstances. > > Path::Tiny has a bunch of method call overhead and abstraction (plus > locking) that the others don't have, which is why it's a bit slower.
you can easily add an entry into the benchmark that come with file::slurp. i browsed the code of path::tiny today (not a tiny module! :). if you want to talk about some ideas back and forth, i am open to it. thanx, uri -- Uri Guttman - The Perl Hunter The Best Perl Jobs, The Best Perl Hackers http://PerlHunter.com
To: Uri Guttman <uri [...] stemsystems.com>
Date: Sun, 18 May 2014 12:23:17 -0400
From: David Golden <xdg [...] xdg.me>
Subject: Re: [perl #121870] [PATCH] Replace File::Slurp with Path::Tiny in perlfaq5
CC: p5p <perl5-porters [...] perl.org>
On Sat, May 17, 2014 at 12:14 AM, Uri Guttman <uri@stemsystems.com> wrote: Show quoted text
>> FWIW, if you open the file with :unix, then you don't get :perlio and >> then you don't have the buffering. At that point, read() on the file >> size (for ordinary files), does sysread looping in C -- i.e. adding up >> bytes read in chunks in C and not doing byte arithmetic and logic >> using perl ops.
> > > and why would that be better than calling sysread directly?
In the real world, probably not much. In a tight slurp loop benchmark case, perl's read() function (in the :unix layer at least) does almost exactly what File::Slurp is doing with sysread(), except in C instead of in Perl, so avoids overhead carrying out things like loops, comparison, and addition via Perl's op-tree, pads and stack. Show quoted text
> you can easily add an entry into the benchmark that come with file::slurp. i > browsed the code of path::tiny today (not a tiny module! :). if you want to > talk about some ideas back and forth, i am open to it.
I'll put that on my round tuit list. :-) David -- David Golden <xdg@xdg.me> Twitter/IRC: @xdg
Date: Tue, 20 May 2014 02:30:44 +0200
To: perl5-porters [...] perl.org, perl5-porters [...] perl.org
Subject: Re: [perl #121870] [PATCH] Replace File::Slurp with Path::Tiny inperlfaq5
From: "Dr.Ruud" <rvtol+usenet [...] isolution.nl>
Download (untitled) / with headers
text/plain 1002b
On 2014-05-15 02:47, Dr.Ruud wrote: Show quoted text
> On 2014-05-14 17:30, Leon Timmermans wrote:
>> On Wed, May 14, 2014 at 5:14 PM, Uri Guttman <uri@stemsystems.com >> <mailto:uri@stemsystems.com>> wrote:
Show quoted text
>> and how would you go about fixing it in file::slurp? i am not a >> unicode/utf guru by any stretch (i say i am an ascii bigot! :). >> would just doing a raw sysread all the time and then converting the >> buffer as directed by binmode work? any other suggestions for how to >> read utf files with ease and speed? >> >> >> File::Slurp is nothing more than a workaround for a performance bug in >> perl. There is no reason «my $foo = do { local $/; <$fh> };» would need >> to be any slower than what File::Slurp is doing right now. In fact I'm >> planning to fix just that in 5.21.
> > Not sure if this is still true for the latest Perls, but you better > write it as: > > my $foo; { local $/; $foo = <$fh> } > > or it will use twice the memory.
Apparently fixed in 5.19. -- Ruud
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 617b
On Wed May 14 02:11:21 2014, mcgrath.martin@gmail.com wrote: Show quoted text
> This patch replaces the example/recommendation of using File::Slurp > with Path::Tiny in perlfaq5. > > The reasoning behind this is an outstanding critical bug with > File::Slurp: > > https://rt.cpan.org/Public/Bug/Display.html?id=83126
perlfaq is upstream CPAN, issues in it should be reported at: https://github.com/perl-doc-cats/perlfaq/issues though I suggest you point them at this discussion. Discussion of the bug in File::Slurp belongs in the File::Slurp ticket: https://rt.cpan.org/Ticket/Display.html?id=83126 Closing this ticket. Tony


This service is sponsored and maintained by Best Practical Solutions and runs on Perl.org infrastructure.

For issues related to this RT instance (aka "perlbug"), please contact perlbug-admin at perl.org