New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Can buffered read() be smarter about chunk size? #14995
Comments
From @FGasperCreated by @FGasperCurrently read() will always take in data in 8,192-byte chunks read( $fh, $buffer, 30000 ) ... because the buffered read will actually *cost* system calls It would seem more sensible to read in some multiple of the normal chunk size People can, of course, use sysread(), but then (I am told, have not Is there anything gained from NOT doing a single read in the case of Perl Info
|
From chansen@cpan.orgOn Mon Oct 19 15:55:54 2015, felipe@felipegasper.com wrote:
You can bypass the PerlIO buffer layer (perlio) by specifying a lower open my $fh, '<:unix', '/path/to/fh'; -- |
The RT System itself - Status changed from 'new' to 'open' |
From @FGasperOn 20 Oct 2015 11:29 AM, Christian Hansen via RT wrote:
Would it work, though, to make PerlIO smarter? Then we can still get the -FG |
From chansen@cpan.orgOn Tue Oct 20 10:31:29 2015, felipe@felipegasper.com wrote:
When Nick Ing-Simmons designed PerlIO for inclusion in Perl 5.8.0 (2002), -- |
From @FGasperOn 20 Oct 2015 1:13 PM, Christian Hansen via RT wrote:
Does PerlIO only know how to fill up its buffer then flush? How hard 0) PerlIO has 500 bytes in $fh cache 1) read( $fh, $buffer, 30000 ) 2) flush $fh cache into $buffer. int bytes_needed = 25,000; 3) Read (floor(bytes_needed) * 8192) bytes DIRECTLY to $buffer. 3) Now do a “normal” 8,192-byte PerlIO read(). Send (bytes_needed % -FG |
From chansen@cpan.orgOn Tue Oct 20 11:13:45 2015, chansen wrote:
Perhaps we should use OS hints when Perl is compiled? -- |
From chansen@cpan.orgOn Tue Oct 20 11:28:03 2015, felipe@felipegasper.com wrote:
By design, the PerlIO implementation is completely decoupled from Perl. -- |
From @LeontOn Tue, Oct 20, 2015 at 12:55 AM, felipe@felipegasper.com <
Actually, I think I actually wrote a patch to that effect some months ago, Leon |
From @craigberryOn Tue, Oct 20, 2015 at 1:27 PM, Felipe Gasper <felipe@felipegasper.com> wrote:
Actually, he made it 4096. Some years later I made it the larger of <http://perl5.git.perl.org/perl.git/commit/b83080de5c42543809ce9004bcdbcd3162a00e70?f> which produced a significant performance improvement. I did what was
There are various dragons here. I believe one of the BSDs had a
This is interesting but I would favor having a buffer size selectable open my $f, '<:perlio(bufsiz=32767)', 'file.txt'; (though I really shouldn't be trusted with syntax). This way we would |
From @FGasperOn 20 Oct 2015 2:19 PM, Craig A. Berry wrote:
This would also require rewriting existing code, and for Perl developers If you optimize the read() built-in, though, then everything -FG |
From @LeontOn Tue, Oct 20, 2015 at 9:19 PM, Craig A. Berry <craig.a.berry@gmail.com>
See also: https://metacpan.org/release/PerlIO-buffersize. It's not very Leon |
From @FGasperOn 20 Oct 2015 2:20 PM, Craig Berry via RT wrote:
FWIW, between the ====’s is pseudocode for what I have in mind. I regret ================ //================================================== if (count > 0 && avail <= 0) {
|
From @tonycozOn Mon Oct 19 15:55:54 2015, felipe@felipegasper.com wrote:
I would have expected something like: Inline Patchdiff --git a/perlio.c b/perlio.c
index 8ab47e4..7636595 100644
--- a/perlio.c
+++ b/perlio.c
@@ -2032,6 +2032,7 @@ SSize_t
PerlIOBase_read(pTHX_ PerlIO *f, void *vbuf, Size_t count)
{
STDCHAR *buf = (STDCHAR *) vbuf;
+ PerlIOBuf * const b = PerlIOSelf(f, PerlIOBuf);
if (f) {
if (!(PerlIOBase(f)->flags & PERLIO_F_CANREAD)) {
PerlIOBase(f)->flags |= PERLIO_F_ERROR;
@@ -2055,6 +2056,18 @@ PerlIOBase_read(pTHX_ PerlIO *f, void *vbuf, Size_t count)
if (avail == 0) /* set_ptrcnt could have reset avail */
goto get_cnt;
}
+ /* if the caller wants more than we can fit in the buffer,
+ avoid looping around filling the buffer and copying it
+ out */
+ if (count > b->bufsiz) {
+ PerlIO *n = PerlIONext(f);
+
+ SSize_t got = PerlIO_read(n, buf, count);
+ if (count <= 0)
+ break;
+ count -= got;
+ buf += got;
+ }
if (count > 0 && avail <= 0) {
if (PerlIO_fill(f) != 0)
break;
It does reduce the number of read(2) calls to 1 though: $ strace -e trace=read ./perl -e 'read(STDIN, $buf, 30000)' </dev/zero 2>&1 | grep 'read(0' Tony |
From @LeontOn Wed, Oct 21, 2015 at 12:59 AM, Tony Cook via RT <
I'm surprised that only fails only one test, and would consider that a Leon |
From @FGasperOn 20 Oct 2015 7:00 PM, Leon Timmermans via RT wrote:
Thank you both for looking at this! :) Leon/Tony, maybe this could be active for :raw--or whatever other cases -FG |
From @craigberryOn Tue, Oct 20, 2015 at 6:59 PM, Leon Timmermans <fawaka@gmail.com> wrote:
The only alternative I've thought of would be to make PerlIO_fill (and We'd probably have to use va_arg and friends to make the new argument It might be possible to make this work, but nothing with PerlIO is as |
From @tonycozOn Tue Oct 20 17:00:24 2015, LeonT wrote:
The attached avoids modifying PerlIOBase_read or PerlIOBuf_read. I wondered if I can use PERLIO_K_RAW to indicate that the direct read was safe, but that's set for the crlf layer, where it isn't safe. Tony |
From @tonycoz0001-perl-126403-WIP-bypass-the-buffer-for-large-reads-th.patchFrom de3a8ff5e2b972f17f475c27471b8a7cb6f46bf1 Mon Sep 17 00:00:00 2001
From: Tony Cook <tony@develop-help.com>
Date: Mon, 26 Oct 2015 12:06:53 +1100
Subject: [perl #126403] WIP bypass the buffer for large reads through only
:perlio
---
perlio.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 61 insertions(+), 1 deletion(-)
diff --git a/perlio.c b/perlio.c
index 8ab47e4..e64be2b 100644
--- a/perlio.c
+++ b/perlio.c
@@ -4287,7 +4287,67 @@ PerlIOBuf_dup(pTHX_ PerlIO *f, PerlIO *o, CLONE_PARAMS *param, int flags)
return PerlIOBase_dup(aTHX_ f, o, param, flags);
}
+static SSize_t
+PerlIOPerlIO_read(pTHX_ PerlIO *f, void *vbuf, Size_t count)
+{
+ STDCHAR *buf = (STDCHAR *) vbuf;
+ if (f) {
+ PerlIOBuf * const b = PerlIOSelf(f, PerlIOBuf);
+ if (!(PerlIOBase(f)->flags & PERLIO_F_CANREAD)) {
+ PerlIOBase(f)->flags |= PERLIO_F_ERROR;
+ SETERRNO(EBADF, SS_IVCHAN);
+ PerlIO_save_errno(f);
+ return 0;
+ }
+ PerlIO_debug("flags %x\n", PerlIOBase(f)->flags);
+ while (count > 0) {
+ get_cnt:
+ {
+ SSize_t avail = PerlIO_get_cnt(f);
+ SSize_t take = 0;
+ if (avail > 0)
+ take = (((SSize_t) count >= 0) && ((SSize_t)count < avail)) ? (SSize_t)count : avail;
+ if (take > 0) {
+ STDCHAR *ptr = PerlIO_get_ptr(f);
+ Copy(ptr, buf, take, STDCHAR);
+ PerlIO_set_ptrcnt(f, ptr + take, (avail -= take));
+ count -= take;
+ buf += take;
+ if (avail == 0) /* set_ptrcnt could have reset avail */
+ goto get_cnt;
+ }
+ /* if the caller wants more than we can fit in the buffer,
+ avoid looping around filling the buffer and copying it
+ out */
+ if (count > b->bufsiz) {
+ PerlIO *n = PerlIONext(f);
+
+ SSize_t got = PerlIO_read(n, buf, count);
+ if (got <= 0) {
+ if (got == 0) {
+ PerlIOBase(f)->flags |= PERLIO_F_EOF;
+ break;
+ }
+ else if (got < 0) {
+ PerlIOBase(f)->flags |= PERLIO_F_ERROR;
+ PerlIO_save_errno(f);
+ break;
+ }
+ }
+ count -= got;
+ buf += got;
+ }
+ if (count > 0 && avail <= 0) {
+ if (PerlIO_fill(f) != 0)
+ break;
+ }
+ }
+ }
+ return (buf - (STDCHAR *) vbuf);
+ }
+ return 0;
+}
PERLIO_FUNCS_DECL(PerlIO_perlio) = {
sizeof(PerlIO_funcs),
@@ -4301,7 +4361,7 @@ PERLIO_FUNCS_DECL(PerlIO_perlio) = {
NULL,
PerlIOBase_fileno,
PerlIOBuf_dup,
- PerlIOBuf_read,
+ PerlIOPerlIO_read,
PerlIOBuf_unread,
PerlIOBuf_write,
PerlIOBuf_seek,
--
2.1.4
|
From @craigberryOn Sun, Oct 25, 2015 at 9:57 PM, Tony Cook via RT
So after determining that the buffer isn't large enough to hold count |
From @tonycozOn Tue, Oct 27, 2015 at 12:41:13PM -0500, Craig A. Berry wrote:
It won't fit into the layer buffer, but buf and count refer to the If it won't fit there then the caller has messed up, since Tony |
From @craigberryOn Tue, Oct 27, 2015 at 5:39 PM, Tony Cook <tony@develop-help.com> wrote:
Ah, yes. I saw "buf" but was thinking "b->buf". Have you tried to do |
From @tonycozOn Wed Oct 28 05:25:58 2015, craig.a.berry@gmail.com wrote:
I hadn't. On Linux it speeds up read() to around the same speed as sysread(). On Win32 it makes no difference, since the perlio layer isn't used, unless you explicitly add it: J:\dev\perl\git\perl\win32>..\perl -le "open my $fh, '<', 'zeros' or die; binmode $fh, ':raw'; print for PerlIO::get_layers($fh)" J:\dev\perl\git\perl\win32>..\perl -le "open my $fh, '<:raw:perlio', 'zeros' or die; binmode $fh, ':raw'; print for PerlIO::get_layers($fh)" J:\dev\perl\git\perl\win32>..\perl -le "open my $fh, '<:pop:perlio', 'zeros' or die; binmode $fh, ':raw'; print for PerlIO::get_layers($fh)" For Linux: blead: Size of zeros is 10000000 with the patch: Size of zeros is 10000000 I'd hoped it would be more useful on Win32, since read() calls are fairly slow per call on Win32, but I guess it would require more work. Win32 has at least one other perlio issue, the :crlf layer is O(n**2) (n = size of the buffer) as pointed out by bulk88 a while back. Tony |
From @tonycoz |
From Stromeko@nexgo.deAm 29.10.2015 um 01:51 schrieb Tony Cook via RT:
On windows the read buffer size should be 64kiB if it's used to read -- (on the road :-) |
From @craigberryOn Thu, Oct 29, 2015 at 2:51 AM, Achim Gratz <Stromeko@nexgo.de> wrote:
Folks building their own Perl who want to experiment with larger ./Configure -Accflags=-DPERLIOBUF_DEFAULT_BUFSIZ=65535 or whatever size you want. Or add that to the defines in the Windows I've been trying an alternate approach to Tony's that is more in line What's here (inline below and attached) does illustrate, however, the Not sure when or if I'll finish getting the kinks worked out, but this From 2a6c01c0637b801a90aaa5b8536efbd17c91d6a5 Mon Sep 17 00:00:00 2001 WIP: There should be an upper bound on how big we will grow it, In PerlIOBase_read, when we have exhausted the buffer and are This way we should satisfy the request on the next fill and Addresses [perl #126403]. perlio.c | 13 +++++++++++++ Inline Patchdiff --git a/perlio.c b/perlio.c
index 8ab47e4..1f83f01 100644
--- a/perlio.c
+++ b/perlio.c
@@ -2056,6 +2056,19 @@ PerlIOBase_read(pTHX_ PerlIO *f, void *vbuf,
|
From @craigberry0001-Grow-perlio-buffer-to-match-requested-read-size.patchFrom 2a6c01c0637b801a90aaa5b8536efbd17c91d6a5 Mon Sep 17 00:00:00 2001
From: "Craig A. Berry" <craigberry@mac.com>
Date: Thu, 29 Oct 2015 09:48:27 -0500
Subject: [PATCH] Grow perlio buffer to match requested read size.
WIP: There should be an upper bound on how big we will grow it,
but I don't know what's reasonable. 512K? 512MB?
In PerlIOBase_read, when we have exhausted the buffer and are
about to refill, expand its size first to match the size of the
requested read if it is larger than the current buffer size.
This way we should satisfy the request on the next fill and
not end up back where we are, continually refilling a buffer
that is smaller than what we want to read into it.
Addresses [perl #126403].
---
perlio.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/perlio.c b/perlio.c
index 8ab47e4..1f83f01 100644
--- a/perlio.c
+++ b/perlio.c
@@ -2056,6 +2056,19 @@ PerlIOBase_read(pTHX_ PerlIO *f, void *vbuf, Size_t count)
goto get_cnt;
}
if (count > 0 && avail <= 0) {
+ if (PerlIOBase(f)->flags & PERLIO_F_RDBUF) {
+ PerlIOBuf * const b = PerlIOSelf(f, PerlIOBuf);
+ if (count > b->bufsiz) {
+ Size_t requested = count - (count % 512) + 512;
+ b->bufsiz = requested;
+ Renew(b->buf, b->bufsiz, STDCHAR);
+ if (!b->buf) {
+ b->buf = (STDCHAR *) & b->oneword;
+ b->bufsiz = sizeof(b->oneword);
+ }
+ }
+ }
+
if (PerlIO_fill(f) != 0)
break;
}
--
2.2.1
|
From @craigberryOn Thu, Oct 29, 2015 at 12:11 PM, Craig A. Berry
Those test failures were easily fixed by doing a flush to reset buffer In any case, I think this is now a fairly complete solution that
There is also noticeable improvement on Windows. On a Perl built with N.B. This approach only addresses the case where a specific length is |
From @craigberry0001-Grow-perlio-buffer-to-match-requested-read-size.patchFrom 3c48082e12795723f769efa1747f13757ca60d9d Mon Sep 17 00:00:00 2001
From: "Craig A. Berry" <craigberry@mac.com>
Date: Thu, 29 Oct 2015 09:48:27 -0500
Subject: [PATCH] Grow perlio buffer to match requested read size.
In PerlIOBase_read, when we have exhausted the buffer and are
about to refill, expand its size first to match the size of the
requested read if it is larger than the current buffer size.
This way we should satisfy the request on the next fill and
not end up back where we are, continually refilling a buffer
that is smaller than what we want to read into it.
The buffer is not allowed to grow larger than the value to which
the new macro PERLIOBUF_MAX_BUFSIZ is set, currently 1 MiB.
Addresses [perl #126403].
---
perlio.c | 18 ++++++++++++++++++
perlio.h | 6 ++++++
2 files changed, 24 insertions(+)
diff --git a/perlio.c b/perlio.c
index 8ab47e4..405c776 100644
--- a/perlio.c
+++ b/perlio.c
@@ -2056,6 +2056,24 @@ PerlIOBase_read(pTHX_ PerlIO *f, void *vbuf, Size_t count)
goto get_cnt;
}
if (count > 0 && avail <= 0) {
+ if (PerlIOBase(f)->flags & PERLIO_F_RDBUF) {
+ PerlIOBuf * const b = PerlIOSelf(f, PerlIOBuf);
+ if (count > b->bufsiz && PerlIO_flush(f) == 0) {
+ Size_t requested = count - (count % 512) + 512;
+ if (requested > PERLIOBUF_MAX_BUFSIZ)
+ requested = PERLIOBUF_MAX_BUFSIZ;
+ if (requested > b->bufsiz) {
+ b->bufsiz = requested;
+ Renew(b->buf, b->bufsiz, STDCHAR);
+ if (!b->buf) {
+ b->buf = (STDCHAR *) & b->oneword;
+ b->bufsiz = sizeof(b->oneword);
+ return 0;
+ }
+ }
+ }
+ }
+
if (PerlIO_fill(f) != 0)
break;
}
diff --git a/perlio.h b/perlio.h
index 1a3d480..093fda9 100644
--- a/perlio.h
+++ b/perlio.h
@@ -149,6 +149,12 @@ PERL_CALLCONV void PerlIO_clone(pTHX_ PerlInterpreter *proto,
#define PERLIOBUF_DEFAULT_BUFSIZ (BUFSIZ > 8192 ? BUFSIZ : 8192)
#endif
+/* The maximum buffer size for the perlio buffering layer */
+/* One MiB is a bit arbitrary, but what should it be? */
+#ifndef PERLIOBUF_MAX_BUFSIZ
+#define PERLIOBUF_MAX_BUFSIZ 1024 * 1024
+#endif
+
#ifndef SEEK_SET
#define SEEK_SET 0
#endif
--
2.2.1
|
From @tonycozOn Sun Nov 01 18:23:17 2015, craig.a.berry@gmail.com wrote:
This check after Renew: + Renew(b->buf, b->bufsiz, STDCHAR); isn't needed. Unless you set PL_nomemok = TRUE, which is only done briefly by hv.c, memory allocation failures cause perl to exit. My main issue with pushing the buffer sizes up is, at least for reads with only :unix and :perlio, we're still copying data into the buffer just to copy it again into the caller's buffer. Tony |
From @craigberryOn Sun, Nov 8, 2015 at 5:04 PM, Tony Cook via RT
Since I wrote that, I've realized it's not actually how it works if
Good to know, thanks. I'm afraid that was copy and paste from
That's true, but short of abandoning the whole layer concept, is there |
From @tonycozOn Sun, Nov 08, 2015 at 10:00:57PM -0600, Craig A. Berry wrote:
That's for intermediate buffers, which should be filling by calling A layer which fiddles with the stream's contents, like :encoding or If a layer wants the lower layer to fill its buffer, then it can call Consider that the :unix layer already has this direct read() Tony |
Migrated from rt.perl.org#126403 (status was 'open')
Searchable as RT126403$
The text was updated successfully, but these errors were encountered: