Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PATCH] faster integer stringification algorithm #16769

Closed
p5pRT opened this issue Nov 28, 2018 · 16 comments
Closed

[PATCH] faster integer stringification algorithm #16769

p5pRT opened this issue Nov 28, 2018 · 16 comments

Comments

@p5pRT
Copy link

p5pRT commented Nov 28, 2018

Migrated from rt.perl.org#133691 (status was 'resolved')

Searchable as RT133691$

@p5pRT
Copy link
Author

p5pRT commented Nov 28, 2018

From @xenu

This patch speed-ups int stringification in perl. The algorithm works on
big endian machines (I have tested it on emulated s390x linux), it
doesn't cause unaligned reads/writes and I'm 90% sure it will work on
EBCDIC platforms (CC-ing Karl to get his opinion).

For single digit numbers there's basically no difference in performance,
but for larger integers there's a noticeable gain​:

Key​:
  Ir Instruction read
  Dr Data read
  Dw Data write
  COND conditional branches
  IND indirect branches
  _m branch predict miss
  _m1 level 1 cache miss
  _mm last cache (e.g. L3) miss
  - indeterminate percentage (e.g. 1/0)

The numbers represent relative counts per loop iteration, compared to
blead at 100.0%.
Higher is better​: for example, using half as many instructions gives 200%,
while using twice as many gives 50%.

one_digit
one digit (positive)

  blead my_patch
  ------ --------
  Ir 100.00 100.84
  Dr 100.00 100.41
  Dw 100.00 100.00
  COND 100.00 99.28
  IND 100.00 100.00

COND_m 100.00 100.00
IND_m 100.00 100.00

Ir_m1 100.00 100.00
Dr_m1 100.00 100.00
Dw_m1 100.00 100.00

Ir_mm 100.00 100.00
Dr_mm 100.00 100.00
Dw_mm 100.00 100.00

one_digit_neg
one digit (negative)

  blead my_patch
  ------ --------
  Ir 100.00 100.71
  Dr 100.00 100.00
  Dw 100.00 100.00
  COND 100.00 99.26
  IND 100.00 100.00

COND_m 100.00 100.00
IND_m 100.00 100.00

Ir_m1 100.00 100.00
Dr_m1 100.00 100.00
Dw_m1 100.00 100.00

Ir_mm 100.00 100.00
Dr_mm 100.00 100.00
Dw_mm 100.00 100.00

two_digits
two digits (positive)

  blead my_patch
  ------ --------
  Ir 100.00 102.26
  Dr 100.00 100.00
  Dw 100.00 100.79
  COND 100.00 100.00
  IND 100.00 100.00

COND_m 100.00 100.00
IND_m 100.00 100.00

Ir_m1 100.00 100.00
Dr_m1 100.00 100.00
Dw_m1 100.00 100.00

Ir_mm 100.00 100.00
Dr_mm 100.00 100.00
Dw_mm 100.00 100.00

two_digits_neg
two digits (negative)

  blead my_patch
  ------ --------
  Ir 100.00 102.13
  Dr 100.00 99.60
  Dw 100.00 100.79
  COND 100.00 100.00
  IND 100.00 100.00

COND_m 100.00 100.00
IND_m 100.00 100.00

Ir_m1 100.00 100.00
Dr_m1 100.00 100.00
Dw_m1 100.00 100.00

Ir_mm 100.00 100.00
Dr_mm 100.00 100.00
Dw_mm 100.00 100.00

four_digits
four digits (positive)

  blead my_patch
  ------ --------
  Ir 100.00 103.41
  Dr 100.00 99.60
  Dw 100.00 101.57
  COND 100.00 100.75
  IND 100.00 100.00

COND_m 100.00 100.00
IND_m 100.00 100.00

Ir_m1 100.00 100.00
Dr_m1 100.00 100.00
Dw_m1 100.00 100.00

Ir_mm 100.00 100.00
Dr_mm 100.00 100.00
Dw_mm 100.00 100.00

four_digits_neg
four digits (negative)

  blead my_patch
  ------ --------
  Ir 100.00 103.27
  Dr 100.00 99.20
  Dw 100.00 101.56
  COND 100.00 100.75
  IND 100.00 100.00

COND_m 100.00 100.00
IND_m 100.00 100.00

Ir_m1 100.00 100.00
Dr_m1 100.00 100.00
Dw_m1 100.00 100.00

Ir_mm 100.00 100.00
Dr_mm 100.00 100.00
Dw_mm 100.00 100.00

five_digits
five digits (positive)

  blead my_patch
  ------ --------
  Ir 100.00 103.24
  Dr 100.00 99.60
  Dw 100.00 101.56
  COND 100.00 100.74
  IND 100.00 100.00

COND_m 100.00 100.00
IND_m 100.00 100.00

Ir_m1 100.00 100.00
Dr_m1 100.00 100.00
Dw_m1 100.00 100.00

Ir_mm 100.00 100.00
Dr_mm 100.00 100.00
Dw_mm 100.00 100.00

five_digits_neg
five digits (negative)

  blead my_patch
  ------ --------
  Ir 100.00 102.68
  Dr 100.00 99.34
  Dw 100.00 101.28
  COND 100.00 100.64
  IND 100.00 100.00

COND_m 100.00 100.00
IND_m 100.00 100.00

Ir_m1 100.00 100.00
Dr_m1 100.00 100.00
Dw_m1 100.00 100.00

Ir_mm 100.00 100.00
Dr_mm 100.00 100.00
Dw_mm 100.00 100.00

eighteen_digits
eighteen digits (positive)

  blead my_patch
  ------ --------
  Ir 100.00 109.78
  Dr 100.00 97.39
  Dw 100.00 105.59
  COND 100.00 105.06
  IND 100.00 100.00

COND_m 100.00 -
IND_m 100.00 100.00

Ir_m1 100.00 100.00
Dr_m1 100.00 100.00
Dw_m1 100.00 100.00

Ir_mm 100.00 100.00
Dr_mm 100.00 100.00
Dw_mm 100.00 100.00

eighteen_digits_neg
eighteen digits (negative)

  blead my_patch
  ------ --------
  Ir 100.00 109.64
  Dr 100.00 97.08
  Dw 100.00 105.56
  COND 100.00 105.06
  IND 100.00 100.00

COND_m 100.00 -
IND_m 100.00 100.00

Ir_m1 100.00 100.00
Dr_m1 100.00 100.00
Dw_m1 100.00 100.00

Ir_mm 100.00 100.00
Dr_mm 100.00 100.00
Dw_mm 100.00 100.00

AVERAGE

  blead my_patch
  ------ --------
  Ir 100.00 103.71
  Dr 100.00 99.21
  Dw 100.00 101.83
  COND 100.00 101.11
  IND 100.00 100.00

COND_m 100.00 100.00
IND_m 100.00 100.00

Ir_m1 100.00 100.00
Dr_m1 100.00 100.00
Dw_m1 100.00 100.00

Ir_mm 100.00 100.00
Dr_mm 100.00 100.00
Dw_mm 100.00 100.00

@p5pRT
Copy link
Author

p5pRT commented Nov 28, 2018

From @xenu

The patch and benchmark's source code are attached.

@p5pRT
Copy link
Author

p5pRT commented Nov 28, 2018

From @xenu

0001-S_uiv_2buf-faster-integer-stringification-algorithm.patch
From 6524f95e2b81a587529929ef45e6fed75a19cfbe Mon Sep 17 00:00:00 2001
From: Tomasz Konojacki <me@xenu.pl>
Date: Wed, 28 Nov 2018 12:38:51 +0100
Subject: [PATCH] S_uiv_2buf: faster integer stringification algorithm

Processing two digits at a time results in noticeable performance
improvement for larger numbers.

The previous version of the function was being automatically inlined
by gcc because of its small size. It's no longer the case so I had to
use PERL_STATIC_INLINE.

I have marked the UV branch as UNLIKELY because UVs are *much* rarer
than IVs.

[perl #133691]
---
 embed.fnc |  2 +-
 proto.h   |  4 +++-
 sv.c      | 53 +++++++++++++++++++++++++++++++++++++++++++++++------
 3 files changed, 51 insertions(+), 8 deletions(-)

diff --git a/embed.fnc b/embed.fnc
index 2ed2cc32b9..16535f87cc 100644
--- a/embed.fnc
+++ b/embed.fnc
@@ -2690,7 +2690,7 @@ pR	|SV *	|varname	|NULLOK const GV *const gv|const char gvtype \
 
 pX	|void	|sv_del_backref	|NN SV *const tsv|NN SV *const sv
 #if defined(PERL_IN_SV_C)
-nsR	|char *	|uiv_2buf	|NN char *const buf|const IV iv|UV uv|const int is_uv|NN char **const peob
+niR	|char *	|uiv_2buf	|NN char *const buf|const IV iv|UV uv|const int is_uv|NN char **const peob
 i	|void	|sv_unglob	|NN SV *const sv|U32 flags
 s	|const char *|sv_display	|NN SV *const sv|NN char *tmpbuf|STRLEN tmpbuf_size
 s	|void	|not_a_number	|NN SV *const sv
diff --git a/proto.h b/proto.h
index e57df2f206..3a898a962b 100644
--- a/proto.h
+++ b/proto.h
@@ -6001,10 +6001,12 @@ PERL_STATIC_INLINE void	S_sv_unglob(pTHX_ SV *const sv, U32 flags);
 #define PERL_ARGS_ASSERT_SV_UNGLOB	\
 	assert(sv)
 #endif
-STATIC char *	S_uiv_2buf(char *const buf, const IV iv, UV uv, const int is_uv, char **const peob)
+#ifndef PERL_NO_INLINE_FUNCTIONS
+PERL_STATIC_INLINE char *	S_uiv_2buf(char *const buf, const IV iv, UV uv, const int is_uv, char **const peob)
 			__attribute__warn_unused_result__;
 #define PERL_ARGS_ASSERT_UIV_2BUF	\
 	assert(buf); assert(peob)
+#endif
 
 STATIC void	S_utf8_mg_len_cache_update(pTHX_ SV *const sv, MAGIC **const mgp, const STRLEN ulen);
 #define PERL_ARGS_ASSERT_UTF8_MG_LEN_CACHE_UPDATE	\
diff --git a/sv.c b/sv.c
index b10f205033..06b38b37b5 100644
--- a/sv.c
+++ b/sv.c
@@ -2846,6 +2846,28 @@ Perl_sv_2num(pTHX_ SV *const sv)
     return sv_2mortal(newSVuv(PTR2UV(SvRV(sv))));
 }
 
+/* int2str_table: lookup table containing string representations of all
+ * two digit numbers. For example, int2str_table[0] is "00" and
+ * int2str_table[12*2] is "12".
+ */
+const char int2str_table[200] = {
+    '0', '0', '0', '1', '0', '2', '0', '3', '0', '4', '0', '5', '0', '6',
+    '0', '7', '0', '8', '0', '9', '1', '0', '1', '1', '1', '2', '1', '3',
+    '1', '4', '1', '5', '1', '6', '1', '7', '1', '8', '1', '9', '2', '0',
+    '2', '1', '2', '2', '2', '3', '2', '4', '2', '5', '2', '6', '2', '7',
+    '2', '8', '2', '9', '3', '0', '3', '1', '3', '2', '3', '3', '3', '4',
+    '3', '5', '3', '6', '3', '7', '3', '8', '3', '9', '4', '0', '4', '1',
+    '4', '2', '4', '3', '4', '4', '4', '5', '4', '6', '4', '7', '4', '8',
+    '4', '9', '5', '0', '5', '1', '5', '2', '5', '3', '5', '4', '5', '5',
+    '5', '6', '5', '7', '5', '8', '5', '9', '6', '0', '6', '1', '6', '2',
+    '6', '3', '6', '4', '6', '5', '6', '6', '6', '7', '6', '8', '6', '9',
+    '7', '0', '7', '1', '7', '2', '7', '3', '7', '4', '7', '5', '7', '6',
+    '7', '7', '7', '8', '7', '9', '8', '0', '8', '1', '8', '2', '8', '3',
+    '8', '4', '8', '5', '8', '6', '8', '7', '8', '8', '8', '9', '9', '0',
+    '9', '1', '9', '2', '9', '3', '9', '4', '9', '5', '9', '6', '9', '7',
+    '9', '8', '9', '9'
+};
+
 /* uiv_2buf(): private routine for use by sv_2pv_flags(): print an IV or
  * UV as a string towards the end of buf, and return pointers to start and
  * end of it.
@@ -2853,16 +2875,23 @@ Perl_sv_2num(pTHX_ SV *const sv)
  * We assume that buf is at least TYPE_CHARS(UV) long.
  */
 
-static char *
+PERL_STATIC_INLINE char *
 S_uiv_2buf(char *const buf, const IV iv, UV uv, const int is_uv, char **const peob)
 {
     char *ptr = buf + TYPE_CHARS(UV);
     char * const ebuf = ptr;
     int sign;
+    U16 *word_ptr, *word_table;
 
     PERL_ARGS_ASSERT_UIV_2BUF;
 
-    if (is_uv)
+    /* ptr has to be properly aligned, because we will cast it to U16* */
+    assert(ptr % 2 == 0);
+    /* we are going to read/write two bytes at a time */
+    word_ptr = (U16*)ptr;
+    word_table = (U16*)int2str_table;
+
+    if (UNLIKELY(is_uv))
 	sign = 0;
     else if (iv >= 0) {
 	uv = iv;
@@ -2871,11 +2900,23 @@ S_uiv_2buf(char *const buf, const IV iv, UV uv, const int is_uv, char **const pe
         uv = -(UV)iv;
 	sign = 1;
     }
-    do {
-	*--ptr = '0' + (char)(uv % 10);
-    } while (uv /= 10);
+
+    while (uv > 99) {
+        *--word_ptr = word_table[uv % 100];
+        uv /= 100;
+    }
+    ptr = (char*)word_ptr;
+
+    if (uv < 10)
+        *--ptr = (char)uv + '0';
+    else {
+        *--word_ptr = word_table[uv];
+        ptr = (char*)word_ptr;
+    }
+
     if (sign)
-	*--ptr = '-';
+        *--ptr = '-';
+
     *peob = ebuf;
     return ptr;
 }
-- 
2.14.1

@p5pRT
Copy link
Author

p5pRT commented Nov 28, 2018

From @xenu

stringification_bench

@p5pRT
Copy link
Author

p5pRT commented Nov 28, 2018

From @jkeenan

On Wed, 28 Nov 2018 12​:43​:59 GMT, me@​xenu.pl wrote​:

The patch and benchmark's source code are attached.

Available for smoke-testing in this branch​:

smoke-me/jkeenan/xenu/133691-integer-stringification
--
James E Keenan (jkeenan@​cpan.org)

@p5pRT
Copy link
Author

p5pRT commented Nov 28, 2018

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Nov 28, 2018

From @jkeenan

On Wed, 28 Nov 2018 14​:01​:58 GMT, jkeenan wrote​:

On Wed, 28 Nov 2018 12​:43​:59 GMT, me@​xenu.pl wrote​:

The patch and benchmark's source code are attached.

Available for smoke-testing in this branch​:

smoke-me/jkeenan/xenu/133691-integer-stringification

This appears to be failing consistently on -DDEBUGGING builds on smoke-testing rigs that normally PASS. See, e.g., http​://perl5.test-smoke.org/report/74928

Configuring on Linux with simply '-des -Dusedevel -DDEBUGGING', 'make' fails for me here​:

#####
cc -c -DPERL_CORE -fwrapv -DDEBUGGING -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -std=c89 -O2 -g -Wall -Werror=declaration-after-statement -Werror=pointer-arith -Wextra -Wc++-compat -Wwrite-strings sv.c
In file included from perl.h​:3399​:0,
  from sv.c​:32​:
sv.c​: In function ‘S_uiv_2buf’​:
sv.c​:2889​:16​: error​: invalid operands to binary % (have ‘char *’ and ‘int’)
  assert(ptr % 2 == 0);
  ^
makefile​:249​: recipe for target 'sv.o' failed
#####

--
James E Keenan (jkeenan@​cpan.org)

@p5pRT
Copy link
Author

p5pRT commented Nov 29, 2018

From @tonycoz

On Wed, Nov 28, 2018 at 01​:43​:43PM +0100, Tomasz Konojacki wrote​:

The patch and benchmark's source code are attached.

This code makes some assumptions about alignment that I don't think we
can depend on​:

- it assumes int2str_table[] will be U16 aligned.

- it assumes buf[] will be U16 aligned.

I believe neither is guaranteed by the C standard.

As James' smoke tests point out, the assertion isn't valid C, since a
pointer doesn't have arithmetic type.

Tony

@p5pRT
Copy link
Author

p5pRT commented Nov 29, 2018

From @xenu

On Thu, 29 Nov 2018 11​:32​:52 +1100
Tony Cook <tony@​develop-help.com> wrote​:

This code makes some assumptions about alignment that I don't think we
can depend on​:

- it assumes int2str_table[] will be U16 aligned.

- it assumes buf[] will be U16 aligned.

I believe neither is guaranteed by the C standard.

Good catch! It is indeed not guaranteed. For some reason I was assuming
that stack arrays are aligned in the same way as malloc()ed memory, but
that is obviously not the case.

As James' smoke tests point out, the assertion isn't valid C, since a
pointer doesn't have arithmetic type.

Argh, that's what I get for not testing last minute changes :(

I have attached updated version of the patch. Performance is the same, but
the code is now a bit uglier. Unfortunately I'm not aware of a better
way to ensure array alignment than using unions with dummy members.

@p5pRT
Copy link
Author

p5pRT commented Nov 29, 2018

From @xenu

updated.patch
From 9e37e0d3295cb1a355a96a586cb55caf9e78c368 Mon Sep 17 00:00:00 2001
From: Tomasz Konojacki <me@xenu.pl>
Date: Wed, 28 Nov 2018 12:38:51 +0100
Subject: [PATCH] S_uiv_2buf: faster integer stringification algorithm

Processing two digits at a time results in noticeable performance
improvement for larger numbers.

The previous version of the function was being automatically inlined
by gcc because of its small size. It's no longer the case so I had to
use PERL_STATIC_INLINE.

I have marked the UV branch as UNLIKELY because UVs are *much* rarer
than IVs.

[perl #133691]
---
 embed.fnc |  2 +-
 proto.h   |  4 +++-
 sv.c      | 77 ++++++++++++++++++++++++++++++++++++++++++++++++++++++---------
 3 files changed, 71 insertions(+), 12 deletions(-)

diff --git a/embed.fnc b/embed.fnc
index 408917e0a7..04d560b6d6 100644
--- a/embed.fnc
+++ b/embed.fnc
@@ -2691,7 +2691,7 @@ pR	|SV *	|varname	|NULLOK const GV *const gv|const char gvtype \
 
 pX	|void	|sv_del_backref	|NN SV *const tsv|NN SV *const sv
 #if defined(PERL_IN_SV_C)
-nsR	|char *	|uiv_2buf	|NN char *const buf|const IV iv|UV uv|const int is_uv|NN char **const peob
+niR	|char *	|uiv_2buf	|NN char *const buf|const IV iv|UV uv|const int is_uv|NN char **const peob
 i	|void	|sv_unglob	|NN SV *const sv|U32 flags
 s	|const char *|sv_display	|NN SV *const sv|NN char *tmpbuf|STRLEN tmpbuf_size
 s	|void	|not_a_number	|NN SV *const sv
diff --git a/proto.h b/proto.h
index 061a9d72a0..0a373f9994 100644
--- a/proto.h
+++ b/proto.h
@@ -6004,10 +6004,12 @@ PERL_STATIC_INLINE void	S_sv_unglob(pTHX_ SV *const sv, U32 flags);
 #define PERL_ARGS_ASSERT_SV_UNGLOB	\
 	assert(sv)
 #endif
-STATIC char *	S_uiv_2buf(char *const buf, const IV iv, UV uv, const int is_uv, char **const peob)
+#ifndef PERL_NO_INLINE_FUNCTIONS
+PERL_STATIC_INLINE char *	S_uiv_2buf(char *const buf, const IV iv, UV uv, const int is_uv, char **const peob)
 			__attribute__warn_unused_result__;
 #define PERL_ARGS_ASSERT_UIV_2BUF	\
 	assert(buf); assert(peob)
+#endif
 
 STATIC void	S_utf8_mg_len_cache_update(pTHX_ SV *const sv, MAGIC **const mgp, const STRLEN ulen);
 #define PERL_ARGS_ASSERT_UTF8_MG_LEN_CACHE_UPDATE	\
diff --git a/sv.c b/sv.c
index 826228bc92..672dc5f0eb 100644
--- a/sv.c
+++ b/sv.c
@@ -2846,6 +2846,34 @@ Perl_sv_2num(pTHX_ SV *const sv)
     return sv_2mortal(newSVuv(PTR2UV(SvRV(sv))));
 }
 
+/* int2str_table: lookup table containing string representations of all
+ * two digit numbers. For example, int2str_table.arr[0] is "00" and
+ * int2str_table.arr[12*2] is "12".
+ *
+ * We are going to read two bytes at a time, so we have to ensure that
+ * the array is aligned to a 2 byte boundary. That's why it was made a
+ * union with a dummy U16 member. */
+static const union {
+    char arr[200];
+    U16 dummy;
+} int2str_table = {{
+    '0', '0', '0', '1', '0', '2', '0', '3', '0', '4', '0', '5', '0', '6',
+    '0', '7', '0', '8', '0', '9', '1', '0', '1', '1', '1', '2', '1', '3',
+    '1', '4', '1', '5', '1', '6', '1', '7', '1', '8', '1', '9', '2', '0',
+    '2', '1', '2', '2', '2', '3', '2', '4', '2', '5', '2', '6', '2', '7',
+    '2', '8', '2', '9', '3', '0', '3', '1', '3', '2', '3', '3', '3', '4',
+    '3', '5', '3', '6', '3', '7', '3', '8', '3', '9', '4', '0', '4', '1',
+    '4', '2', '4', '3', '4', '4', '4', '5', '4', '6', '4', '7', '4', '8',
+    '4', '9', '5', '0', '5', '1', '5', '2', '5', '3', '5', '4', '5', '5',
+    '5', '6', '5', '7', '5', '8', '5', '9', '6', '0', '6', '1', '6', '2',
+    '6', '3', '6', '4', '6', '5', '6', '6', '6', '7', '6', '8', '6', '9',
+    '7', '0', '7', '1', '7', '2', '7', '3', '7', '4', '7', '5', '7', '6',
+    '7', '7', '7', '8', '7', '9', '8', '0', '8', '1', '8', '2', '8', '3',
+    '8', '4', '8', '5', '8', '6', '8', '7', '8', '8', '8', '9', '9', '0',
+    '9', '1', '9', '2', '9', '3', '9', '4', '9', '5', '9', '6', '9', '7',
+    '9', '8', '9', '9'
+}};
+
 /* uiv_2buf(): private routine for use by sv_2pv_flags(): print an IV or
  * UV as a string towards the end of buf, and return pointers to start and
  * end of it.
@@ -2853,16 +2881,23 @@ Perl_sv_2num(pTHX_ SV *const sv)
  * We assume that buf is at least TYPE_CHARS(UV) long.
  */
 
-static char *
+PERL_STATIC_INLINE char *
 S_uiv_2buf(char *const buf, const IV iv, UV uv, const int is_uv, char **const peob)
 {
     char *ptr = buf + TYPE_CHARS(UV);
     char * const ebuf = ptr;
     int sign;
+    U16 *word_ptr, *word_table;
 
     PERL_ARGS_ASSERT_UIV_2BUF;
 
-    if (is_uv)
+    /* ptr has to be properly aligned, because we will cast it to U16* */
+    assert(PTR2nat(ptr) % 2 == 0);
+    /* we are going to read/write two bytes at a time */
+    word_ptr = (U16*)ptr;
+    word_table = (U16*)int2str_table.arr;
+
+    if (UNLIKELY(is_uv))
 	sign = 0;
     else if (iv >= 0) {
 	uv = iv;
@@ -2871,11 +2906,23 @@ S_uiv_2buf(char *const buf, const IV iv, UV uv, const int is_uv, char **const pe
         uv = -(UV)iv;
 	sign = 1;
     }
-    do {
-	*--ptr = '0' + (char)(uv % 10);
-    } while (uv /= 10);
+
+    while (uv > 99) {
+        *--word_ptr = word_table[uv % 100];
+        uv /= 100;
+    }
+    ptr = (char*)word_ptr;
+
+    if (uv < 10)
+        *--ptr = (char)uv + '0';
+    else {
+        *--word_ptr = word_table[uv];
+        ptr = (char*)word_ptr;
+    }
+
     if (sign)
-	*--ptr = '-';
+        *--ptr = '-';
+
     *peob = ebuf;
     return ptr;
 }
@@ -3083,13 +3130,18 @@ Perl_sv_2pv_flags(pTHX_ SV *const sv, STRLEN *const lp, const I32 flags)
 	/* I'm assuming that if both IV and NV are equally valid then
 	   converting the IV is going to be more efficient */
 	const U32 isUIOK = SvIsUV(sv);
-	char buf[TYPE_CHARS(UV)];
+	/* The purpose of this union is to ensure that arr is aligned on
+           a 2 byte boundary, because that is what uiv_2buf() requires */
+	union {
+		char arr[TYPE_CHARS(UV)];
+		U16 dummy;
+	} buf;
 	char *ebuf, *ptr;
 	STRLEN len;
 
 	if (SvTYPE(sv) < SVt_PVIV)
 	    sv_upgrade(sv, SVt_PVIV);
- 	ptr = uiv_2buf(buf, SvIVX(sv), SvUVX(sv), isUIOK, &ebuf);
+        ptr = uiv_2buf(buf.arr, SvIVX(sv), SvUVX(sv), isUIOK, &ebuf);
 	len = ebuf - ptr;
 	/* inlined from sv_setpvn */
 	s = SvGROW_mutable(sv, len + 1);
@@ -10621,9 +10673,14 @@ Does not handle 'set' magic.  See C<L</sv_setpviv_mg>>.
 void
 Perl_sv_setpviv(pTHX_ SV *const sv, const IV iv)
 {
-    char buf[TYPE_CHARS(UV)];
+    /* The purpose of this union is to ensure that arr is aligned on
+       a 2 byte boundary, because that is what uiv_2buf() requires */
+    union {
+        char arr[TYPE_CHARS(UV)];
+        U16 dummy;
+    } buf;
     char *ebuf;
-    char * const ptr = uiv_2buf(buf, iv, 0, 0, &ebuf);
+    char * const ptr = uiv_2buf(buf.arr, iv, 0, 0, &ebuf);
 
     PERL_ARGS_ASSERT_SV_SETPVIV;
 
-- 
2.14.1

@p5pRT
Copy link
Author

p5pRT commented Nov 29, 2018

From @xenu

On Thu, 29 Nov 2018 11​:56​:39 +0100
Tomasz Konojacki <me@​xenu.pl> wrote​:

I have attached updated version of the patch. Performance is the same, but
the code is now a bit uglier. Unfortunately I'm not aware of a better
way to ensure array alignment than using unions with dummy members.

Oops, looks like it has introduced a few hard tabs. The fixed version is
attached.

@p5pRT
Copy link
Author

p5pRT commented Nov 29, 2018

From @xenu

updated2.patch
From c7389c7c6a2244f036649da11ec635889e091d1b Mon Sep 17 00:00:00 2001
From: Tomasz Konojacki <me@xenu.pl>
Date: Thu, 29 Nov 2018 12:24:03 +0100
Subject: [PATCH] S_uiv_2buf: faster integer stringification algorithm

Processing two digits at a time results in noticeable performance
improvement for larger numbers.

The previous version of the function was being automatically inlined
by gcc because of its small size. It's no longer the case so I had to
use PERL_STATIC_INLINE.

I have marked the UV branch as UNLIKELY because UVs are *much* rarer
than IVs.

[perl #133691]
---
 embed.fnc |  2 +-
 proto.h   |  4 +++-
 sv.c      | 77 ++++++++++++++++++++++++++++++++++++++++++++++++++++++---------
 3 files changed, 71 insertions(+), 12 deletions(-)

diff --git a/embed.fnc b/embed.fnc
index 408917e0a7..04d560b6d6 100644
--- a/embed.fnc
+++ b/embed.fnc
@@ -2691,7 +2691,7 @@ pR	|SV *	|varname	|NULLOK const GV *const gv|const char gvtype \
 
 pX	|void	|sv_del_backref	|NN SV *const tsv|NN SV *const sv
 #if defined(PERL_IN_SV_C)
-nsR	|char *	|uiv_2buf	|NN char *const buf|const IV iv|UV uv|const int is_uv|NN char **const peob
+niR	|char *	|uiv_2buf	|NN char *const buf|const IV iv|UV uv|const int is_uv|NN char **const peob
 i	|void	|sv_unglob	|NN SV *const sv|U32 flags
 s	|const char *|sv_display	|NN SV *const sv|NN char *tmpbuf|STRLEN tmpbuf_size
 s	|void	|not_a_number	|NN SV *const sv
diff --git a/proto.h b/proto.h
index 061a9d72a0..0a373f9994 100644
--- a/proto.h
+++ b/proto.h
@@ -6004,10 +6004,12 @@ PERL_STATIC_INLINE void	S_sv_unglob(pTHX_ SV *const sv, U32 flags);
 #define PERL_ARGS_ASSERT_SV_UNGLOB	\
 	assert(sv)
 #endif
-STATIC char *	S_uiv_2buf(char *const buf, const IV iv, UV uv, const int is_uv, char **const peob)
+#ifndef PERL_NO_INLINE_FUNCTIONS
+PERL_STATIC_INLINE char *	S_uiv_2buf(char *const buf, const IV iv, UV uv, const int is_uv, char **const peob)
 			__attribute__warn_unused_result__;
 #define PERL_ARGS_ASSERT_UIV_2BUF	\
 	assert(buf); assert(peob)
+#endif
 
 STATIC void	S_utf8_mg_len_cache_update(pTHX_ SV *const sv, MAGIC **const mgp, const STRLEN ulen);
 #define PERL_ARGS_ASSERT_UTF8_MG_LEN_CACHE_UPDATE	\
diff --git a/sv.c b/sv.c
index 826228bc92..bc06c6c659 100644
--- a/sv.c
+++ b/sv.c
@@ -2846,6 +2846,34 @@ Perl_sv_2num(pTHX_ SV *const sv)
     return sv_2mortal(newSVuv(PTR2UV(SvRV(sv))));
 }
 
+/* int2str_table: lookup table containing string representations of all
+ * two digit numbers. For example, int2str_table.arr[0] is "00" and
+ * int2str_table.arr[12*2] is "12".
+ *
+ * We are going to read two bytes at a time, so we have to ensure that
+ * the array is aligned to a 2 byte boundary. That's why it was made a
+ * union with a dummy U16 member. */
+static const union {
+    char arr[200];
+    U16 dummy;
+} int2str_table = {{
+    '0', '0', '0', '1', '0', '2', '0', '3', '0', '4', '0', '5', '0', '6',
+    '0', '7', '0', '8', '0', '9', '1', '0', '1', '1', '1', '2', '1', '3',
+    '1', '4', '1', '5', '1', '6', '1', '7', '1', '8', '1', '9', '2', '0',
+    '2', '1', '2', '2', '2', '3', '2', '4', '2', '5', '2', '6', '2', '7',
+    '2', '8', '2', '9', '3', '0', '3', '1', '3', '2', '3', '3', '3', '4',
+    '3', '5', '3', '6', '3', '7', '3', '8', '3', '9', '4', '0', '4', '1',
+    '4', '2', '4', '3', '4', '4', '4', '5', '4', '6', '4', '7', '4', '8',
+    '4', '9', '5', '0', '5', '1', '5', '2', '5', '3', '5', '4', '5', '5',
+    '5', '6', '5', '7', '5', '8', '5', '9', '6', '0', '6', '1', '6', '2',
+    '6', '3', '6', '4', '6', '5', '6', '6', '6', '7', '6', '8', '6', '9',
+    '7', '0', '7', '1', '7', '2', '7', '3', '7', '4', '7', '5', '7', '6',
+    '7', '7', '7', '8', '7', '9', '8', '0', '8', '1', '8', '2', '8', '3',
+    '8', '4', '8', '5', '8', '6', '8', '7', '8', '8', '8', '9', '9', '0',
+    '9', '1', '9', '2', '9', '3', '9', '4', '9', '5', '9', '6', '9', '7',
+    '9', '8', '9', '9'
+}};
+
 /* uiv_2buf(): private routine for use by sv_2pv_flags(): print an IV or
  * UV as a string towards the end of buf, and return pointers to start and
  * end of it.
@@ -2853,16 +2881,23 @@ Perl_sv_2num(pTHX_ SV *const sv)
  * We assume that buf is at least TYPE_CHARS(UV) long.
  */
 
-static char *
+PERL_STATIC_INLINE char *
 S_uiv_2buf(char *const buf, const IV iv, UV uv, const int is_uv, char **const peob)
 {
     char *ptr = buf + TYPE_CHARS(UV);
     char * const ebuf = ptr;
     int sign;
+    U16 *word_ptr, *word_table;
 
     PERL_ARGS_ASSERT_UIV_2BUF;
 
-    if (is_uv)
+    /* ptr has to be properly aligned, because we will cast it to U16* */
+    assert(PTR2nat(ptr) % 2 == 0);
+    /* we are going to read/write two bytes at a time */
+    word_ptr = (U16*)ptr;
+    word_table = (U16*)int2str_table.arr;
+
+    if (UNLIKELY(is_uv))
 	sign = 0;
     else if (iv >= 0) {
 	uv = iv;
@@ -2871,11 +2906,23 @@ S_uiv_2buf(char *const buf, const IV iv, UV uv, const int is_uv, char **const pe
         uv = -(UV)iv;
 	sign = 1;
     }
-    do {
-	*--ptr = '0' + (char)(uv % 10);
-    } while (uv /= 10);
+
+    while (uv > 99) {
+        *--word_ptr = word_table[uv % 100];
+        uv /= 100;
+    }
+    ptr = (char*)word_ptr;
+
+    if (uv < 10)
+        *--ptr = (char)uv + '0';
+    else {
+        *--word_ptr = word_table[uv];
+        ptr = (char*)word_ptr;
+    }
+
     if (sign)
-	*--ptr = '-';
+        *--ptr = '-';
+
     *peob = ebuf;
     return ptr;
 }
@@ -3083,13 +3130,18 @@ Perl_sv_2pv_flags(pTHX_ SV *const sv, STRLEN *const lp, const I32 flags)
 	/* I'm assuming that if both IV and NV are equally valid then
 	   converting the IV is going to be more efficient */
 	const U32 isUIOK = SvIsUV(sv);
-	char buf[TYPE_CHARS(UV)];
+        /* The purpose of this union is to ensure that arr is aligned on
+           a 2 byte boundary, because that is what uiv_2buf() requires */
+        union {
+            char arr[TYPE_CHARS(UV)];
+            U16 dummy;
+        } buf;
 	char *ebuf, *ptr;
 	STRLEN len;
 
 	if (SvTYPE(sv) < SVt_PVIV)
 	    sv_upgrade(sv, SVt_PVIV);
- 	ptr = uiv_2buf(buf, SvIVX(sv), SvUVX(sv), isUIOK, &ebuf);
+        ptr = uiv_2buf(buf.arr, SvIVX(sv), SvUVX(sv), isUIOK, &ebuf);
 	len = ebuf - ptr;
 	/* inlined from sv_setpvn */
 	s = SvGROW_mutable(sv, len + 1);
@@ -10621,9 +10673,14 @@ Does not handle 'set' magic.  See C<L</sv_setpviv_mg>>.
 void
 Perl_sv_setpviv(pTHX_ SV *const sv, const IV iv)
 {
-    char buf[TYPE_CHARS(UV)];
+    /* The purpose of this union is to ensure that arr is aligned on
+       a 2 byte boundary, because that is what uiv_2buf() requires */
+    union {
+        char arr[TYPE_CHARS(UV)];
+        U16 dummy;
+    } buf;
     char *ebuf;
-    char * const ptr = uiv_2buf(buf, iv, 0, 0, &ebuf);
+    char * const ptr = uiv_2buf(buf.arr, iv, 0, 0, &ebuf);
 
     PERL_ARGS_ASSERT_SV_SETPVIV;
 
-- 
2.14.1

@p5pRT
Copy link
Author

p5pRT commented Dec 12, 2018

From @tonycoz

On Thu, 29 Nov 2018 03​:27​:41 -0800, me@​xenu.pl wrote​:

On Thu, 29 Nov 2018 11​:56​:39 +0100
Tomasz Konojacki <me@​xenu.pl> wrote​:

I have attached updated version of the patch. Performance is the same, but
the code is now a bit uglier. Unfortunately I'm not aware of a better
way to ensure array alignment than using unions with dummy members.

Oops, looks like it has introduced a few hard tabs. The fixed version is
attached.

Thanks, applied as dd0a5f5.

Tony

@p5pRT
Copy link
Author

p5pRT commented Dec 12, 2018

@tonycoz - Status changed from 'open' to 'pending release'

@p5pRT
Copy link
Author

p5pRT commented May 22, 2019

From @khwilliamson

Thank you for filing this report. You have helped make Perl better.

With the release today of Perl 5.30.0, this and 160 other issues have been
resolved.

Perl 5.30.0 may be downloaded via​:
https://metacpan.org/release/XSAWYERX/perl-5.30.0

If you find that the problem persists, feel free to reopen this ticket.

@p5pRT
Copy link
Author

p5pRT commented May 22, 2019

@khwilliamson - Status changed from 'pending release' to 'resolved'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant