Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

generic byteorder code in my_htonl and my_ntohl is incorrect #12943

Closed
p5pRT opened this issue May 6, 2013 · 7 comments
Closed

generic byteorder code in my_htonl and my_ntohl is incorrect #12943

p5pRT opened this issue May 6, 2013 · 7 comments

Comments

@p5pRT
Copy link

p5pRT commented May 6, 2013

Migrated from rt.perl.org#117887 (status was 'resolved')

Searchable as RT117887$

@p5pRT
Copy link
Author

p5pRT commented May 6, 2013

From @nwc10

The fallback functions my_htonl and my_ntoh_l were added to util.c for
Perl 3.0, as part of adding 'N' and 'n' templates to pack. They contain
conditionally compiled special-case code for little endian systems, and
a fallback loop for other values of BYTEORDER. The intent is that the
the fallback code will work on any byteorder. In fact, it only works
correctly on 32 bit little endian systems, because it *always* swaps the
order of the bytes within the word. The relevant code in util.c in blead
looks like this​:

long
Perl_my_htonl(pTHX_ long l)
{
  union {
  long result;
  char c[sizeof(long)];
  } u;

#if BYTEORDER == 0x1234 || BYTEORDER == 0x12345678
#if BYTEORDER == 0x12345678
  u.result = 0;
#endif
  u.c[0] = (l >> 24) & 255;
  u.c[1] = (l >> 16) & 255;
  u.c[2] = (l >> 8) & 255;
  u.c[3] = l & 255;
  return u.result;
#else
#if ((BYTEORDER - 0x1111) & 0x444) || !(BYTEORDER & 0xf)
  Perl_croak(aTHX_ "Unknown BYTEORDER\n");
#else
  I32 o;
  I32 s;

  for (o = BYTEORDER - 0x1111, s = 0; s < (sizeof(long)*8); o >>= 4, s += 8) {
  u.c[o & 0xf] = (l >> s) & 255;
  }
  return u.result;
#endif
#endif
}

Extracted into a standalone file (byteorder.c), with the first #if changed
to permit code to be forced to use either implementation, with output is
generated by​:

int
main() {
  unsigned long in = 0x0A0B0C0D;
  unsigned long mid = Perl_my_htonl(in);
  printf("my​: %08lx %08lx %08lx\n", in, mid, (unsigned long) Perl_my_ntohl(mid));
  mid = htonl(in);
  printf(" %08lx %08lx %08lx\n", in, mid, ntohl(mid));
  return 0;
}

Tested on 32 bit little endian systems, first with the special-case code,
then forced to use the loop​:

$ ./byteorder-special32
my​: 0a0b0c0d 0d0c0b0a 0a0b0c0d
  0a0b0c0d 0d0c0b0a 0a0b0c0d
$ ./byteorder32
my​: 0a0b0c0d 0d0c0b0a 0a0b0c0d
  0a0b0c0d 0d0c0b0a 0a0b0c0d

However, the same code on a 32 bit big endian system shows the bug - the
supposedly generic loop code actually reverses the word, whereas the
"special" code for little endian systems works​:

$ ./byteorder32
my​: 0a0b0c0d 0d0c0b0a 0a0b0c0d
  0a0b0c0d 0a0b0c0d 0a0b0c0d
$ ./byteorder-special32
my​: 0a0b0c0d 0a0b0c0d 0d0c0b0a
  0a0b0c0d 0a0b0c0d 0a0b0c0d

And again on a mixed endian system (emulated PDP-11 running BSD 2.11*)​:

nick[63] ./byteorder
my​: 0a0b0c0d 0d0c0b0a 0a0b0c0d
  0a0b0c0d 0b0a0d0c 0a0b0c0d
nick[64] ./byteorder-special
my​: 0a0b0c0d 0b0a0d0c 0c0d0a0b
  0a0b0c0d 0b0a0d0c 0a0b0c0d

From this I infer that the fallback functions were never needed on mixed
endian architectures such PDP-11s, as the operating system always supplied
a correct htonl etc.

So the irony is that the special-purpose little endian code is actually
correct for all platforms (attached as byteorder-fixed.c)
An even better fix would be to use the generator macros BETOH and HTOBE
instead (attached as byteorder-macros.c) - ie

BETOH(my_ntohl,long)
BETOH(my_ntohs,short)
HTOBE(my_htonl,long)
HTOBE(my_htons,short)

I think that the best fix, however, is

a) to explicitly drop all remaining code that supports mixed-endian platforms
  (and remove it)**
b) then post v5.18.0 merge in the fixes in smoke-me/nicholas/genpacksizetables
  which eliminate the need for the support functions completely.

Nicholas Clark

* PDP-11 emulator is part of simh​: http​://simh.trailing-edge.com/
  2.11BSD from http​://www.ak6dn.dyndns.org/PDP-11/2.11BSD/

  Trivially easy to install on macports as 'simh' or Debian as 'simh'.
  Currently my Raspberry Pi is emulating a PDP-11. (Running the emulator on
  my laptop made the fan spin, which was annoying)

  No, I don't propose that anyone spends time trying to get Perl working on
  PDP-11. For starters, you'll need to find an ANSI C toolchain. The BSD
  cc is resolutely K&R, doesn't do function prototypes, or "%lX" or "%hx"
  format strings in printf.

** Unless someone is willing to fix blead so that it compiles again on such
  a platform, and then run a reasonably regular smoker to ensure that it
  stays compiling.

@p5pRT
Copy link
Author

p5pRT commented May 6, 2013

From @nwc10

0001-Remove-buggy-loop-based-byte-swapping-code.patch
From 434e9c4a94a4830e2d75fee52d73a797bc85d4c3 Mon Sep 17 00:00:00 2001
From: Nicholas Clark <nick@ccl4.org>
Date: Mon, 6 May 2013 13:28:20 +0200
Subject: [PATCH] Remove buggy loop-based byte swapping code.

The irony is that the union-based code special-cased for little endian
systems actually works everywhere, even on mixed-endian systems.
---
 pod/perldiag.pod |    5 -----
 util.c           |   38 +++-----------------------------------
 2 files changed, 3 insertions(+), 40 deletions(-)

diff --git a/pod/perldiag.pod b/pod/perldiag.pod
index 3d8212e..3acbce4 100644
--- a/pod/perldiag.pod
+++ b/pod/perldiag.pod
@@ -5372,11 +5372,6 @@ problems when being input or output, which is likely where this message
 came from.  If you really really know what you are doing you can turn
 off this warning by C<no warnings 'surrogate';>.
 
-=item Unknown BYTEORDER
-
-(F) There are no byte-swapping functions for a machine with this byte
-order.
-
 =item Unknown charname '%s'
 
 (F) The name you used inside C<\N{}> is unknown to Perl.  Check the
diff --git a/util.c b/util.c
index 42fd70f..55a79b5 100644
--- a/util.c
+++ b/util.c
@@ -2183,8 +2183,7 @@ Perl_my_htonl(pTHX_ long l)
 	char c[sizeof(long)];
     } u;
 
-#if BYTEORDER == 0x1234 || BYTEORDER == 0x12345678
-#if BYTEORDER == 0x12345678
+#if BYTEORDER > 0xFFFF
     u.result = 0; 
 #endif 
     u.c[0] = (l >> 24) & 255;
@@ -2192,19 +2191,6 @@ Perl_my_htonl(pTHX_ long l)
     u.c[2] = (l >> 8) & 255;
     u.c[3] = l & 255;
     return u.result;
-#else
-#if ((BYTEORDER - 0x1111) & 0x444) || !(BYTEORDER & 0xf)
-    Perl_croak(aTHX_ "Unknown BYTEORDER\n");
-#else
-    I32 o;
-    I32 s;
-
-    for (o = BYTEORDER - 0x1111, s = 0; s < (sizeof(long)*8); o >>= 4, s += 8) {
-	u.c[o & 0xf] = (l >> s) & 255;
-    }
-    return u.result;
-#endif
-#endif
 }
 
 long
@@ -2215,27 +2201,9 @@ Perl_my_ntohl(pTHX_ long l)
 	char c[sizeof(long)];
     } u;
 
-#if BYTEORDER == 0x1234
-    u.c[0] = (l >> 24) & 255;
-    u.c[1] = (l >> 16) & 255;
-    u.c[2] = (l >> 8) & 255;
-    u.c[3] = l & 255;
-    return u.l;
-#else
-#if ((BYTEORDER - 0x1111) & 0x444) || !(BYTEORDER & 0xf)
-    Perl_croak(aTHX_ "Unknown BYTEORDER\n");
-#else
-    I32 o;
-    I32 s;
-
     u.l = l;
-    l = 0;
-    for (o = BYTEORDER - 0x1111, s = 0; s < (sizeof(long)*8); o >>= 4, s += 8) {
-	l |= (u.c[o & 0xf] & 255) << s;
-    }
-    return l;
-#endif
-#endif
+    return ((u.c[0] & 255) << 24) | ((u.c[1] & 255) << 16)
+	| ((u.c[2] & 255) << 8) | (u.c[3] & 255);
 }
 
 #endif /* BYTEORDER != 0x4321 */
-- 
1.7.2.5

@p5pRT
Copy link
Author

p5pRT commented May 6, 2013

From @nwc10

typedef long I32;

long
Perl_my_htonl(l)
long l;
{
    union {
	long result;
	char c[sizeof(long)];
    } u;

#ifdef SPECIAL
#if BYTEORDER == 0x12345678
    u.result = 0; 
#endif 
    u.c[0] = (l >> 24) & 255;
    u.c[1] = (l >> 16) & 255;
    u.c[2] = (l >> 8) & 255;
    u.c[3] = l & 255;
    return u.result;
#else
#if ((BYTEORDER - 0x1111) & 0x444) || !(BYTEORDER & 0xf)
    Perl_croak(aTHX_ "Unknown BYTEORDER\n");
#else
    I32 o;
    I32 s;

    for (o = BYTEORDER - 0x1111, s = 0; s < (sizeof(long)*8); o >>= 4, s += 8) {
	u.c[o & 0xf] = (l >> s) & 255;
    }
    return u.result;
#endif
#endif
}


long
Perl_my_ntohl(l)
long l;
{
    union {
	long l;
	char c[sizeof(long)];
    } u;

#ifdef COMMON
    u.c[0] = (l >> 24) & 255;
    u.c[1] = (l >> 16) & 255;
    u.c[2] = (l >> 8) & 255;
    u.c[3] = l & 255;
    return u.l;
#else
#if ((BYTEORDER - 0x1111) & 0x444) || !(BYTEORDER & 0xf)
    Perl_croak(aTHX_ "Unknown BYTEORDER\n");
#else
    I32 o;
    I32 s;

    u.l = l;
    l = 0;
    for (o = BYTEORDER - 0x1111, s = 0; s < (sizeof(long)*8); o >>= 4, s += 8) {
	l |= (u.c[o & 0xf] & 255) << s;
    }
    return l;
#endif
#endif
}

#include <stdio.h>
#include <sys/types.h>
#include <netinet/in.h>

int
main() {
    unsigned long in = 0x0A0B0C0D;
    unsigned long mid = Perl_my_htonl(in);
    printf("my: %08lx %08lx %08lx\n", in, mid, (unsigned long) Perl_my_ntohl(mid));
    mid = htonl(in);
    printf("    %08lx %08lx %08lx\n", in, mid, ntohl(mid));
    return 0;
}

@p5pRT
Copy link
Author

p5pRT commented May 6, 2013

From @nwc10

#ifdef pdp11
typedef unsigned long U32;
#else
typedef unsigned int U32;
#endif
typedef unsigned short U16;

/*
 * Little-endian byte order functions - 'v' for 'VAX', or 'reVerse'.
 * If these functions are defined,
 * the BYTEORDER is neither 0x1234 nor 0x4321.
 * However, this is not assumed.
 * -DWS
 */

#define HTOLE(name,type)					\
	type							\
	name (n)					        \
	type n;					        	\
	{							\
	    union {						\
		type value;					\
		char c[sizeof(type)];				\
	    } u;						\
	    U32 i;					        \
	    U32 s = 0;					        \
	    for (i = 0; i < sizeof(u.c); i++, s += 8) {		\
		u.c[i] = (n >> s) & 0xFF;			\
	    }							\
	    return u.value;					\
	}

#define LETOH(name,type)					\
	type							\
	name (n)					        \
	type n;					        	\
	{							\
	    union {						\
		type value;					\
		char c[sizeof(type)];				\
	    } u;						\
	    U32 i;					        \
	    U32 s = 0;					        \
	    u.value = n;					\
	    n = 0;						\
	    for (i = 0; i < sizeof(u.c); i++, s += 8) {		\
		n |= ((type)(u.c[i] & 0xFF)) << s;		\
	    }							\
	    return n;						\
	}

/*
 * Big-endian byte order functions.
 */

#define HTOBE(name,type)					\
	type							\
	name (n)					        \
	type n;					        	\
	{							\
	    union {						\
		type value;					\
		char c[sizeof(type)];				\
	    } u;						\
	    U32 i;					        \
	    U32 s = 8*(sizeof(u.c)-1);			        \
	    for (i = 0; i < sizeof(u.c); i++, s -= 8) {		\
		u.c[i] = (n >> s) & 0xFF;			\
	    }							\
	    return u.value;					\
	}

#define BETOH(name,type)					\
	type							\
	name (n)					        \
	type n;						        \
	{							\
	    union {						\
		type value;					\
		char c[sizeof(type)];				\
	    } u;						\
	    U32 i;					        \
	    U32 s = 8*(sizeof(u.c)-1);			        \
	    u.value = n;					\
	    n = 0;						\
	    for (i = 0; i < sizeof(u.c); i++, s -= 8) {		\
		n |= ((type)(u.c[i] & 0xFF)) << s;		\
	    }							\
	    return n;						\
	}

LETOH(vtohl,long)
LETOH(vtohs,short)
HTOLE(htovl,long)
HTOLE(htovs,short)

BETOH(my_ntohl,long)
BETOH(my_ntohs,short)
HTOBE(my_htonl,long)
HTOBE(my_htons,short)

#include <stdio.h>
#include <sys/types.h>
#include <netinet/in.h>

int
main() {
    unsigned long in_l = 0x12345678;
    unsigned short in_s = 0xABCD;
    unsigned long mid_l;
    unsigned short mid_s;
    unsigned short end_s;
    mid_l = htonl(in_l);
    printf("  %08lx %08lx %08lx\n", in_l, mid_l, (unsigned long) ntohl(mid_l));
    mid_l = my_htonl(in_l);
    printf("N %08lx %08lx %08lx\n", in_l, mid_l, (unsigned long) my_ntohl(mid_l));
    mid_l = htovl(in_l);
    printf("V %08lx %08lx %08lx\n", in_l, mid_l, (unsigned long) vtohl(mid_l));
    mid_s = htons(in_s);
    end_s = ntohs(mid_s);
    printf("  %04x %04x %04x\n", in_s, mid_s, end_s);
    mid_s = my_htons(in_s);
    end_s = my_ntohs(mid_s);
    printf("n %04x %04x %04x\n", in_s, mid_s, end_s);
    mid_s = htovs(in_s);
    end_s = vtohs(mid_s);
    printf("v %04x %04x %04x\n", in_s, mid_s, end_s);
    return 0;
}

@p5pRT
Copy link
Author

p5pRT commented May 6, 2013

From @nwc10

typedef long I32;

long
Perl_my_htonl(l)
    long l;
{
    union {
	long result;
	char c[sizeof(long)];
    } u;

#if BYTEORDER > 0xFFFF
    u.result = 0; 
#endif 
    u.c[0] = (l >> 24) & 255;
    u.c[1] = (l >> 16) & 255;
    u.c[2] = (l >> 8) & 255;
    u.c[3] = l & 255;
    return u.result;
}


long
Perl_my_ntohl(l)
    long l;
{
    union {
	long l;
	char c[sizeof(long)];
    } u;

    u.l = l;
    return ((u.c[0] & 255) << 24) | ((u.c[1] & 255) << 16)
	| ((u.c[2] & 255) << 8) | (u.c[3] & 255);
}

#include <stdio.h>
#include <sys/types.h>
#include <netinet/in.h>

int
main() {
    unsigned long in = 0x0A0B0C0D;
    unsigned long mid = Perl_my_htonl(in);
    printf("my: %08lx %08lx %08lx\n", in, mid, (unsigned long) Perl_my_ntohl(mid));
    mid = htonl(in);
    printf("    %08lx %08lx %08lx\n", in, mid, ntohl(mid));
    return 0;
}

@p5pRT
Copy link
Author

p5pRT commented Jun 17, 2013

From @nwc10

On Mon May 06 04​:32​:09 2013, nicholas wrote​:

I think that the best fix, however, is

a) to explicitly drop all remaining code that supports mixed-endian
platforms
(and remove it)**
b) then post v5.18.0 merge in the fixes in smoke-
me/nicholas/genpacksizetables
which eliminate the need for the support functions completely.

I applied the patch as commit 66bb80e
Marking as resolved.

Nicholas Clark

@p5pRT
Copy link
Author

p5pRT commented Jun 17, 2013

@nwc10 - Status changed from 'new' to 'resolved'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant