Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PATCH] Remove some compiler hoops to jump through for EBCDIC #15683

Closed
p5pRT opened this issue Oct 25, 2016 · 12 comments
Closed

[PATCH] Remove some compiler hoops to jump through for EBCDIC #15683

p5pRT opened this issue Oct 25, 2016 · 12 comments

Comments

@p5pRT
Copy link

p5pRT commented Oct 25, 2016

Migrated from rt.perl.org#129961 (status was 'resolved')

Searchable as RT129961$

@p5pRT
Copy link
Author

p5pRT commented Oct 25, 2016

From @petdance

Created by @petdance

This patch simplifies two bits of code that I came across while
working on supporting the clang -Weverything flag.

The first, in Perl_validate_proto, removes unnecessary variable
initialization if proto of NULL is passed.

The second, in S_scan_const, rearranges some code and #ifdefs so that
the convert_unicode and real_range_max variables are only declared
if EBCDIC is set. This lets us no longer have to unnecessarily set
useless variables to make the compiler happy, and it saves us from some
unnecessary checks on "if (convert_unicode)". One of the comments says
"(Compilers should optimize this out for non-EBCDIC)", but now the
compiler won't even see these unnecessary variables or tests.

This diff may be easier to understand if it's viewed as two side-by-side
files, such as with vimdiff.

Perl Info

Flags:
    category=core
    severity=low
    Type=Patch
    PatchStatus=HasPatch

Site configuration information for perl 5.24.0:

Configured by andy at Sun Jun  5 23:28:46 CDT 2016.

Summary of my perl5 (revision 5 version 24 subversion 0) configuration:
   
  Platform:
    osname=linux, osvers=3.10.0-327.18.2.el7.x86_64, archname=x86_64-linux
    uname='linux clifford 3.10.0-327.18.2.el7.x86_64 #1 smp thu may 12 11:03:55 utc 2016 x86_64 x86_64 x86_64 gnulinux '
    config_args='-de -Dprefix=/home/andy/perl5/perlbrew/perls/perl-5.24.0 -Aeval:scriptdir=/home/andy/perl5/perlbrew/perls/perl-5.24.0/bin'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=undef, usemultiplicity=undef
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2',
    optimize='-O2',
    cppflags='-fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include'
    ccversion='', gccversion='4.8.5 20150623 (Red Hat 4.8.5-4)', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678, doublekind=3
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16, longdblkind=3
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -fstack-protector-strong -L/usr/local/lib'
    libpth=/usr/local/lib /usr/lib /lib/../lib64 /usr/lib/../lib64 /lib /lib64 /usr/lib64 /usr/local/lib64
    libs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc
    perllibs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc
    libc=libc-2.17.so, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.17'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -O2 -L/usr/local/lib -fstack-protector-strong'

Locally applied patches:
    Devel::PatchPerl 1.38


@INC for perl 5.24.0:
    /home/andy/perl5/perlbrew/perls/perl-5.24.0/lib/site_perl/5.24.0/x86_64-linux
    /home/andy/perl5/perlbrew/perls/perl-5.24.0/lib/site_perl/5.24.0
    /home/andy/perl5/perlbrew/perls/perl-5.24.0/lib/5.24.0/x86_64-linux
    /home/andy/perl5/perlbrew/perls/perl-5.24.0/lib/5.24.0
    .


Environment for perl 5.24.0:
    HOME=/home/andy
    LANG=en_US.UTF-8
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/andy/perl5/perlbrew/bin:/home/andy/perl5/perlbrew/perls/perl-5.24.0/bin:/home/andy/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin
    PERLBREW_BASHRC_VERSION=0.75
    PERLBREW_HOME=/home/andy/.perlbrew
    PERLBREW_MANPATH=/home/andy/perl5/perlbrew/perls/perl-5.24.0/man
    PERLBREW_PATH=/home/andy/perl5/perlbrew/bin:/home/andy/perl5/perlbrew/perls/perl-5.24.0/bin
    PERLBREW_PERL=perl-5.24.0
    PERLBREW_ROOT=/home/andy/perl5/perlbrew
    PERLBREW_VERSION=0.75
    PERLCRITIC=/home/andy/tw/Dev/perlcriticrc
    PERL_BADLANG (unset)
    SHELL=/bin/bash

@p5pRT
Copy link
Author

p5pRT commented Oct 25, 2016

From @petdance

toke-ebcdic.patch
diff --git a/toke.c b/toke.c
index cab34b9..b2d9209 100644
--- a/toke.c
+++ b/toke.c
@@ -1568,7 +1568,7 @@ bool
 Perl_validate_proto(pTHX_ SV *name, SV *proto, bool warn)
 {
     STRLEN len, origlen;
-    char *p = proto ? SvPV(proto, len) : NULL;
+    char *p;
     bool bad_proto = FALSE;
     bool in_brackets = FALSE;
     bool after_slash = FALSE;
@@ -1583,6 +1583,7 @@ Perl_validate_proto(pTHX_ SV *name, SV *proto, bool warn)
     if (!proto)
 	return TRUE;
 
+    p = SvPV(proto, len);
     origlen = len;
     for (; len--; p++) {
 	if (!isSPACE(*p)) {
@@ -2954,10 +2955,7 @@ S_scan_const(pTHX_ char *start)
 		IV range_max;	/* last character in range */
                 STRLEN save_offset;
                 STRLEN grow;
-#ifndef EBCDIC  /* Not meaningful except in EBCDIC, so initialize to false */
-                const bool convert_unicode = FALSE;
-                const IV real_range_max = 0;
-#else
+#ifdef EBCDIC
                 bool convert_unicode;
                 IV real_range_max = 0;
 #endif
@@ -3002,12 +3000,14 @@ S_scan_const(pTHX_ char *start)
 #endif
 
                 if (range_min > range_max) {
+#ifdef EBCDIC
                     if (convert_unicode) {
                         /* Need to convert back to native for meaningful
                          * messages for this platform */
                         range_min = UNI_TO_NATIVE(range_min);
                         range_max = UNI_TO_NATIVE(range_max);
                     }
+#endif
 
                     /* Use the characters themselves for the error message if
                      * ASCII printables; otherwise some visible representation
@@ -3017,6 +3017,7 @@ S_scan_const(pTHX_ char *start)
 			 "Invalid range \"%c-%c\" in transliteration operator",
 			 (char)range_min, (char)range_max);
                     }
+#ifdef EBCDIC
                     else if (convert_unicode) {
                         /* diag_listed_as: Invalid range "%s" in transliteration operator */
                         Perl_croak(aTHX_
@@ -3024,6 +3025,7 @@ S_scan_const(pTHX_ char *start)
                                " in transliteration operator",
 			       range_min, range_max);
                     }
+#endif
                     else {
                         /* diag_listed_as: Invalid range "%s" in transliteration operator */
                         Perl_croak(aTHX_
@@ -3105,9 +3107,8 @@ S_scan_const(pTHX_ char *start)
                  * (min, max, and the hyphen) */
                 d = save_offset + SvGROW(sv, SvLEN(sv) + grow - 3);
 
-                /* Here, we expand out the range.  On ASCII platforms, the
-                 * compiler should optimize out the 'convert_unicode==TRUE'
-                 * portion of this */
+#ifdef EBCDIC
+                /* Here, we expand out the range. */
                 if (convert_unicode) {
                     IV i;
 
@@ -3126,7 +3127,10 @@ S_scan_const(pTHX_ char *start)
                         }
 		    }
 		}
-                else {
+                else
+#endif
+                /* Always gets run for ASCII, and sometimes for EBCDIC. */
+                {
                     IV i;
 
                     /* Here, no conversions are necessary, which means that the
@@ -3146,8 +3150,8 @@ S_scan_const(pTHX_ char *start)
 		    }
 		}
 
-                /* (Compilers should optimize this out for non-EBCDIC).  If the
-                 * original range extended above 255, add in that portion */
+#ifdef EBCDIC
+                /* If the original range extended above 255, add in that portion. */
                 if (real_range_max) {
                     *d++ = (char) UTF8_TWO_BYTE_HI(0x100);
                     *d++ = (char) UTF8_TWO_BYTE_LO(0x100);
@@ -3156,6 +3160,7 @@ S_scan_const(pTHX_ char *start)
                     if (real_range_max > 0x100)
                         d = (char*)uvchr_to_utf8((U8*)d, real_range_max);
                 }
+#endif
 
               range_done:
 		/* mark the range as done, and continue */

@p5pRT
Copy link
Author

p5pRT commented Oct 31, 2016

From @tonycoz

On Mon Oct 24 21​:45​:04 2016, petdance wrote​:

This patch simplifies two bits of code that I came across while
working on supporting the clang -Weverything flag.

Thanks, applied as 11327fa.

Tony

@p5pRT
Copy link
Author

p5pRT commented Oct 31, 2016

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Oct 31, 2016

@tonycoz - Status changed from 'open' to 'pending release'

@p5pRT
Copy link
Author

p5pRT commented Nov 26, 2016

From @khwilliamson

On 10/24/2016 10​:45 PM, Andy Lester (via RT) wrote​:

# New Ticket Created by Andy Lester
# Please include the string​: [perl #129961]
# in the subject line of all future correspondence about this issue.
# <URL​: https://rt-archive.perl.org/perl5/Ticket/Display.html?id=129961 >

This is a bug report for perl from andy@​petdance.com,
generated with the help of perlbug 1.40 running under perl 5.24.0.

-----------------------------------------------------------------
[Please describe your issue here]

This patch simplifies two bits of code that I came across while
working on supporting the clang -Weverything flag.

The first, in Perl_validate_proto, removes unnecessary variable
initialization if proto of NULL is passed.

The second, in S_scan_const, rearranges some code and #ifdefs so that
the convert_unicode and real_range_max variables are only declared
if EBCDIC is set. This lets us no longer have to unnecessarily set
useless variables to make the compiler happy, and it saves us from some
unnecessary checks on "if (convert_unicode)". One of the comments says
"(Compilers should optimize this out for non-EBCDIC)", but now the
compiler won't even see these unnecessary variables or tests.

Given that this code was generating compiler warnings was the reason to
fix it.

But I do want to clarify that he reason I wrote it the way it was, was
to cut down on the number of #ifdef EBCDICs there are. Each added
#ifdef is an added human maintenance cost. It makes the code harder to
read, as it disrupts the flow of what is going on. It also means that
some code won't even be compiled except under the right conditions, and
that skipped code may not even be syntactically valid, which you won't
find out until and unless there is some smoker (or worse, something from
the field that we don't have the resources to reproduce) that has the
correct configurations to trigger it. When I first started the modern
EBCDIC port, there were a bunch of syntax errors to work through from
code that had been modified, but didn't get compiled.

By coding it the way it was, I made it easier to read, and hence
maintain, and I knew it would compile on both types of machines.

This diff may be easier to understand if it's viewed as two side-by-side
files, such as with vimdiff.

[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags​:
category=core
severity=low
Type=Patch
PatchStatus=HasPatch
---
Site configuration information for perl 5.24.0​:

Configured by andy at Sun Jun 5 23​:28​:46 CDT 2016.

Summary of my perl5 (revision 5 version 24 subversion 0) configuration​:

Platform​:
osname=linux, osvers=3.10.0-327.18.2.el7.x86_64, archname=x86_64-linux
uname='linux clifford 3.10.0-327.18.2.el7.x86_64 #1 smp thu may 12 11​:03​:55 utc 2016 x86_64 x86_64 x86_64 gnulinux '
config_args='-de -Dprefix=/home/andy/perl5/perlbrew/perls/perl-5.24.0 -Aeval​:scriptdir=/home/andy/perl5/perlbrew/perls/perl-5.24.0/bin'
hint=recommended, useposix=true, d_sigaction=define
useithreads=undef, usemultiplicity=undef
use64bitint=define, use64bitall=define, uselongdouble=undef
usemymalloc=n, bincompat5005=undef
Compiler​:
cc='cc', ccflags ='-fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2',
optimize='-O2',
cppflags='-fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include'
ccversion='', gccversion='4.8.5 20150623 (Red Hat 4.8.5-4)', gccosandvers=''
intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678, doublekind=3
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16, longdblkind=3
ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
alignbytes=8, prototype=define
Linker and Libraries​:
ld='cc', ldflags =' -fstack-protector-strong -L/usr/local/lib'
libpth=/usr/local/lib /usr/lib /lib/../lib64 /usr/lib/../lib64 /lib /lib64 /usr/lib64 /usr/local/lib64
libs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc
perllibs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc
libc=libc-2.17.so, so=so, useshrplib=false, libperl=libperl.a
gnulibc_version='2.17'
Dynamic Linking​:
dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
cccdlflags='-fPIC', lddlflags='-shared -O2 -L/usr/local/lib -fstack-protector-strong'

Locally applied patches​:
Devel​::PatchPerl 1.38

---
@​INC for perl 5.24.0​:
/home/andy/perl5/perlbrew/perls/perl-5.24.0/lib/site_perl/5.24.0/x86_64-linux
/home/andy/perl5/perlbrew/perls/perl-5.24.0/lib/site_perl/5.24.0
/home/andy/perl5/perlbrew/perls/perl-5.24.0/lib/5.24.0/x86_64-linux
/home/andy/perl5/perlbrew/perls/perl-5.24.0/lib/5.24.0
.

---
Environment for perl 5.24.0​:
HOME=/home/andy
LANG=en_US.UTF-8
LANGUAGE (unset)
LD_LIBRARY_PATH (unset)
LOGDIR (unset)
PATH=/home/andy/perl5/perlbrew/bin​:/home/andy/perl5/perlbrew/perls/perl-5.24.0/bin​:/home/andy/bin​:/usr/local/sbin​:/usr/local/bin​:/usr/sbin​:/sbin​:/usr/local/bin​:/usr/bin​:/usr/local/sbin​:/usr/sbin
PERLBREW_BASHRC_VERSION=0.75
PERLBREW_HOME=/home/andy/.perlbrew
PERLBREW_MANPATH=/home/andy/perl5/perlbrew/perls/perl-5.24.0/man
PERLBREW_PATH=/home/andy/perl5/perlbrew/bin​:/home/andy/perl5/perlbrew/perls/perl-5.24.0/bin
PERLBREW_PERL=perl-5.24.0
PERLBREW_ROOT=/home/andy/perl5/perlbrew
PERLBREW_VERSION=0.75
PERLCRITIC=/home/andy/tw/Dev/perlcriticrc
PERL_BADLANG (unset)
SHELL=/bin/bash

@p5pRT
Copy link
Author

p5pRT commented Nov 27, 2016

From @petdance

By coding it the way it was, I made it easier to read, and hence
maintain, and I knew it would compile on both types of machines

That was my intent, too, to make it easier to follow for the programmer by encapsulating the EBCDIC-only variables within the ifdef blocks.

@p5pRT
Copy link
Author

p5pRT commented Nov 27, 2016

From @arc

Andy Lester via RT <perlbug-followup@​perl.org> wrote​:

By coding it the way it was, I made it easier to read, and hence
maintain, and I knew it would compile on both types of machines

That was my intent, too, to make it easier to follow for the programmer by encapsulating the EBCDIC-only variables within the ifdef blocks.

(Apologies for the delay in this response; I meant to follow up on
this ticket some weeks ago, but doing so apparently slipped my mind.)

My own view is that the net consequence of this change is to make it
harder rather than easier to follow the logic in question. My biggest
concerns relate to code structured like this​:

#ifdef EBCDIC
  if (convert_unicode) {
  …
  }
  else
#endif
  { … }

This sort of code interleaves conditionals written in two very
different languages. Correctly construing the meaning of this code for
all paths requires a mental dance that effectively reimplements the
various translation phases of a C compiler.

I'm also aware that we have a lot of code with the sort of structure
in question (and, indeed, some of it is even harder to follow than
this example), but I'd strongly prefer simplifying what we already
have, rather than letting these structures proliferate. I think the
best course of action would be to revert
11327fa (though I regret saying so,
because I know that you submitted this patch in a good-faith attempt
to improve Perl).

--
Aaron Crane ** http​://aaroncrane.co.uk/

@p5pRT
Copy link
Author

p5pRT commented Nov 28, 2016

From @khwilliamson

On 11/27/2016 11​:41 AM, Aaron Crane wrote​:

Andy Lester via RT <perlbug-followup@​perl.org> wrote​:

By coding it the way it was, I made it easier to read, and hence
maintain, and I knew it would compile on both types of machines

That was my intent, too, to make it easier to follow for the programmer by encapsulating the EBCDIC-only variables within the ifdef blocks.

(Apologies for the delay in this response; I meant to follow up on
this ticket some weeks ago, but doing so apparently slipped my mind.)

My own view is that the net consequence of this change is to make it
harder rather than easier to follow the logic in question. My biggest
concerns relate to code structured like this​:

#ifdef EBCDIC
if (convert_unicode) {

}
else
#endif
{ … }

This sort of code interleaves conditionals written in two very
different languages. Correctly construing the meaning of this code for
all paths requires a mental dance that effectively reimplements the
various translation phases of a C compiler.

I'm also aware that we have a lot of code with the sort of structure
in question (and, indeed, some of it is even harder to follow than
this example), but I'd strongly prefer simplifying what we already
have, rather than letting these structures proliferate. I think the
best course of action would be to revert
11327fa (though I regret saying so,
because I know that you submitted this patch in a good-faith attempt
to improve Perl).

But, my understanding is that this patch silences compiler warnings that
are not currently enabled, but that we are working to clean up the core
so that they can be enabled, and if so, I think that trumps readability
in this case. So the patch should remain, if my understanding is
correct. And it's clear that what is the best readability is debatable,
even though my taste agrees with Aaron in this case.

I've spent quite a bit of effort to try to eliminate EBCDIC vs ASCII
separate compilation paths, confining them to almost entirely utf8.h and
utf8.c. The main exceptions are because in EBCDIC, the range [A-Z]
includes more than 26 characters, and we have special handling to try to
make that invisible to the Perl application. There are two affected
files​: toke.c because we can say tr/A-Z/foo/, and regcomp.c because of
qr/[A-Z]/. This code is in one of those areas.

@p5pRT
Copy link
Author

p5pRT commented Nov 28, 2016

From @petdance

But, my understanding is that this patch silences compiler warnings that are not currently enabled, but that we are working to clean up the core so that they can be enabled,

Yes, that is correct. It was uncovered on my branch to enable clang's -Weverything.

@p5pRT
Copy link
Author

p5pRT commented May 30, 2017

From @khwilliamson

Thank you for filing this report. You have helped make Perl better.

With the release today of Perl 5.26.0, this and 210 other issues have been
resolved.

Perl 5.26.0 may be downloaded via​:
https://metacpan.org/release/XSAWYERX/perl-5.26.0

If you find that the problem persists, feel free to reopen this ticket.

@p5pRT
Copy link
Author

p5pRT commented May 30, 2017

@khwilliamson - Status changed from 'pending release' to 'resolved'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant