Refactor toke.c into smaller, more maintainable parts. #15550

Closed
p5pRT opened this issue Aug 24, 2016 · 8 comments

Comments

@p5pRT

p5pRT commented Aug 24, 2016

Migrated from rt.perl.org#129070 (status was 'rejected')

Searchable as RT129070$

@p5pRT
Author

p5pRT commented Aug 24, 2016

From @DemiMarie

Created by @DemiMarie

toke.c is 11850 lines long. This makes it unwieldy and difficult to
maintain. It should be split into separate files that each contain
a part of the lexer.

Perl Info

Flags:
    category=core
    severity=wishlist

This perlbug was built using Perl 5.22.2 - Wed Aug  3 13:42:00 UTC 2016
It is being executed now by  Perl 5.22.2 - Wed Aug  3 13:38:01 UTC 2016.

Site configuration information for perl 5.22.2:

Configured by Red Hat, Inc. at Wed Aug  3 13:38:01 UTC 2016.

Summary of my perl5 (revision 5 version 22 subversion 2) configuration:
   
  Platform:
    osname=linux, osvers=4.6.3-300.fc24.x86_64, archname=x86_64-linux-thread-multi
    uname='linux buildvm-05.phx2.fedoraproject.org 4.6.3-300.fc24.x86_64 #1 smp fri jun 24 20:52:41 utc 2016 x86_64 x86_64 x86_64 gnulinux '
    config_args='-des -Doptimize=none -Dccflags=-O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches  -m64 -mtune=generic -Dldflags=-Wl,-z,relro  -Dccdlflags=-Wl,--enable-new-dtags -Wl,-z,relro  -Dlddlflags=-shared -Wl,-z,relro  -Dshrpdir=/usr/lib64 -DDEBUGGING=-g -Dversion=5.22.2 -Dmyhostname=localhost -Dperladmin=root@localhost -Dcc=gcc -Dcf_by=Red Hat, Inc. -Dprefix=/usr -Dvendorprefix=/usr -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl5 -Dsitearch=/usr/local/lib64/perl5 -Dprivlib=/usr/share/perl5 -Dvendorlib=/usr/share/perl5/vendor_perl -Darchlib=/usr/lib64/perl5 -Dvendorarch=/usr/lib64/perl5/vendor_perl -Darchname=x86_64-linux-thread-multi -Dlibpth=/usr/local/lib64 /lib64 /usr/lib64 -Duseshrplib -Dusethreads -Duseithreads -Dusedtrace=/usr/bin/dtrace -Duselargefiles -Dd_semctl_semun -Di_db -Ui_ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Duseperlio -Dinstallusrbinperl=n -Ubincompat5005 -Uversiononly -Dpager=/usr/bin/less -isr -Dd_gethostent_r_proto -Ud_endhostent_r_proto -Ud_sethostent_r_proto -Ud_endprotoent_r_proto -Ud_setprotoent_r_proto -Ud_endservent_r_proto -Ud_setservent_r_proto -Dscriptdir=/usr/bin -Dusesitecustomize'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -fwrapv -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='  -g',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -fwrapv -fno-strict-aliasing -I/usr/local/include'
    ccversion='', gccversion='6.1.1 20160621 (Red Hat 6.1.1-3)', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678, doublekind=3
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16, longdblkind=3
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='gcc', ldflags ='-Wl,-z,relro  -fstack-protector-strong -L/usr/local/lib'
    libpth=/usr/local/lib64 /lib64 /usr/lib64 /usr/local/lib /usr/lib /lib/../lib64 /usr/lib/../lib64 /lib
    libs=-lpthread -lresolv -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc -lgdbm_compat
    perllibs=-lpthread -lresolv -lnsl -ldl -lm -lcrypt -lutil -lc
    libc=libc-2.23.so, so=so, useshrplib=true, libperl=libperl.so
    gnulibc_version='2.23'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,--enable-new-dtags -Wl,-z,relro '
    cccdlflags='-fPIC', lddlflags='-shared -Wl,-z,relro  -L/usr/local/lib -fstack-protector-strong'

Locally applied patches:
    Fedora Patch1: Removes date check, Fedora/RHEL specific
    Fedora Patch3: support for libdir64
    Fedora Patch4: use libresolv instead of libbind
    Fedora Patch5: USE_MM_LD_RUN_PATH
    Fedora Patch6: Skip hostname tests, due to builders not being network capable
    Fedora Patch7: Dont run one io test due to random builder failures
    Fedora Patch15: Define SONAME for libperl.so
    Fedora Patch16: Install libperl.so to -Dshrpdir value
    Fedora Patch22: Document Math::BigInt::CalcEmu requires Math::BigInt (CPAN RT#85015)
    Fedora Patch26: Make *DBM_File desctructors thread-safe (RT#61912)
    Fedora Patch27: Make PadlistNAMES() lvalue again (CPAN RT#101063)
    Fedora Patch28: Make magic vtable writable as a work-around for Coro (CPAN RT#101063)
    Fedora Patch29: Fix duplicating PerlIO::encoding when spawning threads (RT#31923)
    Fedora Patch30: Do not let XSLoader load relative paths (CVE-2016-6185)
    Fedora Patch31: Avoid loading optional modules from default . (CVE-2016-1238)
    Fedora Patch200: Link XS modules to libperl.so with EU::CBuilder on Linux
    Fedora Patch201: Link XS modules to libperl.so with EU::MM on Linux


@INC for perl 5.22.2:
    /usr/local/lib64/perl5
    /usr/local/share/perl5
    /usr/lib64/perl5/vendor_perl
    /usr/share/perl5/vendor_perl
    /usr/lib64/perl5
    /usr/share/perl5


Environment for perl 5.22.2:
    HOME=/home/dobenour
    LANG=en_US.UTF-8
    LANGUAGE (unset)
    LC_CTYPE=en_US.UTF-8
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/opt/rust/bin:/usr/lib64/qt-3.3/bin:/usr/lib64/ccache:/home/dobenour/.cargo/bin:/home/dobenour/.cargo/bin:/home/dobenour/.cabal.bin:/home/dobenour/.local/bin:/opt/rust/bin:/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/home/dobenour/.gem/ruby/gems/octopress-3.0.11/bin:/home/dobenour/bin
    PERL_BADLANG (unset)
    SHELL=/bin/zsh

@p5pRT
Author

p5pRT commented Aug 24, 2016

From @cpansprout

On Wed Aug 24 14:07:29 2016, demiobenour@gmail.com wrote:

This is a bug report for perl from demiobenour@gmail.com,
generated with the help of perlbug 1.40 running under perl 5.22.2.

-----------------------------------------------------------------
[Please describe your issue here]
toke.c is 11850 lines long. This makes it unwieldy and difficult to
maintain.

Difficult for whom? :-)

It should be split into separate files that each contain
a part of the lexer.

That would make it harder for me to find things.

--

Father Chrysostomos

@p5pRT
Author

p5pRT commented Aug 24, 2016

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Author

p5pRT commented Aug 27, 2016

From @DemiMarie

On Wed, 2016-08-24 at 16:57 -0700, Father Chrysostomos via RT wrote:

On Wed Aug 24 14:07:29 2016, demiobenour@gmail.com wrote:

This is a bug report for perl from demiobenour@gmail.com,
generated with the help of perlbug 1.40 running under perl 5.22.2.

-----------------------------------------------------------------
[Please describe your issue here]
toke.c is 11850 lines long. This makes it unwieldy and difficult to
maintain.

Difficult for whom? :-)

It should be split into separate files that each contain
a part of the lexer.

That would make it harder for me to find things.

What about using a lexer generator? toke.c is larger than the entire
Flex source tree, so one could probably write a custom lexer generator
in Perl that is designed to handle the features that Perl needs (here
documents, interpolated strings, division vs regex ambiguity, etc).

Flex is written in C, so a lexer generator written in Perl would
probably be less code.
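
To make the division vs regex ambiguity concrete, here is a minimal sketch in Python of a lexer that decides what `/` means from the previous token. Everything in it is an assumption made for illustration: the token patterns, the `tokenize` function, and the `prev_is_operand` flag describe a toy expression language, not Perl's grammar and not anything taken from toke.c.

```python
import re

# Hypothetical token patterns for a tiny expression language (not Perl's real grammar).
NUMBER = re.compile(r"\d+")
IDENT = re.compile(r"[A-Za-z_]\w*")
REGEX = re.compile(r"/(?:\\.|[^/\\])*/")
OPERATOR = re.compile(r"=~|[-+*/=()]")


def tokenize(src):
    """Return (kind, text) pairs, deciding what '/' means from the previous token."""
    tokens = []
    pos = 0
    prev_is_operand = False  # True after a number, identifier, regex, or ')'
    while pos < len(src):
        if src[pos].isspace():
            pos += 1
            continue
        if src[pos] == "/" and not prev_is_operand:
            # An operand is expected here, so '/' starts a regex literal.
            m = REGEX.match(src, pos)
            if m is None:
                raise SyntaxError(f"unterminated regex at offset {pos}")
            kind = "REGEX"
        else:
            for kind, pattern in (("NUMBER", NUMBER), ("IDENT", IDENT), ("OP", OPERATOR)):
                m = pattern.match(src, pos)
                if m:
                    break
            else:
                raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append((kind, m.group()))
        prev_is_operand = kind in ("NUMBER", "IDENT", "REGEX") or m.group() == ")"
        pos = m.end()
    return tokens
```

With this sketch, `tokenize("x / 2")` treats the slash as an operator, while `tokenize("x =~ /a/")` treats it as the start of a regex. The rule looks only at the preceding token, so it does not model ambiguities that need semantic information.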

@p5pRT
Author

p5pRT commented Aug 27, 2016

From zefram@fysh.org

Demi Obenour wrote:

What about using a lexer generator?

No way. toke.c does a lot more than one could get out of flex.

-zefram

@p5pRT
Author

p5pRT commented Aug 29, 2016

From @DemiMarie

On Sat, Aug 27, 2016 at 01:45:38PM -0700, Zefram via RT wrote:

Demi Obenour wrote:

What about using a lexer generator?

No way. toke.c does a lot more than one could get out of flex.

-zefram

Like what? Serious question.

I know that there is the division vs regex ambiguity, but are there
others?

@p5pRT
Author

p5pRT commented Aug 29, 2016

From zefram@fysh.org

Demi Obenour wrote:

I know that there is the division vs regex ambiguity, but are there
others?

Block vs hash constructor. Indirect object vs sub call. Bareword vs
sub call. Some of these are resolved by semantic information rather
than by looking ahead a few characters, so the custom logic would have
to remain. There are also a bunch of flags set by the tokeniser that
the parser looks at: the interface between them isn't at all clean,
and all that flag stuff would have to remain. There's also a lot of
purely semantic stuff in toke.c that falls entirely outside the scope
of a lexer generator.

-zefram
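
The "bareword vs sub call" case above can be sketched roughly as follows, in Python. This is only an illustration of a token kind depending on semantic state rather than on nearby characters: the line-based input format and the names `classify_word`, `tokenize_words`, and `known_subs` are hypothetical and do not reflect how toke.c is actually structured.

```python
def classify_word(word, known_subs):
    """Classify an identifier using semantic state, not just the characters seen."""
    return ("SUB_CALL", word) if word in known_subs else ("BAREWORD", word)


def tokenize_words(lines):
    """Tokenize 'sub NAME' declarations and lone words, one per line (toy input format)."""
    known_subs = set()  # grows as declarations are seen, like a symbol table
    out = []
    for line in lines:
        parts = line.split()
        if len(parts) == 2 and parts[0] == "sub":
            known_subs.add(parts[1])
            out.append(("SUB_DECL", parts[1]))
        elif len(parts) == 1:
            out.append(classify_word(parts[0], known_subs))
        else:
            raise SyntaxError(f"unsupported line: {line!r}")
    return out
```

Here the same word `foo` is classified differently before and after a `sub foo` line has been seen. A lexer generator driven purely by regular expressions has no place for that kind of state, so the lookup would still have to live in hand-written code.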

@p5pRT p5pRT closed this as completed Mar 21, 2017
@p5pRT
Author

p5pRT commented Mar 21, 2017

@iabyn - Status changed from 'open' to 'rejected'
