regexp optimiser fails with -l #2330

Closed
p5pRT opened this issue Aug 7, 2000 · 7 comments

@p5pRT

p5pRT commented Aug 7, 2000

Migrated from rt.perl.org#3654 (status was 'resolved')

Searchable as RT3654$

@p5pRT

p5pRT commented Aug 7, 2000

From @vanstyn

Created by @vanstyn

Prompted by a usenet posting, the output from this command:
  ( echo '<0123456789012345678>' ; echo '<!-- TMPL_VAR NAME=test -->' ) | perl -Dr -nwl0777e 'print /<!-- *TMPL_VAR NAME=([\w]+)? *-->/g'

  includes this:
  anchored `<!--' at 0 floating `TMPL_VAR NAME=' at 4..2147483647 (checking floating) minlen 21
  Omitting $` $& $' support.
 
  EXECUTING...
 
  Guessing start of match, REx `<!-- *TMPL_VAR NAME=([\w]+)? *-->' against `<0123456789012345678>'...
  Did not find floating substr `TMPL_VAR NAME='...
  Match rejected by optimizer
  Guessing start of match, REx `<!-- *TMPL_VAR NAME=([\w]+)? *-->' against `<!-- TMPL_VAR NAME=test -->'...
  Found floating substr `TMPL_VAR NAME=' at offset 5...
  Found anchored substr `<!--' at offset 0...
  Guessed: match at offset 0

.. which I think is a bug. The initial 'Did not find ... Match rejected'
cycle appears more often if there is more leading text, but all disappear
(i.e. the behaviour is apparently correct) if you remove the -l switch
from the command. Occurs with 5.6.0 and bleadperl.

Perl Info

Flags:
    category=core
    severity=low

Site configuration information for perl v5.6.0:

Configured by hv at Mon Aug  7 23:48:54 BST 2000.

Summary of my perl5 (revision 5.0 version 6 subversion 0) configuration:
  Platform:
    osname=linux, osvers=2.2.5-16, archname=i686-linux
    uname='linux crypt.compulink.co.uk 2.2.5-16 #1 sun may 30 23:00:18 bst 1999 i686 unknown '
    config_args='-des -Doptimize=-g -O6 -Dprefix=/opt/bleadperl'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=undef d_sfio=undef uselargefiles=define 
    use64bitint=undef use64bitall=undef uselongdouble=undef usesocks=undef
  Compiler:
    cc='cc', optimize='-g -O6', gccversion=egcs-2.91.66 19990314/Linux (egcs-1.1.2 release), gccosandvers=
    cppflags='-DDEBUGGING -fno-strict-aliasing'
    ccflags ='-DDEBUGGING -fno-strict-aliasing -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'
    stdchar='char', d_stdstdio=define, usevfork=false
    intsize=4, longsize=4, ptrsize=4, doublesize=8
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, usemymalloc=n, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -lndbm -lgdbm -ldb -ldl -lm -lc -lposix -lcrypt -lutil
    libc=/lib/libc-2.1.1.so, so=so, useshrplib=false, libperl=libperl.a
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
    cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:
	devel-6544


@INC for perl v5.6.0:
    lib
    /opt/bleadperl/lib/5.6.0/i686-linux
    /opt/bleadperl/lib/5.6.0
    /opt/bleadperl/lib/site_perl/5.6.0/i686-linux
    /opt/bleadperl/lib/site_perl/5.6.0
    /opt/bleadperl/lib/site_perl
    .


Environment for perl v5.6.0:
    HOME=/home/hv
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/hv/bin:/usr/bin:/usr/local/bin:/bin:/usr/X11R6/bin
    PERL_BADLANG (unset)
    SHELL=/bin/bash


@p5pRT

p5pRT commented Aug 7, 2000

From @vanstyn

In <200008080240.DAA11092@crypt.compulink.co.uk>, Hugo writes:
:Prompted by a usenet posting, the output from this command:
: ( echo '<0123456789012345678>' ; echo '<!-- TMPL_VAR NAME=test -->' ) | perl -Dr -nwl0777e 'print /<!-- *TMPL_VAR NAME=([\w]+)? *-->/g'
:
: includes this:
: anchored `<!--' at 0 floating `TMPL_VAR NAME=' at 4..2147483647 (checking floating) minlen 21
: Omitting $` $& $' support.
:
: EXECUTING...
:
: Guessing start of match, REx `<!-- *TMPL_VAR NAME=([\w]+)? *-->' against `<0123456789012345678>'...
: Did not find floating substr `TMPL_VAR NAME='...
: Match rejected by optimizer
: Guessing start of match, REx `<!-- *TMPL_VAR NAME=([\w]+)? *-->' against `<!-- TMPL_VAR NAME=test -->'...
: Found floating substr `TMPL_VAR NAME=' at offset 5...
: Found anchored substr `<!--' at offset 0...
: Guessed: match at offset 0
:
:.. which I think is a bug. The initial 'Did not find ... Match rejected'
:cycle appears more often if there is more leading text, but all disappear
:(i.e. the behaviour is apparently correct) if you remove the -l switch
:from the command. Occurs with 5.6.0 and bleadperl.

It also disappears if you switch -l and -0777, so I guess this has
something to do with $\. I'll try and look into this tomorrow.

Note also that -l0777 results in $\ apparently being set to \0377.
Surely it should be "\n"?

Hugo

@p5pRT

p5pRT commented Aug 7, 2000

From @tamias

On Tue, Aug 08, 2000 at 03:49:50AM +0100, Hugo wrote:

> It also disappears if you switch -l and -0777, so I guess this has
> something to do with $\. I'll try and look into this tomorrow.
>
> Note also that -l0777 results in $\ apparently being set to \0377.
> Surely it should be "\n"?

`perl -l0777` is -l with the octal number 0777 as its argument.

Ronald

@p5pRT

p5pRT commented Aug 8, 2000

From @vanstyn

In <20000807230047.B541461@linguist.dartmouth.edu>, Ronald J Kimball writes:
:On Tue, Aug 08, 2000 at 03:49:50AM +0100, Hugo wrote:
:
:> It also disappears if you switch -l and -0777, so I guess this has
:> something to do with $\. I'll try and look into this tomorrow.
:>
:> Note also that -l0777 results in $\ apparently being set to \0377.
:> Surely it should be "\n"?
:
:`perl -l0777` is -l with the octal number 0777 as its argument.

Ah, of course. Thanks.

The processing for this (perl.c:moreswitches()) casts the result
to a char. Should it not either warn about an out-of-range
character, or switch to Unicode?

Hugo
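
For reference, the effect of that cast can be sketched in a few lines of Perl. This is only an illustration of the arithmetic, assuming an 8-bit char; `low_byte` is a hypothetical helper and not code taken from perl.c.

```perl
use strict;
use warnings;

# Hypothetical helper (not from perl.c): keep only the low 8 bits,
# which is what a cast to an 8-bit char leaves behind.
sub low_byte {
    my ($value) = @_;
    return $value & 0xFF;
}

my $arg  = oct('0777');      # the octal argument given to -l
my $byte = low_byte($arg);   # what survives the cast

# Show both values in octal.
printf "%o -> %o\n", $arg, $byte;
```

Under that assumption, the argument 0777 ends up as the single byte 0377, which matches the value of $\ reported earlier in the thread.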

@p5pRT

p5pRT commented Aug 9, 2000

From [Unknown Contact. See original ticket]

In <200008081410.PAA12108@crypt.compulink.co.uk>, Hugo wrote:

> :`perl -l0777` is -l with the octal number 0777 as its argument.
>
> Ah, of course. Thanks.
>
> The processing for this (perl.c:moreswitches()) casts the result
> to a char. Should it not either warn about an out-of-range
> character, or switch to Unicode?

perldoc perlrun
  -0[digits]
  specifies the input record separator (`$/') as an octal
  number. If there are no digits, the null character is
  the separator. Other switches may precede or follow
  the digits. For example, if you have a version of find
  which can print filenames terminated by the null
  character, you can say this:

  find . -name '*.orig' -print0 | perl -n0e unlink

  The special value 00 will cause Perl to slurp files in
  paragraph mode. The value 0777 will cause Perl to
  slurp files whole because there is no legal character
  with that value.

If -0777 is redefined to be a Unicode character, then that last sentence
will need a major rework. Some other command line option will be needed to
set $/ to undef. This problem will have to be resolved when perl can handle
Unicode on input.
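
The behaviour described in that perlrun excerpt can be sketched roughly as follows. `irs_for_switch` is a hypothetical name, and the logic models only the cases the quoted text mentions (no digits, 00, 0777, and an ordinary octal character code); it is an assumption-laden illustration, not the code in perl.c.

```perl
use strict;
use warnings;

# Hypothetical model of the quoted perlrun text for -0[digits].
# Takes the switch text after the dash (e.g. "0", "00", "012", "0777")
# and returns the value $/ would get; undef means "slurp whole files".
sub irs_for_switch {
    my ($switch) = @_;
    my $value = oct($switch);
    return "\0"  if $switch eq '0';    # no digits: the null character
    return ''    if $value == 0;       # -00: paragraph mode
    return undef if $value == 0777;    # -0777: no legal character, so slurp
    return chr($value);                # otherwise the character with that code
}

for my $switch ('0', '00', '012', '0777') {
    my $irs = irs_for_switch($switch);
    my $description =
          !defined $irs ? 'undef (slurp whole file)'
        : $irs eq ''    ? 'empty string (paragraph mode)'
        :                 sprintf('chr(%d)', ord $irs);
    printf "-%s => %s\n", $switch, $description;
}
```

The point of the sketch is that 0777 is treated as a sentinel rather than as a character code, which is why redefining it as a Unicode character would change existing behaviour.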

@p5pRT

p5pRT commented Aug 9, 2000

From @ysth

In article <39913C91.1DA58537@inwap.com>, Joe Smith <jms@inwap.com> wrote:

> perldoc perlrun
> -0[digits]
> [snip]
> The special value 00 will cause Perl to slurp files in
> paragraph mode. The value 0777 will cause Perl to
> slurp files whole because there is no legal character
> with that value.
>
> If -0777 is redefined to be a Unicode character, then that last sentence
> will need a major rework. Some other command line option will be needed to
> set $/ to undef. This problem will have to be resolved when perl can handle
> Unicode on input.

Umm, no. -0777 will have to work as it does now for backwards compatibility.
LATIN SMALL LETTER O WITH STROKE AND ACUTE may need some other way to be
represented.

@p5pRT

p5pRT commented Nov 28, 2003

From The RT System itself

Not a bug: -l0777 is different from -l -0777.
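
One way to see why the optimiser output in the original report is expected: -l0777 sets only $\ and leaves $/ at its default of "\n", so the two input lines arrive as two separate records and the pattern is tried against each one, whereas -l -0777 undefines $/ and the whole input arrives as a single record. A small sketch of that difference, using an in-memory filehandle and a hypothetical `read_records` helper rather than the command-line switches themselves:

```perl
use strict;
use warnings;

# Hypothetical helper: read every record from an in-memory copy of the
# text using the given input record separator (undef means slurp).
sub read_records {
    my ($text, $separator) = @_;
    local $/ = $separator;
    open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
    my @records = <$fh>;
    close $fh;
    return @records;
}

# The two lines fed to perl in the original report.
my $input = "<0123456789012345678>\n<!-- TMPL_VAR NAME=test -->\n";

my @by_line = read_records($input, "\n");    # $/ left at its default, as with -l0777
my @slurped = read_records($input, undef);   # $/ undefined, as with -l -0777

printf "default \$/: %d record(s)\n", scalar @by_line;
printf "undef \$/:   %d record(s)\n", scalar @slurped;
```

With the default separator the first record is the line that contains no `TMPL_VAR NAME=`, so a "Match rejected by optimizer" message for that record is consistent with line-by-line reading.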
