a built-in startswith($string, $prefix) function #16200

Open
p5pRT opened this issue Oct 14, 2017 · 6 comments

@p5pRT

p5pRT commented Oct 14, 2017

Migrated from rt.perl.org#132301 (status was 'open')

Searchable as RT132301$

@p5pRT

p5pRT commented Oct 14, 2017

From @mauke

Created by @mauke

Currently Perl has no good way to check whether a string starts with another
string. Current alternatives:

- index($string, $prefix) == 0

  Inefficient because it has to search the whole (potentially large) $string,
  just to see whether the offset was 0.

- $string =~ /^\Q$prefix/

  Inefficient because it has to pass the whole $prefix through quotemeta, then
  compile it to a regex, then do a regex check.

- substr($string, 0, length $prefix) eq $prefix

  I don't know if this actually makes a copy of the beginning of $string, but
  it's awkward: You have to mention $prefix twice, which is annoying if it's a
  more complex expression.
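
To make the comparison concrete, here is a minimal sketch that wraps each of the three idioms above in a sub and runs them on the same inputs. The sub names are hypothetical and exist only for this illustration; nothing here is a proposed core API.

```perl
use strict;
use warnings;

# Hypothetical helper names; each one wraps a single idiom from the list above.
sub prefix_via_index  { my ($s, $p) = @_; return index($s, $p) == 0 }
sub prefix_via_regex  { my ($s, $p) = @_; return $s =~ /^\Q$p/ }
sub prefix_via_substr { my ($s, $p) = @_; return substr($s, 0, length $p) eq $p }

# Each pair is [string, candidate prefix].
for my $case (['foobar', 'foo'], ['foobar', 'bar'], ['a.b', 'a.'], ['ab', 'abc']) {
    my ($s, $p) = @$case;
    printf "%-8s %-4s index=%d regex=%d substr=%d\n", $s, $p,
        (prefix_via_index($s, $p)  ? 1 : 0),
        (prefix_via_regex($s, $p)  ? 1 : 0),
        (prefix_via_substr($s, $p) ? 1 : 0);
}
```

All three agree on every input; they differ only in how much work they do and in how often `$prefix` has to be written out.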

I want to be able to express my intent (check prefixes) directly. As a bonus,
it can easily be implemented to be more efficient than string search / regex
stuff.

That's why I think it would be nice to have

  startswith($string, $prefix)

in core (and maybe endswith($x, $y)).

Perl Info

Flags:
    category=core
    severity=wishlist

Site configuration information for perl 5.26.0:

Configured by mauke at Fri Sep 22 13:28:36 CEST 2017.

Summary of my perl5 (revision 5 version 26 subversion 0) configuration:
   
  Platform:
    osname=linux
    osvers=4.9.41-1-lts
    archname=i686-linux
    uname='linux simplicio 4.9.41-1-lts #1 smp mon aug 7 17:57:02 cest 2017 i686 gnulinux '
    config_args=''
    hint=recommended
    useposix=true
    d_sigaction=define
    useithreads=undef
    usemultiplicity=undef
    use64bitint=undef
    use64bitall=undef
    uselongdouble=undef
    usemymalloc=n
    default_inc_excludes_dot=define
    bincompat5005=undef
  Compiler:
    cc='cc'
    ccflags ='-fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'
    optimize='-O2 -march=native'
    cppflags='-fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include'
    ccversion=''
    gccversion='7.2.0'
    gccosandvers=''
    intsize=4
    longsize=4
    ptrsize=4
    doublesize=8
    byteorder=1234
    doublekind=3
    d_longlong=define
    longlongsize=8
    d_longdbl=define
    longdblsize=12
    longdblkind=3
    ivtype='long'
    ivsize=4
    nvtype='double'
    nvsize=8
    Off_t='off_t'
    lseeksize=8
    alignbytes=4
    prototype=define
  Linker and Libraries:
    ld='cc'
    ldflags ='-fstack-protector-strong -L/usr/local/lib'
    libpth=/usr/local/lib /usr/lib/gcc/i686-pc-linux-gnu/7.2.0/include-fixed /usr/lib /lib
    libs=-lpthread -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc -lgdbm_compat
    perllibs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc
    libc=libc-2.26.so
    so=so
    useshrplib=false
    libperl=libperl.a
    gnulibc_version='2.26'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs
    dlext=so
    d_dlsymun=undef
    ccdlflags='-Wl,-E'
    cccdlflags='-fPIC'
    lddlflags='-shared -O2 -march=native -L/usr/local/lib -fstack-protector-strong'



@INC for perl 5.26.0:
    /home/mauke/usr/lib/perl5/site_perl/5.26.0/i686-linux
    /home/mauke/usr/lib/perl5/site_perl/5.26.0
    /home/mauke/usr/lib/perl5/5.26.0/i686-linux
    /home/mauke/usr/lib/perl5/5.26.0


Environment for perl 5.26.0:
    HOME=/home/mauke
    LANG=en_US.UTF-8
    LANGUAGE=en_US
    LC_COLLATE=C
    LC_MONETARY=de_DE.UTF-8
    LC_TIME=de_DE.UTF-8
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/mauke/perl5/perlbrew/bin:/home/mauke/bin:/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl
    PERLBREW_BASHRC_VERSION=0.73
    PERLBREW_HOME=/home/mauke/.perlbrew
    PERLBREW_ROOT=/home/mauke/perl5/perlbrew
    PERL_BADLANG (unset)
    PERL_UNICODE=SAL
    SHELL=/bin/bash

@p5pRT

p5pRT commented Oct 14, 2017

From @demerphq

On 14 October 2017 at 16:18, l.mai@web.de <perlbug-followup@perl.org> wrote:

> # New Ticket Created by l.mai@web.de
> # Please include the string: [perl #132301]
> # in the subject line of all future correspondence about this issue.
> # <URL: https://rt-archive.perl.org/perl5/Ticket/Display.html?id=132301 >
>
> This is a bug report for perl from l.mai@web.de,
> generated with the help of perlbug 1.40 running under perl 5.26.0.
>
> -----------------------------------------------------------------
> [Please describe your issue here]
>
> Currently Perl has no good way to check whether a string starts with another
> string. Current alternatives:
>
> - index($string, $prefix) == 0
>
>   Inefficient because it has to search the whole (potentially large) $string,
>   just to see whether the offset was 0.
>
> - $string =~ /^\Q$prefix/
>
>   Inefficient because it has to pass the whole $prefix through quotemeta, then
>   compile it to a regex, then do a regex check.

Interesting observation. With Dave M's patches to how we construct a
regex this need not be the case.

In theory there is no reason that we could not optimize this kind of
case so that it does not do the quotemeta(), and instead just performs
like an anchored index call.
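
For readers less familiar with `\Q`, the following sketch shows why the quoting step is there in the first place, and that it is equivalent to calling `quotemeta()` on the prefix before interpolating it. The variable names are made up for the illustration.

```perl
use strict;
use warnings;

my $prefix = 'a.b';    # contains a regex metacharacter

# Without \Q the dot matches any character, so this is a false positive.
my $unquoted = ('axb-rest' =~ /^$prefix/)   ? 1 : 0;

# With \Q the prefix is matched literally, so 'axb-rest' no longer matches.
my $quoted   = ('axb-rest' =~ /^\Q$prefix/) ? 1 : 0;

# Escaping by hand with quotemeta() gives the same literal behaviour.
my $escaped  = quotemeta $prefix;           # 'a\.b'
my $manual   = ('a.b-rest' =~ /^$escaped/)  ? 1 : 0;

print "unquoted=$unquoted quoted=$quoted manual=$manual\n";
# prints "unquoted=1 quoted=0 manual=1"
```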

> - substr($string, 0, length $prefix) eq $prefix
>
>   I don't know if this actually makes a copy of the beginning of $string, but
>   it's awkward: You have to mention $prefix twice, which is annoying if it's a
>   more complex expression.

I don't think it does in the simple case.

> I want to be able to express my intent (check prefixes) directly. As a bonus,
> it can easily be implemented to be more efficient than string search / regex
> stuff.
>
> That's why I think it would be nice to have
>
>   startswith($string, $prefix)
>
> in core (and maybe endswith($x, $y)).

At $work this is something we would put in an XS module, alongside
things like ltrim() and rtrim() and equivalents. I am not sure it
needs to be in core.

On the other hand I think you have a point about / ... \Q$str\E ... /
style cases and the general issue that for various common and simple
cases we do not properly optimize the regex engine.

It would be nice if we did not have to either actually escape the
string, and it would probably be really nice if we didn't have to
concat it into the pattern at all, but could instead create some kind
of special EXACT like regnode that could reuse the PV (via SvPV COW
mechanisms maybe).

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT

p5pRT commented Oct 14, 2017

The RT System itself - Status changed from 'new' to 'open'

@p5pRT

p5pRT commented Oct 14, 2017

From @cpansprout

On Sat, 14 Oct 2017 11:11:57 -0700, demerphq wrote:

> On 14 October 2017 at 16:18, l.mai@web.de <perlbug-followup@perl.org> wrote:
>
> > # New Ticket Created by l.mai@web.de
> > # Please include the string: [perl #132301]
> > # in the subject line of all future correspondence about this issue.
> > # <URL: https://rt-archive.perl.org/perl5/Ticket/Display.html?id=132301 >
> >
> > This is a bug report for perl from l.mai@web.de,
> > generated with the help of perlbug 1.40 running under perl 5.26.0.
> >
> > -----------------------------------------------------------------
> > [Please describe your issue here]
> >
> > Currently Perl has no good way to check whether a string starts with another
> > string. Current alternatives:
> >
> > - index($string, $prefix) == 0
> >
> >   Inefficient because it has to search the whole (potentially large) $string,
> >   just to see whether the offset was 0.
> >
> > - $string =~ /^\Q$prefix/
> >
> >   Inefficient because it has to pass the whole $prefix through quotemeta, then
> >   compile it to a regex, then do a regex check.
>
> Interesting observation. With Dave M's patches to how we construct a
> regex this need not be the case.
>
> In theory there is no reason that we could not optimize this kind of
> case so that it does not do the quotemeta(), and instead just performs
> like an anchored index call.

Similarly, we could optimise the index()== comparison. (Didn’t Dave Mitchell do something similar to that? I don’t remember what it was.)

> > - substr($string, 0, length $prefix) eq $prefix
> >
> >   I don't know if this actually makes a copy of the beginning of $string, but
> >   it's awkward: You have to mention $prefix twice, which is annoying if it's a
> >   more complex expression.
>
> I don't think it does in the simple case.

It does, unless the code changed when I wasn’t looking.

> > I want to be able to express my intent (check prefixes) directly. As a bonus,
> > it can easily be implemented to be more efficient than string search / regex
> > stuff.
> >
> > That's why I think it would be nice to have
> >
> >   startswith($string, $prefix)
> >
> > in core (and maybe endswith($x, $y)).
>
> At $work this is something we would put in an XS module, alongside
> things like ltrim() and rtrim() and equivalents. I am not sure it
> needs to be in core.
>
> On the other hand I think you have a point about / ... \Q$str\E ... /
> style cases and the general issue that for various common and simple
> cases we do not properly optimize the regex engine.
>
> It would be nice if we did not have to either actually escape the
> string, and it would probably be really nice if we didn't have to
> concat it into the pattern at all, but could instead create some kind
> of special EXACT like regnode that could reuse the PV (via SvPV COW
> mechanisms maybe).

++

--

Father Chrysostomos

@p5pRT

p5pRT commented Oct 14, 2017

From @cpansprout

On Sat, 14 Oct 2017 12:13:32 -0700, sprout wrote:

> Similarly, we could optimise the index()== comparison. (Didn’t Dave
> Mitchell do something similar to that? I don’t remember what it was.)

See commit 400ffcf. It was:

index(...) == -1
index(...) >= 0
index(...) < 0
index(...) <= -1

that got optimised. How hard would it be to extend this to ==0 as well?
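
As an illustration of the difference, the sketch below uses the listed comparison forms next to the `== 0` form. The listed forms only ask whether the substring occurs anywhere, while `== 0` asks whether it occurs at the start. The variable names are made up, and the sketch shows behaviour only; it makes no claim about what the optimiser does with any of these expressions.

```perl
use strict;
use warnings;

my ($haystack, $needle) = ('startswith', 'with');

# Forms from the list above: the position is discarded, only found/not found matters.
my $absent  = (index($haystack, $needle) == -1) ? 1 : 0;
my $present = (index($haystack, $needle) >= 0)  ? 1 : 0;

# The == 0 form: the position matters, which is what a prefix check needs.
my $with_at_start   = (index($haystack, $needle)  == 0) ? 1 : 0;
my $starts_at_start = (index($haystack, 'starts') == 0) ? 1 : 0;

print "absent=$absent present=$present ",
      "with_at_start=$with_at_start starts_at_start=$starts_at_start\n";
```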

--

Father Chrysostomos

@p5pRT

p5pRT commented Oct 14, 2017

From blgl@stacken.kth.se

Quoth l . mai @ web . de:

> Currently Perl has no good way to check whether a string starts with another
> string. Current alternatives:
>
> - index($string, $prefix) == 0
>
>   Inefficient because it has to search the whole (potentially large) $string,
>   just to see whether the offset was 0.

+ rindex($string, $prefix, 0) == 0

Efficient because the range of potential start positions is 0..0
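
A minimal sketch of this idiom wrapped in a sub follows. The names `starts_with` and `ends_with` are hypothetical, and the suffix version is an assumed mirror image built on `index`; neither is part of the thread or of core Perl.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the rindex idiom above.
sub starts_with {
    my ($string, $prefix) = @_;
    # With a start position of 0, rindex can only report a match at offset 0.
    return rindex($string, $prefix, 0) == 0;
}

# Assumed counterpart for suffixes, using index from the only offset that fits.
sub ends_with {
    my ($string, $suffix) = @_;
    my $offset = length($string) - length($suffix);
    return 0 if $offset < 0;    # suffix longer than string: cannot match
    return index($string, $suffix, $offset) == $offset;
}

for my $case (['foobar', 'foo'], ['foobar', 'bar'], ['ab', 'abc']) {
    my ($s, $t) = @$case;
    printf "%-8s %-4s starts=%d ends=%d\n", $s, $t,
        (starts_with($s, $t) ? 1 : 0),
        (ends_with($s, $t)   ? 1 : 0);
}
```

The explicit length guard in `ends_with` matters: without it, a suffix exactly one character longer than the string would make both sides of the comparison -1 and produce a false positive.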

/Bo Lindbergh
