New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perl doesn't handle m{}g for >= 2GB files even with largefiles and 64bitall #12807
Comments
From @jhiTo replicate one way is to first generate a big XML file: perl -e 'for (0..2**27){print "<id>$_</id>\n"}' > big.xml This generates a (rather boring) XML of 2439025741 bytes, safely north This works, printing "--0--\n" etc.: dd if=big.xml bs=1048576 count=2047 | perl -wlne 'BEGIN{undef $/};print This doesn't, the only thing output from perl is the length(): dd if=big.xml bs=1048576 count=2048 | perl -wlne 'BEGIN{undef $/};print I originally had m{}gs since I was looking for multiline things but Verified to happen in 5.16.2 (in Ubuntu Precise Pangolin) x86_64, and in useperlio=define,uselargefiles=define,use64bitint=define, use64bitall=define |
From @arcJarkko Hietaniemi <perlbug-followup@perl.org> wrote:
As a note for anyone looking into this further: I believe this can't There may be back-compat issues for the former approach, since The obvious "somewhere else" for the latter approach is presumably -- |
The RT System itself - Status changed from 'new' to 'open' |
From @LeontOn Sun, Mar 3, 2013 at 2:55 PM, Aaron Crane <arc@cpan.org> wrote:
AFAICT changing mg_len shouldn't break API unless someone is taking a
A new buffer in mg_ptr seems more logical (didn't look at the Leon |
From @cpansproutOn Sun Mar 03 06:48:09 2013, LeonT wrote:
Also, mg_length is deprecated in 5.18.0. It may or may not invoke So it’s pretty useless, except as an exercise in masochism.
meetoo -- Father Chrysostomos |
From @cpansproutOn Sun Mar 03 06:48:09 2013, LeonT wrote:
I have just searched CPAN for /&.*mg_len/. The only uses of it are in -- Father Chrysostomos |
From @cpansproutOn Sat Feb 23 15:25:15 2013, jhi wrote:
I think this demonstrates the problem more clearly: $ ./perl -lIlib -e '$x=" "x(2**31+20); pos $x = 2**31-5; for(1..10){ 1 -- Father Chrysostomos |
From @cpansproutOn Sat Jul 20 11:05:51 2013, sprout wrote:
That’s only part of the problem. Another data point: $ ./perl -lIlib -e '$x=" "x(2**31-1); $ ./perl -lIlib -e '$x=" "x(2**31); So it’s not just pos() that is affected. I think this is the same as, or similar to, #103260. -- Father Chrysostomos |
From @jhiOn Saturday-201307-20 22:25, Father Chrysostomos via RT wrote:
Indeed. I haven't looked at the code (that is, in the last ten years or so) but
|
From @cpansproutOn Sun Jul 21 06:57:21 2013, jhi wrote:
And yet you are still ahead in terms of the sheer number of commits.
That is exactly the problem. I have a local patch to fix it at least So it is another of these multifaceted puzzles. -- Father Chrysostomos |
From @jhi
Obsessive-compulsive disorder?
An interesting experiment would be to just mechanically to do a (*) Yes, even including all the external APIs, at least initially.
|
From PeterCMartini@GMail.comOn Sun, Jul 21, 2013 at 3:00 PM, Jarkko Hietaniemi <jhi@iki.fi> wrote:
FWIW, there's already a meta ticket for the I32/IV issue: |
From @cpansproutOn Mon Jun 24 18:22:12 2013, sprout wrote:
One of those uses is Perl_hv_placeholders_p. Added in ca73285, Is this another case where a function was mistakenly added to the API The only use of this function *anywhere* in publicly-viewable code (OK, Only one CPAN module, Data::Alias, uses HvPLACEHOLDERS, and it is like So it looks as though it is safe to change ApoR |I32* |hv_placeholders_p |NN HV *hv to XpoR |SSize_t*|hv_placeholders_p |NN HV *hv Comments? -- Father Chrysostomos |
From @rurbanOn Tue, Jul 23, 2013 at 12:15 AM, Father Chrysostomos via RT
No objection |
From @jkeenanOn 7/23/13 1:15 AM, Father Chrysostomos via RT wrote:
A good candidate for a smoke-me. |
From @cpansproutOn Mon Jul 22 22:15:05 2013, sprout wrote:
I have now fixed setting and reading of pos in commit 6174b39, which The rest of this ticket is as yet unresolved. -- Father Chrysostomos |
From @cpansproutOn Tue Jul 23 16:11:37 2013, jkeen@verizon.net wrote:
I only saw your message after pushing to blead. A smoke-me branch won’t If you meant a CPAN smoke, that wouldn’t help, because only one module, -- Father Chrysostomos |
From @cpansproutOn Sat Jul 20 19:25:52 2013, sprout wrote:
I’ve managed to fix this one locally, but now I’m running into another void How on earth do we change that ‘const I32 paren’ without breaking the Is the regexp API considered experimental (please, please)? Or do we perlreapi says: In order to install a new regexp handler, $^H{regcomp} is set to Does that mean we need $^H{regcomp_wide} = 1? (_wide is not a good -- Father Chrysostomos |
From @LeontOn Thu, Jul 25, 2013 at 3:19 AM, Father Chrysostomos via RT
Ouch! Yeah, that is tricky :-/ Leon |
From @cpansproutOn Wed Jul 24 18:24:17 2013, LeonT wrote:
Wait, I wasn’t thinking. I got an assertion failure from that function, paren can be I32, since no system has enough memory to compile a regular This is where the problem lies: typedef struct regexp_paren_pair { start and end obviously need to be bigger. What makes this tricky is that it is stored inside the regexp struct If regexp plugins never actually take pointers to start or end, then it Regexp engines that use STRLEN will just work, except in older perls So it does not seem so bad after all. The same thing applies to sublen in the regexp struct. It might apply minlen and minlenret are only for optimisations, so they can stay as I32. The other place where we have an I32 is the return value of
$^H{regcomp_wide} obviously wouldn’t work, because an older regexp -- Father Chrysostomos |
From @cpansproutI have fixed this bug and others on the sprout/regexp branch. I would -- Father Chrysostomos |
From @cpansproutFrom a03f5d860f145bd038f8d1e186c1c018dbfb949c Mon Sep 17 00:00:00 2001 Using I32 for the fields that record information about the location of SSize_t is appropriate, since it is 64 bits on 64-bit platforms and 32 This commit changes enough instances of I32 to SSize_t to get the embed.fnc | 4 ++-- Inline Patchdiff --git a/embed.fnc b/embed.fnc
index 3400222..edf0a2f 100644
--- a/embed.fnc
+++ b/embed.fnc
@@ -2042,9 +2042,9 @@ Esn |void |cl_and |NN struct regnode_charclass_class *cl \
Esn |void |cl_or |NN const struct RExC_state_t *pRExC_state \
|NN struct regnode_charclass_class *cl \
|NN const struct regnode_charclass_class *or_with
-Es |I32 |study_chunk |NN struct RExC_state_t *pRExC_state \
+Es |SSize_t|study_chunk |NN struct RExC_state_t *pRExC_state \
|NN regnode **scanp|NN I32 *minlenp \
- |NN I32 *deltap|NN regnode *last \
+ |NN SSize_t *deltap|NN regnode *last \
|NULLOK struct scan_data_t *data \
|I32 stopparen|NULLOK U8* recursed \
|NULLOK struct regnode_charclass_class *and_withp \
diff --git a/perl.h b/perl.h
index bdb64b3..d2c5568 100644
--- a/perl.h
+++ b/perl.h
@@ -1601,6 +1601,8 @@ typedef UVTYPE UV;
# endif
#endif
+#define SSize_t_MAX (SSize_t)(~(size_t)0 >> 1)
+
#ifndef HAS_QUAD
# undef PERL_NEED_MY_HTOLE64
# undef PERL_NEED_MY_LETOH64
diff --git a/proto.h b/proto.h
index 5969e7b..c50d60f 100644
--- a/proto.h
+++ b/proto.h
@@ -6770,7 +6770,7 @@ STATIC void S_scan_commit(pTHX_ const struct RExC_state_t *pRExC_state, struct s
#define PERL_ARGS_ASSERT_SCAN_COMMIT \
assert(pRExC_state); assert(data); assert(minlenp)
-STATIC I32 S_study_chunk(pTHX_ struct RExC_state_t *pRExC_state, regnode **scanp, I32 *minlenp, I32 *deltap, regnode *last, struct scan_data_t *data, I32 stopparen, U8* recursed, struct regnode_charclass_class *and_withp, U32 flags, U32 depth)
+STATIC SSize_t S_study_chunk(pTHX_ struct RExC_state_t *pRExC_state, regnode **scanp, I32 *minlenp, SSize_t *deltap, regnode *last, struct scan_data_t *data, I32 stopparen, U8* recursed, struct regnode_charclass_class *and_withp, U32 flags, U32 depth)
__attribute__nonnull__(pTHX_1)
__attribute__nonnull__(pTHX_2)
__attribute__nonnull__(pTHX_3)
diff --git a/regcomp.c b/regcomp.c
index 432e5eb..cd7e8ae 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -345,25 +345,25 @@ typedef struct RExC_state_t {
typedef struct scan_data_t {
/*I32 len_min; unused */
/*I32 len_delta; unused */
- I32 pos_min;
+ SSize_t pos_min;
I32 pos_delta;
SV *last_found;
I32 last_end; /* min value, <0 unless valid. */
- I32 last_start_min;
+ SSize_t last_start_min;
I32 last_start_max;
SV **longest; /* Either &l_fixed, or &l_float. */
SV *longest_fixed; /* longest fixed string found in pattern */
- I32 offset_fixed; /* offset where it starts */
+ SSize_t offset_fixed; /* offset where it starts */
I32 *minlen_fixed; /* pointer to the minlen relevant to the string */
I32 lookbehind_fixed; /* is the position of the string modfied by LB */
SV *longest_float; /* longest floating string found in pattern */
- I32 offset_float_min; /* earliest point in string it can appear */
+ SSize_t offset_float_min; /* earliest point in string it can appear */
I32 offset_float_max; /* latest point in string it can appear */
I32 *minlen_float; /* pointer to the minlen relevant to the string */
- I32 lookbehind_float; /* is the position of the string modified by LB */
+ SSize_t lookbehind_float; /* is the pos of the string modified by LB */
I32 flags;
I32 whilem_c;
- I32 *last_closep;
+ SSize_t *last_closep;
struct regnode_charclass_class *start_class;
} scan_data_t;
@@ -3028,9 +3028,9 @@ typedef struct scan_frame {
#define SCAN_COMMIT(s, data, m) scan_commit(s, data, m, is_inf)
-STATIC I32
+STATIC SSize_t
S_study_chunk(pTHX_ RExC_state_t *pRExC_state, regnode **scanp,
- I32 *minlenp, I32 *deltap,
+ I32 *minlenp, SSize_t *deltap,
regnode *last,
scan_data_t *data,
I32 stopparen,
@@ -3124,7 +3124,7 @@ S_study_chunk(pTHX_ RExC_state_t *pRExC_state, regnode **scanp,
/* NOTE - There is similar code to this block below for handling
TRIE nodes on a re-study. If you change stuff here check there
too. */
- I32 max1 = 0, min1 = I32_MAX, num = 0;
+ SSize_t max1 = 0, min1 = SSize_t_MAX, num = 0;
struct regnode_charclass_class accum;
regnode * const startbranch=scan;
@@ -3134,7 +3134,8 @@ S_study_chunk(pTHX_ RExC_state_t *pRExC_state, regnode **scanp,
cl_init_zero(pRExC_state, &accum);
while (OP(scan) == code) {
- I32 deltanext, minnext, f = 0, fake;
+ SSize_t deltanext, minnext, fake;
+ I32 f = 0;
struct regnode_charclass_class this_class;
num++;
@@ -3678,7 +3679,7 @@ S_study_chunk(pTHX_ RExC_state_t *pRExC_state, regnode **scanp,
flags &= ~SCF_DO_STCLASS;
}
else if (PL_regkind[OP(scan)] == EXACT) { /* But OP != EXACT! */
- I32 l = STR_LEN(scan);
+ SSize_t l = STR_LEN(scan);
UV uc = *((U8*)STRING(scan));
/* Search for fixed substrings supports EXACT only. */
@@ -3795,8 +3796,8 @@ S_study_chunk(pTHX_ RExC_state_t *pRExC_state, regnode **scanp,
flags &= ~SCF_DO_STCLASS;
}
else if (REGNODE_VARIES(OP(scan))) {
- I32 mincount, maxcount, minnext, deltanext, fl = 0;
- I32 f = flags, pos_before = 0;
+ SSize_t mincount, maxcount, minnext, deltanext;
+ I32 fl = 0, f = flags, pos_before = 0;
regnode * const oscan = scan;
struct regnode_charclass_class this_class;
struct regnode_charclass_class *oclass = NULL;
@@ -4373,7 +4374,7 @@ PerlIO_printf(Perl_debug_log, "LHS=%d RHS=%d\n", -counted * deltanext + (minnext
In this case we can't do fixed string optimisation.
*/
- I32 deltanext, minnext, fake = 0;
+ SSize_t deltanext, minnext, fake = 0;
regnode *nscan;
struct regnode_charclass_class intrnl;
int f = 0;
@@ -4615,7 +4616,7 @@ PerlIO_printf(Perl_debug_log, "LHS=%d RHS=%d\n", -counted * deltanext + (minnext
for ( word=1 ; word <= trie->wordcount ; word++)
{
- I32 deltanext=0, minnext=0, f = 0, fake;
+ SSize_t deltanext=0, minnext=0, f = 0, fake;
struct regnode_charclass_class this_class;
data_fake.flags = 0;
@@ -6075,11 +6076,11 @@ reStudy:
/* testing for BRANCH here tells us whether there is "must appear"
data in the pattern. If there is then we can use it for optimisations */
if (!(RExC_seen & REG_TOP_LEVEL_BRANCHES)) { /* Only one top-level choice. */
- I32 fake;
+ SSize_t fake;
STRLEN longest_float_length, longest_fixed_length;
struct regnode_charclass_class ch_class; /* pointed to by data */
int stclass_flag;
- I32 last_close = 0; /* pointed to by data */
+ SSize_t last_close = 0; /* pointed to by data */
regnode *first= scan;
regnode *first_next= regnext(first);
/*
@@ -6361,9 +6362,9 @@ reStudy:
}
else {
/* Several toplevels. Best we can is to set minlen. */
- I32 fake;
+ SSize_t fake;
struct regnode_charclass_class ch_class;
- I32 last_close = 0;
+ SSize_t last_close = 0;
DEBUG_PARSE_r(PerlIO_printf(Perl_debug_log, "\nMulti Top Level\n"));
diff --git a/regexec.c b/regexec.c
index fb8e97a..4a25f5c 100644
--- a/regexec.c
+++ b/regexec.c
@@ -615,7 +615,7 @@ Perl_re_intuit_start(pTHX_
{
dVAR;
struct regexp *const prog = ReANY(rx);
- I32 start_shift = 0;
+ SSize_t start_shift = 0;
/* Should be nonnegative! */
I32 end_shift = 0;
char *s;
@@ -751,7 +751,7 @@ Perl_re_intuit_start(pTHX_
the "check" substring in the region corrected by start/end_shift. */
{
- I32 srch_start_shift = start_shift;
+ SSize_t srch_start_shift = start_shift;
I32 srch_end_shift = end_shift;
U8* start_point;
U8* end_point;
diff --git a/regexp.h b/regexp.h
index ebbbde1..8542cb1 100644
--- a/regexp.h
+++ b/regexp.h
@@ -36,8 +36,8 @@ struct regexp_engine;
struct regexp;
struct reg_substr_datum {
- I32 min_offset;
- I32 max_offset;
+ SSize_t min_offset;
+ SSize_t max_offset;
SV *substr; /* non-utf8 variant */
SV *utf8_substr; /* utf8 variant */
I32 end_shift;
diff --git a/t/bigmem/regexp.t b/t/bigmem/regexp.t
index 9404f2c..d5496f1 100644
--- a/t/bigmem/regexp.t
+++ b/t/bigmem/regexp.t
@@ -12,7 +12,7 @@ $ENV{PERL_TEST_MEMORY} >= 2
$Config{ptrsize} >= 8
or skip_all("Need 64-bit pointers for this test");
-plan(5);
+plan(6);
# [perl #116907]
# ${\2} to defeat constant folding, which in this case actually slows
@@ -33,3 +33,7 @@ $x =~ /./g;
is "$'", 'efg', q "$' after match against long string";
is "$-[0],$+[0]", '2147483651,2147483652',
'@- and @+ after matches past 2**31';
+
+# Substring optimisations
+is $x =~ /(?:(?:.{32766}){32766}){2}(?:.{32766}){8}.{8}ef/, 1,
+ 'anchored substr past 2**31';
--
From ae3358ed545fdb5d12870c5307142bba363ae590 Mon Sep 17 00:00:00 2001 As part of getting the regexp engine to handle long strings, this com- To avoid having to change any logic, I kept the signedness the same. I did not change anything that affects the length of the regular These changes should fix bugs, but are very hard to test. In most embed.fnc | 21 ++++---- Inline Patchdiff --git a/embed.fnc b/embed.fnc
index edf0a2f..acd0b1c 100644
--- a/embed.fnc
+++ b/embed.fnc
@@ -1095,8 +1095,8 @@ EMsR |SV* |_new_invlist_C_array|NN const UV* const list
: Not used currently: EXMs |bool |_invlistEQ |NN SV* const a|NN SV* const b|const bool complement_b
#endif
Ap |I32 |pregexec |NN REGEXP * const prog|NN char* stringarg \
- |NN char* strend|NN char* strbeg|I32 minend \
- |NN SV* screamer|U32 nosave
+ |NN char* strend|NN char* strbeg \
+ |SSize_t minend |NN SV* screamer|U32 nosave
Ap |void |pregfree |NULLOK REGEXP* r
Ap |void |pregfree2 |NN REGEXP *rx
: FIXME - is anything in re using this now?
@@ -1127,8 +1127,9 @@ EiPR |I32 |regcurly |NN const char *s \
|const bool rbrace_must_be_escaped
#endif
Ap |I32 |regexec_flags |NN REGEXP *const rx|NN char *stringarg \
- |NN char *strend|NN char *strbeg|I32 minend \
- |NN SV *sv|NULLOK void *data|U32 flags
+ |NN char *strend|NN char *strbeg \
+ |SSize_t minend|NN SV *sv \
+ |NULLOK void *data|U32 flags
ApR |regnode*|regnext |NULLOK regnode* p
EXp |SV*|reg_named_buff |NN REGEXP * const rx|NULLOK SV * const key \
|NULLOK SV * const value|const U32 flags
@@ -2030,8 +2031,8 @@ Ei |U8 |compute_EXACTish|NN struct RExC_state_t *pRExC_state
Es |char * |nextchar |NN struct RExC_state_t *pRExC_state
Es |bool |reg_skipcomment|NN struct RExC_state_t *pRExC_state
Es |void |scan_commit |NN const struct RExC_state_t *pRExC_state \
- |NN struct scan_data_t *data|NN I32 *minlenp \
- |int is_inf
+ |NN struct scan_data_t *data \
+ |NN SSize_t *minlenp|int is_inf
Esn |void |cl_anything |NN const struct RExC_state_t *pRExC_state \
|NN struct regnode_charclass_class *cl
EsRn |int |cl_is_anything |NN const struct regnode_charclass_class *cl
@@ -2043,7 +2044,7 @@ Esn |void |cl_or |NN const struct RExC_state_t *pRExC_state \
|NN struct regnode_charclass_class *cl \
|NN const struct regnode_charclass_class *or_with
Es |SSize_t|study_chunk |NN struct RExC_state_t *pRExC_state \
- |NN regnode **scanp|NN I32 *minlenp \
+ |NN regnode **scanp|NN SSize_t *minlenp \
|NN SSize_t *deltap|NN regnode *last \
|NULLOK struct scan_data_t *data \
|I32 stopparen|NULLOK U8* recursed \
@@ -2101,15 +2102,15 @@ Es |CHECKPOINT|regcppush |NN const regexp *rex|I32 parenfloor\
|U32 maxopenparen
Es |void |regcppop |NN regexp *rex\
|NN U32 *maxopenparen_p
-ERsn |U8* |reghop3 |NN U8 *s|I32 off|NN const U8 *lim
+ERsn |U8* |reghop3 |NN U8 *s|SSize_t off|NN const U8 *lim
ERsM |SV* |core_regclass_swash|NULLOK const regexp *prog \
|NN const struct regnode *node|bool doinit \
|NULLOK SV **listsvp
#ifdef XXX_dmq
-ERsn |U8* |reghop4 |NN U8 *s|I32 off|NN const U8 *llim \
+ERsn |U8* |reghop4 |NN U8 *s|SSize_t off|NN const U8 *llim \
|NN const U8 *rlim
#endif
-ERsn |U8* |reghopmaybe3 |NN U8 *s|I32 off|NN const U8 *lim
+ERsn |U8* |reghopmaybe3 |NN U8 *s|SSize_t off|NN const U8 *lim
ERs |char* |find_byclass |NN regexp * prog|NN const regnode *c \
|NN char *s|NN const char *strend \
|NULLOK regmatch_info *reginfo
diff --git a/pod/perlreapi.pod b/pod/perlreapi.pod
index 659088e..cfc41d7 100644
--- a/pod/perlreapi.pod
+++ b/pod/perlreapi.pod
@@ -17,7 +17,7 @@ following format:
REGEXP * const rx,
char* stringarg,
char* strend, char* strbeg,
- I32 minend, SV* sv,
+ SSize_t minend, SV* sv,
void* data, U32 flags);
char* (*intuit) (pTHX_
REGEXP * const rx, SV *sv,
@@ -238,7 +238,7 @@ certain optimisations when this is set.
I32 exec(pTHX_ REGEXP * const rx,
char *stringarg, char* strend, char* strbeg,
- I32 minend, SV* sv,
+ SSize_t minend, SV* sv,
void* data, U32 flags);
Execute a regexp. The arguments are
diff --git a/proto.h b/proto.h
index c50d60f..fc007c9 100644
--- a/proto.h
+++ b/proto.h
@@ -3187,7 +3187,7 @@ PERL_CALLCONV REGEXP* Perl_pregcomp(pTHX_ SV * const pattern, const U32 flags)
#define PERL_ARGS_ASSERT_PREGCOMP \
assert(pattern)
-PERL_CALLCONV I32 Perl_pregexec(pTHX_ REGEXP * const prog, char* stringarg, char* strend, char* strbeg, I32 minend, SV* screamer, U32 nosave)
+PERL_CALLCONV I32 Perl_pregexec(pTHX_ REGEXP * const prog, char* stringarg, char* strend, char* strbeg, SSize_t minend, SV* screamer, U32 nosave)
__attribute__nonnull__(pTHX_1)
__attribute__nonnull__(pTHX_2)
__attribute__nonnull__(pTHX_3)
@@ -3409,7 +3409,7 @@ PERL_CALLCONV void Perl_regdump(pTHX_ const regexp* r)
#define PERL_ARGS_ASSERT_REGDUMP \
assert(r)
-PERL_CALLCONV I32 Perl_regexec_flags(pTHX_ REGEXP *const rx, char *stringarg, char *strend, char *strbeg, I32 minend, SV *sv, void *data, U32 flags)
+PERL_CALLCONV I32 Perl_regexec_flags(pTHX_ REGEXP *const rx, char *stringarg, char *strend, char *strbeg, SSize_t minend, SV *sv, void *data, U32 flags)
__attribute__nonnull__(pTHX_1)
__attribute__nonnull__(pTHX_2)
__attribute__nonnull__(pTHX_3)
@@ -6763,14 +6763,14 @@ STATIC char * S_regwhite(struct RExC_state_t *pRExC_state, char *p)
#define PERL_ARGS_ASSERT_REGWHITE \
assert(pRExC_state); assert(p)
-STATIC void S_scan_commit(pTHX_ const struct RExC_state_t *pRExC_state, struct scan_data_t *data, I32 *minlenp, int is_inf)
+STATIC void S_scan_commit(pTHX_ const struct RExC_state_t *pRExC_state, struct scan_data_t *data, SSize_t *minlenp, int is_inf)
__attribute__nonnull__(pTHX_1)
__attribute__nonnull__(pTHX_2)
__attribute__nonnull__(pTHX_3);
#define PERL_ARGS_ASSERT_SCAN_COMMIT \
assert(pRExC_state); assert(data); assert(minlenp)
-STATIC SSize_t S_study_chunk(pTHX_ struct RExC_state_t *pRExC_state, regnode **scanp, I32 *minlenp, SSize_t *deltap, regnode *last, struct scan_data_t *data, I32 stopparen, U8* recursed, struct regnode_charclass_class *and_withp, U32 flags, U32 depth)
+STATIC SSize_t S_study_chunk(pTHX_ struct RExC_state_t *pRExC_state, regnode **scanp, SSize_t *minlenp, SSize_t *deltap, regnode *last, struct scan_data_t *data, I32 stopparen, U8* recursed, struct regnode_charclass_class *and_withp, U32 flags, U32 depth)
__attribute__nonnull__(pTHX_1)
__attribute__nonnull__(pTHX_2)
__attribute__nonnull__(pTHX_3)
@@ -6976,14 +6976,14 @@ STATIC CHECKPOINT S_regcppush(pTHX_ const regexp *rex, I32 parenfloor, U32 maxop
#define PERL_ARGS_ASSERT_REGCPPUSH \
assert(rex)
-STATIC U8* S_reghop3(U8 *s, I32 off, const U8 *lim)
+STATIC U8* S_reghop3(U8 *s, SSize_t off, const U8 *lim)
__attribute__warn_unused_result__
__attribute__nonnull__(1)
__attribute__nonnull__(3);
#define PERL_ARGS_ASSERT_REGHOP3 \
assert(s); assert(lim)
-STATIC U8* S_reghopmaybe3(U8 *s, I32 off, const U8 *lim)
+STATIC U8* S_reghopmaybe3(U8 *s, SSize_t off, const U8 *lim)
__attribute__warn_unused_result__
__attribute__nonnull__(1)
__attribute__nonnull__(3);
@@ -7032,7 +7032,7 @@ STATIC void S_to_utf8_substr(pTHX_ regexp * prog)
assert(prog)
# if defined(XXX_dmq)
-STATIC U8* S_reghop4(U8 *s, I32 off, const U8 *llim, const U8 *rlim)
+STATIC U8* S_reghop4(U8 *s, SSize_t off, const U8 *llim, const U8 *rlim)
__attribute__warn_unused_result__
__attribute__nonnull__(1)
__attribute__nonnull__(3)
diff --git a/regcomp.c b/regcomp.c
index cd7e8ae..b2d6555 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -123,7 +123,7 @@ typedef struct RExC_state_t {
char *start; /* Start of input for compile */
char *end; /* End of input for compile */
char *parse; /* Input-scan pointer. */
- I32 whilem_seen; /* number of WHILEM in this expr */
+ SSize_t whilem_seen; /* number of WHILEM in this expr */
regnode *emit_start; /* Start of emitted-code area */
regnode *emit_bound; /* First regnode outside of the allocated space */
regnode *emit; /* Code-emit pointer; if = &emit_dummy,
@@ -132,10 +132,10 @@ typedef struct RExC_state_t {
I32 naughty; /* How bad is this pattern? */
I32 sawback; /* Did we see \1, ...? */
U32 seen;
- I32 size; /* Code size. */
- I32 npar; /* Capture buffer count, (OPEN). */
- I32 cpar; /* Capture buffer count, (CLOSE). */
- I32 nestroot; /* root parens we are in - used by accept */
+ SSize_t size; /* Code size. */
+ I32 npar; /* Capture buffer count, (OPEN). */
+ I32 cpar; /* Capture buffer count, (CLOSE). */
+ I32 nestroot; /* root parens we are in - used by accept */
I32 extralen;
I32 seen_zerolen;
regnode **open_parens; /* pointers to open parens */
@@ -158,8 +158,8 @@ typedef struct RExC_state_t {
I32 in_multi_char_class;
struct reg_code_block *code_blocks; /* positions of literal (?{})
within pattern */
- int num_code_blocks; /* size of code_blocks[] */
- int code_index; /* next code_blocks[] slot */
+ I32 num_code_blocks; /* size of code_blocks[] */
+ I32 code_index; /* next code_blocks[] slot */
#if ADD_TO_REGEXEC
char *starttry; /* -Dr: where regtry was called. */
#define RExC_starttry (pRExC_state->starttry)
@@ -301,7 +301,7 @@ typedef struct RExC_state_t {
- max_offset
Only used for floating strings. This is the rightmost point that
- the string can appear at. If set to I32 max it indicates that the
+ the string can appear at. If set to SSize_t_MAX it indicates that the
string can occur infinitely far to the right.
- minlenp
@@ -346,20 +346,20 @@ typedef struct scan_data_t {
/*I32 len_min; unused */
/*I32 len_delta; unused */
SSize_t pos_min;
- I32 pos_delta;
+ SSize_t pos_delta;
SV *last_found;
- I32 last_end; /* min value, <0 unless valid. */
+ SSize_t last_end; /* min value, <0 unless valid. */
SSize_t last_start_min;
- I32 last_start_max;
+ SSize_t last_start_max;
SV **longest; /* Either &l_fixed, or &l_float. */
SV *longest_fixed; /* longest fixed string found in pattern */
SSize_t offset_fixed; /* offset where it starts */
- I32 *minlen_fixed; /* pointer to the minlen relevant to the string */
+ SSize_t *minlen_fixed; /* pointer to the minlen relevant to the string */
I32 lookbehind_fixed; /* is the position of the string modfied by LB */
SV *longest_float; /* longest floating string found in pattern */
SSize_t offset_float_min; /* earliest point in string it can appear */
- I32 offset_float_max; /* latest point in string it can appear */
- I32 *minlen_float; /* pointer to the minlen relevant to the string */
+ SSize_t offset_float_max; /* latest point in string it can appear */
+ SSize_t *minlen_float; /* pointer to the minlen relevant to the string */
SSize_t lookbehind_float; /* is the pos of the string modified by LB */
I32 flags;
I32 whilem_c;
@@ -748,7 +748,8 @@ DEBUG_OPTIMISE_MORE_r(if(data){ \
floating substrings if needed. */
STATIC void
-S_scan_commit(pTHX_ const RExC_state_t *pRExC_state, scan_data_t *data, I32 *minlenp, int is_inf)
+S_scan_commit(pTHX_ const RExC_state_t *pRExC_state, scan_data_t *data,
+ SSize_t *minlenp, int is_inf)
{
const STRLEN l = CHR_SVLEN(data->last_found);
const STRLEN old_l = CHR_SVLEN(*data->longest);
@@ -772,9 +773,12 @@ S_scan_commit(pTHX_ const RExC_state_t *pRExC_state, scan_data_t *data, I32 *min
data->offset_float_min = l ? data->last_start_min : data->pos_min;
data->offset_float_max = (l
? data->last_start_max
- : (data->pos_delta == I32_MAX ? I32_MAX : data->pos_min + data->pos_delta));
- if (is_inf || (U32)data->offset_float_max > (U32)I32_MAX)
- data->offset_float_max = I32_MAX;
+ : (data->pos_delta == SSize_t_MAX
+ ? SSize_t_MAX
+ : data->pos_min + data->pos_delta));
+ if (is_inf
+ || (size_t)data->offset_float_max > (size_t)SSize_t_MAX)
+ data->offset_float_max = SSize_t_MAX;
if (data->flags & SF_BEFORE_EOL)
data->flags
|= ((data->flags & SF_BEFORE_EOL) << SF_FL_SHIFT_EOL);
@@ -3030,7 +3034,7 @@ typedef struct scan_frame {
STATIC SSize_t
S_study_chunk(pTHX_ RExC_state_t *pRExC_state, regnode **scanp,
- I32 *minlenp, SSize_t *deltap,
+ SSize_t *minlenp, SSize_t *deltap,
regnode *last,
scan_data_t *data,
I32 stopparen,
@@ -3046,17 +3050,18 @@ S_study_chunk(pTHX_ RExC_state_t *pRExC_state, regnode **scanp,
/* and_withp: Valid if flags & SCF_DO_STCLASS_OR */
{
dVAR;
- I32 min = 0; /* There must be at least this number of characters to match */
+ /* There must be at least this number of characters to match */
+ SSize_t min = 0;
I32 pars = 0, code;
regnode *scan = *scanp, *next;
- I32 delta = 0;
+ SSize_t delta = 0;
int is_inf = (flags & SCF_DO_SUBSTR) && (data->flags & SF_IS_INF);
int is_inf_internal = 0; /* The studied chunk is infinite */
I32 is_par = OP(scan) == OPEN ? ARG(scan) : 0;
scan_data_t data_fake;
SV *re_trie_maxbuff = NULL;
regnode *first_non_open = scan;
- I32 stopmin = I32_MAX;
+ SSize_t stopmin = SSize_t_MAX;
scan_frame *frame = NULL;
GET_RE_DEBUG_FLAGS_DECL;
@@ -3166,9 +3171,9 @@ S_study_chunk(pTHX_ RExC_state_t *pRExC_state, regnode **scanp,
stopparen, recursed, NULL, f,depth+1);
if (min1 > minnext)
min1 = minnext;
- if (deltanext == I32_MAX) {
+ if (deltanext == SSize_t_MAX) {
is_inf = is_inf_internal = 1;
- max1 = I32_MAX;
+ max1 = SSize_t_MAX;
} else if (max1 < minnext + deltanext)
max1 = minnext + deltanext;
scan = next;
@@ -3193,16 +3198,17 @@ S_study_chunk(pTHX_ RExC_state_t *pRExC_state, regnode **scanp,
min1 = 0;
if (flags & SCF_DO_SUBSTR) {
data->pos_min += min1;
- if (data->pos_delta >= I32_MAX - (max1 - min1))
- data->pos_delta = I32_MAX;
+ if (data->pos_delta >= SSize_t_MAX - (max1 - min1))
+ data->pos_delta = SSize_t_MAX;
else
data->pos_delta += max1 - min1;
if (max1 != min1 || is_inf)
data->longest = &(data->longest_float);
}
min += min1;
- if (delta == I32_MAX || I32_MAX - delta - (max1 - min1) < 0)
- delta = I32_MAX;
+ if (delta == SSize_t_MAX
+ || SSize_t_MAX - delta - (max1 - min1) < 0)
+ delta = SSize_t_MAX;
else
delta += max1 - min1;
if (flags & SCF_DO_STCLASS_OR) {
@@ -3590,7 +3596,7 @@ S_study_chunk(pTHX_ RExC_state_t *pRExC_state, regnode **scanp,
}
}
else if (OP(scan) == EXACT) {
- I32 l = STR_LEN(scan);
+ SSize_t l = STR_LEN(scan);
UV uc;
if (UTF) {
const U8 * const s = (U8*)STRING(scan);
@@ -3606,7 +3612,7 @@ S_study_chunk(pTHX_ RExC_state_t *pRExC_state, regnode **scanp,
if (data->last_end == -1) { /* Update the start info. */
data->last_start_min = data->pos_min;
data->last_start_max = is_inf
- ? I32_MAX : data->pos_min + data->pos_delta;
+ ? SSize_t_MAX : data->pos_min + data->pos_delta;
}
sv_catpvn(data->last_found, STRING(scan), STR_LEN(scan));
if (UTF)
@@ -3796,8 +3802,8 @@ S_study_chunk(pTHX_ RExC_state_t *pRExC_state, regnode **scanp,
flags &= ~SCF_DO_STCLASS;
}
else if (REGNODE_VARIES(OP(scan))) {
- SSize_t mincount, maxcount, minnext, deltanext;
- I32 fl = 0, f = flags, pos_before = 0;
+ SSize_t mincount, maxcount, minnext, deltanext, pos_before = 0;
+ I32 fl = 0, f = flags;
regnode * const oscan = scan;
struct regnode_charclass_class this_class;
struct regnode_charclass_class *oclass = NULL;
@@ -3934,11 +3940,11 @@ S_study_chunk(pTHX_ RExC_state_t *pRExC_state, regnode **scanp,
}
min += minnext * mincount;
- is_inf_internal |= deltanext == I32_MAX
+ is_inf_internal |= deltanext == SSize_t_MAX
|| (maxcount == REG_INFTY && minnext + deltanext > 0);
is_inf |= is_inf_internal;
if (is_inf)
- delta = I32_MAX;
+ delta = SSize_t_MAX;
else
delta += (minnext + deltanext) * maxcount - minnext * mincount;
@@ -4068,10 +4074,10 @@ S_study_chunk(pTHX_ RExC_state_t *pRExC_state, regnode **scanp,
if (data->last_end > 0 && mincount != 0) { /* Ends with a string. */
#if defined(SPARC64_GCC_WORKAROUND)
- I32 b = 0;
+ SSize_t b = 0;
STRLEN l = 0;
const char *s = NULL;
- I32 old = 0;
+ SSize_t old = 0;
if (pos_before >= data->last_start_min)
b = pos_before;
@@ -4083,11 +4089,11 @@ S_study_chunk(pTHX_ RExC_state_t *pRExC_state, regnode **scanp,
old = b - data->last_start_min;
#else
- I32 b = pos_before >= data->last_start_min
+ SSize_t b = pos_before >= data->last_start_min
? pos_before : data->last_start_min;
STRLEN l;
const char * const s = SvPV_const(data->last_found, l);
- I32 old = b - data->last_start_min;
+ SSize_t old = b - data->last_start_min;
#endif
if (UTF)
@@ -4119,20 +4125,21 @@ S_study_chunk(pTHX_ RExC_state_t *pRExC_state, regnode **scanp,
} else {
/* start offset must point into the last copy */
data->last_start_min += minnext * (mincount - 1);
- data->last_start_max += is_inf ? I32_MAX
+ data->last_start_max += is_inf ? SSize_t_MAX
: (maxcount - 1) * (minnext + data->pos_delta);
}
}
/* It is counted once already... */
data->pos_min += minnext * (mincount - counted);
#if 0
-PerlIO_printf(Perl_debug_log, "counted=%d deltanext=%d I32_MAX=%d minnext=%d maxcount=%d mincount=%d\n",
- counted, deltanext, I32_MAX, minnext, maxcount, mincount);
+PerlIO_printf(Perl_debug_log, "counted=%zd deltanext=%zd SSize_t_MAX=%zd minnext=%zd maxcount=%zd mincount=%zd\n",
+ counted, deltanext, SSize_t_MAX, minnext, maxcount, mincount);
if (deltanext != I32_MAX)
-PerlIO_printf(Perl_debug_log, "LHS=%d RHS=%d\n", -counted * deltanext + (minnext + deltanext) * maxcount - minnext * mincount, I32_MAX - data->pos_delta);
+PerlIO_printf(Perl_debug_log, "LHS=%zd RHS=%zd\n", -counted * deltanext + (minnext + deltanext) * maxcount - minnext * mincount, SSize_t_MAX - data->pos_delta);
#endif
- if (deltanext == I32_MAX || -counted * deltanext + (minnext + deltanext) * maxcount - minnext * mincount >= I32_MAX - data->pos_delta)
- data->pos_delta = I32_MAX;
+ if (deltanext == SSize_t_MAX ||
+ -counted * deltanext + (minnext + deltanext) * maxcount - minnext * mincount >= SSize_t_MAX - data->pos_delta)
+ data->pos_delta = SSize_t_MAX;
else
data->pos_delta += - counted * deltanext +
(minnext + deltanext) * maxcount - minnext * mincount;
@@ -4152,7 +4159,7 @@ PerlIO_printf(Perl_debug_log, "LHS=%d RHS=%d\n", -counted * deltanext + (minnext
data->last_start_min =
data->pos_min - CHR_SVLEN(last_str);
data->last_start_max = is_inf
- ? I32_MAX
+ ? SSize_t_MAX
: data->pos_min + data->pos_delta
- CHR_SVLEN(last_str);
}
@@ -4443,7 +4450,8 @@ PerlIO_printf(Perl_debug_log, "LHS=%d RHS=%d\n", -counted * deltanext + (minnext
length of the pattern, something we won't know about
until after the recurse.
*/
- I32 deltanext, fake = 0;
+ SSize_t deltanext;
+ I32 fake = 0;
regnode *nscan;
struct regnode_charclass_class intrnl;
int f = 0;
@@ -4453,8 +4461,8 @@ PerlIO_printf(Perl_debug_log, "LHS=%d RHS=%d\n", -counted * deltanext + (minnext
have to worry about freeing them when we know
they wont be used, which would be a pain.
*/
- I32 *minnextp;
- Newx( minnextp, 1, I32 );
+ SSize_t *minnextp;
+ Newx( minnextp, 1, SSize_t );
SAVEFREEPV(minnextp);
if (data) {
@@ -4583,7 +4591,7 @@ PerlIO_printf(Perl_debug_log, "LHS=%d RHS=%d\n", -counted * deltanext + (minnext
{
if (!(RExC_rx->extflags & RXf_ANCH) && (flags & SCF_DO_SUBSTR))
RExC_rx->extflags |= RXf_ANCH_GPOS;
- if (RExC_rx->gofs < (U32)min)
+ if (RExC_rx->gofs < (STRLEN)min)
RExC_rx->gofs = min;
} else {
RExC_rx->extflags |= RXf_GPOS_FLOAT;
@@ -4599,7 +4607,7 @@ PerlIO_printf(Perl_debug_log, "LHS=%d RHS=%d\n", -counted * deltanext + (minnext
regnode *trie_node= scan;
regnode *tail= regnext(scan);
reg_trie_data *trie = (reg_trie_data*)RExC_rxi->data->data[ ARG(scan) ];
- I32 max1 = 0, min1 = I32_MAX;
+ SSize_t max1 = 0, min1 = SSize_t_MAX;
struct regnode_charclass_class accum;
if (flags & SCF_DO_SUBSTR) /* XXXX Add !SUSPEND? */
@@ -4650,12 +4658,12 @@ PerlIO_printf(Perl_debug_log, "LHS=%d RHS=%d\n", -counted * deltanext + (minnext
if (nextbranch && PL_regkind[OP(nextbranch)]==BRANCH)
nextbranch= regnext((regnode*)nextbranch);
- if (min1 > (I32)(minnext + trie->minlen))
+ if (min1 > (SSize_t)(minnext + trie->minlen))
min1 = minnext + trie->minlen;
- if (deltanext == I32_MAX) {
+ if (deltanext == SSize_t_MAX) {
is_inf = is_inf_internal = 1;
- max1 = I32_MAX;
- } else if (max1 < (I32)(minnext + deltanext + trie->maxlen))
+ max1 = SSize_t_MAX;
+ } else if (max1 < (SSize_t)(minnext + deltanext + trie->maxlen))
max1 = minnext + deltanext + trie->maxlen;
if (data_fake.flags & (SF_HAS_PAR|SF_IN_PAR))
@@ -4749,9 +4757,9 @@ PerlIO_printf(Perl_debug_log, "LHS=%d RHS=%d\n", -counted * deltanext + (minnext
DEBUG_STUDYDATA("pre-fin:",data,depth);
*scanp = scan;
- *deltap = is_inf_internal ? I32_MAX : delta;
+ *deltap = is_inf_internal ? SSize_t_MAX : delta;
if (flags & SCF_DO_SUBSTR && is_inf)
- data->pos_delta = I32_MAX - data->pos_min;
+ data->pos_delta = SSize_t_MAX - data->pos_min;
if (is_par > (I32)U8_MAX)
is_par = 0;
if (is_par && pars==1 && data) {
@@ -5419,13 +5427,15 @@ S_compile_runtime_code(pTHX_ RExC_state_t * const pRExC_state,
STATIC bool
-S_setup_longest(pTHX_ RExC_state_t *pRExC_state, SV* sv_longest, SV** rx_utf8, SV** rx_substr, I32* rx_end_shift, I32 lookbehind, I32 offset, I32 *minlen, STRLEN longest_length, bool eol, bool meol)
+S_setup_longest(pTHX_ RExC_state_t *pRExC_state, SV* sv_longest, SV** rx_utf8, SV** rx_substr, SSize_t* rx_end_shift,
+ SSize_t lookbehind, SSize_t offset, SSize_t *minlen, STRLEN longest_length, bool eol, bool meol)
{
/* This is the common code for setting up the floating and fixed length
* string data extracted from Perl_re_op_compile() below. Returns a boolean
* as to whether succeeded or not */
- I32 t,ml;
+ I32 t;
+ SSize_t ml;
if (! (longest_length
|| (eol /* Can't have SEOL and MULTI */
@@ -5450,7 +5460,7 @@ S_setup_longest(pTHX_ RExC_state_t *pRExC_state, SV* sv_longest, SV** rx_utf8, S
follow this item. We calculate it ahead of time as once the
lookbehind offset is added in we lose the ability to correctly
calculate it.*/
- ml = minlen ? *(minlen) : (I32)longest_length;
+ ml = minlen ? *(minlen) : (SSize_t)longest_length;
*rx_end_shift = ml - offset
- longest_length + (SvTAIL(sv_longest) != 0)
+ lookbehind;
@@ -5519,7 +5529,7 @@ Perl_re_op_compile(pTHX_ SV ** const patternp, int pat_count,
char *exp;
regnode *scan;
I32 flags;
- I32 minlen = 0;
+ SSize_t minlen = 0;
U32 rx_flags;
SV *pat;
SV *code_blocksv = NULL;
@@ -6274,7 +6284,7 @@ reStudy:
{
r->float_min_offset = data.offset_float_min - data.lookbehind_float;
r->float_max_offset = data.offset_float_max;
- if (data.offset_float_max < I32_MAX) /* Don't offset infinity */
+ if (data.offset_float_max < SSize_t_MAX) /* Don't offset infinity */
r->float_max_offset -= data.lookbehind_float;
SvREFCNT_inc_simple_void_NN(data.longest_float);
}
@@ -6477,8 +6487,8 @@ reStudy:
});
#ifdef RE_TRACK_PATTERN_OFFSETS
DEBUG_OFFSETS_r(if (ri->u.offsets) {
- const U32 len = ri->u.offsets[0];
- U32 i;
+ const STRLEN len = ri->u.offsets[0];
+ STRLEN i;
GET_RE_DEBUG_FLAGS_DECL;
PerlIO_printf(Perl_debug_log, "Offsets: [%"UVuf"]\n\t", (UV)ri->u.offsets[0]);
for (i = 1; i <= len; i++) {
@@ -6666,7 +6676,7 @@ Perl_reg_named_buff_scalar(pTHX_ REGEXP * const r, const U32 flags)
{
SV *ret;
AV *av;
- I32 length;
+ SSize_t length;
struct regexp *const rx = ReANY(r);
PERL_ARGS_ASSERT_REG_NAMED_BUFF_SCALAR;
@@ -12266,7 +12276,7 @@ S_regclass(pTHX_ RExC_state_t *pRExC_state, I32 *flagp, U32 depth,
regnode * const orig_emit = RExC_emit; /* Save the original RExC_emit in
case we need to change the emitted regop to an EXACT. */
const char * orig_parse = RExC_parse;
- const I32 orig_size = RExC_size;
+ const SSize_t orig_size = RExC_size;
GET_RE_DEBUG_FLAGS_DECL;
PERL_ARGS_ASSERT_REGCLASS;
diff --git a/regexec.c b/regexec.c
index 4a25f5c..63bee9a 100644
--- a/regexec.c
+++ b/regexec.c
@@ -369,7 +369,7 @@ S_regcppop(pTHX_ regexp *rex, U32 *maxopenparen_p)
);
paren = *maxopenparen_p;
for ( ; i > 0; i -= REGCP_PAREN_ELEMS) {
- I32 tmps;
+ SSize_t tmps;
rex->offs[paren].start_tmp = SSPOPINT;
rex->offs[paren].start = SSPOPIV;
tmps = SSPOPIV;
@@ -526,7 +526,7 @@ S_isFOO_utf8_lc(pTHX_ const U8 classnum, const U8* character)
*/
I32
Perl_pregexec(pTHX_ REGEXP * const prog, char* stringarg, char *strend,
- char *strbeg, I32 minend, SV *screamer, U32 nosave)
+ char *strbeg, SSize_t minend, SV *screamer, U32 nosave)
/* stringarg: the point in the string at which to begin matching */
/* strend: pointer to null at end of string */
/* strbeg: real beginning of string */
@@ -617,7 +617,7 @@ Perl_re_intuit_start(pTHX_
struct regexp *const prog = ReANY(rx);
SSize_t start_shift = 0;
/* Should be nonnegative! */
- I32 end_shift = 0;
+ SSize_t end_shift = 0;
char *s;
SV *check;
char *t;
@@ -687,7 +687,7 @@ Perl_re_intuit_start(pTHX_
See [perl #115242] */
{
/* Substring at constant offset from beg-of-str... */
- I32 slen;
+ SSize_t slen;
s = HOP3c(strpos, prog->check_offset_min, strend);
@@ -723,9 +723,9 @@ Perl_re_intuit_start(pTHX_
end_shift = prog->check_end_shift;
if (!ml_anch) {
- const I32 end = prog->check_offset_max + CHR_SVLEN(check)
+ const SSize_t end = prog->check_offset_max + CHR_SVLEN(check)
- (SvTAIL(check) != 0);
- const I32 eshift = CHR_DIST((U8*)strend, (U8*)s) - end;
+ const SSize_t eshift = CHR_DIST((U8*)strend, (U8*)s) - end;
if (end_shift < eshift)
end_shift = eshift;
@@ -752,7 +752,7 @@ Perl_re_intuit_start(pTHX_
{
SSize_t srch_start_shift = start_shift;
- I32 srch_end_shift = end_shift;
+ SSize_t srch_end_shift = end_shift;
U8* start_point;
U8* end_point;
if (srch_start_shift < 0 && strbeg - s > srch_start_shift) {
@@ -1555,7 +1555,7 @@ S_find_byclass(pTHX_ regexp * prog, const regnode *c, char *s,
* characters, and there are only 2 availabe, we know without
* trying that it will fail; so don't start a match past the
* required minimum number from the far end */
- e = HOP3c(strend, -((I32)ln), s);
+ e = HOP3c(strend, -((SSize_t)ln), s);
if (reginfo->intuit && e < s) {
e = s; /* Due to minlen logic of intuit() */
@@ -1601,7 +1601,7 @@ S_find_byclass(pTHX_ regexp * prog, const regnode *c, char *s,
* only 2 are left, it's guaranteed to fail, so don't start a
* match that would require us to go beyond the end of the string
*/
- e = HOP3c(strend, -((I32)lnc), s);
+ e = HOP3c(strend, -((SSize_t)lnc), s);
if (reginfo->intuit && e < s) {
e = s; /* Due to minlen logic of intuit() */
@@ -2099,7 +2099,7 @@ S_reg_set_capture_string(pTHX_ REGEXP * const rx,
{
SSize_t min = 0;
SSize_t max = strend - strbeg;
- I32 sublen;
+ SSize_t sublen;
if ( (flags & REXEC_COPY_SKIP_POST)
&& !(prog->extflags & RXf_PMf_KEEPCOPY) /* //p */
@@ -2179,7 +2179,8 @@ S_reg_set_capture_string(pTHX_ REGEXP * const rx,
* $x = "\x{100}" x 1E6; 1 while $x =~ /(.)/g;
* from going quadratic */
if (SvPOKp(sv) && SvPVX(sv) == strbeg)
- sv_pos_b2u(sv, &(prog->subcoffset));
+ prog->subcoffset = sv_pos_b2u_flags(sv, prog->subcoffset,
+ SV_GMAGIC|SV_CONST_RETURN);
else
prog->subcoffset = utf8_length((U8*)strbeg,
(U8*)(strbeg+prog->suboffset));
@@ -2202,7 +2203,7 @@ S_reg_set_capture_string(pTHX_ REGEXP * const rx,
*/
I32
Perl_regexec_flags(pTHX_ REGEXP * const rx, char *stringarg, char *strend,
- char *strbeg, I32 minend, SV *sv, void *data, U32 flags)
+ char *strbeg, SSize_t minend, SV *sv, void *data, U32 flags)
/* stringarg: the point in the string at which to begin matching */
/* strend: pointer to null at end of string */
/* strbeg: real beginning of string */
@@ -2219,8 +2220,8 @@ Perl_regexec_flags(pTHX_ REGEXP * const rx, char *stringarg, char *strend,
char *s;
regnode *c;
char *startpos;
- I32 minlen; /* must match at least this many chars */
- I32 dontbother = 0; /* how many characters not to try at end */
+ SSize_t minlen; /* must match at least this many chars */
+ SSize_t dontbother = 0; /* how many characters not to try at end */
const bool utf8_target = cBOOL(DO_UTF8(sv));
I32 multiline;
RXi_GET_DECL(prog,progi);
@@ -2603,8 +2604,8 @@ Perl_regexec_flags(pTHX_ REGEXP * const rx, char *stringarg, char *strend,
|| ((prog->float_substr != NULL || prog->float_utf8 != NULL)
&& prog->float_max_offset < strend - s)) {
SV *must;
- I32 back_max;
- I32 back_min;
+ SSize_t back_max;
+ SSize_t back_min;
char *last;
char *last1; /* Last position checked before */
#ifdef DEBUGGING
@@ -2649,7 +2650,7 @@ Perl_regexec_flags(pTHX_ REGEXP * const rx, char *stringarg, char *strend,
last = strend;
} else {
last = HOP3c(strend, /* Cannot start after this */
- -(I32)(CHR_SVLEN(must)
+ -(SSize_t)(CHR_SVLEN(must)
- (SvTAIL(must) != 0) + back_min), strbeg);
}
if (s > reginfo->strbeg)
@@ -3601,7 +3602,7 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char *startpos, regnode *prog)
regnode *scan;
regnode *next;
U32 n = 0; /* general value; init to avoid compiler warning */
- I32 ln = 0; /* len or last; init to avoid compiler warning */
+ SSize_t ln = 0; /* len or last; init to avoid compiler warning */
char *locinput = startpos;
char *pushinput; /* where to continue after a PUSH */
I32 nextchr; /* is always set to UCHARAT(locinput) */
@@ -5558,10 +5559,10 @@ NULL
if (reginfo->poscache_iter-- == 0) {
/* initialise cache */
- const I32 size = (reginfo->poscache_maxiter + 7)/8;
+ const SSize_t size = (reginfo->poscache_maxiter + 7)/8;
regmatch_info_aux *const aux = reginfo->info_aux;
if (aux->poscache) {
- if ((I32)reginfo->poscache_size < size) {
+ if ((SSize_t)reginfo->poscache_size < size) {
Renew(aux->poscache, size, char);
reginfo->poscache_size = size;
}
@@ -5579,7 +5580,7 @@ NULL
if (reginfo->poscache_iter < 0) {
/* have we already failed at this position? */
- I32 offset, mask;
+ SSize_t offset, mask;
reginfo->poscache_iter = -1; /* stop eventual underflow */
offset = (scan->flags & 0xf) - 1
@@ -7519,7 +7520,7 @@ S_reginclass(pTHX_ regexp * const prog, const regnode * const n, const U8* const
}
STATIC U8 *
-S_reghop3(U8 *s, I32 off, const U8* lim)
+S_reghop3(U8 *s, SSize_t off, const U8* lim)
{
/* return the position 'off' UTF-8 characters away from 's', forward if
* 'off' >= 0, backwards if negative. But don't go outside of position
@@ -7554,7 +7555,7 @@ S_reghop3(U8 *s, I32 off, const U8* lim)
we ifdef it out - dmq
*/
STATIC U8 *
-S_reghop4(U8 *s, I32 off, const U8* llim, const U8* rlim)
+S_reghop4(U8 *s, SSize_t off, const U8* llim, const U8* rlim)
{
dVAR;
@@ -7581,7 +7582,7 @@ S_reghop4(U8 *s, I32 off, const U8* llim, const U8* rlim)
#endif
STATIC U8 *
-S_reghopmaybe3(U8* s, I32 off, const U8* lim)
+S_reghopmaybe3(U8* s, SSize_t off, const U8* lim)
{
dVAR;
diff --git a/regexp.h b/regexp.h
index 8542cb1..928a374 100644
--- a/regexp.h
+++ b/regexp.h
@@ -40,7 +40,7 @@ struct reg_substr_datum {
SSize_t max_offset;
SV *substr; /* non-utf8 variant */
SV *utf8_substr; /* utf8 variant */
- I32 end_shift;
+ SSize_t end_shift;
};
struct reg_substr_data {
struct reg_substr_datum data[3]; /* Actual array */
@@ -63,7 +63,7 @@ typedef struct regexp_paren_pair {
* "abc" =~ /(.(?{print "[$1]"}))+/
*outputs [][a][b]
* This field is not part of the API. */
- I32 start_tmp;
+ SSize_t start_tmp;
} regexp_paren_pair;
#if defined(PERL_IN_REGCOMP_C) || defined(PERL_IN_UTF8_C)
@@ -104,9 +104,9 @@ struct reg_code_block {
/* Information about the match that the perl core uses to */ \
/* manage things */ \
U32 extflags; /* Flags used both externally and internally */ \
- I32 minlen; /* mininum possible number of chars in string to match */\
- I32 minlenret; /* mininum possible number of chars in $& */ \
- U32 gofs; /* chars left of pos that we search from */ \
+ SSize_t minlen; /* mininum possible number of chars in string to match */\
+ SSize_t minlenret; /* mininum possible number of chars in $& */ \
+ STRLEN gofs; /* chars left of pos that we search from */ \
/* substring data about strings that must appear in the */ \
/* final match, used for optimisations */ \
struct reg_substr_data *substrs; \
@@ -125,8 +125,8 @@ struct reg_code_block {
char *subbeg; \
SV_SAVED_COPY /* If non-NULL, SV which is COW from original */\
SSize_t sublen; /* Length of string pointed by subbeg */ \
- I32 suboffset; /* byte offset of subbeg from logical start of str */ \
- I32 subcoffset; /* suboffset equiv, but in chars (for @-/@+) */ \
+ SSize_t suboffset; /* byte offset of subbeg from logical start of str */ \
+ SSize_t subcoffset; /* suboffset equiv, but in chars (for @-/@+) */ \
/* Information about the match that isn't often used */ \
/* offset from wrapped to the start of precomp */ \
PERL_BITFIELD32 pre_prefix:4; \
@@ -146,7 +146,7 @@ typedef struct regexp {
typedef struct re_scream_pos_data_s
{
char **scream_olds; /* match pos */
- I32 *scream_pos; /* Internal iterator of scream. */
+ SSize_t *scream_pos; /* Internal iterator of scream. */
} re_scream_pos_data;
/* regexp_engine structure. This is the dispatch table for regexes.
@@ -155,7 +155,7 @@ typedef struct re_scream_pos_data_s
typedef struct regexp_engine {
REGEXP* (*comp) (pTHX_ SV * const pattern, U32 flags);
I32 (*exec) (pTHX_ REGEXP * const rx, char* stringarg, char* strend,
- char* strbeg, I32 minend, SV* sv,
+ char* strbeg, SSize_t minend, SV* sv,
void* data, U32 flags);
char* (*intuit) (pTHX_
REGEXP * const rx,
@@ -606,7 +606,7 @@ typedef struct {
STRLEN suboffset; /* saved suboffset field from rex */
STRLEN subcoffset; /* saved subcoffset field from rex */
MAGIC *pos_magic; /* pos() magic attached to $_ */
- I32 pos; /* the original value of pos() in pos_magic */
+ SSize_t pos; /* the original value of pos() in pos_magic */
U8 pos_flags; /* flags to be restored; currently only MGf_BYTES*/
} regmatch_info_aux_eval;
--
1.7.12.4 (Apple Git-37) |
From @cpansproutOn Sun Aug 18 16:00:47 2013, sprout wrote:
which I forgot to mention are also attached to that message. -- Father Chrysostomos |
From @iabynOn Sun, Aug 18, 2013 at 04:00:48PM -0700, Father Chrysostomos via RT wrote:
I can't really comment on the regcomp.c parts since I only really (for -- |
From @cpansproutOn Mon Aug 19 05:16:19 2013, davem wrote:
That’s unfortunate, as that is the part I am most worried about.
Thank you. (The last time you applied that adjective to one of my At the risk of changing the subject, you’ve just reminded me of something. Is there any reason why this (in regexec.c): /* extract RE object from returned value; compiling if has to come after this? S_regcp_restore(aTHX_ rex, runops_cp, &maxopenparen); Couldn’t it be merged with /* before restoring everything, evaluate the returned further up? -- Father Chrysostomos |
From @demerphqOn 19 August 2013 16:15, Father Chrysostomos via RT
I will do my best to review the regcomp.c bits. Ill try to get it done this week sometime. Yves |
From @iabynOn Mon, Aug 19, 2013 at 07:15:14AM -0700, Father Chrysostomos via RT wrote:
Well originally all the code was after the restore; I moved the extracting "abcdef" =~ m{^(a)(??{ "b" =~ /(.)/; $1 })} Whether the rest of the compiling code could be moved back as well (thus $r = qr/(??{ '\\b+' })/; here the warning generated when \b+ is is compiled, should appear to come -- |
From @khwilliamsonOn 08/18/2013 05:00 PM, Father Chrysostomos via RT wrote:
I am really familiar with a few areas of both regcomp.c and regexec.c. Given what I know, I've looked at your patch, and it looks plausible to Everything I have to say is a nit or bikeshedding: I would prefer using something other than SSize_t throughout, We mostly Around line 4135, %z is not a C89 standard format, something else should |
From @bulk88On Fri Aug 23 14:14:52 2013, public@khwilliamson.com wrote:
PerlIO_printf winds up using sv_vcatpvfn_flags in some/all (???) perl |
From @cpansproutOn Fri Aug 23 14:14:52 2013, public@khwilliamson.com wrote:
That was a mistake. I’ll change it back.
I didn’t know of any such policy. I32 is sometimes appropriate.
I think most of these distinctions are historical artefacts, and nobody
I see nothing particularly wrong with introducing SSTRLEN, apart from
Yes, they should be.
Not so critical, as it is in an #if 0 section. But you have a point. Does PerlIO_printf use the underlying C library’s (s)printf, or perl’s -- Father Chrysostomos |
From @cpansproutOn Sun Aug 18 16:00:47 2013, sprout wrote:
The branch has been merged as ed56dbc. In particular, commits 389ecb5 and 99a90e5 fix the original bug report. -- Father Chrysostomos |
@cpansprout - Status changed from 'open' to 'resolved' |
From @nwc10On Fri, Aug 23, 2013 at 02:54:05PM -0700, Father Chrysostomos via RT wrote:
I don't think that it's been properly written down in the coding standards I agree totally that I32 is sometimes appropriate. But I suspect that about 1) should this be unsigned? I32 seemed to be used as the "default type" for the first 10 years. So unless something really should be both signed, and 32 bits, I think it's Nicholas Clark * I doubt that we actually build on any platforms with 16 bit ints. |
From @nwc10On Mon, Jul 22, 2013 at 10:15:05PM -0700, Father Chrysostomos via RT wrote:
Not intentionally. See below:
Yes, I think it's safe to do this (given the lack of users, and the compiler But this has been bugging me for quite a bit. 1) LVALUE macros make it very hard to refactor the internals. You can also trigger code to run on writing the value, but avoid running I don't think that anyone is going to try to do any refactoring as *_p functions look ugly, which was sort of part of the plan. 2) Unlike functions, macros don't have any way to formally document their I don't have an answer here, and I'm not sure if I'm mentally prioritising Ultimately I think that the problem is that the use of lots of direct data Nicholas Clark |
From zefram@fysh.orgNicholas Clark wrote:
Indeed. We've been forced to change many into get/set pairs, when a value -zefram |
From @cpansproutOn Fri Aug 30 08:38:46 2013, zefram@fysh.org wrote:
Lvalue macros can be really convenient. If we protect all new macros of -- Father Chrysostomos |
From @jhiI suggest moving this extended API discussion into its own ticket |
Migrated from rt.perl.org#116907 (status was 'resolved')
Searchable as RT116907$
The text was updated successfully, but these errors were encountered: