
core tests need watchdog/timeout #14407

Open
p5pRT opened this issue Jan 8, 2015 · 5 comments

p5pRT commented Jan 8, 2015

Migrated from rt.perl.org#123565 (status was 'open')

Searchable as RT123565$


p5pRT commented Jan 8, 2015

From @jhi

I think it would be advantageous to be able to have a timeout for all the core tests.

"advantageous"​: I could argue that at least when running the default set of tests (tests run with "make test") any single test probably shouldn't run for more than, say, X minutes, where X is a reasonably low number. The vast majority of tests completes in few seconds, or even sub-second. If one really needs comprehensive tests ("all the combinations of ...", or thorough speed regressions) they probably should be behind some other test targets / triggered by some special flags/env vars etc.

"core tests" = all the tests run with the core, including the cpan/* and the dist/*.

What I know and envision:

(1) there is watchdog() in t/test.pl, which seemingly works both for forky systems and for thready systems (a minimal sketch follows this list).

(2) problems with (1):

  (a) it works only for tests that explicitly use the watchdog() from test.pl - not for cpan/*
  (b) some core tests explicitly do NOT use test.pl (t/porting/podcheck.t, at least; they cook their own)
  (c) in other words, there's no way to say that *any* test taking more than X should be interrupted

(3) if one sets the timeout, and the timeout goes off, the interrupted test probably should NOT be marked
  as a failing one IF all the results up to the timeout have been ok (or TODO) -- but if there were failures, the
  test should fail, of course.

(4) it should be very easy to tell apart "the timeout happened" from "the test failed / hung" - the most solid way would probably be to have an IPC of some sort, but that might be far too elaborate (and not work on non-POSIX systems anyway) AND interact badly with some tests - so we may need to just resort to the Good Old Filesystem?

(5) there may well be some tests which do not take kindly to being interrupted, so maybe there should be a way for a test to explicitly "opt out" itself.
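
For concreteness, here is a minimal sketch of the fork-based flavour of such a watchdog. This is illustrative only, not the actual t/test.pl code, which additionally handles threaded builds, process groups, and platforms without a working fork:

    # Minimal fork-based watchdog sketch (not the real t/test.pl code).
    sub watchdog {
        my ($timeout) = @_;
        my $victim = $$;              # the test process to guard
        my $pid    = fork;
        return unless defined $pid;   # fork failed: run unguarded
        if ($pid == 0) {
            sleep $timeout;           # watchdog child: wait it out...
            kill 'KILL', $victim;     # ...then put the test down
            exit 0;
        }
        return $pid;                  # parent: remember the watchdog
    }

    my $dog = watchdog(60);
    # ... run the tests ...
    kill 'KILL', $dog if $dog;        # disarm on a clean finish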

[One particular reason I can't deny being close to my heart is running the core tests on slower systems: either systems of yesteryear (think Sparc 1), or more modern but still underpowered systems (think RaspPi). Exhibit A: for some reason, some tests that normally run in seconds, or at most dozens of seconds, can take dozens of *minutes* on such systems. Ideally, at least in some cases, the slowness would actually indicate something that could be fixed in Perl, or at least in the test. It may be CPU, it may be memory, it may be I/O. But realistically, that is probably not going to happen unless it's something very trivial. Better to have a way to cut short tests that are taking just too long.]

When faced with thousands of tests to run, some of them very slow, people can say (and they do) "why don't you just throw more CPUs/cores/GHz at it", but that does no good if you don't have more of them to throw around.

From my quick research, I can see that at some point dual-lifing test.pl was discussed, and the watchdog mentioned: http://comments.gmane.org/gmane.comp.lang.perl.perl5.porters/60582

Note: I won't address this ticket myself. Tuit drought.


p5pRT commented Jan 8, 2015

From @cpansprout

On Wed Jan 07 18:16:43 2015, jhi wrote:

> Tuit drought.

I have plenty of tuits, but most of them are the wrong shape. It’s only the round ones that count.

--

Father Chrysostomos


p5pRT commented Jan 8, 2015

The RT System itself - Status changed from 'new' to 'open'


p5pRT commented Jan 8, 2015

From @bulk88

On Wed Jan 07 18:16:43 2015, jhi wrote:

> I think it would be advantageous to be able to have a timeout for all
> the core tests.
>
> "advantageous": I could argue that at least when running the default
> set of tests (tests run with "make test") any single test probably
> shouldn't run for more than, say, X minutes, where X is a reasonably
> low number. The vast majority of tests completes in few seconds, or
> even sub-second. If one really needs comprehensive tests ("all the
> combinations of ...", or thorough speed regressions) they probably
> should be behind some other test targets / triggered by some special
> flags/env vars etc.

Why? More human productivity from less waiting before the patch is declared done?

> What I know and envision:
>
> (1) there is watchdog() in t/test.pl, which seemingly works both for
> forky systems and for thready systems.
>
> (2) problems with (1):
>
> (a) it works only for tests that explicitly use the watchdog() from
> test.pl - not for cpan/*
> (b) some core tests explicitly do NOT use test.pl

This timeout feature sounds like it should be an ENV var for Test::More, and the watchdog feature copied from test.pl into Test::More.
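
Roughly, the idea is something like the sketch below. PERL_TEST_TIMEOUT is an invented name for this sketch; Test::More has no such variable today:

    # Sketch only: if Test::More checked an env var at load time,
    # every .t would get a watchdog for free. PERL_TEST_TIMEOUT is
    # an invented name, not an existing Test::More feature.
    if (my $limit = $ENV{PERL_TEST_TIMEOUT}) {
        $SIG{ALRM} = sub {
            print "Bail out! timed out after ${limit}s\n";  # TAP bail-out line
            exit 1;
        };
        alarm $limit;
    }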

> (t/porting/podcheck.t, at least, they cook their own)
> (c) in other words, there's no way to say that *any* test taking more
> than X should be interrupted
>
> (3) if one sets the timeout, and the timeout goes off, the interrupted
> test probably should NOT be marked
> as a failing one IF all the results until the timeout have been ok
> (or TODO) -- but there were failures, the
> test should fail, of course.

killpg, which watchdog uses, was broken for a while on Win32 (https://rt.perl.org/Ticket/Display.html?id=121230). I saw some interesting lockups/deadlocks in Win32 kernel mode at that time but never investigated them, since that involves debugging a closed-source OS kernel. One of those hangs in the core test suite was probably https://rt-archive.perl.org/perl5/Ticket/Display.html?id=123402, but I didn't know it at the time. I would say a lot of watchdog's uses are a mistake that covers up core bugs, unless the test explicitly is "test that sub/func hangs forever and does not error instantly". With a generic 60-second timer for the whole .t, you don't know where the .t was (what "ok 1234" line), and IIRC some files don't print their test count until the end. So where the test file was stuck when the watchdog killed the process is random, unknown, and hides new bugs if the hang point suddenly shifts in the .t (and that shifting IS a bug).
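
For what it's worth, an in-process alarm() handler, unlike a forked killer, could at least report how far the .t got when the timeout fired. current_test() is a real Test::Builder method; the rest is a sketch, not existing behaviour:

    use Test::More;

    # Sketch: report the last test number seen when the timeout fires,
    # so a shifting hang point stays visible instead of hidden.
    $SIG{ALRM} = sub {
        my $last = Test::More->builder->current_test;
        print "Bail out! watchdog fired after test $last\n";
        exit 1;
    };
    alarm 60;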

> (4) it should be very clear to tell apart "the timeout happened" from
> "test failed / hung" - the most solid way probably would be to have an
> IPC of some sort, but that might be far too elaborate (and not work in
> non-POSIX anyway) AND interact badly with some tests - so may need to
> just to resort to the Good Old Filesystem?

Whose timeout is this? The harness process or the .t process? If the timeout fires, it means there was a hang, right?

> (5) there may well be some tests which do not take kindly to being
> interrupted, so maybe there should be a way for a test to explicitly
> "opt out" itself.

Leaving a lot of non-git-ignored temp files behind is what happens when .t files die/SEGV/ctrl-c early.

> [One particular reason I can't deny being close to my heart is running
> the core tests in slower systems: either systems of yesteryear (think
> Sparc 1), or more modern but still underpowered systems (think
> RaspPi). Exhibit A: for some reason, some tests that run in seconds
> or at most dozens of them, can take dozens of *minutes* in such
> systems. Ideally, at least in some cases, the slowness actually would
> indicate something that could be fixed in Perl, or at least in the
> test. It may be CPU, it may be memory, it may be I/O. But
> realistically, that is probably not going to happen, unless it's
> something very trivial. Better to have a way to cut short tests that
> are taking just too long.]

A curated list of tests would be best. Since "minitest" is used, why not call it "fasttest"? I'd take minitest and add porting/*.t to the list, giving "base/*.t comp/*.t cmd/*.t run/*.t io/*.t re/*.t opbasic/*.t op/*.t uni/*.t perf/*.t pragma/*.t porting/*.t" as the harness args. My wild-ass guess is that 95% of test failures are in comp, op and porting. Maybe toss in B/Concise too.

--
bulk88 ~ bulk88 at hotmail.com


p5pRT commented Jan 9, 2015

From @jhi

"advantageous"​: I could argue that at least when running the default
set of tests (tests run with "make test") any single test probably
shouldn't run for more than, say, X minutes, where X is a reasonably
low number. The vast majority of tests completes in few seconds, or
even sub-second. If one really needs comprehensive tests ("all the
combinations of ...", or thorough speed regressions) they probably
should be behind some other test targets / triggered by some special
flags/env vars etc.

Why? more human productivity by less waiting before the patch is declared done?

I was more thinking of faster smoke turnaround times when I wrote
that, but sure, not having to wait for several hours (yes, really)
for "make test" makes for much more rewarding debugging.

> This timeout feature sounds like it should be an ENV var for Test::More and the watchdog feature copied from test.pl into Test::More.

Whatever works. Now it seems like the pieces of the solution exist
(well, most of them, anyway), but they are rather haphazardly
distributed.

> Whose timeout is this? harness process or .t process? If the timeout fires, it means there was a hang right?

Well, that is one of the design problems (whose timeout it is). If it
is the harness (or, in fact, TEST; even "minitest" contains evil
tests), then the timeout can be better monitored. But still it is not
trivial: if the watchdog is a forked-off process off of harness/TEST,
how does the main harness/TEST know for certain that the timeout
interrupted the test?
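
One low-tech answer, in the spirit of the "Good Old Filesystem" from point (4) of the original report: the watchdog leaves a marker file before killing, and harness/TEST checks for it afterwards. A sketch only; the "$0.timed-out" / "$test_file.timed-out" naming is invented here:

    # Variation on the fork-based watchdog sketch earlier in this
    # thread: leave a breadcrumb before the kill, so the parent can
    # tell "timed out" from "hung/crashed".
    sub watchdog_with_marker {
        my ($timeout) = @_;
        my $victim = $$;
        my $pid = fork;
        return unless defined $pid;
        if ($pid == 0) {
            sleep $timeout;
            if (open my $fh, '>', "$0.timed-out") {   # invented naming
                print $fh "watchdog fired after ${timeout}s\n";
                close $fh;
            }
            kill 'KILL', $victim;
            exit 0;
        }
        return $pid;
    }

    # Later, in harness/TEST, after reaping the test
    # ($test_file is assumed to hold the .t's path):
    my $timed_out = -e "$test_file.timed-out";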

> A curated list of tests would be best. Since "minitest" is used, why not call it "fasttest"? I take minitest and add porting/*.t to the list for "base/*.t comp/*.t cmd/*.t run/*.t io/*.t re/*.t opbasic/*.t op/*.t uni/*.t perf/*.t pragma/*.t porting/*.t" as the harness args. As my wild ass guess 95% of test failures are in comp, op and
> porting. Maybe toss in B/Concise too.

Some of those tests are in fact of the "very very bad" category. E.g.
t/re/uniprops.t is a horrible grinder on slower systems. See
http://code.activestate.com/lists/perl5-porters/214852/

I don't like the "fasttest" idea, since it's not really about running
the whole test suite "fast" (make minitest is good enough for that
purpose), but more about making sure certain outliers don't hog the
testing resources.

Don't like curation, either, because oftentimes an innocuous change
"over here" can cause a slowdown "over there". In other words, the
list would be quickly out of date. Much better to notice that "hey,
why are all these tests suddenly taking more than X minutes, they used
to run in seconds".

Maybe the default should still be "no watchdog", given that the vast
majority of current systems can handle even evil nasty tests in less
than a minute. But, say, if the watchdog env var is set (therefore
being an explicit choice), then interrupt any test after that many
seconds. And if a test has only produced "ok" results up to the
moment of its interruption, do not call it "failed" but "timeouted".
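
As a sketch, with $timed_out coming from whatever mechanism answers point (4) and $saw_failures derived from the partial TAP output (both assumed names), the verdict logic would be about this simple:

    # Classify a finished/interrupted .t run; inputs are assumptions.
    my $verdict = !$timed_out   ? 'normal'
                : $saw_failures ? 'FAILED'
                :                 'timed out';  # all ok/TODO before the kill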

If you have a slower system, let me also advertise "cd t && env
HARNESS_TIMER=1 ../perl -I../lib harness" (IIRC).


--
There is this special biologist word we use for 'stable'. It is
'dead'. -- Jack Cohen
