core tests need watchdog/timeout #14407
Comments
From @jhi

I think it would be advantageous to have a timeout for all the core tests.

"advantageous": I could argue that, at least when running the default set of tests (the tests run by "make test"), any single test probably shouldn't run for more than, say, X minutes, where X is a reasonably low number. The vast majority of tests complete in a few seconds, or even sub-second. If one really needs comprehensive tests ("all the combinations of ...", or thorough speed regressions), those should probably be behind some other test targets / triggered by special flags, env vars, etc.

"core tests" = all the tests run with the core, including cpan/* and dist/*.

What I know and envision:

(1) There is watchdog() in t/test.pl, which seemingly works both for forky systems and for thready systems.
(2) Problems with (1): (a) it works only for tests that explicitly use the watchdog() from test.pl, not for cpan/*.
(3) If one sets the timeout and the timeout goes off, the interrupted test probably should NOT be marked as a failure.
(4) It should be very easy to tell apart "the timeout happened" from "the test failed / hung". The most solid way would probably be an IPC mechanism of some sort, but that might be far too elaborate (and would not work on non-POSIX systems anyway) AND might interact badly with some tests, so we may need to just resort to the Good Old Filesystem?
(5) There may well be some tests which do not take kindly to being interrupted, so maybe there should be a way for a test to explicitly "opt out".

[One particular reason I can't deny being close to my heart is running the core tests on slower systems: either systems of yesteryear (think Sparc 1), or more modern but still underpowered systems (think RaspPi). Exhibit A: for some reason, some tests that normally run in seconds, or at most dozens of seconds, can take dozens of *minutes* on such systems. Ideally, at least in some cases, the slowness would actually indicate something that could be fixed in Perl, or at least in the test. It may be CPU, it may be memory, it may be I/O. But realistically, that is probably not going to happen unless it's something very trivial. Better to have a way to cut short tests that are taking just too long.]

When faced with thousands of tests to run, some of them very slow, people can say (and they do) "why don't you just throw more CPUs/cores/GHz at it", but that does no good if you don't have more of them to throw around.

From my quick research I can see that dual-lifing test.pl was discussed at some point, and the watchdog was mentioned: http://comments.gmane.org/gmane.comp.lang.perl.perl5.porters/60582

Note: I won't address this ticket myself. Tuit drought.
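A minimal sketch of the kind of per-test watchdog described above, assuming a "forky" system where alarm()/SIGALRM works; the PERL_TEST_TIMEOUT variable and the marker-file convention are invented here for illustration and are not part of t/test.pl:

```perl
use strict;
use warnings;

# Sketch only: abort a runaway test after $timeout seconds and leave a
# marker file behind, so the harness can tell "timed out" apart from
# "failed / hung" via the Good Old Filesystem rather than IPC.
sub start_watchdog {
    my ($timeout, $marker_file) = @_;
    return unless $timeout && $timeout > 0;   # no timeout given: the test opts out

    $SIG{ALRM} = sub {
        if (open my $fh, '>', $marker_file) {
            print {$fh} "watchdog fired after ${timeout}s in $0\n";
            close $fh;
        }
        print "# watchdog timeout (${timeout}s), aborting $0\n";
        exit 1;
    };
    alarm $timeout;
}

# Typical use at the top of a .t file (both names are hypothetical):
start_watchdog($ENV{PERL_TEST_TIMEOUT}, "watchdog-marker-$$.tmp");
```

A real implementation would also need a variant for thready systems, and the harness would have to check for the marker file after the child exits in order to report "timed out" rather than "failed".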
From @cpansprout

On Wed Jan 07 18:16:43 2015, jhi wrote:
I have plenty of tuits, but most of them are the wrong shape. It’s only the round ones that count. -- Father Chrysostomos
The RT System itself - Status changed from 'new' to 'open'
From @bulk88

On Wed Jan 07 18:16:43 2015, jhi wrote:
Why? More human productivity from less waiting before the patch is declared done?
This timeout feature sounds like it should be an ENV var for Test::More and the watchdog feature copied from test.pl into Test::More.
killpg(), which watchdog() uses, was broken for a while on Win32 (https://rt.perl.org/Ticket/Display.html?id=121230). I saw some interesting lockups/deadlocks in Win32 kernel mode at that time but never investigated them, since that involves debugging a closed-source OS kernel. One of those hangs in the core test suite was probably https://rt-archive.perl.org/perl5/Ticket/Display.html?id=123402, but I didn't know it at the time. I would say a lot of watchdog's uses are a mistake that covers up core bugs, unless the test is explicitly "test that this sub/func hangs forever rather than erroring instantly". But with a generic 60-second timer for the whole .t, you don't know where the .t was (which "ok 1234" line), and IIRC some files don't print their test count until the end. So where the test file was stuck when the watchdog killed the process is random and unknown, and it hides new bugs if the hang point suddenly shifts within the .t (and that shifting IS a bug).
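A rough sketch of what the ENV-var idea could look like from inside a Test::More-based .t, including a report of how far the file got before the timer fired; TEST_MORE_TIMEOUT is an invented name (Test::More has no such built-in feature), and this again relies on alarm(), which is exactly the mechanism that was unreliable on Win32:

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical: honour a timeout from the environment and report the last
# completed test number, so a timeout is not mistaken for an ordinary
# failure and the hang point is not a mystery.
if (my $timeout = $ENV{TEST_MORE_TIMEOUT}) {
    $SIG{ALRM} = sub {
        my $last = Test::More->builder->current_test;
        diag("watchdog: timed out after ${timeout}s; last completed test was #$last");
        BAIL_OUT("watchdog timeout");
    };
    alarm $timeout;
}

ok(1, 'a fast test');
# ... a hanging or very slow test would be interrupted here ...
done_testing();
```

BAIL_OUT stops the whole harness run, which may be too drastic; a diag() followed by exit would be a gentler choice if only the one file should be cut short.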
Whose timeout is this? The harness process's or the .t process's? If the timeout fires, it means there was a hang, right?
Leaving a lot of non-git-ignored temp files behind is what happens when .t files die/SEGV/get ctrl-c'd early.
A curated list of tests would be best. Since "minitest" is already taken as a name, why not call this one "fasttest"? I would take minitest's list and add porting/*.t, giving "base/*.t comp/*.t cmd/*.t run/*.t io/*.t re/*.t opbasic/*.t op/*.t uni/*.t perf/*.t pragma/*.t porting/*.t" as the harness args. As a wild-ass guess, 95% of test failures are in comp, op, and porting. Maybe toss in B/Concise too.

--
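There is no "fasttest" make target today, but the curated list above can be run by hand from the t/ directory with the usual harness invocation that perlhack documents (the file list is exactly bulk88's):

```
cd t
./perl -I../lib harness base/*.t comp/*.t cmd/*.t run/*.t io/*.t re/*.t \
    opbasic/*.t op/*.t uni/*.t perf/*.t pragma/*.t porting/*.t
```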
From @jhi
I was more thinking of faster smoke turnaround times when I wrote
Whatever works. Now it seems like the pieces of the solution exist
Well, that is one of the design problems (whose timeout it is). If it
Some of those tests are in fact of the "very very bad" category. E.g.
I don't like the "fasttest" idea since it's not really about running the
Don't like curation, either, because oftentimes an innocuous change
Maybe the default should still be "no watchdog", given that vast
If you have a slower system, let me also advertise the "cd t && env

--
Migrated from rt.perl.org#123565 (status was 'open')
Searchable as RT123565$