Merge bitcoin/bitcoin#22249: test: kill process group to avoid dangling processes when using `--failfast`

451b96f7d2 test: kill process group to avoid dangling processes (S3RK)

Pull request description:

  This is an alternative to #19281

  This PR fixes a problem when after test failure with `--failfast` option there could be dangling nodes. The nodes will continue to occupy rpc/p2p ports on the machine and will cause further test failures.

  If there are any dangling nodes left at the end of the test run we kill the whole process group.
  Pros: the operations is immediate and won't lead to CI timeout
  Cons: the test_runner process is also killed and exit code is 137

  Example output:
  ```
  ...
  Early exiting after test failure

  TEST                           | STATUS    | DURATION

  rpc_decodescript.py            | ✓ Passed  | 2 s
  rpc_deprecated.py              | ✓ Passed  | 2 s
  rpc_deriveaddresses.py         | ✓ Passed  | 2 s
  rpc_dumptxoutset.py            | ✖ Failed  | 2 s

  ALL                            | ✖ Failed  | 8 s (accumulated)
  Runtime: 4 s

  Killed: 9
  > echo $?
  137
  ```

ACKs for top commit:
  MarcoFalke:
    review ACK 451b96f7d2
  aitorjs:
    ACK 451b96f7d2. Manual testing with and without **--failfast**.

Tree-SHA512: 87e510a1411b9e7571e63cf7ffc8b9a8935daf9112ffc0f069d6c406ba87743ec439808181f7e13cb97bb200fad528589786c47f0b43cf3a2ef0d06a23cb86dd
pull/826/head
MarcoFalke 3 years ago
commit 0844084c13
No known key found for this signature in database
GPG Key ID: CE2B75697E69A548

@ -19,6 +19,7 @@ import datetime
import os import os
import time import time
import shutil import shutil
import signal
import subprocess import subprocess
import sys import sys
import tempfile import tempfile
@ -548,9 +549,11 @@ def run_tests(*, test_list, src_dir, build_dir, tmpdir, jobs=1, enable_coverage=
all_passed = all(map(lambda test_result: test_result.was_successful, test_results)) and coverage_passed all_passed = all(map(lambda test_result: test_result.was_successful, test_results)) and coverage_passed
# This will be a no-op unless failfast is True in which case there may be dangling # Clean up dangling processes if any. This may only happen with --failfast option.
# processes which need to be killed. # Killing the process group will also terminate the current process but that is
job_queue.kill_and_join() # not an issue
if len(job_queue.jobs):
os.killpg(os.getpgid(0), signal.SIGKILL)
sys.exit(not all_passed) sys.exit(not all_passed)
@ -647,16 +650,6 @@ class TestHandler:
print('.', end='', flush=True) print('.', end='', flush=True)
dot_count += 1 dot_count += 1
def kill_and_join(self):
"""Send SIGKILL to all jobs and block until all have ended."""
procs = [i[2] for i in self.jobs]
for proc in procs:
proc.kill()
for proc in procs:
proc.wait()
class TestResult(): class TestResult():
def __init__(self, name, status, time): def __init__(self, name, status, time):

Loading…
Cancel
Save