Merge bitcoin/bitcoin#22249: test: kill process group to avoid dangling processes when using `--failfast`

451b96f7d2 test: kill process group to avoid dangling processes (S3RK)

Pull request description:

  This is an alternative to #19281

  This PR fixes a problem where, after a test failure with the `--failfast` option, dangling nodes could be left running. These nodes continue to occupy RPC/P2P ports on the machine and cause subsequent test failures.
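To see why a leftover node breaks later tests, here is a minimal sketch using a hypothetical `port_is_free()` helper (not part of the test framework). It assumes TCP on `127.0.0.1`. While another process is still listening on a port, binding that port again fails with `EADDRINUSE`, which is the kind of clash a dangling node causes for the next test that wants the same RPC/P2P port.

```python
import errno
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Hypothetical helper: True if a new TCP socket could bind host:port now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError as e:
            if e.errno == errno.EADDRINUSE:
                return False  # someone (e.g. a dangling node) still holds it
            raise
        return True


if __name__ == "__main__":
    # Simulate a dangling node: keep a listener open on an OS-chosen port.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen()
    busy_port = listener.getsockname()[1]
    print(port_is_free(busy_port))  # False while the "node" is alive
    listener.close()
```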

  If any dangling nodes are left at the end of the test run, we kill the whole process group.
  Pros: the operation is immediate and won't lead to a CI timeout
  Cons: the test_runner process is also killed and its exit code is 137 (128 + 9, the signal number of SIGKILL)
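The trade-off above can be reproduced with a small POSIX-only sketch. The child script is a hypothetical stand-in for `test_runner.py` and `sleep` stands in for a node; the child cleans up with the same `os.killpg(os.getpgid(0), signal.SIGKILL)` call as the patch. It runs in its own session (`start_new_session=True`) so the group kill cannot reach the process running the demo.

```python
import signal
import subprocess
import sys

# Hypothetical stand-in for test_runner.py: start a long-running "node"
# (`sleep`), then SIGKILL the whole process group, as the patch does.
# The node inherits the stdout pipe, so communicate() below only sees EOF
# once the node has died too.
RUNNER = r"""
import os, signal, subprocess
subprocess.Popen(["sleep", "60"])
print("node started", flush=True)
os.killpg(os.getpgid(0), signal.SIGKILL)
print("unreachable", flush=True)
"""


def run_runner():
    # start_new_session=True gives the child its own process group, so its
    # killpg() cannot reach this demo process.
    proc = subprocess.Popen(
        [sys.executable, "-c", RUNNER],
        stdout=subprocess.PIPE,
        text=True,
        start_new_session=True,
    )
    # Would raise TimeoutExpired if the "node" had survived the group kill.
    out, _ = proc.communicate(timeout=10)
    return proc.returncode, out


if __name__ == "__main__":
    code, out = run_runner()
    print(code, repr(out))  # -9 'node started\n'
```

The child never prints `unreachable`: it is a member of the group it signals, so it dies along with its node. `subprocess` reports death by signal N as `-N`, hence `-9`; a shell reports the same event as `137`.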

  Example output:
  ```
  ...
  Early exiting after test failure

  TEST                           | STATUS    | DURATION

  rpc_decodescript.py            | ✓ Passed  | 2 s
  rpc_deprecated.py              | ✓ Passed  | 2 s
  rpc_deriveaddresses.py         | ✓ Passed  | 2 s
  rpc_dumptxoutset.py            | ✖ Failed  | 2 s

  ALL                            | ✖ Failed  | 8 s (accumulated)
  Runtime: 4 s

  Killed: 9
  > echo $?
  137
  ```
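The `137` above follows the POSIX shell convention of reporting death by signal N as 128 + N, and SIGKILL is signal 9. A minimal sketch of that conversion, with hypothetical helper names:

```python
import signal


def shell_exit_status(returncode: int) -> int:
    """Convert a subprocess returncode into the value a POSIX shell shows in $?.

    subprocess reports "terminated by signal N" as -N; shells report 128 + N.
    """
    return 128 - returncode if returncode < 0 else returncode


def describe(status: int) -> str:
    """Heuristic only: a program may also call exit() with a value above 128."""
    if status > 128:
        try:
            return f"killed by {signal.Signals(status - 128).name}"
        except ValueError:
            pass  # not a valid signal number on this platform
    return f"exited with {status}"


if __name__ == "__main__":
    print(shell_exit_status(-signal.SIGKILL))  # 137
    print(describe(137))  # killed by SIGKILL
```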

ACKs for top commit:
  MarcoFalke:
    review ACK 451b96f7d2
  aitorjs:
    ACK 451b96f7d2. Manual testing with and without **--failfast**.

Tree-SHA512: 87e510a1411b9e7571e63cf7ffc8b9a8935daf9112ffc0f069d6c406ba87743ec439808181f7e13cb97bb200fad528589786c47f0b43cf3a2ef0d06a23cb86dd
MarcoFalke 3 years ago
commit 0844084c13
GPG Key ID: CE2B75697E69A548

@@ -19,6 +19,7 @@ import datetime
 import os
 import time
 import shutil
+import signal
 import subprocess
 import sys
 import tempfile
@@ -548,9 +549,11 @@ def run_tests(*, test_list, src_dir, build_dir, tmpdir, jobs=1, enable_coverage=
     all_passed = all(map(lambda test_result: test_result.was_successful, test_results)) and coverage_passed

-    # This will be a no-op unless failfast is True in which case there may be dangling
-    # processes which need to be killed.
-    job_queue.kill_and_join()
+    # Clean up dangling processes if any. This may only happen with --failfast option.
+    # Killing the process group will also terminate the current process but that is
+    # not an issue
+    if len(job_queue.jobs):
+        os.killpg(os.getpgid(0), signal.SIGKILL)

     sys.exit(not all_passed)
@@ -647,16 +650,6 @@ class TestHandler:
                 print('.', end='', flush=True)
             dot_count += 1

-    def kill_and_join(self):
-        """Send SIGKILL to all jobs and block until all have ended."""
-        procs = [i[2] for i in self.jobs]
-
-        for proc in procs:
-            proc.kill()
-
-        for proc in procs:
-            proc.wait()
-

 class TestResult():
     def __init__(self, name, status, time):