selftests/eeh: Bump EEH wait time to 60s
author: Oliver O'Halloran <oohall@gmail.com>
Wed, 22 Jan 2020 03:11:25 +0000 (14:11 +1100)
committer: Michael Ellerman <mpe@ellerman.id.au>
Sat, 25 Jan 2020 13:11:37 +0000 (00:11 +1100)
Some newer cards supported by aacraid can take up to 40s to recover
after an EEH event. This causes spurious failures in the basic EEH
self-test since the current maximum timeout is only 30s.

Fix the immediate issue by bumping the default timeout to 60s, and
allow the wait time to be overridden via an environment variable
(EEH_MAX_WAIT).
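
The override relies on the shell's ${VAR:=default} expansion, which
assigns the default only when the variable is unset or empty. A minimal
sketch of that behaviour follows; resolve_max_wait is a hypothetical
helper used only for illustration and is not part of the patch:

```shell
#!/bin/sh
# Hypothetical helper (not in eeh-functions.sh): prints the effective
# wait time using the same expansion as the patch.
resolve_max_wait() {
    # ${VAR:=60} yields 60 (and assigns it) when VAR is unset or empty,
    # otherwise it yields the caller-supplied value unchanged.
    echo "${EEH_MAX_WAIT:=60}"
}

# Each case runs in its own subshell so the assignments do not leak.
default_wait=$(unset EEH_MAX_WAIT; resolve_max_wait)
empty_wait=$(EEH_MAX_WAIT=; resolve_max_wait)
override_wait=$(EEH_MAX_WAIT=120; resolve_max_wait)

echo "unset:    ${default_wait}s"
echo "empty:    ${empty_wait}s"
echo "override: ${override_wait}s"
```

In practice the variable would be set in the environment of whichever
script sources eeh-functions.sh, for example by prefixing the test
invocation with EEH_MAX_WAIT=120.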

Reported-by: Steve Best <sbest@redhat.com>
Suggested-by: Douglas Miller <dougmill@us.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200122031125.25991-1-oohall@gmail.com
tools/testing/selftests/powerpc/eeh/eeh-functions.sh

index 26112ab..f52ed92 100755 (executable)
@@ -53,9 +53,13 @@ eeh_one_dev() {
        # is a no-op.
        echo $dev >/sys/kernel/debug/powerpc/eeh_dev_check
 
-       # Enforce a 30s timeout for recovery. Even the IPR, which is infamously
-       # slow to reset, should recover within 30s.
-       max_wait=30
+       # Default to a 60s timeout when waiting for a device to recover. This
+       # is an arbitrary default which can be overridden by setting the
+       # EEH_MAX_WAIT environmental variable when required.
+
+       # The current record holder for longest recovery time is:
+       #  "Adaptec Series 8 12G SAS/PCIe 3" at 39 seconds
+       max_wait=${EEH_MAX_WAIT:=60}
 
        for i in `seq 0 ${max_wait}` ; do
                if pe_ok $dev ; then