ipmi: fix long wait in unload when IPMI disconnect
authorZhang Yuchen <zhangyuchen.lcr@bytedance.com>
Fri, 7 Oct 2022 09:26:16 +0000 (17:26 +0800)
committerCorey Minyard <cminyard@mvista.com>
Mon, 17 Oct 2022 14:51:27 +0000 (09:51 -0500)
When fixing the problem mentioned in PATCH1, we also found
the following problem:

If the IPMI is disconnected and in the sending process, the
uninstallation driver will be stuck for a long time.

The main problem is that uninstalling the driver waits for curr_msg to
be sent or HOSED. After stopping tasklet, the only place to trigger the
timeout mechanism is the circular poll in shutdown_smi.

The poll function delays 10us and calls smi_event_handler(smi_info,10).
Smi_event_handler deducts 10us from kcs->ibf_timeout.

But the poll func is followed by schedule_timeout_uninterruptible(1).
The time consumed here is not counted in kcs->ibf_timeout.

So when 10us is deducted from kcs->ibf_timeout, at least 1 jiffies has
actually passed. The waiting time has increased by more than a
hundredfold.

Now instead of calling poll(). call smi_event_handler() directly and
calculate the elapsed time.

For verification, you can directly use ebpf to check the kcs->
ibf_timeout for each call to kcs_event() when IPMI is disconnected.
Decrement at normal rate before unloading. The decrement rate becomes
very slow after unloading.

  $ bpftrace -e 'kprobe:kcs_event {printf("kcs->ibftimeout : %d\n",
      *(arg0+584));}'

Signed-off-by: Zhang Yuchen <zhangyuchen.lcr@bytedance.com>
Message-Id: <20221007092617.87597-3-zhangyuchen.lcr@bytedance.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Cc: stable@vger.kernel.org
drivers/char/ipmi/ipmi_si_intf.c

index 6e357ad..abddd7e 100644 (file)
@@ -2153,6 +2153,20 @@ skip_fallback_noirq:
 }
 module_init(init_ipmi_si);
 
+static void wait_msg_processed(struct smi_info *smi_info)
+{
+       unsigned long jiffies_now;
+       long time_diff;
+
+       while (smi_info->curr_msg || (smi_info->si_state != SI_NORMAL)) {
+               jiffies_now = jiffies;
+               time_diff = (((long)jiffies_now - (long)smi_info->last_timeout_jiffies)
+                    * SI_USEC_PER_JIFFY);
+               smi_event_handler(smi_info, time_diff);
+               schedule_timeout_uninterruptible(1);
+       }
+}
+
 static void shutdown_smi(void *send_info)
 {
        struct smi_info *smi_info = send_info;
@@ -2187,16 +2201,13 @@ static void shutdown_smi(void *send_info)
         * in the BMC.  Note that timers and CPU interrupts are off,
         * so no need for locks.
         */
-       while (smi_info->curr_msg || (smi_info->si_state != SI_NORMAL)) {
-               poll(smi_info);
-               schedule_timeout_uninterruptible(1);
-       }
+       wait_msg_processed(smi_info);
+
        if (smi_info->handlers)
                disable_si_irq(smi_info);
-       while (smi_info->curr_msg || (smi_info->si_state != SI_NORMAL)) {
-               poll(smi_info);
-               schedule_timeout_uninterruptible(1);
-       }
+
+       wait_msg_processed(smi_info);
+
        if (smi_info->handlers)
                smi_info->handlers->cleanup(smi_info->si_sm);