ubi: wl: Close down wear-leveling before nand is suspended
authorMårten Lindahl <marten.lindahl@axis.com>
Sat, 7 Sep 2024 19:28:26 +0000 (21:28 +0200)
committerRichard Weinberger <richard@nod.at>
Thu, 14 Nov 2024 17:46:04 +0000 (18:46 +0100)
commit5580cdae05aefa96deebd7f5ade9d70c92adabd7
tree5e7b8168c094cad829768a0715c994509db76215
parentcb33ade753a6e2d629b4f87d2375ee9063d21bc7
ubi: wl: Close down wear-leveling before nand is suspended

If a reboot/shutdown signal with double force (-ff) is triggered when
the erase worker or wear-leveling worker function runs we may end up in
a race condition since the MTD device gets a reboot notification and
suspends the nand flash before the erase or wear-leveling is done. This
will reject all accesses to the flash with -EBUSY.

Sequence for the erase worker function:

   systemctl reboot -ff           ubi_thread

                                do_work
 __do_sys_reboot
   blocking_notifier_call_chain
     mtd_reboot_notifier
       nand_shutdown
         nand_suspend
                                  __erase_worker
                                    ubi_sync_erase
                                      mtd_erase
                                        nand_erase_nand

                                          # Blocked by suspended chip
                                          nand_get_device
                                            => EBUSY

Similar sequence for the wear-leveling function:

   systemctl reboot -ff           ubi_thread

                                do_work
 __do_sys_reboot
   blocking_notifier_call_chain
     mtd_reboot_notifier
       nand_shutdown
         nand_suspend
                                  wear_leveling_worker
                                    ubi_eba_copy_leb
                                      ubi_io_write
                                        mtd_write
                                          nand_write_oob

                                            # Blocked by suspended chip
                                            nand_get_device
                                              => EBUSY

 systemd-shutdown[1]: Rebooting.
 ubi0 error: ubi_io_write: error -16 while writing 2048 bytes to PEB
 CPU: 1 PID: 82 Comm: ubi_bgt0d Kdump: loaded Tainted: G           O
 (unwind_backtrace) from [<80107b9f>] (show_stack+0xb/0xc)
 (show_stack) from [<8033641f>] (dump_stack_lvl+0x2b/0x34)
 (dump_stack_lvl) from [<803b7f3f>] (ubi_io_write+0x3ab/0x4a8)
 (ubi_io_write) from [<803b817d>] (ubi_io_write_vid_hdr+0x71/0xb4)
 (ubi_io_write_vid_hdr) from [<803b6971>] (ubi_eba_copy_leb+0x195/0x2f0)
 (ubi_eba_copy_leb) from [<803b939b>] (wear_leveling_worker+0x2ff/0x738)
 (wear_leveling_worker) from [<803b86ef>] (do_work+0x5b/0xb0)
 (do_work) from [<803b9ee1>] (ubi_thread+0xb1/0x11c)
 (ubi_thread) from [<8012c113>] (kthread+0x11b/0x134)
 (kthread) from [<80100139>] (ret_from_fork+0x11/0x38)
 Exception stack(0x80c43fb0 to 0x80c43ff8)
 ...
 ubi0 error: ubi_dump_flash: err -16 while reading 2048 bytes from PEB
 ubi0 error: wear_leveling_worker: error -16 while moving PEB 246 to PEB
 ubi0 warning: ubi_ro_mode.part.0: switch to read-only mode
 ...
 ubi0 error: do_work: work failed with error code -16
 ubi0 error: ubi_thread: ubi_bgt0d: work failed with error code -16
 ...
 Kernel panic - not syncing: Software Watchdog Timer expired

Add a reboot notification for the ubi/wear-leveling to shutdown any
potential flash work actions before the nand is suspended.

Signed-off-by: Mårten Lindahl <marten.lindahl@axis.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
drivers/mtd/ubi/ubi.h
drivers/mtd/ubi/wl.c