block: Hold invalidate_lock in BLKZEROOUT ioctl
author Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Tue, 9 Nov 2021 10:47:23 +0000 (19:47 +0900)
committer Jens Axboe <axboe@kernel.dk>
Tue, 9 Nov 2021 19:41:12 +0000 (12:41 -0700)
When a BLKZEROOUT ioctl races with a data read, the read can leave stale
pages in the page cache. To avoid this, hold the invalidate_lock of the
block device file mapping across both the page cache invalidation and the
zeroout. The stale page cache is observed when blktests test case
block/009 is modified to call the "blkdiscard -z" command and is repeated
hundreds of times.

This patch can be backported to the stable kernel v5.15.y. Older stable
kernels require rework.

Fixes: 22dd6d356628 ("block: invalidate the page cache when issuing BLKZEROOUT")
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Cc: stable@vger.kernel.org # v5.15
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20211109104723.835533-3-shinichiro.kawasaki@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
diff --git a/block/ioctl.c b/block/ioctl.c
index 9fa87f6..0a1d10a 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -154,6 +154,7 @@ static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
 {
        uint64_t range[2];
        uint64_t start, end, len;
+       struct inode *inode = bdev->bd_inode;
        int err;
 
        if (!(mode & FMODE_WRITE))
@@ -176,12 +177,17 @@ static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
                return -EINVAL;
 
        /* Invalidate the page cache, including dirty pages */
+       filemap_invalidate_lock(inode->i_mapping);
        err = truncate_bdev_range(bdev, mode, start, end);
        if (err)
-               return err;
+               goto fail;
+
+       err = blkdev_issue_zeroout(bdev, start >> 9, len >> 9, GFP_KERNEL,
+                                  BLKDEV_ZERO_NOUNMAP);
 
-       return blkdev_issue_zeroout(bdev, start >> 9, len >> 9, GFP_KERNEL,
-                       BLKDEV_ZERO_NOUNMAP);
+fail:
+       filemap_invalidate_unlock(inode->i_mapping);
+       return err;
 }
 
 static int put_ushort(unsigned short __user *argp, unsigned short val)