linux-2.6-microblaze.git
3 years agonvmet-fc: simplify nvmet_fc_alloc_hostport
Amit Engel [Mon, 22 Mar 2021 19:57:17 +0000 (21:57 +0200)]
nvmet-fc: simplify nvmet_fc_alloc_hostport

Once a host is already created, avoid allocate additional hostports that
will be thrown away. add an helper function to handle host search.

Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Amit Engel <amit.engel@dell.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agonvmet-tcp: fix a segmentation fault during io parsing error
Elad Grupi [Wed, 31 Mar 2021 09:13:14 +0000 (17:13 +0800)]
nvmet-tcp: fix a segmentation fault during io parsing error

In case there is an io that contains inline data and it goes to
parsing error flow, command response will free command and iov
before clearing the data on the socket buffer.
This will delay the command response until receive flow is completed.

Fixes: 872d26a391da ("nvmet-tcp: add NVMe over TCP target driver")
Signed-off-by: Elad Grupi <elad.grupi@dell.com>
Signed-off-by: Hou Pu <houpu.main@gmail.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agolightnvm: deprecated OCSSD support and schedule it for removal in Linux 5.15
Christoph Hellwig [Tue, 13 Apr 2021 10:52:57 +0000 (10:52 +0000)]
lightnvm: deprecated OCSSD support and schedule it for removal in Linux 5.15

Lightnvm was an innovative idea to expose more low-level control over SSDs.
But it failed to get properly standardized and remains a non-standarized
extension to NVMe that requires vendor specific quirks for a few now mostly
obsolete SSD devices.  The standardized ZNS command set for NVMe has take
over a lot of the approaches and allows for fully standardized operation.

Remove the Linux code to support open channel SSDs as the few production
deployments of the above mentioned SSDs are using userspace driver stacks
instead of the fairly limited Linux support.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Javier González <javier@javigon.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Link: https://lore.kernel.org/r/20210413105257.159260-5-matias.bjorling@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agolightnvm: remove duplicate include in lightnvm.h
Zhang Yunkai [Tue, 13 Apr 2021 10:52:56 +0000 (10:52 +0000)]
lightnvm: remove duplicate include in lightnvm.h

'linux/blkdev.h' and 'uapi/linux/lightnvm.h' included in 'lightnvm.h'
is duplicated.It is also included in the 5th and 7th line.

Signed-off-by: Zhang Yunkai <zhang.yunkai@zte.com.cn>
Signed-off-by: Matias Bjørling <matias.bjorling@wdc.com>
Link: https://lore.kernel.org/r/20210413105257.159260-4-matias.bjorling@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agolightnvm: return the correct return value
Tian Tao [Tue, 13 Apr 2021 10:52:55 +0000 (10:52 +0000)]
lightnvm: return the correct return value

When memdup_user returns an error, memdup_user has two different return
values, use PTR_ERR to get the correct return value.

Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
Signed-off-by: Matias Bjørling <matias.bjorling@wdc.com>
Link: https://lore.kernel.org/r/20210413105257.159260-3-matias.bjorling@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agolightnvm: use kobj_to_dev()
Chaitanya Kulkarni [Tue, 13 Apr 2021 10:52:54 +0000 (10:52 +0000)]
lightnvm: use kobj_to_dev()

This fixs coccicheck warning:

drivers/nvme//host/lightnvm.c:1243:60-61: WARNING opportunity for
kobj_to_dev()

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Matias Bjørling <matias.bjorling@wdc.com>
Link: https://lore.kernel.org/r/20210413105257.159260-2-matias.bjorling@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: remove the -ERESTARTSYS handling in blkdev_get_by_dev
Christoph Hellwig [Mon, 12 Apr 2021 08:03:18 +0000 (10:03 +0200)]
block: remove the -ERESTARTSYS handling in blkdev_get_by_dev

Now that md has been cleaned up we can get rid of this hack.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agonull_blk: add option for managing virtual boundary
Max Gurtovoy [Mon, 12 Apr 2021 09:55:23 +0000 (09:55 +0000)]
null_blk: add option for managing virtual boundary

This will enable changing the virtual boundary of null blk devices. For
now, null blk devices didn't have any restriction on the scatter/gather
elements received from the block layer. Add a module parameter and a
configfs option that will control the virtual boundary. This will
enable testing the efficiency of the block layer bounce buffer in case
a suitable application will send discontiguous IO to the given device.

Initial testing with patched FIO showed the following results (64 jobs,
128 iodepth, 1 nullb device):
IO size      READ (virt=false)   READ (virt=true)   Write (virt=false)  Write (virt=true)
----------  ------------------- -----------------  ------------------- -------------------
 1k            10.7M                8482k               10.8M              8471k
 2k            10.4M                8266k               10.4M              8271k
 4k            10.4M                8274k               10.3M              8226k
 8k            10.2M                8131k               9800k              7933k
 16k           9567k                7764k               8081k              6828k
 32k           8865k                7309k               5570k              5153k
 64k           7695k                6586k               2682k              2617k
 128k          5346k                5489k               1320k              1296k

Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Link: https://lore.kernel.org/r/20210412095523.278632-1-mgurtovoy@nvidia.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agogdrom: fix compilation error
Chaitanya Kulkarni [Sun, 11 Apr 2021 22:43:30 +0000 (15:43 -0700)]
gdrom: fix compilation error

Use the right name for the struct request variable that removes the
following compilation error :-

make --silent --keep-going --jobs=8
O=/home/tuxbuild/.cache/tuxmake/builds/1/tmp ARCH=sh
CROSS_COMPILE=sh4-linux-gnu- 'CC=sccache sh4-linux-gnu-gcc'
'HOSTCC=sccache gcc'

In file included from /builds/linux/include/linux/scatterlist.h:9,
                 from /builds/linux/include/linux/dma-mapping.h:10,
                 from /builds/linux/drivers/cdrom/gdrom.c:16:
/builds/linux/drivers/cdrom/gdrom.c: In function 'gdrom_readdisk_dma':
/builds/linux/drivers/cdrom/gdrom.c:586:61: error: 'rq' undeclared
(first use in this function)
  586 |  __raw_writel(page_to_phys(bio_page(req->bio)) + bio_offset(rq->bio),
      |                                                             ^~

Fixes: 1d2c82001a5f ("gdrom: support highmem")
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agobcache: fix a regression of code compiling failure in debug.c
Coly Li [Sun, 11 Apr 2021 13:43:16 +0000 (21:43 +0800)]
bcache: fix a regression of code compiling failure in debug.c

The patch "bcache: remove PTR_CACHE" introduces a compiling failure in
debug.c with following error message,
  In file included from drivers/md/bcache/bcache.h:182:0,
                   from drivers/md/bcache/debug.c:9:
  drivers/md/bcache/debug.c: In function 'bch_btree_verify':
  drivers/md/bcache/debug.c:53:19: error: 'c' undeclared (first use in
  this function)
    bio_set_dev(bio, c->cache->bdev);
                     ^
This patch fixes the regression by replacing c->cache->bdev by b->c->
cache->bdev.

Signed-off-by: Coly Li <colyli@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210411134316.80274-8-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agobcache: Use 64-bit arithmetic instead of 32-bit
Gustavo A. R. Silva [Sun, 11 Apr 2021 13:43:15 +0000 (21:43 +0800)]
bcache: Use 64-bit arithmetic instead of 32-bit

Cast multiple variables to (int64_t) in order to give the compiler
complete information about the proper arithmetic to use. Notice that
these variables are being used in contexts that expect expressions of
type int64_t  (64 bit, signed). And currently, such expressions are
being evaluated using 32-bit arithmetic.

Fixes: d0cf9503e908 ("octeontx2-pf: ethtool fec mode support")
Addresses-Coverity-ID: 1501724 ("Unintentional integer overflow")
Addresses-Coverity-ID: 1501725 ("Unintentional integer overflow")
Addresses-Coverity-ID: 1501726 ("Unintentional integer overflow")
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20210411134316.80274-7-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agomd: bcache: Trivial typo fixes in the file journal.c
Bhaskar Chowdhury [Sun, 11 Apr 2021 13:43:14 +0000 (21:43 +0800)]
md: bcache: Trivial typo fixes in the file journal.c

s/condidate/candidate/
s/folowing/following/

Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20210411134316.80274-6-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agomd: bcache: avoid -Wempty-body warnings
Arnd Bergmann [Sun, 11 Apr 2021 13:43:13 +0000 (21:43 +0800)]
md: bcache: avoid -Wempty-body warnings

building with 'make W=1' shows a harmless warning for each user of the
EBUG_ON() macro:

drivers/md/bcache/bset.c: In function 'bch_btree_sort_partial':
drivers/md/bcache/util.h:30:55: error: suggest braces around empty body in an 'if' statement [-Werror=empty-body]
   30 | #define EBUG_ON(cond)                   do { if (cond); } while (0)
      |                                                       ^
drivers/md/bcache/bset.c:1312:9: note: in expansion of macro 'EBUG_ON'
 1312 |         EBUG_ON(oldsize >= 0 && bch_count_data(b) != oldsize);
      |         ^~~~~~~

Reword the macro slightly to avoid the warning.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20210411134316.80274-5-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agobcache: use NULL instead of using plain integer as pointer
Yang Li [Sun, 11 Apr 2021 13:43:12 +0000 (21:43 +0800)]
bcache: use NULL instead of using plain integer as pointer

This fixes the following sparse warnings:
drivers/md/bcache/features.c:22:16: warning: Using plain integer as NULL
pointer

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20210411134316.80274-4-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agobcache: remove PTR_CACHE
Christoph Hellwig [Sun, 11 Apr 2021 13:43:11 +0000 (21:43 +0800)]
bcache: remove PTR_CACHE

Remove the PTR_CACHE inline and replace it with a direct dereference
of c->cache.

(Coly Li: fix the typo from PTR_BUCKET to PTR_CACHE in commit log)

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20210411134316.80274-3-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agobcache: reduce redundant code in bch_cached_dev_run()
Zhiqiang Liu [Sun, 11 Apr 2021 13:43:10 +0000 (21:43 +0800)]
bcache: reduce redundant code in bch_cached_dev_run()

In bch_cached_dev_run(), free(env[1])|free(env[2])|free(buf)
show up three times. This patch introduce out tag in
which free(env[1])|free(env[2])|free(buf) are only called
one time. If we need to call free() when errors occur,
we can set error code to ret, and then goto out tag directly.

Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20210411134316.80274-2-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoMerge branch 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md...
Jens Axboe [Thu, 8 Apr 2021 15:55:14 +0000 (09:55 -0600)]
Merge branch 'md-next' of https://git./linux/kernel/git/song/md into for-5.13/drivers

Pull MD updates from Song:

"These patches fix a race condition with md_release() and md_open()."

* 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
  md: split mddev_find
  md: factor out a mddev_find_locked helper from mddev_find
  md: md_open returns -EBUSY when entering racing area

3 years agomd: split mddev_find
Christoph Hellwig [Sat, 3 Apr 2021 16:15:29 +0000 (18:15 +0200)]
md: split mddev_find

Split mddev_find into a simple mddev_find that just finds an existing
mddev by the unit number, and a more complicated mddev_find that deals
with find or allocating a mddev.

This turns out to fix this bug reported by Zhao Heming.

----------------------------- snip ------------------------------
commit d3374825ce57 ("md: make devices disappear when they are no longer
needed.") introduced protection between mddev creating & removing. The
md_open shouldn't create mddev when all_mddevs list doesn't contain
mddev. With currently code logic, there will be very easy to trigger
soft lockup in non-preempt env.

*** env ***
kvm-qemu VM 2C1G with 2 iscsi luns
kernel should be non-preempt

*** script ***

about trigger 1 time with 10 tests

`1  node1="15sp3-mdcluster1"
2  node2="15sp3-mdcluster2"
3
4  mdadm -Ss
5  ssh ${node2} "mdadm -Ss"
6  wipefs -a /dev/sda /dev/sdb
7  mdadm -CR /dev/md0 -b clustered -e 1.2 -n 2 -l mirror /dev/sda \
   /dev/sdb --assume-clean
8
9  for i in {1..100}; do
10    echo ==== $i ====;
11
12    echo "test  ...."
13    ssh ${node2} "mdadm -A /dev/md0 /dev/sda /dev/sdb"
14    sleep 1
15
16    echo "clean  ....."
17    ssh ${node2} "mdadm -Ss"
18 done
`
I use mdcluster env to trigger soft lockup, but it isn't mdcluster
speical bug. To stop md array in mdcluster env will do more jobs than
non-cluster array, which will leave enough time/gap to allow kernel to
run md_open.

*** stack ***

`ID: 2831   TASK: ffff8dd7223b5040  CPU: 0   COMMAND: "mdadm"
 #0 [ffffa15d00a13b90] __schedule at ffffffffb8f1935f
 #1 [ffffa15d00a13ba8] exact_lock at ffffffffb8a4a66d
 #2 [ffffa15d00a13bb0] kobj_lookup at ffffffffb8c62fe3
 #3 [ffffa15d00a13c28] __blkdev_get at ffffffffb89273b9
 #4 [ffffa15d00a13c98] blkdev_get at ffffffffb8927964
 #5 [ffffa15d00a13cb0] do_dentry_open at ffffffffb88dc4b4
 #6 [ffffa15d00a13ce0] path_openat at ffffffffb88f0ccc
 #7 [ffffa15d00a13db8] do_filp_open at ffffffffb88f32bb
 #8 [ffffa15d00a13ee0] do_sys_open at ffffffffb88ddc7d
 #9 [ffffa15d00a13f38] do_syscall_64 at ffffffffb86053cb ffffffffb900008c

or:
[  884.226509]  mddev_put+0x1c/0xe0 [md_mod]
[  884.226515]  md_open+0x3c/0xe0 [md_mod]
[  884.226518]  __blkdev_get+0x30d/0x710
[  884.226520]  ? bd_acquire+0xd0/0xd0
[  884.226522]  blkdev_get+0x14/0x30
[  884.226524]  do_dentry_open+0x204/0x3a0
[  884.226531]  path_openat+0x2fc/0x1520
[  884.226534]  ? seq_printf+0x4e/0x70
[  884.226536]  do_filp_open+0x9b/0x110
[  884.226542]  ? md_release+0x20/0x20 [md_mod]
[  884.226543]  ? seq_read+0x1d8/0x3e0
[  884.226545]  ? kmem_cache_alloc+0x18a/0x270
[  884.226547]  ? do_sys_open+0x1bd/0x260
[  884.226548]  do_sys_open+0x1bd/0x260
[  884.226551]  do_syscall_64+0x5b/0x1e0
[  884.226554]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
`
*** rootcause ***

"mdadm -A" (or other array assemble commands) will start a daemon "mdadm
--monitor" by default. When "mdadm -Ss" is running, the stop action will
wakeup "mdadm --monitor". The "--monitor" daemon will immediately get
info from /proc/mdstat. This time mddev in kernel still exist, so
/proc/mdstat still show md device, which makes "mdadm --monitor" to open
/dev/md0.

The previously "mdadm -Ss" is removing action, the "mdadm --monitor"
open action will trigger md_open which is creating action. Racing is
happening.

`<thread 1>: "mdadm -Ss"
md_release
  mddev_put deletes mddev from all_mddevs
  queue_work for mddev_delayed_delete
  at this time, "/dev/md0" is still available for opening

<thread 2>: "mdadm --monitor ..."
md_open
 + mddev_find can't find mddev of /dev/md0, and create a new mddev and
 |    return.
 + trigger "if (mddev->gendisk != bdev->bd_disk)" and return
      -ERESTARTSYS.
`
In non-preempt kernel, <thread 2> is occupying on current CPU. and
mddev_delayed_delete which was created in <thread 1> also can't be
schedule.

In preempt kernel, it can also trigger above racing. But kernel doesn't
allow one thread running on a CPU all the time. after <thread 2> running
some time, the later "mdadm -A" (refer above script line 13) will call
md_alloc to alloc a new gendisk for mddev. it will break md_open
statement "if (mddev->gendisk != bdev->bd_disk)" and return 0 to caller,
the soft lockup is broken.
------------------------------ snip ------------------------------

Cc: stable@vger.kernel.org
Fixes: d3374825ce57 ("md: make devices disappear when they are no longer needed.")
Reported-by: Heming Zhao <heming.zhao@suse.com>
Reviewed-by: Heming Zhao <heming.zhao@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Song Liu <song@kernel.org>
3 years agomd: factor out a mddev_find_locked helper from mddev_find
Christoph Hellwig [Sat, 3 Apr 2021 16:15:28 +0000 (18:15 +0200)]
md: factor out a mddev_find_locked helper from mddev_find

Factor out a self-contained helper to just lookup a mddev by the dev_t
"unit".

Cc: stable@vger.kernel.org
Reviewed-by: Heming Zhao <heming.zhao@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Song Liu <song@kernel.org>
3 years agomd: md_open returns -EBUSY when entering racing area
Zhao Heming [Sat, 3 Apr 2021 03:01:25 +0000 (11:01 +0800)]
md: md_open returns -EBUSY when entering racing area

commit d3374825ce57 ("md: make devices disappear when they are no longer
needed.") introduced protection between mddev creating & removing. The
md_open shouldn't create mddev when all_mddevs list doesn't contain
mddev. With currently code logic, there will be very easy to trigger
soft lockup in non-preempt env.

This patch changes md_open returning from -ERESTARTSYS to -EBUSY, which
will break the infinitely retry when md_open enter racing area.

This patch is partly fix soft lockup issue, full fix needs mddev_find
is split into two functions: mddev_find & mddev_find_or_alloc. And
md_open should call new mddev_find (it only does searching job).

For more detail, please refer with Christoph's "split mddev_find" patch
in later commits.

*** env ***
kvm-qemu VM 2C1G with 2 iscsi luns
kernel should be non-preempt

*** script ***

about trigger every time with below script

```
1  node1="mdcluster1"
2  node2="mdcluster2"
3
4  mdadm -Ss
5  ssh ${node2} "mdadm -Ss"
6  wipefs -a /dev/sda /dev/sdb
7  mdadm -CR /dev/md0 -b clustered -e 1.2 -n 2 -l mirror /dev/sda \
   /dev/sdb --assume-clean
8
9  for i in {1..10}; do
10    echo ==== $i ====;
11
12    echo "test  ...."
13    ssh ${node2} "mdadm -A /dev/md0 /dev/sda /dev/sdb"
14    sleep 1
15
16    echo "clean  ....."
17    ssh ${node2} "mdadm -Ss"
18 done
```

I use mdcluster env to trigger soft lockup, but it isn't mdcluster
speical bug. To stop md array in mdcluster env will do more jobs than
non-cluster array, which will leave enough time/gap to allow kernel to
run md_open.

*** stack ***

```
[  884.226509]  mddev_put+0x1c/0xe0 [md_mod]
[  884.226515]  md_open+0x3c/0xe0 [md_mod]
[  884.226518]  __blkdev_get+0x30d/0x710
[  884.226520]  ? bd_acquire+0xd0/0xd0
[  884.226522]  blkdev_get+0x14/0x30
[  884.226524]  do_dentry_open+0x204/0x3a0
[  884.226531]  path_openat+0x2fc/0x1520
[  884.226534]  ? seq_printf+0x4e/0x70
[  884.226536]  do_filp_open+0x9b/0x110
[  884.226542]  ? md_release+0x20/0x20 [md_mod]
[  884.226543]  ? seq_read+0x1d8/0x3e0
[  884.226545]  ? kmem_cache_alloc+0x18a/0x270
[  884.226547]  ? do_sys_open+0x1bd/0x260
[  884.226548]  do_sys_open+0x1bd/0x260
[  884.226551]  do_syscall_64+0x5b/0x1e0
[  884.226554]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
```

*** rootcause ***

"mdadm -A" (or other array assemble commands) will start a daemon "mdadm
--monitor" by default. When "mdadm -Ss" is running, the stop action will
wakeup "mdadm --monitor". The "--monitor" daemon will immediately get
info from /proc/mdstat. This time mddev in kernel still exist, so
/proc/mdstat still show md device, which makes "mdadm --monitor" to open
/dev/md0.

The previously "mdadm -Ss" is removing action, the "mdadm --monitor"
open action will trigger md_open which is creating action. Racing is
happening.

```
<thread 1>: "mdadm -Ss"
md_release
  mddev_put deletes mddev from all_mddevs
  queue_work for mddev_delayed_delete
  at this time, "/dev/md0" is still available for opening

<thread 2>: "mdadm --monitor ..."
md_open
 + mddev_find can't find mddev of /dev/md0, and create a new mddev and
 |    return.
 + trigger "if (mddev->gendisk != bdev->bd_disk)" and return
      -ERESTARTSYS.
```

In non-preempt kernel, <thread 2> is occupying on current CPU. and
mddev_delayed_delete which was created in <thread 1> also can't be
schedule.

In preempt kernel, it can also trigger above racing. But kernel doesn't
allow one thread running on a CPU all the time. after <thread 2> running
some time, the later "mdadm -A" (refer above script line 13) will call
md_alloc to alloc a new gendisk for mddev. it will break md_open
statement "if (mddev->gendisk != bdev->bd_disk)" and return 0 to caller,
the soft lockup is broken.

Cc: stable@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Zhao Heming <heming.zhao@suse.com>
Signed-off-by: Song Liu <song@kernel.org>
3 years agodrbd: use DEFINE_SPINLOCK() for spinlock
Guobin Huang [Tue, 6 Apr 2021 12:09:48 +0000 (20:09 +0800)]
drbd: use DEFINE_SPINLOCK() for spinlock

spinlock can be initialized automatically with DEFINE_SPINLOCK()
rather than explicitly calling spin_lock_init().

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Guobin Huang <huangguobin4@huawei.com>
Link: https://lore.kernel.org/r/1617710988-49205-1-git-send-email-huangguobin4@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoswim3: support highmem
Christoph Hellwig [Tue, 6 Apr 2021 06:18:39 +0000 (08:18 +0200)]
swim3: support highmem

swim3 only uses the virtual address of a bio to stash it into the data
transfer using virt_to_bus.  But the ppc32 virt_to_bus just uses the
physical address with an offset.  Replace virt_to_bus with a local hack
that performs the equivalent transformation and stop asking for block
layer bounce buffering.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210406061839.811588-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agofloppy: always use the track buffer
Christoph Hellwig [Tue, 6 Apr 2021 06:17:55 +0000 (08:17 +0200)]
floppy: always use the track buffer

Always use the track buffer that is already used for addresses outside
the 16MB address capability of the floppy controller.  This allows to
remove a lot of code that relies on kernel virtual addresses.  With
this gone there is just a single place left that looks at the bio,
which can be converted to memcpy_{from,to}_page, thus removing the need
for the extra block-layer bounce buffering for highmem pages.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210406061755.811522-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoswim: don't call blk_queue_bounce_limit
Christoph Hellwig [Tue, 6 Apr 2021 06:17:25 +0000 (08:17 +0200)]
swim: don't call blk_queue_bounce_limit

m68k doesn't support highmem, so don't bother enabling the block layer
bounce buffer code.  Just for safety throw in a depend on !HIGHMEM.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210406061725.811389-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agogdrom: support highmem
Christoph Hellwig [Tue, 6 Apr 2021 06:16:48 +0000 (08:16 +0200)]
gdrom: support highmem

The gdrom driver only has a single reference to the virtual address of
the bio data, and uses that only to get the physical address.  Switch
to deriving the physical address from the page directly and thus avoid
bounce buffering highmem data.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210406061648.811275-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: drbd: drbd_nl: Demote half-complete kernel-doc headers
Lee Jones [Fri, 12 Mar 2021 10:55:30 +0000 (10:55 +0000)]
block: drbd: drbd_nl: Demote half-complete kernel-doc headers

Fixes the following W=1 kernel build warning(s):

 from drivers/block/drbd/drbd_nl.c:24:
 drivers/block/drbd/drbd_nl.c: In function ‘drbd_adm_attach’:
 drivers/block/drbd/drbd_nl.c:1968:10: warning: implicit conversion from ‘enum drbd_state_rv’ to ‘enum drbd_ret_code’ [-Wenum-conversion]
 drivers/block/drbd/drbd_nl.c:930: warning: Function parameter or member 'flags' not described in 'drbd_determine_dev_size'
 drivers/block/drbd/drbd_nl.c:930: warning: Function parameter or member 'rs' not described in 'drbd_determine_dev_size'
 drivers/block/drbd/drbd_nl.c:1148: warning: Function parameter or member 'dc' not described in 'drbd_check_al_size'

Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: drbd-dev@lists.linbit.com
Cc: linux-block@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Link: https://lore.kernel.org/r/20210312105530.2219008-12-lee.jones@linaro.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: xen-blkfront: Demote kernel-doc abuses
Lee Jones [Fri, 12 Mar 2021 10:55:29 +0000 (10:55 +0000)]
block: xen-blkfront: Demote kernel-doc abuses

Fixes the following W=1 kernel build warning(s):

 drivers/block/xen-blkfront.c:1960: warning: Function parameter or member 'dev' not described in 'blkfront_probe'
 drivers/block/xen-blkfront.c:1960: warning: Function parameter or member 'id' not described in 'blkfront_probe'
 drivers/block/xen-blkfront.c:1960: warning: expecting prototype for Allocate the basic(). Prototype was for blkfront_probe() instead
 drivers/block/xen-blkfront.c:2085: warning: Function parameter or member 'dev' not described in 'blkfront_resume'
 drivers/block/xen-blkfront.c:2085: warning: expecting prototype for or a backend(). Prototype was for blkfront_resume() instead
 drivers/block/xen-blkfront.c:2444: warning: wrong kernel-doc identifier on line:

Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: "Roger Pau Monné" <roger.pau@citrix.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: xen-devel@lists.xenproject.org
Cc: linux-block@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Link: https://lore.kernel.org/r/20210312105530.2219008-11-lee.jones@linaro.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: drbd: drbd_receiver: Demote less than half complete kernel-doc header
Lee Jones [Fri, 12 Mar 2021 10:55:28 +0000 (10:55 +0000)]
block: drbd: drbd_receiver: Demote less than half complete kernel-doc header

Fixes the following W=1 kernel build warning(s):

 drivers/block/drbd/drbd_receiver.c:1641: warning: Function parameter or member 'op' not described in 'drbd_submit_peer_request'
 drivers/block/drbd/drbd_receiver.c:1641: warning: Function parameter or member 'op_flags' not described in 'drbd_submit_peer_request'
 drivers/block/drbd/drbd_receiver.c:1641: warning: Function parameter or member 'fault_type' not described in 'drbd_submit_peer_request'

Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: drbd-dev@lists.linbit.com
Cc: linux-block@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Link: https://lore.kernel.org/r/20210312105530.2219008-10-lee.jones@linaro.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: drbd: drbd_main: Fix a bunch of function documentation discrepancies
Lee Jones [Fri, 12 Mar 2021 10:55:27 +0000 (10:55 +0000)]
block: drbd: drbd_main: Fix a bunch of function documentation discrepancies

Fixes the following W=1 kernel build warning(s):

 drivers/block/drbd/drbd_main.c:278: warning: Function parameter or member 'connection' not described in 'tl_clear'
 drivers/block/drbd/drbd_main.c:278: warning: Excess function parameter 'device' description in 'tl_clear'
 drivers/block/drbd/drbd_main.c:489: warning: Function parameter or member 'cpu_mask' not described in 'drbd_calc_cpu_mask'
 drivers/block/drbd/drbd_main.c:528: warning: Excess function parameter 'device' description in 'drbd_thread_current_set_cpu'
 drivers/block/drbd/drbd_main.c:549: warning: Function parameter or member 'connection' not described in 'drbd_header_size'
 drivers/block/drbd/drbd_main.c:1204: warning: Function parameter or member 'device' not described in 'send_bitmap_rle_or_plain'
 drivers/block/drbd/drbd_main.c:1204: warning: Function parameter or member 'c' not described in 'send_bitmap_rle_or_plain'
 drivers/block/drbd/drbd_main.c:1335: warning: Function parameter or member 'peer_device' not described in '_drbd_send_ack'
 drivers/block/drbd/drbd_main.c:1335: warning: Excess function parameter 'device' description in '_drbd_send_ack'
 drivers/block/drbd/drbd_main.c:1379: warning: Function parameter or member 'peer_device' not described in 'drbd_send_ack'
 drivers/block/drbd/drbd_main.c:1379: warning: Excess function parameter 'device' description in 'drbd_send_ack'
 drivers/block/drbd/drbd_main.c:1892: warning: Function parameter or member 'connection' not described in 'drbd_send_all'
 drivers/block/drbd/drbd_main.c:1892: warning: Function parameter or member 'sock' not described in 'drbd_send_all'
 drivers/block/drbd/drbd_main.c:1892: warning: Function parameter or member 'buffer' not described in 'drbd_send_all'
 drivers/block/drbd/drbd_main.c:1892: warning: Function parameter or member 'size' not described in 'drbd_send_all'
 drivers/block/drbd/drbd_main.c:1892: warning: Function parameter or member 'msg_flags' not described in 'drbd_send_all'
 drivers/block/drbd/drbd_main.c:3525: warning: Function parameter or member 'flags' not described in 'drbd_queue_bitmap_io'
 drivers/block/drbd/drbd_main.c:3563: warning: Function parameter or member 'flags' not described in 'drbd_bitmap_io'

Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: drbd-dev@lists.linbit.com
Cc: linux-block@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Link: https://lore.kernel.org/r/20210312105530.2219008-9-lee.jones@linaro.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: drbd: drbd_nl: Make conversion to 'enum drbd_ret_code' explicit
Lee Jones [Fri, 12 Mar 2021 10:55:26 +0000 (10:55 +0000)]
block: drbd: drbd_nl: Make conversion to 'enum drbd_ret_code' explicit

Fixes the following W=1 kernel build warning(s):

 from drivers/block/drbd/drbd_nl.c:24:
 drivers/block/drbd/drbd_nl.c: In function ‘drbd_adm_set_role’:
 drivers/block/drbd/drbd_nl.c:793:11: warning: implicit conversion from ‘enum drbd_state_rv’ to ‘enum drbd_ret_code’ [-Wenum-conversion]
 drivers/block/drbd/drbd_nl.c:795:11: warning: implicit conversion from ‘enum drbd_state_rv’ to ‘enum drbd_ret_code’ [-Wenum-conversion]
 drivers/block/drbd/drbd_nl.c: In function ‘drbd_adm_attach’:
 drivers/block/drbd/drbd_nl.c:1965:10: warning: implicit conversion from ‘enum drbd_state_rv’ to ‘enum drbd_ret_code’ [-Wenum-conversion]
 drivers/block/drbd/drbd_nl.c: In function ‘drbd_adm_connect’:
 drivers/block/drbd/drbd_nl.c:2690:10: warning: implicit conversion from ‘enum drbd_state_rv’ to ‘enum drbd_ret_code’ [-Wenum-conversion]
 drivers/block/drbd/drbd_nl.c: In function ‘drbd_adm_disconnect’:
 drivers/block/drbd/drbd_nl.c:2803:11: warning: implicit conversion from ‘enum drbd_state_rv’ to ‘enum drbd_ret_code’ [-Wenum-conversion]

Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: drbd-dev@lists.linbit.com
Cc: linux-block@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Link: https://lore.kernel.org/r/20210312105530.2219008-8-lee.jones@linaro.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: drbd: drbd_main: Remove duplicate field initialisation
Lee Jones [Fri, 12 Mar 2021 10:55:25 +0000 (10:55 +0000)]
block: drbd: drbd_main: Remove duplicate field initialisation

[P_RETRY_WRITE] is initialised more than once.

Fixes the following W=1 kernel build warning(s):

 drivers/block/drbd/drbd_main.c: In function ‘cmdname’:
 drivers/block/drbd/drbd_main.c:3660:22: warning: initialized field overwritten [-Woverride-init]
 drivers/block/drbd/drbd_main.c:3660:22: note: (near initialization for ‘cmdnames[44]’)

Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: drbd-dev@lists.linbit.com
Cc: linux-block@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Link: https://lore.kernel.org/r/20210312105530.2219008-7-lee.jones@linaro.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: drbd: drbd_receiver: Demote non-conformant kernel-doc headers
Lee Jones [Fri, 12 Mar 2021 10:55:24 +0000 (10:55 +0000)]
block: drbd: drbd_receiver: Demote non-conformant kernel-doc headers

Fixes the following W=1 kernel build warning(s):

 drivers/block/drbd/drbd_receiver.c:265: warning: Function parameter or member 'peer_device' not described in 'drbd_alloc_pages'
 drivers/block/drbd/drbd_receiver.c:265: warning: Excess function parameter 'device' description in 'drbd_alloc_pages'
 drivers/block/drbd/drbd_receiver.c:1362: warning: Function parameter or member 'connection' not described in 'drbd_may_finish_epoch'
 drivers/block/drbd/drbd_receiver.c:1362: warning: Excess function parameter 'device' description in 'drbd_may_finish_epoch'
 drivers/block/drbd/drbd_receiver.c:1451: warning: Function parameter or member 'resource' not described in 'drbd_bump_write_ordering'
 drivers/block/drbd/drbd_receiver.c:1451: warning: Function parameter or member 'bdev' not described in 'drbd_bump_write_ordering'
 drivers/block/drbd/drbd_receiver.c:1451: warning: Excess function parameter 'connection' description in 'drbd_bump_write_ordering'
 drivers/block/drbd/drbd_receiver.c:1643: warning: Function parameter or member 'op' not described in 'drbd_submit_peer_request'
 drivers/block/drbd/drbd_receiver.c:1643: warning: Function parameter or member 'op_flags' not described in 'drbd_submit_peer_request'
 drivers/block/drbd/drbd_receiver.c:1643: warning: Function parameter or member 'fault_type' not described in 'drbd_submit_peer_request'
 drivers/block/drbd/drbd_receiver.c:1643: warning: Excess function parameter 'rw' description in 'drbd_submit_peer_request'
 drivers/block/drbd/drbd_receiver.c:3055: warning: Function parameter or member 'peer_device' not described in 'drbd_asb_recover_0p'
 drivers/block/drbd/drbd_receiver.c:3138: warning: Function parameter or member 'peer_device' not described in 'drbd_asb_recover_1p'
 drivers/block/drbd/drbd_receiver.c:3195: warning: Function parameter or member 'peer_device' not described in 'drbd_asb_recover_2p'
 drivers/block/drbd/drbd_receiver.c:4684: warning: Function parameter or member 'peer_device' not described in 'receive_bitmap_plain'
 drivers/block/drbd/drbd_receiver.c:4684: warning: Function parameter or member 'size' not described in 'receive_bitmap_plain'
 drivers/block/drbd/drbd_receiver.c:4684: warning: Function parameter or member 'p' not described in 'receive_bitmap_plain'
 drivers/block/drbd/drbd_receiver.c:4684: warning: Function parameter or member 'c' not described in 'receive_bitmap_plain'
 drivers/block/drbd/drbd_receiver.c:4738: warning: Function parameter or member 'peer_device' not described in 'recv_bm_rle_bits'
 drivers/block/drbd/drbd_receiver.c:4738: warning: Function parameter or member 'p' not described in 'recv_bm_rle_bits'
 drivers/block/drbd/drbd_receiver.c:4738: warning: Function parameter or member 'c' not described in 'recv_bm_rle_bits'
 drivers/block/drbd/drbd_receiver.c:4738: warning: Function parameter or member 'len' not described in 'recv_bm_rle_bits'
 drivers/block/drbd/drbd_receiver.c:4807: warning: Function parameter or member 'peer_device' not described in 'decode_bitmap_c'
 drivers/block/drbd/drbd_receiver.c:4807: warning: Function parameter or member 'p' not described in 'decode_bitmap_c'
 drivers/block/drbd/drbd_receiver.c:4807: warning: Function parameter or member 'c' not described in 'decode_bitmap_c'
 drivers/block/drbd/drbd_receiver.c:4807: warning: Function parameter or member 'len' not described in 'decode_bitmap_c'

Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: drbd-dev@lists.linbit.com
Cc: linux-block@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Link: https://lore.kernel.org/r/20210312105530.2219008-6-lee.jones@linaro.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: drbd: drbd_state: Fix some function documentation issues
Lee Jones [Fri, 12 Mar 2021 10:55:23 +0000 (10:55 +0000)]
block: drbd: drbd_state: Fix some function documentation issues

Fixes the following W=1 kernel build warning(s):

 drivers/block/drbd/drbd_state.c:913: warning: Function parameter or member 'connection' not described in 'is_valid_soft_transition'
 drivers/block/drbd/drbd_state.c:913: warning: Excess function parameter 'device' description in 'is_valid_soft_transition'
 drivers/block/drbd/drbd_state.c:1054: warning: Function parameter or member 'warn' not described in 'sanitize_state'
 drivers/block/drbd/drbd_state.c:1054: warning: Excess function parameter 'warn_sync_abort' description in 'sanitize_state'
 drivers/block/drbd/drbd_state.c:1703: warning: Function parameter or member 'state_change' not described in 'after_state_ch'

Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: drbd-dev@lists.linbit.com
Cc: linux-block@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Link: https://lore.kernel.org/r/20210312105530.2219008-5-lee.jones@linaro.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: mtip32xx: mtip32xx: Mark debugging variable 'start' as __maybe_unused
Lee Jones [Fri, 12 Mar 2021 10:55:22 +0000 (10:55 +0000)]
block: mtip32xx: mtip32xx: Mark debugging variable 'start' as __maybe_unused

Fixes the following W=1 kernel build warning(s):

 drivers/block/mtip32xx/mtip32xx.c: In function ‘mtip_standby_immediate’:
 drivers/block/mtip32xx/mtip32xx.c:1216:16: warning: variable ‘start’ set but not used [-Wunused-but-set-variable]

Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Link: https://lore.kernel.org/r/20210312105530.2219008-4-lee.jones@linaro.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: drbd: drbd_interval: Demote some kernel-doc abuses and fix another header
Lee Jones [Fri, 12 Mar 2021 10:55:21 +0000 (10:55 +0000)]
block: drbd: drbd_interval: Demote some kernel-doc abuses and fix another header

Fixes the following W=1 kernel build warning(s):

 drivers/block/drbd/drbd_interval.c:11: warning: Function parameter or member 'node' not described in 'interval_end'
 drivers/block/drbd/drbd_interval.c:26: warning: Function parameter or member 'root' not described in 'drbd_insert_interval'
 drivers/block/drbd/drbd_interval.c:26: warning: Function parameter or member 'this' not described in 'drbd_insert_interval'
 drivers/block/drbd/drbd_interval.c:70: warning: Function parameter or member 'root' not described in 'drbd_contains_interval'
 drivers/block/drbd/drbd_interval.c:96: warning: Function parameter or member 'root' not described in 'drbd_remove_interval'
 drivers/block/drbd/drbd_interval.c:96: warning: Function parameter or member 'this' not described in 'drbd_remove_interval'
 drivers/block/drbd/drbd_interval.c:113: warning: Function parameter or member 'root' not described in 'drbd_find_overlap'

Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: drbd-dev@lists.linbit.com
Cc: linux-block@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Link: https://lore.kernel.org/r/20210312105530.2219008-3-lee.jones@linaro.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoMerge tag 'nvme-5.13-2021-04-06' of git://git.infradead.org/nvme into for-5.13/drivers
Jens Axboe [Tue, 6 Apr 2021 15:17:22 +0000 (09:17 -0600)]
Merge tag 'nvme-5.13-2021-04-06' of git://git.infradead.org/nvme into for-5.13/drivers

Pull NVMe updates from Christoph:

"nvme updates for Linux 5.13

 - fix handling of very large MDTS values (Bart Van Assche)
 - retrigger ANA log update if group descriptor isn't found
   (Hannes Reinecke)
 - fix locking contexts in nvme-tcp and nvmet-tcp (Sagi Grimberg)
 - return proper error code from discovery ctrl (Hou Pu)
 - verify the SGLS field in nvmet-tcp and nvmet-fc (Max Gurtovoy)
 - disallow passthru cmd from targeting a nsid != nsid of the block dev
   (Niklas Cassel)
 - do not allow model_number exceed 40 bytes in nvmet (Noam Gottlieb)
 - enable optional queue idle period tracking in nvmet-tcp
   (Mark Wunderlich)
 - various cleanups and optimizations (Chaitanya Kulkarni, Kanchan Joshi)
 - expose fast_io_fail_tmo in sysfs (Daniel Wagner)
 - implement non-MDTS command limits (Keith Busch)
 - reduce warnings for unhandled command effects (Keith Busch)
 - allocate storage for the SQE as part of the nvme_request (Keith Busch)"

* tag 'nvme-5.13-2021-04-06' of git://git.infradead.org/nvme: (33 commits)
  nvme: fix handling of large MDTS values
  nvme: implement non-mdts command limits
  nvme: disallow passthru cmd from targeting a nsid != nsid of the block dev
  nvme: retrigger ANA log update if group descriptor isn't found
  nvme: export fast_io_fail_tmo to sysfs
  nvme: remove superfluous else in nvme_ctrl_loss_tmo_store
  nvme: use sysfs_emit instead of sprintf
  nvme-fc: check sgl supported by target
  nvme-tcp: check sgl supported by target
  nvmet-tcp: enable optional queue idle period tracking
  nvmet-tcp: fix incorrect locking in state_change sk callback
  nvme-tcp: block BH in sk state_change sk callback
  nvmet: return proper error code from discovery ctrl
  nvme: warn of unhandled effects only once
  nvme: use driver pdu command for passthrough
  nvme-pci: allocate nvme_command within driver pdu
  nvmet: do not allow model_number exceed 40 bytes
  nvmet: remove unnecessary ctrl parameter
  nvmet-fc: update function documentation
  nvme-fc: fix the function documentation comment
  ...

3 years agonvme: fix handling of large MDTS values
Bart Van Assche [Fri, 2 Apr 2021 16:58:20 +0000 (18:58 +0200)]
nvme: fix handling of large MDTS values

Instead of triggering an integer overflow and undefined behavior if MDTS is
large, set max_hw_sectors to UINT_MAX.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Keith Busch <kbusch@kernel.org>
[hch: rebased to account for the new nvme_mps_to_sectors helper]
Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agonvme: implement non-mdts command limits
Keith Busch [Wed, 24 Mar 2021 23:18:05 +0000 (16:18 -0700)]
nvme: implement non-mdts command limits

Commands that access LBA contents without a data transfer between the
host historically have not had a spec defined upper limit. The driver
set the queue constraints for such commands to the max data transfer
size just to be safe, but this artificial constraint frequently limits
devices below their capabilities.

The NVMe Workgroup ratified TP4040 defines how a controller may
advertise their non-MDTS limits. Use these if provided and default to
the current constraints if not. Since the Dataset Management command
limits are defined in logical blocks, but without a namespace to tell us
the logical block size, the code defaults to the safe 512b size.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agonvme: disallow passthru cmd from targeting a nsid != nsid of the block dev
Niklas Cassel [Fri, 26 Mar 2021 19:48:00 +0000 (19:48 +0000)]
nvme: disallow passthru cmd from targeting a nsid != nsid of the block dev

When a passthru command targets a specific namespace, the ns parameter to
nvme_user_cmd()/nvme_user_cmd64() is set. However, there is currently no
validation that the nsid specified in the passthru command targets the
namespace/nsid represented by the block device that the ioctl was
performed on.

Add a check that validates that the nsid in the passthru command matches
that of the supplied namespace.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Javier González <javier@javigon.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agonvme: retrigger ANA log update if group descriptor isn't found
Hannes Reinecke [Sat, 5 Dec 2020 15:29:01 +0000 (16:29 +0100)]
nvme: retrigger ANA log update if group descriptor isn't found

If ANA is enabled but no ANA group descriptor is found when creating
a new namespace the ANA log is most likely out of date, so trigger
a re-read. The namespace will be tagged with the NS_ANA_PENDING flag
to exclude it from path selection until the ANA log has been re-read.

Fixes: 32acab3181c7 ("nvme: implement multipath access to nvme subsystems")
Reported-by: Martin George <marting@netapp.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agonvme: export fast_io_fail_tmo to sysfs
Daniel Wagner [Thu, 1 Apr 2021 09:54:12 +0000 (11:54 +0200)]
nvme: export fast_io_fail_tmo to sysfs

Commit 8c4dfea97f15 ("nvme-fabrics: reject I/O to offline device")
introduced fast_io_fail_tmo but didn't export the value to sysfs. The
value can be set during the 'nvme connect'. Export the timeout value
to user space via sysfs to allow runtime configuration.

Cc: Victor Gladkov <Victor.Gladkov@kioxia.com>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Himanshu Madhani <himanshu.madhaani@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agonvme: remove superfluous else in nvme_ctrl_loss_tmo_store
Daniel Wagner [Thu, 1 Apr 2021 09:54:11 +0000 (11:54 +0200)]
nvme: remove superfluous else in nvme_ctrl_loss_tmo_store

If there is an error we will leave the function early. So there
is no need for an else. Remove it.

Signed-off-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agonvme: use sysfs_emit instead of sprintf
Daniel Wagner [Thu, 1 Apr 2021 09:54:10 +0000 (11:54 +0200)]
nvme: use sysfs_emit instead of sprintf

sysfs_emit is the recommended API to use for formatting strings to be
returned to user space. It is equivalent to scnprintf and aware of the
PAGE_SIZE buffer size.

Suggested-by: Chaitanya Kulkarni <Chaitanya.Kulkarni@wdc.com>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agonvme-fc: check sgl supported by target
Max Gurtovoy [Tue, 30 Mar 2021 23:01:20 +0000 (23:01 +0000)]
nvme-fc: check sgl supported by target

SGLs support is mandatory for NVMe/FC, make sure that the target is
aligned to the specification.

Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agonvme-tcp: check sgl supported by target
Max Gurtovoy [Tue, 30 Mar 2021 23:01:19 +0000 (23:01 +0000)]
nvme-tcp: check sgl supported by target

SGLs support is mandatory for NVMe/tcp, make sure that the target is
aligned to the specification.

Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agonvmet-tcp: enable optional queue idle period tracking
Wunderlich, Mark [Wed, 31 Mar 2021 21:38:30 +0000 (21:38 +0000)]
nvmet-tcp: enable optional queue idle period tracking

Add 'idle_poll_period_usecs' option used by io_work() to support
network devices enabled with advanced interrupt moderation
supporting a relaxed interrupt model. It was discovered that
such a NIC used on the target was unable to support initiator
connection establishment, caused by the existing io_work()
flow that immediately exits after a loop with no activity and
does not re-queue itself.

With this new option a queue is assigned a period of time
that no activity must occur in order to become 'idle'.  Until
the queue is idle the work item is requeued.

The new module option is defined as changeable making it
flexible for testing purposes.

The pre-existing legacy behavior is preserved when no module option
for idle_poll_period_usecs is specified.

Signed-off-by: Mark Wunderlich <mark.wunderlich@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agonvmet-tcp: fix incorrect locking in state_change sk callback
Sagi Grimberg [Sun, 21 Mar 2021 07:08:49 +0000 (00:08 -0700)]
nvmet-tcp: fix incorrect locking in state_change sk callback

We are not changing anything in the TCP connection state so
we should not take a write_lock but rather a read lock.

This caused a deadlock when running nvmet-tcp and nvme-tcp
on the same system, where state_change callbacks on the
host and on the controller side have causal relationship
and made lockdep report on this with blktests:

================================
WARNING: inconsistent lock state
5.12.0-rc3 #1 Tainted: G          I
--------------------------------
inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-R} usage.
nvme/1324 [HC0[0]:SC0[0]:HE1:SE1] takes:
ffff888363151000 (clock-AF_INET){++-?}-{2:2}, at: nvme_tcp_state_change+0x21/0x150 [nvme_tcp]
{IN-SOFTIRQ-W} state was registered at:
  __lock_acquire+0x79b/0x18d0
  lock_acquire+0x1ca/0x480
  _raw_write_lock_bh+0x39/0x80
  nvmet_tcp_state_change+0x21/0x170 [nvmet_tcp]
  tcp_fin+0x2a8/0x780
  tcp_data_queue+0xf94/0x1f20
  tcp_rcv_established+0x6ba/0x1f00
  tcp_v4_do_rcv+0x502/0x760
  tcp_v4_rcv+0x257e/0x3430
  ip_protocol_deliver_rcu+0x69/0x6a0
  ip_local_deliver_finish+0x1e2/0x2f0
  ip_local_deliver+0x1a2/0x420
  ip_rcv+0x4fb/0x6b0
  __netif_receive_skb_one_core+0x162/0x1b0
  process_backlog+0x1ff/0x770
  __napi_poll.constprop.0+0xa9/0x5c0
  net_rx_action+0x7b3/0xb30
  __do_softirq+0x1f0/0x940
  do_softirq+0xa1/0xd0
  __local_bh_enable_ip+0xd8/0x100
  ip_finish_output2+0x6b7/0x18a0
  __ip_queue_xmit+0x706/0x1aa0
  __tcp_transmit_skb+0x2068/0x2e20
  tcp_write_xmit+0xc9e/0x2bb0
  __tcp_push_pending_frames+0x92/0x310
  inet_shutdown+0x158/0x300
  __nvme_tcp_stop_queue+0x36/0x270 [nvme_tcp]
  nvme_tcp_stop_queue+0x87/0xb0 [nvme_tcp]
  nvme_tcp_teardown_admin_queue+0x69/0xe0 [nvme_tcp]
  nvme_do_delete_ctrl+0x100/0x10c [nvme_core]
  nvme_sysfs_delete.cold+0x8/0xd [nvme_core]
  kernfs_fop_write_iter+0x2c7/0x460
  new_sync_write+0x36c/0x610
  vfs_write+0x5c0/0x870
  ksys_write+0xf9/0x1d0
  do_syscall_64+0x33/0x40
  entry_SYSCALL_64_after_hwframe+0x44/0xae
irq event stamp: 10687
hardirqs last  enabled at (10687): [<ffffffff9ec376bd>] _raw_spin_unlock_irqrestore+0x2d/0x40
hardirqs last disabled at (10686): [<ffffffff9ec374d8>] _raw_spin_lock_irqsave+0x68/0x90
softirqs last  enabled at (10684): [<ffffffff9f000608>] __do_softirq+0x608/0x940
softirqs last disabled at (10649): [<ffffffff9cdedd31>] do_softirq+0xa1/0xd0

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(clock-AF_INET);
  <Interrupt>
    lock(clock-AF_INET);

 *** DEADLOCK ***

5 locks held by nvme/1324:
 #0: ffff8884a01fe470 (sb_writers#4){.+.+}-{0:0}, at: ksys_write+0xf9/0x1d0
 #1: ffff8886e435c090 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x216/0x460
 #2: ffff888104d90c38 (kn->active#255){++++}-{0:0}, at: kernfs_remove_self+0x22d/0x330
 #3: ffff8884634538d0 (&queue->queue_lock){+.+.}-{3:3}, at: nvme_tcp_stop_queue+0x52/0xb0 [nvme_tcp]
 #4: ffff888363150d30 (sk_lock-AF_INET){+.+.}-{0:0}, at: inet_shutdown+0x59/0x300

stack backtrace:
CPU: 26 PID: 1324 Comm: nvme Tainted: G          I       5.12.0-rc3 #1
Hardware name: Dell Inc. PowerEdge R640/06NR82, BIOS 2.10.0 11/12/2020
Call Trace:
 dump_stack+0x93/0xc2
 mark_lock_irq.cold+0x2c/0xb3
 ? verify_lock_unused+0x390/0x390
 ? stack_trace_consume_entry+0x160/0x160
 ? lock_downgrade+0x100/0x100
 ? save_trace+0x88/0x5e0
 ? _raw_spin_unlock_irqrestore+0x2d/0x40
 mark_lock+0x530/0x1470
 ? mark_lock_irq+0x1d10/0x1d10
 ? enqueue_timer+0x660/0x660
 mark_usage+0x215/0x2a0
 __lock_acquire+0x79b/0x18d0
 ? tcp_schedule_loss_probe.part.0+0x38c/0x520
 lock_acquire+0x1ca/0x480
 ? nvme_tcp_state_change+0x21/0x150 [nvme_tcp]
 ? rcu_read_unlock+0x40/0x40
 ? tcp_mtu_probe+0x1ae0/0x1ae0
 ? kmalloc_reserve+0xa0/0xa0
 ? sysfs_file_ops+0x170/0x170
 _raw_read_lock+0x3d/0xa0
 ? nvme_tcp_state_change+0x21/0x150 [nvme_tcp]
 nvme_tcp_state_change+0x21/0x150 [nvme_tcp]
 ? sysfs_file_ops+0x170/0x170
 inet_shutdown+0x189/0x300
 __nvme_tcp_stop_queue+0x36/0x270 [nvme_tcp]
 nvme_tcp_stop_queue+0x87/0xb0 [nvme_tcp]
 nvme_tcp_teardown_admin_queue+0x69/0xe0 [nvme_tcp]
 nvme_do_delete_ctrl+0x100/0x10c [nvme_core]
 nvme_sysfs_delete.cold+0x8/0xd [nvme_core]
 kernfs_fop_write_iter+0x2c7/0x460
 new_sync_write+0x36c/0x610
 ? new_sync_read+0x600/0x600
 ? lock_acquire+0x1ca/0x480
 ? rcu_read_unlock+0x40/0x40
 ? lock_is_held_type+0x9a/0x110
 vfs_write+0x5c0/0x870
 ksys_write+0xf9/0x1d0
 ? __ia32_sys_read+0xa0/0xa0
 ? lockdep_hardirqs_on_prepare.part.0+0x198/0x340
 ? syscall_enter_from_user_mode+0x27/0x70
 do_syscall_64+0x33/0x40
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Fixes: 872d26a391da ("nvmet-tcp: add NVMe over TCP target driver")
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agonvme-tcp: block BH in sk state_change sk callback
Sagi Grimberg [Sun, 21 Mar 2021 07:08:48 +0000 (00:08 -0700)]
nvme-tcp: block BH in sk state_change sk callback

The TCP stack can run from process context for a long time
so we should disable BH here.

Fixes: 3f2304f8c6d6 ("nvme-tcp: add NVMe over TCP host driver")
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agonvmet: return proper error code from discovery ctrl
Hou Pu [Wed, 31 Mar 2021 06:52:39 +0000 (14:52 +0800)]
nvmet: return proper error code from discovery ctrl

Return NVME_SC_INVALID_FIELD from discovery controller like normal
controller when executing identify or get log page command.

Signed-off-by: Hou Pu <houpu.main@gmail.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agonvme: warn of unhandled effects only once
Keith Busch [Wed, 17 Mar 2021 20:33:41 +0000 (13:33 -0700)]
nvme: warn of unhandled effects only once

We don't need to repeatedly spam the kernel logs with the same warning
about unhandled passthrough IO effects. Just one warning is sufficient
to observe this condition occurs.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agonvme: use driver pdu command for passthrough
Keith Busch [Wed, 17 Mar 2021 20:37:03 +0000 (13:37 -0700)]
nvme: use driver pdu command for passthrough

All nvme transport drivers preallocate an nvme command for each request.
Assume to use that command for nvme_setup_cmd() instead of requiring
drivers pass a pointer to it. All nvme drivers must initialize the
generic nvme_request 'cmd' to point to the transport's preallocated
nvme_command.

The generic nvme_request cmd pointer had previously been used only as a
temporary copy for passthrough commands. Since it now points to the
command that gets dispatched, passthrough commands must directly set it
up prior to executing the request.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agonvme-pci: allocate nvme_command within driver pdu
Keith Busch [Wed, 17 Mar 2021 20:37:02 +0000 (13:37 -0700)]
nvme-pci: allocate nvme_command within driver pdu

Except for pci, all the nvme transport drivers allocate a command within
the driver's pdu. Align pci with everyone else by allocating the nvme
command within pci's pdu and replace the .queue_rq() stack variable with
this.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agonvmet: do not allow model_number exceed 40 bytes
Noam Gottlieb [Mon, 15 Mar 2021 14:56:11 +0000 (14:56 +0000)]
nvmet: do not allow model_number exceed 40 bytes

According to the NVM specifications, the model number size should be
40 bytes (bytes 63:24 of the Identify Controller data structure).
Therefore, any attempt to store a value into model_number which
exceeds 40 bytes should return an error.

Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Noam Gottlieb <ngottlieb@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agonvmet: remove unnecessary ctrl parameter
Chaitanya Kulkarni [Wed, 10 Mar 2021 01:16:32 +0000 (17:16 -0800)]
nvmet: remove unnecessary ctrl parameter

The function nvmet_ctrl_find_get() accepts out pointer to nvmet_ctrl
structure. This function returns the same error value from two places
that is :- NVME_SC_CONNECT_INVALID_PARAM | NVME_SC_DNR.

Move this to the caller so we can change the return type to nvmet_ctrl.

Now that we can changed the return type, instead of taking out pointer
to the nvmet_ctrl structure remove that function parameter and return
the valid nvmet_ctrl pointer on success and NULL on failure.

Also, add and rename the goto labels for more readability with comments.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agonvmet-fc: update function documentation
Chaitanya Kulkarni [Mon, 1 Mar 2021 02:06:10 +0000 (18:06 -0800)]
nvmet-fc: update function documentation

Add minimum description of the hosthandle parameter for
nvmet_fc_rcv_ls_req() so that we can get rid of the following warning.

drivers/nvme//target/fc.c:2009: warning: Function parameter or member 'hosthandle' not described in 'nvmet_fc_rcv_ls_req

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agonvme-fc: fix the function documentation comment
Chaitanya Kulkarni [Mon, 1 Mar 2021 02:06:09 +0000 (18:06 -0800)]
nvme-fc: fix the function documentation comment

The nvme_fc_rcv_ls_req() function has first argument as pointer to
remoteport named portprt, but in the documentation comment that is name
is used as remoteport. Fix that to get rid if the compilation warning.

drivers/nvme//host/fc.c:1724: warning: Function parameter or member 'portptr' not described in 'nvme_fc_rcv_ls_req'
drivers/nvme//host/fc.c:1724: warning: Excess function parameter 'remoteport' description in 'nvme_fc_rcv_ls_req'

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agonvme: add new line after variable declatation
Chaitanya Kulkarni [Mon, 1 Mar 2021 02:06:11 +0000 (18:06 -0800)]
nvme: add new line after variable declatation

Add a new line in functions nvme_pr_preempt(), nvme_pr_clear(), and
nvme_pr_release() after variable declaration which follows the rest of
the code in the nvme/host/core.c.

No functional change(s) in this patch.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agonvme: don't check nvme_req flags for new req
Chaitanya Kulkarni [Mon, 1 Mar 2021 02:06:08 +0000 (18:06 -0800)]
nvme: don't check nvme_req flags for new req

nvme_clear_request() has a check for flag REQ_DONTPREP and it is called
from nvme_init_request() and nvme_setuo_cmd().

The function nvme_init_request() is called from nvme_alloc_request()
and nvme_alloc_request_qid(). From these two callers new request is
allocated everytime. For newly allocated request RQF_DONTPREP is never
set. Since after getting a tag, block layer sets the req->rq_flags == 0
and never sets the REQ_DONTPREP when returning the request :-

nvme_alloc_request()
blk_mq_alloc_request()
blk_mq_rq_ctx_init()
rq->rq_flags = 0 <----

nvme_alloc_request_qid()
blk_mq_alloc_request_hctx()
blk_mq_rq_ctx_init()
rq->rq_flags = 0 <----

The block layer does set req->rq_flags but REQ_DONTPREP is not one of
them and that is set by the driver.

That means we can unconditinally set the REQ_DONTPREP value to the
rq->rq_flags when nvme_init_request()->nvme_clear_request() is called
from above two callers.

Move the check for REQ_DONTPREP from nvme_clear_nvme_request() into
nvme_setup_cmd().

This is needed since nvme_alloc_request() now gets called from fast
path when NVMeOF target is configured with passthru backend to avoid
unnecessary checks in the fast path.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agonvme: mark nvme_setup_passsthru() inline
Chaitanya Kulkarni [Mon, 1 Mar 2021 02:06:06 +0000 (18:06 -0800)]
nvme: mark nvme_setup_passsthru() inline

Since nvmet_setup_passthru() function falls in fast path when called
from the NVMeOF passthru backend, make it inline.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agonvme: split init identify into helper
Chaitanya Kulkarni [Mon, 1 Mar 2021 02:06:05 +0000 (18:06 -0800)]
nvme: split init identify into helper

The function nvme_init_ctrl_finish() (formerly nvme_init_identify()) has
grown over the period of time about ~200 lines given the size of nvme id
ctrl data structure.

Move the nvme_id_ctrl data structure related initilzation into helper
nvme_init_identify() and call it from nvme_init_ctrl_finish().

When we move the code into nvme_init_identify() change the local
variable i from int to unsigned int and remove the duplicate kfree()
after nvme_mpath_init() and jump to the label out_free if
nvme_mpath_ini() fails.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agonvme: rename nvme_init_identify()
Chaitanya Kulkarni [Mon, 1 Mar 2021 02:06:04 +0000 (18:06 -0800)]
nvme: rename nvme_init_identify()

This is a prep patch so that we can move the identify data structure
related code initialization from nvme_init_identify() into a helper.

Rename the function nvmet_init_identify() to nvmet_init_ctrl_finish().

Next patch will move the nvme_id_ctrl related initialization from newly
renamed function nvme_init_ctrl_finish() into the nvme_init_identify()
helper.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agonvme: reduce checks for zero command effects
Kanchan Joshi [Mon, 8 Mar 2021 19:18:04 +0000 (00:48 +0530)]
nvme: reduce checks for zero command effects

For passthrough I/O commands, effects are usually to be zero.
nvme_passthrough_end() does three checks in futility for this case.
Bail out of function-call/checks.

Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agonvme: use NVME_CTRL_CMIC_ANA macro
Kanchan Joshi [Mon, 8 Mar 2021 19:18:03 +0000 (00:48 +0530)]
nvme: use NVME_CTRL_CMIC_ANA macro

Use the proper macro instead of hard-coded value.

Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agonvmet: replace white spaces with tabs
Chaitanya Kulkarni [Thu, 25 Feb 2021 01:56:42 +0000 (17:56 -0800)]
nvmet: replace white spaces with tabs

Instead of the using the whitespaces use tab spacing in the
nvmet_execute_identify_ns().

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agonvmet: remove an unnecessary function parameter to nvmet_check_ctrl_status
Chaitanya Kulkarni [Thu, 25 Feb 2021 01:56:40 +0000 (17:56 -0800)]
nvmet: remove an unnecessary function parameter to nvmet_check_ctrl_status

In nvmet_check_ctrl_status() cmd can be derived from nvmet_req. Remove
the local variable cmd in the nvmet_check_ctrl_status() and function
parameter cmd for nvmet_check_ctrl_status(). Derive the cmd value from
req parameter in the nvmet_check_ctrl_status().

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agonvmet: update error log page in nvmet_alloc_ctrl()
Chaitanya Kulkarni [Thu, 25 Feb 2021 01:56:38 +0000 (17:56 -0800)]
nvmet: update error log page in nvmet_alloc_ctrl()

Instead of updating the error log page in the caller of the
nvmet_alloc_ctrt() update the error log page in the nvmet_alloc_ctrl().

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agonvmet: remove a duplicate status assignment in nvmet_alloc_ctrl
Chaitanya Kulkarni [Thu, 25 Feb 2021 01:56:37 +0000 (17:56 -0800)]
nvmet: remove a duplicate status assignment in nvmet_alloc_ctrl

In the function nvmet_alloc_ctrl() we assign status value before we
call nvmet_fine_get_subsys() to:

status = NVME_SC_CONNECT_INVALID_PARAM | NVME_SC_DNR;

After we successfully find the subsystem we again set the status value
to:

status = NVME_SC_CONNECT_INVALID_PARAM | NVME_SC_DNR;

Remove the duplicate status assignment value.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agonvme-pci: cleanup nvme_irq()
Chaitanya Kulkarni [Tue, 23 Feb 2021 20:47:41 +0000 (12:47 -0800)]
nvme-pci: cleanup nvme_irq()

Get rid of a local variable that is not needed and just return the
status directly.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agonvme-pci: remove the barriers in nvme_irq()
Chaitanya Kulkarni [Tue, 23 Feb 2021 20:47:40 +0000 (12:47 -0800)]
nvme-pci: remove the barriers in nvme_irq()

The barriers were added to the nvme_irq() in commit 3a7afd8ee42a
("nvme-pci: remove the CQ lock for interrupt driven queues") to prevent
compiler from doing memory optimization for the variabes that were
protected previously by spinlock in nvme_irq() at completion queue
processing and with queue head check condition.

The variable nvmeq->last_cq_head from those checks was removed in the
commit f6c4d97b0d82 ("nvme/pci: Remove last_cq_head") that was not
allwing poll queues from mistakenly triggering the spurious interrupt
detection.

Remove the barriers which were protecting the updates to the variables.

Reported-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agomtip32xx: use LIST_HEAD() for list_head
Shixin Liu [Mon, 29 Mar 2021 09:53:49 +0000 (17:53 +0800)]
mtip32xx: use LIST_HEAD() for list_head

There's no need to declare a list and then init it manually,
just use the LIST_HEAD() macro.

Signed-off-by: Shixin Liu <liushixin2@huawei.com>
Link: https://lore.kernel.org/r/20210329095349.4170870-2-liushixin2@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agomtip32xx: use DEFINE_SPINLOCK() for spinlock
Shixin Liu [Mon, 29 Mar 2021 09:53:48 +0000 (17:53 +0800)]
mtip32xx: use DEFINE_SPINLOCK() for spinlock

spinlock can be initialized automatically with DEFINE_SPINLOCK()
rather than explicitly calling spin_lock_init().

Signed-off-by: Shixin Liu <liushixin2@huawei.com>
Link: https://lore.kernel.org/r/20210329095349.4170870-1-liushixin2@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: remove the revalidate_disk method
Christoph Hellwig [Mon, 8 Mar 2021 07:45:50 +0000 (08:45 +0100)]
block: remove the revalidate_disk method

No implementations left.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210308074550.422714-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoparide/pd: remove ->revalidate_disk
Christoph Hellwig [Mon, 8 Mar 2021 07:45:48 +0000 (08:45 +0100)]
paride/pd: remove ->revalidate_disk

->revalidate_disk is only called during add_disk for pd, but at that
point the driver has already set the capacity to the one returned from
Identify a little earlier, so this additional update is entirely
superflous.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210308074550.422714-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoMerge branch 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md...
Jens Axboe [Thu, 25 Mar 2021 19:18:30 +0000 (13:18 -0600)]
Merge branch 'md-next' of https://git./linux/kernel/git/song/md into for-5.13/drivers

Pull MD updates from Song:

"The major changes are:

  1. Performance improvement for raid10 discard requests, from Xiao Ni.
  2. Fix missing information of /proc/mdstat, from Jan Glauber."

* 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
  md: Fix missing unused status line of /proc/mdstat
  md/raid10: improve discard request for far layout
  md/raid10: improve raid10 discard request
  md/raid10: pull the code that wait for blocked dev into one function
  md/raid10: extend r10bio devs to raid disks
  md: add md_submit_discard_bio() for submitting discard bio

3 years agomd: Fix missing unused status line of /proc/mdstat
Jan Glauber [Wed, 17 Mar 2021 14:04:39 +0000 (15:04 +0100)]
md: Fix missing unused status line of /proc/mdstat

Reading /proc/mdstat with a read buffer size that would not
fit the unused status line in the first read will skip this
line from the output.

So 'dd if=/proc/mdstat bs=64 2>/dev/null' will not print something
like: unused devices: <none>

Don't return NULL immediately in start() for v=2 but call
show() once to print the status line also for multiple reads.

Cc: stable@vger.kernel.org
Fixes: 1f4aace60b0e ("fs/seq_file.c: simplify seq_file iteration code and interface")
Signed-off-by: Jan Glauber <jglauber@digitalocean.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
3 years agomd/raid10: improve discard request for far layout
Xiao Ni [Thu, 4 Feb 2021 07:50:47 +0000 (15:50 +0800)]
md/raid10: improve discard request for far layout

For far layout, the discard region is not continuous on disks. So it needs
far copies r10bio to cover all regions. It needs a way to know all r10bios
have finish or not. Similar with raid10_sync_request, only the first r10bio
master_bio records the discard bio. Other r10bios master_bio record the
first r10bio. The first r10bio can finish after other r10bios finish and
then return the discard bio.

Tested-by: Adrian Huang <ahuang12@lenovo.com>
Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
3 years agomd/raid10: improve raid10 discard request
Xiao Ni [Thu, 4 Feb 2021 07:50:46 +0000 (15:50 +0800)]
md/raid10: improve raid10 discard request

Now the discard request is split by chunk size. So it takes a long time
to finish mkfs on disks which support discard function. This patch improve
handling raid10 discard request. It uses the similar way with patch
29efc390b (md/md0: optimize raid0 discard handling).

But it's a little complex than raid0. Because raid10 has different layout.
If raid10 is offset layout and the discard request is smaller than stripe
size. There are some holes when we submit discard bio to underlayer disks.

For example: five disks (disk1 - disk5)
D01 D02 D03 D04 D05
D05 D01 D02 D03 D04
D06 D07 D08 D09 D10
D10 D06 D07 D08 D09
The discard bio just wants to discard from D03 to D10. For disk3, there is
a hole between D03 and D08. For disk4, there is a hole between D04 and D09.
D03 is a chunk, raid10_write_request can handle one chunk perfectly. So
the part that is not aligned with stripe size is still handled by
raid10_write_request.

If reshape is running when discard bio comes and the discard bio spans the
reshape position, raid10_write_request is responsible to handle this
discard bio.

I did a test with this patch set.
Without patch:
time mkfs.xfs /dev/md0
real4m39.775s
user0m0.000s
sys0m0.298s

With patch:
time mkfs.xfs /dev/md0
real0m0.105s
user0m0.000s
sys0m0.007s

nvme3n1           259:1    0   477G  0 disk
└─nvme3n1p1       259:10   0    50G  0 part
nvme4n1           259:2    0   477G  0 disk
└─nvme4n1p1       259:11   0    50G  0 part
nvme5n1           259:6    0   477G  0 disk
└─nvme5n1p1       259:12   0    50G  0 part
nvme2n1           259:9    0   477G  0 disk
└─nvme2n1p1       259:15   0    50G  0 part
nvme0n1           259:13   0   477G  0 disk
└─nvme0n1p1       259:14   0    50G  0 part

Reviewed-by: Coly Li <colyli@suse.de>
Reviewed-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Tested-by: Adrian Huang <ahuang12@lenovo.com>
Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
3 years agomd/raid10: pull the code that wait for blocked dev into one function
Xiao Ni [Thu, 4 Feb 2021 07:50:45 +0000 (15:50 +0800)]
md/raid10: pull the code that wait for blocked dev into one function

The following patch will reuse these logics, so pull the same codes into
one function.

Tested-by: Adrian Huang <ahuang12@lenovo.com>
Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
3 years agomd/raid10: extend r10bio devs to raid disks
Xiao Ni [Thu, 4 Feb 2021 07:50:44 +0000 (15:50 +0800)]
md/raid10: extend r10bio devs to raid disks

Now it allocs r10bio->devs[conf->copies]. Discard bio needs to submit
to all member disks and it needs to use r10bio. So extend to
r10bio->devs[geo.raid_disks].

Reviewed-by: Coly Li <colyli@suse.de>
Tested-by: Adrian Huang <ahuang12@lenovo.com>
Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
3 years agomd: add md_submit_discard_bio() for submitting discard bio
Xiao Ni [Thu, 4 Feb 2021 07:50:43 +0000 (15:50 +0800)]
md: add md_submit_discard_bio() for submitting discard bio

Move these logic from raid0.c to md.c, so that we can also use it in
raid10.c.

Reviewed-by: Coly Li <colyli@suse.de>
Reviewed-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Tested-by: Adrian Huang <ahuang12@lenovo.com>
Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
3 years agodrivers/block: remove the umem driver
Davidlohr Bueso [Tue, 23 Mar 2021 19:07:10 +0000 (12:07 -0700)]
drivers/block: remove the umem driver

This removes the driver on the premise that it has been unused for a long
time. This is a better approach compared to changing untestable code
nobody cares about in the first place. Similarly, the umem.com website now
shows a mere Godaddy parking add.

Acked-by: NeilBrown <neilb@suse.de>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agorsxx: remove extraneous 'const' qualifier
Arnd Bergmann [Tue, 23 Mar 2021 21:57:26 +0000 (22:57 +0100)]
rsxx: remove extraneous 'const' qualifier

The returned string from rsxx_card_state_to_str is 'const',
but the other qualifier doesn't change anything here except
causing a warning with 'clang -Wextra':

drivers/block/rsxx/core.c:393:21: warning: 'const' type qualifier on return type has no effect [-Wignored-qualifiers]
static const char * const rsxx_card_state_to_str(unsigned int state)

Fixes: f37912039eb0 ("block: IBM RamSan 70/80 trivial changes.")
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20210323215753.281668-1-arnd@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoxsysace: Remove SYSACE driver
Michal Simek [Mon, 9 Nov 2020 10:59:41 +0000 (11:59 +0100)]
xsysace: Remove SYSACE driver

Sysace IP is no longer used on Xilinx PowerPC 405/440 and Microblaze
systems. The driver is not regularly tested and very likely not working for
quite a long time that's why remove it.

Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agos390/dasd: let driver core manage the sysfs attributes
Julian Wiedmann [Tue, 16 Mar 2021 09:45:13 +0000 (10:45 +0100)]
s390/dasd: let driver core manage the sysfs attributes

Wire up device_driver->dev_groups, so that really_probe() creates the
sysfs attributes for us automatically.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com>
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Link: https://lore.kernel.org/r/20210316094513.2601218-3-sth@linux.ibm.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agos390/dasd: remove dasd_fba_probe() wrapper
Julian Wiedmann [Tue, 16 Mar 2021 09:45:12 +0000 (10:45 +0100)]
s390/dasd: remove dasd_fba_probe() wrapper

commit e03c5941f904 ("s390/dasd: Remove unused parameter from
dasd_generic_probe()") allows us to wire the generic callback up
directly, avoiding the additional level of indirection.

While at it also remove the forward declaration for the dasd_fba_driver
struct, it's no longer needed.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com>
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Link: https://lore.kernel.org/r/20210316094513.2601218-2-sth@linux.ibm.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoLinux 5.12-rc3
Linus Torvalds [Sun, 14 Mar 2021 21:41:02 +0000 (14:41 -0700)]
Linux 5.12-rc3

3 years agoprctl: fix PR_SET_MM_AUXV kernel stack leak
Alexey Dobriyan [Sun, 14 Mar 2021 20:51:14 +0000 (23:51 +0300)]
prctl: fix PR_SET_MM_AUXV kernel stack leak

Doing a

prctl(PR_SET_MM, PR_SET_MM_AUXV, addr, 1);

will copy 1 byte from userspace to (quite big) on-stack array
and then stash everything to mm->saved_auxv.
AT_NULL terminator will be inserted at the very end.

/proc/*/auxv handler will find that AT_NULL terminator
and copy original stack contents to userspace.

This devious scheme requires CAP_SYS_RESOURCE.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoMerge tag 'irq-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 14 Mar 2021 20:33:33 +0000 (13:33 -0700)]
Merge tag 'irq-urgent-2021-03-14' of git://git./linux/kernel/git/tip/tip

Pull irq fixes from Thomas Gleixner:
 "A set of irqchip updates:

   - Make the GENERIC_IRQ_MULTI_HANDLER configuration correct

   - Add a missing DT compatible string for the Ingenic driver

   - Remove the pointless debugfs_file pointer from struct irqdomain"

* tag 'irq-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/ingenic: Add support for the JZ4760
  dt-bindings/irq: Add compatible string for the JZ4760B
  irqchip: Do not blindly select CONFIG_GENERIC_IRQ_MULTI_HANDLER
  ARM: ep93xx: Select GENERIC_IRQ_MULTI_HANDLER directly
  irqdomain: Remove debugfs_file from struct irq_domain

3 years agoMerge tag 'timers-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 14 Mar 2021 20:29:38 +0000 (13:29 -0700)]
Merge tag 'timers-urgent-2021-03-14' of git://git./linux/kernel/git/tip/tip

Pull timer fix from Thomas Gleixner:
 "A single fix in for hrtimers to prevent an interrupt storm caused by
  the lack of reevaluation of the timers which expire in softirq context
  under certain circumstances, e.g. when the clock was set"

* tag 'timers-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  hrtimer: Update softirq_expires_next correctly after __hrtimer_get_next_event()

3 years agoMerge tag 'sched-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 14 Mar 2021 20:27:06 +0000 (13:27 -0700)]
Merge tag 'sched-urgent-2021-03-14' of git://git./linux/kernel/git/tip/tip

Pull scheduler fixes from Thomas Gleixner:
 "A set of scheduler updates:

   - Prevent a NULL pointer dereference in the migration_stop_cpu()
     mechanims

   - Prevent self concurrency of affine_move_task()

   - Small fixes and cleanups related to task migration/affinity setting

   - Ensure that sync_runqueues_membarrier_state() is invoked on the
     current CPU when it is in the cpu mask"

* tag 'sched-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/membarrier: fix missing local execution of ipi_sync_rq_state()
  sched: Simplify set_affinity_pending refcounts
  sched: Fix affine_move_task() self-concurrency
  sched: Optimize migration_cpu_stop()
  sched: Collate affine_move_task() stoppers
  sched: Simplify migration_cpu_stop()
  sched: Fix migration_cpu_stop() requeueing

3 years agoMerge tag 'objtool-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 14 Mar 2021 20:15:55 +0000 (13:15 -0700)]
Merge tag 'objtool-urgent-2021-03-14' of git://git./linux/kernel/git/tip/tip

Pull objtool fix from Thomas Gleixner:
 "A single objtool fix to handle the PUSHF/POPF validation correctly for
  the paravirt changes which modified arch_local_irq_restore not to use
  popf"

* tag 'objtool-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  objtool,x86: Fix uaccess PUSHF/POPF validation

3 years agoMerge tag 'locking-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 14 Mar 2021 20:03:21 +0000 (13:03 -0700)]
Merge tag 'locking-urgent-2021-03-14' of git://git./linux/kernel/git/tip/tip

Pull locking fixes from Thomas Gleixner:
 "A couple of locking fixes:

   - A fix for the static_call mechanism so it handles unaligned
     addresses correctly.

   - Make u64_stats_init() a macro so every instance gets a seperate
     lockdep key.

   - Make seqcount_latch_init() a macro as well to preserve the static
     variable which is used for the lockdep key"

* tag 'locking-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  seqlock,lockdep: Fix seqcount_latch_init()
  u64_stats,lockdep: Fix u64_stats_init() vs lockdep
  static_call: Fix the module key fixup

3 years agoMerge tag 'perf_urgent_for_v5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 14 Mar 2021 19:57:17 +0000 (12:57 -0700)]
Merge tag 'perf_urgent_for_v5.12-rc3' of git://git./linux/kernel/git/tip/tip

Pull perf fixes from Borislav Petkov:

 - Make sure PMU internal buffers are flushed for per-CPU events too and
   properly handle PID/TID for large PEBS.

 - Handle the case properly when there's no PMU and therefore return an
   empty list of perf MSRs for VMX to switch instead of reading random
   garbage from the stack.

* tag 'perf_urgent_for_v5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/perf: Use RET0 as default for guest_get_msrs to handle "no PMU" case
  perf/x86/intel: Set PERF_ATTACH_SCHED_CB for large PEBS and LBR
  perf/core: Flush PMU internal buffers for per-CPU events

3 years agoMerge tag 'efi-urgent-for-v5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 14 Mar 2021 19:54:56 +0000 (12:54 -0700)]
Merge tag 'efi-urgent-for-v5.12-rc2' of git://git./linux/kernel/git/tip/tip

Pull EFI fix from Ard Biesheuvel via Borislav Petkov:
 "Fix an oversight in the handling of EFI_RT_PROPERTIES_TABLE, which was
  added v5.10, but failed to take the SetVirtualAddressMap() RT service
  into account"

* tag 'efi-urgent-for-v5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efi: stub: omit SetVirtualAddressMap() if marked unsupported in RT_PROP table

3 years agoMerge tag 'x86_urgent_for_v5.12_rc3' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 14 Mar 2021 19:48:10 +0000 (12:48 -0700)]
Merge tag 'x86_urgent_for_v5.12_rc3' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

 - A couple of SEV-ES fixes and robustifications: verify usermode stack
   pointer in NMI is not coming from the syscall gap, correctly track
   IRQ states in the #VC handler and access user insn bytes atomically
   in same handler as latter cannot sleep.

 - Balance 32-bit fast syscall exit path to do the proper work on exit
   and thus not confuse audit and ptrace frameworks.

 - Two fixes for the ORC unwinder going "off the rails" into KASAN
   redzones and when ORC data is missing.

* tag 'x86_urgent_for_v5.12_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/sev-es: Use __copy_from_user_inatomic()
  x86/sev-es: Correctly track IRQ states in runtime #VC handler
  x86/sev-es: Check regs->sp is trusted before adjusting #VC IST stack
  x86/sev-es: Introduce ip_within_syscall_gap() helper
  x86/entry: Fix entry/exit mismatch on failed fast 32-bit syscalls
  x86/unwind/orc: Silence warnings caused by missing ORC data
  x86/unwind/orc: Disable KASAN checking in the ORC unwinder, part 2

3 years agoMerge tag 'powerpc-5.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc...
Linus Torvalds [Sun, 14 Mar 2021 19:37:43 +0000 (12:37 -0700)]
Merge tag 'powerpc-5.12-3' of git://git./linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "Some more powerpc fixes for 5.12:

   - Fix wrong instruction encoding for lis in ppc_function_entry(),
     which could potentially lead to missed kprobes.

   - Fix SET_FULL_REGS on 32-bit and 64e, which prevented ptrace of
     non-volatile GPRs immediately after exec.

   - Clean up a missed SRR specifier in the recent interrupt rework.

   - Don't treat unrecoverable_exception() as an interrupt handler, it's
     called from other handlers so shouldn't do the interrupt entry/exit
     accounting itself.

   - Fix build errors caused by missing declarations for
     [en/dis]able_kernel_vsx().

  Thanks to Christophe Leroy, Daniel Axtens, Geert Uytterhoeven, Jiri
  Olsa, Naveen N. Rao, and Nicholas Piggin"

* tag 'powerpc-5.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/traps: unrecoverable_exception() is not an interrupt handler
  powerpc: Fix missing declaration of [en/dis]able_kernel_vsx()
  powerpc/64s/exception: Clean up a missed SRR specifier
  powerpc: Fix inverted SET_FULL_REGS bitop
  powerpc/64s: Use symbolic macros for function entry encoding
  powerpc/64s: Fix instruction encoding for lis in ppc_function_entry()

3 years agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Sun, 14 Mar 2021 19:35:02 +0000 (12:35 -0700)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "More fixes for ARM and x86"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: LAPIC: Advancing the timer expiration on guest initiated write
  KVM: x86/mmu: Skip !MMU-present SPTEs when removing SP in exclusive mode
  KVM: kvmclock: Fix vCPUs > 64 can't be online/hotpluged
  kvm: x86: annotate RCU pointers
  KVM: arm64: Fix exclusive limit for IPA size
  KVM: arm64: Reject VM creation when the default IPA size is unsupported
  KVM: arm64: Ensure I-cache isolation between vcpus of a same VM
  KVM: arm64: Don't use cbz/adr with external symbols
  KVM: arm64: Fix range alignment when walking page tables
  KVM: arm64: Workaround firmware wrongly advertising GICv2-on-v3 compatibility
  KVM: arm64: Rename __vgic_v3_get_ich_vtr_el2() to __vgic_v3_get_gic_config()
  KVM: arm64: Don't access PMSELR_EL0/PMUSERENR_EL0 when no PMU is available
  KVM: arm64: Turn kvm_arm_support_pmu_v3() into a static key
  KVM: arm64: Fix nVHE hyp panic host context restore
  KVM: arm64: Avoid corrupting vCPU context register in guest exit
  KVM: arm64: nvhe: Save the SPE context early
  kvm: x86: use NULL instead of using plain integer as pointer
  KVM: SVM: Connect 'npt' module param to KVM's internal 'npt_enabled'
  KVM: x86: Ensure deadline timer has truly expired before posting its IRQ

3 years agoMerge branch 'akpm' (patches from Andrew)
Linus Torvalds [Sun, 14 Mar 2021 19:23:34 +0000 (12:23 -0700)]
Merge branch 'akpm' (patches from Andrew)

Merge misc fixes from Andrew Morton:
 "28 patches.

  Subsystems affected by this series: mm (memblock, pagealloc, hugetlb,
  highmem, kfence, oom-kill, madvise, kasan, userfaultfd, memcg, and
  zram), core-kernel, kconfig, fork, binfmt, MAINTAINERS, kbuild, and
  ia64"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (28 commits)
  zram: fix broken page writeback
  zram: fix return value on writeback_store
  mm/memcg: set memcg when splitting page
  mm/memcg: rename mem_cgroup_split_huge_fixup to split_page_memcg and add nr_pages argument
  ia64: fix ptrace(PTRACE_SYSCALL_INFO_EXIT) sign
  ia64: fix ia64_syscall_get_set_arguments() for break-based syscalls
  mm/userfaultfd: fix memory corruption due to writeprotect
  kasan: fix KASAN_STACK dependency for HW_TAGS
  kasan, mm: fix crash with HW_TAGS and DEBUG_PAGEALLOC
  mm/madvise: replace ptrace attach requirement for process_madvise
  include/linux/sched/mm.h: use rcu_dereference in in_vfork()
  kfence: fix reports if constant function prefixes exist
  kfence, slab: fix cache_alloc_debugcheck_after() for bulk allocations
  kfence: fix printk format for ptrdiff_t
  linux/compiler-clang.h: define HAVE_BUILTIN_BSWAP*
  MAINTAINERS: exclude uapi directories in API/ABI section
  binfmt_misc: fix possible deadlock in bm_register_write
  mm/highmem.c: fix zero_user_segments() with start > end
  hugetlb: do early cow when page pinned on src mm
  mm: use is_cow_mapping() across tree where proper
  ...

3 years agoMerge tag 'irqchip-fixes-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git...
Thomas Gleixner [Sun, 14 Mar 2021 15:34:35 +0000 (16:34 +0100)]
Merge tag 'irqchip-fixes-5.12-1' of git://git./linux/kernel/git/maz/arm-platforms into irq/urgent

Pull irqchip fixes from Marc Zyngier:

  - More compatible strings for the Ingenic irqchip (introducing the
    JZ4760B SoC)
  - Select GENERIC_IRQ_MULTI_HANDLER on the ARM ep93xx platform
  - Drop all GENERIC_IRQ_MULTI_HANDLER selections from the irqchip
    Kconfig, now relying on the architecture to get it right
  - Drop the debugfs_file field from struct irq_domain, now that
    debugfs can track things on its own

3 years agoMerge tag 'char-misc-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregk...
Linus Torvalds [Sat, 13 Mar 2021 20:38:44 +0000 (12:38 -0800)]
Merge tag 'char-misc-5.12-rc3' of git://git./linux/kernel/git/gregkh/char-misc

Pull char/misc driver fixes from Greg KH:
 "Here are some small misc/char driver fixes to resolve some reported
  problems:

   - habanalabs driver fixes

   - Acrn build fixes (reported many times)

   - pvpanic module table export fix

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'char-misc-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  misc/pvpanic: Export module FDT device table
  misc: fastrpc: restrict user apps from sending kernel RPC messages
  virt: acrn: Correct type casting of argument of copy_from_user()
  virt: acrn: Use EPOLLIN instead of POLLIN
  virt: acrn: Use vfs_poll() instead of f_op->poll()
  virt: acrn: Make remove_cpu sysfs invisible with !CONFIG_HOTPLUG_CPU
  cpu/hotplug: Fix build error of using {add,remove}_cpu() with !CONFIG_SMP
  habanalabs: fix debugfs address translation
  habanalabs: Disable file operations after device is removed
  habanalabs: Call put_pid() when releasing control device
  drivers: habanalabs: remove unused dentry pointer for debugfs files
  habanalabs: mark hl_eq_inc_ptr() as static