Hou Pu [Fri, 16 Apr 2021 02:45:21 +0000 (10:45 +0800)]
nvmet: avoid queuing keep-alive timer if it is disabled
Issue following command:
nvme set-feature -f 0xf -v 0 /dev/nvme1n1 # disable keep-alive timer
nvme admin-passthru -o 0x18 /dev/nvme1n1 # send keep-alive command
will make keep-alive timer fired and thus delete the controller like
below:
[247459.907635] nvmet: ctrl 1 keep-alive timer (0 seconds) expired!
[247459.930294] nvmet: ctrl 1 fatal error occurred!
Avoid this by not queuing delayed keep-alive if it is disabled when
keep-alive command is received from the admin queue.
Signed-off-by: Hou Pu <houpu.main@gmail.com>
Tested-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Dan Carpenter [Wed, 21 Apr 2021 10:19:45 +0000 (13:19 +0300)]
ataflop: fix off by one in ataflop_probe()
Smatch complains that the "type > NUM_DISK_MINORS" should be >=
instead of >. We also need to subtract one from "type" at the start.
Fixes:
bf9c0538e485 ("ataflop: use a separate gendisk for each media format")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Dan Carpenter [Wed, 21 Apr 2021 10:18:35 +0000 (13:18 +0300)]
ataflop: potential out of bounds in do_format()
The function uses "type" as an array index:
q = unit[drive].disk[type]->queue;
Unfortunately the bounds check on "type" isn't done until later in the
function. Fix this by moving the bounds check to the start.
Fixes:
bf9c0538e485 ("ataflop: use a separate gendisk for each media format")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Gustavo A. R. Silva [Tue, 20 Apr 2021 20:25:51 +0000 (15:25 -0500)]
drbd: Fix fall-through warnings for Clang
In preparation to enable -Wimplicit-fallthrough for Clang, fix a couple
of warnings by explicitly adding a break statement instead of just
letting the code fall through to the next, and by adding a fallthrough
pseudo-keyword in places whre the code is intended to fall through.
Link: https://github.com/KSPP/linux/issues/115
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Dima Stepanov [Mon, 19 Apr 2021 07:37:22 +0000 (09:37 +0200)]
block/rnbd: Use strscpy instead of strlcpy
During checkpatch analyzing the following warning message was found:
WARNING:STRLCPY: Prefer strscpy over strlcpy - see:
https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/
Fix it by using strscpy calls instead of strlcpy.
Signed-off-by: Dima Stepanov <dmitrii.stepanov@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210419073722.15351-20-gi-oh.kim@ionos.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Dima Stepanov [Mon, 19 Apr 2021 07:37:21 +0000 (09:37 +0200)]
block/rnbd-clt-sysfs: Remove copy buffer overlap in rnbd_clt_get_path_name
cppcheck report the following error:
rnbd/rnbd-clt-sysfs.c:522:36: error: The variable 'buf' is used both
as a parameter and as destination in snprintf(). The origin and
destination buffers overlap. Quote from glibc (C-library)
documentation
(http://www.gnu.org/software/libc/manual/html_mono/libc.html#Formatted-Output-Functions):
"If copying takes place between objects that overlap as a result of a
call to sprintf() or snprintf(), the results are undefined."
[sprintfOverlappingData]
Fix it by initializing the buf variable in the first snprintf call.
Fixes:
91f4acb2801c ("block/rnbd-clt: support mapping two devices")
Signed-off-by: Dima Stepanov <dmitrii.stepanov@ionos.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@cloud.ionos.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210419073722.15351-19-gi-oh.kim@ionos.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jack Wang [Mon, 19 Apr 2021 07:37:20 +0000 (09:37 +0200)]
block/rnbd-clt: Remove max_segment_size
We always map with SZ_4K, so do not need max_segment_size.
Cc: Leon Romanovsky <leonro@nvidia.com>
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Acked-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20210419073722.15351-18-gi-oh.kim@ionos.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Md Haris Iqbal [Mon, 19 Apr 2021 07:37:19 +0000 (09:37 +0200)]
block/rnbd-clt: Generate kobject_uevent when the rnbd device state changes
When an RTRS session state changes, the transport layer generates an event
to RNBD. Then RNBD will change the state of the RNBD client device
accordingly.
This commit add kobject_uevent when the RNBD device state changes. With
this udev rules can be configured to react accordingly.
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Link: https://lore.kernel.org/r/20210419073722.15351-17-gi-oh.kim@ionos.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Gioh Kim [Mon, 19 Apr 2021 07:37:18 +0000 (09:37 +0200)]
block/rnbd-srv: Remove unused arguments of rnbd_srv_rdma_ev
struct rtrs_srv is not used when handling rnbd_srv_rdma_ev messages, so
cleaned up
rdma_ev function pointer in rtrs_srv_ops also is changed.
Cc: Leon Romanovsky <leonro@nvidia.com>
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Aleksei Marov <aleksei.marov@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20210419073722.15351-16-gi-oh.kim@ionos.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Gioh Kim [Mon, 19 Apr 2021 07:37:17 +0000 (09:37 +0200)]
Documentation/ABI/rnbd-clt: Add description for nr_poll_queues
describe how to set nr_poll_queues and enable the polling
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Acked-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Link: https://lore.kernel.org/r/20210419073722.15351-15-gi-oh.kim@ionos.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Gioh Kim [Mon, 19 Apr 2021 07:37:16 +0000 (09:37 +0200)]
block/rnbd-clt: Support polling mode for IO latency optimization
RNBD can make double-queues for irq-mode and poll-mode.
For example, on 4-CPU system 8 request-queues are created,
4 for irq-mode and 4 for poll-mode.
If the IO has HIPRI flag, the block-layer will call .poll function
of RNBD. Then IO is sent to the poll-mode queue.
Add optional nr_poll_queues argument for map_devices interface.
To support polling of RNBD, RTRS client creates connections
for both of irq-mode and direct-poll-mode.
For example, on 4-CPU system it could've create 5 connections:
con[0] => user message (softirq cq)
con[1:4] => softirq cq
After this patch, it can create 9 connections:
con[0] => user message (softirq cq)
con[1:4] => softirq cq
con[5:8] => DIRECT-POLL cq
Cc: Leon Romanovsky <leonro@nvidia.com>
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Acked-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20210419073722.15351-14-gi-oh.kim@ionos.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Gioh Kim [Mon, 19 Apr 2021 07:37:15 +0000 (09:37 +0200)]
block/rnbd-clt: Fix missing a memory free when unloading the module
When unloading the rnbd-clt module, it does not free a memory
including the filename of the symbolic link to /sys/block/rnbdX.
It is found by kmemleak as below.
unreferenced object 0xffff9f1a83d3c740 (size 16):
comm "bash", pid 736, jiffies
4295179665 (age 9841.310s)
hex dump (first 16 bytes):
21 64 65 76 21 6e 75 6c 6c 62 30 40 62 6c 61 00 !dev!nullb0@bla.
backtrace:
[<
0000000039f0c55e>] 0xffffffffc0456c24
[<
000000001aab9513>] kernfs_fop_write+0xcf/0x1c0
[<
00000000db5aa4b3>] vfs_write+0xdb/0x1d0
[<
000000007a2e2207>] ksys_write+0x65/0xe0
[<
00000000055e280a>] do_syscall_64+0x50/0x1b0
[<
00000000c2b51831>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Link: https://lore.kernel.org/r/20210419073722.15351-13-gi-oh.kim@ionos.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Tom Rix [Mon, 19 Apr 2021 07:37:14 +0000 (09:37 +0200)]
block/rnbd-clt: Improve find_or_create_sess() return check
clang static analysis reports this problem
rnbd-clt.c:1212:11: warning: Branch condition evaluates to a
garbage value
else if (!first)
^~~~~~
This is triggered in the find_and_get_or_create_sess() call
because the variable first is not initialized and the
earlier check is specifically for
if (sess == ERR_PTR(-ENOMEM))
This is false positive.
But the if-check can be reduced by initializing first to
false and then returning if the call to find_or_creat_sess()
does not set it to true. When it remains false, either
sess will be valid or not. The not case is caught by
find_and_get_or_create_sess()'s caller rnbd_clt_map_device()
sess = find_and_get_or_create_sess(...);
if (IS_ERR(sess))
return ERR_CAST(sess);
Since find_and_get_or_create_sess() initializes first to false
setting it in find_or_create_sess() is not needed.
Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Link: https://lore.kernel.org/r/20210419073722.15351-12-gi-oh.kim@ionos.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Gioh Kim [Mon, 19 Apr 2021 07:37:13 +0000 (09:37 +0200)]
block/rnbd-srv: Remove force_close file after holding a lock
We changed the rnbd_srv_sess_dev_force_close to use try-lock
because rnbd_srv_sess_dev_force_close and process_msg_close
can generate a deadlock.
Now rnbd_srv_sess_dev_force_close would do nothing
if it fails to get the lock. So removing the force_close
file should be moved to after the lock. Or the force_close
file is removed but the others are not removed.
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Link: https://lore.kernel.org/r/20210419073722.15351-11-gi-oh.kim@ionos.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Gioh Kim [Mon, 19 Apr 2021 07:37:12 +0000 (09:37 +0200)]
block/rnbd-srv: Prevent a deadlock generated by accessing sysfs in parallel
We got a warning message below.
When server tries to close one session by force, it locks the sysfs
interface and locks the srv_sess lock.
The problem is that client can send a request to close at the same time.
By close request, server locks the srv_sess lock and locks the sysfs
to remove the sysfs interfaces.
The simplest way to prevent that situation could be just use
mutex_trylock.
[ 234.153965] ======================================================
[ 234.154093] WARNING: possible circular locking dependency detected
[ 234.154219] 5.4.84-storage #5.4.84-1+feature+linux+5.4.y+dbg+
20201216.1319+
b6b887b~deb10 Tainted: G O
[ 234.154381] ------------------------------------------------------
[ 234.154531] kworker/1:1H/618 is trying to acquire lock:
[ 234.154651]
ffff8887a09db0a8 (kn->count#132){++++}, at: kernfs_remove_by_name_ns+0x40/0x80
[ 234.154819]
but task is already holding lock:
[ 234.154965]
ffff8887ae5f6518 (&srv_sess->lock){+.+.}, at: rnbd_srv_rdma_ev+0x144/0x1590 [rnbd_server]
[ 234.155132]
which lock already depends on the new lock.
[ 234.155311]
the existing dependency chain (in reverse order) is:
[ 234.155462]
-> #1 (&srv_sess->lock){+.+.}:
[ 234.155614] __mutex_lock+0x134/0xcb0
[ 234.155761] rnbd_srv_sess_dev_force_close+0x36/0x50 [rnbd_server]
[ 234.155889] rnbd_srv_dev_session_force_close_store+0x69/0xc0 [rnbd_server]
[ 234.156042] kernfs_fop_write+0x13f/0x240
[ 234.156162] vfs_write+0xf3/0x280
[ 234.156278] ksys_write+0xba/0x150
[ 234.156395] do_syscall_64+0x62/0x270
[ 234.156513] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 234.156632]
-> #0 (kn->count#132){++++}:
[ 234.156782] __lock_acquire+0x129e/0x23a0
[ 234.156900] lock_acquire+0xf3/0x210
[ 234.157043] __kernfs_remove+0x42b/0x4c0
[ 234.157161] kernfs_remove_by_name_ns+0x40/0x80
[ 234.157282] remove_files+0x3f/0xa0
[ 234.157399] sysfs_remove_group+0x4a/0xb0
[ 234.157519] rnbd_srv_destroy_dev_session_sysfs+0x19/0x30 [rnbd_server]
[ 234.157648] rnbd_srv_rdma_ev+0x14c/0x1590 [rnbd_server]
[ 234.157775] process_io_req+0x29a/0x6a0 [rtrs_server]
[ 234.157924] __ib_process_cq+0x8c/0x100 [ib_core]
[ 234.158709] ib_cq_poll_work+0x31/0xb0 [ib_core]
[ 234.158834] process_one_work+0x4e5/0xaa0
[ 234.158958] worker_thread+0x65/0x5c0
[ 234.159078] kthread+0x1e0/0x200
[ 234.159194] ret_from_fork+0x24/0x30
[ 234.159309]
other info that might help us debug this:
[ 234.159513] Possible unsafe locking scenario:
[ 234.159658] CPU0 CPU1
[ 234.159775] ---- ----
[ 234.159891] lock(&srv_sess->lock);
[ 234.160005] lock(kn->count#132);
[ 234.160128] lock(&srv_sess->lock);
[ 234.160250] lock(kn->count#132);
[ 234.160364]
*** DEADLOCK ***
[ 234.160536] 3 locks held by kworker/1:1H/618:
[ 234.160677] #0:
ffff8883ca1ed528 ((wq_completion)ib-comp-wq){+.+.}, at: process_one_work+0x40a/0xaa0
[ 234.160840] #1:
ffff8883d2d5fe10 ((work_completion)(&cq->work)){+.+.}, at: process_one_work+0x40a/0xaa0
[ 234.161003] #2:
ffff8887ae5f6518 (&srv_sess->lock){+.+.}, at: rnbd_srv_rdma_ev+0x144/0x1590 [rnbd_server]
[ 234.161168]
stack backtrace:
[ 234.161312] CPU: 1 PID: 618 Comm: kworker/1:1H Tainted: G O 5.4.84-storage #5.4.84-1+feature+linux+5.4.y+dbg+
20201216.1319+
b6b887b~deb10
[ 234.161490] Hardware name: Supermicro H8QG6/H8QG6, BIOS 3.00 09/04/2012
[ 234.161643] Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
[ 234.161765] Call Trace:
[ 234.161910] dump_stack+0x96/0xe0
[ 234.162028] check_noncircular+0x29e/0x2e0
[ 234.162148] ? print_circular_bug+0x100/0x100
[ 234.162267] ? register_lock_class+0x1ad/0x8a0
[ 234.162385] ? __lock_acquire+0x68e/0x23a0
[ 234.162505] ? trace_event_raw_event_lock+0x190/0x190
[ 234.162626] __lock_acquire+0x129e/0x23a0
[ 234.162746] ? register_lock_class+0x8a0/0x8a0
[ 234.162866] lock_acquire+0xf3/0x210
[ 234.162982] ? kernfs_remove_by_name_ns+0x40/0x80
[ 234.163127] __kernfs_remove+0x42b/0x4c0
[ 234.163243] ? kernfs_remove_by_name_ns+0x40/0x80
[ 234.163363] ? kernfs_fop_readdir+0x3b0/0x3b0
[ 234.163482] ? strlen+0x1f/0x40
[ 234.163596] ? strcmp+0x30/0x50
[ 234.163712] kernfs_remove_by_name_ns+0x40/0x80
[ 234.163832] remove_files+0x3f/0xa0
[ 234.163948] sysfs_remove_group+0x4a/0xb0
[ 234.164068] rnbd_srv_destroy_dev_session_sysfs+0x19/0x30 [rnbd_server]
[ 234.164196] rnbd_srv_rdma_ev+0x14c/0x1590 [rnbd_server]
[ 234.164345] ? _raw_spin_unlock_irqrestore+0x43/0x50
[ 234.164466] ? lockdep_hardirqs_on+0x1a8/0x290
[ 234.164597] ? mlx4_ib_poll_cq+0x927/0x1280 [mlx4_ib]
[ 234.164732] ? rnbd_get_sess_dev+0x270/0x270 [rnbd_server]
[ 234.164859] process_io_req+0x29a/0x6a0 [rtrs_server]
[ 234.164982] ? rnbd_get_sess_dev+0x270/0x270 [rnbd_server]
[ 234.165130] __ib_process_cq+0x8c/0x100 [ib_core]
[ 234.165279] ib_cq_poll_work+0x31/0xb0 [ib_core]
[ 234.165404] process_one_work+0x4e5/0xaa0
[ 234.165550] ? pwq_dec_nr_in_flight+0x160/0x160
[ 234.165675] ? do_raw_spin_lock+0x119/0x1d0
[ 234.165796] worker_thread+0x65/0x5c0
[ 234.165914] ? process_one_work+0xaa0/0xaa0
[ 234.166031] kthread+0x1e0/0x200
[ 234.166147] ? kthread_create_worker_on_cpu+0xc0/0xc0
[ 234.166268] ret_from_fork+0x24/0x30
[ 234.251591] rnbd_server L243: </dev/loop1@close_device_session>: Device closed
[ 234.604221] rnbd_server L264: RTRS Session close_device_session disconnected
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Link: https://lore.kernel.org/r/20210419073722.15351-10-gi-oh.kim@ionos.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Gioh Kim [Mon, 19 Apr 2021 07:37:11 +0000 (09:37 +0200)]
block/rnbd-clt: Replace {NO_WAIT,WAIT} with RTRS_PERMIT_{WAIT,NOWAIT}
They are defined with the same value and similar meaning, let's remove
one of them, then we can remove {WAIT,NOWAIT}.
Also change the type of 'wait' from 'int' to 'enum wait_type' to make
it clear.
Cc: Leon Romanovsky <leonro@nvidia.com>
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Guoqing Jiang <guoqing.jiang@ionos.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20210419073722.15351-9-gi-oh.kim@ionos.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Guoqing Jiang [Mon, 19 Apr 2021 07:37:10 +0000 (09:37 +0200)]
block/rnbd: Kill destroy_device_cb
We can use destroy_device directly since destroy_device_cb is just the
wrapper of destroy_device.
Signed-off-by: Guoqing Jiang <guoqing.jiang@ionos.com>
Reviewed-by: Danil Kipnis <danil.kipnis@ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210419073722.15351-8-gi-oh.kim@ionos.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Guoqing Jiang [Mon, 19 Apr 2021 07:37:09 +0000 (09:37 +0200)]
block/rnbd: Kill rnbd_clt_destroy_default_group
No need to have it since we can call sysfs_remove_group in the
rnbd_clt_destroy_sysfs_files.
Then rnbd_clt_destroy_sysfs_files is paired with it's counterpart
rnbd_clt_create_sysfs_files.
Signed-off-by: Guoqing Jiang <guoqing.jiang@ionos.com>
Reviewed-by: Danil Kipnis <danil.kipnis@ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210419073722.15351-7-gi-oh.kim@ionos.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Guoqing Jiang [Mon, 19 Apr 2021 07:37:08 +0000 (09:37 +0200)]
block/rnbd-clt: Move add_disk(dev->gd) to rnbd_clt_setup_gen_disk
It makes more sense to add gendisk in rnbd_clt_setup_gen_disk, instead
of do it in rnbd_clt_map_device.
Signed-off-by: Guoqing Jiang <guoqing.jiang@gmx.com>
Reviewed-by: Danil Kipnis <danil.kipnis@cloud.ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210419073722.15351-6-gi-oh.kim@ionos.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Guoqing Jiang [Mon, 19 Apr 2021 07:37:07 +0000 (09:37 +0200)]
block/rnbd-clt: Remove some arguments from rnbd_client_setup_device
Remove them since both sess and idx can be dereferenced from dev. And
sess is not used in the function.
Signed-off-by: Guoqing Jiang <guoqing.jiang@gmx.com>
Reviewed-by: Danil Kipnis <danil.kipnis@cloud.ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210419073722.15351-5-gi-oh.kim@ionos.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Guoqing Jiang [Mon, 19 Apr 2021 07:37:06 +0000 (09:37 +0200)]
block/rnbd-clt: Remove some arguments from insert_dev_if_not_exists_devpath
Remove 'pathname' and 'sess' since we can dereference it from 'dev'.
Signed-off-by: Guoqing Jiang <guoqing.jiang@gmx.com>
Reviewed-by: Danil Kipnis <danil.kipnis@cloud.ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210419073722.15351-4-gi-oh.kim@ionos.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Gioh Kim [Mon, 19 Apr 2021 07:37:05 +0000 (09:37 +0200)]
Documentation/sysfs-block-rnbd: Add descriptions for remap_device and resize
Two sysfs entries, remap_device and resize, are missing.
Signed-off-by: Gioh Kim <gi-oh.kim@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Link: https://lore.kernel.org/r/20210419073722.15351-3-gi-oh.kim@ionos.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Danil Kipnis [Mon, 19 Apr 2021 07:37:04 +0000 (09:37 +0200)]
MAINTAINERS: Change maintainer for rnbd module
Danil steps down, Haris will take over.
Also update email address to ionos.com, the old
cloud.ionos.com will still work for some time.
Signed-off-by: Danil Kipnis <danil.kipnis@cloud.ionos.com>
Acked-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@cloud.ionos.com>
Link: https://lore.kernel.org/r/20210419073722.15351-2-gi-oh.kim@ionos.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Colin Ian King [Thu, 15 Apr 2021 13:00:20 +0000 (14:00 +0100)]
floppy: remove redundant assignment to variable st
The variable st is being assigned a value that is never read and
it is being updated later with a new value. The initialization is
redundant and can be removed.
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Denis Efremov <efremov@linux.com>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20210415130020.1959951-1-colin.king@canonical.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Denis Efremov [Fri, 16 Apr 2021 08:34:49 +0000 (11:34 +0300)]
floppy: cleanups: remove FLOPPY_SILENT_DCL_CLEAR undef
FLOPPY_SILENT_DCL_CLEAR is not defined anywhere and comes from pre-git
era. Just drop this undef. There is FD_SILENT_DCL_CLEAR which is really
used.
Signed-off-by: Denis Efremov <efremov@linux.com>
Link: https://lore.kernel.org/r/20210416083449.72700-6-efremov@linux.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Denis Efremov [Fri, 16 Apr 2021 08:34:48 +0000 (11:34 +0300)]
floppy: cleanups: use memcpy() to copy reply_buffer
Use memcpy() in raw_cmd_done() to copy reply_buffer instead
of a for loop.
Signed-off-by: Denis Efremov <efremov@linux.com>
Link: https://lore.kernel.org/r/20210416083449.72700-5-efremov@linux.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Denis Efremov [Fri, 16 Apr 2021 08:34:47 +0000 (11:34 +0300)]
floppy: cleanups: use memset() to zero reply_buffer
Use memset() to zero reply buffer in raw_cmd_copyin() instead
of a for loop.
Signed-off-by: Denis Efremov <efremov@linux.com>
Link: https://lore.kernel.org/r/20210416083449.72700-4-efremov@linux.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Denis Efremov [Fri, 16 Apr 2021 08:34:46 +0000 (11:34 +0300)]
floppy: cleanups: use ST0 as reply_buffer index 0
Use ST0 as 0 index for reply_buffer array. get_fdc_version() is the only
function that uses index 0 directly instead of the ST0 define.
Signed-off-by: Denis Efremov <efremov@linux.com>
Link: https://lore.kernel.org/r/20210416083449.72700-3-efremov@linux.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Denis Efremov [Fri, 16 Apr 2021 08:34:45 +0000 (11:34 +0300)]
floppy: cleanups: remove trailing whitespaces
Cleanup trailing whitespaces as checkpatch.pl suggests.
Signed-off-by: Denis Efremov <efremov@linux.com>
Link: https://lore.kernel.org/r/20210416083449.72700-2-efremov@linux.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Thu, 15 Apr 2021 18:15:48 +0000 (12:15 -0600)]
Merge branch 'md-next' of https://git./linux/kernel/git/song/md into for-5.13/drivers
Pull MD updates from Song:
"1. mddev_find_or_alloc() clean up, from Christoph.
2. Fix NULL pointer deref with external bitmap, from Sudhakar."
* 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
md/bitmap: wait for external bitmap writes to complete during tear down
md: do not return existing mddevs from mddev_find_or_alloc
md: refactor mddev_find_or_alloc
md: factor out a mddev_alloc_unit helper from mddev_find
Jens Axboe [Thu, 15 Apr 2021 18:15:12 +0000 (12:15 -0600)]
Merge tag 'nvme-5.13-2021-04-15' of git://git.infradead.org/nvme into for-5.13/drivers
Pull NVMe updates from Christoph:
"nvme updates for Linux 5.13
- refactor the ioctl code
- fix a segmentation fault during io parsing error in nvmet-tcp
(Elad Grupi)
- fix NULL derefence in nvme_ctrl_fast_io_fail_tmo_show/store
(Gopal Tiwari)
- properly respect the sgl_threshold flag in nvme-pci (Niklas Cassel)
- misc cleanups (Niklas Cassel, Amit Engel, Minwoo Im, Colin Ian King)"
* tag 'nvme-5.13-2021-04-15' of git://git.infradead.org/nvme:
nvme: fix NULL derefence in nvme_ctrl_fast_io_fail_tmo_show/store
nvme: let namespace probing continue for unsupported features
nvme: factor out nvme_ns_open and nvme_ns_release helpers
nvme: move nvme_ns_head_ops to multipath.c
nvme: factor out a nvme_tryget_ns_head helper
nvme: move the ioctl code to a separate file
nvme: don't bother to look up a namespace for controller ioctls
nvme: simplify block device ioctl handling for the !multipath case
nvme: simplify the compat ioctl handling
nvme: factor out a nvme_ns_ioctl helper
nvme: pass a user pointer to nvme_nvm_ioctl
nvme: cleanup setting the disk name
nvme: add a nvme_ns_head_multipath helper
nvme: remove single trailing whitespace
nvme-multipath: remove single trailing whitespace
nvme-pci: remove single trailing whitespace
nvme-pci: don't simple map sgl when sgls are disabled
nvmet: fix a spelling mistake "nubmer" -> "number"
nvmet-fc: simplify nvmet_fc_alloc_hostport
nvmet-tcp: fix a segmentation fault during io parsing error
Sudhakar Panneerselvam [Tue, 13 Apr 2021 04:08:29 +0000 (04:08 +0000)]
md/bitmap: wait for external bitmap writes to complete during tear down
NULL pointer dereference was observed in super_written() when it tries
to access the mddev structure.
[The below stack trace is from an older kernel, but the problem described
in this patch applies to the mainline kernel.]
[ 1194.474861] task:
ffff8fdd20858000 task.stack:
ffffb99d40790000
[ 1194.488000] RIP: 0010:super_written+0x29/0xe1
[ 1194.499688] RSP: 0018:
ffff8ffb7fcc3c78 EFLAGS:
00010046
[ 1194.512477] RAX:
0000000000000000 RBX:
ffff8ffb7bf4a000 RCX:
ffff8ffb78991048
[ 1194.527325] RDX:
0000000000000001 RSI:
0000000000000000 RDI:
ffff8ffb56b8a200
[ 1194.542576] RBP:
ffff8ffb7fcc3c90 R08:
000000000000000b R09:
0000000000000000
[ 1194.558001] R10:
ffff8ffb56b8a298 R11:
0000000000000000 R12:
ffff8ffb56b8a200
[ 1194.573070] R13:
0000000000000000 R14:
0000000000000000 R15:
0000000000000000
[ 1194.588117] FS:
0000000000000000(0000) GS:
ffff8ffb7fcc0000(0000) knlGS:
0000000000000000
[ 1194.604264] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 1194.617375] CR2:
00000000000002b8 CR3:
00000021e040a002 CR4:
00000000007606e0
[ 1194.632327] DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
[ 1194.647865] DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
[ 1194.663316] PKRU:
55555554
[ 1194.674090] Call Trace:
[ 1194.683735] <IRQ>
[ 1194.692948] bio_endio+0xae/0x135
[ 1194.703580] blk_update_request+0xad/0x2fa
[ 1194.714990] blk_update_bidi_request+0x20/0x72
[ 1194.726578] __blk_end_bidi_request+0x2c/0x4d
[ 1194.738373] __blk_end_request_all+0x31/0x49
[ 1194.749344] blk_flush_complete_seq+0x377/0x383
[ 1194.761550] flush_end_io+0x1dd/0x2a7
[ 1194.772910] blk_finish_request+0x9f/0x13c
[ 1194.784544] scsi_end_request+0x180/0x25c
[ 1194.796149] scsi_io_completion+0xc8/0x610
[ 1194.807503] scsi_finish_command+0xdc/0x125
[ 1194.818897] scsi_softirq_done+0x81/0xde
[ 1194.830062] blk_done_softirq+0xa4/0xcc
[ 1194.841008] __do_softirq+0xd9/0x29f
[ 1194.851257] irq_exit+0xe6/0xeb
[ 1194.861290] do_IRQ+0x59/0xe3
[ 1194.871060] common_interrupt+0x1c6/0x382
[ 1194.881988] </IRQ>
[ 1194.890646] RIP: 0010:cpuidle_enter_state+0xdd/0x2a5
[ 1194.902532] RSP: 0018:
ffffb99d40793e68 EFLAGS:
00000246 ORIG_RAX:
ffffffffffffff43
[ 1194.917317] RAX:
ffff8ffb7fce27c0 RBX:
ffff8ffb7fced800 RCX:
000000000000001f
[ 1194.932056] RDX:
0000000000000000 RSI:
0000000000000004 RDI:
0000000000000000
[ 1194.946428] RBP:
ffffb99d40793ea0 R08:
0000000000000004 R09:
0000000000002ed2
[ 1194.960508] R10:
0000000000002664 R11:
0000000000000018 R12:
0000000000000003
[ 1194.974454] R13:
000000000000000b R14:
ffffffff925715a0 R15:
0000011610120d5a
[ 1194.988607] ? cpuidle_enter_state+0xcc/0x2a5
[ 1194.999077] cpuidle_enter+0x17/0x19
[ 1195.008395] call_cpuidle+0x23/0x3a
[ 1195.017718] do_idle+0x172/0x1d5
[ 1195.026358] cpu_startup_entry+0x73/0x75
[ 1195.035769] start_secondary+0x1b9/0x20b
[ 1195.044894] secondary_startup_64+0xa5/0xa5
[ 1195.084921] RIP: super_written+0x29/0xe1 RSP:
ffff8ffb7fcc3c78
[ 1195.096354] CR2:
00000000000002b8
bio in the above stack is a bitmap write whose completion is invoked after
the tear down sequence sets the mddev structure to NULL in rdev.
During tear down, there is an attempt to flush the bitmap writes, but for
external bitmaps, there is no explicit wait for all the bitmap writes to
complete. For instance, md_bitmap_flush() is called to flush the bitmap
writes, but the last call to md_bitmap_daemon_work() in md_bitmap_flush()
could generate new bitmap writes for which there is no explicit wait to
complete those writes. The call to md_bitmap_update_sb() will return
simply for external bitmaps and the follow-up call to md_update_sb() is
conditional and may not get called for external bitmaps. This results in a
kernel panic when the completion routine, super_written() is called which
tries to reference mddev in the rdev that has been set to
NULL(in unbind_rdev_from_array() by tear down sequence).
The solution is to call md_super_wait() for external bitmaps after the
last call to md_bitmap_daemon_work() in md_bitmap_flush() to ensure there
are no pending bitmap writes before proceeding with the tear down.
Cc: stable@vger.kernel.org
Signed-off-by: Sudhakar Panneerselvam <sudhakar.panneerselvam@oracle.com>
Reviewed-by: Zhao Heming <heming.zhao@suse.com>
Signed-off-by: Song Liu <song@kernel.org>
Christoph Hellwig [Mon, 12 Apr 2021 08:05:30 +0000 (10:05 +0200)]
md: do not return existing mddevs from mddev_find_or_alloc
Instead of returning an existing mddev, just for it to be discarded
later directly return -EEXIST. Rename the function to mddev_alloc now
that it doesn't find an existing mddev.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Song Liu <song@kernel.org>
Christoph Hellwig [Mon, 12 Apr 2021 08:05:29 +0000 (10:05 +0200)]
md: refactor mddev_find_or_alloc
Allocate the new mddev first speculatively, which greatly simplifies
the code flow.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Song Liu <song@kernel.org>
Christoph Hellwig [Mon, 12 Apr 2021 08:05:28 +0000 (10:05 +0200)]
md: factor out a mddev_alloc_unit helper from mddev_find
Split out a self contained helper to find a free minor for the md
"unit" number.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Song Liu <song@kernel.org>
Gopal Tiwari [Wed, 14 Apr 2021 08:46:45 +0000 (14:16 +0530)]
nvme: fix NULL derefence in nvme_ctrl_fast_io_fail_tmo_show/store
Adding entry for dev_attr_fast_io_fail_tmo to avoid the kernel crash
while reading and writing the fast_io_fail_tmo.
Fixes:
09fbed636382 (nvme: export fast_io_fail_tmo to sysfs)
Signed-off-by: Gopal Tiwari <gtiwari@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Christoph Hellwig [Wed, 7 Apr 2021 13:03:16 +0000 (15:03 +0200)]
nvme: let namespace probing continue for unsupported features
Instead of failing to scan the namespace entirely when unsupported
features are detected, just mark the gendisk hidden but allow other
access like the upcoming per-namespace character device.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Javier González <javier.gonz@samsung.com>
Christoph Hellwig [Wed, 7 Apr 2021 12:36:47 +0000 (14:36 +0200)]
nvme: factor out nvme_ns_open and nvme_ns_release helpers
These will be reused for the per-namespace character devices.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Javier González <javier.gonz@samsung.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Christoph Hellwig [Wed, 7 Apr 2021 12:22:12 +0000 (14:22 +0200)]
nvme: move nvme_ns_head_ops to multipath.c
Move the multipath block_device_operations to multipath.c, where they
belong.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Javier González <javier.gonz@samsung.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Christoph Hellwig [Wed, 7 Apr 2021 12:20:40 +0000 (14:20 +0200)]
nvme: factor out a nvme_tryget_ns_head helper
Add a helper to avoid opencoding ns_head->ref manipulations.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Javier González <javier.gonz@samsung.com>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Christoph Hellwig [Sat, 10 Apr 2021 06:42:03 +0000 (08:42 +0200)]
nvme: move the ioctl code to a separate file
Split out the ioctl code from core.c into a new file. Also update
copyrights while we're at it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Javier González <javier.gonz@samsung.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Christoph Hellwig [Fri, 14 Aug 2020 09:11:49 +0000 (11:11 +0200)]
nvme: don't bother to look up a namespace for controller ioctls
Don't bother to look up a namespace just to drop if after retreiving the
controller for the multipath case. Just look up a live controller for
the subsystem directly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Javier González <javier.gonz@samsung.com>
Christoph Hellwig [Fri, 14 Aug 2020 08:55:32 +0000 (10:55 +0200)]
nvme: simplify block device ioctl handling for the !multipath case
Only use the existing ioctl handler for the multipath case, and add a
simpler one that reverts to the pre-multipath case for not shared
use case.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Javier González <javier.gonz@samsung.com>
Christoph Hellwig [Thu, 8 Apr 2021 12:04:42 +0000 (14:04 +0200)]
nvme: simplify the compat ioctl handling
Don't bother defining a separate compat_ioctl handler, and just handle
the NVME_IOCTL_SUBMIT_IO32 case inline. Also only defined it for those
ABIs (currently just i386 vs x86_64) that are affected.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Javier González <javier.gonz@samsung.com>
Christoph Hellwig [Fri, 14 Aug 2020 08:30:50 +0000 (10:30 +0200)]
nvme: factor out a nvme_ns_ioctl helper
Factor out a helper for the namespace based ioctls.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Javier González <javier.gonz@samsung.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Christoph Hellwig [Fri, 14 Aug 2020 08:33:14 +0000 (10:33 +0200)]
nvme: pass a user pointer to nvme_nvm_ioctl
Pass the proper user pointer instead of the not all that useful integer
representation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Javier González <javier.gonz@samsung.com>
Christoph Hellwig [Wed, 7 Apr 2021 10:46:46 +0000 (12:46 +0200)]
nvme: cleanup setting the disk name
Return false from nvme_set_disk_name and let the caller set the
non-multipath name instead of duplicating the naming information in two
places. Also remove the pointless local variables for the disk name
and flags and the not needed ctrl argument.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Javier González <javier.gonz@samsung.com>
Minwoo Im [Wed, 7 Apr 2021 15:49:29 +0000 (17:49 +0200)]
nvme: add a nvme_ns_head_multipath helper
Move the multipath gendisk out of #ifdef CONFIG_NVME_MULTIPATH and add
a new nvme_ns_head_multipath that uses it to check if a ns_head has
a multipath device associated with it.
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
[hch: added the IS_ENABLED, converted a few existing users]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Javier González <javier.gonz@samsung.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Niklas Cassel [Sat, 10 Apr 2021 20:16:21 +0000 (20:16 +0000)]
nvme: remove single trailing whitespace
There is a single trailing whitespace in core.c.
Since this is just a single whitespace, the chances of this affecting
backports to stable should be quite low, so let's just remove it.
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Niklas Cassel [Sat, 10 Apr 2021 20:15:45 +0000 (20:15 +0000)]
nvme-multipath: remove single trailing whitespace
There is a single trailing whitespace in multipath.c.
Since this is just a single whitespace, the chances of this affecting
backports to stable should be quite low, so let's just remove it.
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Niklas Cassel [Sat, 10 Apr 2021 20:15:43 +0000 (20:15 +0000)]
nvme-pci: remove single trailing whitespace
There is a single trailing whitespace in pci.c.
Since this is just a single whitespace, the chances of this affecting
backports to stable should be quite low, so let's just remove it.
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Niklas Cassel [Fri, 9 Apr 2021 18:12:55 +0000 (20:12 +0200)]
nvme-pci: don't simple map sgl when sgls are disabled
According to the module parameter description for sgl_threshold,
a value of 0 means that SGLs are disabled.
If SGLs are disabled, we should respect that, even for the case
where the request is made up of a single physical segment.
Fixes:
297910571f08 ("nvme-pci: optimize mapping single segment requests using SGLs")
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Colin Ian King [Wed, 7 Apr 2021 11:10:20 +0000 (12:10 +0100)]
nvmet: fix a spelling mistake "nubmer" -> "number"
There is a spelling mistake in a pr_err error message. Fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Amit Engel [Mon, 22 Mar 2021 19:57:17 +0000 (21:57 +0200)]
nvmet-fc: simplify nvmet_fc_alloc_hostport
Once a host is already created, avoid allocate additional hostports that
will be thrown away. add an helper function to handle host search.
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Amit Engel <amit.engel@dell.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Elad Grupi [Wed, 31 Mar 2021 09:13:14 +0000 (17:13 +0800)]
nvmet-tcp: fix a segmentation fault during io parsing error
In case there is an io that contains inline data and it goes to
parsing error flow, command response will free command and iov
before clearing the data on the socket buffer.
This will delay the command response until receive flow is completed.
Fixes:
872d26a391da ("nvmet-tcp: add NVMe over TCP target driver")
Signed-off-by: Elad Grupi <elad.grupi@dell.com>
Signed-off-by: Hou Pu <houpu.main@gmail.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Christoph Hellwig [Tue, 13 Apr 2021 10:52:57 +0000 (10:52 +0000)]
lightnvm: deprecated OCSSD support and schedule it for removal in Linux 5.15
Lightnvm was an innovative idea to expose more low-level control over SSDs.
But it failed to get properly standardized and remains a non-standarized
extension to NVMe that requires vendor specific quirks for a few now mostly
obsolete SSD devices. The standardized ZNS command set for NVMe has take
over a lot of the approaches and allows for fully standardized operation.
Remove the Linux code to support open channel SSDs as the few production
deployments of the above mentioned SSDs are using userspace driver stacks
instead of the fairly limited Linux support.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Javier González <javier@javigon.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Link: https://lore.kernel.org/r/20210413105257.159260-5-matias.bjorling@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Zhang Yunkai [Tue, 13 Apr 2021 10:52:56 +0000 (10:52 +0000)]
lightnvm: remove duplicate include in lightnvm.h
'linux/blkdev.h' and 'uapi/linux/lightnvm.h' included in 'lightnvm.h'
is duplicated.It is also included in the 5th and 7th line.
Signed-off-by: Zhang Yunkai <zhang.yunkai@zte.com.cn>
Signed-off-by: Matias Bjørling <matias.bjorling@wdc.com>
Link: https://lore.kernel.org/r/20210413105257.159260-4-matias.bjorling@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Tian Tao [Tue, 13 Apr 2021 10:52:55 +0000 (10:52 +0000)]
lightnvm: return the correct return value
When memdup_user returns an error, memdup_user has two different return
values, use PTR_ERR to get the correct return value.
Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
Signed-off-by: Matias Bjørling <matias.bjorling@wdc.com>
Link: https://lore.kernel.org/r/20210413105257.159260-3-matias.bjorling@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Chaitanya Kulkarni [Tue, 13 Apr 2021 10:52:54 +0000 (10:52 +0000)]
lightnvm: use kobj_to_dev()
This fixs coccicheck warning:
drivers/nvme//host/lightnvm.c:1243:60-61: WARNING opportunity for
kobj_to_dev()
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Matias Bjørling <matias.bjorling@wdc.com>
Link: https://lore.kernel.org/r/20210413105257.159260-2-matias.bjorling@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Mon, 12 Apr 2021 08:03:18 +0000 (10:03 +0200)]
block: remove the -ERESTARTSYS handling in blkdev_get_by_dev
Now that md has been cleaned up we can get rid of this hack.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Max Gurtovoy [Mon, 12 Apr 2021 09:55:23 +0000 (09:55 +0000)]
null_blk: add option for managing virtual boundary
This will enable changing the virtual boundary of null blk devices. For
now, null blk devices didn't have any restriction on the scatter/gather
elements received from the block layer. Add a module parameter and a
configfs option that will control the virtual boundary. This will
enable testing the efficiency of the block layer bounce buffer in case
a suitable application will send discontiguous IO to the given device.
Initial testing with patched FIO showed the following results (64 jobs,
128 iodepth, 1 nullb device):
IO size READ (virt=false) READ (virt=true) Write (virt=false) Write (virt=true)
---------- ------------------- ----------------- ------------------- -------------------
1k 10.7M 8482k 10.8M 8471k
2k 10.4M 8266k 10.4M 8271k
4k 10.4M 8274k 10.3M 8226k
8k 10.2M 8131k 9800k 7933k
16k 9567k 7764k 8081k 6828k
32k 8865k 7309k 5570k 5153k
64k 7695k 6586k 2682k 2617k
128k 5346k 5489k 1320k 1296k
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Link: https://lore.kernel.org/r/20210412095523.278632-1-mgurtovoy@nvidia.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Chaitanya Kulkarni [Sun, 11 Apr 2021 22:43:30 +0000 (15:43 -0700)]
gdrom: fix compilation error
Use the right name for the struct request variable that removes the
following compilation error :-
make --silent --keep-going --jobs=8
O=/home/tuxbuild/.cache/tuxmake/builds/1/tmp ARCH=sh
CROSS_COMPILE=sh4-linux-gnu- 'CC=sccache sh4-linux-gnu-gcc'
'HOSTCC=sccache gcc'
In file included from /builds/linux/include/linux/scatterlist.h:9,
from /builds/linux/include/linux/dma-mapping.h:10,
from /builds/linux/drivers/cdrom/gdrom.c:16:
/builds/linux/drivers/cdrom/gdrom.c: In function 'gdrom_readdisk_dma':
/builds/linux/drivers/cdrom/gdrom.c:586:61: error: 'rq' undeclared
(first use in this function)
586 | __raw_writel(page_to_phys(bio_page(req->bio)) + bio_offset(rq->bio),
| ^~
Fixes:
1d2c82001a5f ("gdrom: support highmem")
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Coly Li [Sun, 11 Apr 2021 13:43:16 +0000 (21:43 +0800)]
bcache: fix a regression of code compiling failure in debug.c
The patch "bcache: remove PTR_CACHE" introduces a compiling failure in
debug.c with following error message,
In file included from drivers/md/bcache/bcache.h:182:0,
from drivers/md/bcache/debug.c:9:
drivers/md/bcache/debug.c: In function 'bch_btree_verify':
drivers/md/bcache/debug.c:53:19: error: 'c' undeclared (first use in
this function)
bio_set_dev(bio, c->cache->bdev);
^
This patch fixes the regression by replacing c->cache->bdev by b->c->
cache->bdev.
Signed-off-by: Coly Li <colyli@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210411134316.80274-8-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Gustavo A. R. Silva [Sun, 11 Apr 2021 13:43:15 +0000 (21:43 +0800)]
bcache: Use 64-bit arithmetic instead of 32-bit
Cast multiple variables to (int64_t) in order to give the compiler
complete information about the proper arithmetic to use. Notice that
these variables are being used in contexts that expect expressions of
type int64_t (64 bit, signed). And currently, such expressions are
being evaluated using 32-bit arithmetic.
Fixes:
d0cf9503e908 ("octeontx2-pf: ethtool fec mode support")
Addresses-Coverity-ID:
1501724 ("Unintentional integer overflow")
Addresses-Coverity-ID:
1501725 ("Unintentional integer overflow")
Addresses-Coverity-ID:
1501726 ("Unintentional integer overflow")
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20210411134316.80274-7-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bhaskar Chowdhury [Sun, 11 Apr 2021 13:43:14 +0000 (21:43 +0800)]
md: bcache: Trivial typo fixes in the file journal.c
s/condidate/candidate/
s/folowing/following/
Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20210411134316.80274-6-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Arnd Bergmann [Sun, 11 Apr 2021 13:43:13 +0000 (21:43 +0800)]
md: bcache: avoid -Wempty-body warnings
building with 'make W=1' shows a harmless warning for each user of the
EBUG_ON() macro:
drivers/md/bcache/bset.c: In function 'bch_btree_sort_partial':
drivers/md/bcache/util.h:30:55: error: suggest braces around empty body in an 'if' statement [-Werror=empty-body]
30 | #define EBUG_ON(cond) do { if (cond); } while (0)
| ^
drivers/md/bcache/bset.c:1312:9: note: in expansion of macro 'EBUG_ON'
1312 | EBUG_ON(oldsize >= 0 && bch_count_data(b) != oldsize);
| ^~~~~~~
Reword the macro slightly to avoid the warning.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20210411134316.80274-5-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Yang Li [Sun, 11 Apr 2021 13:43:12 +0000 (21:43 +0800)]
bcache: use NULL instead of using plain integer as pointer
This fixes the following sparse warnings:
drivers/md/bcache/features.c:22:16: warning: Using plain integer as NULL
pointer
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20210411134316.80274-4-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Sun, 11 Apr 2021 13:43:11 +0000 (21:43 +0800)]
bcache: remove PTR_CACHE
Remove the PTR_CACHE inline and replace it with a direct dereference
of c->cache.
(Coly Li: fix the typo from PTR_BUCKET to PTR_CACHE in commit log)
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20210411134316.80274-3-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Zhiqiang Liu [Sun, 11 Apr 2021 13:43:10 +0000 (21:43 +0800)]
bcache: reduce redundant code in bch_cached_dev_run()
In bch_cached_dev_run(), free(env[1])|free(env[2])|free(buf)
show up three times. This patch introduce out tag in
which free(env[1])|free(env[2])|free(buf) are only called
one time. If we need to call free() when errors occur,
we can set error code to ret, and then goto out tag directly.
Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20210411134316.80274-2-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Thu, 8 Apr 2021 15:55:14 +0000 (09:55 -0600)]
Merge branch 'md-next' of https://git./linux/kernel/git/song/md into for-5.13/drivers
Pull MD updates from Song:
"These patches fix a race condition with md_release() and md_open()."
* 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
md: split mddev_find
md: factor out a mddev_find_locked helper from mddev_find
md: md_open returns -EBUSY when entering racing area
Christoph Hellwig [Sat, 3 Apr 2021 16:15:29 +0000 (18:15 +0200)]
md: split mddev_find
Split mddev_find into a simple mddev_find that just finds an existing
mddev by the unit number, and a more complicated mddev_find that deals
with find or allocating a mddev.
This turns out to fix this bug reported by Zhao Heming.
----------------------------- snip ------------------------------
commit
d3374825ce57 ("md: make devices disappear when they are no longer
needed.") introduced protection between mddev creating & removing. The
md_open shouldn't create mddev when all_mddevs list doesn't contain
mddev. With currently code logic, there will be very easy to trigger
soft lockup in non-preempt env.
*** env ***
kvm-qemu VM 2C1G with 2 iscsi luns
kernel should be non-preempt
*** script ***
about trigger 1 time with 10 tests
`1 node1="15sp3-mdcluster1"
2 node2="15sp3-mdcluster2"
3
4 mdadm -Ss
5 ssh ${node2} "mdadm -Ss"
6 wipefs -a /dev/sda /dev/sdb
7 mdadm -CR /dev/md0 -b clustered -e 1.2 -n 2 -l mirror /dev/sda \
/dev/sdb --assume-clean
8
9 for i in {1..100}; do
10 echo ==== $i ====;
11
12 echo "test ...."
13 ssh ${node2} "mdadm -A /dev/md0 /dev/sda /dev/sdb"
14 sleep 1
15
16 echo "clean ....."
17 ssh ${node2} "mdadm -Ss"
18 done
`
I use mdcluster env to trigger soft lockup, but it isn't mdcluster
speical bug. To stop md array in mdcluster env will do more jobs than
non-cluster array, which will leave enough time/gap to allow kernel to
run md_open.
*** stack ***
`ID: 2831 TASK:
ffff8dd7223b5040 CPU: 0 COMMAND: "mdadm"
#0 [
ffffa15d00a13b90] __schedule at
ffffffffb8f1935f
#1 [
ffffa15d00a13ba8] exact_lock at
ffffffffb8a4a66d
#2 [
ffffa15d00a13bb0] kobj_lookup at
ffffffffb8c62fe3
#3 [
ffffa15d00a13c28] __blkdev_get at
ffffffffb89273b9
#4 [
ffffa15d00a13c98] blkdev_get at
ffffffffb8927964
#5 [
ffffa15d00a13cb0] do_dentry_open at
ffffffffb88dc4b4
#6 [
ffffa15d00a13ce0] path_openat at
ffffffffb88f0ccc
#7 [
ffffa15d00a13db8] do_filp_open at
ffffffffb88f32bb
#8 [
ffffa15d00a13ee0] do_sys_open at
ffffffffb88ddc7d
#9 [
ffffa15d00a13f38] do_syscall_64 at
ffffffffb86053cb ffffffffb900008c
or:
[ 884.226509] mddev_put+0x1c/0xe0 [md_mod]
[ 884.226515] md_open+0x3c/0xe0 [md_mod]
[ 884.226518] __blkdev_get+0x30d/0x710
[ 884.226520] ? bd_acquire+0xd0/0xd0
[ 884.226522] blkdev_get+0x14/0x30
[ 884.226524] do_dentry_open+0x204/0x3a0
[ 884.226531] path_openat+0x2fc/0x1520
[ 884.226534] ? seq_printf+0x4e/0x70
[ 884.226536] do_filp_open+0x9b/0x110
[ 884.226542] ? md_release+0x20/0x20 [md_mod]
[ 884.226543] ? seq_read+0x1d8/0x3e0
[ 884.226545] ? kmem_cache_alloc+0x18a/0x270
[ 884.226547] ? do_sys_open+0x1bd/0x260
[ 884.226548] do_sys_open+0x1bd/0x260
[ 884.226551] do_syscall_64+0x5b/0x1e0
[ 884.226554] entry_SYSCALL_64_after_hwframe+0x44/0xa9
`
*** rootcause ***
"mdadm -A" (or other array assemble commands) will start a daemon "mdadm
--monitor" by default. When "mdadm -Ss" is running, the stop action will
wakeup "mdadm --monitor". The "--monitor" daemon will immediately get
info from /proc/mdstat. This time mddev in kernel still exist, so
/proc/mdstat still show md device, which makes "mdadm --monitor" to open
/dev/md0.
The previously "mdadm -Ss" is removing action, the "mdadm --monitor"
open action will trigger md_open which is creating action. Racing is
happening.
`<thread 1>: "mdadm -Ss"
md_release
mddev_put deletes mddev from all_mddevs
queue_work for mddev_delayed_delete
at this time, "/dev/md0" is still available for opening
<thread 2>: "mdadm --monitor ..."
md_open
+ mddev_find can't find mddev of /dev/md0, and create a new mddev and
| return.
+ trigger "if (mddev->gendisk != bdev->bd_disk)" and return
-ERESTARTSYS.
`
In non-preempt kernel, <thread 2> is occupying on current CPU. and
mddev_delayed_delete which was created in <thread 1> also can't be
schedule.
In preempt kernel, it can also trigger above racing. But kernel doesn't
allow one thread running on a CPU all the time. after <thread 2> running
some time, the later "mdadm -A" (refer above script line 13) will call
md_alloc to alloc a new gendisk for mddev. it will break md_open
statement "if (mddev->gendisk != bdev->bd_disk)" and return 0 to caller,
the soft lockup is broken.
------------------------------ snip ------------------------------
Cc: stable@vger.kernel.org
Fixes:
d3374825ce57 ("md: make devices disappear when they are no longer needed.")
Reported-by: Heming Zhao <heming.zhao@suse.com>
Reviewed-by: Heming Zhao <heming.zhao@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Song Liu <song@kernel.org>
Christoph Hellwig [Sat, 3 Apr 2021 16:15:28 +0000 (18:15 +0200)]
md: factor out a mddev_find_locked helper from mddev_find
Factor out a self-contained helper to just lookup a mddev by the dev_t
"unit".
Cc: stable@vger.kernel.org
Reviewed-by: Heming Zhao <heming.zhao@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Song Liu <song@kernel.org>
Zhao Heming [Sat, 3 Apr 2021 03:01:25 +0000 (11:01 +0800)]
md: md_open returns -EBUSY when entering racing area
commit
d3374825ce57 ("md: make devices disappear when they are no longer
needed.") introduced protection between mddev creating & removing. The
md_open shouldn't create mddev when all_mddevs list doesn't contain
mddev. With currently code logic, there will be very easy to trigger
soft lockup in non-preempt env.
This patch changes md_open returning from -ERESTARTSYS to -EBUSY, which
will break the infinitely retry when md_open enter racing area.
This patch is partly fix soft lockup issue, full fix needs mddev_find
is split into two functions: mddev_find & mddev_find_or_alloc. And
md_open should call new mddev_find (it only does searching job).
For more detail, please refer with Christoph's "split mddev_find" patch
in later commits.
*** env ***
kvm-qemu VM 2C1G with 2 iscsi luns
kernel should be non-preempt
*** script ***
about trigger every time with below script
```
1 node1="mdcluster1"
2 node2="mdcluster2"
3
4 mdadm -Ss
5 ssh ${node2} "mdadm -Ss"
6 wipefs -a /dev/sda /dev/sdb
7 mdadm -CR /dev/md0 -b clustered -e 1.2 -n 2 -l mirror /dev/sda \
/dev/sdb --assume-clean
8
9 for i in {1..10}; do
10 echo ==== $i ====;
11
12 echo "test ...."
13 ssh ${node2} "mdadm -A /dev/md0 /dev/sda /dev/sdb"
14 sleep 1
15
16 echo "clean ....."
17 ssh ${node2} "mdadm -Ss"
18 done
```
I use mdcluster env to trigger soft lockup, but it isn't mdcluster
speical bug. To stop md array in mdcluster env will do more jobs than
non-cluster array, which will leave enough time/gap to allow kernel to
run md_open.
*** stack ***
```
[ 884.226509] mddev_put+0x1c/0xe0 [md_mod]
[ 884.226515] md_open+0x3c/0xe0 [md_mod]
[ 884.226518] __blkdev_get+0x30d/0x710
[ 884.226520] ? bd_acquire+0xd0/0xd0
[ 884.226522] blkdev_get+0x14/0x30
[ 884.226524] do_dentry_open+0x204/0x3a0
[ 884.226531] path_openat+0x2fc/0x1520
[ 884.226534] ? seq_printf+0x4e/0x70
[ 884.226536] do_filp_open+0x9b/0x110
[ 884.226542] ? md_release+0x20/0x20 [md_mod]
[ 884.226543] ? seq_read+0x1d8/0x3e0
[ 884.226545] ? kmem_cache_alloc+0x18a/0x270
[ 884.226547] ? do_sys_open+0x1bd/0x260
[ 884.226548] do_sys_open+0x1bd/0x260
[ 884.226551] do_syscall_64+0x5b/0x1e0
[ 884.226554] entry_SYSCALL_64_after_hwframe+0x44/0xa9
```
*** rootcause ***
"mdadm -A" (or other array assemble commands) will start a daemon "mdadm
--monitor" by default. When "mdadm -Ss" is running, the stop action will
wakeup "mdadm --monitor". The "--monitor" daemon will immediately get
info from /proc/mdstat. This time mddev in kernel still exist, so
/proc/mdstat still show md device, which makes "mdadm --monitor" to open
/dev/md0.
The previously "mdadm -Ss" is removing action, the "mdadm --monitor"
open action will trigger md_open which is creating action. Racing is
happening.
```
<thread 1>: "mdadm -Ss"
md_release
mddev_put deletes mddev from all_mddevs
queue_work for mddev_delayed_delete
at this time, "/dev/md0" is still available for opening
<thread 2>: "mdadm --monitor ..."
md_open
+ mddev_find can't find mddev of /dev/md0, and create a new mddev and
| return.
+ trigger "if (mddev->gendisk != bdev->bd_disk)" and return
-ERESTARTSYS.
```
In non-preempt kernel, <thread 2> is occupying on current CPU. and
mddev_delayed_delete which was created in <thread 1> also can't be
schedule.
In preempt kernel, it can also trigger above racing. But kernel doesn't
allow one thread running on a CPU all the time. after <thread 2> running
some time, the later "mdadm -A" (refer above script line 13) will call
md_alloc to alloc a new gendisk for mddev. it will break md_open
statement "if (mddev->gendisk != bdev->bd_disk)" and return 0 to caller,
the soft lockup is broken.
Cc: stable@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Zhao Heming <heming.zhao@suse.com>
Signed-off-by: Song Liu <song@kernel.org>
Guobin Huang [Tue, 6 Apr 2021 12:09:48 +0000 (20:09 +0800)]
drbd: use DEFINE_SPINLOCK() for spinlock
spinlock can be initialized automatically with DEFINE_SPINLOCK()
rather than explicitly calling spin_lock_init().
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Guobin Huang <huangguobin4@huawei.com>
Link: https://lore.kernel.org/r/1617710988-49205-1-git-send-email-huangguobin4@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Tue, 6 Apr 2021 06:18:39 +0000 (08:18 +0200)]
swim3: support highmem
swim3 only uses the virtual address of a bio to stash it into the data
transfer using virt_to_bus. But the ppc32 virt_to_bus just uses the
physical address with an offset. Replace virt_to_bus with a local hack
that performs the equivalent transformation and stop asking for block
layer bounce buffering.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210406061839.811588-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Tue, 6 Apr 2021 06:17:55 +0000 (08:17 +0200)]
floppy: always use the track buffer
Always use the track buffer that is already used for addresses outside
the 16MB address capability of the floppy controller. This allows to
remove a lot of code that relies on kernel virtual addresses. With
this gone there is just a single place left that looks at the bio,
which can be converted to memcpy_{from,to}_page, thus removing the need
for the extra block-layer bounce buffering for highmem pages.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210406061755.811522-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Tue, 6 Apr 2021 06:17:25 +0000 (08:17 +0200)]
swim: don't call blk_queue_bounce_limit
m68k doesn't support highmem, so don't bother enabling the block layer
bounce buffer code. Just for safety throw in a depend on !HIGHMEM.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210406061725.811389-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Tue, 6 Apr 2021 06:16:48 +0000 (08:16 +0200)]
gdrom: support highmem
The gdrom driver only has a single reference to the virtual address of
the bio data, and uses that only to get the physical address. Switch
to deriving the physical address from the page directly and thus avoid
bounce buffering highmem data.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210406061648.811275-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Lee Jones [Fri, 12 Mar 2021 10:55:30 +0000 (10:55 +0000)]
block: drbd: drbd_nl: Demote half-complete kernel-doc headers
Fixes the following W=1 kernel build warning(s):
from drivers/block/drbd/drbd_nl.c:24:
drivers/block/drbd/drbd_nl.c: In function ‘drbd_adm_attach’:
drivers/block/drbd/drbd_nl.c:1968:10: warning: implicit conversion from ‘enum drbd_state_rv’ to ‘enum drbd_ret_code’ [-Wenum-conversion]
drivers/block/drbd/drbd_nl.c:930: warning: Function parameter or member 'flags' not described in 'drbd_determine_dev_size'
drivers/block/drbd/drbd_nl.c:930: warning: Function parameter or member 'rs' not described in 'drbd_determine_dev_size'
drivers/block/drbd/drbd_nl.c:1148: warning: Function parameter or member 'dc' not described in 'drbd_check_al_size'
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: drbd-dev@lists.linbit.com
Cc: linux-block@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Link: https://lore.kernel.org/r/20210312105530.2219008-12-lee.jones@linaro.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Lee Jones [Fri, 12 Mar 2021 10:55:29 +0000 (10:55 +0000)]
block: xen-blkfront: Demote kernel-doc abuses
Fixes the following W=1 kernel build warning(s):
drivers/block/xen-blkfront.c:1960: warning: Function parameter or member 'dev' not described in 'blkfront_probe'
drivers/block/xen-blkfront.c:1960: warning: Function parameter or member 'id' not described in 'blkfront_probe'
drivers/block/xen-blkfront.c:1960: warning: expecting prototype for Allocate the basic(). Prototype was for blkfront_probe() instead
drivers/block/xen-blkfront.c:2085: warning: Function parameter or member 'dev' not described in 'blkfront_resume'
drivers/block/xen-blkfront.c:2085: warning: expecting prototype for or a backend(). Prototype was for blkfront_resume() instead
drivers/block/xen-blkfront.c:2444: warning: wrong kernel-doc identifier on line:
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: "Roger Pau Monné" <roger.pau@citrix.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: xen-devel@lists.xenproject.org
Cc: linux-block@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Link: https://lore.kernel.org/r/20210312105530.2219008-11-lee.jones@linaro.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Lee Jones [Fri, 12 Mar 2021 10:55:28 +0000 (10:55 +0000)]
block: drbd: drbd_receiver: Demote less than half complete kernel-doc header
Fixes the following W=1 kernel build warning(s):
drivers/block/drbd/drbd_receiver.c:1641: warning: Function parameter or member 'op' not described in 'drbd_submit_peer_request'
drivers/block/drbd/drbd_receiver.c:1641: warning: Function parameter or member 'op_flags' not described in 'drbd_submit_peer_request'
drivers/block/drbd/drbd_receiver.c:1641: warning: Function parameter or member 'fault_type' not described in 'drbd_submit_peer_request'
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: drbd-dev@lists.linbit.com
Cc: linux-block@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Link: https://lore.kernel.org/r/20210312105530.2219008-10-lee.jones@linaro.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Lee Jones [Fri, 12 Mar 2021 10:55:27 +0000 (10:55 +0000)]
block: drbd: drbd_main: Fix a bunch of function documentation discrepancies
Fixes the following W=1 kernel build warning(s):
drivers/block/drbd/drbd_main.c:278: warning: Function parameter or member 'connection' not described in 'tl_clear'
drivers/block/drbd/drbd_main.c:278: warning: Excess function parameter 'device' description in 'tl_clear'
drivers/block/drbd/drbd_main.c:489: warning: Function parameter or member 'cpu_mask' not described in 'drbd_calc_cpu_mask'
drivers/block/drbd/drbd_main.c:528: warning: Excess function parameter 'device' description in 'drbd_thread_current_set_cpu'
drivers/block/drbd/drbd_main.c:549: warning: Function parameter or member 'connection' not described in 'drbd_header_size'
drivers/block/drbd/drbd_main.c:1204: warning: Function parameter or member 'device' not described in 'send_bitmap_rle_or_plain'
drivers/block/drbd/drbd_main.c:1204: warning: Function parameter or member 'c' not described in 'send_bitmap_rle_or_plain'
drivers/block/drbd/drbd_main.c:1335: warning: Function parameter or member 'peer_device' not described in '_drbd_send_ack'
drivers/block/drbd/drbd_main.c:1335: warning: Excess function parameter 'device' description in '_drbd_send_ack'
drivers/block/drbd/drbd_main.c:1379: warning: Function parameter or member 'peer_device' not described in 'drbd_send_ack'
drivers/block/drbd/drbd_main.c:1379: warning: Excess function parameter 'device' description in 'drbd_send_ack'
drivers/block/drbd/drbd_main.c:1892: warning: Function parameter or member 'connection' not described in 'drbd_send_all'
drivers/block/drbd/drbd_main.c:1892: warning: Function parameter or member 'sock' not described in 'drbd_send_all'
drivers/block/drbd/drbd_main.c:1892: warning: Function parameter or member 'buffer' not described in 'drbd_send_all'
drivers/block/drbd/drbd_main.c:1892: warning: Function parameter or member 'size' not described in 'drbd_send_all'
drivers/block/drbd/drbd_main.c:1892: warning: Function parameter or member 'msg_flags' not described in 'drbd_send_all'
drivers/block/drbd/drbd_main.c:3525: warning: Function parameter or member 'flags' not described in 'drbd_queue_bitmap_io'
drivers/block/drbd/drbd_main.c:3563: warning: Function parameter or member 'flags' not described in 'drbd_bitmap_io'
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: drbd-dev@lists.linbit.com
Cc: linux-block@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Link: https://lore.kernel.org/r/20210312105530.2219008-9-lee.jones@linaro.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Lee Jones [Fri, 12 Mar 2021 10:55:26 +0000 (10:55 +0000)]
block: drbd: drbd_nl: Make conversion to 'enum drbd_ret_code' explicit
Fixes the following W=1 kernel build warning(s):
from drivers/block/drbd/drbd_nl.c:24:
drivers/block/drbd/drbd_nl.c: In function ‘drbd_adm_set_role’:
drivers/block/drbd/drbd_nl.c:793:11: warning: implicit conversion from ‘enum drbd_state_rv’ to ‘enum drbd_ret_code’ [-Wenum-conversion]
drivers/block/drbd/drbd_nl.c:795:11: warning: implicit conversion from ‘enum drbd_state_rv’ to ‘enum drbd_ret_code’ [-Wenum-conversion]
drivers/block/drbd/drbd_nl.c: In function ‘drbd_adm_attach’:
drivers/block/drbd/drbd_nl.c:1965:10: warning: implicit conversion from ‘enum drbd_state_rv’ to ‘enum drbd_ret_code’ [-Wenum-conversion]
drivers/block/drbd/drbd_nl.c: In function ‘drbd_adm_connect’:
drivers/block/drbd/drbd_nl.c:2690:10: warning: implicit conversion from ‘enum drbd_state_rv’ to ‘enum drbd_ret_code’ [-Wenum-conversion]
drivers/block/drbd/drbd_nl.c: In function ‘drbd_adm_disconnect’:
drivers/block/drbd/drbd_nl.c:2803:11: warning: implicit conversion from ‘enum drbd_state_rv’ to ‘enum drbd_ret_code’ [-Wenum-conversion]
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: drbd-dev@lists.linbit.com
Cc: linux-block@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Link: https://lore.kernel.org/r/20210312105530.2219008-8-lee.jones@linaro.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Lee Jones [Fri, 12 Mar 2021 10:55:25 +0000 (10:55 +0000)]
block: drbd: drbd_main: Remove duplicate field initialisation
[P_RETRY_WRITE] is initialised more than once.
Fixes the following W=1 kernel build warning(s):
drivers/block/drbd/drbd_main.c: In function ‘cmdname’:
drivers/block/drbd/drbd_main.c:3660:22: warning: initialized field overwritten [-Woverride-init]
drivers/block/drbd/drbd_main.c:3660:22: note: (near initialization for ‘cmdnames[44]’)
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: drbd-dev@lists.linbit.com
Cc: linux-block@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Link: https://lore.kernel.org/r/20210312105530.2219008-7-lee.jones@linaro.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Lee Jones [Fri, 12 Mar 2021 10:55:24 +0000 (10:55 +0000)]
block: drbd: drbd_receiver: Demote non-conformant kernel-doc headers
Fixes the following W=1 kernel build warning(s):
drivers/block/drbd/drbd_receiver.c:265: warning: Function parameter or member 'peer_device' not described in 'drbd_alloc_pages'
drivers/block/drbd/drbd_receiver.c:265: warning: Excess function parameter 'device' description in 'drbd_alloc_pages'
drivers/block/drbd/drbd_receiver.c:1362: warning: Function parameter or member 'connection' not described in 'drbd_may_finish_epoch'
drivers/block/drbd/drbd_receiver.c:1362: warning: Excess function parameter 'device' description in 'drbd_may_finish_epoch'
drivers/block/drbd/drbd_receiver.c:1451: warning: Function parameter or member 'resource' not described in 'drbd_bump_write_ordering'
drivers/block/drbd/drbd_receiver.c:1451: warning: Function parameter or member 'bdev' not described in 'drbd_bump_write_ordering'
drivers/block/drbd/drbd_receiver.c:1451: warning: Excess function parameter 'connection' description in 'drbd_bump_write_ordering'
drivers/block/drbd/drbd_receiver.c:1643: warning: Function parameter or member 'op' not described in 'drbd_submit_peer_request'
drivers/block/drbd/drbd_receiver.c:1643: warning: Function parameter or member 'op_flags' not described in 'drbd_submit_peer_request'
drivers/block/drbd/drbd_receiver.c:1643: warning: Function parameter or member 'fault_type' not described in 'drbd_submit_peer_request'
drivers/block/drbd/drbd_receiver.c:1643: warning: Excess function parameter 'rw' description in 'drbd_submit_peer_request'
drivers/block/drbd/drbd_receiver.c:3055: warning: Function parameter or member 'peer_device' not described in 'drbd_asb_recover_0p'
drivers/block/drbd/drbd_receiver.c:3138: warning: Function parameter or member 'peer_device' not described in 'drbd_asb_recover_1p'
drivers/block/drbd/drbd_receiver.c:3195: warning: Function parameter or member 'peer_device' not described in 'drbd_asb_recover_2p'
drivers/block/drbd/drbd_receiver.c:4684: warning: Function parameter or member 'peer_device' not described in 'receive_bitmap_plain'
drivers/block/drbd/drbd_receiver.c:4684: warning: Function parameter or member 'size' not described in 'receive_bitmap_plain'
drivers/block/drbd/drbd_receiver.c:4684: warning: Function parameter or member 'p' not described in 'receive_bitmap_plain'
drivers/block/drbd/drbd_receiver.c:4684: warning: Function parameter or member 'c' not described in 'receive_bitmap_plain'
drivers/block/drbd/drbd_receiver.c:4738: warning: Function parameter or member 'peer_device' not described in 'recv_bm_rle_bits'
drivers/block/drbd/drbd_receiver.c:4738: warning: Function parameter or member 'p' not described in 'recv_bm_rle_bits'
drivers/block/drbd/drbd_receiver.c:4738: warning: Function parameter or member 'c' not described in 'recv_bm_rle_bits'
drivers/block/drbd/drbd_receiver.c:4738: warning: Function parameter or member 'len' not described in 'recv_bm_rle_bits'
drivers/block/drbd/drbd_receiver.c:4807: warning: Function parameter or member 'peer_device' not described in 'decode_bitmap_c'
drivers/block/drbd/drbd_receiver.c:4807: warning: Function parameter or member 'p' not described in 'decode_bitmap_c'
drivers/block/drbd/drbd_receiver.c:4807: warning: Function parameter or member 'c' not described in 'decode_bitmap_c'
drivers/block/drbd/drbd_receiver.c:4807: warning: Function parameter or member 'len' not described in 'decode_bitmap_c'
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: drbd-dev@lists.linbit.com
Cc: linux-block@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Link: https://lore.kernel.org/r/20210312105530.2219008-6-lee.jones@linaro.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Lee Jones [Fri, 12 Mar 2021 10:55:23 +0000 (10:55 +0000)]
block: drbd: drbd_state: Fix some function documentation issues
Fixes the following W=1 kernel build warning(s):
drivers/block/drbd/drbd_state.c:913: warning: Function parameter or member 'connection' not described in 'is_valid_soft_transition'
drivers/block/drbd/drbd_state.c:913: warning: Excess function parameter 'device' description in 'is_valid_soft_transition'
drivers/block/drbd/drbd_state.c:1054: warning: Function parameter or member 'warn' not described in 'sanitize_state'
drivers/block/drbd/drbd_state.c:1054: warning: Excess function parameter 'warn_sync_abort' description in 'sanitize_state'
drivers/block/drbd/drbd_state.c:1703: warning: Function parameter or member 'state_change' not described in 'after_state_ch'
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: drbd-dev@lists.linbit.com
Cc: linux-block@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Link: https://lore.kernel.org/r/20210312105530.2219008-5-lee.jones@linaro.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Lee Jones [Fri, 12 Mar 2021 10:55:22 +0000 (10:55 +0000)]
block: mtip32xx: mtip32xx: Mark debugging variable 'start' as __maybe_unused
Fixes the following W=1 kernel build warning(s):
drivers/block/mtip32xx/mtip32xx.c: In function ‘mtip_standby_immediate’:
drivers/block/mtip32xx/mtip32xx.c:1216:16: warning: variable ‘start’ set but not used [-Wunused-but-set-variable]
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Link: https://lore.kernel.org/r/20210312105530.2219008-4-lee.jones@linaro.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Lee Jones [Fri, 12 Mar 2021 10:55:21 +0000 (10:55 +0000)]
block: drbd: drbd_interval: Demote some kernel-doc abuses and fix another header
Fixes the following W=1 kernel build warning(s):
drivers/block/drbd/drbd_interval.c:11: warning: Function parameter or member 'node' not described in 'interval_end'
drivers/block/drbd/drbd_interval.c:26: warning: Function parameter or member 'root' not described in 'drbd_insert_interval'
drivers/block/drbd/drbd_interval.c:26: warning: Function parameter or member 'this' not described in 'drbd_insert_interval'
drivers/block/drbd/drbd_interval.c:70: warning: Function parameter or member 'root' not described in 'drbd_contains_interval'
drivers/block/drbd/drbd_interval.c:96: warning: Function parameter or member 'root' not described in 'drbd_remove_interval'
drivers/block/drbd/drbd_interval.c:96: warning: Function parameter or member 'this' not described in 'drbd_remove_interval'
drivers/block/drbd/drbd_interval.c:113: warning: Function parameter or member 'root' not described in 'drbd_find_overlap'
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: drbd-dev@lists.linbit.com
Cc: linux-block@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Link: https://lore.kernel.org/r/20210312105530.2219008-3-lee.jones@linaro.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 6 Apr 2021 15:17:22 +0000 (09:17 -0600)]
Merge tag 'nvme-5.13-2021-04-06' of git://git.infradead.org/nvme into for-5.13/drivers
Pull NVMe updates from Christoph:
"nvme updates for Linux 5.13
- fix handling of very large MDTS values (Bart Van Assche)
- retrigger ANA log update if group descriptor isn't found
(Hannes Reinecke)
- fix locking contexts in nvme-tcp and nvmet-tcp (Sagi Grimberg)
- return proper error code from discovery ctrl (Hou Pu)
- verify the SGLS field in nvmet-tcp and nvmet-fc (Max Gurtovoy)
- disallow passthru cmd from targeting a nsid != nsid of the block dev
(Niklas Cassel)
- do not allow model_number exceed 40 bytes in nvmet (Noam Gottlieb)
- enable optional queue idle period tracking in nvmet-tcp
(Mark Wunderlich)
- various cleanups and optimizations (Chaitanya Kulkarni, Kanchan Joshi)
- expose fast_io_fail_tmo in sysfs (Daniel Wagner)
- implement non-MDTS command limits (Keith Busch)
- reduce warnings for unhandled command effects (Keith Busch)
- allocate storage for the SQE as part of the nvme_request (Keith Busch)"
* tag 'nvme-5.13-2021-04-06' of git://git.infradead.org/nvme: (33 commits)
nvme: fix handling of large MDTS values
nvme: implement non-mdts command limits
nvme: disallow passthru cmd from targeting a nsid != nsid of the block dev
nvme: retrigger ANA log update if group descriptor isn't found
nvme: export fast_io_fail_tmo to sysfs
nvme: remove superfluous else in nvme_ctrl_loss_tmo_store
nvme: use sysfs_emit instead of sprintf
nvme-fc: check sgl supported by target
nvme-tcp: check sgl supported by target
nvmet-tcp: enable optional queue idle period tracking
nvmet-tcp: fix incorrect locking in state_change sk callback
nvme-tcp: block BH in sk state_change sk callback
nvmet: return proper error code from discovery ctrl
nvme: warn of unhandled effects only once
nvme: use driver pdu command for passthrough
nvme-pci: allocate nvme_command within driver pdu
nvmet: do not allow model_number exceed 40 bytes
nvmet: remove unnecessary ctrl parameter
nvmet-fc: update function documentation
nvme-fc: fix the function documentation comment
...
Bart Van Assche [Fri, 2 Apr 2021 16:58:20 +0000 (18:58 +0200)]
nvme: fix handling of large MDTS values
Instead of triggering an integer overflow and undefined behavior if MDTS is
large, set max_hw_sectors to UINT_MAX.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Keith Busch <kbusch@kernel.org>
[hch: rebased to account for the new nvme_mps_to_sectors helper]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Keith Busch [Wed, 24 Mar 2021 23:18:05 +0000 (16:18 -0700)]
nvme: implement non-mdts command limits
Commands that access LBA contents without a data transfer between the
host historically have not had a spec defined upper limit. The driver
set the queue constraints for such commands to the max data transfer
size just to be safe, but this artificial constraint frequently limits
devices below their capabilities.
The NVMe Workgroup ratified TP4040 defines how a controller may
advertise their non-MDTS limits. Use these if provided and default to
the current constraints if not. Since the Dataset Management command
limits are defined in logical blocks, but without a namespace to tell us
the logical block size, the code defaults to the safe 512b size.
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Niklas Cassel [Fri, 26 Mar 2021 19:48:00 +0000 (19:48 +0000)]
nvme: disallow passthru cmd from targeting a nsid != nsid of the block dev
When a passthru command targets a specific namespace, the ns parameter to
nvme_user_cmd()/nvme_user_cmd64() is set. However, there is currently no
validation that the nsid specified in the passthru command targets the
namespace/nsid represented by the block device that the ioctl was
performed on.
Add a check that validates that the nsid in the passthru command matches
that of the supplied namespace.
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Javier González <javier@javigon.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Hannes Reinecke [Sat, 5 Dec 2020 15:29:01 +0000 (16:29 +0100)]
nvme: retrigger ANA log update if group descriptor isn't found
If ANA is enabled but no ANA group descriptor is found when creating
a new namespace the ANA log is most likely out of date, so trigger
a re-read. The namespace will be tagged with the NS_ANA_PENDING flag
to exclude it from path selection until the ANA log has been re-read.
Fixes:
32acab3181c7 ("nvme: implement multipath access to nvme subsystems")
Reported-by: Martin George <marting@netapp.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Daniel Wagner [Thu, 1 Apr 2021 09:54:12 +0000 (11:54 +0200)]
nvme: export fast_io_fail_tmo to sysfs
Commit
8c4dfea97f15 ("nvme-fabrics: reject I/O to offline device")
introduced fast_io_fail_tmo but didn't export the value to sysfs. The
value can be set during the 'nvme connect'. Export the timeout value
to user space via sysfs to allow runtime configuration.
Cc: Victor Gladkov <Victor.Gladkov@kioxia.com>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Himanshu Madhani <himanshu.madhaani@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Daniel Wagner [Thu, 1 Apr 2021 09:54:11 +0000 (11:54 +0200)]
nvme: remove superfluous else in nvme_ctrl_loss_tmo_store
If there is an error we will leave the function early. So there
is no need for an else. Remove it.
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Daniel Wagner [Thu, 1 Apr 2021 09:54:10 +0000 (11:54 +0200)]
nvme: use sysfs_emit instead of sprintf
sysfs_emit is the recommended API to use for formatting strings to be
returned to user space. It is equivalent to scnprintf and aware of the
PAGE_SIZE buffer size.
Suggested-by: Chaitanya Kulkarni <Chaitanya.Kulkarni@wdc.com>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Max Gurtovoy [Tue, 30 Mar 2021 23:01:20 +0000 (23:01 +0000)]
nvme-fc: check sgl supported by target
SGLs support is mandatory for NVMe/FC, make sure that the target is
aligned to the specification.
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Max Gurtovoy [Tue, 30 Mar 2021 23:01:19 +0000 (23:01 +0000)]
nvme-tcp: check sgl supported by target
SGLs support is mandatory for NVMe/tcp, make sure that the target is
aligned to the specification.
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Wunderlich, Mark [Wed, 31 Mar 2021 21:38:30 +0000 (21:38 +0000)]
nvmet-tcp: enable optional queue idle period tracking
Add 'idle_poll_period_usecs' option used by io_work() to support
network devices enabled with advanced interrupt moderation
supporting a relaxed interrupt model. It was discovered that
such a NIC used on the target was unable to support initiator
connection establishment, caused by the existing io_work()
flow that immediately exits after a loop with no activity and
does not re-queue itself.
With this new option a queue is assigned a period of time
that no activity must occur in order to become 'idle'. Until
the queue is idle the work item is requeued.
The new module option is defined as changeable making it
flexible for testing purposes.
The pre-existing legacy behavior is preserved when no module option
for idle_poll_period_usecs is specified.
Signed-off-by: Mark Wunderlich <mark.wunderlich@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Sagi Grimberg [Sun, 21 Mar 2021 07:08:49 +0000 (00:08 -0700)]
nvmet-tcp: fix incorrect locking in state_change sk callback
We are not changing anything in the TCP connection state so
we should not take a write_lock but rather a read lock.
This caused a deadlock when running nvmet-tcp and nvme-tcp
on the same system, where state_change callbacks on the
host and on the controller side have causal relationship
and made lockdep report on this with blktests:
================================
WARNING: inconsistent lock state
5.12.0-rc3 #1 Tainted: G I
--------------------------------
inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-R} usage.
nvme/1324 [HC0[0]:SC0[0]:HE1:SE1] takes:
ffff888363151000 (clock-AF_INET){++-?}-{2:2}, at: nvme_tcp_state_change+0x21/0x150 [nvme_tcp]
{IN-SOFTIRQ-W} state was registered at:
__lock_acquire+0x79b/0x18d0
lock_acquire+0x1ca/0x480
_raw_write_lock_bh+0x39/0x80
nvmet_tcp_state_change+0x21/0x170 [nvmet_tcp]
tcp_fin+0x2a8/0x780
tcp_data_queue+0xf94/0x1f20
tcp_rcv_established+0x6ba/0x1f00
tcp_v4_do_rcv+0x502/0x760
tcp_v4_rcv+0x257e/0x3430
ip_protocol_deliver_rcu+0x69/0x6a0
ip_local_deliver_finish+0x1e2/0x2f0
ip_local_deliver+0x1a2/0x420
ip_rcv+0x4fb/0x6b0
__netif_receive_skb_one_core+0x162/0x1b0
process_backlog+0x1ff/0x770
__napi_poll.constprop.0+0xa9/0x5c0
net_rx_action+0x7b3/0xb30
__do_softirq+0x1f0/0x940
do_softirq+0xa1/0xd0
__local_bh_enable_ip+0xd8/0x100
ip_finish_output2+0x6b7/0x18a0
__ip_queue_xmit+0x706/0x1aa0
__tcp_transmit_skb+0x2068/0x2e20
tcp_write_xmit+0xc9e/0x2bb0
__tcp_push_pending_frames+0x92/0x310
inet_shutdown+0x158/0x300
__nvme_tcp_stop_queue+0x36/0x270 [nvme_tcp]
nvme_tcp_stop_queue+0x87/0xb0 [nvme_tcp]
nvme_tcp_teardown_admin_queue+0x69/0xe0 [nvme_tcp]
nvme_do_delete_ctrl+0x100/0x10c [nvme_core]
nvme_sysfs_delete.cold+0x8/0xd [nvme_core]
kernfs_fop_write_iter+0x2c7/0x460
new_sync_write+0x36c/0x610
vfs_write+0x5c0/0x870
ksys_write+0xf9/0x1d0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
irq event stamp: 10687
hardirqs last enabled at (10687): [<
ffffffff9ec376bd>] _raw_spin_unlock_irqrestore+0x2d/0x40
hardirqs last disabled at (10686): [<
ffffffff9ec374d8>] _raw_spin_lock_irqsave+0x68/0x90
softirqs last enabled at (10684): [<
ffffffff9f000608>] __do_softirq+0x608/0x940
softirqs last disabled at (10649): [<
ffffffff9cdedd31>] do_softirq+0xa1/0xd0
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(clock-AF_INET);
<Interrupt>
lock(clock-AF_INET);
*** DEADLOCK ***
5 locks held by nvme/1324:
#0:
ffff8884a01fe470 (sb_writers#4){.+.+}-{0:0}, at: ksys_write+0xf9/0x1d0
#1:
ffff8886e435c090 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x216/0x460
#2:
ffff888104d90c38 (kn->active#255){++++}-{0:0}, at: kernfs_remove_self+0x22d/0x330
#3:
ffff8884634538d0 (&queue->queue_lock){+.+.}-{3:3}, at: nvme_tcp_stop_queue+0x52/0xb0 [nvme_tcp]
#4:
ffff888363150d30 (sk_lock-AF_INET){+.+.}-{0:0}, at: inet_shutdown+0x59/0x300
stack backtrace:
CPU: 26 PID: 1324 Comm: nvme Tainted: G I 5.12.0-rc3 #1
Hardware name: Dell Inc. PowerEdge R640/06NR82, BIOS 2.10.0 11/12/2020
Call Trace:
dump_stack+0x93/0xc2
mark_lock_irq.cold+0x2c/0xb3
? verify_lock_unused+0x390/0x390
? stack_trace_consume_entry+0x160/0x160
? lock_downgrade+0x100/0x100
? save_trace+0x88/0x5e0
? _raw_spin_unlock_irqrestore+0x2d/0x40
mark_lock+0x530/0x1470
? mark_lock_irq+0x1d10/0x1d10
? enqueue_timer+0x660/0x660
mark_usage+0x215/0x2a0
__lock_acquire+0x79b/0x18d0
? tcp_schedule_loss_probe.part.0+0x38c/0x520
lock_acquire+0x1ca/0x480
? nvme_tcp_state_change+0x21/0x150 [nvme_tcp]
? rcu_read_unlock+0x40/0x40
? tcp_mtu_probe+0x1ae0/0x1ae0
? kmalloc_reserve+0xa0/0xa0
? sysfs_file_ops+0x170/0x170
_raw_read_lock+0x3d/0xa0
? nvme_tcp_state_change+0x21/0x150 [nvme_tcp]
nvme_tcp_state_change+0x21/0x150 [nvme_tcp]
? sysfs_file_ops+0x170/0x170
inet_shutdown+0x189/0x300
__nvme_tcp_stop_queue+0x36/0x270 [nvme_tcp]
nvme_tcp_stop_queue+0x87/0xb0 [nvme_tcp]
nvme_tcp_teardown_admin_queue+0x69/0xe0 [nvme_tcp]
nvme_do_delete_ctrl+0x100/0x10c [nvme_core]
nvme_sysfs_delete.cold+0x8/0xd [nvme_core]
kernfs_fop_write_iter+0x2c7/0x460
new_sync_write+0x36c/0x610
? new_sync_read+0x600/0x600
? lock_acquire+0x1ca/0x480
? rcu_read_unlock+0x40/0x40
? lock_is_held_type+0x9a/0x110
vfs_write+0x5c0/0x870
ksys_write+0xf9/0x1d0
? __ia32_sys_read+0xa0/0xa0
? lockdep_hardirqs_on_prepare.part.0+0x198/0x340
? syscall_enter_from_user_mode+0x27/0x70
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
Fixes:
872d26a391da ("nvmet-tcp: add NVMe over TCP target driver")
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>