linux-2.6-microblaze.git
2 years ago dm crypt: use dm_submit_bio_remap
Mike Snitzer [Fri, 18 Feb 2022 04:40:35 +0000 (23:40 -0500)]
dm crypt: use dm_submit_bio_remap

Care was taken to support kcryptd_io_read being called from crypt_map
or workqueue.  Use of an intermediate CRYPT_MAP_READ_GFP gfp_t
(defined as GFP_NOWAIT) should protect from maintenance burden if that
flag were to change for some reason.

Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2 years ago dm: add dm_submit_bio_remap interface
Mike Snitzer [Fri, 18 Feb 2022 04:40:32 +0000 (23:40 -0500)]
dm: add dm_submit_bio_remap interface

Where possible, switch from early bio-based IO accounting (at the time
DM clones each incoming bio) to late IO accounting just before each
remapped bio is issued to underlying device via submit_bio_noacct().

Allows more precise bio-based IO accounting for DM targets that use
their own workqueues to perform additional processing of each bio in
conjunction with their DM_MAPIO_SUBMITTED return from their map
function. When a target is updated to use dm_submit_bio_remap(), it
must also set ti->accounts_remapped_io to true.

Use xchg() in start_io_acct(), as suggested by Mikulas, to ensure each
IO is only started once.  The xchg race only happens if
__send_duplicate_bios() sends multiple bios -- that case is reflected
via tio->is_duplicate_bio.  Given the niche nature of this race, it is
best to avoid any xchg performance penalty for normal IO.

For IO that was never submitted with dm_submit_bio_remap(), but whose
clone the target completes with bio_endio, accounting is started and
then ended, and the pending_io counter is decremented.

Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
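
The "start accounting only once" rule described in the commit above can be illustrated in userspace C. This is only a sketch under stated assumptions: struct io_sketch and start_io_acct_sketch() are hypothetical names, C11 atomic_exchange() stands in for the kernel's xchg(), and the is_duplicate_bio argument models the tio->is_duplicate_bio flag mentioned in the message. It is not the dm.c implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct dm_io; not the kernel definition. */
struct io_sketch {
	atomic_int was_accounted;  /* 0 until accounting has been started */
	int acct_starts;           /* how often accounting really started */
};

/*
 * Start accounting at most once. Returns true if this call started it.
 * is_duplicate_bio marks a clone that may race with its siblings.
 */
bool start_io_acct_sketch(struct io_sketch *io, bool is_duplicate_bio)
{
	if (!is_duplicate_bio) {
		/* Normal IO: one submitter, so a plain check-and-set suffices. */
		if (atomic_load_explicit(&io->was_accounted, memory_order_relaxed))
			return false;
		atomic_store_explicit(&io->was_accounted, 1, memory_order_relaxed);
	} else if (atomic_exchange(&io->was_accounted, 1) == 1) {
		/* A sibling clone already claimed the start. */
		return false;
	}
	io->acct_starts++;
	return true;
}
```

Only the duplicate-bio path pays for the atomic exchange, which mirrors the stated goal of avoiding an xchg penalty for normal IO.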
2 years ago dm: flag clones created by __send_duplicate_bios
Mike Snitzer [Fri, 18 Feb 2022 04:40:30 +0000 (23:40 -0500)]
dm: flag clones created by __send_duplicate_bios

Formally disallow dm_accept_partial_bio() on clones created by
__send_duplicate_bios() because their len_ptr points to a shared
unsigned int.  __send_duplicate_bios() is only used for flush bios
and other "abnormal" bios (discards, writezeroes, etc). And
dm_accept_partial_bio() already didn't support flush bios.

Also refactor __send_changing_extent_only() to reflect it cannot fail.
As such __send_changing_extent_only() can update the clone_info before
__send_duplicate_bios() is called to fan-out __map_bio() calls.

Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2 years ago dm: reduce dm_io and dm_target_io struct sizes
Mike Snitzer [Fri, 18 Feb 2022 04:40:28 +0000 (23:40 -0500)]
dm: reduce dm_io and dm_target_io struct sizes

Remove one 4 byte hole in dm_io struct.
Remove two 4 byte holes in dm_target_io struct.

Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
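
For readers unfamiliar with struct holes, the following sketch shows how reordering members can remove padding. The two structs are hypothetical and only loosely inspired by the field names used in this series; they are not the real dm_io or dm_target_io layouts. The sizes in the comments assume a typical LP64 ABI (8-byte pointers, 8-byte alignment for 64-bit integers).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: each 4-byte member precedes an 8-byte-aligned one. */
struct tio_with_holes {
	uint32_t magic;          /* followed by a 4-byte hole on LP64 */
	void *io;
	uint32_t target_bio_nr;  /* followed by a 4-byte hole on LP64 */
	void *len_ptr;
	uint32_t flags;          /* followed by 4 bytes of padding on LP64 */
	uint64_t old_sector;
};                               /* 48 bytes on LP64 */

/* Same members, with the 4-byte fields grouped together at the end. */
struct tio_reordered {
	void *io;
	void *len_ptr;
	uint64_t old_sector;
	uint32_t magic;
	uint32_t target_bio_nr;
	uint32_t flags;          /* only 4 bytes of tail padding remain */
};                               /* 40 bytes on LP64 */

/* Reordering never makes this layout larger on common ABIs. */
_Static_assert(sizeof(struct tio_reordered) <= sizeof(struct tio_with_holes),
	       "reordered layout should not grow");
```

In kernel development, a tool such as pahole is commonly used to find holes like these in the compiled structs.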
2 years ago dm: move duplicate code from callers of alloc_tio into alloc_tio
Mike Snitzer [Fri, 18 Feb 2022 04:40:25 +0000 (23:40 -0500)]
dm: move duplicate code from callers of alloc_tio into alloc_tio

Suggested-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2 years ago dm: record old_sector in dm_target_io before calling map function
Mike Snitzer [Fri, 18 Feb 2022 04:40:23 +0000 (23:40 -0500)]
dm: record old_sector in dm_target_io before calling map function

Prep for being able to defer trace_block_bio_remap() until when the
bio is remapped and submitted by the DM target.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2 years ago dm: remove legacy code only needed before submit_bio recursion
Mike Snitzer [Fri, 18 Feb 2022 04:40:21 +0000 (23:40 -0500)]
dm: remove legacy code only needed before submit_bio recursion

Commit 8615cb65bd63 ("dm: remove useless loop in
__split_and_process_bio") showcased that we no longer loop.

Remove the bio_advance() in __split_and_process_bio() that was only
needed when looping was possible.

Similarly there is no need to advance the bio, using ci->sector
cursor, in __send_duplicate_bios().

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2 years ago dm: remove unused mapped_device argument from free_tio
Mike Snitzer [Fri, 18 Feb 2022 04:40:18 +0000 (23:40 -0500)]
dm: remove unused mapped_device argument from free_tio

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2 years ago dm: remove impossible BUG_ON in __send_empty_flush
Mike Snitzer [Fri, 18 Feb 2022 04:40:16 +0000 (23:40 -0500)]
dm: remove impossible BUG_ON in __send_empty_flush

The flush_bio in question was just initialized to be empty, so there
is no way bio_has_data() will return true.  So remove stale BUG_ON().

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2 years ago dm: reduce code duplication in __map_bio
Mike Snitzer [Fri, 18 Feb 2022 04:40:14 +0000 (23:40 -0500)]
dm: reduce code duplication in __map_bio

Error path code (for handling DM_MAPIO_REQUEUE and DM_MAPIO_KILL) is
effectively identical.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2 years ago dm: refactor dm_split_and_process_bio a bit
Mike Snitzer [Fri, 18 Feb 2022 04:40:11 +0000 (23:40 -0500)]
dm: refactor dm_split_and_process_bio a bit

Remove needless branching and indentation. Leaves code to catch
malformed op_is_zone_mgmt bios (they shouldn't have a payload).

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2 years ago dm: fold __clone_and_map_data_bio into __split_and_process_bio
Mike Snitzer [Fri, 18 Feb 2022 04:40:09 +0000 (23:40 -0500)]
dm: fold __clone_and_map_data_bio into __split_and_process_bio

Fold __clone_and_map_data_bio into its only caller.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2 years ago dm: rename split functions
Mike Snitzer [Fri, 18 Feb 2022 04:40:07 +0000 (23:40 -0500)]
dm: rename split functions

Rename __split_and_process_bio to dm_split_and_process_bio.
Rename __split_and_process_non_flush to __split_and_process_bio.

Also fix a stale comment and whitespace.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2 years ago dm: reorder members in mapped_device struct
Mike Snitzer [Fri, 18 Feb 2022 04:40:04 +0000 (23:40 -0500)]
dm: reorder members in mapped_device struct

Improves alignment and groups related members relative to cachelines.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2 years ago dm: eliminate copying of dm_io fields in dm_io_dec_pending
Mike Snitzer [Sun, 20 Feb 2022 17:57:11 +0000 (12:57 -0500)]
dm: eliminate copying of dm_io fields in dm_io_dec_pending

There is no need for dm_io_dec_pending() to copy dm_io fields
anymore now that DM provides its own pending_io counters again.

The race documented in commit d208b89401e0 ("dm: fix mempool NULL
pointer race when completing IO") no longer exists now that block
core's in_flight counters aren't used to signal all dm_io is
complete.

Also, rename {start,end}_io_acct to dm_{start,end}_io_acct.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2 years ago dm stats: fix too short end duration_ns when using precise_timestamps
Mike Snitzer [Fri, 18 Feb 2022 04:39:59 +0000 (23:39 -0500)]
dm stats: fix too short end duration_ns when using precise_timestamps

dm_stats_account_io()'s STAT_PRECISE_TIMESTAMPS support doesn't handle
the fact that with commit b879f915bc48 ("dm: properly fix redundant
bio-based IO accounting") io->start_time _may_ be in the past (meaning
the start_io_acct() was deferred until later).

Add a new dm_stats_recalc_precise_timestamps() helper that will
set/clear a new 'precise_timestamps' flag in the dm_stats struct based
on whether any configured stats enable STAT_PRECISE_TIMESTAMPS.
And update DM core's alloc_io() to use dm_stats_record_start() to set
stats_aux.duration_ns if stats->precise_timestamps is true.

Also, remove unused 'last_sector' and 'last_rw' members from the
dm_stats struct.

Fixes: b879f915bc48 ("dm: properly fix redundant bio-based IO accounting")
Cc: stable@vger.kernel.org
Co-developed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2 years ago dm: fix double accounting of flush with data
Mike Snitzer [Fri, 18 Feb 2022 04:39:57 +0000 (23:39 -0500)]
dm: fix double accounting of flush with data

DM handles a flush with data by first issuing an empty flush and then
once it completes the REQ_PREFLUSH flag is removed and the payload is
issued.  The problem fixed by this commit is that both the empty flush
bio and the data payload will account the full extent of the data
payload.

Fix this by factoring out dm_io_acct() and having it wrap all IO
accounting: it sets the size of a bio with REQ_PREFLUSH to 0, accounts
the IO, and then restores the original size.

Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
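
The "zero the size, account, restore" wrapper described in the commit above can be sketched as follows. All names here (bio_sketch, stats_sketch, REQ_PREFLUSH_SKETCH, dm_io_acct_sketch) are hypothetical userspace stand-ins chosen for illustration; the flag value is not the kernel's, and the real dm_io_acct() operates on struct bio and the block layer accounting helpers.

```c
#include <assert.h>
#include <stdbool.h>

#define REQ_PREFLUSH_SKETCH 0x1u  /* hypothetical flag bit, not the kernel value */

struct bio_sketch {
	unsigned int flags;
	unsigned int size;        /* payload size in bytes */
};

struct stats_sketch {
	unsigned long ios;
	unsigned long bytes;
};

/* Stand-in for the underlying accounting call: charges the bio's size. */
void account_io_sketch(struct stats_sketch *st, const struct bio_sketch *bio)
{
	st->ios++;
	st->bytes += bio->size;
}

/* Wrapper: a flush that carries data is accounted as an empty flush. */
void dm_io_acct_sketch(struct stats_sketch *st, struct bio_sketch *bio)
{
	bool flush_with_data = (bio->flags & REQ_PREFLUSH_SKETCH) && bio->size;
	unsigned int saved_size = 0;

	if (flush_with_data) {
		saved_size = bio->size;
		bio->size = 0;            /* hide the payload from accounting */
	}
	account_io_sketch(st, bio);
	if (flush_with_data)
		bio->size = saved_size;   /* restore for the later data pass */
}
```

With this shape, the payload bytes are charged only once, when the bio is accounted again after REQ_PREFLUSH has been cleared.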
2 years ago dm: interlock pending dm_io and dm_wait_for_bios_completion
Mike Snitzer [Fri, 18 Feb 2022 04:40:02 +0000 (23:40 -0500)]
dm: interlock pending dm_io and dm_wait_for_bios_completion

Commit d208b89401e0 ("dm: fix mempool NULL pointer race when
completing IO") didn't go far enough.

When bio_end_io_acct ends, the count of in-flight I/Os may reach zero
and the DM device may be suspended. There is a possibility that the
suspend races with dm_stats_account_io.

Fix this by adding percpu "pending_io" counters to track outstanding
dm_io. Move kicking of suspend queue to dm_io_dec_pending(). Also,
rename md_in_flight_bios() to dm_in_flight_bios() and update it to
iterate all pending_io counters.

Fixes: d208b89401e0 ("dm: fix mempool NULL pointer race when completing IO")
Cc: stable@vger.kernel.org
Co-developed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
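
A rough userspace model of the per-CPU "pending_io" idea from the commit above is shown below. The fixed-size array, NR_CPUS_SKETCH, and the function names are assumptions made for illustration; the kernel uses real percpu variables and iterates over all possible CPUs.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_SKETCH 4  /* hypothetical CPU count for the sketch */

/* Hypothetical stand-in for the mapped_device pending_io counters. */
struct md_sketch {
	long pending_io[NR_CPUS_SKETCH];
};

/* Called on the CPU that starts a dm_io. */
void pending_io_inc(struct md_sketch *md, int cpu)
{
	md->pending_io[cpu]++;
}

/* Called on the CPU that completes a dm_io, which may be a different one. */
void pending_io_dec(struct md_sketch *md, int cpu)
{
	md->pending_io[cpu]--;
}

/* IO is in flight only if the counters do not sum to zero. */
bool in_flight_sketch(const struct md_sketch *md)
{
	long sum = 0;

	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		sum += md->pending_io[cpu];
	return sum != 0;
}
```

Because an IO can start on one CPU and complete on another, an individual counter may go negative; only the sum across all counters is meaningful, which is why the in-flight check has to iterate over every counter.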
2 years ago block/bfq_wf2q: correct weight to ioprio
Yahu Gao [Fri, 7 Jan 2022 06:58:59 +0000 (14:58 +0800)]
block/bfq_wf2q: correct weight to ioprio

The return value is ioprio * BFQ_WEIGHT_CONVERSION_COEFF or 0.
What we want is ioprio or 0.
Correct this by changing the calculation.

Signed-off-by: Yahu Gao <gaoyahu19@gmail.com>
Acked-by: Paolo Valente <paolo.valente@linaro.org>
Link: https://lore.kernel.org/r/20220107065859.25689-1-gaoyahu19@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
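
The arithmetic behind the fix above can be worked through with a small sketch. The constants (8 priority levels, a conversion coefficient of 10) and the forward mapping from ioprio to weight are assumptions chosen to match the behaviour the message describes; the function names are hypothetical and this is not a copy of the BFQ source.

```c
#include <assert.h>

/* Assumed values for illustration. */
#define IOPRIO_NR_LEVELS_SKETCH 8
#define WEIGHT_COEFF_SKETCH     10

/* Assumed forward mapping: higher priority (lower ioprio) gets more weight. */
int ioprio_to_weight_sketch(int ioprio)
{
	return (IOPRIO_NR_LEVELS_SKETCH - ioprio) * WEIGHT_COEFF_SKETCH;
}

/* Behaviour described as wrong: yields ioprio * coefficient, or 0. */
int weight_to_ioprio_old_sketch(int weight)
{
	int v = IOPRIO_NR_LEVELS_SKETCH * WEIGHT_COEFF_SKETCH - weight;

	return v > 0 ? v : 0;
}

/* Behaviour described as wanted: yields ioprio, or 0. */
int weight_to_ioprio_fixed_sketch(int weight)
{
	int v = IOPRIO_NR_LEVELS_SKETCH - weight / WEIGHT_COEFF_SKETCH;

	return v > 0 ? v : 0;
}
```

Under these assumptions, ioprio 3 maps to weight 50; the old inverse returns 30 while the corrected one returns 3, so only the corrected version round-trips.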
2 years ago blk-mq: avoid extending delays of active hctx from blk_mq_delay_run_hw_queues
David Jeffery [Mon, 31 Jan 2022 20:33:37 +0000 (15:33 -0500)]
blk-mq: avoid extending delays of active hctx from blk_mq_delay_run_hw_queues

When blk_mq_delay_run_hw_queues sets an hctx to run in the future, it can
reset the delay length for an already pending delayed work run_work. This
creates a scenario where multiple hctx may have their queues set to run,
but if one runs first and finds nothing to do, it can reset the delay of
another hctx and stall the other hctx's ability to run requests.

To avoid this I/O stall when an hctx's run_work is already pending,
leave it untouched to run at its currently designated time rather than
extending its delay. The work will still run, which keeps closed the
race that calling blk_mq_delay_run_hw_queues is meant to handle, while
also avoiding the I/O stall.

Signed-off-by: David Jeffery <djeffery@redhat.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20220131203337.GA17666@redhat
Signed-off-by: Jens Axboe <axboe@kernel.dk>
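
The difference between re-arming a pending delayed work item and leaving it alone can be modelled in a few lines. The struct and both functions are hypothetical userspace stand-ins, with time expressed as plain integers; they only illustrate the policy described above, not the blk-mq code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a delayed work item such as an hctx's run_work. */
struct delayed_work_sketch {
	bool pending;           /* already queued to run in the future */
	unsigned long expires;  /* time at which it will run */
};

/* Old policy: always re-arm, which can push back work that is pending. */
void delay_run_old_sketch(struct delayed_work_sketch *w,
			  unsigned long now, unsigned long delay)
{
	w->pending = true;
	w->expires = now + delay;
}

/* New policy: pending work keeps its currently designated run time. */
void delay_run_new_sketch(struct delayed_work_sketch *w,
			  unsigned long now, unsigned long delay)
{
	if (w->pending)
		return;
	w->pending = true;
	w->expires = now + delay;
}
```

With the new policy, repeated calls can no longer keep postponing a queue that already has a run scheduled.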
2 years ago virtio_blk: simplify refcounting
Christoph Hellwig [Tue, 15 Feb 2022 09:45:14 +0000 (10:45 +0100)]
virtio_blk: simplify refcounting

Implement the ->free_disk method to free the virtio_blk structure only
once the last gendisk reference goes away instead of keeping a local
refcount.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/r/20220215094514.3828912-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago memstick/mspro_block: simplify refcounting
Christoph Hellwig [Tue, 15 Feb 2022 09:45:13 +0000 (10:45 +0100)]
memstick/mspro_block: simplify refcounting

Implement the ->free_disk method to free the msb_data structure only once
the last gendisk reference goes away instead of keeping a local
refcount.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220215094514.3828912-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago memstick/mspro_block: fix handling of read-only devices
Christoph Hellwig [Tue, 15 Feb 2022 09:45:12 +0000 (10:45 +0100)]
memstick/mspro_block: fix handling of read-only devices

Use set_disk_ro to propagate the read-only state to the block layer
instead of checking for it in ->open and leaking a reference in case
of a read-only device.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220215094514.3828912-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago memstick/ms_block: simplify refcounting
Christoph Hellwig [Tue, 15 Feb 2022 09:45:11 +0000 (10:45 +0100)]
memstick/ms_block: simplify refcounting

Implement the ->free_disk method to free the msb_data structure only once
the last gendisk reference goes away instead of keeping a local refcount.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220215094514.3828912-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago block: add a ->free_disk method
Christoph Hellwig [Tue, 15 Feb 2022 09:45:10 +0000 (10:45 +0100)]
block: add a ->free_disk method

Add a method to notify the driver that the gendisk is about to be freed.
This allows drivers to tie the lifetime of their private data to that of
the gendisk and thus deal with device removal races without expensive
synchronization and boilerplate code.

A new flag is added so that ->free_disk is only called after a successful
call to add_disk, which significantly simplifies the error handling path
during probing.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220215094514.3828912-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
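
The lifetime rule introduced above (driver data is freed from a callback when the last disk reference goes away, and only if the disk was successfully added) can be sketched in userspace. Everything below is a hypothetical model: the struct names, the plain integer refcount, and the "added" field are illustrative and do not reproduce the gendisk implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct disk_sketch;

/* Hypothetical subset of a driver operations table. */
struct disk_ops_sketch {
	void (*free_disk)(struct disk_sketch *disk);
};

struct disk_sketch {
	int refcount;
	bool added;                       /* set after a successful add */
	const struct disk_ops_sketch *ops;
	void *private_data;               /* driver-owned state */
};

/* Example driver callback: release the driver's private data. */
void drv_free_disk_sketch(struct disk_sketch *disk)
{
	free(disk->private_data);
	disk->private_data = NULL;
}

/* Drop one reference; on the last one, let the driver clean up. */
void disk_put_sketch(struct disk_sketch *disk)
{
	if (--disk->refcount > 0)
		return;
	/* Skip the callback if the disk was never added (probe error path). */
	if (disk->added && disk->ops && disk->ops->free_disk)
		disk->ops->free_disk(disk);
}
```

The driver no longer needs its own refcount: its data simply lives as long as the disk does. When the add step fails, the callback is not invoked, so the probe error path can free the private data itself.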
2 years ago block: revert 4f1e9630afe6 ("blk-throtl: optimize IOPS throttle for large IO scenarios")
Ming Lei [Wed, 16 Feb 2022 04:45:14 +0000 (12:45 +0800)]
block: revert 4f1e9630afe6 ("blk-throtl: optimize IOPS throttle for large IO scenarios")

Revert commit 4f1e9630afe6 ("blk-throtl: optimize IOPS throttle for large
IO scenarios") since there is another, easier way to address this issue
that gives better IOPS throttling results.

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20220216044514.2903784-9-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago block: don't try to throttle split bio if iops limit isn't set
Ming Lei [Wed, 16 Feb 2022 04:45:13 +0000 (12:45 +0800)]
block: don't try to throttle split bio if iops limit isn't set

We need to throttle a split bio when an IOPS limit is set, even though
the split bio has already been marked BIO_THROTTLED, because the block
layer accounts each split bio.

If only a throughput limit is set up, there is no need to throttle again
when BIO_THROTTLED is set, since the bytes of the whole bio have already
been accounted and considered.

Add a THROTL_TG_HAS_IOPS_LIMIT flag for this purpose.

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20220216044514.2903784-8-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
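
The decision described in the commit above reduces to a small predicate. The sketch below uses hypothetical flag values and a hypothetical function name; it only models the rule "an already-throttled bio is throttled again only when an IOPS limit exists" and is not the blk-throttle code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for the sketch; not the kernel values. */
#define BIO_THROTTLED_SKETCH      0x1u
#define TG_HAS_IOPS_LIMIT_SKETCH  0x1u

/* Decide whether a bio still needs to go through the throttling path. */
bool bio_needs_throttle_sketch(unsigned int bio_flags, unsigned int tg_flags)
{
	/* Never seen before: always run it through throttling. */
	if (!(bio_flags & BIO_THROTTLED_SKETCH))
		return true;
	/*
	 * Already charged for its bytes. Only an IOPS limit requires
	 * charging the split bio again, since each split counts as an IO.
	 */
	return (tg_flags & TG_HAS_IOPS_LIMIT_SKETCH) != 0;
}
```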
2 years ago block: throttle split bio in case of iops limit
Ming Lei [Wed, 16 Feb 2022 04:45:12 +0000 (12:45 +0800)]
block: throttle split bio in case of iops limit

Commit 111be8839817 ("block-throttle: avoid double charge") marks bio as
BIO_THROTTLED unconditionally if __blk_throtl_bio() is called on this bio,
then this bio won't be called into __blk_throtl_bio() any more. This way
is to avoid double charge in case of bio splitting. It is reasonable for
read/write throughput limit, but not reasonable for IOPS limit because
block layer provides io accounting against split bio.

Chunguang Xu has already observed this issue and fixed it in commit
4f1e9630afe6 ("blk-throtl: optimize IOPS throttle for large IO scenarios").
However, that patch only covers bio splitting in __blk_queue_split(), and
there are other kinds of bio splitting, such as bio_split() followed by
submit_bio_noacct(), among others.

This patch tries to fix the issue in one generic way by always charging
the bio for iops limit in blk_throtl_bio(). This way is reasonable:
re-submission & fast-cloned bio is charged if it is submitted to same
disk/queue, and BIO_THROTTLED will be cleared if bio->bi_bdev is changed.

This new approach gives a much smoother and more stable IOPS limit than
commit 4f1e9630afe6 ("blk-throtl: optimize IOPS throttle for large IO
scenarios"), since that commit cannot actually throttle the current split
bios.

This approach also does not introduce a new double IOPS charge in
blk_throtl_dispatch_work_fn(), because blk_throtl_bio() is no longer
called there.

Reported-by: Ning Li <lining2020x@163.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Chunguang Xu <brookxu@tencent.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20220216044514.2903784-7-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago block: merge submit_bio_checks() into submit_bio_noacct
Ming Lei [Wed, 16 Feb 2022 04:45:11 +0000 (12:45 +0800)]
block: merge submit_bio_checks() into submit_bio_noacct

Now submit_bio_checks() is only called by submit_bio_noacct(), so merge
it into submit_bio_noacct().

Suggested-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220216044514.2903784-6-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago block: don't check bio in blk_throtl_dispatch_work_fn
Ming Lei [Wed, 16 Feb 2022 04:45:10 +0000 (12:45 +0800)]
block: don't check bio in blk_throtl_dispatch_work_fn

The bio has been checked already before throttling, so no need to check
it again before dispatching it from throttle queue.

Add a helper of submit_bio_noacct_nocheck() for this purpose.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220216044514.2903784-5-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago block: don't declare submit_bio_checks in local header
Ming Lei [Wed, 16 Feb 2022 04:45:09 +0000 (12:45 +0800)]
block: don't declare submit_bio_checks in local header

submit_bio_checks() is no longer called outside of block/blk-core.c
since commit 9d497e2941c3 ("block: don't protect submit_bio_checks by
q_usage_counter"), so mark it as a local helper.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220216044514.2903784-4-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago block: move blk_crypto_bio_prep() out of blk-mq.c
Ming Lei [Wed, 16 Feb 2022 04:45:08 +0000 (12:45 +0800)]
block: move blk_crypto_bio_prep() out of blk-mq.c

blk_crypto_bio_prep() is called for both bio based and blk-mq drivers,
so move it out of blk-mq.c, then we can unify this kind of handling.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220216044514.2903784-3-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago block: move submit_bio_checks() into submit_bio_noacct
Ming Lei [Wed, 16 Feb 2022 04:45:07 +0000 (12:45 +0800)]
block: move submit_bio_checks() into submit_bio_noacct

It is cleaner and more readable to check the bio when starting to submit
it, instead of just before calling ->submit_bio() or blk_mq_submit_bio().

It also gives us a chance to optimize bio submission by skipping the
bio checks.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220216044514.2903784-2-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago dm: remove dm_dispatch_clone_request
Christoph Hellwig [Tue, 15 Feb 2022 10:05:40 +0000 (11:05 +0100)]
dm: remove dm_dispatch_clone_request

Fold dm_dispatch_clone_request into its only caller, and use a single
switch statement to dispatch the handling of the different return
values from blk_insert_cloned_request.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20220215100540.3892965-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago dm: remove useless code from dm_dispatch_clone_request
Christoph Hellwig [Tue, 15 Feb 2022 10:05:39 +0000 (11:05 +0100)]
dm: remove useless code from dm_dispatch_clone_request

Both ->start_time_ns and the RQF_IO_STAT are set when the request is
allocated using blk_mq_alloc_request by dm-mpath in blk_mq_rq_ctx_init.
The block layer also ensures ->start_time_ns is only set when actually
needed.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20220215100540.3892965-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago blk-mq: remove the request_queue argument to blk_insert_cloned_request
Christoph Hellwig [Tue, 15 Feb 2022 10:05:38 +0000 (11:05 +0100)]
blk-mq: remove the request_queue argument to blk_insert_cloned_request

The request must be submitted to the queue it was allocated for, so
remove the extra request_queue argument.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20220215100540.3892965-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago blk-mq: fold blk_cloned_rq_check_limits into blk_insert_cloned_request
Christoph Hellwig [Tue, 15 Feb 2022 10:05:37 +0000 (11:05 +0100)]
blk-mq: fold blk_cloned_rq_check_limits into blk_insert_cloned_request

Fold blk_cloned_rq_check_limits into its only caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20220215100540.3892965-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago blk-mq: make the blk-mq stacking code optional
Christoph Hellwig [Tue, 15 Feb 2022 10:05:36 +0000 (11:05 +0100)]
blk-mq: make the blk-mq stacking code optional

The code to stack blk-mq drivers is only used by dm-multipath, and
will preferably stay that way.  Make it optional and only selected
by device mapper, so that the buildbots more easily catch abuses
like the one that slipped into the ufs driver in the last merge
window.  Another positive side effect is that kernel builds without
device mapper shrink a little bit as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20220215100540.3892965-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago blk-cgroup: set blkg iostat after percpu stat aggregation
Chengming Zhou [Sun, 13 Feb 2022 08:59:02 +0000 (16:59 +0800)]
blk-cgroup: set blkg iostat after percpu stat aggregation

There is no need to call blkg_iostat_set for the top blkg iostat on
each CPU, so move it after the percpu stat aggregation.

Fixes: ef45fe470e1e ("blk-cgroup: show global disk stats in root cgroup io.stat")
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20220213085902.88884-1-zhouchengming@bytedance.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago blk-lib: don't check bdev_get_queue() NULL check
Chaitanya Kulkarni [Tue, 15 Feb 2022 11:52:47 +0000 (03:52 -0800)]
blk-lib: don't check bdev_get_queue() NULL check

According to the comment in bdev_get_queue(), bdev->bd_queue can never
be NULL. Remove the NULL check for the local variable q that is set
from bdev_get_queue() for discard, write_same, and write_zeroes.

Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20220215115247.11717-2-kch@nvidia.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago block: remove biodoc.rst
Christoph Hellwig [Tue, 15 Feb 2022 08:10:47 +0000 (09:10 +0100)]
block: remove biodoc.rst

This document is completely out of date and extremely misleading. In
general the existing kerneldoc comments serve as much better
documentation of the still existing functionality, while the history
blurbs are pretty much irrelevant today.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20220215081047.3693582-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago docs: block: biodoc.rst: Drop the obsolete and incorrect content
Barry Song [Mon, 7 Feb 2022 07:49:31 +0000 (15:49 +0800)]
docs: block: biodoc.rst: Drop the obsolete and incorrect content

Since commit 7eaceaccab5f ("block: remove per-queue plugging"), the
kernel has removed blk_run_address_space(), blk_unplug() and
sync_buffer(), and moved to on-stack plugging. The document has been
obsolete for years.
Given that there are no obvious counterparts in the new mechanism to
replace the old APIs, this patch drops the content directly.

Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Link: https://lore.kernel.org/r/20220207074931.20067-1-song.bao.hua@hisilicon.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago block: partition include/linux/blk-cgroup.h
Ming Lei [Fri, 11 Feb 2022 10:11:49 +0000 (18:11 +0800)]
block: partition include/linux/blk-cgroup.h

Partition include/linux/blk-cgroup.h into two parts: a public part and
a block layer private part.

Suggested by Christoph Hellwig.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220211101149.2368042-4-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago block: move initialization of q->blkg_list into blkcg_init_queue
Ming Lei [Fri, 11 Feb 2022 10:11:48 +0000 (18:11 +0800)]
block: move initialization of q->blkg_list into blkcg_init_queue

q->blkg_list is only used by blkcg code, so move it into
blkcg_init_queue.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20220211101149.2368042-3-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago block: remove THROTL_IOPS_MAX
Ming Lei [Fri, 11 Feb 2022 10:11:47 +0000 (18:11 +0800)]
block: remove THROTL_IOPS_MAX

No one uses THROTL_IOPS_MAX any more, so remove it.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220211101149.2368042-2-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago block: introduce block_rq_error tracepoint
Yang Shi [Thu, 10 Feb 2022 22:52:22 +0000 (14:52 -0800)]
block: introduce block_rq_error tracepoint

Currently, rasdaemon uses the existing tracepoint block_rq_complete
and filters out non-error cases in order to capture block disk errors.

But there are a few problems with this approach:

1. Even though the kernel trace filter can do the filtering work, there
   is still some overhead once this tracepoint is enabled.

2. The filter is merely based on errno, which does not align with kernel
   logic to check the errors for print_req_error().

3. block_rq_complete only provides the dev major and minor numbers to
   identify the block device, which is not convenient to use in user-space.

So introduce a new tracepoint block_rq_error just for the error case.
With this patch, rasdaemon could switch to block_rq_error.

Since the new tracepoint's implementation is similar to that of
block_rq_complete, move the existing code from TRACE_EVENT
block_rq_complete() into a new event class, block_rq_completion(). Then
add events for block_rq_complete and block_rq_error from the newly
created event class, as suggested by Chaitanya Kulkarni.

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Yang Shi <shy828301@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220210225222.260069-1-shy828301@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago sbitmap: Delete old sbitmap_queue_get_shallow()
John Garry [Tue, 8 Feb 2022 12:07:04 +0000 (20:07 +0800)]
sbitmap: Delete old sbitmap_queue_get_shallow()

Since sbitmap_queue_get_shallow() was introduced in commit c05e66733788
("sbitmap: add sbitmap_get_shallow() operation"), it has not been used.

Delete sbitmap_queue_get_shallow() and rename public
__sbitmap_queue_get_shallow() -> sbitmap_queue_get_shallow() as it is odd
to have public __foo but no foo at all.

Signed-off-by: John Garry <john.garry@huawei.com>
Link: https://lore.kernel.org/r/1644322024-105340-1-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago lib/sbitmap: kill 'depth' from sbitmap_word
Ming Lei [Mon, 10 Jan 2022 07:29:45 +0000 (15:29 +0800)]
lib/sbitmap: kill 'depth' from sbitmap_word

Only the last sbitmap_word can have a different depth; all the others
must have the same depth of 1U << sb->shift. So it is not necessary to
store the depth in sbitmap_word: it can be retrieved easily and
efficiently with a new internal helper, __map_depth(sb, index).

Remove 'depth' field from sbitmap_word, then the annotation of
____cacheline_aligned_in_smp for 'word' isn't needed any more.

No performance effect was seen when running a highly parallel IOPS test
on null_blk.

This saves one cacheline (usually 64 bytes) per sbitmap_word.

Cc: Martin Wilck <martin.wilck@suse.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: John Garry <john.garry@huawei.com>
Link: https://lore.kernel.org/r/20220110072945.347535-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
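
The depth computation described in the commit above can be sketched as a standalone function. The struct below is a hypothetical reduction holding only the fields the calculation needs, and the function name is chosen for this sketch; it follows the rule stated in the message (every word holds 1U << shift bits except possibly the last) rather than quoting lib/sbitmap.

```c
#include <assert.h>

/* Hypothetical reduction of struct sbitmap to the fields used here. */
struct sbitmap_sketch {
	unsigned int depth;   /* total number of bits in the bitmap */
	unsigned int shift;   /* log2 of the number of bits per word */
	unsigned int map_nr;  /* number of words */
};

/* Number of usable bits in word 'index', derived instead of stored. */
unsigned int map_depth_sketch(const struct sbitmap_sketch *sb,
			      unsigned int index)
{
	/* Only the last word may be partially used. */
	if (index == sb->map_nr - 1)
		return sb->depth - (index << sb->shift);
	return 1U << sb->shift;
}
```

For a bitmap of depth 100 with shift 6 and two words, the first word holds 64 bits and the last holds the remaining 36.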
2 years ago block: pass a block_device to bio_clone_fast
Christoph Hellwig [Wed, 2 Feb 2022 16:01:09 +0000 (17:01 +0100)]
block: pass a block_device to bio_clone_fast

Pass a block_device to bio_clone_fast and __bio_clone_fast and give
the functions more suitable names.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20220202160109.108149-14-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago block: initialize the target bio in __bio_clone_fast
Christoph Hellwig [Wed, 2 Feb 2022 16:01:08 +0000 (17:01 +0100)]
block: initialize the target bio in __bio_clone_fast

All callers of __bio_clone_fast initialize the bio first.  Move that
initialization into __bio_clone_fast instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20220202160109.108149-13-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agodm: use bio_clone_fast in alloc_io/alloc_tio
Christoph Hellwig [Wed, 2 Feb 2022 16:01:07 +0000 (17:01 +0100)]
dm: use bio_clone_fast in alloc_io/alloc_tio

Replace open coded bio_clone_fast implementations with the actual helper.
Note that the bio allocated as part of the dm_io structure in alloc_io
will only actually be used later in alloc_tio, making this earlier
cloning of the information safe.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20220202160109.108149-12-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoblock: clone crypto and integrity data in __bio_clone_fast
Christoph Hellwig [Wed, 2 Feb 2022 16:01:06 +0000 (17:01 +0100)]
block: clone crypto and integrity data in __bio_clone_fast

__bio_clone_fast should also clone integrity and crypto data, as a clone
without those is incomplete.  Right now the only caller that can actually
support crypto and integrity data (dm) does it manually for the one
callchain that supports these, but we had better do it properly in the core.

Note that all callers except for the above mentioned one also don't need
to handle failure at all, given that the integrity and crypto clones are
based on mempool allocations that won't fail for sleeping allocations.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20220202160109.108149-11-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agodm-cache: remove __remap_to_origin_clear_discard
Christoph Hellwig [Wed, 2 Feb 2022 16:01:05 +0000 (17:01 +0100)]
dm-cache: remove __remap_to_origin_clear_discard

Fold __remap_to_origin_clear_discard into the two callers to prepare
for bio cloning refactoring.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20220202160109.108149-10-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agodm: simplify the single bio fast path in __send_duplicate_bios
Christoph Hellwig [Wed, 2 Feb 2022 16:01:04 +0000 (17:01 +0100)]
dm: simplify the single bio fast path in __send_duplicate_bios

Most targets just need a single flush bio.  Open code that case in
__send_duplicate_bios without the need to add the bio to a list.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20220202160109.108149-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agodm: return the clone bio from alloc_tio
Christoph Hellwig [Wed, 2 Feb 2022 16:01:03 +0000 (17:01 +0100)]
dm: return the clone bio from alloc_tio

Return the clone bio embedded into the tio as that is what the callers
actually want.  Similar for the free side.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20220202160109.108149-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agodm: pass the bio instead of tio to __map_bio
Christoph Hellwig [Wed, 2 Feb 2022 16:01:02 +0000 (17:01 +0100)]
dm: pass the bio instead of tio to __map_bio

This simplifies the callers a bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20220202160109.108149-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agodm: move cloning the bio into alloc_tio
Christoph Hellwig [Wed, 2 Feb 2022 16:01:01 +0000 (17:01 +0100)]
dm: move cloning the bio into alloc_tio

Move the call to __bio_clone_fast and the assignment of ->len_ptr from
the callers into alloc_tio to prepare for changes to the bio clone API.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20220202160109.108149-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agodm: fold __send_duplicate_bios into __clone_and_map_simple_bio
Christoph Hellwig [Wed, 2 Feb 2022 16:01:00 +0000 (17:01 +0100)]
dm: fold __send_duplicate_bios into __clone_and_map_simple_bio

Fold __send_duplicate_bios into its only caller to prepare for
refactoring.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20220202160109.108149-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agodm: fold clone_bio into __clone_and_map_data_bio
Christoph Hellwig [Wed, 2 Feb 2022 16:00:59 +0000 (17:00 +0100)]
dm: fold clone_bio into __clone_and_map_data_bio

Fold clone_bio into its only caller to prepare for refactoring.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20220202160109.108149-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agodm: add a clone_to_tio helper
Christoph Hellwig [Wed, 2 Feb 2022 16:00:58 +0000 (17:00 +0100)]
dm: add a clone_to_tio helper

Add a helper to stop open coding the container_of operations to get
from the clone bio to the tio structure.
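
A minimal userspace sketch of the idea, assuming toy structures: the
clone is embedded in the tio, so the tio can be recovered from a pointer
to the clone. All names below (toy_bio, toy_tio, toy_container_of,
toy_clone_to_tio) are hypothetical and only mirror the shape of such a
helper:

```c
#include <assert.h>
#include <stddef.h>

/* Userspace equivalent of container_of(), without the type checking. */
#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct toy_bio {
	unsigned int opf;
};

/* The clone bio is embedded in the per-target I/O structure. */
struct toy_tio {
	unsigned int target_bio_nr;
	struct toy_bio clone;
};

/* One named helper instead of repeating the pointer arithmetic. */
static struct toy_tio *toy_clone_to_tio(struct toy_bio *clone)
{
	assert(clone != NULL);
	return toy_container_of(clone, struct toy_tio, clone);
}
```

Callers then write toy_clone_to_tio(bio)->target_bio_nr rather than
spelling out the container_of expression at every site.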

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20220202160109.108149-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agodrbd: set ->bi_bdev in drbd_req_new
Christoph Hellwig [Wed, 2 Feb 2022 16:00:57 +0000 (17:00 +0100)]
drbd: set ->bi_bdev in drbd_req_new

Make sure the newly allocated bio has the correct bi_bdev set from the
start.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20220202160109.108149-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoblock: call bio_associate_blkg from bio_reset
Christoph Hellwig [Fri, 4 Feb 2022 07:19:34 +0000 (08:19 +0100)]
block: call bio_associate_blkg from bio_reset

Call bio_associate_blkg just like bio_set_dev did in the callers before
the conversion to set the block device in bio_reset.

Fixes: a7c50c940477 ("block: pass a block_device and opf to bio_reset")
Reported-by: syzbot+2b3f18414c37b42dcc94@syzkaller.appspotmail.com
Tested-by: syzbot+2b3f18414c37b42dcc94@syzkaller.appspotmail.com
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220204071934.168469-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoscsi: use BLK_STS_OFFLINE for not fully online devices
Song Liu [Thu, 3 Feb 2022 19:28:27 +0000 (11:28 -0800)]
scsi: use BLK_STS_OFFLINE for not fully online devices

The new error message for such cases looks like

[  172.809565] device offline error, dev sda, sector 3138208 ...

which will not be confused with a regular I/O error (BLK_STS_IOERR).

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Song Liu <song@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20220203192827.1370270-4-song@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoblock: return -ENODEV for BLK_STS_OFFLINE
Song Liu [Thu, 3 Feb 2022 19:28:26 +0000 (11:28 -0800)]
block: return -ENODEV for BLK_STS_OFFLINE

Change the user-visible return value for BLK_STS_OFFLINE to -ENODEV, which
is more descriptive than the existing -EIO.
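
The effect can be pictured as one row in a status-to-errno table. The
sketch below is a userspace approximation under assumed names: the
toy_* identifiers and the numeric status values are hypothetical, and
only the -ENODEV mapping and the "device offline" wording are taken from
this series:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical status codes; the real blk_status_t values differ. */
enum toy_status {
	TOY_STS_OK,
	TOY_STS_IOERR,
	TOY_STS_OFFLINE,
	TOY_STS_MAX
};

struct toy_err {
	int err;           /* errno handed back to the caller */
	const char *name;  /* prefix used in the "... error" log line */
};

static const struct toy_err toy_errors[TOY_STS_MAX] = {
	[TOY_STS_OK]      = { 0,       "" },
	[TOY_STS_IOERR]   = { -EIO,    "I/O" },
	[TOY_STS_OFFLINE] = { -ENODEV, "device offline" },
};

static int toy_status_to_errno(enum toy_status sts)
{
	assert((unsigned int)sts < TOY_STS_MAX);
	return toy_errors[sts].err;
}

static const char *toy_status_name(enum toy_status sts)
{
	assert((unsigned int)sts < TOY_STS_MAX);
	return toy_errors[sts].name;
}
```

A caller that previously could not tell an offline device from a media
error can now distinguish them by comparing the result with -ENODEV.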

Signed-off-by: Song Liu <song@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20220203192827.1370270-3-song@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoblock: introduce BLK_STS_OFFLINE
Song Liu [Thu, 3 Feb 2022 19:28:25 +0000 (11:28 -0800)]
block: introduce BLK_STS_OFFLINE

Currently, drivers report BLK_STS_IOERR for devices that are not fully
online or are being removed. This behavior could confuse users, as these
are not really I/O errors from the device.

Solve this issue with a new state BLK_STS_OFFLINE, which reports "device
offline error" in dmesg instead of "I/O error".

EIO is intentionally kept so that the user-visible return value does not change.

Signed-off-by: Song Liu <song@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20220203192827.1370270-2-song@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agofs/ntfs3: remove unnecessary NULL check
Dan Carpenter [Fri, 28 Jan 2022 14:09:22 +0000 (17:09 +0300)]
fs/ntfs3: remove unnecessary NULL check

This code triggers a Smatch warning:

    fs/ntfs3/fsntfs.c:1606 ntfs_bio_fill_1()
    warn: variable dereferenced before check 'bio' (see line 1591)

The "bio" pointer cannot be NULL so there is no need to check.
Originally there was more extensive NULL checking but it was removed
because bio_alloc() will never fail if it is allowed to sleep.

Remove this check as well.

Fixes: 39146b6f66ba ("ntfs3: remove ntfs_alloc_bio")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220128140922.GA29766@kili
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoblock: fix boolreturn.cocci warning
Jiapeng Chong [Fri, 28 Jan 2022 04:34:54 +0000 (12:34 +0800)]
block: fix boolreturn.cocci warning

Return statements in functions returning bool should use true/false
instead of 1/0.

./block/bio.c:1081:9-10: WARNING: return of 0/1 in function
'bio_add_folio' with return type bool.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220128043454.68927-1-jiapeng.chong@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoMAINTAINERS: add bio.h to the block section
Christoph Hellwig [Thu, 27 Jan 2022 06:42:21 +0000 (07:42 +0100)]
MAINTAINERS: add bio.h to the block section

bio.h is part of the block layer, so list it in the MAINTAINERS file
as such.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220127064221.1314477-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoblock: fix the kerneldoc for bio_end_io_acct
Christoph Hellwig [Thu, 27 Jan 2022 06:41:25 +0000 (07:41 +0100)]
block: fix the kerneldoc for bio_end_io_acct

Document the actually existing parameter name.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220127064125.1314347-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoblock: check that there is a plug in blk_flush_plug
Christoph Hellwig [Thu, 27 Jan 2022 07:05:49 +0000 (08:05 +0100)]
block: check that there is a plug in blk_flush_plug

Rename blk_flush_plug to __blk_flush_plug and add a wrapper that includes
the NULL check instead of open coding that check everywhere.
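
The pattern can be sketched in userspace C as below. This is an
illustration under assumptions, not the block layer code: struct
toy_plug and both function names are hypothetical, and the inner
function is called toy_do_flush_plug because a leading double
underscore is reserved outside the kernel:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy plug: counts queued and flushed requests. */
struct toy_plug {
	int pending;
	int flushed;
};

/* Inner worker: requires a valid plug, like the renamed function. */
static void toy_do_flush_plug(struct toy_plug *plug, bool from_schedule)
{
	assert(plug != NULL);
	(void)from_schedule;  /* unused in this sketch */
	plug->flushed += plug->pending;
	plug->pending = 0;
}

/* Wrapper: carries the NULL check so that callers need not repeat it. */
static inline void toy_flush_plug(struct toy_plug *plug, bool from_schedule)
{
	if (plug)
		toy_do_flush_plug(plug, from_schedule);
}
```

Call sites reduce to toy_flush_plug(plug, false) whether or not a plug
is currently active.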

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220127070549.1377856-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoblock: remove blk_needs_flush_plug
Christoph Hellwig [Thu, 27 Jan 2022 07:05:48 +0000 (08:05 +0100)]
block: remove blk_needs_flush_plug

blk_needs_flush_plug fails to account for the cb_list, which needs
flushing as well.  Remove it and just check if there is a plug instead
of poking into the internals of the plug structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220127070549.1377856-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoblock: pass a block_device and opf to bio_reset
Christoph Hellwig [Mon, 24 Jan 2022 09:11:07 +0000 (10:11 +0100)]
block: pass a block_device and opf to bio_reset

Pass the block_device that we plan to use this bio for and the
operation to bio_reset to optimize the assignment.  A NULL block_device
can be passed, both for the passthrough case on a raw request_queue and
to temporarily avoid refactoring some nasty code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220124091107.642561-20-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoblock: pass a block_device and opf to bio_init
Christoph Hellwig [Mon, 24 Jan 2022 09:11:06 +0000 (10:11 +0100)]
block: pass a block_device and opf to bio_init

Pass the block_device that we plan to use this bio for and the
operation to bio_init to optimize the assignment.  A NULL block_device
can be passed, both for the passthrough case on a raw request_queue and
to temporarily avoid refactoring some nasty code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220124091107.642561-19-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoblock: pass a block_device and opf to bio_alloc
Christoph Hellwig [Mon, 24 Jan 2022 09:11:05 +0000 (10:11 +0100)]
block: pass a block_device and opf to bio_alloc

Pass the block_device and operation that we plan to use this bio for to
bio_alloc to optimize the assignment.  NULL/0 can be passed, both for the
passthrough case on a raw request_queue and to temporarily avoid
refactoring some nasty code.

Also move the gfp_mask argument after the nr_vecs argument for a much
more logical calling convention matching what most of the kernel does.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220124091107.642561-18-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoblock: pass a block_device and opf to bio_alloc_kiocb
Christoph Hellwig [Mon, 24 Jan 2022 09:11:04 +0000 (10:11 +0100)]
block: pass a block_device and opf to bio_alloc_kiocb

Pass the block_device and operation that we plan to use this bio for to
bio_alloc_kiocb to optimize the assignment.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220124091107.642561-17-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoblock: pass a block_device and opf to bio_alloc_bioset
Christoph Hellwig [Mon, 24 Jan 2022 09:11:03 +0000 (10:11 +0100)]
block: pass a block_device and opf to bio_alloc_bioset

Pass the block_device and operation that we plan to use this bio for to
bio_alloc_bioset to optimize the assignment.  NULL/0 can be passed, both
for the passthrough case on a raw request_queue and to temporarily avoid
refactoring some nasty code.

Also move the gfp_mask argument after the nr_vecs argument for a much
more logical calling convention matching what most of the kernel does.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220124091107.642561-16-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoblock: pass a block_device and opf to blk_next_bio
Chaitanya Kulkarni [Mon, 24 Jan 2022 09:11:02 +0000 (10:11 +0100)]
block: pass a block_device and opf to blk_next_bio

All callers need to set the block_device and operation, so lift that into
the common code.

Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220124091107.642561-15-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoblock: move blk_next_bio to bio.c
Christoph Hellwig [Mon, 24 Jan 2022 09:11:01 +0000 (10:11 +0100)]
block: move blk_next_bio to bio.c

Keep blk_next_bio next to the core bio infrastructure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220124091107.642561-14-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoxen-blkback: bio_alloc can't fail if it is allowed to sleep
Christoph Hellwig [Mon, 24 Jan 2022 09:11:00 +0000 (10:11 +0100)]
xen-blkback: bio_alloc can't fail if it is allowed to sleep

Remove handling of NULL returns from sleeping bio_alloc calls given that
those can't fail.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220124091107.642561-13-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agornbd-srv: remove struct rnbd_dev_blk_io
Christoph Hellwig [Mon, 24 Jan 2022 09:10:59 +0000 (10:10 +0100)]
rnbd-srv: remove struct rnbd_dev_blk_io

Only the priv field of rnbd_dev_blk_io is used, so store the value of
that in bio->bi_private directly and remove the entire bio_set overhead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jack Wang <jinpu.wang@ionos.com>
Link: https://lore.kernel.org/r/20220124091107.642561-12-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agornbd-srv: simplify bio mapping in process_rdma
Christoph Hellwig [Mon, 24 Jan 2022 09:10:58 +0000 (10:10 +0100)]
rnbd-srv: simplify bio mapping in process_rdma

The memory mapped in process_rdma is contiguous, so there is no need
to loop over bio_add_page.  Remove rnbd_bio_map_kern and just open code
the bio allocation and mapping in the caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jack Wang <jinpu.wang@ionons.com>
Tested-by: Jack Wang <jinpu.wang@ionos.com>
Link: https://lore.kernel.org/r/20220124091107.642561-11-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agodrbd: bio_alloc can't fail if it is allowed to sleep
Christoph Hellwig [Mon, 24 Jan 2022 09:10:57 +0000 (10:10 +0100)]
drbd: bio_alloc can't fail if it is allowed to sleep

Remove handling of NULL returns from sleeping bio_alloc calls given that
those can't fail.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220124091107.642561-10-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agodm-thin: use blkdev_issue_flush instead of open coding it
Christoph Hellwig [Mon, 24 Jan 2022 09:10:56 +0000 (10:10 +0100)]
dm-thin: use blkdev_issue_flush instead of open coding it

Use blkdev_issue_flush, which uses an on-stack bio instead of an
open coded version with a bio embedded into struct pool.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220124091107.642561-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agodm-snap: use blkdev_issue_flush instead of open coding it
Christoph Hellwig [Mon, 24 Jan 2022 09:10:55 +0000 (10:10 +0100)]
dm-snap: use blkdev_issue_flush instead of open coding it

Use blkdev_issue_flush, which uses an on-stack bio instead of an
open coded version with a bio embedded into struct dm_snapshot.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220124091107.642561-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agodm-crypt: remove clone_init
Christoph Hellwig [Mon, 24 Jan 2022 09:10:54 +0000 (10:10 +0100)]
dm-crypt: remove clone_init

Just open code it next to the bio allocations, which saves a few lines
of code, prepares for future changes and allows removing the duplicate
bi_opf assignment for the bio_clone_fast case in kcryptd_io_read.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220124091107.642561-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agodm: bio_alloc can't fail if it is allowed to sleep
Christoph Hellwig [Mon, 24 Jan 2022 09:10:53 +0000 (10:10 +0100)]
dm: bio_alloc can't fail if it is allowed to sleep

Remove handling of NULL returns from sleeping bio_alloc calls given that
those can't fail.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220124091107.642561-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agontfs3: remove ntfs_alloc_bio
Christoph Hellwig [Mon, 24 Jan 2022 09:10:52 +0000 (10:10 +0100)]
ntfs3: remove ntfs_alloc_bio

bio_alloc will never fail if it is allowed to sleep, so there is no
need for this loop.  Also remove the __GFP_HIGH specifier as it doesn't
make sense here given that we'll always fall back to the mempool anyway.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220124091107.642561-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agonfs/blocklayout: remove bl_alloc_init_bio
Christoph Hellwig [Mon, 24 Jan 2022 09:10:51 +0000 (10:10 +0100)]
nfs/blocklayout: remove bl_alloc_init_bio

bio_alloc will never fail when it can sleep.  Remove the now simple
bl_alloc_init_bio helper and open code it in the only caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220124091107.642561-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agonilfs2: remove nilfs_alloc_seg_bio
Christoph Hellwig [Mon, 24 Jan 2022 09:10:50 +0000 (10:10 +0100)]
nilfs2: remove nilfs_alloc_seg_bio

bio_alloc will never fail when it can sleep.  Remove the now simple
nilfs_alloc_seg_bio helper and open code it in the only caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220124091107.642561-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agofs: remove mpage_alloc
Christoph Hellwig [Mon, 24 Jan 2022 09:10:49 +0000 (10:10 +0100)]
fs: remove mpage_alloc

Open code mpage_alloc in its two callers and simplify the results
because of the context:

 - __mpage_writepage always passes GFP_NOFS and can thus always sleep and
   will never get a NULL return from bio_alloc at all.
 - do_mpage_readpage can only get a non-sleeping context for readahead
   which never sets PF_MEMALLOC and thus doesn't need the retry loop
   either.

Both cases will never have __GFP_HIGH set.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220124091107.642561-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoblock: remove genhd.h
Christoph Hellwig [Mon, 24 Jan 2022 09:39:13 +0000 (10:39 +0100)]
block: remove genhd.h

There is no good reason to keep genhd.h separate from the main blkdev.h
header that includes it.  So fold the contents of genhd.h into blkdev.h
and remove genhd.h entirely.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20220124093913.742411-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoblock: move blk_drop_partitions to blk.h
Christoph Hellwig [Mon, 24 Jan 2022 09:39:12 +0000 (10:39 +0100)]
block: move blk_drop_partitions to blk.h

No need to have this declaration in a public header.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20220124093913.742411-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoblock: move disk_{block,unblock,flush}_events to blk.h
Christoph Hellwig [Mon, 24 Jan 2022 09:39:11 +0000 (10:39 +0100)]
block: move disk_{block,unblock,flush}_events to blk.h

No need to have these declarations in a public header.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20220124093913.742411-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoblock: deprecate autoloading based on dev_t
Christoph Hellwig [Tue, 4 Jan 2022 07:16:47 +0000 (08:16 +0100)]
block: deprecate autoloading based on dev_t

Make the legacy dev_t based autoloading optional and add a deprecation
warning.  This kind of autoloading ceased to be useful about 20 years
ago.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220104071647.164918-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoLinux 5.17-rc2
Linus Torvalds [Sun, 30 Jan 2022 13:37:07 +0000 (15:37 +0200)]
Linux 5.17-rc2

2 years agoMerge tag 'irq_urgent_for_v5.17_rc2_p2' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 30 Jan 2022 13:12:02 +0000 (15:12 +0200)]
Merge tag 'irq_urgent_for_v5.17_rc2_p2' of git://git./linux/kernel/git/tip/tip

Pull irq fixes from Borislav Petkov:

 - Drop an unused private data field in the AIC driver

 - Various fixes to the realtek-rtl driver

 - Make the GICv3 ITS driver compile again in !SMP configurations

 - Force reset of the GICv3 ITSs at probe time to avoid issues during kexec

 - Yet another kfree/bitmap_free conversion

 - Various DT updates (Renesas, SiFive)

* tag 'irq_urgent_for_v5.17_rc2_p2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  dt-bindings: interrupt-controller: sifive,plic: Group interrupt tuples
  dt-bindings: interrupt-controller: sifive,plic: Fix number of interrupts
  dt-bindings: irqchip: renesas-irqc: Add R-Car V3U support
  irqchip/gic-v3-its: Reset each ITS's BASERn register before probe
  irqchip/gic-v3-its: Fix build for !SMP
  irqchip/loongson-pch-ms: Use bitmap_free() to free bitmap
  irqchip/realtek-rtl: Service all pending interrupts
  irqchip/realtek-rtl: Fix off-by-one in routing
  irqchip/realtek-rtl: Map control data to virq
  irqchip/apple-aic: Drop unused ipi_hwirq field

2 years agoMerge tag 'perf_urgent_for_v5.17_rc2_p2' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 30 Jan 2022 13:02:32 +0000 (15:02 +0200)]
Merge tag 'perf_urgent_for_v5.17_rc2_p2' of git://git./linux/kernel/git/tip/tip

Pull perf fixes from Borislav Petkov:

 - Prevent accesses to the per-CPU cgroup context list from any CPU
   other than the one it belongs to, to avoid list corruption

 - Make sure parent events are always woken up to avoid indefinite hangs
   in the traced workload

* tag 'perf_urgent_for_v5.17_rc2_p2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: Fix cgroup event list management
  perf: Always wake the parent event

2 years agoMerge tag 'sched_urgent_for_v5.17_rc2_p2' of git://git.kernel.org/pub/scm/linux/kerne...
Linus Torvalds [Sun, 30 Jan 2022 11:09:00 +0000 (13:09 +0200)]
Merge tag 'sched_urgent_for_v5.17_rc2_p2' of git://git./linux/kernel/git/tip/tip

Pull scheduler fix from Borislav Petkov:
 "Make sure the membarrier-rseq fence commands are part of the reported
  set when querying membarrier(2) commands through MEMBARRIER_CMD_QUERY"

* tag 'sched_urgent_for_v5.17_rc2_p2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/membarrier: Fix membarrier-rseq fence command missing from query bitmask

2 years agoMerge tag 'x86_urgent_for_v5.17_rc2' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 30 Jan 2022 10:55:06 +0000 (12:55 +0200)]
Merge tag 'x86_urgent_for_v5.17_rc2' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

 - Add another Intel CPU model to the list of CPUs supporting the
   processor inventory unique number

 - Allow writing to MCE thresholding sysfs files again - a previous
   change had accidentally disabled it and no one noticed. Goes to show
   how much this stuff is used

* tag 'x86_urgent_for_v5.17_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Add Xeon Icelake-D to list of CPUs that support PPIN
  x86/MCE/AMD: Allow thresholding interface updates after init

2 years agoMerge branch 'akpm' (patches from Andrew)
Linus Torvalds [Sun, 30 Jan 2022 09:21:50 +0000 (11:21 +0200)]
Merge branch 'akpm' (patches from Andrew)

Merge misc fixes from Andrew Morton:
 "12 patches.

  Subsystems affected by this patch series: sysctl, binfmt, ia64, mm
  (memory-failure, folios, kasan, and psi), selftests, and ocfs2"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  ocfs2: fix a deadlock when commit trans
  jbd2: export jbd2_journal_[grab|put]_journal_head
  psi: fix "defined but not used" warnings when CONFIG_PROC_FS=n
  psi: fix "no previous prototype" warnings when CONFIG_CGROUPS=n
  mm, kasan: use compare-exchange operation to set KASAN page tag
  kasan: test: fix compatibility with FORTIFY_SOURCE
  tools/testing/scatterlist: add missing defines
  mm: page->mapping folio->mapping should have the same offset
  memory-failure: fetch compound_head after pgmap_pfn_valid()
  ia64: make IA64_MCA_RECOVERY bool instead of tristate
  binfmt_misc: fix crash when load/unload module
  include/linux/sysctl.h: fix register_sysctl_mount_point() return type