linux-2.6-microblaze.git
3 years agoio_uring: disable multishot poll for double poll add cases
Jens Axboe [Thu, 15 Apr 2021 15:47:13 +0000 (09:47 -0600)]
io_uring: disable multishot poll for double poll add cases

The re-add handling isn't correct for the multi wait case, so let's
just disable it for now explicitly until we can get that sorted out. This
just turns it into a one-shot request. Since we pass back whether or not
a poll request terminates in multishot mode on completion, this should
not break properly behaving applications that check for IORING_CQE_F_MORE
on completion.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
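
As an illustration of the userspace side of this change, a minimal sketch of
the check a well-behaved application would make on each poll completion. The
struct and helper names are hypothetical stand-ins and not the real uapi
types; only the flag bit is meant to mirror IORING_CQE_F_MORE.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for IORING_CQE_F_MORE from <linux/io_uring.h>. */
#define CQE_F_MORE (1U << 1)

/* Hypothetical, simplified completion entry (not the uapi struct). */
struct cqe_sketch {
	uint64_t user_data;
	int32_t  res;
	uint32_t flags;
};

/*
 * Returns true when the poll request has terminated, i.e. the kernel did
 * not set the "more completions will follow" flag, so the application has
 * to re-arm the poll itself if it still wants events.
 */
static bool poll_needs_rearm(const struct cqe_sketch *cqe)
{
	return !(cqe->flags & CQE_F_MORE);
}
```

With a check like this, a multishot request that silently becomes one-shot is
simply re-armed by the application after its first completion.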
3 years agoio_uring: move poll update into remove not add
Pavel Begunkov [Wed, 14 Apr 2021 12:38:37 +0000 (13:38 +0100)]
io_uring: move poll update into remove not add

Having the poll update function as a part of IORING_OP_POLL_ADD is not
great: we have to hack around struct layouts and add some overhead to
the more popular POLL_ADD path. An even more serious drawback is that
POLL_ADD requires a file and always grabs it, and so does poll update,
which doesn't need it.

Incorporate poll update into IORING_OP_POLL_REMOVE instead of
IORING_OP_POLL_ADD. It is also more consistent with timeout remove/update.

Fixes: b69de288e913 ("io_uring: allow events and user_data update of running poll requests")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: add helper for parsing poll events
Pavel Begunkov [Wed, 14 Apr 2021 12:38:36 +0000 (13:38 +0100)]
io_uring: add helper for parsing poll events

Isolate poll mask SQE parsing and preparations into a new function,
which will be reused shortly.

Fixes: b69de288e913 ("io_uring: allow events and user_data update of running poll requests")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: fix POLL_REMOVE removing apoll
Pavel Begunkov [Wed, 14 Apr 2021 12:38:35 +0000 (13:38 +0100)]
io_uring: fix POLL_REMOVE removing apoll

Don't allow REQ_OP_POLL_REMOVE to kill apoll requests, users should not
know about them. Also, remove the weird -EACCES in io_poll_update(): it
shouldn't know anything about apoll, and has to work even if we happen
to have a poll and an async poll'ed request with the same user_data.

Fixes: b69de288e913 ("io_uring: allow events and user_data update of running poll requests")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: refactor io_ring_exit_work()
Pavel Begunkov [Wed, 14 Apr 2021 12:38:34 +0000 (13:38 +0100)]
io_uring: refactor io_ring_exit_work()

Don't reinit io_ring_exit_work()'s exit work/completions on each
iteration, that's wasteful. Also add list_rotate_left(), so if we failed
to complete the task job, we don't try it again and again but defer it
until others are processed.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
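
To illustrate the deferral idea, here is a small userspace sketch of what
rotating a list to the left does: the first entry moves to the tail, so a job
that could not be completed is retried only after the others. The list type
below is a simplified imitation of the kernel's struct list_head, written for
this example only.

```c
#include <stddef.h>

/* Simplified circular doubly linked list, modelled on the kernel's. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
	h->next = h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_del(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

/* Move the first entry to the tail; a no-op on an empty list. */
static void list_rotate_left(struct list_head *h)
{
	if (h->next != h) {
		struct list_head *first = h->next;

		list_del(first);
		list_add_tail(first, h);
	}
}
```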
3 years agoio_uring: inline io_iopoll_getevents()
Pavel Begunkov [Tue, 13 Apr 2021 01:58:46 +0000 (02:58 +0100)]
io_uring: inline io_iopoll_getevents()

io_iopoll_getevents() is of no use to us anymore, io_iopoll_check()
handles all the cases.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7e50b8917390f38bee4f822c6f4a6a98a27be037.1618278933.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: skip futile iopoll iterations
Pavel Begunkov [Tue, 13 Apr 2021 01:58:45 +0000 (02:58 +0100)]
io_uring: skip futile iopoll iterations

The only way to get out of io_iopoll_getevents() and continue iterating
is to have an empty iopoll_list, otherwise the main loop would just exit.
So, instead of the unlock-on-8th-iteration heuristic, do that based on
iopoll_list.

Also, as no one can add new requests to iopoll_list while
io_iopoll_check() holds uring_lock, it's useless to spin with the list
empty, so return in that case.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/5b8ebe84f5fff7ffa1f708952dfef7fc78b668e2.1618278933.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: don't fail overflow on in_idle
Pavel Begunkov [Tue, 13 Apr 2021 01:58:44 +0000 (02:58 +0100)]
io_uring: don't fail overflow on in_idle

As CQE overflows are now untied from requests and so don't hold any
ref, we don't need to handle exiting/exec'ing cases there anymore.
Moreover, it's much nicer in regards to userspace to save overflowed
CQEs whenever possible, so remove failing on in_idle.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d873b7dab75c7f3039ead9628a745bea01f2cfd2.1618278933.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: clean up io_poll_remove_waitqs()
Pavel Begunkov [Tue, 13 Apr 2021 01:58:43 +0000 (02:58 +0100)]
io_uring: clean up io_poll_remove_waitqs()

Move some parts of io_poll_remove_waitqs() that are opcode independent.
Looks better and stresses that both do __io_poll_remove_one().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/bbc717f82117cc335c89cbe67ec8d72608178732.1618278933.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: refactor hrtimer_try_to_cancel uses
Pavel Begunkov [Tue, 13 Apr 2021 01:58:42 +0000 (02:58 +0100)]
io_uring: refactor hrtimer_try_to_cancel uses

Don't save return values of hrtimer_try_to_cancel() in a variable, but
use them right away. It's in general safer to not have an intermediate
variable, which may be reused and passed out wrongly, but it be
contracted out. Also clean io_timeout_extract().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d2566ef7ce632e6882dc13e022a26249b3fd30b5.1618278933.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: add timeout completion_lock annotation
Pavel Begunkov [Tue, 13 Apr 2021 01:58:41 +0000 (02:58 +0100)]
io_uring: add timeout completion_lock annotation

Add one more sparse locking annotation for readability in
io_kill_timeout().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/bdbb22026024eac29203c1aa0045c4954a2488d1.1618278933.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: split poll and poll update structures
Pavel Begunkov [Tue, 13 Apr 2021 01:58:40 +0000 (02:58 +0100)]
io_uring: split poll and poll update structures

struct io_poll_iocb became pretty nasty once it also carried the update
fields. Split them for clarity.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b2f74d64ffebb57a648f791681af086c7211e3a4.1618278933.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: fix uninit old data for poll event upd
Pavel Begunkov [Tue, 13 Apr 2021 01:58:39 +0000 (02:58 +0100)]
io_uring: fix uninit old data for poll event upd

Both IORING_POLL_UPDATE_EVENTS and IORING_POLL_UPDATE_USER_DATA need
old_user_data to find/cancel a poll request, but it's set only for the
first one.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ab08fd35b7652e977f9a475f01741b04102297f1.1618278933.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: fix leaking reg files on exit
Pavel Begunkov [Tue, 13 Apr 2021 01:58:38 +0000 (02:58 +0100)]
io_uring: fix leaking reg files on exit

If io_sqe_files_unregister() faults on io_rsrc_ref_quiesce(), it will
fail to do unregister leaving files referenced. And that may well happen
because of a stray signal or just because it does allocations inside.

In io_ring_ctx_free() do an unsafe version of unregister, as it's
guaranteed to not have requests by that point and so quiesce is useless.

Cc: stable@vger.kernel.org
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e696e9eade571b51997d0dc1d01f144c6d685c05.1618278933.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: return back safer resurrect
Pavel Begunkov [Sun, 11 Apr 2021 00:46:40 +0000 (01:46 +0100)]
io_uring: return back safer resurrect

Revert of revert of "io_uring: wait potential ->release() on resurrect",
which adds a helper for resurrect not racing completion reinit. It was
removed because of a strange bug with no clear root cause or link to the
patch.

It was also improved: instead of synchronize_rcu(), just
wait_for_completion(), because we're at 0 refs and it will happen very
shortly. Specifically, use the non-interruptible version to ignore all
pending signals that may have ended a prior interruptible wait.

This reverts commit cb5e1b81304e089ee3ca948db4d29f71902eb575.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7a080c20f686d026efade810b116b72f88abaff9.1618101759.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: improve hardlink code generation
Pavel Begunkov [Sun, 11 Apr 2021 00:46:39 +0000 (01:46 +0100)]
io_uring: improve hardlink code generation

req_set_fail_links() condition checking is bulky. Even though it's
always in a slow path, it's inlined and generates lots of extra code,
simplify it by moving HARDLINK checking into helpers killing linked
requests.

          text    data     bss     dec     hex filename
before:  79318   12330       8   91656   16608 ./fs/io_uring.o
after:   79126   12330       8   91464   16548 ./fs/io_uring.o

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/96a9387db658a9d5a44ecbfd57c2a62cb888c9b6.1618101759.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: improve sqo stop
Pavel Begunkov [Sun, 11 Apr 2021 00:46:38 +0000 (01:46 +0100)]
io_uring: improve sqo stop

Set IO_SQ_THREAD_SHOULD_STOP before taking sqd lock, so the sqpoll task
sees it earlier. Not a problem, it will stop eventually. Also check
invariant that it's stopped only once.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/653b24ee93843a50ff65a45847d9138f5adb76d7.1618101759.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: split file table from rsrc nodes
Pavel Begunkov [Sun, 11 Apr 2021 00:46:37 +0000 (01:46 +0100)]
io_uring: split file table from rsrc nodes

We don't need to store file tables in rsrc nodes, for now it's easier to
handle tables not generically, so move file tables into the context. A
nice side effect is having one less pointer dereference for request with
fixed file initialisation.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/de9fc4cd3545f24c26c03be4556f58ba3d18b9c3.1618101759.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: cleanup buffer register
Pavel Begunkov [Sun, 11 Apr 2021 00:46:36 +0000 (01:46 +0100)]
io_uring: cleanup buffer register

In preparation for more changes do a little cleanup of
io_sqe_buffers_register(). Move all args/invariant checking into it from
io_buffers_map_alloc(), because it's confusing. And add a bit more
cleaning for the loop.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/93292cb9708c8455e5070cc855861d94e11ca042.1618101759.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: add buffer unmap helper
Pavel Begunkov [Sun, 11 Apr 2021 00:46:35 +0000 (01:46 +0100)]
io_uring: add buffer unmap helper

Add a helper for unmapping registered buffers, better than double
indexing and will be reused in the future.

Suggested-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/66cbc6ea863be865bac7b7080ed6a3d5c542b71f.1618101759.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: simplify io_rsrc_data refcounting
Pavel Begunkov [Sun, 11 Apr 2021 00:46:34 +0000 (01:46 +0100)]
io_uring: simplify io_rsrc_data refcounting

We don't take many references of struct io_rsrc_data, only one per each
io_rsrc_node, so using percpu refs is overkill. Use atomic ref instead,
which is much simpler.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1551d90f7c9b183cf2f0d7b5e5b923430acb03fa.1618101759.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
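
A rough userspace sketch of the simpler scheme, assuming a plain atomic
counter is enough when references are taken rarely. The struct and function
names are hypothetical, and C11 atomics stand in for the kernel's own
refcount primitives.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted resource-data object. */
struct rsrc_data_sketch {
	atomic_int refs;
	bool released;	/* set once the last reference is dropped */
};

static void rsrc_data_init(struct rsrc_data_sketch *d)
{
	atomic_init(&d->refs, 1);	/* the creator holds the first ref */
	d->released = false;
}

/* Taken once per node in this model, so contention is not a concern. */
static void rsrc_data_get(struct rsrc_data_sketch *d)
{
	atomic_fetch_add(&d->refs, 1);
}

static void rsrc_data_put(struct rsrc_data_sketch *d)
{
	/* atomic_fetch_sub() returns the value before the decrement. */
	if (atomic_fetch_sub(&d->refs, 1) == 1)
		d->released = true;
}
```

A per-CPU counter pays off only when gets and puts are frequent and spread
across CPUs; with one reference per node, the single shared counter is cheaper
to reason about.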
3 years agoio_uring: provide io_resubmit_prep() stub for !CONFIG_BLOCK
Jens Axboe [Mon, 12 Apr 2021 12:40:02 +0000 (06:40 -0600)]
io_uring: provide io_resubmit_prep() stub for !CONFIG_BLOCK

Randy reports the following error on CONFIG_BLOCK not being set:

../fs/io_uring.c: In function ‘kiocb_done’:
../fs/io_uring.c:2766:7: error: implicit declaration of function ‘io_resubmit_prep’; did you mean ‘io_put_req’? [-Werror=implicit-function-declaration]
   if (io_resubmit_prep(req)) {

Provide a dummy stub for io_resubmit_prep() like we do for
io_rw_should_reissue(), which also helps remove an ifdef sequence from
io_complete_rw_iopoll() as well.

Fixes: 8c130827f417 ("io_uring: don't alter iopoll reissue fail ret code")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: optimise fill_event() by inlining
Pavel Begunkov [Sun, 11 Apr 2021 00:46:33 +0000 (01:46 +0100)]
io_uring: optimise fill_event() by inlining

There are three cases where we care much about the performance of
io_cqring_fill_event() -- flushing inline completions, iopoll and
io_req_complete_post(). Inline a hot part of fill_event() into them.

All others are not as important and we don't want to bloat binary for
them, so add a noinline version of the function for all other use
cases.

nops test(batch=32): 16.932 vs 17.822 KIOPS

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/a11d59424bf4417aca33f5ec21008bb3b0ebd11e.1618101759.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
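
The hot/cold split described above can be sketched in userspace roughly as
follows. This is only an illustration of the pattern, using GCC/Clang
attributes; the ring layout, its size and all names are invented for the
example and do not reflect the real completion queue code.

```c
#include <stdbool.h>
#include <stdint.h>

#define CQ_ENTRIES 4u	/* hypothetical ring size, a power of two */

struct cq_sketch {
	uint64_t ring[CQ_ENTRIES];
	unsigned head, tail;
	unsigned overflow;
};

/* Cold path, kept out of line so it does not bloat the callers. */
static __attribute__((noinline)) bool fill_event_overflow(struct cq_sketch *cq)
{
	cq->overflow++;
	return false;
}

/* Hot part: meant to be inlined into the few performance-critical callers. */
static inline bool fill_event_fast(struct cq_sketch *cq, uint64_t user_data)
{
	if (__builtin_expect(cq->tail - cq->head < CQ_ENTRIES, 1)) {
		cq->ring[cq->tail++ & (CQ_ENTRIES - 1)] = user_data;
		return true;
	}
	return fill_event_overflow(cq);
}

/* Out-of-line wrapper shared by every other caller. */
static __attribute__((noinline)) bool fill_event(struct cq_sketch *cq,
						 uint64_t user_data)
{
	return fill_event_fast(cq, user_data);
}
```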
3 years agoio_uring: always pass cflags into fill_event()
Pavel Begunkov [Sun, 11 Apr 2021 00:46:32 +0000 (01:46 +0100)]
io_uring: always pass cflags into fill_event()

A simple preparation patch inlining io_cqring_fill_event(), whose only
role was to pass cflags=0 into an actual fill event. It helps to keep
the number of related helpers sane in the following patches.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/704f9c85b7d9843e4ad50a9f057200c58f5adc6e.1618101759.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: optimise non-eventfd post-event
Pavel Begunkov [Sun, 11 Apr 2021 00:46:31 +0000 (01:46 +0100)]
io_uring: optimise non-eventfd post-event

Eventfd is not the canonical way of using io_uring, annotate
io_should_trigger_evfd() with likely so it improves code generation for
non-eventfd branch.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/42fdaa51c68d39479f02cef4fe5bcb24624d60fa.1618101759.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
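
For readers unfamiliar with the annotation, a minimal sketch of a
branch-prediction hint on the "no eventfd registered" case. The likely()
macro below is the usual GCC/Clang definition; the context struct and the
function are simplified, hypothetical versions written for this example.

```c
#include <stdbool.h>
#include <stddef.h>

#define likely(x) __builtin_expect(!!(x), 1)

/* Hypothetical, trimmed-down ring context. */
struct ctx_sketch {
	void *cq_ev_fd;		/* NULL when no eventfd is registered */
	bool eventfd_async;	/* only signal from async context */
};

static bool should_trigger_evfd(const struct ctx_sketch *ctx, bool in_async)
{
	/* Common case: no eventfd, so tell the compiler to favour it. */
	if (likely(!ctx->cq_ev_fd))
		return false;
	return !ctx->eventfd_async || in_async;
}
```

The hint does not change behaviour; it only lets the compiler lay out the
common path as the fall-through branch.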
3 years agoio_uring: refactor compat_msghdr import
Pavel Begunkov [Sun, 11 Apr 2021 00:46:30 +0000 (01:46 +0100)]
io_uring: refactor compat_msghdr import

Add an entry for user pointer to compat_msghdr into io_connect, so it's
explicit that we may use it as this, and removes annoying casts.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/73fd644dea1518f528d3648981cf777ce6e537e9.1618101759.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: enable inline completion for more cases
Pavel Begunkov [Sun, 11 Apr 2021 00:46:29 +0000 (01:46 +0100)]
io_uring: enable inline completion for more cases

Take advantage of delayed/inline completion flushing and pass the right
issue flags for completion of open, open2, fadvise and poll remove
opcodes. All others either already use it or are always punted and never
executed inline.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/0badc7512e82f7350b73bb09abbebbecbdd5dab8.1618101759.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: refactor io_close
Pavel Begunkov [Sun, 11 Apr 2021 00:46:28 +0000 (01:46 +0100)]
io_uring: refactor io_close

A small refactoring shrinking it and making it easier to read.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/19b24eed7cd491a0243b50366dd2a23b558e2665.1618101759.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: unify files and task cancel
Pavel Begunkov [Sun, 11 Apr 2021 00:46:27 +0000 (01:46 +0100)]
io_uring: unify files and task cancel

Now __io_uring_cancel() and __io_uring_files_cancel() are very similar
and mostly differ by how we count requests, merge them and allow
tctx_inflight() to handle counting.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1a5986a97df4dc1378f3fe0ca1eb483dbcf42112.1618101759.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: track inflight requests through counter
Pavel Begunkov [Sun, 11 Apr 2021 00:46:26 +0000 (01:46 +0100)]
io_uring: track inflight requests through counter

Instead of keeping requests in an inflight_list, just track them with a
per tctx atomic counter. Apart from it being much easier and more
consistent with task cancel, it frees ->inflight_entry from being shared
between iopoll and cancel-track, so less headache for us.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/3c2ee0863cd7eeefa605f3eaff4c1c461a6f1157.1618101759.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: unify task and files cancel loops
Pavel Begunkov [Sun, 11 Apr 2021 00:46:25 +0000 (01:46 +0100)]
io_uring: unify task and files cancel loops

Move tracked inflight number check up the stack into
__io_uring_files_cancel() so it's similar to task cancel. Will be used
for further cleaning.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/dca5a395efebd1e3e0f3bbc6b9640c5e8aa7e468.1618101759.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: simplify apoll hash removal
Pavel Begunkov [Fri, 9 Apr 2021 08:13:21 +0000 (09:13 +0100)]
io_uring: simplify apoll hash removal

hash_del() works well with non-hashed nodes, there's no need to check
if it is hashed first.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
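
Why the extra check is redundant can be seen from a userspace imitation of
the hlist primitives that hash_del() builds on: the delete-and-reinit helper
already tests whether the node is hashed. The code below is a simplified
re-implementation for illustration, not the kernel source.

```c
#include <stddef.h>

struct hlist_node {
	struct hlist_node *next, **pprev;
};

struct hlist_head {
	struct hlist_node *first;
};

static void hlist_node_init(struct hlist_node *n)
{
	n->next = NULL;
	n->pprev = NULL;
}

/* A node is unhashed when nothing points at it. */
static int hlist_unhashed(const struct hlist_node *n)
{
	return !n->pprev;
}

static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	n->next = h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/* Safe on unhashed nodes: the check is built in. */
static void hlist_del_init(struct hlist_node *n)
{
	if (!hlist_unhashed(n)) {
		*n->pprev = n->next;
		if (n->next)
			n->next->pprev = n->pprev;
		hlist_node_init(n);
	}
}
```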
3 years agoio_uring: refactor io_poll_complete()
Pavel Begunkov [Fri, 9 Apr 2021 08:13:20 +0000 (09:13 +0100)]
io_uring: refactor io_poll_complete()

Remove error parameter from io_poll_complete(), 0 is always passed,
and do a bit of cleaning on top.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: clean up io_poll_task_func()
Pavel Begunkov [Fri, 9 Apr 2021 08:13:19 +0000 (09:13 +0100)]
io_uring: clean up io_poll_task_func()

io_poll_complete() always fills an event (even an overflowed one), so we
always should do io_cqring_ev_posted() afterwards. And that's what is
currently happening, because second EPOLLONESHOT check is always true,
it can't return !done for oneshots.

Remove that branching, it's much easier to read.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio-wq: Fix io_wq_worker_affinity()
Peter Zijlstra [Thu, 8 Apr 2021 09:44:50 +0000 (11:44 +0200)]
io-wq: Fix io_wq_worker_affinity()

Do not include private headers and do not frob in internals.

On top of that, while the previous code restores the affinity, it
doesn't ensure the task actually moves there if it was running,
leading to the fun situation that it can be observed running outside
of its allowed mask for potentially significant time.

Use the proper API instead.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/YG7QkiUzlEbW85TU@hirez.programming.kicks-ass.net
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: don't attempt re-add of multishot poll request if racing
Jens Axboe [Tue, 6 Apr 2021 15:49:31 +0000 (09:49 -0600)]
io_uring: don't attempt re-add of multishot poll request if racing

We currently allow racy updates to multishot requests, but we can end up
double adding the poll request if both completion and update do it.
Ensure that we skip re-add on the update side if someone else is
completing it.

Fixes: b69de288e913 ("io_uring: allow events and user_data update of running poll requests")
Reported-by: Joakim Hassila <joj@mac.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio-wq: simplify code in __io_worker_busy()
Hao Xu [Tue, 6 Apr 2021 03:08:45 +0000 (11:08 +0800)]
io-wq: simplify code in __io_worker_busy()

Leverage XOR to simplify the code in __io_worker_busy.

Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Link: https://lore.kernel.org/r/1617678525-3129-1-git-send-email-haoxu@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
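
The message does not show the diff, so the following is only one plausible
shape of such a simplification: comparing two boolean conditions with XOR
instead of an explicit inequality. The flag names and values are
hypothetical.

```c
#include <stdbool.h>

#define WORKER_F_BOUND	(1U << 3)	/* hypothetical worker flag */
#define WORK_F_UNBOUND	(1U << 2)	/* hypothetical work flag */

/* Spelled-out form: does the worker's accounting class need switching? */
static bool needs_switch_cmp(unsigned worker_flags, unsigned work_flags)
{
	bool worker_bound = (worker_flags & WORKER_F_BOUND) != 0;
	bool work_bound = (work_flags & WORK_F_UNBOUND) == 0;

	return worker_bound != work_bound;
}

/* XOR form: true exactly when the two booleans differ. */
static bool needs_switch_xor(unsigned worker_flags, unsigned work_flags)
{
	bool worker_bound = (worker_flags & WORKER_F_BOUND) != 0;
	bool work_bound = (work_flags & WORK_F_UNBOUND) == 0;

	return worker_bound ^ work_bound;
}
```

Both forms agree for every combination of the two flags because the operands
are normalised to bool before being compared.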
3 years agoio_uring: kill outdated comment about splice punt
Pavel Begunkov [Thu, 1 Apr 2021 14:44:05 +0000 (15:44 +0100)]
io_uring: kill outdated comment about splice punt

The splice/tee comment in io_prep_async_work() isn't relevant since the
section was moved, delete it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/892a549c89c3d422b679677b8e68ffd3fcb736b6.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: encapsulate fixed files into struct
Pavel Begunkov [Thu, 1 Apr 2021 14:44:04 +0000 (15:44 +0100)]
io_uring: encapsulate fixed files into struct

Add struct io_fixed_file representing a single registered file, first to
hide ugly struct file **, which may be misleading, and secondly to
retype it to unsigned long as conversions to it and back to file * for
handling and masking FFS_* flags are getting nasty.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/78669731a605a7614c577c3de552631cfaf0869a.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
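
The idea of keeping a file pointer as an integer with flag bits in its low
bits can be sketched in userspace as below. All names and flag values are
assumptions for the example; uintptr_t is used here for portability where the
message mentions unsigned long.

```c
#include <stdint.h>

/* Hypothetical flag bits stored in the low bits of an aligned pointer. */
#define TAG_ASYNC_READ	((uintptr_t)0x1)
#define TAG_ASYNC_WRITE	((uintptr_t)0x2)
#define TAG_MASK	(TAG_ASYNC_READ | TAG_ASYNC_WRITE)

/* Stand-in for struct file; must be at least 4-byte aligned. */
struct file_sketch {
	int fd;
};

/* One registered file: pointer and flags packed into a single word. */
struct fixed_file_sketch {
	uintptr_t file_ptr;
};

static void fixed_file_set(struct fixed_file_sketch *ff,
			   struct file_sketch *file, uintptr_t flags)
{
	ff->file_ptr = (uintptr_t)file | (flags & TAG_MASK);
}

static struct file_sketch *fixed_file_get(const struct fixed_file_sketch *ff)
{
	return (struct file_sketch *)(ff->file_ptr & ~TAG_MASK);
}

static uintptr_t fixed_file_flags(const struct fixed_file_sketch *ff)
{
	return ff->file_ptr & TAG_MASK;
}
```

Wrapping the word in a struct keeps the masking in two small accessors instead
of scattering casts at every use.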
3 years agoio_uring: refactor file tables alloc/free
Pavel Begunkov [Thu, 1 Apr 2021 14:44:03 +0000 (15:44 +0100)]
io_uring: refactor file tables alloc/free

Introduce a helper io_free_file_tables() doing all the cleaning, there
are several places where it's hand coded. Also move all allocations into
io_sqe_alloc_file_tables() and rename it, so all of it is in one place.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/502a84ebf41ff119b095e59661e678eacb752bf8.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: don't quiesce intial files register
Pavel Begunkov [Thu, 1 Apr 2021 14:44:02 +0000 (15:44 +0100)]
io_uring: don't quiesce intial files register

There is no reason why we would want to fully quiesce ring on
IORING_REGISTER_FILES, if it's already registered we fail.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/563bb8060bb2d3efbc32fce6101678281c574d2a.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: set proper FFS* flags on reg file update
Pavel Begunkov [Thu, 1 Apr 2021 14:44:01 +0000 (15:44 +0100)]
io_uring: set proper FFS* flags on reg file update

Set FFS_* flags (e.g. FFS_ASYNC_READ) not only in initial registration
but also on registered files update. Not a bug, but may miss getting
profit out of the feature.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/df29a841a2d3d3695b509cdffce5070777d9d942.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: deduplicate NOSIGNAL setting
Pavel Begunkov [Thu, 1 Apr 2021 14:44:00 +0000 (15:44 +0100)]
io_uring: deduplicate NOSIGNAL setting

Set MSG_NOSIGNAL and REQ_F_NOWAIT in send/recv prep routines and don't
duplicate it in all four send/recv handlers.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e1133a3ed1c0e192975b7341ea4b0bf91f63b132.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: put link timeout req consistently
Pavel Begunkov [Thu, 1 Apr 2021 14:43:59 +0000 (15:43 +0100)]
io_uring: put link timeout req consistently

Don't put linked timeout req in io_async_find_and_cancel() but do it in
io_link_timeout_fn(), so we have only one point for that and won't have
to do it differently as it's now (put vs put_deferred). Btw, improve a
bit io_async_find_and_cancel()'s locking.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d75b70957f245275ab7cba83e0ac9c1b86aae78a.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: simplify overflow handling
Pavel Begunkov [Thu, 1 Apr 2021 14:43:58 +0000 (15:43 +0100)]
io_uring: simplify overflow handling

Overflowed CQEs don't lock requests anymore, so we don't care so much
about cancelling them, so kill cq_overflow_flushed and simplify the
code.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/5799867aeba9e713c32f49aef78e5e1aef9fbc43.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: lock annotate timeouts and poll
Pavel Begunkov [Thu, 1 Apr 2021 14:43:57 +0000 (15:43 +0100)]
io_uring: lock annotate timeouts and poll

Add timeout and poll ->completion_lock annotations for Sparse, makes life
easier while looking at the functions.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/2345325643093d41543383ba985a735aeb899eac.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: kill unused forward decls
Pavel Begunkov [Thu, 1 Apr 2021 14:43:56 +0000 (15:43 +0100)]
io_uring: kill unused forward decls

Kill unused forward declarations for io_ring_file_put() and
io_queue_next(). Also btw rename the first one.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/64aa27c3f9662e14615cc119189f5eaf12989671.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: store reg buffer end instead of length
Pavel Begunkov [Thu, 1 Apr 2021 14:43:55 +0000 (15:43 +0100)]
io_uring: store reg buffer end instead of length

It's a bit more convenient for us to store a registered buffer end
address instead of length, see struct io_mapped_ubuf, as it allows us to
not recompute it every time.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/39164403fe92f1dc437af134adeec2423cdf9395.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
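
A small sketch of why storing the end address is convenient: the end is
computed once at registration, and every later range check compares against
it directly. The struct and helpers are hypothetical simplifications and not
the real struct io_mapped_ubuf.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical registered buffer: start and precomputed end address. */
struct mapped_buf_sketch {
	uint64_t ubuf;
	uint64_t ubuf_end;
};

/* The end is derived once here (assumes start + len does not wrap). */
static void buf_register(struct mapped_buf_sketch *b, uint64_t start,
			 uint64_t len)
{
	b->ubuf = start;
	b->ubuf_end = start + len;
}

/* Per-request check: no need to recompute the buffer's end each time. */
static bool buf_contains(const struct mapped_buf_sketch *b, uint64_t addr,
			 uint64_t len)
{
	uint64_t end = addr + len;

	if (end < addr)		/* requested range wrapped around */
		return false;
	return addr >= b->ubuf && end <= b->ubuf_end;
}
```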
3 years agoio_uring: improve import_fixed overflow checks
Pavel Begunkov [Thu, 1 Apr 2021 14:43:54 +0000 (15:43 +0100)]
io_uring: improve import_fixed overflow checks

Replace a hand-coded overflow check with a specialised function. Even
though compilers are smart enough to generate identical binary (i.e.
check carry bit), it's more foolproof and conveys the intention
better.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e437dcdc929bacbb6f11a4824ecbbf17225cb82a.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
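
To make the comparison concrete, here is a userspace sketch of a hand-coded
unsigned overflow check next to the GCC/Clang builtin that dedicated overflow
helpers are typically built on. The function names are made up for this
example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hand-coded: unsigned addition wrapped if the sum is below an operand. */
static bool add_overflows_manual(uint64_t a, uint64_t b)
{
	return a + b < a;
}

/* Dedicated helper: the intent is explicit and the sum is returned too. */
static bool add_overflows_builtin(uint64_t a, uint64_t b, uint64_t *sum)
{
	return __builtin_add_overflow(a, b, sum);
}
```

The two agree on every input; the second form is harder to get subtly wrong,
for example by comparing against the wrong operand or using a signed type.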
3 years agoio_uring: refactor io_async_cancel()
Pavel Begunkov [Thu, 1 Apr 2021 14:43:53 +0000 (15:43 +0100)]
io_uring: refactor io_async_cancel()

Remove extra tctx==NULL checks that are already done by
io_async_cancel_one().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/70c2a8b958d942e86958a28af0452966ce1095b0.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: remove unused hash_wait
Pavel Begunkov [Thu, 1 Apr 2021 14:43:52 +0000 (15:43 +0100)]
io_uring: remove unused hash_wait

No users of io_uring_ctx::hash_wait left, kill it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e25cb83c233a5f75f15275596b49fbafbea606fa.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: better ref handling in poll_remove_one
Pavel Begunkov [Thu, 1 Apr 2021 14:43:51 +0000 (15:43 +0100)]
io_uring: better ref handling in poll_remove_one

Instead of io_put_req() to drop a non-final ref, use req_ref_put(),
which is slimmer and will also check the invariant.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/85b5774ce13ae55cc2e705abdc8cbafe1212f1bd.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: combine lock/unlock sections on exit
Pavel Begunkov [Thu, 1 Apr 2021 14:43:50 +0000 (15:43 +0100)]
io_uring: combine lock/unlock sections on exit

io_ring_exit_work() already does uring_lock lock/unlock, no need to
repeat it for lock waiting trick in io_ring_ctx_free(). Move the waiting
with comments and spinlocking into io_ring_exit_work.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/a8ae0589b0ea64ad4791e2c282e4e9b713dd7024.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: remove useless is_dying check on quiesce
Pavel Begunkov [Thu, 1 Apr 2021 14:43:48 +0000 (15:43 +0100)]
io_uring: remove useless is_dying check on quiesce

rsrc_data refs should always be valid for potential submitters,
io_rsrc_ref_quiesce() restores it before unlocking, so
percpu_ref_is_dying() check in io_sqe_files_unregister() does nothing
and is misleading. Concurrent quiesce is prevented with
struct io_rsrc_data::quiesce.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/bf97055e1748ee3a382e66daf384a469eb90b931.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: reuse io_rsrc_node_destroy()
Pavel Begunkov [Thu, 1 Apr 2021 14:43:47 +0000 (15:43 +0100)]
io_uring: reuse io_rsrc_node_destroy()

Reuse io_rsrc_node_destroy() in __io_rsrc_put_work(). Also move it to a
more appropriate place -- to the other node routines, and remove forward
declaration.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/cccafba41aee1e5bb59988704885b1340aef3a27.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: ctx-wide rsrc nodes
Pavel Begunkov [Thu, 1 Apr 2021 14:43:46 +0000 (15:43 +0100)]
io_uring: ctx-wide rsrc nodes

If we're going to ever support multiple types of resources we need
shared rsrc nodes to not bloat requests, that is implemented in this
patch. It also gives a nicer API and saves one pointer dereference
in io_req_set_rsrc_node().

We may say that all requests bound to a resource belong to one and only
one rsrc node, and considering that nodes are removed and recycled
strictly in-order, this separates requests into generations, where
generations are changed on each node switch (i.e. io_rsrc_node_switch()).

The API is simple, io_rsrc_node_switch() switches to a new generation if
needed, and also optionally kills a passed in io_rsrc_data. Each call to
io_rsrc_node_switch() has to be preceded by
io_rsrc_node_switch_start(). The start function is idempotent and need
not necessarily be followed by switch.

One difference is that once a node was set it will always retain a valid
rsrc node, even on unregister. It may be a nuisance at the moment, but
makes much sense for multiple types of resources. Another thing changed
is that nodes are bound to/associated with an io_rsrc_data later just
before killing (i.e. switching).

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7e9c693b4b9a2f47aa784b616ce29843021bb65a.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: refactor io_queue_rsrc_removal()
Pavel Begunkov [Thu, 1 Apr 2021 14:43:45 +0000 (15:43 +0100)]
io_uring: refactor io_queue_rsrc_removal()

Pass rsrc_node into io_queue_rsrc_removal() explicitly. Just a
simple preparation patch, makes following changes nicer.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/002889ce4de7baf287f2b010eef86ffe889174c6.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: move rsrc_put callback into io_rsrc_data
Pavel Begunkov [Thu, 1 Apr 2021 14:43:44 +0000 (15:43 +0100)]
io_uring: move rsrc_put callback into io_rsrc_data

io_rsrc_node's callback operates only on a single io_rsrc_data and only
with its resources, so rsrc_put() callback is actually a property of
io_rsrc_data. Move it there, it makes code much nicer.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/9417c2fba3c09e8668f05747006a603d416d34b4.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: encapsulate rsrc node manipulations
Pavel Begunkov [Thu, 1 Apr 2021 14:43:43 +0000 (15:43 +0100)]
io_uring: encapsulate rsrc node manipulations

io_rsrc_node_get() and io_rsrc_node_set() are always used together,
merge them into one so most users don't even see io_rsrc_node and don't
need to care about it.

It helped to catch io_sqe_files_register() inferring rsrc data argument
for get and set differently, not a problem but a good sign.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/0827b080b2e61b3dec795380f7e1a1995595d41f.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: use rsrc prealloc infra for files reg
Pavel Begunkov [Thu, 1 Apr 2021 14:43:42 +0000 (15:43 +0100)]
io_uring: use rsrc prealloc infra for files reg

Keep it consistent with update and use io_rsrc_node_prealloc() +
io_rsrc_node_get() in io_sqe_files_register() as well, that will be used
in future patches, not as error prone and allows to deduplicate
rsrc_node init.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/cf87321e6be5e38f4dc7fe5079d2aa6945b1ace0.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: simplify io_rsrc_node_ref_zero
Pavel Begunkov [Thu, 1 Apr 2021 14:43:41 +0000 (15:43 +0100)]
io_uring: simplify io_rsrc_node_ref_zero

Replace queue_delayed_work() with mod_delayed_work() in
io_rsrc_node_ref_zero() as the latter one can schedule a new work, and
clean it up further for better readability.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/3b2b23e3a1ea4bbf789cd61815d33e05d9ff945e.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: name rsrc bits consistently
Pavel Begunkov [Thu, 1 Apr 2021 14:43:40 +0000 (15:43 +0100)]
io_uring: name rsrc bits consistently

Keep resource related structs' and functions' naming consistent, in
particular use "io_rsrc" prefix for everything.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/962f5acdf810f3a62831e65da3932cde24f6d9df.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio-wq: cancel task_work on exit only targeting the current 'wq'
Jens Axboe [Fri, 2 Apr 2021 01:57:07 +0000 (19:57 -0600)]
io-wq: cancel task_work on exit only targeting the current 'wq'

With using task_work_cancel(), we're potentially canceling task_work
that isn't related to this specific io_wq. Use the newly added
task_work_cancel_match() to ensure that we only remove and cancel work
items that are specific to this io_wq.

Fixes: 685fe7feedb9 ("io-wq: eliminate the need for a manager thread")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agotask_work: add helper for more targeted task_work canceling
Jens Axboe [Fri, 2 Apr 2021 01:53:29 +0000 (19:53 -0600)]
task_work: add helper for more targeted task_work canceling

The only exported helper we have right now is task_work_cancel(), which
cancels any task_work from a given task where func matches the queued
work item. This is a bit too coarse for some use cases. Add a
task_work_cancel_match() that allows to more specifically target
individual work items outside of purely the callback function used.

task_work_cancel() can be trivially implemented on top of that, hence do
so.
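The relationship between the two helpers can be illustrated with a small userspace model. This is only a sketch: `struct work_item`, `cancel_match()`, `cancel_by_func()`, and the `owner` tag are hypothetical stand-ins, not the kernel's `struct callback_head` or the real task_work signatures, and no locking is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for a queued work item (hypothetical, not kernel code). */
struct work_item {
    struct work_item *next;
    void (*func)(struct work_item *);
    void *owner;                /* assumed tag, e.g. the io_wq that queued it */
};

typedef bool (*match_fn)(struct work_item *item, void *data);

/* Unlink and return the first item for which match() is true, else NULL. */
static struct work_item *cancel_match(struct work_item **head,
                                      match_fn match, void *data)
{
    assert(head != NULL && match != NULL);

    for (struct work_item **pp = head; *pp; pp = &(*pp)->next) {
        struct work_item *item = *pp;

        if (match(item, data)) {
            *pp = item->next;
            item->next = NULL;
            return item;
        }
    }
    return NULL;
}

/* Coarse matching: only the callback function is compared. */
struct func_key {
    void (*func)(struct work_item *);
};

static bool func_match(struct work_item *item, void *data)
{
    const struct func_key *key = data;

    return item->func == key->func;
}

/* The function-only cancel is a thin wrapper around the match variant. */
static struct work_item *cancel_by_func(struct work_item **head,
                                        void (*func)(struct work_item *))
{
    struct func_key key = { .func = func };

    return cancel_match(head, func_match, &key);
}

/* Finer matching: callback and owner must both agree. */
struct owner_key {
    void (*func)(struct work_item *);
    void *owner;
};

static bool owner_match(struct work_item *item, void *data)
{
    const struct owner_key *key = data;

    return item->func == key->func && item->owner == key->owner;
}

/* Placeholder callback used only to populate the example list. */
static void noop_work(struct work_item *item)
{
    (void)item;
}
```

With two items sharing `noop_work` but carrying different owners, `cancel_match()` with `owner_match` removes only the item whose owner matches, while `cancel_by_func()` takes whichever comes first.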

Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: fix race around poll update and poll triggering
Jens Axboe [Wed, 31 Mar 2021 15:03:03 +0000 (09:03 -0600)]
io_uring: fix race around poll update and poll triggering

Joakim reports that in some conditions he sees a multishot poll request
being canceled, and that it coincides with getting -EALREADY on
modification. As part of the poll update procedure, there's a small window
where the request is marked as canceled, and if this coincides with the
event actually triggering, then we can get a spurious -ECANCELED and
termination of the multishot request.

Don't mark the poll request as being canceled for update. We also don't
care if we race on removal unless it's a one-shot request, we can safely
update in either case.

Fixes: b69de288e913 ("io_uring: allow events and user_data update of running poll requests")
Reported-by: Joakim Hassila <joj@mac.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: reg buffer overflow checks hardening
Pavel Begunkov [Wed, 24 Mar 2021 22:59:01 +0000 (22:59 +0000)]
io_uring: reg buffer overflow checks hardening

We are safe with overflows in io_sqe_buffer_register() because it will
just yield alloc failure, but it's nicer to check explicitly.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/2b0625551be3d97b80a5fd21c8cd79dc1c91f0b5.1616624589.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: allow SQPOLL without CAP_SYS_ADMIN or CAP_SYS_NICE
Jens Axboe [Thu, 25 Mar 2021 16:21:35 +0000 (10:21 -0600)]
io_uring: allow SQPOLL without CAP_SYS_ADMIN or CAP_SYS_NICE

Now that we have any worker being attached to the original task as
threads, accounting of CPU time is directly attributed to the original
task as well. This means that we no longer have to restrict SQPOLL to
needing elevated privileges, as it's really no different from just having
the task spawn a busy looping thread in userspace.

Reported-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio-wq: eliminate the need for a manager thread
Jens Axboe [Mon, 8 Mar 2021 16:37:51 +0000 (09:37 -0700)]
io-wq: eliminate the need for a manager thread

io-wq relies on a manager thread to create/fork new workers, as needed.
But there's really no strong need for it anymore. We have the following
cases that fork a new worker:

1) Work queue. This is done from the task itself always, and it's trivial
   to create a worker off that path, if needed.

2) All workers have gone to sleep, and we have more work. This is called
   off the sched out path. For this case, use a task_work item to queue
   a fork-worker operation.

3) Hashed work completion. Don't think we need to do anything off this
   case. If need be, it could just use approach 2 as well.

Part of this change is incrementing the running worker count before the
fork, to avoid cases where we observe we need a worker and then queue
creation of one. Then new work comes in, we fork a new one. That last
queue operation should have waited for the previous worker to come up,
it's quite possible we don't even need it. Hence move the worker running
from before we fork it off to more efficiently handle that case.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agokernel: allow fork with TIF_NOTIFY_SIGNAL pending
Jens Axboe [Mon, 22 Mar 2021 15:39:12 +0000 (09:39 -0600)]
kernel: allow fork with TIF_NOTIFY_SIGNAL pending

fork() fails if signal_pending() is true, but there are two conditions
that can lead to that:

1) An actual signal is pending. We want fork to fail for that one, like
   we always have.

2) TIF_NOTIFY_SIGNAL is pending, because the task has pending task_work.
   We don't need to make it fail for that case.

Allow fork() to proceed if just task_work is pending, by changing the
signal_pending() check to task_sigpending().
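The difference between the two checks can be modelled in userspace as below. This is a sketch only: the `MODEL_*` flag bits, `struct model_task`, and the `fork_allowed_*()` helpers are hypothetical names for illustration and do not correspond to the kernel's thread-info layout or to copy_process().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits standing in for the kernel's thread flags. */
#define MODEL_TIF_SIGPENDING    (1u << 0)  /* a real signal is pending */
#define MODEL_TIF_NOTIFY_SIGNAL (1u << 1)  /* only task_work is pending */

struct model_task {
    unsigned int flags;
};

/* True only when an actual signal is pending. */
static bool model_task_sigpending(const struct model_task *t)
{
    return (t->flags & MODEL_TIF_SIGPENDING) != 0;
}

/* True for either condition, which is why it is too broad for fork. */
static bool model_signal_pending(const struct model_task *t)
{
    if (t->flags & MODEL_TIF_NOTIFY_SIGNAL)
        return true;
    return model_task_sigpending(t);
}

/* Old behaviour: any pending notification blocks the fork. */
static bool fork_allowed_old(const struct model_task *t)
{
    return !model_signal_pending(t);
}

/* New behaviour: only a real signal blocks the fork. */
static bool fork_allowed_new(const struct model_task *t)
{
    return !model_task_sigpending(t);
}
```

A task with only the notify bit set is refused by the old check and accepted by the new one; once a real signal is pending, both refuse.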

Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: allow events and user_data update of running poll requests
Jens Axboe [Wed, 17 Mar 2021 14:37:41 +0000 (08:37 -0600)]
io_uring: allow events and user_data update of running poll requests

This adds two new POLL_ADD flags, IORING_POLL_UPDATE_EVENTS and
IORING_POLL_UPDATE_USER_DATA. As with the other POLL_ADD flag, these are
masked into sqe->len. If set, the POLL_ADD will have the following
behavior:

- sqe->addr must contain the user_data of the poll request that
  needs to be modified. This field is otherwise invalid for a POLL_ADD
  command.

- If IORING_POLL_UPDATE_EVENTS is set, sqe->poll_events must contain the
  new mask for the existing poll request. There are no checks for whether
  these are identical or not, if a matching poll request is found, then it
  is re-armed with the new mask.

- If IORING_POLL_UPDATE_USER_DATA is set, sqe->off must contain the new
  user_data for the existing poll request.

A POLL_ADD with any of these flags set may complete with any of the
following results:

1) 0, which means that we successfully found the existing poll request
   specified, and performed the re-arm procedure. Any error from that
   re-arm will be exposed as a completion event for that original poll
   request, not for the update request.
2) -ENOENT, if no existing poll request was found with the given
   user_data.
3) -EALREADY, if the existing poll request was already in the process of
   being removed/canceled/completing.
4) -EACCES, if an attempt was made to modify an internal poll request
   (eg not one originally issued as IORING_OP_POLL_ADD).

The usual -EINVAL cases apply as well, if any invalid fields are set
in the sqe for this command type.
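An application might fold the result codes listed above into its own outcome type along these lines. The enum and `classify_poll_update()` are hypothetical application-side names, and the mapping simply restates the list in this message; it is a sketch, not part of the io_uring API.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical application-side view of a poll update completion. */
enum poll_update_outcome {
    POLL_UPDATE_REARMED,     /* 0: request found and re-armed */
    POLL_UPDATE_NOT_FOUND,   /* -ENOENT: no request with that user_data */
    POLL_UPDATE_RACED,       /* -EALREADY: request already going away */
    POLL_UPDATE_INTERNAL,    /* -EACCES: target is an internal poll request */
    POLL_UPDATE_INVALID,     /* -EINVAL: invalid fields set in the sqe */
    POLL_UPDATE_OTHER        /* anything this sketch does not know about */
};

/* Map the CQE res value of an update request to an outcome. */
static enum poll_update_outcome classify_poll_update(int res)
{
    switch (res) {
    case 0:
        return POLL_UPDATE_REARMED;
    case -ENOENT:
        return POLL_UPDATE_NOT_FOUND;
    case -EALREADY:
        return POLL_UPDATE_RACED;
    case -EACCES:
        return POLL_UPDATE_INTERNAL;
    case -EINVAL:
        return POLL_UPDATE_INVALID;
    default:
        return POLL_UPDATE_OTHER;
    }
}
```

Note that POLL_UPDATE_REARMED only says the update itself was accepted; an error from the re-arm arrives as a completion for the original poll request.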

Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: abstract out a io_poll_find_helper()
Jens Axboe [Wed, 17 Mar 2021 14:17:19 +0000 (08:17 -0600)]
io_uring: abstract out a io_poll_find_helper()

We'll need this helper for another purpose, for now just abstract it
out and have io_poll_cancel() use it for lookups.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: terminate multishot poll for CQ ring overflow
Jens Axboe [Tue, 23 Feb 2021 16:02:26 +0000 (09:02 -0700)]
io_uring: terminate multishot poll for CQ ring overflow

If we hit overflow and fail to allocate an overflow entry for the
completion, terminate the multishot poll mode.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: abstract out helper for removing poll waitqs/hashes
Jens Axboe [Tue, 23 Feb 2021 15:58:04 +0000 (08:58 -0700)]
io_uring: abstract out helper for removing poll waitqs/hashes

No functional changes in this patch, just preparation for killing
multishot poll on CQ overflow.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: add multishot mode for IORING_OP_POLL_ADD
Jens Axboe [Tue, 23 Feb 2021 05:08:01 +0000 (22:08 -0700)]
io_uring: add multishot mode for IORING_OP_POLL_ADD

The default io_uring poll mode is one-shot, where once the event triggers,
the poll command is completed and won't trigger any further events. If
we're doing repeated polling on the same file or socket, then it can be
more efficient to do multishot, where we keep triggering whenever the
event becomes true.

This deviates from the usual norm of having one CQE per SQE submitted. Add
a CQE flag, IORING_CQE_F_MORE, which tells the application to expect
further completion events from the submitted SQE. Right now the only user
of this is POLL_ADD in multishot mode.

Since sqe->poll_events is using the space that we normally use for adding
flags to commands, use sqe->len for the flag space for POLL_ADD. Multishot
mode is selected by setting IORING_POLL_ADD_MULTI in sqe->len. An
application should expect more CQEs for the specified SQE if the CQE is
flagged with IORING_CQE_F_MORE. In multishot mode, only cancelation or an
error will terminate the poll request, in which case the flag will be
cleared.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: include cflags in completion trace event
Jens Axboe [Tue, 23 Feb 2021 05:05:00 +0000 (22:05 -0700)]
io_uring: include cflags in completion trace event

We should be including the completion flags for better introspection on
exactly what completion event was logged.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: allocate memory for overflowed CQEs
Pavel Begunkov [Tue, 23 Feb 2021 12:40:22 +0000 (12:40 +0000)]
io_uring: allocate memory for overflowed CQEs

Instead of using a request itself for overflowed CQE stashing, allocate a
separate entry. The disadvantage is that the allocation may fail and it
will be accounted as lost (see rings->cq_overflow), so we lose reliability
in case of memory pressure if the application is driving the CQ ring into
overflow. However, it opens a way for multiple CQEs per SQE and
even generating SQE-less CQEs.
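The trade-off can be sketched with a userspace model in which the allocator is passed in, so the failure path is easy to exercise. All names here (`struct ex_ring`, `stash_overflow()`, `failing_alloc()`) are hypothetical and only loosely follow the description above; the real code allocates with GFP flags and runs under the completion lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Local stand-in for a completion queue entry. */
struct ex_cqe {
    uint64_t user_data;
    int32_t res;
    uint32_t flags;
};

/* A separately allocated entry holding one overflowed completion. */
struct overflow_entry {
    struct overflow_entry *next;
    struct ex_cqe cqe;
};

/* Hypothetical ring state: overflow list plus a lost-completion counter. */
struct ex_ring {
    struct overflow_entry *head;
    struct overflow_entry **tail;
    uint32_t cq_overflow;
};

typedef void *(*alloc_fn)(size_t size);

static void ring_init(struct ex_ring *ring)
{
    ring->head = NULL;
    ring->tail = &ring->head;
    ring->cq_overflow = 0;
}

/*
 * Stash a completion that did not fit into the CQ ring. If the entry
 * cannot be allocated, the completion is lost and only counted.
 */
static bool stash_overflow(struct ex_ring *ring, struct ex_cqe cqe,
                           alloc_fn alloc)
{
    struct overflow_entry *entry;

    assert(ring != NULL && alloc != NULL);

    entry = alloc(sizeof(*entry));
    if (!entry) {
        ring->cq_overflow++;
        return false;
    }

    entry->next = NULL;
    entry->cqe = cqe;
    *ring->tail = entry;
    ring->tail = &entry->next;
    return true;
}

/* Allocator that always fails, to model memory pressure. */
static void *failing_alloc(size_t size)
{
    (void)size;
    return NULL;
}
```

Because entries no longer borrow storage from a request, nothing ties the number of stashed completions to the number of submitted requests.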

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
[axboe: use GFP_ATOMIC | __GFP_ACCOUNT]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: mask in error/nval/hangup consistently for poll
Jens Axboe [Fri, 19 Mar 2021 20:06:24 +0000 (14:06 -0600)]
io_uring: mask in error/nval/hangup consistently for poll

Instead of masking these in as part of regular POLL_ADD prep, do it in
io_init_poll_iocb(), and include NVAL as that's generally unmaskable,
and RDHUP alongside the HUP that is already set.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: optimise rw complete error handling
Pavel Begunkov [Mon, 22 Mar 2021 01:58:34 +0000 (01:58 +0000)]
io_uring: optimise rw complete error handling

Expect read/write to succeed and create a hot path for this case, in
particular hide all error handling with resubmission under a single
check with the desired result.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: hide iter revert in resubmit_prep
Pavel Begunkov [Mon, 22 Mar 2021 01:58:33 +0000 (01:58 +0000)]
io_uring: hide iter revert in resubmit_prep

Move iov_iter_revert() resetting iterator in case of -EIOCBQUEUED into
io_resubmit_prep(), so we don't do heavy revert in hot path, also saves
a couple of checks.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: don't alter iopoll reissue fail ret code
Pavel Begunkov [Mon, 22 Mar 2021 01:58:32 +0000 (01:58 +0000)]
io_uring: don't alter iopoll reissue fail ret code

When reissue_prep failed in io_complete_rw_iopoll(), we change return
code to -EIO to prevent io_iopoll_complete() from doing resubmission.
Mark requests with a new flag (i.e. REQ_F_DONT_REISSUE) instead and
retain the original return value.

It also removes io_rw_reissue() from io_iopoll_complete() that will be
used later.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: optimise kiocb_end_write for !ISREG
Pavel Begunkov [Mon, 22 Mar 2021 01:58:31 +0000 (01:58 +0000)]
io_uring: optimise kiocb_end_write for !ISREG

file_end_write() is only for regular files, so the function does a couple
of dereferences to get inode and check for it. However, we already have
REQ_F_ISREG at hand, just use it and inline file_end_write().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: kill unused REQ_F_NO_FILE_TABLE
Pavel Begunkov [Mon, 22 Mar 2021 01:58:30 +0000 (01:58 +0000)]
io_uring: kill unused REQ_F_NO_FILE_TABLE

current->files are always valid now even for io-wq threads, so kill the
no longer used REQ_F_NO_FILE_TABLE.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: don't init req->work fully in advance
Pavel Begunkov [Mon, 22 Mar 2021 01:58:29 +0000 (01:58 +0000)]
io_uring: don't init req->work fully in advance

req->work is mostly unused unless it's punted, and io_init_req() is too
hot for fully initialising it. Fortunately, we can skip initialising
work.next as it's controlled by io-wq, and can avoid touching work.flags
by moving everything related into io_prep_async_work(). The only field
left is req->work.creds, but nothing can be done about it, so keep
maintaining it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio-wq: refactor *_get_acct()
Pavel Begunkov [Mon, 22 Mar 2021 01:58:28 +0000 (01:58 +0000)]
io-wq: refactor *_get_acct()

Extract a helper for io_work_get_acct() and io_wqe_get_acct() to avoid
duplication.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: remove tctx->sqpoll
Pavel Begunkov [Mon, 22 Mar 2021 01:58:27 +0000 (01:58 +0000)]
io_uring: remove tctx->sqpoll

struct io_uring_task::sqpoll is not used anymore, kill it

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: don't do extra EXITING cancellations
Pavel Begunkov [Mon, 22 Mar 2021 01:58:25 +0000 (01:58 +0000)]
io_uring: don't do extra EXITING cancellations

io_match_task() matches all requests with PF_EXITING task, even though
those may be valid requests. It was necessary for SQPOLL cancellation,
but now it kills all requests before exiting via
io_uring_cancel_sqpoll(), so it's not needed.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: don't clear REQ_F_LINK_TIMEOUT
Pavel Begunkov [Mon, 22 Mar 2021 01:58:24 +0000 (01:58 +0000)]
io_uring: don't clear REQ_F_LINK_TIMEOUT

REQ_F_LINK_TIMEOUT is a hint to look for linked timeouts to cancel, and
we're leaving it set even when it's already fired. Hence don't care to
clear it in io_kill_linked_timeout(), it's safe and is called only once.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: optimise io_req_task_work_add()
Pavel Begunkov [Fri, 19 Mar 2021 17:22:44 +0000 (17:22 +0000)]
io_uring: optimise io_req_task_work_add()

Inline io_task_work_add() into io_req_task_work_add(). They both work
with a request, so keeping them separate doesn't make things much
clearer, but merging allows optimising it. Apart from small wins like not
reading req->ctx or not calculating @notify in the hot path, i.e. with
tctx->task_state set, it avoids doing wake_up_process() for every single
add, doing it only after an actual task_work_add().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: abolish old io_put_file()
Pavel Begunkov [Fri, 19 Mar 2021 17:22:43 +0000 (17:22 +0000)]
io_uring: abolish old io_put_file()

io_put_file() doesn't do a good job at generating good code. Inline
it, so we can check REQ_F_FIXED_FILE first, prioritising FIXED_FILE case
over requests without files, and saving a memory load in that case.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: optimise io_dismantle_req() fast path
Pavel Begunkov [Fri, 19 Mar 2021 17:22:42 +0000 (17:22 +0000)]
io_uring: optimise io_dismantle_req() fast path

Reshuffle io_dismantle_req() checks to put most of slow path stuff under
a single if.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: inline io_clean_op()'s fast path
Pavel Begunkov [Fri, 19 Mar 2021 17:22:41 +0000 (17:22 +0000)]
io_uring: inline io_clean_op()'s fast path

Inline io_clean_op(), leaving __io_clean_op() but renaming it. This will
be used in following patches.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: remove __io_req_task_cancel()
Pavel Begunkov [Fri, 19 Mar 2021 17:22:40 +0000 (17:22 +0000)]
io_uring: remove __io_req_task_cancel()

Both io_req_complete_failed() and __io_req_task_cancel() do the same
thing: set failure flag, put both req refs and emit a CQE. The former
one is a bit more advanced as it puts req back into a req cache, so make
it take over __io_req_task_cancel() and remove the latter.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: add helper flushing locked_free_list
Pavel Begunkov [Fri, 19 Mar 2021 17:22:39 +0000 (17:22 +0000)]
io_uring: add helper flushing locked_free_list

Add a new helper io_flush_cached_locked_reqs() that splices
locked_free_list to free_list, and does it right doing all sync and
invariant reinit.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: refactor io_free_req_deferred()
Pavel Begunkov [Fri, 19 Mar 2021 17:22:38 +0000 (17:22 +0000)]
io_uring: refactor io_free_req_deferred()

We don't care about ret value in io_free_req_deferred(), make the code a
bit more concise.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: inline io_put_req and friends
Pavel Begunkov [Fri, 19 Mar 2021 17:22:37 +0000 (17:22 +0000)]
io_uring: inline io_put_req and friends

One big omission is that io_put_req() hasn't been marked inline, and at
least gcc 9 doesn't inline it, not to mention that it's really hot and an
extra function call is intolerable, especially when it doesn't put a
final ref.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: refactor rsrc refnode allocation
Pavel Begunkov [Fri, 19 Mar 2021 17:22:36 +0000 (17:22 +0000)]
io_uring: refactor rsrc refnode allocation

There are two problems:
1) we always allocate refnodes in advance and free them if those
haven't been used. It's expensive, takes two allocations, where one of
them is percpu. And it may be pretty common not actually using them.

2) Current API with allocating a refnode and setting some of the fields
is error prone, we don't ever want to have a file node running fixed
buffer callback...

Solve both with pre-init/get API. Pre-init just leaves the node for
later if not used, and for get (i.e. io_rsrc_refnode_get()), you need to
explicitly pass all arguments setting callbacks/etc., so it's more
resilient.
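The shape of such a pre-init/get pair might look like the userspace sketch below. `struct ex_ctx`, `struct ex_node`, `node_prealloc()`, and `node_get()` are hypothetical names chosen for the example and are not the kernel functions; the point is only that the spare node survives unused calls and that every field is supplied explicitly at get time.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical node: which data it serves and how resources are put. */
struct ex_node {
    void *data;
    void (*put)(void *rsrc);
};

/* Hypothetical context keeping at most one spare, not yet used node. */
struct ex_ctx {
    struct ex_node *backup;
};

/* Ensure a spare node exists; an unused spare is kept for the next caller. */
static int node_prealloc(struct ex_ctx *ctx)
{
    assert(ctx != NULL);

    if (ctx->backup)
        return 0;
    ctx->backup = calloc(1, sizeof(*ctx->backup));
    return ctx->backup ? 0 : -ENOMEM;
}

/*
 * Take the spare node. The caller must pass every field explicitly,
 * so a node cannot end up with a callback meant for another resource.
 */
static struct ex_node *node_get(struct ex_ctx *ctx, void *data,
                                void (*put)(void *rsrc))
{
    struct ex_node *node;

    assert(ctx != NULL && ctx->backup != NULL && put != NULL);

    node = ctx->backup;
    ctx->backup = NULL;
    node->data = data;
    node->put = put;
    return node;
}

/* Placeholder put callback used only by the example. */
static void put_file_example(void *rsrc)
{
    (void)rsrc;
}
```

Calling `node_prealloc()` repeatedly without a `node_get()` in between costs nothing after the first call, which addresses the first problem listed above.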

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: refactor io_flush_cached_reqs()
Pavel Begunkov [Fri, 19 Mar 2021 17:22:35 +0000 (17:22 +0000)]
io_uring: refactor io_flush_cached_reqs()

Emphasize that return value of io_flush_cached_reqs() depends on number
of requests in the cache. It looks nicer and might help tools avoid
false-negative analyses.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: optimise success case of __io_queue_sqe
Pavel Begunkov [Fri, 19 Mar 2021 17:22:34 +0000 (17:22 +0000)]
io_uring: optimise success case of __io_queue_sqe

Prioritise the case of a successfully issued request by doing that check first.
It's not much of a difference, just generates slightly better code for
me.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: inline __io_queue_linked_timeout()
Pavel Begunkov [Fri, 19 Mar 2021 17:22:33 +0000 (17:22 +0000)]
io_uring: inline __io_queue_linked_timeout()

Inline __io_queue_linked_timeout(), we don't need it

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: keep io_req_free_batch() call locality
Pavel Begunkov [Fri, 19 Mar 2021 17:22:32 +0000 (17:22 +0000)]
io_uring: keep io_req_free_batch() call locality

Don't do a function call (io_dismantle_req()) in the middle and place it
near other function calls, otherwise it may lead to excessive register
spilling.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>