linux-2.6-microblaze.git
3 years agoio_uring: fix invalid error check after malloc
Pavel Begunkov [Sun, 25 Apr 2021 23:16:31 +0000 (00:16 +0100)]
io_uring: fix invalid error check after malloc

Now we allocate io_mapped_ubuf instead of bvec, so we clearly have to
check its address after allocation.

Fixes: 41edf1a5ec967 ("io_uring: keep table of pointers to ubufs")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d28eb1bc4384284f69dbce35b9f70c115ff6176f.1619392565.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: io_sq_thread() no longer needs to reset current->pf_io_worker
Stefan Metzmacher [Sat, 24 Apr 2021 23:26:04 +0000 (01:26 +0200)]
io_uring: io_sq_thread() no longer needs to reset current->pf_io_worker

This is done by create_io_thread() now.

Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agokernel: always initialize task->pf_io_worker to NULL
Stefan Metzmacher [Sat, 24 Apr 2021 23:26:03 +0000 (01:26 +0200)]
kernel: always initialize task->pf_io_worker to NULL

Otherwise io_wq_worker_{running,sleeping}() may dereference an
invalid pointer (in future). Currently all users of create_io_thread()
are fine and get task->pf_io_worker = NULL implicitly from the
wq_manager, which got it either from the userspace thread
of the sq_thread, which explicitly reset it to NULL.

I think it's safer to always reset it in order to avoid future
problems.

Fixes: 3bfe6106693b ("io-wq: fork worker threads from original task")
cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: update sq_thread_idle after ctx deleted
Hao Xu [Sat, 24 Apr 2021 09:26:20 +0000 (17:26 +0800)]
io_uring: update sq_thread_idle after ctx deleted

we shall update sq_thread_idle anytime we do ctx deletion from ctx_list

Fixes:734551df6f9b ("io_uring: fix shared sqpoll cancellation hangs")

Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Link: https://lore.kernel.org/r/1619256380-236460-1-git-send-email-haoxu@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: add full-fledged dynamic buffers support
Pavel Begunkov [Sun, 25 Apr 2021 13:32:26 +0000 (14:32 +0100)]
io_uring: add full-fledged dynamic buffers support

Hook buffers into all rsrc infrastructure, including tagging and
updates.

Suggested-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/119ed51d68a491dae87eb55fb467a47870c86aad.1619356238.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: implement fixed buffers registration similar to fixed files
Bijan Mottahedeh [Sun, 25 Apr 2021 13:32:25 +0000 (14:32 +0100)]
io_uring: implement fixed buffers registration similar to fixed files

Apply fixed_rsrc functionality for fixed buffers support.

Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
[rebase, remove multi-level tables, fix unregister on exit]
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/17035f4f75319dc92962fce4fc04bc0afb5a68dc.1619356238.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: prepare fixed rw for dynanic buffers
Pavel Begunkov [Sun, 25 Apr 2021 13:32:24 +0000 (14:32 +0100)]
io_uring: prepare fixed rw for dynanic buffers

With dynamic buffer updates, registered buffers in the table may change
at any moment. First of all we want to prevent future races between
updating and importing (i.e. io_import_fixed()), where the latter one
may happen without uring_lock held, e.g. from io-wq.

Save the first loaded io_mapped_ubuf buffer and reuse.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/21a2302d07766ae956640b6f753292c45200fe8f.1619356238.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: keep table of pointers to ubufs
Pavel Begunkov [Sun, 25 Apr 2021 13:32:23 +0000 (14:32 +0100)]
io_uring: keep table of pointers to ubufs

Instead of keeping a table of ubufs convert them into pointers to ubuf,
so we can atomically read one pointer and be sure that the content of
ubuf won't change.

Because it was already dynamically allocating imu->bvec, throw both
imu and bvec into a single structure so they can be allocated together.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b96efa4c5febadeccf41d0e849ac099f4c83b0d3.1619356238.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: add generic rsrc update with tags
Pavel Begunkov [Sun, 25 Apr 2021 13:32:22 +0000 (14:32 +0100)]
io_uring: add generic rsrc update with tags

Add IORING_REGISTER_RSRC_UPDATE, which also supports passing in rsrc
tags. Implement it for registered files.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d4dc66df204212f64835ffca2c4eb5e8363f2f05.1619356238.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: add IORING_REGISTER_RSRC
Pavel Begunkov [Sun, 25 Apr 2021 13:32:21 +0000 (14:32 +0100)]
io_uring: add IORING_REGISTER_RSRC

Add a new io_uring_register() opcode for rsrc registeration. Instead of
accepting a pointer to resources, fds or iovecs, it @arg is now pointing
to a struct io_uring_rsrc_register, and the second argument tells how
large that struct is to make it easily extendible by adding new fields.

All that is done mainly to be able to pass in a pointer with tags. Pass
it in and enable CQE posting for file resources. Doesn't support setting
tags on update yet.

A design choice made here is to not post CQEs on rsrc de-registration,
but only when we updated-removed it by rsrc dynamic update.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/c498aaec32a4bb277b2406b9069662c02cdda98c.1619356238.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: enumerate dynamic resources
Pavel Begunkov [Sun, 25 Apr 2021 13:32:20 +0000 (14:32 +0100)]
io_uring: enumerate dynamic resources

As resources are getting more support and common parts, it'll be more
convenient to index resources and use it for indexing.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/f0be63e9310212d5601d36277c2946ff7a040485.1619356238.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: add generic path for rsrc update
Pavel Begunkov [Sun, 25 Apr 2021 13:32:19 +0000 (14:32 +0100)]
io_uring: add generic path for rsrc update

Extract some common parts for rsrc update, will be used reg buffers
support dynamic (i.e. quiesce-lee) managing.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b49c3ff6b9ff0e530295767604fe4de64d349e04.1619356238.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: preparation for rsrc tagging
Pavel Begunkov [Sun, 25 Apr 2021 13:32:18 +0000 (14:32 +0100)]
io_uring: preparation for rsrc tagging

We need a way to notify userspace when a lazily removed resource
actually died out. This will be done by associating a tag, which is u64
exactly like req->user_data, with each rsrc (e.g. buffer of file). A CQE
will be posted once a resource is actually put down.

Tag 0 is a special value set by default, for whcih it don't generate an
CQE, so providing the old behaviour.

Don't expose it to the userspace yet, but prepare internally, allocate
buffers, add all posting hooks, etc.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/2e6beec5eabe7216bb61fb93cdf5aaf65812a9b0.1619356238.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: decouple CQE filling from requests
Pavel Begunkov [Sun, 25 Apr 2021 13:32:17 +0000 (14:32 +0100)]
io_uring: decouple CQE filling from requests

Make __io_cqring_fill_event() agnostic of struct io_kiocb, pass all the
data needed directly into it. Will be used to post rsrc removal
completions, which don't have an associated request.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/c9b8da9e42772db2033547dfebe479dc972a0f2c.1619356238.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: return back rsrc data free helper
Pavel Begunkov [Sun, 25 Apr 2021 13:32:16 +0000 (14:32 +0100)]
io_uring: return back rsrc data free helper

Add io_rsrc_data_free() helper for destroying rsrc_data, easier for
search and the function will get more stuff to destroy shortly.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/562d1d53b5ff184f15b8949a63d76ef19c4ba9ec.1619356238.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: move __io_sqe_files_unregister
Pavel Begunkov [Sun, 25 Apr 2021 13:32:15 +0000 (14:32 +0100)]
io_uring: move __io_sqe_files_unregister

A preparation patch moving __io_sqe_files_unregister() definition closer
to other "files" functions without any modification.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/95caf17fe837e67bd1f878395f07049062a010d4.1619356238.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: check sqring and iopoll_list before shedule
Hao Xu [Wed, 21 Apr 2021 15:19:11 +0000 (23:19 +0800)]
io_uring: check sqring and iopoll_list before shedule

do this to avoid race below:

         userspace                         kernel

                               |  check sqring and iopoll_list
submit sqe                     |
check IORING_SQ_NEED_WAKEUP    |
(which is not set)    |        |
                               |  set IORING_SQ_NEED_WAKEUP
wait cqe                       |  schedule(never wakeup again)

Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Link: https://lore.kernel.org/r/1619018351-75883-1-git-send-email-haoxu@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: refactor io_sq_offload_create()
Pavel Begunkov [Tue, 20 Apr 2021 11:03:33 +0000 (12:03 +0100)]
io_uring: refactor io_sq_offload_create()

Just a bit of code tossing in io_sq_offload_create(), so it looks a bit
better. No functional changes.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/939776f90de8d2cdd0414e1baa29c8ec0926b561.1618916549.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: safer sq_creds putting
Pavel Begunkov [Tue, 20 Apr 2021 11:03:32 +0000 (12:03 +0100)]
io_uring: safer sq_creds putting

Put sq_creds as a part of io_ring_ctx_free(), it's easy to miss doing it
in io_sq_thread_finish(), especially considering past mistakes related
to ring creation failures.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/3becb1866467a1de82a97345a0a90d7fb8ff875e.1618916549.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: move inflight un-tracking into cleanup
Pavel Begunkov [Tue, 20 Apr 2021 11:03:31 +0000 (12:03 +0100)]
io_uring: move inflight un-tracking into cleanup

REQ_F_INFLIGHT deaccounting doesn't do any spinlocking or resource
freeing anymore, so it's safe to move it into the normal cleanup flow,
i.e. into io_clean_op(), so making it cleaner.

Also move io_req_needs_clean() to be first in io_dismantle_req() so it
doesn't reload req->flags.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/90653a3a5de4107e3a00536fa4c2ea5f2c38a4ac.1618916549.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio-wq: remove unused io_wqe_need_worker() function
Jens Axboe [Tue, 20 Apr 2021 17:24:22 +0000 (11:24 -0600)]
io-wq: remove unused io_wqe_need_worker() function

A previous commit removed the need for this, but overlooked that we no
longer use it at all. Get rid of it.

Fixes: 685fe7feedb9 ("io-wq: eliminate the need for a manager thread")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: fix shared sqpoll cancellation hangs
Pavel Begunkov [Sun, 18 Apr 2021 13:52:09 +0000 (14:52 +0100)]
io_uring: fix shared sqpoll cancellation hangs

[  736.982891] INFO: task iou-sqp-4294:4295 blocked for more than 122 seconds.
[  736.982897] Call Trace:
[  736.982901]  schedule+0x68/0xe0
[  736.982903]  io_uring_cancel_sqpoll+0xdb/0x110
[  736.982908]  io_sqpoll_cancel_cb+0x24/0x30
[  736.982911]  io_run_task_work_head+0x28/0x50
[  736.982913]  io_sq_thread+0x4e3/0x720

We call io_uring_cancel_sqpoll() one by one for each ctx either in
sq_thread() itself or via task works, and it's intended to cancel all
requests of a specified context. However the function uses per-task
counters to track the number of inflight requests, so it counts more
requests than available via currect io_uring ctx and goes to sleep for
them to appear (e.g. from IRQ), that will never happen.

Cancel a bit more than before, i.e. all ctxs that share sqpoll
and continue to use shared counters. Don't forget that we should not
remove ctx from the list before running that task_work sqpoll-cancel,
otherwise the function wouldn't be able to find the context and will
hang.

Reported-by: Joakim Hassila <joj@mac.com>
Reported-by: Jens Axboe <axboe@kernel.dk>
Fixes: 37d1e2e3642e2 ("io_uring: move SQPOLL thread io-wq forked worker")
Cc: stable@vger.kernel.org
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1bded7e6c6b32e0bae25fce36be2868e46b116a0.1618752958.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: remove extra sqpoll submission halting
Pavel Begunkov [Sun, 18 Apr 2021 13:52:08 +0000 (14:52 +0100)]
io_uring: remove extra sqpoll submission halting

SQPOLL task won't submit requests for a context that is currently dying,
so no need to remove ctx from sqd_list prior the main loop of
io_ring_exit_work(). Kill it, will be removed by io_sq_thread_finish()
and only brings confusion and lockups.

Cc: stable@vger.kernel.org
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/f220c2b786ba0f9499bebc9f3cd9714d29efb6a5.1618752958.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: check register restriction afore quiesce
Pavel Begunkov [Thu, 15 Apr 2021 12:07:40 +0000 (13:07 +0100)]
io_uring: check register restriction afore quiesce

Move restriction checks of __io_uring_register() before quiesce, saves
from waiting for requests in fail case and simplifies the code a bit.
Also add array_index_nospec() for safety

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/88d7913c9280ee848fdb7b584eea37a465391cee.1618488258.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: fix overflows checks in provide buffers
Pavel Begunkov [Thu, 15 Apr 2021 12:07:39 +0000 (13:07 +0100)]
io_uring: fix overflows checks in provide buffers

Colin reported before possible overflow and sign extension problems in
io_provide_buffers_prep(). As Linus pointed out previous attempt did nothing
useful, see d81269fecb8ce ("io_uring: fix provide_buffers sign extension").

Do that with help of check_<op>_overflow helpers. And fix struct
io_provide_buf::len type, as it doesn't make much sense to keep it
signed.

Reported-by: Colin Ian King <colin.king@canonical.com>
Fixes: efe68c1ca8f49 ("io_uring: validate the full range of provided buffers for access")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/46538827e70fce5f6cdb50897cff4cacc490f380.1618488258.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: don't fail submit with overflow backlog
Pavel Begunkov [Thu, 15 Apr 2021 12:40:50 +0000 (13:40 +0100)]
io_uring: don't fail submit with overflow backlog

Don't fail submission attempts if there are CQEs in the overflow
backlog, but give away the decision making to the userspace. It
might be very inconvenient to the userspace, especially if
submission and completion are done by different threads.

We can remove it because of recent changes, where requests
are now not locked by the backlog, backlog entries are allocated
separately, so they take less space and cgroup accounted.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: fix merge error for async resubmit
Jens Axboe [Thu, 15 Apr 2021 17:31:14 +0000 (11:31 -0600)]
io_uring: fix merge error for async resubmit

A hand-edit while applying this patch on top of a new base resulted in
a reverted check for re-issue, resulting in spurious -EAGAIN errors.

Fixes: 8c130827f417 ("io_uring: don't alter iopoll reissue fail ret code")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: tie req->apoll to request lifetime
Jens Axboe [Thu, 15 Apr 2021 15:52:40 +0000 (09:52 -0600)]
io_uring: tie req->apoll to request lifetime

We manage these separately right now, just tie it to the request lifetime
and make it be part of the usual REQ_F_NEED_CLEANUP logic.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: put flag checking for needing req cleanup in one spot
Jens Axboe [Thu, 15 Apr 2021 23:44:34 +0000 (17:44 -0600)]
io_uring: put flag checking for needing req cleanup in one spot

We have this in two spots right now, which is a bit fragile. In
preparation for moving REQ_F_POLLED cleanup into the same spot, move
the check into a separate helper so we only have it once.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: disable multishot poll for double poll add cases
Jens Axboe [Thu, 15 Apr 2021 15:47:13 +0000 (09:47 -0600)]
io_uring: disable multishot poll for double poll add cases

The re-add handling isn't correct for the multi wait case, so let's
just disable it for now explicitly until we can get that sorted out. This
just turns it into a one-shot request. Since we pass back whether or not
a poll request terminates in multishot mode on completion, this should
not break properly behaving applications that check for IORING_CQE_F_MORE
on completion.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: move poll update into remove not add
Pavel Begunkov [Wed, 14 Apr 2021 12:38:37 +0000 (13:38 +0100)]
io_uring: move poll update into remove not add

Having poll update function as a part of IORING_OP_POLL_ADD is not
great, we have to do hack around struct layouts and add some overhead in
the way of more popular POLL_ADD. Even more serious drawback is that
POLL_ADD requires file and always grabs it, and so poll update, which
doesn't need it.

Incorporate poll update into IORING_OP_POLL_REMOVE instead of
IORING_OP_POLL_ADD. It also more consistent with timeout remove/update.

Fixes: b69de288e913 ("io_uring: allow events and user_data update of running poll requests")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: add helper for parsing poll events
Pavel Begunkov [Wed, 14 Apr 2021 12:38:36 +0000 (13:38 +0100)]
io_uring: add helper for parsing poll events

Isolate poll mask SQE parsing and preparations into a new function,
which will be reused shortly.

Fixes: b69de288e913 ("io_uring: allow events and user_data update of running poll requests")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: fix POLL_REMOVE removing apoll
Pavel Begunkov [Wed, 14 Apr 2021 12:38:35 +0000 (13:38 +0100)]
io_uring: fix POLL_REMOVE removing apoll

Don't allow REQ_OP_POLL_REMOVE to kill apoll requests, users should not
know about it. Also, remove weird -EACCESS in io_poll_update(), it
shouldn't know anything about apoll, and have to work even if happened
to have a poll and an async poll'ed request with same user_data.

Fixes: b69de288e913 ("io_uring: allow events and user_data update of running poll requests")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: refactor io_ring_exit_work()
Pavel Begunkov [Wed, 14 Apr 2021 12:38:34 +0000 (13:38 +0100)]
io_uring: refactor io_ring_exit_work()

Don't reinit io_ring_exit_work()'s exit work/completions on each
iteration, that's wasteful. Also add list_rotate_left(), so if we failed
to complete the task job, we don't try it again and again but defer it
until others are processed.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: inline io_iopoll_getevents()
Pavel Begunkov [Tue, 13 Apr 2021 01:58:46 +0000 (02:58 +0100)]
io_uring: inline io_iopoll_getevents()

io_iopoll_getevents() is of no use to us anymore, io_iopoll_check()
handles all the cases.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7e50b8917390f38bee4f822c6f4a6a98a27be037.1618278933.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: skip futile iopoll iterations
Pavel Begunkov [Tue, 13 Apr 2021 01:58:45 +0000 (02:58 +0100)]
io_uring: skip futile iopoll iterations

The only way to get out of io_iopoll_getevents() and continue iterating
is to have empty iopoll_list, otherwise the main loop would just exit.
So, instead of the unlock on 8th time heuristic, do that based on
iopoll_list.

Also, as no one can add new requests to iopoll_list while
io_iopoll_check() hold uring_lock, it's useless to spin with the list
empty, return in that case.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/5b8ebe84f5fff7ffa1f708952dfef7fc78b668e2.1618278933.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: don't fail overflow on in_idle
Pavel Begunkov [Tue, 13 Apr 2021 01:58:44 +0000 (02:58 +0100)]
io_uring: don't fail overflow on in_idle

As CQE overflows are now untied from requests and so don't hold any
ref, we don't need to handle exiting/exec'ing cases there anymore.
Moreover, it's much nicer in regards to userspace to save overflowed
CQEs whenever possible, so remove failing on in_idle.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d873b7dab75c7f3039ead9628a745bea01f2cfd2.1618278933.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: clean up io_poll_remove_waitqs()
Pavel Begunkov [Tue, 13 Apr 2021 01:58:43 +0000 (02:58 +0100)]
io_uring: clean up io_poll_remove_waitqs()

Move some parts of io_poll_remove_waitqs() that are opcode independent.
Looks better and stresses that both do __io_poll_remove_one().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/bbc717f82117cc335c89cbe67ec8d72608178732.1618278933.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: refactor hrtimer_try_to_cancel uses
Pavel Begunkov [Tue, 13 Apr 2021 01:58:42 +0000 (02:58 +0100)]
io_uring: refactor hrtimer_try_to_cancel uses

Don't save return values of hrtimer_try_to_cancel() in a variable, but
use right away. It's in general safer to not have an intermediate
variable, which may be reused and passed out wrongly, but it be
contracted out. Also clean io_timeout_extract().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d2566ef7ce632e6882dc13e022a26249b3fd30b5.1618278933.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: add timeout completion_lock annotation
Pavel Begunkov [Tue, 13 Apr 2021 01:58:41 +0000 (02:58 +0100)]
io_uring: add timeout completion_lock annotation

Add one more sparse locking annotation for readability in
io_kill_timeout().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/bdbb22026024eac29203c1aa0045c4954a2488d1.1618278933.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: split poll and poll update structures
Pavel Begunkov [Tue, 13 Apr 2021 01:58:40 +0000 (02:58 +0100)]
io_uring: split poll and poll update structures

struct io_poll_iocb became pretty nasty combining also update fields.
Split them, so we would have more clarity to it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b2f74d64ffebb57a648f791681af086c7211e3a4.1618278933.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: fix uninit old data for poll event upd
Pavel Begunkov [Tue, 13 Apr 2021 01:58:39 +0000 (02:58 +0100)]
io_uring: fix uninit old data for poll event upd

Both IORING_POLL_UPDATE_EVENTS and IORING_POLL_UPDATE_USER_DATA need
old_user_data to find/cancel a poll request, but it's set only for the
first one.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ab08fd35b7652e977f9a475f01741b04102297f1.1618278933.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: fix leaking reg files on exit
Pavel Begunkov [Tue, 13 Apr 2021 01:58:38 +0000 (02:58 +0100)]
io_uring: fix leaking reg files on exit

If io_sqe_files_unregister() faults on io_rsrc_ref_quiesce(), it will
fail to do unregister leaving files referenced. And that may well happen
because of a strayed signal or just because it does allocations inside.

In io_ring_ctx_free() do an unsafe version of unregister, as it's
guaranteed to not have requests by that point and so quiesce is useless.

Cc: stable@vger.kernel.org
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e696e9eade571b51997d0dc1d01f144c6d685c05.1618278933.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: return back safer resurrect
Pavel Begunkov [Sun, 11 Apr 2021 00:46:40 +0000 (01:46 +0100)]
io_uring: return back safer resurrect

Revert of revert of "io_uring: wait potential ->release() on resurrect",
which adds a helper for resurrect not racing completion reinit, as was
removed because of a strange bug with no clear root or link to the
patch.

Was improved, instead of rcu_synchronize(), just wait_for_completion()
because we're at 0 refs and it will happen very shortly. Specifically
use non-interruptible version to ignore all pending signals that may
have ended prior interruptible wait.

This reverts commit cb5e1b81304e089ee3ca948db4d29f71902eb575.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7a080c20f686d026efade810b116b72f88abaff9.1618101759.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: improve hardlink code generation
Pavel Begunkov [Sun, 11 Apr 2021 00:46:39 +0000 (01:46 +0100)]
io_uring: improve hardlink code generation

req_set_fail_links() condition checking is bulky. Even though it's
always in a slow path, it's inlined and generates lots of extra code,
simplify it be moving HARDLINK checking into helpers killing linked
requests.

          text    data     bss     dec     hex filename
before:  79318   12330       8   91656   16608 ./fs/io_uring.o
after:   79126   12330       8   91464   16548 ./fs/io_uring.o

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/96a9387db658a9d5a44ecbfd57c2a62cb888c9b6.1618101759.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: improve sqo stop
Pavel Begunkov [Sun, 11 Apr 2021 00:46:38 +0000 (01:46 +0100)]
io_uring: improve sqo stop

Set IO_SQ_THREAD_SHOULD_STOP before taking sqd lock, so the sqpoll task
sees earlier. Not a problem, it will stop eventually. Also check
invariant that it's stopped only once.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/653b24ee93843a50ff65a45847d9138f5adb76d7.1618101759.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: split file table from rsrc nodes
Pavel Begunkov [Sun, 11 Apr 2021 00:46:37 +0000 (01:46 +0100)]
io_uring: split file table from rsrc nodes

We don't need to store file tables in rsrc nodes, for now it's easier to
handle tables not generically, so move file tables into the context. A
nice side effect is having one less pointer dereference for request with
fixed file initialisation.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/de9fc4cd3545f24c26c03be4556f58ba3d18b9c3.1618101759.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: cleanup buffer register
Pavel Begunkov [Sun, 11 Apr 2021 00:46:36 +0000 (01:46 +0100)]
io_uring: cleanup buffer register

In preparation for more changes do a little cleanup of
io_sqe_buffers_register(). Move all args/invariant checking into it from
io_buffers_map_alloc(), because it's confusing. And add a bit more
cleaning for the loop.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/93292cb9708c8455e5070cc855861d94e11ca042.1618101759.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: add buffer unmap helper
Pavel Begunkov [Sun, 11 Apr 2021 00:46:35 +0000 (01:46 +0100)]
io_uring: add buffer unmap helper

Add a helper for unmapping registered buffers, better than double
indexing and will be reused in the future.

Suggested-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/66cbc6ea863be865bac7b7080ed6a3d5c542b71f.1618101759.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: simplify io_rsrc_data refcounting
Pavel Begunkov [Sun, 11 Apr 2021 00:46:34 +0000 (01:46 +0100)]
io_uring: simplify io_rsrc_data refcounting

We don't take many references of struct io_rsrc_data, only one per each
io_rsrc_node, so using percpu refs is overkill. Use atomic ref instead,
which is much simpler.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1551d90f7c9b183cf2f0d7b5e5b923430acb03fa.1618101759.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: provide io_resubmit_prep() stub for !CONFIG_BLOCK
Jens Axboe [Mon, 12 Apr 2021 12:40:02 +0000 (06:40 -0600)]
io_uring: provide io_resubmit_prep() stub for !CONFIG_BLOCK

Randy reports the following error on CONFIG_BLOCK not being set:

../fs/io_uring.c: In function â€˜kiocb_done’:
../fs/io_uring.c:2766:7: error: implicit declaration of function â€˜io_resubmit_prep’; did you mean â€˜io_put_req’? [-Werror=implicit-function-declaration]
   if (io_resubmit_prep(req)) {

Provide a dummy stub for io_resubmit_prep() like we do for
io_rw_should_reissue(), which also helps remove an ifdef sequence from
io_complete_rw_iopoll() as well.

Fixes: 8c130827f417 ("io_uring: don't alter iopoll reissue fail ret code")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: optimise fill_event() by inlining
Pavel Begunkov [Sun, 11 Apr 2021 00:46:33 +0000 (01:46 +0100)]
io_uring: optimise fill_event() by inlining

There are three cases where we much care about performance of
io_cqring_fill_event() -- flushing inline completions, iopoll and
io_req_complete_post(). Inline a hot part of fill_event() into them.

All others are not as important and we don't want to bloat binary for
them, so add a noinline version of the function for all other use
use cases.

nops test(batch=32): 16.932 vs 17.822 KIOPS

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/a11d59424bf4417aca33f5ec21008bb3b0ebd11e.1618101759.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: always pass cflags into fill_event()
Pavel Begunkov [Sun, 11 Apr 2021 00:46:32 +0000 (01:46 +0100)]
io_uring: always pass cflags into fill_event()

A simple preparation patch inlining io_cqring_fill_event(), which only
role was to pass cflags=0 into an actual fill event. It helps to keep
number of related helpers sane in following patches.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/704f9c85b7d9843e4ad50a9f057200c58f5adc6e.1618101759.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: optimise non-eventfd post-event
Pavel Begunkov [Sun, 11 Apr 2021 00:46:31 +0000 (01:46 +0100)]
io_uring: optimise non-eventfd post-event

Eventfd is not the canonical way of using io_uring, annotate
io_should_trigger_evfd() with likely so it improves code generation for
non-eventfd branch.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/42fdaa51c68d39479f02cef4fe5bcb24624d60fa.1618101759.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: refactor compat_msghdr import
Pavel Begunkov [Sun, 11 Apr 2021 00:46:30 +0000 (01:46 +0100)]
io_uring: refactor compat_msghdr import

Add an entry for user pointer to compat_msghdr into io_connect, so it's
explicit that we may use it as this, and removes annoying casts.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/73fd644dea1518f528d3648981cf777ce6e537e9.1618101759.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: enable inline completion for more cases
Pavel Begunkov [Sun, 11 Apr 2021 00:46:29 +0000 (01:46 +0100)]
io_uring: enable inline completion for more cases

Take advantage of delayed/inline completion flushing and pass right
issue flags for completion of open, open2, fadvise and poll remove
opcodes. All others either already use it or always punted and never
executed inline.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/0badc7512e82f7350b73bb09abbebbecbdd5dab8.1618101759.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: refactor io_close
Pavel Begunkov [Sun, 11 Apr 2021 00:46:28 +0000 (01:46 +0100)]
io_uring: refactor io_close

A small refactoring shrinking it and making easier to read.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/19b24eed7cd491a0243b50366dd2a23b558e2665.1618101759.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: unify files and task cancel
Pavel Begunkov [Sun, 11 Apr 2021 00:46:27 +0000 (01:46 +0100)]
io_uring: unify files and task cancel

Now __io_uring_cancel() and __io_uring_files_cancel() are very similar
and mostly differ by how we count requests, merge them and allow
tctx_inflight() to handle counting.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1a5986a97df4dc1378f3fe0ca1eb483dbcf42112.1618101759.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: track inflight requests through counter
Pavel Begunkov [Sun, 11 Apr 2021 00:46:26 +0000 (01:46 +0100)]
io_uring: track inflight requests through counter

Instead of keeping requests in a inflight_list, just track them with a
per tctx atomic counter. Apart from it being much easier and more
consistent with task cancel, it frees ->inflight_entry from being shared
between iopoll and cancel-track, so less headache for us.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/3c2ee0863cd7eeefa605f3eaff4c1c461a6f1157.1618101759.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: unify task and files cancel loops
Pavel Begunkov [Sun, 11 Apr 2021 00:46:25 +0000 (01:46 +0100)]
io_uring: unify task and files cancel loops

Move tracked inflight number check up the stack into
__io_uring_files_cancel() so it's similar to task cancel. Will be used
for further cleaning.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/dca5a395efebd1e3e0f3bbc6b9640c5e8aa7e468.1618101759.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: simplify apoll hash removal
Pavel Begunkov [Fri, 9 Apr 2021 08:13:21 +0000 (09:13 +0100)]
io_uring: simplify apoll hash removal

hash_del() works well with non-hashed nodes, there's no need to check
if it is hashed first.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: refactor io_poll_complete()
Pavel Begunkov [Fri, 9 Apr 2021 08:13:20 +0000 (09:13 +0100)]
io_uring: refactor io_poll_complete()

Remove error parameter from io_poll_complete(), 0 is always passed,
and do a bit of cleaning on top.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: clean up io_poll_task_func()
Pavel Begunkov [Fri, 9 Apr 2021 08:13:19 +0000 (09:13 +0100)]
io_uring: clean up io_poll_task_func()

io_poll_complete() always fills an event (even an overflowed one), so we
always should do io_cqring_ev_posted() afterwards. And that's what is
currently happening, because second EPOLLONESHOT check is always true,
it can't return !done for oneshots.

Remove those branching, it's much easier to read.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio-wq: Fix io_wq_worker_affinity()
Peter Zijlstra [Thu, 8 Apr 2021 09:44:50 +0000 (11:44 +0200)]
io-wq: Fix io_wq_worker_affinity()

Do not include private headers and do not frob in internals.

On top of that, while the previous code restores the affinity, it
doesn't ensure the task actually moves there if it was running,
leading to the fun situation that it can be observed running outside
of its allowed mask for potentially significant time.

Use the proper API instead.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/YG7QkiUzlEbW85TU@hirez.programming.kicks-ass.net
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: don't attempt re-add of multishot poll request if racing
Jens Axboe [Tue, 6 Apr 2021 15:49:31 +0000 (09:49 -0600)]
io_uring: don't attempt re-add of multishot poll request if racing

We currently allow racy updates to multishot requests, but we can end up
double adding the poll request if both completion and update does it.
Ensure that we skip re-add on the update side if someone else is
completing it.

Fixes: b69de288e913 ("io_uring: allow events and user_data update of running poll requests")
Reported-by: Joakim Hassila <joj@mac.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio-wq: simplify code in __io_worker_busy()
Hao Xu [Tue, 6 Apr 2021 03:08:45 +0000 (11:08 +0800)]
io-wq: simplify code in __io_worker_busy()

Leverage XOR to simplify the code in __io_worker_busy.

Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Link: https://lore.kernel.org/r/1617678525-3129-1-git-send-email-haoxu@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: kill outdated comment about splice punt
Pavel Begunkov [Thu, 1 Apr 2021 14:44:05 +0000 (15:44 +0100)]
io_uring: kill outdated comment about splice punt

The splice/tee comment in io_prep_async_work() isn't relevant since the
section was moved, delete it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/892a549c89c3d422b679677b8e68ffd3fcb736b6.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: encapsulate fixed files into struct
Pavel Begunkov [Thu, 1 Apr 2021 14:44:04 +0000 (15:44 +0100)]
io_uring: encapsulate fixed files into struct

Add struct io_fixed_file representing a single registered file, first to
hide ugly struct file **, which may be misleading, and secondly to
retype it to unsigned long as conversions to it and back to file * for
handling and masking FFS_* flags are getting nasty.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/78669731a605a7614c577c3de552631cfaf0869a.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: refactor file tables alloc/free
Pavel Begunkov [Thu, 1 Apr 2021 14:44:03 +0000 (15:44 +0100)]
io_uring: refactor file tables alloc/free

Introduce a heler io_free_file_tables() doing all the cleaning, there
are several places where it's hand coded. Also move all allocations into
io_sqe_alloc_file_tables() and rename it, so all of it is in one place.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/502a84ebf41ff119b095e59661e678eacb752bf8.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: don't quiesce intial files register
Pavel Begunkov [Thu, 1 Apr 2021 14:44:02 +0000 (15:44 +0100)]
io_uring: don't quiesce intial files register

There is no reason why we would want to fully quiesce ring on
IORING_REGISTER_FILES, if it's already registered we fail.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/563bb8060bb2d3efbc32fce6101678281c574d2a.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: set proper FFS* flags on reg file update
Pavel Begunkov [Thu, 1 Apr 2021 14:44:01 +0000 (15:44 +0100)]
io_uring: set proper FFS* flags on reg file update

Set FFS_* flags (e.g. FFS_ASYNC_READ) not only in initial registration
but also on registered files update. Not a bug, but may miss getting
profit out of the feature.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/df29a841a2d3d3695b509cdffce5070777d9d942.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: deduplicate NOSIGNAL setting
Pavel Begunkov [Thu, 1 Apr 2021 14:44:00 +0000 (15:44 +0100)]
io_uring: deduplicate NOSIGNAL setting

Set MSG_NOSIGNAL and REQ_F_NOWAIT in send/recv prep routines and don't
duplicate it in all four send/recv handlers.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e1133a3ed1c0e192975b7341ea4b0bf91f63b132.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: put link timeout req consistently
Pavel Begunkov [Thu, 1 Apr 2021 14:43:59 +0000 (15:43 +0100)]
io_uring: put link timeout req consistently

Don't put linked timeout req in io_async_find_and_cancel() but do it in
io_link_timeout_fn(), so we have only one point for that and won't have
to do it differently as it's now (put vs put_deferred). Btw, improve a
bit io_async_find_and_cancel()'s locking.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d75b70957f245275ab7cba83e0ac9c1b86aae78a.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: simplify overflow handling
Pavel Begunkov [Thu, 1 Apr 2021 14:43:58 +0000 (15:43 +0100)]
io_uring: simplify overflow handling

Overflowed CQEs doesn't lock requests anymore, so we don't care so much
about cancelling them, so kill cq_overflow_flushed and simplify the
code.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/5799867aeba9e713c32f49aef78e5e1aef9fbc43.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: lock annotate timeouts and poll
Pavel Begunkov [Thu, 1 Apr 2021 14:43:57 +0000 (15:43 +0100)]
io_uring: lock annotate timeouts and poll

Add timeout and poll ->comletion_lock annotations for Sparse, makes life
easier while looking at the functions.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/2345325643093d41543383ba985a735aeb899eac.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: kill unused forward decls
Pavel Begunkov [Thu, 1 Apr 2021 14:43:56 +0000 (15:43 +0100)]
io_uring: kill unused forward decls

Kill unused forward declarations for io_ring_file_put() and
io_queue_next(). Also btw rename the first one.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/64aa27c3f9662e14615cc119189f5eaf12989671.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: store reg buffer end instead of length
Pavel Begunkov [Thu, 1 Apr 2021 14:43:55 +0000 (15:43 +0100)]
io_uring: store reg buffer end instead of length

It's a bit more convenient for us to store a registered buffer end
address instead of length, see struct io_mapped_ubuf, as it allow to not
recompute it every time.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/39164403fe92f1dc437af134adeec2423cdf9395.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: improve import_fixed overflow checks
Pavel Begunkov [Thu, 1 Apr 2021 14:43:54 +0000 (15:43 +0100)]
io_uring: improve import_fixed overflow checks

Replace a hand-coded overflow check with a specialised function. Even
though compilers are smart enough to generate identical binary (i.e.
check carry bit), but it's more foolproof and conveys the intention
better.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e437dcdc929bacbb6f11a4824ecbbf17225cb82a.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: refactor io_async_cancel()
Pavel Begunkov [Thu, 1 Apr 2021 14:43:53 +0000 (15:43 +0100)]
io_uring: refactor io_async_cancel()

Remove extra tctx==NULL checks that are already done by
io_async_cancel_one().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/70c2a8b958d942e86958a28af0452966ce1095b0.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: remove unused hash_wait
Pavel Begunkov [Thu, 1 Apr 2021 14:43:52 +0000 (15:43 +0100)]
io_uring: remove unused hash_wait

No users of io_uring_ctx::hash_wait left, kill it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e25cb83c233a5f75f15275596b49fbafbea606fa.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: better ref handling in poll_remove_one
Pavel Begunkov [Thu, 1 Apr 2021 14:43:51 +0000 (15:43 +0100)]
io_uring: better ref handling in poll_remove_one

Instead of io_put_req() to drop not a final ref, use req_ref_put(),
which is slimmer and will also check the invariant.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/85b5774ce13ae55cc2e705abdc8cbafe1212f1bd.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: combine lock/unlock sections on exit
Pavel Begunkov [Thu, 1 Apr 2021 14:43:50 +0000 (15:43 +0100)]
io_uring: combine lock/unlock sections on exit

io_ring_exit_work() already does uring_lock lock/unlock, no need to
repeat it for lock waiting trick in io_ring_ctx_free(). Move the waiting
with comments and spinlocking into io_ring_exit_work.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/a8ae0589b0ea64ad4791e2c282e4e9b713dd7024.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: remove useless is_dying check on quiesce
Pavel Begunkov [Thu, 1 Apr 2021 14:43:48 +0000 (15:43 +0100)]
io_uring: remove useless is_dying check on quiesce

rsrc_data refs should always be valid for potential submitters,
io_rsrc_ref_quiesce() restores it before unlocking, so
percpu_ref_is_dying() check in io_sqe_files_unregister() does nothing
and misleading. Concurrent quiesce is prevented with
struct io_rsrc_data::quiesce.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/bf97055e1748ee3a382e66daf384a469eb90b931.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: reuse io_rsrc_node_destroy()
Pavel Begunkov [Thu, 1 Apr 2021 14:43:47 +0000 (15:43 +0100)]
io_uring: reuse io_rsrc_node_destroy()

Reuse io_rsrc_node_destroy() in __io_rsrc_put_work(). Also move it to a
more appropriate place -- to the other node routines, and remove forward
declaration.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/cccafba41aee1e5bb59988704885b1340aef3a27.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: ctx-wide rsrc nodes
Pavel Begunkov [Thu, 1 Apr 2021 14:43:46 +0000 (15:43 +0100)]
io_uring: ctx-wide rsrc nodes

If we're going to ever support multiple types of resources we need
shared rsrc nodes to not bloat requests, that is implemented in this
patch. It also gives a nicer API and saves one pointer dereference
in io_req_set_rsrc_node().

We may say that all requests bound to a resource belong to one and only
one rsrc node, and considering that nodes are removed and recycled
strictly in-order, this separates requests into generations, where
generation are changed on each node switch (i.e. io_rsrc_node_switch()).

The API is simple, io_rsrc_node_switch() switches to a new generation if
needed, and also optionally kills a passed in io_rsrc_data. Each call to
io_rsrc_node_switch() have to be preceded with
io_rsrc_node_switch_start(). The start function is idempotent and should
not necessarily be followed by switch.

One difference is that once a node was set it will always retain a valid
rsrc node, even on unregister. It may be a nuisance at the moment, but
makes much sense for multiple types of resources. Another thing changed
is that nodes are bound to/associated with a io_rsrc_data later just
before killing (i.e. switching).

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7e9c693b4b9a2f47aa784b616ce29843021bb65a.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: refactor io_queue_rsrc_removal()
Pavel Begunkov [Thu, 1 Apr 2021 14:43:45 +0000 (15:43 +0100)]
io_uring: refactor io_queue_rsrc_removal()

Pass rsrc_node into io_queue_rsrc_removal() explicitly. Just a
simple preparation patch, makes following changes nicer.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/002889ce4de7baf287f2b010eef86ffe889174c6.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: move rsrc_put callback into io_rsrc_data
Pavel Begunkov [Thu, 1 Apr 2021 14:43:44 +0000 (15:43 +0100)]
io_uring: move rsrc_put callback into io_rsrc_data

io_rsrc_node's callback operates only on a single io_rsrc_data and only
with its resources, so rsrc_put() callback is actually a property of
io_rsrc_data. Move it there, it makes code much nicecr.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/9417c2fba3c09e8668f05747006a603d416d34b4.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: encapsulate rsrc node manipulations
Pavel Begunkov [Thu, 1 Apr 2021 14:43:43 +0000 (15:43 +0100)]
io_uring: encapsulate rsrc node manipulations

io_rsrc_node_get() and io_rsrc_node_set() are always used together,
merge them into one so most users don't even see io_rsrc_node and don't
need to care about it.

It helped to catch io_sqe_files_register() inferring rsrc data argument
for get and set differently, not a problem but a good sign.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/0827b080b2e61b3dec795380f7e1a1995595d41f.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: use rsrc prealloc infra for files reg
Pavel Begunkov [Thu, 1 Apr 2021 14:43:42 +0000 (15:43 +0100)]
io_uring: use rsrc prealloc infra for files reg

Keep it consistent with update and use io_rsrc_node_prealloc() +
io_rsrc_node_get() in io_sqe_files_register() as well, that will be used
in future patches, not as error prone and allows to deduplicate
rsrc_node init.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/cf87321e6be5e38f4dc7fe5079d2aa6945b1ace0.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: simplify io_rsrc_node_ref_zero
Pavel Begunkov [Thu, 1 Apr 2021 14:43:41 +0000 (15:43 +0100)]
io_uring: simplify io_rsrc_node_ref_zero

Replace queue_delayed_work() with mod_delayed_work() in
io_rsrc_node_ref_zero() as the later one can schedule a new work, and
cleanup it further for better readability.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/3b2b23e3a1ea4bbf789cd61815d33e05d9ff945e.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: name rsrc bits consistently
Pavel Begunkov [Thu, 1 Apr 2021 14:43:40 +0000 (15:43 +0100)]
io_uring: name rsrc bits consistently

Keep resource related structs' and functions' naming consistent, in
particular use "io_rsrc" prefix for everything.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/962f5acdf810f3a62831e65da3932cde24f6d9df.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio-wq: cancel task_work on exit only targeting the current 'wq'
Jens Axboe [Fri, 2 Apr 2021 01:57:07 +0000 (19:57 -0600)]
io-wq: cancel task_work on exit only targeting the current 'wq'

With using task_work_cancel(), we're potentially canceling task_work
that isn't related to this specific io_wq. Use the newly added
task_work_cancel_match() to ensure that we only remove and cancel work
items that are specific to this io_wq.

Fixes: 685fe7feedb9 ("io-wq: eliminate the need for a manager thread")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agotask_work: add helper for more targeted task_work canceling
Jens Axboe [Fri, 2 Apr 2021 01:53:29 +0000 (19:53 -0600)]
task_work: add helper for more targeted task_work canceling

The only exported helper we have right now is task_work_cancel(), which
cancels any task_work from a given task where func matches the queued
work item. This is a bit too coarse for some use cases. Add a
task_work_cancel_match() that allows to more specifically target
individual work items outside of purely the callback function used.

task_work_cancel() can be trivially implemented on top of that, hence do
so.

Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: fix race around poll update and poll triggering
Jens Axboe [Wed, 31 Mar 2021 15:03:03 +0000 (09:03 -0600)]
io_uring: fix race around poll update and poll triggering

Joakim reports that in some conditions he sees a multishot poll request
being canceled, and that it coincides with getting -EALREADY on
modification. As part of the poll update procedure, there's a small window
where the request is marked as canceled, and if this coincides with the
event actually triggering, then we can get a spurious -ECANCELED and
termination of the multishot request.

Don't mark the poll request as being canceled for update. We also don't
care if we race on removal unless it's a one-shot request, we can safely
updated for either case.

Fixes: b69de288e913 ("io_uring: allow events and user_data update of running poll requests")
Reported-by: Joakim Hassila <joj@mac.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: reg buffer overflow checks hardening
Pavel Begunkov [Wed, 24 Mar 2021 22:59:01 +0000 (22:59 +0000)]
io_uring: reg buffer overflow checks hardening

We are safe with overflows in io_sqe_buffer_register() because it will
just yield alloc failure, but it's nicer to check explicitly.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/2b0625551be3d97b80a5fd21c8cd79dc1c91f0b5.1616624589.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: allow SQPOLL without CAP_SYS_ADMIN or CAP_SYS_NICE
Jens Axboe [Thu, 25 Mar 2021 16:21:35 +0000 (10:21 -0600)]
io_uring: allow SQPOLL without CAP_SYS_ADMIN or CAP_SYS_NICE

Now that we have any worker being attached to the original task as
threads, accounting of CPU time is directly attributed to the original
task as well. This means that we no longer have to restrict SQPOLL to
needing elevated privileges, as it's really no different from just having
the task spawn a busy looping thread in userspace.

Reported-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio-wq: eliminate the need for a manager thread
Jens Axboe [Mon, 8 Mar 2021 16:37:51 +0000 (09:37 -0700)]
io-wq: eliminate the need for a manager thread

io-wq relies on a manager thread to create/fork new workers, as needed.
But there's really no strong need for it anymore. We have the following
cases that fork a new worker:

1) Work queue. This is done from the task itself always, and it's trivial
   to create a worker off that path, if needed.

2) All workers have gone to sleep, and we have more work. This is called
   off the sched out path. For this case, use a task_work items to queue
   a fork-worker operation.

3) Hashed work completion. Don't think we need to do anything off this
   case. If need be, it could just use approach 2 as well.

Part of this change is incrementing the running worker count before the
fork, to avoid cases where we observe we need a worker and then queue
creation of one. Then new work comes in, we fork a new one. That last
queue operation should have waited for the previous worker to come up,
it's quite possible we don't even need it. Hence move the worker running
from before we fork it off to more efficiently handle that case.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agokernel: allow fork with TIF_NOTIFY_SIGNAL pending
Jens Axboe [Mon, 22 Mar 2021 15:39:12 +0000 (09:39 -0600)]
kernel: allow fork with TIF_NOTIFY_SIGNAL pending

fork() fails if signal_pending() is true, but there are two conditions
that can lead to that:

1) An actual signal is pending. We want fork to fail for that one, like
   we always have.

2) TIF_NOTIFY_SIGNAL is pending, because the task has pending task_work.
   We don't need to make it fail for that case.

Allow fork() to proceed if just task_work is pending, by changing the
signal_pending() check to task_sigpending().

Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: allow events and user_data update of running poll requests
Jens Axboe [Wed, 17 Mar 2021 14:37:41 +0000 (08:37 -0600)]
io_uring: allow events and user_data update of running poll requests

This adds two new POLL_ADD flags, IORING_POLL_UPDATE_EVENTS and
IORING_POLL_UPDATE_USER_DATA. As with the other POLL_ADD flag, these are
masked into sqe->len. If set, the POLL_ADD will have the following
behavior:

- sqe->addr must contain the the user_data of the poll request that
  needs to be modified. This field is otherwise invalid for a POLL_ADD
  command.

- If IORING_POLL_UPDATE_EVENTS is set, sqe->poll_events must contain the
  new mask for the existing poll request. There are no checks for whether
  these are identical or not, if a matching poll request is found, then it
  is re-armed with the new mask.

- If IORING_POLL_UPDATE_USER_DATA is set, sqe->off must contain the new
  user_data for the existing poll request.

A POLL_ADD with any of these flags set may complete with any of the
following results:

1) 0, which means that we successfully found the existing poll request
   specified, and performed the re-arm procedure. Any error from that
   re-arm will be exposed as a completion event for that original poll
   request, not for the update request.
2) -ENOENT, if no existing poll request was found with the given
   user_data.
3) -EALREADY, if the existing poll request was already in the process of
   being removed/canceled/completing.
4) -EACCES, if an attempt was made to modify an internal poll request
   (eg not one originally issued ass IORING_OP_POLL_ADD).

The usual -EINVAL cases apply as well, if any invalid fields are set
in the sqe for this command type.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: abstract out a io_poll_find_helper()
Jens Axboe [Wed, 17 Mar 2021 14:17:19 +0000 (08:17 -0600)]
io_uring: abstract out a io_poll_find_helper()

We'll need this helper for another purpose, for now just abstract it
out and have io_poll_cancel() use it for lookups.

Signed-off-by: Jens Axboe <axboe@kernel.dk>