Jason Gunthorpe [Tue, 10 Mar 2020 09:25:45 +0000 (11:25 +0200)]
RDMA/cm: Make sure the cm_id is in the IB_CM_IDLE state in destroy
The first switch statement in cm_destroy_id() tries to move the ID to
either IB_CM_IDLE or IB_CM_TIMEWAIT. Both states will block concurrent
MAD handlers from progressing.
Previous patches removed the unreliably lock/unlock sequences in this
flow, this patch removes the extra locking steps and adds the missing
parts to guarantee that destroy reaches IB_CM_IDLE. There is no point in
leaving the ID in the IB_CM_TIMEWAIT state the memory about to be kfreed.
Rework things to hold the lock across all the state transitions and
directly assert when done that it ended up in IB_CM_IDLE as expected.
This was accompanied by a careful audit of all the state transitions here,
which generally did end up in IDLE on their success and non-racy paths.
Link: https://lore.kernel.org/r/20200310092545.251365-16-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Jason Gunthorpe [Tue, 10 Mar 2020 09:25:44 +0000 (11:25 +0200)]
RDMA/cm: Allow ib_send_cm_sidr_rep() to be done under lock
The first thing ib_send_cm_sidr_rep() does is obtain the lock, so use the
usual unlocked wrapper, locked actor pattern here.
Get rid of the cm_reject_sidr_req() wrapper so each call site can call the
locked or unlocked version as required.
This avoids a sketchy lock/unlock sequence (which could allow state to
change) during cm_destroy_id().
Link: https://lore.kernel.org/r/20200310092545.251365-15-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Jason Gunthorpe [Tue, 10 Mar 2020 09:25:43 +0000 (11:25 +0200)]
RDMA/cm: Allow ib_send_cm_rej() to be done under lock
The first thing ib_send_cm_rej() does is obtain the lock, so use the usual
unlocked wrapper, locked actor pattern here.
This avoids a sketchy lock/unlock sequence (which could allow state to
change) during cm_destroy_id().
While here simplify some of the logic in the implementation.
Link: https://lore.kernel.org/r/20200310092545.251365-14-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Jason Gunthorpe [Tue, 10 Mar 2020 09:25:42 +0000 (11:25 +0200)]
RDMA/cm: Allow ib_send_cm_drep() to be done under lock
The first thing ib_send_cm_drep() does is obtain the lock, so use the
usual unlocked wrapper, locked actor pattern here.
This avoids a sketchy lock/unlock sequence (which could allow state to
change) during cm_destroy_id().
Link: https://lore.kernel.org/r/20200310092545.251365-13-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Jason Gunthorpe [Tue, 10 Mar 2020 09:25:41 +0000 (11:25 +0200)]
RDMA/cm: Allow ib_send_cm_dreq() to be done under lock
The first thing ib_send_cm_dreq() does is obtain the lock, so use the
usual unlocked wrapper, locked actor pattern here.
This avoids a sketchy lock/unlock sequence (which could allow state to
change) during cm_destroy_id().
Link: https://lore.kernel.org/r/20200310092545.251365-12-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Jason Gunthorpe [Tue, 10 Mar 2020 09:25:40 +0000 (11:25 +0200)]
RDMA/cm: Add some lockdep assertions for cm_id_priv->lock
These functions all touch state, so must be called under the lock.
Inspection shows this is currently true.
Link: https://lore.kernel.org/r/20200310092545.251365-11-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Jason Gunthorpe [Tue, 10 Mar 2020 09:25:39 +0000 (11:25 +0200)]
RDMA/cm: Add missing locking around id.state in cm_dup_req_handler
All accesses to id.state must be done under the spinlock.
Fixes:
a977049dacde ("[PATCH] IB: Add the kernel CM implementation")
Link: https://lore.kernel.org/r/20200310092545.251365-10-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Jason Gunthorpe [Tue, 10 Mar 2020 09:25:38 +0000 (11:25 +0200)]
RDMA/cm: Make it clearer how concurrency works in cm_req_handler()
ib_crate_cm_id() immediately places the id in the xarray, and publishes it
into the remote_id and remote_qpn rbtrees. This makes it visible to other
threads before it is fully set up.
It appears the thinking here was that the states IB_CM_IDLE and
IB_CM_REQ_RCVD do not allow any MAD handler or lookup in the remote_id and
remote_qpn rbtrees to advance.
However, cm_rej_handler() does take an action on IB_CM_REQ_RCVD, which is
not really expected by the design.
Make the whole thing clearer:
- Keep the new cm_id out of the xarray until it is completely set up.
This directly prevents MAD handlers and all rbtree lookups from seeing
the pointer.
- Move all the trivial setup right to the top so it is obviously done
before any concurrency begins
- Move the mutation of the cm_id_priv out of cm_match_id() and into the
caller so the state transition is obvious
- Place the manipulation of the work_list at the end, under lock, after
the cm_id is placed in the xarray. The work_count cannot change on an
ID outside the xarray.
- Add some comments
Link: https://lore.kernel.org/r/20200310092545.251365-9-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Jason Gunthorpe [Tue, 10 Mar 2020 09:25:37 +0000 (11:25 +0200)]
RDMA/cm: Make it clear that there is no concurrency in cm_sidr_req_handler()
ib_create_cm_id() immediately places the id in the xarray, so it is visible
to network traffic.
The state is initially set to IB_CM_IDLE and all the MAD handlers will
test this state under lock and refuse to advance from IDLE, so adding to
the xarray is harmless.
Further, the set to IB_CM_SIDR_REQ_RCVD also excludes all MAD handlers.
However, the local_id isn't even used for SIDR mode, and there will be no
input MADs related to the newly created ID.
So, make the whole flow simpler so it can be understood:
- Do not put the SIDR cm_id in the xarray. This directly shows that there
is no concurrency
- Delete the confusing work_count and pending_list manipulations. This
mechanism is only used by MAD handlers and timewait, neither of which
apply to SIDR.
- Add a few comments and rename 'cur_cm_id_priv' to 'listen_cm_id_priv'
- Move other loose sets up to immediately after cm_id creation so that
the cm_id is fully configured right away. This fixes an oversight where
the service_id will not be returned back on a IB_SIDR_UNSUPPORTED
reject.
Link: https://lore.kernel.org/r/20200310092545.251365-8-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Jason Gunthorpe [Tue, 10 Mar 2020 09:25:36 +0000 (11:25 +0200)]
RDMA/cm: Read id.state under lock when doing pr_debug()
The lock should not be dropped before doing the pr_debug() print as it is
accessing data protected by the lock, such as id.state.
Fixes:
119bf81793ea ("IB/cm: Add debug prints to ib_cm")
Link: https://lore.kernel.org/r/20200310092545.251365-7-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Jason Gunthorpe [Tue, 10 Mar 2020 09:25:35 +0000 (11:25 +0200)]
RDMA/cm: Simplify establishing a listen cm_id
Any manipulation of cm_id->state must be done under the cm_id_priv->lock,
the two routines that added listens did not follow this rule, because they
never participate in any concurrent access around the state.
However, since this exception makes the code hard to understand, simplify
the flow so that it can be fully locked:
- Move manipulation of listen_sharecount into cm_insert_listen() so it is
trivially under the cm.lock without having to expose the cm.lock to the
caller.
- Push the cm.lock down into cm_insert_listen() and have the function
increment the reference count before returning an existing pointer.
- Split ib_cm_listen() into an cm_init_listen() and do not call
ib_cm_listen() from ib_cm_insert_listen()
- Make both ib_cm_listen() and ib_cm_insert_listen() directly call
cm_insert_listen() under their cm_id_priv->lock which does both a
collision detect and, if needed, the insert (atomically)
- Enclose all state manipulation within the cm_id_priv->lock, notice this
set can be done safely after cm_insert_listen() as no reader is allowed
to read the state without holding the lock.
- Do not set the listen cm_id in the xarray, as it is never correct to
look it up. This makes the concurrency simpler to understand.
Many needless error unwinds are removed in the process.
Link: https://lore.kernel.org/r/20200310092545.251365-6-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Jason Gunthorpe [Tue, 10 Mar 2020 09:25:34 +0000 (11:25 +0200)]
RDMA/cm: Make the destroy_id flow more robust
Too much of the destruction is very carefully sensitive to the state
and various other things. Move more code to the unconditional path and
add several WARN_ONs to check consistency.
Link: https://lore.kernel.org/r/20200310092545.251365-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Jason Gunthorpe [Tue, 10 Mar 2020 09:25:33 +0000 (11:25 +0200)]
RDMA/cm: Remove a race freeing timewait_info
When creating a cm_id during REQ the id immediately becomes visible to the
other MAD handlers, and shortly after the state is moved to IB_CM_REQ_RCVD
This allows cm_rej_handler() to run concurrently and free the work:
CPU 0 CPU1
cm_req_handler()
ib_create_cm_id()
cm_match_req()
id_priv->state = IB_CM_REQ_RCVD
cm_rej_handler()
cm_acquire_id()
spin_lock(&id_priv->lock)
switch (id_priv->state)
case IB_CM_REQ_RCVD:
cm_reset_to_idle()
kfree(id_priv->timewait_info);
goto destroy
destroy:
kfree(id_priv->timewait_info);
id_priv->timewait_info = NULL
Causing a double free or worse.
Do not free the timewait_info without also holding the
id_priv->lock. Simplify this entire flow by making the free unconditional
during cm_destroy_id() and removing the confusing special case error
unwind during creation of the timewait_info.
This also fixes a leak of the timewait if cm_destroy_id() is called in
IB_CM_ESTABLISHED with an XRC TGT QP. The state machine will be left in
ESTABLISHED while it needed to transition through IB_CM_TIMEWAIT to
release the timewait pointer.
Also fix a leak of the timewait_info if the caller mis-uses the API and
does ib_send_cm_reqs().
Fixes:
a977049dacde ("[PATCH] IB: Add the kernel CM implementation")
Link: https://lore.kernel.org/r/20200310092545.251365-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Jason Gunthorpe [Tue, 10 Mar 2020 09:25:32 +0000 (11:25 +0200)]
RDMA/cm: Fix checking for allowed duplicate listens
The test here typod the cm_id_priv to use, it used the one that was
freshly allocated. By definition the allocated one has the matching
cm_handler and zero context, so the condition was always true.
Instead check that the existing listening ID is compatible with the
proposed handler so that it can be shared, as was originally intended.
Fixes:
067b171b8679 ("IB/cm: Share listening CM IDs")
Link: https://lore.kernel.org/r/20200310092545.251365-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Jason Gunthorpe [Tue, 10 Mar 2020 09:25:31 +0000 (11:25 +0200)]
RDMA/cm: Fix ordering of xa_alloc_cyclic() in ib_create_cm_id()
xa_alloc_cyclic() is a SMP release to be paired with some later acquire
during xa_load() as part of cm_acquire_id().
As such, xa_alloc_cyclic() must be done after the cm_id is fully
initialized, in particular, it absolutely must be after the
refcount_set(), otherwise the refcount_inc() in cm_acquire_id() may not
see the set.
As there are several cases where a reader will be able to use the
id.local_id after cm_acquire_id in the IB_CM_IDLE state there needs to be
an unfortunate split into a NULL allocate and a finalizing xa_store.
Fixes:
a977049dacde ("[PATCH] IB: Add the kernel CM implementation")
Link: https://lore.kernel.org/r/20200310092545.251365-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Weihang Li [Tue, 10 Mar 2020 13:06:09 +0000 (21:06 +0800)]
RDMA/hns: Fix wrong judgments of udata->outlen
These judgments were used to keep the compatibility with older versions of
userspace that don't have the field named "cap_flags" in structure
hns_roce_ib_create_cq_resp. But it will be wrong to compare outlen with
the size of resp if another new field were added in resp. oulen should be
compared with the end offset of cap_flags in resp.
Fixes:
4f8f0d5e33dd ("RDMA/hns: Package the flow of creating cq")
Link: https://lore.kernel.org/r/1583845569-47257-1-git-send-email-liweihang@huawei.com
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Jason Gunthorpe [Fri, 13 Mar 2020 14:11:07 +0000 (11:11 -0300)]
Merge branch 'mlx5_mr_cache' into rdma.git for-next
Leon Romanovsky says:
====================
This series fixes various corner cases in the mlx5_ib MR cache
implementation, see specific commit messages for more information.
====================
Based on the mlx5-next branch at
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Due to dependencies
* branch 'mlx5_mr-cache':
RDMA/mlx5: Allow MRs to be created in the cache synchronously
RDMA/mlx5: Revise how the hysteresis scheme works for cache filling
RDMA/mlx5: Fix locking in MR cache work queue
RDMA/mlx5: Lock access to ent->available_mrs/limit when doing queue_work
RDMA/mlx5: Fix MR cache size and limit debugfs
RDMA/mlx5: Always remove MRs from the cache before destroying them
RDMA/mlx5: Simplify how the MR cache bucket is located
RDMA/mlx5: Rename the tracking variables for the MR cache
RDMA/mlx5: Replace spinlock protected write with atomic var
{IB,net}/mlx5: Move asynchronous mkey creation to mlx5_ib
{IB,net}/mlx5: Assign mkey variant in mlx5_ib only
{IB,net}/mlx5: Setup mkey variant before mr create command invocation
Jason Gunthorpe [Tue, 10 Mar 2020 08:22:38 +0000 (10:22 +0200)]
RDMA/mlx5: Allow MRs to be created in the cache synchronously
If the cache is completely out of MRs, and we are running in cache mode,
then directly, and synchronously, create an MR that is compatible with the
cache bucket using a sleeping mailbox command. This ensures that the
thread that is waiting for the MR absolutely will get one.
When a MR allocated in this way becomes freed then it is compatible with
the cache bucket and will be recycled back into it.
Deletes the very buggy ent->compl scheme to create a synchronous MR
allocation.
Link: https://lore.kernel.org/r/20200310082238.239865-13-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Jason Gunthorpe [Tue, 10 Mar 2020 08:22:37 +0000 (10:22 +0200)]
RDMA/mlx5: Revise how the hysteresis scheme works for cache filling
Currently if the work queue is running then it is in 'hysteresis' mode and
will fill until the cache reaches the high water mark. This implicit state
is very tricky and doesn't interact with pending very well.
Instead of self re-scheduling the work queue after the add_keys() has
started to create the new MR, have the queue scheduled from
reg_mr_callback() only after the requested MR has been added.
This avoids the bad design of an in-rush of queue'd work doing back to
back add_keys() until EAGAIN then sleeping. The add_keys() will be paced
one at a time as they complete, slowly filling up the cache.
Also, fix pending to be only manipulated under lock.
Link: https://lore.kernel.org/r/20200310082238.239865-12-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Jason Gunthorpe [Tue, 10 Mar 2020 08:22:36 +0000 (10:22 +0200)]
RDMA/mlx5: Fix locking in MR cache work queue
All of the members of mlx5_cache_ent must be accessed while holding the
spinlock, add the missing spinlock in the __cache_work_func().
Using cache->stopped and flush_workqueue() is an inherently racy way to
shutdown self-scheduling work on a queue. Replace it with ent->disabled
under lock, and always check disabled before queuing any new work. Use
cancel_work_sync() to shutdown the queue.
Use READ_ONCE/WRITE_ONCE for dev->last_add to manage concurrency as
coherency is less important here.
Split fill_delay from the bitfield. C bitfield updates are not atomic and
this is just a mess. Use READ_ONCE/WRITE_ONCE, but this could also use
test_bit()/set_bit().
Link: https://lore.kernel.org/r/20200310082238.239865-11-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Jason Gunthorpe [Tue, 10 Mar 2020 08:22:35 +0000 (10:22 +0200)]
RDMA/mlx5: Lock access to ent->available_mrs/limit when doing queue_work
Accesses to these members needs to be locked. There is no reason not to
hold a spinlock while calling queue_work(), so move the tests into a
helper and always call it under lock.
The helper should be called when available_mrs is adjusted.
Link: https://lore.kernel.org/r/20200310082238.239865-10-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Jason Gunthorpe [Tue, 10 Mar 2020 08:22:34 +0000 (10:22 +0200)]
RDMA/mlx5: Fix MR cache size and limit debugfs
The size_write function is supposed to adjust the total_mr's to match the
user's request, but lacks locking and safety checking.
total_mrs can only be adjusted by at most available_mrs. mrs already
assigned to users cannot be revoked. Ensure that the user provides a
target value within the range of available_mrs and within the high/low
water mark.
limit_write has confusing and wrong sanity checking, and doesn't have the
ability to deallocate on limit reduction.
Since both functions use the same algorithm to adjust the available_mrs,
consolidate it into one function and write it correctly. Fix the locking
and by holding the spinlock for all accesses to ent->X.
Always fail if the user provides a malformed string.
Fixes:
e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters")
Link: https://lore.kernel.org/r/20200310082238.239865-9-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Jason Gunthorpe [Tue, 10 Mar 2020 08:22:33 +0000 (10:22 +0200)]
RDMA/mlx5: Always remove MRs from the cache before destroying them
The cache bucket tracks the total number of MRs that exists, both inside
and outside of the cache. Removing a MR from the cache (by setting
cache_ent to NULL) without updating total_mrs will cause the tracking to
leak and be inflated.
Further fix the rereg_mr path to always destroy the MR. reg_create will
always overwrite all the MR data in mlx5_ib_mr, so the MR must be
completely destroyed, in all cases, before this function can be
called. Detach the MR from the cache and unconditionally destroy it to
avoid leaking HW mkeys.
Fixes:
afd1417404fb ("IB/mlx5: Use direct mkey destroy command upon UMR unreg failure")
Fixes:
56e11d628c5d ("IB/mlx5: Added support for re-registration of MRs")
Link: https://lore.kernel.org/r/20200310082238.239865-8-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Jason Gunthorpe [Tue, 10 Mar 2020 08:22:32 +0000 (10:22 +0200)]
RDMA/mlx5: Simplify how the MR cache bucket is located
There are many bad APIs here that are accepting a cache bucket index
instead of a bucket pointer. Many of the callers already have a bucket
pointer, so this results in a lot of confusing uses of order2idx().
Pass the struct mlx5_cache_ent into add_keys(), remove_keys(), and
alloc_cached_mr().
Once the MR is in the cache, store the cache bucket pointer directly in
the MR, replacing the 'bool allocated_from cache'.
In the end there is only one place that needs to form index from order,
alloc_mr_from_cache(). Increase the safety of this function by disallowing
it from accessing cache entries in the ODP special area.
Link: https://lore.kernel.org/r/20200310082238.239865-7-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Jason Gunthorpe [Tue, 10 Mar 2020 08:22:31 +0000 (10:22 +0200)]
RDMA/mlx5: Rename the tracking variables for the MR cache
The old names do not clearly indicate the intent.
Link: https://lore.kernel.org/r/20200310082238.239865-6-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Saeed Mahameed [Tue, 10 Mar 2020 08:22:29 +0000 (10:22 +0200)]
RDMA/mlx5: Replace spinlock protected write with atomic var
mkey variant calculation was spinlock protected to make it atomic, replace
that with one atomic variable.
Link: https://lore.kernel.org/r/20200310082238.239865-4-leon@kernel.org
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Michael Guralnik [Tue, 10 Mar 2020 08:22:30 +0000 (10:22 +0200)]
{IB,net}/mlx5: Move asynchronous mkey creation to mlx5_ib
As mlx5_ib is the only user of the mlx5_core_create_mkey_cb, move the
logic inside mlx5_ib and cleanup the code in mlx5_core.
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Saeed Mahameed [Tue, 10 Mar 2020 08:22:28 +0000 (10:22 +0200)]
{IB,net}/mlx5: Assign mkey variant in mlx5_ib only
mkey variant is not required for mlx5_core use, move the mkey variant
counter to mlx5_ib.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Saeed Mahameed [Tue, 10 Mar 2020 08:22:27 +0000 (10:22 +0200)]
{IB,net}/mlx5: Setup mkey variant before mr create command invocation
On reg_mr_callback() mlx5_ib is recalculating the mkey variant which is
wrong and will lead to using a different key variant than the one
submitted to firmware on create mkey command invocation.
To fix this, we store the mkey variant before invoking the firmware
command and use it later on completion (reg_mr_callback).
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Leon Romanovsky [Tue, 10 Mar 2020 09:14:32 +0000 (11:14 +0200)]
RDMA/cm: Delete not implemented CM peer to peer communication
Peer to peer support was never implemented, so delete it to make code less
clutter.
Link: https://lore.kernel.org/r/20200310091438.248429-6-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Leon Romanovsky [Tue, 10 Mar 2020 09:14:31 +0000 (11:14 +0200)]
RDMA/mlx5: Use offsetofend() instead of duplicated variant
Convert mlx5 driver to use offsetofend() instead of its duplicated
variant.
Link: https://lore.kernel.org/r/20200310091438.248429-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Leon Romanovsky [Tue, 10 Mar 2020 09:14:29 +0000 (11:14 +0200)]
RDMA/mlx4: Delete duplicated offsetofend implementation
Convert mlx4 to use in-kernel offsetofend() instead
of its duplicated implementation.
Link: https://lore.kernel.org/r/20200310091438.248429-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Alex Vesker [Thu, 5 Mar 2020 12:38:41 +0000 (14:38 +0200)]
IB/mlx5: Replace tunnel mpls capability bits for tunnel_offloads
Until now the flex parser capability was used in ib_query_device() to
indicate tunnel_offloads_caps support for mpls_over_gre/mpls_over_udp.
Newer devices and firmware will have configurations with the flexparser
but without mpls support.
Testing for the flex parser capability was a mistake, the tunnel_stateless
capability was intended for detecting mpls and was introduced at the same
time as the flex parser capability.
Otherwise userspace will be incorrectly informed that a future device
supports MPLS when it does not.
Link: https://lore.kernel.org/r/20200305123841.196086-1-leon@kernel.org
Cc: <stable@vger.kernel.org> # 4.17
Fixes:
e818e255a58d ("IB/mlx5: Expose MPLS related tunneling offloads")
Signed-off-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Erez Shitrit [Tue, 10 Mar 2020 07:57:06 +0000 (09:57 +0200)]
RDMA/mlx5: Remove duplicate definitions of SW_ICM macros
Those macros are already defined in include/linux/mlx5/driver.h, so delete
their duplicate variants.
Link: https://lore.kernel.org/r/20200310075706.238592-1-leon@kernel.org
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Zhu Yanjun [Tue, 10 Mar 2020 09:16:56 +0000 (11:16 +0200)]
RDMA/core: Remove the duplicate header file
The header file rdma_core.h is duplicate, so let's remove it.
Fixes:
622db5b6439a ("RDMA/core: Add trace points to follow MR allocation")
Link: https://lore.kernel.org/r/20200310091656.249696-1-leon@kernel.org
Signed-off-by: Zhu Yanjun <yanjunz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Christophe JAILLET [Sun, 8 Mar 2020 06:54:42 +0000 (07:54 +0100)]
RDMA/bnxt_re: Remove a redundant 'memset'
'wqe' is already zeroed at the top of the 'while' loop, just a few lines
below, and is not used outside of the loop.
So there is no need to zero it again, or for the variable to be declared
outside the loop.
Link: https://lore.kernel.org/r/20200308065442.5415-1-christophe.jaillet@wanadoo.fr
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Jason Gunthorpe [Thu, 27 Feb 2020 20:36:51 +0000 (16:36 -0400)]
RDMA/cma: Teach lockdep about the order of rtnl and lock
This lock ordering only happens when bonding is enabled and a certain
bonding related event fires. However, since it can happen this is a global
restriction on lock ordering.
Teach lockdep about the order directly and unconditionally so bugs here
are found quickly.
See https://syzkaller.appspot.com/bug?extid=
55de90ab5f44172b0c90
Link: https://lore.kernel.org/r/20200227203651.GA27185@ziepe.ca
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Max Gurtovoy [Thu, 20 Feb 2020 10:08:19 +0000 (12:08 +0200)]
RDMA/rw: map P2P memory correctly for signature operations
Since RDMA rw API support operations with P2P memory sg list, make sure to
map/unmap the scatter list for signature operation correctly.
Link: https://lore.kernel.org/r/20200220100819.41860-2-maxg@mellanox.com
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Jason Gunthorpe [Tue, 10 Mar 2020 15:49:09 +0000 (12:49 -0300)]
Merge tag 'v5.6-rc5' into rdma.git for-next
Required due to dependencies in following patches.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Jason Gunthorpe [Tue, 10 Mar 2020 14:54:17 +0000 (11:54 -0300)]
Merge branch 'mlx5_packet_pacing' into rdma.git for-next
Yishai Hadas Says:
====================
Expose raw packet pacing APIs to be used by DEVX based applications. The
existing code was refactored to have a single flow with the new raw APIs.
====================
Based on the mlx5-next branch at
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Due to dependencies
* branch 'mlx5_packet_pacing':
IB/mlx5: Introduce UAPIs to manage packet pacing
net/mlx5: Expose raw packet pacing APIs
Yishai Hadas [Wed, 19 Feb 2020 19:05:18 +0000 (21:05 +0200)]
IB/mlx5: Introduce UAPIs to manage packet pacing
Introduce packet pacing uobject and its alloc and destroy
methods.
This uobject holds mlx5 packet pacing context according to the device
specification and enables managing packet pacing device entries that are
needed by DEVX applications.
Link: https://lore.kernel.org/r/20200219190518.200912-3-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Linus Torvalds [Mon, 9 Mar 2020 00:44:44 +0000 (17:44 -0700)]
Linux 5.6-rc5
Linus Torvalds [Mon, 9 Mar 2020 00:36:22 +0000 (17:36 -0700)]
Merge tag 'armsoc-fixes' of git://git./linux/kernel/git/soc/soc
Pull ARM SoC fixes from Olof Johansson:
"We've been accruing these for a couple of weeks, so the batch is a bit
bigger than usual.
Largest delta is due to a led-bl driver that is added -- there was a
miscommunication before the merge window and the driver didn't make it
in. Due to this, the platforms needing it regressed. At this point, it
seemed easier to add the new driver than unwind the changes.
Besides that, there are a handful of various fixes:
- AMD tee memory leak fix
- A handful of fixlets for i.MX SCU communication
- A few maintainers woke up and realized DEBUG_FS had been missing
for a while, so a few updates of that.
... and the usual collection of smaller fixes to various platforms"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (37 commits)
ARM: socfpga_defconfig: Add back DEBUG_FS
arm64: dts: socfpga: agilex: Fix gmac compatible
ARM: bcm2835_defconfig: Explicitly restore CONFIG_DEBUG_FS
arm64: dts: meson: fix gxm-khadas-vim2 wifi
arm64: dts: meson-sm1-sei610: add missing interrupt-names
ARM: meson: Drop unneeded select of COMMON_CLK
ARM: dts: bcm2711: Add pcie0 alias
ARM: dts: bcm283x: Add missing properties to the PWR LED
tee: amdtee: fix memory leak in amdtee_open_session()
ARM: OMAP2+: Fix compile if CONFIG_HAVE_ARM_SMCCC is not set
arm: dts: dra76x: Fix mmc3 max-frequency
ARM: dts: dra7: Add "dma-ranges" property to PCIe RC DT nodes
bus: ti-sysc: Fix 1-wire reset quirk
ARM: dts: r8a7779: Remove deprecated "renesas, rcar-sata" compatible value
soc: imx-scu: Align imx sc msg structs to 4
firmware: imx: Align imx_sc_msg_req_cpu_start to 4
firmware: imx: scu-pd: Align imx sc msg structs to 4
firmware: imx: misc: Align imx sc msg structs to 4
firmware: imx: scu: Ensure sequential TX
ARM: dts: imx7-colibri: Fix frequency for sd/mmc
...
Linus Torvalds [Mon, 9 Mar 2020 00:33:52 +0000 (17:33 -0700)]
Merge tag 'edac_urgent-2020-03-08' of git://git./linux/kernel/git/ras/ras
Pull EDAC fix from Borislav Petkov:
"Error reporting fix for synopsys_edac: do not overwrite partial
decoded error message (Sherry Sun)"
* tag 'edac_urgent-2020-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC/synopsys: Do not print an error with back-to-back snprintf() calls
Linus Torvalds [Sun, 8 Mar 2020 15:49:44 +0000 (10:49 -0500)]
Merge tag 'char-misc-5.6-rc5' of git://git./linux/kernel/git/gregkh/char-misc
Pull char/misc fixes from Greg KH:
"Here are four small char/misc driver fixes for reported issues for
5.6-rc5.
These fixes are:
- binder fix for a potential use-after-free problem found (took two
tries to get it right)
- interconnect core fix
- altera-stapl driver fix
All four of these have been in linux-next for a while with no reported
issues"
* tag 'char-misc-5.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
binder: prevent UAF for binderfs devices II
interconnect: Handle memory allocation errors
altera-stapl: altera_get_note: prevent write beyond end of 'key'
binder: prevent UAF for binderfs devices
Linus Torvalds [Sun, 8 Mar 2020 15:39:40 +0000 (10:39 -0500)]
Merge tag 'driver-core-5.6-rc5' of git://git./linux/kernel/git/gregkh/driver-core
Pull driver core and debugfs fixes from Greg KH:
"Here are four small driver core / debugfs patches for 5.6-rc3:
- debugfs api cleanup now that all debugfs_create_regset32() callers
have been fixed up. This was waiting until after the -rc1 merge as
these fixes came in through different trees
- driver core sync state fixes based on reports of minor issues found
in the feature
All of these have been in linux-next with no reported issues"
* tag 'driver-core-5.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
driver core: Skip unnecessary work when device doesn't have sync_state()
driver core: Add dev_has_sync_state()
driver core: Call sync_state() even if supplier has no consumers
debugfs: remove return value of debugfs_create_regset32()
Linus Torvalds [Sun, 8 Mar 2020 15:35:04 +0000 (10:35 -0500)]
Merge tag 'tty-5.6-rc5' of git://git./linux/kernel/git/gregkh/tty
Pull tty/serial fixes from Greg KH:
"Here are some small tty/serial fixes for 5.6-rc5
Just some small serial driver fixes, and a vt core fixup, full details
are:
- vt fixes for issues found by syzbot
- serdev fix for Apple boxes
- fsl_lpuart serial driver fixes
- MAINTAINER update for incorrect serial files
- new device ids for 8250_exar driver
- mvebu-uart fix
All of these have been in linux-next with no reported issues"
* tag 'tty-5.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
tty: serial: fsl_lpuart: free IDs allocated by IDA
Revert "tty: serial: fsl_lpuart: drop EARLYCON_DECLARE"
serdev: Fix detection of UART devices on Apple machines.
MAINTAINERS: Add missed files related to Synopsys DesignWare UART
serial: 8250_exar: add support for ACCES cards
tty:serial:mvebu-uart:fix a wrong return
vt: selection, push sel_lock up
vt: selection, push console lock down
Linus Torvalds [Sun, 8 Mar 2020 15:32:23 +0000 (10:32 -0500)]
Merge tag 'usb-5.6-rc5' of git://git./linux/kernel/git/gregkh/usb
Pull USB/PHY fixes from Greg KH:
"Here are some small USB and PHY driver fixes for reported issues for
5.6-rc5.
Included in here are:
- phy driver fixes
- new USB quirks
- USB cdns3 gadget driver fixes
- USB hub core fixes
All of these have been in linux-next with no reported issues"
* tag 'usb-5.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: dwc3: gadget: Update chain bit correctly when using sg list
usb: core: port: do error out if usb_autopm_get_interface() fails
usb: core: hub: do error out if usb_autopm_get_interface() fails
usb: core: hub: fix unhandled return by employing a void function
usb: storage: Add quirk for Samsung Fit flash
usb: quirks: add NO_LPM quirk for Logitech Screen Share
usb: usb251xb: fix regulator probe and error handling
phy: allwinner: Fix GENMASK misuse
usb: cdns3: gadget: toggle cycle bit before reset endpoint
usb: cdns3: gadget: link trb should point to next request
phy: mapphone-mdm6600: Fix timeouts by adding wake-up handling
phy: brcm-sata: Correct MDIO operations for 40nm platforms
phy: ti: gmii-sel: do not fail in case of gmii
phy: ti: gmii-sel: fix set of copy-paste errors
phy: core: Fix phy_get() to not return error on link creation failure
phy: mapphone-mdm6600: Fix write timeouts with shorter GPIO toggle interval
Linus Torvalds [Sun, 8 Mar 2020 01:52:55 +0000 (19:52 -0600)]
Merge tag 'for-linus' of git://git./linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
"Nothing particularly exciting, some small ODP regressions from the mmu
notifier rework, another bunch of syzkaller fixes, and a bug fix for a
botched syzkaller fix in the first rc pull request.
- Fix busted syzkaller fix in 'get_new_pps' - this turned out to
crash on certain HW configurations
- Bug fixes for various missed things in error unwinds
- Add a missing rcu_read_lock annotation in hfi/qib
- Fix two ODP related regressions from the recent mmu notifier
changes
- Several more syzkaller bugs in siw, RDMA netlink, verbs and iwcm
- Revert an old patch in CMA as it is now shown to not be allocating
port numbers properly"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/iwcm: Fix iwcm work deallocation
RDMA/siw: Fix failure handling during device creation
RDMA/nldev: Fix crash when set a QP to a new counter but QPN is missing
RDMA/odp: Ensure the mm is still alive before creating an implicit child
RDMA/core: Fix protection fault in ib_mr_pool_destroy
IB/mlx5: Fix implicit ODP race
IB/hfi1, qib: Ensure RCU is locked when accessing list
RDMA/core: Fix pkey and port assignment in get_new_pps
RMDA/cm: Fix missing ib_cm_destroy_id() in ib_cm_insert_listen()
RDMA/rw: Fix error flow during RDMA context initialization
RDMA/core: Fix use of logical OR in get_new_pps
Revert "RDMA/cma: Simplify rdma_resolve_addr() error flow"
Eli Cohen [Tue, 3 Mar 2020 00:15:22 +0000 (16:15 -0800)]
net/mlx5: HW bit for goto chain offload support
Add the HW bit definition indecating goto chain offload support.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Mark Bloch [Tue, 3 Mar 2020 00:15:21 +0000 (16:15 -0800)]
net/mlx5: Expose link speed directly
Expose port rate as part of the port speed register fields.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Saeed Mahameed [Tue, 3 Mar 2020 00:15:20 +0000 (16:15 -0800)]
net/mlx5: Introduce TLS and IPSec objects enums
Expose the TLS encryption key general object type enum correctly,
and add the IPSec encryption key general object type enum.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Vu Pham [Tue, 3 Mar 2020 00:15:19 +0000 (16:15 -0800)]
net/mlx5: Introduce egress acl forward-to-vport capability
Add HCA_CAP.egress_acl_forward_to_vport field to check whether HW
supports e-switch vport's egress acl to forward packets to other
e-switch vport or not.
By default E-Switch egress ACL forwards eswitch vports egress packets
to their corresponding NIC/VF vports.
With this cap enabled, the driver is allowed to alter this behavior
and forward packets to arbitrary NIC/VF vports with the following
limitations:
a. Multiple processing paths are supported if all of the following
conditions are met:
- HCA_CAP.egress_acl_forward_to_vport is set ==1.
- A destination of type Flow Table only appears once, as the
last destination in the list.
- Vport destination is supported if
HCA_CAP.egress_acl_forward_to_vport==1. Vport must not be
the Uplink.
b. Flow_tag not supported.
c. This table is only applicable after an FDB table is created.
d. Push VLAN action is not supported.
e. Pop VLAN action cannot be added concurrently to this table and
FDB table.
This feature will be used during port failover in bonding scenario
where two VFs representors are bonded to handle failover egress traffic
(VM's ingress/receive traffic).
Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Linus Torvalds [Sat, 7 Mar 2020 20:20:29 +0000 (14:20 -0600)]
Merge tag 'io_uring-5.6-2020-03-07' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
"Here are a few io_uring fixes that should go into this release. This
contains:
- Removal of (now) unused io_wq_flush() and associated flag (Pavel)
- Fix cancelation lockup with linked timeouts (Pavel)
- Fix for potential use-after-free when freeing percpu ref for fixed
file sets
- io-wq cancelation fixups (Pavel)"
* tag 'io_uring-5.6-2020-03-07' of git://git.kernel.dk/linux-block:
io_uring: fix lockup with timeouts
io_uring: free fixed_file_data after RCU grace period
io-wq: remove io_wq_flush and IO_WQ_WORK_INTERNAL
io-wq: fix IO_WQ_WORK_NO_CANCEL cancellation
Linus Torvalds [Sat, 7 Mar 2020 20:14:38 +0000 (14:14 -0600)]
Merge tag 'block-5.6-2020-03-07' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"Here are a few fixes that should go into this release. This contains:
- Revert of a bad bcache patch from this merge window
- Removed unused function (Daniel)
- Fixup for the blktrace fix from Jan from this release (Cengiz)
- Fix of deeper level bfqq overwrite in BFQ (Carlo)"
* tag 'block-5.6-2020-03-07' of git://git.kernel.dk/linux-block:
block, bfq: fix overwrite of bfq_group pointer in bfq_find_set_group()
blktrace: fix dereference after null check
Revert "bcache: ignore pending signals when creating gc and allocator thread"
block: Remove used kblockd_schedule_work_on()
Linus Torvalds [Sat, 7 Mar 2020 18:00:13 +0000 (12:00 -0600)]
Merge tag 'media/v5.6-2' of git://git./linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
- a fix for the media controller links in both hantro driver and in
v4l2-mem2mem core
- some fixes for the pulse8-cec driver
- vicodec: handle alpha channel for RGB32 formats, as it may be used
- mc-entity.c: fix handling of pad flags
* tag 'media/v5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
media: hantro: Fix broken media controller links
media: mc-entity.c: use & to check pad flags, not ==
media: v4l2-mem2mem.c: fix broken links
media: vicodec: process all 4 components for RGB32 formats
media: pulse8-cec: close serio in disconnect, not adap_free
media: pulse8-cec: INIT_DELAYED_WORK was called too late
Pavel Begunkov [Fri, 6 Mar 2020 22:15:22 +0000 (01:15 +0300)]
io_uring: fix lockup with timeouts
There is a recipe to deadlock the kernel: submit a timeout sqe with a
linked_timeout (e.g. test_single_link_timeout_ception() from liburing),
and SIGKILL the process.
Then, io_kill_timeouts() takes @ctx->completion_lock, but the timeout
isn't flagged with REQ_F_COMP_LOCKED, and will try to double grab it
during io_put_free() to cancel the linked timeout. Probably, the same
can happen with another io_kill_timeout() call site, that is
io_commit_cqring().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Sat, 7 Mar 2020 14:12:47 +0000 (08:12 -0600)]
Merge tag 's390-5.6-5' of git://git./linux/kernel/git/s390/linux
Pull s390 fixes from Vasily Gorbik:
- Fix panic in gup_fast on large pud by providing an implementation of
pud_write. This has been overlooked during migration to common gup
code.
- Fix unexpected write combining on PCI stores.
* tag 's390-5.6-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/pci: Fix unexpected write combine on resource
s390/mm: fix panic in gup_fast on large pud
Linus Torvalds [Sat, 7 Mar 2020 14:10:34 +0000 (08:10 -0600)]
Merge tag 'powerpc-5.6-4' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Some more powerpc fixes for 5.6:
- One fix for a recent regression to our breakpoint/watchpoint code.
- Another fix for our KUAP support, this time a missing annotation in
a rarely used path in signal handling.
- A fix for our handling of a CPU feature that effects the PMU, when
booting guests in some configurations.
- A minor fix to our linker script to explicitly include the .BTF
section.
Thanks to: Christophe Leroy, Desnes A. Nunes do Rosario, Leonardo
Bras, Naveen N. Rao, Ravi Bangoria, Stefan Berger"
* tag 'powerpc-5.6-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/mm: Fix missing KUAP disable in flush_coherent_icache()
powerpc: fix hardware PMU exception bug on PowerVM compatibility mode systems
powerpc: Include .BTF section
powerpc/watchpoint: Don't call dar_within_range() for Book3S
Linus Torvalds [Sat, 7 Mar 2020 14:04:54 +0000 (08:04 -0600)]
Merge tag 'for-linus-5.6b-rc5-tag' of git://git./linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
"Four fixes and a small cleanup patch:
- two fixes by Dongli Zhang fixing races in the xenbus driver
- two fixes by me fixing issues introduced in 5.6
- a small cleanup by Gustavo Silva replacing a zero-length array with
a flexible-array"
* tag 'for-linus-5.6b-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/blkfront: fix ring info addressing
xen/xenbus: fix locking
xenbus: req->err should be updated before req->state
xenbus: req->body should be updated before req->state
xen: Replace zero-length array with flexible-array member
Linus Torvalds [Sat, 7 Mar 2020 14:01:43 +0000 (08:01 -0600)]
Merge tag 'for-linus-2020-03-07' of gitolite.pub/scm/linux/kernel/git/brauner/linux
Pull thread fixes from Christian Brauner:
"Here are a few hopefully uncontroversial fixes:
- Use RCU_INIT_POINTER() when initializing rcu protected members in
task_struct to fix sparse warnings.
- Add pidfd_fdinfo_test binary to .gitignore file"
* tag 'for-linus-2020-03-07' of gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux:
selftests: pidfd: Add pidfd_fdinfo_test in .gitignore
exit: Fix Sparse errors and warnings
fork: Use RCU_INIT_POINTER() instead of rcu_access_pointer()
Linus Torvalds [Sat, 7 Mar 2020 13:59:30 +0000 (07:59 -0600)]
Merge tag 'sound-5.6-rc5' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"The regular "bump-in-the-middle" updates, containing mostly ASoC-
related fixes at this time. All changes are reasonably small.
A few entries are for ASoC and ALSA core parts (DAPM, PCM, topology)
for followups of the recent changes and potential buffer overflow by
snprintf(), while the rest are (both new and old) device-specific
fixes for Intel, meson, tas2562, rt1015, as well as the usual HD-audio
quirks"
* tag 'sound-5.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (25 commits)
ALSA: sgio2audio: Remove usage of dropped hw_params/hw_free functions
ALSA: hda/realtek - Enable the headset of ASUS
B9450FA with ALC294
ALSA: hda/realtek - Fix silent output on Gigabyte X570 Aorus Master
ALSA: hda/realtek - Add Headset Button supported for ThinkPad X1
ALSA: hda/realtek - Add Headset Mic supported
ASoC: wm8741: Fix typo in Kconfig prompt
ASoC: stm32: sai: manage rebind issue
ASoC: SOF: Fix snd_sof_ipc_stream_posn()
ASoC: rt1015: modify pre-divider for sysclk
ASoC: rt1015: add operation callback function for rt1015_dai[]
ASoC: soc-component: tidyup snd_soc_pcm_component_sync_stop()
ASoC: dapm: Correct DAPM handling of active widgets during shutdown
ASoC: tas2562: Fix sample rate error message
ASoC: Intel: Skylake: Fix available clock counter incrementation
ASoC: soc-pcm/soc-compress: don't use snd_soc_dapm_stream_stop()
ASoC: meson: g12a: add tohdmitx reset
ASoC: pcm512x: Fix unbalanced regulator enable call in probe error path
ASoC: soc-core: fix for_rtd_codec_dai_rollback() macro
ASoC: topology: Fix memleak in soc_tplg_manifest_load()
ASoC: topology: Fix memleak in soc_tplg_link_elems_load()
...
Takashi Iwai [Sat, 7 Mar 2020 06:24:36 +0000 (07:24 +0100)]
Merge tag 'asoc-fix-v5.6-rc4' of https://git./linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v5.6
More fixes that have arrived since the merge window, spread out all
over. There's a few things like the operation callback addition for
rt1015 and the meson reset addition which add small new bits of
functionality to fix non-working systems, they're all very small and for
parts of newly added functionality.
Linus Torvalds [Fri, 6 Mar 2020 23:03:37 +0000 (17:03 -0600)]
Merge tag 'linux-kselftest-5.6-rc5' of git://git./linux/kernel/git/shuah/linux-kselftest
Pull kselftest update from Shuah Khan:
"This consists of a cleanup patch to undo changes to global .gitignore
that added selftests/lkdtm objects and add them to a local
selftests/lkdtm/.gitignore.
Summary of Linus's comments on local vs. global gitignore scope:
- Keep local gitignore patterns in local files.
- Put only global gitignore patterns in the top-level gitignore file.
Local scope keeps things much better separated. It also incidentally
means that if a directory gets renamed, the gitignore file continues
to work unless in the case of renaming the actual files themselves
that are named in the gitignore"
* tag 'linux-kselftest-5.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftest/lkdtm: Use local .gitignore
Linus Torvalds [Fri, 6 Mar 2020 22:38:33 +0000 (16:38 -0600)]
Merge tag 'riscv-for-linus-5.6-rc5' of git://git./linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
"This contains a handful of fixes that I would like to target for 5.6:
- A pair of fixes to module loading, which we hope solve the last of
the issues with module text being loaded too sparsely for our call
relocations.
- A Kconfig fix that disallows selecting memory models not supported
by NOMMU.
- A series of Kconfig updates to ease selecting the drivers necessary
to run on QEMU's virt platform.
- DTS updates for SiFive's HiFive Unleashed.
- A fix to our seccomp support that avoids mangling restartable
syscalls"
* tag 'riscv-for-linus-5.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: fix seccomp reject syscall code path
riscv: dts: Add GPIO reboot method to HiFive Unleashed DTS file
RISC-V: Select Goldfish RTC driver for QEMU virt machine
RISC-V: Select SYSCON Reboot and Poweroff for QEMU virt machine
RISC-V: Enable QEMU virt machine support in defconfigs
RISC-V: Add kconfig option for QEMU virt machine
riscv: Fix range looking for kernel image memblock
riscv: Force flat memory model with no-mmu
riscv: Change code model of module to medany to improve data accessing
riscv: avoid the PIC offset of static percpu data in module beyond 2G limits
Jonathan Neuschäfer [Fri, 6 Mar 2020 22:13:11 +0000 (23:13 +0100)]
parse-maintainers: Mark as executable
This makes the script more convenient to run.
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 6 Mar 2020 22:11:34 +0000 (16:11 -0600)]
Merge tag 'devicetree-fixes-for-5.6-3' of git://git./linux/kernel/git/robh/linux
Pull devicetree fixes from Rob Herring:
"Another batch of DT fixes. I think this should be the last of it, but
sending pull requests seems to cause people to send more fixes.
Summary:
- Fixes for warnings introduced by hierarchical PSCI binding changes
- Fixes for broken doc references due to DT schema conversions
- Several grammar and typo fixes
- Fix a bunch of dtc warnings in examples"
* tag 'devicetree-fixes-for-5.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
dt-bindings: arm: Fixup the DT bindings for hierarchical PSCI states
dt-bindings: power: Extend nodename pattern for power-domain providers
MAINTAINERS: update ALLWINNER CPUFREQ DRIVER entry
dt-bindings: bus: Drop empty compatible string in example
dt-bindings: power: Convert domain-idle-states bindings to json-schema
dt-bindings: arm: Fix cpu compatibles in the hierarchical example for PSCI
dt-bindings: arm: Correct links to idle states definitions
dt-bindings: mfd: Fix typo in file name of twl-familly.txt
dt-bindings: mfd: tps65910: Improve grammar
dt-bindings: mfd: zii,rave-sp: Fix a typo ("onborad")
dt-bindings: arm: fsl: fix APF6Dev compatible
dt-bindings: Fix dtc warnings in examples
docs: dt: fix several broken doc references
docs: dt: fix several broken references due to renames
MAINTAINERS: clean up PCIE DRIVER FOR CAVIUM THUNDERX
Linus Torvalds [Fri, 6 Mar 2020 22:08:48 +0000 (16:08 -0600)]
Merge tag 'drm-fixes-2020-03-06-1' of git://anongit.freedesktop.org/drm/drm
Pull vgacon fix from Daniel Vetter:
"One vgacon input check for stable"
* tag 'drm-fixes-2020-03-06-1' of git://anongit.freedesktop.org/drm/drm:
vgacon: Fix a UAF in vgacon_invert_region
Linus Torvalds [Fri, 6 Mar 2020 20:56:46 +0000 (14:56 -0600)]
Merge tag 'for-5.6-rc4-tag' of git://git./linux/kernel/git/kdave/linux
Pull btrfs fix from David Sterba:
"One fixup for DIO when in use with the new checksums, a missed case
where the checksum size was still assuming u32"
* tag 'for-5.6-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix RAID direct I/O reads with alternate csums
Linus Torvalds [Fri, 6 Mar 2020 20:55:27 +0000 (14:55 -0600)]
Merge tag 'filelock-v5.6-1' of git://git./linux/kernel/git/jlayton/linux
Pull file locking fixes from Jeff Layton:
"Just a couple of late-breaking patches for the file locking code. The
second patch (from yangerkun) fixes a rather nasty looking potential
use-after-free that should go to stable.
The other patch could technically wait for 5.7, but it's fairly
innocuous so I figured we might as well take it"
* tag 'filelock-v5.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
locks: fix a potential use-after-free problem when wakeup a waiter
fcntl: Distribute switch variables for initialization
Linus Torvalds [Fri, 6 Mar 2020 20:50:16 +0000 (14:50 -0600)]
Merge tag 'spi-fix-v5.6-rc4' of git://git./linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A selection of small fixes, mostly for drivers, that have arrived
since the merge window. None of them are earth shattering in
themselves but all useful for affected systems"
* tag 'spi-fix-v5.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: spi_register_controller(): free bus id on error paths
spi: bcm63xx-hsspi: Really keep pll clk enabled
spi: atmel-quadspi: fix possible MMIO window size overrun
spi/zynqmp: remove entry that causes a cs glitch
spi: pxa2xx: Add CS control clock quirk
spi: spidev: Fix CS polarity if GPIO descriptors are used
spi: qup: call spi_qup_pm_resume_runtime before suspending
spi: spi-omap2-mcspi: Support probe deferral for DMA channels
spi: spi-omap2-mcspi: Handle DMA size restriction on AM65x
Linus Torvalds [Fri, 6 Mar 2020 20:48:30 +0000 (14:48 -0600)]
Merge tag 'regulator-fix-v5.6-rc4' of git://git./linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
"A couple of small fixes, one for a minor issue in the stm32-vrefbuf
driver and a documentation fix in the Qualcomm code"
* tag 'regulator-fix-v5.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: stm32-vrefbuf: fix a possible overshoot when re-enabling
regulator: qcom_spmi: Fix docs for PM8004
Linus Torvalds [Fri, 6 Mar 2020 20:47:06 +0000 (14:47 -0600)]
Merge tag 'hwmon-for-v5.6-rc5' of git://git./linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
"Fix an error return in the adt7462 driver, bad voltage limits reported
by the xdpe12284 driver, and a broken documentation reference in the
adm1177 driver documentation"
* tag 'hwmon-for-v5.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (adt7462) Fix an error return in ADT7462_REG_VOLT()
hwmon: (pmbus/xdpe12284) Add callback for vout limits conversion
docs: adm1177: fix a broken reference
Linus Torvalds [Fri, 6 Mar 2020 20:35:47 +0000 (14:35 -0600)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"Here are another three arm64 fixes for 5.6, all pretty minor. Main
thing is fixing a silly bug in the fsl_imx8_ddr PMU driver where we
would zero the counters when disabling them.
- Fix misreporting of ASID limit when KPTI is enabled
- Fix busted NULL pointer checks for GICC structure in ACPI PMU code
- Avoid nobbling the "fsl_imx8_ddr" PMU counters when disabling them"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: context: Fix ASID limit in boot messages
drivers/perf: arm_pmu_acpi: Fix incorrect checking of gicc pointer
drivers/perf: fsl_imx8_ddr: Correct the CLEAR bit definition
Zhang Xiaoxu [Wed, 4 Mar 2020 02:24:29 +0000 (10:24 +0800)]
vgacon: Fix a UAF in vgacon_invert_region
When syzkaller tests, there is a UAF:
BUG: KASan: use after free in vgacon_invert_region+0x9d/0x110 at addr
ffff880000100000
Read of size 2 by task syz-executor.1/16489
page:
ffffea0000004000 count:0 mapcount:-127 mapping: (null)
index:0x0
page flags: 0xfffff00000000()
page dumped because: kasan: bad access detected
CPU: 1 PID: 16489 Comm: syz-executor.1 Not tainted
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
Call Trace:
[<
ffffffffb119f309>] dump_stack+0x1e/0x20
[<
ffffffffb04af957>] kasan_report+0x577/0x950
[<
ffffffffb04ae652>] __asan_load2+0x62/0x80
[<
ffffffffb090f26d>] vgacon_invert_region+0x9d/0x110
[<
ffffffffb0a39d95>] invert_screen+0xe5/0x470
[<
ffffffffb0a21dcb>] set_selection+0x44b/0x12f0
[<
ffffffffb0a3bfae>] tioclinux+0xee/0x490
[<
ffffffffb0a1d114>] vt_ioctl+0xff4/0x2670
[<
ffffffffb0a0089a>] tty_ioctl+0x46a/0x1a10
[<
ffffffffb052db3d>] do_vfs_ioctl+0x5bd/0xc40
[<
ffffffffb052e2f2>] SyS_ioctl+0x132/0x170
[<
ffffffffb11c9b1b>] system_call_fastpath+0x22/0x27
Memory state around the buggy address:
ffff8800000fff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00
ffff8800000fff80: 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00
>
ffff880000100000: ff ff ff ff ff ff ff ff ff ff ff ff ff
ff ff ff
It can be reproduce in the linux mainline by the program:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/ioctl.h>
#include <linux/vt.h>
struct tiocl_selection {
unsigned short xs; /* X start */
unsigned short ys; /* Y start */
unsigned short xe; /* X end */
unsigned short ye; /* Y end */
unsigned short sel_mode; /* selection mode */
};
#define TIOCL_SETSEL 2
struct tiocl {
unsigned char type;
unsigned char pad;
struct tiocl_selection sel;
};
int main()
{
int fd = 0;
const char *dev = "/dev/char/4:1";
struct vt_consize v = {0};
struct tiocl tioc = {0};
fd = open(dev, O_RDWR, 0);
v.v_rows = 3346;
ioctl(fd, VT_RESIZEX, &v);
tioc.type = TIOCL_SETSEL;
ioctl(fd, TIOCLINUX, &tioc);
return 0;
}
When resize the screen, update the 'vc->vc_size_row' to the new_row_size,
but when 'set_origin' in 'vgacon_set_origin', vgacon use 'vga_vram_base'
for 'vc_origin' and 'vc_visible_origin', not 'vc_screenbuf'. It maybe
smaller than 'vc_screenbuf'. When TIOCLINUX, use the new_row_size to calc
the offset, it maybe larger than the vga_vram_size in vgacon driver, then
bad access.
Also, if set an larger screenbuf firstly, then set an more larger
screenbuf, when copy old_origin to new_origin, a bad access may happen.
So, If the screen size larger than vga_vram, resize screen should be
failed. This alse fix CVE-2020-8649 and CVE-2020-8647.
Linus pointed out that overflow checking seems absent. We're saved by
the existing bounds checks in vc_do_resize() with rather strict
limits:
if (cols > VC_RESIZE_MAXCOL || lines > VC_RESIZE_MAXROW)
return -EINVAL;
Fixes:
0aec4867dca14 ("[PATCH] SVGATextMode fix")
Reference: CVE-2020-8647 and CVE-2020-8649
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
[danvet: augment commit message to point out overflow safety]
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20200304022429.37738-1-zhangxiaoxu5@huawei.com
Ulf Hansson [Tue, 3 Mar 2020 15:07:47 +0000 (16:07 +0100)]
dt-bindings: arm: Fixup the DT bindings for hierarchical PSCI states
The hierarchical topology with power-domain should be described through
child nodes, rather than as currently described in the PSCI root node. Fix
this by adding a patternProperties with a corresponding reference to the
power-domain DT binding.
Additionally, update the example to conform to the new pattern, but also to
the adjusted domain-idle-state DT binding.
Fixes:
a3f048b5424e ("dt: psci: Update DT bindings to support hierarchical PSCI states")
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
[robh: Add missing allOf, tweak power-domain node name]
Signed-off-by: Rob Herring <robh@kernel.org>
Ulf Hansson [Tue, 3 Mar 2020 15:07:46 +0000 (16:07 +0100)]
dt-bindings: power: Extend nodename pattern for power-domain providers
The existing binding requires the nodename to have a '@', which is a bit
limiting for the wider use case. Therefore, let's extend the pattern to
allow either '@' or '-'.
Fixes:
a3f048b5424e ("dt: psci: Update DT bindings to support hierarchical PSCI states")
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
[robh: drop example change]
Signed-off-by: Rob Herring <robh@kernel.org>
Jens Axboe [Wed, 4 Mar 2020 14:25:50 +0000 (07:25 -0700)]
io_uring: free fixed_file_data after RCU grace period
The percpu refcount protects this structure, and we can have an atomic
switch in progress when exiting. This makes it unsafe to just free the
struct normally, and can trigger the following KASAN warning:
BUG: KASAN: use-after-free in percpu_ref_switch_to_atomic_rcu+0xfa/0x1b0
Read of size 1 at addr
ffff888181a19a30 by task swapper/0/0
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.6.0-rc4+ #5747
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x76/0xa0
print_address_description.constprop.0+0x3b/0x60
? percpu_ref_switch_to_atomic_rcu+0xfa/0x1b0
? percpu_ref_switch_to_atomic_rcu+0xfa/0x1b0
__kasan_report.cold+0x1a/0x3d
? percpu_ref_switch_to_atomic_rcu+0xfa/0x1b0
percpu_ref_switch_to_atomic_rcu+0xfa/0x1b0
rcu_core+0x370/0x830
? percpu_ref_exit+0x50/0x50
? rcu_note_context_switch+0x7b0/0x7b0
? run_rebalance_domains+0x11d/0x140
__do_softirq+0x10a/0x3e9
irq_exit+0xd5/0xe0
smp_apic_timer_interrupt+0x86/0x200
apic_timer_interrupt+0xf/0x20
</IRQ>
RIP: 0010:default_idle+0x26/0x1f0
Fix this by punting the final exit and free of the struct to RCU, then
we know that it's safe to do so. Jann suggested the approach of using a
double rcu callback to achieve this. It's important that we do a nested
call_rcu() callback, as otherwise the free could be ordered before the
atomic switch, even if the latter was already queued.
Reported-by: syzbot+e017e49c39ab484ac87a@syzkaller.appspotmail.com
Suggested-by: Jann Horn <jannh@google.com>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
yangerkun [Wed, 4 Mar 2020 07:25:56 +0000 (15:25 +0800)]
locks: fix a potential use-after-free problem when wakeup a waiter
'
16306a61d3b7 ("fs/locks: always delete_block after waiting.")' add the
logic to check waiter->fl_blocker without blocked_lock_lock. And it will
trigger a UAF when we try to wakeup some waiter:
Thread 1 has create a write flock a on file, and now thread 2 try to
unlock and delete flock a, thread 3 try to add flock b on the same file.
Thread2 Thread3
flock syscall(create flock b)
...flock_lock_inode_wait
flock_lock_inode(will insert
our fl_blocked_member list
to flock a's fl_blocked_requests)
sleep
flock syscall(unlock)
...flock_lock_inode_wait
locks_delete_lock_ctx
...__locks_wake_up_blocks
__locks_delete_blocks(
b->fl_blocker = NULL)
...
break by a signal
locks_delete_block
b->fl_blocker == NULL &&
list_empty(&b->fl_blocked_requests)
success, return directly
locks_free_lock b
wake_up(&b->fl_waiter)
trigger UAF
Fix it by remove this logic, and this patch may also fix CVE-2019-19769.
Cc: stable@vger.kernel.org
Fixes:
16306a61d3b7 ("fs/locks: always delete_block after waiting.")
Signed-off-by: yangerkun <yangerkun@huawei.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Carlo Nonato [Fri, 6 Mar 2020 12:27:31 +0000 (13:27 +0100)]
block, bfq: fix overwrite of bfq_group pointer in bfq_find_set_group()
The bfq_find_set_group() function takes as input a blkcg (which represents
a cgroup) and retrieves the corresponding bfq_group, then it updates the
bfq internal group hierarchy (see comments inside the function for why
this is needed) and finally it returns the bfq_group.
In the hierarchy update cycle, the pointer holding the correct bfq_group
that has to be returned is mistakenly used to traverse the hierarchy
bottom to top, meaning that in each iteration it gets overwritten with the
parent of the current group. Since the update cycle stops at root's
children (depth = 2), the overwrite becomes a problem only if the blkcg
describes a cgroup at a hierarchy level deeper than that (depth > 2). In
this case the root's child that happens to be also an ancestor of the
correct bfq_group is returned. The main consequence is that processes
contained in a cgroup at depth greater than 2 are wrongly placed in the
group described above by BFQ.
This commits fixes this problem by using a different bfq_group pointer in
the update cycle in order to avoid the overwrite of the variable holding
the original group reference.
Reported-by: Kwon Je Oh <kwonje.oh2@gmail.com>
Signed-off-by: Carlo Nonato <carlo.nonato95@gmail.com>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Fri, 6 Mar 2020 13:18:36 +0000 (07:18 -0600)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
"7 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
arch/Kconfig: update HAVE_RELIABLE_STACKTRACE description
mm, hotplug: fix page online with DEBUG_PAGEALLOC compiled but not enabled
mm/z3fold.c: do not include rwlock.h directly
fat: fix uninit-memory access for partial initialized inode
mm: avoid data corruption on CoW fault into PFN-mapped VMA
mm: fix possible PMD dirty bit lost in set_pmd_migration_entry()
mm, numa: fix bad pmd by atomically check for pmd_trans_huge when marking page tables prot_numa
Michael Walle [Tue, 3 Mar 2020 17:42:59 +0000 (18:42 +0100)]
tty: serial: fsl_lpuart: free IDs allocated by IDA
Since commit
3bc3206e1c0f ("serial: fsl_lpuart: Remove the alias node
dependence") the port line number can also be allocated by IDA, but in
case of an error the ID will no be removed again. More importantly, any
ID will be freed in remove(), even if it wasn't allocated but instead
fetched by of_alias_get_id(). If it was not allocated by IDA there will
be a warning:
WARN(1, "ida_free called for id=%d which is not allocated.\n", id);
Move the ID allocation more to the end of the probe() so that we still
can use plain return in the first error cases.
Fixes:
3bc3206e1c0f ("serial: fsl_lpuart: Remove the alias node dependence")
Signed-off-by: Michael Walle <michael@walle.cc>
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20200303174306.6015-3-michael@walle.cc
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Michael Walle [Tue, 3 Mar 2020 17:42:58 +0000 (18:42 +0100)]
Revert "tty: serial: fsl_lpuart: drop EARLYCON_DECLARE"
This reverts commit
a659652f6169240a5818cb244b280c5a362ef5a4.
This broke the earlycon on LS1021A processors because the order of the
earlycon_setup() functions were changed. Before the commit the normal
lpuart32_early_console_setup() was called. After the commit the
lpuart32_imx_early_console_setup() is called instead.
Fixes:
a659652f6169 ("tty: serial: fsl_lpuart: drop EARLYCON_DECLARE")
Signed-off-by: Michael Walle <michael@walle.cc>
Link: https://lore.kernel.org/r/20200303174306.6015-2-michael@walle.cc
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ronald Tschalär [Tue, 11 Feb 2020 19:47:23 +0000 (11:47 -0800)]
serdev: Fix detection of UART devices on Apple machines.
On Apple devices the _CRS method returns an empty resource template, and
the resource settings are instead provided by the _DSM method. But
commit
33364d63c75d6182fa369cea80315cf1bb0ee38e (serdev: Add ACPI
devices by ResourceSource field) changed the search for serdev devices
to require valid, non-empty resource template, thereby breaking Apple
devices and causing bluetooth devices to not be found.
This expands the check so that if we don't find a valid template, and
we're on an Apple machine, then just check for the device being an
immediate child of the controller and having a "baud" property.
Cc: <stable@vger.kernel.org> # 5.5
Fixes:
33364d63c75d ("serdev: Add ACPI devices by ResourceSource field")
Signed-off-by: Ronald Tschalär <ronald@innovation.ch>
Link: https://lore.kernel.org/r/20200211194723.486217-1-ronald@innovation.ch
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Miroslav Benes [Fri, 6 Mar 2020 06:28:45 +0000 (22:28 -0800)]
arch/Kconfig: update HAVE_RELIABLE_STACKTRACE description
save_stack_trace_tsk_reliable() is not the only function providing the
reliable stack traces anymore. Architecture might define ARCH_STACKWALK
which provides a newer stack walking interface and has
arch_stack_walk_reliable() function. Update the description accordingly.
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: http://lkml.kernel.org/r/20200120154042.9934-1-mbenes@suse.cz
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vlastimil Babka [Fri, 6 Mar 2020 06:28:42 +0000 (22:28 -0800)]
mm, hotplug: fix page online with DEBUG_PAGEALLOC compiled but not enabled
Commit
cd02cf1aceea ("mm/hotplug: fix an imbalance with DEBUG_PAGEALLOC")
fixed memory hotplug with debug_pagealloc enabled, where onlining a page
goes through page freeing, which removes the direct mapping. Some arches
don't like when the page is not mapped in the first place, so
generic_online_page() maps it first. This is somewhat wasteful, but
better than special casing page freeing fast paths.
The commit however missed that DEBUG_PAGEALLOC configured doesn't mean
it's actually enabled. One has to test debug_pagealloc_enabled() since
031bc5743f15 ("mm/debug-pagealloc: make debug-pagealloc boottime
configurable"), or alternatively debug_pagealloc_enabled_static() since
8e57f8acbbd1 ("mm, debug_pagealloc: don't rely on static keys too early"),
but this is not done.
As a result, a s390 kernel with DEBUG_PAGEALLOC configured but not enabled
will crash:
Unable to handle kernel pointer dereference in virtual kernel address space
Failing address:
0000000000000000 TEID:
0000000000000483
Fault in home space mode while using kernel ASCE.
AS:
0000001ece13400b R2:
000003fff7fd000b R3:
000003fff7fcc007 S:
000003fff7fd7000 P:
000000000000013d
Oops: 0004 ilc:2 [#1] SMP
CPU: 1 PID: 26015 Comm: chmem Kdump: loaded Tainted: GX 5.3.18-5-default #1 SLE15-SP2 (unreleased)
Krnl PSW :
0704e00180000000 0000001ecd281b9e (__kernel_map_pages+0x166/0x188)
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
Krnl GPRS:
0000000000000000 0000000000000800 0000400b00000000 0000000000000100
0000000000000001 0000000000000000 0000000000000002 0000000000000100
0000001ece139230 0000001ecdd98d40 0000400b00000100 0000000000000000
000003ffa17e4000 001fffe0114f7d08 0000001ecd4d93ea 001fffe0114f7b20
Krnl Code:
0000001ecd281b8e:
ec17ffff00d8 ahik %r1,%r7,-1
0000001ecd281b94:
ec111dbc0355 risbg %r1,%r1,29,188,3
>
0000001ecd281b9e:
94fb5006 ni 6(%r5),251
0000001ecd281ba2:
41505008 la %r5,8(%r5)
0000001ecd281ba6:
ec51fffc6064 cgrj %r5,%r1,6,
1ecd281b9e
0000001ecd281bac: 1a07 ar %r0,%r7
0000001ecd281bae:
ec03ff584076 crj %r0,%r3,4,
1ecd281a5e
Call Trace:
[<
0000001ecd281b9e>] __kernel_map_pages+0x166/0x188
[<
0000001ecd4d9516>] online_pages_range+0xf6/0x128
[<
0000001ecd2a8186>] walk_system_ram_range+0x7e/0xd8
[<
0000001ecda28aae>] online_pages+0x2fe/0x3f0
[<
0000001ecd7d02a6>] memory_subsys_online+0x8e/0xc0
[<
0000001ecd7add42>] device_online+0x5a/0xc8
[<
0000001ecd7d0430>] state_store+0x88/0x118
[<
0000001ecd5b9f62>] kernfs_fop_write+0xc2/0x200
[<
0000001ecd5064b6>] vfs_write+0x176/0x1e0
[<
0000001ecd50676a>] ksys_write+0xa2/0x100
[<
0000001ecda315d4>] system_call+0xd8/0x2c8
Fix this by checking debug_pagealloc_enabled_static() before calling
kernel_map_pages(). Backports for kernel before 5.5 should use
debug_pagealloc_enabled() instead. Also add comments.
Fixes:
cd02cf1aceea ("mm/hotplug: fix an imbalance with DEBUG_PAGEALLOC")
Reported-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: <stable@vger.kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Qian Cai <cai@lca.pw>
Link: http://lkml.kernel.org/r/20200224094651.18257-1-vbabka@suse.cz
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sebastian Andrzej Siewior [Fri, 6 Mar 2020 06:28:39 +0000 (22:28 -0800)]
mm/z3fold.c: do not include rwlock.h directly
rwlock.h should not be included directly. Instead linux/splinlock.h
should be included. One thing it does is to break the RT build.
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20200224133631.1510569-1-bigeasy@linutronix.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
OGAWA Hirofumi [Fri, 6 Mar 2020 06:28:36 +0000 (22:28 -0800)]
fat: fix uninit-memory access for partial initialized inode
When get an error in the middle of reading an inode, some fields in the
inode might be still not initialized. And then the evict_inode path may
access those fields via iput().
To fix, this makes sure that inode fields are initialized.
Reported-by: syzbot+9d82b8de2992579da5d0@syzkaller.appspotmail.com
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/871rqnreqx.fsf@mail.parknet.co.jp
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Fri, 6 Mar 2020 06:28:32 +0000 (22:28 -0800)]
mm: avoid data corruption on CoW fault into PFN-mapped VMA
Jeff Moyer has reported that one of xfstests triggers a warning when run
on DAX-enabled filesystem:
WARNING: CPU: 76 PID: 51024 at mm/memory.c:2317 wp_page_copy+0xc40/0xd50
...
wp_page_copy+0x98c/0xd50 (unreliable)
do_wp_page+0xd8/0xad0
__handle_mm_fault+0x748/0x1b90
handle_mm_fault+0x120/0x1f0
__do_page_fault+0x240/0xd70
do_page_fault+0x38/0xd0
handle_page_fault+0x10/0x30
The warning happens on failed __copy_from_user_inatomic() which tries to
copy data into a CoW page.
This happens because of race between MADV_DONTNEED and CoW page fault:
CPU0 CPU1
handle_mm_fault()
do_wp_page()
wp_page_copy()
do_wp_page()
madvise(MADV_DONTNEED)
zap_page_range()
zap_pte_range()
ptep_get_and_clear_full()
<TLB flush>
__copy_from_user_inatomic()
sees empty PTE and fails
WARN_ON_ONCE(1)
clear_page()
The solution is to re-try __copy_from_user_inatomic() under PTL after
checking that PTE is matches the orig_pte.
The second copy attempt can still fail, like due to non-readable PTE, but
there's nothing reasonable we can do about, except clearing the CoW page.
Reported-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Tested-by: Jeff Moyer <jmoyer@redhat.com>
Cc: <stable@vger.kernel.org>
Cc: Justin He <Justin.He@arm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Link: http://lkml.kernel.org/r/20200218154151.13349-1-kirill.shutemov@linux.intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Huang Ying [Fri, 6 Mar 2020 06:28:29 +0000 (22:28 -0800)]
mm: fix possible PMD dirty bit lost in set_pmd_migration_entry()
In set_pmd_migration_entry(), pmdp_invalidate() is used to change PMD
atomically. But the PMD is read before that with an ordinary memory
reading. If the THP (transparent huge page) is written between the PMD
reading and pmdp_invalidate(), the PMD dirty bit may be lost, and cause
data corruption. The race window is quite small, but still possible in
theory, so need to be fixed.
The race is fixed via using the return value of pmdp_invalidate() to get
the original content of PMD, which is a read/modify/write atomic
operation. So no THP writing can occur in between.
The race has been introduced when the THP migration support is added in
the commit
616b8371539a ("mm: thp: enable thp migration in generic path").
But this fix depends on the commit
d52605d7cb30 ("mm: do not lose dirty
and accessed bits in pmdp_invalidate()"). So it's easy to be backported
after v4.16. But the race window is really small, so it may be fine not
to backport the fix at all.
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: <stable@vger.kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Link: http://lkml.kernel.org/r/20200220075220.2327056-1-ying.huang@intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Fri, 6 Mar 2020 06:28:26 +0000 (22:28 -0800)]
mm, numa: fix bad pmd by atomically check for pmd_trans_huge when marking page tables prot_numa
: A user reported a bug against a distribution kernel while running a
: proprietary workload described as "memory intensive that is not swapping"
: that is expected to apply to mainline kernels. The workload is
: read/write/modifying ranges of memory and checking the contents. They
: reported that within a few hours that a bad PMD would be reported followed
: by a memory corruption where expected data was all zeros. A partial
: report of the bad PMD looked like
:
: [ 5195.338482] ../mm/pgtable-generic.c:33: bad pmd
ffff8888157ba008(
000002e0396009e2)
: [ 5195.341184] ------------[ cut here ]------------
: [ 5195.356880] kernel BUG at ../mm/pgtable-generic.c:35!
: ....
: [ 5195.410033] Call Trace:
: [ 5195.410471] [<
ffffffff811bc75d>] change_protection_range+0x7dd/0x930
: [ 5195.410716] [<
ffffffff811d4be8>] change_prot_numa+0x18/0x30
: [ 5195.410918] [<
ffffffff810adefe>] task_numa_work+0x1fe/0x310
: [ 5195.411200] [<
ffffffff81098322>] task_work_run+0x72/0x90
: [ 5195.411246] [<
ffffffff81077139>] exit_to_usermode_loop+0x91/0xc2
: [ 5195.411494] [<
ffffffff81003a51>] prepare_exit_to_usermode+0x31/0x40
: [ 5195.411739] [<
ffffffff815e56af>] retint_user+0x8/0x10
:
: Decoding revealed that the PMD was a valid prot_numa PMD and the bad PMD
: was a false detection. The bug does not trigger if automatic NUMA
: balancing or transparent huge pages is disabled.
:
: The bug is due a race in change_pmd_range between a pmd_trans_huge and
: pmd_nond_or_clear_bad check without any locks held. During the
: pmd_trans_huge check, a parallel protection update under lock can have
: cleared the PMD and filled it with a prot_numa entry between the transhuge
: check and the pmd_none_or_clear_bad check.
:
: While this could be fixed with heavy locking, it's only necessary to make
: a copy of the PMD on the stack during change_pmd_range and avoid races. A
: new helper is created for this as the check if quite subtle and the
: existing similar helpful is not suitable. This passed 154 hours of
: testing (usually triggers between 20 minutes and 24 hours) without
: detecting bad PMDs or corruption. A basic test of an autonuma-intensive
: workload showed no significant change in behaviour.
Although Mel withdrew the patch on the face of LKML comment
https://lkml.org/lkml/2017/4/10/922 the race window aforementioned is
still open, and we have reports of Linpack test reporting bad residuals
after the bad PMD warning is observed. In addition to that, bad
rss-counter and non-zero pgtables assertions are triggered on mm teardown
for the task hitting the bad PMD.
host kernel: mm/pgtable-generic.c:40: bad pmd
00000000b3152f68(
8000000d2d2008e7)
....
host kernel: BUG: Bad rss-counter state mm:
00000000b583043d idx:1 val:512
host kernel: BUG: non-zero pgtables_bytes on freeing mm: 4096
The issue is observed on a v4.18-based distribution kernel, but the race
window is expected to be applicable to mainline kernels, as well.
[akpm@linux-foundation.org: fix comment typo, per Rafael]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Rafael Aquini <aquini@redhat.com>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Cc: <stable@vger.kernel.org>
Cc: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Link: http://lkml.kernel.org/r/20200216191800.22423-1-aquini@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 6 Mar 2020 12:50:26 +0000 (06:50 -0600)]
Merge tag 'devprop-5.6-rc5' of git://git./linux/kernel/git/rafael/linux-pm
Pull device properties framework fix from Rafael Wysocki:
"Revert a problematic commit from the 5.3 development cycle (Brendan
Higgins)"
* tag 'devprop-5.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
Revert "software node: Simplify software_node_release() function"
Linus Torvalds [Fri, 6 Mar 2020 12:49:09 +0000 (06:49 -0600)]
Merge tag 'acpi-5.6-rc5' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI documentation fix from Rafael Wysocki:
"Fix Sphinx format warinings in an ACPI fan document added recently
(Randy Dunlap)"
* tag 'acpi-5.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
Documentation/admin-guide/acpi: fix fan_performance_states.rst warnings
Linus Torvalds [Fri, 6 Mar 2020 12:45:20 +0000 (06:45 -0600)]
Merge tag 'drm-fixes-2020-03-06' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Weekly fixes round, looks like a few people woke up, got a bunch of
fixes across the drivers. Bit bigger than I'd like but they all seem
fine and hopefully it quiets down now.
sun4i, kirin, mediatek and exynos on the ARM side. virtio-gpu and core
have some mmap fixes, and there is a dma-buf leak. one ttm fence leak
is also fixed.
Otherwise it's mostly amdgpu and i915.
One of the i915 fixes is for a very long latency I was seeing (using
latencytop) running gnome-shell locally when using firefox and eating
nearly all my RAM, it really helps with desktop responsiveness esp
when firefox is chewing a lot.
dma-buf:
- fix memory leak
core:
- shmem object mmap fix.
ttm:
- Fix fence leak in ttm_buffer_object_transfer().
amdgpu:
- Gfx reset fix for gfx9, 10
- Fix for gfx10
- DP MST fix
- DCC fix
- Renoir power fixes
- Navi power fix
i915:
- Break up long lists of object reclaim with cond_resched()
- PSR probe fix
- TGL workarounds
- Selftest return value fix
- Drop timeline mutex while waiting for retirement
- Wait for OA configuration completion before writes to OA buffer
virtio:
- Fix resource id creation race in virtio.
- mmap fixes
sun4i:
- Fixes for sun4i VI layer format support.
kirin:
- kirin: Revert "Fix for hikey620 display offset problem"
exynos:
- fix a kernel oops problem in case that driver is loaded as module.
- fix a regulator warning issue when I2C DDC adapter cannot be gathered.
- print out an error message only in error case excepting -EPROBE_DEFER.
mediatek:
- overlay, cursor and gce fixes"
`
* tag 'drm-fixes-2020-03-06' of git://anongit.freedesktop.org/drm/drm: (38 commits)
drm/amdgpu/display: navi1x copy dcn watermark clock settings to smu resume from s3 (v2)
drm/amd/powerplay: map mclk to fclk for COMBINATIONAL_BYPASS case
drm/amd/powerplay: fix pre-check condition for setting clock range
drm/amd/display: fix dcc swath size calculations on dcn1
drm/amd/display: Clear link settings on MST disable connector
drm/amdgpu: disable 3D pipe 1 on Navi1x
drm/amdgpu: clean wptr on wb when gpu recovery
drm: kirin: Revert "Fix for hikey620 display offset problem"
drm/i915/gt: Drop the timeline->mutex as we wait for retirement
drm/i915/perf: Reintroduce wait on OA configuration completion
drm/sun4i: Fix DE2 VI layer format support
drm/sun4i: Add separate DE3 VI layer formats
drm/sun4i: de2/de3: Remove unsupported VI layer formats
drm/i915/selftests: Fix return in assert_mmap_offset()
drm/i915: Protect i915_request_await_start from early waits
drm/i915/tgl: Add Wa_1608008084
drm/i915/tgl: Add Wa_22010178259:tgl
drm/i915: Program MBUS with rmw during initialization
drm/i915/psr: Force PSR probe only after full initialization
drm/i915/gem: Break up long lists of object reclaim
...
Thomas Bogendoerfer [Fri, 6 Mar 2020 10:58:37 +0000 (11:58 +0100)]
ALSA: sgio2audio: Remove usage of dropped hw_params/hw_free functions
Commit
ee88f4ebe575 ("ALSA: mips: Use managed buffer allocation") removed
superfluous hw_params/hw_free callbacks, but forgot to remove them where
they were used.
Fixes:
ee88f4ebe575 ("ALSA: mips: Use managed buffer allocation")
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Link: https://lore.kernel.org/r/20200306105837.31523-1-tsbogend@alpha.franken.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Rafael J. Wysocki [Fri, 6 Mar 2020 09:57:46 +0000 (10:57 +0100)]
Merge branch 'acpi-doc'
* acpi-doc:
Documentation/admin-guide/acpi: fix fan_performance_states.rst warnings
Dave Airlie [Fri, 6 Mar 2020 01:06:33 +0000 (11:06 +1000)]
Merge tag 'amd-drm-fixes-5.6-2020-03-05' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
amd-drm-fixes-5.6-2020-03-05:
amdgpu:
- Gfx reset fix for gfx9, 10
- Fix for gfx10
- DP MST fix
- DCC fix
- Renoir power fixes
- Navi power fix
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200305185957.4268-1-alexander.deucher@amd.com
Dave Airlie [Thu, 5 Mar 2020 23:18:18 +0000 (09:18 +1000)]
Merge tag 'drm-intel-fixes-2020-03-05' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
drm/i915 fixes for v5.6-rc5:
- Break up long lists of object reclaim with cond_resched()
- PSR probe fix
- TGL workarounds
- Selftest return value fix
- Drop timeline mutex while waiting for retirement
- Wait for OA configuration completion before writes to OA buffer
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/87eeu7nl6z.fsf@intel.com
Dave Airlie [Thu, 5 Mar 2020 19:40:12 +0000 (05:40 +1000)]
Merge tag 'drm-misc-fixes-2020-03-05' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
Fixes for v5.6.rc5:
- dma-buf fix memory leak
- Fix resource id creation race in virtio.
- Various mmap fixes.
- Fix fence leak in ttm_buffer_object_transfer().
- Fixes for sun4i VI layer format support.
- kirin: Revert "Fix for hikey620 display offset problem"
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/56de63c7-0cdf-5805-e268-44944af7fef2@linux.intel.com
Tycho Andersen [Sat, 8 Feb 2020 15:18:17 +0000 (08:18 -0700)]
riscv: fix seccomp reject syscall code path
If secure_computing() rejected a system call, we were previously setting
the system call number to -1, to indicate to later code that the syscall
failed. However, if something (e.g. a user notification) was sleeping, and
received a signal, we may set a0 to -ERESTARTSYS and re-try the system call
again.
In this case, seccomp "denies" the syscall (because of the signal), and we
would set a7 to -1, thus losing the value of the system call we want to
restart.
Instead, let's return -1 from do_syscall_trace_enter() to indicate that the
syscall was rejected, so we don't clobber the value in case of -ERESTARTSYS
or whatever.
This commit fixes the user_notification_signal seccomp selftest on riscv to
no longer hang. That test expects the system call to be re-issued after the
signal, and it wasn't due to the above bug. Now that it is, everything
works normally.
Note that in the ptrace (tracer) case, the tracer can set the register
values to whatever they want, so we still need to keep the code that
handles out-of-bounds syscalls. However, we can drop the comment.
We can also drop syscall_set_nr(), since it is no longer used anywhere, and
the code that re-loads the value in a7 because of it.
Reported in: https://lore.kernel.org/bpf/CAEn-LTp=ss0Dfv6J00=rCAy+N78U2AmhqJNjfqjr2FDpPYjxEQ@mail.gmail.com/
Reported-by: David Abdurachmanov <david.abdurachmanov@gmail.com>
Signed-off-by: Tycho Andersen <tycho@tycho.ws>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>