NeilBrown [Tue, 18 Jul 2023 06:38:08 +0000 (16:38 +1000)]
SUNRPC: change svc_recv() to return void.
svc_recv() currently returns a 0 on success or one of two errors:
- -EAGAIN means no message was successfully received
- -EINTR means the thread has been told to stop
Previously nfsd would stop as the result of a signal as well as
following kthread_stop(). In that case the difference was useful: EINTR
means stop unconditionally. EAGAIN means stop if kthread_should_stop(),
continue otherwise.
Now threads only exit when kthread_should_stop() so we don't need the
distinction.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
NeilBrown [Tue, 18 Jul 2023 06:38:08 +0000 (16:38 +1000)]
SUNRPC: call svc_process() from svc_recv().
All callers of svc_recv() go on to call svc_process() on success.
Simplify callers by having svc_recv() do that for them.
This loses one call to validate_process_creds() in nfsd. That was
debugging code added 14 years ago. I don't think we need to keep it.
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
NeilBrown [Mon, 31 Jul 2023 06:48:32 +0000 (16:48 +1000)]
nfsd: separate nfsd_last_thread() from nfsd_put()
Now that the last nfsd thread is stopped by an explicit act of calling
svc_set_num_threads() with a count of zero, we only have a limited
number of places that can happen, and don't need to call
nfsd_last_thread() in nfsd_put()
So separate that out and call it at the two places where the number of
threads is set to zero.
Move the clearing of ->nfsd_serv and the call to svc_xprt_destroy_all()
into nfsd_last_thread(), as they are really part of the same action.
nfsd_put() is now a thin wrapper around svc_put(), so make it a static
inline.
nfsd_put() cannot be called after nfsd_last_thread(), so in a couple of
places we have to use svc_put() instead.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
NeilBrown [Mon, 31 Jul 2023 06:48:31 +0000 (16:48 +1000)]
nfsd: Simplify code around svc_exit_thread() call in nfsd()
Previously a thread could exit asynchronously (due to a signal) so some
care was needed to hold nfsd_mutex over the last svc_put() call. Now a
thread can only exit when svc_set_num_threads() is called, and this is
always called under nfsd_mutex. So no care is needed.
Not only is the mutex held when a thread exits now, but the svc refcount
is elevated, so the svc_put() in svc_exit_thread() will never be a final
put, so the mutex isn't even needed at this point in the code.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
NeilBrown [Tue, 18 Jul 2023 06:38:08 +0000 (16:38 +1000)]
nfsd: don't allow nfsd threads to be signalled.
The original implementation of nfsd used signals to stop threads during
shutdown.
In Linux 2.3.46pre5 nfsd gained the ability to shutdown threads
internally it if was asked to run "0" threads. After this user-space
transitioned to using "rpc.nfsd 0" to stop nfsd and sending signals to
threads was no longer an important part of the API.
In commit
3ebdbe5203a8 ("SUNRPC: discard svo_setup and rename
svc_set_num_threads_sync()") (v5.17-rc1~75^2~41) we finally removed the
use of signals for stopping threads, using kthread_stop() instead.
This patch makes the "obvious" next step and removes the ability to
signal nfsd threads - or any svc threads. nfsd stops allowing signals
and we don't check for their delivery any more.
This will allow for some simplification in later patches.
A change worth noting is in nfsd4_ssc_setup_dul(). There was previously
a signal_pending() check which would only succeed when the thread was
being shut down. It should really have tested kthread_should_stop() as
well. Now it just does the latter, not the former.
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
NeilBrown [Tue, 18 Jul 2023 06:38:08 +0000 (16:38 +1000)]
lockd: remove SIGKILL handling
lockd allows SIGKILL and responds by dropping all locks and restarting
the grace period. This functionality has been present since 2.1.32 when
lockd was added to Linux.
This functionality is undocumented and most likely added as a useful
debug aid. When there is a need to drop locks, the better approach is
to use /proc/fs/nfsd/unlock_*.
This patch removes SIGKILL handling as part of preparation for removing
all signal handling from sunrpc service threads.
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Su Hui [Fri, 4 Aug 2023 01:26:57 +0000 (09:26 +0800)]
fs: lockd: avoid possible wrong NULL parameter
clang's static analysis warning: fs/lockd/mon.c: line 293, column 2:
Null pointer passed as 2nd argument to memory copy function.
Assuming 'hostname' is NULL and calling 'nsm_create_handle()', this will
pass NULL as 2nd argument to memory copy function 'memcpy()'. So return
NULL if 'hostname' is invalid.
Fixes:
77a3ef33e2de ("NSM: More clean up of nsm_get_handle()")
Signed-off-by: Su Hui <suhui@nfschina.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Zhu Wang [Mon, 31 Jul 2023 11:23:00 +0000 (19:23 +0800)]
exportfs: remove kernel-doc warnings in exportfs
Remove kernel-doc warning in exportfs:
fs/exportfs/expfs.c:395: warning: Function parameter or member 'parent'
not described in 'exportfs_encode_inode_fh'
Signed-off-by: Zhu Wang <wangzhu9@huawei.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Chuck Lever [Wed, 19 Jul 2023 18:31:29 +0000 (14:31 -0400)]
SUNRPC: Reduce thread wake-up rate when receiving large RPC messages
With large NFS WRITE requests on TCP, I measured 5-10 thread wake-
ups to receive each request. This is because the socket layer
calls ->sk_data_ready() frequently, and each call triggers a
thread wake-up. Each recvmsg() seems to pull in less than 100KB.
Have the socket layer hold ->sk_data_ready() calls until the full
incoming message has arrived to reduce the wake-up rate.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Chuck Lever [Wed, 19 Jul 2023 18:31:22 +0000 (14:31 -0400)]
SUNRPC: Revert
e0a912e8ddba
Flamegraph analysis showed that the cork/uncork calls consume
nearly a third of the CPU time spent in svc_tcp_sendto(). The
other two consumers are mutex lock/unlock and svc_tcp_sendmsg().
Now that svc_tcp_sendto() coalesces RPC messages properly, there
is no need to introduce artificial delays to prevent sending
partial messages.
After applying this change, I measured a 1.2K read IOPS increase
for 8KB random I/O (several percent) on 56Gb IP over IB.
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Chuck Lever [Wed, 19 Jul 2023 18:31:16 +0000 (14:31 -0400)]
SUNRPC: Convert svc_udp_sendto() to use the per-socket bio_vec array
Commit
da1661b93bf4 ("SUNRPC: Teach server to use xprt_sock_sendmsg
for socket sends") modified svc_udp_sendto() to use xprt_sock_sendmsg()
because we originally believed xprt_sock_sendmsg() would be needed
for TLS support. That does not actually appear to be the case.
In addition, the linkage between the client and server send code has
been a bit of a maintenance headache because of the distinct ways
that the client and server handle memory allocation.
Going forward, eventually the XDR layer will deal with its buffers
in the form of bio_vec arrays, so convert this function accordingly.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Chuck Lever [Wed, 19 Jul 2023 18:31:09 +0000 (14:31 -0400)]
SUNRPC: Send RPC message on TCP with a single sock_sendmsg() call
There is now enough infrastructure in place to combine the stream
record marker into the biovec array used to send each outgoing RPC
message on TCP. The whole message can be more efficiently sent with
a single call to sock_sendmsg() using a bio_vec iterator.
Note that this also helps with RPC-with-TLS: the TLS implementation
can now clearly see where the upper layer message boundaries are.
Before, it would send each component of the xdr_buf (record marker,
head, page payload, tail) in separate TLS records.
Suggested-by: David Howells <dhowells@redhat.com>
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Chuck Lever [Wed, 19 Jul 2023 18:31:03 +0000 (14:31 -0400)]
SUNRPC: Convert svc_tcp_sendmsg to use bio_vecs directly
Add a helper to convert a whole xdr_buf directly into an array of
bio_vecs, then send this array instead of iterating piecemeal over
the xdr_buf containing the outbound RPC message.
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Jeff Layton [Mon, 24 Jul 2023 12:13:05 +0000 (08:13 -0400)]
nfsd: inherit required unset default acls from effective set
A well-formed NFSv4 ACL will always contain OWNER@/GROUP@/EVERYONE@
ACEs, but there is no requirement for inheritable entries for those
entities. POSIX ACLs must always have owner/group/other entries, even for a
default ACL.
nfsd builds the default ACL from inheritable ACEs, but the current code
just leaves any unspecified ACEs zeroed out. The result is that adding a
default user or group ACE to an inode can leave it with unwanted deny
entries.
For instance, a newly created directory with no acl will look something
like this:
# NFSv4 translation by server
A::OWNER@:rwaDxtTcCy
A::GROUP@:rxtcy
A::EVERYONE@:rxtcy
# POSIX ACL of underlying file
user::rwx
group::r-x
other::r-x
...if I then add new v4 ACE:
nfs4_setfacl -a A:fd:1000:rwx /mnt/local/test
...I end up with a result like this today:
user::rwx
user:1000:rwx
group::r-x
mask::rwx
other::r-x
default:user::---
default:user:1000:rwx
default:group::---
default:mask::rwx
default:other::---
A::OWNER@:rwaDxtTcCy
A::1000:rwaDxtcy
A::GROUP@:rxtcy
A::EVERYONE@:rxtcy
D:fdi:OWNER@:rwaDx
A:fdi:OWNER@:tTcCy
A:fdi:1000:rwaDxtcy
A:fdi:GROUP@:tcy
A:fdi:EVERYONE@:tcy
...which is not at all expected. Adding a single inheritable allow ACE
should not result in everyone else losing access.
The setfacl command solves a silimar issue by copying owner/group/other
entries from the effective ACL when none of them are set:
"If a Default ACL entry is created, and the Default ACL contains no
owner, owning group, or others entry, a copy of the ACL owner,
owning group, or others entry is added to the Default ACL.
Having nfsd do the same provides a more sane result (with no deny ACEs
in the resulting set):
user::rwx
user:1000:rwx
group::r-x
mask::rwx
other::r-x
default:user::rwx
default:user:1000:rwx
default:group::r-x
default:mask::rwx
default:other::r-x
A::OWNER@:rwaDxtTcCy
A::1000:rwaDxtcy
A::GROUP@:rxtcy
A::EVERYONE@:rxtcy
A:fdi:OWNER@:rwaDxtTcCy
A:fdi:1000:rwaDxtcy
A:fdi:GROUP@:rxtcy
A:fdi:EVERYONE@:rxtcy
Reported-by: Ondrej Valousek <ondrej.valousek@diasemi.com>
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=
2136452
Suggested-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
YueHaibing [Sat, 22 Jul 2023 03:31:16 +0000 (11:31 +0800)]
sunrpc: Remove unused extern declarations
Since commit
49b28684fdba ("nfsd: Remove deprecated nfsctl system call and related code.")
these declarations are unused, so can remove it.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Alexander Aring [Thu, 20 Jul 2023 12:58:04 +0000 (08:58 -0400)]
lockd: nlm_blocked list race fixes
This patch fixes races when lockd accesses the global nlm_blocked list.
It was mostly safe to access the list because everything was accessed
from the lockd kernel thread context but there exist cases like
nlmsvc_grant_deferred() that could manipulate the nlm_blocked list and
it can be called from any context.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Jeff Layton [Mon, 24 Jul 2023 14:53:39 +0000 (10:53 -0400)]
nfsd: set missing after_change as before_change + 1
In the event that we can't fetch post_op_attr attributes, we still need
to set a value for the after_change. The operation has already happened,
so we're not able to return an error at that point, but we do want to
ensure that the client knows that its cache should be invalidated.
If we weren't able to fetch post-op attrs, then just set the
after_change to before_change + 1. The atomic flag should already be
clear in this case.
Suggested-by: Neil Brown <neilb@suse.de>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Jeff Layton [Fri, 21 Jul 2023 14:29:11 +0000 (10:29 -0400)]
nfsd: remove unsafe BUG_ON from set_change_info
At one time, nfsd would scrape inode information directly out of struct
inode in order to populate the change_info4. At that time, the BUG_ON in
set_change_info made some sense, since having it unset meant a coding
error.
More recently, it calls vfs_getattr to get this information, which can
fail. If that fails, fh_pre_saved can end up not being set. While this
situation is unfortunate, we don't need to crash the box.
Move set_change_info to nfs4proc.c since all of the callers are there.
Revise the condition for setting "atomic" to also check for
fh_pre_saved. Drop the BUG_ON and just have it zero out both
change_attr4s when this occurs.
Reported-by: Boyang Xue <bxue@redhat.com>
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=
2223560
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Jeff Layton [Fri, 21 Jul 2023 14:29:10 +0000 (10:29 -0400)]
nfsd: handle failure to collect pre/post-op attrs more sanely
Collecting pre_op_attrs can fail, in which case it's probably best to
fail the whole operation.
Change fh_fill_pre_attrs and fh_fill_both_attrs to return __be32, and
have the callers check the return code and abort the operation if it's
not nfs_ok.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Jeff Layton [Thu, 20 Jul 2023 13:34:53 +0000 (09:34 -0400)]
nfsd: add a MODULE_DESCRIPTION
I got this today from modpost:
WARNING: modpost: missing MODULE_DESCRIPTION() in fs/nfsd/nfsd.o
Add a module description.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Chuck Lever [Sun, 9 Jul 2023 15:45:48 +0000 (11:45 -0400)]
NFSD: Rename struct svc_cacherep
The svc_ prefix is identified with the SunRPC layer. Although the
duplicate reply cache caches RPC replies, it is only for the NFS
protocol. Rename the struct to better reflect its purpose.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Chuck Lever [Sun, 9 Jul 2023 15:45:41 +0000 (11:45 -0400)]
NFSD: Remove svc_rqst::rq_cacherep
Over time I'd like to see NFS-specific fields moved out of struct
svc_rqst, which is an RPC layer object. These fields are layering
violations.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Chuck Lever [Sun, 9 Jul 2023 15:45:35 +0000 (11:45 -0400)]
NFSD: Refactor the duplicate reply cache shrinker
Avoid holding the bucket lock while freeing cache entries. This
change also caps the number of entries that are freed when the
shrinker calls to reduce the shrinker's impact on the cache's
effectiveness.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Chuck Lever [Sun, 9 Jul 2023 15:45:29 +0000 (11:45 -0400)]
NFSD: Replace nfsd_prune_bucket()
Enable nfsd_prune_bucket() to drop the bucket lock while calling
kfree(). Use the same pattern that Jeff recently introduced in the
NFSD filecache.
A few percpu operations are moved outside the lock since they
temporarily disable local IRQs which is expensive and does not
need to be done while the lock is held.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Chuck Lever [Sun, 9 Jul 2023 15:45:22 +0000 (11:45 -0400)]
NFSD: Rename nfsd_reply_cache_alloc()
For readability, rename to match the other helpers.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Chuck Lever [Sun, 9 Jul 2023 15:45:16 +0000 (11:45 -0400)]
NFSD: Refactor nfsd_reply_cache_free_locked()
To reduce contention on the bucket locks, we must avoid calling
kfree() while each bucket lock is held.
Start by refactoring nfsd_reply_cache_free_locked() into a helper
that removes an entry from the bucket (and must therefore run under
the lock) and a second helper that frees the entry (which does not
need to hold the lock).
For readability, rename the helpers nfsd_cacherep_<verb>.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Chuck Lever [Thu, 29 Jun 2023 17:51:32 +0000 (13:51 -0400)]
SUNRPC: Remove net/sunrpc/auth_gss/gss_krb5_seqnum.c
These functions are no longer used.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Chuck Lever [Thu, 29 Jun 2023 17:51:26 +0000 (13:51 -0400)]
SUNRPC: Remove the ->import_ctx method
All supported encryption types now use the same context import
function.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Chuck Lever [Thu, 29 Jun 2023 17:51:19 +0000 (13:51 -0400)]
SUNRPC: Remove CONFIG_RPCSEC_GSS_KRB5_CRYPTOSYSTEM
This code is now always on, so the ifdef can be removed.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Chuck Lever [Thu, 29 Jun 2023 17:51:13 +0000 (13:51 -0400)]
SUNRPC: Remove gss_import_v1_context()
We no longer support importing v1 contexts.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Chuck Lever [Thu, 29 Jun 2023 17:51:06 +0000 (13:51 -0400)]
SUNRPC: Remove krb5_derive_key_v1()
This function is no longer used.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Chuck Lever [Thu, 29 Jun 2023 17:51:00 +0000 (13:51 -0400)]
SUNRPC: Remove code behind CONFIG_RPCSEC_GSS_KRB5_SIMPLIFIED
None of this code can be enabled any more.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Chuck Lever [Thu, 29 Jun 2023 17:50:52 +0000 (13:50 -0400)]
SUNRPC: Remove DES and DES3 enctypes from the supported enctypes list
These enctypes can no longer be enabled via CONFIG.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Chuck Lever [Thu, 29 Jun 2023 17:50:46 +0000 (13:50 -0400)]
SUNRPC: Remove Kunit tests for the DES3 encryption type
The DES3 encryption type is no longer implemented.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Chuck Lever [Thu, 29 Jun 2023 17:50:39 +0000 (13:50 -0400)]
SUNRPC: Remove RPCSEC_GSS_KRB5_ENCTYPES_DES
Make it impossible to enable support for the DES or DES3 Kerberos
encryption types in SunRPC. These enctypes were deprecated by RFCs
6649 and 8429 because they are known to be insecure.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Dai Ngo [Fri, 30 Jun 2023 01:52:40 +0000 (18:52 -0700)]
NFSD: Enable write delegation support
This patch grants write delegations for OPEN with NFS4_SHARE_ACCESS_WRITE
if there is no conflict with other OPENs.
Write delegation conflicts with another OPEN, REMOVE, RENAME and SETATTR
are handled the same as read delegation using notify_change,
try_break_deleg.
The NFSv4.0 protocol does not enable a server to determine that a
conflicting GETATTR originated from the client holding the
delegation versus coming from some other client. With NFSv4.1 and
later, the SEQUENCE operation that begins each COMPOUND contains a
client ID, so delegation recall can be safely squelched in this case.
With NFSv4.0, however, the server must recall or send a CB_GETATTR
(per RFC 7530 Section 16.7.5) even when the GETATTR originates from
the client holding that delegation.
An NFSv4.0 client can trigger a pathological situation if it always
sends a DELEGRETURN preceded by a conflicting GETATTR in the same
COMPOUND. COMPOUND execution will always stop at the GETATTR and the
DELEGRETURN will never get executed. The server eventually revokes
the delegation, which can result in loss of open or lock state.
Tracepoint added to track whether read or write delegation is granted.
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Chuck Lever [Wed, 19 Jul 2023 15:33:09 +0000 (11:33 -0400)]
NFSD: Report zero space limit for write delegations
Replace the -1 (no limit) with a zero (no reserved space).
This prevents certain non-determinant client behavior, such as
silly-renaming a file when the only open reference is a write
delegation. Such a rename can leave unexpected .nfs files in a
directory that is otherwise supposed to be empty.
Note that other server implementations that support write delegation
also set this field to zero.
Suggested-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Dai Ngo [Fri, 30 Jun 2023 01:52:39 +0000 (18:52 -0700)]
NFSD: handle GETATTR conflict with write delegation
If the GETATTR request on a file that has write delegation in effect and
the request attributes include the change info and size attribute then
the write delegation is recalled. If the delegation is returned within
30ms then the GETATTR is serviced as normal otherwise the NFS4ERR_DELAY
error is returned for the GETATTR.
Add counter for write delegation recall due to conflict GETATTR. This is
used to evaluate the need to implement CB_GETATTR to adoid recalling the
delegation with conflit GETATTR.
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Dai Ngo [Fri, 30 Jun 2023 01:52:37 +0000 (18:52 -0700)]
locks: allow support for write delegation
Remove the check for F_WRLCK in generic_add_lease to allow file_lock
to be used for write delegation.
First consumer is NFSD.
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Chuck Lever [Mon, 28 Aug 2023 13:23:00 +0000 (09:23 -0400)]
SUNRPC: Fix the recent bv_offset fix
Jeff confirmed his original fix addressed his pynfs test failure,
but this same bug also impacted qemu: accessing qcow2 virtual disks
using direct I/O was failing. Jeff's fix missed that you have to
shorten the bio_vec element by the same amount as you increased
the page offset.
Reported-by: Maxim Levitsky <mlevitsk@redhat.com>
Fixes:
c96e2a695e00 ("sunrpc: set the bv_offset of first bvec in svc_tcp_sendmsg")
Tested-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Linus Torvalds [Sun, 27 Aug 2023 21:49:51 +0000 (14:49 -0700)]
Linux 6.5
Linus Torvalds [Sun, 27 Aug 2023 14:33:54 +0000 (07:33 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Three small driver fixes and one larger unused function set removal in
the raid class (so no external impact)"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: snic: Fix double free in snic_tgt_create()
scsi: core: raid_class: Remove raid_component_add()
scsi: ufs: ufs-qcom: Clear qunipro_g4_sel for HW major version > 5
scsi: ufs: mcq: Fix the search/wrap around logic
Linus Torvalds [Sat, 26 Aug 2023 17:57:29 +0000 (10:57 -0700)]
Merge tag 'x86-urgent-2023-08-26' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Fix an FPU invalidation bug on exec(), and fix a performance
regression due to a missing setting of X86_FEATURE_OSXSAVE"
* tag 'x86-urgent-2023-08-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/fpu: Set X86_FEATURE_OSXSAVE feature after enabling OSXSAVE in CR4
x86/fpu: Invalidate FPU state correctly on exec()
Linus Torvalds [Sat, 26 Aug 2023 17:34:29 +0000 (10:34 -0700)]
Merge tag 'irq-urgent-2023-08-26' of git://git./linux/kernel/git/tip/tip
Pull irq fix from Thomas Gleixner:
"A last minute fix for a regression introduced in the v6.5 merge
window.
The conversion of the software based interrupt resend mechanism to
hlist missed to add a check whether the descriptor is already enqueued
and dropped the interrupt descriptor lookup for nested interrupts.
The missing check whether the descriptor is already queued causes
hlist corruption and can be observed in the wild. The dropped parent
descriptor lookup has not yet caused problems, but it would result in
stale interrupt line in the worst case.
Add the missing enqueued check and bring the descriptor lookup back to
cure this"
* tag 'irq-urgent-2023-08-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq: Fix software resend lockup and nested resend
Linus Torvalds [Sat, 26 Aug 2023 17:28:52 +0000 (10:28 -0700)]
Merge tag 'loongarch-fixes-6.5-2' of git://git./linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch fixes from Huacai Chen:
"Fix a ptrace bug, a hw_breakpoint bug, some build errors/warnings and
some trivial cleanups"
* tag 'loongarch-fixes-6.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
LoongArch: Fix hw_breakpoint_control() for watchpoints
LoongArch: Ensure FP/SIMD registers in the core dump file is up to date
LoongArch: Put the body of play_dead() into arch_cpu_idle_dead()
LoongArch: Add identifier names to arguments of die() declaration
LoongArch: Return earlier in die() if notify_die() returns NOTIFY_STOP
LoongArch: Do not kill the task in die() if notify_die() returns NOTIFY_STOP
LoongArch: Remove <asm/export.h>
LoongArch: Replace #include <asm/export.h> with #include <linux/export.h>
LoongArch: Remove unneeded #include <asm/export.h>
LoongArch: Replace -ffreestanding with finer-grained -fno-builtin's
LoongArch: Remove redundant "source drivers/firmware/Kconfig"
Johan Hovold [Sat, 26 Aug 2023 15:40:04 +0000 (17:40 +0200)]
genirq: Fix software resend lockup and nested resend
The switch to using hlist for managing software resend of interrupts
broke resend in at least two ways:
First, unconditionally adding interrupt descriptors to the resend list can
corrupt the list when the descriptor in question has already been
added. This causes the resend tasklet to loop indefinitely with interrupts
disabled as was recently reported with the Lenovo ThinkPad X13s after
threaded NAPI was disabled in the ath11k WiFi driver.
This bug is easily fixed by restoring the old semantics of irq_sw_resend()
so that it can be called also for descriptors that have already been marked
for resend.
Second, the offending commit also broke software resend of nested
interrupts by simply discarding the code that made sure that such
interrupts are retriggered using the parent interrupt.
Add back the corresponding code that adds the parent descriptor to the
resend list.
Fixes:
bc06a9e08742 ("genirq: Use hlist for managing resend handlers")
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/lkml/20230809073432.4193-1-johan+linaro@kernel.org/
Link: https://lore.kernel.org/r/20230826154004.1417-1-johan+linaro@kernel.org
Huacai Chen [Sat, 26 Aug 2023 14:21:57 +0000 (22:21 +0800)]
LoongArch: Fix hw_breakpoint_control() for watchpoints
In hw_breakpoint_control(), encode_ctrl_reg() has already encoded the
MWPnCFG3_LoadEn/MWPnCFG3_StoreEn bits in info->ctrl. We don't need to
add (1 << MWPnCFG3_LoadEn | 1 << MWPnCFG3_StoreEn) unconditionally.
Otherwise we can't set read watchpoint and write watchpoint separately.
Cc: stable@vger.kernel.org
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Huacai Chen [Sat, 26 Aug 2023 14:21:57 +0000 (22:21 +0800)]
LoongArch: Ensure FP/SIMD registers in the core dump file is up to date
This is a port of commit
379eb01c21795edb4c ("riscv: Ensure the value
of FP registers in the core dump file is up to date").
The values of FP/SIMD registers in the core dump file come from the
thread.fpu. However, kernel saves the FP/SIMD registers only before
scheduling out the process. If no process switch happens during the
exception handling, kernel will not have a chance to save the latest
values of FP/SIMD registers. So it may cause their values in the core
dump file incorrect. To solve this problem, force fpr_get()/simd_get()
to save the FP/SIMD registers into the thread.fpu if the target task
equals the current task.
Cc: stable@vger.kernel.org
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Linus Torvalds [Sat, 26 Aug 2023 00:49:03 +0000 (17:49 -0700)]
Merge tag 'clk-fixes-for-linus' of git://git./linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
"One clk driver fix and two clk framework fixes:
- Fix an OOB access when devm_get_clk_from_child() is used and
devm_clk_release() casts the void pointer to the wrong type
- Move clk_rate_exclusive_{get,put}() within the correct ifdefs in
clk.h so that the stubs are used when CONFIG_COMMON_CLK=n
- Register the proper clk provider function depending on the value of
#clock-cells in the TI keystone driver"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: Fix slab-out-of-bounds error in devm_clk_release()
clk: Fix undefined reference to `clk_rate_exclusive_{get,put}'
clk: keystone: syscon-clk: Fix audio refclk
Helge Deller [Fri, 25 Aug 2023 19:50:33 +0000 (21:50 +0200)]
lib/clz_ctz.c: Fix __clzdi2() and __ctzdi2() for 32-bit kernels
The gcc compiler translates on some architectures the 64-bit
__builtin_clzll() function to a call to the libgcc function __clzdi2(),
which should take a 64-bit parameter on 32- and 64-bit platforms.
But in the current kernel code, the built-in __clzdi2() function is
defined to operate (wrongly) on 32-bit parameters if BITS_PER_LONG ==
32, thus the return values on 32-bit kernels are in the range from
[0..31] instead of the expected [0..63] range.
This patch fixes the in-kernel functions __clzdi2() and __ctzdi2() to
take a 64-bit parameter on 32-bit kernels as well, thus it makes the
functions identical for 32- and 64-bit kernels.
This bug went unnoticed since kernel 3.11 for over 10 years, and here
are some possible reasons for that:
a) Some architectures have assembly instructions to count the bits and
which are used instead of calling __clzdi2(), e.g. on x86 the bsr
instruction and on ppc cntlz is used. On such architectures the
wrong __clzdi2() implementation isn't used and as such the bug has
no effect and won't be noticed.
b) Some architectures link to libgcc.a, and the in-kernel weak
functions get replaced by the correct 64-bit variants from libgcc.a.
c) __builtin_clzll() and __clzdi2() doesn't seem to be used in many
places in the kernel, and most likely only in uncritical functions,
e.g. when printing hex values via seq_put_hex_ll(). The wrong return
value will still print the correct number, but just in a wrong
formatting (e.g. with too many leading zeroes).
d) 32-bit kernels aren't used that much any longer, so they are less
tested.
A trivial testcase to verify if the currently running 32-bit kernel is
affected by the bug is to look at the output of /proc/self/maps:
Here the kernel uses a correct implementation of __clzdi2():
root@debian:~# cat /proc/self/maps
00010000-
00019000 r-xp
00000000 08:05 787324 /usr/bin/cat
00019000-
0001a000 rwxp
00009000 08:05 787324 /usr/bin/cat
0001a000-
0003b000 rwxp
00000000 00:00 0 [heap]
f7551000-
f770d000 r-xp
00000000 08:05 794765 /usr/lib/hppa-linux-gnu/libc.so.6
...
and this kernel uses the broken implementation of __clzdi2():
root@debian:~# cat /proc/self/maps
0000000010000-
0000000019000 r-xp
00000000 000000008:
000000005 787324 /usr/bin/cat
0000000019000-
000000001a000 rwxp
000000009000 000000008:
000000005 787324 /usr/bin/cat
000000001a000-
000000003b000 rwxp
00000000 00:00 0 [heap]
00000000f73d1000-
00000000f758d000 r-xp
00000000 000000008:
000000005 794765 /usr/lib/hppa-linux-gnu/libc.so.6
...
Signed-off-by: Helge Deller <deller@gmx.de>
Fixes:
4df87bb7b6a22 ("lib: add weak clz/ctz functions")
Cc: Chanho Min <chanho.min@lge.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: stable@vger.kernel.org # v3.11+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 25 Aug 2023 18:44:43 +0000 (11:44 -0700)]
Merge tag 'mm-hotfixes-stable-2023-08-25-11-07' of git://git./linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"18 hotfixes. 13 are cc:stable and the remainder pertain to post-6.4
issues or aren't considered suitable for a -stable backport"
* tag 'mm-hotfixes-stable-2023-08-25-11-07' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
shmem: fix smaps BUG sleeping while atomic
selftests: cachestat: catch failing fsync test on tmpfs
selftests: cachestat: test for cachestat availability
maple_tree: disable mas_wr_append() when other readers are possible
madvise:madvise_free_pte_range(): don't use mapcount() against large folio for sharing check
madvise:madvise_free_huge_pmd(): don't use mapcount() against large folio for sharing check
madvise:madvise_cold_or_pageout_pte_range(): don't use mapcount() against large folio for sharing check
mm: multi-gen LRU: don't spin during memcg release
mm: memory-failure: fix unexpected return value in soft_offline_page()
radix tree: remove unused variable
mm: add a call to flush_cache_vmap() in vmap_pfn()
selftests/mm: FOLL_LONGTERM need to be updated to 0x100
nilfs2: fix general protection fault in nilfs_lookup_dirty_data_buffers()
mm/gup: handle cont-PTE hugetlb pages correctly in gup_must_unshare() via GUP-fast
selftests: cgroup: fix test_kmem_basic less than error
mm: enable page walking API to lock vmas during the walk
smaps: use vm_normal_page_pmd() instead of follow_trans_huge_pmd()
mm/gup: reintroduce FOLL_NUMA as FOLL_HONOR_NUMA_FAULT
Linus Torvalds [Fri, 25 Aug 2023 16:29:47 +0000 (09:29 -0700)]
Merge tag 'riscv-for-linus-6.5-rc8' of git://git./linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
"This is obviously not ideal, particularly for something this late in
the cycle.
Unfortunately we found some uABI issues in the vector support while
reviewing the GDB port, which has triggered a revert -- probably a
good sign we should have reviewed GDB before merging this, I guess I
just dropped the ball because I was so worried about the context
extension and libc suff I forgot. Hence the late revert.
There's some risk here as we're still exposing the vector context for
signal handlers, but changing that would have meant reverting all of
the vector support. The issues we've found so far have been fixed
already and they weren't absolute showstoppers, so we're essentially
just playing it safe by holding ptrace support for another release (or
until we get through a proper userspace code review).
Summary:
- The vector ucontext extension has been extended with vlenb
- The vector registers ELF core dump note type has been changed to
avoid aliasing with the CSR type used in embedded systems
- Support for accessing vector registers via ptrace() has been
reverted
- Another build fix for the ISA spec changes around Zifencei/Zicsr
that manifests on some systems built with binutils-2.37 and
gcc-11.2"
* tag 'riscv-for-linus-6.5-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Fix build errors using binutils2.37 toolchains
RISC-V: vector: export VLENB csr in __sc_riscv_v_state
RISC-V: Remove ptrace support for vectors
Linus Torvalds [Fri, 25 Aug 2023 16:18:22 +0000 (09:18 -0700)]
Merge tag 'gpio-fixes-for-v6.5' of git://git./linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
- fix an irq mapping leak in gpio-sim
- associate the GPIO device's software node with the irq domain in
gpio-sim
* tag 'gpio-fixes-for-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: sim: pass the GPIO device's software node to irq domain
gpio: sim: dispose of irq mappings before destroying the irq_sim domain
Linus Torvalds [Fri, 25 Aug 2023 16:10:16 +0000 (09:10 -0700)]
Merge tag 'pinctrl-v6.5-4' of git://git./linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
"Here are some Renesas and AMD driver fixes, the AMD fix affects
important laptops in the wild so this one is pretty important. It
seems a bit tough to get this right.
- Fix DT parsing and related locking in the Renesas driver.
- Fix wakeup IRQs in the AMD driver once again. Really tricky this
one"
* tag 'pinctrl-v6.5-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: amd: Mask wake bits on probe again
pinctrl: renesas: rza2: Add lock around pinctrl_generic{{add,remove}_group,{add,remove}_function}
pinctrl: renesas: rzv2m: Fix NULL pointer dereference in rzv2m_dt_subnode_to_map()
pinctrl: renesas: rzg2l: Fix NULL pointer dereference in rzg2l_dt_subnode_to_map()
Linus Torvalds [Fri, 25 Aug 2023 15:48:14 +0000 (08:48 -0700)]
Merge tag 'sound-6.5' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Hopefully the last bits for 6.5. It's slightly higher LOCs than
wished, but it doesn't look scary.
The biggest change is MAINTAINERS update for TI; it's good to have the
update before the final release, so that people can contact to the
right persons for bug reports (which shouldn't happen of course!)
The rest are all device-specific fixes and quirks, most for various
ASoC platforms"
* tag 'sound-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ASoC: amd: yc: Fix a non-functional mic on Lenovo 82SJ
ALSA: ymfpci: Fix the missing snd_card_free() call at probe error
ASoC: cs35l41: Correct amp_gain_tlv values
ASoC: amd: yc: Add VivoBook Pro 15 to quirks list for acp6x
ASoC: tas2781: fixed register access error when switching to other chips
ASoC: cs35l56: Add an ACPI match table
ASoC: cs35l56: Read firmware uuid from a device property instead of _SUB
ASoC: SOF: ipc4-pcm: fix possible null pointer deference
MAINTAINERS: Add entries for TEXAS INSTRUMENTS ASoC DRIVERS
Tiezhu Yang [Fri, 25 Aug 2023 15:40:38 +0000 (23:40 +0800)]
LoongArch: Put the body of play_dead() into arch_cpu_idle_dead()
The initial aim is to silence the following objtool warning:
arch/loongarch/kernel/process.o: warning: objtool: arch_cpu_idle_dead() falls through to next function start_thread()
According to tools/objtool/Documentation/objtool.txt, this is because
the last instruction of arch_cpu_idle_dead() is a call to a noreturn
function play_dead(). In order to silence the warning, one simple way
is to add the noreturn function play_dead() to objtool's hard-coded
global_noreturns array, that is to say, just put "NORETURN(play_dead)"
into tools/objtool/noreturns.h, it works well.
But I noticed that play_dead() is only defined once and only called by
arch_cpu_idle_dead(), so put the body of play_dead() into the caller
arch_cpu_idle_dead(), then remove the noreturn function play_dead() is
an alternative way which can reduce the overhead of the function call
at the same time.
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Tiezhu Yang [Fri, 25 Aug 2023 15:40:26 +0000 (23:40 +0800)]
LoongArch: Add identifier names to arguments of die() declaration
Add identifier names to arguments of die() declaration in ptrace.h
to fix the following checkpatch warnings:
WARNING: function definition argument 'const char *' should also have an identifier name
WARNING: function definition argument 'struct pt_regs *' should also have an identifier name
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Tiezhu Yang [Fri, 25 Aug 2023 15:40:26 +0000 (23:40 +0800)]
LoongArch: Return earlier in die() if notify_die() returns NOTIFY_STOP
After the call to oops_exit(), it should not panic or execute
the crash kernel if the oops is to be suppressed.
Suggested-by: Maciej W. Rozycki <macro@orcam.me.uk>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Tiezhu Yang [Fri, 25 Aug 2023 15:40:26 +0000 (23:40 +0800)]
LoongArch: Do not kill the task in die() if notify_die() returns NOTIFY_STOP
If notify_die() returns NOTIFY_STOP, honor the return value from the
handler chain invocation in die() and return without killing the task
as, through a debugger, the fault may have been fixed. It makes sense
even if ignoring the event will make the system unstable: by allowing
access through a debugger it has been compromised already anyway. It
makes our port consistent with x86, arm64, riscv and csky.
Commit
20c0d2d44029 ("[PATCH] i386: pass proper trap numbers to die
chain handlers") may be the earliest of similar changes.
Link: https://lore.kernel.org/r/43DDF02E.76F0.0078.0@novell.com/
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Masahiro Yamada [Fri, 25 Aug 2023 15:40:26 +0000 (23:40 +0800)]
LoongArch: Remove <asm/export.h>
All *.S files under arch/loongarch/ have been converted to include
<linux/export.h> instead of <asm/export.h>.
Remove <asm/export.h>.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Masahiro Yamada [Fri, 25 Aug 2023 15:40:26 +0000 (23:40 +0800)]
LoongArch: Replace #include <asm/export.h> with #include <linux/export.h>
Commit
ddb5cdbafaaad ("kbuild: generate KSYMTAB entries by modpost")
deprecated <asm/export.h>, which is now a wrapper of <linux/export.h>.
Replace #include <asm/export.h> with #include <linux/export.h>.
After all the <asm/export.h> lines are converted, <asm/export.h> and
<asm-generic/export.h> will be removed.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Masahiro Yamada [Fri, 25 Aug 2023 15:40:26 +0000 (23:40 +0800)]
LoongArch: Remove unneeded #include <asm/export.h>
There is no EXPORT_SYMBOL() line there, hence #include <asm/export.h>
is unneeded.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
WANG Xuerui [Fri, 25 Aug 2023 15:40:26 +0000 (23:40 +0800)]
LoongArch: Replace -ffreestanding with finer-grained -fno-builtin's
As explained by Nick in the original issue: the kernel usually does a
good job of providing library helpers that have similar semantics as
their ordinary userspace libc equivalents, but -ffreestanding disables
such libcall optimization and other related features in the compiler,
which can lead to unexpected things such as CONFIG_FORTIFY_SOURCE not
working (!).
However, due to the desire for better control over unaligned accesses
with respect to CONFIG_ARCH_STRICT_ALIGN, and also for avoiding the
GCC bug https://gcc.gnu.org/PR109465, we do want to still disable
optimizations for the memory libcalls (memcpy, memmove and memset for
now). Use finer-grained -fno-builtin-* toggles to achieve this without
losing source fortification and other libcall optimizations.
Closes: https://github.com/ClangBuiltLinux/linux/issues/1897
Reported-by: Nathan Chancellor <nathan@kernel.org>
Suggested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Xi Ruoyao [Fri, 25 Aug 2023 15:40:26 +0000 (23:40 +0800)]
LoongArch: Remove redundant "source drivers/firmware/Kconfig"
In drivers/Kconfig, drivers/firmware/Kconfig is sourced for all ports so
there is no need to source it in the port-specific Kconfig file. And
sourcing it here also caused the "Firmware Drivers" menu appeared two
times: one in the "Device Drivers" menu, another in the toplevel menu.
This is really puzzling so remove it.
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Linus Torvalds [Fri, 25 Aug 2023 15:38:40 +0000 (08:38 -0700)]
Merge tag 'drm-fixes-2023-08-25' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"A bit bigger than I'd care for, but it's mostly a single vmwgfx fix
and a fix for an i915 hotplug probing. Otherwise misc i915, bridge,
panfrost and dma-buf fixes.
core:
- add a HPD poll helper
i915:
- fix regression in i915 polling
- fix docs build warning
- fix DG2 idle power consumption
bridge:
- samsung-dsim: init fix
panfrost:
- fix speed binning issue
dma-buf:
- fix recursive lock in fence signal
vmwgfx:
- fix shader stage validation
- fix NULL ptr derefs in gem put"
* tag 'drm-fixes-2023-08-25' of git://anongit.freedesktop.org/drm/drm:
drm/i915: Fix HPD polling, reenabling the output poll work as needed
drm: Add an HPD poll helper to reschedule the poll work
drm/vmwgfx: Fix possible invalid drm gem put calls
drm/vmwgfx: Fix shader stage validation
dma-buf/sw_sync: Avoid recursive lock during fence signal
drm/i915: fix Sphinx indentation warning
drm/i915/dgfx: Enable d3cold at s2idle
drm/display/dp: Fix the DP DSC Receiver cap size
drm/panfrost: Skip speed binning on EOPNOTSUPP
drm: bridge: samsung-dsim: Fix init during host transfer
Takashi Iwai [Fri, 25 Aug 2023 07:43:49 +0000 (09:43 +0200)]
Merge tag 'asoc-fix-v6.5-rc7-2' of https://git./linux/kernel/git/broonie/sound into for-linus
ASoC: Quirk for v6.5
One additional fix for v6.5, an additional quirk. As with the other
fixes this could wait for the merge window.
Linus Torvalds [Fri, 25 Aug 2023 02:39:20 +0000 (19:39 -0700)]
Merge tag 'trace-v6.5-rc6' of git://git./linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Fix ring buffer being permanently disabled due to missed
record_disabled()
Changing the trace cpu mask will disable the ring buffers for the
CPUs no longer in the mask. But it fails to update the snapshot
buffer. If a snapshot takes place, the accounting for the ring buffer
being disabled is corrupted and this can lead to the ring buffer
being permanently disabled.
- Add test case for snapshot and cpu mask working together
- Fix memleak by the function graph tracer not getting closed properly.
The iterator is used to read the ring buffer. When it opens, it calls
the open function of a tracer, and when it is closed, it calls the
close iteration. While a trace is being read, it is still possible to
change the tracer.
If this happens between the function graph tracer and the wakeup
tracer (which uses function graph tracing), the tracers are not
closed properly during when the iterator sees the switch, and the
wakeup function did not initialize its private pointer to NULL, which
is used to know if the function graph tracer was the last tracer. It
could be fooled in thinking it is, but then on exit it does not call
the close function of the function graph tracer to clean up its data.
- Fix synthetic events on big endian machines, by introducing a union
that does the conversions properly.
- Fix synthetic events from printing out the number of elements in the
stacktrace when it shouldn't.
- Fix synthetic events stacktrace to not print a bogus value at the
end.
- Introduce a pipe_cpumask that prevents the trace_pipe files from
being opened by more than one task (file descriptor).
There was a race found where if splice is called, the iter->ent could
become stale and events could be missed. There's no point reading a
producer/consumer file by more than one task as they will corrupt
each other anyway. Add a cpumask that keeps track of the per_cpu
trace_pipe files as well as the global trace_pipe file that prevents
more than one open of a trace_pipe file that represents the same ring
buffer. This prevents the race from happening.
- Fix ftrace samples for arm64 to work with older compilers.
* tag 'trace-v6.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
samples: ftrace: Replace bti assembly with hint for older compiler
tracing: Introduce pipe_cpumask to avoid race on trace_pipes
tracing: Fix memleak due to race between current_tracer and trace
tracing/synthetic: Allocate one additional element for size
tracing/synthetic: Skip first entry for stack traces
tracing/synthetic: Use union instead of casts
selftests/ftrace: Add a basic testcase for snapshot
tracing: Fix cpu buffers unavailable due to 'record_disabled' missed
Zhu Wang [Sat, 19 Aug 2023 08:39:41 +0000 (08:39 +0000)]
scsi: snic: Fix double free in snic_tgt_create()
Commit
41320b18a0e0 ("scsi: snic: Fix possible memory leak if device_add()
fails") fixed the memory leak caused by dev_set_name() when device_add()
failed. However, it did not consider that 'tgt' has already been released
when put_device(&tgt->dev) is called. Remove kfree(tgt) in the error path
to avoid double free of 'tgt' and move put_device(&tgt->dev) after the
removed kfree(tgt) to avoid a use-after-free.
Fixes:
41320b18a0e0 ("scsi: snic: Fix possible memory leak if device_add() fails")
Signed-off-by: Zhu Wang <wangzhu9@huawei.com>
Link: https://lore.kernel.org/r/20230819083941.164365-1-wangzhu9@huawei.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Linus Torvalds [Fri, 25 Aug 2023 02:10:53 +0000 (19:10 -0700)]
Merge tag 'media/v6.5-4' of git://git./linux/kernel/git/mchehab/linux-media
Pull media fix from Mauro Carvalho Chehab:
"Fix a potential array out-of-bounds in the mediatek vcodec driver"
* tag 'media/v6.5-4' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
media: vcodec: Fix potential array out-of-bounds in encoder queue_setup
Zhu Wang [Tue, 22 Aug 2023 01:52:54 +0000 (01:52 +0000)]
scsi: core: raid_class: Remove raid_component_add()
The raid_component_add() function was added to the kernel tree via patch
"[SCSI] embryonic RAID class" (2005). Remove this function since it never
has had any callers in the Linux kernel. And also raid_component_release()
is only used in raid_component_add(), so it is also removed.
Signed-off-by: Zhu Wang <wangzhu9@huawei.com>
Link: https://lore.kernel.org/r/20230822015254.184270-1-wangzhu9@huawei.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Fixes:
04b5b5cb0136 ("scsi: core: Fix possible memory leak if device_add() fails")
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Dave Airlie [Thu, 24 Aug 2023 21:35:23 +0000 (07:35 +1000)]
Merge tag 'drm-intel-fixes-2023-08-24' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
- Fix power consumption at s2idle on DG2 (Anshuman)
- Fix documentation build warning (Jani)
- Fix Display HPD (Imre)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ZOdPRFSJpo0ErPX/@intel.com
Hugh Dickins [Wed, 23 Aug 2023 05:14:47 +0000 (22:14 -0700)]
shmem: fix smaps BUG sleeping while atomic
smaps_pte_hole_lookup() is calling shmem_partial_swap_usage() with page
table lock held: but shmem_partial_swap_usage() does cond_resched_rcu() if
need_resched(): "BUG: sleeping function called from invalid context".
Since shmem_partial_swap_usage() is designed to count across a range, but
smaps_pte_hole_lookup() only calls it for a single page slot, just break
out of the loop on the last or only page, before checking need_resched().
Link: https://lkml.kernel.org/r/6fe3b3ec-abdf-332f-5c23-6a3b3a3b11a9@google.com
Fixes:
230100321518 ("mm/smaps: simplify shmem handling of pte holes")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Peter Xu <peterx@redhat.com>
Cc: <stable@vger.kernel.org> [5.16+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andre Przywara [Mon, 21 Aug 2023 16:05:34 +0000 (17:05 +0100)]
selftests: cachestat: catch failing fsync test on tmpfs
The cachestat kselftest runs a test on a normal file, which is created
temporarily in the current directory. Among the tests it runs there is a
call to fsync(), which is expected to clean all dirty pages used by the
file.
However the tmpfs filesystem implements fsync() as noop_fsync(), so the
call will not even attempt to clean anything when this test file happens
to live on a tmpfs instance. This happens in an initramfs, or when the
current directory is in /dev/shm or sometimes /tmp.
To avoid this test failing wrongly, use statfs() to check which filesystem
the test file lives on. If that is "tmpfs", we skip the fsync() test.
Since the fsync test is only one part of the "normal file" test, we now
execute this twice, skipping the fsync part on the first call. This way
only the second test, including the fsync part, would be skipped.
Link: https://lkml.kernel.org/r/20230821160534.3414911-3-andre.przywara@arm.com
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andre Przywara [Mon, 21 Aug 2023 16:05:33 +0000 (17:05 +0100)]
selftests: cachestat: test for cachestat availability
Patch series "selftests: cachestat: fix run on older kernels", v2.
I ran all kernel selftests on some test machine, and stumbled upon
cachestat failing (among others). These patches fix the run on older
kernels and when the current directory is on a tmpfs instance.
This patch (of 2):
As cachestat is a new syscall, it won't be available on older kernels, for
instance those running on a development machine. At the moment the test
reports all tests as "not ok" in this case.
Test for the cachestat syscall availability first, before doing further
tests, and bail out early with a TAP SKIP comment.
This also uses the opportunity to add the proper TAP headers, and add one
check for proper error handling (illegal file descriptor).
Link: https://lkml.kernel.org/r/20230821160534.3414911-1-andre.przywara@arm.com
Link: https://lkml.kernel.org/r/20230821160534.3414911-2-andre.przywara@arm.com
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Nhat Pham <nphamcs@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Liam R. Howlett [Sat, 19 Aug 2023 00:43:55 +0000 (20:43 -0400)]
maple_tree: disable mas_wr_append() when other readers are possible
The current implementation of append may cause duplicate data and/or
incorrect ranges to be returned to a reader during an update. Although
this has not been reported or seen, disable the append write operation
while the tree is in rcu mode out of an abundance of caution.
During the analysis of the mas_next_slot() the following was
artificially created by separating the writer and reader code:
Writer: reader:
mas_wr_append
set end pivot
updates end metata
Detects write to last slot
last slot write is to start of slot
store current contents in slot
overwrite old end pivot
mas_next_slot():
read end metadata
read old end pivot
return with incorrect range
store new value
Alternatively:
Writer: reader:
mas_wr_append
set end pivot
updates end metata
Detects write to last slot
last lost write to end of slot
store value
mas_next_slot():
read end metadata
read old end pivot
read new end pivot
return with incorrect range
set old end pivot
There may be other accesses that are not safe since we are now updating
both metadata and pointers, so disabling append if there could be rcu
readers is the safest action.
Link: https://lkml.kernel.org/r/20230819004356.1454718-2-Liam.Howlett@oracle.com
Fixes:
54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Yin Fengwei [Tue, 8 Aug 2023 02:09:17 +0000 (10:09 +0800)]
madvise:madvise_free_pte_range(): don't use mapcount() against large folio for sharing check
Commit
98b211d6415f ("madvise: convert madvise_free_pte_range() to use a
folio") replaced the page_mapcount() with folio_mapcount() to check
whether the folio is shared by other mapping.
It's not correct for large folios. folio_mapcount() returns the total
mapcount of large folio which is not suitable to detect whether the folio
is shared.
Use folio_estimated_sharers() which returns a estimated number of shares.
That means it's not 100% correct. It should be OK for madvise case here.
User-visible effects is that the THP is skipped when user call madvise.
But the correct behavior is THP should be split and processed then.
NOTE: this change is a temporary fix to reduce the user-visible effects
before the long term fix from David is ready.
Link: https://lkml.kernel.org/r/20230808020917.2230692-4-fengwei.yin@intel.com
Fixes:
98b211d6415f ("madvise: convert madvise_free_pte_range() to use a folio")
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Reviewed-by: Yu Zhao <yuzhao@google.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Yin Fengwei [Tue, 8 Aug 2023 02:09:16 +0000 (10:09 +0800)]
madvise:madvise_free_huge_pmd(): don't use mapcount() against large folio for sharing check
Commit
fc986a38b670 ("mm: huge_memory: convert madvise_free_huge_pmd to
use a folio") replaced the page_mapcount() with folio_mapcount() to check
whether the folio is shared by other mapping.
It's not correct for large folios. folio_mapcount() returns the total
mapcount of large folio which is not suitable to detect whether the folio
is shared.
Use folio_estimated_sharers() which returns a estimated number of shares.
That means it's not 100% correct. It should be OK for madvise case here.
User-visible effects is that the THP is skipped when user call madvise.
But the correct behavior is THP should be split and processed then.
NOTE: this change is a temporary fix to reduce the user-visible effects
before the long term fix from David is ready.
Link: https://lkml.kernel.org/r/20230808020917.2230692-3-fengwei.yin@intel.com
Fixes:
fc986a38b670 ("mm: huge_memory: convert madvise_free_huge_pmd to use a folio")
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Reviewed-by: Yu Zhao <yuzhao@google.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Yin Fengwei [Tue, 8 Aug 2023 02:09:15 +0000 (10:09 +0800)]
madvise:madvise_cold_or_pageout_pte_range(): don't use mapcount() against large folio for sharing check
Patch series "don't use mapcount() to check large folio sharing", v2.
In madvise_cold_or_pageout_pte_range() and madvise_free_pte_range(),
folio_mapcount() is used to check whether the folio is shared. But it's
not correct as folio_mapcount() returns total mapcount of large folio.
Use folio_estimated_sharers() here as the estimated number is enough.
This patchset will fix the cases:
User space application call madvise() with MADV_FREE, MADV_COLD and
MADV_PAGEOUT for specific address range. There are THP mapped to the
range. Without the patchset, the THP is skipped. With the patch, the
THP will be split and handled accordingly.
David reported the cow self test skip some cases because of MADV_PAGEOUT
skip THP:
https://lore.kernel.org/linux-mm/
9e92e42d-488f-47db-ac9d-
75b24cd0d037@intel.com/T/#mbf0f2ec7fbe45da47526de1d7036183981691e81
and I confirmed this patchset make it work again.
This patch (of 3):
Commit
07e8c82b5eff ("madvise: convert madvise_cold_or_pageout_pte_range()
to use folios") replaced the page_mapcount() with folio_mapcount() to
check whether the folio is shared by other mapping.
It's not correct for large folio. folio_mapcount() returns the total
mapcount of large folio which is not suitable to detect whether the folio
is shared.
Use folio_estimated_sharers() which returns a estimated number of shares.
That means it's not 100% correct. It should be OK for madvise case here.
User-visible effects is that the THP is skipped when user call madvise.
But the correct behavior is THP should be split and processed then.
NOTE: this change is a temporary fix to reduce the user-visible effects
before the long term fix from David is ready.
Link: https://lkml.kernel.org/r/20230808020917.2230692-1-fengwei.yin@intel.com
Link: https://lkml.kernel.org/r/20230808020917.2230692-2-fengwei.yin@intel.com
Fixes:
07e8c82b5eff ("madvise: convert madvise_cold_or_pageout_pte_range() to use folios")
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Reviewed-by: Yu Zhao <yuzhao@google.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Linus Torvalds [Thu, 24 Aug 2023 21:30:47 +0000 (14:30 -0700)]
Merge tag 'nfsd-6.5-5' of git://git./linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
"Two last-minute one-liners for v6.5-rc. One got lost in the shuffle,
and the other was reported just this morning"
- Close race window when handling FREE_STATEID operations
- Fix regression in /proc/fs/nfsd/v4_end_grace introduced in v6.5-rc"
* tag 'nfsd-6.5-5' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
NFSD: Fix a thinko introduced by recent trace point changes
nfsd: Fix race to FREE_STATEID and cl_revoked
Linus Torvalds [Thu, 24 Aug 2023 20:55:35 +0000 (13:55 -0700)]
Merge tag 'spi-fix-v6.5-rc7' of git://git./linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A couple more small driver specific fixes for v6.5.
The device mode for Cadence had been broken by some recent updates
done for host mode and large transfers for multi-byte words on stm32
had been broken by an API update in what I think was a rebasing
incident"
* tag 'spi-fix-v6.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: spi-cadence: Fix data corruption issues in slave mode
spi: stm32: fix accidential revert to byte-sized transfer splitting
Mingzheng Xing [Thu, 24 Aug 2023 19:08:52 +0000 (03:08 +0800)]
riscv: Fix build errors using binutils2.37 toolchains
When building the kernel with binutils 2.37 and GCC-11.1.0/GCC-11.2.0,
the following error occurs:
Assembler messages:
Error: cannot find default versions of the ISA extension `zicsr'
Error: cannot find default versions of the ISA extension `zifencei'
The above error originated from this commit of binutils[0], which has been
resolved and backported by GCC-12.1.0[1] and GCC-11.3.0[2].
So fix this by change the GCC version in
CONFIG_TOOLCHAIN_NEEDS_OLD_ISA_SPEC to GCC-11.3.0.
Link: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=f0bae2552db1dd4f1995608fbf6648fcee4e9e0c
Link: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=ca2bbb88f999f4d3cc40e89bc1aba712505dd598
Link: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=d29f5d6ab513c52fd872f532c492e35ae9fd6671
Fixes:
ca09f772ccca ("riscv: Handle zicsr/zifencei issue between gcc and binutils")
Reported-by: Conor Dooley <conor.dooley@microchip.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Mingzheng Xing <xingmingzheng@iscas.ac.cn>
Link: https://lore.kernel.org/r/20230824190852.45470-1-xingmingzheng@iscas.ac.cn
Closes: https://lore.kernel.org/all/
20230823-captive-abdomen-
befd942a4a73@wendy/
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Dave Airlie [Thu, 24 Aug 2023 19:15:54 +0000 (05:15 +1000)]
Merge tag 'drm-misc-fixes-2023-08-24' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
A samsung-dsim initialization fix, a devfreq fix for panfrost, a DP DSC
define fix, a recursive lock fix for dma-buf, a shader validation fix
and a reference counting fix for vmwgfx
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <mripard@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/amy26vu5xbeeikswpx7nt6rddwfocdidshrtt2qovipihx5poj@y45p3dtzrloc
Linus Torvalds [Thu, 24 Aug 2023 15:23:13 +0000 (08:23 -0700)]
Merge tag 'net-6.5-rc8' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from wifi, can and netfilter.
Fixes to fixes:
- nf_tables:
- GC transaction race with abort path
- defer gc run if previous batch is still pending
Previous releases - regressions:
- ipv4: fix data-races around inet->inet_id
- phy: fix deadlocking in phy_error() invocation
- mdio: fix C45 read/write protocol
- ipvlan: fix a reference count leak warning in ipvlan_ns_exit()
- ice: fix NULL pointer deref during VF reset
- i40e: fix potential NULL pointer dereferencing of pf->vf in
i40e_sync_vsi_filters()
- tg3: use slab_build_skb() when needed
- mtk_eth_soc: fix NULL pointer on hw reset
Previous releases - always broken:
- core: validate veth and vxcan peer ifindexes
- sched: fix a qdisc modification with ambiguous command request
- devlink: add missing unregister linecard notification
- wifi: mac80211: limit reorder_buf_filtered to avoid UBSAN warning
- batman:
- do not get eth header before batadv_check_management_packet
- fix batadv_v_ogm_aggr_send memory leak
- bonding: fix macvlan over alb bond support
- mlxsw: set time stamp fields also when its type is MIRROR_UTC"
* tag 'net-6.5-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (54 commits)
selftests: bonding: add macvlan over bond testing
selftest: bond: add new topo bond_topo_2d1c.sh
bonding: fix macvlan over alb bond support
rtnetlink: Reject negative ifindexes in RTM_NEWLINK
netfilter: nf_tables: defer gc run if previous batch is still pending
netfilter: nf_tables: fix out of memory error handling
netfilter: nf_tables: use correct lock to protect gc_list
netfilter: nf_tables: GC transaction race with abort path
netfilter: nf_tables: flush pending destroy work before netlink notifier
netfilter: nf_tables: validate all pending tables
ibmveth: Use dcbf rather than dcbfl
i40e: fix potential NULL pointer dereferencing of pf->vf i40e_sync_vsi_filters()
net/sched: fix a qdisc modification with ambiguous command request
igc: Fix the typo in the PTM Control macro
batman-adv: Hold rtnl lock during MTU update via netlink
igb: Avoid starting unnecessary workqueues
can: raw: add missing refcount for memory leak fix
can: isotp: fix support for transmission of SF without flow control
bnx2x: new flag for track HW resource allocation
sfc: allocate a big enough SKB for loopback selftest packet
...
Chuck Lever [Thu, 24 Aug 2023 14:30:27 +0000 (10:30 -0400)]
NFSD: Fix a thinko introduced by recent trace point changes
The fixed commit erroneously removed a call to nfsd_end_grace(),
which makes calls to write_v4_end_grace() a no-op.
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/
202308241229.
68396422-oliver.sang@intel.com
Fixes:
39d432fc7630 ("NFSD: trace nfsctl operations")
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Mario Limonciello [Thu, 24 Aug 2023 01:11:49 +0000 (20:11 -0500)]
ASoC: amd: yc: Fix a non-functional mic on Lenovo 82SJ
Lenovo 82SJ doesn't have DMIC connected like 82V2 does. Narrow
the match down to only cover 82V2.
Reported-by: prosenfeld@Yuhsbstudents.org
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217063
Fixes:
2232b2dd8cd4 ("ASoC: amd: yc: Add Lenovo Yoga Slim 7 Pro X to quirks table")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com
Link: https://lore.kernel.org/r/20230824011149.1395-1-mario.limonciello@amd.com
Signed-off-by: Mark Brown <broonie@kernel.org
Feng Tang [Wed, 23 Aug 2023 06:57:47 +0000 (14:57 +0800)]
x86/fpu: Set X86_FEATURE_OSXSAVE feature after enabling OSXSAVE in CR4
0-Day found a 34.6% regression in stress-ng's 'af-alg' test case, and
bisected it to commit
b81fac906a8f ("x86/fpu: Move FPU initialization into
arch_cpu_finalize_init()"), which optimizes the FPU init order, and moves
the CR4_OSXSAVE enabling into a later place:
arch_cpu_finalize_init
identify_boot_cpu
identify_cpu
generic_identify
get_cpu_cap --> setup cpu capability
...
fpu__init_cpu
fpu__init_cpu_xstate
cr4_set_bits(X86_CR4_OSXSAVE);
As the FPU is not yet initialized the CPU capability setup fails to set
X86_FEATURE_OSXSAVE. Many security module like 'camellia_aesni_avx_x86_64'
depend on this feature and therefore fail to load, causing the regression.
Cure this by setting X86_FEATURE_OSXSAVE feature right after OSXSAVE
enabling.
[ tglx: Moved it into the actual BSP FPU initialization code and added a comment ]
Fixes:
b81fac906a8f ("x86/fpu: Move FPU initialization into arch_cpu_finalize_init()")
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/lkml/202307192135.203ac24e-oliver.sang@intel.com
Link: https://lore.kernel.org/lkml/20230823065747.92257-1-feng.tang@intel.com
Rick Edgecombe [Fri, 18 Aug 2023 17:03:05 +0000 (10:03 -0700)]
x86/fpu: Invalidate FPU state correctly on exec()
The thread flag TIF_NEED_FPU_LOAD indicates that the FPU saved state is
valid and should be reloaded when returning to userspace. However, the
kernel will skip doing this if the FPU registers are already valid as
determined by fpregs_state_valid(). The logic embedded there considers
the state valid if two cases are both true:
1: fpu_fpregs_owner_ctx points to the current tasks FPU state
2: the last CPU the registers were live in was the current CPU.
This is usually correct logic. A CPU’s fpu_fpregs_owner_ctx is set to
the current FPU during the fpregs_restore_userregs() operation, so it
indicates that the registers have been restored on this CPU. But this
alone doesn’t preclude that the task hasn’t been rescheduled to a
different CPU, where the registers were modified, and then back to the
current CPU. To verify that this was not the case the logic relies on the
second condition. So the assumption is that if the registers have been
restored, AND they haven’t had the chance to be modified (by being
loaded on another CPU), then they MUST be valid on the current CPU.
Besides the lazy FPU optimizations, the other cases where the FPU
registers might not be valid are when the kernel modifies the FPU register
state or the FPU saved buffer. In this case the operation modifying the
FPU state needs to let the kernel know the correspondence has been
broken. The comment in “arch/x86/kernel/fpu/context.h” has:
/*
...
* If the FPU register state is valid, the kernel can skip restoring the
* FPU state from memory.
*
* Any code that clobbers the FPU registers or updates the in-memory
* FPU state for a task MUST let the rest of the kernel know that the
* FPU registers are no longer valid for this task.
*
* Either one of these invalidation functions is enough. Invalidate
* a resource you control: CPU if using the CPU for something else
* (with preemption disabled), FPU for the current task, or a task that
* is prevented from running by the current task.
*/
However, this is not completely true. When the kernel modifies the
registers or saved FPU state, it can only rely on
__fpu_invalidate_fpregs_state(), which wipes the FPU’s last_cpu
tracking. The exec path instead relies on fpregs_deactivate(), which sets
the CPU’s FPU context to NULL. This was observed to fail to restore the
reset FPU state to the registers when returning to userspace in the
following scenario:
1. A task is executing in userspace on CPU0
- CPU0’s FPU context points to tasks
- fpu->last_cpu=CPU0
2. The task exec()’s
3. While in the kernel the task is preempted
- CPU0 gets a thread executing in the kernel (such that no other
FPU context is activated)
- Scheduler sets task’s fpu->last_cpu=CPU0 when scheduling out
4. Task is migrated to CPU1
5. Continuing the exec(), the task gets to
fpu_flush_thread()->fpu_reset_fpregs()
- Sets CPU1’s fpu context to NULL
- Copies the init state to the task’s FPU buffer
- Sets TIF_NEED_FPU_LOAD on the task
6. The task reschedules back to CPU0 before completing the exec() and
returning to userspace
- During the reschedule, scheduler finds TIF_NEED_FPU_LOAD is set
- Skips saving the registers and updating task’s fpu→last_cpu,
because TIF_NEED_FPU_LOAD is the canonical source.
7. Now CPU0’s FPU context is still pointing to the task’s, and
fpu->last_cpu is still CPU0. So fpregs_state_valid() returns true even
though the reset FPU state has not been restored.
So the root cause is that exec() is doing the wrong kind of invalidate. It
should reset fpu->last_cpu via __fpu_invalidate_fpregs_state(). Further,
fpu__drop() doesn't really seem appropriate as the task (and FPU) are not
going away, they are just getting reset as part of an exec. So switch to
__fpu_invalidate_fpregs_state().
Also, delete the misleading comment that says that either kind of
invalidate will be enough, because it’s not always the case.
Fixes:
33344368cb08 ("x86/fpu: Clean up the fpu__clear() variants")
Reported-by: Lei Wang <lei4.wang@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Lijun Pan <lijun.pan@intel.com>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Acked-by: Lijun Pan <lijun.pan@intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230818170305.502891-1-rick.p.edgecombe@intel.com
Paolo Abeni [Thu, 24 Aug 2023 08:33:22 +0000 (10:33 +0200)]
Merge tag 'nf-23-08-23' of ssh://gitolite./linux/kernel/git/netfilter/nf
Florian Westphal says:
====================
netfilter updates for net
This PR contains nf_tables updates for your *net* tree.
First patch fixes table validation, I broke this in 6.4 when tracking
validation state per table, reported by Pablo, fixup from myself.
Second patch makes sure objects waiting for memory release have been
released, this was broken in 6.1, patch from Pablo Neira Ayuso.
Patch three is a fix-for-fix from previous PR: In case a transaction
gets aborted, gc sequence counter needs to be incremented so pending
gc requests are invalidated, from Pablo.
Same for patch 4: gc list needs to use gc list lock, not destroy lock,
also from Pablo.
Patch 5 fixes a UaF in a set backend, but this should only occur when
failslab is enabled for GFP_KERNEL allocations, broken since feature
was added in 5.6, from myself.
Patch 6 fixes a double-free bug that was also added via previous PR:
We must not schedule gc work if the previous batch is still queued.
netfilter pull request 2023-08-23
* tag 'nf-23-08-23' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: defer gc run if previous batch is still pending
netfilter: nf_tables: fix out of memory error handling
netfilter: nf_tables: use correct lock to protect gc_list
netfilter: nf_tables: GC transaction race with abort path
netfilter: nf_tables: flush pending destroy work before netlink notifier
netfilter: nf_tables: validate all pending tables
====================
Link: https://lore.kernel.org/r/20230823152711.15279-1-fw@strlen.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Thu, 24 Aug 2023 08:07:16 +0000 (10:07 +0200)]
Merge branch 'fix-macvlan-over-alb-bond-support'
Hangbin Liu says:
====================
fix macvlan over alb bond support
Currently, the macvlan over alb bond is broken after commit
14af9963ba1e ("bonding: Support macvlans on top of tlb/rlb mode bonds").
Fix this and add relate tests.
====================
Link: https://lore.kernel.org/r/20230823071907.3027782-1-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Hangbin Liu [Wed, 23 Aug 2023 07:19:06 +0000 (15:19 +0800)]
selftests: bonding: add macvlan over bond testing
Add a macvlan over bonding test with mode active-backup, balance-tlb
and balance-alb.
]# ./bond_macvlan.sh
TEST: active-backup: IPv4: client->server [ OK ]
TEST: active-backup: IPv6: client->server [ OK ]
TEST: active-backup: IPv4: client->macvlan_1 [ OK ]
TEST: active-backup: IPv6: client->macvlan_1 [ OK ]
TEST: active-backup: IPv4: client->macvlan_2 [ OK ]
TEST: active-backup: IPv6: client->macvlan_2 [ OK ]
TEST: active-backup: IPv4: macvlan_1->macvlan_2 [ OK ]
TEST: active-backup: IPv6: macvlan_1->macvlan_2 [ OK ]
TEST: active-backup: IPv4: server->client [ OK ]
TEST: active-backup: IPv6: server->client [ OK ]
TEST: active-backup: IPv4: macvlan_1->client [ OK ]
TEST: active-backup: IPv6: macvlan_1->client [ OK ]
TEST: active-backup: IPv4: macvlan_2->client [ OK ]
TEST: active-backup: IPv6: macvlan_2->client [ OK ]
TEST: active-backup: IPv4: macvlan_2->macvlan_2 [ OK ]
TEST: active-backup: IPv6: macvlan_2->macvlan_2 [ OK ]
[...]
TEST: balance-alb: IPv4: client->server [ OK ]
TEST: balance-alb: IPv6: client->server [ OK ]
TEST: balance-alb: IPv4: client->macvlan_1 [ OK ]
TEST: balance-alb: IPv6: client->macvlan_1 [ OK ]
TEST: balance-alb: IPv4: client->macvlan_2 [ OK ]
TEST: balance-alb: IPv6: client->macvlan_2 [ OK ]
TEST: balance-alb: IPv4: macvlan_1->macvlan_2 [ OK ]
TEST: balance-alb: IPv6: macvlan_1->macvlan_2 [ OK ]
TEST: balance-alb: IPv4: server->client [ OK ]
TEST: balance-alb: IPv6: server->client [ OK ]
TEST: balance-alb: IPv4: macvlan_1->client [ OK ]
TEST: balance-alb: IPv6: macvlan_1->client [ OK ]
TEST: balance-alb: IPv4: macvlan_2->client [ OK ]
TEST: balance-alb: IPv6: macvlan_2->client [ OK ]
TEST: balance-alb: IPv4: macvlan_2->macvlan_2 [ OK ]
TEST: balance-alb: IPv6: macvlan_2->macvlan_2 [ OK ]
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Hangbin Liu [Wed, 23 Aug 2023 07:19:05 +0000 (15:19 +0800)]
selftest: bond: add new topo bond_topo_2d1c.sh
Add a new testing topo bond_topo_2d1c.sh which is used more commonly.
Make bond_topo_3d1c.sh just source bond_topo_2d1c.sh and add the
extra link.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Hangbin Liu [Wed, 23 Aug 2023 07:19:04 +0000 (15:19 +0800)]
bonding: fix macvlan over alb bond support
The commit
14af9963ba1e ("bonding: Support macvlans on top of tlb/rlb mode
bonds") aims to enable the use of macvlans on top of rlb bond mode. However,
the current rlb bond mode only handles ARP packets to update remote neighbor
entries. This causes an issue when a macvlan is on top of the bond, and
remote devices send packets to the macvlan using the bond's MAC address
as the destination. After delivering the packets to the macvlan, the macvlan
will rejects them as the MAC address is incorrect. Consequently, this commit
makes macvlan over bond non-functional.
To address this problem, one potential solution is to check for the presence
of a macvlan port on the bond device using netif_is_macvlan_port(bond->dev)
and return NULL in the rlb_arp_xmit() function. However, this approach
doesn't fully resolve the situation when a VLAN exists between the bond and
macvlan.
So let's just do a partial revert for commit
14af9963ba1e in rlb_arp_xmit().
As the comment said, Don't modify or load balance ARPs that do not originate
locally.
Fixes:
14af9963ba1e ("bonding: Support macvlans on top of tlb/rlb mode bonds")
Reported-by: susan.zheng@veritas.com
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=
2117816
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Ido Schimmel [Wed, 23 Aug 2023 06:43:48 +0000 (09:43 +0300)]
rtnetlink: Reject negative ifindexes in RTM_NEWLINK
Negative ifindexes are illegal, but the kernel does not validate the
ifindex in the ancillary header of RTM_NEWLINK messages, resulting in
the kernel generating a warning [1] when such an ifindex is specified.
Fix by rejecting negative ifindexes.
[1]
WARNING: CPU: 0 PID: 5031 at net/core/dev.c:9593 dev_index_reserve+0x1a2/0x1c0 net/core/dev.c:9593
[...]
Call Trace:
<TASK>
register_netdevice+0x69a/0x1490 net/core/dev.c:10081
br_dev_newlink+0x27/0x110 net/bridge/br_netlink.c:1552
rtnl_newlink_create net/core/rtnetlink.c:3471 [inline]
__rtnl_newlink+0x115e/0x18c0 net/core/rtnetlink.c:3688
rtnl_newlink+0x67/0xa0 net/core/rtnetlink.c:3701
rtnetlink_rcv_msg+0x439/0xd30 net/core/rtnetlink.c:6427
netlink_rcv_skb+0x16b/0x440 net/netlink/af_netlink.c:2545
netlink_unicast_kernel net/netlink/af_netlink.c:1342 [inline]
netlink_unicast+0x536/0x810 net/netlink/af_netlink.c:1368
netlink_sendmsg+0x93c/0xe40 net/netlink/af_netlink.c:1910
sock_sendmsg_nosec net/socket.c:728 [inline]
sock_sendmsg+0xd9/0x180 net/socket.c:751
____sys_sendmsg+0x6ac/0x940 net/socket.c:2538
___sys_sendmsg+0x135/0x1d0 net/socket.c:2592
__sys_sendmsg+0x117/0x1e0 net/socket.c:2621
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Fixes:
38f7b870d4a6 ("[RTNETLINK]: Link creation API")
Reported-by: syzbot+5ba06978f34abb058571@syzkaller.appspotmail.com
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230823064348.2252280-1-idosch@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Takashi Iwai [Thu, 24 Aug 2023 07:20:36 +0000 (09:20 +0200)]
Merge tag 'asoc-fix-v6.5-rc7' of https://git./linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v6.5
A relatively large but generally not super urgent set of fixes for ASoC,
including some quirks and a MAINTAINERS update. There's also an update
to cs35l56 to change the firmware ABI, there are no current shipping
systems which use the current interface and the sooner we get the new
interface in the less likely it is that something will start.
It'd be nice if these landed for v6.5 but not the end of the world if
they wait till v6.6.
Linus Torvalds [Wed, 23 Aug 2023 21:28:19 +0000 (14:28 -0700)]
Merge tag 'acpi-6.5-rc8' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Make an existing ACPI IRQ override quirk for PCSpecialist Elimina Pro
16 M work as intended (Hans de Goede)"
* tag 'acpi-6.5-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: resource: Fix IRQ override quirk for PCSpecialist Elimina Pro 16 M
Imre Deak [Tue, 22 Aug 2023 11:30:15 +0000 (14:30 +0300)]
drm/i915: Fix HPD polling, reenabling the output poll work as needed
After the commit in the Fixes: line below, HPD polling stopped working
on i915, since after that change calling drm_kms_helper_poll_enable()
doesn't restart drm_mode_config::output_poll_work if the work was
stopped (no connectors needing polling) and enabling polling for a
connector (during runtime suspend or detecting an HPD IRQ storm).
After the above change calling drm_kms_helper_poll_enable() is a nop
after it's been called already and polling for some connectors was
disabled/re-enabled.
Fix this by calling drm_kms_helper_poll_reschedule() added in the
previous patch instead, which reschedules the work whenever expected.
Fixes:
d33a54e3991d ("drm/probe_helper: sort out poll_running vs poll_enabled")
CC: stable@vger.kernel.org # 6.4+
Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Cc: dri-devel@lists.freedesktop.org
Reviewed-by: Jouni Högander <jouni.hogander@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230822113015.41224-2-imre.deak@intel.com
(cherry picked from commit
50452f2f76852322620b63e62922b85e955abe94)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Imre Deak [Tue, 22 Aug 2023 11:30:14 +0000 (14:30 +0300)]
drm: Add an HPD poll helper to reschedule the poll work
Add a helper to reschedule drm_mode_config::output_poll_work after
polling has been enabled for a connector (and needing a reschedule,
since previously polling was disabled for all connectors and hence
output_poll_work was not running).
This is needed by the next patch fixing HPD polling on i915.
CC: stable@vger.kernel.org # 6.4+
Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Cc: dri-devel@lists.freedesktop.org
Reviewed-by: Jouni Högander <jouni.hogander@intel.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230822113015.41224-1-imre.deak@intel.com
(cherry picked from commit
fe2352fd64029918174de4b460dfe6df0c6911cd)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Palmer Dabbelt [Tue, 22 Aug 2023 20:55:14 +0000 (13:55 -0700)]
Merge patch series "riscv: fix ptrace and export VLENB"
Andy Chiu <andy.chiu@sifive.com> says:
We add a vlenb field in Vector context and save it with the
riscv_vstate_save() macro. It should not cause performance regression as
VLENB is a design-time constant and is frequently used by hardware.
Also, adding this field into the __sc_riscv_v_state may benifit us on a
future compatibility issue becuse a hardware may have writable VLENB.
Adding and saving VLENB have an immediate benifit as it gives ptrace a
better view of the Vector extension and makes it possible to reconstruct
Vector register files from the dump without doing an additional csr read.
This patchset also sync the number of note types between us and gdb for
riscv to solve a conflicting note.
This is not an ABI break given that 6.5 has not been released yet.
* b4-shazam-merge:
RISC-V: vector: export VLENB csr in __sc_riscv_v_state
RISC-V: Remove ptrace support for vectors
Link: https://lore.kernel.org/r/20230816155450.26200-1-andy.chiu@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Bartosz Golaszewski [Tue, 22 Aug 2023 19:29:43 +0000 (21:29 +0200)]
gpio: sim: pass the GPIO device's software node to irq domain
Associate the swnode of the GPIO device's (which is the interrupt
controller here) with the irq domain. Otherwise the interrupt-controller
device attribute is a no-op.
Fixes:
cb8c474e79be ("gpio: sim: new testing module")
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Bartosz Golaszewski [Tue, 22 Aug 2023 19:29:42 +0000 (21:29 +0200)]
gpio: sim: dispose of irq mappings before destroying the irq_sim domain
If a GPIO simulator device is unbound with interrupts still requested,
we will hit a use-after-free issue in __irq_domain_deactivate_irq(). The
owner of the irq domain must dispose of all mappings before destroying
the domain object.
Fixes:
cb8c474e79be ("gpio: sim: new testing module")
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>