Ivan Malov [Tue, 28 Jun 2022 09:18:48 +0000 (12:18 +0300)]
xsk: Clear page contiguity bit when unmapping pool
When a XSK pool gets mapped, xp_check_dma_contiguity() adds bit 0x1
to pages' DMA addresses that go in ascending order and at 4K stride.
The problem is that the bit does not get cleared before doing unmap.
As a result, a lot of warnings from iommu_dma_unmap_page() are seen
in dmesg, which indicates that lookups by iommu_iova_to_phys() fail.
Fixes:
2b43470add8c ("xsk: Introduce AF_XDP buffer allocation API")
Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20220628091848.534803-1-ivan.malov@oktetlabs.ru
Alexei Starovoitov [Mon, 27 Jun 2022 18:22:55 +0000 (20:22 +0200)]
bpf, docs: Better scale maintenance of BPF subsystem
The BPF subsystem consists of a large number of pieces. There is not a
single person that understands it all. Yet reviews are crucially important
for the BPF community to provide productive quality feedback to contributors
in a timely manner and therefore to ultimately expand the number of active
developers in the community.
So far, the BPF community had a two-stage review system, that is, a weekly
rotation among 7 developers (Alexei, Daniel, Andrii, Martin, Song, Yonghong,
John) as a first-level review of all inbound patches accompanied by a BPF CI
system which runs the in-tree BPF selftests to check for regressions for
every new patch, and then, a final check by Alexei, Daniel, Andrii to apply
the patches to either bpf or bpf-next trees.
This system worked well for the last ~3.5 years, but clearly reaches its
limits these days as it does not scale enough. Especially, as we also need
to allow enough room for every developer to contribute patches themselves,
integrate with their day to day job, and in particular avoid burnout. We
want to better scale both horizontally and vertically going forward.
On the horizontal scale, we are adding more developers (KP, Stan, Hao, Jiri)
to the overall core reviewer team, thus growing to 11 people in total. The
weekly rotation for the horizontal oncall reviewer is shortened to 1/2 week
(Mo - Wed and Thur - Fri). Instead of just patches, the coverage however
extends also generally to triage and reply to mailing list traffic (e.g. RFCs,
questions, etc).
On the vertical scale, there is clearly a need for deep expertise areas to
assign dedicated maintainer/reviewer teams that are responsible for code
reviews and help with design of individual building blocks. To some degree
we have been doing this implicitly, but the point is to formalize the teams
and commitment.
There is an overlap between areas and boundaries are intentionally grey. These
additional entries provide a guidance on who has to look at the patches. The
patch series which span multiple areas will be looked at by multiple people.
The vertical review with areas of deep expertise are bundled at the same time
with the horizontal side.
This patch cleans up a bit the BPF entries, adds mentioned developers to
the horizontal scale and creates new sub-entries with teams for developers
committing to the above outlined vertical scale. Also, pw.git tools we use
for BPF tree maintenance have been updated with a new pw-schedule script to
semi-automate vertical oncall review rotation.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <martin.lau@linux.dev>
Acked-by: Song Liu <song@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Mykola Lysenko <mykolal@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: KP Singh <kpsingh@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Hao Luo <haoluo@google.com>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Link: https://git.kernel.org/pub/scm/linux/kernel/git/dborkman/pw.git
Link: https://lore.kernel.org/bpf/5bdc73e7f5a087299589944fa074563cdf2c2c1a.1656353995.git.daniel@iogearbox.net
Masami Hiramatsu (Google) [Thu, 23 Jun 2022 22:31:35 +0000 (07:31 +0900)]
fprobe, samples: Add module parameter descriptions
Add module parameter descriptions for the fprobe_example module.
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/165602349520.56016.1314423560740428008.stgit@devnote2
Peilin Ye [Thu, 16 Jun 2022 23:43:36 +0000 (16:43 -0700)]
net/sched: sch_netem: Fix arithmetic in netem_dump() for 32-bit platforms
As reported by Yuming, currently tc always show a latency of UINT_MAX
for netem Qdisc's on 32-bit platforms:
$ tc qdisc add dev dummy0 root netem latency 100ms
$ tc qdisc show dev dummy0
qdisc netem 8001: root refcnt 2 limit 1000 delay 275s 275s
^^^^^^^^^^^^^^^^
Let us take a closer look at netem_dump():
qopt.latency = min_t(psched_tdiff_t, PSCHED_NS2TICKS(q->latency,
UINT_MAX);
qopt.latency is __u32, psched_tdiff_t is signed long,
(psched_tdiff_t)(UINT_MAX) is negative for 32-bit platforms, so
qopt.latency is always UINT_MAX.
Fix it by using psched_time_t (u64) instead.
Note: confusingly, users have two ways to specify 'latency':
1. normally, via '__u32 latency' in struct tc_netem_qopt;
2. via the TCA_NETEM_LATENCY64 attribute, which is s64.
For the second case, theoretically 'latency' could be negative. This
patch ignores that corner case, since it is broken (i.e. assigning a
negative s64 to __u32) anyways, and should be handled separately.
Thanks Ted Lin for the analysis [1] .
[1] https://github.com/raspberrypi/linux/issues/3512
Reported-by: Yuming Chen <chenyuming.junnan@bytedance.com>
Fixes:
112f9cb65643 ("netem: convert to qdisc_watchdog_schedule_ns")
Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Link: https://lore.kernel.org/r/20220616234336.2443-1-yepeilin.cs@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ivan Vecera [Thu, 16 Jun 2022 16:08:55 +0000 (18:08 +0200)]
ethtool: Fix get module eeprom fallback
Function fallback_set_params() checks if the module type returned
by a driver is ETH_MODULE_SFF_8079 and in this case it assumes
that buffer returns a concatenated content of page A0h and A2h.
The check is wrong because the correct type is ETH_MODULE_SFF_8472.
Fixes:
96d971e307cc ("ethtool: Add fallback to get_module_eeprom from netlink command")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20220616160856.3623273-1-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jay Vosburgh [Thu, 16 Jun 2022 19:32:40 +0000 (12:32 -0700)]
bonding: ARP monitor spams NETDEV_NOTIFY_PEERS notifiers
The bonding ARP monitor fails to decrement send_peer_notif, the
number of peer notifications (gratuitous ARP or ND) to be sent. This
results in a continuous series of notifications.
Correct this by decrementing the counter for each notification.
Reported-by: Jonathan Toppins <jtoppins@redhat.com>
Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Fixes:
b0929915e035 ("bonding: Fix RTNL: assertion failed at net/core/rtnetlink.c for ab arp monitor")
Link: https://lore.kernel.org/netdev/b2fd4147-8f50-bebd-963a-1a3e8d1d9715@redhat.com/
Tested-by: Jonathan Toppins <jtoppins@redhat.com>
Reviewed-by: Jonathan Toppins <jtoppins@redhat.com>
Link: https://lore.kernel.org/r/9400.1655407960@famine
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Lorenzo Bianconi [Thu, 16 Jun 2022 14:13:20 +0000 (16:13 +0200)]
igb: fix a use-after-free issue in igb_clean_tx_ring
Fix the following use-after-free bug in igb_clean_tx_ring routine when
the NIC is running in XDP mode. The issue can be triggered redirecting
traffic into the igb NIC and then closing the device while the traffic
is flowing.
[ 73.322719] CPU: 1 PID: 487 Comm: xdp_redirect Not tainted 5.18.3-apu2 #9
[ 73.330639] Hardware name: PC Engines APU2/APU2, BIOS 4.0.7 02/28/2017
[ 73.337434] RIP: 0010:refcount_warn_saturate+0xa7/0xf0
[ 73.362283] RSP: 0018:
ffffc9000081f798 EFLAGS:
00010282
[ 73.367761] RAX:
0000000000000000 RBX:
ffffc90000420f80 RCX:
0000000000000000
[ 73.375200] RDX:
ffff88811ad22d00 RSI:
ffff88811ad171e0 RDI:
ffff88811ad171e0
[ 73.382590] RBP:
0000000000000900 R08:
ffffffff82298f28 R09:
0000000000000058
[ 73.390008] R10:
0000000000000219 R11:
ffffffff82280f40 R12:
0000000000000090
[ 73.397356] R13:
ffff888102343a40 R14:
ffff88810359e0e4 R15:
0000000000000000
[ 73.404806] FS:
00007ff38d31d740(0000) GS:
ffff88811ad00000(0000) knlGS:
0000000000000000
[ 73.413129] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 73.419096] CR2:
000055cff35f13f8 CR3:
0000000106391000 CR4:
00000000000406e0
[ 73.426565] Call Trace:
[ 73.429087] <TASK>
[ 73.431314] igb_clean_tx_ring+0x43/0x140 [igb]
[ 73.436002] igb_down+0x1d7/0x220 [igb]
[ 73.439974] __igb_close+0x3c/0x120 [igb]
[ 73.444118] igb_xdp+0x10c/0x150 [igb]
[ 73.447983] ? igb_pci_sriov_configure+0x70/0x70 [igb]
[ 73.453362] dev_xdp_install+0xda/0x110
[ 73.457371] dev_xdp_attach+0x1da/0x550
[ 73.461369] do_setlink+0xfd0/0x10f0
[ 73.465166] ? __nla_validate_parse+0x89/0xc70
[ 73.469714] rtnl_setlink+0x11a/0x1e0
[ 73.473547] rtnetlink_rcv_msg+0x145/0x3d0
[ 73.477709] ? rtnl_calcit.isra.0+0x130/0x130
[ 73.482258] netlink_rcv_skb+0x8d/0x110
[ 73.486229] netlink_unicast+0x230/0x340
[ 73.490317] netlink_sendmsg+0x215/0x470
[ 73.494395] __sys_sendto+0x179/0x190
[ 73.498268] ? move_addr_to_user+0x37/0x70
[ 73.502547] ? __sys_getsockname+0x84/0xe0
[ 73.506853] ? netlink_setsockopt+0x1c1/0x4a0
[ 73.511349] ? __sys_setsockopt+0xc8/0x1d0
[ 73.515636] __x64_sys_sendto+0x20/0x30
[ 73.519603] do_syscall_64+0x3b/0x80
[ 73.523399] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 73.528712] RIP: 0033:0x7ff38d41f20c
[ 73.551866] RSP: 002b:
00007fff3b945a68 EFLAGS:
00000246 ORIG_RAX:
000000000000002c
[ 73.559640] RAX:
ffffffffffffffda RBX:
0000000000000000 RCX:
00007ff38d41f20c
[ 73.567066] RDX:
0000000000000034 RSI:
00007fff3b945b30 RDI:
0000000000000003
[ 73.574457] RBP:
0000000000000003 R08:
0000000000000000 R09:
0000000000000000
[ 73.581852] R10:
0000000000000000 R11:
0000000000000246 R12:
00007fff3b945ab0
[ 73.589179] R13:
0000000000000000 R14:
0000000000000003 R15:
00007fff3b945b30
[ 73.596545] </TASK>
[ 73.598842] ---[ end trace
0000000000000000 ]---
Fixes:
9cbc948b5a20c ("igb: add XDP support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/r/e5c01d549dc37bff18e46aeabd6fb28a7bcf84be.1655388571.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Sat, 18 Jun 2022 01:30:01 +0000 (18:30 -0700)]
Merge https://git./linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2022-06-17
We've added 12 non-merge commits during the last 4 day(s) which contain
a total of 14 files changed, 305 insertions(+), 107 deletions(-).
The main changes are:
1) Fix x86 JIT tailcall count offset on BPF-2-BPF call, from Jakub Sitnicki.
2) Fix a kprobe_multi link bug which misplaces BPF cookies, from Jiri Olsa.
3) Fix an infinite loop when processing a module's BTF, from Kumar Kartikeya Dwivedi.
4) Fix getting a rethook only in RCU available context, from Masami Hiramatsu.
5) Fix request socket refcount leak in sk lookup helpers, from Jon Maxwell.
6) Fix xsk xmit behavior which wrongly adds skb to already full cq, from Ciara Loftus.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
rethook: Reject getting a rethook if RCU is not watching
fprobe, samples: Add use_trace option and show hit/missed counter
bpf, docs: Update some of the JIT/maintenance entries
selftest/bpf: Fix kprobe_multi bench test
bpf: Force cookies array to follow symbols sorting
ftrace: Keep address offset in ftrace_lookup_symbols
selftests/bpf: Shuffle cookies symbols in kprobe multi test
selftests/bpf: Test tail call counting with bpf2bpf and data on stack
bpf, x86: Fix tail call count offset calculation on bpf2bpf call
bpf: Limit maximum modifier chain length in btf_check_type_tags
bpf: Fix request_sock leak in sk lookup helpers
xsk: Fix generic transmit when completion queue reservation fails
====================
Link: https://lore.kernel.org/r/20220617202119.2421-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Masami Hiramatsu (Google) [Tue, 7 Jun 2022 16:11:12 +0000 (01:11 +0900)]
rethook: Reject getting a rethook if RCU is not watching
Since the rethook_recycle() will involve the call_rcu() for reclaiming
the rethook_instance, the rethook must be set up at the RCU available
context (non idle). This rethook_recycle() in the rethook trampoline
handler is inevitable, thus the RCU available check must be done before
setting the rethook trampoline.
This adds a rcu_is_watching() check in the rethook_try_get() so that
it will return NULL if it is called when !rcu_is_watching().
Fixes:
54ecbe6f1ed5 ("rethook: Add a generic return hook")
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/165461827269.280167.7379263615545598958.stgit@devnote2
Masami Hiramatsu (Google) [Tue, 7 Jun 2022 16:11:02 +0000 (01:11 +0900)]
fprobe, samples: Add use_trace option and show hit/missed counter
Add use_trace option to use trace_printk() instead of pr_info()
so that the handler doesn't involve the RCU operations.
And show the hit and missed counter so that the user can check
how many times the probe handler hit and missed.
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/165461826247.280167.11939123218334322352.stgit@devnote2
Daniel Borkmann [Fri, 17 Jun 2022 19:42:33 +0000 (21:42 +0200)]
bpf, docs: Update some of the JIT/maintenance entries
Various minor updates around some of the BPF-related entries:
JITs for ARM32/NFP/SPARC/X86-32 haven't seen updates in quite a while, thus
for now, mark them as 'Odd Fixes' until they become more actively developed.
JITs for POWERPC/S390 are in good shape and receive active development and
review, thus bump to 'Supported' similar as we have with X86-64/ARM64.
JITs for MIPS/RISC-V are in similar good shape as the ones mentioned above,
but looked after mostly in spare time, thus leave for now in 'Maintained' state.
Add Michael to PPC JIT given he's picking up the patches there, so it better
reflects today's state.
Also, I haven't done much reviewing around BPF sockmap/kTLS after John and I
did the big rework back in the days to integrate sockmap with kTLS.
These days, most of this is taken care by John, Jakub {Sitnicki,Kicinski} and
others in the community, so remove myself from these two.
Lastly, move all BPF-related entries into one place, that is, move the sockmap
one over near rest of BPF.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/r/f9b8a63a0b48dc764bd4c50f87632889f5813f69.1655494758.git.daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Riccardo Paolo Bestetti [Fri, 17 Jun 2022 08:54:35 +0000 (10:54 +0200)]
ipv4: ping: fix bind address validity check
Commit
8ff978b8b222 ("ipv4/raw: support binding to nonlocal addresses")
introduced a helper function to fold duplicated validity checks of bind
addresses into inet_addr_valid_or_nonlocal(). However, this caused an
unintended regression in ping_check_bind_addr(), which previously would
reject binding to multicast and broadcast addresses, but now these are
both incorrectly allowed as reported in [1].
This patch restores the original check. A simple reordering is done to
improve readability and make it evident that multicast and broadcast
addresses should not be allowed. Also, add an early exit for INADDR_ANY
which replaces lost behavior added by commit
0ce779a9f501 ("net: Avoid
unnecessary inet_addr_type() call when addr is INADDR_ANY").
Furthermore, this patch introduces regression selftests to catch these
specific cases.
[1] https://lore.kernel.org/netdev/CANP3RGdkAcDyAZoT1h8Gtuu0saq+eOrrTiWbxnOs+5zn+cpyKg@mail.gmail.com/
Fixes:
8ff978b8b222 ("ipv4/raw: support binding to nonlocal addresses")
Cc: Miaohe Lin <linmiaohe@huawei.com>
Reported-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Signed-off-by: Riccardo Paolo Bestetti <pbl@bestov.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xu Jia [Fri, 17 Jun 2022 09:31:06 +0000 (17:31 +0800)]
hamradio: 6pack: fix array-index-out-of-bounds in decode_std_command()
Hulk Robot reports incorrect sp->rx_count_cooked value in decode_std_command().
This should be caused by the subtracting from sp->rx_count_cooked before.
It seems that sp->rx_count_cooked value is changed to 0, which bypassed the
previous judgment.
The situation is shown below:
(Thread 1) | (Thread 2)
decode_std_command() | resync_tnc()
... |
if (rest == 2) |
sp->rx_count_cooked -= 2; |
else if (rest == 3) | ...
| sp->rx_count_cooked = 0;
sp->rx_count_cooked -= 1; |
for (i = 0; i < sp->rx_count_cooked; i++) // report error
checksum += sp->cooked_buf[i];
sp->rx_count_cooked is a shared variable but is not protected by a lock.
The same applies to sp->rx_count. This patch adds a lock to fix the bug.
The fail log is shown below:
=======================================================================
UBSAN: array-index-out-of-bounds in drivers/net/hamradio/6pack.c:925:31
index 400 is out of range for type 'unsigned char [400]'
CPU: 3 PID: 7433 Comm: kworker/u10:1 Not tainted
5.18.0-rc5-00163-g4b97bac0756a #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
Workqueue: events_unbound flush_to_ldisc
Call Trace:
<TASK>
dump_stack_lvl+0xcd/0x134
ubsan_epilogue+0xb/0x50
__ubsan_handle_out_of_bounds.cold+0x62/0x6c
sixpack_receive_buf+0xfda/0x1330
tty_ldisc_receive_buf+0x13e/0x180
tty_port_default_receive_buf+0x6d/0xa0
flush_to_ldisc+0x213/0x3f0
process_one_work+0x98f/0x1620
worker_thread+0x665/0x1080
kthread+0x2e9/0x3a0
ret_from_fork+0x1f/0x30
...
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Xu Jia <xujia39@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hoang Le [Fri, 17 Jun 2022 01:45:51 +0000 (08:45 +0700)]
tipc: fix use-after-free Read in tipc_named_reinit
syzbot found the following issue on:
==================================================================
BUG: KASAN: use-after-free in tipc_named_reinit+0x94f/0x9b0
net/tipc/name_distr.c:413
Read of size 8 at addr
ffff88805299a000 by task kworker/1:9/23764
CPU: 1 PID: 23764 Comm: kworker/1:9 Not tainted
5.18.0-rc4-syzkaller-00878-g17d49e6e8012 #0
Hardware name: Google Compute Engine/Google Compute Engine,
BIOS Google 01/01/2011
Workqueue: events tipc_net_finalize_work
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description.constprop.0.cold+0xeb/0x495
mm/kasan/report.c:313
print_report mm/kasan/report.c:429 [inline]
kasan_report.cold+0xf4/0x1c6 mm/kasan/report.c:491
tipc_named_reinit+0x94f/0x9b0 net/tipc/name_distr.c:413
tipc_net_finalize+0x234/0x3d0 net/tipc/net.c:138
process_one_work+0x996/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e9/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:298
</TASK>
[...]
==================================================================
In the commit
d966ddcc3821 ("tipc: fix a deadlock when flushing scheduled work"),
the cancel_work_sync() function just to make sure ONLY the work
tipc_net_finalize_work() is executing/pending on any CPU completed before
tipc namespace is destroyed through tipc_exit_net(). But this function
is not guaranteed the work is the last queued. So, the destroyed instance
may be accessed in the work which will try to enqueue later.
In order to completely fix, we re-order the calling of cancel_work_sync()
to make sure the work tipc_net_finalize_work() was last queued and it
must be completed by calling cancel_work_sync().
Reported-by: syzbot+47af19f3307fc9c5c82e@syzkaller.appspotmail.com
Fixes:
d966ddcc3821 ("tipc: fix a deadlock when flushing scheduled work")
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jay Vosburgh [Thu, 16 Jun 2022 19:26:30 +0000 (12:26 -0700)]
veth: Add updating of trans_start
Since commit
21a75f0915dd ("bonding: Fix ARP monitor validation"),
the bonding ARP / ND link monitors depend on the trans_start time to
determine link availability. NETIF_F_LLTX drivers must update trans_start
directly, which veth does not do. This prevents use of the ARP or ND link
monitors with veth interfaces in a bond.
Resolve this by having veth_xmit update the trans_start time.
Reported-by: Jonathan Toppins <jtoppins@redhat.com>
Tested-by: Jonathan Toppins <jtoppins@redhat.com>
Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Fixes:
21a75f0915dd ("bonding: Fix ARP monitor validation")
Link: https://lore.kernel.org/netdev/b2fd4147-8f50-bebd-963a-1a3e8d1d9715@redhat.com/
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 16 Jun 2022 07:34:34 +0000 (00:34 -0700)]
net: fix data-race in dev_isalive()
dev_isalive() is called under RTNL or dev_base_lock protection.
This means that changes to dev->reg_state should be done with both locks held.
syzbot reported:
BUG: KCSAN: data-race in register_netdevice / type_show
write to 0xffff888144ecf518 of 1 bytes by task 20886 on cpu 0:
register_netdevice+0xb9f/0xdf0 net/core/dev.c:10050
lapbeth_new_device drivers/net/wan/lapbether.c:414 [inline]
lapbeth_device_event+0x4a0/0x6c0 drivers/net/wan/lapbether.c:456
notifier_call_chain kernel/notifier.c:87 [inline]
raw_notifier_call_chain+0x53/0xb0 kernel/notifier.c:455
__dev_notify_flags+0x1d6/0x3a0
dev_change_flags+0xa2/0xc0 net/core/dev.c:8607
do_setlink+0x778/0x2230 net/core/rtnetlink.c:2780
__rtnl_newlink net/core/rtnetlink.c:3546 [inline]
rtnl_newlink+0x114c/0x16a0 net/core/rtnetlink.c:3593
rtnetlink_rcv_msg+0x811/0x8c0 net/core/rtnetlink.c:6089
netlink_rcv_skb+0x13e/0x240 net/netlink/af_netlink.c:2501
rtnetlink_rcv+0x18/0x20 net/core/rtnetlink.c:6107
netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
netlink_unicast+0x58a/0x660 net/netlink/af_netlink.c:1345
netlink_sendmsg+0x661/0x750 net/netlink/af_netlink.c:1921
sock_sendmsg_nosec net/socket.c:714 [inline]
sock_sendmsg net/socket.c:734 [inline]
__sys_sendto+0x21e/0x2c0 net/socket.c:2119
__do_sys_sendto net/socket.c:2131 [inline]
__se_sys_sendto net/socket.c:2127 [inline]
__x64_sys_sendto+0x74/0x90 net/socket.c:2127
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
read to 0xffff888144ecf518 of 1 bytes by task 20423 on cpu 1:
dev_isalive net/core/net-sysfs.c:38 [inline]
netdev_show net/core/net-sysfs.c:50 [inline]
type_show+0x24/0x90 net/core/net-sysfs.c:112
dev_attr_show+0x35/0x90 drivers/base/core.c:2095
sysfs_kf_seq_show+0x175/0x240 fs/sysfs/file.c:59
kernfs_seq_show+0x75/0x80 fs/kernfs/file.c:162
seq_read_iter+0x2c3/0x8e0 fs/seq_file.c:230
kernfs_fop_read_iter+0xd1/0x2f0 fs/kernfs/file.c:235
call_read_iter include/linux/fs.h:2052 [inline]
new_sync_read fs/read_write.c:401 [inline]
vfs_read+0x5a5/0x6a0 fs/read_write.c:482
ksys_read+0xe8/0x1a0 fs/read_write.c:620
__do_sys_read fs/read_write.c:630 [inline]
__se_sys_read fs/read_write.c:628 [inline]
__x64_sys_read+0x3e/0x50 fs/read_write.c:628
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
value changed: 0x00 -> 0x01
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 20423 Comm: udevd Tainted: G W 5.19.0-rc2-syzkaller-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Fixes:
1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Claudiu Manoil [Fri, 10 Jun 2022 08:40:37 +0000 (11:40 +0300)]
phy: aquantia: Fix AN when higher speeds than 1G are not advertised
Even when the eth port is resticted to work with speeds not higher than 1G,
and so the eth driver is requesting the phy (via phylink) to advertise up
to 1000BASET support, the aquantia phy device is still advertising for 2.5G
and 5G speeds.
Clear these advertising defaults when requested.
Cc: Ondrej Spacek <ondrej.spacek@nxp.com>
Fixes:
09c4c57f7bc41 ("net: phy: aquantia: add support for auto-negotiation configuration")
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Link: https://lore.kernel.org/r/20220610084037.7625-1-claudiu.manoil@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alexei Starovoitov [Fri, 17 Jun 2022 02:42:21 +0000 (19:42 -0700)]
Merge branch 'bpf: Fix cookie values for kprobe multi'
Jiri Olsa says:
====================
hi,
there's bug in kprobe_multi link that makes cookies misplaced when
using symbols to attach. The reason is that we sort symbols by name
but not adjacent cookie values. Current test did not find it because
bpf_fentry_test* are already sorted by name.
v3 changes:
- fixed kprobe_multi bench test to filter out invalid entries
from available_filter_functions
v2 changes:
- rebased on top of bpf/master
- checking if cookies are defined later in swap function [Andrii]
- added acks
thanks,
jirka
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Jiri Olsa [Wed, 15 Jun 2022 11:21:18 +0000 (13:21 +0200)]
selftest/bpf: Fix kprobe_multi bench test
With [1] the available_filter_functions file contains records
starting with __ftrace_invalid_address___ and marking disabled
entries.
We need to filter them out for the bench test to pass only
resolvable symbols to kernel.
[1] commit
b39181f7c690 ("ftrace: Add FTRACE_MCOUNT_MAX_OFFSET to avoid adding weak function")
Fixes:
b39181f7c690 ("ftrace: Add FTRACE_MCOUNT_MAX_OFFSET to avoid adding weak function")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20220615112118.497303-5-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Jiri Olsa [Wed, 15 Jun 2022 11:21:17 +0000 (13:21 +0200)]
bpf: Force cookies array to follow symbols sorting
When user specifies symbols and cookies for kprobe_multi link
interface it's very likely the cookies will be misplaced and
returned to wrong functions (via get_attach_cookie helper).
The reason is that to resolve the provided functions we sort
them before passing them to ftrace_lookup_symbols, but we do
not do the same sort on the cookie values.
Fixing this by using sort_r function with custom swap callback
that swaps cookie values as well.
Fixes:
0236fec57a15 ("bpf: Resolve symbols with ftrace_lookup_symbols for kprobe multi link")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20220615112118.497303-4-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Jiri Olsa [Wed, 15 Jun 2022 11:21:16 +0000 (13:21 +0200)]
ftrace: Keep address offset in ftrace_lookup_symbols
We want to store the resolved address on the same index as
the symbol string, because that's the user (bpf kprobe link)
code assumption.
Also making sure we don't store duplicates that might be
present in kallsyms.
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Fixes:
bed0d9a50dac ("ftrace: Add ftrace_lookup_symbols function")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20220615112118.497303-3-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Jiri Olsa [Wed, 15 Jun 2022 11:21:15 +0000 (13:21 +0200)]
selftests/bpf: Shuffle cookies symbols in kprobe multi test
There's a kernel bug that causes cookies to be misplaced and
the reason we did not catch this with this test is that we
provide bpf_fentry_test* functions already sorted by name.
Shuffling function bpf_fentry_test2 deeper in the list and
keeping the current cookie values as before will trigger
the bug.
The kernel fix is coming in following changes.
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20220615112118.497303-2-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Jakub Sitnicki [Thu, 16 Jun 2022 16:20:37 +0000 (18:20 +0200)]
selftests/bpf: Test tail call counting with bpf2bpf and data on stack
Cover the case when tail call count needs to be passed from BPF function to
BPF function, and the caller has data on stack. Specifically when the size
of data allocated on BPF stack is not a multiple on 8.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220616162037.535469-3-jakub@cloudflare.com
Jakub Sitnicki [Thu, 16 Jun 2022 16:20:36 +0000 (18:20 +0200)]
bpf, x86: Fix tail call count offset calculation on bpf2bpf call
On x86-64 the tail call count is passed from one BPF function to another
through %rax. Additionally, on function entry, the tail call count value
is stored on stack right after the BPF program stack, due to register
shortage.
The stored count is later loaded from stack either when performing a tail
call - to check if we have not reached the tail call limit - or before
calling another BPF function call in order to pass it via %rax.
In the latter case, we miscalculate the offset at which the tail call count
was stored on function entry. The JIT does not take into account that the
allocated BPF program stack is always a multiple of 8 on x86, while the
actual stack depth does not have to be.
This leads to a load from an offset that belongs to the BPF stack, as shown
in the example below:
SEC("tc")
int entry(struct __sk_buff *skb)
{
/* Have data on stack which size is not a multiple of 8 */
volatile char arr[1] = {};
return subprog_tail(skb);
}
int entry(struct __sk_buff * skb):
0: (b4) w2 = 0
1: (73) *(u8 *)(r10 -1) = r2
2: (85) call pc+1#bpf_prog_ce2f79bb5f3e06dd_F
3: (95) exit
int entry(struct __sk_buff * skb):
0xffffffffa0201788: nop DWORD PTR [rax+rax*1+0x0]
0xffffffffa020178d: xor eax,eax
0xffffffffa020178f: push rbp
0xffffffffa0201790: mov rbp,rsp
0xffffffffa0201793: sub rsp,0x8
0xffffffffa020179a: push rax
0xffffffffa020179b: xor esi,esi
0xffffffffa020179d: mov BYTE PTR [rbp-0x1],sil
0xffffffffa02017a1: mov rax,QWORD PTR [rbp-0x9] !!! tail call count
0xffffffffa02017a8: call 0xffffffffa02017d8 !!! is at rbp-0x10
0xffffffffa02017ad: leave
0xffffffffa02017ae: ret
Fix it by rounding up the BPF stack depth to a multiple of 8, when
calculating the tail call count offset on stack.
Fixes:
ebf7d1f508a7 ("bpf, x64: rework pro/epilogue and tailcall handling in JIT")
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220616162037.535469-2-jakub@cloudflare.com
Linus Torvalds [Thu, 16 Jun 2022 18:51:32 +0000 (11:51 -0700)]
Merge tag 'net-5.19-rc3' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Mostly driver fixes.
Current release - regressions:
- Revert "net: Add a second bind table hashed by port and address",
needs more work
- amd-xgbe: use platform_irq_count(), static setup of IRQ resources
had been removed from DT core
- dts: at91: ksz9477_evb: add phy-mode to fix port/phy validation
Current release - new code bugs:
- hns3: modify the ring param print info
Previous releases - always broken:
- axienet: make the 64b addressable DMA depends on 64b architectures
- iavf: fix issue with MAC address of VF shown as zero
- ice: fix PTP TX timestamp offset calculation
- usb: ax88179_178a needs FLAG_SEND_ZLP
Misc:
- document some net.sctp.* sysctls"
* tag 'net-5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (31 commits)
net: axienet: add missing error return code in axienet_probe()
Revert "net: Add a second bind table hashed by port and address"
net: ax25: Fix deadlock caused by skb_recv_datagram in ax25_recvmsg
net: usb: ax88179_178a needs FLAG_SEND_ZLP
MAINTAINERS: add include/dt-bindings/net to NETWORKING DRIVERS
ARM: dts: at91: ksz9477_evb: fix port/phy validation
net: bgmac: Fix an erroneous kfree() in bgmac_remove()
ice: Fix memory corruption in VF driver
ice: Fix queue config fail handling
ice: Sync VLAN filtering features for DVM
ice: Fix PTP TX timestamp offset calculation
mlxsw: spectrum_cnt: Reorder counter pools
docs: networking: phy: Fix a typo
amd-xgbe: Use platform_irq_count()
octeontx2-vf: Add support for adaptive interrupt coalescing
xilinx: Fix build on x86.
net: axienet: Use iowrite64 to write all 64b descriptor pointers
net: axienet: make the 64b addresable DMA depends on 64b archectures
net: hns3: fix tm port shapping of fibre port is incorrect after driver initialization
net: hns3: fix PF rss size initialization bug
...
Yang Yingliang [Thu, 16 Jun 2022 06:29:17 +0000 (14:29 +0800)]
net: axienet: add missing error return code in axienet_probe()
It should return error code in error path in axienet_probe().
Fixes:
00be43a74ca2 ("net: axienet: make the 64b addresable DMA depends on 64b archectures")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Link: https://lore.kernel.org/r/20220616062917.3601-1-yangyingliang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Joanne Koong [Wed, 15 Jun 2022 19:32:13 +0000 (12:32 -0700)]
Revert "net: Add a second bind table hashed by port and address"
This reverts:
commit
d5a42de8bdbe ("net: Add a second bind table hashed by port and address")
commit
538aaf9b2383 ("selftests: Add test for timing a bind request to a port with a populated bhash entry")
Link: https://lore.kernel.org/netdev/20220520001834.2247810-1-kuba@kernel.org/
There are a few things that need to be fixed here:
* Updating bhash2 in cases where the socket's rcv saddr changes
* Adding bhash2 hashbucket locks
Links to syzbot reports:
https://lore.kernel.org/netdev/
00000000000022208805e0df247a@google.com/
https://lore.kernel.org/netdev/
0000000000003f33bc05dfaf44fe@google.com/
Fixes:
d5a42de8bdbe ("net: Add a second bind table hashed by port and address")
Reported-by: syzbot+015d756bbd1f8b5c8f09@syzkaller.appspotmail.com
Reported-by: syzbot+98fd2d1422063b0f8c44@syzkaller.appspotmail.com
Reported-by: syzbot+0a847a982613c6438fba@syzkaller.appspotmail.com
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Link: https://lore.kernel.org/r/20220615193213.2419568-1-joannelkoong@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Wed, 15 Jun 2022 21:20:26 +0000 (14:20 -0700)]
Merge tag 'hardening-v5.19-rc3' of git://git./linux/kernel/git/kees/linux
Pull hardening fixes from Kees Cook:
- Correctly handle vm_map areas in hardened usercopy (Matthew Wilcox)
- Adjust CFI RCU usage to avoid boot splats with cpuidle (Sami Tolvanen)
* tag 'hardening-v5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
usercopy: Make usercopy resilient against ridiculously large copies
usercopy: Cast pointer to an integer once
usercopy: Handle vm_map_ram() areas
cfi: Fix __cfi_slowpath_diag RCU usage with cpuidle
Linus Torvalds [Wed, 15 Jun 2022 19:34:19 +0000 (12:34 -0700)]
Merge tag 'tpmdd-next-v5.19-rc3' of git://git./linux/kernel/git/jarkko/linux-tpmdd
Pull tpm fixes from Jarkko Sakkinen:
"Two fixes for this merge window"
* tag 'tpmdd-next-v5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
certs: fix and refactor CONFIG_SYSTEM_BLACKLIST_HASH_LIST build
certs/blacklist_hashes.c: fix const confusion in certs blacklist
Masahiro Yamada [Sat, 11 Jun 2022 17:22:31 +0000 (02:22 +0900)]
certs: fix and refactor CONFIG_SYSTEM_BLACKLIST_HASH_LIST build
Commit
addf466389d9 ("certs: Check that builtin blacklist hashes are
valid") was applied 8 months after the submission.
In the meantime, the base code had been removed by commit
b8c96a6b466c
("certs: simplify $(srctree)/ handling and remove config_filename
macro").
Fix the Makefile.
Create a local copy of $(CONFIG_SYSTEM_BLACKLIST_HASH_LIST). It is
included from certs/blacklist_hashes.c and also works as a timestamp.
Send error messages from check-blacklist-hashes.awk to stderr instead
of stdout.
Fixes:
addf466389d9 ("certs: Check that builtin blacklist hashes are valid")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Mickaël Salaün <mic@linux.microsoft.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Masahiro Yamada [Sat, 11 Jun 2022 17:22:30 +0000 (02:22 +0900)]
certs/blacklist_hashes.c: fix const confusion in certs blacklist
This file fails to compile as follows:
CC certs/blacklist_hashes.o
certs/blacklist_hashes.c:4:1: error: ignoring attribute ‘section (".init.data")’ because it conflicts with previous ‘section (".init.rodata")’ [-Werror=attributes]
4 | const char __initdata *const blacklist_hashes[] = {
| ^~~~~
In file included from certs/blacklist_hashes.c:2:
certs/blacklist.h:5:38: note: previous declaration here
5 | extern const char __initconst *const blacklist_hashes[];
| ^~~~~~~~~~~~~~~~
Apply the same fix as commit
2be04df5668d ("certs/blacklist_nohashes.c:
fix const confusion in certs blacklist").
Fixes:
734114f8782f ("KEYS: Add a system blacklist keyring")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Mickaël Salaün <mic@linux.microsoft.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Kumar Kartikeya Dwivedi [Wed, 15 Jun 2022 04:21:51 +0000 (09:51 +0530)]
bpf: Limit maximum modifier chain length in btf_check_type_tags
On processing a module BTF of module built for an older kernel, we might
sometimes find that some type points to itself forming a loop. If such a
type is a modifier, btf_check_type_tags's while loop following modifier
chain will be caught in an infinite loop.
Fix this by defining a maximum chain length and bailing out if we spin
any longer than that.
Fixes:
eb596b090558 ("bpf: Ensure type tags precede modifiers in BTF")
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220615042151.2266537-1-memxor@gmail.com
Linus Torvalds [Wed, 15 Jun 2022 16:04:55 +0000 (09:04 -0700)]
Merge tag 'fs.fixes.v5.19-rc3' of git://git./linux/kernel/git/brauner/linux
Pull vfs idmapping fix from Christian Brauner:
"This fixes an issue where we fail to change the group of a file when
the caller owns the file and is a member of the group to change to.
This is only relevant on idmapped mounts.
There's a detailed description in the commit message and regression
tests have been added to xfstests"
* tag 'fs.fixes.v5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
fs: account for group membership
Jon Maxwell [Wed, 15 Jun 2022 01:15:40 +0000 (11:15 +1000)]
bpf: Fix request_sock leak in sk lookup helpers
A customer reported a request_socket leak in a Calico cloud environment. We
found that a BPF program was doing a socket lookup with takes a refcnt on
the socket and that it was finding the request_socket but returning the parent
LISTEN socket via sk_to_full_sk() without decrementing the child request socket
1st, resulting in request_sock slab object leak. This patch retains the
existing behaviour of returning full socks to the caller but it also decrements
the child request_socket if one is present before doing so to prevent the leak.
Thanks to Curtis Taylor for all the help in diagnosing and testing this. And
thanks to Antoine Tenart for the reproducer and patch input.
v2 of this patch contains, refactor as per Daniel Borkmann's suggestions to
validate RCU flags on the listen socket so that it balances with bpf_sk_release()
and update comments as per Martin KaFai Lau's suggestion. One small change to
Daniels suggestion, put "sk = sk2" under "if (sk2 != sk)" to avoid an extra
instruction.
Fixes:
f7355a6c0497 ("bpf: Check sk_fullsock() before returning from bpf_sk_lookup()")
Fixes:
edbf8c01de5a ("bpf: add skc_lookup_tcp helper")
Co-developed-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Curtis Taylor <cutaylor-pub@yahoo.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/56d6f898-bde0-bb25-3427-12a330b29fb8@iogearbox.net
Link: https://lore.kernel.org/bpf/20220615011540.813025-1-jmaxwell37@gmail.com
Duoming Zhou [Tue, 14 Jun 2022 09:25:57 +0000 (17:25 +0800)]
net: ax25: Fix deadlock caused by skb_recv_datagram in ax25_recvmsg
The skb_recv_datagram() in ax25_recvmsg() will hold lock_sock
and block until it receives a packet from the remote. If the client
doesn`t connect to server and calls read() directly, it will not
receive any packets forever. As a result, the deadlock will happen.
The fail log caused by deadlock is shown below:
[ 369.606973] INFO: task ax25_deadlock:157 blocked for more than 245 seconds.
[ 369.608919] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 369.613058] Call Trace:
[ 369.613315] <TASK>
[ 369.614072] __schedule+0x2f9/0xb20
[ 369.615029] schedule+0x49/0xb0
[ 369.615734] __lock_sock+0x92/0x100
[ 369.616763] ? destroy_sched_domains_rcu+0x20/0x20
[ 369.617941] lock_sock_nested+0x6e/0x70
[ 369.618809] ax25_bind+0xaa/0x210
[ 369.619736] __sys_bind+0xca/0xf0
[ 369.620039] ? do_futex+0xae/0x1b0
[ 369.620387] ? __x64_sys_futex+0x7c/0x1c0
[ 369.620601] ? fpregs_assert_state_consistent+0x19/0x40
[ 369.620613] __x64_sys_bind+0x11/0x20
[ 369.621791] do_syscall_64+0x3b/0x90
[ 369.622423] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 369.623319] RIP: 0033:0x7f43c8aa8af7
[ 369.624301] RSP: 002b:
00007f43c8197ef8 EFLAGS:
00000246 ORIG_RAX:
0000000000000031
[ 369.625756] RAX:
ffffffffffffffda RBX:
0000000000000000 RCX:
00007f43c8aa8af7
[ 369.626724] RDX:
0000000000000010 RSI:
000055768e2021d0 RDI:
0000000000000005
[ 369.628569] RBP:
00007f43c8197f00 R08:
0000000000000011 R09:
00007f43c8198700
[ 369.630208] R10:
0000000000000000 R11:
0000000000000246 R12:
00007fff845e6afe
[ 369.632240] R13:
00007fff845e6aff R14:
00007f43c8197fc0 R15:
00007f43c8198700
This patch replaces skb_recv_datagram() with an open-coded variant of it
releasing the socket lock before the __skb_wait_for_more_packets() call
and re-acquiring it after such call in order that other functions that
need socket lock could be executed.
what's more, the socket lock will be released only when recvmsg() will
block and that should produce nicer overall behavior.
Fixes:
1da177e4c3f4 ("Linux-2.6.12-rc2")
Suggested-by: Thomas Osterried <thomas@osterried.de>
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Reported-by: Thomas Habets <thomas@@habets.se>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jose Alonso [Mon, 13 Jun 2022 18:32:44 +0000 (15:32 -0300)]
net: usb: ax88179_178a needs FLAG_SEND_ZLP
The extra byte inserted by usbnet.c when
(length % dev->maxpacket == 0) is causing problems to device.
This patch sets FLAG_SEND_ZLP to avoid this.
Tested with: 0b95:1790 ASIX Electronics Corp. AX88179 Gigabit Ethernet
Problems observed:
======================================================================
1) Using ssh/sshfs. The remote sshd daemon can abort with the message:
"message authentication code incorrect"
This happens because the tcp message sent is corrupted during the
USB "Bulk out". The device calculate the tcp checksum and send a
valid tcp message to the remote sshd. Then the encryption detects
the error and aborts.
2) NETDEV WATCHDOG: ... (ax88179_178a): transmit queue 0 timed out
3) Stop normal work without any log message.
The "Bulk in" continue receiving packets normally.
The host sends "Bulk out" and the device responds with -ECONNRESET.
(The netusb.c code tx_complete ignore -ECONNRESET)
Under normal conditions these errors take days to happen and in
intense usage take hours.
A test with ping gives packet loss, showing that something is wrong:
ping -4 -s 462 {destination} # 462 = 512 - 42 - 8
Not all packets fail.
My guess is that the device tries to find another packet starting
at the extra byte and will fail or not depending on the next
bytes (old buffer content).
======================================================================
Signed-off-by: Jose Alonso <joalonsof@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 15 Jun 2022 08:15:33 +0000 (09:15 +0100)]
Merge branch '100GbE' of git://git./linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2022-06-14
This series contains updates to ice driver only.
Michal fixes incorrect Tx timestamp offset calculation for E822 devices.
Roman enforces required VLAN filtering settings for double VLAN mode.
Przemyslaw fixes memory corruption issues with VFs by ensuring
queues are disabled in the error path of VF queue configuration and to
disabled VFs during reset.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Lukas Bulwahn [Mon, 13 Jun 2022 12:18:26 +0000 (14:18 +0200)]
MAINTAINERS: add include/dt-bindings/net to NETWORKING DRIVERS
Maintainers of the directory Documentation/devicetree/bindings/net
are also the maintainers of the corresponding directory
include/dt-bindings/net.
Add the file entry for include/dt-bindings/net to the appropriate
section in MAINTAINERS.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Link: https://lore.kernel.org/r/20220613121826.11484-1-lukas.bulwahn@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Oleksij Rempel [Fri, 10 Jun 2022 08:16:21 +0000 (10:16 +0200)]
ARM: dts: at91: ksz9477_evb: fix port/phy validation
Latest drivers version requires phy-mode to be set. Otherwise we will
use "NA" mode and the switch driver will invalidate this port mode.
Fixes:
65ac79e18120 ("net: dsa: microchip: add the phylink get_caps")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/r/20220610081621.584393-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Christophe JAILLET [Mon, 13 Jun 2022 20:53:50 +0000 (22:53 +0200)]
net: bgmac: Fix an erroneous kfree() in bgmac_remove()
'bgmac' is part of a managed resource allocated with bgmac_alloc(). It
should not be freed explicitly.
Remove the erroneous kfree() from the .remove() function.
Fixes:
34a5102c3235 ("net: bgmac: allocate struct bgmac just once & don't copy it")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/a026153108dd21239036a032b95c25b5cece253b.1655153616.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Tue, 14 Jun 2022 17:36:11 +0000 (10:36 -0700)]
netfs: fix up netfs_inode_init() docbook comment
Commit
e81fb4198e27 ("netfs: Further cleanups after struct netfs_inode
wrapper introduced") changed the argument types and names, and actually
updated the comment too (although that was thanks to David Howells, not
me: my original patch only changed the code).
But the comment fixup didn't go quite far enough, and didn't change the
argument name in the comment, resulting in
include/linux/netfs.h:314: warning: Function parameter or member 'ctx' not described in 'netfs_inode_init'
include/linux/netfs.h:314: warning: Excess function parameter 'inode' description in 'netfs_inode_init'
during htmldoc generation.
Fixes:
e81fb4198e27 ("netfs: Further cleanups after struct netfs_inode wrapper introduced")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Przemyslaw Patynowski [Thu, 2 Jun 2022 10:09:17 +0000 (12:09 +0200)]
ice: Fix memory corruption in VF driver
Disable VF's RX/TX queues, when it's disabled. VF can have queues enabled,
when it requests a reset. If PF driver assumes that VF is disabled,
while VF still has queues configured, VF may unmap DMA resources.
In such scenario device still can map packets to memory, which ends up
silently corrupting it.
Previously, VF driver could experience memory corruption, which lead to
crash:
[ 5119.170157] BUG: unable to handle kernel paging request at
00001b9780003237
[ 5119.170166] PGD 0 P4D 0
[ 5119.170173] Oops: 0002 [#1] PREEMPT_RT SMP PTI
[ 5119.170181] CPU: 30 PID: 427592 Comm: kworker/u96:2 Kdump: loaded Tainted: G W I --------- - - 4.18.0-372.9.1.rt7.166.el8.x86_64 #1
[ 5119.170189] Hardware name: Dell Inc. PowerEdge R740/014X06, BIOS 2.3.10 08/15/2019
[ 5119.170193] Workqueue: iavf iavf_adminq_task [iavf]
[ 5119.170219] RIP: 0010:__page_frag_cache_drain+0x5/0x30
[ 5119.170238] Code: 0f 0f b6 77 51 85 f6 74 07 31 d2 e9 05 df ff ff e9 90 fe ff ff 48 8b 05 49 db 33 01 eb b4 0f 1f 80 00 00 00 00 0f 1f 44 00 00 <f0> 29 77 34 74 01 c3 48 8b 07 f6 c4 80 74 0f 0f b6 77 51 85 f6 74
[ 5119.170244] RSP: 0018:
ffffa43b0bdcfd78 EFLAGS:
00010282
[ 5119.170250] RAX:
ffffffff896b3e40 RBX:
ffff8fb282524000 RCX:
0000000000000002
[ 5119.170254] RDX:
0000000049000000 RSI:
0000000000000000 RDI:
00001b9780003203
[ 5119.170259] RBP:
ffff8fb248217b00 R08:
0000000000000022 R09:
0000000000000009
[ 5119.170262] R10:
2b849d6300000000 R11:
0000000000000020 R12:
0000000000000000
[ 5119.170265] R13:
0000000000001000 R14:
0000000000000009 R15:
0000000000000000
[ 5119.170269] FS:
0000000000000000(0000) GS:
ffff8fb1201c0000(0000) knlGS:
0000000000000000
[ 5119.170274] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 5119.170279] CR2:
00001b9780003237 CR3:
00000008f3e1a003 CR4:
00000000007726e0
[ 5119.170283] DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
[ 5119.170286] DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
[ 5119.170290] PKRU:
55555554
[ 5119.170292] Call Trace:
[ 5119.170298] iavf_clean_rx_ring+0xad/0x110 [iavf]
[ 5119.170324] iavf_free_rx_resources+0xe/0x50 [iavf]
[ 5119.170342] iavf_free_all_rx_resources.part.51+0x30/0x40 [iavf]
[ 5119.170358] iavf_virtchnl_completion+0xd8a/0x15b0 [iavf]
[ 5119.170377] ? iavf_clean_arq_element+0x210/0x280 [iavf]
[ 5119.170397] iavf_adminq_task+0x126/0x2e0 [iavf]
[ 5119.170416] process_one_work+0x18f/0x420
[ 5119.170429] worker_thread+0x30/0x370
[ 5119.170437] ? process_one_work+0x420/0x420
[ 5119.170445] kthread+0x151/0x170
[ 5119.170452] ? set_kthread_struct+0x40/0x40
[ 5119.170460] ret_from_fork+0x35/0x40
[ 5119.170477] Modules linked in: iavf sctp ip6_udp_tunnel udp_tunnel mlx4_en mlx4_core nfp tls vhost_net vhost vhost_iotlb tap tun xt_CHECKSUM ipt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_counter nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink bridge stp llc rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc intel_rapl_msr iTCO_wdt iTCO_vendor_support dell_smbios wmi_bmof dell_wmi_descriptor dcdbas kvm_intel kvm irqbypass intel_rapl_common isst_if_common skx_edac irdma nfit libnvdimm x86_pkg_temp_thermal i40e intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ib_uverbs rapl ipmi_ssif intel_cstate intel_uncore mei_me pcspkr acpi_ipmi ib_core mei lpc_ich i2c_i801 ipmi_si ipmi_devintf wmi ipmi_msghandler acpi_power_meter xfs libcrc32c sd_mod t10_pi sg mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ice ahci drm libahci crc32c_intel libata tg3 megaraid_sas
[ 5119.170613] i2c_algo_bit dm_mirror dm_region_hash dm_log dm_mod fuse [last unloaded: iavf]
[ 5119.170627] CR2:
00001b9780003237
Fixes:
ec4f5a436bdf ("ice: Check if VF is disabled for Opcode and other operations")
Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Co-developed-by: Slawomir Laba <slawomirx.laba@intel.com>
Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Przemyslaw Patynowski [Thu, 2 Jun 2022 10:09:04 +0000 (12:09 +0200)]
ice: Fix queue config fail handling
Disable VF's RX/TX queues, when VIRTCHNL_OP_CONFIG_VSI_QUEUES fail.
Not disabling them might lead to scenario, where PF driver leaves VF
queues enabled, when VF's VSI failed queue config.
In this scenario VF should not have RX/TX queues enabled. If PF failed
to set up VF's queues, VF will reset due to TX timeouts in VF driver.
Initialize iterator 'i' to -1, so if error happens prior to configuring
queues then error path code will not disable queue 0. Loop that
configures queues will is using same iterator, so error path code will
only disable queues that were configured.
Fixes:
77ca27c41705 ("ice: add support for virtchnl_queue_select.[tx|rx]_queues bitmap")
Suggested-by: Slawomir Laba <slawomirx.laba@intel.com>
Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Roman Storozhenko [Tue, 7 Jun 2022 06:54:57 +0000 (08:54 +0200)]
ice: Sync VLAN filtering features for DVM
VLAN filtering features, that is C-Tag and S-Tag, in DVM mode must be
both enabled or disabled.
In case of turning off/on only one of the features, another feature must
be turned off/on automatically with issuing an appropriate message to
the kernel log.
Fixes:
1babaf77f49d ("ice: Advertise 802.1ad VLAN filtering and offloads for PF netdev")
Signed-off-by: Roman Storozhenko <roman.storozhenko@intel.com>
Co-developed-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com>
Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Michal Michalik [Tue, 10 May 2022 11:03:43 +0000 (13:03 +0200)]
ice: Fix PTP TX timestamp offset calculation
The offset was being incorrectly calculated for E822 - that led to
collisions in choosing TX timestamp register location when more than
one port was trying to use timestamping mechanism.
In E822 one quad is being logically split between ports, so quad 0 is
having trackers for ports 0-3, quad 1 ports 4-7 etc. Each port should
have separate memory location for tracking timestamps. Due to error for
example ports 1 and 2 had been assigned to quad 0 with same offset (0),
while port 1 should have offset 0 and 1 offset 16.
Fix it by correctly calculating quad offset.
Fixes:
3a7496234d17 ("ice: implement basic E822 PTP support")
Signed-off-by: Michal Michalik <michal.michalik@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Linus Torvalds [Tue, 14 Jun 2022 14:57:18 +0000 (07:57 -0700)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"While last week's pull request contained miscellaneous fixes for x86,
this one covers other architectures, selftests changes, and a bigger
series for APIC virtualization bugs that were discovered during 5.20
development. The idea is to base 5.20 development for KVM on top of
this tag.
ARM64:
- Properly reset the SVE/SME flags on vcpu load
- Fix a vgic-v2 regression regarding accessing the pending state of a
HW interrupt from userspace (and make the code common with vgic-v3)
- Fix access to the idreg range for protected guests
- Ignore 'kvm-arm.mode=protected' when using VHE
- Return an error from kvm_arch_init_vm() on allocation failure
- A bunch of small cleanups (comments, annotations, indentation)
RISC-V:
- Typo fix in arch/riscv/kvm/vmid.c
- Remove broken reference pattern from MAINTAINERS entry
x86-64:
- Fix error in page tables with MKTME enabled
- Dirty page tracking performance test extended to running a nested
guest
- Disable APICv/AVIC in cases that it cannot implement correctly"
[ This merge also fixes a misplaced end parenthesis bug introduced in
commit
3743c2f02517 ("KVM: x86: inhibit APICv/AVIC on changes to APIC
ID or APIC base") pointed out by Sean Christopherson ]
Link: https://lore.kernel.org/all/20220610191813.371682-1-seanjc@google.com/
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (34 commits)
KVM: selftests: Restrict test region to 48-bit physical addresses when using nested
KVM: selftests: Add option to run dirty_log_perf_test vCPUs in L2
KVM: selftests: Clean up LIBKVM files in Makefile
KVM: selftests: Link selftests directly with lib object files
KVM: selftests: Drop unnecessary rule for STATIC_LIBS
KVM: selftests: Add a helper to check EPT/VPID capabilities
KVM: selftests: Move VMX_EPT_VPID_CAP_AD_BITS to vmx.h
KVM: selftests: Refactor nested_map() to specify target level
KVM: selftests: Drop stale function parameter comment for nested_map()
KVM: selftests: Add option to create 2M and 1G EPT mappings
KVM: selftests: Replace x86_page_size with PG_LEVEL_XX
KVM: x86: SVM: fix nested PAUSE filtering when L0 intercepts PAUSE
KVM: x86: SVM: drop preempt-safe wrappers for avic_vcpu_load/put
KVM: x86: disable preemption around the call to kvm_arch_vcpu_{un|}blocking
KVM: x86: disable preemption while updating apicv inhibition
KVM: x86: SVM: fix avic_kick_target_vcpus_fast
KVM: x86: SVM: remove avic's broken code that updated APIC ID
KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC base
KVM: x86: document AVIC/APICv inhibit reasons
KVM: x86/mmu: Set memory encryption "value", not "mask", in shadow PDPTRs
...
Ciara Loftus [Tue, 14 Jun 2022 07:07:46 +0000 (07:07 +0000)]
xsk: Fix generic transmit when completion queue reservation fails
Two points of potential failure in the generic transmit function are:
1. completion queue (cq) reservation failure.
2. skb allocation failure
Originally the cq reservation was performed first, followed by the skb
allocation. Commit
675716400da6 ("xdp: fix possible cq entry leak")
reversed the order because at the time there was no mechanism available
to undo the cq reservation which could have led to possible cq entry leaks
in the event of skb allocation failure. However if the skb allocation is
performed first and the cq reservation then fails, the xsk skb destructor
is called which blindly adds the skb address to the already full cq leading
to undefined behavior.
This commit restores the original order (cq reservation followed by skb
allocation) and uses the xskq_prod_cancel helper to undo the cq reserve
in event of skb allocation failure.
Fixes:
675716400da6 ("xdp: fix possible cq entry leak")
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20220614070746.8871-1-ciara.loftus@intel.com
Linus Torvalds [Tue, 14 Jun 2022 14:43:15 +0000 (07:43 -0700)]
Merge tag 'x86-bugs-2022-06-01' of git://git./linux/kernel/git/tip/tip
Pull x86 MMIO stale data fixes from Thomas Gleixner:
"Yet another hw vulnerability with a software mitigation: Processor
MMIO Stale Data.
They are a class of MMIO-related weaknesses which can expose stale
data by propagating it into core fill buffers. Data which can then be
leaked using the usual speculative execution methods.
Mitigations include this set along with microcode updates and are
similar to MDS and TAA vulnerabilities: VERW now clears those buffers
too"
* tag 'x86-bugs-2022-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/speculation/mmio: Print SMT warning
KVM: x86/speculation: Disable Fill buffer clear within guests
x86/speculation/mmio: Reuse SRBDS mitigation for SBDS
x86/speculation/srbds: Update SRBDS mitigation selection
x86/speculation/mmio: Add sysfs reporting for Processor MMIO Stale Data
x86/speculation/mmio: Enable CPU Fill buffer clearing on idle
x86/bugs: Group MDS, TAA & Processor MMIO Stale Data mitigations
x86/speculation/mmio: Add mitigation for Processor MMIO Stale Data
x86/speculation: Add a common function for MD_CLEAR mitigation update
x86/speculation/mmio: Enumerate Processor MMIO Stale Data bug
Documentation: Add documentation for Processor MMIO Stale Data
Petr Machata [Mon, 13 Jun 2022 12:50:17 +0000 (15:50 +0300)]
mlxsw: spectrum_cnt: Reorder counter pools
Both RIF and ACL flow counters use a 24-bit SW-managed counter address to
communicate which counter they want to bind.
In a number of Spectrum FW releases, binding a RIF counter is broken and
slices the counter index to 16 bits. As a result, on Spectrum-2 and above,
no more than about 410 RIF counters can be effectively used. This
translates to 205 netdevices for which L3 HW stats can be enabled. (This
does not happen on Spectrum-1, because there are fewer counters available
overall and the counter index never exceeds 16 bits.)
Binding counters to ACLs does not have this issue. Therefore reorder the
counter allocation scheme so that RIF counters come first and therefore get
lower indices that are below the 16-bit barrier.
Fixes:
98e60dce4da1 ("Merge branch 'mlxsw-Introduce-initial-Spectrum-2-support'")
Reported-by: Maksym Yaremchuk <maksymy@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20220613125017.2018162-1-idosch@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Christian Brauner [Mon, 13 Jun 2022 11:15:17 +0000 (13:15 +0200)]
fs: account for group membership
When calling setattr_prepare() to determine the validity of the
attributes the ia_{g,u}id fields contain the value that will be written
to inode->i_{g,u}id. This is exactly the same for idmapped and
non-idmapped mounts and allows callers to pass in the values they want
to see written to inode->i_{g,u}id.
When group ownership is changed a caller whose fsuid owns the inode can
change the group of the inode to any group they are a member of. When
searching through the caller's groups we need to use the gid mapped
according to the idmapped mount otherwise we will fail to change
ownership for unprivileged users.
Consider a caller running with fsuid and fsgid 1000 using an idmapped
mount that maps id 65534 to 1000 and 65535 to 1001. Consequently, a file
owned by 65534:65535 in the filesystem will be owned by 1000:1001 in the
idmapped mount.
The caller now requests the gid of the file to be changed to 1000 going
through the idmapped mount. In the vfs we will immediately map the
requested gid to the value that will need to be written to inode->i_gid
and place it in attr->ia_gid. Since this idmapped mount maps 65534 to
1000 we place 65534 in attr->ia_gid.
When we check whether the caller is allowed to change group ownership we
first validate that their fsuid matches the inode's uid. The
inode->i_uid is 65534 which is mapped to uid 1000 in the idmapped mount.
Since the caller's fsuid is 1000 we pass the check.
We now check whether the caller is allowed to change inode->i_gid to the
requested gid by calling in_group_p(). This will compare the passed in
gid to the caller's fsgid and search the caller's additional groups.
Since we're dealing with an idmapped mount we need to pass in the gid
mapped according to the idmapped mount. This is akin to checking whether
a caller is privileged over the future group the inode is owned by. And
that needs to take the idmapped mount into account. Note, all helpers
are nops without idmapped mounts.
New regression test sent to xfstests.
Link: https://github.com/lxc/lxd/issues/10537
Link: https://lore.kernel.org/r/20220613111517.2186646-1-brauner@kernel.org
Fixes:
2f221d6f7b88 ("attr: handle idmapped mounts")
Cc: Seth Forshee <sforshee@digitalocean.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@vger.kernel.org # 5.15+
CC: linux-fsdevel@vger.kernel.org
Reviewed-by: Seth Forshee <sforshee@digitalocean.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Jonathan Neuschäfer [Fri, 10 Jun 2022 07:28:08 +0000 (09:28 +0200)]
docs: networking: phy: Fix a typo
Write "to be operated" instead of "to be operate".
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20220610072809.352962-1-j.neuschaefer@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jean-Philippe Brucker [Thu, 9 Jun 2022 16:14:59 +0000 (17:14 +0100)]
amd-xgbe: Use platform_irq_count()
The AMD XGbE driver currently counts the number of interrupts assigned
to the device by inspecting the pdev->resource array. Since commit
a1a2b7125e10 ("of/platform: Drop static setup of IRQ resource from DT
core") removed IRQs from this array, the driver now attempts to get all
interrupts from 1 to -1U and gives up probing once it reaches an invalid
interrupt index.
Obtain the number of IRQs with platform_irq_count() instead.
Fixes:
a1a2b7125e10 ("of/platform: Drop static setup of IRQ resource from DT core")
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Acked-by: Rob Herring <robh@kernel.org>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20220609161457.69614-1-jean-philippe@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Matthew Wilcox (Oracle) [Sun, 12 Jun 2022 21:32:27 +0000 (22:32 +0100)]
usercopy: Make usercopy resilient against ridiculously large copies
If 'n' is so large that it's negative, we might wrap around and mistakenly
think that the copy is OK when it's not. Such a copy would probably
crash, but just doing the arithmetic in a more simple way lets us detect
and refuse this case.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Tested-by: Zorro Lang <zlang@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220612213227.3881769-4-willy@infradead.org
Matthew Wilcox (Oracle) [Sun, 12 Jun 2022 21:32:26 +0000 (22:32 +0100)]
usercopy: Cast pointer to an integer once
Get rid of a lot of annoying casts by setting 'addr' once at the top
of the function.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Tested-by: Zorro Lang <zlang@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220612213227.3881769-3-willy@infradead.org
Matthew Wilcox (Oracle) [Sun, 12 Jun 2022 21:32:25 +0000 (22:32 +0100)]
usercopy: Handle vm_map_ram() areas
vmalloc does not allocate a vm_struct for vm_map_ram() areas. That causes
us to deny usercopies from those areas. This affects XFS which uses
vm_map_ram() for its directories.
Fix this by calling find_vmap_area() instead of find_vm_area().
Fixes:
0aef499f3172 ("mm/usercopy: Detect vmalloc overruns")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Tested-by: Zorro Lang <zlang@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220612213227.3881769-2-willy@infradead.org
Sami Tolvanen [Tue, 31 May 2022 17:59:10 +0000 (10:59 -0700)]
cfi: Fix __cfi_slowpath_diag RCU usage with cpuidle
RCU_NONIDLE usage during __cfi_slowpath_diag can result in an invalid
RCU state in the cpuidle code path:
WARNING: CPU: 1 PID: 0 at kernel/rcu/tree.c:613 rcu_eqs_enter+0xe4/0x138
...
Call trace:
rcu_eqs_enter+0xe4/0x138
rcu_idle_enter+0xa8/0x100
cpuidle_enter_state+0x154/0x3a8
cpuidle_enter+0x3c/0x58
do_idle.llvm.
6590768638138871020+0x1f4/0x2ec
cpu_startup_entry+0x28/0x2c
secondary_start_kernel+0x1b8/0x220
__secondary_switched+0x94/0x98
Instead, call rcu_irq_enter/exit to wake up RCU only when needed and
disable interrupts for the entire CFI shadow/module check when we do.
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/r/20220531175910.890307-1-samitolvanen@google.com
Fixes:
cf68fffb66d6 ("add support for Clang CFI")
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Suman Ghosh [Sun, 12 Jun 2022 17:45:36 +0000 (23:15 +0530)]
octeontx2-vf: Add support for adaptive interrupt coalescing
Fixes:
6e144b47f560 (octeontx2-pf: Add support for adaptive interrupt coalescing)
Added support for VF interfaces as well.
Signed-off-by: Suman Ghosh <sumang@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 13 Jun 2022 11:49:21 +0000 (12:49 +0100)]
xilinx: Fix build on x86.
CONFIG_64BIT is not sufficient for checking for availability of
iowrite64() and friends.
Also, the out_addr helpers need to be inline.
Fixes:
b690f8df6497 ("net: axienet: Use iowrite64 to write all 64b descriptor pointers")
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 13 Jun 2022 11:36:56 +0000 (12:36 +0100)]
Merge branch 'axienet-fixes'
Andy Chiu says:
====================
net: axienet: fix DMA Tx error
We ran into multiple DMA TX errors while writing files over a network
block device running on top of a DMA-connected AXI Ethernet device on
64-bit RISC-V machines. The errors indicated that the DMA had fetched a
null descriptor and we found that the reason for this is that AXI DMA had
unexpectedly processed a partially updated tail descriptor pointer. To
fix it, we suggest that the driver should use one 64-bit write instead
of two 32-bit writes to perform such update if possible. For those
archectures where double-word load/stores are unavailable, e.g. 32-bit
archectures, force a driver probe failure if the driver finds 64-bit
capability on DMA.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Chiu [Mon, 13 Jun 2022 03:42:02 +0000 (11:42 +0800)]
net: axienet: Use iowrite64 to write all 64b descriptor pointers
According to commit
f735c40ed93c ("net: axienet: Autodetect 64-bit DMA
capability") and AXI-DMA spec (pg021), on 64-bit capable dma, only
writing MSB part of tail descriptor pointer causes DMA engine to start
fetching descriptors. However, we found that it is true only if dma is in
idle state. In other words, dma would use a tailp even if it only has LSB
updated, when the dma is running.
The non-atomicity of this behavior could be problematic if enough
delay were introduced in between the 2 writes. For example, if an
interrupt comes right after the LSB write and the cpu spends long
enough time in the handler for the dma to get back into idle state by
completing descriptors, then the seconcd write to MSB would treat dma
to start fetching descriptors again. Since the descriptor next to the
one pointed by current tail pointer is not filled by the kernel yet,
fetching a null descriptor here causes a dma internal error and halt
the dma engine down.
We suggest that the dma engine should start process a 64-bit MMIO write
to the descriptor pointer only if ONE 32-bit part of it is written on all
states. Or we should restrict the use of 64-bit addressable dma on 32-bit
platforms, since those devices have no instruction to guarantee the write
to LSB and MSB part of tail pointer occurs atomically to the dma.
initial condition:
curp = x-3;
tailp = x-2;
LSB = x;
MSB = 0;
cpu: |dma:
iowrite32(LSB, tailp) | completes #(x-3) desc, curp = x-3
... | tailp updated
=> irq | completes #(x-2) desc, curp = x-2
... | completes #(x-1) desc, curp = x-1
... | ...
... | completes #x desc, curp = tailp = x
<= irqreturn | reaches tailp == curp = x, idle
iowrite32(MSB, tailp + 4) | ...
| tailp updated, starts fetching...
| fetches #(x + 1) desc, sees cntrl = 0
| post Tx error, halt
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Reported-by: Max Hsu <max.hsu@sifive.com>
Reviewed-by: Greentime Hu <greentime.hu@sifive.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Chiu [Mon, 13 Jun 2022 03:42:01 +0000 (11:42 +0800)]
net: axienet: make the 64b addresable DMA depends on 64b archectures
Currently it is not safe to config the IP as 64-bit addressable on 32-bit
archectures, which cannot perform a double-word store on its descriptor
pointers. The pointer is 64-bit wide if the IP is configured as 64-bit,
and the device would process the partially updated pointer on some
states if the pointer was updated via two store-words. To prevent such
condition, we force a probe fail if we discover that the IP has 64-bit
capability but it is not running on a 64-Bit kernel.
This is a series of patch (1/2). The next patch must be applied in order
to make 64b DMA safe on 64b archectures.
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Reported-by: Max Hsu <max.hsu@sifive.com>
Reviewed-by: Greentime Hu <greentime.hu@sifive.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 13 Jun 2022 10:56:01 +0000 (11:56 +0100)]
Merge branch 'hns3-fixres'
Guangbin Huang says:
====================
net: hns3: add some fixes for -net
This series adds some fixes for the HNS3 ethernet driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Guangbin Huang [Sat, 11 Jun 2022 12:25:29 +0000 (20:25 +0800)]
net: hns3: fix tm port shapping of fibre port is incorrect after driver initialization
Currently in driver initialization process, driver will set shapping
parameters of tm port to default speed read from firmware. However, the
speed of SFP module may not be default speed, so shapping parameters of
tm port may be incorrect.
To fix this problem, driver sets new shapping parameters for tm port
after getting exact speed of SFP module in this case.
Fixes:
88d10bd6f730 ("net: hns3: add support for multiple media type")
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jie Wang [Sat, 11 Jun 2022 12:25:28 +0000 (20:25 +0800)]
net: hns3: fix PF rss size initialization bug
Currently hns3 driver misuses the VF rss size to initialize the PF rss size
in hclge_tm_vport_tc_info_update. So this patch fix it by checking the
vport id before initialization.
Fixes:
7347255ea389 ("net: hns3: refactor PF rss get APIs with new common rss get APIs")
Signed-off-by: Jie Wang <wangjie125@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Guangbin Huang [Sat, 11 Jun 2022 12:25:27 +0000 (20:25 +0800)]
net: hns3: restore tm priority/qset to default settings when tc disabled
Currently, settings parameters of schedule mode, dwrr, shaper of tm
priority or qset of one tc are only be set when tc is enabled, they are
not restored to the default settings when tc is disabled. It confuses
users when they cat tm_priority or tm_qset files of debugfs. So this
patch fixes it.
Fixes:
848440544b41 ("net: hns3: Add support of TX Scheduler & Shaper to HNS3 driver")
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jie Wang [Sat, 11 Jun 2022 12:25:26 +0000 (20:25 +0800)]
net: hns3: modify the ring param print info
Currently tx push is also a ring param. So the original ring param print
info in hns3_is_ringparam_changed should be adjusted.
Fixes:
07fdc163ac88 ("net: hns3: refactor hns3_set_ringparam()")
Signed-off-by: Jie Wang <wangjie125@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jian Shen [Sat, 11 Jun 2022 12:25:25 +0000 (20:25 +0800)]
net: hns3: don't push link state to VF if unalive
It's unnecessary to push link state to unalive VF, and the VF will
query link state from PF when it being start works.
Fixes:
18b6e31f8bf4 ("net: hns3: PF add support for pushing link status to VFs")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Guangbin Huang [Sat, 11 Jun 2022 12:25:24 +0000 (20:25 +0800)]
net: hns3: set port base vlan tbl_sta to false before removing old vlan
When modify port base vlan, the port base vlan tbl_sta needs to set to
false before removing old vlan, to indicate this operation is not finish.
Fixes:
c0f46de30c96 ("net: hns3: fix port base vlan add fail when concurrent with reset")
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sun, 12 Jun 2022 23:11:37 +0000 (16:11 -0700)]
Linux 5.19-rc2
Linus Torvalds [Sun, 12 Jun 2022 18:33:42 +0000 (11:33 -0700)]
Merge tag 'platform-drivers-x86-v5.19-2' of git://git./linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Hans de Goede:
"Highlights:
- Fix hp-wmi regression on HP Omen laptops introduced in 5.18
- Several hardware-id additions
- A couple of other tiny fixes"
* tag 'platform-drivers-x86-v5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86/intel: hid: Add Surface Go to VGBS allow list
platform/x86: hp-wmi: Use zero insize parameter only when supported
platform/x86: hp-wmi: Resolve WMI query failures on some devices
platform/x86: gigabyte-wmi: Add support for B450M DS3H-CF
platform/x86: gigabyte-wmi: Add Z690M AORUS ELITE AX DDR4 support
platform/x86: barco-p50-gpio: Add check for platform_driver_register
platform/x86/intel: pmc: Support Intel Raptorlake P
platform/x86/intel: Fix pmt_crashlog array reference
platform/mellanox: Add static in struct declaration.
platform/mellanox: Spelling s/platfom/platform/
Linus Torvalds [Sun, 12 Jun 2022 18:16:00 +0000 (11:16 -0700)]
Merge tag 'wq-for-5.19-rc1-fixes' of git://git./linux/kernel/git/tj/wq
Pull workqueue fixes from Tejun Heo:
"Tetsuo's patch to trigger build warnings if system-wide wq's are
flushed along with a TP type update and trivial comment update"
* tag 'wq-for-5.19-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: Switch to new kerneldoc syntax for named variable macro argument
workqueue: Fix type of cpu in trace event
workqueue: Wrap flush_workqueue() using a macro
Linus Torvalds [Sun, 12 Jun 2022 18:10:07 +0000 (11:10 -0700)]
Merge tag 'kbuild-fixes-v5.19' of git://git./linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Make the *.mod build rule portable for POSIX awk
- Fix regression of 'make nsdeps'
- Make scripts/check-local-export working for older bash versions
- Fix scripts/gdb to extract the .config data from vmlinux
* tag 'kbuild-fixes-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
scripts/gdb: change kernel config dumping method
scripts/check-local-export: avoid 'wait $!' for process substitution
scripts/nsdeps: adjust to the format change of *.mod files
kbuild: avoid regex RS for POSIX awk
Linus Torvalds [Sun, 12 Jun 2022 18:05:44 +0000 (11:05 -0700)]
Merge tag '5.19-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs client fixes from Steve French:
"Three reconnect fixes, all for stable as well.
One of these three reconnect fixes does address a problem with
multichannel reconnect, but this does not include the additional
fix (still being tested) for dynamically detecting multichannel
adapter changes which will improve those reconnect scenarios even
more"
* tag '5.19-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: populate empty hostnames for extra channels
cifs: return errors during session setup during reconnects
cifs: fix reconnect on smb3 mount types
Linus Torvalds [Sun, 12 Jun 2022 17:33:38 +0000 (10:33 -0700)]
Merge tag 'random-5.19-rc2-for-linus' of git://git./linux/kernel/git/crng/random
Pull random number generator fixes from Jason Donenfeld:
- A fix for a 5.19 regression for a case in which early device tree
initializes the RNG, which flips a static branch.
On most plaforms, jump labels aren't initialized until much later, so
this caused splats. On a few mailing list threads, we cooked up easy
fixes for arm64, arm32, and risc-v. But then things looked slightly
more involved for xtensa, powerpc, arc, and mips. And at that point,
when we're patching 7 architectures in a place before the console is
even available, it seems like the cost/risk just wasn't worth it.
So random.c works around it now by checking the already exported
`static_key_initialized` boolean, as though somebody already ran into
this issue in the past. I'm not super jazzed about that; it'd be
prettier to not have to complicate downstream code. But I suppose
it's practical.
- A few small code nits and adding a missing __init annotation.
- A change to the default config values to use the cpu and bootloader's
seeds for initializing the RNG earlier.
This brings them into line with what all the distros do (Fedora/RHEL,
Debian, Ubuntu, Gentoo, Arch, NixOS, Alpine, SUSE, and Void... at
least), and moreover will now give us test coverage in various test
beds that might have caught the above device tree bug earlier.
- A change to WireGuard CI's configuration to increase test coverage
around the RNG.
- A documentation comment fix to unrelated maintainerless CRC code that
I was asked to take, I guess because it has to do with polynomials
(which the RNG thankfully no longer uses).
* tag 'random-5.19-rc2-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
wireguard: selftests: use maximum cpu features and allow rng seeding
random: remove rng_has_arch_random()
random: credit cpu and bootloader seeds by default
random: do not use jump labels before they are initialized
random: account for arch randomness in bits
random: mark bootloader randomness code as __init
random: avoid checking crng_ready() twice in random_init()
crc-itu-t: fix typo in CRC ITU-T polynomial comment
Duke Lee [Tue, 7 Jun 2022 21:36:54 +0000 (14:36 -0700)]
platform/x86/intel: hid: Add Surface Go to VGBS allow list
The Surface Go reports Chassis Type 9 (Laptop,) so the device needs to be
added to dmi_vgbs_allow_list to enable tablet mode when an attached Type
Cover is folded back.
BugLink: https://github.com/linux-surface/linux-surface/issues/837
Signed-off-by: Duke Lee <krnhotwings@gmail.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Link: https://lore.kernel.org/r/20220607213654.5567-1-krnhotwings@gmail.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Bedant Patnaik [Wed, 8 Jun 2022 19:28:43 +0000 (00:58 +0530)]
platform/x86: hp-wmi: Use zero insize parameter only when supported
commit
be9d73e64957 ("platform/x86: hp-wmi: Fix 0x05 error code reported by
several WMI calls") and commit
12b19f14a21a ("platform/x86: hp-wmi: Fix
hp_wmi_read_int() reporting error (0x05)") cause ACPI BIOS Error (bug):
Attempt to CreateField of length zero (
20211217/dsopcode-133) because of
the ACPI method HWMC, which unconditionally creates a Field of
size (insize*8) bits:
CreateField (Arg1, 0x80, (Local5 * 0x08), DAIN)
In cases where args->insize = 0, the Field size is 0, resulting in
an error.
Fix this by using zero insize only if 0x5 error code is returned
Tested on Omen 15 AMD (2020) board ID: 8786.
Fixes:
be9d73e64957 ("platform/x86: hp-wmi: Fix 0x05 error code reported by several WMI calls")
Signed-off-by: Bedant Patnaik <bedant.patnaik@gmail.com>
Tested-by: Jorge Lopez <jorge.lopez2@hp.com>
Link: https://lore.kernel.org/r/41be46743d21c78741232a47bbb5f1cdbcc3d21e.camel@gmail.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Jorge Lopez [Wed, 8 Jun 2022 21:29:23 +0000 (16:29 -0500)]
platform/x86: hp-wmi: Resolve WMI query failures on some devices
WMI queries fail on some devices where the ACPI method HWMC
unconditionally attempts to create Fields beyond the buffer
if the buffer is too small, this breaks essential features
such as power profiles:
CreateByteField (Arg1, 0x10, D008)
CreateByteField (Arg1, 0x11, D009)
CreateByteField (Arg1, 0x12, D010)
CreateDWordField (Arg1, 0x10, D032)
CreateField (Arg1, 0x80, 0x0400, D128)
In cases where args->data had zero length, ACPI BIOS Error
(bug): AE_AML_BUFFER_LIMIT, Field [D008] at bit
offset/length 128/8 exceeds size of target Buffer (128 bits)
(
20211217/dsopcode-198) was obtained.
ACPI BIOS Error (bug): AE_AML_BUFFER_LIMIT, Field [D009] at bit
offset/length 136/8 exceeds size of target Buffer (136bits)
(
20211217/dsopcode-198)
The original code created a buffer size of 128 bytes regardless if
the WMI call required a smaller buffer or not. This particular
behavior occurs in older BIOS and reproduced in OMEN laptops. Newer
BIOS handles buffer sizes properly and meets the latest specification
requirements. This is the reason why testing with a dynamically
allocated buffer did not uncover any failures with the test systems at
hand.
This patch was tested on several OMEN, Elite, and Zbooks. It was
confirmed the patch resolves HPWMI_FAN GET/SET calls in an OMEN
Laptop 15-ek0xxx. No problems were reported when testing on several Elite
and Zbooks notebooks.
Fixes:
4b4967cbd268 ("platform/x86: hp-wmi: Changing bios_args.data to be dynamically allocated")
Signed-off-by: Jorge Lopez <jorge.lopez2@hp.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Link: https://lore.kernel.org/r/20220608212923.8585-2-jorge.lopez2@hp.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Jonathan Neuschäfer [Thu, 9 Jun 2022 23:41:10 +0000 (01:41 +0200)]
workqueue: Switch to new kerneldoc syntax for named variable macro argument
The syntax without dots is available since commit
43756e347f21
("scripts/kernel-doc: Add support for named variable macro arguments").
The same HTML output is produced with and without this patch.
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Linus Torvalds [Sat, 11 Jun 2022 23:56:41 +0000 (16:56 -0700)]
Merge tag 'gpio-fixes-for-v5.19-rc2' of git://git./linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
"A set of fixes. Most address the new warning we emit at build time
when irq chips are not immutable with some additional tweaks to
gpio-crystalcove from Andy and a small tweak to gpio-dwapd.
- make irq_chip structs immutable in several Diolan and intel drivers
to get rid of the new warning we emit when fiddling with irq chips
- don't print error messages on probe deferral in gpio-dwapb"
* tag 'gpio-fixes-for-v5.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: dwapb: Don't print error on -EPROBE_DEFER
gpio: dln2: make irq_chip immutable
gpio: sch: make irq_chip immutable
gpio: merrifield: make irq_chip immutable
gpio: wcove: make irq_chip immutable
gpio: crystalcove: Join function declarations and long lines
gpio: crystalcove: Use specific type and API for IRQ number
gpio: crystalcove: make irq_chip immutable
Linus Torvalds [Sat, 11 Jun 2022 23:50:39 +0000 (16:50 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Driver fixes and and one core patch.
Nine of the driver patches are minor fixes and reworks to lpfc and the
rest are trivial and minor fixes elsewhere"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: pmcraid: Fix missing resource cleanup in error case
scsi: ipr: Fix missing/incorrect resource cleanup in error case
scsi: mpt3sas: Fix out-of-bounds compiler warning
scsi: lpfc: Update lpfc version to 14.2.0.4
scsi: lpfc: Allow reduced polling rate for nvme_admin_async_event cmd completion
scsi: lpfc: Add more logging of cmd and cqe information for aborted NVMe cmds
scsi: lpfc: Fix port stuck in bypassed state after LIP in PT2PT topology
scsi: lpfc: Resolve NULL ptr dereference after an ELS LOGO is aborted
scsi: lpfc: Address NULL pointer dereference after starget_to_rport()
scsi: lpfc: Resolve some cleanup issues following SLI path refactoring
scsi: lpfc: Resolve some cleanup issues following abort path refactoring
scsi: lpfc: Correct BDE type for XMIT_SEQ64_WQE in lpfc_ct_reject_event()
scsi: vmw_pvscsi: Expand vcpuHint to 16 bits
scsi: sd: Fix interpretation of VPD B9h length
Linus Torvalds [Sat, 11 Jun 2022 23:32:47 +0000 (16:32 -0700)]
Merge tag 'for_linus' of git://git./linux/kernel/git/mst/vhost
Pull virtio fixes from Michael Tsirkin:
"Fixes all over the place, most notably fixes for latent bugs in
drivers that got exposed by suppressing interrupts before DRIVER_OK,
which in turn has been done by
8b4ec69d7e09 ("virtio: harden vring
IRQ")"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
um: virt-pci: set device ready in probe()
vdpa: make get_vq_group and set_group_asid optional
virtio: Fix all occurences of the "the the" typo
vduse: Fix NULL pointer dereference on sysfs access
vringh: Fix loop descriptors check in the indirect cases
vdpa/mlx5: clean up indenting in handle_ctrl_vlan()
vdpa/mlx5: fix error code for deleting vlan
virtio-mmio: fix missing put_device() when vm_cmdline_parent registration failed
vdpa/mlx5: Fix syntax errors in comments
virtio-rng: make device ready before making request
Linus Torvalds [Sat, 11 Jun 2022 19:37:39 +0000 (12:37 -0700)]
Merge tag 'loongarch-fixes-5.19-1' of git://git./linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch fixes from Huacai Chen.
"Fix build errors and a stale comment"
* tag 'loongarch-fixes-5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
LoongArch: Remove MIPS comment about cycle counter
LoongArch: Fix copy_thread() build errors
LoongArch: Fix the !CONFIG_SMP build
Linus Torvalds [Sat, 11 Jun 2022 17:30:20 +0000 (10:30 -0700)]
iov_iter: fix build issue due to possible type mis-match
Commit
6c77676645ad ("iov_iter: Fix iter_xarray_get_pages{,_alloc}()")
introduced a problem on some 32-bit architectures (at least arm, xtensa,
csky,sparc and mips), that have a 'size_t' that is 'unsigned int'.
The reason is that we now do
min(nr * PAGE_SIZE - offset, maxsize);
where 'nr' and 'offset' and both 'unsigned int', and PAGE_SIZE is
'unsigned long'. As a result, the normal C type rules means that the
first argument to 'min()' ends up being 'unsigned long'.
In contrast, 'maxsize' is of type 'size_t'.
Now, 'size_t' and 'unsigned long' are always the same physical type in
the kernel, so you'd think this doesn't matter, and from an actual
arithmetic standpoint it doesn't.
But on 32-bit architectures 'size_t' is commonly 'unsigned int', even if
it could also be 'unsigned long'. In that situation, both are unsigned
32-bit types, but they are not the *same* type.
And as a result 'min()' will complain about the distinct types (ignore
the "pointer types" part of the error message: that's an artifact of the
way we have made 'min()' check types for being the same):
lib/iov_iter.c: In function 'iter_xarray_get_pages':
include/linux/minmax.h:20:35: error: comparison of distinct pointer types lacks a cast [-Werror]
20 | (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1)))
| ^~
lib/iov_iter.c:1464:16: note: in expansion of macro 'min'
1464 | return min(nr * PAGE_SIZE - offset, maxsize);
| ^~~
This was not visible on 64-bit architectures (where we always define
'size_t' to be 'unsigned long').
Force these cases to use 'min_t(size_t, x, y)' to make the type explicit
and avoid the issue.
[ Nit-picky note: technically 'size_t' doesn't have to match 'unsigned
long' arithmetically. We've certainly historically seen environments
with 16-bit address spaces and 32-bit 'unsigned long'.
Similarly, even in 64-bit modern environments, 'size_t' could be its
own type distinct from 'unsigned long', even if it were arithmetically
identical.
So the above type commentary is only really descriptive of the kernel
environment, not some kind of universal truth for the kinds of wild
and crazy situations that are allowed by the C standard ]
Reported-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Link: https://lore.kernel.org/all/YqRyL2sIqQNDfky2@debian/
Cc: Jeff Layton <jlayton@kernel.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jason A. Donenfeld [Fri, 10 Jun 2022 14:32:02 +0000 (16:32 +0200)]
wireguard: selftests: use maximum cpu features and allow rng seeding
By forcing the maximum CPU that QEMU has available, we expose additional
capabilities, such as the RNDR instruction, which increases test
coverage. This then allows the CI to skip the fake seeding step in some
cases. Also enable STRICT_KERNEL_RWX to catch issues related to early
jump labels when the RNG is initialized at boot.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Kuan-Ying Lee [Fri, 10 Jun 2022 07:14:57 +0000 (15:14 +0800)]
scripts/gdb: change kernel config dumping method
MAGIC_START("IKCFG_ST") and MAGIC_END("IKCFG_ED") are moved out
from the kernel_config_data variable.
Thus, we parse kernel_config_data directly instead of considering
offset of MAGIC_START and MAGIC_END.
Fixes:
13610aa908dc ("kernel/configs: use .incbin directive to embed config_data.gz")
Signed-off-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Jakub Kicinski [Sat, 11 Jun 2022 05:18:22 +0000 (22:18 -0700)]
Merge branch 'documentation-add-description-for-a-couple-of-sctp-sysctl-options'
Xin Long says:
====================
Documentation: add description for a couple of sctp sysctl options
These are a couple of sysctl options I recently added, but missed adding
documents for them. Especially for net.sctp.intl_enable, it's hard for
users to setup stream interleaving, as it also needs to call some socket
options.
This patchset is to add documents for them.
====================
Link: https://lore.kernel.org/r/cover.1654787716.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Xin Long [Thu, 9 Jun 2022 15:17:15 +0000 (11:17 -0400)]
Documentation: add description for net.sctp.ecn_enable
Describe it in networking/ip-sysctl.rst like other SCTP options.
Fixes:
2f5268a9249b ("sctp: allow users to set netns ecn flag with sysctl")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Xin Long [Thu, 9 Jun 2022 15:17:14 +0000 (11:17 -0400)]
Documentation: add description for net.sctp.intl_enable
Describe it in networking/ip-sysctl.rst like other SCTP options.
We need to document this especially as when using the feature
of User Message Interleaving, some socket options also needs
to be set.
Fixes:
463118c34a35 ("sctp: support sysctl to allow users to use stream interleave")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Xin Long [Thu, 9 Jun 2022 15:17:13 +0000 (11:17 -0400)]
Documentation: add description for net.sctp.reconf_enable
Describe it in networking/ip-sysctl.rst like other SCTP options.
Fixes:
c0d8bab6ae51 ("sctp: add get and set sockopt for reconf_enable")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Sat, 11 Jun 2022 05:10:30 +0000 (22:10 -0700)]
Merge branch '40GbE' of git://git./linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2022-06-09
Grzegorz prevents addition of TC flower filters to TC0 and fixes queue
iteration for VF ADQ to number of actual queues for i40e.
Aleksandr prevents running of ethtool tests when device is being reset
for i40e.
Michal resolves an issue where iavf does not report its MAC address
properly.
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
iavf: Fix issue with MAC address of VF shown as zero
i40e: Fix call trace in setup_tx_descriptors
i40e: Fix calculating the number of queue pairs
i40e: Fix adding ADQ filter to TC0
====================
Link: https://lore.kernel.org/r/20220609162620.2619258-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vincent Whitchurch [Fri, 10 Jun 2022 15:12:03 +0000 (17:12 +0200)]
um: virt-pci: set device ready in probe()
Call virtio_device_ready() to make this driver work after commit
b4ec69d7e09 ("virtio: harden vring IRQ"), since the driver uses the
virtqueues in the probe function. (The virtio core sets the device
ready when probe returns.)
Fixes:
8b4ec69d7e09 ("virtio: harden vring IRQ")
Fixes:
68f5d3f3b654 ("um: add PCI over virtio emulation driver")
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Message-Id: <
20220610151203.
3492541-1-vincent.whitchurch@axis.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Johannes Berg <johannes@sipsolutions.net>
Linus Torvalds [Sat, 11 Jun 2022 00:28:43 +0000 (17:28 -0700)]
Merge tag 'nfsd-5.19-1' of git://git./linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
"Notable changes:
- There is now a backup maintainer for NFSD
Notable fixes:
- Prevent array overruns in svc_rdma_build_writes()
- Prevent buffer overruns when encoding NFSv3 READDIR results
- Fix a potential UAF in nfsd_file_put()"
* tag 'nfsd-5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
SUNRPC: Remove pointer type casts from xdr_get_next_encode_buffer()
SUNRPC: Clean up xdr_get_next_encode_buffer()
SUNRPC: Clean up xdr_commit_encode()
SUNRPC: Optimize xdr_reserve_space()
SUNRPC: Fix the calculation of xdr->end in xdr_get_next_encode_buffer()
SUNRPC: Trap RDMA segment overflows
NFSD: Fix potential use-after-free in nfsd_file_put()
MAINTAINERS: reciprocal co-maintainership for file locking and nfsd
Shyam Prasad N [Mon, 6 Jun 2022 09:52:46 +0000 (09:52 +0000)]
cifs: populate empty hostnames for extra channels
Currently, the secondary channels of a multichannel session
also get hostname populated based on the info in primary channel.
However, this will end up with a wrong resolution of hostname to
IP address during reconnect.
This change fixes this by not populating hostname info for all
secondary channels.
Fixes:
5112d80c162f ("cifs: populate server_hostname for extra channels")
Cc: stable@vger.kernel.org
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Linus Torvalds [Fri, 10 Jun 2022 23:32:49 +0000 (16:32 -0700)]
Merge tag 'for-5.19/dm-fixes-2' of git://git./linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- Fix DM core's bioset initialization so that blk integrity pool is
properly setup. Remove now unused bioset_init_from_src.
- Fix DM zoned hang from locking imbalance due to needless check in
clone_endio().
* tag 'for-5.19/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm: fix zoned locking imbalance due to needless check in clone_endio
block: remove bioset_init_from_src
dm: fix bio_set allocation
Linus Torvalds [Fri, 10 Jun 2022 23:15:19 +0000 (16:15 -0700)]
Merge branch 'fscache-fixes' of git://git./linux/kernel/git/dhowells/linux-fs
Pull fscache cleanups from David Howells:
- fix checker complaint in afs
- two netfs cleanups:
- netfs_inode calling convention cleanup plus the requisite
documentation changes
- replace the ->cleanup op with a ->free_request op.
This is possible as the I/O request is now always available at
the cleanup point as the stuff to be cleaned up is no longer
passed into the API functions, but rather obtained by ->init_request.
* 'fscache-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
netfs: Rename the netfs_io_request cleanup op and give it an op pointer
netfs: Further cleanups after struct netfs_inode wrapper introduced
afs: Fix some checker issues
Linus Torvalds [Fri, 10 Jun 2022 22:53:09 +0000 (15:53 -0700)]
Merge tag 'pull-fixes' of git://git./linux/kernel/git/viro/vfs
Pull iov_iter fix from Al Viro:
"ITER_XARRAY get_pages fix; now the return value is a lot saner (and
more similar to logics for other flavours)"
* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
iov_iter: Fix iter_xarray_get_pages{,_alloc}()
August Wikerfors [Wed, 8 Jun 2022 21:20:28 +0000 (23:20 +0200)]
platform/x86: gigabyte-wmi: Add support for B450M DS3H-CF
Tested and works on my system.
Signed-off-by: August Wikerfors <git@augustwikerfors.se>
Link: https://lore.kernel.org/r/20220608212028.28307-1-git@augustwikerfors.se
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Piotr Chmura [Mon, 6 Jun 2022 17:15:13 +0000 (19:15 +0200)]
platform/x86: gigabyte-wmi: Add Z690M AORUS ELITE AX DDR4 support
Add dmi_system_id of Gigabyte Z690M AORUS ELITE AX DDR4 board.
Tested on my PC.
Signed-off-by: Piotr Chmura <chmooreck@gmail.com>
Link: https://lore.kernel.org/r/bd83567e-ebf5-0b31-074b-5f6dc7f7c147@gmail.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Jiasheng Jiang [Thu, 26 May 2022 09:03:45 +0000 (17:03 +0800)]
platform/x86: barco-p50-gpio: Add check for platform_driver_register
As platform_driver_register() could fail, it should be better
to deal with the return value in order to maintain the code
consisitency.
Fixes:
86af1d02d458 ("platform/x86: Support for EC-connected GPIOs for identify LED/button on Barco P50 board")
Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
Acked-by: Peter Korsgaard <peter.korsgaard@barco.com>
Link: https://lore.kernel.org/r/20220526090345.1444172-1-jiasheng@iscas.ac.cn
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
George D Sworo [Thu, 2 Jun 2022 01:26:17 +0000 (18:26 -0700)]
platform/x86/intel: pmc: Support Intel Raptorlake P
Add Raptorlake P to the list of the platforms that intel_pmc_core driver
supports for pmc_core device. Raptorlake P PCH is based on Alderlake P
PCH.
Signed-off-by: George D Sworo <george.d.sworo@intel.com>
Reviewed-by: David E. Box <david.e.box@linux.intel.com>
Link: https://lore.kernel.org/r/20220602012617.20100-1-george.d.sworo@intel.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>