Emmanuel Gil Peyrot [Tue, 8 Mar 2022 19:18:20 +0000 (20:18 +0100)]
ARM: fix build error when BPF_SYSCALL is disabled
It was missing a semicolon.
Signed-off-by: Emmanuel Gil Peyrot <linkmauve@linkmauve.fr>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Fixes:
25875aa71dfe ("ARM: include unprivileged BPF status in Spectre V2 reporting").
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Tue, 8 Mar 2022 19:52:45 +0000 (11:52 -0800)]
Merge tag 'devicetree-fixes-for-5.17-3' of git://git./linux/kernel/git/robh/linux
Pull devicetree fixes from Rob Herring:
- Fix pinctrl node name warnings in examples
- Add missing 'mux-states' property in ti,tcan104x-can binding
* tag 'devicetree-fixes-for-5.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
dt-bindings: phy: ti,tcan104x-can: Document mux-states property
dt-bindings: mfd: Fix pinctrl node name warnings
Linus Torvalds [Tue, 8 Mar 2022 17:41:18 +0000 (09:41 -0800)]
Merge tag 'fuse-fixes-5.17-rc8' of git://git./linux/kernel/git/mszeredi/fuse
Pull fuse fixes from Miklos Szeredi:
- Fix an issue with splice on the fuse device
- Fix a regression in the fileattr API conversion
- Add a small userspace API improvement
* tag 'fuse-fixes-5.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: fix pipe buffer lifetime for direct_io
fuse: move FUSE_SUPER_MAGIC definition to magic.h
fuse: fix fileattr op failure
Linus Torvalds [Tue, 8 Mar 2022 17:27:25 +0000 (09:27 -0800)]
Merge tag 'arm64-spectre-bhb-for-v5.17-2' of git://git./linux/kernel/git/arm64/linux
Pull arm64 spectre fixes from James Morse:
"ARM64 Spectre-BHB mitigations:
- Make EL1 vectors per-cpu
- Add mitigation sequences to the EL1 and EL2 vectors on vulnerble
CPUs
- Implement ARCH_WORKAROUND_3 for KVM guests
- Report Vulnerable when unprivileged eBPF is enabled"
* tag 'arm64-spectre-bhb-for-v5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: proton-pack: Include unprivileged eBPF status in Spectre v2 mitigation reporting
arm64: Use the clearbhb instruction in mitigations
KVM: arm64: Allow SMCCC_ARCH_WORKAROUND_3 to be discovered and migrated
arm64: Mitigate spectre style branch history side channels
arm64: proton-pack: Report Spectre-BHB vulnerabilities as part of Spectre-v2
arm64: Add percpu vectors for EL1
arm64: entry: Add macro for reading symbol addresses from the trampoline
arm64: entry: Add vectors that have the bhb mitigation sequences
arm64: entry: Add non-kpti __bp_harden_el1_vectors for mitigations
arm64: entry: Allow the trampoline text to occupy multiple pages
arm64: entry: Make the kpti trampoline's kpti sequence optional
arm64: entry: Move trampoline macros out of ifdef'd section
arm64: entry: Don't assume tramp_vectors is the start of the vectors
arm64: entry: Allow tramp_alias to access symbols after the 4K boundary
arm64: entry: Move the trampoline data page before the text page
arm64: entry: Free up another register on kpti's tramp_exit path
arm64: entry: Make the trampoline cleanup optional
KVM: arm64: Allow indirect vectors to be used without SPECTRE_V3A
arm64: spectre: Rename spectre_v4_patch_fw_mitigation_conduit
arm64: entry.S: Add ventry overflow sanity checks
Linus Torvalds [Tue, 8 Mar 2022 17:08:06 +0000 (09:08 -0800)]
Merge tag 'for-linus-bhb' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM spectre fixes from Russell King:
"ARM Spectre BHB mitigations.
These patches add Spectre BHB migitations for the following Arm CPUs
to the 32-bit ARM kernels:
- Cortex A15
- Cortex A57
- Cortex A72
- Cortex A73
- Cortex A75
- Brahma B15
for CVE-2022-23960"
* tag 'for-linus-bhb' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: include unprivileged BPF status in Spectre V2 reporting
ARM: Spectre-BHB workaround
ARM: use LOADADDR() to get load address of sections
ARM: early traps initialisation
ARM: report Spectre v2 status through sysfs
Aswath Govindraju [Thu, 16 Dec 2021 04:10:11 +0000 (09:40 +0530)]
dt-bindings: phy: ti,tcan104x-can: Document mux-states property
On some boards, for routing CAN signals from controller to transceivers,
muxes might need to be set. This can be implemented using mux-states
property. Therefore, document the same in the respective bindings.
Signed-off-by: Aswath Govindraju <a-govindraju@ti.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20211216041012.16892-2-a-govindraju@ti.com
Rob Herring [Thu, 3 Mar 2022 23:23:49 +0000 (17:23 -0600)]
dt-bindings: mfd: Fix pinctrl node name warnings
The recent addition pinctrl.yaml in commit
c09acbc499e8 ("dt-bindings:
pinctrl: use pinctrl.yaml") resulted in some node name warnings:
Documentation/devicetree/bindings/mfd/cirrus,lochnagar.example.dt.yaml: \
lochnagar-pinctrl: $nodename:0: 'lochnagar-pinctrl' does not match '^(pinctrl|pinmux)(@[0-9a-f]+)?$'
Documentation/devicetree/bindings/mfd/cirrus,madera.example.dt.yaml: \
codec@1a: $nodename:0: 'codec@1a' does not match '^(pinctrl|pinmux)(@[0-9a-f]+)?$'
Documentation/devicetree/bindings/mfd/brcm,cru.example.dt.yaml: \
pin-controller@1c0: $nodename:0: 'pin-controller@1c0' does not match '^(pinctrl|pinmux)(@[0-9a-f]+)?$'
Fix the node names to the preferred 'pinctrl'. For cirrus,madera,
nothing from pinctrl.yaml schema is used, so just drop the reference.
Fixes:
c09acbc499e8 ("dt-bindings: pinctrl: use pinctrl.yaml")
Cc: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Acked-by: Lee Jones <lee.jones@linaro.org>
Link: https://lore.kernel.org/r/20220303232350.2591143-1-robh@kernel.org
Russell King (Oracle) [Mon, 7 Mar 2022 19:28:32 +0000 (19:28 +0000)]
ARM: include unprivileged BPF status in Spectre V2 reporting
The mitigations for Spectre-BHB are only applied when an exception
is taken, but when unprivileged BPF is enabled, userspace can
load BPF programs that can be used to exploit the problem.
When unprivileged BPF is enabled, report the vulnerable status via
the spectre_v2 sysfs file.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Linus Torvalds [Tue, 8 Mar 2022 01:29:47 +0000 (17:29 -0800)]
Merge tag 'x86_bugs_for_v5.17' of git://git./linux/kernel/git/tip/tip
Pull x86 spectre fixes from Borislav Petkov:
- Mitigate Spectre v2-type Branch History Buffer attacks on machines
which support eIBRS, i.e., the hardware-assisted speculation
restriction after it has been shown that such machines are vulnerable
even with the hardware mitigation.
- Do not use the default LFENCE-based Spectre v2 mitigation on AMD as
it is insufficient to mitigate such attacks. Instead, switch to
retpolines on all AMD by default.
- Update the docs and add some warnings for the obviously vulnerable
cmdline configurations.
* tag 'x86_bugs_for_v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/speculation: Warn about eIBRS + LFENCE + Unprivileged eBPF + SMT
x86/speculation: Warn about Spectre v2 LFENCE mitigation
x86/speculation: Update link to AMD speculation whitepaper
x86/speculation: Use generic retpoline by default on AMD
x86/speculation: Include unprivileged eBPF status in Spectre v2 mitigation reporting
Documentation/hw-vuln: Update spectre doc
x86/speculation: Add eIBRS + Retpoline options
x86/speculation: Rename RETPOLINE_AMD to RETPOLINE_LFENCE
Linus Torvalds [Mon, 7 Mar 2022 19:43:22 +0000 (11:43 -0800)]
Merge tag 'mtd/fixes-for-5.17-rc8' of git://git./linux/kernel/git/mtd/linux
Pull MTD fix from Miquel Raynal:
"As part of a previous changeset introducing support for the K3
architecture, the OMAP_GPMC (a non visible symbol) got selected by the
selection of MTD_NAND_OMAP2 instead of doing so from the architecture
directly (like for the other users of these two drivers). Indeed, from
a hardware perspective, the OMAP NAND controller needs the GPMC to
work.
This led to a robot error which got addressed in fix merge into -rc4.
Unfortunately, the approach at this time still used "select" and lead
to further build error reports (sparc64:allmodconfig).
This time we switch to 'depends on' in order to prevent random
misconfigurations. The different dependencies will however need a
future cleanup"
* tag 'mtd/fixes-for-5.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
mtd: rawnand: omap2: Actually prevent invalid configuration and build error
Linus Torvalds [Mon, 7 Mar 2022 19:32:17 +0000 (11:32 -0800)]
Merge tag 'for_linus' of git://git./linux/kernel/git/mst/vhost
Pull virtio fixes from Michael Tsirkin:
"Some last minute fixes that took a while to get ready. Not
regressions, but they look safe and seem to be worth to have"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
tools/virtio: handle fallout from folio work
tools/virtio: fix virtio_test execution
vhost: remove avail_event arg from vhost_update_avail_event()
virtio: drop default for virtio-mem
vdpa: fix use-after-free on vp_vdpa_remove
virtio-blk: Remove BUG_ON() in virtio_queue_rq()
virtio-blk: Don't use MAX_DISCARD_SEGMENTS if max_discard_seg is zero
vhost: fix hung thread due to erroneous iotlb entries
vduse: Fix returning wrong type in vduse_domain_alloc_iova()
vdpa/mlx5: add validation for VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET command
vdpa/mlx5: should verify CTRL_VQ feature exists for MQ
vdpa: factor out vdpa_set_features_unlocked for vdpa internal use
virtio_console: break out of buf poll on remove
virtio: document virtio_reset_device
virtio: acknowledge all features before access
virtio: unexport virtio_finalize_features
Halil Pasic [Sat, 5 Mar 2022 17:07:14 +0000 (18:07 +0100)]
swiotlb: rework "fix info leak with DMA_FROM_DEVICE"
Unfortunately, we ended up merging an old version of the patch "fix info
leak with DMA_FROM_DEVICE" instead of merging the latest one. Christoph
(the swiotlb maintainer), he asked me to create an incremental fix
(after I have pointed this out the mix up, and asked him for guidance).
So here we go.
The main differences between what we got and what was agreed are:
* swiotlb_sync_single_for_device is also required to do an extra bounce
* We decided not to introduce DMA_ATTR_OVERWRITE until we have exploiters
* The implantation of DMA_ATTR_OVERWRITE is flawed: DMA_ATTR_OVERWRITE
must take precedence over DMA_ATTR_SKIP_CPU_SYNC
Thus this patch removes DMA_ATTR_OVERWRITE, and makes
swiotlb_sync_single_for_device() bounce unconditionally (that is, also
when dir == DMA_TO_DEVICE) in order do avoid synchronising back stale
data from the swiotlb buffer.
Let me note, that if the size used with dma_sync_* API is less than the
size used with dma_[un]map_*, under certain circumstances we may still
end up with swiotlb not being transparent. In that sense, this is no
perfect fix either.
To get this bullet proof, we would have to bounce the entire
mapping/bounce buffer. For that we would have to figure out the starting
address, and the size of the mapping in
swiotlb_sync_single_for_device(). While this does seem possible, there
seems to be no firm consensus on how things are supposed to work.
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Fixes:
ddbd89deb7d3 ("swiotlb: fix info leak with DMA_FROM_DEVICE")
Cc: stable@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
James Morse [Thu, 3 Mar 2022 16:53:56 +0000 (16:53 +0000)]
arm64: proton-pack: Include unprivileged eBPF status in Spectre v2 mitigation reporting
The mitigations for Spectre-BHB are only applied when an exception is
taken from user-space. The mitigation status is reported via the spectre_v2
sysfs vulnerabilities file.
When unprivileged eBPF is enabled the mitigation in the exception vectors
can be avoided by an eBPF program.
When unprivileged eBPF is enabled, print a warning and report vulnerable
via the sysfs vulnerabilities file.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Roger Quadros [Sat, 19 Feb 2022 19:36:00 +0000 (21:36 +0200)]
mtd: rawnand: omap2: Actually prevent invalid configuration and build error
The root of the problem is that we are selecting symbols that have
dependencies. This can cause random configurations that can fail.
The cleanest solution is to avoid using select.
This driver uses interfaces from the OMAP_GPMC driver so we have to
depend on it instead.
Fixes:
4cd335dae3cf ("mtd: rawnand: omap2: Prevent invalid configuration and build error")
Signed-off-by: Roger Quadros <rogerq@kernel.org>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/linux-mtd/20220219193600.24892-1-rogerq@kernel.org
Miklos Szeredi [Mon, 7 Mar 2022 15:30:44 +0000 (16:30 +0100)]
fuse: fix pipe buffer lifetime for direct_io
In FOPEN_DIRECT_IO mode, fuse_file_write_iter() calls
fuse_direct_write_iter(), which normally calls fuse_direct_io(), which then
imports the write buffer with fuse_get_user_pages(), which uses
iov_iter_get_pages() to grab references to userspace pages instead of
actually copying memory.
On the filesystem device side, these pages can then either be read to
userspace (via fuse_dev_read()), or splice()d over into a pipe using
fuse_dev_splice_read() as pipe buffers with &nosteal_pipe_buf_ops.
This is wrong because after fuse_dev_do_read() unlocks the FUSE request,
the userspace filesystem can mark the request as completed, causing write()
to return. At that point, the userspace filesystem should no longer have
access to the pipe buffer.
Fix by copying pages coming from the user address space to new pipe
buffers.
Reported-by: Jann Horn <jannh@google.com>
Fixes:
c3021629a0d8 ("fuse: support splice() reading from fuse device")
Cc: <stable@vger.kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Linus Torvalds [Sun, 6 Mar 2022 22:28:31 +0000 (14:28 -0800)]
Linux 5.17-rc7
Linus Torvalds [Sun, 6 Mar 2022 20:19:36 +0000 (12:19 -0800)]
Merge tag 'for-5.17-rc6-tag' of git://git./linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"A few more fixes for various problems that have user visible effects
or seem to be urgent:
- fix corruption when combining DIO and non-blocking io_uring over
multiple extents (seen on MariaDB)
- fix relocation crash due to premature return from commit
- fix quota deadlock between rescan and qgroup removal
- fix item data bounds checks in tree-checker (found on a fuzzed
image)
- fix fsync of prealloc extents after EOF
- add missing run of delayed items after unlink during log replay
- don't start relocation until snapshot drop is finished
- fix reversed condition for subpage writers locking
- fix warning on page error"
* tag 'for-5.17-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fallback to blocking mode when doing async dio over multiple extents
btrfs: add missing run of delayed items after unlink during log replay
btrfs: qgroup: fix deadlock between rescan worker and remove qgroup
btrfs: fix relocation crash due to premature return from btrfs_commit_transaction()
btrfs: do not start relocation until in progress drops are done
btrfs: tree-checker: use u64 for item data end to avoid overflow
btrfs: do not WARN_ON() if we have PageError set
btrfs: fix lost prealloc extents beyond eof after full fsync
btrfs: subpage: fix a wrong check on subpage->writers
Linus Torvalds [Sun, 6 Mar 2022 20:08:42 +0000 (12:08 -0800)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"x86 guest:
- Tweaks to the paravirtualization code, to avoid using them when
they're pointless or harmful
x86 host:
- Fix for SRCU lockdep splat
- Brown paper bag fix for the propagation of errno"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: pull kvm->srcu read-side to kvm_arch_vcpu_ioctl_run
KVM: x86/mmu: Passing up the error state of mmu_alloc_shadow_roots()
KVM: x86: Yield to IPI target vCPU only if it is busy
x86/kvmclock: Fix Hyper-V Isolated VM's boot issue when vCPUs > 64
x86/kvm: Don't waste memory if kvmclock is disabled
x86/kvm: Don't use PV TLB/yield when mwait is advertised
Linus Torvalds [Sun, 6 Mar 2022 19:57:42 +0000 (11:57 -0800)]
Merge tag 'powerpc-5.17-5' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fix from Michael Ellerman:
"Fix build failure when CONFIG_PPC_64S_HASH_MMU is not set.
Thanks to Murilo Opsfelder Araujo, and Erhard F"
* tag 'powerpc-5.17-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64s: Fix build failure when CONFIG_PPC_64S_HASH_MMU is not set
Linus Torvalds [Sun, 6 Mar 2022 19:47:59 +0000 (11:47 -0800)]
Merge tag 'trace-v5.17-rc5' of git://git./linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
- Fix sorting on old "cpu" value in histograms
- Fix return value of __setup() boot parameter handlers
* tag 'trace-v5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Fix return value of __setup handlers
tracing/histogram: Fix sorting on old "cpu" value
Michael S. Tsirkin [Fri, 4 Mar 2022 17:10:38 +0000 (12:10 -0500)]
tools/virtio: handle fallout from folio work
just add a stub
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Stefano Garzarella [Tue, 18 Jan 2022 15:06:31 +0000 (16:06 +0100)]
tools/virtio: fix virtio_test execution
virtio_test hangs on __vring_new_virtqueue() because `vqs_list_lock`
is not initialized.
Let's initialize it in vdev_info_init().
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20220118150631.167015-1-sgarzare@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Stefano Garzarella [Thu, 13 Jan 2022 14:11:34 +0000 (15:11 +0100)]
vhost: remove avail_event arg from vhost_update_avail_event()
In vhost_update_avail_event() we never used the `avail_event` argument,
since its introduction in commit
2723feaa8ec6 ("vhost: set log when
updating used flags or avail event").
Let's remove it to clean up the code.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20220113141134.186773-1-sgarzare@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Michael S. Tsirkin [Fri, 25 Feb 2022 11:46:34 +0000 (06:46 -0500)]
virtio: drop default for virtio-mem
There's no special reason why virtio-mem needs a default that's
different from what kconfig provides, any more than e.g. virtio blk.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Zhang Min [Tue, 1 Mar 2022 09:10:59 +0000 (17:10 +0800)]
vdpa: fix use-after-free on vp_vdpa_remove
When vp_vdpa driver is unbind, vp_vdpa is freed in vdpa_unregister_device
and then vp_vdpa->mdev.pci_dev is dereferenced in vp_modern_remove,
triggering use-after-free.
Call Trace of unbinding driver free vp_vdpa :
do_syscall_64
vfs_write
kernfs_fop_write_iter
device_release_driver_internal
pci_device_remove
vp_vdpa_remove
vdpa_unregister_device
kobject_release
device_release
kfree
Call Trace of dereference vp_vdpa->mdev.pci_dev:
vp_modern_remove
pci_release_selected_regions
pci_release_region
pci_resource_len
pci_resource_end
(dev)->resource[(bar)].end
Signed-off-by: Zhang Min <zhang.min9@zte.com.cn>
Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Link: https://lore.kernel.org/r/20220301091059.46869-1-wang.yi59@zte.com.cn
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Fixes:
64b9f64f80a6 ("vdpa: introduce virtio pci driver")
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Xie Yongji [Fri, 4 Mar 2022 10:00:58 +0000 (18:00 +0800)]
virtio-blk: Remove BUG_ON() in virtio_queue_rq()
Currently we have a BUG_ON() to make sure the number of sg
list does not exceed queue_max_segments() in virtio_queue_rq().
However, the block layer uses queue_max_discard_segments()
instead of queue_max_segments() to limit the sg list for
discard requests. So the BUG_ON() might be triggered if
virtio-blk device reports a larger value for max discard
segment than queue_max_segments(). To fix it, let's simply
remove the BUG_ON() which has become unnecessary after commit
02746e26c39e("virtio-blk: avoid preallocating big SGL for data").
And the unused vblk->sg_elems can also be removed together.
Fixes:
1f23816b8eb8 ("virtio_blk: add discard and write zeroes support")
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Link: https://lore.kernel.org/r/20220304100058.116-2-xieyongji@bytedance.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xie Yongji [Fri, 4 Mar 2022 10:00:57 +0000 (18:00 +0800)]
virtio-blk: Don't use MAX_DISCARD_SEGMENTS if max_discard_seg is zero
Currently the value of max_discard_segment will be set to
MAX_DISCARD_SEGMENTS (256) with no basis in hardware if device
set 0 to max_discard_seg in configuration space. It's incorrect
since the device might not be able to handle such large descriptors.
To fix it, let's follow max_segments restrictions in this case.
Fixes:
1f23816b8eb8 ("virtio_blk: add discard and write zeroes support")
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Link: https://lore.kernel.org/r/20220304100058.116-1-xieyongji@bytedance.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Anirudh Rayabharam [Sat, 5 Mar 2022 09:55:25 +0000 (15:25 +0530)]
vhost: fix hung thread due to erroneous iotlb entries
In vhost_iotlb_add_range_ctx(), range size can overflow to 0 when
start is 0 and last is ULONG_MAX. One instance where it can happen
is when userspace sends an IOTLB message with iova=size=uaddr=0
(vhost_process_iotlb_msg). So, an entry with size = 0, start = 0,
last = ULONG_MAX ends up in the iotlb. Next time a packet is sent,
iotlb_access_ok() loops indefinitely due to that erroneous entry.
Call Trace:
<TASK>
iotlb_access_ok+0x21b/0x3e0 drivers/vhost/vhost.c:1340
vq_meta_prefetch+0xbc/0x280 drivers/vhost/vhost.c:1366
vhost_transport_do_send_pkt+0xe0/0xfd0 drivers/vhost/vsock.c:104
vhost_worker+0x23d/0x3d0 drivers/vhost/vhost.c:372
kthread+0x2e9/0x3a0 kernel/kthread.c:377
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
</TASK>
Reported by syzbot at:
https://syzkaller.appspot.com/bug?extid=
0abd373e2e50d704db87
To fix this, do two things:
1. Return -EINVAL in vhost_chr_write_iter() when userspace asks to map
a range with size 0.
2. Fix vhost_iotlb_add_range_ctx() to handle the range [0, ULONG_MAX]
by splitting it into two entries.
Fixes:
0bbe30668d89e ("vhost: factor out IOTLB")
Reported-by: syzbot+0abd373e2e50d704db87@syzkaller.appspotmail.com
Tested-by: syzbot+0abd373e2e50d704db87@syzkaller.appspotmail.com
Signed-off-by: Anirudh Rayabharam <mail@anirudhrb.com>
Link: https://lore.kernel.org/r/20220305095525.5145-1-mail@anirudhrb.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Linus Torvalds [Sat, 5 Mar 2022 23:49:45 +0000 (15:49 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input
Pull input updates from Dmitry Torokhov:
- a fixup for Goodix touchscreen driver allowing it to work on certain
Cherry Trail devices
- a fix for imbalanced enable/disable regulator in Elam touchpad driver
that became apparent when used with Asus TF103C 2-in-1 dock
- a couple new input keycodes used on newer keyboards
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
HID: add mapping for KEY_ALL_APPLICATIONS
HID: add mapping for KEY_DICTATE
Input: elan_i2c - fix regulator enable count imbalance after suspend/resume
Input: elan_i2c - move regulator_[en|dis]able() out of elan_[en|dis]able_power()
Input: goodix - workaround Cherry Trail devices with a bogus ACPI Interrupt() resource
Input: goodix - use the new soc_intel_is_byt() helper
Input: samsung-keypad - properly state IOMEM dependency
Linus Torvalds [Sat, 5 Mar 2022 20:03:14 +0000 (12:03 -0800)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
"8 patches.
Subsystems affected by this patch series: mm (hugetlb, pagemap, and
userfaultfd), memfd, selftests, and kconfig"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
configs/debug: set CONFIG_DEBUG_INFO=y properly
proc: fix documentation and description of pagemap
kselftest/vm: fix tests build with old libc
memfd: fix F_SEAL_WRITE after shmem huge page allocated
mm: fix use-after-free when anon vma name is used after vma is freed
mm: prevent vm_area_struct::anon_name refcount saturation
mm: refactor vm_area_struct::anon_vma_name usage code
selftests/vm: cleanup hugetlb file after mremap test
Linus Torvalds [Sat, 5 Mar 2022 19:25:26 +0000 (11:25 -0800)]
Merge tag 's390-5.17-5' of git://git./linux/kernel/git/s390/linux
Pull s390 fixes from Vasily Gorbik:
- Fix HAVE_DYNAMIC_FTRACE_WITH_ARGS implementation by providing correct
switching between ftrace_caller/ftrace_regs_caller and supplying
pt_regs only when ftrace_regs_caller is activated.
- Fix exception table sorting.
- Fix breakage of kdump tooling by preserving metadata it cannot
function without.
* tag 's390-5.17-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/extable: fix exception table sorting
s390/ftrace: fix arch_ftrace_get_regs implementation
s390/ftrace: fix ftrace_caller/ftrace_regs_caller generation
s390/setup: preserve memory at OLDMEM_BASE and OLDMEM_SIZE
Qian Cai [Sat, 5 Mar 2022 04:29:10 +0000 (20:29 -0800)]
configs/debug: set CONFIG_DEBUG_INFO=y properly
CONFIG_DEBUG_INFO can't be set by user directly, so set
CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y instead.
Otherwise, we end up with no debuginfo in vmlinux which is a big no-no
for kernel debugging.
Link: https://lkml.kernel.org/r/20220301202920.18488-1-quic_qiancai@quicinc.com
Signed-off-by: Qian Cai <quic_qiancai@quicinc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yun Zhou [Sat, 5 Mar 2022 04:29:07 +0000 (20:29 -0800)]
proc: fix documentation and description of pagemap
Since bit 57 was exported for uffd-wp write-protected (commit
fb8e37f35a2f: "mm/pagemap: export uffd-wp protection information"),
fixing it can reduce some unnecessary confusion.
Link: https://lkml.kernel.org/r/20220301044538.3042713-1-yun.zhou@windriver.com
Fixes:
fb8e37f35a2fe1 ("mm/pagemap: export uffd-wp protection information")
Signed-off-by: Yun Zhou <yun.zhou@windriver.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Tiberiu A Georgescu <tiberiu.georgescu@nutanix.com>
Cc: Florian Schmidt <florian.schmidt@nutanix.com>
Cc: Ivan Teterevkov <ivan.teterevkov@nutanix.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Colin Cross <ccross@google.com>
Cc: Alistair Popple <apopple@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Chengming Zhou [Sat, 5 Mar 2022 04:29:04 +0000 (20:29 -0800)]
kselftest/vm: fix tests build with old libc
The error message when I build vm tests on debian10 (GLIBC 2.28):
userfaultfd.c: In function `userfaultfd_pagemap_test':
userfaultfd.c:1393:37: error: `MADV_PAGEOUT' undeclared (first use
in this function); did you mean `MADV_RANDOM'?
if (madvise(area_dst, test_pgsize, MADV_PAGEOUT))
^~~~~~~~~~~~
MADV_RANDOM
This patch includes these newer definitions from UAPI linux/mman.h, is
useful to fix tests build on systems without these definitions in glibc
sys/mman.h.
Link: https://lkml.kernel.org/r/20220227055330.43087-2-zhouchengming@bytedance.com
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Sat, 5 Mar 2022 04:29:01 +0000 (20:29 -0800)]
memfd: fix F_SEAL_WRITE after shmem huge page allocated
Wangyong reports: after enabling tmpfs filesystem to support transparent
hugepage with the following command:
echo always > /sys/kernel/mm/transparent_hugepage/shmem_enabled
the docker program tries to add F_SEAL_WRITE through the following
command, but it fails unexpectedly with errno EBUSY:
fcntl(5, F_ADD_SEALS, F_SEAL_WRITE) = -1.
That is because memfd_tag_pins() and memfd_wait_for_pins() were never
updated for shmem huge pages: checking page_mapcount() against
page_count() is hopeless on THP subpages - they need to check
total_mapcount() against page_count() on THP heads only.
Make memfd_tag_pins() (compared > 1) as strict as memfd_wait_for_pins()
(compared != 1): either can be justified, but given the non-atomic
total_mapcount() calculation, it is better now to be strict. Bear in
mind that total_mapcount() itself scans all of the THP subpages, when
choosing to take an XA_CHECK_SCHED latency break.
Also fix the unlikely xa_is_value() case in memfd_wait_for_pins(): if a
page has been swapped out since memfd_tag_pins(), then its refcount must
have fallen, and so it can safely be untagged.
Link: https://lkml.kernel.org/r/a4f79248-df75-2c8c-3df-ba3317ccb5da@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Zeal Robot <zealci@zte.com.cn>
Reported-by: wangyong <wang.yong12@zte.com.cn>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: CGEL ZTE <cgel.zte@gmail.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yang Yang <yang.yang29@zte.com.cn>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Suren Baghdasaryan [Sat, 5 Mar 2022 04:28:58 +0000 (20:28 -0800)]
mm: fix use-after-free when anon vma name is used after vma is freed
When adjacent vmas are being merged it can result in the vma that was
originally passed to madvise_update_vma being destroyed. In the current
implementation, the name parameter passed to madvise_update_vma points
directly to vma->anon_name and it is used after the call to vma_merge.
In the cases when vma_merge merges the original vma and destroys it,
this might result in UAF. For that the original vma would have to hold
the anon_vma_name with the last reference. The following vma would need
to contain a different anon_vma_name object with the same string. Such
scenario is shown below:
madvise_vma_behavior(vma)
madvise_update_vma(vma, ..., anon_name == vma->anon_name)
vma_merge(vma)
__vma_adjust(vma) <-- merges vma with adjacent one
vm_area_free(vma) <-- frees the original vma
replace_vma_anon_name(anon_name) <-- UAF of vma->anon_name
Fix this by raising the name refcount and stabilizing it.
Link: https://lkml.kernel.org/r/20220224231834.1481408-3-surenb@google.com
Link: https://lkml.kernel.org/r/20220223153613.835563-3-surenb@google.com
Fixes:
9a10064f5625 ("mm: add a field to store names for private anonymous memory")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reported-by: syzbot+aa7b3d4b35f9dc46a366@syzkaller.appspotmail.com
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Alexey Gladkov <legion@kernel.org>
Cc: Chris Hyser <chris.hyser@oracle.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Colin Cross <ccross@google.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Xiaofeng Cao <caoxiaofeng@yulong.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Suren Baghdasaryan [Sat, 5 Mar 2022 04:28:55 +0000 (20:28 -0800)]
mm: prevent vm_area_struct::anon_name refcount saturation
A deep process chain with many vmas could grow really high. With
default sysctl_max_map_count (64k) and default pid_max (32k) the max
number of vmas in the system is
2147450880 and the refcounter has
headroom of
1073774592 before it reaches REFCOUNT_SATURATED
(
3221225472).
Therefore it's unlikely that an anonymous name refcounter will overflow
with these defaults. Currently the max for pid_max is PID_MAX_LIMIT
(
4194304) and for sysctl_max_map_count it's INT_MAX (
2147483647). In
this configuration anon_vma_name refcount overflow becomes theoretically
possible (that still require heavy sharing of that anon_vma_name between
processes).
kref refcounting interface used in anon_vma_name structure will detect a
counter overflow when it reaches REFCOUNT_SATURATED value but will only
generate a warning and freeze the ref counter. This would lead to the
refcounted object never being freed. A determined attacker could leak
memory like that but it would be rather expensive and inefficient way to
do so.
To ensure anon_vma_name refcount does not overflow, stop anon_vma_name
sharing when the refcount reaches REFCOUNT_MAX (
2147483647), which still
leaves INT_MAX/2 (
1073741823) values before the counter reaches
REFCOUNT_SATURATED. This should provide enough headroom for raising the
refcounts temporarily.
Link: https://lkml.kernel.org/r/20220223153613.835563-2-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Suggested-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Alexey Gladkov <legion@kernel.org>
Cc: Chris Hyser <chris.hyser@oracle.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Colin Cross <ccross@google.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Xiaofeng Cao <caoxiaofeng@yulong.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Suren Baghdasaryan [Sat, 5 Mar 2022 04:28:51 +0000 (20:28 -0800)]
mm: refactor vm_area_struct::anon_vma_name usage code
Avoid mixing strings and their anon_vma_name referenced pointers by
using struct anon_vma_name whenever possible. This simplifies the code
and allows easier sharing of anon_vma_name structures when they
represent the same name.
[surenb@google.com: fix comment]
Link: https://lkml.kernel.org/r/20220223153613.835563-1-surenb@google.com
Link: https://lkml.kernel.org/r/20220224231834.1481408-1-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Suggested-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Colin Cross <ccross@google.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Alexey Gladkov <legion@kernel.org>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Chris Hyser <chris.hyser@oracle.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Xiaofeng Cao <caoxiaofeng@yulong.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Kravetz [Sat, 5 Mar 2022 04:28:48 +0000 (20:28 -0800)]
selftests/vm: cleanup hugetlb file after mremap test
The hugepage-mremap test will create a file in a hugetlb filesystem. In
a default 'run_vmtests' run, the file will contain all the hugetlb
pages. After the test, the file remains and there are no free hugetlb
pages for subsequent tests. This causes those hugetlb tests to fail.
Change hugepage-mremap to take the name of the hugetlb file as an
argument. Unlink the file within the test, and just to be sure remove
the file in the run_vmtests script.
Link: https://lkml.kernel.org/r/20220201033459.156944-1-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Acked-by: Yosry Ahmed <yosryahmed@google.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Russell King (Oracle) [Thu, 10 Feb 2022 16:05:45 +0000 (16:05 +0000)]
ARM: Spectre-BHB workaround
Workaround the Spectre BHB issues for Cortex-A15, Cortex-A57,
Cortex-A72, Cortex-A73 and Cortex-A75. We also include Brahma B15 as
well to be safe, which is affected by Spectre V2 in the same ways as
Cortex-A15.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Russell King (Oracle) [Fri, 11 Feb 2022 19:49:50 +0000 (19:49 +0000)]
ARM: use LOADADDR() to get load address of sections
Use the linker's LOADADDR() macro to get the load address of the
sections, and provide a macro to set the start and end symbols.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Russell King (Oracle) [Fri, 11 Feb 2022 19:46:15 +0000 (19:46 +0000)]
ARM: early traps initialisation
Provide a couple of helpers to copy the vectors and stubs, and also
to flush the copied vectors and stubs.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Russell King (Oracle) [Fri, 11 Feb 2022 16:45:54 +0000 (16:45 +0000)]
ARM: report Spectre v2 status through sysfs
As per other architectures, add support for reporting the Spectre
vulnerability status via sysfs CPU.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Murilo Opsfelder Araujo [Tue, 1 Mar 2022 20:47:43 +0000 (17:47 -0300)]
powerpc/64s: Fix build failure when CONFIG_PPC_64S_HASH_MMU is not set
The following build failure occurs when CONFIG_PPC_64S_HASH_MMU is not
set:
arch/powerpc/kernel/setup_64.c: In function ‘setup_per_cpu_areas’:
arch/powerpc/kernel/setup_64.c:811:21: error: ‘mmu_linear_psize’ undeclared (first use in this function); did you mean ‘mmu_virtual_psize’?
811 | if (mmu_linear_psize == MMU_PAGE_4K)
| ^~~~~~~~~~~~~~~~
| mmu_virtual_psize
arch/powerpc/kernel/setup_64.c:811:21: note: each undeclared identifier is reported only once for each function it appears in
Move the declaration of mmu_linear_psize outside of
CONFIG_PPC_64S_HASH_MMU ifdef.
After the above is fixed, it fails later with the following error:
ld: arch/powerpc/kexec/file_load_64.o: in function `.arch_kexec_kernel_image_probe':
file_load_64.c:(.text+0x1c1c): undefined reference to `.add_htab_mem_range'
Fix that, too, by conditioning add_htab_mem_range() symbol to
CONFIG_PPC_64S_HASH_MMU.
Fixes:
387e220a2e5e ("powerpc/64s: Move hash MMU support code under CONFIG_PPC_64S_HASH_MMU")
Reported-by: Erhard F. <erhard_f@mailbox.org>
Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215567
Link: https://lore.kernel.org/r/20220301204743.45133-1-muriloo@linux.ibm.com
Josh Poimboeuf [Fri, 25 Feb 2022 22:32:28 +0000 (14:32 -0800)]
x86/speculation: Warn about eIBRS + LFENCE + Unprivileged eBPF + SMT
The commit
44a3918c8245 ("x86/speculation: Include unprivileged eBPF status in Spectre v2 mitigation reporting")
added a warning for the "eIBRS + unprivileged eBPF" combination, which
has been shown to be vulnerable against Spectre v2 BHB-based attacks.
However, there's no warning about the "eIBRS + LFENCE retpoline +
unprivileged eBPF" combo. The LFENCE adds more protection by shortening
the speculation window after a mispredicted branch. That makes an attack
significantly more difficult, even with unprivileged eBPF. So at least
for now the logic doesn't warn about that combination.
But if you then add SMT into the mix, the SMT attack angle weakens the
effectiveness of the LFENCE considerably.
So extend the "eIBRS + unprivileged eBPF" warning to also include the
"eIBRS + LFENCE + unprivileged eBPF + SMT" case.
[ bp: Massage commit message. ]
Suggested-by: Alyssa Milburn <alyssa.milburn@linux.intel.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Josh Poimboeuf [Fri, 25 Feb 2022 22:31:49 +0000 (14:31 -0800)]
x86/speculation: Warn about Spectre v2 LFENCE mitigation
With:
f8a66d608a3e ("x86,bugs: Unconditionally allow spectre_v2=retpoline,amd")
it became possible to enable the LFENCE "retpoline" on Intel. However,
Intel doesn't recommend it, as it has some weaknesses compared to
retpoline.
Now AMD doesn't recommend it either.
It can still be left available as a cmdline option. It's faster than
retpoline but is weaker in certain scenarios -- particularly SMT, but
even non-SMT may be vulnerable in some cases.
So just unconditionally warn if the user requests it on the cmdline.
[ bp: Massage commit message. ]
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Linus Torvalds [Sat, 5 Mar 2022 00:03:46 +0000 (16:03 -0800)]
Merge tag 'block-5.17-2022-03-04' of git://git.kernel.dk/linux-block
Pull block fix from Jens Axboe:
"Just a small UAF fix for blktrace"
* tag 'block-5.17-2022-03-04' of git://git.kernel.dk/linux-block:
blktrace: fix use after free for struct blk_trace
Linus Torvalds [Fri, 4 Mar 2022 19:54:06 +0000 (11:54 -0800)]
Merge tag 'riscv-for-linus-5.17-rc7' of git://git./linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- Fixes for a handful of KASAN-related crashes.
- A fix to avoid a crash during boot for SPARSEMEM &&
!SPARSEMEM_VMEMMAP configurations.
- A fix to stop reporting some incorrect errors under DEBUG_VIRTUAL.
- A fix for the K210's device tree to properly populate the interrupt
map, so hart1 will get interrupts again.
* tag 'riscv-for-linus-5.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: dts: k210: fix broken IRQs on hart1
riscv: Fix kasan pud population
riscv: Move high_memory initialization to setup_bootmem
riscv: Fix config KASAN && DEBUG_VIRTUAL
riscv: Fix DEBUG_VIRTUAL false warnings
riscv: Fix config KASAN && SPARSEMEM && !SPARSE_VMEMMAP
riscv: Fix is_linear_mapping with recent move of KASAN region
Linus Torvalds [Fri, 4 Mar 2022 19:30:57 +0000 (11:30 -0800)]
Merge tag 'iommu-fixes-v5.17-rc6' of git://git./linux/kernel/git/joro/iommu
Pull iommu fixes from Joerg Roedel:
- Fix a double list_add() in Intel VT-d code
- Add missing put_device() in Tegra SMMU driver
- Two AMD IOMMU fixes:
- Memory leak in IO page-table freeing code
- Add missing recovery from event-log overflow
* tag 'iommu-fixes-v5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/tegra-smmu: Fix missing put_device() call in tegra_smmu_find
iommu/vt-d: Fix double list_add when enabling VMD in scalable mode
iommu/amd: Fix I/O page table memory leak
iommu/amd: Recover from event log overflow
Linus Torvalds [Fri, 4 Mar 2022 19:19:14 +0000 (11:19 -0800)]
Merge tag 'thermal-5.17-rc7' of git://git./linux/kernel/git/rafael/linux-pm
Pull thermal control fix from Rafael Wysocki:
"Fix NULL pointer dereference in the thermal netlink interface (Nicolas
Cavallari)"
* tag 'thermal-5.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal: core: Fix TZ_GET_TRIP NULL pointer dereference
Linus Torvalds [Fri, 4 Mar 2022 19:15:00 +0000 (11:15 -0800)]
Merge tag 'sound-5.17-rc7' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Hopefully the last PR for 5.17, including just a few small changes:
an additional fix for ASoC ops boundary check and other minor
device-specific fixes"
* tag 'sound-5.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: intel_hdmi: Fix reference to PCM buffer address
ASoC: cs4265: Fix the duplicated control name
ASoC: ops: Shift tested values in snd_soc_put_volsw() by +min
Linus Torvalds [Fri, 4 Mar 2022 19:01:22 +0000 (11:01 -0800)]
Merge tag 'drm-fixes-2022-03-04' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Things are quieting down as expected, just a small set of fixes, i915,
exynos, amdgpu, vrr, bridge and hdlcd. Nothing scary at all.
i915:
- Fix GuC SLPC unset command
- Fix misidentification of some Apple MacBook Pro laptops as Jasper Lake
amdgpu:
- Suspend regression fix
exynos:
- irq handling fixes
- Fix two regressions to TE-gpio handling
arm/hdlcd:
- Select DRM_GEM_CMEA_HELPER for HDLCD
bridge:
- ti-sn65dsi86: Properly undo autosuspend
vrr:
- Fix potential NULL-pointer deref"
* tag 'drm-fixes-2022-03-04' of git://anongit.freedesktop.org/drm/drm:
drm/amdgpu: fix suspend/resume hang regression
drm/vrr: Set VRR capable prop only if it is attached to connector
drm/arm: arm hdlcd select DRM_GEM_CMA_HELPER
drm/bridge: ti-sn65dsi86: Properly undo autosuspend
drm/i915: s/JSP2/ICP2/ PCH
drm/i915/guc/slpc: Correct the param count for unset param
drm/exynos: Search for TE-gpio in DSI panel's node
drm/exynos: Don't fail if no TE-gpio is defined for DSI driver
drm/exynos: gsc: Use platform_get_irq() to get the interrupt
drm/exynos/fimc: Use platform_get_irq() to get the interrupt
drm/exynos/exynos_drm_fimd: Use platform_get_irq_byname() to get the interrupt
drm/exynos: mixer: Use platform_get_irq() to get the interrupt
drm/exynos/exynos7_drm_decon: Use platform_get_irq_byname() to get the interrupt
Linus Torvalds [Fri, 4 Mar 2022 18:56:00 +0000 (10:56 -0800)]
Merge tag 'pinctrl-v5.17-3' of git://git./linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
"These two fixes should fix the issues seen on the OrangePi, first we
needed the correct offset when calling pinctrl_gpio_direction(), and
fixing that made a lockdep issue explode in our face. Both now fixed"
* tag 'pinctrl-v5.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: sunxi: Use unique lockdep classes for IRQs
pinctrl-sunxi: sunxi_pinctrl_gpio_direction_in/output: use correct offset
Randy Dunlap [Thu, 3 Mar 2022 03:17:44 +0000 (19:17 -0800)]
tracing: Fix return value of __setup handlers
__setup() handlers should generally return 1 to indicate that the
boot options have been handled.
Using invalid option values causes the entire kernel boot option
string to be reported as Unknown and added to init's environment
strings, polluting it.
Unknown kernel command line parameters "BOOT_IMAGE=/boot/bzImage-517rc6
kprobe_event=p,syscall_any,$arg1 trace_options=quiet
trace_clock=jiffies", will be passed to user space.
Run /sbin/init as init process
with arguments:
/sbin/init
with environment:
HOME=/
TERM=linux
BOOT_IMAGE=/boot/bzImage-517rc6
kprobe_event=p,syscall_any,$arg1
trace_options=quiet
trace_clock=jiffies
Return 1 from the __setup() handlers so that init's environment is not
polluted with kernel boot options.
Link: lore.kernel.org/r/
64644a2f-4a20-bab3-1e15-
3b2cdd0defe3@omprussia.ru
Link: https://lkml.kernel.org/r/20220303031744.32356-1-rdunlap@infradead.org
Cc: stable@vger.kernel.org
Fixes:
7bcfaf54f591 ("tracing: Add trace_options kernel command line parameter")
Fixes:
e1e232ca6b8f ("tracing: Add trace_clock=<clock> kernel parameter")
Fixes:
970988e19eb0 ("tracing/kprobe: Add kprobe_event= boot parameter")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Igor Zhbanov <i.zhbanov@omprussia.ru>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Daniel Borkmann [Fri, 4 Mar 2022 14:26:32 +0000 (15:26 +0100)]
mm: Consider __GFP_NOWARN flag for oversized kvmalloc() calls
syzkaller was recently triggering an oversized kvmalloc() warning via
xdp_umem_create().
The triggered warning was added back in
7661809d493b ("mm: don't allow
oversized kvmalloc() calls"). The rationale for the warning for huge
kvmalloc sizes was as a reaction to a security bug where the size was
more than UINT_MAX but not everything was prepared to handle unsigned
long sizes.
Anyway, the AF_XDP related call trace from this syzkaller report was:
kvmalloc include/linux/mm.h:806 [inline]
kvmalloc_array include/linux/mm.h:824 [inline]
kvcalloc include/linux/mm.h:829 [inline]
xdp_umem_pin_pages net/xdp/xdp_umem.c:102 [inline]
xdp_umem_reg net/xdp/xdp_umem.c:219 [inline]
xdp_umem_create+0x6a5/0xf00 net/xdp/xdp_umem.c:252
xsk_setsockopt+0x604/0x790 net/xdp/xsk.c:1068
__sys_setsockopt+0x1fd/0x4e0 net/socket.c:2176
__do_sys_setsockopt net/socket.c:2187 [inline]
__se_sys_setsockopt net/socket.c:2184 [inline]
__x64_sys_setsockopt+0xb5/0x150 net/socket.c:2184
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
Björn mentioned that requests for >2GB allocation can still be valid:
The structure that is being allocated is the page-pinning accounting.
AF_XDP has an internal limit of U32_MAX pages, which is *a lot*, but
still fewer than what memcg allows (PAGE_COUNTER_MAX is a LONG_MAX/
PAGE_SIZE on 64 bit systems). [...]
I could just change from U32_MAX to INT_MAX, but as I stated earlier
that has a hacky feeling to it. [...] From my perspective, the code
isn't broken, with the memcg limits in consideration. [...]
Linus says:
[...] Pretty much every time this has come up, the kernel warning has
shown that yes, the code was broken and there really wasn't a reason
for doing allocations that big.
Of course, some people would be perfectly fine with the allocation
failing, they just don't want the warning. I didn't want __GFP_NOWARN
to shut it up originally because I wanted people to see all those
cases, but these days I think we can just say "yeah, people can shut
it up explicitly by saying 'go ahead and fail this allocation, don't
warn about it'".
So enough time has passed that by now I'd certainly be ok with [it].
Thus allow call-sites to silence such userspace triggered splats if the
allocation requests have __GFP_NOWARN. For xdp_umem_pin_pages()'s call
to kvcalloc() this is already the case, so nothing else needed there.
Fixes:
7661809d493b ("mm: don't allow oversized kvmalloc() calls")
Reported-by: syzbot+11421fbbff99b989670e@syzkaller.appspotmail.com
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: syzbot+11421fbbff99b989670e@syzkaller.appspotmail.com
Cc: Björn Töpel <bjorn@kernel.org>
Cc: Magnus Karlsson <magnus.karlsson@intel.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: David S. Miller <davem@davemloft.net>
Link: https://lore.kernel.org/bpf/CAJ+HfNhyfsT5cS_U9EC213ducHs9k9zNxX9+abqC0kTrPbQ0gg@mail.gmail.com
Link: https://lore.kernel.org/bpf/20211201202905.b9892171e3f5b9a60f9da251@linux-foundation.org
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Ackd-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Xie Yongji [Fri, 21 Jan 2022 08:39:39 +0000 (16:39 +0800)]
vduse: Fix returning wrong type in vduse_domain_alloc_iova()
This fixes the following smatch warnings:
drivers/vdpa/vdpa_user/iova_domain.c:305 vduse_domain_alloc_iova() warn: should 'iova_pfn << shift' be a 64 bit type?
Fixes:
8c773d53fb7b ("vduse: Implement an MMU-based software IOTLB")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Link: https://lore.kernel.org/r/20220121083940.102-1-xieyongji@bytedance.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Si-Wei Liu [Sat, 15 Jan 2022 00:28:01 +0000 (19:28 -0500)]
vdpa/mlx5: add validation for VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET command
When control vq receives a VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET command
request from the driver, presently there is no validation against the
number of queue pairs to configure, or even if multiqueue had been
negotiated or not is unverified. This may lead to kernel panic due to
uninitialized resource for the queues were there any bogus request
sent down by untrusted driver. Tie up the loose ends there.
Fixes:
52893733f2c5 ("vdpa/mlx5: Add multiqueue support")
Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
Link: https://lore.kernel.org/r/1642206481-30721-4-git-send-email-si-wei.liu@oracle.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Eli Cohen <elic@nvidia.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Si-Wei Liu [Sat, 15 Jan 2022 00:28:00 +0000 (19:28 -0500)]
vdpa/mlx5: should verify CTRL_VQ feature exists for MQ
Per VIRTIO v1.1 specification, section 5.1.3.1 Feature bit requirements:
"VIRTIO_NET_F_MQ Requires VIRTIO_NET_F_CTRL_VQ".
There's assumption in the mlx5_vdpa multiqueue code that MQ must come
together with CTRL_VQ. However, there's nowhere in the upper layer to
guarantee this assumption would hold. Were there an untrusted driver
sending down MQ without CTRL_VQ, it would compromise various spots for
e.g. is_index_valid() and is_ctrl_vq_idx(). Although this doesn't end
up with immediate panic or security loophole as of today's code, the
chance for this to be taken advantage of due to future code change is
not zero.
Harden the crispy assumption by failing the set_driver_features() call
when seeing (MQ && !CTRL_VQ). For that end, verify_min_features() is
renamed to verify_driver_features() to reflect the fact that it now does
more than just validate the minimum features. verify_driver_features()
is now used to accommodate various checks against the driver features
for set_driver_features().
Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
Link: https://lore.kernel.org/r/1642206481-30721-3-git-send-email-si-wei.liu@oracle.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Eli Cohen <elic@nvidia.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Si-Wei Liu [Sat, 15 Jan 2022 00:27:59 +0000 (19:27 -0500)]
vdpa: factor out vdpa_set_features_unlocked for vdpa internal use
No functional change introduced. vdpa bus driver such as virtio_vdpa
or vhost_vdpa is not supposed to take care of the locking for core
by its own. The locked API vdpa_set_features should suffice the
bus driver's need.
Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
Reviewed-by: Eli Cohen <elic@nvidia.com>
Link: https://lore.kernel.org/r/1642206481-30721-2-git-send-email-si-wei.liu@oracle.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Filipe Manana [Wed, 2 Mar 2022 11:48:39 +0000 (11:48 +0000)]
btrfs: fallback to blocking mode when doing async dio over multiple extents
Some users recently reported that MariaDB was getting a read corruption
when using io_uring on top of btrfs. This started to happen in 5.16,
after commit
51bd9563b6783d ("btrfs: fix deadlock due to page faults
during direct IO reads and writes"). That changed btrfs to use the new
iomap flag IOMAP_DIO_PARTIAL and to disable page faults before calling
iomap_dio_rw(). This was necessary to fix deadlocks when the iovector
corresponds to a memory mapped file region. That type of scenario is
exercised by test case generic/647 from fstests.
For this MariaDB scenario, we attempt to read 16K from file offset X
using IOCB_NOWAIT and io_uring. In that range we have 4 extents, each
with a size of 4K, and what happens is the following:
1) btrfs_direct_read() disables page faults and calls iomap_dio_rw();
2) iomap creates a struct iomap_dio object, its reference count is
initialized to 1 and its ->size field is initialized to 0;
3) iomap calls btrfs_dio_iomap_begin() with file offset X, which finds
the first 4K extent, and setups an iomap for this extent consisting
of a single page;
4) At iomap_dio_bio_iter(), we are able to access the first page of the
buffer (struct iov_iter) with bio_iov_iter_get_pages() without
triggering a page fault;
5) iomap submits a bio for this 4K extent
(iomap_dio_submit_bio() -> btrfs_submit_direct()) and increments
the refcount on the struct iomap_dio object to 2; The ->size field
of the struct iomap_dio object is incremented to 4K;
6) iomap calls btrfs_iomap_begin() again, this time with a file
offset of X + 4K. There we setup an iomap for the next extent
that also has a size of 4K;
7) Then at iomap_dio_bio_iter() we call bio_iov_iter_get_pages(),
which tries to access the next page (2nd page) of the buffer.
This triggers a page fault and returns -EFAULT;
8) At __iomap_dio_rw() we see the -EFAULT, but we reset the error
to 0 because we passed the flag IOMAP_DIO_PARTIAL to iomap and
the struct iomap_dio object has a ->size value of 4K (we submitted
a bio for an extent already). The 'wait_for_completion' variable
is not set to true, because our iocb has IOCB_NOWAIT set;
9) At the bottom of __iomap_dio_rw(), we decrement the reference count
of the struct iomap_dio object from 2 to 1. Because we were not
the only ones holding a reference on it and 'wait_for_completion' is
set to false, -EIOCBQUEUED is returned to btrfs_direct_read(), which
just returns it up the callchain, up to io_uring;
10) The bio submitted for the first extent (step 5) completes and its
bio endio function, iomap_dio_bio_end_io(), decrements the last
reference on the struct iomap_dio object, resulting in calling
iomap_dio_complete_work() -> iomap_dio_complete().
11) At iomap_dio_complete() we adjust the iocb->ki_pos from X to X + 4K
and return 4K (the amount of io done) to iomap_dio_complete_work();
12) iomap_dio_complete_work() calls the iocb completion callback,
iocb->ki_complete() with a second argument value of 4K (total io
done) and the iocb with the adjust ki_pos of X + 4K. This results
in completing the read request for io_uring, leaving it with a
result of 4K bytes read, and only the first page of the buffer
filled in, while the remaining 3 pages, corresponding to the other
3 extents, were not filled;
13) For the application, the result is unexpected because if we ask
to read N bytes, it expects to get N bytes read as long as those
N bytes don't cross the EOF (i_size).
MariaDB reports this as an error, as it's not expecting a short read,
since it knows it's asking for read operations fully within the i_size
boundary. This is typical in many applications, but it may also be
questionable if they should react to such short reads by issuing more
read calls to get the remaining data. Nevertheless, the short read
happened due to a change in btrfs regarding how it deals with page
faults while in the middle of a read operation, and there's no reason
why btrfs can't have the previous behaviour of returning the whole data
that was requested by the application.
The problem can also be triggered with the following simple program:
/* Get O_DIRECT */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>
#include <string.h>
#include <liburing.h>
int main(int argc, char *argv[])
{
char *foo_path;
struct io_uring ring;
struct io_uring_sqe *sqe;
struct io_uring_cqe *cqe;
struct iovec iovec;
int fd;
long pagesize;
void *write_buf;
void *read_buf;
ssize_t ret;
int i;
if (argc != 2) {
fprintf(stderr, "Use: %s <directory>\n", argv[0]);
return 1;
}
foo_path = malloc(strlen(argv[1]) + 5);
if (!foo_path) {
fprintf(stderr, "Failed to allocate memory for file path\n");
return 1;
}
strcpy(foo_path, argv[1]);
strcat(foo_path, "/foo");
/*
* Create file foo with 2 extents, each with a size matching
* the page size. Then allocate a buffer to read both extents
* with io_uring, using O_DIRECT and IOCB_NOWAIT. Before doing
* the read with io_uring, access the first page of the buffer
* to fault it in, so that during the read we only trigger a
* page fault when accessing the second page of the buffer.
*/
fd = open(foo_path, O_CREAT | O_TRUNC | O_WRONLY |
O_DIRECT, 0666);
if (fd == -1) {
fprintf(stderr,
"Failed to create file 'foo': %s (errno %d)",
strerror(errno), errno);
return 1;
}
pagesize = sysconf(_SC_PAGE_SIZE);
ret = posix_memalign(&write_buf, pagesize, 2 * pagesize);
if (ret) {
fprintf(stderr, "Failed to allocate write buffer\n");
return 1;
}
memset(write_buf, 0xab, pagesize);
memset(write_buf + pagesize, 0xcd, pagesize);
/* Create 2 extents, each with a size matching page size. */
for (i = 0; i < 2; i++) {
ret = pwrite(fd, write_buf + i * pagesize, pagesize,
i * pagesize);
if (ret != pagesize) {
fprintf(stderr,
"Failed to write to file, ret = %ld errno %d (%s)\n",
ret, errno, strerror(errno));
return 1;
}
ret = fsync(fd);
if (ret != 0) {
fprintf(stderr, "Failed to fsync file\n");
return 1;
}
}
close(fd);
fd = open(foo_path, O_RDONLY | O_DIRECT);
if (fd == -1) {
fprintf(stderr,
"Failed to open file 'foo': %s (errno %d)",
strerror(errno), errno);
return 1;
}
ret = posix_memalign(&read_buf, pagesize, 2 * pagesize);
if (ret) {
fprintf(stderr, "Failed to allocate read buffer\n");
return 1;
}
/*
* Fault in only the first page of the read buffer.
* We want to trigger a page fault for the 2nd page of the
* read buffer during the read operation with io_uring
* (O_DIRECT and IOCB_NOWAIT).
*/
memset(read_buf, 0, 1);
ret = io_uring_queue_init(1, &ring, 0);
if (ret != 0) {
fprintf(stderr, "Failed to create io_uring queue\n");
return 1;
}
sqe = io_uring_get_sqe(&ring);
if (!sqe) {
fprintf(stderr, "Failed to get io_uring sqe\n");
return 1;
}
iovec.iov_base = read_buf;
iovec.iov_len = 2 * pagesize;
io_uring_prep_readv(sqe, fd, &iovec, 1, 0);
ret = io_uring_submit_and_wait(&ring, 1);
if (ret != 1) {
fprintf(stderr,
"Failed at io_uring_submit_and_wait()\n");
return 1;
}
ret = io_uring_wait_cqe(&ring, &cqe);
if (ret < 0) {
fprintf(stderr, "Failed at io_uring_wait_cqe()\n");
return 1;
}
printf("io_uring read result for file foo:\n\n");
printf(" cqe->res == %d (expected %d)\n", cqe->res, 2 * pagesize);
printf(" memcmp(read_buf, write_buf) == %d (expected 0)\n",
memcmp(read_buf, write_buf, 2 * pagesize));
io_uring_cqe_seen(&ring, cqe);
io_uring_queue_exit(&ring);
return 0;
}
When running it on an unpatched kernel:
$ gcc io_uring_test.c -luring
$ mkfs.btrfs -f /dev/sda
$ mount /dev/sda /mnt/sda
$ ./a.out /mnt/sda
io_uring read result for file foo:
cqe->res == 4096 (expected 8192)
memcmp(read_buf, write_buf) == -205 (expected 0)
After this patch, the read always returns 8192 bytes, with the buffer
filled with the correct data. Although that reproducer always triggers
the bug in my test vms, it's possible that it will not be so reliable
on other environments, as that can happen if the bio for the first
extent completes and decrements the reference on the struct iomap_dio
object before we do the atomic_dec_and_test() on the reference at
__iomap_dio_rw().
Fix this in btrfs by having btrfs_dio_iomap_begin() return -EAGAIN
whenever we try to satisfy a non blocking IO request (IOMAP_NOWAIT flag
set) over a range that spans multiple extents (or a mix of extents and
holes). This avoids returning success to the caller when we only did
partial IO, which is not optimal for writes and for reads it's actually
incorrect, as the caller doesn't expect to get less bytes read than it has
requested (unless EOF is crossed), as previously mentioned. This is also
the type of behaviour that xfs follows (xfs_direct_write_iomap_begin()),
even though it doesn't use IOMAP_DIO_PARTIAL.
A test case for fstests will follow soon.
Link: https://lore.kernel.org/linux-btrfs/CABVffEM0eEWho+206m470rtM0d9J8ue85TtR-A_oVTuGLWFicA@mail.gmail.com/
Link: https://lore.kernel.org/linux-btrfs/CAHF2GV6U32gmqSjLe=XKgfcZAmLCiH26cJ2OnHGp5x=VAH4OHQ@mail.gmail.com/
CC: stable@vger.kernel.org # 5.16+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Michael S. Tsirkin [Tue, 5 Oct 2021 07:04:10 +0000 (03:04 -0400)]
virtio_console: break out of buf poll on remove
A common pattern for device reset is currently:
vdev->config->reset(vdev);
.. cleanup ..
reset prevents new interrupts from arriving and waits for interrupt
handlers to finish.
However if - as is common - the handler queues a work request which is
flushed during the cleanup stage, we have code adding buffers / trying
to get buffers while device is reset. Not good.
This was reproduced by running
modprobe virtio_console
modprobe -r virtio_console
in a loop.
Fix this up by calling virtio_break_device + flush before reset.
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=
1786239
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Michael S. Tsirkin [Fri, 14 Jan 2022 20:54:01 +0000 (15:54 -0500)]
virtio: document virtio_reset_device
Looks like most callers get driver/device removal wrong.
Document what's expected of callers.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Michael S. Tsirkin [Fri, 14 Jan 2022 19:58:41 +0000 (14:58 -0500)]
virtio: acknowledge all features before access
The feature negotiation was designed in a way that
makes it possible for devices to know which config
fields will be accessed by drivers.
This is broken since commit
404123c2db79 ("virtio: allow drivers to
validate features") with fallout in at least block and net. We have a
partial work-around in commit
2f9a174f918e ("virtio: write back
F_VERSION_1 before validate") which at least lets devices find out which
format should config space have, but this is a partial fix: guests
should not access config space without acknowledging features since
otherwise we'll never be able to change the config space format.
To fix, split finalize_features from virtio_finalize_features and
call finalize_features with all feature bits before validation,
and then - if validation changed any bits - once again after.
Since virtio_finalize_features no longer writes out features
rename it to virtio_features_ok - since that is what it does:
checks that features are ok with the device.
As a side effect, this also reduces the amount of hypervisor accesses -
we now only acknowledge features once unless we are clearing any
features when validating (which is uncommon).
IRC I think that this was more or less always the intent in the spec but
unfortunately the way the spec is worded does not say this explicitly, I
plan to address this at the spec level, too.
Acked-by: Jason Wang <jasowang@redhat.com>
Cc: stable@vger.kernel.org
Fixes:
404123c2db79 ("virtio: allow drivers to validate features")
Fixes:
2f9a174f918e ("virtio: write back F_VERSION_1 before validate")
Cc: "Halil Pasic" <pasic@linux.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Michael S. Tsirkin [Fri, 14 Jan 2022 19:56:15 +0000 (14:56 -0500)]
virtio: unexport virtio_finalize_features
virtio_finalize_features is only used internally within virtio.
No reason to export it.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Niklas Cassel [Tue, 1 Mar 2022 00:44:18 +0000 (00:44 +0000)]
riscv: dts: k210: fix broken IRQs on hart1
Commit
67d96729a9e7 ("riscv: Update Canaan Kendryte K210 device tree")
incorrectly removed two entries from the PLIC interrupt-controller node's
interrupts-extended property.
The PLIC driver cannot know the mapping between hart contexts and hart ids,
so this information has to be provided by device tree, as specified by the
PLIC device tree binding.
The PLIC driver uses the interrupts-extended property, and initializes the
hart context registers in the exact same order as provided by the
interrupts-extended property.
In other words, if we don't specify the S-mode interrupts, the PLIC driver
will simply initialize the hart0 S-mode hart context with the hart1 M-mode
configuration. It is therefore essential to specify the S-mode IRQs even
though the system itself will only ever be running in M-mode.
Re-add the S-mode interrupts, so that we get working IRQs on hart1 again.
Cc: <stable@vger.kernel.org>
Fixes:
67d96729a9e7 ("riscv: Update Canaan Kendryte K210 device tree")
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Dave Airlie [Fri, 4 Mar 2022 03:04:06 +0000 (13:04 +1000)]
Merge tag 'drm-misc-fixes-2022-03-03' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
* drm/arm: Select DRM_GEM_CMEA_HELPER for HDLCD
* drm/bridge: ti-sn65dsi86: Properly undo autosuspend
* drm/vrr: Fix potential NULL-pointer deref
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/YiCTGZ8IVCw0ilKK@linux-uq9g
Dave Airlie [Fri, 4 Mar 2022 03:02:13 +0000 (13:02 +1000)]
Merge tag 'amd-drm-fixes-5.17-2022-03-02' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-5.17-2022-03-02:
amdgpu:
- Suspend regression fix
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220303045035.5650-1-alexander.deucher@amd.com
Dave Airlie [Fri, 4 Mar 2022 02:55:48 +0000 (12:55 +1000)]
Merge tag 'drm-intel-fixes-2022-03-03' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
- Fix GuC SLPC unset command. (Vinay Belgaumkar)
- Fix misidentification of some Apple MacBook Pro laptops as Jasper Lake. (Ville Syrjälä)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/YiCXHiTyCE7TbopG@tursulin-mobl2
William Mahon [Fri, 4 Mar 2022 02:26:22 +0000 (18:26 -0800)]
HID: add mapping for KEY_ALL_APPLICATIONS
This patch adds a new key definition for KEY_ALL_APPLICATIONS
and aliases KEY_DASHBOARD to it.
It also maps the 0x0c/0x2a2 usage code to KEY_ALL_APPLICATIONS.
Signed-off-by: William Mahon <wmahon@chromium.org>
Acked-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Link: https://lore.kernel.org/r/20220303035618.1.I3a7746ad05d270161a18334ae06e3b6db1a1d339@changeid
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
William Mahon [Fri, 4 Mar 2022 02:23:42 +0000 (18:23 -0800)]
HID: add mapping for KEY_DICTATE
Numerous keyboards are adding dictate keys which allows for text
messages to be dictated by a microphone.
This patch adds a new key definition KEY_DICTATE and maps 0x0c/0x0d8
usage code to this new keycode. Additionally hid-debug is adjusted to
recognize this new usage code as well.
Signed-off-by: William Mahon <wmahon@chromium.org>
Acked-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Link: https://lore.kernel.org/r/20220303021501.1.I5dbf50eb1a7a6734ee727bda4a8573358c6d3ec0@changeid
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Alexandre Ghiti [Fri, 25 Feb 2022 12:39:53 +0000 (13:39 +0100)]
riscv: Fix kasan pud population
In sv48, the kasan inner regions are not aligned on PGDIR_SIZE and then
when we populate the kasan linear mapping region, we clear the kasan
vmalloc region which is in the same PGD.
Fix this by copying the content of the kasan early pud after allocating a
new PGD for the first time.
Fixes:
e8a62cc26ddf ("riscv: Implement sv48 support")
Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Alexandre Ghiti [Fri, 25 Feb 2022 12:39:52 +0000 (13:39 +0100)]
riscv: Move high_memory initialization to setup_bootmem
high_memory used to be initialized in mem_init, way after setup_bootmem.
But a call to dma_contiguous_reserve in this function gives rise to the
below warning because high_memory is equal to 0 and is used at the very
beginning at cma_declare_contiguous_nid.
It went unnoticed since the move of the kasan region redefined
KERN_VIRT_SIZE so that it does not encompass -1 anymore.
Fix this by initializing high_memory in setup_bootmem.
------------[ cut here ]------------
virt_to_phys used for non-linear address:
ffffffffffffffff (0xffffffffffffffff)
WARNING: CPU: 0 PID: 0 at arch/riscv/mm/physaddr.c:14 __virt_to_phys+0xac/0x1b8
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted
5.17.0-rc1-00007-ga68b89289e26 #27
Hardware name: riscv-virtio,qemu (DT)
epc : __virt_to_phys+0xac/0x1b8
ra : __virt_to_phys+0xac/0x1b8
epc :
ffffffff80014922 ra :
ffffffff80014922 sp :
ffffffff84a03c30
gp :
ffffffff85866c80 tp :
ffffffff84a3f180 t0 :
ffffffff86bce657
t1 :
fffffffef09406e8 t2 :
0000000000000000 s0 :
ffffffff84a03c70
s1 :
ffffffffffffffff a0 :
000000000000004f a1 :
00000000000f0000
a2 :
0000000000000002 a3 :
ffffffff8011f408 a4 :
0000000000000000
a5 :
0000000000000000 a6 :
0000000000f00000 a7 :
ffffffff84a03747
s2 :
ffffffd800000000 s3 :
ffffffff86ef4000 s4 :
ffffffff8467f828
s5 :
fffffff800000000 s6 :
8000000000006800 s7 :
0000000000000000
s8 :
0000000480000000 s9 :
0000000080038ea0 s10:
0000000000000000
s11:
ffffffffffffffff t3 :
ffffffff84a035c0 t4 :
fffffffef09406e8
t5 :
fffffffef09406e9 t6 :
ffffffff84a03758
status:
0000000000000100 badaddr:
0000000000000000 cause:
0000000000000003
[<
ffffffff8322ef4c>] cma_declare_contiguous_nid+0xf2/0x64a
[<
ffffffff83212a58>] dma_contiguous_reserve_area+0x46/0xb4
[<
ffffffff83212c3a>] dma_contiguous_reserve+0x174/0x18e
[<
ffffffff83208fc2>] paging_init+0x12c/0x35e
[<
ffffffff83206bd2>] setup_arch+0x120/0x74e
[<
ffffffff83201416>] start_kernel+0xce/0x68c
irq event stamp: 0
hardirqs last enabled at (0): [<
0000000000000000>] 0x0
hardirqs last disabled at (0): [<
0000000000000000>] 0x0
softirqs last enabled at (0): [<
0000000000000000>] 0x0
softirqs last disabled at (0): [<
0000000000000000>] 0x0
---[ end trace
0000000000000000 ]---
Fixes:
f7ae02333d13 ("riscv: Move KASAN mapping next to the kernel mapping")
Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Alexandre Ghiti [Fri, 25 Feb 2022 12:39:51 +0000 (13:39 +0100)]
riscv: Fix config KASAN && DEBUG_VIRTUAL
__virt_to_phys function is called very early in the boot process (ie
kasan_early_init) so it should not be instrumented by KASAN otherwise it
bugs.
Fix this by declaring phys_addr.c as non-kasan instrumentable.
Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
Fixes:
8ad8b72721d0 (riscv: Add KASAN support)
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Alexandre Ghiti [Fri, 25 Feb 2022 12:39:50 +0000 (13:39 +0100)]
riscv: Fix DEBUG_VIRTUAL false warnings
KERN_VIRT_SIZE used to encompass the kernel mapping before it was
redefined when moving the kasan mapping next to the kernel mapping to only
match the maximum amount of physical memory.
Then, kernel mapping addresses that go through __virt_to_phys are now
declared as wrong which is not true, one can use __virt_to_phys on such
addresses.
Fix this by redefining the condition that matches wrong addresses.
Fixes:
f7ae02333d13 ("riscv: Move KASAN mapping next to the kernel mapping")
Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Alexandre Ghiti [Fri, 25 Feb 2022 12:39:49 +0000 (13:39 +0100)]
riscv: Fix config KASAN && SPARSEMEM && !SPARSE_VMEMMAP
In order to get the pfn of a struct page* when sparsemem is enabled
without vmemmap, the mem_section structures need to be initialized which
happens in sparse_init.
But kasan_early_init calls pfn_to_page way before sparse_init is called,
which then tries to dereference a null mem_section pointer.
Fix this by removing the usage of this function in kasan_early_init.
Fixes:
8ad8b72721d0 ("riscv: Add KASAN support")
Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Alexandre Ghiti [Fri, 25 Feb 2022 12:39:48 +0000 (13:39 +0100)]
riscv: Fix is_linear_mapping with recent move of KASAN region
The KASAN region was recently moved between the linear mapping and the
kernel mapping, is_linear_mapping used to check the validity of an
address by using the start of the kernel mapping, which is now wrong.
Fix this by using the maximum size of the physical memory.
Fixes:
f7ae02333d13 ("riscv: Move KASAN mapping next to the kernel mapping")
Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Ammar Faizi [Sat, 26 Feb 2022 07:40:56 +0000 (14:40 +0700)]
MAINTAINERS: Remove dead patchwork link
The patchwork link is dead. It says:
404: File not found
The page URL requested (/project/LKML/list/) does not exist.
Remove it.
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Thu, 3 Mar 2022 13:05:18 +0000 (13:05 +0000)]
cachefiles: Fix incorrect length to fallocate()
When cachefiles_shorten_object() calls fallocate() to shape the cache
file to match the DIO size, it passes the total file size it wants to
achieve, not the amount of zeros that should be inserted. Since this is
meant to preallocate that amount of storage for the file, it can cause
the cache to fill up the disk and hit ENOSPC.
Fix this by passing the length actually required to go from the current
EOF to the desired EOF.
Fixes:
7623ed6772de ("cachefiles: Implement cookie resize for truncate")
Reported-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/164630854858.3665356.17419701804248490708.stgit@warthog.procyon.org.uk
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 3 Mar 2022 19:10:56 +0000 (11:10 -0800)]
Merge tag 'net-5.17-rc7' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from can, xfrm, wifi, bluetooth, and netfilter.
Lots of various size fixes, the length of the tag speaks for itself.
Most of the 5.17-relevant stuff comes from xfrm, wifi and bt trees
which had been lagging as you pointed out previously. But there's also
a larger than we'd like portion of fixes for bugs from previous
releases.
Three more fixes still under discussion, including and xfrm revert for
uAPI error.
Current release - regressions:
- iwlwifi: don't advertise TWT support, prevent FW crash
- xfrm: fix the if_id check in changelink
- xen/netfront: destroy queues before real_num_tx_queues is zeroed
- bluetooth: fix not checking MGMT cmd pending queue, make scanning
work again
Current release - new code bugs:
- mptcp: make SIOCOUTQ accurate for fallback socket
- bluetooth: access skb->len after null check
- bluetooth: hci_sync: fix not using conn_timeout
- smc: fix cleanup when register ULP fails
- dsa: restore error path of dsa_tree_change_tag_proto
- iwlwifi: fix build error for IWLMEI
- iwlwifi: mvm: propagate error from request_ownership to the user
Previous releases - regressions:
- xfrm: fix pMTU regression when reported pMTU is too small
- xfrm: fix TCP MSS calculation when pMTU is close to 1280
- bluetooth: fix bt_skb_sendmmsg not allocating partial chunks
- ipv6: ensure we call ipv6_mc_down() at most once, prevent leaks
- ipv6: prevent leaks in igmp6 when input queues get full
- fix up skbs delta_truesize in UDP GRO frag_list
- eth: e1000e: fix possible HW unit hang after an s0ix exit
- eth: e1000e: correct NVM checksum verification flow
- ptp: ocp: fix large time adjustments
Previous releases - always broken:
- tcp: make tcp_read_sock() more robust in presence of urgent data
- xfrm: distinguishing SAs and SPs by if_id in xfrm_migrate
- xfrm: fix xfrm_migrate issues when address family changes
- dcb: flush lingering app table entries for unregistered devices
- smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error
- mac80211: fix EAPoL rekey fail in 802.3 rx path
- mac80211: fix forwarded mesh frames AC & queue selection
- netfilter: nf_queue: fix socket access races and bugs
- batman-adv: fix ToCToU iflink problems and check the result belongs
to the expected net namespace
- can: gs_usb, etas_es58x: fix opened_channel_cnt's accounting
- can: rcar_canfd: register the CAN device when fully ready
- eth: igb, igc: phy: drop premature return leaking HW semaphore
- eth: ixgbe: xsk: change !netif_carrier_ok() handling in
ixgbe_xmit_zc(), prevent live lock when link goes down
- eth: stmmac: only enable DMA interrupts when ready
- eth: sparx5: move vlan checks before any changes are made
- eth: iavf: fix races around init, removal, resets and vlan ops
- ibmvnic: more reset flow fixes
Misc:
- eth: fix return value of __setup handlers"
* tag 'net-5.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (92 commits)
ipv6: fix skb drops in igmp6_event_query() and igmp6_event_report()
net: dsa: make dsa_tree_change_tag_proto actually unwind the tag proto change
ixgbe: xsk: change !netif_carrier_ok() handling in ixgbe_xmit_zc()
selftests: mlxsw: resource_scale: Fix return value
selftests: mlxsw: tc_police_scale: Make test more robust
net: dcb: disable softirqs in dcbnl_flush_dev()
bnx2: Fix an error message
sfc: extend the locking on mcdi->seqno
net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error cause by server
net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error generated by client
net: arcnet: com20020: Fix null-ptr-deref in com20020pci_probe()
tcp: make tcp_read_sock() more robust
bpf, sockmap: Do not ignore orig_len parameter
net: ipa: add an interconnect dependency
net: fix up skbs delta_truesize in UDP GRO frag_list
iwlwifi: mvm: return value for request_ownership
nl80211: Update bss channel on channel switch for P2P_CLIENT
iwlwifi: fix build error for IWLMEI
ptp: ocp: Add ptp_ocp_adjtime_coarse for large adjustments
batman-adv: Don't expect inter-netns unique iflink indices
...
Linus Torvalds [Thu, 3 Mar 2022 18:38:28 +0000 (10:38 -0800)]
Merge tag 'mips-fixes-5.17_4' of git://git./linux/kernel/git/mips/linux
Pull MIPS fixes from Thomas Bogendoerfer:
- Fix memory detection for MT7621 devices
- Fix setnocoherentio kernel option
- Fix warning when CONFIG_SCHED_CORE is enabled
* tag 'mips-fixes-5.17_4' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: ralink: mt7621: use bitwise NOT instead of logical
mips: setup: fix setnocoherentio() boolean setting
MIPS: smp: fill in sibling and core maps earlier
MIPS: ralink: mt7621: do memory detection on KSEG1
Linus Torvalds [Thu, 3 Mar 2022 18:31:09 +0000 (10:31 -0800)]
Merge tag 'auxdisplay-for-linus-v5.17-rc7' of git://github.com/ojeda/linux
Pull auxdisplay fixes from Miguel Ojeda:
"A few lcd2s fixes from Andy Shevchenko"
* tag 'auxdisplay-for-linus-v5.17-rc7' of git://github.com/ojeda/linux:
auxdisplay: lcd2s: Use proper API to free the instance of charlcd object
auxdisplay: lcd2s: Fix memory leak in ->remove()
auxdisplay: lcd2s: Fix lcd2s_redefine_char() feature
Eric Dumazet [Thu, 3 Mar 2022 17:37:28 +0000 (09:37 -0800)]
ipv6: fix skb drops in igmp6_event_query() and igmp6_event_report()
While investigating on why a synchronize_net() has been added recently
in ipv6_mc_down(), I found that igmp6_event_query() and igmp6_event_report()
might drop skbs in some cases.
Discussion about removing synchronize_net() from ipv6_mc_down()
will happen in a different thread.
Fixes:
f185de28d9ae ("mld: add new workqueues for process mld events")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Taehee Yoo <ap420073@gmail.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220303173728.937869-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Thu, 3 Mar 2022 15:42:49 +0000 (17:42 +0200)]
net: dsa: make dsa_tree_change_tag_proto actually unwind the tag proto change
The blamed commit said one thing but did another. It explains that we
should restore the "return err" to the original "goto out_unwind_tagger",
but instead it replaced it with "goto out_unlock".
When DSA_NOTIFIER_TAG_PROTO fails after the first switch of a
multi-switch tree, the switches would end up not using the same tagging
protocol.
Fixes:
0b0e2ff10356 ("net: dsa: restore error path of dsa_tree_change_tag_proto")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220303154249.1854436-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maciej Fijalkowski [Wed, 2 Mar 2022 17:59:27 +0000 (09:59 -0800)]
ixgbe: xsk: change !netif_carrier_ok() handling in ixgbe_xmit_zc()
Commit
c685c69fba71 ("ixgbe: don't do any AF_XDP zero-copy transmit if
netif is not OK") addressed the ring transient state when
MEM_TYPE_XSK_BUFF_POOL was being configured which in turn caused the
interface to through down/up. Maurice reported that when carrier is not
ok and xsk_pool is present on ring pair, ksoftirqd will consume 100% CPU
cycles due to the constant NAPI rescheduling as ixgbe_poll() states that
there is still some work to be done.
To fix this, do not set work_done to false for a !netif_carrier_ok().
Fixes:
c685c69fba71 ("ixgbe: don't do any AF_XDP zero-copy transmit if netif is not OK")
Reported-by: Maurice Baijens <maurice.baijens@ellips.com>
Tested-by: Maurice Baijens <maurice.baijens@ellips.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 3 Mar 2022 16:14:04 +0000 (08:14 -0800)]
Merge branch 'selftests-mlxsw-a-couple-of-fixes'
Ido Schimmel says:
====================
selftests: mlxsw: A couple of fixes
Patch #1 fixes a breakage due to a change in iproute2 output. The real
problem is not iproute2, but the fact that the check was not strict
enough. Fixed by using JSON output instead. Targeting at net so that the
test will pass as part of old and new kernels regardless of iproute2
version.
Patch #2 fixes an issue uncovered by the first one.
====================
Link: https://lore.kernel.org/r/20220302161447.217447-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Amit Cohen [Wed, 2 Mar 2022 16:14:47 +0000 (18:14 +0200)]
selftests: mlxsw: resource_scale: Fix return value
The test runs several test cases and is supposed to return an error in
case at least one of them failed.
Currently, the check of the return value of each test case is in the
wrong place, which can result in the wrong return value. For example:
# TESTS='tc_police' ./resource_scale.sh
TEST: 'tc_police' [default] 968 [FAIL]
tc police offload count failed
Error: mlxsw_spectrum: Failed to allocate policer index.
We have an error talking to the kernel
Command failed /tmp/tmp.i7Oc5HwmXY:969
TEST: 'tc_police' [default] overflow 969 [ OK ]
...
TEST: 'tc_police' [ipv4_max] overflow 969 [ OK ]
$ echo $?
0
Fix this by moving the check to be done after each test case.
Fixes:
059b18e21c63 ("selftests: mlxsw: Return correct error code in resource scale test")
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Amit Cohen [Wed, 2 Mar 2022 16:14:46 +0000 (18:14 +0200)]
selftests: mlxsw: tc_police_scale: Make test more robust
The test adds tc filters and checks how many of them were offloaded by
grepping for 'in_hw'.
iproute2 commit
f4cd4f127047 ("tc: add skip_hw and skip_sw to control
action offload") added offload indication to tc actions, producing the
following output:
$ tc filter show dev swp2 ingress
...
filter protocol ipv6 pref 1000 flower chain 0 handle 0x7c0
eth_type ipv6
dst_ip 2001:db8:1::7bf
skip_sw
in_hw in_hw_count 1
action order 1: police 0x7c0 rate 10Mbit burst 100Kb mtu 2Kb action drop overhead 0b
ref 1 bind 1
not_in_hw
used_hw_stats immediate
The current grep expression matches on both 'in_hw' and 'not_in_hw',
resulting in incorrect results.
Fix that by using JSON output instead.
Fixes:
5061e773264b ("selftests: mlxsw: Add scale test for tc-police")
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Wed, 2 Mar 2022 19:39:39 +0000 (21:39 +0200)]
net: dcb: disable softirqs in dcbnl_flush_dev()
Ido Schimmel points out that since commit
52cff74eef5d ("dcbnl : Disable
software interrupts before taking dcb_lock"), the DCB API can be called
by drivers from softirq context.
One such in-tree example is the chelsio cxgb4 driver:
dcb_rpl
-> cxgb4_dcb_handle_fw_update
-> dcb_ieee_setapp
If the firmware for this driver happened to send an event which resulted
in a call to dcb_ieee_setapp() at the exact same time as another
DCB-enabled interface was unregistering on the same CPU, the softirq
would deadlock, because the interrupted process was already holding the
dcb_lock in dcbnl_flush_dev().
Fix this unlikely event by using spin_lock_bh() in dcbnl_flush_dev() as
in the rest of the dcbnl code.
Fixes:
91b0383fef06 ("net: dcb: flush lingering app table entries for unregistered devices")
Reported-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220302193939.1368823-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Christophe JAILLET [Wed, 2 Mar 2022 20:21:15 +0000 (21:21 +0100)]
bnx2: Fix an error message
Fix an error message and report the correct failing function.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Niels Dossche [Tue, 1 Mar 2022 22:28:22 +0000 (23:28 +0100)]
sfc: extend the locking on mcdi->seqno
seqno could be read as a stale value outside of the lock. The lock is
already acquired to protect the modification of seqno against a possible
race condition. Place the reading of this value also inside this locking
to protect it against a possible race condition.
Signed-off-by: Niels Dossche <dossche.niels@gmail.com>
Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 3 Mar 2022 10:34:18 +0000 (10:34 +0000)]
Merge branch 'smc-fix'
D. Wythe says:
====================
fix unexpected SMC_CLC_DECL_ERR_REGRMB error
We can easily trigger the SMC_CLC_DECL_ERR_REGRMB exception within
following script:
server: smc_run nginx
client: smc_run ./wrk -c 2000 -t 8 -d 20 http://smc-server
And we can clearly see that this error is also divided into two types:
1. 0x09990003
2. 0x05000000/0x09990003
Which has the same root causes, but the immediate causes vary.
The root cause of this issues is that remove connections from link group
is not synchronous with add/delete rtoken entry, which means that even
the number of connections is less that SMC_RMBS_PER_LGR_MAX, it does not
mean that the connection can register rtoken successfully later. In
other words, the rtoken entry may released, This will cause an
unexpected SMC_CLC_DECL_ERR_REGRMB to be reported, and then this SMC
connections have to fallback to TCP.
This patch set handles two types of SMC_CLC_DECL_ERR_REGRMB exceptions
from different perspectives.
Patch 1: fix the 0x05000000/0x09990003 error.
Patch 2: fix the 0x09990003 error.
After those patches, there is no SMC_CLC_DECL_ERR_REGRMB exceptions in
my
test case any more.
v1 -> v2:
- add bugfix patch for SMC_CLC_DECL_ERR_REGRMB cause by server side
v2 -> v3:
- fix incorrect mail thread
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
D. Wythe [Wed, 2 Mar 2022 13:25:12 +0000 (21:25 +0800)]
net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error cause by server
The problem of SMC_CLC_DECL_ERR_REGRMB on the server is very clear.
Based on the fact that whether a new SMC connection can be accepted or
not depends on not only the limit of conn nums, but also the available
entries of rtoken. Since the rtoken release is trigger by peer, while
the conn nums is decrease by local, tons of thing can happen in this
time difference.
This only thing that needs to be mentioned is that now all connection
creations are completely protected by smc_server_lgr_pending lock, it's
enough to check only the available entries in rtokens_used_mask.
Fixes:
cd6851f30386 ("smc: remote memory buffers (RMBs)")
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
D. Wythe [Wed, 2 Mar 2022 13:25:11 +0000 (21:25 +0800)]
net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error generated by client
The main reason for this unexpected SMC_CLC_DECL_ERR_REGRMB in client
dues to following execution sequence:
Server Conn A: Server Conn B: Client Conn B:
smc_lgr_unregister_conn
smc_lgr_register_conn
smc_clc_send_accept ->
smc_rtoken_add
smcr_buf_unuse
-> Client Conn A:
smc_rtoken_delete
smc_lgr_unregister_conn() makes current link available to assigned to new
incoming connection, while smcr_buf_unuse() has not executed yet, which
means that smc_rtoken_add may fail because of insufficient rtoken_entry,
reversing their execution order will avoid this problem.
Fixes:
3e034725c0d8 ("net/smc: common functions for RMBs and send buffers")
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Zheyu Ma [Wed, 2 Mar 2022 12:24:23 +0000 (20:24 +0800)]
net: arcnet: com20020: Fix null-ptr-deref in com20020pci_probe()
During driver initialization, the pointer of card info, i.e. the
variable 'ci' is required. However, the definition of
'com20020pci_id_table' reveals that this field is empty for some
devices, which will cause null pointer dereference when initializing
these devices.
The following log reveals it:
[ 3.973806] KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
[ 3.973819] RIP: 0010:com20020pci_probe+0x18d/0x13e0 [com20020_pci]
[ 3.975181] Call Trace:
[ 3.976208] local_pci_probe+0x13f/0x210
[ 3.977248] pci_device_probe+0x34c/0x6d0
[ 3.977255] ? pci_uevent+0x470/0x470
[ 3.978265] really_probe+0x24c/0x8d0
[ 3.978273] __driver_probe_device+0x1b3/0x280
[ 3.979288] driver_probe_device+0x50/0x370
Fix this by checking whether the 'ci' is a null pointer first.
Fixes:
8c14f9c70327 ("ARCNET: add com20020 PCI IDs with metadata")
Signed-off-by: Zheyu Ma <zheyuma97@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 2 Mar 2022 16:17:23 +0000 (08:17 -0800)]
tcp: make tcp_read_sock() more robust
If recv_actor() returns an incorrect value, tcp_read_sock()
might loop forever.
Instead, issue a one time warning and make sure to make progress.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20220302161723.3910001-2-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Wed, 2 Mar 2022 16:17:22 +0000 (08:17 -0800)]
bpf, sockmap: Do not ignore orig_len parameter
Currently, sk_psock_verdict_recv() returns skb->len
This is problematic because tcp_read_sock() might have
passed orig_len < skb->len, due to the presence of TCP urgent data.
This causes an infinite loop from tcp_read_sock()
Followup patch will make tcp_read_sock() more robust vs bad actors.
Fixes:
ef5659280eb1 ("bpf, sockmap: Allow skipping sk_skb parser program")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
Tested-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20220302161723.3910001-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Tue, 1 Mar 2022 11:34:40 +0000 (05:34 -0600)]
net: ipa: add an interconnect dependency
In order to function, the IPA driver very clearly requires the
interconnect framework to be enabled in the kernel configuration.
State that dependency in the Kconfig file.
This became a problem when CONFIG_COMPILE_TEST support was added.
Non-Qualcomm platforms won't necessarily enable CONFIG_INTERCONNECT.
Reported-by: kernel test robot <lkp@intel.com>
Fixes:
38a4066f593c5 ("net: ipa: support COMPILE_TEST")
Signed-off-by: Alex Elder <elder@linaro.org>
Link: https://lore.kernel.org/r/20220301113440.257916-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
lena wang [Tue, 1 Mar 2022 11:17:09 +0000 (19:17 +0800)]
net: fix up skbs delta_truesize in UDP GRO frag_list
The truesize for a UDP GRO packet is added by main skb and skbs in main
skb's frag_list:
skb_gro_receive_list
p->truesize += skb->truesize;
The commit
53475c5dd856 ("net: fix use-after-free when UDP GRO with
shared fraglist") introduced a truesize increase for frag_list skbs.
When uncloning skb, it will call pskb_expand_head and trusesize for
frag_list skbs may increase. This can occur when allocators uses
__netdev_alloc_skb and not jump into __alloc_skb. This flow does not
use ksize(len) to calculate truesize while pskb_expand_head uses.
skb_segment_list
err = skb_unclone(nskb, GFP_ATOMIC);
pskb_expand_head
if (!skb->sk || skb->destructor == sock_edemux)
skb->truesize += size - osize;
If we uses increased truesize adding as delta_truesize, it will be
larger than before and even larger than previous total truesize value
if skbs in frag_list are abundant. The main skb truesize will become
smaller and even a minus value or a huge value for an unsigned int
parameter. Then the following memory check will drop this abnormal skb.
To avoid this error we should use the original truesize to segment the
main skb.
Fixes:
53475c5dd856 ("net: fix use-after-free when UDP GRO with shared fraglist")
Signed-off-by: lena wang <lena.wang@mediatek.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/1646133431-8948-1-git-send-email-lena.wang@mediatek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 3 Mar 2022 05:53:34 +0000 (21:53 -0800)]
Merge tag 'batadv-net-pullrequest-
20220302' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
Here are some batman-adv bugfixes:
- Remove redundant iflink requests, by Sven Eckelmann (2 patches)
- Don't expect inter-netns unique iflink indices, by Sven Eckelmann
* tag 'batadv-net-pullrequest-
20220302' of git://git.open-mesh.org/linux-merge:
batman-adv: Don't expect inter-netns unique iflink indices
batman-adv: Request iflink once in batadv_get_real_netdevice
batman-adv: Request iflink once in batadv-on-batadv check
====================
Link: https://lore.kernel.org/r/20220302163049.101957-1-sw@simonwunderlich.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 3 Mar 2022 05:49:57 +0000 (21:49 -0800)]
Merge tag 'wireless-for-net-2022-03-02' of git://git./linux/kernel/git/wireless/wireless
Johannes Berg says:
====================
Three more fixes:
- fix build issue in iwlwifi, now that I understood
what's going on there
- propagate error in iwlwifi/mvm to userspace so it
can figure out what's happening
- fix channel switch related updates in P2P-client
in cfg80211
* tag 'wireless-for-net-2022-03-02' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
iwlwifi: mvm: return value for request_ownership
nl80211: Update bss channel on channel switch for P2P_CLIENT
iwlwifi: fix build error for IWLMEI
====================
Link: https://lore.kernel.org/r/20220302214444.100180-1-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>