Linus Torvalds [Fri, 18 Oct 2024 14:01:59 +0000 (07:01 -0700)]
Merge tag 's390-6.12-3' of git://git./linux/kernel/git/s390/linux
Pull s390 fixes from Heiko Carstens:
- Fix PCI error recovery by handling error events correctly
- Fix CCA crypto card behavior within protected execution environment
- Two KVM commits which fix virtual vs physical address handling bugs
in KVM pfault handling
- Fix return code handling in pckmo_key2protkey()
- Deactivate sclp console as late as possible so that outstanding
messages appear on the console instead of being dropped on reboot
- Convert newlines to CRLF instead of LFCR for the sclp vt220 driver,
as required by the vt220 specification
- Initialize also psw mask in perf_arch_fetch_caller_regs() to make
sure that user_mode(regs) will return false
- Update defconfigs
* tag 's390-6.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390: Update defconfigs
s390: Initialize psw mask in perf_arch_fetch_caller_regs()
s390/sclp_vt220: Convert newlines to CRLF instead of LFCR
s390/sclp: Deactivate sclp after all its users
s390/pkey_pckmo: Return with success for valid protected key types
KVM: s390: Change virtual to physical address access in diag 0x258 handler
KVM: s390: gaccess: Check if guest address is in memslot
s390/ap: Fix CCA crypto card behavior within protected execution environment
s390/pci: Handle PCI error codes other than 0x3a
Linus Torvalds [Fri, 18 Oct 2024 02:12:38 +0000 (19:12 -0700)]
Merge tag 'x86_bugs_post_ibpb' of git://git./linux/kernel/git/tip/tip
Pull x86 IBPB fixes from Borislav Petkov:
"This fixes the IBPB implementation of older AMDs (< gen4) that do not
flush the RSB (Return Address Stack) so you can still do some leaking
when using a "=ibpb" mitigation for Retbleed or SRSO. Fix it by doing
the flushing in software on those generations.
IBPB is not the default setting so this is not likely to affect
anybody in practice"
* tag 'x86_bugs_post_ibpb' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/bugs: Do not use UNTRAIN_RET with IBPB on entry
x86/bugs: Skip RSB fill at VMEXIT
x86/entry: Have entry_ibpb() invalidate return predictions
x86/cpufeatures: Add a IBPB_NO_RET BUG flag
x86/cpufeatures: Define X86_FEATURE_AMD_IBPB_RET
Linus Torvalds [Thu, 17 Oct 2024 23:33:06 +0000 (16:33 -0700)]
Merge tag 'mm-hotfixes-stable-2024-10-17-16-08' of git://git./linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"28 hotfixes. 13 are cc:stable. 23 are MM.
It is the usual shower of unrelated singletons - please see the
individual changelogs for details"
* tag 'mm-hotfixes-stable-2024-10-17-16-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (28 commits)
maple_tree: add regression test for spanning store bug
maple_tree: correct tree corruption on spanning store
mm/mglru: only clear kswapd_failures if reclaimable
mm/swapfile: skip HugeTLB pages for unuse_vma
selftests: mm: fix the incorrect usage() info of khugepaged
MAINTAINERS: add Jann as memory mapping/VMA reviewer
mm: swap: prevent possible data-race in __try_to_reclaim_swap
mm: khugepaged: fix the incorrect statistics when collapsing large file folios
MAINTAINERS: kasan, kcov: add bugzilla links
mm: don't install PMD mappings when THPs are disabled by the hw/process/vma
mm: huge_memory: add vma_thp_disabled() and thp_disabled_by_hw()
Docs/damon/maintainer-profile: update deprecated awslabs GitHub URLs
Docs/damon/maintainer-profile: add missing '_' suffixes for external web links
maple_tree: check for MA_STATE_BULK on setting wr_rebalance
mm: khugepaged: fix the arguments order in khugepaged_collapse_file trace point
mm/damon/tests/sysfs-kunit.h: fix memory leak in damon_sysfs_test_add_targets()
mm: remove unused stub for can_swapin_thp()
mailmap: add an entry for Andy Chiu
MAINTAINERS: add memory mapping/VMA co-maintainers
fs/proc: fix build with GCC 15 due to -Werror=unterminated-string-initialization
...
Linus Torvalds [Thu, 17 Oct 2024 23:24:42 +0000 (16:24 -0700)]
Merge tag 'clk-fixes-for-linus' of git://git./linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
"Two clk driver fixes and a unit test fix:
- Terminate the of_device_id table in the Samsung exynosautov920 clk
driver so that device matching logic doesn't run off the end of the
array into other memory and break matching for any kernel with this
driver loaded
- Properly limit the max clk ID in the Rockchip clk driver
- Use clk kunit helpers in the clk tests so that memory isn't leaked
after the test concludes"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: test: Fix some memory leaks
clk: rockchip: fix finding of maximum clock ID
clk: samsung: Fix out-of-bound access of of_match_node()
Linus Torvalds [Thu, 17 Oct 2024 16:51:03 +0000 (09:51 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
- Disable software tag-based KASAN when compiling with GCC, as
functions are incorrectly instrumented leading to a crash early
during boot
- Fix pkey configuration for kernel threads when POE is enabled
- Fix invalid memory accesses in uprobes when targetting load-literal
instructions
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
kasan: Disable Software Tag-Based KASAN with GCC
Documentation/protection-keys: add AArch64 to documentation
arm64: set POR_EL0 for kernel threads
arm64: probes: Fix uprobes for big-endian kernels
arm64: probes: Fix simulate_ldr*_literal()
arm64: probes: Remove broken LDR (literal) uprobe support
Linus Torvalds [Thu, 17 Oct 2024 16:43:36 +0000 (09:43 -0700)]
Merge tag 'arm-fixes-6.12' of git://git./linux/kernel/git/soc/soc
Pull SoC fixes from Arnd Bergmann:
"Most of the fixes this time are for platform specific drivers,
addressing issues found through build testing on freescale, ep93xx,
starfive, and npcm platforms, as as well as the ffa firmware.
The fixes for the scmi firmware driver address compatibility problems
found on broadcom machines.
There are only two devicetree fixes, addressing incorrect in
configuration on broadcom and marvell machines.
The changes to the Documentation and MAINTAINERS files are for
clarification only"
* tag 'arm-fixes-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
firmware: arm_ffa: Avoid string-fortify warning caused by memcpy()
firmware: arm_scmi: Queue in scmi layer for mailbox implementation
firmware: arm_ffa: Avoid string-fortify warning in export_uuid()
firmware: arm_scmi: Give SMC transport precedence over mailbox
firmware: arm_scmi: Fix the double free in scmi_debugfs_common_setup()
Documentation/process: maintainer-soc: clarify submitting patches
dmaengine: cirrus: check that output may be truncated
dmaengine: cirrus: ERR_CAST() ioremap error
MAINTAINERS: use the canonical soc mailing list address and mark it as L:
ARM: dts: bcm2837-rpi-cm3-io3: Fix HDMI hpd-gpio pin
arm64: dts: marvell: cn9130-sr-som: fix cp0 mdio pin numbers
soc: fsl: cpm1: qmc: Fix unused data compilation warning
soc: fsl: cpm1: qmc: Do not use IS_ERR_VALUE() on error pointers
reset: starfive: jh71x0: Fix accessing the empty member on JH7110 SoC
reset: npcm: convert comma to semicolon
Linus Torvalds [Thu, 17 Oct 2024 16:36:59 +0000 (09:36 -0700)]
Merge tag 'sound-6.12-rc4' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of small fixes, nothing really stands out:
- Usual HD-audio quirks / device-specific fixes
- Kconfig dependency fix for UM
- A series of minor fixes for SoundWire
- Updates of USB-audio LINE6 contact address"
* tag 'sound-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/conexant - Use cached pin control for Node 0x1d on HP EliteOne 1000 G2
ALSA/hda: intel-sdw-acpi: add support for sdw-manager-list property read
ALSA/hda: intel-sdw-acpi: simplify sdw-master-count property read
ALSA/hda: intel-sdw-acpi: fetch fwnode once in sdw_intel_scan_controller()
ALSA/hda: intel-sdw-acpi: cleanup sdw_intel_scan_controller
ALSA: hda/tas2781: Add new quirk for Lenovo, ASUS, Dell projects
ALSA: scarlett2: Add error check after retrieving PEQ filter values
ALSA: hda/cs8409: Fix possible NULL dereference
sound: Make CONFIG_SND depend on INDIRECT_IOMEM instead of UML
ALSA: line6: update contact information
ALSA: usb-audio: Fix NULL pointer deref in snd_usb_power_domain_set()
ALSA: hda/conexant - Fix audio routing for HP EliteOne 1000 G2
ALSA: hda: Sound support for HP Spectre x360 16 inch model 2024
Linus Torvalds [Thu, 17 Oct 2024 16:31:18 +0000 (09:31 -0700)]
Merge tag 'net-6.12-rc4' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Current release - new code bugs:
- eth: mlx5: HWS, don't destroy more bwc queue locks than allocated
Previous releases - regressions:
- ipv4: give an IPv4 dev to blackhole_netdev
- udp: compute L4 checksum as usual when not segmenting the skb
- tcp/dccp: don't use timer_pending() in reqsk_queue_unlink().
- eth: mlx5e: don't call cleanup on profile rollback failure
- eth: microchip: vcap api: fix memory leaks in
vcap_api_encode_rule_test()
- eth: enetc: disable Tx BD rings after they are empty
- eth: macb: avoid 20s boot delay by skipping MDIO bus registration
for fixed-link PHY
Previous releases - always broken:
- posix-clock: fix missing timespec64 check in pc_clock_settime()
- genetlink: hold RCU in genlmsg_mcast()
- mptcp: prevent MPC handshake on port-based signal endpoints
- eth: vmxnet3: fix packet corruption in vmxnet3_xdp_xmit_frame
- eth: stmmac: dwmac-tegra: fix link bring-up sequence
- eth: bcmasp: fix potential memory leak in bcmasp_xmit()
Misc:
- add Andrew Lunn as a co-maintainer of all networking drivers"
* tag 'net-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (47 commits)
net/mlx5e: Don't call cleanup on profile rollback failure
net/mlx5: Unregister notifier on eswitch init failure
net/mlx5: Fix command bitmask initialization
net/mlx5: Check for invalid vector index on EQ creation
net/mlx5: HWS, use lock classes for bwc locks
net/mlx5: HWS, don't destroy more bwc queue locks than allocated
net/mlx5: HWS, fixed double free in error flow of definer layout
net/mlx5: HWS, removed wrong access to a number of rules variable
mptcp: pm: fix UaF read in mptcp_pm_nl_rm_addr_or_subflow
net: ethernet: mtk_eth_soc: fix memory corruption during fq dma init
vmxnet3: Fix packet corruption in vmxnet3_xdp_xmit_frame
net: dsa: vsc73xx: fix reception from VLAN-unaware bridges
net: ravb: Only advertise Rx/Tx timestamps if hardware supports it
net: microchip: vcap api: Fix memory leaks in vcap_api_encode_rule_test()
net: phy: mdio-bcm-unimac: Add BCM6846 support
dt-bindings: net: brcm,unimac-mdio: Add bcm6846-mdio
udp: Compute L4 checksum as usual when not segmenting the skb
genetlink: hold RCU in genlmsg_mcast()
net: dsa: mv88e6xxx: Fix the max_vid definition for the MV88E6361
tcp/dccp: Don't use timer_pending() in reqsk_queue_unlink().
...
Lorenzo Stoakes [Mon, 7 Oct 2024 15:28:33 +0000 (16:28 +0100)]
maple_tree: add regression test for spanning store bug
Add a regression test to assert that, when performing a spanning store
which consumes the entirety of the rightmost right leaf node does not
result in maple tree corruption when doing so.
This achieves this by building a test tree of 3 levels and establishing a
store which ultimately results in a spanned store of this nature.
Link: https://lkml.kernel.org/r/30cdc101a700d16e03ba2f9aa5d83f2efa894168.1728314403.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Cc: Bert Karwatzki <spasswolf@web.de>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Lorenzo Stoakes [Mon, 7 Oct 2024 15:28:32 +0000 (16:28 +0100)]
maple_tree: correct tree corruption on spanning store
Patch series "maple_tree: correct tree corruption on spanning store", v3.
There has been a nasty yet subtle maple tree corruption bug that appears
to have been in existence since the inception of the algorithm.
This bug seems far more likely to happen since commit
f8d112a4e657
("mm/mmap: avoid zeroing vma tree in mmap_region()"), which is the point
at which reports started to be submitted concerning this bug.
We were made definitely aware of the bug thanks to the kind efforts of
Bert Karwatzki who helped enormously in my being able to track this down
and identify the cause of it.
The bug arises when an attempt is made to perform a spanning store across
two leaf nodes, where the right leaf node is the rightmost child of the
shared parent, AND the store completely consumes the right-mode node.
This results in mas_wr_spanning_store() mitakenly duplicating the new and
existing entries at the maximum pivot within the range, and thus maple
tree corruption.
The fix patch corrects this by detecting this scenario and disallowing the
mistaken duplicate copy.
The fix patch commit message goes into great detail as to how this occurs.
This series also includes a test which reliably reproduces the issue, and
asserts that the fix works correctly.
Bert has kindly tested the fix and confirmed it resolved his issues. Also
Mikhail Gavrilov kindly reported what appears to be precisely the same
bug, which this fix should also resolve.
This patch (of 2):
There has been a subtle bug present in the maple tree implementation from
its inception.
This arises from how stores are performed - when a store occurs, it will
overwrite overlapping ranges and adjust the tree as necessary to
accommodate this.
A range may always ultimately span two leaf nodes. In this instance we
walk the two leaf nodes, determine which elements are not overwritten to
the left and to the right of the start and end of the ranges respectively
and then rebalance the tree to contain these entries and the newly
inserted one.
This kind of store is dubbed a 'spanning store' and is implemented by
mas_wr_spanning_store().
In order to reach this stage, mas_store_gfp() invokes
mas_wr_preallocate(), mas_wr_store_type() and mas_wr_walk() in turn to
walk the tree and update the object (mas) to traverse to the location
where the write should be performed, determining its store type.
When a spanning store is required, this function returns false stopping at
the parent node which contains the target range, and mas_wr_store_type()
marks the mas->store_type as wr_spanning_store to denote this fact.
When we go to perform the store in mas_wr_spanning_store(), we first
determine the elements AFTER the END of the range we wish to store (that
is, to the right of the entry to be inserted) - we do this by walking to
the NEXT pivot in the tree (i.e. r_mas.last + 1), starting at the node we
have just determined contains the range over which we intend to write.
We then turn our attention to the entries to the left of the entry we are
inserting, whose state is represented by l_mas, and copy these into a 'big
node', which is a special node which contains enough slots to contain two
leaf node's worth of data.
We then copy the entry we wish to store immediately after this - the copy
and the insertion of the new entry is performed by mas_store_b_node().
After this we copy the elements to the right of the end of the range which
we are inserting, if we have not exceeded the length of the node (i.e.
r_mas.offset <= r_mas.end).
Herein lies the bug - under very specific circumstances, this logic can
break and corrupt the maple tree.
Consider the following tree:
Height
0 Root Node
/ \
pivot = 0xffff / \ pivot = ULONG_MAX
/ \
1 A [-----] ...
/ \
pivot = 0x4fff / \ pivot = 0xffff
/ \
2 (LEAVES) B [-----] [-----] C
^--- Last pivot 0xffff.
Now imagine we wish to store an entry in the range [0x4000, 0xffff] (note
that all ranges expressed in maple tree code are inclusive):
1. mas_store_gfp() descends the tree, finds node A at <=0xffff, then
determines that this is a spanning store across nodes B and C. The mas
state is set such that the current node from which we traverse further
is node A.
2. In mas_wr_spanning_store() we try to find elements to the right of pivot
0xffff by searching for an index of 0x10000:
- mas_wr_walk_index() invokes mas_wr_walk_descend() and
mas_wr_node_walk() in turn.
- mas_wr_node_walk() loops over entries in node A until EITHER it
finds an entry whose pivot equals or exceeds 0x10000 OR it
reaches the final entry.
- Since no entry has a pivot equal to or exceeding 0x10000, pivot
0xffff is selected, leading to node C.
- mas_wr_walk_traverse() resets the mas state to traverse node C. We
loop around and invoke mas_wr_walk_descend() and mas_wr_node_walk()
in turn once again.
- Again, we reach the last entry in node C, which has a pivot of
0xffff.
3. We then copy the elements to the left of 0x4000 in node B to the big
node via mas_store_b_node(), and insert the new [0x4000, 0xffff] entry
too.
4. We determine whether we have any entries to copy from the right of the
end of the range via - and with r_mas set up at the entry at pivot
0xffff, r_mas.offset <= r_mas.end, and then we DUPLICATE the entry at
pivot 0xffff.
5. BUG! The maple tree is corrupted with a duplicate entry.
This requires a very specific set of circumstances - we must be spanning
the last element in a leaf node, which is the last element in the parent
node.
spanning store across two leaf nodes with a range that ends at that shared
pivot.
A potential solution to this problem would simply be to reset the walk
each time we traverse r_mas, however given the rarity of this situation it
seems that would be rather inefficient.
Instead, this patch detects if the right hand node is populated, i.e. has
anything we need to copy.
We do so by only copying elements from the right of the entry being
inserted when the maximum value present exceeds the last, rather than
basing this on offset position.
The patch also updates some comments and eliminates the unused bool return
value in mas_wr_walk_index().
The work performed in commit
f8d112a4e657 ("mm/mmap: avoid zeroing vma
tree in mmap_region()") seems to have made the probability of this event
much more likely, which is the point at which reports started to be
submitted concerning this bug.
The motivation for this change arose from Bert Karwatzki's report of
encountering mm instability after the release of kernel v6.12-rc1 which,
after the use of CONFIG_DEBUG_VM_MAPLE_TREE and similar configuration
options, was identified as maple tree corruption.
After Bert very generously provided his time and ability to reproduce this
event consistently, I was able to finally identify that the issue
discussed in this commit message was occurring for him.
Link: https://lkml.kernel.org/r/cover.1728314402.git.lorenzo.stoakes@oracle.com
Link: https://lkml.kernel.org/r/48b349a2a0f7c76e18772712d0997a5e12ab0a3b.1728314403.git.lorenzo.stoakes@oracle.com
Fixes:
54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reported-by: Bert Karwatzki <spasswolf@web.de>
Closes: https://lore.kernel.org/all/
20241001023402.3374-1-spasswolf@web.de/
Tested-by: Bert Karwatzki <spasswolf@web.de>
Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Closes: https://lore.kernel.org/all/CABXGCsOPwuoNOqSMmAvWO2Fz4TEmPnjFj-b7iF+XFRu1h7-+Dg@mail.gmail.com/
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Paolo Abeni [Thu, 17 Oct 2024 10:14:10 +0000 (12:14 +0200)]
Merge branch 'mlx5-misc-fixes-2024-10-15'
Tariq Toukan says:
====================
mlx5 misc fixes 2024-10-15
This patchset provides misc bug fixes from the team to the mlx5 core and
Eth drivers.
Series generated against:
commit
174714f0e505 ("selftests: drivers: net: fix name not defined")
====================
Link: https://patch.msgid.link/20241015093208.197603-1-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Cosmin Ratiu [Tue, 15 Oct 2024 09:32:08 +0000 (12:32 +0300)]
net/mlx5e: Don't call cleanup on profile rollback failure
When profile rollback fails in mlx5e_netdev_change_profile, the netdev
profile var is left set to NULL. Avoid a crash when unloading the driver
by not calling profile->cleanup in such a case.
This was encountered while testing, with the original trigger that
the wq rescuer thread creation got interrupted (presumably due to
Ctrl+C-ing modprobe), which gets converted to ENOMEM (-12) by
mlx5e_priv_init, the profile rollback also fails for the same reason
(signal still active) so the profile is left as NULL, leading to a crash
later in _mlx5e_remove.
[ 732.473932] mlx5_core 0000:08:00.1: E-Switch: Unload vfs: mode(OFFLOADS), nvfs(2), necvfs(0), active vports(2)
[ 734.525513] workqueue: Failed to create a rescuer kthread for wq "mlx5e": -EINTR
[ 734.557372] mlx5_core 0000:08:00.1: mlx5e_netdev_init_profile:6235:(pid 6086): mlx5e_priv_init failed, err=-12
[ 734.559187] mlx5_core 0000:08:00.1 eth3: mlx5e_netdev_change_profile: new profile init failed, -12
[ 734.560153] workqueue: Failed to create a rescuer kthread for wq "mlx5e": -EINTR
[ 734.589378] mlx5_core 0000:08:00.1: mlx5e_netdev_init_profile:6235:(pid 6086): mlx5e_priv_init failed, err=-12
[ 734.591136] mlx5_core 0000:08:00.1 eth3: mlx5e_netdev_change_profile: failed to rollback to orig profile, -12
[ 745.537492] BUG: kernel NULL pointer dereference, address:
0000000000000008
[ 745.538222] #PF: supervisor read access in kernel mode
<snipped>
[ 745.551290] Call Trace:
[ 745.551590] <TASK>
[ 745.551866] ? __die+0x20/0x60
[ 745.552218] ? page_fault_oops+0x150/0x400
[ 745.555307] ? exc_page_fault+0x79/0x240
[ 745.555729] ? asm_exc_page_fault+0x22/0x30
[ 745.556166] ? mlx5e_remove+0x6b/0xb0 [mlx5_core]
[ 745.556698] auxiliary_bus_remove+0x18/0x30
[ 745.557134] device_release_driver_internal+0x1df/0x240
[ 745.557654] bus_remove_device+0xd7/0x140
[ 745.558075] device_del+0x15b/0x3c0
[ 745.558456] mlx5_rescan_drivers_locked.part.0+0xb1/0x2f0 [mlx5_core]
[ 745.559112] mlx5_unregister_device+0x34/0x50 [mlx5_core]
[ 745.559686] mlx5_uninit_one+0x46/0xf0 [mlx5_core]
[ 745.560203] remove_one+0x4e/0xd0 [mlx5_core]
[ 745.560694] pci_device_remove+0x39/0xa0
[ 745.561112] device_release_driver_internal+0x1df/0x240
[ 745.561631] driver_detach+0x47/0x90
[ 745.562022] bus_remove_driver+0x84/0x100
[ 745.562444] pci_unregister_driver+0x3b/0x90
[ 745.562890] mlx5_cleanup+0xc/0x1b [mlx5_core]
[ 745.563415] __x64_sys_delete_module+0x14d/0x2f0
[ 745.563886] ? kmem_cache_free+0x1b0/0x460
[ 745.564313] ? lockdep_hardirqs_on_prepare+0xe2/0x190
[ 745.564825] do_syscall_64+0x6d/0x140
[ 745.565223] entry_SYSCALL_64_after_hwframe+0x4b/0x53
[ 745.565725] RIP: 0033:0x7f1579b1288b
Fixes:
3ef14e463f6e ("net/mlx5e: Separate between netdev objects and mlx5e profiles initialization")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Cosmin Ratiu [Tue, 15 Oct 2024 09:32:07 +0000 (12:32 +0300)]
net/mlx5: Unregister notifier on eswitch init failure
It otherwise remains registered and a subsequent attempt at eswitch
enabling might trigger warnings of the sort:
[ 682.589148] ------------[ cut here ]------------
[ 682.590204] notifier callback eswitch_vport_event [mlx5_core] already registered
[ 682.590256] WARNING: CPU: 13 PID: 2660 at kernel/notifier.c:31 notifier_chain_register+0x3e/0x90
[...snipped]
[ 682.610052] Call Trace:
[ 682.610369] <TASK>
[ 682.610663] ? __warn+0x7c/0x110
[ 682.611050] ? notifier_chain_register+0x3e/0x90
[ 682.611556] ? report_bug+0x148/0x170
[ 682.611977] ? handle_bug+0x36/0x70
[ 682.612384] ? exc_invalid_op+0x13/0x60
[ 682.612817] ? asm_exc_invalid_op+0x16/0x20
[ 682.613284] ? notifier_chain_register+0x3e/0x90
[ 682.613789] atomic_notifier_chain_register+0x25/0x40
[ 682.614322] mlx5_eswitch_enable_locked+0x1d4/0x3b0 [mlx5_core]
[ 682.614965] mlx5_eswitch_enable+0xc9/0x100 [mlx5_core]
[ 682.615551] mlx5_device_enable_sriov+0x25/0x340 [mlx5_core]
[ 682.616170] mlx5_core_sriov_configure+0x50/0x170 [mlx5_core]
[ 682.616789] sriov_numvfs_store+0xb0/0x1b0
[ 682.617248] kernfs_fop_write_iter+0x117/0x1a0
[ 682.617734] vfs_write+0x231/0x3f0
[ 682.618138] ksys_write+0x63/0xe0
[ 682.618536] do_syscall_64+0x4c/0x100
[ 682.618958] entry_SYSCALL_64_after_hwframe+0x4b/0x53
Fixes:
7624e58a8b3a ("net/mlx5: E-switch, register event handler before arming the event")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Shay Drory [Tue, 15 Oct 2024 09:32:06 +0000 (12:32 +0300)]
net/mlx5: Fix command bitmask initialization
Command bitmask have a dedicated bit for MANAGE_PAGES command, this bit
isn't Initialize during command bitmask Initialization, only during
MANAGE_PAGES.
In addition, mlx5_cmd_trigger_completions() is trying to trigger
completion for MANAGE_PAGES command as well.
Hence, in case health error occurred before any MANAGE_PAGES command
have been invoke (for example, during mlx5_enable_hca()),
mlx5_cmd_trigger_completions() will try to trigger completion for
MANAGE_PAGES command, which will result in null-ptr-deref error.[1]
Fix it by Initialize command bitmask correctly.
While at it, re-write the code for better understanding.
[1]
BUG: KASAN: null-ptr-deref in mlx5_cmd_trigger_completions+0x1db/0x600 [mlx5_core]
Write of size 4 at addr
0000000000000214 by task kworker/u96:2/12078
CPU: 10 PID: 12078 Comm: kworker/u96:2 Not tainted 6.9.0-rc2_for_upstream_debug_2024_04_07_19_01 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Workqueue: mlx5_health0000:08:00.0 mlx5_fw_fatal_reporter_err_work [mlx5_core]
Call Trace:
<TASK>
dump_stack_lvl+0x7e/0xc0
kasan_report+0xb9/0xf0
kasan_check_range+0xec/0x190
mlx5_cmd_trigger_completions+0x1db/0x600 [mlx5_core]
mlx5_cmd_flush+0x94/0x240 [mlx5_core]
enter_error_state+0x6c/0xd0 [mlx5_core]
mlx5_fw_fatal_reporter_err_work+0xf3/0x480 [mlx5_core]
process_one_work+0x787/0x1490
? lockdep_hardirqs_on_prepare+0x400/0x400
? pwq_dec_nr_in_flight+0xda0/0xda0
? assign_work+0x168/0x240
worker_thread+0x586/0xd30
? rescuer_thread+0xae0/0xae0
kthread+0x2df/0x3b0
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x2d/0x70
? kthread_complete_and_exit+0x20/0x20
ret_from_fork_asm+0x11/0x20
</TASK>
Fixes:
9b98d395b85d ("net/mlx5: Start health poll at earlier stage of driver load")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Maher Sanalla [Tue, 15 Oct 2024 09:32:05 +0000 (12:32 +0300)]
net/mlx5: Check for invalid vector index on EQ creation
Currently, mlx5 driver does not enforce vector index to be lower than
the maximum number of supported completion vectors when requesting a
new completion EQ. Thus, mlx5_comp_eqn_get() fails when trying to
acquire an IRQ with an improper vector index.
To prevent the case above, enforce that vector index value is
valid and lower than maximum in mlx5_comp_eqn_get() before handling the
request.
Fixes:
f14c1a14e632 ("net/mlx5: Allocate completion EQs dynamically")
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Cosmin Ratiu [Tue, 15 Oct 2024 09:32:04 +0000 (12:32 +0300)]
net/mlx5: HWS, use lock classes for bwc locks
The HWS BWC API uses one lock per queue and usually acquires one of
them, except when doing changes which require locking all queues in
order. Naturally, lockdep isn't too happy about acquiring the same lock
class multiple times, so inform it that each queue lock is a different
class to avoid false positives.
Fixes:
2ca62599aa0b ("net/mlx5: HWS, added send engine and context handling")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Cosmin Ratiu [Tue, 15 Oct 2024 09:32:03 +0000 (12:32 +0300)]
net/mlx5: HWS, don't destroy more bwc queue locks than allocated
hws_send_queues_bwc_locks_destroy destroyed more queue locks than
allocated, leading to memory corruption (occasionally) and warnings such
as DEBUG_LOCKS_WARN_ON(mutex_is_locked(lock)) in __mutex_destroy because
sometimes, the 'mutex' being destroyed was random memory.
The severity of this problem is proportional to the number of queues
configured because the code overreaches beyond the end of the
bwc_send_queue_locks array by 2x its length.
Fix that by using the correct number of bwc queues.
Fixes:
2ca62599aa0b ("net/mlx5: HWS, added send engine and context handling")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Yevgeny Kliteynik [Tue, 15 Oct 2024 09:32:02 +0000 (12:32 +0300)]
net/mlx5: HWS, fixed double free in error flow of definer layout
Fix error flow bug that could lead to double free of a buffer
during a failure to calculate a suitable definer layout.
Fixes:
74a778b4a63f ("net/mlx5: HWS, added definers handling")
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Itamar Gozlan <igozlan@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Yevgeny Kliteynik [Tue, 15 Oct 2024 09:32:01 +0000 (12:32 +0300)]
net/mlx5: HWS, removed wrong access to a number of rules variable
Removed wrong access to the num_of_rules field of the matcher.
This is a usual u32 variable, but the access was as if it was atomic.
This fixes the following CI warnings:
mlx5hws_bwc.c:708:17: warning: large atomic operation may incur significant performance penalty;
the access size (4 bytes) exceeds the max lock-free size (0 bytes) [-Watomic-alignment]
Fixes:
510f9f61a112 ("net/mlx5: HWS, added API and enabled HWS support")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/
202409291101.6NdtMFVC-lkp@intel.com/
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Itamar Gozlan <igozlan@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Matthieu Baerts (NGI0) [Tue, 15 Oct 2024 08:38:47 +0000 (10:38 +0200)]
mptcp: pm: fix UaF read in mptcp_pm_nl_rm_addr_or_subflow
Syzkaller reported this splat:
==================================================================
BUG: KASAN: slab-use-after-free in mptcp_pm_nl_rm_addr_or_subflow+0xb44/0xcc0 net/mptcp/pm_netlink.c:881
Read of size 4 at addr
ffff8880569ac858 by task syz.1.2799/14662
CPU: 0 UID: 0 PID: 14662 Comm: syz.1.2799 Not tainted
6.12.0-rc2-syzkaller-00307-g36c254515dc6 #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:377 [inline]
print_report+0xc3/0x620 mm/kasan/report.c:488
kasan_report+0xd9/0x110 mm/kasan/report.c:601
mptcp_pm_nl_rm_addr_or_subflow+0xb44/0xcc0 net/mptcp/pm_netlink.c:881
mptcp_pm_nl_rm_subflow_received net/mptcp/pm_netlink.c:914 [inline]
mptcp_nl_remove_id_zero_address+0x305/0x4a0 net/mptcp/pm_netlink.c:1572
mptcp_pm_nl_del_addr_doit+0x5c9/0x770 net/mptcp/pm_netlink.c:1603
genl_family_rcv_msg_doit+0x202/0x2f0 net/netlink/genetlink.c:1115
genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline]
genl_rcv_msg+0x565/0x800 net/netlink/genetlink.c:1210
netlink_rcv_skb+0x165/0x410 net/netlink/af_netlink.c:2551
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1219
netlink_unicast_kernel net/netlink/af_netlink.c:1331 [inline]
netlink_unicast+0x53c/0x7f0 net/netlink/af_netlink.c:1357
netlink_sendmsg+0x8b8/0xd70 net/netlink/af_netlink.c:1901
sock_sendmsg_nosec net/socket.c:729 [inline]
__sock_sendmsg net/socket.c:744 [inline]
____sys_sendmsg+0x9ae/0xb40 net/socket.c:2607
___sys_sendmsg+0x135/0x1e0 net/socket.c:2661
__sys_sendmsg+0x117/0x1f0 net/socket.c:2690
do_syscall_32_irqs_on arch/x86/entry/common.c:165 [inline]
__do_fast_syscall_32+0x73/0x120 arch/x86/entry/common.c:386
do_fast_syscall_32+0x32/0x80 arch/x86/entry/common.c:411
entry_SYSENTER_compat_after_hwframe+0x84/0x8e
RIP: 0023:0xf7fe4579
Code: b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 00 00 00 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d b4 26 00 00 00 00 8d b4 26 00 00 00 00
RSP: 002b:
00000000f574556c EFLAGS:
00000296 ORIG_RAX:
0000000000000172
RAX:
ffffffffffffffda RBX:
000000000000000b RCX:
0000000020000140
RDX:
0000000000000000 RSI:
0000000000000000 RDI:
0000000000000000
RBP:
0000000000000000 R08:
0000000000000000 R09:
0000000000000000
R10:
0000000000000000 R11:
0000000000000296 R12:
0000000000000000
R13:
0000000000000000 R14:
0000000000000000 R15:
0000000000000000
</TASK>
Allocated by task 5387:
kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
kasan_save_track+0x14/0x30 mm/kasan/common.c:68
poison_kmalloc_redzone mm/kasan/common.c:377 [inline]
__kasan_kmalloc+0xaa/0xb0 mm/kasan/common.c:394
kmalloc_noprof include/linux/slab.h:878 [inline]
kzalloc_noprof include/linux/slab.h:1014 [inline]
subflow_create_ctx+0x87/0x2a0 net/mptcp/subflow.c:1803
subflow_ulp_init+0xc3/0x4d0 net/mptcp/subflow.c:1956
__tcp_set_ulp net/ipv4/tcp_ulp.c:146 [inline]
tcp_set_ulp+0x326/0x7f0 net/ipv4/tcp_ulp.c:167
mptcp_subflow_create_socket+0x4ae/0x10a0 net/mptcp/subflow.c:1764
__mptcp_subflow_connect+0x3cc/0x1490 net/mptcp/subflow.c:1592
mptcp_pm_create_subflow_or_signal_addr+0xbda/0x23a0 net/mptcp/pm_netlink.c:642
mptcp_pm_nl_fully_established net/mptcp/pm_netlink.c:650 [inline]
mptcp_pm_nl_work+0x3a1/0x4f0 net/mptcp/pm_netlink.c:943
mptcp_worker+0x15a/0x1240 net/mptcp/protocol.c:2777
process_one_work+0x958/0x1b30 kernel/workqueue.c:3229
process_scheduled_works kernel/workqueue.c:3310 [inline]
worker_thread+0x6c8/0xf00 kernel/workqueue.c:3391
kthread+0x2c1/0x3a0 kernel/kthread.c:389
ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
Freed by task 113:
kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
kasan_save_track+0x14/0x30 mm/kasan/common.c:68
kasan_save_free_info+0x3b/0x60 mm/kasan/generic.c:579
poison_slab_object mm/kasan/common.c:247 [inline]
__kasan_slab_free+0x51/0x70 mm/kasan/common.c:264
kasan_slab_free include/linux/kasan.h:230 [inline]
slab_free_hook mm/slub.c:2342 [inline]
slab_free mm/slub.c:4579 [inline]
kfree+0x14f/0x4b0 mm/slub.c:4727
kvfree+0x47/0x50 mm/util.c:701
kvfree_rcu_list+0xf5/0x2c0 kernel/rcu/tree.c:3423
kvfree_rcu_drain_ready kernel/rcu/tree.c:3563 [inline]
kfree_rcu_monitor+0x503/0x8b0 kernel/rcu/tree.c:3632
kfree_rcu_shrink_scan+0x245/0x3a0 kernel/rcu/tree.c:3966
do_shrink_slab+0x44f/0x11c0 mm/shrinker.c:435
shrink_slab+0x32b/0x12a0 mm/shrinker.c:662
shrink_one+0x47e/0x7b0 mm/vmscan.c:4818
shrink_many mm/vmscan.c:4879 [inline]
lru_gen_shrink_node mm/vmscan.c:4957 [inline]
shrink_node+0x2452/0x39d0 mm/vmscan.c:5937
kswapd_shrink_node mm/vmscan.c:6765 [inline]
balance_pgdat+0xc19/0x18f0 mm/vmscan.c:6957
kswapd+0x5ea/0xbf0 mm/vmscan.c:7226
kthread+0x2c1/0x3a0 kernel/kthread.c:389
ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
Last potentially related work creation:
kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
__kasan_record_aux_stack+0xba/0xd0 mm/kasan/generic.c:541
kvfree_call_rcu+0x74/0xbe0 kernel/rcu/tree.c:3810
subflow_ulp_release+0x2ae/0x350 net/mptcp/subflow.c:2009
tcp_cleanup_ulp+0x7c/0x130 net/ipv4/tcp_ulp.c:124
tcp_v4_destroy_sock+0x1c5/0x6a0 net/ipv4/tcp_ipv4.c:2541
inet_csk_destroy_sock+0x1a3/0x440 net/ipv4/inet_connection_sock.c:1293
tcp_done+0x252/0x350 net/ipv4/tcp.c:4870
tcp_rcv_state_process+0x379b/0x4f30 net/ipv4/tcp_input.c:6933
tcp_v4_do_rcv+0x1ad/0xa90 net/ipv4/tcp_ipv4.c:1938
sk_backlog_rcv include/net/sock.h:1115 [inline]
__release_sock+0x31b/0x400 net/core/sock.c:3072
__tcp_close+0x4f3/0xff0 net/ipv4/tcp.c:3142
__mptcp_close_ssk+0x331/0x14d0 net/mptcp/protocol.c:2489
mptcp_close_ssk net/mptcp/protocol.c:2543 [inline]
mptcp_close_ssk+0x150/0x220 net/mptcp/protocol.c:2526
mptcp_pm_nl_rm_addr_or_subflow+0x2be/0xcc0 net/mptcp/pm_netlink.c:878
mptcp_pm_nl_rm_subflow_received net/mptcp/pm_netlink.c:914 [inline]
mptcp_nl_remove_id_zero_address+0x305/0x4a0 net/mptcp/pm_netlink.c:1572
mptcp_pm_nl_del_addr_doit+0x5c9/0x770 net/mptcp/pm_netlink.c:1603
genl_family_rcv_msg_doit+0x202/0x2f0 net/netlink/genetlink.c:1115
genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline]
genl_rcv_msg+0x565/0x800 net/netlink/genetlink.c:1210
netlink_rcv_skb+0x165/0x410 net/netlink/af_netlink.c:2551
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1219
netlink_unicast_kernel net/netlink/af_netlink.c:1331 [inline]
netlink_unicast+0x53c/0x7f0 net/netlink/af_netlink.c:1357
netlink_sendmsg+0x8b8/0xd70 net/netlink/af_netlink.c:1901
sock_sendmsg_nosec net/socket.c:729 [inline]
__sock_sendmsg net/socket.c:744 [inline]
____sys_sendmsg+0x9ae/0xb40 net/socket.c:2607
___sys_sendmsg+0x135/0x1e0 net/socket.c:2661
__sys_sendmsg+0x117/0x1f0 net/socket.c:2690
do_syscall_32_irqs_on arch/x86/entry/common.c:165 [inline]
__do_fast_syscall_32+0x73/0x120 arch/x86/entry/common.c:386
do_fast_syscall_32+0x32/0x80 arch/x86/entry/common.c:411
entry_SYSENTER_compat_after_hwframe+0x84/0x8e
The buggy address belongs to the object at
ffff8880569ac800
which belongs to the cache kmalloc-512 of size 512
The buggy address is located 88 bytes inside of
freed 512-byte region [
ffff8880569ac800,
ffff8880569aca00)
The buggy address belongs to the physical page:
page: refcount:1 mapcount:0 mapping:
0000000000000000 index:0x0 pfn:0x569ac
head: order:2 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x4fff00000000040(head|node=1|zone=1|lastcpupid=0x7ff)
page_type: f5(slab)
raw:
04fff00000000040 ffff88801ac42c80 dead000000000100 dead000000000122
raw:
0000000000000000 0000000080100010 00000001f5000000 0000000000000000
head:
04fff00000000040 ffff88801ac42c80 dead000000000100 dead000000000122
head:
0000000000000000 0000000080100010 00000001f5000000 0000000000000000
head:
04fff00000000002 ffffea00015a6b01 ffffffffffffffff 0000000000000000
head:
0000000000000004 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 2, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 10238, tgid 10238 (kworker/u32:6), ts
597403252405, free_ts
597177952947
set_page_owner include/linux/page_owner.h:32 [inline]
post_alloc_hook+0x2d1/0x350 mm/page_alloc.c:1537
prep_new_page mm/page_alloc.c:1545 [inline]
get_page_from_freelist+0x101e/0x3070 mm/page_alloc.c:3457
__alloc_pages_noprof+0x223/0x25a0 mm/page_alloc.c:4733
alloc_pages_mpol_noprof+0x2c9/0x610 mm/mempolicy.c:2265
alloc_slab_page mm/slub.c:2412 [inline]
allocate_slab mm/slub.c:2578 [inline]
new_slab+0x2ba/0x3f0 mm/slub.c:2631
___slab_alloc+0xd1d/0x16f0 mm/slub.c:3818
__slab_alloc.constprop.0+0x56/0xb0 mm/slub.c:3908
__slab_alloc_node mm/slub.c:3961 [inline]
slab_alloc_node mm/slub.c:4122 [inline]
__kmalloc_cache_noprof+0x2c5/0x310 mm/slub.c:4290
kmalloc_noprof include/linux/slab.h:878 [inline]
kzalloc_noprof include/linux/slab.h:1014 [inline]
mld_add_delrec net/ipv6/mcast.c:743 [inline]
igmp6_leave_group net/ipv6/mcast.c:2625 [inline]
igmp6_group_dropped+0x4ab/0xe40 net/ipv6/mcast.c:723
__ipv6_dev_mc_dec+0x281/0x360 net/ipv6/mcast.c:979
addrconf_leave_solict net/ipv6/addrconf.c:2253 [inline]
__ipv6_ifa_notify+0x3f6/0xc30 net/ipv6/addrconf.c:6283
addrconf_ifdown.isra.0+0xef9/0x1a20 net/ipv6/addrconf.c:3982
addrconf_notify+0x220/0x19c0 net/ipv6/addrconf.c:3781
notifier_call_chain+0xb9/0x410 kernel/notifier.c:93
call_netdevice_notifiers_info+0xbe/0x140 net/core/dev.c:1996
call_netdevice_notifiers_extack net/core/dev.c:2034 [inline]
call_netdevice_notifiers net/core/dev.c:2048 [inline]
dev_close_many+0x333/0x6a0 net/core/dev.c:1589
page last free pid 13136 tgid 13136 stack trace:
reset_page_owner include/linux/page_owner.h:25 [inline]
free_pages_prepare mm/page_alloc.c:1108 [inline]
free_unref_page+0x5f4/0xdc0 mm/page_alloc.c:2638
stack_depot_save_flags+0x2da/0x900 lib/stackdepot.c:666
kasan_save_stack+0x42/0x60 mm/kasan/common.c:48
kasan_save_track+0x14/0x30 mm/kasan/common.c:68
unpoison_slab_object mm/kasan/common.c:319 [inline]
__kasan_slab_alloc+0x89/0x90 mm/kasan/common.c:345
kasan_slab_alloc include/linux/kasan.h:247 [inline]
slab_post_alloc_hook mm/slub.c:4085 [inline]
slab_alloc_node mm/slub.c:4134 [inline]
kmem_cache_alloc_noprof+0x121/0x2f0 mm/slub.c:4141
skb_clone+0x190/0x3f0 net/core/skbuff.c:2084
do_one_broadcast net/netlink/af_netlink.c:1462 [inline]
netlink_broadcast_filtered+0xb11/0xef0 net/netlink/af_netlink.c:1540
netlink_broadcast+0x39/0x50 net/netlink/af_netlink.c:1564
uevent_net_broadcast_untagged lib/kobject_uevent.c:331 [inline]
kobject_uevent_net_broadcast lib/kobject_uevent.c:410 [inline]
kobject_uevent_env+0xacd/0x1670 lib/kobject_uevent.c:608
device_del+0x623/0x9f0 drivers/base/core.c:3882
snd_card_disconnect.part.0+0x58a/0x7c0 sound/core/init.c:546
snd_card_disconnect+0x1f/0x30 sound/core/init.c:495
snd_usx2y_disconnect+0xe9/0x1f0 sound/usb/usx2y/usbusx2y.c:417
usb_unbind_interface+0x1e8/0x970 drivers/usb/core/driver.c:461
device_remove drivers/base/dd.c:569 [inline]
device_remove+0x122/0x170 drivers/base/dd.c:561
That's because 'subflow' is used just after 'mptcp_close_ssk(subflow)',
which will initiate the release of its memory. Even if it is very likely
the release and the re-utilisation will be done later on, it is of
course better to avoid any issues and read the content of 'subflow'
before closing it.
Fixes:
1c1f72137598 ("mptcp: pm: only decrement add_addr_accepted for MPJ req")
Cc: stable@vger.kernel.org
Reported-by: syzbot+3c8b7a8e7df6a2a226ca@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/
670d7337.
050a0220.4cbc0.004f.GAE@google.com
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://patch.msgid.link/20241015-net-mptcp-uaf-pm-rm-v1-1-c4ee5d987a64@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Felix Fietkau [Tue, 15 Oct 2024 08:17:55 +0000 (10:17 +0200)]
net: ethernet: mtk_eth_soc: fix memory corruption during fq dma init
The loop responsible for allocating up to MTK_FQ_DMA_LENGTH buffers must
only touch as many descriptors, otherwise it ends up corrupting unrelated
memory. Fix the loop iteration count accordingly.
Fixes:
c57e55819443 ("net: ethernet: mtk_eth_soc: handle dma buffer size soc specific")
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241015081755.31060-1-nbd@nbd.name
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Daniel Borkmann [Mon, 14 Oct 2024 19:03:11 +0000 (21:03 +0200)]
vmxnet3: Fix packet corruption in vmxnet3_xdp_xmit_frame
Andrew and Nikolay reported connectivity issues with Cilium's service
load-balancing in case of vmxnet3.
If a BPF program for native XDP adds an encapsulation header such as
IPIP and transmits the packet out the same interface, then in case
of vmxnet3 a corrupted packet is being sent and subsequently dropped
on the path.
vmxnet3_xdp_xmit_frame() which is called e.g. via vmxnet3_run_xdp()
through vmxnet3_xdp_xmit_back() calculates an incorrect DMA address:
page = virt_to_page(xdpf->data);
tbi->dma_addr = page_pool_get_dma_addr(page) +
VMXNET3_XDP_HEADROOM;
dma_sync_single_for_device(&adapter->pdev->dev,
tbi->dma_addr, buf_size,
DMA_TO_DEVICE);
The above assumes a fixed offset (VMXNET3_XDP_HEADROOM), but the XDP
BPF program could have moved xdp->data. While the passed buf_size is
correct (xdpf->len), the dma_addr needs to have a dynamic offset which
can be calculated as xdpf->data - (void *)xdpf, that is, xdp->data -
xdp->data_hard_start.
Fixes:
54f00cce1178 ("vmxnet3: Add XDP support.")
Reported-by: Andrew Sauber <andrew.sauber@isovalent.com>
Reported-by: Nikolay Nikolaev <nikolay.nikolaev@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Nikolay Nikolaev <nikolay.nikolaev@isovalent.com>
Acked-by: Anton Protopopov <aspsk@isovalent.com>
Cc: William Tu <witu@nvidia.com>
Cc: Ronak Doshi <ronak.doshi@broadcom.com>
Link: https://patch.msgid.link/a0888656d7f09028f9984498cc698bb5364d89fc.1728931137.git.daniel@iogearbox.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Wei Xu [Mon, 14 Oct 2024 22:12:11 +0000 (22:12 +0000)]
mm/mglru: only clear kswapd_failures if reclaimable
lru_gen_shrink_node() unconditionally clears kswapd_failures, which can
prevent kswapd from sleeping and cause 100% kswapd cpu usage even when
kswapd repeatedly fails to make progress in reclaim.
Only clear kswap_failures in lru_gen_shrink_node() if reclaim makes some
progress, similar to shrink_node().
I happened to run into this problem in one of my tests recently. It
requires a combination of several conditions: The allocator needs to
allocate a right amount of pages such that it can wake up kswapd
without itself being OOM killed; there is no memory for kswapd to
reclaim (My test disables swap and cleans page cache first); no other
process frees enough memory at the same time.
Link: https://lkml.kernel.org/r/20241014221211.832591-1-weixugc@google.com
Fixes:
e4dde56cd208 ("mm: multi-gen LRU: per-node lru_gen_folio lists")
Signed-off-by: Wei Xu <weixugc@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Jan Alexander Steffens <heftig@archlinux.org>
Cc: Suleiman Souhlal <suleiman@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Liu Shixin [Tue, 15 Oct 2024 01:45:21 +0000 (09:45 +0800)]
mm/swapfile: skip HugeTLB pages for unuse_vma
I got a bad pud error and lost a 1GB HugeTLB when calling swapoff. The
problem can be reproduced by the following steps:
1. Allocate an anonymous 1GB HugeTLB and some other anonymous memory.
2. Swapout the above anonymous memory.
3. run swapoff and we will get a bad pud error in kernel message:
mm/pgtable-generic.c:42: bad pud
00000000743d215d(
84000001400000e7)
We can tell that pud_clear_bad is called by pud_none_or_clear_bad in
unuse_pud_range() by ftrace. And therefore the HugeTLB pages will never
be freed because we lost it from page table. We can skip HugeTLB pages
for unuse_vma to fix it.
Link: https://lkml.kernel.org/r/20241015014521.570237-1-liushixin2@huawei.com
Fixes:
0fe6e20b9c4c ("hugetlb, rmap: add reverse mapping for hugepage")
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Acked-by: Muchun Song <muchun.song@linux.dev>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Nanyong Sun [Tue, 15 Oct 2024 02:02:57 +0000 (10:02 +0800)]
selftests: mm: fix the incorrect usage() info of khugepaged
The mount option of tmpfs should be huge=advise, not madvise which is not
supported and may mislead the users.
Link: https://lkml.kernel.org/r/20241015020257.139235-1-sunnanyong@huawei.com
Fixes:
1b03d0d558a2 ("selftests/vm: add thp collapse file and tmpfs testing")
Signed-off-by: Nanyong Sun <sunnanyong@huawei.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jann Horn [Mon, 14 Oct 2024 20:50:57 +0000 (22:50 +0200)]
MAINTAINERS: add Jann as memory mapping/VMA reviewer
Add myself as a reviewer for memory mapping / VMA code. I will probably
only reply to patches sporadically, but hopefully this will help me keep
up with changes that look interesting security-wise.
Link: https://lkml.kernel.org/r/20241014-maintainers-mmap-reviewer-v1-1-50dce0514752@google.com
Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jeongjun Park [Mon, 7 Oct 2024 07:06:23 +0000 (16:06 +0900)]
mm: swap: prevent possible data-race in __try_to_reclaim_swap
A report [1] was uploaded from syzbot.
In the previous commit
862590ac3708 ("mm: swap: allow cache reclaim to
skip slot cache"), the __try_to_reclaim_swap() function reads offset and
folio->entry from folio without folio_lock protection.
In the currently reported KCSAN log, it is assumed that the actual
data-race will not occur because the calltrace that does WRITE already
obtains the folio_lock and then writes.
However, the existing __try_to_reclaim_swap() function was already
implemented to perform reads under folio_lock protection [1], and there is
a risk of a data-race occurring through a function other than the one
shown in the KCSAN log.
Therefore, I think it is appropriate to change
read operations for folio to be performed under folio_lock.
[1]
==================================================================
BUG: KCSAN: data-race in __delete_from_swap_cache / __try_to_reclaim_swap
write to 0xffffea0004c90328 of 8 bytes by task 5186 on cpu 0:
__delete_from_swap_cache+0x1f0/0x290 mm/swap_state.c:163
delete_from_swap_cache+0x72/0xe0 mm/swap_state.c:243
folio_free_swap+0x1d8/0x1f0 mm/swapfile.c:1850
free_swap_cache mm/swap_state.c:293 [inline]
free_pages_and_swap_cache+0x1fc/0x410 mm/swap_state.c:325
__tlb_batch_free_encoded_pages mm/mmu_gather.c:136 [inline]
tlb_batch_pages_flush mm/mmu_gather.c:149 [inline]
tlb_flush_mmu_free mm/mmu_gather.c:366 [inline]
tlb_flush_mmu+0x2cf/0x440 mm/mmu_gather.c:373
zap_pte_range mm/memory.c:1700 [inline]
zap_pmd_range mm/memory.c:1739 [inline]
zap_pud_range mm/memory.c:1768 [inline]
zap_p4d_range mm/memory.c:1789 [inline]
unmap_page_range+0x1f3c/0x22d0 mm/memory.c:1810
unmap_single_vma+0x142/0x1d0 mm/memory.c:1856
unmap_vmas+0x18d/0x2b0 mm/memory.c:1900
exit_mmap+0x18a/0x690 mm/mmap.c:1864
__mmput+0x28/0x1b0 kernel/fork.c:1347
mmput+0x4c/0x60 kernel/fork.c:1369
exit_mm+0xe4/0x190 kernel/exit.c:571
do_exit+0x55e/0x17f0 kernel/exit.c:926
do_group_exit+0x102/0x150 kernel/exit.c:1088
get_signal+0xf2a/0x1070 kernel/signal.c:2917
arch_do_signal_or_restart+0x95/0x4b0 arch/x86/kernel/signal.c:337
exit_to_user_mode_loop kernel/entry/common.c:111 [inline]
exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline]
__syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
syscall_exit_to_user_mode+0x59/0x130 kernel/entry/common.c:218
do_syscall_64+0xd6/0x1c0 arch/x86/entry/common.c:89
entry_SYSCALL_64_after_hwframe+0x77/0x7f
read to 0xffffea0004c90328 of 8 bytes by task 5189 on cpu 1:
__try_to_reclaim_swap+0x9d/0x510 mm/swapfile.c:198
free_swap_and_cache_nr+0x45d/0x8a0 mm/swapfile.c:1915
zap_pte_range mm/memory.c:1656 [inline]
zap_pmd_range mm/memory.c:1739 [inline]
zap_pud_range mm/memory.c:1768 [inline]
zap_p4d_range mm/memory.c:1789 [inline]
unmap_page_range+0xcf8/0x22d0 mm/memory.c:1810
unmap_single_vma+0x142/0x1d0 mm/memory.c:1856
unmap_vmas+0x18d/0x2b0 mm/memory.c:1900
exit_mmap+0x18a/0x690 mm/mmap.c:1864
__mmput+0x28/0x1b0 kernel/fork.c:1347
mmput+0x4c/0x60 kernel/fork.c:1369
exit_mm+0xe4/0x190 kernel/exit.c:571
do_exit+0x55e/0x17f0 kernel/exit.c:926
__do_sys_exit kernel/exit.c:1055 [inline]
__se_sys_exit kernel/exit.c:1053 [inline]
__x64_sys_exit+0x1f/0x20 kernel/exit.c:1053
x64_sys_call+0x2d46/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:61
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xc9/0x1c0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
value changed: 0x0000000000000242 -> 0x0000000000000000
Link: https://lkml.kernel.org/r/20241007070623.23340-1-aha310510@gmail.com
Reported-by: syzbot+fa43f1b63e3aa6f66329@syzkaller.appspotmail.com
Fixes:
862590ac3708 ("mm: swap: allow cache reclaim to skip slot cache")
Signed-off-by: Jeongjun Park <aha310510@gmail.com>
Acked-by: Chris Li <chrisl@kernel.org>
Reviewed-by: Kairui Song <kasong@tencent.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Baolin Wang [Mon, 14 Oct 2024 10:24:44 +0000 (18:24 +0800)]
mm: khugepaged: fix the incorrect statistics when collapsing large file folios
Khugepaged already supports collapsing file large folios (including shmem
mTHP) by commit
7de856ffd007 ("mm: khugepaged: support shmem mTHP
collapse"), and the control parameters in khugepaged:
'khugepaged_max_ptes_swap' and 'khugepaged_max_ptes_none', still compare
based on PTE granularity to determine whether a file collapse is needed.
However, the statistics for 'present' and 'swap' in
hpage_collapse_scan_file() do not take into account the large folios,
which may lead to incorrect judgments regarding the
khugepaged_max_ptes_swap/none parameters, resulting in unnecessary file
collapses.
To fix this issue, take into account the large folios' statistics for
'present' and 'swap' variables in the hpage_collapse_scan_file().
Link: https://lkml.kernel.org/r/c76305d96d12d030a1a346b50503d148364246d2.1728901391.git.baolin.wang@linux.alibaba.com
Fixes:
7de856ffd007 ("mm: khugepaged: support shmem mTHP collapse")
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Barry Song <baohua@kernel.org>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andrey Konovalov [Sat, 12 Oct 2024 22:55:24 +0000 (00:55 +0200)]
MAINTAINERS: kasan, kcov: add bugzilla links
Add links to the Bugzilla component that's used to track KASAN and KCOV
issues.
Link: https://lkml.kernel.org/r/20241012225524.117871-1-andrey.konovalov@linux.dev
Signed-off-by: Andrey Konovalov <andreyknvl@gmail.com>
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Marco Elver <elver@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
David Hildenbrand [Fri, 11 Oct 2024 10:24:45 +0000 (12:24 +0200)]
mm: don't install PMD mappings when THPs are disabled by the hw/process/vma
We (or rather, readahead logic :) ) might be allocating a THP in the
pagecache and then try mapping it into a process that explicitly disabled
THP: we might end up installing PMD mappings.
This is a problem for s390x KVM, which explicitly remaps all PMD-mapped
THPs to be PTE-mapped in s390_enable_sie()->thp_split_mm(), before
starting the VM.
For example, starting a VM backed on a file system with large folios
supported makes the VM crash when the VM tries accessing such a mapping
using KVM.
Is it also a problem when the HW disabled THP using
TRANSPARENT_HUGEPAGE_UNSUPPORTED? At least on x86 this would be the case
without X86_FEATURE_PSE.
In the future, we might be able to do better on s390x and only disallow
PMD mappings -- what s390x and likely TRANSPARENT_HUGEPAGE_UNSUPPORTED
really wants. For now, fix it by essentially performing the same check as
would be done in __thp_vma_allowable_orders() or in shmem code, where this
works as expected, and disallow PMD mappings, making us fallback to PTE
mappings.
Link: https://lkml.kernel.org/r/20241011102445.934409-3-david@redhat.com
Fixes:
793917d997df ("mm/readahead: Add large folio readahead")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: Leo Fu <bfu@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Cc: Thomas Huth <thuth@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kefeng Wang [Fri, 11 Oct 2024 10:24:44 +0000 (12:24 +0200)]
mm: huge_memory: add vma_thp_disabled() and thp_disabled_by_hw()
Patch series "mm: don't install PMD mappings when THPs are disabled by the
hw/process/vma".
During testing, it was found that we can get PMD mappings in processes
where THP (and more precisely, PMD mappings) are supposed to be disabled.
While it works as expected for anon+shmem, the pagecache is the
problematic bit.
For s390 KVM this currently means that a VM backed by a file located on
filesystem with large folio support can crash when KVM tries accessing the
problematic page, because the readahead logic might decide to use a
PMD-sized THP and faulting it into the page tables will install a PMD
mapping, something that s390 KVM cannot tolerate.
This might also be a problem with HW that does not support PMD mappings,
but I did not try reproducing it.
Fix it by respecting the ways to disable THPs when deciding whether we can
install a PMD mapping. khugepaged should already be taking care of not
collapsing if THPs are effectively disabled for the hw/process/vma.
This patch (of 2):
Add vma_thp_disabled() and thp_disabled_by_hw() helpers to be shared by
shmem_allowable_huge_orders() and __thp_vma_allowable_orders().
[david@redhat.com: rename to vma_thp_disabled(), split out thp_disabled_by_hw() ]
Link: https://lkml.kernel.org/r/20241011102445.934409-2-david@redhat.com
Fixes:
793917d997df ("mm/readahead: Add large folio readahead")
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: Leo Fu <bfu@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Cc: Boqiao Fu <bfu@redhat.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
SeongJae Park [Fri, 11 Oct 2024 17:01:54 +0000 (10:01 -0700)]
Docs/damon/maintainer-profile: update deprecated awslabs GitHub URLs
DAMON GitHub repos have moved from awslabs GitHub org to damonitor org[1].
Following the change, URLs on documents are also updated[2]. However,
commit
2e9b3d6e2e59 ("Docs/damon/maintainer-profile: add links in place"),
which was added just after the update, was using the deprecated GitHub
URLs. Update those to use damonitor GitHub URLs instead.
[1] https://lore.kernel.org/
20240813232158.83903-1-sj@kernel.org
[2] https://lore.kernel.org/
20240826015741.80707-2-sj@kernel.org
Link: https://lkml.kernel.org/r/20241011170154.70651-3-sj@kernel.org
Fixes:
2e9b3d6e2e59 ("Docs/damon/maintainer-profile: add links in place")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
SeongJae Park [Fri, 11 Oct 2024 17:01:53 +0000 (10:01 -0700)]
Docs/damon/maintainer-profile: add missing '_' suffixes for external web links
Patch series "Docs/damon/maintainer-profile: a couple of minor hotfixes".
DAMON maintainer-profile.rst file patches[1] that were merged into the
v6.12-rc1 have a couple of minor mistakes. Fix those.
[1] https://lore.kernel.org/
20240826015741.80707-1-sj@kernel.org
This patch (of 2):
Links to external web pages on DAMON's maintainer-profile.rst are missing
'_' suffixes. As a result, rendered document is having only verbose URLs
that cannot be clicked. Fix those.
Also, update the link texts for git trees to contain the names of the
trees, for better readability and avoiding below Sphinx warning.
maintainer-profile.rst:4: WARNING: Duplicate explicit target name: "tree".
Link: https://lkml.kernel.org/r/20241011170154.70651-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20241011170154.70651-2-sj@kernel.org
Fixes:
2e9b3d6e2e59 ("Docs/damon/maintainer-profile: add links in place")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Sidhartha Kumar [Fri, 11 Oct 2024 21:44:50 +0000 (17:44 -0400)]
maple_tree: check for MA_STATE_BULK on setting wr_rebalance
It is possible for a bulk operation (MA_STATE_BULK is set) to enter the
new_end < mt_min_slots[type] case and set wr_rebalance as a store type.
This is incorrect as bulk stores do not rebalance per write, but rather
after the all of the writes are done through the mas_bulk_rebalance()
path. Therefore, add a check to make sure MA_STATE_BULK is not set before
we return wr_rebalance as the store type.
Also add a test to make sure wr_rebalance is never the store type when
doing bulk operations via mas_expected_entries()
This is a hotfix for this rc however it has no userspace effects as there
are no users of the bulk insertion mode.
Link: https://lkml.kernel.org/r/20241011214451.7286-1-sidhartha.kumar@oracle.com
Fixes:
5d659bbb52a2 ("maple_tree: introduce mas_wr_store_type()")
Suggested-by: Liam Howlett <liam.howlett@oracle.com>
Signed-off-by: Sidhartha <sidhartha.kumar@oracle.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Liam Howlett <liam.howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Yang Shi [Sat, 12 Oct 2024 01:17:02 +0000 (18:17 -0700)]
mm: khugepaged: fix the arguments order in khugepaged_collapse_file trace point
The "addr" and "is_shmem" arguments have different order in TP_PROTO and
TP_ARGS. This resulted in the incorrect trace result:
text-hugepage-644429 [276] 392092.878683: mm_khugepaged_collapse_file:
mm=0xffff20025d52c440, hpage_pfn=0x200678c00, index=512, addr=1, is_shmem=0,
filename=text-hugepage, nr=512, result=failed
The value of "addr" is wrong because it was treated as bool value, the
type of is_shmem.
Fix the order in TP_PROTO to keep "addr" is before "is_shmem" since the
original patch review suggested this order to achieve best packing.
And use "lx" for "addr" instead of "ld" in TP_printk because address is
typically shown in hex.
After the fix, the trace result looks correct:
text-hugepage-7291 [004] 128.627251: mm_khugepaged_collapse_file:
mm=0xffff0001328f9500, hpage_pfn=0x20016ea00, index=512, addr=0x400000,
is_shmem=0, filename=text-hugepage, nr=512, result=failed
Link: https://lkml.kernel.org/r/20241012011702.1084846-1-yang@os.amperecomputing.com
Fixes:
4c9473e87e75 ("mm/khugepaged: add tracepoint to collapse_file()")
Signed-off-by: Yang Shi <yang@os.amperecomputing.com>
Cc: Gautam Menghani <gautammenghani201@gmail.com>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: <stable@vger.kernel.org> [6.2+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jinjie Ruan [Thu, 10 Oct 2024 12:53:23 +0000 (20:53 +0800)]
mm/damon/tests/sysfs-kunit.h: fix memory leak in damon_sysfs_test_add_targets()
The sysfs_target->regions allocated in damon_sysfs_regions_alloc() is not
freed in damon_sysfs_test_add_targets(), which cause the following memory
leak, free it to fix it.
unreferenced object 0xffffff80c2a8db80 (size 96):
comm "kunit_try_catch", pid 187, jiffies
4294894363
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace (crc 0):
[<
0000000001e3714d>] kmemleak_alloc+0x34/0x40
[<
000000008e6835c1>] __kmalloc_cache_noprof+0x26c/0x2f4
[<
000000001286d9f8>] damon_sysfs_test_add_targets+0x1cc/0x738
[<
0000000032ef8f77>] kunit_try_run_case+0x13c/0x3ac
[<
00000000f3edea23>] kunit_generic_run_threadfn_adapter+0x80/0xec
[<
00000000adf936cf>] kthread+0x2e8/0x374
[<
0000000041bb1628>] ret_from_fork+0x10/0x20
Link: https://lkml.kernel.org/r/20241010125323.3127187-1-ruanjinjie@huawei.com
Fixes:
b8ee5575f763 ("mm/damon/sysfs-test: add a unit test for damon_sysfs_set_targets()")
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andy Shevchenko [Tue, 8 Oct 2024 19:13:29 +0000 (22:13 +0300)]
mm: remove unused stub for can_swapin_thp()
When can_swapin_thp() is unused, it prevents kernel builds with clang,
`make W=1` and CONFIG_WERROR=y:
mm/memory.c:4184:20: error: unused function 'can_swapin_thp' [-Werror,-Wunused-function]
Fix this by removing the unused stub.
See also commit
6863f5643dd7 ("kbuild: allow Clang to find unused static
inline functions for W=1 build").
Link: https://lkml.kernel.org/r/20241008191329.2332346-1-andriy.shevchenko@linux.intel.com
Fixes:
242d12c98174 ("mm: support large folios swap-in for sync io devices")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Barry Song <baohua@kernel.org>
Cc: Bill Wendling <morbo@google.com>
Cc: Chuanhua Han <hanchuanhua@oppo.com>
Cc: Justin Stitt <justinstitt@google.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andy Chiu [Wed, 9 Oct 2024 14:49:34 +0000 (22:49 +0800)]
mailmap: add an entry for Andy Chiu
Map my outdated addresses within mailmap.
Link: https://lkml.kernel.org/r/20241009144934.43027-1-andybnac@gmail.com
Signed-off-by: Andy Chiu <andybnac@gmail.com>
Cc: Greentime Hu <greentime.hu@sifive.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Leon Chien <leonchien@synology.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Lorenzo Stoakes [Wed, 9 Oct 2024 20:10:32 +0000 (21:10 +0100)]
MAINTAINERS: add memory mapping/VMA co-maintainers
Add myself and Liam as co-maintainers of the memory mapping and VMA code
alongside Andrew as we are heavily involved in its implementation and
maintenance.
Link: https://lkml.kernel.org/r/20241009201032.6130-1-lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Brahmajit Das [Sat, 5 Oct 2024 06:37:00 +0000 (12:07 +0530)]
fs/proc: fix build with GCC 15 due to -Werror=unterminated-string-initialization
show show_smap_vma_flags() has been a using misspelled initializer in
mnemonics[] - it needed to initialize 2 element array of char and it used
NUL-padded 2 character string literals (i.e. 3-element initializer).
This has been spotted by gcc-15[*]; prior to that gcc quietly dropped the
3rd eleemnt of initializers. To fix this we are increasing the size of
mnemonics[] (from mnemonics[BITS_PER_LONG][2] to
mnemonics[BITS_PER_LONG][3]) to accomodate the NUL-padded string literals.
This also helps us in simplyfying the logic for printing of the flags as
instead of printing each character from the mnemonics[], we can just print
the mnemonics[] using seq_printf.
[*]: fs/proc/task_mmu.c:917:49: error: initializer-string for array of `char' is too long [-Werror=unterminate d-string-initialization]
917 | [0 ... (BITS_PER_LONG-1)] = "??",
| ^~~~
fs/proc/task_mmu.c:917:49: error: initializer-string for array of `char' is too long [-Werror=unterminate d-string-initialization]
fs/proc/task_mmu.c:917:49: error: initializer-string for array of `char' is too long [-Werror=unterminate d-string-initialization]
fs/proc/task_mmu.c:917:49: error: initializer-string for array of `char' is too long [-Werror=unterminate d-string-initialization]
fs/proc/task_mmu.c:917:49: error: initializer-string for array of `char' is too long [-Werror=unterminate d-string-initialization]
fs/proc/task_mmu.c:917:49: error: initializer-string for array of `char' is too long [-Werror=unterminate d-string-initialization]
...
Stephen pointed out:
: The C standard explicitly allows for a string initializer to be too long
: due to the NUL byte at the end ... so this warning may be overzealous.
but let's make the warning go away anwyay.
Link: https://lkml.kernel.org/r/20241005063700.2241027-1-brahmajit.xyz@gmail.com
Link: https://lkml.kernel.org/r/20241003093040.47c08382@canb.auug.org.au
Signed-off-by: Brahmajit Das <brahmajit.xyz@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Florian Westphal [Mon, 7 Oct 2024 20:52:24 +0000 (22:52 +0200)]
lib: alloc_tag_module_unload must wait for pending kfree_rcu calls
Ben Greear reports following splat:
------------[ cut here ]------------
net/netfilter/nf_nat_core.c:1114 module nf_nat func:nf_nat_register_fn has 256 allocated at module unload
WARNING: CPU: 1 PID: 10421 at lib/alloc_tag.c:168 alloc_tag_module_unload+0x22b/0x3f0
Modules linked in: nf_nat(-) btrfs ufs qnx4 hfsplus hfs minix vfat msdos fat
...
Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020
RIP: 0010:alloc_tag_module_unload+0x22b/0x3f0
codetag_unload_module+0x19b/0x2a0
? codetag_load_module+0x80/0x80
nf_nat module exit calls kfree_rcu on those addresses, but the free
operation is likely still pending by the time alloc_tag checks for leaks.
Wait for outstanding kfree_rcu operations to complete before checking
resolves this warning.
Reproducer:
unshare -n iptables-nft -t nat -A PREROUTING -p tcp
grep nf_nat /proc/allocinfo # will list 4 allocations
rmmod nft_chain_nat
rmmod nf_nat # will WARN.
[akpm@linux-foundation.org: add comment]
Link: https://lkml.kernel.org/r/20241007205236.11847-1-fw@strlen.de
Fixes:
a473573964e5 ("lib: code tagging module support")
Signed-off-by: Florian Westphal <fw@strlen.de>
Reported-by: Ben Greear <greearb@candelatech.com>
Closes: https://lore.kernel.org/netdev/
bdaaef9d-4364-4171-b82b-
bcfc12e207eb@candelatech.com/
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jann Horn [Mon, 7 Oct 2024 21:42:04 +0000 (23:42 +0200)]
mm/mremap: fix move_normal_pmd/retract_page_tables race
In mremap(), move_page_tables() looks at the type of the PMD entry and the
specified address range to figure out by which method the next chunk of
page table entries should be moved.
At that point, the mmap_lock is held in write mode, but no rmap locks are
held yet. For PMD entries that point to page tables and are fully covered
by the source address range, move_pgt_entry(NORMAL_PMD, ...) is called,
which first takes rmap locks, then does move_normal_pmd().
move_normal_pmd() takes the necessary page table locks at source and
destination, then moves an entire page table from the source to the
destination.
The problem is: The rmap locks, which protect against concurrent page
table removal by retract_page_tables() in the THP code, are only taken
after the PMD entry has been read and it has been decided how to move it.
So we can race as follows (with two processes that have mappings of the
same tmpfs file that is stored on a tmpfs mount with huge=advise); note
that process A accesses page tables through the MM while process B does it
through the file rmap:
process A process B
========= =========
mremap
mremap_to
move_vma
move_page_tables
get_old_pmd
alloc_new_pmd
*** PREEMPT ***
madvise(MADV_COLLAPSE)
do_madvise
madvise_walk_vmas
madvise_vma_behavior
madvise_collapse
hpage_collapse_scan_file
collapse_file
retract_page_tables
i_mmap_lock_read(mapping)
pmdp_collapse_flush
i_mmap_unlock_read(mapping)
move_pgt_entry(NORMAL_PMD, ...)
take_rmap_locks
move_normal_pmd
drop_rmap_locks
When this happens, move_normal_pmd() can end up creating bogus PMD entries
in the line `pmd_populate(mm, new_pmd, pmd_pgtable(pmd))`. The effect
depends on arch-specific and machine-specific details; on x86, you can end
up with physical page 0 mapped as a page table, which is likely
exploitable for user->kernel privilege escalation.
Fix the race by letting process B recheck that the PMD still points to a
page table after the rmap locks have been taken. Otherwise, we bail and
let the caller fall back to the PTE-level copying path, which will then
bail immediately at the pmd_none() check.
Bug reachability: Reaching this bug requires that you can create
shmem/file THP mappings - anonymous THP uses different code that doesn't
zap stuff under rmap locks. File THP is gated on an experimental config
flag (CONFIG_READ_ONLY_THP_FOR_FS), so on normal distro kernels you need
shmem THP to hit this bug. As far as I know, getting shmem THP normally
requires that you can mount your own tmpfs with the right mount flags,
which would require creating your own user+mount namespace; though I don't
know if some distros maybe enable shmem THP by default or something like
that.
Bug impact: This issue can likely be used for user->kernel privilege
escalation when it is reachable.
Link: https://lkml.kernel.org/r/20241007-move_normal_pmd-vs-collapse-fix-2-v1-1-5ead9631f2ea@google.com
Fixes:
1d65b771bc08 ("mm/khugepaged: retract_page_tables() without mmap or vma lock")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Co-developed-by: David Hildenbrand <david@redhat.com>
Closes: https://project-zero.issues.chromium.org/
371047675
Acked-by: Qi Zheng <zhengqi.arch@bytedance.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Sebastian Andrzej Siewior [Mon, 7 Oct 2024 14:30:49 +0000 (16:30 +0200)]
mm: percpu: increase PERCPU_DYNAMIC_SIZE_SHIFT on certain builds.
Arnd reported a build failure due to the BUILD_BUG_ON() statement in
alloc_kmem_cache_cpus(). The test
PERCPU_DYNAMIC_EARLY_SIZE < NR_KMALLOC_TYPES * KMALLOC_SHIFT_HIGH * sizeof(struct kmem_cache_cpu)
The factors that increase the right side of the equation:
- PAGE_SIZE > 4KiB increases KMALLOC_SHIFT_HIGH
- For the local_lock_t in kmem_cache_cpu:
- PREEMPT_RT adds an actual lock.
- LOCKDEP increases the size of the lock.
- LOCK_STAT adds additional bytes plus padding to the lockdep
structure.
The net difference with and without PREEMPT_RT is 88 bytes for the
lock_lock_t, 96 bytes for kmem_cache_cpu due to additional padding. This
is enough to exceed the 80KiB limit with 16KiB page size - the 8KiB page
size is fine.
Increase PERCPU_DYNAMIC_SIZE_SHIFT to 13 on configs with PAGE_SIZE larger
than 4KiB and LOCKDEP enabled.
Link: https://lkml.kernel.org/r/20241007143049.gyMpEu89@linutronix.de
Fixes:
d8fccd9ca5f9 ("arm64: Allow to enable PREEMPT_RT.")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/
202410020326.iaZIteIx-lkp@intel.com/
Reported-by: Arnd Bergmann <arnd@kernel.org>
Closes: https://lore.kernel.org/
20241004095702.637528-1-arnd@kernel.org
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Edward Liaw [Thu, 3 Oct 2024 21:17:11 +0000 (21:17 +0000)]
selftests/mm: fix deadlock for fork after pthread_create on ARM
On Android with arm, there is some synchronization needed to avoid a
deadlock when forking after pthread_create.
Link: https://lkml.kernel.org/r/20241003211716.371786-3-edliaw@google.com
Fixes:
cff294582798 ("selftests/mm: extend and rename uffd pagemap test")
Signed-off-by: Edward Liaw <edliaw@google.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Edward Liaw [Thu, 3 Oct 2024 21:17:10 +0000 (21:17 +0000)]
selftests/mm: replace atomic_bool with pthread_barrier_t
Patch series "selftests/mm: fix deadlock after pthread_create".
On Android arm, pthread_create followed by a fork caused a deadlock in the
case where the fork required work to be completed by the created thread.
Update the synchronization primitive to use pthread_barrier instead of
atomic_bool.
Apply the same fix to the wp-fork-with-event test.
This patch (of 2):
Swap synchronization primitive with pthread_barrier, so that stdatomic.h
does not need to be included.
The synchronization is needed on Android ARM64; we see a deadlock with
pthread_create when the parent thread races forward before the child has a
chance to start doing work.
Link: https://lkml.kernel.org/r/20241003211716.371786-1-edliaw@google.com
Link: https://lkml.kernel.org/r/20241003211716.371786-2-edliaw@google.com
Fixes:
cff294582798 ("selftests/mm: extend and rename uffd pagemap test")
Signed-off-by: Edward Liaw <edliaw@google.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
OGAWA Hirofumi [Fri, 4 Oct 2024 06:03:49 +0000 (15:03 +0900)]
fat: fix uninitialized variable
syszbot produced this with a corrupted fs image. In theory, however an IO
error would trigger this also.
This affects just an error report, so should not be a serious error.
Link: https://lkml.kernel.org/r/87r08wjsnh.fsf@mail.parknet.co.jp
Link: https://lkml.kernel.org/r/66ff2c95.050a0220.49194.03e9.GAE@google.com
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Reported-by: syzbot+ef0d7bc412553291aa86@syzkaller.appspotmail.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ryusuke Konishi [Fri, 4 Oct 2024 03:35:31 +0000 (12:35 +0900)]
nilfs2: propagate directory read errors from nilfs_find_entry()
Syzbot reported that a task hang occurs in vcs_open() during a fuzzing
test for nilfs2.
The root cause of this problem is that in nilfs_find_entry(), which
searches for directory entries, ignores errors when loading a directory
page/folio via nilfs_get_folio() fails.
If the filesystem images is corrupted, and the i_size of the directory
inode is large, and the directory page/folio is successfully read but
fails the sanity check, for example when it is zero-filled,
nilfs_check_folio() may continue to spit out error messages in bursts.
Fix this issue by propagating the error to the callers when loading a
page/folio fails in nilfs_find_entry().
The current interface of nilfs_find_entry() and its callers is outdated
and cannot propagate error codes such as -EIO and -ENOMEM returned via
nilfs_find_entry(), so fix it together.
Link: https://lkml.kernel.org/r/20241004033640.6841-1-konishi.ryusuke@gmail.com
Fixes:
2ba466d74ed7 ("nilfs2: directory entry operations")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: Lizhi Xu <lizhi.xu@windriver.com>
Closes: https://lkml.kernel.org/r/
20240927013806.
3577931-1-lizhi.xu@windriver.com
Reported-by: syzbot+8a192e8d090fa9a31135@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=
8a192e8d090fa9a31135
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Lorenzo Stoakes [Wed, 2 Oct 2024 07:39:32 +0000 (08:39 +0100)]
mm/mmap: correct error handling in mmap_region()
Commit
f8d112a4e657 ("mm/mmap: avoid zeroing vma tree in mmap_region()")
changed how error handling is performed in mmap_region().
The error value defaults to -ENOMEM, but then gets reassigned immediately
to the result of vms_gather_munmap_vmas() if we are performing a MAP_FIXED
mapping over existing VMAs (and thus unmapping them).
This overwrites the error value, potentially clearing it.
After this, we invoke may_expand_vm() and possibly vm_area_alloc(), and
check to see if they failed. If they do so, then we perform error-handling
logic, but importantly, we do NOT update the error code.
This means that, if vms_gather_munmap_vmas() succeeds, but one of these
calls does not, the function will return indicating no error, but rather an
address value of zero, which is entirely incorrect.
Correct this and avoid future confusion by strictly setting error on each
and every occasion we jump to the error handling logic, and set the error
code immediately prior to doing so.
This way we can see at a glance that the error code is always correct.
Many thanks to Vegard Nossum who spotted this issue in discussion around
this problem.
Link: https://lkml.kernel.org/r/20241002073932.13482-1-lorenzo.stoakes@oracle.com
Fixes:
f8d112a4e657 ("mm/mmap: avoid zeroing vma tree in mmap_region()")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Suggested-by: Vegard Nossum <vegard.nossum@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jinjie Ruan [Wed, 16 Oct 2024 02:26:58 +0000 (10:26 +0800)]
clk: test: Fix some memory leaks
CONFIG_CLK_KUNIT_TEST=y, CONFIG_DEBUG_KMEMLEAK=y
and CONFIG_DEBUG_KMEMLEAK_AUTO_SCAN=y, the following memory leak occurs.
If the KUNIT_ASSERT_*() fails, the latter (exit() or testcases)
clk_put() or clk_hw_unregister() will fail to release the clk resource
and cause memory leaks, use new clk_hw_register_kunit()
and clk_hw_get_clk_kunit() to automatically release them.
unreferenced object 0xffffff80c6af5000 (size 512):
comm "kunit_try_catch", pid 371, jiffies
4294896001
hex dump (first 32 bytes):
20 4c c0 86 e1 ff ff ff e0 1a c0 86 e1 ff ff ff L..............
c0 75 e3 c6 80 ff ff ff 00 00 00 00 00 00 00 00 .u..............
backtrace (crc
8ca788fa):
[<
00000000e21852d0>] kmemleak_alloc+0x34/0x40
[<
000000009c583f7b>] __kmalloc_cache_noprof+0x26c/0x2f4
[<
00000000d1bc850c>] __clk_register+0x80/0x1ecc
[<
00000000b08c78c5>] clk_hw_register+0xc4/0x110
[<
00000000b16d6df8>] clk_multiple_parents_mux_test_init+0x238/0x288
[<
0000000014a7e804>] kunit_try_run_case+0x10c/0x3ac
[<
0000000026b41f03>] kunit_generic_run_threadfn_adapter+0x80/0xec
[<
0000000066619fb8>] kthread+0x2e8/0x374
[<
00000000a1157f53>] ret_from_fork+0x10/0x20
unreferenced object 0xffffff80c6e37880 (size 96):
comm "kunit_try_catch", pid 371, jiffies
4294896002
hex dump (first 32 bytes):
00 50 af c6 80 ff ff ff 00 00 00 00 00 00 00 00 .P..............
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace (crc
b4b766dd):
[<
00000000e21852d0>] kmemleak_alloc+0x34/0x40
[<
000000009c583f7b>] __kmalloc_cache_noprof+0x26c/0x2f4
[<
0000000086e7dd64>] clk_hw_create_clk.part.0.isra.0+0x58/0x2f4
[<
00000000dcf1ac31>] clk_hw_get_clk+0x8c/0x114
[<
000000006fab5bfa>] clk_test_multiple_parents_mux_set_range_set_parent_get_rate+0x3c/0xa0
[<
00000000c97db55a>] kunit_try_run_case+0x13c/0x3ac
[<
0000000026b41f03>] kunit_generic_run_threadfn_adapter+0x80/0xec
[<
0000000066619fb8>] kthread+0x2e8/0x374
[<
00000000a1157f53>] ret_from_fork+0x10/0x20
unreferenced object 0xffffff80c2b56900 (size 96):
comm "kunit_try_catch", pid 395, jiffies
4294896107
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 e0 49 c0 86 e1 ff ff ff .........I......
backtrace (crc
2e59b327):
[<
00000000e21852d0>] kmemleak_alloc+0x34/0x40
[<
00000000c6c715a8>] __kmalloc_noprof+0x2bc/0x3c0
[<
00000000f04a7951>] __clk_register+0x70c/0x1ecc
[<
00000000b08c78c5>] clk_hw_register+0xc4/0x110
[<
00000000cafa9563>] clk_orphan_transparent_multiple_parent_mux_test_init+0x1a8/0x1dc
[<
0000000014a7e804>] kunit_try_run_case+0x10c/0x3ac
[<
0000000026b41f03>] kunit_generic_run_threadfn_adapter+0x80/0xec
[<
0000000066619fb8>] kthread+0x2e8/0x374
[<
00000000a1157f53>] ret_from_fork+0x10/0x20
unreferenced object 0xffffff80c87c9400 (size 512):
comm "kunit_try_catch", pid 483, jiffies
4294896907
hex dump (first 32 bytes):
a0 44 c0 86 e1 ff ff ff e0 1a c0 86 e1 ff ff ff .D..............
20 05 a8 c8 80 ff ff ff 00 00 00 00 00 00 00 00 ...............
backtrace (crc
c25b43fb):
[<
00000000e21852d0>] kmemleak_alloc+0x34/0x40
[<
000000009c583f7b>] __kmalloc_cache_noprof+0x26c/0x2f4
[<
00000000d1bc850c>] __clk_register+0x80/0x1ecc
[<
00000000b08c78c5>] clk_hw_register+0xc4/0x110
[<
000000002688be48>] clk_single_parent_mux_test_init+0x1a0/0x1d4
[<
0000000014a7e804>] kunit_try_run_case+0x10c/0x3ac
[<
0000000026b41f03>] kunit_generic_run_threadfn_adapter+0x80/0xec
[<
0000000066619fb8>] kthread+0x2e8/0x374
[<
00000000a1157f53>] ret_from_fork+0x10/0x20
unreferenced object 0xffffff80c6dd2380 (size 96):
comm "kunit_try_catch", pid 483, jiffies
4294896908
hex dump (first 32 bytes):
00 94 7c c8 80 ff ff ff 00 00 00 00 00 00 00 00 ..|.............
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace (crc
4401212):
[<
00000000e21852d0>] kmemleak_alloc+0x34/0x40
[<
000000009c583f7b>] __kmalloc_cache_noprof+0x26c/0x2f4
[<
0000000086e7dd64>] clk_hw_create_clk.part.0.isra.0+0x58/0x2f4
[<
00000000dcf1ac31>] clk_hw_get_clk+0x8c/0x114
[<
0000000063eb2c90>] clk_test_single_parent_mux_set_range_disjoint_child_last+0x3c/0xa0
[<
00000000c97db55a>] kunit_try_run_case+0x13c/0x3ac
[<
0000000026b41f03>] kunit_generic_run_threadfn_adapter+0x80/0xec
[<
0000000066619fb8>] kthread+0x2e8/0x374
[<
00000000a1157f53>] ret_from_fork+0x10/0x20
......
Fixes:
02cdeace1e1e ("clk: tests: Add tests for single parent mux")
Fixes:
2e9cad1abc71 ("clk: tests: Add some tests for orphan with multiple parents")
Fixes:
433fb8a611ca ("clk: tests: Add missing test case for ranges")
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Link: https://lore.kernel.org/r/20241016022658.2131826-1-ruanjinjie@huawei.com
Reviewed-by: Maxime Ripard <mripard@kernel.org>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Linus Torvalds [Wed, 16 Oct 2024 20:37:59 +0000 (13:37 -0700)]
Merge tag 'for-linus' of git://git./linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
"Several miscellaneous fixes. A lot of bnxt_re activity, there will be
more rc patches there coming.
- Many bnxt_re bug fixes - Memory leaks, kasn, NULL pointer deref,
soft lockups, error unwinding and some small functional issues
- Error unwind bug in rdma netlink
- Two issues with incorrect VLAN detection for iWarp
- skb_splice_from_iter() splat in siw
- Give SRP slab caches unique names to resolve the merge window
WARN_ON regression"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/bnxt_re: Fix the GID table length
RDMA/bnxt_re: Fix a bug while setting up Level-2 PBL pages
RDMA/bnxt_re: Change the sequence of updating the CQ toggle value
RDMA/bnxt_re: Fix an error path in bnxt_re_add_device
RDMA/bnxt_re: Avoid CPU lockups due fifo occupancy check loop
RDMA/bnxt_re: Fix a possible NULL pointer dereference
RDMA/bnxt_re: Return more meaningful error
RDMA/bnxt_re: Fix incorrect dereference of srq in async event
RDMA/bnxt_re: Fix out of bound check
RDMA/bnxt_re: Fix the max CQ WQEs for older adapters
RDMA/srpt: Make slab cache names unique
RDMA/irdma: Fix misspelling of "accept*"
RDMA/cxgb4: Fix RDMA_CM_EVENT_UNREACHABLE error for iWARP
RDMA/siw: Add sendpage_ok() check to disable MSG_SPLICE_PAGES
RDMA/core: Fix ENODEV error for iWARP test over vlan
RDMA/nldev: Fix NULL pointer dereferences issue in rdma_nl_notify_event
RDMA/bnxt_re: Fix the max WQEs used in Static WQE mode
RDMA/bnxt_re: Add a check for memory allocation
RDMA/bnxt_re: Fix incorrect AVID type in WQE structure
RDMA/bnxt_re: Fix a possible memory leak
Linus Torvalds [Wed, 16 Oct 2024 16:30:20 +0000 (09:30 -0700)]
Merge tag 'for-6.12-rc3-tag' of git://git./linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- regression fix: dirty extents tracked in xarray for qgroups must be
adjusted for 32bit platforms
- fix potentially freeing uninitialized name in fscrypt structure
- fix warning about unneeded variable in a send callback
* tag 'for-6.12-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix uninitialized pointer free on read_alloc_one_name() error
btrfs: send: cleanup unneeded return variable in changed_verity()
btrfs: fix uninitialized pointer free in add_inode_ref()
btrfs: use sector numbers as keys for the dirty extents xarray
Linus Torvalds [Wed, 16 Oct 2024 16:15:43 +0000 (09:15 -0700)]
Merge tag 'v6.12-rc3-ksmbd-fixes' of git://git.samba.org/ksmbd
Pull smb server fixes from Steve French:
- fix race between session setup and session logoff
- add supplementary group support
* tag 'v6.12-rc3-ksmbd-fixes' of git://git.samba.org/ksmbd:
ksmbd: add support for supplementary groups
ksmbd: fix user-after-free from session log off
Linus Torvalds [Wed, 16 Oct 2024 15:42:54 +0000 (08:42 -0700)]
Merge tag 'v6.12-p3' of git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
- Remove bogus testmgr ENOENT error messages
- Ensure algorithm is still alive before marking it as tested
- Disable buggy hash algorithms in marvell/cesa
* tag 'v6.12-p3' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: marvell/cesa - Disable hash algorithms
crypto: testmgr - Hide ENOENT errors better
crypto: api - Fix liveliness check in crypto_alg_tested
Heiko Carstens [Mon, 14 Oct 2024 10:07:26 +0000 (12:07 +0200)]
s390: Update defconfigs
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Heiko Carstens [Thu, 10 Oct 2024 15:52:39 +0000 (17:52 +0200)]
s390: Initialize psw mask in perf_arch_fetch_caller_regs()
Also initialize regs->psw.mask in perf_arch_fetch_caller_regs().
This way user_mode(regs) will return false, like it should.
It looks like all current users initialize regs to zero, so that this
doesn't fix a bug currently. However it is better to not rely on callers
to do this.
Fixes:
914d52e46490 ("s390: implement perf_arch_fetch_caller_regs")
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Thomas Weißschuh [Mon, 14 Oct 2024 05:50:07 +0000 (07:50 +0200)]
s390/sclp_vt220: Convert newlines to CRLF instead of LFCR
According to the VT220 specification the possible character combinations
sent on RETURN are only CR or CRLF [0].
The Return key sends either a CR character (0/13) or a CR
character (0/13) and an LF character (0/10), depending on the
set/reset state of line feed/new line mode (LNM).
The sclp/vt220 driver however uses LFCR. This can confuse tools, for
example the kunit runner.
Link: https://vt100.net/docs/vt220-rm/chapter3.html#S3.2
Fixes:
1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Link: https://lore.kernel.org/r/20241014-s390-kunit-v1-2-941defa765a6@linutronix.de
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Thomas Weißschuh [Mon, 14 Oct 2024 05:50:06 +0000 (07:50 +0200)]
s390/sclp: Deactivate sclp after all its users
On reboot the SCLP interface is deactivated through a reboot notifier.
This happens before other components using SCLP have the chance to run
their own reboot notifiers.
Two of those components are the SCLP console and tty drivers which try
to flush the last outstanding messages.
At that point the SCLP interface is already unusable and the messages
are discarded.
Execute sclp_deactivate() as late as possible to avoid this issue.
Fixes:
4ae46db99cd8 ("s390/consoles: improve panic notifiers reliability")
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Link: https://lore.kernel.org/r/20241014-s390-kunit-v1-1-941defa765a6@linutronix.de
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Holger Dengler [Fri, 11 Oct 2024 08:48:00 +0000 (10:48 +0200)]
s390/pkey_pckmo: Return with success for valid protected key types
The key_to_protkey handler function in module pkey_pckmo should return
with success on all known protected key types, including the new types
introduced by
fd197556eef5 ("s390/pkey: Add AES xts and HMAC clear key
token support").
Fixes:
fd197556eef5 ("s390/pkey: Add AES xts and HMAC clear key token support")
Signed-off-by: Holger Dengler <dengler@linux.ibm.com>
Reviewed-by: Ingo Franzki <ifranzki@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Vasiliy Kovalev [Wed, 16 Oct 2024 08:07:13 +0000 (11:07 +0300)]
ALSA: hda/conexant - Use cached pin control for Node 0x1d on HP EliteOne 1000 G2
The cached version avoids redundant commands to the codec, improving
stability and reducing unnecessary operations. This change ensures
better power management and reliable restoration of pin configurations,
especially after hibernation (S4) and other power transitions.
Fixes:
9988844c457f ("ALSA: hda/conexant - Fix audio routing for HP EliteOne 1000 G2")
Suggested-by: Kai-Heng Feng <kaihengf@nvidia.com>
Suggested-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Vasiliy Kovalev <kovalev@altlinux.org>
Link: https://patch.msgid.link/20241016080713.46801-1-kovalev@altlinux.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Linus Torvalds [Wed, 16 Oct 2024 02:47:19 +0000 (19:47 -0700)]
Merge tag 'sched_ext-for-6.12-rc3-fixes' of git://git./linux/kernel/git/tj/sched_ext
Pull sched_ext fixes from Tejun Heo:
- More issues reported in the enable/disable paths on large machines
with many tasks due to scx_tasks_lock being held too long. Break up
the task iterations
- Remove ops.select_cpu() dependency in bypass mode so that a
misbehaving implementation can't live-lock the machine by pushing all
tasks to few CPUs in bypass mode
- Other misc fixes
* tag 'sched_ext-for-6.12-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
sched_ext: Remove unnecessary cpu_relax()
sched_ext: Don't hold scx_tasks_lock for too long
sched_ext: Move scx_tasks_lock handling into scx_task_iter helpers
sched_ext: bypass mode shouldn't depend on ops.select_cpu()
sched_ext: Move scx_buildin_idle_enabled check to scx_bpf_select_cpu_dfl()
sched_ext: Start schedulers with consistent p->scx.slice values
Revert "sched_ext: Use shorter slice while bypassing"
sched_ext: use correct function name in pick_task_scx() warning message
selftests: sched_ext: Add sched_ext as proper selftest target
Vladimir Oltean [Mon, 14 Oct 2024 15:30:41 +0000 (18:30 +0300)]
net: dsa: vsc73xx: fix reception from VLAN-unaware bridges
Similar to the situation described for sja1105 in commit
1f9fc48fd302
("net: dsa: sja1105: fix reception from VLAN-unaware bridges"), the
vsc73xx driver uses tag_8021q and doesn't need the ds->untag_bridge_pvid
request. In fact, this option breaks packet reception.
The ds->untag_bridge_pvid option strips VLANs from packets received on
VLAN-unaware bridge ports. But those VLANs should already be stripped
by tag_vsc73xx_8021q.c as part of vsc73xx_rcv() - they are not VLANs in
VLAN-unaware mode, but DSA tags. Thus, dsa_software_vlan_untag() tries
to untag a VLAN that doesn't exist, corrupting the packet.
Fixes:
93e4649efa96 ("net: dsa: provide a software untagging function on RX for VLAN-aware bridges")
Tested-by: Pawel Dembicki <paweldembicki@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://patch.msgid.link/20241014153041.1110364-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Niklas Söderlund [Mon, 14 Oct 2024 12:43:43 +0000 (14:43 +0200)]
net: ravb: Only advertise Rx/Tx timestamps if hardware supports it
Recent work moving the reporting of Rx software timestamps to the core
[1] highlighted an issue where hardware time stamping was advertised
for the platforms where it is not supported.
Fix this by covering advertising support for hardware timestamps only if
the hardware supports it. Due to the Tx implementation in RAVB software
Tx timestamping is also only considered if the hardware supports
hardware timestamps. This should be addressed in future, but this fix
only reflects what the driver currently implements.
1. Commit
277901ee3a26 ("ravb: Remove setting of RX software timestamp")
Fixes:
7e09a052dc4e ("ravb: Exclude gPTP feature support for RZ/G2L")
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Paul Barker <paul.barker.ct@bp.renesas.com>
Tested-by: Paul Barker <paul.barker.ct@bp.renesas.com>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Link: https://patch.msgid.link/20241014124343.3875285-1-niklas.soderlund+renesas@ragnatech.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jinjie Ruan [Mon, 14 Oct 2024 12:19:22 +0000 (20:19 +0800)]
net: microchip: vcap api: Fix memory leaks in vcap_api_encode_rule_test()
Commit
a3c1e45156ad ("net: microchip: vcap: Fix use-after-free error in
kunit test") fixed the use-after-free error, but introduced below
memory leaks by removing necessary vcap_free_rule(), add it to fix it.
unreferenced object 0xffffff80ca58b700 (size 192):
comm "kunit_try_catch", pid 1215, jiffies
4294898264
hex dump (first 32 bytes):
00 12 7a 00 05 00 00 00 0a 00 00 00 64 00 00 00 ..z.........d...
00 00 00 00 00 00 00 00 00 04 0b cc 80 ff ff ff ................
backtrace (crc
9c09c3fe):
[<
0000000052a0be73>] kmemleak_alloc+0x34/0x40
[<
0000000043605459>] __kmalloc_cache_noprof+0x26c/0x2f4
[<
0000000040a01b8d>] vcap_alloc_rule+0x3cc/0x9c4
[<
000000003fe86110>] vcap_api_encode_rule_test+0x1ac/0x16b0
[<
00000000b3595fc4>] kunit_try_run_case+0x13c/0x3ac
[<
0000000010f5d2bf>] kunit_generic_run_threadfn_adapter+0x80/0xec
[<
00000000c5d82c9a>] kthread+0x2e8/0x374
[<
00000000f4287308>] ret_from_fork+0x10/0x20
unreferenced object 0xffffff80cc0b0400 (size 64):
comm "kunit_try_catch", pid 1215, jiffies
4294898265
hex dump (first 32 bytes):
80 04 0b cc 80 ff ff ff 18 b7 58 ca 80 ff ff ff ..........X.....
39 00 00 00 02 00 00 00 06 05 04 03 02 01 ff ff 9...............
backtrace (crc
daf014e9):
[<
0000000052a0be73>] kmemleak_alloc+0x34/0x40
[<
0000000043605459>] __kmalloc_cache_noprof+0x26c/0x2f4
[<
000000000ff63fd4>] vcap_rule_add_key+0x2cc/0x528
[<
00000000dfdb1e81>] vcap_api_encode_rule_test+0x224/0x16b0
[<
00000000b3595fc4>] kunit_try_run_case+0x13c/0x3ac
[<
0000000010f5d2bf>] kunit_generic_run_threadfn_adapter+0x80/0xec
[<
00000000c5d82c9a>] kthread+0x2e8/0x374
[<
00000000f4287308>] ret_from_fork+0x10/0x20
unreferenced object 0xffffff80cc0b0700 (size 64):
comm "kunit_try_catch", pid 1215, jiffies
4294898265
hex dump (first 32 bytes):
80 07 0b cc 80 ff ff ff 28 b7 58 ca 80 ff ff ff ........(.X.....
3c 00 00 00 00 00 00 00 01 2f 03 b3 ec ff ff ff <......../......
backtrace (crc
8d877792):
[<
0000000052a0be73>] kmemleak_alloc+0x34/0x40
[<
0000000043605459>] __kmalloc_cache_noprof+0x26c/0x2f4
[<
000000006eadfab7>] vcap_rule_add_action+0x2d0/0x52c
[<
00000000323475d1>] vcap_api_encode_rule_test+0x4d4/0x16b0
[<
00000000b3595fc4>] kunit_try_run_case+0x13c/0x3ac
[<
0000000010f5d2bf>] kunit_generic_run_threadfn_adapter+0x80/0xec
[<
00000000c5d82c9a>] kthread+0x2e8/0x374
[<
00000000f4287308>] ret_from_fork+0x10/0x20
unreferenced object 0xffffff80cc0b0900 (size 64):
comm "kunit_try_catch", pid 1215, jiffies
4294898266
hex dump (first 32 bytes):
80 09 0b cc 80 ff ff ff 80 06 0b cc 80 ff ff ff ................
7d 00 00 00 01 00 00 00 00 00 00 00 ff 00 00 00 }...............
backtrace (crc
34181e56):
[<
0000000052a0be73>] kmemleak_alloc+0x34/0x40
[<
0000000043605459>] __kmalloc_cache_noprof+0x26c/0x2f4
[<
000000000ff63fd4>] vcap_rule_add_key+0x2cc/0x528
[<
00000000991e3564>] vcap_val_rule+0xcf0/0x13e8
[<
00000000fc9868e5>] vcap_api_encode_rule_test+0x678/0x16b0
[<
00000000b3595fc4>] kunit_try_run_case+0x13c/0x3ac
[<
0000000010f5d2bf>] kunit_generic_run_threadfn_adapter+0x80/0xec
[<
00000000c5d82c9a>] kthread+0x2e8/0x374
[<
00000000f4287308>] ret_from_fork+0x10/0x20
unreferenced object 0xffffff80cc0b0980 (size 64):
comm "kunit_try_catch", pid 1215, jiffies
4294898266
hex dump (first 32 bytes):
18 b7 58 ca 80 ff ff ff 00 09 0b cc 80 ff ff ff ..X.............
67 00 00 00 00 00 00 00 01 01 74 88 c0 ff ff ff g.........t.....
backtrace (crc
275fd9be):
[<
0000000052a0be73>] kmemleak_alloc+0x34/0x40
[<
0000000043605459>] __kmalloc_cache_noprof+0x26c/0x2f4
[<
000000000ff63fd4>] vcap_rule_add_key+0x2cc/0x528
[<
000000001396a1a2>] test_add_def_fields+0xb0/0x100
[<
000000006e7621f0>] vcap_val_rule+0xa98/0x13e8
[<
00000000fc9868e5>] vcap_api_encode_rule_test+0x678/0x16b0
[<
00000000b3595fc4>] kunit_try_run_case+0x13c/0x3ac
[<
0000000010f5d2bf>] kunit_generic_run_threadfn_adapter+0x80/0xec
[<
00000000c5d82c9a>] kthread+0x2e8/0x374
[<
00000000f4287308>] ret_from_fork+0x10/0x20
......
Cc: stable@vger.kernel.org
Fixes:
a3c1e45156ad ("net: microchip: vcap: Fix use-after-free error in kunit test")
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Jens Emil Schulz Østergaard <jensemil.schulzostergaard@microchip.com>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Link: https://patch.msgid.link/20241014121922.1280583-1-ruanjinjie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 16 Oct 2024 01:23:55 +0000 (18:23 -0700)]
Merge branch 'net-phy-mdio-bcm-unimac-add-bcm6846-variant'
Linus Walleij says:
====================
net: phy: mdio-bcm-unimac: Add BCM6846 variant
As pointed out by Florian:
https://lore.kernel.org/linux-devicetree/
b542b2e8-115c-4234-a464-
e73aa6bece5c@broadcom.com/
The BCM6846 has a few extra registers and cannot reuse the
compatible string from other variants of the Unimac
MDIO block: we need to be able to tell them apart.
====================
Link: https://patch.msgid.link/20241012-bcm6846-mdio-v1-0-c703ca83e962@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Walleij [Sat, 12 Oct 2024 20:35:23 +0000 (22:35 +0200)]
net: phy: mdio-bcm-unimac: Add BCM6846 support
Add Unimac mdio compatible string for the special BCM6846
variant.
This variant has a few extra registers compared to other
versions.
Suggested-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://lore.kernel.org/linux-devicetree/b542b2e8-115c-4234-a464-e73aa6bece5c@broadcom.com/
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://patch.msgid.link/20241012-bcm6846-mdio-v1-2-c703ca83e962@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Walleij [Sat, 12 Oct 2024 20:35:22 +0000 (22:35 +0200)]
dt-bindings: net: brcm,unimac-mdio: Add bcm6846-mdio
The MDIO block in the BCM6846 is not identical to any of the
previous versions, but has extended registers not present in
the other variants. For this reason we need to use a new
compatible especially for this SoC.
Suggested-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://lore.kernel.org/linux-devicetree/b542b2e8-115c-4234-a464-e73aa6bece5c@broadcom.com/
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20241012-bcm6846-mdio-v1-1-c703ca83e962@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Sitnicki [Fri, 11 Oct 2024 12:17:30 +0000 (14:17 +0200)]
udp: Compute L4 checksum as usual when not segmenting the skb
If:
1) the user requested USO, but
2) there is not enough payload for GSO to kick in, and
3) the egress device doesn't offer checksum offload, then
we want to compute the L4 checksum in software early on.
In the case when we are not taking the GSO path, but it has been requested,
the software checksum fallback in skb_segment doesn't get a chance to
compute the full checksum, if the egress device can't do it. As a result we
end up sending UDP datagrams with only a partial checksum filled in, which
the peer will discard.
Fixes:
10154dbded6d ("udp: Allow GSO transmit from devices with no checksum offload")
Reported-by: Ivan Babrou <ivan@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20241011-uso-swcsum-fixup-v2-1-6e1ddc199af9@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Fri, 11 Oct 2024 17:12:17 +0000 (17:12 +0000)]
genetlink: hold RCU in genlmsg_mcast()
While running net selftests with CONFIG_PROVE_RCU_LIST=y I saw
one lockdep splat [1].
genlmsg_mcast() uses for_each_net_rcu(), and must therefore hold RCU.
Instead of letting all callers guard genlmsg_multicast_allns()
with a rcu_read_lock()/rcu_read_unlock() pair, do it in genlmsg_mcast().
This also means the @flags parameter is useless, we need to always use
GFP_ATOMIC.
[1]
[10882.424136] =============================
[10882.424166] WARNING: suspicious RCU usage
[10882.424309] 6.12.0-rc2-virtme #1156 Not tainted
[10882.424400] -----------------------------
[10882.424423] net/netlink/genetlink.c:1940 RCU-list traversed in non-reader section!!
[10882.424469]
other info that might help us debug this:
[10882.424500]
rcu_scheduler_active = 2, debug_locks = 1
[10882.424744] 2 locks held by ip/15677:
[10882.424791] #0:
ffffffffb6b491b0 (cb_lock){++++}-{3:3}, at: genl_rcv (net/netlink/genetlink.c:1219)
[10882.426334] #1:
ffffffffb6b49248 (genl_mutex){+.+.}-{3:3}, at: genl_rcv_msg (net/netlink/genetlink.c:61 net/netlink/genetlink.c:57 net/netlink/genetlink.c:1209)
[10882.426465]
stack backtrace:
[10882.426805] CPU: 14 UID: 0 PID: 15677 Comm: ip Not tainted 6.12.0-rc2-virtme #1156
[10882.426919] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[10882.427046] Call Trace:
[10882.427131] <TASK>
[10882.427244] dump_stack_lvl (lib/dump_stack.c:123)
[10882.427335] lockdep_rcu_suspicious (kernel/locking/lockdep.c:6822)
[10882.427387] genlmsg_multicast_allns (net/netlink/genetlink.c:1940 (discriminator 7) net/netlink/genetlink.c:1977 (discriminator 7))
[10882.427436] l2tp_tunnel_notify.constprop.0 (net/l2tp/l2tp_netlink.c:119) l2tp_netlink
[10882.427683] l2tp_nl_cmd_tunnel_create (net/l2tp/l2tp_netlink.c:253) l2tp_netlink
[10882.427748] genl_family_rcv_msg_doit (net/netlink/genetlink.c:1115)
[10882.427834] genl_rcv_msg (net/netlink/genetlink.c:1195 net/netlink/genetlink.c:1210)
[10882.427877] ? __pfx_l2tp_nl_cmd_tunnel_create (net/l2tp/l2tp_netlink.c:186) l2tp_netlink
[10882.427927] ? __pfx_genl_rcv_msg (net/netlink/genetlink.c:1201)
[10882.427959] netlink_rcv_skb (net/netlink/af_netlink.c:2551)
[10882.428069] genl_rcv (net/netlink/genetlink.c:1220)
[10882.428095] netlink_unicast (net/netlink/af_netlink.c:1332 net/netlink/af_netlink.c:1357)
[10882.428140] netlink_sendmsg (net/netlink/af_netlink.c:1901)
[10882.428210] ____sys_sendmsg (net/socket.c:729 (discriminator 1) net/socket.c:744 (discriminator 1) net/socket.c:2607 (discriminator 1))
Fixes:
33f72e6f0c67 ("l2tp : multicast notification to the registered listeners")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: James Chapman <jchapman@katalix.com>
Cc: Tom Parkin <tparkin@katalix.com>
Cc: Johannes Berg <johannes.berg@intel.com>
Link: https://patch.msgid.link/20241011171217.3166614-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Peter Rashleigh [Mon, 14 Oct 2024 20:43:42 +0000 (13:43 -0700)]
net: dsa: mv88e6xxx: Fix the max_vid definition for the MV88E6361
According to the Marvell datasheet the
88E6361 has two VTU pages
(4k VIDs per page) so the max_vid should be 8191, not 4095.
In the current implementation mv88e6xxx_vtu_walk() gives unexpected
results because of this error. I verified that mv88e6xxx_vtu_walk()
works correctly on the MV88E6361 with this patch in place.
Fixes:
12899f299803 ("net: dsa: mv88e6xxx: enable support for
88E6361 switch")
Signed-off-by: Peter Rashleigh <peter@rashleigh.ca>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241014204342.5852-1-peter@rashleigh.ca
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kuniyuki Iwashima [Mon, 14 Oct 2024 22:33:12 +0000 (15:33 -0700)]
tcp/dccp: Don't use timer_pending() in reqsk_queue_unlink().
Martin KaFai Lau reported use-after-free [0] in reqsk_timer_handler().
"""
We are seeing a use-after-free from a bpf prog attached to
trace_tcp_retransmit_synack. The program passes the req->sk to the
bpf_sk_storage_get_tracing kernel helper which does check for null
before using it.
"""
The commit
83fccfc3940c ("inet: fix potential deadlock in
reqsk_queue_unlink()") added timer_pending() in reqsk_queue_unlink() not
to call del_timer_sync() from reqsk_timer_handler(), but it introduced a
small race window.
Before the timer is called, expire_timers() calls detach_timer(timer, true)
to clear timer->entry.pprev and marks it as not pending.
If reqsk_queue_unlink() checks timer_pending() just after expire_timers()
calls detach_timer(), TCP will miss del_timer_sync(); the reqsk timer will
continue running and send multiple SYN+ACKs until it expires.
The reported UAF could happen if req->sk is close()d earlier than the timer
expiration, which is 63s by default.
The scenario would be
1. inet_csk_complete_hashdance() calls inet_csk_reqsk_queue_drop(),
but del_timer_sync() is missed
2. reqsk timer is executed and scheduled again
3. req->sk is accept()ed and reqsk_put() decrements rsk_refcnt, but
reqsk timer still has another one, and inet_csk_accept() does not
clear req->sk for non-TFO sockets
4. sk is close()d
5. reqsk timer is executed again, and BPF touches req->sk
Let's not use timer_pending() by passing the caller context to
__inet_csk_reqsk_queue_drop().
Note that reqsk timer is pinned, so the issue does not happen in most
use cases. [1]
[0]
BUG: KFENCE: use-after-free read in bpf_sk_storage_get_tracing+0x2e/0x1b0
Use-after-free read at 0x00000000a891fb3a (in kfence-#1):
bpf_sk_storage_get_tracing+0x2e/0x1b0
bpf_prog_5ea3e95db6da0438_tcp_retransmit_synack+0x1d20/0x1dda
bpf_trace_run2+0x4c/0xc0
tcp_rtx_synack+0xf9/0x100
reqsk_timer_handler+0xda/0x3d0
run_timer_softirq+0x292/0x8a0
irq_exit_rcu+0xf5/0x320
sysvec_apic_timer_interrupt+0x6d/0x80
asm_sysvec_apic_timer_interrupt+0x16/0x20
intel_idle_irq+0x5a/0xa0
cpuidle_enter_state+0x94/0x273
cpu_startup_entry+0x15e/0x260
start_secondary+0x8a/0x90
secondary_startup_64_no_verify+0xfa/0xfb
kfence-#1: 0x00000000a72cc7b6-0x00000000d97616d9, size=2376, cache=TCPv6
allocated by task 0 on cpu 9 at 260507.901592s:
sk_prot_alloc+0x35/0x140
sk_clone_lock+0x1f/0x3f0
inet_csk_clone_lock+0x15/0x160
tcp_create_openreq_child+0x1f/0x410
tcp_v6_syn_recv_sock+0x1da/0x700
tcp_check_req+0x1fb/0x510
tcp_v6_rcv+0x98b/0x1420
ipv6_list_rcv+0x2258/0x26e0
napi_complete_done+0x5b1/0x2990
mlx5e_napi_poll+0x2ae/0x8d0
net_rx_action+0x13e/0x590
irq_exit_rcu+0xf5/0x320
common_interrupt+0x80/0x90
asm_common_interrupt+0x22/0x40
cpuidle_enter_state+0xfb/0x273
cpu_startup_entry+0x15e/0x260
start_secondary+0x8a/0x90
secondary_startup_64_no_verify+0xfa/0xfb
freed by task 0 on cpu 9 at 260507.927527s:
rcu_core_si+0x4ff/0xf10
irq_exit_rcu+0xf5/0x320
sysvec_apic_timer_interrupt+0x6d/0x80
asm_sysvec_apic_timer_interrupt+0x16/0x20
cpuidle_enter_state+0xfb/0x273
cpu_startup_entry+0x15e/0x260
start_secondary+0x8a/0x90
secondary_startup_64_no_verify+0xfa/0xfb
Fixes:
83fccfc3940c ("inet: fix potential deadlock in reqsk_queue_unlink()")
Reported-by: Martin KaFai Lau <martin.lau@kernel.org>
Closes: https://lore.kernel.org/netdev/
eb6684d0-ffd9-4bdc-9196-
33f690c25824@linux.dev/
Link: https://lore.kernel.org/netdev/b55e2ca0-42f2-4b7c-b445-6ffd87ca74a0@linux.dev/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20241014223312.4254-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Wang Hai [Mon, 14 Oct 2024 14:59:01 +0000 (22:59 +0800)]
net: bcmasp: fix potential memory leak in bcmasp_xmit()
The bcmasp_xmit() returns NETDEV_TX_OK without freeing skb
in case of mapping fails, add dev_kfree_skb() to fix it.
Fixes:
490cb412007d ("net: bcmasp: Add support for ASP2.0 Ethernet controller")
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Acked-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20241014145901.48940-1-wanghai38@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Arnd Bergmann [Tue, 15 Oct 2024 20:39:42 +0000 (20:39 +0000)]
Merge tag 'scmi-fixes-6.12' of https://git./linux/kernel/git/sudeep.holla/linux into arm/fixes
Arm SCMI fixes for v6.12
Couple of fixes to address the issues found and reported on Broadcom
STB platforms following the recent refactor of all the SCMI transports
as standalone drivers.
One of the issue is that the effective timeout value is much less than
the intended value due to the way mailbox messages are queues in the
mailbox framework. Since we block or serialise the shmem access anyway,
there is no point in utilizing mailbox queues. The issue is fixed with
exclusive lock on the channel when sending the message.
The other issues is actually non-issue for upstream, but the workaround
is just changing the link order of the transport drivers which enables
Broadcom STB platforms to run both upstream and custom downstream kernel
without any device tree changes. So pushing this to help them test upstream
seamlessly as it has no practical or theoretical impact for others.
There is also a fix to address possible double freeing of the name string
in scmi_debugfs_common_cleanup() when devm_add_action_or_reset() fails.
* tag 'scmi-fixes-6.12' of https://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux:
firmware: arm_scmi: Queue in scmi layer for mailbox implementation
firmware: arm_scmi: Give SMC transport precedence over mailbox
firmware: arm_scmi: Fix the double free in scmi_debugfs_common_setup()
Link: https://lore.kernel.org/r/20241015185128.1000604-1-sudeep.holla@arm.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Arnd Bergmann [Tue, 15 Oct 2024 20:38:27 +0000 (20:38 +0000)]
Merge tag 'ffa-fixes-6.12' of https://git./linux/kernel/git/sudeep.holla/linux into arm/fixes
Arm FF-A fixes for v6.12
Couple of fixes to avoid string-fortify warnings in export_uuid()
and memcpy() from the recently added functions to support
FFA_MSG_SEND_DIRECT_REQ2 and FFA_MSG_SEND_DIRECT_RESP2.
* tag 'ffa-fixes-6.12' of https://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux:
firmware: arm_ffa: Avoid string-fortify warning caused by memcpy()
firmware: arm_ffa: Avoid string-fortify warning in export_uuid()
Link: https://lore.kernel.org/r/20241015185037.1000435-1-sudeep.holla@arm.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Arnd Bergmann [Tue, 15 Oct 2024 20:37:29 +0000 (20:37 +0000)]
Merge tag 'mvebu-fixes-6.12-1' of https://git./linux/kernel/git/gclement/mvebu into arm/fixes
mvebu fixes for 6.12 (part 1)
Fix cp0 mdio pin numbers on SolidRun CN9130 SoM
* tag 'mvebu-fixes-6.12-1' of https://git.kernel.org/pub/scm/linux/kernel/git/gclement/mvebu:
arm64: dts: marvell: cn9130-sr-som: fix cp0 mdio pin numbers
Link: https://lore.kernel.org/r/87ldyud25o.fsf@BLaptop.bootlin.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Arnd Bergmann [Tue, 15 Oct 2024 20:36:53 +0000 (20:36 +0000)]
Merge tag 'reset-fixes-for-v6.12' of git://git.pengutronix.de/pza/linux into arm/fixes
Reset controller fixes for v6.12
Fix a NULL pointer dereference in reset-starfive-jh71x0 and replace two
accidental commas at line endings with semicolons in reset-npcm.
* tag 'reset-fixes-for-v6.12' of git://git.pengutronix.de/pza/linux:
reset: starfive: jh71x0: Fix accessing the empty member on JH7110 SoC
reset: npcm: convert comma to semicolon
Link: https://lore.kernel.org/r/20240930165733.1541936-1-p.zabel@pengutronix.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Wang Hai [Mon, 14 Oct 2024 14:51:15 +0000 (22:51 +0800)]
net: systemport: fix potential memory leak in bcm_sysport_xmit()
The bcm_sysport_xmit() returns NETDEV_TX_OK without freeing skb
in case of dma_map_single() fails, add dev_kfree_skb() to fix it.
Fixes:
80105befdb4b ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver")
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Link: https://patch.msgid.link/20241014145115.44977-1-wanghai38@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Tue, 15 Oct 2024 18:18:44 +0000 (11:18 -0700)]
Merge tag 'trace-ringbuffer-v6.12-rc3' of git://git./linux/kernel/git/trace/linux-trace
Pull ring-buffer fixes from Steven Rostedt:
- Fix ref counter of buffers assigned at boot up
A tracing instance can be created from the kernel command line. If it
maps to memory, it is considered permanent and should not be deleted,
or bad things can happen. If it is not mapped to memory, then the
user is fine to delete it via rmdir from the instances directory. But
the ref counts assumed 0 was free to remove and greater than zero was
not. But this was not the case. When an instance is created, it
should have the reference of 1, and if it should not be removed, it
must be greater than 1. The boot up code set normal instances with a
ref count of 0, which could get removed if something accessed it and
then released it. And memory mapped instances had a ref count of 1
which meant it could be deleted, and bad things happen. Keep normal
instances ref count as 1, and set memory mapped instances ref count
to 2.
- Protect sub buffer size (order) updates from other modifications
When a ring buffer is changing the size of its sub-buffers, no other
operations should be performed on the ring buffer. That includes
reading it. But the locking only grabbed the buffer->mutex that keeps
some operations from touching the ring buffer. It also must hold the
cpu_buffer->reader_lock as well when updates happen as other paths
use that to do some operations on the ring buffer.
* tag 'trace-ringbuffer-v6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
ring-buffer: Fix reader locking when changing the sub buffer order
ring-buffer: Fix refcount setting of boot mapped buffers
Linus Torvalds [Tue, 15 Oct 2024 18:06:45 +0000 (11:06 -0700)]
Merge tag 'bcachefs-2024-10-14' of git://evilpiepirate.org/bcachefs
Pull bcachefs fixes from Kent Overstreet:
- New metadata version inode_has_child_snapshots
This fixes bugs with handling of unlinked inodes + snapshots, in
particular when an inode is reattached after taking a snapshot;
deleted inodes now get correctly cleaned up across snapshots.
- Disk accounting rewrite fixes
- validation fixes for when a device has been removed
- fix journal replay failing with "journal_reclaim_would_deadlock"
- Some more small fixes for erasure coding + device removal
- Assorted small syzbot fixes
* tag 'bcachefs-2024-10-14' of git://evilpiepirate.org/bcachefs: (27 commits)
bcachefs: Fix sysfs warning in fstests generic/730,731
bcachefs: Handle race between stripe reuse, invalidate_stripe_to_dev
bcachefs: Fix kasan splat in new_stripe_alloc_buckets()
bcachefs: Add missing validation for bch_stripe.csum_granularity_bits
bcachefs: Fix missing bounds checks in bch2_alloc_read()
bcachefs: fix uaf in bch2_dio_write_done()
bcachefs: Improve check_snapshot_exists()
bcachefs: Fix bkey_nocow_lock()
bcachefs: Fix accounting replay flags
bcachefs: Fix invalid shift in member_to_text()
bcachefs: Fix bch2_have_enough_devs() for BCH_SB_MEMBER_INVALID
bcachefs: __wait_for_freeing_inode: Switch to wait_bit_queue_entry
bcachefs: Check if stuck in journal_res_get()
closures: Add closure_wait_event_timeout()
bcachefs: Fix state lock involved deadlock
bcachefs: Fix NULL pointer dereference in bch2_opt_to_text
bcachefs: Release transaction before wake up
bcachefs: add check for btree id against max in try read node
bcachefs: Disk accounting device validation fixes
bcachefs: bch2_inode_or_descendents_is_open()
...
Wang Hai [Mon, 14 Oct 2024 14:42:50 +0000 (22:42 +0800)]
net: ethernet: rtsn: fix potential memory leak in rtsn_start_xmit()
The rtsn_start_xmit() returns NETDEV_TX_OK without freeing skb
in case of skb->len being too long, add dev_kfree_skb_any() to fix it.
Fixes:
b0d3969d2b4d ("net: ethernet: rtsn: Add support for Renesas Ethernet-TSN")
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241014144250.38802-1-wanghai38@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Wang Hai [Mon, 14 Oct 2024 14:37:04 +0000 (22:37 +0800)]
net: xilinx: axienet: fix potential memory leak in axienet_start_xmit()
The axienet_start_xmit() returns NETDEV_TX_OK without freeing skb
in case of dma_map_single() fails, add dev_kfree_skb_any() to fix it.
Fixes:
71791dc8bdea ("net: axienet: Check for DMA mapping errors")
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
Link: https://patch.msgid.link/20241014143704.31938-1-wanghai38@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 15 Oct 2024 17:57:04 +0000 (10:57 -0700)]
Merge branch 'mptcp-prevent-mpc-handshake-on-port-based-signal-endpoints'
Matthieu Baerts says:
====================
mptcp: prevent MPC handshake on port-based signal endpoints
MPTCP connection requests toward a listening socket created by the
in-kernel PM for a port based signal endpoint will never be accepted,
they need to be explicitly rejected.
- Patch 1: Explicitly reject such requests. A fix for >= v5.12.
- Patch 2: Cover this case in the MPTCP selftests to avoid regressions.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
v1: https://lore.kernel.org/
20240908180620.822579-1-xiyou.wangcong@gmail.com
Link: https://lore.kernel.org/a5289a0d-2557-40b8-9575-6f1a0bbf06e4@redhat.com
====================
Link: https://patch.msgid.link/20241014-net-mptcp-mpc-port-endp-v2-0-7faea8e6b6ae@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Mon, 14 Oct 2024 14:06:01 +0000 (16:06 +0200)]
selftests: mptcp: join: test for prohibited MPC to port-based endp
Explicitly verify that MPC connection attempts towards a port-based
signal endpoint fail with a reset.
Note that this new test is a bit different from the other ones, not
using 'run_tests'. It is then needed to add the capture capability, and
the picking the right port which have been extracted into three new
helpers. The info about the capture can also be printed from a single
point, which simplifies the exit paths in do_transfer().
The 'Fixes' tag here below is the same as the one from the previous
commit: this patch here is not fixing anything wrong in the selftests,
but it validates the previous fix for an issue introduced by this commit
ID.
Fixes:
1729cf186d8a ("mptcp: create the listening socket for new port")
Cc: stable@vger.kernel.org
Co-developed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20241014-net-mptcp-mpc-port-endp-v2-2-7faea8e6b6ae@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Mon, 14 Oct 2024 14:06:00 +0000 (16:06 +0200)]
mptcp: prevent MPC handshake on port-based signal endpoints
Syzkaller reported a lockdep splat:
============================================
WARNING: possible recursive locking detected
6.11.0-rc6-syzkaller-00019-g67784a74e258 #0 Not tainted
--------------------------------------------
syz-executor364/5113 is trying to acquire lock:
ffff8880449f1958 (k-slock-AF_INET){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline]
ffff8880449f1958 (k-slock-AF_INET){+.-.}-{2:2}, at: sk_clone_lock+0x2cd/0xf40 net/core/sock.c:2328
but task is already holding lock:
ffff88803fe3cb58 (k-slock-AF_INET){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline]
ffff88803fe3cb58 (k-slock-AF_INET){+.-.}-{2:2}, at: sk_clone_lock+0x2cd/0xf40 net/core/sock.c:2328
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(k-slock-AF_INET);
lock(k-slock-AF_INET);
*** DEADLOCK ***
May be due to missing lock nesting notation
7 locks held by syz-executor364/5113:
#0:
ffff8880449f0e18 (sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1607 [inline]
#0:
ffff8880449f0e18 (sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_sendmsg+0x153/0x1b10 net/mptcp/protocol.c:1806
#1:
ffff88803fe39ad8 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1607 [inline]
#1:
ffff88803fe39ad8 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_sendmsg_fastopen+0x11f/0x530 net/mptcp/protocol.c:1727
#2:
ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:326 [inline]
#2:
ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:838 [inline]
#2:
ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit+0x5f/0x1b80 net/ipv4/ip_output.c:470
#3:
ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:326 [inline]
#3:
ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:838 [inline]
#3:
ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: ip_finish_output2+0x45f/0x1390 net/ipv4/ip_output.c:228
#4:
ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: local_lock_acquire include/linux/local_lock_internal.h:29 [inline]
#4:
ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: process_backlog+0x33b/0x15b0 net/core/dev.c:6104
#5:
ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:326 [inline]
#5:
ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:838 [inline]
#5:
ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: ip_local_deliver_finish+0x230/0x5f0 net/ipv4/ip_input.c:232
#6:
ffff88803fe3cb58 (k-slock-AF_INET){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline]
#6:
ffff88803fe3cb58 (k-slock-AF_INET){+.-.}-{2:2}, at: sk_clone_lock+0x2cd/0xf40 net/core/sock.c:2328
stack backtrace:
CPU: 0 UID: 0 PID: 5113 Comm: syz-executor364 Not tainted
6.11.0-rc6-syzkaller-00019-g67784a74e258 #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:93 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:119
check_deadlock kernel/locking/lockdep.c:3061 [inline]
validate_chain+0x15d3/0x5900 kernel/locking/lockdep.c:3855
__lock_acquire+0x137a/0x2040 kernel/locking/lockdep.c:5142
lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5759
__raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
_raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154
spin_lock include/linux/spinlock.h:351 [inline]
sk_clone_lock+0x2cd/0xf40 net/core/sock.c:2328
mptcp_sk_clone_init+0x32/0x13c0 net/mptcp/protocol.c:3279
subflow_syn_recv_sock+0x931/0x1920 net/mptcp/subflow.c:874
tcp_check_req+0xfe4/0x1a20 net/ipv4/tcp_minisocks.c:853
tcp_v4_rcv+0x1c3e/0x37f0 net/ipv4/tcp_ipv4.c:2267
ip_protocol_deliver_rcu+0x22e/0x440 net/ipv4/ip_input.c:205
ip_local_deliver_finish+0x341/0x5f0 net/ipv4/ip_input.c:233
NF_HOOK+0x3a4/0x450 include/linux/netfilter.h:314
NF_HOOK+0x3a4/0x450 include/linux/netfilter.h:314
__netif_receive_skb_one_core net/core/dev.c:5661 [inline]
__netif_receive_skb+0x2bf/0x650 net/core/dev.c:5775
process_backlog+0x662/0x15b0 net/core/dev.c:6108
__napi_poll+0xcb/0x490 net/core/dev.c:6772
napi_poll net/core/dev.c:6841 [inline]
net_rx_action+0x89b/0x1240 net/core/dev.c:6963
handle_softirqs+0x2c4/0x970 kernel/softirq.c:554
do_softirq+0x11b/0x1e0 kernel/softirq.c:455
</IRQ>
<TASK>
__local_bh_enable_ip+0x1bb/0x200 kernel/softirq.c:382
local_bh_enable include/linux/bottom_half.h:33 [inline]
rcu_read_unlock_bh include/linux/rcupdate.h:908 [inline]
__dev_queue_xmit+0x1763/0x3e90 net/core/dev.c:4450
dev_queue_xmit include/linux/netdevice.h:3105 [inline]
neigh_hh_output include/net/neighbour.h:526 [inline]
neigh_output include/net/neighbour.h:540 [inline]
ip_finish_output2+0xd41/0x1390 net/ipv4/ip_output.c:235
ip_local_out net/ipv4/ip_output.c:129 [inline]
__ip_queue_xmit+0x118c/0x1b80 net/ipv4/ip_output.c:535
__tcp_transmit_skb+0x2544/0x3b30 net/ipv4/tcp_output.c:1466
tcp_rcv_synsent_state_process net/ipv4/tcp_input.c:6542 [inline]
tcp_rcv_state_process+0x2c32/0x4570 net/ipv4/tcp_input.c:6729
tcp_v4_do_rcv+0x77d/0xc70 net/ipv4/tcp_ipv4.c:1934
sk_backlog_rcv include/net/sock.h:1111 [inline]
__release_sock+0x214/0x350 net/core/sock.c:3004
release_sock+0x61/0x1f0 net/core/sock.c:3558
mptcp_sendmsg_fastopen+0x1ad/0x530 net/mptcp/protocol.c:1733
mptcp_sendmsg+0x1884/0x1b10 net/mptcp/protocol.c:1812
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0x1a6/0x270 net/socket.c:745
____sys_sendmsg+0x525/0x7d0 net/socket.c:2597
___sys_sendmsg net/socket.c:2651 [inline]
__sys_sendmmsg+0x3b2/0x740 net/socket.c:2737
__do_sys_sendmmsg net/socket.c:2766 [inline]
__se_sys_sendmmsg net/socket.c:2763 [inline]
__x64_sys_sendmmsg+0xa0/0xb0 net/socket.c:2763
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f04fb13a6b9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 01 1a 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:
00007ffd651f42d8 EFLAGS:
00000246 ORIG_RAX:
0000000000000133
RAX:
ffffffffffffffda RBX:
0000000000000003 RCX:
00007f04fb13a6b9
RDX:
0000000000000001 RSI:
0000000020000d00 RDI:
0000000000000004
RBP:
00007ffd651f4310 R08:
0000000000000001 R09:
0000000000000001
R10:
0000000020000080 R11:
0000000000000246 R12:
00000000000f4240
R13:
00007f04fb187449 R14:
00007ffd651f42f4 R15:
00007ffd651f4300
</TASK>
As noted by Cong Wang, the splat is false positive, but the code
path leading to the report is an unexpected one: a client is
attempting an MPC handshake towards the in-kernel listener created
by the in-kernel PM for a port based signal endpoint.
Such connection will be never accepted; many of them can make the
listener queue full and preventing the creation of MPJ subflow via
such listener - its intended role.
Explicitly detect this scenario at initial-syn time and drop the
incoming MPC request.
Fixes:
1729cf186d8a ("mptcp: create the listening socket for new port")
Cc: stable@vger.kernel.org
Reported-by: syzbot+f4aacdfef2c6a6529c3e@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=
f4aacdfef2c6a6529c3e
Cc: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20241014-net-mptcp-mpc-port-endp-v2-1-7faea8e6b6ae@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Li RongQing [Mon, 14 Oct 2024 11:53:21 +0000 (19:53 +0800)]
net/smc: Fix searching in list of known pnetids in smc_pnet_add_pnetid
pnetid of pi (not newly allocated pe) should be compared
Fixes:
e888a2e8337c ("net/smc: introduce list of pnetids for Ethernet devices")
Reviewed-by: D. Wythe <alibuda@linux.alibaba.com>
Reviewed-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com>
Link: https://patch.msgid.link/20241014115321.33234-1-lirongqing@baidu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Oleksij Rempel [Sun, 13 Oct 2024 05:29:16 +0000 (07:29 +0200)]
net: macb: Avoid 20s boot delay by skipping MDIO bus registration for fixed-link PHY
A boot delay was introduced by commit
79540d133ed6 ("net: macb: Fix
handling of fixed-link node"). This delay was caused by the call to
`mdiobus_register()` in cases where a fixed-link PHY was present. The
MDIO bus registration triggered unnecessary PHY address scans, leading
to a 20-second delay due to attempts to detect Clause 45 (C45)
compatible PHYs, despite no MDIO bus being attached.
The commit
79540d133ed6 ("net: macb: Fix handling of fixed-link node")
was originally introduced to fix a regression caused by commit
7897b071ac3b4 ("net: macb: convert to phylink"), which caused the driver
to misinterpret fixed-link nodes as PHY nodes. This resulted in warnings
like:
mdio_bus
f0028000.ethernet-
ffffffff: fixed-link has invalid PHY address
mdio_bus
f0028000.ethernet-
ffffffff: scan phy fixed-link at address 0
...
mdio_bus
f0028000.ethernet-
ffffffff: scan phy fixed-link at address 31
This patch reworks the logic to avoid registering and allocation of the
MDIO bus when:
- The device tree contains a fixed-link node.
- There is no "mdio" child node in the device tree.
If a child node named "mdio" exists, the MDIO bus will be registered to
support PHYs attached to the MACB's MDIO bus. Otherwise, with only a
fixed-link, the MDIO bus is skipped.
Tested on a sama5d35 based system with a ksz8863 switch attached to
macb0.
Fixes:
79540d133ed6 ("net: macb: Fix handling of fixed-link node")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Cc: stable@vger.kernel.org
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241013052916.3115142-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Wang Hai [Sat, 12 Oct 2024 11:04:34 +0000 (19:04 +0800)]
net: ethernet: aeroflex: fix potential memory leak in greth_start_xmit_gbit()
The greth_start_xmit_gbit() returns NETDEV_TX_OK without freeing skb
in case of skb->len being too long, add dev_kfree_skb() to fix it.
Fixes:
d4c41139df6e ("net: Add Aeroflex Gaisler 10/100/1G Ethernet MAC driver")
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Link: https://patch.msgid.link/20241012110434.49265-1-wanghai38@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Sat, 12 Oct 2024 09:42:30 +0000 (09:42 +0000)]
netdevsim: use cond_resched() in nsim_dev_trap_report_work()
I am still seeing many syzbot reports hinting that syzbot
might fool nsim_dev_trap_report_work() with hundreds of ports [1]
Lets use cond_resched(), and system_unbound_wq
instead of implicit system_wq.
[1]
INFO: task syz-executor:20633 blocked for more than 143 seconds.
Not tainted
6.12.0-rc2-syzkaller-00205-g1d227fcc7222 #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor state:D stack:25856 pid:20633 tgid:20633 ppid:1 flags:0x00004006
...
NMI backtrace for cpu 1
CPU: 1 UID: 0 PID: 16760 Comm: kworker/1:0 Not tainted
6.12.0-rc2-syzkaller-00205-g1d227fcc7222 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
Workqueue: events nsim_dev_trap_report_work
RIP: 0010:__sanitizer_cov_trace_pc+0x0/0x70 kernel/kcov.c:210
Code: 89 fb e8 23 00 00 00 48 8b 3d 04 fb 9c 0c 48 89 de 5b e9 c3 c7 5d 00 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 <f3> 0f 1e fa 48 8b 04 24 65 48 8b 0c 25 c0 d7 03 00 65 8b 15 60 f0
RSP: 0018:
ffffc90000a187e8 EFLAGS:
00000246
RAX:
0000000000000100 RBX:
ffffc90000a188e0 RCX:
ffff888027d3bc00
RDX:
ffff888027d3bc00 RSI:
0000000000000000 RDI:
0000000000000000
RBP:
ffff88804a2e6000 R08:
ffffffff8a4bc495 R09:
ffffffff89da3577
R10:
0000000000000004 R11:
ffffffff8a4bc2b0 R12:
dffffc0000000000
R13:
ffff88806573b503 R14:
dffffc0000000000 R15:
ffff8880663cca00
FS:
0000000000000000(0000) GS:
ffff8880b8700000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
00007fc90a747f98 CR3:
000000000e734000 CR4:
00000000003526f0
DR0:
0000000000000000 DR1:
000000000000002b DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000ffff0ff0 DR7:
0000000000000400
Call Trace:
<NMI>
</NMI>
<TASK>
__local_bh_enable_ip+0x1bb/0x200 kernel/softirq.c:382
spin_unlock_bh include/linux/spinlock.h:396 [inline]
nsim_dev_trap_report drivers/net/netdevsim/dev.c:820 [inline]
nsim_dev_trap_report_work+0x75d/0xaa0 drivers/net/netdevsim/dev.c:850
process_one_work kernel/workqueue.c:3229 [inline]
process_scheduled_works+0xa63/0x1850 kernel/workqueue.c:3310
worker_thread+0x870/0xd30 kernel/workqueue.c:3391
kthread+0x2f0/0x390 kernel/kthread.c:389
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
</TASK>
Fixes:
ba5e1272142d ("netdevsim: avoid potential loop in nsim_dev_trap_report_work()")
Reported-by: syzbot+d383dc9579a76f56c251@syzkaller.appspotmail.com
Reported-by: syzbot+c596faae21a68bf7afd0@syzkaller.appspotmail.com
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20241012094230.3893510-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sabrina Dubroca [Fri, 11 Oct 2024 15:16:37 +0000 (17:16 +0200)]
macsec: don't increment counters for an unrelated SA
On RX, we shouldn't be incrementing the stats for an arbitrary SA in
case the actual SA hasn't been set up. Those counters are intended to
track packets for their respective AN when the SA isn't currently
configured. Due to the way MACsec is implemented, we don't keep
counters unless the SA is configured, so we can't track those packets,
and those counters will remain at 0.
The RXSC's stats keeps track of those packets without telling us which
AN they belonged to. We could add counters for non-existent SAs, and
then find a way to integrate them in the dump to userspace, but I
don't think it's worth the effort.
Fixes:
91ec9bd57f35 ("macsec: Fix traffic counters/statistics")
Reported-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/f5ac92aaa5b89343232615f4c03f9f95042c6aa0.1728657709.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Petr Pavlu [Tue, 15 Oct 2024 11:24:29 +0000 (13:24 +0200)]
ring-buffer: Fix reader locking when changing the sub buffer order
The function ring_buffer_subbuf_order_set() updates each
ring_buffer_per_cpu and installs new sub buffers that match the requested
page order. This operation may be invoked concurrently with readers that
rely on some of the modified data, such as the head bit (RB_PAGE_HEAD), or
the ring_buffer_per_cpu.pages and reader_page pointers. However, no
exclusive access is acquired by ring_buffer_subbuf_order_set(). Modifying
the mentioned data while a reader also operates on them can then result in
incorrect memory access and various crashes.
Fix the problem by taking the reader_lock when updating a specific
ring_buffer_per_cpu in ring_buffer_subbuf_order_set().
Link: https://lore.kernel.org/linux-trace-kernel/20240715145141.5528-1-petr.pavlu@suse.com/
Link: https://lore.kernel.org/linux-trace-kernel/20241010195849.2f77cc3f@gandalf.local.home/
Link: https://lore.kernel.org/linux-trace-kernel/20241011112850.17212b25@gandalf.local.home/
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20241015112440.26987-1-petr.pavlu@suse.com
Fixes:
8e7b58c27b3c ("ring-buffer: Just update the subbuffers when changing their allocation order")
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Gavin Shan [Mon, 14 Oct 2024 00:47:24 +0000 (10:47 +1000)]
firmware: arm_ffa: Avoid string-fortify warning caused by memcpy()
Copying from a 144 byte structure arm_smccc_1_2_regs at an offset of 32
into an 112 byte struct ffa_send_direct_data2 causes a compile-time warning:
| In file included from drivers/firmware/arm_ffa/driver.c:25:
| In function 'fortify_memcpy_chk',
| inlined from 'ffa_msg_send_direct_req2' at drivers/firmware/arm_ffa/driver.c:504:3:
| include/linux/fortify-string.h:580:4: warning: call to '__read_overflow2_field'
| declared with 'warning' attribute: detected read beyond size of field
| (2nd parameter); maybe use struct_group()? [-Wattribute-warning]
| __read_overflow2_field(q_size_field, size);
Fix it by not passing a plain buffer to memcpy() to avoid the overflow
warning.
Fixes:
aaef3bc98129 ("firmware: arm_ffa: Add support for FFA_MSG_SEND_DIRECT_{REQ,RESP}2")
Signed-off-by: Gavin Shan <gshan@redhat.com>
Message-Id: <
20241014004724.991353-1-gshan@redhat.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Colin Ian King [Thu, 10 Oct 2024 15:45:19 +0000 (16:45 +0100)]
octeontx2-af: Fix potential integer overflows on integer shifts
The left shift int 32 bit integer constants 1 is evaluated using 32 bit
arithmetic and then assigned to a 64 bit unsigned integer. In the case
where the shift is 32 or more this can lead to an overflow. Avoid this
by shifting using the BIT_ULL macro instead.
Fixes:
019aba04f08c ("octeontx2-af: Modify SMQ flush sequence to drop packets")
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://patch.msgid.link/20241010154519.768785-1-colin.i.king@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Will Deacon [Mon, 14 Oct 2024 16:11:00 +0000 (17:11 +0100)]
kasan: Disable Software Tag-Based KASAN with GCC
Syzbot reports a KASAN failure early during boot on arm64 when building
with GCC 12.2.0 and using the Software Tag-Based KASAN mode:
| BUG: KASAN: invalid-access in smp_build_mpidr_hash arch/arm64/kernel/setup.c:133 [inline]
| BUG: KASAN: invalid-access in setup_arch+0x984/0xd60 arch/arm64/kernel/setup.c:356
| Write of size 4 at addr
03ff800086867e00 by task swapper/0
| Pointer tag: [03], memory tag: [fe]
Initial triage indicates that the report is a false positive and a
thorough investigation of the crash by Mark Rutland revealed the root
cause to be a bug in GCC:
> When GCC is passed `-fsanitize=hwaddress` or
> `-fsanitize=kernel-hwaddress` it ignores
> `__attribute__((no_sanitize_address))`, and instruments functions
> we require are not instrumented.
>
> [...]
>
> All versions [of GCC] I tried were broken, from 11.3.0 to 14.2.0
> inclusive.
>
> I think we have to disable KASAN_SW_TAGS with GCC until this is
> fixed
Disable Software Tag-Based KASAN when building with GCC by making
CC_HAS_KASAN_SW_TAGS depend on !CC_IS_GCC.
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: syzbot+908886656a02769af987@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/000000000000f362e80620e27859@google.com
Link: https://lore.kernel.org/r/ZvFGwKfoC4yVjN_X@J2N7QTR9R3
Link: https://bugzilla.kernel.org/show_bug.cgi?id=218854
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20241014161100.18034-1-will@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Paritosh Dixit [Thu, 10 Oct 2024 14:29:08 +0000 (10:29 -0400)]
net: stmmac: dwmac-tegra: Fix link bring-up sequence
The Tegra MGBE driver sometimes fails to initialize, reporting the
following error, and as a result, it is unable to acquire an IP
address with DHCP:
tegra-mgbe
6800000.ethernet: timeout waiting for link to become ready
As per the recommendation from the Tegra hardware design team, fix this
issue by:
- clearing the PHY_RDY bit before setting the CDR_RESET bit and then
setting PHY_RDY bit before clearing CDR_RESET bit. This ensures valid
data is present at UPHY RX inputs before starting the CDR lock.
- adding the required delays when bringing up the UPHY lane. Note we
need to use delays here because there is no alternative, such as
polling, for these cases. Using the usleep_range() instead of ndelay()
as sleeping is preferred over busy wait loop.
Without this change we would see link failures on boot sometimes as
often as 1 in 5 boots. With this fix we have not observed any failures
in over 1000 boots.
Fixes:
d8ca113724e7 ("net: stmmac: tegra: Add MGBE support")
Signed-off-by: Paritosh Dixit <paritoshd@nvidia.com>
Link: https://patch.msgid.link/20241010142908.602712-1-paritoshd@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Oliver Neukum [Thu, 10 Oct 2024 13:19:14 +0000 (15:19 +0200)]
net: usb: usbnet: fix race in probe failure
The same bug as in the disconnect code path also exists
in the case of a failure late during the probe process.
The flag must also be set.
Signed-off-by: Oliver Neukum <oneukum@suse.com>
Fixes:
1da177e4c3f4 ("Linux-2.6.12-rc2")
Link: https://patch.msgid.link/20241010131934.1499695-1-oneukum@suse.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Pierre-Louis Bossart [Tue, 1 Oct 2024 07:06:11 +0000 (15:06 +0800)]
ALSA/hda: intel-sdw-acpi: add support for sdw-manager-list property read
The DisCo for SoundWire 2.0 spec adds support for a new
sdw-manager-list property. Add it in backwards-compatible mode with
'sdw-master-count', which assumed that all links between 0..count-1
exist.
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://patch.msgid.link/20241001070611.63288-5-yung-chuan.liao@linux.intel.com
Pierre-Louis Bossart [Tue, 1 Oct 2024 07:06:10 +0000 (15:06 +0800)]
ALSA/hda: intel-sdw-acpi: simplify sdw-master-count property read
For some reason we used an array of one u8 when the specification
requires a u32.
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://patch.msgid.link/20241001070611.63288-4-yung-chuan.liao@linux.intel.com
Pierre-Louis Bossart [Tue, 1 Oct 2024 07:06:09 +0000 (15:06 +0800)]
ALSA/hda: intel-sdw-acpi: fetch fwnode once in sdw_intel_scan_controller()
Optimize a bit by using an intermediate 'fwnode' variable.
Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://patch.msgid.link/20241001070611.63288-3-yung-chuan.liao@linux.intel.com
Pierre-Louis Bossart [Tue, 1 Oct 2024 07:06:08 +0000 (15:06 +0800)]
ALSA/hda: intel-sdw-acpi: cleanup sdw_intel_scan_controller
Remove unnecessary initialization and un-shadow return code.
Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://patch.msgid.link/20241001070611.63288-2-yung-chuan.liao@linux.intel.com
Kai Shen [Thu, 10 Oct 2024 11:56:24 +0000 (11:56 +0000)]
net/smc: Fix memory leak when using percpu refs
This patch adds missing percpu_ref_exit when releasing percpu refs.
When releasing percpu refs, percpu_ref_exit should be called.
Otherwise, memory leak happens.
Fixes:
79a22238b4f2 ("net/smc: Use percpu ref for wr tx reference")
Signed-off-by: Kai Shen <KaiShen@linux.alibaba.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Link: https://patch.msgid.link/20241010115624.7769-1-KaiShen@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 15 Oct 2024 00:22:45 +0000 (17:22 -0700)]
Merge branch 'posix-clock-fix-missing-timespec64-check-for-ptp-clock'
Jinjie Ruan says:
====================
posix-clock: Fix missing timespec64 check for PTP clock
Check timespec64 in pc_clock_settime() for PTP clock as
the man manual of clock_settime() said.
====================
Link: https://patch.msgid.link/20241009072302.1754567-1-ruanjinjie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>