Lu Wei [Wed, 22 Feb 2023 08:36:29 +0000 (16:36 +0800)]
selftests: fib_tests: Add test cases for IPv4/IPv6 in route notify
Add tests to check whether the total fib info length is calculated
corretly in route notify process.
Signed-off-by: Lu Wei <luwei32@huawei.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20230222083629.335683-3-luwei32@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Lu Wei [Wed, 22 Feb 2023 08:36:28 +0000 (16:36 +0800)]
ipv6: Add lwtunnel encap size of all siblings in nexthop calculation
In function rt6_nlmsg_size(), the length of nexthop is calculated
by multipling the nexthop length of fib6_info and the number of
siblings. However if the fib6_info has no lwtunnel but the siblings
have lwtunnels, the nexthop length is less than it should be, and
it will trigger a warning in inet6_rt_notify() as follows:
WARNING: CPU: 0 PID: 6082 at net/ipv6/route.c:6180 inet6_rt_notify+0x120/0x130
......
Call Trace:
<TASK>
fib6_add_rt2node+0x685/0xa30
fib6_add+0x96/0x1b0
ip6_route_add+0x50/0xd0
inet6_rtm_newroute+0x97/0xa0
rtnetlink_rcv_msg+0x156/0x3d0
netlink_rcv_skb+0x5a/0x110
netlink_unicast+0x246/0x350
netlink_sendmsg+0x250/0x4c0
sock_sendmsg+0x66/0x70
___sys_sendmsg+0x7c/0xd0
__sys_sendmsg+0x5d/0xb0
do_syscall_64+0x3f/0x90
entry_SYSCALL_64_after_hwframe+0x72/0xdc
This bug can be reproduced by script:
ip -6 addr add 2002::2/64 dev ens2
ip -6 route add 100::/64 via 2002::1 dev ens2 metric 100
for i in 10 20 30 40 50 60 70;
do
ip link add link ens2 name ipv_$i type ipvlan
ip -6 addr add 2002::$i/64 dev ipv_$i
ifconfig ipv_$i up
done
for i in 10 20 30 40 50 60;
do
ip -6 route append 100::/64 encap ip6 dst 2002::$i via 2002::1
dev ipv_$i metric 100
done
ip -6 route append 100::/64 via 2002::1 dev ipv_70 metric 100
This patch fixes it by adding nexthop_len of every siblings using
rt6_nh_nlmsg_size().
Fixes:
beb1afac518d ("net: ipv6: Add support to dump multipath routes via RTA_MULTIPATH attribute")
Signed-off-by: Lu Wei <luwei32@huawei.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20230222083629.335683-2-luwei32@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Thu, 23 Feb 2023 05:25:23 +0000 (21:25 -0800)]
Merge git://git./linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
1) Fix broken listing of set elements when table has an owner.
2) Fix conntrack refcount leak in ctnetlink with related conntrack
entries, from Hangyu Hua.
3) Fix use-after-free/double-free in ctnetlink conntrack insert path,
from Florian Westphal.
4) Fix ip6t_rpfilter with VRF, from Phil Sutter.
5) Fix use-after-free in ebtables reported by syzbot, also from Florian.
6) Use skb->len in xt_length to deal with IPv6 jumbo packets,
from Xin Long.
7) Fix NETLINK_LISTEN_ALL_NSID with ctnetlink, from Florian Westphal.
8) Fix memleak in {ip_,ip6_,arp_}tables in ENOMEM error case,
from Pavel Tikhomirov.
* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: x_tables: fix percpu counter block leak on error path when creating new netns
netfilter: ctnetlink: make event listener tracking global
netfilter: xt_length: use skb len to match in length_mt6
netfilter: ebtables: fix table blob use-after-free
netfilter: ip6t_rpfilter: Fix regression with VRF interfaces
netfilter: conntrack: fix rmmod double-free race
netfilter: ctnetlink: fix possible refcount leak in ctnetlink_create_conntrack()
netfilter: nf_tables: allow to fetch set elements when table has an owner
====================
Link: https://lore.kernel.org/r/20230222092137.88637-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Íñigo Huguet [Tue, 21 Feb 2023 13:06:16 +0000 (14:06 +0100)]
ptp: vclock: use mutex to fix "sleep on atomic" bug
vclocks were using spinlocks to protect access to its timecounter and
cyclecounter. Access to timecounter/cyclecounter is backed by the same
driver callbacks that are used for non-virtual PHCs, but the usage of
the spinlock imposes a new limitation that didn't exist previously: now
they're called in atomic context so they mustn't sleep.
Some drivers like sfc or ice may sleep on these callbacks, causing
errors like "BUG: scheduling while atomic: ptp5/25223/0x00000002"
Fix it replacing the vclock's spinlock by a mutex. It fix the mentioned
bug and it doesn't introduce longer delays.
I've tested synchronizing various different combinations of clocks:
- vclock->sysclock
- sysclock->vclock
- vclock->vclock
- hardware PHC in different NIC -> vclock
- created 4 vclocks and launch 4 parallel phc2sys processes with
lockdep enabled
In all cases, comparing the delays reported by phc2sys, they are in the
same range of values than before applying the patch.
Link: https://lore.kernel.org/netdev/69d0ff33-bd32-6aa5-d36c-fbdc3c01337c@redhat.com/
Fixes:
5d43f951b1ac ("ptp: add ptp virtual clock driver framework")
Reported-by: Yalin Li <yalli@redhat.com>
Suggested-by: Richard Cochran <richardcochran@gmail.com>
Tested-by: Miroslav Lichvar <mlichvar@redhat.com>
Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Link: https://lore.kernel.org/r/20230221130616.21837-1-ihuguet@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pavel Tikhomirov [Mon, 13 Feb 2023 04:25:05 +0000 (12:25 +0800)]
netfilter: x_tables: fix percpu counter block leak on error path when creating new netns
Here is the stack where we allocate percpu counter block:
+-< __alloc_percpu
+-< xt_percpu_counter_alloc
+-< find_check_entry # {arp,ip,ip6}_tables.c
+-< translate_table
And it can be leaked on this code path:
+-> ip6t_register_table
+-> translate_table # allocates percpu counter block
+-> xt_register_table # fails
there is no freeing of the counter block on xt_register_table fail.
Note: xt_percpu_counter_free should be called to free it like we do in
do_replace through cleanup_entry helper (or in __ip6t_unregister_table).
Probability of hitting this error path is low AFAICS (xt_register_table
can only return ENOMEM here, as it is not replacing anything, as we are
creating new netns, and it is hard to imagine that all previous
allocations succeeded and after that one in xt_register_table failed).
But it's worth fixing even the rare leak.
Fixes:
71ae0dff02d7 ("netfilter: xtables: use percpu rule counters")
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Linus Torvalds [Wed, 22 Feb 2023 02:24:12 +0000 (18:24 -0800)]
Merge tag 'net-next-6.3' of git://git./linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core:
- Add dedicated kmem_cache for typical/small skb->head, avoid having
to access struct page at kfree time, and improve memory use.
- Introduce sysctl to set default RPS configuration for new netdevs.
- Define Netlink protocol specification format which can be used to
describe messages used by each family and auto-generate parsers.
Add tools for generating kernel data structures and uAPI headers.
- Expose all net/core sysctls inside netns.
- Remove 4s sleep in netpoll if carrier is instantly detected on
boot.
- Add configurable limit of MDB entries per port, and port-vlan.
- Continue populating drop reasons throughout the stack.
- Retire a handful of legacy Qdiscs and classifiers.
Protocols:
- Support IPv4 big TCP (TSO frames larger than 64kB).
- Add IP_LOCAL_PORT_RANGE socket option, to control local port range
on socket by socket basis.
- Track and report in procfs number of MPTCP sockets used.
- Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP path
manager.
- IPv6: don't check net.ipv6.route.max_size and rely on garbage
collection to free memory (similarly to IPv4).
- Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC8986).
- ICMP: add per-rate limit counters.
- Add support for user scanning requests in ieee802154.
- Remove static WEP support.
- Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate
reporting.
- WiFi 7 EHT channel puncturing support (client & AP).
BPF:
- Add a rbtree data structure following the "next-gen data structure"
precedent set by recently added linked list, that is, by using
kfunc + kptr instead of adding a new BPF map type.
- Expose XDP hints via kfuncs with initial support for RX hash and
timestamp metadata.
- Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to
better support decap on GRE tunnel devices not operating in collect
metadata.
- Improve x86 JIT's codegen for PROBE_MEM runtime error checks.
- Remove the need for trace_printk_lock for bpf_trace_printk and
bpf_trace_vprintk helpers.
- Extend libbpf's bpf_tracing.h support for tracing arguments of
kprobes/uprobes and syscall as a special case.
- Significantly reduce the search time for module symbols by
livepatch and BPF.
- Enable cpumasks to be used as kptrs, which is useful for tracing
programs tracking which tasks end up running on which CPUs in
different time intervals.
- Add support for BPF trampoline on s390x and riscv64.
- Add capability to export the XDP features supported by the NIC.
- Add __bpf_kfunc tag for marking kernel functions as kfuncs.
- Add cgroup.memory=nobpf kernel parameter option to disable BPF
memory accounting for container environments.
Netfilter:
- Remove the CLUSTERIP target. It has been marked as obsolete for
years, and we still have WARN splats wrt races of the out-of-band
/proc interface installed by this target.
- Add 'destroy' commands to nf_tables. They are identical to the
existing 'delete' commands, but do not return an error if the
referenced object (set, chain, rule...) did not exist.
Driver API:
- Improve cpumask_local_spread() locality to help NICs set the right
IRQ affinity on AMD platforms.
- Separate C22 and C45 MDIO bus transactions more clearly.
- Introduce new DCB table to control DSCP rewrite on egress.
- Support configuration of Physical Layer Collision Avoidance (PLCA)
Reconciliation Sublayer (RS) (802.3cg-2019). Modern version of
shared medium Ethernet.
- Support for MAC Merge layer (IEEE 802.3-2018 clause 99). Allowing
preemption of low priority frames by high priority frames.
- Add support for controlling MACSec offload using netlink SET.
- Rework devlink instance refcounts to allow registration and
de-registration under the instance lock. Split the code into
multiple files, drop some of the unnecessarily granular locks and
factor out common parts of netlink operation handling.
- Add TX frame aggregation parameters (for USB drivers).
- Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning
messages with notifications for debug.
- Allow offloading of UDP NEW connections via act_ct.
- Add support for per action HW stats in TC.
- Support hardware miss to TC action (continue processing in SW from
a specific point in the action chain).
- Warn if old Wireless Extension user space interface is used with
modern cfg80211/mac80211 drivers. Do not support Wireless
Extensions for Wi-Fi 7 devices at all. Everyone should switch to
using nl80211 interface instead.
- Improve the CAN bit timing configuration. Use extack to return
error messages directly to user space, update the SJW handling,
including the definition of a new default value that will benefit
CAN-FD controllers, by increasing their oscillator tolerance.
New hardware / drivers:
- Ethernet:
- nVidia BlueField-3 support (control traffic driver)
- Ethernet support for imx93 SoCs
- Motorcomm yt8531 gigabit Ethernet PHY
- onsemi NCN26000 10BASE-T1S PHY (with support for PLCA)
- Microchip LAN8841 PHY (incl. cable diagnostics and PTP)
- Amlogic gxl MDIO mux
- WiFi:
- RealTek RTL8188EU (rtl8xxxu)
- Qualcomm Wi-Fi 7 devices (ath12k)
- CAN:
- Renesas R-Car V4H
Drivers:
- Bluetooth:
- Set Per Platform Antenna Gain (PPAG) for Intel controllers.
- Ethernet NICs:
- Intel (1G, igc):
- support TSN / Qbv / packet scheduling features of i226 model
- Intel (100G, ice):
- use GNSS subsystem instead of TTY
- multi-buffer XDP support
- extend support for GPIO pins to E823 devices
- nVidia/Mellanox:
- update the shared buffer configuration on PFC commands
- implement PTP adjphase function for HW offset control
- TC support for Geneve and GRE with VF tunnel offload
- more efficient crypto key management method
- multi-port eswitch support
- Netronome/Corigine:
- add DCB IEEE support
- support IPsec offloading for NFP3800
- Freescale/NXP (enetc):
- support XDP_REDIRECT for XDP non-linear buffers
- improve reconfig, avoid link flap and waiting for idle
- support MAC Merge layer
- Other NICs:
- sfc/ef100: add basic devlink support for ef100
- ionic: rx_push mode operation (writing descriptors via MMIO)
- bnxt: use the auxiliary bus abstraction for RDMA
- r8169: disable ASPM and reset bus in case of tx timeout
- cpsw: support QSGMII mode for J721e CPSW9G
- cpts: support pulse-per-second output
- ngbe: add an mdio bus driver
- usbnet: optimize usbnet_bh() by avoiding unnecessary queuing
- r8152: handle devices with FW with NCM support
- amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation
- virtio-net: support multi buffer XDP
- virtio/vsock: replace virtio_vsock_pkt with sk_buff
- tsnep: XDP support
- Ethernet high-speed switches:
- nVidia/Mellanox (mlxsw):
- add support for latency TLV (in FW control messages)
- Microchip (sparx5):
- separate explicit and implicit traffic forwarding rules, make
the implicit rules always active
- add support for egress DSCP rewrite
- IS0 VCAP support (Ingress Classification)
- IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS
etc.)
- ES2 VCAP support (Egress Access Control)
- support for Per-Stream Filtering and Policing (802.1Q,
8.6.5.1)
- Ethernet embedded switches:
- Marvell (mv88e6xxx):
- add MAB (port auth) offload support
- enable PTP receive for mv88e6390
- NXP (ocelot):
- support MAC Merge layer
- support for the the vsc7512 internal copper phys
- Microchip:
- lan9303: convert to PHYLINK
- lan966x: support TC flower filter statistics
- lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x
- lan937x: support Credit Based Shaper configuration
- ksz9477: support Energy Efficient Ethernet
- other:
- qca8k: convert to regmap read/write API, use bulk operations
- rswitch: Improve TX timestamp accuracy
- Intel WiFi (iwlwifi):
- EHT (Wi-Fi 7) rate reporting
- STEP equalizer support: transfer some STEP (connection to radio
on platforms with integrated wifi) related parameters from the
BIOS to the firmware.
- Qualcomm 802.11ax WiFi (ath11k):
- IPQ5018 support
- Fine Timing Measurement (FTM) responder role support
- channel 177 support
- MediaTek WiFi (mt76):
- per-PHY LED support
- mt7996: EHT (Wi-Fi 7) support
- Wireless Ethernet Dispatch (WED) reset support
- switch to using page pool allocator
- RealTek WiFi (rtw89):
- support new version of Bluetooth co-existance
- Mobile:
- rmnet: support TX aggregation"
* tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1872 commits)
page_pool: add a comment explaining the fragment counter usage
net: ethtool: fix __ethtool_dev_mm_supported() implementation
ethtool: pse-pd: Fix double word in comments
xsk: add linux/vmalloc.h to xsk.c
sefltests: netdevsim: wait for devlink instance after netns removal
selftest: fib_tests: Always cleanup before exit
net/mlx5e: Align IPsec ASO result memory to be as required by hardware
net/mlx5e: TC, Set CT miss to the specific ct action instance
net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG
net/mlx5: Refactor tc miss handling to a single function
net/mlx5: Kconfig: Make tc offload depend on tc skb extension
net/sched: flower: Support hardware miss to tc action
net/sched: flower: Move filter handle initialization earlier
net/sched: cls_api: Support hardware miss to tc action
net/sched: Rename user cookie and act cookie
sfc: fix builds without CONFIG_RTC_LIB
sfc: clean up some inconsistent indentings
net/mlx4_en: Introduce flexible array to silence overflow warning
net: lan966x: Fix possible deadlock inside PTP
net/ulp: Remove redundant ->clone() test in inet_clone_ulp().
...
Linus Torvalds [Wed, 22 Feb 2023 02:10:50 +0000 (18:10 -0800)]
Merge tag 'v6.3-p1' of git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto update from Herbert Xu:
"API:
- Use kmap_local instead of kmap_atomic
- Change request callback to take void pointer
- Print FIPS status in /proc/crypto (when enabled)
Algorithms:
- Add rfc4106/gcm support on arm64
- Add ARIA AVX2/512 support on x86
Drivers:
- Add TRNG driver for StarFive SoC
- Delete ux500/hash driver (subsumed by stm32/hash)
- Add zlib support in qat
- Add RSA support in aspeed"
* tag 'v6.3-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (156 commits)
crypto: x86/aria-avx - Do not use avx2 instructions
crypto: aspeed - Fix modular aspeed-acry
crypto: hisilicon/qm - fix coding style issues
crypto: hisilicon/qm - update comments to match function
crypto: hisilicon/qm - change function names
crypto: hisilicon/qm - use min() instead of min_t()
crypto: hisilicon/qm - remove some unused defines
crypto: proc - Print fips status
crypto: crypto4xx - Call dma_unmap_page when done
crypto: octeontx2 - Fix objects shared between several modules
crypto: nx - Fix sparse warnings
crypto: ecc - Silence sparse warning
tls: Pass rec instead of aead_req into tls_encrypt_done
crypto: api - Remove completion function scaffolding
tls: Remove completion function scaffolding
tipc: Remove completion function scaffolding
net: ipv6: Remove completion function scaffolding
net: ipv4: Remove completion function scaffolding
net: macsec: Remove completion function scaffolding
dm: Remove completion function scaffolding
...
Linus Torvalds [Wed, 22 Feb 2023 01:32:50 +0000 (17:32 -0800)]
Merge tag 'platform-drivers-x86-v6.3-1' of git://git./linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver updates from Hans de Goede:
- AMD PMC: Improvements to aid s2idle debugging
- Dell WMI-DDV: hwmon support
- INT3472 camera sensor power-management: Improve privacy LED support
- Intel VSEC: Base TPMI (Topology Aware Register and PM Capsule
Interface) support
- Mellanox: SN5600 and Nvidia L1 switch support
- Microsoft Surface Support: Various cleanups + code improvements
- tools/intel-speed-select: Various improvements
- Miscellaneous other cleanups / fixes
* tag 'platform-drivers-x86-v6.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (80 commits)
platform/x86: nvidia-wmi-ec-backlight: Add force module parameter
platform/x86/amd/pmf: Add depends on CONFIG_POWER_SUPPLY
platform/x86: dell-ddv: Prefer asynchronous probing
platform/x86: dell-ddv: Add hwmon support
Documentation/ABI: Add new attribute for mlxreg-io sysfs interfaces
platform: mellanox: mlx-platform: Move bus shift assignment out of the loop
platform: mellanox: mlx-platform: Add mux selection register to regmap
platform_data/mlxreg: Add field with mapped resource address
platform/mellanox: mlxreg-hotplug: Allow more flexible hotplug events configuration
platform: mellanox: Extend all systems with I2C notification callback
platform: mellanox: Split logic in init and exit flow
platform: mellanox: Split initialization procedure
platform: mellanox: Introduce support of new Nvidia L1 switch
platform: mellanox: Introduce support for next-generation 800GB/s switch
platform: mellanox: Cosmetic changes - rename to more common name
platform: mellanox: Change "reset_pwr_converter_fail" attribute
platform: mellanox: Introduce support for rack manager switch
MAINTAINERS: dell-wmi-sysman: drop Divya Bharathi
x86/platform/uv: Make kobj_type structure constant
platform/x86: think-lmi: Make kobj_type structure constant
...
Linus Torvalds [Wed, 22 Feb 2023 01:23:39 +0000 (17:23 -0800)]
Merge tag 'tag-chrome-platform-for-v6.3' of git://git./linux/kernel/git/chrome-platform/linux
Pull chrome platform updates from Tzung-Bi Shih:
"New drivers:
- cros_ec_uart for ChromeOS EC protocol over UART
- cros_typec_vdm for USB PD Vendor Defined Message
Improvements:
- Preserve logs as much as possible when EC panics
- Shutdown to refrain from potential HW damages when EC panics
Fixes:
- Fix DP_PORT_VDO to include DP_CAP_RECEPTACLE
- Fix a lockdep false positive
Cleanups:
- Use sysfs_emit*() instead of scnprintf()
- Use asm instead of asm-generic for unaligned.h
Misc:
- Rename module name from cros_ec_typec to cros-ec-typec
- Minor fixes"
* tag 'tag-chrome-platform-for-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux: (34 commits)
platform/chrome: cros_ec_typec: Fix spelling mistake
platform/chrome: cros_typec_vdm: Add Attention support
platform/chrome: cros_ec: Add VDM attention headers
platform/chrome: cros_typec_vdm: Fix VDO copy
platform/chrome: cros_ec_typec: allow deferred probe of switch handles
platform/chrome: cros_ec_proto: remove big stub objects from stack
platform/chrome: cros_ec_uart: fix negative type promoted to high
platform/chrome: cros_ec: Use per-device lockdep key
platform/chrome: fix kernel-doc warnings for cros_ec_command
platform/chrome: fix kernel-doc warning for last_resume_result
platform/chrome: fix kernel-doc warning for suspend_timeout_ms
platform/chrome: fix kernel-doc warnings for panic notifier
platform/chrome: cros_ec_lpc: initialize the buf variable
platform/chrome: cros_ec: Fix panic notifier registration
platform/chrome: cros_typec_switch: Check for retimer flag
platform/chrome: cros_typec_switch: Use fwnode* prop check
platform/chrome: cros_typec_vdm: Add VDM send support
platform/chrome: cros_typec_vdm: Add VDM reply support
platform/chrome: cros_ec_typec: Add initial VDM support
platform/chrome: cros_ec_typec: Alter module name with hyphens
...
Linus Torvalds [Wed, 22 Feb 2023 01:07:39 +0000 (17:07 -0800)]
Merge tag 'for-linus-6.3-rc1-tag' of git://git./linux/kernel/git/xen/tip
Pull xen updates from Juergen Gross:
- help deprecate the /proc/xen files by making the related information
available via sysfs
- mark the Xen variants of play_dead "noreturn"
- support a shared Xen platform interrupt
- several small cleanups and fixes
* tag 'for-linus-6.3-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: sysfs: make kobj_type structure constant
x86/Xen: drop leftover VM-assist uses
xen: Replace one-element array with flexible-array member
xen/grant-dma-iommu: Implement a dummy probe_device() callback
xen/pvcalls-back: fix permanently masked event channel
xen: Allow platform PCI interrupt to be shared
x86/xen/time: prefer tsc as clocksource when it is invariant
x86/xen: mark xen_pv_play_dead() as __noreturn
x86/xen: don't let xen_pv_play_dead() return
drivers/xen/hypervisor: Expose Xen SIF flags to userspace
Linus Torvalds [Wed, 22 Feb 2023 00:59:23 +0000 (16:59 -0800)]
Merge tag 'hyperv-next-signed-
20230220' of git://git./linux/kernel/git/hyperv/linux
Pull hyperv updates from Wei Liu:
- allow Linux to run as the nested root partition for Microsoft
Hypervisor (Jinank Jain and Nuno Das Neves)
- clean up the return type of callback functions (Dawei Li)
* tag 'hyperv-next-signed-
20230220' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
x86/hyperv: Fix hv_get/set_register for nested bringup
Drivers: hv: Make remove callback of hyperv driver void returned
Drivers: hv: Enable vmbus driver for nested root partition
x86/hyperv: Add an interface to do nested hypercalls
Drivers: hv: Setup synic registers in case of nested root partition
x86/hyperv: Add support for detecting nested hypervisor
Florian Westphal [Mon, 20 Feb 2023 16:24:00 +0000 (17:24 +0100)]
netfilter: ctnetlink: make event listener tracking global
pernet tracking doesn't work correctly because other netns might have
set NETLINK_LISTEN_ALL_NSID on its event socket.
In this case its expected that events originating in other net
namespaces are also received.
Making pernet-tracking work while also honoring NETLINK_LISTEN_ALL_NSID
requires much more intrusive changes both in netlink and nfnetlink,
f.e. adding a 'setsockopt' callback that lets nfnetlink know that the
event socket entered (or left) ALL_NSID mode.
Move to global tracking instead: if there is an event socket anywhere
on the system, all net namespaces which have conntrack enabled and
use autobind mode will allocate the ecache extension.
netlink_has_listeners() returns false only if the given group has no
subscribers in any net namespace, the 'net' argument passed to
nfnetlink_has_listeners is only used to derive the protocol (nfnetlink),
it has no other effect.
For proper NETLINK_LISTEN_ALL_NSID-aware pernet tracking of event
listeners a new netlink_has_net_listeners() is also needed.
Fixes:
90d1daa45849 ("netfilter: conntrack: add nf_conntrack_events autodetect mode")
Reported-by: Bryce Kahle <bryce.kahle@datadoghq.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Xin Long [Fri, 17 Feb 2023 23:22:57 +0000 (18:22 -0500)]
netfilter: xt_length: use skb len to match in length_mt6
For IPv6 Jumbo packets, the ipv6_hdr(skb)->payload_len is always 0,
and its real payload_len ( > 65535) is saved in hbh exthdr. With 0
length for the jumbo packets, it may mismatch.
To fix this, we can just use skb->len instead of parsing exthdrs, as
the hbh exthdr parsing has been done before coming to length_mt6 in
ip6_rcv_core() and br_validate_ipv6() and also the packet has been
trimmed according to the correct IPv6 (ext)hdr length there, and skb
len is trustable in length_mt6().
Note that this patch is especially needed after the IPv6 BIG TCP was
supported in kernel, which is using IPv6 Jumbo packets. Besides, to
match the packets greater than 65535 more properly, a v1 revision of
xt_length may be needed to extend "min, max" to u32 in the future,
and for now the IPv6 Jumbo packets can be matched by:
# ip6tables -m length ! --length 0:65535
Fixes:
7c4e983c4f3c ("net: allow gso_max_size to exceed 65536")
Fixes:
0fe79f28bfaf ("net: allow gro_max_size to exceed 65536")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Linus Torvalds [Tue, 21 Feb 2023 23:27:48 +0000 (15:27 -0800)]
Merge tag 'arm64-upstream' of git://git./linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
- Support for arm64 SME 2 and 2.1. SME2 introduces a new 512-bit
architectural register (ZT0, for the look-up table feature) that
Linux needs to save/restore
- Include TPIDR2 in the signal context and add the corresponding
kselftests
- Perf updates: Arm SPEv1.2 support, HiSilicon uncore PMU updates, ACPI
support to the Marvell DDR and TAD PMU drivers, reset DTM_PMU_CONFIG
(ARM CMN) at probe time
- Support for DYNAMIC_FTRACE_WITH_CALL_OPS on arm64
- Permit EFI boot with MMU and caches on. Instead of cleaning the
entire loaded kernel image to the PoC and disabling the MMU and
caches before branching to the kernel bare metal entry point, leave
the MMU and caches enabled and rely on EFI's cacheable 1:1 mapping of
all of system RAM to populate the initial page tables
- Expose the AArch32 (compat) ELF_HWCAP features to user in an arm64
kernel (the arm32 kernel only defines the values)
- Harden the arm64 shadow call stack pointer handling: stash the shadow
stack pointer in the task struct on interrupt, load it directly from
this structure
- Signal handling cleanups to remove redundant validation of size
information and avoid reading the same data from userspace twice
- Refactor the hwcap macros to make use of the automatically generated
ID registers. It should make new hwcaps writing less error prone
- Further arm64 sysreg conversion and some fixes
- arm64 kselftest fixes and improvements
- Pointer authentication cleanups: don't sign leaf functions, unify
asm-arch manipulation
- Pseudo-NMI code generation optimisations
- Minor fixes for SME and TPIDR2 handling
- Miscellaneous updates: ARCH_FORCE_MAX_ORDER is now selectable,
replace strtobool() to kstrtobool() in the cpufeature.c code, apply
dynamic shadow call stack in two passes, intercept pfn changes in
set_pte_at() without the required break-before-make sequence, attempt
to dump all instructions on unhandled kernel faults
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (130 commits)
arm64: fix .idmap.text assertion for large kernels
kselftest/arm64: Don't require FA64 for streaming SVE+ZA tests
kselftest/arm64: Copy whole EXTRA context
arm64: kprobes: Drop ID map text from kprobes blacklist
perf: arm_spe: Print the version of SPE detected
perf: arm_spe: Add support for SPEv1.2 inverted event filtering
perf: Add perf_event_attr::config3
arm64/sme: Fix __finalise_el2 SMEver check
drivers/perf: fsl_imx8_ddr_perf: Remove set-but-not-used variable
arm64/signal: Only read new data when parsing the ZT context
arm64/signal: Only read new data when parsing the ZA context
arm64/signal: Only read new data when parsing the SVE context
arm64/signal: Avoid rereading context frame sizes
arm64/signal: Make interface for restore_fpsimd_context() consistent
arm64/signal: Remove redundant size validation from parse_user_sigframe()
arm64/signal: Don't redundantly verify FPSIMD magic
arm64/cpufeature: Use helper macros to specify hwcaps
arm64/cpufeature: Always use symbolic name for feature value in hwcaps
arm64/sysreg: Initial unsigned annotations for ID registers
arm64/sysreg: Initial annotation of signed ID registers
...
Florian Westphal [Fri, 17 Feb 2023 22:20:06 +0000 (23:20 +0100)]
netfilter: ebtables: fix table blob use-after-free
We are not allowed to return an error at this point.
Looking at the code it looks like ret is always 0 at this
point, but its not.
t = find_table_lock(net, repl->name, &ret, &ebt_mutex);
... this can return a valid table, with ret != 0.
This bug causes update of table->private with the new
blob, but then frees the blob right away in the caller.
Syzbot report:
BUG: KASAN: vmalloc-out-of-bounds in __ebt_unregister_table+0xc00/0xcd0 net/bridge/netfilter/ebtables.c:1168
Read of size 4 at addr
ffffc90005425000 by task kworker/u4:4/74
Workqueue: netns cleanup_net
Call Trace:
kasan_report+0xbf/0x1f0 mm/kasan/report.c:517
__ebt_unregister_table+0xc00/0xcd0 net/bridge/netfilter/ebtables.c:1168
ebt_unregister_table+0x35/0x40 net/bridge/netfilter/ebtables.c:1372
ops_exit_list+0xb0/0x170 net/core/net_namespace.c:169
cleanup_net+0x4ee/0xb10 net/core/net_namespace.c:613
...
ip(6)tables appears to be ok (ret should be 0 at this point) but make
this more obvious.
Fixes:
c58dd2dd443c ("netfilter: Can't fail and free after table replacement")
Reported-by: syzbot+f61594de72d6705aea03@syzkaller.appspotmail.com
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Phil Sutter [Thu, 16 Feb 2023 16:05:36 +0000 (17:05 +0100)]
netfilter: ip6t_rpfilter: Fix regression with VRF interfaces
When calling ip6_route_lookup() for the packet arriving on the VRF
interface, the result is always the real (slave) interface. Expect this
when validating the result.
Fixes:
acc641ab95b66 ("netfilter: rpfilter/fib: Populate flowic_l3mdev field")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Linus Torvalds [Tue, 21 Feb 2023 23:21:29 +0000 (15:21 -0800)]
Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM udpates from Russell King:
- Improve Kconfig help text for Cortex A8 and Cortex A9 errata
- Kconfig spelling and grammar fixes
- Allow kernel-mode VFP/Neon in softirq context
- Use Neon in softirq context
- Implement AES-CTR/GHASH version of GCM
* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: 9289/1: Allow pre-ARMv5 builds with ld.lld 16.0.0 and newer
ARM: 9288/1: Kconfigs: fix spelling & grammar
ARM: 9286/1: crypto: Implement fused AES-CTR/GHASH version of GCM
ARM: 9285/1: remove meaningless arch/arm/mach-rda/Makefile
ARM: 9283/1: permit non-nested kernel mode NEON in softirq context
ARM: 9282/1: vfp: Manipulate task VFP state with softirqs disabled
ARM: 9281/1: improve Cortex A8/A9 errata help text
Florian Westphal [Tue, 14 Feb 2023 16:23:59 +0000 (17:23 +0100)]
netfilter: conntrack: fix rmmod double-free race
nf_conntrack_hash_check_insert() callers free the ct entry directly, via
nf_conntrack_free.
This isn't safe anymore because
nf_conntrack_hash_check_insert() might place the entry into the conntrack
table and then delteted the entry again because it found that a conntrack
extension has been removed at the same time.
In this case, the just-added entry is removed again and an error is
returned to the caller.
Problem is that another cpu might have picked up this entry and
incremented its reference count.
This results in a use-after-free/double-free, once by the other cpu and
once by the caller of nf_conntrack_hash_check_insert().
Fix this by making nf_conntrack_hash_check_insert() not fail anymore
after the insertion, just like before the 'Fixes' commit.
This is safe because a racing nf_ct_iterate() has to wait for us
to release the conntrack hash spinlocks.
While at it, make the function return -EAGAIN in the rmmod (genid
changed) case, this makes nfnetlink replay the command (suggested
by Pablo Neira).
Fixes:
c56716c69ce1 ("netfilter: extensions: introduce extension genid count")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Linus Torvalds [Tue, 21 Feb 2023 23:17:34 +0000 (15:17 -0800)]
Merge tag 'm68k-for-v6.3-tag1' of git://git./linux/kernel/git/geert/linux-m68k
Pull m68k updates from Geert Uytterhoeven:
- Add seccomp support
- defconfig updates
- Miscellaneous fixes and improvements
* tag 'm68k-for-v6.3-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
m68k: /proc/hardware should depend on PROC_FS
selftests/seccomp: Add m68k support
m68k: Add kernel seccomp support
m68k: Check syscall_trace_enter() return code
m68k: defconfig: Update defconfigs for v6.2-rc3
m68k: q40: Do not initialise statics to 0
Linus Torvalds [Tue, 21 Feb 2023 23:09:17 +0000 (15:09 -0800)]
Merge tag 's390-6.3-1' of git://git./linux/kernel/git/s390/linux
Pull s390 updates from Heiko Carstens:
- Large cleanup of the con3270/tty3270 driver. Among others this fixes:
- Background Color Support
- ASCII Line Character Support
- VT100 Support
- Geometries other than 80x24
- Cleanup and improve cmpxchg() code. Also add cmpxchg_user_key() to
uaccess functions, which will be used by KVM to access KVM guest
memory with a specific storage key
- Add support for user space events counting to CPUMF
- Cleanup the vfio/ccw code, which also allows now to properly support
2K Format-2 IDALs
- Move kernel page table allocation and initialization to decompressor,
which finally allows to enter the kernel with dynamic address
translation enabled. This in turn allows to get rid of code with
special handling in the kernel, which has to distinguish if DAT is on
or off
- Replace kretprobe with rethook
- Various improvements to vfio/ap queue resets:
- Use TAPQ to verify completion of a reset in progress rather than
multiple invocations of ZAPQ.
- Check TAPQ response codes when verifying successful completion of
ZAPQ.
- Fix erroneous handling of some error response codes.
- Increase the maximum amount of time to wait for successful
completion of ZAPQ
- Rework system call wrappers to get rid of alias functions, which were
only left on s390
- Cleanup diag288_wdt watchdog driver. It has been agreed on with
Guenter Roeck that this goes upstream via the s390 tree
- Add missing loadparm parameter handling for list-directed ECKD
ipl/reipl
- Various improvements to memory detection code
- Remove arch_cpu_idle_time() since the current implementation is
broken, and allows user space observable accounted idle times which
can temporarily decrease
- Add Reset DAT-Protection support: (only) allow to change PTEs from RO
to RW with a new RDP instruction. Unlike the currently used IPTE
instruction, this does not necessarily guarantee that TLBs of all
CPUs are synchronously flushed; and that remote CPUs can see spurious
protection faults. The overall improvement for not requiring an all
CPU synchronization, like it is required with IPTE, should be
beneficial
- Fix KFENCE page fault reporting
- Smaller cleanups and improvement all over the place
* tag 's390-6.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (182 commits)
s390/irq,idle: simplify idle check
s390/processor: add test_and_set_cpu_flag() and test_and_clear_cpu_flag()
s390/processor: let cpu helper functions return boolean values
s390/kfence: fix page fault reporting
s390/zcrypt: introduce ctfm field in struct CPRBX
s390: remove confusing comment from uapi types header file
vfio/ccw: remove WARN_ON during shutdown
s390/entry: remove toolchain dependent micro-optimization
s390/mem_detect: do not truncate online memory ranges info
s390/vx: remove __uint128_t type from __vector128 struct again
s390/mm: add support for RDP (Reset DAT-Protection)
s390/mm: define private VM_FAULT_* reasons from top bits
Documentation: s390: correct spelling
s390/ap: fix status returned by ap_qact()
s390/ap: fix status returned by ap_aqic()
s390: vfio-ap: tighten the NIB validity check
Revert "s390/mem_detect: do not update output parameters on failure"
s390/idle: remove arch_cpu_idle_time() and corresponding code
s390/vx: use simple assignments to access __vector128 members
s390/vx: add 64 and 128 bit members to __vector128 struct
...
Hangyu Hua [Fri, 10 Feb 2023 07:17:30 +0000 (15:17 +0800)]
netfilter: ctnetlink: fix possible refcount leak in ctnetlink_create_conntrack()
nf_ct_put() needs to be called to put the refcount got by
nf_conntrack_find_get() to avoid refcount leak when
nf_conntrack_hash_check_insert() fails.
Fixes:
7d367e06688d ("netfilter: ctnetlink: fix soft lockup when netlink adds new entries (v2)")
Signed-off-by: Hangyu Hua <hbh25y@gmail.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Linus Torvalds [Tue, 21 Feb 2023 22:51:40 +0000 (14:51 -0800)]
Merge tag 'x86_cpu_for_v6.3_rc1' of git://git./linux/kernel/git/tip/tip
Pull x86 cpuid updates from Borislav Petkov:
- Cache the AMD debug registers in per-CPU variables to avoid MSR
writes where possible, when supporting a debug registers swap feature
for SEV-ES guests
- Add support for AMD's version of eIBRS called Automatic IBRS which is
a set-and-forget control of indirect branch restriction speculation
resources on privilege change
- Add support for a new x86 instruction - LKGS - Load kernel GS which
is part of the FRED infrastructure
- Reset SPEC_CTRL upon init to accomodate use cases like kexec which
rediscover
- Other smaller fixes and cleanups
* tag 'x86_cpu_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/amd: Cache debug register values in percpu variables
KVM: x86: Propagate the AMD Automatic IBRS feature to the guest
x86/cpu: Support AMD Automatic IBRS
x86/cpu, kvm: Add the SMM_CTL MSR not present feature
x86/cpu, kvm: Add the Null Selector Clears Base feature
x86/cpu, kvm: Move X86_FEATURE_LFENCE_RDTSC to its native leaf
x86/cpu, kvm: Add the NO_NESTED_DATA_BP feature
KVM: x86: Move open-coded CPUID leaf 0x80000021 EAX bit propagation code
x86/cpu, kvm: Add support for CPUID_80000021_EAX
x86/gsseg: Add the new <asm/gsseg.h> header to <asm/asm-prototypes.h>
x86/gsseg: Use the LKGS instruction if available for load_gs_index()
x86/gsseg: Move load_gs_index() to its own new header file
x86/gsseg: Make asm_load_gs_index() take an u16
x86/opcode: Add the LKGS instruction to x86-opcode-map
x86/cpufeature: Add the CPU feature bit for LKGS
x86/bugs: Reset speculation control settings on init
x86/cpu: Remove redundant extern x86_read_arch_cap_msr()
Dave Hansen [Tue, 21 Feb 2023 20:30:15 +0000 (12:30 -0800)]
uaccess: Add speculation barrier to copy_from_user()
The results of "access_ok()" can be mis-speculated. The result is that
you can end speculatively:
if (access_ok(from, size))
// Right here
even for bad from/size combinations. On first glance, it would be ideal
to just add a speculation barrier to "access_ok()" so that its results
can never be mis-speculated.
But there are lots of system calls just doing access_ok() via
"copy_to_user()" and friends (example: fstat() and friends). Those are
generally not problematic because they do not _consume_ data from
userspace other than the pointer. They are also very quick and common
system calls that should not be needlessly slowed down.
"copy_from_user()" on the other hand uses a user-controller pointer and
is frequently followed up with code that might affect caches. Take
something like this:
if (!copy_from_user(&kernelvar, uptr, size))
do_something_with(kernelvar);
If userspace passes in an evil 'uptr' that *actually* points to a kernel
addresses, and then do_something_with() has cache (or other)
side-effects, it could allow userspace to infer kernel data values.
Add a barrier to the common copy_from_user() code to prevent
mis-speculated values which happen after the copy.
Also add a stub for architectures that do not define barrier_nospec().
This makes the macro usable in generic code.
Since the barrier is now usable in generic code, the x86 #ifdef in the
BPF code can also go away.
Reported-by: Jordy Zomer <jordyzomer@google.com>
Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Daniel Borkmann <daniel@iogearbox.net> # BPF bits
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Tue, 21 Feb 2023 20:32:05 +0000 (12:32 -0800)]
Merge tag 'thermal-6.3-rc1' of git://git./linux/kernel/git/rafael/linux-pm
Pull thermal control updates from Rafael Wysocki:
"The majority of changes here are related to the general switch-over to
using arrays of generic trip point structures registered along with a
thermal zone instead of trip point callbacks (this has been done
mostly by Daniel Lezcano with some help from yours truly on the Intel
drivers front).
Apart from that and the related reorganization of code, there are some
enhancements of the existing driver and a new Mediatek Low Voltage
Thermal Sensor (LVTS) driver. The Intel powerclamp undergoes a major
rework so it will use the generic idle_inject facility for CPU idle
time injection going forward and it will take additional module
parameters for specifying the subset of CPUs to be affected by it
(work done by Srinivas Pandruvada).
Also included are assorted fixes and a whole bunch of cleanups.
Specifics:
- Rework a large bunch of drivers to use the generic thermal trip
structure and use the opportunity to do more cleanups by removing
unused functions from the OF code (Daniel Lezcano)
- Remove core header inclusion from drivers (Daniel Lezcano)
- Fix some locking issues related to the generic thermal trip rework
(Johan Hovold)
- Fix a crash when requesting the critical temperature on tegra,
which is related to the generic trip point work (Jon Hunter)
- Clean up thermal device unregistration code (Viresh Kumar)
- Fix and clean up thermal control core initialization error code
paths (Daniel Lezcano)
- Relocate the trip points handling code into a separate file (Daniel
Lezcano)
- Make the thermal core fail registration of thermal zones and
cooling devices if the thermal class has not been registered
(Rafael Wysocki)
- Add trip point initialization helper functions for ACPI-defined
trip points and modify two thermal drivers to use them (Rafael
Wysocki, Daniel Lezcano)
- Make the core thermal control code use sysfs_emit_at() instead of
scnprintf() where applicable (ye xingchen)
- Consolidate code accessing the Intel TCC (Thermal Control
Circuitry) MSRs by introducing library functions for that and
making the TCC-related code in thermal drivers use them (Zhang Rui)
- Enhance the x86_pkg_temp_thermal driver to support dynamic tjmax
changes (Zhang Rui)
- Address an "unsigned expression compared with zero" warning in the
intel_soc_dts_iosf thermal driver (Yang Li)
- Update comments regarding two functions in the Intel Menlow thermal
driver (Deming Wang)
- Use sysfs_emit_at() instead of scnprintf() in the int340x thermal
driver (ye xingchen)
- Make the intel_pch thermal driver support the Wellsburg PCH (Tim
Zimmermann)
- Modify the intel_pch and processor_thermal_device_pci thermal
drivers use generic trip point tables instead of thermal zone trip
point callbacks (Daniel Lezcano)
- Add production mode attribute sysfs attribute to the int340x
thermal driver (Srinivas Pandruvada)
- Rework dynamic trip point updates handling and locking in the
int340x thermal driver (Rafael Wysocki)
- Make the int340x thermal driver use a generic trip points table
instead of thermal zone trip point callbacks (Rafael Wysocki,
Daniel Lezcano)
- Clean up and improve the int340x thermal driver (Rafael Wysocki)
- Simplify and clean up the intel_pch thermal driver (Rafael Wysocki)
- Fix the Intel powerclamp thermal driver and make it use the common
idle injection framework (Srinivas Pandruvada)
- Add two module parameters, cpumask and max_idle, to the Intel
powerclamp thermal driver to allow it to affect only a specific
subset of CPUs instead of all of them (Srinivas Pandruvada)
- Make the Intel quark_dts thermal driver Use generic trip point
objects instead of its own trip point representation (Daniel
Lezcano)
- Add toctree entry for thermal documents and fix two issues in the
Intel powerclamp driver documentation (Bagas Sanjaya)
- Use strscpy() to instead of strncpy() in the thermal core (Xu
Panda)
- Fix thermal_sampling_exit() (Vincent Guittot)
- Add Mediatek Low Voltage Thermal Sensor (LVTS) driver (Balsam
Chihi)
- Add r8a779g0 RCar support to the rcar_gen3 thermal driver (Geert
Uytterhoeven)
- Fix useless call to set_trips() when resuming in the rcar_gen3
thermal control driver and add interrupt support detection at init
time to it (Niklas Söderlund)
- Fix memory corruption in the hi3660 thermal driver (Yongqin Liu)
- Fix include path for libnl3 in pkg-config file for libthermal
(Vibhav Pant)
- Remove syscfg-based driver for st as the platform is not supported
any more (Alain Volmat)"
* tag 'thermal-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (135 commits)
thermal/drivers/st: Remove syscfg based driver
thermal: Remove core header inclusion from drivers
tools/lib/thermal: Fix include path for libnl3 in pkg-config file.
thermal/drivers/hisi: Drop second sensor hi3660
thermal/drivers/rcar_gen3_thermal: Fix device initialization
thermal/drivers/rcar_gen3_thermal: Create device local ops struct
thermal/drivers/rcar_gen3_thermal: Do not call set_trips() when resuming
thermal/drivers/rcar_gen3: Add support for R-Car V4H
dt-bindings: thermal: rcar-gen3-thermal: Add r8a779g0 support
thermal/drivers/mediatek: Add the Low Voltage Thermal Sensor driver
dt-bindings: thermal: mediatek: Add LVTS thermal controllers
thermal/drivers/mediatek: Relocate driver to mediatek folder
tools/lib/thermal: Fix thermal_sampling_exit()
Documentation: powerclamp: Fix numbered lists formatting
Documentation: powerclamp: Escape wildcard in cpumask description
Documentation: admin-guide: Add toctree entry for thermal docs
thermal: intel: powerclamp: Add two module parameters
Documentation: admin-guide: Move intel_powerclamp documentation
thermal: core: Use sysfs_emit_at() instead of scnprintf()
thermal: intel: powerclamp: Fix duration module parameter
...
Linus Torvalds [Tue, 21 Feb 2023 20:23:24 +0000 (12:23 -0800)]
Merge tag 'acpi-6.3-rc1' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI updates from Rafael Wysocki:
"These fix a frequency limit issue in the ACPI processor performance
library code, fix a few issues in the ACPICA code, improve Crystal
Cove support in the ACPI PMIC driver, fix string handling in the ACPI
battery driver, add IRQ override quirks for a few machines more, fix
other assorted problems and clean up code and documentation.
Specifics:
- Drop port I/O validation for some regions to avoid AML failures due
to rejections of legitimate port I/O writes (Mario Limonciello)
- Constify acpi_get_handle() pathname argument to allow its callers
to pass const pathnames to it (Sakari Ailus)
- Prevent acpi_ns_simple_repair() from crashing in some cases when
AE_AML_NO_RETURN_VALUE should be returned (Daniil Tatianin)
- Fix typo in CDAT DSMAS struct definition (Lukas Wunner)
- Drop an unnecessary (void *) conversion from the ACPI processor
driver (Zhou jie)
- Modify the ACPI processor performance library code to use the "no
limit" frequency QoS as appropriate and adjust the intel_pstate
driver accordingly (Rafael Wysocki)
- Add support for NBFT to the ACPI table parser (Stuart Hayes)
- Introduce list of known non-PNP devices to avoid enumerating some
of them as PNP devices (Rafael Wysocki)
- Add x86 ACPI paths to the ACPI entry in MAINTAINERS to allow
scripts to report the actual maintainers information (Rafael
Wysocki)
- Add two more entries to the ACPI IRQ override quirk list (Adam
Niederer, Werner Sembach)
- Add a pmic_i2c_address entry for Intel Bay Trail Crystal Cove to
allow intel_soc_pmic_exec_mipi_pmic_seq_element() to be used with
the Bay Trail Crystal Cove PMIC OpRegion driver (Hans de Goede)
- Add comments with DSDT power OpRegion field names to the ACPI PMIC
driver (Hans de Goede)
- Fix string termination handling in the ACPI battery driver (Armin
Wolf)
- Limit error type to 32-bit width in the ACPI APEI error injection
code (Shuai Xue)
- Fix Lenovo Ideapad Z570 DMI match in the ACPI backlight driver
(Hans de Goede)
- Silence missing prototype warnings in some places in the
ACPI-related code (Ammar Faizi)
- Make kobj_type structures used in the ACPI code constant (Thomas
Weißschuh)
- Correct spelling in firmware-guide/ACPI (Randy Dunlap)
- Clarify the meaning of Explicit and Implicit in the _DSD GPIO
properties documentation (Andy Shevchenko)
- Fix some kernel-doc comments in the ACPI CPPC library code (Yang
Li)"
* tag 'acpi-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (25 commits)
ACPI: make kobj_type structures constant
Documentation: firmware-guide: gpio-properties: Clarify Explicit and Implicit
ACPICA: Fix typo in CDAT DSMAS struct definition
ACPI: resource: Do IRQ override on all TongFang GMxRGxx
ACPI: resource: Add IRQ overrides for MAINGEAR Vector Pro 2 models
ACPI: CPPC: Fix some kernel-doc comments
ACPI: video: Fix Lenovo Ideapad Z570 DMI match
Documentation: firmware-guide/ACPI: correct spelling
ACPI: PMIC: Add comments with DSDT power opregion field names
ACPI: battery: Increase maximum string length
ACPI: battery: Fix buffer overread if not NUL-terminated
ACPI: APEI: EINJ: Limit error type to 32-bit width
MAINTAINERS: Add x86 ACPI paths to the ACPI entry
ACPI: battery: Fix missing NUL-termination with large strings
ACPI: PNP: Introduce list of known non-PNP devices
ACPICA: nsrepair: handle cases without a return value correctly
ACPI: Silence missing prototype warnings
cpufreq: intel_pstate: Drop ACPI _PSS states table patching
ACPI: processor: perflib: Avoid updating frequency QoS unnecessarily
ACPI: processor: perflib: Use the "no limit" frequency QoS
...
Linus Torvalds [Tue, 21 Feb 2023 20:13:58 +0000 (12:13 -0800)]
Merge tag 'pm-6.3-rc1' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"These add EPP support to the AMD P-state cpufreq driver, add support
for new platforms to the Intel RAPL power capping driver, intel_idle
and the Qualcomm cpufreq driver, enable thermal cooling for Tegra194,
drop the custom cpufreq driver for loongson1 that is not necessary any
more (and the corresponding cpufreq platform device), fix assorted
issues and clean up code.
Specifics:
- Add EPP support to the AMD P-state cpufreq driver (Perry Yuan, Wyes
Karny, Arnd Bergmann, Bagas Sanjaya)
- Drop the custom cpufreq driver for loongson1 that is not necessary
any more and the corresponding cpufreq platform device (Keguang
Zhang)
- Remove "select SRCU" from system sleep, cpufreq and OPP Kconfig
entries (Paul E. McKenney)
- Enable thermal cooling for Tegra194 (Yi-Wei Wang)
- Register module device table and add missing compatibles for
cpufreq-qcom-hw (Nícolas F. R. A. Prado, Abel Vesa and Luca Weiss)
- Various dt binding updates for qcom-cpufreq-nvmem and
opp-v2-kryo-cpu (Christian Marangi)
- Make kobj_type structure in the cpufreq core constant (Thomas
Weißschuh)
- Make cpufreq_unregister_driver() return void (Uwe Kleine-König)
- Make the TEO cpuidle governor check CPU utilization in order to
refine idle state selection (Kajetan Puchalski)
- Make Kconfig select the haltpoll cpuidle governor when the haltpoll
cpuidle driver is selected and replace a default_idle() call in
that driver with arch_cpu_idle() to allow MWAIT to be used (Li
RongQing)
- Add Emerald Rapids Xeon support to the intel_idle driver (Artem
Bityutskiy)
- Add ARCH_SUSPEND_POSSIBLE dependencies for ARMv4 cpuidle drivers to
avoid randconfig build failures (Arnd Bergmann)
- Make kobj_type structures used in the cpuidle sysfs interface
constant (Thomas Weißschuh)
- Make the cpuidle driver registration code update microsecond values
of idle state parameters in accordance with their nanosecond values
if they are provided (Rafael Wysocki)
- Make the PSCI cpuidle driver prevent topology CPUs from being
suspended on PREEMPT_RT (Krzysztof Kozlowski)
- Document that pm_runtime_force_suspend() cannot be used with
DPM_FLAG_SMART_SUSPEND (Richard Fitzgerald)
- Add EXPORT macros for exporting PM functions from drivers (Richard
Fitzgerald)
- Remove /** from non-kernel-doc comments in hibernation code (Randy
Dunlap)
- Fix possible name leak in powercap_register_zone() (Yang Yingliang)
- Add Meteor Lake and Emerald Rapids support to the intel_rapl power
capping driver (Zhang Rui)
- Modify the idle_inject power capping facility to support 100% idle
injection (Srinivas Pandruvada)
- Fix large time windows handling in the intel_rapl power capping
driver (Zhang Rui)
- Fix memory leaks with using debugfs_lookup() in the generic PM
domains and Energy Model code (Greg Kroah-Hartman)
- Add missing 'cache-unified' property in the example for kryo OPP
bindings (Rob Herring)
- Fix error checking in opp_migrate_dentry() (Qi Zheng)
- Let qcom,opp-fuse-level be a 2-long array for qcom SoCs (Konrad
Dybcio)
- Modify some power management utilities to use the canonical ftrace
path (Ross Zwisler)
- Correct spelling problems for Documentation/power/ as reported by
codespell (Randy Dunlap)"
* tag 'pm-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (53 commits)
Documentation: amd-pstate: disambiguate user space sections
cpufreq: amd-pstate: Fix invalid write to MSR_AMD_CPPC_REQ
dt-bindings: opp: opp-v2-kryo-cpu: enlarge opp-supported-hw maximum
dt-bindings: cpufreq: qcom-cpufreq-nvmem: make cpr bindings optional
dt-bindings: cpufreq: qcom-cpufreq-nvmem: specify supported opp tables
PM: Add EXPORT macros for exporting PM functions
cpuidle: psci: Do not suspend topology CPUs on PREEMPT_RT
MIPS: loongson32: Drop obsolete cpufreq platform device
powercap: intel_rapl: Fix handling for large time window
cpuidle: driver: Update microsecond values of state parameters as needed
cpuidle: sysfs: make kobj_type structures constant
cpuidle: add ARCH_SUSPEND_POSSIBLE dependencies
PM: EM: fix memory leak with using debugfs_lookup()
PM: domains: fix memory leak with using debugfs_lookup()
cpufreq: Make kobj_type structure constant
cpufreq: davinci: Fix clk use after free
cpufreq: amd-pstate: avoid uninitialized variable use
cpufreq: Make cpufreq_unregister_driver() return void
OPP: fix error checking in opp_migrate_dentry()
dt-bindings: cpufreq: cpufreq-qcom-hw: Add SM8550 compatible
...
Linus Torvalds [Tue, 21 Feb 2023 19:07:23 +0000 (11:07 -0800)]
Merge tag 'hardening-v6.3-rc1' of git://git./linux/kernel/git/kees/linux
Pull hardening updates from Kees Cook:
"Beyond some specific LoadPin, UBSAN, and fortify features, there are
other fixes scattered around in various subsystems where maintainers
were okay with me carrying them in my tree or were non-responsive but
the patches were reviewed by others:
- Replace 0-length and 1-element arrays with flexible arrays in
various subsystems (Paulo Miguel Almeida, Stephen Rothwell, Kees
Cook)
- randstruct: Disable Clang 15 support (Eric Biggers)
- GCC plugins: Drop -std=gnu++11 flag (Sam James)
- strpbrk(): Refactor to use strchr() (Andy Shevchenko)
- LoadPin LSM: Allow root filesystem switching when non-enforcing
- fortify: Use dynamic object size hints when available
- ext4: Fix CFI function prototype mismatch
- Nouveau: Fix DP buffer size arguments
- hisilicon: Wipe entire crypto DMA pool on error
- coda: Fully allocate sig_inputArgs
- UBSAN: Improve arm64 trap code reporting
- copy_struct_from_user(): Add minimum bounds check on kernel buffer
size"
* tag 'hardening-v6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
randstruct: disable Clang 15 support
uaccess: Add minimum bounds check on kernel buffer size
arm64: Support Clang UBSAN trap codes for better reporting
coda: Avoid partial allocation of sig_inputArgs
gcc-plugins: drop -std=gnu++11 to fix GCC 13 build
lib/string: Use strchr() in strpbrk()
crypto: hisilicon: Wipe entire pool on error
net/i40e: Replace 0-length array with flexible array
io_uring: Replace 0-length array with flexible array
ext4: Fix function prototype mismatch for ext4_feat_ktype
i915/gvt: Replace one-element array with flexible-array member
drm/nouveau/disp: Fix nvif_outp_acquire_dp() argument size
LoadPin: Allow filesystem switch when not enforcing
LoadPin: Move pin reporting cleanly out of locking
LoadPin: Refactor sysctl initialization
LoadPin: Refactor read-only check into a helper
ARM: ixp4xx: Replace 0-length arrays with flexible arrays
fortify: Use __builtin_dynamic_object_size() when available
rxrpc: replace zero-lenth array with DECLARE_FLEX_ARRAY() helper
Linus Torvalds [Tue, 21 Feb 2023 19:04:56 +0000 (11:04 -0800)]
Merge tag 'seccomp-v6.3-rc1' of git://git./linux/kernel/git/kees/linux
Pull seccomp update from Kees Cook:
- Fix kernel-doc function name ordering to avoid warning (Randy Dunlap)
* tag 'seccomp-v6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
seccomp: fix kernel-doc function name warning
Linus Torvalds [Tue, 21 Feb 2023 18:45:51 +0000 (10:45 -0800)]
Merge tag 'rcu.2023.02.10a' of git://git./linux/kernel/git/paulmck/linux-rcu
Pull RCU updates from Paul McKenney:
- Documentation updates
- Miscellaneous fixes, perhaps most notably:
- Throttling callback invocation based on the number of callbacks
that are now ready to invoke instead of on the total number of
callbacks
- Several patches that suppress false-positive boot-time
diagnostics, for example, due to lockdep not yet being
initialized
- Make expedited RCU CPU stall warnings dump stacks of any tasks
that are blocking the stalled grace period. (Normal RCU CPU
stall warnings have done this for many years)
- Lazy-callback fixes to avoid delays during boot, suspend, and
resume. (Note that lazy callbacks must be explicitly enabled, so
this should not (yet) affect production use cases)
- Make kfree_rcu() and friends take advantage of polled grace periods,
thus reducing memory footprint by almost two orders of magnitude,
admittedly on a microbenchmark
This also begins the transition from kfree_rcu(p) to
kfree_rcu_mightsleep(p). This transition was motivated by bugs where
kfree_rcu(p), which can block, was typed instead of the intended
kfree_rcu(p, rh)
- SRCU updates, perhaps most notably fixing a bug that causes SRCU to
fail when booted on a system with a non-zero boot CPU. This
surprising situation actually happens for kdump kernels on the
powerpc architecture
This also adds an srcu_down_read() and srcu_up_read(), which act like
srcu_read_lock() and srcu_read_unlock(), but allow an SRCU read-side
critical section to be handed off from one task to another
- Clean up the now-useless SRCU Kconfig option
There are a few more commits that are not yet acked or pulled into
maintainer trees, and these will be in a pull request for a later
merge window
- RCU-tasks updates, perhaps most notably these fixes:
- A strange interaction between PID-namespace unshare and the
RCU-tasks grace period that results in a low-probability but
very real hang
- A race between an RCU tasks rude grace period on a single-CPU
system and CPU-hotplug addition of the second CPU that can
result in a too-short grace period
- A race between shrinking RCU tasks down to a single callback
list and queuing a new callback to some other CPU, but where
that queuing is delayed for more than an RCU grace period. This
can result in that callback being stranded on the non-boot CPU
- Torture-test updates and fixes
- Torture-test scripting updates and fixes
- Provide additional RCU CPU stall-warning information in kernels built
with CONFIG_RCU_CPU_STALL_CPUTIME=y, and restore the full five-minute
timeout limit for expedited RCU CPU stall warnings
* tag 'rcu.2023.02.10a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (80 commits)
rcu/kvfree: Add kvfree_rcu_mightsleep() and kfree_rcu_mightsleep()
kernel/notifier: Remove CONFIG_SRCU
init: Remove "select SRCU"
fs/quota: Remove "select SRCU"
fs/notify: Remove "select SRCU"
fs/btrfs: Remove "select SRCU"
fs: Remove CONFIG_SRCU
drivers/pci/controller: Remove "select SRCU"
drivers/net: Remove "select SRCU"
drivers/md: Remove "select SRCU"
drivers/hwtracing/stm: Remove "select SRCU"
drivers/dax: Remove "select SRCU"
drivers/base: Remove CONFIG_SRCU
rcu: Disable laziness if lazy-tracking says so
rcu: Track laziness during boot and suspend
rcu: Remove redundant call to rcu_boost_kthread_setaffinity()
rcu: Allow up to five minutes expedited RCU CPU stall-warning timeouts
rcu: Align the output of RCU CPU stall warning messages
rcu: Add RCU stall diagnosis information
sched: Add helper nr_context_switches_cpu()
...
Linus Torvalds [Tue, 21 Feb 2023 18:36:29 +0000 (10:36 -0800)]
Merge tag 'cgroup-for-6.3' of git://git./linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
"All the changes are trivial: documentation updates and a trivial code
cleanup"
* tag 'cgroup-for-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup/cpuset: fix a few kernel-doc warnings & coding style
docs: cgroup-v1: use numbered lists for user interface setup
docs: cgroup-v1: add internal cross-references
docs: cgroup-v1: make swap extension subsections subsections
docs: cgroup-v1: use bullet lists for list of stat file tables
docs: cgroup-v1: move hierarchy of accounting caption
docs: cgroup-v1: fix footnotes
docs: cgroup-v1: use code block for locking order schema
docs: cgroup-v1: wrap remaining admonitions in admonition blocks
docs: cgroup-v1: replace custom note constructs with appropriate admonition blocks
cgroup/cpuset: no need to explicitly init a global static variable
Linus Torvalds [Tue, 21 Feb 2023 18:25:48 +0000 (10:25 -0800)]
Merge tag 'wq-for-6.3' of git://git./linux/kernel/git/tj/wq
Pull workqueue updates from Tejun Heo:
- When per-cpu workqueue workers expire after sitting idle for too
long, they used to wake up to the CPU that they're bound to in order
to exit. This unfortunately could cause unwanted disturbances on CPUs
isolated for e.g. RT applications.
The worker exit path is restructured so that an existing worker is
unbound from its CPU before being woken up for the last time,
allowing it to migrate away from an isolated CPU for exiting.
- A couple debug improvements. Watchdog dump is made more compact and
workqueue now warns if used-after-free during the RCU grace period
after destroy_workqueue().
* tag 'wq-for-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: Fold rebind_worker() within rebind_workers()
workqueue: Unbind kworkers before sending them to exit()
workqueue: Don't hold any lock while rcuwait'ing for !POOL_MANAGER_ACTIVE
workqueue: Convert the idle_timer to a timer + work_struct
workqueue: Factorize unbind/rebind_workers() logic
workqueue: Protects wq_unbound_cpumask with wq_pool_attach_mutex
workqueue: Make show_pwq() use run-length encoding
workqueue: Add a new flag to spot the potential UAF error
Linus Torvalds [Tue, 21 Feb 2023 18:03:48 +0000 (10:03 -0800)]
Merge tag 'irq-core-2023-02-20' of git://git./linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
"Updates for the interrupt subsystem:
Core:
- Move the interrupt affinity spreading mechanism into lib/group_cpus
so it can be used for similar spreading requirements, e.g. in the
block multi-queue code
This also contains a first usecase in the block multi-queue code
which Jens asked to take along with the librarization
- Improve irqdomain locking to close a number race conditions which
can be observed with massive parallel device driver probing
- Enforce and document the semantics of disable_irq() which cannot be
invoked safely from non-sleepable context
- Move the IPI multiplexing code from the Apple AIC driver into the
core, so it can be reused by RISCV
Drivers:
- Plug OF node refcounting leaks in various drivers
- Correctly mark level triggered interrupts in the Broadcom L2
drivers
- The usual small fixes and improvements
- No new drivers for the record!"
* tag 'irq-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (42 commits)
irqchip/irq-bcm7120-l2: Set IRQ_LEVEL for level triggered interrupts
irqchip/irq-brcmstb-l2: Set IRQ_LEVEL for level triggered interrupts
irqdomain: Switch to per-domain locking
irqchip/mvebu-odmi: Use irq_domain_create_hierarchy()
irqchip/loongson-pch-msi: Use irq_domain_create_hierarchy()
irqchip/gic-v3-mbi: Use irq_domain_create_hierarchy()
irqchip/gic-v3-its: Use irq_domain_create_hierarchy()
irqchip/gic-v2m: Use irq_domain_create_hierarchy()
irqchip/alpine-msi: Use irq_domain_add_hierarchy()
x86/uv: Use irq_domain_create_hierarchy()
x86/ioapic: Use irq_domain_create_hierarchy()
irqdomain: Clean up irq_domain_push/pop_irq()
irqdomain: Drop leftover brackets
irqdomain: Drop dead domain-name assignment
irqdomain: Drop revmap mutex
irqdomain: Fix domain registration race
irqdomain: Fix mapping-creation race
irqdomain: Refactor __irq_domain_alloc_irqs()
irqdomain: Look for existing mapping only once
irqdomain: Drop bogus fwspec-mapping error handling
...
Linus Torvalds [Tue, 21 Feb 2023 17:45:13 +0000 (09:45 -0800)]
Merge tag 'timers-core-2023-02-20' of git://git./linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
"Updates for timekeeping, timers and clockevent/source drivers:
Core:
- Yet another round of improvements to make the clocksource watchdog
more robust:
- Relax the clocksource-watchdog skew criteria to match the NTP
criteria.
- Temporarily skip the watchdog when high memory latencies are
detected which can lead to false-positives.
- Provide an option to enable TSC skew detection even on systems
where TSC is marked as reliable.
Sigh!
- Initialize the restart block in the nanosleep syscalls to be
directed to the no restart function instead of doing a partial
setup on entry.
This prevents an erroneous restart_syscall() invocation from
corrupting user space data. While such a situation is clearly a
user space bug, preventing this is a correctness issue and caters
to the least suprise principle.
- Ignore the hrtimer slack for realtime tasks in schedule_hrtimeout()
to align it with the nanosleep semantics.
Drivers:
- The obligatory new driver bindings for Mediatek, Rockchip and
RISC-V variants.
- Add support for the C3STOP misfeature to the RISC-V timer to handle
the case where the timer stops in deeper idle state.
- Set up a static key in the RISC-V timer correctly before first use.
- The usual small improvements and fixes all over the place"
* tag 'timers-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits)
clocksource/drivers/timer-sun4i: Add CLOCK_EVT_FEAT_DYNIRQ
clocksource/drivers/em_sti: Mark driver as non-removable
clocksource/drivers/sh_tmu: Mark driver as non-removable
clocksource/drivers/riscv: Patch riscv_clock_next_event() jump before first use
clocksource/drivers/timer-microchip-pit64b: Add delay timer
clocksource/drivers/timer-microchip-pit64b: Select driver only on ARM
dt-bindings: timer: sifive,clint: add comaptibles for T-Head's C9xx
dt-bindings: timer: mediatek,mtk-timer: add MT8365
clocksource/drivers/riscv: Get rid of clocksource_arch_init() callback
clocksource/drivers/sh_cmt: Mark driver as non-removable
clocksource/drivers/timer-microchip-pit64b: Drop obsolete dependency on COMPILE_TEST
clocksource/drivers/riscv: Increase the clock source rating
clocksource/drivers/timer-riscv: Set CLOCK_EVT_FEAT_C3STOP based on DT
dt-bindings: timer: Add bindings for the RISC-V timer device
RISC-V: time: initialize hrtimer based broadcast clock event device
dt-bindings: timer: rk-timer: Add rktimer for rv1126
time/debug: Fix memory leak with using debugfs_lookup()
clocksource: Enable TSC watchdog checking of HPET and PMTMR only when requested
posix-timers: Use atomic64_try_cmpxchg() in __update_gt_cputime()
clocksource: Verify HPET and PMTMR when TSC unverified
...
Jakub Kicinski [Tue, 21 Feb 2023 17:26:22 +0000 (09:26 -0800)]
Merge git://git./linux/kernel/git/netdev/net
Per-next-PR merge.
net/smc/af_smc.c
b5dd4d698171 ("net/smc: llc_conf_mutex refactor, replace it with rw_semaphore")
e40b801b3603 ("net/smc: fix potential panic dues to unprotected smc_llc_srv_add_link()")
https://lore.kernel.org/all/
20230221124008.
6303c330@canb.auug.org.au/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Tue, 21 Feb 2023 17:24:08 +0000 (09:24 -0800)]
Merge tag 'x86-cleanups-2023-02-20' of git://git./linux/kernel/git/tip/tip
Pull miscellaneous x86 cleanups from Thomas Gleixner:
- Correct the common copy and pasted mishandling of kstrtobool() in the
strict_sas_size() setup function
- Make recalibrate_cpu_khz() an GPL only export
- Check TSC feature before doing anything else which avoids pointless
code execution if TSC is not available
- Remove or fixup stale and misleading comments
- Remove unused or pointelessly duplicated variables
- Spelling and typo fixes
* tag 'x86-cleanups-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/hotplug: Remove incorrect comment about mwait_play_dead()
x86/tsc: Do feature check as the very first thing
x86/tsc: Make recalibrate_cpu_khz() export GPL only
x86/cacheinfo: Remove unused trace variable
x86/Kconfig: Fix spellos & punctuation
x86/signal: Fix the value returned by strict_sas_size()
x86/cpu: Remove misleading comment
x86/setup: Move duplicate boot_cpu_data definition out of the ifdeffery
x86/boot/e820: Fix typo in e820.c comment
Ilias Apalodimas [Fri, 17 Feb 2023 22:21:30 +0000 (00:21 +0200)]
page_pool: add a comment explaining the fragment counter usage
When reading the page_pool code the first impression is that keeping
two separate counters, one being the page refcnt and the other being
fragment pp_frag_count, is counter-intuitive.
However without that fragment counter we don't know when to reliably
destroy or sync the outstanding DMA mappings. So let's add a comment
explaining this part.
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/r/20230217222130.85205-1-ilias.apalodimas@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Mon, 20 Feb 2023 12:23:31 +0000 (14:23 +0200)]
net: ethtool: fix __ethtool_dev_mm_supported() implementation
The MAC Merge layer is supported when ops->get_mm() returns 0.
The implementation was changed during review, and in this process, a bug
was introduced.
Link: https://lore.kernel.org/netdev/20230111161706.1465242-5-vladimir.oltean@nxp.com/
Fixes:
04692c9020b7 ("net: ethtool: netlink: retrieve stats from multiple sources (eMAC, pMAC)")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ferenc Fejes <fejes@inf.elte.hu>
Link: https://lore.kernel.org/all/20230220122343.1156614-2-vladimir.oltean@nxp.com/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Bo Liu [Tue, 21 Feb 2023 08:30:36 +0000 (03:30 -0500)]
ethtool: pse-pd: Fix double word in comments
Remove the repeated word "for" in comments.
Signed-off-by: Bo Liu <liubo03@inspur.com>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/r/20230221083036.2414-1-liubo03@inspur.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Xuan Zhuo [Tue, 21 Feb 2023 07:51:40 +0000 (15:51 +0800)]
xsk: add linux/vmalloc.h to xsk.c
Fix the failure of the compilation under the sh4.
Because we introduced remap_vmalloc_range() earlier, this has caused
the compilation failure on the sh4 platform. So this introduction of the
header file of linux/vmalloc.h.
config: sh-allmodconfig (https://download.01.org/0day-ci/archive/
20230221/
202302210041.kpPQLlNQ-lkp@intel.com/config)
compiler: sh4-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=
9f78bf330a66cd400b3e00f370f597e9fa939207
git remote add net-next https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git
git fetch --no-tags net-next master
git checkout
9f78bf330a66cd400b3e00f370f597e9fa939207
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=sh olddefconfig
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=sh SHELL=/bin/bash net/
Fixes:
9f78bf330a66 ("xsk: support use vaddr as ring")
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202302210041.kpPQLlNQ-lkp@intel.com/
Link: https://lore.kernel.org/r/20230221075140.46988-1-xuanzhuo@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Tue, 21 Feb 2023 16:54:41 +0000 (08:54 -0800)]
Merge tag 'x86_vdso_for_v6.3_rc1' of git://git./linux/kernel/git/tip/tip
Pull x86 vdso updates from Borislav Petkov:
- Add getcpu support for the 32-bit version of the vDSO
- Some smaller fixes
* tag 'x86_vdso_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/vdso: Fix -Wmissing-prototypes warnings
x86/vdso: Fake 32bit VDSO build on 64bit compile for vgetcpu
selftests: Emit a warning if getcpu() is missing on 32bit
x86/vdso: Provide getcpu for x86-32.
x86/cpu: Provide the full setup for getcpu() on x86-32
x86/vdso: Move VDSO image init to vdso2c generated code
Linus Torvalds [Tue, 21 Feb 2023 16:47:36 +0000 (08:47 -0800)]
Merge tag 'x86_microcode_for_v6.3_rc1' of git://git./linux/kernel/git/tip/tip
Pull x86 microcode loader updates from Borislav Petkov:
- Fix mixed steppings support on AMD which got broken somewhere along
the way
- Improve revision reporting
- Properly check CPUID capabilities after late microcode upgrade to
avoid false positives
- A garden variety of other small fixes
* tag 'x86_microcode_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/microcode/core: Return an error only when necessary
x86/microcode/AMD: Fix mixed steppings support
x86/microcode/AMD: Add a @cpu parameter to the reloading functions
x86/microcode/amd: Remove load_microcode_amd()'s bsp parameter
x86/microcode: Allow only "1" as a late reload trigger value
x86/microcode/intel: Print old and new revision during early boot
x86/microcode/intel: Pass the microcode revision to print_ucode_info() directly
x86/microcode: Adjust late loading result reporting message
x86/microcode: Check CPU capabilities after late microcode update correctly
x86/microcode: Add a parameter to microcode_check() to store CPU capabilities
x86/microcode: Use the DEVICE_ATTR_RO() macro
x86/microcode/AMD: Handle multiple glued containers properly
x86/microcode/AMD: Rename a couple of functions
Linus Torvalds [Tue, 21 Feb 2023 16:38:45 +0000 (08:38 -0800)]
Merge tag 'x86_cache_for_v6.3_rc1' of git://git./linux/kernel/git/tip/tip
Pull x86 resource control updates from Borislav Petkov:
- Add support for a new AMD feature called slow memory bandwidth
allocation. Its goal is to control resource allocation in external
slow memory which is connected to the machine like for example
through CXL devices, accelerators etc
* tag 'x86_cache_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/resctrl: Fix a silly -Wunused-but-set-variable warning
Documentation/x86: Update resctrl.rst for new features
x86/resctrl: Add interface to write mbm_local_bytes_config
x86/resctrl: Add interface to write mbm_total_bytes_config
x86/resctrl: Add interface to read mbm_local_bytes_config
x86/resctrl: Add interface to read mbm_total_bytes_config
x86/resctrl: Support monitor configuration
x86/resctrl: Add __init attribute to rdt_get_mon_l3_config()
x86/resctrl: Detect and configure Slow Memory Bandwidth Allocation
x86/resctrl: Include new features in command line options
x86/cpufeatures: Add Bandwidth Monitoring Event Configuration feature flag
x86/resctrl: Add a new resource type RDT_RESOURCE_SMBA
x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag
x86/resctrl: Replace smp_call_function_many() with on_each_cpu_mask()
Linus Torvalds [Tue, 21 Feb 2023 16:27:47 +0000 (08:27 -0800)]
Merge tag 'x86_alternatives_for_v6.3_rc1' of git://git./linux/kernel/git/tip/tip
Pull x86 asm alternatives updates from Borislav Petkov:
- Teach the static_call patching infrastructure to handle conditional
tall calls properly which can be static calls too
- Add proper struct alt_instr.flags which controls different aspects of
insn patching behavior
* tag 'x86_alternatives_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/static_call: Add support for Jcc tail-calls
x86/alternatives: Teach text_poke_bp() to patch Jcc.d32 instructions
x86/alternatives: Introduce int3_emulate_jcc()
x86/alternatives: Add alt_instr.flags
Linus Torvalds [Tue, 21 Feb 2023 16:10:03 +0000 (08:10 -0800)]
Merge tag 'edac_updates_for_v6.3' of git://git./linux/kernel/git/ras/ras
Pull EDAC updates from Borislav Petkov:
- Add a driver for the RAS functionality on Xilinx's on chip memory
controller
- Add support for decoding errors from the first and second level
memory on SKL-based hardware
- Add support for the memory controllers in Intel Granite Rapids and
Emerald Rapids machines
- First round of amd64_edac driver simplification and removal of
unneeded functionality
- The usual cleanups and fixes
* tag 'edac_updates_for_v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC/amd64: Shut up an -Werror,-Wsometimes-uninitialized clang false positive
EDAC/amd64: Remove early_channel_count()
EDAC/amd64: Remove PCI Function 0
EDAC/amd64: Remove PCI Function 6
EDAC/amd64: Remove scrub rate control for Family 17h and later
EDAC/amd64: Don't set up EDAC PCI control on Family 17h+
EDAC/i10nm: Add driver decoder for Sapphire Rapids server
EDAC/i10nm: Add Intel Granite Rapids server support
EDAC/i10nm: Make more configurations CPU model specific
EDAC/i10nm: Add Intel Emerald Rapids server support
EDAC/skx_common: Delete duplicated and unreachable code
EDAC/skx_common: Enable EDAC support for the "near" memory
EDAC/qcom: Add platform_device_id table for module autoloading
EDAC/zynqmp: Add EDAC support for Xilinx ZynqMP OCM
dt-bindings: edac: Add bindings for Xilinx ZynqMP OCM
Linus Torvalds [Tue, 21 Feb 2023 16:04:51 +0000 (08:04 -0800)]
Merge tag 'ras_core_for_v6.3_rc1' of git://git./linux/kernel/git/tip/tip
Pull RAS updates from Borislav Petkov:
- Add support for reporting more bits of the physical address on error,
on newer AMD CPUs
- Mask out bits which don't belong to the address of the error being
reported
* tag 'ras_core_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce: Mask out non-address bits from machine check bank
x86/mce: Add support for Extended Physical Address MCA changes
x86/mce: Define a function to extract ErrorAddr from MCA_ADDR
Jiri Pirko [Mon, 20 Feb 2023 13:23:36 +0000 (14:23 +0100)]
sefltests: netdevsim: wait for devlink instance after netns removal
When devlink instance is put into network namespace and that network
namespace gets deleted, devlink instance is moved back into init_ns.
This is done as a part of cleanup_net() routine. Since cleanup_net()
is called asynchronously from workqueue, there is no guarantee that
the devlink instance move is done after "ip netns del" returns.
So fix this race by making sure that the devlink instance is present
before any other operation.
Reported-by: Amir Tzin <amirtz@nvidia.com>
Fixes:
b74c37fd35a2 ("selftests: netdevsim: add tests for devlink reload with resources")
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/20230220132336.198597-1-jiri@resnulli.us
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Roxana Nicolescu [Mon, 20 Feb 2023 11:04:00 +0000 (12:04 +0100)]
selftest: fib_tests: Always cleanup before exit
Usage of `set -e` before executing a command causes immediate exit
on failure, without cleanup up the resources allocated at setup.
This can affect the next tests that use the same resources,
leading to a chain of failures.
A simple fix is to always call cleanup function when the script exists.
This approach is already used by other existing tests.
Fixes:
1056691b2680 ("selftests: fib_tests: Make test results more verbose")
Signed-off-by: Roxana Nicolescu <roxana.nicolescu@canonical.com>
Link: https://lore.kernel.org/r/20230220110400.26737-2-roxana.nicolescu@canonical.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Linus Torvalds [Tue, 21 Feb 2023 03:04:54 +0000 (19:04 -0800)]
Merge tag 'x86-platform-2023-02-20' of git://git./linux/kernel/git/tip/tip
Pull x86 platform update from Ingo Molnar:
- Simplify add_rtc_cmos()
- Use strscpy() in the mcelog code
* tag 'x86-platform-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce/dev-mcelog: use strscpy() to instead of strncpy()
x86/rtc: Simplify PNP ids check
Linus Torvalds [Tue, 21 Feb 2023 03:00:00 +0000 (19:00 -0800)]
Merge tag 'x86-mm-2023-02-20' of git://git./linux/kernel/git/tip/tip
Pull x86 mm update from Ingo Molnar:
"Micro-optimize __flush_tlb_all()"
* tag 'x86-mm-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu: Use cpu_feature_enabled() when checking global pages support
Linus Torvalds [Tue, 21 Feb 2023 02:50:02 +0000 (18:50 -0800)]
Merge tag 'x86-fpu-2023-02-20' of git://git./linux/kernel/git/tip/tip
Pull x86 fpu updates from Ingo Molnar:
- Replace zero-length array in struct xregs_state with flexible-array
member, to help the enabling of stricter compiler checks.
- Don't set TIF_NEED_FPU_LOAD for PF_IO_WORKER threads.
* tag 'x86-fpu-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/fpu: Don't set TIF_NEED_FPU_LOAD for PF_IO_WORKER threads
x86/fpu: Replace zero-length array in struct xregs_state with flexible-array member
Linus Torvalds [Tue, 21 Feb 2023 02:40:45 +0000 (18:40 -0800)]
Merge tag 'x86-core-2023-02-20' of git://git./linux/kernel/git/tip/tip
Pull x86 core updates from Ingo Molnar:
- Clean up the signal frame layout tests
- Suppress KMSAN false positive reports in arch_within_stack_frames()
* tag 'x86-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Suppress KMSAN reports in arch_within_stack_frames()
x86/signal/compat: Move sigaction_compat_abi() to signal_64.c
x86/signal: Move siginfo field tests
Linus Torvalds [Tue, 21 Feb 2023 02:36:54 +0000 (18:36 -0800)]
Merge tag 'x86-build-2023-02-20' of git://git./linux/kernel/git/tip/tip
Pull x86 build update from Ingo Molnar:
"Make the 64-bit defconfig the x86 default for all builds, unless
x86-32 is requested explicitly"
* tag 'x86-build-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/build: Make 64-bit defconfig the default
Linus Torvalds [Tue, 21 Feb 2023 02:32:55 +0000 (18:32 -0800)]
Merge tag 'x86-boot-2023-02-20' of git://git./linux/kernel/git/tip/tip
Pull x86 boot updates from Ingo Molnar:
- Robustify/fix calling startup_{32,64}() from the decompressor code,
and removing x86 quirk from scripts/head-object-list.txt as a result.
- Do not register processors that cannot be onlined for x2APIC
* tag 'x86-boot-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/acpi/boot: Do not register processors that cannot be onlined for x2APIC
scripts/head-object-list: Remove x86 from the list
x86/boot: Robustify calling startup_{32,64}() from the decompressor code
Linus Torvalds [Tue, 21 Feb 2023 02:25:30 +0000 (18:25 -0800)]
Merge tag 'x86-asm-2023-02-20' of git://git./linux/kernel/git/tip/tip
Pull x86 asm updates from Ingo Molnar:
"Header fixes and a DocBook fix"
* tag 'x86-asm-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/lib: Fix compiler and kernel-doc warnings
x86/lib: Include <asm/misc.h> to fix a missing prototypes warning at build time
Linus Torvalds [Tue, 21 Feb 2023 01:41:08 +0000 (17:41 -0800)]
Merge tag 'sched-core-2023-02-20' of git://git./linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
- Improve the scalability of the CFS bandwidth unthrottling logic with
large number of CPUs.
- Fix & rework various cpuidle routines, simplify interaction with the
generic scheduler code. Add __cpuidle methods as noinstr to objtool's
noinstr detection and fix boatloads of cpuidle bugs & quirks.
- Add new ABI: introduce MEMBARRIER_CMD_GET_REGISTRATIONS, to query
previously issued registrations.
- Limit scheduler slice duration to the sysctl_sched_latency period, to
improve scheduling granularity with a large number of SCHED_IDLE
tasks.
- Debuggability enhancement on sys_exit(): warn about disabled IRQs,
but also enable them to prevent a cascade of followup problems and
repeat warnings.
- Fix the rescheduling logic in prio_changed_dl().
- Micro-optimize cpufreq and sched-util methods.
- Micro-optimize ttwu_runnable()
- Micro-optimize the idle-scanning in update_numa_stats(),
select_idle_capacity() and steal_cookie_task().
- Update the RSEQ code & self-tests
- Constify various scheduler methods
- Remove unused methods
- Refine __init tags
- Documentation updates
- Misc other cleanups, fixes
* tag 'sched-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (110 commits)
sched/rt: pick_next_rt_entity(): check list_entry
sched/deadline: Add more reschedule cases to prio_changed_dl()
sched/fair: sanitize vruntime of entity being placed
sched/fair: Remove capacity inversion detection
sched/fair: unlink misfit task from cpu overutilized
objtool: mem*() are not uaccess safe
cpuidle: Fix poll_idle() noinstr annotation
sched/clock: Make local_clock() noinstr
sched/clock/x86: Mark sched_clock() noinstr
x86/pvclock: Improve atomic update of last_value in pvclock_clocksource_read()
x86/atomics: Always inline arch_atomic64*()
cpuidle: tracing, preempt: Squash _rcuidle tracing
cpuidle: tracing: Warn about !rcu_is_watching()
cpuidle: lib/bug: Disable rcu_is_watching() during WARN/BUG
cpuidle: drivers: firmware: psci: Dont instrument suspend code
KVM: selftests: Fix build of rseq test
exit: Detect and fix irq disabled state in oops
cpuidle, arm64: Fix the ARM64 cpuidle logic
cpuidle: mvebu: Fix duplicate flags assignment
sched/fair: Limit sched slice duration
...
Linus Torvalds [Tue, 21 Feb 2023 01:29:55 +0000 (17:29 -0800)]
Merge tag 'perf-core-2023-02-20' of git://git./linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
- Optimize perf_sample_data layout
- Prepare sample data handling for BPF integration
- Update the x86 PMU driver for Intel Meteor Lake
- Restructure the x86 uncore code to fix a SPR (Sapphire Rapids)
discovery breakage
- Fix the x86 Zhaoxin PMU driver
- Cleanups
* tag 'perf-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
perf/x86/intel/uncore: Add Meteor Lake support
x86/perf/zhaoxin: Add stepping check for ZXC
perf/x86/intel/ds: Fix the conversion from TSC to perf time
perf/x86/uncore: Don't WARN_ON_ONCE() for a broken discovery table
perf/x86/uncore: Add a quirk for UPI on SPR
perf/x86/uncore: Ignore broken units in discovery table
perf/x86/uncore: Fix potential NULL pointer in uncore_get_alias_name
perf/x86/uncore: Factor out uncore_device_to_die()
perf/core: Call perf_prepare_sample() before running BPF
perf/core: Introduce perf_prepare_header()
perf/core: Do not pass header for sample ID init
perf/core: Set data->sample_flags in perf_prepare_sample()
perf/core: Add perf_sample_save_brstack() helper
perf/core: Add perf_sample_save_raw_data() helper
perf/core: Add perf_sample_save_callchain() helper
perf/core: Save the dynamic parts of sample data size
x86/kprobes: Use switch-case for 0xFF opcodes in prepare_emulation
perf/core: Change the layout of perf_sample_data
perf/x86/msr: Add Meteor Lake support
perf/x86/cstate: Add Meteor Lake support
...
Linus Torvalds [Tue, 21 Feb 2023 01:18:23 +0000 (17:18 -0800)]
Merge tag 'locking-core-2023-02-20' of git://git./linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
- rwsem micro-optimizations
- spinlock micro-optimizations
- cleanups, simplifications
* tag 'locking-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
vduse: Remove include of rwlock.h
locking/lockdep: Remove lockdep_init_map_crosslock.
x86/ACPI/boot: Use try_cmpxchg() in __acpi_{acquire,release}_global_lock()
x86/PAT: Use try_cmpxchg() in set_page_memtype()
locking/rwsem: Disable preemption in all down_write*() and up_write() code paths
locking/rwsem: Disable preemption in all down_read*() and up_read() code paths
locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath
locking/qspinlock: Micro-optimize pending state waiting for unlock
Leon Romanovsky [Sun, 19 Feb 2023 09:09:10 +0000 (11:09 +0200)]
net/mlx5e: Align IPsec ASO result memory to be as required by hardware
Hardware requires an alignment to 64 bytes to return ASO data. Missing
this alignment caused to unpredictable results while ASO events were
generated.
Fixes:
8518d05b8f9a ("net/mlx5e: Create Advanced Steering Operation object for IPsec")
Reported-by: Emeel Hakim <ehakim@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/de0302c572b90c9224a72868d4e0d657b6313c4b.1676797613.git.leon@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 21 Feb 2023 00:52:50 +0000 (16:52 -0800)]
Merge tag 'mlx5-updates-2023-02-15' of git://git./linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2023-02-15
1) From Gal Tariq and Parav, Few cleanups for mlx5 driver.
2) From Vlad: Allow offloading of ct 'new' match based on [1]
[1] https://lore.kernel.org/netdev/
20230201163100.
1001180-1-vladbu@nvidia.com/
* tag 'mlx5-updates-2023-02-15' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5e: RX, Remove doubtful unlikely call
net/mlx5e: Fix outdated TLS comment
net/mlx5e: Remove unused function mlx5e_sq_xmit_simple
net/mlx5e: Allow offloading of ct 'new' match
net/mlx5e: Implement CT entry update
net/mlx5: Simplify eq list traversal
net/mlx5e: Remove redundant page argument in mlx5e_xdp_handle()
net/mlx5e: Remove redundant page argument in mlx5e_xmit_xdp_buff()
net/mlx5e: Switch to using napi_build_skb()
====================
Link: https://lore.kernel.org/r/20230218090513.284718-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 21 Feb 2023 00:46:11 +0000 (16:46 -0800)]
Merge branch 'net-sched-cls_api-support-hardware-miss-to-tc-action'
Paul Blakey says:
====================
net/sched: cls_api: Support hardware miss to tc action
This series adds support for hardware miss to instruct tc to continue execution
in a specific tc action instance on a filter's action list. The mlx5 driver patch
(besides the refactors) shows its usage instead of using just chain restore.
Currently a filter's action list must be executed all together or
not at all as driver are only able to tell tc to continue executing from a
specific tc chain, and not a specific filter/action.
This is troublesome with regards to action CT, where new connections should
be sent to software (via tc chain restore), and established connections can
be handled in hardware.
Checking for new connections is done when executing the ct action in hardware
(by checking the packet's tuple against known established tuples).
But if there is a packet modification (pedit) action before action CT and the
checked tuple is a new connection, hardware will need to revert the previous
packet modifications before sending it back to software so it can
re-match the same tc filter in software and re-execute its CT action.
The following is an example configuration of stateless nat
on mlx5 driver that isn't supported before this patchet:
#Setup corrosponding mlx5 VFs in namespaces
$ ip netns add ns0
$ ip netns add ns1
$ ip link set dev enp8s0f0v0 netns ns0
$ ip netns exec ns0 ifconfig enp8s0f0v0 1.1.1.1/24 up
$ ip link set dev enp8s0f0v1 netns ns1
$ ip netns exec ns1 ifconfig enp8s0f0v1 1.1.1.2/24 up
#Setup tc arp and ct rules on mxl5 VF representors
$ tc qdisc add dev enp8s0f0_0 ingress
$ tc qdisc add dev enp8s0f0_1 ingress
$ ifconfig enp8s0f0_0 up
$ ifconfig enp8s0f0_1 up
#Original side
$ tc filter add dev enp8s0f0_0 ingress chain 0 proto ip flower \
ct_state -trk ip_proto tcp dst_port 8888 \
action pedit ex munge tcp dport set 5001 pipe \
action csum ip tcp pipe \
action ct pipe \
action goto chain 1
$ tc filter add dev enp8s0f0_0 ingress chain 1 proto ip flower \
ct_state +trk+est \
action mirred egress redirect dev enp8s0f0_1
$ tc filter add dev enp8s0f0_0 ingress chain 1 proto ip flower \
ct_state +trk+new \
action ct commit pipe \
action mirred egress redirect dev enp8s0f0_1
$ tc filter add dev enp8s0f0_0 ingress chain 0 proto arp flower \
action mirred egress redirect dev enp8s0f0_1
#Reply side
$ tc filter add dev enp8s0f0_1 ingress chain 0 proto arp flower \
action mirred egress redirect dev enp8s0f0_0
$ tc filter add dev enp8s0f0_1 ingress chain 0 proto ip flower \
ct_state -trk ip_proto tcp \
action ct pipe \
action pedit ex munge tcp sport set 8888 pipe \
action csum ip tcp pipe \
action mirred egress redirect dev enp8s0f0_0
#Run traffic
$ ip netns exec ns1 iperf -s -p 5001&
$ sleep 2 #wait for iperf to fully open
$ ip netns exec ns0 iperf -c 1.1.1.2 -p 8888
#dump tc filter stats on enp8s0f0_0 chain 0 rule and see hardware packets:
$ tc -s filter show dev enp8s0f0_0 ingress chain 0 proto ip | grep "hardware.*pkt"
Sent hardware
9310116832 bytes
6149672 pkt
Sent hardware
9310116832 bytes
6149672 pkt
Sent hardware
9310116832 bytes
6149672 pkt
A new connection executing the first filter in hardware will first rewrite
the dst port to the new port, and then the ct action is executed,
because this is a new connection, hardware will need to be send this back
to software, on chain 0, to execute the first filter again in software.
The dst port needs to be reverted otherwise it won't re-match the old
dst port in the first filter. Because of that, currently mlx5 driver will
reject offloading the above action ct rule.
This series adds support for hardware partially executing a filter's action list,
and letting tc software continue processing in the specific action instance
where hardware left off (in the above case after the "action pedit ex munge tcp
dport... of the first rule") allowing support for scenarios such as the above.
====================
Link: https://lore.kernel.org/r/20230217223620.28508-1-paulb@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paul Blakey [Fri, 17 Feb 2023 22:36:20 +0000 (00:36 +0200)]
net/mlx5e: TC, Set CT miss to the specific ct action instance
Currently, CT misses restore the missed chain on the tc skb extension so
tc will continue from the relevant chain. Instead, restore the CT action's
miss cookie on the extension, which will instruct tc to continue from the
this specific CT action instance on the relevant filter's action list.
Map the CT action's miss_cookie to a new miss object (ACT_MISS), and use
this miss mapping instead of the current chain miss object (CHAIN_MISS)
for CT action misses.
To restore this new miss mapping value, add a RX restore rule for each
such mapping value.
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Sholmo <ozsh@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paul Blakey [Fri, 17 Feb 2023 22:36:19 +0000 (00:36 +0200)]
net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG
This reg usage is always a mapped object, not necessarily
containing chain info.
Rename to properly convey what it stores.
This patch doesn't change any functionality.
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paul Blakey [Fri, 17 Feb 2023 22:36:18 +0000 (00:36 +0200)]
net/mlx5: Refactor tc miss handling to a single function
Move tc miss handling code to en_tc.c, and remove
duplicate code.
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paul Blakey [Fri, 17 Feb 2023 22:36:17 +0000 (00:36 +0200)]
net/mlx5: Kconfig: Make tc offload depend on tc skb extension
Tc skb extension is a basic requirement for using tc
offload to support correct restoration on action miss.
Depend on it.
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paul Blakey [Fri, 17 Feb 2023 22:36:16 +0000 (00:36 +0200)]
net/sched: flower: Support hardware miss to tc action
To support hardware miss to tc action in actions on the flower
classifier, implement the required getting of filter actions,
and setup filter exts (actions) miss by giving it the filter's
handle and actions.
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paul Blakey [Fri, 17 Feb 2023 22:36:15 +0000 (00:36 +0200)]
net/sched: flower: Move filter handle initialization earlier
To support miss to action during hardware offload the filter's
handle is needed when setting up the actions (tcf_exts_init()),
and before offloading.
Move filter handle initialization earlier.
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paul Blakey [Fri, 17 Feb 2023 22:36:14 +0000 (00:36 +0200)]
net/sched: cls_api: Support hardware miss to tc action
For drivers to support partial offload of a filter's action list,
add support for action miss to specify an action instance to
continue from in sw.
CT action in particular can't be fully offloaded, as new connections
need to be handled in software. This imposes other limitations on
the actions that can be offloaded together with the CT action, such
as packet modifications.
Assign each action on a filter's action list a unique miss_cookie
which drivers can then use to fill action_miss part of the tc skb
extension. On getting back this miss_cookie, find the action
instance with relevant cookie and continue classifying from there.
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paul Blakey [Fri, 17 Feb 2023 22:36:13 +0000 (00:36 +0200)]
net/sched: Rename user cookie and act cookie
struct tc_action->act_cookie is a user defined cookie,
and the related struct flow_action_entry->act_cookie is
used as an handle similar to struct flow_cls_offload->cookie.
Rename tc_action->act_cookie to user_cookie, and
flow_action_entry->act_cookie to cookie so their names
would better fit their usage.
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 21 Feb 2023 00:40:52 +0000 (16:40 -0800)]
Merge tag 'ieee802154-for-net-next-2023-02-20' of git://git./linux/kernel/git/sschmidt/wpan-next
Stefan Schmidt says:
====================
pull-request: ieee802154-next 2023-02-20
Miquel Raynal build upon his earlier work and introduced two new
features into the ieee802154 stack. Beaconing to announce existing
PAN's and passive scanning to discover the beacons and associated
PAN's. The matching changes to the userspace configuration tool
have been posted as well and will be released together with the
kernel release.
Arnd Bergmann and Dmitry Torokhov worked on converting the
at86rf230 and cc2520 drivers away from the unused platform_data
usage and towards the new gpiod API. (I had to add a revert as
Dmitry found a regression on an already pushed tree on my side).
Changes since v1 (pull request 2023-02-02)
- Netlink API extack and NLA_POLICY* usage as suggested by Jakub
- Removed always true condition found by kernel test robot
- Simplify device removal with running background job for scanning
- Fix problems with beacon sending in some cases by using the MLME
tx path
* tag 'ieee802154-for-net-next-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan-next:
ieee802154: Drop device trackers
mac802154: Fix an always true condition
mac802154: Send beacons using the MLME Tx path
ieee802154: Change error code on monitor scan netlink request
ieee802154: Convert scan error messages to extack
ieee802154: Use netlink policies when relevant on scan parameters
ieee802154: at86rf230: switch to using gpiod API
ieee802154: at86rf230: drop support for platform data
Revert "at86rf230: convert to gpio descriptors"
cc2520: move to gpio descriptors
mac802154: Avoid superfluous endianness handling
at86rf230: convert to gpio descriptors
mac802154: Handle basic beaconing
ieee802154: Add support for user beaconing requests
mac802154: Handle passive scanning
mac802154: Add MLME Tx locked helpers
mac802154: Prepare forcing specific symbol duration
ieee802154: Introduce a helper to validate a channel
ieee802154: Define a beacon frame header
ieee802154: Add support for user scanning requests
====================
Link: https://lore.kernel.org/r/20230220213749.386451-1-stefan@datenfreihafen.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alejandro Lucero [Mon, 20 Feb 2023 11:01:33 +0000 (11:01 +0000)]
sfc: fix builds without CONFIG_RTC_LIB
Add an embarrassingly missed semicolon plus and embarrassingly missed
parenthesis breaking kernel building when CONFIG_RTC_LIB is not set
like the one reported with ia64 config.
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202302170047.EjCPizu3-lkp@intel.com/
Fixes:
14743ddd2495 ("sfc: add devlink info support for ef100")
Signed-off-by: Alejandro Lucero <alejandro.lucero-palau@amd.com>
Link: https://lore.kernel.org/r/20230220110133.29645-1-alejandro.lucero-palau@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yang Li [Mon, 20 Feb 2023 06:59:58 +0000 (14:59 +0800)]
sfc: clean up some inconsistent indentings
Fix some indentngs and remove the warning below:
drivers/net/ethernet/sfc/mae.c:657 efx_mae_enumerate_mports() warn: inconsistent indenting
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=4117
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
Link: https://lore.kernel.org/r/20230220065958.52941-1-yang.lee@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kees Cook [Sat, 18 Feb 2023 18:38:50 +0000 (10:38 -0800)]
net/mlx4_en: Introduce flexible array to silence overflow warning
The call "skb_copy_from_linear_data(skb, inl + 1, spc)" triggers a FORTIFY
memcpy() warning on ppc64 platform:
In function ‘fortify_memcpy_chk’,
inlined from ‘skb_copy_from_linear_data’ at ./include/linux/skbuff.h:4029:2,
inlined from ‘build_inline_wqe’ at drivers/net/ethernet/mellanox/mlx4/en_tx.c:722:4,
inlined from ‘mlx4_en_xmit’ at drivers/net/ethernet/mellanox/mlx4/en_tx.c:1066:3:
./include/linux/fortify-string.h:513:25: error: call to ‘__write_overflow_field’ declared with
attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()?
[-Werror=attribute-warning]
513 | __write_overflow_field(p_size_field, size);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Same behaviour on x86 you can get if you use "__always_inline" instead of
"inline" for skb_copy_from_linear_data() in skbuff.h
The call here copies data into inlined tx destricptor, which has 104
bytes (MAX_INLINE) space for data payload. In this case "spc" is known
in compile-time but the destination is used with hidden knowledge
(real structure of destination is different from that the compiler
can see). That cause the fortify warning because compiler can check
bounds, but the real bounds are different. "spc" can't be bigger than
64 bytes (MLX4_INLINE_ALIGN), so the data can always fit into inlined
tx descriptor. The fact that "inl" points into inlined tx descriptor is
determined earlier in mlx4_en_xmit().
Avoid confusing the compiler with "inl + 1" constructions to get to past
the inl header by introducing a flexible array "data" to the struct so
that the compiler can see that we are not dealing with an array of inl
structs, but rather, arbitrary data following the structure. There are
no changes to the structure layout reported by pahole, and the resulting
machine code is actually smaller.
Reported-by: Josef Oskera <joskera@redhat.com>
Link: https://lore.kernel.org/lkml/20230217094541.2362873-1-joskera@redhat.com
Fixes:
f68f2ff91512 ("fortify: Detect struct member overflows in memcpy() at compile-time")
Cc: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20230218183842.never.954-kees@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Horatiu Vultur [Fri, 17 Feb 2023 21:09:17 +0000 (22:09 +0100)]
net: lan966x: Fix possible deadlock inside PTP
When doing timestamping in lan966x and having PROVE_LOCKING
enabled the following warning is shown.
========================================================
WARNING: possible irq lock inversion dependency detected
6.2.0-rc7-01749-gc54e1f7f7e36 #2786 Tainted: G N
--------------------------------------------------------
swapper/0/0 just changed the state of lock:
c2609f50 (_xmit_ETHER#2){+.-.}-{2:2}, at: sch_direct_xmit+0x16c/0x2e8
but this lock took another, SOFTIRQ-unsafe lock in the past:
(&lan966x->ptp_ts_id_lock){+.+.}-{2:2}
and interrupts could create inverse lock ordering between them.
other info that might help us debug this:
Possible interrupt unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&lan966x->ptp_ts_id_lock);
local_irq_disable();
lock(_xmit_ETHER#2);
lock(&lan966x->ptp_ts_id_lock);
<Interrupt>
lock(_xmit_ETHER#2);
*** DEADLOCK ***
5 locks held by swapper/0/0:
#0:
c1001e18 ((&ndev->rs_timer)){+.-.}-{0:0}, at: call_timer_fn+0x0/0x33c
#1:
c105e7c4 (rcu_read_lock){....}-{1:2}, at: ndisc_send_skb+0x134/0x81c
#2:
c105e7d8 (rcu_read_lock_bh){....}-{1:2}, at: ip6_finish_output2+0x17c/0xc64
#3:
c105e7d8 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x4c/0x1224
#4:
c3056174 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+...}-{2:2}, at: __dev_queue_xmit+0x354/0x1224
the shortest dependencies between 2nd lock and 1st lock:
-> (&lan966x->ptp_ts_id_lock){+.+.}-{2:2} {
HARDIRQ-ON-W at:
lock_acquire.part.0+0xb0/0x248
_raw_spin_lock+0x38/0x48
lan966x_ptp_irq_handler+0x164/0x2a8
irq_thread_fn+0x1c/0x78
irq_thread+0x130/0x278
kthread+0xec/0x110
ret_from_fork+0x14/0x28
SOFTIRQ-ON-W at:
lock_acquire.part.0+0xb0/0x248
_raw_spin_lock+0x38/0x48
lan966x_ptp_irq_handler+0x164/0x2a8
irq_thread_fn+0x1c/0x78
irq_thread+0x130/0x278
kthread+0xec/0x110
ret_from_fork+0x14/0x28
INITIAL USE at:
lock_acquire.part.0+0xb0/0x248
_raw_spin_lock_irqsave+0x4c/0x68
lan966x_ptp_txtstamp_request+0x128/0x1cc
lan966x_port_xmit+0x224/0x43c
dev_hard_start_xmit+0xa8/0x2f0
sch_direct_xmit+0x108/0x2e8
__dev_queue_xmit+0x41c/0x1224
packet_sendmsg+0xdb4/0x134c
__sys_sendto+0xd0/0x154
sys_send+0x18/0x20
ret_fast_syscall+0x0/0x1c
}
... key at: [<
c174ba0c>] __key.2+0x0/0x8
... acquired at:
_raw_spin_lock_irqsave+0x4c/0x68
lan966x_ptp_txtstamp_request+0x128/0x1cc
lan966x_port_xmit+0x224/0x43c
dev_hard_start_xmit+0xa8/0x2f0
sch_direct_xmit+0x108/0x2e8
__dev_queue_xmit+0x41c/0x1224
packet_sendmsg+0xdb4/0x134c
__sys_sendto+0xd0/0x154
sys_send+0x18/0x20
ret_fast_syscall+0x0/0x1c
-> (_xmit_ETHER#2){+.-.}-{2:2} {
HARDIRQ-ON-W at:
lock_acquire.part.0+0xb0/0x248
_raw_spin_lock+0x38/0x48
netif_freeze_queues+0x38/0x68
dev_deactivate_many+0xac/0x388
dev_deactivate+0x38/0x6c
linkwatch_do_dev+0x70/0x8c
__linkwatch_run_queue+0xd4/0x1e8
linkwatch_event+0x24/0x34
process_one_work+0x284/0x744
worker_thread+0x28/0x4bc
kthread+0xec/0x110
ret_from_fork+0x14/0x28
IN-SOFTIRQ-W at:
lock_acquire.part.0+0xb0/0x248
_raw_spin_lock+0x38/0x48
sch_direct_xmit+0x16c/0x2e8
__dev_queue_xmit+0x41c/0x1224
ip6_finish_output2+0x5f4/0xc64
ndisc_send_skb+0x4cc/0x81c
addrconf_rs_timer+0xb0/0x2f8
call_timer_fn+0xb4/0x33c
expire_timers+0xb4/0x10c
run_timer_softirq+0xf8/0x2a8
__do_softirq+0xd4/0x5fc
__irq_exit_rcu+0x138/0x17c
irq_exit+0x8/0x28
__irq_svc+0x90/0xbc
arch_cpu_idle+0x30/0x3c
default_idle_call+0x44/0xac
do_idle+0xc8/0x138
cpu_startup_entry+0x18/0x1c
rest_init+0xcc/0x168
arch_post_acpi_subsys_init+0x0/0x8
INITIAL USE at:
lock_acquire.part.0+0xb0/0x248
_raw_spin_lock+0x38/0x48
netif_freeze_queues+0x38/0x68
dev_deactivate_many+0xac/0x388
dev_deactivate+0x38/0x6c
linkwatch_do_dev+0x70/0x8c
__linkwatch_run_queue+0xd4/0x1e8
linkwatch_event+0x24/0x34
process_one_work+0x284/0x744
worker_thread+0x28/0x4bc
kthread+0xec/0x110
ret_from_fork+0x14/0x28
}
... key at: [<
c175974c>] netdev_xmit_lock_key+0x8/0x1c8
... acquired at:
__lock_acquire+0x978/0x2978
lock_acquire.part.0+0xb0/0x248
_raw_spin_lock+0x38/0x48
sch_direct_xmit+0x16c/0x2e8
__dev_queue_xmit+0x41c/0x1224
ip6_finish_output2+0x5f4/0xc64
ndisc_send_skb+0x4cc/0x81c
addrconf_rs_timer+0xb0/0x2f8
call_timer_fn+0xb4/0x33c
expire_timers+0xb4/0x10c
run_timer_softirq+0xf8/0x2a8
__do_softirq+0xd4/0x5fc
__irq_exit_rcu+0x138/0x17c
irq_exit+0x8/0x28
__irq_svc+0x90/0xbc
arch_cpu_idle+0x30/0x3c
default_idle_call+0x44/0xac
do_idle+0xc8/0x138
cpu_startup_entry+0x18/0x1c
rest_init+0xcc/0x168
arch_post_acpi_subsys_init+0x0/0x8
stack backtrace:
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G N
6.2.0-rc7-01749-gc54e1f7f7e36 #2786
Hardware name: Generic DT based system
unwind_backtrace from show_stack+0x10/0x14
show_stack from dump_stack_lvl+0x58/0x70
dump_stack_lvl from mark_lock.part.0+0x59c/0x93c
mark_lock.part.0 from __lock_acquire+0x978/0x2978
__lock_acquire from lock_acquire.part.0+0xb0/0x248
lock_acquire.part.0 from _raw_spin_lock+0x38/0x48
_raw_spin_lock from sch_direct_xmit+0x16c/0x2e8
sch_direct_xmit from __dev_queue_xmit+0x41c/0x1224
__dev_queue_xmit from ip6_finish_output2+0x5f4/0xc64
ip6_finish_output2 from ndisc_send_skb+0x4cc/0x81c
ndisc_send_skb from addrconf_rs_timer+0xb0/0x2f8
addrconf_rs_timer from call_timer_fn+0xb4/0x33c
call_timer_fn from expire_timers+0xb4/0x10c
expire_timers from run_timer_softirq+0xf8/0x2a8
run_timer_softirq from __do_softirq+0xd4/0x5fc
__do_softirq from __irq_exit_rcu+0x138/0x17c
__irq_exit_rcu from irq_exit+0x8/0x28
irq_exit from __irq_svc+0x90/0xbc
Exception stack(0xc1001f20 to 0xc1001f68)
1f20:
ffffffff ffffffff 00000001 c011f840 c100e000 c100e000 c1009314 c1009370
1f40:
c10f0c1a c0d5e564 c0f5da8c 00000000 00000000 c1001f70 c010f0bc c010f0c0
1f60:
600f0013 ffffffff
__irq_svc from arch_cpu_idle+0x30/0x3c
arch_cpu_idle from default_idle_call+0x44/0xac
default_idle_call from do_idle+0xc8/0x138
do_idle from cpu_startup_entry+0x18/0x1c
cpu_startup_entry from rest_init+0xcc/0x168
rest_init from arch_post_acpi_subsys_init+0x0/0x8
Fix this by using spin_lock_irqsave/spin_lock_irqrestore also
inside lan966x_ptp_irq_handler.
Fixes:
e85a96e48e33 ("net: lan966x: Add support for ptp interrupts")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Link: https://lore.kernel.org/r/20230217210917.2649365-1-horatiu.vultur@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kuniyuki Iwashima [Fri, 17 Feb 2023 20:09:20 +0000 (12:09 -0800)]
net/ulp: Remove redundant ->clone() test in inet_clone_ulp().
Commit
2c02d41d71f9 ("net/ulp: prevent ULP without clone op from entering
the LISTEN status") guarantees that all ULP listeners have clone() op, so
we no longer need to test it in inet_clone_ulp().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230217200920.85306-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Mon, 20 Feb 2023 23:38:41 +0000 (15:38 -0800)]
Merge tag 'for-netdev' of https://git./linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2023-02-17
We've added 64 non-merge commits during the last 7 day(s) which contain
a total of 158 files changed, 4190 insertions(+), 988 deletions(-).
The main changes are:
1) Add a rbtree data structure following the "next-gen data structure"
precedent set by recently-added linked-list, that is, by using
kfunc + kptr instead of adding a new BPF map type, from Dave Marchevsky.
2) Add a new benchmark for hashmap lookups to BPF selftests,
from Anton Protopopov.
3) Fix bpf_fib_lookup to only return valid neighbors and add an option
to skip the neigh table lookup, from Martin KaFai Lau.
4) Add cgroup.memory=nobpf kernel parameter option to disable BPF memory
accouting for container environments, from Yafang Shao.
5) Batch of ice multi-buffer and driver performance fixes,
from Alexander Lobakin.
6) Fix a bug in determining whether global subprog's argument is
PTR_TO_CTX, which is based on type names which breaks kprobe progs,
from Andrii Nakryiko.
7) Prep work for future -mcpu=v4 LLVM option which includes usage of
BPF_ST insn. Thus improve BPF_ST-related value tracking in verifier,
from Eduard Zingerman.
8) More prep work for later building selftests with Memory Sanitizer
in order to detect usages of undefined memory, from Ilya Leoshkevich.
9) Fix xsk sockets to check IFF_UP earlier to avoid a NULL pointer
dereference via sendmsg(), from Maciej Fijalkowski.
10) Implement BPF trampoline for RV64 JIT compiler, from Pu Lehui.
11) Fix BPF memory allocator in combination with BPF hashtab where it could
corrupt special fields e.g. used in bpf_spin_lock, from Hou Tao.
12) Fix LoongArch BPF JIT to always use 4 instructions for function
address so that instruction sequences don't change between passes,
from Hengqi Chen.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (64 commits)
selftests/bpf: Add bpf_fib_lookup test
bpf: Add BPF_FIB_LOOKUP_SKIP_NEIGH for bpf_fib_lookup
riscv, bpf: Add bpf trampoline support for RV64
riscv, bpf: Add bpf_arch_text_poke support for RV64
riscv, bpf: Factor out emit_call for kernel and bpf context
riscv: Extend patch_text for multiple instructions
Revert "bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES"
selftests/bpf: Add global subprog context passing tests
selftests/bpf: Convert test_global_funcs test to test_loader framework
bpf: Fix global subprog context argument resolution logic
LoongArch, bpf: Use 4 instructions for function address in JIT
bpf: bpf_fib_lookup should not return neigh in NUD_FAILED state
bpf: Disable bh in bpf_test_run for xdp and tc prog
xsk: check IFF_UP earlier in Tx path
Fix typos in selftest/bpf files
selftests/bpf: Use bpf_{btf,link,map,prog}_get_info_by_fd()
samples/bpf: Use bpf_{btf,link,map,prog}_get_info_by_fd()
bpftool: Use bpf_{btf,link,map,prog}_get_info_by_fd()
libbpf: Use bpf_{btf,link,map,prog}_get_info_by_fd()
libbpf: Introduce bpf_{btf,link,map,prog}_get_info_by_fd()
...
====================
Link: https://lore.kernel.org/r/20230217221737.31122-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stefan Schmidt [Sat, 18 Feb 2023 21:13:17 +0000 (22:13 +0100)]
MAINTAINERS: Add Miquel Raynal as additional maintainer for ieee802154
We are growing the maintainer team for ieee802154 to spread the load for
review and general maintenance. Miquel has been driving the subsystem
forward over the last year and we would like to welcome him as a
maintainer.
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Acked-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/r/20230218211317.284889-4-stefan@datenfreihafen.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stefan Schmidt [Sat, 18 Feb 2023 21:13:16 +0000 (22:13 +0100)]
MAINTAINERS: Switch maintenance for mrf24j40 driver over
Alan Ott has not been actively working on the driver or reviewing
patches for several years. I have been taking odd fixes in through the
wpan/ieee802154 tree. Update the MAINTAINERS file to reflect this
reality. I wanted to thank Alan for his work on the driver.
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Link: https://lore.kernel.org/r/20230218211317.284889-3-stefan@datenfreihafen.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stefan Schmidt [Sat, 18 Feb 2023 21:13:15 +0000 (22:13 +0100)]
MAINTAINERS: Switch maintenance for mcr20a driver over
Xue Liu has not been actively working on the driver or reviewing
patches for several years. I have been taking odd fixes in through the
wpan/ieee802154 tree. Update the MAINTAINERS file to reflect this
reality. I wanted to thank Xue Liu for his work on the driver.
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Link: https://lore.kernel.org/r/20230218211317.284889-2-stefan@datenfreihafen.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stefan Schmidt [Sat, 18 Feb 2023 21:13:14 +0000 (22:13 +0100)]
MAINTAINERS: Switch maintenance for cc2520 driver over
Varka Bhadram has not been actively working on the driver or reviewing
patches for several years. I have been taking odd fixes in through the
wpan/ieee802154 tree. Update the MAINTAINERS file to reflect this
reality. I wanted to thank Varka for his work on the driver.
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Link: https://lore.kernel.org/r/20230218211317.284889-1-stefan@datenfreihafen.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Mon, 20 Feb 2023 23:55:47 +0000 (15:55 -0800)]
Merge tag 'asm-generic-6.3' of git://git./linux/kernel/git/arnd/asm-generic
Pull asm-generic cleanups from Arnd Bergmann:
"Only three minor changes: a cross-platform series from Mike Rapoport
to consolidate asm/agp.h between architectures, and a correctness
change for __generic_cmpxchg_local() from Matt Evans"
* tag 'asm-generic-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
char/agp: introduce asm-generic/agp.h
char/agp: consolidate {alloc,free}_gatt_pages()
locking/atomic: cmpxchg: Make __generic_cmpxchg_local compare against zero-extended 'old' value
Linus Torvalds [Mon, 20 Feb 2023 23:49:56 +0000 (15:49 -0800)]
Merge tag 'soc-dt-6.3' of git://git./linux/kernel/git/soc/soc
Pull SoC DT updates from Arnd Bergmann:
"About a quarter of the changes are for 32-bit arm, mostly filling in
device support for existing machines and adding minor cleanups, mostly
for Qualcomm and Samsung based machines.
Two new 32-bit SoCs are added, both are quad-core Cortex-A7 chips from
Rockchips that have been around for a while but were lacking kernel
support so far: RV1126 is a Vision SoC with an NPU and is used in the
Edgeble Neural Compute Module 2(Neu2) board, while RK3128 is design
for TV boxes and so far only comes with a dts for its refernece
design.
The other 32-bit boards that were added are two ASpeed AST2600 based
BMC boards, the Microchip sam9x60_curiosity development board (Armv5
based!), the Enclustra PE1 FPGA-SoM baseboard, and a few more boards
for i.MX53 and i.MX6ULL.
On the RISC-V side, there are fewer patches, but a total of ten new
single-board computers based on variations of the Allwinner D1/T113
chip, plus one more board based on Microchip Polarfire.
As usual, arm64 has by far the most changes here, with over 700
non-merge changesets, among them over 400 alone for Qualcomm. The
newly added SoCs this time are all recent high-end embedded SoCs for
various markets, each on comes with support for its reference board:
- Qualcomm SM8550 (Snapdragon 8 Gen 2) for mobile phones
- Qualcomm QDU1000/QRU1000 5G RAN platform
- Rockchips RK3588/RK3588s for tablets, chromebooks and SBCs
- TI J784S4 for industrial and automotive applications
In total, there are 46 new arm64 machines:
- Reference platforms for each of the five new SoCs
- Three Amlogic based development boards
- Six embedded machines based on NXP i.MX8MM and i.MX8MP
- The Mediatek mt7986a based Banana Pi R3 router
- Six tablets based on Qualcomm MSM8916 (Snapdragon 410), SM6115
(Snapdragon 662) and SM8250 (Snapdragon 865)
- Two LTE dongles, also based on MSM8916
- Seven mobile phones, based on Qualcomm MSM8953 (Snapdragon 610),
SDM450 and SDM632
- Three chromebooks based on Qualcomm SC7280 (Snapdragon 7c)
- Nine development boards based on Rockchips RK3588, RK3568, RK3566
and RK3328.
- Five development machines based on TI K3 (AM642/AM654/AM68/AM69)
The cleanup of dtc warnings continues across all platforms, adding to
the total number of changes"
* tag 'soc-dt-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (1035 commits)
dt-bindings: riscv: correct starfive visionfive 2 compatibles
ARM: dts: socfpga: Add enclustra PE1 devicetree
dt-bindings: altera: Add enclustra mercury PE1
arm64: dts: qcom: msm8996: align RPM G-Link clock-controller node with bindings
arm64: dts: qcom: qcs404: align RPM G-Link node with bindings
arm64: dts: qcom: ipq6018: align RPM G-Link node with bindings
arm64: dts: qcom: sm8550: remove invalid interconnect property from cryptobam
arm64: dts: qcom: sc7280: Adjust zombie PWM frequency
arm64: dts: qcom: sc8280xp-pmics: Specify interrupt parent explicitly
arm64: dts: qcom: sm7225-fairphone-fp4: enable remaining i2c busses
arm64: dts: qcom: sm7225-fairphone-fp4: move status property down
arm64: dts: qcom: pmk8350: Use the correct PON compatible
arm64: dts: qcom: sc8280xp-x13s: Enable external display
arm64: dts: qcom: sc8280xp-crd: Introduce pmic_glink
arm64: dts: qcom: sc8280xp: Add USB-C-related DP blocks
arm64: dts: qcom: sm8350-hdk: enable GPU
arm64: dts: qcom: sm8350: add GPU, GMU, GPU CC and SMMU nodes
arm64: dts: qcom: sm8350: finish reordering nodes
arm64: dts: qcom: sm8350: move more nodes to correct place
arm64: dts: qcom: sm8350: reorder device nodes
...
Linus Torvalds [Mon, 20 Feb 2023 23:43:36 +0000 (15:43 -0800)]
Merge tag 'soc-defconfig-6.3' of git://git./linux/kernel/git/soc/soc
Pull ARM defconfigs updates from Arnd Bergmann:
"As usual, this contains all the patches to enable options for newly
added device drivers in the 32-bit and 64-bit defconfig files.
I have sorted the files according to the changes to Kconfig files,
to make it easier to check what has changed compared to the 'make
savedefconfig' output.
The most notable change this time is a series from Mark Brown to add
a 'virtconfig' target for arm64, which is for the moment the same as
the 'defconfig' target but disables all the top-level SoC specific
options in order to have a smaller and faster kernel build"
* tag 'soc-defconfig-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (39 commits)
arm64: defconfig: enable drivers required by the Qualcomm SA8775P platform
arm64: defconfig: Enable DisplayPort on SC8280XP laptops
arm64: configs: Add virtconfig
kbuild: Provide a version of merge_into_defconfig without override warnings
scripts: merge_config: Add option to suppress warning on overrides
ARM: reorder defconfig files
arm64: reorder defconfig
arm64: defconfig: enable Qualcomm SDAM nvmem driver
arm64: defconfig: enable SM8450 DISPCC clock driver
ARM: defconfig: Add IOSCHED_BFQ to the default configs
ARM: configs: multi_v7: enable NVMEM driver for STM32
ARM: Add wpcm450_defconfig for Nuvoton WPCM450
arm64: defconfig: Enable DMA_RESTRICTED_POOL
arm64: defconfig: Enable missing configs for mt8192-asurada
riscv: defconfig: Enable the Allwinner D1 platform and drivers
ARM: imx_v6_v7_defconfig: Don't enable PROVE_LOCKING
ARM: multi_v7_defconfig: Add GXP Fan and SPI support
ARM: add multi_v7_lpae_defconfig
kbuild: Add config fragment merge functionality
ARM: multi_v7_defconfig: Add options to support TQMLS102xA series
...
Linus Torvalds [Mon, 20 Feb 2023 23:36:37 +0000 (15:36 -0800)]
Merge tag 'arm-soc-6.3' of git://git./linux/kernel/git/soc/soc
Pull ARM SoC updates from Arnd Bergmann:
"The majority of the changes are for the OMAP2 platform, mostly
removing some dead code that got left behind from previous cleanups.
Aside from that, there are very minor updates and correctness fixes
for Zynq, i.MX, Samsung, Broadcom, AT91, ep93xx, and OMAP1"
* tag 'arm-soc-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (26 commits)
dt-bindings: soc: samsung: exynos-pmu: allow phys as child
ARM: imx: mach-imx6ul: add imx6ulz support
ARM: imx: Call ida_simple_remove() for ida_simple_get
arm64: drop redundant "ARMv8" from Kconfig option title
ARM: ep93xx: Convert to use descriptors for GPIO LEDs
ARM: s3c: fix s3c64xx_set_timer_source prototype
ARM: OMAP2+: Fix spelling typos in comment
ARM: OMAP2+: Remove unneeded #include <linux/pinctrl/machine.h>
ARM: OMAP2+: Remove unneeded #include <linux/pinctrl/pinmux.h>
ARM: OMAP1: call platform_device_put() in error case in omap1_dm_timer_init()
ARM: BCM63xx: remove useless goto statement
ARM: omap2: make functions static
ARM: omap2: remove unused omap2_pm_init
ARM: omap2: remove unused declarations
ARM: omap2: remove unused functions
ARM: omap2: smartreflex: remove on_init control
ARM: omap2: remove APLL control
ARM: omap2: simplify clock2xxx header
ARM: omap2: remove unused omap_hwmod_reset.c
ARM: omap2: remove unused headers
...
Linus Torvalds [Mon, 20 Feb 2023 23:28:57 +0000 (15:28 -0800)]
Merge tag 'arm-boardfile-remove-6.3' of git://git./linux/kernel/git/soc/soc
Pull ARM SoC boardfile updates from Arnd Bergmann
"Unused boardfile removal for 6.3
This is a follow-up to the deprecation of most of the old-style board
files that was merged in linux-6.0, removing them for good.
This branch is almost exclusively dead code removal based on those
annotations. Some device driver removals went through separate
subsystem trees, but the majority is in the same branch, in order to
better handle dependencies between the patches and avoid breaking
bisection.
Unfortunately that leads to merge conflicts against other changes in
the subsystem trees, but they should all be trivial to resolve by
removing the files.
See commit
7d0d3fa7339e ("Merge tag 'arm-boardfiles-6.0' of
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc") for the
description of which machines were marked unused and are now removed.
The only removals that got postponed are Terastation WXL (mv78xx0) and
Jornada720 (StrongARM1100), which turned out to still have potential
users"
* tag 'arm-boardfile-remove-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (91 commits)
mmc: omap: drop TPS65010 dependency
ARM: pxa: restore mfp-pxa320.h
usb: ohci-omap: avoid unused-variable warning
ARM: debug: remove references in DEBUG_UART_8250_SHIFT to removed configs
ARM: s3c: remove obsolete s3c-cpu-freq header
MAINTAINERS: adjust SAMSUNG SOC CLOCK DRIVERS after s3c24xx support removal
MAINTAINERS: update file entries after arm multi-platform rework and mach-pxa removal
ARM: remove CONFIG_UNUSED_BOARD_FILES
mfd: remove htc-pasic3 driver
w1: remove ds1wm driver
usb: remove ohci-tmio driver
fbdev: remove w100fb driver
fbdev: remove tmiofb driver
mmc: remove tmio_mmc driver
mfd: remove ucb1400 support
mfd: remove toshiba tmio drivers
rtc: remove v3020 driver
power: remove pda_power supply driver
ASoC: pxa: remove unused board support
pcmcia: remove unused pxa/sa1100 drivers
...
Linus Torvalds [Mon, 20 Feb 2023 22:27:21 +0000 (14:27 -0800)]
Merge tag 'for-6.3/block-2023-02-16' of git://git.kernel.dk/linux
Pull block updates from Jens Axboe:
- NVMe updates via Christoph:
- Small improvements to the logging functionality (Amit Engel)
- Authentication cleanups (Hannes Reinecke)
- Cleanup and optimize the DMA mapping cod in the PCIe driver
(Keith Busch)
- Work around the command effects for Format NVM (Keith Busch)
- Misc cleanups (Keith Busch, Christoph Hellwig)
- Fix and cleanup freeing single sgl (Keith Busch)
- MD updates via Song:
- Fix a rare crash during the takeover process
- Don't update recovery_cp when curr_resync is ACTIVE
- Free writes_pending in md_stop
- Change active_io to percpu
- Updates to drbd, inching us closer to unifying the out-of-tree driver
with the in-tree one (Andreas, Christoph, Lars, Robert)
- BFQ update adding support for multi-actuator drives (Paolo, Federico,
Davide)
- Make brd compliant with REQ_NOWAIT (me)
- Fix for IOPOLL and queue entering, fixing stalled IO waiting on
timeouts (me)
- Fix for REQ_NOWAIT with multiple bios (me)
- Fix memory leak in blktrace cleanup (Greg)
- Clean up sbitmap and fix a potential hang (Kemeng)
- Clean up some bits in BFQ, and fix a bug in the request injection
(Kemeng)
- Clean up the request allocation and issue code, and fix some bugs
related to that (Kemeng)
- ublk updates and fixes:
- Add support for unprivileged ublk (Ming)
- Improve device deletion handling (Ming)
- Misc (Liu, Ziyang)
- s390 dasd fixes (Alexander, Qiheng)
- Improve utility of request caching and fixes (Anuj, Xiao)
- zoned cleanups (Pankaj)
- More constification for kobjs (Thomas)
- blk-iocost cleanups (Yu)
- Remove bio splitting from drivers that don't need it (Christoph)
- Switch blk-cgroups to use struct gendisk. Some of this is now
incomplete as select late reverts were done. (Christoph)
- Add bvec initialization helpers, and convert callers to use that
rather than open-coding it (Christoph)
- Misc fixes and cleanups (Jinke, Keith, Arnd, Bart, Li, Martin,
Matthew, Ulf, Zhong)
* tag 'for-6.3/block-2023-02-16' of git://git.kernel.dk/linux: (169 commits)
brd: use radix_tree_maybe_preload instead of radix_tree_preload
block: use proper return value from bio_failfast()
block: bio-integrity: Copy flags when bio_integrity_payload is cloned
block: Fix io statistics for cgroup in throttle path
brd: mark as nowait compatible
brd: check for REQ_NOWAIT and set correct page allocation mask
brd: return 0/-error from brd_insert_page()
block: sync mixed merged request's failfast with 1st bio's
Revert "blk-cgroup: pin the gendisk in struct blkcg_gq"
Revert "blk-cgroup: pass a gendisk to blkg_lookup"
Revert "blk-cgroup: delay blk-cgroup initialization until add_disk"
Revert "blk-cgroup: delay calling blkcg_exit_disk until disk_release"
Revert "blk-cgroup: move the cgroup information to struct gendisk"
nvme-pci: remove iod use_sgls
nvme-pci: fix freeing single sgl
block: ublk: check IO buffer based on flag need_get_data
s390/dasd: Fix potential memleak in dasd_eckd_init()
s390/dasd: sort out physical vs virtual pointers usage
block: Remove the ALLOC_CACHE_SLACK constant
block: make kobj_type structures constant
...
Linus Torvalds [Mon, 20 Feb 2023 22:10:36 +0000 (14:10 -0800)]
Merge tag 'for-6.3/dio-2023-02-16' of git://git.kernel.dk/linux
Pull legacy dio update from Jens Axboe:
"We only have a few file systems that use the old dio code, make them
select it rather than build it unconditionally"
* tag 'for-6.3/dio-2023-02-16' of git://git.kernel.dk/linux:
fs: build the legacy direct I/O code conditionally
fs: move sb_init_dio_done_wq out of direct-io.c
Linus Torvalds [Mon, 20 Feb 2023 22:03:57 +0000 (14:03 -0800)]
Merge tag 'for-6.3/iter-ubuf-2023-02-16' of git://git.kernel.dk/linux
Pull io_uring ITER_UBUF conversion from Jens Axboe:
"Since we now have ITER_UBUF available, switch to using it for single
ranges as it's more efficient than ITER_IOVEC for that"
* tag 'for-6.3/iter-ubuf-2023-02-16' of git://git.kernel.dk/linux:
block: use iter_ubuf for single range
iov_iter: move iter_ubuf check inside restore WARN
io_uring: use iter_ubuf for single range imports
io_uring: switch network send/recv to ITER_UBUF
iov: add import_ubuf()
Linus Torvalds [Mon, 20 Feb 2023 21:53:39 +0000 (13:53 -0800)]
Merge tag 'for-6.3/io_uring-2023-02-16' of git://git.kernel.dk/linux
Pull io_uring updates from Jens Axboe:
- Cleanup series making the async prep and handling of
REQ_F_FORCE_ASYNC easier to follow and verify (Dylan)
- Enable specifying specific flags for OP_MSG_RING (Breno)
- Enable use of KASAN with the internal request cache (Breno)
- Split the opcode definition structs into a hot and cold part (Breno)
- OP_MSG_RING fixes (Pavel, me)
- Fix an issue with IOPOLL cancelation and PREEMPT_NONE (me)
- Handle TIF_NOTIFY_RESUME for the io-wq threads that never return to
userspace (me)
- Add support for using io_uring_register() with a registered ring fd
(Josh)
- Improve handling of poll on the ring fd (Pavel)
- Series improving the task_work handling (Pavel)
- Misc cleanups, fixes, improvements (Dmitrii, Quanfa, Richard, Pavel,
me)
* tag 'for-6.3/io_uring-2023-02-16' of git://git.kernel.dk/linux: (51 commits)
io_uring: Support calling io_uring_register with a registered ring fd
io_uring,audit: don't log IORING_OP_MADVISE
io_uring: mark task TASK_RUNNING before handling resume/task work
io_uring: always go async for unsupported open flags
io_uring: always go async for unsupported fadvise flags
io_uring: for requests that require async, force it
io_uring: if a linked request has REQ_F_FORCE_ASYNC then run it async
io_uring: add reschedule point to handle_tw_list()
io_uring: add a conditional reschedule to the IOPOLL cancelation loop
io_uring: return normal tw run linking optimisation
io_uring: refactor tctx_task_work
io_uring: refactor io_put_task helpers
io_uring: refactor req allocation
io_uring: improve io_get_sqe
io_uring: kill outdated comment about overflow flush
io_uring: use user visible tail in io_uring_poll()
io_uring: pass in io_issue_def to io_assign_file()
io_uring: Enable KASAN for request cache
io_uring: handle TIF_NOTIFY_RESUME when checking for task_work
io_uring/msg-ring: ensure flags passing works for task_work completions
...
Linus Torvalds [Mon, 20 Feb 2023 21:05:24 +0000 (13:05 -0800)]
Merge tag 'dlm-6.3' of git://git./linux/kernel/git/teigland/linux-dlm
Pull dlm updates from David Teigland:
"This fixes some races in the lowcomms startup and shutdown code that
were found by targeted stress testing that quickly and repeatedly
joins and leaves lockspaces"
* tag 'dlm-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
fs: dlm: remove unnecessary waker_up() calls
fs: dlm: move state change into else branch
fs: dlm: remove newline in log_print
fs: dlm: reduce the shutdown timeout to 5 secs
fs: dlm: make dlm sequence id more robust
fs: dlm: wait until all midcomms nodes detect version
fs: dlm: ignore unexpected non dlm opts msgs
fs: dlm: bring back previous shutdown handling
fs: dlm: send FIN ack back in right cases
fs: dlm: move sending fin message into state change handling
fs: dlm: don't set stop rx flag after node reset
fs: dlm: fix race setting stop tx flag
fs: dlm: be sure to call dlm_send_queue_flush()
fs: dlm: fix use after free in midcomms commit
fs: dlm: start midcomms before scand
fs/dlm: Remove "select SRCU"
fs: dlm: fix return value check in dlm_memory_init()
Linus Torvalds [Mon, 20 Feb 2023 20:54:27 +0000 (12:54 -0800)]
Merge tag 'for-6.3-tag' of git://git./linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
"The usual mix of performance improvements and new features.
The core change is reworking how checksums are processed, with
followup cleanups and simplifications. There are two minor changes in
block layer and iomap code.
Features:
- block group allocation class heuristics:
- pack files by size (up to 128k, up to 8M, more) to avoid
fragmentation in block groups, assuming that file size and life
time is correlated, in particular this may help during balance
- with tracepoints and extensible in the future
Performance:
- send: cache directory utimes and only emit the command when
necessary
- speedup up to 10x
- smaller final stream produced (no redundant utimes commands
issued)
- compatibility not affected
- fiemap: skip backref checks for shared leaves
- speedup 3x on sample filesystem with all leaves shared (e.g. on
snapshots)
- micro optimized b-tree key lookup, speedup in metadata operations
(sample benchmark: fs_mark +10% of files/sec)
Core changes:
- change where checksumming is done in the io path:
- checksum and read repair does verification at lower layer
- cascaded cleanups and simplifications
- raid56 refactoring and cleanups
Fixes:
- sysfs: make sure that a run-time change of a feature is correctly
tracked by the feature files
- scrub: better reporting of tree block errors
Other:
- locally enable -Wmaybe-uninitialized after fixing all warnings
- misc cleanups, spelling fixes
Other code:
- block: export bio_split_rw
- iomap: remove IOMAP_F_ZONE_APPEND"
* tag 'for-6.3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (109 commits)
btrfs: make kobj_type structures constant
btrfs: remove the bdev argument to btrfs_rmap_block
btrfs: don't rely on unchanging ->bi_bdev for zone append remaps
btrfs: never return true for reads in btrfs_use_zone_append
btrfs: pass a btrfs_bio to btrfs_use_append
btrfs: set bbio->file_offset in alloc_new_bio
btrfs: use file_offset to limit bios size in calc_bio_boundaries
btrfs: do unsigned integer division in the extent buffer binary search loop
btrfs: eliminate extra call when doing binary search on extent buffer
btrfs: raid56: handle endio in scrub_rbio
btrfs: raid56: handle endio in recover_rbio
btrfs: raid56: handle endio in rmw_rbio
btrfs: raid56: submit the read bios from scrub_assemble_read_bios
btrfs: raid56: fold rmw_read_wait_recover into rmw_read_bios
btrfs: raid56: fold recover_assemble_read_bios into recover_rbio
btrfs: raid56: add a bio_list_put helper
btrfs: raid56: wait for I/O completion in submit_read_bios
btrfs: raid56: simplify code flow in rmw_rbio
btrfs: raid56: simplify error handling and code flow in raid56_parity_write
btrfs: replace btrfs_wait_tree_block_writeback by wait_on_extent_buffer_writeback
...
Linus Torvalds [Mon, 20 Feb 2023 20:44:08 +0000 (12:44 -0800)]
Merge tag 'fixes_for_v6.3-rc1' of git://git./linux/kernel/git/jack/linux-fs
Pull UDF and ext2 fixes from Jan Kara:
- Rewrite of udf directory iteration code to address multiple syzbot
reports
- Fixes to udf extent handling and block mapping code to address
several syzbot reports and filesystem corruption issues uncovered by
fsx & fsstress
- Convert udf to kmap_local()
- Add sanity checks when loading udf bitmaps
- Drop old VARCONV support which I've never seen used and which was
broken for quite some years without anybody noticing
- Finish conversion of ext2 to kmap_local()
- One fix to mpage_writepages() on which other udf fixes depend
* tag 'fixes_for_v6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: (78 commits)
udf: Avoid directory type conversion failure due to ENOMEM
udf: Use unsigned variables for size calculations
udf: remove reporting loc in debug output
udf: Check consistency of Space Bitmap Descriptor
udf: Fix file counting in LVID
udf: Limit file size to 4TB
udf: Don't return bh from udf_expand_dir_adinicb()
udf: Convert udf_expand_file_adinicb() to avoid kmap_atomic()
udf: Convert udf_adinicb_writepage() to memcpy_to_page()
udf: Switch udf_adinicb_readpage() to kmap_local_page()
udf: Move udf_adinicb_readpage() to inode.c
udf: Mark aops implementation static
udf: Switch to single address_space_operations
udf: Add handling of in-ICB files to udf_bmap()
udf: Convert all file types to use udf_write_end()
udf: Convert in-ICB files to use udf_write_begin()
udf: Convert in-ICB files to use udf_direct_IO()
udf: Convert in-ICB files to use udf_writepages()
udf: Unify .read_folio for normal and in-ICB files
udf: Fix off-by-one error when discarding preallocation
...
Linus Torvalds [Mon, 20 Feb 2023 20:38:27 +0000 (12:38 -0800)]
Merge tag 'fsnotify_for_v6.3-rc1' of git://git./linux/kernel/git/jack/linux-fs
Pull fsnotify updates from Jan Kara:
"Support for auditing decisions regarding fanotify permission events"
* tag 'fsnotify_for_v6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fanotify,audit: Allow audit to use the full permission event response
fanotify: define struct members to hold response decision context
fanotify: Ensure consistent variable type for response
Linus Torvalds [Mon, 20 Feb 2023 20:33:41 +0000 (12:33 -0800)]
Merge tag 'fsverity-for-linus' of git://git./fs/fsverity/linux
Pull fsverity updates from Eric Biggers:
"Fix the longstanding implementation limitation that fsverity was only
supported when the Merkle tree block size, filesystem block size, and
PAGE_SIZE were all equal.
Specifically, add support for Merkle tree block sizes less than
PAGE_SIZE, and make ext4 support fsverity on filesystems where the
filesystem block size is less than PAGE_SIZE.
Effectively, this means that fsverity can now be used on systems with
non-4K pages, at least on ext4. These changes have been tested using
the verity group of xfstests, newly updated to cover the new code
paths.
Also update fs/verity/ to support verifying data from large folios.
There's also a similar patch for fs/crypto/, to support decrypting
data from large folios, which I'm including in here to avoid a merge
conflict between the fscrypt and fsverity branches"
* tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux:
fscrypt: support decrypting data from large folios
fsverity: support verifying data from large folios
fsverity.rst: update git repo URL for fsverity-utils
ext4: allow verity with fs block size < PAGE_SIZE
fs/buffer.c: support fsverity in block_read_full_folio()
f2fs: simplify f2fs_readpage_limit()
ext4: simplify ext4_readpage_limit()
fsverity: support enabling with tree block size < PAGE_SIZE
fsverity: support verification with tree block size < PAGE_SIZE
fsverity: replace fsverity_hash_page() with fsverity_hash_block()
fsverity: use EFBIG for file too large to enable verity
fsverity: store log2(digest_size) precomputed
fsverity: simplify Merkle tree readahead size calculation
fsverity: use unsigned long for level_start
fsverity: remove debug messages and CONFIG_FS_VERITY_DEBUG
fsverity: pass pos and size to ->write_merkle_tree_block
fsverity: optimize fsverity_cleanup_inode() on non-verity files
fsverity: optimize fsverity_prepare_setattr() on non-verity files
fsverity: optimize fsverity_file_open() on non-verity files
Linus Torvalds [Mon, 20 Feb 2023 20:29:27 +0000 (12:29 -0800)]
Merge tag 'fscrypt-for-linus' of git://git./fs/fscrypt/linux
Pull fscrypt updates from Eric Biggers:
"Simplify the implementation of the test_dummy_encryption mount option
by adding the 'test dummy key' on-demand"
* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux:
fscrypt: clean up fscrypt_add_test_dummy_key()
fs/super.c: stop calling fscrypt_destroy_keyring() from __put_super()
f2fs: stop calling fscrypt_add_test_dummy_key()
ext4: stop calling fscrypt_add_test_dummy_key()
fscrypt: add the test dummy encryption key on-demand
Linus Torvalds [Mon, 20 Feb 2023 20:23:40 +0000 (12:23 -0800)]
Merge tag 'erofs-for-6.3-rc1' of git://git./linux/kernel/git/xiang/erofs
Pull erofs updates from Gao Xiang:
"The most noticeable feature for this cycle is per-CPU kthread
decompression since Android use cases need low-latency I/O handling in
order to ensure the app runtime performance, currently unbounded
workqueue latencies are not quite good for production on many aarch64
hardwares and thus we need to introduce a deterministic expectation
for these. Decompression is CPU-intensive and it is sleepable for
EROFS, so other alternatives like decompression under softirq contexts
are not considered. More details are in the corresponding commit
message.
Others are random cleanups around the whole codebase and we will
continue to clean up further in the next few months.
Due to Lunar New Year holidays, some other new features were not
completely reviewed and solidified as expected and we may delay them
into the next version.
Summary:
- Add per-cpu kthreads for low-latency decompression for Android use
cases
- Get rid of tagged pointer helpers since they are rarely used now
- Several code cleanups to reduce codebase
- Documentation and MAINTAINERS updates"
* tag 'erofs-for-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: (21 commits)
erofs: fix an error code in z_erofs_init_zip_subsystem()
erofs: unify anonymous inodes for blob
erofs: relinquish volume with mutex held
erofs: maintain cookies of share domain in self-contained list
erofs: remove unused device mapping in meta routine
MAINTAINERS: erofs: Add Documentation/ABI/testing/sysfs-fs-erofs
Documentation/ABI: sysfs-fs-erofs: update supported features
erofs: remove unused EROFS_GET_BLOCKS_RAW flag
erofs: update print symbols for various flags in trace
erofs: make kobj_type structures constant
erofs: add per-cpu threads for decompression as an option
erofs: tidy up internal.h
erofs: get rid of z_erofs_do_map_blocks() forward declaration
erofs: move zdata.h into zdata.c
erofs: remove tagged pointer helpers
erofs: avoid tagged pointers to mark sync decompression
erofs: get rid of erofs_inode_datablocks()
erofs: simplify iloc()
erofs: get rid of debug_one_dentry()
erofs: remove linux/buffer_head.h dependency
...
Linus Torvalds [Mon, 20 Feb 2023 20:14:33 +0000 (12:14 -0800)]
Merge tag 'fs.acl.v6.3' of git://git./linux/kernel/git/vfs/idmapping
Pull vfs acl update from Christian Brauner:
"This contains a single update to the internal get acl method and
replaces an open-coded cmpxchg() comparison with with try_cmpxchg().
It's clearer and also beneficial on some architectures"
* tag 'fs.acl.v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping:
posix_acl: Use try_cmpxchg in get_acl
Linus Torvalds [Mon, 20 Feb 2023 20:03:55 +0000 (12:03 -0800)]
Merge tag 'fs.v6.3' of git://git./linux/kernel/git/vfs/idmapping
Pull vfs hardening update from Christian Brauner:
"Jan pointed out that during shutdown both filp_close() and super block
destruction will use basic printk logging when bugs are detected. This
causes issues in a few scenarios:
- Tools like syzkaller cannot figure out that the logged message
indicates a bug.
- Users that explicitly opt in to have the kernel bug on data
corruption by selecting CONFIG_BUG_ON_DATA_CORRUPTION should see
the kernel crash when they did actually select that option.
- When there are busy inodes after the superblock is shut down later
access to such a busy inodes walks through freed memory. It would
be better to cleanly crash instead.
All of this can be addressed by using the already existing
CHECK_DATA_CORRUPTION() macro in these places when kernel bugs are
detected. Its logging improvement is useful for all users.
Otherwise this only has a meaningful behavioral effect when users do
select CONFIG_BUG_ON_DATA_CORRUPTION which means this is backward
compatible for regular users"
* tag 'fs.v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping:
fs: Use CHECK_DATA_CORRUPTION() when kernel bugs are detected
Linus Torvalds [Mon, 20 Feb 2023 19:53:11 +0000 (11:53 -0800)]
Merge tag 'fs.idmapped.v6.3' of git://git./linux/kernel/git/vfs/idmapping
Pull vfs idmapping updates from Christian Brauner:
- Last cycle we introduced the dedicated struct mnt_idmap type for
mount idmapping and the required infrastucture in
256c8aed2b42 ("fs:
introduce dedicated idmap type for mounts"). As promised in last
cycle's pull request message this converts everything to rely on
struct mnt_idmap.
Currently we still pass around the plain namespace that was attached
to a mount. This is in general pretty convenient but it makes it easy
to conflate namespaces that are relevant on the filesystem with
namespaces that are relevant on the mount level. Especially for
non-vfs developers without detailed knowledge in this area this was a
potential source for bugs.
This finishes the conversion. Instead of passing the plain namespace
around this updates all places that currently take a pointer to a
mnt_userns with a pointer to struct mnt_idmap.
Now that the conversion is done all helpers down to the really
low-level helpers only accept a struct mnt_idmap argument instead of
two namespace arguments.
Conflating mount and other idmappings will now cause the compiler to
complain loudly thus eliminating the possibility of any bugs. This
makes it impossible for filesystem developers to mix up mount and
filesystem idmappings as they are two distinct types and require
distinct helpers that cannot be used interchangeably.
Everything associated with struct mnt_idmap is moved into a single
separate file. With that change no code can poke around in struct
mnt_idmap. It can only be interacted with through dedicated helpers.
That means all filesystems are and all of the vfs is completely
oblivious to the actual implementation of idmappings.
We are now also able to extend struct mnt_idmap as we see fit. For
example, we can decouple it completely from namespaces for users that
don't require or don't want to use them at all. We can also extend
the concept of idmappings so we can cover filesystem specific
requirements.
In combination with the vfs{g,u}id_t work we finished in v6.2 this
makes this feature substantially more robust and thus difficult to
implement wrong by a given filesystem and also protects the vfs.
- Enable idmapped mounts for tmpfs and fulfill a longstanding request.
A long-standing request from users had been to make it possible to
create idmapped mounts for tmpfs. For example, to share the host's
tmpfs mount between multiple sandboxes. This is a prerequisite for
some advanced Kubernetes cases. Systemd also has a range of use-cases
to increase service isolation. And there are more users of this.
However, with all of the other work going on this was way down on the
priority list but luckily someone other than ourselves picked this
up.
As usual the patch is tiny as all the infrastructure work had been
done multiple kernel releases ago. In addition to all the tests that
we already have I requested that Rodrigo add a dedicated tmpfs
testsuite for idmapped mounts to xfstests. It is to be included into
xfstests during the v6.3 development cycle. This should add a slew of
additional tests.
* tag 'fs.idmapped.v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping: (26 commits)
shmem: support idmapped mounts for tmpfs
fs: move mnt_idmap
fs: port vfs{g,u}id helpers to mnt_idmap
fs: port fs{g,u}id helpers to mnt_idmap
fs: port i_{g,u}id_into_vfs{g,u}id() to mnt_idmap
fs: port i_{g,u}id_{needs_}update() to mnt_idmap
quota: port to mnt_idmap
fs: port privilege checking helpers to mnt_idmap
fs: port inode_owner_or_capable() to mnt_idmap
fs: port inode_init_owner() to mnt_idmap
fs: port acl to mnt_idmap
fs: port xattr to mnt_idmap
fs: port ->permission() to pass mnt_idmap
fs: port ->fileattr_set() to pass mnt_idmap
fs: port ->set_acl() to pass mnt_idmap
fs: port ->get_acl() to pass mnt_idmap
fs: port ->tmpfile() to pass mnt_idmap
fs: port ->rename() to pass mnt_idmap
fs: port ->mknod() to pass mnt_idmap
fs: port ->mkdir() to pass mnt_idmap
...
Yury Norov [Fri, 17 Feb 2023 01:39:08 +0000 (17:39 -0800)]
sched/topology: fix KASAN warning in hop_cmp()
Despite that prev_hop is used conditionally on cur_hop
is not the first hop, it's initialized unconditionally.
Because initialization implies dereferencing, it might happen
that the code dereferences uninitialized memory, which has been
spotted by KASAN. Fix it by reorganizing hop_cmp() logic.
Reported-by: Bruno Goncalves <bgoncalv@redhat.com>
Fixes:
cd7f55359c90 ("sched: add sched_numa_find_nth_cpu()")
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Link: https://lore.kernel.org/r/Y+7avK6V9SyAWsXi@yury-laptop/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Mon, 20 Feb 2023 19:21:02 +0000 (11:21 -0800)]
Merge tag 'iversion-v6.3' of git://git./linux/kernel/git/jlayton/linux
Pull i_version updates from Jeff Layton:
"This overhauls how we handle i_version queries from nfsd.
Instead of having special routines and grabbing the i_version field
directly out of the inode in some cases, we've moved most of the
handling into the various filesystems' getattr operations. As a bonus,
this makes ceph's change attribute usable by knfsd as well.
This should pave the way for future work to make this value queryable
by userland, and to make it more resilient against rolling back on a
crash"
* tag 'iversion-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
nfsd: remove fetch_iversion export operation
nfsd: use the getattr operation to fetch i_version
nfsd: move nfsd4_change_attribute to nfsfh.c
ceph: report the inode version in getattr if requested
nfs: report the inode version in getattr if requested
vfs: plumb i_version handling into struct kstat
fs: clarify when the i_version counter must be updated
fs: uninline inode_query_iversion