git.monstr.eu Git - linux-2.6-microblaze.git/log

Merge tag 'fbdev-for-7.0-rc1-2' of git://git./linux/kernel/git/deller/linux-fbdev

Pull more fbdev updates from Helge Deller:
"Code cleanups for the au1100fb fbdev driver (Uwe Kleine-König)"

* tag 'fbdev-for-7.0-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev:
  fbdev: au1100fb: Replace license boilerplate by SPDX header
  fbdev: au1100fb: Fold au1100fb.h into its only user
  fbdev: au1100fb: Replace custom printk wrappers by pr_*
  fbdev: au1100fb: Make driver compilable on non-mips platforms
  fbdev: au1100fb: Use proper conversion specifiers in printk formats
  fbdev: au1100fb: Mark several local functions as static
  fbdev: au1100fb: Don't store device specific data in global variables

Merge tag 'trace-v7.0-2' of git://git./linux/kernel/git/trace/linux-trace

Pull tracing fixes from Steven Rostedt:

- Fix possible dereference of uninitialized pointer

   When validating the persistent ring buffer on boot up, if the first
   validation fails, a reference to "head_page" is performed in the
   error path, but it skips over the initialization of that variable.
   Move the initialization before the first validation check.

- Fix use of event length in validation of persistent ring buffer

   On boot up, the persistent ring buffer is checked to see if it is
   valid by several methods. One being to walk all the events in the
   memory location to make sure they are all valid. The length of the
   event is used to move to the next event. This length is determined by
   the data in the buffer. If that length is corrupted, it could
   possibly make the next event to check located at a bad memory
   location.

   Validate the length field of the event when doing the event walk.

- Fix function graph on archs that do not support use of ftrace_ops

   When an architecture defines HAVE_DYNAMIC_FTRACE_WITH_ARGS, it means
   that its function graph tracer uses the ftrace_ops of the function
   tracer to call its callbacks. This allows a single registered
   callback to be called directly instead of checking the callback's
   meta data's hash entries against the function being traced.

   For architectures that do not support this feature, it must always
   call the loop function that tests each registered callback (even if
   there's only one). The loop function tests each callback's meta data
   against its hash of functions and will call its callback if the
   function being traced is in its hash map.

   The issue was that there was no check against this and the direct
   function was being called even if the architecture didn't support it.
   This meant that if function tracing was enabled at the same time as a
   callback was registered with the function graph tracer, its callback
   would be called for every function that the function tracer also
   traced, even if the callback's meta data only wanted to be called
   back for a small subset of functions.

   Prevent the direct calling for those architectures that do not
   support it.

- Fix references to trace_event_file for hist files

   The hist files used event_file_data() to get a reference to the
   associated trace_event_file the histogram was attached to. This would
   return a pointer even if the trace_event_file is about to be freed
   (via RCU). Instead it should use the event_file_file() helper that
   returns NULL if the trace_event_file is marked to be freed so that no
   new references are added to it.

- Wake up hist poll readers when an event is being freed

   When polling on a hist file, the task is only awoken when a hist
   trigger is triggered. This means that if an event is being freed
   while there's a task waiting on its hist file, it will need to wait
   until the hist trigger occurs to wake it up and allow the freeing to
   happen. Note, the event will not be completely freed until all
   references are removed, and a hist poller keeps a reference. But it
   should still be woken when the event is being freed.

* tag 'trace-v7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Wake up poll waiters for hist files when removing an event
  tracing: Fix checking of freed trace_event_file for hist files
  fgraph: Do not call handlers direct when not using ftrace_ops
  tracing: ring-buffer: Fix to check event length before using
  ring-buffer: Fix possible dereference of uninitialized pointer

Merge tag 'for-7.0-rc1-tag' of git://git./linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

- multiple error handling fixes of unexpected conditions

- reset block group size class once it becomes empty so that
   its class can be changed

- error message level adjustments

- fixes of returned error values

- use correct block reserve for delayed refs

* tag 'for-7.0-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix invalid leaf access in btrfs_quota_enable() if ref key not found
  btrfs: fix lost error return in btrfs_find_orphan_roots()
  btrfs: fix lost return value on error in finish_verity()
  btrfs: change unaligned root messages to error level in btrfs_validate_super()
  btrfs: use the correct type to initialize block reserve for delayed refs
  btrfs: do not ASSERT() when the fs flips RO inside btrfs_repair_io_failure()
  btrfs: reset block group size class when it becomes empty
  btrfs: replace BUG() with error handling in __btrfs_balance()
  btrfs: handle unexpected exact match in btrfs_set_inode_index_count()

Merge tag 'ecryptfs-7.0-rc1-fixes' of git://git./linux/kernel/git/tyhicks/ecryptfs

Pull ecryptfs updates from Tyler Hicks:
"This consists of some really minor typo fixes that fell through the
  cracks and some more recent code cleanups:

   - Comment typo fixes

   - Removal of an unused function declaration

   - Use strscpy() instead of the deprecated strcpy()

   - Use string copying helpers instead of memcpy() and manually
     terminating strings"

* tag 'ecryptfs-7.0-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
  ecryptfs: Replace memcpy + NUL termination in ecryptfs_copy_filename
  ecryptfs: Drop redundant NUL terminations after calling ecryptfs_to_hex
  ecryptfs: Replace memcpy + NUL termination in ecryptfs_new_file_context
  ecryptfs: Replace strcpy with strscpy in ecryptfs_validate_options
  ecryptfs: Replace strcpy with strscpy in ecryptfs_cipher_code_to_string
  ecryptfs: Replace strcpy with strscpy in ecryptfs_set_default_crypt_stat_vals
  ecryptfs: simplify list initialization in ecryptfs_parse_packet_set()
  ecryptfs: Remove unused declartion ecryptfs_fill_zeros()
  ecryptfs: Fix packet format comment in parse_tag_67_packet()
  ecryptfs: comment typo fix
  ecryptfs: keystore: Fix typo 'the the' in comment

Merge tag 'apparmor-pr-2026-02-18' of git://git./linux/kernel/git/jj/linux-apparmor

Pull AppArmor updates from John Johansen:
"Features:
   - add .kunitconfig
   - audit execpath in userns mediation
   - add support loading per permission tagging

  Cleanups:
   - remove unused percpu critical sections in buffer management
   - document the buffer hold, add an overflow guard
   - split xxx_in_ns into its two separate semantic use cases
   - remove apply_modes_to_perms from label_match
   - refactor/cleanup cred helper fns.
   - guard against free attachment/data routines being called with NULL
   - drop in_atomic flag in common_mmap, common_file_perm, and cleanup
   - make str table more generic and be able to have multiple entries
   - Replace deprecated strcpy with memcpy in gen_symlink_name
   - Replace deprecated strcpy in d_namespace_path
   - Replace sprintf/strcpy with scnprintf/strscpy in aa_policy_init
   - replace sprintf with snprintf in aa_new_learning_profile

  Bug Fixes:
   - fix cast in format string DEBUG statement
   - fix make aa_labelmatch return consistent
   - fix fmt string type error in process_strs_entry
   - fix kernel-doc comments for inview
   - fix invalid deref of rawdata when export_binary is unset
   - avoid per-cpu hold underflow in aa_get_buffer
   - fix fast path cache check for unix sockets
   - fix rlimit for posix cpu timers
   - fix label and profile debug macros
   - move check for aa_null file to cover all cases
   - return -ENOMEM in unpack_perms_table upon alloc failure
   - fix boolean argument in apparmor_mmap_file
   - Fix & Optimize table creation from possibly unaligned memory
   - Allow apparmor to handle unaligned dfa tables
   - fix NULL deref in aa_sock_file_perm
   - fix NULL pointer dereference in __unix_needs_revalidation
   - fix signedness bug in unpack_tags()"

* tag 'apparmor-pr-2026-02-18' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor: (34 commits)
  apparmor: fix signedness bug in unpack_tags()
  apparmor: fix cast in format string DEBUG statement
  apparmor: fix aa_label to return state from compount and component match
  apparmor: fix fmt string type error in process_strs_entry
  apparmor: fix kernel-doc comments for inview
  apparmor: fix invalid deref of rawdata when export_binary is unset
  apparmor: add .kunitconfig
  apparmor: cleanup remove unused percpu critical sections in buffer management
  apparmor: document the buffer hold, add an overflow guard
  apparmor: avoid per-cpu hold underflow in aa_get_buffer
  apparmor: split xxx_in_ns into its two separate semantic use cases
  apparmor: make label_match return a consistent value
  apparmor: remove apply_modes_to_perms from label_match
  apparmor: fix fast path cache check for unix sockets
  apparmor: fix rlimit for posix cpu timers
  apparmor: refactor/cleanup cred helper fns.
  apparmor: fix label and profile debug macros
  apparmor: move check for aa_null file to cover all cases
  apparmor: guard against free routines being called with a NULL
  apparmor: return -ENOMEM in unpack_perms_table upon alloc failure
  ...

Merge tag 'kmalloc_obj-prep-v7.0-rc1' of git://git./linux/kernel/git/kees/linux

Pull kmalloc_obj prep from Kees Cook:
"Fixes for return types to prepare for the kmalloc_obj treewide
  conversion, that haven't yet appeared during the merge window:
  dm-crypt, dm-zoned, drm/msm, and arm64 kvm"

* tag 'kmalloc_obj-prep-v7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  KVM: arm64: vgic: Handle const qualifier from gic_kvm_info allocation type
  drm/msm: Adjust msm_iommu_pagetable_prealloc_allocate() allocation type
  dm: dm-zoned: Adjust dmz_load_mapping() allocation type
  dm-crypt: Adjust crypt_alloc_tfms_aead() allocation type

Merge tag 'for-linus' of git://git./linux/kernel/git/rmk/linux

Pull ARM updates from Russell King:

- avoid %pK for ARM MM prints

- implement ARCH_HAS_CC_CAN_LINK to ensure runnable user progs

- handle BE8 and BE32 for user progs

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux:
  ARM: 9470/1: Handle BE8 vs BE32 in ARCH_CC_CAN_LINK
  ARM: 9469/1: Implement ARCH_HAS_CC_CAN_LINK
  ARM: 9467/1: mm: Don't use %pK through printk

Merge tag 'efi-fixes-for-v7.0-1' of git://git./linux/kernel/git/efi/efi

Pull EFI fixes from Ard Biesheuvel:
"Mixed bag of EFI tweaks and bug fixes:

   - Add a missing symbol export spotted by Arnd's randconfig testing

   - Fix kexec from a kernel booted with 'noefi'

   - Fix memblock handling of the unaccepted memory table

   - Constify an occurrence of struct efivar_operations

   - Add Ilias as EFI reviewer"

* tag 'efi-fixes-for-v7.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  efi: Align unaccepted memory range to page boundary
  efi: Fix reservation of unaccepted memory table
  MAINTAINERS: Add a reviewer entry for EFI
  efi: stmm: Constify struct efivar_operations
  x86/kexec: Copy ACPI root pointer address from config table
  efi: export sysfb_primary_display for EDID

Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux

Pull arm64 fixes from Will Deacon:
"Two arm64 fixes: one fixes a warning that started showing up with
  gcc 16 and the other fixes a lockup in udelay() when running on a
  vCPU loaded on a CPU with the new-fangled WFIT instruction:

   - Fix compiler warning from huge_pte_clear() with GCC 16

   - Fix hang in udelay() on systems with WFIT by consistently using the
     virtual counter to calculate the delta"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: hugetlbpage: avoid unused-but-set-parameter warning (gcc-16)
  arm64: Force the use of CNTVCT_EL0 in __delay()

Merge tag 's390-7.0-2' of git://git./linux/kernel/git/s390/linux

Pull s390 fixes from Heiko Carstens:

- Make KEXEC_SIG available again for CONFIG_MODULES=n

- The s390 topology code used to call rebuild_sched_domains() before
   common code scheduling domains were setup. This was silently ignored
   by common code, but now results in a warning. Address by avoiding the
   early call

- Convert debug area lock from spinlock to raw spinlock to address
   lockdep warnings

- The recent 3490 tape device driver rework resulted in a different
   device driver name, which is visible via sysfs for user space. This
   breaks at least one user space application. Change the device driver
   name back to its old name to fix this

* tag 's390-7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/tape: Fix device driver name
  s390/debug: Convert debug area lock from a spinlock to a raw spinlock
  s390/smp: Avoid calling rebuild_sched_domains() early
  s390/kexec: Make KEXEC_SIG available when CONFIG_MODULES=n

Merge tag 'xtensa-20260219' of https://github.com/jcmvbkbc/linux-xtensa

Pull Xtensa update from Max Filippov:

- fix unhandled case in the load/store fault handler
in configurations with MMU

* tag 'xtensa-20260219' of https://github.com/jcmvbkbc/linux-xtensa:
xtensa: align: validate access in fast_load_store

Merge tag 'for-linus-7.0-rc1a-tag' of git://git./linux/kernel/git/xen/tip

Pull xen fix from Juergen Gross:
"A single patch fixing a boot regression when running as a Xen PV
guest. This issue was introduced in this merge window"

* tag 'for-linus-7.0-rc1a-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
x86/xen: Fix Xen PV guest boot

Merge tag 'hyperv-next-signed-20260218' of git://git./linux/kernel/git/hyperv/linux

Pull Hyper-V updates from Wei Liu:

- Debugfs support for MSHV statistics (Nuno Das Neves)

- Support for the integrated scheduler (Stanislav Kinsburskii)

- Various fixes for MSHV memory management and hypervisor status
   handling (Stanislav Kinsburskii)

- Expose more capabilities and flags for MSHV partition management
   (Anatol Belski, Muminul Islam, Magnus Kulke)

- Miscellaneous fixes to improve code quality and stability (Carlos
   López, Ethan Nelson-Moore, Li RongQing, Michael Kelley, Mukesh
   Rathor, Purna Pavan Chandra Aekkaladevi, Stanislav Kinsburskii, Uros
   Bizjak)

- PREEMPT_RT fixes for vmbus interrupts (Jan Kiszka)

* tag 'hyperv-next-signed-20260218' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (34 commits)
  mshv: Handle insufficient root memory hypervisor statuses
  mshv: Handle insufficient contiguous memory hypervisor status
  mshv: Introduce hv_deposit_memory helper functions
  mshv: Introduce hv_result_needs_memory() helper function
  mshv: Add SMT_ENABLED_GUEST partition creation flag
  mshv: Add nested virtualization creation flag
  Drivers: hv: vmbus: Simplify allocation of vmbus_evt
  mshv: expose the scrub partition hypercall
  mshv: Add support for integrated scheduler
  mshv: Use try_cmpxchg() instead of cmpxchg()
  x86/hyperv: Fix error pointer dereference
  x86/hyperv: Reserve 3 interrupt vectors used exclusively by MSHV
  Drivers: hv: vmbus: Use kthread for vmbus interrupts on PREEMPT_RT
  x86/hyperv: Remove ASM_CALL_CONSTRAINT with VMMCALL insn
  x86/hyperv: Use savesegment() instead of inline asm() to save segment registers
  mshv: fix SRCU protection in irqfd resampler ack handler
  mshv: make field names descriptive in a header struct
  x86/hyperv: Update comment in hyperv_cleanup()
  mshv: clear eventfd counter on irqfd shutdown
  x86/hyperv: Use memremap()/memunmap() instead of ioremap_cache()/iounmap()
  ...

tracing: Wake up poll waiters for hist files when removing an event

The event_hist_poll() function attempts to verify whether an event file is
being removed, but this check may not occur or could be unnecessarily
delayed. This happens because hist_poll_wakeup() is currently invoked only
from event_hist_trigger() when a hist command is triggered. If the event
file is being removed, no associated hist command will be triggered and a
waiter will be woken up only after an unrelated hist command is triggered.

Fix the issue by adding a call to hist_poll_wakeup() in
remove_event_file_dir() after setting the EVENT_FILE_FL_FREED flag. This
ensures that a task polling on a hist file is woken up and receives
EPOLLERR.

Cc: stable@vger.kernel.org
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Link: https://patch.msgid.link/20260219162737.314231-3-petr.pavlu@suse.com
Fixes: 1bd13edbbed6 ("tracing/hist: Add poll(POLLIN) support on hist file")
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

tracing: Fix checking of freed trace_event_file for hist files

The event_hist_open() and event_hist_poll() functions currently retrieve
a trace_event_file pointer from a file struct by invoking
event_file_data(), which simply returns file->f_inode->i_private. The
functions then check if the pointer is NULL to determine whether the event
is still valid. This approach is flawed because i_private is assigned when
an eventfs inode is allocated and remains set throughout its lifetime.
Instead, the code should call event_file_file(), which checks for
EVENT_FILE_FL_FREED. Using the incorrect access function may result in the
code potentially opening a hist file for an event that is being removed or
becoming stuck while polling on this file.

Correct the access method to event_file_file() in both functions.

Cc: stable@vger.kernel.org
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Link: https://patch.msgid.link/20260219162737.314231-2-petr.pavlu@suse.com
Fixes: 1bd13edbbed6 ("tracing/hist: Add poll(POLLIN) support on hist file")
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

fgraph: Do not call handlers direct when not using ftrace_ops

The function graph tracer was modified to us the ftrace_ops of the
function tracer. This simplified the code as well as allowed more features
of the function graph tracer.

Not all architectures were converted over as it required the
implementation of HAVE_DYNAMIC_FTRACE_WITH_ARGS to implement. For those
architectures, it still did it the old way where the function graph tracer
handle was called by the function tracer trampoline. The handler then had
to check the hash to see if the registered handlers wanted to be called by
that function or not.

In order to speed up the function graph tracer that used ftrace_ops, if
only one callback was registered with function graph, it would call its
function directly via a static call.

Now, if the architecture does not support the use of using ftrace_ops and
still has the ftrace function trampoline calling the function graph
handler, then by doing a direct call it removes the check against the
handler's hash (list of functions it wants callbacks to), and it may call
that handler for functions that the handler did not request calls for.

On 32bit x86, which does not support the ftrace_ops use with function
graph tracer, it shows the issue:

~# trace-cmd start -p function -l schedule
~# trace-cmd show
# tracer: function_graph
#
# CPU  DURATION                  FUNCTION CALLS
# |     |   |                     |   |   |   |
  2) * 11898.94 us |  schedule();
  3) # 1783.041 us |  schedule();
  1)               |  schedule() {
  ------------------------------------------
  1)   bash-8369    =>  kworker-7669
  ------------------------------------------
  1)               |        schedule() {
  ------------------------------------------
  1)  kworker-7669  =>   bash-8369
  ------------------------------------------
  1) + 97.004 us   |  }
  1)               |  schedule() {
[..]

Now by starting the function tracer is another instance:

~# trace-cmd start -B foo -p function

This causes the function graph tracer to trace all functions (because the
function trace calls the function graph tracer for each on, and the
function graph trace is doing a direct call):

~# trace-cmd show
# tracer: function_graph
#
# CPU  DURATION                  FUNCTION CALLS
# |     |   |                     |   |   |   |
  1)   1.669 us    |          } /* preempt_count_sub */
  1) + 10.443 us   |        } /* _raw_spin_unlock_irqrestore */
  1)               |        tick_program_event() {
  1)               |          clockevents_program_event() {
  1)   1.044 us    |            ktime_get();
  1)   6.481 us    |            lapic_next_event();
  1) + 10.114 us   |          }
  1) + 11.790 us   |        }
  1) ! 181.223 us  |      } /* hrtimer_interrupt */
  1) ! 184.624 us  |    } /* __sysvec_apic_timer_interrupt */
  1)               |    irq_exit_rcu() {
  1)   0.678 us    |      preempt_count_sub();

When it should still only be tracing the schedule() function.

To fix this, add a macro FGRAPH_NO_DIRECT to be set to 0 when the
architecture does not support function graph use of ftrace_ops, and set to
1 otherwise. Then use this macro to know to allow function graph tracer to
call the handlers directly or not.

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Link: https://patch.msgid.link/20260218104244.5f14dade@gandalf.local.home
Fixes: cc60ee813b503 ("function_graph: Use static_call and branch to optimize entry function")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

tracing: ring-buffer: Fix to check event length before using

Check the event length before adding it for accessing next index in
rb_read_data_buffer(). Since this function is used for validating
possibly broken ring buffers, the length of the event could be broken.
In that case, the new event (e + len) can point a wrong address.
To avoid invalid memory access at boot, check whether the length of
each event is in the possible range before using it.

Cc: stable@vger.kernel.org
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fixes: 5f3b6e839f3c ("ring-buffer: Validate boot range memory events")
Link: https://patch.msgid.link/177123421541.142205.9414352170164678966.stgit@devnote2
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

ring-buffer: Fix possible dereference of uninitialized pointer

There is a pointer head_page in rb_meta_validate_events() which is not
initialized at the beginning of a function. This pointer can be dereferenced
if there is a failure during reader page validation. In this case the control
is passed to "invalid" label where the pointer is dereferenced in a loop.

To fix the issue initialize orig_head and head_page before calling
rb_validate_buffer.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Cc: stable@vger.kernel.org
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Link: https://patch.msgid.link/20260213100130.2013839-1-d.dulov@aladdin.ru
Closes: https://lore.kernel.org/r/202406130130.JtTGRf7W-lkp@intel.com/
Fixes: 5f3b6e839f3c ("ring-buffer: Validate boot range memory events")
Signed-off-by: Daniil Dulov <d.dulov@aladdin.ru>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

Merge tag 'net-7.0-rc1' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
"Including fixes from Netfilter.

  Current release - new code bugs:

   - net: fix backlog_unlock_irq_restore() vs CONFIG_PREEMPT_RT

   - eth: mlx5e: XSK, Fix unintended ICOSQ change

   - phy_port: correctly recompute the port's linkmodes

   - vsock: prevent child netns mode switch from local to global

   - couple of kconfig fixes for new symbols

  Previous releases - regressions:

   - nfc: nci: fix false-positive parameter validation for packet data

   - net: do not delay zero-copy skbs in skb_attempt_defer_free()

  Previous releases - always broken:

   - mctp: ensure our nlmsg responses to user space are zero-initialised

   - ipv6: ioam: fix heap buffer overflow in __ioam6_fill_trace_data()

   - fixes for ICMP rate limiting

  Misc:

   - intel: fix PCI device ID conflict between i40e and ipw2200"

* tag 'net-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (85 commits)
  net: nfc: nci: Fix parameter validation for packet data
  net/mlx5e: Use unsigned for mlx5e_get_max_num_channels
  net/mlx5e: Fix deadlocks between devlink and netdev instance locks
  net/mlx5e: MACsec, add ASO poll loop in macsec_aso_set_arm_event
  net/mlx5: Fix misidentification of write combining CQE during poll loop
  net/mlx5e: Fix misidentification of ASO CQE during poll loop
  net/mlx5: Fix multiport device check over light SFs
  bonding: alb: fix UAF in rlb_arp_recv during bond up/down
  bnge: fix reserving resources from FW
  eth: fbnic: Advertise supported XDP features.
  rds: tcp: fix uninit-value in __inet_bind
  net/rds: Fix NULL pointer dereference in rds_tcp_accept_one
  octeontx2-af: Fix default entries mcam entry action
  net/mlx5e: XSK, Fix unintended ICOSQ change
  ipv6: icmp: icmpv6_xrlim_allow() optimization if net.ipv6.icmp.ratelimit is zero
  ipv4: icmp: icmpv4_xrlim_allow() optimization if net.ipv4.icmp_ratelimit is zero
  ipv6: icmp: remove obsolete code in icmpv6_xrlim_allow()
  inet: move icmp_global_{credit,stamp} to a separate cache line
  icmp: prevent possible overflow in icmp_global_allow()
  selftests/net: packetdrill: add ipv4-mapped-ipv6 tests
  ...

Merge tag 'bpf-fixes' of git://git./linux/kernel/git/bpf/bpf

Pull bpf fixes from Alexei Starovoitov:

- Fix invalid write loop logic in libbpf's bpf_linker__add_buf() (Amery
   Hung)

- Fix a potential use-after-free of BTF object (Anton Protopopov)

- Add feature detection to libbpf and avoid moving arena global
   variables on older kernels (Emil Tsalapatis)

- Remove extern declaration of bpf_stream_vprintk() from libbpf headers
   (Ihor Solodrai)

- Fix truncated netlink dumps in bpftool (Jakub Kicinski)

- Fix map_kptr grace period wait in bpf selftests (Kumar Kartikeya
   Dwivedi)

- Remove hexdump dependency while building bpf selftests (Matthieu
   Baerts)

- Complete fsession support in BPF trampolines on riscv (Menglong Dong)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  selftests/bpf: Remove hexdump dependency
  libbpf: Remove extern declaration of bpf_stream_vprintk()
  selftests/bpf: Use vmlinux.h in test_xdp_meta
  bpftool: Fix truncated netlink dumps
  libbpf: Delay feature gate check until object prepare time
  libbpf: Do not use PROG_TYPE_TRACEPOINT program for feature gating
  bpf: Add a map/btf from a fd array more consistently
  selftests/bpf: Fix map_kptr grace period wait
  selftests/bpf: enable fsession_test on riscv64
  selftests/bpf: Adjust selftest due to function rename
  bpf, riscv: add fsession support for trampolines
  bpf: Fix a potential use-after-free of BTF object
  bpf, riscv: introduce emit_store_stack_imm64() for trampoline
  libbpf: Fix invalid write loop logic in bpf_linker__add_buf()
  libbpf: Add gating for arena globals relocation feature

KVM: arm64: vgic: Handle const qualifier from gic_kvm_info allocation type

In preparation for making the kmalloc family of allocators type aware,
we need to make sure that the returned type from the allocation matches
the type of the variable being assigned. (Before, the allocator would
always return "void *", which can be implicitly cast to any pointer type.)

The assigned type is "struct gic_kvm_info", but the returned type,
while matching, is const qualified. To get them exactly matching, just
use the dereferenced pointer for the sizeof().

Link: https://patch.msgid.link/20260206223022.it.052-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>

drm/msm: Adjust msm_iommu_pagetable_prealloc_allocate() allocation type

In preparation for making the kmalloc family of allocators type aware,
we need to make sure that the returned type from the allocation matches
the type of the variable being assigned. (Before, the allocator would
always return "void *", which can be implicitly cast to any pointer type.)

The assigned type is "void **" but the returned type will be "void ***".
These are the same allocation size (pointer size), but the types do not
match. Adjust the allocation type to match the assignment.

Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://patch.msgid.link/20260206222151.work.016-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>

dm: dm-zoned: Adjust dmz_load_mapping() allocation type

In preparation for making the kmalloc family of allocators type aware,
we need to make sure that the returned type from the allocation matches
the type of the variable being assigned. (Before, the allocator would
always return "void *", which can be implicitly cast to any pointer type.)

The assigned type is "struct dmz_mblock **" but the returned type will
be "struct dmz_mblk **". These are the same allocation size (pointer
size), but the types do not match. Adjust the allocation type to match
the assignment.

Link: https://patch.msgid.link/20250426061707.work.587-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>

dm-crypt: Adjust crypt_alloc_tfms_aead() allocation type

In preparation for making the kmalloc family of allocators type aware,
we need to make sure that the returned type from the allocation matches
the type of the variable being assigned. (Before, the allocator would
always return "void *", which can be implicitly cast to any pointer type.)

The assigned type is "struct crypto_skcipher **" but the returned type
will be "struct crypto_aead **". These are the same allocation size
(pointer size), but the types don't match. Adjust the allocation type
to match the assignment.

Link: https://patch.msgid.link/20250426061629.work.266-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>

net: nfc: nci: Fix parameter validation for packet data

Since commit 9c328f54741b ("net: nfc: nci: Add parameter validation for
packet data") communication with nci nfc chips is not working any more.

The mentioned commit tries to fix access of uninitialized data, but
failed to understand that in some cases the data packet is of variable
length and can therefore not be compared to the maximum packet length
given by the sizeof(struct).

Fixes: 9c328f54741b ("net: nfc: nci: Add parameter validation for packet data")
Cc: stable@vger.kernel.org
Signed-off-by: Michael Thalmeier <michael.thalmeier@hale.at>
Reported-by: syzbot+740e04c2a93467a0f8c8@syzkaller.appspotmail.com
Link: https://patch.msgid.link/20260218083000.301354-1-michael.thalmeier@hale.at
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'mlx5-misc-fixes-2026-02-18'

Tariq Toukan says:

====================
mlx5 misc fixes 2026-02-18

This patchset provides misc bug fixes from the team to the mlx5
core and Eth drivers.
====================

Link: https://patch.msgid.link/20260218072904.1764634-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: Use unsigned for mlx5e_get_max_num_channels

The max number of channels is always an unsigned int, use the correct
type to fix compilation errors done with strict type checking, e.g.:

error: call to ‘__compiletime_assert_1110’ declared with attribute
error: min(mlx5e_get_devlink_param_num_doorbells(mdev),
mlx5e_get_max_num_channels(mdev)) signedness error

Fixes: 74a8dadac17e ("net/mlx5e: Preparations for supporting larger number of channels")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <Jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260218072904.1764634-7-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: Fix deadlocks between devlink and netdev instance locks

In the mentioned "Fixes" commit, various work tasks triggering devlink
health reporter recovery were switched to use netdev_trylock to protect
against concurrent tear down of the channels being recovered. But this
had the side effect of introducing potential deadlocks because of
incorrect lock ordering.

The correct lock order is described by the init flow:
probe_one -> mlx5_init_one (acquires devlink lock)
-> mlx5_init_one_devl_locked -> mlx5_register_device
-> mlx5_rescan_drivers_locked -...-> mlx5e_probe -> _mlx5e_probe
-> register_netdev (acquires rtnl lock)
-> register_netdevice (acquires netdev lock)
=> devlink lock -> rtnl lock -> netdev lock.

But in the current recovery flow, the order is wrong:
mlx5e_tx_err_cqe_work (acquires netdev lock)
-> mlx5e_reporter_tx_err_cqe -> mlx5e_health_report
-> devlink_health_report (acquires devlink lock => boom!)
-> devlink_health_reporter_recover
-> mlx5e_tx_reporter_recover -> mlx5e_tx_reporter_recover_from_ctx
-> mlx5e_tx_reporter_err_cqe_recover

The same pattern exists in:
mlx5e_reporter_rx_timeout
mlx5e_reporter_tx_ptpsq_unhealthy
mlx5e_reporter_tx_timeout

Fix these by moving the netdev_trylock calls from the work handlers
lower in the call stack, in the respective recovery functions, where
they are actually necessary.

Fixes: 8f7b00307bf1 ("net/mlx5e: Convert mlx5 netdevs to instance locking")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <Jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260218072904.1764634-6-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: MACsec, add ASO poll loop in macsec_aso_set_arm_event

The macsec_aso_set_arm_event function calls mlx5_aso_poll_cq once
without a retry loop. If the CQE is not immediately available after
posting the WQE, the function fails unnecessarily.

Use read_poll_timeout() to poll 3-10 usecs for CQE, consistent with
other ASO polling code paths in the driver.

Fixes: 739cfa34518e ("net/mlx5: Make ASO poll CQ usable in atomic context")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <Jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260218072904.1764634-5-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Fix misidentification of write combining CQE during poll loop

The write combining completion poll loop uses usleep_range() which can
sleep much longer than requested due to scheduler latency. Under load,
we witnessed a 20ms+ delay until the process was rescheduled, causing
the jiffies based timeout to expire while the thread is sleeping.

The original do-while loop structure (poll, sleep, check timeout) would
exit without a final poll when waking after timeout, missing a CQE that
arrived during sleep.

Instead of the open-coded while loop, use the kernel's poll_timeout_us()
which always performs an additional check after the sleep expiration,
and is less error-prone.

Note: poll_timeout_us() doesn't accept a sleep range, by passing 10
sleep_us the sleep range effectively changes from 2-10 to 3-10 usecs.

Fixes: d98995b4bf98 ("net/mlx5: Reimplement write combining test")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <Jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260218072904.1764634-4-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: Fix misidentification of ASO CQE during poll loop

The ASO completion poll loop uses usleep_range() which can sleep much
longer than requested due to scheduler latency. Under load, we witnessed
a 20ms+ delay until the process was rescheduled, causing the jiffies
based timeout to expire while the thread is sleeping.

The original do-while loop structure (poll, sleep, check timeout) would
exit without a final poll when waking after timeout, missing a CQE that
arrived during sleep.

Instead of the open-coded while loop, use the kernel's
read_poll_timeout() which always performs an additional check after the
sleep expiration, and is less error-prone.

Note: read_poll_timeout() doesn't accept a sleep range, by passing 10
sleep_us the sleep range effectively changes from 2-10 to 3-10 usecs.

Fixes: 739cfa34518e ("net/mlx5: Make ASO poll CQ usable in atomic context")
Fixes: 7e3fce82d945 ("net/mlx5e: Overcome slow response for first macsec ASO WQE")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <Jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260218072904.1764634-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Fix multiport device check over light SFs

Driver is using num_vhca_ports capability to distinguish between
multiport master device and multiport slave device. num_vhca_ports is a
capability the driver sets according to the MAX num_vhca_ports
capability reported by FW. On the other hand, light SFs doesn't set the
above capbility.

This leads to wrong results whenever light SFs is checking whether he is
a multiport master or slave.

Therefore, use the MAX capability to distinguish between master and
slave devices.

Fixes: e71383fb9cd1 ("net/mlx5: Light probe local SFs")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <Jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260218072904.1764634-2-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bonding: alb: fix UAF in rlb_arp_recv during bond up/down

The ALB RX path may access rx_hashtbl concurrently with bond
teardown. During rapid bond up/down cycles, rlb_deinitialize()
frees rx_hashtbl while RX handlers are still running, leading
to a null pointer dereference detected by KASAN.

However, the root cause is that rlb_arp_recv() can still be accessed
after setting recv_probe to NULL, which is actually a use-after-free
(UAF) issue. That is the reason for using the referenced commit in the
Fixes tag.

[  214.174138] Oops: general protection fault, probably for non-canonical address 0xdffffc000000001d: 0000 [#1] SMP KASAN PTI
[  214.186478] KASAN: null-ptr-deref in range [0x00000000000000e8-0x00000000000000ef]
[  214.194933] CPU: 30 UID: 0 PID: 2375 Comm: ping Kdump: loaded Not tainted 6.19.0-rc8+ #2 PREEMPT(voluntary)
[  214.205907] Hardware name: Dell Inc. PowerEdge R730/0WCJNT, BIOS 2.14.0 01/14/2022
[  214.214357] RIP: 0010:rlb_arp_recv+0x505/0xab0 [bonding]
[  214.220320] Code: 0f 85 2b 05 00 00 48 b8 00 00 00 00 00 fc ff df 40 0f b6 ed 48 c1 e5 06 49 03 ad 78 01 00 00 48 8d 7d 28 48 89 fa 48 c1 ea 03 <0f> b6
04 02 84 c0 74 06 0f 8e 12 05 00 00 80 7d 28 00 0f 84 8c 00
[  214.241280] RSP: 0018:ffffc900073d8870 EFLAGS: 00010206
[  214.247116] RAX: dffffc0000000000 RBX: ffff888168556822 RCX: ffff88816855681e
[  214.255082] RDX: 000000000000001d RSI: dffffc0000000000 RDI: 00000000000000e8
[  214.263048] RBP: 00000000000000c0 R08: 0000000000000002 R09: ffffed11192021c8
[  214.271013] R10: ffff8888c9010e43 R11: 0000000000000001 R12: 1ffff92000e7b119
[  214.278978] R13: ffff8888c9010e00 R14: ffff888168556822 R15: ffff888168556810
[  214.286943] FS:  00007f85d2d9cb80(0000) GS:ffff88886ccb3000(0000) knlGS:0000000000000000
[  214.295966] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  214.302380] CR2: 00007f0d047b5e34 CR3: 00000008a1c2e002 CR4: 00000000001726f0
[  214.310347] Call Trace:
[  214.313070]  <IRQ>
[  214.315318]  ? __pfx_rlb_arp_recv+0x10/0x10 [bonding]
[  214.320975]  bond_handle_frame+0x166/0xb60 [bonding]
[  214.326537]  ? __pfx_bond_handle_frame+0x10/0x10 [bonding]
[  214.332680]  __netif_receive_skb_core.constprop.0+0x576/0x2710
[  214.339199]  ? __pfx_arp_process+0x10/0x10
[  214.343775]  ? sched_balance_find_src_group+0x98/0x630
[  214.349513]  ? __pfx___netif_receive_skb_core.constprop.0+0x10/0x10
[  214.356513]  ? arp_rcv+0x307/0x690
[  214.360311]  ? __pfx_arp_rcv+0x10/0x10
[  214.364499]  ? __lock_acquire+0x58c/0xbd0
[  214.368975]  __netif_receive_skb_one_core+0xae/0x1b0
[  214.374518]  ? __pfx___netif_receive_skb_one_core+0x10/0x10
[  214.380743]  ? lock_acquire+0x10b/0x140
[  214.385026]  process_backlog+0x3f1/0x13a0
[  214.389502]  ? process_backlog+0x3aa/0x13a0
[  214.394174]  __napi_poll.constprop.0+0x9f/0x370
[  214.399233]  net_rx_action+0x8c1/0xe60
[  214.403423]  ? __pfx_net_rx_action+0x10/0x10
[  214.408193]  ? lock_acquire.part.0+0xbd/0x260
[  214.413058]  ? sched_clock_cpu+0x6c/0x540
[  214.417540]  ? mark_held_locks+0x40/0x70
[  214.421920]  handle_softirqs+0x1fd/0x860
[  214.426302]  ? __pfx_handle_softirqs+0x10/0x10
[  214.431264]  ? __neigh_event_send+0x2d6/0xf50
[  214.436131]  do_softirq+0xb1/0xf0
[  214.439830]  </IRQ>

The issue is reproducible by repeatedly running
ip link set bond0 up/down while receiving ARP messages, where
rlb_arp_recv() can race with rlb_deinitialize() and dereference
a freed rx_hashtbl entry.

Fix this by setting recv_probe to NULL and then calling
synchronize_net() to wait for any concurrent RX processing to finish.
This ensures that no RX handler can access rx_hashtbl after it is freed
in bond_alb_deinitialize().

Reported-by: Liang Li <liali@redhat.com>
Fixes: 3aba891dde38 ("bonding: move processing of recv handlers into handle_frame()")
Reviewed-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Acked-by: Jay Vosburgh <jv@jvosburgh.net>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20260218060919.101574-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bnge: fix reserving resources from FW

HWRM_FUNC_CFG is used to reserve resources, whereas HWRM_FUNC_QCFG is
intended for querying resource information from the firmware.
Since __bnge_hwrm_reserve_pf_rings() reserves resources for a specific
PF, the command type should be HWRM_FUNC_CFG.

Fixes: 627c67f038d2 ("bng_en: Add resource management support")
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260218052755.4097468-1-vikas.gupta@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

eth: fbnic: Advertise supported XDP features.

Drivers are supposed to advertise the XDP features they support. This was
missed while adding XDP support.

Before:
$ ynl --family netdev --dump dev-get
...
{'ifindex': 3,
  'xdp-features': set(),
  'xdp-rx-metadata-features': set(),
  'xsk-features': set()},
...

After:
$ ynl --family netdev --dump dev-get
...
{'ifindex': 3,
  'xdp-features': {'basic', 'rx-sg'},
  'xdp-rx-metadata-features': set(),
  'xsk-features': set()},
...

Fixes: 168deb7b31b2 ("eth: fbnic: Add support for XDP_TX action")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Dimitri Daskalakis <dimitri.daskalakis1@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260218030620.3329608-1-dimitri.daskalakis1@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

fbdev: au1100fb: Replace license boilerplate by SPDX header

This also gets rid of an old address of the FSF.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Signed-off-by: Helge Deller <deller@gmx.de>

fbdev: au1100fb: Fold au1100fb.h into its only user

This gets rid of a header that is only used once. The copyrights and
license specifications are all already covered in the au1100fb.c file.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Signed-off-by: Helge Deller <deller@gmx.de>

fbdev: au1100fb: Replace custom printk wrappers by pr_*

The global wrappers also have the advantage to do stricter format
checking, so the pr_devel formats are also checked if DEBUG is not
defined. The global variants only check for DEBUG being defined and not
its actual value, so the #define to zero is dropped, too.

There is only a slight semantic change as the (by default disabled)
debug output doesn't contain __FILE__ any more.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Signed-off-by: Helge Deller <deller@gmx.de>

fbdev: au1100fb: Make driver compilable on non-mips platforms

The header asm/mach-au1x00/au1000.h is unused apart from pulling in
<linux/delay.h> (for mdelay()) and <linux/io.h> (for KSEG1ADDR()). Then
the only platform specific part in the driver is the usage of the KSEG1ADDR
macro, which for the non-mips case can be stubbed.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Signed-off-by: Helge Deller <deller@gmx.de>

fbdev: au1100fb: Use proper conversion specifiers in printk formats

%zu is the dedicated type for size_t. %d only works on 32bit
architectures where size_t is typedef'd to be unsigned int. (And then
the signedness doesn't fit, but `gcc -Wformat` doesn't stumble over this.
Also the size of dma_addr_t is architecture dependent and it should be
printkd using %pad (and the value passed by reference).

This prepares allowing this driver to be compiled on non-mips platforms.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Signed-off-by: Helge Deller <deller@gmx.de>

fbdev: au1100fb: Mark several local functions as static

This fixes several (fatal) compiler warnings à la

drivers/video/fbdev/au1100fb.c:530:6: error: no previous prototype for ‘au1100fb_drv_remove’ [-Werror=missing-prototypes]
523 | void au1100fb_drv_remove(struct platform_device *dev)
| ^~~~~~~~~~~~~~~~~~~

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Signed-off-by: Helge Deller <deller@gmx.de>

fbdev: au1100fb: Don't store device specific data in global variables

Using global data to store device specific data is a bad pattern that
breaks if there is more than one device. So expand driver data and drop
the global variables.

While there is probably no machine that has two or more au1100fb
devices, this makes the driver a better template for new drivers and
saves some memory if there is no such bound device.

bloat-o-meter reports (for ARCH=arm allmodconfig + CONFIG_FB_AU1100=y
and ignoring the rename of the init function):

add/remove: 1/4 grow/shrink: 2/2 up/down: 1360/-4800 (-3440)
Function                                     old     new   delta
au1100fb_drv_probe                          2648    3328    +680
$a                                         12808   13484    +676
au1100fb_drv_resume                          404     400      -4
au1100fb_fix                                  68       -     -68
au1100fb_var                                 160       -    -160
fbregs                                      2048       -   -2048
$d                                          9525    7009   -2516
Total: Before=38664, After=35224, chg -8.90%

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Signed-off-by: Helge Deller <deller@gmx.de>

rds: tcp: fix uninit-value in __inet_bind

KMSAN reported an uninit-value access in __inet_bind() when binding
an RDS TCP socket.

The uninitialized memory originates from rds_tcp_conn_alloc(),
which uses kmem_cache_alloc() to allocate the rds_tcp_connection structure.

Specifically, the field 't_client_port_group' is incremented in
rds_tcp_conn_path_connect() without being initialized first:

if (++tc->t_client_port_group >= port_groups)

Since kmem_cache_alloc() does not zero the memory, this field contains
garbage, leading to the KMSAN report.

Fix this by using kmem_cache_zalloc() to ensure the structure is
zero-initialized upon allocation.

Reported-by: syzbot+aae646f09192f72a68dc@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=aae646f09192f72a68dc
Tested-by: syzbot+aae646f09192f72a68dc@syzkaller.appspotmail.com
Fixes: a20a6992558f ("net/rds: Encode cp_index in TCP source port")

Signed-off-by: Tabrez Ahmed <tabreztalks@gmail.com>
Reviewed-by: Charalampos Mitrodimas <charmitro@posteo.net>
Reviewed-by: Allison Henderson <achender@kernel.org>
Link: https://patch.msgid.link/20260217135350.33641-1-tabreztalks@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/rds: Fix NULL pointer dereference in rds_tcp_accept_one

Save a local pointer to new_sock->sk and hold a reference before
installing callbacks in rds_tcp_accept_one. After
rds_tcp_set_callbacks() or rds_tcp_reset_callbacks(), tc->t_sock is
set to new_sock which may race with the shutdown path. A concurrent
rds_tcp_conn_path_shutdown() may call sock_release(), which sets
new_sock->sk = NULL and may eventually free sk when the refcount
reaches zero.

Subsequent accesses to new_sock->sk->sk_state would dereference NULL,
causing the crash. The fix saves a local sk pointer before callbacks
are installed so that sk_state can be accessed safely even after
new_sock->sk is nulled, and uses sock_hold()/sock_put() to ensure
sk itself remains valid for the duration.

Fixes: 826c1004d4ae ("net/rds: rds_tcp_conn_path_shutdown must not discard messages")
Reported-by: syzbot+96046021045ffe6d7709@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=96046021045ffe6d7709
Signed-off-by: Allison Henderson <achender@kernel.org>
Link: https://patch.msgid.link/20260216222643.2391390-1-achender@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

s390/tape: Fix device driver name

Recent cleanups and code consolidations in the s390 tape device driver
renamed files and function namespaces from tape_34xx to tape_3490 to
better reflect the single support of the IBM 3490E device in the
codebase.

These changes also renamed the driver name to tape_3490, which
consequently broke userspace as the sysfs driver path is now
/sys/bus/ccw/drivers/tape_3490/ instead of
/sys/bus/ccw/drivers/tape_34xx/.

Change the device driver name back to tape_34xx to fix userspace.

Fixes: 9872dae6102e ("s390/tape: Rename tape_34xx.c to tape_3490.c")
Reported-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Tested-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

arm64: hugetlbpage: avoid unused-but-set-parameter warning (gcc-16)

gcc-16 warns about an instance that older compilers did not:

arch/arm64/mm/hugetlbpage.c: In function 'huge_pte_clear':
arch/arm64/mm/hugetlbpage.c:369:57: error: parameter 'addr' set but not used [-Werror=unused-but-set-parameter=]

The issue here is that __pte_clear() does not actually use its second
argument, but when CONFIG_ARM64_CONTPTE is enabled it still gets
updated.

Replace the macro with an inline function to let the compiler see
the argument getting passed down.

Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>

arm64: Force the use of CNTVCT_EL0 in __delay()

Quentin forwards a report from Hyesoo Yu, describing an interesting
problem with the use of WFxT in __delay() when a vcpu is loaded and
that KVM is *not* in VHE mode (either nVHE or hVHE).

In this case, CNTVOFF_EL2 is set to a non-zero value to reflect the
state of the guest virtual counter. At the same time, __delay() is
using get_cycles() to read the counter value, which is indirected to
reading CNTPCT_EL0.

The core of the issue is that WFxT is using the *virtual* counter,
while the kernel is using the physical counter, and that the offset
introduces a really bad discrepancy between the two.

Fix this by forcing the use of CNTVCT_EL0, making __delay() consistent
irrespective of the value of CNTVOFF_EL2.

Reported-by: Hyesoo Yu <hyesoo.yu@samsung.com>
Reported-by: Quentin Perret <qperret@google.com>
Reviewed-by: Quentin Perret <qperret@google.com>
Fixes: 7d26b0516a0d ("arm64: Use WFxT for __delay() when possible")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/ktosachvft2cgqd5qkukn275ugmhy6xrhxur4zqpdxlfr3qh5h@o3zrfnsq63od
Cc: stable@vger.kernel.org
Signed-off-by: Will Deacon <will@kernel.org>

mshv: Handle insufficient root memory hypervisor statuses

When creating guest partition objects, the hypervisor may fail to
allocate root partition pages and return an insufficient memory status.
In this case, deposit memory using the root partition ID instead.

Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Reviewed-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com>
Reviewed-by: Mukesh R <mrathor@linux.microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>

mshv: Handle insufficient contiguous memory hypervisor status

The HV_STATUS_INSUFFICIENT_CONTIGUOUS_MEMORY status indicates that the
hypervisor lacks sufficient contiguous memory for its internal allocations.

When this status is encountered, allocate and deposit
HV_MAX_CONTIGUOUS_ALLOCATION_PAGES contiguous pages to the hypervisor.
HV_MAX_CONTIGUOUS_ALLOCATION_PAGES is defined in the hypervisor headers, a
deposit of this size will always satisfy the hypervisor's requirements.

Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Reviewed-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com>
Reviewed-by: Mukesh R <mrathor@linux.microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>

mshv: Introduce hv_deposit_memory helper functions

Introduce hv_deposit_memory_node() and hv_deposit_memory() helper
functions to handle memory deposit with proper error handling.

The new hv_deposit_memory_node() function takes the hypervisor status
as a parameter and validates it before depositing pages. It checks for
HV_STATUS_INSUFFICIENT_MEMORY specifically and returns an error for
unexpected status codes.

This is a precursor patch to new out-of-memory error codes support.
No functional changes intended.

Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Reviewed-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com>
Reviewed-by: Mukesh R <mrathor@linux.microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>

mshv: Introduce hv_result_needs_memory() helper function

Replace direct comparisons of hv_result(status) against
HV_STATUS_INSUFFICIENT_MEMORY with a new hv_result_needs_memory() helper
function.

This improves code readability and provides a consistent and extendable
interface for checking out-of-memory conditions in hypercall results.

No functional changes intended.

Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Reviewed-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com>
Reviewed-by: Mukesh R <mrathor@linux.microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>

Merge tag 'mm-nonmm-stable-2026-02-18-19-56' of git://git./linux/kernel/git/akpm/mm

Pull more non-MM updates from Andrew Morton:

- "two fixes in kho_populate()" fixes a couple of not-major issues in
   the kexec handover code (Ran Xiaokai)

- misc singletons

* tag 'mm-nonmm-stable-2026-02-18-19-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  lib/group_cpus: handle const qualifier from clusters allocation type
  kho: remove unnecessary WARN_ON(err) in kho_populate()
  kho: fix missing early_memunmap() call in kho_populate()
  scripts/gdb: implement x86_page_ops in mm.py
  objpool: fix the overestimation of object pooling metadata size
  selftests/memfd: use IPC semaphore instead of SIGSTOP/SIGCONT
  delayacct: fix build regression on accounting tool

Merge tag 'mm-stable-2026-02-18-19-48' of git://git./linux/kernel/git/akpm/mm

Pull more MM  updates from Andrew Morton:

- "mm/vmscan: fix demotion targets checks in reclaim/demotion" fixes a
   couple of issues in the demotion code - pages were failed demotion
   and were finding themselves demoted into disallowed nodes (Bing Jiao)

- "Remove XA_ZERO from error recovery of dup_mmap()" fixes a rare
   mapledtree race and performs a number of cleanups (Liam Howlett)

- "mm: add bitmap VMA flag helpers and convert all mmap_prepare to use
   them" implements a lot of cleanups following on from the conversion
   of the VMA flags into a bitmap (Lorenzo Stoakes)

- "support batch checking of references and unmapping for large folios"
   implements batching to greatly improve the performance of reclaiming
   clean file-backed large folios (Baolin Wang)

- "selftests/mm: add memory failure selftests" does as claimed (Miaohe
   Lin)

* tag 'mm-stable-2026-02-18-19-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (36 commits)
  mm/page_alloc: clear page->private in free_pages_prepare()
  selftests/mm: add memory failure dirty pagecache test
  selftests/mm: add memory failure clean pagecache test
  selftests/mm: add memory failure anonymous page test
  mm: rmap: support batched unmapping for file large folios
  arm64: mm: implement the architecture-specific clear_flush_young_ptes()
  arm64: mm: support batch clearing of the young flag for large folios
  arm64: mm: factor out the address and ptep alignment into a new helper
  mm: rmap: support batched checks of the references for large folios
  tools/testing/vma: add VMA userland tests for VMA flag functions
  tools/testing/vma: separate out vma_internal.h into logical headers
  tools/testing/vma: separate VMA userland tests into separate files
  mm: make vm_area_desc utilise vma_flags_t only
  mm: update all remaining mmap_prepare users to use vma_flags_t
  mm: update shmem_[kernel]_file_*() functions to use vma_flags_t
  mm: update secretmem to use VMA flags on mmap_prepare
  mm: update hugetlbfs to use VMA flags on mmap_prepare
  mm: add basic VMA flag operation helper functions
  tools: bitmap: add missing bitmap_[subset(), andnot()]
  mm: add mk_vma_flags() bitmap flag macro helper
  ...

octeontx2-af: Fix default entries mcam entry action

As per design, AF should update the default MCAM action only when
mcam_index is -1. A bug in the previous patch caused default entries
to be changed even when the request was not for them.

Fixes: 570ba37898ec ("octeontx2-af: Update RSS algorithm index")
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260216090338.1318976-1-hkelam@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'nf-26-02-17' of https://git./linux/kernel/git/netfilter/nf

Florian Westphal says:

====================
netfilter: updates for net

The following patchset contains Netfilter fixes for *net*:

1) Add missing __rcu annotations to NAT helper hook pointers in Amanda,
   FTP, IRC, SNMP and TFTP helpers.  From Sun Jian.

2-4):
- Add global spinlock to serialize nft_counter fetch+reset operations.
- Use atomic64_xchg() for nft_quota reset instead of read+subtract pattern.
   Note AI review detects a race in this change but it isn't new. The
   'racing' bit only exists to prevent constant stream of 'quota expired'
   notifications.
- Revert commit_mutex usage in nf_tables reset path, it caused
   circular lock dependency.  All from Brian Witte.

5) Fix uninitialized l3num value in nf_conntrack_h323 helper.

6) Fix musl libc compatibility in netfilter_bridge.h UAPI header. This
   change isn't nice (UAPI headers should not include libc headers), but
   as-is musl builds may fail due to redefinition of struct ethhdr.

7) Fix protocol checksum validation in IPVS for IPv6 with extension headers,
   from Julian Anastasov.

8) Fix device reference leak in IPVS when netdev goes down. Also from
   Julian.

9) Remove WARN_ON_ONCE when accessing forward path array, this can
   trigger with sufficiently long forward paths.  From Pablo Neira Ayuso.

10) Fix use-after-free in nf_tables_addchain() error path, from Inseo An.

* tag 'nf-26-02-17' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nf_tables: fix use-after-free in nf_tables_addchain()
  net: remove WARN_ON_ONCE when accessing forward path array
  ipvs: do not keep dest_dst if dev is going down
  ipvs: skip ipv6 extension headers for csum checks
  include: uapi: netfilter_bridge.h: Cover for musl libc
  netfilter: nf_conntrack_h323: don't pass uninitialised l3num value
  netfilter: nf_tables: revert commit_mutex usage in reset path
  netfilter: nft_quota: use atomic64_xchg for reset
  netfilter: nft_counter: serialize reset with spinlock
  netfilter: annotate NAT helper hook pointers with __rcu
====================

Link: https://patch.msgid.link/20260217163233.31455-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: XSK, Fix unintended ICOSQ change

XSK wakeup must use the async ICOSQ (with proper locking), as it is not
guaranteed to run on the same CPU as the channel.

The commit that converted the NAPI trigger path to use the sync ICOSQ
incorrectly applied the same change to XSK, causing XSK wakeups to use
the sync ICOSQ as well. Revert XSK flows to use the async ICOSQ.

XDP program attach/detach triggers channel reopen, while XSK pool
enable/disable can happen on-the-fly via NDOs without reopening
channels. As a result, xsk_pool state cannot be reliably used at
mlx5e_open_channel() time to decide whether an async ICOSQ is needed.

Update the async_icosq_needed logic to depend on the presence of an XDP
program rather than the xsk_pool, ensuring the async ICOSQ is available
when XSK wakeups are enabled.

This fixes multiple issues:

1. Illegal synchronize_rcu() in an RCU read- side critical section via
mlx5e_xsk_wakeup() -> mlx5e_trigger_napi_icosq() ->
synchronize_net(). The stack holds RCU read-lock in xsk_poll().

2. Hitting a NULL pointer dereference in mlx5e_xsk_wakeup():

[] BUG: kernel NULL pointer dereference, address: 0000000000000240
[] #PF: supervisor read access in kernel mode
[] #PF: error_code(0x0000) - not-present page
[] PGD 0 P4D 0
[] Oops: Oops: 0000 [#1] SMP
[] CPU: 0 UID: 0 PID: 2255 Comm: qemu-system-x86 Not tainted 6.19.0-rc5+ #229 PREEMPT(none)
[] Hardware name: [...]
[] RIP: 0010:mlx5e_xsk_wakeup+0x53/0x90 [mlx5_core]

Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Closes: https://lore.kernel.org/all/20260123223916.361295-1-daniel@iogearbox.net/
Fixes: 56aca3e0f730 ("net/mlx5e: Use regular ICOSQ for triggering NAPI")
Tested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Acked-by: Alice Mikityanska <alice.kernel@fastmail.im>
Link: https://patch.msgid.link/20260217074525.1761454-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'icmp-better-deal-with-ddos'

Eric Dumazet says:

====================
icmp: better deal with DDOS

When dealing with death of big UDP servers, admins might want to
increase net.ipv4.icmp_msgs_per_sec and net.ipv4.icmp_msgs_burst
to big values (2,000,000 or more).

They also might need to tune the per-host ratelimit to 1ms or 0ms
in favor of the global rate limit.

This series fixes bugs showing up in all these needs.
====================

Link: https://patch.msgid.link/20260216142832.3834174-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: icmp: icmpv6_xrlim_allow() optimization if net.ipv6.icmp.ratelimit is zero

If net.ipv6.icmp.ratelimit is zero we do not have to call
inet_getpeer_v6() and inet_peer_xrlim_allow().

Both can be very expensive under DDOS.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260216142832.3834174-6-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv4: icmp: icmpv4_xrlim_allow() optimization if net.ipv4.icmp_ratelimit is zero

If net.ipv4.icmp_ratelimit is zero, we do not have to call
inet_getpeer_v4() and inet_peer_xrlim_allow().

Both can be very expensive under DDOS.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260216142832.3834174-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: icmp: remove obsolete code in icmpv6_xrlim_allow()

Following part was needed before the blamed commit, because
inet_getpeer_v6() second argument was the prefix.

/* Give more bandwidth to wider prefixes. */
if (rt->rt6i_dst.plen < 128)
tmo >>= ((128 - rt->rt6i_dst.plen)>>5);

Now inet_getpeer_v6() retrieves hosts, we need to remove
@tmo adjustement or wider prefixes likes /24 allow 8x
more ICMP to be sent for a given ratelimit.

As we had this issue for a while, this patch changes net.ipv6.icmp.ratelimit
default value from 1000ms to 100ms to avoid potential regressions.

Also add a READ_ONCE() when reading net->ipv6.sysctl.icmpv6_time.

Fixes: fd0273d7939f ("ipv6: Remove external dependency on rt6i_dst and rt6i_src")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Cc: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260216142832.3834174-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

inet: move icmp_global_{credit,stamp} to a separate cache line

icmp_global_credit was meant to be changed ~1000 times per second,
but if an admin sets net.ipv4.icmp_msgs_per_sec to a very high value,
icmp_global_credit changes can inflict false sharing to surrounding
fields that are read mostly.

Move icmp_global_credit and icmp_global_stamp to a separate
cacheline aligned group.

Fixes: b056b4cd9178 ("icmp: move icmp_global.credit and icmp_global.stamp to per netns storage")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260216142832.3834174-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

icmp: prevent possible overflow in icmp_global_allow()

Following expression can overflow
if sysctl_icmp_msgs_per_sec is big enough.

sysctl_icmp_msgs_per_sec * delta / HZ;

Fixes: 4cdf507d5452 ("icmp: add a global rate limitation")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260216142832.3834174-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests/net: packetdrill: add ipv4-mapped-ipv6 tests

Add ipv4-mapped-ipv6 case to ksft_runner.sh before
an upcoming TCP fix in this area.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260217142924.1853498-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mshv: Add SMT_ENABLED_GUEST partition creation flag

Add support for HV_PARTITION_CREATION_FLAG_SMT_ENABLED_GUEST
to allow userspace VMMs to enable SMT for guest partitions.

Expose this via new MSHV_PT_BIT_SMT_ENABLED_GUEST flag in the UAPI.

Without this flag, the hypervisor schedules guest VPs incorrectly,
causing SMT unusable.

Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>

mshv: Add nested virtualization creation flag

Introduce HV_PARTITION_CREATION_FLAG_NESTED_VIRTUALIZATION_CAPABLE to
indicate support for nested virtualization during partition creation.

This enables clearer configuration and capability checks for nested
virtualization scenarios.

Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Signed-off-by: Muminul Islam <muislam@microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>

Drivers: hv: vmbus: Simplify allocation of vmbus_evt

The per-cpu variable vmbus_evt is currently dynamically allocated. It's
only 8 bytes, so just allocate it statically to simplify and save a few
lines of code.

Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Long Li <longli@microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>

mshv: expose the scrub partition hypercall

This hypercall needs to be exposed for VMMs to soft-reboot guests. It
will reset APIC and synthetic interrupt controller state, among others.

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>

mshv: Add support for integrated scheduler

Query the hypervisor for integrated scheduler support and use it if
configured.

Microsoft Hypervisor originally provided two schedulers: root and core. The
root scheduler allows the root partition to schedule guest vCPUs across
physical cores, supporting both time slicing and CPU affinity (e.g., via
cgroups). In contrast, the core scheduler delegates vCPU-to-physical-core
scheduling entirely to the hypervisor.

Direct virtualization introduces a new privileged guest partition type - L1
Virtual Host (L1VH) — which can create child partitions from its own
resources. These child partitions are effectively siblings, scheduled by
the hypervisor's core scheduler. This prevents the L1VH parent from setting
affinity or time slicing for its own processes or guest VPs. While cgroups,
CFS, and cpuset controllers can still be used, their effectiveness is
unpredictable, as the core scheduler swaps vCPUs according to its own logic
(typically round-robin across all allocated physical CPUs). As a result,
the system may appear to "steal" time from the L1VH and its children.

To address this, Microsoft Hypervisor introduces the integrated scheduler.
This allows an L1VH partition to schedule its own vCPUs and those of its
guests across its "physical" cores, effectively emulating root scheduler
behavior within the L1VH, while retaining core scheduler behavior for the
rest of the system.

The integrated scheduler is controlled by the root partition and gated by
the vmm_enable_integrated_scheduler capability bit. If set, the hypervisor
supports the integrated scheduler. The L1VH partition must then check if it
is enabled by querying the corresponding extended partition property. If
this property is true, the L1VH partition must use the root scheduler
logic; otherwise, it must use the core scheduler. This requirement makes
reading VMM capabilities in L1VH partition a requirement too.

Signed-off-by: Andreea Pintilie <anpintil@microsoft.com>
Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>

mshv: Use try_cmpxchg() instead of cmpxchg()

Use !try_cmpxchg() instead of cmpxchg (*ptr, old, new) != old.
x86 CMPXCHG instruction returns success in ZF flag, so this
change saves a compare after CMPXCHG.

The generated assembly code improves from e.g.:

     415: 48 8b 44 24 30        mov    0x30(%rsp),%rax
     41a: 48 8b 54 24 38        mov    0x38(%rsp),%rdx
     41f: f0 49 0f b1 91 a8 02 lock cmpxchg %rdx,0x2a8(%r9)
     426: 00 00
     428: 48 3b 44 24 30        cmp    0x30(%rsp),%rax
     42d: 0f 84 09 ff ff ff     je     33c <...>

to:

     415: 48 8b 44 24 30        mov    0x30(%rsp),%rax
     41a: 48 8b 54 24 38        mov    0x38(%rsp),%rdx
     41f: f0 49 0f b1 91 a8 02 lock cmpxchg %rdx,0x2a8(%r9)
     426: 00 00
     428: 0f 84 0e ff ff ff     je     33c <...>

No functional change intended.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Long Li <longli@microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>

x86/hyperv: Fix error pointer dereference

The function idle_thread_get() can return an error pointer and is not
checked for it. Add check for error pointer.

Detected by Smatch:
arch/x86/hyperv/hv_vtl.c:126 hv_vtl_bringup_vcpu() error:
'idle' dereferencing possible ERR_PTR()

Fixes: 2b4b90e053a29 ("x86/hyperv: Use per cpu initial stack for vtl context")
Signed-off-by: Ethan Tidmore <ethantidmore06@gmail.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>

x86/hyperv: Reserve 3 interrupt vectors used exclusively by MSHV

MSVC compiler, used to compile the Microsoft Hypervisor, currently
has an assert intrinsic that uses interrupt vector 0x29 to create an
exception. This will cause hypervisor to then crash and collect core. As
such, if this interrupt number is assigned to a device by Linux and the
device generates it, hypervisor will crash. There are two other such
vectors hard coded in the hypervisor, 0x2C and 0x2D for debug purposes.

Fortunately, the three vectors are part of the kernel driver space and
that makes it feasible to reserve them early so they are not assigned
later.

Signed-off-by: Mukesh Rathor <mrathor@linux.microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>

selftests/bpf: Remove hexdump dependency

The verification signature header generation requires converting a
binary certificate to a C array. Previously this only worked with xxd,
and a switch to hexdump has been done in commit b640d556a2b3
("selftests/bpf: Remove xxd util dependency").

hexdump is a more common utility program, yet it might not be installed
by default. When it is not installed, BPF selftests build without
errors, but tests_progs is unusable: it exits with the 255 code and
without any error messages. When manually reproducing the issue, it is
not too hard to find out that the generated verification_cert.h file is
incorrect, but that's time consuming. When digging the BPF selftests
build logs, this line can be seen amongst thousands others, but ignored:

/bin/sh: 2: hexdump: not found

Here, od is used instead of hexdump. od is coming from the coreutils
package, and this new od command produces the same output when using od
from GNU coreutils, uutils, and even busybox. This is more portable, and
it produces a similar results to what was done before with hexdump:
there is an extra comma at the end instead of trailing whitespaces,
but the C code is not impacted.

Fixes: b640d556a2b3 ("selftests/bpf: Remove xxd util dependency")
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/r/20260218-bpf-sft-hexdump-od-v2-1-2f9b3ee5ab86@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Merge tag 'kbuild-fixes-7.0-1' of git://git./linux/kernel/git/kbuild/linux

Pull Kbuild fixes from Nathan Chancellor:

- Ensure tools/objtool is cleaned by 'make clean' and 'make mrproper'

- Fix test program for CONFIG_CC_CAN_LINK to avoid a warning, which is
   made fatal by -Werror

- Drop explicit LZMA parallel compression in scripts/make_fit.py

- Several fixes for commit 62089b804895 ("kbuild: rpm-pkg: Generate
   debuginfo package manually")

* tag 'kbuild-fixes-7.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux:
  kbuild: rpm-pkg: Disable automatic requires for manual debuginfo package
  kbuild: rpm-pkg: Fix manual debuginfo generation when using .src.rpm
  kernel: rpm-pkg: Restore find-debuginfo.sh approach to -debuginfo package
  kbuild: rpm-pkg: Restrict manual debug package creation
  scripts/make_fit.py: Drop explicit LZMA parallel compression
  kbuild: Fix CC_CAN_LINK detection
  kbuild: Add objtool to top-level clean target

Merge branch 'libbpf-remove-extern-declaration-of-bpf_stream_vprintk'

Ihor Solodrai says:

====================
libbpf: Remove extern declaration of bpf_stream_vprintk()

The first patch adjusts a selftest that has been using
bpf_stream_printk() macro. The second patch removes the declaration.
====================

Link: https://patch.msgid.link/20260218215651.2057673-1-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

libbpf: Remove extern declaration of bpf_stream_vprintk()

An issue was reported that building BPF program which includes both
vmlinux.h and bpf_helpers.h from libbpf fails due to conflicting
declarations of bpf_stream_vprintk().

Remove the extern declaration from bpf_helpers.h to address this.

In order to use bpf_stream_printk() macro, BPF programs are expected
to either include vmlinux.h of the kernel they are targeting, or add
their own extern declaration.

Reported-by: Luca Boccassi <luca.boccassi@gmail.com>
Closes: https://github.com/libbpf/libbpf/issues/947
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260218215651.2057673-3-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Use vmlinux.h in test_xdp_meta

- Replace linux/* includes with vmlinux.h
- Include errno.h
- Include bpf_tracing_net.h for TC_ACT_* and ETH_*
- Use BPF_STDERR instead of BPF_STREAM_STDERR

Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260218215651.2057673-2-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Merge tag 'thermal-7.0-rc1-2' of git://git./linux/kernel/git/rafael/linux-pm

Pull thermal control fix from Rafael Wysocki:
"This fixes a sysfs group leak on DLVR registration failure in the
Intel int340x thermal driver (Kaushlendra Kumar)"

* tag 'thermal-7.0-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal: int340x: Fix sysfs group leak on DLVR registration failure

Merge tag 'acpi-7.0-rc1-2' of git://git./linux/kernel/git/rafael/linux-pm

Pull more ACPI support updates from Rafael J. Wysocki:
"These are mostly fixes and cleanups on top of the ACPI support updates
  merged recently, including two new quirks, an ACPI CPPC library fix,
  and fixes and cleanups of a few core ACPI device drivers:

   - Add an unused power resource handling quirk for THUNDEROBOT ZERO
     (Zhai Can)

   - Fix remaining for_each_possible_cpu() in the ACPI CPPC library to
     use online CPUs (Sean V Kelley)

   - Drop redundant checks from the ACPI notify handler and the driver
     remove callback in the ACPI battery driver (Rafael Wysocki)

   - Move the creation of the wakeup source during the ACPI button
     driver probe to an earlier point to avoid missing a wakeup event
     due to a race and clean up system wakeup handling and remove
     callback in that driver (Rafael Wysocki)

   - Drop unnecessary driver_data pointer clearing from the ACPI EC and
     SMBUS HC drivers and make the ACPI backlight (video) driver clear
     the device's driver_data pointer on remove (Rafael Wysocki)

   - Force enabling of PWM2 on the Yogabook YB1-X90 tablets (Yauhen
     Kharuzhy)"

* tag 'acpi-7.0-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: PM: Add unused power resource quirk for THUNDEROBOT ZERO
  ACPI: driver: Drop driver_data pointer clearing from two drivers
  ACPI: video: Clear driver_data pointer on remove
  ACPI: button: Tweak acpi_button_remove()
  ACPI: button: Tweak system wakeup handling
  ACPI: battery: Drop redundant checks from acpi_battery_remove()
  ACPI: CPPC: Fix remaining for_each_possible_cpu() to use online CPUs
  ACPI: x86: Force enabling of PWM2 on the Yogabook YB1-X90
  ACPI: button: Call device_init_wakeup() earlier during probe
  ACPI: battery: Drop redundant check from acpi_battery_notify()

Merge tag 'pm-7.0-rc1-2' of git://git./linux/kernel/git/rafael/linux-pm

Pull more power management updates from Rafael Wysocki:
"These are mostly fixes on top of the power management updates merged
  recently in cpuidle governors, in the Intel RAPL power capping driver
  and in the wake IRQ management code:

   - Fix the handling of package-scope MSRs in the intel_rapl power
     capping driver when called from the PMU subsystem and make it add
     all package CPUs to the PMU cpumask to allow tools to read RAPL
     events from any CPU in the package (Kuppuswamy Satharayananyan)

   - Rework the invalid version check in the intel_rapl_tpmi power
     capping driver to account for the fact that on partitioned systems,
     multiple TPMI instances may exist per package, but RAPL registers
     are only valid on one instance (Kuppuswamy Satharayananyan)

   - Describe the new intel_idle.table command line option in the
     admin-guide intel_idle documentation (Artem Bityutskiy)

   - Fix a crash in the ladder cpuidle governor on systems with only one
     (polling) idle state available by making the cpuidle core bypass
     the governor in those cases and adjust the other existing governors
     to that change (Aboorva Devarajan, Christian Loehle)

   - Update kerneldoc comments for wake IRQ management functions that
     have not been matching the code (Wang Jiayue)"

* tag 'pm-7.0-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpuidle: menu: Remove single state handling
  cpuidle: teo: Remove single state handling
  cpuidle: haltpoll: Remove single state handling
  cpuidle: Skip governor when only one idle state is available
  powercap: intel_rapl_tpmi: Remove FW_BUG from invalid version check
  PM: sleep: wakeirq: Update outdated documentation comments
  Documentation: PM: Document intel_idle.table command line option
  powercap: intel_rapl: Expose all package CPUs in PMU cpumask
  powercap: intel_rapl: Remove incorrect CPU check in PMU context

apparmor: fix signedness bug in unpack_tags()

Smatch static checker warning:
security/apparmor/policy_unpack.c:966 unpack_pdb()
warn: unsigned 'unpack_tags(e, &pdb->tags, info)' is never less than zero.

unpack_tags() is declared with return type size_t (unsigned) but returns
negative errno values on failure. The caller in unpack_pdb() tests the
return with `< 0`, which is always false for an unsigned type, making
error handling dead code. Malformed tag data would be silently accepted
instead of causing a load failure.

Change return type of unpack_tags() from size_t to int to match the
functions's actual semantic.

Fixes: 3d28e2397af7 ("apparmor: add support loading per permission tagging")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Massimiliano Pellizzer <mpellizzer.dev@gmail.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>

Merge branches 'acpi-battery', 'acpi-button' and 'acpi-driver'

Merge additional updates of multiple core ACPI device drivers (battery,
button, video, EC, SMBUS HC) for 7.0-rc1:

- Drop redundant checks from the ACPI notify handler and the driver
   remove callback in the ACPI battery driver (Rafael Wysocki)

- Move the creation of the wakeup source during the ACPI button driver
   probe to an earlier point to avoid missing a wakeup event due to a
   race and clean up system wakeup handling and remove callback in that
   driver (Rafael Wysocki)

- Drop unnecessary driver_data pointer clearing from the ACPI EC and
   SMBUS HC drivers and make the ACPI backlight (video) driver clear the
   device's driver_data pointer on remove (Rafael Wysocki)

* acpi-battery:
  ACPI: battery: Drop redundant checks from acpi_battery_remove()
  ACPI: battery: Drop redundant check from acpi_battery_notify()

* acpi-button:
  ACPI: button: Tweak acpi_button_remove()
  ACPI: button: Tweak system wakeup handling
  ACPI: button: Call device_init_wakeup() earlier during probe

* acpi-driver:
  ACPI: driver: Drop driver_data pointer clearing from two drivers
  ACPI: video: Clear driver_data pointer on remove

Merge branches 'acpi-pm' and 'acpi-cppc'

Merge an ACPI power management update and an ACPI CPPC library update
for 7.0-rc1:

- Add an unused power resource handling quirk for THUNDEROBOT ZERO (Zhai
   Can)

- Fix remaining for_each_possible_cpu() in the ACPI CPPC library to use
   online CPUs (Sean V Kelley)

* acpi-pm:
  ACPI: PM: Add unused power resource quirk for THUNDEROBOT ZERO

* acpi-cppc:
  ACPI: CPPC: Fix remaining for_each_possible_cpu() to use online CPUs

Merge branches 'pm-powercap' and 'pm-cpuidle'

Merge additional power capping and cpuidle updates for 7.0-rc1:

- Fix the handling of package-scope MSRs in the intel_rapl power
   capping driver when called from the PMU subsystem and make it add all
   package CPUs to the PMU cpumask to allow tools to read RAPL events
   from any CPU in the package (Kuppuswamy Sathyanarayanan)

- Rework the invalid version check in the intel_rapl_tpmi power capping
   driver to account for the fact that on partitioned systems, multiple
   TPMI instances may exist per package, but RAPL registers are only
   valid on one instance (Kuppuswamy Satharayananyan)

- Describe the new intel_idle.table command line option in the
   admin-guide intel_idle documentation (Artem Bityutskiy)

- Fix a crash in the ladder cpuidle governor on systems with only one
   (polling) idle state available by making the cpuidle core bypass the
   governor in those cases and adjust the other existing governors to
   that change (Aboorva Devarajan, Christian Loehle)

* pm-powercap:
  powercap: intel_rapl_tpmi: Remove FW_BUG from invalid version check
  powercap: intel_rapl: Expose all package CPUs in PMU cpumask
  powercap: intel_rapl: Remove incorrect CPU check in PMU context

* pm-cpuidle:
  cpuidle: menu: Remove single state handling
  cpuidle: teo: Remove single state handling
  cpuidle: haltpoll: Remove single state handling
  cpuidle: Skip governor when only one idle state is available
  Documentation: PM: Document intel_idle.table command line option

Merge tag 'sysctl-7.00-rc1' of git://git./linux/kernel/git/sysctl/sysctl

Pull sysctl updates from Joel Granados:

- Remove macros from proc handler converters

   Replace the proc converter macros with "regular" functions. Though it
   is more verbose than the macro version, it helps when debugging and
   better aligns with coding-style.rst.

- General cleanup

   Remove superfluous ctl_table forward declarations. Const qualify the
   memory_allocation_profiling_sysctl and loadpin_sysctl_table arrays.
   Add missing kernel doc to proc_dointvec_conv.

- Testing

   This series was run through sysctl selftests/kunit test suite in
   x86_64. And went into linux-next after rc4, giving it a good 3 weeks
   of testing

* tag 'sysctl-7.00-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl:
  sysctl: replace SYSCTL_INT_CONV_CUSTOM macro with functions
  sysctl: Replace unidirectional INT converter macros with functions
  sysctl: Add kernel doc to proc_douintvec_conv
  sysctl: Replace UINT converter macros with functions
  sysctl: Add CONFIG_PROC_SYSCTL guards for converter macros
  sysctl: clarify proc_douintvec_minmax doc
  sysctl: Return -ENOSYS from proc_douintvec_conv when CONFIG_PROC_SYSCTL=n
  sysctl: Remove unused ctl_table forward declarations
  loadpin: Implement custom proc_handler for enforce
  alloc_tag: move memory_allocation_profiling_sysctls into .rodata
  sysctl: Add missing kernel-doc for proc_dointvec_conv

Merge tag 'turbostat-2026.02.14-AMD-RAPL-fix' of git://git./linux/kernel/git/lenb/linux

Pull turbostat fix from Len Brown:
"Fix a recent AMD regression due to errant code cleanup"

* tag 'turbostat-2026.02.14-AMD-RAPL-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
tools/power turbostat: Fix AMD RAPL regression

btrfs: fix invalid leaf access in btrfs_quota_enable() if ref key not found

If btrfs_search_slot_for_read() returns 1, it means we did not find any
key greater than or equals to the key we asked for, meaning we have
reached the end of the tree and therefore the path is not valid. If
this happens we need to break out of the loop and stop, instead of
continuing and accessing an invalid path.

Fixes: 5223cc60b40a ("btrfs: drop the path before adding qgroup items when enabling qgroups")
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: fix lost error return in btrfs_find_orphan_roots()

If the call to btrfs_get_fs_root() returns an error different from -ENOENT
we break out of the loop and then return 0, losing the error. Fix this
by returning the error instead of breaking from the loop.

Reported-by: Chris Mason <clm@meta.com>
Link: https://lore.kernel.org/linux-btrfs/20260208185321.1128472-1-clm@meta.com/
Fixes: 8670a25ecb2f ("btrfs: use single return variable in btrfs_find_orphan_roots()")
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: fix lost return value on error in finish_verity()

If btrfs_update_inode() or del_orphan() fail, we jump to the 'end_trans'
label and then return 0 instead of the error returned by one of those
calls. Fix this and return the error.

Fixes: 61fb7f04ee06 ("btrfs: remove out label in finish_verity()")
Reported-by: Chris Mason <clm@meta.com>
Link: https://lore.kernel.org/linux-btrfs/20260208161129.3888234-1-clm@meta.com/
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: change unaligned root messages to error level in btrfs_validate_super()

If the root nodes for the chunk root, tree root or log root are not sector
size aligned, we are logging a warning message but these are in fact
errors that makes the super block validation fail. So change the level of
the messages from warning to error.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: use the correct type to initialize block reserve for delayed refs

When initializing the delayed refs block reserve for a transaction handle
we are passing a type of BTRFS_BLOCK_RSV_DELOPS, which is meant for
delayed items and not for delayed refs. The correct type for delayed refs
is BTRFS_BLOCK_RSV_DELREFS.

On release of any excess space reserved in a local delayed refs reserve,
we also should transfer that excess space to the global block reserve
(it it's full, we return to the space info for general availability).

By initializing a transaction's local delayed refs block reserve with a
type of BTRFS_BLOCK_RSV_DELOPS, we were also causing any excess space
released from the delayed block reserve (fs_info->delayed_block_rsv, used
for delayed inodes and items) to be transferred to the global block
reserve instead of the global delayed refs block reserve. This was an
unintentional change in commit 28270e25c69a ("btrfs: always reserve space
for delayed refs when starting transaction"), but it's not particularly
serious as things tend to cancel out each other most of the time and it's
relatively rare to be anywhere near exhaustion of the global reserve.

Fix this by initializing a transaction's local delayed refs reserve with
a type of BTRFS_BLOCK_RSV_DELREFS and making btrfs_block_rsv_release()
attempt to transfer unused space from such a reserve into the global block
reserve, just as we did before that commit for when the block reserve is
a delayed refs rsv.

Reported-by: Alex Lyakas <alex.lyakas@zadara.com>
Link: https://lore.kernel.org/linux-btrfs/CAOcd+r0FHG5LWzTSu=LknwSoqxfw+C00gFAW7fuX71+Z5AfEew@mail.gmail.com/
Fixes: 28270e25c69a ("btrfs: always reserve space for delayed refs when starting transaction")
Reviewed-by: Alex Lyakas <alex.lyakas@zadara.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: do not ASSERT() when the fs flips RO inside btrfs_repair_io_failure()

[BUG]
There is a bug report that when btrfs hits ENOSPC error in a critical
path, btrfs flips RO (this part is expected, although the ENOSPC bug
still needs to be addressed).

The problem is after the RO flip, if there is a read repair pending, we
can hit the ASSERT() inside btrfs_repair_io_failure() like the following:

  BTRFS info (device vdc): relocating block group 30408704 flags metadata|raid1
  ------------[ cut here ]------------
  BTRFS: Transaction aborted (error -28)
  WARNING: fs/btrfs/extent-tree.c:3235 at __btrfs_free_extent.isra.0+0x453/0xfd0, CPU#1: btrfs/383844
  Modules linked in: kvm_intel kvm irqbypass
  [...]
  ---[ end trace 0000000000000000 ]---
  BTRFS info (device vdc state EA): 2 enospc errors during balance
  BTRFS info (device vdc state EA): balance: ended with status: -30
  BTRFS error (device vdc state EA): parent transid verify failed on logical 30556160 mirror 2 wanted 8 found 6
  BTRFS error (device vdc state EA): bdev /dev/nvme0n1 errs: wr 0, rd 0, flush 0, corrupt 10, gen 0
  [...]
  assertion failed: !(fs_info->sb->s_flags & SB_RDONLY) :: 0, in fs/btrfs/bio.c:938
  ------------[ cut here ]------------
  assertion failed: !(fs_info->sb->s_flags & SB_RDONLY) :: 0, in fs/btrfs/bio.c:938
  kernel BUG at fs/btrfs/bio.c:938!
  Oops: invalid opcode: 0000 [#1] SMP NOPTI
  CPU: 0 UID: 0 PID: 868 Comm: kworker/u8:13 Tainted: G        W        N  6.19.0-rc6+ #4788 PREEMPT(full)
  Tainted: [W]=WARN, [N]=TEST
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
  Workqueue: btrfs-endio simple_end_io_work
  RIP: 0010:btrfs_repair_io_failure.cold+0xb2/0x120
  RSP: 0000:ffffc90001d2bcf0 EFLAGS: 00010246
  RAX: 0000000000000051 RBX: 0000000000001000 RCX: 0000000000000000
  RDX: 0000000000000000 RSI: ffffffff8305cf42 RDI: 00000000ffffffff
  RBP: 0000000000000002 R08: 00000000fffeffff R09: ffffffff837fa988
  R10: ffffffff8327a9e0 R11: 6f69747265737361 R12: ffff88813018d310
  R13: ffff888168b8a000 R14: ffffc90001d2bd90 R15: ffff88810a169000
  FS:  0000000000000000(0000) GS:ffff8885e752c000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  ------------[ cut here ]------------

[CAUSE]
The cause of -ENOSPC error during the test case btrfs/124 is still
unknown, although it's known that we still have cases where metadata can
be over-committed but can not be fulfilled correctly, thus if we hit
such ENOSPC error inside a critical path, we have no choice but abort
the current transaction.

This will mark the fs read-only.

The problem is inside the btrfs_repair_io_failure() path that we require
the fs not to be mount read-only. This is normally fine, but if we are
doing a read-repair meanwhile the fs flips RO due to a critical error,
we can enter btrfs_repair_io_failure() with super block set to
read-only, thus triggering the above crash.

[FIX]
Just replace the ASSERT() with a proper return if the fs is already
read-only.

Reported-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/linux-btrfs/20260126045555.GB31641@lst.de/
Tested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: reset block group size class when it becomes empty

Block group size classes are managed consistently everywhere.
Currently, btrfs_use_block_group_size_class() sets a block group's size
class to specialize it for a specific allocation size. However, this
size class remains "stale" even if the block group becomes completely
empty (both used and reserved bytes reach zero).

This happens in two scenarios:

1. When space reservations are freed (e.g., due to errors or transaction
aborts) via btrfs_free_reserved_bytes().
2. When the last extent in a block group is freed via
btrfs_update_block_group().

While size classes are advisory, a stale size class can cause
find_free_extent to unnecessarily skip candidate block groups during
initial search loops. This undermines the purpose of size classes to
reduce fragmentation by keeping block groups restricted to a specific
size class when they could be reused for any size.

Fix this by resetting the size class to BTRFS_BG_SZ_NONE whenever a
block group's used and reserved counts both reach zero. This ensures
that empty block groups are fully available for any allocation size in
the next cycle.

Fixes: 52bb7a2166af ("btrfs: introduce size class to block group allocator")
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Jiasheng Jiang <jiashengjiangcool@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: replace BUG() with error handling in __btrfs_balance()

We search with offset (u64)-1 which should never match exactly.
Previously this was handled with BUG(). Now logs an error
and return -EUCLEAN.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Adarsh Das <adarshdas950@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: handle unexpected exact match in btrfs_set_inode_index_count()

We search with offset (u64)-1 which should never match exactly.
Previously the code silently returned success without setting the index
count. Now logs an error and return -EUCLEAN instead.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Adarsh Das <adarshdas950@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>,
Signed-off-by: David Sterba <dsterba@suse.com>

s390/debug: Convert debug area lock from a spinlock to a raw spinlock

With PREEMPT_RT as potential configuration option, spinlock_t is now
considered as a sleeping lock, and thus might cause issues when used in
an atomic context. But even with PREEMPT_RT as potential configuration
option, raw_spinlock_t remains as a true spinning lock/atomic context.
This creates potential issues with the s390 debug/tracing feature. The
functions to trace errors are called in various contexts, including
under lock of raw_spinlock_t, and thus the used spinlock_t in each debug
area is in violation of the locking semantics.

Here are two examples involving failing PCI Read accesses that are
traced while holding `pci_lock` in `drivers/pci/access.c`:

=============================
[ BUG: Invalid wait context ]
6.19.0-devel #18 Not tainted
-----------------------------
bash/3833 is trying to lock:
0000027790baee30 (&rc->lock){-.-.}-{3:3}, at: debug_event_common+0xfc/0x300
other info that might help us debug this:
context-{5:5}
5 locks held by bash/3833:
#0: 0000027efbb29450 (sb_writers#3){.+.+}-{0:0}, at: ksys_write+0x7c/0xf0
#1: 00000277f0504a90 (&of->mutex#2){+.+.}-{4:4}, at: kernfs_fop_write_iter+0x13e/0x260
#2: 00000277beed8c18 (kn->active#339){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x164/0x260
#3: 00000277e9859190 (&dev->mutex){....}-{4:4}, at: pci_dev_lock+0x2e/0x40
#4: 00000383068a7708 (pci_lock){....}-{2:2}, at: pci_bus_read_config_dword+0x4a/0xb0
stack backtrace:
CPU: 6 UID: 0 PID: 3833 Comm: bash Kdump: loaded Not tainted 6.19.0-devel #18 PREEMPTLAZY
Hardware name: IBM 9175 ME1 701 (LPAR)
Call Trace:
[<00000383048afec2>] dump_stack_lvl+0xa2/0xe8
[<00000383049ba166>] __lock_acquire+0x816/0x1660
[<00000383049bb1fa>] lock_acquire+0x24a/0x370
[<00000383059e3860>] _raw_spin_lock_irqsave+0x70/0xc0
[<00000383048bbb6c>] debug_event_common+0xfc/0x300
[<0000038304900b0a>] __zpci_load+0x17a/0x1f0
[<00000383048fad88>] pci_read+0x88/0xd0
[<00000383054cbce0>] pci_bus_read_config_dword+0x70/0xb0
[<00000383054d55e4>] pci_dev_wait+0x174/0x290
[<00000383054d5a3e>] __pci_reset_function_locked+0xfe/0x170
[<00000383054d9b30>] pci_reset_function+0xd0/0x100
[<00000383054ee21a>] reset_store+0x5a/0x80
[<0000038304e98758>] kernfs_fop_write_iter+0x1e8/0x260
[<0000038304d995da>] new_sync_write+0x13a/0x180
[<0000038304d9c5d0>] vfs_write+0x200/0x330
[<0000038304d9c88c>] ksys_write+0x7c/0xf0
[<00000383059cfa80>] __do_syscall+0x210/0x500
[<00000383059e4c06>] system_call+0x6e/0x90
INFO: lockdep is turned off.

=============================
[ BUG: Invalid wait context ]
6.19.0-devel #3 Not tainted
-----------------------------
bash/6861 is trying to lock:
0000009da05c7430 (&rc->lock){-.-.}-{3:3}, at: debug_event_common+0xfc/0x300
other info that might help us debug this:
context-{5:5}
5 locks held by bash/6861:
#0: 000000acff404450 (sb_writers#3){.+.+}-{0:0}, at: ksys_write+0x7c/0xf0
#1: 000000acff41c490 (&of->mutex#2){+.+.}-{4:4}, at: kernfs_fop_write_iter+0x13e/0x260
#2: 0000009da36937d8 (kn->active#75){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x164/0x260
#3: 0000009dd15250d0 (&zdev->state_lock){+.+.}-{4:4}, at: enable_slot+0x2e/0xc0
#4: 000001a19682f708 (pci_lock){....}-{2:2}, at: pci_bus_read_config_byte+0x42/0xa0
stack backtrace:
CPU: 16 UID: 0 PID: 6861 Comm: bash Kdump: loaded Not tainted 6.19.0-devel #3 PREEMPTLAZY
Hardware name: IBM 9175 ME1 701 (LPAR)
Call Trace:
[<000001a194837ec2>] dump_stack_lvl+0xa2/0xe8
[<000001a194942166>] __lock_acquire+0x816/0x1660
[<000001a1949431fa>] lock_acquire+0x24a/0x370
[<000001a19596b810>] _raw_spin_lock_irqsave+0x70/0xc0
[<000001a194843b6c>] debug_event_common+0xfc/0x300
[<000001a194888b0a>] __zpci_load+0x17a/0x1f0
[<000001a194882d88>] pci_read+0x88/0xd0
[<000001a195453b88>] pci_bus_read_config_byte+0x68/0xa0
[<000001a195457bc2>] pci_setup_device+0x62/0xad0
[<000001a195458e70>] pci_scan_single_device+0x90/0xe0
[<000001a19488a0f6>] zpci_bus_scan_device+0x46/0x80
[<000001a19547f958>] enable_slot+0x98/0xc0
[<000001a19547f134>] power_write_file+0xc4/0x110
[<000001a194e20758>] kernfs_fop_write_iter+0x1e8/0x260
[<000001a194d215da>] new_sync_write+0x13a/0x180
[<000001a194d245d0>] vfs_write+0x200/0x330
[<000001a194d2488c>] ksys_write+0x7c/0xf0
[<000001a195957a30>] __do_syscall+0x210/0x500
[<000001a19596cbb6>] system_call+0x6e/0x90
INFO: lockdep is turned off.

Since it is desired to keep it possible to create trace records in most
situations, including this particular case (failing PCI config space
accesses are relevant), convert the used spinlock_t in `struct
debug_info` to raw_spinlock_t.

The impact is small, as the debug area lock only protects bounded memory
access without external dependencies, apart from one function
debug_set_size() where kfree() is implicitly called with the lock held.
Move debug_info_free() out of this lock, to keep remove this external
dependency.

Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

efi: Align unaccepted memory range to page boundary

The accept_memory() and range_contains_unaccepted_memory() functions
employ a "guard page" logic to prevent crashes with load_unaligned_zeropad().
This logic extends the range to be accepted (or checked) by one unit_size
if the end of the range is aligned to a unit_size boundary.

However, if the caller passes a range that is not page-aligned, the
'end' of the range might not be numerically aligned to unit_size, even
if it covers the last page of a unit. This causes the "if (!(end % unit_size))"
check to fail, skipping the necessary extension and leaving the next
unit unaccepted, which can lead to a kernel panic when accessed by
load_unaligned_zeropad().

Align the start address down and the size up to the nearest page
boundary before performing the unit_size alignment check. This ensures
that the guard unit is correctly added when the range effectively ends
on a unit boundary.

Signed-off-by: Kiryl Shutsemau (Meta) <kas@kernel.org>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

efi: Fix reservation of unaccepted memory table

The reserve_unaccepted() function incorrectly calculates the size of the
memblock reservation for the unaccepted memory table. It aligns the
size of the table, but fails to account for cases where the table's
starting physical address (efi.unaccepted) is not page-aligned.

If the table starts at an offset within a page and its end crosses into
a subsequent page that the aligned size does not cover, the end of the
table will not be reserved. This can lead to the table being overwritten
or inaccessible, causing a kernel panic in accept_memory().

This issue was observed when starting Intel TDX VMs with specific memory
sizes (e.g., > 64GB).

Fix this by calculating the end address first (including the unaligned
start) and then aligning it up, ensuring the entire range is covered
by the reservation.

Fixes: 8dbe33956d96 ("efi/unaccepted: Make sure unaccepted table is mapped")
Reported-by: Moritz Sanft <ms@edgeless.systems>
Signed-off-by: Kiryl Shutsemau (Meta) <kas@kernel.org>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

MAINTAINERS: Add a reviewer entry for EFI

Over the years I've contributed patches to the EFI subsystem
mostly around TPM and EFI variables. Add me as a reviewer.

Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

efi: stmm: Constify struct efivar_operations

The 'struct efivar_operations' is not modified by the driver after
initialization, so it should follow typical practice of being static
const for increased code safety and readability.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

x86/kexec: Copy ACPI root pointer address from config table

Dave reports that kexec may fail when the first kernel boots via the EFI
stub but without EFI runtime services, as in that case, the RSDP address
field in struct bootparams is never assigned. Kexec copies this value
into the version of struct bootparams that it provides to the incoming
kernel, which may have no other means to locate the ACPI root pointer.

So take the value from the EFI config tables if no root pointer has been
set in the first kernel's struct bootparams.

Fixes: a1b87d54f4e4 ("x86/efistub: Avoid legacy decompressor when doing EFI boot")
Cc: <stable@vger.kernel.org> # v6.1
Reported-by: Dave Young <dyoung@redhat.com>
Tested-by: Dave Young <dyoung@redhat.com>
Link: https://lore.kernel.org/linux-efi/aZQg_tRQmdKNadCg@darkstar.users.ipa.redhat.com/
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>