linux-2.6-microblaze.git
5 months agohexagon: vm_tlb: include asm/tlbflush.h for prototypes
Nathan Chancellor [Thu, 30 Nov 2023 22:58:20 +0000 (15:58 -0700)]
hexagon: vm_tlb: include asm/tlbflush.h for prototypes

Clang warns about several missing prototypes that are declared in this
header:

  arch/hexagon/mm/vm_tlb.c:25:6: warning: no previous prototype for function 'flush_tlb_range' [-Wmissing-prototypes]
     25 | void flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
        |      ^
  arch/hexagon/mm/vm_tlb.c:25:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
     25 | void flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
        | ^
        | static
  arch/hexagon/mm/vm_tlb.c:37:6: warning: no previous prototype for function 'flush_tlb_one' [-Wmissing-prototypes]
     37 | void flush_tlb_one(unsigned long vaddr)
        |      ^
  arch/hexagon/mm/vm_tlb.c:37:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
     37 | void flush_tlb_one(unsigned long vaddr)
        | ^
        | static
  arch/hexagon/mm/vm_tlb.c:47:6: warning: no previous prototype for function 'tlb_flush_all' [-Wmissing-prototypes]
     47 | void tlb_flush_all(void)
        |      ^
  arch/hexagon/mm/vm_tlb.c:47:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
     47 | void tlb_flush_all(void)
        | ^
        | static
  arch/hexagon/mm/vm_tlb.c:56:6: warning: no previous prototype for function 'flush_tlb_mm' [-Wmissing-prototypes]
     56 | void flush_tlb_mm(struct mm_struct *mm)
        |      ^
  arch/hexagon/mm/vm_tlb.c:56:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
     56 | void flush_tlb_mm(struct mm_struct *mm)
        | ^
        | static
  arch/hexagon/mm/vm_tlb.c:66:6: warning: no previous prototype for function 'flush_tlb_page' [-Wmissing-prototypes]
     66 | void flush_tlb_page(struct vm_area_struct *vma, unsigned long vaddr)
        |      ^
  arch/hexagon/mm/vm_tlb.c:66:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
     66 | void flush_tlb_page(struct vm_area_struct *vma, unsigned long vaddr)
        | ^
        | static
  arch/hexagon/mm/vm_tlb.c:78:6: warning: no previous prototype for function 'flush_tlb_kernel_range' [-Wmissing-prototypes]
     78 | void flush_tlb_kernel_range(unsigned long start, unsigned long end)
        |      ^
  arch/hexagon/mm/vm_tlb.c:78:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
     78 | void flush_tlb_kernel_range(unsigned long start, unsigned long end)
        | ^
        | static
  6 warnings generated.

Link: https://lkml.kernel.org/r/20231130-hexagon-missing-prototypes-v1-7-5c34714afe9e@kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Brian Cain <bcain@quicinc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agohexagon: vm_fault: include asm/vm_fault.h for prototypes
Nathan Chancellor [Thu, 30 Nov 2023 22:58:19 +0000 (15:58 -0700)]
hexagon: vm_fault: include asm/vm_fault.h for prototypes

Clang warns:

  arch/hexagon/mm/vm_fault.c:157:6: warning: no previous prototype for function 'read_protection_fault' [-Wmissing-prototypes]
    157 | void read_protection_fault(struct pt_regs *regs)
        |      ^
  arch/hexagon/mm/vm_fault.c:157:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
    157 | void read_protection_fault(struct pt_regs *regs)
        | ^
        | static
  arch/hexagon/mm/vm_fault.c:164:6: warning: no previous prototype for function 'write_protection_fault' [-Wmissing-prototypes]
    164 | void write_protection_fault(struct pt_regs *regs)
        |      ^
  arch/hexagon/mm/vm_fault.c:164:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
    164 | void write_protection_fault(struct pt_regs *regs)
        | ^
        | static
  arch/hexagon/mm/vm_fault.c:171:6: warning: no previous prototype for function 'execute_protection_fault' [-Wmissing-prototypes]
    171 | void execute_protection_fault(struct pt_regs *regs)
        |      ^
  arch/hexagon/mm/vm_fault.c:171:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
    171 | void execute_protection_fault(struct pt_regs *regs)
        | ^
        | static

The prototypes for these functions are defined in asm/vm_fault.h, so
include it to pick them up and silence the warnings.

Link: https://lkml.kernel.org/r/20231130-hexagon-missing-prototypes-v1-6-5c34714afe9e@kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Brian Cain <bcain@quicinc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agohexagon: vm_fault: mark do_page_fault() as static
Nathan Chancellor [Thu, 30 Nov 2023 22:58:18 +0000 (15:58 -0700)]
hexagon: vm_fault: mark do_page_fault() as static

Clang warns:

  arch/hexagon/mm/vm_fault.c:36:6: warning: no previous prototype for function 'do_page_fault' [-Wmissing-prototypes]
     36 | void do_page_fault(unsigned long address, long cause, struct pt_regs *regs)
        |      ^
  arch/hexagon/mm/vm_fault.c:36:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
     36 | void do_page_fault(unsigned long address, long cause, struct pt_regs *regs)
        | ^
        | static

This function is not used outside of this translation unit, so mark it
as static.

Link: https://lkml.kernel.org/r/20231130-hexagon-missing-prototypes-v1-5-5c34714afe9e@kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Brian Cain <bcain@quicinc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agohexagon: smp: mark handle_ipi() and start_secondary() as static
Nathan Chancellor [Thu, 30 Nov 2023 22:58:17 +0000 (15:58 -0700)]
hexagon: smp: mark handle_ipi() and start_secondary() as static

Clang warns:

  arch/hexagon/kernel/smp.c:82:13: warning: no previous prototype for function 'handle_ipi' [-Wmissing-prototypes]
     82 | irqreturn_t handle_ipi(int irq, void *desc)
        |             ^
  arch/hexagon/kernel/smp.c:82:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
     82 | irqreturn_t handle_ipi(int irq, void *desc)
        | ^
        | static
  arch/hexagon/kernel/smp.c:127:6: warning: no previous prototype for function 'start_secondary' [-Wmissing-prototypes]
    127 | void start_secondary(void)
        |      ^
  arch/hexagon/kernel/smp.c:127:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
    127 | void start_secondary(void)
        | ^
        | static
  2 warnings generated.

These functions are not used outside of this translation unit, so mark
them as static.

Link: https://lkml.kernel.org/r/20231130-hexagon-missing-prototypes-v1-4-5c34714afe9e@kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Brian Cain <bcain@quicinc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agohexagon: mm: include asm/setup.h for setup_arch_memory()'s prototype
Nathan Chancellor [Thu, 30 Nov 2023 22:58:16 +0000 (15:58 -0700)]
hexagon: mm: include asm/setup.h for setup_arch_memory()'s prototype

Clang warns:

  arch/hexagon/mm/init.c:138:13: warning: no previous prototype for function 'setup_arch_memory' [-Wmissing-prototypes]
    138 | void __init setup_arch_memory(void)
        |             ^
  arch/hexagon/mm/init.c:138:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
    138 | void __init setup_arch_memory(void)
        | ^
        | static

The prototype is in asm/setup.h, include it to clear up the warning.

Link: https://lkml.kernel.org/r/20231130-hexagon-missing-prototypes-v1-3-5c34714afe9e@kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Brian Cain <bcain@quicinc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agohexagon: mm: mark paging_init() as static
Nathan Chancellor [Thu, 30 Nov 2023 22:58:15 +0000 (15:58 -0700)]
hexagon: mm: mark paging_init() as static

Clang warns:

  arch/hexagon/mm/init.c:89:13: warning: no previous prototype for function 'paging_init' [-Wmissing-prototypes]
     89 | void __init paging_init(void)
        |             ^
  arch/hexagon/mm/init.c:89:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
     89 | void __init paging_init(void)
        | ^
        | static

This function is only used within this translation unit, so mark it static
as suggested.

Link: https://lkml.kernel.org/r/20231130-hexagon-missing-prototypes-v1-2-5c34714afe9e@kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Brian Cain <bcain@quicinc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agohexagon: uaccess: remove clear_user_hexagon()
Nathan Chancellor [Thu, 30 Nov 2023 22:58:14 +0000 (15:58 -0700)]
hexagon: uaccess: remove clear_user_hexagon()

Patch series "hexagon: Fix up instances of -Wmissing-prototypes".

This series fixes all the instances of -Wmissing-prototypes in
arch/hexagon, as it is about to be enabled globally in a default build.

This patch (of 19):

Clang warns:

  arch/hexagon/mm/uaccess.c:39:15: warning: no previous prototype for function 'clear_user_hexagon' [-Wmissing-prototypes]
     39 | unsigned long clear_user_hexagon(void __user *dest, unsigned long count)
        |               ^
  arch/hexagon/mm/uaccess.c:39:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
     39 | unsigned long clear_user_hexagon(void __user *dest, unsigned long count)
        | ^
        | static
  1 warning generated.

This function appears to have been unused since it was introduced in
commit 7567746e1c0d ("Hexagon: Add user access functions"), so remove it.

Link: https://lkml.kernel.org/r/20231130-hexagon-missing-prototypes-v1-0-5c34714afe9e@kernel.org
Link: https://lkml.kernel.org/r/20231130-hexagon-missing-prototypes-v1-1-5c34714afe9e@kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Brian Cain <bcain@quicinc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agoarch: turn off -Werror for architectures with known warnings
Arnd Bergmann [Thu, 30 Nov 2023 08:07:38 +0000 (09:07 +0100)]
arch: turn off -Werror for architectures with known warnings

A couple of architectures enable -Werror for their own files regardless of
CONFIG_WERROR but also have known warnings that fail the build with
-Wmissing-prototypes enabled by default:

arch/alpha/lib/memcpy.c:153:8: error: no previous prototype for 'memcpy' [-Werror=missing-prototypes]
arch/alpha/kernel/irq.c:96:1: error: no previous prototype for 'handle_irq' [-Werror=missing-prototypes]
arch/mips/kernel/signal.c:673:17: error: no previous prototype for ‘sys_rt_sigreturn’ [-Werror=missing-prototypes]
arch/mips/kernel/signal.c:636:17: error: no previous prototype for ‘sys_sigreturn’ [-Werror=missing-prototypes]
arch/mips/kernel/syscall.c:51:16: error: no previous prototype for ‘sysm_pipe’ [-Werror=missing-prototypes]
arch/mips/mm/fault.c:323:17: error: no previous prototype for ‘do_page_fault’ [-Werror=missing-prototypes]
arch/sparc/vdso/vma.c:246:12: warning: no previous prototype for ‘init_vdso_image’ [-Wmissing-prototypes]v
arch/sparc/vdso/vdso32/../vclock_gettime.c:343:1: warning: no previous prototype for ‘__vdso_gettimeofday_stick’ [-Wmissing-prototypes]
arch/sparc/vdso/vclock_gettime.c:343:1: warning: no previous prototype for ‘__vdso_gettimeofday_stick’ [-Wmissing-prototypes]
arch/sparc/prom/p1275.c:52:6: warning: no previous prototype for ‘prom_cif_init’ [-Wmissing-prototypes]
arch/sparc/prom/misc_64.c:165:5: warning: no previous prototype for ‘prom_get_mmu_ihandle’ [-Wmissing-prototypes]

This appears to be an artifact from the times when this architecture code
was better maintained that most device drivers and before CONFIG_WERROR
was added.  Now it just gets in the way, so remove all of these.

Powerpc and x86 both still have their own Kconfig options to enable
-Werror for some of their files.  These architectures are better
maintained than most and the options are easy to disable, so leave those
untouched.

Link: https://lkml.kernel.org/r/4be73872-c1f5-4c31-8201-712c19290a22@app.fastmail.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reported-by: Stephen Rothwell <sfr@rothwell.id.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agokexec: use atomic_try_cmpxchg in crash_kexec
Uros Bizjak [Tue, 14 Nov 2023 16:12:01 +0000 (17:12 +0100)]
kexec: use atomic_try_cmpxchg in crash_kexec

Use atomic_try_cmpxchg instead of cmpxchg (*ptr, old, new) == old in
crash_kexec().  x86 CMPXCHG instruction returns success in ZF flag,
so this change saves a compare after cmpxchg.

No functional change intended.

Link: https://lkml.kernel.org/r/20231114161228.108516-1-ubizjak@gmail.com
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agoscripts/spelling.txt: add more spellings to spelling.txt
Colin Ian King [Wed, 22 Nov 2023 10:40:37 +0000 (10:40 +0000)]
scripts/spelling.txt: add more spellings to spelling.txt

Some of the more common spelling mistakes and typos that I've found while
fixing up spelling mistakes in the kernel over the past couple of
releases.

Link: https://lkml.kernel.org/r/20231122104037.1770749-1-colin.i.king@gmail.com
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months ago__ptrace_unlink: kill the obsolete "FIXME" code
Oleg Nesterov [Tue, 21 Nov 2023 16:26:50 +0000 (17:26 +0100)]
__ptrace_unlink: kill the obsolete "FIXME" code

The corner case described by the comment is no longer possible after the
commit 7b3c36fc4c23 ("ptrace: fix task_join_group_stop() for the case when
current is traced"), task_join_group_stop() ensures that the new thread
has the correct signr in JOBCTL_STOP_SIGMASK regardless of ptrace.

Link: https://lkml.kernel.org/r/20231121162650.GA6635@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agocheckstack: allow to pass MINSTACKSIZE parameter
Heiko Carstens [Mon, 20 Nov 2023 18:37:19 +0000 (19:37 +0100)]
checkstack: allow to pass MINSTACKSIZE parameter

The checkstack script omits all functions with a stack usage of less than
100 bytes.  However the script already has support for a parameter which
allows to override the default, but it cannot be set with

$ make checkstack

Add a MINSTACKSIZE parameter which allows to change the default. This might
be useful in order to print the stack usage of all functions, or only those
with large stack usage:

$ make checkstack MINSTACKSIZE=0
$ make checkstack MINSTACKSIZE=800

Link: https://lkml.kernel.org/r/20231120183719.2188479-4-hca@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Cc: Maninder Singh <maninder1.s@samsung.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Vaneet Narang <v.narang@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agocheckstack: sort output by size and function name
Heiko Carstens [Mon, 20 Nov 2023 18:37:18 +0000 (19:37 +0100)]
checkstack: sort output by size and function name

Sort output by size and in addition by function name.  This increases
readability for cases where there are many functions with the same stack
usage.

Link: https://lkml.kernel.org/r/20231120183719.2188479-3-hca@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Cc: Maninder Singh <maninder1.s@samsung.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Vaneet Narang <v.narang@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agokernel/signal.c: simplify force_sig_info_to_task(), kill recalc_sigpending_and_wake()
Oleg Nesterov [Mon, 20 Nov 2023 15:16:49 +0000 (16:16 +0100)]
kernel/signal.c: simplify force_sig_info_to_task(), kill recalc_sigpending_and_wake()

The purpose of recalc_sigpending_and_wake() is not clear, it looks
"obviously unneeded" because we are going to send the signal which can't
be blocked or ignored.

Add the comment to explain why we can't rely on send_signal_locked() and
make this logic more simple/explicit.  recalc_sigpending_and_wake() has no
other users, it can die.

In fact I think we don't even need signal_wake_up(), the target task must
be either current or a TASK_TRACED child, otherwise the usage of siglock
is not safe.  But this needs another change.

Link: https://lkml.kernel.org/r/20231120151649.GA15995@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agodocs: filesystems: document the squashfs specific mount options
Ariel Miculas [Fri, 17 Nov 2023 16:12:14 +0000 (18:12 +0200)]
docs: filesystems: document the squashfs specific mount options

When SQUASHFS_CHOICE_DECOMP_BY_MOUNT is set, the "threads" mount option
can be used to specify the decompression mode: single-threaded,
multi-threaded, percpu or the number of threads used for decompression.
When SQUASHFS_CHOICE_DECOMP_BY_MOUNT is not set, SQUASHFS_DECOMP_MULTI and
SQUASHFS_MOUNT_DECOMP_THREADS are both set, the "threads" option can also
be used to specify the number of threads used for decompression.  This
mount option is only mentioned in fs/squashfs/Kconfig, which makes it
difficult to find.

Another mount option available is "errors", which can be configured to
panic the kernel when squashfs errors are encountered.

Add both these options to the squashfs documentation, making them more
noticeable.

Link: https://lkml.kernel.org/r/20231117161215.140282-1-amiculas@cisco.com
Signed-off-by: Ariel Miculas <amiculas@cisco.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Phillip Lougher <phillip@squashfs.org.uk>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agocheckpatch: do not require an empty line before error injection
Sergey Senozhatsky [Thu, 9 Nov 2023 07:51:38 +0000 (16:51 +0900)]
checkpatch: do not require an empty line before error injection

ALLOW_ERROR_INJECTION macro (just like EXPORT_SYMBOL) can immediately
follow a function it annotates.

Link: https://lkml.kernel.org/r/20231109075147.2779461-1-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Acked-by: Joe Perches <joe@perches.com>
Cc: Andy Whitcroft <apw@canonical.com> (maintainer:CHECKPATCH)
Cc: Dwaipayan Ray <dwaipayanray1@gmail.com> (reviewer:CHECKPATCH)
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agoarch: remove ARCH_TASK_STRUCT_ON_STACK
Heiko Carstens [Thu, 16 Nov 2023 13:36:38 +0000 (14:36 +0100)]
arch: remove ARCH_TASK_STRUCT_ON_STACK

IA-64 was the only architecture which selected ARCH_TASK_STRUCT_ON_STACK.
IA-64 was removed with commit cf8e8658100d ("arch: Remove Itanium (IA-64)
architecture"). Therefore remove support for ARCH_TASK_STRUCT_ON_STACK
as well.

Note: this also reveals a potential bug in powerpc code, which makes use of
__init_task_data without selecting ARCH_TASK_STRUCT_ON_STACK which makes
__init_task_data a no-op. This is broken since commit d11ed3ab3166 ("Expand
INIT_TASK() in init/init_task.c and remove") from 2018 and needs to be
addressed separately.

Link: https://lkml.kernel.org/r/20231116133638.1636277-4-hca@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agoarch: remove ARCH_TASK_STRUCT_ALLOCATOR
Heiko Carstens [Thu, 16 Nov 2023 13:36:37 +0000 (14:36 +0100)]
arch: remove ARCH_TASK_STRUCT_ALLOCATOR

IA-64 was the only architecture which selected ARCH_TASK_STRUCT_ALLOCATOR.
IA-64 was removed with commit cf8e8658100d ("arch: Remove Itanium (IA-64)
architecture"). Therefore remove support for ARCH_THREAD_STACK_ALLOCATOR
as well.

Link: https://lkml.kernel.org/r/20231116133638.1636277-3-hca@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agoarch: remove ARCH_THREAD_STACK_ALLOCATOR
Heiko Carstens [Thu, 16 Nov 2023 13:36:36 +0000 (14:36 +0100)]
arch: remove ARCH_THREAD_STACK_ALLOCATOR

Patch series "Remove unused code after IA-64 removal".

While looking into something different I noticed that there are a couple
of Kconfig options which were only selected by IA-64 and which are now
unused.

So remove them and simplify the code a bit.

This patch (of 3):

IA-64 was the only architecture which selected ARCH_THREAD_STACK_ALLOCATOR.
IA-64 was removed with commit cf8e8658100d ("arch: Remove Itanium (IA-64)
architecture"). Therefore remove support for ARCH_THREAD_STACK_ALLOCATOR as
well.

Link: https://lkml.kernel.org/r/20231116133638.1636277-1-hca@linux.ibm.com
Link: https://lkml.kernel.org/r/20231116133638.1636277-2-hca@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agonilfs2: convert nilfs_btnode_abort_change_key to use a folio
Matthew Wilcox (Oracle) [Tue, 14 Nov 2023 08:44:36 +0000 (17:44 +0900)]
nilfs2: convert nilfs_btnode_abort_change_key to use a folio

Saves one call to compound_head().

Link: https://lkml.kernel.org/r/20231114084436.2755-21-konishi.ryusuke@gmail.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agonilfs2: convert nilfs_btnode_commit_change_key to use a folio
Matthew Wilcox (Oracle) [Tue, 14 Nov 2023 08:44:35 +0000 (17:44 +0900)]
nilfs2: convert nilfs_btnode_commit_change_key to use a folio

Saves one call to compound_head().

Link: https://lkml.kernel.org/r/20231114084436.2755-20-konishi.ryusuke@gmail.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agonilfs2: convert nilfs_btnode_prepare_change_key to use a folio
Matthew Wilcox (Oracle) [Tue, 14 Nov 2023 08:44:34 +0000 (17:44 +0900)]
nilfs2: convert nilfs_btnode_prepare_change_key to use a folio

Saves three calls to compound_head().

Link: https://lkml.kernel.org/r/20231114084436.2755-19-konishi.ryusuke@gmail.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agonilfs2: convert nilfs_btnode_delete to use a folio
Matthew Wilcox (Oracle) [Tue, 14 Nov 2023 08:44:33 +0000 (17:44 +0900)]
nilfs2: convert nilfs_btnode_delete to use a folio

Saves six calls to compound_head().

Link: https://lkml.kernel.org/r/20231114084436.2755-18-konishi.ryusuke@gmail.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agonilfs2: convert nilfs_btnode_submit_block to use a folio
Matthew Wilcox (Oracle) [Tue, 14 Nov 2023 08:44:32 +0000 (17:44 +0900)]
nilfs2: convert nilfs_btnode_submit_block to use a folio

Saves two calls to compound_head().

Link: https://lkml.kernel.org/r/20231114084436.2755-17-konishi.ryusuke@gmail.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agonilfs2: convert nilfs_btnode_create_block to use a folio
Matthew Wilcox (Oracle) [Tue, 14 Nov 2023 08:44:31 +0000 (17:44 +0900)]
nilfs2: convert nilfs_btnode_create_block to use a folio

Saves two calls to compound_head().

Link: https://lkml.kernel.org/r/20231114084436.2755-16-konishi.ryusuke@gmail.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agonilfs2: convert nilfs_gccache_submit_read_data to use a folio
Matthew Wilcox (Oracle) [Tue, 14 Nov 2023 08:44:30 +0000 (17:44 +0900)]
nilfs2: convert nilfs_gccache_submit_read_data to use a folio

Saves two calls to compound_head().

Link: https://lkml.kernel.org/r/20231114084436.2755-15-konishi.ryusuke@gmail.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agonilfs2: convert nilfs_mdt_submit_block to use a folio
Matthew Wilcox (Oracle) [Tue, 14 Nov 2023 08:44:29 +0000 (17:44 +0900)]
nilfs2: convert nilfs_mdt_submit_block to use a folio

Saves two calls to compound_head().

Link: https://lkml.kernel.org/r/20231114084436.2755-14-konishi.ryusuke@gmail.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agonilfs2: convert nilfs_mdt_create_block to use a folio
Matthew Wilcox (Oracle) [Tue, 14 Nov 2023 08:44:28 +0000 (17:44 +0900)]
nilfs2: convert nilfs_mdt_create_block to use a folio

Saves two calls to compound_head().

Link: https://lkml.kernel.org/r/20231114084436.2755-13-konishi.ryusuke@gmail.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agonilfs2: convert nilfs_page_mkwrite() to use a folio
Matthew Wilcox (Oracle) [Tue, 14 Nov 2023 08:44:27 +0000 (17:44 +0900)]
nilfs2: convert nilfs_page_mkwrite() to use a folio

Using the new folio APIs saves seven hidden calls to compound_head().

Link: https://lkml.kernel.org/r/20231114084436.2755-12-konishi.ryusuke@gmail.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agonilfs2: convert nilfs_segctor_prepare_write to use folios
Matthew Wilcox (Oracle) [Tue, 14 Nov 2023 08:44:26 +0000 (17:44 +0900)]
nilfs2: convert nilfs_segctor_prepare_write to use folios

Use the new folio APIs, saving 17 hidden calls to compound_head().

Link: https://lkml.kernel.org/r/20231114084436.2755-11-konishi.ryusuke@gmail.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agonilfs2: convert to __nilfs_clear_folio_dirty()
Matthew Wilcox (Oracle) [Tue, 14 Nov 2023 08:44:25 +0000 (17:44 +0900)]
nilfs2: convert to __nilfs_clear_folio_dirty()

All callers now have a folio, so convert to pass a folio.  No caller uses
the return value, so make it return void.  Removes a couple of hidden
calls to compound_head().

Link: https://lkml.kernel.org/r/20231114084436.2755-10-konishi.ryusuke@gmail.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agonilfs2: convert to nilfs_clear_folio_dirty()
Matthew Wilcox (Oracle) [Tue, 14 Nov 2023 08:44:24 +0000 (17:44 +0900)]
nilfs2: convert to nilfs_clear_folio_dirty()

All callers of nilfs_clear_dirty_page() now have a folio, so rename the
function and pass in the folio.  Saves three hidden calls to
compound_head().

Link: https://lkml.kernel.org/r/20231114084436.2755-9-konishi.ryusuke@gmail.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agonilfs2: convert nilfs_mdt_write_page() to use a folio
Matthew Wilcox (Oracle) [Tue, 14 Nov 2023 08:44:23 +0000 (17:44 +0900)]
nilfs2: convert nilfs_mdt_write_page() to use a folio

Convert the incoming page to a folio.  Replaces three calls to
compound_head() with one.

Link: https://lkml.kernel.org/r/20231114084436.2755-8-konishi.ryusuke@gmail.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agonilfs2: convert nilfs_writepage() to use a folio
Matthew Wilcox (Oracle) [Tue, 14 Nov 2023 08:44:22 +0000 (17:44 +0900)]
nilfs2: convert nilfs_writepage() to use a folio

Convert the incoming page to a folio.  Replaces three calls to
compound_head() with one.

Link: https://lkml.kernel.org/r/20231114084436.2755-7-konishi.ryusuke@gmail.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agonilfs2: convert to nilfs_folio_buffers_clean()
Matthew Wilcox (Oracle) [Tue, 14 Nov 2023 08:44:21 +0000 (17:44 +0900)]
nilfs2: convert to nilfs_folio_buffers_clean()

All callers of nilfs_page_buffers_clean() now have a folio, so convert it
to take a folio.  While I'm at it, make it return a bool.

Link: https://lkml.kernel.org/r/20231114084436.2755-6-konishi.ryusuke@gmail.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agonilfs2: convert nilfs_forget_buffer to use a folio
Matthew Wilcox (Oracle) [Tue, 14 Nov 2023 08:44:20 +0000 (17:44 +0900)]
nilfs2: convert nilfs_forget_buffer to use a folio

Save two hidden calls to compound_head().

Link: https://lkml.kernel.org/r/20231114084436.2755-5-konishi.ryusuke@gmail.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agonilfs2: convert nilfs_segctor_complete_write to use folios
Matthew Wilcox (Oracle) [Tue, 14 Nov 2023 08:44:19 +0000 (17:44 +0900)]
nilfs2: convert nilfs_segctor_complete_write to use folios

Use the new folio APIs, saving five calls to compound_head().  This
includes the last callers of nilfs_end_page_io(), so remove that too.

Link: https://lkml.kernel.org/r/20231114084436.2755-4-konishi.ryusuke@gmail.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agonilfs2: convert nilfs_abort_logs to use folios
Matthew Wilcox (Oracle) [Tue, 14 Nov 2023 08:44:18 +0000 (17:44 +0900)]
nilfs2: convert nilfs_abort_logs to use folios

Use the new folio APIs, saving five hidden calls to compound_head().

Link: https://lkml.kernel.org/r/20231114084436.2755-3-konishi.ryusuke@gmail.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agonilfs2: add nilfs_end_folio_io()
Matthew Wilcox (Oracle) [Tue, 14 Nov 2023 08:44:17 +0000 (17:44 +0900)]
nilfs2: add nilfs_end_folio_io()

Patch series "nilfs2: Folio conversions for file paths".

This series advances page->folio conversions for a wide range of nilfs2,
including its file operations, block routines, and the log writer's
writeback routines.  It doesn't cover large folios support, but it saves a
lot of hidden compound_head() calls while preserving the existing support
range behavior.

The original series in post [1] also covered directory-related page->folio
conversions, but that was put on hold because a regression was found in
testing, so this is an excerpt from the first half of the original post.

[1] https://lkml.kernel.org/r/20231106173903.1734114-1-willy@infradead.org

I tested this series in both 32-bit and 64-bit environments, switching
between normal and small block sizes.  I also reviewed all changes in all
patches to ensure they do not break existing behavior.  There were no
problems.

This patch (of 20):

This is the folio counterpart of the existing nilfs_end_page_io() which is
retained as a wrapper of nilfs_end_folio_io().  Replaces nine hidden calls
to compound_head() with one.

Link: https://lkml.kernel.org/r/20231114084436.2755-1-konishi.ryusuke@gmail.com
Link: https://lkml.kernel.org/r/20231114084436.2755-2-konishi.ryusuke@gmail.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agoSquashfs: fix variable overflow triggered by sysbot
Phillip Lougher [Mon, 13 Nov 2023 16:09:01 +0000 (16:09 +0000)]
Squashfs: fix variable overflow triggered by sysbot

Sysbot reports a slab out of bounds write in squashfs_readahead().

This is ultimately caused by a file reporting an (infeasibly) large file
size (1407374883553280 bytes) with the minimum block size of 4K.

This causes variable overflow.

Link: https://lkml.kernel.org/r/20231113160901.6444-1-phillip@squashfs.org.uk
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Reported-by: syzbot+604424eb051c2f696163@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/000000000000b1fda20609ede0d1@google.com/
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agofs/nilfs2: use standard array-copy-function
Philipp Stanner [Mon, 6 Nov 2023 22:44:16 +0000 (07:44 +0900)]
fs/nilfs2: use standard array-copy-function

ioctl.c utilizes memdup_user() to copy a userspace array.  An overflow
check is performed manually before the function's invocation.

The new function memdup_array_user() standardizes copying userspace
arrays, thus, improving readability by making it more clear that an array
is being copied.  Additionally, it also performs an overflow check.

Remove the (now redundant) manual overflow-check and replace memdup_user()
with memdup_array_user().

In addition, improve the grammar of the comment above
memdup_array_user().

Link: https://lkml.kernel.org/r/20231106224416.3055-1-konishi.ryusuke@gmail.com
Signed-off-by: Philipp Stanner <pstanner@redhat.com>
Link: https://lkml.kernel.org/r/20231103184831.99406-2-pstanner@redhat.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Suggested-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agointroduce for_other_threads(p, t)
Oleg Nesterov [Mon, 30 Oct 2023 15:57:10 +0000 (16:57 +0100)]
introduce for_other_threads(p, t)

Cosmetic, but imho it makes the usage look more clear and simple, the new
helper doesn't require to initialize "t".

After this change while_each_thread() has only 3 users, and it is only
used in the do/while loops.

Link: https://lkml.kernel.org/r/20231030155710.GA9095@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agokernel/reboot: explicitly notify if halt occurred instead of power off
Dongmin Lee [Sat, 4 Nov 2023 11:33:20 +0000 (20:33 +0900)]
kernel/reboot: explicitly notify if halt occurred instead of power off

When kernel_can_power_off() returns false, and reboot has called with
LINUX_REBOOT_CMD_POWER_OFF, kernel_halt() will be initiated instead of
actual power off function.

However, in this situation, Kernel never explicitly notifies user that
system halted instead of requested power off.

Since halt and power off perform different behavior, and user initiated
reboot call with power off command, not halt, This could be unintended
behavior to user, like this:

~ # poweroff -f
[    3.581482] reboot: System halted

Therefore, this explicitly notifies user that poweroff is not available,
and halting has been occured as an alternative behavior instead:

~ # poweroff -f
[    4.123668] reboot: Power off not available: System halted instead

[akpm@linux-foundation.org: tweak comment text]
Link: https://lkml.kernel.org/r/20231104113320.72440-1-ldmldm05@gmail.com
Signed-off-by: Dongmin Lee <ldmldm05@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agoMerge branch 'master' into mm-hotfixes-stable
Andrew Morton [Thu, 7 Dec 2023 01:03:50 +0000 (17:03 -0800)]
Merge branch 'master' into mm-hotfixes-stable

5 months agomm/madvise: add cond_resched() in madvise_cold_or_pageout_pte_range()
Jiexun Wang [Thu, 21 Sep 2023 12:27:51 +0000 (20:27 +0800)]
mm/madvise: add cond_resched() in madvise_cold_or_pageout_pte_range()

I conducted real-time testing and observed that
madvise_cold_or_pageout_pte_range() causes significant latency under
memory pressure, which can be effectively reduced by adding cond_resched()
within the loop.

I tested on the LicheePi 4A board using Cylictest for latency testing and
Ftrace for latency tracing.  The board uses TH1520 processor and has a
memory size of 8GB.  The kernel version is 6.5.0 with the PREEMPT_RT patch
applied.

The script I tested is as follows:

echo wakeup_rt > /sys/kernel/tracing/current_tracer
echo 1 > /sys/kernel/tracing/tracing_on
echo 0 > /sys/kernel/tracing/tracing_max_latency
stress-ng --vm 8 --vm-bytes 2G &
cyclictest --mlockall --smp --priority=99 --distance=0 --duration=30m
echo 0 > /sys/kernel/tracing/tracing_on
cat /sys/kernel/tracing/trace

The tracing results before modification are as follows:

# tracer: wakeup_rt
#
# wakeup_rt latency trace v1.1.5 on 6.5.0-rt6-r1208-00003-g999d221864bf
# --------------------------------------------------------------------
# latency: 2552 us, #6/6, CPU#3 | (M:preempt_rt VP:0, KP:0, SP:0 HP:0 #P:4)
#    -----------------
#    | task: cyclictest-196 (uid:0 nice:0 policy:1 rt_prio:99)
#    -----------------
#
#                    _--------=> CPU#
#                   / _-------=> irqs-off/BH-disabled
#                  | / _------=> need-resched
#                  || / _-----=> need-resched-lazy
#                  ||| / _----=> hardirq/softirq
#                  |||| / _---=> preempt-depth
#                  ||||| / _--=> preempt-lazy-depth
#                  |||||| / _-=> migrate-disable
#                  ||||||| /     delay
#  cmd     pid     |||||||| time  |   caller
#     \   /        ||||||||  \    |    /
stress-n-206       3dn.h512    2us :      206:120:R   + [003]     196:  0:R cyclictest
stress-n-206       3dn.h512    7us : <stack trace>
 => __ftrace_trace_stack
 => __trace_stack
 => probe_wakeup
 => ttwu_do_activate
 => try_to_wake_up
 => wake_up_process
 => hrtimer_wakeup
 => __hrtimer_run_queues
 => hrtimer_interrupt
 => riscv_timer_interrupt
 => handle_percpu_devid_irq
 => generic_handle_domain_irq
 => riscv_intc_irq
 => handle_riscv_irq
 => do_irq
stress-n-206       3dn.h512    9us#: 0
stress-n-206       3d...3.. 2544us : __schedule
stress-n-206       3d...3.. 2545us :      206:120:R ==> [003]     196:  0:R cyclictest
stress-n-206       3d...3.. 2551us : <stack trace>
 => __ftrace_trace_stack
 => __trace_stack
 => probe_wakeup_sched_switch
 => __schedule
 => preempt_schedule
 => migrate_enable
 => rt_spin_unlock
 => madvise_cold_or_pageout_pte_range
 => walk_pgd_range
 => __walk_page_range
 => walk_page_range
 => madvise_pageout
 => madvise_vma_behavior
 => do_madvise
 => sys_madvise
 => do_trap_ecall_u
 => ret_from_exception

The tracing results after modification are as follows:

# tracer: wakeup_rt
#
# wakeup_rt latency trace v1.1.5 on 6.5.0-rt6-r1208-00004-gca3876fc69a6-dirty
# --------------------------------------------------------------------
# latency: 1689 us, #6/6, CPU#0 | (M:preempt_rt VP:0, KP:0, SP:0 HP:0 #P:4)
#    -----------------
#    | task: cyclictest-217 (uid:0 nice:0 policy:1 rt_prio:99)
#    -----------------
#
#                    _--------=> CPU#
#                   / _-------=> irqs-off/BH-disabled
#                  | / _------=> need-resched
#                  || / _-----=> need-resched-lazy
#                  ||| / _----=> hardirq/softirq
#                  |||| / _---=> preempt-depth
#                  ||||| / _--=> preempt-lazy-depth
#                  |||||| / _-=> migrate-disable
#                  ||||||| /     delay
#  cmd     pid     |||||||| time  |   caller
#     \   /        ||||||||  \    |    /
stress-n-232       0dn.h413    1us+:      232:120:R   + [000]     217:  0:R cyclictest
stress-n-232       0dn.h413   12us : <stack trace>
 => __ftrace_trace_stack
 => __trace_stack
 => probe_wakeup
 => ttwu_do_activate
 => try_to_wake_up
 => wake_up_process
 => hrtimer_wakeup
 => __hrtimer_run_queues
 => hrtimer_interrupt
 => riscv_timer_interrupt
 => handle_percpu_devid_irq
 => generic_handle_domain_irq
 => riscv_intc_irq
 => handle_riscv_irq
 => do_irq
stress-n-232       0dn.h413   19us#: 0
stress-n-232       0d...3.. 1671us : __schedule
stress-n-232       0d...3.. 1676us+:      232:120:R ==> [000]     217:  0:R cyclictest
stress-n-232       0d...3.. 1687us : <stack trace>
 => __ftrace_trace_stack
 => __trace_stack
 => probe_wakeup_sched_switch
 => __schedule
 => preempt_schedule
 => migrate_enable
 => free_unref_page_list
 => release_pages
 => free_pages_and_swap_cache
 => tlb_batch_pages_flush
 => tlb_flush_mmu
 => unmap_page_range
 => unmap_vmas
 => unmap_region
 => do_vmi_align_munmap.constprop.0
 => do_vmi_munmap
 => __vm_munmap
 => sys_munmap
 => do_trap_ecall_u
 => ret_from_exception

After the modification, the cause of maximum latency is no longer
madvise_cold_or_pageout_pte_range(), so this modification can reduce the
latency caused by madvise_cold_or_pageout_pte_range().

Currently the madvise_cold_or_pageout_pte_range() function exhibits
significant latency under memory pressure, which can be effectively
reduced by adding cond_resched() within the loop.

When the batch_count reaches SWAP_CLUSTER_MAX, we reschedule
the task to ensure fairness and avoid long lock holding times.

Link: https://lkml.kernel.org/r/85363861af65fac66c7a98c251906afc0d9c8098.1695291046.git.wangjiexun@tinylab.org
Signed-off-by: Jiexun Wang <wangjiexun@tinylab.org>
Cc: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agonilfs2: prevent WARNING in nilfs_sufile_set_segment_usage()
Ryusuke Konishi [Tue, 5 Dec 2023 08:59:47 +0000 (17:59 +0900)]
nilfs2: prevent WARNING in nilfs_sufile_set_segment_usage()

If nilfs2 reads a disk image with corrupted segment usage metadata, and
its segment usage information is marked as an error for the segment at the
write location, nilfs_sufile_set_segment_usage() can trigger WARN_ONs
during log writing.

Segments newly allocated for writing with nilfs_sufile_alloc() will not
have this error flag set, but this unexpected situation will occur if the
segment indexed by either nilfs->ns_segnum or nilfs->ns_nextnum (active
segment) was marked in error.

Fix this issue by inserting a sanity check to treat it as a file system
corruption.

Since error returns are not allowed during the execution phase where
nilfs_sufile_set_segment_usage() is used, this inserts the sanity check
into nilfs_sufile_mark_dirty() which pre-reads the buffer containing the
segment usage record to be updated and sets it up in a dirty state for
writing.

In addition, nilfs_sufile_set_segment_usage() is also called when
canceling log writing and undoing segment usage update, so in order to
avoid issuing the same kernel warning in that case, in case of
cancellation, avoid checking the error flag in
nilfs_sufile_set_segment_usage().

Link: https://lkml.kernel.org/r/20231205085947.4431-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+14e9f834f6ddecece094@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=14e9f834f6ddecece094
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agomm/hugetlb: have CONFIG_HUGETLB_PAGE select CONFIG_XARRAY_MULTI
Sidhartha Kumar [Mon, 4 Dec 2023 18:32:34 +0000 (10:32 -0800)]
mm/hugetlb: have CONFIG_HUGETLB_PAGE select CONFIG_XARRAY_MULTI

After commit a08c7193e4f1 "mm/filemap: remove hugetlb special casing in
filemap.c", hugetlb pages are stored in the page cache in base page sized
indexes.  This leads to multi index stores in the xarray which is only
supporting through CONFIG_XARRAY_MULTI.  The other page cache user of
multi index stores ,THP, selects XARRAY_MULTI.  Have CONFIG_HUGETLB_PAGE
follow this behavior as well to avoid the BUG() with a CONFIG_HUGETLB_PAGE
&& !CONFIG_XARRAY_MULTI config.

Link: https://lkml.kernel.org/r/20231204183234.348697-1-sidhartha.kumar@oracle.com
Fixes: a08c7193e4f1 ("mm/filemap: remove hugetlb special casing in filemap.c")
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agoscripts/gdb: fix lx-device-list-bus and lx-device-list-class
Florian Fainelli [Thu, 30 Nov 2023 04:33:16 +0000 (20:33 -0800)]
scripts/gdb: fix lx-device-list-bus and lx-device-list-class

After the conversion to bus_to_subsys() and class_to_subsys(), the gdb
scripts listing the system buses and classes respectively was broken, fix
those by returning the subsys_priv pointer and have the various caller
de-reference either the 'bus' or 'class' structure members accordingly.

Link: https://lkml.kernel.org/r/20231130043317.174188-1-florian.fainelli@broadcom.com
Fixes: 7b884b7f24b4 ("driver core: class.c: convert to only use class_to_subsys")
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Tested-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Kieran Bingham <kbingham@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agoMAINTAINERS: drop Antti Palosaari
Bagas Sanjaya [Thu, 30 Nov 2023 08:38:48 +0000 (15:38 +0700)]
MAINTAINERS: drop Antti Palosaari

He is currently inactive (last message from him is two years ago [1]).
His media tree [2] is also dormant (latest activity is 6 years ago), yet
his site is still online [3].

Drop him from MAINTAINERS and add CREDITS entry for him. We thank him
for maintaining various DVB drivers.

[1]: https://lore.kernel.org/all/660772b3-0597-02db-ed94-c6a9be04e8e8@iki.fi/
[2]: https://git.linuxtv.org/anttip/media_tree.git/
[3]: https://palosaari.fi/linux/

Link: https://lkml.kernel.org/r/20231130083848.5396-1-bagasdotme@gmail.com
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Acked-by: Antti Palosaari <crope@iki.fi>
Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agohighmem: fix a memory copy problem in memcpy_from_folio
Su Hui [Thu, 30 Nov 2023 03:40:18 +0000 (11:40 +0800)]
highmem: fix a memory copy problem in memcpy_from_folio

Clang static checker complains that value stored to 'from' is never read.
And memcpy_from_folio() only copy the last chunk memory from folio to
destination.  Use 'to += chunk' to replace 'from += chunk' to fix this
typo problem.

Link: https://lkml.kernel.org/r/20231130034017.1210429-1-suhui@nfschina.com
Fixes: b23d03ef7af5 ("highmem: add memcpy_to_folio() and memcpy_from_folio()")
Signed-off-by: Su Hui <suhui@nfschina.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jiaqi Yan <jiaqiyan@google.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Tom Rix <trix@redhat.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agonilfs2: fix missing error check for sb_set_blocksize call
Ryusuke Konishi [Wed, 29 Nov 2023 14:15:47 +0000 (23:15 +0900)]
nilfs2: fix missing error check for sb_set_blocksize call

When mounting a filesystem image with a block size larger than the page
size, nilfs2 repeatedly outputs long error messages with stack traces to
the kernel log, such as the following:

 getblk(): invalid block size 8192 requested
 logical block size: 512
 ...
 Call Trace:
  dump_stack_lvl+0x92/0xd4
  dump_stack+0xd/0x10
  bdev_getblk+0x33a/0x354
  __breadahead+0x11/0x80
  nilfs_search_super_root+0xe2/0x704 [nilfs2]
  load_nilfs+0x72/0x504 [nilfs2]
  nilfs_mount+0x30f/0x518 [nilfs2]
  legacy_get_tree+0x1b/0x40
  vfs_get_tree+0x18/0xc4
  path_mount+0x786/0xa88
  __ia32_sys_mount+0x147/0x1a8
  __do_fast_syscall_32+0x56/0xc8
  do_fast_syscall_32+0x29/0x58
  do_SYSENTER_32+0x15/0x18
  entry_SYSENTER_32+0x98/0xf1
 ...

This overloads the system logger.  And to make matters worse, it sometimes
crashes the kernel with a memory access violation.

This is because the return value of the sb_set_blocksize() call, which
should be checked for errors, is not checked.

The latter issue is due to out-of-buffer memory being accessed based on a
large block size that caused sb_set_blocksize() to fail for buffers read
with the initial minimum block size that remained unupdated in the
super_block structure.

Since nilfs2 mkfs tool does not accept block sizes larger than the system
page size, this has been overlooked.  However, it is possible to create
this situation by intentionally modifying the tool or by passing a
filesystem image created on a system with a large page size to a system
with a smaller page size and mounting it.

Fix this issue by inserting the expected error handling for the call to
sb_set_blocksize().

Link: https://lkml.kernel.org/r/20231129141547.4726-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agokernel/Kconfig.kexec: drop select of KEXEC for CRASH_DUMP
Baoquan He [Tue, 28 Nov 2023 05:44:57 +0000 (13:44 +0800)]
kernel/Kconfig.kexec: drop select of KEXEC for CRASH_DUMP

Ignat Korchagin complained that a potential config regression was
introduced by commit 89cde455915f ("kexec: consolidate kexec and crash
options into kernel/Kconfig.kexec").  Before the commit, CONFIG_CRASH_DUMP
has no dependency on CONFIG_KEXEC.  After the commit, CRASH_DUMP selects
KEXEC.  That enforces system to have CONFIG_KEXEC=y as long as
CONFIG_CRASH_DUMP=Y which people may not want.

In Ignat's case, he sets CONFIG_CRASH_DUMP=y, CONFIG_KEXEC_FILE=y and
CONFIG_KEXEC=n because kexec_load interface could have security issue if
kernel/initrd has no chance to be signed and verified.

CRASH_DUMP has select of KEXEC because Eric, author of above commit, met a
LKP report of build failure when posting patch of earlier version.  Please
see below link to get detail of the LKP report:

    https://lore.kernel.org/all/3e8eecd1-a277-2cfb-690e-5de2eb7b988e@oracle.com/T/#u

In fact, that LKP report is triggered because arm's <asm/kexec.h> is
wrapped in CONFIG_KEXEC ifdeffery scope.  That is wrong.  CONFIG_KEXEC
controls the enabling/disabling of kexec_load interface, but not kexec
feature.  Removing the wrongly added CONFIG_KEXEC ifdeffery scope in
<asm/kexec.h> of arm allows us to drop the select KEXEC for CRASH_DUMP.
Meanwhile, change arch/arm/kernel/Makefile to let machine_kexec.o
relocate_kernel.o depend on KEXEC_CORE.

Link: https://lkml.kernel.org/r/20231128054457.659452-1-bhe@redhat.com
Fixes: 89cde455915f ("kexec: consolidate kexec and crash options into kernel/Kconfig.kexec")
Signed-off-by: Baoquan He <bhe@redhat.com>
Reported-by: Ignat Korchagin <ignat@cloudflare.com>
Tested-by: Ignat Korchagin <ignat@cloudflare.com> [compile-time only]
Tested-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Eric DeVolder <eric_devolder@yahoo.com>
Tested-by: Eric DeVolder <eric_devolder@yahoo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agounits: add missing header
Andy Shevchenko [Tue, 28 Nov 2023 17:44:03 +0000 (19:44 +0200)]
units: add missing header

BITS_PER_BYTE is defined in bits.h.

Link: https://lkml.kernel.org/r/20231128174404.393393-1-andriy.shevchenko@linux.intel.com
Fixes: e8eed5f7366f ("units: Add BYTES_PER_*BIT")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Damian Muszynski <damian.muszynski@intel.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agodrivers/base/cpu: crash data showing should depends on KEXEC_CORE
Baoquan He [Tue, 28 Nov 2023 05:52:48 +0000 (13:52 +0800)]
drivers/base/cpu: crash data showing should depends on KEXEC_CORE

After commit 88a6f8994421 ("crash: memory and CPU hotplug sysfs
attributes"), on x86_64, if only below kernel configs related to kdump are
set, compiling error are triggered.

----
CONFIG_CRASH_CORE=y
CONFIG_KEXEC_CORE=y
CONFIG_CRASH_DUMP=y
CONFIG_CRASH_HOTPLUG=y
------

------------------------------------------------------
drivers/base/cpu.c: In function `crash_hotplug_show':
drivers/base/cpu.c:309:40: error: implicit declaration of function `crash_hotplug_cpu_support'; did you mean `crash_hotplug_show'? [-Werror=implicit-function-declaration]
  309 |         return sysfs_emit(buf, "%d\n", crash_hotplug_cpu_support());
      |                                        ^~~~~~~~~~~~~~~~~~~~~~~~~
      |                                        crash_hotplug_show
cc1: some warnings being treated as errors
------------------------------------------------------

CONFIG_KEXEC is used to enable kexec_load interface, the
crash_notes/crash_notes_size/crash_hotplug showing depends on
CONFIG_KEXEC is incorrect. It should depend on KEXEC_CORE instead.

Fix it now.

Link: https://lkml.kernel.org/r/20231128055248.659808-1-bhe@redhat.com
Fixes: 88a6f8994421 ("crash: memory and CPU hotplug sysfs attributes")
Signed-off-by: Baoquan He <bhe@redhat.com>
Tested-by: Ignat Korchagin <ignat@cloudflare.com> [compile-time only]
Tested-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Eric DeVolder <eric_devolder@yahoo.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agomm/damon/sysfs-schemes: add timeout for update_schemes_tried_regions
SeongJae Park [Fri, 24 Nov 2023 21:38:40 +0000 (21:38 +0000)]
mm/damon/sysfs-schemes: add timeout for update_schemes_tried_regions

If a scheme is set to not applied to any monitoring target region for any
reasons including the target access pattern, quota, filters, or
watermarks, writing 'update_schemes_tried_regions' to 'state' DAMON sysfs
file can indefinitely hang.  Fix the case by implementing a timeout for
the operation.  The time limit is two apply intervals of each scheme.

Link: https://lkml.kernel.org/r/20231124213840.39157-1-sj@kernel.org
Fixes: 4d4e41b68299 ("mm/damon/sysfs-schemes: do not update tried regions more than one DAMON snapshot")
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agoscripts/gdb/tasks: fix lx-ps command error
Kuan-Ying Lee [Mon, 27 Nov 2023 07:04:01 +0000 (15:04 +0800)]
scripts/gdb/tasks: fix lx-ps command error

Since commit 8e1f385104ac ("kill task_struct->thread_group") remove
the thread_group, we will encounter below issue.

(gdb) lx-ps
      TASK          PID    COMM
0xffff800086503340   0   swapper/0
Python Exception <class 'gdb.error'>: There is no member named thread_group.
Error occurred in Python: There is no member named thread_group.

We use signal->thread_head to iterate all threads instead.

[Kuan-Ying.Lee@mediatek.com: v2]
Link: https://lkml.kernel.org/r/20231129065142.13375-2-Kuan-Ying.Lee@mediatek.com
Link: https://lkml.kernel.org/r/20231127070404.4192-2-Kuan-Ying.Lee@mediatek.com
Fixes: 8e1f385104ac ("kill task_struct->thread_group")
Signed-off-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Tested-by: Florian Fainelli <florian.fainelli@broadcom.com>
Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Cc: Chinwen Chang <chinwen.chang@mediatek.com>
Cc: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Qun-Wei Lin <qun-wei.lin@mediatek.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agomm/Kconfig: make userfaultfd a menuconfig
Peter Xu [Thu, 23 Nov 2023 22:42:04 +0000 (17:42 -0500)]
mm/Kconfig: make userfaultfd a menuconfig

PTE_MARKER_UFFD_WP is a subconfig for userfaultfd.  To make it clear,
switch to use menuconfig for userfaultfd.

Link: https://lkml.kernel.org/r/20231123224204.1060152-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agoselftests/mm: prevent duplicate runs caused by TEST_GEN_PROGS
Nico Pache [Mon, 20 Nov 2023 22:29:08 +0000 (15:29 -0700)]
selftests/mm: prevent duplicate runs caused by TEST_GEN_PROGS

Commit 05f1edac8009 ("selftests/mm: run all tests from run_vmtests.sh")
fixed the inconsistency caused by tests being defined as TEST_GEN_PROGS.
This issue was leading to tests not being executed via run_vmtests.sh and
furthermore some tests running twice due to the kselftests wrapper also
executing them.

Fix the definition of two tests (soft-dirty and pagemap_ioctl) that are
still incorrectly defined.

Link: https://lkml.kernel.org/r/20231120222908.28559-1-npache@redhat.com
Signed-off-by: Nico Pache <npache@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Joel Savitz <jsavitz@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agomm/damon/core: copy nr_accesses when splitting region
SeongJae Park [Sun, 19 Nov 2023 17:15:28 +0000 (17:15 +0000)]
mm/damon/core: copy nr_accesses when splitting region

Regions split function ('damon_split_region_at()') is called at the
beginning of an aggregation interval, and when DAMOS applying the actions
and charging quota.  Because 'nr_accesses' fields of all regions are reset
at the beginning of each aggregation interval, and DAMOS was applying the
action at the end of each aggregation interval, there was no need to copy
the 'nr_accesses' field to the split-out region.

However, commit 42f994b71404 ("mm/damon/core: implement scheme-specific
apply interval") made DAMOS applies action on its own timing interval.
Hence, 'nr_accesses' should also copied to split-out regions, but the
commit didn't.  Fix it by copying it.

Link: https://lkml.kernel.org/r/20231119171529.66863-1-sj@kernel.org
Fixes: 42f994b71404 ("mm/damon/core: implement scheme-specific apply interval")
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agolib/group_cpus.c: avoid acquiring cpu hotplug lock in group_cpus_evenly
Ming Lei [Mon, 20 Nov 2023 08:35:59 +0000 (16:35 +0800)]
lib/group_cpus.c: avoid acquiring cpu hotplug lock in group_cpus_evenly

group_cpus_evenly() could be part of storage driver's error handler, such
as nvme driver, when may happen during CPU hotplug, in which storage queue
has to drain its pending IOs because all CPUs associated with the queue
are offline and the queue is becoming inactive.  And handling IO needs
error handler to provide forward progress.

Then deadlock is caused:

1) inside CPU hotplug handler, CPU hotplug lock is held, and blk-mq's
   handler is waiting for inflight IO

2) error handler is waiting for CPU hotplug lock

3) inflight IO can't be completed in blk-mq's CPU hotplug handler
   because error handling can't provide forward progress.

Solve the deadlock by not holding CPU hotplug lock in group_cpus_evenly(),
in which two stage spreads are taken: 1) the 1st stage is over all present
CPUs; 2) the end stage is over all other CPUs.

Turns out the two stage spread just needs consistent 'cpu_present_mask',
and remove the CPU hotplug lock by storing it into one local cache.  This
way doesn't change correctness, because all CPUs are still covered.

Link: https://lkml.kernel.org/r/20231120083559.285174-1-ming.lei@redhat.com
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Reported-by: Guangwu Zhang <guazhang@redhat.com>
Tested-by: Guangwu Zhang <guazhang@redhat.com>
Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Cc: Keith Busch <kbusch@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agocheckstack: fix printed address
Heiko Carstens [Mon, 20 Nov 2023 18:37:17 +0000 (19:37 +0100)]
checkstack: fix printed address

All addresses printed by checkstack have an extra incorrect 0 appended at
the end.

This was introduced with commit 677f1410e058 ("scripts/checkstack.pl: don't
display $dre as different entity"): since then the address is taken from
the line which contains the function name, instead of the line which
contains stack consumption. E.g. on s390:

0000000000100a30 <do_one_initcall>:
...
  100a44:       e3 f0 ff 70 ff 71       lay     %r15,-144(%r15)

So the used regex which matches spaces and hexadecimal numbers to extract
an address now matches a different substring. Subsequently replacing spaces
with 0 appends a zero at the and, instead of replacing leading spaces.

Fix this by using the proper regex, and simplify the code a bit.

Link: https://lkml.kernel.org/r/20231120183719.2188479-2-hca@linux.ibm.com
Fixes: 677f1410e058 ("scripts/checkstack.pl: don't display $dre as different entity")
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Cc: Maninder Singh <maninder1.s@samsung.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Vaneet Narang <v.narang@samsung.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agomm/memory_hotplug: fix error handling in add_memory_resource()
Sumanth Korikkar [Mon, 20 Nov 2023 14:53:53 +0000 (15:53 +0100)]
mm/memory_hotplug: fix error handling in add_memory_resource()

In add_memory_resource(), creation of memory block devices occurs after
successful call to arch_add_memory().  However, creation of memory block
devices could fail.  In that case, arch_remove_memory() is called to
perform necessary cleanup.

Currently with or without altmap support, arch_remove_memory() is always
passed with altmap set to NULL during error handling.  This leads to
freeing of struct pages using free_pages(), eventhough the allocation
might have been performed with altmap support via
altmap_alloc_block_buf().

Fix the error handling by passing altmap in arch_remove_memory(). This
ensures the following:
* When altmap is disabled, deallocation of the struct pages array occurs
  via free_pages().
* When altmap is enabled, deallocation occurs via vmem_altmap_free().

Link: https://lkml.kernel.org/r/20231120145354.308999-3-sumanthk@linux.ibm.com
Fixes: a08a2ae34613 ("mm,memory_hotplug: allocate memmap from the added memory range")
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: kernel test robot <lkp@intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: <stable@vger.kernel.org> [5.15+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agomm/memory_hotplug: add missing mem_hotplug_lock
Sumanth Korikkar [Mon, 20 Nov 2023 14:53:52 +0000 (15:53 +0100)]
mm/memory_hotplug: add missing mem_hotplug_lock

From Documentation/core-api/memory-hotplug.rst:
When adding/removing/onlining/offlining memory or adding/removing
heterogeneous/device memory, we should always hold the mem_hotplug_lock
in write mode to serialise memory hotplug (e.g. access to global/zone
variables).

mhp_(de)init_memmap_on_memory() functions can change zone stats and
struct page content, but they are currently called w/o the
mem_hotplug_lock.

When memory block is being offlined and when kmemleak goes through each
populated zone, the following theoretical race conditions could occur:
CPU 0:      | CPU 1:
memory_offline()      |
-> offline_pages()      |
-> mem_hotplug_begin()      |
   ...      |
-> mem_hotplug_done()      |
     | kmemleak_scan()
     | -> get_online_mems()
     |    ...
-> mhp_deinit_memmap_on_memory()      |
  [not protected by mem_hotplug_begin/done()]|
  Marks memory section as offline,      |   Retrieves zone_start_pfn
  poisons vmemmap struct pages and updates   |   and struct page members.
  the zone related data      |
         |    ...
         | -> put_online_mems()

Fix this by ensuring mem_hotplug_lock is taken before performing
mhp_init_memmap_on_memory().  Also ensure that
mhp_deinit_memmap_on_memory() holds the lock.

online/offline_pages() are currently only called from
memory_block_online/offline(), so it is safe to move the locking there.

Link: https://lkml.kernel.org/r/20231120145354.308999-2-sumanthk@linux.ibm.com
Fixes: a08a2ae34613 ("mm,memory_hotplug: allocate memmap from the added memory range")
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: kernel test robot <lkp@intel.com>
Cc: <stable@vger.kernel.org> [5.15+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months ago.mailmap: add a new address mapping for Chester Lin
Chester Lin [Fri, 17 Nov 2023 02:28:07 +0000 (10:28 +0800)]
.mailmap: add a new address mapping for Chester Lin

My company email address is going to be disabled so let's create a mapping
that links to my private/community email just in case people might still
try to reach me via the old one.

Link: https://lkml.kernel.org/r/20231117022807.29461-1-clin@suse.com
Signed-off-by: Chester Lin <clin@suse.com>
Cc: Chester Lin <chester62515@gmail.com>
Cc: Bjorn Andersson <quic_bjorande@quicinc.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Heiko Stuebner <heiko@sntech.de>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Konrad Dybcio <konrad.dybcio@linaro.org>
Cc: Oleksij Rempel <o.rempel@pengutronix.de>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Conor Dooley <conor.dooley@microchip.com>
Cc: Matthias Brugger <mbrugger@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agomm: fix oops when filemap_map_pmd() without prealloc_pte
Hugh Dickins [Fri, 17 Nov 2023 08:49:18 +0000 (00:49 -0800)]
mm: fix oops when filemap_map_pmd() without prealloc_pte

syzbot reports oops in lockdep's __lock_acquire(), called from
__pte_offset_map_lock() called from filemap_map_pages(); or when I run the
repro, the oops comes in pmd_install(), called from filemap_map_pmd()
called from filemap_map_pages(), just before the __pte_offset_map_lock().

The problem is that filemap_map_pmd() has been assuming that when it finds
pmd_none(), a page table has already been prepared in prealloc_pte; and
indeed do_fault_around() has been careful to preallocate one there, when
it finds pmd_none(): but what if *pmd became none in between?

My 6.6 mods in mm/khugepaged.c, avoiding mmap_lock for write, have made it
easy for *pmd to be cleared while servicing a page fault; but even before
those, a huge *pmd might be zapped while a fault is serviced.

The difference in symptomatic stack traces comes from the "memory model"
in use: pmd_install() uses pmd_populate() uses page_to_pfn(): in some
models that is strict, and will oops on the NULL prealloc_pte; in other
models, it will construct a bogus value to be populated into *pmd, then
__pte_offset_map_lock() oops when trying to access split ptlock pointer
(or some other symptom in normal case of ptlock embedded not pointer).

Link: https://lore.kernel.org/linux-mm/20231115065506.19780-1-jose.pekkarinen@foxhound.fi/
Link: https://lkml.kernel.org/r/6ed0c50c-78ef-0719-b3c5-60c0c010431c@google.com
Fixes: f9ce0be71d1f ("mm: Cleanup faultaround and finish_fault() codepaths")
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-and-tested-by: syzbot+89edd67979b52675ddec@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-mm/0000000000005e44550608a0806c@google.com/
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Jann Horn <jannh@google.com>,
Cc: José Pekkarinen <jose.pekkarinen@foxhound.fi>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: <stable@vger.kernel.org> [5.12+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agosquashfs: squashfs_read_data need to check if the length is 0
Lizhi Xu [Thu, 16 Nov 2023 03:13:52 +0000 (11:13 +0800)]
squashfs: squashfs_read_data need to check if the length is 0

When the length passed in is 0, the pagemap_scan_test_walk() caller should
bail.  This error causes at least a WARN_ON().

Link: https://lkml.kernel.org/r/20231116031352.40853-1-lizhi.xu@windriver.com
Reported-by: syzbot+32d3767580a1ea339a81@syzkaller.appspotmail.com
Closes: https://lkml.kernel.org/r/0000000000000526f2060a30a085@google.com
Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com>
Reviewed-by: Phillip Lougher <phillip@squashfs.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agomm/selftests: fix pagemap_ioctl memory map test
Peter Xu [Thu, 16 Nov 2023 20:15:47 +0000 (15:15 -0500)]
mm/selftests: fix pagemap_ioctl memory map test

__FILE__ is not guaranteed to exist in current dir.  Replace that with
argv[0] for memory map test.

Link: https://lkml.kernel.org/r/20231116201547.536857-4-peterx@redhat.com
Fixes: 46fd75d4a3c9 ("selftests: mm: add pagemap ioctl tests")
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agomm/pagemap: fix wr-protect even if PM_SCAN_WP_MATCHING not set
Peter Xu [Thu, 16 Nov 2023 20:15:46 +0000 (15:15 -0500)]
mm/pagemap: fix wr-protect even if PM_SCAN_WP_MATCHING not set

The new pagemap ioctl contains a fast path for wr-protections without
looking into category masks.  It forgets to check PM_SCAN_WP_MATCHING
before applying the wr-protections.  It can cause, e.g., pte markers
installed on archs that do not even support uffd wr-protect.

WARNING: CPU: 0 PID: 5059 at mm/memory.c:1520 zap_pte_range mm/memory.c:1520 [inline]

Link: https://lkml.kernel.org/r/20231116201547.536857-3-peterx@redhat.com
Fixes: 12f6b01a0bcb ("fs/proc/task_mmu: add fast paths to get/clear PAGE_IS_WRITTEN flag")
Signed-off-by: Peter Xu <peterx@redhat.com>
Reported-by: syzbot+7ca4b2719dc742b8d0a4@syzkaller.appspotmail.com
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Andrei Vagin <avagin@gmail.com>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agomm/pagemap: fix ioctl(PAGEMAP_SCAN) on vma check
Peter Xu [Thu, 16 Nov 2023 20:15:45 +0000 (15:15 -0500)]
mm/pagemap: fix ioctl(PAGEMAP_SCAN) on vma check

Patch series "mm/pagemap: A few fixes to the recent PAGEMAP_SCAN".

This series should fix two known reports from syzbot on the new
PAGEMAP_SCAN ioctl():

https://lore.kernel.org/all/000000000000b0e576060a30ee3b@google.com/
https://lore.kernel.org/all/000000000000773fa7060a31e2cc@google.com/

The 3rd patch is something I found when testing these patches.

This patch (of 3):

The new ioctl(PAGEMAP_SCAN) relies on vma wr-protect capability provided
by userfault, however in the vma test it didn't explicitly require the vma
to have wr-protect function enabled, even if PM_SCAN_WP_MATCHING flag is
set.

It means the pagemap code can now apply uffd-wp bit to a page in the vma
even if not registered to userfaultfd at all.

Then in whatever way as long as the pte got written and page fault
resolved, we'll apply the write bit even if uffd-wp bit is set.  We'll see
a pte that has both UFFD_WP and WRITE bit set.  Anything later that looks
up the pte for uffd-wp bit will trigger the warning:

WARNING: CPU: 1 PID: 5071 at arch/x86/include/asm/pgtable.h:403 pte_uffd_wp arch/x86/include/asm/pgtable.h:403 [inline]

Fix it by doing proper check over the vma attributes when
PM_SCAN_WP_MATCHING is specified.

Link: https://lkml.kernel.org/r/20231116201547.536857-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20231116201547.536857-2-peterx@redhat.com
Fixes: 52526ca7fdb9 ("fs/proc/task_mmu: implement IOCTL to get and optionally clear info about PTEs")
Signed-off-by: Peter Xu <peterx@redhat.com>
Reported-by: syzbot+e94c5aaf7890901ebf9b@syzkaller.appspotmail.com
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Andrei Vagin <avagin@gmail.com>
Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agomm: kmem: properly initialize local objcg variable in current_obj_cgroup()
Roman Gushchin [Thu, 16 Nov 2023 02:51:09 +0000 (18:51 -0800)]
mm: kmem: properly initialize local objcg variable in current_obj_cgroup()

Erhard reported that the 6.7-rc1 kernel panics on boot if being
built with clang-16. The problem was not reproducible with gcc.

[    5.975049] general protection fault, probably for non-canonical address 0xf555515555555557: 0000 [#1] SMP KASAN PTI
[    5.976422] KASAN: maybe wild-memory-access in range [0xaaaaaaaaaaaaaab8-0xaaaaaaaaaaaaaabf]
[    5.977475] CPU: 3 PID: 1 Comm: systemd Not tainted 6.7.0-rc1-Zen3 #77
[    5.977860] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
[    5.977860] RIP: 0010:obj_cgroup_charge_pages+0x27/0x2d5
[    5.977860] Code: 90 90 90 55 41 57 41 56 41 55 41 54 53 89 d5 41 89 f6 49 89 ff 48 b8 00 00 00 00 00 fc ff df 49 83 c7 10 4d3
[    5.977860] RSP: 0018:ffffc9000001fb18 EFLAGS: 00010a02
[    5.977860] RAX: dffffc0000000000 RBX: aaaaaaaaaaaaaaaa RCX: ffff8883eb9a8b08
[    5.977860] RDX: 0000000000000005 RSI: 0000000000400cc0 RDI: aaaaaaaaaaaaaaaa
[    5.977860] RBP: 0000000000000005 R08: 3333333333333333 R09: 0000000000000000
[    5.977860] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8883eb9a8b18
[    5.977860] R13: 1555555555555557 R14: 0000000000400cc0 R15: aaaaaaaaaaaaaaba
[    5.977860] FS:  00007f2976438b40(0000) GS:ffff8883eb980000(0000) knlGS:0000000000000000
[    5.977860] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    5.977860] CR2: 00007f29769e0060 CR3: 0000000107222003 CR4: 0000000000370eb0
[    5.977860] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    5.977860] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    5.977860] Call Trace:
[    5.977860]  <TASK>
[    5.977860]  ? __die_body+0x16/0x75
[    5.977860]  ? die_addr+0x4a/0x70
[    5.977860]  ? exc_general_protection+0x1c9/0x2d0
[    5.977860]  ? cgroup_mkdir+0x455/0x9fb
[    5.977860]  ? __x64_sys_mkdir+0x69/0x80
[    5.977860]  ? asm_exc_general_protection+0x26/0x30
[    5.977860]  ? obj_cgroup_charge_pages+0x27/0x2d5
[    5.977860]  obj_cgroup_charge+0x114/0x1ab
[    5.977860]  pcpu_alloc+0x1a6/0xa65
[    5.977860]  ? mem_cgroup_css_alloc+0x1eb/0x1140
[    5.977860]  ? cgroup_apply_control_enable+0x26b/0x7c0
[    5.977860]  mem_cgroup_css_alloc+0x23f/0x1140
[    5.977860]  cgroup_apply_control_enable+0x26b/0x7c0
[    5.977860]  ? cgroup_kn_set_ugid+0x2d/0x1a0
[    5.977860]  cgroup_mkdir+0x455/0x9fb
[    5.977860]  ? __cfi_cgroup_mkdir+0x10/0x10
[    5.977860]  kernfs_iop_mkdir+0x130/0x170
[    5.977860]  vfs_mkdir+0x405/0x530
[    5.977860]  do_mkdirat+0x188/0x1f0
[    5.977860]  __x64_sys_mkdir+0x69/0x80
[    5.977860]  do_syscall_64+0x7d/0x100
[    5.977860]  ? do_syscall_64+0x89/0x100
[    5.977860]  ? do_syscall_64+0x89/0x100
[    5.977860]  ? do_syscall_64+0x89/0x100
[    5.977860]  ? do_syscall_64+0x89/0x100
[    5.977860]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
[    5.977860] RIP: 0033:0x7f297671defb
[    5.977860] Code: 8b 05 39 7f 0d 00 bb ff ff ff ff 64 c7 00 16 00 00 00 e9 61 ff ff ff e8 23 0c 02 00 0f 1f 00 f3 0f 1e fa b88
[    5.977860] RSP: 002b:00007ffee6242bb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000053
[    5.977860] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f297671defb
[    5.977860] RDX: 0000000000000000 RSI: 00000000000001ed RDI: 000055c6b449f0e0
[    5.977860] RBP: 00007ffee6242bf0 R08: 000000000000000e R09: 0000000000000000
[    5.977860] R10: 0000000000000000 R11: 0000000000000246 R12: 000055c6b445db80
[    5.977860] R13: 00000000000003a0 R14: 00007f2976a68651 R15: 00000000000003a0
[    5.977860]  </TASK>
[    5.977860] Modules linked in:
[    6.014095] ---[ end trace 0000000000000000 ]---
[    6.014701] RIP: 0010:obj_cgroup_charge_pages+0x27/0x2d5
[    6.015348] Code: 90 90 90 55 41 57 41 56 41 55 41 54 53 89 d5 41 89 f6 49 89 ff 48 b8 00 00 00 00 00 fc ff df 49 83 c7 10 4d3
[    6.017575] RSP: 0018:ffffc9000001fb18 EFLAGS: 00010a02
[    6.018255] RAX: dffffc0000000000 RBX: aaaaaaaaaaaaaaaa RCX: ffff8883eb9a8b08
[    6.019120] RDX: 0000000000000005 RSI: 0000000000400cc0 RDI: aaaaaaaaaaaaaaaa
[    6.019983] RBP: 0000000000000005 R08: 3333333333333333 R09: 0000000000000000
[    6.020849] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8883eb9a8b18
[    6.021747] R13: 1555555555555557 R14: 0000000000400cc0 R15: aaaaaaaaaaaaaaba
[    6.022609] FS:  00007f2976438b40(0000) GS:ffff8883eb980000(0000) knlGS:0000000000000000
[    6.023593] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    6.024296] CR2: 00007f29769e0060 CR3: 0000000107222003 CR4: 0000000000370eb0
[    6.025279] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    6.026139] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    6.027000] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b

Actually the problem is caused by uninitialized local variable in
current_obj_cgroup().  If the root memory cgroup is set as an active
memory cgroup for a charging scope (as in the trace, where systemd tries
to create the first non-root cgroup, so the parent cgroup is the root
cgroup), the "for" loop is skipped and uninitialized objcg is returned,
causing a panic down the accounting stack.

The fix is trivial: initialize the objcg variable to NULL unconditionally
before the "for" loop.

[vbabka@suse.cz: remove redundant assignment]
Link: https://lkml.kernel.org/r/4bd106d5-c3e3-6731-9a74-cff81e2392de@suse.cz
Link: https://lkml.kernel.org/r/20231116025109.3775055-1-roman.gushchin@linux.dev
Fixes: e86828e5446d ("mm: kmem: scoped objcg protection")
Signed-off-by: Roman Gushchin (Cruise) <roman.gushchin@linux.dev>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Erhard Furtner <erhard_f@mailbox.org>
Closes: https://github.com/ClangBuiltLinux/linux/issues/1959
Tested-by: Erhard Furtner <erhard_f@mailbox.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agomm/kmemleak: move set_track_prepare() outside raw_spinlocks
Liu Shixin [Wed, 15 Nov 2023 08:21:38 +0000 (16:21 +0800)]
mm/kmemleak: move set_track_prepare() outside raw_spinlocks

set_track_prepare() will call __alloc_pages() which attempts to acquire
zone->lock(spinlocks), so move it outside object->lock(raw_spinlocks)
because it's not right to acquire spinlocks while holding raw_spinlocks in
RT mode.

Link: https://lkml.kernel.org/r/20231115082138.2649870-3-liushixin2@huawei.com
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Patrick Wang <patrick.wang.shcn@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agoRevert "mm/kmemleak: move the initialisation of object to __link_object"
Liu Shixin [Wed, 15 Nov 2023 08:21:37 +0000 (16:21 +0800)]
Revert "mm/kmemleak: move the initialisation of object to __link_object"

Patch series "Fix invalid wait context of set_track_prepare()".

Geert reported an invalid wait context[1] which is resulted by moving
set_track_prepare() inside kmemleak_lock.  This is not allowed because in
RT mode, the spinlocks can be preempted but raw_spinlocks can not, so it
is not allowd to acquire spinlocks while holding raw_spinlocks.  The
second patch fix same problem in kmemleak_update_trace().

This patch (of 2):

Move the initialisation of object back to__alloc_object() because
set_track_prepare() attempt to acquire zone->lock(spinlocks) while
__link_object is holding kmemleak_lock(raw_spinlocks).  This is not right
for RT mode.

This reverts commit 245245c2fffd00 ("mm/kmemleak: move the initialisation
of object to __link_object").

Link: https://lkml.kernel.org/r/20231115082138.2649870-1-liushixin2@huawei.com
Link: https://lkml.kernel.org/r/20231115082138.2649870-2-liushixin2@huawei.com
Fixes: 245245c2fffd ("mm/kmemleak: move the initialisation of object to __link_object")
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reported-by: Geert Uytterhoeven <geert+renesas@glider.be>
Closes: https://lore.kernel.org/linux-mm/CAMuHMdWj0UzwNaxUvcocTfh481qRJpOWwXxsJCTJfu1oCqvgdA@mail.gmail.com/ [1]
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Patrick Wang <patrick.wang.shcn@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agomm/memory.c:zap_pte_range() print bad swap entry
Andrew Morton [Wed, 15 Nov 2023 21:54:18 +0000 (13:54 -0800)]
mm/memory.c:zap_pte_range() print bad swap entry

We have a report of this WARN() triggering.  Let's print the offending
swp_entry_t to help diagnosis.

Link: https://lkml.kernel.org/r/000000000000b0e576060a30ee3b@google.com
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agohugetlb: fix null-ptr-deref in hugetlb_vma_lock_write
Mike Kravetz [Tue, 14 Nov 2023 01:20:33 +0000 (17:20 -0800)]
hugetlb: fix null-ptr-deref in hugetlb_vma_lock_write

The routine __vma_private_lock tests for the existence of a reserve map
associated with a private hugetlb mapping.  A pointer to the reserve map
is in vma->vm_private_data.  __vma_private_lock was checking the pointer
for NULL.  However, it is possible that the low bits of the pointer could
be used as flags.  In such instances, vm_private_data is not NULL and not
a valid pointer.  This results in the null-ptr-deref reported by syzbot:

general protection fault, probably for non-canonical address 0xdffffc000000001d:
 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x00000000000000e8-0x00000000000000ef]
CPU: 0 PID: 5048 Comm: syz-executor139 Not tainted 6.6.0-rc7-syzkaller-00142-g88
8cf78c29e2 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 1
0/09/2023
RIP: 0010:__lock_acquire+0x109/0x5de0 kernel/locking/lockdep.c:5004
...
Call Trace:
 <TASK>
 lock_acquire kernel/locking/lockdep.c:5753 [inline]
 lock_acquire+0x1ae/0x510 kernel/locking/lockdep.c:5718
 down_write+0x93/0x200 kernel/locking/rwsem.c:1573
 hugetlb_vma_lock_write mm/hugetlb.c:300 [inline]
 hugetlb_vma_lock_write+0xae/0x100 mm/hugetlb.c:291
 __hugetlb_zap_begin+0x1e9/0x2b0 mm/hugetlb.c:5447
 hugetlb_zap_begin include/linux/hugetlb.h:258 [inline]
 unmap_vmas+0x2f4/0x470 mm/memory.c:1733
 exit_mmap+0x1ad/0xa60 mm/mmap.c:3230
 __mmput+0x12a/0x4d0 kernel/fork.c:1349
 mmput+0x62/0x70 kernel/fork.c:1371
 exit_mm kernel/exit.c:567 [inline]
 do_exit+0x9ad/0x2a20 kernel/exit.c:861
 __do_sys_exit kernel/exit.c:991 [inline]
 __se_sys_exit kernel/exit.c:989 [inline]
 __x64_sys_exit+0x42/0x50 kernel/exit.c:989
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

Mask off low bit flags before checking for NULL pointer.  In addition, the
reserve map only 'belongs' to the OWNER (parent in parent/child
relationships) so also check for the OWNER flag.

Link: https://lkml.kernel.org/r/20231114012033.259600-1-mike.kravetz@oracle.com
Reported-by: syzbot+6ada951e7c0f7bc8a71e@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-mm/00000000000078d1e00608d7878b@google.com/
Fixes: bf4916922c60 ("hugetlbfs: extend hugetlb_vma_lock to private VMAs")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Rik van Riel <riel@surriel.com>
Cc: Edward Adam Davis <eadavis@qq.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Tom Rix <trix@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agoMAINTAINERS: add Andrew Morton for lib/*
Andrew Morton [Tue, 14 Nov 2023 23:02:04 +0000 (15:02 -0800)]
MAINTAINERS: add Andrew Morton for lib/*

Add myself as the fallthough maintainer for material under lib/.

Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
5 months agoLinux 6.7-rc4
Linus Torvalds [Sun, 3 Dec 2023 09:52:56 +0000 (18:52 +0900)]
Linux 6.7-rc4

5 months agoMerge tag 'v6.7-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Sun, 3 Dec 2023 00:08:26 +0000 (09:08 +0900)]
Merge tag 'v6.7-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:

 - Two fallocate fixes

 - Fix warnings from new gcc

 - Two symlink fixes

* tag 'v6.7-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb: client, common: fix fortify warnings
  cifs: Fix FALLOC_FL_INSERT_RANGE by setting i_size after EOF moved
  cifs: Fix FALLOC_FL_ZERO_RANGE by setting i_size if EOF moved
  smb: client: report correct st_size for SMB and NFS symlinks
  smb: client: fix missing mode bits for SMB symlinks

5 months agoMerge tag 'firewire-fixes-6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 3 Dec 2023 00:03:07 +0000 (09:03 +0900)]
Merge tag 'firewire-fixes-6.7-rc4' of git://git./linux/kernel/git/ieee1394/linux1394

Pull firewire fix from Takashi Sakamoto:
 "A single patch to fix long-standing issue of memory leak at failure of
  device registration for fw_unit. We rarely encounter the issue, but it
  should be applied to stable releases, since it fixes inappropriate API
  usage"

* tag 'firewire-fixes-6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
  firewire: core: fix possible memory leak in create_units()

5 months agoMerge tag 'powerpc-6.7-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc...
Linus Torvalds [Sat, 2 Dec 2023 23:43:35 +0000 (08:43 +0900)]
Merge tag 'powerpc-6.7-3' of git://git./linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - Fix corruption of f0/vs0 during FP/Vector save, seen as userspace
   crashes when using io-uring workers (in particular with MariaDB)

 - Fix KVM_RUN potentially clobbering all host userspace FP/Vector
   registers

Thanks to Timothy Pearson, Jens Axboe, and Nicholas Piggin.

* tag 'powerpc-6.7-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  KVM: PPC: Book3S HV: Fix KVM_RUN clobbering FP/VEC user registers
  powerpc: Don't clobber f0/vs0 during fp|altivec register save

5 months agoMerge tag 'vfio-v6.7-rc4' of https://github.com/awilliam/linux-vfio
Linus Torvalds [Sat, 2 Dec 2023 23:37:39 +0000 (08:37 +0900)]
Merge tag 'vfio-v6.7-rc4' of https://github.com/awilliam/linux-vfio

Pull vfio fixes from Alex Williamson:

 - Fix the lifecycle of a mutex in the pds variant driver such that a
   reset prior to opening the device won't find it uninitialized.
   Implement the release path to symmetrically destroy the mutex. Also
   switch a different lock from spinlock to mutex as the code path has
   the potential to sleep and doesn't need the spinlock context
   otherwise (Brett Creeley)

 - Fix an issue detected via randconfig where KVM tries to symbol_get an
   undeclared function. The symbol is temporarily declared
   unconditionally here, which resolves the problem and avoids churn
   relative to a series pending for the next merge window which resolves
   some of this symbol ugliness, but also fixes Kconfig dependencies
   (Sean Christopherson)

* tag 'vfio-v6.7-rc4' of https://github.com/awilliam/linux-vfio:
  vfio: Drop vfio_file_iommu_group() stub to fudge around a KVM wart
  vfio/pds: Fix possible sleep while in atomic context
  vfio/pds: Fix mutex lock->magic != lock warning

5 months agoMerge tag 'for-linus-6.7a-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 2 Dec 2023 23:31:53 +0000 (08:31 +0900)]
Merge tag 'for-linus-6.7a-rc4-tag' of git://git./linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:

 - A fix for the Xen event driver setting the correct return value when
   experiencing an allocation failure

 - A fix for allocating space for a struct in the percpu area to not
   cross page boundaries (this one is for x86, a similar one for Arm was
   already in the pull request for rc3)

* tag 'for-linus-6.7a-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/events: fix error code in xen_bind_pirq_msi_to_irq()
  x86/xen: fix percpu vcpu_info allocation

5 months agoMerge tag 'probes-fixes-v6.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 2 Dec 2023 23:02:49 +0000 (08:02 +0900)]
Merge tag 'probes-fixes-v6.7-rc3' of git://git./linux/kernel/git/trace/linux-trace

Pull probes fixes from Masami Hiramatsu:

 - objpool: Fix objpool overrun case on memory/cache access delay
   especially on the big.LITTLE SoC. The objpool uses a copy of object
   slot index internal loop, but the slot index can be changed on
   another processor in parallel. In that case, the difference of 'head'
   local copy and the 'slot->last' index will be bigger than local slot
   size. In that case, we need to re-read the slot::head to update it.

 - kretprobe: Fix to use appropriate rcu API for kretprobe holder. Since
   kretprobe_holder::rp is RCU managed, it should use
   rcu_assign_pointer() and rcu_dereference_check() correctly. Also
   adding __rcu tag for finding wrong usage by sparse.

 - rethook: Fix to use appropriate rcu API for rethook::handler. The
   same as kretprobe, rethook::handler is RCU managed and it should use
   rcu_assign_pointer() and rcu_dereference_check(). This also adds
   __rcu tag for finding wrong usage by sparse.

* tag 'probes-fixes-v6.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  rethook: Use __rcu pointer for rethook::handler
  kprobes: consistent rcu api usage for kretprobe holder
  lib: objpool: fix head overrun on RK3588 SBC

5 months agoMerge tag 'pm-6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Linus Torvalds [Sat, 2 Dec 2023 00:01:00 +0000 (09:01 +0900)]
Merge tag 'pm-6.7-rc4' of git://git./linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
 "These fix issues in two cpufreq drivers, in the AMD P-state driver and
  in the power-capping DTPM framework.

  Specifics:

   - Fix the AMD P-state driver's EPP sysfs interface in the cases when
     the performance governor is in use (Ayush Jain)

   - Make the ->fast_switch() callback in the AMD P-state driver return
     the target frequency as expected (Gautham R. Shenoy)

   - Allow user space to control the range of frequencies to use via
     scaling_min_freq and scaling_max_freq when AMD P-state driver is in
     use (Wyes Karny)

   - Prevent power domains needed for wakeup signaling from being turned
     off during system suspend on Qualcomm systems and prevent
     performance states votes from runtime-suspended devices from being
     lost across a system suspend-resume cycle in qcom-cpufreq-nvmem
     (Stephan Gerhold)

   - Fix disabling the 792 Mhz OPP in the imx6q cpufreq driver for the
     i.MX6ULL types that can run at that frequency (Christoph
     Niedermaier)

   - Eliminate unnecessary and harmful conversions to uW from the DTPM
     (dynamic thermal and power management) framework (Lukasz Luba)"

* tag 'pm-6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq/amd-pstate: Only print supported EPP values for performance governor
  cpufreq/amd-pstate: Fix scaling_min_freq and scaling_max_freq update
  powercap: DTPM: Fix unneeded conversions to micro-Watts
  cpufreq/amd-pstate: Fix the return value of amd_pstate_fast_switch()
  pmdomain: qcom: rpmpd: Set GENPD_FLAG_ACTIVE_WAKEUP
  cpufreq: qcom-nvmem: Preserve PM domain votes in system suspend
  cpufreq: qcom-nvmem: Enable virtual power domain devices
  cpufreq: imx6q: Don't disable 792 Mhz OPP unnecessarily

5 months agoMerge tag 'acpi-6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael...
Linus Torvalds [Fri, 1 Dec 2023 23:52:20 +0000 (08:52 +0900)]
Merge tag 'acpi-6.7-rc4' of git://git./linux/kernel/git/rafael/linux-pm

Pull ACPI fixes from Rafael Wysocki:
 "This fixes a recently introduced build issue on ARM32 and a NULL
  pointer dereference in the ACPI backlight driver due to a design issue
  exposed by a recent change in the ACPI bus type code.

  Specifics:

   - Fix a recently introduced build issue on ARM32 platforms caused by
     an inadvertent header file breakage (Dave Jiang)

   - Eliminate questionable usage of acpi_driver_data() in the ACPI
     backlight cooling device code that leads to NULL pointer
     dereferences after recent ACPI core changes (Hans de Goede)"

* tag 'acpi-6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: video: Use acpi_video_device for cooling-dev driver data
  ACPI: Fix ARM32 platforms compile issue introduced by fw_table changes

5 months agoMerge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Linus Torvalds [Fri, 1 Dec 2023 23:48:59 +0000 (08:48 +0900)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux

Pull arm64 fix from Catalin Marinas:
 "Fix a regression where the arm64 KPTI ends up enabled even on systems
  that don't need it"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: Avoid enabling KPTI unnecessarily

5 months agoMerge tag 'iommu-fixes-v6.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 1 Dec 2023 23:42:39 +0000 (08:42 +0900)]
Merge tag 'iommu-fixes-v6.7-rc3' of git://git./linux/kernel/git/joro/iommu

Pull iommu fixes from Joerg Roedel:

 - Fix race conditions in device probe path

 - Handle ERR_PTR() returns in __iommu_domain_alloc() path

 - Update MAINTAINERS entry for Qualcom IOMMUs

 - Printk argument fix in device tree specific code

 - Several Intel VT-d fixes from Lu Baolu:
     - Do not support enforcing cache coherency for non-empty domains
     - Avoid devTLB invalidation if iommu is off
     - Disable PCI ATS in legacy passthrough mode
     - Support non-PCI devices when clearing context
     - Fix incorrect cache invalidation for mm notification
     - Add MTL to quirk list to skip TE disabling
     - Set variable intel_dirty_ops to static

* tag 'iommu-fixes-v6.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu: Fix printk arg in of_iommu_get_resv_regions()
  iommu/vt-d: Set variable intel_dirty_ops to static
  iommu/vt-d: Fix incorrect cache invalidation for mm notification
  iommu/vt-d: Add MTL to quirk list to skip TE disabling
  iommu/vt-d: Make context clearing consistent with context mapping
  iommu/vt-d: Disable PCI ATS in legacy passthrough mode
  iommu/vt-d: Omit devTLB invalidation requests when TES=0
  iommu/vt-d: Support enforce_cache_coherency only for empty domains
  iommu: Avoid more races around device probe
  MAINTAINERS: list all Qualcomm IOMMU drivers in the QUALCOMM IOMMU entry
  iommu: Flow ERR_PTR out from __iommu_domain_alloc()

5 months agoMerge tag 'sound-6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai...
Linus Torvalds [Fri, 1 Dec 2023 23:33:29 +0000 (08:33 +0900)]
Merge tag 'sound-6.7-rc4' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "No surprise here, including only a collection of HD-audio
  device-specific small fixes"

* tag 'sound-6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda: Disable power-save on KONTRON SinglePC
  ALSA: hda/realtek: Add supported ALC257 for ChromeOS
  ALSA: hda/realtek: Headset Mic VREF to 100%
  ALSA: hda: intel-nhlt: Ignore vbps when looking for DMIC 32 bps format
  ALSA: hda: cs35l56: Enable low-power hibernation mode on SPI
  ALSA: cs35l41: Fix for old systems which do not support command
  ALSA: hda: cs35l41: Remove unnecessary boolean state variable firmware_running
  ALSA: hda - Fix speaker and headset mic pin config for CHUWI CoreBook XPro

5 months agoMerge tag 'drm-fixes-2023-12-01' of git://anongit.freedesktop.org/drm/drm
Linus Torvalds [Fri, 1 Dec 2023 23:18:59 +0000 (08:18 +0900)]
Merge tag 'drm-fixes-2023-12-01' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
 "Weekly fixes, mostly amdgpu fixes with a scattering of nouveau, i915,
  and a couple of reverts. Hopefully it will quieten down in coming
  weeks.

  drm:
   - Revert unexport of prime helpers for fd/handle conversion

  dma_resv:
   - Do not double add fences in dma_resv_add_fence.

  gpuvm:
   - Fix GPUVM license identifier.

  i915:
   - Mark internal GSC engine with reserved uabi class
   - Take VGA converters into account in eDP probe
   - Fix intel_pre_plane_updates() call to ensure workarounds get applied

  panel:
   - Revert panel fixes as they require exporting device_is_dependent.

  nouveau:
   - fix oversized allocations in new vm path
   - fix zero-length array
   - remove a stray lock

  nt36523:
   - Fix error check for nt36523.

  amdgpu:
   - DMUB fix
   - DCN 3.5 fixes
   - XGMI fix
   - DCN 3.2 fixes
   - Vangogh suspend fix
   - NBIO 7.9 fix
   - GFX11 golden register fix
   - Backlight fix
   - NBIO 7.11 fix
   - IB test overflow fix
   - DCN 3.1.4 fixes
   - fix a runtime pm ref count
   - Retimer fix
   - ABM fix
   - DCN 3.1.5 fix
   - Fix AGP addressing
   - Fix possible memory leak in SMU error path
   - Make sure PME is enabled in D3
   - Fix possible NULL pointer dereference in debugfs
   - EEPROM fix
   - GC 9.4.3 fix

  amdkfd:
   - IP version check fix
   - Fix memory leak in pqm_uninit()"

* tag 'drm-fixes-2023-12-01' of git://anongit.freedesktop.org/drm/drm: (53 commits)
  Revert "drm/prime: Unexport helpers for fd/handle conversion"
  drm/amdgpu: Use another offset for GC 9.4.3 remap
  drm/amd/display: Fix some HostVM parameters in DML
  drm/amdkfd: Free gang_ctx_bo and wptr_bo in pqm_uninit
  drm/amdgpu: Update EEPROM I2C address for smu v13_0_0
  drm/amd/display: Allow DTBCLK disable for DCN35
  drm/amdgpu: Fix cat debugfs amdgpu_regs_didt causes kernel null pointer
  drm/amd: Enable PCIe PME from D3
  drm/amd/pm: fix a memleak in aldebaran_tables_init
  drm/amdgpu: fix AGP addressing when GART is not at 0
  drm/amd/display: update dcn315 lpddr pstate latency
  drm/amd/display: fix ABM disablement
  drm/amd/display: Fix black screen on video playback with embedded panel
  drm/amd/display: Fix conversions between bytes and KB
  drm/amdkfd: Use common function for IP version check
  drm/amd/display: Remove config update
  drm/amd/display: Update DCN35 clock table policy
  drm/amd/display: force toggle rate wa for first link training for a retimer
  drm/amdgpu: correct the amdgpu runtime dereference usage count
  drm/amd/display: Update min Z8 residency time to 2100 for DCN314
  ...

5 months agoMerge tag 'io_uring-6.7-2023-11-30' of git://git.kernel.dk/linux
Linus Torvalds [Fri, 1 Dec 2023 21:47:32 +0000 (06:47 +0900)]
Merge tag 'io_uring-6.7-2023-11-30' of git://git.kernel.dk/linux

Pull io_uring fixes from Jens Axboe:

 - Fix an issue with discontig page checking for IORING_SETUP_NO_MMAP

 - Fix an issue with not allowing IORING_SETUP_NO_MMAP also disallowing
   mmap'ed buffer rings

 - Fix an issue with deferred release of memory mapped pages

 - Fix a lockdep issue with IORING_SETUP_NO_MMAP

 - Use fget/fput consistently, even from our sync system calls. No real
   issue here, but if we were ever to allow closing io_uring descriptors
   it would be required. Let's play it safe and just use the full ref
   counted versions upfront. Most uses of io_uring are threaded anyway,
   and hence already doing the full version underneath.

* tag 'io_uring-6.7-2023-11-30' of git://git.kernel.dk/linux:
  io_uring: use fget/fput consistently
  io_uring: free io_buffer_list entries via RCU
  io_uring/kbuf: prune deferred locked cache when tearing down
  io_uring/kbuf: recycle freed mapped buffer ring entries
  io_uring/kbuf: defer release of mapped buffer rings
  io_uring: enable io_mem_alloc/free to be used in other parts
  io_uring: don't guard IORING_OFF_PBUF_RING with SETUP_NO_MMAP
  io_uring: don't allow discontig pages for IORING_SETUP_NO_MMAP

5 months agoMerge tag 'block-6.7-2023-12-01' of git://git.kernel.dk/linux
Linus Torvalds [Fri, 1 Dec 2023 21:39:30 +0000 (06:39 +0900)]
Merge tag 'block-6.7-2023-12-01' of git://git.kernel.dk/linux

Pull block fixes from Jens Axboe:

 - NVMe pull request via Keith:
     - Invalid namespace identification error handling (Marizio Ewan,
       Keith)
     - Fabrics keep-alive tuning (Mark)

 - Fix for a bad error check regression in bcache (Markus)

 - Fix for a performance regression with O_DIRECT (Ming)

 - Fix for a flush related deadlock (Ming)

 - Make the read-only warn on per-partition (Yu)

* tag 'block-6.7-2023-12-01' of git://git.kernel.dk/linux:
  nvme-core: check for too small lba shift
  blk-mq: don't count completed flush data request as inflight in case of quiesce
  block: Document the role of the two attribute groups
  block: warn once for each partition in bio_check_ro()
  block: move .bd_inode into 1st cacheline of block_device
  nvme: check for valid nvme_identify_ns() before using it
  nvme-core: fix a memory leak in nvme_ns_info_from_identify()
  nvme: fine-tune sending of first keep-alive
  bcache: revert replacing IS_ERR_OR_NULL with IS_ERR

5 months agoMerge tag 'dm-6.7/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device...
Linus Torvalds [Fri, 1 Dec 2023 21:32:29 +0000 (06:32 +0900)]
Merge tag 'dm-6.7/dm-fixes-2' of git://git./linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mike Snitzer:

 - Fix DM verity target's FEC support to always initialize IO before it
   frees it. Also fix alignment of struct dm_verity_fec_io within the
   per-bio-data

 - Fix DM verity target to not FEC failed readahead IO

 - Update DM flakey target to use MAX_ORDER rather than MAX_ORDER - 1

* tag 'dm-6.7/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm-flakey: start allocating with MAX_ORDER
  dm-verity: align struct dm_verity_fec_io properly
  dm verity: don't perform FEC for failed readahead IO
  dm verity: initialize fec io before freeing it

5 months agoMerge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Fri, 1 Dec 2023 21:27:20 +0000 (06:27 +0900)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "Three small fixes, one in drivers.

  The core changes are to the internal representation of flags in
  scsi_devices which removes space wasting bools in favour of single bit
  flags and to add a flag to force a runtime resume which is used by ATA
  devices"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: sd: Fix system start for ATA devices
  scsi: Change SCSI device boolean fields to single bit flags
  scsi: ufs: core: Clear cmd if abort succeeds in MCQ mode

5 months agoMerge tag 'fs_for_v6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jack...
Linus Torvalds [Fri, 1 Dec 2023 21:19:27 +0000 (06:19 +0900)]
Merge tag 'fs_for_v6.7-rc4' of git://git./linux/kernel/git/jack/linux-fs

Pull ext2 fix from Jan Kara:
 "Fix an ext2 bug introduced by changes in ext2 & iomap stepping on each
  other toes (apparently ext2 driver does not get much testing in
  linux-next)"

* tag 'fs_for_v6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  ext2: Fix ki_pos update for DIO buffered-io fallback case

5 months agoMerge tag 'bcachefs-2023-11-29' of https://evilpiepirate.org/git/bcachefs
Linus Torvalds [Fri, 1 Dec 2023 21:02:16 +0000 (06:02 +0900)]
Merge tag 'bcachefs-2023-11-29' of https://evilpiepirate.org/git/bcachefs

Pull more bcachefs bugfixes from Kent Overstreet:

 - bcache & bcachefs were broken with CFI enabled; patch for closures to
   fix type punning

 - mark erasure coding as extra-experimental; there are incompatible
   disk space accounting changes coming for erasure coding, and I'm
   still seeing checksum errors in some tests

 - several fixes for durability-related issues (durability is a device
   specific setting where we can tell bcachefs that data on a given
   device should be counted as replicated x times)

 - a fix for a rare livelock when a btree node merge then updates a
   parent node that is almost full

 - fix a race in the device removal path, where dropping a pointer in a
   btree node to a device would be clobbered by an in flight btree write
   updating the btree node key on completion

 - fix one SRCU lock hold time warning in the btree gc code - ther's
   still a bunch more of these to fix

 - fix a rare race where we'd start copygc before initializing the "are
   we rw" percpu refcount; copygc would think we were already ro and die
   immediately

* tag 'bcachefs-2023-11-29' of https://evilpiepirate.org/git/bcachefs: (23 commits)
  bcachefs: Extra kthread_should_stop() calls for copygc
  bcachefs: Convert gc_alloc_start() to for_each_btree_key2()
  bcachefs: Fix race between btree writes and metadata drop
  bcachefs: move journal seq assertion
  bcachefs: -EROFS doesn't count as move_extent_start_fail
  bcachefs: trace_move_extent_start_fail() now includes errcode
  bcachefs: Fix split_race livelock
  bcachefs: Fix bucket data type for stripe buckets
  bcachefs: Add missing validation for jset_entry_data_usage
  bcachefs: Fix zstd compress workspace size
  bcachefs: bpos is misaligned on big endian
  bcachefs: Fix ec + durability calculation
  bcachefs: Data update path won't accidentaly grow replicas
  bcachefs: deallocate_extra_replicas()
  bcachefs: Proper refcounting for journal_keys
  bcachefs: preserve device path as device name
  bcachefs: Fix an endianness conversion
  bcachefs: Start gc, copygc, rebalance threads after initing writes ref
  bcachefs: Don't stop copygc thread on device resize
  bcachefs: Make sure bch2_move_ratelimit() also waits for move_ops
  ...

5 months agoMerge branch 'acpi-tables'
Rafael J. Wysocki [Fri, 1 Dec 2023 20:32:19 +0000 (21:32 +0100)]
Merge branch 'acpi-tables'

Merge a fix for a recently introduced build issue on ARM32 platforms
caused by an inadvertent header file breakage (Dave Jiang).

* acpi-tables:
  ACPI: Fix ARM32 platforms compile issue introduced by fw_table changes

5 months agoMerge branch 'powercap'
Rafael J. Wysocki [Fri, 1 Dec 2023 20:07:55 +0000 (21:07 +0100)]
Merge branch 'powercap'

Merge a power capping fix for 6.7-rc4 which eliminates unnecessary
and harmful conversions to uW from the DTPM (dynamic thermal and power
management) framework (Lukasz Luba).

* powercap:
  powercap: DTPM: Fix unneeded conversions to micro-Watts

5 months agoMerge tag 'nvme-6.7-2023-12-01' of git://git.infradead.org/nvme into block-6.7
Jens Axboe [Fri, 1 Dec 2023 16:09:16 +0000 (09:09 -0700)]
Merge tag 'nvme-6.7-2023-12-01' of git://git.infradead.org/nvme into block-6.7

Pull NVMe fixes from Keith:

"nvme fixes for Linux 6.7

 - Invalid namespace identification error handling (Marizio Ewan, Keith)
 - Fabrics keep-alive tuning (Mark)"

* tag 'nvme-6.7-2023-12-01' of git://git.infradead.org/nvme:
  nvme-core: check for too small lba shift
  nvme: check for valid nvme_identify_ns() before using it
  nvme-core: fix a memory leak in nvme_ns_info_from_identify()
  nvme: fine-tune sending of first keep-alive

5 months agonvme-core: check for too small lba shift
Keith Busch [Tue, 28 Nov 2023 17:36:04 +0000 (09:36 -0800)]
nvme-core: check for too small lba shift

The block layer doesn't support logical block sizes smaller than 512
bytes. The nvme spec doesn't support that small either, but the driver
isn't checking to make sure the device responded with usable data.
Failing to catch this will result in a kernel bug, either from a
division by zero when stacking, or a zero length bio.

Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Keith Busch <kbusch@kernel.org>
5 months agoblk-mq: don't count completed flush data request as inflight in case of quiesce
Ming Lei [Fri, 1 Dec 2023 08:56:05 +0000 (16:56 +0800)]
blk-mq: don't count completed flush data request as inflight in case of quiesce

Request queue quiesce may interrupt flush sequence, and the original request
may have been marked as COMPLETE, but can't get finished because of
queue quiesce.

This way is fine from driver viewpoint, because flush sequence is block
layer concept, and it isn't related with driver.

However, driver(such as dm-rq) can call blk_mq_queue_inflight() to count &
drain inflight requests, then the wait & drain never gets done because
the completed & not-finished flush request is counted as inflight.

Fix this issue by not counting completed flush data request as inflight in
case of quiesce.

Cc: Mike Snitzer <snitzer@kernel.org>
Cc: David Jeffery <djeffery@redhat.com>
Cc: John Pittman <jpittman@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20231201085605.577730-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 months agoiommu: Fix printk arg in of_iommu_get_resv_regions()
Daniel Mentz [Wed, 8 Nov 2023 06:22:26 +0000 (22:22 -0800)]
iommu: Fix printk arg in of_iommu_get_resv_regions()

The variable phys is defined as (struct resource *) which aligns with
the printk format specifier %pr. Taking the address of it results in a
value of type (struct resource **) which is incompatible with the format
specifier %pr. Therefore, remove the address of operator (&).

Fixes: a5bf3cfce8cb ("iommu: Implement of_iommu_get_resv_regions()")
Signed-off-by: Daniel Mentz <danielmentz@google.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Link: https://lore.kernel.org/r/20231108062226.928985-1-danielmentz@google.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>