Steven Rostedt (VMware) [Fri, 29 Oct 2021 13:54:14 +0000 (09:54 -0400)]
tracing: Fix misspelling of "missing"
My snake instinct was on and I wrote "misssing" instead of "missing".
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Steven Rostedt (VMware) [Fri, 29 Oct 2021 13:52:23 +0000 (09:52 -0400)]
ftrace: Fix kernel-doc formatting issues
Some functions had kernel-doc that used a comma instead of a hash to
separate the function name from the one line description.
Also, the "ftrace_is_dead()" had an incomplete description.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Steven Rostedt (VMware) [Wed, 27 Oct 2021 16:08:54 +0000 (12:08 -0400)]
tracing: Do not warn when connecting eprobe to non existing event
When the syscall trace points are not configured in, the kselftests for
ftrace will try to attach an event probe (eprobe) to one of the system
call trace points. This triggered a WARNING, because the failure only
expects to see memory issues. But this is not the only failure. The user
may attempt to attach to a non existent event, and the kernel must not
warn about it.
Link: https://lkml.kernel.org/r/20211027120854.0680aa0f@gandalf.local.home
Fixes:
7491e2c442781 ("tracing: Add a probe that attaches to trace events")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Steven Rostedt (VMware) [Wed, 27 Oct 2021 16:51:01 +0000 (12:51 -0400)]
ftrace/nds32: Update the proto for ftrace_trace_function to match ftrace_stub
The ftrace callback prototype was changed to pass a special ftrace_regs
instead of pt_regs as the last parameter, but the static ftrace for nds32
missed updating ftrace_trace_function and this caused a warning when
compared to ftrace_stub:
../arch/nds32/kernel/ftrace.c: In function '_mcount':
../arch/nds32/kernel/ftrace.c:24:35: error: comparison of distinct pointer types lacks a cast [-Werror]
24 | if (ftrace_trace_function != ftrace_stub)
| ^~
Link: https://lore.kernel.org/all/20211027055554.19372-1-rdunlap@infradead.org/
Link: https://lkml.kernel.org/r/20211027125101.33449969@gandalf.local.home
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Fixes:
d19ad0775dcd6 ("ftrace: Have the callbacks receive a struct ftrace_regs instead of pt_regs")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Steven Rostedt (VMware) [Mon, 18 Oct 2021 19:44:12 +0000 (15:44 -0400)]
tracing: Have all levels of checks prevent recursion
While writing an email explaining the "bit = 0" logic for a discussion on
making ftrace_test_recursion_trylock() disable preemption, I discovered a
path that makes the "not do the logic if bit is zero" unsafe.
The recursion logic is done in hot paths like the function tracer. Thus,
any code executed causes noticeable overhead. Thus, tricks are done to try
to limit the amount of code executed. This included the recursion testing
logic.
Having recursion testing is important, as there are many paths that can
end up in an infinite recursion cycle when tracing every function in the
kernel. Thus protection is needed to prevent that from happening.
Because it is OK to recurse due to different running context levels (e.g.
an interrupt preempts a trace, and then a trace occurs in the interrupt
handler), a set of bits are used to know which context one is in (normal,
softirq, irq and NMI). If a recursion occurs in the same level, it is
prevented*.
Then there are infrastructure levels of recursion as well. When more than
one callback is attached to the same function to trace, it calls a loop
function to iterate over all the callbacks. Both the callbacks and the
loop function have recursion protection. The callbacks use the
"ftrace_test_recursion_trylock()" which has a "function" set of context
bits to test, and the loop function calls the internal
trace_test_and_set_recursion() directly, with an "internal" set of bits.
If an architecture does not implement all the features supported by ftrace
then the callbacks are never called directly, and the loop function is
called instead, which will implement the features of ftrace.
Since both the loop function and the callbacks do recursion protection, it
was seemed unnecessary to do it in both locations. Thus, a trick was made
to have the internal set of recursion bits at a more significant bit
location than the function bits. Then, if any of the higher bits were set,
the logic of the function bits could be skipped, as any new recursion
would first have to go through the loop function.
This is true for architectures that do not support all the ftrace
features, because all functions being traced must first go through the
loop function before going to the callbacks. But this is not true for
architectures that support all the ftrace features. That's because the
loop function could be called due to two callbacks attached to the same
function, but then a recursion function inside the callback could be
called that does not share any other callback, and it will be called
directly.
i.e.
traced_function_1: [ more than one callback tracing it ]
call loop_func
loop_func:
trace_recursion set internal bit
call callback
callback:
trace_recursion [ skipped because internal bit is set, return 0 ]
call traced_function_2
traced_function_2: [ only traced by above callback ]
call callback
callback:
trace_recursion [ skipped because internal bit is set, return 0 ]
call traced_function_2
[ wash, rinse, repeat, BOOM! out of shampoo! ]
Thus, the "bit == 0 skip" trick is not safe, unless the loop function is
call for all functions.
Since we want to encourage architectures to implement all ftrace features,
having them slow down due to this extra logic may encourage the
maintainers to update to the latest ftrace features. And because this
logic is only safe for them, remove it completely.
[*] There is on layer of recursion that is allowed, and that is to allow
for the transition between interrupt context (normal -> softirq ->
irq -> NMI), because a trace may occur before the context update is
visible to the trace recursion logic.
Link: https://lore.kernel.org/all/609b565a-ed6e-a1da-f025-166691b5d994@linux.alibaba.com/
Link: https://lkml.kernel.org/r/20211018154412.09fcad3c@gandalf.local.home
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@hansenpartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Joe Lawrence <joe.lawrence@redhat.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Jisheng Zhang <jszhang@kernel.org>
Cc: =?utf-8?b?546L6LSH?= <yun.wang@linux.alibaba.com>
Cc: Guo Ren <guoren@kernel.org>
Cc: stable@vger.kernel.org
Fixes:
edc15cafcbfa3 ("tracing: Avoid unnecessary multiple recursion checks")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Steven Rostedt [Thu, 14 Oct 2021 18:35:07 +0000 (14:35 -0400)]
nds32/ftrace: Fix Error: invalid operands (*UND* and *UND* sections) for `^'
I received a build failure for a new patch I'm working on the nds32
architecture, and when I went to test it, I couldn't get to my build error,
because it failed to build with a bunch of:
Error: invalid operands (*UND* and *UND* sections) for `^'
issues with various files. Those files were temporary asm files that looked
like: kernel/.tmp_mc_fork.s
I decided to look deeper, and found that the "mc" portion of that name
stood for "mcount", and was created by the recordmcount.pl script. One that
I wrote over a decade ago. Once I knew the source of the problem, I was
able to investigate it further.
The way the recordmcount.pl script works (BTW, there's a C version that
simply modifies the ELF object) is by doing an "objdump" on the object
file. Looks for all the calls to "mcount", and creates an offset of those
locations from some global variable it can use (usually a global function
name, found with <.*>:). Creates a asm file that is a table of references
to these locations, using the found variable/function. Compiles it and
links it back into the original object file. This asm file is called
".tmp_mc_<object_base_name>.s".
The problem here is that the objdump produced by the nds32 object file,
contains things that look like:
0000159a <.L3^B1>:
159a: c6 00 beqz38 $r6, 159a <.L3^B1>
159a: R_NDS32_9_PCREL_RELA .text+0x159e
159c: 84 d2 movi55 $r6, #-14
159e: 80 06 mov55 $r0, $r6
15a0: ec 3c addi10.sp #0x3c
Where ".L3^B1 is somehow selected as the "global" variable to index off of.
Then the assembly file that holds the mcount locations looks like this:
.section __mcount_loc,"a",@progbits
.align 2
.long .L3^B1 + -5522
.long .L3^B1 + -5384
.long .L3^B1 + -5270
.long .L3^B1 + -5098
.long .L3^B1 + -4970
.long .L3^B1 + -4758
.long .L3^B1 + -4122
[...]
And when it is compiled back to an object to link to the original object,
the compile fails on the "^" symbol.
Simple solution for now, is to have the perl script ignore using function
symbols that have an "^" in the name.
Link: https://lkml.kernel.org/r/20211014143507.4ad2c0f7@gandalf.local.home
Cc: stable@vger.kernel.org
Acked-by: Greentime Hu <green.hu@gmail.com>
Fixes:
fbf58a52ac088 ("nds32/ftrace: Add RECORD_MCOUNT support")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Steven Rostedt (VMware) [Wed, 13 Oct 2021 20:51:13 +0000 (16:51 -0400)]
selftests/ftrace: Update test for more eprobe removal process
The removal of eprobes was broken and missed in testing. Add various ways
to remove eprobes that are considered acceptable to the testing process to
catch when/if they break again.
Link: https://lkml.kernel.org/r/20211013205533.836644549@goodmis.org
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Steven Rostedt (VMware) [Wed, 13 Oct 2021 20:51:12 +0000 (16:51 -0400)]
tracing: Fix event probe removal from dynamic events
When an event probe is to be removed via the API that created it via the
dynamic events, an -ENOENT error is returned.
This is because the removal of the event probe does not expect to see the
event system and name that the event probe is attached to, even though
that's part of the API to create it. As the removal of probes is to use
the same API as they are created.
In fact, the removal is not consistent with the kprobes and uprobes
removal. Fix that by allowing various ways to remove the eprobe.
The eprobe is created with:
e:[GROUP/]NAME SYSTEM/EVENT [OPTIONS]
Have it get removed by echoing in the following into dynamic_events:
# Remove all eprobes with NAME
echo '-:NAME' >> dynamic_events
# Remove a specific eprobe
echo '-:GROUP/NAME' >> dynamic_events
echo '-:GROUP/NAME SYSTEM/EVENT' >> dynamic_events
echo '-:NAME SYSTEM/EVENT' >> dynamic_events
echo '-:GROUP/NAME SYSTEM/EVENT OPTIONS' >> dynamic_events
echo '-:NAME SYSTEM/EVENT OPTIONS' >> dynamic_events
Link: https://lkml.kernel.org/r/20211012081925.0e19cc4f@gandalf.local.home
Link: https://lkml.kernel.org/r/20211013205533.630722129@goodmis.org
Suggested-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Fixes:
7491e2c442781 ("tracing: Add a probe that attaches to trace events")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Colin Ian King [Wed, 6 Oct 2021 17:28:30 +0000 (18:28 +0100)]
tracing: Fix missing * in comment block
There is a missing * in a comment block, add it in.
Link: https://lkml.kernel.org/r/20211006172830.1025336-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Masami Hiramatsu [Thu, 16 Sep 2021 06:23:12 +0000 (15:23 +0900)]
bootconfig: init: Fix memblock leak in xbc_make_cmdline()
Free unused memblock in a error case to fix memblock leak
in xbc_make_cmdline().
Link: https://lkml.kernel.org/r/163177339181.682366.8713781325929549256.stgit@devnote2
Fixes:
51887d03aca1 ("bootconfig: init: Allow admin to use bootconfig for kernel command line")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Vamshi K Sthambamkadi [Fri, 8 Oct 2021 07:18:06 +0000 (12:48 +0530)]
tracing: Fix memory leak in eprobe_register()
kmemleak report:
unreferenced object 0xffff900a70ec7ec0 (size 32):
comm "ftracetest", pid 2770, jiffies
4295042510 (age 311.464s)
hex dump (first 32 bytes):
c8 31 23 45 0a 90 ff ff 40 85 c7 6e 0a 90 ff ff .1#E....@..n....
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<
000000009d3751fd>] kmem_cache_alloc_trace+0x2a2/0x440
[<
0000000088b8124b>] eprobe_register+0x1e3/0x350
[<
000000002a9a0517>] __ftrace_event_enable_disable+0x7c/0x240
[<
0000000019109321>] event_enable_write+0x93/0xe0
[<
000000007d85b320>] vfs_write+0xb9/0x260
[<
00000000b94c5e41>] ksys_write+0x67/0xe0
[<
000000005a08c81d>] __x64_sys_write+0x1a/0x20
[<
00000000240bf576>] do_syscall_64+0x3b/0xc0
[<
0000000043d5d9f6>] entry_SYSCALL_64_after_hwframe+0x44/0xae
unreferenced object 0xffff900a56bbf280 (size 128):
comm "ftracetest", pid 2770, jiffies
4295042510 (age 311.464s)
hex dump (first 32 bytes):
ff ff ff ff ff ff ff ff 00 00 00 00 01 00 00 00 ................
80 69 3b b2 ff ff ff ff 20 69 3b b2 ff ff ff ff .i;..... i;.....
backtrace:
[<
000000009d3751fd>] kmem_cache_alloc_trace+0x2a2/0x440
[<
00000000c4e90fad>] eprobe_register+0x1fc/0x350
[<
000000002a9a0517>] __ftrace_event_enable_disable+0x7c/0x240
[<
0000000019109321>] event_enable_write+0x93/0xe0
[<
000000007d85b320>] vfs_write+0xb9/0x260
[<
00000000b94c5e41>] ksys_write+0x67/0xe0
[<
000000005a08c81d>] __x64_sys_write+0x1a/0x20
[<
00000000240bf576>] do_syscall_64+0x3b/0xc0
[<
0000000043d5d9f6>] entry_SYSCALL_64_after_hwframe+0x44/0xae
In new_eprobe_trigger(), allocated edata and trigger variables are
never freed.
To fix, free memory in disable_eprobe().
Link: https://lkml.kernel.org/r/20211008071802.GA2098@cosmos
Fixes:
7491e2c442781 ("tracing: Add a probe that attaches to trace events")
Signed-off-by: Vamshi K Sthambamkadi <vamshi.k.sthambamkadi@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Jackie Liu [Wed, 22 Sep 2021 02:51:22 +0000 (10:51 +0800)]
tracing: Fix missing osnoise tracer on max_latency
The compiler warns when the data are actually unused:
kernel/trace/trace.c:1712:13: error: ‘trace_create_maxlat_file’ defined but not used [-Werror=unused-function]
1712 | static void trace_create_maxlat_file(struct trace_array *tr,
| ^~~~~~~~~~~~~~~~~~~~~~~~
[Why]
CONFIG_HWLAT_TRACER=n, CONFIG_TRACER_MAX_TRACE=n, CONFIG_OSNOISE_TRACER=y
gcc report warns.
[How]
Now trace_create_maxlat_file will only take effect when
CONFIG_HWLAT_TRACER=y or CONFIG_TRACER_MAX_TRACE=y. In fact, after
adding osnoise trace, it also needs to take effect.
Link: https://lore.kernel.org/all/c1d9e328-ad7c-920b-6c24-9e1598a6421c@infradead.org/
Link: https://lkml.kernel.org/r/20210922025122.3268022-1-liu.yun@linux.dev
Fixes:
bce29ac9ce0b ("trace: Add osnoise tracer")
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Linus Torvalds [Sun, 26 Sep 2021 21:08:19 +0000 (14:08 -0700)]
Linux 5.15-rc3
Linus Torvalds [Sun, 26 Sep 2021 19:46:45 +0000 (12:46 -0700)]
Merge tag '5.15-rc2-ksmbd-fixes' of git://git.samba.org/ksmbd
Pull ksmbd fixes from Steve French:
"Five fixes for the ksmbd kernel server, including three security
fixes:
- remove follow symlinks support
- use LOOKUP_BENEATH to prevent out of share access
- SMB3 compounding security fix
- fix for returning the default streams correctly, fixing a bug when
writing ppt or doc files from some clients
- logging more clearly that ksmbd is experimental (at module load
time)"
* tag '5.15-rc2-ksmbd-fixes' of git://git.samba.org/ksmbd:
ksmbd: use LOOKUP_BENEATH to prevent the out of share access
ksmbd: remove follow symlinks support
ksmbd: check protocol id in ksmbd_verify_smb_message()
ksmbd: add default data stream name in FILE_STREAM_INFORMATION
ksmbd: log that server is experimental at module load
Linus Torvalds [Sun, 26 Sep 2021 19:18:10 +0000 (12:18 -0700)]
Merge tag 'edac_urgent_for_v5.15_rc3' of git://git./linux/kernel/git/ras/ras
Pull EDAC fixes from Borislav Petkov:
"Fix two EDAC drivers using the wrong value type for the DIMM mode"
* tag 'edac_urgent_for_v5.15_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC/dmc520: Assign the proper type to dimm->edac_mode
EDAC/synopsys: Fix wrong value type assignment for edac_mode
Linus Torvalds [Sun, 26 Sep 2021 19:11:58 +0000 (12:11 -0700)]
Merge tag 'thermal-v5.15-rc3' of git://git./linux/kernel/git/thermal/linux
Pull thermal fixes from Daniel Lezcano:
- Fix thermal shutdown after a suspend/resume due to a wrong TCC value
restored on Intel platform (Antoine Tenart)
- Fix potential buffer overflow when building the list of policies. The
buffer size is not updated after writing to it (Dan Carpenter)
- Fix wrong check against IS_ERR instead of NULL (Ansuel Smith)
* tag 'thermal-v5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux:
thermal/drivers/tsens: Fix wrong check for tzd in irq handlers
thermal/core: Potential buffer overflow in thermal_build_list_of_policies()
thermal/drivers/int340x: Do not set a wrong tcc offset on resume
Linus Torvalds [Sun, 26 Sep 2021 17:09:20 +0000 (10:09 -0700)]
Merge tag 'x86-urgent-2021-09-26' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"A set of fixes for X86:
- Prevent sending the wrong signal when protection keys are enabled
and the kernel handles a fault in the vsyscall emulation.
- Invoke early_reserve_memory() before invoking e820_memory_setup()
which is required to make the Xen dom0 e820 hooks work correctly.
- Use the correct data type for the SETZ operand in the EMQCMDS
instruction wrapper.
- Prevent undefined behaviour to the potential unaligned accesss in
the instruction decoder library"
* tag 'x86-urgent-2021-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/insn, tools/x86: Fix undefined behavior due to potential unaligned accesses
x86/asm: Fix SETZ size enqcmds() build failure
x86/setup: Call early_reserve_memory() earlier
x86/fault: Fix wrong signal when vsyscall fails with pkey
Linus Torvalds [Sun, 26 Sep 2021 17:00:16 +0000 (10:00 -0700)]
Merge tag 'timers-urgent-2021-09-26' of git://git./linux/kernel/git/tip/tip
Pull timer fix from Thomas Gleixner:
"A single fix for the recently introduced regression in posix CPU
timers which failed to stop the timer when requested. That caused
unexpected signals to be sent to the process/thread causing
malfunction"
* tag 'timers-urgent-2021-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
posix-cpu-timers: Prevent spuriously armed 0-value itimer
Linus Torvalds [Sun, 26 Sep 2021 16:55:22 +0000 (09:55 -0700)]
Merge tag 'irq-urgent-2021-09-26' of git://git./linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
"A set of fixes for interrupt chip drivers:
- Work around a bad GIC integration on a Renesas platform which can't
handle byte-sized MMIO access
- Plug a potential memory leak in the GICv4 driver
- Fix a regression in the Armada 370-XP IPI code which was caused by
issuing EOI instack of ACK.
- A couple of small fixes here and there"
* tag 'irq-urgent-2021-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/gic: Work around broken Renesas integration
irqchip/renesas-rza1: Use semicolons instead of commas
irqchip/gic-v3-its: Fix potential VPE leak on error
irqchip/goldfish-pic: Select GENERIC_IRQ_CHIP to fix build
irqchip/mbigen: Repair non-kernel-doc notation
irqdomain: Change the type of 'size' in __irq_domain_add() to be consistent
irqchip/armada-370-xp: Fix ack/eoi breakage
Documentation: Fix irq-domain.rst build warning
Linus Torvalds [Sat, 25 Sep 2021 23:20:34 +0000 (16:20 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
"16 patches.
Subsystems affected by this patch series: xtensa, sh, ocfs2, scripts,
lib, and mm (memory-failure, kasan, damon, shmem, tools, pagecache,
debug, and pagemap)"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm: fix uninitialized use in overcommit_policy_handler
mm/memory_failure: fix the missing pte_unmap() call
kasan: always respect CONFIG_KASAN_STACK
sh: pgtable-3level: fix cast to pointer from integer of different size
mm/debug: sync up latest migrate_reason to migrate_reason_names
mm/debug: sync up MR_CONTIG_RANGE and MR_LONGTERM_PIN
mm: fs: invalidate bh_lrus for only cold path
lib/zlib_inflate/inffast: check config in C to avoid unused function warning
tools/vm/page-types: remove dependency on opt_file for idle page tracking
scripts/sorttable: riscv: fix undeclared identifier 'EM_RISCV' error
ocfs2: drop acl cache for directories too
mm/shmem.c: fix judgment error in shmem_is_huge()
xtensa: increase size of gcc stack frame check
mm/damon: don't use strnlen() with known-bogus source length
kasan: fix Kconfig check of CC_HAS_WORKING_NOSANITIZE_ADDRESS
mm, hwpoison: add is_free_buddy_page() in HWPoisonHandlable()
Linus Torvalds [Sat, 25 Sep 2021 23:05:56 +0000 (16:05 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Thirty-three fixes, I'm afraid.
Essentially the build up from the last couple of weeks while I've been
dealling with Linux Plumbers conference infrastructure issues. It's
mostly the usual assortment of spelling fixes and minor corrections.
The only core relevant changes are to the sd driver to reduce the spin
up message spew and fix a small memory leak on the freeing path"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (33 commits)
scsi: ses: Retry failed Send/Receive Diagnostic commands
scsi: target: Fix spelling mistake "CONFLIFT" -> "CONFLICT"
scsi: lpfc: Fix gcc -Wstringop-overread warning, again
scsi: lpfc: Use correct scnprintf() limit
scsi: lpfc: Fix sprintf() overflow in lpfc_display_fpin_wwpn()
scsi: core: Remove 'current_tag'
scsi: acornscsi: Remove tagged queuing vestiges
scsi: fas216: Kill scmd->tag
scsi: qla2xxx: Restore initiator in dual mode
scsi: ufs: core: Unbreak the reset handler
scsi: sd_zbc: Support disks with more than 2**32 logical blocks
scsi: ufs: core: Revert "scsi: ufs: Synchronize SCSI and UFS error handling"
scsi: bsg: Fix device unregistration
scsi: sd: Make sd_spinup_disk() less noisy
scsi: ufs: ufs-pci: Fix Intel LKF link stability
scsi: mpt3sas: Clean up some inconsistent indenting
scsi: megaraid: Clean up some inconsistent indenting
scsi: sr: Fix spelling mistake "does'nt" -> "doesn't"
scsi: Remove SCSI CDROM MAINTAINERS entry
scsi: megaraid: Fix Coccinelle warning
...
Linus Torvalds [Sat, 25 Sep 2021 22:51:08 +0000 (15:51 -0700)]
Merge tag 'io_uring-5.15-2021-09-25' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
"This one looks a bit bigger than it is, but that's mainly because 2/3
of it is enabling IORING_OP_CLOSE to close direct file descriptors.
We've had a few folks using them and finding it confusing that the way
to close them is through using -1 for file update, this just brings
API symmetry for direct descriptors. Hence I think we should just do
this now and have a better API for 5.15 release. There's some room for
de-duplicating the close code, but we're leaving that for the next
merge window.
Outside of that, just small fixes:
- Poll race fixes (Hao)
- io-wq core dump exit fix (me)
- Reschedule around potentially intensive tctx and buffer iterators
on teardown (me)
- Fix for always ending up punting files update to io-wq (me)
- Put the provided buffer meta data under memcg accounting (me)
- Tweak for io_write(), removing dead code that was added with the
iterator changes in this release (Pavel)"
* tag 'io_uring-5.15-2021-09-25' of git://git.kernel.dk/linux-block:
io_uring: make OP_CLOSE consistent with direct open
io_uring: kill extra checks in io_write()
io_uring: don't punt files update to io-wq unconditionally
io_uring: put provided buffer meta data under memcg accounting
io_uring: allow conditional reschedule for intensive iterators
io_uring: fix potential req refcount underflow
io_uring: fix missing set of EPOLLONESHOT for CQ ring overflow
io_uring: fix race between poll completion and cancel_hash insertion
io-wq: ensure we exit if thread group is exiting
Linus Torvalds [Sat, 25 Sep 2021 22:44:05 +0000 (15:44 -0700)]
Merge tag 'block-5.15-2021-09-25' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- NVMe pull request via Christoph:
- keep ctrl->namespaces ordered (Christoph Hellwig)
- fix incorrect h2cdata pdu offset accounting in nvme-tcp (Sagi
Grimberg)
- handled updated hw_queues in nvme-fc more carefully (Daniel
Wagner, James Smart)
- md lock order fix (Christoph)
- fallocate locking fix (Ming)
- blktrace UAF fix (Zhihao)
- rq-qos bio tracking fix (Ming)
* tag 'block-5.15-2021-09-25' of git://git.kernel.dk/linux-block:
block: hold ->invalidate_lock in blkdev_fallocate
blktrace: Fix uaf in blk_trace access after removing by sysfs
block: don't call rq_qos_ops->done_bio if the bio isn't tracked
md: fix a lock order reversal in md_alloc
nvme: keep ctrl->namespaces ordered
nvme-tcp: fix incorrect h2cdata pdu offset accounting
nvme-fc: remove freeze/unfreeze around update_nr_hw_queues
nvme-fc: avoid race between time out and tear down
nvme-fc: update hardware queues before using them
Linus Torvalds [Sat, 25 Sep 2021 22:37:31 +0000 (15:37 -0700)]
Merge tag 'for-linus-5.15b-rc3-tag' of git://git./linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
"Some minor cleanups and fixes of some theoretical bugs, as well as a
fix of a bug introduced in 5.15-rc1"
* tag 'for-linus-5.15b-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/x86: fix PV trap handling on secondary processors
xen/balloon: fix balloon kthread freezing
swiotlb-xen: this is PV-only on x86
xen/pci-swiotlb: reduce visibility of symbols
PCI: only build xen-pcifront in PV-enabled environments
swiotlb-xen: ensure to issue well-formed XENMEM_exchange requests
Xen/gntdev: don't ignore kernel unmapping error
xen/x86: drop redundant zeroing from cpu_initialize_context()
Linus Torvalds [Sat, 25 Sep 2021 22:30:29 +0000 (15:30 -0700)]
Merge tag 'linux-kselftest-fixes-5.15-rc3' of git://git./linux/kernel/git/shuah/linux-kselftest
Pull Kselftest fixes from Shuah Khan:
- fix to Kselftest common framework header install to run before other
targets for it work correctly in parallel build case.
- fixes to kvm test to not ignore fscanf() returns which could result
in inconsistent test behavior and failures.
* tag 'linux-kselftest-fixes-5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests: kvm: fix get_run_delay() ignoring fscanf() return warn
selftests: kvm: move get_run_delay() into lib/test_util
selftests:kvm: fix get_trans_hugepagesz() ignoring fscanf() return warn
selftests:kvm: fix get_warnings_count() ignoring fscanf() return warn
selftests: be sure to make khdr before other targets
Linus Torvalds [Sat, 25 Sep 2021 18:31:48 +0000 (11:31 -0700)]
Merge tag 'erofs-for-5.15-rc3-fixes' of git://git./linux/kernel/git/xiang/erofs
Pull erofs fixes from Gao Xiang:
"Two bugfixes to fix the 4KiB blockmap chunk format availability and a
dangling pointer usage. There is also a trivial cleanup to clarify
compacted_2b if compacted_4b_initial > totalidx.
Summary:
- fix the dangling pointer use in erofs_lookup tracepoint
- fix unsupported chunk format check
- zero out compacted_2b if compacted_4b_initial > totalidx"
* tag 'erofs-for-5.15-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: clear compacted_2b if compacted_4b_initial > totalidx
erofs: fix misbehavior of unsupported chunk format check
erofs: fix up erofs_lookup tracepoint
Linus Torvalds [Sat, 25 Sep 2021 18:08:12 +0000 (11:08 -0700)]
Merge tag '5.15-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
"Six small cifs/smb3 fixes, two for stable:
- important fix for deferred close (found by a git functional test)
related to attribute caching on close.
- four (two cosmetic, two more serious) small fixes for problems
pointed out by smatch via Dan Carpenter
- fix for comment formatting problems pointed out by W=1"
* tag '5.15-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: fix incorrect check for null pointer in header_assemble
smb3: correct server pointer dereferencing check to be more consistent
smb3: correct smb3 ACL security descriptor
cifs: Clear modified attribute bit from inode flags
cifs: Deal with some warnings from W=1
cifs: fix a sign extension bug
Linus Torvalds [Sat, 25 Sep 2021 17:29:14 +0000 (10:29 -0700)]
Merge tag 'char-misc-5.15-rc3' of git://git./linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are some small char and misc driver fixes for 5.15-rc3.
Nothing huge in here, just fixes for a number of small issues that
have been reported. These include:
- habanalabs race conditions and other bugs fixed
- binder driver fixes
- fpga driver fixes
- coresight build warning fix
- nvmem driver fix
- comedi memory leak fix
- bcm-vk tty race fix
- other tiny driver fixes
All of these have been in linux-next for a while with no reported
issues"
* tag 'char-misc-5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (21 commits)
comedi: Fix memory leak in compat_insnlist()
nvmem: NVMEM_NINTENDO_OTP should depend on WII
misc: bcm-vk: fix tty registration race
fpga: dfl: Avoid reads to AFU CSRs during enumeration
fpga: machxo2-spi: Fix missing error code in machxo2_write_complete()
fpga: machxo2-spi: Return an error on failure
habanalabs: expose a single cs seq in staged submissions
habanalabs: fix wait offset handling
habanalabs: rate limit multi CS completion errors
habanalabs/gaudi: fix LBW RR configuration
habanalabs: Fix spelling mistake "FEADBACK" -> "FEEDBACK"
habanalabs: fail collective wait when not supported
habanalabs/gaudi: use direct MSI in single mode
habanalabs: fix kernel OOPs related to staged cs
habanalabs: fix potential race in interrupt wait ioctl
mcb: fix error handling in mcb_alloc_bus()
misc: genwqe: Fixes DMA mask setting
coresight: syscfg: Fix compiler warning
nvmem: core: Add stubs for nvmem_cell_read_variable_le_u32/64 if !CONFIG_NVMEM
binder: make sure fd closes complete
...
Linus Torvalds [Sat, 25 Sep 2021 17:19:49 +0000 (10:19 -0700)]
Merge tag 'staging-5.15-rc3' of git://git./linux/kernel/git/gregkh/staging
Pull staging driver fixes from Greg KH:
"Here are two small staging driver fixes for 5.15-rc3:
- greybus tty use-after-free bugfix
- r8188eu ioctl overlap build warning fix
Note, the r8188eu ioctl has been entirely removed for 5.16-rc1, but
it's good to get this fixed now for people using this in 5.15.
Both of these have been in linux-next for a while with no reported
issues"
* tag 'staging-5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: r8188eu: fix -Wrestrict warnings
staging: greybus: uart: fix tty use after free
Linus Torvalds [Sat, 25 Sep 2021 17:15:55 +0000 (10:15 -0700)]
Merge tag 'tty-5.15-rc3' of git://git./linux/kernel/git/gregkh/tty
Pull tty/serial fixes from Greg KH:
"Here are four small tty/serial driver fixes for 5.15-rc3. They
include:
- remove an export now that no one is using it anymore
- mvebu-uart tx_empty callback fix
- 8250_omap bugfix
- synclink_gt build fix
All of these have been in linux-next for a while with no reported
issues"
* tag 'tty-5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
tty: unexport tty_ldisc_release
tty: synclink_gt: rename a conflicting function name
serial: mvebu-uart: fix driver's tx_empty callback
serial: 8250: 8250_omap: Fix RX_LVL register offset
Linus Torvalds [Sat, 25 Sep 2021 17:10:38 +0000 (10:10 -0700)]
Merge tag 'usb-5.15-rc3' of git://git./linux/kernel/git/gregkh/usb
Pull USB driver fixes from Greg KH:
"Here are some USB driver fixes and new device ids for 5.15-rc3.
They include:
- usb-storage quirk additions
- usb-serial new device ids
- usb-serial driver fixes
- USB roothub registration bugfix to resolve a long-reported issue
- usb gadget driver fixes for a large number of small things
- dwc2 driver fixes
All of these have been in linux-next for a while with no reported
issues"
* tag 'usb-5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (28 commits)
USB: serial: option: add device id for Foxconn T99W265
USB: serial: cp210x: add ID for GW Instek GDM-834x Digital Multimeter
USB: serial: cp210x: add part-number debug printk
USB: serial: cp210x: fix dropped characters with CP2102
MAINTAINERS: usb, update Peter Korsgaard's entries
usb: musb: tusb6010: uninitialized data in tusb_fifo_write_unaligned()
usb-storage: Add quirk for ScanLogic SL11R-IDE older than 2.6c
Re-enable UAS for LaCie Rugged USB3-FW with fk quirk
USB: serial: option: remove duplicate USB device ID
USB: serial: mos7840: remove duplicated 0xac24 device ID
arm64: dts: qcom: ipq8074: remove USB tx-fifo-resize property
usb: gadget: f_uac2: Populate SS descriptors' wBytesPerInterval
usb: gadget: f_uac2: Add missing companion descriptor for feedback EP
usb: dwc2: gadget: Fix ISOC transfer complete handling for DDMA
usb: core: hcd: Modularize HCD stop configuration in usb_stop_hcd()
xhci: Set HCD flag to defer primary roothub registration
usb: core: hcd: Add support for deferring roothub registration
usb: dwc2: gadget: Fix ISOC flow for BDMA and Slave
usb: dwc3: core: balance phy init and exit
Revert "USB: bcma: Add a check for devm_gpiod_get"
...
Hyunchul Lee [Fri, 24 Sep 2021 15:06:16 +0000 (00:06 +0900)]
ksmbd: use LOOKUP_BENEATH to prevent the out of share access
instead of removing '..' in a given path, call
kern_path with LOOKUP_BENEATH flag to prevent
the out of share access.
ran various test on this:
smb2-cat-async smb://127.0.0.1/homes/../out_of_share
smb2-cat-async smb://127.0.0.1/homes/foo/../../out_of_share
smbclient //127.0.0.1/homes -c "mkdir ../foo2"
smbclient //127.0.0.1/homes -c "rename bar ../bar"
Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Cc: Ralph Boehme <slow@samba.org>
Tested-by: Steve French <smfrench@gmail.com>
Tested-by: Namjae Jeon <linkinjeon@kernel.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Chen Jun [Fri, 24 Sep 2021 22:44:06 +0000 (15:44 -0700)]
mm: fix uninitialized use in overcommit_policy_handler
We get an unexpected value of /proc/sys/vm/overcommit_memory after
running the following program:
int main()
{
int fd = open("/proc/sys/vm/overcommit_memory", O_RDWR);
write(fd, "1", 1);
write(fd, "2", 1);
close(fd);
}
write(fd, "2", 1) will pass *ppos = 1 to proc_dointvec_minmax.
proc_dointvec_minmax will return 0 without setting new_policy.
t.data = &new_policy;
ret = proc_dointvec_minmax(&t, write, buffer, lenp, ppos)
-->do_proc_dointvec
-->__do_proc_dointvec
if (write) {
if (proc_first_pos_non_zero_ignore(ppos, table))
goto out;
sysctl_overcommit_memory = new_policy;
so sysctl_overcommit_memory will be set to an uninitialized value.
Check whether new_policy has been changed by proc_dointvec_minmax.
Link: https://lkml.kernel.org/r/20210923020524.13289-1-chenjun102@huawei.com
Fixes:
56f3547bfa4d ("mm: adjust vm_committed_as_batch according to vm overcommit policy")
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Feng Tang <feng.tang@intel.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Rui Xiang <rui.xiang@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Qi Zheng [Fri, 24 Sep 2021 22:44:03 +0000 (15:44 -0700)]
mm/memory_failure: fix the missing pte_unmap() call
The paired pte_unmap() call is missing before the
dev_pagemap_mapping_shift() returns. So fix it.
David says:
"I guess this code never runs on 32bit / highmem, that's why we didn't
notice so far".
[akpm@linux-foundation.org: cleanup]
Link: https://lkml.kernel.org/r/20210923122642.4999-1-zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nathan Chancellor [Fri, 24 Sep 2021 22:44:00 +0000 (15:44 -0700)]
kasan: always respect CONFIG_KASAN_STACK
Currently, the asan-stack parameter is only passed along if
CFLAGS_KASAN_SHADOW is not empty, which requires KASAN_SHADOW_OFFSET to
be defined in Kconfig so that the value can be checked. In RISC-V's
case, KASAN_SHADOW_OFFSET is not defined in Kconfig, which means that
asan-stack does not get disabled with clang even when CONFIG_KASAN_STACK
is disabled, resulting in large stack warnings with allmodconfig:
drivers/video/fbdev/omap2/omapfb/displays/panel-lgphilips-lb035q02.c:117:12: error: stack frame size (14400) exceeds limit (2048) in function 'lb035q02_connect' [-Werror,-Wframe-larger-than]
static int lb035q02_connect(struct omap_dss_device *dssdev)
^
1 error generated.
Ensure that the value of CONFIG_KASAN_STACK is always passed along to
the compiler so that these warnings do not happen when
CONFIG_KASAN_STACK is disabled.
Link: https://github.com/ClangBuiltLinux/linux/issues/1453
References:
6baec880d7a5 ("kasan: turn off asan-stack for clang-8 and earlier")
Link: https://lkml.kernel.org/r/20210922205525.570068-1-nathan@kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Geert Uytterhoeven [Fri, 24 Sep 2021 22:43:57 +0000 (15:43 -0700)]
sh: pgtable-3level: fix cast to pointer from integer of different size
If X2TLB=y (CPU_SHX2=y or CPU_SHX3=y, e.g. migor_defconfig), pgd_t.pgd
is "unsigned long long", causing:
In file included from arch/sh/include/asm/pgtable.h:13,
from include/linux/pgtable.h:6,
from include/linux/mm.h:33,
from arch/sh/kernel/asm-offsets.c:14:
arch/sh/include/asm/pgtable-3level.h: In function `pud_pgtable':
arch/sh/include/asm/pgtable-3level.h:37:9: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
37 | return (pmd_t *)pud_val(pud);
| ^
Fix this by adding an intermediate cast to "unsigned long", which is
basically what the old code did before.
Link: https://lkml.kernel.org/r/2c2eef3c9a2f57e5609100a4864715ccf253d30f.1631713483.git.geert+renesas@glider.be
Fixes:
9cf6fa2458443118 ("mm: rename pud_page_vaddr to pud_pgtable and make it return pmd_t *")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Daniel Palmer <daniel@thingy.jp>
Acked-by: Rob Landley <rob@landley.net>
Cc: Yoshinori Sato <ysato@users.osdn.me>
Cc: Rich Felker <dalias@libc.org>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
Cc: Jacopo Mondi <jacopo+renesas@jmondi.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Weizhao Ouyang [Fri, 24 Sep 2021 22:43:53 +0000 (15:43 -0700)]
mm/debug: sync up latest migrate_reason to migrate_reason_names
Sync up MR_DEMOTION to migrate_reason_names and add a synch prompt.
Link: https://lkml.kernel.org/r/20210921064553.293905-3-o451686892@gmail.com
Fixes:
26aa2d199d6f ("mm/migrate: demote pages during reclaim")
Signed-off-by: Weizhao Ouyang <o451686892@gmail.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Mina Almasry <almasrymina@google.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Wei Xu <weixugc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Weizhao Ouyang [Fri, 24 Sep 2021 22:43:50 +0000 (15:43 -0700)]
mm/debug: sync up MR_CONTIG_RANGE and MR_LONGTERM_PIN
Sync up MR_CONTIG_RANGE and MR_LONGTERM_PIN to migrate_reason_names.
Link: https://lkml.kernel.org/r/20210921064553.293905-2-o451686892@gmail.com
Fixes:
310253514bbf ("mm/migrate: rename migration reason MR_CMA to MR_CONTIG_RANGE")
Fixes:
d1e153fea2a8 ("mm/gup: migrate pinned pages out of movable zone")
Signed-off-by: Weizhao Ouyang <o451686892@gmail.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Mina Almasry <almasrymina@google.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Wei Xu <weixugc@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Minchan Kim [Fri, 24 Sep 2021 22:43:47 +0000 (15:43 -0700)]
mm: fs: invalidate bh_lrus for only cold path
The kernel test robot reported the regression of fio.write_iops[1] with
commit
8cc621d2f45d ("mm: fs: invalidate BH LRU during page migration").
Since lru_add_drain is called frequently, invalidate bh_lrus there could
increase bh_lrus cache miss ratio, which needs more IO in the end.
This patch moves the bh_lrus invalidation from the hot path( e.g.,
zap_page_range, pagevec_release) to cold path(i.e., lru_add_drain_all,
lru_cache_disable).
Zhengjun Xing confirmed
"I test the patch, the regression reduced to -2.9%"
[1] https://lore.kernel.org/lkml/
20210520083144.GD14190@xsang-OptiPlex-9020/
[2]
8cc621d2f45d, mm: fs: invalidate BH LRU during page migration
Link: https://lkml.kernel.org/r/20210907212347.1977686-1-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: kernel test robot <oliver.sang@intel.com>
Reviewed-by: Chris Goldsworthy <cgoldswo@codeaurora.org>
Tested-by: "Xing, Zhengjun" <zhengjun.xing@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Paul Menzel [Fri, 24 Sep 2021 22:43:44 +0000 (15:43 -0700)]
lib/zlib_inflate/inffast: check config in C to avoid unused function warning
Building Linux for ppc64le with Ubuntu clang version
12.0.0-3ubuntu1~21.04.1 shows the warning below.
arch/powerpc/boot/inffast.c:20:1: warning: unused function 'get_unaligned16' [-Wunused-function]
get_unaligned16(const unsigned short *p)
^
1 warning generated.
Fix it by moving the check from the preprocessor to C, so the compiler
sees the use.
Link: https://lkml.kernel.org/r/20210920084332.5752-1-pmenzel@molgen.mpg.de
Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Zhen Lei <thunder.leizhen@huawei.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Changbin Du [Fri, 24 Sep 2021 22:43:41 +0000 (15:43 -0700)]
tools/vm/page-types: remove dependency on opt_file for idle page tracking
Idle page tracking can also be used for process address space, not only
file mappings.
Without this change, using with '-i' option for process address space
encounters below errors reported.
$ sudo ./page-types -p $(pidof bash) -i
mark page idle: Bad file descriptor
mark page idle: Bad file descriptor
mark page idle: Bad file descriptor
mark page idle: Bad file descriptor
...
Link: https://lkml.kernel.org/r/20210917032826.10669-1-changbin.du@gmail.com
Signed-off-by: Changbin Du <changbin.du@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Miles Chen [Fri, 24 Sep 2021 22:43:38 +0000 (15:43 -0700)]
scripts/sorttable: riscv: fix undeclared identifier 'EM_RISCV' error
Fix the following build failure reported in [1] by adding a conditional
definition of EM_RISCV in order to allow cross-compilation on machines
which do not have EM_RISCV definition in their host.
scripts/sorttable.c:352:7: error: use of undeclared identifier 'EM_RISCV'
EM_RISCV was added to <elf.h> in glibc 2.24 so builds on systems with
glibc headers < 2.24 should show this error.
[mkubecek@suse.cz: changelog addition]
Link: https://lore.kernel.org/lkml/e8965b25-f15b-c7b4-748c-d207dda9c8e8@i2se.com/
Link: https://lkml.kernel.org/r/20210913030625.4525-1-miles.chen@mediatek.com
Fixes:
54fed35fd393 ("riscv: Enable BUILDTIME_TABLE_SORT")
Signed-off-by: Miles Chen <miles.chen@mediatek.com>
Reported-by: Stefan Wahren <stefan.wahren@i2se.com>
Tested-by: Stefan Wahren <stefan.wahren@i2se.com>
Reviewed-by: Jisheng Zhang <jszhang@kernel.org>
Cc: Michal Kubecek <mkubecek@suse.cz>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Markus Mayer <mmayer@broadcom.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wengang Wang [Fri, 24 Sep 2021 22:43:35 +0000 (15:43 -0700)]
ocfs2: drop acl cache for directories too
ocfs2_data_convert_worker() is currently dropping any cached acl info
for FILE before down-converting meta lock. It should also drop for
DIRECTORY. Otherwise the second acl lookup returns the cached one (from
VFS layer) which could be already stale.
The problem we are seeing is that the acl changes on one node doesn't
get refreshed on other nodes in the following case:
Node 1 Node 2
-------------- ----------------
getfacl dir1
getfacl dir1 <-- this is OK
setfacl -m u:user1:rwX dir1
getfacl dir1 <-- see the change for user1
getfacl dir1 <-- can't see change for user1
Link: https://lkml.kernel.org/r/20210903012631.6099-1-wen.gang.wang@oracle.com
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Liu Yuntao [Fri, 24 Sep 2021 22:43:32 +0000 (15:43 -0700)]
mm/shmem.c: fix judgment error in shmem_is_huge()
In the case of SHMEM_HUGE_WITHIN_SIZE, the page index is not rounded up
correctly. When the page index points to the first page in a huge page,
round_up() cannot bring it to the end of the huge page, but to the end
of the previous one.
An example:
HPAGE_PMD_NR on my machine is 512(2 MB huge page size). After
allcoating a 3000 KB buffer, I access it at location 2050 KB. In
shmem_is_huge(), the corresponding index happens to be 512. After
rounded up by HPAGE_PMD_NR, it will still be 512 which is smaller than
i_size, and shmem_is_huge() will return true. As a result, my buffer
takes an additional huge page, and that shouldn't happen when
shmem_enabled is set to within_size.
Link: https://lkml.kernel.org/r/20210909032007.18353-1-liuyuntao10@huawei.com
Fixes:
f3f0e1d2150b2b ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Liu Yuntao <liuyuntao10@huawei.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: wuxu.wu <wuxu.wu@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Guenter Roeck [Fri, 24 Sep 2021 22:43:29 +0000 (15:43 -0700)]
xtensa: increase size of gcc stack frame check
xtensa frame size is larger than the frame size for almost all other
architectures. This results in more than 50 "the frame size of <n> is
larger than 1024 bytes" errors when trying to build xtensa:allmodconfig.
Increase frame size for xtensa to 1536 bytes to avoid compile errors due
to frame size limits.
Link: https://lkml.kernel.org/r/20210912025235.3514761-1-linux@roeck-us.net
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Max Filippov <jcmvbkbc@gmail.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: David Laight <David.Laight@ACULAB.COM>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Adam Borowski [Fri, 24 Sep 2021 22:43:26 +0000 (15:43 -0700)]
mm/damon: don't use strnlen() with known-bogus source length
gcc knows the true length too, and rightfully complains.
Link: https://lkml.kernel.org/r/20210912204447.10427-1-kilobyte@angband.pl
Signed-off-by: Adam Borowski <kilobyte@angband.pl>
Cc: SeongJae Park <sj38.park@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Marco Elver [Fri, 24 Sep 2021 22:43:23 +0000 (15:43 -0700)]
kasan: fix Kconfig check of CC_HAS_WORKING_NOSANITIZE_ADDRESS
In the main KASAN config option CC_HAS_WORKING_NOSANITIZE_ADDRESS is
checked for instrumentation-based modes. However, if
HAVE_ARCH_KASAN_HW_TAGS is true all modes may still be selected.
To fix, also make the software modes depend on
CC_HAS_WORKING_NOSANITIZE_ADDRESS.
Link: https://lkml.kernel.org/r/20210910084240.1215803-1-elver@google.com
Fixes:
6a63a63ff1ac ("kasan: introduce CONFIG_KASAN_HW_TAGS")
Signed-off-by: Marco Elver <elver@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Aleksandr Nogikh <nogikh@google.com>
Cc: Taras Madan <tarasmadan@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Naoya Horiguchi [Fri, 24 Sep 2021 22:43:20 +0000 (15:43 -0700)]
mm, hwpoison: add is_free_buddy_page() in HWPoisonHandlable()
Commit
fcc00621d88b ("mm/hwpoison: retry with shake_page() for
unhandlable pages") changed the return value of __get_hwpoison_page() to
retry for transiently unhandlable cases. However, __get_hwpoison_page()
currently fails to properly judge buddy pages as handlable, so hard/soft
offline for buddy pages always fail as "unhandlable page". This is
totally regrettable.
So let's add is_free_buddy_page() in HWPoisonHandlable(), so that
__get_hwpoison_page() returns different return values between buddy
pages and unhandlable pages as intended.
Link: https://lkml.kernel.org/r/20210909004131.163221-1-naoya.horiguchi@linux.dev
Fixes:
fcc00621d88b ("mm/hwpoison: retry with shake_page() for unhandlable pages")
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pavel Begunkov [Fri, 24 Sep 2021 19:04:29 +0000 (20:04 +0100)]
io_uring: make OP_CLOSE consistent with direct open
From recently open/accept are now able to manipulate fixed file table,
but it's inconsistent that close can't. Close the gap, keep API same as
with open/accept, i.e. via sqe->file_slot.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Fri, 24 Sep 2021 18:22:55 +0000 (11:22 -0700)]
Merge tag 'gpio-fixes-for-v5.15-rc3' of git://git./linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
- fix a regression in GPIO ACPI on HP ElitePad 1000 G2 where the
gpio_set_debounce_timeout() now returns a fatal error if the specific
debounce period is not supported by the driver instead of just
emitting a warning
- fix return values of irq_mask/unmask() callbacks in gpio-uniphier
- fix hwirq calculation in gpio-aspeed-sgpio
- fix two issues in gpio-rockchip: only make the extended debounce
support available for v2 and remove a redundant BIT() usage
* tag 'gpio-fixes-for-v5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio/rockchip: fix get_direction value handling
gpio/rockchip: extended debounce support is only available on v2
gpio: gpio-aspeed-sgpio: Fix wrong hwirq in irq handler.
gpio: uniphier: Fix void functions to remove return value
gpiolib: acpi: Make set-debounce-timeout failures non fatal
Linus Torvalds [Fri, 24 Sep 2021 18:20:29 +0000 (11:20 -0700)]
Merge tag 'devprop-5.15-rc3' of git://git./linux/kernel/git/rafael/linux-pm
Pull device properties framework fix from Rafael Wysocki:
"Fix software node refcount imbalance on device removal (Laurentiu
Tudor)"
* tag 'devprop-5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
software node: balance refcount for managed software nodes
Linus Torvalds [Fri, 24 Sep 2021 18:17:32 +0000 (11:17 -0700)]
Merge tag 'acpi-5.15-rc3' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Revert a recent commit related to memory management that turned out to
be problematic (Jia He)"
* tag 'acpi-5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
Revert "ACPI: Add memory semantics to acpi_os_map_memory()"
Linus Torvalds [Fri, 24 Sep 2021 18:12:17 +0000 (11:12 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- It turns out that the optimised string routines merged in 5.14 are
not safe with in-kernel MTE (KASAN_HW_TAGS) because of reading beyond
the end of a string (strcmp, strncmp). Such reading may go across a
16 byte tag granule and cause a tag check fault. When KASAN_HW_TAGS
is enabled, use the generic strcmp/strncmp C implementation.
- An errata workaround for ThunderX relied on the CPU capabilities
being enabled in a specific order. This disappeared with the
automatic generation of the cpucaps.h file (sorted alphabetically).
Fix it by checking the current CPU only rather than the system-wide
capability.
- Add system_supports_mte() checks on the kernel entry/exit path and
thread switching to avoid unnecessary barriers and function calls on
systems where MTE is not supported.
- kselftests: skip arm64 tests if the required features are missing.
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Restore forced disabling of KPTI on ThunderX
kselftest/arm64: signal: Skip tests if required features are missing
arm64: Mitigate MTE issues with str{n}cmp()
arm64: add MTE supported check to thread switching and syscall entry/exit
Linus Torvalds [Fri, 24 Sep 2021 17:28:18 +0000 (10:28 -0700)]
Merge tag 'ceph-for-5.15-rc3' of git://github.com/ceph/ceph-client
Pull ceph fix from Ilya Dryomov:
"A fix for a potential array out of bounds access from Dan"
* tag 'ceph-for-5.15-rc3' of git://github.com/ceph/ceph-client:
ceph: fix off by one bugs in unsafe_request_wait()
Linus Torvalds [Fri, 24 Sep 2021 17:22:35 +0000 (10:22 -0700)]
Merge tag 'fixes_for_v5.15-rc3' of git://git./linux/kernel/git/jack/linux-fs
Pull misc filesystem fixes from Jan Kara:
"A for ext2 sleep in atomic context in case of some fs problems and a
cleanup of an invalidate_lock initialization"
* tag 'fixes_for_v5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
ext2: fix sleeping in atomic bugs on error
mm: Fully initialize invalidate_lock, amend lock class later
Linus Torvalds [Fri, 24 Sep 2021 17:18:07 +0000 (10:18 -0700)]
Merge branch 'work.init' of git://git./linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
"Followups to nodev root stuff from this merge window"
* 'work.init' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
init: don't panic if mount_nodev_root failed
init/do_mounts.c: Harden split_fs_names() against buffer overflow
Linus Torvalds [Fri, 24 Sep 2021 17:14:19 +0000 (10:14 -0700)]
Merge tag 'drm-fixes-2021-09-24' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Quiet week this week, just some i915 and amd fixes, just getting ready
for my all nighter maintainer summit!
Summary:
i915:
- Fix ADL-P memory bandwidth parameters
- Fix memory corruption due to a double free
- Fix memory leak in DMC firmware handling
amdgpu:
- Update MAINTAINERS entry for powerplay
- Fix empty macros
- SI DPM fix
amdkfd:
- SVM fixes
- DMA mapping fix"
* tag 'drm-fixes-2021-09-24' of git://anongit.freedesktop.org/drm/drm:
drm/amdkfd: fix svm_migrate_fini warning
drm/amdkfd: handle svm migrate init error
drm/amd/pm: Update intermediate power state for SI
drm/amdkfd: fix dma mapping leaking warning
drm/amdkfd: SVM map to gpus check vma boundary
MAINTAINERS: fix up entry for AMD Powerplay
drm/amd/display: fix empty debug macros
drm/i915: Free all DMC payloads
drm/i915: Move __i915_gem_free_object to ttm_bo_destroy
drm/i915: Update memory bandwidth parameters
Ming Lei [Thu, 23 Sep 2021 02:37:51 +0000 (10:37 +0800)]
block: hold ->invalidate_lock in blkdev_fallocate
When running ->fallocate(), blkdev_fallocate() should hold
mapping->invalidate_lock to prevent page cache from being accessed,
otherwise stale data may be read in page cache.
Without this patch, blktests block/009 fails sometimes. With this patch,
block/009 can pass always.
Also as Jan pointed out, no pages can be created in the discarded area
while you are holding the invalidate_lock, so remove the 2nd
truncate_bdev_range().
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210923023751.1441091-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Zhihao Cheng [Thu, 23 Sep 2021 13:49:21 +0000 (21:49 +0800)]
blktrace: Fix uaf in blk_trace access after removing by sysfs
There is an use-after-free problem triggered by following process:
P1(sda) P2(sdb)
echo 0 > /sys/block/sdb/trace/enable
blk_trace_remove_queue
synchronize_rcu
blk_trace_free
relay_close
rcu_read_lock
__blk_add_trace
trace_note_tsk
(Iterate running_trace_list)
relay_close_buf
relay_destroy_buf
kfree(buf)
trace_note(sdb's bt)
relay_reserve
buf->offset <- nullptr deference (use-after-free) !!!
rcu_read_unlock
[ 502.714379] BUG: kernel NULL pointer dereference, address:
0000000000000010
[ 502.715260] #PF: supervisor read access in kernel mode
[ 502.715903] #PF: error_code(0x0000) - not-present page
[ 502.716546] PGD
103984067 P4D
103984067 PUD
17592b067 PMD 0
[ 502.717252] Oops: 0000 [#1] SMP
[ 502.720308] RIP: 0010:trace_note.isra.0+0x86/0x360
[ 502.732872] Call Trace:
[ 502.733193] __blk_add_trace.cold+0x137/0x1a3
[ 502.733734] blk_add_trace_rq+0x7b/0xd0
[ 502.734207] blk_add_trace_rq_issue+0x54/0xa0
[ 502.734755] blk_mq_start_request+0xde/0x1b0
[ 502.735287] scsi_queue_rq+0x528/0x1140
...
[ 502.742704] sg_new_write.isra.0+0x16e/0x3e0
[ 502.747501] sg_ioctl+0x466/0x1100
Reproduce method:
ioctl(/dev/sda, BLKTRACESETUP, blk_user_trace_setup[buf_size=127])
ioctl(/dev/sda, BLKTRACESTART)
ioctl(/dev/sdb, BLKTRACESETUP, blk_user_trace_setup[buf_size=127])
ioctl(/dev/sdb, BLKTRACESTART)
echo 0 > /sys/block/sdb/trace/enable &
// Add delay(mdelay/msleep) before kernel enters blk_trace_free()
ioctl$SG_IO(/dev/sda, SG_IO, ...)
// Enters trace_note_tsk() after blk_trace_free() returned
// Use mdelay in rcu region rather than msleep(which may schedule out)
Remove blk_trace from running_list before calling blk_trace_free() by
sysfs if blk_trace is at Blktrace_running state.
Fixes:
c71a896154119f ("blktrace: add ftrace plugin")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Link: https://lore.kernel.org/r/20210923134921.109194-1-chengzhihao1@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Ming Lei [Fri, 24 Sep 2021 11:07:04 +0000 (19:07 +0800)]
block: don't call rq_qos_ops->done_bio if the bio isn't tracked
rq_qos framework is only applied on request based driver, so:
1) rq_qos_done_bio() needn't to be called for bio based driver
2) rq_qos_done_bio() needn't to be called for bio which isn't tracked,
such as bios ended from error handling code.
Especially in bio_endio():
1) request queue is referred via bio->bi_bdev->bd_disk->queue, which
may be gone since request queue refcount may not be held in above two
cases
2) q->rq_qos may be freed in blk_cleanup_queue() when calling into
__rq_qos_done_bio()
Fix the potential kernel panic by not calling rq_qos_ops->done_bio if
the bio isn't tracked. This way is safe because both ioc_rqos_done_bio()
and blkcg_iolatency_done_bio() are nop if the bio isn't tracked.
Reported-by: Yu Kuai <yukuai3@huawei.com>
Cc: tj@kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20210924110704.1541818-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Fri, 24 Sep 2021 16:14:48 +0000 (17:14 +0100)]
io_uring: kill extra checks in io_write()
We don't retry short writes and so we would never get to async setup in
io_write() in that case. Thus ret2 > 0 is always false and
iov_iter_advance() is never used. Apparently, the same is found by
Coverity, which complains on the code.
Fixes:
cd65869512ab ("io_uring: use iov_iter state save/restore helpers")
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/5b33e61034748ef1022766efc0fb8854cfcf749c.1632500058.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Fri, 24 Sep 2021 14:43:54 +0000 (08:43 -0600)]
io_uring: don't punt files update to io-wq unconditionally
There's no reason to punt it unconditionally, we just need to ensure that
the submit lock grabbing is conditional.
Fixes:
05f3fb3c5397 ("io_uring: avoid ring quiesce for fixed file set unregister and update")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Fri, 24 Sep 2021 13:39:08 +0000 (07:39 -0600)]
io_uring: put provided buffer meta data under memcg accounting
For each provided buffer, we allocate a struct io_buffer to hold the
data associated with it. As a large number of buffers can be provided,
account that data with memcg.
Fixes:
ddf0322db79c ("io_uring: add IORING_OP_PROVIDE_BUFFERS")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Fri, 24 Sep 2021 13:12:27 +0000 (07:12 -0600)]
io_uring: allow conditional reschedule for intensive iterators
If we have a lot of threads and rings, the tctx list can get quite big.
This is especially true if we keep creating new threads and rings.
Likewise for the provided buffers list. Be nice and insert a conditional
reschedule point while iterating the nodes for deletion.
Link: https://lore.kernel.org/io-uring/00000000000064b6b405ccb41113@google.com/
Reported-by: syzbot+111d2a03f51f5ae73775@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Hao Xu [Wed, 22 Sep 2021 10:12:38 +0000 (18:12 +0800)]
io_uring: fix potential req refcount underflow
For multishot mode, there may be cases like:
iowq original context
io_poll_add
_arm_poll()
mask = vfs_poll() is not 0
if mask
(2) io_poll_complete()
compl_unlock
(interruption happens
tw queued to original
context)
io_poll_task_func()
compl_lock
(3) done = io_poll_complete() is true
compl_unlock
put req ref
(1) if (poll->flags & EPOLLONESHOT)
put req ref
EPOLLONESHOT flag in (1) may be from (2) or (3), so there are multiple
combinations that can cause ref underfow.
Let's address it by:
- check the return value in (2) as done
- change (1) to if (done)
in this way, we only do ref put in (1) if 'oneshot flag' is from
(2)
- do poll.done check in io_poll_task_func(), so that we won't put ref
for the second time.
Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Link: https://lore.kernel.org/r/20210922101238.7177-4-haoxu@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Hao Xu [Wed, 22 Sep 2021 10:12:37 +0000 (18:12 +0800)]
io_uring: fix missing set of EPOLLONESHOT for CQ ring overflow
We should set EPOLLONESHOT if cqring_fill_event() returns false since
io_poll_add() decides to put req or not by it.
Fixes:
5082620fb2ca ("io_uring: terminate multishot poll for CQ ring overflow")
Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Link: https://lore.kernel.org/r/20210922101238.7177-3-haoxu@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Hao Xu [Wed, 22 Sep 2021 10:12:36 +0000 (18:12 +0800)]
io_uring: fix race between poll completion and cancel_hash insertion
If poll arming and poll completion runs in parallel, there maybe races.
For instance, run io_poll_add in iowq and io_poll_task_func in original
context, then:
iowq original context
io_poll_add
vfs_poll
(interruption happens
tw queued to original
context) io_poll_task_func
generate cqe
del from cancel_hash[]
if !poll.done
insert to cancel_hash[]
The entry left in cancel_hash[], similar case for fast poll.
Fix it by set poll.done = true when del from cancel_hash[].
Fixes:
5082620fb2ca ("io_uring: terminate multishot poll for CQ ring overflow")
Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Link: https://lore.kernel.org/r/20210922101238.7177-2-haoxu@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 21 Sep 2021 14:24:57 +0000 (08:24 -0600)]
io-wq: ensure we exit if thread group is exiting
Dave reports that a coredumping workload gets stuck in 5.15-rc2, and
identified the culprit in the Fixes line below. The problem is that
relying solely on fatal_signal_pending() to gate whether to exit or not
fails miserably if a process gets eg SIGILL sent. Don't exclusively
rely on fatal signals, also check if the thread group is exiting.
Fixes:
15e20db2e0ce ("io-wq: only exit on fatal signals")
Reported-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Fri, 24 Sep 2021 13:15:21 +0000 (07:15 -0600)]
Merge tag 'nvme-5.15-2021-09-24' of git://git.infradead.org/nvme into block-5.15
Pull NVMe fixes from Christoph:
"nvme fixes for Linux 5.15:
- keep ctrl->namespaces ordered (me)
- fix incorrect h2cdata pdu offset accounting in nvme-tcp
(Sagi Grimberg)
- handled updated hw_queues in nvme-fc more carefully (Daniel Wagner,
James Smart)"
* tag 'nvme-5.15-2021-09-24' of git://git.infradead.org/nvme:
nvme: keep ctrl->namespaces ordered
nvme-tcp: fix incorrect h2cdata pdu offset accounting
nvme-fc: remove freeze/unfreeze around update_nr_hw_queues
nvme-fc: avoid race between time out and tear down
nvme-fc: update hardware queues before using them
Thomas Gleixner [Fri, 24 Sep 2021 12:11:04 +0000 (14:11 +0200)]
Merge tag 'irqchip-fixes-5.15-1' of git://git./linux/kernel/git/maz/arm-platforms into irq/urgent
Pull irqchip fixes from Marc Zyngier:
- Work around a bad GIC integration on a Renesas platform, where the
interconnect cannot deal with byte-sized MMIO accesses
- Cleanup another Renesas driver abusing the comma operator
- Fix a potential GICv4 memory leak on an error path
- Make the type of 'size' consistent with the rest of the code in
__irq_domain_add()
- Fix a regression in the Armada 370-XP IPI path
- Fix the build for the obviously unloved goldfish-pic
- Some documentation fixes
Link: https://lore.kernel.org/r/20210924090933.2766857-1-maz@kernel.org
Numfor Mbiziwo-Tiapo [Thu, 23 Sep 2021 16:18:43 +0000 (09:18 -0700)]
x86/insn, tools/x86: Fix undefined behavior due to potential unaligned accesses
Don't perform unaligned loads in __get_next() and __peek_nbyte_next() as
these are forms of undefined behavior:
"A pointer to an object or incomplete type may be converted to a pointer
to a different object or incomplete type. If the resulting pointer
is not correctly aligned for the pointed-to type, the behavior is
undefined."
(from http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf)
These problems were identified using the undefined behavior sanitizer
(ubsan) with the tools version of the code and perf test.
[ bp: Massage commit message. ]
Signed-off-by: Numfor Mbiziwo-Tiapo <nums@google.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lkml.kernel.org/r/20210923161843.751834-1-irogers@google.com
Greg Kroah-Hartman [Fri, 24 Sep 2021 08:22:53 +0000 (10:22 +0200)]
Merge tag 'usb-serial-5.15-rc3' of https://git./linux/kernel/git/johan/usb-serial into usb-linus
Johan writes:
USB-serial fixes for 5.15-rc3
Here's a fix for a regression affecting some CP2102 devices and a host
of new device ids.
Included are also a couple of cleanups of duplicate device ids, which
are also tagged for stable to keep the tables in sync, and a trivial
patch to help debugging cp210x issues.
All have been in linux-next with no reported issues. Note however that
the last last two device-id commits were rebased to fix up a lore link
in a commit message (as the patch itself never made it to the list).
* tag 'usb-serial-5.15-rc3' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial:
USB: serial: option: add device id for Foxconn T99W265
USB: serial: cp210x: add ID for GW Instek GDM-834x Digital Multimeter
USB: serial: cp210x: add part-number debug printk
USB: serial: cp210x: fix dropped characters with CP2102
USB: serial: option: remove duplicate USB device ID
USB: serial: mos7840: remove duplicated 0xac24 device ID
USB: serial: option: add Telit LN920 compositions
Slark Xiao [Fri, 17 Sep 2021 11:01:06 +0000 (19:01 +0800)]
USB: serial: option: add device id for Foxconn T99W265
Adding support for Foxconn device T99W265 for enumeration with
PID 0xe0db.
usb-devices output for 0xe0db
T: Bus=04 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 19 Spd=5000 MxCh= 0
D: Ver= 3.20 Cls=ef(misc ) Sub=02 Prot=01 MxPS= 9 #Cfgs= 1
P: Vendor=0489 ProdID=e0db Rev=05.04
S: Manufacturer=Microsoft
S: Product=Generic Mobile Broadband Adapter
S: SerialNumber=
6c50f452
C: #Ifs= 5 Cfg#= 1 Atr=a0 MxPwr=896mA
I: If#=0x0 Alt= 0 #EPs= 1 Cls=02(commc) Sub=0e Prot=00 Driver=cdc_mbim
I: If#=0x1 Alt= 1 #EPs= 2 Cls=0a(data ) Sub=00 Prot=02 Driver=cdc_mbim
I: If#=0x2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
I: If#=0x3 Alt= 0 #EPs= 1 Cls=ff(vend.) Sub=ff Prot=ff Driver=(none)
I: If#=0x4 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option
if0/1: MBIM, if2:Diag, if3:GNSS, if4: Modem
Signed-off-by: Slark Xiao <slark_xiao@163.com>
Link: https://lore.kernel.org/r/20210917110106.9852-1-slark_xiao@163.com
[ johan: use USB_DEVICE_INTERFACE_CLASS(), amend comment ]
Cc: stable@vger.kernel.org
Signed-off-by: Johan Hovold <johan@kernel.org>
Uwe Brandt [Tue, 21 Sep 2021 17:54:46 +0000 (19:54 +0200)]
USB: serial: cp210x: add ID for GW Instek GDM-834x Digital Multimeter
Add the USB serial device ID for the GW Instek GDM-834x Digital Multimeter.
Signed-off-by: Uwe Brandt <uwe.brandt@gmail.com>
Link: https://lore.kernel.org/r/YUxFl3YUCPGJZd8Y@hovoldconsulting.com
Cc: stable@vger.kernel.org
Signed-off-by: Johan Hovold <johan@kernel.org>
Steve French [Fri, 24 Sep 2021 00:18:37 +0000 (19:18 -0500)]
cifs: fix incorrect check for null pointer in header_assemble
Although very unlikely that the tlink pointer would be null in this case,
get_next_mid function can in theory return null (but not an error)
so need to check for null (not for IS_ERR, which can not be returned
here).
Address warning:
fs/smbfs_client/connect.c:2392 cifs_match_super()
warn: 'tlink' isn't an ERR_PTR
Pointed out by Dan Carpenter via smatch code analysis tool
CC: stable@vger.kernel.org
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Steve French [Thu, 23 Sep 2021 23:52:40 +0000 (18:52 -0500)]
smb3: correct server pointer dereferencing check to be more consistent
Address warning:
fs/smbfs_client/misc.c:273 header_assemble()
warn: variable dereferenced before check 'treeCon->ses->server'
Pointed out by Dan Carpenter via smatch code analysis tool
Although the check is likely unneeded, adding it makes the code
more consistent and easier to read, as the same check is
done elsewhere in the function.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Dave Airlie [Thu, 23 Sep 2021 23:39:16 +0000 (09:39 +1000)]
Merge tag 'drm-intel-fixes-2021-09-23' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
drm/i915 fixes for v5.15-rc3:
- Fix ADL-P memory bandwidth parameters
- Fix memory corruption due to a double free
- Fix memory leak in DMC firmware handling
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/87o88jbk3o.fsf@intel.com
Dave Airlie [Thu, 23 Sep 2021 22:39:02 +0000 (08:39 +1000)]
Merge tag 'amd-drm-fixes-5.15-2021-09-23' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-5.15-2021-09-23:
amdgpu:
- Update MAINTAINERS entry for powerplay
- Fix empty macros
- SI DPM fix
amdkfd:
- SVM fixes
- DMA mapping fix
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210923211330.20725-1-alexander.deucher@amd.com
Linus Torvalds [Thu, 23 Sep 2021 21:39:41 +0000 (14:39 -0700)]
Merge tag 'for-5.15-rc2-tag' of git://git./linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- regression fix for leak of transaction handle after verity rollback
failure
- properly reset device last error between mounts
- improve one error handling case when checksumming bios
- fixup confusing displayed size of space info free space
* tag 'for-5.15-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: prevent __btrfs_dump_space_info() to underflow its free space
btrfs: fix mount failure due to past and transient device flush error
btrfs: fix transaction handle leak after verity rollback failure
btrfs: replace BUG_ON() in btrfs_csum_one_bio() with proper error handling
Steve French [Thu, 23 Sep 2021 21:00:31 +0000 (16:00 -0500)]
smb3: correct smb3 ACL security descriptor
Address warning:
fs/smbfs_client/smb2pdu.c:2425 create_sd_buf()
warn: struct type mismatch 'smb3_acl vs cifs_acl'
Pointed out by Dan Carpenter via smatch code analysis tool
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Linus Torvalds [Thu, 23 Sep 2021 21:17:06 +0000 (14:17 -0700)]
Merge tag 'selinux-pr-
20210923' of git://git./linux/kernel/git/pcmoore/selinux
Pull SELinux/Smack fixes from Paul Moore:
"Another single-patch pull request for SELinux, as well as Smack.
This fixes some credential misuse and is explained reasonably well in
the patch description"
* tag 'selinux-pr-
20210923' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux,smack: fix subjective/objective credential use mixups
Steve French [Thu, 23 Sep 2021 17:42:35 +0000 (12:42 -0500)]
cifs: Clear modified attribute bit from inode flags
Clear CIFS_INO_MODIFIED_ATTR bit from inode flags after
updating mtime and ctime
Signed-off-by: Rohith Surabattula <rohiths@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
Cc: stable@vger.kernel.org # 5.13+
Signed-off-by: Steve French <stfrench@microsoft.com>
Philip Yang [Mon, 20 Sep 2021 21:25:52 +0000 (17:25 -0400)]
drm/amdkfd: fix svm_migrate_fini warning
Device manager releases device-specific resources when a driver
disconnects from a device, devm_memunmap_pages and
devm_release_mem_region calls in svm_migrate_fini are redundant.
It causes below warning trace after patch "drm/amdgpu: Split
amdgpu_device_fini into early and late", so remove function
svm_migrate_fini.
BUG: https://gitlab.freedesktop.org/drm/amd/-/issues/1718
WARNING: CPU: 1 PID: 3646 at drivers/base/devres.c:795
devm_release_action+0x51/0x60
Call Trace:
? memunmap_pages+0x360/0x360
svm_migrate_fini+0x2d/0x60 [amdgpu]
kgd2kfd_device_exit+0x23/0xa0 [amdgpu]
amdgpu_amdkfd_device_fini_sw+0x1d/0x30 [amdgpu]
amdgpu_device_fini_sw+0x45/0x290 [amdgpu]
amdgpu_driver_release_kms+0x12/0x30 [amdgpu]
drm_dev_release+0x20/0x40 [drm]
release_nodes+0x196/0x1e0
device_release_driver_internal+0x104/0x1d0
driver_detach+0x47/0x90
bus_remove_driver+0x7a/0xd0
pci_unregister_driver+0x3d/0x90
amdgpu_exit+0x11/0x20 [amdgpu]
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Fri, 17 Sep 2021 18:32:14 +0000 (14:32 -0400)]
drm/amdkfd: handle svm migrate init error
If svm migration init failed to create pgmap for device memory, set
pgmap type to 0 to disable device SVM support capability.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lijo Lazar [Thu, 23 Sep 2021 03:58:43 +0000 (11:58 +0800)]
drm/amd/pm: Update intermediate power state for SI
Update the current state as boot state during dpm initialization.
During the subsequent initialization, set_power_state gets called to
transition to the final power state. set_power_state refers to values
from the current state and without current state populated, it could
result in NULL pointer dereference.
For ex: on platforms where PCI speed change is supported through ACPI
ATCS method, the link speed of current state needs to be queried before
deciding on changing to final power state's link speed. The logic to query
ATCS-support was broken on certain platforms. The issue became visible
when broken ATCS-support logic got fixed with commit
f9b7f3703ff9 ("drm/amdgpu/acpi: make ATPX/ATCS structures global (v2)").
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1698
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Philip Yang [Tue, 14 Sep 2021 20:33:40 +0000 (16:33 -0400)]
drm/amdkfd: fix dma mapping leaking warning
For xnack off, restore work dma unmap previous system memory page, and
dma map the updated system memory page to update GPU mapping, this is
not dma mapping leaking, remove the WARN_ONCE for dma mapping leaking.
prange->dma_addr store the VRAM page pfn after the range migrated to
VRAM, should not dma unmap VRAM page when updating GPU mapping or
remove prange. Add helper svm_is_valid_dma_mapping_addr to check VRAM
page and error cases.
Mask out SVM_RANGE_VRAM_DOMAIN flag in dma_addr before calling amdgpu vm
update to avoid BUG_ON(*addr & 0xFFFF00000000003FULL), and set it again
immediately after. This flag is used to know the type of page later to
dma unmapping system memory page.
Fixes:
1d5dbfe6c06a ("drm/amdkfd: classify and map mixed svm range pages in GPU")
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Mon, 13 Sep 2021 14:03:36 +0000 (10:03 -0400)]
drm/amdkfd: SVM map to gpus check vma boundary
SVM range may includes multiple VMAs with different vm_flags, if prange
page index is the last page of the VMA offset + npages, update GPU
mapping to create GPU page table with same VMA access permission.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Fri, 17 Sep 2021 16:05:30 +0000 (12:05 -0400)]
MAINTAINERS: fix up entry for AMD Powerplay
Fix the path to cover both the older powerplay infrastructure
and the newer SwSMU infrastructure.
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Arnd Bergmann [Mon, 20 Sep 2021 12:16:00 +0000 (14:16 +0200)]
drm/amd/display: fix empty debug macros
Using an empty macro expansion as a conditional expression
produces a W=1 warning:
drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dce_aux.c: In function 'dce_aux_transfer_with_retries':
drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dce_aux.c:775:156: error: suggest braces around empty body in an 'if' statement [-Werror=empty-body]
775 | "dce_aux_transfer_with_retries: AUX_RET_SUCCESS: AUX_TRANSACTION_REPLY_I2C_OVER_AUX_DEFER");
| ^
drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dce_aux.c:783:155: error: suggest braces around empty body in an 'if' statement [-Werror=empty-body]
783 | "dce_aux_transfer_with_retries: AUX_RET_SUCCESS: AUX_TRANSACTION_REPLY_I2C_OVER_AUX_NACK");
| ^
Expand it to "do { } while (0)" instead to make the expression
more robust and avoid the warning.
Fixes:
56aca2309301 ("drm/amd/display: Add AUX I2C tracing.")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
David Howells [Mon, 20 Sep 2021 12:14:15 +0000 (13:14 +0100)]
cifs: Deal with some warnings from W=1
Deal with some warnings generated from make W=1:
(1) Add/remove/fix kerneldoc parameters descriptions.
(2) Turn cifs' rqst_page_get_length()'s banner comment into a kerneldoc
comment. It should probably be prefixed with "cifs_" though.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Jia He [Thu, 23 Sep 2021 03:35:57 +0000 (11:35 +0800)]
Revert "ACPI: Add memory semantics to acpi_os_map_memory()"
This reverts commit
437b38c51162f8b87beb28a833c4d5dc85fa864e.
The memory semantics added in commit
437b38c51162 causes SystemMemory
Operation region, whose address range is not described in the EFI memory
map to be mapped as NormalNC memory on arm64 platforms (through
acpi_os_map_memory() in acpi_ex_system_memory_space_handler()).
This triggers the following abort on an ARM64 Ampere eMAG machine,
because presumably the physical address range area backing the Opregion
does not support NormalNC memory attributes driven on the bus.
Internal error: synchronous external abort:
96000410 [#1] SMP
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.14.0+ #462
Hardware name: MiTAC RAPTOR EV-883832-X3-0001/RAPTOR, BIOS 0.14 02/22/2019
pstate:
60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[...snip...]
Call trace:
acpi_ex_system_memory_space_handler+0x26c/0x2c8
acpi_ev_address_space_dispatch+0x228/0x2c4
acpi_ex_access_region+0x114/0x268
acpi_ex_field_datum_io+0x128/0x1b8
acpi_ex_extract_from_field+0x14c/0x2ac
acpi_ex_read_data_from_field+0x190/0x1b8
acpi_ex_resolve_node_to_value+0x1ec/0x288
acpi_ex_resolve_to_value+0x250/0x274
acpi_ds_evaluate_name_path+0xac/0x124
acpi_ds_exec_end_op+0x90/0x410
acpi_ps_parse_loop+0x4ac/0x5d8
acpi_ps_parse_aml+0xe0/0x2c8
acpi_ps_execute_method+0x19c/0x1ac
acpi_ns_evaluate+0x1f8/0x26c
acpi_ns_init_one_device+0x104/0x140
acpi_ns_walk_namespace+0x158/0x1d0
acpi_ns_initialize_devices+0x194/0x218
acpi_initialize_objects+0x48/0x50
acpi_init+0xe0/0x498
If the Opregion address range is not present in the EFI memory map there
is no way for us to determine the memory attributes to use to map it -
defaulting to NormalNC does not work (and it is not correct on a memory
region that may have read side-effects) and therefore commit
437b38c51162 should be reverted, which means reverting back to the
original behavior whereby address ranges that are mapped using
acpi_os_map_memory() default to the safe devicenGnRnE attributes on
ARM64 if the mapped address range is not defined in the EFI memory map.
Fixes:
437b38c51162 ("ACPI: Add memory semantics to acpi_os_map_memory()")
Signed-off-by: Jia He <justin.he@arm.com>
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Linus Torvalds [Thu, 23 Sep 2021 18:24:12 +0000 (11:24 -0700)]
Merge tag 'for-linus-rseq' of git://git./virt/kvm/kvm
Pull rseq fixes from Paolo Bonzini:
"A fix for a bug with restartable sequences and KVM.
KVM's handling of TIF_NOTIFY_RESUME, e.g. for task migration, clears
the flag without informing rseq and leads to stale data in userspace's
rseq struct"
* tag 'for-linus-rseq' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: selftests: Remove __NR_userfaultfd syscall fallback
KVM: selftests: Add a test for KVM_RUN+rseq to detect task migration bugs
tools: Move x86 syscall number fallbacks to .../uapi/
entry: rseq: Call rseq_handle_notify_resume() in tracehook_notify_resume()
KVM: rseq: Update rseq when processing NOTIFY_RESUME on xfer to KVM guest
Linus Torvalds [Thu, 23 Sep 2021 17:30:31 +0000 (10:30 -0700)]
Merge tag 'net-5.15-rc3' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Current release - regressions:
- dsa: bcm_sf2: fix array overrun in bcm_sf2_num_active_ports()
Previous releases - regressions:
- introduce a shutdown method to mdio device drivers, and make DSA
switch drivers compatible with masters disappearing on shutdown;
preventing infinite reference wait
- fix issues in mdiobus users related to ->shutdown vs ->remove
- virtio-net: fix pages leaking when building skb in big mode
- xen-netback: correct success/error reporting for the
SKB-with-fraglist
- dsa: tear down devlink port regions when tearing down the devlink
port on error
- nexthop: fix division by zero while replacing a resilient group
- hns3: check queue, vf, vlan ids range before using
Previous releases - always broken:
- napi: fix race against netpoll causing NAPI getting stuck
- mlx4_en: ensure link operstate is updated even if link comes up
before netdev registration
- bnxt_en: fix TX timeout when TX ring size is set to the smallest
- enetc: fix illegal access when reading affinity_hint; prevent oops
on sysfs access
- mtk_eth_soc: avoid creating duplicate offload entries
Misc:
- core: correct the sock::sk_lock.owned lockdep annotations"
* tag 'net-5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (51 commits)
atlantic: Fix issue in the pm resume flow.
net/mlx4_en: Don't allow aRFS for encapsulated packets
net: mscc: ocelot: fix forwarding from BLOCKING ports remaining enabled
net: ethernet: mtk_eth_soc: avoid creating duplicate offload entries
nfc: st-nci: Add SPI ID matching DT compatible
MAINTAINERS: remove Guvenc Gulce as net/smc maintainer
nexthop: Fix memory leaks in nexthop notification chain listeners
mptcp: ensure tx skbs always have the MPTCP ext
qed: rdma - don't wait for resources under hw error recovery flow
s390/qeth: fix deadlock during failing recovery
s390/qeth: Fix deadlock in remove_discipline
s390/qeth: fix NULL deref in qeth_clear_working_pool_list()
net: dsa: realtek: register the MDIO bus under devres
net: dsa: don't allocate the slave_mii_bus using devres
Doc: networking: Fox a typo in ice.rst
net: dsa: fix dsa_tree_setup error path
net/smc: fix 'workqueue leaked lock' in smc_conn_abort_work
net/smc: add missing error check in smc_clc_prfx_set()
net: hns3: fix a return value error in hclge_get_reset_status()
net: hns3: check vlan id before using it
...
Shakeel Butt [Wed, 22 Sep 2021 22:49:06 +0000 (15:49 -0700)]
memcg: flush lruvec stats in the refault
Prior to the commit
7e1c0d6f5820 ("memcg: switch lruvec stats to rstat")
and the commit
aa48e47e3906 ("memcg: infrastructure to flush memcg
stats"), each lruvec memcg stats can be off by (nr_cgroups * nr_cpus *
32) at worst and for unbounded amount of time. The commit
aa48e47e3906
moved the lruvec stats to rstat infrastructure and the commit
7e1c0d6f5820 bounded the error for all the lruvec stats to (nr_cpus *
32) at worst for at most 2 seconds. More specifically it decoupled the
number of stats and the number of cgroups from the error rate.
However this reduction in error comes with the cost of triggering the
slowpath of stats update more frequently. Previously in the slowpath
the kernel adds the stats up the memcg tree. After
aa48e47e3906, the
kernel triggers the asyn lruvec stats flush through queue_work(). This
causes regression reports from 0day kernel bot [1] as well as from
phoronix test suite [2].
We tried two options to fix the regression:
1) Increase the threshold to trigger the slowpath in lruvec stats
update codepath from 32 to 512.
2) Remove the slowpath from lruvec stats update codepath and instead
flush the stats in the page refault codepath. The assumption is that
the kernel timely flush the stats, so, the update tree would be
small in the refault codepath to not cause the preformance impact.
Following are the results of will-it-scale/page_fault[1|2|3] benchmark
on four settings i.e. (1) 5.15-rc1 as baseline (2) 5.15-rc1 with
aa48e47e3906 and
7e1c0d6f5820 reverted (3) 5.15-rc1 with option-1
(4) 5.15-rc1 with option-2.
test (1) (2) (3) (4)
pg_f1 368563 406277 (10.23%) 399693 (8.44%) 416398 (12.97%)
pg_f2 338399 372133 (9.96%) 369180 (9.09%) 381024 (12.59%)
pg_f3 500853 575399 (14.88%) 570388 (13.88%) 576083 (15.02%)
From the above result, it seems like the option-2 not only solves the
regression but also improves the performance for at least these
benchmarks.
Feng Tang (intel) ran the aim7 benchmark with these two options and
confirms that option-1 reduces the regression but option-2 removes the
regression.
Michael Larabel (phoronix) ran multiple benchmarks with these options
and reported the results at [3] and it shows for most benchmarks
option-2 removes the regression introduced by the commit
aa48e47e3906
("memcg: infrastructure to flush memcg stats").
Based on the experiment results, this patch proposed the option-2 as the
solution to resolve the regression.
Link: https://lore.kernel.org/all/20210726022421.GB21872@xsang-OptiPlex-9020
Link: https://www.phoronix.com/scan.php?page=article&item=linux515-compile-regress
Link: https://openbenchmarking.org/result/2109226-DEBU-LINUX5104
Fixes:
aa48e47e3906 ("memcg: infrastructure to flush memcg stats")
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Tested-by: Michael Larabel <Michael@phoronix.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Hillf Danton <hdanton@sina.com>,
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Paul Moore [Thu, 23 Sep 2021 13:50:11 +0000 (09:50 -0400)]
selinux,smack: fix subjective/objective credential use mixups
Jann Horn reported a problem with commit
eb1231f73c4d ("selinux:
clarify task subjective and objective credentials") where some LSM
hooks were attempting to access the subjective credentials of a task
other than the current task. Generally speaking, it is not safe to
access another task's subjective credentials and doing so can cause
a number of problems.
Further, while looking into the problem, I realized that Smack was
suffering from a similar problem brought about by a similar commit
1fb057dcde11 ("smack: differentiate between subjective and objective
task credentials").
This patch addresses this problem by restoring the use of the task's
objective credentials in those cases where the task is other than the
current executing task. Not only does this resolve the problem
reported by Jann, it is arguably the correct thing to do in these
cases.
Cc: stable@vger.kernel.org
Fixes:
eb1231f73c4d ("selinux: clarify task subjective and objective credentials")
Fixes:
1fb057dcde11 ("smack: differentiate between subjective and objective task credentials")
Reported-by: Jann Horn <jannh@google.com>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Yue Hu [Tue, 14 Sep 2021 03:59:15 +0000 (11:59 +0800)]
erofs: clear compacted_2b if compacted_4b_initial > totalidx
Currently, the whole indexes will only be compacted 4B if
compacted_4b_initial > totalidx. So, the calculated compacted_2b
is worthless for that case. It may waste CPU resources.
No need to update compacted_4b_initial as mkfs since it's used to
fulfill the alignment of the 1st compacted_2b pack and would handle
the case above.
We also need to clarify compacted_4b_end here. It's used for the
last lclusters which aren't fitted in the previous compacted_2b
packs.
Some messages are from Xiang.
Link: https://lore.kernel.org/r/20210914035915.1190-1-zbestahu@gmail.com
Signed-off-by: Yue Hu <huyue2@yulong.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
[ Gao Xiang: it's enough to use "compacted_4b_initial < totalidx". ]
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Gao Xiang [Wed, 22 Sep 2021 09:51:41 +0000 (17:51 +0800)]
erofs: fix misbehavior of unsupported chunk format check
Unsupported chunk format should be checked with
"if (vi->chunkformat & ~EROFS_CHUNK_FORMAT_ALL)"
Found when checking with 4k-byte blockmap (although currently mkfs
uses inode chunk indexes format by default.)
Link: https://lore.kernel.org/r/20210922095141.233938-1-hsiangkao@linux.alibaba.com
Fixes:
c5aa903a59db ("erofs: support reading chunk-based uncompressed files")
Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Gao Xiang [Tue, 21 Sep 2021 14:35:30 +0000 (22:35 +0800)]
erofs: fix up erofs_lookup tracepoint
Fix up a misuse that the filename pointer isn't always valid in
the ring buffer, and we should copy the content instead.
Link: https://lore.kernel.org/r/20210921143531.81356-1-hsiangkao@linux.alibaba.com
Fixes:
13f06f48f7bf ("staging: erofs: support tracepoint")
Cc: stable@vger.kernel.org # 4.19+
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
dann frazier [Thu, 23 Sep 2021 14:50:02 +0000 (08:50 -0600)]
arm64: Restore forced disabling of KPTI on ThunderX
A noted side-effect of commit
0c6c2d3615ef ("arm64: Generate cpucaps.h")
is that cpucaps are now sorted, changing the enumeration order. This
assumed no dependencies between cpucaps, which turned out not to be true
in one case. UNMAP_KERNEL_AT_EL0 currently needs to be processed after
WORKAROUND_CAVIUM_27456. ThunderX systems are incompatible with KPTI, so
unmap_kernel_at_el0() bails if WORKAROUND_CAVIUM_27456 is set. But because
of the sorting, WORKAROUND_CAVIUM_27456 will not yet have been considered
when unmap_kernel_at_el0() checks for it, so the kernel tries to
run w/ KPTI - and quickly falls over.
Because all ThunderX implementations have homogeneous CPUs, we can remove
this dependency by just checking the current CPU for the erratum.
Fixes:
0c6c2d3615ef ("arm64: Generate cpucaps.h")
Cc: <stable@vger.kernel.org> # 5.13.x
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Suggested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210923145002.3394558-1-dann.frazier@canonical.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Sudarsana Reddy Kalluru [Thu, 23 Sep 2021 10:16:05 +0000 (03:16 -0700)]
atlantic: Fix issue in the pm resume flow.
After fixing hibernation resume flow, another usecase was found which
should be explicitly handled - resume when device is in "down" state.
Invoke aq_nic_init jointly with aq_nic_start only if ndev was already
up during suspend/hibernate. We still need to perform nic_deinit() if
caller requests for it, to handle the freeze/resume scenarios.
Fixes:
57f780f1c433 ("atlantic: Fix driver resume flow.")
Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>