Eiichi Tsukata [Mon, 10 Jun 2019 04:00:16 +0000 (13:00 +0900)]
tracing: Fix out-of-range read in trace_stack_print()
Puts range check before dereferencing the pointer.
Reproducer:
# echo stacktrace > trace_options
# echo 1 > events/enable
# cat trace > /dev/null
KASAN report:
==================================================================
BUG: KASAN: use-after-free in trace_stack_print+0x26b/0x2c0
Read of size 8 at addr
ffff888069d20000 by task cat/1953
CPU: 0 PID: 1953 Comm: cat Not tainted 5.2.0-rc3+ #5
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
Call Trace:
dump_stack+0x8a/0xce
print_address_description+0x60/0x224
? trace_stack_print+0x26b/0x2c0
? trace_stack_print+0x26b/0x2c0
__kasan_report.cold+0x1a/0x3e
? trace_stack_print+0x26b/0x2c0
kasan_report+0xe/0x20
trace_stack_print+0x26b/0x2c0
print_trace_line+0x6ea/0x14d0
? tracing_buffers_read+0x700/0x700
? trace_find_next_entry_inc+0x158/0x1d0
s_show+0xea/0x310
seq_read+0xaa7/0x10e0
? seq_escape+0x230/0x230
__vfs_read+0x7c/0x100
vfs_read+0x16c/0x3a0
ksys_read+0x121/0x240
? kernel_write+0x110/0x110
? perf_trace_sys_enter+0x8a0/0x8a0
? syscall_slow_exit_work+0xa9/0x410
do_syscall_64+0xb7/0x390
? prepare_exit_to_usermode+0x165/0x200
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f867681f910
Code: b6 fe ff ff 48 8d 3d 0f be 08 00 48 83 ec 08 e8 06 db 01 00 66 0f 1f 44 00 00 83 3d f9 2d 2c 00 00 75 10 b8 00 00 00 00 04
RSP: 002b:
00007ffdabf23488 EFLAGS:
00000246 ORIG_RAX:
0000000000000000
RAX:
ffffffffffffffda RBX:
0000000000020000 RCX:
00007f867681f910
RDX:
0000000000020000 RSI:
00007f8676cde000 RDI:
0000000000000003
RBP:
00007f8676cde000 R08:
ffffffffffffffff R09:
0000000000000000
R10:
0000000000000871 R11:
0000000000000246 R12:
00007f8676cde000
R13:
0000000000000003 R14:
0000000000020000 R15:
0000000000000ec0
Allocated by task 1214:
save_stack+0x1b/0x80
__kasan_kmalloc.constprop.0+0xc2/0xd0
kmem_cache_alloc+0xaf/0x1a0
getname_flags+0xd2/0x5b0
do_sys_open+0x277/0x5a0
do_syscall_64+0xb7/0x390
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Freed by task 1214:
save_stack+0x1b/0x80
__kasan_slab_free+0x12c/0x170
kmem_cache_free+0x8a/0x1c0
putname+0xe1/0x120
do_sys_open+0x2c5/0x5a0
do_syscall_64+0xb7/0x390
entry_SYSCALL_64_after_hwframe+0x44/0xa9
The buggy address belongs to the object at
ffff888069d20000
which belongs to the cache names_cache of size 4096
The buggy address is located 0 bytes inside of
4096-byte region [
ffff888069d20000,
ffff888069d21000)
The buggy address belongs to the page:
page:
ffffea0001a74800 refcount:1 mapcount:0 mapping:
ffff88806ccd1380 index:0x0 compound_mapcount: 0
flags: 0x100000000010200(slab|head)
raw:
0100000000010200 dead000000000100 dead000000000200 ffff88806ccd1380
raw:
0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff888069d1ff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888069d1ff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
ffff888069d20000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888069d20080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888069d20100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
Link: http://lkml.kernel.org/r/20190610040016.5598-1-devel@etsukata.com
Fixes:
4285f2fcef80 ("tracing: Remove the ULONG_MAX stack trace hackery")
Signed-off-by: Eiichi Tsukata <devel@etsukata.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Tomas Bortoli [Tue, 28 May 2019 15:43:38 +0000 (17:43 +0200)]
tracing: Avoid memory leak in predicate_parse()
In case of errors, predicate_parse() goes to the out_free label
to free memory and to return an error code.
However, predicate_parse() does not free the predicates of the
temporary prog_stack array, thence leaking them.
Link: http://lkml.kernel.org/r/20190528154338.29976-1-tomasbortoli@gmail.com
Cc: stable@vger.kernel.org
Fixes:
80765597bc587 ("tracing: Rewrite filter logic to be simpler and faster")
Reported-by: syzbot+6b8e0fb820e570c59e19@syzkaller.appspotmail.com
Signed-off-by: Tomas Bortoli <tomasbortoli@gmail.com>
[ Added protection around freeing prog_stack[i].pred ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Miguel Ojeda [Thu, 23 May 2019 12:45:35 +0000 (14:45 +0200)]
tracing: Silence GCC 9 array bounds warning
Starting with GCC 9, -Warray-bounds detects cases when memset is called
starting on a member of a struct but the size to be cleared ends up
writing over further members.
Such a call happens in the trace code to clear, at once, all members
after and including `seq` on struct trace_iterator:
In function 'memset',
inlined from 'ftrace_dump' at kernel/trace/trace.c:8914:3:
./include/linux/string.h:344:9: warning: '__builtin_memset' offset
[8505, 8560] from the object at 'iter' is out of the bounds of
referenced subobject 'seq' with type 'struct trace_seq' at offset
4368 [-Warray-bounds]
344 | return __builtin_memset(p, c, size);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
In order to avoid GCC complaining about it, we compute the address
ourselves by adding the offsetof distance instead of referring
directly to the member.
Since there are two places doing this clear (trace.c and trace_kdb.c),
take the chance to move the workaround into a single place in
the internal header.
Link: http://lkml.kernel.org/r/20190523124535.GA12931@gmail.com
Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
[ Removed unnecessary parenthesis around "iter" ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Jagadeesh Pagadala [Wed, 27 Mar 2019 22:19:46 +0000 (03:49 +0530)]
kernel/trace/trace.h: Remove duplicate header of trace_seq.h
Remove duplicate header which is included twice.
Link: http://lkml.kernel.org/r/1553725186-41442-1-git-send-email-jagdsh.linux@gmail.com
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: Jagadeesh Pagadala <jagdsh.linux@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Tom Zanussi [Thu, 18 Apr 2019 15:18:52 +0000 (10:18 -0500)]
tracing: Add a check_val() check before updating cond_snapshot() track_val
Without this check a snapshot is taken whenever a bucket's max is hit,
rather than only when the global max is hit, as it should be.
Before:
In this example, we do a first run of the workload (cyclictest),
examine the output, note the max ('triggering value') (347), then do
a second run and note the max again.
In this case, the max in the second run (39) is below the max in the
first run, but since we haven't cleared the histogram, the first max
is still in the histogram and is higher than any other max, so it
should still be the max for the snapshot. It isn't however - the
value should still be 347 after the second run.
# echo 'hist:keys=pid:ts0=common_timestamp.usecs if comm=="cyclictest"' >> /sys/kernel/debug/tracing/events/sched/sched_waking/trigger
# echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmax($wakeup_lat).save(next_prio,next_comm,prev_pid,prev_prio,prev_comm):onmax($wakeup_lat).snapshot() if next_comm=="cyclictest"' >> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
# cyclictest -p 80 -n -s -t 2 -D 2
# cat /sys/kernel/debug/tracing/events/sched/sched_switch/hist
{ next_pid: 2143 } hitcount: 199
max: 44 next_prio: 120 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/4
{ next_pid: 2145 } hitcount: 1325
max: 38 next_prio: 19 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/2
{ next_pid: 2144 } hitcount: 1982
max: 347 next_prio: 19 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/6
Snapshot taken (see tracing/snapshot). Details:
triggering value { onmax($wakeup_lat) }: 347
triggered by event with key: { next_pid: 2144 }
# cyclictest -p 80 -n -s -t 2 -D 2
# cat /sys/kernel/debug/tracing/events/sched/sched_switch/hist
{ next_pid: 2143 } hitcount: 199
max: 44 next_prio: 120 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/4
{ next_pid: 2148 } hitcount: 199
max: 16 next_prio: 120 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/1
{ next_pid: 2145 } hitcount: 1325
max: 38 next_prio: 19 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/2
{ next_pid: 2150 } hitcount: 1326
max: 39 next_prio: 19 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/4
{ next_pid: 2144 } hitcount: 1982
max: 347 next_prio: 19 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/6
{ next_pid: 2149 } hitcount: 1983
max: 130 next_prio: 19 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/0
Snapshot taken (see tracing/snapshot). Details:
triggering value { onmax($wakeup_lat) }: 39
triggered by event with key: { next_pid: 2150 }
After:
In this example, we do a first run of the workload (cyclictest),
examine the output, note the max ('triggering value') (375), then do
a second run and note the max again.
In this case, the max in the second run is still 375, the highest in
any bucket, as it should be.
# cyclictest -p 80 -n -s -t 2 -D 2
# cat /sys/kernel/debug/tracing/events/sched/sched_switch/hist
{ next_pid: 2072 } hitcount: 200
max: 28 next_prio: 120 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/5
{ next_pid: 2074 } hitcount: 1323
max: 375 next_prio: 19 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/2
{ next_pid: 2073 } hitcount: 1980
max: 153 next_prio: 19 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/6
Snapshot taken (see tracing/snapshot). Details:
triggering value { onmax($wakeup_lat) }: 375
triggered by event with key: { next_pid: 2074 }
# cyclictest -p 80 -n -s -t 2 -D 2
# cat /sys/kernel/debug/tracing/events/sched/sched_switch/hist
{ next_pid: 2101 } hitcount: 199
max: 49 next_prio: 120 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/6
{ next_pid: 2072 } hitcount: 200
max: 28 next_prio: 120 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/5
{ next_pid: 2074 } hitcount: 1323
max: 375 next_prio: 19 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/2
{ next_pid: 2103 } hitcount: 1325
max: 74 next_prio: 19 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/4
{ next_pid: 2073 } hitcount: 1980
max: 153 next_prio: 19 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/6
{ next_pid: 2102 } hitcount: 1981
max: 84 next_prio: 19 next_comm: cyclictest
prev_pid: 12 prev_prio: 120 prev_comm: kworker/0:1
Snapshot taken (see tracing/snapshot). Details:
triggering value { onmax($wakeup_lat) }: 375
triggered by event with key: { next_pid: 2074 }
Link: http://lkml.kernel.org/r/95958351329f129c07504b4d1769c47a97b70d65.1555597045.git.tom.zanussi@linux.intel.com
Cc: stable@vger.kernel.org
Fixes:
a3785b7eca8fd ("tracing: Add hist trigger snapshot() action")
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Tom Zanussi [Thu, 18 Apr 2019 15:18:51 +0000 (10:18 -0500)]
tracing: Check keys for variable references in expressions too
There's an existing check for variable references in keys, but it
doesn't go far enough. It checks whether a key field is a variable
reference but doesn't check whether it's an expression containing
variable references, which can cause the same problems for callers.
Use the existing field_has_hist_vars() function rather than a direct
top-level flag check to catch all possible variable references.
Link: http://lkml.kernel.org/r/e8c3d3d53db5ca90ceea5a46e5413103a6902fc7.1555597045.git.tom.zanussi@linux.intel.com
Cc: stable@vger.kernel.org
Fixes:
067fe038e70f6 ("tracing: Add variable reference handling to hist triggers")
Reported-by: Vincent Bernat <vincent@bernat.ch>
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Tom Zanussi [Thu, 18 Apr 2019 15:18:50 +0000 (10:18 -0500)]
tracing: Prevent hist_field_var_ref() from accessing NULL tracing_map_elts
hist_field_var_ref() is an implementation of hist_field_fn_t(), which
can be called with a null tracing_map_elt elt param when assembling a
key in event_hist_trigger().
In the case of hist_field_var_ref() this doesn't make sense, because a
variable can only be resolved by looking it up using an already
assembled key i.e. a variable can't be used to assemble a key since
the key is required in order to access the variable.
Upper layers should prevent the user from constructing a key using a
variable in the first place, but in case one slips through, it
shouldn't cause a NULL pointer dereference. Also if one does slip
through, we want to know about it, so emit a one-time warning in that
case.
Link: http://lkml.kernel.org/r/64ec8dc15c14d305295b64cdfcc6b2b9dd14753f.1555597045.git.tom.zanussi@linux.intel.com
Reported-by: Vincent Bernat <vincent@bernat.ch>
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Linus Torvalds [Sun, 19 May 2019 22:47:09 +0000 (15:47 -0700)]
Linux 5.2-rc1
Linus Torvalds [Sun, 19 May 2019 22:22:03 +0000 (15:22 -0700)]
Merge tag 'upstream-5.2-rc2' of git://git./linux/kernel/git/rw/ubifs
Pull UBIFS fixes from Richard Weinberger:
- build errors wrt xattrs
- mismerge which lead to a wrong Kconfig ifdef
- missing endianness conversion
* tag 'upstream-5.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
ubifs: Convert xattr inum to host order
ubifs: Use correct config name for encryption
ubifs: Fix build error without CONFIG_UBIFS_FS_XATTR
Linus Torvalds [Sun, 19 May 2019 19:15:32 +0000 (12:15 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge yet more updates from Andrew Morton:
"A few final bits:
- large changes to vmalloc, yielding large performance benefits
- tweak the console-flush-on-panic code
- a few fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
panic: add an option to replay all the printk message in buffer
initramfs: don't free a non-existent initrd
fs/writeback.c: use rcu_barrier() to wait for inflight wb switches going into workqueue when umount
mm/compaction.c: correct zone boundary handling when isolating pages from a pageblock
mm/vmap: add DEBUG_AUGMENT_LOWEST_MATCH_CHECK macro
mm/vmap: add DEBUG_AUGMENT_PROPAGATE_CHECK macro
mm/vmalloc.c: keep track of free blocks for vmap allocation
Linus Torvalds [Sun, 19 May 2019 18:53:58 +0000 (11:53 -0700)]
Merge tag 'kbuild-v5.2-2' of git://git./linux/kernel/git/masahiroy/linux-kbuild
Pull more Kbuild updates from Masahiro Yamada:
- remove unneeded use of cc-option, cc-disable-warning, cc-ldoption
- exclude tracked files from .gitignore
- re-enable -Wint-in-bool-context warning
- refactor samples/Makefile
- stop building immediately if syncconfig fails
- do not sprinkle error messages when $(CC) does not exist
- move arch/alpha/defconfig to the configs subdirectory
- remove crappy header search path manipulation
- add comment lines to .config to clarify the end of menu blocks
- check uniqueness of module names (adding new warnings intentionally)
* tag 'kbuild-v5.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (24 commits)
kconfig: use 'else ifneq' for Makefile to improve readability
kbuild: check uniqueness of module names
kconfig: Terminate menu blocks with a comment in the generated config
kbuild: add LICENSES to KBUILD_ALLDIRS
kbuild: remove 'addtree' and 'flags' magic for header search paths
treewide: prefix header search paths with $(srctree)/
media: prefix header search paths with $(srctree)/
media: remove unneeded header search paths
alpha: move arch/alpha/defconfig to arch/alpha/configs/defconfig
kbuild: terminate Kconfig when $(CC) or $(LD) is missing
kbuild: turn auto.conf.cmd into a mandatory include file
.gitignore: exclude .get_maintainer.ignore and .gitattributes
kbuild: add all Clang-specific flags unconditionally
kbuild: Don't try to add '-fcatch-undefined-behavior' flag
kbuild: add some extra warning flags unconditionally
kbuild: add -Wvla flag unconditionally
arch: remove dangling asm-generic wrappers
samples: guard sub-directories with CONFIG options
kbuild: re-enable int-in-bool-context warning
MAINTAINERS: kbuild: Add pattern for scripts/*vmlinux*
...
Linus Torvalds [Sun, 19 May 2019 18:47:03 +0000 (11:47 -0700)]
Merge branch 'i2c/for-next' of git://git./linux/kernel/git/wsa/linux
Pull i2c updates from Wolfram Sang:
"Some I2C core API additions which are kind of simple but enhance error
checking for users a lot, especially by returning errno now.
There are wrappers to still support the old API but it will be removed
once all users are converted"
* 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: core: add device-managed version of i2c_new_dummy
i2c: core: improve return value handling of i2c_new_device and i2c_new_dummy
Linus Torvalds [Sun, 19 May 2019 18:43:16 +0000 (11:43 -0700)]
Merge tag 'ext4_for_linus_stable' of git://git./linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"Some bug fixes, and an update to the URL's for the final version of
Unicode 12.1.0"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: avoid panic during forced reboot due to aborted journal
ext4: fix block validity checks for journal inodes using indirect blocks
unicode: update to Unicode 12.1.0 final
unicode: add missing check for an error return from utf8lookup()
ext4: fix miscellaneous sparse warnings
ext4: unsigned int compared against zero
ext4: fix use-after-free in dx_release()
ext4: fix data corruption caused by overlapping unaligned and aligned IO
jbd2: fix potential double free
ext4: zero out the unused memory region in the extent tree block
Linus Torvalds [Sun, 19 May 2019 18:38:18 +0000 (11:38 -0700)]
Merge tag '5.2-rc-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
"Minor cleanup and fixes, one for stable, four rdma (smbdirect)
related. Also adds SEEK_HOLE support"
* tag '5.2-rc-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: add support for SEEK_DATA and SEEK_HOLE
Fixed https://bugzilla.kernel.org/show_bug.cgi?id=202935 allow write on the same file
cifs: Allocate memory for all iovs in smb2_ioctl
cifs: Don't match port on SMBDirect transport
cifs:smbd Use the correct DMA direction when sending data
cifs:smbd When reconnecting to server, call smbd_destroy() after all MIDs have been called
cifs: use the right include for signal_pending()
smb3: trivial cleanup to smb2ops.c
cifs: cleanup smb2ops.c and normalize strings
smb3: display session id in debug data
Linus Torvalds [Sun, 19 May 2019 18:20:22 +0000 (11:20 -0700)]
Merge branch 'perf-core-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf tooling updates from Ingo Molnar:
"perf.data:
- Streaming compression of perf ring buffer into
PERF_RECORD_COMPRESSED user space records, resulting in ~3-5x
perf.data file size reduction on variety of tested workloads what
saves storage space on larger server systems where perf.data size
can easily reach several tens or even hundreds of GiBs, especially
when profiling with DWARF-based stacks and tracing of context
switches.
perf record:
- Improve -user-regs/intr-regs suggestions to overcome errors
perf annotate:
- Remove hist__account_cycles() from callback, speeding up branch
processing (perf record -b)
perf stat:
- Add a 'percore' event qualifier, e.g.: -e
cpu/event=0,umask=0x3,percore=1/, that sums up the event counts for
both hardware threads in a core.
We can already do this with --per-core, but it's often useful to do
this together with other metrics that are collected per hardware
thread.
I.e. now its possible to do this per-event, and have it mixed with
other events not aggregated by core.
arm64:
- Map Brahma-B53 CPUID to cortex-a53 events.
- Add Cortex-A57 and Cortex-A72 events.
csky:
- Add DWARF register mappings for libdw, allowing --call-graph=dwarf
to work on the C-SKY arch.
x86:
- Add support for recording and printing XMM registers, available,
for instance, on Icelake.
- Add uncore_upi (Intel's "Ultra Path Interconnect" events) JSON
support. UPI replaced the Intel QuickPath Interconnect (QPI) in
Xeon Skylake-SP.
Intel PT:
- Fix instructions sampling rate.
- Timestamp fixes.
- Improve exported-sql-viewer GUI, allowing, for instance, to
copy'n'paste the trees, useful for e-mailing"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (73 commits)
perf stat: Support 'percore' event qualifier
perf stat: Factor out aggregate counts printing
perf tools: Add a 'percore' event qualifier
perf docs: Add description for stderr
perf intel-pt: Fix sample timestamp wrt non-taken branches
perf intel-pt: Fix improved sample timestamp
perf intel-pt: Fix instructions sampling rate
perf regs x86: Add X86 specific arch__intr_reg_mask()
perf parse-regs: Add generic support for arch__intr/user_reg_mask()
perf parse-regs: Split parse_regs
perf vendor events arm64: Add Cortex-A57 and Cortex-A72 events
perf vendor events arm64: Map Brahma-B53 CPUID to cortex-a53 events
perf vendor events arm64: Remove [[:xdigit:]] wildcard
perf jevents: Remove unused variable
perf test zstd: Fixup verbose mode output
perf tests: Implement Zstd comp/decomp integration test
perf inject: Enable COMPRESSED record decompression
perf report: Implement perf.data record decompression
perf record: Implement -z,--compression_level[=<n>] option
perf report: Add stub processing of compressed events for -D
...
Linus Torvalds [Sun, 19 May 2019 18:11:20 +0000 (11:11 -0700)]
Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull clocksource updates from Ingo Molnar:
"Misc clocksource/clockevent driver updates that came in a bit late but
are ready for v5.2"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
misc: atmel_tclib: Do not probe already used TCBs
clocksource/drivers/timer-atmel-tcb: Convert tc_clksrc_suspend|resume() to static
clocksource/drivers/tcb_clksrc: Rename the file for consistency
clocksource/drivers/timer-atmel-pit: Rework Kconfig option
clocksource/drivers/tcb_clksrc: Move Kconfig option
ARM: at91: Implement clocksource selection
clocksource/drivers/tcb_clksrc: Use tcb as sched_clock
clocksource/drivers/tcb_clksrc: Stop depending on atmel_tclib
ARM: at91: move SoC specific definitions to SoC folder
clocksource/drivers/timer-milbeaut: Cleanup common register accesses
clocksource/drivers/timer-milbeaut: Add shutdown function
clocksource/drivers/timer-milbeaut: Fix to enable one-shot timer
clocksource/drivers/tegra: Rework for compensation of suspend time
clocksource/drivers/sp804: Add COMPILE_TEST to CONFIG_ARM_TIMER_SP804
clocksource/drivers/sun4i: Add a compatible for suniv
dt-bindings: timer: Add Allwinner suniv timer
Linus Torvalds [Sun, 19 May 2019 17:58:45 +0000 (10:58 -0700)]
Merge branch 'irq-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull IRQ chip updates from Ingo Molnar:
"A late irqchips update:
- New TI INTR/INTA set of drivers
- Rewrite of the stm32mp1-exti driver as a platform driver
- Update the IOMMU MSI mapping API to be RT friendly
- A number of cleanups and other low impact fixes"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits)
iommu/dma-iommu: Remove iommu_dma_map_msi_msg()
irqchip/gic-v3-mbi: Don't map the MSI page in mbi_compose_m{b, s}i_msg()
irqchip/ls-scfg-msi: Don't map the MSI page in ls_scfg_msi_compose_msg()
irqchip/gic-v3-its: Don't map the MSI page in its_irq_compose_msi_msg()
irqchip/gicv2m: Don't map the MSI page in gicv2m_compose_msi_msg()
iommu/dma-iommu: Split iommu_dma_map_msi_msg() in two parts
genirq/msi: Add a new field in msi_desc to store an IOMMU cookie
arm64: arch_k3: Enable interrupt controller drivers
irqchip/ti-sci-inta: Add msi domain support
soc: ti: Add MSI domain bus support for Interrupt Aggregator
irqchip/ti-sci-inta: Add support for Interrupt Aggregator driver
dt-bindings: irqchip: Introduce TISCI Interrupt Aggregator bindings
irqchip/ti-sci-intr: Add support for Interrupt Router driver
dt-bindings: irqchip: Introduce TISCI Interrupt router bindings
gpio: thunderx: Use the default parent apis for {request,release}_resources
genirq: Introduce irq_chip_{request,release}_resource_parent() apis
firmware: ti_sci: Add helper apis to manage resources
firmware: ti_sci: Add RM mapping table for am654
firmware: ti_sci: Add support for IRQ management
firmware: ti_sci: Add support for RM core ops
...
Linus Torvalds [Sun, 19 May 2019 17:33:26 +0000 (10:33 -0700)]
Merge branch 'efi-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull EFI fix from Ingo Molnar:
"Fix an EFI-fb regression that affects certain x86 systems"
* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
fbdev/efifb: Ignore framebuffer memmap entries that lack any memory types
Linus Torvalds [Sun, 19 May 2019 17:23:24 +0000 (10:23 -0700)]
Merge branch 'core-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull core fixes from Ingo Molnar:
"This fixes a particularly thorny munmap() bug with MPX, plus fixes a
host build environment assumption in objtool"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Allow AR to be overridden with HOSTAR
x86/mpx, mm/core: Fix recursive munmap() corruption
Linus Torvalds [Sun, 19 May 2019 17:16:39 +0000 (10:16 -0700)]
Merge tag 'armsoc-late' of git://git./linux/kernel/git/soc/soc
Pull ARM SoC late updates from Olof Johansson:
"This is some material that we picked up into our tree late. Most of it
are smaller fixes and additions, some defconfig updates due to recent
development, etc.
Code-wise the largest portion is a series of PM updates for the at91
platform, and those have been in linux-next a while through the at91
tree before we picked them up"
* tag 'armsoc-late' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (29 commits)
arm64: dts: sprd: Add clock properties for serial devices
Opt out of scripts/get_maintainer.pl
ARM: ixp4xx: Remove duplicated include from common.c
soc: ixp4xx: qmgr: Fix an NULL vs IS_ERR() check in probe
arm64: tegra: Disable XUSB support on Jetson TX2
arm64: tegra: Enable SMMU translation for PCI on Tegra186
arm64: tegra: Fix insecure SMMU users for Tegra186
arm64: tegra: Select ARM_GIC_PM
amba: tegra-ahb: Mark PM functions as __maybe_unused
ARM: dts: logicpd-som-lv: Fix MMC1 card detect
ARM: mvebu: drop return from void function
ARM: mvebu: prefix coprocessor operand with p
ARM: mvebu: drop unnecessary label
ARM: mvebu: fix a leaked reference by adding missing of_node_put
ARM: socfpga_defconfig: enable LTC2497
ARM: mvebu: kirkwood: remove error message when retrieving mac address
ARM: at91: sama5: make ov2640 as a module
ARM: OMAP1: ams-delta: fix early boot crash when LED support is disabled
ARM: at91: remove HAVE_FB_ATMEL for sama5 SoC as they use DRM
soc/fsl/qe: Fix an error code in qe_pin_request()
...
Linus Torvalds [Sun, 19 May 2019 17:10:15 +0000 (10:10 -0700)]
Merge tag 'powerpc-5.2-2' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"One fix going back to stable, for a bug on 32-bit introduced when we
added support for THREAD_INFO_IN_TASK.
A fix for a typo in a recent rework of our hugetlb code that leads to
crashes on 64-bit when using hugetlbfs with a 4K PAGE_SIZE.
Two fixes for our recent rework of the address layout on 64-bit hash
CPUs, both only triggered when userspace tries to access addresses
outside the user or kernel address ranges.
Finally a fix for a recently introduced double free in an error path
in our cacheinfo code.
Thanks to: Aneesh Kumar K.V, Christophe Leroy, Sachin Sant, Tobin C.
Harding"
* tag 'powerpc-5.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/cacheinfo: Remove double free
powerpc/mm/hash: Fix get_region_id() for invalid addresses
powerpc/mm: Drop VM_BUG_ON in get_region_id()
powerpc/mm: Fix crashes with hugepages & 4K pages
powerpc/32s: fix flush_hash_pages() on SMP
Linus Torvalds [Sun, 19 May 2019 17:05:28 +0000 (10:05 -0700)]
Merge tag 'mips_5.2_2' of git://git./linux/kernel/git/mips/linux
Pull a few more MIPS updates from Paul Burton:
"Some SGI IP27 specific PCI rework and a batch of fixes:
- A build fix for BMIPS5000 configurations with
CONFIG_HW_PERF_EVENTS=y, which also neatly removes some #ifdefery.
- A fix to report supported ISAs correctly on older Ingenic SoCs
which incorrectly indicate MIPSr2 support in their cop0 Config
register.
- Some PCI modernization for SGI IP27 systems as part of ongoing work
to support some other SGI systems.
- A fix allowing use of appended DTB files with generic kernels.
- DMA mask fixes for SGI IP22 & Alchemy systems"
* tag 'mips_5.2_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: Alchemy: add DMA masks for on-chip ethernet
MIPS: SGI-IP22: provide missing dma_mask/coherent_dma_mask
generic: fix appended dtb support
MIPS: SGI-IP27: abstract chipset irq from bridge
MIPS: SGI-IP27: use generic PCI driver
MIPS: Fix Ingenic SoCs sometimes reporting wrong ISA
MIPS: perf: Fix build with CONFIG_CPU_BMIPS5000 enabled
Linus Torvalds [Sun, 19 May 2019 16:56:36 +0000 (09:56 -0700)]
Merge tag 'riscv-for-linus-5.2-mw2' of git://git./linux/kernel/git/palmer/riscv-linux
Pull RISC-V updates from Palmer Dabbelt:
"This contains an assortment of RISC-V related patches that I'd like to
target for the 5.2 merge window. Most of the patches are cleanups, but
there are a handful of user-visible changes:
- The nosmp and nr_cpus command-line arguments are now supported,
which work like normal.
- The SBI console no longer installs itself as a preferred console,
we rely on standard mechanisms (/chosen, command-line, hueristics)
instead.
- sfence_remove_sfence_vma{,_asid} now pass their arguments along to
the SBI call.
- Modules now support BUG().
- A missing sfence.vma during boot has been added. This bug only
manifests during boot.
- The arch/riscv support for SiFive's L2 cache controller has been
merged, which should un-block the EDAC framework work.
I've only tested this on QEMU again, as I didn't have time to get
things running on the Unleashed. The latest master from this morning
merges in cleanly and passes the tests as well"
* tag 'riscv-for-linus-5.2-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux: (31 commits)
riscv: fix locking violation in page fault handler
RISC-V: sifive_l2_cache: Add L2 cache controller driver for SiFive SoCs
RISC-V: Add DT documentation for SiFive L2 Cache Controller
RISC-V: Avoid using invalid intermediate translations
riscv: Support BUG() in kernel module
riscv: Add the support for c.ebreak check in is_valid_bugaddr()
riscv: support trap-based WARN()
riscv: fix sbi_remote_sfence_vma{,_asid}.
riscv: move switch_mm to its own file
riscv: move flush_icache_{all,mm} to cacheflush.c
tty: Don't force RISCV SBI console as preferred console
RISC-V: Access CSRs using CSR numbers
RISC-V: Add interrupt related SCAUSE defines in asm/csr.h
RISC-V: Use tabs to align macro values in asm/csr.h
RISC-V: Fix minor checkpatch issues.
RISC-V: Support nr_cpus command line option.
RISC-V: Implement nosmp commandline option.
RISC-V: Add RISC-V specific arch_match_cpu_phys_id
riscv: vdso: drop unnecessary cc-ldoption
riscv: call pm_power_off from machine_halt / machine_power_off
...
Masahiro Yamada [Sat, 18 May 2019 08:07:48 +0000 (17:07 +0900)]
kconfig: use 'else ifneq' for Makefile to improve readability
'ifeq ... else ifneq ... endif' notation is supported by GNU Make 3.81
or later, which is the requirement for building the kernel since
commit
37d69ee30808 ("docs: bump minimal GNU Make version to 3.81").
Use it to improve the readability.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Feng Tang [Fri, 17 May 2019 21:31:50 +0000 (14:31 -0700)]
panic: add an option to replay all the printk message in buffer
Currently on panic, kernel will lower the loglevel and print out pending
printk msg only with console_flush_on_panic().
Add an option for users to configure the "panic_print" to replay all
dmesg in buffer, some of which they may have never seen due to the
loglevel setting, which will help panic debugging .
[feng.tang@intel.com: keep the original console_flush_on_panic() inside panic()]
Link: http://lkml.kernel.org/r/1556199137-14163-1-git-send-email-feng.tang@intel.com
[feng.tang@intel.com: use logbuf lock to protect the console log index]
Link: http://lkml.kernel.org/r/1556269868-22654-1-git-send-email-feng.tang@intel.com
Link: http://lkml.kernel.org/r/1556095872-36838-1-git-send-email-feng.tang@intel.com
Signed-off-by: Feng Tang <feng.tang@intel.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Aaro Koskinen <aaro.koskinen@nokia.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Steven Price [Fri, 17 May 2019 21:31:47 +0000 (14:31 -0700)]
initramfs: don't free a non-existent initrd
Since commit
54c7a8916a88 ("initramfs: free initrd memory if opening
/initrd.image fails"), the kernel has unconditionally attempted to free
the initrd even if it doesn't exist.
In the non-existent case this causes a boot-time splat if
CONFIG_DEBUG_VIRTUAL is enabled due to a call to virt_to_phys() with a
NULL address.
Instead we should check that the initrd actually exists and only attempt
to free it if it does.
Link: http://lkml.kernel.org/r/20190516143125.48948-1-steven.price@arm.com
Fixes:
54c7a8916a88 ("initramfs: free initrd memory if opening /initrd.image fails")
Signed-off-by: Steven Price <steven.price@arm.com>
Reported-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jiufei Xue [Fri, 17 May 2019 21:31:44 +0000 (14:31 -0700)]
fs/writeback.c: use rcu_barrier() to wait for inflight wb switches going into workqueue when umount
synchronize_rcu() didn't wait for call_rcu() callbacks, so inode wb
switch may not go to the workqueue after synchronize_rcu(). Thus
previous scheduled switches was not finished even flushing the
workqueue, which will cause a NULL pointer dereferenced followed below.
VFS: Busy inodes after unmount of vdd. Self-destruct in 5 seconds. Have a nice day...
BUG: unable to handle kernel NULL pointer dereference at
0000000000000278
evict+0xb3/0x180
iput+0x1b0/0x230
inode_switch_wbs_work_fn+0x3c0/0x6a0
worker_thread+0x4e/0x490
? process_one_work+0x410/0x410
kthread+0xe6/0x100
ret_from_fork+0x39/0x50
Replace the synchronize_rcu() call with a rcu_barrier() to wait for all
pending callbacks to finish. And inc isw_nr_in_flight after call_rcu()
in inode_switch_wbs() to make more sense.
Link: http://lkml.kernel.org/r/20190429024108.54150-1-jiufei.xue@linux.alibaba.com
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Acked-by: Tejun Heo <tj@kernel.org>
Suggested-by: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Fri, 17 May 2019 21:31:41 +0000 (14:31 -0700)]
mm/compaction.c: correct zone boundary handling when isolating pages from a pageblock
syzbot reported the following error from a tree with a head commit of
baf76f0c58ae ("slip: make slhc_free() silently accept an error pointer")
BUG: unable to handle kernel paging request at
ffffea0003348000
#PF error: [normal kernel read fault]
PGD
12c3f9067 P4D
12c3f9067 PUD
12c3f8067 PMD 0
Oops: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 28916 Comm: syz-executor.2 Not tainted 5.1.0-rc6+ #89
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:constant_test_bit arch/x86/include/asm/bitops.h:314 [inline]
RIP: 0010:PageCompound include/linux/page-flags.h:186 [inline]
RIP: 0010:isolate_freepages_block+0x1c0/0xd40 mm/compaction.c:579
Code: 01 d8 ff 4d 85 ed 0f 84 ef 07 00 00 e8 29 00 d8 ff 4c 89 e0 83 85 38 ff
ff ff 01 48 c1 e8 03 42 80 3c 38 00 0f 85 31 0a 00 00 <4d> 8b 2c 24 31 ff 49
c1 ed 10 41 83 e5 01 44 89 ee e8 3a 01 d8 ff
RSP: 0018:
ffff88802b31eab8 EFLAGS:
00010246
RAX:
1ffffd4000669000 RBX:
00000000000cd200 RCX:
ffffc9000a235000
RDX:
000000000001ca5e RSI:
ffffffff81988cc7 RDI:
0000000000000001
RBP:
ffff88802b31ebd8 R08:
ffff88805af700c0 R09:
0000000000000000
R10:
0000000000000000 R11:
0000000000000000 R12:
ffffea0003348000
R13:
0000000000000000 R14:
ffff88802b31f030 R15:
dffffc0000000000
FS:
00007f61648dc700(0000) GS:
ffff8880ae900000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
ffffea0003348000 CR3:
0000000037c64000 CR4:
00000000001426e0
Call Trace:
fast_isolate_around mm/compaction.c:1243 [inline]
fast_isolate_freepages mm/compaction.c:1418 [inline]
isolate_freepages mm/compaction.c:1438 [inline]
compaction_alloc+0x1aee/0x22e0 mm/compaction.c:1550
There is no reproducer and it is difficult to hit -- 1 crash every few
days. The issue is very similar to the fix in commit
6b0868c820ff
("mm/compaction.c: correct zone boundary handling when resetting pageblock
skip hints"). When isolating free pages around a target pageblock, the
boundary handling is off by one and can stray into the next pageblock.
Triggering the syzbot error requires that the end of pageblock is section
or zone aligned, and that the next section is unpopulated.
A more subtle consequence of the bug is that pageblocks were being
improperly used as migration targets which potentially hurts fragmentation
avoidance in the long-term one page at a time.
A debugging patch revealed that it's definitely possible to stray outside
of a pageblock which is not intended. While syzbot cannot be used to
verify this patch, it was confirmed that the debugging warning no longer
triggers with this patch applied. It has also been confirmed that the THP
allocation stress tests are not degraded by this patch.
Link: http://lkml.kernel.org/r/20190510182124.GI18914@techsingularity.net
Fixes:
e332f741a8dd ("mm, compaction: be selective about what pageblocks to clear skip hints")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reported-by: syzbot+d84c80f9fe26a0f7a734@syzkaller.appspotmail.com
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org> # v5.1+
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Uladzislau Rezki (Sony) [Fri, 17 May 2019 21:31:37 +0000 (14:31 -0700)]
mm/vmap: add DEBUG_AUGMENT_LOWEST_MATCH_CHECK macro
This macro adds some debug code to check that vmap allocations are
happened in ascending order.
By default this option is set to 0 and not active. It requires
recompilation of the kernel to activate it. Set to 1, compile the
kernel.
[urezki@gmail.com: v4]
Link: http://lkml.kernel.org/r/20190406183508.25273-4-urezki@gmail.com
Link: http://lkml.kernel.org/r/20190402162531.10888-4-urezki@gmail.com
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Uladzislau Rezki (Sony) [Fri, 17 May 2019 21:31:34 +0000 (14:31 -0700)]
mm/vmap: add DEBUG_AUGMENT_PROPAGATE_CHECK macro
This macro adds some debug code to check that the augment tree is
maintained correctly, meaning that every node contains valid
subtree_max_size value.
By default this option is set to 0 and not active. It requires
recompilation of the kernel to activate it. Set to 1, compile the
kernel.
[urezki@gmail.com: v4]
Link: http://lkml.kernel.org/r/20190406183508.25273-3-urezki@gmail.com
Link: http://lkml.kernel.org/r/20190402162531.10888-3-urezki@gmail.com
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Uladzislau Rezki (Sony) [Fri, 17 May 2019 21:31:31 +0000 (14:31 -0700)]
mm/vmalloc.c: keep track of free blocks for vmap allocation
Patch series "improve vmap allocation", v3.
Objective
---------
Please have a look for the description at:
https://lkml.org/lkml/2018/10/19/786
but let me also summarize it a bit here as well.
The current implementation has O(N) complexity. Requests with different
permissive parameters can lead to long allocation time. When i say
"long" i mean milliseconds.
Description
-----------
This approach organizes the KVA memory layout into free areas of the
1-ULONG_MAX range, i.e. an allocation is done over free areas lookups,
instead of finding a hole between two busy blocks. It allows to have
lower number of objects which represent the free space, therefore to have
less fragmented memory allocator. Because free blocks are always as large
as possible.
It uses the augment tree where all free areas are sorted in ascending
order of va->va_start address in pair with linked list that provides
O(1) access to prev/next elements.
Since the tree is augment, we also maintain the "subtree_max_size" of VA
that reflects a maximum available free block in its left or right
sub-tree. Knowing that, we can easily traversal toward the lowest (left
most path) free area.
Allocation: ~O(log(N)) complexity. It is sequential allocation method
therefore tends to maximize locality. The search is done until a first
suitable block is large enough to encompass the requested parameters.
Bigger areas are split.
I copy paste here the description of how the area is split, since i
described it in https://lkml.org/lkml/2018/10/19/786
<snip>
A free block can be split by three different ways. Their names are
FL_FIT_TYPE, LE_FIT_TYPE/RE_FIT_TYPE and NE_FIT_TYPE, i.e. they
correspond to how requested size and alignment fit to a free block.
FL_FIT_TYPE - in this case a free block is just removed from the free
list/tree because it fully fits. Comparing with current design there is
an extra work with rb-tree updating.
LE_FIT_TYPE/RE_FIT_TYPE - left/right edges fit. In this case what we do
is just cutting a free block. It is as fast as a current design. Most of
the vmalloc allocations just end up with this case, because the edge is
always aligned to 1.
NE_FIT_TYPE - Is much less common case. Basically it happens when
requested size and alignment does not fit left nor right edges, i.e. it
is between them. In this case during splitting we have to build a
remaining left free area and place it back to the free list/tree.
Comparing with current design there are two extra steps. First one is we
have to allocate a new vmap_area structure. Second one we have to insert
that remaining free block to the address sorted list/tree.
In order to optimize a first case there is a cache with free_vmap objects.
Instead of allocating from slab we just take an object from the cache and
reuse it.
Second one is pretty optimized. Since we know a start point in the tree
we do not do a search from the top. Instead a traversal begins from a
rb-tree node we split.
<snip>
De-allocation. ~O(log(N)) complexity. An area is not inserted straight
away to the tree/list, instead we identify the spot first, checking if it
can be merged around neighbors. The list provides O(1) access to
prev/next, so it is pretty fast to check it. Summarizing. If merged then
large coalesced areas are created, if not the area is just linked making
more fragments.
There is one more thing that i should mention here. After modification of
VA node, its subtree_max_size is updated if it was/is the biggest area in
its left or right sub-tree. Apart of that it can also be populated back
to upper levels to fix the tree. For more details please have a look at
the __augment_tree_propagate_from() function and the description.
Tests and stressing
-------------------
I use the "test_vmalloc.sh" test driver available under
"tools/testing/selftests/vm/" since 5.1-rc1 kernel. Just trigger "sudo
./test_vmalloc.sh" to find out how to deal with it.
Tested on different platforms including x86_64/i686/ARM64/x86_64_NUMA.
Regarding last one, i do not have any physical access to NUMA system,
therefore i emulated it. The time of stressing is days.
If you run the test driver in "stress mode", you also need the patch that
is in Andrew's tree but not in Linux 5.1-rc1. So, please apply it:
http://git.cmpxchg.org/cgit.cgi/linux-mmotm.git/commit/?id=
e0cf7749bade6da318e98e934a24d8b62fab512c
After massive testing, i have not identified any problems like memory
leaks, crashes or kernel panics. I find it stable, but more testing would
be good.
Performance analysis
--------------------
I have used two systems to test. One is i5-3320M CPU @ 2.60GHz and
another is HiKey960(arm64) board. i5-3320M runs on 4.20 kernel, whereas
Hikey960 uses 4.15 kernel. I have both system which could run on 5.1-rc1
as well, but the results have not been ready by time i an writing this.
Currently it consist of 8 tests. There are three of them which correspond
to different types of splitting(to compare with default). We have 3
ones(see above). Another 5 do allocations in different conditions.
a) sudo ./test_vmalloc.sh performance
When the test driver is run in "performance" mode, it runs all available
tests pinned to first online CPU with sequential execution test order. We
do it in order to get stable and repeatable results. Take a look at time
difference in "long_busy_list_alloc_test". It is not surprising because
the worst case is O(N).
# i5-3320M
How many cycles all tests took:
CPU0=
646919905370(default) cycles vs CPU0=
193290498550(patched) cycles
# See detailed table with results here:
ftp://vps418301.ovh.net/incoming/vmap_test_results_v2/i5-3320M_performance_default.txt
ftp://vps418301.ovh.net/incoming/vmap_test_results_v2/i5-3320M_performance_patched.txt
# Hikey960 8x CPUs
How many cycles all tests took:
CPU0=
3478683207 cycles vs CPU0=
463767978 cycles
# See detailed table with results here:
ftp://vps418301.ovh.net/incoming/vmap_test_results_v2/HiKey960_performance_default.txt
ftp://vps418301.ovh.net/incoming/vmap_test_results_v2/HiKey960_performance_patched.txt
b) time sudo ./test_vmalloc.sh test_repeat_count=1
With this configuration, all tests are run on all available online CPUs.
Before running each CPU shuffles its tests execution order. It gives
random allocation behaviour. So it is rough comparison, but it puts in
the picture for sure.
# i5-3320M
<default> vs <patched>
real 101m22.813s real 0m56.805s
user 0m0.011s user 0m0.015s
sys 0m5.076s sys 0m0.023s
# See detailed table with results here:
ftp://vps418301.ovh.net/incoming/vmap_test_results_v2/i5-3320M_test_repeat_count_1_default.txt
ftp://vps418301.ovh.net/incoming/vmap_test_results_v2/i5-3320M_test_repeat_count_1_patched.txt
# Hikey960 8x CPUs
<default> vs <patched>
real unknown real 4m25.214s
user unknown user 0m0.011s
sys unknown sys 0m0.670s
I did not manage to complete this test on "default Hikey960" kernel
version. After 24 hours it was still running, therefore i had to cancel
it. That is why real/user/sys are "unknown".
This patch (of 3):
Currently an allocation of the new vmap area is done over busy list
iteration(complexity O(n)) until a suitable hole is found between two busy
areas. Therefore each new allocation causes the list being grown. Due to
over fragmented list and different permissive parameters an allocation can
take a long time. For example on embedded devices it is milliseconds.
This patch organizes the KVA memory layout into free areas of the
1-ULONG_MAX range. It uses an augment red-black tree that keeps blocks
sorted by their offsets in pair with linked list keeping the free space in
order of increasing addresses.
Nodes are augmented with the size of the maximum available free block in
its left or right sub-tree. Thus, that allows to take a decision and
traversal toward the block that will fit and will have the lowest start
address, i.e. it is sequential allocation.
Allocation: to allocate a new block a search is done over the tree until a
suitable lowest(left most) block is large enough to encompass: the
requested size, alignment and vstart point. If the block is bigger than
requested size - it is split.
De-allocation: when a busy vmap area is freed it can either be merged or
inserted to the tree. Red-black tree allows efficiently find a spot
whereas a linked list provides a constant-time access to previous and next
blocks to check if merging can be done. In case of merging of
de-allocated memory chunk a large coalesced area is created.
Complexity: ~O(log(N))
[urezki@gmail.com: v3]
Link: http://lkml.kernel.org/r/20190402162531.10888-2-urezki@gmail.com
[urezki@gmail.com: v4]
Link: http://lkml.kernel.org/r/20190406183508.25273-2-urezki@gmail.com
Link: http://lkml.kernel.org/r/20190321190327.11813-2-urezki@gmail.com
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ingo Molnar [Sat, 18 May 2019 08:24:43 +0000 (10:24 +0200)]
Merge tag 'perf-core-for-mingo-5.2-
20190517' of git://git./linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
perf.data:
Alexey Budankov:
- Streaming compression of perf ring buffer into PERF_RECORD_COMPRESSED
user space records, resulting in ~3-5x perf.data file size reduction
on variety of tested workloads what saves storage space on larger
server systems where perf.data size can easily reach several tens or
even hundreds of GiBs, especially when profiling with DWARF-based
stacks and tracing of context switches.
perf record:
Arnaldo Carvalho de Melo
- Improve -user-regs/intr-regs suggestions to overcome errors.
perf annotate:
Jin Yao:
- Remove hist__account_cycles() from callback, speeding up branch processing
(perf record -b).
perf stat:
- Add a 'percore' event qualifier, e.g.: -e cpu/event=0,umask=0x3,percore=1/,
that sums up the event counts for both hardware threads in a core.
We can already do this with --per-core, but it's often useful to do
this together with other metrics that are collected per hardware thread.
I.e. now its possible to do this per-event, and have it mixed with other
events not aggregated by core.
core libraries:
Donald Yandt:
- Check for errors when doing fgets(/proc/version).
Jiri Olsa:
- Speed up report for perf compiled with linbunwind.
tools headers:
Arnaldo Carvalho de Melo
- Update memcpy_64.S, x86's kvm.h and pt_regs.h.
arm64:
Florian Fainelli:
- Map Brahma-B53 CPUID to cortex-a53 events.
- Add Cortex-A57 and Cortex-A72 events.
csky:
Mao Han:
- Add DWARF register mappings for libdw, allowing --call-graph=dwarf to work
on the C-SKY arch.
x86:
Andi Kleen/Kan Liang:
- Add support for recording and printing XMM registers, available, for
instance, on Icelake.
Kan Liang:
- Add uncore_upi (Intel's "Ultra Path Interconnect" events) JSON support.
UPI replaced the Intel QuickPath Interconnect (QPI) in Xeon Skylake-SP.
Intel PT:
Adrian Hunter
. Fix instructions sampling rate.
. Timestamp fixes.
. Improve exported-sql-viewer GUI, allowing, for instance, to copy'n'paste
the trees, useful for e-mailing.
Documentation:
Thomas Richter:
- Add description for 'perf --debug stderr=1', which redirects stderr to stdout.
libtraceevent:
Tzvetomir Stoyanov:
- Add man pages for the various APIs.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Masahiro Yamada [Fri, 17 May 2019 16:07:15 +0000 (01:07 +0900)]
kbuild: check uniqueness of module names
In the recent build test of linux-next, Stephen saw a build error
caused by a broken .tmp_versions/*.mod file:
https://lkml.org/lkml/2019/5/13/991
drivers/net/phy/asix.ko and drivers/net/usb/asix.ko have the same
basename, and there is a race in generating .tmp_versions/asix.mod
Kbuild has not checked this before, and it suddenly shows up with
obscure error messages when this kind of race occurs.
Non-unique module names cause various sort of problems, but it is
not trivial to catch them by eyes.
Hence, this script.
It checks not only real modules, but also built-in modules (i.e.
controlled by tristate CONFIG option, but currently compiled with =y).
Non-unique names for built-in modules also cause problems because
/sys/modules/ would fall over.
For the latest kernel, I tested "make allmodconfig all" (or more
quickly "make allyesconfig modules"), and it detected the following:
warning: same basename if the following are built as modules:
drivers/regulator/88pm800.ko
drivers/mfd/88pm800.ko
warning: same basename if the following are built as modules:
drivers/gpu/drm/bridge/adv7511/adv7511.ko
drivers/media/i2c/adv7511.ko
warning: same basename if the following are built as modules:
drivers/net/phy/asix.ko
drivers/net/usb/asix.ko
warning: same basename if the following are built as modules:
fs/coda/coda.ko
drivers/media/platform/coda/coda.ko
warning: same basename if the following are built as modules:
drivers/net/phy/realtek.ko
drivers/net/dsa/realtek.ko
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Alexander Popov [Fri, 17 May 2019 19:42:22 +0000 (22:42 +0300)]
kconfig: Terminate menu blocks with a comment in the generated config
Currently menu blocks start with a pretty header but end with nothing in
the generated config. So next config options stick together with the
options from the menu block.
Let's terminate menu blocks in the generated config with a comment and
a newline if needed. Example:
...
CONFIG_BPF_STREAM_PARSER=y
CONFIG_NET_FLOW_LIMIT=y
#
# Network testing
#
CONFIG_NET_PKTGEN=y
CONFIG_NET_DROP_MONITOR=y
# end of Network testing
# end of Networking options
CONFIG_HAMRADIO=y
...
Signed-off-by: Alexander Popov <alex.popov@linux.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Masahiro Yamada [Wed, 15 May 2019 16:18:54 +0000 (01:18 +0900)]
kbuild: add LICENSES to KBUILD_ALLDIRS
For *-pkg targets, the LICENSES directory should be included in the
source tarball.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Masahiro Yamada [Mon, 13 May 2019 06:22:17 +0000 (15:22 +0900)]
kbuild: remove 'addtree' and 'flags' magic for header search paths
The 'addtree' and 'flags' in scripts/Kbuild.include are so compilecated
and ugly.
As I mentioned in [1], Kbuild should stop automatic prefixing of header
search path options.
I fixed up (almost) all Makefiles in the kernel. Now 'addtree' and
'flags' have been removed.
Kbuild still caters to add $(srctree)/$(src) and $(objtree)/$(obj)
to the header search path for O= building, but never touches extra
compiler options from ccflags-y etc.
[1]: https://patchwork.kernel.org/patch/
9632347/
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Masahiro Yamada [Mon, 13 May 2019 06:22:16 +0000 (15:22 +0900)]
treewide: prefix header search paths with $(srctree)/
Currently, the Kbuild core manipulates header search paths in a crazy
way [1].
To fix this mess, I want all Makefiles to add explicit $(srctree)/ to
the search paths in the srctree. Some Makefiles are already written in
that way, but not all. The goal of this work is to make the notation
consistent, and finally get rid of the gross hacks.
Having whitespaces after -I does not matter since commit
48f6e3cf5bc6
("kbuild: do not drop -I without parameter").
[1]: https://patchwork.kernel.org/patch/
9632347/
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Masahiro Yamada [Mon, 13 May 2019 06:22:15 +0000 (15:22 +0900)]
media: prefix header search paths with $(srctree)/
Currently, the Kbuild core manipulates header search paths in a crazy
way [1].
To fix this mess, I want all Makefiles to add explicit $(srctree)/ to
the search paths in the srctree. Some Makefiles are already written in
that way, but not all. The goal of this work is to make the notation
consistent, and finally get rid of the gross hacks.
Having whitespaces after -I does not matter since commit
48f6e3cf5bc6
("kbuild: do not drop -I without parameter").
[1]: https://patchwork.kernel.org/patch/
9632347/
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Masahiro Yamada [Mon, 13 May 2019 06:22:14 +0000 (15:22 +0900)]
media: remove unneeded header search paths
I was able to build without these extra header search paths.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Masahiro Yamada [Mon, 13 May 2019 02:14:05 +0000 (11:14 +0900)]
alpha: move arch/alpha/defconfig to arch/alpha/configs/defconfig
As of Linux 5.1, alpha and s390 are the last architectures that
have defconfig in arch/*/ instead of arch/*/configs/.
$ find arch -name defconfig | sort
arch/alpha/defconfig
arch/arm64/configs/defconfig
arch/csky/configs/defconfig
arch/nds32/configs/defconfig
arch/riscv/configs/defconfig
arch/s390/defconfig
The arch/$(ARCH)/defconfig is the hard-coded default in Kconfig,
and I want to deprecate it after evacuating the remaining defconfig
into the standard location, arch/*/configs/.
Define KBUILD_DEFCONFIG like other architectures, and move defconfig
into the configs/ subdirectory.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Paul Walmsley <paul@pwsan.com>
Masahiro Yamada [Thu, 9 May 2019 07:35:55 +0000 (16:35 +0900)]
kbuild: terminate Kconfig when $(CC) or $(LD) is missing
If the compiler specified by $(CC) is not present, the Kconfig stage
sprinkles 'not found' messages, then succeeds.
$ make CROSS_COMPILE=foo defconfig
/bin/sh: 1: foogcc: not found
/bin/sh: 1: foogcc: not found
*** Default configuration is based on 'x86_64_defconfig'
./scripts/gcc-version.sh: 17: ./scripts/gcc-version.sh: foogcc: not found
./scripts/gcc-version.sh: 18: ./scripts/gcc-version.sh: foogcc: not found
./scripts/gcc-version.sh: 19: ./scripts/gcc-version.sh: foogcc: not found
./scripts/gcc-version.sh: 17: ./scripts/gcc-version.sh: foogcc: not found
./scripts/gcc-version.sh: 18: ./scripts/gcc-version.sh: foogcc: not found
./scripts/gcc-version.sh: 19: ./scripts/gcc-version.sh: foogcc: not found
./scripts/clang-version.sh: 11: ./scripts/clang-version.sh: foogcc: not found
./scripts/gcc-plugin.sh: 11: ./scripts/gcc-plugin.sh: foogcc: not found
init/Kconfig:16:warning: 'GCC_VERSION': number is invalid
#
# configuration written to .config
#
Terminate parsing files immediately if $(CC) or $(LD) is not found.
"make *config" will fail more nicely.
$ make CROSS_COMPILE=foo defconfig
*** Default configuration is based on 'x86_64_defconfig'
scripts/Kconfig.include:34: compiler 'foogcc' not found
make[1]: *** [scripts/kconfig/Makefile;82: defconfig] Error 1
make: *** [Makefile;557: defconfig] Error 2
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Masahiro Yamada [Sun, 12 May 2019 02:13:48 +0000 (11:13 +0900)]
kbuild: turn auto.conf.cmd into a mandatory include file
syncconfig is responsible for keeping auto.conf up-to-date, so if it
fails for any reason, the build must be terminated immediately.
However, since commit
9390dff66a52 ("kbuild: invoke syncconfig if
include/config/auto.conf.cmd is missing"), Kbuild continues running
even after syncconfig fails.
You can confirm this by intentionally making syncconfig error out:
diff --git a/scripts/kconfig/confdata.c b/scripts/kconfig/confdata.c
index
08ba146..
307b9de 100644
--- a/scripts/kconfig/confdata.c
+++ b/scripts/kconfig/confdata.c
@@ -1023,6 +1023,9 @@ int conf_write_autoconf(int overwrite)
FILE *out, *tristate, *out_h;
int i;
+ if (overwrite)
+ return 1;
+
if (!overwrite && is_present(autoconf_name))
return 0;
Then, syncconfig fails, but Make would not stop:
$ make -s mrproper allyesconfig defconfig
$ make
scripts/kconfig/conf --syncconfig Kconfig
*** Error during sync of the configuration.
make[2]: *** [scripts/kconfig/Makefile;69: syncconfig] Error 1
make[1]: *** [Makefile;557: syncconfig] Error 2
make: *** [include/config/auto.conf.cmd] Deleting file 'include/config/tristate.conf'
make: Failed to remake makefile 'include/config/auto.conf'.
SYSTBL arch/x86/include/generated/asm/syscalls_32.h
SYSHDR arch/x86/include/generated/asm/unistd_32_ia32.h
SYSHDR arch/x86/include/generated/asm/unistd_64_x32.h
SYSTBL arch/x86/include/generated/asm/syscalls_64.h
[ continue running ... ]
The reason is in the behavior of a pattern rule with multi-targets.
%/auto.conf %/auto.conf.cmd %/tristate.conf: $(KCONFIG_CONFIG)
$(Q)$(MAKE) -f $(srctree)/Makefile syncconfig
GNU Make knows this rule is responsible for making all the three files
simultaneously. As far as examined, auto.conf.cmd is the target in
question when this rule is invoked. It is probably because auto.conf.cmd
is included below the inclusion of auto.conf.
The inclusion of auto.conf is mandatory, while that of auto.conf.cmd
is optional. GNU Make does not care about the failure in the process
of updating optional include files.
I filed this issue (https://savannah.gnu.org/bugs/?56301) in case this
behavior could be improved somehow in future releases of GNU Make.
Anyway, it is quite easy to fix our Makefile.
Given that auto.conf is already a mandatory include file, there is no
reason to stick auto.conf.cmd optional. Make it mandatory as well.
Cc: linux-stable <stable@vger.kernel.org> # 5.0+
Fixes:
9390dff66a52 ("kbuild: invoke syncconfig if include/config/auto.conf.cmd is missing")
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Masahiro Yamada [Sat, 11 May 2019 03:13:54 +0000 (12:13 +0900)]
.gitignore: exclude .get_maintainer.ignore and .gitattributes
Also, sort the patterns alphabetically. Update the comment since
we have non-git files here.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Masahiro Yamada [Fri, 10 May 2019 14:10:09 +0000 (23:10 +0900)]
kbuild: add all Clang-specific flags unconditionally
We do not support old Clang versions. Upgrade your clang version
if any of these flags is unsupported.
Let's add all flags inside ifdef CONFIG_CC_IS_CLANG unconditionally.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Sedat Dilek <sedat.dilek@gmail.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Nathan Chancellor [Thu, 9 May 2019 11:48:25 +0000 (04:48 -0700)]
kbuild: Don't try to add '-fcatch-undefined-behavior' flag
This is no longer a valid option in clang, it was removed in 3.5, which
we don't support.
https://github.com/llvm/llvm-project/commit/
cb3f812b6b9fab8f3b41414f24e90222170417b4
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Masahiro Yamada [Thu, 9 May 2019 06:46:35 +0000 (15:46 +0900)]
kbuild: add some extra warning flags unconditionally
These flags are documented in the GCC 4.6 manual, and recognized by
Clang as well. Let's rip off the cc-option / cc-disable-warning switches.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Masahiro Yamada [Thu, 9 May 2019 06:45:49 +0000 (15:45 +0900)]
kbuild: add -Wvla flag unconditionally
This flag is documented in the GCC 4.6 manual, and recognized by
Clang as well. Let's rip off the cc-option switch.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Sedat Dilek <sedat.dilek@gmail.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Acked-by: Kees Cook <keescook@chromium.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Masahiro Yamada [Thu, 9 May 2019 07:59:34 +0000 (16:59 +0900)]
arch: remove dangling asm-generic wrappers
These generic-y defines do not have the corresponding generic header
in include/asm-generic/, so they are definitely invalid.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Masahiro Yamada [Thu, 9 May 2019 01:00:19 +0000 (10:00 +0900)]
samples: guard sub-directories with CONFIG options
Do not descend to sub-directories when unneeded.
I used subdir-$(CONFIG_...) for hidraw, seccomp, and vfs because
they only contain host programs.
While we are here, let's add SPDX License tag, and sort the directories
alphabetically.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Masahiro Yamada [Thu, 9 May 2019 00:58:01 +0000 (09:58 +0900)]
kbuild: re-enable int-in-bool-context warning
This warning was disabled by commit
bd664f6b3e37 ("disable new
gcc-7.1.1 warnings for now") just because it was too noisy.
Thanks to Arnd Bergmann, all warnings have been fixed. Now, we are
ready to re-enable it.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Krzysztof Kozlowski [Mon, 6 May 2019 12:47:00 +0000 (14:47 +0200)]
MAINTAINERS: kbuild: Add pattern for scripts/*vmlinux*
scripts/link-vmlinux.sh is part of kbuild so extend the pattern to match
any vmlinux related scripts.
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Masahiro Yamada [Wed, 1 May 2019 13:56:01 +0000 (22:56 +0900)]
sh: exclude vmlinux.scr from .gitignore pattern
arch/sh/boot/.gitignore has the pattern "vmlinux*"; this is effective
not only for the current directory, but also for any sub-directories.
So, from the point of .gitignore grammar, the following check-in files
are also considered to be ignored:
arch/sh/boot/compressed/vmlinux.scr
arch/sh/boot/romimage/vmlinux.scr
As the manual gitignore(5) says "Files already tracked by Git are not
affected", this is not a problem as far as Git is concerned.
However, Git is not the only program that parses .gitignore because
.gitignore is useful to distinguish build artifacts from source files.
For example, tar(1) supports the --exclude-vcs-ignore option. As of
writing, this option does not work perfectly, but it intends to create
a tarball excluding files specified by .gitignore.
So, I believe it is better to fix this issue.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Nick Desaulniers [Wed, 24 Apr 2019 18:02:21 +0000 (11:02 -0700)]
sh: vsyscall: drop unnecessary cc-ldoption
Towards the goal of removing cc-ldoption, it seems that --hash-style=
was added to binutils 2.17.50.0.2 in 2006. The minimal required version
of binutils for the kernel according to
Documentation/process/changes.rst is 2.20.
Link: https://gcc.gnu.org/ml/gcc/2007-01/msg01141.html
Cc: clang-built-linux@googlegroups.com
Suggested-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Nick Desaulniers [Tue, 23 Apr 2019 20:48:20 +0000 (13:48 -0700)]
ia64: require -Wl,--hash-style=sysv
Towards the goal of removing cc-ldoption, it seems that --hash-style=
was added to binutils 2.17.50.0.2 in 2006. The minimal required version
of binutils for the kernel according to
Documentation/process/changes.rst is 2.20.
Link: https://gcc.gnu.org/ml/gcc/2007-01/msg01141.html
Cc: clang-built-linux@googlegroups.com
Suggested-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Masahiro Yamada [Mon, 4 Feb 2019 07:56:01 +0000 (16:56 +0900)]
csky: remove deprecated arch/csky/boot/dts/include/dt-bindings
Having a symbolic link arch/*/boot/dts/include/dt-bindings was
deprecated by commit
d5d332d3f7e8 ("devicetree: Move include
prefixes from arch to separate directory").
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Jan Kara [Fri, 17 May 2019 21:37:18 +0000 (17:37 -0400)]
ext4: avoid panic during forced reboot due to aborted journal
Handling of aborted journal is a special code path different from
standard ext4_error() one and it can call panic() as well. Commit
1dc1097ff60e ("ext4: avoid panic during forced reboot") forgot to update
this path so fix that omission.
Fixes:
1dc1097ff60e ("ext4: avoid panic during forced reboot")
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org # 5.1
Linus Torvalds [Fri, 17 May 2019 20:57:54 +0000 (13:57 -0700)]
Merge tag 'sound-fix-5.2-rc1' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Just a few HD-audio fixes, most of which are specific to Realtek
codecs"
* tag 'sound-fix-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/realtek - Fix for Lenovo B50-70 inverted internal microphone bug
ALSA: hda: Fix race between creating and refreshing sysfs entries
ALSA: hda/realtek - Corrected fixup for System76 Gazelle (gaze14)
ALSA: hda/realtek - Avoid superfluous COEF EAPD setups
ALSA: hda/realtek - Fixup headphone noise via runtime suspend
Linus Torvalds [Fri, 17 May 2019 17:33:30 +0000 (10:33 -0700)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"ARM:
- support for SVE and Pointer Authentication in guests
- PMU improvements
POWER:
- support for direct access to the POWER9 XIVE interrupt controller
- memory and performance optimizations
x86:
- support for accessing memory not backed by struct page
- fixes and refactoring
Generic:
- dirty page tracking improvements"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (155 commits)
kvm: fix compilation on aarch64
Revert "KVM: nVMX: Expose RDPMC-exiting only when guest supports PMU"
kvm: x86: Fix L1TF mitigation for shadow MMU
KVM: nVMX: Disable intercept for FS/GS base MSRs in vmcs02 when possible
KVM: PPC: Book3S: Remove useless checks in 'release' method of KVM device
KVM: PPC: Book3S HV: XIVE: Fix spelling mistake "acessing" -> "accessing"
KVM: PPC: Book3S HV: Make sure to load LPID for radix VCPUs
kvm: nVMX: Set nested_run_pending in vmx_set_nested_state after checks complete
tests: kvm: Add tests for KVM_SET_NESTED_STATE
KVM: nVMX: KVM_SET_NESTED_STATE - Tear down old EVMCS state before setting new state
tests: kvm: Add tests for KVM_CAP_MAX_VCPUS and KVM_CAP_MAX_CPU_ID
tests: kvm: Add tests to .gitignore
KVM: Introduce KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2
KVM: Fix kvm_clear_dirty_log_protect off-by-(minus-)one
KVM: Fix the bitmap range to copy during clear dirty
KVM: arm64: Fix ptrauth ID register masking logic
KVM: x86: use direct accessors for RIP and RSP
KVM: VMX: Use accessors for GPRs outside of dedicated caching logic
KVM: x86: Omit caching logic for always-available GPRs
kvm, x86: Properly check whether a pfn is an MMIO or not
...
Heiner Kallweit [Thu, 16 May 2019 21:13:09 +0000 (23:13 +0200)]
i2c: core: add device-managed version of i2c_new_dummy
i2c_new_dummy is typically called from the probe function of the
driver for the primary i2c client. It requires calls to
i2c_unregister_device in the error path of the probe function and
in the remove function.
This can be simplified by introducing a device-managed version.
Note the changed error case return value type: i2c_new_dummy returns
NULL whilst devm_i2c_new_dummy_device returns an ERR_PTR.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
[wsa: rename new functions and fix minor kdoc issues]
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Peter Rosin <peda@axentia.se>
Reviewed-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
Reviewed-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Heiner Kallweit [Thu, 16 May 2019 21:13:08 +0000 (23:13 +0200)]
i2c: core: improve return value handling of i2c_new_device and i2c_new_dummy
Currently i2c_new_device and i2c_new_dummy return just NULL in error
case although they have more error details internally. Therefore move
the functionality into new functions returning detailed errors and
add wrappers for compatibility with the current API.
This allows to use these functions with detailed error codes within
the i2c core or for API extensions.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
[wsa: rename new functions and fix minor kdoc issues]
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Peter Rosin <peda@axentia.se>
Reviewed-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
Reviewed-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Linus Torvalds [Fri, 17 May 2019 17:17:29 +0000 (10:17 -0700)]
Merge tag 'nds32-for-linus-5.2-rc1' of git://git./linux/kernel/git/greentime/linux
Pull nds32 updates from Greentime Hu:
- Clean up codes and Makefile
- Fix a vDSO bug
- Remove useless functions/header files
- Update git repo path in MAINTAINERS
* tag 'nds32-for-linus-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/greentime/linux:
nds32: Fix vDSO clock_getres()
MAINTAINERS: update nds32 git repo path
nds32: don't export low-level cache flushing routines
arch: nds32: Kconfig: pedantic formatting
nds32: fix semicolon code style issue
nds32: vdso: drop unnecessary cc-ldoption
nds32: remove unused generic-y += cmpxchg-local.h
nds32: Use the correct style for SPDX License Identifier
nds32: remove __virt_to_bus and __bus_to_virt
nds32: vdso: fix and clean-up Makefile
nds32: add vmlinux.lds and vdso.so to .gitignore
nds32: ex-exit: Remove unneeded need_resched() loop
nds32/io: Remove useless definition of mmiowb()
nds32: Removed unused thread flag TIF_USEDFPU
Linus Torvalds [Fri, 17 May 2019 17:08:59 +0000 (10:08 -0700)]
Merge tag 's390-5.2-2' of git://git./linux/kernel/git/s390/linux
Pull more s390 updates from Martin Schwidefsky:
- Enhancements for the QDIO layer
- Remove the RCP trace event
- Avoid three build issues
- Move the defconfig to the configs directory
* tag 's390-5.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390: move arch/s390/defconfig to arch/s390/configs/defconfig
s390/qdio: optimize state inspection of HW-owned SBALs
s390/qdio: use get_buf_state() in debug_get_buf_state()
s390/qdio: allow to scan all Output SBALs in one go
s390/cio: Remove tracing for rchp instruction
s390/kasan: adapt disabled_wait usage to avoid build error
latent_entropy: avoid build error when plugin cflags are not set
s390/boot: fix compiler error due to missing awk strtonum
Linus Torvalds [Fri, 17 May 2019 16:46:31 +0000 (09:46 -0700)]
Merge branch 'fixes' of git://git./linux/kernel/git/viro/vfs
Pull more vfs mount updates from Al Viro:
"Propagation of new syscalls to other architectures + cosmetic change
from Christian (fscontext didn't follow the convention for anon inode
names)"
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
uapi: Wire up the mount API syscalls on non-x86 arches [ver #2]
uapi, x86: Fix the syscall numbering of the mount API syscalls [ver #2]
uapi, fsopen: use square brackets around "fscontext" [ver #2]
Thomas Gleixner [Fri, 17 May 2019 13:48:06 +0000 (15:48 +0200)]
Merge tag 'timers-v5.2' of git.linaro.org/people/daniel.lezcano/linux into timers/core
Pull clockevent updates from Daniel Lezcano:
- Add compatible string for suniv for sun4i (Mesih Kilinc)
- Add COMPILE_TEST option for sp804 (David Abdurachmanov)
- Replace the compensation time when suspend happens on tegra with the one
provided by the generic framework (Joseph Lo)
- Cleanup, shutdown and oneshot mode fix on milbeaut timer (Sugaya Taichi)
- Atmel TCB rework to fix boot failure on boards without PIT or misfunction
on system using a preempt-rt kernel (Alexandre Belloni)
Tobin C. Harding [Wed, 15 May 2019 09:07:50 +0000 (19:07 +1000)]
powerpc/cacheinfo: Remove double free
kfree() after kobject_put(). Who ever wrote this was on crack.
Fixes:
7e8039795a80 ("powerpc/cacheinfo: Fix kobject memleak")
Signed-off-by: Tobin C. Harding <tobin@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Aneesh Kumar K.V [Thu, 16 May 2019 11:50:54 +0000 (17:20 +0530)]
powerpc/mm/hash: Fix get_region_id() for invalid addresses
Accesses by userspace to random addresses outside the user or kernel
address range will generate an SLB fault. When we handle that fault we
classify the effective address into several classes, eg. user, kernel
linear, kernel virtual etc.
For addresses that are completely outside of any valid range, we
should not insert an SLB entry at all, and instead immediately an
exception.
In the past this was handled in two ways. Firstly we would check the
top nibble of the address (using REGION_ID(ea)) and that would tell us
if the address was user (0), kernel linear (c), kernel virtual (d), or
vmemmap (f). If the address didn't match any of these it was invalid.
Then for each type of address we would do a secondary check. For the
user region we check against H_PGTABLE_RANGE, for kernel linear we
would mask the top nibble of the address and then check the address
against MAX_PHYSMEM_BITS.
As part of commit
0034d395f89d ("powerpc/mm/hash64: Map all the kernel
regions in the same 0xc range") we replaced REGION_ID() with
get_region_id() and changed the masking of the top nibble to only mask
the top two bits, which introduced a bug.
Addresses less than (4 << 60) are still handled correctly, they are
either less than (1 << 60) in which case they are subject to the
H_PGTABLE_RANGE check, or they are correctly checked against
MAX_PHYSMEM_BITS.
However addresses from (4 << 60) to ((0xc << 60) - 1), are incorrectly
treated as kernel linear addresses in get_region_id(). Then the top
two bits are cleared by EA_MASK in slb_allocate_kernel() and the
address is checked against MAX_PHYSMEM_BITS, which it passes due to
the masking. The end result is we incorrectly insert SLB entries for
those addresses.
That is not actually catastrophic, having inserted the SLB entry we
will then go on to take a page fault for the address and at that point
we detect the problem and report it as a bad fault.
Still we should not be inserting those entries, or treating them as
kernel linear addresses in the first place. So fix get_region_id() to
detect addresses in that range and return an invalid region id, which
we cause use to not insert an SLB entry and directly report an
exception.
Fixes:
0034d395f89d ("powerpc/mm/hash64: Map all the kernel regions in the same 0xc range")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
[mpe: Drop change to EA_MASK for now, rewrite change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Paolo Bonzini [Fri, 17 May 2019 12:08:53 +0000 (14:08 +0200)]
kvm: fix compilation on aarch64
Commit
e45adf665a53 ("KVM: Introduce a new guest mapping API", 2019-01-31)
introduced a build failure on aarch64 defconfig:
$ make -j$(nproc) ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- O=out defconfig \
Image.gz
...
../arch/arm64/kvm/../../../virt/kvm/kvm_main.c:
In function '__kvm_map_gfn':
../arch/arm64/kvm/../../../virt/kvm/kvm_main.c:1763:9: error:
implicit declaration of function 'memremap'; did you mean 'memset_p'?
../arch/arm64/kvm/../../../virt/kvm/kvm_main.c:1763:46: error:
'MEMREMAP_WB' undeclared (first use in this function)
../arch/arm64/kvm/../../../virt/kvm/kvm_main.c:
In function 'kvm_vcpu_unmap':
../arch/arm64/kvm/../../../virt/kvm/kvm_main.c:1795:3: error:
implicit declaration of function 'memunmap'; did you mean 'vm_munmap'?
because these functions are declared in <linux/io.h> rather than <asm/io.h>,
and the former was being pulled in already on x86 but not on aarch64.
Reported-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Nathan Chancellor [Thu, 16 May 2019 17:49:42 +0000 (12:49 -0500)]
objtool: Allow AR to be overridden with HOSTAR
Currently, this Makefile hardcodes GNU ar, meaning that if it is not
available, there is no way to supply a different one and the build will
fail.
$ make AR=llvm-ar CC=clang LD=ld.lld HOSTAR=llvm-ar HOSTCC=clang \
HOSTLD=ld.lld HOSTLDFLAGS=-fuse-ld=lld defconfig modules_prepare
...
AR /out/tools/objtool/libsubcmd.a
/bin/sh: 1: ar: not found
...
Follow the logic of HOST{CC,LD} and allow the user to specify a
different ar tool via HOSTAR (which is used elsewhere in other
tools/ Makefiles).
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Cc: <stable@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/80822a9353926c38fd7a152991c6292491a9d0e8.1558028966.git.jpoimboe@redhat.com
Link: https://github.com/ClangBuiltLinux/linux/issues/481
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ard Biesheuvel [Thu, 16 May 2019 21:31:59 +0000 (23:31 +0200)]
fbdev/efifb: Ignore framebuffer memmap entries that lack any memory types
The following commit:
38ac0287b7f4 ("fbdev/efifb: Honour UEFI memory map attributes when mapping the FB")
updated the EFI framebuffer code to use memory mappings for the linear
framebuffer that are permitted by the memory attributes described by the
EFI memory map for the particular region, if the framebuffer happens to
be covered by the EFI memory map (which is typically only the case for
framebuffers in shared memory). This is required since non-x86 systems
may require cacheable attributes for memory mappings that are shared
with other masters (such as GPUs), and this information cannot be
described by the Graphics Output Protocol (GOP) EFI protocol itself,
and so we rely on the EFI memory map for this.
As reported by James, this breaks some x86 systems:
[ 1.173368] efifb: probing for efifb
[ 1.173386] efifb: abort, cannot remap video memory 0x1d5000 @ 0xcf800000
[ 1.173395] Trying to free nonexistent resource <
00000000cf800000-
00000000cf9d4bff>
[ 1.173413] efi-framebuffer: probe of efi-framebuffer.0 failed with error -5
The problem turns out to be that the memory map entry that describes the
framebuffer has no memory attributes listed at all, and so we end up with
a mem_flags value of 0x0.
So work around this by ensuring that the memory map entry's attribute field
has a sane value before using it to mask the set of usable attributes.
Reported-by: James Hilliard <james.hilliard1@gmail.com>
Tested-by: James Hilliard <james.hilliard1@gmail.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: <stable@vger.kernel.org> # v4.19+
Cc: Borislav Petkov <bp@alien8.de>
Cc: James Morse <james.morse@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Jones <pjones@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Fixes:
38ac0287b7f4 ("fbdev/efifb: Honour UEFI memory map attributes when ...")
Link: http://lkml.kernel.org/r/20190516213159.3530-2-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Andreas Schwab [Tue, 7 May 2019 07:36:46 +0000 (09:36 +0200)]
riscv: fix locking violation in page fault handler
When a user mode process accesses an address in the vmalloc area
do_page_fault tries to unlock the mmap semaphore when it isn't locked.
Signed-off-by: Andreas Schwab <schwab@suse.de>
[Palmer: Duplicated code instead of a goto]
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Yash Shah [Mon, 6 May 2019 10:48:40 +0000 (16:18 +0530)]
RISC-V: sifive_l2_cache: Add L2 cache controller driver for SiFive SoCs
The driver currently supports only SiFive FU540-C000 platform.
The initial version of L2 cache controller driver includes:
- Initial configuration reporting at boot up.
- Support for ECC related functionality.
Signed-off-by: Yash Shah <yash.shah@sifive.com>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Yash Shah [Mon, 6 May 2019 10:48:39 +0000 (16:18 +0530)]
RISC-V: Add DT documentation for SiFive L2 Cache Controller
Add device tree bindings for SiFive FU540 L2 cache controller driver
Signed-off-by: Yash Shah <yash.shah@sifive.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Palmer Dabbelt [Wed, 27 Mar 2019 00:40:24 +0000 (17:40 -0700)]
RISC-V: Avoid using invalid intermediate translations
This is almost entirely a comment.
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Vincent Chen [Tue, 5 Mar 2019 03:23:35 +0000 (11:23 +0800)]
riscv: Support BUG() in kernel module
The kernel module is loaded into vmalloc region which is located below
to the PAGE_OFFSET. Hence the condition, pc < PAGE_OFFSET, in the
is_valid_bugaddr() will filter out all trap exceptions triggered
by kernel module. To support BUG() in kernel module, the condition is
changed to pc < VMALLOC_START.
Signed-off-by: Vincent Chen <vincentc@andestech.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Vincent Chen [Tue, 5 Mar 2019 03:23:34 +0000 (11:23 +0800)]
riscv: Add the support for c.ebreak check in is_valid_bugaddr()
The macro __BUG_INSN currently is defined as the "ebreak" opcode.
The is_valid_bugaddr() function compares the instruction pointed to by
$sepc with macro __BUG_INSN to check whether the current trap exception
is caused by an "ebreak" instruction. However, this check flow is possibly
erroneous because if C extension is supported, the expected trap
instruction "ebreak" is possibly translated to "c.ebreak" by the assembler.
Therefore, it requires a mechanism to distinguish the length of the
instruction in $spec and compare it to the correct trap instruction.
Signed-off-by: Vincent Chen <vincentc@andestech.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Vincent Chen [Tue, 5 Mar 2019 03:23:33 +0000 (11:23 +0800)]
riscv: support trap-based WARN()
The WARN() related function will trigger a debug exception. This can help
developers to analyze the cause of WARN() because if the debugger is
connected, the control flow will be transferred to debugging
environment.
Signed-off-by: Vincent Chen <vincentc@andestech.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Gary Guo [Wed, 27 Mar 2019 00:41:29 +0000 (00:41 +0000)]
riscv: fix sbi_remote_sfence_vma{,_asid}.
Currently sbi_remote_sfence_vma{,_asid} does not pass their arguments
to SBI at all, which is semantically incorrect.
Neither BBL nor OpenSBI is using these arguments at the moment, and
they just do a global flush instead. However we still need to provide
correct arguments.
Signed-off-by: Gary Guo <gary@garyguo.net>
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Gary Guo [Wed, 27 Mar 2019 00:41:29 +0000 (00:41 +0000)]
riscv: move switch_mm to its own file
switch_mm is an expensive operations that has two users.
flush_icache_deferred is only called within switch_mm and can be moved
together. The function is expected to be more complicated when ASID
support is added, so clean up eagerly.
By moving them to a separate file we also removes some excessive
dependency of tlbflush.h and cacheflush.h.
Signed-off-by: Gary Guo <gary@garyguo.net>
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Gary Guo [Wed, 27 Mar 2019 00:41:25 +0000 (00:41 +0000)]
riscv: move flush_icache_{all,mm} to cacheflush.c
Currently, flush_icache_all is macro-expanded into a SBI call, yet no
asm/sbi.h is included in asm/cacheflush.h. This could be moved to
mm/cacheflush.c instead (SBI call will dominate performance-wise and
there is no worry to not have it inlined.
Currently, flush_icache_mm stays in kernel/smp.c, which looks like a
hack to prevent it from being compiled when CONFIG_SMP=n. It should
also be in mm/cacheflush.c.
Signed-off-by: Gary Guo <gary@garyguo.net>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Anup Patel [Thu, 25 Apr 2019 13:35:06 +0000 (06:35 -0700)]
tty: Don't force RISCV SBI console as preferred console
The Linux kernel will auto-disables all boot consoles whenever it
gets a preferred real console.
Currently on RISC-V systems, if we have a real console which is not
RISCV SBI console then boot consoles (such as earlycon=sbi) are not
auto-disabled when a real console (ttyS0 or ttySIF0) is available.
This results in duplicate prints at boot-time after kernel starts
using real console (i.e. ttyS0 or ttySIF0) if "earlycon=" kernel
parameter was passed by bootloader.
The reason for above issue is that RISCV SBI console always adds
itself as preferred console which is causing other real consoles
to be not used as preferred console.
Ideally "console=" kernel parameter passed by bootloaders should
be the one selecting a preferred real console.
This patch fixes above issue by not forcing RISCV SBI console as
preferred console.
Fixes:
afa6b1ccfad5 ("tty: New RISC-V SBI console driver")
Cc: stable@vger.kernel.org
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Reviewed-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Anup Patel [Thu, 25 Apr 2019 08:38:41 +0000 (08:38 +0000)]
RISC-V: Access CSRs using CSR numbers
We should prefer accessing CSRs using their CSR numbers because:
1. It compiles fine with older toolchains.
2. We can use latest CSR names in #define macro names of CSR numbers
as-per RISC-V spec.
3. We can access newly added CSRs even if toolchain does not recognize
newly addes CSRs by name.
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Anup Patel [Thu, 25 Apr 2019 08:38:37 +0000 (08:38 +0000)]
RISC-V: Add interrupt related SCAUSE defines in asm/csr.h
This patch adds SCAUSE interrupt flag and SCAUSE interrupt related
defines to asm/csr.h. We also use these defines in kernel/irq.c and
express SIE/SIP flags in-terms of SCAUSE interrupt causes.
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Anup Patel [Thu, 25 Apr 2019 08:38:30 +0000 (08:38 +0000)]
RISC-V: Use tabs to align macro values in asm/csr.h
The spacing between macro name and value is not consistent in
asm/csr.h. This patch beautifies asm/csr.h by using tabs to align
macro values instead of spaces.
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Atish Patra [Wed, 24 Apr 2019 21:48:01 +0000 (14:48 -0700)]
RISC-V: Fix minor checkpatch issues.
While working on the patches, I found some minor checkpatch issues.
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Atish Patra [Wed, 24 Apr 2019 21:48:00 +0000 (14:48 -0700)]
RISC-V: Support nr_cpus command line option.
If nr_cpus command line option is set, maximum possible cpu should be
set to that value.
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Linus Torvalds [Fri, 17 May 2019 02:10:37 +0000 (19:10 -0700)]
Merge tag 'for-linus-
20190516' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
"A small set of fixes for io_uring.
This contains:
- smp_rmb() cleanup for io_cqring_events() (Jackie)
- io_cqring_wait() simplification (Jackie)
- removal of dead 'ev_flags' passing (me)
- SQ poll CPU affinity verification fix (me)
- SQ poll wait fix (Roman)
- SQE command prep cleanup and fix (Stefan)"
* tag 'for-linus-
20190516' of git://git.kernel.dk/linux-block:
io_uring: use wait_event_interruptible for cq_wait conditional wait
io_uring: adjust smp_rmb inside io_cqring_events
io_uring: fix infinite wait in khread_park() on io_finish_async()
io_uring: remove 'ev_flags' argument
io_uring: fix failure to verify SQ_AFF cpu
io_uring: fix race condition reading SQE data
Linus Torvalds [Fri, 17 May 2019 02:08:15 +0000 (19:08 -0700)]
Merge tag 'for-5.2/block-post-
20190516' of git://git.kernel.dk/linux-block
Pull more block updates from Jens Axboe:
"This is mainly some late lightnvm changes that came in just before the
merge window, as well as fixes that have been queued up since the
initial pull request was frozen.
This contains:
- lightnvm changes, fixing race conditions, improving memory
utilization, and improving pblk compatability (Chansol, Igor,
Marcin)
- NVMe pull request with minor fixes all over the map (via Christoph)
- remove redundant error print in sata_rcar (Geert)
- struct_size() cleanup (Jackie)
- dasd CONFIG_LBADF warning fix (Ming)
- brd cond_resched() improvement (Mikulas)"
* tag 'for-5.2/block-post-
20190516' of git://git.kernel.dk/linux-block: (41 commits)
block/bio-integrity: use struct_size() in kmalloc()
nvme: validate cntlid during controller initialisation
nvme: change locking for the per-subsystem controller list
nvme: trace all async notice events
nvme: fix typos in nvme status code values
nvme-fabrics: remove unused argument
nvme-multipath: avoid crash on invalid subsystem cntlid enumeration
nvme-fc: use separate work queue to avoid warning
nvme-rdma: remove redundant reference between ib_device and tagset
nvme-pci: mark expected switch fall-through
nvme-pci: add known admin effects to augument admin effects log page
nvme-pci: init shadow doorbell after each reset
brd: add cond_resched to brd_free_pages
sata_rcar: Remove ata_host_alloc() error printing
s390/dasd: fix build warning in dasd_eckd_build_cp_raw
lightnvm: pblk: use nvm_rq_to_ppa_list()
lightnvm: pblk: simplify partial read path
lightnvm: do not remove instance under global lock
lightnvm: track inflight target creations
lightnvm: pblk: recover only written metadata
...
Linus Torvalds [Fri, 17 May 2019 02:05:35 +0000 (19:05 -0700)]
Merge tag 'clk-for-linus' of git://git./linux/kernel/git/clk/linux
Pull more clk framework updates from Stephen Boyd:
"One more patch to remove io.h from clk-provider.h.
We used to need this include when we had clk_readl() and clk_writel(),
but those are gone now so this patch pushes the dependency out to the
users of clk-provider.h"
* tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: Remove io.h from clk-provider.h
Linus Torvalds [Fri, 17 May 2019 02:01:23 +0000 (19:01 -0700)]
Merge branch 'for-5.2-fixes' of git://git./linux/kernel/git/tj/cgroup
Pull cgroup fix from Tejun Heo:
"The cgroup2 freezer pulled in this cycle broke strace. This pull
request includes a workaround for the problem.
It's not a complete fix in that it may cause spurious frozen state
flip-flops which is fairly minor. Will push a full fix once it's
ready"
* 'for-5.2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
signal: unconditionally leave the frozen state in ptrace_stop()
Linus Torvalds [Fri, 17 May 2019 01:57:58 +0000 (18:57 -0700)]
Merge tag 'linux-kselftest-5.2-rc1-2' of git://git./linux/kernel/git/shuah/linux-kselftest
Pull more kselftest updates from Shuah Khan:
- kselftest framework bpf build/test workflow regression fix
- Fix to kselftest install to use default install path
- Fix to kselftest KBUILD_OUTPUT builds to not clutter main
KBUILD_OUTPUT directory with selftest objects
- .gitignore fixes (Kelsey Skunberg)
- rseq selftests updates (Mathieu Desnoyers and Martin Schwidefsky)
They change the per-architecture pre-abort signatures to ensure those
are valid trap instructions.
The way exit points are presented to debuggers is enhanced, ensuring
all exit points are present, so debuggers don't have to disassemble
rseq critical section to properly skip over them.
Discussions with the glibc community is reaching a consensus of
exposing a __rseq_handled symbol from glibc to coexist with rseq
early adopters. Update the rseq selftest code to expose and use this
symbol.
Support for compiling asm goto with clang is added with the
"-no-integrated-as" compiler switch, similarly to the top level
kernel Makefile.
- kselftest Makefile test run output refactoring and making test output
TAP13 compliant from Kees Cook:
This re-factors the selftest Makefiles to extract the test running
logic to be reused between "run_tests" and "emit_tests", while also
fixing up the test output to be TAP version 13 compliant:
- added "plan" line
- fixed result line syntax
- moved all test output to be "# "-prefixed as TAP "diagnostic"
lines
The prefixing code includes a fallback mode for limited execution
environments.
Additionally, the plan lines are fixed for all callers of
kselftest.h.
* tag 'linux-kselftest-5.2-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (25 commits)
selftests: avoid KBUILD_OUTPUT dir cluttering with selftest objects
selftests: drivers: Create .gitignore to include /dma-buf/udmabuf
selftests: pidfd: Create .gitignore to include pidfd_test
selftests: fix bpf build/test workflow regression when KBUILD_OUTPUT is set
selftests: fix install target to use default install path
rseq/selftests: add -no-integrated-as for clang
rseq/selftests: mips: use break instruction for RSEQ_SIG
rseq/selftests: powerpc code signature: generate valid instructions
rseq/selftests: aarch64 code signature: handle big-endian environment
rseq/selftests: arm: use udf instruction for RSEQ_SIG
rseq/selftests: s390: use trap4 for RSEQ_SIG
rseq/selftests: x86: use ud1 instruction as RSEQ_SIG opcode
rseq/selftests: s390: use jg instruction for jumps outside of the asm
rseq/selftests: Use __rseq_handled symbol to coexist with glibc
rseq/selftests: Introduce __rseq_cs_ptr_array, rename __rseq_table to __rseq_cs
rseq/selftests: Add __rseq_exit_point_array section for debuggers
rseq/selftests: x86: Work-around bogus gcc-8 optimisation
selftests: Add test plan API to kselftest.h and adjust callers
selftests: Remove KSFT_TAP_LEVEL
selftests: Move test output to diagnostic lines
...
Linus Torvalds [Fri, 17 May 2019 01:54:13 +0000 (18:54 -0700)]
Merge tag 'devicetree-for-5.2-part2' of git://git./linux/kernel/git/robh/linux
Pull Devicetree vendor prefix conversion from Rob Herring:
"Conversion of vendor-prefixes.txt to json-schema"
* tag 'devicetree-for-5.2-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
dt-bindings: Convert vendor prefixes to json-schema
Linus Torvalds [Fri, 17 May 2019 00:18:41 +0000 (17:18 -0700)]
Merge tag 'afs-fixes-b-
20190516' of git://git./linux/kernel/git/dhowells/linux-fs
Pull AFS callback promise fixes from David Howells:
"This series fixes a bunch of problems in callback promise handling,
where a callback promise indicates a promise on the part of the server
to notify the client in the event of some sort of change to a file or
volume. In the event of a break, the client has to go and refetch the
client status from the server and discard any cached permission
information as the ACL might have changed.
The problem in the current code is that changes made by other clients
aren't always noticed, primarily because the file status information
and the callback information aren't updated in the same critical
section, even if these are carried in the same reply from an RPC
operation, and so the AFS_VNODE_CB_PROMISED flag is unreliable.
Arranging for them to be done in the same critical section during
reply decoding is tricky because of the FS.InlineBulkStatus op - which
has all the statuses in the reply arriving and then all the callbacks,
so they have to be buffered. It simplifies things a lot to move the
critical section out of the decode phase and do it after the RPC
function returns.
Also new inodes (either newly fetched or newly created) aren't
properly managed against a callback break happening before we get the
local inode up and running.
Fix this by:
- There's now a combined file status and callback record (struct
afs_status_cb) to carry both plus some flags.
- Each operation wrapper function allocates sufficient afs_status_cb
records for all the vnodes it is interested in and passes them into
RPC operations to be filled in from the reply.
- The FileStatus and CallBack record decoders no longer apply the
new/revised status and callback information to the inode/vnode at
the point of decoding and instead store the information into the
record from (2).
- afs_vnode_commit_status() then revises the file status, detects
deletion and notes callback information inside of a single critical
section. It also checks the callback break counters and cancels the
callback promise if they changed during the operation.
[*] Note that "callback break counters" are counters of server
events that cancel one or more callback promises that the client
thinks it has. The client counts the events and compares the
counters before and after an operation to see if the callback
promise it thinks it just got evaporated before it got recorded
under lock.
- Volume and server callback break counters are passed into
afs_iget() allowing callback breaks concurrent with inode set up to
be detected and the callback promise thence to be cancelled.
- AFS validation checks are now done under RCU conditions using a
read lock on cb_lock. This requires vnode->cb_interest to be made
RCU safe.
- If the checks in (6) fail, the callback breaker is then called
under write lock on the cb_lock - but only if the callback break
counter didn't change from the value read before the checks were
made.
- Results from FS.InlineBulkStatus that correspond to inodes we
currently have in memory are now used to update those inodes'
status and callback information rather than being discarded. This
requires those inodes to be looked up before the RPC op is made and
all their callback break values saved.
To aid in this, the following changes have also been made:
- Don't pass the vnode into the reply delivery functions or the
decoders. The vnode shouldn't be altered anywhere in those paths.
The only exception, for the moment, is for the call done hook for
file lock ops that wants access to both the vnode and the call -
this can be fixed at a later time.
- Get rid of the call->reply[] void* array and replace it with named
and typed members. This avoids confusion since different ops were
mapping different reply[] members to different things.
- Fix an order-1 kmalloc allocation in afs_do_lookup() and replace it
with kvcalloc().
- Always get the reply time. Since callback, lock and fileserver
record expiry times are calculated for several RPCs, make this
mandatory.
- Call afs_pages_written_back() from the operation wrapper rather
than from the delivery function.
- Don't store the version and type from a callback promise in a reply
as the information in them is of very limited use"
* tag 'afs-fixes-b-
20190516' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
afs: Fix application of the results of a inline bulk status fetch
afs: Pass pre-fetch server and volume break counts into afs_iget5_set()
afs: Fix unlink to handle YFS.RemoveFile2 better
afs: Clear AFS_VNODE_CB_PROMISED if we detect callback expiry
afs: Make vnode->cb_interest RCU safe
afs: Split afs_validate() so first part can be used under LOOKUP_RCU
afs: Don't save callback version and type fields
afs: Fix application of status and callback to be under same lock
afs: Always get the reply time
afs: Fix order-1 allocation in afs_do_lookup()
afs: Get rid of afs_call::reply[]
afs: Don't pass the vnode pointer through into the inline bulk status op
Linus Torvalds [Fri, 17 May 2019 00:00:13 +0000 (17:00 -0700)]
Merge tag 'afs-fixes-
20190516' of git://git./linux/kernel/git/dhowells/linux-fs
Pull misc AFS fixes from David Howells:
"This fixes a set of miscellaneous issues in the afs filesystem,
including:
- leak of keys on file close.
- broken error handling in xattr functions.
- missing locking when updating VL server list.
- volume location server DNS lookup whereby preloaded cells may not
ever get a lookup and regular DNS lookups to maintain server lists
consume power unnecessarily.
- incorrect error propagation and handling in the fileserver
iteration code causes operations to sometimes apparently succeed.
- interruption of server record check/update side op during
fileserver iteration causes uninterruptible main operations to fail
unexpectedly.
- callback promise expiry time miscalculation.
- over invalidation of the callback promise on directories.
- double locking on callback break waking up file locking waiters.
- double increment of the vnode callback break counter.
Note that it makes some changes outside of the afs code, including:
- an extra parameter to dns_query() to allow the dns_resolver key
just accessed to be immediately invalidated. AFS is caching the
results itself, so the key can be discarded.
- an interruptible version of wait_var_event().
- an rxrpc function to allow the maximum lifespan to be set on a
call.
- a way for an rxrpc call to be marked as non-interruptible"
* tag 'afs-fixes-
20190516' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
afs: Fix double inc of vnode->cb_break
afs: Fix lock-wait/callback-break double locking
afs: Don't invalidate callback if AFS_VNODE_DIR_VALID not set
afs: Fix calculation of callback expiry time
afs: Make dynamic root population wait uninterruptibly for proc_cells_lock
afs: Make some RPC operations non-interruptible
rxrpc: Allow the kernel to mark a call as being non-interruptible
afs: Fix error propagation from server record check/update
afs: Fix the maximum lifespan of VL and probe calls
rxrpc: Provide kernel interface to set max lifespan on a call
afs: Fix "kAFS: AFS vnode with undefined type 0"
afs: Fix cell DNS lookup
Add wait_var_event_interruptible()
dns_resolver: Allow used keys to be invalidated
afs: Fix afs_cell records to always have a VL server list record
afs: Fix missing lock when replacing VL server list
afs: Fix afs_xattr_get_yfs() to not try freeing an error value
afs: Fix incorrect error handling in afs_xattr_get_acl()
afs: Fix key leak in afs_release() and afs_evict_inode()
Linus Torvalds [Thu, 16 May 2019 23:24:01 +0000 (16:24 -0700)]
Merge tag 'ceph-for-5.2-rc1' of git://github.com/ceph/ceph-client
Pull ceph updates from Ilya Dryomov:
"On the filesystem side we have:
- a fix to enforce quotas set above the mount point (Luis Henriques)
- support for exporting snapshots through NFS (Zheng Yan)
- proper statx implementation (Jeff Layton). statx flags are mapped
to MDS caps, with AT_STATX_{DONT,FORCE}_SYNC taken into account.
- some follow-up dentry name handling fixes, in particular
elimination of our hand-rolled helper and the switch to __getname()
as suggested by Al (Jeff Layton)
- a set of MDS client cleanups in preparation for async MDS requests
in the future (Jeff Layton)
- a fix to sync the filesystem before remounting (Jeff Layton)
On the rbd side, work is on-going on object-map and fast-diff image
features"
* tag 'ceph-for-5.2-rc1' of git://github.com/ceph/ceph-client: (29 commits)
ceph: flush dirty inodes before proceeding with remount
ceph: fix unaligned access in ceph_send_cap_releases
libceph: make ceph_pr_addr take an struct ceph_entity_addr pointer
libceph: fix unaligned accesses in ceph_entity_addr handling
rbd: don't assert on writes to snapshots
rbd: client_mutex is never nested
ceph: print inode number in __caps_issued_mask debugging messages
ceph: just call get_session in __ceph_lookup_mds_session
ceph: simplify arguments and return semantics of try_get_cap_refs
ceph: fix comment over ceph_drop_caps_for_unlink
ceph: move wait for mds request into helper function
ceph: have ceph_mdsc_do_request call ceph_mdsc_submit_request
ceph: after an MDS request, do callback and completions
ceph: use pathlen values returned by set_request_path_attr
ceph: use __getname/__putname in ceph_mdsc_build_path
ceph: use ceph_mdsc_build_path instead of clone_dentry_name
ceph: fix potential use-after-free in ceph_mdsc_build_path
ceph: dump granular cap info in "caps" debugfs file
ceph: make iterate_session_caps a public symbol
ceph: fix NULL pointer deref when debugging is enabled
...
Linus Torvalds [Thu, 16 May 2019 23:16:18 +0000 (16:16 -0700)]
Merge branch 'next' of git://git./linux/kernel/git/rzhang/linux
Pull thermal management updates from Zhang Rui:
- Remove the 'module' Kconfig option for thermal subsystem framework
because the thermal framework are required to be ready as early as
possible to avoid overheat at boot time (Daniel Lezcano)
- Fix a bug that thermal framework pokes disabled thermal zones upon
resume (Wei Wang)
- A couple of cleanups and trivial fixes on int340x thermal drivers
(Srinivas Pandruvada, Zhang Rui, Sumeet Pawnikar)
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux:
drivers: thermal: processor_thermal: Downgrade error message
mlxsw: Remove obsolete dependency on THERMAL=m
hwmon/drivers/core: Simplify complex dependency
thermal/drivers/core: Fix typo in the option name
thermal/drivers/core: Remove depends on THERMAL in Kconfig
thermal/drivers/core: Remove module unload code
thermal/drivers/core: Remove the module Kconfig's option
thermal: core: skip update disabled thermal zones after suspend
thermal: make device_register's type argument const
thermal: intel: int340x: processor_thermal_device: simplify to get driver data
thermal/int3403_thermal: favor _TMP instead of PTYP
Linus Torvalds [Thu, 16 May 2019 22:55:48 +0000 (15:55 -0700)]
Merge tag 'for-5.2/dm-changes-v2' of git://git./linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mike Snitzer:
- Improve DM snapshot target's scalability by using finer grained
locking. Requires some list_bl interface improvements.
- Add ability for DM integrity to use a bitmap mode, that tracks
regions where data and metadata are out of sync, instead of using a
journal.
- Improve DM thin provisioning target to not write metadata changes to
disk if the thin-pool and associated thin devices are merely
activated but not used. This avoids metadata corruption due to
concurrent activation of thin devices across different OS instances
(e.g. split brain scenarios, which ultimately would be avoided if
proper device filters were used -- but not having proper filtering
has proven a very common configuration mistake)
- Fix missing call to path selector type->end_io in DM multipath. This
fixes reported performance problems due to inaccurate path selector
IO accounting causing an imbalance of IO (e.g. avoiding issuing IO to
particular path due to it seemingly being heavily used).
- Fix bug in DM cache metadata's loading of its discard bitset that
could lead to all cache blocks being discarded if the very first
cache block was discarded (thankfully in practice the first cache
block is generally in use; be it FS superblock, partition table, disk
label, etc).
- Add testing-only DM dust target which simulates a device that has
failing sectors and/or read failures.
- Fix a DM init error path reference count hang that caused boot hangs
if user supplied malformed input on kernel commandline.
- Fix a couple issues with DM crypt target's logging being overly
verbose or lacking context.
- Various other small fixes to DM init, DM multipath, DM zoned, and DM
crypt.
* tag 'for-5.2/dm-changes-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (42 commits)
dm: fix a couple brace coding style issues
dm crypt: print device name in integrity error message
dm crypt: move detailed message into debug level
dm ioctl: fix hang in early create error condition
dm integrity: whitespace, coding style and dead code cleanup
dm integrity: implement synchronous mode for reboot handling
dm integrity: handle machine reboot in bitmap mode
dm integrity: add a bitmap mode
dm integrity: introduce a function add_new_range_and_wait()
dm integrity: allow large ranges to be described
dm ingerity: pass size to dm_integrity_alloc_page_list()
dm integrity: introduce rw_journal_sectors()
dm integrity: update documentation
dm integrity: don't report unused options
dm integrity: don't check null pointer before kvfree and vfree
dm integrity: correctly calculate the size of metadata area
dm dust: Make dm_dust_init and dm_dust_exit static
dm dust: remove redundant unsigned comparison to less than zero
dm mpath: always free attached_handler_name in parse_path()
dm init: fix max devices/targets checks
...
Qian Cai [Thu, 16 May 2019 19:57:41 +0000 (15:57 -0400)]
slab: remove /proc/slab_allocators
It turned out that DEBUG_SLAB_LEAK is still broken even after recent
recue efforts that when there is a large number of objects like
kmemleak_object which is normal on a debug kernel,
# grep kmemleak /proc/slabinfo
kmemleak_object
2243606 3436210 ...
reading /proc/slab_allocators could easily loop forever while processing
the kmemleak_object cache and any additional freeing or allocating
objects will trigger a reprocessing. To make a situation worse,
soft-lockups could easily happen in this sitatuion which will call
printk() to allocate more kmemleak objects to guarantee an infinite
loop.
Also, since it seems no one had noticed when it was totally broken
more than 2-year ago - see the commit
fcf88917dd43 ("slab: fix a crash
by reading /proc/slab_allocators"), probably nobody cares about it
anymore due to the decline of the SLAB. Just remove it entirely.
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Baolin Wang [Wed, 10 Apr 2019 07:22:50 +0000 (15:22 +0800)]
arm64: dts: sprd: Add clock properties for serial devices
We've introduced power management logics for the Spreadtrum serial
controller by commit
062ec2774c8a ("serial: sprd: Add power management
for the Spreadtrum serial controller"), thus add related clock properties
to support this feature.
Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
Signed-off-by: Olof Johansson <olof@lixom.net>
David Howells [Tue, 14 May 2019 11:33:10 +0000 (12:33 +0100)]
afs: Fix application of the results of a inline bulk status fetch
Fix afs_do_lookup() such that when it does an inline bulk status fetch op,
it will update inodes that are already extant (something that afs_iget()
doesn't do) and to cache permits for each inode created (thereby avoiding a
follow up FS.FetchStatus call to determine this).
Extant inodes need looking up in advance so that their cb_break counters
before and after the operation can be compared. To this end, the inode
pointers are cached so that they don't need looking up again after the op.
Fixes:
5cf9dd55a0ec ("afs: Prospectively look up extra files when doing a single lookup")
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Tue, 14 May 2019 11:23:43 +0000 (12:23 +0100)]
afs: Pass pre-fetch server and volume break counts into afs_iget5_set()
Pass the server and volume break counts from before the status fetch
operation that queried the attributes of a file into afs_iget5_set() so
that the new vnode's break counters can be initialised appropriately.
This allows detection of a volume or server break that happened whilst we
were fetching the status or setting up the vnode.
Fixes:
c435ee34551e ("afs: Overhaul the callback handling")
Signed-off-by: David Howells <dhowells@redhat.com>