git.monstr.eu Git - linux-2.6-microblaze.git/log

ring-buffer: speed up buffer resets by avoiding synchronize_rcu for each CPU

On a 144 thread system, `perf ftrace` takes about 20 seconds to start
up, due to calling synchronize_rcu() for each CPU.

  cat /proc/108560/stack
    0xc0003e7eb336f470
    __switch_to+0x2e0/0x480
    __wait_rcu_gp+0x20c/0x220
    synchronize_rcu+0x9c/0xc0
    ring_buffer_reset_cpu+0x88/0x2e0
    tracing_reset_online_cpus+0x84/0xe0
    tracing_open+0x1d4/0x1f0

On a system with 10x more threads, it starts to become an annoyance.

Batch these up so we disable all the per-cpu buffers first, then
synchronize_rcu() once, then reset each of the buffers. This brings
the time down to about 0.5s.

Link: https://lkml.kernel.org/r/20200625053403.2386972-1-npiggin@gmail.com
Tested-by: Anton Blanchard <anton@ozlabs.org>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

ring-buffer: Add rb_time_t 64 bit operations for speeding up 32 bit

After a discussion with the new time algorithm to have nested events still
have proper time keeping but required using local64_t atomic operations.
Mathieu was concerned about the performance this would have on 32 bit
machines, as in most cases, atomic 64 bit operations on them can be
expensive.

As the ring buffer's timing needs do not require full features of local64_t,
a wrapper is made to implement a new rb_time_t operation that uses two longs
on 32 bit machines but still uses the local64_t operations on 64 bit
machines. There's a switch that can be made in the file to force 64 bit to
use the 32 bit version just for testing purposes.

All reads do not need to succeed if a read happened while the stamp being
read is in the process of being updated. The requirement is that all reads
must succed that were done by an interrupting event (where this event was
interrupted by another event that did the write). Or if the event itself did
the write first. That is: rb_time_set(t, x) followed by rb_time_read(t) will
always succeed (even if it gets interrupted by another event that writes to
t. The result of the read will be either the previous set, or a set
performed by an interrupting event.

If the read is done by an event that interrupted another event that was in
the process of setting the time stamp, and no other event came along to
write to that time stamp, it will fail and the rb_time_read() will return
that it failed (the value to read will be undefined).

A set will always write to the time stamp and return with a valid time
stamp, such that any read after it will be valid.

A cmpxchg may fail if it interrupted an event that was in the process of
updating the time stamp just like the reads do. Other than that, it will act
like a normal cmpxchg.

The way this works is that the rb_time_t is made of of three fields. A cnt,
that gets updated atomically everyting a modification is made. A top that
represents the most significant 30 bits of the time, and a bottom to
represent the least significant 30 bits of the time. Notice, that the time
values is only 60 bits long (where the ring buffer only uses 59 bits, which
gives us 18 years of nanoseconds!).

The top two bits of both the top and bottom is a 2 bit counter that gets set
by the value of the least two significant bits of the cnt. A read of the top
and the bottom where both the top and bottom have the same most significant
top 2 bits, are considered a match and a valid 60 bit number can be created
from it. If they do not match, then the number is considered invalid, and
this must only happen if an event interrupted another event in the midst of
updating the time stamp.

This is only used for 32 bits machines as 64 bit machines can get better
performance out of the local64_t. This has been tested heavily by forcing 64
bit to use this logic.

Link: https://lore.kernel.org/r/20200625225345.18cf5881@oasis.local.home
Link: http://lkml.kernel.org/r/20200629025259.309232719@goodmis.org
Inspired-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

ring-buffer: Incorporate absolute timestamp into add_timestamp logic

Instead of calling out the absolute test for each time to check if the
ring buffer wants absolute time stamps for all its recording, incorporate it
with the add_timestamp field and turn it into flags for faster processing
between wanting a absolute tag and needing to force one.

Link: http://lkml.kernel.org/r/20200629025259.154892368@goodmis.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

ring-buffer: Have nested events still record running time stamp

Up until now, if an event is interrupted while it is recorded by an
interrupt, and that interrupt records events, the time of those events will
all be the same. This is because events only record the delta of the time
since the previous event (or beginning of a page), and to handle updating
the time keeping for that of nested events is extremely racy. After years of
thinking about this and several failed attempts, I finally have a solution
to solve this puzzle.

The problem is that you need to atomically calculate the delta and then
update the time stamp you made the delta from, as well as then record it
into the buffer, all this while at any time an interrupt can come in and
do the same thing. This is easy to solve with heavy weight atomics, but that
would be detrimental to the performance of the ring buffer. The current
state of affairs sacrificed the time deltas for nested events for
performance.

The reason for previous failed attempts at solving this puzzle was because I
was trying to completely avoid slow atomic operations like cmpxchg. I final
came to the conclusion to always avoid cmpxchg is not possible, which is why
those previous attempts always failed. But it is possible to pick one path
(the most common case) and avoid cmpxchg in that path, which is the "fast
path". The most common case is that an event will not be interrupted and
have other events added into it. An event can detect if it has
interrupted another event, and for these cases we can make it the slow
path and use the heavy operations like cmpxchg.

One more player was added to the game that made this possible, and that is
the "absolute timestamp" (by Tom Zanussi) that allows us to inject a full 59
bit time stamp. (Of course this breaks if a machine is running for more than
18 years without a reboot!).

There's barrier() placements around for being paranoid, even when they
are not needed because of other atomic functions near by. But those
should not hurt, as if they are not needed, they basically become a nop.

Note, this also makes the race window much smaller, which means there
are less slow paths to slow down the performance.

The basic idea is that there's two main paths taken.

1) Not being interrupted between time stamps and reserving buffer space.
    In this case, the time stamps taken are true to the location in the
    buffer.

2) Was interrupted by another path between taking time stamps and reserving
    buffer space.

The objective is to know what the delta is from the last reserved location
in the buffer.

As it is possible to detect if an event is interrupting another event before
reserving data, space is added to the length to be reserved to inject a full
time stamp along with the event being reserved.

When an event is not interrupted, the write stamp is always the time of the
last event written to the buffer.

In path 1, there's two sub paths we care about:

a) The event did not interrupt another event.
b) The event interrupted another event.

In case a, as the write stamp was read and known to be correct, the delta
between the current time stamp and the write stamp is the delta between the
current event and the previously recorded event.

In case b, extra space was reserved to just put the full time stamp into the
buffer. Which is done, as stated, in this path the time stamp taken is known
to match the location in the buffer.

In path 2, there's also two sub paths we care about:

a) The event was not interrupted by another event since it reserved space
    on the buffer and re-reading the write stamp.
b) The event was interrupted by another event.

In case a, the write stamp is that of the last event that interrupted this
event between taking the time stamps and reserving. As no event came in
after re-reading the write stamp, that event is known to be the time of the
event directly before this event and the delta can be the new time stamp and
the write stamp.

In case b, one or more events came in between reserving the event and
re-reading he write stamp. Since this event's buffer reservation is between
other events at this path, there's no way to know what the delta is. But
because an event interrupted this event after it started, its fine to just
give a zero delta, and take the same time stamp as the events that happened
within the event being recorded.

Here's the implementation of the design of this solution:

All this is per cpu, and only needs to worry about nested events (not
parallel events).

The players:

write_tail: The index in the buffer where new events can be written to.
     It is incremented via local_add() to reserve space for a new event.

before_stamp: A time stamp set by all events before reserving space.

write_stamp: A time stamp updated by events after it has successfully
     reserved space.

/* Save the current position of write */
[A] w = local_read(write_tail);
barrier();
/* Read both before and write stamps before touching anything */
before = local_read(before_stamp);
after = local_read(write_stamp);
barrier();

/*
* If before and after are the same, then this event is not
* interrupting a time update. If it is, then reserve space for adding
* a full time stamp (this can turn into a time extend which is
* just an extended time delta but fill up the extra space).
*/
if (after != before)
abs = true;

ts = clock();

/* Now update the before_stamp (everyone does this!) */
[B] local_set(before_stamp, ts);

/* Now reserve space on the buffer */
[C] write = local_add_return(len, write_tail);

/* Set tail to be were this event's data is */
tail = write - len;

if (w == tail) {

/* Nothing interrupted this between A and C */
[D] local_set(write_stamp, ts);
barrier();
[E] save_before = local_read(before_stamp);

if (!abs) {
/* This did not interrupt a time update */
delta = ts - after;
} else {
delta = ts; /* The full time stamp will be in use */
}
if (ts != save_before) {
/* slow path - Was interrupted between C and E */
/* The update to write_stamp could have overwritten the update to
* it by the interrupting event, but before and after should be
* the same for all completed top events */
after = local_read(write_stamp);
if (save_before > after)
local_cmpxchg(write_stamp, after, save_before);
}
} else {
/* slow path - Interrupted between A and C */

after = local_read(write_stamp);
temp_ts = clock();
barrier();
[F] if (write == local_read(write_tail) && after < temp_ts) {
/* This was not interrupted since C and F
* The last write_stamp is still valid for the previous event
* in the buffer. */
delta = temp_ts - after;
/* OK to keep this new time stamp */
ts = temp_ts;
} else {
/* Interrupted between C and F
* Well, there's no use to try to know what the time stamp
* is for the previous event. Just set delta to zero and
* be the same time as that event that interrupted us before
* the reservation of the buffer. */

delta = 0;
}
/* No need to use full timestamps here */
abs = 0;
}

Link: https://lkml.kernel.org/r/20200625094454.732790f7@oasis.local.home
Link: https://lore.kernel.org/r/20200627010041.517736087@goodmis.org
Link: http://lkml.kernel.org/r/20200629025258.957440797@goodmis.org
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing: Move pipe reference to trace array instead of current_tracer

If a process has the trace_pipe open on a trace_array, the current tracer
for that trace array should not be changed. This was original enforced by a
global lock, but when instances were introduced, it was moved to the
current_trace. But this structure is shared by all instances, and a
trace_pipe is for a single instance. There's no reason that a process that
has trace_pipe open on one instance should prevent another instance from
changing its current tracer. Move the reference counter to the trace_array
instead.

This is marked as "Fixes" but is more of a clean up than a true fix.
Backport if you want, but its not critical.

Fixes: cf6ab6d9143b1 ("tracing: Add ref count to tracer for when they are being read by pipe")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing: not necessary to define DEFINE_EVENT_PRINT to be empty again

After the previous cleanup, DEFINE_EVENT_PRINT's definition has no
relationship with DEFINE_EVENT. So After we re-define DEFINE_EVENT, it
is not necessary to define DEFINE_EVENT_PRINT to be empty again.

Link: http://lkml.kernel.org/r/20200612092844.56107-5-richard.weiyang@linux.alibaba.com
Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing: define DEFINE_EVENT_PRINT not related to DEFINE_EVENT

Current definition define DEFINE_EVENT_PRINT to be DEFINE_EVENT.
Actually, at this point DEFINE_EVENT is already an empty macro. Let's
cut the relationship between DEFINE_EVENT_PRINT and DEFINE_EVENT.

Link: http://lkml.kernel.org/r/20200612092844.56107-4-richard.weiyang@linux.alibaba.com
Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing: not necessary re-define DEFINE_EVENT_PRINT

The definition of DEFINE_EVENT_PRINT is not changed after previous one,
so not necessary to re-define is as the same form.

Link: http://lkml.kernel.org/r/20200612092844.56107-3-richard.weiyang@linux.alibaba.com
Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing: not necessary to undefine DEFINE_EVENT again

After un-define DEFINE_EVENT in Stage 2, DEFINE_EVENT is not defined to a
specific form. It is not necessary to un-define it again.

Let's skip this.

Link: http://lkml.kernel.org/r/20200612092844.56107-2-richard.weiyang@linux.alibaba.com
Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

x86/ftrace: Do not jump to direct code in created trampolines

When creating a trampoline based on the ftrace_regs_caller code, nop out the
jnz test that would jmup to the code that would return to a direct caller
(stored in the ORIG_RAX field) and not back to the function that called it.

Link: http://lkml.kernel.org/r/20200422162750.638839749@goodmis.org
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

x86/ftrace: Only have the builtin ftrace_regs_caller call direct hooks

If a direct hook is attached to a function that ftrace also has a function
attached to it, then it is required that the ftrace_ops_list_func() is used
to iterate over the registered ftrace callbacks. This will also include the
direct ftrace_ops helper, that tells ftrace_regs_caller where to return to
(the direct callback and not the function that called it).

As this direct helper is only to handle the case of ftrace callbacks
attached to the same function as the direct callback, the ftrace callback
allocated trampolines (used to only call them), should never be used to
return back to a direct callback.

Only copy the portion of the ftrace_regs_caller that will return back to
what called it, and not the portion that returns back to the direct caller.

The direct ftrace_ops must then pick the ftrace_regs_caller builtin function
as its own trampoline to ensure that it will never have one allocated for
it (which would not include the handling of direct callbacks).

Link: http://lkml.kernel.org/r/20200422162750.495903799@goodmis.org
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

x86/ftrace: Make non direct case the default in ftrace_regs_caller

If a direct function is hooked along with one of the ftrace registered
functions, then the ftrace_regs_caller is attached to the function that
shares the direct hook as well as the ftrace hook. The ftrace_regs_caller
will call ftrace_ops_list_func() that iterates through all the registered
ftrace callbacks, and if there's a direct callback attached to that
function, the direct ftrace_ops callback is called to notify that
ftrace_regs_caller to return to the direct caller instead of going back to
the function that called it.

But this is a very uncommon case. Currently, the code has it as the default
case. Modify ftrace_regs_caller to make the default case (the non jump) to
just return normally, and have the jump to the handling of the direct
caller.

Link: http://lkml.kernel.org/r/20200422162750.350373278@goodmis.org
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing: Only allow trace_array_printk() to be used by instances

To prevent default "trace_printks()" from spamming the top level tracing
ring buffer, only allow trace instances to use trace_array_printk() (which
can be used without the trace_printk() start up warning).

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

Linux 5.8-rc3

Merge tag 'arm-omap-fixes-5.8-1' of git://git./linux/kernel/git/soc/soc

Pull ARM OMAP fixes from Arnd Bergmann:
"The OMAP developers are particularly active at hunting down
  regressions, so this is a separate branch with OMAP specific
  fixes for v5.8:

  As Tony explains
    "The recent display subsystem (DSS) related platform data changes
     caused display related regressions for suspend and resume. Looks
     like I only tested suspend and resume before dropping the legacy
     platform data, and forgot to test it after dropping it. Turns out
     the main issue was that we no longer have platform code calling
     pm_runtime_suspend for DSS like we did for the legacy platform data
     case, and that fix is still being discussed on the dri-devel list
     and will get merged separately. The DSS related testing exposed a
     pile other other display related issues that also need fixing
     though":

   - Fix ti-sysc optional clock handling and reset status checks for
     devices that reset automatically in idle like DSS

   - Ignore ti-sysc clockactivity bit unless separately requested to
     avoid unexpected performance issues

   - Init ti-sysc framedonetv_irq to true and disable for am4

   - Avoid duplicate DSS reset for legacy mode with dts data

   - Remove LCD timings for am4 as they cause warnings now that we're
     using generic panels

  Other OMAP changes from Tony include:

   - Fix omap_prm reset deassert as we still have drivers setting the
     pm_runtime_irq_safe() flag

   - Flush posted write for ti-sysc enable and disable

   - Fix droid4 spi related errors with spi flags

   - Fix am335x USB range and a typo for softreset

   - Fix dra7 timer nodes for clocks for IPU and DSP

   - Drop duplicate mailboxes after mismerge for dra7

   - Prevent pocketgeagle header line signal from accidentally setting
     micro-SD write protection signal by removing the default mux

   - Fix NFSroot flakeyness after resume for duover by switching the
     smsc911x gpio interrupt to back to level sensitive

   - Fix regression for omap4 clockevent source after recent system
     timer changes

   - Yet another ethernet regression fix for the "rgmii" vs "rgmii-rxid"
     phy-mode

   - One patch to convert am3/am4 DT files to use the regular sdhci-omap
     driver instead of the old hsmmc driver, this was meant for the
     merge window but got lost in the process"

* tag 'arm-omap-fixes-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (21 commits)
  ARM: dts: am5729: beaglebone-ai: fix rgmii phy-mode
  ARM: dts: Fix omap4 system timer source clocks
  ARM: dts: Fix duovero smsc interrupt for suspend
  ARM: dts: am335x-pocketbeagle: Fix mmc0 Write Protect
  Revert "bus: ti-sysc: Increase max softreset wait"
  ARM: dts: am437x-epos-evm: remove lcd timings
  ARM: dts: am437x-gp-evm: remove lcd timings
  ARM: dts: am437x-sk-evm: remove lcd timings
  ARM: dts: dra7-evm-common: Fix duplicate mailbox nodes
  ARM: dts: dra7: Fix timer nodes properly for timer_sys_ck clocks
  ARM: dts: Fix am33xx.dtsi ti,sysc-mask wrong softreset flag
  ARM: dts: Fix am33xx.dtsi USB ranges length
  bus: ti-sysc: Increase max softreset wait
  ARM: OMAP2+: Fix legacy mode dss_reset
  bus: ti-sysc: Fix uninitialized framedonetv_irq
  bus: ti-sysc: Ignore clockactivity unless specified as a quirk
  bus: ti-sysc: Use optional clocks on for enable and wait for softreset bit
  ARM: dts: omap4-droid4: Fix spi configuration and increase rate
  bus: ti-sysc: Flush posted write on enable and disable
  soc: ti: omap-prm: use atomic iopoll instead of sleeping one
  ...

Merge tag 'arm-fixes-5.8-1' of git://git./linux/kernel/git/soc/soc

Pull ARM SoC fixes from Arnd Bergmann:
"Here are a couple of bug fixes, mostly for devicetree files

  NXP i.MX:
   - Use correct voltage on some i.MX8M board device trees to avoid
     hardware damage
   - Code fixes for a compiler warning and incorrect reference counting,
     both harmless.
   - Fix the i.MX8M SoC driver to correctly identify imx8mp
   - Fix watchdog configuration in imx6ul-kontron device tree.

  Broadcom:
   - A small regression fix for the Raspberry-Pi firmware driver
   - A Kconfig change to use the correct timer driver on Northstar
   - A DT fix for the Luxul XWC-2000 machine
   - Two more DT fixes for NSP SoCs

  STmicroelectronics STI
   - Revert one broken patch for L2 cache configuration

  ARM Versatile Express:
   - Fix a regression by reverting a broken DT cleanup

  TEE drivers:
   - MAINTAINERS: change tee mailing list"

* tag 'arm-fixes-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
  Revert "ARM: sti: Implement dummy L2 cache's write_sec"
  soc: imx8m: fix build warning
  ARM: imx6: add missing put_device() call in imx6q_suspend_init()
  ARM: imx5: add missing put_device() call in imx_suspend_alloc_ocram()
  soc: imx8m: Correct i.MX8MP UID fuse offset
  ARM: dts: imx6ul-kontron: Change WDOG_ANY signal from push-pull to open-drain
  ARM: dts: imx6ul-kontron: Move watchdog from Kontron i.MX6UL/ULL board to SoM
  arm64: dts: imx8mm-beacon: Fix voltages on LDO1 and LDO2
  arm64: dts: imx8mn-ddr4-evk: correct ldo1/ldo2 voltage range
  arm64: dts: imx8mm-evk: correct ldo1/ldo2 voltage range
  ARM: dts: NSP: Correct FA2 mailbox node
  ARM: bcm2835: Fix integer overflow in rpi_firmware_print_firmware_revision()
  MAINTAINERS: change tee mailing list
  ARM: dts: NSP: Disable PL330 by default, add dma-coherent property
  ARM: bcm: Select ARM_TIMER_SP804 for ARCH_BCM_NSP
  ARM: dts: BCM5301X: Add missing memory "device_type" for Luxul XWC-2000
  arm: dts: vexpress: Move mcc node back into motherboard node

Merge tag 'timers-urgent-2020-06-28' of git://git./linux/kernel/git/tip/tip

Pull timer fix from Ingo Molnar:
"A single DocBook fix"

* tag 'timers-urgent-2020-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timekeeping: Fix kerneldoc system_device_crosststamp & al

Merge tag 'perf-urgent-2020-06-28' of git://git./linux/kernel/git/tip/tip

Pull perf fix from Ingo Molnar:
"A single Kbuild dependency fix"

* tag 'perf-urgent-2020-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/rapl: Fix RAPL config variable bug

Merge tag 'efi-urgent-2020-06-28' of git://git./linux/kernel/git/tip/tip

Pull EFI fixes from Ingo Molnar:

- Fix build regression on v4.8 and older

- Robustness fix for TPM log parsing code

- kobject refcount fix for the ESRT parsing code

- Two efivarfs fixes to make it behave more like an ordinary file
   system

- Style fixup for zero length arrays

- Fix a regression in path separator handling in the initrd loader

- Fix a missing prototype warning

- Add some kerneldoc headers for newly introduced stub routines

- Allow support for SSDT overrides via EFI variables to be disabled

- Report CPU mode and MMU state upon entry for 32-bit ARM

- Use the correct stack pointer alignment when entering from mixed mode

* tag 'efi-urgent-2020-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efi/libstub: arm: Print CPU boot mode and MMU state at boot
  efi/libstub: arm: Omit arch specific config table matching array on arm64
  efi/x86: Setup stack correctly for efi_pe_entry
  efi: Make it possible to disable efivar_ssdt entirely
  efi/libstub: Descriptions for stub helper functions
  efi/libstub: Fix path separator regression
  efi/libstub: Fix missing-prototype warning for skip_spaces()
  efi: Replace zero-length array and use struct_size() helper
  efivarfs: Don't return -EINTR when rate-limiting reads
  efivarfs: Update inode modification time for successful writes
  efi/esrt: Fix reference count leak in esre_create_sysfs_entry.
  efi/tpm: Verify event log header before parsing
  efi/x86: Fix build with gcc 4

Merge tag 'sched_urgent_for_5.8_rc3' of git://git./linux/kernel/git/tip/tip

Pull scheduler fixes from Borislav Petkov:
"The most anticipated fix in this pull request is probably the horrible
  build fix for the RANDSTRUCT fail that didn't make -rc2. Also included
  is the cleanup that removes those BUILD_BUG_ON()s and replaces it with
  ugly unions.

  Also included is the try_to_wake_up() race fix that was first
  triggered by Paul's RCU-torture runs, but was independently hit by
  Dave Chinner's fstest runs as well"

* tag 'sched_urgent_for_5.8_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/cfs: change initial value of runnable_avg
  smp, irq_work: Continue smp_call_function*() and irq_work*() integration
  sched/core: s/WF_ON_RQ/WQ_ON_CPU/
  sched/core: Fix ttwu() race
  sched/core: Fix PI boosting between RT and DEADLINE tasks
  sched/deadline: Initialize ->dl_boosted
  sched/core: Check cpus_mask, not cpus_ptr in __set_cpus_allowed_ptr(), to fix mask corruption
  sched/core: Fix CONFIG_GCC_PLUGIN_RANDSTRUCT build fail

Merge tag 'x86_urgent_for_5.8_rc3' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

- AMD Memory bandwidth counter width fix, by Babu Moger.

- Use the proper length type in the 32-bit truncate() syscall variant,
   by Jiri Slaby.

- Reinit IA32_FEAT_CTL during wakeup to fix the case where after
   resume, VMXON would #GP due to VMX not being properly enabled, by
   Sean Christopherson.

- Fix a static checker warning in the resctrl code, by Dan Carpenter.

- Add a CR4 pinning mask for bits which cannot change after boot, by
   Kees Cook.

- Align the start of the loop of __clear_user() to 16 bytes, to improve
   performance on AMD zen1 and zen2 microarchitectures, by Matt Fleming.

* tag 'x86_urgent_for_5.8_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/asm/64: Align start of __clear_user() loop to 16-bytes
  x86/cpu: Use pinning mask for CR4 bits needing to be 0
  x86/resctrl: Fix a NULL vs IS_ERR() static checker warning in rdt_cdp_peer_get()
  x86/cpu: Reinitialize IA32_FEAT_CTL MSR on BSP during wakeup
  syscalls: Fix offset type of ksys_ftruncate()
  x86/resctrl: Fix memory bandwidth counter width for AMD

Merge tag 'rcu_urgent_for_5.8_rc3' of git://git./linux/kernel/git/tip/tip

Pull RCU-vs-KCSAN fixes from Borislav Petkov:
"A single commit that uses "arch_" atomic operations to avoid the
  instrumentation that comes with the non-"arch_" versions.

  In preparation for that commit, it also has another commit that makes
  these "arch_" atomic operations available to generic code.

  Without these commits, KCSAN uses can see pointless errors"

* tag 'rcu_urgent_for_5.8_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  rcu: Fixup noinstr warnings
  locking/atomics: Provide the arch_atomic_ interface to generic code

Merge tag 'objtool_urgent_for_5.8_rc3' of git://git./linux/kernel/git/tip/tip

Pull objtool fixes from Borislav Petkov:
"Three fixes from Peter Zijlstra suppressing KCOV instrumentation in
  noinstr sections.

  Peter Zijlstra says:
    "Address KCOV vs noinstr. There is no function attribute to
     selectively suppress KCOV instrumentation, instead teach objtool
     to NOP out the calls in noinstr functions"

  This cures a bunch of KCOV crashes (as used by syzcaller)"

* tag 'objtool_urgent_for_5.8_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  objtool: Fix noinstr vs KCOV
  objtool: Provide elf_write_{insn,reloc}()
  objtool: Clean up elf_write() condition

Merge tag 'x86_entry_for_5.8' of git://git./linux/kernel/git/tip/tip

Pull x86 entry fixes from Borislav Petkov:
"This is the x86/entry urgent pile which has accumulated since the
  merge window.

  It is not the smallest but considering the almost complete entry core
  rewrite, the amount of fixes to follow is somewhat higher than usual,
  which is to be expected.

  Peter Zijlstra says:
   'These patches address a number of instrumentation issues that were
    found after the x86/entry overhaul. When combined with rcu/urgent
    and objtool/urgent, these patches make UBSAN/KASAN/KCSAN happy
    again.

    Part of making this all work is bumping the minimum GCC version for
    KASAN builds to gcc-8.3, the reason for this is that the
    __no_sanitize_address function attribute is broken in GCC releases
    before that.

    No known GCC version has a working __no_sanitize_undefined, however
    because the only noinstr violation that results from this happens
    when an UB is found, we treat it like WARN. That is, we allow it to
    violate the noinstr rules in order to get the warning out'"

* tag 'x86_entry_for_5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/entry: Fix #UD vs WARN more
  x86/entry: Increase entry_stack size to a full page
  x86/entry: Fixup bad_iret vs noinstr
  objtool: Don't consider vmlinux a C-file
  kasan: Fix required compiler version
  compiler_attributes.h: Support no_sanitize_undefined check with GCC 4
  x86/entry, bug: Comment the instrumentation_begin() usage for WARN()
  x86/entry, ubsan, objtool: Whitelist __ubsan_handle_*()
  x86/entry, cpumask: Provide non-instrumented variant of cpu_is_offline()
  compiler_types.h: Add __no_sanitize_{address,undefined} to noinstr
  kasan: Bump required compiler version
  x86, kcsan: Add __no_kcsan to noinstr
  kcsan: Remove __no_kcsan_or_inline
  x86, kcsan: Remove __no_kcsan_or_inline usage

sched/cfs: change initial value of runnable_avg

Some performance regression on reaim benchmark have been raised with
commit 070f5e860ee2 ("sched/fair: Take into account runnable_avg to classify group")

The problem comes from the init value of runnable_avg which is initialized
with max value. This can be a problem if the newly forked task is finally
a short task because the group of CPUs is wrongly set to overloaded and
tasks are pulled less agressively.

Set initial value of runnable_avg equals to util_avg to reflect that there
is no waiting time so far.

Fixes: 070f5e860ee2 ("sched/fair: Take into account runnable_avg to classify group")
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200624154422.29166-1-vincent.guittot@linaro.org

smp, irq_work: Continue smp_call_function*() and irq_work*() integration

Instead of relying on BUG_ON() to ensure the various data structures
line up, use a bunch of horrible unions to make it all automatic.

Much of the union magic is to ensure irq_work and smp_call_function do
not (yet) see the members of their respective data structures change
name.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lkml.kernel.org/r/20200622100825.844455025@infradead.org

sched/core: s/WF_ON_RQ/WQ_ON_CPU/

Use a better name for this poorly named flag, to avoid confusion...

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Link: https://lkml.kernel.org/r/20200622100825.785115830@infradead.org

sched/core: Fix ttwu() race

Paul reported rcutorture occasionally hitting a NULL deref:

  sched_ttwu_pending()
    ttwu_do_wakeup()
      check_preempt_curr() := check_preempt_wakeup()
        find_matching_se()
          is_same_group()
            if (se->cfs_rq == pse->cfs_rq) <-- *BOOM*

Debugging showed that this only appears to happen when we take the new
code-path from commit:

  2ebb17717550 ("sched/core: Offload wakee task activation if it the wakee is descheduling")

and only when @cpu == smp_processor_id(). Something which should not
be possible, because p->on_cpu can only be true for remote tasks.
Similarly, without the new code-path from commit:

  c6e7bd7afaeb ("sched/core: Optimize ttwu() spinning on p->on_cpu")

this would've unconditionally hit:

  smp_cond_load_acquire(&p->on_cpu, !VAL);

and if: 'cpu == smp_processor_id() && p->on_cpu' is possible, this
would result in an instant live-lock (with IRQs disabled), something
that hasn't been reported.

The NULL deref can be explained however if the task_cpu(p) load at the
beginning of try_to_wake_up() returns an old value, and this old value
happens to be smp_processor_id(). Further assume that the p->on_cpu
load accurately returns 1, it really is still running, just not here.

Then, when we enqueue the task locally, we can crash in exactly the
observed manner because p->se.cfs_rq != rq->cfs_rq, because p's cfs_rq
is from the wrong CPU, therefore we'll iterate into the non-existant
parents and NULL deref.

The closest semi-plausible scenario I've managed to contrive is
somewhat elaborate (then again, actual reproduction takes many CPU
hours of rcutorture, so it can't be anything obvious):

X->cpu = 1
rq(1)->curr = X

CPU0 CPU1 CPU2

// switch away from X
LOCK rq(1)->lock
smp_mb__after_spinlock
dequeue_task(X)
  X->on_rq = 9
switch_to(Z)
  X->on_cpu = 0
UNLOCK rq(1)->lock

// migrate X to cpu 0
LOCK rq(1)->lock
dequeue_task(X)
set_task_cpu(X, 0)
  X->cpu = 0
UNLOCK rq(1)->lock

LOCK rq(0)->lock
enqueue_task(X)
  X->on_rq = 1
UNLOCK rq(0)->lock

// switch to X
LOCK rq(0)->lock
smp_mb__after_spinlock
switch_to(X)
  X->on_cpu = 1
UNLOCK rq(0)->lock

// X goes sleep
X->state = TASK_UNINTERRUPTIBLE
smp_mb(); // wake X
ttwu()
  LOCK X->pi_lock
  smp_mb__after_spinlock

  if (p->state)

  cpu = X->cpu; // =? 1

  smp_rmb()

// X calls schedule()
LOCK rq(0)->lock
smp_mb__after_spinlock
dequeue_task(X)
  X->on_rq = 0

  if (p->on_rq)

  smp_rmb();

  if (p->on_cpu && ttwu_queue_wakelist(..)) [*]

  smp_cond_load_acquire(&p->on_cpu, !VAL)

  cpu = select_task_rq(X, X->wake_cpu, ...)
  if (X->cpu != cpu)
switch_to(Y)
  X->on_cpu = 0
UNLOCK rq(0)->lock

However I'm having trouble convincing myself that's actually possible
on x86_64 -- after all, every LOCK implies an smp_mb() there, so if ttwu
observes ->state != RUNNING, it must also observe ->cpu != 1.

(Most of the previous ttwu() races were found on very large PowerPC)

Nevertheless, this fully explains the observed failure case.

Fix it by ordering the task_cpu(p) load after the p->on_cpu load,
which is easy since nothing actually uses @cpu before this.

Fixes: c6e7bd7afaeb ("sched/core: Optimize ttwu() spinning on p->on_cpu")
Reported-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20200622125649.GC576871@hirez.programming.kicks-ass.net

sched/core: Fix PI boosting between RT and DEADLINE tasks

syzbot reported the following warning:

WARNING: CPU: 1 PID: 6351 at kernel/sched/deadline.c:628
enqueue_task_dl+0x22da/0x38a0 kernel/sched/deadline.c:1504

At deadline.c:628 we have:

623 static inline void setup_new_dl_entity(struct sched_dl_entity *dl_se)
624 {
625 struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
626 struct rq *rq = rq_of_dl_rq(dl_rq);
627
628 WARN_ON(dl_se->dl_boosted);
629 WARN_ON(dl_time_before(rq_clock(rq), dl_se->deadline));
[...]
}

Which means that setup_new_dl_entity() has been called on a task
currently boosted. This shouldn't happen though, as setup_new_dl_entity()
is only called when the 'dynamic' deadline of the new entity
is in the past w.r.t. rq_clock and boosted tasks shouldn't verify this
condition.

Digging through the PI code I noticed that what above might in fact happen
if an RT tasks blocks on an rt_mutex hold by a DEADLINE task. In the
first branch of boosting conditions we check only if a pi_task 'dynamic'
deadline is earlier than mutex holder's and in this case we set mutex
holder to be dl_boosted. However, since RT 'dynamic' deadlines are only
initialized if such tasks get boosted at some point (or if they become
DEADLINE of course), in general RT 'dynamic' deadlines are usually equal
to 0 and this verifies the aforementioned condition.

Fix it by checking that the potential donor task is actually (even if
temporary because in turn boosted) running at DEADLINE priority before
using its 'dynamic' deadline value.

Fixes: 2d3d891d3344 ("sched/deadline: Add SCHED_DEADLINE inheritance logic")
Reported-by: syzbot+119ba87189432ead09b4@syzkaller.appspotmail.com
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Tested-by: Daniel Wagner <dwagner@suse.de>
Link: https://lkml.kernel.org/r/20181119153201.GB2119@localhost.localdomain

sched/deadline: Initialize ->dl_boosted

syzbot reported the following warning triggered via SYSC_sched_setattr():

  WARNING: CPU: 0 PID: 6973 at kernel/sched/deadline.c:593 setup_new_dl_entity /kernel/sched/deadline.c:594 [inline]
  WARNING: CPU: 0 PID: 6973 at kernel/sched/deadline.c:593 enqueue_dl_entity /kernel/sched/deadline.c:1370 [inline]
  WARNING: CPU: 0 PID: 6973 at kernel/sched/deadline.c:593 enqueue_task_dl+0x1c17/0x2ba0 /kernel/sched/deadline.c:1441

This happens because the ->dl_boosted flag is currently not initialized by
__dl_clear_params() (unlike the other flags) and setup_new_dl_entity()
rightfully complains about it.

Initialize dl_boosted to 0.

Fixes: 2d3d891d3344 ("sched/deadline: Add SCHED_DEADLINE inheritance logic")
Reported-by: syzbot+5ac8bac25f95e8b221e7@syzkaller.appspotmail.com
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Daniel Wagner <dwagner@suse.de>
Link: https://lkml.kernel.org/r/20200617072919.818409-1-juri.lelli@redhat.com

sched/core: Check cpus_mask, not cpus_ptr in __set_cpus_allowed_ptr(), to fix mask corruption

This function is concerned with the long-term CPU mask, not the
transitory mask the task might have while migrate disabled. Before
this patch, if a task was migrate-disabled at the time
__set_cpus_allowed_ptr() was called, and the new mask happened to be
equal to the CPU that the task was running on, then the mask update
would be lost.

Signed-off-by: Scott Wood <swood@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20200617121742.cpxppyi7twxmpin7@linutronix.de

sched/core: Fix CONFIG_GCC_PLUGIN_RANDSTRUCT build fail

As a temporary build fix, the proper cleanup needs more work.

Reported-by: Guenter Roeck <linux@roeck-us.net>
Reported-by: Eric Biggers <ebiggers@kernel.org>
Suggested-by: Eric Biggers <ebiggers@kernel.org>
Suggested-by: Kees Cook <keescook@chromium.org>
Fixes: a148866489fb ("sched: Replace rq::wake_list")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge tag 'imx-fixes-5.8' of git://git./linux/kernel/git/shawnguo/linux into arm/fixes

i.MX fixes for 5.8:

- Fix LDO1 and LDO2 voltage range for a couple of i.MX8M board device
  trees.
- Fix i.MX8MP UID fuse offset in i.MX8M SoC driver.
- Fix watchdog configuration in imx6ul-kontron device tree.
- Fix one build warning seen on building soc-imx8m driver with
  x86_64-randconfig.
- Add missing put_device() call for a couple of mach-imx PM functions.

* tag 'imx-fixes-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
  soc: imx8m: fix build warning
  ARM: imx6: add missing put_device() call in imx6q_suspend_init()
  ARM: imx5: add missing put_device() call in imx_suspend_alloc_ocram()
  soc: imx8m: Correct i.MX8MP UID fuse offset
  ARM: dts: imx6ul-kontron: Change WDOG_ANY signal from push-pull to open-drain
  ARM: dts: imx6ul-kontron: Move watchdog from Kontron i.MX6UL/ULL board to SoM
  arm64: dts: imx8mm-beacon: Fix voltages on LDO1 and LDO2
  arm64: dts: imx8mn-ddr4-evk: correct ldo1/ldo2 voltage range
  arm64: dts: imx8mm-evk: correct ldo1/ldo2 voltage range

Link: https://lore.kernel.org/r/20200624111725.GA24312@dragon
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Merge tag 'arm-soc/for-5.8/drivers-fixes' of https://github.com/Broadcom/stblinux into arm/fixes

This pull request contains Broadcom ARM/ARM64/MIPS SoCs drivers fixes
for 5.8, please pull the following:

- Andy provides a fix for the Raspberry Pi firmware driver to print the
  correct time upon boot. This is a fallout from a converstion to use
  the ptT format

* tag 'arm-soc/for-5.8/drivers-fixes' of https://github.com/Broadcom/stblinux:
  ARM: bcm2835: Fix integer overflow in rpi_firmware_print_firmware_revision()

Link: https://lore.kernel.org/r/20200619202250.19029-2-f.fainelli@gmail.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Merge tag 'arm-soc/for-5.8/soc-fixes' of https://github.com/Broadcom/stblinux into arm/fixes

This pull request contains Broadcom ARM-based SoCs machine/Kconfig fixes
for 5.8, please pull the following:

- Matthew adds a missing select to permit the use of the standard ARM
SP804 timers on Norsthstar Plus (NSP)

* tag 'arm-soc/for-5.8/soc-fixes' of https://github.com/Broadcom/stblinux:
ARM: bcm: Select ARM_TIMER_SP804 for ARCH_BCM_NSP

Link: https://lore.kernel.org/r/20200619202250.19029-3-f.fainelli@gmail.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Merge tag 'arm-soc/for-5.8/devicetree-fixes' of https://github.com/Broadcom/stblinux into arm/fixes

This pull request contains Broadcom ARM-based SoCs Device Tree fixes for
5.8, please pull the following:

- Rafal adds a missing 'device_type' property to the Luxul XWC-2000
  required for the memory nodes to be correctly parsed by Linux

- Matthew provides two fixes for the NSP SoCs, one to disable the PL330
  DMA controller by default since it can be left in reset by the
  bootloader and the second to correct the flow accelerator mailbox node

* tag 'arm-soc/for-5.8/devicetree-fixes' of https://github.com/Broadcom/stblinux:
  ARM: dts: NSP: Correct FA2 mailbox node
  ARM: dts: NSP: Disable PL330 by default, add dma-coherent property
  ARM: dts: BCM5301X: Add missing memory "device_type" for Luxul XWC-2000

Link: https://lore.kernel.org/r/20200619202250.19029-1-f.fainelli@gmail.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Revert "ARM: sti: Implement dummy L2 cache's write_sec"

This reverts commit 7b8e0188fa717cd9abc4fb52587445b421835c2a.

Initially, STiH410-B2260 was supposed to be secured, that's why
l2c_write_sec was stubbed to avoid secure register access from
non secure world.

But by default, STiH410-B2260 is running in non secure mode,
so L2 cache register accesses are authorized, l2c_write_sec stub
is not needed.

With this patch, L2 cache is configured and performance are enhanced.

Link: https://lore.kernel.org/r/20200618172456.29475-1-patrice.chotard@st.com
Signed-off-by: Patrice Chotard <patrice.chotard@st.com>
Cc: Alain Volmat <alain.volmat@st.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Merge tag 'omap-for-v5.8/fixes-rc1-signed' of git://git./linux/kernel/git/tmlind/linux-omap into arm/omap-fixes

Few dts fixes for omaps for v5.8

Few fixes for various devices:

- Prevent pocketgeagle header line signal from accidentally setting
  micro-SD write protection signal by removing the default mux

- Fix NFSroot flakeyness after resume for duover by switching the
  smsc911x gpio interrupt to back to level sensitive

- Fix regression for omap4 clockevent source after recent system
  timer changes

- Yet another ethernet regression fix for the "rgmii" vs "rgmii-rxid"
  phy-mode

* tag 'omap-for-v5.8/fixes-rc1-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
  ARM: dts: am5729: beaglebone-ai: fix rgmii phy-mode
  ARM: dts: Fix omap4 system timer source clocks
  ARM: dts: Fix duovero smsc interrupt for suspend
  ARM: dts: am335x-pocketbeagle: Fix mmc0 Write Protect

Link: https://lore.kernel.org/r/pull-1592499282-121092@atomide.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Merge tag 'omap-for-v5.8/dt-missed-signed' of git://git./linux/kernel/git/tmlind/linux-omap into arm/omap-fixes

Missed sdhci patch for am3 and am4

I forgot to send a pull request earlier for converting am3 and am4 to
use sdhci-omap driver instead of the old omap_hsmmc driver.

There was a display subsystem related suspend and resume regression found
recently and looks like I forgot to send a pull request for this patch
while debugging the regression. This patch has been tested without the
display subsystem, and has been in Linux next for several weeks now, so
would be good to have merged for v5.8.

* tag 'omap-for-v5.8/dt-missed-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
ARM: dts: Move am33xx and am43xx mmc nodes to sdhci-omap driver

Link: https://lore.kernel.org/r/pull-1591637467-607254@atomide.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Merge tag 'tee-ml-for-v5.8' of git://git.linaro.org/people/jens.wiklander/linux-tee into arm/fixes

Change the TEE mailing list in MAINTAINERS

* tag 'tee-ml-for-v5.8' of git://git.linaro.org/people/jens.wiklander/linux-tee:
MAINTAINERS: change tee mailing list

Link: https://lore.kernel.org/r/20200616075948.GA2288211@jade
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Merge tag 'omap-for-v5.8/fixes-merge-window-signed' of git://git./linux/kernel/git/tmlind/linux-omap into arm/fixes

Fixes for omaps for v5.8

The recent display subsystem (DSS) related platform data changes caused
display related regressions for suspend and resume. Looks like I only
tested suspend and resume before dropping the legacy platform data, and
forgot to test it after dropping it. Turns out the main issue was that
we no longer have platform code calling pm_runtime_suspend for DSS like
we did for the legacy platform data case, and that fix is still being
discussed on the dri-devel list and will get merged separately. The DSS
related testing exposed a pile other other display related issues that
also need fixing though:

- Fix ti-sysc optional clock handling and reset status checks
  for devices that reset automatically in idle like DSS

- Ignore ti-sysc clockactivity bit unless separately requested
  to avoid unexpected performance issues

- Init ti-sysc framedonetv_irq to true and disable for am4

- Avoid duplicate DSS reset for legacy mode with dts data

- Remove LCD timings for am4 as they cause warnings now that we're
  using generic panels

Then there is a pile of other fixes not related to the DSS:

- Fix omap_prm reset deassert as we still have drivers setting the
  pm_runtime_irq_safe() flag

- Flush posted write for ti-sysc enable and disable

- Fix droid4 spi related errors with spi flags

- Fix am335x USB range and a typo for softreset

- Fix dra7 timer nodes for clocks for IPU and DSP

- Drop duplicate mailboxes after mismerge for dra7

* tag 'omap-for-v5.8/fixes-merge-window-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
  Revert "bus: ti-sysc: Increase max softreset wait"
  ARM: dts: am437x-epos-evm: remove lcd timings
  ARM: dts: am437x-gp-evm: remove lcd timings
  ARM: dts: am437x-sk-evm: remove lcd timings
  ARM: dts: dra7-evm-common: Fix duplicate mailbox nodes
  ARM: dts: dra7: Fix timer nodes properly for timer_sys_ck clocks
  ARM: dts: Fix am33xx.dtsi ti,sysc-mask wrong softreset flag
  ARM: dts: Fix am33xx.dtsi USB ranges length
  bus: ti-sysc: Increase max softreset wait
  ARM: OMAP2+: Fix legacy mode dss_reset
  bus: ti-sysc: Fix uninitialized framedonetv_irq
  bus: ti-sysc: Ignore clockactivity unless specified as a quirk
  bus: ti-sysc: Use optional clocks on for enable and wait for softreset bit
  ARM: dts: omap4-droid4: Fix spi configuration and increase rate
  bus: ti-sysc: Flush posted write on enable and disable
  soc: ti: omap-prm: use atomic iopoll instead of sleeping one

Link: https://lore.kernel.org/r/pull-1591889257-410830@atomide.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

afs: Fix storage of cell names

The cell name stored in the afs_cell struct is a 64-char + NUL buffer -
when it needs to be able to handle up to AFS_MAXCELLNAME (256 chars) + NUL.

Fix this by changing the array to a pointer and allocating the string.

Found using Coverity.

Fixes: 989782dcdc91 ("afs: Overhaul cell database management")
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag '5.8-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:
"Six cifs/smb3 fixes, three of them for stable.

  Fixes xfstests 451, 313 and 316"

* tag '5.8-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: misc: Use array_size() in if-statement controlling expression
  cifs: update ctime and mtime during truncate
  cifs/smb3: Fix data inconsistent when punch hole
  cifs/smb3: Fix data inconsistent when zero file range
  cifs: Fix double add page to memcg when cifs_readpages
  cifs: Fix cached_fid refcnt leak in open_shroot

Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
"Six small fixes, five in drivers and one to correct another minor
  regression from cc97923a5bcc ("block: move dma drain handling to
  scsi") where we still need the drain stub to be built in to the kernel
  for the modular libata, non-modular SAS driver case"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: mptscsih: Fix read sense data size
  scsi: zfcp: Fix panic on ERP timeout for previously dismissed ERP action
  scsi: lpfc: Avoid another null dereference in lpfc_sli4_hba_unset()
  scsi: libata: Fix the ata_scsi_dma_need_drain stub
  scsi: qla2xxx: Keep initiator ports after RSCN
  scsi: qla2xxx: Set NVMe status code for failed NVMe FCP request

Merge tag 'vfio-v5.8-rc3' of git://github.com/awilliam/linux-vfio

Pull VFIO fixes from Alex Williamson:

- Fix double free of eventfd ctx (Alex Williamson)

- Fix duplicate use of capability ID (Alex Williamson)

- Fix SR-IOV VF memory enable handling (Alex Williamson)

* tag 'vfio-v5.8-rc3' of git://github.com/awilliam/linux-vfio:
  vfio/pci: Fix SR-IOV VF handling with MMIO blocking
  vfio/type1: Fix migration info capability ID
  vfio/pci: Clear error and request eventfd ctx after releasing

Merge branch 'i2c/for-next' of git://git./linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
"This contains a 5.8 regression fix for the Designware driver, a
  register bitfield fix for the fsi driver, and a missing sanity check
  for the I2C core"

* 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: core: check returned size of emulated smbus block read
  i2c: fsi: Fix the port number field in status register
  i2c: designware: Adjust bus speed independently of ACPI

Merge tag 'staging-5.8-rc3' of git://git./linux/kernel/git/gregkh/staging

Pull staging driver fixes from Greg KH:
"Here are a small number of tiny staging driver fixes for 5.8-rc3.

  Not much here, but there were some reported problems to be fixed:

   - three wfx driver fixes

   - rtl8723bs driver fix

  All of these have been in linux-next with no reported issues"

* tag 'staging-5.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  Staging: rtl8723bs: prevent buffer overflow in update_sta_support_rate()
  staging: wfx: fix coherency of hif_scan() prototype
  staging: wfx: drop useless loop
  staging: wfx: fix AC priority

Merge tag 'usb-5.8-rc3' of git://git./linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
"Here are some small USB fixes for 5.8-rc3 to resolve some reported
  issues.

  Nothing major here:

   - gadget driver fixes

   - cdns3 driver fixes

   - xhci fixes

   - renesas_usbhs driver fixes

   - some new device support with ids

   - documentation update

  All of these have been in linux-next with no reported issues"

* tag 'usb-5.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (27 commits)
  usb: renesas_usbhs: getting residue from callback_result
  Revert "usb: dwc3: exynos: Add support for Exynos5422 suspend clk"
  xhci: Poll for U0 after disabling USB2 LPM
  xhci: Return if xHCI doesn't support LPM
  usb: host: xhci-mtk: avoid runtime suspend when removing hcd
  xhci: Fix enumeration issue when setting max packet size for FS devices.
  xhci: Fix incorrect EP_STATE_MASK
  usb: cdns3: ep0: add spinlock for cdns3_check_new_setup
  usb: cdns3: trace: using correct dir value
  usb: cdns3: ep0: fix the test mode set incorrectly
  Revert "usb: dwc3: exynos: Add support for Exynos5422 suspend clk"
  usb: gadget: udc: Potential Oops in error handling code
  usb: phy: tegra: Fix unnecessary check in tegra_usb_phy_probe()
  usb: dwc3: pci: Fix reference count leak in dwc3_pci_resume_work
  usb: cdns3: ep0: add spinlock for cdns3_check_new_setup
  usb: cdns3: trace: using correct dir value
  usb: cdns3: ep0: fix the test mode set incorrectly
  usb: typec: tcpci_rt1711h: avoid screaming irq causing boot hangs
  USB: ohci-sm501: Add missed iounmap() in remove
  cdc-acm: Add DISABLE_ECHO quirk for Microchip/SMSC chip
  ...

Merge tag 'char-misc-5.8-rc3' of git://git./linux/kernel/git/gregkh/char-misc

Pull char/misc fixes from Greg KH:
"Some tiny char/misc driver fixes for 5.8-rc3.

  The "largest" changes are in the mei driver, to resolve some reported
  problems and add some new device ids. There's also a binder bugfix, an
  fpga driver build fix, and some assorted habanalabs fixes.

  All of these, except for the habanalabs fixes, have been in linux-next
  with no reported issues. The habanalabs driver changes showed up in my
  tree on Friday, but as they are totally self-contained, all should be
  good there"

* tag 'char-misc-5.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  habanalabs: increase h/w timer when checking idle
  habanalabs: Correct handling when failing to enqueue CB
  habanalabs: increase GAUDI QMAN ARB WDT timeout
  habanalabs: rename mmu_write() to mmu_asid_va_write()
  habanalabs: use PI in MMU cache invalidation
  habanalabs: block scalar load_and_exe on external queue
  mei: me: add tiger lake point device ids for H platforms.
  mei: me: disable mei interface on Mehlow server platforms
  binder: fix null deref of proc->context
  fpga: zynqmp: fix modular build

Merge tag 'edac_urgent_for_5.8' of git://git./linux/kernel/git/ras/ras

Pull EDAC fix from Borislav Petkov:
"A single fix for amd64_edac restoring the reporting of the DRAM scrub
rate on family 0x15 CPUs"

* tag 'edac_urgent_for_5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC/amd64: Read back the scrub rate PCI register on F15h

Merge tag 'dma-mapping-5.8-4' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping fixes from Christoph Hellwig:

- fix dma coherent mmap in nommu (me)

- more AMD SEV fallout (David Rientjes, me)

- fix alignment in dma_common_*_remap (Eric Auger)

* tag 'dma-mapping-5.8-4' of git://git.infradead.org/users/hch/dma-mapping:
  dma-remap: align the size in dma_common_*_remap()
  dma-mapping: DMA_COHERENT_POOL should select GENERIC_ALLOCATOR
  dma-direct: add missing set_memory_decrypted() for coherent mapping
  dma-direct: check return value when encrypting or decrypting memory
  dma-direct: re-encrypt memory if dma_direct_alloc_pages() fails
  dma-direct: always align allocation size in dma_direct_alloc_pages()
  dma-direct: mark __dma_direct_alloc_pages static
  dma-direct: re-enable mmap for !CONFIG_MMU

Merge tag 'nfs-for-5.8-2' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client bugfixes from Anna Schumaker:
"Stable Fixes:
   - xprtrdma: Fix handling of RDMA_ERROR replies
   - sunrpc: Fix rollback in rpc_gssd_dummy_populate()
   - pNFS/flexfiles: Fix list corruption if the mirror count changes
   - NFSv4: Fix CLOSE not waiting for direct IO completion
   - SUNRPC: Properly set the @subbuf parameter of xdr_buf_subsegment()

  Other Fixes:
   - xprtrdma: Fix a use-after-free with r_xprt->rx_ep
   - Fix other xprtrdma races during disconnect
   - NFS: Fix memory leak of export_path"

* tag 'nfs-for-5.8-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  SUNRPC: Properly set the @subbuf parameter of xdr_buf_subsegment()
  NFSv4 fix CLOSE not waiting for direct IO compeletion
  pNFS/flexfiles: Fix list corruption if the mirror count changes
  nfs: Fix memory leak of export_path
  sunrpc: fixed rollback in rpc_gssd_dummy_populate()
  xprtrdma: Fix handling of RDMA_ERROR replies
  xprtrdma: Clean up disconnect
  xprtrdma: Clean up synopsis of rpcrdma_flush_disconnect()
  xprtrdma: Use re_connect_status safely in rpcrdma_xprt_connect()
  xprtrdma: Prevent dereferencing r_xprt->rx_ep after it is freed

Merge tag 'io_uring-5.8-2020-06-26' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:
"Three small fixes:

   - Close a corner case for polled IO resubmission (Pavel)

   - Toss commands when exiting (Pavel)

   - Fix SQPOLL conditional reschedule on perpetually busy submit
     (Xuan)"

* tag 'io_uring-5.8-2020-06-26' of git://git.kernel.dk/linux-block:
  io_uring: fix current->mm NULL dereference on exit
  io_uring: fix hanging iopoll in case of -EAGAIN
  io_uring: fix io_sq_thread no schedule when busy

Merge tag 'block-5.8-2020-06-26' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:

- NVMe pull request from Christoph:
    - multipath deadlock fixes (Anton)
    - NUMA fixes (Max)
    - RDMA completion vector fix (Max)
    - IO deadlock fix (Sagi)
    - multipath reference fix (Sagi)
    - NS mutation fix (Sagi)

- Use right allocator when freeing bip in error path (Chengguang)

* tag 'block-5.8-2020-06-26' of git://git.kernel.dk/linux-block:
  nvme-multipath: fix bogus request queue reference put
  nvme-multipath: fix deadlock due to head->lock
  nvme: don't protect ns mutation with ns->head->lock
  nvme-multipath: fix deadlock between ana_work and scan_work
  nvme: fix possible deadlock when I/O is blocked
  nvme-rdma: assign completion vector correctly
  nvme-loop: initialize tagset numa value to the value of the ctrl
  nvme-tcp: initialize tagset numa value to the value of the ctrl
  nvme-pci: initialize tagset numa value to the value of the ctrl
  nvme-pci: override the value of the controller's numa node
  nvme: set initial value for controller's numa node
  block: release bip in a right way in error path

Merge tag 'for-5.8/dm-fixes' of git://git./linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mike Snitzer:

- Quite a few DM zoned target fixes and a Zone append fix in DM core.

   Considering the amount of dm-zoned changes that went in during the
   5.8 merge window these fixes are not that surprising.

- A few DM writecache target fixes.

- A fix to Documentation index to include DM ebs target docs.

- Small cleanup to use struct_size() in DM core's retrieve_deps().

* tag 'for-5.8/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm writecache: add cond_resched to loop in persistent_memory_claim()
  dm zoned: Fix reclaim zone selection
  dm zoned: Fix random zone reclaim selection
  dm: update original bio sector on Zone Append
  dm zoned: Fix metadata zone size check
  docs: device-mapper: add dm-ebs.rst to an index file
  dm ioctl: use struct_size() helper in retrieve_deps()
  dm writecache: skip writecache_wait when using pmem mode
  dm writecache: correct uncommitted_block when discarding uncommitted entry
  dm zoned: assign max_io_len correctly
  dm zoned: fix uninitialized pointer dereference

Merge tag 'kgdb-5.8-rc3' of git://git./linux/kernel/git/danielt/linux

Pull kgdb fixes from Daniel Thompson:
"The main change here is a fix for a number of unsafe interactions
  between kdb and the console system. The fixes are specific to kdb
  (pure kgdb debugging does not use the console system at all). On
  systems with an NMI then kdb, if it is enabled, must get messages to
  the user despite potentially running from some "difficult" calling
  contexts. These fixes avoid using the console system where we have
  been provided an alternative (safer) way to interact with the user
  and, if using the console system in unavoidable, use oops_in_progress
  for deadlock avoidance. These fixes also ensure kdb honours the
  console enable flag.

  Also included is a fix that wraps kgdb trap handling in an RCU read
  lock to avoids triggering diagnostic warnings. This is a wide lock
  scope but this is OK because kgdb is a stop-the-world debugger. When
  we stop the world we put all the CPUs into holding pens and this
  inhibits RCU update anyway"

* tag 'kgdb-5.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux:
  kgdb: Avoid suspicious RCU usage warning
  kdb: Switch to use safer dbg_io_ops over console APIs
  kdb: Make kdb_printf() console handling more robust
  kdb: Check status of console prior to invoking handlers
  kdb: Re-factor kdb_printf() message write code

Merge tag 'powerpc-5.8-4' of git://git./linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

- A fix for a crash in nested KVM when CONFIG_DEBUG_VIRTUAL=y.

- Two minor build fixes.

Thanks to: Aneesh Kumar K.V, Arseny Solokha, Harish.

* tag 'powerpc-5.8-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  selftests/powerpc: Fix build failure in ebb tests
  powerpc/kvm/book3s64: Fix kernel crash with nested kvm & DEBUG_VIRTUAL
  powerpc/fsl_booke/32: Fix build with CONFIG_RANDOMIZE_BASE

Merge tag 'riscv-for-linus-5.8-rc3' of git://git./linux/kernel/git/riscv/linux

Pull RISC-V fixes from Palmer Dabbelt:
"This contains a handful of fixes I'd like to target for rc3.

  Most of them fix issues with the conversion of our vDSO to C. There is
  also one fix to the SiFive PRCI driver that I picked up as it's
  causing boot issues on the hardware.

   - A fix to allow kernels with dynamic ftrace to use the vDSO.

   - Some build fixes for the C vDSO functions.

   - A fix to the PRCI driver's memory allocation, which was the cause
     of some boot panics with FREELIST_RANDOM"

* tag 'riscv-for-linus-5.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: Fixup __vdso_gettimeofday broke dynamic ftrace
  riscv: Add extern declarations for vDSO time-related functions
  clk: sifive: allocate sufficient memory for struct __prci_data
  riscv: Add -fPIC option to CFLAGS_vgettimeofday.o

Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux

Pull arm64 fixes from Will Deacon:
"The big fix here is to our vDSO sigreturn trampoline as, after a
  painfully long stint of debugging, it turned out that fixing some of
  our CFI directives in the merge window lit up a bunch of logic in
  libgcc which has been shown to SEGV in some cases during asynchronous
  pthread cancellation.

  It looks like we can fix this by extending the directives to restore
  most of the interrupted register state from the sigcontext, but it's
  risky and hard to test so we opted to remove the CFI directives for
  now and rely on the unwinder fallback path like we used to.

   - Fix unwinding through vDSO sigreturn trampoline

   - Fix build warnings by raising minimum LD version for PAC

   - Whitelist some Kryo Cortex-A55 derivatives for Meltdown and SSB

   - Fix perf register PC reporting for compat tasks

   - Fix 'make clean' warning for arm64 signal selftests

   - Fix ftrace when BTI is compiled in

   - Avoid building the compat vDSO using GCC plugins"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: Add KRYO{3,4}XX silver CPU cores to SSB safelist
  arm64: perf: Report the PC value in REGS_ABI_32 mode
  kselftest: arm64: Remove redundant clean target
  arm64: kpti: Add KRYO{3, 4}XX silver CPU cores to kpti safelist
  arm64: Don't insert a BTI instruction at inner labels
  arm64: vdso: Don't use gcc plugins for building vgettimeofday.c
  arm64: vdso: Only pass --no-eh-frame-hdr when linker supports it
  arm64: Depend on newer binutils when building PAC
  arm64: compat: Remove 32-bit sigreturn code from the vDSO
  arm64: compat: Always use sigpage for sigreturn trampoline
  arm64: compat: Allow 32-bit vdso and sigpage to co-exist
  arm64: vdso: Disable dwarf unwinding through the sigreturn trampoline

Merge tag 'juno-fix-5.8' of git://git./linux/kernel/git/sudeep.holla/linux into arm/fixes

ARMv8 Juno/Vexpress/Fast Models fix for v5.8

Partial revert of some recent fixes to silence DTC warning which broke
clocks on some Vexpress platforms resulting in boot issues.

* tag 'juno-fix-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux:
arm: dts: vexpress: Move mcc node back into motherboard node

Link: https://lore.kernel.org/r/20200609180447.GB5732@bogus
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Merge tag 'acpi-5.8-rc3' of git://git./linux/kernel/git/rafael/linux-pm

Pull ACPI fixes from Rafael Wysocki:
"Prevent bypassing kernel lockdown via the ACPI tables loading
  interface (Jason A. Donenfeld) and fix the handling of an ACPI sysfs
  attribute (Nathan Chancellor)"

* tag 'acpi-5.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: sysfs: Fix pm_profile_attr type
  ACPI: configfs: Disallow loading ACPI tables when locked down

Merge tag 'pm-5.8-rc3' of git://git./linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
"These fix a recent regression that broke suspend-to-idle on some x86
  systems, fix the intel_pstate driver to correctly let the platform
  firmware control CPU performance in some cases and add __init
  annotations to a couple of functions.

  Specifics:

   - Make sure that the _TIF_POLLING_NRFLAG is clear before entering the
     last phase of suspend-to-idle to avoid wakeup issues on some x86
     systems (Chen Yu, Rafael Wysocki).

   - Cover one more case in which the intel_pstate driver should let the
     platform firmware control the CPU frequency and refuse to load
     (Srinivas Pandruvada).

   - Add __init annotations to 2 functions in the power management core
     (Christophe JAILLET)"

* tag 'pm-5.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpuidle: Rearrange s2idle-specific idle state entry code
  PM: sleep: core: mark 2 functions as __init to save some memory
  cpufreq: intel_pstate: Add one more OOB control bit
  PM: s2idle: Clear _TIF_POLLING_NRFLAG before suspend to idle

Merge tag 'iommu-fixes-v5.8-rc2' of git://git./linux/kernel/git/joro/iommu

Pull iommu fixes from Joerg Roedel:
"A couple of Intel VT-d fixes:

   - Make Intel SVM code 64bit only. The code uses pgd_t* and the IOMMU
     only supports long-mode page-table formats, so its broken on 32bit
     anyway.

   - Make sure GFX quirks in for Intel VT-d are not applied to untrusted
     devices. Those devices might gain full memory access otherwise.

   - Identity mapping setup fix.

   - Fix ACS enabling when Intel IOMMU is off and untrusted devices are
     detected.

   - Two smaller fixes for coherency and IO page-table setup"

* tag 'iommu-fixes-v5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/vt-d: Fix misuse of iommu_domain_identity_map()
  iommu/vt-d: Update scalable mode paging structure coherency
  iommu/vt-d: Enable PCI ACS for platform opt in hint
  iommu/vt-d: Don't apply gfx quirks to untrusted devices
  iommu/vt-d: Set U/S bit in first level page table by default
  iommu/vt-d: Make Intel SVM code 64-bit only

Merge tag 'drm-fixes-2020-06-26' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
"Usual rc3 pickup, lots of little fixes all over.

  The core VT registration regression fix is probably the largest,
  otherwise ttm, amdgpu and tegra are the bulk, with some minor driver
  fixes.

  No i915 pull this week which may or may not mean I get 2x of it next
  week, we'll see how it goes.

  core:
   - fix VT registration regression

  ttm:
   - fix two fence leaks

  amdgpu:
   - Fix missed mutex unlock in DC error path
   - Fix firmware leak for sdma5
   - DC bpc property fixes

  amdkfd:
   - Fix memleak in an error path

  radeon:
   - Fix copy paste typo in NI DPM spll validation

  rcar-du:
   - build fix

  tegra:
   - add missing zpos property
   - child driver registeration fix
   - debugfs cleanup fix
   - doc fix

  mcde:
   - reorder fbdev setup

  panel:
   - fix connector type
   - fix orienation for some panels

  sun4i:
   - fix dma/iommu configuration

  uvesafb:
   - respect blank flag"

* tag 'drm-fixes-2020-06-26' of git://anongit.freedesktop.org/drm/drm: (25 commits)
  drm/amd: fix potential memleak in err branch
  drm/amd/display: Fix ineffective setting of max bpc property
  drm/amd/display: Enable output_bpc property on all outputs
  drm/amdgpu: add fw release for sdma v5_0
  drm/fb-helper: Fix vt restore
  drm/radeon: fix fb_div check in ni_init_smc_spll_table()
  drm/amdgpu/display: Unlock mutex on error
  drm/sun4i: mixer: Call of_dma_configure if there's an IOMMU
  drm: panel-orientation-quirks: Use generic orientation-data for Acer S1003
  drm: panel-orientation-quirks: Add quirk for Asus T101HA panel
  video: fbdev: uvesafb: fix "noblank" option handling
  drm/panel-simple: fix connector type for newhaven_nhd_43_480272ef_atxl
  drm/panel-simple: fix connector type for LogicPD Type28 Display
  drm: rcar-du: Fix build error
  drm: mcde: Fix forgotten user of drm->dev_private
  drm: mcde: Fix display initialization problem
  drm/tegra: Add zpos property for cursor planes
  gpu: host1x: Detach driver on unregister
  gpu: host1x: Correct trivial kernel-doc inconsistencies
  drm/tegra: hub: Register child devices
  ...

Merge branch 'akpm' (patches from Andrew)

Merge misx fixes from Andrew Morton:
"31 patches.

  Subsystems affected by this patch series: hotfixes, mm/pagealloc,
  kexec, ocfs2, lib, mm/slab, mm/slab, mm/slub, mm/swap, mm/pagemap,
  mm/vmalloc, mm/memcg, mm/gup, mm/thp, mm/vmscan, x86,
  mm/memory-hotplug, MAINTAINERS"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (31 commits)
  MAINTAINERS: update info for sparse
  mm/memory_hotplug.c: fix false softlockup during pfn range removal
  mm: remove vmalloc_exec
  arm64: use PAGE_KERNEL_ROX directly in alloc_insn_page
  x86/hyperv: allocate the hypercall page with only read and execute bits
  mm/memory: fix IO cost for anonymous page
  mm/swap: fix for "mm: workingset: age nonresident information alongside anonymous pages"
  mm: workingset: age nonresident information alongside anonymous pages
  doc: THP CoW fault no longer allocate THP
  docs: mm/gup: minor documentation update
  mm/memcontrol.c: prevent missed memory.low load tears
  mm/memcontrol.c: add missed css_put()
  mm: memcontrol: handle div0 crash race condition in memory.low
  mm/vmalloc.c: fix a warning while make xmldocs
  media: omap3isp: remove cacheflush.h
  make asm-generic/cacheflush.h more standalone
  mm/debug_vm_pgtable: fix build failure with powerpc 8xx
  mm/memory.c: properly pte_offset_map_lock/unlock in vm_insert_pages()
  mm: fix swap cache node allocation mask
  slub: cure list_slab_objects() from double fix
  ...

Merge tag 'fpga-fixes-for-5.8' of git://git./linux/kernel/git/mdf/linux-fpga into char-misc-next

FPGA Manager fixes for 5.8-rc1

Here is one (late) fix for 5.8-rc1 merge window.

Arnd's change addresses a missing build dependency.

All patches have been reviewed on the mailing list, and have been in the
last few linux-next releases (as part of my fixes branch) without issues.

Signed-off-by: Moritz Fischer <mdf@kernel.org>
* tag 'fpga-fixes-for-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mdf/linux-fpga:
fpga: zynqmp: fix modular build

Merge tag 'misc-habanalabs-fixes-2020-06-24' of git://people.freedesktop.org/~gabbayo/linux into char-misc-linus

Oded writes:

This tag contains the following fixes for kernel 5.8-rc2:

- close security hole in GAUDI command buffer parsing by blocking an
  instruction that might allow user to run command buffer that wasn't
  parsed on a secured engine.

- Fix bug in GAUDI MMU cache invalidation code.

- Rename a function to resolve conflict with a static inline function in
  arch/m68k/include/asm/mcfmmu.h

- Increase watchdog timeout of GAUDI QMAN arbitration H/W to prevent false
  reports on timeouts

- Fix bug of dereferencing NULL pointer when an error occurs during command
  submission

- Increase H/W timer for checking if PDMA engine is IDLE in GAUDI.

* tag 'misc-habanalabs-fixes-2020-06-24' of git://people.freedesktop.org/~gabbayo/linux:
  habanalabs: increase h/w timer when checking idle
  habanalabs: Correct handling when failing to enqueue CB
  habanalabs: increase GAUDI QMAN ARB WDT timeout
  habanalabs: rename mmu_write() to mmu_asid_va_write()
  habanalabs: use PI in MMU cache invalidation
  habanalabs: block scalar load_and_exe on external queue

Merge tag 'fixes-for-v5.8-rc2' of git://git./linux/kernel/git/balbi/usb into usb-linus

Felipe writes:

usb: fixes for v5.8-rc2

A revert of Exynos5422 suspend clock support, it turns out it wasn't
ready to be merged. CDNS3 got a fix for test mode initialization.

Signed-off-by: Felipe Balbi <balbi@kernel.org>
* tag 'fixes-for-v5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb:
  Revert "usb: dwc3: exynos: Add support for Exynos5422 suspend clk"
  usb: gadget: udc: Potential Oops in error handling code
  usb: phy: tegra: Fix unnecessary check in tegra_usb_phy_probe()
  usb: dwc3: pci: Fix reference count leak in dwc3_pci_resume_work
  usb: cdns3: ep0: add spinlock for cdns3_check_new_setup
  usb: cdns3: trace: using correct dir value
  usb: cdns3: ep0: fix the test mode set incorrectly

Merge branches 'pm-cpufreq' and 'pm-cpuidle'

* pm-cpufreq:
  cpufreq: intel_pstate: Add one more OOB control bit

* pm-cpuidle:
  cpuidle: Rearrange s2idle-specific idle state entry code
  PM: s2idle: Clear _TIF_POLLING_NRFLAG before suspend to idle

Merge branch 'acpi-sysfs'

* acpi-sysfs:
ACPI: sysfs: Fix pm_profile_attr type

Merge 5.8-rc2 into usb-linus

Felipe has based his patches on that tag, so update my usb-linus branch
to it as well so that I can pull his patches in here easier.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

kgdb: Avoid suspicious RCU usage warning

At times when I'm using kgdb I see a splat on my console about
suspicious RCU usage.  I managed to come up with a case that could
reproduce this that looked like this:

  WARNING: suspicious RCU usage
  5.7.0-rc4+ #609 Not tainted
  -----------------------------
  kernel/pid.c:395 find_task_by_pid_ns() needs rcu_read_lock() protection!

  other info that might help us debug this:

    rcu_scheduler_active = 2, debug_locks = 1
  3 locks held by swapper/0/1:
   #0: ffffff81b6b8e988 (&dev->mutex){....}-{3:3}, at: __device_attach+0x40/0x13c
   #1: ffffffd01109e9e8 (dbg_master_lock){....}-{2:2}, at: kgdb_cpu_enter+0x20c/0x7ac
   #2: ffffffd01109ea90 (dbg_slave_lock){....}-{2:2}, at: kgdb_cpu_enter+0x3ec/0x7ac

  stack backtrace:
  CPU: 7 PID: 1 Comm: swapper/0 Not tainted 5.7.0-rc4+ #609
  Hardware name: Google Cheza (rev3+) (DT)
  Call trace:
   dump_backtrace+0x0/0x1b8
   show_stack+0x1c/0x24
   dump_stack+0xd4/0x134
   lockdep_rcu_suspicious+0xf0/0x100
   find_task_by_pid_ns+0x5c/0x80
   getthread+0x8c/0xb0
   gdb_serial_stub+0x9d4/0xd04
   kgdb_cpu_enter+0x284/0x7ac
   kgdb_handle_exception+0x174/0x20c
   kgdb_brk_fn+0x24/0x30
   call_break_hook+0x6c/0x7c
   brk_handler+0x20/0x5c
   do_debug_exception+0x1c8/0x22c
   el1_sync_handler+0x3c/0xe4
   el1_sync+0x7c/0x100
   rpmh_rsc_probe+0x38/0x420
   platform_drv_probe+0x94/0xb4
   really_probe+0x134/0x300
   driver_probe_device+0x68/0x100
   __device_attach_driver+0x90/0xa8
   bus_for_each_drv+0x84/0xcc
   __device_attach+0xb4/0x13c
   device_initial_probe+0x18/0x20
   bus_probe_device+0x38/0x98
   device_add+0x38c/0x420

If I understand properly we should just be able to blanket kgdb under
one big RCU read lock and the problem should go away.  We'll add it to
the beast-of-a-function known as kgdb_cpu_enter().

With this I no longer get any splats and things seem to work fine.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20200602154729.v2.1.I70e0d4fd46d5ed2aaf0c98a355e8e1b7a5bb7e4e@changeid
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>

kdb: Switch to use safer dbg_io_ops over console APIs

In kgdb context, calling console handlers aren't safe due to locks used
in those handlers which could in turn lead to a deadlock. Although, using
oops_in_progress increases the chance to bypass locks in most console
handlers but it might not be sufficient enough in case a console uses
more locks (VT/TTY is good example).

Currently when a driver provides both polling I/O and a console then kdb
will output using the console. We can increase robustness by using the
currently active polling I/O driver (which should be lockless) instead
of the corresponding console. For several common cases (e.g. an
embedded system with a single serial port that is used both for console
output and debugger I/O) this will result in no console handler being
used.

In order to achieve this we need to reverse the order of preference to
use dbg_io_ops (uses polling I/O mode) over console APIs. So we just
store "struct console" that represents debugger I/O in dbg_io_ops and
while emitting kdb messages, skip console that matches dbg_io_ops
console in order to avoid duplicate messages. After this change,
"is_console" param becomes redundant and hence removed.

Suggested-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Sumit Garg <sumit.garg@linaro.org>
Link: https://lore.kernel.org/r/1591264879-25920-5-git-send-email-sumit.garg@linaro.org
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>

SUNRPC: Properly set the @subbuf parameter of xdr_buf_subsegment()

@subbuf is an output parameter of xdr_buf_subsegment(). A survey of
call sites shows that @subbuf is always uninitialized before
xdr_buf_segment() is invoked by callers.

There are some execution paths through xdr_buf_subsegment() that do
not set all of the fields in @subbuf, leaving some pointer fields
containing garbage addresses. Subsequent processing of that buffer
then results in a page fault.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

NFSv4 fix CLOSE not waiting for direct IO compeletion

Figuring out the root case for the REMOVE/CLOSE race and
suggesting the solution was done by Neil Brown.

Currently what happens is that direct IO calls hold a reference
on the open context which is decremented as an asynchronous task
in the nfs_direct_complete(). Before reference is decremented,
control is returned to the application which is free to close the
file. When close is being processed, it decrements its reference
on the open_context but since directIO still holds one, it doesn't
sent a close on the wire. It returns control to the application
which is free to do other operations. For instance, it can delete a
file. Direct IO is finally releasing its reference and triggering
an asynchronous close. Which races with the REMOVE. On the server,
REMOVE can be processed before the CLOSE, failing the REMOVE with
EACCES as the file is still opened.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Suggested-by: Neil Brown <neilb@suse.com>
CC: stable@vger.kernel.org
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

pNFS/flexfiles: Fix list corruption if the mirror count changes

If the mirror count changes in the new layout we pick up inside
ff_layout_pg_init_write(), then we can end up adding the
request to the wrong mirror and corrupting the mirror->pg_list.

Fixes: d600ad1f2bdb ("NFS41: pop some layoutget errors to application")
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

nfs: Fix memory leak of export_path

The try_location function is called within a loop by nfs_follow_referral.
try_location calls nfs4_pathname_string to created the export_path.
nfs4_pathname_string allocates the memory. export_path is stored in the
nfs_fs_context/fs_context structure similarly as hostname and source.
But whereas the ctx hostname and source are freed before assignment,
export_path is not. So if there are multiple loops, the new export_path
will overwrite the old without the old being freed.

So call kfree for export_path.

Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

sunrpc: fixed rollback in rpc_gssd_dummy_populate()

__rpc_depopulate(gssd_dentry) was lost on error path

cc: stable@vger.kernel.org
Fixes: commit 4b9a445e3eeb ("sunrpc: create a new dummy pipe for gssd to hold open")
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

Merge branch 'linus' into x86/entry, to resolve conflicts

Conflicts:
arch/x86/kernel/traps.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>

i2c: core: check returned size of emulated smbus block read

If the i2c bus driver ignores the I2C_M_RECV_LEN flag (as some of
them do), it is possible for an I2C_SMBUS_BLOCK_DATA read issued
on some random device to return an arbitrary value in the first
byte (and nothing else). When this happens, i2c_smbus_xfer_emulated()
will happily write past the end of the supplied data buffer, thus
causing Bad Things to happen. To prevent this, check the size
before copying the data block and return an error if it is too large.

Fixes: 209d27c3b167 ("i2c: Emulate SMBus block read over I2C")
Signed-off-by: Mans Rullgard <mans@mansr.com>
[wsa: use better errno]
Signed-off-by: Wolfram Sang <wsa@kernel.org>

MAINTAINERS: update info for sparse

Update the info for sparse. More specifically:

- change W entry to point to sparse.docs.kernel.org

- add Q & B entry (patchwork & bugzilla)

Link: http://lkml.kernel.org/r/20200621144204.53938-1-luc.vanoostenryck@gmail.com
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/memory_hotplug.c: fix false softlockup during pfn range removal

When working with very large nodes, poisoning the struct pages (for which
there will be very many) can take a very long time.  If the system is
using voluntary preemptions, the software watchdog will not be able to
detect forward progress.  This patch addresses this issue by offering to
give up time like __remove_pages() does.  This behavior was introduced in
v5.6 with: commit d33695b16a9f ("mm/memory_hotplug: poison memmap in
remove_pfn_range_from_zone()")

Alternately, init_page_poison could do this cond_resched(), but it seems
to me that the caller of init_page_poison() is what actually knows whether
or not it should relax its own priority.

Based on Dan's notes, I think this is perfectly safe: commit f931ab479dd2
("mm: fix devm_memremap_pages crash, use mem_hotplug_{begin, done}")

Aside from fixing the lockup, it is also a friendlier thing to do on lower
core systems that might wipe out large chunks of hotplug memory (probably
not a very common case).

Fixes this kind of splat:

  watchdog: BUG: soft lockup - CPU#46 stuck for 22s! [daxctl:9922]
  irq event stamp: 138450
  hardirqs last  enabled at (138449): [<ffffffffa1001f26>] trace_hardirqs_on_thunk+0x1a/0x1c
  hardirqs last disabled at (138450): [<ffffffffa1001f42>] trace_hardirqs_off_thunk+0x1a/0x1c
  softirqs last  enabled at (138448): [<ffffffffa1e00347>] __do_softirq+0x347/0x456
  softirqs last disabled at (138443): [<ffffffffa10c416d>] irq_exit+0x7d/0xb0
  CPU: 46 PID: 9922 Comm: daxctl Not tainted 5.7.0-BEN-14238-g373c6049b336 #30
  Hardware name: Intel Corporation PURLEY/PURLEY, BIOS PLYXCRB1.86B.0578.D07.1902280810 02/28/2019
  RIP: 0010:memset_erms+0x9/0x10
  Code: c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 f3 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 <f3> aa 4c 89 c8 c3 90 49 89 fa 40 0f b6 ce 48 b8 01 01 01 01 01 01
  Call Trace:
   remove_pfn_range_from_zone+0x3a/0x380
   memunmap_pages+0x17f/0x280
   release_nodes+0x22a/0x260
   __device_release_driver+0x172/0x220
   device_driver_detach+0x3e/0xa0
   unbind_store+0x113/0x130
   kernfs_fop_write+0xdc/0x1c0
   vfs_write+0xde/0x1d0
   ksys_write+0x58/0xd0
   do_syscall_64+0x5a/0x120
   entry_SYSCALL_64_after_hwframe+0x49/0xb3
  Built 2 zonelists, mobility grouping on.  Total pages: 49050381
  Policy zone: Normal
  Built 3 zonelists, mobility grouping on.  Total pages: 49312525
  Policy zone: Normal

David said: "It really only is an issue for devmem.  Ordinary
hotplugged system memory is not affected (onlined/offlined in memory
block granularity)."

Link: http://lkml.kernel.org/r/20200619231213.1160351-1-ben.widawsky@intel.com
Fixes: commit d33695b16a9f ("mm/memory_hotplug: poison memmap in remove_pfn_range_from_zone()")
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Reported-by: "Scargall, Steve" <steve.scargall@intel.com>
Reported-by: Ben Widawsky <ben.widawsky@intel.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: remove vmalloc_exec

Merge vmalloc_exec into its only caller. Note that for !CONFIG_MMU
__vmalloc_node_range maps to __vmalloc, which directly clears the
__GFP_HIGHMEM added by the vmalloc_exec stub anyway.

Link: http://lkml.kernel.org/r/20200618064307.32739-4-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

arm64: use PAGE_KERNEL_ROX directly in alloc_insn_page

Use PAGE_KERNEL_ROX directly instead of allocating RWX and setting the
page read-only just after the allocation.

Link: http://lkml.kernel.org/r/20200618064307.32739-3-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

x86/hyperv: allocate the hypercall page with only read and execute bits

Patch series "fix a hyperv W^X violation and remove vmalloc_exec"

Dexuan reported a W^X violation due to the fact that the hyper hypercall
page due switching it to be allocated using vmalloc_exec.

The problem is that PAGE_KERNEL_EXEC as used by vmalloc_exec actually
sets writable permissions in the pte.  This series fixes the issue by
switching to the low-level __vmalloc_node_range interface that allows
specifing more detailed permissions instead.  It then also open codes
the other two callers and removes the somewhat confusing vmalloc_exec
interface.

Peter noted that the hyper hypercall page allocation also has another
long standing issue in that it shouldn't use the full vmalloc but just
the module space.  This issue is so far theoretical as the allocation is
done early in the boot process.  I plan to fix it with another bigger
series for 5.9.

This patch (of 3):

Avoid a W^X violation cause by the fact that PAGE_KERNEL_EXEC includes
the writable bit.

For this resurrect the removed PAGE_KERNEL_RX definition, but as
PAGE_KERNEL_ROX to match arm64 and powerpc.

Link: http://lkml.kernel.org/r/20200618064307.32739-2-hch@lst.de
Fixes: 78bb17f76edc ("x86/hyperv: use vmalloc_exec for the hypercall page")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Dexuan Cui <decui@microsoft.com>
Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Wei Liu <wei.liu@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/memory: fix IO cost for anonymous page

With synchronous IO swap device, swap-in is directly handled in fault
code. Since IO cost notation isn't added there, with synchronous IO
swap device, LRU balancing could be wrongly biased. Fix it to count it
in fault code.

Link: http://lkml.kernel.org/r/1592288204-27734-4-git-send-email-iamjoonsoo.kim@lge.com
Fixes: 314b57fb0460001 ("mm: balance LRU lists based on relative thrashing cache sizing")
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/swap: fix for "mm: workingset: age nonresident information alongside anonymous pages"

Non-file-lru page could also be activated in mark_page_accessed() and we
need to count this activation for nonresident_age.

Note that it's better for this patch to be squashed into the patch "mm:
workingset: age nonresident information alongside anonymous pages".

Link: http://lkml.kernel.org/r/1592288204-27734-3-git-send-email-iamjoonsoo.kim@lge.com
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: workingset: age nonresident information alongside anonymous pages

Patch series "fix for "mm: balance LRU lists based on relative
thrashing" patchset"

This patchset fixes some problems of the patchset, "mm: balance LRU
lists based on relative thrashing", which is now merged on the mainline.

Patch "mm: workingset: let cache workingset challenge anon fix" is the
result of discussion with Johannes.  See following link.

  http://lkml.kernel.org/r/20200520232525.798933-6-hannes@cmpxchg.org

And, the other two are minor things which are found when I try to rebase
my patchset.

This patch (of 3):

After ("mm: workingset: let cache workingset challenge anon fix"), we
compare refault distances to active_file + anon.  But age of the
non-resident information is only driven by the file LRU.  As a result,
we may overestimate the recency of any incoming refaults and activate
them too eagerly, causing unnecessary LRU churn in certain situations.

Make anon aging drive nonresident age as well to address that.

Link: http://lkml.kernel.org/r/1592288204-27734-1-git-send-email-iamjoonsoo.kim@lge.com
Link: http://lkml.kernel.org/r/1592288204-27734-2-git-send-email-iamjoonsoo.kim@lge.com
Fixes: 34e58cac6d8f2a ("mm: workingset: let cache workingset challenge anon")
Reported-by: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

doc: THP CoW fault no longer allocate THP

Since commit 3917c80280c9 ("thp: change CoW semantics for anon-THP"),
THP CoW page fault is rewritten. Now it just splits pmd then fallback
to base page fault, it doesn't try to allocate THP anymore. So it is no
longer counted in THP_FAULT_ALLOC.

Remove the obsolete statement in documentation about THP CoW allocation
to avoid confusion.

Link: http://lkml.kernel.org/r/1592424895-5421-1-git-send-email-yang.shi@linux.alibaba.com
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

docs: mm/gup: minor documentation update

Now there are 5 cases. Updated the same.

Link: http://lkml.kernel.org/r/1592422023-7401-1-git-send-email-jrdr.linux@gmail.com
Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jan Kara <jack@suse.cz>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/memcontrol.c: prevent missed memory.low load tears

Looks like one of these got missed when massaging in f86b810c2610 ("mm,
memcg: prevent memory.low load/store tearing") with other linux-mm
changes.

Link: http://lkml.kernel.org/r/20200612174437.GA391453@chrisdown.name
Signed-off-by: Chris Down <chris@chrisdown.name>
Reported-by: Michal Koutny <mkoutny@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/memcontrol.c: add missed css_put()

We should put the css reference when memory allocation failed.

Link: http://lkml.kernel.org/r/20200614122653.98829-1-songmuchun@bytedance.com
Fixes: f0a3a24b532d ("mm: memcg/slab: rework non-root kmem_cache lifecycle management")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Roman Gushchin <guro@fb.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Qian Cai <cai@lca.pw>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: memcontrol: handle div0 crash race condition in memory.low

Tejun reports seeing rare div0 crashes in memory.low stress testing:

  RIP: 0010:mem_cgroup_calculate_protection+0xed/0x150
  Code: 0f 46 d1 4c 39 d8 72 57 f6 05 16 d6 42 01 40 74 1f 4c 39 d8 76 1a 4c 39 d1 76 15 4c 29 d1 4c 29 d8 4d 29 d9 31 d2 48 0f af c1 <49> f7 f1 49 01 c2 4c 89 96 38 01 00 00 5d c3 48 0f af c7 31 d2 49
  RSP: 0018:ffffa14e01d6fcd0 EFLAGS: 00010246
  RAX: 000000000243e384 RBX: 0000000000000000 RCX: 0000000000008f4b
  RDX: 0000000000000000 RSI: ffff8b89bee84000 RDI: 0000000000000000
  RBP: ffffa14e01d6fcd0 R08: ffff8b89ca7d40f8 R09: 0000000000000000
  R10: 0000000000000000 R11: 00000000006422f7 R12: 0000000000000000
  R13: ffff8b89d9617000 R14: ffff8b89bee84000 R15: ffffa14e01d6fdb8
  FS:  0000000000000000(0000) GS:ffff8b8a1f1c0000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f93b1fc175b CR3: 000000016100a000 CR4: 0000000000340ea0
  Call Trace:
    shrink_node+0x1e5/0x6c0
    balance_pgdat+0x32d/0x5f0
    kswapd+0x1d7/0x3d0
    kthread+0x11c/0x160
    ret_from_fork+0x1f/0x30

This happens when parent_usage == siblings_protected.

We check that usage is bigger than protected, which should imply
parent_usage being bigger than siblings_protected.  However, we don't
read (or even update) these values atomically, and they can be out of
sync as the memory state changes under us.  A bit of fluctuation around
the target protection isn't a big deal, but we need to handle the div0
case.

Check the parent state explicitly to make sure we have a reasonable
positive value for the divisor.

Link: http://lkml.kernel.org/r/20200615140658.601684-1-hannes@cmpxchg.org
Fixes: 8a931f801340 ("mm: memcontrol: recursive memory.low protection")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Tejun Heo <tj@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Chris Down <chris@chrisdown.name>
Cc: Roman Gushchin <guro@fb.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/vmalloc.c: fix a warning while make xmldocs

This patch fixes following warning while "make xmldocs"

mm/vmalloc.c:1877: warning: Excess function parameter 'prot' description in 'vm_map_ram'

This warning started since commit d4efd79a81ab ("mm: remove the prot
argument from vm_map_ram").

Link: http://lkml.kernel.org/r/20200622152850.140871-1-standby24x7@gmail.com
Fixes: d4efd79a81ab ("mm: remove the prot argument from vm_map_ram")
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

media: omap3isp: remove cacheflush.h

After mm.h was removed from the asm-generic version of cacheflush.h,
s390 allyesconfig shows several warnings of the following nature:

  In file included from arch/s390/include/generated/asm/cacheflush.h:1,
                   from drivers/media/platform/omap3isp/isp.c:42:
  include/asm-generic/cacheflush.h:16:42: warning: 'struct mm_struct' declared inside parameter list will not be visible outside of this definition or declaration

As Geert and Laurent point out, this driver does not need this header in
the two files that include it.  Remove it so there are no warnings.

Link: http://lkml.kernel.org/r/20200622234740.72825-2-natechancellor@gmail.com
Fixes: e0cf615d725c ("asm-generic: don't include <linux/mm.h> in cacheflush.h")
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Suggested-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

make asm-generic/cacheflush.h more standalone

Some s390 builds get these warnings:

  include/asm-generic/cacheflush.h:16:42: warning: 'struct mm_struct' declared inside parameter list will not be visible outside of this definition or declaration
  include/asm-generic/cacheflush.h:22:46: warning: 'struct mm_struct' declared inside parameter list will not be visible outside of this definition or declaration
  include/asm-generic/cacheflush.h:28:45: warning: 'struct vm_area_struct' declared inside parameter list will not be visible outside of this definition or declaration
  include/asm-generic/cacheflush.h:36:44: warning: 'struct vm_area_struct' declared inside parameter list will not be visible outside of this definition or declaration
  include/asm-generic/cacheflush.h:44:45: warning: 'struct page' declared inside parameter list will not be visible outside of this definition or declaration
  include/asm-generic/cacheflush.h:52:50: warning: 'struct address_space' declared inside parameter list will not be visible outside of this definition or declaration
  include/asm-generic/cacheflush.h:58:52: warning: 'struct address_space' declared inside parameter list will not be visible outside of this definition or declaration
  include/asm-generic/cacheflush.h:75:17: warning: 'struct page' declared inside parameter list will not be visible outside of this definition or declaration
  include/asm-generic/cacheflush.h:74:45: warning: 'struct vm_area_struct' declared inside parameter list will not be visible outside of this definition or declaration
  include/asm-generic/cacheflush.h:82:16: warning: 'struct page' declared inside parameter list will not be visible outside of this definition or declaration
  include/asm-generic/cacheflush.h:81:50: warning: 'struct vm_area_struct' declared inside parameter list will not be visible outside of this definition or declaration

Forward declare the named structs to get rid of these.

Link: http://lkml.kernel.org/r/20200623135714.4dae4b8a@canb.auug.org.au
Fixes: e0cf615d725c ("asm-generic: don't include <linux/mm.h> in cacheflush.h")
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/debug_vm_pgtable: fix build failure with powerpc 8xx

Since commit 9e343b467c70 ("READ_ONCE: Enforce atomicity for
{READ,WRITE}_ONCE() memory accesses"), READ_ONCE() cannot be used
anymore to read complex page table entries.

This leads to:

      CC      mm/debug_vm_pgtable.o
    In file included from ./include/asm-generic/bug.h:5,
                     from ./arch/powerpc/include/asm/bug.h:109,
                     from ./include/linux/bug.h:5,
                     from ./include/linux/mmdebug.h:5,
                     from ./include/linux/gfp.h:5,
                     from mm/debug_vm_pgtable.c:13:
    In function 'pte_clear_tests',
        inlined from 'debug_vm_pgtable' at mm/debug_vm_pgtable.c:363:2:
    ./include/linux/compiler.h:392:38: error: Unsupported access size for {READ,WRITE}_ONCE().
    mm/debug_vm_pgtable.c:249:14: note: in expansion of macro 'READ_ONCE'
      249 |  pte_t pte = READ_ONCE(*ptep);
          |              ^~~~~~~~~
    make[2]: *** [mm/debug_vm_pgtable.o] Error 1

Fix it by using the recently added ptep_get() helper.

Link: http://lkml.kernel.org/r/6ca8c972e6c920dc4ae0d4affbed9703afa4d010.1592490570.git.christophe.leroy@csgroup.eu
Fixes: 9e343b467c70 ("READ_ONCE: Enforce atomicity for {READ,WRITE}_ONCE() memory accesses")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Will Deacon <will@kernel.org>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/memory.c: properly pte_offset_map_lock/unlock in vm_insert_pages()

Calls to pte_offset_map() in vm_insert_pages() are erroneously not
matched with a call to pte_unmap(). This would cause problems on
architectures where that is not a no-op.

This patch does away with the non-traditional locking in the existing
code, and instead uses pte_offset_map_lock/unlock() as usual,
incrementing PTE as necessary. The PTE pointer is kept within bounds
since we clamp it with PTRS_PER_PTE.

Link: http://lkml.kernel.org/r/20200618220446.20284-1-arjunroy.kdev@gmail.com
Fixes: 8cd3984d81d5 ("mm/memory.c: add vm_insert_pages()")
Signed-off-by: Arjun Roy <arjunroy@google.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: fix swap cache node allocation mask

Chris Murphy reports that a slightly overcommitted load, testing swap
and zram along with i915, splats and keeps on splatting, when it had
better fail less noisily:

  gnome-shell: page allocation failure: order:0,
  mode:0x400d0(__GFP_IO|__GFP_FS|__GFP_COMP|__GFP_RECLAIMABLE),
  nodemask=(null),cpuset=/,mems_allowed=0
  CPU: 2 PID: 1155 Comm: gnome-shell Not tainted 5.7.0-1.fc33.x86_64 #1
  Call Trace:
    dump_stack+0x64/0x88
    warn_alloc.cold+0x75/0xd9
    __alloc_pages_slowpath.constprop.0+0xcfa/0xd30
    __alloc_pages_nodemask+0x2df/0x320
    alloc_slab_page+0x195/0x310
    allocate_slab+0x3c5/0x440
    ___slab_alloc+0x40c/0x5f0
    __slab_alloc+0x1c/0x30
    kmem_cache_alloc+0x20e/0x220
    xas_nomem+0x28/0x70
    add_to_swap_cache+0x321/0x400
    __read_swap_cache_async+0x105/0x240
    swap_cluster_readahead+0x22c/0x2e0
    shmem_swapin+0x8e/0xc0
    shmem_swapin_page+0x196/0x740
    shmem_getpage_gfp+0x3a2/0xa60
    shmem_read_mapping_page_gfp+0x32/0x60
    shmem_get_pages+0x155/0x5e0 [i915]
    __i915_gem_object_get_pages+0x68/0xa0 [i915]
    i915_vma_pin+0x3fe/0x6c0 [i915]
    eb_add_vma+0x10b/0x2c0 [i915]
    i915_gem_do_execbuffer+0x704/0x3430 [i915]
    i915_gem_execbuffer2_ioctl+0x1ea/0x3e0 [i915]
    drm_ioctl_kernel+0x86/0xd0 [drm]
    drm_ioctl+0x206/0x390 [drm]
    ksys_ioctl+0x82/0xc0
    __x64_sys_ioctl+0x16/0x20
    do_syscall_64+0x5b/0xf0
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

Reported on 5.7, but it goes back really to 3.1: when
shmem_read_mapping_page_gfp() was implemented for use by i915, and
allowed for __GFP_NORETRY and __GFP_NOWARN flags in most places, but
missed swapin's "& GFP_KERNEL" mask for page tree node allocation in
__read_swap_cache_async() - that was to mask off HIGHUSER_MOVABLE bits
from what page cache uses, but GFP_RECLAIM_MASK is now what's needed.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=208085
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2006151330070.11064@eggly.anvils
Fixes: 68da9f055755 ("tmpfs: pass gfp to shmem_getpage_gfp")
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reported-by: Chris Murphy <lists@colorremedies.com>
Analyzed-by: Vlastimil Babka <vbabka@suse.cz>
Analyzed-by: Matthew Wilcox <willy@infradead.org>
Tested-by: Chris Murphy <lists@colorremedies.com>
Cc: <stable@vger.kernel.org> [3.1+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

slub: cure list_slab_objects() from double fix

According to Christopher Lameter two fixes have been merged for the same
problem. As far as I can tell, the code does not acquire the list_lock
and invoke kmalloc(). list_slab_objects() misses an unlock (the
counterpart to get_map()) and the memory allocated in free_partial()
isn't used.

Revert the mentioned commit.

Link: http://lkml.kernel.org/r/20200618201234.795692-1-bigeasy@linutronix.de
Fixes: aa456c7aebb14 ("slub: remove kmalloc under list_lock from list_slab_objects() V2")
Link: https://lkml.kernel.org/r/alpine.DEB.2.22.394.2006181501480.12014@www.lameter.com
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>