git.monstr.eu Git - linux-2.6-microblaze.git/log

vfio iommu type1: Fix memory leak in vfio_iommu_type1_pin_pages

pfn is not added to pfn_list when vfio_add_to_pfn_list fails.
vfio_unpin_page_external will exit directly without calling
vfio_iova_put_vfio_pfn. This will lead to a memory leak.

Fixes: a54eb55045ae ("vfio iommu type1: Add support for mediated devices")
Signed-off-by: Xiaoyang Xu <xuxiaoyang2@huawei.com>
[aw: simplified logic, add Fixes]
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio/pci: Clear token on bypass registration failure

The eventfd context is used as our irqbypass token, therefore if an
eventfd is re-used, our token is the same. The irqbypass code will
return an -EBUSY in this case, but we'll still attempt to unregister
the producer, where if that duplicate token still exists, results in
removing the wrong object. Clear the token of failed producers so
that they harmlessly fall out when unregistered.

Fixes: 6d7425f109d2 ("vfio: Register/unregister irq_bypass_producer")
Reported-by: guomin chen <guomin_chen@sina.com>
Tested-by: guomin chen <guomin_chen@sina.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio/fsl-mc: fix the return of the uninitialized variable ret

The vfio_fsl_mc_reflck_attach function may return, on success path,
an uninitialized variable. Fix the problem by initializing the return
variable to 0.

Addresses-Coverity: ("Uninitialized scalar variable")
Fixes: f2ba7e8c947b ("vfio/fsl-mc: Added lock support in preparation for interrupt handling")
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio/fsl-mc: Fix the dead code in vfio_fsl_mc_set_irq_trigger

Static analysis discovered that some code in vfio_fsl_mc_set_irq_trigger
is dead code. Fixed the code by changing the conditions order.

Fixes: cc0ee20bd969 ("vfio/fsl-mc: trigger an interrupt via eventfd")
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio/fsl-mc: Fixed vfio-fsl-mc driver compilation on 32 bit

The FSL_MC_BUS on which the VFIO-FSL-MC driver is dependent on
can be compiled on other architectures as well (not only ARM64)
including 32 bit architectures.
Include linux/io-64-nonatomic-hi-lo.h to make writeq/readq used
in the driver available on 32bit platforms.

Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

Merge branches 'v5.10/vfio/fsl-mc-v6' and 'v5.10/vfio/zpci-info-v3' into v5.10/vfio/next

MAINTAINERS: Add entry for s390 vfio-pci

Add myself to cover s390-specific items related to vfio-pci.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio-pci/zdev: Add zPCI capabilities to VFIO_DEVICE_GET_INFO

Define a new configuration entry VFIO_PCI_ZDEV for VFIO/PCI.

When this s390-only feature is configured we add capabilities to the
VFIO_DEVICE_GET_INFO ioctl that describe features of the associated
zPCI device and its underlying hardware.

This patch is based on work previously done by Pierre Morel.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio/fsl-mc: Add support for device reset

Currently only resetting the DPRC container is supported which
will reset all the objects inside it. Resetting individual
objects is possible from the userspace by issueing commands
towards MC firmware.

Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio/fsl-mc: Add read/write support for fsl-mc devices

The software uses a memory-mapped I/O command interface (MC portals) to
communicate with the MC hardware. This command interface is used to
discover, enumerate, configure and remove DPAA2 objects. The DPAA2
objects use MSIs, so the command interface needs to be emulated
such that the correct MSI is configured in the hardware (the guest
has the virtual MSIs).

This patch is adding read/write support for fsl-mc devices. The mc
commands are emulated by the userspace. The host is just passing
the correct command to the hardware.

Also the current patch limits userspace to write complete
64byte command once and read 64byte response by one ioctl.

Signed-off-by: Bharat Bhushan <Bharat.Bhushan@nxp.com>
Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio/fsl-mc: trigger an interrupt via eventfd

This patch allows to set an eventfd for fsl-mc device interrupts
and also to trigger the interrupt eventfd from userspace for testing.

All fsl-mc device interrupts are MSIs. The MSIs are allocated from
the MSI domain only once per DPRC and used by all the DPAA2 objects.
The interrupts are managed by the DPRC in a pool of interrupts. Each
device requests interrupts from this pool. The pool is allocated
when the first virtual device is setting the interrupts.
The pool of interrupts is protected by a lock.

The DPRC has an interrupt of its own which indicates if the DPRC
contents have changed. However, currently, the contents of a DPRC
assigned to the guest cannot be changed at runtime, so this interrupt
is not configured.

Signed-off-by: Bharat Bhushan <Bharat.Bhushan@nxp.com>
Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio/fsl-mc: Add irq infrastructure for fsl-mc devices

This patch adds the skeleton for interrupt support
for fsl-mc devices. The interrupts are not yet functional,
the functionality will be added by subsequent patches.

Signed-off-by: Bharat Bhushan <Bharat.Bhushan@nxp.com>
Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio/fsl-mc: Added lock support in preparation for interrupt handling

Only the DPRC object allocates interrupts from the MSI
interrupt domain. The interrupts are managed by the DPRC in
a pool of interrupts. The access to this pool of interrupts
has to be protected with a lock.
This patch extends the current lock implementation to have a
lock per DPRC.

Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio/fsl-mc: Allow userspace to MMAP fsl-mc device MMIO regions

Allow userspace to mmap device regions for direct access of
fsl-mc devices.

Signed-off-by: Bharat Bhushan <Bharat.Bhushan@nxp.com>
Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio/fsl-mc: Implement VFIO_DEVICE_GET_REGION_INFO ioctl call

Expose to userspace information about the memory regions.

Signed-off-by: Bharat Bhushan <Bharat.Bhushan@nxp.com>
Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio/fsl-mc: Implement VFIO_DEVICE_GET_INFO ioctl

Allow userspace to get fsl-mc device info (number of regions
and irqs).

Signed-off-by: Bharat Bhushan <Bharat.Bhushan@nxp.com>
Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio/fsl-mc: Scan DPRC objects on vfio-fsl-mc driver bind

The DPRC (Data Path Resource Container) device is a bus device and has
child devices attached to it. When the vfio-fsl-mc driver is probed
the DPRC is scanned and the child devices discovered and initialized.

Signed-off-by: Bharat Bhushan <Bharat.Bhushan@nxp.com>
Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio: Introduce capability definitions for VFIO_DEVICE_GET_INFO

Allow the VFIO_DEVICE_GET_INFO ioctl to include a capability chain.
Add a flag indicating capability chain support, and introduce the
definitions for the first set of capabilities which are specified to
s390 zPCI devices.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

s390/pci: track whether util_str is valid in the zpci_dev

We'll need to keep track of whether or not the byte string in util_str is
valid and thus needs to be passed to a vfio-pci passthrough device.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Acked-by: Niklas Schnelle <schnelle@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

s390/pci: stash version in the zpci_dev

In preparation for passing the info on to vfio-pci devices, stash the
supported PCI version for the target device in the zpci_dev.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Acked-by: Niklas Schnelle <schnelle@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio/fsl-mc: Add VFIO framework skeleton for fsl-mc devices

DPAA2 (Data Path Acceleration Architecture) consists in
mechanisms for processing Ethernet packets, queue management,
accelerators, etc.

The Management Complex (mc) is a hardware entity that manages the DPAA2
hardware resources. It provides an object-based abstraction for software
drivers to use the DPAA2 hardware. The MC mediates operations such as
create, discover, destroy of DPAA2 objects.
The MC provides memory-mapped I/O command interfaces (MC portals) which
DPAA2 software drivers use to operate on DPAA2 objects.

A DPRC is a container object that holds other types of DPAA2 objects.
Each object in the DPRC is a Linux device and bound to a driver.
The MC-bus driver is a platform driver (different from PCI or platform
bus). The DPRC driver does runtime management of a bus instance. It
performs the initial scan of the DPRC and handles changes in the DPRC
configuration (adding/removing objects).

All objects inside a container share the same hardware isolation
context, meaning that only an entire DPRC can be assigned to
a virtual machine.
When a container is assigned to a virtual machine, all the objects
within that container are assigned to that virtual machine.
The DPRC container assigned to the virtual machine is not allowed
to change contents (add/remove objects) by the guest. The restriction
is set by the host and enforced by the mc hardware.

The DPAA2 objects can be directly assigned to the guest. However
the MC portals (the memory mapped command interface to the MC) need
to be emulated because there are commands that configure the
interrupts and the isolation IDs which are virtual in the guest.

Example:
echo vfio-fsl-mc > /sys/bus/fsl-mc/devices/dprc.2/driver_override
echo dprc.2 > /sys/bus/fsl-mc/drivers/vfio-fsl-mc/bind

The dprc.2 is bound to the VFIO driver and all the objects within
dprc.2 are going to be bound to the VFIO driver.

This patch adds the infrastructure for VFIO support for fsl-mc
devices. Subsequent patches will add support for binding and secure
assigning these devices using VFIO.

More details about the DPAA2 objects can be found here:
Documentation/networking/device_drivers/freescale/dpaa2/overview.rst

Signed-off-by: Bharat Bhushan <Bharat.Bhushan@nxp.com>
Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

Merge branches 'v5.10/vfio/bardirty', 'v5.10/vfio/dma_avail', 'v5.10/vfio/misc', 'v5.10/vfio/no-cmd-mem' and 'v5.10/vfio/yan_zhao_fixes' into v5.10/vfio/next

vfio/type1: fix dirty bitmap calculation in vfio_dma_rw

The count of dirtied pages is not only determined by count of copied
pages, but also by the start offset.

e.g. if offset = PAGE_SIZE - 1, and *copied=2, the dirty pages count
is 2, instead of 1 or 0.

Fixes: d6a4c185660c ("vfio iommu: Implementation of ioctl for dirty pages tracking")
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio: fix a missed vfio group put in vfio_pin_pages

When error occurs, need to put vfio group after a successful get.

Fixes: 95fc87b44104 ("vfio: Selective dirty page tracking if IOMMU backed device pins pages")
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio/pci: Decouple PCI_COMMAND_MEMORY bit checks from is_virtfn

While it is true that devices with is_virtfn=1 will have a Memory Space
Enable bit that is hard-wired to 0, this is not the only case where we
see this behavior -- For example some bare-metal hypervisors lack
Memory Space Enable bit emulation for devices not setting is_virtfn
(s390). Fix this by instead checking for the newly-added
no_command_memory bit which directly denotes the need for
PCI_COMMAND_MEMORY emulation in vfio.

Fixes: abafbc551fdd ("vfio-pci: Invalidate mmaps and block MMIO access on disabled memory")
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

s390/pci: Mark all VFs as not implementing PCI_COMMAND_MEMORY

For s390 we can have VFs that are passed-through without the associated
PF. Firmware provides an emulation layer to allow these devices to
operate independently, but is missing emulation of the Memory Space
Enable bit. For these as well as linked VFs, set no_command_memory
which specifies these devices do not implement PCI_COMMAND_MEMORY.

Fixes: abafbc551fdd ("vfio-pci: Invalidate mmaps and block MMIO access on disabled memory")
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio iommu: Add dma available capability

Commit 492855939bdb ("vfio/type1: Limit DMA mappings per container")
added the ability to limit the number of memory backed DMA mappings.
However on s390x, when lazy mapping is in use, we use a very large
number of concurrent mappings. Let's provide the current allowable
number of DMA mappings to userspace via the IOMMU info chain so that
userspace can take appropriate mitigation.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio: add a singleton check for vfio_group_pin_pages

Page pinning is used both to translate and pin device mappings for DMA
purpose, as well as to indicate to the IOMMU backend to limit the dirty
page scope to those pages that have been pinned, in the case of an IOMMU
backed device.
To support this, the vfio_pin_pages() interface limits itself to only
singleton groups such that the IOMMU backend can consider dirty page
scope only at the group level. Implement the same requirement for the
vfio_group_pin_pages() interface.

Fixes: 95fc87b44104 ("vfio: Selective dirty page tracking if IOMMU backed device pins pages")
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

PCI/IOV: Mark VFs as not implementing PCI_COMMAND_MEMORY

For VFs, the Memory Space Enable bit in the Command Register is
hard-wired to 0.

Add a new bit to signify devices where the Command Register Memory
Space Enable bit does not control the device's response to MMIO
accesses.

Fixes: abafbc551fdd ("vfio-pci: Invalidate mmaps and block MMIO access on disabled memory")
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio/pci: Don't regenerate vconfig for all BARs if !bardirty

Now we regenerate vconfig for all the BARs via vfio_bar_fixup(), every
time any offset of any of them are read. Though BARs aren't re-read
regularly, the regeneration can be avoided if no BARs had been written
since they were last read, in which case vdev->bardirty is false.

Let's return immediately in vfio_bar_fixup() if bardirty is false.

Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio/pci: Remove redundant declaration of vfio_pci_driver

It was added by commit 137e5531351d ("vfio/pci: Add sriov_configure
support") but duplicates a forward declaration earlier in the file.

Remove it.

Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio: Fix typo of the device_state

A typo fix ("_RUNNNG" => "_RUNNING") in comment block of the uapi header.

Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

Linux 5.9-rc6

Merge tag 'core_urgent_for_v5.9_rc6' of git://git./linux/kernel/git/tip/tip

Pull syscall tracing fix from Borislav Petkov:
"Fix the seccomp syscall rewriting so that trace and audit see the
rewritten syscall number, from Kees Cook"

* tag 'core_urgent_for_v5.9_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
core/entry: Report syscall correctly for trace and audit

Merge tag 'objtool_urgent_for_v5.9_rc6' of git://git./linux/kernel/git/tip/tip

Pull objtool fix from Borislav Petkov:
"Fix noreturn detection for ignored sibling functions (Josh Poimboeuf)"

* tag 'objtool_urgent_for_v5.9_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Fix noreturn detection for ignored functions

Merge tag 'locking_urgent_for_v5.9_rc6' of git://git./linux/kernel/git/tip/tip

Pull locking fixes from Borislav Petkov:
"Two fixes from the locking/urgent pile:

   - Fix lockdep's detection of "USED" <- "IN-NMI" inversions (Peter
     Zijlstra)

   - Make percpu-rwsem operations on the semaphore's ->read_count
     IRQ-safe because it can be used in an IRQ context (Hou Tao)"

* tag 'locking_urgent_for_v5.9_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/percpu-rwsem: Use this_cpu_{inc,dec}() for read_count
  locking/lockdep: Fix "USED" <- "IN-NMI" inversions

Merge tag 'efi-urgent-for-v5.9-rc5' of git://git./linux/kernel/git/tip/tip

Pull EFI fix from Borislav Petkov:
"Ensure that the EFI bootloader control module only probes successfully
on systems that support the EFI SetVariable runtime service"

[ Tag and commit from Ard Biesheuvel, forwarded by Borislav ]

* tag 'efi-urgent-for-v5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi: efibc: check for efivars write capability

Merge tag 'x86_urgent_for_v5.9_rc6' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

- A defconfig fix (Daniel Díaz)

- Disable relocation relaxation for the compressed kernel when not
   built as -pie as in that case kernels built with clang and linked
   with LLD fail to boot due to the linker optimizing some instructions
   in non-PIE form; the gory details in the commit message (Arvind
   Sankar)

- A fix for the "bad bp value" warning issued by the frame-pointer
   unwinder (Josh Poimboeuf)

* tag 'x86_urgent_for_v5.9_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/unwind/fp: Fix FP unwinding in ret_from_fork
  x86/boot/compressed: Disable relocation relaxation
  x86/defconfigs: Explicitly unset CONFIG_64BIT in i386_defconfig

Merge tag 'libnvdimm-fixes-5.9-rc6' of git://git./linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm fixes from Dan Williams:
"A handful of fixes to address a string of mistakes in the mechanism
  for device-mapper to determine if its component devices are dax
  capable.

   - Fix an original bug in device-mapper table reference counting when
     interrogating dax capability in the component device. This bug was
     hidden by the following bug.

   - Fix device-mapper to use the proper helper (dax_supported() instead
     of the leaf helper generic_fsdax_supported()) to determine dax
     operation of a stacked block device configuration. The original
     implementation is only valid for one level of dax-capable block
     device stacking. This bug was discovered while fixing the below
     regression.

   - Fix an infinite recursion regression introduced by broken attempts
     to quiet the generic_fsdax_supported() path and make it bail out
     before logging "dax capability not found" errors"

* tag 'libnvdimm-fixes-5.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  dax: Fix stack overflow when mounting fsdax pmem device
  dm: Call proper helper to determine dax support
  dm/dax: Fix table reference counts

Merge tag 'riscv-for-linus-5.9-rc6' of git://git./linux/kernel/git/riscv/linux

Pull RISC-V fixes from Palmer Dabbelt:

- A fix for a lockdep issue to avoid an asserting triggering during
   early boot. There shouldn't be any incorrect behavior as the system
   isn't concurrent at the time.

- The addition of a missing fence when installing early fixmap
   mappings.

- A corretion to the K210 device tree's interrupt map.

- A fix for M-mode timer handling on the K210.

* tag 'riscv-for-linus-5.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  RISC-V: Resurrect the MMIO timer implementation for M-mode systems
  riscv: Fix Kendryte K210 device tree
  riscv: Add sfence.vma after early page table changes
  RISC-V: Take text_mutex in ftrace_init_nop()

Merge tag 'usb-5.9-rc6' of git://git./linux/kernel/git/gregkh/usb

Pull USB/Thunderbolt fixes from Greg KH:
"Here are some small USB and one Thunderbolt driver fixes.

  Nothing major at all, just some fixes for reported issues, and a quirk
  addition:

   - typec fixes

   - UAS disconnect fix

   - usblp race fix

   - ehci-hcd modversions build fix

   - ignore wakeup quirk table addition

   - thunderbolt DROM read fix

  All of these have been in linux-next with no reported issues"

* tag 'usb-5.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usblp: fix race between disconnect() and read()
  ehci-hcd: Move include to keep CRC stable
  usb: typec: intel_pmc_mux: Handle SCU IPC error conditions
  USB: quirks: Add USB_QUIRK_IGNORE_REMOTE_WAKEUP quirk for BYD zhaoxin notebook
  USB: UAS: fix disconnect by unplugging a hub
  usb: typec: ucsi: Prevent mode overrun
  usb: typec: ucsi: acpi: Increase command completion timeout value
  thunderbolt: Retry DROM read once if parsing fails

Merge tag 'tty-5.9-rc6' of git://git./linux/kernel/git/gregkh/tty

Pull tty/serial/fbcon fixes from Greg KH:
"Here are some small tty/serial and one more fbcon fix.

  They include:

   - serial core locking regression fixes

   - new device ids for 8250_pci driver

   - fbcon fix for syzbot found issue

  All have been in linux-next with no reported issues"

* tag 'tty-5.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  fbcon: Fix user font detection test at fbcon_resize().
  serial: 8250_pci: Add Realtek 816a and 816b
  serial: core: fix console port-lock regression
  serial: core: fix port-lock initialisation

Merge tag 'edac_urgent_for_v5.9_rc6' of git://git./linux/kernel/git/ras/ras

Pull EDAC fixes from Borislav Petkov:
"Two fixes for resulting from CONFIG_DEBUG_TEST_DRIVER_REMOVE=y
  experiments:

   - complete a previous fix to reset a local structure containing
     scanned system data properly so that the driver rescans, as it
     should, on a second load.

   - address a refcount underflow due to not paying attention to the
     driver whitelest on unregister"

* tag 'edac_urgent_for_v5.9_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  EDAC/ghes: Check whether the driver is on the safe list correctly
  EDAC/ghes: Clear scanned data on unload

Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input

Pull input fixes from Dmitry Torokhov:
"Just a couple of driver quirks"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: trackpoint - add new trackpoint variant IDs
Input: i8042 - add Entroware Proteus EL07R4 to nomux and reset lists

mm: fix wake_page_function() comment typos

Sedat Dilek pointed out some silly comment typo issues.

Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'kbuild-fixes-v5.9-3' of git://git./linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild fixes from Masahiro Yamada:
"Fix qconf warnings and revive help message"

* tag 'kbuild-fixes-v5.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kconfig: qconf: revive help message in the info view
  kconfig: qconf: fix incomplete type 'struct gstr' warning
  kconfig: qconf: use delete[] instead of delete to free array (again)

dax: Fix stack overflow when mounting fsdax pmem device

When mounting fsdax pmem device, commit 6180bb446ab6 ("dax: fix
detection of dax support for non-persistent memory block devices")
introduces the stack overflow [1][2]. Here is the call path for
mounting ext4 file system:
  ext4_fill_super
    bdev_dax_supported
      __bdev_dax_supported
        dax_supported
          generic_fsdax_supported
            __generic_fsdax_supported
              bdev_dax_supported

The call path leads to the infinite calling loop, so we cannot
call bdev_dax_supported() in __generic_fsdax_supported(). The sanity
checking of the variable 'dax_dev' is moved prior to the two
bdev_dax_pgoff() checks [3][4].

[1] https://lore.kernel.org/linux-nvdimm/1420999447.1004543.1600055488770.JavaMail.zimbra@redhat.com/
[2] https://lore.kernel.org/linux-nvdimm/alpine.LRH.2.02.2009141131220.30651@file01.intranet.prod.int.rdu2.redhat.com/
[3] https://lore.kernel.org/linux-nvdimm/CA+RJvhxBHriCuJhm-D8NvJRe3h2MLM+ZMFgjeJjrRPerMRLvdg@mail.gmail.com/
[4] https://lore.kernel.org/linux-nvdimm/20200903160608.GU878166@iweiny-DESK2.sc.intel.com/

Fixes: 6180bb446ab6 ("dax: fix detection of dax support for non-persistent memory block devices")
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Adrian Huang <ahuang12@lenovo.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Tested-by: Ritesh Harjani <riteshh@linux.ibm.com>
Cc: Coly Li <colyli@suse.de>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: John Pittman <jpittman@redhat.com>
Link: https://lore.kernel.org/r/20200917111549.6367-1-adrianhuang0701@gmail.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

dm: Call proper helper to determine dax support

DM was calling generic_fsdax_supported() to determine whether a device
referenced in the DM table supports DAX. However this is a helper for "leaf" device drivers so that
they don't have to duplicate common generic checks. High level code
should call dax_supported() helper which that calls into appropriate
helper for the particular device. This problem manifested itself as
kernel messages:

dm-3: error: dax access failed (-95)

when lvm2-testsuite run in cases where a DM device was stacked on top of
another DM device.

Fixes: 7bf7eac8d648 ("dax: Arrange for dax_supported check to span multiple devices")
Cc: <stable@vger.kernel.org>
Tested-by: Adrian Huang <ahuang12@lenovo.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/r/160061715195.13131.5503173247632041975.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

dm/dax: Fix table reference counts

A recent fix to the dm_dax_supported() flow uncovered a latent bug. When
dm_get_live_table() fails it is still required to drop the
srcu_read_lock(). Without this change the lvm2 test-suite triggers this
warning:

    # lvm2-testsuite --only pvmove-abort-all.sh

    WARNING: lock held when returning to user space!
    5.9.0-rc5+ #251 Tainted: G           OE
    ------------------------------------------------
    lvm/1318 is leaving the kernel with locks still held!
    1 lock held by lvm/1318:
     #0: ffff9372abb5a340 (&md->io_barrier){....}-{0:0}, at: dm_get_live_table+0x5/0xb0 [dm_mod]

...and later on this hang signature:

    INFO: task lvm:1344 blocked for more than 122 seconds.
          Tainted: G           OE     5.9.0-rc5+ #251
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    task:lvm             state:D stack:    0 pid: 1344 ppid:     1 flags:0x00004000
    Call Trace:
     __schedule+0x45f/0xa80
     ? finish_task_switch+0x249/0x2c0
     ? wait_for_completion+0x86/0x110
     schedule+0x5f/0xd0
     schedule_timeout+0x212/0x2a0
     ? __schedule+0x467/0xa80
     ? wait_for_completion+0x86/0x110
     wait_for_completion+0xb0/0x110
     __synchronize_srcu+0xd1/0x160
     ? __bpf_trace_rcu_utilization+0x10/0x10
     __dm_suspend+0x6d/0x210 [dm_mod]
     dm_suspend+0xf6/0x140 [dm_mod]

Fixes: 7bf7eac8d648 ("dax: Arrange for dax_supported check to span multiple devices")
Cc: <stable@vger.kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Reported-by: Adrian Huang <ahuang12@lenovo.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Tested-by: Adrian Huang <ahuang12@lenovo.com>
Link: https://lore.kernel.org/r/160045867590.25663.7548541079217827340.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

kconfig: qconf: revive help message in the info view

Since commit 68fd110b3e7e ("kconfig: qconf: remove redundant help in
the info view"), the help message is no longer displayed.

I intended to drop duplicated "Symbol:", "Type:", but precious info
about help and reverse dependencies was lost too.

Revive it now.

"defined at" is contained in menu_get_ext_help(), so I made sure
to not display it twice.

Fixes: 68fd110b3e7e ("kconfig: qconf: remove redundant help in the info view")
Reported-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>

kconfig: qconf: fix incomplete type 'struct gstr' warning

"make HOSTCXX=clang++ xconfig" reports the following:

HOSTCXX scripts/kconfig/qconf.o
In file included from scripts/kconfig/qconf.cc:23:
In file included from scripts/kconfig/lkc.h:15:
scripts/kconfig/lkc_proto.h:26:13: warning: 'get_relations_str' has C-linkage specified, but returns incomplete type 'struct gstr' which could be incompatible with C [-Wreturn-type-c-linkage]
struct gstr get_relations_str(struct symbol **sym_arr, struct list_head *head);
^

Currently, get_relations_str() is declared before the struct gstr
definition.

Move all declarations of menu.c functions below.

BTW, some are declared in lkc.h and some in lkc_proto.h, but the
difference is unclear. I guess some refactoring is needed.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Boris Kolpackov <boris@codesynthesis.com>

Merge branch 'akpm' (patches from Andrew)

Merge fixes from Andrew Morton:
"15 patches.

  Subsystems affected by this patch series: mailmap, mm/hotfixes,
  mm/thp, mm/memory-hotplug, misc, kcsan"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  kcsan: kconfig: move to menu 'Generic Kernel Debugging Instruments'
  fs/fs-writeback.c: adjust dirtytime_interval_handler definition to match prototype
  stackleak: let stack_erasing_sysctl take a kernel pointer buffer
  ftrace: let ftrace_enable_sysctl take a kernel pointer buffer
  mm/memory_hotplug: drain per-cpu pages again during memory offline
  selftests/vm: fix display of page size in map_hugetlb
  mm/thp: fix __split_huge_pmd_locked() for migration PMD
  kprobes: fix kill kprobe which has been marked as gone
  tmpfs: restore functionality of nr_inodes=0
  mlock: fix unevictable_pgs event counts on THP
  mm: fix check_move_unevictable_pages() on THP
  mm: migration of hugetlbfs page skip memcg
  ksm: reinstate memcg charge on copied pages
  mailmap: add older email addresses for Kees Cook

Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
"Another bunch of fixes for I2C.

  Jean's i801 patch is a cleanup on top of Volker's i801 patch, but it
  will make dependency handling much easier if those two go together"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: mxs: use MXS_DMA_CTRL_WAIT4END instead of DMA_CTRL_ACK
  i2c: mediatek: Send i2c master code at more than 1MHz
  i2c: mediatek: Fix generic definitions for bus frequency
  i2c: core: Call i2c_acpi_install_space_handler() before i2c_acpi_register_devices()
  i2c: i801: Simplify the suspend callback
  i2c: i801: Fix resume bug
  i2c: aspeed: Mask IRQ status to relevant bits

RISC-V: Resurrect the MMIO timer implementation for M-mode systems

The K210 doesn't implement rdtime in M-mode, and since that's where Linux runs
in the NOMMU systems that means we can't use rdtime. The K210 is the only
system that anyone is currently running NOMMU or M-mode on, so here we're just
inlining the timer read directly.

This also adds the CLINT driver as an !MMU dependency, as it's currently the
only timer driver availiable for these systems and without it we get a build
failure for some configurations.

Tested-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>

riscv: Fix Kendryte K210 device tree

The Kendryte K210 SoC CLINT is compatible with Sifive clint v0
(sifive,clint0). Fix the Kendryte K210 device tree clint entry to be
inline with the sifive timer definition documented in
Documentation/devicetree/bindings/timer/sifive,clint.yaml.
The device tree clint entry is renamed similarly to u-boot device tree
definition to improve compatibility with u-boot defined device tree.
To ensure correct initialization, the interrup-cells attribute is added
and the interrupt-extended attribute definition fixed.

This fixes boot failures with Kendryte K210 SoC boards.

Note that the clock referenced is kept as K210_CLK_ACLK, which does not
necessarilly match the clint MTIME increment rate. This however does not
seem to cause any problem for now.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>

riscv: Add sfence.vma after early page table changes

This invalidates local TLB after modifying the page tables during early init as
it's too early to handle suprious faults as we otherwise do.

Fixes: f2c17aabc917 ("RISC-V: Implement compile-time fixed mappings")
Reported-by: Syven Wang <syven.wang@sifive.com>
Signed-off-by: Syven Wang <syven.wang@sifive.com>
Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
[Palmer: Cleaned up the commit text]
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>

kcsan: kconfig: move to menu 'Generic Kernel Debugging Instruments'

This moves the KCSAN kconfig items under menu 'Generic Kernel Debugging
Instruments' where UBSAN resides.

Signed-off-by: Changbin Du <changbin.du@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Marco Elver <elver@google.com>
Link: https://lkml.kernel.org/r/20200904152224.5570-1-changbin.du@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fs/fs-writeback.c: adjust dirtytime_interval_handler definition to match prototype

Commit 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
changed ctl_table.proc_handler to take a kernel pointer.  Adjust the
definition of dirtytime_interval_handler to match its prototype in
linux/writeback.h which fixes the following sparse error/warning:

fs/fs-writeback.c:2189:50: warning: incorrect type in argument 3 (different address spaces)
fs/fs-writeback.c:2189:50:    expected void *
fs/fs-writeback.c:2189:50:    got void [noderef] __user *buffer
fs/fs-writeback.c:2184:5: error: symbol 'dirtytime_interval_handler' redeclared with different type (incompatible argument 3 (different address spaces)):
fs/fs-writeback.c:2184:5:    int extern [addressable] [signed] [toplevel] dirtytime_interval_handler( ... )
fs/fs-writeback.c: note: in included file:
./include/linux/writeback.h:374:5: note: previously declared as:
./include/linux/writeback.h:374:5:    int extern [addressable] [signed] [toplevel] dirtytime_interval_handler( ... )

Fixes: 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lkml.kernel.org/r/20200907093140.13434-1-tklauser@distanz.ch
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

stackleak: let stack_erasing_sysctl take a kernel pointer buffer

Commit 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
changed ctl_table.proc_handler to take a kernel pointer.  Adjust the
signature of stack_erasing_sysctl to match ctl_table.proc_handler which
fixes the following sparse warning:

kernel/stackleak.c:31:50: warning: incorrect type in argument 3 (different address spaces)
kernel/stackleak.c:31:50:    expected void *
kernel/stackleak.c:31:50:    got void [noderef] __user *buffer

Fixes: 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lkml.kernel.org/r/20200907093253.13656-1-tklauser@distanz.ch
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ftrace: let ftrace_enable_sysctl take a kernel pointer buffer

Commit 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
changed ctl_table.proc_handler to take a kernel pointer.  Adjust the
signature of ftrace_enable_sysctl to match ctl_table.proc_handler which
fixes the following sparse warning:

kernel/trace/ftrace.c:7544:43: warning: incorrect type in argument 3 (different address spaces)
kernel/trace/ftrace.c:7544:43:    expected void *
kernel/trace/ftrace.c:7544:43:    got void [noderef] __user *buffer

Fixes: 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lkml.kernel.org/r/20200907093207.13540-1-tklauser@distanz.ch
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/memory_hotplug: drain per-cpu pages again during memory offline

There is a race during page offline that can lead to infinite loop:
a page never ends up on a buddy list and __offline_pages() keeps
retrying infinitely or until a termination signal is received.

Thread#1 - a new process:

load_elf_binary
begin_new_exec
  exec_mmap
   mmput
    exit_mmap
     tlb_finish_mmu
      tlb_flush_mmu
       release_pages
        free_unref_page_list
         free_unref_page_prepare
          set_pcppage_migratetype(page, migratetype);
             // Set page->index migration type below  MIGRATE_PCPTYPES

Thread#2 - hot-removes memory
__offline_pages
  start_isolate_page_range
    set_migratetype_isolate
      set_pageblock_migratetype(page, MIGRATE_ISOLATE);
        Set migration type to MIGRATE_ISOLATE-> set
        drain_all_pages(zone);
             // drain per-cpu page lists to buddy allocator.

Thread#1 - continue
         free_unref_page_commit
           migratetype = get_pcppage_migratetype(page);
              // get old migration type
           list_add(&page->lru, &pcp->lists[migratetype]);
              // add new page to already drained pcp list

Thread#2
Never drains pcp again, and therefore gets stuck in the loop.

The fix is to try to drain per-cpu lists again after
check_pages_isolated_cb() fails.

Fixes: c52e75935f8d ("mm: remove extra drain pages on pcp list")
Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20200903140032.380431-1-pasha.tatashin@soleen.com
Link: https://lkml.kernel.org/r/20200904151448.100489-2-pasha.tatashin@soleen.com
Link: http://lkml.kernel.org/r/20200904070235.GA15277@dhcp22.suse.cz
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

selftests/vm: fix display of page size in map_hugetlb

The displayed size is in bytes while the text says it is in kB.

Shift it by 10 to really display kBytes.

Fixes: fa7b9a805c79 ("tools/selftest/vm: allow choosing mem size and page size in map_hugetlb")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/e27481224564a93d14106e750de31189deaa8bc8.1598861977.git.christophe.leroy@csgroup.eu
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/thp: fix __split_huge_pmd_locked() for migration PMD

A migrating transparent huge page has to already be unmapped.  Otherwise,
the page could be modified while it is being copied to a new page and data
could be lost.  The function __split_huge_pmd() checks for a PMD migration
entry before calling __split_huge_pmd_locked() leading one to think that
__split_huge_pmd_locked() can handle splitting a migrating PMD.

However, the code always increments the page->_mapcount and adjusts the
memory control group accounting assuming the page is mapped.

Also, if the PMD entry is a migration PMD entry, the call to
is_huge_zero_pmd(*pmd) is incorrect because it calls pmd_pfn(pmd) instead
of migration_entry_to_pfn(pmd_to_swp_entry(pmd)).  Fix these problems by
checking for a PMD migration entry.

Fixes: 84c3fc4e9c56 ("mm: thp: check pmd migration entry in common path")
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Bharata B Rao <bharata@linux.ibm.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org> [4.14+]
Link: https://lkml.kernel.org/r/20200903183140.19055-1-rcampbell@nvidia.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

kprobes: fix kill kprobe which has been marked as gone

If a kprobe is marked as gone, we should not kill it again. Otherwise, we
can disarm the kprobe more than once. In that case, the statistics of
kprobe_ftrace_enabled can unbalance which can lead to that kprobe do not
work.

Fixes: e8386a0cb22f ("kprobes: support probing module __exit function")
Co-developed-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: "Naveen N . Rao" <naveen.n.rao@linux.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Song Liu <songliubraving@fb.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20200822030055.32383-1-songmuchun@bytedance.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

tmpfs: restore functionality of nr_inodes=0

Commit e809d5f0b5c9 ("tmpfs: per-superblock i_ino support") made changes
to shmem_reserve_inode() in mm/shmem.c, however the original test for
(sbinfo->max_inodes) got dropped.  This causes mounting tmpfs with option
nr_inodes=0 to fail:

  # mount -ttmpfs -onr_inodes=0 none /ext0
  mount: /ext0: mount(2) system call failed: Cannot allocate memory.

This patch restores the nr_inodes=0 functionality.

Fixes: e809d5f0b5c9 ("tmpfs: per-superblock i_ino support")
Signed-off-by: Byron Stanoszek <gandalf@winds.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Chris Down <chris@chrisdown.name>
Link: https://lkml.kernel.org/r/20200902035715.16414-1-gandalf@winds.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mlock: fix unevictable_pgs event counts on THP

5.8 commit 5d91f31faf8e ("mm: swap: fix vmstats for huge page") has
established that vm_events should count every subpage of a THP, including
unevictable_pgs_culled and unevictable_pgs_rescued; but
lru_cache_add_inactive_or_unevictable() was not doing so for
unevictable_pgs_mlocked, and mm/mlock.c was not doing so for
unevictable_pgs mlocked, munlocked, cleared and stranded.

Fix them; but THPs don't go the pagevec way in mlock.c, so no fixes needed
on that path.

Fixes: 5d91f31faf8e ("mm: swap: fix vmstats for huge page")
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Yang Shi <shy828301@gmail.com>
Cc: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Qian Cai <cai@lca.pw>
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008301408230.5954@eggly.anvils
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: fix check_move_unevictable_pages() on THP

check_move_unevictable_pages() is used in making unevictable shmem pages
evictable: by shmem_unlock_mapping(), drm_gem_check_release_pagevec() and
i915/gem check_release_pagevec(). Those may pass down subpages of a huge
page, when /sys/kernel/mm/transparent_hugepage/shmem_enabled is "force".

That does not crash or warn at present, but the accounting of vmstats
unevictable_pgs_scanned and unevictable_pgs_rescued is inconsistent:
scanned being incremented on each subpage, rescued only on the head (since
tails already appear evictable once the head has been updated).

5.8 commit 5d91f31faf8e ("mm: swap: fix vmstats for huge page") has
established that vm_events in general (and unevictable_pgs_rescued in
particular) should count every subpage: so follow that precedent here.

Do this in such a way that if mem_cgroup_page_lruvec() is made stricter
(to check page->mem_cgroup is always set), no problem: skip the tails
before calling it, and add thp_nr_pages() to vmstats on the head.

Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Yang Shi <shy828301@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Qian Cai <cai@lca.pw>
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008301405000.5954@eggly.anvils
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: migration of hugetlbfs page skip memcg

hugetlbfs pages do not participate in memcg: so although they do find most
of migrate_page_states() useful, it would be better if they did not call
into mem_cgroup_migrate() - where Qian Cai reported that LTP's
move_pages12 triggers the warning in Alex Shi's prospective commit
"mm/memcg: warning on !memcg after readahead page charged".

Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxch.org>
Cc: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Qian Cai <cai@lca.pw>
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008301359460.5954@eggly.anvils
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ksm: reinstate memcg charge on copied pages

Patch series "mm: fixes to past from future testing".

Here's a set of independent fixes against 5.9-rc2: prompted by
testing Alex Shi's "warning on !memcg" and lru_lock series, but
I think fit for 5.9 - though maybe only the first for stable.

This patch (of 5):

In 5.8 some instances of memcg charging in do_swap_page() and unuse_pte()
were removed, on the understanding that swap cache is now already charged
at those points; but a case was missed, when ksm_might_need_to_copy() has
decided it must allocate a substitute page: such pages were never charged.
Fix it inside ksm_might_need_to_copy().

This was discovered by Alex Shi's prospective commit "mm/memcg: warning on
!memcg after readahead page charged".

But there is a another surprise: this also fixes some rarer uncharged
PageAnon cases, when KSM is configured in, but has never been activated.
ksm_might_need_to_copy()'s anon_vma->root and linear_page_index() check
sometimes catches a case which would need to have been copied if KSM were
turned on. Or that's my optimistic interpretation (of my own old code),
but it leaves some doubt as to whether everything is working as intended
there - might it hint at rare anon ptes which rmap cannot find? A
question not easily answered: put in the fix for missed memcg charges.

Cc; Matthew Wilcox <willy@infradead.org>

Fixes: 4c6355b25e8b ("mm: memcontrol: charge swapin pages on instantiation")
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Qian Cai <cai@lca.pw>
Cc: <stable@vger.kernel.org> [5.8]
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008301343270.5954@eggly.anvils
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008301358020.5954@eggly.anvils
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mailmap: add older email addresses for Kees Cook

This adds explicit mailmap entries for my older/other email addresses.

Reported-by: Joe Perches <joe@perches.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Link: https://lkml.kernel.org/r/20200910193939.3798377-1-keescook@chromium.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 's390-5.9-6' of git://git./linux/kernel/git/s390/linux

Pull s390 fixes from Vasily Gorbik:

- Fix order in trace_hardirqs_off_caller() to make locking state
   consistent even if the IRQ tracer calls into lockdep again. Touches
   common code. Acked-by Peter Zijlstra.

- Correctly handle secure storage violation exception to avoid kernel
   panic triggered by user space misbehaviour.

- Switch the idle->seqcount over to using raw_write_*() to avoid
  "suspicious RCU usage".

- Fix memory leaks on hard unplug in pci code.

- Use kvmalloc instead of kmalloc for larger allocations in zcrypt.

- Add few missing __init annotations to static functions to avoid
   section mismatch complains when functions are not inlined.

* tag 's390-5.9-6' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390: add 3f program exception handler
  lockdep: fix order in trace_hardirqs_off_caller()
  s390/pci: fix leak of DMA tables on hard unplug
  s390/init: add missing __init annotations
  s390/zcrypt: fix kmalloc 256k failure
  s390/idle: fix suspicious RCU usage

i2c: mxs: use MXS_DMA_CTRL_WAIT4END instead of DMA_CTRL_ACK

The driver-specific usage of the DMA_CTRL_ACK flag was replaced with a
custom flag in commit ceeeb99cd821 ("dmaengine: mxs: rename custom flag"),
but i2c-mxs was not updated to use the new flag, completely breaking I2C
transactions using DMA.

Fixes: ceeeb99cd821 ("dmaengine: mxs: rename custom flag")
Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com>
Reviewed-by: Fabio Estevam <festevam@gmail.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>

i2c: mediatek: Send i2c master code at more than 1MHz

The master code needs to being sent when the speed is more than
I2C_MAX_FAST_MODE_PLUS_FREQ, not I2C_MAX_FAST_MODE_FREQ in the
latest I2C-bus specification and user manual.

Signed-off-by: Qii Wang <qii.wang@mediatek.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>

i2c: mediatek: Fix generic definitions for bus frequency

The max frequency of mediatek i2c controller driver is
I2C_MAX_HIGH_SPEED_MODE_FREQ, not I2C_MAX_FAST_MODE_PLUS_FREQ.
Fix it.

Fixes: 90224e6468e1 ("i2c: drivers: Use generic definitions for bus frequencies")
Reviewed-by: Yingjoe Chen <yingjoe.chen@mediatek.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Qii Wang <qii.wang@mediatek.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>

Merge tag 'sh-for-5.9-part2' of git://git.libc.org/linux-sh

Pull arch/sh fixes from Rich Felker:
"Fixes for build and function regression"

* tag 'sh-for-5.9-part2' of git://git.libc.org/linux-sh:
sh: fix syscall tracing
sh: remove spurious circular inclusion from asm/smp.h

Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux

Pull arm64 fixes from Catalin Marinas:

- Allow CPUs affected by erratum 1418040 to come online late
   (previously we only fixed the other case - CPUs not affected by the
   erratum coming up late).

- Fix branch offset in BPF JIT.

- Defer the stolen time initialisation to the CPU online time from the
   CPU starting time to avoid a (sleep-able) memory allocation in an
   atomic context.

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: paravirt: Initialize steal time when cpu is online
  arm64: bpf: Fix branch offset in JIT
  arm64: Allow CPUs unffected by ARM erratum 1418040 to come in late

Merge tag 'powerpc-5.9-5' of git://git./linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
"Some more powerpc fixes for 5.9:

   - Opt us out of the DEBUG_VM_PGTABLE support for now as it's causing
     crashes.

   - Fix a long standing bug in our DMA mask handling that was hidden
     until recently, and which caused problems with some drivers.

   - Fix a boot failure on systems with large amounts of RAM, and no
     hugepage support and using Radix MMU, only seen in the lab.

   - A few other minor fixes.

  Thanks to Alexey Kardashevskiy, Aneesh Kumar K.V, Gautham R. Shenoy,
  Hari Bathini, Ira Weiny, Nick Desaulniers, Shirisha Ganta, Vaibhav
  Jain, and Vaidyanathan Srinivasan"

* tag 'powerpc-5.9-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/papr_scm: Limit the readability of 'perf_stats' sysfs attribute
  cpuidle: pseries: Fix CEDE latency conversion from tb to us
  powerpc/dma: Fix dma_map_ops::get_required_mask
  Revert "powerpc/build: vdso linker warning for orphan sections"
  powerpc/mm: Remove DEBUG_VM_PGTABLE support on powerpc
  selftests/powerpc: Skip PROT_SAO test in guests/LPARS
  powerpc/book3s64/radix: Fix boot failure with large amount of guest memory

Merge tag 'pm-5.9-rc6' of git://git./linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
"These add a new CPU ID to the RAPL power capping driver and prevent
  the ACPI processor idle driver from triggering RCU-lockdep complaints.

  Specifics:

   - Add support for the Lakefield chip to the RAPL power capping driver
     (Ricardo Neri).

   - Modify the ACPI processor idle driver to prevent it from triggering
     RCU-lockdep complaints which has started to happen after recent
     changes in that area (Peter Zijlstra)"

* tag 'pm-5.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: processor: Take over RCU-idle for C3-BM idle
  cpuidle: Allow cpuidle drivers to take over RCU-idle
  ACPI: processor: Use CPUIDLE_FLAG_TLB_FLUSHED
  ACPI: processor: Use CPUIDLE_FLAG_TIMER_STOP
  powercap: RAPL: Add support for Lakefield

Merge tag 'sound-5.9-rc6' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"Here is a collection of fixes for 5.9. All look small and are nothing
  scary.

  The majority of changes are about ASoC driver- specific fixes, while
  there are a couple of ASoC core fixes (DAI lookup and lockdep stuff)
  and usual HD-audio quirks"

* tag 'sound-5.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (23 commits)
  ALSA: hda/realtek - The Mic on a RedmiBook doesn't work
  ASoC: tlv320adcx140: Wake up codec before accessing register
  ASoC: core: Do not cleanup uninitialized dais on soc_pcm_open failure
  ALSA: hda: fixup headset for ASUS GX502 laptop
  ASoC: Intel: bytcr_rt5640: Add quirk for MPMAN Converter9 2-in-1
  ASoC: Intel: haswell: Fix power transition refactor
  ASoC: tlv320adcx140: Fix accessing uninitialized adcx140->dev
  ASoC: wm8994: Ensure the device is resumed in wm89xx_mic_detect functions
  ASoC: wm8994: Skip setting of the WM8994_MICBIAS register for WM1811
  ASoC: meson: axg-toddr: fix channel order on g12 platforms
  ASoC: soc-core: add snd_soc_find_dai_with_mutex()
  ASoC: qcom: common: Fix refcount imbalance on error
  ASoC: rt700: Fix return check for devm_regmap_init_sdw()
  ASoC: rt715: Fix return check for devm_regmap_init_sdw()
  ASoC: rt711: Fix return check for devm_regmap_init_sdw()
  ASoC: rt1308-sdw: Fix return check for devm_regmap_init_sdw()
  ASoC: max98373: Fix return check for devm_regmap_init_sdw()
  ASoC: ti: fixup ams_delta_mute() function name
  ASoC: pcm3168a: ignore 0 Hz settings
  ASoC: Intel: tgl_max98373: fix a runtime pm issue in multi-thread case
  ...

Merge tag 'iommu-fixes-v5.9-rc5' of git://git./linux/kernel/git/joro/iommu

Pull iommu fixes from Joerg Roedel:
"Two fixes for the AMD IOMMU driver:

   - Fix a potential NULL-ptr dereference found by smatch

   - Fix interrupt remapping when a device is assigned to a guest and
     AVIC is enabled"

* tag 'iommu-fixes-v5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/amd: Restore IRTE.RemapEn bit for amd_iommu_activate_guest_mode
  iommu/amd: Fix potential @entry null deref

Merge tag 'mtd/fixes-for-5.9-rc6' of git://git./linux/kernel/git/mtd/linux

Pull MTD/SPI NOR fixes from Vignesh Raghavendra:
"Revert patches that caused non volatile Quad Enable bit to be cleared
  for certain SPI NOR flashes during module remove or during shutdown,
  thus breaking backward compatibility"

Acked-by: Miquel Raynal <miquel.raynal@bootlin.com>
* tag 'mtd/fixes-for-5.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
  Revert "mtd: spi-nor: Add capability to disable flash quad mode"
  Revert "mtd: spi-nor: Disable the flash quad mode in spi_nor_restore()"

objtool: Fix noreturn detection for ignored functions

When a function is annotated with STACK_FRAME_NON_STANDARD, objtool
doesn't validate its code paths.  It also skips sibling call detection
within the function.

But sibling call detection is actually needed for the case where the
ignored function doesn't have any return instructions.  Otherwise
objtool naively marks the function as implicit static noreturn, which
affects the reachability of its callers, resulting in "unreachable
instruction" warnings.

Fix it by just enabling sibling call detection for ignored functions.
The 'insn->ignore' check in add_jump_destinations() is no longer needed
after

  e6da9567959e ("objtool: Don't use ignore flag for fake jumps").

Fixes the following warning:

  arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_handle_exit_irqoff()+0x142: unreachable instruction

which triggers on an allmodconfig with CONFIG_GCOV_KERNEL unset.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lkml.kernel.org/r/5b1e2536cdbaa5246b60d7791b76130a74082c62.1599751464.git.jpoimboe@redhat.com

Merge branch 'pm-cpuidle'

* pm-cpuidle:
  ACPI: processor: Take over RCU-idle for C3-BM idle
  cpuidle: Allow cpuidle drivers to take over RCU-idle
  ACPI: processor: Use CPUIDLE_FLAG_TLB_FLUSHED
  ACPI: processor: Use CPUIDLE_FLAG_TIMER_STOP

kconfig: qconf: use delete[] instead of delete to free array (again)

Commit c9b09a9249e6 ("kconfig: qconf: use delete[] instead of delete
to free array") fixed two lines, but there is one more.
(cppcheck does not report it for some reason...)

This was detected by Clang.

"make HOSTCXX=clang++ xconfig" reports the following:

scripts/kconfig/qconf.cc:1279:2: warning: 'delete' applied to a pointer that was allocated with 'new[]'; did you mean 'delete[]'? [-Wmismatched-new-delete]
        delete data;
        ^
              []
scripts/kconfig/qconf.cc:1239:15: note: allocated with 'new[]' here
        char *data = new char[count + 1];
                     ^

Fixes: c4f7398bee9c ("kconfig: qconf: make debug links work again")
Fixes: c9b09a9249e6 ("kconfig: qconf: use delete[] instead of delete to free array")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>

iommu/amd: Restore IRTE.RemapEn bit for amd_iommu_activate_guest_mode

Commit e52d58d54a32 ("iommu/amd: Use cmpxchg_double() when updating
128-bit IRTE") removed an assumption that modify_irte_ga always set
the valid bit, which requires the callers to set the appropriate value
for the struct irte_ga.valid bit before calling the function.

Similar to the commit 26e495f34107 ("iommu/amd: Restore IRTE.RemapEn
bit after programming IRTE"), which is for the function
amd_iommu_deactivate_guest_mode().

The same change is also needed for the amd_iommu_activate_guest_mode().
Otherwise, this could trigger IO_PAGE_FAULT for the VFIO based VMs with
AVIC enabled.

Fixes: e52d58d54a321 ("iommu/amd: Use cmpxchg_double() when updating 128-bit IRTE")
Reported-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Tested-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Cc: Joao Martins <joao.m.martins@oracle.com>
Link: https://lore.kernel.org/r/20200916111720.43913-1-suravee.suthikulpanit@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>

iommu/amd: Fix potential @entry null deref

After commit 26e495f34107 ("iommu/amd: Restore IRTE.RemapEn bit after
programming IRTE"), smatch warns:

drivers/iommu/amd/iommu.c:3870 amd_iommu_deactivate_guest_mode()
warn: variable dereferenced before check 'entry' (see line 3867)

Fix this by moving the @valid assignment to after @entry has been checked
for NULL.

Fixes: 26e495f34107 ("iommu/amd: Restore IRTE.RemapEn bit after programming IRTE")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Link: https://lore.kernel.org/r/20200910171621.12879-1-joao.m.martins@oracle.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>

x86/unwind/fp: Fix FP unwinding in ret_from_fork

There have been some reports of "bad bp value" warnings printed by the
frame pointer unwinder:

WARNING: kernel stack regs at 000000005bac7112 in sh:1014 has bad 'bp' value 0000000000000000

This warning happens when unwinding from an interrupt in
ret_from_fork(). If entry code gets interrupted, the state of the
frame pointer (rbp) may be undefined, which can confuse the unwinder,
resulting in warnings like the above.

There's an in_entry_code() check which normally silences such
warnings for entry code. But in this case, ret_from_fork() is getting
interrupted. It recently got moved out of .entry.text, so the
in_entry_code() check no longer works.

It could be moved back into .entry.text, but that would break the
noinstr validation because of the call to schedule_tail().

Instead, initialize each new task's RBP to point to the task's entry
regs via an encoded frame pointer. That will allow the unwinder to
reach the end of the stack gracefully.

Fixes: b9f6976bfb94 ("x86/entry/64: Move non entry code into .text section")
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Reported-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/f366bbf5a8d02e2318ee312f738112d0af74d16f.1600103007.git.jpoimboe@redhat.com

Merge branch 'for-5.9-fixes' of git://git./linux/kernel/git/dennis/percpu

Pull percpu fix from Dennis Zhou:
"This is a fix for the first chunk size calculation where the variable
  length array incorrectly used the number of longs instead of bytes of
  longs.

  This came in as a code fix and not a bug report, so I don't think it
  was widely problematic. I believe it worked out due to it being
  memblock memory and alignment requirements working in our favor"

* 'for-5.9-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu:
  percpu: fix first chunk size calculation for populated bitmap

Merge tag 'drm-fixes-2020-09-18' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
"A bunch of small fixes, some of the i915 ones have been out for a
  while and got better commit msg explaining some better reasoning
  behind them (hopefully this trend continues).

  Otherwise there a few AMD related ones mostly small, one radeon PLL
  regression fix and a bunch of small mediatek fixes.

  amdgpu:
   - Sienna Cichlid fixes
   - Navy Flounder fixes
   - DC fixes

  amdkfd:
   - Fix a GPU reset crash
   - Fix a memory leak

  radeon:
   - Revert a PLL fix that broke other boards

  i915:
   - Avoid exposing a partially constructed context
   - Use RCU instead of mutex for context termination list iteration
   - Avoid data race reported by KCSAN
   - Filter wake_flags passed to default_wake_function

  mediatek:
   - Fix scrolling of panel
   - Remove duplicated include
   - Use CPU when fail to get cmdq event
   - Add missing put_device() call"

* tag 'drm-fixes-2020-09-18' of git://anongit.freedesktop.org/drm/drm: (21 commits)
  drm/amd/display: Don't log hdcp module warnings in dmesg
  drm/amdgpu: declare ta firmware for navy_flounder
  drm/mediatek: Add missing put_device() call in mtk_hdmi_dt_parse_pdata()
  drm/mediatek: Add missing put_device() call in mtk_drm_kms_init()
  drm/mediatek: Add exception handing in mtk_drm_probe() if component init fail
  drm/mediatek: Add missing put_device() call in mtk_ddp_comp_init()
  drm/mediatek: Use CPU when fail to get cmdq event
  drm/mediatek: Remove duplicated include
  drm/i915: Filter wake_flags passed to default_wake_function
  drm/i915: Be wary of data races when reading the active execlists
  drm/i915/gem: Reduce context termination list iteration guard to RCU
  drm/i915/gem: Delay tracking the GEM context until it is registered
  drm/amdgpu/dc: Require primary plane to be enabled whenever the CRTC is
  drm/radeon: revert "Prefer lower feedback dividers"
  drm/amdgpu: Include sienna_cichlid in USBC PD FW support.
  drm/amd/display: update nv1x stutter latencies
  drm/amd/display: Don't use DRM_ERROR() for DTM add topology
  drm/amd/pm: support runtime pptable update for sienna_cichlid etc.
  drm/amdkfd: fix a memory leak issue
  drm/kfd: fix a system crash issue during GPU recovery
  ...

Merge tag 'mediatek-drm-fixes-5.9' of https://git./linux/kernel/git/chunkuang.hu/linux into drm-fixes

Mediatek DRM Fixes for Linux 5.9

1. Fix scrolling of panel
2. Remove duplicated include
3. Use CPU when fail to get cmdq event
4. Add missing put_device() call

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Chun-Kuang Hu <chunkuang.hu@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20200916231724.30571-1-chunkuang.hu@kernel.org

Merge tag 'drm-intel-fixes-2020-09-17' of ssh://git.freedesktop.org/git/drm/drm-intel into drm-fixes

drm/i915 fixes for v5.9-rc6:
- Avoid exposing a partially constructed context
- Use RCU instead of mutex for context termination list iteration
- Avoid data race reported by KCSAN
- Filter wake_flags passed to default_wake_function

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/87y2l8vlj3.fsf@intel.com

Merge tag 'amd-drm-fixes-5.9-2020-09-17' of git://people.freedesktop.org/~agd5f/linux into drm-fixes

amd-drm-fixes-5.9-2020-09-17:

amdgpu:
- Sienna Cichlid fixes
- Navy Flounder fixes
- DC fixes

amdkfd:
- Fix a GPU reset crash
- Fix a memory leak

radeon:
- Revert a PLL fix that broke other boards

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200917043818.3717-1-alexander.deucher@amd.com

i2c: core: Call i2c_acpi_install_space_handler() before i2c_acpi_register_devices()

Some ACPI i2c-devices _STA method (which is used to detect if the device
is present) use autodetection code which probes which device is present
over i2c. This requires the I2C ACPI OpRegion handler to be registered
before we enumerate i2c-clients under the i2c-adapter.

This fixes the i2c touchpad on the Lenovo ThinkBook 14-IIL and
ThinkBook 15 IIL not getting an i2c-client instantiated and thus not
working.

BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1842039
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>

Merge tag 'mips_fixes_5.9_2' of git://git./linux/kernel/git/mips/linux

Pull MIPS fixes from Thomas Bogendoerfer:
"Two small fixes for SNI machines"

* tag 'mips_fixes_5.9_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: SNI: Fix spurious interrupts
MIPS: SNI: Fix MIPS_L1_CACHE_SHIFT

percpu: fix first chunk size calculation for populated bitmap

Variable populated, which is a member of struct pcpu_chunk, is used as a
unit of size of unsigned long.
However, size of populated is miscounted. So, I fix this minor part.

Fixes: 8ab16c43ea79 ("percpu: change the number of pages marked in the first_chunk pop bitmap")
Cc: <stable@vger.kernel.org> # 4.14+
Signed-off-by: Sunghyun Jin <mcsmonk@gmail.com>
Signed-off-by: Dennis Zhou <dennis@kernel.org>

mm: allow a controlled amount of unfairness in the page lock

Commit 2a9127fcf229 ("mm: rewrite wait_on_page_bit_common() logic") made
the page locking entirely fair, in that if a waiter came in while the
lock was held, the lock would be transferred to the lockers strictly in
order.

That was intended to finally get rid of the long-reported watchdog
failures that involved the page lock under extreme load, where a process
could end up waiting essentially forever, as other page lockers stole
the lock from under it.

It also improved some benchmarks, but it ended up causing huge
performance regressions on others, simply because fair lock behavior
doesn't end up giving out the lock as aggressively, causing better
worst-case latency, but potentially much worse average latencies and
throughput.

Instead of reverting that change entirely, this introduces a controlled
amount of unfairness, with a sysctl knob to tune it if somebody needs
to. But the default value should hopefully be good for any normal load,
allowing a few rounds of lock stealing, but enforcing the strict
ordering before the lock has been stolen too many times.

There is also a hint from Matthieu Baerts that the fair page coloring
may end up exposing an ABBA deadlock that is hidden by the usual
optimistic lock stealing, and while the unfairness doesn't fix the
fundamental issue (and I'm still looking at that), it avoids it in
practice.

The amount of unfairness can be modified by writing a new value to the
'sysctl_page_lock_unfairness' variable (default value of 5, exposed
through /proc/sys/vm/page_lock_unfairness), but that is hopefully
something we'd use mainly for debugging rather than being necessary for
any deep system tuning.

This whole issue has exposed just how critical the page lock can be, and
how contended it gets under certain locks. And the main contention
doesn't really seem to be anything related to IO (which was the origin
of this lock), but for things like just verifying that the page file
mapping is stable while faulting in the page into a page table.

Link: https://lore.kernel.org/linux-fsdevel/ed8442fd-6f54-dd84-cd4a-941e8b7ee603@MichaelLarabel.com/
Link: https://www.phoronix.com/scan.php?page=article&item=linux-50-59&num=1
Link: https://lore.kernel.org/linux-fsdevel/c560a38d-8313-51fb-b1ec-e904bd8836bc@tessares.net/
Reported-and-tested-by: Michael Larabel <Michael@michaellarabel.com>
Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Chris Mason <clm@fb.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

arm64: paravirt: Initialize steal time when cpu is online

Steal time initialization requires mapping a memory region which
invokes a memory allocation. Doing this at CPU starting time results
in the following trace when CONFIG_DEBUG_ATOMIC_SLEEP is enabled:

BUG: sleeping function called from invalid context at mm/slab.h:498
in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 0, name: swapper/1
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.9.0-rc5+ #1
Call trace:
dump_backtrace+0x0/0x208
show_stack+0x1c/0x28
dump_stack+0xc4/0x11c
___might_sleep+0xf8/0x130
__might_sleep+0x58/0x90
slab_pre_alloc_hook.constprop.101+0xd0/0x118
kmem_cache_alloc_node_trace+0x84/0x270
__get_vm_area_node+0x88/0x210
get_vm_area_caller+0x38/0x40
__ioremap_caller+0x70/0xf8
ioremap_cache+0x78/0xb0
memremap+0x9c/0x1a8
init_stolen_time_cpu+0x54/0xf0
cpuhp_invoke_callback+0xa8/0x720
notify_cpu_starting+0xc8/0xd8
secondary_start_kernel+0x114/0x180
CPU1: Booted secondary processor 0x0000000001 [0x431f0a11]

However we don't need to initialize steal time at CPU starting time.
We can simply wait until CPU online time, just sacrificing a bit of
accuracy by returning zero for steal time until we know better.

While at it, add __init to the functions that are only called by
pv_time_init() which is __init.

Signed-off-by: Andrew Jones <drjones@redhat.com>
Fixes: e0685fa228fd ("arm64: Retrieve stolen time as paravirtualized guest")
Cc: stable@vger.kernel.org
Reviewed-by: Steven Price <steven.price@arm.com>
Link: https://lore.kernel.org/r/20200916154530.40809-1-drjones@redhat.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

usblp: fix race between disconnect() and read()

read() needs to check whether the device has been
disconnected before it tries to talk to the device.

Signed-off-by: Oliver Neukum <oneukum@suse.com>
Reported-by: syzbot+be5b5f86a162a6c281e6@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/20200917103427.15740-1-oneukum@suse.com
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

arm64: bpf: Fix branch offset in JIT

Running the eBPF test_verifier leads to random errors looking like this:

[ 6525.735488] Unexpected kernel BRK exception at EL1
[ 6525.735502] Internal error: ptrace BRK handler: f2000100 [#1] SMP
[ 6525.741609] Modules linked in: nls_utf8 cifs libdes libarc4 dns_resolver fscache binfmt_misc nls_ascii nls_cp437 vfat fat aes_ce_blk crypto_simd cryptd aes_ce_cipher ghash_ce gf128mul efi_pstore sha2_ce sha256_arm64 sha1_ce evdev efivars efivarfs ip_tables x_tables autofs4 btrfs blake2b_generic xor xor_neon zstd_compress raid6_pq libcrc32c crc32c_generic ahci xhci_pci libahci xhci_hcd igb libata i2c_algo_bit nvme realtek usbcore nvme_core scsi_mod t10_pi netsec mdio_devres of_mdio gpio_keys fixed_phy libphy gpio_mb86s7x
[ 6525.787760] CPU: 3 PID: 7881 Comm: test_verifier Tainted: G        W         5.9.0-rc1+ #47
[ 6525.796111] Hardware name: Socionext SynQuacer E-series DeveloperBox, BIOS build #1 Jun  6 2020
[ 6525.804812] pstate: 20000005 (nzCv daif -PAN -UAO BTYPE=--)
[ 6525.810390] pc : bpf_prog_c3d01833289b6311_F+0xc8/0x9f4
[ 6525.815613] lr : bpf_prog_d53bb52e3f4483f9_F+0x38/0xc8c
[ 6525.820832] sp : ffff8000130cbb80
[ 6525.824141] x29: ffff8000130cbbb0 x28: 0000000000000000
[ 6525.829451] x27: 000005ef6fcbf39b x26: 0000000000000000
[ 6525.834759] x25: ffff8000130cbb80 x24: ffff800011dc7038
[ 6525.840067] x23: ffff8000130cbd00 x22: ffff0008f624d080
[ 6525.845375] x21: 0000000000000001 x20: ffff800011dc7000
[ 6525.850682] x19: 0000000000000000 x18: 0000000000000000
[ 6525.855990] x17: 0000000000000000 x16: 0000000000000000
[ 6525.861298] x15: 0000000000000000 x14: 0000000000000000
[ 6525.866606] x13: 0000000000000000 x12: 0000000000000000
[ 6525.871913] x11: 0000000000000001 x10: ffff8000000a660c
[ 6525.877220] x9 : ffff800010951810 x8 : ffff8000130cbc38
[ 6525.882528] x7 : 0000000000000000 x6 : 0000009864cfa881
[ 6525.887836] x5 : 00ffffffffffffff x4 : 002880ba1a0b3e9f
[ 6525.893144] x3 : 0000000000000018 x2 : ffff8000000a4374
[ 6525.898452] x1 : 000000000000000a x0 : 0000000000000009
[ 6525.903760] Call trace:
[ 6525.906202]  bpf_prog_c3d01833289b6311_F+0xc8/0x9f4
[ 6525.911076]  bpf_prog_d53bb52e3f4483f9_F+0x38/0xc8c
[ 6525.915957]  bpf_dispatcher_xdp_func+0x14/0x20
[ 6525.920398]  bpf_test_run+0x70/0x1b0
[ 6525.923969]  bpf_prog_test_run_xdp+0xec/0x190
[ 6525.928326]  __do_sys_bpf+0xc88/0x1b28
[ 6525.932072]  __arm64_sys_bpf+0x24/0x30
[ 6525.935820]  el0_svc_common.constprop.0+0x70/0x168
[ 6525.940607]  do_el0_svc+0x28/0x88
[ 6525.943920]  el0_sync_handler+0x88/0x190
[ 6525.947838]  el0_sync+0x140/0x180
[ 6525.951154] Code: d4202000 d4202000 d4202000 d4202000 (d4202000)
[ 6525.957249] ---[ end trace cecc3f93b14927e2 ]---

The reason is the offset[] creation and later usage, while building
the eBPF body. The code currently omits the first instruction, since
build_insn() will increase our ctx->idx before saving it.
That was fine up until bounded eBPF loops were introduced. After that
introduction, offset[0] must be the offset of the end of prologue which
is the start of the 1st insn while, offset[n] holds the
offset of the end of n-th insn.

When "taken loop with back jump to 1st insn" test runs, it will
eventually call bpf2a64_offset(-1, 2, ctx). Since negative indexing is
permitted, the current outcome depends on the value stored in
ctx->offset[-1], which has nothing to do with our array.
If the value happens to be 0 the tests will work. If not this error
triggers.

commit 7c2e988f400e ("bpf: fix x64 JIT code generation for jmp to 1st insn")
fixed an indentical bug on x86 when eBPF bounded loops were introduced.

So let's fix it by creating the ctx->offset[] differently. Track the
beginning of instruction and account for the extra instruction while
calculating the arm instruction offsets.

Fixes: 2589726d12a1 ("bpf: introduce bounded loops")
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Reported-by: Jiri Olsa <jolsa@kernel.org>
Co-developed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Co-developed-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20200917084925.177348-1-ilias.apalodimas@linaro.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

ehci-hcd: Move include to keep CRC stable

The CRC calculation done by genksyms is triggered when the parser hits
EXPORT_SYMBOL*() macros. At this point, genksyms recursively expands the
types of the function parameters, and uses that as the input for the CRC
calculation. In the case of forward-declared structs, the type expands
to 'UNKNOWN'. Following this, it appears that the result of the
expansion of each type is cached somewhere, and seems to be re-used
when/if the same type is seen again for another exported symbol in the
same C file.

Unfortunately, this can cause CRC 'stability' issues when a struct
definition becomes visible in the middle of a C file. For example, let's
assume code with the following pattern:

    struct foo;

    int bar(struct foo *arg)
    {
/* Do work ... */
    }
    EXPORT_SYMBOL_GPL(bar);

    /* This contains struct foo's definition */
    #include "foo.h"

    int baz(struct foo *arg)
    {
/* Do more work ... */
    }
    EXPORT_SYMBOL_GPL(baz);

Here, baz's CRC will be computed using the expansion of struct foo that
was cached after bar's CRC calculation ('UNKOWN' here). But if
EXPORT_SYMBOL_GPL(bar) is removed from the file (because of e.g. symbol
trimming using CONFIG_TRIM_UNUSED_KSYMS), struct foo will be expanded
late, during baz's CRC calculation, which now has visibility over the
full struct definition, hence resulting in a different CRC for baz.

The proper fix for this certainly is in genksyms, but that will take me
some time to get right. In the meantime, we have seen one occurrence of
this in the ehci-hcd code which hits this problem because of the way it
includes C files halfway through the code together with an unlucky mix
of symbol trimming.

In order to workaround this, move the include done in ehci-hub.c early
in ehci-hcd.c, hence making sure the struct definitions are visible to
the entire file. This improves CRC stability of the ehci-hcd exports
even when symbol trimming is enabled.

Acked-by: Alan Stern <stern@rowland.harvard.edu>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20200916171825.3228122-1-qperret@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>