linux-2.6-microblaze.git
4 months agoMerge tag 'acpi-5.15-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael...
Linus Torvalds [Fri, 10 Sep 2021 20:29:04 +0000 (13:29 -0700)]
Merge tag 'acpi-5.15-rc1-3' of git://git./linux/kernel/git/rafael/linux-pm

Pull more ACPI updates from Rafael Wysocki:
 "These prevent a confusing PRMT-related message from being printed,
  drop an unnecessary header file include and update the list of ACPICA
  maintainers.

  Specifics:

   - Prevent a message about missing PRMT from being printed on systems
     that do not support PRM, which are the majority now (Aubrey Li).

   - Drop unnecessary header include from scan.c (Kari Argillander).

   - Update the list of ACPICA maintainers after recent departure of one
     of them (Rafael Wysocki)"

* tag 'acpi-5.15-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPICA: Update the list of maintainers
  ACPI: PRM: Find PRMT table before parsing it
  ACPI: scan: Remove unneeded header linux/nls.h

4 months agoMerge tag 'pm-5.15-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael...
Linus Torvalds [Fri, 10 Sep 2021 20:20:47 +0000 (13:20 -0700)]
Merge tag 'pm-5.15-rc1-3' of git://git./linux/kernel/git/rafael/linux-pm

Pull more power management updates from Rafael Wysocki:
 "These improve hybrid processors support in intel_pstate, fix an issue
  in the core devices PM code, clean up the handling of dedicated wake
  IRQs, update the Energy Model documentation and update MAINTAINERS.

  Specifics:

   - Make the HWP performance levels calibration on hybrid processors in
     intel_pstate more straightforward (Rafael Wysocki).

   - Prevent the PM core from leaving devices in suspend after a failing
     system-wide suspend transition in some cases when driver PM flags
     are used (Prasad Sodagudi).

   - Drop unused function argument from the dedicated wake IRQs handling
     code (Sergey Shtylyov).

   - Fix up Energy Model kerneldoc comments and include them in the
     Energy Model documentation (Lukasz Luba).

   - Use my kernel.org address in MAINTAINERS insead of the personal one
     (Rafael Wysocki)"

* tag 'pm-5.15-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  MAINTAINERS: Change Rafael's e-mail address
  PM: sleep: core: Avoid setting power.must_resume to false
  Documentation: power: include kernel-doc in Energy Model doc
  PM: EM: fix kernel-doc comments
  cpufreq: intel_pstate: hybrid: Rework HWP calibration
  ACPI: CPPC: Introduce cppc_get_nominal_perf()
  PM: sleep: wakeirq: drop useless parameter from dev_pm_attach_wake_irq()

4 months agoMerge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Linus Torvalds [Fri, 10 Sep 2021 18:58:20 +0000 (11:58 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux

Pull arm64 fixes from Catalin Marinas:

 - Limit the linear region to 51-bit when KVM is running in nVHE mode.

   Otherwise, depending on the placement of the ID map, kernel-VA to
   hyp-VA translations may produce addresses that either conflict with
   other HYP mappings or generate addresses outside of the 52-bit
   addressable range.

 - Instruct kmemleak not to scan the memory reserved for kdump as this
   range is removed from the kernel linear map and therefore not
   accessible.

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: kdump: Skip kmemleak scan reserved memory for kdump
  arm64: mm: limit linear region to 51 bits for KVM in nVHE mode

4 months agoMerge tag 'for-5.15/parisc-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller...
Linus Torvalds [Fri, 10 Sep 2021 18:52:01 +0000 (11:52 -0700)]
Merge tag 'for-5.15/parisc-3' of git://git./linux/kernel/git/deller/parisc-linux

Pull parisc fixes from Helge Deller:

 - Build warning fixes in Makefile and Dino PCI driver

 - Fix when sched_clock is marked unstable

 - Drop strnlen_user() in favour of generic version

 - Prevent kernel to write outside userspace signal stack

 - Remove CONFIG_SET_FS including KERNEL_DS and USER_DS from parisc and
   switch to __get/put_kernel_nofault()

* tag 'for-5.15/parisc-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: Implement __get/put_kernel_nofault()
  parisc: Mark sched_clock unstable only if clocks are not syncronized
  parisc: Move pci_dev_is_behind_card_dino to where it is used
  parisc: Reduce sigreturn trampoline to 3 instructions
  parisc: Check user signal stack trampoline is inside TASK_SIZE
  parisc: Drop useless debug info and comments from signal.c
  parisc: Drop strnlen_user() in favour of generic version
  parisc: Add missing FORCE prerequisite in Makefile

4 months agoMerge tag 'iommu-fixes-v5.15-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 10 Sep 2021 18:42:03 +0000 (11:42 -0700)]
Merge tag 'iommu-fixes-v5.15-rc0' of git://git./linux/kernel/git/joro/iommu

Pull iommu fixes from Joerg Roedel:

 - Intel VT-d:
     - PASID leakage in intel_svm_unbind_mm()
     - Deadlock in intel_svm_drain_prq()

 - AMD IOMMU: Fixes for an unhandled page-fault bug when AVIC is used
   for a KVM guest.

 - Make CONFIG_IOMMU_DEFAULT_DMA_LAZY architecture instead of IOMMU
   driver dependent

* tag 'iommu-fixes-v5.15-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu: Clarify default domain Kconfig
  iommu/vt-d: Fix a deadlock in intel_svm_drain_prq()
  iommu/vt-d: Fix PASID leak in intel_svm_unbind_mm()
  iommu/amd: Remove iommu_init_ga()
  iommu/amd: Relocate GAMSup check to early_enable_iommus

4 months agoMerge tag 'char-misc-5.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 10 Sep 2021 18:31:47 +0000 (11:31 -0700)]
Merge tag 'char-misc-5.15-rc1-2' of git://git./linux/kernel/git/gregkh/char-misc

Pull habanalabs updates from Greg KH:
 "Here is another round of misc driver patches for 5.15-rc1.

  In here is only updates for the Habanalabs driver. This request is
  late because the previously-objected-to dma-buf patches are all
  removed and some fixes that you and others found are now included in
  here as well.

  All of these have been in linux-next for well over a week with no
  reports of problems, and they are all self-contained to only this one
  driver. Full details are in the shortlog"

* tag 'char-misc-5.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (61 commits)
  habanalabs/gaudi: hwmon default card name
  habanalabs: add support for f/w reset
  habanalabs/gaudi: block ICACHE_BASE_ADDERESS_HIGH in TPC
  habanalabs: cannot sleep while holding spinlock
  habanalabs: never copy_from_user inside spinlock
  habanalabs: remove unnecessary device status check
  habanalabs: disable IRQ in user interrupts spinlock
  habanalabs: add "in device creation" status
  habanalabs/gaudi: invalidate PMMU mem cache on init
  habanalabs/gaudi: size should be printed in decimal
  habanalabs/gaudi: define DC POWER for secured PMC
  habanalabs/gaudi: unmask out of bounds SLM access interrupt
  habanalabs: add userptr_lookup node in debugfs
  habanalabs/gaudi: fetch TPC/MME ECC errors from F/W
  habanalabs: modify multi-CS to wait on stream masters
  habanalabs/gaudi: add monitored SOBs to state dump
  habanalabs/gaudi: restore user registers when context opens
  habanalabs/gaudi: increase boot fit timeout
  habanalabs: update to latest firmware headers
  habanalabs/gaudi: minimize number of register reads
  ...

4 months agoMerge branches 'acpi-scan' and 'acpi-prm'
Rafael J. Wysocki [Fri, 10 Sep 2021 18:27:07 +0000 (20:27 +0200)]
Merge branches 'acpi-scan' and 'acpi-prm'

* acpi-scan:
  ACPI: scan: Remove unneeded header linux/nls.h

* acpi-prm:
  ACPI: PRM: Find PRMT table before parsing it

4 months agoMerge branches 'pm-cpufreq', 'pm-sleep' and 'pm-em'
Rafael J. Wysocki [Fri, 10 Sep 2021 18:26:08 +0000 (20:26 +0200)]
Merge branches 'pm-cpufreq', 'pm-sleep' and 'pm-em'

* pm-cpufreq:
  cpufreq: intel_pstate: hybrid: Rework HWP calibration
  ACPI: CPPC: Introduce cppc_get_nominal_perf()

* pm-sleep:
  PM: sleep: core: Avoid setting power.must_resume to false
  PM: sleep: wakeirq: drop useless parameter from dev_pm_attach_wake_irq()

* pm-em:
  Documentation: power: include kernel-doc in Energy Model doc
  PM: EM: fix kernel-doc comments

4 months agoMerge tag 'drm-next-2021-09-10' of git://anongit.freedesktop.org/drm/drm
Linus Torvalds [Fri, 10 Sep 2021 18:22:23 +0000 (11:22 -0700)]
Merge tag 'drm-next-2021-09-10' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
 "Just an initial bunch of fixes for the merge window, amdgpu is most of
  them with a few ttm fixes and an fbdev avoid multiply overflow fix.

  core:
   - Make some dma-buf config options depend on DMA_SHARED_BUFFER
   - Handle multiplication overflow of fbdev xres/yres in the core

  ttm:
   - Fix ttm_bo_move_memcpy() when ttm_resource is subclassed
   - Fix ttm deadlock if target BO isn't idle
   - ttm build fix
   - ttm docs fix

  dma-buf:
   - config option fixes

  fbdev:
   - limit resolutions to avoid int overflow

  i915:
   - stddef change.

  amdgpu:
   - Misc cleanups, typo fixes
   - EEPROM fix
   - Add some new PCI IDs
   - Scatter/Gather display support for Yellow Carp
   - PCIe DPM fix for RKL platforms
   - RAS fix

  amdkfd:
   - SVM fix

  vc4:
   - static function fix

  mgag200:
   - fix uninit var

  panfrost:
   - lock_region fixes"

* tag 'drm-next-2021-09-10' of git://anongit.freedesktop.org/drm/drm: (36 commits)
  drm/ttm: Fix a deadlock if the target BO is not idle during swap
  fbmem: don't allow too huge resolutions
  dma-buf: DMABUF_SYSFS_STATS should depend on DMA_SHARED_BUFFER
  dma-buf: DMABUF_DEBUG should depend on DMA_SHARED_BUFFER
  drm/i915: use linux/stddef.h due to "isystem: trim/fixup stdarg.h and other headers"
  dma-buf: DMABUF_MOVE_NOTIFY should depend on DMA_SHARED_BUFFER
  drm/amdkfd: drop process ref count when xnack disable
  drm/amdgpu: enable more pm sysfs under SRIOV 1-VF mode
  drm/amdgpu: fix fdinfo race with process exit
  drm/amdgpu: Fix a deadlock if previous GEM object allocation fails
  drm/amdgpu: stop scheduler when calling hw_fini (v2)
  drm/amdgpu: Clear RAS interrupt status on aldebaran
  drm/amd/display: Initialize lt_settings on instantiation
  drm/amd/display: cleanup idents after a revert
  drm/amd/display: Fix memory leak reported by coverity
  drm/ttm: Fix ttm_bo_move_memcpy() for subclassed struct ttm_resource
  drm/amdgpu/swsmu: fix spelling mistake "minimun" -> "minimum"
  drm/amdgpu: Disable PCIE_DPM on Intel RKL Platform
  drm/amdgpu: show both cmd id and name when psp cmd failed
  drm/amd/display: setup system context for APUs
  ...

4 months agofsnotify: fix sb_connectors leak
Amir Goldstein [Thu, 9 Sep 2021 11:56:34 +0000 (14:56 +0300)]
fsnotify: fix sb_connectors leak

Fix a leak in s_fsnotify_connectors counter in case of a race between
concurrent add of new fsnotify mark to an object.

The task that lost the race fails to drop the counter before freeing
the unused connector.

Following umount() hangs in fsnotify_sb_delete()/wait_var_event(),
because s_fsnotify_connectors never drops to zero.

Fixes: ec44610fe2b8 ("fsnotify: count all objects with attached connectors")
Reported-by: Murphy Zhou <jencce.kernel@gmail.com>
Link: https://lore.kernel.org/linux-fsdevel/20210907063338.ycaw6wvhzrfsfdlp@xzhoux.usersys.redhat.com/
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 months agoMAINTAINERS: Change Rafael's e-mail address
Rafael J. Wysocki [Fri, 10 Sep 2021 12:45:57 +0000 (14:45 +0200)]
MAINTAINERS: Change Rafael's e-mail address

I have been slow to respond to messages going to rjw@rjwysocki.net
recently, so change it to rafael@kernel.org (which works better for
me) in MAINTAINERS.

Signed-off-by: Rafael J. Wysocki <rafael@kernel.org>
4 months agoACPICA: Update the list of maintainers
Rafael J. Wysocki [Fri, 10 Sep 2021 12:42:51 +0000 (14:42 +0200)]
ACPICA: Update the list of maintainers

Erik Kaneda will not be maintaining ACPICA any more, so drop his
address (which doesn't work any more anyway) from the maintainer
list.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
4 months agoarm64: kdump: Skip kmemleak scan reserved memory for kdump
Chen Wandun [Fri, 10 Sep 2021 06:48:44 +0000 (14:48 +0800)]
arm64: kdump: Skip kmemleak scan reserved memory for kdump

Trying to boot with kdump + kmemleak, command will result in a crash:
"echo scan > /sys/kernel/debug/kmemleak"

crashkernel reserved: 0x0000000007c00000 - 0x0000000027c00000 (512 MB)
Kernel command line: BOOT_IMAGE=(hd1,gpt2)/vmlinuz-5.14.0-rc5-next-20210809+ root=/dev/mapper/ao-root ro rd.lvm.lv=ao/root rd.lvm.lv=ao/swap crashkernel=512M
Unable to handle kernel paging request at virtual address ffff000007c00000
Mem abort info:
  ESR = 0x96000007
  EC = 0x25: DABT (current EL), IL = 32 bits
  SET = 0, FnV = 0
  EA = 0, S1PTW = 0
  FSC = 0x07: level 3 translation fault
Data abort info:
  ISV = 0, ISS = 0x00000007
  CM = 0, WnR = 0
swapper pgtable: 64k pages, 48-bit VAs, pgdp=00002024f0d80000
[ffff000007c00000] pgd=1800205ffffd0003, p4d=1800205ffffd0003, pud=1800205ffffd0003, pmd=1800205ffffc0003, pte=0068000007c00f06
Internal error: Oops: 96000007 [#1] SMP
pstate: 804000c9 (Nzcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : scan_block+0x98/0x230
lr : scan_block+0x94/0x230
sp : ffff80008d6cfb70
x29: ffff80008d6cfb70 x28: 0000000000000000 x27: 0000000000000000
x26: 00000000000000c0 x25: 0000000000000001 x24: 0000000000000000
x23: ffffa88a6b18b398 x22: ffff000007c00ff9 x21: ffffa88a6ac7fc40
x20: ffffa88a6af6a830 x19: ffff000007c00000 x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000 x15: ffffffffffffffff
x14: ffffffff00000000 x13: ffffffffffffffff x12: 0000000000000020
x11: 0000000000000000 x10: 0000000001080000 x9 : ffffa88a6951c77c
x8 : ffffa88a6a893988 x7 : ffff203ff6cfb3c0 x6 : ffffa88a6a52b3c0
x5 : ffff203ff6cfb3c0 x4 : 0000000000000000 x3 : 0000000000000000
x2 : 0000000000000001 x1 : ffff20226cb56a40 x0 : 0000000000000000
Call trace:
 scan_block+0x98/0x230
 scan_gray_list+0x120/0x270
 kmemleak_scan+0x3a0/0x648
 kmemleak_write+0x3ac/0x4c8
 full_proxy_write+0x6c/0xa0
 vfs_write+0xc8/0x2b8
 ksys_write+0x70/0xf8
 __arm64_sys_write+0x24/0x30
 invoke_syscall+0x4c/0x110
 el0_svc_common+0x9c/0x190
 do_el0_svc+0x30/0x98
 el0_svc+0x28/0xd8
 el0t_64_sync_handler+0x90/0xb8
 el0t_64_sync+0x180/0x184

The reserved memory for kdump will be looked up by kmemleak, this area
will be set invalid when kdump service is bring up. That will result in
crash when kmemleak scan this area.

Fixes: a7259df76702 ("memblock: make memblock_find_in_range method private")
Signed-off-by: Chen Wandun <chenwandun@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20210910064844.3827813-1-chenwandun@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
4 months agodrm/ttm: Fix a deadlock if the target BO is not idle during swap
xinhui pan [Tue, 7 Sep 2021 04:08:32 +0000 (12:08 +0800)]
drm/ttm: Fix a deadlock if the target BO is not idle during swap

The ret value might be -EBUSY, caller will think lru lock is still
locked but actually NOT. So return -ENOSPC instead. Otherwise we hit
list corruption.

ttm_bo_cleanup_refs might fail too if BO is not idle. If we return 0,
caller(ttm_tt_populate -> ttm_global_swapout ->ttm_device_swapout) will
be stuck as we actually did not free any BO memory. This usually happens
when the fence is not signaled for a long time.

Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Fixes: ebd59851c796 ("drm/ttm: move swapout logic around v3")
Link: https://patchwork.freedesktop.org/patch/msgid/20210907040832.1107747-1-xinhui.pan@amd.com
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
4 months agoMerge tag 'drm-misc-next-fixes-2021-09-09' of git://anongit.freedesktop.org/drm/drm...
Dave Airlie [Fri, 10 Sep 2021 04:18:33 +0000 (14:18 +1000)]
Merge tag 'drm-misc-next-fixes-2021-09-09' of git://anongit.freedesktop.org/drm/drm-misc into drm-next

drm-misc-next-fixes for v5.15:
- Make some dma-buf config options depend on DMA_SHARED_BUFFER.
- Handle multiplication overflow of fbdev xres/yres in the core.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/37c5fe2e-5be8-45c3-286b-d8d536a5cef2@linux.intel.com
4 months agoMerge tag '5.15-rc-ksmbd-part2' of git://git.samba.org/ksmbd
Linus Torvalds [Thu, 9 Sep 2021 23:17:14 +0000 (16:17 -0700)]
Merge tag '5.15-rc-ksmbd-part2' of git://git.samba.org/ksmbd

Pull ksmbd fixes from Steve French:

 - various fixes pointed out by coverity, and a minor cleanup patch

 - id mapping and ownership fixes

 - an smbdirect fix

* tag '5.15-rc-ksmbd-part2' of git://git.samba.org/ksmbd:
  ksmbd: fix control flow issues in sid_to_id()
  ksmbd: fix read of uninitialized variable ret in set_file_basic_info
  ksmbd: add missing assignments to ret on ndr_read_int64 read calls
  ksmbd: add validation for ndr read/write functions
  ksmbd: remove unused ksmbd_file_table_flush function
  ksmbd: smbd: fix dma mapping error in smb_direct_post_send_data
  ksmbd: Reduce error log 'speed is unknown' to debug
  ksmbd: defer notify_change() call
  ksmbd: remove setattr preparations in set_file_basic_info()
  ksmbd: ensure error is surfaced in set_file_basic_info()
  ndr: fix translation in ndr_encode_posix_acl()
  ksmbd: fix translation in sid_to_id()
  ksmbd: fix subauth 0 handling in sid_to_id()
  ksmbd: fix translation in acl entries
  ksmbd: fix translation in ksmbd_acls_fattr()
  ksmbd: fix translation in create_posix_rsp_buf()
  ksmbd: fix translation in smb2_populate_readdir_entry()
  ksmbd: fix lookup on idmapped mounts

4 months agoMerge tag 'for-5.15-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Linus Torvalds [Thu, 9 Sep 2021 23:09:56 +0000 (16:09 -0700)]
Merge tag 'for-5.15-tag' of git://git./linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

 - fix max_inline mount option limit on 64k page system

 - lockdep fixes:
     - update bdev time in a safer way
     - move bdev put outside of sb write section when removing device
     - fix possible deadlock when mounting seed/sprout filesystem

 - zoned mode: fix split extent accounting

 - minor include fixup

* tag 'for-5.15-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: zoned: fix double counting of split ordered extent
  btrfs: fix lockdep warning while mounting sprout fs
  btrfs: delay blkdev_put until after the device remove
  btrfs: update the bdev time directly when closing
  btrfs: use correct header for div_u64 in misc.h
  btrfs: fix upper limit for max_inline for page size 64K

4 months agoMerge tag 'sound-fix-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai...
Linus Torvalds [Thu, 9 Sep 2021 23:05:10 +0000 (16:05 -0700)]
Merge tag 'sound-fix-5.15-rc1' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "A collection of small fixes that have been gathered before rc1,
  including a few regression fixes for the problem in the previous pull
  request"

* tag 'sound-fix-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: gus: Fix repeated probe for ISA interwave card
  ALSA: gus: Fix repeated probes of snd_gus_create()
  ALSA: vx222: fix null-ptr-deref
  ASoC: rockchip: i2s: Fix concurrency between tx/rx
  ASoC: mt8195: correct the dts parsing logic about DPTX and HDMITX
  ASoC: Intel: boards: Fix CONFIG_SND_SOC_SDW_MOCKUP select
  ASoC: dt-bindings: fsl_rpmsg: Add compatible string for i.MX8ULP
  ALSA: usb-audio: Add registration quirk for JBL Quantum 800
  ASoC: rt5682: fix headset background noise when S3 state
  ASoC: dt-bindings: mt8195: remove dependent headers in the example
  ASoC: mediatek: SND_SOC_MT8195 should depend on ARCH_MEDIATEK
  ASoC: samsung: s3c24xx_simtec: fix spelling mistake "devicec" -> "device"
  ASoC: audio-graph: respawn Platform Support
  ASoC: mediatek: mt8195: add MTK_PMIC_WRAP dependency

4 months agoparisc: Implement __get/put_kernel_nofault()
Helge Deller [Thu, 9 Sep 2021 10:47:00 +0000 (12:47 +0200)]
parisc: Implement __get/put_kernel_nofault()

Remove CONFIG_SET_FS from parisc, so we need to add
__get_kernel_nofault() and __put_kernel_nofault(), define
HAVE_GET_KERNEL_NOFAULT and remove set_fs(), get_fs(), load_sr2(),
thread_info->addr_limit, KERNEL_DS and USER_DS.

The nice side-effect of this patch is that we now can directly access
userspace via sr3 without the need to use a temporary sr2 which is
either copied from sr3 or set to zero (for kernel space).

Signed-off-by: Helge Deller <deller@gmx.de>
Suggested-by: Arnd Bergmann <arnd@kernel.org>
4 months agoMerge tag 'for-linus-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Linus Torvalds [Thu, 9 Sep 2021 20:45:26 +0000 (13:45 -0700)]
Merge tag 'for-linus-5.15-rc1' of git://git./linux/kernel/git/rw/uml

Pull UML updates from Richard Weinberger:

 - Support for VMAP_STACK

 - Support for splice_write in hostfs

 - Fixes for virt-pci

 - Fixes for virtio_uml

 - Various fixes

* tag 'for-linus-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
  um: fix stub location calculation
  um: virt-pci: fix uapi documentation
  um: enable VMAP_STACK
  um: virt-pci: don't do DMA from stack
  hostfs: support splice_write
  um: virtio_uml: fix memory leak on init failures
  um: virtio_uml: include linux/virtio-uml.h
  lib/logic_iomem: fix sparse warnings
  um: make PCI emulation driver init/exit static

4 months agoMerge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm
Linus Torvalds [Thu, 9 Sep 2021 20:25:49 +0000 (13:25 -0700)]
Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm

Pull ARM development updates from Russell King:

 - Rename "mod_init" and "mod_exit" so that initcall debug output is
   actually useful (Randy Dunlap)

 - Update maintainers entries for linux-arm-kernel to indicate it is
   moderated for non-subscribers (Randy Dunlap)

 - Move install rules to arch/arm/Makefile (Masahiro Yamada)

 - Drop unnecessary ARCH_NR_GPIOS definition (Linus Walleij)

 - Don't warn about atags_to_fdt() stack size (David Heidelberg)

 - Speed up unaligned copy_{from,to}_kernel_nofault (Arnd Bergmann)

 - Get rid of set_fs() usage (Arnd Bergmann)

 - Remove checks for GCC prior to v4.6 (Geert Uytterhoeven)

* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
  ARM: 9118/1: div64: Remove always-true __div64_const32_is_OK() duplicate
  ARM: 9117/1: asm-generic: div64: Remove always-true __div64_const32_is_OK()
  ARM: 9116/1: unified: Remove check for gcc < 4
  ARM: 9110/1: oabi-compat: fix oabi epoll sparse warning
  ARM: 9113/1: uaccess: remove set_fs() implementation
  ARM: 9112/1: uaccess: add __{get,put}_kernel_nofault
  ARM: 9111/1: oabi-compat: rework fcntl64() emulation
  ARM: 9114/1: oabi-compat: rework sys_semtimedop emulation
  ARM: 9108/1: oabi-compat: rework epoll_wait/epoll_pwait emulation
  ARM: 9107/1: syscall: always store thread_info->abi_syscall
  ARM: 9109/1: oabi-compat: add epoll_pwait handler
  ARM: 9106/1: traps: use get_kernel_nofault instead of set_fs()
  ARM: 9115/1: mm/maccess: fix unaligned copy_{from,to}_kernel_nofault
  ARM: 9105/1: atags_to_fdt: don't warn about stack size
  ARM: 9103/1: Drop ARCH_NR_GPIOS definition
  ARM: 9102/1: move theinstall rules to arch/arm/Makefile
  ARM: 9100/1: MAINTAINERS: mark all linux-arm-kernel@infradead list as moderated
  ARM: 9099/1: crypto: rename 'mod_init' & 'mod_exit' functions to be module-specific

4 months agoMerge tag 'trace-v5.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt...
Linus Torvalds [Thu, 9 Sep 2021 20:11:15 +0000 (13:11 -0700)]
Merge tag 'trace-v5.15-2' of git://git./linux/kernel/git/rostedt/linux-trace

Pull more tracing updates from Steven Rostedt:

 - Add migrate-disable counter to tracing header

 - Fix error handling in event probes

 - Fix missed unlock in osnoise in error path

 - Fix merge issue with tools/bootconfig

 - Clean up bootconfig data when init memory is removed

 - Fix bootconfig to loop only on subkeys

 - Have kernel command lines override bootconfig options

 - Increase field counts for synthetic events

 - Have histograms dynamic allocate event elements to save space

 - Fixes in testing and documentation

* tag 'trace-v5.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing/boot: Fix to loop on only subkeys
  selftests/ftrace: Exclude "(fault)" in testing add/remove eprobe events
  tracing: Dynamically allocate the per-elt hist_elt_data array
  tracing: synth events: increase max fields count
  tools/bootconfig: Show whole test command for each test case
  bootconfig: Fix missing return check of xbc_node_compose_key function
  tools/bootconfig: Fix tracing_on option checking in ftrace2bconf.sh
  docs: bootconfig: Add how to use bootconfig for kernel parameters
  init/bootconfig: Reorder init parameter from bootconfig and cmdline
  init: bootconfig: Remove all bootconfig data when the init memory is removed
  tracing/osnoise: Fix missed cpus_read_unlock() in start_per_cpu_kthreads()
  tracing: Fix some alloc_event_probe() error handling bugs
  tracing: Add migrate-disabled counter to tracing output.

4 months agoMerge tag 's390-5.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Linus Torvalds [Thu, 9 Sep 2021 19:55:12 +0000 (12:55 -0700)]
Merge tag 's390-5.15-2' of git://git./linux/kernel/git/s390/linux

Pull more s390 updates from Heiko Carstens:
 "Except for the xpram device driver removal it is all about fixes and
  cleanups.

   - Fix topology update on cpu hotplug, so notifiers see expected
     masks. This bug was uncovered with SCHED_CORE support.

   - Fix stack unwinding so that the correct number of entries are
     omitted like expected by common code. This fixes KCSAN selftests.

   - Add kmemleak annotation to stack_alloc to avoid false positive
     kmemleak warnings.

   - Avoid layering violation in common I/O code and don't unregister
     subchannel from child-drivers.

   - Remove xpram device driver for which no real use case exists since
     the kernel is 64 bit only. Also all hypervisors got required
     support removed in the meantime, which means the xpram device
     driver is dead code.

   - Fix -ENODEV handling of clp_get_state in our PCI code.

   - Enable KFENCE in debug defconfig.

   - Cleanup hugetlbfs s390 specific Kconfig dependency.

   - Quite a lot of trivial fixes to get rid of "W=1" warnings, and and
     other simple cleanups"

* tag 's390-5.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  hugetlbfs: s390 is always 64bit
  s390/ftrace: remove incorrect __va usage
  s390/zcrypt: remove incorrect kernel doc indicators
  scsi: zfcp: fix kernel doc comments
  s390/sclp: add __nonstring annotation
  s390/hmcdrv_ftp: fix kernel doc comment
  s390: remove xpram device driver
  s390/pci: read clp_list_pci_req only once
  s390/pci: fix clp_get_state() handling of -ENODEV
  s390/cio: fix kernel doc comment
  s390/ctrlchar: fix kernel doc comment
  s390/con3270: use proper type for tasklet function
  s390/cpum_cf: move array from header to C file
  s390/mm: fix kernel doc comments
  s390/topology: fix topology information when calling cpu hotplug notifiers
  s390/unwind: use current_frame_address() to unwind current task
  s390/configs: enable CONFIG_KFENCE in debug_defconfig
  s390/entry: make oklabel within CHKSTG macro local
  s390: add kmemleak annotation in stack_alloc()
  s390/cio: dont unregister subchannel from child-drivers

4 months agoMerge branch 'work.gfs2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Linus Torvalds [Thu, 9 Sep 2021 19:45:26 +0000 (12:45 -0700)]
Merge branch 'work.gfs2' of git://git./linux/kernel/git/viro/vfs

Pull gfs2 setattr updates from Al Viro:
 "Make it possible for filesystems to use a generic 'may_setattr()' and
  switch gfs2 to using it"

* 'work.gfs2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  gfs2: Switch to may_setattr in gfs2_setattr
  fs: Move notify_change permission checks into may_setattr

4 months agoMerge branch 'work.init' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Linus Torvalds [Thu, 9 Sep 2021 19:38:18 +0000 (12:38 -0700)]
Merge branch 'work.init' of git://git./linux/kernel/git/viro/vfs

Pull root filesystem type handling updates from Al Viro:
 "Teach init/do_mounts.c to handle non-block filesystems, hopefully
  preventing even more special-cased kludges (such as root=/dev/nfs,
  etc)"

* 'work.init' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fs: simplify get_filesystem_list / get_all_fs_names
  init: allow mounting arbitrary non-blockdevice filesystems as root
  init: split get_fs_names

4 months agoMerge branch 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Linus Torvalds [Thu, 9 Sep 2021 19:13:46 +0000 (12:13 -0700)]
Merge branch 'work.iov_iter' of git://git./linux/kernel/git/viro/vfs

Pull iov_iter fixes from Al Viro:
 "Fixes for io-uring handling of iov_iter reexpands"

* 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  io_uring: reexpand under-reexpanded iters
  iov_iter: track truncated size

4 months agoMerge tag 'cxl-for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Linus Torvalds [Thu, 9 Sep 2021 18:48:27 +0000 (11:48 -0700)]
Merge tag 'cxl-for-5.15' of git://git./linux/kernel/git/cxl/cxl

Pull CXL (Compute Express Link) updates from Dan Williams:

 - Fix detection of CXL host bridges to filter out disabled ACPI0016
   devices in the ACPI DSDT.

 - Fix kernel lockdown integration to disable raw commands when raw PCI
   access is disabled.

 - Fix a broken debug message.

 - Add support for "Get Partition Info". I.e. enumerate the split
   between volatile and persistent capacity on bi-modal CXL memory
   expanders.

 - Re-factor the core by subject area. This is a work in progress.

 - Prepare libnvdimm to understand CXL labels in addition to EFI labels.
   This is a work in progress.

* tag 'cxl-for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (25 commits)
  cxl/registers: Fix Documentation warning
  cxl/pmem: Fix Documentation warning
  cxl/uapi: Fix defined but not used warnings
  cxl/pci: Fix debug message in cxl_probe_regs()
  cxl/pci: Fix lockdown level
  cxl/acpi: Do not add DSDT disabled ACPI0016 host bridge ports
  libnvdimm/labels: Add claim class helpers
  libnvdimm/labels: Add type-guid helpers
  libnvdimm/labels: Add blk special cases for nlabel and position helpers
  libnvdimm/labels: Add blk isetcookie set / validation helpers
  libnvdimm/labels: Add a checksum calculation helper
  libnvdimm/labels: Introduce label setter helpers
  libnvdimm/labels: Add isetcookie validation helper
  libnvdimm/labels: Introduce getters for namespace label fields
  cxl/mem: Adjust ram/pmem range to represent DPA ranges
  cxl/mem: Account for partitionable space in ram/pmem ranges
  cxl/pci: Store memory capacity values
  cxl/pci: Simplify register setup
  cxl/pci: Ignore unknown register block types
  cxl/core: Move memdev management to core
  ...

4 months agoMerge tag 'libnvdimm-for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdim...
Linus Torvalds [Thu, 9 Sep 2021 18:39:57 +0000 (11:39 -0700)]
Merge tag 'libnvdimm-for-5.15' of git://git./linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm updates from Dan Williams:

 - Fix a race condition in the teardown path of raw mode pmem
   namespaces.

 - Cleanup the code that filesystems use to detect filesystem-dax
   capabilities of their underlying block device.

* tag 'libnvdimm-for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  dax: remove bdev_dax_supported
  xfs: factor out a xfs_buftarg_is_dax helper
  dax: stub out dax_supported for !CONFIG_FS_DAX
  dax: remove __generic_fsdax_supported
  dax: move the dax_read_lock() locking into dax_supported
  dax: mark dax_get_by_host static
  dm: use fs_dax_get_by_bdev instead of dax_get_by_host
  dax: stop using bdevname
  fsdax: improve the FS_DAX Kconfig description and help text
  libnvdimm/pmem: Fix crash triggered when I/O in-flight during unbind

4 months agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Linus Torvalds [Thu, 9 Sep 2021 18:14:14 +0000 (11:14 -0700)]
Merge tag 'for-linus' of git://git./linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
 "I don't usually send a second PR in the merge window, but the fix to
  mlx5 is significant enough that it should start going through the
  process ASAP. Along with it comes some of the usual -rc stuff that
  would normally wait for a -rc2 or so.

  Summary:

  Important error case regression fixes in mlx5:

   - Wrong size used when computing the error path smaller allocation
     request leads to corruption

   - Confusing but ultimately harmless alignment mis-calculation

  Static checker warning fixes:

   - NULL pointer subtraction in qib

   - kcalloc in bnxt_re

   - Missing static on global variable in hfi1"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  IB/hfi1: make hist static
  RDMA/bnxt_re: Prefer kcalloc over open coded arithmetic
  IB/qib: Fix null pointer subtraction compiler warning
  RDMA/mlx5: Fix xlt_chunk_align calculation
  RDMA/mlx5: Fix number of allocated XLT entries

4 months agoMerge tag 'dmaengine-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul...
Linus Torvalds [Thu, 9 Sep 2021 18:07:47 +0000 (11:07 -0700)]
Merge tag 'dmaengine-5.15-rc1' of git://git./linux/kernel/git/vkoul/dmaengine

Pull dmaengine updates from Vinod Koul:
 "New drivers/devices
   - Support for Renesas RZ/G2L dma controller
   - New driver for AMD PTDMA controller

  Updates:
   - Big pile of idxd updates
   - Updates for Altera driver, stm32-dma, dw etc"

* tag 'dmaengine-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: (83 commits)
  dmaengine: sh: fix some NULL dereferences
  dmaengine: sh: Fix unused initialization of pointer lmdesc
  MAINTAINERS: Fix AMD PTDMA DRIVER entry
  dmaengine: ptdma: remove PT_OFFSET to avoid redefnition
  dmaengine: ptdma: Add debugfs entries for PTDMA
  dmaengine: ptdma: register PTDMA controller as a DMA resource
  dmaengine: ptdma: Initial driver for the AMD PTDMA
  dmaengine: fsl-dpaa2-qdma: Fix spelling mistake "faile" -> "failed"
  dmaengine: idxd: remove interrupt disable for dev_lock
  dmaengine: idxd: remove interrupt disable for cmd_lock
  dmaengine: idxd: fix setting up priv mode for dwq
  dmaengine: xilinx_dma: Set DMA mask for coherent APIs
  dmaengine: ti: k3-psil-j721e: Add entry for CSI2RX
  dmaengine: sh: Add DMAC driver for RZ/G2L SoC
  dmaengine: Extend the dma_slave_width for 128 bytes
  dt-bindings: dma: Document RZ/G2L bindings
  dmaengine: ioat: depends on !UML
  dmaengine: idxd: set descriptor allocation size to threshold for swq
  dmaengine: idxd: make submit failure path consistent on desc freeing
  dmaengine: idxd: remove interrupt flag for completion list spinlock
  ...

4 months agoarm64: mm: limit linear region to 51 bits for KVM in nVHE mode
Ard Biesheuvel [Thu, 26 Aug 2021 16:56:13 +0000 (18:56 +0200)]
arm64: mm: limit linear region to 51 bits for KVM in nVHE mode

KVM in nVHE mode divides up its VA space into two equal halves, and
picks the half that does not conflict with the HYP ID map to map its
linear region. This worked fine when the kernel's linear map itself was
guaranteed to cover precisely as many bits of VA space, but this was
changed by commit f4693c2716b35d08 ("arm64: mm: extend linear region for
52-bit VA configurations").

The result is that, depending on the placement of the ID map, kernel-VA
to hyp-VA translations may produce addresses that either conflict with
other HYP mappings (including the ID map itself) or generate addresses
outside of the 52-bit addressable range, neither of which is likely to
lead to anything useful.

Given that 52-bit capable cores are guaranteed to implement VHE, this
only affects configurations such as pKVM where we opt into non-VHE mode
even if the hardware is VHE capable. So just for these configurations,
let's limit the kernel linear map to 51 bits and work around the
problem.

Fixes: f4693c2716b3 ("arm64: mm: extend linear region for 52-bit VA configurations")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20210826165613.60774-1-ardb@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
4 months agoiommu: Clarify default domain Kconfig
Robin Murphy [Wed, 8 Sep 2021 12:55:37 +0000 (13:55 +0100)]
iommu: Clarify default domain Kconfig

Although strictly it is the AMD and Intel drivers which have an existing
expectation of lazy behaviour by default, it ends up being rather
unintuitive to describe this literally in Kconfig. Express it instead as
an architecture dependency, to clarify that it is a valid config-time
decision. The end result is the same since virtio-iommu doesn't support
lazy mode and thus falls back to strict at runtime regardless.

The per-architecture disparity is a matter of historical expectations:
the AMD and Intel drivers have been lazy by default since 2008, and
changing that gets noticed by people asking where their I/O throughput
has gone. Conversely, Arm-based systems with their wider assortment of
IOMMU drivers mostly only support strict mode anyway; only the Arm SMMU
drivers have later grown support for passthrough and lazy mode, for
users who wanted to explicitly trade off isolation for performance.
These days, reducing the default level of isolation in a way which may
go unnoticed by users who expect otherwise hardly seems worth risking
for the sake of one line of Kconfig, so here's where we are.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/69a0c6f17b000b54b8333ee42b3124c1d5a869e2.1631105737.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
4 months agoiommu/vt-d: Fix a deadlock in intel_svm_drain_prq()
Fenghua Yu [Sat, 28 Aug 2021 07:06:22 +0000 (15:06 +0800)]
iommu/vt-d: Fix a deadlock in intel_svm_drain_prq()

pasid_mutex and dev->iommu->param->lock are held while unbinding mm is
flushing IO page fault workqueue and waiting for all page fault works to
finish. But an in-flight page fault work also need to hold the two locks
while unbinding mm are holding them and waiting for the work to finish.
This may cause an ABBA deadlock issue as shown below:

idxd 0000:00:0a.0: unbind PASID 2
======================================================
WARNING: possible circular locking dependency detected
5.14.0-rc7+ #549 Not tainted [  186.615245] ----------
dsa_test/898 is trying to acquire lock:
ffff888100d854e8 (&param->lock){+.+.}-{3:3}, at:
iopf_queue_flush_dev+0x29/0x60
but task is already holding lock:
ffffffff82b2f7c8 (pasid_mutex){+.+.}-{3:3}, at:
intel_svm_unbind+0x34/0x1e0
which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (pasid_mutex){+.+.}-{3:3}:
       __mutex_lock+0x75/0x730
       mutex_lock_nested+0x1b/0x20
       intel_svm_page_response+0x8e/0x260
       iommu_page_response+0x122/0x200
       iopf_handle_group+0x1c2/0x240
       process_one_work+0x2a5/0x5a0
       worker_thread+0x55/0x400
       kthread+0x13b/0x160
       ret_from_fork+0x22/0x30

-> #1 (&param->fault_param->lock){+.+.}-{3:3}:
       __mutex_lock+0x75/0x730
       mutex_lock_nested+0x1b/0x20
       iommu_report_device_fault+0xc2/0x170
       prq_event_thread+0x28a/0x580
       irq_thread_fn+0x28/0x60
       irq_thread+0xcf/0x180
       kthread+0x13b/0x160
       ret_from_fork+0x22/0x30

-> #0 (&param->lock){+.+.}-{3:3}:
       __lock_acquire+0x1134/0x1d60
       lock_acquire+0xc6/0x2e0
       __mutex_lock+0x75/0x730
       mutex_lock_nested+0x1b/0x20
       iopf_queue_flush_dev+0x29/0x60
       intel_svm_drain_prq+0x127/0x210
       intel_svm_unbind+0xc5/0x1e0
       iommu_sva_unbind_device+0x62/0x80
       idxd_cdev_release+0x15a/0x200 [idxd]
       __fput+0x9c/0x250
       ____fput+0xe/0x10
       task_work_run+0x64/0xa0
       exit_to_user_mode_prepare+0x227/0x230
       syscall_exit_to_user_mode+0x2c/0x60
       do_syscall_64+0x48/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xae

other info that might help us debug this:

Chain exists of:
  &param->lock --> &param->fault_param->lock --> pasid_mutex

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(pasid_mutex);
       lock(&param->fault_param->lock);
       lock(pasid_mutex);
  lock(&param->lock);

 *** DEADLOCK ***

2 locks held by dsa_test/898:
 #0: ffff888100cc1cc0 (&group->mutex){+.+.}-{3:3}, at:
 iommu_sva_unbind_device+0x53/0x80
 #1: ffffffff82b2f7c8 (pasid_mutex){+.+.}-{3:3}, at:
 intel_svm_unbind+0x34/0x1e0

stack backtrace:
CPU: 2 PID: 898 Comm: dsa_test Not tainted 5.14.0-rc7+ #549
Hardware name: Intel Corporation Kabylake Client platform/KBL S
DDR4 UD IMM CRB, BIOS KBLSE2R1.R00.X050.P01.1608011715 08/01/2016
Call Trace:
 dump_stack_lvl+0x5b/0x74
 dump_stack+0x10/0x12
 print_circular_bug.cold+0x13d/0x142
 check_noncircular+0xf1/0x110
 __lock_acquire+0x1134/0x1d60
 lock_acquire+0xc6/0x2e0
 ? iopf_queue_flush_dev+0x29/0x60
 ? pci_mmcfg_read+0xde/0x240
 __mutex_lock+0x75/0x730
 ? iopf_queue_flush_dev+0x29/0x60
 ? pci_mmcfg_read+0xfd/0x240
 ? iopf_queue_flush_dev+0x29/0x60
 mutex_lock_nested+0x1b/0x20
 iopf_queue_flush_dev+0x29/0x60
 intel_svm_drain_prq+0x127/0x210
 ? intel_pasid_tear_down_entry+0x22e/0x240
 intel_svm_unbind+0xc5/0x1e0
 iommu_sva_unbind_device+0x62/0x80
 idxd_cdev_release+0x15a/0x200

pasid_mutex protects pasid and svm data mapping data. It's unnecessary
to hold pasid_mutex while flushing the workqueue. To fix the deadlock
issue, unlock pasid_pasid during flushing the workqueue to allow the works
to be handled.

Fixes: d5b9e4bfe0d8 ("iommu/vt-d: Report prq to io-pgfault framework")
Reported-and-tested-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20210826215918.4073446-1-fenghua.yu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210828070622.2437559-3-baolu.lu@linux.intel.com
[joro: Removed timing information from kernel log messages]
Signed-off-by: Joerg Roedel <jroedel@suse.de>
4 months agoiommu/vt-d: Fix PASID leak in intel_svm_unbind_mm()
Fenghua Yu [Sat, 28 Aug 2021 07:06:21 +0000 (15:06 +0800)]
iommu/vt-d: Fix PASID leak in intel_svm_unbind_mm()

The mm->pasid will be used in intel_svm_free_pasid() after load_pasid()
during unbinding mm. Clearing it in load_pasid() will cause PASID cannot
be freed in intel_svm_free_pasid().

Additionally mm->pasid was updated already before load_pasid() during pasid
allocation. No need to update it again in load_pasid() during binding mm.
Don't update mm->pasid to avoid the issues in both binding mm and unbinding
mm.

Fixes: 4048377414162 ("iommu/vt-d: Use iommu_sva_alloc(free)_pasid() helpers")
Reported-and-tested-by: Dave Jiang <dave.jiang@intel.com>
Co-developed-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20210826215918.4073446-1-fenghua.yu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210828070622.2437559-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
4 months agoiommu/amd: Remove iommu_init_ga()
Suravee Suthikulpanit [Fri, 20 Aug 2021 20:29:57 +0000 (15:29 -0500)]
iommu/amd: Remove iommu_init_ga()

Since the function has been simplified and only call iommu_init_ga_log(),
remove the function and replace with iommu_init_ga_log() instead.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Link: https://lore.kernel.org/r/20210820202957.187572-4-suravee.suthikulpanit@amd.com
Fixes: 8bda0cfbdc1a ("iommu/amd: Detect and initialize guest vAPIC log")
Signed-off-by: Joerg Roedel <jroedel@suse.de>
4 months agoiommu/amd: Relocate GAMSup check to early_enable_iommus
Wei Huang [Fri, 20 Aug 2021 20:29:55 +0000 (15:29 -0500)]
iommu/amd: Relocate GAMSup check to early_enable_iommus

Currently, iommu_init_ga() checks and disables IOMMU VAPIC support
(i.e. AMD AVIC support in IOMMU) when GAMSup feature bit is not set.
However it forgets to clear IRQ_POSTING_CAP from the previously set
amd_iommu_irq_ops.capability.

This triggers an invalid page fault bug during guest VM warm reboot
if AVIC is enabled since the irq_remapping_cap(IRQ_POSTING_CAP) is
incorrectly set, and crash the system with the following kernel trace.

    BUG: unable to handle page fault for address: 0000000000400dd8
    RIP: 0010:amd_iommu_deactivate_guest_mode+0x19/0xbc
    Call Trace:
     svm_set_pi_irte_mode+0x8a/0xc0 [kvm_amd]
     ? kvm_make_all_cpus_request_except+0x50/0x70 [kvm]
     kvm_request_apicv_update+0x10c/0x150 [kvm]
     svm_toggle_avic_for_irq_window+0x52/0x90 [kvm_amd]
     svm_enable_irq_window+0x26/0xa0 [kvm_amd]
     vcpu_enter_guest+0xbbe/0x1560 [kvm]
     ? avic_vcpu_load+0xd5/0x120 [kvm_amd]
     ? kvm_arch_vcpu_load+0x76/0x240 [kvm]
     ? svm_get_segment_base+0xa/0x10 [kvm_amd]
     kvm_arch_vcpu_ioctl_run+0x103/0x590 [kvm]
     kvm_vcpu_ioctl+0x22a/0x5d0 [kvm]
     __x64_sys_ioctl+0x84/0xc0
     do_syscall_64+0x33/0x40
     entry_SYSCALL_64_after_hwframe+0x44/0xae

Fixes by moving the initializing of AMD IOMMU interrupt remapping mode
(amd_iommu_guest_ir) earlier before setting up the
amd_iommu_irq_ops.capability with appropriate IRQ_POSTING_CAP flag.

[joro: Squashed the two patches and limited
check_features_on_all_iommus() to CONFIG_IRQ_REMAP
to fix a compile warning.]

Signed-off-by: Wei Huang <wei.huang2@amd.com>
Co-developed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Link: https://lore.kernel.org/r/20210820202957.187572-2-suravee.suthikulpanit@amd.com
Link: https://lore.kernel.org/r/20210820202957.187572-3-suravee.suthikulpanit@amd.com
Fixes: 8bda0cfbdc1a ("iommu/amd: Detect and initialize guest vAPIC log")
Signed-off-by: Joerg Roedel <jroedel@suse.de>
4 months agoparisc: Mark sched_clock unstable only if clocks are not syncronized
Helge Deller [Wed, 8 Sep 2021 21:27:00 +0000 (23:27 +0200)]
parisc: Mark sched_clock unstable only if clocks are not syncronized

We check at runtime if the cr16 clocks are stable across CPUs. Only mark
the sched_clock unstable by calling clear_sched_clock_stable() if we
know that we run on a system which isn't syncronized across CPUs.

Signed-off-by: Helge Deller <deller@gmx.de>
4 months agoparisc: Move pci_dev_is_behind_card_dino to where it is used
Guenter Roeck [Wed, 8 Sep 2021 15:30:41 +0000 (08:30 -0700)]
parisc: Move pci_dev_is_behind_card_dino to where it is used

parisc build test images fail to compile with the following error.

drivers/parisc/dino.c:160:12: error:
'pci_dev_is_behind_card_dino' defined but not used

Move the function just ahead of its only caller to avoid the error.

Fixes: 5fa1659105fa ("parisc: Disable HP HSC-PCI Cards to prevent kernel crash")
Cc: Helge Deller <deller@gmx.de>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Helge Deller <deller@gmx.de>
4 months agoparisc: Reduce sigreturn trampoline to 3 instructions
Helge Deller [Tue, 7 Sep 2021 03:03:29 +0000 (05:03 +0200)]
parisc: Reduce sigreturn trampoline to 3 instructions

We can move the INSN_LDI_R20 instruction into the branch delay slot.

Signed-off-by: Helge Deller <deller@gmx.de>
4 months agoparisc: Check user signal stack trampoline is inside TASK_SIZE
Helge Deller [Sun, 5 Sep 2021 09:53:32 +0000 (11:53 +0200)]
parisc: Check user signal stack trampoline is inside TASK_SIZE

Add some additional checks to ensure the signal stack is inside
userspace bounds.

Signed-off-by: Helge Deller <deller@gmx.de>
4 months agoparisc: Drop useless debug info and comments from signal.c
Helge Deller [Mon, 6 Sep 2021 20:45:16 +0000 (22:45 +0200)]
parisc: Drop useless debug info and comments from signal.c

Signed-off-by: Helge Deller <deller@gmx.de>
4 months agoparisc: Drop strnlen_user() in favour of generic version
Helge Deller [Sat, 4 Sep 2021 21:49:26 +0000 (23:49 +0200)]
parisc: Drop strnlen_user() in favour of generic version

As suggested by Arnd Bergmann, drop the parisc version of
strnlen_user() and switch to the generic version.

Suggested-by: Arnd Bergmann <arnd@kernel.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Helge Deller <deller@gmx.de>
4 months agoparisc: Add missing FORCE prerequisite in Makefile
Helge Deller [Sun, 5 Sep 2021 09:50:56 +0000 (11:50 +0200)]
parisc: Add missing FORCE prerequisite in Makefile

Signed-off-by: Helge Deller <deller@gmx.de>
4 months agoMerge tag 'drm-misc-next-fixes-2021-09-03' of git://anongit.freedesktop.org/drm/drm...
Dave Airlie [Thu, 9 Sep 2021 03:34:15 +0000 (13:34 +1000)]
Merge tag 'drm-misc-next-fixes-2021-09-03' of git://anongit.freedesktop.org/drm/drm-misc into drm-next

drm-misc-next-fixes for v5.15:
- Fix ttm_bo_move_memcpy() when ttm_resource is subclassed.
- Small fixes to panfrost, mgag200, vc4.
- Small ttm compilation fixes.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/41ff5e54-0837-2226-a182-97ffd11ef01e@linux.intel.com
4 months agoMerge tag 'amd-drm-next-5.15-2021-09-01' of https://gitlab.freedesktop.org/agd5f...
Dave Airlie [Thu, 9 Sep 2021 03:33:48 +0000 (13:33 +1000)]
Merge tag 'amd-drm-next-5.15-2021-09-01' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-5.15-2021-09-01:

amdgpu:
- Misc cleanups, typo fixes
- EEPROM fix
- Add some new PCI IDs
- Scatter/Gather display support for Yellow Carp
- PCIe DPM fix for RKL platforms
- RAS fix

amdkfd:
- SVM fix

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210901214015.4488-1-alexander.deucher@amd.com
4 months agoMerge branches 'akpm' and 'akpm-hotfixes' (patches from Andrew)
Linus Torvalds [Thu, 9 Sep 2021 01:52:05 +0000 (18:52 -0700)]
Merge branches 'akpm' and 'akpm-hotfixes' (patches from Andrew)

Merge yet more updates and hotfixes from Andrew Morton:
 "Post-linux-next material, based upon latest upstream to catch the
  now-merged dependencies:

   - 10 patches.

     Subsystems affected by this patch series: mm (vmstat and migration)
     and compat.

  And bunch of hotfixes, mostly cc:stable:

   - 8 patches.

     Subsystems affected by this patch series: mm (hmm, hugetlb, vmscan,
     pagealloc, pagemap, kmemleak, mempolicy, and memblock)"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  arch: remove compat_alloc_user_space
  compat: remove some compat entry points
  mm: simplify compat numa syscalls
  mm: simplify compat_sys_move_pages
  kexec: avoid compat_alloc_user_space
  kexec: move locking into do_kexec_load
  mm: migrate: change to use bool type for 'page_was_mapped'
  mm: migrate: fix the incorrect function name in comments
  mm: migrate: introduce a local variable to get the number of pages
  mm/vmstat: protect per cpu variables with preempt disable on RT

* emailed hotfixes from Andrew Morton <akpm@linux-foundation.org>:
  nds32/setup: remove unused memblock_region variable in setup_memory()
  mm/mempolicy: fix a race between offset_il_node and mpol_rebind_task
  mm/kmemleak: allow __GFP_NOLOCKDEP passed to kmemleak's gfp
  mmap_lock: change trace and locking order
  mm/page_alloc.c: avoid accessing uninitialized pcp page migratetype
  mm,vmscan: fix divide by zero in get_scan_count
  mm/hugetlb: initialize hugetlb_usage in mm_init
  mm/hmm: bypass devmap pte when all pfn requested flags are fulfilled

4 months agonds32/setup: remove unused memblock_region variable in setup_memory()
Mike Rapoport [Thu, 9 Sep 2021 01:10:23 +0000 (18:10 -0700)]
nds32/setup: remove unused memblock_region variable in setup_memory()

kernel test robot reports unused variable warning:

   arch/nds32/kernel/setup.c:247:26: warning: Unused variable: region
   [unusedVariable]
    struct memblock_region *region;
                            ^

Remove the unused variable.

Link: https://lkml.kernel.org/r/20210712125218.28951-1-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 months agomm/mempolicy: fix a race between offset_il_node and mpol_rebind_task
yanghui [Thu, 9 Sep 2021 01:10:20 +0000 (18:10 -0700)]
mm/mempolicy: fix a race between offset_il_node and mpol_rebind_task

Servers happened below panic:

  Kernel version:5.4.56
  BUG: unable to handle page fault for address: 0000000000002c48
  RIP: 0010:__next_zones_zonelist+0x1d/0x40
  Call Trace:
    __alloc_pages_nodemask+0x277/0x310
    alloc_page_interleave+0x13/0x70
    handle_mm_fault+0xf99/0x1390
    __do_page_fault+0x288/0x500
    do_page_fault+0x30/0x110
    page_fault+0x3e/0x50

The reason for the panic is that MAX_NUMNODES is passed in the third
parameter in __alloc_pages_nodemask(preferred_nid).  So access to
zonelist->zoneref->zone_idx in __next_zones_zonelist will cause a panic.

In offset_il_node(), first_node() returns nid from pol->v.nodes, after
this other threads may chang pol->v.nodes before next_node().  This race
condition will let next_node return MAX_NUMNODES.  So put pol->nodes in
a local variable.

The race condition is between offset_il_node and cpuset_change_task_nodemask:

  CPU0:                                     CPU1:
  alloc_pages_vma()
    interleave_nid(pol,)
      offset_il_node(pol,)
        first_node(pol->v.nodes)            cpuset_change_task_nodemask
                        //nodes==0xc          mpol_rebind_task
                                                mpol_rebind_policy
                                                  mpol_rebind_nodemask(pol,nodes)
                        //nodes==0x3
        next_node(nid, pol->v.nodes)//return MAX_NUMNODES

Link: https://lkml.kernel.org/r/20210906034658.48721-1-yanghui.def@bytedance.com
Signed-off-by: yanghui <yanghui.def@bytedance.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 months agomm/kmemleak: allow __GFP_NOLOCKDEP passed to kmemleak's gfp
Naohiro Aota [Thu, 9 Sep 2021 01:10:17 +0000 (18:10 -0700)]
mm/kmemleak: allow __GFP_NOLOCKDEP passed to kmemleak's gfp

In a memory pressure situation, I'm seeing the lockdep WARNING below.
Actually, this is similar to a known false positive which is already
addressed by commit 6dcde60efd94 ("xfs: more lockdep whackamole with
kmem_alloc*").

This warning still persists because it's not from kmalloc() itself but
from an allocation for kmemleak object.  While kmalloc() itself suppress
the warning with __GFP_NOLOCKDEP, gfp_kmemleak_mask() is dropping the
flag for the kmemleak's allocation.

Allow __GFP_NOLOCKDEP to be passed to kmemleak's allocation, so that the
warning for it is also suppressed.

  ======================================================
  WARNING: possible circular locking dependency detected
  5.14.0-rc7-BTRFS-ZNS+ #37 Not tainted
  ------------------------------------------------------
  kswapd0/288 is trying to acquire lock:
  ffff88825ab45df0 (&xfs_nondir_ilock_class){++++}-{3:3}, at: xfs_ilock+0x8a/0x250

  but task is already holding lock:
  ffffffff848cc1e0 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x5/0x30

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:

  -> #1 (fs_reclaim){+.+.}-{0:0}:
         fs_reclaim_acquire+0x112/0x160
         kmem_cache_alloc+0x48/0x400
         create_object.isra.0+0x42/0xb10
         kmemleak_alloc+0x48/0x80
         __kmalloc+0x228/0x440
         kmem_alloc+0xd3/0x2b0
         kmem_alloc_large+0x5a/0x1c0
         xfs_attr_copy_value+0x112/0x190
         xfs_attr_shortform_getvalue+0x1fc/0x300
         xfs_attr_get_ilocked+0x125/0x170
         xfs_attr_get+0x329/0x450
         xfs_get_acl+0x18d/0x430
         get_acl.part.0+0xb6/0x1e0
         posix_acl_xattr_get+0x13a/0x230
         vfs_getxattr+0x21d/0x270
         getxattr+0x126/0x310
         __x64_sys_fgetxattr+0x1a6/0x2a0
         do_syscall_64+0x3b/0x90
         entry_SYSCALL_64_after_hwframe+0x44/0xae

  -> #0 (&xfs_nondir_ilock_class){++++}-{3:3}:
         __lock_acquire+0x2c0f/0x5a00
         lock_acquire+0x1a1/0x4b0
         down_read_nested+0x50/0x90
         xfs_ilock+0x8a/0x250
         xfs_can_free_eofblocks+0x34f/0x570
         xfs_inactive+0x411/0x520
         xfs_fs_destroy_inode+0x2c8/0x710
         destroy_inode+0xc5/0x1a0
         evict+0x444/0x620
         dispose_list+0xfe/0x1c0
         prune_icache_sb+0xdc/0x160
         super_cache_scan+0x31e/0x510
         do_shrink_slab+0x337/0x8e0
         shrink_slab+0x362/0x5c0
         shrink_node+0x7a7/0x1a40
         balance_pgdat+0x64e/0xfe0
         kswapd+0x590/0xa80
         kthread+0x38c/0x460
         ret_from_fork+0x22/0x30

  other info that might help us debug this:
   Possible unsafe locking scenario:
         CPU0                    CPU1
         ----                    ----
    lock(fs_reclaim);
                                 lock(&xfs_nondir_ilock_class);
                                 lock(fs_reclaim);
    lock(&xfs_nondir_ilock_class);

   *** DEADLOCK ***
  3 locks held by kswapd0/288:
   #0: ffffffff848cc1e0 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x5/0x30
   #1: ffffffff848a08d8 (shrinker_rwsem){++++}-{3:3}, at: shrink_slab+0x269/0x5c0
   #2: ffff8881a7a820e8 (&type->s_umount_key#60){++++}-{3:3}, at: super_cache_scan+0x5a/0x510

Link: https://lkml.kernel.org/r/20210907055659.3182992-1-naohiro.aota@wdc.com
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: "Darrick J . Wong" <djwong@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 months agommap_lock: change trace and locking order
Liam Howlett [Thu, 9 Sep 2021 01:10:14 +0000 (18:10 -0700)]
mmap_lock: change trace and locking order

Print to the trace log before releasing the lock to avoid racing with
other trace log printers of the same lock type.

Link: https://lkml.kernel.org/r/20210903022041.1843024-1-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Suggested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michel Lespinasse <walken.cr@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 months agomm/page_alloc.c: avoid accessing uninitialized pcp page migratetype
Miaohe Lin [Thu, 9 Sep 2021 01:10:11 +0000 (18:10 -0700)]
mm/page_alloc.c: avoid accessing uninitialized pcp page migratetype

If it's not prepared to free unref page, the pcp page migratetype is
unset.  Thus we will get rubbish from get_pcppage_migratetype() and
might list_del(&page->lru) again after it's already deleted from the list
leading to grumble about data corruption.

Link: https://lkml.kernel.org/r/20210902115447.57050-1-linmiaohe@huawei.com
Fixes: df1acc856923 ("mm/page_alloc: avoid conflating IRQs disabled with zone->lock")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 months agomm,vmscan: fix divide by zero in get_scan_count
Rik van Riel [Thu, 9 Sep 2021 01:10:08 +0000 (18:10 -0700)]
mm,vmscan: fix divide by zero in get_scan_count

Commit f56ce412a59d ("mm: memcontrol: fix occasional OOMs due to
proportional memory.low reclaim") introduced a divide by zero corner
case when oomd is being used in combination with cgroup memory.low
protection.

When oomd decides to kill a cgroup, it will force the cgroup memory to
be reclaimed after killing the tasks, by writing to the memory.max file
for that cgroup, forcing the remaining page cache and reclaimable slab
to be reclaimed down to zero.

Previously, on cgroups with some memory.low protection that would result
in the memory being reclaimed down to the memory.low limit, or likely
not at all, having the page cache reclaimed asynchronously later.

With f56ce412a59d the oomd write to memory.max tries to reclaim all the
way down to zero, which may race with another reclaimer, to the point of
ending up with the divide by zero below.

This patch implements the obvious fix.

Link: https://lkml.kernel.org/r/20210826220149.058089c6@imladris.surriel.com
Fixes: f56ce412a59d ("mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim")
Signed-off-by: Rik van Riel <riel@surriel.com>
Acked-by: Roman Gushchin <guro@fb.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Chris Down <chris@chrisdown.name>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 months agomm/hugetlb: initialize hugetlb_usage in mm_init
Liu Zixian [Thu, 9 Sep 2021 01:10:05 +0000 (18:10 -0700)]
mm/hugetlb: initialize hugetlb_usage in mm_init

After fork, the child process will get incorrect (2x) hugetlb_usage.  If
a process uses 5 2MB hugetlb pages in an anonymous mapping,

HugetlbPages:    10240 kB

and then forks, the child will show,

HugetlbPages:    20480 kB

The reason for double the amount is because hugetlb_usage will be copied
from the parent and then increased when we copy page tables from parent
to child.  Child will have 2x actual usage.

Fix this by adding hugetlb_count_init in mm_init.

Link: https://lkml.kernel.org/r/20210826071742.877-1-liuzixian4@huawei.com
Fixes: 5d317b2b6536 ("mm: hugetlb: proc: add HugetlbPages field to /proc/PID/status")
Signed-off-by: Liu Zixian <liuzixian4@huawei.com>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 months agomm/hmm: bypass devmap pte when all pfn requested flags are fulfilled
Li Zhijian [Thu, 9 Sep 2021 01:10:02 +0000 (18:10 -0700)]
mm/hmm: bypass devmap pte when all pfn requested flags are fulfilled

Previously, we noticed the one rpma example was failed[1] since commit
36f30e486dce ("IB/core: Improve ODP to use hmm_range_fault()"), where it
will use ODP feature to do RDMA WRITE between fsdax files.

After digging into the code, we found hmm_vma_handle_pte() will still
return EFAULT even though all the its requesting flags has been
fulfilled.  That's because a DAX page will be marked as (_PAGE_SPECIAL |
PAGE_DEVMAP) by pte_mkdevmap().

Link: https://github.com/pmem/rpma/issues/1142
Link: https://lkml.kernel.org/r/20210830094232.203029-1-lizhijian@cn.fujitsu.com
Fixes: 405506274922 ("mm/hmm: add missing call to hmm_pte_need_fault in HMM_PFN_SPECIAL handling")
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 months agoMerge tag 'tag-chrome-platform-for-v5.15' of git://git.kernel.org/pub/scm/linux/kerne...
Linus Torvalds [Wed, 8 Sep 2021 23:43:46 +0000 (16:43 -0700)]
Merge tag 'tag-chrome-platform-for-v5.15' of git://git./linux/kernel/git/chrome-platform/linux

Pull chrome platform updates from Benson Leung:
 "cros_ec_typec:

   - make the cros_ec_typec driver to use the pre-existing
     cros_ec_check_features() function

  sensorhub:

   - add trace events for sample

  misc:

   - cros_ec_proto - re-send commands in the event of a timeout (for the
     FPMCU)

   - fix warnings in cros_ec_trace related to format output"

* tag 'tag-chrome-platform-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux:
  platform/chrome: cros_ec_trace: Fix format warnings
  platform/chrome: cros_ec_typec: Use existing feature check
  platform/chrome: cros_ec_proto: Send command again when timeout occurs
  platform/chrome: sensorhub: Add trace events for sample

4 months agoMerge tag 'pm-5.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael...
Linus Torvalds [Wed, 8 Sep 2021 23:38:25 +0000 (16:38 -0700)]
Merge tag 'pm-5.15-rc1-2' of git://git./linux/kernel/git/rafael/linux-pm

Pull more power management updates from Rafael Wysocki:
 "These are mostly ARM cpufreq driver updates, including one new
  MediaTek driver that has just passed all of the reviews, with the
  addition of a revert of a recent intel_pstate commit, some core
  cpufreq changes and a DT-related update of the operating performance
  points (OPP) support code.

  Specifics:

   - Add new cpufreq driver for the MediaTek MT6779 platform called
     mediatek-hw along with corresponding DT bindings (Hector.Yuan).

   - Add DCVS interrupt support to the qcom-cpufreq-hw driver (Thara
     Gopinath).

   - Make the qcom-cpufreq-hw driver set the dvfs_possible_from_any_cpu
     policy flag (Taniya Das).

   - Blocklist more Qualcomm platforms in cpufreq-dt-platdev (Bjorn
     Andersson).

   - Make the vexpress cpufreq driver set the CPUFREQ_IS_COOLING_DEV
     flag (Viresh Kumar).

   - Add new cpufreq driver callback to allow drivers to register with
     the Energy Model in a consistent way and make several drivers use
     it (Viresh Kumar).

   - Change the remaining users of the .ready() cpufreq driver callback
     to move the code from it elsewhere and drop it from the cpufreq
     core (Viresh Kumar).

   - Revert recent intel_pstate change adding HWP guaranteed performance
     change notification support to it that led to problems, because the
     notification in question is triggered prematurely on some systems
     (Rafael Wysocki).

   - Convert the OPP DT bindings to DT schema and clean them up while at
     it (Rob Herring)"

* tag 'pm-5.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (23 commits)
  Revert "cpufreq: intel_pstate: Process HWP Guaranteed change notification"
  cpufreq: mediatek-hw: Add support for CPUFREQ HW
  cpufreq: Add of_perf_domain_get_sharing_cpumask
  dt-bindings: cpufreq: add bindings for MediaTek cpufreq HW
  cpufreq: Remove ready() callback
  cpufreq: sh: Remove sh_cpufreq_cpu_ready()
  cpufreq: acpi: Remove acpi_cpufreq_cpu_ready()
  cpufreq: qcom-hw: Set dvfs_possible_from_any_cpu cpufreq driver flag
  cpufreq: blocklist more Qualcomm platforms in cpufreq-dt-platdev
  cpufreq: qcom-cpufreq-hw: Add dcvs interrupt support
  cpufreq: scmi: Use .register_em() to register with energy model
  cpufreq: vexpress: Use .register_em() to register with energy model
  cpufreq: scpi: Use .register_em() to register with energy model
  dt-bindings: opp: Convert to DT schema
  dt-bindings: Clean-up OPP binding node names in examples
  ARM: dts: omap: Drop references to opp.txt
  cpufreq: qcom-cpufreq-hw: Use .register_em() to register with energy model
  cpufreq: omap: Use .register_em() to register with energy model
  cpufreq: mediatek: Use .register_em() to register with energy model
  cpufreq: imx6q: Use .register_em() to register with energy model
  ...

4 months agoMerge tag 'acpi-5.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael...
Linus Torvalds [Wed, 8 Sep 2021 23:33:21 +0000 (16:33 -0700)]
Merge tag 'acpi-5.15-rc1-2' of git://git./linux/kernel/git/rafael/linux-pm

Pull more ACPI updates from Rafael Wysocki:
 "These add ACPI support to the PCI VMD driver, improve suspend-to-idle
  support for AMD platforms and update documentation.

  Specifics:

   - Add ACPI support to the PCI VMD driver (Rafael Wysocki)

   - Rearrange suspend-to-idle support code to reflect the platform
     firmware expectations on some AMD platforms (Mario Limonciello)

   - Make SSDT overlays documentation follow the code documented by it
     more closely (Andy Shevchenko)"

* tag 'acpi-5.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: PM: s2idle: Run both AMD and Microsoft methods if both are supported
  Documentation: ACPI: Align the SSDT overlays file with the code
  PCI: VMD: ACPI: Make ACPI companion lookup work for VMD bus

4 months agoMerge tag 'docs-5.15-2' of git://git.lwn.net/linux
Linus Torvalds [Wed, 8 Sep 2021 23:28:14 +0000 (16:28 -0700)]
Merge tag 'docs-5.15-2' of git://git.lwn.net/linux

Pull more documentation updates from Jonathan Corbet:
 "Another collection of documentation patches, mostly fixes but also
  includes another set of traditional Chinese translations"

* tag 'docs-5.15-2' of git://git.lwn.net/linux:
  docs: pdfdocs: Fix typo in CJK-language specific font settings
  docs: kernel-hacking: Remove inappropriate text
  docs/zh_TW: add translations for zh_TW/filesystems
  docs/zh_TW: add translations for zh_TW/cpu-freq
  docs/zh_TW: add translations for zh_TW/arm64
  docs/zh_CN: Modify the translator tag and fix the wrong word
  Documentation/features/vm: correct huge-vmap APIs
  Documentation: block: blk-mq: Fix small typo in multi-queue docs
  Documentation: in_irq() cleanup
  Documentation: arm: marvell: Add 88F6825 model into list
  Documentation/process/maintainer-pgp-guide: Replace broken link to PGP path finder
  Documentation: locking: fix references
  Documentation: Update details of The Linux Kernel Module Programming Guide
  docs: x86: Remove obsolete information about x86_64 vmalloc() faulting
  Documentation/process/applying-patches: Activate linux-next man hyperlink

4 months agoMerge tag 'modules-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu...
Linus Torvalds [Wed, 8 Sep 2021 23:06:48 +0000 (16:06 -0700)]
Merge tag 'modules-for-v5.15' of git://git./linux/kernel/git/jeyu/linux

Pull module updates from Jessica Yu:
 "The only main change I have for this round of updates is the modules
  MAINTAINERS update.

  As I find myself with less time to devote to upstream these days, Luis
  has kindly agreed to help maintain the module loader, to eventually
  transition to being the primary maintainer. Since Luis is already very
  involved upstream with experience maintaining various areas of the
  kernel including the kmod usermode helper, I think he is a great fit
  for this area of the kernel.

  Summary:

   - Add Luis Chamberlain as modules maintainer

   - Fix for .ctors sections in module linker script"

* tag 'modules-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
  MAINTAINERS: Add Luis Chamberlain as modules maintainer
  module: combine constructors in module linker script

4 months agoMerge tag 'microblaze-v5.15' of git://git.monstr.eu/linux-2.6-microblaze
Linus Torvalds [Wed, 8 Sep 2021 23:02:13 +0000 (16:02 -0700)]
Merge tag 'microblaze-v5.15' of git://git.monstr.eu/linux-2.6-microblaze

Pull microblaze update from Michal Simek:

 - Kbuild clean up

* tag 'microblaze-v5.15' of git://git.monstr.eu/linux-2.6-microblaze:
  microblaze: move core-y in arch/microblaze/Makefile to arch/microblaze/Kbuild

4 months agoMerge branch 'for-5.15/fsdax-cleanups' into for-5.15/libnvdimm
Dan Williams [Wed, 8 Sep 2021 22:58:13 +0000 (15:58 -0700)]
Merge branch 'for-5.15/fsdax-cleanups' into for-5.15/libnvdimm

Include Christoph's rework of the dax_supported() helpers in the v5.15
libnvdimm update. This supports the ongoing dax-reflink enabling effort.

4 months agoMerge tag 'nfsd-5.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Linus Torvalds [Wed, 8 Sep 2021 22:55:42 +0000 (15:55 -0700)]
Merge tag 'nfsd-5.15-1' of git://git./linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:

 - Restore performance on memory-starved servers

* tag 'nfsd-5.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  SUNRPC: improve error response to over-size gss credential
  SUNRPC: don't pause on incomplete allocation

4 months agoMerge tag 'ceph-for-5.15-rc1' of git://github.com/ceph/ceph-client
Linus Torvalds [Wed, 8 Sep 2021 22:50:32 +0000 (15:50 -0700)]
Merge tag 'ceph-for-5.15-rc1' of git://github.com/ceph/ceph-client

Pull ceph updates from Ilya Dryomov:

 - a set of patches to address fsync stalls caused by depending on
   periodic rather than triggered MDS journal flushes in some cases
   (Xiubo Li)

 - a fix for mtime effectively not getting updated in case of competing
   writers (Jeff Layton)

 - a couple of fixes for inode reference leaks and various WARNs after
   "umount -f" (Xiubo Li)

 - a new ceph.auth_mds extended attribute (Jeff Layton)

 - a smattering of fixups and cleanups from Jeff, Xiubo and Colin.

* tag 'ceph-for-5.15-rc1' of git://github.com/ceph/ceph-client:
  ceph: fix dereference of null pointer cf
  ceph: drop the mdsc_get_session/put_session dout messages
  ceph: lockdep annotations for try_nonblocking_invalidate
  ceph: don't WARN if we're forcibly removing the session caps
  ceph: don't WARN if we're force umounting
  ceph: remove the capsnaps when removing caps
  ceph: request Fw caps before updating the mtime in ceph_write_iter
  ceph: reconnect to the export targets on new mdsmaps
  ceph: print more information when we can't find snaprealm
  ceph: add ceph_change_snap_realm() helper
  ceph: remove redundant initializations from mdsc and session
  ceph: cancel delayed work instead of flushing on mdsc teardown
  ceph: add a new vxattr to return auth mds for an inode
  ceph: remove some defunct forward declarations
  ceph: flush the mdlog before waiting on unsafe reqs
  ceph: flush mdlog before umounting
  ceph: make iterate_sessions a global symbol
  ceph: make ceph_create_session_msg a global symbol
  ceph: fix comment about short copies in ceph_write_end
  ceph: fix memory leak on decode error in ceph_handle_caps

4 months agoMerge tag '9p-for-5.15-rc1' of git://github.com/martinetd/linux
Linus Torvalds [Wed, 8 Sep 2021 22:40:39 +0000 (15:40 -0700)]
Merge tag '9p-for-5.15-rc1' of git://github.com/martinetd/linux

Pull 9p updates from Dominique Martinet:
 "A couple of harmless fixes, increase max tcp msize (64KB -> 1MB), and
  increase default msize (8KB -> 128KB)

  The default increase has been discussed with Christian for the qemu
  side of things but makes sense for all supported transports"

* tag '9p-for-5.15-rc1' of git://github.com/martinetd/linux:
  net/9p: increase default msize to 128k
  net/9p: use macro to define default msize
  net/9p: increase tcp max msize to 1MB
  9p/xen: Fix end of loop tests for list_for_each_entry
  9p/trans_virtio: Remove sysfs file on probe failure

4 months agoarch: remove compat_alloc_user_space
Arnd Bergmann [Wed, 8 Sep 2021 22:18:29 +0000 (15:18 -0700)]
arch: remove compat_alloc_user_space

All users of compat_alloc_user_space() and copy_in_user() have been
removed from the kernel, only a few functions in sparc remain that can be
changed to calling arch_copy_in_user() instead.

Link: https://lkml.kernel.org/r/20210727144859.4150043-7-arnd@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 months agocompat: remove some compat entry points
Arnd Bergmann [Wed, 8 Sep 2021 22:18:25 +0000 (15:18 -0700)]
compat: remove some compat entry points

These are all handled correctly when calling the native system call entry
point, so remove the special cases.

Link: https://lkml.kernel.org/r/20210727144859.4150043-6-arnd@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 months agomm: simplify compat numa syscalls
Arnd Bergmann [Wed, 8 Sep 2021 22:18:21 +0000 (15:18 -0700)]
mm: simplify compat numa syscalls

The compat implementations for mbind, get_mempolicy, set_mempolicy and
migrate_pages are just there to handle the subtly different layout of
bitmaps on 32-bit hosts.

The compat implementation however lacks some of the checks that are
present in the native one, in particular for checking that the extra bits
are all zero when user space has a larger mask size than the kernel.
Worse, those extra bits do not get cleared when copying in or out of the
kernel, which can lead to incorrect data as well.

Unify the implementation to handle the compat bitmap layout directly in
the get_nodes() and copy_nodes_to_user() helpers.  Splitting out the
get_bitmap() helper from get_nodes() also helps readability of the native
case.

On x86, two additional problems are addressed by this: compat tasks can
pass a bitmap at the end of a mapping, causing a fault when reading across
the page boundary for a 64-bit word.  x32 tasks might also run into
problems with get_mempolicy corrupting data when an odd number of 32-bit
words gets passed.

On parisc the migrate_pages() system call apparently had the wrong calling
convention, as big-endian architectures expect the words inside of a
bitmap to be swapped.  This is not a problem though since parisc has no
NUMA support.

[arnd@arndb.de: fix mempolicy crash]
Link: https://lkml.kernel.org/r/20210730143417.3700653-1-arnd@kernel.org
Link: https://lore.kernel.org/lkml/YQPLG20V3dmOfq3a@osiris/
Link: https://lkml.kernel.org/r/20210727144859.4150043-5-arnd@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 months agomm: simplify compat_sys_move_pages
Arnd Bergmann [Wed, 8 Sep 2021 22:18:17 +0000 (15:18 -0700)]
mm: simplify compat_sys_move_pages

The compat move_pages() implementation uses compat_alloc_user_space() for
converting the pointer array.  Moving the compat handling into the
function itself is a bit simpler and lets us avoid the
compat_alloc_user_space() call.

Link: https://lkml.kernel.org/r/20210727144859.4150043-4-arnd@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 months agokexec: avoid compat_alloc_user_space
Arnd Bergmann [Wed, 8 Sep 2021 22:18:13 +0000 (15:18 -0700)]
kexec: avoid compat_alloc_user_space

kimage_alloc_init() expects a __user pointer, so compat_sys_kexec_load()
uses compat_alloc_user_space() to convert the layout and put it back onto
the user space caller stack.

Moving the user space access into the syscall handler directly actually
makes the code simpler, as the conversion for compat mode can now be done
on kernel memory.

Link: https://lkml.kernel.org/r/20210727144859.4150043-3-arnd@kernel.org
Link: https://lore.kernel.org/lkml/YPbtsU4GX6PL7%2F42@infradead.org/
Link: https://lore.kernel.org/lkml/m1y2cbzmnw.fsf@fess.ebiederm.org/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Co-developed-by: Eric Biederman <ebiederm@xmission.com>
Co-developed-by: Christoph Hellwig <hch@infradead.org>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 months agokexec: move locking into do_kexec_load
Arnd Bergmann [Wed, 8 Sep 2021 22:18:10 +0000 (15:18 -0700)]
kexec: move locking into do_kexec_load

Patch series "compat: remove compat_alloc_user_space", v5.

Going through compat_alloc_user_space() to convert indirect system call
arguments tends to add complexity compared to handling the native and
compat logic in the same code.

This patch (of 6):

The locking is the same between the native and compat version of
sys_kexec_load(), so it can be done in the common implementation to reduce
duplication.

Link: https://lkml.kernel.org/r/20210727144859.4150043-1-arnd@kernel.org
Link: https://lkml.kernel.org/r/20210727144859.4150043-2-arnd@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Co-developed-by: Eric Biederman <ebiederm@xmission.com>
Co-developed-by: Christoph Hellwig <hch@infradead.org>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 months agomm: migrate: change to use bool type for 'page_was_mapped'
Baolin Wang [Wed, 8 Sep 2021 22:18:06 +0000 (15:18 -0700)]
mm: migrate: change to use bool type for 'page_was_mapped'

Change to use bool type for 'page_was_mapped' variable making it more
readable.

Link: https://lkml.kernel.org/r/ce1279df18d2c163998c403e0b5ec6d3f6f90f7a.1629447552.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 months agomm: migrate: fix the incorrect function name in comments
Baolin Wang [Wed, 8 Sep 2021 22:18:03 +0000 (15:18 -0700)]
mm: migrate: fix the incorrect function name in comments

since commit a98a2f0c8ce1 ("mm/rmap: split migration into its own
function"), the migration ptes establishment has been split into a
separate try_to_migrate() function, thus update the related comments.

Link: https://lkml.kernel.org/r/5b824bad6183259c916ae6cf42f81d14c6118b06.1629447552.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Reviewed-by: Alistair Popple <apopple@nvidia.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 months agomm: migrate: introduce a local variable to get the number of pages
Baolin Wang [Wed, 8 Sep 2021 22:18:01 +0000 (15:18 -0700)]
mm: migrate: introduce a local variable to get the number of pages

Use thp_nr_pages() instead of compound_nr() to get the number of pages for
THP page, meanwhile introducing a local variable 'nr_pages' to avoid
getting the number of pages repeatedly.

Link: https://lkml.kernel.org/r/a8e331ac04392ee230c79186330fb05e86a2aa77.1629447552.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 months agomm/vmstat: protect per cpu variables with preempt disable on RT
Ingo Molnar [Wed, 8 Sep 2021 22:17:57 +0000 (15:17 -0700)]
mm/vmstat: protect per cpu variables with preempt disable on RT

Disable preemption on -RT for the vmstat code.  On vanila the code runs in
IRQ-off regions while on -RT it may not when stats are updated under a
local_lock.  "preempt_disable" ensures that the same resources is not
updated in parallel due to preemption.

This patch differs from the preempt-rt version where __count_vm_event and
__count_vm_events are also protected.  The counters are explicitly
"allowed to be to be racy" so there is no need to protect them from
preemption.  Only the accurate page stats that are updated by a
read-modify-write need protection.  This patch also differs in that a
preempt_[en|dis]able_rt helper is not used.  As vmstat is the only user of
the helper, it was suggested that it be open-coded in vmstat.c instead of
risking the helper being used in unnecessary contexts.

Link: https://lkml.kernel.org/r/20210805160019.1137-2-mgorman@techsingularity.net
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 months agoksmbd: fix control flow issues in sid_to_id()
Namjae Jeon [Mon, 6 Sep 2021 23:16:26 +0000 (08:16 +0900)]
ksmbd: fix control flow issues in sid_to_id()

Addresses-Coverity reported Control flow issues in sid_to_id()
/fs/ksmbd/smbacl.c: 277 in sid_to_id()
271
272 if (sidtype == SIDOWNER) {
273 kuid_t uid;
274 uid_t id;
275
276 id = le32_to_cpu(psid->sub_auth[psid->num_subauth - 1]);
>>> CID 1506810:  Control flow issues  (NO_EFFECT)
>>> This greater-than-or-equal-to-zero comparison of an unsigned value
>>> is always true. "id >= 0U".
277 if (id >= 0) {
278 /*
279  * Translate raw sid into kuid in the server's user
280  * namespace.
281  */
282 uid = make_kuid(&init_user_ns, id);

Addresses-Coverity: ("Control flow issues")
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
4 months agoksmbd: fix read of uninitialized variable ret in set_file_basic_info
Namjae Jeon [Mon, 6 Sep 2021 23:15:21 +0000 (08:15 +0900)]
ksmbd: fix read of uninitialized variable ret in set_file_basic_info

Addresses-Coverity reported Uninitialized variables warninig :

/fs/ksmbd/smb2pdu.c: 5525 in set_file_basic_info()
5519                    if (!rc) {
5520                            inode->i_ctime = ctime;
5521                            mark_inode_dirty(inode);
5522                    }
5523                    inode_unlock(inode);
5524            }
>>>     CID 1506805:  Uninitialized variables  (UNINIT)
>>>     Using uninitialized value "rc".
5525            return rc;
5526     }
5527
5528     static int set_file_allocation_info(struct ksmbd_work *work,
5529                                 struct ksmbd_file *fp, char *buf)
5530     {

Addresses-Coverity: ("Uninitialized variable")
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
4 months agoksmbd: add missing assignments to ret on ndr_read_int64 read calls
Colin Ian King [Mon, 6 Sep 2021 13:44:38 +0000 (14:44 +0100)]
ksmbd: add missing assignments to ret on ndr_read_int64 read calls

Currently there are two ndr_read_int64 calls where ret is being checked
for failure but ret is not being assigned a return value from the call.
Static analyis is reporting the checks on ret as dead code.  Fix this.

Addresses-Coverity: ("Logical dead code")
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
4 months agoMerge branch 'akpm' (patches from Andrew)
Linus Torvalds [Wed, 8 Sep 2021 19:55:35 +0000 (12:55 -0700)]
Merge branch 'akpm' (patches from Andrew)

Merge more updates from Andrew Morton:
 "147 patches, based on 7d2a07b769330c34b4deabeed939325c77a7ec2f.

  Subsystems affected by this patch series: mm (memory-hotplug, rmap,
  ioremap, highmem, cleanups, secretmem, kfence, damon, and vmscan),
  alpha, percpu, procfs, misc, core-kernel, MAINTAINERS, lib,
  checkpatch, epoll, init, nilfs2, coredump, fork, pids, criu, kconfig,
  selftests, ipc, and scripts"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (94 commits)
  scripts: check_extable: fix typo in user error message
  mm/workingset: correct kernel-doc notations
  ipc: replace costly bailout check in sysvipc_find_ipc()
  selftests/memfd: remove unused variable
  Kconfig.debug: drop selecting non-existing HARDLOCKUP_DETECTOR_ARCH
  configs: remove the obsolete CONFIG_INPUT_POLLDEV
  prctl: allow to setup brk for et_dyn executables
  pid: cleanup the stale comment mentioning pidmap_init().
  kernel/fork.c: unexport get_{mm,task}_exe_file
  coredump: fix memleak in dump_vma_snapshot()
  fs/coredump.c: log if a core dump is aborted due to changed file permissions
  nilfs2: use refcount_dec_and_lock() to fix potential UAF
  nilfs2: fix memory leak in nilfs_sysfs_delete_snapshot_group
  nilfs2: fix memory leak in nilfs_sysfs_create_snapshot_group
  nilfs2: fix memory leak in nilfs_sysfs_delete_##name##_group
  nilfs2: fix memory leak in nilfs_sysfs_create_##name##_group
  nilfs2: fix NULL pointer in nilfs_##name##_attr_release
  nilfs2: fix memory leak in nilfs_sysfs_create_device_group
  trap: cleanup trap_init()
  init: move usermodehelper_enable() to populate_rootfs()
  ...

4 months agotracing/boot: Fix to loop on only subkeys
Masami Hiramatsu [Wed, 8 Sep 2021 19:38:03 +0000 (04:38 +0900)]
tracing/boot: Fix to loop on only subkeys

Since the commit e5efaeb8a8f5 ("bootconfig: Support mixing
a value and subkeys under a key") allows to co-exist a value
node and key nodes under a node, xbc_node_for_each_child()
is not only returning key node but also a value node.
In the boot-time tracing using xbc_node_for_each_child() to
iterate the events, groups and instances, but those must be
key nodes. Thus it must use xbc_node_for_each_subkey().

Link: https://lkml.kernel.org/r/163112988361.74896.2267026262061819145.stgit@devnote2
Fixes: e5efaeb8a8f5 ("bootconfig: Support mixing a value and subkeys under a key")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
4 months agoMerge tag 'mm-slub-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka...
Linus Torvalds [Wed, 8 Sep 2021 19:36:00 +0000 (12:36 -0700)]
Merge tag 'mm-slub-5.15-rc1' of git://git./linux/kernel/git/vbabka/linux

Pull SLUB updates from Vlastimil Babka:
 "SLUB: reduce irq disabled scope and make it RT compatible

  This series was initially inspired by Mel's pcplist local_lock
  rewrite, and also interest to better understand SLUB's locking and the
  new primitives and RT variants and implications. It makes SLUB
  compatible with PREEMPT_RT and generally more preemption-friendly,
  apparently without significant regressions, as the fast paths are not
  affected.

  The main changes to SLUB by this series:

   - irq disabling is now only done for minimum amount of time needed to
     protect the strict kmem_cache_cpu fields, and as part of spin lock,
     local lock and bit lock operations to make them irq-safe

   - SLUB is fully PREEMPT_RT compatible

  The series should now be sufficiently tested in both RT and !RT
  configs, mainly thanks to Mike.

  The RFC/v1 version also got basic performance screening by Mel that
  didn't show major regressions. Mike's testing with hackbench of v2 on
  !RT reported negligible differences [6]:

    virgin(ish) tip
    5.13.0.g60ab3ed-tip
              7,320.67 msec task-clock                #    7.792 CPUs utilized            ( +-  0.31% )
               221,215      context-switches          #    0.030 M/sec                    ( +-  3.97% )
                16,234      cpu-migrations            #    0.002 M/sec                    ( +-  4.07% )
                13,233      page-faults               #    0.002 M/sec                    ( +-  0.91% )
        27,592,205,252      cycles                    #    3.769 GHz                      ( +-  0.32% )
         8,309,495,040      instructions              #    0.30  insn per cycle           ( +-  0.37% )
         1,555,210,607      branches                  #  212.441 M/sec                    ( +-  0.42% )
             5,484,209      branch-misses             #    0.35% of all branches          ( +-  2.13% )

               0.93949 +- 0.00423 seconds time elapsed  ( +-  0.45% )
               0.94608 +- 0.00384 seconds time elapsed  ( +-  0.41% ) (repeat)
               0.94422 +- 0.00410 seconds time elapsed  ( +-  0.43% )

    5.13.0.g60ab3ed-tip +slub-local-lock-v2r3
              7,343.57 msec task-clock                #    7.776 CPUs utilized            ( +-  0.44% )
               223,044      context-switches          #    0.030 M/sec                    ( +-  3.02% )
                16,057      cpu-migrations            #    0.002 M/sec                    ( +-  4.03% )
                13,164      page-faults               #    0.002 M/sec                    ( +-  0.97% )
        27,684,906,017      cycles                    #    3.770 GHz                      ( +-  0.45% )
         8,323,273,871      instructions              #    0.30  insn per cycle           ( +-  0.28% )
         1,556,106,680      branches                  #  211.901 M/sec                    ( +-  0.31% )
             5,463,468      branch-misses             #    0.35% of all branches          ( +-  1.33% )

               0.94440 +- 0.00352 seconds time elapsed  ( +-  0.37% )
               0.94830 +- 0.00228 seconds time elapsed  ( +-  0.24% ) (repeat)
               0.93813 +- 0.00440 seconds time elapsed  ( +-  0.47% ) (repeat)

  RT configs showed some throughput regressions, but that's expected
  tradeoff for the preemption improvements through the RT mutex. It
  didn't prevent the v2 to be incorporated to the 5.13 RT tree [7],
  leading to testing exposure and bugfixes.

  Before the series, SLUB is lockless in both allocation and free fast
  paths, but elsewhere, it's disabling irqs for considerable periods of
  time - especially in allocation slowpath and the bulk allocation,
  where IRQs are re-enabled only when a new page from the page allocator
  is needed, and the context allows blocking. The irq disabled sections
  can then include deactivate_slab() which walks a full freelist and
  frees the slab back to page allocator or unfreeze_partials() going
  through a list of percpu partial slabs. The RT tree currently has some
  patches mitigating these, but we can do much better in mainline too.

  Patches 1-6 are straightforward improvements or cleanups that could
  exist outside of this series too, but are prerequsities.

  Patches 7-9 are also preparatory code changes without functional
  changes, but not so useful without the rest of the series.

  Patch 10 simplifies the fast paths on systems with preemption, based
  on (hopefully correct) observation that the current loops to verify
  tid are unnecessary.

  Patches 11-20 focus on reducing irq disabled scope in the allocation
  slowpath:

   - patch 11 moves disabling of irqs into ___slab_alloc() from its
     callers, which are the allocation slowpath, and bulk allocation.
     Instead these callers only disable preemption to stabilize the cpu.

   - The following patches then gradually reduce the scope of disabled
     irqs in ___slab_alloc() and the functions called from there. As of
     patch 14, the re-enabling of irqs based on gfp flags before calling
     the page allocator is removed from allocate_slab(). As of patch 17,
     it's possible to reach the page allocator (in case of existing
     slabs depleted) without disabling and re-enabling irqs a single
     time.

  Pathces 21-26 reduce the scope of disabled irqs in functions related
  to unfreezing percpu partial slab.

  Patch 27 is preparatory. Patch 28 is adopted from the RT tree and
  converts the flushing of percpu slabs on all cpus from using IPI to
  workqueue, so that the processing isn't happening with irqs disabled
  in the IPI handler. The flushing is not performance critical so it
  should be acceptable.

  Patch 29 also comes from RT tree and makes object_map_lock RT
  compatible.

  Patch 30 make slab_lock irq-safe on RT where we cannot rely on having
  irq disabled from the list_lock spin lock usage.

  Patch 31 changes kmem_cache_cpu->partial handling in put_cpu_partial()
  from cmpxchg loop to a short irq disabled section, which is used by
  all other code modifying the field. This addresses a theoretical race
  scenario pointed out by Jann, and makes the critical section safe wrt
  with RT local_lock semantics after the conversion in patch 35.

  Patch 32 changes preempt disable to migrate disable, so that the
  nested list_lock spinlock is safe to take on RT. Because
  migrate_disable() is a function call even on !RT, a small set of
  private wrappers is introduced to keep using the cheaper
  preempt_disable() on !PREEMPT_RT configurations. As of this patch,
  SLUB should be already compatible with RT's lock semantics.

  Finally, patch 33 changes irq disabled sections that protect
  kmem_cache_cpu fields in the slow paths, with a local lock. However on
  PREEMPT_RT it means the lockless fast paths can now preempt slow paths
  which don't expect that, so the local lock has to be taken also in the
  fast paths and they are no longer lockless. RT folks seem to not mind
  this tradeoff. The patch also updates the locking documentation in the
  file's comment"

Mike Galbraith and Mel Gorman verified that their earlier testing
observations still hold for the final series:

Link: https://lore.kernel.org/lkml/89ba4f783114520c167cc915ba949ad2c04d6790.camel@gmx.de/
Link: https://lore.kernel.org/lkml/20210907082010.GB3959@techsingularity.net/
* tag 'mm-slub-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux: (33 commits)
  mm, slub: convert kmem_cpu_slab protection to local_lock
  mm, slub: use migrate_disable() on PREEMPT_RT
  mm, slub: protect put_cpu_partial() with disabled irqs instead of cmpxchg
  mm, slub: make slab_lock() disable irqs with PREEMPT_RT
  mm: slub: make object_map_lock a raw_spinlock_t
  mm: slub: move flush_cpu_slab() invocations __free_slab() invocations out of IRQ context
  mm, slab: split out the cpu offline variant of flush_slab()
  mm, slub: don't disable irqs in slub_cpu_dead()
  mm, slub: only disable irq with spin_lock in __unfreeze_partials()
  mm, slub: separate detaching of partial list in unfreeze_partials() from unfreezing
  mm, slub: detach whole partial list at once in unfreeze_partials()
  mm, slub: discard slabs in unfreeze_partials() without irqs disabled
  mm, slub: move irq control into unfreeze_partials()
  mm, slub: call deactivate_slab() without disabling irqs
  mm, slub: make locking in deactivate_slab() irq-safe
  mm, slub: move reset of c->page and freelist out of deactivate_slab()
  mm, slub: stop disabling irqs around get_partial()
  mm, slub: check new pages with restored irqs
  mm, slub: validate slab from partial list or page allocator before making it cpu slab
  mm, slub: restore irqs around calling new_slab()
  ...

4 months agoselftests/ftrace: Exclude "(fault)" in testing add/remove eprobe events
Steven Rostedt (VMware) [Wed, 8 Sep 2021 03:04:29 +0000 (23:04 -0400)]
selftests/ftrace: Exclude "(fault)" in testing add/remove eprobe events

The original test for adding and removing eprobes used synthetic events
and retrieved the filename from the open system call at the end of the
system call. This would allow it to always be loaded into the page tables
when accessed.

Masami suggested that the test was too complex for just testing add and
remove, so it was changed to test just adding and removing an event probe
on top of the start of the open system call event. Now it is possible that
the filename will not be loaded into memory at the time the eprobe is
triggered, and will result in "(fault)" being displayed in the event. This
causes the test to fail.

Account for "(fault)" also being one of the values of the filename field
of the event probe.

Link: https://lkml.kernel.org/r/20210907230429.5783d519@rorschach.local.home
Fixes: 079db70794ec5 ("selftests/ftrace: Add test case to test adding and removing of event probe")
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
4 months agotracing: Dynamically allocate the per-elt hist_elt_data array
Tom Zanussi [Thu, 2 Sep 2021 20:57:12 +0000 (15:57 -0500)]
tracing: Dynamically allocate the per-elt hist_elt_data array

Setting the hist_elt_data.field_var_str[] array unconditionally to a
size of SYNTH_FIELD_MAX elements wastes space unnecessarily.  The
actual number of elements needed can be calculated at run-time
instead.

In most cases, this will save a lot of space since it's a per-elt
array which isn't normally close to being full.  It also allows us to
increase SYNTH_FIELD_MAX without worrying about even more wastage when
we do that.

Link: https://lkml.kernel.org/r/d52ae0ad5e1b59af7c4f54faf3fc098461fd82b3.camel@kernel.org
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
4 months agotracing: synth events: increase max fields count
Artem Bityutskiy [Wed, 1 Sep 2021 13:55:13 +0000 (16:55 +0300)]
tracing: synth events: increase max fields count

Sometimes it is useful to construct larger synthetic trace events. Increase
'SYNTH_FIELDS_MAX' (maximum number of fields in a synthetic event) from 32 to
64.

Link: https://lkml.kernel.org/r/20210901135513.3087062-1-dedekind1@gmail.com
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Acked-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
4 months agotools/bootconfig: Show whole test command for each test case
Masami Hiramatsu [Sat, 4 Sep 2021 15:54:46 +0000 (00:54 +0900)]
tools/bootconfig: Show whole test command for each test case

Show whole test command instead of only the 3rd argument.
This will clear to show what will be actually tested by
each test case.

Link: https://lkml.kernel.org/r/163077088607.222577.14786016266462495017.stgit@devnote2
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
4 months agobootconfig: Fix missing return check of xbc_node_compose_key function
Julio Faracco [Sat, 4 Sep 2021 15:54:38 +0000 (00:54 +0900)]
bootconfig: Fix missing return check of xbc_node_compose_key function

The function `xbc_show_list should` handle the keys during the
composition. Even the errors returned by the compose function. Instead
of removing the `ret` variable, it should save the value and show the
exact error. This missing variable is causing a compilation issue also.

Link: https://lkml.kernel.org/r/163077087861.222577.12884543474750968146.stgit@devnote2
Fixes: e5efaeb8a8f5 ("bootconfig: Support mixing a value and subkeys under a key")
Signed-off-by: Julio Faracco <jcfaracco@gmail.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
4 months agotools/bootconfig: Fix tracing_on option checking in ftrace2bconf.sh
Masami Hiramatsu [Sat, 4 Sep 2021 15:54:31 +0000 (00:54 +0900)]
tools/bootconfig: Fix tracing_on option checking in ftrace2bconf.sh

Since tracing_on indicates only "1" (default) or "0", ftrace2bconf.sh
only need to check the value is "0".

Link: https://lkml.kernel.org/r/163077087144.222577.6888011847727968737.stgit@devnote2
Fixes: 55ed4560774d ("tools/bootconfig: Add tracing_on support to helper scripts")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
4 months agodocs: bootconfig: Add how to use bootconfig for kernel parameters
Masami Hiramatsu [Sat, 4 Sep 2021 15:54:24 +0000 (00:54 +0900)]
docs: bootconfig: Add how to use bootconfig for kernel parameters

Add a section to describe how to use the bootconfig for
specifying kernel and init parameters. This is an important
section because it is the reason why this document is under
the admin-guide.

Link: https://lkml.kernel.org/r/163077086399.222577.5881779375643977991.stgit@devnote2
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
4 months agoinit/bootconfig: Reorder init parameter from bootconfig and cmdline
Masami Hiramatsu [Sat, 4 Sep 2021 15:54:16 +0000 (00:54 +0900)]
init/bootconfig: Reorder init parameter from bootconfig and cmdline

Reorder the init parameters from bootconfig and kernel cmdline
so that the kernel cmdline always be the last part of the
parameters as below.

 " -- "[bootconfig init params][cmdline init params]

This change will help us to prevent that bootconfig init params
overwrite the init params which user gives in the command line.

Link: https://lkml.kernel.org/r/163077085675.222577.5665176468023636160.stgit@devnote2
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
4 months agoinit: bootconfig: Remove all bootconfig data when the init memory is removed
Masami Hiramatsu [Sat, 4 Sep 2021 15:54:09 +0000 (00:54 +0900)]
init: bootconfig: Remove all bootconfig data when the init memory is removed

Since the bootconfig is used only in the init functions,
it doesn't need to keep the data after boot. Free it when
the init memory is removed.

Link: https://lkml.kernel.org/r/163077084958.222577.5924961258513004428.stgit@devnote2
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
4 months agotracing/osnoise: Fix missed cpus_read_unlock() in start_per_cpu_kthreads()
Qiang.Zhang [Tue, 31 Aug 2021 02:29:19 +0000 (10:29 +0800)]
tracing/osnoise: Fix missed cpus_read_unlock() in start_per_cpu_kthreads()

When start_kthread() return error, the cpus_read_unlock() need
to be called.

Link: https://lkml.kernel.org/r/20210831022919.27630-1-qiang.zhang@windriver.com
Cc: <stable@vger.kernel.org>
Fixes: c8895e271f79 ("trace/osnoise: Support hotplug operations")
Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Qiang.Zhang <qiang.zhang@windriver.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
4 months agoACPI: PRM: Find PRMT table before parsing it
Aubrey Li [Wed, 8 Sep 2021 10:55:45 +0000 (18:55 +0800)]
ACPI: PRM: Find PRMT table before parsing it

Find and verify PRMT before parsing it, which eliminates a
warning on machines without PRMT:

[    7.197173] ACPI: PRMT not present

Fixes: cefc7ca46235 ("ACPI: PRM: implement OperationRegion handler for the PlatformRtMechanism subtype")
Signed-off-by: Aubrey Li <aubrey.li@linux.intel.com>
Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
Cc: 5.14+ <stable@vger.kernel.org> # 5.14+
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
4 months agoscripts: check_extable: fix typo in user error message
Randy Dunlap [Wed, 8 Sep 2021 03:00:59 +0000 (20:00 -0700)]
scripts: check_extable: fix typo in user error message

Fix typo ("and" should be "an") in an error message.

Link: https://lkml.kernel.org/r/20210727002943.29774-1-rdunlap@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 months agomm/workingset: correct kernel-doc notations
Randy Dunlap [Wed, 8 Sep 2021 03:00:56 +0000 (20:00 -0700)]
mm/workingset: correct kernel-doc notations

Use the documented kernel-doc format to prevent kernel-doc warnings.

mm/workingset.c:256: warning: No description found for return value of 'workingset_eviction'
mm/workingset.c:285: warning: Function parameter or member 'folio' not described in 'workingset_refault'
mm/workingset.c:285: warning: Excess function parameter 'page' description in 'workingset_refault'

Link: https://lkml.kernel.org/r/20210808203153.10678-1-rdunlap@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 months agoipc: replace costly bailout check in sysvipc_find_ipc()
Rafael Aquini [Wed, 8 Sep 2021 03:00:53 +0000 (20:00 -0700)]
ipc: replace costly bailout check in sysvipc_find_ipc()

sysvipc_find_ipc() was left with a costly way to check if the offset
position fed to it is bigger than the total number of IPC IDs in use.  So
much so that the time it takes to iterate over /proc/sysvipc/* files grows
exponentially for a custom benchmark that creates "N" SYSV shm segments
and then times the read of /proc/sysvipc/shm (milliseconds):

    12 msecs to read   1024 segs from /proc/sysvipc/shm
    18 msecs to read   2048 segs from /proc/sysvipc/shm
    65 msecs to read   4096 segs from /proc/sysvipc/shm
   325 msecs to read   8192 segs from /proc/sysvipc/shm
  1303 msecs to read  16384 segs from /proc/sysvipc/shm
  5182 msecs to read  32768 segs from /proc/sysvipc/shm

The root problem lies with the loop that computes the total amount of ids
in use to check if the "pos" feeded to sysvipc_find_ipc() grew bigger than
"ids->in_use".  That is a quite inneficient way to get to the maximum
index in the id lookup table, specially when that value is already
provided by struct ipc_ids.max_idx.

This patch follows up on the optimization introduced via commit
15df03c879836 ("sysvipc: make get_maxid O(1) again") and gets rid of the
aforementioned costly loop replacing it by a simpler checkpoint based on
ipc_get_maxidx() returned value, which allows for a smooth linear increase
in time complexity for the same custom benchmark:

     2 msecs to read   1024 segs from /proc/sysvipc/shm
     2 msecs to read   2048 segs from /proc/sysvipc/shm
     4 msecs to read   4096 segs from /proc/sysvipc/shm
     9 msecs to read   8192 segs from /proc/sysvipc/shm
    19 msecs to read  16384 segs from /proc/sysvipc/shm
    39 msecs to read  32768 segs from /proc/sysvipc/shm

Link: https://lkml.kernel.org/r/20210809203554.1562989-1-aquini@redhat.com
Signed-off-by: Rafael Aquini <aquini@redhat.com>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
Acked-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Waiman Long <llong@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 months agoselftests/memfd: remove unused variable
Greg Thelen [Wed, 8 Sep 2021 03:00:50 +0000 (20:00 -0700)]
selftests/memfd: remove unused variable

Commit 544029862cbb ("selftests/memfd: add tests for F_SEAL_FUTURE_WRITE
seal") added an unused variable to mfd_assert_reopen_fd().

Delete the unused variable.

Link: https://lkml.kernel.org/r/20210702045509.1517643-1-gthelen@google.com
Fixes: 544029862cbb ("selftests/memfd: add tests for F_SEAL_FUTURE_WRITE seal")
Signed-off-by: Greg Thelen <gthelen@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 months agoKconfig.debug: drop selecting non-existing HARDLOCKUP_DETECTOR_ARCH
Lukas Bulwahn [Wed, 8 Sep 2021 03:00:47 +0000 (20:00 -0700)]
Kconfig.debug: drop selecting non-existing HARDLOCKUP_DETECTOR_ARCH

Commit 05a4a9527931 ("kernel/watchdog: split up config options") adds a
new config HARDLOCKUP_DETECTOR, which selects the non-existing config
HARDLOCKUP_DETECTOR_ARCH.

Hence, ./scripts/checkkconfigsymbols.py warns:

HARDLOCKUP_DETECTOR_ARCH Referencing files: lib/Kconfig.debug

Simply drop selecting the non-existing HARDLOCKUP_DETECTOR_ARCH.

Link: https://lkml.kernel.org/r/20210806115618.22088-1-lukas.bulwahn@gmail.com
Fixes: 05a4a9527931 ("kernel/watchdog: split up config options")
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Babu Moger <babu.moger@oracle.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 months agoconfigs: remove the obsolete CONFIG_INPUT_POLLDEV
Zenghui Yu [Wed, 8 Sep 2021 03:00:44 +0000 (20:00 -0700)]
configs: remove the obsolete CONFIG_INPUT_POLLDEV

This CONFIG option was removed in commit 278b13ce3a89 ("Input: remove
input_polled_dev implementation") so there's no point to keep it in
defconfigs any longer.

Get rid of the leftover for all arches.

Link: https://lkml.kernel.org/r/20210726074741.1062-1-yuzenghui@huawei.com
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 months agoprctl: allow to setup brk for et_dyn executables
Cyrill Gorcunov [Wed, 8 Sep 2021 03:00:41 +0000 (20:00 -0700)]
prctl: allow to setup brk for et_dyn executables

Keno Fischer reported that when a binray loaded via ld-linux-x the
prctl(PR_SET_MM_MAP) doesn't allow to setup brk value because it lays
before mm:end_data.

For example a test program shows

 | # ~/t
 |
 | start_code      401000
 | end_code        401a15
 | start_stack     7ffce4577dd0
 | start_data    403e10
 | end_data        40408c
 | start_brk    b5b000
 | sbrk(0)         b5b000

and when executed via ld-linux

 | # /lib64/ld-linux-x86-64.so.2 ~/t
 |
 | start_code      7fc25b0a4000
 | end_code        7fc25b0c4524
 | start_stack     7fffcc6b2400
 | start_data    7fc25b0ce4c0
 | end_data        7fc25b0cff98
 | start_brk    55555710c000
 | sbrk(0)         55555710c000

This of course prevent criu from restoring such programs.  Looking into
how kernel operates with brk/start_brk inside brk() syscall I don't see
any problem if we allow to setup brk/start_brk without checking for
end_data.  Even if someone pass some weird address here on a purpose then
the worst possible result will be an unexpected unmapping of existing vma
(own vma, since prctl works with the callers memory) but test for
RLIMIT_DATA is still valid and a user won't be able to gain more memory in
case of expanding VMAs via new values shipped with prctl call.

Link: https://lkml.kernel.org/r/20210121221207.GB2174@grain
Fixes: bbdc6076d2e5 ("binfmt_elf: move brk out of mmap when doing direct loader exec")
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Reported-by: Keno Fischer <keno@juliacomputing.com>
Acked-by: Andrey Vagin <avagin@gmail.com>
Tested-by: Andrey Vagin <avagin@gmail.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Cc: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 months agopid: cleanup the stale comment mentioning pidmap_init().
Takahiro Itazuri [Wed, 8 Sep 2021 03:00:38 +0000 (20:00 -0700)]
pid: cleanup the stale comment mentioning pidmap_init().

pidmap_init() has already been replaced with pid_idr_init() in the commit
95846ecf9dac ("pid: replace pid bitmap implementation with IDR API").
Cleanup the stale comment which still mentions it.

Link: https://lkml.kernel.org/r/20210714120713.19825-1-itazur@amazon.com
Signed-off-by: Takahiro Itazuri <itazur@amazon.com>
Cc: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 months agokernel/fork.c: unexport get_{mm,task}_exe_file
Christoph Hellwig [Wed, 8 Sep 2021 03:00:35 +0000 (20:00 -0700)]
kernel/fork.c: unexport get_{mm,task}_exe_file

Only used by core code and the tomoyo which can't be a module either.

Link: https://lkml.kernel.org/r/20210820095430.445242-1-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>