Pingfan Liu [Mon, 3 Aug 2020 13:18:40 +0000 (21:18 +0800)]
arm64/fixmap: make notes of fixed_addresses more precisely
These 'compile-time allocated' memory buffers can occupy more than one
page and each enum increment is page-sized. So improve the note about it.
Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/1596460720-19243-1-git-send-email-kernelfans@gmail.com
To: linux-arm-kernel@lists.infradead.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Catalin Marinas [Fri, 31 Jul 2020 17:09:57 +0000 (18:09 +0100)]
Merge branch 'for-next/read-barrier-depends' into for-next/core
* for-next/read-barrier-depends:
: Allow architectures to override __READ_ONCE()
arm64: Reduce the number of header files pulled into vmlinux.lds.S
compiler.h: Move compiletime_assert() macros into compiler_types.h
checkpatch: Remove checks relating to [smp_]read_barrier_depends()
include/linux: Remove smp_read_barrier_depends() from comments
tools/memory-model: Remove smp_read_barrier_depends() from informal doc
Documentation/barriers/kokr: Remove references to [smp_]read_barrier_depends()
Documentation/barriers: Remove references to [smp_]read_barrier_depends()
locking/barriers: Remove definitions for [smp_]read_barrier_depends()
alpha: Replace smp_read_barrier_depends() usage with smp_[r]mb()
vhost: Remove redundant use of read_barrier_depends() barrier
asm/rwonce: Don't pull <asm/barrier.h> into 'asm-generic/rwonce.h'
asm/rwonce: Remove smp_read_barrier_depends() invocation
alpha: Override READ_ONCE() with barriered implementation
asm/rwonce: Allow __READ_ONCE to be overridden by the architecture
compiler.h: Split {READ,WRITE}_ONCE definitions out into rwonce.h
tools: bpf: Use local copy of headers including uapi/linux/filter.h
Catalin Marinas [Fri, 31 Jul 2020 17:09:50 +0000 (18:09 +0100)]
Merge branch 'for-next/tlbi' into for-next/core
* for-next/tlbi:
: Support for TTL (translation table level) hint in the TLB operations
arm64: tlb: Use the TLBI RANGE feature in arm64
arm64: enable tlbi range instructions
arm64: tlb: Detect the ARMv8.4 TLBI RANGE feature
arm64: tlb: don't set the ttl value in flush_tlb_page_nosync
arm64: Shift the __tlbi_level() indentation left
arm64: tlb: Set the TTL field in flush_*_tlb_range
arm64: tlb: Set the TTL field in flush_tlb_range
tlb: mmu_gather: add tlb_flush_*_range APIs
arm64: Add tlbi_user_level TLB invalidation helper
arm64: Add level-hinted TLB invalidation helper
arm64: Document SW reserved PTE/PMD bits in Stage-2 descriptors
arm64: Detect the ARMv8.4 TTL feature
Catalin Marinas [Fri, 31 Jul 2020 17:09:39 +0000 (18:09 +0100)]
Merge branches 'for-next/misc', 'for-next/vmcoreinfo', 'for-next/cpufeature', 'for-next/acpi', 'for-next/perf', 'for-next/timens', 'for-next/msi-iommu' and 'for-next/trivial' into for-next/core
* for-next/misc:
: Miscellaneous fixes and cleanups
arm64: use IRQ_STACK_SIZE instead of THREAD_SIZE for irq stack
arm64/mm: save memory access in check_and_switch_context() fast switch path
recordmcount: only record relocation of type R_AARCH64_CALL26 on arm64.
arm64: Reserve HWCAP2_MTE as (1 << 18)
arm64/entry: deduplicate SW PAN entry/exit routines
arm64: s/AMEVTYPE/AMEVTYPER
arm64/hugetlb: Reserve CMA areas for gigantic pages on 16K and 64K configs
arm64: stacktrace: Move export for save_stack_trace_tsk()
smccc: Make constants available to assembly
arm64/mm: Redefine CONT_{PTE, PMD}_SHIFT
arm64/defconfig: Enable CONFIG_KEXEC_FILE
arm64: Document sysctls for emulated deprecated instructions
arm64/panic: Unify all three existing notifier blocks
arm64/module: Optimize module load time by optimizing PLT counting
* for-next/vmcoreinfo:
: Export the virtual and physical address sizes in vmcoreinfo
arm64/crash_core: Export TCR_EL1.T1SZ in vmcoreinfo
crash_core, vmcoreinfo: Append 'MAX_PHYSMEM_BITS' to vmcoreinfo
* for-next/cpufeature:
: CPU feature handling cleanups
arm64/cpufeature: Validate feature bits spacing in arm64_ftr_regs[]
arm64/cpufeature: Replace all open bits shift encodings with macros
arm64/cpufeature: Add remaining feature bits in ID_AA64MMFR2 register
arm64/cpufeature: Add remaining feature bits in ID_AA64MMFR1 register
arm64/cpufeature: Add remaining feature bits in ID_AA64MMFR0 register
* for-next/acpi:
: ACPI updates for arm64
arm64/acpi: disallow writeable AML opregion mapping for EFI code regions
arm64/acpi: disallow AML memory opregions to access kernel memory
* for-next/perf:
: perf updates for arm64
arm64: perf: Expose some new events via sysfs
tools headers UAPI: Update tools's copy of linux/perf_event.h
arm64: perf: Add cap_user_time_short
perf: Add perf_event_mmap_page::cap_user_time_short ABI
arm64: perf: Only advertise cap_user_time for arch_timer
arm64: perf: Implement correct cap_user_time
time/sched_clock: Use raw_read_seqcount_latch()
sched_clock: Expose struct clock_read_data
arm64: perf: Correct the event index in sysfs
perf/smmuv3: To simplify code for ioremap page in pmcg
* for-next/timens:
: Time namespace support for arm64
arm64: enable time namespace support
arm64/vdso: Restrict splitting VVAR VMA
arm64/vdso: Handle faults on timens page
arm64/vdso: Add time namespace page
arm64/vdso: Zap vvar pages when switching to a time namespace
arm64/vdso: use the fault callback to map vvar pages
* for-next/msi-iommu:
: Make the MSI/IOMMU input/output ID translation PCI agnostic, augment the
: MSI/IOMMU ACPI/OF ID mapping APIs to accept an input ID bus-specific parameter
: and apply the resulting changes to the device ID space provided by the
: Freescale FSL bus
bus: fsl-mc: Add ACPI support for fsl-mc
bus/fsl-mc: Refactor the MSI domain creation in the DPRC driver
of/irq: Make of_msi_map_rid() PCI bus agnostic
of/irq: make of_msi_map_get_device_domain() bus agnostic
dt-bindings: arm: fsl: Add msi-map device-tree binding for fsl-mc bus
of/device: Add input id to of_dma_configure()
of/iommu: Make of_map_rid() PCI agnostic
ACPI/IORT: Add an input ID to acpi_dma_configure()
ACPI/IORT: Remove useless PCI bus walk
ACPI/IORT: Make iort_msi_map_rid() PCI agnostic
ACPI/IORT: Make iort_get_device_domain IRQ domain agnostic
ACPI/IORT: Make iort_match_node_callback walk the ACPI namespace for NC
* for-next/trivial:
: Trivial fixes
arm64: sigcontext.h: delete duplicated word
arm64: ptrace.h: delete duplicated word
arm64: pgtable-hwdef.h: delete duplicated words
Maninder Singh [Fri, 31 Jul 2020 11:49:50 +0000 (17:19 +0530)]
arm64: use IRQ_STACK_SIZE instead of THREAD_SIZE for irq stack
IRQ_STACK_SIZE can be made different from THREAD_SIZE,
and as IRQ_STACK_SIZE is used while irq stack allocation,
same define should be used while printing information of irq stack.
Signed-off-by: Maninder Singh <maninder1.s@samsung.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/1596196190-14141-1-git-send-email-maninder1.s@samsung.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Pingfan Liu [Fri, 10 Jul 2020 14:04:12 +0000 (22:04 +0800)]
arm64/mm: save memory access in check_and_switch_context() fast switch path
On arm64, smp_processor_id() reads a per-cpu `cpu_number` variable,
using the per-cpu offset stored in the tpidr_el1 system register. In
some cases we generate a per-cpu address with a sequence like:
cpu_ptr = &per_cpu(ptr, smp_processor_id());
Which potentially incurs a cache miss for both `cpu_number` and the
in-memory `__per_cpu_offset` array. This can be written more optimally
as:
cpu_ptr = this_cpu_ptr(ptr);
Which only needs the offset from tpidr_el1, and does not need to
load from memory.
The following two test cases show a small performance improvement measured
on a 46-cpus qualcomm machine with 5.8.0-rc4 kernel.
Test 1: (about 0.3% improvement)
#cat b.sh
make clean && make all -j138
#perf stat --repeat 10 --null --sync sh b.sh
- before this patch
Performance counter stats for 'sh b.sh' (10 runs):
298.62 +- 1.86 seconds time elapsed ( +- 0.62% )
- after this patch
Performance counter stats for 'sh b.sh' (10 runs):
297.734 +- 0.954 seconds time elapsed ( +- 0.32% )
Test 2: (about 1.69% improvement)
'perf stat -r 10 perf bench sched messaging'
Then sum the total time of 'sched/messaging' by manual.
- before this patch
total 0.707 sec for 10 times
- after this patch
totol 0.695 sec for 10 times
Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Vladimir Murzin <vladimir.murzin@arm.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/r/1594389852-19949-1-git-send-email-kernelfans@gmail.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Randy Dunlap [Sun, 26 Jul 2020 00:32:07 +0000 (17:32 -0700)]
arm64: sigcontext.h: delete duplicated word
Drop the repeated word "the".
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20200726003207.20253-4-rdunlap@infradead.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Randy Dunlap [Sun, 26 Jul 2020 00:32:06 +0000 (17:32 -0700)]
arm64: ptrace.h: delete duplicated word
Drop the repeated word "the".
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20200726003207.20253-3-rdunlap@infradead.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Randy Dunlap [Sun, 26 Jul 2020 00:32:05 +0000 (17:32 -0700)]
arm64: pgtable-hwdef.h: delete duplicated words
Drop the repeated words "at" and "the".
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20200726003207.20253-2-rdunlap@infradead.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Makarand Pawagi [Fri, 19 Jun 2020 08:20:13 +0000 (09:20 +0100)]
bus: fsl-mc: Add ACPI support for fsl-mc
Add ACPI support in the fsl-mc driver. Driver parses MC DSDT table to
extract memory and other resources.
Interrupt (GIC ITS) information is extracted from the MADT table
by drivers/irqchip/irq-gic-v3-its-fsl-mc-msi.c.
IORT table is parsed to configure DMA.
Signed-off-by: Makarand Pawagi <makarand.pawagi@nxp.com>
Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com>
Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
Link: https://lore.kernel.org/r/20200619082013.13661-13-lorenzo.pieralisi@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Diana Craciun [Fri, 19 Jun 2020 08:20:12 +0000 (09:20 +0100)]
bus/fsl-mc: Refactor the MSI domain creation in the DPRC driver
The DPRC driver is not taking into account the msi-map property
and assumes that the icid is the same as the stream ID. Although
this assumption is correct, generalize the code to include a
translation between icid and streamID.
Furthermore do not just copy the MSI domain from parent (for child
containers), but use the information provided by the msi-map property.
If the msi-map property is missing from the device tree retain the old
behaviour for backward compatibility ie the child DPRC objects
inherit the MSI domain from the parent.
Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200619082013.13661-12-lorenzo.pieralisi@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Lorenzo Pieralisi [Fri, 19 Jun 2020 08:20:11 +0000 (09:20 +0100)]
of/irq: Make of_msi_map_rid() PCI bus agnostic
There is nothing PCI bus specific in the of_msi_map_rid()
implementation other than the requester ID tag for the input
ID space. Rename requester ID to a more generic ID so that
the translation code can be used by all busses that require
input/output ID translations.
No functional change intended.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200619082013.13661-11-lorenzo.pieralisi@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Diana Craciun [Fri, 19 Jun 2020 08:20:10 +0000 (09:20 +0100)]
of/irq: make of_msi_map_get_device_domain() bus agnostic
of_msi_map_get_device_domain() is PCI specific but it need not be and
can be easily changed to be bus agnostic in order to be used by other
busses by adding an IRQ domain bus token as an input parameter.
Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com> # pci/msi.c
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200619082013.13661-10-lorenzo.pieralisi@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Laurentiu Tudor [Fri, 19 Jun 2020 08:20:09 +0000 (09:20 +0100)]
dt-bindings: arm: fsl: Add msi-map device-tree binding for fsl-mc bus
The existing bindings cannot be used to specify the relationship
between fsl-mc devices and GIC ITSes.
Add a generic binding for mapping fsl-mc devices to GIC ITSes, using
msi-map property.
In addition, deprecate msi-parent property which no longer makes sense
now that we support translating the MSIs.
Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Cc: Rob Herring <robh+dt@kernel.org>
Link: https://lore.kernel.org/r/20200619082013.13661-9-lorenzo.pieralisi@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Lorenzo Pieralisi [Fri, 19 Jun 2020 08:20:08 +0000 (09:20 +0100)]
of/device: Add input id to of_dma_configure()
Devices sitting on proprietary busses have a device ID space that
is owned by the respective bus and related firmware bindings. In order
to let the generic OF layer handle the input translations to
an IOMMU id, for such busses the current of_dma_configure() interface
should be extended in order to allow the bus layer to provide the
device input id parameter - that is retrieved/assigned in bus
specific code and firmware.
Augment of_dma_configure() to add an optional input_id parameter,
leaving current functionality unchanged.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Laurentiu Tudor <laurentiu.tudor@nxp.com>
Link: https://lore.kernel.org/r/20200619082013.13661-8-lorenzo.pieralisi@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Lorenzo Pieralisi [Fri, 19 Jun 2020 08:20:07 +0000 (09:20 +0100)]
of/iommu: Make of_map_rid() PCI agnostic
There is nothing PCI specific (other than the RID - requester ID)
in the of_map_rid() implementation, so the same function can be
reused for input/output IDs mapping for other busses just as well.
Rename the RID instances/names to a generic "id" tag.
No functionality change intended.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Acked-by: Joerg Roedel <jroedel@suse.de>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200619082013.13661-7-lorenzo.pieralisi@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Lorenzo Pieralisi [Fri, 19 Jun 2020 08:20:06 +0000 (09:20 +0100)]
ACPI/IORT: Add an input ID to acpi_dma_configure()
Some HW devices are created as child devices of proprietary busses,
that have a bus specific policy defining how the child devices
wires representing the devices ID are translated into IOMMU and
IRQ controllers device IDs.
Current IORT code provides translations for:
- PCI devices, where the device ID is well identified at bus level
as the requester ID (RID)
- Platform devices that are endpoint devices where the device ID is
retrieved from the ACPI object IORT mappings (Named components single
mappings). A platform device is represented in IORT as a named
component node
For devices that are child devices of proprietary busses the IORT
firmware represents the bus node as a named component node in IORT
and it is up to that named component node to define in/out bus
specific ID translations for the bus child devices that are
allocated and created in a bus specific manner.
In order to make IORT ID translations available for proprietary
bus child devices, the current ACPI (and IORT) code must be
augmented to provide an additional ID parameter to acpi_dma_configure()
representing the child devices input ID. This ID is bus specific
and it is retrieved in bus specific code.
By adding an ID parameter to acpi_dma_configure(), the IORT
code can map the child device ID to an IOMMU stream ID through
the IORT named component representing the bus in/out ID mappings.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Link: https://lore.kernel.org/r/20200619082013.13661-6-lorenzo.pieralisi@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Lorenzo Pieralisi [Fri, 19 Jun 2020 08:20:05 +0000 (09:20 +0100)]
ACPI/IORT: Remove useless PCI bus walk
The PCI bus domain number (used in the iort_match_node_callback() -
pci_domain_nr() call) is cascaded through the PCI bus hierarchy at PCI
bus enumeration time, therefore there is no need in iort_find_dev_node()
to walk the PCI bus upwards to grab the root bus to be passed to
iort_scan_node(), the device->bus PCI bus pointer will do.
Remove this useless code.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Link: https://lore.kernel.org/r/20200619082013.13661-5-lorenzo.pieralisi@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Lorenzo Pieralisi [Fri, 19 Jun 2020 08:20:04 +0000 (09:20 +0100)]
ACPI/IORT: Make iort_msi_map_rid() PCI agnostic
There is nothing PCI specific in iort_msi_map_rid().
Rename the function using a bus protocol agnostic name,
iort_msi_map_id(), and convert current callers to it.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Link: https://lore.kernel.org/r/20200619082013.13661-4-lorenzo.pieralisi@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Lorenzo Pieralisi [Fri, 19 Jun 2020 08:20:03 +0000 (09:20 +0100)]
ACPI/IORT: Make iort_get_device_domain IRQ domain agnostic
iort_get_device_domain() is PCI specific but it need not be,
since it can be used to retrieve IRQ domain nexus of any kind
by adding an irq_domain_bus_token input to it.
Make it PCI agnostic by also renaming the requestor ID input
to a more generic ID name.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com> # pci/msi.c
Cc: Will Deacon <will@kernel.org>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Link: https://lore.kernel.org/r/20200619082013.13661-3-lorenzo.pieralisi@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Lorenzo Pieralisi [Fri, 19 Jun 2020 08:20:02 +0000 (09:20 +0100)]
ACPI/IORT: Make iort_match_node_callback walk the ACPI namespace for NC
When the iort_match_node_callback is invoked for a named component
the match should be executed upon a device with an ACPI companion.
For devices with no ACPI companion set-up the ACPI device tree must be
walked in order to find the first parent node with a companion set and
check the parent node against the named component entry to check whether
there is a match and therefore an IORT node describing the in/out ID
translation for the device has been found.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Link: https://lore.kernel.org/r/20200619082013.13661-2-lorenzo.pieralisi@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Andrei Vagin [Wed, 24 Jun 2020 08:33:21 +0000 (01:33 -0700)]
arm64: enable time namespace support
CONFIG_TIME_NS is dependes on GENERIC_VDSO_TIME_NS.
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Dmitry Safonov <dima@arista.com>
Link: https://lore.kernel.org/r/20200624083321.144975-7-avagin@gmail.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Andrei Vagin [Wed, 24 Jun 2020 08:33:20 +0000 (01:33 -0700)]
arm64/vdso: Restrict splitting VVAR VMA
Forbid splitting VVAR VMA resulting in a stricter ABI and reducing the
amount of corner-cases to consider while working further on VDSO time
namespace support.
As the offset from timens to VVAR page is computed compile-time, the pages
in VVAR should stay together and not being partically mremap()'ed.
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Dmitry Safonov <dima@arista.com>
Link: https://lore.kernel.org/r/20200624083321.144975-6-avagin@gmail.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Andrei Vagin [Wed, 24 Jun 2020 08:33:19 +0000 (01:33 -0700)]
arm64/vdso: Handle faults on timens page
If a task belongs to a time namespace then the VVAR page which contains
the system wide VDSO data is replaced with a namespace specific page
which has the same layout as the VVAR page.
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Dmitry Safonov <dima@arista.com>
Link: https://lore.kernel.org/r/20200624083321.144975-5-avagin@gmail.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Andrei Vagin [Wed, 24 Jun 2020 08:33:18 +0000 (01:33 -0700)]
arm64/vdso: Add time namespace page
Allocate the time namespace page among VVAR pages. Provide
__arch_get_timens_vdso_data() helper for VDSO code to get the
code-relative position of VVARs on that special page.
If a task belongs to a time namespace then the VVAR page which contains
the system wide VDSO data is replaced with a namespace specific page
which has the same layout as the VVAR page. That page has vdso_data->seq
set to 1 to enforce the slow path and vdso_data->clock_mode set to
VCLOCK_TIMENS to enforce the time namespace handling path.
The extra check in the case that vdso_data->seq is odd, e.g. a concurrent
update of the VDSO data is in progress, is not really affecting regular
tasks which are not part of a time namespace as the task is spin waiting
for the update to finish and vdso_data->seq to become even again.
If a time namespace task hits that code path, it invokes the corresponding
time getter function which retrieves the real VVAR page, reads host time
and then adds the offset for the requested clock which is stored in the
special VVAR page.
The time-namespace page isn't allocated on !CONFIG_TIME_NAMESPACE, but
vma is the same size, which simplifies criu/vdso migration between
different kernel configs.
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Dmitry Safonov <dima@arista.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20200624083321.144975-4-avagin@gmail.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Andrei Vagin [Wed, 24 Jun 2020 08:33:17 +0000 (01:33 -0700)]
arm64/vdso: Zap vvar pages when switching to a time namespace
The order of vvar pages depends on whether a task belongs to the root
time namespace or not. In the root time namespace, a task doesn't have a
per-namespace page. In a non-root namespace, the VVAR page which contains
the system-wide VDSO data is replaced with a namespace specific page
that contains clock offsets.
Whenever a task changes its namespace, the VVAR page tables are cleared
and then they will be re-faulted with a corresponding layout.
A task can switch its time namespace only if its ->mm isn't shared with
another task.
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Dmitry Safonov <dima@arista.com>
Reviewed-by: Christian Brauner <christian.brauner@ubuntu.com>
Link: https://lore.kernel.org/r/20200624083321.144975-3-avagin@gmail.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Andrei Vagin [Wed, 24 Jun 2020 08:33:16 +0000 (01:33 -0700)]
arm64/vdso: use the fault callback to map vvar pages
Currently the vdso has no awareness of time namespaces, which may
apply distinct offsets to processes in different namespaces. To handle
this within the vdso, we'll need to expose a per-namespace data page.
As a preparatory step, this patch separates the vdso data page from
the code pages, and has it faulted in via its own fault callback.
Subsquent patches will extend this to support distinct pages per time
namespace.
The vvar vma has to be installed with the VM_PFNMAP flag to handle
faults via its vma fault callback.
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Dmitry Safonov <dima@arista.com>
Link: https://lore.kernel.org/r/20200624083321.144975-2-avagin@gmail.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Gregory Herrero [Fri, 17 Jul 2020 14:33:38 +0000 (16:33 +0200)]
recordmcount: only record relocation of type R_AARCH64_CALL26 on arm64.
Currently, if a section has a relocation to '_mcount' symbol, a new
__mcount_loc entry will be added whatever the relocation type is.
This is problematic when a relocation to '_mcount' is in the middle of a
section and is not a call for ftrace use.
Such relocation could be generated with below code for example:
bool is_mcount(unsigned long addr)
{
return (target == (unsigned long) &_mcount);
}
With this snippet of code, ftrace will try to patch the mcount location
generated by this code on module load and fail with:
Call trace:
ftrace_bug+0xa0/0x28c
ftrace_process_locs+0x2f4/0x430
ftrace_module_init+0x30/0x38
load_module+0x14f0/0x1e78
__do_sys_finit_module+0x100/0x11c
__arm64_sys_finit_module+0x28/0x34
el0_svc_common+0x88/0x194
el0_svc_handler+0x38/0x8c
el0_svc+0x8/0xc
---[ end trace
d828d06b36ad9d59 ]---
ftrace failed to modify
[<
ffffa2dbf3a3a41c>] 0xffffa2dbf3a3a41c
actual: 66:a9:3c:90
Initializing ftrace call sites
ftrace record flags:
2000000
(0)
expected tramp:
ffffa2dc6cf66724
So Limit the relocation type to R_AARCH64_CALL26 as in perl version of
recordmcount.
Fixes:
af64d2aa872a ("ftrace: Add arm64 support to recordmcount")
Signed-off-by: Gregory Herrero <gregory.herrero@oracle.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20200717143338.19302-1-gregory.herrero@oracle.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Catalin Marinas [Fri, 24 Jul 2020 09:41:31 +0000 (10:41 +0100)]
arm64: Reserve HWCAP2_MTE as (1 << 18)
While MTE is not supported in the upstream kernel yet, add a comment
that HWCAP2_MTE as (1 << 18) is reserved. Glibc makes use of it for the
resolving (ifunc) of the MTE-safe string routines.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Ard Biesheuvel [Tue, 21 Jul 2020 08:33:15 +0000 (10:33 +0200)]
arm64/entry: deduplicate SW PAN entry/exit routines
Factor the 12 copies of the SW PAN entry and exit code into callable
subroutines, and use alternatives patching to either emit a 'bl'
instruction to call them, or a NOP if h/w PAN is found to be available
at runtime.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20200721083315.4816-1-ardb@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Vladimir Murzin [Tue, 21 Jul 2020 09:12:59 +0000 (10:12 +0100)]
arm64: s/AMEVTYPE/AMEVTYPER
Activity Monitor Event Type Registers are named as AMEVTYPER{0,1}<n>
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20200721091259.102756-1-vladimir.murzin@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Shaokun Zhang [Tue, 21 Jul 2020 10:49:33 +0000 (18:49 +0800)]
arm64: perf: Expose some new events via sysfs
Some new PMU events can been detected by PMCEID1_EL0, but it can't
be listed, Let's expose these through sysfs.
Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/1595328573-12751-2-git-send-email-zhangshaokun@hisilicon.com
Signed-off-by: Will Deacon <will@kernel.org>
Will Deacon [Tue, 30 Jun 2020 12:53:07 +0000 (13:53 +0100)]
arm64: Reduce the number of header files pulled into vmlinux.lds.S
Although vmlinux.lds.S smells like an assembly file and is compiled
with __ASSEMBLY__ defined, it's actually just fed to the preprocessor to
create our linker script. This means that any assembly macros defined
by headers that it includes will result in a helpful link error:
| aarch64-linux-gnu-ld:./arch/arm64/kernel/vmlinux.lds:1: syntax error
In preparation for an arm64-private asm/rwonce.h implementation, which
will end up pulling assembly macros into linux/compiler.h, reduce the
number of headers we include directly and transitively in vmlinux.lds.S
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Will Deacon <will@kernel.org>
Will Deacon [Tue, 21 Jul 2020 08:54:15 +0000 (09:54 +0100)]
compiler.h: Move compiletime_assert() macros into compiler_types.h
The kernel test robot reports that moving READ_ONCE() out into its own
header breaks a W=1 build for parisc, which is relying on the definition
of compiletime_assert() being available:
| In file included from ./arch/parisc/include/generated/asm/rwonce.h:1,
| from ./include/asm-generic/barrier.h:16,
| from ./arch/parisc/include/asm/barrier.h:29,
| from ./arch/parisc/include/asm/atomic.h:11,
| from ./include/linux/atomic.h:7,
| from kernel/locking/percpu-rwsem.c:2:
| ./arch/parisc/include/asm/atomic.h: In function 'atomic_read':
| ./include/asm-generic/rwonce.h:36:2: error: implicit declaration of function 'compiletime_assert' [-Werror=implicit-function-declaration]
| 36 | compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long), \
| | ^~~~~~~~~~~~~~~~~~
| ./include/asm-generic/rwonce.h:49:2: note: in expansion of macro 'compiletime_assert_rwonce_type'
| 49 | compiletime_assert_rwonce_type(x); \
| | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| ./arch/parisc/include/asm/atomic.h:73:9: note: in expansion of macro 'READ_ONCE'
| 73 | return READ_ONCE((v)->counter);
| | ^~~~~~~~~
Move these macros into compiler_types.h, so that they are available to
READ_ONCE() and friends.
Link: http://lists.infradead.org/pipermail/linux-arm-kernel/2020-July/587094.html
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Will Deacon <will@kernel.org>
Will Deacon [Thu, 7 Nov 2019 14:49:00 +0000 (14:49 +0000)]
checkpatch: Remove checks relating to [smp_]read_barrier_depends()
The [smp_]read_barrier_depends() macros no longer exist, so we don't
need to deal with them in the checkpatch script.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Will Deacon [Thu, 7 Nov 2019 14:46:59 +0000 (14:46 +0000)]
include/linux: Remove smp_read_barrier_depends() from comments
smp_read_barrier_depends() doesn't exist any more, so reword the two
comments that mention it to refer to "dependency ordering" instead.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Will Deacon [Thu, 7 Nov 2019 14:44:06 +0000 (14:44 +0000)]
tools/memory-model: Remove smp_read_barrier_depends() from informal doc
smp_read_barrier_depends() has gone the way of mmiowb() and so many
esoteric memory barriers before it. Drop the two mentions of this
deceased barrier from the LKMM informal explanation document.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
SeongJae Park [Fri, 29 Nov 2019 18:08:37 +0000 (19:08 +0100)]
Documentation/barriers/kokr: Remove references to [smp_]read_barrier_depends()
This commit translates commit ("Documentation/barriers: Remove references to
[smp_]read_barrier_depends()") into Korean.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Yunjae Lee <lyj7694@gmail.com>
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Signed-off-by: Will Deacon <will@kernel.org>
Will Deacon [Thu, 7 Nov 2019 14:36:37 +0000 (14:36 +0000)]
Documentation/barriers: Remove references to [smp_]read_barrier_depends()
The [smp_]read_barrier_depends() barrier macros no longer exist as
part of the Linux memory model, so remove all references to them from
the Documentation/ directory.
Although this is fairly mechanical on the whole, we drop the "CACHE
COHERENCY" section entirely from 'memory-barriers.txt' as it doesn't
make any sense now that the dependency barriers have been removed.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Will Deacon [Wed, 30 Oct 2019 17:17:22 +0000 (17:17 +0000)]
locking/barriers: Remove definitions for [smp_]read_barrier_depends()
There are no remaining users of [smp_]read_barrier_depends(), so
remove it from the generic implementation of 'barrier.h'.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Will Deacon [Wed, 30 Oct 2019 17:15:01 +0000 (17:15 +0000)]
alpha: Replace smp_read_barrier_depends() usage with smp_[r]mb()
In preparation for removing smp_read_barrier_depends() altogether,
move the Alpha code over to using smp_rmb() and smp_mb() directly.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Will Deacon [Wed, 30 Oct 2019 16:22:17 +0000 (16:22 +0000)]
vhost: Remove redundant use of read_barrier_depends() barrier
Since commit
76ebbe78f739 ("locking/barriers: Add implicit
smp_read_barrier_depends() to READ_ONCE()"), there is no need to use
smp_read_barrier_depends() outside of the Alpha architecture code.
Unfortunately, there is precisely _one_ user in the vhost code, and
there isn't an obvious READ_ONCE() access making the barrier
redundant. However, on closer inspection (thanks, Jason), it appears
that vring synchronisation between the producer and consumer occurs via
the 'avail_idx' field, which is followed up by an rmb() in
vhost_get_vq_desc(), making the read_barrier_depends() redundant on
Alpha.
Jason says:
| I'm also confused about the barrier here, basically in driver side
| we did:
|
| 1) allocate pages
| 2) store pages in indirect->addr
| 3) smp_wmb()
| 4) increase the avail idx (somehow a tail pointer of vring)
|
| in vhost we did:
|
| 1) read avail idx
| 2) smp_rmb()
| 3) read indirect->addr
| 4) read from indirect->addr
|
| It looks to me even the data dependency barrier is not necessary
| since we have rmb() which is sufficient for us to the correct
| indirect->addr and driver are not expected to do any writing to
| indirect->addr after avail idx is increased
Remove the redundant barrier invocation.
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Suggested-by: Jason Wang <jasowang@redhat.com>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Will Deacon [Fri, 10 Jul 2020 13:49:40 +0000 (14:49 +0100)]
asm/rwonce: Don't pull <asm/barrier.h> into 'asm-generic/rwonce.h'
Now that 'smp_read_barrier_depends()' has gone the way of the Norwegian
Blue, drop the inclusion of <asm/barrier.h> in 'asm-generic/rwonce.h'.
This requires fixups to some architecture vdso headers which were
previously relying on 'asm/barrier.h' coming in via 'linux/compiler.h'.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Will Deacon <will@kernel.org>
Will Deacon [Wed, 30 Oct 2019 16:51:07 +0000 (16:51 +0000)]
asm/rwonce: Remove smp_read_barrier_depends() invocation
Alpha overrides __READ_ONCE() directly, so there's no need to use
smp_read_barrier_depends() in the core code. This also means that
__READ_ONCE() can be relied upon to provide dependency ordering.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Will Deacon [Wed, 30 Oct 2019 16:50:10 +0000 (16:50 +0000)]
alpha: Override READ_ONCE() with barriered implementation
Rather then relying on the core code to use smp_read_barrier_depends()
as part of the READ_ONCE() definition, instead override __READ_ONCE()
in the Alpha code so that it generates the required mb() and then
implement smp_load_acquire() using the new macro to avoid redundant
back-to-back barriers from the generic implementation.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Will Deacon [Wed, 16 Oct 2019 00:30:47 +0000 (17:30 -0700)]
asm/rwonce: Allow __READ_ONCE to be overridden by the architecture
The meat and potatoes of READ_ONCE() is defined by the __READ_ONCE()
macro, which uses a volatile casts in an attempt to avoid tearing of
byte, halfword, word and double-word accesses. Allow this to be
overridden by the architecture code in the case that things like memory
barriers are also required.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Will Deacon [Tue, 15 Oct 2019 23:29:32 +0000 (16:29 -0700)]
compiler.h: Split {READ,WRITE}_ONCE definitions out into rwonce.h
In preparation for allowing architectures to define their own
implementation of the READ_ONCE() macro, move the generic
{READ,WRITE}_ONCE() definitions out of the unwieldy 'linux/compiler.h'
file and into a new 'rwonce.h' header under 'asm-generic'.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Will Deacon [Thu, 21 Nov 2019 12:41:40 +0000 (12:41 +0000)]
tools: bpf: Use local copy of headers including uapi/linux/filter.h
Pulling header files directly out of the kernel sources for inclusion in
userspace programs is highly error prone, not least because it bypasses
the kbuild infrastructure entirely and so may end up referencing other
header files that have not been generated.
Subsequent patches will cause compiler.h to pull in the ungenerated
asm/rwonce.h file via filter.h, breaking the build for tools/bpf:
| $ make -C tools/bpf
| make: Entering directory '/linux/tools/bpf'
| CC bpf_jit_disasm.o
| LINK bpf_jit_disasm
| CC bpf_dbg.o
| In file included from /linux/include/uapi/linux/filter.h:9,
| from /linux/tools/bpf/bpf_dbg.c:41:
| /linux/include/linux/compiler.h:247:10: fatal error: asm/rwonce.h: No such file or directory
| #include <asm/rwonce.h>
| ^~~~~~~~~~~~~~
| compilation terminated.
| make: *** [Makefile:61: bpf_dbg.o] Error 1
| make: Leaving directory '/linux/tools/bpf'
Take a copy of the installed version of linux/filter.h (i.e. the one
created by the 'headers_install' target) into tools/include/uapi/linux/
and adjust the BPF tool Makefile to reference the local include
directories instead of those in the main source tree.
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Reported-by: Xiao Yang <ice_yangxiao@163.com>
Signed-off-by: Will Deacon <will@kernel.org>
Leo Yan [Thu, 16 Jul 2020 05:11:30 +0000 (13:11 +0800)]
tools headers UAPI: Update tools's copy of linux/perf_event.h
To get the changes in the commit:
"perf: Add perf_event_mmap_page::cap_user_time_short ABI"
This update is a prerequisite to add support for short clock counters
related ABI extension.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Link: https://lore.kernel.org/r/20200716051130.4359-8-leo.yan@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
Peter Zijlstra [Thu, 16 Jul 2020 05:11:29 +0000 (13:11 +0800)]
arm64: perf: Add cap_user_time_short
This completes the ARM64 cap_user_time support.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Link: https://lore.kernel.org/r/20200716051130.4359-7-leo.yan@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
Peter Zijlstra [Thu, 16 Jul 2020 05:11:28 +0000 (13:11 +0800)]
perf: Add perf_event_mmap_page::cap_user_time_short ABI
In order to support short clock counters, provide an ABI extension.
As a whole:
u64 time, delta, cyc = read_cycle_counter();
+ if (cap_user_time_short)
+ cyc = time_cycle + ((cyc - time_cycle) & time_mask);
delta = mul_u64_u32_shr(cyc, time_mult, time_shift);
if (cap_user_time_zero)
time = time_zero + delta;
delta += time_offset;
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Link: https://lore.kernel.org/r/20200716051130.4359-6-leo.yan@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
Peter Zijlstra [Thu, 16 Jul 2020 05:11:27 +0000 (13:11 +0800)]
arm64: perf: Only advertise cap_user_time for arch_timer
When sched_clock is running on anything other than arch_timer, don't
advertise cap_user_time*.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Link: https://lore.kernel.org/r/20200716051130.4359-5-leo.yan@linaro.org
Requested-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Peter Zijlstra [Thu, 16 Jul 2020 05:11:26 +0000 (13:11 +0800)]
arm64: perf: Implement correct cap_user_time
As reported by Leo; the existing implementation is broken when the
clock and counter don't intersect at 0.
Use the sched_clock's struct clock_read_data information to correctly
implement cap_user_time and cap_user_time_zero.
Note that the ARM64 counter is architecturally only guaranteed to be
56bit wide (implementations are allowed to be wider) and the existing
perf ABI cannot deal with wrap-around.
This implementation should also be faster than the old; seeing how we
don't need to recompute mult and shift all the time.
[leoyan: Use mul_u64_u32_shr() to convert cyc to ns to avoid overflow]
Reported-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Link: https://lore.kernel.org/r/20200716051130.4359-4-leo.yan@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
Ahmed S. Darwish [Thu, 16 Jul 2020 05:11:25 +0000 (13:11 +0800)]
time/sched_clock: Use raw_read_seqcount_latch()
sched_clock uses seqcount_t latching to switch between two storage
places protected by the sequence counter. This allows it to have
interruptible, NMI-safe, seqcount_t write side critical sections.
Since
7fc26327b756 ("seqlock: Introduce raw_read_seqcount_latch()"),
raw_read_seqcount_latch() became the standardized way for seqcount_t
latch read paths. Due to the dependent load, it also has one read
memory barrier less than the currently used raw_read_seqcount() API.
Use raw_read_seqcount_latch() for the seqcount_t latch read path.
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Link: https://lkml.kernel.org/r/20200625085745.GD117543@hirez.programming.kicks-ass.net
Link: https://lkml.kernel.org/r/20200715092345.GA231464@debian-buster-darwi.lab.linutronix.de
Link: https://lore.kernel.org/r/20200716051130.4359-3-leo.yan@linaro.org
References:
1809bfa44e10 ("timers, sched/clock: Avoid deadlock during read from NMI")
Signed-off-by: Will Deacon <will@kernel.org>
Peter Zijlstra [Thu, 16 Jul 2020 05:11:24 +0000 (13:11 +0800)]
sched_clock: Expose struct clock_read_data
In order to support perf_event_mmap_page::cap_time features, an
architecture needs, aside from a userspace readable counter register,
to expose the exact clock data so that userspace can convert the
counter register into a correct timestamp.
Provide struct clock_read_data and two (seqcount) helpers so that
architectures (arm64 in specific) can expose the numbers to userspace.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Link: https://lore.kernel.org/r/20200716051130.4359-2-leo.yan@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
Shaokun Zhang [Thu, 18 Jun 2020 13:35:44 +0000 (21:35 +0800)]
arm64: perf: Correct the event index in sysfs
When PMU event ID is equal or greater than 0x4000, it will be reduced
by 0x4000 and it is not the raw number in the sysfs. Let's correct it
and obtain the raw event ID.
Before this patch:
cat /sys/bus/event_source/devices/armv8_pmuv3_0/events/sample_feed
event=0x001
After this patch:
cat /sys/bus/event_source/devices/armv8_pmuv3_0/events/sample_feed
event=0x4001
Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/1592487344-30555-3-git-send-email-zhangshaokun@hisilicon.com
[will: fixed formatting of 'if' condition]
Signed-off-by: Will Deacon <will@kernel.org>
Zhenyu Ye [Wed, 15 Jul 2020 07:19:45 +0000 (15:19 +0800)]
arm64: tlb: Use the TLBI RANGE feature in arm64
Add __TLBI_VADDR_RANGE macro and rewrite __flush_tlb_range().
When cpu supports TLBI feature, the minimum range granularity is
decided by 'scale', so we can not flush all pages by one instruction
in some cases.
For example, when the pages = 0xe81a, let's start 'scale' from
maximum, and find right 'num' for each 'scale':
1. scale = 3, we can flush no pages because the minimum range is
2^(5*3 + 1) = 0x10000.
2. scale = 2, the minimum range is 2^(5*2 + 1) = 0x800, we can
flush 0xe800 pages this time, the num = 0xe800/0x800 - 1 = 0x1c.
Remaining pages is 0x1a;
3. scale = 1, the minimum range is 2^(5*1 + 1) = 0x40, no page
can be flushed.
4. scale = 0, we flush the remaining 0x1a pages, the num =
0x1a/0x2 - 1 = 0xd.
However, in most scenarios, the pages = 1 when flush_tlb_range() is
called. Start from scale = 3 or other proper value (such as scale =
ilog2(pages)), will incur extra overhead.
So increase 'scale' from 0 to maximum, the flush order is exactly
opposite to the example.
Signed-off-by: Zhenyu Ye <yezhenyu2@huawei.com>
Link: https://lore.kernel.org/r/20200715071945.897-4-yezhenyu2@huawei.com
[catalin.marinas@arm.com: removed unnecessary masks in __TLBI_VADDR_RANGE]
[catalin.marinas@arm.com: __TLB_RANGE_NUM subtracts 1]
[catalin.marinas@arm.com: minor adjustments to the comments]
[catalin.marinas@arm.com: introduce system_supports_tlb_range()]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Zhenyu Ye [Wed, 15 Jul 2020 07:19:44 +0000 (15:19 +0800)]
arm64: enable tlbi range instructions
TLBI RANGE feature instoduces new assembly instructions and only
support by binutils >= 2.30. Add necessary Kconfig logic to allow
this to be enabled and pass '-march=armv8.4-a' to KBUILD_CFLAGS.
Signed-off-by: Zhenyu Ye <yezhenyu2@huawei.com>
Link: https://lore.kernel.org/r/20200715071945.897-3-yezhenyu2@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Zhenyu Ye [Wed, 15 Jul 2020 07:19:43 +0000 (15:19 +0800)]
arm64: tlb: Detect the ARMv8.4 TLBI RANGE feature
ARMv8.4-TLBI provides TLBI invalidation instruction that apply to a
range of input addresses. This patch detect this feature.
Signed-off-by: Zhenyu Ye <yezhenyu2@huawei.com>
Link: https://lore.kernel.org/r/20200715071945.897-2-yezhenyu2@huawei.com
[catalin.marinas@arm.com: some renaming for consistency]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Anshuman Khandual [Wed, 1 Jul 2020 04:42:01 +0000 (10:12 +0530)]
arm64/hugetlb: Reserve CMA areas for gigantic pages on 16K and 64K configs
Currently 'hugetlb_cma=' command line argument does not create CMA area on
ARM64_16K_PAGES and ARM64_64K_PAGES based platforms. Instead, it just ends
up with the following warning message. Reason being, hugetlb_cma_reserve()
never gets called for these huge page sizes.
[ 64.255669] hugetlb_cma: the option isn't supported by current arch
This enables CMA areas reservation on ARM64_16K_PAGES and ARM64_64K_PAGES
configs by defining an unified arm64_hugetlb_cma_reseve() that is wrapped
in CONFIG_CMA. Call site for arm64_hugetlb_cma_reserve() is also protected
as <asm/hugetlb.h> is conditionally included and hence cannot contain stub
for the inverse config i.e !(CONFIG_HUGETLB_PAGE && CONFIG_CMA).
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Barry Song <song.bao.hua@hisilicon.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Link: https://lore.kernel.org/r/1593578521-24672-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Mark Brown [Fri, 10 Jul 2020 18:24:02 +0000 (19:24 +0100)]
arm64: stacktrace: Move export for save_stack_trace_tsk()
Due to refactoring way back in
bb53c820c5b0f1 ("arm64: stacktrace: avoid
listing stacktrace functions in stacktrace") the EXPORT_SYMBOL_GPL() for
save_stack_trace_tsk() is at the end of __save_stack_trace() rather than
the function it exports. Move it to the expected location.
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20200710182402.50473-1-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Ard Biesheuvel [Fri, 26 Jun 2020 15:58:32 +0000 (17:58 +0200)]
arm64/acpi: disallow writeable AML opregion mapping for EFI code regions
Given that the contents of EFI runtime code and data regions are
provided by the firmware, as well as the DSDT, it is not unimaginable
that AML code exists today that accesses EFI runtime code regions using
a SystemMemory OpRegion. There is nothing fundamentally wrong with that,
but since we take great care to ensure that executable code is never
mapped writeable and executable at the same time, we should not permit
AML to create writable mapping.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Link: https://lore.kernel.org/r/20200626155832.2323789-3-ardb@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Ard Biesheuvel [Fri, 26 Jun 2020 15:58:31 +0000 (17:58 +0200)]
arm64/acpi: disallow AML memory opregions to access kernel memory
AML uses SystemMemory opregions to allow AML handlers to access MMIO
registers of, e.g., GPIO controllers, or access reserved regions of
memory that are owned by the firmware.
Currently, we also allow AML access to memory that is owned by the
kernel and mapped via the linear region, which does not seem to be
supported by a valid use case, and exposes the kernel's internal
state to AML methods that may be buggy and exploitable.
On arm64, ACPI support requires booting in EFI mode, and so we can cross
reference the requested region against the EFI memory map, rather than
just do a minimal check on the first page. So let's only permit regions
to be remapped by the ACPI core if
- they don't appear in the EFI memory map at all (which is the case for
most MMIO), or
- they are covered by a single region in the EFI memory map, which is not
of a type that describes memory that is given to the kernel at boot.
Reported-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Link: https://lore.kernel.org/r/20200626155832.2323789-2-ardb@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Jay Chen [Mon, 6 Jul 2020 11:22:45 +0000 (19:22 +0800)]
perf/smmuv3: To simplify code for ioremap page in pmcg
Use the devm_platform_get_and_ioremap_resource to simplify the code
a bit.
Signed-off-by: Jay Chen <jkchen@linux.alibaba.com>
Link: https://lore.kernel.org/r/20200706112246.92220-2-jkchen@linux.alibaba.com
Signed-off-by: Will Deacon <will@kernel.org>
Zhenyu Ye [Fri, 10 Jul 2020 09:41:58 +0000 (17:41 +0800)]
arm64: tlb: don't set the ttl value in flush_tlb_page_nosync
flush_tlb_page_nosync() may be called from pmd level, so we
can not set the ttl = 3 here.
The callstack is as follows:
pmdp_set_access_flags
ptep_set_access_flags
flush_tlb_fix_spurious_fault
flush_tlb_page
flush_tlb_page_nosync
Fixes:
e735b98a5fe0 ("arm64: Add tlbi_user_level TLB invalidation helper")
Reported-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Zhenyu Ye <yezhenyu2@huawei.com>
Link: https://lore.kernel.org/r/20200710094158.468-1-yezhenyu2@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Anshuman Khandual [Tue, 7 Jul 2020 14:23:13 +0000 (19:53 +0530)]
arm64/cpufeature: Validate feature bits spacing in arm64_ftr_regs[]
arm64_feature_bits for a register in arm64_ftr_regs[] are in a descending
order as per their shift values. Validate that these features bits are
defined correctly and do not overlap with each other. This check protects
against any inadvertent erroneous changes to the register definitions.
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Link: https://lore.kernel.org/r/1594131793-9498-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Catalin Marinas [Tue, 7 Jul 2020 10:26:14 +0000 (11:26 +0100)]
arm64: Shift the __tlbi_level() indentation left
This is for consistency with the other __tlbi macros in this file. No
functional change.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Zhenyu Ye [Thu, 25 Jun 2020 08:03:14 +0000 (16:03 +0800)]
arm64: tlb: Set the TTL field in flush_*_tlb_range
This patch implement flush_{pmd|pud}_tlb_range() in arm64 by
calling __flush_tlb_range() with the corresponding stride and
tlb_level values.
Signed-off-by: Zhenyu Ye <yezhenyu2@huawei.com>
Link: https://lore.kernel.org/r/20200625080314.230-7-yezhenyu2@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Zhenyu Ye [Thu, 25 Jun 2020 08:03:13 +0000 (16:03 +0800)]
arm64: tlb: Set the TTL field in flush_tlb_range
This patch uses the cleared_* in struct mmu_gather to set the
TTL field in flush_tlb_range().
Signed-off-by: Zhenyu Ye <yezhenyu2@huawei.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20200625080314.230-6-yezhenyu2@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Peter Zijlstra (Intel) [Thu, 25 Jun 2020 08:03:12 +0000 (16:03 +0800)]
tlb: mmu_gather: add tlb_flush_*_range APIs
tlb_flush_{pte|pmd|pud|p4d}_range() adjust the tlb->start and
tlb->end, then set corresponding cleared_*.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Zhenyu Ye <yezhenyu2@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20200625080314.230-5-yezhenyu2@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Zhenyu Ye [Thu, 25 Jun 2020 08:03:11 +0000 (16:03 +0800)]
arm64: Add tlbi_user_level TLB invalidation helper
Add a level-hinted parameter to __tlbi_user, which only gets used
if ARMv8.4-TTL gets detected.
ARMv8.4-TTL provides the TTL field in tlbi instruction to indicate
the level of translation table walk holding the leaf entry for the
address that is being invalidated.
This patch set the default level value of flush_tlb_range() to 0,
which will be updated in future patches. And set the ttl value of
flush_tlb_page_nosync() to 3 because it is only called to flush a
single pte page.
Signed-off-by: Zhenyu Ye <yezhenyu2@huawei.com>
Link: https://lore.kernel.org/r/20200625080314.230-4-yezhenyu2@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Andrew Scull [Thu, 18 Jun 2020 14:55:11 +0000 (15:55 +0100)]
smccc: Make constants available to assembly
Move constants out of the C-only section of the header next to the other
constants that are available to assembly.
Signed-off-by: Andrew Scull <ascull@google.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20200618145511.69203-1-ascull@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Marc Zyngier [Wed, 2 Jan 2019 10:21:29 +0000 (10:21 +0000)]
arm64: Add level-hinted TLB invalidation helper
Add a level-hinted TLB invalidation helper that only gets used if
ARMv8.4-TTL gets detected.
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Fri, 28 Dec 2018 09:11:50 +0000 (09:11 +0000)]
arm64: Document SW reserved PTE/PMD bits in Stage-2 descriptors
Advertise bits [58:55] as reserved for SW in the S2 descriptors.
Reviewed-by: Andrew Scull <ascull@google.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Sat, 22 Dec 2018 12:00:10 +0000 (12:00 +0000)]
arm64: Detect the ARMv8.4 TTL feature
In order to reduce the cost of TLB invalidation, the ARMv8.4 TTL
feature allows TLBs to be issued with a level allowing for quicker
invalidation.
Let's detect the feature for now. Further patches will implement
its actual usage.
Reviewed-by : Suzuki K Polose <suzuki.poulose@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Gavin Shan [Tue, 30 Jun 2020 06:24:28 +0000 (16:24 +1000)]
arm64/mm: Redefine CONT_{PTE, PMD}_SHIFT
Currently, the value of CONT_{PTE, PMD}_SHIFT is off from standard
{PAGE, PMD}_SHIFT. In turn, we have to consider adding {PAGE, PMD}_SHIFT
when using CONT_{PTE, PMD}_SHIFT in the function hugetlbpage_init().
It's a bit confusing.
This redefines CONT_{PTE, PMD}_SHIFT with {PAGE, PMD}_SHIFT included
so that the later values needn't be added when using the former ones
in function hugetlbpage_init(). Note that the values of CONT_{PTES, PMDS}
are unchanged.
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lkml.org/lkml/2020/5/6/190
Link: https://lore.kernel.org/r/20200630062428.194235-1-gshan@redhat.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Bhupesh Sharma [Mon, 6 Apr 2020 22:31:40 +0000 (04:01 +0530)]
arm64/defconfig: Enable CONFIG_KEXEC_FILE
kexec_file_load() syscall interface is now supported for
arm64 architecture as well via commits:
3751e728cef2 ("arm64: kexec_file: add crash dump support") and
3ddd9992a590 ("arm64: enable KEXEC_FILE config")].
This patch enables config KEXEC_FILE by default in the
arm64 defconfig, so that user-space tools like kexec-tools
can use the same as the default interface for kexec/kdump
on arm64.
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: kexec@lists.infradead.org
Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/1586212300-30797-1-git-send-email-bhsharma@redhat.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Anshuman Khandual [Fri, 3 Jul 2020 03:51:37 +0000 (09:21 +0530)]
arm64/cpufeature: Replace all open bits shift encodings with macros
There are many open bits shift encodings for various CPU ID registers that
are scattered across cpufeature. This replaces them with register specific
sensible macro definitions. This should not have any functional change.
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Link: https://lore.kernel.org/r/1593748297-1965-5-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Anshuman Khandual [Fri, 3 Jul 2020 03:51:36 +0000 (09:21 +0530)]
arm64/cpufeature: Add remaining feature bits in ID_AA64MMFR2 register
Enable EVT, BBM, TTL, IDS, ST, NV and CCIDX features bits in ID_AA64MMFR2
register as per ARM DDI 0487F.a specification.
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Link: https://lore.kernel.org/r/1593748297-1965-4-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Anshuman Khandual [Fri, 3 Jul 2020 03:51:35 +0000 (09:21 +0530)]
arm64/cpufeature: Add remaining feature bits in ID_AA64MMFR1 register
Enable ETS, TWED, XNX and SPECSEI features bits in ID_AA64MMFR1 register as
per ARM DDI 0487F.a specification.
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Link: https://lore.kernel.org/r/1593748297-1965-3-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Anshuman Khandual [Fri, 3 Jul 2020 03:51:34 +0000 (09:21 +0530)]
arm64/cpufeature: Add remaining feature bits in ID_AA64MMFR0 register
Enable EVC, FGT, EXS features bits in ID_AA64MMFR0 register as per ARM DDI
0487F.a specification.
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Link: https://lore.kernel.org/r/1593748297-1965-2-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Mark Brown [Thu, 25 Jun 2020 13:15:07 +0000 (14:15 +0100)]
arm64: Document sysctls for emulated deprecated instructions
We have support for emulating a number of deprecated instructions in the
kernel with individual Kconfig options enabling this support per
instruction. In addition to the Kconfig options we also provide runtime
control via sysctls but this is not currently mentioned in the Kconfig so
not very discoverable for users. This is particularly important for
SWP/SWPB since this is disabled by default at runtime and must be enabled
via the sysctl, causing considerable frustration for users who have enabled
the config option and are then confused to find that the instruction is
still faulting.
Add a reference to the sysctls in the help text for each of the config
options, noting that SWP/SWPB is disabled by default, to improve the
user experience.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20200625131507.32334-1-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Bhupesh Sharma [Wed, 13 May 2020 18:52:37 +0000 (00:22 +0530)]
arm64/crash_core: Export TCR_EL1.T1SZ in vmcoreinfo
TCR_EL1.TxSZ, which controls the VA space size, is configured by a
single kernel image to support either 48-bit or 52-bit VA space.
If the ARMv8.2-LVA optional feature is present and we are running
with a 64KB page size, then it is possible to use 52-bits of address
space for both userspace and kernel addresses. However, any kernel
binary that supports 52-bit must also be able to fall back to 48-bit
at early boot time if the hardware feature is not present.
Since TCR_EL1.T1SZ indicates the size of the memory region addressed by
TTBR1_EL1, export the same in vmcoreinfo. User-space utilities like
makedumpfile and crash-utility need to read this value from vmcoreinfo
for determining if a virtual address lies in the linear map range.
While at it also add documentation for TCR_EL1.T1SZ variable being
added to vmcoreinfo.
It indicates the size offset of the memory region addressed by
TTBR1_EL1.
Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com>
Tested-by: John Donnelly <john.p.donnelly@oracle.com>
Tested-by: Kamlakant Patel <kamlakantp@marvell.com>
Tested-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Reviewed-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Dave Anderson <anderson@redhat.com>
Cc: Kazuhito Hagio <k-hagio@ab.jp.nec.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: kexec@lists.infradead.org
Link: https://lore.kernel.org/r/1589395957-24628-3-git-send-email-bhsharma@redhat.com
[catalin.marinas@arm.com: removed vabits_actual from the commit log]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Bhupesh Sharma [Wed, 13 May 2020 18:52:36 +0000 (00:22 +0530)]
crash_core, vmcoreinfo: Append 'MAX_PHYSMEM_BITS' to vmcoreinfo
Right now user-space tools like 'makedumpfile' and 'crash' need to rely
on a best-guess method of determining value of 'MAX_PHYSMEM_BITS'
supported by underlying kernel.
This value is used in user-space code to calculate the bit-space
required to store a section for SPARESMEM (similar to the existing
calculation method used in the kernel implementation):
#define SECTIONS_SHIFT (MAX_PHYSMEM_BITS - SECTION_SIZE_BITS)
Now, regressions have been reported in user-space utilities
like 'makedumpfile' and 'crash' on arm64, with the recently added
kernel support for 52-bit physical address space, as there is
no clear method of determining this value in user-space
(other than reading kernel CONFIG flags).
As per suggestion from makedumpfile maintainer (Kazu), it makes more
sense to append 'MAX_PHYSMEM_BITS' to vmcoreinfo in the core code itself
rather than in arch-specific code, so that the user-space code for other
archs can also benefit from this addition to the vmcoreinfo and use it
as a standard way of determining 'SECTIONS_SHIFT' value in user-land.
A reference 'makedumpfile' implementation which reads the
'MAX_PHYSMEM_BITS' value from vmcoreinfo in a arch-independent fashion
is available here:
While at it also update vmcoreinfo documentation for 'MAX_PHYSMEM_BITS'
variable being added to vmcoreinfo.
'MAX_PHYSMEM_BITS' defines the maximum supported physical address
space memory.
Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com>
Tested-by: John Donnelly <john.p.donnelly@oracle.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: Boris Petkov <bp@alien8.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: James Morse <james.morse@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dave Anderson <anderson@redhat.com>
Cc: Kazuhito Hagio <k-hagio@ab.jp.nec.com>
Cc: x86@kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: kexec@lists.infradead.org
Link: https://lore.kernel.org/r/1589395957-24628-2-git-send-email-bhsharma@redhat.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Anshuman Khandual [Mon, 29 Jun 2020 04:38:31 +0000 (10:08 +0530)]
arm64/panic: Unify all three existing notifier blocks
Currently there are three different registered panic notifier blocks. This
unifies all of them into a single one i.e arm64_panic_block, hence reducing
code duplication and required calling sequence during panic. This preserves
the existing dump sequence. While here, just use device_initcall() directly
instead of __initcall() which has been a legacy alias for the earlier. This
replacement is a pure cleanup with no functional implications.
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Link: https://lore.kernel.org/r/1593405511-7625-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Saravana Kannan [Tue, 23 Jun 2020 01:18:02 +0000 (18:18 -0700)]
arm64/module: Optimize module load time by optimizing PLT counting
When loading a module, module_frob_arch_sections() tries to figure out
the number of PLTs that'll be needed to handle all the RELAs. While
doing this, it tries to dedupe PLT allocations for multiple
R_AARCH64_CALL26 relocations to the same symbol. It does the same for
R_AARCH64_JUMP26 relocations.
To make checks for duplicates easier/faster, it sorts the relocation
list by type, symbol and addend. That way, to check for a duplicate
relocation, it just needs to compare with the previous entry.
However, sorting the entire relocation array is unnecessary and
expensive (O(n log n)) because there are a lot of other relocation types
that don't need deduping or can't be deduped.
So this commit partitions the array into entries that need deduping and
those that don't. And then sorts just the part that needs deduping. And
when CONFIG_RANDOMIZE_BASE is disabled, the sorting is skipped entirely
because PLTs are not allocated for R_AARCH64_CALL26 and R_AARCH64_JUMP26
if it's disabled.
This gives significant reduction in module load time for modules with
large number of relocations with no measurable impact on modules with a
small number of relocations. In my test setup with CONFIG_RANDOMIZE_BASE
enabled, these were the results for a few downstream modules:
Module Size (MB)
wlan 14
video codec 3.8
drm 1.8
IPA 2.5
audio 1.2
gpu 1.8
Without this patch:
Module Number of entries sorted Module load time (ms)
wlan 243739 283
video codec 74029 138
drm 53837 67
IPA 42800 90
audio 21326 27
gpu 20967 32
Total time to load all these module: 637 ms
With this patch:
Module Number of entries sorted Module load time (ms)
wlan 22454 61
video codec 10150 47
drm 13014 40
IPA 8097 63
audio 4606 16
gpu 6527 20
Total time to load all these modules: 247
Time saved during boot for just these 6 modules: 390 ms
Signed-off-by: Saravana Kannan <saravanak@google.com>
Acked-by: Will Deacon <will@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Link: https://lore.kernel.org/r/20200623011803.91232-1-saravanak@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Linus Torvalds [Sun, 28 Jun 2020 22:00:24 +0000 (15:00 -0700)]
Linux 5.8-rc3
Linus Torvalds [Sun, 28 Jun 2020 21:57:14 +0000 (14:57 -0700)]
Merge tag 'arm-omap-fixes-5.8-1' of git://git./linux/kernel/git/soc/soc
Pull ARM OMAP fixes from Arnd Bergmann:
"The OMAP developers are particularly active at hunting down
regressions, so this is a separate branch with OMAP specific
fixes for v5.8:
As Tony explains
"The recent display subsystem (DSS) related platform data changes
caused display related regressions for suspend and resume. Looks
like I only tested suspend and resume before dropping the legacy
platform data, and forgot to test it after dropping it. Turns out
the main issue was that we no longer have platform code calling
pm_runtime_suspend for DSS like we did for the legacy platform data
case, and that fix is still being discussed on the dri-devel list
and will get merged separately. The DSS related testing exposed a
pile other other display related issues that also need fixing
though":
- Fix ti-sysc optional clock handling and reset status checks for
devices that reset automatically in idle like DSS
- Ignore ti-sysc clockactivity bit unless separately requested to
avoid unexpected performance issues
- Init ti-sysc framedonetv_irq to true and disable for am4
- Avoid duplicate DSS reset for legacy mode with dts data
- Remove LCD timings for am4 as they cause warnings now that we're
using generic panels
Other OMAP changes from Tony include:
- Fix omap_prm reset deassert as we still have drivers setting the
pm_runtime_irq_safe() flag
- Flush posted write for ti-sysc enable and disable
- Fix droid4 spi related errors with spi flags
- Fix am335x USB range and a typo for softreset
- Fix dra7 timer nodes for clocks for IPU and DSP
- Drop duplicate mailboxes after mismerge for dra7
- Prevent pocketgeagle header line signal from accidentally setting
micro-SD write protection signal by removing the default mux
- Fix NFSroot flakeyness after resume for duover by switching the
smsc911x gpio interrupt to back to level sensitive
- Fix regression for omap4 clockevent source after recent system
timer changes
- Yet another ethernet regression fix for the "rgmii" vs "rgmii-rxid"
phy-mode
- One patch to convert am3/am4 DT files to use the regular sdhci-omap
driver instead of the old hsmmc driver, this was meant for the
merge window but got lost in the process"
* tag 'arm-omap-fixes-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (21 commits)
ARM: dts: am5729: beaglebone-ai: fix rgmii phy-mode
ARM: dts: Fix omap4 system timer source clocks
ARM: dts: Fix duovero smsc interrupt for suspend
ARM: dts: am335x-pocketbeagle: Fix mmc0 Write Protect
Revert "bus: ti-sysc: Increase max softreset wait"
ARM: dts: am437x-epos-evm: remove lcd timings
ARM: dts: am437x-gp-evm: remove lcd timings
ARM: dts: am437x-sk-evm: remove lcd timings
ARM: dts: dra7-evm-common: Fix duplicate mailbox nodes
ARM: dts: dra7: Fix timer nodes properly for timer_sys_ck clocks
ARM: dts: Fix am33xx.dtsi ti,sysc-mask wrong softreset flag
ARM: dts: Fix am33xx.dtsi USB ranges length
bus: ti-sysc: Increase max softreset wait
ARM: OMAP2+: Fix legacy mode dss_reset
bus: ti-sysc: Fix uninitialized framedonetv_irq
bus: ti-sysc: Ignore clockactivity unless specified as a quirk
bus: ti-sysc: Use optional clocks on for enable and wait for softreset bit
ARM: dts: omap4-droid4: Fix spi configuration and increase rate
bus: ti-sysc: Flush posted write on enable and disable
soc: ti: omap-prm: use atomic iopoll instead of sleeping one
...
Linus Torvalds [Sun, 28 Jun 2020 21:55:18 +0000 (14:55 -0700)]
Merge tag 'arm-fixes-5.8-1' of git://git./linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:
"Here are a couple of bug fixes, mostly for devicetree files
NXP i.MX:
- Use correct voltage on some i.MX8M board device trees to avoid
hardware damage
- Code fixes for a compiler warning and incorrect reference counting,
both harmless.
- Fix the i.MX8M SoC driver to correctly identify imx8mp
- Fix watchdog configuration in imx6ul-kontron device tree.
Broadcom:
- A small regression fix for the Raspberry-Pi firmware driver
- A Kconfig change to use the correct timer driver on Northstar
- A DT fix for the Luxul XWC-2000 machine
- Two more DT fixes for NSP SoCs
STmicroelectronics STI
- Revert one broken patch for L2 cache configuration
ARM Versatile Express:
- Fix a regression by reverting a broken DT cleanup
TEE drivers:
- MAINTAINERS: change tee mailing list"
* tag 'arm-fixes-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
Revert "ARM: sti: Implement dummy L2 cache's write_sec"
soc: imx8m: fix build warning
ARM: imx6: add missing put_device() call in imx6q_suspend_init()
ARM: imx5: add missing put_device() call in imx_suspend_alloc_ocram()
soc: imx8m: Correct i.MX8MP UID fuse offset
ARM: dts: imx6ul-kontron: Change WDOG_ANY signal from push-pull to open-drain
ARM: dts: imx6ul-kontron: Move watchdog from Kontron i.MX6UL/ULL board to SoM
arm64: dts: imx8mm-beacon: Fix voltages on LDO1 and LDO2
arm64: dts: imx8mn-ddr4-evk: correct ldo1/ldo2 voltage range
arm64: dts: imx8mm-evk: correct ldo1/ldo2 voltage range
ARM: dts: NSP: Correct FA2 mailbox node
ARM: bcm2835: Fix integer overflow in rpi_firmware_print_firmware_revision()
MAINTAINERS: change tee mailing list
ARM: dts: NSP: Disable PL330 by default, add dma-coherent property
ARM: bcm: Select ARM_TIMER_SP804 for ARCH_BCM_NSP
ARM: dts: BCM5301X: Add missing memory "device_type" for Luxul XWC-2000
arm: dts: vexpress: Move mcc node back into motherboard node
Linus Torvalds [Sun, 28 Jun 2020 18:59:08 +0000 (11:59 -0700)]
Merge tag 'timers-urgent-2020-06-28' of git://git./linux/kernel/git/tip/tip
Pull timer fix from Ingo Molnar:
"A single DocBook fix"
* tag 'timers-urgent-2020-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timekeeping: Fix kerneldoc system_device_crosststamp & al
Linus Torvalds [Sun, 28 Jun 2020 18:58:14 +0000 (11:58 -0700)]
Merge tag 'perf-urgent-2020-06-28' of git://git./linux/kernel/git/tip/tip
Pull perf fix from Ingo Molnar:
"A single Kbuild dependency fix"
* tag 'perf-urgent-2020-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/rapl: Fix RAPL config variable bug
Linus Torvalds [Sun, 28 Jun 2020 18:42:16 +0000 (11:42 -0700)]
Merge tag 'efi-urgent-2020-06-28' of git://git./linux/kernel/git/tip/tip
Pull EFI fixes from Ingo Molnar:
- Fix build regression on v4.8 and older
- Robustness fix for TPM log parsing code
- kobject refcount fix for the ESRT parsing code
- Two efivarfs fixes to make it behave more like an ordinary file
system
- Style fixup for zero length arrays
- Fix a regression in path separator handling in the initrd loader
- Fix a missing prototype warning
- Add some kerneldoc headers for newly introduced stub routines
- Allow support for SSDT overrides via EFI variables to be disabled
- Report CPU mode and MMU state upon entry for 32-bit ARM
- Use the correct stack pointer alignment when entering from mixed mode
* tag 'efi-urgent-2020-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi/libstub: arm: Print CPU boot mode and MMU state at boot
efi/libstub: arm: Omit arch specific config table matching array on arm64
efi/x86: Setup stack correctly for efi_pe_entry
efi: Make it possible to disable efivar_ssdt entirely
efi/libstub: Descriptions for stub helper functions
efi/libstub: Fix path separator regression
efi/libstub: Fix missing-prototype warning for skip_spaces()
efi: Replace zero-length array and use struct_size() helper
efivarfs: Don't return -EINTR when rate-limiting reads
efivarfs: Update inode modification time for successful writes
efi/esrt: Fix reference count leak in esre_create_sysfs_entry.
efi/tpm: Verify event log header before parsing
efi/x86: Fix build with gcc 4
Linus Torvalds [Sun, 28 Jun 2020 17:37:39 +0000 (10:37 -0700)]
Merge tag 'sched_urgent_for_5.8_rc3' of git://git./linux/kernel/git/tip/tip
Pull scheduler fixes from Borislav Petkov:
"The most anticipated fix in this pull request is probably the horrible
build fix for the RANDSTRUCT fail that didn't make -rc2. Also included
is the cleanup that removes those BUILD_BUG_ON()s and replaces it with
ugly unions.
Also included is the try_to_wake_up() race fix that was first
triggered by Paul's RCU-torture runs, but was independently hit by
Dave Chinner's fstest runs as well"
* tag 'sched_urgent_for_5.8_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/cfs: change initial value of runnable_avg
smp, irq_work: Continue smp_call_function*() and irq_work*() integration
sched/core: s/WF_ON_RQ/WQ_ON_CPU/
sched/core: Fix ttwu() race
sched/core: Fix PI boosting between RT and DEADLINE tasks
sched/deadline: Initialize ->dl_boosted
sched/core: Check cpus_mask, not cpus_ptr in __set_cpus_allowed_ptr(), to fix mask corruption
sched/core: Fix CONFIG_GCC_PLUGIN_RANDSTRUCT build fail
Linus Torvalds [Sun, 28 Jun 2020 17:35:01 +0000 (10:35 -0700)]
Merge tag 'x86_urgent_for_5.8_rc3' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- AMD Memory bandwidth counter width fix, by Babu Moger.
- Use the proper length type in the 32-bit truncate() syscall variant,
by Jiri Slaby.
- Reinit IA32_FEAT_CTL during wakeup to fix the case where after
resume, VMXON would #GP due to VMX not being properly enabled, by
Sean Christopherson.
- Fix a static checker warning in the resctrl code, by Dan Carpenter.
- Add a CR4 pinning mask for bits which cannot change after boot, by
Kees Cook.
- Align the start of the loop of __clear_user() to 16 bytes, to improve
performance on AMD zen1 and zen2 microarchitectures, by Matt Fleming.
* tag 'x86_urgent_for_5.8_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/asm/64: Align start of __clear_user() loop to 16-bytes
x86/cpu: Use pinning mask for CR4 bits needing to be 0
x86/resctrl: Fix a NULL vs IS_ERR() static checker warning in rdt_cdp_peer_get()
x86/cpu: Reinitialize IA32_FEAT_CTL MSR on BSP during wakeup
syscalls: Fix offset type of ksys_ftruncate()
x86/resctrl: Fix memory bandwidth counter width for AMD
Linus Torvalds [Sun, 28 Jun 2020 17:29:38 +0000 (10:29 -0700)]
Merge tag 'rcu_urgent_for_5.8_rc3' of git://git./linux/kernel/git/tip/tip
Pull RCU-vs-KCSAN fixes from Borislav Petkov:
"A single commit that uses "arch_" atomic operations to avoid the
instrumentation that comes with the non-"arch_" versions.
In preparation for that commit, it also has another commit that makes
these "arch_" atomic operations available to generic code.
Without these commits, KCSAN uses can see pointless errors"
* tag 'rcu_urgent_for_5.8_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
rcu: Fixup noinstr warnings
locking/atomics: Provide the arch_atomic_ interface to generic code
Linus Torvalds [Sun, 28 Jun 2020 17:16:15 +0000 (10:16 -0700)]
Merge tag 'objtool_urgent_for_5.8_rc3' of git://git./linux/kernel/git/tip/tip
Pull objtool fixes from Borislav Petkov:
"Three fixes from Peter Zijlstra suppressing KCOV instrumentation in
noinstr sections.
Peter Zijlstra says:
"Address KCOV vs noinstr. There is no function attribute to
selectively suppress KCOV instrumentation, instead teach objtool
to NOP out the calls in noinstr functions"
This cures a bunch of KCOV crashes (as used by syzcaller)"
* tag 'objtool_urgent_for_5.8_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Fix noinstr vs KCOV
objtool: Provide elf_write_{insn,reloc}()
objtool: Clean up elf_write() condition
Linus Torvalds [Sun, 28 Jun 2020 16:42:47 +0000 (09:42 -0700)]
Merge tag 'x86_entry_for_5.8' of git://git./linux/kernel/git/tip/tip
Pull x86 entry fixes from Borislav Petkov:
"This is the x86/entry urgent pile which has accumulated since the
merge window.
It is not the smallest but considering the almost complete entry core
rewrite, the amount of fixes to follow is somewhat higher than usual,
which is to be expected.
Peter Zijlstra says:
'These patches address a number of instrumentation issues that were
found after the x86/entry overhaul. When combined with rcu/urgent
and objtool/urgent, these patches make UBSAN/KASAN/KCSAN happy
again.
Part of making this all work is bumping the minimum GCC version for
KASAN builds to gcc-8.3, the reason for this is that the
__no_sanitize_address function attribute is broken in GCC releases
before that.
No known GCC version has a working __no_sanitize_undefined, however
because the only noinstr violation that results from this happens
when an UB is found, we treat it like WARN. That is, we allow it to
violate the noinstr rules in order to get the warning out'"
* tag 'x86_entry_for_5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/entry: Fix #UD vs WARN more
x86/entry: Increase entry_stack size to a full page
x86/entry: Fixup bad_iret vs noinstr
objtool: Don't consider vmlinux a C-file
kasan: Fix required compiler version
compiler_attributes.h: Support no_sanitize_undefined check with GCC 4
x86/entry, bug: Comment the instrumentation_begin() usage for WARN()
x86/entry, ubsan, objtool: Whitelist __ubsan_handle_*()
x86/entry, cpumask: Provide non-instrumented variant of cpu_is_offline()
compiler_types.h: Add __no_sanitize_{address,undefined} to noinstr
kasan: Bump required compiler version
x86, kcsan: Add __no_kcsan to noinstr
kcsan: Remove __no_kcsan_or_inline
x86, kcsan: Remove __no_kcsan_or_inline usage
Vincent Guittot [Wed, 24 Jun 2020 15:44:22 +0000 (17:44 +0200)]
sched/cfs: change initial value of runnable_avg
Some performance regression on reaim benchmark have been raised with
commit
070f5e860ee2 ("sched/fair: Take into account runnable_avg to classify group")
The problem comes from the init value of runnable_avg which is initialized
with max value. This can be a problem if the newly forked task is finally
a short task because the group of CPUs is wrongly set to overloaded and
tasks are pulled less agressively.
Set initial value of runnable_avg equals to util_avg to reflect that there
is no waiting time so far.
Fixes:
070f5e860ee2 ("sched/fair: Take into account runnable_avg to classify group")
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200624154422.29166-1-vincent.guittot@linaro.org
Peter Zijlstra [Mon, 22 Jun 2020 10:01:25 +0000 (12:01 +0200)]
smp, irq_work: Continue smp_call_function*() and irq_work*() integration
Instead of relying on BUG_ON() to ensure the various data structures
line up, use a bunch of horrible unions to make it all automatic.
Much of the union magic is to ensure irq_work and smp_call_function do
not (yet) see the members of their respective data structures change
name.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lkml.kernel.org/r/20200622100825.844455025@infradead.org
Peter Zijlstra [Mon, 22 Jun 2020 10:01:24 +0000 (12:01 +0200)]
sched/core: s/WF_ON_RQ/WQ_ON_CPU/
Use a better name for this poorly named flag, to avoid confusion...
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Link: https://lkml.kernel.org/r/20200622100825.785115830@infradead.org