linux-2.6-microblaze.git
4 weeks agoKVM: arm64: Add FEAT_FGT2 registers to the VNCR page
Marc Zyngier [Tue, 22 Apr 2025 18:21:46 +0000 (19:21 +0100)]
KVM: arm64: Add FEAT_FGT2 registers to the VNCR page

The FEAT_FGT2 registers are part of the VNCR page. Describe the
corresponding offsets and add them to the vcpu sysreg enumeration.

Signed-off-by: Marc Zyngier <maz@kernel.org>
4 weeks agoKVM: arm64: Use HCR_EL2 feature map to drive fixed-value bits
Marc Zyngier [Tue, 4 Feb 2025 10:46:41 +0000 (10:46 +0000)]
KVM: arm64: Use HCR_EL2 feature map to drive fixed-value bits

Similarly to other registers, describe which HCR_EL2 bit depends
on which feature, and use this to compute the RES0 status of these
bits.

An additional complexity stems from the status of some bits such
as E2H and RW, which do not had a RESx status, but still take
a fixed value due to implementation choices in KVM.

Signed-off-by: Marc Zyngier <maz@kernel.org>
4 weeks agoKVM: arm64: Use HCRX_EL2 feature map to drive fixed-value bits
Marc Zyngier [Sun, 9 Feb 2025 14:51:23 +0000 (14:51 +0000)]
KVM: arm64: Use HCRX_EL2 feature map to drive fixed-value bits

Similarly to other registers, describe which HCR_EL2 bit depends
on which feature, and use this to compute the RES0 status of these
bits.

Signed-off-by: Marc Zyngier <maz@kernel.org>
4 weeks agoKVM: arm64: Allow kvm_has_feat() to take variable arguments
Marc Zyngier [Sun, 9 Feb 2025 13:38:35 +0000 (13:38 +0000)]
KVM: arm64: Allow kvm_has_feat() to take variable arguments

In order to be able to write more compact (and easier to read) code,
let kvm_has_feat() and co take variable arguments. This enables
constructs such as:

#define FEAT_SME ID_AA64PFR1_EL1, SME, IMP

if (kvm_has_feat(kvm, FEAT_SME))
[...]

which is admitedly more readable.

Signed-off-by: Marc Zyngier <maz@kernel.org>
4 weeks agoKVM: arm64: Use FGT feature maps to drive RES0 bits
Marc Zyngier [Sun, 9 Feb 2025 14:45:29 +0000 (14:45 +0000)]
KVM: arm64: Use FGT feature maps to drive RES0 bits

Another benefit of mapping bits to features is that it becomes trivial
to define which bits should be handled as RES0.

Let's apply this principle to the guest's view of the FGT registers.

Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
5 weeks agoKVM: arm64: Validate FGT register descriptions against RES0 masks
Marc Zyngier [Sun, 9 Feb 2025 14:19:05 +0000 (14:19 +0000)]
KVM: arm64: Validate FGT register descriptions against RES0 masks

In order to point out to the unsuspecting KVM hacker that they
are missing something somewhere, validate that the known FGT bits
do not intersect with the corresponding RES0 mask, as computed at
boot time.

THis check is also performed at boot time, ensuring that there is
no runtime overhead.

Signed-off-by: Marc Zyngier <maz@kernel.org>
5 weeks agoKVM: arm64: Switch to table-driven FGU configuration
Marc Zyngier [Sun, 9 Feb 2025 14:05:01 +0000 (14:05 +0000)]
KVM: arm64: Switch to table-driven FGU configuration

Defining the FGU behaviour is extremely tedious. It relies on matching
each set of bits from FGT registers with am architectural feature, and
adding them to the FGU list if the corresponding feature isn't advertised
to the guest.

It is however relatively easy to dump most of that information from
the architecture JSON description, and use that to control the FGU bits.

Let's introduce a new set of tables descripbing the mapping between
FGT bits and features. Most of the time, this is only a lookup in
an idreg field, with a few more complex exceptions.

While this is obviously many more lines in a new file, this is
mostly generated, and is pretty easy to maintain.

Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
5 weeks agoKVM: arm64: Handle PSB CSYNC traps
Marc Zyngier [Mon, 27 Jan 2025 11:58:38 +0000 (11:58 +0000)]
KVM: arm64: Handle PSB CSYNC traps

The architecture introduces a trap for PSB CSYNC that fits in
 the same EC as LS64. Let's deal with it in a similar way as
LS64.

It's not that we expect this to be useful any time soon anyway.

Signed-off-by: Marc Zyngier <maz@kernel.org>
5 weeks agoKVM: arm64: Use KVM-specific HCRX_EL2 RES0 mask
Marc Zyngier [Fri, 24 Jan 2025 19:04:26 +0000 (19:04 +0000)]
KVM: arm64: Use KVM-specific HCRX_EL2 RES0 mask

We do not have a computed table for HCRX_EL2, so statically define
the bits we know about. A warning will fire if the architecture
grows bits that are not handled yet.

Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
5 weeks agoKVM: arm64: Remove hand-crafted masks for FGT registers
Marc Zyngier [Fri, 24 Jan 2025 17:21:17 +0000 (17:21 +0000)]
KVM: arm64: Remove hand-crafted masks for FGT registers

These masks are now useless, and can be removed.

Signed-off-by: Marc Zyngier <maz@kernel.org>
5 weeks agoKVM: arm64: Use computed FGT masks to setup FGT registers
Marc Zyngier [Fri, 24 Jan 2025 17:20:31 +0000 (17:20 +0000)]
KVM: arm64: Use computed FGT masks to setup FGT registers

Flip the hyervisor FGT configuration over to the computed FGT
masks.

Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
6 weeks agoKVM: arm64: Propagate FGT masks to the nVHE hypervisor
Marc Zyngier [Fri, 24 Jan 2025 17:17:42 +0000 (17:17 +0000)]
KVM: arm64: Propagate FGT masks to the nVHE hypervisor

The nVHE hypervisor needs to have access to its own view of the FGT
masks, which unfortunately results in a bit of data duplication.

Signed-off-by: Marc Zyngier <maz@kernel.org>
6 weeks agoKVM: arm64: Unconditionally configure fine-grain traps
Mark Rutland [Tue, 19 Nov 2024 13:57:22 +0000 (13:57 +0000)]
KVM: arm64: Unconditionally configure fine-grain traps

... otherwise we can inherit the host configuration if this differs from
the KVM configuration.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
[maz: simplified a couple of things]
Signed-off-by: Marc Zyngier <maz@kernel.org>
6 weeks agoKVM: arm64: Use computed masks as sanitisers for FGT registers
Marc Zyngier [Fri, 24 Jan 2025 16:01:47 +0000 (16:01 +0000)]
KVM: arm64: Use computed masks as sanitisers for FGT registers

Now that we have computed RES0 bits, use them to sanitise the
guest view of FGT registers.

Signed-off-by: Marc Zyngier <maz@kernel.org>
6 weeks agoKVM: arm64: Add description of FGT bits leading to EC!=0x18
Marc Zyngier [Fri, 24 Jan 2025 17:09:27 +0000 (17:09 +0000)]
KVM: arm64: Add description of FGT bits leading to EC!=0x18

The current FTP tables are only concerned with the bits generating
ESR_ELx.EC==0x18. However, we want an exhaustive view of what KVM
really knows about.

So let's add another small table that provides that extra information.

Signed-off-by: Marc Zyngier <maz@kernel.org>
6 weeks agoKVM: arm64: Compute FGT masks from KVM's own FGT tables
Marc Zyngier [Fri, 24 Jan 2025 15:51:12 +0000 (15:51 +0000)]
KVM: arm64: Compute FGT masks from KVM's own FGT tables

In the process of decoupling KVM's view of the FGT bits from the
wider architectural state, use KVM's own FGT tables to build
a synthetic view of what is actually known.

This allows for some checking along the way.

Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
6 weeks agoKVM: arm64: Plug FEAT_GCS handling
Marc Zyngier [Fri, 24 Jan 2025 15:44:32 +0000 (15:44 +0000)]
KVM: arm64: Plug FEAT_GCS handling

We don't seem to be handling the GCS-specific exception class.
Handle it by delivering an UNDEF to the guest, and populate the
relevant trap bits.

Signed-off-by: Marc Zyngier <maz@kernel.org>
6 weeks agoKVM: arm64: Don't treat HCRX_EL2 as a FGT register
Marc Zyngier [Fri, 24 Jan 2025 15:36:17 +0000 (15:36 +0000)]
KVM: arm64: Don't treat HCRX_EL2 as a FGT register

Treating HCRX_EL2 as yet another FGT register seems excessive, and
gets in a way of further improvements. It is actually simpler to
just be explicit about the masking, so just to that.

Signed-off-by: Marc Zyngier <maz@kernel.org>
6 weeks agoKVM: arm64: Restrict ACCDATA_EL1 undef to FEAT_LS64_ACCDATA being disabled
Marc Zyngier [Wed, 3 Jul 2024 15:41:47 +0000 (16:41 +0100)]
KVM: arm64: Restrict ACCDATA_EL1 undef to FEAT_LS64_ACCDATA being disabled

We currently unconditionally make ACCDATA_EL1 accesses UNDEF.

As we are about to support it, restrict the UNDEF behaviour to cases
where FEAT_LS64_ACCDATA is not exposed to the guest.

Signed-off-by: Marc Zyngier <maz@kernel.org>
6 weeks agoKVM: arm64: Handle trapping of FEAT_LS64* instructions
Marc Zyngier [Thu, 4 Jul 2024 17:58:01 +0000 (18:58 +0100)]
KVM: arm64: Handle trapping of FEAT_LS64* instructions

We generally don't expect FEAT_LS64* instructions to trap, unless
they are trapped by a guest hypervisor.

Otherwise, this is just the guest playing tricks on us by using
an instruction that isn't advertised, which we handle with a well
deserved UNDEF.

Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
6 weeks agoKVM: arm64: Simplify handling of negative FGT bits
Marc Zyngier [Sat, 26 Apr 2025 11:05:04 +0000 (12:05 +0100)]
KVM: arm64: Simplify handling of negative FGT bits

check_fgt_bit() and triage_sysreg_trap() implement the same thing
twice for no good reason. We have to lookup the FGT register twice,
as we don't communicate it. Similarly, we extract the register value
at the wrong spot.

Reorganise the code in a more logical way so that things are done
at the correct location, removing a lot of duplication.

Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
6 weeks agoKVM: arm64: Tighten handling of unknown FGT groups
Marc Zyngier [Sat, 26 Apr 2025 10:42:15 +0000 (11:42 +0100)]
KVM: arm64: Tighten handling of unknown FGT groups

triage_sysreg_trap() assumes that it knows all the possible values
for FGT groups, which won't be the case as we start adding more
FGT registers (unless we add everything in one go, which is obviously
undesirable).

At the same time, it doesn't offer much in terms of debugging info
when things go wrong.

Turn the "__NR_FGT_GROUP_IDS__" case into a default, covering any
unhandled value, and give the kernel hacker a bit of a clue about
what's wrong (system register and full trap descriptor).

Signed-off-by: Marc Zyngier <maz@kernel.org>
6 weeks agoarm64: Add FEAT_FGT2 capability
Marc Zyngier [Tue, 22 Apr 2025 18:23:41 +0000 (19:23 +0100)]
arm64: Add FEAT_FGT2 capability

As we will eventually have to context-switch the FEAT_FGT2 registers
in KVM (something that has been completely ignored so far), add
a new cap that we will be able to check for.

Signed-off-by: Marc Zyngier <maz@kernel.org>
6 weeks agoarm64: Add syndrome information for trapped LD64B/ST64B{,V,V0}
Marc Zyngier [Thu, 4 Jul 2024 17:45:22 +0000 (18:45 +0100)]
arm64: Add syndrome information for trapped LD64B/ST64B{,V,V0}

Provide the architected EC and ISS values for all the FEAT_LS64*
instructions.

Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
6 weeks agoarm64: tools: Resync sysreg.h
Marc Zyngier [Sat, 26 Apr 2025 10:17:13 +0000 (11:17 +0100)]
arm64: tools: Resync sysreg.h

Perform a bulk resync of tools/arch/arm64/include/asm/sysreg.h.

Signed-off-by: Marc Zyngier <maz@kernel.org>
6 weeks agoarm64: Remove duplicated sysreg encodings
Marc Zyngier [Tue, 11 Mar 2025 09:36:28 +0000 (09:36 +0000)]
arm64: Remove duplicated sysreg encodings

A bunch of sysregs are now generated from the sysreg file, so no
need to carry separate definitions.

Signed-off-by: Marc Zyngier <maz@kernel.org>
6 weeks agoarm64: sysreg: Add system instructions trapped by HFGIRT2_EL2
Marc Zyngier [Fri, 25 Apr 2025 12:47:18 +0000 (13:47 +0100)]
arm64: sysreg: Add system instructions trapped by HFGIRT2_EL2

Add the new CMOs trapped by HFGITR2_EL2.

Signed-off-by: Marc Zyngier <maz@kernel.org>
6 weeks agoarm64: sysreg: Add registers trapped by HDFG{R,W}TR2_EL2
Marc Zyngier [Fri, 25 Apr 2025 12:44:31 +0000 (13:44 +0100)]
arm64: sysreg: Add registers trapped by HDFG{R,W}TR2_EL2

Bulk addition of all the system registers trapped by HDFG{R,W}TR2_EL2.

The descriptions are extracted from the BSD-licenced JSON file part
of the 2025-03 drop from ARM.

Signed-off-by: Marc Zyngier <maz@kernel.org>
6 weeks agoarm64: sysreg: Add registers trapped by HFG{R,W}TR2_EL2
Marc Zyngier [Thu, 24 Apr 2025 18:47:09 +0000 (19:47 +0100)]
arm64: sysreg: Add registers trapped by HFG{R,W}TR2_EL2

Bulk addition of all the system registers trapped by HFG{R,W}TR2_EL2.

The descriptions are extracted from the BSD-licenced JSON file part
of the 2025-03 drop from ARM.

Signed-off-by: Marc Zyngier <maz@kernel.org>
6 weeks agoarm64: sysreg: Update CPACR_EL1 description
Marc Zyngier [Tue, 6 May 2025 16:27:25 +0000 (17:27 +0100)]
arm64: sysreg: Update CPACR_EL1 description

Add the couple of fields introduced with FEAT_NV2p1.

Signed-off-by: Marc Zyngier <maz@kernel.org>
6 weeks agoarm64: sysreg: Update TRBIDR_EL1 description
Marc Zyngier [Wed, 23 Apr 2025 10:28:05 +0000 (11:28 +0100)]
arm64: sysreg: Update TRBIDR_EL1 description

Add the missing MPAM field.

Signed-off-by: Marc Zyngier <maz@kernel.org>
6 weeks agoarm64: sysreg: Update PMSIDR_EL1 description
Marc Zyngier [Wed, 23 Apr 2025 10:27:44 +0000 (11:27 +0100)]
arm64: sysreg: Update PMSIDR_EL1 description

Add the missing SME, ALTCLK, FPF, EFT. CRR and FDS fields.

Signed-off-by: Marc Zyngier <maz@kernel.org>
6 weeks agoarm64: sysreg: Update ID_AA64PFR0_EL1 description
Marc Zyngier [Wed, 23 Apr 2025 10:26:42 +0000 (11:26 +0100)]
arm64: sysreg: Update ID_AA64PFR0_EL1 description

Add the missing RASv2 description.

Signed-off-by: Marc Zyngier <maz@kernel.org>
6 weeks agoarm64: sysreg: Replace HFGxTR_EL2 with HFG{R,W}TR_EL2
Marc Zyngier [Tue, 11 Mar 2025 14:44:30 +0000 (14:44 +0000)]
arm64: sysreg: Replace HFGxTR_EL2 with HFG{R,W}TR_EL2

Treating HFGRTR_EL2 and HFGWTR_EL2 identically was a mistake.
It makes things hard to reason about, has the potential to
introduce bugs by giving a meaning to bits that are really reserved,
and is in general a bad description of the architecture.

Given that #defines are cheap, let's describe both registers as
intended by the architecture, and repaint all the existing uses.

Yes, this is painful.

The registers themselves are generated from the JSON file in
an automated way.

Signed-off-by: Marc Zyngier <maz@kernel.org>
6 weeks agoarm64: sysreg: Add layout for HCR_EL2
Marc Zyngier [Mon, 3 Feb 2025 17:27:23 +0000 (17:27 +0000)]
arm64: sysreg: Add layout for HCR_EL2

Add HCR_EL2 to the sysreg file, more or less directly generated
from the JSON file.

Since the generated names significantly differ from the existing
naming, express the old names in terms of the new one. One day, we'll
fix this mess, but I'm not in any hurry.

Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
6 weeks agoarm64: sysreg: Update ID_AA64MMFR4_EL1 description
Marc Zyngier [Mon, 10 Feb 2025 11:35:31 +0000 (11:35 +0000)]
arm64: sysreg: Update ID_AA64MMFR4_EL1 description

Resync the ID_AA64MMFR4_EL1 with the architectue description.

This results in:

- the new PoPS field
- the new NV2P1 value for the NV_frac field
- the new RMEGDI field
- the new SRMASK field

These fields have been generated from the reference JSON file.

Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
6 weeks agoarm64: sysreg: Add ID_AA64ISAR1_EL1.LS64 encoding for FEAT_LS64WB
Marc Zyngier [Thu, 31 Oct 2024 08:42:11 +0000 (08:42 +0000)]
arm64: sysreg: Add ID_AA64ISAR1_EL1.LS64 encoding for FEAT_LS64WB

The 2024 extensions are adding yet another variant of LS64
(aptly named FEAT_LS64WB) supporting LS64 accesses to write-back
memory, as well as 32 byte single-copy atomic accesses using pairs
of FP registers.

Add the relevant encoding to ID_AA64ISAR1_EL1.LS64.

Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
8 weeks agoLinux 6.15-rc3
Linus Torvalds [Sun, 20 Apr 2025 20:43:47 +0000 (13:43 -0700)]
Linux 6.15-rc3

8 weeks agogcc-15: work around sequence-point warning
Linus Torvalds [Sun, 20 Apr 2025 18:30:11 +0000 (11:30 -0700)]
gcc-15: work around sequence-point warning

The C sequence points are complicated things, and gcc-15 has apparently
added a warning for the case where an object is both used and modified
multiple times within the same sequence point.

That's a great warning.

Or rather, it would be a great warning, except gcc-15 seems to not
really be very exact about it, and doesn't notice that the modification
are to two entirely different members of the same object: the array
counter and the array entries.

So that seems kind of silly.

That said, the code that gcc complains about is unnecessarily
complicated, so moving the array counter update into a separate
statement seems like the most straightforward fix for these warnings:

  drivers/net/wireless/intel/iwlwifi/mld/d3.c: In function ‘iwl_mld_set_netdetect_info’:
  drivers/net/wireless/intel/iwlwifi/mld/d3.c:1102:66: error: operation on ‘netdetect_info->n_matches’ may be undefined [-Werror=sequence-point]
   1102 |                 netdetect_info->matches[netdetect_info->n_matches++] = match;
        |                                         ~~~~~~~~~~~~~~~~~~~~~~~~~^~

  drivers/net/wireless/intel/iwlwifi/mld/d3.c:1120:58: error: operation on ‘match->n_channels’ may be undefined [-Werror=sequence-point]
   1120 |                         match->channels[match->n_channels++] =
        |                                         ~~~~~~~~~~~~~~~~~^~

side note: the code at that second warning is actively buggy, and only
works on little-endian machines that don't do strict alignment checks.

The code casts an array of integers into an array of unsigned long in
order to use our bitmap iterators.  That happens to work fine on any
sane architecture, but it's still wrong.

This does *not* fix that more serious problem.  This only splits the two
assignments into two statements and fixes the compiler warning.  I need
to get rid of the new warnings in order to be able to actually do any
build testing.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 weeks agogcc-15: add '__nonstring' markers to byte arrays
Linus Torvalds [Sun, 20 Apr 2025 18:18:55 +0000 (11:18 -0700)]
gcc-15: add '__nonstring' markers to byte arrays

All of these cases are perfectly valid and good traditional C, but hit
by the "you're not NUL-terminating your byte array" warning.

And none of the cases want any terminating NUL character.

Mark them __nonstring to shut up gcc-15 (and in the case of the ak8974
magnetometer driver, I just removed the explicit array size and let gcc
expand the 3-byte and 6-byte arrays by one extra byte, because it was
the simpler change).

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 weeks agogcc-15: get rid of misc extra NUL character padding
Linus Torvalds [Sun, 20 Apr 2025 18:04:00 +0000 (11:04 -0700)]
gcc-15: get rid of misc extra NUL character padding

This removes two cases of explicit NUL padding that now causes warnings
because of '-Wunterminated-string-initialization' being part of -Wextra
in gcc-15.

Gcc is being silly in this case when it says that it truncates a NUL
terminator, because in these cases there were _multiple_ NUL characters.

But we can get rid of the warning by just simplifying the two
initializers that trigger the warning for me, so this does exactly that.

I'm not sure why the power supply code did that odd

    .attr_name = #_name "\0",

pattern: it was introduced in commit 2cabeaf15129 ("power: supply: core:
Cleanup power supply sysfs attribute list"), but that 'attr_name[]'
field is an explicitly sized character array in a statically initialized
variable, and a string initializer always has a terminating NUL _and_
statically initialized character arrays are zero-padded anyway, so it
really seems to be rather extraneous belt-and-suspenders.

The zero_uuid[16] initialization in drivers/md/bcache/super.c makes
perfect sense, but it isn't necessary for the same reasons, and not
worth the new gcc warning noise.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 weeks agogcc-15: acpi: sprinkle random '__nonstring' crumbles around
Linus Torvalds [Sun, 20 Apr 2025 18:02:18 +0000 (11:02 -0700)]
gcc-15: acpi: sprinkle random '__nonstring' crumbles around

This is not great: I'd much rather introduce a typedef that is a "ACPI
name byte buffer", and use that to mark these special 4-byte ACPI names
that do not use NUL termination.

But as noted in the previous commit ("gcc-15: make 'unterminated string
initialization' just a warning") gcc doesn't actually seem to support
that notion, so instead you have to just mark every single array
declaration individually.

So this is not pretty, but this gets rid of the bulk of the annoying
warnings during an allmodconfig build for me.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 weeks agogcc-15: make 'unterminated string initialization' just a warning
Linus Torvalds [Sun, 20 Apr 2025 17:33:23 +0000 (10:33 -0700)]
gcc-15: make 'unterminated string initialization' just a warning

gcc-15 enabling -Wunterminated-string-initialization in -Wextra by
default was done with the best intentions, but the warning is still
quite broken.

What annoys me about the warning is that this is a very traditional AND
CORRECT way to initialize fixed byte arrays in C:

unsigned char hex[16] = "0123456789abcdef";

and we use this all over the kernel.  And the warning is fine, but gcc
developers apparently never made a reasonable way to disable it.  As is
(sadly) tradition with these things.

Yes, there's "__attribute__((nonstring))", and we have a macro to make
that absolutely disgusting syntax more palatable (ie the kernel syntax
for that monstrosity is just "__nonstring").

But that attribute is misdesigned.  What you'd typically want to do is
tell the compiler that you are using a type that isn't a string but a
byte array, but that doesn't work at all:

warning: ‘nonstring’ attribute does not apply to types [-Wattributes]

and because of this fundamental mis-design, you then have to mark each
instance of that pattern.

This is particularly noticeable in our ACPI code, because ACPI has this
notion of a 4-byte "type name" that gets used all over, and is exactly
this kind of byte array.

This is a sad oversight, because the warning is useful, but really would
be so much better if gcc had also given a sane way to indicate that we
really just want a byte array type at a type level, not the broken "each
and every array definition" level.

So now instead of creating a nice "ACPI name" type using something like

typedef char acpi_name_t[4] __nonstring;

we have to do things like

char name[ACPI_NAMESEG_SIZE] __nonstring;

in every place that uses this concept and then happens to have the
typical initializers.

This is annoying me mainly because I think the warning _is_ a good
warning, which is why I'm not just turning it off in disgust.  But it is
hampered by this bad implementation detail.

[ And obviously I'm doing this now because system upgrades for me are
  something that happen in the middle of the release cycle: don't do it
  before or during travel, or just before or during the busy merge
  window period. ]

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 weeks agoMerge tag 'mm-hotfixes-stable-2025-04-19-21-24' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Sun, 20 Apr 2025 04:46:58 +0000 (21:46 -0700)]
Merge tag 'mm-hotfixes-stable-2025-04-19-21-24' of git://git./linux/kernel/git/akpm/mm

Pull misc hotfixes from Andrew Morton:
 "16 hotfixes. 2 are cc:stable and the remainder address post-6.14
  issues or aren't considered necessary for -stable kernels.

  All patches are basically for MM although five are alterations to
  MAINTAINERS"

[ Basic counting skills are clearly not a strictly necessary requirement
  for kernel maintainers.     - Linus ]

* tag 'mm-hotfixes-stable-2025-04-19-21-24' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  MAINTAINERS: add section for locking of mm's and VMAs
  mm: vmscan: fix kswapd exit condition in defrag_mode
  mm: vmscan: restore high-cpu watermark safety in kswapd
  MAINTAINERS: add Pedro as reviewer to the MEMORY MAPPING section
  mm/memory: move sanity checks in do_wp_page() after mapcount vs. refcount stabilization
  mm, hugetlb: increment the number of pages to be reset on HVO
  writeback: fix false warning in inode_to_wb()
  docs: ABI: replace mcroce@microsoft.com with new Meta address
  mm/gup: fix wrongly calculated returned value in fault_in_safe_writeable()
  MAINTAINERS: add memory advice section
  MAINTAINERS: add mmap trace events to MEMORY MAPPING
  mm: memcontrol: fix swap counter leak from offline cgroup
  MAINTAINERS: add MM subsection for the page allocator
  MAINTAINERS: update SLAB ALLOCATOR maintainers
  fs/dax: fix folio splitting issue by resetting old folio order + _nr_pages
  mm/page_alloc: fix deadlock on cpu_hotplug_lock in __accept_page()

8 weeks agoMerge tag 'vfs-6.15-rc3.fixes.2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 19 Apr 2025 21:31:08 +0000 (14:31 -0700)]
Merge tag 'vfs-6.15-rc3.fixes.2' of git://git./linux/kernel/git/vfs/vfs

Pull vfs fixes from Christian Brauner:

 - Revert the hfs{plus} deprecation warning that's also included in this
   pull request. The commit introducing the deprecation warning resides
   rather early in this branch. So simply dropping it would've rebased
   all other commits which I decided to avoid. Hence the revert in the
   same branch

   [ Background - the deprecation warning discussion resulted in people
     stepping up, and so hfs{plus} will have a maintainer taking care of
     it after all..   - Linus ]

 - Switch CONFIG_SYSFS_SYCALL default to n and decouple from
   CONFIG_EXPERT

 - Fix an audit bug caused by changes to our kernel path lookup helpers
   this cycle. Audit needs the parent path even if the dentry it tried
   to look up is negative

 - Ensure that the kernel path lookup helpers leave the passed in path
   argument clean when they return an error. This is consistent with all
   our other helpers

 - Ensure that vfs_getattr_nosec() calls bdev_statx() so the relevant
   information is available to kernel consumers as well

 - Don't set a timer and call schedule() if the timer will expire
   immediately in epoll

 - Make netfs lookup tables with __nonstring

* tag 'vfs-6.15-rc3.fixes.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  Revert "hfs{plus}: add deprecation warning"
  fs: move the bdex_statx call to vfs_getattr_nosec
  netfs: Mark __nonstring lookup tables
  eventpoll: Set epoll timeout if it's in the future
  fs: ensure that *path_locked*() helpers leave passed path pristine
  fs: add kern_path_locked_negative()
  hfs{plus}: add deprecation warning
  Kconfig: switch CONFIG_SYSFS_SYCALL default to n

8 weeks agoMerge tag 'i2c-for-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa...
Linus Torvalds [Sat, 19 Apr 2025 20:59:04 +0000 (13:59 -0700)]
Merge tag 'i2c-for-6.15-rc3' of git://git./linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:

 - Address translator: fix wrong include

 - ChromeOS EC tunnel: fix potential NULL pointer dereference

* tag 'i2c-for-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: atr: Fix wrong include
  i2c: cros-ec-tunnel: defer probe if parent EC is not present

8 weeks agoRevert "hfs{plus}: add deprecation warning"
Christian Brauner [Sat, 19 Apr 2025 20:48:59 +0000 (22:48 +0200)]
Revert "hfs{plus}: add deprecation warning"

This reverts commit ddee68c499f76ae47c011549df5be53db0057402.

There's ongoing discussion about better maintenance of at least hfsplus.
Rever the deprecation warning for now.

Signed-off-by: Christian Brauner <brauner@kernel.org>
8 weeks agoMerge tag 'trace-v6.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace...
Linus Torvalds [Sat, 19 Apr 2025 18:57:36 +0000 (11:57 -0700)]
Merge tag 'trace-v6.15-rc2' of git://git./linux/kernel/git/trace/linux-trace

Pull tracing fixes from Steven Rostedt:

 - Initialize hash variables in ftrace subops logic

   The fix that simplified the ftrace subops logic opened a path where
   some variables could be used without being initialized, and done
   subtly where the compiler did not catch it. Initialize those
   variables to the EMPTY_HASH, which is the default hash.

 - Reinitialize the hash pointers after they are freed

   Some of the hash pointers in the subop logic were freed but may still
   be referenced later. To prevent use-after-free bugs, initialize them
   back to the EMPTY_HASH.

 - Free the ftrace hashes when they are replaced

   The fix that simplified the subops logic updated some hash pointers,
   but left the original hash that they were pointing to where they are
   no longer used. This caused a memory leak. Free the hashes that are
   pointed to by the pointers when they are replaced.

 - Fix size initialization of ftrace direct function hash

   The ftrace direct function hash used by BPF initialized the hash size
   incorrectly. It checked the size of items to a hard coded 32, which
   made the hash bit size of 5. The hash size is supposed to be limited
   by the bit size of the hash, as the bitmask is allowed to be greater
   than 5. Rework the size check to first pass the number of elements to
   fls() and then compare that to FTRACE_HASH_MAX_BITS before allocating
   the hash.

 - Fix format output of ftrace_graph_ent_entry event

   The field depth of the ftrace_graph_ent_entry event is of size 4 but
   the output showed it as unsigned long and use "%lu". Change it to
   unsigned int and use "%u" in the print format that is displayed to
   user space.

 - Fix the trace event filter on strings

   Events can be filtered on numbers or string values. The return value
   checked from strncpy_from_kernel_nofault() and
   strncpy_from_user_nofault() was used to determine if reading the
   strings would fault or not. It would return fault if the value was
   non zero, which is basically meant that it was always considering the
   read as a fault.

 - Add selftest to test trace event string filtering

   In order to catch the breakage of the string filtering, add a self
   test to make sure that it continues to work.

* tag 'trace-v6.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: selftests: Add testing a user string to filters
  tracing: Fix filter string testing
  ftrace: Fix type of ftrace_graph_ent_entry.depth
  ftrace: fix incorrect hash size in register_ftrace_direct()
  ftrace: Free ftrace hashes after they are replaced in the subops code
  ftrace: Reinitialize hash to EMPTY_HASH after freeing
  ftrace: Initialize variables for ftrace_startup/shutdown_subops()

8 weeks agoMerge tag 'nfsd-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Linus Torvalds [Sat, 19 Apr 2025 17:38:03 +0000 (10:38 -0700)]
Merge tag 'nfsd-6.15-1' of git://git./linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:

 - v6.15 libcrc clean-up makes invalid configurations possible

 - Fix a potential deadlock introduced during the v6.15 merge window

* tag 'nfsd-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  nfsd: decrease sc_count directly if fail to queue dl_recall
  nfs: add missing selections of CONFIG_CRC32

8 weeks agoMerge tag 'rust-fixes-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda...
Linus Torvalds [Sat, 19 Apr 2025 17:02:43 +0000 (10:02 -0700)]
Merge tag 'rust-fixes-6.15' of git://git./linux/kernel/git/ojeda/linux

Pull rust fixes from Miguel Ojeda:
 "Toolchain and infrastructure:

   - Fix missing KASAN LLVM flags on first build (and fix spurious
     rebuilds) by skipping '--target'

   - Fix Make < 4.3 build error by using '$(pound)'

   - Fix UML build error by removing 'volatile' qualifier from io
     helpers

   - Fix UML build error by adding 'dma_{alloc,free}_attrs()' helpers

   - Clean gendwarfksyms warnings by avoiding to export '__pfx' symbols

   - Clean objtool warning by adding a new 'noreturn' function for
     1.86.0

   - Disable 'needless_continue' Clippy lint due to new 1.86.0 warnings

   - Add missing 'ffi' crate to 'generate_rust_analyzer.py'

  'pin-init' crate:

   - Import a couple fixes from upstream"

* tag 'rust-fixes-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux:
  rust: helpers: Add dma_alloc_attrs() and dma_free_attrs()
  rust: helpers: Remove volatile qualifier from io helpers
  rust: kbuild: use `pound` to support GNU Make < 4.3
  objtool/rust: add one more `noreturn` Rust function for Rust 1.86.0
  rust: kasan/kbuild: fix missing flags on first build
  rust: disable `clippy::needless_continue`
  rust: kbuild: Don't export __pfx symbols
  rust: pin-init: use Markdown autolinks in Rust comments
  rust: pin-init: alloc: restrict `impl ZeroableOption` for `Box` to `T: Sized`
  scripts: generate_rust_analyzer: Add ffi crate

8 weeks agoMerge tag 'drm-fixes-2025-04-19' of https://gitlab.freedesktop.org/drm/kernel
Linus Torvalds [Sat, 19 Apr 2025 16:31:21 +0000 (09:31 -0700)]
Merge tag 'drm-fixes-2025-04-19' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
 "Easter rc3 pull request, fixes in all the usuals, amdgpu, xe, msm,
  with some i915/ivpu/mgag200/v3d fixes, then a couple of bits in
  dma-buf/gem.

  Hopefully has no easter eggs in it.

  dma-buf:
   - Correctly decrement refcounter on errors

  gem:
   - Fix test for imported buffers

  amdgpu:
   - Cleaner shader sysfs fix
   - Suspend fix
   - Fix doorbell free ordering
   - Video caps fix
   - DML2 memory allocation optimization
   - HDP fix

  i915:
   - Fix DP DSC configurations that require 3 DSC engines per pipe

  xe:
   - Fix LRC address being written too late for GuC
   - Fix notifier vs folio deadlock
   - Fix race betwen dma_buf unmap and vram eviction
   - Fix debugfs handling PXP terminations unconditionally

  msm:
   - Display:
       - Fix to call dpu_plane_atomic_check_pipe() for both SSPPs in
         case of multi-rect
       - Fix to validate plane_state pointer before using it in
         dpu_plane_virtual_atomic_check()
       - Fix to make sure dereferencing dpu_encoder_phys happens after
         making sure it is valid in _dpu_encoder_trigger_start()
       - Remove the remaining intr_tear_rd_ptr which we initialized to
         -1 because NO_IRQ indices start from 0 now
   - GPU:
       - Fix IB_SIZE overflow

  ivpu:
   - Fix debugging
   - Fixes to frequency
   - Support firmware API 3.28.3
   - Flush jobs upon reset

  mgag200:
   - Set vblank start to correct values

  v3d:
   - Fix Indirect Dispatch"

* tag 'drm-fixes-2025-04-19' of https://gitlab.freedesktop.org/drm/kernel: (26 commits)
  drm/msm/a6xx+: Don't let IB_SIZE overflow
  drm/xe/pxp: do not queue unneeded terminations from debugfs
  drm/xe/dma_buf: stop relying on placement in unmap
  drm/xe/userptr: fix notifier vs folio deadlock
  drm/xe: Set LRC addresses before guc load
  drm/mgag200: Fix value in <VBLKSTR> register
  drm/gem: Internally test import_attach for imported objects
  drm/amdgpu: Use the right function for hdp flush
  drm/amd/display/dml2: use vzalloc rather than kzalloc
  drm/amdgpu: Add back JPEG to video caps for carrizo and newer
  drm/amdgpu: fix warning of drm_mm_clean
  drm/amd: Forbid suspending into non-default suspend states
  drm/amdgpu: use a dummy owner for sysfs triggered cleaner shaders v4
  drm/i915/dp: Check for HAS_DSC_3ENGINES while configuring DSC slices
  drm/i915/display: Add macro for checking 3 DSC engines
  dma-buf/sw_sync: Decrement refcount on error in sw_sync_ioctl_get_deadline()
  accel/ivpu: Add cmdq_id to job related logs
  accel/ivpu: Show NPU frequency in sysfs
  accel/ivpu: Fix the NPU's DPU frequency calculation
  accel/ivpu: Update FW Boot API to version 3.28.3
  ...

8 weeks agoMerge tag 'drm-msm-fixes-2025-04-18' of https://gitlab.freedesktop.org/drm/msm into...
Dave Airlie [Sat, 19 Apr 2025 05:09:29 +0000 (15:09 +1000)]
Merge tag 'drm-msm-fixes-2025-04-18' of https://gitlab.freedesktop.org/drm/msm into drm-fixes

Fixes for v6.15-rc3

Display:
- Fix to call dpu_plane_atomic_check_pipe() for both SSPPs in
  case of multi-rect
- Fix to validate plane_state pointer before using it in
  dpu_plane_virtual_atomic_check()
- Fix to make sure dereferencing dpu_encoder_phys happens after
  making sure it is valid in _dpu_encoder_trigger_start()
- Remove the remaining intr_tear_rd_ptr which we initialized
  to -1 because NO_IRQ indices start from 0 now

GPU:
- Fix IB_SIZE overflow

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rob Clark <robdclark@gmail.com>
Link: https://lore.kernel.org/r/CAF6AEGtVKXEVdzUzFWmQE8JmK3nx_hp+ynOd-5j3vnfcU-sgOA@mail.gmail.com
8 weeks agoMerge tag 'drm-xe-fixes-2025-04-18' of https://gitlab.freedesktop.org/drm/xe/kernel...
Dave Airlie [Sat, 19 Apr 2025 04:59:47 +0000 (14:59 +1000)]
Merge tag 'drm-xe-fixes-2025-04-18' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes

Driver Changes:
- Fix LRC address being written too late for GuC
- Fix notifier vs folio deadlock
- Fix race betwen dma_buf unmap and vram eviction
- Fix debugfs handling PXP terminations unconditionally

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/ndinq644zenywaaycxyfqqivsb2xer4z7err3dlpalbz33jfkm@ttabzsg6wnet
2 months agoMerge tag '6.15-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Sat, 19 Apr 2025 03:10:42 +0000 (20:10 -0700)]
Merge tag '6.15-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:

 - Fix hard link lease key problem when close is deferred

 - Revert the socket lockdep/refcount workarounds done in cifs.ko now
   that it is fixed at the socket layer

* tag '6.15-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  Revert "smb: client: fix TCP timers deadlock after rmmod"
  Revert "smb: client: Fix netns refcount imbalance causing leaks and use-after-free"
  smb3 client: fix open hardlink on deferred close file error

2 months agodrm/msm/a6xx+: Don't let IB_SIZE overflow
Rob Clark [Mon, 17 Mar 2025 15:00:06 +0000 (08:00 -0700)]
drm/msm/a6xx+: Don't let IB_SIZE overflow

IB_SIZE is only b0..b19.  Starting with a6xx gen3, additional fields
were added above the IB_SIZE.  Accidentially setting them can cause
badness.  Fix this by properly defining the CP_INDIRECT_BUFFER packet
and using the generated builder macro to ensure unintended bits are not
set.

v2: add missing type attribute for IB_BASE
v3: fix offset attribute in xml

Reported-by: Connor Abbott <cwabbott0@gmail.com>
Fixes: a83366ef19ea ("drm/msm/a6xx: add A640/A650 to gpulist")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Patchwork: https://patchwork.freedesktop.org/patch/643396/

2 months agoMerge tag 'i2c-host-fixes-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel...
Wolfram Sang [Fri, 18 Apr 2025 21:42:56 +0000 (23:42 +0200)]
Merge tag 'i2c-host-fixes-6.15-rc3' of git://git./linux/kernel/git/andi.shyti/linux into i2c/for-current

i2c-host-fixes for v6.15-rc3

- ChromeOS EC tunnel: fix potential NULL pointer dereference

2 months agoMerge tag 'x86-urgent-2025-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 18 Apr 2025 21:04:57 +0000 (14:04 -0700)]
Merge tag 'x86-urgent-2025-04-18' of git://git./linux/kernel/git/tip/tip

Pull misc x86 fixes from Ingo Molnar:

 - Fix hypercall detection on Xen guests

 - Extend the AMD microcode loader SHA check to Zen5, to block loading
   of any unreleased standalone Zen5 microcode patches

 - Add new Intel CPU model number for Bartlett Lake

 - Fix the workaround for AMD erratum 1054

 - Fix buggy early memory acceptance between SEV-SNP guests and the EFI
   stub

* tag 'x86-urgent-2025-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot/sev: Avoid shared GHCB page for early memory acceptance
  x86/cpu/amd: Fix workaround for erratum 1054
  x86/cpu: Add CPU model number for Bartlett Lake CPUs with Raptor Cove cores
  x86/microcode/AMD: Extend the SHA check to Zen5, block loading of any unreleased standalone Zen5 microcode patches
  x86/xen: Fix __xen_hypercall_setfunc()

2 months agoMerge tag 'timers-urgent-2025-04-18' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 18 Apr 2025 21:02:45 +0000 (14:02 -0700)]
Merge tag 'timers-urgent-2025-04-18' of git://git./linux/kernel/git/tip/tip

Pull timer fix from Ingo Molnar:
 "Fix a lockdep false positive in the i8253 driver"

* tag 'timers-urgent-2025-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/i8253: Call clockevent_i8253_disable() with interrupts disabled

2 months agoMerge tag 'perf-urgent-2025-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 18 Apr 2025 20:35:13 +0000 (13:35 -0700)]
Merge tag 'perf-urgent-2025-04-18' of git://git./linux/kernel/git/tip/tip

Pull x86 perf event fixes from Ingo Molnar:
 "Miscellaneous fixes and a hardware-enabling change:

   - Fix Intel uncore PMU IIO free running counters on SPR, ICX and SNR
     systems

   - Fix Intel PEBS buffer overflow handling

   - Fix skid in Intel PEBS sampling of user-space general purpose
     registers

   - Enable Panther Lake PMU support - similar to Lunar Lake"

* tag 'perf-urgent-2025-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel: Add Panther Lake support
  perf/x86/intel: Allow to update user space GPRs from PEBS records
  perf/x86/intel: Don't clear perf metrics overflow bit unconditionally
  perf/x86/intel/uncore: Fix the scale of IIO free running counters on SPR
  perf/x86/intel/uncore: Fix the scale of IIO free running counters on ICX
  perf/x86/intel/uncore: Fix the scale of IIO free running counters on SNR

2 months agoMerge tag 'irq-urgent-2025-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 18 Apr 2025 20:28:41 +0000 (13:28 -0700)]
Merge tag 'irq-urgent-2025-04-18' of git://git./linux/kernel/git/tip/tip

Pull misc irq fixes from Ingo Molnar:

 - Fix BCM2712 irqchip driver Kconfig dependencies required on the
   Raspberry PI5

 - Fix spurious interrupts on RZ/G3E SMARC EVK systems

 - Fix crash regression on Sun/NIU hardware

 - Apply MSI driver quirk for Sun Neptune chips

* tag 'irq-urgent-2025-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/irq-bcm2712-mip: Enable driver when ARCH_BCM2835 is enabled
  irqchip/renesas-rzv2h: Prevent TINT spurious interrupt
  net/niu: Niu requires MSIX ENTRY_DATA fields touch before entry reads
  PCI/MSI: Add an option to write MSIX ENTRY_DATA before any reads

2 months agoMerge tag 'core-urgent-2025-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 18 Apr 2025 20:25:33 +0000 (13:25 -0700)]
Merge tag 'core-urgent-2025-04-18' of git://git./linux/kernel/git/tip/tip

Pull misc core fixes from Ingo Molnar:
 "Fix a genksyms related bug, triggered by recent changes to the percpu
  code, and update the .clang-format file to not include obsolete
  function names"

* tag 'core-urgent-2025-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genksyms: Handle typeof_unqual keyword and __seg_{fs,gs} qualifiers
  clang-format: Update the ForEachMacros list for v6.15-rc1

2 months agoMerge tag 'hardening-v6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 18 Apr 2025 20:20:20 +0000 (13:20 -0700)]
Merge tag 'hardening-v6.15-rc3' of git://git./linux/kernel/git/kees/linux

Pull hardening fixes from Kees Cook:

 - lib/prime_numbers: KUnit test should not select PRIME_NUMBERS (Geert
   Uytterhoeven)

 - ubsan: Fix panic from test_ubsan_out_of_bounds (Mostafa Saleh)

 - ubsan: Remove 'default UBSAN' from UBSAN_INTEGER_WRAP (Nathan
   Chancellor)

 - string: Add load_unaligned_zeropad() code path to sized_strscpy()
   (Peter Collingbourne)

 - kasan: Add strscpy() test to trigger tag fault on arm64 (Vincenzo
   Frascino)

 - Disable GCC randstruct for COMPILE_TEST

* tag 'hardening-v6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  lib/prime_numbers: KUnit test should not select PRIME_NUMBERS
  ubsan: Fix panic from test_ubsan_out_of_bounds
  lib/Kconfig.ubsan: Remove 'default UBSAN' from UBSAN_INTEGER_WRAP
  hardening: Disable GCC randstruct for COMPILE_TEST
  kasan: Add strscpy() test to trigger tag fault on arm64
  string: Add load_unaligned_zeropad() code path to sized_strscpy()

2 months agoMerge tag 'gpio-fixes-for-v6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 18 Apr 2025 20:18:01 +0000 (13:18 -0700)]
Merge tag 'gpio-fixes-for-v6.15-rc3' of git://git./linux/kernel/git/brgl/linux

Pull gpio fix from Bartosz Golaszewski:

 - check for both the new AND old (deprecated) setter callback when
   changing GPIO direction to output

* tag 'gpio-fixes-for-v6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  gpiolib: Allow to use setters with return value for output-only gpios

2 months agoMerge tag 'thermal-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael...
Linus Torvalds [Fri, 18 Apr 2025 20:09:20 +0000 (13:09 -0700)]
Merge tag 'thermal-6.15-rc3' of git://git./linux/kernel/git/rafael/linux-pm

Pull thermal control fixes from Rafael Wysocki:
 "Add missing DVFS support flags for the Lunar Lake and Panther Lake
  platforms to the int340x Intel thermal driver and fix DLVR support
  for Panther Lake in it (Srinivas Pandruvada)"

* tag 'thermal-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  thermal: intel: int340x: Fix Panther Lake DLVR support
  thermal: intel: int340x: Add missing DVFS support flags

2 months agoMerge tag 'pm-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Linus Torvalds [Fri, 18 Apr 2025 20:06:12 +0000 (13:06 -0700)]
Merge tag 'pm-6.15-rc3' of git://git./linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
 "These are mostly cpufreq fixes, some of which address recent
  regressions and some address older issues that have come to light
  during the last two weeks, and a runtime PM documentation correction:

   - Fix the performance-to-frequency scaling factor computation on
     systems using HWP in the intel_pstate driver after a recent
     incorrect update of it (Rafael Wysocki)

   - Fix the usage of the CPUFREQ_NEED_UPDATE_LIMITS cpufreq driver flag
     in the schedutil cpufreq governor after a recent update of it that
     has caused frequency limits changes to be missed sometimes (Rafael
     Wysocki)

   - Address some recently discovered synchronization issues related to
     frequency limits changes in the schedutil cpufreq governor and in
     the cpufreq core (Rafael Wysocki)

   - Fix ITMT support in the amd-pstate cpufreq driver so that it is
     enabled after asym priorities have been correctly initialized for
     all CPUs (K Prateek Nayak)

   - Fix changing min/max limits in the amd-pstate cpufreq driver while
     on the performance governor (Dhananjay Ugwekar)

   - Fix a function name in the runtime PM documentation that was
     previously incorrectly updated by mistake (Sakari Ailus)"

* tag 'pm-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq: Avoid using inconsistent policy->min and policy->max
  cpufreq/sched: Set need_freq_update in ignore_dl_rate_limit()
  cpufreq/sched: Explicitly synchronize limits_changed flag handling
  cpufreq/sched: Fix the usage of CPUFREQ_NEED_UPDATE_LIMITS
  Documentation: PM: runtime: Fix a reference to pm_runtime_autosuspend()
  cpufreq: intel_pstate: Fix hwp_get_cpu_scaling()
  cpufreq/amd-pstate: Enable ITMT support after initializing core rankings
  cpufreq/amd-pstate: Fix min_limit perf and freq updation for performance governor

2 months agoMerge branch 'pm-docs'
Rafael J. Wysocki [Fri, 18 Apr 2025 18:55:48 +0000 (20:55 +0200)]
Merge branch 'pm-docs'

Merge a runtime PM documentation correction for 6.15-rc3.

* pm-docs:
  Documentation: PM: runtime: Fix a reference to pm_runtime_autosuspend()

2 months agoMerge tag 'riscv-for-linus-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 18 Apr 2025 18:46:44 +0000 (11:46 -0700)]
Merge tag 'riscv-for-linus-6.15-rc3' of git://git./linux/kernel/git/riscv/linux

Pull RISC-V fixes from Palmer Dabbelt:

 - A fix for an issue where C instructions ended up in non-C builds, due
   to some broken inline assembly in the KGDB breakpoint insertion code

 - A fix to avoid spurious printk messages about misaligned access
   performance probing

 - A fix for a handful of issues with /proc/iomem's reserved region
   handling

 - A pair of fixes for module relocation processing

 - A few build-time fixes

* tag 'riscv-for-linus-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: KGDB: Remove ".option norvc/.option rvc" for kgdb_compiled_break
  riscv: KGDB: Do not inline arch_kgdb_breakpoint()
  riscv: Avoid fortify warning in syscall_get_arguments()
  riscv: Provide all alternative macros all the time
  riscv: module: Allocate PLT entries for R_RISCV_PLT32
  riscv: module: Fix out-of-bounds relocation access
  riscv: Properly export reserved regions in /proc/iomem
  riscv: Fix unaligned access info messages
  riscv: Avoid fortify warning in syscall_get_arguments()
  Documentation: riscv: Fix typo MIMPLID -> MIMPID
  riscv: Use kvmalloc_array on relocation_hashtable

2 months agoMerge tag 'linux_kselftest-kunit-fixes-6.15-rc3' of git://git.kernel.org/pub/scm...
Linus Torvalds [Fri, 18 Apr 2025 18:35:11 +0000 (11:35 -0700)]
Merge tag 'linux_kselftest-kunit-fixes-6.15-rc3' of git://git./linux/kernel/git/shuah/linux-kselftest

Pull kunit fix from Shuah Khan:
 "Fixes arch sh kunit qemu_configs script sh.py to honor kunit cmdline"

* tag 'linux_kselftest-kunit-fixes-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  kunit: qemu_configs: SH: Respect kunit cmdline

2 months agoMerge tag 'linux_kselftest-fixes-6.15-rc3' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Fri, 18 Apr 2025 18:32:31 +0000 (11:32 -0700)]
Merge tag 'linux_kselftest-fixes-6.15-rc3' of git://git./linux/kernel/git/shuah/linux-kselftest

Pull kselftest fix from Shuah Khan:
 "Fixes dynevent_limitations.tc test failure on dash by detecting and
  handling bash and dash differences in evaluating \\"

* tag 'linux_kselftest-fixes-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests/ftrace: Differentiate bash and dash in dynevent_limitations.tc

2 months agoMerge tag 'v6.15-rc2-ksmbd-server-fixes' of git://git.samba.org/ksmbd
Linus Torvalds [Fri, 18 Apr 2025 16:37:44 +0000 (09:37 -0700)]
Merge tag 'v6.15-rc2-ksmbd-server-fixes' of git://git.samba.org/ksmbd

Pull smb server fixes from Steve French:

 - Fix integer overflow in server disconnect deadtime calculation

 - Three fixes for potential use after frees: one for oplocks, and one
   for leases and one for kerberos authentication

 - Fix to prevent attempted write to directory

 - Fix locking warning for durable scavenger thread

* tag 'v6.15-rc2-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
  ksmbd: Prevent integer overflow in calculation of deadtime
  ksmbd: fix the warning from __kernel_write_iter
  ksmbd: fix use-after-free in smb_break_all_levII_oplock()
  ksmbd: fix use-after-free in __smb2_lease_break_noti()
  ksmbd: fix WARNING "do not call blocking ops when !TASK_RUNNING"
  ksmbd: Fix dangling pointer in krb_authenticate

2 months agoMerge tag 'block-6.15-20250417' of git://git.kernel.dk/linux
Linus Torvalds [Fri, 18 Apr 2025 16:21:14 +0000 (09:21 -0700)]
Merge tag 'block-6.15-20250417' of git://git.kernel.dk/linux

Pull block fixes from Jens Axboe:

 - MD pull via Yu:
      - fix raid10 missing discard IO accounting (Yu Kuai)
      - fix bitmap stats for bitmap file (Zheng Qixing)
      - fix oops while reading all member disks failed during
        check/repair (Meir Elisha)

 - NVMe pull via Christoph:
      - fix scan failure for non-ANA multipath controllers (Hannes
        Reinecke)
      - fix multipath sysfs links creation for some cases (Hannes
        Reinecke)
      - PCIe endpoint fixes (Damien Le Moal)
      - use NULL instead of 0 in the auth code (Damien Le Moal)

 - Various ublk fixes:
      - Slew of selftest additions
      - Improvements and fixes for IO cancelation
      - Tweak to Kconfig verbiage

 - Fix for page dirtying for blk integrity mapped pages

 - loop fixes:
      - buffered IO fix
      - uevent fixes
      - request priority inheritance fix

 - Various little fixes

* tag 'block-6.15-20250417' of git://git.kernel.dk/linux: (38 commits)
  selftests: ublk: add generic_06 for covering fault inject
  ublk: simplify aborting ublk request
  ublk: remove __ublk_quiesce_dev()
  ublk: improve detection and handling of ublk server exit
  ublk: move device reset into ublk_ch_release()
  ublk: rely on ->canceling for dealing with ublk_nosrv_dev_should_queue_io
  ublk: add ublk_force_abort_dev()
  ublk: properly serialize all FETCH_REQs
  selftests: ublk: move creating UBLK_TMP into _prep_test()
  selftests: ublk: add test_stress_05.sh
  selftests: ublk: support user recovery
  selftests: ublk: support target specific command line
  selftests: ublk: increase max nr_queues and queue depth
  selftests: ublk: set queue pthread's cpu affinity
  selftests: ublk: setup ring with IORING_SETUP_SINGLE_ISSUER/IORING_SETUP_DEFER_TASKRUN
  selftests: ublk: add two stress tests for zero copy feature
  selftests: ublk: run stress tests in parallel
  selftests: ublk: make sure _add_ublk_dev can return in sub-shell
  selftests: ublk: cleanup backfile automatically
  selftests: ublk: add io_uring uapi header
  ...

2 months agoMerge tag 'io_uring-6.15-20250418' of git://git.kernel.dk/linux
Linus Torvalds [Fri, 18 Apr 2025 16:13:52 +0000 (09:13 -0700)]
Merge tag 'io_uring-6.15-20250418' of git://git.kernel.dk/linux

Pull io_uring fixes from Jens Axboe:

 - Correctly cap iov_iter->nr_segs for imports of registered buffers,
   both kbuf and normal ones.

   Three cleanups to make it saner first, then two fixes for each of the
   buffer types.

   This fixes a performance regression where partial buffer usage
   doesn't trim the tail number of segments, leading the block layer to
   iterate the IOs to check if it needs splitting.

 - Two patches tweaking the newly introduced zero-copy rx API, mostly to
   keep the API consistent once we add multiple interface queues per
   ring support in the 6.16 release.

 - zc rx unmapping fix for a dead device

* tag 'io_uring-6.15-20250418' of git://git.kernel.dk/linux:
  io_uring/zcrx: fix late dma unmap for a dead dev
  io_uring/rsrc: ensure segments counts are correct on kbuf buffers
  io_uring/rsrc: send exact nr_segs for fixed buffer
  io_uring/rsrc: refactor io_import_fixed
  io_uring/rsrc: separate kbuf offset adjustments
  io_uring/rsrc: don't skip offset calculation
  io_uring/zcrx: add pp to ifq conversion helper
  io_uring/zcrx: return ifq id to the user

2 months agotracing: selftests: Add testing a user string to filters
Steven Rostedt [Fri, 18 Apr 2025 14:12:08 +0000 (10:12 -0400)]
tracing: selftests: Add testing a user string to filters

Running the following commands was broken:

  # cd /sys/kernel/tracing
  # echo "filename.ustring ~ \"/proc*\"" > events/syscalls/sys_enter_openat/filter
  # echo 1 > events/syscalls/sys_enter_openat/enable
  # ls /proc/$$/maps
  # cat trace

And would produce nothing when it should have produced something like:

      ls-1192    [007] .....  8169.828333: sys_openat(dfd: ffffffffffffff9c, filename: 7efc18359904, flags: 80000, mode: 0)

Add a test to check this case so that it will be caught if it breaks
again.

Link: https://lore.kernel.org/linux-trace-kernel/20250417183003.505835fb@gandalf.local.home/
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/20250418101208.38dc81f5@gandalf.local.home
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2 months agox86/boot/sev: Avoid shared GHCB page for early memory acceptance
Ard Biesheuvel [Thu, 17 Apr 2025 20:21:21 +0000 (22:21 +0200)]
x86/boot/sev: Avoid shared GHCB page for early memory acceptance

Communicating with the hypervisor using the shared GHCB page requires
clearing the C bit in the mapping of that page. When executing in the
context of the EFI boot services, the page tables are owned by the
firmware, and this manipulation is not possible.

So switch to a different API for accepting memory in SEV-SNP guests, one
which is actually supported at the point during boot where the EFI stub
may need to accept memory, but the SEV-SNP init code has not executed
yet.

For simplicity, also switch the memory acceptance carried out by the
decompressor when not booting via EFI - this only involves the
allocation for the decompressed kernel, and is generally only called
after kexec, as normal boot will jump straight into the kernel from the
EFI stub.

Fixes: 6c3211796326 ("x86/sev: Add SNP-specific unaccepted memory support")
Tested-by: Tom Lendacky <thomas.lendacky@amd.com>
Co-developed-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: <stable@vger.kernel.org>
Cc: Dionna Amalie Glaze <dionnaglaze@google.com>
Cc: Kevin Loughlin <kevinloughlin@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-efi@vger.kernel.org
Link: https://lore.kernel.org/r/20250404082921.2767593-8-ardb+git@google.com
Link: https://lore.kernel.org/r/20250410132850.3708703-2-ardb+git@google.com
Link: https://lore.kernel.org/r/20250417202120.1002102-2-ardb+git@google.com
2 months agox86/cpu/amd: Fix workaround for erratum 1054
Sandipan Das [Fri, 18 Apr 2025 06:19:40 +0000 (11:49 +0530)]
x86/cpu/amd: Fix workaround for erratum 1054

Erratum 1054 affects AMD Zen processors that are a part of Family 17h
Models 00-2Fh and the workaround is to not set HWCR[IRPerfEn]. However,
when X86_FEATURE_ZEN1 was introduced, the condition to detect unaffected
processors was incorrectly changed in a way that the IRPerfEn bit gets
set only for unaffected Zen 1 processors.

Ensure that HWCR[IRPerfEn] is set for all unaffected processors. This
includes a subset of Zen 1 (Family 17h Models 30h and above) and all
later processors. Also clear X86_FEATURE_IRPERF on affected processors
so that the IRPerfCount register is not used by other entities like the
MSR PMU driver.

Fixes: 232afb557835 ("x86/CPU/AMD: Add X86_FEATURE_ZEN1")
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/caa057a9d6f8ad579e2f1abaa71efbd5bd4eaf6d.1744956467.git.sandipan.das@amd.com
2 months agoio_uring/zcrx: fix late dma unmap for a dead dev
Pavel Begunkov [Fri, 18 Apr 2025 12:02:27 +0000 (13:02 +0100)]
io_uring/zcrx: fix late dma unmap for a dead dev

There is a problem with page pools not dma-unmapping immediately when
the device is going down, and delaying it until the page pool is
destroyed, which is not allowed (see links). That just got fixed for
normal page pools, and we need to address memory providers as well.

Unmap pages in the memory provider uninstall callback, and protect it
with a new lock. There is also a gap between when a dma mapping is
created and the mp is installed, so if the device is killed in between,
io_uring would be holding on to dma mappings to a dead device with no
one to call ->uninstall. Move it to page pool init and rely on
->is_mapped to make sure it's only done once.

Link: https://lore.kernel.org/lkml/8067f204-1380-4d37-8ffd-007fc6f26738@kernel.org/T/
Link: https://lore.kernel.org/all/20250409-page-pool-track-dma-v9-0-6a9ef2e0cba8@redhat.com/
Fixes: 34a3e60821ab9 ("io_uring/zcrx: implement zerocopy receive pp memory provider")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ef9b7db249b14f6e0b570a1bb77ff177389f881c.1744965853.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 months agoMAINTAINERS: add section for locking of mm's and VMAs
Lorenzo Stoakes [Wed, 16 Apr 2025 10:38:37 +0000 (11:38 +0100)]
MAINTAINERS: add section for locking of mm's and VMAs

We place this under memory mapping as related to memory mapping
abstractions in the form of mm_struct and vm_area_struct (VMA).  Now we
have separated out mmap/vma locking logic into the mmap_lock.c and
mmap_lock.h files, so this should encapsulate the majority of the mm
locking logic in the kernel.

Suren is best placed to maintain this logic as the core architect of VMA
locking as a whole.

Link: https://lkml.kernel.org/r/e6ed679a184ca444b20dfa77af96913fd8b5efa0.1744799282.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agomm: vmscan: fix kswapd exit condition in defrag_mode
Johannes Weiner [Wed, 16 Apr 2025 13:45:40 +0000 (09:45 -0400)]
mm: vmscan: fix kswapd exit condition in defrag_mode

Vlastimil points out an issue with kswapd in defrag_mode not waking up
kcompactd reliably.

Background: When kswapd is woken for any higher-order request, it
initially checks those high-order watermarks to decide if work is
necesary.  However, it cannot (efficiently) meet the contiguity goal of
such a request by itself.  So once it has reclaimed a compaction gap, it
adjusts the request down to check for free order-0 pages, then wakes
kcompactd to coalesce them into larger blocks.

In defrag_mode, the initial watermark check needs to be analogously
against free pageblocks.  However, once kswapd drops the high-order to
hand off contiguity work, it also needs to fall back to base page
watermarks - otherwise it'll keep reclaiming until blocks are freed.

While it appears kcompactd is woken up frequently enough to do most of the
compaction work, kswapd ends up overreclaiming by quite a bit:

                                                     DEFRAGMODE     DEFRAGMODE-thispatch
Hugealloc Time mean                       79381.34 (    +0.00%)    88126.12 (   +11.02%)
Hugealloc Time stddev                     85852.16 (    +0.00%)   135366.75 (   +57.67%)
Kbuild Real time                            249.35 (    +0.00%)      226.71 (    -9.04%)
Kbuild User time                           1249.16 (    +0.00%)     1249.37 (    +0.02%)
Kbuild System time                          171.76 (    +0.00%)      166.93 (    -2.79%)
THP fault alloc                           51666.87 (    +0.00%)    52685.60 (    +1.97%)
THP fault fallback                        16970.00 (    +0.00%)    15951.87 (    -6.00%)
Direct compact fail                         166.53 (    +0.00%)      178.93 (    +7.40%)
Direct compact success                       17.13 (    +0.00%)        4.13 (   -71.69%)
Compact daemon scanned migrate          3095413.33 (    +0.00%)  9231239.53 (  +198.22%)
Compact daemon scanned free             2155966.53 (    +0.00%)  7053692.87 (  +227.17%)
Compact direct scanned migrate           265642.47 (    +0.00%)    68388.33 (   -74.26%)
Compact direct scanned free              130252.60 (    +0.00%)    55634.87 (   -57.29%)
Compact total migrate scanned           3361055.80 (    +0.00%)  9299627.87 (  +176.69%)
Compact total free scanned              2286219.13 (    +0.00%)  7109327.73 (  +210.96%)
Alloc stall                                1890.80 (    +0.00%)     6297.60 (  +232.94%)
Pages kswapd scanned                    9043558.80 (    +0.00%)  5952576.73 (   -34.18%)
Pages kswapd reclaimed                  1891708.67 (    +0.00%)  1030645.00 (   -45.52%)
Pages direct scanned                    1017090.60 (    +0.00%)  2688047.60 (  +164.29%)
Pages direct reclaimed                    92682.60 (    +0.00%)   309770.53 (  +234.22%)
Pages total scanned                    10060649.40 (    +0.00%)  8640624.33 (   -14.11%)
Pages total reclaimed                   1984391.27 (    +0.00%)  1340415.53 (   -32.45%)
Swap out                                 884585.73 (    +0.00%)   417781.93 (   -52.77%)
Swap in                                  287106.27 (    +0.00%)    95589.73 (   -66.71%)
File refaults                            551697.60 (    +0.00%)   426474.80 (   -22.70%)

Link: https://lkml.kernel.org/r/20250416135142.778933-3-hannes@cmpxchg.org
Fixes: a211c6550efc ("mm: page_alloc: defrag_mode kswapd/kcompactd watermarks")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Brendan Jackman <jackmanb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agomm: vmscan: restore high-cpu watermark safety in kswapd
Johannes Weiner [Wed, 16 Apr 2025 13:45:39 +0000 (09:45 -0400)]
mm: vmscan: restore high-cpu watermark safety in kswapd

Vlastimil points out that commit a211c6550efc ("mm: page_alloc:
defrag_mode kswapd/kcompactd watermarks") switched kswapd from
zone_watermark_ok_safe() to the standard, percpu-cached version of reading
free pages, thus dropping the watermark safety precautions for systems
with high CPU counts (e.g.  >212 cpus on 64G).  Restore them.

Since zone_watermark_ok_safe() is no longer the right interface, and this
was the last caller of the function anyway, open-code the
zone_page_state_snapshot() conditional and delete the function.

Link: https://lkml.kernel.org/r/20250416135142.778933-2-hannes@cmpxchg.org
Fixes: a211c6550efc ("mm: page_alloc: defrag_mode kswapd/kcompactd watermarks")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Brendan Jackman <jackmanb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agoMAINTAINERS: add Pedro as reviewer to the MEMORY MAPPING section
Lorenzo Stoakes [Wed, 16 Apr 2025 13:53:01 +0000 (14:53 +0100)]
MAINTAINERS: add Pedro as reviewer to the MEMORY MAPPING section

Pedro has offered to review memory mapping code.  He has good experience
in this area and has provided excellent feedback on memory mapping series
in the past so I feel he'll be a great addition.

Link: https://lkml.kernel.org/r/20250416135301.43513-1-lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Pedro Falcato <pfalcato@suse.de>
Acked-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agomm/memory: move sanity checks in do_wp_page() after mapcount vs. refcount stabilization
David Hildenbrand [Tue, 15 Apr 2025 09:50:07 +0000 (11:50 +0200)]
mm/memory: move sanity checks in do_wp_page() after mapcount vs. refcount stabilization

In __folio_remove_rmap() for RMAP_LEVEL_PMD/RMAP_LEVEL_PUD and with
CONFIG_PAGE_MAPCOUNT we first decrement the folio mapcount (and recompute
mapped shared vs.  mapped exclusively) to then adjust the entire mapcount.

This means that another process might stumble in do_wp_page() over a
PTE-mapped PMD folio that is indicated as "exclusively mapped", but still
has an entire mapcount (PMD mapping), because it is racing with the
process that is unmapping the folio (PMD mapping).  Note that do_wp_page()
will back off once it detects the remaining folio reference from the
process that is in the process of unmapping the folio.

This will trigger the early VM_WARN_ON_ONCE(folio_entire_mapcount(folio))
check in do_wp_page(), that can easily be reproduced by looping a couple
of times over allocating a PMD THP, forking a child where we immediately
unmap it again, and writing in the parent concurrently to the THP.

[  252.738129][T16470] ------------[ cut here ]------------
[  252.739267][T16470] WARNING: CPU: 3 PID: 16470 at mm/memory.c:3738 do_wp_page+0x2a75/0x2c00
[  252.740968][T16470] Modules linked in:
[  252.741958][T16470] CPU: 3 UID: 0 PID: 16470 Comm: ...
...
[  252.765841][T16470]  <TASK>
[  252.766419][T16470]  ? srso_alias_return_thunk+0x5/0xfbef5
[  252.767558][T16470]  ? rcu_is_watching+0x12/0x60
[  252.768525][T16470]  ? srso_alias_return_thunk+0x5/0xfbef5
[  252.769645][T16470]  ? srso_alias_return_thunk+0x5/0xfbef5
[  252.770778][T16470]  ? lock_acquire+0x33/0x80
[  252.771697][T16470]  ? __handle_mm_fault+0x5e8/0x3e40
[  252.772735][T16470]  ? __handle_mm_fault+0x5e8/0x3e40
[  252.773781][T16470]  __handle_mm_fault+0x1869/0x3e40
[  252.774839][T16470]  handle_mm_fault+0x22a/0x640
[  252.775808][T16470]  do_user_addr_fault+0x618/0x1000
[  252.776847][T16470]  exc_page_fault+0x68/0xd0
[  252.777775][T16470]  asm_exc_page_fault+0x26/0x30

While we could adjust the sequence in __folio_remove_rmap(), let's rater
move the mapcount sanity checks after the mapcount vs.  refcount
stabilization phase.  With this fix, a simple reproducer is happy.

While at it, convert the two VM_WARN_ON_ONCE() we are moving to
VM_WARN_ON_ONCE_FOLIO().

Link: https://lkml.kernel.org/r/20250415095007.569836-1-david@redhat.com
Fixes: 1da190f4d0a6 ("mm: Copy-on-Write (COW) reuse support for PTE-mapped THP")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: syzbot+5e8feb543ca8e12e0ede@syzkaller.appspotmail.com
Closes: https://lkml.kernel.org/r/67fab4fe.050a0220.2c5fcf.0011.GAE@google.com
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agomm, hugetlb: increment the number of pages to be reset on HVO
Oscar Salvador [Tue, 15 Apr 2025 11:18:59 +0000 (13:18 +0200)]
mm, hugetlb: increment the number of pages to be reset on HVO

commit 4eeec8c89a0c ("mm: move hugetlb specific things in folio to
page[3]") shifted hugetlb specific stuff, and now mapping overlaps
_hugetlb_cgroup field.

Upon restoring the vmemmap for HVO, only the first two tail pages are
reset, and this causes the check in free_tail_page_prepare() to fail as it
finds an unexpected mapping value in some tails.

Increment the number of pages to be reset to 4 (head + 3 tail pages)

Link: https://lkml.kernel.org/r/20250415111859.376302-1-osalvador@suse.de
Fixes: 4eeec8c89a0c ("mm: move hugetlb specific things in folio to page[3]")
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Suggested-by: David Hildenbrand <david@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agowriteback: fix false warning in inode_to_wb()
Andreas Gruenbacher [Sat, 12 Apr 2025 16:39:12 +0000 (18:39 +0200)]
writeback: fix false warning in inode_to_wb()

inode_to_wb() is used also for filesystems that don't support cgroup
writeback.  For these filesystems inode->i_wb is stable during the
lifetime of the inode (it points to bdi->wb) and there's no need to hold
locks protecting the inode->i_wb dereference.  Improve the warning in
inode_to_wb() to not trigger for these filesystems.

Link: https://lkml.kernel.org/r/20250412163914.3773459-3-agruenba@redhat.com
Fixes: aaa2cacf8184 ("writeback: add lockdep annotation to inode_to_wb()")
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agodocs: ABI: replace mcroce@microsoft.com with new Meta address
Ahmad Fatoum [Mon, 14 Apr 2025 07:35:31 +0000 (09:35 +0200)]
docs: ABI: replace mcroce@microsoft.com with new Meta address

The Microsoft email address is bouncing:

    550 5.4.1 Recipient address rejected: Access denied.

So let's replace it with Matteo's current mail address.

Link: https://lkml.kernel.org/r/20250414-fix-mcroce-mail-bounce-v3-1-0aed2d71f3d7@pengutronix.de
Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Acked-by: Matteo Croce <teknoraver@meta.com>
Link: https://lore.kernel.org/all/BYAPR15MB2504E4B02DFFB1E55871955DA1062@BYAPR15MB2504.namprd15.prod.outlook.com/
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Matteo Croce <teknoraver@meta.com>
Cc: Sascha Hauer <kernel@pengutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agomm/gup: fix wrongly calculated returned value in fault_in_safe_writeable()
Baoquan He [Thu, 10 Apr 2025 03:57:14 +0000 (11:57 +0800)]
mm/gup: fix wrongly calculated returned value in fault_in_safe_writeable()

Not like fault_in_readable() or fault_in_writeable(), in
fault_in_safe_writeable() local variable 'start' is increased page by page
to loop till the whole address range is handled.  However, it mistakenly
calculates the size of the handled range with 'uaddr - start'.

Fix it here.

Andreas said:

: In gfs2, fault_in_iov_iter_writeable() is used in
: gfs2_file_direct_read() and gfs2_file_read_iter(), so this potentially
: affects buffered as well as direct reads.  This bug could cause those
: gfs2 functions to spin in a loop.

Link: https://lkml.kernel.org/r/20250410035717.473207-1-bhe@redhat.com
Link: https://lkml.kernel.org/r/20250410035717.473207-2-bhe@redhat.com
Signed-off-by: Baoquan He <bhe@redhat.com>
Fixes: fe673d3f5bf1 ("mm: gup: make fault_in_safe_writeable() use fixup_user_fault()")
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Yanjun.Zhu <yanjun.zhu@linux.dev>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agoMAINTAINERS: add memory advice section
Lorenzo Stoakes [Fri, 11 Apr 2025 07:27:24 +0000 (08:27 +0100)]
MAINTAINERS: add memory advice section

The madvise code straddles both VMA and page table manipulation.  As a
result, separate it out into its own section and add maintainers/reviewers
as appropriate.

We additionally include the mman-common.h file as this contains the shared
madvise flags and it is important we maintain this alongside madvise.c.

Link: https://lkml.kernel.org/r/20250411072724.10841-1-lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Jann Horn <jannh@google.com>
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agoMAINTAINERS: add mmap trace events to MEMORY MAPPING
Liam R. Howlett [Fri, 11 Apr 2025 17:33:28 +0000 (13:33 -0400)]
MAINTAINERS: add mmap trace events to MEMORY MAPPING

MEMORY MAPPING does not list the mmap.h trace point file, but does list
the mmap.c file.  Couple the trace points with the users and authors of
the trace points for notifications of updates.

Link: https://lkml.kernel.org/r/20250411173328.8172-1-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Acked-by: SeongJae Park <sj@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agomm: memcontrol: fix swap counter leak from offline cgroup
Muchun Song [Thu, 10 Apr 2025 08:18:12 +0000 (16:18 +0800)]
mm: memcontrol: fix swap counter leak from offline cgroup

commit 73f839b6d2ed addressed an issue regarding the swap counter leak
that occurred from an offline cgroup.  However, commit 89ce924f0bd4
modified the parameter from @swap_memcg to @memcg (presumably this
alteration was introduced while resolving conflicts).  Fix this problem by
reverting this minor change.

Link: https://lkml.kernel.org/r/20250410081812.10073-1-songmuchun@bytedance.com
Fixes: 89ce924f0bd4 ("mm: memcontrol: move memsw charge callbacks to v1")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agoMAINTAINERS: add MM subsection for the page allocator
Vlastimil Babka [Thu, 10 Apr 2025 09:00:23 +0000 (11:00 +0200)]
MAINTAINERS: add MM subsection for the page allocator

Add a subsection for the page allocator, including compaction as it's
crucial for high-order allocations and works together with the
anti-fragmentation features.  Add reviewers (including myself) who
voluteered.

Link: https://lkml.kernel.org/r/20250410090021.72296-4-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Zi Yan <ziy@nvidia.com>
Acked-by: Brendan Jackman <jackmanb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Christoph Lameter (Ampere) <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Harry Yoo <harry.yoo@oracle.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agoMAINTAINERS: update SLAB ALLOCATOR maintainers
Vlastimil Babka [Thu, 10 Apr 2025 09:00:22 +0000 (11:00 +0200)]
MAINTAINERS: update SLAB ALLOCATOR maintainers

With permission, reduce the number of maintainers.  Create a CREDITS entry
for Joonsoo (Pekka already has one).  Thanks for all the work!

Link: https://lkml.kernel.org/r/20250410090021.72296-3-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Harry Yoo <harry.yoo@oracle.com>
Acked-by: Christoph Lameter (Ampere) <cl@linux.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agofs/dax: fix folio splitting issue by resetting old folio order + _nr_pages
David Hildenbrand [Thu, 10 Apr 2025 09:10:20 +0000 (11:10 +0200)]
fs/dax: fix folio splitting issue by resetting old folio order + _nr_pages

Alison reports an issue with fsdax when large extends end up using large
ZONE_DEVICE folios:

[  417.796271] BUG: kernel NULL pointer dereference, address: 0000000000000b00
[  417.796982] #PF: supervisor read access in kernel mode
[  417.797540] #PF: error_code(0x0000) - not-present page
[  417.798123] PGD 2a5c5067 P4D 2a5c5067 PUD 2a5c6067 PMD 0
[  417.798690] Oops: Oops: 0000 [#1] SMP NOPTI
[  417.799178] CPU: 5 UID: 0 PID: 1515 Comm: mmap Tainted: ...
[  417.800150] Tainted: [O]=OOT_MODULE
[  417.800583] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
[  417.801358] RIP: 0010:__lruvec_stat_mod_folio+0x7e/0x250
[  417.801948] Code: ...
[  417.803662] RSP: 0000:ffffc90002be3a08 EFLAGS: 00010206
[  417.804234] RAX: 0000000000000000 RBX: 0000000000000200 RCX: 0000000000000002
[  417.804984] RDX: ffffffff815652d7 RSI: 0000000000000000 RDI: ffffffff82a2beae
[  417.805689] RBP: ffffc90002be3a28 R08: 0000000000000000 R09: 0000000000000000
[  417.806384] R10: ffffea0007000040 R11: ffff888376ffe000 R12: 0000000000000001
[  417.807099] R13: 0000000000000012 R14: ffff88807fe4ab40 R15: ffff888029210580
[  417.807801] FS:  00007f339fa7a740(0000) GS:ffff8881fa9b9000(0000) knlGS:0000000000000000
[  417.808570] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  417.809193] CR2: 0000000000000b00 CR3: 000000002a4f0004 CR4: 0000000000370ef0
[  417.809925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  417.810622] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  417.811353] Call Trace:
[  417.811709]  <TASK>
[  417.812038]  folio_add_file_rmap_ptes+0x143/0x230
[  417.812566]  insert_page_into_pte_locked+0x1ee/0x3c0
[  417.813132]  insert_page+0x78/0xf0
[  417.813558]  vmf_insert_page_mkwrite+0x55/0xa0
[  417.814088]  dax_fault_iter+0x484/0x7b0
[  417.814542]  dax_iomap_pte_fault+0x1ca/0x620
[  417.815055]  dax_iomap_fault+0x39/0x40
[  417.815499]  __xfs_write_fault+0x139/0x380
[  417.815995]  ? __handle_mm_fault+0x5e5/0x1a60
[  417.816483]  xfs_write_fault+0x41/0x50
[  417.816966]  xfs_filemap_fault+0x3b/0xe0
[  417.817424]  __do_fault+0x31/0x180
[  417.817859]  __handle_mm_fault+0xee1/0x1a60
[  417.818325]  ? debug_smp_processor_id+0x17/0x20
[  417.818844]  handle_mm_fault+0xe1/0x2b0
[...]

The issue is that when we split a large ZONE_DEVICE folio to order-0 ones,
we don't reset the order/_nr_pages.  As folio->_nr_pages overlays
page[1]->memcg_data, once page[1] is a folio, it suddenly looks like it
has folio->memcg_data set.  And we never manually initialize
folio->memcg_data in fsdax code, because we never expect it to be set at
all.

When __lruvec_stat_mod_folio() then stumbles over such a folio, it tries
to use folio->memcg_data (because it's non-NULL) but it does not actually
point at a memcg, resulting in the problem.

Alison also observed that these folios sometimes have "locked" set, which
is rather concerning (folios locked from the beginning ...).  The reason
is that the order for large folios is stored in page[1]->flags, which
become the folio->flags of a new small folio.

Let's fix it by adding a folio helper to clear order/_nr_pages for
splitting purposes.

Maybe we should reinitialize other large folio flags / folio members as
well when splitting, because they might similarly cause harm once page[1]
becomes a folio?  At least other flags in PAGE_FLAGS_SECOND should not be
set for fsdax, so at least page[1]->flags might be as expected with this
fix.

From a quick glimpse, initializing ->mapping, ->pgmap and ->share should
re-initialize most things from a previous page[1] used by large folios
that fsdax cares about.  For example folio->private might not get
reinitialized, but maybe that's not relevant -- no traces of it's use in
fsdax code.  Needs a closer look.

Another thing that should be considered in the future is performing
similar checks as we perform in free_tail_page_prepare()
-- checking pincount etc.
-- when freeing a large fsdax folio.

Link: https://lkml.kernel.org/r/20250410091020.119116-1-david@redhat.com
Fixes: 4996fc547f5b ("mm: let _folio_nr_pages overlay memcg_data in first tail page")
Fixes: 38607c62b34b ("fs/dax: properly refcount fs dax pages")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: Alison Schofield <alison.schofield@intel.com>
Closes: https://lkml.kernel.org/r/Z_W9Oeg-D9FhImf3@aschofie-mobl2.lan
Tested-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Tested-by: "Darrick J. Wong" <djwong@kernel.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agomm/page_alloc: fix deadlock on cpu_hotplug_lock in __accept_page()
Kirill A. Shutemov [Sat, 29 Mar 2025 17:10:29 +0000 (19:10 +0200)]
mm/page_alloc: fix deadlock on cpu_hotplug_lock in __accept_page()

When the last page in the zone is accepted, __accept_page() calls
static_branch_dec().  This function takes cpu_hotplug_lock, which can lead
to a deadlock if the allocation occurs during CPU bringup path as
_cpu_up() also takes the lock.

To prevent this deadlock, defer static_branch_dec() to a workqueue.

Call static_branch_dec() only when the workqueue is not yet initialized.
Workqueues are initialized before CPU bring up, so this will not conflict
with the first scenario.

Link: https://lkml.kernel.org/r/20250329171030.3942298-1-kirill.shutemov@linux.intel.com
Fixes: 55ad43e8ba0f ("mm: add a helper to accept page")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Srikanth Aithal <sraithal@amd.com>
Tested-by: Srikanth Aithal <sraithal@amd.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Ashish Kalra <ashish.kalra@amd.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: "Mike Rapoport (IBM)" <rppt@kernel.org>
Cc: Thomas Lendacky <thomas.lendacky@amd.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agotracing: Fix filter string testing
Steven Rostedt [Thu, 17 Apr 2025 22:30:03 +0000 (18:30 -0400)]
tracing: Fix filter string testing

The filter string testing uses strncpy_from_kernel/user_nofault() to
retrieve the string to test the filter against. The if() statement was
incorrect as it considered 0 as a fault, when it is only negative that it
faulted.

Running the following commands:

  # cd /sys/kernel/tracing
  # echo "filename.ustring ~ \"/proc*\"" > events/syscalls/sys_enter_openat/filter
  # echo 1 > events/syscalls/sys_enter_openat/enable
  # ls /proc/$$/maps
  # cat trace

Would produce nothing, but with the fix it will produce something like:

      ls-1192    [007] .....  8169.828333: sys_openat(dfd: ffffffffffffff9c, filename: 7efc18359904, flags: 80000, mode: 0)

Link: https://lore.kernel.org/all/CAEf4BzbVPQ=BjWztmEwBPRKHUwNfKBkS3kce-Rzka6zvbQeVpg@mail.gmail.com/
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/20250417183003.505835fb@gandalf.local.home
Fixes: 77360f9bbc7e5 ("tracing: Add test for user space strings when filtering on string pointers")
Reported-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Reported-by: Mykyta Yatsenko <mykyta.yatsenko5@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2 months agodrm/xe/pxp: do not queue unneeded terminations from debugfs
Daniele Ceraolo Spurio [Wed, 16 Apr 2025 20:16:22 +0000 (13:16 -0700)]
drm/xe/pxp: do not queue unneeded terminations from debugfs

The PXP terminate debugfs currently unconditionally simulates a
termination, no matter what the HW status is. This is unneeded if PXP is
not in use and can cause errors if the HW init hasn't completed yet.
To solve these issues, we can simply limit the terminations to the cases
where PXP is fully initialized and in use.

v2: s/pxp_status/ready/ to avoid confusion with pxp->status (John)

Fixes: 385a8015b214 ("drm/xe/pxp: Add PXP debugfs support")
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4749
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://lore.kernel.org/r/20250416201622.1295369-1-daniele.ceraolospurio@intel.com
(cherry picked from commit ba1f62a0cac84757ca35f4217e3cd3a2654233ae)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2 months agodrm/xe/dma_buf: stop relying on placement in unmap
Matthew Auld [Thu, 10 Apr 2025 16:27:17 +0000 (17:27 +0100)]
drm/xe/dma_buf: stop relying on placement in unmap

The is_vram() is checking the current placement, however if we consider
exported VRAM with dynamic dma-buf, it looks possible for the xe driver
to async evict the memory, notifying the importer, however importer does
not have to call unmap_attachment() immediately, but rather just as
"soon as possible", like when the dma-resv idles. Following from this we
would then pipeline the move, attaching the fence to the manager, and
then update the current placement. But when the unmap_attachment() runs
at some later point we might see that is_vram() is now false, and take
the complete wrong path when dma-unmapping the sg, leading to
explosions.

To fix this check if the sgl was mapping a struct page.

v2:
  - The attachment can be mapped multiple times it seems, so we can't
    really rely on encoding something in the attachment->priv. Instead
    see if the page_link has an encoded struct page. For vram we expect
    this to be NULL.

Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4563
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: <stable@vger.kernel.org> # v6.8+
Acked-by: Christian König <christian.koenig@amd.com>
Link: https://lore.kernel.org/r/20250410162716.159403-2-matthew.auld@intel.com
(cherry picked from commit d755887f8e5a2a18e15e6632a5193e5feea18499)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2 months agodrm/xe/userptr: fix notifier vs folio deadlock
Matthew Auld [Mon, 14 Apr 2025 13:25:40 +0000 (14:25 +0100)]
drm/xe/userptr: fix notifier vs folio deadlock

User is reporting what smells like notifier vs folio deadlock, where
migrate_pages_batch() on core kernel side is holding folio lock(s) and
then interacting with the mappings of it, however those mappings are
tied to some userptr, which means calling into the notifier callback and
grabbing the notifier lock. With perfect timing it looks possible that
the pages we pulled from the hmm fault can get sniped by
migrate_pages_batch() at the same time that we are holding the notifier
lock to mark the pages as accessed/dirty, but at this point we also want
to grab the folio locks(s) to mark them as dirty, but if they are
contended from notifier/migrate_pages_batch side then we deadlock since
folio lock won't be dropped until we drop the notifier lock.

Fortunately the mark_page_accessed/dirty is not really needed in the
first place it seems and should have already been done by hmm fault, so
just remove it.

Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4765
Fixes: 0a98219bcc96 ("drm/xe/hmm: Don't dereference struct page pointers without notifier lock")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: <stable@vger.kernel.org> # v6.10+
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250414132539.26654-2-matthew.auld@intel.com
(cherry picked from commit bd7c0cb695e87c0e43247be8196b4919edbe0e85)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2 months agodrm/xe: Set LRC addresses before guc load
Lucas De Marchi [Thu, 10 Apr 2025 04:59:34 +0000 (21:59 -0700)]
drm/xe: Set LRC addresses before guc load

The metadata saved in the ADS is read by GuC when it's initialized.
Saving the addresses to the LRCs when they are populated is too late as
GuC will keep using the old ones.

This was causing GuC to use the RCS LRC for any engine class. It's not a
big problem on a Linux-only scenario since the they are used by GuC only
on media engines when the watchdog is triggered. However, in a
virtualization scenario with Windows as the VF, it causes the wrong LRCs
to be loaded as the watchdog is used for all engines.

Fix it by letting guc_golden_lrc_init() initialize the metadata, like
other *_init() functions, and later guc_golden_lrc_populate() to copy
the LRCs to the right places. The former is called before the second GuC
load, while the latter is called after LRCs have been recorded.

Cc: Chee Yin Wong <chee.yin.wong@intel.com>
Cc: John Harrison <john.c.harrison@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: <stable@vger.kernel.org> # v6.11+
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Tested-by: Chee Yin Wong <chee.yin.wong@intel.com>
Link: https://lore.kernel.org/r/20250409-fix-guc-ads-v1-1-494135f7a5d0@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit c31a0b6402d15b530514eee9925adfcb8cfbb1c9)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2 months agoMerge tag 'pci-v6.15-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Linus Torvalds [Thu, 17 Apr 2025 23:00:31 +0000 (16:00 -0700)]
Merge tag 'pci-v6.15-fixes-2' of git://git./linux/kernel/git/pci/pci

Pull pci fix from Bjorn Helgaas:

 - Revert a reset patch that broke VFIO passthrough because devices
   ended up with no available reset mechanisms (Alex Williamson)

* tag 'pci-v6.15-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
  Revert "PCI: Avoid reset when disabled via sysfs"

2 months agoMerge tag 'drm-misc-fixes-2025-04-17' of https://gitlab.freedesktop.org/drm/misc...
Dave Airlie [Thu, 17 Apr 2025 22:38:26 +0000 (08:38 +1000)]
Merge tag 'drm-misc-fixes-2025-04-17' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes

Short summary of fixes pull:

dma-buf:
- Correctly decrement refcounter on errors

gem:
- Fix test for imported buffers

ivpu:
- Fix debugging
- Fixes to frequency
- Support firmware API 3.28.3
- Flush jobs upon reset

mgag200:
- Set vblank start to correct values

v3d:
- Fix Indirect Dispatch

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://lore.kernel.org/r/20250417084043.GA365738@linux.fritz.box
2 months agoMerge tag 'drm-intel-fixes-2025-04-17' of https://gitlab.freedesktop.org/drm/i915...
Dave Airlie [Thu, 17 Apr 2025 22:37:59 +0000 (08:37 +1000)]
Merge tag 'drm-intel-fixes-2025-04-17' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes

drm/i915 fixes for v6.15-rc3:
- Fix DP DSC configurations that require 3 DSC engines per pipe

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://lore.kernel.org/r/87fri7p8tp.fsf@intel.com