Steven Price [Tue, 4 Feb 2020 01:35:50 +0000 (17:35 -0800)]
mm: pagewalk: allow walking without vma
Since
48684a65b4e3: "mm: pagewalk: fix misbehavior of walk_page_range for
vma(VM_PFNMAP)", page_table_walk() will report any kernel area as a hole,
because it lacks a vma.
This means each arch has re-implemented page table walking when needed,
for example in the per-arch ptdump walker.
Remove the requirement to have a vma in the generic code and add a new
function walk_page_range_novma() which ignores the VMAs and simply walks
the page tables.
Link: http://lkml.kernel.org/r/20191218162402.45610-13-steven.price@arm.com
Signed-off-by: Steven Price <steven.price@arm.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: "Liang, Kan" <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zong Li <zong.li@sifive.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Steven Price [Tue, 4 Feb 2020 01:35:45 +0000 (17:35 -0800)]
mm: pagewalk: add p4d_entry() and pgd_entry()
pgd_entry() and pud_entry() were removed by commit
0b1fbfe50006c410
("mm/pagewalk: remove pgd_entry() and pud_entry()") because there were no
users. We're about to add users so reintroduce them, along with
p4d_entry() as we now have 5 levels of tables.
Note that commit
a00cc7d9dd93d66a ("mm, x86: add support for PUD-sized
transparent hugepages") already re-added pud_entry() but with different
semantics to the other callbacks. This commit reverts the semantics back
to match the other callbacks.
To support hmm.c which now uses the new semantics of pud_entry() a new
member ('action') of struct mm_walk is added which allows the callbacks to
either descend (ACTION_SUBTREE, the default), skip (ACTION_CONTINUE) or
repeat the callback (ACTION_AGAIN). hmm.c is then updated to call
pud_trans_huge_lock() itself and make use of the splitting/retry logic of
the core code.
After this change pud_entry() is called for all entries, not just
transparent huge pages.
[arnd@arndb.de: fix unused variable warning]
Link: http://lkml.kernel.org/r/20200107204607.1533842-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/20191218162402.45610-12-steven.price@arm.com
Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: "Liang, Kan" <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zong Li <zong.li@sifive.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Steven Price [Tue, 4 Feb 2020 01:35:41 +0000 (17:35 -0800)]
x86: mm: add p?d_leaf() definitions
walk_page_range() is going to be allowed to walk page tables other than
those of user space. For this it needs to know when it has reached a
'leaf' entry in the page tables. This information is provided by the
p?d_leaf() functions/macros.
For x86 we already have p?d_large() functions, so simply add macros to
provide the generic p?d_leaf() names for the generic code.
Link: http://lkml.kernel.org/r/20191218162402.45610-11-steven.price@arm.com
Signed-off-by: Steven Price <steven.price@arm.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: "Liang, Kan" <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zong Li <zong.li@sifive.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Steven Price [Tue, 4 Feb 2020 01:35:36 +0000 (17:35 -0800)]
sparc: mm: add p?d_leaf() definitions
walk_page_range() is going to be allowed to walk page tables other than
those of user space. For this it needs to know when it has reached a
'leaf' entry in the page tables. This information is provided by the
p?d_leaf() functions/macros.
For sparc 64 bit, pmd_large() and pud_large() are already provided, so add
macros to provide the p?d_leaf names required by the generic code.
Link: http://lkml.kernel.org/r/20191218162402.45610-10-steven.price@arm.com
Signed-off-by: Steven Price <steven.price@arm.com>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: "Liang, Kan" <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zong Li <zong.li@sifive.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Steven Price [Tue, 4 Feb 2020 01:35:32 +0000 (17:35 -0800)]
s390: mm: add p?d_leaf() definitions
walk_page_range() is going to be allowed to walk page tables other than
those of user space. For this it needs to know when it has reached a
'leaf' entry in the page tables. This information is provided by the
p?d_leaf() functions/macros.
For s390, pud_large() and pmd_large() are already implemented as static
inline functions. Add a macro to provide the p?d_leaf names for the
generic code to use.
Link: http://lkml.kernel.org/r/20191218162402.45610-9-steven.price@arm.com
Signed-off-by: Steven Price <steven.price@arm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: "Liang, Kan" <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zong Li <zong.li@sifive.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Steven Price [Tue, 4 Feb 2020 01:35:28 +0000 (17:35 -0800)]
riscv: mm: add p?d_leaf() definitions
walk_page_range() is going to be allowed to walk page tables other than
those of user space. For this it needs to know when it has reached a
'leaf' entry in the page tables. This information is provided by the
p?d_leaf() functions/macros.
For riscv a page is a leaf page when it has a read, write or execute bit
set on it.
Link: http://lkml.kernel.org/r/20191218162402.45610-8-steven.price@arm.com
Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Alexandre Ghiti <alex@ghiti.fr>
Reviewed-by: Zong Li <zong.li@sifive.com>
Acked-by: Paul Walmsley <paul.walmsley@sifive.com> [arch/riscv]
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: "Liang, Kan" <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Steven Price [Tue, 4 Feb 2020 01:35:24 +0000 (17:35 -0800)]
powerpc: mm: add p?d_leaf() definitions
walk_page_range() is going to be allowed to walk page tables other than
those of user space. For this it needs to know when it has reached a
'leaf' entry in the page tables. This information is provided by the
p?d_leaf() functions/macros.
For powerpc p?d_is_leaf() functions already exist. Export them using the
new p?d_leaf() name.
Link: http://lkml.kernel.org/r/20191218162402.45610-7-steven.price@arm.com
Signed-off-by: Steven Price <steven.price@arm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: "Liang, Kan" <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zong Li <zong.li@sifive.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Steven Price [Tue, 4 Feb 2020 01:35:19 +0000 (17:35 -0800)]
mips: mm: add p?d_leaf() definitions
walk_page_range() is going to be allowed to walk page tables other than
those of user space. For this it needs to know when it has reached a
'leaf' entry in the page tables. This information is provided by the
p?d_leaf() functions/macros.
If _PAGE_HUGE is defined we can simply look for it. When not defined we
can be confident that there are no leaf pages in existence and fall back
on the generic implementation (added in a later patch) which returns 0.
Link: http://lkml.kernel.org/r/20191218162402.45610-6-steven.price@arm.com
Signed-off-by: Steven Price <steven.price@arm.com>
Acked-by: Paul Burton <paul.burton@mips.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paul.burton@mips.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morse <james.morse@arm.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: "Liang, Kan" <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zong Li <zong.li@sifive.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Steven Price [Tue, 4 Feb 2020 01:35:14 +0000 (17:35 -0800)]
arm64: mm: add p?d_leaf() definitions
walk_page_range() is going to be allowed to walk page tables other than
those of user space. For this it needs to know when it has reached a
'leaf' entry in the page tables. This information will be provided by the
p?d_leaf() functions/macros.
For arm64, we already have p?d_sect() macros which we can reuse for
p?d_leaf().
pud_sect() is defined as a dummy function when CONFIG_PGTABLE_LEVELS < 3
or CONFIG_ARM64_64K_PAGES is defined. However when the kernel is
configured this way then architecturally it isn't allowed to have a large
page at this level, and any code using these page walking macros is
implicitly relying on the page size/number of levels being the same as the
kernel. So it is safe to reuse this for p?d_leaf() as it is an
architectural restriction.
Link: http://lkml.kernel.org/r/20191218162402.45610-5-steven.price@arm.com
Signed-off-by: Steven Price <steven.price@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: "Liang, Kan" <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Zong Li <zong.li@sifive.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Steven Price [Tue, 4 Feb 2020 01:35:10 +0000 (17:35 -0800)]
arm: mm: add p?d_leaf() definitions
walk_page_range() is going to be allowed to walk page tables other than
those of user space. For this it needs to know when it has reached a
'leaf' entry in the page tables. This information is provided by the
p?d_leaf() functions/macros.
For arm pmd_large() already exists and does what we want. So simply
provide the generic pmd_leaf() name.
Link: http://lkml.kernel.org/r/20191218162402.45610-4-steven.price@arm.com
Signed-off-by: Steven Price <steven.price@arm.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: "Liang, Kan" <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zong Li <zong.li@sifive.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Steven Price [Tue, 4 Feb 2020 01:35:06 +0000 (17:35 -0800)]
arc: mm: add p?d_leaf() definitions
walk_page_range() is going to be allowed to walk page tables other than
those of user space. For this it needs to know when it has reached a
'leaf' entry in the page tables. This information will be provided by the
p?d_leaf() functions/macros.
For arc, we only have two levels, so only pmd_leaf() is needed.
Link: http://lkml.kernel.org/r/20191218162402.45610-3-steven.price@arm.com
Signed-off-by: Steven Price <steven.price@arm.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: "Liang, Kan" <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zong Li <zong.li@sifive.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Steven Price [Tue, 4 Feb 2020 01:35:01 +0000 (17:35 -0800)]
mm: add generic p?d_leaf() macros
Patch series "Generic page walk and ptdump", v17.
Many architectures current have a debugfs file for dumping the kernel page
tables. Currently each architecture has to implement custom functions for
this because the details of walking the page tables used by the kernel are
different between architectures.
This series extends the capabilities of walk_page_range() so that it can
deal with the page tables of the kernel (which have no VMAs and can
contain larger huge pages than exist for user space). A generic PTDUMP
implementation is the implemented making use of the new functionality of
walk_page_range() and finally arm64 and x86 are switch to using it,
removing the custom table walkers.
To enable a generic page table walker to walk the unusual mappings of the
kernel we need to implement a set of functions which let us know when the
walker has reached the leaf entry. After a suggestion from Will Deacon
I've chosen the name p?d_leaf() as this (hopefully) describes the purpose
(and is a new name so has no historic baggage). Some architectures have
p?d_large macros but this is easily confused with "large pages".
This series ends with a generic PTDUMP implemention for arm64 and x86.
Mostly this is a clean up and there should be very little functional
change. The exceptions are:
* arm64 PTDUMP debugfs now displays pages which aren't present (patch 22).
* arm64 has the ability to efficiently process KASAN pages (which
previously only x86 implemented). This means that the combination of
KASAN and DEBUG_WX is now useable.
This patch (of 23):
Exposing the pud/pgd levels of the page tables to walk_page_range() means
we may come across the exotic large mappings that come with large areas of
contiguous memory (such as the kernel's linear map).
For architectures that don't provide all p?d_leaf() macros, provide
generic do nothing default that are suitable where there cannot be leaf
pages at that level. Futher patches will add implementations for
individual architectures.
The name p?d_leaf() is chosen to minimize the confusion with existing uses
of "large" pages and "huge" pages which do not necessary mean that the
entry is a leaf (for example it may be a set of contiguous entries that
only take 1 TLB slot). For the purpose of walking the page tables we
don't need to know how it will be represented in the TLB, but we do need
to know for sure if it is a leaf of the tree.
Link: http://lkml.kernel.org/r/20191218162402.45610-2-steven.price@arm.com
Signed-off-by: Steven Price <steven.price@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morse <james.morse@arm.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Liang, Kan" <kan.liang@linux.intel.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Zong Li <zong.li@sifive.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Florian Westphal [Tue, 4 Feb 2020 01:34:58 +0000 (17:34 -0800)]
mm: remove __krealloc
Since 5.5-rc1 the last user of this function is gone, so remove the
functionality.
See commit
2ad9d7747c10 ("netfilter: conntrack: free extension area immediately")
for details.
Link: http://lkml.kernel.org/r/20191212223442.22141-1-fw@strlen.de
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Randy Dunlap [Tue, 4 Feb 2020 01:34:55 +0000 (17:34 -0800)]
pinctrl: fix pxa2xx.c build warnings
Add #include of <linux/pinctrl/machine.h> to fix build
warnings in pinctrl-pxa2xx.c. Fixes these warnings:
In file included from ../drivers/pinctrl/pxa/pinctrl-pxa2xx.c:24:0:
../drivers/pinctrl/pxa/../pinctrl-utils.h:36:8: warning: `enum pinctrl_map_type' declared inside parameter list [enabled by default]
enum pinctrl_map_type type);
^
../drivers/pinctrl/pxa/../pinctrl-utils.h:36:8: warning: its scope is only this definition or declaration, which is probably not what you want [enabled by default]
Link: http://lkml.kernel.org/r/0024542e-cba9-8f13-6c18-32d0050a6007@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Tue, 4 Feb 2020 01:34:52 +0000 (17:34 -0800)]
drivers/block/null_blk_main.c: fix uninitialized var warnings
With gcc-7.2, many instances of
drivers/block/null_blk_main.c: In function ‘nullb_device_zone_nr_conv_store’:
drivers/block/null_blk_main.c:291:12: warning: ‘new_value’ may be used uninitialized in this function [-Wmaybe-uninitialized]
dev->NAME = new_value; \
^
drivers/block/null_blk_main.c:279:7: note: ‘new_value’ was declared here
TYPE new_value; \
^
Presumably notabug, so use uninitialized_var() to suppress them.
Cc: Shaohua Li <shli@fb.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Tue, 4 Feb 2020 01:34:49 +0000 (17:34 -0800)]
drivers/block/null_blk_main.c: fix layout
Each line here overflows 80 cols by exactly one character. Delete one tab
per line to fix.
Cc: Shaohua Li <shli@fb.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lu Shuaibing [Tue, 4 Feb 2020 01:34:46 +0000 (17:34 -0800)]
ipc/msg.c: consolidate all xxxctl_down() functions
A use of uninitialized memory in msgctl_down() because msqid64 in
ksys_msgctl hasn't been initialized. The local | msqid64 | is created in
ksys_msgctl() and then passed into msgctl_down(). Along the way msqid64
is never initialized before msgctl_down() checks msqid64->msg_qbytes.
KUMSAN(KernelUninitializedMemorySantizer, a new error detection tool)
reports:
==================================================================
BUG: KUMSAN: use of uninitialized memory in msgctl_down+0x94/0x300
Read of size 8 at addr
ffff88806bb97eb8 by task syz-executor707/2022
CPU: 0 PID: 2022 Comm: syz-executor707 Not tainted 5.2.0-rc4+ #63
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
Call Trace:
dump_stack+0x75/0xae
__kumsan_report+0x17c/0x3e6
kumsan_report+0xe/0x20
msgctl_down+0x94/0x300
ksys_msgctl.constprop.14+0xef/0x260
do_syscall_64+0x7e/0x1f0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x4400e9
Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:
00007ffd869e0598 EFLAGS:
00000246 ORIG_RAX:
0000000000000047
RAX:
ffffffffffffffda RBX:
00000000004002c8 RCX:
00000000004400e9
RDX:
0000000000000000 RSI:
0000000000000000 RDI:
0000000000000000
RBP:
00000000006ca018 R08:
0000000000000000 R09:
0000000000000000
R10:
00000000ffffffff R11:
0000000000000246 R12:
0000000000401970
R13:
0000000000401a00 R14:
0000000000000000 R15:
0000000000000000
The buggy address belongs to the page:
page:
ffffea0001aee5c0 refcount:0 mapcount:0 mapping:
0000000000000000 index:0x0
flags: 0x100000000000000()
raw:
0100000000000000 0000000000000000 ffffffff01ae0101 0000000000000000
raw:
0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: kumsan: bad access detected
==================================================================
Syzkaller reproducer:
msgctl$IPC_RMID(0x0, 0x0)
C reproducer:
// autogenerated by syzkaller (https://github.com/google/syzkaller)
int main(void)
{
syscall(__NR_mmap, 0x20000000, 0x1000000, 3, 0x32, -1, 0);
syscall(__NR_msgctl, 0, 0, 0);
return 0;
}
[natechancellor@gmail.com: adjust indentation in ksys_msgctl]
Link: https://github.com/ClangBuiltLinux/linux/issues/829
Link: http://lkml.kernel.org/r/20191218032932.37479-1-natechancellor@gmail.com
Link: http://lkml.kernel.org/r/20190613014044.24234-1-shuaibinglu@126.com
Signed-off-by: Lu Shuaibing <shuaibinglu@126.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: NeilBrown <neilb@suse.com>
From: Andrew Morton <akpm@linux-foundation.org>
Subject: drivers/block/null_blk_main.c: fix layout
Each line here overflows 80 cols by exactly one character. Delete one tab
per line to fix.
Cc: Shaohua Li <shli@fb.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Manfred Spraul [Tue, 4 Feb 2020 01:34:42 +0000 (17:34 -0800)]
ipc/sem.c: document and update memory barriers
Document and update the memory barriers in ipc/sem.c:
- Add smp_store_release() to wake_up_sem_queue_prepare() and
document why it is needed.
- Read q->status using READ_ONCE+smp_acquire__after_ctrl_dep().
as the pair for the barrier inside wake_up_sem_queue_prepare().
- Add comments to all barriers, and mention the rules in the block
regarding locking.
- Switch to using wake_q_add_safe().
Link: http://lkml.kernel.org/r/20191020123305.14715-6-manfred@colorfullife.com
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: <1vier1@web.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Manfred Spraul [Tue, 4 Feb 2020 01:34:39 +0000 (17:34 -0800)]
ipc/msg.c: update and document memory barriers
Transfer findings from ipc/mqueue.c:
- A control barrier was missing for the lockless receive case So in
theory, not yet initialized data may have been copied to user space -
obviously only for architectures where control barriers are not NOP.
- use smp_store_release(). In theory, the refount may have been
decreased to 0 already when wake_q_add() tries to get a reference.
Link: http://lkml.kernel.org/r/20191020123305.14715-5-manfred@colorfullife.com
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: <1vier1@web.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Manfred Spraul [Tue, 4 Feb 2020 01:34:36 +0000 (17:34 -0800)]
ipc/mqueue.c: update/document memory barriers
Update and document memory barriers for mqueue.c:
- ewp->state is read without any locks, thus READ_ONCE is required.
- add smp_aquire__after_ctrl_dep() after the READ_ONCE, we need
acquire semantics if the value is STATE_READY.
- use wake_q_add_safe()
- document why __set_current_state() may be used:
Reading task->state cannot happen before the wake_q_add() call,
which happens while holding info->lock. Thus the spin_unlock()
is the RELEASE, and the spin_lock() is the ACQUIRE.
For completeness: there is also a 3 CPU scenario, if the to be woken
up task is already on another wake_q.
Then:
- CPU1: spin_unlock() of the task that goes to sleep is the RELEASE
- CPU2: the spin_lock() of the waker is the ACQUIRE
- CPU2: smp_mb__before_atomic inside wake_q_add() is the RELEASE
- CPU3: smp_mb__after_spinlock() inside try_to_wake_up() is the ACQUIRE
Link: http://lkml.kernel.org/r/20191020123305.14715-4-manfred@colorfullife.com
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Waiman Long <longman@redhat.com>
Cc: <1vier1@web.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Davidlohr Bueso [Tue, 4 Feb 2020 01:34:32 +0000 (17:34 -0800)]
ipc/mqueue.c: remove duplicated code
pipelined_send() and pipelined_receive() are identical, so merge them.
[manfred@colorfullife.com: add changelog]
Link: http://lkml.kernel.org/r/20191020123305.14715-3-manfred@colorfullife.com
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: <1vier1@web.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Manfred Spraul [Tue, 4 Feb 2020 01:34:29 +0000 (17:34 -0800)]
smp_mb__{before,after}_atomic(): update Documentation
When adding the _{acquire|release|relaxed}() variants of some atomic
operations, it was forgotten to update Documentation/memory_barrier.txt:
smp_mb__{before,after}_atomic() is now intended for all RMW operations
that do not imply a memory barrier.
1)
smp_mb__before_atomic();
atomic_add();
2)
smp_mb__before_atomic();
atomic_xchg_relaxed();
3)
smp_mb__before_atomic();
atomic_fetch_add_relaxed();
Invalid would be:
smp_mb__before_atomic();
atomic_set();
In addition, the patch splits the long sentence into multiple shorter
sentences.
Link: http://lkml.kernel.org/r/20191020123305.14715-2-manfred@colorfullife.com
Fixes:
654672d4ba1a ("locking/atomics: Add _{acquire|release|relaxed}() variants of some atomic operations")
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Acked-by: Waiman Long <longman@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <1vier1@web.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Hildenbrand [Tue, 4 Feb 2020 01:34:26 +0000 (17:34 -0800)]
mm/memory_hotplug: drop valid_start/valid_end from test_pages_in_a_zone()
The callers are only interested in the actual zone, they don't care about
boundaries. Return the zone instead to simplify.
Link: http://lkml.kernel.org/r/20200110183308.11849-1-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Hildenbrand [Tue, 4 Feb 2020 01:34:23 +0000 (17:34 -0800)]
mm/memory_hotplug: cleanup __remove_pages()
Let's drop the basically unused section stuff and simplify.
Also, let's use a shorter variant to calculate the number of pages to
the next section boundary.
Link: http://lkml.kernel.org/r/20191006085646.5768-11-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Pankaj Gupta <pagupta@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Hildenbrand [Tue, 4 Feb 2020 01:34:19 +0000 (17:34 -0800)]
mm/memory_hotplug: drop local variables in shrink_zone_span()
Get rid of the unnecessary local variables.
Link: http://lkml.kernel.org/r/20191006085646.5768-10-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pankaj Gupta <pagupta@redhat.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Hildenbrand [Tue, 4 Feb 2020 01:34:16 +0000 (17:34 -0800)]
mm/memory_hotplug: don't check for "all holes" in shrink_zone_span()
If we have holes, the holes will automatically get detected and removed
once we remove the next bigger/smaller section. The extra checks can go.
Link: http://lkml.kernel.org/r/20191006085646.5768-9-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pankaj Gupta <pagupta@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Hildenbrand [Tue, 4 Feb 2020 01:34:12 +0000 (17:34 -0800)]
mm/memory_hotplug: we always have a zone in find_(smallest|biggest)_section_pfn
With shrink_pgdat_span() out of the way, we now always have a valid zone.
Link: http://lkml.kernel.org/r/20191006085646.5768-8-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pankaj Gupta <pagupta@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Hildenbrand [Tue, 4 Feb 2020 01:34:09 +0000 (17:34 -0800)]
mm/memory_hotplug: poison memmap in remove_pfn_range_from_zone()
Let's poison the pages similar to when adding new memory in
sparse_add_section(). Also call remove_pfn_range_from_zone() from
memunmap_pages(), so we can poison the memmap from there as well.
Link: http://lkml.kernel.org/r/20191006085646.5768-7-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pankaj Gupta <pagupta@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Aneesh Kumar K.V [Tue, 4 Feb 2020 01:34:06 +0000 (17:34 -0800)]
mm/memmap_init: update variable name in memmap_init_zone
Patch series "mm/memory_hotplug: Shrink zones before removing memory", v6.
This series fixes the access of uninitialized memmaps when shrinking
zones/nodes and when removing memory. Also, it contains all fixes for
crashes that can be triggered when removing certain namespace using
memunmap_pages() - ZONE_DEVICE, reported by Aneesh.
We stop trying to shrink ZONE_DEVICE, as it's buggy, fixing it would be
more involved (we don't have SECTION_IS_ONLINE as an indicator), and
shrinking is only of limited use (set_zone_contiguous() cannot detect the
ZONE_DEVICE as contiguous).
We continue shrinking !ZONE_DEVICE zones, however, I reduced the amount of
code to a minimum. Shrinking is especially necessary to keep
zone->contiguous set where possible, especially, on memory unplug of DIMMs
at zone boundaries.
--------------------------------------------------------------------------
Zones are now properly shrunk when offlining memory blocks or when
onlining failed. This allows to properly shrink zones on memory unplug
even if the separate memory blocks of a DIMM were onlined to different
zones or re-onlined to a different zone after offlining.
Example:
:/# cat /proc/zoneinfo
Node 1, zone Movable
spanned 0
present 0
managed 0
:/# echo "online_movable" > /sys/devices/system/memory/memory41/state
:/# echo "online_movable" > /sys/devices/system/memory/memory43/state
:/# cat /proc/zoneinfo
Node 1, zone Movable
spanned 98304
present 65536
managed 65536
:/# echo 0 > /sys/devices/system/memory/memory43/online
:/# cat /proc/zoneinfo
Node 1, zone Movable
spanned 32768
present 32768
managed 32768
:/# echo 0 > /sys/devices/system/memory/memory41/online
:/# cat /proc/zoneinfo
Node 1, zone Movable
spanned 0
present 0
managed 0
This patch (of 6):
The third argument is actually number of pages. Change the variable name
from size to nr_pages to indicate this better.
No functional change in this patch.
Link: http://lkml.kernel.org/r/20191006085646.5768-3-david@redhat.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Pankaj Gupta <pagupta@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Hildenbrand [Tue, 4 Feb 2020 01:34:02 +0000 (17:34 -0800)]
mm: factor out next_present_section_nr()
Let's move it to the header and use the shorter variant from
mm/page_alloc.c (the original one will also check
"__highest_present_section_nr + 1", which is not necessary). While at
it, make the section_nr in next_pfn() const.
In next_pfn(), we now return section_nr_to_pfn(-1) instead of -1 once we
exceed __highest_present_section_nr, which doesn't make a difference in
the caller as it is big enough (>= all sane end_pfn).
Link: http://lkml.kernel.org/r/20200113144035.10848-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: "Jin, Zhi" <zhi.jin@intel.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Hildenbrand [Tue, 4 Feb 2020 01:33:59 +0000 (17:33 -0800)]
mm/page_alloc: fix and rework pfn handling in memmap_init_zone()
Let's update the pfn manually whenever we continue the loop. This makes
the code easier to read but also less error prone (and we can directly fix
one issue).
When overlap_memmap_init() returns true, pfn is updated to
"memblock_region_memory_end_pfn(r)". So it already points at the *next*
pfn to process. Incrementing the pfn another time is wrong, we might
leave one uninitialized. I spotted this by inspecting the code, so I have
no idea if this is relevant in practise (with kernelcore=mirror).
Link: http://lkml.kernel.org/r/20200113144035.10848-2-david@redhat.com
Fixes:
a9a9e77fbf27 ("mm: move mirrored memory specific code outside of memmap_init_zone")
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Baoquan He <bhe@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: "Jin, Zhi" <zhi.jin@intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Hildenbrand [Tue, 4 Feb 2020 01:33:55 +0000 (17:33 -0800)]
mm/page_alloc.c: initialize memmap of unavailable memory directly
Let's make sure that all memory holes are actually marked PageReserved(),
that page_to_pfn() produces reliable results, and that these pages are not
detected as "mmap" pages due to the mapcount.
E.g., booting a x86-64 QEMU guest with 4160 MB:
[ 0.010585] Early memory node ranges
[ 0.010586] node 0: [mem 0x0000000000001000-0x000000000009efff]
[ 0.010588] node 0: [mem 0x0000000000100000-0x00000000bffdefff]
[ 0.010589] node 0: [mem 0x0000000100000000-0x0000000143ffffff]
max_pfn is 0x144000.
Before this change:
[root@localhost ~]# ./page-types -r -a 0x144000,
flags page-count MB symbolic-flags long-symbolic-flags
0x0000000000000800 16384 64 ___________M_______________________________ mmap
total 16384 64
After this change:
[root@localhost ~]# ./page-types -r -a 0x144000,
flags page-count MB symbolic-flags long-symbolic-flags
0x0000000100000000 16384 64 ___________________________r_______________ reserved
total 16384 64
IOW, especially the unavailable physical memory ("memory hole") in the
last section would not get properly marked PageReserved() and is indicated
to be "mmap" memory.
Drop the trace of that function from include/linux/mm.h - nobody else
needs it, and rename it accordingly.
Note: The fake zone/node might not be covered by the zone/node span. This
is not an urgent issue (for now, we had the same node/zone due to the
zeroing). We'll need a clean way to mark memory holes (e.g., using a page
type PageHole() if possible or a fake ZONE_INVALID) and eventually stop
marking these memory holes PageReserved().
Link: http://lkml.kernel.org/r/20191211163201.17179-4-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Bob Picco <bob.picco@oracle.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Steven Sistare <steven.sistare@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Hildenbrand [Tue, 4 Feb 2020 01:33:52 +0000 (17:33 -0800)]
fs/proc/page.c: allow inspection of last section and fix end detection
If max_pfn does not fall onto a section boundary, it is possible to
inspect PFNs up to max_pfn, and PFNs above max_pfn, however, max_pfn
itself can't be inspected. We can have a valid (and online) memmap at and
above max_pfn if max_pfn is not aligned to a section boundary. The whole
early section has a memmap and is marked online. Being able to inspect
the state of these PFNs is valuable for debugging, especially because
max_pfn can change on memory hotplug and expose these memmaps.
Also, querying page flags via "./page-types -r -a 0x144001,"
(tools/vm/page-types.c) inside a x86-64 guest with 4160MB under QEMU
results in an (almost) endless loop in user space, because the end is not
detected properly when starting after max_pfn.
Instead, let's allow to inspect all pages in the highest section and
return 0 directly if we try to access pages above that section.
While at it, check the count before adjusting it, to avoid masking user
errors.
Link: http://lkml.kernel.org/r/20191211163201.17179-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Bob Picco <bob.picco@oracle.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Steven Sistare <steven.sistare@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Hildenbrand [Tue, 4 Feb 2020 01:33:48 +0000 (17:33 -0800)]
mm/page_alloc.c: fix uninitialized memmaps on a partially populated last section
Patch series "mm: fix max_pfn not falling on section boundary", v2.
Playing with different memory sizes for a x86-64 guest, I discovered that
some memmaps (highest section if max_mem does not fall on the section
boundary) are marked as being valid and online, but contain garbage. We
have to properly initialize these memmaps.
Looking at /proc/kpageflags and friends, I found some more issues,
partially related to this.
This patch (of 3):
If max_pfn is not aligned to a section boundary, we can easily run into
BUGs. This can e.g., be triggered on x86-64 under QEMU by specifying a
memory size that is not a multiple of 128MB (e.g., 4097MB, but also
4160MB). I was told that on real HW, we can easily have this scenario
(esp., one of the main reasons sub-section hotadd of devmem was added).
The issue is, that we have a valid memmap (pfn_valid()) for the whole
section, and the whole section will be marked "online".
pfn_to_online_page() will succeed, but the memmap contains garbage.
E.g., doing a "./page-types -r -a 0x144001" when QEMU was started with "-m
4160M" - (see tools/vm/page-types.c):
[ 200.476376] BUG: unable to handle page fault for address:
fffffffffffffffe
[ 200.477500] #PF: supervisor read access in kernel mode
[ 200.478334] #PF: error_code(0x0000) - not-present page
[ 200.479076] PGD
59614067 P4D
59614067 PUD
59616067 PMD 0
[ 200.479557] Oops: 0000 [#4] SMP NOPTI
[ 200.479875] CPU: 0 PID: 603 Comm: page-types Tainted: G D W 5.5.0-rc1-next-
20191209 #93
[ 200.480646] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu4
[ 200.481648] RIP: 0010:stable_page_flags+0x4d/0x410
[ 200.482061] Code: f3 ff 41 89 c0 48 b8 00 00 00 00 01 00 00 00 45 84 c0 0f 85 cd 02 00 00 48 8b 53 08 48 8b 2b 48f
[ 200.483644] RSP: 0018:
ffffb139401cbe60 EFLAGS:
00010202
[ 200.484091] RAX:
fffffffffffffffe RBX:
fffffbeec5100040 RCX:
0000000000000000
[ 200.484697] RDX:
0000000000000001 RSI:
ffffffff9535c7cd RDI:
0000000000000246
[ 200.485313] RBP:
ffffffffffffffff R08:
0000000000000000 R09:
0000000000000000
[ 200.485917] R10:
0000000000000000 R11:
0000000000000000 R12:
0000000000144001
[ 200.486523] R13:
00007ffd6ba55f48 R14:
00007ffd6ba55f40 R15:
ffffb139401cbf08
[ 200.487130] FS:
00007f68df717580(0000) GS:
ffff9ec77fa00000(0000) knlGS:
0000000000000000
[ 200.487804] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 200.488295] CR2:
fffffffffffffffe CR3:
0000000135d48000 CR4:
00000000000006f0
[ 200.488897] Call Trace:
[ 200.489115] kpageflags_read+0xe9/0x140
[ 200.489447] proc_reg_read+0x3c/0x60
[ 200.489755] vfs_read+0xc2/0x170
[ 200.490037] ksys_pread64+0x65/0xa0
[ 200.490352] do_syscall_64+0x5c/0xa0
[ 200.490665] entry_SYSCALL_64_after_hwframe+0x49/0xbe
But it can be triggered much easier via "cat /proc/kpageflags > /dev/null"
after cold/hot plugging a DIMM to such a system:
[root@localhost ~]# cat /proc/kpageflags > /dev/null
[ 111.517275] BUG: unable to handle page fault for address:
fffffffffffffffe
[ 111.517907] #PF: supervisor read access in kernel mode
[ 111.518333] #PF: error_code(0x0000) - not-present page
[ 111.518771] PGD
a240e067 P4D
a240e067 PUD
a2410067 PMD 0
This patch fixes that by at least zero-ing out that memmap (so e.g.,
page_to_pfn() will not crash). Commit
907ec5fca3dc ("mm: zero remaining
unavailable struct pages") tried to fix a similar issue, but forgot to
consider this special case.
After this patch, there are still problems to solve. E.g., not all of
these pages falling into a memory hole will actually get initialized later
and set PageReserved - they are only zeroed out - but at least the
immediate crashes are gone. A follow-up patch will take care of this.
Link: http://lkml.kernel.org/r/20191211163201.17179-2-david@redhat.com
Fixes:
f7f99100d8d9 ("mm: stop zeroing memory during allocation in vmemmap")
Signed-off-by: David Hildenbrand <david@redhat.com>
Tested-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Steven Sistare <steven.sistare@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Bob Picco <bob.picco@oracle.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: <stable@vger.kernel.org> [4.15+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Gang He [Tue, 4 Feb 2020 01:33:45 +0000 (17:33 -0800)]
ocfs2: fix oops when writing cloned file
Writing a cloned file triggers a kernel oops and the user-space command
process is also killed by the system. The bug can be reproduced stably
via:
1) create a file under ocfs2 file system directory.
journalctl -b > aa.txt
2) create a cloned file for this file.
reflink aa.txt bb.txt
3) write the cloned file with dd command.
dd if=/dev/zero of=bb.txt bs=512 count=1 conv=notrunc
The dd command is killed by the kernel, then you can see the oops message
via dmesg command.
[ 463.875404] BUG: kernel NULL pointer dereference, address:
0000000000000028
[ 463.875413] #PF: supervisor read access in kernel mode
[ 463.875416] #PF: error_code(0x0000) - not-present page
[ 463.875418] PGD 0 P4D 0
[ 463.875425] Oops: 0000 [#1] SMP PTI
[ 463.875431] CPU: 1 PID: 2291 Comm: dd Tainted: G OE 5.3.16-2-default
[ 463.875433] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[ 463.875500] RIP: 0010:ocfs2_refcount_cow+0xa4/0x5d0 [ocfs2]
[ 463.875505] Code: 06 89 6c 24 38 89 eb f6 44 24 3c 02 74 be 49 8b 47 28
[ 463.875508] RSP: 0018:
ffffa2cb409dfce8 EFLAGS:
00010202
[ 463.875512] RAX:
ffff8b1ebdca8000 RBX:
0000000000000001 RCX:
ffff8b1eb73a9df0
[ 463.875515] RDX:
0000000000056a01 RSI:
0000000000000000 RDI:
0000000000000000
[ 463.875517] RBP:
0000000000000001 R08:
ffff8b1eb73a9de0 R09:
0000000000000000
[ 463.875520] R10:
0000000000000001 R11:
0000000000000000 R12:
0000000000000000
[ 463.875522] R13:
ffff8b1eb922f048 R14:
0000000000000000 R15:
ffff8b1eb922f048
[ 463.875526] FS:
00007f8f44d15540(0000) GS:
ffff8b1ebeb00000(0000) knlGS:
0000000000000000
[ 463.875529] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 463.875532] CR2:
0000000000000028 CR3:
000000003c17a000 CR4:
00000000000006e0
[ 463.875546] Call Trace:
[ 463.875596] ? ocfs2_inode_lock_full_nested+0x18b/0x960 [ocfs2]
[ 463.875648] ocfs2_file_write_iter+0xaf8/0xc70 [ocfs2]
[ 463.875672] new_sync_write+0x12d/0x1d0
[ 463.875688] vfs_write+0xad/0x1a0
[ 463.875697] ksys_write+0xa1/0xe0
[ 463.875710] do_syscall_64+0x60/0x1f0
[ 463.875743] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 463.875758] RIP: 0033:0x7f8f4482ed44
[ 463.875762] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 80 00 00 00
[ 463.875765] RSP: 002b:
00007fff300a79d8 EFLAGS:
00000246 ORIG_RAX:
0000000000000001
[ 463.875769] RAX:
ffffffffffffffda RBX:
0000000000000000 RCX:
00007f8f4482ed44
[ 463.875771] RDX:
0000000000000200 RSI:
000055f771b5c000 RDI:
0000000000000001
[ 463.875774] RBP:
0000000000000200 R08:
00007f8f44af9c78 R09:
0000000000000003
[ 463.875776] R10:
000000000000089f R11:
0000000000000246 R12:
000055f771b5c000
[ 463.875779] R13:
0000000000000200 R14:
0000000000000000 R15:
000055f771b5c000
This regression problem was introduced by commit
e74540b28556 ("ocfs2:
protect extent tree in ocfs2_prepare_inode_for_write()").
Link: http://lkml.kernel.org/r/20200121050153.13290-1-ghe@suse.com
Fixes:
e74540b28556 ("ocfs2: protect extent tree in ocfs2_prepare_inode_for_write()").
Signed-off-by: Gang He <ghe@suse.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jakub Kicinski [Mon, 3 Feb 2020 23:38:50 +0000 (15:38 -0800)]
Merge branch 'netdevsim-fix-several-bugs-in-netdevsim-module'
Taehee Yoo says:
=====================
netdevsim: fix several bugs in netdevsim module
This patchset fixes several bugs in netdevsim module.
1. The first patch fixes using uninitialized resources
This patch fixes two similar problems, which is to use uninitialized
resources.
a) In the current code, {new/del}_device_store() use resource,
they are initialized by __init().
But, these functions could be called before __init() is finished.
So, accessing uninitialized data could occur and it eventually makes panic.
b) In the current code, {new/del}_port_store() uses resource,
they are initialized by new_device_store().
But thes functions could be called before new_device_store() is finished.
2. The second patch fixes another race condition.
The main problem is a race condition in {new/del}_port() and devlink reload
function.
These functions would allocate and remove resources. So these functions
should not be executed concurrently.
3. The third patch fixes a panic in nsim_dev_take_snapshot_write().
nsim_dev_take_snapshot_write() uses nsim_dev and nsim_dev->dummy_region.
But these data could be removed by both reload routine and
del_device_store(). And these functions could be executed concurrently.
4. The fourth patch fixes stack-out-of-bound in nsim_dev_debugfs_init().
nsim_dev_debugfs_init() provides only 16bytes for name pointer.
But, there are some case the name length is over 16bytes.
So, stack-out-of-bound occurs.
5. The fifth patch uses IS_ERR instead of IS_ERR_OR_NULL.
debugfs_create_{dir/file} doesn't return NULL.
So, IS_ERR() is more correct.
6. The sixth patch avoids kmalloc warning.
When too large memory allocation is requested by user-space, kmalloc
internally prints warning message.
That warning message is not necessary.
In order to avoid that, it adds __GFP_NOWARN.
7. The last patch removes an unused sdev.c file
Change log:
v2 -> v3:
- Use smp_load_acquire() and smp_store_release() for flag variables.
- Change variable names.
- Fix deadlock in second patch.
- Update lock variable comment.
- Add new patch for fixing panic in snapshot_write().
- Include Reviewed-by tags.
- Update some log messages and comment.
v1 -> v2:
- Splits a fixing race condition patch into two patches.
- Fix incorrect Fixes tags.
- Update comments
- Fix use-after-free
- Add a new patch, which removes an unused sdev.c file.
- Remove a patch, which tries to avoid debugfs warning.
=====================
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Taehee Yoo [Sat, 1 Feb 2020 16:43:48 +0000 (16:43 +0000)]
netdevsim: remove unused sdev code
sdev.c code is merged into dev.c and is not used anymore.
it would be removed.
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Taehee Yoo [Sat, 1 Feb 2020 16:43:39 +0000 (16:43 +0000)]
netdevsim: use __GFP_NOWARN to avoid memalloc warning
vfnum buffer size and binary_len buffer size is received by user-space.
So, this buffer size could be too large. If so, kmalloc will internally
print a warning message.
This warning message is actually not necessary for the netdevsim module.
So, this patch adds __GFP_NOWARN.
Test commands:
modprobe netdevsim
echo 1 > /sys/bus/netdevsim/new_device
echo
1000000000 > /sys/devices/netdevsim1/sriov_numvfs
Splat looks like:
[ 357.847266][ T1000] WARNING: CPU: 0 PID: 1000 at mm/page_alloc.c:4738 __alloc_pages_nodemask+0x2f3/0x740
[ 357.850273][ T1000] Modules linked in: netdevsim veth openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrx
[ 357.852989][ T1000] CPU: 0 PID: 1000 Comm: bash Tainted: G B 5.5.0-rc5+ #270
[ 357.854334][ T1000] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 357.855703][ T1000] RIP: 0010:__alloc_pages_nodemask+0x2f3/0x740
[ 357.856669][ T1000] Code: 64 fe ff ff 65 48 8b 04 25 c0 0f 02 00 48 05 f0 12 00 00 41 be 01 00 00 00 49 89 47 0
[ 357.860272][ T1000] RSP: 0018:
ffff8880b7f47bd8 EFLAGS:
00010246
[ 357.861009][ T1000] RAX:
ffffed1016fe8f80 RBX:
1ffff11016fe8fae RCX:
0000000000000000
[ 357.861843][ T1000] RDX:
0000000000000000 RSI:
0000000000000017 RDI:
0000000000000000
[ 357.862661][ T1000] RBP:
0000000000040dc0 R08:
1ffff11016fe8f67 R09:
dffffc0000000000
[ 357.863509][ T1000] R10:
ffff8880b7f47d68 R11:
fffffbfff2798180 R12:
1ffff11016fe8f80
[ 357.864355][ T1000] R13:
0000000000000017 R14:
0000000000000017 R15:
ffff8880c2038d68
[ 357.865178][ T1000] FS:
00007fd9a5b8c740(0000) GS:
ffff8880d9c00000(0000) knlGS:
0000000000000000
[ 357.866248][ T1000] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 357.867531][ T1000] CR2:
000055ce01ba8100 CR3:
00000000b7dbe005 CR4:
00000000000606f0
[ 357.868972][ T1000] Call Trace:
[ 357.869423][ T1000] ? lock_contended+0xcd0/0xcd0
[ 357.870001][ T1000] ? __alloc_pages_slowpath+0x21d0/0x21d0
[ 357.870673][ T1000] ? _kstrtoull+0x76/0x160
[ 357.871148][ T1000] ? alloc_pages_current+0xc1/0x1a0
[ 357.871704][ T1000] kmalloc_order+0x22/0x80
[ 357.872184][ T1000] kmalloc_order_trace+0x1d/0x140
[ 357.872733][ T1000] __kmalloc+0x302/0x3a0
[ 357.873204][ T1000] nsim_bus_dev_numvfs_store+0x1ab/0x260 [netdevsim]
[ 357.873919][ T1000] ? kernfs_get_active+0x12c/0x180
[ 357.874459][ T1000] ? new_device_store+0x450/0x450 [netdevsim]
[ 357.875111][ T1000] ? kernfs_get_parent+0x70/0x70
[ 357.875632][ T1000] ? sysfs_file_ops+0x160/0x160
[ 357.876152][ T1000] kernfs_fop_write+0x276/0x410
[ 357.876680][ T1000] ? __sb_start_write+0x1ba/0x2e0
[ 357.877225][ T1000] vfs_write+0x197/0x4a0
[ 357.877671][ T1000] ksys_write+0x141/0x1d0
[ ... ]
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Fixes:
79579220566c ("netdevsim: add SR-IOV functionality")
Fixes:
82c93a87bf8b ("netdevsim: implement couple of testing devlink health reporters")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Taehee Yoo [Sat, 1 Feb 2020 16:43:30 +0000 (16:43 +0000)]
netdevsim: use IS_ERR instead of IS_ERR_OR_NULL for debugfs
Debugfs APIs return valid pointer or error pointer. it doesn't return NULL.
So, using IS_ERR is enough, not using IS_ERR_OR_NULL.
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Taehee Yoo [Sat, 1 Feb 2020 16:43:22 +0000 (16:43 +0000)]
netdevsim: fix stack-out-of-bounds in nsim_dev_debugfs_init()
When netdevsim dev is being created, a debugfs directory is created.
The variable "dev_ddir_name" is 16bytes device name pointer and device
name is "netdevsim<dev id>".
The maximum dev id length is 10.
So, 16bytes for device name isn't enough.
Test commands:
modprobe netdevsim
echo "
1000000000 0" > /sys/bus/netdevsim/new_device
Splat looks like:
[ 249.622710][ T900] BUG: KASAN: stack-out-of-bounds in number+0x824/0x880
[ 249.623658][ T900] Write of size 1 at addr
ffff88804c527988 by task bash/900
[ 249.624521][ T900]
[ 249.624830][ T900] CPU: 1 PID: 900 Comm: bash Not tainted 5.5.0+ #322
[ 249.625691][ T900] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 249.626712][ T900] Call Trace:
[ 249.627103][ T900] dump_stack+0x96/0xdb
[ 249.627639][ T900] ? number+0x824/0x880
[ 249.628173][ T900] print_address_description.constprop.5+0x1be/0x360
[ 249.629022][ T900] ? number+0x824/0x880
[ 249.629569][ T900] ? number+0x824/0x880
[ 249.630105][ T900] __kasan_report+0x12a/0x170
[ 249.630717][ T900] ? number+0x824/0x880
[ 249.631201][ T900] kasan_report+0xe/0x20
[ 249.631723][ T900] number+0x824/0x880
[ 249.632235][ T900] ? put_dec+0xa0/0xa0
[ 249.632716][ T900] ? rcu_read_lock_sched_held+0x90/0xc0
[ 249.633392][ T900] vsnprintf+0x63c/0x10b0
[ 249.633983][ T900] ? pointer+0x5b0/0x5b0
[ 249.634543][ T900] ? mark_lock+0x11d/0xc40
[ 249.635200][ T900] sprintf+0x9b/0xd0
[ 249.635750][ T900] ? scnprintf+0xe0/0xe0
[ 249.636370][ T900] nsim_dev_probe+0x63c/0xbf0 [netdevsim]
[ ... ]
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Fixes:
ab1d0cc004d7 ("netdevsim: change debugfs tree topology")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Taehee Yoo [Sat, 1 Feb 2020 16:43:13 +0000 (16:43 +0000)]
netdevsim: fix panic in nsim_dev_take_snapshot_write()
nsim_dev_take_snapshot_write() uses nsim_dev and nsim_dev->dummy_region.
So, during this function, these data shouldn't be removed.
But there is no protecting stuff in this function.
There are two similar cases.
1. reload case
reload could be called during nsim_dev_take_snapshot_write().
When reload is being executed, nsim_dev_reload_down() is called and it
calls nsim_dev_reload_destroy(). nsim_dev_reload_destroy() calls
devlink_region_destroy() to destroy nsim_dev->dummy_region.
So, during nsim_dev_take_snapshot_write(), nsim_dev->dummy_region()
would be removed.
At this point, snapshot_write() would access freed pointer.
In order to fix this case, take_snapshot file will be removed before
devlink_region_destroy().
The take_snapshot file will be re-created by ->reload_up().
2. del_device_store case
del_device_store() also could call nsim_dev_reload_destroy()
during nsim_dev_take_snapshot_write(). If so, panic would occur.
This problem is actually the same problem with the first case.
So, this problem will be fixed by the first case's solution.
Test commands:
modprobe netdevsim
while :
do
echo 1 > /sys/bus/netdevsim/new_device &
echo 1 > /sys/bus/netdevsim/del_device &
devlink dev reload netdevsim/netdevsim1 &
echo 1 > /sys/kernel/debug/netdevsim/netdevsim1/take_snapshot &
done
Splat looks like:
[ 45.564513][ T975] general protection fault, probably for non-canonical address 0xdffffc000000003a: 0000 [#1] SMP DEI
[ 45.566131][ T975] KASAN: null-ptr-deref in range [0x00000000000001d0-0x00000000000001d7]
[ 45.566135][ T975] CPU: 1 PID: 975 Comm: bash Not tainted 5.5.0+ #322
[ 45.569020][ T975] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 45.569026][ T975] RIP: 0010:__mutex_lock+0x10a/0x14b0
[ 45.570518][ T975] Code: 08 84 d2 0f 85 7f 12 00 00 44 8b 0d 10 23 65 02 45 85 c9 75 29 49 8d 7f 68 48 b8 00 00 00 0f
[ 45.570522][ T975] RSP: 0018:
ffff888046ccfbf0 EFLAGS:
00010206
[ 45.572305][ T975] RAX:
dffffc0000000000 RBX:
0000000000000000 RCX:
0000000000000000
[ 45.572308][ T975] RDX:
000000000000003a RSI:
ffffffffac926440 RDI:
00000000000001d0
[ 45.576843][ T975] RBP:
ffff888046ccfd70 R08:
ffffffffab610645 R09:
0000000000000000
[ 45.576847][ T975] R10:
ffff888046ccfd90 R11:
ffffed100d6360ad R12:
0000000000000000
[ 45.578471][ T975] R13:
dffffc0000000000 R14:
ffffffffae1976c0 R15:
0000000000000168
[ 45.578475][ T975] FS:
00007f614d6e7740(0000) GS:
ffff88806c400000(0000) knlGS:
0000000000000000
[ 45.581492][ T975] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 45.582942][ T975] CR2:
00005618677d1cf0 CR3:
000000005fb9c002 CR4:
00000000000606e0
[ 45.584543][ T975] DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
[ 45.586633][ T975] DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
[ 45.589889][ T975] Call Trace:
[ 45.591445][ T975] ? devlink_region_snapshot_create+0x55/0x4a0
[ 45.601250][ T975] ? mutex_lock_io_nested+0x1380/0x1380
[ 45.602817][ T975] ? mutex_lock_io_nested+0x1380/0x1380
[ 45.603875][ T975] ? mark_held_locks+0xa5/0xe0
[ 45.604769][ T975] ? _raw_spin_unlock_irqrestore+0x2d/0x50
[ 45.606147][ T975] ? __mutex_unlock_slowpath+0xd0/0x670
[ 45.607723][ T975] ? crng_backtrack_protect+0x80/0x80
[ 45.613530][ T975] ? wait_for_completion+0x390/0x390
[ 45.615152][ T975] ? devlink_region_snapshot_create+0x55/0x4a0
[ 45.616834][ T975] devlink_region_snapshot_create+0x55/0x4a0
[ ... ]
Fixes:
4418f862d675 ("netdevsim: implement support for devlink region and snapshots")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Taehee Yoo [Sat, 1 Feb 2020 16:43:04 +0000 (16:43 +0000)]
netdevsim: disable devlink reload when resources are being used
devlink reload destroys resources and allocates resources again.
So, when devices and ports resources are being used, devlink reload
function should not be executed. In order to avoid this race, a new
lock is added and new_port() and del_port() call devlink_reload_disable()
and devlink_reload_enable().
Thread0 Thread1
{new/del}_port() {new/del}_port()
devlink_reload_disable()
devlink_reload_disable()
devlink_reload_enable()
//here
devlink_reload_enable()
Before Thread1's devlink_reload_enable(), the devlink is already allowed
to execute reload because Thread0 allows it. devlink reload disable/enable
variable type is bool. So the above case would exist.
So, disable/enable should be executed atomically.
In order to do that, a new lock is used.
Test commands:
modprobe netdevsim
echo 1 > /sys/bus/netdevsim/new_device
while :
do
echo 1 > /sys/devices/netdevsim1/new_port &
echo 1 > /sys/devices/netdevsim1/del_port &
devlink dev reload netdevsim/netdevsim1 &
done
Splat looks like:
[ 23.342145][ T932] DEBUG_LOCKS_WARN_ON(mutex_is_locked(lock))
[ 23.342159][ T932] WARNING: CPU: 0 PID: 932 at kernel/locking/mutex-debug.c:103 mutex_destroy+0xc7/0xf0
[ 23.344182][ T932] Modules linked in: netdevsim openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_dx
[ 23.346485][ T932] CPU: 0 PID: 932 Comm: devlink Not tainted 5.5.0+ #322
[ 23.347696][ T932] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 23.348893][ T932] RIP: 0010:mutex_destroy+0xc7/0xf0
[ 23.349505][ T932] Code: e0 07 83 c0 03 38 d0 7c 04 84 d2 75 2e 8b 05 00 ac b0 02 85 c0 75 8b 48 c7 c6 00 5e 07 96 40
[ 23.351887][ T932] RSP: 0018:
ffff88806208f810 EFLAGS:
00010286
[ 23.353963][ T932] RAX:
dffffc0000000008 RBX:
ffff888067f6f2c0 RCX:
ffffffff942c4bd4
[ 23.355222][ T932] RDX:
0000000000000000 RSI:
0000000000000000 RDI:
ffffffff96dac5b4
[ 23.356169][ T932] RBP:
ffff888067f6f000 R08:
fffffbfff2d235a5 R09:
fffffbfff2d235a5
[ 23.357160][ T932] R10:
0000000000000001 R11:
fffffbfff2d235a4 R12:
ffff888067f6f208
[ 23.358288][ T932] R13:
ffff88806208fa70 R14:
ffff888067f6f000 R15:
ffff888069ce3800
[ 23.359307][ T932] FS:
00007fe2a3876740(0000) GS:
ffff88806c000000(0000) knlGS:
0000000000000000
[ 23.360473][ T932] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 23.361319][ T932] CR2:
00005561357aa000 CR3:
000000005227a006 CR4:
00000000000606f0
[ 23.362323][ T932] DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
[ 23.363417][ T932] DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
[ 23.364414][ T932] Call Trace:
[ 23.364828][ T932] nsim_dev_reload_destroy+0x77/0xb0 [netdevsim]
[ 23.365655][ T932] nsim_dev_reload_down+0x84/0xb0 [netdevsim]
[ 23.366433][ T932] devlink_reload+0xb1/0x350
[ 23.367010][ T932] genl_rcv_msg+0x580/0xe90
[ ...]
[ 23.531729][ T1305] kernel BUG at lib/list_debug.c:53!
[ 23.532523][ T1305] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[ 23.533467][ T1305] CPU: 2 PID: 1305 Comm: bash Tainted: G W 5.5.0+ #322
[ 23.534962][ T1305] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 23.536503][ T1305] RIP: 0010:__list_del_entry_valid+0xe6/0x150
[ 23.538346][ T1305] Code: 89 ea 48 c7 c7 00 73 1e 96 e8 df f7 4c ff 0f 0b 48 c7 c7 60 73 1e 96 e8 d1 f7 4c ff 0f 0b 44
[ 23.541068][ T1305] RSP: 0018:
ffff888047c27b58 EFLAGS:
00010282
[ 23.542001][ T1305] RAX:
0000000000000054 RBX:
ffff888067f6f318 RCX:
0000000000000000
[ 23.543051][ T1305] RDX:
0000000000000054 RSI:
0000000000000008 RDI:
ffffed1008f84f61
[ 23.544072][ T1305] RBP:
ffff88804aa0fca0 R08:
ffffed100d940539 R09:
ffffed100d940539
[ 23.545085][ T1305] R10:
0000000000000001 R11:
ffffed100d940538 R12:
ffff888047c27cb0
[ 23.546422][ T1305] R13:
ffff88806208b840 R14:
ffffffff981976c0 R15:
ffff888067f6f2c0
[ 23.547406][ T1305] FS:
00007f76c0431740(0000) GS:
ffff88806c800000(0000) knlGS:
0000000000000000
[ 23.548527][ T1305] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 23.549389][ T1305] CR2:
00007f5048f1a2f8 CR3:
000000004b310006 CR4:
00000000000606e0
[ 23.550636][ T1305] DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
[ 23.551578][ T1305] DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
[ 23.552597][ T1305] Call Trace:
[ 23.553004][ T1305] mutex_remove_waiter+0x101/0x520
[ 23.553646][ T1305] __mutex_lock+0xac7/0x14b0
[ 23.554218][ T1305] ? nsim_dev_port_del+0x4e/0x140 [netdevsim]
[ 23.554908][ T1305] ? mutex_lock_io_nested+0x1380/0x1380
[ 23.555570][ T1305] ? _parse_integer+0xf0/0xf0
[ 23.556043][ T1305] ? kstrtouint+0x86/0x110
[ 23.556504][ T1305] ? nsim_dev_port_del+0x4e/0x140 [netdevsim]
[ 23.557133][ T1305] nsim_dev_port_del+0x4e/0x140 [netdevsim]
[ 23.558024][ T1305] del_port_store+0xcc/0xf0 [netdevsim]
[ ... ]
Fixes:
75ba029f3c07 ("netdevsim: implement proper devlink reload")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Taehee Yoo [Sat, 1 Feb 2020 16:42:54 +0000 (16:42 +0000)]
netdevsim: fix using uninitialized resources
When module is being initialized, __init() calls bus_register() and
driver_register().
These functions internally create various resources and sysfs files.
The sysfs files are used for basic operations(add/del device).
/sys/bus/netdevsim/new_device
/sys/bus/netdevsim/del_device
These sysfs files use netdevsim resources, they are mostly allocated
and initialized in ->probe() function, which is nsim_dev_probe().
But, sysfs files could be executed before ->probe() is finished.
So, accessing uninitialized data would occur.
Another problem is very similar.
/sys/bus/netdevsim/new_device internally creates sysfs files.
/sys/devices/netdevsim<id>/new_port
/sys/devices/netdevsim<id>/del_port
These sysfs files also use netdevsim resources, they are mostly allocated
and initialized in creating device routine, which is nsim_bus_dev_new().
But they also could be executed before nsim_bus_dev_new() is finished.
So, accessing uninitialized data would occur.
To fix these problems, this patch adds flags, which means whether the
operation is finished or not.
The flag variable 'nsim_bus_enable' means whether netdevsim bus was
initialized or not.
This is protected by nsim_bus_dev_list_lock.
The flag variable 'nsim_bus_dev->init' means whether nsim_bus_dev was
initialized or not.
This could be used in {new/del}_port_store() with no lock.
Test commands:
#SHELL1
modprobe netdevsim
while :
do
echo "1 1" > /sys/bus/netdevsim/new_device
echo "1 1" > /sys/bus/netdevsim/del_device
done
#SHELL2
while :
do
echo 1 > /sys/devices/netdevsim1/new_port
echo 1 > /sys/devices/netdevsim1/del_port
done
Splat looks like:
[ 47.508954][ T1008] general protection fault, probably for non-canonical address 0xdffffc0000000021: 0000 I
[ 47.510793][ T1008] KASAN: null-ptr-deref in range [0x0000000000000108-0x000000000000010f]
[ 47.511963][ T1008] CPU: 2 PID: 1008 Comm: bash Not tainted 5.5.0+ #322
[ 47.512823][ T1008] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 47.514041][ T1008] RIP: 0010:__mutex_lock+0x10a/0x14b0
[ 47.514699][ T1008] Code: 08 84 d2 0f 85 7f 12 00 00 44 8b 0d 10 23 65 02 45 85 c9 75 29 49 8d 7f 68 48 b8 00 00 00 0f
[ 47.517163][ T1008] RSP: 0018:
ffff888059b4fbb0 EFLAGS:
00010206
[ 47.517802][ T1008] RAX:
dffffc0000000000 RBX:
0000000000000000 RCX:
0000000000000000
[ 47.518941][ T1008] RDX:
0000000000000021 RSI:
ffffffff85926440 RDI:
0000000000000108
[ 47.519732][ T1008] RBP:
ffff888059b4fd30 R08:
ffffffffc073fad0 R09:
0000000000000000
[ 47.520729][ T1008] R10:
ffff888059b4fd50 R11:
ffff88804bb38040 R12:
0000000000000000
[ 47.521702][ T1008] R13:
dffffc0000000000 R14:
ffffffff871976c0 R15:
00000000000000a0
[ 47.522760][ T1008] FS:
00007fd4be05a740(0000) GS:
ffff88806c800000(0000) knlGS:
0000000000000000
[ 47.523877][ T1008] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 47.524627][ T1008] CR2:
0000561c82b69cf0 CR3:
0000000065dd6004 CR4:
00000000000606e0
[ 47.527662][ T1008] DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
[ 47.528604][ T1008] DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
[ 47.529531][ T1008] Call Trace:
[ 47.529874][ T1008] ? nsim_dev_port_add+0x50/0x150 [netdevsim]
[ 47.530470][ T1008] ? mutex_lock_io_nested+0x1380/0x1380
[ 47.531018][ T1008] ? _kstrtoull+0x76/0x160
[ 47.531449][ T1008] ? _parse_integer+0xf0/0xf0
[ 47.531874][ T1008] ? kernfs_fop_write+0x1cf/0x410
[ 47.532330][ T1008] ? sysfs_file_ops+0x160/0x160
[ 47.532773][ T1008] ? kstrtouint+0x86/0x110
[ 47.533168][ T1008] ? nsim_dev_port_add+0x50/0x150 [netdevsim]
[ 47.533721][ T1008] nsim_dev_port_add+0x50/0x150 [netdevsim]
[ 47.534336][ T1008] ? sysfs_file_ops+0x160/0x160
[ 47.534858][ T1008] new_port_store+0x99/0xb0 [netdevsim]
[ 47.535439][ T1008] ? del_port_store+0xb0/0xb0 [netdevsim]
[ 47.536035][ T1008] ? sysfs_file_ops+0x112/0x160
[ 47.536544][ T1008] ? sysfs_kf_write+0x3b/0x180
[ 47.537029][ T1008] kernfs_fop_write+0x276/0x410
[ 47.537548][ T1008] ? __sb_start_write+0x215/0x2e0
[ 47.538110][ T1008] vfs_write+0x197/0x4a0
[ ... ]
Fixes:
f9d9db47d3ba ("netdevsim: add bus attributes to add new and delete devices")
Fixes:
794b2c05ca1c ("netdevsim: extend device attrs to support port addition and deletion")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Mon, 3 Feb 2020 23:07:26 +0000 (15:07 -0800)]
Merge branch 'bnxt_en-Bug-fixes'
Michael Chan says:
=====================
bnxt_en: Bug fixes
3 patches that fix some issues in the firmware reset logic, starting
with a small patch to refactor the code that re-enables SRIOV. The
last patch fixes a TC queue mapping issue.
====================
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Michael Chan [Sun, 2 Feb 2020 07:41:38 +0000 (02:41 -0500)]
bnxt_en: Fix TC queue mapping.
The driver currently only calls netdev_set_tc_queue when the number of
TCs is greater than 1. Instead, the comparison should be greater than
or equal to 1. Even with 1 TC, we need to set the queue mapping.
This bug can cause warnings when the number of TCs is changed back to 1.
Fixes:
7809592d3e2e ("bnxt_en: Enable MSIX early in bnxt_init_one().")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vasundhara Volam [Sun, 2 Feb 2020 07:41:37 +0000 (02:41 -0500)]
bnxt_en: Fix logic that disables Bus Master during firmware reset.
The current logic that calls pci_disable_device() in __bnxt_close_nic()
during firmware reset is flawed. If firmware is still alive, we're
disabling the device too early, causing some firmware commands to
not reach the firmware.
Fix it by moving the logic to bnxt_reset_close(). If firmware is
in fatal condition, we call pci_disable_device() before we free
any of the rings to prevent DMA corruption of the freed rings. If
firmware is still alive, we call pci_disable_device() after the
last firmware message has been sent.
Fixes:
3bc7d4a352ef ("bnxt_en: Add BNXT_STATE_IN_FW_RESET state.")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Michael Chan [Sun, 2 Feb 2020 07:41:36 +0000 (02:41 -0500)]
bnxt_en: Fix RDMA driver failure with SRIOV after firmware reset.
bnxt_ulp_start() needs to be called before SRIOV is re-enabled after
firmware reset. Re-enabling SRIOV may consume all the resources and
may cause the RDMA driver to fail to get MSIX and other resources.
Fix it by calling bnxt_ulp_start() first before calling
bnxt_reenable_sriov().
We re-arrange the logic so that we call bnxt_ulp_start() and
bnxt_reenable_sriov() in proper sequence in bnxt_fw_reset_task() and
bnxt_open(). The former is the normal coordinated firmware reset sequence
and the latter is firmware reset while the function is down. This new
logic is now more straight forward and will now fix both scenarios.
Fixes:
f3a6d206c25a ("bnxt_en: Call bnxt_ulp_stop()/bnxt_ulp_start() during error recovery.")
Reported-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Michael Chan [Sun, 2 Feb 2020 07:41:35 +0000 (02:41 -0500)]
bnxt_en: Refactor logic to re-enable SRIOV after firmware reset detected.
Put the current logic in bnxt_open() to re-enable SRIOV after detecting
firmware reset into a new function bnxt_reenable_sriov(). This call
needs to be invoked in the firmware reset path also in the next patch.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Nicolin Chen [Sat, 1 Feb 2020 02:01:24 +0000 (18:01 -0800)]
net: stmmac: Delete txtimer in suspend()
When running v5.5 with a rootfs on NFS, memory abort may happen in
the system resume stage:
Unable to handle kernel paging request at virtual address
dead00000000012a
[
dead00000000012a] address between user and kernel address ranges
pc : run_timer_softirq+0x334/0x3d8
lr : run_timer_softirq+0x244/0x3d8
x1 :
ffff800011cafe80 x0 :
dead000000000122
Call trace:
run_timer_softirq+0x334/0x3d8
efi_header_end+0x114/0x234
irq_exit+0xd0/0xd8
__handle_domain_irq+0x60/0xb0
gic_handle_irq+0x58/0xa8
el1_irq+0xb8/0x180
arch_cpu_idle+0x10/0x18
do_idle+0x1d8/0x2b0
cpu_startup_entry+0x24/0x40
secondary_start_kernel+0x1b4/0x208
Code:
f9000693 a9400660 f9000020 b4000040 (
f9000401)
---[ end trace
bb83ceeb4c482071 ]---
Kernel panic - not syncing: Fatal exception in interrupt
SMP: stopping secondary CPUs
SMP: failed to stop secondary CPUs 2-3
Kernel Offset: disabled
CPU features: 0x00002,
2300aa30
Memory Limit: none
---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
It's found that stmmac_xmit() and stmmac_resume() sometimes might
run concurrently, possibly resulting in a race condition between
mod_timer() and setup_timer(), being called by stmmac_xmit() and
stmmac_resume() respectively.
Since the resume() runs setup_timer() every time, it'd be safer to
have del_timer_sync() in the suspend() as the counterpart.
Signed-off-by: Nicolin Chen <nicoleotsuka@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Mon, 3 Feb 2020 22:27:33 +0000 (22:27 +0000)]
Merge branch 'for-5.6' of git://git./linux/kernel/git/dennis/percpu
Pull percpu updates from Dennis Zhou:
"Separate out variables that can be decrypted into their own page
anytime encryption can be enabled and fix __percpu annotations in
asm-generic for sparse"
* 'for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu:
percpu: Separate decrypted varaibles anytime encryption can be enabled
percpu: fix __percpu annotation in asm-generic
Linus Torvalds [Mon, 3 Feb 2020 22:25:27 +0000 (22:25 +0000)]
Merge branch 'stable/for-linus-5.6' of git://git./linux/kernel/git/konrad/ibft
Pull ibft update from Konrad Rzeszutek Wilk:
"Adhere to the iBFT spec and extend the structure to handle more
than two NICs"
* 'stable/for-linus-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/ibft:
iscsi_ibft: Don't limits Targets and NICs to two
Linus Torvalds [Mon, 3 Feb 2020 22:22:05 +0000 (22:22 +0000)]
Merge tag 'vfio-v5.6-rc1' of git://github.com/awilliam/linux-vfio
Pull VFIO updates from Alex Williamson:
- Fix nvlink error path (Alexey Kardashevskiy)
- Update nvlink and spapr to use mmgrab() (Julia Lawall)
- Update static declaration (Ben Dooks)
- Annotate __iomem to fix sparse warnings (Ben Dooks)
* tag 'vfio-v5.6-rc1' of git://github.com/awilliam/linux-vfio:
vfio: platform: fix __iomem in vfio_platform_amdxgbe.c
vfio/mdev: make create attribute static
vfio/spapr_tce: use mmgrab
vfio: vfio_pci_nvlink2: use mmgrab
vfio/spapr/nvlink2: Skip unpinning pages on error exit
Linus Torvalds [Mon, 3 Feb 2020 22:10:18 +0000 (22:10 +0000)]
Merge tag 'clk-for-linus' of git://git./linux/kernel/git/clk/linux
Pull clk updates from Stephen Boyd:
"There are a few changes to the core framework this time around, in
addition to the normal collection of driver updates to support new
SoCs, fix incorrect data, and convert various drivers to clk_hw based
APIs.
In the core, we allow clk_ops::init() to return an error code now so
that we can fail clk registration if the callback does something like
fail to allocate memory. We also add a new "terminate" clk_op so that
things done in clk_ops::init() can be undone, e.g. free memory. We
also spit out a warning now when critical clks fail to enable and we
support changing clk rates and enable/disable state through debugfs
when developers compile the kernel themselves.
On the driver front, we get support for what seems like a lot of
Qualcomm and NXP SoCs given that those vendors dominate the diffstat.
There are a couple new drivers for Xilinx and Amlogic SoCs too. The
updates are all small things like fixing the way glitch free muxes
switch parents, avoiding div-by-zero problems, or fixing data like
parent names. See the updates section below for more details.
Finally, the "basic" clk types have been converted to support
specifying parents with clk_hw pointers. This work includes an
overhaul of the fixed-rate clk type to be more modern by using clk_hw
APIs.
Core:
- Let clk_ops::init() return an error code
- Add a clk_ops::terminate() callback to undo clk_ops::init()
- Warn about critical clks that fail to enable or prepare
- Support dangerous debugfs actions on clks with dead code
New Drivers:
- Support for Xilinx Versal platform clks
- Display clk controller on qcom sc7180
- Video clk controller on qcom sc7180
- Graphics clk controller on qcom sc7180
- CPU PLLs for qcom msm8916
- Move qcom msm8974 gfx3d clk to RPM control
- Display port clk support on qcom sdm845 SoCs
- Global clk controller on qcom ipq6018
- Add a driver for BCLK of Freescale SAI cores
- Add cam, vpe and sgx clock support for TI dra7
- Add aess clock support for TI omap5
- Enable clks for CPUfreq on Allwinner A64 SoCs
- Add Amlogic meson8b DDR clock controller
- Add input clocks to Amlogic meson8b controllers
- Add SPIBSC (SPI FLASH) clock on Renesas RZ/A2
- i.MX8MP clk driver support
Updates:
- Convert gpio, fixed-factor, mux, gate, divider basic clks to hw
based APIs
- Detect more PRMCU variants in ux500 driver
- Adjust the composite clk type to new way of describing clk parents
- Fixes for clk controllers on qcom msm8998 SoCs
- Fix gmac main clock for TI dra7
- Move TI dra7-atl clock header to correct location
- Fix hidden node name dependency on TI clkctrl clocks
- Fix Amlogic meson8b mali clock update using the glitch free mux
- Fix Amlogic pll driver division by zero at init
- Prepare for split of Renesas R-Car H3 ES1.x and ES2.0+ config
symbols
- Switch more i.MX clk drivers to clk_hw based APIs
- Disable non-functional divider between pll4_audio_div and
pll4_post_div on imx6q
- Fix watchdog2 clock name typo in imx7ulp clock driver
- Set CLK_GET_RATE_NOCACHE flag for DRAM related clocks on i.MX8M
SoCs
- Suppress bind attrs for i.MX8M clock driver
- Add a big comment in imx8qxp-lpcg driver to tell why
devm_platform_ioremap_resource() shouldn't be used for the driver
- A correction on i.MX8MN usb1_ctrl parent clock setting"
* tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (140 commits)
dt/bindings: clk: fsl,plldig: Drop 'bindings' from schema id
clk: ls1028a: Fix warning on clamp() usage
clk: qoriq: add ls1088a hwaccel clocks support
clk: ls1028a: Add clock driver for Display output interface
dt/bindings: clk: Add YAML schemas for LS1028A Display Clock bindings
clk: fsl-sai: new driver
dt-bindings: clock: document the fsl-sai driver
clk: composite: add _register_composite_pdata() variants
clk: qcom: rpmh: Sort OF match table
dt-bindings: fix warnings in validation of qcom,gcc.yaml
dt-binding: fix compilation error of the example in qcom,gcc.yaml
clk: zynqmp: Add support for clock with CLK_DIVIDER_POWER_OF_TWO flag
clk: zynqmp: Fix divider calculation
clk: zynqmp: Add support for get max divider
clk: zynqmp: Warn user if clock user are more than allowed
clk: zynqmp: Extend driver for versal
dt-bindings: clock: Add bindings for versal clock driver
clk: ti: clkctrl: Fix hidden dependency to node name
clk: ti: add clkctrl data dra7 sgx
clk: ti: omap5: Add missing AESS clock
...
Linus Torvalds [Mon, 3 Feb 2020 22:05:15 +0000 (22:05 +0000)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input
Pull input updates from Dmitry Torokhov:
- a driver for SGI IOC3 PS/2 controller
- updates to driver for FocalTech FT5x06 series touch screen
controllers
- other assorted fixes
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: synaptics-rmi4 - switch to reduced reporting mode
dt-bindings: touchscreen: Convert Goodix touchscreen to json-schema
dt-bindings: touchscreen: Add touchscreen schema
Input: add IOC3 serio driver
Input: axp20x-pek - enable wakeup for all AXP variants
Input: axp20x-pek - respect userspace wakeup configuration
Input: ads7846 - use new `delay` structure for SPI transfer delays
Input: edt-ft5x06 - use pm core to enable/disable the wake irq
Input: edt-ft5x06 - make wakeup-source switchable
Input: edt-ft5x06 - document wakeup-source capability
Input: edt-ft5x06 - alphabetical include reorder
Input: edt-ft5x06 - work around first register access error
Input: apbps2 - add __iomem to register struct
Input: axp20x-pek - make device attributes static
Input: elants_i2c - check Remark ID when attempting firmware update
Stephen Boyd [Mon, 3 Feb 2020 05:25:07 +0000 (21:25 -0800)]
dt/bindings: clk: fsl,plldig: Drop 'bindings' from schema id
Having 'bindings' in here causes a warning when checking the schema.
Documentation/devicetree/bindings/clock/fsl,plldig.yaml:
$id: relative path/filename doesn't match actual path or filename
expected: http://devicetree.org/schemas/clock/fsl,plldig.yaml#
Remove it.
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Wen He <wen.he_1@nxp.com>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Link: https://lkml.kernel.org/r/20200203052507.93215-2-sboyd@kernel.org
Acked-by: Rob Herring <robh@kernel.org>
Stephen Boyd [Mon, 3 Feb 2020 05:25:06 +0000 (21:25 -0800)]
clk: ls1028a: Fix warning on clamp() usage
These constants are used in clamp() with the value being clamped an
unsigned long. Make them unsigned long defines so that clamp() doesn't
complain about comparing different types.
In file included from include/linux/list.h:9,
from include/linux/kobject.h:19,
from include/linux/of.h:17,
from include/linux/clk-provider.h:9,
from drivers/clk/clk-plldig.c:8:
drivers/clk/clk-plldig.c: In function 'plldig_determine_rate':
include/linux/kernel.h:835:29: warning: comparison of distinct pointer types lacks a cast
835 | (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1)))
|
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Wen He <wen.he_1@nxp.com>
Fixes:
d37010a3c162 ("clk: ls1028a: Add clock driver for Display output interface")
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Link: https://lkml.kernel.org/r/20200203052507.93215-1-sboyd@kernel.org
Jakub Kicinski [Mon, 3 Feb 2020 18:26:23 +0000 (10:26 -0800)]
Merge tag 'rxrpc-fixes-
20200203' of git://git./linux/kernel/git/dhowells/linux-fs
David Howells says:
====================
RxRPC fixes
Here are a number of fixes for AF_RXRPC:
(1) Fix a potential use after free in rxrpc_put_local() where it was
accessing the object just put to get tracing information.
(2) Fix insufficient notifications being generated by the function that
queues data packets on a call. This occasionally causes recvmsg() to
stall indefinitely.
(3) Fix a number of packet-transmitting work functions to hold an active
count on the local endpoint so that the UDP socket doesn't get
destroyed whilst they're calling kernel_sendmsg() on it.
(4) Fix a NULL pointer deref that stemmed from a call's connection pointer
being cleared when the call was disconnected.
Changes:
v2: Removed a couple of BUG() statements that got added.
====================
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Masahiro Yamada [Mon, 3 Feb 2020 16:47:08 +0000 (01:47 +0900)]
initramfs: do not show compression mode choice if INITRAMFS_SOURCE is empty
Since commit
ddd09bcc899f ("initramfs: make compression options not
depend on INITRAMFS_SOURCE"), Kconfig asks the compression mode for
the built-in initramfs regardless of INITRAMFS_SOURCE.
It is technically simpler, but pointless from a UI perspective,
Linus says [1].
When INITRAMFS_SOURCE is empty, usr/Makefile creates a tiny default
cpio, which is so small that nobody cares about the compression.
This commit hides the Kconfig choice in that case. The default cpio
is embedded without compression, which was the original behavior.
[1]: https://lkml.org/lkml/2020/2/1/160
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Mon, 3 Feb 2020 17:03:42 +0000 (17:03 +0000)]
Merge tag 'for-5.6-tag' of git://git./linux/kernel/git/kdave/linux
Pull more btrfs updates from David Sterba:
"Fixes that arrived after the merge window freeze, mostly stable
material.
- fix race in tree-mod-log element tracking
- fix bio flushing inside extent writepages
- fix assertion when in-memory tracking of discarded extents finds an
empty tree (eg. after adding a new device)
- update logic of temporary read-only block groups to take into
account overcommit
- fix some fixup worker corner cases:
- page could not go through proper COW cycle and the dirty status
is lost due to page migration
- deadlock if delayed allocation is performed under page lock
- fix send emitting invalid clones within the same file
- fix statfs reporting 0 free space when global block reserve size is
larger than remaining free space but there is still space for new
chunks"
* tag 'for-5.6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: do not zero f_bavail if we have available space
Btrfs: send, fix emission of invalid clone operations within the same file
btrfs: do not do delalloc reservation under page lock
btrfs: drop the -EBUSY case in __extent_writepage_io
Btrfs: keep pages dirty when using btrfs_writepage_fixup_worker
btrfs: take overcommit into account in inc_block_group_ro
btrfs: fix force usage in inc_block_group_ro
btrfs: Correctly handle empty trees in find_first_clear_extent_bit
btrfs: flush write bio if we loop in extent_write_cache_pages
Btrfs: fix race between adding and putting tree mod seq elements and nodes
Linus Torvalds [Mon, 3 Feb 2020 16:59:51 +0000 (16:59 +0000)]
Merge tag 'kgdb-5.6-rc1' of git://git./linux/kernel/git/danielt/linux
Pull kgdb updates from Daniel Thompson:
"Everything for kgdb this time around is either simplifications or
clean ups.
In particular Douglas Anderson's modifications to the backtrace
machine in the *last* dev cycle have enabled Doug to tidy up some MIPS
specific backtrace code and stop sharing certain data structures
across the kernel. Note that The MIPS folks were on Cc: for the MIPS
patch and reacted positively (but without an explicit Acked-by).
Doug also got rid of the implicit switching between tasks and register
sets during some but not of kdb's backtrace actions (because the
implicit switching was either confusing for users, pointless or both).
Finally there is a coverity fix and patch to replace open coded
console traversal with the proper helper function"
* tag 'kgdb-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux:
kdb: Use for_each_console() helper
kdb: remove redundant assignment to pointer bp
kdb: Get rid of confusing diag msg from "rd" if current task has no regs
kdb: Gid rid of implicit setting of the current task / regs
kdb: kdb_current_task shouldn't be exported
kdb: kdb_current_regs should be private
MIPS: kdb: Remove old workaround for backtracing on other CPUs
Enric Balletbo i Serra [Wed, 22 Jan 2020 09:07:01 +0000 (10:07 +0100)]
platform/chrome: cros_ec: Match implementation with headers
The 'cros_ec' core driver is the common interface for the cros_ec
transport drivers to do the shared operations to register, unregister,
suspend, resume and handle_event. The interface is provided by including
the header 'include/linux/platform_data/cros_ec_proto.h', however, instead
of have the implementation of these functions in cros_ec_proto.c, it is in
'cros_ec.c', which is a different kernel module. Apart from being a bad
practice, this can induce confusions allowing the users of the cros_ec
protocol to call these functions.
The register, unregister, suspend, resume and handle_event functions
*should* only be called by the different transport drivers (i2c, spi, lpc,
etc.), so make this a bit less confusing by moving these functions from
the public in-kernel space to a private include in platform/chrome, and
then, the interface for cros_ec module and for the cros_ec_proto module is
clean.
Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Signed-off-by: Benson Leung <bleung@chromium.org>
Linus Torvalds [Mon, 3 Feb 2020 14:57:33 +0000 (14:57 +0000)]
Merge tag 'char-misc-5.6-rc1-2' of git://git./linux/kernel/git/gregkh/char-misc
Pull char/misc fix from Greg KH:
"Here is a single patch, that fixes up a commit that came in the
previous char/misc merge.
It fixes a bug in the hpet driver that everyone keeps tripping over in
their automated testing. Good thing is, people are catching it. Bad
thing it wasn't caught by anyone testing before this. Oh well...
This has been in linux-next for a few days with no reported issues"
* tag 'char-misc-5.6-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
char: hpet: Fix out-of-bounds read bug
Linus Torvalds [Mon, 3 Feb 2020 14:55:08 +0000 (14:55 +0000)]
Merge tag 'backlight-next-5.6' of git://git./linux/kernel/git/lee/backlight
Pull backlight updates from Lee Jones:
"Fix-ups:
- Remove superfluous code in ams369fg06
- Convert over to GPIO descriptor (gpiod) in bd6107
Bug Fixes:
- Fix unsigned comparison to less than zero in qcom-wled"
* tag 'backlight-next-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight:
backlight: qcom-wled: Fix unsigned comparison to zero
backlight: bd6107: Convert to use GPIO descriptor
backlight: ams369fg06: Drop GPIO include
Linus Torvalds [Mon, 3 Feb 2020 14:51:57 +0000 (14:51 +0000)]
Merge tag 'mfd-next-5.6' of git://git./linux/kernel/git/lee/mfd
Pull MFD updates from Lee Jones:
"New Drivers:
- Add support for ROHM
BD71828 PMICs and GPIOs
- Add support for Qualcomm Aqstic Audio Codecs WCD9340 and WCD9341
New Device Support:
- Add support for
BD71828 to
BD70528 RTC driver
- Add support for Intel's Jasper Lake to LPSS PCI
New Functionality:
- Add support for Power Key to ROHM
BD71828
- Add support for Clocks to ROHM
BD71828
- Add support for GPIOs to Dialog DA9062
- Add support for USB PD Notify to ChromiumOS EC
- Allow callers to specify args when requesting regmap lookup; syscon
Fix-ups:
- Improve error handling and sanity checking; atmel-hlcdc, dln2
- Device Tree support/documentation;
bd71828, da9062, xylon,logicvc,
ab8500, max14577, atmel-usart
- Match devices using platform IDs; bd7xxxx
- Refactor BD718x7 regulator component; bd718x7-regulator
- Use standard interfaces/helpers; syscon, sm501
- Trivial (whitespace, spelling, etc); ab8500-core, Kconfig
- Remove unused code; db8500-prcmu, tqmx86
- Wait until boot has finished before accessing registers;
madera-core
- Provide missing register value defaults; cs47l15-tables
- Allow more time for hardware to reset; madera-core
Bug Fixes:
- Fix erroneous register values; rohm-
bd70528
- Fix register volatility; axp20x, rn5t618
- Fix Kconfig dependencies; MFD_MAX77650
- Fix incorrect compatible string; da9062-core
- Fix syscon_regmap_lookup_by_phandle_args() stub; syscon"
* tag 'mfd-next-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (41 commits)
mfd: syscon: Fix syscon_regmap_lookup_by_phandle_args() dummy
mfd: wcd934x: Add support to wcd9340/wcd9341 codec
mfd: syscon: Add arguments support for syscon reference
mfd: rn5t618: Mark ADC control register volatile
dt-bindings: atmel-usart: Add microchip,sam9x60-{usart, dbgu}
dt-bindings: atmel-usart: Remove wildcard
mfd: cros_ec: Add cros-usbpd-notify subdevice
mfd: da9062: Fix watchdog compatible string
mfd: madera: Allow more time for hardware reset
mfd: cs47l15: Add missing register default
mfd: madera: Wait for boot done before accessing any other registers
mfd: Kconfig: Rename Samsung to lowercase
mfd: tqmx86: remove set but not used variable 'i2c_ien'
mfd: dbx500-prcmu: Drop DSI pll clock functions
mfd: dbx500-prcmu: Drop set_display_clocks()
mfd: max77650: Select REGMAP_IRQ in Kconfig
mfd: axp20x: Mark AXP20X_VBUS_IPSOUT_MGMT as volatile
mfd: ab8500: Fix ab8500-clk typo
mfd: intel-lpss: Add Intel Jasper Lake PCI IDs
dt-bindings: mfd: max14577: Add reference to max14040_battery.txt descriptions
...
Linus Torvalds [Mon, 3 Feb 2020 14:42:03 +0000 (14:42 +0000)]
Merge tag 'hyperv-next-signed' of git://git./linux/kernel/git/hyperv/linux
Pull Hyper-V updates from Sasha Levin:
- Most of the commits here are work to enable host-initiated
hibernation support by Dexuan Cui.
- Fix for a warning shown when host sends non-aligned balloon requests
by Tianyu Lan.
* tag 'hyperv-next-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
hv_utils: Add the support of hibernation
hv_utils: Support host-initiated hibernation request
hv_utils: Support host-initiated restart request
Tools: hv: Reopen the devices if read() or write() returns errors
video: hyperv: hyperv_fb: Use physical memory for fb on HyperV Gen 1 VMs.
Drivers: hv: vmbus: Ignore CHANNELMSG_TL_CONNECT_RESULT(23)
video: hyperv_fb: Fix hibernation for the deferred IO feature
Input: hyperv-keyboard: Add the support of hibernation
hv_balloon: Balloon up according to request page number
Miklos Szeredi [Mon, 3 Feb 2020 10:41:53 +0000 (11:41 +0100)]
ovl: fix lseek overflow on 32bit
ovl_lseek() is using ssize_t to return the value from vfs_llseek(). On a
32-bit kernel ssize_t is a 32-bit signed int, which overflows above 2 GB.
Assign the return value of vfs_llseek() to loff_t to fix this.
Reported-by: Boris Gjenero <boris.gjenero@gmail.com>
Fixes:
9e46b840c705 ("ovl: support stacked SEEK_HOLE/SEEK_DATA")
Cc: <stable@vger.kernel.org> # v4.19
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
David Howells [Thu, 30 Jan 2020 21:50:36 +0000 (21:50 +0000)]
rxrpc: Fix NULL pointer deref due to call->conn being cleared on disconnect
When a call is disconnected, the connection pointer from the call is
cleared to make sure it isn't used again and to prevent further attempted
transmission for the call. Unfortunately, there might be a daemon trying
to use it at the same time to transmit a packet.
Fix this by keeping call->conn set, but setting a flag on the call to
indicate disconnection instead.
Remove also the bits in the transmission functions where the conn pointer is
checked and a ref taken under spinlock as this is now redundant.
Fixes:
8d94aa381dab ("rxrpc: Calls shouldn't hold socket refs")
Signed-off-by: David Howells <dhowells@redhat.com>
Geert Uytterhoeven [Thu, 30 Jan 2020 12:55:29 +0000 (13:55 +0100)]
mfd: syscon: Fix syscon_regmap_lookup_by_phandle_args() dummy
If CONFIG_MFD_SYSCON=n:
include/linux/mfd/syscon.h:54:23: warning: ‘syscon_regmap_lookup_by_phandle_args’ defined but not used [-Wunused-function]
Fix this by adding the missing inline keyword.
Fixes:
6a24f567af4accef ("mfd: syscon: Add arguments support for syscon reference")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Jakub Kicinski [Sun, 2 Feb 2020 21:39:11 +0000 (13:39 -0800)]
Merge branch 'Fix-reconnection-latency-caused-by-FIN-ACK-handling-race'
SeongJae Park says:
====================
Fix reconnection latency caused by FIN/ACK handling race
The first patch fixes the problem by adjusting the first resend delay of
the SYN in the case. The second one adds a user space test to reproduce
this problem.
From v2
(https://lore.kernel.org/linux-kselftest/
20200201071859.4231-1-sj38.park@gmail.com/)
- Use TCP_TIMEOUT_MIN as reduced delay (Neal Cardwall)
- Add Reviewed-by and Signed-off-by from Eric Dumazet
From v1
(https://lore.kernel.org/linux-kselftest/
20200131122421.23286-1-sjpark@amazon.com/)
- Drop the trivial comment fix patch (Eric Dumazet)
- Limit the delay adjustment to only the first SYN resend (Eric Dumazet)
- selftest: Avoid use of hard-coded port number (Eric Dumazet)
- Explain RST/ACK and FIN/ACK has no big difference (Neal Cardwell)
====================
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
SeongJae Park [Sun, 2 Feb 2020 03:38:27 +0000 (03:38 +0000)]
selftests: net: Add FIN_ACK processing order related latency spike test
This commit adds a test for FIN_ACK process races related reconnection
latency spike issues. The issue has described and solved by the
previous commit ("tcp: Reduce SYN resend delay if a suspicous ACK is
received").
The test program is configured with a server and a client process. The
server creates and binds a socket to a port that dynamically allocated,
listen on it, and start a infinite loop. Inside the loop, it accepts
connection, reads 4 bytes from the socket, and closes the connection.
The client is constructed as an infinite loop. Inside the loop, it
creates a socket with LINGER and NODELAY option, connect to the server,
send 4 bytes data, try read some data from server. After the read()
returns, it measure the latency from the beginning of this loop to this
point and if the latency is larger than 1 second (spike), print a
message.
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
SeongJae Park [Sun, 2 Feb 2020 03:38:26 +0000 (03:38 +0000)]
tcp: Reduce SYN resend delay if a suspicous ACK is received
When closing a connection, the two acks that required to change closing
socket's status to FIN_WAIT_2 and then TIME_WAIT could be processed in
reverse order. This is possible in RSS disabled environments such as a
connection inside a host.
For example, expected state transitions and required packets for the
disconnection will be similar to below flow.
00 (Process A) (Process B)
01 ESTABLISHED ESTABLISHED
02 close()
03 FIN_WAIT_1
04 ---FIN-->
05 CLOSE_WAIT
06 <--ACK---
07 FIN_WAIT_2
08 <--FIN/ACK---
09 TIME_WAIT
10 ---ACK-->
11 LAST_ACK
12 CLOSED CLOSED
In some cases such as LINGER option applied socket, the FIN and FIN/ACK
will be substituted to RST and RST/ACK, but there is no difference in
the main logic.
The acks in lines 6 and 8 are the acks. If the line 8 packet is
processed before the line 6 packet, it will be just ignored as it is not
a expected packet, and the later process of the line 6 packet will
change the status of Process A to FIN_WAIT_2, but as it has already
handled line 8 packet, it will not go to TIME_WAIT and thus will not
send the line 10 packet to Process B. Thus, Process B will left in
CLOSE_WAIT status, as below.
00 (Process A) (Process B)
01 ESTABLISHED ESTABLISHED
02 close()
03 FIN_WAIT_1
04 ---FIN-->
05 CLOSE_WAIT
06 (<--ACK---)
07 (<--FIN/ACK---)
08 (fired in right order)
09 <--FIN/ACK---
10 <--ACK---
11 (processed in reverse order)
12 FIN_WAIT_2
Later, if the Process B sends SYN to Process A for reconnection using
the same port, Process A will responds with an ACK for the last flow,
which has no increased sequence number. Thus, Process A will send RST,
wait for TIMEOUT_INIT (one second in default), and then try
reconnection. If reconnections are frequent, the one second latency
spikes can be a big problem. Below is a tcpdump results of the problem:
14.436259 IP 127.0.0.1.45150 > 127.0.0.1.4242: Flags [S], seq
2560603644
14.436266 IP 127.0.0.1.4242 > 127.0.0.1.45150: Flags [.], ack 5, win 512
14.436271 IP 127.0.0.1.45150 > 127.0.0.1.4242: Flags [R], seq
2541101298
/* ONE SECOND DELAY */
15.464613 IP 127.0.0.1.45150 > 127.0.0.1.4242: Flags [S], seq
2560603644
This commit mitigates the problem by reducing the delay for the next SYN
if the suspicous ACK is received while in SYN_SENT state.
Following commit will add a selftest, which can be also helpful for
understanding of this issue.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Lukas Bulwahn [Sat, 1 Feb 2020 12:43:01 +0000 (13:43 +0100)]
MAINTAINERS: correct entries for ISDN/mISDN section
Commit
6d97985072dc ("isdn: move capi drivers to staging") cleaned up the
isdn drivers and split the MAINTAINERS section for ISDN, but missed to add
the terminal slash for the two directories mISDN and hardware. Hence, all
files in those directories were not part of the new ISDN/mISDN SUBSYSTEM,
but were considered to be part of "THE REST".
Rectify the situation, and while at it, also complete the section with two
further build files that belong to that subsystem.
This was identified with a small script that finds all files belonging to
"THE REST" according to the current MAINTAINERS file, and I investigated
upon its output.
Fixes:
6d97985072dc ("isdn: move capi drivers to staging")
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Sun, 2 Feb 2020 19:50:58 +0000 (11:50 -0800)]
Merge git://git./linux/kernel/git/davem/sparc
Pull sparc fix from David Miller:
"adjtimex regression fix from Arnd"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc64: fix adjtimex regression
Linus Torvalds [Sun, 2 Feb 2020 19:48:46 +0000 (11:48 -0800)]
Merge tag 'leds-5.6-rc1' of git://git./linux/kernel/git/pavel/linux-leds
Pull LED updates from Pavel Machek:
- New driver for TI TPS6105X
- Add managed API to get a LED from a device driver
- Misc fixes and updates
* tag 'leds-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/pavel/linux-leds: (22 commits)
leds: lm3692x: Disable chip on brightness 0
leds: lm3692x: Split out lm3692x_leds_disable
leds: lm3692x: Move lm3692x_init and rename to lm3692x_leds_enable
leds: lm3692x: Make sure we don't exceed the maximum LED current
dt: bindings: lm3692x: Add led-max-microamp property
leds: lm3692x: Allow to configure over voltage protection
dt: bindings: lm3692x: Add ti,ovp-microvolt property
leds: populate the device's of_node
leds: Add managed API to get a LED from a device driver
leds: Add of_led_get() and led_put()
leds: lm3532: add pointer to documentation and fix typo
leds: lm3532: use extended registration so that LED can be used for backlight
leds: lm3642: remove warnings for bad strtol, cleanup gotos
leds: rb532: cleanup whitespace
ledtrig-pattern: fix email address quoting in MODULE_AUTHOR()
dt-bindings: mfd: update TI tps6105x chip bindings
leds: tps6105x: add driver for MFD chip LED mode
led: max77650: add of_match table
leds: bd2802: Convert to use GPIO descriptors
leds: pca963x: Fix open-drain initialization
...
Linus Torvalds [Sun, 2 Feb 2020 19:31:52 +0000 (11:31 -0800)]
Merge branch 'pcmcia-next' of git://git./linux/kernel/git/brodo/linux
Pull pcmcia updates from Dominik Brodowski:
"This is a series co-developed by Simon Geis and Lukas Panzer to clean
up the i82092 PCMCIA device driver"
* 'pcmcia-next' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux:
PCMCIA/i82092: remove #if 0 block
PCMCIA/i82092: delete enter/leave macro
PCMCIA/i82092: include <linux/io.h> instead of <asm/io.h>
PCMCIA/i82092: shorten the lines with over 80 characters
PCMCIA/i82092: move assignment out of if condition
PCMCIA/i82092: change code indentation
PCMCIA/i82092: insert blank line after declarations
PCMCIA/i82092: remove braces around single statement blocks
PCMCIA/i82092: add/remove spaces to improve readability
PCMCIA/i82092: use dev_<level> instead of printk
Josef Bacik [Fri, 31 Jan 2020 14:31:05 +0000 (09:31 -0500)]
btrfs: do not zero f_bavail if we have available space
There was some logic added a while ago to clear out f_bavail in statfs()
if we did not have enough free metadata space to satisfy our global
reserve. This was incorrect at the time, however didn't really pose a
problem for normal file systems because we would often allocate chunks
if we got this low on free metadata space, and thus wouldn't really hit
this case unless we were actually full.
Fast forward to today and now we are much better about not allocating
metadata chunks all of the time. Couple this with
d792b0f19711 ("btrfs:
always reserve our entire size for the global reserve") which now means
we'll easily have a larger global reserve than our free space, we are
now more likely to trip over this while still having plenty of space.
Fix this by skipping this logic if the global rsv's space_info is not
full. space_info->full is 0 unless we've attempted to allocate a chunk
for that space_info and that has failed. If this happens then the space
for the global reserve is definitely sacred and we need to report
b_avail == 0, but before then we can just use our calculated b_avail.
Reported-by: Martin Steigerwald <martin@lichtvoll.de>
Fixes:
ca8a51b3a979 ("btrfs: statfs: report zero available if metadata are exhausted")
CC: stable@vger.kernel.org # 4.5+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Tested-By: Martin Steigerwald <martin@lichtvoll.de>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Arnd Bergmann [Sat, 1 Feb 2020 21:20:52 +0000 (22:20 +0100)]
sparc64: fix adjtimex regression
Anatoly Pugachev reported one of the y2038 patches to introduce
a fatal bug from a stupid typo:
[ 96.384129] watchdog: BUG: soft lockup - CPU#8 stuck for 22s!
...
[ 96.385624] [
0000000000652ca4] handle_mm_fault+0x84/0x320
[ 96.385668] [
0000000000b6f2bc] do_sparc64_fault+0x43c/0x820
[ 96.385720] [
0000000000407754] sparc64_realfault_common+0x10/0x20
[ 96.385769] [
000000000042fa28] __do_sys_sparc_clock_adjtime+0x28/0x80
[ 96.385819] [
00000000004307f0] sys_sparc_clock_adjtime+0x10/0x20
[ 96.385866] [
0000000000406294] linux_sparc_syscall+0x34/0x44
Fix the code to dereference the correct pointer again.
Reported-by: Anatoly Pugachev <matorola@gmail.com>
Tested-by: Anatoly Pugachev <matorola@gmail.com>
Fixes:
251ec1c159e4 ("y2038: sparc: remove use of struct timex")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Sat, 1 Feb 2020 20:38:20 +0000 (12:38 -0800)]
Merge git://git./pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Fix suspicious RCU usage in ipset, from Jozsef Kadlecsik.
2) Use kvcalloc, from Joe Perches.
3) Flush flowtable hardware workqueue after garbage collection run,
from Paul Blakey.
4) Missing flowtable hardware workqueue flush from nf_flow_table_free(),
also from Paul.
5) Restore NF_FLOW_HW_DEAD in flow_offload_work_del(), from Paul.
6) Flowtable documentation fixes, from Matteo Croce.
====================
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Fri, 31 Jan 2020 23:27:04 +0000 (15:27 -0800)]
cls_rsvp: fix rsvp_policy
NLA_BINARY can be confusing, since .len value represents
the max size of the blob.
cls_rsvp really wants user space to provide long enough data
for TCA_RSVP_DST and TCA_RSVP_SRC attributes.
BUG: KMSAN: uninit-value in rsvp_get net/sched/cls_rsvp.h:258 [inline]
BUG: KMSAN: uninit-value in gen_handle net/sched/cls_rsvp.h:402 [inline]
BUG: KMSAN: uninit-value in rsvp_change+0x1ae9/0x4220 net/sched/cls_rsvp.h:572
CPU: 1 PID: 13228 Comm: syz-executor.1 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1c9/0x220 lib/dump_stack.c:118
kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:118
__msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:215
rsvp_get net/sched/cls_rsvp.h:258 [inline]
gen_handle net/sched/cls_rsvp.h:402 [inline]
rsvp_change+0x1ae9/0x4220 net/sched/cls_rsvp.h:572
tc_new_tfilter+0x31fe/0x5010 net/sched/cls_api.c:2104
rtnetlink_rcv_msg+0xcb7/0x1570 net/core/rtnetlink.c:5415
netlink_rcv_skb+0x451/0x650 net/netlink/af_netlink.c:2477
rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:5442
netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
netlink_unicast+0xf9e/0x1100 net/netlink/af_netlink.c:1328
netlink_sendmsg+0x1248/0x14d0 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:639 [inline]
sock_sendmsg net/socket.c:659 [inline]
____sys_sendmsg+0x12b6/0x1350 net/socket.c:2330
___sys_sendmsg net/socket.c:2384 [inline]
__sys_sendmsg+0x451/0x5f0 net/socket.c:2417
__do_sys_sendmsg net/socket.c:2426 [inline]
__se_sys_sendmsg+0x97/0xb0 net/socket.c:2424
__x64_sys_sendmsg+0x4a/0x70 net/socket.c:2424
do_syscall_64+0xb8/0x160 arch/x86/entry/common.c:296
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x45b349
Code: ad b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:
00007f269d43dc78 EFLAGS:
00000246 ORIG_RAX:
000000000000002e
RAX:
ffffffffffffffda RBX:
00007f269d43e6d4 RCX:
000000000045b349
RDX:
0000000000000000 RSI:
00000000200001c0 RDI:
0000000000000003
RBP:
000000000075bfc8 R08:
0000000000000000 R09:
0000000000000000
R10:
0000000000000000 R11:
0000000000000246 R12:
00000000ffffffff
R13:
00000000000009c2 R14:
00000000004cb338 R15:
000000000075bfd4
Uninit was created at:
kmsan_save_stack_with_flags mm/kmsan/kmsan.c:144 [inline]
kmsan_internal_poison_shadow+0x66/0xd0 mm/kmsan/kmsan.c:127
kmsan_slab_alloc+0x8a/0xe0 mm/kmsan/kmsan_hooks.c:82
slab_alloc_node mm/slub.c:2774 [inline]
__kmalloc_node_track_caller+0xb40/0x1200 mm/slub.c:4382
__kmalloc_reserve net/core/skbuff.c:141 [inline]
__alloc_skb+0x2fd/0xac0 net/core/skbuff.c:209
alloc_skb include/linux/skbuff.h:1049 [inline]
netlink_alloc_large_skb net/netlink/af_netlink.c:1174 [inline]
netlink_sendmsg+0x7d3/0x14d0 net/netlink/af_netlink.c:1892
sock_sendmsg_nosec net/socket.c:639 [inline]
sock_sendmsg net/socket.c:659 [inline]
____sys_sendmsg+0x12b6/0x1350 net/socket.c:2330
___sys_sendmsg net/socket.c:2384 [inline]
__sys_sendmsg+0x451/0x5f0 net/socket.c:2417
__do_sys_sendmsg net/socket.c:2426 [inline]
__se_sys_sendmsg+0x97/0xb0 net/socket.c:2424
__x64_sys_sendmsg+0x4a/0x70 net/socket.c:2424
do_syscall_64+0xb8/0x160 arch/x86/entry/common.c:296
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes:
6fa8c0144b77 ("[NET_SCHED]: Use nla_policy for attribute validation in classifiers")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sven Eckelmann [Fri, 31 Jan 2020 08:59:19 +0000 (09:59 +0100)]
MAINTAINERS: Orphan HSR network protocol
The current maintainer Arvid Brodin <arvid.brodin@alten.se> hasn't
contributed to the kernel since 2015-02-27. His company mail address is
also bouncing and the company confirmed (2020-01-31) that no Arvid Brodin
is working for them:
> Vi har dessvärre ingen Arvid Brodin som arbetar på ALTEN.
A MIA person cannot be the maintainer. It is better to mark is as orphaned
until some other person can jump in and take over the responsibility for
HSR.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dan Carpenter [Fri, 31 Jan 2020 05:03:26 +0000 (08:03 +0300)]
qed: Fix a error code in qed_hw_init()
If the qed_fw_overlay_mem_alloc() then we should return -ENOMEM instead
of success.
Fixes:
30d5f85895fa ("qed: FW 8.42.2.0 Add fw overlay feature")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dan Carpenter [Fri, 31 Jan 2020 05:02:41 +0000 (08:02 +0300)]
octeontx2-pf: Fix an IS_ERR() vs NULL bug
The otx2_mbox_get_rsp() function never returns NULL, it returns error
pointers on error.
Fixes:
34bfe0ebedb7 ("octeontx2-pf: MTU, MAC and RX mode config support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Sat, 1 Feb 2020 19:22:41 +0000 (11:22 -0800)]
Merge tag '5.6-rc-small-smb3-fix-for-stable' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fix from Steve French:
"Small SMB3 fix for stable (fixes problem with soft mounts)"
* tag '5.6-rc-small-smb3-fix-for-stable' of git://git.samba.org/sfrench/cifs-2.6:
cifs: update internal module version number
cifs: fix soft mounts hanging in the reconnect code
Al Viro [Sat, 1 Feb 2020 16:26:45 +0000 (16:26 +0000)]
vfs: fix do_last() regression
Brown paperbag time: fetching ->i_uid/->i_mode really should've been
done from nd->inode. I even suggested that, but the reason for that has
slipped through the cracks and I went for dir->d_inode instead - made
for more "obvious" patch.
Analysis:
- at the entry into do_last() and all the way to step_into(): dir (aka
nd->path.dentry) is known not to have been freed; so's nd->inode and
it's equal to dir->d_inode unless we are already doomed to -ECHILD.
inode of the file to get opened is not known.
- after step_into(): inode of the file to get opened is known; dir
might be pointing to freed memory/be negative/etc.
- at the call of may_create_in_sticky(): guaranteed to be out of RCU
mode; inode of the file to get opened is known and pinned; dir might
be garbage.
The last was the reason for the original patch. Except that at the
do_last() entry we can be in RCU mode and it is possible that
nd->path.dentry->d_inode has already changed under us.
In that case we are going to fail with -ECHILD, but we need to be
careful; nd->inode is pointing to valid struct inode and it's the same
as nd->path.dentry->d_inode in "won't fail with -ECHILD" case, so we
should use that.
Reported-by: "Rantala, Tommi T. (Nokia - FI/Espoo)" <tommi.t.rantala@nokia.com>
Reported-by: syzbot+190005201ced78a74ad6@syzkaller.appspotmail.com
Wearing-brown-paperbag: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@kernel.org
Fixes:
d0cb50185ae9 ("do_last(): fetch directory ->i_mode and ->i_uid before it's too late")
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 1 Feb 2020 18:25:55 +0000 (10:25 -0800)]
Merge tag 'kconfig-v5.6' of git://git./linux/kernel/git/masahiroy/linux-kbuild
Pull Kconfig updates from Masahiro Yamada:
- add 'yes2modconfig' and 'mod2yesconfig' targets (useful mainly for
turning syzbot configs into more modular ones as a step to minimizing
the result)
- sanitize help text
- various code cleanups
* tag 'kconfig-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kconfig: fix documentation typos
kconfig: fix an "implicit declaration of function" warning
kconfig: fix nesting of symbol help text
kconfig: distinguish between dependencies and visibility in help text
kconfig: list all definitions of a symbol in help text
kconfig: Add yes2modconfig and mod2yesconfig targets.
kconfig: use $(PERL) in Makefile
kconfig: fix too deep indentation in Makefile
kconfig: localmodconfig: fix indentation for closing brace
kconfig: localmodconfig: remove unused $config
kconfig: squash prop_alloc() into menu_add_prop()
kconfig: remove sym from struct property
kconfig: remove 'prompt' argument from menu_add_prop()
kconfig: move prompt handling to menu_add_prompt() from menu_add_prop()
kconfig: remove 'prompt' symbol
kconfig: drop T_WORD from the RHS of 'prompt' symbol
kconfig: use parent->dep as the parentdep of 'menu'
kconfig: remove the rootmenu check in menu_add_prop()
Linus Torvalds [Sat, 1 Feb 2020 18:01:52 +0000 (10:01 -0800)]
Merge tag 'kbuild-v5.6' of git://git./linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- detect missing include guard in UAPI headers
- do not create orphan built-in.a or obj-y objects
- generate modules.builtin more simply, and drop tristate.conf
- simplify built-in initramfs creation
- make linux-headers deb package thinner
- optimize the deb package build script
- misc cleanups
* tag 'kbuild-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (34 commits)
builddeb: split libc headers deployment out into a function
builddeb: split kernel headers deployment out into a function
builddeb: remove redundant make for ARCH=um
builddeb: avoid invoking sub-shells where possible
builddeb: remove redundant $objtree/
builddeb: match temporary directory name to the package name
builddeb: remove unneeded files in hdrobjfiles for headers package
kbuild: use -S instead of -E for precise cc-option test in Kconfig
builddeb: allow selection of .deb compressor
kbuild: remove 'Building modules, stage 2.' log
kbuild: remove *.tmp file when filechk fails
kbuild: remove PYTHON2 variable
modpost: assume STT_SPARC_REGISTER is defined
gen_initramfs.sh: remove intermediate cpio_list on errors
initramfs: refactor the initramfs build rules
gen_initramfs.sh: always output cpio even without -o option
initramfs: add default_cpio_list, and delete -d option support
initramfs: generate dependency list and cpio at the same time
initramfs: specify $(src)/gen_initramfs.sh as a prerequisite in Makefile
initramfs: make initramfs compression choice non-optional
...
Linus Torvalds [Sat, 1 Feb 2020 17:48:37 +0000 (09:48 -0800)]
Merge tag 'random_for_linus' of git://git./linux/kernel/git/tytso/random
Pull random changes from Ted Ts'o:
"Change /dev/random so that it uses the CRNG and only blocking if the
CRNG hasn't initialized, instead of the old blocking pool. Also clean
up archrandom.h, and some other miscellaneous cleanups"
* tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random: (24 commits)
s390x: Mark archrandom.h functions __must_check
powerpc: Mark archrandom.h functions __must_check
powerpc: Use bool in archrandom.h
x86: Mark archrandom.h functions __must_check
linux/random.h: Mark CONFIG_ARCH_RANDOM functions __must_check
linux/random.h: Use false with bool
linux/random.h: Remove arch_has_random, arch_has_random_seed
s390: Remove arch_has_random, arch_has_random_seed
powerpc: Remove arch_has_random, arch_has_random_seed
x86: Remove arch_has_random, arch_has_random_seed
random: remove some dead code of poolinfo
random: fix typo in add_timer_randomness()
random: Add and use pr_fmt()
random: convert to ENTROPY_BITS for better code readability
random: remove unnecessary unlikely()
random: remove kernel.random.read_wakeup_threshold
random: delete code to pull data into pools
random: remove the blocking pool
random: make /dev/random be almost like /dev/urandom
random: ignore GRND_RANDOM in getentropy(2)
...
Michael Ellerman [Sat, 1 Feb 2020 10:47:17 +0000 (21:47 +1100)]
Merge branch 'topic/user-access-begin' into next
Merge the user_access_begin() series from Christophe. This is based on
a commit from Linus that went into v5.5-rc7.
Eric Dumazet [Fri, 31 Jan 2020 18:44:50 +0000 (10:44 -0800)]
tcp: clear tp->segs_{in|out} in tcp_disconnect()
tp->segs_in and tp->segs_out need to be cleared in tcp_disconnect().
tcp_disconnect() is rarely used, but it is worth fixing it.
Fixes:
2efd055c53c0 ("tcp: add tcpi_segs_in and tcpi_segs_out to tcp_info")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Marcelo Ricardo Leitner <mleitner@redhat.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Fri, 31 Jan 2020 18:32:41 +0000 (10:32 -0800)]
tcp: clear tp->data_segs{in|out} in tcp_disconnect()
tp->data_segs_in and tp->data_segs_out need to be cleared
in tcp_disconnect().
tcp_disconnect() is rarely used, but it is worth fixing it.
Fixes:
a44d6eacdaf5 ("tcp: Add RFC4898 tcpEStatsPerfDataSegsOut/In")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Fri, 31 Jan 2020 18:22:47 +0000 (10:22 -0800)]
tcp: clear tp->delivered in tcp_disconnect()
tp->delivered needs to be cleared in tcp_disconnect().
tcp_disconnect() is rarely used, but it is worth fixing it.
Fixes:
ddf1af6fa00e ("tcp: new delivery accounting")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Fri, 31 Jan 2020 17:14:47 +0000 (09:14 -0800)]
tcp: clear tp->total_retrans in tcp_disconnect()
total_retrans needs to be cleared in tcp_disconnect().
tcp_disconnect() is rarely used, but it is worth fixing it.
Fixes:
1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: SeongJae Park <sjpark@amazon.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dmitry Torokhov [Sat, 1 Feb 2020 01:42:33 +0000 (17:42 -0800)]
Merge branch 'next' into for-linus
Prepare input updates for 5.6 merge window.
Lucas Stach [Sat, 1 Feb 2020 01:38:19 +0000 (17:38 -0800)]
Input: synaptics-rmi4 - switch to reduced reporting mode
When the distance thresholds are set the controller must be in reduced
reporting mode for them to have any effect on the interrupt generation.
This has a potentially large impact on the number of events the host
needs to process.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Andrew Duggan <aduggan@synaptics.com>
Link: https://lore.kernel.org/r/20200120111628.18376-1-l.stach@pengutronix.de
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Linus Torvalds [Fri, 31 Jan 2020 22:48:54 +0000 (14:48 -0800)]
Merge tag 'pci-v5.6-changes' of git://git./linux/kernel/git/helgaas/pci
Pull PCI updates from Bjorn Helgaas:
"Resource management:
- Improve resource assignment for hot-added nested bridges, e.g.,
Thunderbolt (Nicholas Johnson)
Power management:
- Optionally print config space of devices before suspend (Chen Yu)
- Increase D3 delay for AMD Ryzen5/7 XHCI controllers (Daniel Drake)
Virtualization:
- Generalize DMA alias quirks (James Sewart)
- Add DMA alias quirk for PLX PEX NTB (James Sewart)
- Fix IOV memory leak (Navid Emamdoost)
AER:
- Log which device prevents error recovery (Yicong Yang)
Peer-to-peer DMA:
- Whitelist Intel SkyLake-E (Armen Baloyan)
Broadcom iProc host bridge driver:
- Apply PAXC quirk whether driver is built-in or module (Wei Liu)
Broadcom STB host bridge driver:
- Add Broadcom STB PCIe host controller driver (Jim Quinlan)
Intel Gateway SoC host bridge driver:
- Add driver for Intel Gateway SoC (Dilip Kota)
Intel VMD host bridge driver:
- Add support for DMA aliases on other buses (Jon Derrick)
- Remove dma_map_ops overrides (Jon Derrick)
- Remove now-unused X86_DEV_DMA_OPS (Christoph Hellwig)
NVIDIA Tegra host bridge driver:
- Fix Tegra30 afi_pex2_ctrl register offset (Marcel Ziswiler)
Panasonic UniPhier host bridge driver:
- Remove module code since driver can't be built as a module
(Masahiro Yamada)
Qualcomm host bridge driver:
- Add support for SDM845 PCIe controller (Bjorn Andersson)
TI Keystone host bridge driver:
- Fix "num-viewport" DT property error handling (Kishon Vijay Abraham I)
- Fix link training retries initiation (Yurii Monakov)
- Fix outbound region mapping (Yurii Monakov)
Misc:
- Add Switchtec Gen4 support (Kelvin Cao)
- Add Switchtec Intercomm Notify and Upstream Error Containment
support (Logan Gunthorpe)
- Use dma_set_mask_and_coherent() since Switchtec supports 64-bit
addressing (Wesley Sheng)"
* tag 'pci-v5.6-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (60 commits)
PCI: Allow adjust_bridge_window() to shrink resource if necessary
PCI: Set resource size directly in adjust_bridge_window()
PCI: Rename extend_bridge_window() to adjust_bridge_window()
PCI: Rename extend_bridge_window() parameter
PCI: Consider alignment of hot-added bridges when assigning resources
PCI: Remove local variable usage in pci_bus_distribute_available_resources()
PCI: Pass size + alignment to pci_bus_distribute_available_resources()
PCI: Rename variables
PCI: vmd: Add two VMD Device IDs
PCI: Remove unnecessary braces
PCI: brcmstb: Add MSI support
PCI: brcmstb: Add Broadcom STB PCIe host controller driver
x86/PCI: Remove X86_DEV_DMA_OPS
PCI: vmd: Remove dma_map_ops overrides
iommu/vt-d: Remove VMD child device sanity check
iommu/vt-d: Use pci_real_dma_dev() for mapping
PCI: Introduce pci_real_dma_dev()
x86/PCI: Expose VMD's pci_dev in struct pci_sysdata
x86/PCI: Add to_pci_sysdata() helper
PCI/AER: Initialize aer_fifo
...
Linus Torvalds [Fri, 31 Jan 2020 22:43:23 +0000 (14:43 -0800)]
Merge tag 'media/v5.6-1' of git://git./linux/kernel/git/mchehab/linux-media
Pull media updates from Mauro Carvalho Chehab:
- New staging driver for Rockship ISPv1 unit
- New staging driver for Rockchip MIPI Synopsys DPHY RX0
- y2038 fixes at V4L2 API (backward-compatible)
- A dvb core fix when receiving invalid EIT sections
- Some clang-specific warnings got fixed
- Added support for touch V4L2 interface at vivid
- Several drivers were converted to use the new
i2c_new_scanned_device() kAPI
- Added sm1 support at meson's vdec driver
- Several other driver cleanups, fixes and improvements
* tag 'media/v5.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (207 commits)
media: staging/intel-ipu3: remove TODO item about acronyms
media: v4l2-fwnode: Print the node name while parsing endpoints
media: Revert "media: staging/intel-ipu3: make imgu use fixed running mode"
media: mt9v111: constify copied structure
media: platform: VIDEO_MEDIATEK_JPEG can also depend on MTK_IOMMU
media: uvcvideo: Add a quirk to force GEO GC6500 Camera bits-per-pixel value
media: uvcvideo: Avoid cyclic entity chains due to malformed USB descriptors
media: hantro: fix post-processing NULL pointer dereference
media: rcar-vin: Use correct pixel format when aligning format
media: MAINTAINERS: add entry for Rockchip ISP1 driver
media: staging: rkisp1: add TODO file for staging
media: staging: rkisp1: add document for rkisp1 meta buffer format
media: staging: rkisp1: add output device for parameters
media: staging: rkisp1: add capture device for statistics
media: staging: rkisp1: add user space ABI definitions
media: staging: rkisp1: add streaming paths
media: staging: rkisp1: add Rockchip ISP1 base driver
media: staging: phy-rockchip-dphy-rx0: add Rockchip MIPI Synopsys DPHY RX0 driver
media: staging: dt-bindings: add Rockchip MIPI RX D-PHY RX0 yaml bindings
media: staging: dt-bindings: add Rockchip ISP1 yaml bindings
...
Linus Torvalds [Fri, 31 Jan 2020 22:40:36 +0000 (14:40 -0800)]
Merge tag 'for-linus' of git://git./linux/kernel/git/rdma/rdma
Pull rdma updates from Jason Gunthorpe:
"A very quiet cycle with few notable changes. Mostly the usual list of
one or two patches to drivers changing something that isn't quite rc
worthy. The subsystem seems to be seeing a larger number of rework and
cleanup style patches right now, I feel that several vendors are
prepping their drivers for new silicon.
Summary:
- Driver updates and cleanup for qedr, bnxt_re, hns, siw, mlx5, mlx4,
rxe, i40iw
- Larger series doing cleanup and rework for hns and hfi1.
- Some general reworking of the CM code to make it a little more
understandable
- Unify the different code paths connected to the uverbs FD scheme
- New UAPI ioctls conversions for get context and get async fd
- Trace points for CQ and CM portions of the RDMA stack
- mlx5 driver support for virtio-net formatted rings as RDMA raw
ethernet QPs
- verbs support for setting the PCI-E relaxed ordering bit on DMA
traffic connected to a MR
- A couple of bug fixes that came too late to make rc7"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (108 commits)
RDMA/core: Make the entire API tree static
RDMA/efa: Mask access flags with the correct optional range
RDMA/cma: Fix unbalanced cm_id reference count during address resolve
RDMA/umem: Fix ib_umem_find_best_pgsz()
IB/mlx4: Fix leak in id_map_find_del
IB/opa_vnic: Spelling correction of 'erorr' to 'error'
IB/hfi1: Fix logical condition in msix_request_irq
RDMA/cm: Remove CM message structs
RDMA/cm: Use IBA functions for complex structure members
RDMA/cm: Use IBA functions for simple structure members
RDMA/cm: Use IBA functions for swapping get/set acessors
RDMA/cm: Use IBA functions for simple get/set acessors
RDMA/cm: Add SET/GET implementations to hide IBA wire format
RDMA/cm: Add accessors for CM_REQ transport_type
IB/mlx5: Return the administrative GUID if exists
RDMA/core: Ensure that rdma_user_mmap_entry_remove() is a fence
IB/mlx4: Fix memory leak in add_gid error flow
IB/mlx5: Expose RoCE accelerator counters
RDMA/mlx5: Set relaxed ordering when requested
RDMA/core: Add the core support field to METHOD_GET_CONTEXT
...
Linus Torvalds [Fri, 31 Jan 2020 22:39:21 +0000 (14:39 -0800)]
Merge tag 'thermal-v5.6-rc1-2' of git://git./linux/kernel/git/thermal/linux
Pull thermal fixes from Daniel Lezcano:
- Fix a severe docs build failure for cpu idle cooling device (Randy
Dunlap)
- Fix a spelling mistake in the error message for the stm32 (Colin Ian
King)
* tag 'thermal-v5.6-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux:
thermal: stm32: fix spelling mistake "preprare" -> "prepare"
Documentation: cpu-idle-cooling: fix a SEVERE docs build failure
Linus Torvalds [Fri, 31 Jan 2020 22:38:17 +0000 (14:38 -0800)]
Merge tag 'acpi-5.6-rc1-2' of git://git./linux/kernel/git/rafael/linux-pm
Pull more ACPI updates from Rafael Wysocki:
"Fix up MAINTAINERS entires related to ACPI (Andy Shevchenko)"
* tag 'acpi-5.6-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
MAINTAINERS: Sort entries in database for X-POWERS AXP288
MAINTAINERS: Sort entries in database for ACPICA
MAINTAINERS: Sort entries in database for ACPI
Linus Torvalds [Fri, 31 Jan 2020 22:36:35 +0000 (14:36 -0800)]
Merge tag 'pm-5.6-rc1-2' of git://git./linux/kernel/git/rafael/linux-pm
Pull more power manadement updates from Rafael Wysocki:
"Prevent cpufreq from creating excessively large stack frames and fix
the handling of devices deleted during system-wide resume in the PM
core (Rafael Wysocki), revert a problematic commit affecting the
cpupower utility and correct its man page (Thomas Renninger,
Brahadambal Srinivasan), and improve the intel_pstate_tracer utility
(Doug Smythies)"
* tag 'pm-5.6-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
tools/power/x86/intel_pstate_tracer: change several graphs to autoscale y-axis
tools/power/x86/intel_pstate_tracer: changes for python 3 compatibility
Correction to manpage of cpupower
cpufreq: Avoid creating excessively large stack frames
PM: core: Fix handling of devices deleted during system-wide resume
cpupower: Revert library ABI changes from commit
ae2917093fb60bdc1ed3e