Hugh Dickins [Thu, 2 Sep 2021 21:54:37 +0000 (14:54 -0700)]
huge tmpfs: shmem_is_huge(vma, inode, index)
Extend shmem_huge_enabled(vma) to shmem_is_huge(vma, inode, index), so
that a consistent set of checks can be applied, even when the inode is
accessed through read/write syscalls (with NULL vma) instead of mmaps (the
index argument is seldom of interest, but required by mount option
"huge=within_size"). Clean up and rearrange the checks a little.
This then replaces the checks which shmem_fault() and shmem_getpage_gfp()
were making, and eliminates the SGP_HUGE and SGP_NOHUGE modes.
Replace a couple of 0s by explicit SHMEM_HUGE_NEVERs; and replace the
obscure !shmem_mapping() symlink check by explicit S_ISLNK() - nothing
else needs that symlink check, so leave it there in shmem_getpage_gfp().
Link: https://lkml.kernel.org/r/23a77889-2ddc-b030-75cd-44ca27fd4d1@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Thu, 2 Sep 2021 21:54:34 +0000 (14:54 -0700)]
huge tmpfs: SGP_NOALLOC to stop collapse_file() on race
khugepaged's collapse_file() currently uses SGP_NOHUGE to tell
shmem_getpage() not to try allocating a huge page, in the very unlikely
event that a racing hole-punch removes the swapped or fallocated page as
soon as i_pages lock is dropped.
We want to consolidate shmem's huge decisions, removing SGP_HUGE and
SGP_NOHUGE; but cannot quite persuade ourselves that it's okay to regress
the protection in this case - Yang Shi points out that the huge page would
remain indefinitely, charged to root instead of the intended memcg.
collapse_file() should not even allocate a small page in this case: why
proceed if someone is punching a hole? SGP_READ is almost the right flag
here, except that it optimizes away from a fallocated page, with NULL to
tell caller to fill with zeroes (like a hole); whereas collapse_file()'s
sequence relies on using a cache page. Add SGP_NOALLOC just for this.
There are too many consecutive "if (page"s there in shmem_getpage_gfp():
group it better; and fix the outdated "bring it back from swap" comment.
Link: https://lkml.kernel.org/r/1355343b-acf-4653-ef79-6aee40214ac5@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Thu, 2 Sep 2021 21:54:31 +0000 (14:54 -0700)]
huge tmpfs: move shmem_huge_enabled() upwards
shmem_huge_enabled() is about to be enhanced into shmem_is_huge(), so that
it can be used more widely throughout: before making functional changes,
shift it to its final position (to avoid forward declaration).
Link: https://lkml.kernel.org/r/16fec7b7-5c84-415a-8586-69d8bf6a6685@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Thu, 2 Sep 2021 21:54:27 +0000 (14:54 -0700)]
huge tmpfs: revert shmem's use of transhuge_vma_enabled()
5.14 commit
e6be37b2e7bd ("mm/huge_memory.c: add missing read-only THP
checking in transparent_hugepage_enabled()") added transhuge_vma_enabled()
as a wrapper for two very different checks (one check is whether the app
has marked its address range not to use THPs, the other check is whether
the app is running in a hierarchy that has been marked never to use THPs).
shmem_huge_enabled() prefers to show those two checks explicitly, as
before.
Link: https://lkml.kernel.org/r/45e5338-18d-c6f9-c17e-34f510bc1728@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Thu, 2 Sep 2021 21:54:24 +0000 (14:54 -0700)]
huge tmpfs: remove shrinklist addition from shmem_setattr()
There's a block of code in shmem_setattr() to add the inode to
shmem_unused_huge_shrink()'s shrinklist when lowering i_size: it dates
from before 5.7 changed truncation to do split_huge_page() for itself, and
should have been removed at that time.
I am over-stating that: split_huge_page() can fail (notably if there's an
extra reference to the page at that time), so there might be value in
retrying. But there were already retries as truncation worked through the
tails, and this addition risks repeating unsuccessful retries
indefinitely: I'd rather remove it now, and work on reducing the chance of
split_huge_page() failures separately, if we need to.
Link: https://lkml.kernel.org/r/b73b3492-8822-18f9-83e2-938528cdde94@google.com
Fixes:
71725ed10c40 ("mm: huge tmpfs: try to split_huge_page() when punching hole")
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Thu, 2 Sep 2021 21:54:21 +0000 (14:54 -0700)]
huge tmpfs: fix split_huge_page() after FALLOC_FL_KEEP_SIZE
A successful shmem_fallocate() guarantees that the extent has been
reserved, even beyond i_size when the FALLOC_FL_KEEP_SIZE flag was used.
But that guarantee is broken by shmem_unused_huge_shrink()'s attempts to
split huge pages and free their excess beyond i_size; and by other uses of
split_huge_page() near i_size.
It's sad to add a shmem inode field just for this, but I did not find a
better way to keep the guarantee. A flag to say KEEP_SIZE has been used
would be cheaper, but I'm averse to unclearable flags. The fallocend
field is not perfect either (many disjoint ranges might be fallocated),
but good enough; and gains another use later on.
Link: https://lkml.kernel.org/r/ca9a146-3a59-6cd3-7f28-e9a044bb1052@google.com
Fixes:
779750d20b93 ("shmem: split huge pages beyond i_size under memory pressure")
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Thu, 2 Sep 2021 21:54:18 +0000 (14:54 -0700)]
huge tmpfs: fix fallocate(vanilla) advance over huge pages
Patch series "huge tmpfs: shmem_is_huge() fixes and cleanups".
A series of huge tmpfs fixes and cleanups.
This patch (of 9):
shmem_fallocate() goes to a lot of trouble to leave its newly allocated
pages !Uptodate, partly to identify and undo them on failure, partly to
leave the overhead of clearing them until later. But the huge page case
did not skip to the end of the extent, walked through the tail pages one
by one, and appeared to work just fine: but in doing so, cleared and
Uptodated the huge page, so there was no way to undo it on failure.
And by setting Uptodate too soon, it messed up both its nr_falloced and
nr_unswapped counts, so that the intended "time to give up" heuristic did
not work at all.
Now advance immediately to the end of the huge extent, with a comment on
why this is more than just an optimization. But although this speeds up
huge tmpfs fallocation, it does leave the clearing until first use, and
some users may have come to appreciate slow fallocate but fast first use:
if they complain, then we can consider adding a pass to clear at the end.
Link: https://lkml.kernel.org/r/da632211-8e3e-6b1-aee-ab24734429a0@google.com
Link: https://lkml.kernel.org/r/16201bd2-70e-37e2-e89b-5f929430da@google.com
Fixes:
800d8c63b2e9 ("shmem: add huge pages support")
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Miaohe Lin [Thu, 2 Sep 2021 21:54:15 +0000 (14:54 -0700)]
shmem: include header file to declare swap_info
It's bad to extern swap_info[] in .c. Include corresponding header file
instead.
Link: https://lkml.kernel.org/r/20210812120350.49801-5-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Miaohe Lin [Thu, 2 Sep 2021 21:54:12 +0000 (14:54 -0700)]
shmem: remove unneeded function forward declaration
The forward declaration for shmem_should_replace_page() and
shmem_replace_page() is unnecessary. Remove them.
Link: https://lkml.kernel.org/r/20210812120350.49801-4-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Miaohe Lin [Thu, 2 Sep 2021 21:54:09 +0000 (14:54 -0700)]
shmem: remove unneeded header file
mfill_atomic_install_pte() is introduced to install pte and update mmu
cache since commit
bf6ebd97aba0 ("userfaultfd/shmem: modify
shmem_mfill_atomic_pte to use install_pte()"). So we should remove
tlbflush.h as update_mmu_cache() is not called here now.
Link: https://lkml.kernel.org/r/20210812120350.49801-3-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Miaohe Lin [Thu, 2 Sep 2021 21:54:06 +0000 (14:54 -0700)]
shmem: remove unneeded variable ret
Patch series "Cleanups for shmem".
This series contains cleanups to remove unneeded variable, header file,
function forward declaration and so on. More details can be found in the
respective changelogs.
This patch (of 4):
The local variable ret is always equal to -ENOMEM and never touched. So
remove it and return -ENOMEM directly to simplify the code.
Link: https://lkml.kernel.org/r/20210812120350.49801-1-linmiaohe@huawei.com
Link: https://lkml.kernel.org/r/20210812120350.49801-2-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sebastian Andrzej Siewior [Thu, 2 Sep 2021 21:54:03 +0000 (14:54 -0700)]
shmem: use raw_spinlock_t for ->stat_lock
Each CPU has SHMEM_INO_BATCH inodes available in `->ino_batch' which is
per-CPU. Access here is serialized by disabling preemption. If the pool
is empty, it gets reloaded from `->next_ino'. Access here is serialized
by ->stat_lock which is a spinlock_t and can not be acquired with disabled
preemption.
One way around it would make per-CPU ino_batch struct containing the inode
number a local_lock_t.
Another solution is to promote ->stat_lock to a raw_spinlock_t. The
critical sections are short. The mpol_put() must be moved outside of the
critical section to avoid invoking the destructor with disabled
preemption.
Link: https://lkml.kernel.org/r/20210806142916.jdwkb5bx62q5fwfo@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
John Hubbard [Thu, 2 Sep 2021 21:54:00 +0000 (14:54 -0700)]
mm: delete unused get_kernel_page()
get_kernel_page() was added in 2012 by [1]. It was used for a while for
NFS, but then in 2014, a refactoring [2] removed all callers, and it has
apparently not been used since.
Remove get_kernel_page() because it has no callers.
[1] commit
18022c5d8627 ("mm: add get_kernel_page[s] for pinning of
kernel addresses for I/O")
[2] commit
91f79c43d1b5 ("new helper: iov_iter_get_pages_alloc()")
Link: https://lkml.kernel.org/r/20210729221847.1165665-1-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Eric Paris <eparis@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Neil Brown <neilb@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Xiaotian Feng <dfeng@redhat.com>
Cc: Mark Salter <msalter@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Thu, 2 Sep 2021 21:53:57 +0000 (14:53 -0700)]
fs, mm: fix race in unlinking swapfile
We had a recurring situation in which admin procedures setting up
swapfiles would race with test preparation clearing away swapfiles; and
just occasionally that got stuck on a swapfile "(deleted)" which could
never be swapped off. That is not supposed to be possible.
2.6.28 commit
f9454548e17c ("don't unlink an active swapfile") admitted
that it was leaving a race window open: now close it.
may_delete() makes the IS_SWAPFILE check (amongst many others) before
inode_lock has been taken on target: now repeat just that simple check in
vfs_unlink() and vfs_rename(), after taking inode_lock.
Which goes most of the way to fixing the race, but swapon() must also
check after it acquires inode_lock, that the file just opened has not
already been unlinked.
Link: https://lkml.kernel.org/r/e17b91ad-a578-9a15-5e3-4989e0f999b5@google.com
Fixes:
f9454548e17c ("don't unlink an active swapfile")
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
John Hubbard [Thu, 2 Sep 2021 21:53:54 +0000 (14:53 -0700)]
mm/gup: remove try_get_page(), call try_get_compound_head() directly
try_get_page() is very similar to try_get_compound_head(), and in fact
try_get_page() has fallen a little behind in terms of maintenance:
try_get_compound_head() handles speculative page references more
thoroughly.
There are only two try_get_page() callsites, so just call
try_get_compound_head() directly from those, and remove try_get_page()
entirely.
Also, seeing as how this changes try_get_compound_head() into a non-static
function, provide some kerneldoc documentation for it.
Link: https://lkml.kernel.org/r/20210813044133.1536842-4-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
John Hubbard [Thu, 2 Sep 2021 21:53:51 +0000 (14:53 -0700)]
mm/gup: small refactoring: simplify try_grab_page()
try_grab_page() does the same thing as try_grab_compound_head(..., refs=1,
...), just with a different API. So there is a lot of code duplication
there.
Change try_grab_page() to call try_grab_compound_head(), while keeping the
API contract identical for callers.
Also, now that try_grab_compound_head() always has a caller, remove the
__maybe_unused annotation.
Link: https://lkml.kernel.org/r/20210813044133.1536842-3-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
John Hubbard [Thu, 2 Sep 2021 21:53:48 +0000 (14:53 -0700)]
mm/gup: documentation corrections for gup/pup
Patch series "A few gup refactorings and documentation updates", v3.
While reviewing some of the other things going on around gup.c, I noticed
that the documentation was wrong for a few of the routines that I wrote.
And then I noticed that there was some significant code duplication too.
So this fixes those issues.
This is not entirely risk-free, but after looking closely at this, I think
it's actually a useful improvement, getting rid of the code duplication
here.
This patch (of 3):
The documentation for try_grab_compound_head() and try_grab_page() has
fallen a little out of date. Update and clarify a few points.
Also make it kerneldoc-correct, by adding @args documentation.
Link: https://lkml.kernel.org/r/20210813044133.1536842-1-jhubbard@nvidia.com
Link: https://lkml.kernel.org/r/20210813044133.1536842-2-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Miaohe Lin [Thu, 2 Sep 2021 21:53:45 +0000 (14:53 -0700)]
mm: gup: use helper PAGE_ALIGNED in populate_vma_page_range()
Use helper PAGE_ALIGNED to check if address is aligned to PAGE_SIZE.
Minor readability improvement.
Link: https://lkml.kernel.org/r/20210807093620.21347-6-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Miaohe Lin [Thu, 2 Sep 2021 21:53:42 +0000 (14:53 -0700)]
mm: gup: fix potential pgmap refcnt leak in __gup_device_huge()
When failed to try_grab_page, put_dev_pagemap() is missed. So pgmap
refcnt will leak in this case. Also we remove the check for pgmap against
NULL as it's also checked inside the put_dev_pagemap().
[akpm@linux-foundation.org: simplify, cleanup]
[akpm@linux-foundation.org: fix return value]
Link: https://lkml.kernel.org/r/20210807093620.21347-5-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Fixes:
3faa52c03f44 ("mm/gup: track FOLL_PIN pages")
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Miaohe Lin [Thu, 2 Sep 2021 21:53:39 +0000 (14:53 -0700)]
mm: gup: remove useless BUG_ON in __get_user_pages()
Indeed, this BUG_ON couldn't catch anything useful. We are sure ret == 0
here because we would already bail out if ret != 0 and ret is untouched
till here.
Link: https://lkml.kernel.org/r/20210807093620.21347-4-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Miaohe Lin [Thu, 2 Sep 2021 21:53:36 +0000 (14:53 -0700)]
mm: gup: remove unneed local variable orig_refs
Remove unneed local variable orig_refs since refs is unchanged now.
Link: https://lkml.kernel.org/r/20210807093620.21347-3-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Miaohe Lin [Thu, 2 Sep 2021 21:53:33 +0000 (14:53 -0700)]
mm: gup: remove set but unused local variable major
Patch series "Cleanups and fixup for gup".
This series contains cleanups to remove unneeded variable, useless BUG_ON
and use helper to improve readability. Also we fix a potential pgmap
refcnt leak. More details can be found in the respective changelogs.
This patch (of 5):
Since commit
a2beb5f1efed ("mm: clean up the last pieces of page fault
accountings"), the local variable major is unused. Remove it.
Link: https://lkml.kernel.org/r/20210807093620.21347-1-linmiaohe@huawei.com
Link: https://lkml.kernel.org/r/20210807093620.21347-2-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jing Yangyang [Thu, 2 Sep 2021 21:53:30 +0000 (14:53 -0700)]
include/linux/buffer_head.h: fix boolreturn.cocci warnings
./include/linux/buffer_head.h:412:64-65:WARNING:return of 0/1 in
function 'has_bh_in_lru' with return type bool
Return statements in functions returning bool should use true/false
instead of 1/0.
Generated by: scripts/coccinelle/misc/boolreturn.cocci
Link: https://lkml.kernel.org/r/20210824055828.58783-1-deng.changcheng@zte.com.cn
Signed-off-by: Jing Yangyang <jing.yangyang@zte.com.cn>
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Shakeel Butt [Thu, 2 Sep 2021 21:53:27 +0000 (14:53 -0700)]
writeback: memcg: simplify cgroup_writeback_by_id
Currently cgroup_writeback_by_id calls mem_cgroup_wb_stats() to get dirty
pages for a memcg. However mem_cgroup_wb_stats() does a lot more than
just get the number of dirty pages. Just directly get the number of dirty
pages instead of calling mem_cgroup_wb_stats(). Also
cgroup_writeback_by_id() is only called for best-effort dirty flushing, so
remove the unused 'nr' parameter and no need to explicitly flush memcg
stats.
Link: https://lkml.kernel.org/r/20210722182627.2267368-1-shakeelb@google.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner [Thu, 2 Sep 2021 21:53:24 +0000 (14:53 -0700)]
fs: inode: count invalidated shadow pages in pginodesteal
pginodesteal is supposed to capture the impact that inode reclaim has on
the page cache state. Currently, it doesn't consider shadow pages that
get dropped this way, even though this can have a significant impact on
paging behavior, memory pressure calculations etc.
To improve visibility into these effects, make sure shadow pages get
counted when they get dropped through inode reclaim.
This changes the return value semantics of invalidate_mapping_pages()
semantics slightly, but the only two users are the inode shrinker itsel
and a usb driver that logs it for debugging purposes.
Link: https://lkml.kernel.org/r/20210614211904.14420-3-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner [Thu, 2 Sep 2021 21:53:21 +0000 (14:53 -0700)]
fs: drop_caches: fix skipping over shadow cache inodes
When drop_caches truncates the page cache in an inode it also includes any
shadow entries for evicted pages. However, there is a preliminary check
on whether the inode has pages: if it has *only* shadow entries, it will
skip running truncation on the inode and leave it behind.
Fix the check to mapping_empty(), such that it runs truncation on any
inode that has cache entries at all.
Link: https://lkml.kernel.org/r/20210614211904.14420-2-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Roman Gushchin <guro@fb.com>
Acked-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner [Thu, 2 Sep 2021 21:53:18 +0000 (14:53 -0700)]
mm: remove irqsave/restore locking from contexts with irqs enabled
The page cache deletion paths all have interrupts enabled, so no need to
use irqsafe/irqrestore locking variants.
They used to have irqs disabled by the memcg lock added in commit
c4843a7593a9 ("memcg: add per cgroup dirty page accounting"), but that has
since been replaced by memcg taking the page lock instead, commit
0a31bc97c80c ("mm: memcontrol: rewrite uncharge AP").
Link: https://lkml.kernel.org/r/20210614211904.14420-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Kara [Thu, 2 Sep 2021 21:53:15 +0000 (14:53 -0700)]
writeback: use READ_ONCE for unlocked reads of writeback stats
We do some unlocked reads of writeback statistics like
avg_write_bandwidth, dirty_ratelimit, or bw_time_stamp. Generally we are
fine with getting somewhat out-of-date values but actually getting
different values in various parts of the functions because the compiler
decided to reload value from original memory location could confuse
calculations. Use READ_ONCE for these unlocked accesses and WRITE_ONCE
for the updates to be on the safe side.
Link: https://lkml.kernel.org/r/20210713104716.22868-5-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Michael Stapelberg <stapelberg+linux@google.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Kara [Thu, 2 Sep 2021 21:53:12 +0000 (14:53 -0700)]
writeback: rename domain_update_bandwidth()
Rename domain_update_bandwidth() to domain_update_dirty_limit(). The
original name is a misnomer. The function has nothing to do with a
bandwidth, it updates dirty limits.
Link: https://lkml.kernel.org/r/20210713104716.22868-4-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Michael Stapelberg <stapelberg+linux@google.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Kara [Thu, 2 Sep 2021 21:53:09 +0000 (14:53 -0700)]
writeback: fix bandwidth estimate for spiky workload
Michael Stapelberg has reported that for workload with short big spikes of
writes (GCC linker seem to trigger this frequently) the write throughput
is heavily underestimated and tends to steadily sink until it reaches
zero. This has rather bad impact on writeback throttling (causing
stalls). The problem is that writeback throughput estimate gets updated
at most once per 200 ms. One update happens early after we submit pages
for writeback (at that point writeout of only small fraction of pages is
completed and thus observed throughput is tiny). Next update happens only
during the next write spike (updates happen only from inode writeback and
dirty throttling code) and if that is more than 1s after previous spike,
we decide system was idle and just ignore whatever was written until this
moment.
Fix the problem by making sure writeback throughput estimate is also
updated shortly after writeback completes to get reasonable estimate of
throughput for spiky workloads.
[jack@suse.cz: avoid division by 0 in wb_update_dirty_ratelimit()]
Link: https://lore.kernel.org/lkml/20210617095309.3542373-1-stapelberg+linux@google.com
Link: https://lkml.kernel.org/r/20210713104716.22868-3-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reported-by: Michael Stapelberg <stapelberg+linux@google.com>
Tested-by: Michael Stapelberg <stapelberg+linux@google.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Kara [Thu, 2 Sep 2021 21:53:06 +0000 (14:53 -0700)]
writeback: reliably update bandwidth estimation
Currently we trigger writeback bandwidth estimation from
balance_dirty_pages() and from wb_writeback(). However neither of these
need to trigger when the system is relatively idle and writeback is
triggered e.g. from fsync(2). Make sure writeback estimates happen
reliably by triggering them from do_writepages().
Link: https://lkml.kernel.org/r/20210713104716.22868-2-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Michael Stapelberg <stapelberg+linux@google.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Kara [Thu, 2 Sep 2021 21:53:04 +0000 (14:53 -0700)]
writeback: track number of inodes under writeback
Patch series "writeback: Fix bandwidth estimates", v4.
Fix estimate of writeback throughput when device is not fully busy doing
writeback. Michael Stapelberg has reported that such workload (e.g.
generated by linking) tends to push estimated throughput down to 0 and as
a result writeback on the device is practically stalled.
The first three patches fix the reported issue, the remaining two patches
are unrelated cleanups of problems I've noticed when reading the code.
This patch (of 4):
Track number of inodes under writeback for each bdi_writeback structure.
We will use this to decide whether wb does any IO and so we can estimate
its writeback throughput. In principle we could use number of pages under
writeback (WB_WRITEBACK counter) for this however normal percpu counter
reads are too inaccurate for our purposes and summing the counter is too
expensive.
Link: https://lkml.kernel.org/r/20210713104519.16394-1-jack@suse.cz
Link: https://lkml.kernel.org/r/20210713104716.22868-1-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Michael Stapelberg <stapelberg+linux@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
liuhailong [Thu, 2 Sep 2021 21:53:01 +0000 (14:53 -0700)]
mm: add kernel_misc_reclaimable in show_free_areas
Print NR_KERNEL_MISC_RECLAIMABLE stat from show_free_areas() so users can
check whether the shrinker is working correctly and to show the current
memory usage.
Link: https://lkml.kernel.org/r/20210813104725.4562-1-liuhailong@oppo.com
Signed-off-by: liuhailong <liuhailong@oppo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Matthew Wilcox (Oracle) [Thu, 2 Sep 2021 21:52:58 +0000 (14:52 -0700)]
mm: report a more useful address for reclaim acquisition
A recent lockdep report included these lines:
[ 96.177910] 3 locks held by containerd/770:
[ 96.177934] #0:
ffff88810815ea28 (&mm->mmap_lock#2){++++}-{3:3},
at: do_user_addr_fault+0x115/0x770
[ 96.177999] #1:
ffffffff82915020 (rcu_read_lock){....}-{1:2}, at:
get_swap_device+0x33/0x140
[ 96.178057] #2:
ffffffff82955ba0 (fs_reclaim){+.+.}-{0:0}, at:
__fs_reclaim_acquire+0x5/0x30
While it was not useful to that bug report to know where the reclaim lock
had been acquired, it might be useful under other circumstances. Allow
the caller of __fs_reclaim_acquire to specify the instruction pointer to
use.
Link: https://lkml.kernel.org/r/20210719185709.1755149-1-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Gavin Shan [Thu, 2 Sep 2021 21:52:54 +0000 (14:52 -0700)]
mm/debug_vm_pgtable: fix corrupted page flag
In page table entry modifying tests, set_xxx_at() are used to populate
the page table entries. On ARM64, PG_arch_1 (PG_dcache_clean) flag is
set to the target page flag if execution permission is given. The logic
exits since commit
4f04d8f00545 ("arm64: MMU definitions"). The page
flag is kept when the page is free'd to buddy's free area list. However,
it will trigger page checking failure when it's pulled from the buddy's
free area list, as the following warning messages indicate.
BUG: Bad page state in process memhog pfn:08000
page:
0000000015c0a628 refcount:0 mapcount:0 \
mapping:
0000000000000000 index:0x1 pfn:0x8000
flags: 0x7ffff8000000800(arch_1|node=0|zone=0|lastcpupid=0xfffff)
raw:
07ffff8000000800 dead000000000100 dead000000000122 0000000000000000
raw:
0000000000000001 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: PAGE_FLAGS_CHECK_AT_PREP flag(s) set
This fixes the issue by clearing PG_arch_1 through flush_dcache_page()
after set_xxx_at() is called. For architectures other than ARM64, the
unexpected overhead of cache flushing is acceptable.
Link: https://lkml.kernel.org/r/20210809092631.1888748-13-gshan@redhat.com
Fixes:
a5c3b9ffb0f4 ("mm/debug_vm_pgtable: add tests validating advanced arch page table helpers")
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu> [powerpc 8xx]
Tested-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> [s390]
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chunyu Hu <chuhu@redhat.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Gavin Shan [Thu, 2 Sep 2021 21:52:51 +0000 (14:52 -0700)]
mm/debug_vm_pgtable: remove unused code
The variables used by old implementation isn't needed as we switched to
"struct pgtable_debug_args". Lets remove them and related code in
debug_vm_pgtable().
Link: https://lkml.kernel.org/r/20210809092631.1888748-12-gshan@redhat.com
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu> [powerpc 8xx]
Tested-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> [s390]
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chunyu Hu <chuhu@redhat.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Gavin Shan [Thu, 2 Sep 2021 21:52:48 +0000 (14:52 -0700)]
mm/debug_vm_pgtable: use struct pgtable_debug_args in PGD and P4D modifying tests
This uses struct pgtable_debug_args in PGD/P4D modifying tests. No
allocated huge page is used in these tests. Besides, the unused variable
@saved_p4dp and @saved_pudp are dropped.
Link: https://lkml.kernel.org/r/20210809092631.1888748-11-gshan@redhat.com
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu> [powerpc 8xx]
Tested-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> [s390]
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chunyu Hu <chuhu@redhat.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Gavin Shan [Thu, 2 Sep 2021 21:52:45 +0000 (14:52 -0700)]
mm/debug_vm_pgtable: use struct pgtable_debug_args in PUD modifying tests
This uses struct pgtable_debug_args in PUD modifying tests. The allocated
huge page is used when set_pud_at() is used. The corresponding tests are
skipped if the huge page doesn't exist. Besides, the following unused
variables in debug_vm_pgtable() are dropped: @prot, @paddr, @pud_aligned.
Link: https://lkml.kernel.org/r/20210809092631.1888748-10-gshan@redhat.com
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu> [powerpc 8xx]
Tested-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> [s390]
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chunyu Hu <chuhu@redhat.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Gavin Shan [Thu, 2 Sep 2021 21:52:41 +0000 (14:52 -0700)]
mm/debug_vm_pgtable: use struct pgtable_debug_args in PMD modifying tests
This uses struct pgtable_debug_args in PMD modifying tests. The allocated
huge page is used when set_pmd_at() is used. The corresponding tests are
skipped if the huge page doesn't exist. Besides, the unused variable
@pmd_aligned in debug_vm_pgtable() is dropped.
Link: https://lkml.kernel.org/r/20210809092631.1888748-9-gshan@redhat.com
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu> [powerpc 8xx]
Tested-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> [s390]
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chunyu Hu <chuhu@redhat.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Gavin Shan [Thu, 2 Sep 2021 21:52:38 +0000 (14:52 -0700)]
mm/debug_vm_pgtable: use struct pgtable_debug_args in PTE modifying tests
This uses struct pgtable_debug_args in PTE modifying tests. The allocated
page is used as set_pte_at() is used there. The tests are skipped if the
allocated page doesn't exist. It's notable that args->ptep need to be
mapped before the tests. The reason why we don't map args->ptep at the
beginning is PTE entry is only mapped and accessible in atomic context
when CONFIG_HIGHPTE is enabled. So we avoid to do that so that atomic
context is only enabled if needed.
Besides, the unused variable @pte_aligned and @ptep in debug_vm_pgtable()
are dropped.
Link: https://lkml.kernel.org/r/20210809092631.1888748-8-gshan@redhat.com
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu> [powerpc 8xx]
Tested-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> [s390]
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chunyu Hu <chuhu@redhat.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Gavin Shan [Thu, 2 Sep 2021 21:52:35 +0000 (14:52 -0700)]
mm/debug_vm_pgtable: use struct pgtable_debug_args in migration and thp tests
This uses struct pgtable_debug_args in the migration and thp test
functions. It's notable that the pre-allocated page is used in
swap_migration_tests() as set_pte_at() is used there.
Link: https://lkml.kernel.org/r/20210809092631.1888748-7-gshan@redhat.com
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu> [powerpc 8xx]
Tested-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> [s390]
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chunyu Hu <chuhu@redhat.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Gavin Shan [Thu, 2 Sep 2021 21:52:32 +0000 (14:52 -0700)]
mm/debug_vm_pgtable: use struct pgtable_debug_args in soft_dirty and swap tests
This uses struct pgtable_debug_args in the soft_dirty and swap test
functions.
Link: https://lkml.kernel.org/r/20210809092631.1888748-6-gshan@redhat.com
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu> [powerpc 8xx]
Tested-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> [s390]
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chunyu Hu <chuhu@redhat.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Gavin Shan [Thu, 2 Sep 2021 21:52:28 +0000 (14:52 -0700)]
mm/debug_vm_pgtable: use struct pgtable_debug_args in protnone and devmap tests
This uses struct pgtable_debug_args in protnone and devmap test functions.
After that, the unused variable @protnone in debug_vm_pgtable() is
dropped.
Link: https://lkml.kernel.org/r/20210809092631.1888748-5-gshan@redhat.com
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu> [powerpc 8xx]
Tested-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> [s390]
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chunyu Hu <chuhu@redhat.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Gavin Shan [Thu, 2 Sep 2021 21:52:25 +0000 (14:52 -0700)]
mm/debug_vm_pgtable: use struct pgtable_debug_args in leaf and savewrite tests
This uses struct pgtable_debug_args in the leaf and savewrite test
functions.
Link: https://lkml.kernel.org/r/20210809092631.1888748-4-gshan@redhat.com
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu> [powerpc 8xx]
Tested-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> [s390]
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chunyu Hu <chuhu@redhat.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Gavin Shan [Thu, 2 Sep 2021 21:52:22 +0000 (14:52 -0700)]
mm/debug_vm_pgtable: use struct pgtable_debug_args in basic tests
This uses struct pgtable_debug_args in the basic test functions. The
unused variables @pgd_aligned and @p4d_aligned in debug_vm_pgtable() are
dropped.
Link: https://lkml.kernel.org/r/20210809092631.1888748-3-gshan@redhat.com
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu> [powerpc 8xx]
Tested-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> [s390]
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chunyu Hu <chuhu@redhat.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Gavin Shan [Thu, 2 Sep 2021 21:52:19 +0000 (14:52 -0700)]
mm/debug_vm_pgtable: introduce struct pgtable_debug_args
Patch series "mm/debug_vm_pgtable: Enhancements", v6.
There are a couple of issues with current implementations and this series
tries to resolve the issues:
(a) All needed information are scattered in variables, passed to various
test functions. The code is organized in pretty much relaxed fashion.
(b) The page isn't allocated from buddy during page table entry modifying
tests. The page can be invalid, conflicting to the implementations
of set_xxx_at() on ARM64. The target page is accessed so that the
iCache can be flushed when execution permission is given on ARM64.
Besides, the target page can be unmapped and accessing to it causes
kernel crash.
"struct pgtable_debug_args" is introduced to address issue (a). For issue
(b), the used page is allocated from buddy in page table entry modifying
tests. The corresponding tets will be skipped if we fail to allocate the
(huge) page. For other test cases, the original page around to kernel
symbol (@start_kernel) is still used.
The patches are organized as below. PATCH[2-10] could be combined to one
patch, but it will make the review harder:
PATCH[1] introduces "struct pgtable_debug_args" as place holder of all
needed information. With it, the old and new implementation
can coexist.
PATCH[2-10] uses "struct pgtable_debug_args" in various test functions.
PATCH[11] removes the unused code for old implementation.
PATCH[12] fixes the issue of corrupted page flag for ARM64
This patch (of 6):
In debug_vm_pgtable(), there are many local variables introduced to track
the needed information and they are passed to the functions for various
test cases. It'd better to introduce a struct as place holder for these
information. With it, what the tests functions need is the struct. In
this way, the code is simplified and easier to be maintained.
Besides, set_xxx_at() could access the data on the corresponding pages in
the page table modifying tests. So the accessed pages in the tests should
have been allocated from buddy. Otherwise, we're accessing pages that
aren't owned by us. This causes issues like page flag corruption or
kernel crash on accessing unmapped page when CONFIG_DEBUG_PAGEALLOC is
enabled.
This introduces "struct pgtable_debug_args". The struct is initialized
and destroyed, but the information in the struct isn't used yet. It will
be used in subsequent patches.
Link: https://lkml.kernel.org/r/20210809092631.1888748-1-gshan@redhat.com
Link: https://lkml.kernel.org/r/20210809092631.1888748-2-gshan@redhat.com
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu> [powerpc 8xx]
Tested-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> [s390]
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Chunyu Hu <chuhu@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
kernel test robot [Thu, 2 Sep 2021 21:50:20 +0000 (14:50 -0700)]
arch/csky/kernel/probes/kprobes.c: fix bugon.cocci warnings
Use BUG_ON instead of a if condition followed by BUG.
Generated by: scripts/coccinelle/misc/bugon.cocci
Link: https://lkml.kernel.org/r/alpine.DEB.2.22.394.2107061049150.7197@hadrien
Fixes:
7d37cb2c912d ("lib: fix kconfig dependency on ARCH_WANT_FRAME_POINTERS")
Signed-off-by: kernel test robot <lkp@intel.com>
Signed-off-by: Julia Lawall <julia.lawall@inria.fr>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Julian Braha <julianbraha@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Gang He [Thu, 2 Sep 2021 21:50:17 +0000 (14:50 -0700)]
ocfs2: ocfs2_downconvert_lock failure results in deadlock
Usually, ocfs2_downconvert_lock() function always downconverts dlm lock to
the expected level for satisfy dlm bast requests from the other nodes.
But there is a rare situation. When dlm lock conversion is being
canceled, ocfs2_downconvert_lock() function will return -EBUSY. You need
to be aware that ocfs2_cancel_convert() function is asynchronous in fsdlm
implementation.
If we does not requeue this lockres entry, ocfs2 downconvert thread no
longer handles this dlm lock bast request. Then, the other nodes will not
get the dlm lock again, the current node's process will be blocked when
acquire this dlm lock again.
Link: https://lkml.kernel.org/r/20210830044621.12544-1-ghe@suse.com
Signed-off-by: Gang He <ghe@suse.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tuo Li [Thu, 2 Sep 2021 21:50:14 +0000 (14:50 -0700)]
ocfs2: quota_local: fix possible uninitialized-variable access in ocfs2_local_read_info()
A memory block is allocated through kmalloc(), and its return value is
assigned to the pointer oinfo. However, oinfo->dqi_gqinode is not
initialized but it is accessed in:
iput(oinfo->dqi_gqinode);
To fix this possible uninitialized-variable access, assign NULL to
oinfo->dqi_gqinode, and add ocfs2_qinfo_lock_res_init() behind the
assignment in ocfs2_local_read_info(). Remove ocfs2_qinfo_lock_res_init()
in ocfs2_global_read_info().
Link: https://lkml.kernel.org/r/20210804031832.57154-1-islituo@gmail.com
Signed-off-by: Tuo Li <islituo@gmail.com>
Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dan Carpenter [Thu, 2 Sep 2021 21:50:11 +0000 (14:50 -0700)]
ocfs2: remove an unnecessary condition
The case where "tmp_oh" is NULL is handled at the start of the function.
At this point we know it's non-NULL so this will always return 1.
Link: https://lkml.kernel.org/r/YOcItgIXtisi3MaO@mwanda
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: Larry Chen <lchen@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Geert Uytterhoeven [Thu, 2 Sep 2021 21:50:08 +0000 (14:50 -0700)]
ia64: make num_rsvd_regions static
Commit
f62800992e5917f2 ("ia64: switch to NO_BOOTMEM") removed the last
user of num_rsvd_regions outside arch/ia64/kernel/setup.c.
Link: https://lkml.kernel.org/r/a377b5437e3e9da93d02f996fe06a2b956cb0990.1629884459.git.geert+renesas@glider.be
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Frank Rowand <frowand.list@gmail.com>
Cc: Jay Lan <jlan@sgi.com>
Cc: Magnus Damm <magnus.damm@gmail.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Simon Horman <horms@verge.net.au>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Geert Uytterhoeven [Thu, 2 Sep 2021 21:50:04 +0000 (14:50 -0700)]
ia64: make reserve_elfcorehdr() static
There never was a reason for reserve_elfcorehdr() to be global. Make the
function static, and move it before its sole caller.
Link: https://lkml.kernel.org/r/fe236cd73b64abc4abd03dd808cb015c907f4c8c.1629884459.git.geert+renesas@glider.be
Fixes:
cee87af2a5f75713 ("[IA64] kexec: Use EFI_LOADER_DATA for ELF core header")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Frank Rowand <frowand.list@gmail.com>
Cc: Jay Lan <jlan@sgi.com>
Cc: Magnus Damm <magnus.damm@gmail.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Simon Horman <horms@verge.net.au>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Geert Uytterhoeven [Thu, 2 Sep 2021 21:50:01 +0000 (14:50 -0700)]
ia64: fix #endif comment for reserve_elfcorehdr()
Patch series "ia64: Miscellaneous fixes and cleanups".
This patch series contains some miscellaneous fixes and cleanups for ia64.
The second patch fixes a naming conflict triggered by a patch for the FDT
code.
This patch (of 3):
The definition of reserve_elfcorehdr() depends on CONFIG_CRASH_DUMP, not
CONFIG_PROC_VMCORE.
Link: https://lkml.kernel.org/r/cover.1629884459.git.geert+renesas@glider.be
Link: https://lkml.kernel.org/r/77b4c0648f200cab7e1c2c5171c06763e09362aa.1629884459.git.geert+renesas@glider.be
Fixes:
d9a9855d0b06ca6d ("always reserve elfcore header memory in crash kernel")
Fixes:
17c1f07ed70afa4f ("[IA64] Reserve elfcorehdr memory in CONFIG_CRASH_DUMP")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Simon Horman <horms@verge.net.au>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Jay Lan <jlan@sgi.com>
Cc: Magnus Damm <magnus.damm@gmail.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Frank Rowand <frowand.list@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jason Wang [Thu, 2 Sep 2021 21:49:58 +0000 (14:49 -0700)]
ia64: fix typo in a comment
s/when when/when/
Link: https://lkml.kernel.org/r/20210817112500.12848-1-wangborong@cdjrlc.com
Signed-off-by: Jason Wang <wangborong@cdjrlc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 29 Aug 2021 22:04:50 +0000 (15:04 -0700)]
Linux 5.14
Linus Torvalds [Sun, 29 Aug 2021 19:52:17 +0000 (12:52 -0700)]
Merge tag 'clk-fixes-for-linus' of git://git./linux/kernel/git/clk/linux
Pull clk fix from Stephen Boyd:
"One hotfix for a NULL pointer deref in the Renesas usb clk driver"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: renesas: rcar-usb2-clock-sel: Fix kernel NULL pointer dereference
Linus Torvalds [Sun, 29 Aug 2021 17:54:14 +0000 (10:54 -0700)]
Merge tag 'sched_urgent_for_v5.14' of git://git./linux/kernel/git/tip/tip
Pull scheduler fixes from Borislav Petkov:
- Have get_push_task() check whether current has migration disabled and
thus avoid useless invocations of the migration thread
- Rework initialization flow so that all rq->core's are initialized,
even of CPUs which have not been onlined yet, so that iterating over
them all works as expected
* tag 'sched_urgent_for_v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched: Fix get_push_task() vs migrate_disable()
sched: Fix Core-wide rq->lock for uninitialized CPUs
Linus Torvalds [Sun, 29 Aug 2021 17:47:02 +0000 (10:47 -0700)]
Merge tag 'irq_urgent_for_v5.14' of git://git./linux/kernel/git/tip/tip
Pull irq fix from Borislav Petkov:
- Have msix_mask_all() check a global control which says whether MSI-X
masking should be done and thus make it usable on Xen-PV too
* tag 'irq_urgent_for_v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
PCI/MSI: Skip masking MSI-X on Xen PV
Linus Torvalds [Sun, 29 Aug 2021 17:36:32 +0000 (10:36 -0700)]
Merge tag 'perf_urgent_for_v5.14' of git://git./linux/kernel/git/tip/tip
Pull perf fixes from Borislav Petkov:
- Prevent the amd/power module from being removed while in use
- Mark AMD IBS as not supporting content exclusion
- Add a workaround for AMD erratum #1197 where IBS registers might not
be restored properly after exiting CC6 state
- Fix a potential truncation of a 32-bit variable due to shifting
- Read the correct bits describing the number of configurable address
ranges on Intel PT
* tag 'perf_urgent_for_v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/amd/power: Assign pmu.module
perf/x86/amd/ibs: Extend PERF_PMU_CAP_NO_EXCLUDE to IBS Op
perf/x86/amd/ibs: Work around erratum #1197
perf/x86/intel/uncore: Fix integer overflow on 23 bit left shift of a u32
perf/x86/intel/pt: Fix mask of num_address_ranges
Linus Torvalds [Sun, 29 Aug 2021 17:26:00 +0000 (10:26 -0700)]
Merge tag 'x86_urgent_for_v5.14' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Fix build error on RHEL where -Werror=maybe-uninitialized is set.
- Restore the firmware's IDT when calling EFI boot services and before
ExitBootServices() has been called. This fixes a boot failure on what
appears to be a tablet with 32-bit UEFI running a 64-bit kernel.
* tag 'x86_urgent_for_v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/resctrl: Fix a maybe-uninitialized build warning treated as error
x86/efi: Restore Firmware IDT before calling ExitBootServices()
Helge Deller [Fri, 27 Aug 2021 18:42:57 +0000 (20:42 +0200)]
Revert "parisc: Add assembly implementations for memset, strlen, strcpy, strncpy and strcat"
This reverts commit
83af58f8068ea3f7b3c537c37a30887bfa585069.
It turns out that at least the assembly implementation for strncpy() was
buggy. Revert the whole commit and return back to the default coding.
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: <stable@vger.kernel.org> # v5.4+
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Adam Ford [Thu, 26 Aug 2021 14:17:21 +0000 (09:17 -0500)]
clk: renesas: rcar-usb2-clock-sel: Fix kernel NULL pointer dereference
The probe was manually passing NULL instead of dev to devm_clk_hw_register.
This caused a Unable to handle kernel NULL pointer dereference error.
Fix this by passing 'dev'.
Signed-off-by: Adam Ford <aford173@gmail.com>
Fixes:
a20a40a8bbc2 ("clk: renesas: rcar-usb2-clock-sel: Fix error handling in .probe()")
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Linus Torvalds [Sat, 28 Aug 2021 18:39:16 +0000 (11:39 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fix from James Bottomley:
"A single fix for a race introduced by a fix that went into 5.14-rc5"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: core: Fix hang of freezing queue between blocking and running device
Linus Torvalds [Sat, 28 Aug 2021 18:32:16 +0000 (11:32 -0700)]
Merge tag 'usb-5.14' of git://git./linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are a few tiny USB fixes for reported issues with some USB
drivers.
These fixes include:
- gadget driver fixes for regressions
- tcpm driver fix
- dwc3 driver fixes
- xhci renesas firmware loading fix, again.
- usb serial option driver device id addition
- usb serial ch341 revert for regression
All all of these have been in linux-next with no reported problems"
* tag 'usb-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: gadget: u_audio: fix race condition on endpoint stop
usb: gadget: f_uac2: fixup feedback endpoint stop
usb: typec: tcpm: Raise vdm_sm_running flag only when VDM SM is running
usb: renesas-xhci: Prefer firmware loading on unknown ROM state
usb: dwc3: gadget: Stop EP0 transfers during pullup disable
usb: dwc3: gadget: Fix dwc3_calc_trbs_left()
Revert "USB: serial: ch341: fix character loss at high transfer rates"
USB: serial: option: add new VID/PID to support Fibocom FG150
Linus Torvalds [Sat, 28 Aug 2021 17:40:41 +0000 (10:40 -0700)]
Merge tag 'powerpc-5.14-7' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Fix scv implicit soft-mask table for relocated (eg. kdump) kernels
- Re-enable ARCH_ENABLE_SPLIT_PMD_PTLOCK, which was disabled due to a
typo
Thanks to Lukas Bulwahn, Nicholas Piggin, and Daniel Axtens.
* tag 'powerpc-5.14-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64s: Fix scv implicit soft-mask table for relocated kernels
powerpc: Re-enable ARCH_ENABLE_SPLIT_PMD_PTLOCK
Linus Torvalds [Fri, 27 Aug 2021 23:08:29 +0000 (16:08 -0700)]
Merge tag 'block-5.14-2021-08-27' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- Revert the mq-deadline priority handling, it's causing serious
performance regressions. While experimental patches exists to fix
this up, it's too late to do so now. Revert it and re-do it properly
for 5.15 instead.
- Fix a NULL vs IS_ERR() regression in this release (Dan)
- Fix a mq-deadline accounting regression in this release (Bart)
- Mark cryptoloop as deprecated. It's broken and dm-crypt fully
supports it, and it's actively intefering with loop. Plan on removal
for 5.16 (Christoph)
* tag 'block-5.14-2021-08-27' of git://git.kernel.dk/linux-block:
cryptoloop: add a deprecation warning
pd: fix a NULL vs IS_ERR() check
Revert "block/mq-deadline: Prioritize high-priority requests"
mq-deadline: Fix request accounting
Linus Torvalds [Fri, 27 Aug 2021 22:59:00 +0000 (15:59 -0700)]
Merge tag 'soc-fixes-5.14-4' of git://git./linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:
"Just two trivial fixes from the reset driver tree, nothing else came
up since the last soc fixes"
* tag 'soc-fixes-5.14-4' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
reset: reset-zynqmp: Fixed the argument data type
reset: RESET_MCHP_SPARX5 should depend on ARCH_SPARX5
Linus Torvalds [Fri, 27 Aug 2021 19:18:09 +0000 (12:18 -0700)]
Merge tag 'acpi-5.14-rc8' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Fix a regression introduced during this cycle that has been partially
addressed by an earlier commit (Andy Shevchenko)"
* tag 'acpi-5.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
media: ipu3-cio2: Drop reference on error path in cio2_bridge_connect_sensor()
Linus Torvalds [Fri, 27 Aug 2021 19:06:51 +0000 (12:06 -0700)]
Merge tag 'pm-5.14-rc8' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix two issues introduced during this cycle, one of which is a
regression and the other one affects new code.
Specifics:
- Prevent the operating performance points (OPP) code from crashing
when some entries in the table of required OPPs are set to error
pointer values (Marijn Suijten)
- Prevent the generic power domains (genpd) framework from
incorrectly overriding the performance state of a device set by its
driver while it is runtime-suspended or when runtime PM of it is
disabled (Dmitry Osipenko)"
* tag 'pm-5.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM: domains: Improve runtime PM performance state handling
opp: core: Check for pending links before reading required_opp pointers
David Hildenbrand [Wed, 25 Aug 2021 10:24:15 +0000 (12:24 +0200)]
virtio-mem: fix sleeping in RCU read side section in virtio_mem_online_page_cb()
virtio_mem_set_fake_offline() might sleep now, and we call it under
rcu_read_lock(). To fix it, simply move the rcu_read_unlock() further
up, as we're done with the device.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes:
6cc26d77613a: "virtio-mem: use page_offline_(start|end) when setting PageOffline()
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rafael J. Wysocki [Fri, 27 Aug 2021 18:27:01 +0000 (20:27 +0200)]
Merge branch 'pm-opp'
* pm-opp:
opp: core: Check for pending links before reading required_opp pointers
Linus Torvalds [Fri, 27 Aug 2021 18:04:57 +0000 (11:04 -0700)]
Merge tag 'riscv-for-linus-5.14-rc8' of git://git./linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- device tree updates for the Microsemi Polarfire development kit that
fix some mismatches between the u-boot and Linux enternet entries
- ensure that the F register state is correctly reflected in core dumps
* tag 'riscv-for-linus-5.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: dts: microchip: Add ethernet0 to the aliases node
riscv: dts: microchip: Use 'local-mac-address' for emac1
riscv: Ensure the value of FP registers in the core dump file is up to date
Linus Torvalds [Fri, 27 Aug 2021 16:52:48 +0000 (09:52 -0700)]
Merge tag 'mmc-v5.14-rc7' of git://git./linux/kernel/git/ulfh/mmc
Pull MMC host fix from Ulf Hansson:
- sdhci-iproc: Fix clock error for ACPI rpi's
* tag 'mmc-v5.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
Revert "mmc: sdhci-iproc: Set SDHCI_QUIRK_CAP_CLOCK_BASE_BROKEN on BCM2711"
Christoph Hellwig [Fri, 27 Aug 2021 16:32:50 +0000 (18:32 +0200)]
cryptoloop: add a deprecation warning
Support for cryptoloop has been officially marked broken and deprecated
in favor of dm-crypt (which supports the same broken algorithms if
needed) in Linux 2.6.4 (released in March 2004), and support for it has
been entirely removed from losetup in util-linux 2.23 (released in April
2013). Add a warning and a deprecation schedule.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210827163250.255325-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Fri, 27 Aug 2021 16:00:43 +0000 (09:00 -0700)]
Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM fix from Russell King:
"Resolve a Keystone 2 kernel mapping regression"
* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: 9104/2: Fix Keystone 2 kernel mapping regression
Ulf Hansson [Fri, 27 Aug 2021 14:30:36 +0000 (16:30 +0200)]
Revert "mmc: sdhci-iproc: Set SDHCI_QUIRK_CAP_CLOCK_BASE_BROKEN on BCM2711"
This reverts commit
419dd626e357e89fc9c4e3863592c8b38cfe1571.
It turned out that the change from the reverted commit breaks the ACPI
based rpi's because it causes the 100Mhz max clock to be overridden to the
return from sdhci_iproc_get_max_clock(), which is 0 because there isn't a
OF/DT based clock device.
Reported-by: Jeremy Linton <jeremy.linton@arm.com>
Fixes:
419dd626e357 ("mmc: sdhci-iproc: Set SDHCI_QUIRK_CAP_CLOCK_BASE_BROKEN on BCM2711")
Acked-by: Stefan Wahren <stefan.wahren@i2se.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Jerome Brunet [Fri, 27 Aug 2021 09:29:27 +0000 (11:29 +0200)]
usb: gadget: u_audio: fix race condition on endpoint stop
If the endpoint completion callback is call right after the ep_enabled flag
is cleared and before usb_ep_dequeue() is call, we could do a double free
on the request and the associated buffer.
Fix this by clearing ep_enabled after all the endpoint requests have been
dequeued.
Fixes:
7de8681be2cd ("usb: gadget: u_audio: Free requests only after callback")
Cc: stable <stable@vger.kernel.org>
Reported-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Link: https://lore.kernel.org/r/20210827092927.366482-1-jbrunet@baylibre.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jerome Brunet [Fri, 27 Aug 2021 07:58:53 +0000 (09:58 +0200)]
usb: gadget: f_uac2: fixup feedback endpoint stop
When the uac2 function is stopped, there seems to be an issue reported on
some platforms (Intel Merrifield at least)
BUG: kernel NULL pointer dereference, address:
0000000000000008
...
RIP: 0010:dwc3_gadget_del_and_unmap_request+0x19/0xe0
...
Call Trace:
dwc3_remove_requests.constprop.0+0x12f/0x170
__dwc3_gadget_ep_disable+0x7a/0x160
dwc3_gadget_ep_disable+0x3d/0xd0
usb_ep_disable+0x1c/0x70
u_audio_stop_capture+0x79/0x120 [u_audio]
afunc_set_alt+0x73/0x80 [usb_f_uac2]
composite_setup+0x224/0x1b90 [libcomposite]
The issue happens only when the gadget is using the sync type "async", not
"adaptive". This indicates that problem is coming from the feedback
endpoint, which is only used with async synchronization mode.
The problem is that request is freed regardless of usb_ep_dequeue(), which
ends up badly if the request is not actually dequeued yet.
Update the feedback endpoint free function to release the endpoint the same
way it is done for the data endpoint, which takes care of the problem.
Fixes:
24f779dac8f3 ("usb: gadget: f_uac2/u_audio: add feedback endpoint support")
Reported-by: Ferry Toth <ftoth@exalondelft.nl>
Tested-by: Ferry Toth <ftoth@exalondelft.nl>
Acked-by: Felipe Balbi <balbi@kernel.org>
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Link: https://lore.kernel.org/r/20210827075853.266912-1-jbrunet@baylibre.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Dan Carpenter [Fri, 27 Aug 2021 10:00:23 +0000 (13:00 +0300)]
pd: fix a NULL vs IS_ERR() check
blk_mq_alloc_disk() returns error pointers, it doesn't return NULL
so correct the check.
Fixes:
262d431f9000 ("pd: use blk_mq_alloc_disk and blk_cleanup_disk")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/20210827100023.GB9449@kili
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Fri, 27 Aug 2021 01:44:25 +0000 (18:44 -0700)]
Merge tag 'drm-fixes-2021-08-27' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Last set of fixes for 5.14, nothing major a couple of i915, couple of
imx and a few amdgpu. All pretty small.
i915:
- Fix syncmap memory leak
- Drop redundant display port debug print
amdgpu:
- Fix for pinning display buffers multiple times
- Fix delayed work handling for GFXOFF
- Fix build when CONFIG_SUSPEND is not set
imx:
- fix planar offset calculations
- fix accidental partial revert"
* tag 'drm-fixes-2021-08-27' of git://anongit.freedesktop.org/drm/drm:
drm/i915/dp: Drop redundant debug print
drm/i915: Fix syncmap memory leak
drm/amdgpu: Fix build with missing pm_suspend_target_state module export
drm/amdgpu: Cancel delayed work when GFXOFF is disabled
drm/amdgpu: use the preferred pin domain after the check
drm/imx: ipuv3-plane: fix accidental partial revert of 8 pixel alignment fix
gpu: ipu-v3: Fix i.MX IPU-v3 offset calculations for (semi)planar U/V formats
Dave Airlie [Fri, 27 Aug 2021 00:49:32 +0000 (10:49 +1000)]
Merge tag 'imx-drm-fixes-2021-08-18' of git://git.pengutronix.de/pza/linux into drm-fixes
drm/imx: imx-drm alignment and plane offset fixes
Fix an accidental partial revert of commit
94dfec48fca7 ("drm/imx: Add 8
pixel alignment fix") and plane offset calculations for capture of
non-aligned resolutions.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Philipp Zabel <p.zabel@pengutronix.de>
Link: https://patchwork.freedesktop.org/patch/msgid/85a41af99beb2c9e7d6020435a135bf9f205a5ff.camel@pengutronix.de
Dave Airlie [Fri, 27 Aug 2021 00:24:07 +0000 (10:24 +1000)]
Merge tag 'amd-drm-fixes-5.14-2021-08-25' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-5.14-2021-08-25:
amdgpu:
- Fix for pinning display buffers multiple times
- Fix delayed work handling for GFXOFF
- Fix build when CONFIG_SUSPEND is not set
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210826032658.4068-1-alexander.deucher@amd.com
Dave Airlie [Fri, 27 Aug 2021 00:13:37 +0000 (10:13 +1000)]
Merge tag 'drm-intel-fixes-2021-08-26' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
- Fix syncmap memory leak
- Drop redundant display port debug print
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/YSfSeHbyS5wBZtNJ@intel.com
Marek Marczykowski-Górecki [Thu, 26 Aug 2021 17:03:42 +0000 (19:03 +0200)]
PCI/MSI: Skip masking MSI-X on Xen PV
When running as Xen PV guest, masking MSI-X is a responsibility of the
hypervisor. The guest has no write access to the relevant BAR at all - when
it tries to, it results in a crash like this:
BUG: unable to handle page fault for address:
ffffc9004069100c
#PF: supervisor write access in kernel mode
#PF: error_code(0x0003) - permissions violation
RIP: e030:__pci_enable_msix_range.part.0+0x26b/0x5f0
e1000e_set_interrupt_capability+0xbf/0xd0 [e1000e]
e1000_probe+0x41f/0xdb0 [e1000e]
local_pci_probe+0x42/0x80
(...)
The recently introduced function msix_mask_all() does not check the global
variable pci_msi_ignore_mask which is set by XEN PV to bypass the masking
of MSI[-X] interrupts.
Add the check to make this function XEN PV compatible.
Fixes:
7d5ec3d36123 ("PCI/MSI: Mask all unused MSI-X entries")
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210826170342.135172-1-marmarek@invisiblethingslab.com
Linus Torvalds [Thu, 26 Aug 2021 20:26:40 +0000 (13:26 -0700)]
Merge tag 'nfsd-5.14-1' of git://linux-nfs.org/~bfields/linux
Pull nfsd fix from Bruce Fields:
"This is a one-liner fix for a serious bug that can cause the server to
become unresponsive to a client, so I think it's worth the last-minute
inclusion for 5.14"
* tag 'nfsd-5.14-1' of git://linux-nfs.org/~bfields/linux:
SUNRPC: Fix XPT_BUSY flag leakage in svc_handle_xprt()...
Linus Torvalds [Thu, 26 Aug 2021 20:20:22 +0000 (13:20 -0700)]
Merge tag 'net-5.14-rc8' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Networking fixes, including fixes from can and bpf.
Closing three hw-dependent regressions. Any fixes of note are in the
'old code' category. Nothing blocking release from our perspective.
Current release - regressions:
- stmmac: revert "stmmac: align RX buffers"
- usb: asix: ax88772: move embedded PHY detection as early as
possible
- usb: asix: do not call phy_disconnect() for ax88178
- Revert "net: really fix the build...", from Kalle to fix QCA6390
Current release - new code bugs:
- phy: mediatek: add the missing suspend/resume callbacks
Previous releases - regressions:
- qrtr: fix another OOB Read in qrtr_endpoint_post
- stmmac: dwmac-rk: fix unbalanced pm_runtime_enable warnings
Previous releases - always broken:
- inet: use siphash in exception handling
- ip_gre: add validation for csum_start
- bpf: fix ringbuf helper function compatibility
- rtnetlink: return correct error on changing device netns
- e1000e: do not try to recover the NVM checksum on Tiger Lake"
* tag 'net-5.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (43 commits)
Revert "net: really fix the build..."
net: hns3: fix get wrong pfc_en when query PFC configuration
net: hns3: fix GRO configuration error after reset
net: hns3: change the method of getting cmd index in debugfs
net: hns3: fix duplicate node in VLAN list
net: hns3: fix speed unknown issue in bond 4
net: hns3: add waiting time before cmdq memory is released
net: hns3: clear hardware resource when loading driver
net: fix NULL pointer reference in cipso_v4_doi_free
rtnetlink: Return correct error on changing device netns
net: dsa: hellcreek: Adjust schedule look ahead window
net: dsa: hellcreek: Fix incorrect setting of GCL
cxgb4: dont touch blocked freelist bitmap after free
ipv4: use siphash instead of Jenkins in fnhe_hashfun()
ipv6: use siphash in rt6_exception_hash()
can: usb: esd_usb2: esd_usb2_rx_event(): fix the interchange of the CAN RX and TX error counters
net: usb: asix: ax88772: fix boolconv.cocci warnings
net/sched: ets: fix crash when flipping from 'strict' to 'quantum'
qede: Fix memset corruption
net: stmmac: fix kernel panic due to NULL pointer dereference of buf->xdp
...
Jens Axboe [Thu, 26 Aug 2021 18:59:44 +0000 (12:59 -0600)]
Revert "block/mq-deadline: Prioritize high-priority requests"
This reverts commit
fb926032b3209300f9dc454a36b8299582ae545c.
Zhen reports that this commit slows down mq-deadline on a 128 thread
box, going from 258K IOPS to 170-180K. My testing shows that Optane
gen2 IOPS goes from 2.3M IOPS to 1.2M IOPS on a 64 thread box.
Looking in detail at the code, the main culprit here is needing to sum
percpu counters in the dispatch hot path, leading to very high CPU
utilization there. To make matters worse, the code currently needs to
sum 2 percpu counters, and it does so in the most naive way of iterating
possible CPUs _twice_.
Since we're close to release, revert this commit and we can re-do it
with regular per-priority counters instead for the 5.15 kernel.
Link: https://lore.kernel.org/linux-block/20210826144039.2143-1-thunder.leizhen@huawei.com/
Reported-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Thu, 26 Aug 2021 18:26:00 +0000 (11:26 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fix from Will Deacon:
"We received a report this week that the generic version of
pfn_valid(), which we switched to this merge window in
16c9afc77660
("arm64/mm: drop HAVE_ARCH_PFN_VALID"), interacts badly with
dma_map_resource() due to the following check:
/* Don't allow RAM to be mapped */
if (WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr))))
return DMA_MAPPING_ERROR;
Since the ongoing saga to determine the semantics of pfn_valid() is
unlikely to be resolved this week (does it indicate valid memory, or
just the presence of a struct page, or whether that struct page has
been initialised?), just revert back to our old version of pfn_valid()
for 5.14.
Summary:
- Fix dma_map_resource() by reverting back to old pfn_valid() code"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
Partially revert "arm64/mm: drop HAVE_ARCH_PFN_VALID"
Linus Torvalds [Thu, 26 Aug 2021 18:18:30 +0000 (11:18 -0700)]
Merge tag 'ceph-for-5.14-rc8' of git://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
"Two memory management fixes for the filesystem"
* tag 'ceph-for-5.14-rc8' of git://github.com/ceph/ceph-client:
ceph: fix possible null-pointer dereference in ceph_mdsmap_decode()
ceph: correctly handle releasing an embedded cap flush
Kalle Valo [Thu, 26 Aug 2021 17:28:16 +0000 (20:28 +0300)]
Revert "net: really fix the build..."
This reverts commit
ce78ffa3ef1681065ba451cfd545da6126f5ca88.
Wren and Nicolas reported that ath11k was failing to initialise QCA6390
Wi-Fi 6 device with error:
qcom_mhi_qrtr: probe of mhi0_IPCR failed with error -22
Commit
ce78ffa3ef16 ("net: really fix the build..."), introduced in
v5.14-rc5, caused this regression in qrtr. Most likely all ath11k
devices are broken, but I only tested QCA6390. Let's revert the broken
commit so that ath11k works again.
Reported-by: Wren Turkal <wt@penguintechs.org>
Reported-by: Nicolas Schichan <nschichan@freebox.fr>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20210826172816.24478-1-kvalo@codeaurora.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Thu, 26 Aug 2021 18:05:11 +0000 (11:05 -0700)]
Merge tag 'for-5.14-rc7-tag' of git://git./linux/kernel/git/kdave/linux
Pull btrfs fix from David Sterba:
"One more fix that I think qualifies for a late merge. It's a revert of
a one-liner fix that meanwhile got backported to stable kernels and we
got reports from users.
The broken fix prevents creating compressed inline extents, which
could be noticeable on space consumption.
Technically it's a regression as the patch was merged in 5.14-rc1 but
got propagated to several stable kernels and has higher exposure than
a 'typical' development cycle bug"
* tag 'for-5.14-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
Revert "btrfs: compression: don't try to compress if we don't have enough pages"
Sebastian Andrzej Siewior [Thu, 26 Aug 2021 13:37:38 +0000 (15:37 +0200)]
sched: Fix get_push_task() vs migrate_disable()
push_rt_task() attempts to move the currently running task away if the
next runnable task has migration disabled and therefore is pinned on the
current CPU.
The current task is retrieved via get_push_task() which only checks for
nr_cpus_allowed == 1, but does not check whether the task has migration
disabled and therefore cannot be moved either. The consequence is a
pointless invocation of the migration thread which correctly observes
that the task cannot be moved.
Return NULL if the task has migration disabled and cannot be moved to
another CPU.
Fixes:
a7c81556ec4d3 ("sched: Fix migrate_disable() vs rt/dl balancing")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210826133738.yiotqbtdaxzjsnfj@linutronix.de
Andy Shevchenko [Thu, 26 Aug 2021 10:53:24 +0000 (13:53 +0300)]
media: ipu3-cio2: Drop reference on error path in cio2_bridge_connect_sensor()
The commit
71f642833284 ("ACPI: utils: Fix reference counting in
for_each_acpi_dev_match()") moved adev assignment outside of error
path and hence made acpi_dev_put(sensor->adev) a no-op. We still
need to drop reference count on error path, and to achieve that,
replace sensor->adev by locally assigned adev.
Fixes:
71f642833284 ("ACPI: utils: Fix reference counting in for_each_acpi_dev_match()")
Depends-on:
fc68f42aa737 ("ACPI: fix NULL pointer dereference")
Reported-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Jakub Kicinski [Thu, 26 Aug 2021 15:43:20 +0000 (08:43 -0700)]
Merge https://git./linux/kernel/git/bpf/bpf
Alexei Starovoitov says:
====================
bpf 2021-08-26
We've added 1 non-merge commit during the last 1 day(s):
1) Fix ringbuf helper function compatibility, from Daniel.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Fix ringbuf helper function compatibility
====================
Link: https://lore.kernel.org/r/20210826153720.19083-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 26 Aug 2021 14:24:20 +0000 (07:24 -0700)]
Merge branch 'net-hns3-add-some-fixes-for-net'
Guangbin Huang says:
====================
net: hns3: add some fixes for -net
This series adds some fixes for the HNS3 ethernet driver.
====================
Link: https://lore.kernel.org/r/1629976921-43438-1-git-send-email-huangguangbin2@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Guangbin Huang [Thu, 26 Aug 2021 11:22:01 +0000 (19:22 +0800)]
net: hns3: fix get wrong pfc_en when query PFC configuration
Currently, when query PFC configuration by dcbtool, driver will return
PFC enable status based on TC. As all priorities are mapped to TC0 by
default, if TC0 is enabled, then all priorities mapped to TC0 will be
shown as enabled status when query PFC setting, even though some
priorities have never been set.
for example:
$ dcb pfc show dev eth0
pfc-cap 4 macsec-bypass off delay 0
prio-pfc 0:off 1:off 2:off 3:off 4:off 5:off 6:off 7:off
$ dcb pfc set dev eth0 prio-pfc 0:on 1:on 2:on 3:on
$ dcb pfc show dev eth0
pfc-cap 4 macsec-bypass off delay 0
prio-pfc 0:on 1:on 2:on 3:on 4:on 5:on 6:on 7:on
To fix this problem, just returns user's PFC config parameter saved in
driver.
Fixes:
cacde272dd00 ("net: hns3: Add hclge_dcb module for the support of DCB feature")
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yufeng Mo [Thu, 26 Aug 2021 11:22:00 +0000 (19:22 +0800)]
net: hns3: fix GRO configuration error after reset
The GRO configuration is enabled by default after reset. This
is incorrect and should be restored to the user-configured value.
So this restoration is added during reset initialization.
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yufeng Mo [Thu, 26 Aug 2021 11:21:59 +0000 (19:21 +0800)]
net: hns3: change the method of getting cmd index in debugfs
Currently, the cmd index is obtained in debugfs by comparing file names.
However, this method may cause errors when processing more complex file
names. So, change this method by saving cmd in private data and comparing
it when getting cmd index in debugfs for optimization.
Fixes:
5e69ea7ee2a6 ("net: hns3: refactor the debugfs process")
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Guojia Liao [Thu, 26 Aug 2021 11:21:58 +0000 (19:21 +0800)]
net: hns3: fix duplicate node in VLAN list
VLAN list should not be added duplicate VLAN node, otherwise it would
cause "add failed" when restore VLAN from VLAN list, so this patch adds
VLAN ID check before adding node into VLAN list.
Fixes:
c6075b193462 ("net: hns3: Record VF vlan tables")
Signed-off-by: Guojia Liao <liaoguojia@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yonglong Liu [Thu, 26 Aug 2021 11:21:57 +0000 (19:21 +0800)]
net: hns3: fix speed unknown issue in bond 4
In bond 4, when the link goes down and up repeatedly, the bond may get an
unknown speed, and then this port can not work.
The driver notify netif_carrier_on() before update the link state, when the
bond receive carrier on, will query the speed of the port, if the query
operation happens before updating the link state, will get an unknown
speed. So need to notify netif_carrier_on() after update the link state.
Fixes:
46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support")
Fixes:
e2cb1dec9779 ("net: hns3: Add HNS3 VF HCL(Hardware Compatibility Layer) Support")
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>