arc/mm: enable ARCH_HAS_VM_GET_PAGE_PROT This enables ARCH_HAS_VM_GET_PAGE_PROT on the platform and exports standard vm_get_page_prot() implementation via DECLARE_VM_GET_PAGE_PROT, which looks up a private and static protection_map[] array. Subsequently all __SXXX and __PXXX macros can be dropped which are no longer needed. Link: https://lkml.kernel.org/r/20220711070600.2378316-23-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Brian Cain <bcain@quicinc.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Christoph Hellwig <hch@infradead.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Chris Zankel <chris@zankel.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Jonas Bonn <jonas@southpole.se> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mm: avoid unnecessary page fault retires on shared memory types I observed that for each of the shared file-backed page faults, we're very likely to retry one more time for the 1st write fault upon no page. It's because we'll need to release the mmap lock for dirty rate limit purpose with balance_dirty_pages_ratelimited() (in fault_dirty_shared_page()). Then after that throttling we return VM_FAULT_RETRY. We did that probably because VM_FAULT_RETRY is the only way we can return to the fault handler at that time telling it we've released the mmap lock. However that's not ideal because it's very likely the fault does not need to be retried at all since the pgtable was well installed before the throttling, so the next continuous fault (including taking mmap read lock, walk the pgtable, etc.) could be in most cases unnecessary. It's not only slowing down page faults for shared file-backed, but also add more mmap lock contention which is in most cases not needed at all. To observe this, one could try to write to some shmem page and look at "pgfault" value in /proc/vmstat, then we should expect 2 counts for each shmem write simply because we retried, and vm event "pgfault" will capture that. To make it more efficient, add a new VM_FAULT_COMPLETED return code just to show that we've completed the whole fault and released the lock. It's also a hint that we should very possibly not need another fault immediately on this page because we've just completed it. This patch provides a ~12% perf boost on my aarch64 test VM with a simple program sequentially dirtying 400MB shmem file being mmap()ed and these are the time it needs: Before: 650.980 ms (+-1.94%) After: 569.396 ms (+-1.38%) I believe it could help more than that. We need some special care on GUP and the s390 pgfault handler (for gmap code before returning from pgfault), the rest changes in the page fault handlers should be relatively straightforward. Another thing to mention is that mm_account_fault() does take this new fault as a generic fault to be accounted, unlike VM_FAULT_RETRY. I explicitly didn't touch hmm_vma_fault() and break_ksm() because they do not handle VM_FAULT_RETRY even with existing code, so I'm literally keeping them as-is. Link: https://lkml.kernel.org/r/20220530183450.42886-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vineet Gupta <vgupta@kernel.org> Acked-by: Guo Ren <guoren@kernel.org> Acked-by: Max Filippov <jcmvbkbc@gmail.com> Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Alistair Popple <apopple@nvidia.com> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> [arm part] Acked-by: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Stafford Horne <shorne@gmail.com> Cc: David S. Miller <davem@davemloft.net> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Brian Cain <bcain@quicinc.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Richard Weinberger <richard@nod.at> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Jonas Bonn <jonas@southpole.se> Cc: Will Deacon <will@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Simek <monstr@monstr.eu> Cc: Matt Turner <mattst88@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: David Hildenbrand <david@redhat.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Chris Zankel <chris@zankel.net> Cc: Hugh Dickins <hughd@google.com> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Rich Felker <dalias@libc.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Helge Deller <deller@gmx.de> Cc: Yoshinori Sato <ysato@users.osdn.me> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Merge tag 'arc-5.17-rc1' of git://git./linux/kernel/git/vgupta/arc Pull ARC fixes from Vineet Gupta: "Nothing too exciting for now" * tag 'arc-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: arc: use swap() to make code cleaner arc: perf: Move static structs to where they're really used ARC: perf: fix misleading comment about pmu vs counter stop arc: Replace lkml.org links with lore ARC: perf: Remove redundant initialization of variable idx ARC: thread_info.h: correct two typos in a comment
mm: remove redundant check about FAULT_FLAG_ALLOW_RETRY bit Since commit 4064b9827063 ("mm: allow VM_FAULT_RETRY for multiple times") allowed VM_FAULT_RETRY for multiple times, the FAULT_FLAG_ALLOW_RETRY bit of fault_flag will not be changed in the page fault path, so the following check is no longer needed: flags & FAULT_FLAG_ALLOW_RETRY So just remove it. [akpm@linux-foundation.org: coding style fixes] Link: https://lkml.kernel.org/r/20211110123358.36511-1-zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: Kirill Shutemov <kirill@shutemov.name> Cc: Peter Xu <peterx@redhat.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
arc: Replace lkml.org links with lore As started by commit 05a5f51ca566 ("Documentation: Replace lkml.org links with lore"), replace lkml.org links with lore to better use a single source that's more likely to stay available long-term. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
memblock: allow to specify flags with memblock_add_node() We want to specify flags when hotplugging memory. Let's prepare to pass flags to memblock_add_node() by adjusting all existing users. Note that when hotplugging memory the system is already up and running and we might have concurrent memblock users: for example, while we're hotplugging memory, kexec_file code might search for suitable memory regions to place kexec images. It's important to add the memory directly to memblock via a single call with the right flags, instead of adding the memory first and apply flags later: otherwise, concurrent memblock users might temporarily stumble over memblocks with wrong flags, which will be important in a follow-up patch that introduces a new flag to properly handle add_memory_driver_managed(). Link: https://lkml.kernel.org/r/20211004093605.5830-4-david@redhat.com Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Shahab Vahedi <shahab@synopsys.com> [arch/arc] Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Jianyong Wu <Jianyong.Wu@arm.com> Cc: Jiaxun Yang <jiaxun.yang@flygoat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
memblock: rename memblock_free to memblock_phys_free Since memblock_free() operates on a physical range, make its name reflect it and rename it to memblock_phys_free(), so it will be a logical counterpart to memblock_phys_alloc(). The callers are updated with the below semantic patch: @@ expression addr; expression size; @@ - memblock_free(addr, size); + memblock_phys_free(addr, size); Link: https://lkml.kernel.org/r/20210930185031.18648-6-rppt@kernel.org Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Juergen Gross <jgross@suse.com> Cc: Shahab Vahedi <Shahab.Vahedi@synopsys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge tag 'arc-5.15-rc1' of git://git./linux/kernel/git/vgupta/arc Pull ARC updates from Vineet Gupta: "Finally a big pile of changes for ARC (atomics/mm). These are from our internal arc64 tree, preparing mainline for eventual arc64 support. I'm spreading them out to avoid tsunami of patches in one release. - MM rework: - Implement up to 4 paging levels - Enable STRICT_MM_TYPECHECK - switch pgtable_t back to 'struct page *' - Atomics rework / implement relaxed accessors - Retire legacy MMUv1,v2; ARC750 cores - A few other build errors, typos" * tag 'arc-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: (33 commits) ARC: mm: vmalloc sync from kernel to user table to update PMD ... ARC: mm: support 4 levels of page tables ARC: mm: support 3 levels of page tables ARC: mm: switch to asm-generic/pgalloc.h ARC: mm: switch pgtable_t back to struct page * ARC: mm: hack to allow 2 level build with 4 level code ARC: mm: disintegrate pgtable.h into levels and flags ARC: mm: disintegrate mmu.h (arcv2 bits out) ARC: mm: move MMU specific bits out of entry code ... ARC: mm: move MMU specific bits out of ASID allocator ARC: mm: non-functional code movement/cleanup ARC: mm: pmd_populate* to use the canonical set_pmd (and drop pmd_set) ARC: ioremap: use more commonly used PAGE_KERNEL based uncached flag ARC: mm: Enable STRICT_MM_TYPECHECKS ARC: mm: Fixes to allow STRICT_MM_TYPECHECKS ARC: mm: move mmu/cache externs out to setup.h ARC: mm: remove tlb paranoid code ARC: mm: use SCRATCH_DATA0 register for caching pgdir in ARCv2 only ARC: retire MMUv1 and MMUv2 support ARC: retire ARC750 support ...
ARC: mm: vmalloc sync from kernel to user table to update PMD ... ... not PGD vmalloc() sets up the kernel page table (starting from @swapper_pg_dir). But when vmalloc area is accessed in context of a user task, say opening terminal in n_tty_open(), the user page tables need to be synced from kernel page tables so that TLB entry is created in "user context". The old code was doing this incorrectly, as it was updating the user pgd entry (first level itself) to point to kernel pud table (2nd level), effectively yanking away the entire user space translation with kernel one. The correct way to do this is to ONLY update a user space pgd/pud/pmd entry if it is not popluated already. This ensures that only the missing leaf pmd entry gets updated to point to relevant kernel pte table. From code change pov, we are chaging the pattern: p4d = p4d_offset(pgd, address); p4d_k = p4d_offset(pgd_k, address); if (!p4d_present(*p4d_k)) goto bad_area; set_p4d(p4d, *p4d_k); with p4d = p4d_offset(pgd, address); p4d_k = p4d_offset(pgd_k, address); if (p4d_none(*p4d_k)) goto bad_area; if (!p4d_present(*p4d)) set_p4d(p4d, *p4d_k); Signed-off-by: Vineet Gupta <vgupta@kernel.org>
ARC: mm: support 3 levels of page tables ARCv2 MMU is software walked and Linux implements 2 levels of paging: pgd/pte. Forthcoming hw will have multiple levels, so this change preps mm code for same. It is also fun to try multi levels even on soft-walked code to ensure generic mm code is robust to handle. overview ________ 2 levels {pgd, pte} : pmd is folded but pmd_* macros are valid and operate on pgd 3 levels {pgd, pmd, pte}: - pud is folded and pud_* macros point to pgd - pmd_* macros operate on actual pmd code changes ____________ 1. #include <asm-generic/pgtable-nopud.h> 2. Define CONFIG_PGTABLE_LEVELS 3 3a. Define PMD_SHIFT, PMD_SIZE, PMD_MASK, pmd_t 3b. Define pmd_val() which actually deals with pmd (pmd_offset(), pmd_index() are provided by generic code) 3c. pmd_alloc_one()/pmd_free() also provided by generic code (pmd_populate/pmd_free already exist) 4. Define pud_none(), pud_bad() macros based on generic pud_val() which internally pertains to pgd now. 4b. define pud_populate() to just setup pgd Acked-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Vineet Gupta <vgupta@kernel.org>
ARC: mm: switch pgtable_t back to struct page * So far ARC pgtable_t has not been struct page based to avoid extra page_address() calls involved. However the differences are down to noise and get in the way of using generic code, hence this patch. This also allows us to reuse generic THP depost/withdraw code. There's some additional consideration for PGDIR_SHIFT in 4K page config. Now due to page tables being PAGE_SIZE deep only, the address split can't be really arbitrary. Tested-by: kernel test robot <lkp@intel.com> Suggested-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Vineet Gupta <vgupta@kernel.org>
ARC: mm: hack to allow 2 level build with 4 level code PMD_SHIFT is mapped to PUD_SHIFT or PGD_SHIFT by asm-generic/pgtable-* but only for !__ASSEMBLY__ tlbex.S asm code has PTRS_PER_PTE which uses PMD_SHIFT hence barfs for CONFIG_PGTABLE_LEVEL={2,3} and works for 4. So add a workaround local to tlbex.S - the proper fix is to change asm-generic/pgtable-* headers to expose the defines for __ASSEMBLY__ too Signed-off-by: Vineet Gupta <vgupta@kernel.org>
ARC: mm: Enable STRICT_MM_TYPECHECKS In the past I've refrained from doing this (at least 2 times) due to the slight code bloat due to ABI implications of pte_t etc becoming struct Per ARC ABI, functions return struct via memory and not through register r0, even if the struct would fit in register(s) - caller allocates space on stack and passes the address as first arg (r0), shifting rest of args by one - callee creates return struct in memory (referenced via r0) This time around the code actually shrunk slightly (due to subtle inlining heuristic effects), but still slightly inefficient due to return values passed through memory. That however seems like a small cost compared to maintenance burden given the impending new mmu support for page walk etc Signed-off-by: Vineet Gupta <vgupta@kernel.org>
ARC: mm: use SCRATCH_DATA0 register for caching pgdir in ARCv2 only MMU SCRATCH_DATA0 register is intended to cache task pgd. However in ARC700 SMP port, it has to be repurposed for re-entrant interrupt handling, while UP port doesn't. We currently handle these use-cases using a fabricated #define which has usual issues of dependency nesting and obvious ugliness. So clean this up: for ARC700 don't use to cache pgd (even in UP) and do the opposite for ARCv2. And while here, switch to canonical pgd_offset(). Acked-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Vineet Gupta <vgupta@kernel.org>