linux-2.6-microblaze.git
15 months agomm/vmalloc.c: allow vread() to read out vm_map_ram areas
Baoquan He [Mon, 6 Feb 2023 08:40:16 +0000 (16:40 +0800)]
mm/vmalloc.c: allow vread() to read out vm_map_ram areas

Currently, vread can read out vmalloc areas which is associated with a
vm_struct.  While this doesn't work for areas created by vm_map_ram()
interface because it doesn't have an associated vm_struct.  Then in
vread(), these areas are all skipped.

Here, add a new function vmap_ram_vread() to read out vm_map_ram areas.
The area created with vmap_ram_vread() interface directly can be handled
like the other normal vmap areas with aligned_vread().  While areas which
will be further subdivided and managed with vmap_block need carefully read
out page-aligned small regions and zero fill holes.

Link: https://lkml.kernel.org/r/20230206084020.174506-4-bhe@redhat.com
Reported-by: Stephen Brennan <stephen.s.brennan@oracle.com>
Signed-off-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>
Tested-by: Stephen Brennan <stephen.s.brennan@oracle.com>
Cc: Dan Carpenter <error27@gmail.com>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomm/vmalloc.c: add flags to mark vm_map_ram area
Baoquan He [Mon, 6 Feb 2023 08:40:15 +0000 (16:40 +0800)]
mm/vmalloc.c: add flags to mark vm_map_ram area

Through vmalloc API, a virtual kernel area is reserved for physical
address mapping.  And vmap_area is used to track them, while vm_struct is
allocated to associate with the vmap_area to store more information and
passed out.

However, area reserved via vm_map_ram() is an exception.  It doesn't have
vm_struct to associate with vmap_area.  And we can't recognize the
vmap_area with '->vm == NULL' as a vm_map_ram() area because the normal
freeing path will set va->vm = NULL before unmapping, please see function
remove_vm_area().

Meanwhile, there are two kinds of handling for vm_map_ram area.  One is
the whole vmap_area being reserved and mapped at one time through
vm_map_area() interface; the other is the whole vmap_area with
VMAP_BLOCK_SIZE size being reserved, while mapped into split regions with
smaller size via vb_alloc().

To mark the area reserved through vm_map_ram(), add flags field into
struct vmap_area.  Bit 0 indicates this is vm_map_ram area created through
vm_map_ram() interface, while bit 1 marks out the type of vm_map_ram area
which makes use of vmap_block to manage split regions via vb_alloc/free().

This is a preparation for later use.

Link: https://lkml.kernel.org/r/20230206084020.174506-3-bhe@redhat.com
Signed-off-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Dan Carpenter <error27@gmail.com>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomm/vmalloc.c: add used_map into vmap_block to track space of vmap_block
Baoquan He [Mon, 6 Feb 2023 08:40:14 +0000 (16:40 +0800)]
mm/vmalloc.c: add used_map into vmap_block to track space of vmap_block

Patch series "mm/vmalloc.c: allow vread() to read out vm_map_ram areas", v5.

Problem:
***

Stephen reported vread() will skip vm_map_ram areas when reading out
/proc/kcore with drgn utility.  Please see below link to get more details.

  /proc/kcore reads 0's for vmap_block
  https://lore.kernel.org/all/87ilk6gos2.fsf@oracle.com/T/#u

Root cause:
***

The normal vmalloc API uses struct vmap_area to manage the virtual kernel
area allocated, and associate a vm_struct to store more information and
pass out.  However, area reserved through vm_map_ram() interface doesn't
allocate vm_struct to associate with.  So the current code in vread() will
skip the vm_map_ram area through 'if (!va->vm)' conditional checking.

Solution:
***

To mark the area reserved through vm_map_ram() interface, add field
'flags' into struct vmap_area.  Bit 0 indicates this is vm_map_ram area
created through vm_map_ram() interface, bit 1 marks out the type of
vm_map_ram area which makes use of vmap_block to manage split regions via
vb_alloc/free().

And also add bitmap field 'used_map' into struct vmap_block to mark those
further subdivided regions being used to differentiate with dirty and free
regions in vmap_block.

With the help of above vmap_area->flags and vmap_block->used_map, we can
recognize and handle vm_map_ram areas successfully.  All these are done in
patch 1~3.

Meanwhile, do some improvement on areas related to vm_map_ram areas in
patch 4, 5.  And also change area flag from VM_ALLOC to VM_IOREMAP in
patch 6, 7 because this will show them as 'ioremap' in /proc/vmallocinfo,
and exclude them from /proc/kcore.

This patch (of 7):

In one vmap_block area, there could be three types of regions: region
being used which is allocated through vb_alloc(), dirty region which is
freed via vb_free() and free region.  Among them, only used region has
available data.  While there's no way to track those used regions
currently.

Here, add bitmap field used_map into vmap_block, and set/clear it during
allocation or freeing regions of vmap_block area.

This is a preparation for later use.

Link: https://lkml.kernel.org/r/20230206084020.174506-1-bhe@redhat.com
Link: https://lkml.kernel.org/r/20230206084020.174506-2-bhe@redhat.com
Signed-off-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Dan Carpenter <error27@gmail.com>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agoshmem: fix W=1 build warnings with CONFIG_SHMEM=n
Matthew Wilcox (Oracle) [Mon, 6 Feb 2023 19:08:50 +0000 (19:08 +0000)]
shmem: fix W=1 build warnings with CONFIG_SHMEM=n

With W=1 and CONFIG_SHMEM=n, shmem.c functions have no prototypes so the
compiler emits warnings.

Link: https://lkml.kernel.org/r/20230206190850.4054983-1-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mark Hemment <markhemm@googlemail.com>
Cc: Charan Teja Kalla <quic_charante@quicinc.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pavankumar Kondeti <quic_pkondeti@quicinc.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agoshmem: add shmem_read_folio() and shmem_read_folio_gfp()
Matthew Wilcox (Oracle) [Mon, 6 Feb 2023 16:25:20 +0000 (16:25 +0000)]
shmem: add shmem_read_folio() and shmem_read_folio_gfp()

These are the folio replacements for shmem_read_mapping_page() and
shmem_read_mapping_page_gfp().

[akpm@linux-foundation.org: fix shmem_read_mapping_page_gfp(), per Matthew]
Link: https://lkml.kernel.org/r/Y+QdJTuzxeBYejw2@casper.infradead.org
Link: https://lkml.kernel.org/r/20230206162520.4029022-2-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mark Hemment <markhemm@googlemail.com>
Cc: Charan Teja Kalla <quic_charante@quicinc.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pavankumar Kondeti <quic_pkondeti@quicinc.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agofilemap: add mapping_read_folio_gfp()
Matthew Wilcox (Oracle) [Mon, 6 Feb 2023 16:25:19 +0000 (16:25 +0000)]
filemap: add mapping_read_folio_gfp()

This is like read_cache_page_gfp() except it returns the folio instead
of the precise page.

Link: https://lkml.kernel.org/r/20230206162520.4029022-1-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Charan Teja Kalla <quic_charante@quicinc.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mark Hemment <markhemm@googlemail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pavankumar Kondeti <quic_pkondeti@quicinc.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomm/page_alloc: reduce fallbacks to (MIGRATE_PCPTYPES - 1)
Yajun Deng [Fri, 3 Feb 2023 10:01:32 +0000 (18:01 +0800)]
mm/page_alloc: reduce fallbacks to (MIGRATE_PCPTYPES - 1)

The commit 1dd214b8f21c ("mm: page_alloc: avoid merging non-fallbackable
pageblocks with others") has removed MIGRATE_CMA and MIGRATE_ISOLATE from
fallbacks list. so there is no need to add an element at the end of every
type.

Reduce fallbacks to (MIGRATE_PCPTYPES - 1).

Link: https://lkml.kernel.org/r/20230203100132.1627787-1-yajun.deng@linux.dev
Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Mike Rapoport <rppt@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomm: introduce vm_flags_reset_once to replace WRITE_ONCE vm_flags updates
Suren Baghdasaryan [Wed, 1 Feb 2023 00:01:16 +0000 (16:01 -0800)]
mm: introduce vm_flags_reset_once to replace WRITE_ONCE vm_flags updates

Provide vm_flags_reset_once() and replace the vm_flags updates which used
WRITE_ONCE() to prevent compiler optimizations.

Link: https://lkml.kernel.org/r/20230201000116.1333160-1-surenb@google.com
Fixes: 0cce31a0aa0e ("mm: replace vma->vm_flags direct modifications with modifier calls")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reported-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomm/vmalloc: replace BUG_ON with a simple if statement
Hyunmin Lee [Wed, 1 Feb 2023 11:51:42 +0000 (20:51 +0900)]
mm/vmalloc: replace BUG_ON with a simple if statement

As per the coding standards, in the event of an abnormal condition that
should not occur under normal circumstances, the kernel should attempt
recovery and proceed with execution, rather than halting the machine.

Specifically, in the alloc_vmap_area() function, use a simple if()
instead of using BUG_ON() halting the machine.

Link: https://lkml.kernel.org/r/20230201115142.GA7772@min-iamroot
Co-developed-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Signed-off-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Co-developed-by: Jeungwoo Yoo <casionwoo@gmail.com>
Signed-off-by: Jeungwoo Yoo <casionwoo@gmail.com>
Co-developed-by: Sangyun Kim <sangyun.kim@snu.ac.kr>
Signed-off-by: Sangyun Kim <sangyun.kim@snu.ac.kr>
Signed-off-by: Hyunmin Lee <hn.min.lee@gmail.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomm/swapfile: remove pr_debug in get_swap_pages()
Longlong Xia [Tue, 31 Jan 2023 07:10:35 +0000 (07:10 +0000)]
mm/swapfile: remove pr_debug in get_swap_pages()

It's known that get_swap_pages() may fail to find available space under
some extreme case, but pr_debug() provides useless information.  Let's
remove it.

Link: https://lkml.kernel.org/r/20230131071035.1085968-1-xialonglong1@huawei.com
Signed-off-by: Longlong Xia <xialonglong1@huawei.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Chen Wandun <chenwandun@huawei.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Nanyong Sun <sunnanyong@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomm, arch: add generic implementation of pfn_valid() for FLATMEM
Mike Rapoport (IBM) [Sun, 29 Jan 2023 12:42:35 +0000 (14:42 +0200)]
mm, arch: add generic implementation of pfn_valid() for FLATMEM

Every architecture that supports FLATMEM memory model defines its own
version of pfn_valid() that essentially compares a pfn to max_mapnr.

Use mips/powerpc version implemented as static inline as a generic
implementation of pfn_valid() and drop its per-architecture definitions.

[rppt@kernel.org: fix the generic pfn_valid()]
Link: https://lkml.kernel.org/r/Y9lg7R1Yd931C+y5@kernel.org
Link: https://lkml.kernel.org/r/20230129124235.209895-5-rppt@kernel.org
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Guo Ren <guoren@kernel.org> [csky]
Acked-by: Huacai Chen <chenhuacai@loongson.cn> [LoongArch]
Acked-by: Stafford Horne <shorne@gmail.com> [OpenRISC]
Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc]
Reviewed-by: David Hildenbrand <david@redhat.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Cc: Brian Cain <bcain@quicinc.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomips: drop definition of pfn_valid() for DISCONTIGMEM
Mike Rapoport (IBM) [Sun, 29 Jan 2023 12:42:34 +0000 (14:42 +0200)]
mips: drop definition of pfn_valid() for DISCONTIGMEM

There is a stale definition of pfn_valid() for DISCONTINGMEM memory model
guarded !FLATMEM && !SPARSEMEM && NUMA ifdefery.

Remove everything but definition of pfn_valid() for FLATMEM.

Link: https://lkml.kernel.org/r/20230129124235.209895-4-rppt@kernel.org
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Brian Cain <bcain@quicinc.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Huacai Chen <chenhuacai@loongson.cn>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agom68k: use asm-generic/memory_model.h for both MMU and !MMU
Mike Rapoport (IBM) [Sun, 29 Jan 2023 12:42:33 +0000 (14:42 +0200)]
m68k: use asm-generic/memory_model.h for both MMU and !MMU

The MMU variant uses generic definitions of page_to_pfn() and
pfn_to_page(), but !MMU defines them in include/asm/page_no.h for no good
reason.

Include asm-generic/memory_model.h in the common include/asm/page.h and
drop redundant definitions.

Link: https://lkml.kernel.org/r/20230129124235.209895-3-rppt@kernel.org
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Brian Cain <bcain@quicinc.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Huacai Chen <chenhuacai@loongson.cn>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agoarm: include asm-generic/memory_model.h from page.h rather than memory.h
Mike Rapoport (IBM) [Sun, 29 Jan 2023 12:42:32 +0000 (14:42 +0200)]
arm: include asm-generic/memory_model.h from page.h rather than memory.h

Patch series "mm, arch: add generic implementation of pfn_valid() for
FLATMEM", v2.

Every architecture that supports FLATMEM memory model defines its own
version of pfn_valid() that essentially compares a pfn to max_mapnr.

Use mips/powerpc version implemented as static inline as a generic
implementation of pfn_valid() and drop its per-architecture definitions

This patch (of 4):

Makes it consistent with other architectures and allows for generic
definition of pfn_valid() in asm-generic/memory_model.h with clear
override in arch/arm/include/asm/page.h

Link: https://lkml.kernel.org/r/20230129124235.209895-1-rppt@kernel.org
Link: https://lkml.kernel.org/r/20230129124235.209895-2-rppt@kernel.org
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Brian Cain <bcain@quicinc.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Huacai Chen <chenhuacai@loongson.cn>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomemory tier: release the new_memtier in find_create_memory_tier()
Tong Tiangen [Sun, 29 Jan 2023 04:06:51 +0000 (04:06 +0000)]
memory tier: release the new_memtier in find_create_memory_tier()

In find_create_memory_tier(), if failed to register device, then we should
release new_memtier from the tier list and put device instead of memtier.

Link: https://lkml.kernel.org/r/20230129040651.1329208-1-tongtiangen@huawei.com
Fixes: 9832fb87834e ("mm/demotion: expose memory tier details via sysfs")
Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Guohanjun <guohanjun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agokasan: infer allocation size by scanning metadata
Kuan-Ying Lee [Sun, 29 Jan 2023 02:14:35 +0000 (10:14 +0800)]
kasan: infer allocation size by scanning metadata

Make KASAN scan metadata to infer the requested allocation size instead of
printing cache->object_size.

This patch fixes confusing slab-out-of-bounds reports as reported in:

https://bugzilla.kernel.org/show_bug.cgi?id=216457

As an example of the confusing behavior, the report below hints that the
allocation size was 192, while the kernel actually called kmalloc(184):

==================================================================
BUG: KASAN: slab-out-of-bounds in _find_next_bit+0x143/0x160 lib/find_bit.c:109
Read of size 8 at addr ffff8880175766b8 by task kworker/1:1/26
...
The buggy address belongs to the object at ffff888017576600
 which belongs to the cache kmalloc-192 of size 192
The buggy address is located 184 bytes inside of
 192-byte region [ffff888017576600ffff8880175766c0)
...
Memory state around the buggy address:
 ffff888017576580: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
 ffff888017576600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff888017576680: 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fc
                                        ^
 ffff888017576700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff888017576780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================

With this patch, the report shows:

==================================================================
...
The buggy address belongs to the object at ffff888017576600
 which belongs to the cache kmalloc-192 of size 192
The buggy address is located 0 bytes to the right of
 allocated 184-byte region [ffff888017576600ffff8880175766b8)
...
==================================================================

Also report slab use-after-free bugs as "slab-use-after-free" and print
"freed" instead of "allocated" in the report when describing the accessed
memory region.

Also improve the metadata-related comment in kasan_find_first_bad_addr
and use addr_has_metadata across KASAN code instead of open-coding
KASAN_SHADOW_START checks.

[akpm@linux-foundation.org: fix printk warning]
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216457
Link: https://lkml.kernel.org/r/20230129021437.18812-1-Kuan-Ying.Lee@mediatek.com
Signed-off-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>
Co-developed-by: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Chinwen Chang <chinwen.chang@mediatek.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Qun-Wei Lin <qun-wei.lin@mediatek.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomm: export dump_mm()
Suren Baghdasaryan [Thu, 26 Jan 2023 19:37:52 +0000 (11:37 -0800)]
mm: export dump_mm()

mmap_assert_write_locked() is used in vm_flags modifiers.  Because
mmap_assert_write_locked() uses dump_mm() and vm_flags are sometimes
modified from inside a module, it's necessary to export dump_mm()
function.

Link: https://lkml.kernel.org/r/20230126193752.297968-8-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjun Roy <arjunroy@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Minchan Kim <minchan@google.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Peter Oskolkov <posk@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Punit Agrawal <punit.agrawal@bytedance.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Sebastian Reichel <sebastian.reichel@collabora.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomm: introduce __vm_flags_mod and use it in untrack_pfn
Suren Baghdasaryan [Thu, 26 Jan 2023 19:37:51 +0000 (11:37 -0800)]
mm: introduce __vm_flags_mod and use it in untrack_pfn

There are scenarios when vm_flags can be modified without exclusive
mmap_lock, such as:
- after VMA was isolated and mmap_lock was downgraded or dropped
- in exit_mmap when there are no other mm users and locking is unnecessary
Introduce __vm_flags_mod to avoid assertions when the caller takes
responsibility for the required locking.
Pass a hint to untrack_pfn to conditionally use __vm_flags_mod for
flags modification to avoid assertion.

Link: https://lkml.kernel.org/r/20230126193752.297968-7-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjun Roy <arjunroy@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Minchan Kim <minchan@google.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Peter Oskolkov <posk@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Punit Agrawal <punit.agrawal@bytedance.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Sebastian Reichel <sebastian.reichel@collabora.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomm: replace vma->vm_flags indirect modification in ksm_madvise
Suren Baghdasaryan [Thu, 26 Jan 2023 19:37:50 +0000 (11:37 -0800)]
mm: replace vma->vm_flags indirect modification in ksm_madvise

Replace indirect modifications to vma->vm_flags with calls to modifier
functions to be able to track flag changes and to keep vma locking
correctness.

Link: https://lkml.kernel.org/r/20230126193752.297968-6-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc]
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjun Roy <arjunroy@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Minchan Kim <minchan@google.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Peter Oskolkov <posk@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Punit Agrawal <punit.agrawal@bytedance.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Sebastian Reichel <sebastian.reichel@collabora.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomm: replace vma->vm_flags direct modifications with modifier calls
Suren Baghdasaryan [Thu, 26 Jan 2023 19:37:49 +0000 (11:37 -0800)]
mm: replace vma->vm_flags direct modifications with modifier calls

Replace direct modifications to vma->vm_flags with calls to modifier
functions to be able to track flag changes and to keep vma locking
correctness.

[akpm@linux-foundation.org: fix drivers/misc/open-dice.c, per Hyeonggon Yoo]
Link: https://lkml.kernel.org/r/20230126193752.297968-5-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjun Roy <arjunroy@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Minchan Kim <minchan@google.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Peter Oskolkov <posk@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Punit Agrawal <punit.agrawal@bytedance.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomm: replace VM_LOCKED_CLEAR_MASK with VM_LOCKED_MASK
Suren Baghdasaryan [Thu, 26 Jan 2023 19:37:48 +0000 (11:37 -0800)]
mm: replace VM_LOCKED_CLEAR_MASK with VM_LOCKED_MASK

To simplify the usage of VM_LOCKED_CLEAR_MASK in vm_flags_clear(), replace
it with VM_LOCKED_MASK bitmask and convert all users.

Link: https://lkml.kernel.org/r/20230126193752.297968-4-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjun Roy <arjunroy@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Minchan Kim <minchan@google.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Peter Oskolkov <posk@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Punit Agrawal <punit.agrawal@bytedance.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Sebastian Reichel <sebastian.reichel@collabora.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomm: introduce vma->vm_flags wrapper functions
Suren Baghdasaryan [Thu, 26 Jan 2023 19:37:47 +0000 (11:37 -0800)]
mm: introduce vma->vm_flags wrapper functions

vm_flags are among VMA attributes which affect decisions like VMA merging
and splitting.  Therefore all vm_flags modifications are performed after
taking exclusive mmap_lock to prevent vm_flags updates racing with such
operations.  Introduce modifier functions for vm_flags to be used whenever
flags are updated.  This way we can better check and control correct
locking behavior during these updates.

Link: https://lkml.kernel.org/r/20230126193752.297968-3-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjun Roy <arjunroy@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Minchan Kim <minchan@google.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Peter Oskolkov <posk@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Punit Agrawal <punit.agrawal@bytedance.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Sebastian Reichel <sebastian.reichel@collabora.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agokernel/fork: convert vma assignment to a memcpy
Suren Baghdasaryan [Thu, 26 Jan 2023 19:37:46 +0000 (11:37 -0800)]
kernel/fork: convert vma assignment to a memcpy

Patch series "introduce vm_flags modifier functions", v4.

This patchset was originally published as a part of per-VMA locking [1]
and was split after suggestion that it's viable on its own and to
facilitate the review process.  It is now a preprequisite for the next
version of per-VMA lock patchset, which reuses vm_flags modifier functions
to lock the VMA when vm_flags are being updated.

VMA vm_flags modifications are usually done under exclusive mmap_lock
protection because this attrubute affects other decisions like VMA merging
or splitting and races should be prevented.  Introduce vm_flags modifier
functions to enforce correct locking.

This patch (of 7):

Convert vma assignment in vm_area_dup() to a memcpy() to prevent compiler
errors when we add a const modifier to vma->vm_flags.

Link: https://lkml.kernel.org/r/20230126193752.297968-1-surenb@google.com
Link: https://lkml.kernel.org/r/20230126193752.297968-2-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjun Roy <arjunroy@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@google.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Peter Oskolkov <posk@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Punit Agrawal <punit.agrawal@bytedance.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agovma_merge: set vma iterator to correct position.
Liam R. Howlett [Fri, 20 Jan 2023 16:26:50 +0000 (11:26 -0500)]
vma_merge: set vma iterator to correct position.

When merging the previous value, set the vma iterator to the previous
slot.  Don't use the vma iterator to get the next/prev so that it is in
the correct position for a write.

Link: https://lkml.kernel.org/r/20230120162650.984577-50-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomm/mmap: remove __vma_adjust()
Liam R. Howlett [Fri, 20 Jan 2023 16:26:49 +0000 (11:26 -0500)]
mm/mmap: remove __vma_adjust()

Inline the work of __vma_adjust() into vma_merge().  This reduces code
size and has the added benefits of the comments for the cases being
located with the code.

Change the comments referencing vma_adjust() accordingly.

[Liam.Howlett@oracle.com: fix vma_merge() offset when expanding the next vma]
Link: https://lkml.kernel.org/r/20230130195713.2881766-1-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20230120162650.984577-49-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomm/mmap: convert do_brk_flags() to use vma_prepare() and vma_complete()
Liam R. Howlett [Fri, 20 Jan 2023 16:26:48 +0000 (11:26 -0500)]
mm/mmap: convert do_brk_flags() to use vma_prepare() and vma_complete()

Use the abstracted vma locking for do_brk_flags()

Link: https://lkml.kernel.org/r/20230120162650.984577-48-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomm/mmap: introduce dup_vma_anon() helper
Liam R. Howlett [Fri, 20 Jan 2023 16:26:47 +0000 (11:26 -0500)]
mm/mmap: introduce dup_vma_anon() helper

Create a helper for duplicating the anon vma when adjusting the vma.  This
simplifies the logic of __vma_adjust().

Link: https://lkml.kernel.org/r/20230120162650.984577-47-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomm/mmap: don't use __vma_adjust() in shift_arg_pages()
Liam R. Howlett [Fri, 20 Jan 2023 16:26:46 +0000 (11:26 -0500)]
mm/mmap: don't use __vma_adjust() in shift_arg_pages()

Introduce shrink_vma() which uses the vma_prepare() and vma_complete()
functions to reduce the vma coverage.

Convert shift_arg_pages() to use expand_vma() and the new shrink_vma()
function.  Remove support from __vma_adjust() to reduce a vma size since
shift_arg_pages() is the only user that shrinks a VMA in this way.

Link: https://lkml.kernel.org/r/20230120162650.984577-46-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomm/mremap: convert vma_adjust() to vma_expand()
Liam R. Howlett [Fri, 20 Jan 2023 16:26:45 +0000 (11:26 -0500)]
mm/mremap: convert vma_adjust() to vma_expand()

Stop using vma_adjust() in preparation for removing the function.  Export
vma_expand() to use instead.

Link: https://lkml.kernel.org/r/20230120162650.984577-45-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomm: don't use __vma_adjust() in __split_vma()
Liam R. Howlett [Fri, 20 Jan 2023 16:26:44 +0000 (11:26 -0500)]
mm: don't use __vma_adjust() in __split_vma()

Use the abstracted locking and maple tree operations.  Since __split_vma()
is the only user of the __vma_adjust() function to use the insert
argument, drop that argument.  Remove the NULL passed through from
fs/exec's shift_arg_pages() and mremap() at the same time.

Link: https://lkml.kernel.org/r/20230120162650.984577-44-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomm/mmap: introduce init_vma_prep() and init_multi_vma_prep()
Liam R. Howlett [Fri, 20 Jan 2023 16:26:43 +0000 (11:26 -0500)]
mm/mmap: introduce init_vma_prep() and init_multi_vma_prep()

Add init_vma_prep() and init_multi_vma_prep() to set up the struct
vma_prepare.  This is to abstract the locking when adjusting the VMAs.

Also change __vma_adjust() variable remove_next int in favour of a pointer
to the VMA to remove.  Rename next_next to remove2 since this better
reflects its use.

Link: https://lkml.kernel.org/r/20230120162650.984577-43-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomm/mmap: use vma_prepare() and vma_complete() in vma_expand()
Liam R. Howlett [Fri, 20 Jan 2023 16:26:42 +0000 (11:26 -0500)]
mm/mmap: use vma_prepare() and vma_complete() in vma_expand()

Use the new locking functions for vma_expand().  This reduces code
duplication.

At the same time change VM_BUG_ON() to VM_WARN_ON()

Link: https://lkml.kernel.org/r/20230120162650.984577-42-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomm/mmap: refactor locking out of __vma_adjust()
Liam R. Howlett [Fri, 20 Jan 2023 16:26:41 +0000 (11:26 -0500)]
mm/mmap: refactor locking out of __vma_adjust()

Move the locking into vma_prepare() and vma_complete() for use elsewhere

Link: https://lkml.kernel.org/r/20230120162650.984577-41-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomm/mmap: move anon_vma setting in __vma_adjust()
Liam R. Howlett [Fri, 20 Jan 2023 16:26:40 +0000 (11:26 -0500)]
mm/mmap: move anon_vma setting in __vma_adjust()

Move the anon_vma setting & warn_no up the function.  This is done to
clear up the locking later.

Link: https://lkml.kernel.org/r/20230120162650.984577-40-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomm: change munmap splitting order and move_vma()
Liam R. Howlett [Fri, 20 Jan 2023 16:26:39 +0000 (11:26 -0500)]
mm: change munmap splitting order and move_vma()

Splitting can be more efficient when the order is not of concern.  Change
do_vmi_align_munmap() to reduce walking of the tree during split
operations.

move_vma() must also be altered to remove the dependency of keeping the
original VMA as the active part of the split.  Transition to using vma
iterator to look up the prev and/or next vma after munmap.

[Liam.Howlett@oracle.com: fix vma iterator initialization]
Link: https://lkml.kernel.org/r/20230126212011.980350-1-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20230120162650.984577-39-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agommap: clean up mmap_region() unrolling
Liam R. Howlett [Fri, 20 Jan 2023 16:26:38 +0000 (11:26 -0500)]
mmap: clean up mmap_region() unrolling

Move logic of unrolling to the error path as apposed to duplicating it
within the function body.  This reduces the potential of missing an update
to one path when making changes.

Link: https://lkml.kernel.org/r/20230120162650.984577-38-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Li Zetao <lizetao1@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomm: add vma iterator to vma_adjust() arguments
Liam R. Howlett [Fri, 20 Jan 2023 16:26:37 +0000 (11:26 -0500)]
mm: add vma iterator to vma_adjust() arguments

Change the vma_adjust() function definition to accept the vma iterator and
pass it through to __vma_adjust().

Update fs/exec to use the new vma_adjust() function parameters.

Update mm/mremap to use the new vma_adjust() function parameters.

Revert the __split_vma() calls back from __vma_adjust() to vma_adjust()
and pass through the vma iterator.

Link: https://lkml.kernel.org/r/20230120162650.984577-37-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomm: pass vma iterator through to __vma_adjust()
Liam R. Howlett [Fri, 20 Jan 2023 16:26:36 +0000 (11:26 -0500)]
mm: pass vma iterator through to __vma_adjust()

Pass the iterator through to be used in __vma_adjust().  The state of the
iterator needs to be correct for the operation that will occur so make the
adjustments.

Link: https://lkml.kernel.org/r/20230120162650.984577-36-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomm: remove unnecessary write to vma iterator in __vma_adjust()
Liam R. Howlett [Fri, 20 Jan 2023 16:26:35 +0000 (11:26 -0500)]
mm: remove unnecessary write to vma iterator in __vma_adjust()

If the vma start address is going to change due to an insert, then it is
safe to not write the vma to the tree.  The write of the insert vma will
alter the tree as necessary.

Link: https://lkml.kernel.org/r/20230120162650.984577-35-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomadvise: use split_vma() instead of __split_vma()
Liam R. Howlett [Fri, 20 Jan 2023 16:26:34 +0000 (11:26 -0500)]
madvise: use split_vma() instead of __split_vma()

The split_vma() wrapper is specifically for this use case, so use it.

[Liam.Howlett@oracle.com: fix VMA_ITERATOR start position]
Link: https://lkml.kernel.org/r/20230125135809.85262-1-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20230120162650.984577-34-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomm: pass through vma iterator to __vma_adjust()
Liam R. Howlett [Fri, 20 Jan 2023 16:26:33 +0000 (11:26 -0500)]
mm: pass through vma iterator to __vma_adjust()

Pass the vma iterator through to __vma_adjust() so the state can be
updated.

Link: https://lkml.kernel.org/r/20230120162650.984577-33-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agommap: convert __vma_adjust() to use vma iterator
Liam R. Howlett [Fri, 20 Jan 2023 16:26:32 +0000 (11:26 -0500)]
mmap: convert __vma_adjust() to use vma iterator

Use the vma iterator internally for __vma_adjust().  Avoid using the maple
tree interface directly for type safety.

Link: https://lkml.kernel.org/r/20230120162650.984577-32-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomm/damon/vaddr-test.h: stop using vma_mas_store() for maple tree store
Liam R. Howlett [Fri, 20 Jan 2023 16:26:31 +0000 (11:26 -0500)]
mm/damon/vaddr-test.h: stop using vma_mas_store() for maple tree store

Prepare for the removal of the vma_mas_store() function by open coding the
maple tree store in this test code.  Set the range of the maple state and
call the store function directly.

Link: https://lkml.kernel.org/r/20230120162650.984577-31-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomm: switch vma_merge(), split_vma(), and __split_vma to vma iterator
Liam R. Howlett [Fri, 20 Jan 2023 16:26:30 +0000 (11:26 -0500)]
mm: switch vma_merge(), split_vma(), and __split_vma to vma iterator

Drop the vmi_* functions and transition all users to use the vma iterator
directly.

Link: https://lkml.kernel.org/r/20230120162650.984577-30-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agonommu: pass through vma iterator to shrink_vma()
Liam R. Howlett [Fri, 20 Jan 2023 16:26:29 +0000 (11:26 -0500)]
nommu: pass through vma iterator to shrink_vma()

Rename the function to vmi_shrink_vma() indicate it takes the vma
iterator.  Use the iterator to preallocate and drop the delete function.
The maple tree is able to do the modification easier than the linked list
and rbtree, so just clear the necessary area in the tree.

add_vma_to_mm() is no longer used, so drop this function.

vmi_add_vma_to_mm() is now only used once, so inline this function into
do_mmap().

Link: https://lkml.kernel.org/r/20230120162650.984577-29-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agonommu: convert nommu to using the vma iterator
Liam R. Howlett [Fri, 20 Jan 2023 16:26:28 +0000 (11:26 -0500)]
nommu: convert nommu to using the vma iterator

Gain type safety in nommu by using the vma_iterator and not the maple tree
directly.

Link: https://lkml.kernel.org/r/20230120162650.984577-28-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomm/mremap: use vmi version of vma_merge()
Liam R. Howlett [Fri, 20 Jan 2023 16:26:27 +0000 (11:26 -0500)]
mm/mremap: use vmi version of vma_merge()

Use the vma iterator so that the iterator can be invalidated or updated to
avoid each caller doing so.

Link: https://lkml.kernel.org/r/20230120162650.984577-27-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agommap: use vmi version of vma_merge()
Liam R. Howlett [Fri, 20 Jan 2023 16:26:26 +0000 (11:26 -0500)]
mmap: use vmi version of vma_merge()

Use the vma iterator so that the iterator can be invalidated or updated to
avoid each caller doing so.

Link: https://lkml.kernel.org/r/20230120162650.984577-26-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agommap: pass through vmi iterator to __split_vma()
Liam R. Howlett [Fri, 20 Jan 2023 16:26:25 +0000 (11:26 -0500)]
mmap: pass through vmi iterator to __split_vma()

Use the vma iterator so that the iterator can be invalidated or updated to
avoid each caller doing so.

Link: https://lkml.kernel.org/r/20230120162650.984577-25-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomadvise: use vmi iterator for __split_vma() and vma_merge()
Liam R. Howlett [Fri, 20 Jan 2023 16:26:24 +0000 (11:26 -0500)]
madvise: use vmi iterator for __split_vma() and vma_merge()

Use the vma iterator so that the iterator can be invalidated or updated to
avoid each caller doing so.

Link: https://lkml.kernel.org/r/20230120162650.984577-24-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agosched: convert to vma iterator
Liam R. Howlett [Fri, 20 Jan 2023 16:26:23 +0000 (11:26 -0500)]
sched: convert to vma iterator

Use the vma iterator so that the iterator can be invalidated or updated to
avoid each caller doing so.

Link: https://lkml.kernel.org/r/20230120162650.984577-23-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agotask_mmu: convert to vma iterator
Liam R. Howlett [Fri, 20 Jan 2023 16:26:22 +0000 (11:26 -0500)]
task_mmu: convert to vma iterator

Use the vma iterator so that the iterator can be invalidated or updated to
avoid each caller doing so.

Update the comments to how the vma iterator works.  The vma iterator will
keep track of the last vm_end and start the search from vm_end + 1.

Link: https://lkml.kernel.org/r/20230120162650.984577-22-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomempolicy: convert to vma iterator
Liam R. Howlett [Fri, 20 Jan 2023 16:26:21 +0000 (11:26 -0500)]
mempolicy: convert to vma iterator

Use the vma iterator so that the iterator can be invalidated or updated to
avoid each caller doing so.

Link: https://lkml.kernel.org/r/20230120162650.984577-21-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agocoredump: convert to vma iterator
Liam R. Howlett [Fri, 20 Jan 2023 16:26:20 +0000 (11:26 -0500)]
coredump: convert to vma iterator

Use the vma iterator so that the iterator can be invalidated or updated to
avoid each caller doing so.

Link: https://lkml.kernel.org/r/20230120162650.984577-20-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomlock: convert mlock to vma iterator
Liam R. Howlett [Fri, 20 Jan 2023 16:26:19 +0000 (11:26 -0500)]
mlock: convert mlock to vma iterator

Use the vma iterator so that the iterator can be invalidated or updated to
avoid each caller doing so.

Link: https://lkml.kernel.org/r/20230120162650.984577-19-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomm: change mprotect_fixup to vma iterator
Liam R. Howlett [Fri, 20 Jan 2023 16:26:18 +0000 (11:26 -0500)]
mm: change mprotect_fixup to vma iterator

Use the vma iterator so that the iterator can be invalidated or updated to
avoid each caller doing so.

Link: https://lkml.kernel.org/r/20230120162650.984577-18-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agouserfaultfd: use vma iterator
Liam R. Howlett [Fri, 20 Jan 2023 16:26:17 +0000 (11:26 -0500)]
userfaultfd: use vma iterator

Use the vma iterator so that the iterator can be invalidated or updated to
avoid each caller doing so.

Link: https://lkml.kernel.org/r/20230120162650.984577-17-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agoipc/shm: introduce new do_vma_munmap() to munmap
Liam R. Howlett [Thu, 26 Jan 2023 21:20:49 +0000 (16:20 -0500)]
ipc/shm: introduce new do_vma_munmap() to munmap

The shm already has the vma iterator in position for a write.
do_vmi_munmap() searches for the correct position and aligns the write, so
it is not the right function to use in this case.

The shm VMA tree modification is similar to the brk munmap situation, the
vma iterator is in position and the VMA is already known.  This patch
generalizes the brk munmap function do_brk_munmap() to be used for any
other callers with the vma iterator already in position to munmap a VMA.

Link: https://lkml.kernel.org/r/20230126212049.980501-1-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reported-by: Sven Schnelle <svens@linux.ibm.com>
Link: https://lore.kernel.org/linux-mm/yt9dh6wec21a.fsf@linux.ibm.com/
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agoipc/shm: use the vma iterator for munmap calls
Liam R. Howlett [Fri, 20 Jan 2023 16:26:16 +0000 (11:26 -0500)]
ipc/shm: use the vma iterator for munmap calls

Pass through the vma iterator to do_vmi_munmap() to handle the iterator
state internally

Link: https://lkml.kernel.org/r/20230120162650.984577-16-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomm: add temporary vma iterator versions of vma_merge(), split_vma(), and __split_vma()
Liam R. Howlett [Fri, 20 Jan 2023 16:26:15 +0000 (11:26 -0500)]
mm: add temporary vma iterator versions of vma_merge(), split_vma(), and __split_vma()

These wrappers are short-lived in this patch set so that each user can be
converted on its own.  In the end, these functions are renamed in one
commit.

Link: https://lkml.kernel.org/r/20230120162650.984577-15-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agommap: convert vma_expand() to use vma iterator
Liam R. Howlett [Fri, 20 Jan 2023 16:26:14 +0000 (11:26 -0500)]
mmap: convert vma_expand() to use vma iterator

Use the vma iterator instead of the maple state for type safety and for
consistency through the mm code.

Link: https://lkml.kernel.org/r/20230120162650.984577-14-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agommap: change do_mas_munmap and do_mas_aligned_munmap() to use vma iterator
Liam R. Howlett [Fri, 20 Jan 2023 16:26:13 +0000 (11:26 -0500)]
mmap: change do_mas_munmap and do_mas_aligned_munmap() to use vma iterator

Start passing the vma iterator through the mm code.  This will allow for
reuse of the state and cleaner invalidation if necessary.

Link: https://lkml.kernel.org/r/20230120162650.984577-13-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomm/mmap: remove preallocation from do_mas_align_munmap()
Liam R. Howlett [Fri, 20 Jan 2023 16:26:12 +0000 (11:26 -0500)]
mm/mmap: remove preallocation from do_mas_align_munmap()

In preparation of passing the vma state through split, the pre-allocation
that occurs before the split has to be moved to after.  Since the
preallocation would then live right next to the store, just call store
instead of preallocating.  This effectively restores the potential error
path of splitting and not munmap'ing which pre-dates the maple tree.

Link: https://lkml.kernel.org/r/20230120162650.984577-12-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agommap: convert vma_link() vma iterator
Liam R. Howlett [Fri, 20 Jan 2023 16:26:11 +0000 (11:26 -0500)]
mmap: convert vma_link() vma iterator

Avoid using the maple tree interface directly.

Link: https://lkml.kernel.org/r/20230120162650.984577-11-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agokernel/fork: convert forking to using the vmi iterator
Liam R. Howlett [Fri, 20 Jan 2023 16:26:10 +0000 (11:26 -0500)]
kernel/fork: convert forking to using the vmi iterator

Avoid using the maple tree interface directly.  This gains type safety.

Link: https://lkml.kernel.org/r/20230120162650.984577-10-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomm/mmap: convert brk to use vma iterator
Liam R. Howlett [Fri, 20 Jan 2023 16:26:09 +0000 (11:26 -0500)]
mm/mmap: convert brk to use vma iterator

Use the vma iterator API for the brk() system call.  This will provide
type safety at compile time.

Link: https://lkml.kernel.org/r/20230120162650.984577-9-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomm: expand vma iterator interface
Liam R. Howlett [Fri, 20 Jan 2023 16:26:08 +0000 (11:26 -0500)]
mm: expand vma iterator interface

Add wrappers for the maple tree to the vma iterator.  This will provide
type safety at compile time.

Link: https://lkml.kernel.org/r/20230120162650.984577-8-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomaple_tree: fix mas_prev() and mas_find() state handling
Liam R. Howlett [Fri, 20 Jan 2023 16:26:07 +0000 (11:26 -0500)]
maple_tree: fix mas_prev() and mas_find() state handling

When mas_prev() does not find anything, set the state to MAS_NONE.

Handle the MAS_NONE in mas_find() like a MAS_START.

Link: https://lkml.kernel.org/r/20230120162650.984577-7-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reported-by: <syzbot+502859d610c661e56545@syzkaller.appspotmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomaple_tree: fix handle of invalidated state in mas_wr_store_setup()
Liam R. Howlett [Fri, 20 Jan 2023 16:26:06 +0000 (11:26 -0500)]
maple_tree: fix handle of invalidated state in mas_wr_store_setup()

If an invalidated maple state is encountered during write, reset the maple
state to MAS_START.  This will result in a re-walk of the tree to the
correct location for the write.

Link: https://lore.kernel.org/all/20230107020126.1627-1-sj@kernel.org/
Link: https://lkml.kernel.org/r/20230120162650.984577-6-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reported-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agotest_maple_tree: test modifications while iterating
Liam R. Howlett [Fri, 20 Jan 2023 16:26:05 +0000 (11:26 -0500)]
test_maple_tree: test modifications while iterating

Add a testcase to ensure the iterator detects bad states on modifications
and does what the user expects

Link: https://lkml.kernel.org/r/20230120162650.984577-5-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomaple_tree: reduce user error potential
Liam R. Howlett [Fri, 20 Jan 2023 16:26:04 +0000 (11:26 -0500)]
maple_tree: reduce user error potential

When iterating, a user may operate on the tree and cause the maple state
to be altered and left in an unintuitive state.  Detect this scenario and
correct it by setting to the limit and invalidating the state.

Link: https://lkml.kernel.org/r/20230120162650.984577-4-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomaple_tree: fix potential rcu issue
Liam R. Howlett [Fri, 20 Jan 2023 16:26:03 +0000 (11:26 -0500)]
maple_tree: fix potential rcu issue

Ensure the node isn't dead after reading the node end.

Link: https://lkml.kernel.org/r/20230120162650.984577-3-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomaple_tree: add mas_init() function
Liam R. Howlett [Fri, 20 Jan 2023 16:26:02 +0000 (11:26 -0500)]
maple_tree: add mas_init() function

Patch series "VMA tree type safety and remove __vma_adjust()", v4.

This patchset does two things: 1.  Clean up, including removal of
__vma_adjust() and 2.  Extends the VMA iterator API to provide type safety
to the VMA operations using the maple tree, as requested by Linus [1].

It also addresses another issue of usability brought up by Linus about
needing to modify the maple state within the loops.  The maple state has
been replaced by the VMA iterator and the iterator is now modified within
the MM code so the caller should not need to worry about doing the work
themselves when tree modifications occur.

This brought up a potential inconsistency of the iterator state and what
the user expects, so the inconsistency is addressed to keep the VMA
iterator safe for use after the looping over a VMA range.  This is
addressed in patch 3 ("maple_tree: Reduce user error potential") and 4
("test_maple_tree: Test modifications while iterating").

While cleaning up the state, the duplicate locking code in mm/mmap.c
introduced by the maple tree has been address by abstracting it to two
functions: vma_prepare() and vma_complete().  These abstractions allowed
for a much simpler __vma_adjust(), which eventually leads to the removal
of the __vma_adjust() function by placing the logic into the vma_merge()
function itself.

1. https://lore.kernel.org/linux-mm/CAHk-=wg9WQXBGkNdKD2bqocnN73rDswuWsavBB7T-tekykEn_A@mail.gmail.com/

This patch (of 49):

Add a function that will zero out the maple state struct and set some
basic defaults.

Link: https://lkml.kernel.org/r/20230120162650.984577-1-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20230120162650.984577-2-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomm: fix memcpy_from_file_folio() integer underflow
Matthew Wilcox (Oracle) [Fri, 3 Feb 2023 21:28:40 +0000 (16:28 -0500)]
mm: fix memcpy_from_file_folio() integer underflow

If we have a HIGHMEM system with a large folio, 'offset' may be larger
than PAGE_SIZE, and so min_t will cap at 'len' instead of the intended
end-of-page.  That can overflow into the next page which is likely to be
unmapped and fault, but could theoretically copy the wrong data.

Link: https://lkml.kernel.org/r/Y919vmSrtAgsf6K3@casper.infradead.org
Fixes: 00cdf76012ab ("mm: add memcpy_from_file_folio()")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: "Fabio M. De Francesco" <fmdefrancesco@gmail.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agoarm/mm: fix swp type masking in __swp_entry()
David Hildenbrand [Wed, 8 Feb 2023 14:08:01 +0000 (15:08 +0100)]
arm/mm: fix swp type masking in __swp_entry()

We're masking with the number of type bits instead of the type mask, which
is obviously wrong.

Link: https://lkml.kernel.org/r/39fd91e3-c93b-23c6-afc6-cbe473bb0ca9@redhat.com
Fixes: 20aae9eff5ac ("arm/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: Mark Brown <broonie@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Cc: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agompage: convert __mpage_writepage() to use a folio more fully
Matthew Wilcox (Oracle) [Thu, 26 Jan 2023 20:12:55 +0000 (20:12 +0000)]
mpage: convert __mpage_writepage() to use a folio more fully

This is just a conversion to the folio API.  While there are some nods
towards supporting multi-page folios in here, the blocks array is still
sized for one page's worth of blocks, and there are other assumptions such
as the blocks_per_page variable.

[willy@infradead.org: fix accidentally-triggering WARN_ON_ONCE]
Link: https://lkml.kernel.org/r/Y9kuaBgXf9lKJ8b0@casper.infradead.org
Link: https://lkml.kernel.org/r/20230126201255.1681189-3-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agofs: convert writepage_t callback to pass a folio
Matthew Wilcox (Oracle) [Thu, 26 Jan 2023 20:12:54 +0000 (20:12 +0000)]
fs: convert writepage_t callback to pass a folio

Patch series "Convert writepage_t to use a folio".

More folioisation.  I split out the mpage work from everything else
because it completely dominated the patch, but some implementations I just
converted outright.

This patch (of 2):

We always write back an entire folio, but that's currently passed as the
head page.  Convert all filesystems that use write_cache_pages() to expect
a folio instead of a page.

Link: https://lkml.kernel.org/r/20230126201255.1681189-1-willy@infradead.org
Link: https://lkml.kernel.org/r/20230126201255.1681189-2-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomm: add memcpy_from_file_folio()
Matthew Wilcox (Oracle) [Thu, 26 Jan 2023 20:15:52 +0000 (20:15 +0000)]
mm: add memcpy_from_file_folio()

This is the equivalent of memcpy_from_page().  It differs in that it takes
the position in a file instead of offset in a folio, it accepts the total
number of bytes to be copied (instead of the number of bytes to be copied
from this folio) and it returns how many bytes were copied from the folio,
rather than making the caller calculate that and then checking if the
caller got it right.

[akpm@linux-foundation.org: fix typo in comment]
Link: https://lkml.kernel.org/r/20230126201552.1681588-1-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: "Fabio M. De Francesco" <fmdefrancesco@gmail.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agoblock: remove ->rw_page
Christoph Hellwig [Wed, 25 Jan 2023 13:34:36 +0000 (14:34 +0100)]
block: remove ->rw_page

The ->rw_page method is a special purpose bypass of the usual bio handling
path that is limited to single-page reads and writes and synchronous which
causes a lot of extra code in the drivers, callers and the block layer.

The only remaining user is the MM swap code.  Switch that swap code to
simply submit a single-vec on-stack bio an synchronously wait on it based
on a newly added QUEUE_FLAG_SYNCHRONOUS flag set by the drivers that
currently implement ->rw_page instead.  While this touches one extra cache
line and executes extra code, it simplifies the block layer and drivers
and ensures that all feastures are properly supported by all drivers, e.g.
right now ->rw_page bypassed cgroup writeback entirely.

[akpm@linux-foundation.org: fix comment typo, per Dan]
Link: https://lkml.kernel.org/r/20230125133436.447864-8-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomm: factor out a swap_writepage_bdev helper
Christoph Hellwig [Wed, 25 Jan 2023 13:34:35 +0000 (14:34 +0100)]
mm: factor out a swap_writepage_bdev helper

Split the block device case from swap_readpage into a separate helper,
following the abstraction for file based swap.

Link: https://lkml.kernel.org/r/20230125133436.447864-7-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomm: remove the __swap_writepage return value
Christoph Hellwig [Wed, 25 Jan 2023 13:34:34 +0000 (14:34 +0100)]
mm: remove the __swap_writepage return value

__swap_writepage always returns 0.

Link: https://lkml.kernel.org/r/20230125133436.447864-6-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomm: use an on-stack bio for synchronous swapin
Christoph Hellwig [Wed, 25 Jan 2023 13:34:33 +0000 (14:34 +0100)]
mm: use an on-stack bio for synchronous swapin

Optimize the synchronous swap in case by using an on-stack bio instead of
allocating one using bio_alloc.

Link: https://lkml.kernel.org/r/20230125133436.447864-5-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomm: factor out a swap_readpage_bdev helper
Christoph Hellwig [Wed, 25 Jan 2023 13:34:32 +0000 (14:34 +0100)]
mm: factor out a swap_readpage_bdev helper

Split the block device case from swap_readpage into a separate helper,
following the abstraction for file based swap and frontswap.

Link: https://lkml.kernel.org/r/20230125133436.447864-4-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomm: remove the swap_readpage return value
Christoph Hellwig [Wed, 25 Jan 2023 13:34:31 +0000 (14:34 +0100)]
mm: remove the swap_readpage return value

swap_readpage always returns 0, and no caller checks the return value.

[akpm@linux-foundation.org: fix void-returning swap_readpage() stub, per Keith]
Link: https://lkml.kernel.org/r/20230125133436.447864-3-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agompage: stop using bdev_{read,write}_page
Christoph Hellwig [Wed, 25 Jan 2023 13:34:30 +0000 (14:34 +0100)]
mpage: stop using bdev_{read,write}_page

Patch series "remove ->rw_page".

This series removes the ->rw_page block_device_operation, which is an old
and clumsy attempt at a simple read/write fast path for the block layer.
It isn't actually used by the fastest block layer operations that we
support (polled I/O through io_uring), but only used by the mpage buffered
I/O helpers which are some of the slowest I/O we have and do not make any
difference there at all, and zram which is a block device abused to
duplicate the zram functionality.

Given that zram is heavily used we need to make sure there is a good
replacement for synchronous I/O, so this series adds a new flag for
drivers that complete I/O synchronously and uses that flag to use on-stack
bios and synchronous submission for them in the swap code.

This patch (of 7):

These are micro-optimizations for synchronous I/O, which do not matter
compared to all the other inefficiencies in the legacy buffer_head based
mpage code.

Link: https://lkml.kernel.org/r/20230125133436.447864-1-hch@lst.de
Link: https://lkml.kernel.org/r/20230125133436.447864-2-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomm: refactor va_remove_mappings
Christoph Hellwig [Sat, 21 Jan 2023 07:10:51 +0000 (08:10 +0100)]
mm: refactor va_remove_mappings

Move the VM_FLUSH_RESET_PERMS to the caller and rename the function to
better describe what it is doing.

Link: https://lkml.kernel.org/r/20230121071051.1143058-11-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomm: split __vunmap
Christoph Hellwig [Sat, 21 Jan 2023 07:10:50 +0000 (08:10 +0100)]
mm: split __vunmap

vunmap only needs to find and free the vmap_area and vm_strut, so open
code that there and merge the rest of the code into vfree.

Link: https://lkml.kernel.org/r/20230121071051.1143058-10-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomm: move debug checks from __vunmap to remove_vm_area
Christoph Hellwig [Sat, 21 Jan 2023 07:10:49 +0000 (08:10 +0100)]
mm: move debug checks from __vunmap to remove_vm_area

All these checks apply to the free_vm_area interface as well, so move them
to the common routine.

Link: https://lkml.kernel.org/r/20230121071051.1143058-9-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomm: use remove_vm_area in __vunmap
Christoph Hellwig [Sat, 21 Jan 2023 07:10:48 +0000 (08:10 +0100)]
mm: use remove_vm_area in __vunmap

Use the common helper to find and remove a vmap_area instead of open
coding it.

Link: https://lkml.kernel.org/r/20230121071051.1143058-8-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomm: move __remove_vm_area out of va_remove_mappings
Christoph Hellwig [Sat, 21 Jan 2023 07:10:47 +0000 (08:10 +0100)]
mm: move __remove_vm_area out of va_remove_mappings

__remove_vm_area is the only part of va_remove_mappings that requires a
vmap_area.  Move the call out to the caller and only pass the vm_struct to
va_remove_mappings.

Link: https://lkml.kernel.org/r/20230121071051.1143058-7-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomm: call vfree instead of __vunmap from delayed_vfree_work
Christoph Hellwig [Sat, 21 Jan 2023 07:10:46 +0000 (08:10 +0100)]
mm: call vfree instead of __vunmap from delayed_vfree_work

This adds an extra, never taken, in_interrupt() branch, but will allow to
cut down the maze of vfree helpers.

Link: https://lkml.kernel.org/r/20230121071051.1143058-6-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomm: move vmalloc_init and free_work down in vmalloc.c
Christoph Hellwig [Sat, 21 Jan 2023 07:10:45 +0000 (08:10 +0100)]
mm: move vmalloc_init and free_work down in vmalloc.c

Move these two functions around a bit to avoid forward declarations.

Link: https://lkml.kernel.org/r/20230121071051.1143058-5-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomm: remove __vfree_deferred
Christoph Hellwig [Sat, 21 Jan 2023 07:10:44 +0000 (08:10 +0100)]
mm: remove __vfree_deferred

Fold __vfree_deferred into vfree_atomic, and call vfree_atomic early on
from vfree if called from interrupt context so that the extra low-level
helper can be avoided.

Link: https://lkml.kernel.org/r/20230121071051.1143058-4-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomm: remove __vfree
Christoph Hellwig [Sat, 21 Jan 2023 07:10:43 +0000 (08:10 +0100)]
mm: remove __vfree

__vfree is a subset of vfree that just skips a few checks, and which is
only used by vfree and an error cleanup path.  Fold __vfree into vfree and
switch the only other caller to call vfree() instead.

Link: https://lkml.kernel.org/r/20230121071051.1143058-3-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomm: reject vmap with VM_FLUSH_RESET_PERMS
Christoph Hellwig [Sat, 21 Jan 2023 07:10:42 +0000 (08:10 +0100)]
mm: reject vmap with VM_FLUSH_RESET_PERMS

Patch series "cleanup vfree and vunmap".

This little series untangles the vfree and vunmap code path a bit.

This patch (of 10):

VM_FLUSH_RESET_PERMS is just for use with vmalloc as it is tied to freeing
the underlying pages.

Link: https://lkml.kernel.org/r/20230121071051.1143058-1-hch@lst.de
Link: https://lkml.kernel.org/r/20230121071051.1143058-2-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomm, compaction: finish pageblocks on complete migration failure
Mel Gorman [Wed, 25 Jan 2023 13:44:34 +0000 (13:44 +0000)]
mm, compaction: finish pageblocks on complete migration failure

Commit 7efc3b726103 ("mm/compaction: fix set skip in
fast_find_migrateblock") address an issue where a pageblock selected by
fast_find_migrateblock() was ignored.  Unfortunately, the same fix
resulted in numerous reports of khugepaged or kcompactd stalling for long
periods of time or consuming 100% of CPU.

Tracing showed that there was a lot of rescanning between a small subset
of pageblocks because the conditions for marking the block skip are not
met.  The scan is not reaching the end of the pageblock because enough
pages were isolated but none were migrated successfully.  Eventually it
circles back to the same block.

Pageblock skip tracking tries to minimise both latency and excessive
scanning but tracking exactly when a block is fully scanned requires an
excessive amount of state.  This patch forcibly rescans a pageblock when
all isolated pages fail to migrate even though it could be for transient
reasons such as page writeback or page dirty.  This will sometimes migrate
too many pages but pageblocks will be marked skip and forward progress
will be made.

"Usemen" from the mmtests configuration
workload-usemem-stress-numa-compact was used to stress compaction.  The
compaction trace events were recorded using a 6.2-rc5 kernel that includes
commit 7efc3b726103 and count of unique ranges were measured.  The top 5
ranges were

   3076 range=(0x10ca00-0x10cc00)
   3076 range=(0x110a00-0x110c00)
   3098 range=(0x13b600-0x13b800)
   3104 range=(0x141c00-0x141e00)
  11424 range=(0x11b600-0x11b800)

While this workload is very different than what the bugs reported, the
pattern of the same subset of blocks being repeatedly scanned is observed.
At one point, *only* the range range=(0x11b600 ~ 0x11b800) was scanned
for 2 seconds.  14 seconds passed between the first migration-related
event and the last.

With the series applied including this patch, the top 5 ranges were

      1 range=(0x11607e-0x116200)
      1 range=(0x116200-0x116278)
      1 range=(0x116278-0x116400)
      1 range=(0x116400-0x116424)
      1 range=(0x116424-0x116600)

Only unique ranges were scanned and the time between the first
migration-related event was 0.11 milliseconds.

Link: https://lkml.kernel.org/r/20230125134434.18017-5-mgorman@techsingularity.net
Fixes: 7efc3b726103 ("mm/compaction: fix set skip in fast_find_migrateblock")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Chuyi Zhou <zhouchuyi@bytedance.com>
Cc: Jiri Slaby <jirislaby@kernel.org>
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomm, compaction: finish scanning the current pageblock if requested
Mel Gorman [Wed, 25 Jan 2023 13:44:33 +0000 (13:44 +0000)]
mm, compaction: finish scanning the current pageblock if requested

cc->finish_pageblock is set when the current pageblock should be rescanned
but fast_find_migrateblock can select an alternative block.  Disable
fast_find_migrateblock when the current pageblock scan should be
completed.

Link: https://lkml.kernel.org/r/20230125134434.18017-4-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Chuyi Zhou <zhouchuyi@bytedance.com>
Cc: Jiri Slaby <jirislaby@kernel.org>
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomm, compaction: check if a page has been captured before draining PCP pages
Mel Gorman [Wed, 25 Jan 2023 13:44:32 +0000 (13:44 +0000)]
mm, compaction: check if a page has been captured before draining PCP pages

If a page has been captured then draining is unnecssary so check first for
a captured page.

Link: https://lkml.kernel.org/r/20230125134434.18017-3-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Chuyi Zhou <zhouchuyi@bytedance.com>
Cc: Jiri Slaby <jirislaby@kernel.org>
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomm, compaction: rename compact_control->rescan to finish_pageblock
Mel Gorman [Wed, 25 Jan 2023 13:44:31 +0000 (13:44 +0000)]
mm, compaction: rename compact_control->rescan to finish_pageblock

Patch series "Fix excessive CPU usage during compaction".

Commit 7efc3b726103 ("mm/compaction: fix set skip in fast_find_migrateblock")
fixed a problem where pageblocks found by fast_find_migrateblock() were
ignored. Unfortunately there were numerous bug reports complaining about high
CPU usage and massive stalls once 6.1 was released. Due to the severity,
the patch was reverted by Vlastimil as a short-term fix[1] to -stable.

The underlying problem for each of the bugs is suspected to be the
repeated scanning of the same pageblocks.  This series should guarantee
forward progress even with commit 7efc3b726103.  More information is in
the changelog for patch 4.

[1] http://lore.kernel.org/r/20230113173345.9692-1-vbabka@suse.cz

This patch (of 4):

The rescan field was not well named albeit accurate at the time.  Rename
the field to finish_pageblock to indicate that the remainder of the
pageblock should be scanned regardless of COMPACT_CLUSTER_MAX.  The intent
is that pageblocks with transient failures get marked for skipping to
avoid revisiting the same pageblock.

Link: https://lkml.kernel.org/r/20230125134434.18017-2-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Chuyi Zhou <zhouchuyi@bytedance.com>
Cc: Jiri Slaby <jirislaby@kernel.org>
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomm/gup.c: fix typo in comments
Jongwoo Han [Wed, 25 Jan 2023 18:08:47 +0000 (03:08 +0900)]
mm/gup.c: fix typo in comments

Link: https://lkml.kernel.org/r/20230125180847.4542-1-jongwooo.han@gmail.com
Signed-off-by: Jongwoo Han <jongwooo.han@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>