kernel/events/uprobes: handle device-exclusive entries correctly in __replace_page()
author David Hildenbrand <david@redhat.com>
Mon, 10 Feb 2025 19:37:50 +0000 (20:37 +0100)
committer Andrew Morton <akpm@linux-foundation.org>
Mon, 17 Mar 2025 05:05:58 +0000 (22:05 -0700)
Ever since commit b756a3b5e7ea ("mm: device exclusive memory access") we
can return with a device-exclusive entry from page_vma_mapped_walk().

__replace_page() is not prepared for that, so teach it about these PFN
swap PTEs.  Note that device-private entries are so far not applicable on
that path, because GUP would never have returned such folios (conversion
to device-private happens by page migration, not in-place conversion of
the PTE).

There is a race between GUP and us locking the folio to look it up using
page_vma_mapped_walk(), so this is likely a real fix (unless something
else could prevent that race, but nothing obvious does).  pte_pfn() on
something that is not a present PTE could give us garbage, and we'd
wrongly mess up the mapcount, because it was already adjusted by calling
folio_remove_rmap_pte() when making the entry device-exclusive.
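
Bailing out of the walk is sufficient because __replace_page() presets err
to -EAGAIN before the walk, so the caller retries, and the fresh GUP pass
faults the page in, restoring a present PTE before the next attempt (hence
"simply trigger GUP again to fix it up" in the added comment).  A hedged
sketch of that retry shape (simplified; get_user_page_at() is a
hypothetical stand-in for the caller's GUP call, whose exact signature
varies across kernel versions):

	/*
	 * Simplified retry shape, illustrative only: a failed walk,
	 * including the new device-exclusive bail-out, returns -EAGAIN
	 * and the caller re-runs GUP, whose fault handling restores a
	 * present PTE before the next attempt.
	 */
	do {
		/* hypothetical GUP wrapper; faulting undoes device-exclusive */
		old_page = get_user_page_at(mm, vaddr);
		ret = __replace_page(vma, vaddr, old_page, new_page);
	} while (ret == -EAGAIN);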

Link: https://lkml.kernel.org/r/20250210193801.781278-9-david@redhat.com
Fixes: b756a3b5e7ea ("mm: device exclusive memory access")
Signed-off-by: David Hildenbrand <david@redhat.com>
Tested-by: Alistair Popple <apopple@nvidia.com>
Cc: Alex Shi <alexs@kernel.org>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Dave Airlie <airlied@gmail.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Karol Herbst <kherbst@redhat.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Lyude <lyude@redhat.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Simona Vetter <simona.vetter@ffwll.ch>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yanteng Si <si.yanteng@linux.dev>
Cc: Barry Song <v-songbaohua@oppo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
kernel/events/uprobes.c

index b4ca889..eac24f3 100644
@@ -173,6 +173,7 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
        DEFINE_FOLIO_VMA_WALK(pvmw, old_folio, vma, addr, 0);
        int err;
        struct mmu_notifier_range range;
+       pte_t pte;
 
        mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, addr,
                                addr + PAGE_SIZE);
@@ -192,6 +193,16 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
        if (!page_vma_mapped_walk(&pvmw))
                goto unlock;
        VM_BUG_ON_PAGE(addr != pvmw.address, old_page);
+       pte = ptep_get(pvmw.pte);
+
+       /*
+        * Handle PFN swap PTEs, such as device-exclusive ones, that actually
+        * map pages: simply trigger GUP again to fix it up.
+        */
+       if (unlikely(!pte_present(pte))) {
+               page_vma_mapped_walk_done(&pvmw);
+               goto unlock;
+       }
 
        if (new_page) {
                folio_get(new_folio);
@@ -206,7 +217,7 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
                inc_mm_counter(mm, MM_ANONPAGES);
        }
 
-       flush_cache_page(vma, addr, pte_pfn(ptep_get(pvmw.pte)));
+       flush_cache_page(vma, addr, pte_pfn(pte));
        ptep_clear_flush(vma, addr, pvmw.pte);
        if (new_page)
                set_pte_at(mm, addr, pvmw.pte,