mm/mempolicy: Take VMA lock before replacing policy
author Jann Horn <jannh@google.com>
Fri, 28 Jul 2023 04:13:21 +0000 (06:13 +0200)
committer Linus Torvalds <torvalds@linux-foundation.org>
Fri, 28 Jul 2023 16:44:06 +0000 (09:44 -0700)
commit 6c21e066f9256ea1df6f88768f6ae1080b7cf509
tree 95326194f48628759a895e5d3dfcb766faf4b119
parent 57012c57536f8814dec92e74197ee96c3498d24e

mbind() calls down into vma_replace_policy() without taking the per-VMA
locks, replaces the VMA's vma->vm_policy pointer, and frees the old
policy.  That's bad; a concurrent page fault might still be using the
old policy (in vma_alloc_folio()), resulting in use-after-free.

Normally this will manifest as a use-after-free read first, but it can
also result in memory corruption: for example, vma_alloc_folio() can
call mpol_cond_put() on the freed policy, which conditionally changes
the policy's refcount member.

This bug only exists in kernels built with CONFIG_NUMA, but it affects
such kernels even when they run on non-NUMA systems.

Signed-off-by: Jann Horn <jannh@google.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Fixes: 5e31275cc997 ("mm: add per-VMA lock and helper functions to control it")
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mm/mempolicy.c
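
The race described in the commit message can be sketched as a small userspace model. This is not kernel code: every toy_* name below is hypothetical, the "lock" is a plain counter and flag rather than a real sleeping lock, and the old policy is only marked as freed so the demonstration stays well-defined. It only illustrates why the writer has to claim the VMA before swapping and freeing the policy pointer.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; none of these are kernel types or functions. */
struct toy_policy {
	int mode;
	bool freed;		/* set instead of calling free() */
};

struct toy_vma {
	struct toy_policy *policy;
	int readers;		/* fault handlers holding the read side */
	bool write_locked;	/* set once a writer has claimed the VMA */
};

/* Fault path: take the read side, then borrow the policy pointer. */
static struct toy_policy *toy_fault_begin(struct toy_vma *vma)
{
	if (vma->write_locked)
		return NULL;	/* caller would fall back to a slower path */
	vma->readers++;
	return vma->policy;
}

static void toy_fault_end(struct toy_vma *vma)
{
	vma->readers--;
}

/* Writer side: succeeds only when no fault handler is in flight. */
static bool toy_start_write(struct toy_vma *vma)
{
	if (vma->readers > 0)
		return false;	/* a real lock would sleep here, not fail */
	vma->write_locked = true;
	return true;
}

/* Replace the policy; take_lock == false models the buggy ordering. */
static bool toy_replace_policy(struct toy_vma *vma,
			       struct toy_policy *new_pol, bool take_lock)
{
	struct toy_policy *old;

	if (take_lock && !toy_start_write(vma))
		return false;
	old = vma->policy;
	vma->policy = new_pol;
	old->freed = true;	/* models freeing the old policy */
	return true;
}

/* Returns 1 if an in-flight fault ends up holding a freed policy. */
static int toy_race(bool take_lock)
{
	struct toy_policy a = { .mode = 1, .freed = false };
	struct toy_policy b = { .mode = 2, .freed = false };
	struct toy_vma vma = { .policy = &a, .readers = 0,
			       .write_locked = false };
	struct toy_policy *borrowed = toy_fault_begin(&vma); /* fault starts */
	int stale;

	toy_replace_policy(&vma, &b, take_lock);	/* writer runs */
	stale = borrowed->freed ? 1 : 0;		/* fault resumes */
	toy_fault_end(&vma);
	return stale;
}
```

In this model, toy_race(false) reports a stale policy because the writer frees it while the fault still holds the pointer, whereas toy_race(true) does not, because the writer cannot claim the VMA until the fault has finished.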