selftests/vm/userfaultfd: wake after copy failure
authorNadav Amit <namit@vmware.com>
Thu, 2 Sep 2021 21:59:02 +0000 (14:59 -0700)
committerLinus Torvalds <torvalds@linux-foundation.org>
Fri, 3 Sep 2021 16:58:16 +0000 (09:58 -0700)
When userfaultfd copy-ioctl fails since the PTE already exists, an -EEXIST
error is returned and the faulting thread is not woken.  The current
userfaultfd test does not wake the faulting thread in such case.  The
assumption is presumably that another thread set the PTE through copy/wp
ioctl and would wake the faulting thread or that alternatively the fault
handler would realize there is no need to "must_wait" and continue.  This
is not necessarily true.

There is an assumption that the "must_wait" tests in handle_userfault()
are sufficient to provide definitive answer whether the offending PTE is
populated or not.  However, userfaultfd_must_wait() test is lockless.
Consequently, concurrent calls to ptep_modify_prot_start(), for instance,
can clear the PTE and can cause userfaultfd_must_wait() to wrongly assume
it is not populated and a wait is needed.

There are therefore 3 options:
(1) Change the tests to wake on copy failure.
(2) Wake faulting thread unconditionally on zero/copy ioctls before
    returning -EEXIST.
(3) Change the userfaultfd_must_wait() to hold locks.

This patch took the first approach, but the others are valid solutions
with different tradeoffs.

Link: https://lkml.kernel.org/r/20210808020724.1022515-4-namit@vmware.com
Signed-off-by: Nadav Amit <namit@vmware.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
tools/testing/selftests/vm/userfaultfd.c

index 2ea438e..10ab56c 100644 (file)
@@ -566,6 +566,18 @@ static void retry_copy_page(int ufd, struct uffdio_copy *uffdio_copy,
        }
 }
 
+static void wake_range(int ufd, unsigned long addr, unsigned long len)
+{
+       struct uffdio_range uffdio_wake;
+
+       uffdio_wake.start = addr;
+       uffdio_wake.len = len;
+
+       if (ioctl(ufd, UFFDIO_WAKE, &uffdio_wake))
+               fprintf(stderr, "error waking %lu\n",
+                       addr), exit(1);
+}
+
 static int __copy_page(int ufd, unsigned long offset, bool retry)
 {
        struct uffdio_copy uffdio_copy;
@@ -585,6 +597,7 @@ static int __copy_page(int ufd, unsigned long offset, bool retry)
                if (uffdio_copy.copy != -EEXIST)
                        err("UFFDIO_COPY error: %"PRId64,
                            (int64_t)uffdio_copy.copy);
+               wake_range(ufd, uffdio_copy.dst, page_size);
        } else if (uffdio_copy.copy != page_size) {
                err("UFFDIO_COPY error: %"PRId64, (int64_t)uffdio_copy.copy);
        } else {