huge tmpfs: SGP_NOALLOC to stop collapse_file() on race
authorHugh Dickins <hughd@google.com>
Thu, 2 Sep 2021 21:54:34 +0000 (14:54 -0700)
committerLinus Torvalds <torvalds@linux-foundation.org>
Fri, 3 Sep 2021 16:58:12 +0000 (09:58 -0700)
khugepaged's collapse_file() currently uses SGP_NOHUGE to tell
shmem_getpage() not to try allocating a huge page, in the very unlikely
event that a racing hole-punch removes the swapped or fallocated page as
soon as i_pages lock is dropped.

We want to consolidate shmem's huge decisions, removing SGP_HUGE and
SGP_NOHUGE; but cannot quite persuade ourselves that it's okay to regress
the protection in this case - Yang Shi points out that the huge page would
remain indefinitely, charged to root instead of the intended memcg.

collapse_file() should not even allocate a small page in this case: why
proceed if someone is punching a hole?  SGP_READ is almost the right flag
here, except that it optimizes away from a fallocated page, with NULL to
tell caller to fill with zeroes (like a hole); whereas collapse_file()'s
sequence relies on using a cache page.  Add SGP_NOALLOC just for this.

There are too many consecutive "if (page"s there in shmem_getpage_gfp():
group it better; and fix the outdated "bring it back from swap" comment.

Link: https://lkml.kernel.org/r/1355343b-acf-4653-ef79-6aee40214ac5@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
include/linux/shmem_fs.h
mm/khugepaged.c
mm/shmem.c

index bfc5899..a3f4502 100644 (file)
@@ -94,6 +94,7 @@ extern unsigned long shmem_partial_swap_usage(struct address_space *mapping,
 /* Flag allocation requirements to shmem_getpage */
 enum sgp_type {
        SGP_READ,       /* don't exceed i_size, don't allocate page */
+       SGP_NOALLOC,    /* similar, but fail on hole or use fallocated page */
        SGP_CACHE,      /* don't exceed i_size, may allocate page */
        SGP_NOHUGE,     /* like SGP_CACHE, but no huge pages */
        SGP_HUGE,       /* like SGP_CACHE, huge pages preferred */
index b0412be..045cc57 100644 (file)
@@ -1721,7 +1721,7 @@ static void collapse_file(struct mm_struct *mm,
                                xas_unlock_irq(&xas);
                                /* swap in or instantiate fallocated page */
                                if (shmem_getpage(mapping->host, index, &page,
-                                                 SGP_NOHUGE)) {
+                                                 SGP_NOALLOC)) {
                                        result = SCAN_FAIL;
                                        goto xa_unlocked;
                                }
index 2df6a53..867cb40 100644 (file)
@@ -1854,26 +1854,31 @@ repeat:
                return error;
        }
 
-       if (page)
+       if (page) {
                hindex = page->index;
-       if (page && sgp == SGP_WRITE)
-               mark_page_accessed(page);
-
-       /* fallocated page? */
-       if (page && !PageUptodate(page)) {
+               if (sgp == SGP_WRITE)
+                       mark_page_accessed(page);
+               if (PageUptodate(page))
+                       goto out;
+               /* fallocated page */
                if (sgp != SGP_READ)
                        goto clear;
                unlock_page(page);
                put_page(page);
-               page = NULL;
-               hindex = index;
        }
-       if (page || sgp == SGP_READ)
-               goto out;
 
        /*
-        * Fast cache lookup did not find it:
-        * bring it back from swap or allocate.
+        * SGP_READ: succeed on hole, with NULL page, letting caller zero.
+        * SGP_NOALLOC: fail on hole, with NULL page, letting caller fail.
+        */
+       *pagep = NULL;
+       if (sgp == SGP_READ)
+               return 0;
+       if (sgp == SGP_NOALLOC)
+               return -ENOENT;
+
+       /*
+        * Fast cache lookup and swap lookup did not find it: allocate.
         */
 
        if (vma && userfaultfd_missing(vma)) {