Carlos Maiolino [Tue, 12 Nov 2024 10:03:15 +0000 (11:03 +0100)]
Merge tag 'better-ondisk-6.13_2024-11-05' of https://git./linux/kernel/git/djwong/xfs-linux into staging-merge
xfs: improve ondisk structure checks [v5.5 10/10]
Reorganize xfs_ondisk.h to group the build checks by type, then add a
bunch of missing checks that were in xfs/122 but not the build system.
With this, we can get rid of xfs/122.
With a bit of luck, this should all go splendidly.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Carlos Maiolino [Tue, 12 Nov 2024 10:02:55 +0000 (11:02 +0100)]
Merge tag 'metadir-6.13_2024-11-05' of https://git./linux/kernel/git/djwong/xfs-linux into staging-merge
xfs: enable metadir [v5.5 09/10]
Actually enable this very large feature, which adds metadata directory
trees, allocation groups on the realtime volume, persistent quota
options, and quota for realtime files.
With a bit of luck, this should all go splendidly.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Carlos Maiolino [Tue, 12 Nov 2024 10:02:25 +0000 (11:02 +0100)]
Merge tag 'realtime-quotas-6.13_2024-11-05' of https://git./linux/kernel/git/djwong/xfs-linux into staging-merge
xfs: enable quota for realtime volumes [v5.5 08/10]
At some point, I realized that I've refactored enough of the quota code
in XFS that I should evaluate whether or not quota actually works on
realtime volumes. It turns out that it nearly works: the only broken
pieces are chown and delayed allocation, and reporting of project
quotas in the statvfs output for projinherit+rtinherit directories.
Fix these things and we can have realtime quotas again after 20 years.
With a bit of luck, this should all go splendidly.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Carlos Maiolino [Tue, 12 Nov 2024 10:01:12 +0000 (11:01 +0100)]
Merge tag 'metadir-quotas-6.13_2024-11-05' of https://git./linux/kernel/git/djwong/xfs-linux into staging-merge
xfs: persist quota options with metadir [v5.5 07/10]
Store the quota files in the metadata directory tree instead of the
superblock. Since we're introducing a new incompat feature flag, let's
also make the mount process bring up quotas in whatever state they were
when the filesystem was last unmounted, instead of requiring sysadmins
to remember that themselves.
With a bit of luck, this should all go splendidly.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Carlos Maiolino [Tue, 12 Nov 2024 10:00:42 +0000 (11:00 +0100)]
Merge tag 'realtime-groups-6.13_2024-11-05' of https://git./linux/kernel/git/djwong/xfs-linux into staging-merge
xfs: shard the realtime section [v5.5 06/10]
Right now, the realtime section uses a single pair of metadata inodes to
store the free space information. This presents a scalability problem
since every thread trying to allocate or free rt extents have to lock
these files. Solve this problem by sharding the realtime section into
separate realtime allocation groups.
While we're at it, define a superblock to be stamped into the start of
the rt section. This enables utilities such as blkid to identify block
devices containing realtime sections, and avoids the situation where
anything written into block 0 of the realtime extent can be
misinterpreted as file data.
The best advantage for rtgroups will become evident later when we get to
adding rmap and reflink to the realtime volume, since the geometry
constraints are the same for rt groups and AGs. Hence we can reuse all
that code directly.
This is a very large patchset, but it catches us up with 20 years of
technical debt that have accumulated.
With a bit of luck, this should all go splendidly.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Carlos Maiolino [Tue, 12 Nov 2024 10:00:16 +0000 (11:00 +0100)]
Merge tag 'rtgroups-prep-6.13_2024-11-05' of https://git./linux/kernel/git/djwong/xfs-linux into staging-merge
xfs: preparation for realtime allocation groups [v5.5 05/10]
Prepare for realtime groups by adding a few bug fixes and generic code
that will be necessary.
With a bit of luck, this should all go splendidly.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Carlos Maiolino [Tue, 12 Nov 2024 09:59:34 +0000 (10:59 +0100)]
Merge tag 'incore-rtgroups-6.13_2024-11-05' of https://git./linux/kernel/git/djwong/xfs-linux into staging-merge
xfs: create incore rt allocation groups [v5.5 04/10]
Add in-memory data structures for sharding the realtime volume into
independent allocation groups. For existing filesystems, the entire rt
volume is modelled as having a single large group, with (potentially) a
number of rt extents exceeding 2^32 blocks, though these are not likely
to exist because the codebase has been a bit broken for decades. The
next series fills in the ondisk format and other supporting structures.
With a bit of luck, this should all go splendidly.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Carlos Maiolino [Tue, 12 Nov 2024 09:59:05 +0000 (10:59 +0100)]
Merge tag 'metadata-directory-tree-6.13_2024-11-05' of https://git./linux/kernel/git/djwong/xfs-linux into staging-merge
xfs: metadata inode directory trees [v5.5 03/10]
This series delivers a new feature -- metadata inode directories. This
is a separate directory tree (rooted in the superblock) that contains
only inodes that contain filesystem metadata. Different metadata
objects can be looked up with regular paths.
Start by creating xfs_imeta{dir,file}* functions to mediate access to
the metadata directory tree. By the end of this mega series, all
existing metadata inodes (rt+quota) will use this directory tree instead
of the superblock.
Next, define the metadir on-disk format, which consists of marking
inodes with a new iflag that says they're metadata. This prevents
bulkstat and friends from ever getting their hands on fs metadata files.
With a bit of luck, this should all go splendidly.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Carlos Maiolino [Tue, 12 Nov 2024 09:58:27 +0000 (10:58 +0100)]
Merge tag 'generic-groups-6.13_2024-11-05' of https://git./linux/kernel/git/djwong/xfs-linux into staging-merge
xfs: create a generic allocation group structure [v5.5 02/10]
Soon we'll be sharding the realtime volume into separate allocation
groups. These rt groups will /mostly/ behave the same as the ones on
the data device, but since rt groups don't have quite the same set of
struct fields as perags, let's hoist the parts that will be shared by
both into a common xfs_group object.
With a bit of luck, this should all go splendidly.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Carlos Maiolino [Tue, 12 Nov 2024 09:57:32 +0000 (10:57 +0100)]
Merge tag 'perag-xarray-6.13_2024-11-05' of https://git./linux/kernel/git/djwong/xfs-linux into staging-merge
xfs: convert perag to use xarrays [v5.5 01/10]
Convert the xfs_mount perag tree to use an xarray instead of a radix
tree. There should be no functional changes here.
With a bit of luck, this should all go splendidly.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:45 +0000 (20:19 -0800)]
xfs: port ondisk structure checks from xfs/122 to the kernel
Check this with every kernel and userspace build, so we can drop the
nonsense in xfs/122. Roughly drafted with:
sed -e 's/^offsetof/\tXFS_CHECK_OFFSET/g' \
-e 's/^sizeof/\tXFS_CHECK_STRUCT_SIZE/g' \
-e 's/ = \([0-9]*\)/,\t\t\t\1);/g' \
-e 's/xfs_sb_t/struct xfs_dsb/g' \
-e 's/),/,/g' \
-e 's/xfs_\([a-z0-9_]*\)_t,/struct xfs_\1,/g' \
< tests/xfs/122.out | sort
and then manual fixups.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:44 +0000 (20:19 -0800)]
xfs: separate space btree structures in xfs_ondisk.h
Create a separate section for space management btrees so that they're
not mixed in with file structures. Ignore the dsb stuff sprinkled
around for now, because we'll deal with that in a subsequent patch.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:44 +0000 (20:19 -0800)]
xfs: convert struct typedefs in xfs_ondisk.h
Replace xfs_foo_t with struct xfs_foo where appropriate. The next patch
will import more checks from xfs/122, and it's easier to automate
deduplication if we don't have to reason about typedefs.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:43 +0000 (20:19 -0800)]
xfs: enable metadata directory feature
Enable the metadata directory feature. With this feature, all metadata
inodes are placed in the metadata directory, and the only inumbers in
the superblock are the roots of the two directory trees.
The RT device is now sharded into a number of rtgroups, where 0 rtgroups
mean that no RT extents are supported, and the traditional XFS stub RT
bitmap and summary inodes don't exist. A single rtgroup gives roughly
identical behavior to the traditional RT setup, but now with checksummed
and self identifying free space metadata.
For quota, the quota options are read from the superblock unless
explicitly overridden via mount options.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:42 +0000 (20:19 -0800)]
xfs: enable realtime quota again
Enable quotas for the realtime device.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:43 +0000 (20:19 -0800)]
xfs: update sb field checks when metadir is turned on
When metadir is enabled, we want to check the two new rtgroups fields,
and we don't want to check the old inumbers that are now in the metadir.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:41 +0000 (20:19 -0800)]
xfs: reserve quota for realtime files correctly
Fix xfs_quota_reserve_blkres to reserve rt block quota whenever we're
dealing with a realtime file.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:41 +0000 (20:19 -0800)]
xfs: create quota preallocation watermarks for realtime quota
Refactor the quota preallocation watermarking code so that it'll work
for realtime quota too. Convert the do_div calls into div_u64 for
compactness.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:40 +0000 (20:19 -0800)]
xfs: report realtime block quota limits on realtime directories
On the data device, calling statvfs on a projinherit directory results
in the block and avail counts being curtailed to the project quota block
limits, if any are set. Do the same for realtime files or directories,
only use the project quota rt block limits.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:39 +0000 (20:19 -0800)]
xfs: persist quota flags with metadir
It's annoying that one has to keep reminding XFS about what quota
options it should mount with, since the quota flags recording the
previous state are sitting right there in the primary superblock. Even
more strangely, there exists a noquota option to disable quotas
completely, so it's odder still that providing no options is the same as
noquota.
Starting with metadir, let's change the behavior so that if the user
does not specify any quota-related mount options at all, the ondisk
quota flags will be used to bring up quota. In other words, the
filesystem will mount in the same state and with the same functionality
as it had during the last mount.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:40 +0000 (20:19 -0800)]
xfs: advertise realtime quota support in the xqm stat files
Add a fifth column to this (really old) stat file to advertise that the
kernel supports quota for realtime volumes. This will be used by
fstests to detect kernel support.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:38 +0000 (20:19 -0800)]
xfs: scrub quota file metapaths
Enable online fsck for quota file metadata directory paths.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:39 +0000 (20:19 -0800)]
xfs: fix chown with rt quota
Make chown's quota adjustments work with realtime files. This is mostly
a matter of calling xfs_inode_count_blocks on a given file to figure out
the number of blocks allocated to the data device and to the realtime
device, and using those quantities to update the quota accounting when
the id changes. Delayed allocation reservations are moved from the old
dquot's incore reservation to the new dquot's incore reservation.
Note that there was a missing ILOCK bug in xfs_qm_dqusage_adjust that we
must fix before calling xfs_iread_extents. Prior to 2.6.37 the locking
was correct, but then someone removed the ILOCK as part of a cleanup.
Nobody noticed because nowhere in the git history have we ever supported
rt+quota so nobody can use this.
I'm leaving git breadcrumbs in case anyone is desperate enough to try to
backport the rtquota code to old kernels.
Not-Cc: <stable@vger.kernel.org> # v2.6.37
Fixes:
52fda114249578 ("xfs: simplify xfs_qm_dqusage_adjust")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:37 +0000 (20:19 -0800)]
xfs: use metadir for quota inodes
Store the quota inodes in the /quota metadata directory if metadir is
enabled. This enables us to stop using the sb_[ugp]uotino fields in the
superblock. From this point on, all metadata files will be children of
the metadata directory tree root.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:37 +0000 (20:19 -0800)]
xfs: refactor xfs_qm_destroy_quotainos
Reuse this function instead of open-coding the logic.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:36 +0000 (20:19 -0800)]
xfs: use rtgroup busy extent list for FITRIM
For filesystems that have rtgroups and hence use the busy extent list
for freed rt space, use that busy extent list so that FITRIM can issue
discard commands asynchronously without worrying about other callers
accidentally allocating and using space that is being discarded.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:36 +0000 (20:19 -0800)]
xfs: implement busy extent tracking for rtgroups
For rtgroups filesystems, track newly freed (rt) space through the log
until the rt EFIs have been committed to disk. This way we ensure that
space cannot be reused until all traces of the old owner are gone.
As a fringe benefit, we now support -o discard on the realtime device.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:35 +0000 (20:19 -0800)]
xfs: port the perag discard code to handle generic groups
Port xfs_discard_extents and its tracepoints to handle generic groups
instead of just perags. This is needed to enable busy extent tracking
for rtgroups.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:35 +0000 (20:19 -0800)]
xfs: move the min and max group block numbers to xfs_group
Move the min and max agblock numbers to the generic xfs_group structure
so that we can start building validators for extents within an rtgroup.
While we're at it, use check_add_overflow for the extent length
computation because that has much better overflow checking.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:34 +0000 (20:19 -0800)]
xfs: adjust min_block usage in xfs_verify_agbno
There's some weird logic in xfs_verify_agbno -- min_block ought to be
the first agblock number in the AG that can be used by non-static
metadata. However, we initialize it to the last agblock of the static
metadata, which works due to the <= check, even though this isn't
technically correct.
Change the check to < and set min_block to the next agblock past the
static metadata. This hasn't been an issue up to now, but we're going
to move these things into the generic group struct, and this will cause
problems with rtgroups, where min_block can be zero for an rtgroup that
doesn't have a rt superblock.
Note that there's no user-visible impact with the old logic, so this
isn't a bug fix.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:33 +0000 (20:19 -0800)]
xfs: make xfs_rtblock_t a segmented address like xfs_fsblock_t
Now that we've finished adding allocation groups to the realtime volume,
let's make the file block mapping address (xfs_rtblock_t) a segmented
value just like we do on the data device. This means that group number
and block number conversions can be done with shifting and masking
instead of integer division.
While in theory we could continue caching the rgno shift value in
m_rgblklog, the fact that we now always use the shift value means that
we have an opportunity to increase the redundancy of the rt geometry by
storing it in the ondisk superblock and adding more sb verifier code.
Extend the sueprblock to store the rgblklog value.
Now that we have segmented addresses, set the correct values in
m_groups[XG_TYPE_RTG] so that the xfs_group helpers work correctly.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:33 +0000 (20:19 -0800)]
xfs: create helpers to deal with rounding xfs_filblks_t to rtx boundaries
We're about to segment xfs_rtblock_t addresses, so we must create
type-specific helpers to do rt extent rounding of file mapping block
lengths because the rtb helpers soon will not do the right thing there.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:32 +0000 (20:19 -0800)]
xfs: create helpers to deal with rounding xfs_fileoff_t to rtx boundaries
We're about to segment xfs_rtblock_t addresses, so we must create
type-specific helpers to do rt extent rounding of file block offsets
because the rtb helpers soon will not do the right thing there.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:31 +0000 (20:19 -0800)]
xfs: mask off the rtbitmap and summary inodes when metadir in use
Set the rtbitmap and summary file inumbers to NULLFSINO in the
superblock and make sure they're zeroed whenever we write the superblock
to disk, to mimic mkfs behavior.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:31 +0000 (20:19 -0800)]
xfs: scrub metadir paths for rtgroup metadata
Add the code we need to scan the metadata directory paths of rt group
metadata files.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:30 +0000 (20:19 -0800)]
xfs: repair realtime group superblock
Repair the realtime superblock if it has become out of date with the
primary superblock.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:30 +0000 (20:19 -0800)]
xfs: scrub the realtime group superblock
Enable scrubbing of realtime group superblocks.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:29 +0000 (20:19 -0800)]
xfs: don't coalesce file mappings that cross rtgroup boundaries in scrub
The bmbt scrubber will combine file mappings if they are mergeable to
reduce the number of cross-referencing checks. However, we shouldn't
combine mappings that cross rt group boundaries because that will cause
verifiers to trip incorrectly.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Christoph Hellwig [Mon, 4 Nov 2024 04:19:29 +0000 (20:19 -0800)]
xfs: make the RT allocator rtgroup aware
Make the allocator rtgroup aware by either picking a specific group if
there is a hint, or loop over all groups otherwise. A simple rotor is
provided to pick the placement for initial allocations.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:28 +0000 (20:19 -0800)]
xfs: don't merge ioends across RTGs
Unlike AGs, RTGs don't always have metadata in their first blocks, and
thus we don't get automatic protection from merging I/O completions
across RTG boundaries. Add code to set the IOMAP_F_BOUNDARY flag for
ioends that start at the first block of a RTG so that they never get
merged into the previous ioend.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:27 +0000 (20:19 -0800)]
xfs: use realtime EFI to free extents when rtgroups are enabled
When rmap is enabled, XFS expects a certain order of operations, which
is: 1) remove the file mapping, 2) remove the reverse mapping, and then
3) free the blocks. When reflink is enabled, XFS replaces (3) with a
deferred refcount decrement operation that can schedule freeing the
blocks if that was the last refcount.
For realtime files, xfs_bmap_del_extent_real tries to do 1 and 3 in the
same transaction, which will break both rmap and reflink unless we
switch it to use realtime EFIs. Both rmap and reflink depend on the
rtgroups feature, so let's turn on EFIs for all rtgroups filesystems.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:27 +0000 (20:19 -0800)]
xfs: support error injection when freeing rt extents
A handful of fstests expect to be able to test what happens when extent
free intents fail to actually free the extent. Now that we're
supporting EFIs for realtime extents, add to xfs_rtfree_extent the same
injection point that exists in the regular extent freeing code.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:26 +0000 (20:19 -0800)]
xfs: support logging EFIs for realtime extents
Teach the EFI mechanism how to free realtime extents. We're going to
need this to enforce proper ordering of operations when we enable
realtime rmap.
Declare a new log intent item type (XFS_LI_EFI_RT) and a separate defer
ops for rt extents. This keeps the ondisk artifacts and processing code
completely separate between the rt and non-rt cases. Hopefully this
will make it easier to debug filesystem problems.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:26 +0000 (20:19 -0800)]
xfs: force swapext to a realtime file to use the file content exchange ioctl
xfs_swap_extent_rmap does not use log items to track the overall
progress of an attempt to swap the extent mappings between two files.
If the system crashes in the middle of swapping a partially written
realtime extent, the mapping will be left in an inconsistent state
wherein a file can point to multiple extents on the rt volume.
The new file range exchange functionality handles this correctly, so all
callers must upgrade to that.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:25 +0000 (20:19 -0800)]
xfs: store rtgroup information with a bmap intent
Make the bmap intent items take an active reference to the rtgroup
containing the space that is being mapped or unmapped. We will need
this functionality once we start enabling rmap and reflink on the rt
volume. Technically speaking we need it even for !rtgroups filesystems
to prevent the (dummy) rtgroup 0 from going away, even though this will
never happen.
As a bonus, we can rework the xfs_bmap_deferred_class tracepoint to use
the xfs_group object to figure out the type and group number, widen the
group block number field to fit 64-bit quantities, and get rid of the
now redundant opdev and rtblock fields.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:24 +0000 (20:19 -0800)]
xfs: grow the realtime section when realtime groups are enabled
Enable growing the rt section when realtime groups are enabled.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:24 +0000 (20:19 -0800)]
xfs: encode the rtsummary in big endian format
Currently, the ondisk realtime summary file counters are accessed in
units of 32-bit words. There's no endian translation of the contents of
this file, which means that the Bad Things Happen(tm) if you go from
(say) x86 to powerpc. Since we have a new feature flag, let's take the
opportunity to enforce an endianness on the file. Encode the summary
information in big endian format, like most of the rest of the
filesystem.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:23 +0000 (20:19 -0800)]
xfs: encode the rtbitmap in big endian format
Currently, the ondisk realtime bitmap file is accessed in units of
32-bit words. There's no endian translation of the contents of this
file, which means that the Bad Things Happen(tm) if you go from (say)
x86 to powerpc. Since we have a new feature flag, let's take the
opportunity to enforce an endianness on the file.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:23 +0000 (20:19 -0800)]
xfs: add block headers to realtime bitmap and summary blocks
Upgrade rtbitmap and rtsummary blocks to have self describing metadata
like most every other thing in XFS.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:22 +0000 (20:19 -0800)]
xfs: export the geometry of realtime groups to userspace
Create an ioctl so that the kernel can report the status of realtime
groups to userspace.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:21 +0000 (20:19 -0800)]
xfs: record rt group metadata errors in the health system
Record the state of per-rtgroup metadata sickness in the rtgroup
structure for later reporting.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:21 +0000 (20:19 -0800)]
xfs: convert sick_map loops to use ARRAY_SIZE
Convert these arrays to use ARRAY_SIZE insteead of requiring an empty
sentinel array element at the end. This saves memory and would have
avoided a bug that worked its way into the next patch.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:20 +0000 (20:19 -0800)]
xfs: add frextents to the lazysbcounters when rtgroups enabled
Make the free rt extent count a part of the lazy sb counters when the
realtime groups feature is enabled. This is possible because the patch
to recompute frextents from the rtbitmap during log recovery predates
the code adding rtgroup support, hence we know that the value will
always be correct during runtime.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Christoph Hellwig [Mon, 4 Nov 2024 04:19:20 +0000 (20:19 -0800)]
xfs: add a helper to prevent bmap merges across rtgroup boundaries
Except for the rt superblock, realtime groups do not store any metadata
at the start (or end) of the group. There is nothing to prevent the
bmap code from merging allocations from multiple groups into a single
bmap record. Add a helper to check for this case.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: massage the commit message after pulling this into rtgroups]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Mon, 4 Nov 2024 04:19:16 +0000 (20:19 -0800)]
iomap: add a merge boundary flag
File systems might have boundaries over which merges aren't possible.
In fact these are very common, although most of the time some kind of
header at the beginning of this region (e.g. XFS alloation groups, ext4
block groups) automatically create a merge barrier. But if that is
not present, say for a device purely used for data we need to manually
communicate that to iomap.
Add a IOMAP_F_BOUNDARY flag to never merge I/O into a previous mapping.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:19 +0000 (20:19 -0800)]
xfs: check that rtblock extents do not break rtsupers or rtgroups
Check that rt block pointers do not point to the realtime superblock and
that allocated rt space extents do not cross rtgroup boundaries.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:18 +0000 (20:19 -0800)]
xfs: export realtime group geometry via XFS_FSOP_GEOM
Export the realtime geometry information so that userspace can query it.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:18 +0000 (20:19 -0800)]
xfs: update realtime super every time we update the primary fs super
Every time we update parts of the primary filesystem superblock that are
echoed in the rt superblock, we must update the rt super. Avoid
changing the log to support logging to the rt device by using ordered
buffers.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:17 +0000 (20:19 -0800)]
xfs: check the realtime superblock at mount time
Check the realtime superblock at mount time, to ensure that the label
and uuids actually match the primary superblock on the data device. If
the rt superblock is good, attach it to the xfs_mount so that the log
can use ordered buffers to keep this primary in sync with the primary
super on the data device.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:17 +0000 (20:19 -0800)]
xfs: define the format of rt groups
Define the ondisk format of realtime group metadata, and a superblock
for realtime volumes. rt supers are conditionally enabled by a
predicate function so that they can be disabled if we ever implement
zoned storage support for the realtime volume.
For rt group enabled file systems there is a separate bitmap and summary
file for each group and thus the number of bitmap and summary blocks
needs to be calculated differently.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Christoph Hellwig [Mon, 4 Nov 2024 04:19:15 +0000 (20:19 -0800)]
xfs: make RT extent numbers relative to the rtgroup
To prepare for adding per-rtgroup bitmap files, make the xfs_rtxnum_t
type encode the RT extent number relative to the rtgroup. The biggest
part of this to clearly distinguish between the relative extent number
that gets masked when converting from a global block number and length
values that just have a factor applied to them when converting from
file system blocks.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:16 +0000 (20:19 -0800)]
xfs: fix rt device offset calculations for FITRIM
FITRIM on xfs has this bizarro uapi where we flatten all the physically
addressable storage across two block devices into a linear address
space. In this address space, the realtime device comes immediately
after the data device. Therefore, the xfs_trim_rtdev_extents has to
convert its input parameters from the linear address space to actual
rtdev block addresses on the realtime volume.
Right now the address space conversion is done in units of rtblocks.
However, a future patchset will convert xfs_rtblock_t to be a segmented
address space (group:blkno) like the data device. Change the conversion
code to be done in units of daddrs since those will never be segmented.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Christoph Hellwig [Mon, 4 Nov 2024 04:19:14 +0000 (20:19 -0800)]
xfs: refactor xfs_rtsummary_blockcount
Make xfs_rtsummary_blockcount take all the required information from
the mount structure and return the number of summary levels from it
as well. This cleans up many of the callers and prepares for making the
rtsummary files per-rtgroup where they need to look at different value.
This means we recalculate some values in some callers, but as all these
calculations are outside the fast path and cheap, which seems like a
price worth paying.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Mon, 4 Nov 2024 04:19:14 +0000 (20:19 -0800)]
xfs: refactor xfs_rtbitmap_blockcount
Rename the existing xfs_rtbitmap_blockcount to
xfs_rtbitmap_blockcount_len and add a new xfs_rtbitmap_blockcount wrapper
around it that takes the number of extents from the mount structure.
This will simplify the move to per-rtgroup bitmaps as those will need to
pass in the number of extents per rtgroup instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Mon, 4 Nov 2024 04:19:13 +0000 (20:19 -0800)]
xfs: factor out a xfs_growfs_check_rtgeom helper
Split the check that the rtsummary fits into the log into a separate
helper, and use xfs_growfs_rt_alloc_fake_mount to calculate the new RT
geometry.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: avoid division for the 0-rtx growfs check]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Mon, 4 Nov 2024 04:19:13 +0000 (20:19 -0800)]
xfs: use xfs_growfs_rt_alloc_fake_mount in xfs_growfs_rt_alloc_blocks
Use xfs_growfs_rt_alloc_fake_mount instead of manually recalculating
the RT bitmap geometry.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:12 +0000 (20:19 -0800)]
xfs: factor out a xfs_growfs_rt_alloc_fake_mount helper
Split the code to set up a fake mount point to calculate new RT
geometry out of xfs_growfs_rt_bmblock so that it can be reused.
Note that this changes the rmblocks calculation method to be based
on the passed in rblocks and extsize and not the explicitly passed
one, but both methods will always lead to the same result. The new
version just does a little bit more math while being more general.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Mon, 4 Nov 2024 04:19:12 +0000 (20:19 -0800)]
xfs: calculate RT bitmap and summary blocks based on sb_rextents
Use the on-disk rextents to calculate the bitmap and summary blocks
instead of the calculated one so that we can refactor the helpers for
calculating them.
As the RT bitmap and summary scrubbers already check that sb_rextents
match the block count this does not change coverage of the scrubber.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:11 +0000 (20:19 -0800)]
xfs: remove XFS_ILOCK_RT*
Now that we've centralized the realtime metadata locking routines, get
rid of the ILOCK subclasses since we now use explicit lockdep classes.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Christoph Hellwig [Mon, 4 Nov 2024 04:19:10 +0000 (20:19 -0800)]
xfs: support creating per-RTG files in growfs
To support adding new RT groups in growfs, we need to be able to create
the per-RT group files. Add a new xfs_rtginode_create helper to create
a given per-RTG file. Most of the code for that is shared, but the
details of the actual file are abstracted out using a new create method
in struct xfs_rtginode_ops.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Mon, 4 Nov 2024 04:19:10 +0000 (20:19 -0800)]
xfs: move RT bitmap and summary information to the rtgroup
Move the pointers to the RT bitmap and summary inodes as well as the
summary cache to the rtgroups structure to prepare for having a
separate bitmap and summary inodes for each rtgroup.
Code using the inodes now needs to operate on a rtgroup. Where easily
possible such code is converted to iterate over all rtgroups, else
rtgroup 0 (the only one that can currently exist) is hardcoded.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Mon, 4 Nov 2024 04:19:09 +0000 (20:19 -0800)]
xfs: split xfs_trim_rtdev_extents
Split xfs_trim_rtdev_extents into two parts to prepare for reusing the
main validation also for RT group aware file systems.
Use the fully features xfs_daddr_to_rtb helper to convert from a daddr
to a xfs_rtblock_t to prepare for segmented addressing in RT groups.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Mon, 4 Nov 2024 04:19:09 +0000 (20:19 -0800)]
xfs: cleanup xfs_getfsmap_rtdev_rtbitmap
Use mp->m_sb.sb_rblocks to calculate the end instead of sb_rextents that
needs a conversion, use consistent names to xfs_rtblock_t types, and
only calculated them by the time they are needed. Remove the pointless
"high" local variable that only has a single user.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Mon, 4 Nov 2024 04:19:08 +0000 (20:19 -0800)]
xfs: factor out a xfs_growfs_rt_alloc_blocks helper
Split out a helper to allocate or grow the rtbitmap and rtsummary files
in preparation of per-RT group bitmap and summary files.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Mon, 4 Nov 2024 04:19:07 +0000 (20:19 -0800)]
xfs: add a xfs_qm_unmount_rt helper
RT group enabled file systems fix the bug where we pointlessly attach
quotas to the RT bitmap and summary files. Split the code to detach the
quotas into a helper, make it conditional and document the differing
behavior for RT group and pre-RT group file systems.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Mon, 4 Nov 2024 04:19:07 +0000 (20:19 -0800)]
xfs: add a xfs_bmap_free_rtblocks helper
Split the RT extent freeing logic from xfs_bmap_del_extent_real because
it will become more complicated when adding RT group.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:06 +0000 (20:19 -0800)]
xfs: add rtgroup-based realtime scrubbing context management
Create a state tracking structure and helpers to initialize the tracking
structure so that we can check metadata records against the realtime
space management metadata. Right now this is limited to grabbing the
incore rtgroup object, but we'll eventually add to the tracking
structure the ILOCK state and btree cursors.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:03 +0000 (20:19 -0800)]
xfs: repair metadata directory file path connectivity
Fix disconnected or incorrect metadata directory paths.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:06 +0000 (20:19 -0800)]
xfs: support caching rtgroup metadata inodes
Create the necessary per-rtgroup infrastructure that we need to load
metadata inodes into memory.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:05 +0000 (20:19 -0800)]
xfs: add a lockdep class key for rtgroup inodes
Add a dynamic lockdep class key for rtgroup inodes. This will enable
lockdep to deduce inconsistencies in the rtgroup metadata ILOCK locking
order. Each class can have 8 subclasses, and for now we will only have
2 inodes per group. This enables rtgroup order and inode order checks
when nesting ILOCKs.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:05 +0000 (20:19 -0800)]
xfs: define locking primitives for realtime groups
Define helper functions to lock all metadata inodes related to a
realtime group. There's not much to look at now, but this will become
important when we add per-rtgroup metadata files and online fsck code
for them.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:04 +0000 (20:19 -0800)]
xfs: create incore realtime group structures
Create an incore object that will contain information about a realtime
allocation group. This will eventually enable us to shard the realtime
section in a similar manner to how we shard the data section, but for
now just a single object for the entire RT subvolume is created.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Christoph Hellwig [Mon, 4 Nov 2024 04:19:03 +0000 (20:19 -0800)]
xfs: clean up xfs_getfsmap_helper arguments
The calling conventions for xfs_getfsmap_helper are confusing -- callers
pass in an rmap record, but they must also supply startblock and
blockcount in daddr units. This was bolted onto the original fsmap
implementation so that we could report *something* for realtime
volumes, which do not support rmap and hence can draw only from the rt
free space bitmap. Free space on the rt volume can be more than 2^32
fsblocks long, which means that we can't use the rmap startblock or
blockcount fields.
This is confusing for callers, because they must supplying redundant
data, but not all of it is used. Streamline this by creating a separate
fsmap irec structure that contains exactly the data we need, once.
Note that we actually do need rm_startblock for rmap key comparisons
when we're actually querying an rmap btree, so leave that field but
document why it's there.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:02 +0000 (20:19 -0800)]
xfs: confirm dotdot target before replacing it during a repair
xfs_dir_replace trips an assertion if you tell it to change a dirent to
point to an inumber that it already points at. Look up the dotdot entry
directly to confirm that we need to make a change.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:02 +0000 (20:19 -0800)]
xfs: check metadata directory file path connectivity
Create a new scrubber type that checks that well known metadata
directory paths are connected to the metadata inode that the incore
structures think is in use. For example, check that "/quota/user" in
the metadata directory tree actually points to
mp->m_quotainfo->qi_uquotaip->i_ino.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:01 +0000 (20:19 -0800)]
xfs: move repair temporary files to the metadata directory tree
Due to resource acquisition rules, we have to create the ondisk
temporary files used to stage a filesystem repair before we can acquire
a reference to the inode that we actually want to repair. Therefore,
we do not know at tempfile creation time whether the tempfile will
belong to the regular directory tree or the metadata directory tree.
This distinction becomes important when the swapext code tries to figure
out the quota accounting of the two files whose mappings are being
swapped. The swapext code assumes that accounting updates are required
for a file if dqattach attaches dquots. Metadir files are never
accounted in quota, which means that swapext must not update the quota
accounting when swapping in a repaired directory/xattr/rtbitmap structure.
Prior to the swapext call, therefore, both files must be marked as
METADIR for dqattach so that dqattach will ignore them. Add support for
a repair tempfile to be switched to the metadir tree and switched back
before being released so that ifree will just free the file.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:00 +0000 (20:19 -0800)]
xfs: check the metadata directory inumber in superblocks
When metadata directories are enabled, make sure that the secondary
superblocks point to the metadata directory. This isn't strictly
required because the secondaries are only used to recover damaged
filesystems, and the metadir root inumber is fixed.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:00 +0000 (20:19 -0800)]
xfs: scrub metadata directories
Teach online scrub about the metadata directory tree.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:18:59 +0000 (20:18 -0800)]
xfs: fix di_metatype field of inodes that won't load
Make sure that the di_metatype field is at least set plausibly so that
later scrubbers could set the real type.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:18:59 +0000 (20:18 -0800)]
xfs: adjust parent pointer scrubber for sb-rooted metadata files
Starting with the metadata directory feature, we're allowed to call the
directory and parent pointer scrubbers for every metadata file,
including the ones that are children of the superblock.
For these children, checking the link count against the number of parent
pointers is a bit funny -- there's no such thing as a parent pointer for
a child of the superblock since there's no corresponding dirent. For
purposes of validating nlink, we pretend that there is a parent pointer.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:18:58 +0000 (20:18 -0800)]
xfs: metadata files can have xattrs if metadir is enabled
If parent pointers are enabled, then metadata files will store parent
pointers in xattrs, just like files in the user visible directory tree.
Therefore, scrub and repair need to handle attr forks for metadata files
on metadir filesystems.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:18:58 +0000 (20:18 -0800)]
xfs: do not count metadata directory files when doing online quotacheck
Previously, we stated that files in the metadata directory tree are not
counted in the dquot information. Fix the online quotacheck code to
reflect this.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:18:57 +0000 (20:18 -0800)]
xfs: refactor directory tree root predicates
Metadata directory trees make reasoning about the parent of a file more
difficult. Traditionally, user files are children of sb_rootino, and
metadata files are "children" of the superblock. Now, we add a third
possibility -- some metadata files can be children of sb_metadirino, but
the classic ones (rt free space data and quotas) are left alone.
Let's add some helper functions (instead of open-coding the logic
everywhere) to make scrub logic easier to understand.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:18:57 +0000 (20:18 -0800)]
xfs: record health problems with the metadata directory
Make a report to the health monitoring subsystem any time we encounter
something in the metadata directory tree that looks like corruption.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:18:56 +0000 (20:18 -0800)]
xfs: adjust xfs_bmap_add_attrfork for metadir
Online repair might use the xfs_bmap_add_attrfork to repair a file in
the metadata directory tree if (say) the metadata file lacks the correct
parent pointers. In that case, it is not correct to check that the file
is dqattached -- metadata files must be not have /any/ dquot attached at
all. Adjust the assertions appropriately.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:18:55 +0000 (20:18 -0800)]
xfs: mark quota inodes as metadata files
When we're creating quota files at mount time, make sure to mark them as
metadir inodes if appropriate.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:18:55 +0000 (20:18 -0800)]
xfs: don't count metadata directory files to quota
Files in the metadata directory tree are internal to the filesystem.
Don't count the inodes or the blocks they use in the root dquot because
users do not need to know about their resource usage. This will also
quiet down complaints about dquot usage not matching du output.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:18:54 +0000 (20:18 -0800)]
xfs: allow bulkstat to return metadata directories
Allow the V5 bulkstat ioctl to return information about metadata
directory files so that xfs_scrub can find and scrub them, since they
are otherwise ordinary directories.
(Metadata files of course require per-file scrub code and hence do not
need exposure.)
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:18:53 +0000 (20:18 -0800)]
xfs: advertise metadata directory feature
Advertise the existence of the metadata directory feature; this will be
used by scrub to decide if it needs to scan the metadir too.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:18:53 +0000 (20:18 -0800)]
xfs: hide metadata inodes from everyone because they are special
Metadata inodes are private files and therefore cannot be exposed to
userspace. This means no bulkstat, no open-by-handle, no linking them
into the directory tree, and no feeding them to LSMs. As such, we mark
them S_PRIVATE, which stops all that.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>