Block group size classes are managed consistently everywhere.
Currently, btrfs_use_block_group_size_class() sets a block group's size
class to specialize it for a specific allocation size. However, this
size class remains "stale" even if the block group becomes completely
empty (both used and reserved bytes reach zero).
This happens in two scenarios:
1. When space reservations are freed (e.g., due to errors or transaction
aborts) via btrfs_free_reserved_bytes().
2. When the last extent in a block group is freed via
btrfs_update_block_group().
While size classes are advisory, a stale size class can cause
find_free_extent to unnecessarily skip candidate block groups during
initial search loops. This undermines the purpose of size classes to
reduce fragmentation by keeping block groups restricted to a specific
size class when they could be reused for any size.
Fix this by resetting the size class to BTRFS_BG_SZ_NONE whenever a
block group's used and reserved counts both reach zero. This ensures
that empty block groups are fully available for any allocation size in
the next cycle.
Fixes:
52bb7a2166af ("btrfs: introduce size class to block group allocator")
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Jiasheng Jiang <jiashengjiangcool@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
return ret;
}
+static void btrfs_maybe_reset_size_class(struct btrfs_block_group *bg)
+{
+ lockdep_assert_held(&bg->lock);
+ if (btrfs_block_group_should_use_size_class(bg) &&
+ bg->used == 0 && bg->reserved == 0)
+ bg->size_class = BTRFS_BG_SZ_NONE;
+}
+
int btrfs_update_block_group(struct btrfs_trans_handle *trans,
u64 bytenr, u64 num_bytes, bool alloc)
{
old_val -= num_bytes;
cache->used = old_val;
cache->pinned += num_bytes;
+ btrfs_maybe_reset_size_class(cache);
btrfs_space_info_update_bytes_pinned(space_info, num_bytes);
space_info->bytes_used -= num_bytes;
space_info->disk_used -= num_bytes * factor;
spin_lock(&cache->lock);
bg_ro = cache->ro;
cache->reserved -= num_bytes;
+ btrfs_maybe_reset_size_class(cache);
if (is_delalloc)
cache->delalloc_bytes -= num_bytes;
spin_unlock(&cache->lock);