linux-2.6-microblaze.git
11 months agobcachefs: Rework open bucket partial list allocation
Kent Overstreet [Sat, 25 Feb 2023 07:22:49 +0000 (02:22 -0500)]
bcachefs: Rework open bucket partial list allocation

Now, any open_bucket can go on the partial list: allocating from the
partial list has been moved to its own dedicated function,
open_bucket_add_buckets() -> bucket_alloc_set_partial().

In particular, this means that erasure coded buckets can safely go on
the partial list; the new location works with the "allocate an ec bucket
first, then the rest" logic.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: don't bump key cache journal seq on nojournal commits
Brian Foster [Thu, 2 Mar 2023 14:03:37 +0000 (09:03 -0500)]
bcachefs: don't bump key cache journal seq on nojournal commits

fstest generic/388 occasionally reproduces corruptions where an
inode has extents beyond i_size. This is a deliberate crash and
recovery test, and the post crash+recovery characteristics are
usually the same: the inode exists on disk in an early (i.e. just
allocated) state based on the journal sequence number associated
with the inode. Subsequent inode updates exist in the journal at
higher sequence numbers, but the inode hadn't been written back
before the associated crash and the post-crash recovery processes a
set of journal sequence numbers that doesn't include updates to the
inode. In fact, the sequence with the most recent inode key update
always happens to be the sequence just before the front of the
journal processed by recovery.

This last bit is a significant hint that the problem relates to an
on-disk journal update of the front of the journal. The root cause
of this problem is basically that the inode is updated (multiple
times) in-core and in the key cache, each time bumping the key cache
sequence number used to control the cache flush. The cache flush
skips one or more times, bumping the associated key cache journal
pin to the key cache seq value. This has a side effect of holding
the inode in memory a bit longer than normal, which helps exacerbate
this problem, but is also unsafe in certain cases where the key
cache seq may have been updated by a transaction commit that didn't
journal the associated key.

For example, consider an inode that has been allocated, updated
several times in the key cache, journaled, but not yet written back.
At this stage, everything should be consistent if the fs happens to
crash because the latest update has been journaled. Now consider a key
update via bch2_extent_update_i_size_sectors() that uses the
BTREE_UPDATE_NOJOURNAL flag. While this update may not change inode
state, it can have the side effect of bumping ck->seq in
bch2_btree_insert_key_cached(). In turn, if a subsequent key cache
flush skips due to seq not matching the former, the ck->journal pin
is updated to ck->seq even though the most recent key update was not
journaled. If this pin happens to reside at the front (tail) of the
journal, this means a subsequent journal write can update last_seq
to a value beyond that which includes the most recent update to the
inode. If this occurs and the fs happens to crash before the inode
happens to flush, recovery will see the latest last_seq, fail to
recover the inode and leave the inode in the inconsistent state
described above.

To avoid this problem, skip the key cache seq update on NOJOURNAL
commits, except on initial pin add. Pass the insert entry directly
to bch2_btree_insert_key_cached() to make the associated flag
available and be consistent with btree_insert_key_leaf().
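
As a rough illustration of the rule above (not the actual patch), the
following toy model skips the seq update for NOJOURNAL commits unless
the update is the one adding the initial pin. The struct, the flag and
the function name are hypothetical stand-ins, not the real bcachefs
definitions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a cached key; not the real struct bkey_cached. */
struct cached_key {
	bool     pinned;	/* has the initial journal pin been added? */
	uint64_t seq;		/* seq the flush path compares against */
};

/* Hypothetical flag standing in for BTREE_UPDATE_NOJOURNAL. */
#define UPDATE_NOJOURNAL	(1u << 0)

/*
 * Record an update committed at journal sequence commit_seq.
 * A NOJOURNAL update only sets seq when it adds the initial pin;
 * otherwise seq keeps pointing at the last journaled update.
 */
static void cached_key_update(struct cached_key *ck, uint64_t commit_seq,
			      unsigned flags)
{
	assert(ck);

	bool first = !ck->pinned;

	ck->pinned = true;
	if (first || !(flags & UPDATE_NOJOURNAL))
		ck->seq = commit_seq;
}
```

In this model a later flush that skips and moves the pin to ck->seq can
only land on a sequence number that actually journaled the key.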

Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: When shutting down, flush btree node writes last
Kent Overstreet [Tue, 7 Mar 2023 12:28:20 +0000 (07:28 -0500)]
bcachefs: When shutting down, flush btree node writes last

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Verbose on by default when CONFIG_BCACHEFS_DEBUG=y
Kent Overstreet [Tue, 7 Mar 2023 12:25:12 +0000 (07:25 -0500)]
bcachefs: Verbose on by default when CONFIG_BCACHEFS_DEBUG=y

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agofixup bcachefs: Use for_each_btree_key_upto() more consistently
Kent Overstreet [Mon, 6 Mar 2023 15:20:36 +0000 (10:20 -0500)]
fixup bcachefs: Use for_each_btree_key_upto() more consistently

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agosix locks: be more careful about lost wakeups
Kent Overstreet [Mon, 6 Mar 2023 12:57:51 +0000 (07:57 -0500)]
six locks: be more careful about lost wakeups

This is a workaround for a lost wakeup bug we've been seeing - we still
need to discover the actual bug.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Journal resize fixes
Kent Overstreet [Mon, 6 Mar 2023 10:29:12 +0000 (05:29 -0500)]
bcachefs: Journal resize fixes

 - Fix a sleeping-in-atomic bug due to calling
   bch2_journal_buckets_to_sb() under the journal lock.
 - Additionally, now we mark buckets as journal buckets before adding
   them to the journal in memory and the superblock. This ensures that
   if we crash part way through we'll never be writing to journal
   buckets that aren't marked correctly.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: bch2_btree_iter_peek_node_and_restart()
Kent Overstreet [Mon, 6 Mar 2023 09:01:22 +0000 (04:01 -0500)]
bcachefs: bch2_btree_iter_peek_node_and_restart()

Minor refactoring for the Rust interface.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: bch2_btree_node_ondisk_to_text()
Kent Overstreet [Mon, 6 Mar 2023 07:53:25 +0000 (02:53 -0500)]
bcachefs: bch2_btree_node_ondisk_to_text()

Pulling out a helper from cmd_list.c, as the rest is being rewritten in
Rust but we're not ready to rewrite lower-level btree code in Rust.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: bch2_btree_node_to_text() const correctness
Kent Overstreet [Mon, 6 Mar 2023 07:34:59 +0000 (02:34 -0500)]
bcachefs: bch2_btree_node_to_text() const correctness

This is for the Rust interface - Rust cares more about const than C
does.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Fix "btree node in stripe" error
Kent Overstreet [Mon, 6 Mar 2023 05:10:14 +0000 (00:10 -0500)]
bcachefs: Fix "btree node in stripe" error

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Kill bch2_ec_bucket_written()
Kent Overstreet [Mon, 6 Mar 2023 04:52:49 +0000 (23:52 -0500)]
bcachefs: Kill bch2_ec_bucket_written()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Improve bch2_new_stripes_to_text()
Kent Overstreet [Wed, 8 Mar 2023 08:57:32 +0000 (03:57 -0500)]
bcachefs: Improve bch2_new_stripes_to_text()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Improved copygc pipelining
Kent Overstreet [Tue, 28 Feb 2023 03:58:01 +0000 (22:58 -0500)]
bcachefs: Improved copygc pipelining

This improves copygc pipelining across multiple buckets: we now track
each in flight bucket we're evacuating, with separate moving_contexts.

This means that whereas previously we had to wait for outstanding moves
to complete to ensure we didn't try to evacuate the same bucket twice,
we can now just check buckets we want to evacuate against the pending
list.

This also means we can run the verify_bucket_evacuated() check without
killing pipelining - meaning it can now always be enabled, not just on
debug builds.

This is going to be important for the upcoming erasure coding work,
where moving IOs that are being erasure coded will now skip the initial
replication step; instead the IOs will wait on the stripe to complete.
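
The pending-list check described above can be sketched roughly as
follows. This is a simplified userspace model; the fixed-size array,
MAX_IN_FLIGHT and all function names are assumptions made for the
example, not the real copygc data structures:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_IN_FLIGHT	8	/* hypothetical limit for this sketch */

/* Buckets currently being evacuated. */
struct in_flight {
	uint64_t buckets[MAX_IN_FLIGHT];
	size_t   nr;
};

static bool bucket_in_flight(const struct in_flight *p, uint64_t bucket)
{
	for (size_t i = 0; i < p->nr; i++)
		if (p->buckets[i] == bucket)
			return true;
	return false;
}

/* Returns true if evacuation of the bucket was started. */
static bool try_start_evacuate(struct in_flight *p, uint64_t bucket)
{
	assert(p->nr <= MAX_IN_FLIGHT);

	/* No need to wait for outstanding moves: just skip duplicates. */
	if (bucket_in_flight(p, bucket) || p->nr == MAX_IN_FLIGHT)
		return false;

	p->buckets[p->nr++] = bucket;
	return true;
}

/* Called when all moves for the bucket have completed. */
static void evacuate_done(struct in_flight *p, uint64_t bucket)
{
	for (size_t i = 0; i < p->nr; i++)
		if (p->buckets[i] == bucket) {
			p->buckets[i] = p->buckets[--p->nr];
			return;
		}
}
```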

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Free move buffers as early as possible
Kent Overstreet [Sun, 5 Mar 2023 08:11:00 +0000 (03:11 -0500)]
bcachefs: Free move buffers as early as possible

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Fix stripe reuse path
Kent Overstreet [Sun, 5 Mar 2023 07:52:40 +0000 (02:52 -0500)]
bcachefs: Fix stripe reuse path

It's possible that we reuse a stripe that doesn't have quite the same
configuration as the stripe_head we're allocating from. In that case, we
have to make sure that the new stripe uses the settings from the stripe
we reuse, not the stripe head, and make sure the buffer is allocated
correctly.

This fixes the ec_mixed_tiers test.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Drop some anonymous structs, unions
Kent Overstreet [Sun, 5 Mar 2023 04:05:55 +0000 (23:05 -0500)]
bcachefs: Drop some anonymous structs, unions

Rust bindgen doesn't cope well with anonymous structs and unions. This
patch drops the fancy anonymous structs & unions in bkey_i that let us
use the same helpers for bkey_i and bkey_packed; since bkey_packed is an
internal type that's never exposed to outside code, it's only a minor
inconvenience.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: BKEY_PADDED_ONSTACK()
Kent Overstreet [Sun, 5 Mar 2023 03:36:02 +0000 (22:36 -0500)]
bcachefs: BKEY_PADDED_ONSTACK()

Rust bindgen doesn't do anonymous structs very nicely: BKEY_PADDED()
only needs the anonymous struct when it's used on the stack, to
guarantee layout, not when it's embedded in another struct.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: moving_context->stats is allowed to be NULL
Kent Overstreet [Sat, 4 Mar 2023 07:51:12 +0000 (02:51 -0500)]
bcachefs: moving_context->stats is allowed to be NULL

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: RESERVE_stripe
Kent Overstreet [Thu, 2 Mar 2023 06:54:17 +0000 (01:54 -0500)]
bcachefs: RESERVE_stripe

Rework stripe creation path - new algorithm for deciding when to create
new stripes or reuse existing stripes.

We add a new allocation watermark, RESERVE_stripe, above RESERVE_none.
Then we always try to create a new stripe by doing RESERVE_stripe
allocations; if this fails, we reuse an existing stripe and allocate
buckets for it with the reserve watermark for the given write
(RESERVE_none or RESERVE_movinggc).
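
A minimal sketch of the fallback order described above, assuming that
a watermark is modelled as "this many buckets must remain free". The
enum, struct and function names are hypothetical and only loosely
mirror the real allocator:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical watermarks, ordered from most to least conservative. */
enum reserve { R_STRIPE, R_NONE, R_MOVINGGC, R_NR };

struct dev {
	uint64_t free;			/* free buckets on the device */
	uint64_t reserved[R_NR];	/* buckets that must stay free per watermark */
};

static bool bucket_alloc(struct dev *d, enum reserve r)
{
	assert(r < R_NR);

	if (d->free <= d->reserved[r])
		return false;
	d->free--;
	return true;
}

enum stripe_path { STRIPE_NEW, STRIPE_REUSE, STRIPE_FAIL };

/*
 * Try a new stripe first, at the conservative stripe watermark; if
 * that fails, fall back to reusing an existing stripe and allocate at
 * the watermark of the write itself.
 */
static enum stripe_path stripe_get_bucket(struct dev *d,
					  enum reserve write_reserve)
{
	if (bucket_alloc(d, R_STRIPE))
		return STRIPE_NEW;
	if (bucket_alloc(d, write_reserve))
		return STRIPE_REUSE;
	return STRIPE_FAIL;
}
```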

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Improve error message for stripe block sector counts wrong
Kent Overstreet [Sat, 4 Mar 2023 04:08:11 +0000 (23:08 -0500)]
bcachefs: Improve error message for stripe block sector counts wrong

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: More stripe create cleanup/fixes
Kent Overstreet [Fri, 3 Mar 2023 08:11:06 +0000 (03:11 -0500)]
bcachefs: More stripe create cleanup/fixes

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Plumb alloc_reserve through stripe create path
Kent Overstreet [Fri, 3 Mar 2023 07:43:39 +0000 (02:43 -0500)]
bcachefs: Plumb alloc_reserve through stripe create path

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Mark stripe buckets with correct data type
Kent Overstreet [Thu, 2 Mar 2023 02:47:07 +0000 (21:47 -0500)]
bcachefs: Mark stripe buckets with correct data type

Currently, we don't use bucket data type for tracking whether buckets
are part of a stripe; parity buckets are BCH_DATA_parity, but data
buckets in a stripe are BCH_DATA_user. There's a separate counter,
buckets_ec, outside the BCH_DATA_TYPES system for tracking number of
buckets on a device that are part of a stripe.

The trouble with this approach is that it's too coarse grained, and we
need better information on fragmentation for debugging copygc.

With this patch, data buckets in a stripe are now tracked as
BCH_DATA_stripe buckets.

This doesn't yet differentiate between erasure coded and non-erasure
coded data in a stripe bucket, nor do we yet track empty data buckets in
stripes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Centralize btree node lock initialization
Kent Overstreet [Fri, 3 Mar 2023 05:03:01 +0000 (00:03 -0500)]
bcachefs: Centralize btree node lock initialization

This fixes some confusion in the lockdep code due to initializing btree
node/key cache locks with the same lockdep key, but different names.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Plumb btree_trans through btree cache code
Kent Overstreet [Thu, 2 Mar 2023 07:12:18 +0000 (02:12 -0500)]
bcachefs: Plumb btree_trans through btree cache code

Soon, __bch2_btree_node_write() is going to require a btree_trans: zoned
device support is going to require a new allocation for every btree node
write. This is a bit of prep work.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Improve dev_alloc_debug_to_text()
Kent Overstreet [Thu, 2 Mar 2023 06:08:46 +0000 (01:08 -0500)]
bcachefs: Improve dev_alloc_debug_to_text()

Now we also print the number of buckets reserved for each watermark.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: bch2_copygc_wait_to_text()
Kent Overstreet [Thu, 2 Mar 2023 04:10:39 +0000 (23:10 -0500)]
bcachefs: bch2_copygc_wait_to_text()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: bch2_mark_key() now takes btree_id & level
Kent Overstreet [Thu, 2 Mar 2023 03:14:31 +0000 (22:14 -0500)]
bcachefs: bch2_mark_key() now takes btree_id & level

btree & level are passed to trans_mark - for backpointers -
bch2_mark_key() should take them as well.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: bch2_write_queue()
Kent Overstreet [Wed, 1 Mar 2023 04:08:04 +0000 (23:08 -0500)]
bcachefs: bch2_write_queue()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: ec: Improve error message for btree node in stripe
Kent Overstreet [Wed, 1 Mar 2023 04:11:36 +0000 (23:11 -0500)]
bcachefs: ec: Improve error message for btree node in stripe

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: bch2_open_bucket_to_text()
Kent Overstreet [Wed, 1 Mar 2023 04:08:48 +0000 (23:08 -0500)]
bcachefs: bch2_open_bucket_to_text()

Factor out a common helper

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: bch2_data_update_init() considers ptr durability
Kent Overstreet [Tue, 28 Feb 2023 04:16:37 +0000 (23:16 -0500)]
bcachefs: bch2_data_update_init() considers ptr durability

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: ec: Ensure new stripe is closed in error path
Kent Overstreet [Tue, 28 Feb 2023 03:30:54 +0000 (22:30 -0500)]
bcachefs: ec: Ensure new stripe is closed in error path

This fixes a use-after-free bug.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Convert constants to consts
Kent Overstreet [Tue, 28 Feb 2023 03:12:06 +0000 (22:12 -0500)]
bcachefs: Convert constants to consts

Rust bindgen doesn't handle macros, but it does handle integer
constants: this conversion aids in implementing safe Rust wrapper
interfaces.
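
For illustration, the kind of conversion meant here might look like
the following. The struct and constant names are made up for the
example and do not come from the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical key layout, for illustration only. */
struct example_key {
	uint64_t inode;
	uint64_t offset;
	uint64_t snapshot;
};

/*
 * Before: the value exists only in the preprocessor, as an expression
 * a binding generator may not be able to evaluate.
 *
 * #define EXAMPLE_KEY_U64s (sizeof(struct example_key) / sizeof(uint64_t))
 */

static_assert(sizeof(struct example_key) % sizeof(uint64_t) == 0,
	      "key size must be a whole number of u64s");

/* After: a typed integer constant that is part of the parsed declarations. */
static const unsigned EXAMPLE_KEY_U64s =
	sizeof(struct example_key) / sizeof(uint64_t);
```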

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: bch2_btree_iter_peek_and_restart_outlined()
Kent Overstreet [Tue, 28 Feb 2023 02:26:07 +0000 (21:26 -0500)]
bcachefs: bch2_btree_iter_peek_and_restart_outlined()

Needed for interfacing with Rust - bindgen can't handle inline
functions, alas.
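
The general pattern is an out-of-line wrapper around the inline
helper, sketched below with hypothetical names; the real function
wraps the btree iterator peek path instead:

```c
#include <assert.h>

/*
 * Header-style inline helper: the compiler need not emit a symbol for
 * it, so a foreign-language binding has nothing to link against.
 */
static inline int example_peek(const int *keys, unsigned nr, unsigned pos)
{
	assert(keys);

	return pos < nr ? keys[pos] : -1;
}

/*
 * Out-of-line wrapper with external linkage: this one always exists
 * as a real symbol and can be called through generated bindings.
 */
int example_peek_outlined(const int *keys, unsigned nr, unsigned pos)
{
	return example_peek(keys, nr, pos);
}
```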

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: ec: zero_out_rest_of_ec_bucket()
Kent Overstreet [Sun, 26 Feb 2023 22:12:36 +0000 (17:12 -0500)]
bcachefs: ec: zero_out_rest_of_ec_bucket()

Occasionally, we won't write to an entire bucket. This fixes the EC code
to handle this case, zeroing out the rest of the bucket as needed.
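
As an in-memory analogy only (the name and signature are hypothetical,
and the real helper works on an on-disk bucket rather than a buffer),
zeroing the unwritten tail amounts to:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Zero everything past the written prefix so the whole block has
 * well-defined contents.
 */
static void zero_rest_of_buffer(unsigned char *buf, size_t size,
				size_t written)
{
	assert(written <= size);

	memset(buf + written, 0, size - written);
}
```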

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: bch2_data_update_index_update() -> bch2_trans_run()
Kent Overstreet [Sun, 26 Feb 2023 22:12:05 +0000 (17:12 -0500)]
bcachefs: bch2_data_update_index_update() -> bch2_trans_run()

Convert to use the standard helper

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Flush write buffer as needed in backpointers repair
Kent Overstreet [Sat, 25 Feb 2023 10:22:37 +0000 (05:22 -0500)]
bcachefs: Flush write buffer as needed in backpointers repair

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Fix for shared paths in write buffer flush
Kent Overstreet [Sun, 26 Feb 2023 20:48:39 +0000 (15:48 -0500)]
bcachefs: Fix for shared paths in write buffer flush

It's possible for bch2_write_buffer_flush_one() to end up with a shared
path, if called from a context that already has a btree iterator
pointing to a key being flushed. We have to be careful when that
happens, since we can't clone a path that holds write locks.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Single open_bucket_partial list
Kent Overstreet [Sat, 25 Feb 2023 05:32:34 +0000 (00:32 -0500)]
bcachefs: Single open_bucket_partial list

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Improve bch2_stripe_to_text()
Kent Overstreet [Sat, 25 Feb 2023 00:26:03 +0000 (19:26 -0500)]
bcachefs: Improve bch2_stripe_to_text()

We now print pointers as bucket:offset, the same as how we print extent
pointers.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Add option for completely disabling nocow
Kent Overstreet [Sat, 25 Feb 2023 00:07:21 +0000 (19:07 -0500)]
bcachefs: Add option for completely disabling nocow

This adds an option for completely disabling nocow mode, including the
locking in the data move path.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Make bucket_alloc tracepoint more readable
Kent Overstreet [Sat, 25 Feb 2023 00:06:32 +0000 (19:06 -0500)]
bcachefs: Make bucket_alloc tracepoint more readable

Print bucket in dev:bucket notation, to be consistent with how we refer
to buckets elsewhere.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Don't call bch2_trans_update() unlocked
Kent Overstreet [Thu, 23 Feb 2023 00:39:02 +0000 (19:39 -0500)]
bcachefs: Don't call bch2_trans_update() unlocked

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: get_stripe_key_trans()
Kent Overstreet [Thu, 23 Feb 2023 00:28:58 +0000 (19:28 -0500)]
bcachefs: get_stripe_key_trans()

Another nested btree_trans fix

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Fix erasure coding shutdown path
Kent Overstreet [Wed, 22 Feb 2023 23:35:51 +0000 (18:35 -0500)]
bcachefs: Fix erasure coding shutdown path

It's possible when shutting down for a stripe head to have a new
stripe that doesn't yet have any blocks allocated - we just need to free
it.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Fix buffer overrun in ec_stripe_update_extent()
Kent Overstreet [Wed, 22 Feb 2023 22:57:59 +0000 (17:57 -0500)]
bcachefs: Fix buffer overrun in ec_stripe_update_extent()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Check for redundant ec entries/stripe ptrs
Kent Overstreet [Wed, 22 Feb 2023 04:51:19 +0000 (23:51 -0500)]
bcachefs: Check for redundant ec entries/stripe ptrs

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Cached pointers should not be erasure coded
Kent Overstreet [Wed, 22 Feb 2023 00:22:44 +0000 (19:22 -0500)]
bcachefs: Cached pointers should not be erasure coded

There's no reason to erasure code cached pointers: we'll always have
another copy, and it'll be cheaper to read the other copy than do a
reconstruct read. And erasure coded cached pointers would add
complications that we'd rather not have to deal with, so let's make sure
to disallow them.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Kill bch2_keylist_add_in_order()
Kent Overstreet [Wed, 22 Feb 2023 05:56:41 +0000 (00:56 -0500)]
bcachefs: Kill bch2_keylist_add_in_order()

Dead code, so delete

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Add tracepoint & counter for btree split race
Kent Overstreet [Mon, 20 Feb 2023 21:41:03 +0000 (16:41 -0500)]
bcachefs: Add tracepoint & counter for btree split race

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: __bch2_btree_insert uses BTREE_INSERT_CACHED
Kent Overstreet [Mon, 20 Feb 2023 19:33:46 +0000 (14:33 -0500)]
bcachefs: __bch2_btree_insert uses BTREE_INSERT_CACHED

Cached btrees should be doing cached updates by default: this fixes a
bug in the migrate tool.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Improve a verbose log message
Kent Overstreet [Mon, 20 Feb 2023 19:34:38 +0000 (14:34 -0500)]
bcachefs: Improve a verbose log message

We should be using bch2_err_str() where applicable.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: bch2_journal_entries_postprocess()
Kent Overstreet [Sun, 19 Feb 2023 05:49:51 +0000 (00:49 -0500)]
bcachefs: bch2_journal_entries_postprocess()

This brings back journal_entries_compact(), but in a more efficient form
- we need to do multiple postprocess steps, so iterate over the
journal entries being written just once to make it more efficient.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Fix ec repair code check
Kent Overstreet [Sun, 19 Feb 2023 05:43:10 +0000 (00:43 -0500)]
bcachefs: Fix ec repair code check

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Simplify ec stripes heap
Kent Overstreet [Sun, 19 Feb 2023 03:11:50 +0000 (22:11 -0500)]
bcachefs: Simplify ec stripes heap

Now that we have a separate data structure for tracking open stripes,
the stripes heap can track all existing stripes, which is a nice
simplification.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Erasure coding: Track open stripes
Kent Overstreet [Sun, 19 Feb 2023 02:07:25 +0000 (21:07 -0500)]
bcachefs: Erasure coding: Track open stripes

This adds a new hash table for stripes being created or updated, instead
of hackily relying on the stripes heap.

This lets us reserve the slot for the new stripe up front, at the same
time as we would pick an existing stripe - if we were updating an
existing stripe - making the overall code more consistent.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Stripe deletion now checks what it's deleting
Kent Overstreet [Sun, 19 Feb 2023 02:31:07 +0000 (21:31 -0500)]
bcachefs: Stripe deletion now checks what it's deleting

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Improve c->writes refcounting for stripe create path
Kent Overstreet [Sun, 19 Feb 2023 02:10:13 +0000 (21:10 -0500)]
bcachefs: Improve c->writes refcounting for stripe create path

This makes our handling of c->writes more consistent with other
asynchronous work items.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Switch ec_stripes_heap_lock to a mutex
Kent Overstreet [Sun, 19 Feb 2023 01:49:37 +0000 (20:49 -0500)]
bcachefs: Switch ec_stripes_heap_lock to a mutex

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Split trans->last_begin_ip and trans->last_restarted_ip
Kent Overstreet [Sun, 19 Feb 2023 02:20:18 +0000 (21:20 -0500)]
bcachefs: Split trans->last_begin_ip and trans->last_restarted_ip

These are two different things - this improves our debug assert
messages.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Fix erasure coding locking
Kent Overstreet [Sat, 18 Feb 2023 03:43:47 +0000 (22:43 -0500)]
bcachefs: Fix erasure coding locking

This adds a new helper, bch2_trans_mutex_lock(), for locking a mutex -
dropping and retaking btree locks as needed.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Don't block on ec_stripe_head_lock with btree locks held
Kent Overstreet [Sat, 18 Feb 2023 02:04:46 +0000 (21:04 -0500)]
bcachefs: Don't block on ec_stripe_head_lock with btree locks held

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Add an assertion for using multiple btree_trans
Kent Overstreet [Sat, 18 Feb 2023 01:51:52 +0000 (20:51 -0500)]
bcachefs: Add an assertion for using multiple btree_trans

A thread should never be using more than one btree_trans - doing so is
an invitation for deadlocks.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Erasure coding now uses bch2_bucket_alloc_trans
Kent Overstreet [Sat, 18 Feb 2023 01:50:55 +0000 (20:50 -0500)]
bcachefs: Erasure coding now uses bch2_bucket_alloc_trans

This code predates plumbing btree_trans through the bucket allocation
path: switching to it fixes a deadlock due to using multiple btree_trans
at the same time, which we never want to do.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Don't invalidate open buckets
Kent Overstreet [Sat, 18 Feb 2023 01:33:12 +0000 (20:33 -0500)]
bcachefs: Don't invalidate open buckets

Like bch2_trans_mark_bucket(), we shouldn't be incrementing a bucket gen
while it's still open - erasure coding was hitting this.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Fragmentation LRU
Kent Overstreet [Mon, 5 Dec 2022 15:24:19 +0000 (10:24 -0500)]
bcachefs: Fragmentation LRU

Now that we have much more efficient updates to the LRU btree, this
patch adds a new LRU that indexes buckets by fragmentation.

This means copygc no longer has to scan every bucket to find buckets
that need to be evacuated.

Changes:
 - A new field in bch_alloc_v4, fragmentation_lru - this corresponds to
   the bucket's position in the fragmentation LRU. We add a new field
   for this instead of calculating it as needed because we may make the
   fragmentation LRU optional; this field indicates whether a bucket is
   on the fragmentation LRU.

   Also, zoned devices will introduce variable bucket sizes; explicitly
   recording the LRU position will be safer for them.

 - A new copygc path for using the fragmentation LRU instead of
   scanning every bucket and building up an in-memory heap.
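
A toy model of indexing buckets by fragmentation is sketched below.
The scaling of the index, the convention that 0 means "not on the
LRU", and every name here are assumptions made for the example;
qsort() merely stands in for the ordering that the LRU btree would
maintain incrementally:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct bucket {
	uint64_t nr;		/* bucket number */
	uint32_t live;		/* live sectors */
	uint32_t size;		/* bucket size in sectors */
	uint64_t frag_lru;	/* 0: not on the fragmentation LRU */
};

/* Lower index: less live data, so cheaper to evacuate. */
static uint64_t frag_idx(uint32_t live, uint32_t size)
{
	assert(size != 0);

	if (!live || live >= size)	/* empty or full: nothing to gain */
		return 0;
	return ((uint64_t) live << 31) / size;
}

static int frag_cmp(const void *l, const void *r)
{
	const struct bucket *a = *(const struct bucket * const *) l;
	const struct bucket *b = *(const struct bucket * const *) r;

	return (a->frag_lru > b->frag_lru) - (a->frag_lru < b->frag_lru);
}

/* Fill idx with the buckets on the LRU, best evacuation candidate first. */
static size_t build_frag_index(struct bucket *b, size_t nr,
			       struct bucket **idx)
{
	size_t n = 0;

	for (size_t i = 0; i < nr; i++) {
		b[i].frag_lru = frag_idx(b[i].live, b[i].size);
		if (b[i].frag_lru)
			idx[n++] = &b[i];
	}
	qsort(idx, n, sizeof(idx[0]), frag_cmp);
	return n;
}
```

With such an index, copygc would take candidates from the front
instead of walking every bucket.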

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Use btree write buffer for LRU btree
Kent Overstreet [Mon, 6 Feb 2023 23:51:42 +0000 (18:51 -0500)]
bcachefs: Use btree write buffer for LRU btree

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Fix integer overflow warnings on 32 bit
Kent Overstreet [Fri, 17 Feb 2023 21:06:51 +0000 (16:06 -0500)]
bcachefs: Fix integer overflow warnings on 32 bit

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Fix insert_snapshot_whiteouts()
Kent Overstreet [Fri, 17 Feb 2023 04:42:09 +0000 (23:42 -0500)]
bcachefs: Fix insert_snapshot_whiteouts()

 - We were failing to set the key type on the whiteouts it was creating,
   oops.

 - Also, we need to create whiteouts when generating front splits, not
   just back splits.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: bch2_mark_snapshot() now called like other triggers
Kent Overstreet [Fri, 17 Feb 2023 05:39:12 +0000 (00:39 -0500)]
bcachefs: bch2_mark_snapshot() now called like other triggers

This fixes a bug where bch2_mark_snapshot() wasn't called for existing
snapshot nodes being updated when child nodes were added.

This led to the data update path thinking the key being updated was for
a snapshot that didn't have children, causing it to fail to insert
whiteouts when splitting existing extents.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Snapshot whiteout fix
Kent Overstreet [Fri, 17 Feb 2023 04:36:41 +0000 (23:36 -0500)]
bcachefs: Snapshot whiteout fix

When fully overwriting an existing extent, we may need to generate a
whiteout - not just if the extent being overwritten was in an older
snapshot, but also if it was overwriting an extent in an older snapshot.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Reimplement repair for overlapping extents
Daniel Hill [Sun, 8 May 2022 03:03:28 +0000 (15:03 +1200)]
bcachefs: Reimplement repair for overlapping extents

Repair now checks if overlapping extents exist in the same snapshot
and calls update_trans_update_extent to do the repair work.

Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Don't print out duplicate fsck errors
Kent Overstreet [Fri, 17 Feb 2023 02:02:14 +0000 (21:02 -0500)]
bcachefs: Don't print out duplicate fsck errors

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: bch2_btree_insert_nonextent()
Kent Overstreet [Fri, 17 Feb 2023 04:09:27 +0000 (23:09 -0500)]
bcachefs: bch2_btree_insert_nonextent()

This adds a new helper to delete some redundant code in
bch2_trans_update_extent().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Fix a 64 bit divide
Kent Overstreet [Fri, 17 Feb 2023 20:36:46 +0000 (15:36 -0500)]
bcachefs: Fix a 64 bit divide

This fixes a build failure on 32 bit

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agosix locks: Simplify six_lock_counts()
Kent Overstreet [Wed, 15 Feb 2023 23:29:16 +0000 (18:29 -0500)]
six locks: Simplify six_lock_counts()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Fix verify_update_old_key()
Kent Overstreet [Mon, 13 Feb 2023 23:21:40 +0000 (18:21 -0500)]
bcachefs: Fix verify_update_old_key()

This fixes a very-rare race in our assertion, with needs_whiteout being
modified in the btree key.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: New backtrace utility code
Kent Overstreet [Mon, 13 Feb 2023 04:15:53 +0000 (23:15 -0500)]
bcachefs: New backtrace utility code

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Fix verify_bucket_evacuated()
Kent Overstreet [Mon, 13 Feb 2023 03:42:31 +0000 (22:42 -0500)]
bcachefs: Fix verify_bucket_evacuated()

This fixes an incorrectly handled transaction restart.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Handle sb buffer resizing in __copy_super()
Kent Overstreet [Mon, 13 Feb 2023 03:08:39 +0000 (22:08 -0500)]
bcachefs: Handle sb buffer resizing in __copy_super()

This fixes a rare buffer overrun when one field is growing and another
field is shrinking - and is a nice simplification as well.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Fix failure to read btree roots
Kent Overstreet [Mon, 13 Feb 2023 00:24:34 +0000 (19:24 -0500)]
bcachefs: Fix failure to read btree roots

If we failed to read a btree root - or if we're not using a btree root,
because of the reconstruct_alloc option - make sure we update the
corresponding info for the key/level for the root on disk.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Don't run triggers when repairing in __bch2_mark_reflink_p()
Daniel Hill [Sun, 12 Feb 2023 02:51:45 +0000 (15:51 +1300)]
bcachefs: Don't run triggers when repairing in __bch2_mark_reflink_p()

Triggers currently trip up on the faulty reflink we're trying to repair;
disabling them lets us fix the broken reflink and continue.

Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: let __bch2_btree_insert() pass in flags
Daniel Hill [Thu, 19 Jan 2023 12:27:30 +0000 (01:27 +1300)]
bcachefs: let __bch2_btree_insert() pass in flags

This patch is prep work for the following patch.

Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Improve locking in __bch2_set_nr_journal_buckets()
Kent Overstreet [Sat, 11 Feb 2023 21:53:59 +0000 (16:53 -0500)]
bcachefs: Improve locking in __bch2_set_nr_journal_buckets()

This refactors to not call bch2_journal_block() with c->sb_lock held.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: More info on check_bucket_ref() error
Kent Overstreet [Sun, 12 Feb 2023 00:31:03 +0000 (19:31 -0500)]
bcachefs: More info on check_bucket_ref() error

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Add missing include
Kent Overstreet [Sun, 12 Feb 2023 00:30:41 +0000 (19:30 -0500)]
bcachefs: Add missing include

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Handle btree node rewrites before going RW
Kent Overstreet [Sat, 11 Feb 2023 17:57:04 +0000 (12:57 -0500)]
bcachefs: Handle btree node rewrites before going RW

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Nocow locking fixup
Kent Overstreet [Sat, 11 Feb 2023 17:38:28 +0000 (12:38 -0500)]
bcachefs: Nocow locking fixup

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Add some logging for btree node rewrites due to errors
Kent Overstreet [Fri, 10 Feb 2023 20:47:46 +0000 (15:47 -0500)]
bcachefs: Add some logging for btree node rewrites due to errors

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Ensure btree node cache is not more than half dirty
Kent Overstreet [Thu, 11 Nov 2021 20:50:22 +0000 (15:50 -0500)]
bcachefs: Ensure btree node cache is not more than half dirty

Tweak journal reclaim to ensure the btree node cache isn't more
than half dirty so that memory reclaim can always make progress - the
same as we do for the btree key cache.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
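The half-dirty limit described in this commit can be pictured with a minimal sketch. The struct and function names below are hypothetical and are not taken from the bcachefs sources; the sketch only shows the arithmetic of "no more than half dirty" and how many nodes reclaim would need to flush to get back under that limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical counters; the real bcachefs structures are not shown in this log. */
struct cache_stats {
	size_t nr_total;	/* nodes currently in the cache */
	size_t nr_dirty;	/* nodes with changes not yet written out */
};

/* True when more than half of the cached nodes are dirty. */
static bool cache_too_dirty(const struct cache_stats *s)
{
	assert(s->nr_dirty <= s->nr_total);
	return s->nr_dirty * 2 > s->nr_total;
}

/* Number of dirty nodes to flush so that at most half remain dirty. */
static size_t nr_to_flush(const struct cache_stats *s)
{
	size_t limit = s->nr_total / 2;

	return s->nr_dirty > limit ? s->nr_dirty - limit : 0;
}
```

Keeping at least half of the cache clean means memory reclaim can always find nodes it is able to free without first waiting on writeback.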
11 months ago bcachefs: Add max nr of IOs in flight to the move path
Kent Overstreet [Mon, 9 Jan 2023 06:45:18 +0000 (01:45 -0500)]
bcachefs: Add max nr of IOs in flight to the move path

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Add an assert to bch2_bucket_nocow_unlock()
Kent Overstreet [Thu, 26 Jan 2023 18:36:30 +0000 (13:36 -0500)]
bcachefs: Add an assert to bch2_bucket_nocow_unlock()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: don't block reads if we're promoting
Daniel Hill [Fri, 6 Jan 2023 08:11:07 +0000 (21:11 +1300)]
bcachefs: don't block reads if we're promoting

The promote path calls data_update_init(), and now that we take locks
there, promote can potentially block our read path. Just return an error
when we can't take the lock instead of blocking.

Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
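The "error instead of block" pattern from this commit can be sketched in userspace C. Everything here is an assumption made for illustration: the lock type, the function names and the -EBUSY return value are hypothetical and do not reflect the actual bcachefs code. The point is only that promote is best effort, so a failed trylock skips the promote while the read itself still completes.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a per-bucket lock. */
struct bucket_lock {
	atomic_flag locked;
};

/* Returns true if the lock was acquired; never blocks. */
static bool bucket_trylock(struct bucket_lock *l)
{
	return !atomic_flag_test_and_set(&l->locked);
}

static void bucket_unlock(struct bucket_lock *l)
{
	atomic_flag_clear(&l->locked);
}

/* Optional promote step: 0 on success, -EBUSY (hypothetical) if the lock is taken. */
static int try_promote(struct bucket_lock *l)
{
	if (!bucket_trylock(l))
		return -EBUSY;
	/* ... the copy to the faster device would happen here ... */
	bucket_unlock(l);
	return 0;
}

/* The read always completes; a failed promote is simply skipped. */
static int read_extent(struct bucket_lock *l, bool *promoted)
{
	*promoted = try_promote(l) == 0;
	return 0;
}
```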
11 months ago bcachefs: Fix promote path leak
Kent Overstreet [Thu, 5 Jan 2023 08:55:23 +0000 (03:55 -0500)]
bcachefs: Fix promote path leak

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Improve invalidate_one_bucket() error messages
Kent Overstreet [Wed, 4 Jan 2023 04:54:10 +0000 (23:54 -0500)]
bcachefs: Improve invalidate_one_bucket() error messages

Make sure to check for lru entries that point to buckets that don't
exist as well as buckets in the wrong state, and improve the error
message we print out.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix move_ctxt_wait_event()
Kent Overstreet [Wed, 4 Jan 2023 04:39:42 +0000 (23:39 -0500)]
bcachefs: Fix move_ctxt_wait_event()

We shouldn't be evaluating cond again if it already returned true.

This fixes a bug when this helper is used for taking nocow locks.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
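To see why re-evaluating the condition matters, consider a condition with a side effect, such as a trylock. Once it returns true the caller owns the lock, so a second evaluation returns false and the waiter would keep waiting. The sketch below is a single-threaded userspace illustration with hypothetical names (toy_lock, toy_trylock, wait_event_once); it is not the bcachefs macro, only one way to guarantee that a true result is never followed by another evaluation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock that becomes free after a number of failed polls. */
struct toy_lock {
	int	busy_for;	/* polls remaining before the lock frees up */
	bool	held;
	int	nr_acquired;
};

/* Side-effecting condition: returning true means we now own the lock. */
static bool toy_trylock(struct toy_lock *l)
{
	if (l->held)
		return false;
	if (l->busy_for > 0) {
		l->busy_for--;
		return false;
	}
	l->held = true;
	l->nr_acquired++;
	return true;
}

/*
 * Poll until cond is true. cond is evaluated exactly once per iteration and
 * its result is cached, so a true result is never re-checked.
 */
#define wait_event_once(cond, do_pending_work)			\
do {								\
	bool cond_finished = false;				\
								\
	while (!cond_finished) {				\
		do_pending_work;				\
		cond_finished = (cond);				\
	}							\
} while (0)
```

A call such as `wait_event_once(toy_trylock(&l), work++)` acquires the lock exactly once. A variant that tested `cond` a second time after the loop exited would see false from the trylock and conclude, wrongly, that the wait had not finished.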
11 months ago bcachefs: Fix deadlock on nocow locks in data move path
Kent Overstreet [Mon, 2 Jan 2023 22:53:02 +0000 (17:53 -0500)]
bcachefs: Fix deadlock on nocow locks in data move path

The recent nocow locking rework introduced a deadlock in the data move
path: the new nocow locking scheme uses a hash table with a fixed-size
array for chaining, meaning on hash collision we may have to wait for
other locks to be released before we can lock a bucket.

And since the data move path needs to submit writes from the same thread
that's taking nocow locks and submitting reads, this introduces a
deadlock.

This shouldn't happen often in practice, but since the data move path
can keep large numbers of IOs in flight simultaneously, it's something
we have to handle.

This patch makes move_ctxt_wait_event() available to
bch2_data_update_init() and uses it when appropriate, which is our
normal solution to this kind of thing.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
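The deadlock described in this commit can be illustrated with a small single-threaded model. All names and sizes below (TABLE_SIZE, SLOTS, lock_table, lock_with_progress) are hypothetical, and the locking semantics are simplified to a plain reference count. The sketch assumes only what the commit message states: chains have a fixed length, so a lock attempt can fail on collision until some other lock in the same chain is released. If the thread holding those locks is also the one that must complete the writes that release them, sleeping would deadlock, so the sketch completes one of its own pending writes and retries instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE	4	/* hypothetical: number of hash chains */
#define SLOTS		2	/* hypothetical: fixed length of each chain */

struct lock_slot {
	uint64_t key;		/* which disk bucket is locked */
	unsigned count;		/* 0 means the slot is free */
};

struct lock_table {
	struct lock_slot slots[TABLE_SIZE][SLOTS];
};

/* Returns false when the chain is full and holds no entry for key. */
static bool table_trylock(struct lock_table *t, uint64_t key)
{
	struct lock_slot *chain = t->slots[key % TABLE_SIZE];
	struct lock_slot *free_slot = NULL;

	for (int i = 0; i < SLOTS; i++) {
		if (chain[i].count && chain[i].key == key) {
			chain[i].count++;	/* already locked: take another ref */
			return true;
		}
		if (!chain[i].count && !free_slot)
			free_slot = &chain[i];
	}
	if (!free_slot)
		return false;			/* collision: chain is full */

	free_slot->key = key;
	free_slot->count = 1;
	return true;
}

static void table_unlock(struct lock_table *t, uint64_t key)
{
	struct lock_slot *chain = t->slots[key % TABLE_SIZE];

	for (int i = 0; i < SLOTS; i++)
		if (chain[i].count && chain[i].key == key) {
			chain[i].count--;
			return;
		}
	assert(0 && "unlock of a key that is not locked");
}

/*
 * Instead of sleeping on a full chain, complete one of our own pending
 * writes (which drops its lock) and retry. Returns the number completed.
 */
static unsigned lock_with_progress(struct lock_table *t, uint64_t key,
				   uint64_t *pending, unsigned *nr_pending)
{
	unsigned flushed = 0;

	while (!table_trylock(t, key)) {
		assert(*nr_pending > 0);	/* otherwise nothing can ever unlock */
		table_unlock(t, pending[--*nr_pending]);
		flushed++;
	}
	return flushed;
}
```

With keys 0, 4 and 8 all hashing to the same chain, locking 0 and 4 fills the chain, and a plain `table_trylock(&t, 8)` fails. `lock_with_progress()` completes the pending write for key 4 and then succeeds.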
11 months ago bcachefs: BKEY_INVALID_FROM_JOURNAL
Kent Overstreet [Wed, 21 Dec 2022 01:00:34 +0000 (20:00 -0500)]
bcachefs: BKEY_INVALID_FROM_JOURNAL

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>