linux-2.6-microblaze.git
11 months ago bcachefs: use reservation for log messages during recovery
Brian Foster [Wed, 22 Mar 2023 12:27:58 +0000 (08:27 -0400)]
bcachefs: use reservation for log messages during recovery

If we block on journal reservation attempting to log journal
messages during recovery, particularly for the first message(s)
before we start doing actual work, chances are the filesystem ends
up deadlocked.

Allow logged messages to use reserved journal space to mitigate this
problem. In the worst case where no space is available whatsoever,
this at least allows the fs to recognize that the journal is stuck
and fail the mount gracefully.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Improve trans_restart_split_race tracepoint
Kent Overstreet [Thu, 30 Mar 2023 20:04:02 +0000 (16:04 -0400)]
bcachefs: Improve trans_restart_split_race tracepoint

Seeing occasional test failures where we get stuck in a livelock that
involves this event - this will help track it down.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Data update path no longer leaves cached replicas
Kent Overstreet [Thu, 30 Mar 2023 02:47:30 +0000 (22:47 -0400)]
bcachefs: Data update path no longer leaves cached replicas

It turns out that it's currently impossible to invalidate buckets
containing only cached data if they're part of a stripe. The normal
bucket invalidate path can't do it because we have to be able to
increment the bucket's gen, which isn't correct because it's still a
member of the stripe - and the bucket invalidate path makes the bucket
available for reuse right away, which also isn't correct for buckets in
stripes.

What would work is invalidating cached data by following backpointers,
except that cached replicas don't currently get backpointers - because
they would be awkward for the existing bucket invalidate path to delete
and they haven't been needed elsewhere.

So for the time being, to prevent running out of space in stripes,
switch the data update path to not leave cached replicas; we may revisit
this in the future.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Rhashtable based buckets_in_flight for copygc
Kent Overstreet [Sat, 11 Mar 2023 19:44:41 +0000 (14:44 -0500)]
bcachefs: Rhashtable based buckets_in_flight for copygc

Previously, copygc used a fifo for tracking buckets in flight - this had
the disadvantage of being fixed size, since we pass references to
elements into the move code.

This restructures it to be a hash table and linked list, since with
erasure coding we need to be able to pipeline across an arbitrary number
of buckets.
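
As an illustration only, the shape described above (a hash table for lookup
plus a linked list for ordering) can be sketched in userspace C. Every name
here (bucket_in_flight, bif_add(), bif_pop_oldest(), NR_HASH_SLOTS) is
hypothetical and not taken from the patch, and a fixed-slot chained table
stands in for the kernel rhashtable. Each entry is allocated separately, so
a pointer handed to other code stays valid however many entries are in
flight.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NR_HASH_SLOTS 64	/* illustrative; a real rhashtable resizes itself */

struct bucket_in_flight {
	uint64_t		bucket;		/* lookup key */
	struct bucket_in_flight	*hash_next;	/* chain within one hash slot */
	struct bucket_in_flight	*list_next;	/* submission (FIFO) order */
};

struct buckets_in_flight {
	struct bucket_in_flight	*table[NR_HASH_SLOTS];
	struct bucket_in_flight	*first, *last;
	size_t			nr;
};

static size_t hash_slot(uint64_t bucket)
{
	return (size_t) ((bucket * 0x9E3779B97F4A7C15ULL) % NR_HASH_SLOTS);
}

static struct bucket_in_flight *bif_lookup(struct buckets_in_flight *b,
					   uint64_t bucket)
{
	struct bucket_in_flight *i;

	for (i = b->table[hash_slot(bucket)]; i; i = i->hash_next)
		if (i->bucket == bucket)
			return i;
	return NULL;
}

/* Returns 0 on success, -1 if already tracked or out of memory. */
static int bif_add(struct buckets_in_flight *b, uint64_t bucket)
{
	size_t slot = hash_slot(bucket);
	struct bucket_in_flight *n;

	if (bif_lookup(b, bucket))
		return -1;

	n = calloc(1, sizeof(*n));
	if (!n)
		return -1;

	n->bucket	= bucket;
	n->hash_next	= b->table[slot];
	b->table[slot]	= n;

	/* Append to the tail of the ordering list. */
	if (b->last)
		b->last->list_next = n;
	else
		b->first = n;
	b->last = n;
	b->nr++;
	return 0;
}

/* Retire the oldest entry; returns 0 and fills *bucket, or -1 if empty. */
static int bif_pop_oldest(struct buckets_in_flight *b, uint64_t *bucket)
{
	struct bucket_in_flight *n = b->first, **p;

	if (!n)
		return -1;

	b->first = n->list_next;
	if (!b->first)
		b->last = NULL;

	/* Unlink from its hash chain. */
	for (p = &b->table[hash_slot(n->bucket)]; *p != n; p = &(*p)->hash_next)
		;
	*p = n->hash_next;

	*bucket = n->bucket;
	free(n);
	b->nr--;
	return 0;
}
```

Unlike a fixed-size fifo, nothing here bounds the number of entries except
available memory.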

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Use BTREE_ITER_INTENT in ec_stripe_update_extent()
Kent Overstreet [Wed, 29 Mar 2023 17:10:36 +0000 (13:10 -0400)]
bcachefs: Use BTREE_ITER_INTENT in ec_stripe_update_extent()

This adds a flags param to bch2_backpointer_get_key() so that we can
pass BTREE_ITER_INTENT, since ec_stripe_update_extent() is updating the
extent immediately.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: move snapshot_t to subvolume_types.h
Kent Overstreet [Wed, 29 Mar 2023 15:01:12 +0000 (11:01 -0400)]
bcachefs: move snapshot_t to subvolume_types.h

This doesn't need to be in bcachefs.h.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix bch2_get_key_or_hole()
Kent Overstreet [Tue, 28 Mar 2023 23:37:25 +0000 (19:37 -0400)]
bcachefs: Fix bch2_get_key_or_hole()

This fixes an off by one error, due to confusing closed vs. half open
intervals.
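
For readers unfamiliar with this class of bug, here is a minimal sketch of
the difference between the two interval conventions. The helper names are
hypothetical and this is not the code that was fixed; it only shows why
mixing the conventions is off by exactly one.

```c
#include <assert.h>
#include <stdint.h>

/* Number of positions in the closed interval [start, end]. */
static uint64_t closed_len(uint64_t start, uint64_t end)
{
	assert(start <= end);
	return end - start + 1;
}

/* Number of positions in the half-open interval [start, end). */
static uint64_t half_open_len(uint64_t start, uint64_t end)
{
	assert(start <= end);
	return end - start;
}

/*
 * Size of the hole before the next key, where next_key is the first
 * occupied position at or after start. The hole is [start, next_key):
 * next_key itself is occupied, so it must be excluded.
 */
static uint64_t hole_size(uint64_t start, uint64_t next_key)
{
	return half_open_len(start, next_key);
}
```

Using closed_len() where half_open_len() is meant would report a hole one
position too large, covering a position that actually holds a key.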

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Check return code from need_whiteout_for_snapshot()
Kent Overstreet [Tue, 28 Mar 2023 23:15:53 +0000 (19:15 -0400)]
bcachefs: Check return code from need_whiteout_for_snapshot()

This could return a transaction restart; we need to check for that.
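
The general pattern is sketched below under stated assumptions: the lookup
is modelled as returning 1, 0, or a negative error, and the error code
ERR_TRANSACTION_RESTART and both function names are hypothetical stand-ins
rather than the bcachefs API. The point is that a negative return has to be
propagated before the value is used as a boolean.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical private error code, standing in for a transaction restart. */
#define ERR_TRANSACTION_RESTART	2049

/*
 * Stand-in for a lookup that returns 1 (whiteout needed), 0 (not needed),
 * or a negative error. The result is passed in so the sketch is testable.
 */
static int need_whiteout(int simulated_result)
{
	return simulated_result;
}

static int delete_key(int simulated_result, bool *wrote_whiteout)
{
	int ret = need_whiteout(simulated_result);

	/* Without this check, a negative error would be treated as "true". */
	if (ret < 0)
		return ret;

	*wrote_whiteout = ret != 0;
	return 0;
}
```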

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: bch2_dev_freespace_init() Print out status every 10 seconds
Kent Overstreet [Thu, 23 Mar 2023 01:22:51 +0000 (21:22 -0400)]
bcachefs: bch2_dev_freespace_init() Print out status every 10 seconds

It appears freespace init can still take a while, and we've had a report
or two of it getting stuck - let's have it print out where it's at every
10 seconds.
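
A rough sketch of interval-based status reporting follows. The struct and
function names are hypothetical, and timestamps are passed in as plain
seconds (assumed to be monotonically increasing) so that the logic is easy
to test; the kernel code would use its own time source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define STATUS_INTERVAL_SEC 10

struct progress {
	uint64_t last_report;	/* time of the last report, in seconds */
};

/* True at most once per STATUS_INTERVAL_SEC; assumes "now" never goes back. */
static bool progress_should_report(struct progress *p, uint64_t now)
{
	if (now - p->last_report < STATUS_INTERVAL_SEC)
		return false;

	p->last_report = now;
	return true;
}

/* Walks nr work items, each stamped with the time it was processed. */
static unsigned scan_with_status(struct progress *p,
				 const uint64_t *timestamps, size_t nr)
{
	unsigned reports = 0;
	size_t i;

	for (i = 0; i < nr; i++)
		if (progress_should_report(p, timestamps[i])) {
			printf("freespace init: at item %zu of %zu\n", i, nr);
			reports++;
		}

	return reports;
}
```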

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Run freespace init in device hot add path
Kent Overstreet [Thu, 23 Mar 2023 00:48:37 +0000 (20:48 -0400)]
bcachefs: Run freespace init in device hot add path

As in the recovery and device add paths, we have to check whether devices
have the freespace btree initialized - this was missed in the device hot
add path.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Improved copygc wait debugging
Kent Overstreet [Fri, 17 Mar 2023 13:59:17 +0000 (09:59 -0400)]
bcachefs: Improved copygc wait debugging

This just adds a line for how long copygc has been waiting to sysfs
copygc_wait, helpful for debugging why copygc isn't running.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Call bch2_path_put_nokeep() before bch2_path_put()
Kent Overstreet [Tue, 21 Mar 2023 16:18:10 +0000 (12:18 -0400)]
bcachefs: Call bch2_path_put_nokeep() before bch2_path_put()

bch2_path_put_nokeep() is sketchy, and we should consider removing it:
it unconditionally frees btree_paths once their ref hits 0.

The assumption is that we only use it for paths that have never been
visible outside the core btree code; i.e. higher level code will
never be making assumptions about locking based on these paths.

However, there's subtle brokenness with this approach:

 - If we call bch2_path_put(), then bch2_path_put_nokeep(),
   bch2_path_put() may free the first path on the assumption that we
   have another path keeping a node locked - but then
   bch2_path_put_nokeep() just unconditionally frees it.

The same bug may arise if we're calling bch2_path_put() and
bch2_path_put_nokeep() on the same (refcounted) path, or two adjacent
paths that point to the same btree node.

This patch hacks around one of these bugs by calling
bch2_path_put_nokeep() first in bch2_trans_iter_exit.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: drop unnecessary journal stuck check from space calculation
Brian Foster [Tue, 21 Mar 2023 12:09:16 +0000 (08:09 -0400)]
bcachefs: drop unnecessary journal stuck check from space calculation

The journal stuck check in bch2_journal_space_available() is
particularly aggressive and can lead to premature shutdown in some
rare cases. This is difficult to reproduce, but it also comes along
with a fatal error, so it is worthwhile to be cautious.

For example, we've seen instances where the journal is under heavy
reservation pressure, the journal allocation path transitions into
the final available journal bucket, the journal write path
immediately consumes that bucket and calls into
bch2_journal_space_available(), which then in turn flags the journal
as stuck because there is no available space and shuts down the
filesystem instead of submitting the journal write (that would have
otherwise succeeded).

To avoid this problem, simplify the journal stuck checking by just
relying on the higher level logic in the journal reservation path.
This produces more useful debug output and is a more reliable
indicator that things have bogged down.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: refactor journal stuck checking into standalone helper
Brian Foster [Tue, 21 Mar 2023 12:03:18 +0000 (08:03 -0400)]
bcachefs: refactor journal stuck checking into standalone helper

bcachefs checks for journal stuck conditions both in the journal
space calculation code and the journal reservation slow path. The
logic in both places is rather tricky and can result in
non-deterministic failure characteristics and debug output.

In preparation to condense journal stuck handling to a single place,
refactor the __journal_res_get() logic into a standalone helper.
Since multiple callers into the reservation code can result in
duplicate reports, use the ->err_seq field as a serialization
mechanism for the debug dump. Finally, add some comments to help
explain the logic and hopefully facilitate further improvements in
the future.
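
One way to read the ->err_seq idea is sketched below. This is an assumption
about the general technique, not the helper added by the patch: the struct,
the field cur_seq, and journal_report_stuck() are hypothetical, the caller
is assumed to hold the journal lock, and sequence numbers are assumed to
start at 1 so that 0 means "never reported".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct journal_state {
	uint64_t cur_seq;	/* current journal sequence number */
	uint64_t err_seq;	/* sequence at which we last reported "stuck" */
};

/*
 * Returns true if this caller emitted the report. Later callers that hit
 * the same stuck condition at the same sequence number stay quiet.
 */
static bool journal_report_stuck(struct journal_state *j)
{
	if (j->err_seq == j->cur_seq)
		return false;

	j->err_seq = j->cur_seq;
	fprintf(stderr, "journal stuck at seq %llu\n",
		(unsigned long long) j->cur_seq);
	return true;
}
```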

Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: gracefully unwind journal res slowpath on shutdown
Brian Foster [Mon, 20 Mar 2023 17:21:19 +0000 (13:21 -0400)]
bcachefs: gracefully unwind journal res slowpath on shutdown

bcachefs detects journal stuck conditions in a couple different
places. If the logic in the journal reservation slow path happens to
detect the problem, I've seen instances where the filesystem remains
deadlocked even though it has been shut down. This is occasionally
reproduced by generic/333, and usually manifests as one or more
tasks stuck in the journal reservation slow path.

To help avoid this problem, repeat the journal error check in
__journal_res_get() once under spinlock to cover the case where the
previous lock holder might have triggered shutdown. This also helps
avoid spurious/duplicate stuck reports. Also, wake the journal from
the halt code to make sure blocked callers of the journal res
slowpath have a chance to wake up and observe the pending error.
This survives an overnight looping run of generic/333 without the
aforementioned lockups.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: more aggressive fast path write buffer key flushing
Brian Foster [Fri, 17 Mar 2023 12:54:01 +0000 (08:54 -0400)]
bcachefs: more aggressive fast path write buffer key flushing

The btree write buffer flush code is prone to causing journal
deadlock due to inefficient use and release of reservation space.
Reservation is not pre-reserved for write buffered keys (as is done
for key cache keys, for example), because the write buffer flush
side uses a fast path that attempts insertion without need for any
reservation at all.

The write buffer flush attempts to deal with this by inserting keys
using the BTREE_INSERT_JOURNAL_RECLAIM flag to return an error on
journal reservations that require blocking. Upon first error, it
falls back to a slow path that inserts in journal order and supports
moving the associated journal pin forward.

The problem is that under pathological conditions (i.e. smaller log,
larger write buffer and journal reservation pressure), we've seen
instances where the fast path fails fairly quickly without having
completed many insertions, and then the slow path is unable to push
the journal pin forward enough to free up the space it needs to
completely flush the buffer. This problem is occasionally reproduced
by fstest generic/333.

To avoid this problem, update the fast path algorithm to skip key
inserts that fail due to inability to acquire needed journal
reservation without immediately breaking out of the loop. Instead,
insert as many keys as possible, zap the sequence numbers to mark
them as processed, and then fall back to the slow path to process
the remaining set in journal order. This reduces the amount of
journal reservation that might be required to flush the entire
buffer and increases the odds that the slow path is able to move the
journal pin forward and free up space as keys are processed.
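
The control flow described above can be modelled roughly as follows. This is
a simplified userspace sketch, not the kernel implementation: struct wb_key,
the needs_res flag, and the res_available parameter are hypothetical
stand-ins for "this insert would need a journal reservation" and "a
reservation can be taken without blocking".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct wb_key {
	uint64_t journal_seq;	/* zapped to 0 once the key has been flushed */
	bool	 needs_res;	/* hypothetical: insert needs a reservation */
};

/*
 * Fast path: attempt every key without blocking. A key that cannot get its
 * reservation is skipped rather than ending the loop. Returns the number of
 * keys left over for the slow path.
 */
static size_t flush_fast_path(struct wb_key *keys, size_t nr, bool res_available)
{
	size_t i, remaining = 0;

	for (i = 0; i < nr; i++) {
		if (keys[i].needs_res && !res_available) {
			remaining++;
			continue;		/* skip, do not break */
		}
		keys[i].journal_seq = 0;	/* mark as processed */
	}

	return remaining;
}

static int wb_key_seq_cmp(const void *l, const void *r)
{
	const struct wb_key *a = l, *b = r;

	return (a->journal_seq > b->journal_seq) -
	       (a->journal_seq < b->journal_seq);
}

/*
 * Slow path: sort into journal order, skip the zapped keys and flush the
 * rest. Returns the number of keys flushed here.
 */
static size_t flush_slow_path(struct wb_key *keys, size_t nr)
{
	size_t i, flushed = 0;

	qsort(keys, nr, sizeof(*keys), wb_key_seq_cmp);

	for (i = 0; i < nr; i++) {
		if (!keys[i].journal_seq)
			continue;
		keys[i].journal_seq = 0;
		flushed++;
	}

	return flushed;
}
```

The fewer keys the fast path leaves behind, the less reservation the slow
path needs in this model.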

Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: use dedicated workqueue for tasks holding write refs
Brian Foster [Thu, 23 Mar 2023 18:09:05 +0000 (14:09 -0400)]
bcachefs: use dedicated workqueue for tasks holding write refs

A workqueue resource deadlock has been observed when running fsck
on a filesystem with a full/stuck journal. fsck is not currently
able to repair the fs due to fairly rapid emergency shutdown, but
rather than exit gracefully the fsck process hangs during the
shutdown sequence. Fortunately this is easily recoverable from
userspace, but the root cause involves code shared between the
kernel and userspace and so should be addressed.

The deadlock scenario involves the main task in the bch2_fs_stop()
-> bch2_fs_read_only() path waiting on write references to drain
with the fs state lock held. A bch2_read_only_work() workqueue task
is scheduled on the system_long_wq, blocked on the state lock.
Finally, various other write ref holding workqueue tasks are
scheduled to run on the same workqueue and must complete in order to
release references that the initial task is waiting on.

To avoid this problem, we can split the dependent workqueue tasks
across different workqueues. It's a bit of a waste to create a
dedicated wq for the read-only worker, but there are several tasks
throughout the fs that follow the pattern of acquiring a write
reference and then scheduling to the system wq. Use a local wq
for such tasks to break the subtle dependency between these and the
read-only worker.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: remove unused bch2_trans_log_msg()
Brian Foster [Wed, 22 Mar 2023 13:17:26 +0000 (09:17 -0400)]
bcachefs: remove unused bch2_trans_log_msg()

Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix bch2_verify_bucket_evacuated()
Kent Overstreet [Mon, 27 Mar 2023 20:25:15 +0000 (16:25 -0400)]
bcachefs: Fix bch2_verify_bucket_evacuated()

We were going into an infinite loop when printing out backpointers, due
to never incrementing bp_offset - whoops.

Also limit the number of backpointers we print to 10; this is debug code
and we only need to print a sample, not all of them.
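
The two fixes (always advancing the cursor, and capping the output) look
roughly like this in a simplified model. The backpointer struct and
print_backpointer_sample() are hypothetical; only the name bp_offset and the
limit of 10 come from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_PRINTED 10

struct backpointer {
	uint64_t bucket;
	uint64_t offset;
};

/* Prints at most MAX_PRINTED backpointers for @bucket; returns the count. */
static unsigned print_backpointer_sample(const struct backpointer *bps,
					 size_t nr, uint64_t bucket)
{
	size_t bp_offset = 0;
	unsigned printed = 0;

	while (bp_offset < nr && printed < MAX_PRINTED) {
		const struct backpointer *bp = &bps[bp_offset];

		/* Advance first: forgetting this would spin on one entry forever. */
		bp_offset++;

		if (bp->bucket != bucket)
			continue;

		printf("backpointer: bucket %llu offset %llu\n",
		       (unsigned long long) bp->bucket,
		       (unsigned long long) bp->offset);
		printed++;
	}

	return printed;
}
```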

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: verify_bucket_evacuated() -> set_btree_iter_dontneed()
Kent Overstreet [Sun, 19 Mar 2023 18:32:23 +0000 (14:32 -0400)]
bcachefs: verify_bucket_evacuated() -> set_btree_iter_dontneed()

This should help with excessive 'would deadlock' transaction restarts.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Make reconstruct_alloc quieter
Kent Overstreet [Sun, 19 Mar 2023 18:29:51 +0000 (14:29 -0400)]
bcachefs: Make reconstruct_alloc quieter

We shouldn't be printing out fsck errors for expected errors - this
helps make test logs more readable, and makes it easier to see what the
actual failure was.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix an unhandled transaction restart error
Kent Overstreet [Sun, 19 Mar 2023 18:13:17 +0000 (14:13 -0400)]
bcachefs: Fix an unhandled transaction restart error

This is a bit awkward: we're passing around a btree_trans, but we're not
in a context where transaction restarts are handled - we should try to
come up with a better way to denote situations like this.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix nocow write path closure bug
Kent Overstreet [Sun, 19 Mar 2023 17:01:06 +0000 (13:01 -0400)]
bcachefs: Fix nocow write path closure bug

With regular waitlists, we need to ensure we always call finish_wait().
With closures, the equivalent is that we need to call closure_sync()
before returning with a stack-allocated closure.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Nocow write error path fix
Kent Overstreet [Sun, 19 Mar 2023 16:50:05 +0000 (12:50 -0400)]
bcachefs: Nocow write error path fix

The nocow write error path was iterating over pointers in an extent,
after we'd dropped btree locks - oops.

Fortunately we'd already stashed what we need in nocow_lock_bucket, so
use that instead.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix bch2_extent_fallocate() in nocow mode
Kent Overstreet [Fri, 17 Mar 2023 14:56:44 +0000 (10:56 -0400)]
bcachefs: Fix bch2_extent_fallocate() in nocow mode

When we allocate disk space, we need to be incrementing the WRITE io
clock, which perhaps should be renamed to sectors allocated - copygc
uses this io clock to know when to run.

Also, we should be incrementing the same clock when allocating btree
nodes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Add an assert in inode_write for -ENOENT
Kent Overstreet [Wed, 15 Mar 2023 23:04:05 +0000 (19:04 -0400)]
bcachefs: Add an assert in inode_write for -ENOENT

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix bch2_evict_subvolume_inodes()
Kent Overstreet [Wed, 15 Mar 2023 15:53:51 +0000 (11:53 -0400)]
bcachefs: Fix bch2_evict_subvolume_inodes()

This fixes a bug in bch2_evict_subvolume_inodes(): d_mark_dontcache()
doesn't handle the case where i_count is already 0; we need to grab and
put the inode in order for it to be dropped.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Improve error handling in bch2_ioctl_subvolume_destroy()
Kent Overstreet [Thu, 16 Mar 2023 16:47:35 +0000 (12:47 -0400)]
bcachefs: Improve error handling in bch2_ioctl_subvolume_destroy()

Pure style fixes

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix for 'missing subvolume' error
Kent Overstreet [Thu, 16 Mar 2023 15:04:28 +0000 (11:04 -0400)]
bcachefs: Fix for 'missing subvolume' error

Subvolumes, including their root inodes, get deleted asynchronously
after an unlink. But we still need to ensure that we tell the VFS the
inode has been deleted, otherwise VFS writeback could fire after
asynchronous deletion has finished, and try to write to an
inode/subvolume that no longer exists.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Don't run transaction hooks multiple times
Kent Overstreet [Wed, 15 Mar 2023 18:41:07 +0000 (14:41 -0400)]
bcachefs: Don't run transaction hooks multiple times

Transaction hooks aren't supposed to run unless we know the transaction
is going to commit successfully: this fixes a bug with attempting to
delete a subvolume multiple times.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Add a fallback when journal_keys doesn't fit in ram
Kent Overstreet [Wed, 15 Mar 2023 15:02:00 +0000 (11:02 -0400)]
bcachefs: Add a fallback when journal_keys doesn't fit in ram

We may end up in a situation where allocating the buffer for the sorted
journal_keys fails - but it would likely succeed, post compaction where
we drop duplicates.

We've had reports of this allocation failing, so this adds a slowpath to
do the compaction incrementally.

This is only a band-aid fix; we need to look at limiting the number of
keys in the journal based on the amount of system RAM.
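
The commit message does not spell out how the slowpath works, so the
following is only one possible shape of "compact incrementally", with
hypothetical names throughout (struct jkey, jkeys_compact(),
jkeys_add_compacting()). Keys are appended to a bounded buffer, and whenever
it fills up, duplicates for the same position are dropped in place, keeping
the newest by journal sequence.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct jkey {
	uint64_t pos;		/* stand-in for the btree position */
	uint64_t journal_seq;	/* a higher value means a newer update */
};

/* Order by position, then by journal sequence (oldest first). */
static int jkey_cmp(const void *l, const void *r)
{
	const struct jkey *a = l, *b = r;

	if (a->pos != b->pos)
		return a->pos < b->pos ? -1 : 1;
	return (a->journal_seq > b->journal_seq) -
	       (a->journal_seq < b->journal_seq);
}

/* Sort in place and keep only the newest key per position; returns new count. */
static size_t jkeys_compact(struct jkey *keys, size_t nr)
{
	size_t src, dst = 0;

	qsort(keys, nr, sizeof(*keys), jkey_cmp);

	for (src = 0; src < nr; src++) {
		if (src + 1 < nr && keys[src].pos == keys[src + 1].pos)
			continue;	/* a newer key for this position follows */
		keys[dst++] = keys[src];
	}

	return dst;
}

/*
 * Append one key, compacting first if the buffer is full. Returns 0 on
 * success, -1 if the buffer is still full after compaction.
 */
static int jkeys_add_compacting(struct jkey *buf, size_t cap, size_t *nr,
				struct jkey k)
{
	if (*nr == cap) {
		*nr = jkeys_compact(buf, *nr);
		if (*nr == cap)
			return -1;
	}

	buf[(*nr)++] = k;
	return 0;
}
```

Peak memory in this model is bounded by the buffer capacity rather than by
the total number of keys read, at the cost of repeated sorting.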

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Improve the backpointer to missing extent message
Kent Overstreet [Tue, 14 Mar 2023 18:39:54 +0000 (14:39 -0400)]
bcachefs: Improve the backpointer to missing extent message

We now print the pos where the backpointer was found in the btree, as
well as the exact bucket:bucket_offset of the data, to aid in grepping
through logs.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Add error message for failing to allocate sorted journal keys
Kent Overstreet [Tue, 14 Mar 2023 20:21:16 +0000 (16:21 -0400)]
bcachefs: Add error message for failing to allocate sorted journal keys

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: New erasure coding shutdown path
Kent Overstreet [Tue, 14 Mar 2023 02:01:47 +0000 (22:01 -0400)]
bcachefs: New erasure coding shutdown path

This implements a new shutdown path for erasure coding, which is needed
for the upcoming BCH_WRITE_WAIT_FOR_EC write path.

The process is:
 - Cancel new stripes being built up
 - Close out/cancel open buckets on write points or the partial list
   that are for stripes
 - Shutdown rebalance/copygc
 - Then wait for in flight new stripes to finish

With BCH_WRITE_WAIT_FOR_EC, move ops will be waiting on stripes to fill
up before they complete; the new ec shutdown path is needed for shutting
down copygc/rebalance without deadlocking.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: bch2_fs_moving_ctxts_to_text()
Kent Overstreet [Sun, 12 Mar 2023 01:38:46 +0000 (20:38 -0500)]
bcachefs: bch2_fs_moving_ctxts_to_text()

This also adds bch2_write_op_to_text(): now we can see outstanding moves,
useful for debugging shutdown with the upcoming BCH_WRITE_WAIT_FOR_EC
and likely for other things in the future.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Private error codes: ENOMEM
Kent Overstreet [Tue, 14 Mar 2023 19:35:57 +0000 (15:35 -0400)]
bcachefs: Private error codes: ENOMEM

This adds private error codes for most (but not all) of our ENOMEM uses,
which makes it easier to track down assorted allocation failures.
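
The general idea of private error codes can be sketched as below. The
numbering, the enum names, and the helpers err_class() and err_str() are
hypothetical and are not the bcachefs definitions. Each call site returns
its own code, which maps back to the standard errno for callers that only
understand ENOMEM, while logs can print the specific name.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Private codes live above the range used by standard errno values. */
#define ERR_PRIVATE_START 2048

enum private_err {
	ERR_ENOMEM_journal_keys = ERR_PRIVATE_START,
	ERR_ENOMEM_stripe_buf,
	ERR_PRIVATE_END,
};

static const char * const private_err_names[] = {
	"ENOMEM_journal_keys",
	"ENOMEM_stripe_buf",
};

static int is_private_err(int e)
{
	return e >= ERR_PRIVATE_START && e < ERR_PRIVATE_END;
}

/* Map a (possibly negative) error to the standard errno it belongs to. */
static int err_class(int err)
{
	int e = err < 0 ? -err : err;

	/* Every private code in this sketch is an ENOMEM variant. */
	return is_private_err(e) ? ENOMEM : e;
}

/* Name for logging: the specific private name, else the libc string. */
static const char *err_str(int err)
{
	int e = err < 0 ? -err : err;

	return is_private_err(e)
		? private_err_names[e - ERR_PRIVATE_START]
		: strerror(e);
}
```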

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix bch2_check_extents_to_backpointers()
Kent Overstreet [Tue, 14 Mar 2023 16:54:21 +0000 (12:54 -0400)]
bcachefs: Fix bch2_check_extents_to_backpointers()

In rare cases, bch2_check_extents_to_backpointers() would incorrectly
flag an extent as having a missing backpointer when we just needed to
flush the btree write buffer - we weren't tracking the last flushed
position correctly.

This adds a level field to the last_flushed pos, fixing a bug where we'd
sometimes fail on a new root node.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix an assert in copygc thread shutdown path
Kent Overstreet [Tue, 14 Mar 2023 15:48:07 +0000 (11:48 -0400)]
bcachefs: Fix an assert in copygc thread shutdown path

We're not supposed to have nested (locked) btree_trans on the stack:
this means copygc shutdown needs to exit our btree_trans before exiting
the move_ctxt, which calls bch2_write().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: bch2_bucket_is_movable() -> BTREE_ITER_CACHED
Kent Overstreet [Tue, 14 Mar 2023 12:35:04 +0000 (08:35 -0400)]
bcachefs: bch2_bucket_is_movable() -> BTREE_ITER_CACHED

BTREE_ITER_CACHED should really be the default for cached btrees - this
is an easy mistake to make.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Don't use BTREE_ITER_INTENT in make_extent_indirect()
Kent Overstreet [Tue, 14 Mar 2023 01:58:14 +0000 (21:58 -0400)]
bcachefs: Don't use BTREE_ITER_INTENT in make_extent_indirect()

This is a workaround for a btree path overflow - searching with
BTREE_ITER_INTENT periodically saves the iterator position for updates,
which eventually overflows.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix stripe create error path
Kent Overstreet [Mon, 13 Mar 2023 13:53:04 +0000 (09:53 -0400)]
bcachefs: Fix stripe create error path

If we errored out on a new stripe before fully allocating it, we
shouldn't be zeroing out unwritten data.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Mark new snapshots earlier in create path
Kent Overstreet [Mon, 13 Mar 2023 11:09:33 +0000 (07:09 -0400)]
bcachefs: Mark new snapshots earlier in create path

This fixes a null ptr deref when creating new snapshots:
bch2_create_trans() will look up the subvolume and find the _new_
snapshot in the BCH_CREATE_SUBVOL path that's being created in that
transaction.

We have to call bch2_mark_snapshot() earlier so that it's properly
initialized, instead of leaving it for transaction commit.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Improve bch2_new_stripes_to_text()
Kent Overstreet [Sat, 11 Mar 2023 22:23:08 +0000 (17:23 -0500)]
bcachefs: Improve bch2_new_stripes_to_text()

Print out the alloc reserve, and format it a bit more nicely.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Kill bch_write_op->btree_update_ready
Kent Overstreet [Sat, 11 Mar 2023 22:21:30 +0000 (17:21 -0500)]
bcachefs: Kill bch_write_op->btree_update_ready

This changes the write path to not add write ops to the write_point's
list of pending work items until it's ready; this means we have to
change the lock protecting it to an irq-safe lock, but means
bch2_write_point_do_index_updates() no longer has to iterate over the
list, which is beneficial with the way the new BCH_WRITE_WAIT_FOR_EC
code works.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Simplify stripe_idx_to_delete
Kent Overstreet [Sat, 11 Mar 2023 04:37:19 +0000 (23:37 -0500)]
bcachefs: Simplify stripe_idx_to_delete

This is not technically correct - it's subject to a race if we ever end
up with a stripe with all empty blocks (that needs to be deleted) being
held open. But the "correct" version was much too inefficient, and soon
we'll be adding a stripes LRU.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix next_bucket()
Kent Overstreet [Sat, 11 Mar 2023 20:52:37 +0000 (15:52 -0500)]
bcachefs: Fix next_bucket()

This fixes an infinite loop in bch2_get_key_or_real_bucket_hole().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Second layer of refcounting for new stripes
Kent Overstreet [Thu, 9 Mar 2023 15:18:09 +0000 (10:18 -0500)]
bcachefs: Second layer of refcounting for new stripes

This will be used for move writes, which will be waiting until the
stripe is created to do the index update. They need to prevent the
stripe from being reclaimed until their index update is done, so we need
another refcount that just keeps the stripe open.
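
A toy model of two-level refcounting is shown below, to make the intent
concrete. The enum, struct, and function names are hypothetical and the
model is single-threaded: one count tracks outstanding data writes (when it
drops to zero the stripe can be created), and a second count only keeps the
stripe from being reclaimed while index updates are still pending.

```c
#include <assert.h>
#include <stdbool.h>

enum stripe_ref {
	REF_IO,		/* outstanding data writes into the stripe */
	REF_OPEN,	/* holders that only need the stripe kept open */
	REF_NR,
};

struct new_stripe {
	unsigned ref[REF_NR];
	bool	 created;	/* set when the last REF_IO is dropped */
	bool	 reclaimed;	/* set when the last REF_OPEN is dropped */
};

static void stripe_get(struct new_stripe *s, enum stripe_ref r)
{
	s->ref[r]++;
}

static void stripe_put(struct new_stripe *s, enum stripe_ref r)
{
	assert(s->ref[r] > 0);

	if (--s->ref[r])
		return;

	if (r == REF_IO)
		s->created = true;	/* all data written: create the stripe */
	else
		s->reclaimed = true;	/* nobody needs it open any more */
}
```

In this model a move write would hold both references, drop REF_IO when its
data write finishes, and drop REF_OPEN only after its index update is done.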

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: ec: fall back to creating new stripes for copygc
Kent Overstreet [Fri, 10 Mar 2023 21:46:24 +0000 (16:46 -0500)]
bcachefs: ec: fall back to creating new stripes for copygc

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Rework __bch2_data_update_index_update()
Kent Overstreet [Sat, 4 Mar 2023 08:21:34 +0000 (03:21 -0500)]
bcachefs: Rework __bch2_data_update_index_update()

This makes some improvements to the logic for adding/removing replicas,
as part of the larger erasure coding improvements. We now directly
consider number of replicas desired for the given inode, and
extent/pointer durability: this ensures that the extent ends up with the
desired number of replicas when we're replacing multiple pointers with
one that has higher durability (e.g. erasure coded).

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Extent helper improvements
Kent Overstreet [Fri, 10 Mar 2023 21:28:37 +0000 (16:28 -0500)]
bcachefs: Extent helper improvements

 - __bch2_bkey_drop_ptr() -> bch2_bkey_drop_ptr_noerror(), now available
   outside extents.

 - Split bch2_bkey_has_device() and bch2_bkey_has_device_c(), const and
   non const versions

 - bch2_extent_has_ptr() now returns the pointer it found

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: evacuate_bucket() no longer moves cached ptrs
Kent Overstreet [Fri, 10 Mar 2023 23:00:10 +0000 (18:00 -0500)]
bcachefs: evacuate_bucket() no longer moves cached ptrs

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: evacuate_bucket() no longer calls verify_bucket_evacuated()
Kent Overstreet [Fri, 10 Mar 2023 22:40:21 +0000 (17:40 -0500)]
bcachefs: evacuate_bucket() no longer calls verify_bucket_evacuated()

The copygc code itself now calls this when all moves from a given bucket
are complete.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Suppress transaction restart err message
Kent Overstreet [Fri, 10 Mar 2023 19:34:30 +0000 (14:34 -0500)]
bcachefs: Suppress transaction restart err message

This isn't a real error, and doesn't need to be printed.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Rework open bucket partial list allocation
Kent Overstreet [Sat, 25 Feb 2023 07:22:49 +0000 (02:22 -0500)]
bcachefs: Rework open bucket partial list allocation

Now, any open_bucket can go on the partial list: allocating from the
partial list has been moved to its own dedicated function,
open_bucket_add_buckets() -> bucket_alloc_set_partial().

In particular, this means that erasure coded buckets can safely go on
the partial list; the new location works with the "allocate an ec bucket
first, then the rest" logic.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: don't bump key cache journal seq on nojournal commits
Brian Foster [Thu, 2 Mar 2023 14:03:37 +0000 (09:03 -0500)]
bcachefs: don't bump key cache journal seq on nojournal commits

fstest generic/388 occasionally reproduces corruptions where an
inode has extents beyond i_size. This is a deliberate crash and
recovery test, and the post crash+recovery characteristics are
usually the same: the inode exists on disk in an early (i.e. just
allocated) state based on the journal sequence number associated
with the inode. Subsequent inode updates exist in the journal at
higher sequence numbers, but the inode hadn't been written back
before the associated crash and the post-crash recovery processes a
set of journal sequence numbers that doesn't include updates to the
inode. In fact, the sequence with the most recent inode key update
always happens to be the sequence just before the front of the
journal processed by recovery.

This last bit is a significant hint that the problem relates to an
on-disk journal update of the front of the journal. The root cause
of this problem is basically that the inode is updated (multiple
times) in-core and in the key cache, each time bumping the key cache
sequence number used to control the cache flush. The cache flush
skips one or more times, bumping the associated key cache journal
pin to the key cache seq value. This has a side effect of holding
the inode in memory a bit longer than normal, which helps exacerbate
this problem, but is also unsafe in certain cases where the key
cache seq may have been updated by a transaction commit that didn't
journal the associated key.

For example, consider an inode that has been allocated, updated
several times in the key cache, journaled, but not yet written back.
At this stage, everything should be consistent if the fs happens to
crash because the latest update has been journaled. Now consider a key
update via bch2_extent_update_i_size_sectors() that uses the
BTREE_UPDATE_NOJOURNAL flag. While this update may not change inode
state, it can have the side effect of bumping ck->seq in
bch2_btree_insert_key_cached(). In turn, if a subsequent key cache
flush skips due to seq not matching the former, the ck->journal pin
is updated to ck->seq even though the most recent key update was not
journaled. If this pin happens to reside at the front (tail) of the
journal, this means a subsequent journal write can update last_seq
to a value beyond that which includes the most recent update to the
inode. If this occurs and the fs happens to crash before the inode
happens to flush, recovery will see the latest last_seq, fail to
recover the inode and leave the inode in the inconsistent state
described above.

To avoid this problem, skip the key cache seq update on NOJOURNAL
commits, except on initial pin add. Pass the insert entry directly
to bch2_btree_insert_key_cached() to make the associated flag
available and be consistent with btree_insert_key_leaf().
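
As an illustration only, the sequence above can be modelled with a toy
key cache entry. This is a minimal sketch, not the kernel code: the
toy_* names, the struct layout and the simplified flush-skip rule are
all assumptions made for the example. Only ck->seq, the journal pin and
the NOJOURNAL behaviour come from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model of a key cache entry (not the real struct). */
struct toy_cached_key {
	uint64_t seq;      /* journal seq of the last journaled update */
	uint64_t pin_seq;  /* journal seq currently pinned for this key */
	bool     pinned;   /* has a journal pin been added yet? */
};

/*
 * Record a key cache update committed at journal sequence @seq.
 * @nojournal models a BTREE_UPDATE_NOJOURNAL commit: the key is not
 * written to the journal, so seq must not advance, except when this
 * update is the one that adds the initial pin.
 */
static void toy_insert_key_cached(struct toy_cached_key *ck,
				  uint64_t seq, bool nojournal)
{
	if (!ck->pinned) {
		/* Initial pin add: always take the commit's sequence. */
		ck->pinned  = true;
		ck->pin_seq = seq;
		ck->seq     = seq;
		return;
	}

	/* The fix: skip the seq bump when the key was not journaled. */
	if (!nojournal)
		ck->seq = seq;
}

/* Model a skipped flush: the pin is moved up to ck->seq. */
static void toy_flush_skipped(struct toy_cached_key *ck)
{
	assert(ck->pinned);
	ck->pin_seq = ck->seq;
}
```

In this model, a journaled update at seq 12 followed by a NOJOURNAL
commit at seq 15 leaves seq at 12, so a skipped flush moves the pin to
12 and the journaled copy of the key stays covered by the pin.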

Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: When shutting down, flush btree node writes last
Kent Overstreet [Tue, 7 Mar 2023 12:28:20 +0000 (07:28 -0500)]
bcachefs: When shutting down, flush btree node writes last

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Verbose on by default when CONFIG_BCACHEFS_DEBUG=y
Kent Overstreet [Tue, 7 Mar 2023 12:25:12 +0000 (07:25 -0500)]
bcachefs: Verbose on by default when CONFIG_BCACHEFS_DEBUG=y

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agofixup bcachefs: Use for_each_btree_key_upto() more consistently
Kent Overstreet [Mon, 6 Mar 2023 15:20:36 +0000 (10:20 -0500)]
fixup bcachefs: Use for_each_btree_key_upto() more consistently

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agosix locks: be more careful about lost wakeups
Kent Overstreet [Mon, 6 Mar 2023 12:57:51 +0000 (07:57 -0500)]
six locks: be more careful about lost wakeups

This is a workaround for a lost wakeup bug we've been seeing - we still
need to discover the actual bug.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Journal resize fixes
Kent Overstreet [Mon, 6 Mar 2023 10:29:12 +0000 (05:29 -0500)]
bcachefs: Journal resize fixes

 - Fix a sleeping-in-atomic bug due to calling
   bch2_journal_buckets_to_sb() under the journal lock.
 - Additionally, now we mark buckets as journal buckets before adding
   them to the journal in memory and the superblock. This ensures that
   if we crash part way through we'll never be writing to journal
   buckets that aren't marked correctly.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: bch2_btree_iter_peek_node_and_restart()
Kent Overstreet [Mon, 6 Mar 2023 09:01:22 +0000 (04:01 -0500)]
bcachefs: bch2_btree_iter_peek_node_and_restart()

Minor refactoring for the Rust interface.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: bch2_btree_node_ondisk_to_text()
Kent Overstreet [Mon, 6 Mar 2023 07:53:25 +0000 (02:53 -0500)]
bcachefs: bch2_btree_node_ondisk_to_text()

Pulling out a helper from cmd_list.c, as the rest is being rewritten in
Rust but we're not ready to rewrite lower-level btree code in Rust.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: bch2_btree_node_to_text() const correctness
Kent Overstreet [Mon, 6 Mar 2023 07:34:59 +0000 (02:34 -0500)]
bcachefs: bch2_btree_node_to_text() const correctness

This is for the Rust interface - Rust cares more about const than C
does.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Fix "btree node in stripe" error
Kent Overstreet [Mon, 6 Mar 2023 05:10:14 +0000 (00:10 -0500)]
bcachefs: Fix "btree node in stripe" error

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Kill bch2_ec_bucket_written()
Kent Overstreet [Mon, 6 Mar 2023 04:52:49 +0000 (23:52 -0500)]
bcachefs: Kill bch2_ec_bucket_written()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Improve bch2_new_stripes_to_text()
Kent Overstreet [Wed, 8 Mar 2023 08:57:32 +0000 (03:57 -0500)]
bcachefs: Improve bch2_new_stripes_to_text()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Improved copygc pipelining
Kent Overstreet [Tue, 28 Feb 2023 03:58:01 +0000 (22:58 -0500)]
bcachefs: Improved copygc pipelining

This improves copygc pipelining across multiple buckets: we now track
each in flight bucket we're evacuating, with separate moving_contexts.

This means that whereas previously we had to wait for outstanding moves
to complete to ensure we didn't try to evacuate the same bucket twice,
we can now just check buckets we want to evacuate against the pending
list.

This also means we can run the verify_bucket_evacuated() check without
killing pipelining - meaning it can now always be enabled, not just on
debug builds.

This is going to be important for the upcoming erasure coding work,
where moving IOs that are being erasure coded will now skip the initial
replication step; instead the IOs will wait on the stripe to complete.
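
The pending-list check can be sketched roughly as follows. This is a
simplified, hypothetical model: the toy_* types, the fixed-size array
and the function names are assumptions for illustration and do not
reflect the real moving_context or copygc structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_IN_FLIGHT 8

/* Hypothetical bucket identifier in dev:bucket form. */
struct toy_bucket {
	uint32_t dev;
	uint64_t bucket;
};

/* Hypothetical list of buckets currently being evacuated. */
struct toy_copygc {
	struct toy_bucket in_flight[TOY_MAX_IN_FLIGHT];
	size_t            nr;
};

static bool toy_bucket_eq(struct toy_bucket a, struct toy_bucket b)
{
	return a.dev == b.dev && a.bucket == b.bucket;
}

static bool toy_bucket_in_flight(const struct toy_copygc *c,
				 struct toy_bucket b)
{
	for (size_t i = 0; i < c->nr; i++)
		if (toy_bucket_eq(c->in_flight[i], b))
			return true;
	return false;
}

/*
 * Start evacuating @b unless it is already pending (or the list is
 * full). No waiting on outstanding moves is needed to avoid picking
 * the same bucket twice. Returns true if evacuation was started.
 */
static bool toy_evacuate_start(struct toy_copygc *c, struct toy_bucket b)
{
	assert(c->nr <= TOY_MAX_IN_FLIGHT);

	if (c->nr == TOY_MAX_IN_FLIGHT || toy_bucket_in_flight(c, b))
		return false;

	c->in_flight[c->nr++] = b;
	return true;
}

/* Drop @b from the pending list once all of its moves have completed. */
static void toy_evacuate_done(struct toy_copygc *c, struct toy_bucket b)
{
	for (size_t i = 0; i < c->nr; i++)
		if (toy_bucket_eq(c->in_flight[i], b)) {
			c->in_flight[i] = c->in_flight[--c->nr];
			return;
		}
}
```

A real implementation would also need locking and a per-bucket context;
the sketch only shows why a membership check can replace the wait.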

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Free move buffers as early as possible
Kent Overstreet [Sun, 5 Mar 2023 08:11:00 +0000 (03:11 -0500)]
bcachefs: Free move buffers as early as possible

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Fix stripe reuse path
Kent Overstreet [Sun, 5 Mar 2023 07:52:40 +0000 (02:52 -0500)]
bcachefs: Fix stripe reuse path

It's possible that we reuse a stripe that doesn't have quite the same
configuration as the stripe_head we're allocating from. In that case, we
have to make sure that the new stripe uses the settings from the stripe
we reuse, not the stripe head, and make sure the buffer is allocated
correctly.

This fixes the ec_mixed_tiers test.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Drop some anonymous structs, unions
Kent Overstreet [Sun, 5 Mar 2023 04:05:55 +0000 (23:05 -0500)]
bcachefs: Drop some anonymous structs, unions

Rust bindgen doesn't cope well with anonymous structs and unions. This
patch drops the fancy anonymous structs & unions in bkey_i that let us
use the same helpers for bkey_i and bkey_packed; since bkey_packed is an
internal type that's never exposed to outside code, it's only a minor
inconvenience.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: BKEY_PADDED_ONSTACK()
Kent Overstreet [Sun, 5 Mar 2023 03:36:02 +0000 (22:36 -0500)]
bcachefs: BKEY_PADDED_ONSTACK()

Rust bindgen doesn't do anonymous structs very nicely: BKEY_PADDED()
only needs the anonymous struct when it's used on the stack, to
guarantee layout, not when it's embedded in another struct.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: moving_context->stats is allowed to be NULL
Kent Overstreet [Sat, 4 Mar 2023 07:51:12 +0000 (02:51 -0500)]
bcachefs: moving_context->stats is allowed to be NULL

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: RESERVE_stripe
Kent Overstreet [Thu, 2 Mar 2023 06:54:17 +0000 (01:54 -0500)]
bcachefs: RESERVE_stripe

Rework stripe creation path - new algorithm for deciding when to create
new stripes or reuse existing stripes.

We add a new allocation watermark, RESERVE_stripe, above RESERVE_none.
Then we always try to create a new stripe by doing RESERVE_stripe
allocations; if this fails, we reuse an existing stripe and allocate
buckets for it with the reserve watermark for the given write
(RESERVE_none or RESERVE_movinggc).
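
A rough sketch of the decision, under stated assumptions: a higher
watermark is modelled as holding back more free buckets, so
RESERVE_stripe allocations fail first. The toy_* names and the
held-back counts are invented for the example and are not the real
allocator interface or its numbers.

```c
#include <stdbool.h>

/* Hypothetical watermarks, ordered from least to most restrictive. */
enum toy_reserve {
	TOY_RESERVE_movinggc,
	TOY_RESERVE_none,
	TOY_RESERVE_stripe,
};

enum toy_stripe_source {
	TOY_STRIPE_NEW,     /* allocated a brand new stripe */
	TOY_STRIPE_REUSED,  /* reused an existing stripe */
	TOY_STRIPE_FAILED,  /* neither was possible */
};

/* Made-up numbers: free buckets that must stay free per watermark. */
static unsigned toy_buckets_held_back(enum toy_reserve r)
{
	switch (r) {
	case TOY_RESERVE_stripe:   return 64;
	case TOY_RESERVE_none:     return 32;
	case TOY_RESERVE_movinggc: return 0;
	}
	return 0;
}

static bool toy_can_alloc(unsigned nr_free, unsigned nr_want,
			  enum toy_reserve r)
{
	unsigned held = toy_buckets_held_back(r);

	return nr_free >= held && nr_free - held >= nr_want;
}

/*
 * Always try a new stripe at the stripe watermark first; if that
 * fails, fall back to reusing an existing stripe, allocating at the
 * watermark of the write that needs it.
 */
static enum toy_stripe_source
toy_stripe_get(unsigned nr_free, unsigned nr_want,
	       bool have_reusable, enum toy_reserve write_reserve)
{
	if (toy_can_alloc(nr_free, nr_want, TOY_RESERVE_stripe))
		return TOY_STRIPE_NEW;

	if (have_reusable && toy_can_alloc(nr_free, nr_want, write_reserve))
		return TOY_STRIPE_REUSED;

	return TOY_STRIPE_FAILED;
}
```

With these made-up numbers, 100 free buckets yields a new stripe, while
70 free buckets falls through to reuse at RESERVE_none.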

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Improve error message for stripe block sector counts wrong
Kent Overstreet [Sat, 4 Mar 2023 04:08:11 +0000 (23:08 -0500)]
bcachefs: Improve error message for stripe block sector counts wrong

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: More stripe create cleanup/fixes
Kent Overstreet [Fri, 3 Mar 2023 08:11:06 +0000 (03:11 -0500)]
bcachefs: More stripe create cleanup/fixes

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Plumb alloc_reserve through stripe create path
Kent Overstreet [Fri, 3 Mar 2023 07:43:39 +0000 (02:43 -0500)]
bcachefs: Plumb alloc_reserve through stripe create path

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Mark stripe buckets with correct data type
Kent Overstreet [Thu, 2 Mar 2023 02:47:07 +0000 (21:47 -0500)]
bcachefs: Mark stripe buckets with correct data type

Currently, we don't use bucket data type for tracking whether buckets
are part of a stripe; parity buckets are BCH_DATA_parity, but data
buckets in a stripe are BCH_DATA_user. There's a separate counter,
buckets_ec, outside the BCH_DATA_TYPES system for tracking number of
buckets on a device that are part of a stripe.

The trouble with this approach is that it's too coarse grained, and we
need better information on fragmentation for debugging copygc.

With this patch, data buckets in a stripe are now tracked as
BCH_DATA_stripe buckets.

This doesn't yet differentiate between erasure coded and non-erasure
coded data in a stripe bucket, nor do we yet track empty data buckets in
stripes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Centralize btree node lock initialization
Kent Overstreet [Fri, 3 Mar 2023 05:03:01 +0000 (00:03 -0500)]
bcachefs: Centralize btree node lock initialization

This fixes some confusion in the lockdep code due to initializing btree
node/key cache locks with the same lockdep key, but different names.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Plumb btree_trans through btree cache code
Kent Overstreet [Thu, 2 Mar 2023 07:12:18 +0000 (02:12 -0500)]
bcachefs: Plumb btree_trans through btree cache code

Soon, __bch2_btree_node_write() is going to require a btree_trans: zoned
device support is going to require a new allocation for every btree node
write. This is a bit of prep work.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Improve dev_alloc_debug_to_text()
Kent Overstreet [Thu, 2 Mar 2023 06:08:46 +0000 (01:08 -0500)]
bcachefs: Improve dev_alloc_debug_to_text()

Now we also print the number of buckets reserved for each watermark.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: bch2_copygc_wait_to_text()
Kent Overstreet [Thu, 2 Mar 2023 04:10:39 +0000 (23:10 -0500)]
bcachefs: bch2_copygc_wait_to_text()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: bch2_mark_key() now takes btree_id & level
Kent Overstreet [Thu, 2 Mar 2023 03:14:31 +0000 (22:14 -0500)]
bcachefs: bch2_mark_key() now takes btree_id & level

btree & level are passed to trans_mark - for backpointers -
bch2_mark_key() should take them as well.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: bch2_write_queue()
Kent Overstreet [Wed, 1 Mar 2023 04:08:04 +0000 (23:08 -0500)]
bcachefs: bch2_write_queue()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: ec: Improve error message for btree node in stripe
Kent Overstreet [Wed, 1 Mar 2023 04:11:36 +0000 (23:11 -0500)]
bcachefs: ec: Improve error message for btree node in stripe

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: bch2_open_bucket_to_text()
Kent Overstreet [Wed, 1 Mar 2023 04:08:48 +0000 (23:08 -0500)]
bcachefs: bch2_open_bucket_to_text()

Factor out a common helper

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: bch2_data_update_init() considers ptr durability
Kent Overstreet [Tue, 28 Feb 2023 04:16:37 +0000 (23:16 -0500)]
bcachefs: bch2_data_update_init() considers ptr durability

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: ec: Ensure new stripe is closed in error path
Kent Overstreet [Tue, 28 Feb 2023 03:30:54 +0000 (22:30 -0500)]
bcachefs: ec: Ensure new stripe is closed in error path

This fixes a use-after-free bug.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Convert constants to consts
Kent Overstreet [Tue, 28 Feb 2023 03:12:06 +0000 (22:12 -0500)]
bcachefs: Convert constants to consts

Rust bindgen doesn't handle macros, but it does handle integer
constants: this conversion aids in implementing safe Rust wrapper
interfaces.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: bch2_btree_iter_peek_and_restart_outlined()
Kent Overstreet [Tue, 28 Feb 2023 02:26:07 +0000 (21:26 -0500)]
bcachefs: bch2_btree_iter_peek_and_restart_outlined()

Needed for interfacing with Rust - bindgen can't handle inline
functions, alas.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: ec: zero_out_rest_of_ec_bucket()
Kent Overstreet [Sun, 26 Feb 2023 22:12:36 +0000 (17:12 -0500)]
bcachefs: ec: zero_out_rest_of_ec_bucket()

Occasionally, we won't write to an entire bucket. This fixes the EC code
to handle this case, zeroing out the rest of the bucket as needed.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: bch2_data_update_index_update() -> bch2_trans_run()
Kent Overstreet [Sun, 26 Feb 2023 22:12:05 +0000 (17:12 -0500)]
bcachefs: bch2_data_update_index_update() -> bch2_trans_run()

Convert to use the standard helper

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Flush write buffer as needed in backpointers repair
Kent Overstreet [Sat, 25 Feb 2023 10:22:37 +0000 (05:22 -0500)]
bcachefs: Flush write buffer as needed in backpointers repair

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Fix for shared paths in write buffer flush
Kent Overstreet [Sun, 26 Feb 2023 20:48:39 +0000 (15:48 -0500)]
bcachefs: Fix for shared paths in write buffer flush

It's possible for bch2_write_buffer_flush_one() to end up with a shared
path, if called from a context that already has a btree iterator
pointing to a key being flushed. We have to be careful when that
happens, since we can't clone a path that holds write locks.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Single open_bucket_partial list
Kent Overstreet [Sat, 25 Feb 2023 05:32:34 +0000 (00:32 -0500)]
bcachefs: Single open_bucket_partial list

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Improve bch2_stripe_to_text()
Kent Overstreet [Sat, 25 Feb 2023 00:26:03 +0000 (19:26 -0500)]
bcachefs: Improve bch2_stripe_to_text()

We now print pointers as bucket:offset, the same as how we print extent
pointers.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Add option for completely disabling nocow
Kent Overstreet [Sat, 25 Feb 2023 00:07:21 +0000 (19:07 -0500)]
bcachefs: Add option for completely disabling nocow

This adds an option for completely disabling nocow mode, including the
locking in the data move path.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Make bucket_alloc tracepoint more readable
Kent Overstreet [Sat, 25 Feb 2023 00:06:32 +0000 (19:06 -0500)]
bcachefs: Make bucket_alloc tracepoint more readable

Print bucket in dev:bucket notation, to be consistent with how we refer
to buckets elsewhere.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Don't call bch2_trans_update() unlocked
Kent Overstreet [Thu, 23 Feb 2023 00:39:02 +0000 (19:39 -0500)]
bcachefs: Don't call bch2_trans_update() unlocked

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: get_stripe_key_trans()
Kent Overstreet [Thu, 23 Feb 2023 00:28:58 +0000 (19:28 -0500)]
bcachefs: get_stripe_key_trans()

Another nested btree_trans fix

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Fix erasure coding shutdown path
Kent Overstreet [Wed, 22 Feb 2023 23:35:51 +0000 (18:35 -0500)]
bcachefs: Fix erasure coding shutdown path

It's possible when shutting down for a stripe head to have a new
stripe that doesn't yet have any blocks allocated - we just need to free
it.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>