There's a race between adding a block group to the list of the unused
block groups and removing an unused block group (cleaner kthread) that
leads to freeing extents that are in use or a crash during transaction
commmit. Basically the cleaner kthread, when executing
btrfs_delete_unused_bgs(), might catch the newly added block group to
the list fs_info->unused_bgs and clear the range representing the whole
group from fs_info->freed_extents[] before the task that added the block
group to the list (running update_block_group()) marked the last freed
extent as dirty in fs_info->freed_extents (pinned_extents).
That is:
     CPU 1                                CPU 2
                                  btrfs_delete_unused_bgs()
update_block_group()
   add block group to
   fs_info->unused_bgs
                                    got block group from the list
                                    clear_extent_bits for the whole
                                    block group range in freed_extents[]
   set_extent_dirty for the
   range covering the freed
   extent in freed_extents[]
   (fs_info->pinned_extents)
                                  block group deleted, and a new block
                                  group with the same logical address is
                                  created
                                  reserve space from the new block group
                                  for new data or metadata - the reserved
                                  space overlaps the range specified by
                                  CPU 1 for set_extent_dirty()
                                  commit transaction
                                    find all ranges marked as dirty in
                                    fs_info->pinned_extents, clear them
                                    and add them to the free space cache
Alternatively, if CPU 2 doesn't create a new block group with the same
logical address, we get a crash/BUG_ON at transaction commit when unpining
extent ranges because we can't find a block group for the range marked as
dirty by CPU 1. Sample trace:
[ 2163.426462] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 2163.426640] Modules linked in: btrfs xor raid6_pq dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio crc32c_generic libcrc32c dm_mod nfsd auth_rpc
gss oid_registry nfs_acl nfs lockd fscache sunrpc loop psmouse parport_pc parport i2c_piix4 processor thermal_sys i2ccore evdev button pcspkr microcode serio_raw ext4 crc16 jbd2 mbcache
 sg sr_mod cdrom sd_mod crc_t10dif crct10dif_generic crct10dif_common ata_generic virtio_scsi floppy ata_piix libata e1000 scsi_mod virtio_pci virtio_ring virtio
[ 2163.428209] CPU: 0 PID: 11858 Comm: btrfs-transacti Tainted: G        W      3.17.0-rc5-btrfs-next-1+ #1
[ 2163.428519] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[ 2163.428875] task: 
ffff88009f2c0650 ti: 
ffff8801356bc000 task.ti: 
ffff8801356bc000
[ 2163.429157] RIP: 0010:[<
ffffffffa037728e>]  [<
ffffffffa037728e>] unpin_extent_range.isra.58+0x62/0x192 [btrfs]
[ 2163.429562] RSP: 0018:
ffff8801356bfda8  EFLAGS: 
00010246
[ 2163.429802] RAX: 
0000000000000000 RBX: 
0000000000000000 RCX: 
0000000000000000
[ 2163.429990] RDX: 
0000000041bfffff RSI: 
0000000001c00000 RDI: 
ffff880024307080
[ 2163.430042] RBP: 
ffff8801356bfde8 R08: 
0000000000000068 R09: 
ffff88003734f118
[ 2163.430042] R10: 
ffff8801356bfcb8 R11: 
fffffffffffffb69 R12: 
ffff8800243070d0
[ 2163.430042] R13: 
0000000083c04000 R14: 
ffff8800751b0f00 R15: 
ffff880024307000
[ 2163.430042] FS:  
0000000000000000(0000) GS:
ffff88013f400000(0000) knlGS:
0000000000000000
[ 2163.430042] CS:  0010 DS: 0000 ES: 0000 CR0: 
000000008005003b
[ 2163.430042] CR2: 
00007ff10eb43fc0 CR3: 
0000000004cb8000 CR4: 
00000000000006f0
[ 2163.430042] Stack:
[ 2163.430042]  
ffff8800243070d0 0000000083c08000 0000000083c07fff ffff88012d6bc800
[ 2163.430042]  
ffff8800243070d0 ffff8800751b0f18 ffff8800751b0f00 0000000000000000
[ 2163.430042]  
ffff8801356bfe18 ffffffffa037a481 0000000083c04000 0000000083c07fff
[ 2163.430042] Call Trace:
[ 2163.430042]  [<
ffffffffa037a481>] btrfs_finish_extent_commit+0xac/0xbf [btrfs]
[ 2163.430042]  [<
ffffffffa038c06d>] btrfs_commit_transaction+0x6ee/0x882 [btrfs]
[ 2163.430042]  [<
ffffffffa03881f1>] transaction_kthread+0xf2/0x1a4 [btrfs]
[ 2163.430042]  [<
ffffffffa03880ff>] ? btrfs_cleanup_transaction+0x3d8/0x3d8 [btrfs]
[ 2163.430042]  [<
ffffffff8105966b>] kthread+0xb7/0xbf
[ 2163.430042]  [<
ffffffff810595b4>] ? __kthread_parkme+0x67/0x67
[ 2163.430042]  [<
ffffffff813ebeac>] ret_from_fork+0x7c/0xb0
[ 2163.430042]  [<
ffffffff810595b4>] ? __kthread_parkme+0x67/0x67
So fix this by making update_block_group() first set the range as dirty
in pinned_extents before adding the block group to the unused_bgs list.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
                        spin_unlock(&cache->space_info->lock);
                } else {
                        old_val -= num_bytes;
+                       btrfs_set_block_group_used(&cache->item, old_val);
+                       cache->pinned += num_bytes;
+                       cache->space_info->bytes_pinned += num_bytes;
+                       cache->space_info->bytes_used -= num_bytes;
+                       cache->space_info->disk_used -= num_bytes * factor;
+                       spin_unlock(&cache->lock);
+                       spin_unlock(&cache->space_info->lock);
 
+                       set_extent_dirty(info->pinned_extents,
+                                        bytenr, bytenr + num_bytes - 1,
+                                        GFP_NOFS | __GFP_NOFAIL);
                        /*
                         * No longer have used bytes in this block group, queue
                         * it for deletion.
                                }
                                spin_unlock(&info->unused_bgs_lock);
                        }
-                       btrfs_set_block_group_used(&cache->item, old_val);
-                       cache->pinned += num_bytes;
-                       cache->space_info->bytes_pinned += num_bytes;
-                       cache->space_info->bytes_used -= num_bytes;
-                       cache->space_info->disk_used -= num_bytes * factor;
-                       spin_unlock(&cache->lock);
-                       spin_unlock(&cache->space_info->lock);
-
-                       set_extent_dirty(info->pinned_extents,
-                                        bytenr, bytenr + num_bytes - 1,
-                                        GFP_NOFS | __GFP_NOFAIL);
                }
                btrfs_put_block_group(cache);
                total -= num_bytes;