Jan Kara [Tue, 11 Jan 2011 20:16:31 +0000 (15:16 -0500)]
ext4: fix trimming starting with block 0 with small blocksize
When s_first_data_block is not zero (which happens e.g. when block size is 1KB)
and trim ioctl is called to start trimming from block 0, the math in
ext4_get_group_no_and_offset() overflows. The overall result is that ioctl
returns EINVAL which is kind of unexpected and we probably don't want
userspace tools to bother with internal details of filesystem structure.
So just silently increase starting offset (and shorten length) when starting
block is below s_first_data_block.
CC: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Tue, 11 Jan 2011 19:42:29 +0000 (14:42 -0500)]
ext4: revert buggy trim overflow patch
This reverts commit
4f531501e44: ext4: fix possible overflow in
ext4_trim_fs()
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Eric Sandeen [Mon, 10 Jan 2011 18:03:35 +0000 (13:03 -0500)]
ext4: don't pass entire map to check_eofblocks_fl
Since check_eofblocks_fl() only uses the m_lblk portion of the map
structure, we may as well pass that directly, rather than passing the
entire map, which IMHO obfuscates what parameters check_eofblocks_fl()
cares about. Not a big deal, but seems tidier and less confusing, to
me.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Mon, 10 Jan 2011 17:51:28 +0000 (12:51 -0500)]
ext4: fix memory leak in ext4_free_branches
Commit
40389687 moved a call to ext4_forget() out of
ext4_free_branches and let ext4_free_blocks() handle calling
bforget(). But that change unfortunately did not replace the call to
ext4_forget() with brelse(), which was needed to drop the in-use count
of the indirect block's buffer head, which lead to a memory leak when
deleting files that used indirect blocks. Fix this.
Thanks to Hugh Dickins for pointing this out.
Cc: stable@kernel.org
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Mon, 10 Jan 2011 17:47:07 +0000 (12:47 -0500)]
ext4: remove ext4_mb_return_to_preallocation()
This function was never implemented, except for a BUG_ON which was
tripping when ext4 is run without a journal. The problem is that
although the comment asserts that "truncate (which is the only way to
free block) discards all preallocations", ext4_free_blocks() is also
called in various error recovery paths when blocks have been
allocated, but for various reasons, we were not able to use those data
blocks (for example, because we ran out of memory while trying to
manipulate the extent tree, or some other similar situation).
In addition to the fact that this function isn't implemented except
for the incorrect BUG_ON, the single caller of this function,
ext4_free_blocks(), doesn't use it all if the journal is enabled.
So remove the (stub) function entirely for now. If we decide it's
better to add it back, it's only going to be useful with a relatively
large number of code changes anyway.
Google-Bug-Id:
3236408
Cc: Jiaying Zhang <jiayingz@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Jiaying Zhang [Mon, 10 Jan 2011 17:47:05 +0000 (12:47 -0500)]
ext4: flush the i_completed_io_list during ext4_truncate
Ted first found the bug when running 2.6.36 kernel with dioread_nolock
mount option that xfstests #13 complained about wrong file size during fsck.
However, the bug exists in the older kernels as well although it is
somehow harder to trigger.
The problem is that ext4_end_io_work() can happen after we have truncated an
inode to a smaller size. Then when ext4_end_io_work() calls
ext4_convert_unwritten_extents(), we may reallocate some blocks that have
been truncated, so the inode size becomes inconsistent with the allocated
blocks.
The following patch flushes the i_completed_io_list during truncate to reduce
the risk that some pending end_io requests are executed later and convert
already truncated blocks to initialized.
Note that although the fix helps reduce the problem a lot there may still
be a race window between vmtruncate() and ext4_end_io_work(). The fundamental
problem is that if vmtruncate() is called without either i_mutex or i_alloc_sem
held, it can race with an ongoing write request so that the io_end request is
processed later when the corresponding blocks have been truncated.
Ted and I have discussed the problem offline and we saw a few ways to fix
the race completely:
a) We guarantee that i_mutex lock and i_alloc_sem write lock are both hold
whenever vmtruncate() is called. The i_mutex lock prevents any new write
requests from entering writeback and the i_alloc_sem prevents the race
from ext4_page_mkwrite(). Currently we hold both locks if vmtruncate()
is called from do_truncate(), which is probably the most common case.
However, there are places where we may call vmtruncate() without holding
either i_mutex or i_alloc_sem. I would like to ask for other people's
opinions on what locks are expected to be held before calling vmtruncate().
There seems a disagreement among the callers of that function.
b) We change the ext4 write path so that we change the extent tree to contain
the newly allocated blocks and update i_size both at the same time --- when
the write of the data blocks is completed.
c) We add some additional locking to synchronize vmtruncate() and
ext4_end_io_work(). This approach may have performance implications so we
need to be careful.
All of the above proposals may require more substantial changes, so
we may consider to take the following patch as a bandaid.
Signed-off-by: Jiaying Zhang <jiayingz@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Mon, 10 Jan 2011 17:46:59 +0000 (12:46 -0500)]
ext4: add error checking to calls to ext4_handle_dirty_metadata()
Call ext4_std_error() in various places when we can't bail out
cleanly, so the file system can be marked as in error.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Jan Kara [Mon, 10 Jan 2011 17:30:39 +0000 (12:30 -0500)]
ext4: fix trimming of a single group
When ext4_trim_fs() is called to trim a part of a single group, the
logic will wrongly set last block of the interval to 'len' instead
of 'first_block + len'. Thus a shorter interval is possibly trimmed.
Fix it.
CC: Lukas Czerner <lczerner@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Andrew Morton [Mon, 10 Jan 2011 17:30:17 +0000 (12:30 -0500)]
ext4: fix uninitialized variable in ext4_register_li_request
fs/ext4/super.c: In function 'ext4_register_li_request':
fs/ext4/super.c:2936: warning: 'ret' may be used uninitialized in this function
It looks buggy to me, too.
Cc: Lukas Czerner <lczerner@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Mon, 10 Jan 2011 17:29:43 +0000 (12:29 -0500)]
ext4: dynamically allocate the jbd2_inode in ext4_inode_info as necessary
Replace the jbd2_inode structure (which is 48 bytes) with a pointer
and only allocate the jbd2_inode when it is needed --- that is, when
the file system has a journal present and the inode has been opened
for writing. This allows us to further slim down the ext4_inode_info
structure.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Mon, 10 Jan 2011 17:18:25 +0000 (12:18 -0500)]
ext4: drop i_state_flags on architectures with 64-bit longs
We can store the dynamic inode state flags in the high bits of
EXT4_I(inode)->i_flags, and eliminate i_state_flags. This saves 8
bytes from the size of ext4_inode_info structure, which when
multiplied by the number of the number of in the inode cache, can save
a lot of memory.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Mon, 10 Jan 2011 17:13:42 +0000 (12:13 -0500)]
ext4: reorder ext4_inode_info structure elements to remove unneeded padding
By reordering the elements in the ext4_inode_info structure, we can
reduce the padding needed on an x86_64 system by 16 bytes.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Mon, 10 Jan 2011 17:13:26 +0000 (12:13 -0500)]
ext4: drop ec_type from the ext4_ext_cache structure
We can encode the ec_type information by using ee_len == 0 to denote
EXT4_EXT_CACHE_NO, ee_start == 0 to denote EXT4_EXT_CACHE_GAP, and if
neither is true, then the cache type must be EXT4_EXT_CACHE_EXTENT.
This allows us to reduce the size of ext4_ext_inode by another 8
bytes. (ec_type is 4 bytes, plus another 4 bytes of padding)
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Mon, 10 Jan 2011 17:13:03 +0000 (12:13 -0500)]
ext4: use ext4_lblk_t instead of sector_t for logical blocks
This fixes a number of places where we used sector_t instead of
ext4_lblk_t for logical blocks, which for ext4 are still 32-bit data
types. No point wasting space in the ext4_inode_info structure, and
requiring 64-bit arithmetic on 32-bit systems, when it isn't
necessary.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Mon, 10 Jan 2011 17:12:36 +0000 (12:12 -0500)]
ext4: replace i_delalloc_reserved_flag with EXT4_STATE_DELALLOC_RESERVED
Remove the short element i_delalloc_reserved_flag from the
ext4_inode_info structure and replace it a new bit in i_state_flags.
Since we have an ext4_inode_info for every ext4 inode cached in the
inode cache, any savings we can produce here is a very good thing from
a memory utilization perspective.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Kazuya Mio [Mon, 10 Jan 2011 17:12:28 +0000 (12:12 -0500)]
ext4: fix 32bit overflow in ext4_ext_find_goal()
ext4_ext_find_goal() returns an ideal physical block number that the block
allocator tries to allocate first. However, if a required file offset is
smaller than the existing extent's one, ext4_ext_find_goal() returns
a wrong block number because it may overflow at
"block - le32_to_cpu(ex->ee_block)". This patch fixes the problem.
ext4_ext_find_goal() will also return a wrong block number in case
a file offset of the existing extent is too big. In this case,
the ideal physical block number is fixed in ext4_mb_initialize_context(),
so it's no problem.
reproduce:
# dd if=/dev/zero of=/mnt/mp1/tmp bs=127M count=1 oflag=sync
# dd if=/dev/zero of=/mnt/mp1/file bs=512K count=1 seek=1 oflag=sync
# filefrag -v /mnt/mp1/file
Filesystem type is: ef53
File size of /mnt/mp1/file is
1048576 (256 blocks, blocksize 4096)
ext logical physical expected length flags
0 128 67456 128 eof
/mnt/mp1/file: 2 extents found
# rm -rf /mnt/mp1/tmp
# echo $((512*4096)) > /sys/fs/ext4/loop0/mb_stream_req
# dd if=/dev/zero of=/mnt/mp1/file bs=512K count=1 oflag=sync conv=notrunc
result (linux-2.6.37-rc2 + ext4 patch queue):
# filefrag -v /mnt/mp1/file
Filesystem type is: ef53
File size of /mnt/mp1/file is
1048576 (256 blocks, blocksize 4096)
ext logical physical expected length flags
0 0 33280 128
1 128 67456 33407 128 eof
/mnt/mp1/file: 2 extents found
result(apply this patch):
# filefrag -v /mnt/mp1/file
Filesystem type is: ef53
File size of /mnt/mp1/file is
1048576 (256 blocks, blocksize 4096)
ext logical physical expected length flags
0 0 66560 128
1 128 67456 66687 128 eof
/mnt/mp1/file: 2 extents found
Signed-off-by: Kazuya Mio <k-mio@sx.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Namhyung Kim [Mon, 10 Jan 2011 17:11:16 +0000 (12:11 -0500)]
ext4: add more error checks to ext4_mkdir()
Check return value of ext4_journal_get_write_access,
ext4_journal_dirty_metadata and ext4_mark_inode_dirty. Move brelse()
under 'out_stop' to release bh properly in case of journal error.
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Eric Paris [Mon, 10 Jan 2011 17:11:00 +0000 (12:11 -0500)]
ext4: ext4_ext_migrate should use NULL not 0
ext4_ext_migrate() calls ext4_new_inode() and passes 0 instead of a pointer
to a struct qstr. This patch uses NULL, to make it obvious to the caller
that this was a pointer.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Mon, 10 Jan 2011 17:10:55 +0000 (12:10 -0500)]
ext4: Use ext4_error_file() to print the pathname to the corrupted inode
Where the file pointer is available, use ext4_error_file() instead of
ext4_error_inode().
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Dan Carpenter [Mon, 10 Jan 2011 17:10:50 +0000 (12:10 -0500)]
ext4: use IS_ERR() to check for errors in ext4_error_file
d_path() returns an ERR_PTR and it doesn't return NULL. This is in
ext4_error_file() and no one actually calls ext4_error_file().
Signed-off-by: Dan Carpenter <error27@gmail.com>
Dan Carpenter [Mon, 10 Jan 2011 17:10:44 +0000 (12:10 -0500)]
ext4: test the correct variable in ext4_init_pageio()
This is a copy and paste error. The intent was to check
"io_page_cachep". We tested "io_page_cachep" earlier.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Wang Sheng-Hui [Mon, 10 Jan 2011 17:10:37 +0000 (12:10 -0500)]
ext2: remove dead code in ext2_xattr_get
Reviewed-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Wang Sheng-Hui <crosslonelyover@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Wang Sheng-Hui [Mon, 10 Jan 2011 17:10:30 +0000 (12:10 -0500)]
ext2,ext3,ext4: clarify comment for extN_xattr_set_handle
Signed-off-by: Wang Sheng-Hui <crosslonelyover@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Mon, 10 Jan 2011 17:10:07 +0000 (12:10 -0500)]
ext4: clean up ext4_xattr_list()'s error code checking and return strategy
Any time you see code that tries to add error codes together, you
should want to claw your eyes out...
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Lukas Czerner [Mon, 10 Jan 2011 17:09:59 +0000 (12:09 -0500)]
ext4: remove warning message from ext4_issue_discard helper
ext4_issue_discard is supposed to be helper for calling discard, however
in case that underlying device does not support discard it prints out
the warning message and clears the DISCARD t_mount_opt flag. Since it
can be (and is) used by others, it should not do anything and let the
caller to handle the error case.
This commit removes warning message and flag setting from
ext4_issue_discard and use it just in place where it is really needed
(release_blocks_on_commit). FITRIM ioctl should not set any flags nor it
should print out warning messages, so get rid of the warning as well.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Lukas Czerner [Mon, 10 Jan 2011 17:04:55 +0000 (12:04 -0500)]
ext4: fix possible overflow in ext4_trim_fs()
When determining last group through ext4_get_group_no_and_offset() the
result may be wrong in cases when range->start and range-len are too
big, because it may overflow when summing up those two numbers.
Fix that by checking range->len and limit its value to
ext4_blocks_count(). This commit was tested by myself with expected
result.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Theodore Ts'o [Mon, 20 Dec 2010 12:26:59 +0000 (07:26 -0500)]
ext4: Add error checking to kmem_cache_alloc() call in ext4_free_blocks()
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Joe Perches [Mon, 20 Dec 2010 03:43:19 +0000 (22:43 -0500)]
ext4: Use printf extension %pV
Using %pV reduces the number of printk calls and eliminates any
possible message interleaving from other printk calls.
In function __ext4_grp_locked_error also added KERN_CONT to some
printks.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Joe Perches [Mon, 20 Dec 2010 03:21:02 +0000 (22:21 -0500)]
ext4: Use vzalloc in ext4_fill_flex_info()
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Eric Sandeen [Mon, 20 Dec 2010 03:10:31 +0000 (22:10 -0500)]
ext4: zero out nanosecond timestamps for small inodes
When nanosecond timestamp resolution isn't supported on an ext4
partition (inode size = 128), stat() appears to be returning
uninitialized garbage in the nanosecond component of timestamps.
EXT4_INODE_GET_XTIME should zero out tv_nsec when EXT4_FITS_IN_INODE
evaluates to false.
Reported-by: Jordan Russell <jr-list-2010@quo.to>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Mon, 20 Dec 2010 03:07:02 +0000 (22:07 -0500)]
ext4: optimize ext4_check_dir_entry() with unlikely() annotations
This function gets called a lot for large directories, and the answer
is almost always "no, no, there's no problem". This means using
unlikely() is a good thing.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Jesper Juhl [Mon, 20 Dec 2010 02:41:55 +0000 (21:41 -0500)]
ext4: use kmem_cache_zalloc() in ext4_init_io_end()
Use advantage of kmem_cache_zalloc() to remove a memset() call in
ext4_init_io_end() and save a few bytes.
Before:
[jj@dragon linux-2.6]$ size fs/ext4/page-io.o
text data bss dec hex filename
3016 0 624 3640 e38 fs/ext4/page-io.o
After:
[jj@dragon linux-2.6]$ size fs/ext4/page-io.o
text data bss dec hex filename
3000 0 624 3624 e28 fs/ext4/page-io.o
Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Tobias Klauser [Mon, 20 Dec 2010 02:38:46 +0000 (21:38 -0500)]
ext4: Remove redundant unlikely()
IS_ERR() already implies unlikely(), so it can be omitted here.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Sat, 18 Dec 2010 18:39:38 +0000 (13:39 -0500)]
jbd2: simplify return path of journal_init_common
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Sat, 18 Dec 2010 18:36:33 +0000 (13:36 -0500)]
jbd2: move debug message into debug #ifdef
This is a port to jbd2 of a patch which Namhyung Kim <namhyung@gmail.com>
originally made to fs/jbd.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Sat, 18 Dec 2010 18:34:20 +0000 (13:34 -0500)]
jbd2: remove unnecessary goto statement
This is a port to jbd2 of a patch which Namhyung Kim <namhyung@gmail.com>
originally made to fs/jbd.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Sat, 18 Dec 2010 18:13:40 +0000 (13:13 -0500)]
jbd2: use offset_in_page() instead of manual calculation
This is a port to jbd2 of a patch which Namhyung Kim <namhyung@gmail.com>
originally made to fs/jbd.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Sat, 18 Dec 2010 18:07:34 +0000 (13:07 -0500)]
jbd2: Fix a debug message in do_get_write_access()
'buffer_head' should be 'journal_head'
This is a port of a patch which Namhyung Kim <namhyung@gmail.com> made
to fs/jbd to jbd2.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Fri, 17 Dec 2010 15:44:16 +0000 (10:44 -0500)]
jbd2: Use pr_notice_ratelimited() in journal_alloc_journal_head()
We had an open-coded version of printk_ratelimited(); use the provided
abstraction to make the code cleaner and easier to understand.
Based on a similar patch for fs/jbd from Namhyung Kim <namhyung@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Fri, 17 Dec 2010 15:40:47 +0000 (10:40 -0500)]
ext4: Use pr_warning_ratelimited() instead of printk_ratelimit()
printk_ratelimit() is deprecated since it is a global instead of a
per-printk ratelimit.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Thu, 16 Dec 2010 21:38:26 +0000 (16:38 -0500)]
ext4: Fix up comments in inode.c
This fixes up some broken argument descriptions that Namhyung Kim had
originally submitted for ext3. This fixes the comments that were
still applicable in ext4.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Thu, 16 Dec 2010 01:30:48 +0000 (20:30 -0500)]
ext4: Add second mount options field since the s_mount_opt is full up
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Thu, 16 Dec 2010 01:28:48 +0000 (20:28 -0500)]
ext4: Move struct ext4_mount_options from ext4.h to super.c
Move the ext4_mount_options structure definition from ext4.h, since it
is only used in super.c.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Thu, 16 Dec 2010 01:26:48 +0000 (20:26 -0500)]
ext4: Simplify the usage of clear_opt() and set_opt() macros
Change clear_opt() and set_opt() to take a superblock pointer instead
of a pointer to EXT4_SB(sb)->s_mount_opt. This makes it easier for us
to support a second mount option field.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Linus Torvalds [Thu, 16 Dec 2010 01:24:48 +0000 (17:24 -0800)]
Linux 2.6.37-rc6
Linus Torvalds [Thu, 16 Dec 2010 01:24:05 +0000 (17:24 -0800)]
Merge git://git./linux/kernel/git/herbert/crypto-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: ghash-intel - ghash-clmulni-intel_glue needs err.h
Linus Torvalds [Wed, 15 Dec 2010 20:41:17 +0000 (12:41 -0800)]
Merge branch 'for_linus' of git://git./linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fix typo which broke '..' detection in ext4_find_entry()
ext4: Turn off multiple page-io submission by default
Jeremy Fitzhardinge [Wed, 8 Dec 2010 20:39:12 +0000 (12:39 -0800)]
xen: Provide a variant of __RING_SIZE() that is an integer constant expression
Without this, gcc 4.5 won't compile xen-netfront and xen-blkfront, where
this is being used to specify array sizes.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: David Miller <davem@davemloft.net>
Cc: Stable Kernel <stable@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Daniel Walker [Thu, 9 Dec 2010 23:45:27 +0000 (15:45 -0800)]
MAINTAINERS: update MSM git tree
The MSM main git tree has changed over to this new address.
Signed-off-by: Daniel Walker <dwalker@codeaurora.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tavis Ormandy [Thu, 9 Dec 2010 14:29:42 +0000 (15:29 +0100)]
install_special_mapping skips security_file_mmap check.
The install_special_mapping routine (used, for example, to setup the
vdso) skips the security check before insert_vm_struct, allowing a local
attacker to bypass the mmap_min_addr security restriction by limiting
the available pages for special mappings.
bprm_mm_init() also skips the check, and although I don't think this can
be used to bypass any restrictions, I don't see any reason not to have
the security check.
$ uname -m
x86_64
$ cat /proc/sys/vm/mmap_min_addr
65536
$ cat install_special_mapping.s
section .bss
resb BSS_SIZE
section .text
global _start
_start:
mov eax, __NR_pause
int 0x80
$ nasm -D__NR_pause=29 -DBSS_SIZE=0xfffed000 -f elf -o install_special_mapping.o install_special_mapping.s
$ ld -m elf_i386 -Ttext=0x10000 -Tbss=0x11000 -o install_special_mapping install_special_mapping.o
$ ./install_special_mapping &
[1] 14303
$ cat /proc/14303/maps
0000f000-
00010000 r-xp
00000000 00:00 0 [vdso]
00010000-
00011000 r-xp
00001000 00:19
2453665 /home/taviso/install_special_mapping
00011000-
ffffe000 rwxp
00000000 00:00 0 [stack]
It's worth noting that Red Hat are shipping with mmap_min_addr set to
4096.
Signed-off-by: Tavis Ormandy <taviso@google.com>
Acked-by: Kees Cook <kees@ubuntu.com>
Acked-by: Robert Swiecki <swiecki@google.com>
[ Changed to not drop the error code - akpm ]
Reviewed-by: James Morris <jmorris@namei.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Randy Dunlap [Wed, 15 Dec 2010 09:58:57 +0000 (17:58 +0800)]
crypto: ghash-intel - ghash-clmulni-intel_glue needs err.h
Add missing header file:
arch/x86/crypto/ghash-clmulni-intel_glue.c:256: error: implicit declaration of function 'IS_ERR'
arch/x86/crypto/ghash-clmulni-intel_glue.c:257: error: implicit declaration of function 'PTR_ERR'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Linus Torvalds [Wed, 15 Dec 2010 02:50:10 +0000 (18:50 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/tj/wq
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: It is likely that WORKER_NOT_RUNNING is true
MAINTAINERS: Add workqueue entry
workqueue: check the allocation of system_unbound_wq
Linus Torvalds [Wed, 15 Dec 2010 02:49:40 +0000 (18:49 -0800)]
Merge branch 'for-linus' of git://neil.brown.name/md
* 'for-linus' of git://neil.brown.name/md:
md: protect against NULL reference when waiting to start a raid10.
md: fix bug with re-adding of partially recovered device.
md: fix possible deadlock in handling flush requests.
md: move code in to submit_flushes.
md: remove handling of flush_pending in md_submit_flush_data
Major Lee [Fri, 10 Dec 2010 10:13:49 +0000 (10:13 +0000)]
dw_spi: Fix missing final read in some polling situations
There is a possibility that the last word of a transaction will be lost
if data is not ready. Re-read in poll_transfer() to solve this issue
when poll_mode is enabled.
Verified on SPI touch screen device.
Signed-off-by: Major Lee <major_lee@wistron.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Tue, 14 Dec 2010 15:29:08 +0000 (15:29 +0000)]
i2c_intel_mid: Fix slash in sysfs name
This gets caught by the new sanity check code. Instead of the slash use a
different symbol. This was originally found by Major Lee who proposed a
rather more complex patch which changed the name according to the chip
type.
On the basis that we are in a late -rc and making Linus grumpy isn't always
a good idea (however fun) this is a simple alternative.
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Aaro Koskinen [Wed, 15 Dec 2010 02:45:31 +0000 (21:45 -0500)]
ext4: fix typo which broke '..' detection in ext4_find_entry()
There should be a check for the NUL character instead of '0'.
Fortunately the only thing that cares about this is NFS serving, which
is why we didn't notice this in the merge window testing.
Reported-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
Signed-off-by: Aaro Koskinen <aaro.koskinen@nokia.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Linus Torvalds [Wed, 15 Dec 2010 01:37:08 +0000 (17:37 -0800)]
Merge branch 'sh-fixes-for-linus' of git://git./linux/kernel/git/lethal/sh-2.6
* 'sh-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6:
sh: wire up accept4 syscall (non-multiplexed path)
sh: Enable deprecated IRQ chip APIs for MFD and GPIOLIB drivers.
Linus Torvalds [Wed, 15 Dec 2010 01:36:35 +0000 (17:36 -0800)]
Merge branch 'omap-fixes-for-linus' of git://git./linux/kernel/git/tmlind/linux-omap-2.6
* 'omap-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6:
OMAP2: PRCM: fix some SHIFT macros that were actually bitmasks
OMAP2+: PM/serial: fix console semaphore acquire during suspend
OMAP1: SRAM: fix size for OMAP1611 SoCs
arm: omap2: io: fix clk_get() error check
arm: plat-omap: counter_32k: use IS_ERR() instead of NULL check
omap: nand: remove hardware ECC as default
omap: zoom: wl1271 slot is MMC_CAP_POWER_OFF_CARD
omap: PM debug: fix wake-on-timer debugfs dependency
Linus Torvalds [Wed, 15 Dec 2010 01:36:10 +0000 (17:36 -0800)]
Merge master.kernel.org:/home/rmk/linux-2.6-arm
* master.kernel.org:/home/rmk/linux-2.6-arm:
ARM: 6535/1: V6 MPCore v6_dma_inv_range and v6_dma_flush_range RWFO fix
ARM: 6534/1: Make CONFIG_FPE_NWFPE depend on !CONFIG_THUMB2_KERNEL
ARM: 6533/1: Thumb-2: Make CONFIG_THUMB2_KERNEL depend on !CPU_V6
Change bcmring Maintainer list.
ARM: Update mach-types
ARM: 6528/1: Use CTR for the I-cache line size on ARMv7
ARM: 6527/1: Use CTR instead of CCSIDR for the D-cache line size on ARMv7
ARM: pxa/palm: fix ifdef around gen_nand driver registration
ARM: pxa: fix pxa2xx-flash section mismatch
ARM: mmp2: remove not used clk_rtc
Linus Torvalds [Wed, 15 Dec 2010 01:34:00 +0000 (17:34 -0800)]
Merge git://git./linux/kernel/git/davem/sparc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
sparc: Write to prom console using indirect buffer.
sparc: Delete prom_*getchar().
sparc: Pass buffer pointer all the way down to prom_{get,put}char().
sparc: Do not export prom_nb{get,put}char().
sparc64: Delete prom_setcallback().
sparc64: Unexport prom_service_exists().
sparc: Kill prom devops_{32,64}.c
sparc: Remove prom_pathtoinode()
sparc64: Delete prom_puts() unused.
SPARC/LEON: removed constant timer initialization as if HZ=100, now it reflects the value of HZ
Linus Torvalds [Wed, 15 Dec 2010 01:33:40 +0000 (17:33 -0800)]
Merge git://git./linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (75 commits)
pppoe.c: Fix kernel panic caused by __pppoe_xmit
WAN: Fix a TX IRQ causing BUG() in PC300 and PCI200SYN drivers.
bnx2x: Advance a version number to 1.60.01-0
bnx2x: Fixed a compilation warning
bnx2x: LSO code was broken on BE platforms
qlge: Fix deadlock when cancelling worker.
net: fix skb_defer_rx_timestamp()
cxgb4vf: Ingress Queue Entry Size needs to be 64 bytes
phy: add the IC+ IP1001 driver
atm: correct sysfs 'device' link creation and parent relationships
MAINTAINERS: remove me from tulip
SCTP: Fix SCTP_SET_PEER_PRIMARY_ADDR to accpet v4mapped address
enic: Bug Fix: Pass napi reference to the isr that services receive queue
ipv6: fix nl group when advertising a new link
connector: add module alias
net: Document the kernel_recvmsg() function
r8169: Fix runtime power management
hso: IP checksuming doesn't work on GE0301 option cards
xfrm: Fix xfrm_state_migrate leak
net: Convert netpoll blocking api in bonding driver to be a counter
...
Linus Torvalds [Wed, 15 Dec 2010 01:32:56 +0000 (17:32 -0800)]
Merge git://git./linux/kernel/git/jejb/scsi-rc-fixes-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6:
[SCSI] hpsa: fix redefinition of PCI_DEVICE_ID_CISSF
[SCSI] qla2xxx: Update version number to 8.03.05-k0.
[SCSI] qla2xxx: Properly set the return value in qla2xxx_eh_abort function.
[SCSI] qla2xxx: Correct issue where NPIV-config data was not being allocated for 82xx parts.
[SCSI] qla2xxx: Change MSI initialization from using incorrect request_irq parameter.
[SCSI] qla2xxx: Populate Command Type 6 LUN field properly.
[SCSI] zfcp: Issue FCP command without holding SCSI host_lock
[SCSI] zfcp: Prevent usage w/o holding a reference
[SCSI] zfcp: No ERP escalation on gpn_ft eval
[SCSI] zfcp: Correct false abort data assignment.
[SCSI] zfcp: Fix common FCP request reception
[SCSI] Eliminate error handler overload of the SCSI serial number
[SCSI] pmcraid: disable msix and expand device config entry
[SCSI] bsg: correct fault if queue object removed while dev_t open
[SCSI] osd: checking NULL instead of ERR_PTR()
Linus Torvalds [Tue, 14 Dec 2010 22:35:04 +0000 (14:35 -0800)]
Merge branch 'for_linus' of git://git./linux/kernel/git/jwessel/linux-2.6-kgdb
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb:
kgdboc,input: Fix regression with keyboard release key and early debugging
Linus Torvalds [Tue, 14 Dec 2010 22:33:33 +0000 (14:33 -0800)]
Merge branch 'release' of git://git./linux/kernel/git/lenb/linux-acpi-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6:
ACPI / PM: Do not save/restore NVS on Sony Vaio VGN-NW130D
ACPI/HEST: adjust section selection
ACPI: eliminate unused variable warning for !ACPI_SLEEP
ACPI/PNP: avoid section mismatch warning
ACPI thermal: remove two unused functions
ACPI: fix a section mismatch
ACPI, APEI, use raw spinlock in ERST
ACPI: video: fix build for CONFIG_ACPI=n
ACPI: video: fix build for VIDEO_OUTPUT_CONTROL=n
ACPI: fix allowing to add/remove multiple _OSI strings
acpi: fix _OSI string setup regression
ACPI: EC: Add another dmi match entry for MSI hardware
ACPI battery: update status upon sysfs query
ACPI ac: update AC status upon sysfs query
ACPI / PM: Do not refcount power resources that can't be turned on
ACPI / PM: Check device state before refcounting power resources
Linus Torvalds [Tue, 14 Dec 2010 22:33:13 +0000 (14:33 -0800)]
Merge branch 'idle-release' of git://git./linux/kernel/git/lenb/linux-idle-2.6
* 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6:
intel_idle: recognize ARAT on WSM-EX
Valentine Barshak [Mon, 13 Dec 2010 23:03:16 +0000 (00:03 +0100)]
ARM: 6535/1: V6 MPCore v6_dma_inv_range and v6_dma_flush_range RWFO fix
Cache ownership must be acquired by reading/writing data from the
cache line to make cache operation have the desired effect on the
SMP MPCore CPU. However, the ownership is never acquired in the
v6_dma_inv_range function when cleaning the first line and
flushing the last one, in case the address is not aligned
to D_CACHE_LINE_SIZE boundary.
Fix this by reading/writing data if needed, before performing
cache operations.
While at it, fix v6_dma_flush_range to prevent RWFO outside
the buffer.
Cc: stable@kernel.org
Signed-off-by: Valentine Barshak <vbarshak@mvista.com>
Signed-off-by: George G. Davis <gdavis@mvista.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Dave Martin [Mon, 13 Dec 2010 20:56:03 +0000 (21:56 +0100)]
ARM: 6534/1: Make CONFIG_FPE_NWFPE depend on !CONFIG_THUMB2_KERNEL
Because the nwfpe support is unlikely to be used on new platforms
and requires CONFIG_OABI_COMPAT, which is not generally used with
ARMv7+, we shouldn't expect to build nwfpe support into a Thumb-2
kernel.
At present, nwfpe contains assembly code which isn't Thumb-2
compatible, and for now it doesn't appear useful to port this
code.
All ARMv7-A/R platforms necessarily have VFPv3 hardware floating-
point natively, making emulation unnecessary.
Signed-off-by: Dave Martin <dave.martin@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Dave Martin [Mon, 13 Dec 2010 20:53:53 +0000 (21:53 +0100)]
ARM: 6533/1: Thumb-2: Make CONFIG_THUMB2_KERNEL depend on !CPU_V6
This makes sense, because Thumb-2 code can't execute on plain
ARMv6 processors.
This will avoid accidentally configuring a broken kernel where the
config otherwise would allow multiple architecture versions to
coexist in the same kernel.
Not adding !CPU_V5 etc., because the chance of anyone trying to
put v5 and v7 in the same kernel is low, and I'm not aware of
any mach which can do this. These could be added later if it
matters.
Note that the rules may need to be refined if support for the
ARM1156J(F)-S processor is later added to the kernel, since this
processor supports the rare ARMv6T2 extensions, which add support
for Thumb-2 and a few other ARMv7 features.
Signed-off-by: Dave Martin <dave.martin@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Jiandong Zheng [Tue, 14 Dec 2010 21:55:49 +0000 (21:55 +0000)]
Change bcmring Maintainer list.
I am Jiandong Zheng working on BCMRING in Broadcom Canada Ltd. I am
replacing Leo Chen (leochen@broadcom.com) as "ARM/BCMRING ARM
ARCHITECTURE" and "ARM/BCMRING MTD NAND DRIVER" maintainer from
Broadcom as he is no longer the maintainer of these components.
Signed-off-by: Jiandong Zheng <jdzheng@broadcom.com>
Acked-by: Scott Branden <sbranden@broadcom.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Linus Torvalds [Tue, 14 Dec 2010 21:37:12 +0000 (13:37 -0800)]
Merge branch 'fbdev-fixes-for-linus' of git://git./linux/kernel/git/lethal/fbdev-2.6
* 'fbdev-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/fbdev-2.6:
fbdev: Fix fb_find_nearest_mode refresh comparison
Linus Torvalds [Tue, 14 Dec 2010 21:36:26 +0000 (13:36 -0800)]
Merge branch 'hwmon-for-linus' of git://git./linux/kernel/git/groeck/staging
* 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/staging:
hwmon: (ltc4215) make sysfs file match the alarm cause
Linus Torvalds [Tue, 14 Dec 2010 21:35:47 +0000 (13:35 -0800)]
Merge branch 'fixes' of git://git./linux/kernel/git/djbw/async_tx
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx:
dmaengine: at_hdmac: fix buffer transfer size specification
fsldma: fix issue of slow dma
dmaengine i.MX SDMA: initialize on module_init
dma : EG20T PCH: Fix miss-setting DMA descriptor
intel_mid_dma: fix section mismatch warnings
dmaengine: imx-sdma: fix bug in buffer descriptor initialization
drivers/dma/ppc4xx: Use printf extension %pR for struct resource
drivers/dma/ioat: Use the ccflag-y instead of EXTRA_CFLAGS
drivers/dma/: Use the ccflag-y instead of EXTRA_CFLAGS
dma: intel_mid_dma: fix double free on mid_setup_dma error path
dma: imx-dma: fix imxdma_probe error path
Linus Torvalds [Tue, 14 Dec 2010 21:34:25 +0000 (13:34 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
IB/uverbs: Handle large number of entries in poll CQ
Linus Torvalds [Tue, 14 Dec 2010 21:33:52 +0000 (13:33 -0800)]
Merge branch 'fixes' of git://git./linux/kernel/git/ieee1394/linux1394-2.6
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6:
firewire: ohci: fix regression with Agere FW643 rev 06, disable MSI
firewire: ohci: fix regression with VIA VT6315, disable MSI
Linus Torvalds [Tue, 14 Dec 2010 21:33:21 +0000 (13:33 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/lrg/voltage-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lrg/voltage-2.6:
regulator: tps6586x: correct register table
regulator: tps6586x: Handle both enable reg/bits being the same
regulator: tps6586x: Fix TPS6586X_DVM to store goreg/bit
regulator: tps6586x: Add missing bit mask generation
Linus Torvalds [Tue, 14 Dec 2010 21:32:40 +0000 (13:32 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/tiwai/sound-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
ALSA: HDA: Quirk for Dell Vostro 320 to make microphone work
ALSA: hda - Reset sample sizes and max bitrates when reading ELD
ALSA: hda - Always allow basic audio irrespective of ELD info
ALSA: hda - Do not wrongly restrict min_channels based on ELD
ASoC: Correct WM8962 interrupt mask register read
ASoC: WM8580: Debug BCLK and sample size
ASoC: Fix resource leak if soc_register_ac97_dai_link failed
ASoC: Hold client_mutex while calling snd_soc_instantiate_cards()
ASoC: Fix swap of left and right channels for WM8993/4 speaker boost gain
ASoC: Fix off by one error in WM8994 EQ register bank size
ALSA: hda: Use position_fix=1 for Acer Aspire 5538 to enable capture on internal mic
ALSA: hda - Enable jack sense for Thinkpad Edge 13
ALSA: hda - Fix ThinkPad T410[s] docking station line-out
ALSA: hda: Use model=lg quirk for LG P1 Express to enable playback and capture
Linus Torvalds [Tue, 14 Dec 2010 21:32:19 +0000 (13:32 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/bp/bp
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
amd64_edac: Fix interleaving check
EDAC: Correct MiB_TO_PAGES() macro
EDAC: Fix workqueue-related crashes
Linus Torvalds [Tue, 14 Dec 2010 21:31:49 +0000 (13:31 -0800)]
Merge branch 'drm-fixes' of git://git./linux/kernel/git/airlied/drm-2.6
* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
drm/radeon/kms: don't apply 7xx HDP flush workaround on AGP
drm: use after free in drm_queue_vblank_event()
drm/kms: remove spaces from connector names (v2)
Linus Torvalds [Tue, 14 Dec 2010 21:31:23 +0000 (13:31 -0800)]
Merge branch 'hwmon-for-linus' of git://git./linux/kernel/git/jdelvare/staging
* 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
hwmon: (adm1026) Allow 1 as a valid divider value
hwmon: (adm1026) Fix setting fan_div
hwmon: (it87) Fix manual fan speed control on IT8721F
Theodore Ts'o [Tue, 14 Dec 2010 20:27:50 +0000 (15:27 -0500)]
ext4: Turn off multiple page-io submission by default
Jon Nelson has found a test case which causes postgresql to fail with
the error:
psql:t.sql:4: ERROR: invalid page header in block 38269 of relation base/16384/16581
Under memory pressure, it looks like part of a file can end up getting
replaced by zero's. Until we can figure out the cause, we'll roll
back the change and use block_write_full_page() instead of
ext4_bio_write_page(). The new, more efficient writing function can
be used via the mount option mblk_io_submit, so we can test and fix
the new page I/O code.
To reproduce the problem, install postgres 8.4 or 9.0, and pin enough
memory such that the system just at the end of triggering writeback
before running the following sql script:
begin;
create temporary table foo as select x as a, ARRAY[x] as b FROM
generate_series(1,
10000000 ) AS x;
create index foo_a_idx on foo (a);
create index foo_b_idx on foo USING GIN (b);
rollback;
If the temporary table is created on a hard drive partition which is
encrypted using dm_crypt, then under memory pressure, approximately
30-40% of the time, pgsql will issue the above failure.
This patch should fix this problem, and the problem will come back if
the file system is mounted with the mblk_io_submit mount option.
Reported-by: Jon Nelson <jnelson@jamponi.net>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Linus Torvalds [Tue, 14 Dec 2010 19:09:05 +0000 (11:09 -0800)]
Merge branch 'for-2.6.37' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.37' of git://linux-nfs.org/~bfields/linux:
nfsd: Fix possible BUG_ON firing in set_change_info
sunrpc: prevent use-after-free on clearing XPT_BUSY
Linus Torvalds [Tue, 14 Dec 2010 19:08:13 +0000 (11:08 -0800)]
Merge git://git./linux/kernel/git/mason/btrfs-unstable
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
Btrfs: prevent RAID level downgrades when space is low
Btrfs: account for missing devices in RAID allocation profiles
Btrfs: EIO when we fail to read tree roots
Btrfs: fix compiler warnings
Btrfs: Make async snapshot ioctl more generic
Btrfs: pwrite blocked when writing from the mmaped buffer of the same page
Btrfs: Fix a crash when mounting a subvolume
Btrfs: fix sync subvol/snapshot creation
Btrfs: Fix page leak in compressed writeback path
Btrfs: do not BUG if we fail to remove the orphan item for dead snapshots
Btrfs: fixup return code for btrfs_del_orphan_item
Btrfs: do not do fast caching if we are allocating blocks for tree_root
Btrfs: deal with space cache errors better
Btrfs: fix use after free in O_DIRECT
Linus Torvalds [Tue, 14 Dec 2010 19:07:39 +0000 (11:07 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/mszeredi/fuse
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: verify ioctl retries
fuse: fix ioctl when server is 32bit
Linus Torvalds [Tue, 14 Dec 2010 19:06:17 +0000 (11:06 -0800)]
Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
xfs: log timestamp changes to the source inode in rename
Linus Torvalds [Tue, 14 Dec 2010 19:02:15 +0000 (11:02 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/sage/ceph-client
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
ceph: fix ioctl magic
ceph: Behave better when handling file lock replies.
ceph: pass lock information by struct file_lock instead of as individual params.
ceph: Handle file locks in replies from the MDS.
ceph: avoid possible null deref in readdir after dir llseek
Linus Torvalds [Tue, 14 Dec 2010 16:51:12 +0000 (08:51 -0800)]
Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
NFS: Fix panic after nfs_umount()
nfs: remove extraneous and problematic calls to nfs_clear_request
nfs: kernel should return EPROTONOSUPPORT when not support NFSv4
NFS: Fix fcntl F_GETLK not reporting some conflicts
nfs: Discard ACL cache on mode update
NFS: Readdir cleanups
NFS: nfs_readdir_search_for_cookie() don't mark as eof if cookie not found
NFS: Fix a memory leak in nfs_readdir
Call the filesystem back whenever a page is removed from the page cache
NFS: Ensure we use the correct cookie in nfs_readdir_xdr_filler
Linus Torvalds [Tue, 14 Dec 2010 16:49:15 +0000 (08:49 -0800)]
Merge git://git./linux/kernel/git/sfrench/cifs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
cifs: remove bogus remapping of error in cifs_filldir()
cifs: allow calling cifs_build_path_to_root on incomplete cifs_sb
cifs: fix check of error return from is_path_accessable
cifs: remove Local_System_Name
cifs: fix use of CONFIG_CIFS_ACL
cifs: add attribute cache timeout (actimeo) tunable
Steven Rostedt [Sat, 4 Dec 2010 04:12:33 +0000 (23:12 -0500)]
workqueue: It is likely that WORKER_NOT_RUNNING is true
Running the annotate branch profiler on three boxes, including my
main box that runs firefox, evolution, xchat, and is part of the distcc farm,
showed this with the likelys in the workqueue code:
correct incorrect % Function File Line
------- --------- - -------- ---- ----
96 996253 99 wq_worker_sleeping workqueue.c 703
96 996247 99 wq_worker_waking_up workqueue.c 677
The likely()s in this case were assuming that WORKER_NOT_RUNNING will
most likely be false. But this is not the case. The reason is
(and shown by adding trace_printks and testing it) that most of the time
WORKER_PREP is set.
In worker_thread() we have:
worker_clr_flags(worker, WORKER_PREP);
[ do work stuff ]
worker_set_flags(worker, WORKER_PREP, false);
(that 'false' means not to wake up an idle worker)
The wq_worker_sleeping() is called from schedule when a worker thread
is putting itself to sleep. Which happens most of the time outside
of that [ do work stuff ].
The wq_worker_waking_up is called by the wakeup worker code, which
is also callod outside that [ do work stuff ].
Thus, the likely and unlikely used by those two functions are actually
backwards.
Remove the annotation and let gcc figure it out.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Tejun Heo [Fri, 10 Dec 2010 16:20:23 +0000 (17:20 +0100)]
MAINTAINERS: Add workqueue entry
Signed-off-by: Tejun Heo <tj@kernel.org>
Andrew Kephart [Mon, 13 Dec 2010 15:46:34 +0000 (09:46 -0600)]
fbdev: Fix fb_find_nearest_mode refresh comparison
Refresh rate nearness is not calculated or reset when nearest resolution
changes.
This patch resets the refresh rate differential measurement whenever a
new nearest resolution is discovered. This fixes two error cases;
first, wherein the first mode's refresh rate differential is never
calculated and second, when the closest refresh rate from a previous
nearest resolution is erroneously preserved.
Signed-off-by: Andrew Kephart <andrew.kephart@alereon.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Carmelo AMOROSO [Mon, 13 Dec 2010 10:20:26 +0000 (10:20 +0000)]
sh: wire up accept4 syscall (non-multiplexed path)
Signed-off-by: Carmelo Amoroso <carmelo.amoroso@st.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Nicolas Ferre [Mon, 13 Dec 2010 12:48:41 +0000 (13:48 +0100)]
dmaengine: at_hdmac: fix buffer transfer size specification
Buffer transfer size is the number of transfers to be performed in
relation with the width of the _source_ interface.
So in the DMA_FROM_DEVICE case, it should be the register width that
should be taken into account.
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Jason Wessel [Wed, 1 Dec 2010 19:01:01 +0000 (13:01 -0600)]
kgdboc,input: Fix regression with keyboard release key and early debugging
The commit
111c182340cd22e238ab1cc6564df336c6ebd7cb (kgdboc: reset
input devices (keyboards) when exiting debugger) introduced a
regression in early debugging such that you get a kernel oops on
continue (with the go command) if you boot a kernel with:
earlyprintk=vga ekgdboc=kbd kgdbwait
The restore kgdboc_restore_input() routine schedules work for the
purpose of sending key release events for any keys that were in the
depressed state prior to entering the kernel debugger. A simple fix
to the crash is to not invoke the schedule_work() if the kernel
system_state is anything other than SYSTEM_RUNNING.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Reviewed-by: Sergei Shtylyov <sshtylyov@mvista.com>
Len Brown [Tue, 14 Dec 2010 03:40:54 +0000 (22:40 -0500)]
Merge branch 'bugzilla-23002' into release
Rafael J. Wysocki [Sun, 12 Dec 2010 20:10:42 +0000 (21:10 +0100)]
ACPI / PM: Do not save/restore NVS on Sony Vaio VGN-NW130D
The saving of the NVS memory area during suspend and restoring it
during resume causes problems to appear on Sony Vaio VGN-NW130D, so
blacklist that machine to avoid those problems.
Addresses https://bugzilla.kernel.org/show_bug.cgi?id=23002
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Reported-and-tested-by: Adriano <adriano.vilela@yahoo.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Chris Mason [Mon, 13 Dec 2010 20:06:46 +0000 (15:06 -0500)]
Btrfs: prevent RAID level downgrades when space is low
The extent allocator has code that allows us to fill
allocations from any available block group, even if it doesn't
match the raid level we've requested.
This was put in because adding a new drive to a filesystem
made with the default mkfs options actually upgrades the metadata from
single spindle dup to full RAID1.
But, the code also allows us to allocate from a raid0 chunk when we
really want a raid1 or raid10 chunk. This can cause big trouble because
mkfs creates a small (4MB) raid0 chunk for data and metadata which then
goes unused for raid1/raid10 installs.
The allocator will happily wander in and allocate from that chunk when
things get tight, which is not correct.
The fix here is to make sure that we provide duplication when the
caller has asked for it. It does all the dups to be any raid level,
which preserves the dup->raid1 upgrade abilities.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Mon, 13 Dec 2010 19:56:23 +0000 (14:56 -0500)]
Btrfs: account for missing devices in RAID allocation profiles
When we mount in RAID degraded mode without adding a new device to
replace the failed one, we can end up using the wrong RAID flags for
allocations.
This results in strange combinations of block groups (raid1 in a raid10
filesystem) and corruptions when we try to allocate blocks from single
spindle chunks on drives that are actually missing.
The first device has two small 4MB chunks in it that mkfs creates and
these are usually unused in a raid1 or raid10 setup. But, in -o degraded,
the allocator will fall back to these because the mask of desired raid groups
isn't correct.
The fix here is to count the missing devices as we build up the list
of devices in the system. This count is used when picking the
raid level to make sure we continue using the same levels that were
in place before we lost a drive.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Forrest Shi [Thu, 9 Dec 2010 08:14:04 +0000 (16:14 +0800)]
fsldma: fix issue of slow dma
Fixed fsl dma slow issue by initializing dma mode register with
bandwidth control. It boosts dma performance and should works
with 85xx board.
Signed-off-by: Forrest Shi <b29237@freescale.com>
Signed-off-by: Li Yang <leoli@freescale.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Chris Mason [Mon, 13 Dec 2010 19:47:58 +0000 (14:47 -0500)]
Btrfs: EIO when we fail to read tree roots
If we just get a plain IO error when we read tree roots, the code
wasn't properly sending that error up the chain. This allowed mounts to
continue when they should failed, and allowed operations
on partially setup root structs. The end result was usually oopsen
on spinlocks that hadn't been spun up correctly.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Ira W. Snyder [Mon, 13 Dec 2010 16:42:30 +0000 (11:42 -0500)]
hwmon: (ltc4215) make sysfs file match the alarm cause
The ltc4215 driver used the chip's "power good" status bit to provide
the power1_alarm file. This is wrong: the chip is really reporting the
status of one of the monitored voltages.
Change the sysfs file from power1_alarm to in2_min_alarm instead. This
matches the voltage that the chip is raising an alarm for.
Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>