Nishanth Aravamudan [Tue, 4 Mar 2008 22:29:42 +0000 (14:29 -0800)]
 
hugetlb: fix pool shrinking while in restricted cpuset
Adam Litke noticed that currently we grow the hugepage pool independent of any
cpuset the running process may be in, but when shrinking the pool, the cpuset
is checked.  This leads to inconsistency when shrinking the pool in a
restricted cpuset -- an administrator may have been able to grow the pool on a
node restricted by a containing cpuset, but they cannot shrink it there.
There are two options: either prevent growing of the pool outside of the
cpuset or allow shrinking outside of the cpuset.  >From previous discussions
on linux-mm, /proc/sys/vm/nr_hugepages is an administrative interface that
should not be restricted by cpusets.  So allow shrinking the pool by removing
pages from nodes outside of current's cpuset.
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Acked-by: Adam Litke <agl@us.ibm.com>
Cc: William Irwin <wli@holomorphy.com>
Cc: Lee Schermerhorn <Lee.Schermerhonr@hp.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Adam Litke [Tue, 4 Mar 2008 22:29:38 +0000 (14:29 -0800)]
 
hugetlb: close a difficult to trigger reservation race
A hugetlb reservation may be inadequately backed in the event of racing
allocations and frees when utilizing surplus huge pages.  Consider the
following series of events in processes A and B:
 A) Allocates some surplus pages to satisfy a reservation
 B) Frees some huge pages
 A) A notices the extra free pages and drops hugetlb_lock to free some of
    its surplus pages back to the buddy allocator.
 B) Allocates some huge pages
 A) Reacquires hugetlb_lock and returns from gather_surplus_huge_pages()
Avoid this by commiting the reservation after pages have been allocated but
before dropping the lock to free excess pages.  For parity, release the
reservation in return_unused_surplus_pages().
This patch also corrects the cpuset_mems_nr() error path in
hugetlb_acct_memory().  If the cpuset check fails, uncommit the
reservation, but also be sure to return any surplus huge pages that may
have been allocated to back the failed reservation.
Thanks to Andy Whitcroft for discovering this.
Signed-off-by: Adam Litke <agl@us.ibm.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
K.Tanaka [Tue, 4 Mar 2008 22:29:37 +0000 (14:29 -0800)]
 
md: the md RAID10 resync thread could cause a md RAID10 array deadlock
This message describes another issue about md RAID10 found by testing the
2.6.24 md RAID10 using new scsi fault injection framework.
Abstract:
When a scsi error results in disabling a disk during RAID10 recovery, the
resync threads of md RAID10 could stall.
This case, the raid array has already been broken and it may not matter.  But
I think stall is not preferable.  If it occurs, even shutdown or reboot will
fail because of resource busy.
The deadlock mechanism:
The r10bio_s structure has a "remaining" member to keep track of BIOs yet to
be handled when recovering.  The "remaining" counter is incremented when
building a BIO in sync_request() and is decremented when finish a BIO in
end_sync_write().
If building a BIO fails for some reasons in sync_request(), the "remaining"
should be decremented if it has already been incremented.  I found a case
where this decrement is forgotten.  This causes a md_do_sync() deadlock
because md_do_sync() waits for md_done_sync() called by end_sync_write(), but
end_sync_write() never calls md_done_sync() because of the "remaining" counter
mismatch.
For example, this problem would be reproduced in the following case:
Personalities : [raid10]
md0 : active raid10 sdf1[4] sde1[5](F) sdd1[2] sdc1[1] sdb1[6](F)
      
3919616 blocks 64K chunks 2 near-copies [4/2] [_UU_]
      [>....................]  recovery =  2.2% (45376/
1959808) finish=0.7min speed=45376K/sec
This case, sdf1 is recovering, sdb1 and sde1 are disabled.
An additional error with detaching sdd will cause a deadlock.
md0 : active raid10 sdf1[4] sde1[5](F) sdd1[6](F) sdc1[1] sdb1[7](F)
      
3919616 blocks 64K chunks 2 near-copies [4/1] [_U__]
      [=>...................]  recovery =  5.0% (99520/
1959808) finish=5.9min speed=5237K/sec
 2739 ?        S<     0:17 [md0_raid10]
28608 ?        D<     0:00 [md0_resync]
28629 pts/1    Ss     0:00 bash
28830 pts/1    R+     0:00 ps ax
31819 ?        D<     0:00 [kjournald]
The resync thread keeps working, but actually it is deadlocked.
Patch:
By this patch, the remaining counter will be decremented if needed.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
NeilBrown [Tue, 4 Mar 2008 22:29:35 +0000 (14:29 -0800)]
 
md: fix possible raid1/raid10 deadlock on read error during resync
Thanks to K.Tanaka and the scsi fault injection framework, here is a fix for
another possible deadlock in raid1/raid10 error handing.
If a read request returns an error while a resync is happening and a resync
request is pending, the attempt to fix the error will block until the resync
progresses, and the resync will block until the read request completes.  Thus
a deadlock.
This patch fixes the problem.
Cc: "K.Tanaka" <k-tanaka@ce.jp.nec.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Keld Simonsen [Tue, 4 Mar 2008 22:29:34 +0000 (14:29 -0800)]
 
md: don't attempt read-balancing for raid10 'far' layouts
This patch changes the disk to be read for layout "far > 1" to always be the
disk with the lowest block address.
Thus the chunks to be read will always be (for a fully functioning array) from
the first band of stripes, and the raid will then work as a raid0 consisting
of the first band of stripes.
Some advantages:
The fastest part which is the outer sectors of the disks involved will be
used.  The outer blocks of a disk may be as much as 100 % faster than the
inner blocks.
Average seek time will be smaller, as seeks will always be confined to the
first part of the disks.
Mixed disks with different performance characteristics will work better, as
they will work as raid0, the sequential read rate will be number of disks
involved times the IO rate of the slowest disk.
If a disk is malfunctioning, the first disk which is working, and has the
lowest block address for the logical block will be used.
Signed-off-by: Keld Simonsen <keld@dkuug.dk>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
NeilBrown [Tue, 4 Mar 2008 22:29:33 +0000 (14:29 -0800)]
 
md: lock access to rdev attributes properly
When we access attributes of an rdev (component device on an md array) through
sysfs, we really need to lock the array against concurrent changes.  We
currently do that when we change an attribute, but not when we read an
attribute.  We need to lock when reading as well else rdev->mddev could become
NULL while we are accessing it.
So add appropriate locking (mddev_lock) to rdev_attr_show.
rdev_size_store requires some extra care as well as it needs to unlock the
mddev while scanning other mddevs for overlapping regions.  We currently
assume that rdev->mddev will still be unchanged after the scan, but that
cannot be certain.  So take a copy of rdev->mddev for use at the end of the
function.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
NeilBrown [Tue, 4 Mar 2008 22:29:32 +0000 (14:29 -0800)]
 
md: make sure a reshape is started when device switches to read-write
A resync/reshape/recovery thread will refuse to progress when the array is
marked read-only.  So whenever it mark it not read-only, it is important to
wake up thread resync thread.  There is one place we didn't do this.
The problem manifests if the start_ro module parameters is set, and a raid5
array that is in the middle of a reshape (restripe) is started.  The array
will initially be semi-read-only (meaning it acts like it is readonly until
the first write).  So the reshape will not proceed.
On the first write, the array will become read-write, but the reshape will not
be started, and there is no event which will ever restart that thread.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
NeilBrown [Tue, 4 Mar 2008 22:29:31 +0000 (14:29 -0800)]
 
md: clean up irregularity with raid autodetect
When a raid1 array is stopped, all components currently get added to the list
for auto-detection.  However we should really only add components that were
found by autodetection in the first place.  So add a flag to record that
information, and use it.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
NeilBrown [Tue, 4 Mar 2008 22:29:31 +0000 (14:29 -0800)]
 
md: guard against possible bad array geometry in v1 metadata
Make sure the data doesn't start before the end of the superblock when the
superblock is at the start of the device.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
NeilBrown [Tue, 4 Mar 2008 22:29:30 +0000 (14:29 -0800)]
 
md: reduce CPU wastage on idle md array with a write-intent bitmap
On an md array with a write-intent bitmap, a thread wakes up every few seconds
and scans the bitmap looking for work to do.  If the array is idle, there will
be no work to do, but a lot of scanning is done to discover this.
So cache the fact that the bitmap is completely clean, and avoid scanning the
whole bitmap when the cache is known to be clean.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
NeilBrown [Tue, 4 Mar 2008 22:29:29 +0000 (14:29 -0800)]
 
md: fix deadlock in md/raid1 and md/raid10 when handling a read error
When handling a read error, we freeze the array to stop any other IO while
attempting to over-write with correct data.
This is done in the raid1d(raid10d) thread and must wait for all submitted IO
to complete (except for requests that failed and are sitting in the retry
queue - these are counted in ->nr_queue and will stay there during a freeze).
However write requests need attention from raid1d as bitmap updates might be
required.  This can cause a deadlock as raid1 is waiting for requests to
finish that themselves need attention from raid1d.
So we create a new function 'flush_pending_writes' to give that attention, and
call it in freeze_array to be sure that we aren't waiting on raid1d.
Thanks to "K.Tanaka" <k-tanaka@ce.jp.nec.com> for finding and reporting this
problem.
Cc: "K.Tanaka" <k-tanaka@ce.jp.nec.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
FUJITA Tomonori [Tue, 4 Mar 2008 22:29:28 +0000 (14:29 -0800)]
 
iommu: parisc: make the IOMMUs respect the segment boundary limits
Make PARISC's two IOMMU implementations not allocate a memory area spanning
LLD's segment boundary.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Kyle McMartin <kyle@parisc-linux.org>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Grant Grundler <grundler@parisc-linux.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
FUJITA Tomonori [Tue, 4 Mar 2008 22:29:28 +0000 (14:29 -0800)]
 
iommu: parisc: pass struct device to iommu_alloc_range
This adds struct device argument to sba_alloc_range and ccio_alloc_range, a
preparation for modifications to fix the IOMMU segment boundary problem.  This
change enables ccio_alloc_range to access to LLD's segment boundary limits.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Kyle McMartin <kyle@parisc-linux.org>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Grant Grundler <grundler@parisc-linux.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
FUJITA Tomonori [Tue, 4 Mar 2008 22:29:27 +0000 (14:29 -0800)]
 
iommu: export iommu_is_span_boundary helper function
iommu_is_span_boundary is used internally in the IOMMU helper
(lib/iommu-helper.c), a primitive function that judges whether a memory area
spans LLD's segment boundary or not.
It's difficult to convert some IOMMUs to use the IOMMU helper but
iommu_is_span_boundary is still useful for them.  So this patch exports it.
This is needed for the parisc iommu fixes.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Kyle McMartin <kyle@parisc-linux.org>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Grant Grundler <grundler@parisc-linux.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kyle McMartin [Tue, 4 Mar 2008 22:29:26 +0000 (14:29 -0800)]
 
hisax_fcpcipnp: move request_irq later in probe
After a quick glance at the code, we're getting the DEBUG_SHIRQ spurious
interrupt before we have the adapter template filled in.  Real interrupts
appear to be turned on by fcpci*_init(), so move request_irq until just before
that.
Signed-off-by: Kyle McMartin <kmcmartin@redhat.com>
Cc: Karsten Keil <kkeil@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michael Halcrow [Tue, 4 Mar 2008 22:29:24 +0000 (14:29 -0800)]
 
eCryptfs: make ecryptfs_prepare_write decrypt the page
When the page is not up to date, ecryptfs_prepare_write() should be
acting much like ecryptfs_readpage(). This includes the painfully
obvious step of actually decrypting the page contents read from the
lower encrypted file.
Note that this patch resolves a bug in eCryptfs in 2.6.24 that one can
produce with these steps:
# mount -t ecryptfs /secret /secret
# echo "abc" > /secret/file.txt
# umount /secret
# mount -t ecryptfs /secret /secret
# echo "def" >> /secret/file.txt
# cat /secret/file.txt
Without this patch, the resulting data returned from cat is likely to
be something other than "abc\ndef\n".
(Thanks to Benedikt Driessen for reporting this.)
Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: Benedikt Driessen <bdriessen@escrypt.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jesper Nilsson [Tue, 4 Mar 2008 22:29:23 +0000 (14:29 -0800)]
 
cris: correct syscall numbers in unistd.h for timerfd_settime and timerfd_gettime
Last commit for unistd was not correct, it only had a partial update of
syscall numbers for __NR_timerfd_settime and __NR_timerfd_gettime.  Also,
NR_syscalls was not incremented for the new syscalls.
Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Mikael Starvik <mikael.starvik@axis.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jesper Nilsson [Tue, 4 Mar 2008 22:29:23 +0000 (14:29 -0800)]
 
cris: correct usage of __user for copy to and from user space in lib/usercopy and uaccess.h
Function __copy_user_zeroing in arch/lib/usercopy.c had the wrong parameter
set as __user, and in include/asm-cris/uaccess.h, it was not set at all for
some of the calling functions.
This will cut the number of warnings quite dramatically when using sparse.
While we're here, remove useless CVS log and correct confusing typo.
Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Mikael Starvik <mikael.starvik@axis.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Henrique de Moraes Holschuh [Tue, 4 Mar 2008 22:29:21 +0000 (14:29 -0800)]
 
ACPI: thinkpad-acpi: fix hotkey_get_tablet_mode
I used the wrong return convention on hotkey_get_tablet_mode(), breaking a lot
of stuff.  Bad Henrique!
Fix it to return the status in the parameter-by-reference, and IO status on
the function return value.  Duh.
Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Cc: Zdenek Kabelac <zdenek.kabelac@gmail.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Lukas Hejtmanek <xhejtman@ics.muni.cz>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Julia Lawall [Tue, 4 Mar 2008 22:29:20 +0000 (14:29 -0800)]
 
fs/reiserfs/super.c: correct use of ! and &
In commit 
e6bafba5b4765a5a252f1b8d31cbf6d2459da337 ("wmi: (!x & y)
strikes again"), a bug was fixed that involved converting !x & y to !(x
& y).  The code below shows the same pattern, and thus should perhaps be
fixed in the same way.
This is not tested and clearly changes the semantics, so it is only
something to consider.
The semantic patch that makes this change is as follows:
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@@ expression E1,E2; @@
(
  !E1 & !E2
|
- !E1 & E2
+ !(E1 & E2)
)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Jan Kara <jack@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Julia Lawall [Tue, 4 Mar 2008 22:29:19 +0000 (14:29 -0800)]
 
drivers/serial/m32r_sio.c: correct use of ! and &
In commit 
e6bafba5b4765a5a252f1b8d31cbf6d2459da337 ("wmi: (!x & y)
strikes again"), a bug was fixed that involved converting !x & y to !(x
& y).  The code below shows the same pattern, and thus should perhaps be
fixed in the same way.
This is not tested and clearly changes the semantics, so it is only
something to consider.
The semantic patch that makes this change is as follows:
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@@ expression E1,E2; @@
(
  !E1 & !E2
|
- !E1 & E2
+ !(E1 & E2)
)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Julia Lawall [Tue, 4 Mar 2008 22:29:18 +0000 (14:29 -0800)]
 
drivers/isdn: correct use of ! and &
In commit 
e6bafba5b4765a5a252f1b8d31cbf6d2459da337 ("wmi: (!x & y)
strikes again"), a bug was fixed that involved converting !x & y to !(x
& y).  The code below shows the same pattern, and thus should perhaps be
fixed in the same way.
This is not tested and clearly changes the semantics, so it is only
something to consider.
The semantic patch that makes this change is as follows:
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@@ expression E1,E2; @@
(
  !E1 & !E2
|
- !E1 & E2
+ !(E1 & E2)
)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Cc: Karsten Keil <kkeil@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Julia Lawall [Tue, 4 Mar 2008 22:29:17 +0000 (14:29 -0800)]
 
drivers/char/isicom.c: correct use of ! and &
In commit 
e6bafba5b4765a5a252f1b8d31cbf6d2459da337 ("wmi: (!x & y)
strikes again"), a bug was fixed that involved converting !x & y to !(x
& y).  The code below shows the same pattern, and thus should perhaps be
fixed in the same way.
This is not tested and clearly changes the semantics, so it is only
something to consider.
The semantic patch that makes this change is as follows:
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@@ expression E1,E2; @@
(
  !E1 & !E2
|
- !E1 & E2
+ !(E1 & E2)
)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Cc: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Tue, 4 Mar 2008 22:29:16 +0000 (14:29 -0800)]
 
memcg: fix oops on NULL lru list
While testing force_empty, during an exit_mmap, __mem_cgroup_remove_list
called from mem_cgroup_uncharge_page oopsed on a NULL pointer in the lru list.
 I couldn't see what racing tasks on other cpus were doing, but surmise that
another must have been in mem_cgroup_charge_common on the same page, between
its unlock_page_cgroup and spin_lock_irqsave near done (thanks to that kzalloc
which I'd almost changed to a kmalloc).
Normally such a race cannot happen, the ref_cnt prevents it, the final
uncharge cannot race with the initial charge.  But force_empty buggers the
ref_cnt, that's what it's all about; and thereafter forced pages are
vulnerable to races such as this (just think of a shared page also mapped into
an mm of another mem_cgroup than that just emptied).  And remain vulnerable
until they're freed indefinitely later.
This patch just fixes the oops by moving the unlock_page_cgroups down below
adding to and removing from the list (only possible given the previous patch);
and while we're at it, we might as well make it an invariant that
page->page_cgroup is always set while pc is on lru.
But this behaviour of force_empty seems highly unsatisfactory to me: why have
a ref_cnt if we always have to cope with it being violated (as in the earlier
page migration patch).  We may prefer force_empty to move pages to an orphan
mem_cgroup (could be the root, but better not), from which other cgroups could
recover them; we might need to reverse the locking again; but no time now for
such concerns.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hirokazu Takahashi [Tue, 4 Mar 2008 22:29:15 +0000 (14:29 -0800)]
 
memcg: simplify force_empty and move_lists
As for force_empty, though this may not be the main topic here,
mem_cgroup_force_empty_list() can be implemented simpler.  It is possible to
make the function just call mem_cgroup_uncharge_page() instead of releasing
page_cgroups by itself.  The tip is to call get_page() before invoking
mem_cgroup_uncharge_page(), so the page won't be released during this
function.
Kamezawa-san points out that by the time mem_cgroup_uncharge_page() uncharges,
the page might have been reassigned to an lru of a different mem_cgroup, and
now be emptied from that; but Hugh claims that's okay, the end state is the
same as when it hasn't gone to another list.
And once force_empty stops taking lock_page_cgroup within mz->lru_lock,
mem_cgroup_move_lists() can be simplified to take mz->lru_lock directly while
holding page_cgroup lock (but still has to use try_lock_page_cgroup).
Signed-off-by: Hirokazu Takahashi <taka@valinux.co.jp>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Cc: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Tue, 4 Mar 2008 22:29:13 +0000 (14:29 -0800)]
 
memcg: fix mem_cgroup_move_lists locking
Ever since the VM_BUG_ON(page_get_page_cgroup(page)) (now Bad page state) went
into page freeing, I've hit it from time to time in testing on some machines,
sometimes only after many days.  Recently found a machine which could usually
produce it within a few hours, which got me there at last.
The culprit is mem_cgroup_move_lists, whose locking is inadequate; and the
arrangement of structures was such that you got page_cgroups from the lru list
neatly put on to SLUB's freelist.  Kamezawa-san identified the same hole
independently.
The main problem was that it was missing the lock_page_cgroup it needs to
safely page_get_page_cgroup; but it's tricky to go beyond that too, and I
couldn't do it with SLAB_DESTROY_BY_RCU as I'd expected.  See the code for
comments on the constraints.
This patch immediately gets replaced by a simpler one from Hirokazu-san; but
is it just foolish pride that tells me to put this one on record, in case we
need to come back to it later?
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hirokazu Takahashi <taka@valinux.co.jp>
Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Cc: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Tue, 4 Mar 2008 22:29:12 +0000 (14:29 -0800)]
 
memcg: css_put after remove_list
mem_cgroup_uncharge_page does css_put on the mem_cgroup before uncharging from
it, and before removing page_cgroup from one of its lru lists: isn't there a
danger that struct mem_cgroup memory could be freed and reused before
completing that, so corrupting something?  Never seen it, and for all I know
there may be other constraints which make it impossible; but let's be
defensive and reverse the ordering there.
mem_cgroup_force_empty_list is safe because there's an extra css_get around
all its works; but even so, change its ordering the same way round, to help
get in the habit of doing it like this.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hirokazu Takahashi <taka@valinux.co.jp>
Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Cc: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Tue, 4 Mar 2008 22:29:11 +0000 (14:29 -0800)]
 
memcg: remove clear_page_cgroup and atomics
Remove clear_page_cgroup: it's an unhelpful helper, see for example how
mem_cgroup_uncharge_page had to unlock_page_cgroup just in order to call it
(serious races from that?  I'm not sure).
Once that's gone, you can see it's pointless for page_cgroup's ref_cnt to be
atomic: it's always manipulated under lock_page_cgroup, except where
force_empty unilaterally reset it to 0 (and how does uncharge's
atomic_dec_and_test protect against that?).
Simplify this page_cgroup locking: if you've got the lock and the pc is
attached, then the ref_cnt must be positive: VM_BUG_ONs to check that, and to
check that pc->page matches page (we're on the way to finding why sometimes it
doesn't, but this patch doesn't fix that).
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hirokazu Takahashi <taka@valinux.co.jp>
Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Cc: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Tue, 4 Mar 2008 22:29:10 +0000 (14:29 -0800)]
 
memcg: memcontrol uninlined and static
More cleanup to memcontrol.c, this time changing some of the code generated.
Let the compiler decide what to inline (except for page_cgroup_locked which is
only used when CONFIG_DEBUG_VM): the __always_inline on lock_page_cgroup etc.
was quite a waste since bit_spin_lock etc.  are inlines in a header file; made
mem_cgroup_force_empty and mem_cgroup_write_strategy static.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hirokazu Takahashi <taka@valinux.co.jp>
Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Cc: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Tue, 4 Mar 2008 22:29:09 +0000 (14:29 -0800)]
 
memcg: memcontrol whitespace cleanups
Sorry, before getting down to more important changes, I'd like to do some
cleanup in memcontrol.c.  This patch doesn't change the code generated, but
cleans up whitespace, moves up a double declaration, removes an unused enum,
removes void returns, removes misleading comments, that kind of thing.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hirokazu Takahashi <taka@valinux.co.jp>
Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Cc: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Tue, 4 Mar 2008 22:29:08 +0000 (14:29 -0800)]
 
memcg: remove mem_cgroup_uncharge
Nothing uses mem_cgroup_uncharge apart from mem_cgroup_uncharge_page, (a
trivial wrapper around it) and mem_cgroup_end_migration (which does the same
as mem_cgroup_uncharge_page).  And it often ends up having to lock just to let
its caller unlock.  Remove it (but leave the silly locking until a later
patch).
Moved mem_cgroup_cache_charge next to mem_cgroup_charge in memcontrol.h.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: David Rientjes <rientjes@google.com>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hirokazu Takahashi <taka@valinux.co.jp>
Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Cc: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Tue, 4 Mar 2008 22:29:08 +0000 (14:29 -0800)]
 
memcg: mem_cgroup_charge never NULL
My memcgroup patch to fix hang with shmem/tmpfs added NULL page handling to
mem_cgroup_charge_common.  It seemed convenient at the time, but hard to
justify now: there's a perfectly appropriate swappage to charge and uncharge
instead, this is not on any hot path through shmem_getpage, and no performance
hit was observed from the slight extra overhead.
So revert that NULL page handling from mem_cgroup_charge_common; and make it
clearer by bringing page_cgroup_assign_new_page_cgroup into its body - that
was a helper I found more of a hindrance to understanding.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: David Rientjes <rientjes@google.com>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hirokazu Takahashi <taka@valinux.co.jp>
Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Cc: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Tue, 4 Mar 2008 22:29:07 +0000 (14:29 -0800)]
 
memcg: bad page if page_cgroup when free
Replace free_hot_cold_page's VM_BUG_ON(page_get_page_cgroup(page)) by a "Bad
page state" and clear: most users don't have CONFIG_DEBUG_VM on, and if it
were set here, it'd likely cause corruption when the page is reused.
Don't use page_assign_page_cgroup to clear it: that should be private to
memcontrol.c, and always called with the lock taken; and memmap_init_zone
doesn't need it either - like page->mapping and other pointers throughout the
kernel, Linux assumes pointers in zeroed structures are NULL pointers.
Instead use page_reset_bad_cgroup, added to memcontrol.h for this only.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: David Rientjes <rientjes@google.com>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hirokazu Takahashi <taka@valinux.co.jp>
Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Cc: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Tue, 4 Mar 2008 22:29:06 +0000 (14:29 -0800)]
 
memcg: fix VM_BUG_ON from page migration
Page migration gave me free_hot_cold_page's VM_BUG_ON page->page_cgroup.
remove_migration_pte was calling mem_cgroup_charge on the new page whenever it
found a swap pte, before it had determined it to be a migration entry.  That
left a surplus reference count on the page_cgroup, so it was still attached
when the page was later freed.
Move that mem_cgroup_charge down to where we're sure it's a migration entry.
We were already under i_mmap_lock or anon_vma->lock, so its GFP_KERNEL was
already inappropriate: change that to GFP_ATOMIC.
It's essential that remove_migration_pte removes all the migration entries,
other crashes follow if not.  So proceed even when the charge fails: normally
it cannot, but after a mem_cgroup_force_empty it might - comment in the code.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hirokazu Takahashi <taka@valinux.co.jp>
Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Cc: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Tue, 4 Mar 2008 22:29:04 +0000 (14:29 -0800)]
 
memcg: when do_swap's do_wp_page fails
Don't uncharge when do_swap_page's call to do_wp_page fails: the page which
was charged for is there in the pagetable, and will be correctly uncharged
when that area is unmapped - it was only its COWing which failed.
And while we're here, remove earlier XXX comment: yes, OR in do_wp_page's
return value (maybe VM_FAULT_WRITE) with do_swap_page's there; but if it
fails, mask out success bits, which might confuse some arches e.g.  sparc.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: David Rientjes <rientjes@google.com>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hirokazu Takahashi <taka@valinux.co.jp>
Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Cc: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Tue, 4 Mar 2008 22:29:04 +0000 (14:29 -0800)]
 
memcg: page_cache_release not __free_page
There's nothing wrong with mem_cgroup_charge failure in do_wp_page and
do_anonymous page using __free_page, but it does look odd when nearby code
uses page_cache_release: use that instead (while turning a blind eye to
ancient inconsistencies of page_cache_release versus put_page).
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: David Rientjes <rientjes@google.com>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hirokazu Takahashi <taka@valinux.co.jp>
Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Cc: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Tue, 4 Mar 2008 22:29:03 +0000 (14:29 -0800)]
 
memcg: move_lists on page not page_cgroup
Each caller of mem_cgroup_move_lists is having to use page_get_page_cgroup:
it's more convenient if it acts upon the page itself not the page_cgroup; and
in a later patch this becomes important to handle within memcontrol.c.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: David Rientjes <rientjes@google.com>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hirokazu Takahashi <taka@valinux.co.jp>
Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Cc: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Tue, 4 Mar 2008 22:29:01 +0000 (14:29 -0800)]
 
memcg: mm_match_cgroup not vm_match_cgroup
vm_match_cgroup is a perverse name for a macro to match mm with cgroup: rename
it mm_match_cgroup, matching mm_init_cgroup and mm_free_cgroup.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hirokazu Takahashi <taka@valinux.co.jp>
Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Cc: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mathieu Desnoyers [Tue, 4 Mar 2008 22:29:00 +0000 (14:29 -0800)]
 
markers: add an if(0) to __mark_check_format()
Wrap __mark_check_format() into an if(0) to make sure that parameters such as
trace_mark(mm_page_alloc, "order %u pfn %lu", order, page?page_to_pfn(page):0);
(where page_to_pfn() has side-effects) won't generate code because of the
__mark_check_format().
Thanks to Jan Kiszka for reporting this.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: "Frank Ch. Eigler" <fche@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jesper Juhl [Tue, 4 Mar 2008 22:29:00 +0000 (14:29 -0800)]
 
markers: don't risk NULL deref in marker
get_marker() may return NULL, so test for it.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Chris Dearman [Tue, 4 Mar 2008 22:28:59 +0000 (14:28 -0800)]
 
.gitignore: ignore emacs backup and temporary files.
Signed-off-by: Chris Dearman <chris@mips.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
FUJITA Tomonori [Tue, 4 Mar 2008 22:28:58 +0000 (14:28 -0800)]
 
alpha: remove unused DEBUG_FORCEDAC define in IOMMU
This just removes unused DEBUG_FORCEDAC define in the IOMMU code.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
FUJITA Tomonori [Tue, 4 Mar 2008 22:28:57 +0000 (14:28 -0800)]
 
alpha: make IOMMU respect the segment boundary limits
This patch makes the IOMMU code not allocate a memory area spanning LLD's
segment boundary.
is_span_boundary() judges whether a memory area spans LLD's segment boundary.
If iommu_arena_find_pages() finds such a area, it tries to find the next
available memory area.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
FUJITA Tomonori [Tue, 4 Mar 2008 22:28:57 +0000 (14:28 -0800)]
 
alpha: IOMMU had better access to the free space bitmap at only one place
iommu_arena_find_pages duplicates the code to access to the bitmap for free
space management.  This patch convert the IOMMU code to have only one place to
access the bitmap, in the popular way that other IOMMUs (e.g.  POWER and
SPARC) do.
This patch is preparation for modifications to fix the IOMMU segment boundary
problem.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
FUJITA Tomonori [Tue, 4 Mar 2008 22:28:54 +0000 (14:28 -0800)]
 
alpha: convert IOMMU to use ALIGN()
This patch is preparation for modifications to fix the IOMMU segment boundary
problem.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric Sandeen [Tue, 4 Mar 2008 22:28:53 +0000 (14:28 -0800)]
 
include falloc.h in header-y
Include falloc.h in header-y; it defines a flag for the fallocate sysctl.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jesper Nilsson [Tue, 4 Mar 2008 22:28:52 +0000 (14:28 -0800)]
 
CRIS: Import string.c (memcpy) from newlib: fixes compile error with gcc 4
Adrian Bunk reported another compile error with a SVN head GCC:
...
  CC      arch/cris/arch-v10/lib/string.o
/home/bunk/linux/kernel-2.6/git/linux-2.6/arch/cris/arch-v10/lib/string.c:138:
error: lvalue required as increment operand
/home/bunk/linux/kernel-2.6/git/linux-2.6/arch/cris/arch-v10/lib/string.c:138:
error: lvalue required as increment operand
/home/bunk/linux/kernel-2.6/git/linux-2.6/arch/cris/arch-v10/lib/string.c:139:
error: lvalue required as increment operand
...
This is due to the use of the construct:
	*((long*)dst)++ = lc;
Which isn't legal since casts don't return an lvalue.
The solution is to import the implementation from newlib,
which is continually autotested together with GCC mainline,
and uses the construct:
	*(long *) dst = lc; dst += 4;
Since this is an import of a file from newlib, I'm not touching
the formatting or correcting any checkpatch errors.
As for the earlier fix for memset.c, even if the two files for
CRIS v10 and CRIS v32 are identical at the moment, it might
be possible to tweak the CRIS v32 version.
Thus, I'm not yet folding them into the same file, at least not
until we've done some research on it.
Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Sterba [Tue, 4 Mar 2008 22:28:50 +0000 (14:28 -0800)]
 
ipwireless: fix potential tty == NULL dereference
The Coverity checker spotted the following inconsequent NULL checking in
drivers/char/pcmcia/ipwireless/network.c:ipwireless_network_packet_received()
if (tty && channel_idx == IPW_CHANNEL_RAS
		&& (network->ras_control_lines &
			IPW_CONTROL_LINE_DCD) != 0
		&& ipwireless_tty_is_modem(tty)) {
...
	else
		ipwireless_tty_received(tty, data, length);
Cc: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Cc: "John W. Linville" <linville@tuxdriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ville Syrjala [Tue, 4 Mar 2008 22:28:50 +0000 (14:28 -0800)]
 
sm501: add support for the SM502 programmable PLL
SM502 has a programmable PLL which can provide the panel pixel clock instead
of the 288MHz and 336MHz PLLs.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Ville Syrjala <syrjala@sci.fi>
Cc: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ville Syrjala [Tue, 4 Mar 2008 22:28:49 +0000 (14:28 -0800)]
 
sm501: remove a duplicated table
misc_div is a subset of px_div so eliminate the smaller table.
Signed-off-by: Ville Syrjala <syrjala@sci.fi>
Cc: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ville Syrjala [Tue, 4 Mar 2008 22:28:49 +0000 (14:28 -0800)]
 
sm501fb: fix timing limits
Vertical sync height register can only hold 6 bits.  Fix the hsync start test
to use > instead of >=.  Also add a few clarifying comments.
Signed-off-by: Ville Syrjala <syrjala@sci.fi>
Acked-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ville Syrjala [Tue, 4 Mar 2008 22:28:48 +0000 (14:28 -0800)]
 
sm501fb: set transp.offset to 0 in 8bpp and 16bpp modes
Even though it may not be strictly necessary transp.offset should probably be
0 when alpha channel is not available.
Signed-off-by: Ville Syrjala <syrjala@sci.fi>
Acked-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ville Syrjala [Tue, 4 Mar 2008 22:28:47 +0000 (14:28 -0800)]
 
sm501fb: RGB offsets are reversed in 16bpp modes
The RGB offsets were reversed in 16bpp modes.  Simply trying to reverse the
offsets when endianness differs is clearly the wrong thing to do but that is
an issue for another patch.
Signed-off-by: Ville Syrjala <syrjala@sci.fi>
Acked-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ville Syrjala [Tue, 4 Mar 2008 22:28:46 +0000 (14:28 -0800)]
 
sm501fb: direct color visual does not work
The sm501fb palette code clearly does not handle direct color so change the
driver to use true color visual for 16bpp.
Signed-off-by: Ville Syrjala <syrjala@sci.fi>
Acked-by: Magnus Damm <damm@igel.co.jp>
Acked-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Tue, 4 Mar 2008 22:28:45 +0000 (14:28 -0800)]
 
ndelay(): switch to C function to avoid 64-bit division
We should be able to do ndelay(some_u64), but that can cause a call to
__divdi3() to be emitted because the ndelay() macros does a divide.
Fix it by switching to static inline which will force the u64 arg to be
treated as an unsigned long.  udelay() takes an unsigned long arg.
[bunk@kernel.org: reported m68k build breakage]
Cc: Adrian Bunk <bunk@kernel.org>
Cc: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Cc: Martin Michlmayr <tbm@cyrius.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Anton Vorontsov [Tue, 4 Mar 2008 22:28:44 +0000 (14:28 -0800)]
 
ds1wm: report bus reset error
The patch replaces dev_dbg() by dev_err(), so the user could actually see the
error, instead of wondering why w1 doesn't work.  The root cause of the bus
reset error isn't yet debugged though, but this sometimes happens on iPaq
H5555.
And while I'm at it, some cosmetic cleanups also made (few lines were using
spaces instead of tabs).
Signed-off-by: Anton Vorontsov <cbouatmailru@gmail.com>
Acked-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Anton Vorontsov [Tue, 4 Mar 2008 22:28:43 +0000 (14:28 -0800)]
 
ds1wm: should check for IS_ERR(clk) instead of NULL
On the error condition clk_get() returns ERR_PTR(..), so checking for NULL
doesn't work.  ds1wm module causes a kernel oops when ds1wm clock isn't
registered.
This patch converts NULL check to IS_ERR(), plus uses PTR_ERR()
for the return code.
Signed-off-by: Anton Vorontsov <cbouatmailru@gmail.com>
Acked-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Grant Likely [Tue, 4 Mar 2008 22:28:42 +0000 (14:28 -0800)]
 
powerpc: mpc5200: fix build error on mpc52xx_psc_spi device driver
Commit id 
94f389485e27641348c1951ab8d65157122a8939 (Separate MPC52xx PSC FIOF
regsiters from the rest of PSC) split the PSC fifo registers away from the
core PSC regs.  Doing so broke the mpc52xx_psc_spi driver.
This patch teaches the mpc52xx_psc_spi driver about the new PSC fifo
register definitions.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Cc: David Brownell <david-b@pacbell.net>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Kumar Gala <galak@gate.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Peter Osterlund [Tue, 4 Mar 2008 22:28:41 +0000 (14:28 -0800)]
 
pktcdvd: reduce stack consumption
On my system, pkt_open() consumes 584 bytes because the compiler decides to
inline lots of functions that would not normally be part of long call chains.
The following patch fixes that problem on my system.
Signed-off-by: Peter Osterlund <petero2@telia.com>
Cc: Nix <nix@esperi.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Tue, 4 Mar 2008 22:28:40 +0000 (14:28 -0800)]
 
add noinline_for_stack
People are adding `noinline' in various places to prevent excess stack
consumption due to gcc inlining.  But once this is done, it is quite unobvious
why the `noinline' is present in the code.  We can comment each and every
site, or we can use noinline_for_stack.
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Krzysztof Helt [Tue, 4 Mar 2008 22:28:39 +0000 (14:28 -0800)]
 
tridentfb: resource management fixes in probe function
Correct error paths in probe function.
The probe function enables mmio mode so it important to disable the mmio
mode before exiting the probe function.  Otherwise, the console is left in
unusable state (garbled fonts at least, lock up at worst).
[akpm@linux-foundation.org: cleanups]
Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Balbir Singh [Tue, 4 Mar 2008 22:28:39 +0000 (14:28 -0800)]
 
Memory controller: rename to Memory Resource Controller
Rename Memory Controller to Memory Resource Controller.  Reflect the same
changes in the CONFIG definition for the Memory Resource Controller.  Group
together the config options for Resource Counters and Memory Resource
Controller.
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ananth N Mavinakayanahalli [Tue, 4 Mar 2008 22:28:38 +0000 (14:28 -0800)]
 
Kprobes: move kprobe examples to samples/
Move kprobes examples from Documentation/kprobes.txt to under samples/.
Patch originally by Randy Dunlap.
o Updated the patch to apply on 2.6.25-rc3
o Modified examples code to build on multiple architectures. Currently,
  the kprobe and jprobe examples code works for x86 and powerpc
o Cleaned up unneeded #includes
o Cleaned up Kconfig per Sam Ravnborg's suggestions to fix build break
  on archs that don't have kretprobes
o Implemented suggestions by Mathieu Desnoyers on CONFIG_KRETPROBES
o Included Andrew Morton's cleanup based on x86-git
o Modified kretprobe_example to act as a arch-agnostic module to
  determine routine execution times:
	Use 'modprobe kretprobe_example func=<func_name>' to determine
	execution time of func_name in nanoseconds.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ananth N Mavinakayanahalli [Tue, 4 Mar 2008 22:28:37 +0000 (14:28 -0800)]
 
Kprobes: indicate kretprobe support in Kconfig
Add CONFIG_HAVE_KRETPROBES to the arch/<arch>/Kconfig file for relevant
architectures with kprobes support.  This facilitates easy handling of
in-kernel modules (like samples/kprobes/kretprobe_example.c) that depend on
kretprobes being present in the kernel.
Thanks to Sam Ravnborg for helping make the patch more lean.
Per Mathieu's suggestion, added CONFIG_KRETPROBES and fixed up dependencies.
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Samuel Thibault [Tue, 4 Mar 2008 22:28:36 +0000 (14:28 -0800)]
 
VT notifier fix for VT switch
VT notifier callbacks need to be aware of console switches.  This is already
partially done from console_callback(), but at that time fg_console, cursor
positions, etc.  are not yet updated and hence screen readers fetch the old
values.
This adds an update notify after all of the values are updated in
redraw_screen(vc, 1).
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric Dumazet [Tue, 4 Mar 2008 22:28:35 +0000 (14:28 -0800)]
 
alloc_percpu() fails to allocate percpu data
Some oprofile results obtained while using tbench on a 2x2 cpu machine were
very surprising.
For example, loopback_xmit() function was using high number of cpu cycles
to perform the statistic updates, supposed to be real cheap since they use
percpu data
        pcpu_lstats = netdev_priv(dev);
        lb_stats = per_cpu_ptr(pcpu_lstats, smp_processor_id());
        lb_stats->packets++;  /* HERE : serious contention */
        lb_stats->bytes += skb->len;
struct pcpu_lstats is a small structure containing two longs.  It appears
that on my 32bits platform, alloc_percpu(8) allocates a single cache line,
instead of giving to each cpu a separate cache line.
Using the following patch gave me impressive boost in various benchmarks
( 6 % in tbench)
(all percpu_counters hit this bug too)
Long term fix (ie >= 2.6.26) would be to let each CPU allocate their own
block of memory, so that we dont need to roudup sizes to L1_CACHE_BYTES, or
merging the SGI stuff of course...
Note : SLUB vs SLAB is important here to *show* the improvement, since they
dont have the same minimum allocation sizes (8 bytes vs 32 bytes).  This
could very well explain regressions some guys reported when they switched
to SLUB.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Kara [Tue, 4 Mar 2008 22:28:33 +0000 (14:28 -0800)]
 
vfs: fix NULL pointer dereference in fsync_buffers_list()
Fix NULL pointer dereference in fsync_buffers_list() introduced by recent fix
of races in private_list handling.  Since bh->b_assoc_map has been cleared in
__remove_assoc_queue() we should really use original value stored in the
'mapping' variable.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KOSAKI Motohiro [Tue, 4 Mar 2008 22:28:32 +0000 (14:28 -0800)]
 
zlc_setup(): handle jiffies wraparound
jiffies subtraction may cause an overflow problem.  It should be using
time_after().
[akpm@linux-foundation.org: include jiffies.h]
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alex Riesen [Tue, 4 Mar 2008 22:28:31 +0000 (14:28 -0800)]
 
Fix "Malformed early option 'loglevel'"
Keith Mannthey said:
  The parameter hotadd_percent is setup right but there is a "Malformed
  early option 'numa'" message.
Rusty Russell said:
  This happens when the function registered with early_param() returns
  non-zero.  __setup() functions return 1 if OK, module_param() and
  early_param() return 0 or a -ve error code.
For instance:
Linux version 2.6.25-rc3-t (raa@steel) (gcc version 4.1.3 
20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)) #22 SMP PREEMPT Tue Feb 26
BIOS-provided physical RAM map:
 BIOS-e820: 
0000000000000000 - 
00000000000a0000 (usable)
 BIOS-e820: 
00000000000f0000 - 
0000000000100000 (reserved)
 BIOS-e820: 
0000000000100000 - 
000000003fff0000 (usable)
 BIOS-e820: 
000000003fff0000 - 
000000003fff3000 (ACPI NVS)
 BIOS-e820: 
000000003fff3000 - 
0000000040000000 (ACPI data)
 BIOS-e820: 
00000000fec00000 - 
0000000100000000 (reserved)
Malformed early option 'loglevel'
127MB HIGHMEM available.
896MB LOWMEM available.
Command line:
BOOT_IMAGE=2.6.25-t ro root=809 ro console=ttyS0,57600n8 console=tty0 loglevel=5
Acked-by: Yinghai Lu <yhlu.kernel@gmai.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Keith Mannthey <kmannth@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Roland McGrath [Tue, 4 Mar 2008 22:28:30 +0000 (14:28 -0800)]
 
core dump: user_regset writeback
This makes the user_regset-based core dump code call user_regset writeback
hooks when available.  This is necessary groundwork to allow IA64 to set
CORE_DUMP_USE_REGSET.
Cc: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
akpm@linux-foundation.org [Tue, 4 Mar 2008 22:28:29 +0000 (14:28 -0800)]
 
Add memory resource controller maintainers
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Paul Menage <menage@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Paul Menage [Tue, 4 Mar 2008 22:28:28 +0000 (14:28 -0800)]
 
Control Groups: add Paul Menage as maintainer
Control Groups: Add Paul Menage as maintainer
Signed-off-by: Paul Menage <menage@google.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Brownell [Tue, 4 Mar 2008 22:28:27 +0000 (14:28 -0800)]
 
gpio: <linux/gpio.h> and "no GPIO support here" stubs
Add a <linux/gpio.h> defining fail/warn stubs for GPIO calls on platforms that
don't support the GPIO programming interface.  That includes the arch-specific
implementation glue otherwise.
This facilitates a new model for GPIO usage: drivers that can use GPIOs if
they're available, but don't require them.  One example of such a driver is
NAND driver for various FreeScale chips.  On platforms update with GPIO
support, they can be used instead of a worst-case delay to verify that the
BUSY signal is off.
(Also includes a couple minor unrelated doc updates.)
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Harvey Harrison [Tue, 4 Mar 2008 22:28:26 +0000 (14:28 -0800)]
 
specialix.c: fix possible double-unlock
Noticed by sparse, trivial to see:
drivers/char/specialix.c:2112:3: warning: context imbalance in 'sx_throttle' - unexpected unlock
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Byron Bradley [Tue, 4 Mar 2008 22:28:25 +0000 (14:28 -0800)]
 
rtc: add support for the S-35390A RTC chip
This adds basic get/set time support for the Seiko Instruments S-35390A.
This chip communicates using I2C and is used on the QNAP TS-109/TS-209 NAS
devices.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Byron Bradley <byron.bbradley@gmail.com>
Acked-by: Jean Delvare <khali@linux-fr.org>
Acked-by: David Brownell <david-b@pacbell.net>
Tested-by: Tim Ellis <tim@ngndg.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Balbir Singh [Tue, 4 Mar 2008 22:28:24 +0000 (14:28 -0800)]
 
Memory Resource Controller use strstrip while parsing arguments
The memory controller has a requirement that while writing values, we need
to use echo -n. This patch fixes the problem and makes the UI more consistent.
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jesper Nilsson [Tue, 4 Mar 2008 22:28:23 +0000 (14:28 -0800)]
 
CRIS v10: Include mm.h instead of vmstat.h in kernel/time.c
Commit 
2f569afd9ced9ebec9a6eb3dbf6f83429be0a7b4
(CONFIG_HIGHPTE vs. sub-page page tables) introduced use of
inc_zone_page_state and dec_zone_page_state in include/linux/mm.h.
Those are defined in include/linux/vmstat.h, but after it includes
mm.h, making it impossible to include vmstat.h since inc_zone_page_state
and dec_zone_page_state then would be undefined.
arch/cris/arch-v10/kernel/time.c does just this, which makes the
CRIS v10 build break with the following error:
...
  CC      arch/cris/arch-v10/kernel/time.o
In file included from include/linux/vmstat.h:7,
                 from arch/cris/arch-v10/kernel/time.c:17:
include/linux/mm.h: In function 'pgtable_page_ctor':
include/linux/mm.h:902: error: implicit declaration of function 'inc_zone_page_state'
include/linux/mm.h: In function 'pgtable_page_dtor':
include/linux/mm.h:908: error: implicit declaration of function 'dec_zone_page_state'
make[2]: *** [arch/cris/arch-v10/kernel/time.o] Error 1
make[1]: *** [arch/cris/arch-v10/kernel] Error 2
make: *** [sub-make] Error 2
...
By changing kernel/time.c to include linux/mm.h, the build succeeds.
Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Mikael Starvik <mikael.starvik@axis.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bjorn Helgaas [Tue, 4 Mar 2008 22:28:21 +0000 (14:28 -0800)]
 
serial: add PNP ID GVC0303 for Archtek 3334BRV ISA modem
Thomas Lehmann <thomas.lehmann@alumni.tu-berlin.de> verified that this
entry works.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andy Whitcroft [Tue, 4 Mar 2008 22:28:20 +0000 (14:28 -0800)]
 
update checkpatch.pl to version 0.15
This version brings a number of minor fixes updating the type detector and
the unary tracker.  It also brings a few small fixes for false positives.
It also reverts the --file warning.  Of note:
 - limit CVS checks to added lines
 - improved type detections
 - fixes to the unary tracker
Andy Whitcroft (13):
      Version: 0.15
      EXPORT_SYMBOL checks need to accept array variables
      export checks must match DECLARE_foo and LIST_HEAD
      possible types: cleanup debugging missing line
      values: track values through preprocessor conditional paths
      typeof is actually a type
      possible types: detect definitions which cross lines
      values: include line numbers on value debug information
      values: ensure we find correctly record pending brackets
      values: simplify the brace history stack
      CVS keyword checks should only apply to added lines
      loosen spacing for comments
      allow braces for single statement blocks with multiline conditionals
Harvey Harrison (1):
      checkpatch: remove fastcall
Ingo Molnar (1):
      checkpatch.pl: revert wrong --file message
Uwe Kleine-Koenig (1):
      fix typo "goot" -> "good"
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Joel Schopp <jschopp@austin.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Li Zefan [Tue, 4 Mar 2008 22:28:19 +0000 (14:28 -0800)]
 
cgroup: fix default notify_on_release setting
The documentation says the default value of notify_on_release of a child
cgroup is inherited from its parent, which is reasonable, but the
implementation just sets the flag disabled.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Tue, 4 Mar 2008 19:33:24 +0000 (19:33 +0000)]
 
x86: a P4 is a P6 not an i486
P4 has been coming out as CPU_FAMILY=4 instead of 6: fix MPENTIUM4 typo.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Tue, 4 Mar 2008 19:37:36 +0000 (11:37 -0800)]
 
Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  scsi: missing add of padded bytes to io completion byte count
Linus Torvalds [Tue, 4 Mar 2008 19:27:32 +0000 (11:27 -0800)]
 
Merge branch 'upstream-linus' of git://git./linux/kernel/git/mfasheh/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2:
  [PATCH] fs/ocfs2/aops.c: Correct use of ! and &
  [2.6 patch] ocfs2: make dlm_do_assert_master() static
  [2.6 patch] make ocfs2_downconvert_thread() static
  [2.6 patch] fs/ocfs2/: possible cleanups
  [PATCH] ocfs2: le*_add_cpu conversion
  ocfs2: Fix writeout in ocfs2_data_convert_worker()
  ocfs2: Enable localalloc for local mounts
Linus Torvalds [Tue, 4 Mar 2008 19:26:01 +0000 (11:26 -0800)]
 
Merge branch 'fixes' of git://git./linux/kernel/git/djbw/async_tx
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx:
  ioat: fix 'ack' handling, driver must ensure that 'ack' is zero
  dmaengine: fix sparse warning
  fsldma: do not cleanup descriptors in hardirq context
  dmaengine: add driver for Freescale MPC85xx DMA controller
Jens Axboe [Tue, 4 Mar 2008 19:22:54 +0000 (20:22 +0100)]
 
scsi: missing add of padded bytes to io completion byte count
Original patch from Tejun Heo <htejun@gmail.com> but should use ->extra_len
and not ->data_len, as we would then overshoot the original request size.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Linus Torvalds [Tue, 4 Mar 2008 17:23:28 +0000 (09:23 -0800)]
 
Merge branch 'for-linus' of git://git./linux/kernel/git/mingo/linux-2.6-sched-devel
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-devel:
  sched: revert load_balance_monitor() changes
Linus Torvalds [Tue, 4 Mar 2008 17:22:32 +0000 (09:22 -0800)]
 
Merge branch 'for-linus' of git://git./linux/kernel/git/x86/linux-2.6-x86
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86:
  x86/xen: fix DomU boot problem
  x86: not set node to cpu_to_node if the node is not online
  x86, i387: fix ptrace leakage using init_fpu()
Linus Torvalds [Tue, 4 Mar 2008 17:22:05 +0000 (09:22 -0800)]
 
Merge branch 'for-linus' of git://git./linux/kernel/git/avi/kvm
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm:
  x86: disable KVM for Voyager and friends
  KVM: VMX: Avoid rearranging switched guest msrs while they are loaded
  KVM: MMU: Fix race when instantiating a shadow pte
  KVM: Route irq 0 to vcpu 0 exclusively
  KVM: Avoid infinite-frequency local apic timer
  KVM: make MMU_DEBUG compile again
  KVM: move alloc_apic_access_page() outside of non-preemptable region
  KVM: SVM: fix Windows XP 64 bit installation crash
  KVM: remove the usage of the mmap_sem for the protection of the memory slots.
  KVM: emulate access to MSR_IA32_MCG_CTL
  KVM: Make the supported cpuid list a host property rather than a vm property
  KVM: Fix kvm_arch_vcpu_ioctl_set_sregs so that set_cr0 works properly
  KVM: SVM: set NM intercept when enabling CR0.TS in the guest
  KVM: SVM: Fix lazy FPU switching
Dan Williams [Sat, 1 Mar 2008 14:52:14 +0000 (07:52 -0700)]
 
ioat: fix 'ack' handling, driver must ensure that 'ack' is zero
Initialize 'ack' to zero in case the descriptor has been recycled.
Prevents "kernel BUG at crypto/async_tx/async_xor.c:185!"
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@intel.com>
Cc: stable@kernel.org
Dan Williams [Sat, 1 Mar 2008 14:51:29 +0000 (07:51 -0700)]
 
dmaengine: fix sparse warning
include/linux/dmaengine.h:364:2: warning: returning void-valued expression
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Sat, 1 Mar 2008 14:51:17 +0000 (07:51 -0700)]
 
fsldma: do not cleanup descriptors in hardirq context
"Cleaning" descriptors involves calling pending callbacks and clients
assume that their callback will only ever happen in softirq context.
Delay cleanup to the tasklet.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Zhang Wei <wei.zhang@freescale.com>
Zhang Wei [Sat, 1 Mar 2008 14:42:48 +0000 (07:42 -0700)]
 
dmaengine: add driver for Freescale MPC85xx DMA controller
The driver implements DMA engine API for Freescale MPC85xx DMA controller,
which could be used by devices in the silicon.  The driver supports the
Basic mode of Freescale MPC85xx DMA controller.  The MPC85xx processors
supported include MPC8540/60, MPC8555, MPC8548, MPC8641 and so on.
The MPC83xx(MPC8349, MPC8360) are also supported.
[kamalesh@linux.vnet.ibm.com: build fix]
[dan.j.williams@intel.com: merge mm fixes, rebase on async_tx-2.6.25]
Signed-off-by: Zhang Wei <wei.zhang@freescale.com>
Signed-off-by: Ebony Zhu <ebony.zhu@freescale.com>
Acked-by: Kumar Gala <galak@gate.crashing.org>
Cc: Shannon Nelson <shannon.nelson@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Peter Zijlstra [Mon, 25 Feb 2008 16:34:02 +0000 (17:34 +0100)]
 
sched: revert load_balance_monitor() changes
The following commits cause a number of regressions:
  commit 
58e2d4ca581167c2a079f4ee02be2f0bc52e8729
  Author: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
  Date:   Fri Jan 25 21:08:00 2008 +0100
  sched: group scheduling, change how cpu load is calculated
  commit 
6b2d7700266b9402e12824e11e0099ae6a4a6a79
  Author: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
  Date:   Fri Jan 25 21:08:00 2008 +0100
  sched: group scheduler, fix fairness of cpu bandwidth allocation for task groups
Namely:
 - very frequent wakeups on SMP, reported by PowerTop users.
 - cacheline trashing on (large) SMP
 - some latencies larger than 500ms
While there is a mergeable patch to fix the latter, the former issues
are not fixable in a manner suitable for .25 (we're at -rc3 now).
Hence we revert them and try again in v2.6.26.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Tested-by: Alexey Zaytsev <alexey.zaytsev@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ian Campbell [Thu, 28 Feb 2008 23:16:49 +0000 (23:16 +0000)]
 
x86/xen: fix DomU boot problem
Construct Xen guest e820 map with a hole between 640K-1M.
It's pure luck that Xen kernels have gotten away with it in the past.
The patch below seems like the right thing to do. It certainly boots in
a domU without the DMI problem (without any of the other related patches
such as Alexander's).
Signed-off-by: Ian Campbell <ijc@hellion.org.uk>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Tested-by: Mark McLoughlin <markmc@redhat.com>
Acked-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Yinghai Lu [Tue, 19 Feb 2008 23:35:54 +0000 (15:35 -0800)]
 
x86: not set node to cpu_to_node if the node is not online
resolve boot problem reported by Mel Gorman:
   http://lkml.org/lkml/2008/2/13/404
init_cpu_to_node will use cpu->apic (from MADT or mptable) and
apic->node(from SRAT or AMD config space with k8_bus_64.c) to have
cpu->node mapping, and later identify_cpu will overwrite them
again...(with nearby_node...)
this patch checks if the node is online, otherwise it will not
update cpu_node map. so keep cpu_node map to online node before
identify_cpu..., to prevent possible error.
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Suresh Siddha [Mon, 3 Mar 2008 21:01:08 +0000 (13:01 -0800)]
 
x86, i387: fix ptrace leakage using init_fpu()
This bug got introduced by the recent i387 merge:
  commit 
4421011120b2304e5c248ae4165a2704588aedf1
  Author: Roland McGrath <roland@redhat.com>
  Date:   Wed Jan 30 13:31:50 2008 +0100
      x86: x86 i387 user_regset
Current usage of unlazy_fpu() in ptrace specific routines is wrong.
unlazy_fpu() will not init fpu if the task never used math. So the
ptrace calls can expose the parent tasks FPU data in some cases.
Replace it with the init_fpu() which will init the math state, if the
task never used math before.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Linus Torvalds [Tue, 4 Mar 2008 16:08:05 +0000 (08:08 -0800)]
 
Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  block: fix blkdev_issue_flush() not detecting and passing EOPNOTSUPP back
  block: fix shadowed variable warning in blk-map.c
  block: remove extern on function definition
  cciss: remove READ_AHEAD define and use block layer defaults
  make cdrom.c:check_for_audio_disc() static
  block/genhd.c: proper externs
  unexport blk_rq_map_user_iov
  unexport blk_{get,put}_queue
  block/genhd.c: cleanups
  proper prototype for blk_dev_init()
  block/blk-tag.c should #include "blk.h"
  Fix DMA access of block device in 64-bit kernel on some non-x86 systems with 4GB or upper 4GB memory
  block: separate out padding from alignment
  block: restore the meaning of rq->data_len to the true data length
  resubmit: cciss: procfs updates to display info about many
  splice: only return -EAGAIN if there's hope of more data
  block: fix kernel-docbook parameters and files
Geert Uytterhoeven [Tue, 4 Mar 2008 08:18:16 +0000 (09:18 +0100)]
 
m68k{,nommu}: Wire up new timerfd syscalls
m68k{,nommu}: Wire up the new timerfd syscalls, which were introduced in
commit 
4d672e7ac79b5ec5cdc90e450823441e20464691 ("timerfd: new timerfd API").
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Greg Ungerer [Tue, 4 Mar 2008 06:52:01 +0000 (16:52 +1000)]
 
m68knommu: fix fec driver interrupt races
The FEC driver has a common interrupt handler for all interrupt event
types. It is raised on a number of distinct interrupt vectors.
This handler can't be re-entered while processing an interrupt, so
make sure all requested vectors are flagged as IRQF_DISABLED.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Greg Ungerer [Tue, 4 Mar 2008 06:35:04 +0000 (16:35 +1000)]
 
m68knommu: declare do_IRQ()
Need a declaration of do_IRQ for the 68328 interrupt handling code.
It is common to all m68knommu targets, so a common declaration makes
sense.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>