Paul Gortmaker [Sat, 20 Jun 2015 23:02:32 +0000 (19:02 -0400)]
clocksource: Increase dependencies of timer-stm32 to limit build wreckage
This driver leaks out into arch/parisc builds that don't have
CONFIG_GENERIC_CLOCKEVENTS, leading to the following (truncated)
wreckage:
CC drivers/clocksource/timer-stm32.o
drivers/clocksource/timer-stm32.c:38:28: error: field 'evtdev' has incomplete type
drivers/clocksource/timer-stm32.c:44:19: warning: 'enum clock_event_mode' declared inside parameter list
drivers/clocksource/timer-stm32.c:44:19: warning: its scope is only this definition or declaration, which is probably not what you want
drivers/clocksource/timer-stm32.c:43:62: error: parameter 1 ('mode') has incomplete type
drivers/clocksource/timer-stm32.c:43:13: error: function declaration isn't a prototype
drivers/clocksource/timer-stm32.c: In function 'stm32_clock_event_set_mode':
drivers/clocksource/timer-stm32.c:47:3: error: type defaults to 'int' in declaration of '__mptr'
drivers/clocksource/timer-stm32.c:47:3: warning: initialization from incompatible pointer type
drivers/clocksource/timer-stm32.c:51:7: error: 'CLOCK_EVT_MODE_PERIODIC' undeclared (first use in this function)
drivers/clocksource/timer-stm32.c:51:7: note: each undeclared identifier is reported only once for each function it appears in
drivers/clocksource/timer-stm32.c:56:7: error: 'CLOCK_EVT_MODE_ONESHOT' undeclared (first use in this function)
Tighten up the dependencies to limit where it gets built by copying
the style of the Kconfig line for CLKSRC_EFM32 a few lines above.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Chanwoo Choi <cw00.choi@samsung.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: http://lkml.kernel.org/r/1434841352-24300-1-git-send-email-paul.gortmaker@windriver.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Thomas Gleixner [Tue, 26 May 2015 22:50:35 +0000 (22:50 +0000)]
timer: Minimize nohz off overhead
If nohz is disabled on the kernel command line the [hr]timer code
still calls wake_up_nohz_cpu() and tick_nohz_full_cpu(), a pretty
pointless exercise. Cache nohz_active in [hr]timer per cpu bases and
avoid the overhead.
Before:
48.10% hog [.] main
15.25% [kernel] [k] _raw_spin_lock_irqsave
9.76% [kernel] [k] _raw_spin_unlock_irqrestore
6.50% [kernel] [k] mod_timer
6.44% [kernel] [k] lock_timer_base.isra.38
3.87% [kernel] [k] detach_if_pending
3.80% [kernel] [k] del_timer
2.67% [kernel] [k] internal_add_timer
1.33% [kernel] [k] __internal_add_timer
0.73% [kernel] [k] timerfn
0.54% [kernel] [k] wake_up_nohz_cpu
After:
48.73% hog [.] main
15.36% [kernel] [k] _raw_spin_lock_irqsave
9.77% [kernel] [k] _raw_spin_unlock_irqrestore
6.61% [kernel] [k] lock_timer_base.isra.38
6.42% [kernel] [k] mod_timer
3.90% [kernel] [k] detach_if_pending
3.76% [kernel] [k] del_timer
2.41% [kernel] [k] internal_add_timer
1.39% [kernel] [k] __internal_add_timer
0.76% [kernel] [k] timerfn
We probably should have a cached value for nohz full in the per cpu
bases as well to avoid the cpumask check. The base cache line is hot
already, the cpumask not necessarily.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Joonwoo Park <joonwoop@codeaurora.org>
Cc: Wenbo Wang <wenbo.wang@memblaze.com>
Link: http://lkml.kernel.org/r/20150526224512.207378134@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Thomas Gleixner [Tue, 26 May 2015 22:50:33 +0000 (22:50 +0000)]
timer: Reduce timer migration overhead if disabled
Eric reported that the timer_migration sysctl is not really nice
performance wise as it needs to check at every timer insertion whether
the feature is enabled or not. Further the check does not live in the
timer code, so we have an extra function call which checks an extra
cache line to figure out that it is disabled.
We can do better and store that information in the per cpu (hr)timer
bases. I pondered to use a static key, but that's a nightmare to
update from the nohz code and the timer base cache line is hot anyway
when we select a timer base.
The old logic enabled the timer migration unconditionally if
CONFIG_NO_HZ was set even if nohz was disabled on the kernel command
line.
With this modification, we start off with migration disabled. The user
visible sysctl is still set to enabled. If the kernel switches to NOHZ
migration is enabled, if the user did not disable it via the sysctl
prior to the switch. If nohz=off is on the kernel command line,
migration stays disabled no matter what.
Before:
47.76% hog [.] main
14.84% [kernel] [k] _raw_spin_lock_irqsave
9.55% [kernel] [k] _raw_spin_unlock_irqrestore
6.71% [kernel] [k] mod_timer
6.24% [kernel] [k] lock_timer_base.isra.38
3.76% [kernel] [k] detach_if_pending
3.71% [kernel] [k] del_timer
2.50% [kernel] [k] internal_add_timer
1.51% [kernel] [k] get_nohz_timer_target
1.28% [kernel] [k] __internal_add_timer
0.78% [kernel] [k] timerfn
0.48% [kernel] [k] wake_up_nohz_cpu
After:
48.10% hog [.] main
15.25% [kernel] [k] _raw_spin_lock_irqsave
9.76% [kernel] [k] _raw_spin_unlock_irqrestore
6.50% [kernel] [k] mod_timer
6.44% [kernel] [k] lock_timer_base.isra.38
3.87% [kernel] [k] detach_if_pending
3.80% [kernel] [k] del_timer
2.67% [kernel] [k] internal_add_timer
1.33% [kernel] [k] __internal_add_timer
0.73% [kernel] [k] timerfn
0.54% [kernel] [k] wake_up_nohz_cpu
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Joonwoo Park <joonwoop@codeaurora.org>
Cc: Wenbo Wang <wenbo.wang@memblaze.com>
Link: http://lkml.kernel.org/r/20150526224512.127050787@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Thomas Gleixner [Tue, 26 May 2015 22:50:31 +0000 (22:50 +0000)]
timer: Stats: Simplify the flags handling
Simplify the handling of the flag storage for the timer statistics. No
intermediate storage anymore. Just hand over the flags field.
I left the printout of 'deferrable' for now because changing this
would be an ABI update and I have no idea how strong people feel about
that. OTOH, I wonder whether we should kill the whole timer stats
stuff because all of that information can be retrieved via ftrace/perf
as well.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Joonwoo Park <joonwoop@codeaurora.org>
Cc: Wenbo Wang <wenbo.wang@memblaze.com>
Link: http://lkml.kernel.org/r/20150526224512.046626248@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Thomas Gleixner [Tue, 26 May 2015 22:50:29 +0000 (22:50 +0000)]
timer: Replace timer base by a cpu index
Instead of storing a pointer to the per cpu tvec_base we can simply
cache a CPU index in the timer_list and use that to get hold of the
correct per cpu tvec_base. This is only used in lock_timer_base() and
the slightly larger code is peanuts versus the spinlock operation and
the d-cache foot print of the timer wheel.
Aside of that this allows to get rid of following nuisances:
- boot_tvec_base
That statically allocated 4k bss data is just kept around so the
timer has a home when it gets statically initialized. It serves no
other purpose.
With the CPU index we assign the timer to CPU0 at static
initialization time and therefor can avoid the whole boot_tvec_base
dance. That also simplifies the init code, which just can use the
per cpu base.
Before:
text data bss dec hex filename
17491 9201 4160 30852 7884 ../build/kernel/time/timer.o
After:
text data bss dec hex filename
17440 9193 0 26633 6809 ../build/kernel/time/timer.o
- Overloading the base pointer with various flags
The CPU index has enough space to hold the flags (deferrable,
irqsafe) so we can get rid of the extra masking and bit fiddling
with the base pointer.
As a benefit we reduce the size of struct timer_list on 64 bit
machines. 4 - 8 bytes, a size reduction up to 15% per struct timer_list,
which is a real win as we have tons of them embedded in other structs.
This changes also the newly added deferrable printout of the timer
start trace point to capture and print all timer->flags, which allows
us to decode the target cpu of the timer as well.
We might have used bitfields for this, but that would change the
static initializers and the init function for no value to accomodate
big endian bitfields.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Joonwoo Park <joonwoop@codeaurora.org>
Cc: Wenbo Wang <wenbo.wang@memblaze.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Badhri Jagan Sridharan <Badhri@google.com>
Link: http://lkml.kernel.org/r/20150526224511.950084301@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Thomas Gleixner [Tue, 26 May 2015 22:50:28 +0000 (22:50 +0000)]
timer: Use hlist for the timer wheel hash buckets
This reduces the size of struct tvec_base by 50% and results in
slightly smaller code as well.
Before:
struct tvec_base: size: 8256, cachelines: 129
text data bss dec hex filename
17698 13297 8256 39251 9953 ../build/kernel/time/timer.o
After:
struct tvec_base: 4160, cachelines: 65
text data bss dec hex filename
17491 9201 4160 30852 7884 ../build/kernel/time/timer.o
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Joonwoo Park <joonwoop@codeaurora.org>
Cc: Wenbo Wang <wenbo.wang@memblaze.com>
Link: http://lkml.kernel.org/r/20150526224511.854731214@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Thomas Gleixner [Tue, 26 May 2015 22:50:26 +0000 (22:50 +0000)]
timer: Remove FIFO "guarantee"
The FIFO guarantee is only there if two timers are queued into the
same bucket at the same jiffie on the same cpu:
- The slack value depends on the delta between expiry and enqueue
time, so the resulting expiry time can be different for timers
which are queued in different jiffies.
- Timers which are queued into the secondary array end up after a
later queued timer which was queued into the primary array due to
cascading.
- Timers can end up on different cpus due to the NOHZ target moving
around. Obviously there is no guarantee of expiry ordering between
cpus.
So anything which relies on FIFO behaviour of the timer wheel is
broken already.
This is a preparatory patch for converting the timer wheel to hlist
which reduces the memory foot print of the wheel by 50%.
It's a seperate patch so any (unlikely to happen) regression caused by
this can be identified clearly.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Joonwoo Park <joonwoop@codeaurora.org>
Cc: Wenbo Wang <wenbo.wang@memblaze.com>
Cc: George Spelvin <linux@horizon.com>
Link: http://lkml.kernel.org/r/20150526224511.757520403@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Thomas Gleixner [Tue, 26 May 2015 22:50:24 +0000 (22:50 +0000)]
timers: Sanitize catchup_timer_jiffies() usage
catchup_timer_jiffies() has been applied blindly to several functions
without looking for possible better ways to do it.
1) internal_add_timer()
Move the update to base->all_timers before we actually insert the
timer into the wheel.
2) detach_if_pending()
Again the update to base->all_timers allows us to explicitely do
the timer_jiffies update in place, if this was the last timer which
got removed.
3) __run_timers()
We only check on entry, which is silly, because base->timer_jiffies
can be behind - especially on NOHZ kernels - and if there is a
single deferrable timer somewhere between base->timer_jiffies and
jiffies we expire it and then loop until base->timer_jiffies ==
jiffies.
Move it into the loop.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Joonwoo Park <joonwoop@codeaurora.org>
Cc: Wenbo Wang <wenbo.wang@memblaze.com>
Link: http://lkml.kernel.org/r/20150526224511.662994644@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Peter Zijlstra [Thu, 11 Jun 2015 12:46:48 +0000 (14:46 +0200)]
hrtimer: Allow hrtimer::function() to free the timer
Currently an hrtimer callback function cannot free its own timer
because __run_hrtimer() still needs to clear HRTIMER_STATE_CALLBACK
after it. Freeing the timer would result in a clear use-after-free.
Solve this by using a scheme similar to regular timers; track the
current running timer in hrtimer_clock_base::running.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: ktkhai@parallels.com
Cc: rostedt@goodmis.org
Cc: juri.lelli@gmail.com
Cc: pang.xunlei@linaro.org
Cc: wanpeng.li@linux.intel.com
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: umgwanakikbuti@gmail.com
Link: http://lkml.kernel.org/r/20150611124743.471563047@infradead.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Peter Zijlstra [Wed, 17 Jun 2015 12:29:24 +0000 (14:29 +0200)]
seqcount: Introduce raw_write_seqcount_barrier()
Introduce raw_write_seqcount_barrier(), a new construct that can be
used to provide write barrier semantics in seqcount read loops instead
of the usual consistency guarantee.
raw_write_seqcount_barier() is equivalent to:
raw_write_seqcount_begin();
raw_write_seqcount_end();
But avoids issueing two back-to-back smp_wmb() instructions.
This construct works because the read side will 'stall' when observing
odd values. This means that -- referring to the example in the comment
below -- even though there is no (matching) read barrier between the
loads of X and Y, we cannot observe !x && !y, because:
- if we observe Y == false we must observe the first sequence
increment, which makes us loop, until
- we observe !(seq & 1) -- the second sequence increment -- at which
time we must also observe T == true.
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: umgwanakikbuti@gmail.com
Cc: ktkhai@parallels.com
Cc: rostedt@goodmis.org
Cc: juri.lelli@gmail.com
Cc: pang.xunlei@linaro.org
Cc: oleg@redhat.com
Cc: wanpeng.li@linux.intel.com
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20150617122924.GP3644@twins.programming.kicks-ass.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Peter Zijlstra [Thu, 11 Jun 2015 12:46:46 +0000 (14:46 +0200)]
seqcount: Rename write_seqcount_barrier()
I'll shortly be introducing another seqcount primitive that's useful
to provide ordering semantics and would like to use the
write_seqcount_barrier() name for that.
Seeing how there's only one user of the current primitive, lets rename
it to invalidate, as that appears what its doing.
While there, employ lockdep_assert_held() instead of
assert_spin_locked() to not generate debug code for regular kernels.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: ktkhai@parallels.com
Cc: rostedt@goodmis.org
Cc: juri.lelli@gmail.com
Cc: pang.xunlei@linaro.org
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: wanpeng.li@linux.intel.com
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: umgwanakikbuti@gmail.com
Link: http://lkml.kernel.org/r/20150611124743.279926217@infradead.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Peter Zijlstra [Thu, 11 Jun 2015 12:46:45 +0000 (14:46 +0200)]
hrtimer: Fix hrtimer_is_queued() hole
A queued hrtimer that gets restarted (hrtimer_start*() while
hrtimer_is_queued()) will briefly appear as unqueued/inactive, even
though the timer has always been active, we just moved it.
Close this hole by preserving timer->state in
hrtimer_start_range_ns()'s remove_hrtimer() call.
Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: ktkhai@parallels.com
Cc: rostedt@goodmis.org
Cc: juri.lelli@gmail.com
Cc: pang.xunlei@linaro.org
Cc: wanpeng.li@linux.intel.com
Cc: umgwanakikbuti@gmail.com
Link: http://lkml.kernel.org/r/20150611124743.175989138@infradead.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Oleg Nesterov [Thu, 11 Jun 2015 12:46:44 +0000 (14:46 +0200)]
hrtimer: Remove HRTIMER_STATE_MIGRATE
I do not understand HRTIMER_STATE_MIGRATE. Unless I am totally
confused it looks buggy and simply unneeded.
migrate_hrtimer_list() sets it to keep hrtimer_active() == T, but this
is not enough: this can fool, say, hrtimer_is_queued() in
dequeue_signal().
Can't migrate_hrtimer_list() simply use HRTIMER_STATE_ENQUEUED?
This fixes the race and we can kill STATE_MIGRATE.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: ktkhai@parallels.com
Cc: rostedt@goodmis.org
Cc: juri.lelli@gmail.com
Cc: pang.xunlei@linaro.org
Cc: wanpeng.li@linux.intel.com
Cc: umgwanakikbuti@gmail.com
Link: http://lkml.kernel.org/r/20150611124743.072387650@infradead.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
John Stultz [Wed, 17 Jun 2015 18:16:43 +0000 (11:16 -0700)]
selftest: Timers: Avoid signal deadlock in leap-a-day
In
0c4a5fc95b1df (Add leap-second timer edge testing to
leap-a-day.c), we added a timer to the test which checks to make
sure timers near the leapsecond edge behave correctly.
However, the output generated from the timer uses ctime_r, which
isn't async-signal safe, and should that signal land while the
main test is using ctime_r to print its output, its possible for
the test to deadlock on glibc internal locks.
Thus this patch reworks the output to avoid using ctime_r in
the signal handler.
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jiri Bohac <jbohac@suse.cz>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Cc: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/1434565003-3386-1-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
John Stultz [Wed, 17 Jun 2015 17:05:53 +0000 (10:05 -0700)]
timekeeping: Copy the shadow-timekeeper over the real timekeeper last
The fix in
d151832650ed9 (time: Move clock_was_set_seq update
before updating shadow-timekeeper) was unfortunately incomplete.
The main gist of that change was to do the shadow-copy update
last, so that any state changes were properly duplicated, and
we wouldn't accidentally have stale data in the shadow.
Unfortunately in the main update_wall_time() logic, we update
use the shadow-timekeeper to calculate the next update values,
then while holding the lock, copy the shadow-timekeeper over,
then call timekeeping_update() to do some additional
bookkeeping, (skipping the shadow mirror). The bug with this is
the additional bookkeeping isn't all read-only, and some
changes timkeeper state. Thus we might then overwrite this state
change on the next update.
To avoid this problem, do the timekeeping_update() on the
shadow-timekeeper prior to copying the full state over to
the real-timekeeper.
This avoids problems with both the clock_was_set_seq and
next_leap_ktime being overwritten and possibly the
fast-timekeepers as well.
Many thanks to Prarit for his rigorous testing, which discovered
this problem, along with Prarit and Daniel's work validating this
fix.
Reported-by: Prarit Bhargava <prarit@redhat.com>
Tested-by: Prarit Bhargava <prarit@redhat.com>
Tested-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jiri Bohac <jbohac@suse.cz>
Cc: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/1434560753-7441-1-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Viresh Kumar [Wed, 17 Jun 2015 10:34:46 +0000 (16:04 +0530)]
clockevents: Check state instead of mode in suspend/resume path
CLOCK_EVT_MODE_* macros are present for backward compatibility (as most
of the drivers are still using old ->set_mode() interface).
These macro's shouldn't be used anymore in code, that is common to both
driver interfaces, i.e. ->set_mode() and ->set_state_*().
Drivers implementing ->set_state_*() interface, which have their
clkevt->mode set to 0 (clkevt device structures are normally globally
defined), will not participate in suspend/resume as they will always be
marked as UNUSED.
Fix this by checking state of the clockevent device instead of mode,
which is updated for both the interfaces.
Fixes:
ac34ad27fc16 ("clockevents: Do not suspend/resume if unused")
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: linaro-kernel@lists.linaro.org
Cc: alexandre.belloni@free-electrons.com
Cc: sylvain.rochet@finsecur.com
Link: http://lkml.kernel.org/r/a1964eef6e8a47d02b1ff9083c6c91f73f0ff643.1434537215.git.viresh.kumar@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
John Stultz [Thu, 11 Jun 2015 22:54:57 +0000 (15:54 -0700)]
selftests: timers: Add leap-second timer edge testing to leap-a-day.c
Prarit reported an issue w/ timers around the leapsecond, where a
timer set for Midnight UTC (00:00:00) might fire a second early right
before the leapsecond (23:59:60 - though it appears as a repeated
23:59:59) is applied.
So I've updated the leap-a-day.c test to integrate a similar test,
where we set a timer and check if it triggers at the right time, and
if the ntp state transition is managed properly.
Reported-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Reported-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jiri Bohac <jbohac@suse.cz>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Cc: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/1434063297-28657-6-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
John Stultz [Thu, 11 Jun 2015 22:54:56 +0000 (15:54 -0700)]
ntp: Do leapsecond adjustment in adjtimex read path
Since the leapsecond is applied at tick-time, this means there is a
small window of time at the start of a leap-second where we cross into
the next second before applying the leap.
This patch modified adjtimex so that the leap-second is applied on the
second edge. Providing more correct leapsecond behavior.
This does make it so that adjtimex()'s returned time values can be
inconsistent with time values read from gettimeofday() or
clock_gettime(CLOCK_REALTIME,...) for a brief period of one tick at
the leapsecond. However, those other interfaces do not provide the
TIME_OOP time_state return that adjtimex() provides, which allows the
leapsecond to be properly represented. They instead only see a time
discontinuity, and cannot tell the first 23:59:59 from the repeated
23:59:59 leap second.
This seems like a reasonable tradeoff given clock_gettime() /
gettimeofday() cannot properly represent a leapsecond, and users
likely care more about performance, while folks who are using
adjtimex() more likely care about leap-second correctness.
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jiri Bohac <jbohac@suse.cz>
Cc: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/1434063297-28657-5-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
John Stultz [Thu, 11 Jun 2015 22:54:55 +0000 (15:54 -0700)]
time: Prevent early expiry of hrtimers[CLOCK_REALTIME] at the leap second edge
Currently, leapsecond adjustments are done at tick time. As a result,
the leapsecond was applied at the first timer tick *after* the
leapsecond (~1-10ms late depending on HZ), rather then exactly on the
second edge.
This was in part historical from back when we were always tick based,
but correcting this since has been avoided since it adds extra
conditional checks in the gettime fastpath, which has performance
overhead.
However, it was recently pointed out that ABS_TIME CLOCK_REALTIME
timers set for right after the leapsecond could fire a second early,
since some timers may be expired before we trigger the timekeeping
timer, which then applies the leapsecond.
This isn't quite as bad as it sounds, since behaviorally it is similar
to what is possible w/ ntpd made leapsecond adjustments done w/o using
the kernel discipline. Where due to latencies, timers may fire just
prior to the settimeofday call. (Also, one should note that all
applications using CLOCK_REALTIME timers should always be careful,
since they are prone to quirks from settimeofday() disturbances.)
However, the purpose of having the kernel do the leap adjustment is to
avoid such latencies, so I think this is worth fixing.
So in order to properly keep those timers from firing a second early,
this patch modifies the ntp and timekeeping logic so that we keep
enough state so that the update_base_offsets_now accessor, which
provides the hrtimer core the current time, can check and apply the
leapsecond adjustment on the second edge. This prevents the hrtimer
core from expiring timers too early.
This patch does not modify any other time read path, so no additional
overhead is incurred. However, this also means that the leap-second
continues to be applied at tick time for all other read-paths.
Apologies to Richard Cochran, who pushed for similar changes years
ago, which I resisted due to the concerns about the performance
overhead.
While I suspect this isn't extremely critical, folks who care about
strict leap-second correctness will likely want to watch
this. Potentially a -stable candidate eventually.
Originally-suggested-by: Richard Cochran <richardcochran@gmail.com>
Reported-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Reported-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jiri Bohac <jbohac@suse.cz>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Cc: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/1434063297-28657-4-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
John Stultz [Thu, 11 Jun 2015 22:54:54 +0000 (15:54 -0700)]
ntp: Introduce and use SECS_PER_DAY macro instead of 86400
Currently the leapsecond logic uses what looks like magic values.
Improve this by defining SECS_PER_DAY and using that macro
to make the logic more clear.
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jiri Bohac <jbohac@suse.cz>
Cc: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/1434063297-28657-3-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
John Stultz [Thu, 11 Jun 2015 22:54:53 +0000 (15:54 -0700)]
time: Move clock_was_set_seq update before updating shadow-timekeeper
It was reported that
868a3e915f7f5eba (hrtimer: Make offset
update smarter) was causing timer problems after suspend/resume.
The problem with that change is the modification to
clock_was_set_seq in timekeeping_update is done prior to
mirroring the time state to the shadow-timekeeper. Thus the
next time we do update_wall_time() the updated sequence is
overwritten by whats in the shadow copy.
This patch moves the shadow-timekeeper mirroring to the end
of the function, after all updates have been made, so all data
is kept in sync.
(This patch also affects the update_fast_timekeeper calls which
were also problematically done prior to the mirroring).
Reported-and-tested-by: Jeremiah Mahler <jmmahler@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/1434063297-28657-2-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Joe Perches [Mon, 25 May 2015 18:49:55 +0000 (11:49 -0700)]
clocksource: Use current logging style
clocksource messages aren't prefixed in dmesg so it's a bit unclear
what subsystem emits the messages.
Use pr_fmt and pr_<level> to auto-prefix the messages appropriately.
Miscellanea:
o Remove "Warning" from KERN_WARNING level messages
o Align "timekeeping watchdog: " messages
o Coalesce formats
o Align multiline arguments
Signed-off-by: Joe Perches <joe@perches.com>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/1432579795.2846.75.camel@perches.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Nicholas Mc Guire [Thu, 28 May 2015 17:09:56 +0000 (19:09 +0200)]
time: Allow gcc to fold usecs_to_jiffies(constant)
To allow constant folding in usecs_to_jiffies() conditionally calls
the HZ dependent _usecs_to_jiffies() helpers or, when gcc can not
figure out constant folding, __usecs_to_jiffies, which is the renamed
original usecs_to_jiffies() function.
Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Cc: Masahiro Yamada <yamada.m@jp.panasonic.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Joe Perches <joe@perches.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Andrew Hunter <ahh@google.com>
Cc: Paul Turner <pjt@google.com>
Cc: Michal Marek <mmarek@suse.cz>
Link: http://lkml.kernel.org/r/1432832996-12129-2-git-send-email-hofrat@osadl.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Nicholas Mc Guire [Thu, 28 May 2015 17:09:55 +0000 (19:09 +0200)]
time: Refactor usecs_to_jiffies
Refactor the usecs_to_jiffies conditional code part in time.c and
jiffies.h putting it into conditional functions rather than #ifdefs
to improve readability. This is analogous to the msecs_to_jiffies()
cleanup in commit
ca42aaf0c861 ("time: Refactor msecs_to_jiffies")
Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Cc: Masahiro Yamada <yamada.m@jp.panasonic.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Joe Perches <joe@perches.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Andrew Hunter <ahh@google.com>
Cc: Paul Turner <pjt@google.com>
Cc: Michal Marek <mmarek@suse.cz>
Link: http://lkml.kernel.org/r/1432832996-12129-1-git-send-email-hofrat@osadl.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Borislav Petkov [Sat, 6 Jun 2015 09:30:00 +0000 (11:30 +0200)]
hrtimers: Make sure hrtimer_resolution is unsigned int
... in the !CONFIG_HIGH_RES_TIMERS case too. And thus fix warnings like
this one:
net/sched/sch_api.c: In function ‘psched_show’:
net/sched/sch_api.c:1891:6: warning: format ‘%x’ expects argument of type ‘unsigned int’, but argument 6 has type ‘long int’ [-Wformat=]
(u32)NSEC_PER_SEC / hrtimer_resolution);
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1433583000-32090-1-git-send-email-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Thomas Gleixner [Tue, 2 Jun 2015 14:57:47 +0000 (16:57 +0200)]
Merge branch 'clockevents/4.2' of git.linaro.org/people/daniel.lezcano/linux into timers/core
Pull clockevents/clocksource changes from Daniel Lezcano:
- Removed dead code in the files related to mach-msm for qcom (Stephen Boyd)
- Cleaned up code for exynos_mct (Krzysztof Kozlowski)
- Added the new timer lpc3220 (Joachim Eastwood)
- Added the new timer STM32 and ARM system timer (Maxime Coquelin)
Thomas Gleixner [Tue, 2 Jun 2015 12:30:11 +0000 (14:30 +0200)]
clockevents: Rename state to state_use_accessors
The only sensible way to make abuse of core internal fields obvious
and easy to grep for.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Thomas Gleixner [Tue, 2 Jun 2015 12:13:46 +0000 (14:13 +0200)]
clockevents: Use set/get state helper functions
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Thomas Gleixner [Tue, 2 Jun 2015 12:08:46 +0000 (14:08 +0200)]
clockevents: Provide functions to set and get the state
We want to rename dev->state, so provide proper get and set
functions. Rename clockevents_set_state() to
clockevents_switch_state() to avoid confusion.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Viresh Kumar [Thu, 21 May 2015 08:03:46 +0000 (13:33 +0530)]
clockevents: Use helpers to check the state of a clockevent device
Use accessor functions to check the state of clockevent devices in
core code.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: linaro-kernel@lists.linaro.org
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/fa2b9869fd17f210eaa156ec2b594efd0230b6c7.1432192527.git.viresh.kumar@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Viresh Kumar [Thu, 21 May 2015 08:03:45 +0000 (13:33 +0530)]
clockevents: Add helpers to check the state of a clockevent device
Some clockevent drivers, once migrated to use per-state callbacks,
need to check the state of the clockevent device in their callbacks or
interrupt handler.
Add accessor functions clockevent_state_*() to get this information.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: linaro-kernel@lists.linaro.org
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/04a717d490335c688dd7af899fbcede97e1bb8ee.1432192527.git.viresh.kumar@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Maxime Coquelin [Thu, 28 May 2015 05:05:53 +0000 (07:05 +0200)]
clockevents/drivers/timer-stm32: Fix build warning spotted by kbuild test robot
This patch fixes below warning spotted by kbuild test robot when building
with ARCH=powerpc:
drivers/clocksource/timer-stm32.c: In function 'stm32_clockevent_init':
>> drivers/clocksource/timer-stm32.c:140:9: warning: large integer implicitly
truncated to unsigned type [-Woverflow]
writel_relaxed(~0UL, data->base + TIM_ARR);
The fix consists in using 0U instead of 0UL.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Maxime Coquelin [Fri, 22 May 2015 21:03:33 +0000 (23:03 +0200)]
clockevents/drivers: Add STM32 Timer driver
STM32 MCUs feature 16 and 32 bits general purpose timers with prescalers.
The drivers detects whether the time is 16 or 32 bits, and applies a
1024 prescaler value if it is 16 bits.
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Tested-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Maxime Coquelin [Fri, 22 May 2015 21:03:32 +0000 (23:03 +0200)]
dt-bindings: Document the STM32 timer bindings
This adds documentation of device tree bindings for the
STM32 timer.
Tested-by: Chanwoo Choi <cw00.choi@samsung.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Maxime Coquelin [Sat, 9 May 2015 07:53:46 +0000 (09:53 +0200)]
clocksource/drivers/armv7m_systick: Add ARM System timer driver
This patch adds clocksource support for ARMv7-M's System timer,
also known as SysTick.
Tested-by: Chanwoo Choi <cw00.choi@samsung.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Maxime Coquelin [Sat, 9 May 2015 07:53:45 +0000 (09:53 +0200)]
dt-bindings: Document the ARM System timer bindings
This adds documentation of device tree bindings for the
ARM System timer.
Tested-by: Chanwoo Choi <cw00.choi@samsung.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Joachim Eastwood [Mon, 11 May 2015 22:00:49 +0000 (00:00 +0200)]
doc: dt: Add documentation for lpc3220-timer
Add DT bindings documentation for lpc3220-timer. This timer is
used as clocksource on many NXP platforms.
Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Joachim Eastwood [Mon, 11 May 2015 22:00:48 +0000 (00:00 +0200)]
clocksource/drivers/lpc32xx: Add the lpc32xx timer driver
Add support for using the NXP LPC timer as clocksource and clock
event. These timers are present on many NXP devices including
LPC32xx, LPC17xx, LPC18xx and LPC43xx.
The timer has a 32-bit timer counter register with a programmable
32-bit prescaler. It supports up to 4 compare match values with
interrupt generation and reset/stop timer counter action.
Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Krzysztof Kozlowski [Thu, 30 Apr 2015 04:42:53 +0000 (13:42 +0900)]
clocksource/drivers/exynos_mct: Remove old platform mct_init()
Since commit
228e3023eb04 ("Merge tag 'mct-exynos-for-v3.10' of ...") the
mct_init() was superseded by mct_init_dt() and is not referenced
anywhere. Remove it.
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Krzysztof Kozlowski [Thu, 30 Apr 2015 04:42:52 +0000 (13:42 +0900)]
clocksource/drivers/exynos_mct: Staticize struct clocksource
The struct clocksource 'mct_frc' is not exported and used outside so
make it static.
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Krzysztof Kozlowski [Thu, 30 Apr 2015 04:42:51 +0000 (13:42 +0900)]
clocksource/drivers/exynos_mct: Change exynos4_mct_tick_clear return type to void
Return value of exynos4_mct_tick_clear() was never checked so it can
be safely changed to void.
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Stephen Boyd [Fri, 10 Apr 2015 23:11:02 +0000 (16:11 -0700)]
clocksource/drivers/qcom: Remove dead code
This code is no longer used now that mach-msm has been removed.
Delete it.
Cc: David Brown <davidb@codeaurora.org>
Cc: Bryan Huntsman <bryanh@codeaurora.org>
Cc: Daniel Walker <dwalker@fifo99.com>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Alexandre Belloni [Fri, 16 Jan 2015 09:05:51 +0000 (10:05 +0100)]
clockevents: Do not suspend/resume if unused
There is no point in calling suspend/resume for unused clockevents as
they are already stopped and disabled.
This is really important for AT91 as the hardware is a trainwreck and
takes ages to synchronize.
Reported-by: Sylvain Rochet <sylvain.rochet@finsecur.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Nicolas Ferre <nicolas.ferre@atmel.com>
Cc: Boris Brezillon <boris.brezillon@free-electrons.com>
Cc: Maxime Ripard <maxime.ripard@free-electrons.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/1421399151-26800-1-git-send-email-alexandre.belloni@free-electrons.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Xunlei Pang [Thu, 9 Apr 2015 01:04:42 +0000 (09:04 +0800)]
time: Remove read_boot_clock()
Now that we have a read_boot_clock64() function available on every
architecture, and converted all the users to it, it's time to remove
the (now unused) read_boot_clock() completely from the kernel.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Xunlei Pang <pang.xunlei@linaro.org>
[jstultz: Minor commit message tweak suggested by Ingo]
Signed-off-by: John Stultz <john.stultz@linaro.org>
Xunlei Pang [Thu, 9 Apr 2015 01:04:41 +0000 (09:04 +0800)]
s390: time: Provide read_boot_clock64() and read_persistent_clock64()
As part of addressing the "y2038 problem" for in-kernel uses,
this patch converts read_boot_clock() to read_boot_clock64()
and read_persistent_clock() to read_persistent_clock64() using
timespec64.
Rename some instances of 'timespec' to 'timespec64' in time.c and
related references
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux390@de.ibm.com
Signed-off-by: Xunlei Pang <pang.xunlei@linaro.org>
[jstultz: Fixed minor style and grammer tweaks
pointed out by Ingo]
Signed-off-by: John Stultz <john.stultz@linaro.org>
Xunlei Pang [Thu, 9 Apr 2015 01:04:40 +0000 (09:04 +0800)]
time: Include math64.h in time64.h
On 32-bit systems, timespec64_add_ns() calls __iter_div_u64_rem()
which needs math64.h, and we want to include time64.h in some
cases.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Xunlei Pang <pang.xunlei@linaro.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Badhri Jagan Sridharan [Thu, 7 May 2015 23:20:34 +0000 (16:20 -0700)]
tracing: timer: Add deferrable flag to timer_start
The timer_start event now shows whether the timer is
deferrable in case of a low-res timer. The debug_activate
function now includes a deferrable flag while calling
the trace_timer_start event.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Badhri Jagan Sridharan <Badhri@google.com>
[jstultz: Fixed minor whitespace and grammer tweaks
pointed out by Ingo]
Signed-off-by: John Stultz <john.stultz@linaro.org>
John Stultz [Wed, 13 May 2015 23:04:47 +0000 (16:04 -0700)]
time: Rework debugging variables so they aren't global
Ingo suggested that the timekeeping debugging variables
recently added should not be global, and should be tied
to the timekeeper's read_base.
Thus this patch implements that suggestion.
This version is different from the earlier versions
as it keeps the variables in the timekeeper structure
rather then in the tkr.
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Harald Geyer [Tue, 7 Apr 2015 11:12:35 +0000 (11:12 +0000)]
timekeeping: Provide new API to get the current time resolution
This patch series introduces a new function
u32 ktime_get_resolution_ns(void)
which allows to clean up some driver code.
In particular the IIO subsystem has a function to provide timestamps for
events but no means to get their resolution. So currently the dht11 driver
tries to guess the resolution in a rather messy and convoluted way. We
can do much better with the new code.
This API is not designed to be exposed to user space.
This has been tested on i386, sunxi and mxs.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Harald Geyer <harald@ccbib.org>
[jstultz: Tweaked to make it build after upstream changes]
Signed-off-by: John Stultz <john.stultz@linaro.org>
Sasha Levin [Tue, 2 Dec 2014 04:04:06 +0000 (23:04 -0500)]
time: Make sure tz_minuteswest is set to a valid value when setting time
Invalid values may overflow later, leading to undefined behaviour when
multiplied by 60 to get the amount of seconds.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Thomas Gleixner [Tue, 19 May 2015 15:14:51 +0000 (17:14 +0200)]
jiffies: Remove the extra indentation level
Somehow I missed to clean that up when applying the patches. Fix it up
now.
Reported-by: Joe Perches <joe@perches.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Nicholas Mc Guire <der.herr@hofr.at>
Viresh Kumar [Fri, 3 Apr 2015 03:34:05 +0000 (09:04 +0530)]
clockevents: Stop unused clockevent devices
To avoid getting spurious interrupts on a tickless CPU, clockevent
device can now be stopped by switching to ONESHOT_STOPPED state.
The natural place for handling this transition is tick_program_event().
On 'expires == KTIME_MAX', we skip programming the event and so we need
to fix such call sites as well, to always call tick_program_event()
irrespective of the expires value.
Once the clockevent device is required again, check if it was earlier
put into ONESHOT_STOPPED state. If yes, switch its state to ONESHOT
before programming its event.
To make sure we haven't missed any corner case, add a WARN() for the
case where we try to reprogram clockevent device while we aren't
configured in ONESHOT_STOPPED state.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: linaro-kernel@lists.linaro.org
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/5146b07be7f0bc497e0ebae036590ec2fa73e540.1428031396.git.viresh.kumar@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Viresh Kumar [Fri, 3 Apr 2015 03:34:04 +0000 (09:04 +0530)]
clockevents: Introduce CLOCK_EVT_STATE_ONESHOT_STOPPED state
When no timers/hrtimers are pending, the expiry time is set to a
special value: 'KTIME_MAX'. This normally happens with
NO_HZ_{IDLE|FULL} in both LOWRES/HIGHRES modes.
When 'expiry == KTIME_MAX', we either cancel the 'tick-sched' hrtimer
(NOHZ_MODE_HIGHRES) or skip reprogramming clockevent device
(NOHZ_MODE_LOWRES). But, the clockevent device is already
reprogrammed from the tick-handler for next tick.
As the clock event device is programmed in ONESHOT mode it will at
least fire one more time (unnecessarily). Timers on few
implementations (like arm_arch_timer, etc.) only support PERIODIC mode
and their drivers emulate ONESHOT over that. Which means that on these
platforms we will get spurious interrupts periodically (at last
programmed interval rate, normally tick rate).
In order to avoid spurious interrupts, the clockevent device should be
stopped or its interrupts should be masked.
A simple (yet hacky) solution to get this fixed could be: update
hrtimer_force_reprogram() to always reprogram clockevent device and
update clockevent drivers to STOP generating events (or delay it to
max time) when 'expires' is set to KTIME_MAX. But the drawback here is
that every clockevent driver has to be hacked for this particular case
and its very easy for new ones to miss this.
However, Thomas suggested to add an optional state ONESHOT_STOPPED to
solve this problem: lkml.org/lkml/2014/5/9/508.
This patch adds support for ONESHOT_STOPPED state in clockevents
core. It will only be available to drivers that implement the
state-specific callbacks instead of the legacy ->set_mode() callback.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Preeti U. Murthy <preeti@linux.vnet.ibm.com>
Cc: linaro-kernel@lists.linaro.org
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/b8b383a03ac07b13312c16850b5106b82e4245b5.1428031396.git.viresh.kumar@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Thomas Gleixner [Tue, 19 May 2015 14:12:32 +0000 (16:12 +0200)]
Merge branch 'linus' into timers/core
Make sure the upstream fixes are applied before adding further
modifications.
Nicholas Mc Guire [Mon, 18 May 2015 12:19:14 +0000 (14:19 +0200)]
time: Allow gcc to fold constants when possible
To allow constant folding in msecs_to_jiffies() conditionally calls
the HZ dependent _msecs_to_jiffies() helpers or, when gcc can not
figure out constant folding, __msecs_to_jiffies which is the renamed
original msecs_to_jiffies() function.
Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Cc: Masahiro Yamada <yamada.m@jp.panasonic.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Joe Perches <joe@perches.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Andrew Hunter <ahh@google.com>
Cc: Paul Turner <pjt@google.com>
Cc: Michal Marek <mmarek@suse.cz>
Link: http://lkml.kernel.org/r/1431951554-5563-3-git-send-email-hofrat@osadl.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Nicholas Mc Guire [Mon, 18 May 2015 12:19:13 +0000 (14:19 +0200)]
time: Refactor msecs_to_jiffies
Refactor the msecs_to_jiffies conditional code part in time.c and
jiffies.h putting it into conditional functions rather than #ifdefs
to improve readability.
[ tglx: Verified that there is no binary code change ]
Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Cc: Masahiro Yamada <yamada.m@jp.panasonic.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Joe Perches <joe@perches.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Andrew Hunter <ahh@google.com>
Cc: Paul Turner <pjt@google.com>
Cc: Michal Marek <mmarek@suse.cz>
Link: http://lkml.kernel.org/r/1431951554-5563-2-git-send-email-hofrat@osadl.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Nicholas Mc Guire [Mon, 18 May 2015 12:19:12 +0000 (14:19 +0200)]
time: Move timeconst.h into include/generated
kernel/time/timeconst.h is moved to include/generated/ and generated
by the top level Kbuild. This allows using timeconst.h in an earlier
build stage.
Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Cc: Masahiro Yamada <yamada.m@jp.panasonic.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Joe Perches <joe@perches.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Andrew Hunter <ahh@google.com>
Cc: Paul Turner <pjt@google.com>
Cc: Michal Marek <mmarek@suse.cz>
Link: http://lkml.kernel.org/r/1431951554-5563-1-git-send-email-hofrat@osadl.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Linus Torvalds [Mon, 18 May 2015 17:13:47 +0000 (10:13 -0700)]
Linux 4.1-rc4
Peter Zijlstra [Mon, 18 May 2015 09:31:50 +0000 (11:31 +0200)]
watchdog: Fix merge 'conflict'
Two watchdog changes that came through different trees had a non
conflicting conflict, that is, one changed the semantics of a variable
but no actual code conflict happened. So the merge appeared fine, but
the resulting code did not behave as expected.
Commit
195daf665a62 ("watchdog: enable the new user interface of the
watchdog mechanism") changes the semantics of watchdog_user_enabled,
which thereafter is only used by the functions introduced by
b3738d293233 ("watchdog: Add watchdog enable/disable all functions").
There further appears to be a distinct lack of serialization between
setting and using watchdog_enabled, so perhaps we should wrap the
{en,dis}able_all() things in watchdog_proc_mutex.
This patch fixes a s2r failure reported by Michal; which I cannot
readily explain. But this does make the code internally consistent
again.
Reported-and-tested-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Mon, 18 May 2015 17:01:54 +0000 (10:01 -0700)]
Merge tag 'for-linus-
20150516' of git://git.infradead.org/linux-mtd
Pull MTD fixes from Brian Norris:
"Two MTD fixes for 4.1:
- readtest: the signal-handling code was clobbering the error codes
we should be handling/reporting in this test, rendering it useless.
Noticed by Coverity.
- the common SPI NOR flash DT binding (merged for 4.1-rc1) is being
revised, so let's change that before 4.1 is minted"
* tag 'for-linus-
20150516' of git://git.infradead.org/linux-mtd:
Documentation: dt: mtd: replace "nor-jedec" binding with "jedec, spi-nor"
mtd: readtest: don't clobber error reports
Peter Zijlstra [Thu, 14 May 2015 10:23:11 +0000 (12:23 +0200)]
sched,perf: Fix periodic timers
In the below two commits (see Fixes) we have periodic timers that can
stop themselves when they're no longer required, but need to be
(re)-started when their idle condition changes.
Further complications is that we want the timer handler to always do
the forward such that it will always correctly deal with the overruns,
and we do not want to race such that the handler has already decided
to stop, but the (external) restart sees the timer still active and we
end up with a 'lost' timer.
The problem with the current code is that the re-start can come before
the callback does the forward, at which point the forward from the
callback will WARN about forwarding an enqueued timer.
Now, conceptually its easy to detect if you're before or after the fwd
by comparing the expiration time against the current time. Of course,
that's expensive (and racy) because we don't have the current time.
Alternatively one could cache this state inside the timer, but then
everybody pays the overhead of maintaining this extra state, and that
is undesired.
The only other option that I could see is the external timer_active
variable, which I tried to kill before. I would love a nicer interface
for this seemingly simple 'problem' but alas.
Fixes:
272325c4821f ("perf: Fix mux_interval hrtimer wreckage")
Fixes:
77a4d1a1b9a1 ("sched: Cleanup bandwidth timers")
Cc: pjt@google.com
Cc: tglx@linutronix.de
Cc: klamm@yandex-team.ru
Cc: mingo@kernel.org
Cc: bsegall@google.com
Cc: hpa@zytor.com
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20150514102311.GX21418@twins.programming.kicks-ass.net
Linus Torvalds [Sun, 17 May 2015 04:15:59 +0000 (21:15 -0700)]
Merge tag 'usb-4.1-rc4' of git://git./linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some USB fixes and new device ids for 4.1-rc4.
All are pretty minor, and have been in linux-next successfully"
* tag 'usb-4.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb-storage: Add NO_WP_DETECT quirk for Lacie 059f:0651 devices
Added another USB product ID for ELAN touchscreen quirks.
xhci: gracefully handle xhci_irq dead device
xhci: Solve full event ring by increasing TRBS_PER_SEGMENT to 256
xhci: fix isoc endpoint dequeue from advancing too far on transaction error
usb: chipidea: debug: avoid out of bound read
USB: visor: Match I330 phone more precisely
USB: pl2303: Remove support for Samsung I330
USB: cp210x: add ID for KCF Technologies PRN device
usb: gadget: remove incorrect __init/__exit annotations
usb: phy: isp1301: work around tps65010 dependency
usb: gadget: serial: fix re-ordering of tx data
usb: gadget: hid: Fix static variable usage
usb: gadget: configfs: Fix interfaces array NULL-termination
usb: gadget: xilinx: fix devm_ioremap_resource() check
usb: dwc3: dwc3-omap: correct the register macros
Linus Torvalds [Sun, 17 May 2015 04:10:05 +0000 (21:10 -0700)]
Merge tag 'tty-4.1-rc4' of git://git./linux/kernel/git/gregkh/tty
Pull tty/serial fixes from Greg KH:
"Here's some TTY and serial driver fixes for reported issues.
All of these have been in linux-next successfully"
* tag 'tty-4.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
pty: Fix input race when closing
tty/n_gsm.c: fix a memory leak when gsmtty is removed
Revert "serial/amba-pl011: Leave the TX IRQ alone when the UART is not open"
serial: omap: Fix error handling in probe
earlycon: Revert log warnings
Linus Torvalds [Sun, 17 May 2015 04:04:56 +0000 (21:04 -0700)]
Merge tag 'staging-4.1-rc4' of git://git./linux/kernel/git/gregkh/staging
Pull staging / IIO driver fixes from Greg KH:
"Here's some staging and iio driver fixes to resolve a number of
reported issues.
All of these have been in linux-next for a while"
* tag 'staging-4.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (31 commits)
iio: light: hid-sensor-prox: Fix memory leak in probe()
iio: adc:
cc10001: Add delay before setting START bit
iio: adc:
cc10001: Fix regulator_get_voltage() return value check
iio: adc:
cc10001: Fix incorrect use of power-up/power-down register
staging: gdm724x: Correction of variable usage after applying ALIGN()
iio: adc:
cc10001: Fix the channel number mapping
staging: vt6655: lock MACvWriteBSSIDAddress.
staging: vt6655: CARDbUpdateTSF bss timestamp correct tsf counter value.
staging: vt6655: vnt_tx_packet Correct TX order of OWNED_BY_NIC
staging: vt6655: Fix 80211 control and management status reporting.
staging: vt6655: implement IEEE80211_TX_STAT_NOACK_TRANSMITTED
staging: vt6655: device_free_tx_buf use only ieee80211_tx_status_irqsafe
staging: vt6656: use ieee80211_tx_info to select packet type.
staging: rtl8712: freeing an ERR_PTR
staging: sm750: remove incorrect __exit annotation
iio: kfifo: Set update_needed to false only if a buffer was allocated
iio: mcp320x: Fix occasional incorrect readings
iio: accel: mma9553: check input value for activity period
iio: accel: mma9553: add enable channel for activity
iio: accel: mma9551_core: prevent buffer overrun
...
Linus Torvalds [Sun, 17 May 2015 03:48:42 +0000 (20:48 -0700)]
Merge tag 'char-misc-4.1-rc4' of git://git./linux/kernel/git/gregkh/char-misc
Pull char/misc fix from Greg KH:
"Here is one fix, in the extcon subsystem, that resolves a reported
issue.
It's been in linux-next for a number of weeks now, sorry for not
getting it to you sooner"
* tag 'char-misc-4.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
extcon: usb-gpio: register extcon device before IRQ registration
Linus Torvalds [Sat, 16 May 2015 23:33:59 +0000 (16:33 -0700)]
Merge branch 'for-linus-4.1-rc4' of git://git./linux/kernel/git/rw/uml
Pull UML hostfs fix from Richard Weinberger:
"This contains a single fix for a regression introduced in 4.1-rc1"
* 'for-linus-4.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
hostfs: Use correct mask for file mode
Linus Torvalds [Sat, 16 May 2015 23:28:01 +0000 (16:28 -0700)]
Merge tag 'upstream-4.1-rc4' of git://git.infradead.org/linux-ubifs
Pull UBI bufix from Richard Weinberger:
"This contains a single bug fix for the UBI block driver"
* tag 'upstream-4.1-rc4' of git://git.infradead.org/linux-ubifs:
UBI: block: Add missing cache flushes
Linus Torvalds [Sat, 16 May 2015 22:55:31 +0000 (15:55 -0700)]
Merge tag 'for_linus_stable' of git://git./linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"Fix a number of ext4 bugs; the most serious of which is a bug in the
lazytime mount optimization code where we could end up updating the
timestamps to the wrong inode"
* tag 'for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fix an ext3 collapse range regression in xfstests
jbd2: fix r_count overflows leading to buffer overflow in journal recovery
ext4: check for zero length extent explicitly
ext4: fix NULL pointer dereference when journal restart fails
ext4: remove unused function prototype from ext4.h
ext4: don't save the error information if the block device is read-only
ext4: fix lazytime optimization
Linus Torvalds [Sat, 16 May 2015 22:50:58 +0000 (15:50 -0700)]
Merge branch 'for-linus-4.1' of git://git./linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
"The first commit is a fix from Filipe for a very old extent buffer
reuse race that triggered a BUG_ON. It hasn't come up often, I looked
through old logs at FB and we hit it a handful of times over the last
year.
The rest are other corners he hit during testing"
* 'for-linus-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: fix race when reusing stale extent buffers that leads to BUG_ON
Btrfs: fix race between block group creation and their cache writeout
Btrfs: fix panic when starting bg cache writeout after IO error
Btrfs: fix crash after inode cache writeback failure
Linus Torvalds [Sat, 16 May 2015 22:46:30 +0000 (15:46 -0700)]
Merge branch 'master' of git://git.linux-mips.org/ralf/upstream-linus
Pull MIPS fixes from Ralf Baechle:
"Seven small fixes. The shortlog below is a good description so no
need to elaborate.
It has sat in linux-next and survived the usual automated testing by
Imagination's test farm"
* 'master' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
MIPS: tlb-r4k: Fix PG_ELPA comment
MIPS: Fix up obsolete cpu_set usage
MIPS: IP32: Fix build errors in reset code in DS1685 platform hook.
MIPS: KVM: Fix unused variable build warning
MIPS: traps: remove extra Tainted: line from __show_regs() output
MIPS: Fix wrong CHECKFLAGS (sparse builds) with GCC 5.1
MIPS: Fix a preemption issue with thread's FPU defaults
Linus Torvalds [Sat, 16 May 2015 22:40:07 +0000 (15:40 -0700)]
Merge tag 'arc-4.1-fixes' of git://git./linux/kernel/git/vgupta/arc
Pull ARC fixes from Vineet Gupta.
* tag 'arc-4.1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: inline cache flush toggle helpers
ARC: With earlycon in use, retire EARLY_PRINTK
ARC: unbork !LLSC build
Linus Torvalds [Sat, 16 May 2015 22:33:25 +0000 (15:33 -0700)]
Merge tag 'fixes-for-linus' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Arnd Bergmann:
"Nothing frightening this time, just smaller fixes in a number of
places.
The other changes contained here are:
MAINTAINERS file updates:
- The mach-gemini maintainer is back in action and has a new git tree
- Krzysztof Kozlowski has volunteered to be a new co-maintainer for
the samsung platforms
- updates to the files that belong to Marvell mvebu
Bug fixes:
- The largest changes are on omap2, but are only to avoid some
harmless warnings and to fix reset on omap4
- a small regression fix on tegra
- multiple fixes for incorrect IRQ affinity on vexpress
- the missing system controller on arm64 juno is added
- one revert of a patch that was accidentally applied twice for
mach-rockchip
- two clock related DT fixes for mvebu
- a workaround for suspend with old DT binaries on new exynos kernels
- Another fix for suspend on exynos, needs to be backported"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (21 commits)
MAINTAINERS: Add dts entries for some of the Marvell SoCs
MAINTAINERS: ARM: EXYNOS: Add Krzysztof Kozlowski as co-maintainer
ARM: EXYNOS: Use of_machine_is_compatible instead of soc_is_exynos4
ARM: EXYNOS: Fix failed second suspend on Exynos4
Revert "ARM: rockchip: fix undefined instruction of reset_ctrl_regs"
ARM: EXYNOS: Fix dereference of ERR_PTR returned by of_genpd_get_from_provider
ARM: EXYNOS: Don't try to initialize suspend on old DT
ARM: dts: Add keep-power-in-suspend to WiFi SDIO node for Peach Boards
ARM: gemini: fix compiler warning due wrong data type
ARM: vexpress/tc2: Add interrupt-affinity to the PMU node
ARM: vexpress/ca9: Add interrupt-affinity to the PMU node
ARM: vexpress/ca9: Add unified-cache property to l2 cache node
ARM64: juno: add sp810 support and fix sp804 clock frequency
ARM: Gemini: Maintainers update
ARM: OMAP2+: Remove bogus struct clk comparison for timer clock
ARM: dove: Add clock-names to CuBox Si5351 clk generator
ARM: AM33xx+: hwmod: re-use omap4 implementations for reset functionality
ARM: OMAP4+: PRM: add support for passing status register/bit info to reset
ARM: AM43xx: hwmod: add VPFE hwmod entries
ARM: mvebu: Fix the main PLL frequency on Armada 375, 38x and 39x SoCs
...
Linus Torvalds [Sat, 16 May 2015 22:27:33 +0000 (15:27 -0700)]
Merge branch 'for-rc' of git://git./linux/kernel/git/rzhang/linux
Pull thermal fixes from Zhang Rui:
"Specifics:
- fix an issue in intel_powerclamp driver that idle injection target
is not accurately maintained on newer Intel CPUs. Package C8 to
C10 states are introduced on these CPUs but they were not included
in the package c-state residency calculation. From Jacob Pan.
- fix a problem that package c-state idle injection was missing on
Broadwell server, by adding its id to intel_powerclamp driver.
From Jacob Pan.
- a couple of small fixes and cleanups from Joe Perches, Mathias
Krause, Dan Carpenter and Anand Moon"
* 'for-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux:
tools/thermal: tmon: fixed the 'make install' command
thermal: rockchip: fix an error code
thermal/powerclamp: fix missing newer package c-states
thermal/intel_powerclamp: add id for broadwell server
thermal/intel_powerclamp: add __init / __exit annotations
thermal: Use bool function return values of true/false not 1/0
Linus Torvalds [Sat, 16 May 2015 22:03:52 +0000 (15:03 -0700)]
Merge tag 'linux-kselftest-4.1-rc4' of git://git./linux/kernel/git/shuah/linux-kselftest
Pull kselftest fixes from Shuah Khan:
"Urgent fix for Kselftest regression introduced in 4.1-rc1 by the new
x86 test due to its hard dependency on 32-bit build environment.
A set of 5 patches fix the make kselftest run and kselftest install"
* tag 'linux-kselftest-4.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests, x86: Rework x86 target architecture detection
selftests, x86: Remove useless run_tests rule
selftests/x86: install tests
selftest/x86: have no dependency on all when cross building
selftest/x86: build both bitnesses
Linus Torvalds [Fri, 15 May 2015 20:06:06 +0000 (13:06 -0700)]
Merge branch 'parisc-4.1-2' of git://git./linux/kernel/git/deller/parisc-linux
Pull parisc fixes from Helge Deller:
"One important patch which fixes crashes due to stack randomization on
architectures where the stack grows upwards (currently parisc and
metag only).
This bug went unnoticed on parisc since kernel 3.14 where the flexible
mmap memory layout support was added by commit
9dabf60dc4ab. The
changes in fs/exec.c are inside an #ifdef CONFIG_STACK_GROWSUP section
and will not affect other platforms.
The other two patches rename args of the kthread_arg() function and
fixes a printk output"
* 'parisc-4.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc,metag: Fix crashes due to stack randomization on stack-grows-upwards architectures
parisc: copy_thread(): rename 'arg' argument to 'kthread_arg'
parisc: %pf is only for function pointers
Brian Norris [Thu, 14 May 2015 17:32:53 +0000 (10:32 -0700)]
Documentation: dt: mtd: replace "nor-jedec" binding with "jedec, spi-nor"
In commit
8ff16cf77ce3 ("Documentation: devicetree: m25p80: add "nor-jedec"
binding"), we added a generic "nor-jedec" binding to catch all
mostly-compatible SPI NOR flash which can be detected via the READ ID
opcode (0x9F). This was discussed and reviewed at the time, however
objections have come up since then as part of this discussion:
http://lkml.kernel.org/g/
20150511224646.GJ32500@ld-irv-0074
It seems the parties involved agree that "jedec,spi-nor" does a better
job of capturing the fact that this is SPI-specific, not just any NOR
flash.
This binding was only merged for v4.1-rc1, so it's still OK to change
the naming.
At the same time, let's move the documentation to a better name.
Next up: stop referring to code (drivers/mtd/devices/m25p80.c) from the
documentation.
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Cc: Marek Vasut <marex@denx.de>
Cc: Rafał Miłecki <zajec5@gmail.com>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Pawel Moll <pawel.moll@arm.com>
Cc: Ian Campbell <ijc+devicetree@hellion.org.uk>
Cc: Kumar Gala <galak@codeaurora.org>
Cc: devicetree@vger.kernel.org
Acked-by: Stephen Warren <swarren@nvidia.com>
Acked-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Mark Rutland <mark.rutland@arm.com>
James Hogan [Wed, 13 May 2015 10:50:55 +0000 (11:50 +0100)]
MIPS: tlb-r4k: Fix PG_ELPA comment
The ELPA bit in PageGrain is all about large *physical* addresses, so
correct the reference to "large virtual address" in the comment above
where it is set for MIPS64.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: David Daney <ddaney@caviumnetworks.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/10038/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Ezequiel Garcia [Tue, 28 Apr 2015 21:34:23 +0000 (18:34 -0300)]
MIPS: Fix up obsolete cpu_set usage
cpu_set was removed (along with a bunch of cpumask helpers) by
commit
2f0f267ea072 ("cpumask: remove deprecated functions.").
Fix this by replacing cpu_set with cpumask_set_cpu. Without this
fix the following error is triggered when CONFIG_MIPS_MT_FPAFF=y.
arch/mips/kernel/smp-cps.c: In function 'cps_smp_setup':
arch/mips/kernel/smp-cps.c:95:3: error: implicit declaration of function 'cpu_set' [-Werror=implicit-function-declaration]
Fixes:
90db024f140d ("MIPS: smp-cps: cpu_set FPU mask if FPU present")
Signed-off-by: Ezequiel Garcia <ezequiel.garcia@imgtec.com>
Acked-by: Niklas Cassel <niklass@axis.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/9912/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Linus Torvalds [Fri, 15 May 2015 20:01:31 +0000 (13:01 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 build fix from Ingo Molnar:
"A bzImage build fix on older distros"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/vdso: Fix 'make bzImage' on older distros
Linus Torvalds [Fri, 15 May 2015 19:42:33 +0000 (12:42 -0700)]
Merge branch 'sched-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"Two fixes: a suspend/resume related regression fix, and an RT priority
boosting fix"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/core: Fix regression in cpuset_cpu_inactive() for suspend
sched: Handle priority boosted tasks proper in setscheduler()
Linus Torvalds [Fri, 15 May 2015 19:38:21 +0000 (12:38 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"Mostly tooling fixes, but also a lockdep annotation fix, a PMU event
list fix and a new model addition"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tools/liblockdep: Fix compilation error
tools/liblockdep: Fix linker error in case of cross compile
perf tools: Use getconf to determine number of online CPUs
tools: Fix tools/vm build
perf/x86/rapl: Enable Broadwell-U RAPL support
perf/x86/intel: Fix SLM cache event list
perf: Annotate inherited event ctx->mutex recursion
Linus Torvalds [Fri, 15 May 2015 19:34:05 +0000 (12:34 -0700)]
Merge branch 'irq-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull irq fix from Ingo Molnar:
"A tegra irqchip driver memory corruption fix"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip: tegra: Set the proper base address in irq chip data
Linus Torvalds [Fri, 15 May 2015 18:44:30 +0000 (11:44 -0700)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"Radeon:
one oops fix, one bug fix, one pci id addition patch
i915:
one suspend/resume regression fix.
All seems quiet enough."
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/radeon: don't do mst probing if MST isn't enabled.
drm/radeon: add new bonaire pci id
drm/radeon: fix VM_CONTEXT*_PAGE_TABLE_END_ADDR handling
drm/i915: Avoid GPU hang when coming out of s3 or s4
Linus Torvalds [Fri, 15 May 2015 18:17:41 +0000 (11:17 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
"8 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm, numa: really disable NUMA balancing by default on single node machines
MAINTAINERS: update Jingoo Han's email address
CMA: page_isolation: check buddy before accessing it
uidgid: make uid_valid and gid_valid work with !CONFIG_MULTIUSER
kernfs: do not account ino_ida allocations to memcg
gfp: add __GFP_NOACCOUNT
tools/vm: fix page-flags build
drivers/rtc/rtc-armada38x.c: remove unused local `flags'
Arnd Bergmann [Fri, 15 May 2015 15:14:48 +0000 (17:14 +0200)]
Merge tag 'samsung-fixes-2' of git://git./linux/kernel/git/kgene/linux-samsung into fixes
Merge "Samsung 2nd fixes for v4.1" from Kukjin Kim:
- fix second S2R on exynos4412 based Trats2, Odroid U3 boards which
happened after enabling L2$ and caused by commit
13cfa6c4f7fa ("ARM:
EXYNOS: Fix CPU idle clock down after CPU off")
And replace the soc_is_exynosxxx() macro with of_compatible_xxx
- fix dereference of ERR_PTR of of_genpd_get_from_provider()
- fix suspend problem on old DT machines to skip the initialization
suspend and caused by commit
8b283c025443 ("ARM: exynos4/5: convert
pmu wakeup to stacked domains")
- add keep-power-in-suspend for Peach Boards to support S2R and has
been missed in previous pull-request for fixes
* tag 'samsung-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung:
ARM: EXYNOS: Use of_machine_is_compatible instead of soc_is_exynos4
ARM: EXYNOS: Fix failed second suspend on Exynos4
ARM: EXYNOS: Fix dereference of ERR_PTR returned by of_genpd_get_from_provider
ARM: EXYNOS: Don't try to initialize suspend on old DT
ARM: dts: Add keep-power-in-suspend to WiFi SDIO node for Peach Boards
Arnd Bergmann [Fri, 15 May 2015 15:13:06 +0000 (17:13 +0200)]
Merge tag 'mvebu-fixes-4.1-2' of git://git.infradead.org/linux-mvebu into fixes
Merge "mvebu fixes for 4.1 (part 2)" from Gregory CLEMENT:
Fix the main PLL frequency on Armada 375, 38x and 39x SoCs
Add clock-names to CuBox Si5351 clk generator
Add dts entries in the MAINTAINERS file
* tag 'mvebu-fixes-4.1-2' of git://git.infradead.org/linux-mvebu:
MAINTAINERS: Add dts entries for some of the Marvell SoCs
ARM: dove: Add clock-names to CuBox Si5351 clk generator
ARM: mvebu: Fix the main PLL frequency on Armada 375, 38x and 39x SoCs
Gregory CLEMENT [Fri, 15 May 2015 12:25:43 +0000 (14:25 +0200)]
MAINTAINERS: Add dts entries for some of the Marvell SoCs
Since many releases, the modifications of the mvebu and berlin device
tree files are merged through the mvebu subsystem. This patch makes it
official in order to help the contributors using the get_maintainer.pl
to find the accurate peoples.
In the same time, updated the mvebu description which now includes the
kirkwood SoCs and new Armada SoCs.
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Acked-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Acked-by: Jason Cooper <jason@lakedaemon.net>
Acked-by: Andrew Lunn <andrew@lunn.ch>
Ingo Molnar [Fri, 15 May 2015 06:43:15 +0000 (08:43 +0200)]
Merge branch 'liblockdep-fixes' of git://git./linux/kernel/git/sashal/linux into perf/urgent
Pull liblockdep fixes from Sasha Levin:
"two fixes that deal with compilation errors in liblockdep."
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Dave Airlie [Fri, 15 May 2015 05:21:18 +0000 (15:21 +1000)]
Merge tag 'drm-intel-fixes-2015-05-13' of git://anongit.freedesktop.org/drm-intel into drm-fixes
fix one gpu hang on resume.
* tag 'drm-intel-fixes-2015-05-13' of git://anongit.freedesktop.org/drm-intel:
drm/i915: Avoid GPU hang when coming out of s3 or s4
Dave Airlie [Fri, 15 May 2015 05:20:45 +0000 (15:20 +1000)]
Merge branch 'drm-fixes-4.1' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
radeon minor fixes, and pci id addition.
* 'drm-fixes-4.1' of git://people.freedesktop.org/~agd5f/linux:
drm/radeon: don't do mst probing if MST isn't enabled.
drm/radeon: add new bonaire pci id
drm/radeon: fix VM_CONTEXT*_PAGE_TABLE_END_ADDR handling
Theodore Ts'o [Fri, 15 May 2015 04:24:10 +0000 (00:24 -0400)]
ext4: fix an ext3 collapse range regression in xfstests
The xfstests test suite assumes that an attempt to collapse range on
the range (0, 1) will return EOPNOTSUPP if the file system does not
support collapse range. Commit
280227a75b56: "ext4: move check under
lock scope to close a race" broke this, and this caused xfstests to
fail when run when testing file systems that did not have the extents
feature enabled.
Reported-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Linus Torvalds [Fri, 15 May 2015 01:40:16 +0000 (18:40 -0700)]
Merge tag 'pm+acpi-4.1-rc4' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"Two fixes here, one revert of a recent ACPICA commit that broke audio
support on one Dell machine and a fix for a long-standing issue that
may cause systems to break randomly during boot.
Specifics:
- The recent ACPICA commit that set the ACPI _REV return value to 2
(which is the value always used by Windows and now mandated by the
spec too) in order to prevent the firmware people from using it to
play tricks with us caused a serious audio regression to happen on
Dell XPS 13 (the AML on that machine uses the _REV return value to
decide how to expose audio to the OS and does that to hide the lack
of proper support for its I2S audio in Linux), so revert that
commit for now and we'll revisit the issue in the next cycle.
- Ensure that the ordering of acpi_reserve_resources() with respect
to the rest of the ACPI initialization sequence will always be the
same, or the IO or memory region occupied by the ACPI fixed
registers may be assigned to a PCI host bridge as a result of a
race and random breakage ensues going forward"
* tag 'pm+acpi-4.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
Revert "ACPICA: Permanently set _REV to the value '2'."
ACPI / init: Fix the ordering of acpi_reserve_resources()
Linus Torvalds [Fri, 15 May 2015 01:35:33 +0000 (18:35 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
- fix potential memory leak in perf PMU probing
- BPF sign extension fix for 64-bit immediates
- fix build failure with unusual configuration
- revert unused and broken branch patching from alternative code
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: perf: fix memory leak when probing PMU PPIs
arm64: bpf: fix signedness bug in loading 64-bit immediate
arm64: mm: Fix build error with CONFIG_SPARSEMEM_VMEMMAP disabled
Revert "arm64: alternative: Allow immediate branch as alternative instruction"
Linus Torvalds [Fri, 15 May 2015 01:02:15 +0000 (18:02 -0700)]
Merge branch 'dmi-for-linus' of git://git./linux/kernel/git/jdelvare/staging
Pull dmi fixes from Jean Delvare.
* 'dmi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
firmware: dmi_scan: Fix ordering of product_uuid
firmware: dmi_scan: Simplified displayed version
Mel Gorman [Thu, 14 May 2015 22:17:09 +0000 (15:17 -0700)]
mm, numa: really disable NUMA balancing by default on single node machines
NUMA balancing is meant to be disabled by default on UMA machines but
the check is using nr_node_ids (highest node) instead of
num_online_nodes (online nodes).
The consequences are that a UMA machine with a node ID of 1 or higher
will enable NUMA balancing. This will incur useless overhead due to
minor faults with the impact depending on the workload. These are the
impact on the stats when running a kernel build on a single node machine
whose node ID happened to be 1:
vanilla patched
NUMA base PTE updates
5113158 0
NUMA huge PMD updates 643 0
NUMA page range updates
5442374 0
NUMA hint faults
2109622 0
NUMA hint local faults
2109622 0
NUMA hint local percent 100 100
NUMA pages migrated 0 0
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: <stable@vger.kernel.org> [3.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jingoo Han [Thu, 14 May 2015 22:17:07 +0000 (15:17 -0700)]
MAINTAINERS: update Jingoo Han's email address
Change my private email address.
Signed-off-by: Jingoo Han <jingoohan1@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hui Zhu [Thu, 14 May 2015 22:17:04 +0000 (15:17 -0700)]
CMA: page_isolation: check buddy before accessing it
I had an issue:
Unable to handle kernel NULL pointer dereference at virtual address
0000082a
pgd =
cc970000
[
0000082a] *pgd=
00000000
Internal error: Oops: 5 [#1] PREEMPT SMP ARM
PC is at get_pageblock_flags_group+0x5c/0xb0
LR is at unset_migratetype_isolate+0x148/0x1b0
pc : [<
c00cc9a0>] lr : [<
c0109874>] psr:
80000093
sp :
c7029d00 ip :
00000105 fp :
c7029d1c
r10:
00000001 r9 :
0000000a r8 :
00000004
r7 :
60000013 r6 :
000000a4 r5 :
c0a357e4 r4 :
00000000
r3 :
00000826 r2 :
00000002 r1 :
00000000 r0 :
0000003f
Flags: Nzcv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user
Control:
10c5387d Table:
2cb7006a DAC:
00000015
Backtrace:
get_pageblock_flags_group+0x0/0xb0
unset_migratetype_isolate+0x0/0x1b0
undo_isolate_page_range+0x0/0xdc
__alloc_contig_range+0x0/0x34c
alloc_contig_range+0x0/0x18
This issue is because when calling unset_migratetype_isolate() to unset
a part of CMA memory, it try to access the buddy page to get its status:
if (order >= pageblock_order) {
page_idx = page_to_pfn(page) & ((1 << MAX_ORDER) - 1);
buddy_idx = __find_buddy_index(page_idx, order);
buddy = page + (buddy_idx - page_idx);
if (!is_migrate_isolate_page(buddy)) {
But the begin addr of this part of CMA memory is very close to a part of
memory that is reserved at boot time (not in buddy system). So add a
check before accessing it.
[akpm@linux-foundation.org: use conventional code layout]
Signed-off-by: Hui Zhu <zhuhui@xiaomi.com>
Suggested-by: Laura Abbott <labbott@redhat.com>
Suggested-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Josh Triplett [Thu, 14 May 2015 22:17:01 +0000 (15:17 -0700)]
uidgid: make uid_valid and gid_valid work with !CONFIG_MULTIUSER
{u,g}id_valid call {u,g}id_eq, which calls __k{u,g}id_val on both
arguments and compares. With !CONFIG_MULTIUSER, __k{u,g}id_val return a
constant 0, which makes {u,g}id_valid always return false. Change
{u,g}id_valid to compare their argument against -1 instead. That produces
identical results in the normal CONFIG_MULTIUSER=y case, but with
!CONFIG_MULTIUSER will make {u,g}id_valid constant-fold into "return
true;" rather than "return false;".
This fixes uses of devpts without CONFIG_MULTIUSER.
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>,
Cc: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vladimir Davydov [Thu, 14 May 2015 22:16:58 +0000 (15:16 -0700)]
kernfs: do not account ino_ida allocations to memcg
root->ino_ida is used for kernfs inode number allocations. Since IDA has
a layered structure, different IDs can reside on the same layer, which
is currently accounted to some memory cgroup. The problem is that each
kmem cache of a memory cgroup has its own directory on sysfs (under
/sys/fs/kernel/<cache-name>/cgroup). If the inode number of such a
directory or any file in it gets allocated from a layer accounted to the
cgroup which the cache is created for, the cgroup will get pinned for
good, because one has to free all kmem allocations accounted to a cgroup
in order to release it and destroy all its kmem caches. That said we
must not account layers of ino_ida to any memory cgroup.
Since per net init operations may create new sysfs entries directly
(e.g. lo device) or indirectly (nf_conntrack creates a new kmem cache
per each namespace, which, in turn, creates new sysfs entries), an easy
way to reproduce this issue is by creating network namespace(s) from
inside a kmem-active memory cgroup.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: <stable@vger.kernel.org> [4.0.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vladimir Davydov [Thu, 14 May 2015 22:16:55 +0000 (15:16 -0700)]
gfp: add __GFP_NOACCOUNT
Not all kmem allocations should be accounted to memcg. The following
patch gives an example when accounting of a certain type of allocations to
memcg can effectively result in a memory leak. This patch adds the
__GFP_NOACCOUNT flag which if passed to kmalloc and friends will force the
allocation to go through the root cgroup. It will be used by the next
patch.
Note, since in case of kmemleak enabled each kmalloc implies yet another
allocation from the kmemleak_object cache, we add __GFP_NOACCOUNT to
gfp_kmemleak_mask.
Alternatively, we could introduce a per kmem cache flag disabling
accounting for all allocations of a particular kind, but (a) we would not
be able to bypass accounting for kmalloc then and (b) a kmem cache with
this flag set could not be merged with a kmem cache without this flag,
which would increase the number of global caches and therefore
fragmentation even if the memory cgroup controller is not used.
Despite its generic name, currently __GFP_NOACCOUNT disables accounting
only for kmem allocations while user page allocations are always charged.
To catch abusing of this flag, a warning is issued on an attempt of
passing it to mem_cgroup_try_charge.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: <stable@vger.kernel.org> [4.0.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>