git.monstr.eu Git - linux-2.6-microblaze.git/log

VFS: don't do protected {sym,hard}links by default

In commit 800179c9b8a1 ("This adds symlink and hardlink restrictions to
the Linux VFS"), the new link protections were enabled by default, in
the hope that no actual application would care, despite it being
technically against legacy UNIX (and documented POSIX) behavior.

However, it does turn out to break some applications. It's rare, and
it's unfortunate, but it's unacceptable to break existing systems, so
we'll have to default to legacy behavior.

In particular, it has broken the way AFD distributes files, see

http://www.dwd.de/AFD/

along with some legacy scripts.

Distributions can end up setting this at initrd time or in system
scripts: if you have security problems due to link attacks during your
early boot sequence, you have bigger problems than some kernel sysctl
setting. Do:

echo 1 > /proc/sys/fs/protected_symlinks
echo 1 > /proc/sys/fs/protected_hardlinks

to re-enable the link protections.

Alternatively, we may at some point introduce a kernel config option
that sets these kinds of "more secure but not traditional" behavioural
options automatically.

Reported-by: Nick Bowler <nbowler@elliptictech.com>
Reported-by: Holger Kiehl <Holger.Kiehl@dwd.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org # v3.6
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'sound-3.7' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"Slightly a high amount of commits come from Adrian Knoth's HDSPM
  driver fixes.  Other than that, all small trival fixes or quirks that
  are pretty driver-specific."

* tag 'sound-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ASoC: wm8994: Only enable extra BCLK cycles when required
  ALSA: als3000: check for the kzalloc return value
  ALSA: sound/isa/opti9xx/miro.c: eliminate possible double free
  ALSA: hda - Fix silent headphone output from Toshiba P200
  ALSA: hdspm - Fix coding style in CTL_ELEM macros
  ALSA: hdspm - Fix typo in kcontrol element on RME MADI cards
  ALSA: hdspm - Fix sync_in detection on AES/AES32
  ALSA: hdspm - Fix sync_in reporting on RME MADI cards
  ALSA: hdspm - Also report autosync_sample_rate on MADI and MADIface
  ALSA: hdspm - Fix reported autosync_sample_rate
  ALSA: hdspm - Fix sync check reporting on all RME HDSPM cards
  ALSA: hdspm - Report external rate in slave mode on PCI MADI
  ALSA: hdspm - Allow DDS/Varispeed to be set from userspace
  ALSA: hda - add dock support for Thinkpad T430
  ASoC: ux500_msp_i2s: Fix devm_* and return code merge error
  ASoC: Ux500: Dispose of device nodes correctly

Merge branch 'fixes_for_linus' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping

Pull DMA-mapping revert from Marek Szyprowski:
"Due to my mistake, my previous pull request (merged as commit
  cff7b8ba60e3: "Merge branch 'fixes_for_linus' ..") contained a patch
  which is aimed for v3.8 and lacks its dependences.  This pull request
  reverts it and fixes build break of ARM architecture."

* 'fixes_for_linus' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping:
  Revert "ARM: dma-mapping: support debug_dma_mapping_error"

Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:
"This fixes a couple of nasty page table initialization bugs which were
  causing kdump regressions.  A clean rearchitecturing of the code is in
  the works - meanwhile these are reverts that restore the
  best-known-working state of the kernel.

  There's also EFI fixes and other small fixes."

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, mm: Undo incorrect revert in arch/x86/mm/init.c
  x86: efi: Turn off efi_enabled after setup on mixed fw/kernel
  x86, mm: Find_early_table_space based on ranges that are actually being mapped
  x86, mm: Use memblock memory loop instead of e820_RAM
  x86, mm: Trim memory in memblock to be page aligned
  x86/irq/ioapic: Check for valid irq_cfg pointer in smp_irq_move_cleanup_interrupt
  x86/efi: Fix oops caused by incorrect set_memory_uc() usage
  x86-64: Fix page table accounting
  Revert "x86/mm: Fix the size calculation of mapping tables"
  MAINTAINERS: Add EFI git repository location

Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull perf fixes from Ingo Molnar:
"Most of the kernel diffstat relates to a group of Intel P6 and KNC
  (Xeon-Phi Knights Corner) PMU driver fixes, neither of which is in
  heavy use, so we took the fixes.

  The rest is diverse smallish fixes to the tooling and kernel side."

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86: Remove unused variable in nhmex_rbox_alter_er()
  perf/x86: Enable overflow on Intel KNC with a custom knc_pmu_handle_irq()
  perf/x86: Remove cpuc->enable check on Intl KNC event enable/disable
  perf/x86: Make Intel KNC use full 40-bit width of counters
  perf/x86/uncore: Handle pci_read_config_dword() errors
  perf/x86: Remove P6 cpuc->enabled check
  perf/x86: Update/fix generic events on P6 PMU
  perf/x86: Fix P6 FP_ASSIST event constraint
  perf, cpu hotplug: Use cached value of smp_processor_id()
  perf, cpu hotplug: Run CPU_STARTING notifiers with irqs disabled
  x86/perf: Fix virtualization sanity check
  perf test: Fix exclude_guest parse events tests
  perf tools: do not flush maps on COMM for perf report
  perf help: Fix --help for builtins
  perf trace: Check if sample raw_data field is set
  perf trace: Validate syscall id before growing syscall table

Merge branch 'for-linus' of git://git./linux/kernel/git/mason/linux-btrfs

Pull btrfs fixes from Chris Mason:
"This has our series of fixes for the next rc.  The biggest batch is
  from Jan Schmidt, fixing up some problems in our subvolume quota code
  and fixing btrfs send/receive to work with the new extended inode
  refs."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: do not bug when we fail to commit the transaction
  Btrfs: fix memory leak when cloning root's node
  Btrfs: Use btrfs_update_inode_fallback when creating a snapshot
  Btrfs: Send: preserve ownership (uid and gid) also for symlinks.
  Btrfs: fix deadlock caused by the nested chunk allocation
  btrfs: Return EINVAL when length to trim is less than FSB
  Btrfs: fix memory leak in btrfs_quota_enable()
  Btrfs: send correct rdev and mode in btrfs-send
  Btrfs: extended inode refs support for send mechanism
  Btrfs: Fix wrong error handling code
  Fix a sign bug causing invalid memory access in the ino_paths ioctl.
  Btrfs: comment for loop in tree_mod_log_insert_move
  Btrfs: fix extent buffer reference for tree mod log roots
  Btrfs: determine level of old roots
  Btrfs: tree mod log's old roots could still be part of the tree
  Btrfs: fix a tree mod logging issue for root replacement operations
  Btrfs: don't put removals from push_node_left into tree mod log twice

Merge tag 'efi-for-3.7' of git://git./linux/kernel/git/mfleming/efi into x86/urgent

Pull EFI fixes from Matt Fleming:

"Fix oops with EFI variables on mixed 32/64-bit firmware/kernels and
document EFI git repository location on kernel.org."

Conflicts:
arch/x86/include/asm/efi.h

Signed-off-by: Ingo Molnar <mingo@kernel.org>

Revert "ARM: dma-mapping: support debug_dma_mapping_error"

This reverts commit 871ae57adc5ed092c1341f411514d0e8482e2611, which is
scheduled for v3.8 and accidently got into v3.7-rc series.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>

Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

Pull drm radeon fixes from Dave Airlie:
"Just radeon fixes in this one:
   - some new PCI IDs
   - ATPX regression fix
   - async VM regression fixes
   - some module options fixes"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
  drm/radeon: fix ATPX regression in acpi rework
  drm/radeon: fix ATPX function documentation
  drm/radeon: move the retry to gem_object_create
  drm/radeon: move size limits to gem_object_create.
  drm/radeon: use vzalloc for gart pages
  drm/radeon: fix and simplify pot argument checks v3
  drm/radeon: fix header size estimation in VM code
  drm/radeon: remove set_page check from VM code
  drm/radeon: fix si_set_page v2
  drm/radeon: fix cayman_vm_set_page v2
  drm/radeon: fix PFP sync in vm_flush
  drm/radeon: add error output if VM CS fails on cayman
  drm/radeon: give each backlight a unique id
  drm/radeon: fix sparse warning
  drm/radeon: add some new SI PCI ids

Merge tag 'nfs-for-3.7-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS bugfixes from Trond Myklebust:

- Fix the NFSv2/v3 kernel statd protocol, which broke due to net
   namespace related changes.

- Fix a number of races in the SUNRPC TCP disconnect/reconnect code.

* tag 'nfs-for-3.7-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  LOCKD: Clear ln->nsm_clnt only when ln->nsm_users is zero
  LOCKD: fix races in nsm_client_get
  SUNRPC: Get rid of the xs_error_report socket callback
  SUNRPC: Prevent races in xs_abort_connection()
  Revert "SUNRPC: Ensure we close the socket on EPIPE errors too..."
  SUNRPC: Clear the connect flag when socket state is TCP_CLOSE_WAIT

Merge branch 'drm-fixes-3.7' of git://people.freedesktop.org/~agd5f/linux into drm-fixes

Alex writes:
"Fixes pull request for radeon.  The main things here are
fixing a ATPX regression from the acpi rework, fixing some
fallout from the async VM work, and fixing some module options
that were broken in certain cases.  Other than that, mainly
just bug fixes."

* 'drm-fixes-3.7' of git://people.freedesktop.org/~agd5f/linux:
  drm/radeon: fix ATPX regression in acpi rework
  drm/radeon: fix ATPX function documentation
  drm/radeon: move the retry to gem_object_create
  drm/radeon: move size limits to gem_object_create.
  drm/radeon: use vzalloc for gart pages
  drm/radeon: fix and simplify pot argument checks v3
  drm/radeon: fix header size estimation in VM code
  drm/radeon: remove set_page check from VM code
  drm/radeon: fix si_set_page v2
  drm/radeon: fix cayman_vm_set_page v2
  drm/radeon: fix PFP sync in vm_flush
  drm/radeon: add error output if VM CS fails on cayman
  drm/radeon: give each backlight a unique id
  drm/radeon: fix sparse warning
  drm/radeon: add some new SI PCI ids

Merge branch 'akpm' (Andrew's fixes)

Merge misc fixes from Andrew Morton:
"18 total.  15 fixes and some updates to a device_cgroup patchset which
  bring it up to date with the version which I should have merged in the
  first place."

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (18 patches)
  fs/compat_ioctl.c: VIDEO_SET_SPU_PALETTE missing error check
  gen_init_cpio: avoid stack overflow when expanding
  drivers/rtc/rtc-imxdi.c: add missing spin lock initialization
  mm, numa: avoid setting zone_reclaim_mode unless a node is sufficiently distant
  pidns: limit the nesting depth of pid namespaces
  drivers/dma/dw_dmac: make driver's endianness configurable
  mm/mmu_notifier: allocate mmu_notifier in advance
  tools/testing/selftests/epoll/test_epoll.c: fix build
  UAPI: fix tools/vm/page-types.c
  mm/page_alloc.c:alloc_contig_range(): return early for err path
  rbtree: include linux/compiler.h for definition of __always_inline
  genalloc: stop crashing the system when destroying a pool
  backlight: ili9320: add missing SPI dependency
  device_cgroup: add proper checking when changing default behavior
  device_cgroup: stop using simple_strtoul()
  device_cgroup: rename deny_all to behavior
  cgroup: fix invalid rcu dereference
  mm: fix XFS oops due to dirty pages without buffers on s390

Input: wacom - add touch sensor support for Cintiq 24HD touch

Decode multitouch reports from the touch sensor of the Cintiq 24HD
touch.

Signed-off-by: Jason Gerecke <killertofu@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Input: wacom - handle split-sensor devices with internal hubs

Like our other pen-and-touch products, the Cintiq 24HD touch needs data
to be shared between its two sensors to facilitate proximity-based palm
rejection.

Unlike other tablets that report sensor data through separate interfaces
of the same USB device, the Cintiq 24HD touch has separate USB devices
that are connected to an internal USB hub.

This patch makes it possible to designate the USB VID/PID of the other
device so that the two may share data. To ensure we don't accidentally
link to a sensor from a physically separate device (if several have been
plugged in), we limit the search to siblings (i.e., devices directly
connected to the same hub).

Signed-off-by: Jason Gerecke <killertofu@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Makefile: Documentation for external tool should be correct

If one includes documentation for an external tool, it should be
correct.  This is not:

1. Overriding the input to rngd should typically be neither
   necessary nor desired.  This is especially so since newer
   versions of rngd support a number of different *types* of sources.
2. The default kernel-exported device is called /dev/hwrng not
   /dev/hwrandom nor /dev/hw_random (both of which were used in the
   past; however, kernel and udev seem to have converged on
   /dev/hwrng.)

Overall it is better if the documentation for rngd is kept with rngd
rather than in a kernel Makefile.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'fixes' of git://git.linaro.org/people/rmk/linux-arm

Pull ARM fixes from Russell King:
"A random collection of various fixes, mainly from Arnd and a few other
  people.  Not thing really stands out here."

* 'fixes' of git://git.linaro.org/people/rmk/linux-arm:
  ARM: drop experimental status for hotplug and Thumb2
  ARM: 7560/1: SMP_TWD: use DIV_ROUND_CLOSEST() for periodic mode
  ARM: 7559/1: smp: switch away from the idmap before updating init_mm.mm_count
  ARM: 7556/1: perf: fix updated event period in response to PERF_EVENT_IOC_PERIOD
  ARM: 7555/1: kexec: fix segment memory addresses check
  ARM: warnings in arch/arm/include/asm/uaccess.h
  ARM: binfmt_flat: unused variable 'persistent'
  ARM: be really quiet when building with 'make -s'
  ARM: pass -marm to gcc by default for both C and assembler
  ARM: Xen: fix initial build problems
  ARM: export default read_current_timer
  ARM: Fix another build warning in arch/arm/mm/alignment.c
  ARM: export set_irq_flags
  ARM: kprobes: make more tests conditional

Merge branch 'fixes_for_linus' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping

Pull CMA and DMA-mapping fixes from Marek Szyprowski:
"This consists mainly of a set of one-liner fixes and cleanups for a
  few minor issues identified in both Contiguous Memory Allocator code
  and ARM DMA-mapping subsystem."

* 'fixes_for_linus' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping:
  ARM: mm: Remove unused arm_vmregion priv field
  ARM: dma-mapping: fix build warning in __dma_alloc()
  ARM: dma-mapping: support debug_dma_mapping_error
  mm: cma: alloc_contig_range: return early for err path
  drivers: cma: Fix wrong CMA selected region size default value
  drivers: dma-coherent: Fix typo in dma_mmap_from_coherent documentation
  drivers: dma-contiguous: Don't redefine SZ_1M

x86, mm: Undo incorrect revert in arch/x86/mm/init.c

Commit

844ab6f9 x86, mm: Find_early_table_space based on ranges that are actually being mapped

added back some lines back wrongly that has been removed in commit

7b16bbf97 Revert "x86/mm: Fix the size calculation of mapping tables"

remove them again.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/CAE9FiQW_vuaYQbmagVnxT2DGsYc=9tNeAbdBq53sYkitPOwxSQ@mail.gmail.com
Acked-by: Jacob Shin <jacob.shin@amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

fs/compat_ioctl.c: VIDEO_SET_SPU_PALETTE missing error check

The compat ioctl for VIDEO_SET_SPU_PALETTE was missing an error check
while converting ioctl arguments. This could lead to leaking kernel
stack contents into userspace.

Patch extracted from existing fix in grsecurity.

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: David Miller <davem@davemloft.net>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: PaX Team <pageexec@freemail.hu>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

gen_init_cpio: avoid stack overflow when expanding

Fix possible overflow of the buffer used for expanding environment
variables when building file list.

In the extremely unlikely case of an attacker having control over the
environment variables visible to gen_init_cpio, control over the
contents of the file gen_init_cpio parses, and gen_init_cpio was built
without compiler hardening, the attacker can gain arbitrary execution
control via a stack buffer overflow.

  $ cat usr/crash.list
  file foo ${BIG}${BIG}${BIG}${BIG}${BIG}${BIG} 0755 0 0
  $ BIG=$(perl -e 'print "A" x 4096;') ./usr/gen_init_cpio usr/crash.list
  *** buffer overflow detected ***: ./usr/gen_init_cpio terminated

This also replaces the space-indenting with tabs.

Patch based on existing fix extracted from grsecurity.

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: PaX Team <pageexec@freemail.hu>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

drivers/rtc/rtc-imxdi.c: add missing spin lock initialization

Signed-off-by: Jan Luebbe <jlu@pengutronix.de>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Roland Stigge <stigge@antcom.de>
Cc: Grant Likely <grant.likely@secretlab.ca>
Tested-by: Roland Stigge <stigge@antcom.de>
Cc: Sascha Hauer <kernel@pengutronix.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm, numa: avoid setting zone_reclaim_mode unless a node is sufficiently distant

Commit 957f822a0ab9 ("mm, numa: reclaim from all nodes within reclaim
distance") caused zone_reclaim_mode to be set for all systems where two
nodes are within RECLAIM_DISTANCE of each other. This is the opposite
of what we actually want: zone_reclaim_mode should be set if two nodes
are sufficiently distant.

Signed-off-by: David Rientjes <rientjes@google.com>
Reported-by: Julian Wollrath <jwollrath@web.de>
Tested-by: Julian Wollrath <jwollrath@web.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Patrik Kullman <patrik.kullman@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

pidns: limit the nesting depth of pid namespaces

'struct pid' is a "variable sized struct" - a header with an array of
upids at the end.

The size of the array depends on a level (depth) of pid namespaces.  Now a
level of pidns is not limited, so 'struct pid' can be more than one page.

Looks reasonable, that it should be less than a page.  MAX_PIS_NS_LEVEL is
not calculated from PAGE_SIZE, because in this case it depends on
architectures, config options and it will be reduced, if someone adds a
new fields in struct pid or struct upid.

I suggest to set MAX_PIS_NS_LEVEL = 32, because it saves ability to expand
"struct pid" and it's more than enough for all known for me use-cases.
When someone finds a reasonable use case, we can add a config option or a
sysctl parameter.

In addition it will reduce the effect of another problem, when we have
many nested namespaces and the oldest one starts dying.
zap_pid_ns_processe will be called for each namespace and find_vpid will
be called for each process in a namespace.  find_vpid will be called
minimum max_level^2 / 2 times.  The reason of that is that when we found a
bit in pidmap, we can't determine this pidns is top for this process or it
isn't.

vpid is a heavy operation, so a fork bomb, which create many nested
namespace, can make a system inaccessible for a long time.  For example my
system becomes inaccessible for a few minutes with 4000 processes.

[akpm@linux-foundation.org: return -EINVAL in response to excessive nesting, not -ENOMEM]
Signed-off-by: Andrew Vagin <avagin@openvz.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

drivers/dma/dw_dmac: make driver's endianness configurable

The dw_dmac driver was originally developed for avr32 to be used with the
Synopsys DesignWare AHB DMA controller. Starting from 2.6.38, access to
the device's i/o memory was done with the little-endian readl/writel
functions(1)

This broke the driver for the avr32 platform, because it needs big
(native) endian accessors. This patch makes the endianness configurable
using 'DW_DMAC_BIG_ENDIAN_IO', which will default be true for AVR32

I submitted this patch before(2) but then waited for Andy to finish other
changes to the same module(3).

(1) https://patchwork.kernel.org/patch/608211
(2) https://lkml.org/lkml/2012/8/26/148
(3) https://lkml.org/lkml/2012/9/21/173

Signed-off-by: Hein Tibosch <hein_tibosch@yahoo.es>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Ludovic Desroches <ludovic.desroches@atmel.com>
Cc: Havard Skinnemoen <havard@skinnemoen.net>
Cc: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/mmu_notifier: allocate mmu_notifier in advance

While allocating mmu_notifier with parameter GFP_KERNEL, swap would start
to work in case of tight available memory.  Eventually, that would lead to
a deadlock while the swap deamon swaps anonymous pages.  It was caused by
commit e0f3c3f78da29b ("mm/mmu_notifier: init notifier if necessary").

  =================================
  [ INFO: inconsistent lock state ]
  3.7.0-rc1+ #518 Not tainted
  ---------------------------------
  inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
  kswapd0/35 [HC0[0]:SC0[0]:HE1:SE1] takes:
   (&mapping->i_mmap_mutex){+.+.?.}, at: page_referenced+0x9c/0x2e0
  {RECLAIM_FS-ON-W} state was registered at:
     mark_held_locks+0x86/0x150
     lockdep_trace_alloc+0x67/0xc0
     kmem_cache_alloc_trace+0x33/0x230
     do_mmu_notifier_register+0x87/0x180
     mmu_notifier_register+0x13/0x20
     kvm_dev_ioctl+0x428/0x510
     do_vfs_ioctl+0x98/0x570
     sys_ioctl+0x91/0xb0
     system_call_fastpath+0x16/0x1b
  irq event stamp: 825
  hardirqs last  enabled at (825): _raw_spin_unlock_irq+0x30/0x60
  hardirqs last disabled at (824): _raw_spin_lock_irq+0x19/0x80
  softirqs last  enabled at (0): copy_process+0x630/0x17c0
  softirqs last disabled at (0): (null)
  ...

Simply back out the above commit, which was a small performance
optimization.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Reported-by: Andrea Righi <andrea@betterlinux.com>
Tested-by: Andrea Righi <andrea@betterlinux.com>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Sagi Grimberg <sagig@mellanox.co.il>
Cc: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

tools/testing/selftests/epoll/test_epoll.c: fix build

Latest Linus head run of "make selftests" in the tools directory failed
with references to undefined variables. Reference was to
'write_thread_data' which is the name of a struct that is being used, not
the variable itself. Change reference so it points to the variable.

Signed-off-by: Daniel Hazelton <dshadowwolf@gmail.com>
Cc: "Paton J. Lewis" <palewis@adobe.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

UAPI: fix tools/vm/page-types.c

Fix tools/vm/page-types.c to use the UAPI variant of linux/kernel-page-flags.h
lest the following error appear:

  In file included from page-types.c:38:0:
    ../../include/linux/kernel-page-flags.h:4:42: fatal error:
    uapi/linux/kernel-page-flags.h: No such file or directory

Reported-by: Daniel Hazelton <dshadowwolf@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Fengguang Wu <fengguang.wu@intel.com>
Tested-by: Daniel Hazelton <dshadowwolf@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/page_alloc.c:alloc_contig_range(): return early for err path

If start_isolate_page_range() failed, unset_migratetype_isolate() has been
done inside it.

Signed-off-by: Bob Liu <lliubbo@gmail.com>
Cc: Ni zhan Chen <nizhan.chen@gmail.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

rbtree: include linux/compiler.h for definition of __always_inline

rb_erase_augmented() is a static function annotated with
__always_inline.  This causes a compile failure when attempting to use
the rbtree implementation as a library (e.g.  kvm tool):

  rbtree_augmented.h:125:24: error: expected `=', `,', `;', `asm' or `__attribute__' before `void'

Include linux/compiler.h in rbtree_augmented.h so that the __always_inline
macro is resolved correctly.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: Pekka Enberg <penberg@kernel.org>
Reviewed-by: Michel Lespinasse <walken@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

genalloc: stop crashing the system when destroying a pool

The genalloc code uses the bitmap API from include/linux/bitmap.h and
lib/bitmap.c, which is based on long values.  Both bitmap_set from
lib/bitmap.c and bitmap_set_ll, which is the lockless version from
genalloc.c, use BITMAP_LAST_WORD_MASK to set the first bits in a long in
the bitmap.

That one uses (1 << bits) - 1, 0b111, if you are setting the first three
bits.  This means that the API counts from the least significant bits
(LSB from now on) to the MSB.  The LSB in the first long is bit 0, then.
The same works for the lookup functions.

The genalloc code uses longs for the bitmap, as it should.  In
include/linux/genalloc.h, struct gen_pool_chunk has unsigned long
bits[0] as its last member.  When allocating the struct, genalloc should
reserve enough space for the bitmap.  This should be a proper number of
longs that can fit the amount of bits in the bitmap.

However, genalloc allocates an integer number of bytes that fit the
amount of bits, but may not be an integer amount of longs.  9 bytes, for
example, could be allocated for 70 bits.

This is a problem in itself if the Least Significat Bit in a long is in
the byte with the largest address, which happens in Big Endian machines.
This means genalloc is not allocating the byte in which it will try to
set or check for a bit.

This may end up in memory corruption, where genalloc will try to set the
bits it has not allocated.  In fact, genalloc may not set these bits
because it may find them already set, because they were not zeroed since
they were not allocated.  And that's what causes a BUG when
gen_pool_destroy is called and check for any set bits.

What really happens is that genalloc uses kmalloc_node with __GFP_ZERO
on gen_pool_add_virt.  With SLAB and SLUB, this means the whole slab
will be cleared, not only the requested bytes.  Since struct
gen_pool_chunk has a size that is a multiple of 8, and slab sizes are
multiples of 8, we get lucky and allocate and clear the right amount of
bytes.

Hower, this is not the case with SLOB or with older code that did memset
after allocating instead of using __GFP_ZERO.

So, a simple module as this (running 3.6.0), will cause a crash when
rmmod'ed.

  [root@phantom-lp2 foo]# cat foo.c
  #include <linux/kernel.h>
  #include <linux/module.h>
  #include <linux/init.h>
  #include <linux/genalloc.h>

  MODULE_LICENSE("GPL");
  MODULE_VERSION("0.1");

  static struct gen_pool *foo_pool;

  static __init int foo_init(void)
  {
          int ret;
          foo_pool = gen_pool_create(10, -1);
          if (!foo_pool)
                  return -ENOMEM;
          ret = gen_pool_add(foo_pool, 0xa0000000, 32 << 10, -1);
          if (ret) {
                  gen_pool_destroy(foo_pool);
                  return ret;
          }
          return 0;
  }

  static __exit void foo_exit(void)
  {
          gen_pool_destroy(foo_pool);
  }

  module_init(foo_init);
  module_exit(foo_exit);
  [root@phantom-lp2 foo]# zcat /proc/config.gz | grep SLOB
  CONFIG_SLOB=y
  [root@phantom-lp2 foo]# insmod ./foo.ko
  [root@phantom-lp2 foo]# rmmod foo
  ------------[ cut here ]------------
  kernel BUG at lib/genalloc.c:243!
  cpu 0x4: Vector: 700 (Program Check) at [c0000000bb0e7960]
      pc: c0000000003cb50c: .gen_pool_destroy+0xac/0x110
      lr: c0000000003cb4fc: .gen_pool_destroy+0x9c/0x110
      sp: c0000000bb0e7be0
     msr: 8000000000029032
    current = 0xc0000000bb0e0000
    paca    = 0xc000000006d30e00   softe: 0        irq_happened: 0x01
      pid   = 13044, comm = rmmod
  kernel BUG at lib/genalloc.c:243!
  [c0000000bb0e7ca0] d000000004b00020 .foo_exit+0x20/0x38 [foo]
  [c0000000bb0e7d20] c0000000000dff98 .SyS_delete_module+0x1a8/0x290
  [c0000000bb0e7e30] c0000000000097d4 syscall_exit+0x0/0x94
  --- Exception: c00 (System Call) at 000000800753d1a0
  SP (fffd0b0e640) is in userspace

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Benjamin Gaignard <benjamin.gaignard@stericsson.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

backlight: ili9320: add missing SPI dependency

Add this missing SPI dependency and prevent the driver from building
without SPI, because functions of the spi driver are used in this
driver.

drivers/video/backlight/ili9320.c:51: undefined reference to `spi_sync'

Also, a prompt string for CONFIG_LCD_ILI9320 is added for explicit
selection.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

device_cgroup: add proper checking when changing default behavior

Before changing a group's default behavior to ALLOW, we must check if
its parent's behavior is also ALLOW.

Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: James Morris <jmorris@namei.org>
Cc: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

device_cgroup: stop using simple_strtoul()

Convert the code to use kstrtou32() instead of simple_strtoul() which is
deprecated. The real size of the variables are u32, so use kstrtou32
instead of kstrtoul

Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: James Morris <jmorris@namei.org>
Cc: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

device_cgroup: rename deny_all to behavior

This was done in a v2 patch but v1 ended up being committed. The
variable name is less confusing and stores the default behavior when no
matching exception exists.

Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: James Morris <jmorris@namei.org>
Cc: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

cgroup: fix invalid rcu dereference

Commit ad676077a2ae ("device_cgroup: convert device_cgroup internally to
policy + exceptions") removed rcu locks which are needed in
task_devcgroup called in this chain:

  devcgroup_inode_mknod OR __devcgroup_inode_permission ->
    __devcgroup_inode_permission ->
      task_devcgroup ->
        task_subsys_state ->
          task_subsys_state_check.

Change the code so that task_devcgroup is safely called with rcu read
lock held.

  ===============================
  [ INFO: suspicious RCU usage. ]
  3.6.0-rc5-next-20120913+ #42 Not tainted
  -------------------------------
  include/linux/cgroup.h:553 suspicious rcu_dereference_check() usage!

  other info that might help us debug this:

  rcu_scheduler_active = 1, debug_locks = 0
  2 locks held by kdevtmpfs/23:
   #0:  (sb_writers){.+.+.+}, at: [<ffffffff8116873f>]
  mnt_want_write+0x1f/0x50
   #1:  (&sb->s_type->i_mutex_key#3/1){+.+.+.}, at: [<ffffffff811558af>]
  kern_path_create+0x7f/0x170

  stack backtrace:
  Pid: 23, comm: kdevtmpfs Not tainted 3.6.0-rc5-next-20120913+ #42
  Call Trace:
    lockdep_rcu_suspicious+0xfd/0x130
    devcgroup_inode_mknod+0x19d/0x240
    vfs_mknod+0x71/0xf0
    handle_create.isra.2+0x72/0x200
    devtmpfsd+0x114/0x140
    ? handle_create.isra.2+0x200/0x200
    kthread+0xd6/0xe0
    kernel_thread_helper+0x4/0x10

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Dave Jones <davej@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: James Morris <jmorris@namei.org>
Cc: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: fix XFS oops due to dirty pages without buffers on s390

On s390 any write to a page (even from kernel itself) sets architecture
specific page dirty bit.  Thus when a page is written to via buffered
write, HW dirty bit gets set and when we later map and unmap the page,
page_remove_rmap() finds the dirty bit and calls set_page_dirty().

Dirtying of a page which shouldn't be dirty can cause all sorts of
problems to filesystems.  The bug we observed in practice is that
buffers from the page get freed, so when the page gets later marked as
dirty and writeback writes it, XFS crashes due to an assertion
BUG_ON(!PagePrivate(page)) in page_buffers() called from
xfs_count_page_state().

Similar problem can also happen when zero_user_segment() call from
xfs_vm_writepage() (or block_write_full_page() for that matter) set the
hardware dirty bit during writeback, later buffers get freed, and then
page unmapped.

Fix the issue by ignoring s390 HW dirty bit for page cache pages of
mappings with mapping_cap_account_dirty().  This is safe because for
such mappings when a page gets marked as writeable in PTE it is also
marked dirty in do_wp_page() or do_page_fault().  When the dirty bit is
cleared by clear_page_dirty_for_io(), the page gets writeprotected in
page_mkclean().  So pagecache page is writeable if and only if it is
dirty.

Thanks to Hugh Dickins for pointing out mapping has to have
mapping_cap_account_dirty() for things to work and proposing a cleaned
up variant of the patch.

The patch has survived about two hours of running fsx-linux on tmpfs
while heavily swapping and several days of running on out build machines
where the original problem was triggered.

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: <stable@vger.kernel.org> [3.0+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Btrfs: do not bug when we fail to commit the transaction

We BUG if we fail to commit the transaction when creating a snapshot, which
is just obnoxious. Remove the BUG_ON(). Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: fix memory leak when cloning root's node

After cloning root's node, we forgot to dec the src's ref
which can lead to a memory leak.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

Merge branch 'for-chris-fixed' of git://git.jan-o-sch.net/btrfs-unstable

Btrfs: Use btrfs_update_inode_fallback when creating a snapshot

On a really full file system I was getting ENOSPC back from
btrfs_update_inode when trying to update the parent inode when creating a
snapshot. Just use the fallback method so we can update the inode and not
have to worry about having a delayed ref. Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: Send: preserve ownership (uid and gid) also for symlinks.

This patch also requires a change in the user-space part of "receive".
We need to use "lchown" instead of "chown". We will do this in the
following patch.

Signed-off-by: Alex Lyakas <alex.btrfs@zadarastorage.com>
if (S_ISREG(sctx->cur_inode_mode)) {

Btrfs: fix deadlock caused by the nested chunk allocation

Steps to reproduce:
# mkfs.btrfs -m raid1 <disk1> <disk2>
# btrfstune -S 1 <disk1>
# mount <disk1> <mnt>
# btrfs device add <disk3> <disk4> <mnt>
# mount -o remount,rw <mnt>
# dd if=/dev/zero of=<mnt>/tmpfile bs=1M count=1
Deadlock happened.

It is because of the nested chunk allocation. When we wrote the data
into the filesystem, we would allocate the data chunk because there was
no data chunk in the filesystem. At the end of the data chunk allocation,
we should insert the metadata of the data chunk into the extent tree, but
there was no raid1 chunk, so we tried to lock the chunk allocation mutex to
allocate the new chunk, but we had held the mutex, the deadlock happened.

By rights, we would allocate the raid1 chunk when we added the second device
because the profile of the seed filesystem is raid1 and we had two devices.
But we didn't do that in fact. It is because the last step of the first device
insertion didn't commit the transaction. So when we added the second device,
we didn't cow the tree, and just inserted the relative metadata into the leaves
which were generated by the first device insertion, and its profile was dup.

So, I fix this problem by commiting the transaction at the end of the first
device insertion.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>

btrfs: Return EINVAL when length to trim is less than FSB

Currently if len argument in btrfs_ioctl_fitrim() is smaller than
one FSB we will continue and finally return 0 bytes discarded.
However if the length to discard is smaller then file system block
we should really return EINVAL.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>

Btrfs: fix memory leak in btrfs_quota_enable()

We should free quota_root before returning from the error
handling code.

Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>

Btrfs: send correct rdev and mode in btrfs-send

When sending a device file, the stream was missing the mode. Also the
rdev was encoded wrongly.

Signed-off-by: Arne Jansen <sensille@gmx.net>

Btrfs: extended inode refs support for send mechanism

This adds support for the new extended inode refs to btrfs send.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>

Btrfs: Fix wrong error handling code

gcc says "warning: comparison of unsigned expression >= 0 is always
true" because i is an unsigned long. And gcc is right this time.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>

Fix a sign bug causing invalid memory access in the ino_paths ioctl.

To see the problem, create many hardlinks to the same file (120 should do it),
then look up paths by inode with:

ls -i
btrfs inspect inode-resolve -v $ino /mnt/btrfs

I noticed the memory layout of the fspath->val data had some irregularities
(some unnecessary gaps that stop appearing about halfway),
so I'm not sure there aren't any bugs left in it.

Merge tag 'asoc-3.7' of git://git./linux/kernel/git/broonie/sound into for-linus

ASoC: Fixes for v3.7

A couple of driver fixes, one that improves the interoperability of
WM8994 with controllers that are sensitive to extra BCLK cycles and some
build break fixes for ux500.

x86: efi: Turn off efi_enabled after setup on mixed fw/kernel

When 32-bit EFI is used with 64-bit kernel (or vice versa), turn off
efi_enabled once setup is done. Beyond setup, it is normally used to
determine if runtime services are available and we will have none.

This will resolve issues stemming from efivars modprobe panicking on a
32/64-bit setup, as well as some reboot issues on similar setups.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=45991

Reported-by: Marko Kohtala <marko.kohtala@gmail.com>
Reported-by: Maxim Kammerer <mk@dee.su>
Signed-off-by: Olof Johansson <olof@lixom.net>
Acked-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Cc: stable@kernel.org # 3.4 - 3.6
Cc: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>

Merge remote-tracking branches 'asoc/fix/ux500' and 'asoc/fix/wm8994' into for-3.7

Merge tag 'spi-linus' of git://git./linux/kernel/git/broonie/misc

Pull spi fixes from Mark Brown:
"A bunch of fixes here, mostly minor except for the pl022 which has
  just been a bit of a shambles all round, the recent runtime PM changes
  have as far as I can tell never worked so they're just getting thrown
  out."

* tag 'spi-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/misc:
  spi/pl022: Revert recent runtime PM changes
  spi: tsc2005: delete soon-obsolete e-mail address
  spi: spi-rspi: fix build error for the latest shdma driver

Merge tag 'iommu-fixes-v3.7-rc2' of git://git./linux/kernel/git/joro/iommu

Pull IOMMU fixes from Joerg Roedel:
"Two fixes this time:

   1. Another fix for a broken BIOS to detect when AMD IOMMU interrupt
      remapping can not work reliably
   2. Typo fix for NVidia IOMMU driver"

* tag 'iommu-fixes-v3.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/tegra: smmu: Fix deadly typo
  iommu/amd: Work around wrong IOAPIC device-id in IVRS table

Merge tag 'pinctrl-v3.7-rc3' of git://git./linux/kernel/git/linusw/linux-pinctrl

Pull pinctrl fixes from Linus Walleij:
"This fixes a few pinctrl problems seen since v3.7-rc1:
   - Section tagging for init code
   - Use proper pointers to lookup struct device * in the bcm2835
     (a.k.a.  Raspberry Pi)
   - Remove duplicate #includes
   - Fix bad return values in errorpath
   - Remove extraneous pull function from the sirf driver causing build
     errors
   - Provide compilation stubs for the Nomadik pinctrl driver when used
     with legacy systems without PRCMU units
   - Various irqdomain fixes in the Nomadik driver as predicted
   - Various smallish bugs in the Tegra driver, most also targeted for
     stable
   - Removed a deadlocking mutex in the groups debugfs show function"

* tag 'pinctrl-v3.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl/nomadik: pass DT node to the irqdomain
  pinctrl/nomadik: use zero as default irq_start
  pinctrl: fix missing unlock on error in pinctrl_groups_show()
  pinctrl/nomadik: use irq_create_mapping()
  pinctrl: remove mutex lock in groups show
  pinctrl: tegra: correct bank for pingroup and drv pingroup
  pinctrl: tegra: set low power mode bank width to 2
  dt: Document: correct tegra20/30 pinctrl slew-rate name

Merge branch 'for-linus' of git://git./linux/kernel/git/jmorris/linux-security

Pull apparmor bugfix from James Morris.

Fix a possibly unbounded recursion by iterating over the entries instead.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
apparmor: fix IRQ stack overflow during free_profile

Merge tag 'edac_scrubrates_fix' of git://git./linux/kernel/git/bp/bp

Pull amd64_edac fix from Borislav Petkov:
"An array out-of-bounds fix from Andrew when setting the scrub rate of
the memory controller."

* tag 'edac_scrubrates_fix' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
amd64_edac:__amd64_set_scrub_rate(): avoid overindexing scrubrates[]

Merge branch 'for-3.7-fixes' of git://git./linux/kernel/git/tj/cgroup

Pull cgroup fixes from Tejun Heo:
"This pull request contains three fixes.

  Two are reverts of task_lock() removal in cgroup fork path.  The
  optimizations incorrectly assumed that threadgroup_lock can protect
  process forks (as opposed to thread creations) too.  Further cleanup
  of cgroup fork path is scheduled.

  The third fixes cgroup emptiness notification loss."

* 'for-3.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  Revert "cgroup: Remove task_lock() from cgroup_post_fork()"
  Revert "cgroup: Drop task_lock(parent) on cgroup_fork()"
  cgroup: notify_on_release may not be triggered in some cases

Merge branch 'for-3.7-fixes' of git://git./linux/kernel/git/tj/wq

Pull workqueue fix from Tejun Heo:
"This pull request contains one patch from Dan Magenheimer to fix
  cancel_delayed_work() regression introduced by its reimplementation
  using try_to_grab_pending().  The reimplementation made it incorrectly
  return %true when the work item is idle.

  There aren't too many consumers of the return value but it broke at
  least ramster."

* 'for-3.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: cancel_delayed_work() should return %false if work item is idle

x86, mm: Find_early_table_space based on ranges that are actually being mapped

Current logic finds enough space for direct mapping page tables from 0
to end. Instead, we only need to find enough space to cover mr[0].start
to mr[nr_range].end -- the range that is actually being mapped by
init_memory_mapping()

This is needed after 1bbbbe779aabe1f0768c2bf8f8c0a5583679b54a, to address
the panic reported here:

https://lkml.org/lkml/2012/10/20/160
https://lkml.org/lkml/2012/10/21/157

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Link: http://lkml.kernel.org/r/20121024195311.GB11779@jshin-Toonie
Tested-by: Tom Rini <trini@ti.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

workqueue: cancel_delayed_work() should return %false if work item is idle

57b30ae77b ("workqueue: reimplement cancel_delayed_work() using
try_to_grab_pending()") made cancel_delayed_work() always return %true
unless someone else is also trying to cancel the work item, which is
broken - if the target work item is idle, the return value should be
%false.

try_to_grab_pending() indicates that the target work item was idle by
zero return value. Use it for return. Note that this brings
cancel_delayed_work() in line with __cancel_work_timer() in return
value handling.

Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
LKML-Reference: <444a6439-b1a4-4740-9e7e-bc37267cfe73@default>

x86, mm: Use memblock memory loop instead of e820_RAM

We need to handle E820_RAM and E820_RESERVED_KERNEL at the same time.

Also memblock has page aligned range for ram, so we could avoid mapping
partial pages.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/CAE9FiQVZirvaBMFYRfXMmWEcHbKSicQEHz4VAwUv0xFCk51ZNw@mail.gmail.com
Acked-by: Jacob Shin <jacob.shin@amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org>

x86, mm: Trim memory in memblock to be page aligned

We will not map partial pages, so need to make sure memblock
allocation will not allocate those bytes out.

Also we will use for_each_mem_pfn_range() to loop to map memory
range to keep them consistent.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/CAE9FiQVZirvaBMFYRfXMmWEcHbKSicQEHz4VAwUv0xFCk51ZNw@mail.gmail.com
Acked-by: Jacob Shin <jacob.shin@amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org>

drm/radeon: fix ATPX regression in acpi rework

Copy and paste typo in the apci rework.

Fixes:
https://bugzilla.kernel.org/show_bug.cgi?id=49351

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>

drm/radeon: fix ATPX function documentation

The ATPX code no longer handles ATRM.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/radeon: move the retry to gem_object_create

When internal users want VRAM we shouldn't return GART memory instead.

Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/radeon: move size limits to gem_object_create.

Driver internal users shouldn't be limited in their allocation size.

Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/radeon: use vzalloc for gart pages

When allocating more than 2GB of GART the array of pages
gets to big for kzalloc, use vzalloc instead.

Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/radeon: fix and simplify pot argument checks v3

GART and VRAM size limits need to be a power of two.
Fix values greater than 1GB and simplify those checks a bit.

v2: also fix radeon_vram_limit usage, and simplify test even more.
v3: agd5f: fix spelling as noticed by Klaus Schnass

Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

apparmor: fix IRQ stack overflow during free_profile

BugLink: http://bugs.launchpad.net/bugs/1056078
Profile replacement can cause long chains of profiles to build up when
the profile being replaced is pinned. When the pinned profile is finally
freed, it puts the reference to its replacement, which may in turn nest
another call to free_profile on the stack. Because this may happen for
each profile in the replacedby chain this can result in a recusion that
causes the stack to overflow.

Break this nesting by directly walking the chain of replacedby profiles
(ie. use iteration instead of recursion to free the list). This results
in at most 2 levels of free_profile being called, while freeing a
replacedby chain.

Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>

iommu/tegra: smmu: Fix deadly typo

Fix a deadly typo in macro definition.

Cc: stable@vger.kernel.org
Signed-off-by: Hiro Sugawara <hsugawara@nvidia.com>
Signed-off-by: Hiroshi Doyu <hdoyu@nvidia.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>

LOCKD: Clear ln->nsm_clnt only when ln->nsm_users is zero

The current code is clearing it in all cases _except_ when zero.

Reported-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org

LOCKD: fix races in nsm_client_get

Commit e9406db20fecbfcab646bad157b4cfdc7cadddfb (lockd: per-net
NSM client creation and destruction helpers introduced) contains
a nasty race on initialisation of the per-net NSM client because
it doesn't check whether or not the client is set after grabbing
the nsm_create_mutex.

Reported-by: Nix <nix@esperi.org.uk>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org

SUNRPC: Get rid of the xs_error_report socket callback

Chris Perl reports that we're seeing races between the wakeup call in
xs_error_report and the connect attempts. Basically, Chris has shown
that in certain circumstances, the call to xs_error_report causes the
rpc_task that is responsible for reconnecting to wake up early, thus
triggering a disconnect and retry.

Since the sk->sk_error_report() calls in the socket layer are always
followed by a tcp_done() in the cases where we care about waking up
the rpc_tasks, just let the state_change callbacks take responsibility
for those wake ups.

Reported-by: Chris Perl <chris.perl@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
Tested-by: Chris Perl <chris.perl@gmail.com>

SUNRPC: Prevent races in xs_abort_connection()

The call to xprt_disconnect_done() that is triggered by a successful
connection reset will trigger another automatic wakeup of all tasks
on the xprt->pending rpc_wait_queue. In particular it will cause an
early wake up of the task that called xprt_connect().

All we really want to do here is clear all the socket-specific state
flags, so we split that functionality out of xs_sock_mark_closed()
into a helper that can be called by xs_abort_connection()

Reported-by: Chris Perl <chris.perl@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
Tested-by: Chris Perl <chris.perl@gmail.com>

Revert "SUNRPC: Ensure we close the socket on EPIPE errors too..."

This reverts commit 55420c24a0d4d1fce70ca713f84aa00b6b74a70e.
Now that we clear the connected flag when entering TCP_CLOSE_WAIT,
the deadlock described in this commit is no longer possible.
Instead, the resulting call to xs_tcp_shutdown() can interfere
with pending reconnection attempts.

Reported-by: Chris Perl <chris.perl@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
Tested-by: Chris Perl <chris.perl@gmail.com>

SUNRPC: Clear the connect flag when socket state is TCP_CLOSE_WAIT

This is needed to ensure that we call xprt_connect() upon the next
call to call_connect().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
Tested-by: Chris Perl <chris.perl@gmail.com>

amd64_edac:__amd64_set_scrub_rate(): avoid overindexing scrubrates[]

If none of the elements in scrubrates[] matches, this loop will cause
__amd64_set_scrub_rate() to incorrectly use the n+1th element.

As the function is designed to use the final scrubrates[] element in the
case of no match, we can fix this bug by simply terminating the array
search at the n-1th element.

Boris: this code is fragile anyway, see here why:
http://marc.info/?l=linux-kernel&m=135102834131236&w=2

It will be rewritten more robustly soonish.

Reported-by: Denis Kirjanov <kirjanov@gmail.com>
Cc: stable@vger.kernel.org
Cc: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>

ASoC: wm8994: Only enable extra BCLK cycles when required

Rather than always assuming the maximum possible BCLK rate will be
required generate BCLKs for stereo if either one or two channels is
enabled. In order to support this we also need to ensure that only
the relevant channels are enabled.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>

x86/irq/ioapic: Check for valid irq_cfg pointer in smp_irq_move_cleanup_interrupt

Posting this patch to fix an issue concerning sparse irq's that
I raised a while back. There was discussion about adding
refcounting to sparse irqs (to fix other potential race
conditions), but that does not appear to have been addressed
yet. This covers the only issue of this type that I've
encountered in this area.

A NULL pointer dereference can occur in
smp_irq_move_cleanup_interrupt() if we haven't yet setup the
irq_cfg pointer in the irq_desc.irq_data.chip_data.

In create_irq_nr() there is a window where we have set
vector_irq in __assign_irq_vector(), but not yet called
irq_set_chip_data() to set the irq_cfg pointer.

Should an IRQ_MOVE_CLEANUP_VECTOR hit the cpu in question during
this time, smp_irq_move_cleanup_interrupt() will attempt to
process the aforementioned irq, but panic when accessing
irq_cfg.

Only continue processing the irq if irq_cfg is non-NULL.

Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Alexander Gordeev <agordeev@redhat.com>
Link: http://lkml.kernel.org/r/20121016125021.GA22935@sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

perf/x86: Remove unused variable in nhmex_rbox_alter_er()

The variable port is initialized but never used
otherwise, so remove the unused variable.

dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Cc: Yan, Zheng <zheng.z.yan@intel.com>
Cc: a.p.zijlstra@chello.nl
Cc: paulus@samba.org
Cc: acme@ghostprotocols.net
Link: http://lkml.kernel.org/r/CAPgLHd8NZkYSkZm22FpZxiEh6HcA0q-V%3D29vdnheiDhgrJZ%2Byw@mail.gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86/efi: Fix oops caused by incorrect set_memory_uc() usage

Calling __pa() with an ioremap'd address is invalid. If we
encounter an efi_memory_desc_t without EFI_MEMORY_WB set in
->attribute we currently call set_memory_uc(), which in turn
calls __pa() on a potentially ioremap'd address.

On CONFIG_X86_32 this results in the following oops:

  BUG: unable to handle kernel paging request at f7f22280
  IP: [<c10257b9>] reserve_ram_pages_type+0x89/0x210
  *pdpt = 0000000001978001 *pde = 0000000001ffb067 *pte = 0000000000000000
  Oops: 0000 [#1] PREEMPT SMP
  Modules linked in:

  Pid: 0, comm: swapper Not tainted 3.0.0-acpi-efi-0805 #3
   EIP: 0060:[<c10257b9>] EFLAGS: 00010202 CPU: 0
   EIP is at reserve_ram_pages_type+0x89/0x210
   EAX: 0070e280 EBX: 38714000 ECX: f7814000 EDX: 00000000
   ESI: 00000000 EDI: 38715000 EBP: c189fef0 ESP: c189fea8
   DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
  Process swapper (pid: 0, ti=c189e000 task=c18bbe60 task.ti=c189e000)
  Stack:
   80000200 ff108000 00000000 c189ff00 00038714 00000000 00000000 c189fed0
   c104f8ca 00038714 00000000 00038715 00000000 00000000 00038715 00000000
   00000010 38715000 c189ff48 c1025aff 38715000 00000000 00000010 00000000
  Call Trace:
   [<c104f8ca>] ? page_is_ram+0x1a/0x40
   [<c1025aff>] reserve_memtype+0xdf/0x2f0
   [<c1024dc9>] set_memory_uc+0x49/0xa0
   [<c19334d0>] efi_enter_virtual_mode+0x1c2/0x3aa
   [<c19216d4>] start_kernel+0x291/0x2f2
   [<c19211c7>] ? loglevel+0x1b/0x1b
   [<c19210bf>] i386_start_kernel+0xbf/0xc8

The only time we can call set_memory_uc() for a memory region is
when it is part of the direct kernel mapping. For the case where
we ioremap a memory region we must leave it alone.

This patch reimplements the fix from e8c7106280a3 ("x86, efi:
Calling __pa() with an ioremap()ed address is invalid") which
was reverted in e1ad783b12ec because it caused a regression on
some MacBooks (they hung at boot). The regression was caused
because the commit only marked EFI_RUNTIME_SERVICES_DATA as
E820_RESERVED_EFI, when it should have marked all regions that
have the EFI_MEMORY_RUNTIME attribute.

Despite first impressions, it's not possible to use
ioremap_cache() to map all cached memory regions on
CONFIG_X86_64 because of the way that the memory map might be
configured as detailed in the following bug report,

https://bugzilla.redhat.com/show_bug.cgi?id=748516

e.g. some of the EFI memory regions *need* to be mapped as part
of the direct kernel mapping.

Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Cc: Matthew Garrett <mjg@redhat.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Huang Ying <huang.ying.caritas@gmail.com>
Cc: Keith Packard <keithp@keithp.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1350649546-23541-1-git-send-email-matt@console-pimps.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Btrfs: comment for loop in tree_mod_log_insert_move

Emphasis the way tree_mod_log_insert_move avoids adding
MOD_LOG_KEY_REMOVE_WHILE_MOVING operations, depending on the direction of
the move operation.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>

Btrfs: fix extent buffer reference for tree mod log roots

In get_old_root we grab a lock on the extent buffer before we obtain a
reference on that buffer. That order is changed now.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>

Btrfs: determine level of old roots

In btrfs_find_all_roots' termination condition, we compare the level of the
old buffer we got from btrfs_search_old_slot to the level of the current
root node. We'd better compare it to the level of the rewinded root node.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>

Btrfs: tree mod log's old roots could still be part of the tree

Tree mod log treated old root buffers as always empty buffers when starting
the rewind operations. However, the old root may still be part of the
current tree at a lower level, with still some valid entries.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>

perf/x86: Enable overflow on Intel KNC with a custom knc_pmu_handle_irq()

Although based on the Intel P6 design, the interrupt mechnanism
for KNC more closely resembles the Intel architectural
perfmon one.

We can't just re-use that code though, because KNC has different
MSR numbers for the status and ack registers.

In this case we just cut-and paste from perf_event_intel.c
with some minor changes, as it looks like it would not be
worth the trouble to change that code to be MSR-configurable.

Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: eranian@gmail.com
Cc: Meadows Lawrence F <lawrence.f.meadows@intel.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1210171304410.23243@vincent-weaver-1.um.maine.edu
[ Small stylistic edits. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>

perf/x86: Remove cpuc->enable check on Intl KNC event enable/disable

x86_pmu.enable() is called from x86_pmu_enable() with
cpuc->enabled set to 0. This means we weren't re-enabling the
counters after a context switch.

This patch just removes the check, as it should't be necessary
(and the equivelent x86_ generic code does not have the checks).

The origin of this problem is the KNC driver being based on the
P6 one. The P6 driver also has this issue, but works anyway
due to various lucky accidents.

Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: eranian@gmail.com
Cc: Meadows
Cc: Lawrence F <lawrence.f.meadows@intel.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1210171303290.23243@vincent-weaver-1.um.maine.edu
Signed-off-by: Ingo Molnar <mingo@kernel.org>

perf/x86: Make Intel KNC use full 40-bit width of counters

Early versions of Intel KNC chips have a bug where bits above 32
were not properly set. We worked around this by only using the
bottom 32 bits (out of 40 that should be available).

It turns out this workaround breaks overflow handling.

The buggy silicon will in theory never be used in production
systems, so remove this workaround so we get proper overflow
support.

Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: eranian@gmail.com
Cc: Meadows Lawrence F <lawrence.f.meadows@intel.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1210171302140.23243@vincent-weaver-1.um.maine.edu
Signed-off-by: Ingo Molnar <mingo@kernel.org>

perf/x86/uncore: Handle pci_read_config_dword() errors

This, beyond handling corner cases, also fixes some build warnings:

arch/x86/kernel/cpu/perf_event_intel_uncore.c: In function ‘snbep_uncore_pci_disable_box’:
arch/x86/kernel/cpu/perf_event_intel_uncore.c:124:9: warning: ‘config’ is used uninitialized in this function [-Wuninitialized]
arch/x86/kernel/cpu/perf_event_intel_uncore.c: In function ‘snbep_uncore_pci_enable_box’:
arch/x86/kernel/cpu/perf_event_intel_uncore.c:135:9: warning: ‘config’ is used uninitialized in this function [-Wuninitialized]
arch/x86/kernel/cpu/perf_event_intel_uncore.c: In function ‘snbep_uncore_pci_read_counter’:
arch/x86/kernel/cpu/perf_event_intel_uncore.c:164:2: warning: ‘count’ is used uninitialized in this function [-Wuninitialized]

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Cc: a.p.zijlstra@chello.nl
Link: http://lkml.kernel.org/r/1351068140-13456-1-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86-64: Fix page table accounting

Commit 20167d3421a089a1bf1bd680b150dc69c9506810 ("x86-64: Fix
accounting in kernel_physical_mapping_init()") went a little too
far by entirely removing the counting of pre-populated page
tables: this should be done at boot time (to cover the page
tables set up in early boot code), but shouldn't be done during
memory hot add.

Hence, re-add the removed increments of "pages", but make them
and the one in phys_pte_init() conditional upon !after_bootmem.

Reported-Acked-and-Tested-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/506DAFBA020000780009FA8C@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

perf/x86: Remove P6 cpuc->enabled check

Between 2.6.33 and 2.6.34 the PMU code was made modular.

The x86_pmu_enable() call was extended to disable cpuc->enabled
and iterate the counters, enabling one at a time, before calling
enable_all() at the end, followed by re-enabling cpuc->enabled.

Since cpuc->enabled was set to 0, that change effectively caused
the "val |= ARCH_PERFMON_EVENTSEL_ENABLE;" code in p6_pmu_enable_event()
and p6_pmu_disable_event() to be dead code that was never called.

This change removes this code (which was confusing) and adds some
extra commentary to make it more clear what is going on.

Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1210191732000.14552@vincent-weaver-1.um.maine.edu
Signed-off-by: Ingo Molnar <mingo@kernel.org>

perf/x86: Update/fix generic events on P6 PMU

This patch updates the generic events on p6, including some new
extended cache events.

Values for these events were taken from the equivelant PAPI
predefined events.

Tested on a Pentium II.

Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1210191730080.14552@vincent-weaver-1.um.maine.edu
Signed-off-by: Ingo Molnar <mingo@kernel.org>

perf/x86: Fix P6 FP_ASSIST event constraint

According to Intel SDM Volume 3B, FP_ASSIST is limited to Counter 1 only,
not Counter 0.

Tested on a Pentium II.

Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1210191728570.14552@vincent-weaver-1.um.maine.edu
Signed-off-by: Ingo Molnar <mingo@kernel.org>

perf, cpu hotplug: Use cached value of smp_processor_id()

The perf_cpu_notifier() macro invokes smp_processor_id()
multiple times. Optimize it by using a local variable.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: peterz@infradead.org
Cc: acme@ghostprotocols.net
Link: http://lkml.kernel.org/r/20121016075817.3572.76733.stgit@srivatsabhat.in.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

perf, cpu hotplug: Run CPU_STARTING notifiers with irqs disabled

The CPU_STARTING notifiers are supposed to be run with irqs
disabled. But the perf_cpu_notifier() macro invokes them without
doing that. Fix it.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: peterz@infradead.org
Cc: acme@ghostprotocols.net
Link: http://lkml.kernel.org/r/20121016075809.3572.47848.stgit@srivatsabhat.in.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Revert "x86/mm: Fix the size calculation of mapping tables"

Commit:

722bc6b16771 x86/mm: Fix the size calculation of mapping tables

Tried to address the issue that the first 2/4M should use 4k pages
if PSE enabled, but extra counts should only be valid for x86_32.

This commit caused a kdump regression: the kdump kernel hangs.

Work is in progress to fundamentally fix the various page table
initialization issues that we have, via the design suggested
by H. Peter Anvin, but it's not ready yet to be merged.

So, to get a working kdump revert to the last known working version,
which is the revert of this commit and of a followup fix (which was
incomplete):

bd2753b2dda7 x86/mm: Only add extra pages count for the first memory range during pre-allocation

Tested kdump on physical and virtual machines.

Signed-off-by: Dave Young <dyoung@redhat.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Flavio Leitner <fbl@redhat.com>
Tested-by: Flavio Leitner <fbl@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Flavio Leitner <fbl@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: ianfang.cn@gmail.com
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86/perf: Fix virtualization sanity check

In check_hw_exists() we try to detect non-emulated MSR accesses
by writing an arbitrary value into one of the PMU registers
and check if it's value after a readout is still the same.
This algorithm silently assumes that the register does not contain
the magic value already, which is wrong in at least one situation.

Fix the algorithm to really do a read-modify-write cycle. This fixes
a warning under Xen under some circumstances on AMD family 10h CPUs.

The reasons in more details actually sound like a story from
Believe It or Not!:

First you need an AMD family 10h/12h CPU. These do not reset the
PERF_CTR registers on a reboot.
Now you boot bare metal Linux, which goes successfully through this
check, but leaves the magic value of 0xabcd in the register. You
don't use the performance counters, but do a reboot (warm reset).
Then you choose to boot Xen. The check will be triggered with a
recent Linux kernel as Dom0 again, trying to write 0xabcd into the
MSR. Xen silently drops the write (expected), but the subsequent read
will return the value in the register, which just happens to be the
expected magic value. Thus the test misleadingly succeeds, leaving
the kernel in the belief that the PMU is available. This will trigger
the following message:

[    0.020294] ------------[ cut here ]------------
[    0.020311] WARNING: at arch/x86/xen/enlighten.c:730 xen_apic_write+0x15/0x17()
[    0.020318] Hardware name: empty
[    0.020323] Modules linked in:
[    0.020334] Pid: 1, comm: swapper/0 Not tainted 3.3.8 #7
[    0.020340] Call Trace:
[    0.020354]  [<ffffffff81050379>] warn_slowpath_common+0x80/0x98
[    0.020369]  [<ffffffff810503a6>] warn_slowpath_null+0x15/0x17
[    0.020378]  [<ffffffff810034df>] xen_apic_write+0x15/0x17
[    0.020392]  [<ffffffff8101cb2b>] perf_events_lapic_init+0x2e/0x30
[    0.020410]  [<ffffffff81ee4dd0>] init_hw_perf_events+0x250/0x407
[    0.020419]  [<ffffffff81ee4b80>] ? check_bugs+0x2d/0x2d
[    0.020430]  [<ffffffff81002181>] do_one_initcall+0x7a/0x131
[    0.020444]  [<ffffffff81edbbf9>] kernel_init+0x91/0x15d
[    0.020456]  [<ffffffff817caaa4>] kernel_thread_helper+0x4/0x10
[    0.020471]  [<ffffffff817c347c>] ? retint_restore_args+0x5/0x6
[    0.020481]  [<ffffffff817caaa0>] ? gs_change+0x13/0x13
[    0.020500] ---[ end trace a7919e7f17c0a725 ]---

The new code will change every of the 16 low bits read from the
register and tries to write and read-back that modified number
from the MSR.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Avi Kivity <avi@redhat.com>
Link: http://lkml.kernel.org/r/1349797115-28346-2-git-send-email-andre.przywara@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge tag 'perf-urgent-for-mingo' of git://git./linux/kernel/git/acme/linux into perf/urgent

Pull perf/urgent fixes from Arnaldo Carvalho de Melo:

* Validate syscall id before growing syscall table in 'trace', fixing potential
   excessive memory usage.

* Validate perf_sample.raw_data, making 'trace' more robust, avoiding some
   potential SEGFAULTs when reading tracepoint fields.

* Fix exclude_guest parse events 'perf test's, from Jiri Olsa.

* Do not flush maps on COMM, that is sent by the kernel when a process is
   exec'ed, but also when a process changes its name. Since we were assuming
   a COMM always meant an EXEC, we were losing track of a process maps by
   flushing its maps. Fix from Luigi Semenzato.

* A recent patch introduced a problem by not initializing what should be
   the first kind of pager to use, 'man', instead it was being left as zero
   which means no pager. This caused 'perf subcmd --help' to produce no output.
   Fix from Namhyung Kim.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>

ARM: mm: Remove unused arm_vmregion priv field

Commit e9da6e9905e639b0f842a244bc770b48ad0523e9 ("ARM: dma-mapping:
remove custom consistent dma region") removed the last users of the
field. Remove it.

Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>

ARM: dma-mapping: fix build warning in __dma_alloc()

Fix build warning in __dma_alloc() as below:

arch/arm/mm/dma-mapping.c: In function '__dma_alloc':
arch/arm/mm/dma-mapping.c:653:29: warning: 'page' may be used uninitialized in this function [-Wuninitialized]

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>