linux-2.6-microblaze.git
18 years ago[PATCH] VFS: Make filldir_t and struct kstat deal in 64-bit inode numbers
David Howells [Tue, 3 Oct 2006 08:13:46 +0000 (01:13 -0700)]
[PATCH] VFS: Make filldir_t and struct kstat deal in 64-bit inode numbers

These patches make the kernel pass 64-bit inode numbers internally when
communicating to userspace, even on a 32-bit system.  They are required
because some filesystems have intrinsic 64-bit inode numbers: NFS3+ and XFS
for example.  The 64-bit inode numbers are then propagated to userspace
automatically where the arch supports it.

Problems have been seen with userspace (eg: ld.so) using the 64-bit inode
number returned by stat64() or getdents64() to differentiate files, and
failing because the 64-bit inode number space was compressed to 32-bits, and
so overlaps occur.

This patch:

Make filldir_t take a 64-bit inode number and struct kstat carry a 64-bit
inode number so that 64-bit inode numbers can be passed back to userspace.

The stat functions then returns the full 64-bit inode number where
available and where possible.  If it is not possible to represent the inode
number supplied by the filesystem in the field provided by userspace, then
error EOVERFLOW will be issued.

Similarly, the getdents/readdir functions now pass the full 64-bit inode
number to userspace where possible, returning EOVERFLOW instead when a
directory entry is encountered that can't be properly represented.

Note that this means that some inodes will not be stat'able on a 32-bit
system with old libraries where they were before - but it does mean that
there will be no ambiguity over what a 32-bit inode number refers to.

Note similarly that directory scans may be cut short with an error on a
32-bit system with old libraries where the scan would work before for the
same reasons.

It is judged unlikely that this situation will occur because modern glibc
uses 64-bit capable versions of stat and getdents class functions
exclusively, and that older systems are unlikely to encounter
unrepresentable inode numbers anyway.

[akpm: alpha build fix]
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] pid.h cleanup
Andrew Morton [Tue, 3 Oct 2006 08:13:45 +0000 (01:13 -0700)]
[PATCH] pid.h cleanup

Make the pid.h macros look less revolting in an 80-col window.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] Add unifdef to gitignore
Paul Mundt [Tue, 3 Oct 2006 02:26:34 +0000 (11:26 +0900)]
[PATCH] Add unifdef to gitignore

This seems to have been missed when unifdef went in
via Sam's tree..

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] hp100: fix conditional compilation mess
Jeff Garzik [Tue, 3 Oct 2006 01:08:22 +0000 (21:08 -0400)]
[PATCH] hp100: fix conditional compilation mess

The previous hp100 changeset attempted to kill warnings, but was only
tested on !CONFIG_ISA platforms.  The correct conditional compilation
setup involves tested CONFIG_ISA rather than just MODULE.

Fixes link on CONFIG_ISA platforms (i386) in current -git.

Signed-off-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] revert "insert IOAPIC(s) and Local APIC into resource map"
Andrew Morton [Tue, 3 Oct 2006 00:24:06 +0000 (17:24 -0700)]
[PATCH] revert "insert IOAPIC(s) and Local APIC into resource map"

Commit 54dbc0c9ebefb38840c6b07fa6eabaeb96c921f5 is causing various
people's machines to fail to map PCI resources.

Revert it in preparation for addressing the show-APICs-in-/proc/iomem
requirement in a different manner.

Cc: Aaron Durbin <adurbin@google.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland...
Linus Torvalds [Mon, 2 Oct 2006 22:29:11 +0000 (15:29 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/roland/infiniband

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  IB/ehca: Tweak trace message format
  IB/ehca: Fix device registration
  IB/ipath: Fix RDMA reads
  RDMA/cma: Optimize error handling
  RDMA/cma: Eliminate unnecessary remove_list
  RDMA/cma: Set status correctly on route resolution error
  RDMA/cma: Fix device removal race
  RDMA/cma: Fix leak of cm_ids in case of failures

18 years agoIB/ehca: Tweak trace message format
Hoang-Nam Nguyen [Mon, 2 Oct 2006 21:52:17 +0000 (14:52 -0700)]
IB/ehca: Tweak trace message format

Add an extra space to make things more readable.

Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
18 years agoIB/ehca: Fix device registration
Hoang-Nam Nguyen [Mon, 2 Oct 2006 21:52:17 +0000 (14:52 -0700)]
IB/ehca: Fix device registration

Move the call to ib_register_device() later, since a device should not
be registered until it is completely read to be used.  This fixes
crashes that occur if an upper-layer driver such as IPoIB is loaded
before the ehca module.

Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
18 years agoIB/ipath: Fix RDMA reads
Ralph Campbell [Fri, 29 Sep 2006 21:37:51 +0000 (14:37 -0700)]
IB/ipath: Fix RDMA reads

The PSN used to generate the request following a RDMA read was
incorrect and some state booking wasn't maintained correctly.  This
patch fixes that.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
18 years agoRDMA/cma: Optimize error handling
Krishna Kumar [Fri, 29 Sep 2006 19:09:51 +0000 (12:09 -0700)]
RDMA/cma: Optimize error handling

Reorganize code relating to cma_get_net_info() and rdam_create_id() to
optimize error case handling (no need to alloc memory/etc. as part of
rdma_create_id() if input parameters are wrong).

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
18 years agoRDMA/cma: Eliminate unnecessary remove_list
Krishna Kumar [Fri, 29 Sep 2006 19:03:35 +0000 (12:03 -0700)]
RDMA/cma: Eliminate unnecessary remove_list

Eliminate remove_list by using list_del_init() instead during device
removal handling.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
18 years agoRDMA/cma: Set status correctly on route resolution error
Sean Hefty [Fri, 29 Sep 2006 18:57:09 +0000 (11:57 -0700)]
RDMA/cma: Set status correctly on route resolution error

On reporting a route error, also include the status for the error,
rather than indicating a status of 0 when an error has occurred.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
18 years agoRDMA/cma: Fix device removal race
Krishna Kumar [Fri, 29 Sep 2006 18:51:49 +0000 (11:51 -0700)]
RDMA/cma: Fix device removal race

The race is as follows:

A process : cma_process_remove() calls cma_remove_id_dev(),
    which sets id state to CMA_DEVICE_REMOVAL and
    calls wait_event(dev_remove).

B process : cma_req_handler() had incremented dev_remove,
    and calls cma_acquire_ib_dev() and on failure
    calls cma_release_remove(), which does a
    wake_up of cma_process_remove(). Then
    cma_req_handler() calls rdma_destroy_id();

A Process : cma_remove_id_dev() gets woken and checks the
    state of id, and since it is still (wrongly)
    CMA_DEVICE_REMOVAL, it calls notify_user(id)
    and if that fails, the caller - cma_process_remove()
    calls rdma_destroy_id(id). Two processes can
    call rdma_destroy_id(), resulting in one
    de-referencing kfreed id_priv.

Fix is for process B to set CMA_DESTROYING in cma_req_handler()
so that process A will return instead of doing a rdma_destroy_id().

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
18 years agoRDMA/cma: Fix leak of cm_ids in case of failures
Krishna Kumar [Fri, 29 Sep 2006 18:47:06 +0000 (11:47 -0700)]
RDMA/cma: Fix leak of cm_ids in case of failures

cma_connect_ib() and cma_connect_iw() leak cm_id's in failure cases.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
18 years agoMerge master.kernel.org:/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog
Linus Torvalds [Mon, 2 Oct 2006 21:48:07 +0000 (14:48 -0700)]
Merge /linux/kernel/git/wim/linux-2.6-watchdog

* master.kernel.org:/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog:
  [WATCHDOG] improve machzwd detection
  [WATCHDOG] use ENOTTY instead of ENOIOCTLCMD in ioctl()
  [WATCHDOG] s3c24XX nowayout
  [WATCHDOG] pnx4008: add cpu_relax()
  [WATCHDOG] pnx4008_wdt.c - spinlock fixes.
  [WATCHDOG] pnx4008_wdt.c - remove patch
  [WATCHDOG] pnx4008_wdt.c - nowayout patch
  [WATCHDOG] pnx4008: add watchdog support
  [WATCHDOG] i8xx_tco remove pci_find_device.
  [WATCHDOG] alim remove pci_find_device

18 years ago[WATCHDOG] improve machzwd detection
Dave Jones [Tue, 1 Aug 2006 18:06:43 +0000 (20:06 +0200)]
[WATCHDOG] improve machzwd detection

On a machine with no machzwd, loading the module prints out..

machzwd: MachZ ZF-Logic Watchdog driver initializing.
0xffff
machzwd: Watchdog using action = RESET

- the 0xffff printk is unnecessary
- 0xffff seems to be 'hardware not present'
- fix CodingStyle. (This driver could use some more work here)

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Signed-off-by: Andrew Morton <akpm@osdl.org>
18 years ago[WATCHDOG] use ENOTTY instead of ENOIOCTLCMD in ioctl()
Samuel Tardieu [Sat, 9 Sep 2006 15:34:31 +0000 (17:34 +0200)]
[WATCHDOG] use ENOTTY instead of ENOIOCTLCMD in ioctl()

Return ENOTTY instead of ENOIOCTLCMD in user-visible ioctl() results

The watchdog drivers used to return ENOIOCTLCMD for bad ioctl() commands.
ENOIOCTLCMD should not be visible by the user, so use ENOTTY instead.

Signed-off-by: Samuel Tardieu <sam@rfc1149.net>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
18 years ago[WATCHDOG] s3c24XX nowayout
Ben Dooks [Wed, 6 Sep 2006 11:24:35 +0000 (12:24 +0100)]
[WATCHDOG] s3c24XX nowayout

If the driver is not configured for `no way out`,
then the open method should not automatically allow
the setting of allow_close to CLOSE_STATE_ALLOW.

The setting of allow_close nullifies the use of
the magic close via the write path. It means that
in the default state, the watchdog will shut-down
even if the magic close has not been issued.

Signed-off-by: Ben Dooks <ben-linux@fluff.org>
18 years ago[WATCHDOG] pnx4008: add cpu_relax()
Vitaly Wool [Mon, 11 Sep 2006 10:42:39 +0000 (14:42 +0400)]
[WATCHDOG] pnx4008: add cpu_relax()

Added cpu_relax as suggested by Alan Cox.

Signed-off-by: Vitaly Wool <vitalywool@gmail.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
18 years ago[WATCHDOG] pnx4008_wdt.c - spinlock fixes.
Wim Van Sebroeck [Sun, 10 Sep 2006 10:48:15 +0000 (12:48 +0200)]
[WATCHDOG] pnx4008_wdt.c - spinlock fixes.

Add io spinlocks to prevent possible race
conditions between start and stop operations
that are issued from different child processes
where the master process opened /dev/watchdog.

Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
18 years agoAdd prototype for sigset_from_compat()
Linus Torvalds [Mon, 2 Oct 2006 21:05:20 +0000 (14:05 -0700)]
Add prototype for sigset_from_compat()

Duh.  I screwed up editing David Howells patch in commit
3f2e05e90e0846c42626e3d272454f26be34a1bc, and the actual declaration for
the sigset_from_compat() function went missing. My bad.

Olaf Hering saved the day and noticed that I'm a moron.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[WATCHDOG] pnx4008_wdt.c - remove patch
Wim Van Sebroeck [Sun, 30 Jul 2006 18:06:07 +0000 (20:06 +0200)]
[WATCHDOG] pnx4008_wdt.c - remove patch

Change remove code so that we first detach
the driver from userspace, then clean up the
clock and then clean up the memory we allocated.

Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
18 years ago[WATCHDOG] pnx4008_wdt.c - nowayout patch
Wim Van Sebroeck [Mon, 3 Jul 2006 07:03:47 +0000 (09:03 +0200)]
[WATCHDOG] pnx4008_wdt.c - nowayout patch

Change nowayout to: WATCHDOG_NOWAYOUT as defined
in include/linux/watchdog.h .

Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
18 years ago[WATCHDOG] pnx4008: add watchdog support
Vitaly Wool [Mon, 26 Jun 2006 15:31:49 +0000 (19:31 +0400)]
[WATCHDOG] pnx4008: add watchdog support

Add watchdog support for Philips PNX4008 ARM board inlined.

Signed-off-by: Vitaly Wool <vitalywool@gmail.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
18 years ago[WATCHDOG] i8xx_tco remove pci_find_device.
Jiri Slaby [Wed, 19 Jul 2006 00:19:23 +0000 (02:18 +0159)]
[WATCHDOG] i8xx_tco remove pci_find_device.

Use refcounting for pci device obtaining.
Use PCI_DEVICE macro.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Cc: Andrew Morton <akpm@osdl.org>
18 years ago[WATCHDOG] alim remove pci_find_device
Jiri Slaby [Tue, 18 Jul 2006 16:30:00 +0000 (18:29 +0159)]
[WATCHDOG] alim remove pci_find_device

Convert pci_find_device to pci_get_device + pci_dev_put
in alim watchdog cards' drivers (refcounting).

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Signed-off-by: Andrew Morton <akpm@osdl.org>
18 years agoMerge branch 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik...
Linus Torvalds [Mon, 2 Oct 2006 15:56:58 +0000 (08:56 -0700)]
Merge branch 'upstream-linus' of /linux/kernel/git/jgarzik/netdev-2.6

* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6: (37 commits)
  [netdrvr] hp100: encapsulate all non-module code
  drivers/net/wireless/{airo,ipw2100}: fix error handling bugs
  [netdrvr] phy: Fix bugs in error handling
  [PATCH] spidernet: Use pci_dma_mapping_error()
  [PATCH] sky2: version 1.9
  [PATCH] sky2: fragmented receive for large MTU
  [PATCH] sky2: use netif_tx_lock instead of LLTX
  [PATCH] sky2: incremental transmit completion
  [PATCH] sky2: name irq after eth for irqbalance
  [PATCH] sky2: workarounds for some 88e806x chips
  [PATCH] sky2: use standard pci register capabilties for error register
  [PATCH] sky2: gigabit full duplex negotiation
  e100, e1000, ixgb: increment version numbers
  ixgb: convert to netdev_priv(netdev)
  ixgb: combine more rx descriptors to improve performance
  e1000: possible memory leak in e1000_set_ringparam
  e1000: Janitor: Use #defined values for literals
  e1000: don't strip vlan ID if 8021q claims it
  e1000: rework polarity, NVM, eeprom code and fixes.
  e1000: driver state fixes (race fix)
  ...

18 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy...
Linus Torvalds [Mon, 2 Oct 2006 15:27:09 +0000 (08:27 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/shaggy/jfs-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6:
  JFS: White space cleanup
  [PATCH] JFS: return correct error when i-node allocation failed
  JFS: Remove shadow variable from fs/jfs/jfs_txnmgr.c:xtLog()

18 years agoMerge git://git.infradead.org/mtd-2.6
Linus Torvalds [Mon, 2 Oct 2006 15:22:17 +0000 (08:22 -0700)]
Merge git://git.infradead.org/mtd-2.6

* git://git.infradead.org/mtd-2.6:
  [MTD] Cleanup of 'ioremap balanced with iounmap for drivers/mtd subsystem'
  [MTD] fix nftl_write warning
  [MTD] fix printk warning
  [MTD ONENAND] Check OneNAND lock scheme & all block unlock command support
  [MTD ONENAND] Remove unused MTD_ONENAND_SYNC_READ configuration
  [MTD ONENAND] Fix OneNAND probe
  [MTD NAND] Provide prototype for newly-exported nand_wait_ready()
  [MTD] Remove #ifndef __KERNEL__ hack in <mtd/mtd-abi.h>
  [MTD NAND] Allow override of page read and write functions.
  [MTD NAND] Allocate chip->buffers separately to allow it to be overridden
  [MTD NAND] Split nand_scan() into two parts; allow board driver to intervene
  [MTD NAND] Export nand_wait_ready() for use by board drivers

18 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Linus Torvalds [Mon, 2 Oct 2006 15:20:33 +0000 (08:20 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (35 commits)
  Input: wistron - add support for Acer TravelMate 2424NWXCi
  Input: wistron - fix setting up special buttons
  Input: add KEY_BLUETOOTH and KEY_WLAN definitions
  Input: add new BUS_VIRTUAL bus type
  Input: add driver for stowaway serial keyboards
  Input: make input_register_handler() return error codes
  Input: remove cruft that was needed for transition to sysfs
  Input: fix input module refcounting
  Input: constify input core
  Input: libps2 - rearrange exports
  Input: atkbd - support Microsoft Natural Elite Pro keyboards
  Input: i8042 - disable MUX mode on Toshiba Equium A110
  Input: i8042 - get rid of polling timer
  Input: send key up events at disconnect
  Input: constify psmouse driver
  Input: i8042 - add Amoi to the MUX blacklist
  Input: logips2pp - add sugnature 56 (Cordless MouseMan Wheel), cleanup
  Input: add driver for Touchwin serial touchscreens
  Input: add driver for Touchright serial touchscreens
  Input: add driver for Penmount serial touchscreens
  ...

18 years agoMerge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus
Linus Torvalds [Mon, 2 Oct 2006 15:18:43 +0000 (08:18 -0700)]
Merge branch 'upstream' of git://ftp.linux-mips.org/upstream-linus

* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
  [MIPS] Remove unused galileo-boars header files
  [MIPS] Rename SERIAL_PORT_DEFNS for EV64120
  [MIPS] Add UART IRQ number for EV64120
  [MIPS] Remove excite_flash.c
  [MIPS] Update i8259 resources.
  [MIPS] Make unwind_stack() can dig into interrupted context
  [MIPS] Stacktrace build-fix and improvement
  [MIPS] QEMU: Add support for little endian mips
  [MIPS] Remove __flush_icache_page
  [MIPS] lockdep: update defconfigs
  [MIPS] lockdep: Add STACKTRACE_SUPPORT and enable LOCKDEP_SUPPORT
  [MIPS] lockdep: fix TRACE_IRQFLAGS_SUPPORT

18 years ago[PATCH] BLOCK: Revert patch to hack around undeclared sigset_t in linux/compat.h
David Howells [Mon, 2 Oct 2006 13:12:31 +0000 (14:12 +0100)]
[PATCH] BLOCK: Revert patch to hack around undeclared sigset_t in linux/compat.h

Revert Andrew Morton's patch to temporarily hack around the lack of a
declaration of sigset_t in linux/compat.h to make the block-disablement
patches build on IA64.  This got accidentally pushed to Linus and should
be fixed in a different manner.

Also make linux/compat.h #include asm/signal.h to gain a definition of
sigset_t so that it can externally declare sigset_from_compat().

This has been compile-tested for i386, x86_64, ia64, mips, mips64, frv, ppc and
ppc64 and run-tested on frv.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] replace cad_pid by a struct pid
Cedric Le Goater [Mon, 2 Oct 2006 09:19:00 +0000 (02:19 -0700)]
[PATCH] replace cad_pid by a struct pid

There are a few places in the kernel where the init task is signaled.  The
ctrl+alt+del sequence is one them.  It kills a task, usually init, using a
cached pid (cad_pid).

This patch replaces the pid_t by a struct pid to avoid pid wrap around
problem.  The struct pid is initialized at boot time in init() and can be
modified through systctl with

/proc/sys/kernel/cad_pid

[ I haven't found any distro using it ? ]

It also introduces a small helper routine kill_cad_pid() which is used
where it seemed ok to use cad_pid instead of pid 1.

[akpm@osdl.org: cleanups, build fix]
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] introduce get_task_pid() to fix unsafe get_pid()
Oleg Nesterov [Mon, 2 Oct 2006 09:18:59 +0000 (02:18 -0700)]
[PATCH] introduce get_task_pid() to fix unsafe get_pid()

proc_pid_make_inode:

ei->pid = get_pid(task_pid(task));

I think this is not safe.  get_pid() can be preempted after checking "pid
!= NULL".  Then the task exits, does detach_pid(), and RCU frees the pid.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] proc: comment what proc_fill_cache does
Eric W. Biederman [Mon, 2 Oct 2006 09:18:57 +0000 (02:18 -0700)]
[PATCH] proc: comment what proc_fill_cache does

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] proc: remove the useless SMP-safe comments from /proc
Eric W. Biederman [Mon, 2 Oct 2006 09:18:57 +0000 (02:18 -0700)]
[PATCH] proc: remove the useless SMP-safe comments from /proc

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] proc: remove trailing blank entry from pid_entry arrays
Eric W. Biederman [Mon, 2 Oct 2006 09:18:56 +0000 (02:18 -0700)]
[PATCH] proc: remove trailing blank entry from pid_entry arrays

It was pointed out that since I am taking ARRAY_SIZE anyway the trailing empty
entry is silly and just wastes space.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] proc: properly compute TGID_OFFSET
Eric W. Biederman [Mon, 2 Oct 2006 09:18:55 +0000 (02:18 -0700)]
[PATCH] proc: properly compute TGID_OFFSET

The value doesn't change but this ensures I will have the proper value when
other files are added to proc_base_stuff.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] proc: drop tasklist lock in task_state()
Oleg Nesterov [Mon, 2 Oct 2006 09:18:54 +0000 (02:18 -0700)]
[PATCH] proc: drop tasklist lock in task_state()

task_state() needs tasklist_lock to protect ->parent/->real_parent.  However
task->parent points to nowhere only when the actions below happen in order

1) release_task(task)
2) release_task(task->parent)
3) a grace period passed

But 3) implies that the memory ops from 1) should be finished, so pid_alive()
can't be true in such a case.

Otherwise, we don't care if ->parent/->real_parent changes under us.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] proc: convert do_task_stat() to use lock_task_sighand()
Oleg Nesterov [Mon, 2 Oct 2006 09:18:53 +0000 (02:18 -0700)]
[PATCH] proc: convert do_task_stat() to use lock_task_sighand()

Drop tasklist_lock. ->siglock protects almost all interesting data
(including sub-threads traversal) except:

->signal->tty
protected by tty_mutex

->real_parent
the task can't be unhashed while we are holding
->siglock, so ->real_parent can change from under us
but we can safely dereference it under rcu_read_lock()

->pgrp/->session
we can get inconsistent numbers if the task does
sys_setsid/daemonize at the same time. I hope this
is acceptable.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] proc: convert task_sig() to use lock_task_sighand()
Oleg Nesterov [Mon, 2 Oct 2006 09:18:52 +0000 (02:18 -0700)]
[PATCH] proc: convert task_sig() to use lock_task_sighand()

lock_task_sighand() can take ->siglock without holding tasklist_lock.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] proc: Use pid_task instead of open coding it
Eric W. Biederman [Mon, 2 Oct 2006 09:18:51 +0000 (02:18 -0700)]
[PATCH] proc: Use pid_task instead of open coding it

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] proc: Merge proc_tid_attr and proc_tgid_attr
Eric W. Biederman [Mon, 2 Oct 2006 09:18:50 +0000 (02:18 -0700)]
[PATCH] proc: Merge proc_tid_attr and proc_tgid_attr

The implementation is exactly the same and there is currently nothing to
distinguish proc_tid_attr, and proc_tgid_attr.  So it is pointless to have two
separate implementations.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] proc: Remove the hard coded inode numbers
Eric W. Biederman [Mon, 2 Oct 2006 09:18:49 +0000 (02:18 -0700)]
[PATCH] proc: Remove the hard coded inode numbers

The hard coded inode numbers in proc currently limit its maintainability,
its flexibility, and what can be done with the rest of system.  /proc limits
pid-max to 32768 on 32 bit systems it limits fd-max to 32768 on all systems,
and placing the pid in the inode number really gets in the way of implementing
subdirectories of per process information.

Ever since people started adding to the middle of the file type enumeration we
haven't been maintaing the historical inode numbers, all we have really
succeeded in doing is keeping the pid in the proc inode number.  The pid is
already available in the directory name so no information is lost removing it
from the inode number.

So if something in user space cares if we remove the inode number from the
/proc inode it is almost certainly broken.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] proc: Factor out an instantiate method from every lookup method
Eric W. Biederman [Mon, 2 Oct 2006 09:18:49 +0000 (02:18 -0700)]
[PATCH] proc: Factor out an instantiate method from every lookup method

To remove the hard coded proc inode numbers it is necessary to be able to
create the proc inodes during readdir.  The instantiate methods are the subset
of lookup that is needed to accomplish that.

This first step just splits the lookup methods into 2 functions.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] proc: Make the generation of the self symlink table driven
Eric W. Biederman [Mon, 2 Oct 2006 09:18:48 +0000 (02:18 -0700)]
[PATCH] proc: Make the generation of the self symlink table driven

This patch generalizes the concept of files in /proc that are related to
processes but live in the root directory of /proc

Ideally this would reuse infrastructure from the rest of the process specific
parts of proc but unfortunately security_task_to_inode must not be called on
files that are not strictly per process.  security_task_to_inode really needs
to be reexamined as the security label can change in important places that we
are not currently catching, but I'm not certain that simplifies this problem.

By at least matching the structure of the rest of proc we get more idiom reuse
and it becomes easier to spot problems in the way things are put together.

Later things like /proc/mounts are likely to be moved into proc_base as well.
If union mounts are ever supported we may be able to make /proc a union mount,
and properly split it into 2 filesystems.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] AVR32: Implement kernel_execve
Haavard Skinnemoen [Mon, 2 Oct 2006 09:18:46 +0000 (02:18 -0700)]
[PATCH] AVR32: Implement kernel_execve

Move execve() into arch/avr32/kernel/sys_avr32.c, rename it to
kernel_execve() and return the syscall return value directly without
setting errno.

This also gets rid of the __KERNEL_SYSCALLS__ stuff from unistd.h and
expands #ifdef __KERNEL__ to cover everything in unistd.h except the
__NR_foo definitions.

Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] remove remaining errno and __KERNEL_SYSCALLS__ references
Arnd Bergmann [Mon, 2 Oct 2006 09:18:44 +0000 (02:18 -0700)]
[PATCH] remove remaining errno and __KERNEL_SYSCALLS__ references

The last in-kernel user of errno is gone, so we should remove the definition
and everything referring to it.  This also removes the now-unused lib/execve.c
file that was introduced earlier.

Also remove every trace of __KERNEL_SYSCALLS__ that still remained in the
kernel.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Andi Kleen <ak@muc.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Hirokazu Takata <takata.hirokazu@renesas.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Cc: Chris Zankel <chris@zankel.net>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] sh64: remove the use of kernel syscalls
Arnd Bergmann [Mon, 2 Oct 2006 09:18:41 +0000 (02:18 -0700)]
[PATCH] sh64: remove the use of kernel syscalls

sh64 is using system call macros to call some functions from the kernel.

The old debug code can simply be removed, since we don't really have that much
of a need for it anymore, it was mostly something that was handy during the
initial bringup.  This also brings us closer to something that looks like
readable code again..

I also added a sane kernel_thread() implementation that gets away from this,
so that should take care of sh64 at least.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Andi Kleen <ak@muc.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Hirokazu Takata <takata.hirokazu@renesas.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Cc: Chris Zankel <chris@zankel.net>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] Remove the use of _syscallX macros in UML
Arnd Bergmann [Mon, 2 Oct 2006 09:18:37 +0000 (02:18 -0700)]
[PATCH] Remove the use of _syscallX macros in UML

User mode linux uses _syscallX() to call into the host kernel.  The
recommended way to do this is to use the syscall() function from libc.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Andi Kleen <ak@muc.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Hirokazu Takata <takata.hirokazu@renesas.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Cc: Chris Zankel <chris@zankel.net>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] provide kernel_execve on all architectures
Arnd Bergmann [Mon, 2 Oct 2006 09:18:34 +0000 (02:18 -0700)]
[PATCH] provide kernel_execve on all architectures

This adds the new kernel_execve function on all architectures that were using
_syscall3() to implement execve.

The implementation uses code from the _syscall3 macros provided in the
unistd.h header file.  I don't have cross-compilers for any of these
architectures, so the patch is untested with the exception of i386.

Most architectures can probably implement this in a nicer way in assembly or
by combining it with the sys_execve implementation itself, but this should do
it for now.

[bunk@stusta.de: m68knommu build fix]
[markh@osdl.org: build fix]
[bero@arklinux.org: build fix]
[ralf@linux-mips.org: mips fix]
[schwidefsky@de.ibm.com: s390 fix]
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Andi Kleen <ak@muc.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Hirokazu Takata <takata.hirokazu@renesas.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Cc: Chris Zankel <chris@zankel.net>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Mark Haverkamp <markh@osdl.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] rename the provided execve functions to kernel_execve
Arnd Bergmann [Mon, 2 Oct 2006 09:18:31 +0000 (02:18 -0700)]
[PATCH] rename the provided execve functions to kernel_execve

Some architectures provide an execve function that does not set errno, but
instead returns the result code directly.  Rename these to kernel_execve to
get the right semantics there.  Moreover, there is no reasone for any of these
architectures to still provide __KERNEL_SYSCALLS__ or _syscallN macros, so
remove these right away.

[akpm@osdl.org: build fix]
[bunk@stusta.de: build fix]
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Andi Kleen <ak@muc.de>
Acked-by: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Hirokazu Takata <takata.hirokazu@renesas.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Cc: Chris Zankel <chris@zankel.net>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] introduce kernel_execve
Arnd Bergmann [Mon, 2 Oct 2006 09:18:26 +0000 (02:18 -0700)]
[PATCH] introduce kernel_execve

The use of execve() in the kernel is dubious, since it relies on the
__KERNEL_SYSCALLS__ mechanism that stores the result in a global errno
variable.  As a first step of getting rid of this, change all users to a
global kernel_execve function that returns a proper error code.

This function is a terrible hack, and a later patch removes it again after the
kernel syscalls are gone.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Andi Kleen <ak@muc.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Hirokazu Takata <takata.hirokazu@renesas.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Cc: Chris Zankel <chris@zankel.net>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] ipc: replace kmalloc and memset in get_undo_list with kzalloc
Matt Helsley [Mon, 2 Oct 2006 09:18:25 +0000 (02:18 -0700)]
[PATCH] ipc: replace kmalloc and memset in get_undo_list with kzalloc

Simplify get_undo_list() by dropping the unnecessary cast, removing the
size variable, and switching to kzalloc() instead of a kmalloc() followed
by a memset().

This cleanup was split then modified from Jes Sorenson's Task Notifiers
patches.

Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Cc: Jes Sorensen <jes@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] nsproxy cloning error path fix
Pavel [Mon, 2 Oct 2006 09:18:24 +0000 (02:18 -0700)]
[PATCH] nsproxy cloning error path fix

This patch fixes copy_namespaces()'s error path.

when new nsproxy (new_ns) is created pointers to namespaces (ipc, uts) are
copied from the old nsproxy.  Later in copy_utsname, copy_ipcs, etc.
according namespaces are get-ed.  On error path needed namespaces are
put-ed, so there's no need to put new nsproxy itelf as it woud cause
putting namespaces for the second time.

Found when incorporating namespaces into OpenVZ kernel.

Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] IPC namespace - sysctls
Kirill Korotaev [Mon, 2 Oct 2006 09:18:23 +0000 (02:18 -0700)]
[PATCH] IPC namespace - sysctls

Sysctl tweaks for IPC namespace

Signed-off-by: Pavel Emelianiov <xemul@openvz.org>
Signed-off-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] IPC namespace - shm
Kirill Korotaev [Mon, 2 Oct 2006 09:18:22 +0000 (02:18 -0700)]
[PATCH] IPC namespace - shm

IPC namespace support for IPC shm code.

Signed-off-by: Pavel Emelianiov <xemul@openvz.org>
Signed-off-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] IPC namespace - sem
Kirill Korotaev [Mon, 2 Oct 2006 09:18:22 +0000 (02:18 -0700)]
[PATCH] IPC namespace - sem

IPC namespace support for IPC sem code.

Signed-off-by: Pavel Emelianiov <xemul@openvz.org>
Signed-off-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] IPC namespace - msg
Kirill Korotaev [Mon, 2 Oct 2006 09:18:21 +0000 (02:18 -0700)]
[PATCH] IPC namespace - msg

IPC namespace support for IPC msg code.

Signed-off-by: Pavel Emelianiov <xemul@openvz.org>
Signed-off-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] IPC namespace - utils
Kirill Korotaev [Mon, 2 Oct 2006 09:18:20 +0000 (02:18 -0700)]
[PATCH] IPC namespace - utils

This patch adds basic IPC namespace functionality to
IPC utils:
- init_ipc_ns
- copy/clone/unshare/free IPC ns
- /proc preparations

Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Signed-off-by: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] IPC namespace core
Kirill Korotaev [Mon, 2 Oct 2006 09:18:19 +0000 (02:18 -0700)]
[PATCH] IPC namespace core

This patch set allows to unshare IPCs and have a private set of IPC objects
(sem, shm, msg) inside namespace.  Basically, it is another building block of
containers functionality.

This patch implements core IPC namespace changes:
- ipc_namespace structure
- new config option CONFIG_IPC_NS
- adds CLONE_NEWIPC flag
- unshare support

[clg@fr.ibm.com: small fix for unshare of ipc namespace]
[akpm@osdl.org: build fix]
Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Signed-off-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] uts: copy nsproxy only when needed
Serge Hallyn [Mon, 2 Oct 2006 09:18:18 +0000 (02:18 -0700)]
[PATCH] uts: copy nsproxy only when needed

The nsproxy was being copied in unshare() when anything was being unshared,
even if it was something not referenced from nsproxy.  This should end up
in some cases with far more memory usage than necessary.

Signed-off-by: Serge Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] namespaces: utsname: implement CLONE_NEWUTS flag
Serge E. Hallyn [Mon, 2 Oct 2006 09:18:17 +0000 (02:18 -0700)]
[PATCH] namespaces: utsname: implement CLONE_NEWUTS flag

Implement a CLONE_NEWUTS flag, and use it at clone and sys_unshare.

[clg@fr.ibm.com: IPC unshare fix]
[bunk@stusta.de: cleanup]
Signed-off-by: Serge Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] namespaces: utsname: remove system_utsname
Serge E. Hallyn [Mon, 2 Oct 2006 09:18:16 +0000 (02:18 -0700)]
[PATCH] namespaces: utsname: remove system_utsname

The system_utsname isn't needed now that kernel/sysctl.c is fixed.
Nuke it.

Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] namespaces: utsname: sysctl
Serge E. Hallyn [Mon, 2 Oct 2006 09:18:15 +0000 (02:18 -0700)]
[PATCH] namespaces: utsname: sysctl

Sysctl uts patch.  This will need to be done another way, but since sysctl
itself needs to be container aware, 'the right thing' is a separate patchset.

[akpm@osdl.org: ia64 build fix]
[sam.vilain@catalyst.net.nz: cleanup]
[sam.vilain@catalyst.net.nz: add proc_do_utsns_string]
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] namespaces: utsname: implement utsname namespaces
Serge E. Hallyn [Mon, 2 Oct 2006 09:18:14 +0000 (02:18 -0700)]
[PATCH] namespaces: utsname: implement utsname namespaces

This patch defines the uts namespace and some manipulators.
Adds the uts namespace to task_struct, and initializes a
system-wide init namespace.

It leaves a #define for system_utsname so sysctl will compile.
This define will be removed in a separate patch.

[akpm@osdl.org: build fix, cleanup]
Signed-off-by: Serge Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] namespaces: utsname: use init_utsname when appropriate
Serge E. Hallyn [Mon, 2 Oct 2006 09:18:13 +0000 (02:18 -0700)]
[PATCH] namespaces: utsname: use init_utsname when appropriate

In some places, particularly drivers and __init code, the init utsns is the
appropriate one to use.  This patch replaces those with a the init_utsname
helper.

Changes: Removed several uses of init_utsname().  Hope I picked all the
right ones in net/ipv4/ipconfig.c.  These are now changed to
utsname() (the per-process namespace utsname) in the previous
patch (2/7)

[akpm@osdl.org: CIFS fix]
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Cc: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] namespaces: utsname: switch to using uts namespaces
Serge E. Hallyn [Mon, 2 Oct 2006 09:18:11 +0000 (02:18 -0700)]
[PATCH] namespaces: utsname: switch to using uts namespaces

Replace references to system_utsname to the per-process uts namespace
where appropriate.  This includes things like uname.

Changes: Per Eric Biederman's comments, use the per-process uts namespace
for ELF_PLATFORM, sunrpc, and parts of net/ipv4/ipconfig.c

[jdike@addtoit.com: UML fix]
[clg@fr.ibm.com: cleanup]
[akpm@osdl.org: build fix]
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] namespaces: utsname: introduce temporary helpers
Serge E. Hallyn [Mon, 2 Oct 2006 09:18:10 +0000 (02:18 -0700)]
[PATCH] namespaces: utsname: introduce temporary helpers

Define utsname() and init_utsname() which return &system_utsname.  Users of
system_utsname will be changed to use these helpers, after which
system_utsname will disappear.

Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] namespaces: exit_task_namespaces() invalidates nsproxy
Cedric Le Goater [Mon, 2 Oct 2006 09:18:09 +0000 (02:18 -0700)]
[PATCH] namespaces: exit_task_namespaces() invalidates nsproxy

exit_task_namespaces() has replaced the former exit_namespace().  It
invalidates task->nsproxy and associated namespaces.  This is an issue for
the (futur) pid namespace which is required to be valid in exit_notify().

This patch moves exit_task_namespaces() after exit_notify() to keep nsproxy
valid.

Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] namespaces: incorporate fs namespace into nsproxy
Serge E. Hallyn [Mon, 2 Oct 2006 09:18:08 +0000 (02:18 -0700)]
[PATCH] namespaces: incorporate fs namespace into nsproxy

This moves the mount namespace into the nsproxy.  The mount namespace count
now refers to the number of nsproxies point to it, rather than the number of
tasks.  As a result, the unshare_namespace() function in kernel/fork.c no
longer checks whether it is being shared.

Signed-off-by: Serge Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] nsproxy: move init_nsproxy into kernel/nsproxy.c
Serge E. Hallyn [Mon, 2 Oct 2006 09:18:07 +0000 (02:18 -0700)]
[PATCH] nsproxy: move init_nsproxy into kernel/nsproxy.c

Move the init_nsproxy definition out of arch/ into kernel/nsproxy.c.  This
avoids all arches having to be updated.  Compiles and boots on s390.

Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] namespaces: add nsproxy
Serge E. Hallyn [Mon, 2 Oct 2006 09:18:06 +0000 (02:18 -0700)]
[PATCH] namespaces: add nsproxy

This patch adds a nsproxy structure to the task struct.  Later patches will
move the fs namespace pointer into this structure, and introduce a new utsname
namespace into the nsproxy.

The vserver and openvz functionality, then, would be implemented in large part
by virtualizing/isolating more and more resources into namespaces, each
contained in the nsproxy.

[akpm@osdl.org: build fix]
Signed-off-by: Serge Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] make kernel/sysctl.c:_proc_do_string() static
Adrian Bunk [Mon, 2 Oct 2006 09:18:05 +0000 (02:18 -0700)]
[PATCH] make kernel/sysctl.c:_proc_do_string() static

This patch makes the needlessly global _proc_do_string() static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] proc: sysctl: add _proc_do_string helper
Sam Vilain [Mon, 2 Oct 2006 09:18:04 +0000 (02:18 -0700)]
[PATCH] proc: sysctl: add _proc_do_string helper

The logic in proc_do_string is worth re-using without passing in a
ctl_table structure (say, we want to calculate a pointer and pass that in
instead); pass in the two fields it uses from that structure as explicit
arguments.

Signed-off-by: Sam Vilain <sam.vilain@catalyst.net.nz>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] nfsd: lockdep annotation
Peter Zijlstra [Mon, 2 Oct 2006 09:18:03 +0000 (02:18 -0700)]
[PATCH] nfsd: lockdep annotation

while doing a kernel make modules_install install over an NFS mount.

  =============================================
  [ INFO: possible recursive locking detected ]
  ---------------------------------------------
  nfsd/9550 is trying to acquire lock:
   (&inode->i_mutex){--..}, at: [<c034c845>] mutex_lock+0x1c/0x1f

  but task is already holding lock:
   (&inode->i_mutex){--..}, at: [<c034c845>] mutex_lock+0x1c/0x1f

  other info that might help us debug this:
  2 locks held by nfsd/9550:
   #0:  (hash_sem){..--}, at: [<cc895223>] exp_readlock+0xd/0xf [nfsd]
   #1:  (&inode->i_mutex){--..}, at: [<c034c845>] mutex_lock+0x1c/0x1f

  stack backtrace:
   [<c0103508>] show_trace_log_lvl+0x58/0x152
   [<c0103b8b>] show_trace+0xd/0x10
   [<c0103c2f>] dump_stack+0x19/0x1b
   [<c012aa57>] __lock_acquire+0x77a/0x9a3
   [<c012af4a>] lock_acquire+0x60/0x80
   [<c034c6c2>] __mutex_lock_slowpath+0xa7/0x20e
   [<c034c845>] mutex_lock+0x1c/0x1f
   [<c0162edc>] vfs_unlink+0x34/0x8a
   [<cc891d98>] nfsd_unlink+0x18f/0x1e2 [nfsd]
   [<cc89884f>] nfsd3_proc_remove+0x95/0xa2 [nfsd]
   [<cc88f0d4>] nfsd_dispatch+0xc0/0x178 [nfsd]
   [<c033e84d>] svc_process+0x3a5/0x5ed
   [<cc88f5ba>] nfsd+0x1a7/0x305 [nfsd]
   [<c0101005>] kernel_thread_helper+0x5/0xb
  DWARF2 unwinder stuck at kernel_thread_helper+0x5/0xb
  Leftover inexact backtrace:
   [<c0103b8b>] show_trace+0xd/0x10
   [<c0103c2f>] dump_stack+0x19/0x1b
   [<c012aa57>] __lock_acquire+0x77a/0x9a3
   [<c012af4a>] lock_acquire+0x60/0x80
   [<c034c6c2>] __mutex_lock_slowpath+0xa7/0x20e
   [<c034c845>] mutex_lock+0x1c/0x1f
   [<c0162edc>] vfs_unlink+0x34/0x8a
   [<cc891d98>] nfsd_unlink+0x18f/0x1e2 [nfsd]
   [<cc89884f>] nfsd3_proc_remove+0x95/0xa2 [nfsd]
   [<cc88f0d4>] nfsd_dispatch+0xc0/0x178 [nfsd]
   [<c033e84d>] svc_process+0x3a5/0x5ed
   [<cc88f5ba>] nfsd+0x1a7/0x305 [nfsd]
   [<c0101005>] kernel_thread_helper+0x5/0xb

  =============================================
  [ INFO: possible recursive locking detected ]
  ---------------------------------------------
  nfsd/9580 is trying to acquire lock:
   (&inode->i_mutex){--..}, at: [<c034cc1d>] mutex_lock+0x1c/0x1f

  but task is already holding lock:
   (&inode->i_mutex){--..}, at: [<c034cc1d>] mutex_lock+0x1c/0x1f

  other info that might help us debug this:
  2 locks held by nfsd/9580:
   #0:  (hash_sem){..--}, at: [<cc89522b>] exp_readlock+0xd/0xf [nfsd]
   #1:  (&inode->i_mutex){--..}, at: [<c034cc1d>] mutex_lock+0x1c/0x1f

  stack backtrace:
   [<c0103508>] show_trace_log_lvl+0x58/0x152
   [<c0103b8b>] show_trace+0xd/0x10
   [<c0103c2f>] dump_stack+0x19/0x1b
   [<c012aa63>] __lock_acquire+0x77a/0x9a3
   [<c012af56>] lock_acquire+0x60/0x80
   [<c034ca9a>] __mutex_lock_slowpath+0xa7/0x20e
   [<c034cc1d>] mutex_lock+0x1c/0x1f
   [<cc892ad1>] nfsd_setattr+0x2c8/0x499 [nfsd]
   [<cc893ede>] nfsd_create_v3+0x31b/0x4ac [nfsd]
   [<cc8984a1>] nfsd3_proc_create+0x128/0x138 [nfsd]
   [<cc88f0d4>] nfsd_dispatch+0xc0/0x178 [nfsd]
   [<c033ec1d>] svc_process+0x3a5/0x5ed
   [<cc88f5ba>] nfsd+0x1a7/0x305 [nfsd]
   [<c0101005>] kernel_thread_helper+0x5/0xb
  DWARF2 unwinder stuck at kernel_thread_helper+0x5/0xb
  Leftover inexact backtrace:
   [<c0103b8b>] show_trace+0xd/0x10
   [<c0103c2f>] dump_stack+0x19/0x1b
   [<c012aa63>] __lock_acquire+0x77a/0x9a3
   [<c012af56>] lock_acquire+0x60/0x80
   [<c034ca9a>] __mutex_lock_slowpath+0xa7/0x20e
   [<c034cc1d>] mutex_lock+0x1c/0x1f
   [<cc892ad1>] nfsd_setattr+0x2c8/0x499 [nfsd]
   [<cc893ede>] nfsd_create_v3+0x31b/0x4ac [nfsd]
   [<cc8984a1>] nfsd3_proc_create+0x128/0x138 [nfsd]
   [<cc88f0d4>] nfsd_dispatch+0xc0/0x178 [nfsd]
   [<c033ec1d>] svc_process+0x3a5/0x5ed
   [<cc88f5ba>] nfsd+0x1a7/0x305 [nfsd]
   [<c0101005>] kernel_thread_helper+0x5/0xb

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Neil Brown <neilb@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] knfsd: allow admin to set nthreads per node
Greg Banks [Mon, 2 Oct 2006 09:18:02 +0000 (02:18 -0700)]
[PATCH] knfsd: allow admin to set nthreads per node

Add /proc/fs/nfsd/pool_threads which allows the sysadmin (or a userspace
daemon) to read and change the number of nfsd threads in each pool.  The
format is a list of space-separated integers, one per pool.

Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] knfsd: make rpc threads pools numa aware
Greg Banks [Mon, 2 Oct 2006 09:18:01 +0000 (02:18 -0700)]
[PATCH] knfsd: make rpc threads pools numa aware

Actually implement multiple pools.  On NUMA machines, allocate a svc_pool per
NUMA node; on SMP a svc_pool per CPU; otherwise a single global pool.  Enqueue
sockets on the svc_pool corresponding to the CPU on which the socket bh is run
(i.e.  the NIC interrupt CPU).  Threads have their cpu mask set to limit them
to the CPUs in the svc_pool that owns them.

This is the patch that allows an Altix to scale NFS traffic linearly
beyond 4 CPUs and 4 NICs.

Incorporates changes and feedback from Neil Brown, Trond Myklebust, and
Christoph Hellwig.

Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] knfsd: use svc_set_num_threads to manage threads in knfsd
Greg Banks [Mon, 2 Oct 2006 09:18:00 +0000 (02:18 -0700)]
[PATCH] knfsd: use svc_set_num_threads to manage threads in knfsd

Replace the existing list of all nfsd threads with new code using
svc_create_pooled().

Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] knfsd: add svc_set_num_threads
Greg Banks [Mon, 2 Oct 2006 09:17:59 +0000 (02:17 -0700)]
[PATCH] knfsd: add svc_set_num_threads

Currently knfsd keeps its own list of all nfsd threads in nfssvc.c; add a new
way of managing the list of all threads in a svc_serv.  Add
svc_create_pooled() to allow creation of a svc_serv whose threads are managed
by the sunrpc code.  Add svc_set_num_threads() to manage the number of threads
in a service, either per-pool or globally across the service.

Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] knfsd: add svc_get
Greg Banks [Mon, 2 Oct 2006 09:17:58 +0000 (02:17 -0700)]
[PATCH] knfsd: add svc_get

add svc_get() for those occasions when we need to temporarily bump up
svc_serv->sv_nrthreads as a pseudo refcount.

Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] knfsd: split svc_serv into pools
Greg Banks [Mon, 2 Oct 2006 09:17:58 +0000 (02:17 -0700)]
[PATCH] knfsd: split svc_serv into pools

Split out the list of idle threads and pending sockets from svc_serv into a
new svc_pool structure, and allocate a fixed number (in this patch, 1) of
pools per svc_serv.  The new structure contains a lock which takes over
several of the duties of svc_serv->sv_lock, which is now relegated to
protecting only sv_tempsocks, sv_permsocks, and sv_tmpcnt in svc_serv.

The point is to move the hottest fields out of svc_serv and into svc_pool,
allowing a following patch to arrange for a svc_pool per NUMA node or per CPU.
 This is a major step towards making the NFS server NUMA-friendly.

Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] knfsd: test and set SK_BUSY atomically
Greg Banks [Mon, 2 Oct 2006 09:17:57 +0000 (02:17 -0700)]
[PATCH] knfsd: test and set SK_BUSY atomically

The SK_BUSY bit in svc_sock->sk_flags ensures that we do not attempt to
enqueue a socket twice.  Currently, setting and clearing the bit is protected
by svc_serv->sv_lock.  As I intend to reduce the data that the lock protects
so it's not held when svc_sock_enqueue() tests and sets SK_BUSY, that test and
set needs to be atomic.

Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] knfsd: convert sk_reserved to atomic_t
Greg Banks [Mon, 2 Oct 2006 09:17:56 +0000 (02:17 -0700)]
[PATCH] knfsd: convert sk_reserved to atomic_t

Convert the svc_sock->sk_reserved variable from an int protected by
svc_serv->sv_lock, to an atomic.  This reduces (by 1) the number of places we
need to take the (effectively global) svc_serv->sv_lock.

Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] knfsd: use new lock for svc_sock deferred list
Greg Banks [Mon, 2 Oct 2006 09:17:55 +0000 (02:17 -0700)]
[PATCH] knfsd: use new lock for svc_sock deferred list

Protect the svc_sock->sk_deferred list with a new lock svc_sock->sk_defer_lock
instead of svc_serv->sv_lock.  Using the more fine-grained lock reduces the
number of places we need to take the svc_serv lock.

Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] knfsd: convert sk_inuse to atomic_t
Greg Banks [Mon, 2 Oct 2006 09:17:54 +0000 (02:17 -0700)]
[PATCH] knfsd: convert sk_inuse to atomic_t

Convert the svc_sock->sk_inuse counter from an int protected by
svc_serv->sv_lock, to an atomic.  This reduces the number of places we need to
take the (effectively global) svc_serv->sv_lock.

Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] knfsd: move tempsock aging to a timer
Greg Banks [Mon, 2 Oct 2006 09:17:54 +0000 (02:17 -0700)]
[PATCH] knfsd: move tempsock aging to a timer

Following are 11 patches from Greg Banks which combine to make knfsd more
Numa-aware.  They reduce hitting on 'global' data structures, and create some
data-structures that can be node-local.

knfsd threads are bound to a particular node, and the thread to handle a new
request is chosen from the threads that are attach to the node that received
the interrupt.

The distribution of threads across nodes can be controlled by a new file in
the 'nfsd' filesystem, though the default approach of an even spread is
probably fine for most sites.

Some (old) numbers that show the efficacy of these patches: N == number of
NICs == number of CPUs == nmber of clients.  Number of NUMA nodes == N/2

N Throughput, MiB/s CPU usage, % (max=N*100)
Before After Before After
--- ------ ---- ----- -----
4 312 435 350 228
6 500 656 501 418
8 562 804 690 589

This patch:

Move the aging of RPC/TCP connection sockets from the main svc_recv() loop to
a timer which uses a mark-and-sweep algorithm every 6 minutes.  This reduces
the amount of work that needs to be done in the main RPC loop and the length
of time we need to hold the (effectively global) svc_serv->sv_lock.

[akpm@osdl.org: cleanup]
Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] knfsd: Correctly handle error condition from lockd_up
NeilBrown [Mon, 2 Oct 2006 09:17:53 +0000 (02:17 -0700)]
[PATCH] knfsd: Correctly handle error condition from lockd_up

If lockd_up fails - what should we expect?  Do we have to later call
lockd_down?

Well the nfs client thinks "no", the nfs server thinks "yes".  lockd thinks
"yes".

The only answer that really makes sense is "no" !!

So:
  Make lockd_up only increment  nlmsvc_users on success.
  Make nfsd handle errors from lockd_up properly.
  Make sure lockd_up(0) never fails when lockd is running
    so that the 'reclaimer' call to lockd_up doesn't need to
    be error checked.

Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] knfsd: Move makesock failed warning into make_socks.
NeilBrown [Mon, 2 Oct 2006 09:17:52 +0000 (02:17 -0700)]
[PATCH] knfsd: Move makesock failed warning into make_socks.

Thus it is printed for any path that leads to failure (make_socks is called
from two places).

Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] knfsd: Check return value of lockd_up in write_ports
NeilBrown [Mon, 2 Oct 2006 09:17:51 +0000 (02:17 -0700)]
[PATCH] knfsd: Check return value of lockd_up in write_ports

We should be checking the return value of lockd_up when adding a new socket to
nfsd.  So move the lockd_up before the svc_addsock and check the return value.

The move is because lockd_down is easy, but there is no easy way to remove a
recently added socket.

Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] knfsd: Drop 'serv' option to svc_recv and svc_process
NeilBrown [Mon, 2 Oct 2006 09:17:50 +0000 (02:17 -0700)]
[PATCH] knfsd: Drop 'serv' option to svc_recv and svc_process

It isn't needed as it is available in rqstp->rq_server, and dropping it allows
some local vars to be dropped.

[akpm@osdl.org: build fix]
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] nfsd: add lock annotations to e_start and e_stop
Josh Triplett [Mon, 2 Oct 2006 09:17:50 +0000 (02:17 -0700)]
[PATCH] nfsd: add lock annotations to e_start and e_stop

e_start acquires svc_export_cache.hash_lock, and e_stop releases it.  Add
lock annotations to these two functions so that sparse can check callers
for lock pairing, and so that sparse will not complain about these
functions since they intentionally use locks in this manner.

Signed-off-by: Josh Triplett <josh@freedesktop.org>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] knfsd: Use SEQ_START_TOKEN instead of hardcoded magic (void*)1
Greg Banks [Mon, 2 Oct 2006 09:17:49 +0000 (02:17 -0700)]
[PATCH] knfsd: Use SEQ_START_TOKEN instead of hardcoded magic (void*)1

Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] knfsd: allow sockets to be passed to nfsd via 'portlist'
NeilBrown [Mon, 2 Oct 2006 09:17:48 +0000 (02:17 -0700)]
[PATCH] knfsd: allow sockets to be passed to nfsd via 'portlist'

Userspace should create and bind a socket (but not connectted) and write the
'fd' to portlist.  This will cause the nfs server to listen on that socket.

To close a socket, the name of the socket - as read from 'portlist' can be
written to 'portlist' with a preceding '-'.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] knfsd: define new nfsdfs file: portlist - contains list of ports
NeilBrown [Mon, 2 Oct 2006 09:17:47 +0000 (02:17 -0700)]
[PATCH] knfsd: define new nfsdfs file: portlist - contains list of ports

This file will list all ports that nfsd has open.
Default when TCP enabled will be
   ipv4 udp 0.0.0.0 2049
   ipv4 tcp 0.0.0.0 2049

Later, the list of ports will be settable.

'portlist' chosen rather than 'ports', to avoid unnecessary confusion with
non-mainline patches which created 'ports' with different semantics.

[akpm@osdl.org: cleanups, build fix]
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] knfsd: separate out some parts of nfsd_svc, which start nfs servers
NeilBrown [Mon, 2 Oct 2006 09:17:46 +0000 (02:17 -0700)]
[PATCH] knfsd: separate out some parts of nfsd_svc, which start nfs servers

Separate out the code for creating a new service, and for creating initial
sockets.

Some of these new functions will have multiple callers soon.

[akpm@osdl.org: cleanups]
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] knfsd: remove nfsd_versbits as intermediate storage for desired versions
NeilBrown [Mon, 2 Oct 2006 09:17:46 +0000 (02:17 -0700)]
[PATCH] knfsd: remove nfsd_versbits as intermediate storage for desired versions

We have an array 'nfsd_version' which lists the available versions of nfsd,
and 'nfsd_versions' (poor choice there :-() which lists the currently active
versions.

Then we have a bitmap - nfsd_versbits which says which versions are wanted.
The bits in this bitset cause content to be copied from nfsd_version to
nfsd_versions when nfsd starts.

This patch removes nfsd_versbits and moves information directly from
nfsd_version to nfsd_versions when requests for version changes arrive.

Note that this doesn't make it possible to change versions while the server is
running.  This is because serv->sv_xdrsize is calculated when a service is
created, and used when threads are created, and xdrsize depends on the active
versions.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] knfsd: be more selective in which sockets lockd listens on
NeilBrown [Mon, 2 Oct 2006 09:17:45 +0000 (02:17 -0700)]
[PATCH] knfsd: be more selective in which sockets lockd listens on

Currently lockd listens on UDP always, and TCP if CONFIG_NFSD_TCP is set.

However as lockd performs services of the client as well, this is a problem.
If CONFIG_NfSD_TCP is not set, and a tcp mount is used, the server will not be
able to call back to lockd.

So:
 - add an option to lockd_up saying which protocol is needed
 - Always open sockets for which an explicit port was given, otherwise
   only open a socket of the type required
 - Change nfsd to do one lockd_up per socket rather than one per thread.

This
 - removes the dependancy on CONFIG_NFSD_TCP
 - means that lockd may open sockets other than at startup
 - means that lockd will *not* listen on UDP if the only
   mounts are TCP mount (and nfsd hasn't started).

The latter is the only one that concerns me at all - I don't know if this
might be a problem with some servers.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] knfsd: add a callback for when last rpc thread finishes
NeilBrown [Mon, 2 Oct 2006 09:17:44 +0000 (02:17 -0700)]
[PATCH] knfsd: add a callback for when last rpc thread finishes

nfsd has some cleanup that it wants to do when the last thread exits, and
there will shortly be some more.  So collect this all into one place and
define a callback for an rpc service to call when the service is about to be
destroyed.

[akpm@osdl.org: cleanups, build fix]
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] knfsd: remove an unused variable from auth_unix_lookup()
Greg Banks [Mon, 2 Oct 2006 09:17:43 +0000 (02:17 -0700)]
[PATCH] knfsd: remove an unused variable from auth_unix_lookup()

Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>