git.monstr.eu Git - linux-2.6-microblaze.git/log

bpf: add sample usages for persistent maps/progs

This patch adds a couple of stand-alone examples on how BPF_OBJ_PIN
and BPF_OBJ_GET commands can be used.

Example with maps:

  # ./fds_example -F /sys/fs/bpf/m -P -m -k 1 -v 42
  bpf: map fd:3 (Success)
  bpf: pin ret:(0,Success)
  bpf: fd:3 u->(1:42) ret:(0,Success)
  # ./fds_example -F /sys/fs/bpf/m -G -m -k 1
  bpf: get fd:3 (Success)
  bpf: fd:3 l->(1):42 ret:(0,Success)
  # ./fds_example -F /sys/fs/bpf/m -G -m -k 1 -v 24
  bpf: get fd:3 (Success)
  bpf: fd:3 u->(1:24) ret:(0,Success)
  # ./fds_example -F /sys/fs/bpf/m -G -m -k 1
  bpf: get fd:3 (Success)
  bpf: fd:3 l->(1):24 ret:(0,Success)

  # ./fds_example -F /sys/fs/bpf/m2 -P -m
  bpf: map fd:3 (Success)
  bpf: pin ret:(0,Success)
  # ./fds_example -F /sys/fs/bpf/m2 -G -m -k 1
  bpf: get fd:3 (Success)
  bpf: fd:3 l->(1):0 ret:(0,Success)
  # ./fds_example -F /sys/fs/bpf/m2 -G -m
  bpf: get fd:3 (Success)

Example with progs:

  # ./fds_example -F /sys/fs/bpf/p -P -p
  bpf: prog fd:3 (Success)
  bpf: pin ret:(0,Success)
  bpf sock:4 <- fd:3 attached ret:(0,Success)
  # ./fds_example -F /sys/fs/bpf/p -G -p
  bpf: get fd:3 (Success)
  bpf: sock:4 <- fd:3 attached ret:(0,Success)

  # ./fds_example -F /sys/fs/bpf/p2 -P -p -o ./sockex1_kern.o
  bpf: prog fd:5 (Success)
  bpf: pin ret:(0,Success)
  bpf: sock:3 <- fd:5 attached ret:(0,Success)
  # ./fds_example -F /sys/fs/bpf/p2 -G -p
  bpf: get fd:3 (Success)
  bpf: sock:4 <- fd:3 attached ret:(0,Success)

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: add support for persistent maps/progs

This work adds support for "persistent" eBPF maps/programs. The term
"persistent" is to be understood that maps/programs have a facility
that lets them survive process termination. This is desired by various
eBPF subsystem users.

Just to name one example: tc classifier/action. Whenever tc parses
the ELF object, extracts and loads maps/progs into the kernel, these
file descriptors will be out of reach after the tc instance exits.
So a subsequent tc invocation won't be able to access/relocate on this
resource, and therefore maps cannot easily be shared, f.e. between the
ingress and egress networking data path.

The current workaround is that Unix domain sockets (UDS) need to be
instrumented in order to pass the created eBPF map/program file
descriptors to a third party management daemon through UDS' socket
passing facility. This makes it a bit complicated to deploy shared
eBPF maps or programs (programs f.e. for tail calls) among various
processes.

We've been brainstorming on how we could tackle this issue and various
approches have been tried out so far, which can be read up further in
the below reference.

The architecture we eventually ended up with is a minimal file system
that can hold map/prog objects. The file system is a per mount namespace
singleton, and the default mount point is /sys/fs/bpf/. Any subsequent
mounts within a given namespace will point to the same instance. The
file system allows for creating a user-defined directory structure.
The objects for maps/progs are created/fetched through bpf(2) with
two new commands (BPF_OBJ_PIN/BPF_OBJ_GET). I.e. a bpf file descriptor
along with a pathname is being passed to bpf(2) that in turn creates
(we call it eBPF object pinning) the file system nodes. Only the pathname
is being passed to bpf(2) for getting a new BPF file descriptor to an
existing node. The user can use that to access maps and progs later on,
through bpf(2). Removal of file system nodes is being managed through
normal VFS functions such as unlink(2), etc. The file system code is
kept to a very minimum and can be further extended later on.

The next step I'm working on is to add dump eBPF map/prog commands
to bpf(2), so that a specification from a given file descriptor can
be retrieved. This can be used by things like CRIU but also applications
can inspect the meta data after calling BPF_OBJ_GET.

Big thanks also to Alexei and Hannes who significantly contributed
in the design discussion that eventually let us end up with this
architecture here.

Reference: https://lkml.org/lkml/2015/10/15/925
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: consolidate bpf_prog_put{, _rcu} dismantle paths

We currently have duplicated cleanup code in bpf_prog_put() and
bpf_prog_put_rcu() cleanup paths. Back then we decided that it was
not worth it to make it a common helper called by both, but with
the recent addition of resource charging, we could have avoided
the fix in commit ac00737f4e81 ("bpf: Need to call bpf_prog_uncharge_memlock
from bpf_prog_put") if we would have had only a single, common path.
We can simplify it further by assigning aux->prog only once during
allocation time.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: align and clean bpf_{map,prog}_get helpers

Add a bpf_map_get() function that we're going to use later on and
align/clean the remaining helpers a bit so that we have them a bit
more consistent:

  - __bpf_map_get() and __bpf_prog_get() that both work on the fd
    struct, check whether the descriptor is eBPF and return the
    pointer to the map/prog stored in the private data.

    Also, we can return f.file->private_data directly, the function
    signature is enough of a documentation already.

  - bpf_map_get() and bpf_prog_get() that both work on u32 user fd,
    call their respective __bpf_map_get()/__bpf_prog_get() variants,
    and take a reference.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: abstract anon_inode_getfd invocations

Since we're going to use anon_inode_getfd() invocations in more than just
the current places, make a helper function for both, so that we only need
to pass a map/prog pointer to the helper itself in order to get a fd. The
new helpers are called bpf_map_new_fd() and bpf_prog_new_fd().

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fix percpu memory leaks

This patch fixes following problems :

1) percpu_counter_init() can return an error, therefore
  init_frag_mem_limit() must propagate this error so that
  inet_frags_init_net() can do the same up to its callers.

2) If ip[46]_frags_ns_ctl_register() fail, we must unwind
   properly and free the percpu_counter.

Without this fix, we leave freed object in percpu_counters
global list (if CONFIG_HOTPLUG_CPU) leading to crashes.

This bug was detected by KASAN and syzkaller tool
(http://github.com/google/syzkaller)

Fixes: 6d7b857d541e ("net: use lib/percpu_counter API for fragmentation mem accounting")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ravb: use pdev rather than ndev for error messages

This corrects what appear to be typos, making the code consistent with
itself, and allowing meaningful prefixes to be displayed with the errors in
question.

Before:
(null): failed to initialize MDIO
(null): Cannot allocate desc base address table (size 176 bytes)

After:
ravb e6800000.ethernet: failed to initialize MDIO
ravb e6800000.ethernet: Cannot allocate desc base address table (size 176 bytes)

Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: make skb_set_owner_w() more robust

skb_set_owner_w() is called from various places that assume
skb->sk always point to a full blown socket (as it changes
sk->sk_wmem_alloc)

We'd like to attach skb to request sockets, and in the future
to timewait sockets as well. For these kind of pseudo sockets,
we need to take a traditional refcount and use sock_edemux()
as the destructor.

It is now time to un-inline skb_set_owner_w(), being too big.

Fixes: ca6fb0651883 ("tcp: attach SYNACK messages to request sockets instead of listener")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Bisected-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: vlan: Use rcu_dereference instead of rtnl_dereference

br_should_learn() is protected by RCU and not by RTNL, so use correct
flavor of nbp_vlan_group().

Fixes: 907b1e6e83ed ("bridge: vlan: use proper rcu for the vlgrp
member")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: lookup switch name

All the mv88e6xxx drivers use the exact same code in their probe
function to lookup the switch name given its ID. Thus introduce a
mv88e6xxx_switch_id structure and a mv88e6xxx_lookup_name function in
the common mv88e6xxx code.

In the meantime make __mv88e6xxx_reg_{read,write} static since we do not
need to expose these low-level r/w routines anymore.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Acked-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: assert SMI lock

It's easy to forget to lock the smi_mutex before calling the low-level
_mv88e6xxx_reg_{read,write}, so add a assert_smi_lock function in them.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Acked-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: convert hashtab lock to raw lock

When running bpf samples on rt kernel, it reports the below warning:

BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:917
in_atomic(): 1, irqs_disabled(): 128, pid: 477, name: ping
Preemption disabled at:[<ffff80000017db58>] kprobe_perf_func+0x30/0x228

CPU: 3 PID: 477 Comm: ping Not tainted 4.1.10-rt8 #4
Hardware name: Freescale Layerscape 2085a RDB Board (DT)
Call trace:
[<ffff80000008a5b0>] dump_backtrace+0x0/0x128
[<ffff80000008a6f8>] show_stack+0x20/0x30
[<ffff8000007da90c>] dump_stack+0x7c/0xa0
[<ffff8000000e4830>] ___might_sleep+0x188/0x1a0
[<ffff8000007e2200>] rt_spin_lock+0x28/0x40
[<ffff80000018bf9c>] htab_map_update_elem+0x124/0x320
[<ffff80000018c718>] bpf_map_update_elem+0x40/0x58
[<ffff800000187658>] __bpf_prog_run+0xd48/0x1640
[<ffff80000017ca6c>] trace_call_bpf+0x8c/0x100
[<ffff80000017db58>] kprobe_perf_func+0x30/0x228
[<ffff80000017dd84>] kprobe_dispatcher+0x34/0x58
[<ffff8000007e399c>] kprobe_handler+0x114/0x250
[<ffff8000007e3bf4>] kprobe_breakpoint_handler+0x1c/0x30
[<ffff800000085b80>] brk_handler+0x88/0x98
[<ffff8000000822f0>] do_debug_exception+0x50/0xb8
Exception stack(0xffff808349687460 to 0xffff808349687580)
7460: 4ca2b600 ffff8083 4a3a7000 ffff8083 49687620 ffff8083 0069c5f8 ffff8000
7480: 00000001 00000000 007e0628 ffff8000 496874b0 ffff8083 007e1de8 ffff8000
74a0: 496874d0 ffff8083 0008e04c ffff8000 00000001 00000000 4ca2b600 ffff8083
74c0: 00ba2e80 ffff8000 49687528 ffff8083 49687510 ffff8083 000e5c70 ffff8000
74e0: 00c22348 ffff8000 00000000 ffff8083 49687510 ffff8083 000e5c74 ffff8000
7500: 4ca2b600 ffff8083 49401800 ffff8083 00000001 00000000 00000000 00000000
7520: 496874d0 ffff8083 00000000 00000000 00000000 00000000 00000000 00000000
7540: 2f2e2d2c 33323130 00000000 00000000 4c944500 ffff8083 00000000 00000000
7560: 00000000 00000000 008751e0 ffff8000 00000001 00000000 124e2d1d 00107b77

Convert hashtab lock to raw lock to avoid such warning.

Signed-off-by: Yang Shi <yang.shi@linaro.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bridge_vlan_fixes'

Nikolay Aleksandrov says:

====================
bridge: vlan: failure path and comment fixes

This is a set from Ido which takes care of one failure path error in
nbp_vlan_init (patch 1) and a few comment errors (patch 2).
I must admit I didn't expect the port init continues after a vlan init
failure but should've checked to make sure. Thanks to Ido for catching
these!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: vlan: Use correct flag name in comment

The flag used to indicate if a VLAN should be used for filtering - as
opposed to context only - on the bridge itself (e.g. br0) is called
'brentry' and not 'brvlan'.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: vlan: Prevent possible use-after-free

When adding a port to a bridge we initialize VLAN filtering on it. We do
not bail out in case an error occurred in nbp_vlan_init, as it can be
used as a non VLAN filtering bridge.

However, if VLAN filtering is required and an error occurred in
nbp_vlan_init, we should set vlgrp to NULL, so that VLAN filtering
functions (e.g. br_vlan_find, br_get_pvid) will know the struct is
invalid and will not try to access it.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp/dccp: fix ireq->pktopts race

IPv6 request sockets store a pointer to skb containing the SYN packet
to be able to transfer it to full blown socket when 3WHS is done
(ireq->pktopts -> np->pktoptions)

As explained in commit 5e0724d027f0 ("tcp/dccp: fix hashdance race for
passive sessions"), we must transfer the skb only if we won the
hashdance race, if multiple cpus receive the 'ack' packet completing
3WHS at the same time.

Fixes: e994b2f0fb92 ("tcp: do not lock listener to process SYN packets")
Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

RDS: convert bind hash table to re-sizable hashtable

To further improve the RDS connection scalabilty on massive systems
where number of sockets grows into tens of thousands of sockets, there
is a need of larger bind hashtable. Pre-allocated 8K or 16K table is
not very flexible in terms of memory utilisation. The rhashtable
infrastructure gives us the flexibility to grow the hashtbable based
on use and also comes up with inbuilt efficient bucket(chain) handling.

Reviewed-by: David Miller <davem@davemloft.net>
Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: rds: changing the return type from int to void

as result of function rds_iw_flush_mr_pool is nowhere checked,
changing its return type from int to void.
also removing the unused variable rc as there is nothing to return

Signed-off-by: Saurabh Sengar <saurabh.truth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'encx24j600-fixes'

Javier Martinez Canillas says:

====================
net: encx24j600: Fix SPI driver module autoload

Recently I've been trying to fix module autoloading for all SPI drivers and
found that the encx24j600 driver does not fill module alias information due
missing a MODULE_DEVICE_TABLE() so module autload won't work and the driver
Kconfig symbol is tristate which means the driver can be built as a module.

But also the SPI id table is not correctly defined so this series fixes both
issues.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: encx24j600: Export missing SPI module alias information

The driver Kconfig symbol is tristate which means that it can be built as
a module but the module alias information is not added to the module info
so module autoload won't work since user-space won't have the information.

Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: encx24j600: Fix SPI id table definition

A driver's SPI id table is expected to be an array of struct spi_device_id
that ends with a zero-initialized sentinel entry. But this driver defines
the table as a single struct spi_device_id and sets .id_table to a pointer
to this struct.

But spi_match_id() has a loop that iterates while the struct spi_device_id
.name[0] is not NULL, so not having a sentinel can cause a NULL pointer
deference error.

This patch defines the SPI id table correctly as all other SPI drivers do.

Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

enic: assign affinity hint to interrupts

The affinity hint is used by the user space daemon, irqbalancer, to
indicate a preferred CPU mask for irqs. This patch sets the irq affinity
hint to local numa core first, when exausted we try non-local numa cores.

Also set tx xps cpus mask bassed on affinity hint.

v2: remove the global affinity policy.

Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: use l4 hash for locally generated multipath flows

This patch changes how the multipath hash is computed for locally
generated flows: now the hash comprises l4 information.

This allows better utilization of the available paths when the existing
flows have the same source IP and the same destination IP: with l3 hash,
even when multiple connections are in place simultaneously, a single path
will be used, while with l4 hash we can use all the available paths.

v2 changes:
- use get_hash_from_flowi4() instead of implementing just another l4 hash
function

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Use 64-bit timekeeping

This patch changes the use of struct timespec in
dccp_probe to use struct timespec64 instead. timespec uses a 32-bit
seconds field which will overflow in the year 2038 and beyond. timespec64
uses a 64-bit seconds field. Note that the correctness of the code isn't
changed, since the original code only uses the timestamps to compute a
small elapsed interval. This patch is part of a larger attempt to remove
instances of 32-bit timekeeping structures (timespec, timeval, time_t)
from the kernel so it is easier to identify where the real 2038 issues
are.

Signed-off-by: Tina Ruchandani <ruchandani.tina@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hisilicon: Remove .owner assignment from platform_driver

platform_driver doesn't need to set .owner, because
platform_driver_register() will set it.

Signed-off-by: huangdaode <huangdaode@hisilicon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: use switchdev obj for VLAN add/del ops

Simplify DSA by pushing the switchdev objects for VLAN add and delete
operations down to its drivers. Currently only mv88e6xxx is affected.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

VSOCK: define VSOCK_SS_LISTEN once only

The SS_LISTEN socket state is defined by both af_vsock.c and
vmci_transport.c.  This is risky since the value could be changed in one
file and the other would be out of sync.

Rename from SS_LISTEN to VSOCK_SS_LISTEN since the constant is not part
of enum socket_state (SS_CONNECTED, ...).  This way it is clear that the
constant is vsock-specific.

The big text reflow in af_vsock.c was necessary to keep to the maximum
line length.  Text is unchanged except for s/SS_LISTEN/VSOCK_SS_LISTEN/.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'csum_partial_frags'

Hannes Frederic Sowa says:

====================
net: clean up interactions of CHECKSUM_PARTIAL and fragmentation

This series fixes wrong checksums on the wire for IPv4 and IPv6. Large
send buffers and especially NFS lead to wrong checksums in both IPv4
and IPv6.

CHECKSUM_PARTIAL skbs should not receive the respective fragmentations
functions, so we add WARN_ON_ONCE to those functions to fix up those as
soon as they get reported.

Changelog:
v2: added v4 checks
v3: removed WARN_ON_ONCES (advice by Tom Herbert)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: add defensive check for CHECKSUM_PARTIAL skbs in ip_fragment

CHECKSUM_PARTIAL skbs should never arrive in ip_fragment. If we get one
of those warn about them once and handle them gracefully by recalculating
the checksum.

Fixes: commit 32dce968dd987 ("ipv6: Allow for partial checksums on non-ufo packets")
See-also: commit 72e843bb09d45 ("ipv6: ip6_fragment() should check CHECKSUM_PARTIAL")
Cc: Eric Dumazet <edumazet@google.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Benjamin Coddington <bcodding@redhat.com>
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: no CHECKSUM_PARTIAL on MSG_MORE corked sockets

We cannot reliable calculate packet size on MSG_MORE corked sockets
and thus cannot decide if they are going to be fragmented later on,
so better not use CHECKSUM_PARTIAL in the first place.

The IPv6 code also intended to protect and not use CHECKSUM_PARTIAL in
the existence of IPv6 extension headers, but the condition was wrong. Fix
it up, too. Also the condition to check whether the packet fits into
one fragment was wrong and has been corrected.

Fixes: commit 32dce968dd987 ("ipv6: Allow for partial checksums on non-ufo packets")
See-also: commit 72e843bb09d45 ("ipv6: ip6_fragment() should check CHECKSUM_PARTIAL")
Cc: Eric Dumazet <edumazet@google.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Benjamin Coddington <bcodding@redhat.com>
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: add defensive check for CHECKSUM_PARTIAL skbs in ip_fragment

CHECKSUM_PARTIAL skbs should never arrive in ip_fragment. If we get one
of those warn about them once and handle them gracefully by recalculating
the checksum.

Cc: Eric Dumazet <edumazet@google.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Benjamin Coddington <bcodding@redhat.com>
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: no CHECKSUM_PARTIAL on MSG_MORE corked sockets

We cannot reliable calculate packet size on MSG_MORE corked sockets
and thus cannot decide if they are going to be fragmented later on,
so better not use CHECKSUM_PARTIAL in the first place.

Cc: Eric Dumazet <edumazet@google.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Benjamin Coddington <bcodding@redhat.com>
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git./linux/kernel/git/davem/net

Merge branch 'for-linus' of git://git./linux/kernel/git/sage/ceph-client

Pull Ceph fix from Sage Weil:
"This sets the stable pages flag on the RBD block device when we have
  CRCs enabled.  (This is necessary since the default assumption for
  block devices changed in 3.9)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  rbd: require stable pages if message data CRCs are enabled

Merge branch 'overlayfs-linus' of git://git./linux/kernel/git/mszeredi/vfs

Pull overlayfs bug fixes from Miklos Szeredi:
"This contains fixes for bugs that appeared in earlier kernels (all are
  marked for -stable)"

* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  ovl: free lower_mnt array in ovl_put_super
  ovl: free stack of paths in ovl_fill_super
  ovl: fix open in stacked overlay
  ovl: fix dentry reference leak
  ovl: use O_LARGEFILE in ovl_copy_up()

Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) Fix two regressions in ipv6 route lookups, particularly wrt output
    interface specifications in the lookup key.  From David Ahern.

2) Fix checks in ipv6 IPSEC tunnel pre-encap fragmentation, from
    Herbert Xu.

3) Fix mis-advertisement of 1000BASE-T on bcm63xx_enet, from Simon
    Arlott.

4) Some smsc phys misbehave with energy detect mode enabled, so add a
    DT property and disable it on such switches.  From Heiko Schocher.

5) Fix TSO corruption on TX in mv643xx_eth, from Philipp Kirchhofer.

6) Fix regression added by removal of openvswitch vport stats, from
    James Morse.

7) Vendor Kconfig options should be bool, not tristate, from Andreas
    Schwab.

8) Use non-_BH() net stats bump in tcp_xmit_probe_skb(), otherwise we
    barf during TCP REPAIR operations.

9) Fix various bugs in openvswitch conntrack support, from Joe
    Stringer.

10) Fix NETLINK_LIST_MEMBERSHIPS locking, from David Herrmann.

11) Don't have VSOCK do sock_put() in interrupt context, from Jorgen
    Hansen.

12) Fix skb_realloc_headroom() failures properly in ISDN, from Karsten
    Keil.

13) Add some device IDs to qmi_wwan, from Bjorn Mork.

14) Fix ovs egress tunnel information when using lwtunnel devices, from
    Pravin B Shelar.

15) Add missing NETIF_F_FRAGLIST to macvtab feature list, from Jason
    Wang.

16) Fix incorrect handling of throw routes when the result of the throw
    cannot find a match, from Xin Long.

17) Protect ipv6 MTU calculations from wrap-around, from Hannes Frederic
    Sowa.

18) Fix failed autonegotiation on KSZ9031 micrel PHYs, from Nathan
    Sullivan.

19) Add missing memory barries in descriptor accesses or xgbe driver,
    from Thomas Lendacky.

20) Fix release conditon test in pppoe_release(), from Guillaume Nault.

21) Fix gianfar bugs wrt filter configuration, from Claudiu Manoil.

22) Fix violations of RX buffer alignment in sh_eth driver, from Sergei
    Shtylyov.

23) Fixing missing of_node_put() calls in various places around the
    networking, from Julia Lawall.

24) Fix incorrect leaf now walking in ipv4 routing tree, from Alexander
    Duyck.

25) RDS doesn't check pskb_pull()/pskb_trim() return values, from
    Sowmini Varadhan.

26) Fix VLAN configuration in mlx4 driver, from Jack Morgenstein.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (79 commits)
  ipv6: protect mtu calculation of wrap-around and infinite loop by rounding issues
  Revert "Merge branch 'ipv6-overflow-arith'"
  net/mlx4: Copy/set only sizeof struct mlx4_eqe bytes
  net/mlx4_en: Explicitly set no vlan tags in WQE ctrl segment when no vlan is present
  vhost: fix performance on LE hosts
  bpf: sample: define aarch64 specific registers
  amd-xgbe: Fix race between access of desc and desc index
  RDS-TCP: Recover correctly from pskb_pull()/pksb_trim() failure in rds_tcp_data_recv
  forcedeth: fix unilateral interrupt disabling in netpoll path
  openvswitch: Fix skb leak using IPv6 defrag
  ipv6: Export nf_ct_frag6_consume_orig()
  openvswitch: Fix double-free on ip_defrag() errors
  fib_trie: leaf_walk_rcu should not compute key if key is less than pn->key
  net: mv643xx_eth: add missing of_node_put
  ath6kl: add missing of_node_put
  net: phy: mdio: add missing of_node_put
  netdev/phy: add missing of_node_put
  net: netcp: add missing of_node_put
  net: thunderx: add missing of_node_put
  ipv6: gre: support SIT encapsulation
  ...

Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input

Pull input layer fixes from Dmitry Torokhov:

- a change to the ALPS driver where we had limit the quirk for
   trackstick handling from being active on all Dells to just a few
   models

- a fix for a build dependency issue in the sur40 driver

- a small clock handling fixup in the LPC32xx touchscreen driver

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: alps - only the Dell Latitude D420/430/620/630 have separate stick button bits
  Input: sur40 - add dependency on VIDEO_V4L2
  Input: lpc32xx_ts - fix warnings caused by enabling unprepared clock

Merge tag 'pci-v4.3-fixes-2' of git://git./linux/kernel/git/helgaas/pci

Pull PCI fix from Bjorn Helgaas:
"Sorry for this last-minute update; it's been in -next for quite a
  while, but I forgot about it until I started getting ready for the
  merge window.

  It's small and fixes a way a user could cause a panic via sysfs, so I
  think it's worth getting it in v4.3.

  NUMA:
    - Prevent out of bounds access in sysfs numa_node override (Sasha Levin)"

* tag 'pci-v4.3-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI: Prevent out of bounds access in numa_node override

Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux

Pull arm64 fixes from Will Deacon:
"Apologies for this being so late, but we've uncovered a few nasty
  issues on arm64 which didn't settle down until yesterday and the fixes
  all look suitable for 4.3.  Of the four patches, three of them are
  Cc'd to stable, with the remaining patch fixing an issue that only
  took effect during the merge window.

  Summary:

   - Fix corruption in SWP emulation when STXR fails due to contention
   - Fix MMU re-initialisation when resuming from a low-power state
   - Fix stack unwinding code to match what ftrace expects
   - Fix relocation code in the EFI stub when DRAM base is not 2MB aligned"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64/efi: do not assume DRAM base is aligned to 2 MB
  Revert "ARM64: unwind: Fix PC calculation"
  arm64: kernel: fix tcr_el1.t0sz restore on systems with extended idmap
  arm64: compat: fix stxr failure case in SWP emulation

Merge tag 'please-pull-syscalls' of git://git./linux/kernel/git/aegl/linux

Pull ia64 kcmp syscall from Tony Luck:
"Missed adding the kcmp() syscall a long time ago.  Now it seems that
  it is essential to build systemd"

* tag 'please-pull-syscalls' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
  [IA64] Wire up kcmp syscall

rbd: require stable pages if message data CRCs are enabled

rbd requires stable pages, as it performs a crc of the page data before
they are send to the OSDs.

But since kernel 3.9 (patch 1d1d1a767206fbe5d4c69493b7e6d2a8d08cc0a0
"mm: only enforce stable page writes if the backing device requires
it") it is not assumed anymore that block devices require stable pages.

This patch sets the necessary flag to get stable pages back for rbd.

In a ceph installation that provides multiple ext4 formatted rbd
devices "bad crc" messages appeared regularly (ca 1 message every 1-2
minutes on every OSD that provided the data for the rbd) in the
OSD-logs before this patch. After this patch this messages are pretty
much gone (only ca 1-2 / month / OSD).

Cc: stable@vger.kernel.org # 3.9+, needs backporting
Signed-off-by: Ronny Hegewald <Ronny.Hegewald@online.de>
[idryomov@gmail.com: require stable pages only in crc case, changelog]
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

Merge branch 'master' of git://git./linux/kernel/git/klassert/ipsec-next

Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2015-10-30

1) The flow cache is limited by the flow cache limit which
   depends on the number of cpus and the xfrm garbage collector
   threshold which is independent of the number of cpus. This
   leads to the fact that on systems with more than 16 cpus
   we hit the xfrm garbage collector limit and refuse new
   allocations, so new flows are dropped. On systems with 16
   or less cpus, we hit the flowcache limit. In this case, we
   shrink the flow cache instead of refusing new flows.

   We increase the xfrm garbage collector threshold to INT_MAX
   to get the same behaviour, independent of the number of cpus.

2) Fix some unaligned accesses on sparc systems.
   From Sowmini Varadhan.

3) Fix some header checks in _decode_session4. We may call
   pskb_may_pull with a negative value converted to unsigened
   int from pskb_may_pull. This can lead to incorrect policy
   lookups. We fix this by a check of the data pointer position
   before we call pskb_may_pull.

4) Reload skb header pointers after calling pskb_may_pull
   in _decode_session4 as this may change the pointers into
   the packet.

5) Add a missing statistic counter on inner mode errors.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'wireless-drivers-next-for-davem-2015-10-29' of git://git./linux/kernel/git/kvalo/wireless-drivers-next

Kalle Valo says:

====================
iwlwifi

* bug fix for TDLS
* fixes and cleanups in scan
* support of several scan plans
* improvements in FTM
* fixes in FW API
* improvements in the failure paths when the bus is dead
* other various small things here and there

ath10k

* add QCA9377 support
* fw_stats support for 10.4 firmware

ath6kl

* report antenna configuration to user space
* implement ethtool stats

ssb

* add Kconfig SSB_HOST_SOC for compiling SoC related code
* move functions specific to SoC hosted bus to separated file
* pick PCMCIA host code support from b43 driver
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

switchdev: fix: pass correct obj size when deferring obj add

Fixes: 4d429c5dd ("switchdev: introduce possibility to defer obj_add/del")
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

switchdev: fix: erasing too much of vlan obj when handling multiple vlan specs

When adding vlans with multiple IFLA_BRIDGE_VLAN_INFO attrs set in AFSPEC,
we would wipe the vlan obj struct after the first IFLA_BRIDGE_VLAN_INFO.
Fix this by only clearing what's necessary on each IFLA_BRIDGE_VLAN_INFO
iteration.

Fixes: 9e8f4a54 ("switchdev: push object ID back to object structure")
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'nfc-next-4.4-2' of git://git./linux/kernel/git/sameo/nfc-next

Samuel Ortiz says:

====================
NFC 4.4 pull request

This is the NFC pull request for 4.4.

It's a bit bigger than usual, the 3 main culprits being:

- A new driver for Intel's Fields Peak NCI chipset. In order to
  support this chipset we had to export a few NCI routines and
  extend the driver NCI ops to not only support proprietary
  commands but also core ones.

- Support for vendor commands for both STM drivers, st-nci
  and st21nfca. Those vendor commands allow to run factory tests
  through the NFC netlink interface.

- New i2c and SPI support for the Marvell driver, together with
  firmware download support for this driver's core.

Besides that we also have:

- A few file renames in the STM drivers, to keep the naming
  consistent between drivers.

- Some improvements and fixes on the NCI HCI layer, mostly to
  properly reach a secure element over a legacy HCI link.

- A few fixes for the s3fwrn5 and trf7970a drivers.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'for-upstream' of git://git./linux/kernel/git/bluetooth/bluetooth-next

Johan Hedberg says:

====================
pull request: bluetooth-next 2015-10-28

Here are a some more Bluetooth patches for 4.4 which collected up during
the past week. The most important ones are from Kuba Pawlak for fixing
locking issues with SCO sockets. There's also a fix from Alexander Aring
for 6lowpan, a memleak fix from Julia Lawall for the btmrvl driver and
some cleanup patches from Marcel.

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: recreate ipv6 link-local addresses when increasing MTU over IPV6_MIN_MTU

This change makes it so that we reinitialize the interface if the MTU is
increased back above IPV6_MIN_MTU and the interface is up.

Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mlxsw-flooding-and-cosmetics'

Jiri Pirko says:

====================
mlxsw: driver update

This driver update mainly brings support for user to be able to setup
flooding on specified port, via bridge flag. Also, there is a fix in ageing
time conversion. The rest is just cosmetics.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum: Make mlxsw_sp_port_switchdev_ops static

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: Put braces on all arms of branch statement

Fix a place where checkpatch complains that braces should be used
on all arms of this statement.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: Put constant on the right side of comparisons

Fixes those places where checkpatch complains that comparisons
should place the constant on the right side of the test.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum: Fix ageing time value

The value passed through switchdev attr set is not in jiffies, but in
clock_t, so fix the convert.

Reported-by: Sagi Rotem <sagir@mellanox.com>
Fixes: 56ade8fe3f ("mlxsw: spectrum: Add initial support for Spectrum ASIC")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: reg: Avoid unnecessary line wrap for mlxsw_reg_sfd_uc_unpack

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: reg: Fix desription typos of couple of SFN items

Fix copy-paste errors.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: reg: Fix description for reg_sfd_uc_sub_port

The original description was for LAG, so fix it.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum: Add support for flood control

Add or remove a bridged port from the flooding domain of unknown unicast
packets according to user configuration.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum: Add support for VLAN ranges in flooding configuration

When enabling a range of VLANs on a bridged port we can configure
flooding for these VLANs by one register access instead of calling the
same register for each VLAN. This is accomplished by using the 'range'
field of the Switch Flooding Table Register (SFTR).

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum: move "bridged" bool to u8 flags

It is a flag anyway, so move it to existing u8 flag and don't waste mem.
Fix the flags to be in single u8 on the way.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

switchdev: Make flood to CPU optional

In certain use cases it is not always desirable for the switch device to
flood traffic to CPU port. Instead, only certain packet types (e.g.
STP, LACP) should be trapped to it.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

switchdev: Add support for flood control

Allow devices supporting this feature to control the flooding of unknown
unicast traffic, by making switchdev infrastructure propagate this setting
to the switch driver.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'xgene_txrx_delay'

Iyappan Subramanian says:

====================
drivers: xgene: Add support RGMII TX/RX delay configuration

X-Gene RGMII ethernet controller has a RGMII bridge that performs the
task of converting the RGMII signal {RX_CLK,RX_CTL, RX_DATA[3:0]} from
PHY to GMII signal {RX_DV,RX_ER,RX_DATA[7:0]} and vice versa. This
RGMII bridge has a provision to internally delay the input RX_CLK and
the output TX_CLK using configuration registers. This will help in
maintain the CLK-CTL delay relationship in various operating
conditions.

This patch adds support RGMII TX/RX delay configuration.
====================

Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Documentation: dts: xgene: Add TX/RX delay field

Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers: net: xgene: Add support RGMII TX/RX delay configuration

Add RGMII TX/RX delay configuration support. RGMII standard requires 2ns
delay to help the RGMII bridge receiver to sample data correctly. If the
default value does not provide proper centering of the data sample, the
TX/RX delay parameters can be used to adjust accordingly.

Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: set is_local and is_static before fdb entry is added to the fdb hashtable

Problem Description:
We can add fdbs pointing to the bridge with NULL ->dst but that has a
few race conditions because br_fdb_insert() is used which first creates
the fdb and then, after the fdb has been published/linked, sets
"is_local" to 1 and in that time frame if a packet arrives for that fdb
it may see it as non-local and either do a NULL ptr dereference in
br_forward() or attach the fdb to the port where it arrived, and later
br_fdb_insert() will make it local thus getting a wrong fdb entry.
Call chain br_handle_frame_finish() -> br_forward():
But in br_handle_frame_finish() in order to call br_forward() the dst
should not be local i.e. skb != NULL, whenever the dst is
found to be local skb is set to NULL so we can't forward it,
and here comes the problem since it's running only
with RCU when forwarding packets it can see the entry before "is_local"
is set to 1 and actually try to dereference NULL.
The main issue is that if someone sends a packet to the switch while
it's adding the entry which points to the bridge device, it may
dereference NULL ptr. This is needed now after we can add fdbs
pointing to the bridge. This poses a problem for
br_fdb_update() as well, while someone's adding a bridge fdb, but
before it has is_local == 1, it might get moved to a port if it comes
as a source mac and then it may get its "is_local" set to 1

This patch changes fdb_create to take is_local and is_static as
arguments to set these values in the fdb entry before it is added to the
hash. Also adds null check for port in br_forward.

Fixes: 3741873b4f73 ("bridge: allow adding of fdb entries pointing to the bridge device")
Reported-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

geneve: add IPv6 bits to geneve_fill_metadata_dst

Signed-off-by: John W. Linville <linville@tuxdriver.com>
Reviewed-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

geneve: handle ipv6 priority like ipv4 tos

Other callers of udp_tunnel6_xmit_skb just pass 0 for the prio
argument. Jesse Gross <jesse@nicira.com> suggested that prio is really
the same as IPv4's tos and should be handled the same, so this is my
interpretation of that suggestion.

Signed-off-by: John W. Linville <linville@tuxdriver.com>
Reported-by: Jesse Gross <jesse@nicira.com>
Reviewed-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

geneve: implement support for IPv6-based tunnels

NOTE: Link-local IPv6 addresses for remote endpoints are not supported,
since the driver currently has no capacity for binding a geneve
interface to a specific link.

Signed-off-by: John W. Linville <linville@tuxdriver.com>
Reviewed-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

arm64/efi: do not assume DRAM base is aligned to 2 MB

The current arm64 Image relocation code in the UEFI stub assumes that
the dram_base argument it receives is always a multiple of 2 MB. In
reality, it is simply the lowest start address of all RAM entries in
the UEFI memory map, which means it could be any multiple of 4 KB.

Since the arm64 kernel Image needs to reside TEXT_OFFSET bytes beyond
a 2 MB aligned base, or it will fail to boot, make sure we round dram_base
to 2 MB before using it to calculate the relocation address.

Fixes: e38457c361b30c5a ("arm64: efi: prefer AllocatePages() over efi_low_alloc() for vmlinux")
Reported-by: Timur Tabi <timur@codeaurora.org>
Tested-by: Timur Tabi <timur@codeaurora.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>

ipv6: protect mtu calculation of wrap-around and infinite loop by rounding issues

Raw sockets with hdrincl enabled can insert ipv6 extension headers
right into the data stream. In case we need to fragment those packets,
we reparse the options header to find the place where we can insert
the fragment header. If the extension headers exceed the link's MTU we
actually cannot make progress in such a case.

Instead of ending up in broken arithmetic or rounding towards 0 and
entering an endless loop in ip6_fragment, just prevent those cases by
aborting early and signal -EMSGSIZE to user space.

This is the second version of the patch which doesn't use the
overflow_usub function, which got reverted for now.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Revert "Merge branch 'ipv6-overflow-arith'"

Linus dislikes these changes. To not hold up the net-merge let's revert
it for now and fix the bug like Linus suggested.

This reverts commit ec3661b42257d9a06cf0d318175623ac7a660113, reversing
changes made to c80dbe04612986fd6104b4a1be21681b113b5ac9.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge ath-next from ath.git

Major changes:

ath10k:

* add QCA9377 support
* fw_stats support for 10.4 firmware

ath6kl:

* report antenna configuration to user space
* implement ethtool stats

ath6kl: implement ethtool stats

This supports a way to get target stats through normal
ethtool stats API.

For instance:
# ethtool -S wlan1
NIC statistics:
     tx_pkts_nic: 353
     tx_bytes_nic: 25142
     rx_pkts_nic: 6
     rx_bytes_nic: 996
     d_tx_ucast_pkts: 89
     d_tx_bcast_pkts: 264
     d_tx_ucast_bytes: 3020
     d_tx_bcast_bytes: 22122
...

Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

ath6kl: break stats gathering code into separate method

This will allow us to call it from elsewhere when implementing
ethtool stats.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

ath6kl: fix firmware version assignment

Improper use of strlcpy caused garbage to be appended to the
firmware version string. Fix this by paying attention to the
ie_lenth.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

ath6kl: add error message to explain lack of HT

It can take a user a while to understand why their
NIC that advertises 802.11n support cannot actually
do 802.11n. Print out a warning in the logs to save
the next poor person to use this NIC some trouble.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

ath6kl: report antenna configuration

This lets 'iw phy phy0 info' report antennas for
the radio device:

...
Available Antennas: TX 0x2 RX 0x2
Configured Antennas: TX 0x2 RX 0x2
...

Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

wil6210: handle failure in Tx vring config

When configuring Tx vring for new connection,
WMI call to the firmware may fail. In this case, need to
clean up properly. In particular, need to call
cfg80211_del_sta() in case of AP like interface.

Perform full "disconnect" procedure for proper clean up

Signed-off-by: Vladimir Kondratiev <qca_vkondrat@qca.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

wil6210: fix device ready detection

Adjust driver behavior during FW boot. Proper sequence of
events after reset and FW download, is as following:

- FW prepares mailbox structure and reports IRQ "FW_READY"
- driver caches mailbox registers, marks mailbox readiness
- FW sends WMI_FW_READY event, ignore it
- FW sends WMI_READY event with some data
- driver stores relevant data marks FW is operational

Signed-off-by: Vladimir Kondratiev <qca_vkondrat@qca.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

wil6210: Fix TSO overflow handling

When Tx ring full is encountered with TSO,
printout of "DMA error" was wrongly printed.

In addition, in case of Tx ring full return
proper error code so that NETDEV_TX_BUSY is
returned to network stack in order not to
drop the packets and retry transmission of the
packets when ring is emptied.

Signed-off-by: Hamad Kadmany <qca_hkadmany@qca.qualcomm.com>
Signed-off-by: Vladimir Kondratiev <qca_vkondrat@qca.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

wil6210: ignore selected WMI events

Some events are ignored for purpose; such events should not
be treated as "unhandled events". Replace info message
saying "unhandled" with debug one saying "ignore", to reduce
dmesg pollution

Signed-off-by: Vladimir Kondratiev <qca_vkondrat@qca.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

ath10k: add QCA9377 chipset support

Add the hardware name, revision and update the pci_id table.

Currently there're two HW ref. designs available I'm aware of,
with 1.0.2 and 1.1 chip revisions. I've access and been using
the first one so far and this patch cover only it.

QCA9377 inherits most of the stuff (e.g. fw interfaces)
from QCA61x4 design, so the integration was pretty straightforward.

Signed-off-by: Bartosz Markowski <bartosz.markowski@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

ath10k: reload HT/VHT capabilities on antenna change

To reflect configured antenna settings in HT/VHT MCS map,
reload the HT/VHT capabilities upon antenna change.

Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

ath10k: move static HT/VHT capability setup functions

Move HT and VHT capabiltity setup static functions to avoid
forward declaration.

Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

ath10k: fill HT/VHT MCS rateset only for configured chainmask

HT/VHT MCS rateset should be filled only for configured chainmask
rather that max supported chainmask. Fix that by checking configured
chainmask while filling HT/VHT MCS rate map.

Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

ath10k: remove supported chain mask

Removing supported chainmask fields as it can be always derived
from num_rf_chains.

Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

ath10k: remove shadow copy of CE descriptors for source ring

For the messages from host to target, shadow copy of CE descriptors
are maintained in source ring. Before writing actual CE descriptor,
first shadow copy is filled and then it is copied to CE address space.
To optimize in download path and to reduce d-cache pressure, removing
shadow copy of CE descriptors. This will also reduce driver memory
consumption by 33KB during on device probing.

Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

ath10k: cleanup copy engine send completion

The physical address necessary to unmap DMA ('bufferp') is stored
in ath10k_skb_cb as 'paddr'. ath10k doesn't rely on the meta/transfer_id
when handling send completion (htc ep id is stored in sk_buff control
buffer). So the unused output arguments {bufferp, nbytesp and transfer_idp}
are removed from CE send completion. This change is needed before removing
the shadow copy of copy engine (CE) descriptors in follow up patch.

Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

ath10k: remove send completion validation in diag read/write

CE diag window access is serialized (it has to be by design) so
there's no way to get a different send completion. so there's no
need for post completion validation.

Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

ath10k: use local memory instead of shadow descriptor in ce_send

Currently to avoid uncached memory access while filling up copy engine
descriptors, shadow descriptors are used. This can be optimized further
by removing shadow descriptors. To achieve that first shadow ring
dependency in ce_send is removed by creating local copy of the
descriptor on stack and make a one-shot copy into the "uncached"
descriptor.

Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

ath10k: add fw_stats support to 10.4 firmware

This patch adds support for getting firmware debug stats in 10.4 fw.

Signed-off-by: Manikanta Pubbisetty <c_mpubbi@qti.qualcomm.com>
Signed-off-by: Tamizh chelvam <c_traja@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

ath10k: add FW API support to test mode

Add WMI-TLV and FW API support in ath10k testmode.
Ath10k can get right wmi command format from UTF image
to communicate UTF firmware.

Signed-off-by: Alan Liu <alanliu@qca.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

[IA64] Wire up kcmp syscall

systemd > 218 fails to compile on ia64 with:

error: ‘__NR_kcmp’ undeclared [1].

I've been told that this is because the kcmp syscall hasn't been wired up
for the ia64 arch [2].

The proposed patch thus wire up the kcmp syscall for the ia64 arch.

[1] https://bugs.gentoo.org/show_bug.cgi?id=560492
[2] https://bugs.gentoo.org/show_bug.cgi?id=560492#c17

Signed-off-by: Émeric MASCHINO <emeric.maschino@gmail.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>

ath10k: enable adaptive CCA

European Union has made it mandatory that all devices working in 2.4 GHz
has to adhere to the ETSI specification (ETSI EN 300 328 V1.9.1)
beginnig this year. The standard basically speaks about interferences
in 2.4Ghz band.
For example, when 802.11 device detects interference, TX must be stopped
as long as interference is present.

Adaptive CCA is a feature, when enabled the device learns from the
environment and configures CCA levels adaptively. This will improve
detecting interferences and the device can stop trasmissions till the
interference is present eventually leading to good performances in
varying interference conditions.

The patch includes code for enabling adaptive CCA for 10.2.4 firmware on
QCA988X.

Signed-off-by: Maharaja <c_mkenna@qti.qualcomm.com>
Signed-off-by: Manikanta Pubbisetty <c_mpubbi@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

ssb: add Kconfig entry for compiling SoC related code

This allows saving a little of space when not using ssb on Broadcom SoC.

Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>

ssb: move functions specific to SoC hosted bus to separated file

This cleans main.c a bit and will allow us to compile SoC related code
conditionally in the future.

Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>

ssb: pick PCMCIA host code support from b43 driver

ssb bus can be found on various "host" devices like PCI/PCMCIA/SDIO.
Every ssb bus contains cores AKA devices.
The main idea is to have ssb driver scan/initialize bus and register
ready-to-use cores. This way ssb drivers can operate on a single core
mostly ignoring underlaying details.

For some reason PCMCIA support was split between ssb and b43. We got
PCMCIA host device probing in b43, then bus scanning in ssb and then
wireless core probing back in b43. The truth is it's very unlikely we
will ever see PCMCIA ssb device with no 802.11 core but I still don't
see any advantage of the current architecture.

With proposed change we get the same functionality with a simpler
architecture, less Kconfig symbols, one killed EXPORT and hopefully
cleaner b43. Since b43 supports both: ssb & bcma I prefer to keep ssb
specific code in ssb driver.

This mostly moves code from b43's pcmcia.c to bridge_pcmcia_80211.c. We
already use similar solution with b43_pci_bridge.c. I didn't use "b43"
in name of this new file as in theory any driver can operate on wireless
core.

Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>

mwifiex: toggle carrier state in start_ap/stop_ap.

In uap mode the carrier is not enabled until after the first STA joins.
The carrier triggers the bridge to start its state machine, and if STP
is enabled, it takes 4 seconds as it transitions from disabled to
forwarding. During this time the bridge drops all traffic, and the EAPOL
handshake times out after 3 seconds, preventing stations from joining.

Follow the logic used in mac80211 and start the carrier in start_ap
and disable it in stop_ap. This has a nice benefit of allowing the
first station connection time to be reduced by up to 75% when STP is
in use.

Signed-off-by: Martin Faltesek <mfaltesek@google.com>
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>

wcn36xx: Remove warning message when dev is NULL for arm64 dma_alloc.

arm64 has requirement that all the dma operations have actual device.
Otherwise, following warnning message shown and dma allocation fails:

WARNING: CPU: 0 PID: 954 at arch/arm64/mm/dma-mapping.c:106 __dma_alloc+0x24c/0x258()
Use an actual device structure for DMA allocation
Modules linked in: wcn36xx wcn36xx_platform
CPU: 0 PID: 954 Comm: ifconfig Not tainted 4.0.0+ #14
Hardware name: Qualcomm Technologies, Inc. MSM 8916 MTP (DT)
Call trace:
[<ffffffc000089904>] dump_backtrace+0x0/0x124
[<ffffffc000089a38>] show_stack+0x10/0x1c
[<ffffffc000627114>] dump_stack+0x80/0xc4
[<ffffffc0000b2e64>] warn_slowpath_common+0x98/0xd0
[<ffffffc0000b2ee8>] warn_slowpath_fmt+0x4c/0x58
[<ffffffc00009487c>] __dma_alloc+0x248/0x258
[<ffffffbffc009270>] wcn36xx_dxe_allocate_mem_pools+0xc4/0x108 [wcn36xx]
[<ffffffbffc0079c4>] wcn36xx_start+0x38/0x240 [wcn36xx]
[<ffffffc0005f161c>] ieee80211_do_open+0x1b0/0x9a4
[<ffffffc0005f1e68>] ieee80211_open+0x58/0x68
[<ffffffc00051693c>] __dev_open+0xb0/0x120
[<ffffffc000516c10>] __dev_change_flags+0x88/0x150
[<ffffffc000516cf4>] dev_change_flags+0x1c/0x5c
[<ffffffc000570950>] devinet_ioctl+0x644/0x6f0

Signed-off-by: Yin, Fengwei <fengwei.yin@linaro.org>
Acked-by: Bjorn Andersson <bjorn.andersson@sonymobile.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>

wcn36xx: introduce per-channel ring buffer locks

wcn36xx implements a ring buffer for transmitted frames for each
(high and low priority) DMA channel. The ring buffers are lockless:
new frames are inserted at the head of the queue, while finished
packets are reaped from the tail.

Unfortunately, the list manipulations are missing any kind of barriers
so are susceptible to various races: for example, a TX completion
handler might read an updated desc->ctrl before the head has actually
advanced, and then null out the ctl->skb pointer while it is still
being used in the TX path.

Simplify things here by adding a spin lock when traversing the ring.
This change increased stability for me without adding any noticeable
overhead on my platform (xperia z).

Signed-off-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>