linux-2.6-microblaze.git
7 years agonet: sched: cls_u32: call block callbacks for offload
Jiri Pirko [Thu, 19 Oct 2017 13:50:35 +0000 (15:50 +0200)]
net: sched: cls_u32: call block callbacks for offload

Use the newly introduced callbacks infrastructure and call block
callbacks alongside with the existing per-netdev ndo_setup_tc.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: sched: cls_u32: swap u32_remove_hw_knode and u32_remove_hw_hnode
Jiri Pirko [Thu, 19 Oct 2017 13:50:34 +0000 (15:50 +0200)]
net: sched: cls_u32: swap u32_remove_hw_knode and u32_remove_hw_hnode

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: sched: cls_matchall: call block callbacks for offload
Jiri Pirko [Thu, 19 Oct 2017 13:50:33 +0000 (15:50 +0200)]
net: sched: cls_matchall: call block callbacks for offload

Use the newly introduced callbacks infrastructure and call block
callbacks alongside with the existing per-netdev ndo_setup_tc.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: sched: use tc_setup_cb_call to call per-block callbacks
Jiri Pirko [Thu, 19 Oct 2017 13:50:32 +0000 (15:50 +0200)]
net: sched: use tc_setup_cb_call to call per-block callbacks

Extend the tc_setup_cb_call entrypoint function originally used only for
action egress devices callbacks to call per-block callbacks as well.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: sched: introduce per-block callbacks
Jiri Pirko [Thu, 19 Oct 2017 13:50:31 +0000 (15:50 +0200)]
net: sched: introduce per-block callbacks

Introduce infrastructure that allows drivers to register callbacks that
are called whenever tc would offload inserted rule for a specific block.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: sched: use extended variants of block_get/put in ingress and clsact qdiscs
Jiri Pirko [Thu, 19 Oct 2017 13:50:30 +0000 (15:50 +0200)]
net: sched: use extended variants of block_get/put in ingress and clsact qdiscs

Use previously introduced extended variants of block get and put
functions. This allows to specify a binder types specific to clsact
ingress/egress which is useful for drivers to distinguish who actually
got the block.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: sched: add block bind/unbind notif. and extended block_get/put
Jiri Pirko [Thu, 19 Oct 2017 13:50:29 +0000 (15:50 +0200)]
net: sched: add block bind/unbind notif. and extended block_get/put

Introduce new type of ndo_setup_tc message to propage binding/unbinding
of a block to driver. Call this ndo whenever qdisc gets/puts a block.
Alongside with this, there's need to propagate binder type from qdisc
code down to the notifier. So introduce extended variants of
block_get/put in order to pass this info.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoipv6: let trace_fib6_table_lookup() dereference the fib table
Paolo Abeni [Thu, 19 Oct 2017 07:31:43 +0000 (09:31 +0200)]
ipv6: let trace_fib6_table_lookup() dereference the fib table

The perf traces for ipv6 routing code show a relevant cost around
trace_fib6_table_lookup(), even if no trace is enabled. This is
due to the fib6_table de-referencing currently performed by the
caller.

Let's the tracing code pay this overhead, passing to the trace
helper the table pointer. This gives small but measurable
performance improvement under UDP flood.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetoot...
David S. Miller [Sat, 21 Oct 2017 01:22:19 +0000 (02:22 +0100)]
Merge branch 'for-upstream' of git://git./linux/kernel/git/bluetooth/bluetooth-next

Johan Hedberg says:

====================
pull request: bluetooth-next 2017-10-19

Here's the first bluetooth-next pull request targeting the 4.15 kernel
release.

 - Multiple fixes & improvements to the hci_bcm driver
 - DT improvements, e.g. new local-bd-address property
 - Fixes & improvements to ECDH usage. Private key is now generated by
   the crypto subsystem.
 - gcc-4.9 warning fixes

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoipv4: ipv4_default_advmss() should use route mtu
Eric Dumazet [Thu, 19 Oct 2017 00:02:03 +0000 (17:02 -0700)]
ipv4: ipv4_default_advmss() should use route mtu

ipv4_default_advmss() incorrectly uses the device MTU instead
of the route provided one. IPv6 has the proper behavior,
lets harmonize the two protocols.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'ieee802154-for-davem-2017-10-18' of git://git.kernel.org/pub/scm/linux...
David S. Miller [Sat, 21 Oct 2017 00:45:56 +0000 (01:45 +0100)]
Merge branch 'ieee802154-for-davem-2017-10-18' of git://git./linux/kernel/git/sschmidt/wpan-next

Stefan Schmidt says:

====================
pull-request: ieee802154 2017-10-18

Please find below a pull request from the ieee802154 subsystem for net-next.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agospectrum: Convert fib event handlers to use container_of on info arg
David Ahern [Wed, 18 Oct 2017 22:01:38 +0000 (15:01 -0700)]
spectrum: Convert fib event handlers to use container_of on info arg

Use container_of to convert the generic fib_notifier_info into
the event specific data structure.

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agotcp: fix tcp_send_syn_data()
Eric Dumazet [Wed, 18 Oct 2017 21:20:30 +0000 (14:20 -0700)]
tcp: fix tcp_send_syn_data()

syn_data was allocated by sk_stream_alloc_skb(), meaning
its destructor and _skb_refdst fields are mangled.

We need to call tcp_skb_tsorted_anchor_cleanup() before
calling kfree_skb() or kernel crashes.

Bug was reported by syzkaller bot.

Fixes: e2080072ed2d ("tcp: new list for sent but unacked skbs for RACK recovery")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'ipv6-fixes-for-RTF_CACHE-entries'
David S. Miller [Sat, 21 Oct 2017 00:39:11 +0000 (01:39 +0100)]
Merge branch 'ipv6-fixes-for-RTF_CACHE-entries'

Paolo Abeni says:

====================
ipv6: fixes for RTF_CACHE entries

This series addresses 2 different but related issues with RTF_CACHE
introduced by the recent refactory.

patch 1 restore the gc timer for such routes
patch 2 removes the aged out dst from the fib tree, properly coping with pMTU
routes

v1 -> v2:
 - dropped the  for ip route show cache
 - avoid touching dst.obsolete when the dst is aged out

v2 -> v3:
 - take care of pMTU exceptions
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoipv6: remove from fib tree aged out RTF_CACHE dst
Paolo Abeni [Thu, 19 Oct 2017 14:07:11 +0000 (16:07 +0200)]
ipv6: remove from fib tree aged out RTF_CACHE dst

The commit 2b760fcf5cfb ("ipv6: hook up exception table to store
dst cache") partially reverted the commit 1e2ea8ad37be ("ipv6: set
dst.obsolete when a cached route has expired").

As a result, RTF_CACHE dst referenced outside the fib tree will
not be removed until the next sernum change; dst_check() does not
fail on aged-out dst, and dst->__refcnt can't decrease: the aged
out dst will stay valid for a potentially unlimited time after the
timeout expiration.

This change explicitly removes RTF_CACHE dst from the fib tree when
aged out. The rt6_remove_exception() logic will then obsolete the
dst and other entities will drop the related reference on next
dst_check().

pMTU exceptions are not aged-out, and are removed from the exception
table only when the - usually considerably longer - ip6_rt_mtu_expires
timeout expires.

v1 -> v2:
  - do not touch dst.obsolete in rt6_remove_exception(), not needed
v2 -> v3:
  - take care of pMTU exceptions, too

Fixes: 2b760fcf5cfb ("ipv6: hook up exception table to store dst cache")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Wei Wang <weiwan@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoipv6: start fib6 gc on RTF_CACHE dst creation
Paolo Abeni [Thu, 19 Oct 2017 14:07:10 +0000 (16:07 +0200)]
ipv6: start fib6 gc on RTF_CACHE dst creation

After the commit 2b760fcf5cfb ("ipv6: hook up exception table
to store dst cache"), the fib6 gc is not started after the
creation of a RTF_CACHE via a redirect or pmtu update, since
fib6_add() isn't invoked anymore for such dsts.

We need the fib6 gc to run periodically to clean the RTF_CACHE,
or the dst will stay there forever.

Fix it by explicitly calling fib6_force_start_gc() on successful
exception creation. gc_args->more accounting will ensure that
the gc timer will run for whatever time needed to properly
clean the table.

v2 -> v3:
 - clarified the commit message

Fixes: 2b760fcf5cfb ("ipv6: hook up exception table to store dst cache")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Wei Wang <weiwan@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'bpf-lsm-hooks'
David S. Miller [Fri, 20 Oct 2017 12:33:00 +0000 (13:33 +0100)]
Merge branch 'bpf-lsm-hooks'

Chenbo Feng says:

====================
bpf: security: New file mode and LSM hooks for eBPF object permission control

Much like files and sockets, eBPF objects are accessed, controlled, and
shared via a file descriptor (FD). Unlike files and sockets, the
existing mechanism for eBPF object access control is very limited.
Currently there are two options for granting accessing to eBPF
operations: grant access to all processes, or only CAP_SYS_ADMIN
processes. The CAP_SYS_ADMIN-only mode is not ideal because most users
do not have this capability and granting a user CAP_SYS_ADMIN grants too
many other security-sensitive permissions. It also unnecessarily allows
all CAP_SYS_ADMIN processes access to eBPF functionality. Allowing all
processes to access to eBPF objects is also undesirable since it has
potential to allow unprivileged processes to consume kernel memory, and
opens up attack surface to the kernel.

Adding LSM hooks maintains the status quo for systems which do not use
an LSM, preserving compatibility with userspace, while allowing security
modules to choose how best to handle permissions on eBPF objects. Here
is a possible use case for the lsm hooks with selinux module:

The network-control daemon (netd) creates and loads an eBPF object for
network packet filtering and analysis. It passes the object FD to an
unprivileged network monitor app (netmonitor), which is not allowed to
create, modify or load eBPF objects, but is allowed to read the traffic
stats from the map.

Selinux could use these hooks to grant the following permissions:
allow netd self:bpf_map { create read write};
allow netmonitor netd:fd use;
allow netmonitor netd:bpf_map read;

In this patch series, A file mode is added to bpf map to store the
accessing mode. With this file mode flags, the map can be obtained read
only, write only or read and write. With the help of this file mode,
several security hooks can be added to the eBPF syscall implementations
to do permissions checks. These LSM hooks are mainly focused on checking
the process privileges before it obtains the fd for a specific bpf
object. No matter from a file location or from a eBPF id. Besides that,
a general check hook is also implemented at the start of bpf syscalls so
that each security module can have their own implementation on the reset
of bpf object related functionalities.

In order to store the ownership and security information about eBPF
maps, a security field pointer is added to the struct bpf_map. And the
last two patch set are implementation of selinux check on these hooks
introduced, plus an additional check when eBPF object is passed between
processes using unix socket as well as binder IPC.

Change since V1:

 - Whitelist the new bpf flags in the map allocate check.
 - Added bpf selftest for the new flags.
 - Added two new security hooks for copying the security information from
   the bpf object security struct to file security struct
 - Simplified the checking action when bpf fd is passed between processes.

 Change since V2:

 - Fixed the line break problem for map flags check
 - Fixed the typo in selinux check of file mode.
 - Merge bpf_map and bpf_prog into one selinux class
 - Added bpf_type and bpf_sid into file security struct to store the
   security information when generate fd.
 - Add the hook to bpf_map_new_fd and bpf_prog_new_fd.

 Change since V3:

 - Return the actual error from security check instead of -EPERM
 - Move the hooks into anon_inode_getfd() to avoid get file again after
   bpf object file is installed with fd.
 - Removed the bpf_sid field inside file_scerity_struct to reduce the
   cache size.

 Change since V4:

 - Rename bpf av prog_use to prog_run to distinguish from fd_use.
 - Remove the bpf_type field inside file_scerity_struct and use bpf fops
   to indentify bpf object instead.

 Change since v5:

 - Fixed the incorrect selinux class name for SECCLASS_BPF

 Change since v7:

 - Fixed the build error caused by xt_bpf module.
 - Add flags check for bpf_obj_get() and bpf_map_get_fd_by_id() to make it
   uapi-wise.
 - Add the flags field to the bpf_obj_get_user function when BPF_SYSCALL
   is not configured.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoselinux: bpf: Add addtional check for bpf object file receive
Chenbo Feng [Wed, 18 Oct 2017 20:00:26 +0000 (13:00 -0700)]
selinux: bpf: Add addtional check for bpf object file receive

Introduce a bpf object related check when sending and receiving files
through unix domain socket as well as binder. It checks if the receiving
process have privilege to read/write the bpf map or use the bpf program.
This check is necessary because the bpf maps and programs are using a
anonymous inode as their shared inode so the normal way of checking the
files and sockets when passing between processes cannot work properly on
eBPF object. This check only works when the BPF_SYSCALL is configured.

Signed-off-by: Chenbo Feng <fengc@google.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Reviewed-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoselinux: bpf: Add selinux check for eBPF syscall operations
Chenbo Feng [Wed, 18 Oct 2017 20:00:25 +0000 (13:00 -0700)]
selinux: bpf: Add selinux check for eBPF syscall operations

Implement the actual checks introduced to eBPF related syscalls. This
implementation use the security field inside bpf object to store a sid that
identify the bpf object. And when processes try to access the object,
selinux will check if processes have the right privileges. The creation
of eBPF object are also checked at the general bpf check hook and new
cmd introduced to eBPF domain can also be checked there.

Signed-off-by: Chenbo Feng <fengc@google.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosecurity: bpf: Add LSM hooks for bpf object related syscall
Chenbo Feng [Wed, 18 Oct 2017 20:00:24 +0000 (13:00 -0700)]
security: bpf: Add LSM hooks for bpf object related syscall

Introduce several LSM hooks for the syscalls that will allow the
userspace to access to eBPF object such as eBPF programs and eBPF maps.
The security check is aimed to enforce a per object security protection
for eBPF object so only processes with the right priviliges can
read/write to a specific map or use a specific eBPF program. Besides
that, a general security hook is added before the multiplexer of bpf
syscall to check the cmd and the attribute used for the command. The
actual security module can decide which command need to be checked and
how the cmd should be checked.

Signed-off-by: Chenbo Feng <fengc@google.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobpf: Add tests for eBPF file mode
Chenbo Feng [Wed, 18 Oct 2017 20:00:23 +0000 (13:00 -0700)]
bpf: Add tests for eBPF file mode

Two related tests are added into bpf selftest to test read only map and
write only map. The tests verified the read only and write only flags
are working on hash maps.

Signed-off-by: Chenbo Feng <fengc@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobpf: Add file mode configuration into bpf maps
Chenbo Feng [Wed, 18 Oct 2017 20:00:22 +0000 (13:00 -0700)]
bpf: Add file mode configuration into bpf maps

Introduce the map read/write flags to the eBPF syscalls that returns the
map fd. The flags is used to set up the file mode when construct a new
file descriptor for bpf maps. To not break the backward capability, the
f_flags is set to O_RDWR if the flag passed by syscall is 0. Otherwise
it should be O_RDONLY or O_WRONLY. When the userspace want to modify or
read the map content, it will check the file mode to see if it is
allowed to make the change.

Signed-off-by: Chenbo Feng <fengc@google.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet-tun: fix panics at dismantle time
Eric Dumazet [Wed, 18 Oct 2017 19:12:09 +0000 (12:12 -0700)]
net-tun: fix panics at dismantle time

syzkaller got crashes at dismantle time [1]

It is not correct to test (tun->flags & IFF_NAPI) in tun_napi_disable()
and tun_napi_del() : Each tun_file can have different mode, depending
on how they were created.

Similarly I have changed tun_get_user() and tun_poll_controller()
to use the new tfile->napi_enabled boolean.

[  154.331360] BUG: unable to handle kernel NULL pointer dereference at           (null)
[  154.339220] IP: [<ffffffff9634cad6>] hrtimer_active+0x26/0x60
[  154.344983] PGD 0
[  154.347009] Oops: 0000 [#1] SMP
[  154.350680] gsmi: Log Shutdown Reason 0x03
[  154.379572] task: ffff994719150dc0 ti: ffff99475c0ae000 task.ti: ffff99475c0ae000
[  154.387043] RIP: 0010:[<ffffffff9634cad6>]  [<ffffffff9634cad6>] hrtimer_active+0x26/0x60
[  154.395232] RSP: 0018:ffff99475c0afce8  EFLAGS: 00010246
[  154.400542] RAX: ffff994754850ac0 RBX: ffff994753e65408 RCX: ffff994753e65388
[  154.407666] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff994753e65408
[  154.414790] RBP: ffff99475c0afce8 R08: 0000000000000000 R09: 0000000000000000
[  154.421921] R10: ffff99475f6f5910 R11: 0000000000000001 R12: 0000000000000000
[  154.429044] R13: ffff99417deab668 R14: ffff99417deaa780 R15: ffff99475f45dde0
[  154.436174] FS:  0000000000000000(0000) GS:ffff994767a00000(0000) knlGS:0000000000000000
[  154.444249] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  154.449986] CR2: 0000000000000000 CR3: 00000005a8a0e000 CR4: 0000000000022670
[  154.457110] Stack:
[  154.459120]  ffff99475c0afd28 ffffffff9634d614 1000000000000000 0000000000000000
[  154.466598]  ffffe54240000000 ffff994753e65408 ffff994753e653a8 ffff99417deab668
[  154.474067]  ffff99475c0afd48 ffffffff9634d6fd ffff99474c2be678 ffff994753e65398
[  154.481537] Call Trace:
[  154.483985]  [<ffffffff9634d614>] hrtimer_try_to_cancel+0x24/0xf0
[  154.490074]  [<ffffffff9634d6fd>] hrtimer_cancel+0x1d/0x30
[  154.495563]  [<ffffffff96860b3c>] napi_disable+0x3c/0x70
[  154.500875]  [<ffffffff9678ae62>] __tun_detach+0xd2/0x360
[  154.506272]  [<ffffffff9678b117>] tun_chr_close+0x27/0x40
[  154.511669]  [<ffffffff9646ebe6>] __fput+0xd6/0x1e0
[  154.516548]  [<ffffffff9646ed3e>] ____fput+0xe/0x10
[  154.521429]  [<ffffffff963035a2>] task_work_run+0x72/0x90
[  154.526827]  [<ffffffff962e9407>] do_exit+0x317/0xb60
[  154.531879]  [<ffffffff962e9c8f>] do_group_exit+0x3f/0xa0
[  154.537275]  [<ffffffff962e9d07>] SyS_exit_group+0x17/0x20
[  154.542769]  [<ffffffff969784be>] entry_SYSCALL_64_fastpath+0x12/0x17

Fixes: 943170998b20 ("net-tun: enable NAPI for TUN/TAP driver")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: ipv4: Change fib notifiers to take a fib_alias
David Ahern [Wed, 18 Oct 2017 18:39:13 +0000 (11:39 -0700)]
net: ipv4: Change fib notifiers to take a fib_alias

All of the notifier data (fib_info, tos, type and table id) are
contained in the fib_alias. Pass it to the notifier instead of
each data separately shortening the argument list by 3.

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agotcp: socket option to set TCP fast open key
Yuchung Cheng [Wed, 18 Oct 2017 18:22:51 +0000 (11:22 -0700)]
tcp: socket option to set TCP fast open key

New socket option TCP_FASTOPEN_KEY to allow different keys per
listener.  The listener by default uses the global key until the
socket option is set.  The key is a 16 bytes long binary data. This
option has no effect on regular non-listener TCP sockets.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'mlxsw-extack'
David S. Miller [Fri, 20 Oct 2017 12:15:08 +0000 (13:15 +0100)]
Merge branch 'mlxsw-extack'

David Ahern says:

====================
mlxsw: spectrum_router: Add extack messages for RIF and VRF overflow

Currently, exceeding the number of VRF instances or the number of router
interfaces either fails with a non-intuitive EBUSY:
    $ ip li set swp1s1.6 vrf vrf-1s1-6 up
    RTNETLINK answers: Device or resource busy

or fails silently (IPv6) since the checks are done in a work queue. This
set adds support for the address validator notifier to spectrum which
allows ext-ack based messages to be returned on failure.

To make that happen the IPv6 version needs to be converted from atomic
to blocking (patch 2), and then support for extack needs to be added
to the notifier (patch 3). Patch 1 reworks the locking in ipv6_add_addr
to work better in the atomic and non-atomic code paths. Patches 4 and 5
add the validator notifier to spectrum and then plumb the extack argument
through spectrum_router.

With this set, VRF overflows fail with:
   $ ip li set swp1s1.6 vrf vrf-1s1-6 up
   Error: spectrum: Exceeded number of supported VRF.

and RIF overflows fail with:
   $ ip addr add dev swp1s2.191 10.12.191.1/24
   Error: spectrum: Exceeded number of supported router interfaces.

v2 -> v3
- fix surround context of patch 4 which was altered by c30f5d012edf

v1 -> v2
- fix error path in ipv6_add_addr: reset rt to NULL (Ido comment) and
  add in6_dev_put on ifa once the hold has been done

RFC -> v1
- addressed various comments from Ido
- refactored ipv6_add_addr to allow ifa's to be allocated with
  GFP_KERNEL as requested by DaveM
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum_router: Add extack message for RIF and VRF overflow
David Ahern [Wed, 18 Oct 2017 16:56:56 +0000 (09:56 -0700)]
mlxsw: spectrum_router: Add extack message for RIF and VRF overflow

Add extack argument down to mlxsw_sp_rif_create and mlxsw_sp_vr_create
to set an error message on RIF or VR overflow. Now on overflow of
either resource the user gets an informative message as opposed to
failing with EBUSY.

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum: router: Add support for address validator notifier
David Ahern [Wed, 18 Oct 2017 16:56:55 +0000 (09:56 -0700)]
mlxsw: spectrum: router: Add support for address validator notifier

Add support for inetaddr_validator and inet6addr_validator. The
notifiers provide a means for validating ipv4 and ipv6 addresses
before the addresses are installed and on failure the error
is propagated back to the user.

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: Add extack to validator_info structs used for address notifier
David Ahern [Wed, 18 Oct 2017 16:56:54 +0000 (09:56 -0700)]
net: Add extack to validator_info structs used for address notifier

Add extack to in_validator_info and in6_validator_info. Update the one
user of each, ipvlan, to return an error message for failures.

Only manual configuration of an address is plumbed in the IPv6 code path.

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: ipv6: Make inet6addr_validator a blocking notifier
David Ahern [Wed, 18 Oct 2017 16:56:53 +0000 (09:56 -0700)]
net: ipv6: Make inet6addr_validator a blocking notifier

inet6addr_validator chain was added by commit 3ad7d2468f79f ("Ipvlan
should return an error when an address is already in use") to allow
address validation before changes are committed and to be able to
fail the address change with an error back to the user. The address
validation is not done for addresses received from router
advertisements.

Handling RAs in softirq context is the only reason for the notifier
chain to be atomic versus blocking. Since the only current user, ipvlan,
of the validator chain ignores softirq context, the notifier can be made
blocking and simply not invoked for softirq path.

The blocking option is needed by spectrum for example to validate
resources for an adding an address to an interface.

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoipv6: addrconf: cleanup locking in ipv6_add_addr
David Ahern [Wed, 18 Oct 2017 16:56:52 +0000 (09:56 -0700)]
ipv6: addrconf: cleanup locking in ipv6_add_addr

ipv6_add_addr is called in process context with rtnl lock held
(e.g., manual config of an address) or during softirq processing
(e.g., autoconf and address from a router advertisement).

Currently, ipv6_add_addr calls rcu_read_lock_bh shortly after entry
and does not call unlock until exit, minus the call around the address
validator notifier. Similarly, addrconf_hash_lock is taken after the
validator notifier and held until exit. This forces the allocation of
inet6_ifaddr to always be atomic.

Refactor ipv6_add_addr as follows:
1. add an input boolean to discriminate the call path (process context
   or softirq). This new flag controls whether the alloc can be done
   with GFP_KERNEL or GFP_ATOMIC.

2. Move the rcu_read_lock_bh and unlock calls only around functions that
   do rcu updates.

3. Remove the in6_dev_hold and put added by 3ad7d2468f79f ("Ipvlan should
   return an error when an address is already in use."). This was done
   presumably because rcu_read_unlock_bh needs to be called before calling
   the validator. Since rcu_read_lock is not needed before the validator
   runs revert the hold and put added by 3ad7d2468f79f and only do the
   hold when setting ifp->idev.

4. move duplicate address check and insertion of new address in the global
   address hash into a helper. The helper is called after an ifa is
   allocated and filled in.

This allows the ifa for manually configured addresses to be done with
GFP_KERNEL and reduces the overall amount of time with rcu_read_lock held
and hash table spinlock held.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 's390-next'
David S. Miller [Fri, 20 Oct 2017 12:11:05 +0000 (13:11 +0100)]
Merge branch 's390-next'

Julian Wiedmann says:

====================
s390/net: updates 2017-10-18

please apply some additional robustness fixes and cleanups for 4.15.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agos390/qeth: don't dump control cmd twice
Julian Wiedmann [Wed, 18 Oct 2017 15:40:25 +0000 (17:40 +0200)]
s390/qeth: don't dump control cmd twice

A few lines down, qeth_prepare_control_data() makes further changes to
the control cmd buffer, and then also writes a trace entry for it.
So the first entry just pollutes the trace file with intermediate data,
drop it.

Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agos390/qeth: support GRO flush timer
Julian Wiedmann [Wed, 18 Oct 2017 15:40:24 +0000 (17:40 +0200)]
s390/qeth: support GRO flush timer

Switch to napi_complete_done(), and thus enable delayed GRO flushing.
The timeout is configured via /sys/class/net/<if>/gro_flush_timeout.

Default timeout is 0, so no change in behaviour.

Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agos390/qeth: try harder to get packets from RX buffer
Julian Wiedmann [Wed, 18 Oct 2017 15:40:23 +0000 (17:40 +0200)]
s390/qeth: try harder to get packets from RX buffer

Current code bails out when two subsequent buffer elements hold
insufficient data to contain a qeth_hdr packet descriptor.
This seems reasonable, but it would be legal for quirky hardware to
leave a few elements empty and then present packets in a subsequent
element. These packets would currently be dropped.

So make sure to check all buffer elements, until we hit the LAST_ENTRY
indication.

Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agos390/qeth: consolidate skb allocation
Julian Wiedmann [Wed, 18 Oct 2017 15:40:22 +0000 (17:40 +0200)]
s390/qeth: consolidate skb allocation

Move the allocation of SG skbs into the main path. This allows for
a little code sharing, and handling ENOMEM from within one place.

As side effect, L2 SG skbs now get the proper amount of additional
headroom (read: zero) instead of the hard-coded ETH_HLEN.

Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agos390/qeth: clean up page frag creation
Julian Wiedmann [Wed, 18 Oct 2017 15:40:21 +0000 (17:40 +0200)]
s390/qeth: clean up page frag creation

Replace the open-coded skb_add_rx_frag(), and use a fall-through
to remove some duplicated code.

Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agos390/qeth: no VLAN support on OSM
Julian Wiedmann [Wed, 18 Oct 2017 15:40:20 +0000 (17:40 +0200)]
s390/qeth: no VLAN support on OSM

Instead of silently discarding VLAN registration requests on OSM,
just indicate that this card type doesn't support VLAN.

Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agos390/qeth: don't verify device when setting MAC address
Julian Wiedmann [Wed, 18 Oct 2017 15:40:19 +0000 (17:40 +0200)]
s390/qeth: don't verify device when setting MAC address

There's no reason why l2_set_mac_address() should ever be called for
a netdevice that's not owned by qeth. It's certainly not required for
VLAN devices, which have their own netdev_ops.

Also:
1) we don't do such validation for any of the other netdev_ops routines.
2) the code in question clearly has never been actually exercised;
   it's broken. After determining that the device is not owned
   by qeth, it would still use dev->ml_priv to write a qeth trace entry.

Remove the check, and its helper that walked the global card list.

Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agos390/qeth: clean up initial MTU determination
Julian Wiedmann [Wed, 18 Oct 2017 15:40:18 +0000 (17:40 +0200)]
s390/qeth: clean up initial MTU determination

1. Drop the support for Token Ring,
2. use the ETH_DATA_LEN macro for the default L2 MTU,
3. handle OSM via the default case (as OSM is L2-only), and
4. document why the L3 MTU is reduced.

Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agos390/qeth: fix early exit from error path
Julian Wiedmann [Wed, 18 Oct 2017 15:40:17 +0000 (17:40 +0200)]
s390/qeth: fix early exit from error path

When the allocation of the addr buffer fails, we need to free
our refcount on the inetdevice before returning.

Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agos390/qeth: use kstrtobool() in qeth_bridgeport_hostnotification_store()
Andy Shevchenko [Wed, 18 Oct 2017 15:40:16 +0000 (17:40 +0200)]
s390/qeth: use kstrtobool() in qeth_bridgeport_hostnotification_store()

The sysfs enabled value is a boolean, so kstrtobool() is a better fit
for parsing the input string since it does the range checking for us.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agos390/qeth: remove duplicated device matching
Julian Wiedmann [Wed, 18 Oct 2017 15:40:15 +0000 (17:40 +0200)]
s390/qeth: remove duplicated device matching

With commit "s390/ccwgroup: tie a ccwgroup driver to its ccw driver",
the ccwgroup core now ensures that a qeth group device only consists of
ccw devices which are supported by qeth. Therefore remove qeth's
internal device matching, and use .driver_info to determine the card
type.

Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agos390/drivers: use setup_timer
Allen Pais [Wed, 18 Oct 2017 15:40:14 +0000 (17:40 +0200)]
s390/drivers: use setup_timer

Use setup_timer function instead of initializing timer with the
function and data fields.

Signed-off-by: Allen Pais <allen.lkml@gmail.com>
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agos390/qeth: rely on kernel for feature recovery
Julian Wiedmann [Wed, 18 Oct 2017 15:40:13 +0000 (17:40 +0200)]
s390/qeth: rely on kernel for feature recovery

When recovering a device, qeth needs to re-run the IPA commands that
enable all previously active HW features.
Instead of duplicating qeth_set_features(), let netdev_update_features()
recover the missing HW features from dev->wanted_features.

Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/sched: Set the net-device for egress device instance
Or Gerlitz [Wed, 18 Oct 2017 15:38:08 +0000 (18:38 +0300)]
net/sched: Set the net-device for egress device instance

Currently the netdevice field is not set and the egdev instance
is not functional, fix that.

Fixes: 3f55bdda8df ('net: sched: introduce per-egress action device callbacks')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'cxgb4-more-flower-offloads'
David S. Miller [Fri, 20 Oct 2017 12:06:53 +0000 (13:06 +0100)]
Merge branch 'cxgb4-more-flower-offloads'

Rahul Lakkireddy says:

====================
cxgb4: enable more tc flower offload matches and actions

This patch series enable more matches and actions for TC Flower
Offload support on Chelsio adapters.

Patch 1 enables matching on IP TOS.

Patch 2 enables matching on VLAN TCI.

Patch 3 adds support for action PASS.

Patch 4 adds support for ETH-DMAC rewrite via TC-PEDIT action. Also,
adds a check to assert that vlan/eth-dmac rewrite actions are valid
only in combination with action egress redirect.

Patch 5 introduces SMT ops for adding/removing entries from SMAC Table
in HW in preparation for patch 6.

Patch 6 adds support for ETH-SMAC rewrite via TC-PEDIT action.

Patch 7 introduces fw_filter2_wr to support L3/L4 header rewrites
in preparation for patch 8.

Patch 8 adds support for rewrite on L3/L4 header fields via TC-PEDIT
action. Supported fields for rewrite are:
IPv4 src/dst address, IPv6 src/dst address, TCP/UDP sport/dport.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agocxgb4: add tc flower support for L3/L4 rewrite
Kumar Sanghvi [Wed, 18 Oct 2017 15:19:14 +0000 (20:49 +0530)]
cxgb4: add tc flower support for L3/L4 rewrite

Adds support to rewrite L3/L4 fields via TC-PEDIT action.
Supported fields for rewrite are:
IPv4 src/dst address, IPv6 src/dst address, TCP/UDP sport/dport.

Also, process match fields first and then process the action items.

Refactor pedit action validation to separate function to avoid
excessive code indentation.

Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agocxgb4: introduce fw_filter2_wr to prepare for L3/L4 rewrite support
Kumar Sanghvi [Wed, 18 Oct 2017 15:19:13 +0000 (20:49 +0530)]
cxgb4: introduce fw_filter2_wr to prepare for L3/L4 rewrite support

Update driver to use new fw_filter2_wr in order to support rewrite of
L3/L4 header fields via filters. Query FW_PARAMS_PARAM_DEV_FILTER2_WR
to check whether FW supports this new wr.

Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agocxgb4: add tc flower support for ETH-SMAC rewrite
Kumar Sanghvi [Wed, 18 Oct 2017 15:19:12 +0000 (20:49 +0530)]
cxgb4: add tc flower support for ETH-SMAC rewrite

Adds support for ETH-SMAC rewrite via TC-PEDIT action.

Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agocxgb4: introduce SMT ops to prepare for SMAC rewrite support
Kumar Sanghvi [Wed, 18 Oct 2017 15:19:11 +0000 (20:49 +0530)]
cxgb4: introduce SMT ops to prepare for SMAC rewrite support

Introduce SMT operations for allocating/removing entries from
SMAC table. Make TCAM filters use the SMT ops whenever SMAC rewrite
is required.

Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agocxgb4: add tc flower support for ETH-DMAC rewrite
Kumar Sanghvi [Wed, 18 Oct 2017 15:19:10 +0000 (20:49 +0530)]
cxgb4: add tc flower support for ETH-DMAC rewrite

Add support for ETH-DMAC Rewrite via TC-PEDIT action. Also, add
check to assert that vlan/eth-dmac rewrite actions are valid only
in combination with action egress redirect.

Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agocxgb4: add tc flower support for action PASS
Kumar Sanghvi [Wed, 18 Oct 2017 15:19:09 +0000 (20:49 +0530)]
cxgb4: add tc flower support for action PASS

Add support for tc flower action PASS.

Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agocxgb4: add tc flower match support for vlan
Kumar Sanghvi [Wed, 18 Oct 2017 15:19:08 +0000 (20:49 +0530)]
cxgb4: add tc flower match support for vlan

Add support for matching on vlan tci.  Construct vlan tci match param
based on vlan-id and vlan-pcp values supplied by tc.

Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agocxgb4: add tc flower match support for TOS
Kumar Sanghvi [Wed, 18 Oct 2017 15:19:07 +0000 (20:49 +0530)]
cxgb4: add tc flower match support for TOS

Add support for matching on IP TOS.  Also check on ethtype value
to be either IPv4 or IPv6.

Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agotcp: Remove use of inet6_sk and add IPv6 checks to tracepoint
David Ahern [Wed, 18 Oct 2017 15:17:29 +0000 (08:17 -0700)]
tcp: Remove use of inet6_sk and add IPv6 checks to tracepoint

386fd5da401d ("tcp: Check daddr_cache before use in tracepoint") was the
second version of the tracepoint fixup patch. This patch is the delta
between v2 and v3.  Specifically, remove the use of inet6_sk and check
sk_family as requested by Eric and add IS_ENABLED(CONFIG_IPV6) around
the use of sk_v6_rcv_saddr and sk_v6_daddr as done in sock_common (noted
by Cong).

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Tested-by: Song Liu <songliubraving@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agodoc: Update VRF documentation metric
Donald Sharp [Wed, 18 Oct 2017 14:24:28 +0000 (10:24 -0400)]
doc: Update VRF documentation metric

Two things:

1) Update examples to show usage of metric
2) Discuss reasoning for using such a high metric.

Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge tag 'rxrpc-next-20171018' of git://git.kernel.org/pub/scm/linux/kernel/git...
David S. Miller [Fri, 20 Oct 2017 07:42:09 +0000 (08:42 +0100)]
Merge tag 'rxrpc-next-20171018' of git://git./linux/kernel/git/dhowells/linux-fs

David Howells says:

====================
rxrpc: Add bits for kernel services

Here are some patches that add a few things for kernel services to use:

 (1) Allow service upgrade to be requested and allow the resultant actual
     service ID to be obtained.

 (2) Allow the RTT time of a call to be obtained.

 (3) Allow a kernel service to find out if a call is still alive on a
     server between transmitting a request and getting the reply.

 (4) Allow data transmission to ignore signals if transmission progress is
     being made in reasonable time.  This is also usable by userspace by
     passing MSG_WAITALL to sendmsg()[*].

[*] I'm not sure this is the right interface for this or whether a sockopt
    should be used instead.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge tag 'wireless-drivers-next-for-davem-2017-10-18' of git://git.kernel.org/pub...
David S. Miller [Fri, 20 Oct 2017 07:37:28 +0000 (08:37 +0100)]
Merge tag 'wireless-drivers-next-for-davem-2017-10-18' of git://git./linux/kernel/git/kvalo/wireless-drivers-next

Kalle Valo says:

====================
wireless-drivers-next patches for 4.15

The first pull request for 4.15, unusually late this time but still
relatively small. Also includes merge from wireless-drivers to fix
conflicts in iwlwifi.

Major changes:

rsi

* add P2P mode support

* sdio suspend and resume support

iwlwifi

* A fix and an addition for PCI devices for the A000 family

* Dump PCI registers when an error occurs, to make it easier to debug

rtlwifi

* add support for 64 bit DMA, enabled with a module parameter

* add module parameter to enable ASPM
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: sched: cls_u32: use hash_ptr() for tc_u_hash
Arnd Bergmann [Wed, 18 Oct 2017 08:33:37 +0000 (10:33 +0200)]
net: sched: cls_u32: use hash_ptr() for tc_u_hash

After the change to the tp hash, we now get a build warning
on 32-bit architectures:

net/sched/cls_u32.c: In function 'tc_u_hash':
net/sched/cls_u32.c:338:17: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
  return hash_64((u64) tp->chain->block, U32_HASH_SHIFT);

Using hash_ptr() instead of hash_64() lets us drop the cast
and fixes the warning while still resulting in the same hash
value.

Fixes: 7fa9d974f3c2 ("net: sched: cls_u32: use block instead of q in tc_u_common")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agotipc: checking for NULL instead of IS_ERR()
Dan Carpenter [Wed, 18 Oct 2017 07:48:25 +0000 (10:48 +0300)]
tipc: checking for NULL instead of IS_ERR()

The tipc_alloc_conn() function never returns NULL, it returns error
pointers, so I have fixed the check.

Fixes: 14c04493cb77 ("tipc: add ability to order and receive topology events in driver")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'sh_eth-fallback-compat-strings'
David S. Miller [Fri, 20 Oct 2017 07:32:49 +0000 (08:32 +0100)]
Merge branch 'sh_eth-fallback-compat-strings'

Simon Horman says:

====================
net: sh_eth: add R-Car Gen[12] fallback compatibility strings

Add fallback compatibility strings for R-Car Gen 1 and 2.

In the case of Renesas R-Car hardware we know that there are generations of
SoCs, f.e. Gen 1 and 2. But beyond that its not clear what the relationship
between IP blocks might be. For example, I believe that r8a7790 is older
than r8a7791 but that doesn't imply that the latter is a descendant of the
former or vice versa.

We can, however, by examining the documentation and behaviour of the
hardware at run-time observe that the current driver implementation appears
to be compatible with the IP blocks on SoCs within a given generation.

For the above reasons and convenience when enabling new SoCs a
per-generation fallback compatibility string scheme is being adopted for
drivers for Renesas SoCs.

Changes since v1:
* Correct typos in changelogs
* Consistently use tabs for indentation in bindings document
* Enhance readability of description of bindings usage
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: sh_eth: implement R-Car Gen[12] fallback compatibility strings
Simon Horman [Wed, 18 Oct 2017 07:21:28 +0000 (09:21 +0200)]
net: sh_eth: implement R-Car Gen[12] fallback compatibility strings

Implement fallback compatibility strings for R-Car Gen 1 and 2.

In the case of Renesas R-Car hardware we know that there are generations of
SoCs, f.e. Gen 1 and 2. But beyond that its not clear what the relationship
between IP blocks might be. For example, I believe that r8a7790 is older
than r8a7791 but that doesn't imply that the latter is a descendant of the
former or vice versa.

We can, however, by examining the documentation and behaviour of the
hardware at run-time observe that the current driver implementation appears
to be compatible with the IP blocks on SoCs within a given generation.

For the above reasons and convenience when enabling new SoCs a
per-generation fallback compatibility string scheme is being adopted for
drivers for Renesas SoCs.

Note that R-Car Gen2 and RZ/G1 have many compatible IP blocks.  The
approach that has been consistently taken for other IP blocks is to name
common code, compatibility strings and so on after R-Car Gen2.

Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: sh_eth: rename name structures as rcar_gen[12]_*
Simon Horman [Wed, 18 Oct 2017 07:21:27 +0000 (09:21 +0200)]
net: sh_eth: rename name structures as rcar_gen[12]_*

Rename structures describing R-Car SoCs as rcar_gen[12]_*
rather than r8a77[79]x_*. This seems a little easier on the
eyes. And will make things slightly cleaner in a follow-up
patch that adds fallback-compatibility strings for these SoCs.

Note that R-Car Gen2 and RZ/G1 have many compatible IP blocks.  The
approach that has been consistently taken for other IP blocks is to name
common code, compatibility strings and so on after R-Car Gen2.

Also rename sh_eth_set_rate_r8a777x as sh_eth_set_rate_rcar as
it it is used by the R-Car generations supported by the driver.

This patch should have no run-time effect and
is compile-tested only.

Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agodt-bindings: net: sh_eth: add R-Car Gen[12] fallback compatibility strings
Simon Horman [Wed, 18 Oct 2017 07:21:26 +0000 (09:21 +0200)]
dt-bindings: net: sh_eth: add R-Car Gen[12] fallback compatibility strings

Add fallback compatibility strings for R-Car Gen 1 and 2.

In the case of Renesas R-Car hardware we know that there are generations of
SoCs, f.e. Gen 1 and 2. But beyond that its not clear what the relationship
between IP blocks might be. For example, I believe that r8a7790 is older
than r8a7791 but that doesn't imply that the latter is a descendant of the
former or vice versa.

We can, however, by examining the documentation and behaviour of the
hardware at run-time observe that the current driver implementation appears
to be compatible with the IP blocks on SoCs within a given generation.

For the above reasons and convenience when enabling new SoCs a
per-generation fallback compatibility string scheme is being adopted for
drivers for Renesas SoCs.

Note that R-Car Gen2 and RZ/G1 have many compatible IP blocks.  The
approach that has been consistently taken for other IP blocks is to name
common code, compatibility strings and so on after R-Car Gen2.

Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agodql: make dql_init return void
Stephen Hemminger [Wed, 18 Oct 2017 00:16:52 +0000 (17:16 -0700)]
dql: make dql_init return void

dql_init always returned 0, and the only place that uses it
in network core code didn't care about the return value anyway.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Acked-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: l2tp: mark expected switch fall-through
Gustavo A. R. Silva [Tue, 17 Oct 2017 22:42:53 +0000 (17:42 -0500)]
net: l2tp: mark expected switch fall-through

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Notice that in this particular case I replaced the "NOBREAK" comment with
a "fall through" comment, which is what GCC is expecting to find.

Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Acked-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoliquidio: mark expected switch fall-through in octeon_destroy_resources
Gustavo A. R. Silva [Tue, 17 Oct 2017 19:01:45 +0000 (14:01 -0500)]
liquidio: mark expected switch fall-through in octeon_destroy_resources

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Acked-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoliquidio: remove unnecessary NULL check before kfree in delete_glists
Gustavo A. R. Silva [Tue, 17 Oct 2017 18:59:20 +0000 (13:59 -0500)]
liquidio: remove unnecessary NULL check before kfree in delete_glists

NULL check before freeing functions like kfree is not needed.

This issue was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Acked-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'ibmvnic-next'
David S. Miller [Thu, 19 Oct 2017 12:20:32 +0000 (13:20 +0100)]
Merge branch 'ibmvnic-next'

Thomas Falcon says:

====================
ibmvnic: Enable SG and TSO feature support

This patch set is fairly straightforward. The first patch enables
scatter-gather support in the ibmvnic driver. The following patch
then enables the TCP Segmentation offload feature. The final patch
allows users to enable or disable net device features using ethtool.

Enabling SG and TSO grants a large increase in throughput with TX
speed increasing from 1Gb/s to 9Gb/s in our initial test runs.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoibmvnic: Let users change net device features
Thomas Falcon [Tue, 17 Oct 2017 17:36:56 +0000 (12:36 -0500)]
ibmvnic: Let users change net device features

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoibmvnic: Enable TSO support
Thomas Falcon [Tue, 17 Oct 2017 17:36:55 +0000 (12:36 -0500)]
ibmvnic: Enable TSO support

This patch enables TSO support. It includes additional
buffers reserved exclusively for large packets. Throughput
is greatly increased with TSO enabled, from about 1 Gb/s to
9 Gb/s on our test systems.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoibmvnic: Enable scatter-gather support
Thomas Falcon [Tue, 17 Oct 2017 17:36:54 +0000 (12:36 -0500)]
ibmvnic: Enable scatter-gather support

This patch enables scatter gather support. Since there is no
HW/FW scatter-gather support at this time, the driver needs to
loop through each fragment and copy it to a contiguous, pre-mapped
buffer entry.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agotun: relax check on eth_get_headlen() return value
Eric Dumazet [Tue, 17 Oct 2017 17:07:44 +0000 (10:07 -0700)]
tun: relax check on eth_get_headlen() return value

syzkaller hit the WARN() in tun_get_user(), providing skb
with payload in fragments only, and nothing in skb->head

GRO layer is fine with this, so relax the check.

Fixes: 90e33d459407 ("tun: enable napi_gro_frags() for TUN/TAP driver")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomqprio: fix potential null pointer dereference on opt
Colin Ian King [Tue, 17 Oct 2017 15:01:30 +0000 (16:01 +0100)]
mqprio: fix potential null pointer dereference on opt

The pointer opt has a null check however before for this check opt is
dereferenced when len is initialized, hence we potentially have a null
pointer deference on opt.  Avoid this by checking for a null opt before
dereferencing it.

Detected by CoverityScan, CID#1458234 ("Dereference before null check")

Fixes: 4e8b86c06269 ("mqprio: Introduce new hardware offload mode and shaper in mqprio")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agothunderbolt: Right shifting to zero bug in tbnet_handle_packet()
Dan Carpenter [Tue, 17 Oct 2017 12:33:01 +0000 (15:33 +0300)]
thunderbolt: Right shifting to zero bug in tbnet_handle_packet()

There is a problem when we do:

sequence = pkg->hdr.length_sn & TBIP_HDR_SN_MASK;
sequence >>= TBIP_HDR_SN_SHIFT;

TBIP_HDR_SN_SHIFT is 27, and right shifting a u8 27 bits is always
going to result in zero.  The fix is to declare these variables as u32.

Fixes: e69b6c02b4c3 ("net: Add support for networking over Thunderbolt cable")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Yehezkel Bernat <yehezkel.bernat@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agothunderbolt: Fix a couple right shifting to zero bugs
Dan Carpenter [Tue, 17 Oct 2017 12:32:17 +0000 (15:32 +0300)]
thunderbolt: Fix a couple right shifting to zero bugs

The problematic code looks like this:

res_seq = res_hdr->xd_hdr.length_sn & TB_XDOMAIN_SN_MASK;
res_seq >>= TB_XDOMAIN_SN_SHIFT;

TB_XDOMAIN_SN_SHIFT is 27, and right shifting a u8 27 bits is always
going to result in zero.  The fix is to declare these variables as u32.

Fixes: d1ff70241a27 ("thunderbolt: Add support for XDomain discovery protocol")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'ena-next'
David S. Miller [Thu, 19 Oct 2017 11:51:38 +0000 (12:51 +0100)]
Merge branch 'ena-next'

Netanel Belgazal says:
====================
update ENA driver to releawse 1.3.0
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: ena: increase ena driver version to 1.3.0
Netanel Belgazal [Tue, 17 Oct 2017 07:34:01 +0000 (07:34 +0000)]
net: ena: increase ena driver version to 1.3.0

Signed-off-by: Netanel Belgazal <netanel@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: ena: add new admin define for future support of IPv6 RSS
Netanel Belgazal [Tue, 17 Oct 2017 07:34:00 +0000 (07:34 +0000)]
net: ena: add new admin define for future support of IPv6 RSS

Signed-off-by: Netanel Belgazal <netanel@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: ena: add statistics for missed tx packets
Netanel Belgazal [Tue, 17 Oct 2017 07:33:59 +0000 (07:33 +0000)]
net: ena: add statistics for missed tx packets

Add a new statistic to ethtool stats that show the number of packets
without transmit acknowledgement from ENA device.

Signed-off-by: Netanel Belgazal <netanel@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: ena: add power management ops to the ENA driver
Netanel Belgazal [Tue, 17 Oct 2017 07:33:58 +0000 (07:33 +0000)]
net: ena: add power management ops to the ENA driver

Signed-off-by: Netanel Belgazal <netanel@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: ena: remove legacy suspend suspend/resume support
Netanel Belgazal [Tue, 17 Oct 2017 07:33:57 +0000 (07:33 +0000)]
net: ena: remove legacy suspend suspend/resume support

Remove ena_device_io_suspend/resume() methods
Those methods were intend to be used by the device to trigger
suspend/resume but eventually it was dropped.

Signed-off-by: Netanel Belgazal <netanel@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: ena: improve ENA driver boot time.
Netanel Belgazal [Tue, 17 Oct 2017 07:33:56 +0000 (07:33 +0000)]
net: ena: improve ENA driver boot time.

The ena admin commands timeout is in resolutions of 100ms.
Therefore, When the driver works in polling mode, it sleeps for 100ms
each time. The overall boot time of the ENA driver is ~1.5 sec.
To reduce the boot time, This change modifies the granularity of
the sleeps to 5ms.
This change improves the boot time to 220ms.

Signed-off-by: Netanel Belgazal <netanel@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMAINTAINERS: change ENA driver maintainers email domain
Netanel Belgazal [Tue, 17 Oct 2017 07:30:34 +0000 (07:30 +0000)]
MAINTAINERS: change ENA driver maintainers email domain

ENA driver was developed by developers from Annapurna Labs.
Annapurna Labs was acquired by Amazon and the company's domain
(@annapurnalabs.com) will become deprecated soon.

Update the email addresses of the maintainers to the alternative amazon
emails (@amazon.com)

Signed-off-by: Netanel Belgazal <netanel@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoqed: Fix iWARP out of order flow
Michal Kalderon [Tue, 17 Oct 2017 07:23:25 +0000 (10:23 +0300)]
qed: Fix iWARP out of order flow

Out of order flow is not working for iWARP.
This patch got cut out from initial series that added out
of order support for iWARP.

Make out of order code common for iWARP and iSCSI.
Add new configuration option CONFIG_QED_OOO. Set by
qedr and qedi Kconfigs.

Fixes: d1abfd0b4ee2 ("qed: Add iWARP out of order support")

Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Manish Rangankar <Manish.Rangankar@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: hns3: Add mqprio hardware offload support in hns3 driver
Yunsheng Lin [Tue, 17 Oct 2017 06:51:30 +0000 (14:51 +0800)]
net: hns3: Add mqprio hardware offload support in hns3 driver

When using tc qdisc, dcb_ops->setup_tc is used to tell hclge_dcb
module to do the tm related setup. Only TC_MQPRIO_MODE_CHANNEL
offload mode is supported.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomacvlan/macvtap: Add support for L2 forwarding offloads with macvtap
Alexander Duyck [Tue, 17 Oct 2017 02:44:44 +0000 (22:44 -0400)]
macvlan/macvtap: Add support for L2 forwarding offloads with macvtap

This patch reverts earlier commit b13ba1b83f52 ("macvlan: forbid L2
fowarding offload for macvtap"). The reason for reverting this is because
the original patch no longer fixes what it previously did as the
underlying structure has changed for macvtap. Specifically macvtap
originally pulled packets directly off of the lowerdev. However in commit
6acf54f1cf0a ("macvtap: Add support of packet capture on macvtap device.")
that code was changed and instead macvtap would listen directly on the
macvtap device itself instead of the lower device. As such, the L2
forwarding offload should now be able to provide a performance advantage of
skipping the checks on the lower dev while not introducing any sort of
regression.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next...
David S. Miller [Thu, 19 Oct 2017 10:44:36 +0000 (11:44 +0100)]
Merge branch '40GbE' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
40GbE Intel Wired LAN Driver Updates 2017-10-17

This series contains updates to i40e and ethtool.

Alan provides most of the changes in this series which are mainly fixes
and cleanups.  Renamed the ethtool "cmd" variable to "ks", since the new
ethtool API passes us ksettings structs instead of command structs.
Cleaned up an ifdef that was not accomplishing anything.  Added function
header comments to provide better documentation.  Fixed two issues in
i40e_get_link_ksettings(), by calling
ethtool_link_ksettings_zero_link_mode() to ensure the advertising and
link masks are cleared before we start setting bits.  Cleaned up and fixed
code comments which were incorrect.  Separated the setting of autoneg in
i40e_phy_types_to_ethtool() into its own conditional to clarify what PHYs
support and advertise autoneg, and makes it easier to add new PHY types in
the future.  Added ethtool functionality to intersect two link masks
together to find the common ground between them.  Overhauled i40e to
ensure that the new ethtool API macros are being used, instead of the
old ones.  Fixed the usage of unsigned 64-bit division which is not
supported on all architectures.

Sudheer adds support for 25G Active Optical Cables (AOC) and Active Copper
Cables (ACC) PHY types.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge remote-tracking branch 'net-next/master'
Stefan Schmidt [Wed, 18 Oct 2017 15:40:18 +0000 (17:40 +0200)]
Merge remote-tracking branch 'net-next/master'

7 years agotcp: fix tcp_xmit_retransmit_queue() after rbtree introduction
Eric Dumazet [Tue, 17 Oct 2017 02:38:35 +0000 (19:38 -0700)]
tcp: fix tcp_xmit_retransmit_queue() after rbtree introduction

I tried to hard avoiding a call to rb_first() (via tcp_rtx_queue_head)
in tcp_xmit_retransmit_queue(). But this was probably too bold.

Quoting Yuchung :

We might miss re-arming the RTO if tp->retransmit_skb_hint is not NULL.
This can happen when RACK marks the first packet lost again and resets
tp->retransmit_skb_hint for example (tcp_rack_mark_skb_lost())

Fixes: 75c119afe14f ("tcp: implement rb-tree based retransmit queue")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'bpf-ctx-info-out-of-verifier'
David S. Miller [Wed, 18 Oct 2017 13:17:11 +0000 (14:17 +0100)]
Merge branch 'bpf-ctx-info-out-of-verifier'

Jakub Kicinski says:

====================
bpf: move context info out of the verifier

Daniel pointed out during the review of my previous patchset that
the knowledge about context doesn't really belong directly in the
verifier.  This patch set takes a bit of a drastic approach to
move the info out of there.  I want to be able to use different
set of verifier_ops for program analysis.  To do that, I have
to first move the test_run callback to a separate structure.  Then
verifier ops can be declared in the verifier directly and
different sets can be picked for verification vs analysis.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobpf: allow access to skb->len from offloads
Jakub Kicinski [Mon, 16 Oct 2017 23:40:56 +0000 (16:40 -0700)]
bpf: allow access to skb->len from offloads

Since we are now doing strict checking of what offloads
may access, make sure skb->len is on that list.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobpf: move knowledge about post-translation offsets out of verifier
Jakub Kicinski [Mon, 16 Oct 2017 23:40:55 +0000 (16:40 -0700)]
bpf: move knowledge about post-translation offsets out of verifier

Use the fact that verifier ops are now separate from program
ops to define a separate set of callbacks for verification of
already translated programs.

Since we expect the analyzer ops to be defined only for
a small subset of all program types initialize their array
by hand (don't use linux/bpf_types.h).

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobpf: remove the verifier ops from program structure
Jakub Kicinski [Mon, 16 Oct 2017 23:40:54 +0000 (16:40 -0700)]
bpf: remove the verifier ops from program structure

Since the verifier ops don't have to be associated with
the program for its entire lifetime we can move it to
verifier's struct bpf_verifier_env.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobpf: split verifier and program ops
Jakub Kicinski [Mon, 16 Oct 2017 23:40:53 +0000 (16:40 -0700)]
bpf: split verifier and program ops

struct bpf_verifier_ops contains both verifier ops and operations
used later during program's lifetime (test_run).  Split the runtime
ops into a different structure.

BPF_PROG_TYPE() will now append ## _prog_ops or ## _verifier_ops
to the names.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agotcp: Check daddr_cache before use in tracepoint
David Ahern [Mon, 16 Oct 2017 22:32:07 +0000 (15:32 -0700)]
tcp: Check daddr_cache before use in tracepoint

Running perf in one window to capture tcp_retransmit_skb tracepoint:
    $ perf record -e tcp:tcp_retransmit_skb -a

And causing a retransmission on an active TCP session (e.g., dropping
packets in the receiver, changing MTU on the interface to 500 and back
to 1500) triggers a panic:

[   58.543144] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[   58.545300] IP: perf_trace_tcp_retransmit_skb+0xd0/0x145
[   58.546770] PGD 0 P4D 0
[   58.547472] Oops: 0000 [#1] SMP
[   58.548328] Modules linked in: vrf
[   58.549262] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc4+ #26
[   58.551004] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[   58.554560] task: ffffffff81a0e540 task.stack: ffffffff81a00000
[   58.555817] RIP: 0010:perf_trace_tcp_retransmit_skb+0xd0/0x145
[   58.557137] RSP: 0018:ffff88003fc03d68 EFLAGS: 00010282
[   58.558292] RAX: 0000000000000000 RBX: ffffe8ffffc0ec80 RCX: ffff880038543098
[   58.559850] RDX: 0400000000000000 RSI: ffff88003fc03d70 RDI: ffff88003fc14b68
[   58.561099] RBP: ffff88003fc03da8 R08: 0000000000000000 R09: ffffea0000d3224a
[   58.562005] R10: ffff88003fc03db8 R11: 0000000000000010 R12: ffff8800385428c0
[   58.562930] R13: ffffe8ffffc0e478 R14: ffffffff81a93a40 R15: ffff88003d4f0c00
[   58.563845] FS:  0000000000000000(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
[   58.564873] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   58.565613] CR2: 0000000000000008 CR3: 000000003d68f004 CR4: 00000000000606f0
[   58.566538] Call Trace:
[   58.566865]  <IRQ>
[   58.567140]  __tcp_retransmit_skb+0x4ab/0x4c6
[   58.567704]  ? tcp_set_ca_state+0x22/0x3f
[   58.568231]  tcp_retransmit_skb+0x14/0xa3
[   58.568754]  tcp_retransmit_timer+0x472/0x5e3
[   58.569324]  ? tcp_write_timer_handler+0x1e9/0x1e9
[   58.569946]  tcp_write_timer_handler+0x95/0x1e9
[   58.570548]  tcp_write_timer+0x2a/0x58

Check that daddr_cache is non-NULL before de-referencing.

Fixes: e086101b150a ("tcp: add a tracepoint for tcp retransmission")
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: ipx: mark expected switch fall-through
Gustavo A. R. Silva [Mon, 16 Oct 2017 21:53:16 +0000 (16:53 -0500)]
net: ipx: mark expected switch fall-through

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoipv6: mark expected switch fall-throughs
Gustavo A. R. Silva [Mon, 16 Oct 2017 21:36:52 +0000 (16:36 -0500)]
ipv6: mark expected switch fall-throughs

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Notice that in some cases I placed the "fall through" comment
on its own line, which is what GCC is expecting to find.

Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agotcp: Use pI6c in tcp tracepoint
David Ahern [Mon, 16 Oct 2017 21:24:02 +0000 (14:24 -0700)]
tcp: Use pI6c in tcp tracepoint

The compact form for IPv6 addresses is more user friendly than the full
version. For example:
   compact: 2001:db8:1::1
      full: 2001:0db8:0001:0000:0000:0000:0000:0004i

Update the tcp tracepoint to show the compact form.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>