linux-2.6-microblaze.git
7 years agoMerge branch 'bpf-lru-perf'
David S. Miller [Mon, 17 Apr 2017 17:55:54 +0000 (13:55 -0400)]
Merge branch 'bpf-lru-perf'

Martin KaFai Lau says:

====================
bpf: LRU performance and test-program improvements

The first 4 patches make a few improvements to the LRU tests.

Patch 5/6 is to improve the performance of BPF_F_NO_COMMON_LRU map.

Patch 6/6 adds an example in using LRU map with map-in-map.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobpf: lru: Add map-in-map LRU example
Martin KaFai Lau [Fri, 14 Apr 2017 17:30:30 +0000 (10:30 -0700)]
bpf: lru: Add map-in-map LRU example

This patch adds a map-in-map LRU example.
If we know only a subset of cores will use the
LRU, we can allocate a common LRU list per targeting core
and store it into an array-of-hashs.

It allows using the common LRU map with map-update performance
comparable to the BPF_F_NO_COMMON_LRU map but without wasting memory
on the unused cores that we know they will never access the LRU map.

BPF_F_NO_COMMON_LRU:
> map_perf_test 32 8 10000000 10000000 | awk '{sum += $3}END{print sum}'
9234314 (9.23M/s)

map-in-map LRU:
> map_perf_test 512 8 1260000 80000000 | awk '{sum += $3}END{print sum}'
9962743 (9.96M/s)

Notes that the max_entries for the map-in-map LRU test is 1260000 which
is the max_entries for each inner LRU map.  8 processes have been
started, so 8 * 1260000 = 10080000 (~10M) which is close to what is
used in the BPF_F_NO_COMMON_LRU test.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobpf: lru: Lower the PERCPU_NR_SCANS from 16 to 4
Martin KaFai Lau [Fri, 14 Apr 2017 17:30:29 +0000 (10:30 -0700)]
bpf: lru: Lower the PERCPU_NR_SCANS from 16 to 4

After doing map_perf_test with a much bigger
BPF_F_NO_COMMON_LRU map, the perf report shows a
lot of time spent in rotating the inactive list (i.e.
__bpf_lru_list_rotate_inactive):
> map_perf_test 32 8 10000 1000000 | awk '{sum += $3}END{print sum}'
19644783 (19M/s)
> map_perf_test 32 8 10000000 10000000 |  awk '{sum += $3}END{print sum}'
6283930 (6.28M/s)

By inactive, it usually means the element is not in cache.  Hence,
there is a need to tune the PERCPU_NR_SCANS value.

This patch finds a better number of elements to
scan during each list rotation.  The PERCPU_NR_SCANS (which
is defined the same as PERCPU_FREE_TARGET) decreases
from 16 elements to 4 elements.  This change only
affects the BPF_F_NO_COMMON_LRU map.

The test_lru_dist does not show meaningful difference
between 16 and 4.  Our production L4 load balancer which uses
the LRU map for conntrack-ing also shows little change in cache
hit rate.  Since both benchmark and production data show no
cache-hit difference, PERCPU_NR_SCANS is lowered from 16 to 4.
We can consider making it configurable if we find a usecase
later that shows another value works better and/or use
a different rotation strategy.

After this change:
> map_perf_test 32 8 10000000 10000000 |  awk '{sum += $3}END{print sum}'
9240324 (9.2M/s)

i.e. 6.28M/s -> 9.2M/s

The test_lru_dist has not shown meaningful difference:
> test_lru_dist zipf.100k.a1_01.out 4000 1:
nr_misses: 31575 (Before) vs 31566 (After)

> test_lru_dist zipf.100k.a0_01.out 40000 1
nr_misses: 67036 (Before) vs 67031 (After)

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobpf: Allow bpf sample programs (*_user.c) to change bpf_map_def
Martin KaFai Lau [Fri, 14 Apr 2017 17:30:28 +0000 (10:30 -0700)]
bpf: Allow bpf sample programs (*_user.c) to change bpf_map_def

The current bpf_map_def is statically defined during compile
time.  This patch allows the *_user.c program to change it during
runtime.  It is done by adding load_bpf_file_fixup_map() which
takes a callback.  The callback will be called before creating
each map so that it has a chance to modify the bpf_map_def.

The current usecase is to change max_entries in map_perf_test.
It is interesting to test with a much bigger map size in
some cases (e.g. the following patch on bpf_lru_map.c).
However,  it is hard to find one size to fit all testing
environment.  Hence, it is handy to take the max_entries
as a cmdline arg and then configure the bpf_map_def during
runtime.

This patch adds two cmdline args.  One is to configure
the map's max_entries.  Another is to configure the max_cnt
which controls how many times a syscall is called.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobpf: lru: Refactor LRU map tests in map_perf_test
Martin KaFai Lau [Fri, 14 Apr 2017 17:30:27 +0000 (10:30 -0700)]
bpf: lru: Refactor LRU map tests in map_perf_test

One more LRU test will be added later in this patch series.
In this patch, we first move all existing LRU map tests into
a single syscall (connect) first so that the future new
LRU test can be added without hunting another syscall.

One of the map name is also changed from percpu_lru_hash_map
to nocommon_lru_hash_map to avoid the confusion with percpu_hash_map.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobpf: lru: Cleanup test_lru_map.c
Martin KaFai Lau [Fri, 14 Apr 2017 17:30:26 +0000 (10:30 -0700)]
bpf: lru: Cleanup test_lru_map.c

This patch does the following cleanup on test_lru_map.c
1) Fix indentation (Replace spaces by tabs)
2) Remove redundant BPF_F_NO_COMMON_LRU test
3) Simplify some comments

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobpf: lru: Add test_lru_sanity6 for BPF_F_NO_COMMON_LRU
Martin KaFai Lau [Fri, 14 Apr 2017 17:30:25 +0000 (10:30 -0700)]
bpf: lru: Add test_lru_sanity6 for BPF_F_NO_COMMON_LRU

test_lru_sanity3 is not applicable to BPF_F_NO_COMMON_LRU.
It just happens to work when PERCPU_FREE_TARGET == 16.

This patch:
1) Disable test_lru_sanity3 for BPF_F_NO_COMMON_LRU
2) Add test_lru_sanity6 to test list rotation for
   the BPF_F_NO_COMMON_LRU map.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: mvneta: fix failed to suspend if WOL is enabled
Jisheng Zhang [Fri, 14 Apr 2017 11:07:32 +0000 (19:07 +0800)]
net: mvneta: fix failed to suspend if WOL is enabled

Recently, suspend/resume and WOL support are added into mvneta driver.
If we enable WOL, then we get some error as below on Marvell BG4CT
platforms during suspend:

[  184.149723] dpm_run_callback(): mdio_bus_suspend+0x0/0x50 returns -16
[  184.149727] PM: Device f7b62004.mdio-mi:00 failed to suspend: error -16

-16 means -EBUSY, phy_suspend() will return -EBUSY if it finds the
device has WOL enabled.

We fix this issue by properly setting the netdev's power.can_wakeup
and power.wakeup, i.e

1. in mvneta_mdio_probe(), call device_set_wakeup_capable() to set
power.can_wakeup if the phy support WOL.

2. in mvneta_ethtool_set_wol(), call device_set_wakeup_enable() to
set power.wakeup if WOL has been successfully enabled in phy.

Signed-off-by: Jisheng Zhang <jszhang@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: bridge: notify on hw fdb takeover
Nikolay Aleksandrov [Fri, 14 Apr 2017 10:49:34 +0000 (13:49 +0300)]
net: bridge: notify on hw fdb takeover

Recently we added support for SW fdbs to take over HW ones, but that
results in changing a user-visible fdb flag thus we need to send a
notification, also it's consistent with how HW takes over SW entries.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agokcm: remove a useless copy_from_user()
WANG Cong [Thu, 13 Apr 2017 18:38:02 +0000 (11:38 -0700)]
kcm: remove a useless copy_from_user()

struct kcm_clone only contains fd, and kcm_clone() only
writes this struct, so there is no need to copy it from user.

Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMAINTAINERS: rename TC entry and add couple of header files
Jiri Pirko [Thu, 13 Apr 2017 16:13:51 +0000 (18:13 +0200)]
MAINTAINERS: rename TC entry and add couple of header files

The section is not specific only to "TC classifiers", but applies to the
whole TC subsystem. Also, add couple of forgotten headers.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: phy: simplify phy_supported_speeds()
Russell King [Thu, 13 Apr 2017 15:49:20 +0000 (16:49 +0100)]
net: phy: simplify phy_supported_speeds()

Simplify the loop in phy_supported_speeds().

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: phy: improve phylib correctness for non-autoneg settings
Russell King [Thu, 13 Apr 2017 15:49:15 +0000 (16:49 +0100)]
net: phy: improve phylib correctness for non-autoneg settings

phylib has some undesirable behaviour when forcing a link mode through
ethtool.  phylib uses this code:

idx = phy_find_valid(phy_find_setting(phydev->speed, phydev->duplex),
features);

to find an index in the settings table.  phy_find_setting() starts at
index 0, and scans upwards looking for an exact speed and duplex match.
When it doesn't find it, it returns MAX_NUM_SETTINGS - 1, which is
10baseT-Half duplex.

phy_find_valid() then scans from the point (and effectively only checks
one entry) before bailing out, returning MAX_NUM_SETTINGS - 1.

phy_sanitize_settings() then sets ->speed to SPEED_10 and ->duplex to
DUPLEX_HALF whether or not 10baseT-Half is supported or not.  This goes
against all the comments against these functions, and 10baseT-Half may
not even be supported by the hardware.

Rework these functions, introducing a new method of scanning the table.
There are two modes of lookup that phylib wants: exact, and inexact.

- in exact mode, we return either an exact match or failure
- in inexact mode, we return an exact match if it exists, a match at
  the highest speed that is not greater than the requested speed
  (ignoring duplex), or failing that, the lowest supported speed, or
  failure.

The biggest difference is that we always check whether the entry is
supported before further consideration, so all unsupported entries are
not considered as candidates.

This results in arguably saner behaviour, better matches the comments,
and is probably what users would expect.

This becomes important as ethernet speeds increase, PHYs exist which do
not support the 10Mbit speeds, and half-duplex is likely to become
obsolete - it's already not even an option on 10Gbit and faster links.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoSubject: net: allow configuring default qdisc
stephen hemminger [Thu, 13 Apr 2017 15:40:53 +0000 (08:40 -0700)]
Subject: net: allow configuring default qdisc

Since 3.12 it has been possible to configure the default queuing
discipline via sysctl. This patch adds ability to configure the
default queue discipline in kernel configuration. This is useful for
environments where configuring the value from userspace is difficult
to manage.

The default is still the same as before (pfifo_fast) and it is
possible to change after kernel init with sysctl. This is similar
to how TCP congestion control works.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'qed-arfs'
David S. Miller [Mon, 17 Apr 2017 17:06:19 +0000 (13:06 -0400)]
Merge branch 'qed-arfs'

Manish Chopra says:

====================
qed/qede: aRFS support

This series adds support for Accelerated Flow Steering
in qede driver for TCP/UDP over IPv4/IPv6 protocols.

Please consider applying this series to "net-next"
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoqede: Add aRFS support
Chopra, Manish [Thu, 13 Apr 2017 11:54:45 +0000 (04:54 -0700)]
qede: Add aRFS support

This patch adds support for aRFS for TCP and UDP
protocols with IPv4/IPv6.

Signed-off-by: Manish Chopra <manish.chopra@cavium.com>
Signed-off-by: Yuval Mintz <yuval.mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoqed: aRFS infrastructure support
Chopra, Manish [Thu, 13 Apr 2017 11:54:44 +0000 (04:54 -0700)]
qed: aRFS infrastructure support

This patch adds necessary APIs to interface with
qede aRFS support in successive patch.

It also reserves separate PTT entry for aRFS,
[as being in fastpath flow] for hardware access instead of
trying to acquire it at run time from the ptt pool.

Signed-off-by: Manish Chopra <manish.chopra@cavium.com>
Signed-off-by: Yuval Mintz <yuval.mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosmsc95xx: Add comments to the registers definition
Martin Wetterwald [Thu, 13 Apr 2017 08:08:44 +0000 (10:08 +0200)]
smsc95xx: Add comments to the registers definition

This chip is used by a lot of embedded devices and also by the Raspberry
Pi 1, 2 & 3 which were created to promote the study of computer
sciences. Students wanting to learn kernel / network device driver
programming through those devices can only rely on the Linux kernel
driver source to make their own.

This commit adds a lot of comments to the registers definition to expand
the register names.

Cc: Steve Glendinning <steve.glendinning@shawell.net>
Cc: Microchip Linux Driver Support <UNGLinuxDriver@microchip.com>
CC: David Miller <davem@davemloft.net>
Signed-off-by: Martin Wetterwald <martin@wetterwald.eu>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Steve Glendinning <steve.glendinning@shawell.net>
Acked-by: Woojung Huh <Woojung.Huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agol2tp: device MTU setup, tunnel socket needs a lock
R. Parameswaran [Thu, 13 Apr 2017 01:31:04 +0000 (18:31 -0700)]
l2tp: device MTU setup, tunnel socket needs a lock

The MTU overhead calculation in L2TP device set-up
merged via commit b784e7ebfce8cfb16c6f95e14e8532d0768ab7ff
needs to be adjusted to lock the tunnel socket while
referencing the sub-data structures to derive the
socket's IP overhead.

Reported-by: Guillaume Nault <g.nault@alphalink.fr>
Tested-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: R. Parameswaran <rparames@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: ipv6: send unsolicited NA on admin up
David Ahern [Wed, 12 Apr 2017 18:49:04 +0000 (11:49 -0700)]
net: ipv6: send unsolicited NA on admin up

ndisc_notify is the ipv6 equivalent to arp_notify. When arp_notify is
set to 1, gratuitous arp requests are sent when the device is brought up.
The same is expected when ndisc_notify is set to 1 (per ndisc_notify in
Documentation/networking/ip-sysctl.txt). The NA is not sent on NETDEV_UP
event; add it.

Fixes: 5cb04436eef6 ("ipv6: add knob to send unsolicited ND on link-layer address change")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'mlx5-RDMA-netdevice'
David S. Miller [Mon, 17 Apr 2017 15:08:33 +0000 (11:08 -0400)]
Merge branch 'mlx5-RDMA-netdevice'

Saeed Mahameed says:

====================
Mellanox, mlx5 RDMA net device support

This series provides the lower level mlx5 support of RDMA netdevice
creation API [1] suggested and introduced by Intel's HFI OPA VNIC
netdevice driver [2], to enable IPoIB mlx5 RDMA netdevice creation.

mlx5 IPoIB RDMA netdev will serve as an acceleration netdevice for the current
IPoIB ULP generic netdevice, providing:
- mlx5 RSS support.
- mlx5 HW RX,TX offloads (checksum, TSO, LRO, etc ..).
- Full mlx5 HW features transparent to the ULP itself.

The idea here is to reuse and benefit from the already implemented mlx5e netdevice
management and channels API for both etherent and RDMA netdevices, since both IPoIB
and Ethernet netdevices share same common mlx5 HW resources (with some small
exceptions) and share most of the control/data path logic, it is more natural to
have them share the same code.

The differences between IPoIB and Ethernet netdevices can be summarized to:

Steering:
In mlx5, IPoIB traffic is sent and received from an underlay special QP, and in Ethernet
the traffic is handled by vports and vport steering is managed by e-switch or FW.

For IPoIB traffic to get steered correctly the only thing we need to do is to create RSS
HW contexts for RX and TX HW contexts for TX (similar to mlx5e) with the underlay QP attached to
them (underlay QP will be 0 in case of Ethernet).

RX,TX:
Since IPoIB traffic is different, slightly modified RX and TX handlers are required,
still we do some code reuse in data path via common helper functions.

All of the other generic netdevice and mlx5 aspects will be shared between mlx5 Ethernet
and IPoIB netdevices, e.g.
- Channels creation and handling (RQs,SQs,CQs, NAPI, interrupt moderation, etc..)
- Offloads, checksum, GRO, LRO, TSO, and more.
        - netdevice logic and non Ethernet specific ndos (open/close, etc..)

In order to achieve what we want:

In patchet 1 to 3, Erez added the supported for underlay QP in mlx5_ifc and refactored
the mlx5 steering code to accept the underlay QP as a parameter for creating steering
objects and enabled flow steering for IB link.

Then we are going to use the mlx5e netdevice profile, which is already used to separate between
NIC and VF representors netdevices, to create new type of IPoIB netdevice profile.

For that, one small refactoring is required to make mlx5e netdevice profile management
more genetic and agnostic to link type which is done in patch #4.

In patch #5, we introduce ipoib.c to host all of mlx5 IPoIB (mlx5i) specific logic and a
skeleton for the IPoIB mlx5 netdevice profile, and we will start filling it in next patches,
using mlx5e already existing APIs.

Patch #6 and #7, Implement init/cleanup RX mlx5i netdev profile handlers to create mlx5 RSS
resources, same as mlx5e but without vlan and L2 steering tables.

Patch #8, Implement init/cleanup TX mlx5i netdev profile handlers, to create TX resources
same as mlx5e but with one TC (tc = 0) support.

Patch #9, Implement mlx5i open/close ndos, where we reuese the mlx5e channels API, to start/stop TX/RX channels.

Patch #10, Create the underlay QP and attach it to mlx5i RSS and TX HW contexts.

Patch #11 and #12, Break down the mlx5e xmit flow into smaller helper function and implement the
mlx5i IPoIB xmit routine.

Patch #13 and #14, Have an RX handler per netdevice profile. We already do this before this series
in a non clean way to separate between NIC netdev and VF representor RX handlers, in patch 13 we make
the RX handler generic and bound to a profile and in patch 14 we implement the IPoIB RX handlers.

Patch #15, Small cleanup to avoid e-switch with IPoIB netdev.

In order to enable mlx5 IPoIB, a merge between the IPoIB RDMA netdev offolad support [3]
- which was alread submitted to the rdma mailing list - and this series is required
plus an extra small patch [4] which will connect between both sides and actually enables the offload.

Once both patch-sets are merged into linux we will have to submit the extra small patch [4], to enable
the feature.

Thanks,
Saeed.

[1] https://patchwork.kernel.org/patch/9676637/

[2] https://lwn.net/Articles/715453/
    https://patchwork.kernel.org/patch/9587815/

[3] https://patchwork.kernel.org/patch/9672069/
[4] https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git/commit/?id=0141db6a686e32294dee015b7d07706162ba48d8
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agohw/mlx5: Add New bit to check over QP creation
Erez Shitrit [Thu, 13 Apr 2017 03:37:06 +0000 (06:37 +0300)]
hw/mlx5: Add New bit to check over QP creation

Add check for bit IB_QP_CREATE_NETIF_QP while creating QP.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/mlx5e: E-switch vport manager is valid for ethernet only
Saeed Mahameed [Thu, 13 Apr 2017 03:37:05 +0000 (06:37 +0300)]
net/mlx5e: E-switch vport manager is valid for ethernet only

Currently the driver support only ethernet eswitch, and we want to
protect downstream IPoIB netdev from trying to access it in IB link.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/mlx5e: IPoIB, RX handler
Saeed Mahameed [Thu, 13 Apr 2017 03:37:04 +0000 (06:37 +0300)]
net/mlx5e: IPoIB, RX handler

Implement IPoIB RX SKB handler.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/mlx5e: RX handlers per netdev profile
Saeed Mahameed [Thu, 13 Apr 2017 03:37:03 +0000 (06:37 +0300)]
net/mlx5e: RX handlers per netdev profile

In order to have different RX handler per profile, fix and refactor the
current code to take the rx handler directly from the netdevice profile
rather than computing it on runtime as it was done with the switchdev
mode representor rx handler.

This will also remove the current wrong assumption in mlx5e_alloc_rq
code that mlx5e_priv->ppriv is of the type vport_rep.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/mlx5e: IPoIB, Xmit flow
Saeed Mahameed [Thu, 13 Apr 2017 03:37:02 +0000 (06:37 +0300)]
net/mlx5e: IPoIB, Xmit flow

Implement mlx5e's IPoIB SKB transmit using the helper functions provided
by mlx5e ethernet tx flow, the only difference in the code between
mlx5e_xmit and mlx5i_xmit is that IPoIB has some extra fields to fill
(UD datagram segment) in the TX descriptor (WQE) and it doesn't need to
have any vlan handling.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/mlx5e: Xmit flow break down
Saeed Mahameed [Thu, 13 Apr 2017 03:37:01 +0000 (06:37 +0300)]
net/mlx5e: Xmit flow break down

Break current mlx5e xmit flow into smaller blocks (helper functions)
in order to reuse them for IPoIB SKB transmission.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/mlx5e: IPoIB, Underlay QP
Saeed Mahameed [Thu, 13 Apr 2017 03:37:00 +0000 (06:37 +0300)]
net/mlx5e: IPoIB, Underlay QP

Create IPoIB underlay QP needed by the IPoIB netdevice profile for RSS
and TX HW context to perform on IPoIB traffic.

Reset the underlay QP on dev_uninit ndo to stop IPoIB traffic going
through this QP when the ULP IPoIB decides to cleanup.

Implement attach/detach mcast RDMA netdev callbacks for later RDMA
netdev use.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/mlx5e: IPoIB, Basic netdev ndos open/close
Saeed Mahameed [Thu, 13 Apr 2017 03:36:59 +0000 (06:36 +0300)]
net/mlx5e: IPoIB, Basic netdev ndos open/close

Implement open/close of IPoIB netdevice ndos using mlx5e's
channels API to manage data path resources (RQs/SQs/CQs).

Set IPoIB netdev address on dev_init ndo.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/mlx5e: IPoIB, TX TIS creation
Saeed Mahameed [Thu, 13 Apr 2017 03:36:58 +0000 (06:36 +0300)]
net/mlx5e: IPoIB, TX TIS creation

Modify mlx5e tis creation function to accept underlay qp number, which
will be needed by IPoIB.

Implement mlx5i (IPoIB) tx init/cleanup netdevice profile flows to
create one TIS with the IPoIB underlay qp, for IPoIB TX SQs.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/mlx5e: IPoIB, RSS flow steering tables
Saeed Mahameed [Thu, 13 Apr 2017 03:36:57 +0000 (06:36 +0300)]
net/mlx5e: IPoIB, RSS flow steering tables

Like the mlx5e ethernet mode, on IPoIB mode we need to create RX steering
tables, but IPoIB do not require MAC and VLAN steering tables so the
only tables we create in here are:
1. TTC Table (Traffic Type Classifier table for RSS steering)
2. ARFS Table (for accelerated RFS support)

Creation of those tables is identical to mlx5e ethernet mode, hence the
use of mlx5e_create_ttc_table and mlx5e_arfs_create_tables.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/mlx5e: IPoIB, RX steering RSS RQTs and TIRs
Saeed Mahameed [Thu, 13 Apr 2017 03:36:56 +0000 (06:36 +0300)]
net/mlx5e: IPoIB, RX steering RSS RQTs and TIRs

Implement IPoIB RX RSS (RQTs and TIRs) HW objects creation,
All we do here is simply reuse the mlx5e implementation to create
direct and indirect (RSS) steering HW objects.

For that we just expose
mlx5e_{create,destroy}_{direct,indirect}_{rqt,tir} functions into en.h
and call them from ipoib.c in init/cleanup_rx IPoIB netdevice profile
callbacks.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/mlx5e: IPoIB, Add netdevice profile skeleton
Saeed Mahameed [Thu, 13 Apr 2017 03:36:55 +0000 (06:36 +0300)]
net/mlx5e: IPoIB, Add netdevice profile skeleton

Create mlx5e IPoIB netdevice profile skeleton in the new ipoib.c
file with empty implementation.

Downstream patches will provide the full mlx5 rdma netdevice acceleration
support for IPoIB into this new file, by using the mlx5e netdevice
profile and new mlx5_channels APIs and infrastructures.
Same as already done in mlx5e NIC netdevice and switchdev mode VF
representors.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/mlx5e: More generic netdev management API
Saeed Mahameed [Thu, 13 Apr 2017 03:36:54 +0000 (06:36 +0300)]
net/mlx5e: More generic netdev management API

In preparation for mlx5e RDMA net_device support, here we generalize
mlx5e_attach/detach in a way that those functions will be agnostic
to link type.  For that we move ethernet specific NIC net device logic out
of those functions into {nic,rep}_{enable/disable} mlx5e NIC and
representor profiles callbacks.

Also some of the logic was moved only to NIC profile since it is not right
to have this logic for representor net device (e.g. set port MTU).

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/mlx5: Enable flow-steering for IB link
Erez Shitrit [Thu, 13 Apr 2017 03:36:53 +0000 (06:36 +0300)]
net/mlx5: Enable flow-steering for IB link

Get the relevant capabilities if supports ipoib_enhanced_offloads and
init the flow steering table accordingly.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/mlx5: Refactor create flow table method to accept underlay QP
Erez Shitrit [Thu, 13 Apr 2017 03:36:52 +0000 (06:36 +0300)]
net/mlx5: Refactor create flow table method to accept underlay QP

IB flow tables need the underlay qp to perform flow steering.
Here we change the API of the flow tables creation to accept the
underlay QP number as a parameter in order to support IB (IPoIB) flow
steering.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/mlx5: Add IPoIB enhanced offloads bits to mlx5_ifc
Erez Shitrit [Thu, 13 Apr 2017 03:36:51 +0000 (06:36 +0300)]
net/mlx5: Add IPoIB enhanced offloads bits to mlx5_ifc

New capability bit: ipoib_enhanced_offloads, indicates new ability for UD
QP to do RSS and enhanced IPoIB offloads and acceleration.

Add underlay_qpn to the TIS and flow_table objects In order to support
SET_ROOT command, to connect between IPoIB QPs and flow steering tables.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agohv_netvsc: Exclude non-TCP port numbers from vRSS hashing
Haiyang Zhang [Wed, 12 Apr 2017 18:45:18 +0000 (11:45 -0700)]
hv_netvsc: Exclude non-TCP port numbers from vRSS hashing

Azure hosts are not supporting non-TCP port numbers in vRSS hashing for
now. For example, UDP packet loss rate will be high if port numbers are
also included in vRSS hash.

So, we created this patch to use only IP numbers for hashing in non-TCP
traffic.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agohv_netvsc: Fix the queue index computation in forwarding case
Haiyang Zhang [Wed, 12 Apr 2017 18:35:05 +0000 (11:35 -0700)]
hv_netvsc: Fix the queue index computation in forwarding case

If the outgoing skb has a RX queue mapping available, we use the queue
number directly, other than put it through Send Indirection Table.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: dsa: isolate legacy code
Vivien Didelot [Wed, 12 Apr 2017 16:45:03 +0000 (12:45 -0400)]
net: dsa: isolate legacy code

This patch moves as is the legacy DSA code from dsa.c to legacy.c,
except the few shared symbols which remain in dsa.c.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
David S. Miller [Sun, 16 Apr 2017 01:16:30 +0000 (21:16 -0400)]
Merge git://git./linux/kernel/git/davem/net

Conflicts were simply overlapping changes.  In the net/ipv4/route.c
case the code had simply moved around a little bit and the same fix
was made in both 'net' and 'net-next'.

In the net/sched/sch_generic.c case a fix in 'net' happened at
the same time that a new argument was added to qdisc_hash_add().

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Linus Torvalds [Sat, 15 Apr 2017 00:51:16 +0000 (17:51 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input

Pull input fixes from Dmitry Torokhov:
 "Just a small update to xpad driver to recognize yet another gamepad,
  and another change making sure userio.h is exported"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: xpad - add support for Razer Wildcat gamepad
  uapi: add missing install of userio.h

7 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Sat, 15 Apr 2017 00:38:24 +0000 (17:38 -0700)]
Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:
 "Things seem to be settling down as far as networking is concerned,
  let's hope this trend continues...

   1) Add iov_iter_revert() and use it to fix the behavior of
      skb_copy_datagram_msg() et al., from Al Viro.

   2) Fix the protocol used in the synthetic SKB we cons up for the
      purposes of doing a simulated route lookup for RTM_GETROUTE
      requests. From Florian Larysch.

   3) Don't add noop_qdisc to the per-device qdisc hashes, from Cong
      Wang.

   4) Don't call netdev_change_features with the team lock held, from
      Xin Long.

   5) Revert TCP F-RTO extension to catch more spurious timeouts because
      it interacts very badly with some middle-boxes. From Yuchung
      Cheng.

   6) Fix the loss of error values in l2tp {s,g}etsockopt calls, from
      Guillaume Nault.

   7) ctnetlink uses bit positions where it should be using bit masks,
      fix from Liping Zhang.

   8) Missing RCU locking in netfilter helper code, from Gao Feng.

   9) Avoid double frees and use-after-frees in tcp_disconnect(), from
      Eric Dumazet.

  10) Don't do a changelink before we register the netdevice in
      bridging, from Ido Schimmel.

  11) Lock the ipv6 device address list properly, from Rabin Vincent"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (29 commits)
  netfilter: ipt_CLUSTERIP: Fix wrong conntrack netns refcnt usage
  netfilter: nft_hash: do not dump the auto generated seed
  drivers: net: usb: qmi_wwan: add QMI_QUIRK_SET_DTR for Telit PID 0x1201
  ipv6: Fix idev->addr_list corruption
  net: xdp: don't export dev_change_xdp_fd()
  bridge: netlink: register netdevice before executing changelink
  bridge: implement missing ndo_uninit()
  bpf: reference may_access_skb() from __bpf_prog_run()
  tcp: clear saved_syn in tcp_disconnect()
  netfilter: nf_ct_expect: use proper RCU list traversal/update APIs
  netfilter: ctnetlink: skip dumping expect when nfct_help(ct) is NULL
  netfilter: make it safer during the inet6_dev->addr_list traversal
  netfilter: ctnetlink: make it safer when checking the ct helper name
  netfilter: helper: Add the rcu lock when call __nf_conntrack_helper_find
  netfilter: ctnetlink: using bit to represent the ct event
  netfilter: xt_TCPMSS: add more sanity tests on tcph->doff
  net: tcp: Increase TCP_MIB_OUTRSTS even though fail to alloc skb
  l2tp: don't mask errors in pppol2tp_getsockopt()
  l2tp: don't mask errors in pppol2tp_setsockopt()
  tcp: restrict F-RTO to work-around broken middle-boxes
  ...

7 years agoMerge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 15 Apr 2017 00:00:01 +0000 (17:00 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:
 "A set of small fixes for x86:

   - fix locking in RDT to prevent memory leaks and freeing in use
     memory

   - prevent setting invalid values for vdso32_enabled which cause
     inconsistencies for user space resulting in application crashes.

   - plug a race in the vdso32 code between fork and sysctl which causes
     inconsistencies for user space resulting in application crashes.

   - make MPX signal delivery work in compat mode

   - make the dmesg output of traps and faults readable again"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/intel_rdt: Fix locking in rdtgroup_schemata_write()
  x86/debug: Fix the printk() debug output of signal_fault(), do_trap() and do_general_protection()
  x86/vdso: Plug race between mapping and ELF header setup
  x86/vdso: Ensure vdso32_enabled gets set to valid values only
  x86/signals: Fix lower/upper bound reporting in compat siginfo

7 years agoMerge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 14 Apr 2017 23:58:38 +0000 (16:58 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull perf fixes from Thomas Gleixner:
 "Two small fixes for perf:

   - the move to support cross arch annotation introduced per arch
     initialization requirements, fullfill them for s/390 (Christian
     Borntraeger)

   - add the missing initialization to the LBR entries to avoid exposing
     random or stale data"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86: Avoid exposing wrong/stale data in intel_pmu_lbr_read_32()
  perf annotate s390: Fix perf annotate error -95 (4.10 regression)

7 years agoMerge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 14 Apr 2017 23:57:14 +0000 (16:57 -0700)]
Merge branch 'irq-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull irq fixes from Thomas Gleixner:
 "The irq department provides:

   - two fixes for the CPU affinity spread infrastructure to prevent
     unbalanced spreading in corner cases which leads to horrible
     performance, because interrupts are rather aggregated than spread

   - add a missing spinlock initializer in the imx-gpcv2 init code"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/irq-imx-gpcv2: Fix spinlock initialization
  irq/affinity: Fix extra vecs calculation
  irq/affinity: Fix CPU spread for unbalanced nodes

7 years agoMerge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 14 Apr 2017 23:55:33 +0000 (16:55 -0700)]
Merge branch 'efi-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull EFI fixes from Thomas Gleixner:
 "Three fixes from EFI land:

   - prevent accessing a Graphic Output Device (GOP) which the kernel
     does not know to handle

   - prevent PCI reconfiguration to modify a BAR which covers the
     framebuffer because that's already in use through the EFI GOP
     interface

   - avoid reserving EFI runtime regions as this results in bogus memory
     mappings"

* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/efi: Don't try to reserve runtime regions
  efi/fb: Avoid reconfiguration of BAR that covers the framebuffer
  efi/libstub: Skip GOP with PIXEL_BLT_ONLY format

7 years agoMerge branch 'for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mason...
Linus Torvalds [Fri, 14 Apr 2017 23:53:45 +0000 (16:53 -0700)]
Merge branch 'for-linus-4.11' of git://git./linux/kernel/git/mason/linux-btrfs

Pull btrfs fixes from Chris Mason:
 "Dave Sterba collected a few more fixes for the last rc.

  These aren't marked for stable, but I'm putting them in with a batch
  were testing/sending by hand for this release"

* 'for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: fix potential use-after-free for cloned bio
  Btrfs: fix segmentation fault when doing dio read
  Btrfs: fix invalid dereference in btrfs_retry_endio
  btrfs: drop the nossd flag when remounting with -o ssd

7 years agoMerge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Fri, 14 Apr 2017 23:51:29 +0000 (16:51 -0700)]
Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6

Pull more CIFS fixes from Steve French:
 "As promised, here is the remaining set of cifs/smb3 fixes for stable
  (and a fix for one regression) now that they have had additional
  review and testing"

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
  CIFS: Fix SMB3 mount without specifying a security mechanism
  CIFS: store results of cifs_reopen_file to avoid infinite wait
  CIFS: remove bad_network_name flag
  CIFS: reconnect thread reschedule itself
  CIFS: handle guest access errors to Windows shares
  CIFS: Fix null pointer deref during read resp processing

7 years agoMerge tag 'fbdev-v4.11-rc6' of git://github.com/bzolnier/linux
Linus Torvalds [Fri, 14 Apr 2017 16:18:17 +0000 (09:18 -0700)]
Merge tag 'fbdev-v4.11-rc6' of git://github.com/bzolnier/linux

Pull fbdev fixes from Bartlomiej Zolnierkiewicz:

 - fix probing time checks in omapfb driver (regression fix)

 - fix optional VBAT support in ssd1307fb driver (regression fix)

 - fix connecting to backend in xen-fbfront driver

* tag 'fbdev-v4.11-rc6' of git://github.com/bzolnier/linux:
  fbdev: omapfb: delete check_required_callbacks()
  xen, fbfront: fix connecting to backend
  fbdev/ssd1307fb: fix optional VBAT support

7 years agoMerge tag 'pm-4.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Linus Torvalds [Fri, 14 Apr 2017 16:16:23 +0000 (09:16 -0700)]
Merge tag 'pm-4.11-rc7' of git://git./linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
 "These fix a cpufreq core regression related to CPU online/offline and
  several issues in the turbostat and cpupower utilities.

  Specifics:

   - Allow CPUs to be put back online even if the cpufreq driver is
     unable to work with them (eg. due to missing information from
     platform firmware), which was the previous behavior expected by
     users, but changed in the 4.9 time frame (Chen Yu).

   - Fix a few minor issues in the turbostat utility, introduced mostly
     during the recent update of it (Len Brown, Doug Smythies).

   - Fix a cpupower utility bug causing it to report incorrect values
     for turbo frequencies in some cases (Ben Hutchings)"

* tag 'pm-4.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpupower: Fix turbo frequency reporting for pre-Sandy Bridge cores
  cpufreq: Bring CPUs up even if cpufreq_online() failed
  tools/power turbostat: update version number
  tools/power turbostat: fix impossibly large CPU%c1 value
  tools/power turbostat: turbostat.8 add missing column definitions
  tools/power turbostat: update HWP dump to decimal from hex
  tools/power turbostat: enable package THERM_INTERRUPT dump
  tools/power turbostat: show missing Core and GFX power on SKL and KBL
  tools/power turbostat: bugfix: GFXMHz column not changing

7 years agoMerge tag 'acpi-4.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael...
Linus Torvalds [Fri, 14 Apr 2017 16:05:42 +0000 (09:05 -0700)]
Merge tag 'acpi-4.11-rc7' of git://git./linux/kernel/git/rafael/linux-pm

Pull ACPI fixes from Rafael Wysocki:

 "These revert a recent ACPICA commit that turned out to be problematic
  and fix a device enumeration breakage from the 4.8 cycle.

  Specifics:

   - Revert a recent ACPICA commit targeted at catching firmware bugs
     which promptly did that and caused functional problems to appear
     (Rafael Wysocki).

   - Fix a device enumeration problem introduced in the 4.8 time frame
     which caused the ACPI docking station driver to report incorrect
     status via sysfs among other things (Rafael Wysocki)"

* tag 'acpi-4.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  Revert "ACPICA: Resources: Not a valid resource if buffer length too long"
  ACPI / scan: Set the visited flag for all enumerated devices

7 years agoMerge tag 'devmem-v4.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/kees...
Linus Torvalds [Fri, 14 Apr 2017 15:57:20 +0000 (08:57 -0700)]
Merge tag 'devmem-v4.11-rc7' of git://git./linux/kernel/git/kees/linux

Pull CONFIG_STRICT_DEVMEM fix from Kees Cook:
 "Fixes /dev/mem to read back zeros for System RAM areas in the 1MB
  exception area on x86 to avoid exposing RAM or tripping hardened
  usercopy"

* tag 'devmem-v4.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  mm: Tighten x86 /dev/mem with zeroing reads

7 years agoMerge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Linus Torvalds [Fri, 14 Apr 2017 15:49:39 +0000 (08:49 -0700)]
Merge tag 'for_linus' of git://git./linux/kernel/git/mst/vhost

Pull virtio fixes from Michael S. Tsirkin:
 "virtio oops fixes

  The virtio pci rework using shared interrupts caused a lot of issues.
  We tried to fix them but run out of time. Revert for now, and revisit
  the issue for the next kernel.

  Luckily we are able to do this without loosing automatic interrupt
  NUMA affinity which was the main motivator for the rework"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  virtio-pci: Remove affinity hint before freeing the interrupt
  Revert "virtio_pci: remove struct virtio_pci_vq_info"
  Revert "virtio_pci: use shared interrupts for virtqueues"
  Revert "virtio_pci: don't duplicate the msix_enable flag in struct pci_dev"
  Revert "virtio_pci: simplify MSI-X setup"
  Revert "virtio_pci: fix out of bound access for msix_names"
  MAINTAINERS: fix virtio file pattern
  virtio_console: fix uninitialized variable use
  virtio_net: clear MTU when out of range
  virtio: allow drivers to validate features
  virtio_net: enable big packets for large MTU values

7 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
David S. Miller [Fri, 14 Apr 2017 14:47:13 +0000 (10:47 -0400)]
Merge git://git./pub/scm/linux/kernel/git/pablo/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for your net tree,
they are:

1) Missing TCP header sanity check in TCPMSS target, from Eric Dumazet.

2) Incorrect event message type for related conntracks created via
   ctnetlink, from Liping Zhang.

3) Fix incorrect rcu locking when handling helpers from ctnetlink,
   from Gao feng.

4) Fix missing rcu locking when updating helper, from Liping Zhang.

5) Fix missing read_lock_bh when iterating over list of device addresses
   from TPROXY and redirect, also from Liping.

6) Fix crash when trying to dump expectations from conntrack with no
   helper via ctnetlink, from Liping.

7) Missing RCU protection to expecation list update given ctnetlink
   iterates over the list under rcu read lock side, from Liping too.

8) Don't dump autogenerated seed in nft_hash to userspace, this is
   very confusing to the user, again from Liping.

9) Fix wrong conntrack netns module refcount in ipt_CLUSTERIP,
   from Gao feng.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agofbdev: omapfb: delete check_required_callbacks()
Aaro Koskinen [Fri, 14 Apr 2017 11:38:32 +0000 (13:38 +0200)]
fbdev: omapfb: delete check_required_callbacks()

Commit 561eb9d09a93 ("fbdev: omap/lcd: Make callbacks optional") made
panel callbacks optional but forgot to update check_required_callbacks().
As a result many (all?) OMAP systems using omapfb will crash at boot.
Fix by deleting the whole function.

Fixes: 561eb9d09a93 ("fbdev: omap/lcd: Make callbacks optional")
Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
7 years agoMerge branches 'acpi-scan-fixes' and 'acpica-fixes'
Rafael J. Wysocki [Fri, 14 Apr 2017 11:11:43 +0000 (13:11 +0200)]
Merge branches 'acpi-scan-fixes' and 'acpica-fixes'

* acpi-scan-fixes:
  ACPI / scan: Set the visited flag for all enumerated devices

* acpica-fixes:
  Revert "ACPICA: Resources: Not a valid resource if buffer length too long"

7 years agoMerge branches 'pm-cpufreq-fixes' and 'pm-tools-fixes'
Rafael J. Wysocki [Fri, 14 Apr 2017 11:11:09 +0000 (13:11 +0200)]
Merge branches 'pm-cpufreq-fixes' and 'pm-tools-fixes'

* pm-cpufreq-fixes:
  cpufreq: Bring CPUs up even if cpufreq_online() failed

* pm-tools-fixes:
  cpupower: Fix turbo frequency reporting for pre-Sandy Bridge cores
  tools/power turbostat: update version number
  tools/power turbostat: fix impossibly large CPU%c1 value
  tools/power turbostat: turbostat.8 add missing column definitions
  tools/power turbostat: update HWP dump to decimal from hex
  tools/power turbostat: enable package THERM_INTERRUPT dump
  tools/power turbostat: show missing Core and GFX power on SKL and KBL
  tools/power turbostat: bugfix: GFXMHz column not changing

7 years agoirqchip/irq-imx-gpcv2: Fix spinlock initialization
Tyler Baker [Thu, 13 Apr 2017 22:27:31 +0000 (15:27 -0700)]
irqchip/irq-imx-gpcv2: Fix spinlock initialization

The raw_spinlock in the IMX GPCV2 interupt chip is not initialized before
usage. That results in a lockdep splat:

  INFO: trying to register non-static key.
  the code is fine but needs lockdep annotation.
  turning off the locking correctness validator.

Add the missing raw_spin_lock_init() to the setup code.

Fixes: e324c4dc4a59 ("irqchip/imx-gpcv2: IMX GPCv2 driver for wakeup sources")
Signed-off-by: Tyler Baker <tyler.baker@linaro.org>
Reviewed-by: Fabio Estevam <fabio.estevam@nxp.com>
Cc: jason@lakedaemon.net
Cc: marc.zyngier@arm.com
Cc: shawnguo@kernel.org
Cc: andrew.smirnov@gmail.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20170413222731.5917-1-tyler.baker@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
7 years agoperf/x86: Avoid exposing wrong/stale data in intel_pmu_lbr_read_32()
Peter Zijlstra [Tue, 11 Apr 2017 08:10:28 +0000 (10:10 +0200)]
perf/x86: Avoid exposing wrong/stale data in intel_pmu_lbr_read_32()

When the perf_branch_entry::{in_tx,abort,cycles} fields were added,
intel_pmu_lbr_read_32() wasn't updated to initialize them.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Cc: <stable@vger.kernel.org>
Fixes: 135c5612c460 ("perf/x86/intel: Support Haswell/v4 LBR format")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
7 years agoMerge branch 'akpm' (patches from Andrew)
Linus Torvalds [Fri, 14 Apr 2017 03:08:33 +0000 (20:08 -0700)]
Merge branch 'akpm' (patches from Andrew)

Merge fixes from Andrew Morton:
 "11 fixes.

  The presence of 'thp: reduce indentation level in change_huge_pmd()'
  is unfortunate. But the patchset had been decently reviewed and tested
  before we decided it was needed in -stable and I felt it best not to
  churn things at the last minute"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mailmap: add Martin Kepplinger's email
  zsmalloc: expand class bit
  zram: do not use copy_page with non-page aligned address
  zram: fix operator precedence to get offset
  hugetlbfs: fix offset overflow in hugetlbfs mmap
  thp: fix MADV_DONTNEED vs clear soft dirty race
  thp: fix MADV_DONTNEED vs. MADV_FREE race
  mm: drop unused pmdp_huge_get_and_clear_notify()
  thp: fix MADV_DONTNEED vs. numa balancing race
  thp: reduce indentation level in change_huge_pmd()
  z3fold: fix page locking in z3fold_alloc()

7 years agomailmap: add Martin Kepplinger's email
Martin Kepplinger [Thu, 13 Apr 2017 21:56:43 +0000 (14:56 -0700)]
mailmap: add Martin Kepplinger's email

Set the partly deprecated companies' email addresses as alias for the
personal one.

Link: http://lkml.kernel.org/r/1491984622-17321-1-git-send-email-martin.kepplinger@ginzinger.com
Signed-off-by: Martin Kepplinger <martin.kepplinger@ginzinger.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agozsmalloc: expand class bit
Minchan Kim [Thu, 13 Apr 2017 21:56:40 +0000 (14:56 -0700)]
zsmalloc: expand class bit

Now 64K page system, zsamlloc has 257 classes so 8 class bit is not
enough.  With that, it corrupts the system when zsmalloc stores
65536byte data(ie, index number 256) so that this patch increases class
bit for simple fix for stable backport.  We should clean up this mess
soon.

  index size
  0 32
  1 288
  ..
  ..
  204 52256
  256 65536

Fixes: 3783689a1 ("zsmalloc: introduce zspage structure")
Link: http://lkml.kernel.org/r/1492042622-12074-3-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agozram: do not use copy_page with non-page aligned address
Minchan Kim [Thu, 13 Apr 2017 21:56:37 +0000 (14:56 -0700)]
zram: do not use copy_page with non-page aligned address

The copy_page is optimized memcpy for page-alinged address.  If it is
used with non-page aligned address, it can corrupt memory which means
system corruption.  With zram, it can happen with

1. 64K architecture
2. partial IO
3. slub debug

Partial IO need to allocate a page and zram allocates it via kmalloc.
With slub debug, kmalloc(PAGE_SIZE) doesn't return page-size aligned
address.  And finally, copy_page(mem, cmem) corrupts memory.

So, this patch changes it to memcpy.

Actuaully, we don't need to change zram_bvec_write part because zsmalloc
returns page-aligned address in case of PAGE_SIZE class but it's not
good to rely on the internal of zsmalloc.

Note:
 When this patch is merged to stable, clear_page should be fixed, too.
 Unfortunately, recent zram removes it by "same page merge" feature so
 it's hard to backport this patch to -stable tree.

I will handle it when I receive the mail from stable tree maintainer to
merge this patch to backport.

Fixes: 42e99bd ("zram: optimize memory operations with clear_page()/copy_page()")
Link: http://lkml.kernel.org/r/1492042622-12074-2-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agozram: fix operator precedence to get offset
Minchan Kim [Thu, 13 Apr 2017 21:56:35 +0000 (14:56 -0700)]
zram: fix operator precedence to get offset

In zram_rw_page, the logic to get offset is wrong by operator precedence
(i.e., "<<" is higher than "&").  With wrong offset, zram can corrupt
the user's data.  This patch fixes it.

Fixes: 8c7f01025 ("zram: implement rw_page operation of zram")
Link: http://lkml.kernel.org/r/1492042622-12074-1-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agohugetlbfs: fix offset overflow in hugetlbfs mmap
Mike Kravetz [Thu, 13 Apr 2017 21:56:32 +0000 (14:56 -0700)]
hugetlbfs: fix offset overflow in hugetlbfs mmap

If mmap() maps a file, it can be passed an offset into the file at which
the mapping is to start.  Offset could be a negative value when
represented as a loff_t.  The offset plus length will be used to update
the file size (i_size) which is also a loff_t.

Validate the value of offset and offset + length to make sure they do
not overflow and appear as negative.

Found by syzcaller with commit ff8c0c53c475 ("mm/hugetlb.c: don't call
region_abort if region_chg fails") applied.  Prior to this commit, the
overflow would still occur but we would luckily return ENOMEM.

To reproduce:

   mmap(0, 0x2000, 0, 0x40021, 0xffffffffffffffffULL, 0x8000000000000000ULL);

Resulted in,

  kernel BUG at mm/hugetlb.c:742!
  Call Trace:
   hugetlbfs_evict_inode+0x80/0xa0
   evict+0x24a/0x620
   iput+0x48f/0x8c0
   dentry_unlink_inode+0x31f/0x4d0
   __dentry_kill+0x292/0x5e0
   dput+0x730/0x830
   __fput+0x438/0x720
   ____fput+0x1a/0x20
   task_work_run+0xfe/0x180
   exit_to_usermode_loop+0x133/0x150
   syscall_return_slowpath+0x184/0x1c0
   entry_SYSCALL_64_fastpath+0xab/0xad

Fixes: ff8c0c53c475 ("mm/hugetlb.c: don't call region_abort if region_chg fails")
Link: http://lkml.kernel.org/r/1491951118-30678-1-git-send-email-mike.kravetz@oracle.com
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agothp: fix MADV_DONTNEED vs clear soft dirty race
Kirill A. Shutemov [Thu, 13 Apr 2017 21:56:28 +0000 (14:56 -0700)]
thp: fix MADV_DONTNEED vs clear soft dirty race

Yet another instance of the same race.

Fix is identical to change_huge_pmd().

See "thp: fix MADV_DONTNEED vs.  numa balancing race" for more details.

Link: http://lkml.kernel.org/r/20170302151034.27829-5-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agothp: fix MADV_DONTNEED vs. MADV_FREE race
Kirill A. Shutemov [Thu, 13 Apr 2017 21:56:26 +0000 (14:56 -0700)]
thp: fix MADV_DONTNEED vs. MADV_FREE race

Both MADV_DONTNEED and MADV_FREE handled with down_read(mmap_sem).

It's critical to not clear pmd intermittently while handling MADV_FREE
to avoid race with MADV_DONTNEED:

CPU0: CPU1:
madvise_free_huge_pmd()
 pmdp_huge_get_and_clear_full()
madvise_dontneed()
 zap_pmd_range()
  pmd_trans_huge(*pmd) == 0 (without ptl)
  // skip the pmd
 set_pmd_at();
 // pmd is re-established

It results in MADV_DONTNEED skipping the pmd, leaving it not cleared.
It violates MADV_DONTNEED interface and can result is userspace
misbehaviour.

Basically it's the same race as with numa balancing in
change_huge_pmd(), but a bit simpler to mitigate: we don't need to
preserve dirty/young flags here due to MADV_FREE functionality.

[kirill.shutemov@linux.intel.com: Urgh... Power is special again]
Link: http://lkml.kernel.org/r/20170303102636.bhd2zhtpds4mt62a@black.fi.intel.com
Link: http://lkml.kernel.org/r/20170302151034.27829-4-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agomm: drop unused pmdp_huge_get_and_clear_notify()
Kirill A. Shutemov [Thu, 13 Apr 2017 21:56:23 +0000 (14:56 -0700)]
mm: drop unused pmdp_huge_get_and_clear_notify()

Dave noticed that after fixing MADV_DONTNEED vs numa balancing race the
last pmdp_huge_get_and_clear_notify() user is gone.

Let's drop the helper.

Link: http://lkml.kernel.org/r/20170306112047.24809-1-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agothp: fix MADV_DONTNEED vs. numa balancing race
Kirill A. Shutemov [Thu, 13 Apr 2017 21:56:20 +0000 (14:56 -0700)]
thp: fix MADV_DONTNEED vs. numa balancing race

In case prot_numa, we are under down_read(mmap_sem).  It's critical to
not clear pmd intermittently to avoid race with MADV_DONTNEED which is
also under down_read(mmap_sem):

CPU0: CPU1:
change_huge_pmd(prot_numa=1)
 pmdp_huge_get_and_clear_notify()
madvise_dontneed()
 zap_pmd_range()
  pmd_trans_huge(*pmd) == 0 (without ptl)
  // skip the pmd
 set_pmd_at();
 // pmd is re-established

The race makes MADV_DONTNEED miss the huge pmd and don't clear it
which may break userspace.

Found by code analysis, never saw triggered.

Link: http://lkml.kernel.org/r/20170302151034.27829-3-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agothp: reduce indentation level in change_huge_pmd()
Kirill A. Shutemov [Thu, 13 Apr 2017 21:56:17 +0000 (14:56 -0700)]
thp: reduce indentation level in change_huge_pmd()

Patch series "thp: fix few MADV_DONTNEED races"

For MADV_DONTNEED to work properly with huge pages, it's critical to not
clear pmd intermittently unless you hold down_write(mmap_sem).

Otherwise MADV_DONTNEED can miss the THP which can lead to userspace
breakage.

See example of such race in commit message of patch 2/4.

All these races are found by code inspection.  I haven't seen them
triggered.  I don't think it's worth to apply them to stable@.

This patch (of 4):

Restructure code in preparation for a fix.

Link: http://lkml.kernel.org/r/20170302151034.27829-2-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoz3fold: fix page locking in z3fold_alloc()
Vitaly Wool [Thu, 13 Apr 2017 21:56:14 +0000 (14:56 -0700)]
z3fold: fix page locking in z3fold_alloc()

Stress testing of the current z3fold implementation on a 8-core system
revealed it was possible that a z3fold page deleted from its unbuddied
list in z3fold_alloc() would be put on another unbuddied list by
z3fold_free() while z3fold_alloc() is still processing it.  This has
been introduced with commit 5a27aa822 ("z3fold: add kref refcounting")
due to the removal of special handling of a z3fold page not on any list
in z3fold_free().

To fix this, the z3fold page lock should be taken in z3fold_alloc()
before the pool lock is released.  To avoid deadlocking, we just try to
lock the page as soon as we get a hold of it, and if trylock fails, we
drop this page and take the next one.

Signed-off-by: Vitaly Wool <vitalywool@gmail.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: <Oleksiy.Avramchenko@sony.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoia64: restore symbol versions for symbols defined in assembly
Jan Beulich [Thu, 13 Apr 2017 18:06:00 +0000 (11:06 -0700)]
ia64: restore symbol versions for symbols defined in assembly

The ia64 build generates many warnings like this:

   WARNING: EXPORT symbol "empty_zero_page" [vmlinux] version generation failed, symbol will not be versioned.

Besides adding the necessary header this also requires fiddling with
some explicit .S -> .o rules.

Cc: IA64-ML <linux-ia64@vger.kernel.org>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoirq/affinity: Fix extra vecs calculation
Keith Busch [Thu, 13 Apr 2017 17:28:12 +0000 (13:28 -0400)]
irq/affinity: Fix extra vecs calculation

This fixes a math error calculating the extra_vecs. The error assumed
only 1 cpu per vector, but the value needs to account for the actual
number of cpus per vector in order to get the correct remainder for
extra CPU assignment.

Fixes: 7bf8222b9bd0 ("irq/affinity: Fix CPU spread for unbalanced nodes")
Reported-by: Xiaolong Ye <xiaolong.ye@intel.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Link: http://lkml.kernel.org/r/1492104492-19943-1-git-send-email-keith.busch@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
7 years agonetfilter: ipt_CLUSTERIP: Fix wrong conntrack netns refcnt usage
Gao Feng [Thu, 6 Apr 2017 01:45:22 +0000 (09:45 +0800)]
netfilter: ipt_CLUSTERIP: Fix wrong conntrack netns refcnt usage

Current codes invoke wrongly nf_ct_netns_get in the destroy routine,
it should use nf_ct_netns_put, not nf_ct_netns_get.
It could cause some modules could not be unloaded.

Fixes: ecb2421b5ddf ("netfilter: add and use nf_ct_netns_get/put")
Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
7 years agonetfilter: nft_hash: do not dump the auto generated seed
Liping Zhang [Mon, 3 Apr 2017 08:34:38 +0000 (16:34 +0800)]
netfilter: nft_hash: do not dump the auto generated seed

This can prevent the nft utility from printing out the auto generated
seed to the user, which is unnecessary and confusing.

Fixes: cb1b69b0b15b ("netfilter: nf_tables: add hash expression")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
7 years agoMerge branch 'netlink_ext_ACK'
David S. Miller [Thu, 13 Apr 2017 17:58:23 +0000 (13:58 -0400)]
Merge branch 'netlink_ext_ACK'

Johannes Berg says:

====================
netlink extended ACK reporting

Changes since v4:
 * use __NLMSGERR_ATTR_MAX instead of NUM_NLMSGERR_ATTRS

Changes since v3:
 * Add NLM_F_CAPPED and NLM_F_ACK_TLVS flags, to allow entirely
   stateless parsing of the ACK messages by looking at the new
   flags. Need to check NLM_F_ACK_TLVS first, since capping can
   be done in kernels before this patchset without setting the
   flag.
 * Remove "missing_attr" functionality - this can obviously be
   added back rather easily, but I'd rather have more discussion
   about the nesting problem there.
 * Improve documentation of NLMSGERR_ATTR_OFFS
 * Improve message structure documentation, documenting that the
   request message is always capped for success cases
 * fix nlmsg_len of the outer message by calling nlmsg_end()
 * fix memcpy() of the request in success cases, going back to
   the original code that I'd changed before due to the payload
   adjustments that I reverted when introducing tlvlen

Changes since v2:
 * add NUM_NLMSGERR_ATTRS, NLMSGERR_ATTR_MAX
 * fix cookie length to 20 (sha-1 length)
 * move struct members for cookie to patch 3 where they should be
 * another cleanup suggested by David Ahern

Changes since v1:
 * credit Pablo and Jamal
 * incorporate suggestion from David Ahern
 * fix compilation in decnet
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonetlink: pass extended ACK struct where available
Johannes Berg [Wed, 12 Apr 2017 12:34:08 +0000 (14:34 +0200)]
netlink: pass extended ACK struct where available

This is an add-on to the previous patch that passes the extended ACK
structure where it's already available by existing genl_info or extack
function arguments.

This was done with this spatch (with some manual adjustment of
indentation):

@@
expression A, B, C, D, E;
identifier fn, info;
@@
fn(..., struct genl_info *info, ...) {
...
-nlmsg_parse(A, B, C, D, E, NULL)
+nlmsg_parse(A, B, C, D, E, info->extack)
...
}

@@
expression A, B, C, D, E;
identifier fn, info;
@@
fn(..., struct genl_info *info, ...) {
<...
-nla_parse_nested(A, B, C, D, NULL)
+nla_parse_nested(A, B, C, D, info->extack)
...>
}

@@
expression A, B, C, D, E;
identifier fn, extack;
@@
fn(..., struct netlink_ext_ack *extack, ...) {
<...
-nlmsg_parse(A, B, C, D, E, NULL)
+nlmsg_parse(A, B, C, D, E, extack)
...>
}

@@
expression A, B, C, D, E;
identifier fn, extack;
@@
fn(..., struct netlink_ext_ack *extack, ...) {
<...
-nla_parse(A, B, C, D, E, NULL)
+nla_parse(A, B, C, D, E, extack)
...>
}

@@
expression A, B, C, D, E;
identifier fn, extack;
@@
fn(..., struct netlink_ext_ack *extack, ...) {
...
-nlmsg_parse(A, B, C, D, E, NULL)
+nlmsg_parse(A, B, C, D, E, extack)
...
}

@@
expression A, B, C, D;
identifier fn, extack;
@@
fn(..., struct netlink_ext_ack *extack, ...) {
<...
-nla_parse_nested(A, B, C, D, NULL)
+nla_parse_nested(A, B, C, D, extack)
...>
}

@@
expression A, B, C, D;
identifier fn, extack;
@@
fn(..., struct netlink_ext_ack *extack, ...) {
<...
-nlmsg_validate(A, B, C, D, NULL)
+nlmsg_validate(A, B, C, D, extack)
...>
}

@@
expression A, B, C, D;
identifier fn, extack;
@@
fn(..., struct netlink_ext_ack *extack, ...) {
<...
-nla_validate(A, B, C, D, NULL)
+nla_validate(A, B, C, D, extack)
...>
}

@@
expression A, B, C;
identifier fn, extack;
@@
fn(..., struct netlink_ext_ack *extack, ...) {
<...
-nla_validate_nested(A, B, C, NULL)
+nla_validate_nested(A, B, C, extack)
...>
}

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonetlink: pass extended ACK struct to parsing functions
Johannes Berg [Wed, 12 Apr 2017 12:34:07 +0000 (14:34 +0200)]
netlink: pass extended ACK struct to parsing functions

Pass the new extended ACK reporting struct to all of the generic
netlink parsing functions. For now, pass NULL in almost all callers
(except for some in the core.)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonetlink: allow sending extended ACK with cookie on success
Johannes Berg [Wed, 12 Apr 2017 12:34:06 +0000 (14:34 +0200)]
netlink: allow sending extended ACK with cookie on success

Now that we have extended error reporting and a new message format for
netlink ACK messages, also extend this to be able to return arbitrary
cookie data on success.

This will allow, for example, nl80211 to not send an extra message for
cookies identifying newly created objects, but return those directly
in the ACK message.

The cookie data size is currently limited to 20 bytes (since Jamal
talked about using SHA1 for identifiers.)

Thanks to Jamal Hadi Salim for bringing up this idea during the
discussions.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agogenetlink: pass extended ACK report down
Johannes Berg [Wed, 12 Apr 2017 12:34:05 +0000 (14:34 +0200)]
genetlink: pass extended ACK report down

Pass the extended ACK reporting struct down from generic netlink to
the families, using the existing struct genl_info for simplicity.

Also add support to set the extended ACK information from generic
netlink users.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonetlink: extended ACK reporting
Johannes Berg [Wed, 12 Apr 2017 12:34:04 +0000 (14:34 +0200)]
netlink: extended ACK reporting

Add the base infrastructure and UAPI for netlink extended ACK
reporting. All "manual" calls to netlink_ack() pass NULL for now and
thus don't get extended ACK reporting.

Big thanks goes to Pablo Neira Ayuso for not only bringing up the
whole topic at netconf (again) but also coming up with the nlattr
passing trick and various other ideas.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobonding: handle link transition from FAIL to UP correctly
Mahesh Bandewar [Wed, 12 Apr 2017 05:36:00 +0000 (22:36 -0700)]
bonding: handle link transition from FAIL to UP correctly

When link transitions from LINK_FAIL to LINK_UP, the commit phase is
not called. This leads to an erroneous state causing slave-link state to
get stuck in "going down" state while its speed and duplex are perfectly
fine. This issue is a side-effect of splitting link-set into propose and
commit phases introduced by de77ecd4ef02 ("bonding: improve link-status
update in mii-monitoring")

This patch fixes these issues by calling commit phase whenever link
state change is proposed.

Fixes: de77ecd4ef02 ("bonding: improve link-status update in mii-monitoring")
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: dwc-xlgmac: add the initial ethtool support
Jie Deng [Wed, 12 Apr 2017 05:10:06 +0000 (13:10 +0800)]
net: dwc-xlgmac: add the initial ethtool support

It is necessary to provide ethtool support for displaying and
modifying parameters of dwc-xlgmac.

Signed-off-by: Jie Deng <jiedeng@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: ipv4: Refine the ipv4_default_advmss
Gao Feng [Wed, 12 Apr 2017 04:34:03 +0000 (12:34 +0800)]
net: ipv4: Refine the ipv4_default_advmss

1. Don't get the metric RTAX_ADVMSS of dst.
There are two reasons.
1) Its caller dst_metric_advmss has already invoke dst_metric_advmss
before invoke default_advmss.
2) The ipv4_default_advmss is used to get the default mss, it should
not try to get the metric like ip6_default_advmss.

2. Use sizeof(tcphdr)+sizeof(iphdr) instead of literal 40.

3. Define one new macro IPV4_MAX_PMTU instead of 65535 according to
RFC 2675, section 5.1.

Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'rtnetlink-cleanup-user-notifications'
David S. Miller [Thu, 13 Apr 2017 17:16:35 +0000 (13:16 -0400)]
Merge branch 'rtnetlink-cleanup-user-notifications'

David Ahern says:

====================
rtnetlink: Cleanup user notifications for netdev events

Vlad's recent patch to add the event type to rtnetlink notifications
points out a number of redundant or unnecessary notifications sent to
userspace for events that are essentially internal to the kernel. Trim
the list to put a dent in the notification storm.

v2
- rebased to top of net-next with IFLA_EVENT patch reverted
- dropped removal NETDEV_CHANGEINFODATA since it is intentionally
  only to send a message to userspace
- dropped NOTIFY_PEERS since Vlad's says it is needed for macvlans
- add patches to remove NETDEV_CHANGEUPPER and NETDEV_CHANGE_TX_QUEUE_LEN
  from the event list
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agortnetlink: Do not generate notifications for NETDEV_CHANGE_TX_QUEUE_LEN event
David Ahern [Wed, 12 Apr 2017 00:02:47 +0000 (17:02 -0700)]
rtnetlink: Do not generate notifications for NETDEV_CHANGE_TX_QUEUE_LEN event

Changing tx queue length generates identical messages:

[LINK]22: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
    link/ether 02:04:f4:b7:5c:d2 brd ff:ff:ff:ff:ff:ff promiscuity 0
    dummy numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
[LINK]22: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
    link/ether 02:04:f4:b7:5c:d2 brd ff:ff:ff:ff:ff:ff promiscuity 0
    dummy numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535

Remove NETDEV_CHANGE_TX_QUEUE_LEN from the list of notifiers that generate
notifications.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agortnetlink: Do not generate notifications for NETDEV_CHANGEUPPER event
David Ahern [Wed, 12 Apr 2017 00:02:46 +0000 (17:02 -0700)]
rtnetlink: Do not generate notifications for NETDEV_CHANGEUPPER event

NETDEV_CHANGEUPPER is an internal event; do not generate userspace
notifications.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agortnetlink: Do not generate notifications for CHANGELOWERSTATE event
David Ahern [Wed, 12 Apr 2017 00:02:45 +0000 (17:02 -0700)]
rtnetlink: Do not generate notifications for CHANGELOWERSTATE event

CHANGELOWERSTATE is an internal event; do not generate userspace
notifications.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agortnetlink: Do not generate notifications for PRECHANGEUPPER event
David Ahern [Wed, 12 Apr 2017 00:02:44 +0000 (17:02 -0700)]
rtnetlink: Do not generate notifications for PRECHANGEUPPER event

PRECHANGEUPPER is an internal event; do not generate userspace
notifications.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agortnetlink: Do not generate notifications for POST_TYPE_CHANGE event
David Ahern [Wed, 12 Apr 2017 00:02:43 +0000 (17:02 -0700)]
rtnetlink: Do not generate notifications for POST_TYPE_CHANGE event

Changing the master device for a link generates many messages; the one
generated for POST_TYPE_CHANGE is redundant:

[LINK]11: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master br1 state UNKNOWN group default
    link/ether 02:02:02:02:02:03 brd ff:ff:ff:ff:ff:ff
[LINK]11: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master br1 state UNKNOWN group default
    link/ether 02:02:02:02:02:03 brd ff:ff:ff:ff:ff:ff

Remove POST_TYPE_CHANGE from the list of notifiers that generate
notifications.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agortnetlink: Do not generate notifications for CHANGEADDR event
David Ahern [Wed, 12 Apr 2017 00:02:42 +0000 (17:02 -0700)]
rtnetlink: Do not generate notifications for CHANGEADDR event

Changing hardware address generates redundant messages:

[LINK]11: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
    link/ether 02:02:02:02:02:02 brd ff:ff:ff:ff:ff:ff

[LINK]11: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
    link/ether 02:02:02:02:02:02 brd ff:ff:ff:ff:ff:ff

Do not send a notification for the CHANGEADDR notifier.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agortnetlink: Do not generate notification for UDP_TUNNEL_PUSH_INFO
David Ahern [Wed, 12 Apr 2017 00:02:41 +0000 (17:02 -0700)]
rtnetlink: Do not generate notification for UDP_TUNNEL_PUSH_INFO

NETDEV_UDP_TUNNEL_PUSH_INFO is an internal notifier; nothing userspace
can do so don't generate a netlink notification.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agortnetlink: Do not generate notifications for MTU events
David Ahern [Wed, 12 Apr 2017 00:02:40 +0000 (17:02 -0700)]
rtnetlink: Do not generate notifications for MTU events

Changing MTU on a link currently causes 3 messages to be sent to userspace:

[LINK]11: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1490 qdisc noqueue state UNKNOWN group default
    link/ether f2:52:5c:6d:21:f3 brd ff:ff:ff:ff:ff:ff

[LINK]11: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
    link/ether f2:52:5c:6d:21:f3 brd ff:ff:ff:ff:ff:ff

[LINK]11: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
    link/ether f2:52:5c:6d:21:f3 brd ff:ff:ff:ff:ff:ff

Remove the messages sent for PRE_CHANGE_MTU and CHANGE_MTU netdev events.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agotools: bpf_jit_disasm: Add option to dump JIT image to a file.
David Daney [Tue, 11 Apr 2017 21:30:52 +0000 (14:30 -0700)]
tools: bpf_jit_disasm: Add option to dump JIT image to a file.

When debugging the JIT on an embedded platform or cross build
environment, libbfd may not be available, making it impossible to run
bpf_jit_disasm natively.

Add an option to emit a binary image of the JIT code to a file.  This
file can then be disassembled off line.  Typical usage in this case
might be (pasting mips64 dmesg output to cat command):

   $ cat > jit.raw
   $ bpf_jit_disasm -f jit.raw -O jit.bin
   $ mips64-linux-gnu-objdump -D -b binary -m mips:isa64r2 -EB jit.bin

Signed-off-by: David Daney <david.daney@cavium.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: stmmac: set total length of the packet to be transmitted in TDES3
Niklas Cassel [Mon, 10 Apr 2017 18:33:29 +0000 (20:33 +0200)]
net: stmmac: set total length of the packet to be transmitted in TDES3

Field FL/TPL in register TDES3 is not correctly set on GMAC4.
TX appears to be functional on GMAC 4.10a even if this field is not set,
however, to avoid relying on undefined behavior, set the length in TDES3.

The field has a different meaning depending on if the TSE bit in TDES3
is set or not (TSO). However, regardless of the TSE bit, the field is
not optional. The field is already set correctly when the TSE bit is set.

Since there is no limit for the number of descriptors that can be
used for a single packet, the field should be set to the sum of
the buffers contained in:
[<desc with First Descriptor bit set> ... <desc n> ...
<desc with Last Descriptor bit set>], which should be equal to skb->len.

Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agocxgb4: save tid while creating server filter
Ganesh Goudar [Mon, 10 Apr 2017 15:56:18 +0000 (21:26 +0530)]
cxgb4: save tid while creating server filter

Save the filter tid while creating the server filter, which is used
later to retrieve the corresponding filter instance while handling
the filter reply.

Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agodrivers: net: usb: qmi_wwan: add QMI_QUIRK_SET_DTR for Telit PID 0x1201
Daniele Palmas [Mon, 10 Apr 2017 15:34:23 +0000 (17:34 +0200)]
drivers: net: usb: qmi_wwan: add QMI_QUIRK_SET_DTR for Telit PID 0x1201

Telit LE920A4 uses the same pid 0x1201 of LE920, but modem
implementation is different, since it requires DTR to be set for
answering to qmi messages.

This patch replaces QMI_FIXED_INTF with QMI_QUIRK_SET_DTR: tests on
LE920 have been performed in order to verify backward compatibility.

Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoRevert "ACPICA: Resources: Not a valid resource if buffer length too long"
Rafael J. Wysocki [Thu, 13 Apr 2017 16:14:55 +0000 (18:14 +0200)]
Revert "ACPICA: Resources: Not a valid resource if buffer length too long"

Revert commit 57707a9a7780 (ACPICA: Resources: Not a valid resource if
buffer length too long) as it is reported to prevent the TPM module
from loading on Lenovo X60 with Coreboot.

It also causes new confusing warnings to show up in the kernel log.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=195311
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
7 years agoMerge tag 'pinctrl-v4.11-5' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw...
Linus Torvalds [Thu, 13 Apr 2017 16:08:29 +0000 (09:08 -0700)]
Merge tag 'pinctrl-v4.11-5' of git://git./linux/kernel/git/linusw/linux-pinctrl

Pull pin control fixes from Linus Walleij:
 "Two pin control fixes arriving late, these are hopefully the last pin
  control fixes I send this kernel cycle. A Chromebook and an Exynos SoC
  thingie.

  The Exynos patch is pretty big, it is fixing unbroken a breakage
  caused by yours truly when trying to figure out the merge mess with
  the different Samsung platforms for this merge window. Sorry about
  that. We have countered this situation by assigning a Samsung pin
  control submaintainer to catch stuff earlier.

  Summary:

   - Make the Acer Chromebook keyboard work again with the Intel
     Cherryview driver.

   - Fix a merge error in the Exynos 5433 driver"

* tag 'pinctrl-v4.11-5' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: cherryview: Add a quirk to make Acer Chromebook keyboard work again
  pinctrl: samsung: Add missing part for PINCFG_TYPE_DRV of Exynos5433