Alexei Starovoitov [Mon, 19 May 2014 21:56:14 +0000 (14:56 -0700)]
net: filter: cleanup invocation of internal BPF
Kernel API for classic BPF socket filters is:
sk_unattached_filter_create() - validate classic BPF, convert, JIT
SK_RUN_FILTER() - run it
sk_unattached_filter_destroy() - destroy socket filter
Cleanup internal BPF kernel API as following:
sk_filter_select_runtime() - final step of internal BPF creation.
Try to JIT internal BPF program, if JIT is not available select interpreter
SK_RUN_FILTER() - run it
sk_filter_free() - free internal BPF program
Disallow direct calls to BPF interpreter. Execution of the BPF program should
be done with SK_RUN_FILTER() macro.
Example of internal BPF create, run, destroy:
struct sk_filter *fp;
fp = kzalloc(sk_filter_size(prog_len), GFP_KERNEL);
memcpy(fp->insni, prog, prog_len * sizeof(fp->insni[0]));
fp->len = prog_len;
sk_filter_select_runtime(fp);
SK_RUN_FILTER(fp, ctx);
sk_filter_free(fp);
Sockets, seccomp, testsuite, tracing are using different ways to populate
sk_filter, so first steps of program creation are not common.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 21 May 2014 21:04:42 +0000 (17:04 -0400)]
Merge branch 'enic-next'
Govindarajulu Varadarajan says:
====================
enic: Add adaptive coalescing interrupt support
This series add support for adaptive coalescing interrupt and updates
enic Maintainers.
v1->v2:
* Add commit log
* do vnic_intr_coalescing_timer_set only while enabling intr
* use ktime_get instead of hrtimer
* make enic_set_rx_coal_setting return type void
* change func name enic_apply_int_moderation to enic_calc_int_moderation
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Govindarajulu Varadarajan [Mon, 19 May 2014 21:44:32 +0000 (03:14 +0530)]
MAINTAINERS: Update enic maintainers
Cc: Sujith Sankar <ssujith@cisco.com>
Cc: Christian Benvenuti <benve@cisco.com>
Cc: Neel Patel <neepatel@cisco.com>
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sujith Sankar [Mon, 19 May 2014 21:44:05 +0000 (03:14 +0530)]
enic: Add support for adaptive interrupt coalescing
This patch adds support for adaptive interrupt coalescing.
For small pkts with low pkt rate, we can decrease the coalescing interrupt
dynamically which decreases the latency. This however increases the cpu
utilization. Based on testing with different coal intr and pkt rate we came up
with a table(mod_table) with rx_rate and coalescing interrupt value where we
get low latency without significant increase in cpu. mod_table table stores
the coalescing timer percentage value for different throughputs.
Function enic_calc_int_moderation() calculates the desired coalescing intr timer
value. This function is called in driver rx napi_poll. The actual value is set
by enic_set_int_moderation() which is called when napi_poll is complete. i.e
when we unmask the rx intr.
Adaptive coal intr is support only when driver is using msix intr. Because
intr is not shared.
Struct mod_range is used to store only the default adaptive coalescing intr
value.
Adaptive coal intr calue is calculated by
timer = range_start + ((rx_coal->range_end - range_start) *
mod_table[index].range_percent / 100);
rx_coal->range_end is the rx-usecs-high value set using ethtool.
range_start is rx-usecs-low, set using ethtool, if rx_small_pkt_bytes_cnt is
greater than 2 * rx_large_pkt_bytes_cnt. i.e small pkts are dominant. Else its
rx-usecs-low + 3.
Cc: Christian Benvenuti <benve@cisco.com>
Cc: Neel Patel <neepatel@cisco.com>
Signed-off-by: Sujith Sankar <ssujith@cisco.com>
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Manuel Schölling [Mon, 19 May 2014 20:59:04 +0000 (22:59 +0200)]
vxge: Use time_before()
To be future-proof and for better readability the time comparisons are modified
to use time_before() instead of plain, error-prone math.
Signed-off-by: Manuel Schölling <manuel.schoelling@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Himangi Saraogi [Mon, 19 May 2014 16:55:11 +0000 (22:25 +0530)]
ieee802154: Introduce the use of the managed version of kzalloc
This patch moves data allocated using kzalloc to managed data allocated
using devm_kzalloc and cleans now unnecessary kfrees in probe and remove
functions. An explicit linux/device.h include is added to make sure
the devm_*() routine declarations are unambiguously available.
The following Coccinelle semantic patch was used for making the change:
@platform@
identifier p, probefn, removefn;
@@
struct platform_driver p = {
.probe = probefn,
.remove = removefn,
};
@prb@
identifier platform.probefn, pdev;
expression e, e1, e2;
@@
probefn(struct platform_device *pdev, ...) {
<+...
- e = kzalloc(e1, e2)
+ e = devm_kzalloc(&pdev->dev, e1, e2)
...
?-kfree(e);
...+>
}
@rem depends on prb@
identifier platform.removefn;
expression e;
@@
removefn(...) {
<...
- kfree(e);
...>
}
Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Peter Senna Tschudin [Mon, 19 May 2014 10:26:52 +0000 (12:26 +0200)]
atm: idt77252: Remove redundant error check
Remove double checks, convert printk to pr_warn, and move the call to
pr_warn to the first check. The simplified version of the coccinelle
semantic patch that find this issue is as follows:
// <smpl>
@@
expression E; identifier pr; expression list es;
@@
while(...){
...
- if (E) break;
+ if (E){
+ pr(es);
+ break;
+ }
...
}
- if(E) pr(es);
// </smpl>
Tested by compilation only.
Signed-off-by: Peter Senna Tschudin <peter.senna@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Li RongQing [Mon, 19 May 2014 09:30:28 +0000 (17:30 +0800)]
ipv6: slight optimization in ip6_dst_gc
entries is always greater than rt_max_size here, since if entries is less
than rt_max_size, the fib6_run_gc function will be skipped
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xi Wang [Fri, 16 May 2014 22:11:48 +0000 (15:11 -0700)]
net-tun: restructure tun_do_read for better sleep/wakeup efficiency
tun_do_read always adds current thread to wait queue, even if a packet
is ready to read. This is inefficient because both sleeper and waker
want to acquire the wait queue spin lock when packet rate is high.
We restructure the read function and use common kernel networking
routines to handle receive, sleep and wakeup. With the change
available packets are checked first before the reading thread is added
to the wait queue.
Ran performance tests with the following configuration:
- my packet generator -> tap1 -> br0 -> tap0 -> my packet consumer
- sender pinned to one core and receiver pinned to another core
- sender send small UDP packets (64 bytes total) as fast as it can
- sandy bridge cores
- throughput are receiver side goodput numbers
The results are
baseline: 731k pkts/sec, cpu utilization at 1.50 cpus
changed: 783k pkts/sec, cpu utilization at 1.53 cpus
The performance difference is largely determined by packet rate and
inter-cpu communication cost. For example, if the sender and
receiver are pinned to different cpu sockets, the results are
baseline: 558k pkts/sec, cpu utilization at 1.71 cpus
changed: 690k pkts/sec, cpu utilization at 1.67 cpus
Co-authored-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Xi Wang <xii@google.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Gundersen [Thu, 15 May 2014 21:21:30 +0000 (23:21 +0200)]
net: tunnels - enable module autoloading
Enable the module alias hookup to allow tunnel modules to be autoloaded on demand.
This is in line with how most other netdev kinds work, and will allow userspace
to create tunnels without having CAP_SYS_MODULE.
Signed-off-by: Tom Gundersen <teg@jklm.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 21 May 2014 06:00:27 +0000 (02:00 -0400)]
Merge tag 'linux-can-next-for-3.16-
20140519' of git://gitorious.org/linux-can/linux-can-next
Marc Kleine-Budde says:
====================
pull-request: can-next 2014-05-19
this is a pull request of 13 patches for net-next/master.
A patch by Dan Carpenter fixes a coccinelle warning in the mcp251x
driver. Jean Delvare contributes three patches to tightening the
Kconfig dependencies for some drivers. Then come three patches by Pavel
Machek that improve the c_can driver support on the socfpga platform.
Sergei Shtylyov's patch brings support for the CAN hardware found on
Renesas R-Car CAN controllers. Four patches by Oliver Hartkopp, the
first cleans up the guard macros in the CAN headers the other three
improve the EFF frame filtering. Maximilian Schneider's patch adds
support for the GS_USB CAN devices.
Signed-off-by: David S. Miller <davem@davemloft.net>
Bjørn Mork [Mon, 19 May 2014 07:21:09 +0000 (09:21 +0200)]
net: cdc_ncm: fix 64bit division build error
The upper timer_interval limit is arbitrary and much higher
than anything usable in the real world. Reducing it from 15s
to ~4s to make the timer_interval fit in an u32 does not make
much difference. The limit is still outside the practical
bounds.
This eliminates the need for a 64bit timer_interval, fixing a
build error related to 64bit division:
drivers/built-in.o: In function `cdc_ncm_get_coalesce':
ak8975.c:(.text+0x1ac994): undefined reference to `__aeabi_uldivmod'
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Maximilian Schneider [Tue, 24 Dec 2013 19:29:45 +0000 (19:29 +0000)]
can: gs_usb: Added support for the GS_USB CAN devices
The Geschwister Schneider Family of devices are galvanically isolated USB2.0 to
CAN2.0A/B adapters. Currently two form factors are available, a tethered dongle
in a rugged enclosure, and mini-pci-e card.
Signed-off-by: Maximilian Schneider <max@schneidersoft.net>
Acked-by: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Oliver Hartkopp [Wed, 2 Apr 2014 18:25:27 +0000 (20:25 +0200)]
can: add documentation for CAN filter usage optimisation
To benefit from special filters for single SFF or single EFF CAN identifier
subscriptions the CAN_EFF_FLAG bit and the CAN_RTR_FLAG bit has to be set
together with the CAN_(SFF|EFF)_MASK in can_filter.mask.
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Oliver Hartkopp [Wed, 2 Apr 2014 18:25:26 +0000 (20:25 +0200)]
can: add hash based access to single EFF frame filters
In contrast to the direct access to the single SFF frame filters (which are
indexed by the SFF CAN ID itself) the single EFF frame filters are arranged
in a single linked hlist. To reduce the hlist traversal in the case of many
filter subscriptions a hash based access is introduced for single EFF filters.
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Oliver Hartkopp [Wed, 2 Apr 2014 18:25:25 +0000 (20:25 +0200)]
can: proc: make array printing function indenpendent from sff frames
The can_rcvlist_sff_proc_show_one() function which prints the array of filters
for the single SFF CAN identifiers is prepared to be used by a second caller.
Therefore it is also renamed to properly describe its future functionality.
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Oliver Hartkopp [Thu, 15 May 2014 18:31:56 +0000 (20:31 +0200)]
can: unify identifiers to ensure unique include processing
Armin pointed me to the fact that the identifier which is used to ensure the
unique include processing in lunux/include/uapi/linux/can.h is CAN_H.
This clashed with his own source as includes from libraries and APIs should
use an underscore '_' at the identifier start.
This patch fixes the protection identifiers in all CAN relavant includes.
Reported-by: Armin Burchardt <armin@uni-bremen.de>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Sergei Shtylyov [Fri, 16 May 2014 20:03:54 +0000 (00:03 +0400)]
can: add Renesas R-Car CAN driver
Add support for the CAN controller found in Renesas R-Car SoCs.
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Pavel Machek [Tue, 13 May 2014 13:09:14 +0000 (15:09 +0200)]
can: c_can: add hwinit support for non-TI devices
Non-TI chips (including socfpga) needs different raminit sequence. Implement
it.
Tested-by: Thor Thayer <tthayer@altera.com>
Signed-off-by: Thor Thayer <tthayer@altera.com>
Signed-off-by: Pavel Machek <pavel@denx.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Pavel Machek [Tue, 6 May 2014 13:57:02 +0000 (15:57 +0200)]
can: c_can: Add and make use of 32-bit accesses functions
Add helpers for 32-bit accesses and replace open-coded 32-bit access
with calls to helpers. Minimum changes are done to the pci case, as I
don't have access to that hardware.
Tested-by: Thor Thayer <tthayer@altera.com>
Signed-off-by: Thor Thayer <tthayer@altera.com>
Signed-off-by: Pavel Machek <pavel@denx.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Pavel Machek [Tue, 13 May 2014 13:09:14 +0000 (15:09 +0200)]
can: c_can: make {read,write}_reg functions const
This patch makes the {read,write}_reg functions const, this is a preparation to
make use of {read,write}_reg in the hwinit callback.
Signed-off-by: Thor Thayer <tthayer@altera.com>
Signed-off-by: Pavel Machek <pavel@denx.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Jean Delvare [Tue, 6 May 2014 08:20:59 +0000 (10:20 +0200)]
can: pch_can: Fix Kconfig dependencies
The pch_can driver is for a companion chip to the Intel Atom E600
series processors. These are 32-bit x86 processors so the driver is
only needed on X86_32. Add COMPILE_TEST as an alternative, so that the
driver can still be build-tested elsewhere.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Wolfgang Grandegger <wg@grandegger.com>
Cc: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Jean Delvare [Tue, 6 May 2014 08:47:10 +0000 (10:47 +0200)]
can: mscan: Fix Kconfig dependencies
The only driver based on MSCAN at the moment is for PPC machines,
so it makes no sense to present the menu on M68K. The menu will always
be empty there.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Wolfgang Grandegger <wg@grandegger.com>
Cc: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Jean Delvare [Tue, 6 May 2014 08:19:15 +0000 (10:19 +0200)]
can: at91_can: Fix Kconfig dependencies
The at91_can driver is AT91-specific so it should depend on ARCH_AT91
rather than just ARM. Add COMPILE_TEST as an alternative, so that the
driver can still be build-tested elsewhere.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Wolfgang Grandegger <wg@grandegger.com>
Cc: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Dan Carpenter [Fri, 2 May 2014 11:57:01 +0000 (13:57 +0200)]
can: mcp251x: fix coccinelle warnings
drivers/net/can/spi/mcp251x.c:953:7-27: ERROR: Threaded IRQ with no primary handler requested without IRQF_ONESHOT
Make sure threaded IRQs without a primary handler are always request with
IRQF_ONESHOT
Generated by: scripts/coccinelle/misc/irqf_oneshot.cocci
CC: Stefano Babic <sbabic@denx.de>
CC: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
David S. Miller [Mon, 19 May 2014 01:27:09 +0000 (21:27 -0400)]
Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge
Included changes:
- fix codestyle to respect new checkpatch warnings
- increase internal version number
Rickard Strandqvist [Sun, 18 May 2014 16:01:21 +0000 (18:01 +0200)]
net: ethernet: ibm: ehea: ehea_qmr.c: Fix for possible null pointer dereference
There is otherwise a risk of a possible null pointer dereference.
Was largely found by using a static code analysis program called cppcheck.
Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
Manuel Schölling [Sun, 18 May 2014 21:32:49 +0000 (23:32 +0200)]
net: rds: Use time_after() for time comparison
To be future-proof and for better readability the time comparisons are modified
to use time_after() instead of raw math.
Signed-off-by: Manuel Schölling <manuel.schoelling@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Manuel Schölling [Sun, 18 May 2014 10:45:48 +0000 (12:45 +0200)]
net: 8390: use time_after() for time comparison
To be future-proof and for better readability the time comparisons are modified
to use time_after() instead of raw math.
Signed-off-by: Manuel Schölling <manuel.schoelling@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christian Engelmayer [Sat, 17 May 2014 21:52:03 +0000 (23:52 +0200)]
net/mlx4_en: Fix uninitialized use of 'port_up' in mlx4_en_set_channels()
Function mlx4_en_set_channels() stops running ports before performing the
requested action. In that case local variable 'port_up' is set so that the
port is restarted at the end of the function, however, in case the port was
not stopped, variable 'port_up' is left uninitialized and the behaviour is
undetermined. Detected by Coverity - CID 751497.
Signed-off-by: Christian Engelmayer <cengelma@gmx.at>
Acked-By: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Sat, 17 May 2014 03:46:58 +0000 (20:46 -0700)]
ipv4: minor spelling fix
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Sat, 17 May 2014 03:46:17 +0000 (20:46 -0700)]
bridge: fix spelling of promiscuous
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov [Sun, 18 May 2014 17:30:28 +0000 (10:30 -0700)]
net: bridge: fix build
fix build when BRIDGE_VLAN_FILTERING is not set
Fixes:
2796d0c648c94 ("bridge: Automatically manage port promiscuous mode")
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Wunderlich [Fri, 16 May 2014 11:30:45 +0000 (13:30 +0200)]
batman-adv: Start new development cycle
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Antonio Quartulli [Thu, 15 May 2014 09:24:26 +0000 (11:24 +0200)]
batman-adv: remove semi-colon after macro definition
Reported by checkpatch with the following warning:
"WARNING: macros should not use a trailing semicolon"
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Antonio Quartulli [Sat, 10 May 2014 16:56:37 +0000 (18:56 +0200)]
batman-adv: add blank line between declarations and the rest of the code
Reported by checkpatch with the following message:
"WARNING: Missing a blank line after declarations"
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
David S. Miller [Sat, 17 May 2014 02:39:13 +0000 (22:39 -0400)]
Merge branch 'cdc_ncm-coalesce'
Bjørn Mork says:
====================
cdc_ncm: add buffer tuning and stats using ethtool
Quoting the previous description of this series (skip to the
changelog below if you only want a summary of the changes):
"I have got quite a few reports from frustrated users of OpenWRT
hosts trying to use some powerful LTE modem, but not achieving
full speed. This is typically caused by a combination of
big buffers and little memory, giving in allocation errors and
bad performance as a result.
This series is an attempt to let users adjust the size of these
buffers without having to rebuild the driver.
Patches 1 - 4 are mostly rearranging existing code, in preparing
for the dynamic buffer size changes.
Patch 5 adds userspace control (ab)using the ethtool coalescing
API. This isn't a perfect match, which is the main reason why I
post this series as a RFC.
Patch 6 is an unrelated framing optimization, reducing the
overhead quite a bit and allowing for better use of smaller
buffers.
Patch 7 changes the way we calculate frame padding cutoff. The
problem with big buffers is made much worse by the current padding
strategy where zero padding often can account for more than 90% of
the frames.
Patch 8 add some counters giving some insight into how well the
NCM/MBIM protocol works, supporting further tuning.
Patch 9 reduce the initial maximum buffer size from 32kB to 16kB
in an attempt to make the default better suit all. It is still
possible to tune this up again to the old fixed max, using the
new tuning knobs.
I must admit that I had higher hopes for this series before I
tested it on my own modems. One really unexpected result was
that one of the MBIM modems accepted the new rx buffer size we
set, but happily continued sending buffers of the same size as
before. Needless to say: This did not work very well...
So don't really expect to be able to use any values with any
given device. Firmware implementations are still... I don't
think I have words suitable for a public mailing list.
But I am hoping this will help the many users who have had success
rebuilding the driver with lower fixed limits.
Please test and/or comment!"
Changes:
** RFC -> v1 **
Patch 10 - a follow-up to a comment Joe Perches made in November
2013. I don't always forget :-)
Patch 11 - removes the redundant "connected" driver state, and the
associated .check_connect callbacks.
** v1 -> v2 **
Patch 1 - Better handling of minium rx buffer size, based on feedback
from Oliver Neukum and Enrico Mioso
Patch 5 - fixed locking around timer interval update
Patch 9 - fixed whitespace error
Patch 12 - new fix related to the tuneable tx timer
...and spelling fixes all over the commit messages. I have finally
added a spelling hook, which I'm sure may of you will appreciate :-)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Bjørn Mork [Fri, 16 May 2014 19:48:29 +0000 (21:48 +0200)]
net: cdc_ncm: do not start timer on an empty skb
We can end up with a freshly allocated tx_curr_skb with no frames
in it. In this case it does not make any sense to start the timer.
This avoids the timer periodically trying to start tx when there
is nothing in the queue.
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bjørn Mork [Fri, 16 May 2014 19:48:28 +0000 (21:48 +0200)]
net: cdc_ncm: remove redundant "disconnected" flag
Calling netif_carrier_{on,off} is sufficient. There is no need
to duplicate the carrier state in a driver specific flag.
Acked-by: Enrico Mioso <mrkiko.rs@gmail.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bjørn Mork [Fri, 16 May 2014 19:48:27 +0000 (21:48 +0200)]
net: cdc_ncm: fix argument alignment
Reported-by: Joe Perches <joe@perches.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bjørn Mork [Fri, 16 May 2014 19:48:26 +0000 (21:48 +0200)]
net: cdc_ncm: use sane defaults for rx/tx buffers
Lots of devices request much larger buffers than reasonable. This
cause real problems for users of hosts with limited resources.
Reducing the default buffer size to 16kB for such devices is
a reasonable trade-off between allowing them to aggregate traffic
and avoiding memory exhaustion on resource restrained hosts.
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bjørn Mork [Fri, 16 May 2014 19:48:25 +0000 (21:48 +0200)]
net: cdc_ncm/cdc_mbim: adding NCM protocol statistics
To have an idea of the effects of the protocol coalescing
it's useful to have some counters showing the different
aspects.
Due to the asymmetrical usbnet interface the netdev
rx_bytes counter has been counting real received payload,
while the tx_bytes counter has included the NCM/MBIM
framing overhead. This overhead can be many times the
payload because of the aggressive padding strategy of
this driver, and will vary a lot depending on device
and traffic.
With very few exceptions, users are only interested in
the payload size. Having an somewhat accurate payload
byte counter is particularly important for mobile
broadband devices, which many NCM devices and of course
all MBIM devices are. Users and userspace applications
will use this counter to monitor account quotas.
Having protocol specific counters for the overhead, we are
now able to correct the tx_bytes netdev counter so that
it shows the real payload
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bjørn Mork [Fri, 16 May 2014 19:48:24 +0000 (21:48 +0200)]
net: cdc_ncm: set reasonable padding limits
We pad frames larger than X to maximum size for devices which
don't need a ZLP after maximum sized frames. This allows the
device to optimize its transfers for one fixed buffer size.
X was arbitrarily set at 512 bytes regardless of real buffer
maximum, causing extreme overheads due to excessive padding of
larger tx buffers. Limit the padding to at most 3 full USB
packets, still allowing the overhead to payload ratio of 3/1.
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bjørn Mork [Fri, 16 May 2014 19:48:23 +0000 (21:48 +0200)]
net: cdc_ncm: use true max dgram count for header estimates
Many newer NCM and MBIM devices will request a maximum tx
datagram count which is much smaller than our hard-coded
absolute max. We can reduce the overhead without sacrificing
any of the simplicity for these devices, by simply using the
true negotiated count in when calculated the maximum NTH and
NDP header sizes.
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bjørn Mork [Fri, 16 May 2014 19:48:22 +0000 (21:48 +0200)]
net: cdc_ncm: use ethtool to tune coalescing settings
Datagram coalescing is an integral part of the NCM and MBIM
protocols, intended to reduce the interrupt load primarily
on the device end of the USB link. As with all coalescing
solutions, there is a trade-off between buffering and
interrupts.
The current defaults are based on the assumption that device
side buffers should be the limiting factor. However, many
modern high speed LTE modems suffers from buffer-bloat,
making this assumption fail. This results in sub-optimal
performance due to excessive coalescing. And in cases where
such modems are connected to cheap embedded hosts there is
often severe buffer allocation issues, giving very noticeable
performance degradation .
A start on improving this is going from build time hard
coded limits to per device user configurable limits. The
ethtool coalescing API was selected as user interface
because, although the tuned values are buffer sizes, these
settings directly control datagram coalescing.
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bjørn Mork [Fri, 16 May 2014 19:48:21 +0000 (21:48 +0200)]
net: cdc_ncm: support rx_max/tx_max updates when running
Finish the rx_max/tx_max setup by flushing buffers and
informing usbnet about the changes. This way, the settings
can be modified while the netdev is up and running.
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bjørn Mork [Fri, 16 May 2014 19:48:20 +0000 (21:48 +0200)]
net: cdc_ncm: split .bind device initialization
Now that we have split out the part of the device setup
which MUST be done with the data interface in altsetting 0,
we can delay the rest of the initialization. This allows us
to move some of post-init buffer size config from bind to
the appropriate setup function.
The purpose of this refactoring is to collect all code
adjusting the rx_max and tx_max buffers in one place, so
that it is easier to call it from multiple call sites.
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bjørn Mork [Fri, 16 May 2014 19:48:19 +0000 (21:48 +0200)]
net: cdc_ncm: factor out one-time device initialization
Split the parts of setup dealing with device initialization from
parts just setting defaults for attributes which might be
changed after initialization.
Some commands of the device initialization are only allowed when
the data interface is in its disabled altsetting, so we must
separate them out of we are to allow rerunning parts of setup.
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bjørn Mork [Fri, 16 May 2014 19:48:18 +0000 (21:48 +0200)]
net: cdc_ncm: split out rx_max/tx_max update of setup
Split out the part of setup dealing with updating the rx_max
and tx_max buffer sizes so that this code can be reused for
dynamically updating the limits.
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Graf [Fri, 16 May 2014 21:28:54 +0000 (23:28 +0200)]
pktgen: Use seq_puts() where seq_printf() is not needed
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 16 May 2014 21:23:49 +0000 (17:23 -0400)]
Merge branch 'ieee802154-next'
Phoebe Buckheister says:
====================
802154: implement link-layer security
This patch series implements 802.15.4-2011 link layer security.
Patches 1 and 2 prepare for llsec by adding data structures to represent the
llsec PIB as specified in 802.15.4-2011. I've changed some structures from
their specification to be more sensible, since 802.15.4 specifies some
structures in not-exactly-useful ways. Nested lists are common, but not very
accessible for netlink methods, and not very fast to traverse when searching
for specific elements either.
Patch 3 implements backends for these structures in mac802154.
Patch 4 and 5 implement the encryption and decryption methods, split from patch
3 to ease review. The encryption and decryption methods are almost entirely
compliant with the specified outgoing/incoming frame procedures. Decryption
deviates from the specification slightly where the specification makes no
sense, i.e. encrypted frames with security level 0 may be sent, but must be
dropped an reception - but transforms for processing such frames are given a
few lines in the standard. I've opted to not drop these frames instead of not
implementing the transforms that wouldn't be used if they were dropped.
Patch 6 links the mac802154 llsec with the SoftMAC devices. This is mainly
init//fini code for llsec context, handling of security subheaders and calling
the encryption/decryption methods.
Patch 7 adds sockopts to 802.15.4 dgram sockets to modifiy outgoing security
parameters on a per-socket basis. Ideally, this would also be available for
sockets on 6lowpan devices, but I'm not sure how to do that nicely.
Patch 8 adds forwarders to the llsec configuration methods for netlink, patch
10 implements these netlink accessors. This is mainly mechanical.
Patch 11, implements a key tracking option for devices that previous patches
haven't, because I'm not entirely sure whether this is the best approach to the
problem. It performs reasonably well though, so I decided to include it as a
separate patch in this series instead of sending an RFC just for this one
option.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Phoebe Buckheister [Fri, 16 May 2014 15:46:45 +0000 (17:46 +0200)]
ieee802154, mac802154: implement devkey record option
The 802.15.4-2011 standard states that for each key, a list of devices
that use this key shall be kept. Previous patches have only considered
two options:
* a device "uses" (or may use) all keys, rendering the list useless
* a device is restricted to a certain set of keys
Another option would be that a device *may* use all keys, but need not
do so, and we are interested in the actual set of keys the device uses.
Recording keys used by any given device may have a noticable performance
impact and might not be needed as often. The common case, in which a
device will not switch keys too often, should still perform well.
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Phoebe Buckheister [Fri, 16 May 2014 15:46:44 +0000 (17:46 +0200)]
ieee802154: add netlink interfaces for llsec
This patch adds user-visible interfaces for the llsec infrastructure.
For the added methods, the only major difference between all add/remove
implementation lies in how the specific object is parsed, and for dump
requests, how objects are written into netlink messages.
To save on boilerplate code, table dumps are routed through a helper
function that handles netlink dump state, leaving the actual dumping
code to care only about iterating over the table to be dumped and
filling netlink messages. For add/remove methods, the boilerplate
required to work is not quite as large, but still enough to also move
into a local helper.
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Phoebe Buckheister [Fri, 16 May 2014 15:46:43 +0000 (17:46 +0200)]
mac802154: propagate device address changes to llsec
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Phoebe Buckheister [Fri, 16 May 2014 15:46:42 +0000 (17:46 +0200)]
mac802154: add llsec configuration functions
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Phoebe Buckheister [Fri, 16 May 2014 15:46:41 +0000 (17:46 +0200)]
ieee802154: add dgram sockopts for security control
Allow datagram sockets to override the security settings of the device
they send from on a per-socket basis. Requires CAP_NET_ADMIN or
CAP_NET_RAW, since raw sockets can send arbitrary packets anyway.
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Phoebe Buckheister [Fri, 16 May 2014 15:46:40 +0000 (17:46 +0200)]
mac802154: integrate llsec with wpan devices
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Phoebe Buckheister [Fri, 16 May 2014 15:46:39 +0000 (17:46 +0200)]
mac802154: add llsec decryption method
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Phoebe Buckheister [Fri, 16 May 2014 15:46:38 +0000 (17:46 +0200)]
mac802154: add llsec encryption method
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Phoebe Buckheister [Fri, 16 May 2014 15:46:37 +0000 (17:46 +0200)]
mac802154: add llsec structures and mutators
This patch adds containers and mutators for the major ieee802154_llsec
structures to mac802154. Most of the (rather simple) ieee802154_llsec
structs are wrapped only to provide an rcu_head for orderly disposal,
but some structs - llsec keys notably - require more complex
bookkeeping.
Since each llsec key may be referenced by a number of llsec key table
entries (with differing key ids, but the same actual key), we want to
save memory and not allocate crypto transforms for each entry in the
table. Thus, the mac802154 llsec key is reference-counted instead.
Further, each key will have four associated crypto transforms - three
CCM transforms for the authsizes 4/8/16 and one CTR transform for
unauthenticated encryption. If we had a CCM* transform that allowed
authsize 0, and authsize as part of requests instead of transforms, this
would not be necessary.
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Phoebe Buckheister [Fri, 16 May 2014 15:46:36 +0000 (17:46 +0200)]
mac802154: update Kconfig
Link-layer security requires AES CCM for authenticated modes and AES CTR
for the unauthenticated encryption mode.
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Phoebe Buckheister [Fri, 16 May 2014 15:46:35 +0000 (17:46 +0200)]
ieee802154: add types for link-layer security
The added structures match 802.15.4-2011 link-layer security PIBs as
closely as is reasonable. Some lists required by the standard were
modeled as bitmaps (frame_types and command_frame_ids in *llsec_key,
802.15.4-2011 7.5/Table 61), since using lists for those seems a bit
excessive and not particularly useful. The DeviceDescriptorHandleList
was inverted and is here a per-device list, since operations on this
list are likely to have both a key and a device at hand, and per-device
lists of keys are shorter than per-key lists of devices.
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 16 May 2014 21:21:51 +0000 (17:21 -0400)]
Merge branch 'master' of git://git./linux/kernel/git/jesse/openvswitch
Jesse Gross says:
====================
A set of OVS changes for net-next/3.16.
The major change here is a switch from per-CPU to per-NUMA flow
statistics. This improves scalability by reducing kernel overhead
in flow setup and maintenance.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 16 May 2014 21:19:44 +0000 (17:19 -0400)]
Merge branch 'dt_fixed_phy'
Thomas Petazzoni says:
====================
Add DT support for fixed PHYs
Here is a fourth version of the patch set that adds a Device Tree
binding and the related code to support fixed PHYs. I'm hoping to get
this merged in 3.16.
Changes since v3:
* Rebased on top of v3.15-rc5
* In patch "net: phy: decouple PHY id and PHY address in fixed PHY
driver", changed the PHY ID of fixed PHYs from 0xdeadbeef to 0x0,
as suggested by Grant Likely.
* Fixed the !CONFIG_PHY_FIXED case in patch "net: phy: extend fixed
driver with fixed_phy_register()". Noticed by Florian Fainelli.
* Added Acked-by from Grant Likely and Florian Fainelli on patch
"net: phy: extend fixed driver with fixed_phy_register()".
* Reworked the new fixed-link DT binding to be just a sub-node of the
Ethernet MAC node, and not a node referenced by the 'phy'
property. This was requested by Grant Likely.
* Reworked the code implementing the new DT binding to also make it
accept the old, single property based, DT binding.
* Added a patch that actually uses the new fixed link DT binding for
the Armada XP Matrix board.
Changes since v2:
* Rebased on top of v3.14-rc1, and re-tested on hardware.
* Removed the RFC tag, since there seems to be some real interest in
this feature, and the code has gone through several iterations
already.
* The error handling in fixed_phy_register() has been fixed.
Changes since v1:
* Instead of using a 'fixed-link' property inside the Ethernet device
DT node, with a fairly cryptic succession of integer values, we now
use a PHY subnode under the Ethernet device DT node, with explicit
properties to configure the duplex, speed, pause and other PHY
properties.
* The PHY address is automatically allocated by the kernel and no
longer visible in the Device Tree binding.
* The PHY device is created directly when the network driver calls
of_phy_connect_fixed_link(), and associated to the PHY DT node,
which allows the existing of_phy_connect() function to work,
without the need to use the deprecated of_phy_connect_fixed_link().
Posts of previous versions:
RFCv1: http://www.spinics.net/lists/netdev/msg243253.html
RFCv2: http://lists.infradead.org/pipermail/linux-arm-kernel/2013-September/196919.html
PATCHv3: http://www.spinics.net/lists/netdev/msg273117.html
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Petazzoni [Fri, 16 May 2014 14:14:07 +0000 (16:14 +0200)]
ARM: mvebu: use the fixed-link PHY DT binding for the Armada XP Matrix board
The Armada XP Matrix board has an Ethernet PHY that isn't configurable
through the MDIO bus, so we use the newly introduced fixed-link PHY DT
binding to represent the PHY of this platform and get network working.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Petazzoni [Fri, 16 May 2014 14:14:06 +0000 (16:14 +0200)]
net: mvneta: add support for fixed links
Following the introduction of of_phy_register_fixed_link(), this patch
introduces fixed link support in the mvneta driver, for Marvell Armada
370/XP SOCs.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Petazzoni [Fri, 16 May 2014 14:14:05 +0000 (16:14 +0200)]
of: provide a binding for fixed link PHYs
Some Ethernet MACs have a "fixed link", and are not connected to a
normal MDIO-managed PHY device. For those situations, a Device Tree
binding allows to describe a "fixed link" using a special PHY node.
This patch adds:
* A documentation for the fixed PHY Device Tree binding.
* An of_phy_is_fixed_link() function that an Ethernet driver can call
on its PHY phandle to find out whether it's a fixed link PHY or
not. It should typically be used to know if
of_phy_register_fixed_link() should be called.
* An of_phy_register_fixed_link() function that instantiates the
fixed PHY into the PHY subsystem, so that when the driver calls
of_phy_connect(), the PHY device associated to the OF node will be
found.
These two additional functions also support the old fixed-link Device
Tree binding used on PowerPC platforms, so that ultimately, the
network device drivers for those platforms could be converted to use
of_phy_is_fixed_link() and of_phy_register_fixed_link() instead of
of_phy_connect_fixed_link(), while keeping compatibility with their
respective Device Tree bindings.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Petazzoni [Fri, 16 May 2014 14:14:04 +0000 (16:14 +0200)]
net: phy: extend fixed driver with fixed_phy_register()
The existing fixed_phy_add() function has several drawbacks that
prevents it from being used as is for OF-based declaration of fixed
PHYs:
* The address of the PHY on the fake bus needs to be passed, while a
dynamic allocation is desired.
* Since the phy_device instantiation is post-poned until the next
mdiobus scan, there is no way to associate the fixed PHY with its
OF node, which later prevents of_phy_connect() from finding this
fixed PHY from a given OF node.
To solve this, this commit introduces fixed_phy_register(), which will
allocate an available PHY address, add the PHY using fixed_phy_add()
and instantiate the phy_device structure associated with the provided
OF node.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Grant Likely <grant.likely@linaro.org>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Petazzoni [Fri, 16 May 2014 14:14:03 +0000 (16:14 +0200)]
net: phy: decouple PHY id and PHY address in fixed PHY driver
Until now, the fixed_phy_add() function was taking as argument
'phy_id', which was used both as the PHY address on the fake fixed
MDIO bus, and as the PHY id, as available in the MII_PHYSID1 and
MII_PHYSID2 registers. However, those two informations are completely
unrelated.
This patch decouples them. The PHY id of fixed PHYs is hardcoded to be
0x0. Ideally, a really reserved value would be nicer, but there
doesn't seem to be an easy of making sure a dummy value can be
assigned to the Linux kernel for such usage.
The PHY address remains passed by the caller of phy_fixed_add().
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 16 May 2014 21:06:40 +0000 (17:06 -0400)]
Merge branch 'bridge-non-promisc'
Vlad Yasevich says:
====================
bridge: Non-promisc bridge ports support
This series adds functionality to the bridge device to enable
operations without setting all ports to promiscuous mode.
The basic concept is this. The bridge keeps track of the ports
that support learning and flooding packets to unknown destinations.
We call these ports auto-discovery ports since they automatically
discover who is behind them through learning and flooding.
If flooding and learning are disabled via flags, then the port
requires static configuration to tell it which mac addresses
are behind it. This is accomplished through adding of fdbs.
These fdbs should be static as dynamic fdbs can expire and systems
will become unreachable due to lack of flooding.
If the user marks all ports as needing static configuration then
we can safely make them non-promiscuous since we will know all the
information about them.
If the user leaves only 1 port as automatic, then we can mark
that port as not-promiscuous as well. One could think of
this a edge relay similar to what's support by embedded switches
in SRIOV devices. Since we have all the information about the
other ports, we can just program the mac addresses into the
single automatic port to receive all necessary traffic.
More information about this is patch 6.
In other cases, we keep all ports promiscuous as before.
There are some other cases when promiscuous mode has to be turned
back on. One is when the bridge itself if placed in promiscuous
mode (user sets promisc flag). The other is if vlan filtering is
turned off. Since this is the default configuration, the default
bridge operation is not changed.
Changes since v2:
- White space and spelling fixes from Michael Tsirkin
- Squash patches 6, 7 and 8 to prevent bisect breakage.
Changes since v1:
- Address issues rasied by Stephen Heminger
- Address initializer comments raised by Sergey Shtylyov
- Rebased recent net-next.
Changes since rfc v2:
- Better description of in the commit logs
- Leave port in promiscuous mode if IFF_UNICAST_FLT is disabled on the
device.
- Fix issue with flag masking
- Rework patch ordering a bit.
Changes since rfc v1:
- Removed private list. We now traverse the fdb hashtable itself
to write necessary addresses to the ports (Stephen's concern)
- Add learning flag to the mask for flags that decides if the port
is 'auto' or not (suggest by MST and Jamal).
- Simplified tracking of such ports at the cost of a loop over all
ports (suggested by MST)
I've played with quite a large number of ports and the current approach
seems to work fairly well.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Yasevich [Fri, 16 May 2014 13:59:20 +0000 (09:59 -0400)]
bridge: Automatically manage port promiscuous mode.
There exist configurations where the administrator or another management
entity has the foreknowledge of all the mac addresses of end systems
that are being bridged together.
In these environments, the administrator can statically configure known
addresses in the bridge FDB and disable flooding and learning on ports.
This makes it possible to turn off promiscuous mode on the interfaces
connected to the bridge.
Here is why disabling flooding and learning allows us to control
promiscuity:
Consider port X. All traffic coming into this port from outside the
bridge (ingress) will be either forwarded through other ports of the
bridge (egress) or dropped. Forwarding (egress) is defined by FDB
entries and by flooding in the event that no FDB entry exists.
In the event that flooding is disabled, only FDB entries define
the egress. Once learning is disabled, only static FDB entries
provided by a management entity define the egress. If we provide
information from these static FDBs to the ingress port X, then we'll
be able to accept all traffic that can be successfully forwarded and
drop all the other traffic sooner without spending CPU cycles to
process it.
Another way to define the above is as following equations:
ingress = egress + drop
expanding egress
ingress = static FDB + learned FDB + flooding + drop
disabling flooding and learning we a left with
ingress = static FDB + drop
By adding addresses from the static FDB entries to the MAC address
filter of an ingress port X, we fully define what the bridge can
process without dropping and can thus turn off promiscuous mode,
thus dropping packets sooner.
There have been suggestions that we may want to allow learning
and update the filters with learned addresses as well. This
would require mac-level authentication similar to 802.1x to
prevent attacks against the hw filters as they are limited
resource.
Additionally, if the user places the bridge device in promiscuous mode,
all ports are placed in promiscuous mode regardless of the changes
to flooding and learning.
Since the above functionality depends on full static configuration,
we have also require that vlan filtering be enabled to take
advantage of this. The reason is that the bridge has to be
able to receive and process VLAN-tagged frames and the there
are only 2 ways to accomplish this right now: promiscuous mode
or vlan filtering.
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Yasevich [Fri, 16 May 2014 13:59:19 +0000 (09:59 -0400)]
bridge: Add addresses from static fdbs to non-promisc ports
When a static fdb entry is created, add the mac address
from this fdb entry to any ports that are currently running
in non-promiscuous mode. These ports need this data so that
they can receive traffic destined to these addresses.
By default ports start in promiscuous mode, so this feature
is disabled.
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Yasevich [Fri, 16 May 2014 13:59:18 +0000 (09:59 -0400)]
bridge: Introduce BR_PROMISC flag
Introduce a BR_PROMISC per-port flag that will help us track if the
current port is supposed to be in promiscuous mode or not. For now,
always start in promiscuous mode.
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Yasevich [Fri, 16 May 2014 13:59:17 +0000 (09:59 -0400)]
bridge: Add functionality to sync static fdb entries to hw
Add code that allows static fdb entires to be synced to the
hw list for a specified port. This will be used later to
program ports that can function in non-promiscuous mode.
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Yasevich [Fri, 16 May 2014 13:59:16 +0000 (09:59 -0400)]
bridge: Keep track of ports capable of automatic discovery.
By default, ports on the bridge are capable of automatic
discovery of nodes located behind the port. This is accomplished
via flooding of unknown traffic (BR_FLOOD) and learning the
mac addresses from these packets (BR_LEARNING).
If the above functionality is disabled by turning off these
flags, the port requires static configuration in the form
of static FDB entries to function properly.
This patch adds functionality to keep track of all ports
capable of automatic discovery. This will later be used
to control promiscuity settings.
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Yasevich [Fri, 16 May 2014 13:59:15 +0000 (09:59 -0400)]
bridge: Turn flag change macro into a function.
Turn the flag change macro into a function to allow
easier updates and to reduce space.
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jean Delvare [Fri, 16 May 2014 09:29:04 +0000 (11:29 +0200)]
net: pch_gbe depends on x86_32
The pch_gbe driver is for a companion chip to the Intel Atom E600
series processors. These are 32-bit x86 processors so the driver is
only needed on X86_32.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Duan Jiong [Thu, 15 May 2014 05:07:02 +0000 (13:07 +0800)]
ip_tunnel: don't add tunnel twice
When using command "ip tunnel add" to add a tunnel, the tunnel will be added twice,
through ip_tunnel_create() and ip_tunnel_update().
Because the second is unnecessary, so we can just break after adding tunnel
through ip_tunnel_create().
Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov [Thu, 15 May 2014 22:56:39 +0000 (15:56 -0700)]
tools: bpf_jit_disasm: increase image buffer size
JITed seccomp filters can be quite large if they check a lot of syscalls
Simply increase buffer size
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov [Thu, 15 May 2014 22:56:38 +0000 (15:56 -0700)]
tools: bpf_jit_disasm: ignore image address for disasm
seccomp filters use kernel JIT image addresses, so bpf_jit_enable=2 prints
[ 20.146438] flen=3 proglen=82 pass=0 image=
0000000000000000
[ 20.146442] JIT code:
00000000: 55 48 89 e5 48 81 ec 28 02 00 00 ...
ignore image address, so that seccomp filters can be disassembled
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 16 May 2014 20:41:53 +0000 (16:41 -0400)]
Merge branch 'systemport-next'
Florian Fainelli says:
====================
net: systemport: DMA and MAC fixes
This patch series contains a critical fix in how the DMA unmapping of packet
is done, as well as a less critical fix in how we disable the Ethernet MAC
RX/TX functions.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 15 May 2014 21:33:53 +0000 (14:33 -0700)]
net: systemport: wait for packet in umac_enable_set()
When umac_enable_set() is used to disable the UniMAC receiver or
transmitter, we need to make sure that we wait for a full-sized packet
to be processed because the UniMAC hardware stops on a packet boundary,
not immediately.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 15 May 2014 21:33:52 +0000 (14:33 -0700)]
net: systemport: fix dma_unmap_single() len
dma_unmap_single() was called with dma_unmap_len(cb, dma_len),
unfortunately we failed to assign this length field in
bcm_sysport_rx_refill() or bcm_sysport_alloc_rx_bufs() using
dma_unmap_len_set().
This causes packet contents corruption because are we not invoking the
cache invalidation routines with the proper length. Fix this by using
the full RX buffer size (RX_BUF_LENGTH) because the mappings for the RX
bufers are created with that size.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Monam Agarwal [Sun, 23 Mar 2014 19:22:43 +0000 (00:52 +0530)]
net/openvswitch: Use with RCU_INIT_POINTER(x, NULL) in vport-gre.c
This patch replaces rcu_assign_pointer(x, NULL) with RCU_INIT_POINTER(x, NULL)
The rcu_assign_pointer() ensures that the initialization of a structure
is carried out before storing a pointer to that structure.
And in the case of the NULL pointer, there is no structure to initialize.
So, rcu_assign_pointer(p, NULL) can be safely converted to RCU_INIT_POINTER(p, NULL)
Signed-off-by: Monam Agarwal <monamagarwal123@gmail.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Jarno Rajahalme [Thu, 27 Mar 2014 19:51:49 +0000 (12:51 -0700)]
openvswitch: Use TCP flags in the flow key for stats.
We already extract the TCP flags for the key, might as well use that
for stats.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Jarno Rajahalme [Thu, 27 Mar 2014 19:47:11 +0000 (12:47 -0700)]
openvswitch: Fix output of SCTP mask.
The 'output' argument of the ovs_nla_put_flow() is the one from which
the bits are written to the netlink attributes. For SCTP we
accidentally used the bits from the 'swkey' instead. This caused the
mask attributes to include the bits from the actual flow key instead
of the mask.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Jarno Rajahalme [Thu, 27 Mar 2014 19:42:54 +0000 (12:42 -0700)]
openvswitch: Per NUMA node flow stats.
Keep kernel flow stats for each NUMA node rather than each (logical)
CPU. This avoids using the per-CPU allocator and removes most of the
kernel-side OVS locking overhead otherwise on the top of perf reports
and allows OVS to scale better with higher number of threads.
With 9 handlers and 4 revalidators netperf TCP_CRR test flow setup
rate doubles on a server with two hyper-threaded physical CPUs (16
logical cores each) compared to the current OVS master. Tested with
non-trivial flow table with a TCP port match rule forcing all new
connections with unique port numbers to OVS userspace. The IP
addresses are still wildcarded, so the kernel flows are not considered
as exact match 5-tuple flows. This type of flows can be expected to
appear in large numbers as the result of more effective wildcarding
made possible by improvements in OVS userspace flow classifier.
Perf results for this test (master):
Events: 305K cycles
+ 8.43% ovs-vswitchd [kernel.kallsyms] [k] mutex_spin_on_owner
+ 5.64% ovs-vswitchd [kernel.kallsyms] [k] __ticket_spin_lock
+ 4.75% ovs-vswitchd ovs-vswitchd [.] find_match_wc
+ 3.32% ovs-vswitchd libpthread-2.15.so [.] pthread_mutex_lock
+ 2.61% ovs-vswitchd [kernel.kallsyms] [k] pcpu_alloc_area
+ 2.19% ovs-vswitchd ovs-vswitchd [.] flow_hash_in_minimask_range
+ 2.03% swapper [kernel.kallsyms] [k] intel_idle
+ 1.84% ovs-vswitchd libpthread-2.15.so [.] pthread_mutex_unlock
+ 1.64% ovs-vswitchd ovs-vswitchd [.] classifier_lookup
+ 1.58% ovs-vswitchd libc-2.15.so [.] 0x7f4e6
+ 1.07% ovs-vswitchd [kernel.kallsyms] [k] memset
+ 1.03% netperf [kernel.kallsyms] [k] __ticket_spin_lock
+ 0.92% swapper [kernel.kallsyms] [k] __ticket_spin_lock
...
And after this patch:
Events: 356K cycles
+ 6.85% ovs-vswitchd ovs-vswitchd [.] find_match_wc
+ 4.63% ovs-vswitchd libpthread-2.15.so [.] pthread_mutex_lock
+ 3.06% ovs-vswitchd [kernel.kallsyms] [k] __ticket_spin_lock
+ 2.81% ovs-vswitchd ovs-vswitchd [.] flow_hash_in_minimask_range
+ 2.51% ovs-vswitchd libpthread-2.15.so [.] pthread_mutex_unlock
+ 2.27% ovs-vswitchd ovs-vswitchd [.] classifier_lookup
+ 1.84% ovs-vswitchd libc-2.15.so [.] 0x15d30f
+ 1.74% ovs-vswitchd [kernel.kallsyms] [k] mutex_spin_on_owner
+ 1.47% swapper [kernel.kallsyms] [k] intel_idle
+ 1.34% ovs-vswitchd ovs-vswitchd [.] flow_hash_in_minimask
+ 1.33% ovs-vswitchd ovs-vswitchd [.] rule_actions_unref
+ 1.16% ovs-vswitchd ovs-vswitchd [.] hindex_node_with_hash
+ 1.16% ovs-vswitchd ovs-vswitchd [.] do_xlate_actions
+ 1.09% ovs-vswitchd ovs-vswitchd [.] ofproto_rule_ref
+ 1.01% netperf [kernel.kallsyms] [k] __ticket_spin_lock
...
There is a small increase in kernel spinlock overhead due to the same
spinlock being shared between multiple cores of the same physical CPU,
but that is barely visible in the netperf TCP_CRR test performance
(maybe ~1% performance drop, hard to tell exactly due to variance in
the test results), when testing for kernel module throughput (with no
userspace activity, handful of kernel flows).
On flow setup, a single stats instance is allocated (for the NUMA node
0). As CPUs from multiple NUMA nodes start updating stats, new
NUMA-node specific stats instances are allocated. This allocation on
the packet processing code path is made to never block or look for
emergency memory pools, minimizing the allocation latency. If the
allocation fails, the existing preallocated stats instance is used.
Also, if only CPUs from one NUMA-node are updating the preallocated
stats instance, no additional stats instances are allocated. This
eliminates the need to pre-allocate stats instances that will not be
used, also relieving the stats reader from the burden of reading stats
that are never used.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Jarno Rajahalme [Thu, 27 Mar 2014 19:35:23 +0000 (12:35 -0700)]
openvswitch: Remove 5-tuple optimization.
The 5-tuple optimization becomes unnecessary with a later per-NUMA
node stats patch. Remove it first to make the changes easier to
grasp.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Joe Perches [Tue, 18 Feb 2014 19:15:45 +0000 (11:15 -0800)]
openvswitch: Use ether_addr_copy
It's slightly smaller/faster for some architectures.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Joe Perches [Tue, 4 Feb 2014 01:18:21 +0000 (17:18 -0800)]
openvswitch: flow_netlink: Use pr_fmt to OVS_NLERR output
Add "openvswitch: " prefix to OVS_NLERR output
to match the other OVS_NLERR output of datapath.c
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Joe Perches [Tue, 4 Feb 2014 01:06:46 +0000 (17:06 -0800)]
openvswitch: Use net_ratelimit in OVS_NLERR
Each use of pr_<level>_once has a per-site flag.
Some of the OVS_NLERR messages look as if seeing them
multiple times could be useful, so use net_ratelimit()
instead of pr_info_once.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Daniele Di Proietto [Mon, 3 Feb 2014 22:09:01 +0000 (14:09 -0800)]
openvswitch: Added (unsigned long long) cast in printf
This is necessary, since u64 is not unsigned long long
in all architectures: u64 could be also uint64_t.
Signed-off-by: Daniele Di Proietto <daniele.di.proietto@gmail.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Daniele Di Proietto [Mon, 3 Feb 2014 22:08:29 +0000 (14:08 -0800)]
openvswitch: avoid cast-qual warning in vport_priv
This function must cast a const value to a non const value.
By adding an uintptr_t cast the warning is suppressed.
To avoid the cast (proper solution) several function signatures
must be changed.
Signed-off-by: Daniele Di Proietto <daniele.di.proietto@gmail.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Daniele Di Proietto [Mon, 3 Feb 2014 22:07:43 +0000 (14:07 -0800)]
openvswitch: avoid warnings in vport_from_priv
This change, firstly, avoids declaring the formal parameter const,
since it is treated as non const. (to avoid -Wcast-qual)
Secondly, it cast the pointer from void* to u8*, since it is used
in arithmetic (to avoid -Wpointer-arith)
Signed-off-by: Daniele Di Proietto <daniele.di.proietto@gmail.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Daniele Di Proietto [Thu, 23 Jan 2014 18:56:49 +0000 (10:56 -0800)]
openvswitch: use const in some local vars and casts
In few functions, const formal parameters are assigned or cast to
non-const.
These changes suppress warnings if compiled with -Wcast-qual.
Signed-off-by: Daniele Di Proietto <daniele.di.proietto@gmail.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
David S. Miller [Fri, 16 May 2014 20:34:43 +0000 (16:34 -0400)]
Merge branch 'bonding-next'
Veaceslav Falico says:
====================
bonding: simple macro cleanup
Trivial patchset that converts most of the bonding's macros into inline
functions. It introduces only one macro, BOND_MODE(), which is just
bond->params.mode, better to write/understand/remember.
The only real change is the removal of IFF_UP verification, which always
came in pair with && netif_running(), and is though useless, as it's always
IFF_UP when LINK_STATE_RUNNING.
v2->v3: fix 3/9 to actually invert bond_mode_uses_arp() and add
bond_uses_arp() alongside bond_mode_uses_arp()
v1->v2: use inlined functions instead of macros.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Veaceslav Falico [Thu, 15 May 2014 19:39:59 +0000 (21:39 +0200)]
bonding: replace SLAVE_IS_OK() with bond_slave_can_tx()
They're verifying the same thing (except of IFF_UP, which is implied for
netif_running(), which is also a prerequisite).
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Veaceslav Falico [Thu, 15 May 2014 19:39:58 +0000 (21:39 +0200)]
bonding: rename {, bond_}slave_can_tx and clean it up
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Veaceslav Falico [Thu, 15 May 2014 19:39:57 +0000 (21:39 +0200)]
bonding: convert IS_UP(slave->dev) to inline function
Also, remove the IFF_UP verification cause we can't be netif_running() with
being also IFF_UP.
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Veaceslav Falico [Thu, 15 May 2014 19:39:56 +0000 (21:39 +0200)]
bonding: make IS_IP_TARGET_UNUSABLE_ADDRESS an inline function
Also, use standard IP primitives to check the address.
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>