Linus Torvalds [Mon, 8 Jan 2018 19:51:04 +0000 (11:51 -0800)]
Mark HI and TASKLET softirq synchronous
Way back in 4.9, we committed
4cd13c21b207 ("softirq: Let ksoftirqd do
its job"), and ever since we've had small nagging issues with it. For
example, we've had:
1ff688209e2e ("watchdog: core: make sure the watchdog_worker is not deferred")
8d5755b3f77b ("watchdog: softdog: fire watchdog even if softirqs do not get to run")
217f69743681 ("net: busy-poll: allow preemption in sk_busy_loop()")
all of which worked around some of the effects of that commit.
The DVB people have also complained that the commit causes excessive USB
URB latencies, which seems to be due to the USB code using tasklets to
schedule USB traffic. This seems to be an issue mainly when already
living on the edge, but waiting for ksoftirqd to handle it really does
seem to cause excessive latencies.
Now Hanna Hawa reports that this issue isn't just limited to USB URB and
DVB, but also causes timeout problems for the Marvell SoC team:
"I'm facing kernel panic issue while running raid 5 on sata disks
connected to Macchiatobin (Marvell community board with Armada-8040
SoC with 4 ARMv8 cores of CA72) Raid 5 built with Marvell DMA engine
and async_tx mechanism (ASYNC_TX_DMA [=y]); the DMA driver (mv_xor_v2)
uses a tasklet to clean the done descriptors from the queue"
The latency problem causes a panic:
mv_xor_v2
f0400000.xor: dma_sync_wait: timeout!
Kernel panic - not syncing: async_tx_quiesce: DMA error waiting for transaction
We've discussed simply just reverting the original commit entirely, and
also much more involved solutions (with per-softirq threads etc). This
patch is intentionally stupid and fairly limited, because the issue
still remains, and the other solutions either got sidetracked or had
other issues.
We should probably also consider the timer softirqs to be synchronous
and not be delayed to ksoftirqd (since they were the issue with the
earlier watchdog problems), but that should be done as a separate patch.
This does only the tasklet cases.
Reported-and-tested-by: Hanna Hawa <hannah@marvell.com>
Reported-and-tested-by: Josef Griebichler <griebichler.josef@gmx.at>
Reported-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Takashi Iwai [Tue, 17 Jul 2018 15:26:43 +0000 (17:26 +0200)]
ALSA: rawmidi: Change resized buffers atomically
The SNDRV_RAWMIDI_IOCTL_PARAMS ioctl may resize the buffers and the
current code is racy. For example, the sequencer client may write to
buffer while it being resized.
As a simple workaround, let's switch to the resized buffer inside the
stream runtime lock.
Reported-by: syzbot+52f83f0ea8df16932f7f@syzkaller.appspotmail.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Qu Wenruo [Wed, 11 Jul 2018 05:41:21 +0000 (13:41 +0800)]
btrfs: scrub: Don't use inode page cache in scrub_handle_errored_block()
In commit
ac0b4145d662 ("btrfs: scrub: Don't use inode pages for device
replace") we removed the branch of copy_nocow_pages() to avoid
corruption for compressed nodatasum extents.
However above commit only solves the problem in scrub_extent(), if
during scrub_pages() we failed to read some pages,
sctx->no_io_error_seen will be non-zero and we go to fixup function
scrub_handle_errored_block().
In scrub_handle_errored_block(), for sctx without csum (no matter if
we're doing replace or scrub) we go to scrub_fixup_nodatasum() routine,
which does the similar thing with copy_nocow_pages(), but does it
without the extra check in copy_nocow_pages() routine.
So for test cases like btrfs/100, where we emulate read errors during
replace/scrub, we could corrupt compressed extent data again.
This patch will fix it just by avoiding any "optimization" for
nodatasum, just falls back to the normal fixup routine by try read from
any good copy.
This also solves WARN_ON() or dead lock caused by lame backref iteration
in scrub_fixup_nodatasum() routine.
The deadlock or WARN_ON() won't be triggered before commit
ac0b4145d662
("btrfs: scrub: Don't use inode pages for device replace") since
copy_nocow_pages() have better locking and extra check for data extent,
and it's already doing the fixup work by try to read data from any good
copy, so it won't go scrub_fixup_nodatasum() anyway.
This patch disables the faulty code and will be removed completely in a
followup patch.
Fixes:
ac0b4145d662 ("btrfs: scrub: Don't use inode pages for device replace")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Ursula Braun [Mon, 16 Jul 2018 11:56:52 +0000 (13:56 +0200)]
net/smc: take sock lock in smc_ioctl()
SMC ioctl processing requires the sock lock to work properly in
all thinkable scenarios.
Problem has been found with RaceFuzzer and fixes:
KASAN: null-ptr-deref Read in smc_ioctl
Reported-by: Byoungyoung Lee <lifeasageek@gmail.com>
Reported-by: syzbot+35b2c5aa76fd398b9fd4@syzkaller.appspotmail.com
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 16 Jul 2018 21:42:11 +0000 (14:42 -0700)]
Merge branch 'tg3-fixes'
Siva Reddy Kallam says:
====================
tg3: Update copyright and fix for tx timeout with 5762
First patch:
Update copyright
Second patch:
Add higher cpu clock for 5762
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Sanjeev Bansal [Mon, 16 Jul 2018 05:43:32 +0000 (11:13 +0530)]
tg3: Add higher cpu clock for 5762.
This patch has fix for TX timeout while running bi-directional
traffic with 100 Mbps using 5762.
Signed-off-by: Sanjeev Bansal <sanjeevb.bansal@broadcom.com>
Signed-off-by: Siva Reddy Kallam <siva.kallam@broadcom.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Siva Reddy Kallam [Mon, 16 Jul 2018 05:43:31 +0000 (11:13 +0530)]
tg3: Update copyright
Signed-off-by: Siva Reddy Kallam <siva.kallam@broadcom.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John Allen [Mon, 16 Jul 2018 15:29:30 +0000 (10:29 -0500)]
ibmvnic: Fix error recovery on login failure
Testing has uncovered a failure case that is not handled properly. In the
event that a login fails and we are not able to recover on the spot, we
return 0 from do_reset, preventing any error recovery code from being
triggered. Additionally, the state is set to "probed" meaning that when we
are able to trigger the error recovery, the driver always comes up in the
probed state. To handle the case properly, we need to return a failure code
here and set the adapter state to the state that we entered the reset in
indicating the state that we would like to come out of the recovery reset
in.
Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefan Wahren [Sun, 15 Jul 2018 19:53:20 +0000 (21:53 +0200)]
net: lan78xx: Fix race in tx pending skb size calculation
The skb size calculation in lan78xx_tx_bh is in race with the start_xmit,
which could lead to rare kernel oopses. So protect the whole skb walk with
a spin lock. As a benefit we can unlink the skb directly.
This patch was tested on Raspberry Pi 3B+
Link: https://github.com/raspberrypi/linux/issues/2608
Fixes:
55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Floris Bos <bos@je-eigen-domein.nl>
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Sun, 15 Jul 2018 16:35:19 +0000 (09:35 -0700)]
net/ipv6: Do not allow device only routes via the multipath API
Eric reported that reverting the patch that fixed and simplified IPv6
multipath routes means reverting back to invalid userspace notifications.
eg.,
$ ip -6 route add 2001:db8:1::/64 nexthop dev eth0 nexthop dev eth1
only generates a single notification:
2001:db8:1::/64 dev eth0 metric 1024 pref medium
While working on a fix for this problem I found another case that is just
broken completely - a multipath route with a gateway followed by device
followed by gateway:
$ ip -6 ro add 2001:db8:103::/64
nexthop via 2001:db8:1::64
nexthop dev dummy2
nexthop via 2001:db8:3::64
In this case the device only route is dropped completely - no notification
to userpsace but no addition to the FIB either:
$ ip -6 ro ls
2001:db8:1::/64 dev dummy1 proto kernel metric 256 pref medium
2001:db8:2::/64 dev dummy2 proto kernel metric 256 pref medium
2001:db8:3::/64 dev dummy3 proto kernel metric 256 pref medium
2001:db8:103::/64 metric 1024
nexthop via 2001:db8:1::64 dev dummy1 weight 1
nexthop via 2001:db8:3::64 dev dummy3 weight 1 pref medium
fe80::/64 dev dummy1 proto kernel metric 256 pref medium
fe80::/64 dev dummy2 proto kernel metric 256 pref medium
fe80::/64 dev dummy3 proto kernel metric 256 pref medium
Really, IPv6 multipath is just FUBAR'ed beyond repair when it comes to
device only routes, so do not allow it all.
This change will break any scripts relying on the mpath api for insert,
but I don't see any other way to handle the permutations. Besides, since
the routes are added to the FIB as standalone (non-multipath) routes the
kernel is not doing what the user requested, so it might as well tell the
user that.
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefan Baranoff [Sun, 15 Jul 2018 15:36:37 +0000 (11:36 -0400)]
tcp: Fix broken repair socket window probe patch
Correct previous bad attempt at allowing sockets to come out of TCP
repair without sending window probes. To avoid changing size of
the repair variable in struct tcp_sock, this lets the decision for
sending probes or not to be made when coming out of repair by
introducing two ways to turn it off.
v2:
* Remove erroneous comment; defines now make behavior clear
Fixes:
70b7ff130224 ("tcp: allow user to create repair socket without window probes")
Signed-off-by: Stefan Baranoff <sbaranoff@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Saeed Mahameed [Sun, 15 Jul 2018 10:54:39 +0000 (13:54 +0300)]
net/mlx4_en: Don't reuse RX page when XDP is set
When a new rx packet arrives, the rx path will decide whether to reuse
the remainder of the page or not according to one of the below conditions:
1. frag_info->frag_stride == PAGE_SIZE / 2
2. frags->page_offset + frag_info->frag_size > PAGE_SIZE;
The first condition is no met for when XDP is set.
For XDP, page_offset is always set to priv->rx_headroom which is
XDP_PACKET_HEADROOM and frag_info->frag_size is around mtu size + some
padding, still the 2nd release condition will hold since
XDP_PACKET_HEADROOM + 1536 < PAGE_SIZE, as a result the page will not
be released and will be _wrongly_ reused for next free rx descriptor.
In XDP there is an assumption to have a page per packet and reuse can
break such assumption and might cause packet data corruptions.
Fix this by adding an extra condition (!priv->rx_headroom) to the 2nd
case to avoid page reuse when XDP is set, since rx_headroom is set to 0
for non XDP setup and set to XDP_PACKET_HEADROOM for XDP setup.
No additional cache line is required for the new condition.
Fixes:
34db548bfb95 ("mlx4: add page recycling in receive path")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Suggested-by: Martin KaFai Lau <kafai@fb.com>
CC: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Randy Dunlap [Sat, 14 Jul 2018 04:25:19 +0000 (21:25 -0700)]
net/ethernet/freescale/fman: fix cross-build error
CC [M] drivers/net/ethernet/freescale/fman/fman.o
In file included from ../drivers/net/ethernet/freescale/fman/fman.c:35:
../include/linux/fsl/guts.h: In function 'guts_set_dmacr':
../include/linux/fsl/guts.h:165:2: error: implicit declaration of function 'clrsetbits_be32' [-Werror=implicit-function-declaration]
clrsetbits_be32(&guts->dmacr, 3 << shift, device << shift);
^~~~~~~~~~~~~~~
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Madalin Bucur <madalin.bucur@nxp.com>
Cc: netdev@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Hemminger [Fri, 13 Jul 2018 17:38:38 +0000 (10:38 -0700)]
hv/netvsc: fix handling of fallback to single queue mode
The netvsc device may need to fallback to running in single queue
mode if host side only wants to support single queue.
Recent change for handling mtu broke this in setup logic.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes:
3ffe64f1a641 ("hv_netvsc: split sub-channel setup into async and sync")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Falcon [Fri, 13 Jul 2018 17:03:32 +0000 (12:03 -0500)]
ibmvnic: Revise RX/TX queue error messages
During a device failover, there may be latency between the loss
of the current backing device and a notification from firmware that
a failover has occurred. This latency can result in a large amount of
error printouts as firmware returns outgoing traffic with a generic
error code. These are not necessarily errors in this case as the
firmware is busy swapping in a new backing adapter and is not ready
to send packets yet. This patch reclassifies those error codes as
warnings with an explanation that a failover may be pending. All
other return codes will be considered errors.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sabrina Dubroca [Fri, 13 Jul 2018 15:21:42 +0000 (17:21 +0200)]
ipv6: make DAD fail with enhanced DAD when nonce length differs
Commit
adc176c54722 ("ipv6 addrconf: Implemented enhanced DAD (RFC7527)")
added enhanced DAD with a nonce length of 6 bytes. However, RFC7527
doesn't specify the length of the nonce, other than being 6 + 8*k bytes,
with integer k >= 0 (RFC3971 5.3.2). The current implementation simply
assumes that the nonce will always be 6 bytes, but others systems are
free to choose different sizes.
If another system sends a nonce of different length but with the same 6
bytes prefix, it shouldn't be considered as the same nonce. Thus, check
that the length of the received nonce is the same as the length we sent.
Ugly scapy test script running on veth0:
def loop():
pkt=sniff(iface="veth0", filter="icmp6", count=1)
pkt = pkt[0]
b = bytearray(pkt[Raw].load)
b[1] += 1
b += b'\xde\xad\xbe\xef\xde\xad\xbe\xef'
pkt[Raw].load = bytes(b)
pkt[IPv6].plen += 8
# fixup checksum after modifying the payload
pkt[IPv6].payload.cksum -= 0x3b44
if pkt[IPv6].payload.cksum < 0:
pkt[IPv6].payload.cksum += 0xffff
sendp(pkt, iface="veth0")
This should result in DAD failure for any address added to veth0's peer,
but is currently ignored.
Fixes:
adc176c54722 ("ipv6 addrconf: Implemented enhanced DAD (RFC7527)")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Corentin Labbe [Fri, 13 Jul 2018 11:50:15 +0000 (11:50 +0000)]
net: ethernet: stmmac: fix documentation warning
This patch remove the following documentation warning
drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c:103: warning: Excess function parameter 'priv' description in 'stmmac_axi_setup'
It was introduced in commit
afea03656add7 ("stmmac: rework DMA bus setting and introduce new platform AXI structure")
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Corentin Labbe [Fri, 13 Jul 2018 11:48:42 +0000 (11:48 +0000)]
net: stmmac: dwmac-sun8i: fix typo descrive => describe
This patch fix a typo in the word Describe
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prashant Bhole [Fri, 13 Jul 2018 05:40:50 +0000 (14:40 +0900)]
net: ip6_gre: get ipv6hdr after skb_cow_head()
A KASAN:use-after-free bug was found related to ip6-erspan
while running selftests/net/ip6_gre_headroom.sh
It happens because of following sequence:
- ipv6hdr pointer is obtained from skb
- skb_cow_head() is called, skb->head memory is reallocated
- old data is accessed using ipv6hdr pointer
skb_cow_head() call was added in
e41c7c68ea77 ("ip6erspan: make sure
enough headroom at xmit."), but looking at the history there was a
chance of similar bug because gre_handle_offloads() and pskb_trim()
can also reallocate skb->head memory. Fixes tag points to commit
which introduced possibility of this bug.
This patch moves ipv6hdr pointer assignment after skb_cow_head() call.
Fixes:
5a963eb61b7c ("ip6_gre: Add ERSPAN native tunnel support")
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Toshiaki Makita [Fri, 13 Jul 2018 04:24:38 +0000 (13:24 +0900)]
tun: Fix use-after-free on XDP_TX
On XDP_TX we need to free up the frame only when tun_xdp_tx() returns a
negative value. A positive value indicates that the packet is
successfully enqueued to the ptr_ring, so freeing the page causes
use-after-free.
Fixes:
735fc4054b3a ("xdp: change ndo_xdp_xmit API to support bulking")
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Masanari Iida [Thu, 12 Jul 2018 16:05:17 +0000 (01:05 +0900)]
bonding: Fix a typo in bonding.txt
This patch fixes a spelling typo in bonding.txt
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dave Watson [Thu, 12 Jul 2018 15:03:43 +0000 (08:03 -0700)]
tls: Stricter error checking in zerocopy sendmsg path
In the zerocopy sendmsg() path, there are error checks to revert
the zerocopy if we get any error code. syzkaller has discovered
that tls_push_record can return -ECONNRESET, which is fatal, and
happens after the point at which it is safe to revert the iter,
as we've already passed the memory to do_tcp_sendpages.
Previously this code could return -ENOMEM and we would want to
revert the iter, but AFAIK this no longer returns ENOMEM after
a447da7d004 ("tls: fix waitall behavior in tls_sw_recvmsg"),
so we fail for all error codes.
Reported-by: syzbot+c226690f7b3126c5ee04@syzkaller.appspotmail.com
Reported-by: syzbot+709f2810a6a05f11d4d3@syzkaller.appspotmail.com
Signed-off-by: Dave Watson <davejwatson@fb.com>
Fixes:
3c4d7559159b ("tls: kernel TLS support")
Signed-off-by: David S. Miller <davem@davemloft.net>
Constantine Shulyupin [Thu, 12 Jul 2018 05:28:46 +0000 (08:28 +0300)]
scripts/tags.sh: Add BPF_CALL
Signed-off-by: Constantine Shulyupin <const@MakeLinux.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Biggers [Wed, 11 Jul 2018 17:46:29 +0000 (10:46 -0700)]
KEYS: DNS: fix parsing multiple options
My recent fix for dns_resolver_preparse() printing very long strings was
incomplete, as shown by syzbot which still managed to hit the
WARN_ONCE() in set_precision() by adding a crafted "dns_resolver" key:
precision 50001 too large
WARNING: CPU: 7 PID: 864 at lib/vsprintf.c:2164 vsnprintf+0x48a/0x5a0
The bug this time isn't just a printing bug, but also a logical error
when multiple options ("#"-separated strings) are given in the key
payload. Specifically, when separating an option string into name and
value, if there is no value then the name is incorrectly considered to
end at the end of the key payload, rather than the end of the current
option. This bypasses validation of the option length, and also means
that specifying multiple options is broken -- which presumably has gone
unnoticed as there is currently only one valid option anyway.
A similar problem also applied to option values, as the kstrtoul() when
parsing the "dnserror" option will read past the end of the current
option and into the next option.
Fix these bugs by correctly computing the length of the option name and
by copying the option value, null-terminated, into a temporary buffer.
Reproducer for the WARN_ONCE() that syzbot hit:
perl -e 'print "#A#", "\0" x 50000' | keyctl padd dns_resolver desc @s
Reproducer for "dnserror" option being parsed incorrectly (expected
behavior is to fail when seeing the unknown option "foo", actual
behavior was to read the dnserror value as "1#foo" and fail there):
perl -e 'print "#dnserror=1#foo\0"' | keyctl padd dns_resolver desc @s
Reported-by: syzbot <syzkaller@googlegroups.com>
Fixes:
4a2d789267e0 ("DNS: If the DNS server returns an error, allow that to be cached [ver #2]")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 16 Jul 2018 18:20:07 +0000 (11:20 -0700)]
Merge branch 'multicast-init-as-INCLUDE-when-join-SSM-INCLUDE-group'
Hangbin Liu says:
====================
multicast: init as INCLUDE when join SSM INCLUDE group
Based on RFC3376 5.1 and RFC3810 6.1, we should init as INCLUDE when join SSM
INCLUDE group. In my first version I only clear the group change record. But
this is not enough as when a new group join, it will init as EXCLUDE and
trigger an filter mode change in ip/ip6_mc_add_src(), which will clear all
source addresses' sf_crcount. This will prevent early joined address sending
state change records if multi source addresses joined at the same time.
In this v2 patchset, I fixed it by directly initializing the mode to INCLUDE
for SSM JOIN_SOURCE_GROUP. I also split the original patch into two separated
patches for IPv4 and IPv6.
Test: test by myself and customer.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Hangbin Liu [Tue, 10 Jul 2018 14:41:27 +0000 (22:41 +0800)]
ipv6/mcast: init as INCLUDE when join SSM INCLUDE group
This an IPv6 version patch of "ipv4/igmp: init group mode as INCLUDE when
join source group". From RFC3810, part 6.1:
If no per-interface state existed for that
multicast address before the change (i.e., the change consisted of
creating a new per-interface record), or if no state exists after the
change (i.e., the change consisted of deleting a per-interface
record), then the "non-existent" state is considered to have an
INCLUDE filter mode and an empty source list.
Which means a new multicast group should start with state IN(). Currently,
for MLDv2 SSM JOIN_SOURCE_GROUP mode, we first call ipv6_sock_mc_join(),
then ip6_mc_source(), which will trigger a TO_IN() message instead of
ALLOW().
The issue was exposed by commit
a052517a8ff65 ("net/multicast: should not
send source list records when have filter mode change"). Before this change,
we sent both ALLOW(A) and TO_IN(A). Now, we only send TO_IN(A).
Fix it by adding a new parameter to init group mode. Also add some wrapper
functions to avoid changing too much code.
v1 -> v2:
In the first version I only cleared the group change record. But this is not
enough. Because when a new group join, it will init as EXCLUDE and trigger
a filter mode change in ip/ip6_mc_add_src(), which will clear all source
addresses sf_crcount. This will prevent early joined address sending state
change records if multi source addressed joined at the same time.
In v2 patch, I fixed it by directly initializing the mode to INCLUDE for SSM
JOIN_SOURCE_GROUP. I also split the original patch into two separated patches
for IPv4 and IPv6.
There is also a difference between v4 and v6 version. For IPv6, when the
interface goes down and up, we will send correct state change record with
unspecified IPv6 address (::) with function ipv6_mc_up(). But after DAD is
completed, we resend the change record TO_IN() in mld_send_initial_cr().
Fix it by sending ALLOW() for INCLUDE mode in mld_send_initial_cr().
Fixes:
a052517a8ff65 ("net/multicast: should not send source list records when have filter mode change")
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hangbin Liu [Tue, 10 Jul 2018 14:41:26 +0000 (22:41 +0800)]
ipv4/igmp: init group mode as INCLUDE when join source group
Based on RFC3376 5.1
If no interface
state existed for that multicast address before the change (i.e., the
change consisted of creating a new per-interface record), or if no
state exists after the change (i.e., the change consisted of deleting
a per-interface record), then the "non-existent" state is considered
to have a filter mode of INCLUDE and an empty source list.
Which means a new multicast group should start with state IN().
Function ip_mc_join_group() works correctly for IGMP ASM(Any-Source Multicast)
mode. It adds a group with state EX() and inits crcount to mc_qrv,
so the kernel will send a TO_EX() report message after adding group.
But for IGMPv3 SSM(Source-specific multicast) JOIN_SOURCE_GROUP mode, we
split the group joining into two steps. First we join the group like ASM,
i.e. via ip_mc_join_group(). So the state changes from IN() to EX().
Then we add the source-specific address with INCLUDE mode. So the state
changes from EX() to IN(A).
Before the first step sends a group change record, we finished the second
step. So we will only send the second change record. i.e. TO_IN(A).
Regarding the RFC stands, we should actually send an ALLOW(A) message for
SSM JOIN_SOURCE_GROUP as the state should mimic the 'IN() to IN(A)'
transition.
The issue was exposed by commit
a052517a8ff65 ("net/multicast: should not
send source list records when have filter mode change"). Before this change,
we used to send both ALLOW(A) and TO_IN(A). After this change we only send
TO_IN(A).
Fix it by adding a new parameter to init group mode. Also add new wrapper
functions so we don't need to change too much code.
v1 -> v2:
In my first version I only cleared the group change record. But this is not
enough. Because when a new group join, it will init as EXCLUDE and trigger
an filter mode change in ip/ip6_mc_add_src(), which will clear all source
addresses' sf_crcount. This will prevent early joined address sending state
change records if multi source addressed joined at the same time.
In v2 patch, I fixed it by directly initializing the mode to INCLUDE for SSM
JOIN_SOURCE_GROUP. I also split the original patch into two separated patches
for IPv4 and IPv6.
Fixes:
a052517a8ff65 ("net/multicast: should not send source list records when have filter mode change")
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Mon, 16 Jul 2018 17:24:52 +0000 (10:24 -0700)]
Merge tag 'pinctrl-v4.18-3' of git://git./linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
- A slew of driver fixes for Mediatek mt7622
- Fix a direction inversion bug in the Ingenic driver
- Fix unsupported drive strength setting on the PFC r8a77970
- Off by one and NULL dereference fixes in the NSP driver
* tag 'pinctrl-v4.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: nsp: Fix potential NULL dereference
pinctrl: nsp: off by ones in nsp_pinmux_enable()
pinctrl: sh-pfc: r8a77970: remove SH_PFC_PIN_CFG_DRIVE_STRENGTH flag
pinctrl: ingenic: Fix inverted direction for < JZ4770
pinctrl: mt7622: fix a kernel panic when gpio-hog is being applied
pinctrl: mt7622: stop using the deprecated pinctrl_add_gpio_range
pinctrl: mt7622: fix that pinctrl_claim_hogs cannot work
pinctrl: mt7622: fix initialization sequence between eint and gpiochip
pinctrl: mt7622: fix error path on failing at groups building
Linus Torvalds [Mon, 16 Jul 2018 17:20:43 +0000 (10:20 -0700)]
Merge tag 'drm-fixes-2018-07-16-1' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
- two AGP fixes in here
- a bunch of mostly amdgpu fixes
- sun4i build fix
- two armada fixes
- some tegra fixes
- one i915 core and one i915 gvt fix
* tag 'drm-fixes-2018-07-16-1' of git://anongit.freedesktop.org/drm/drm:
drm/amdgpu/pp/smu7: use a local variable for toc indexing
amd/dc/dce100: On dce100, set clocks to 0 on suspend
drm/amd/display: Convert 10kHz clks from PPLib into kHz for Vega
drm/amdgpu: Verify root PD is mapped into kernel address space (v4)
drm/amd/display: fix invalid function table override
drm/amdgpu: Reserve VM root shared fence slot for command submission (v3)
Revert "drm/amd/display: Don't return ddc result and read_bytes in same return value"
char: amd64-agp: Use 64-bit arithmetic instead of 32-bit
char: agp: Change return type to vm_fault_t
drm/i915: Fix hotplug irq ack on i965/g4x
drm/armada: fix irq handling
drm/armada: fix colorkey mode property
drm/tegra: Fix comparison operator for buffer size
gpu: host1x: Check whether size of unpin isn't 0
gpu: host1x: Skip IOMMU initialization if firewall is enabled
drm/sun4i: link in front-end code if needed
drm/i915/gvt: update vreg on inhibit context lri command
Pavel Tatashin [Mon, 16 Jul 2018 15:16:30 +0000 (11:16 -0400)]
mm: don't do zero_resv_unavail if memmap is not allocated
Moving zero_resv_unavail before memmap_init_zone(), caused a regression on
x86-32.
The cause is that we access struct pages before they are allocated when
CONFIG_FLAT_NODE_MEM_MAP is used.
free_area_init_nodes()
zero_resv_unavail()
mm_zero_struct_page(pfn_to_page(pfn)); <- struct page is not alloced
free_area_init_node()
if CONFIG_FLAT_NODE_MEM_MAP
alloc_node_mem_map()
memblock_virt_alloc_node_nopanic() <- struct page alloced here
On the other hand memblock_virt_alloc_node_nopanic() zeroes all the memory
that it returns, so we do not need to do zero_resv_unavail() here.
Fixes:
e181ae0c5db9 ("mm: zero unavailable pages before memmap init")
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Tested-by: Matt Hart <matt@mattface.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
YOKOTA Hiroshi [Sun, 1 Jul 2018 09:30:01 +0000 (18:30 +0900)]
ALSA: hda/realtek - Add Panasonic CF-SZ6 headset jack quirk
This adds some required quirk when uses headset or headphone on
Panasonic CF-SZ6.
Signed-off-by: YOKOTA Hiroshi <yokota.hgml@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Po-Hsu Lin [Mon, 16 Jul 2018 07:50:08 +0000 (15:50 +0800)]
ALSA: hda: add mute led support for HP ProBook 455 G5
Audio mute led does not work on HP ProBook 455 G5,
this can be fixed by using CXT_FIXUP_MUTE_LED_GPIO to support it.
BugLink: https://bugs.launchpad.net/bugs/1781763
Reported-by: James Buren
Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Frank Rowand [Thu, 12 Jul 2018 21:00:07 +0000 (14:00 -0700)]
of: overlay: update phandle cache on overlay apply and remove
A comment in the review of the patch adding the phandle cache said that
the cache would have to be updated when modules are applied and removed.
This patch implements the cache updates.
Fixes:
0b3ce78e90fc ("of: cache phandle nodes to reduce cost of of_find_node_by_phandle()")
Reported-by: Alan Tull <atull@kernel.org>
Suggested-by: Alan Tull <atull@kernel.org>
Signed-off-by: Frank Rowand <frank.rowand@sony.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Dave Airlie [Mon, 16 Jul 2018 00:32:11 +0000 (10:32 +1000)]
Merge tag 'drm-intel-fixes-2018-07-12' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
I already pulled the first fix, pull the GVT fixes.
- GVT fix for KBL vGPU hang to update virtual register from LRI.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180713070922.GA19840@intel.com
Dave Airlie [Sun, 15 Jul 2018 23:57:22 +0000 (09:57 +1000)]
Merge branch 'drm-armada-fixes' of git://git.armlinux.org.uk/~rmk/linux-arm into drm-fixes
Two armada fixes.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180713075427.GA16160@rmk-PC.armlinux.org.uk
Dave Airlie [Sun, 15 Jul 2018 23:50:26 +0000 (09:50 +1000)]
Merge tag 'drm-misc-fixes-2018-07-13' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
Fixes for v4.18-rc5:
- Single fix for a build error when the driver is builtin,
but the backend is a loadable module.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/9c596cf5-3f24-070e-74f2-c59bfbaf68fa@linux.intel.com
Dave Airlie [Sun, 15 Jul 2018 23:48:54 +0000 (09:48 +1000)]
Merge tag 'drm/tegra/for-4.18-rc5' of git://anongit.freedesktop.org/tegra/linux into drm-fixes
drm/tegra: Fixes for v4.18-rc5
This contains a couple of one- or two-line fixes for various minor
issues in the Tegra driver.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180712070142.15571-1-thierry.reding@gmail.com
Dave Airlie [Sun, 15 Jul 2018 23:45:56 +0000 (09:45 +1000)]
Merge branch 'drm-fixes-4.18' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
A few display and GPUVM fixes for 4.18.
A few more fixes for 4.18. Two display fixes and a fix to avoid a segfault if
the GPU does not power up properly on resume. These are on top of my pull
from earlier this week.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180712043820.2877-1-alexander.deucher@amd.com
Dave Airlie [Sun, 15 Jul 2018 23:42:48 +0000 (09:42 +1000)]
Merge tag 'drm-intel-fixes-2018-07-10' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
- Fix hotplug irq ack on i965/g4x (Ville)
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180710213249.GA16479@intel.com
Linus Torvalds [Sun, 15 Jul 2018 19:49:31 +0000 (12:49 -0700)]
Linux 4.18-rc5
Linus Torvalds [Sun, 15 Jul 2018 16:49:21 +0000 (09:49 -0700)]
Merge tag 'armsoc-fixes' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Olof Johansson:
- A fix for OMAP5 and DRA7 to make the branch predictor hardening
settings take proper effect on secondary cores
- Disable USB OTG on am3517 since current driver isn't working
- Fix thermal sensor register settings on Armada 38x
- Fix suspend/resume IRQs on pxa3xx
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: dts: am3517.dtsi: Disable reference to OMAP3 OTG controller
ARM: DRA7/OMAP5: Enable ACTLR[0] (Enable invalidates of BTB) for secondary cores
ARM: pxa: irq: fix handling of ICMR registers in suspend/resume
ARM: dts: armada-38x: use the new thermal binding
Radim Krčmář [Sun, 15 Jul 2018 15:43:11 +0000 (17:43 +0200)]
x86/kvmclock: set pvti_cpu0_va after enabling kvmclock
pvti_cpu0_va is the address of shared kvmclock data structure.
pvti_cpu0_va is currently kept unset (1) on 32 bit systems, (2) when
kvmclock vsyscall is disabled, and (3) if kvmclock is not stable.
This poses a problem, because kvm_ptp needs pvti_cpu0_va, but (1) can
work on 32 bit, (2) has little relation to the vsyscall, and (3) does
not need stable kvmclock (although kvmclock won't be used for system
clock if it's not stable, so kvm_ptp is pointless in that case).
Expose pvti_cpu0_va whenever kvmclock is enabled to allow all users to
work with it.
This fixes a regression found on Gentoo: https://bugs.gentoo.org/658544.
Fixes:
9f08890ab906 ("x86/pvclock: add setter for pvclock_pvti_cpu0_va")
Cc: stable@vger.kernel.org
Reported-by: Andreas Steinmetz <ast@domdv.de>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Janakarajan Natarajan [Wed, 27 Jun 2018 16:30:53 +0000 (11:30 -0500)]
x86/kvm/Kconfig: Ensure CRYPTO_DEV_CCP_DD state at minimum matches KVM_AMD
Prevent a config where KVM_AMD=y and CRYPTO_DEV_CCP_DD=m thereby ensuring
that AMD Secure Processor device driver will be built-in when KVM_AMD is
also built-in.
v1->v2:
* Removed usage of 'imply' Kconfig option.
* Change patch commit message.
Fixes:
505c9e94d832 ("KVM: x86: prefer "depends on" to "select" for SEV")
Cc: <stable@vger.kernel.org> # 4.16.x
Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Jim Mattson [Wed, 30 May 2018 23:00:02 +0000 (16:00 -0700)]
kvm: nVMX: Restore exit qual for VM-entry failure due to MSR loading
This exit qualification was inadvertently dropped when the two
VM-entry failure blocks were coalesced.
Fixes:
e79f245ddec1 ("X86/KVM: Properly update 'tsc_offset' to represent the running guest")
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Vitaly Kuznetsov [Wed, 11 Jul 2018 17:37:18 +0000 (19:37 +0200)]
x86/kvm/vmx: don't read current->thread.{fs,gs}base of legacy tasks
When we switched from doing rdmsr() to reading FS/GS base values from
current->thread we completely forgot about legacy 32-bit userspaces which
we still support in KVM (why?). task->thread.{fsbase,gsbase} are only
synced for 64-bit processes, calling save_fsgs_for_kvm() and using
its result from current is illegal for legacy processes.
There's no ARCH_SET_FS/GS prctls for legacy applications. Base MSRs are,
however, not always equal to zero. Intel's manual says (3.4.4 Segment
Loading Instructions in IA-32e Mode):
"In order to set up compatibility mode for an application, segment-load
instructions (MOV to Sreg, POP Sreg) work normally in 64-bit mode. An
entry is read from the system descriptor table (GDT or LDT) and is loaded
in the hidden portion of the segment register.
...
The hidden descriptor register fields for FS.base and GS.base are
physically mapped to MSRs in order to load all address bits supported by
a 64-bit implementation.
"
The issue was found by strace test suite where 32-bit ioctl_kvm_run test
started segfaulting.
Reported-by: Dmitry V. Levin <ldv@altlinux.org>
Bisected-by: Masatake YAMATO <yamato@redhat.com>
Fixes:
42b933b59721 ("x86/kvm/vmx: read MSR_{FS,KERNEL_GS}_BASE from current->thread")
Cc: stable@vger.kernel.org
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Mon, 25 Jun 2018 12:04:37 +0000 (14:04 +0200)]
KVM: VMX: support MSR_IA32_ARCH_CAPABILITIES as a feature MSR
This lets userspace read the MSR_IA32_ARCH_CAPABILITIES and check that all
requested features are available on the host.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Linus Torvalds [Sat, 14 Jul 2018 23:15:19 +0000 (16:15 -0700)]
Merge tag 'rtc-4.18-2' of git://git./linux/kernel/git/abelloni/linux
Pull RTC fixes from Alexandre Belloni:
"Two fixes for 4.18:
- an important core fix for RTCs using the core offsetting only one
driver is affected
- a fix for the error path of mrst"
* tag 'rtc-4.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux:
rtc: fix alarm read and set offset
rtc: mrst: fix error code in probe()
Olof Johansson [Sat, 14 Jul 2018 22:14:02 +0000 (15:14 -0700)]
Merge tag 'omap-for-v4.18/fixes-rc4-signed' of git://git./linux/kernel/git/tmlind/linux-omap into fixes
Two omap fixes for v4.18-rc cycle
Turns out the recent patches for ARM branch predictor hardening are
not working on omap5 and dra7 as planned because the secondary CPU
is parked to the bootrom code. We can't configure it in the bootloader.
So we must enable invalidates of BTB for omap5 and dra7 secondary
core in the kernel.
And there's a fix for reserved register access for am3517. The
usb otg module on am3517 is not the same as for other omap3.
* tag 'omap-for-v4.18/fixes-rc4-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
ARM: dts: am3517.dtsi: Disable reference to OMAP3 OTG controller
ARM: DRA7/OMAP5: Enable ACTLR[0] (Enable invalidates of BTB) for secondary cores
Signed-off-by: Olof Johansson <olof@lixom.net>
Olof Johansson [Sat, 14 Jul 2018 22:12:24 +0000 (15:12 -0700)]
Merge tag 'mvebu-fixes-4.18-1' of git://git.infradead.org/linux-mvebu into fixes
mvebu fixes for 4.18 (part 1)
Use the new thermal binding on Armada 38x allowing to use a driver fix
which is already part of the kernel.
* tag 'mvebu-fixes-4.18-1' of git://git.infradead.org/linux-mvebu:
ARM: dts: armada-38x: use the new thermal binding
Signed-off-by: Olof Johansson <olof@lixom.net>
Olof Johansson [Sat, 14 Jul 2018 22:11:41 +0000 (15:11 -0700)]
Merge tag 'pxa-fixes-4.18' of https://github.com/rjarzmik/linux into fixes
This is the fixes set for v4.18 cycle.
This is a fix for suspending all pxa3xx platforms, where high
number interrupts are not reenabled.
* tag 'pxa-fixes-4.18' of https://github.com/rjarzmik/linux:
ARM: pxa: irq: fix handling of ICMR registers in suspend/resume
Signed-off-by: Olof Johansson <olof@lixom.net>
Linus Torvalds [Sat, 14 Jul 2018 19:30:13 +0000 (12:30 -0700)]
Merge tag 'for-linus-4.18-rc5-tag' of git://git./linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
"Two related fixes for a boot failure of Xen PV guests"
* tag 'for-linus-4.18-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: setup pv irq ops vector earlier
xen: remove global bit from __default_kernel_pte_mask for pv guests
Linus Torvalds [Sat, 14 Jul 2018 19:28:00 +0000 (12:28 -0700)]
Merge tag 'for-linus-
20180713' of git://git.kernel.dk/linux-block
Pull block fix from Jens Axboe:
"Just a single regression fix (from 4.17) for bsg, fixing an EINVAL
return on non-data commands"
* tag 'for-linus-
20180713' of git://git.kernel.dk/linux-block:
bsg: fix bogus EINVAL on non-data commands
Linus Torvalds [Sat, 14 Jul 2018 18:14:33 +0000 (11:14 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
"11 fixes"
* emailed patches form Andrew Morton <akpm@linux-foundation.org>:
reiserfs: fix buffer overflow with long warning messages
checkpatch: fix duplicate invalid vsprintf pointer extension '%p<foo>' messages
mm: do not bug_on on incorrect length in __mm_populate()
mm/memblock.c: do not complain about top-down allocations for !MEMORY_HOTREMOVE
fs, elf: make sure to page align bss in load_elf_library
x86/purgatory: add missing FORCE to Makefile target
net/9p/client.c: put refcount of trans_mod in error case in parse_opts()
mm: allow arch to supply p??_free_tlb functions
autofs: fix slab out of bounds read in getname_kernel()
fs/proc/task_mmu.c: fix Locked field in /proc/pid/smaps*
mm: do not drop unused pages when userfaultd is running
Eric Biggers [Fri, 13 Jul 2018 23:59:27 +0000 (16:59 -0700)]
reiserfs: fix buffer overflow with long warning messages
ReiserFS prepares log messages into a 1024-byte buffer with no bounds
checks. Long messages, such as the "unknown mount option" warning when
userspace passes a crafted mount options string, overflow this buffer.
This causes KASAN to report a global-out-of-bounds write.
Fix it by truncating messages to the buffer size.
Link: http://lkml.kernel.org/r/20180707203621.30922-1-ebiggers3@gmail.com
Fixes:
1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot+b890b3335a4d8c608963@syzkaller.appspotmail.com
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Fri, 13 Jul 2018 23:59:23 +0000 (16:59 -0700)]
checkpatch: fix duplicate invalid vsprintf pointer extension '%p<foo>' messages
Multiline statements with invalid %p<foo> uses produce multiple
warnings. Fix that.
e.g.:
$ cat t_block.c
void foo(void)
{
MY_DEBUG(drv->foo,
"%pk",
foo->boo);
}
$ ./scripts/checkpatch.pl -f t_block.c
WARNING: Missing or malformed SPDX-License-Identifier tag in line 1
#1: FILE: t_block.c:1:
+void foo(void)
WARNING: Invalid vsprintf pointer extension '%pk'
#3: FILE: t_block.c:3:
+ MY_DEBUG(drv->foo,
+ "%pk",
+ foo->boo);
WARNING: Invalid vsprintf pointer extension '%pk'
#3: FILE: t_block.c:3:
+ MY_DEBUG(drv->foo,
+ "%pk",
+ foo->boo);
total: 0 errors, 3 warnings, 6 lines checked
NOTE: For some of the reported defects, checkpatch may be able to
mechanically convert to the typical style using --fix or --fix-inplace.
t_block.c has style problems, please review.
NOTE: If any of the errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.
Link: http://lkml.kernel.org/r/9e8341bbe4c9877d159cb512bb701043cbfbb10b.camel@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Cc: "Tobin C. Harding" <me@tobin.cc>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Fri, 13 Jul 2018 23:59:20 +0000 (16:59 -0700)]
mm: do not bug_on on incorrect length in __mm_populate()
syzbot has noticed that a specially crafted library can easily hit
VM_BUG_ON in __mm_populate
kernel BUG at mm/gup.c:1242!
invalid opcode: 0000 [#1] SMP
CPU: 2 PID: 9667 Comm: a.out Not tainted 4.18.0-rc3 #644
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017
RIP: 0010:__mm_populate+0x1e2/0x1f0
Code: 55 d0 65 48 33 14 25 28 00 00 00 89 d8 75 21 48 83 c4 20 5b 41 5c 41 5d 41 5e 41 5f 5d c3 e8 75 18 f1 ff 0f 0b e8 6e 18 f1 ff <0f> 0b 31 db eb c9 e8 93 06 e0 ff 0f 1f 00 55 48 89 e5 53 48 89 fb
Call Trace:
vm_brk_flags+0xc3/0x100
vm_brk+0x1f/0x30
load_elf_library+0x281/0x2e0
__ia32_sys_uselib+0x170/0x1e0
do_fast_syscall_32+0xca/0x420
entry_SYSENTER_compat+0x70/0x7f
The reason is that the length of the new brk is not page aligned when we
try to populate the it. There is no reason to bug on that though.
do_brk_flags already aligns the length properly so the mapping is
expanded as it should. All we need is to tell mm_populate about it.
Besides that there is absolutely no reason to to bug_on in the first
place. The worst thing that could happen is that the last page wouldn't
get populated and that is far from putting system into an inconsistent
state.
Fix the issue by moving the length sanitization code from do_brk_flags
up to vm_brk_flags. The only other caller of do_brk_flags is brk
syscall entry and it makes sure to provide the proper length so t here
is no need for sanitation and so we can use do_brk_flags without it.
Also remove the bogus BUG_ONs.
[osalvador@techadventures.net: fix up vm_brk_flags s@request@len@]
Link: http://lkml.kernel.org/r/20180706090217.GI32658@dhcp22.suse.cz
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: syzbot <syzbot+5dcb560fe12aa5091c06@syzkaller.appspotmail.com>
Tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Fri, 13 Jul 2018 23:59:16 +0000 (16:59 -0700)]
mm/memblock.c: do not complain about top-down allocations for !MEMORY_HOTREMOVE
Mike Rapoport is converting architectures from bootmem to nobootmem
allocator. While doing so for m68k Geert has noticed that he gets a
scary looking warning:
WARNING: CPU: 0 PID: 0 at mm/memblock.c:230
memblock_find_in_range_node+0x11c/0x1be
memblock: bottom-up allocation failed, memory hotunplug may be affected
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted
4.18.0-rc3-atari-01343-gf2fb5f2e09a97a3c-dirty #7
Call Trace: __warn+0xa8/0xc2
kernel_pg_dir+0x0/0x1000
netdev_lower_get_next+0x2/0x22
warn_slowpath_fmt+0x2e/0x36
memblock_find_in_range_node+0x11c/0x1be
memblock_find_in_range_node+0x11c/0x1be
memblock_find_in_range_node+0x0/0x1be
vprintk_func+0x66/0x6e
memblock_virt_alloc_internal+0xd0/0x156
netdev_lower_get_next+0x2/0x22
netdev_lower_get_next+0x2/0x22
kernel_pg_dir+0x0/0x1000
memblock_virt_alloc_try_nid_nopanic+0x58/0x7a
netdev_lower_get_next+0x2/0x22
kernel_pg_dir+0x0/0x1000
kernel_pg_dir+0x0/0x1000
EXPTBL+0x234/0x400
EXPTBL+0x234/0x400
alloc_node_mem_map+0x4a/0x66
netdev_lower_get_next+0x2/0x22
free_area_init_node+0xe2/0x29e
EXPTBL+0x234/0x400
paging_init+0x430/0x462
kernel_pg_dir+0x0/0x1000
printk+0x0/0x1a
EXPTBL+0x234/0x400
setup_arch+0x1b8/0x22c
start_kernel+0x4a/0x40a
_sinittext+0x344/0x9e8
The warning is basically saying that a top-down allocation can break
memory hotremove because memblock allocation is not movable. But m68k
doesn't even support MEMORY_HOTREMOVE so there is no point to warn about
it.
Make the warning conditional only to configurations that care.
Link: http://lkml.kernel.org/r/20180706061750.GH32658@dhcp22.suse.cz
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Sam Creasey <sammy@sammy.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oscar Salvador [Fri, 13 Jul 2018 23:59:13 +0000 (16:59 -0700)]
fs, elf: make sure to page align bss in load_elf_library
The current code does not make sure to page align bss before calling
vm_brk(), and this can lead to a VM_BUG_ON() in __mm_populate() due to
the requested lenght not being correctly aligned.
Let us make sure to align it properly.
Kees: only applicable to CONFIG_USELIB kernels: 32-bit and configured
for libc5.
Link: http://lkml.kernel.org/r/20180705145539.9627-1-osalvador@techadventures.net
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reported-by: syzbot+5dcb560fe12aa5091c06@syzkaller.appspotmail.com
Tested-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Philipp Rudo [Fri, 13 Jul 2018 23:59:09 +0000 (16:59 -0700)]
x86/purgatory: add missing FORCE to Makefile target
- Build the kernel without the fix
- Add some flag to the purgatories KBUILD_CFLAGS,I used
-fno-asynchronous-unwind-tables
- Re-build the kernel
When you look at makes output you see that sha256.o is not re-build in the
last step. Also readelf -S still shows the .eh_frame section for
sha256.o.
With the fix sha256.o is rebuilt in the last step.
Without FORCE make does not detect changes only made to the command line
options. So object files might not be re-built even when they should be.
Fix this by adding FORCE where it is missing.
Link: http://lkml.kernel.org/r/20180704110044.29279-2-prudo@linux.ibm.com
Fixes:
df6f2801f511 ("kernel/kexec_file.c: move purgatories sha256 to common code")
Signed-off-by: Philipp Rudo <prudo@linux.ibm.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org> [4.17+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
piaojun [Fri, 13 Jul 2018 23:59:06 +0000 (16:59 -0700)]
net/9p/client.c: put refcount of trans_mod in error case in parse_opts()
In my testing, the second mount will fail after umounting successfully.
The reason is that we put refcount of trans_mod in the correct case
rather than the error case in parse_opts() at last. That will cause the
refcount decrease to -1, and when we try to get trans_mod again in
try_module_get(), we could only increase refcount to 0 which will cause
failure as follows:
parse_opts
v9fs_get_trans_by_name
try_module_get : return NULL to caller which cause error
So we should put refcount of trans_mod in error case.
Link: http://lkml.kernel.org/r/5B3F39A0.2030509@huawei.com
Fixes:
9421c3e64137ec ("net/9p/client.c: fix potential refcnt problem of trans module")
Signed-off-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Dominique Martinet <dominique.martinet@cea.fr>
Tested-by: Dominique Martinet <dominique.martinet@cea.fr>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nicholas Piggin [Fri, 13 Jul 2018 23:59:03 +0000 (16:59 -0700)]
mm: allow arch to supply p??_free_tlb functions
The mmu_gather APIs keep track of the invalidated address range
including the span covered by invalidated page table pages. Ranges
covered by page tables but not ptes (and therefore no TLBs) still need
to be invalidated because some architectures (x86) can cache
intermediate page table entries, and invalidate those with normal TLB
invalidation instructions to be almost-backward-compatible.
Architectures which don't cache intermediate page table entries, or
which invalidate these caches separately from TLB invalidation, do not
require TLB invalidation range expanded over page tables.
Allow architectures to supply their own p??_free_tlb functions, which
can avoid the __tlb_adjust_range.
Link: http://lkml.kernel.org/r/20180703013131.2807-1-npiggin@gmail.com
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: "Aneesh Kumar K. V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tomas Bortoli [Fri, 13 Jul 2018 23:58:59 +0000 (16:58 -0700)]
autofs: fix slab out of bounds read in getname_kernel()
The autofs subsystem does not check that the "path" parameter is present
for all cases where it is required when it is passed in via the "param"
struct.
In particular it isn't checked for the AUTOFS_DEV_IOCTL_OPENMOUNT_CMD
ioctl command.
To solve it, modify validate_dev_ioctl(function to check that a path has
been provided for ioctl commands that require it.
Link: http://lkml.kernel.org/r/153060031527.26631.18306637892746301555.stgit@pluto.themaw.net
Signed-off-by: Tomas Bortoli <tomasbortoli@gmail.com>
Signed-off-by: Ian Kent <raven@themaw.net>
Reported-by: syzbot+60c837b428dc84e83a93@syzkaller.appspotmail.com
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vlastimil Babka [Fri, 13 Jul 2018 23:58:56 +0000 (16:58 -0700)]
fs/proc/task_mmu.c: fix Locked field in /proc/pid/smaps*
Thomas reports:
"While looking around in /proc on my v4.14.52 system I noticed that all
processes got a lot of "Locked" memory in /proc/*/smaps. A lot more
memory than a regular user can usually lock with mlock().
Commit
493b0e9d945f (in v4.14-rc1) seems to have changed the behavior
of "Locked".
Before that commit the code was like this. Notice the VM_LOCKED check.
(vma->vm_flags & VM_LOCKED) ?
(unsigned long)(mss.pss >> (10 + PSS_SHIFT)) : 0);
After that commit Locked is now the same as Pss:
(unsigned long)(mss->pss >> (10 + PSS_SHIFT)));
This looks like a mistake."
Indeed, the commit has added mss->pss_locked with the correct value that
depends on VM_LOCKED, but forgot to actually use it. Fix it.
Link: http://lkml.kernel.org/r/ebf6c7fb-fec3-6a26-544f-710ed193c154@suse.cz
Fixes:
493b0e9d945f ("mm: add /proc/pid/smaps_rollup")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Thomas Lindroth <thomas.lindroth@gmail.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Daniel Colascione <dancol@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christian Borntraeger [Fri, 13 Jul 2018 23:58:52 +0000 (16:58 -0700)]
mm: do not drop unused pages when userfaultd is running
KVM guests on s390 can notify the host of unused pages. This can result
in pte_unused callbacks to be true for KVM guest memory.
If a page is unused (checked with pte_unused) we might drop this page
instead of paging it. This can have side-effects on userfaultd, when
the page in question was already migrated:
The next access of that page will trigger a fault and a user fault
instead of faulting in a new and empty zero page. As QEMU does not
expect a userfault on an already migrated page this migration will fail.
The most straightforward solution is to ignore the pte_unused hint if a
userfault context is active for this VMA.
Link: http://lkml.kernel.org/r/20180703171854.63981-1-borntraeger@de.ibm.com
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pavel Tatashin [Sat, 14 Jul 2018 13:15:07 +0000 (09:15 -0400)]
mm: zero unavailable pages before memmap init
We must zero struct pages for memory that is not backed by physical
memory, or kernel does not have access to.
Recently, there was a change which zeroed all memmap for all holes in
e820. Unfortunately, it introduced a bug that is discussed here:
https://www.spinics.net/lists/linux-mm/msg156764.html
Linus, also saw this bug on his machine, and confirmed that reverting
commit
124049decbb1 ("x86/e820: put !E820_TYPE_RAM regions into
memblock.reserved") fixes the issue.
The problem is that we incorrectly zero some struct pages after they
were setup.
The fix is to zero unavailable struct pages prior to initializing of
struct pages.
A more detailed fix should come later that would avoid double zeroing
cases: one in __init_single_page(), the other one in
zero_resv_unavail().
Fixes:
124049decbb1 ("x86/e820: put !E820_TYPE_RAM regions into memblock.reserved")
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wei Yongjun [Wed, 11 Jul 2018 12:34:21 +0000 (12:34 +0000)]
pinctrl: nsp: Fix potential NULL dereference
platform_get_resource() may fail and return NULL, so we should
better check it's return value to avoid a NULL pointer dereference
a bit later in the code.
This is detected by Coccinelle semantic patch.
@@
expression pdev, res, n, t, e, e1, e2;
@@
res = platform_get_resource(pdev, t, n);
+ if (!res)
+ return -EINVAL;
... when != res == NULL
e = devm_ioremap_nocache(e1, res->start, e2);
Fixes:
cc4fa83f66e9 ("pinctrl: nsp: add pinmux driver support for Broadcom NSP SoC")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Ray Jui <ray.jui@broadcom.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Dan Carpenter [Tue, 3 Jul 2018 12:04:25 +0000 (15:04 +0300)]
pinctrl: nsp: off by ones in nsp_pinmux_enable()
The > comparisons should be >= or else we read beyond the end of the
pinctrl->functions[] array.
Fixes:
cc4fa83f66e9 ("pinctrl: nsp: add pinmux driver support for Broadcom NSP SoC")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Ray Jui <ray.jui@broadcom.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Niklas Söderlund [Tue, 3 Jul 2018 15:18:42 +0000 (17:18 +0200)]
pinctrl: sh-pfc: r8a77970: remove SH_PFC_PIN_CFG_DRIVE_STRENGTH flag
The datasheet does not document any registers to control drive strength,
and no drive strength registers are for this reason described for this
SoC. The flags indicating that drive strength can be controlled are
however set for some pins in the driver.
This leads to a NULL pointer dereference when the sh-pfc core tries to
access the struct describing the drive strength registers, for example
when reading the sysfs file pinconf-pins.
Fix this by removing the SH_PFC_PIN_CFG_DRIVE_STRENGTH from all pins.
Fixes:
b92ac66a1819602b ("pinctrl: sh-pfc: Add R8A77970 PFC support")
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Paul Cercueil [Wed, 27 Jun 2018 11:49:02 +0000 (13:49 +0200)]
pinctrl: ingenic: Fix inverted direction for < JZ4770
The .gpio_set_direction() callback was setting inverted direction
for SoCs older than the JZ4770, this restores the correct behaviour.
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Sean Wang [Fri, 22 Jun 2018 03:49:08 +0000 (11:49 +0800)]
pinctrl: mt7622: fix a kernel panic when gpio-hog is being applied
When we are explicitly using GPIO hogging mechanism in the pinctrl node,
such as:
&pio {
line_input {
gpio-hog;
gpios = <95 0>, <96 0>, <97 0>;
input;
};
};
A kernel panic happens at dereferencing a NULL pointer: In this case, the
drvdata is still not setup properly yet when it is being accessed.
A better solution for fixing up this issue should be we should obtain the
private data from struct gpio_chip using a specific gpiochip_get_data
instead of a generic dev_get_drvdata.
[ 0.249424] Unable to handle kernel NULL pointer dereference at virtual
address
000000c8
[ 0.257818] Mem abort info:
[ 0.260704] ESR = 0x96000005
[ 0.263869] Exception class = DABT (current EL), IL = 32 bits
[ 0.270011] SET = 0, FnV = 0
[ 0.273167] EA = 0, S1PTW = 0
[ 0.276421] Data abort info:
[ 0.279398] ISV = 0, ISS = 0x00000005
[ 0.283372] CM = 0, WnR = 0
[ 0.286440] [
00000000000000c8] user address but active_mm is swapper
[ 0.293027] Internal error: Oops:
96000005 [#1] PREEMPT SMP
[ 0.298795] Modules linked in:
[ 0.301958] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.16.0-rc1+ #389
[ 0.308716] Hardware name: MediaTek MT7622 RFB1 board (DT)
[ 0.314396] pstate:
80000005 (Nzcv daif -PAN -UAO)
[ 0.319362] pc : mtk_hw_pin_field_get+0x28/0x118
[ 0.324140] lr : mtk_hw_set_value+0x30/0x104
[ 0.328557] sp :
ffffff800801b6d0
[ 0.331983] x29:
ffffff800801b6d0 x28:
ffffff80086b7970
[ 0.337484] x27:
0000000000000000 x26:
ffffff80087b8000
[ 0.342986] x25:
0000000000000000 x24:
ffffffc00324c230
[ 0.348487] x23:
0000000000000003 x22:
0000000000000000
[ 0.353988] x21:
ffffff80087b8000 x20:
0000000000000000
[ 0.359489] x19:
0000000000000054 x18:
00000000fffff7c0
[ 0.364990] x17:
0000000000006300 x16:
000000000000003f
[ 0.370492] x15:
000000000000000e x14:
ffffffffffffffff
[ 0.375993] x13:
0000000000000000 x12:
0000000000000020
[ 0.381494] x11:
0000000000000006 x10:
0101010101010101
[ 0.386995] x9 :
fffffffffffffffa x8 :
0000000000000007
[ 0.392496] x7 :
ffffff80085d63f8 x6 :
0000000000000003
[ 0.397997] x5 :
0000000000000054 x4 :
ffffffc0031eb800
[ 0.403499] x3 :
ffffff800801b728 x2 :
0000000000000003
[ 0.409000] x1 :
0000000000000054 x0 :
0000000000000000
[ 0.414502] Process swapper/0 (pid: 1, stack limit = 0x000000002a913c1c)
[ 0.421441] Call trace:
[ 0.423968] mtk_hw_pin_field_get+0x28/0x118
[ 0.428387] mtk_hw_set_value+0x30/0x104
[ 0.432445] mtk_gpio_set+0x20/0x28
[ 0.436052] mtk_gpio_direction_output+0x18/0x30
[ 0.440833] gpiod_direction_output_raw_commit+0x7c/0xa0
[ 0.446333] gpiod_direction_output+0x104/0x114
[ 0.451022] gpiod_configure_flags+0xbc/0xfc
[ 0.455441] gpiod_hog+0x8c/0x140
[ 0.458869] of_gpiochip_add+0x27c/0x2d4
[ 0.462928] gpiochip_add_data_with_key+0x338/0x5f0
[ 0.467976] mtk_pinctrl_probe+0x388/0x400
[ 0.472217] platform_drv_probe+0x58/0xa4
[ 0.476365] driver_probe_device+0x204/0x44c
[ 0.480783] __device_attach_driver+0xac/0x108
[ 0.485384] bus_for_each_drv+0x7c/0xac
[ 0.489352] __device_attach+0xa0/0x144
[ 0.493320] device_initial_probe+0x10/0x18
[ 0.497647] bus_probe_device+0x2c/0x8c
[ 0.501616] device_add+0x2f8/0x540
[ 0.505226] of_device_add+0x3c/0x44
[ 0.508925] of_platform_device_create_pdata+0x80/0xb8
[ 0.514245] of_platform_bus_create+0x290/0x3e8
[ 0.518933] of_platform_populate+0x78/0x100
[ 0.523352] of_platform_default_populate+0x24/0x2c
[ 0.528403] of_platform_default_populate_init+0x94/0xa4
[ 0.533903] do_one_initcall+0x98/0x130
[ 0.537874] kernel_init_freeable+0x13c/0x1d4
[ 0.542385] kernel_init+0x10/0xf8
[ 0.545903] ret_from_fork+0x10/0x18
[ 0.549603] Code:
900020a1 f9400800 911dcc21 1400001f (
f9406401)
[ 0.555916] ---[ end trace
de8c34787fdad3b3 ]---
[ 0.560722] Kernel panic - not syncing: Attempted to kill init!
exitcode=0x0000000b
[ 0.560722]
[ 0.570188] SMP: stopping secondary CPUs
[ 0.574253] ---[ end Kernel panic - not syncing: Attempted to kill
init! exitcode=0x0000000b
[ 0.574253]
Cc: stable@vger.kernel.org
Fixes:
d6ed93551320 ("pinctrl: mediatek: add pinctrl driver for MT7622 SoC")
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Sean Wang [Fri, 22 Jun 2018 03:49:07 +0000 (11:49 +0800)]
pinctrl: mt7622: stop using the deprecated pinctrl_add_gpio_range
If the pinctrl node has the gpio-ranges property, the range will be added
by the gpio core and doesn't need to be added by the pinctrl driver.
But for keeping backward compatibility, an explicit pinctrl_add_gpio_range
is still needed to be called when there is a missing gpio-ranges in pinctrl
node in old dts files.
Cc: stable@vger.kernel.org
Fixes:
d6ed93551320 ("pinctrl: mediatek: add pinctrl driver for MT7622 SoC")
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Sean Wang [Fri, 22 Jun 2018 03:49:06 +0000 (11:49 +0800)]
pinctrl: mt7622: fix that pinctrl_claim_hogs cannot work
To allow claiming hogs by pinctrl, we cannot enable pinctrl until all
groups and functions are being added done. Also, it's necessary that
the corresponding gpiochip is being added when the pinctrl device is
enabled.
Cc: stable@vger.kernel.org
Fixes:
d6ed93551320 ("pinctrl: mediatek: add pinctrl driver for MT7622 SoC")
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Sean Wang [Fri, 22 Jun 2018 03:49:05 +0000 (11:49 +0800)]
pinctrl: mt7622: fix initialization sequence between eint and gpiochip
Because gpichip applied in the driver must depend on mtk eint to implement
the input data debouncing and the translation between gpio and irq, it's
better to keep logic consistent with mtk eint being built prior to gpiochip
being added.
Cc: stable@vger.kernel.org
Fixes:
e6dabd38d8e7 ("pinctrl: mediatek: add EINT support to MT7622 SoC")
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Sean Wang [Fri, 22 Jun 2018 03:49:04 +0000 (11:49 +0800)]
pinctrl: mt7622: fix error path on failing at groups building
It should be to return an error code when failing at groups building.
Cc: stable@vger.kernel.org
Fixes:
d6ed93551320 ("pinctrl: mediatek: add pinctrl driver for MT7622 SoC")
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
David S. Miller [Sat, 14 Jul 2018 01:30:19 +0000 (18:30 -0700)]
Merge branch 'fix-DCTCP-delayed-ACK'
Yuchung Cheng says:
====================
fix DCTCP delayed ACK
This patch series addresses the issue that sometimes DCTCP
fail to acknowledge the latest sequence and result in sender timeout
if inflight is small.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuchung Cheng [Thu, 12 Jul 2018 13:04:53 +0000 (06:04 -0700)]
tcp: remove DELAYED ACK events in DCTCP
After fixing the way DCTCP tracking delayed ACKs, the delayed-ACK
related callbacks are no longer needed
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuchung Cheng [Thu, 12 Jul 2018 13:04:52 +0000 (06:04 -0700)]
tcp: fix dctcp delayed ACK schedule
Previously, when a data segment was sent an ACK was piggybacked
on the data segment without generating a CA_EVENT_NON_DELAYED_ACK
event to notify congestion control modules. So the DCTCP
ca->delayed_ack_reserved flag could incorrectly stay set when
in fact there were no delayed ACKs being reserved. This could result
in sending a special ECN notification ACK that carries an older
ACK sequence, when in fact there was no need for such an ACK.
DCTCP keeps track of the delayed ACK status with its own separate
state ca->delayed_ack_reserved. Previously it may accidentally cancel
the delayed ACK without updating this field upon sending a special
ACK that carries a older ACK sequence. This inconsistency would
lead to DCTCP receiver never acknowledging the latest data until the
sender times out and retry in some cases.
Packetdrill script (provided by Larry Brakmo)
0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
0.000 setsockopt(3, SOL_TCP, TCP_CONGESTION, "dctcp", 5) = 0
0.000 bind(3, ..., ...) = 0
0.000 listen(3, 1) = 0
0.100 < [ect0] SEW 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
0.100 > SE. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8>
0.110 < [ect0] . 1:1(0) ack 1 win 257
0.200 accept(3, ..., ...) = 4
0.200 < [ect0] . 1:1001(1000) ack 1 win 257
0.200 > [ect01] . 1:1(0) ack 1001
0.200 write(4, ..., 1) = 1
0.200 > [ect01] P. 1:2(1) ack 1001
0.200 < [ect0] . 1001:2001(1000) ack 2 win 257
0.200 write(4, ..., 1) = 1
0.200 > [ect01] P. 2:3(1) ack 2001
0.200 < [ect0] . 2001:3001(1000) ack 3 win 257
0.200 < [ect0] . 3001:4001(1000) ack 3 win 257
0.200 > [ect01] . 3:3(0) ack 4001
0.210 < [ce] P. 4001:4501(500) ack 3 win 257
+0.001 read(4, ..., 4500) = 4500
+0 write(4, ..., 1) = 1
+0 > [ect01] PE. 3:4(1) ack 4501
+0.010 < [ect0] W. 4501:5501(1000) ack 4 win 257
// Previously the ACK sequence below would be 4501, causing a long RTO
+0.040~+0.045 > [ect01] . 4:4(0) ack 5501 // delayed ack
+0.311 < [ect0] . 5501:6501(1000) ack 4 win 257 // More data
+0 > [ect01] . 4:4(0) ack 6501 // now acks everything
+0.500 < F. 9501:9501(0) ack 4 win 257
Reported-by: Larry Brakmo <brakmo@fb.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Thu, 12 Jul 2018 12:23:45 +0000 (15:23 +0300)]
qlogic: check kstrtoul() for errors
We accidentally left out the error handling for kstrtoul().
Fixes:
a520030e326a ("qlcnic: Implement flash sysfs callback for 83xx adapter")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Heimpold [Wed, 11 Jul 2018 21:10:55 +0000 (23:10 +0200)]
net: ethtool: fix spelling mistake: "tubale" -> "tunable"
Signed-off-by: Michael Heimpold <mhei@heimpold.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 13 Jul 2018 22:34:29 +0000 (15:34 -0700)]
Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
- I2C core bugfix regarding bus recovery
- driver bugfix for the tegra driver
- typo correction
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: recovery: if possible send STOP with recovery pulses
i2c: tegra: Fix NACK error handling
i2c: stu300: use non-archaic spelling of failes
David S. Miller [Fri, 13 Jul 2018 21:31:47 +0000 (14:31 -0700)]
Merge git://git./pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2018-07-13
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix AF_XDP TX error reporting before final kernel release such that it
becomes consistent between copy mode and zero-copy, from Magnus.
2) Fix three different syzkaller reported issues: oob due to ld_abs
rewrite with too large offset, another oob in l3 based skb test run
and a bug leaving mangled prog in subprog JITing error path, from Daniel.
3) Fix BTF handling for bitfield extraction on big endian, from Okash.
4) Fix a missing linux/errno.h include in cgroup/BPF found by kbuild bot,
from Roman.
5) Fix xdp2skb_meta.sh sample by using just command names instead of
absolute paths for tc and ip and allow them to be redefined, from Taeung.
6) Fix availability probing for BPF seg6 helpers before final kernel ships
so they can be detected at prog load time, from Mathieu.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefano Brivio [Fri, 13 Jul 2018 11:21:07 +0000 (13:21 +0200)]
skbuff: Unconditionally copy pfmemalloc in __skb_clone()
Commit
8b7008620b84 ("net: Don't copy pfmemalloc flag in
__copy_skb_header()") introduced a different handling for the
pfmemalloc flag in copy and clone paths.
In __skb_clone(), now, the flag is set only if it was set in the
original skb, but not cleared if it wasn't. This is wrong and
might lead to socket buffers being flagged with pfmemalloc even
if the skb data wasn't allocated from pfmemalloc reserves. Copy
the flag instead of ORing it.
Reported-by: Sabrina Dubroca <sd@queasysnail.net>
Fixes:
8b7008620b84 ("net: Don't copy pfmemalloc flag in __copy_skb_header()")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Tested-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 13 Jul 2018 20:36:36 +0000 (13:36 -0700)]
Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull timer fixes from Ingo Molnar:
"A clocksource driver fix and a revert"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clocksource: arm_arch_timer: Set arch_mem_timer cpumask to cpu_possible_mask
Revert "tick: Prefer a lower rating device only if it's CPU local device"
Linus Torvalds [Fri, 13 Jul 2018 20:33:09 +0000 (13:33 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf tool fixes from Ingo Molnar:
"Misc tooling fixes: python3 related fixes, gcc8 fix, bashism fixes and
some other smaller fixes"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf tools: Use python-config --includes rather than --cflags
perf script python: Fix dict reference counting
perf stat: Fix --interval_clear option
perf tools: Fix compilation errors on gcc8
perf test shell: Prevent temporary editor files from being considered test scripts
perf llvm-utils: Remove bashism from kernel include fetch script
perf test shell: Make perf's inet_pton test more portable
perf test shell: Replace '|&' with '2>&1 |' to work with more shells
perf scripts python: Add Python 3 support to EventClass.py
perf scripts python: Add Python 3 support to sched-migration.py
perf scripts python: Add Python 3 support to Util.py
perf scripts python: Add Python 3 support to SchedGui.py
perf scripts python: Add Python 3 support to Core.py
perf tools: Generate a Python script compatible with Python 2 and 3
Linus Torvalds [Fri, 13 Jul 2018 20:30:21 +0000 (13:30 -0700)]
Merge branch 'efi-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull EFI fix from Ingo Molnar:
"Fix a UEFI mixed mode (64-bit kernel on 32-bit UEFI) reboot loop
regression"
* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi/x86: Fix mixed mode reboot loop by removing pointless call to PciIo->Attributes()
Linus Torvalds [Fri, 13 Jul 2018 19:50:42 +0000 (12:50 -0700)]
Merge branch 'core-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull rseq fixes from Ingo Molnar:
"Various rseq ABI fixes and cleanups: use get_user()/put_user(),
validate parameters and use proper uapi types, etc"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
rseq/selftests: cleanup: Update comment above rseq_prepare_unload
rseq: Remove unused types_32_64.h uapi header
rseq: uapi: Declare rseq_cs field as union, update includes
rseq: uapi: Update uapi comments
rseq: Use get_user/put_user rather than __get_user/__put_user
rseq: Use __u64 for rseq_cs fields, validate user inputs
Linus Torvalds [Fri, 13 Jul 2018 19:42:14 +0000 (12:42 -0700)]
Merge tag 'for-linus' of git://git./linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
"Things have been quite slow, only 6 RC patches have been sent to the
list. Regression, user visible bugs, and crashing fixes:
- cxgb4 could wrongly fail MR creation due to a typo
- various crashes if the wrong QP type is mixed in with APIs that
expect other types
- syzkaller oops
- using ERR_PTR and NULL together cases HFI1 to crash in some cases
- mlx5 memory leak in error unwind"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/mlx5: Fix memory leak in mlx5_ib_create_srq() error path
RDMA/uverbs: Don't fail in creation of multiple flows
IB/hfi1: Fix incorrect mixing of ERR_PTR and NULL return values
RDMA/uverbs: Fix slab-out-of-bounds in ib_uverbs_ex_create_flow
RDMA/uverbs: Protect from attempts to create flows on unsupported QP
iw_cxgb4: correctly enforce the max reg_mr depth
Linus Torvalds [Fri, 13 Jul 2018 19:37:45 +0000 (12:37 -0700)]
Merge tag 'vfio-v4.18-rc5' of git://github.com/awilliam/linux-vfio
Pull VFIO fix from Alex Williamson:
"Fix deadlock in mbochs sample driver (Alexey Khoroshilov)"
* tag 'vfio-v4.18-rc5' of git://github.com/awilliam/linux-vfio:
sample: vfio-mdev: avoid deadlock in mdev_access()
Linus Torvalds [Fri, 13 Jul 2018 19:15:12 +0000 (12:15 -0700)]
Merge tag 'kbuild-fixes-v4.18-2' of git://git./linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- update Kbuild and Kconfig documents
- sanitize -I compiler option handling
- update extract-vmlinux script to recognize LZ4 and ZSTD
- fix tools Makefiles
- update tags.sh to handle __ro_after_init
- suppress warnings in case getconf does not recognize LFS_* parameters
* tag 'kbuild-fixes-v4.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: suppress warnings from 'getconf LFS_*'
scripts/tags.sh: add __ro_after_init
tools: build: Use HOSTLDFLAGS with fixdep
tools: build: Fixup host c flags
tools build: fix # escaping in .cmd files for future Make
scripts: teach extract-vmlinux about LZ4 and ZSTD
kbuild: remove duplicated comments about PHONY
kbuild: .PHONY is not a variable, but PHONY is
kbuild: do not drop -I without parameter
kbuild: document the KBUILD_KCONFIG env. variable
kconfig: update user kconfig tools doc.
kbuild: delete INSTALL_FW_PATH from kbuild documentation
kbuild: update ARCH alias info for sparc
kbuild: update ARCH alias info for sh
Linus Torvalds [Fri, 13 Jul 2018 18:48:34 +0000 (11:48 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"Catalin's out enjoying the sunshine, so I'm sending the fixes for a
couple of weeks (although there hopefully won't be any more!).
We've got a revert of a previous fix because it broke the build with
some distro toolchains and a preemption fix when detemining whether or
not the SIMD unit is in use.
Summary:
- Revert back to the 'linux' target for LD, as 'elf' breaks some
distributions
- Fix preemption race when testing whether the vector unit is in use
or not"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: neon: Fix function may_use_simd() return error status
Revert "arm64: Use aarch64elf and aarch64elfb emulation mode variants"
Linus Torvalds [Fri, 13 Jul 2018 18:44:12 +0000 (11:44 -0700)]
Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
"A couple of small fixes this time around from Steven for an
interaction between ftrace and kernel read-only protection, and
Vladimir for nommu"
* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: 8780/1: ftrace: Only set kernel memory back to read-only after boot
ARM: 8775/1: NOMMU: Use instr_sync instead of plain isb in common code
Linus Torvalds [Fri, 13 Jul 2018 18:40:11 +0000 (11:40 -0700)]
Merge tag 'trace-v4.18-rc3-3' of git://git./linux/kernel/git/rostedt/linux-trace
Pull tracing fixlet from Steven Rostedt:
"Joel Fernandes asked to add a feature in tracing that Android had its
own patch internally for. I took it back in 4.13. Now he realizes that
he had a mistake, and swapped the values from what Android had. This
means that the old Android tools will break when using a new kernel
that has the new feature on it.
The options are:
1. To swap it back to what Android wants.
2. Add a command line option or something to do the swap
3. Just let Android carry a patch that swaps it back
Since it requires setting a tracing option to enable this anyway, I
doubt there are other users of this than Android. Thus, I've decided
to take option 1. If someone else is actually depending on the order
that is in the kernel, then we will have to revert this change and go
to option 2 or 3"
* tag 'trace-v4.18-rc3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Reorder display of TGID to be after PID
Linus Torvalds [Fri, 13 Jul 2018 18:36:46 +0000 (11:36 -0700)]
Merge tag 'sound-4.18-rc5' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Just a few HD-auio fixes: one fix for a possible mutex deadlock at
HDMI hotplug handling is somewhat subtle and delicate, while the rest
are usual device-specific quirks"
* tag 'sound-4.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/ca0132: Update a pci quirk device name
ALSA: hda/ca0132: Add Recon3Di quirk for Gigabyte G1.Sniper Z97
ALSA: hda/realtek - two more lenovo models need fixup of MIC_LOCATION
ALSA: hda - Handle pm failure during hotplug
Linus Torvalds [Fri, 13 Jul 2018 17:54:01 +0000 (10:54 -0700)]
Merge tag 'libnvdimm-fixes-4.18-rc5' of git://git./linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dave Jiang:
- ensure that a variable passed in by reference to acpi_nfit_ctl is
always set to a value. An incremental patch is provided due to notice
from testing in -next. The rest of the commits did not exhibit
issues.
- fix a return path in nsio_rw_bytes() that was not returning "bytes
remain" as expected for the function.
- address an issue where applications polling on scrub-completion for
the NVDIMM may falsely wakeup and read the wrong state value and
cause hang.
- change the test unit persistent capability attribute to fix up a
broken assumption in the unit test infrastructure wrt the
'write_cache' attribute
- ratelimit dev_info() in the dax device check_vma() function since
this is easily triggered from userspace
* tag 'libnvdimm-fixes-4.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
nfit: fix unchecked dereference in acpi_nfit_ctl
acpi, nfit: Fix scrub idle detection
tools/testing/nvdimm: advertise a write cache for nfit_test
acpi/nfit: fix cmd_rc for acpi_nfit_ctl to always return a value
dev-dax: check_vma: ratelimit dev_info-s
libnvdimm, pmem: Fix memcpy_mcsafe() return code handling in nsio_rw_bytes()
Alex Deucher [Thu, 12 Jul 2018 13:38:09 +0000 (08:38 -0500)]
drm/amdgpu/pp/smu7: use a local variable for toc indexing
Rather than using the index variable stored in vram. If
the device fails to come back online after a resume cycle,
reads from vram will return all 1s which will cause a
segfault. Based on a patch from Thomas Martitz <kugel@rockbox.org>.
This avoids the segfault, but we still need to sort out
why the GPU does not come back online after a resume.
Bug: https://bugs.freedesktop.org/show_bug.cgi?id=105760
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Naohiro Aota [Fri, 13 Jul 2018 14:07:20 +0000 (23:07 +0900)]
btrfs: fix use-after-free of cmp workspace pages
btrfs_cmp_data_free() puts cmp's src_pages and dst_pages, but leaves
their page address intact. Now, if you hit "goto again" in
btrfs_extent_same_range() and hit some error in
btrfs_cmp_data_prepare(), you'll try to unlock/put already put pages.
This is simple fix to reset the address to avoid use-after-free.
Fixes:
67b07bd4bec5 ("Btrfs: reuse cmp workspace in EXTENT_SAME ioctl")
Signed-off-by: Naohiro Aota <naota@elisp.net>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Daniel Borkmann [Fri, 13 Jul 2018 13:34:31 +0000 (15:34 +0200)]
Merge branch 'bpf-af-xdp-consistent-err-reporting'
Magnus Karlsson says:
====================
This patch set adjusts the AF_XDP TX error reporting so that it becomes
consistent between copy mode and zero-copy. First some background:
Copy-mode for TX uses the SKB path in which the action of sending the
packet is performed from process context using the sendmsg
syscall. Completions are usually done asynchronously from NAPI mode by
using a TX interrupt. In this mode, send errors can be returned back
through the syscall.
In zero-copy mode both the sending of the packet and the completions
are done asynchronously from NAPI mode for performance reasons. In
this mode, the sendmsg syscall only makes sure that the TX NAPI loop
will be run that performs both the actions of sending and
completing. In this mode it is therefore not possible to return errors
through the sendmsg syscall as the sending is done from the NAPI
loop. Note that it is possible to implement a synchronous send with
our API, but in our benchmarks that made the TX performance drop by
nearly half due to synchronization requirements and cache line
bouncing. But for some netdevs this might be preferable so let us
leave it up to the implementation to decide.
The problem is that the current code base returns some errors in
copy-mode that are not possible to return in zero-copy mode. This
patch set aligns them so that the two modes always return the same
error code. We achieve this by removing some of the errors returned by
sendmsg in copy-mode (and in one case adding an error message for
zero-copy mode) and offering alternative error detection methods that
are consistent between the two modes.
The structure of the patch set is as follows:
Patch 1: removes the ENXIO return code from copy-mode when someone has
forcefully changed the number of queues on the device so that the
queue bound to the socket is no longer available. Just silently stop
sending anything as in zero-copy mode.
Patch 2: stop returning EAGAIN in copy mode when the completion queue
is full as zero-copy does not do this. Instead this situation can be
detected by comparing the head and tail pointers of the completion
queue in both modes. In any case, EAGAIN was not the correct error code
here since no amount of calling sendmsg will solve the problem. Only
consuming one or more messages on the completion queue will fix this.
Patch 3: Always return ENOBUFS from sendmsg if there is no TX queue
configured. This was not the case for zero-copy mode.
Patch 4: stop returning EMSGSIZE when the size of the packet is larger
than the MTU. Just send it to the device so that it will drop it as in
zero-copy mode.
Note that copy-mode can still return EAGAIN in certain circumstances,
but as these conditions cannot occur in zero-copy mode it is fine for
copy-mode to return them.
====================
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Magnus Karlsson [Wed, 11 Jul 2018 08:12:52 +0000 (10:12 +0200)]
xsk: do not return EMSGSIZE in copy mode for packets larger than MTU
This patch stops returning EMSGSIZE from sendmsg in copy mode when the
size of the packet is larger than the MTU. Just send it to the device
so that it will drop it as in zero-copy mode. This makes the error
reporting consistent between copy mode and zero-copy mode.
Fixes:
35fcde7f8deb ("xsk: support for Tx")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Magnus Karlsson [Wed, 11 Jul 2018 08:12:51 +0000 (10:12 +0200)]
xsk: always return ENOBUFS from sendmsg if there is no TX queue
This patch makes sure ENOBUFS is always returned from sendmsg if there
is no TX queue configured. This was not the case for zero-copy
mode. With this patch this error reporting is consistent between copy
mode and zero-copy mode.
Fixes:
ac98d8aab61b ("xsk: wire upp Tx zero-copy functions")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Magnus Karlsson [Wed, 11 Jul 2018 08:12:50 +0000 (10:12 +0200)]
xsk: do not return EAGAIN from sendmsg when completion queue is full
This patch stops returning EAGAIN in TX copy mode when the completion
queue is full as zero-copy does not do this. Instead this situation
can be detected by comparing the head and tail pointers of the
completion queue in both modes. In any case, EAGAIN was not the
correct error code here since no amount of calling sendmsg will solve
the problem. Only consuming one or more messages on the completion
queue will fix this.
With this patch, the error reporting becomes consistent between copy
mode and zero-copy mode.
Fixes:
35fcde7f8deb ("xsk: support for Tx")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>