Luis Henriques [Mon, 15 Oct 2018 15:45:59 +0000 (16:45 +0100)]
ceph: support copy_file_range file operation
This commit implements support for the copy_file_range syscall in cephfs.
It is implemented using the RADOS 'copy-from' operation, which performs a
remote object copy without the need to download/upload data from/to the
OSDs.
Some manual copy may however be required if the source/destination file
offsets aren't object aligned or if the copy length is smaller than the
object size.
Signed-off-by: Luis Henriques <lhenriques@suse.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
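The split between remote and manual copying described in the copy_file_range commit above can be sketched roughly as follows. This is only an illustration, not the cephfs implementation: struct copy_plan and plan_copy() are hypothetical names, and the sketch assumes that the source and destination share the same offset within an object, so a single offset is enough to describe the alignment.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: how a copy of `len` bytes breaks down. */
struct copy_plan {
    uint64_t head_len;    /* bytes copied manually up to the first object boundary */
    uint64_t nr_objects;  /* whole objects that could use a remote 'copy-from' */
    uint64_t tail_len;    /* bytes copied manually after the last whole object */
};

/* Hypothetical helper: split [off, off + len) along object boundaries. */
static struct copy_plan plan_copy(uint64_t off, uint64_t len, uint64_t object_size)
{
    struct copy_plan plan = {0, 0, 0};
    uint64_t misalign;

    assert(object_size != 0);

    misalign = off % object_size;
    if (misalign) {
        /* Unaligned start: copy manually up to the next boundary. */
        plan.head_len = object_size - misalign;
        if (plan.head_len > len)
            plan.head_len = len;
        len -= plan.head_len;
    }

    plan.nr_objects = len / object_size;  /* aligned, full-size objects */
    plan.tail_len = len % object_size;    /* remainder shorter than an object */
    return plan;
}
```

With a 4 MiB object size, a 10 MiB copy starting at offset 1 MiB would, under these assumptions, need a 3 MiB manual head, one remote object copy, and a 3 MiB manual tail.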
Luis Henriques [Mon, 15 Oct 2018 15:45:58 +0000 (16:45 +0100)]
libceph: support the RADOS copy-from operation
Add support for performing remote object copies using the 'copy-from'
operation.
[ Add COPY_FROM to get_num_data_items(). ]
Signed-off-by: Luis Henriques <lhenriques@suse.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Luis Henriques [Mon, 15 Oct 2018 15:45:57 +0000 (16:45 +0100)]
ceph: add non-blocking parameter to ceph_try_get_caps()
ceph_try_get_caps currently calls try_get_cap_refs with the nonblock
parameter always set to 'true'. This change adds a new parameter that
allows its value to be set. This will be useful for a follow-up patch that
will need to get two sets of capabilities for two different inodes without
risking a deadlock.
Signed-off-by: Luis Henriques <lhenriques@suse.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Ilya Dryomov [Wed, 17 Oct 2018 12:23:04 +0000 (14:23 +0200)]
libceph: check reply num_data_items in setup_request_data()
setup_request_data() adds message data items to both request and reply
messages, but only checks request num_data_items before proceeding with
the loop. This is wrong because if an op doesn't have any request data
items but has a reply data item (e.g. read), a duplicate data item gets
added to the message on every resend attempt.
This went unnoticed for years but now that message data items are
preallocated, it promptly crashes in ceph_msg_data_add(). Amend the
signature to make it clear that setup_request_data() operates on both
request and reply messages. Also, remove data_len assert -- we have
another one in prepare_write_message().
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
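As a rough illustration of the resend problem fixed in the setup_request_data() commit above, the following C sketch models a request/reply pair with preallocated data item slots. It is not the libceph code: struct model_msg, model_msg_data_add() and model_setup_data() are hypothetical names, and the assert stands in for the crash in ceph_msg_data_add().

```c
#include <assert.h>

/* Hypothetical stand-in for a message with preallocated data item slots. */
struct model_msg {
    int num_data_items;   /* slots in use */
    int max_data_items;   /* slots preallocated */
};

/* Adding beyond the preallocated slots is a bug, so fail loudly. */
static void model_msg_data_add(struct model_msg *msg)
{
    assert(msg->num_data_items < msg->max_data_items);
    msg->num_data_items++;
}

/*
 * Sketch of the fixed check: bail out if EITHER message already carries
 * data items, so a resend does not add them a second time. Checking only
 * the request would miss a read, which has reply data but no request data.
 */
static void model_setup_data(struct model_msg *request, struct model_msg *reply,
                             int nr_request_items, int nr_reply_items)
{
    int i;

    if (request->num_data_items || reply->num_data_items)
        return;   /* already set up by an earlier send attempt */

    for (i = 0; i < nr_request_items; i++)
        model_msg_data_add(request);
    for (i = 0; i < nr_reply_items; i++)
        model_msg_data_add(reply);
}
```

Calling model_setup_data() twice for a read-like op (no request items, one reply item) leaves the reply with a single data item; with a request-only check the second call would try to add a duplicate and trip the assert.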
Ilya Dryomov [Mon, 15 Oct 2018 15:38:23 +0000 (17:38 +0200)]
libceph: preallocate message data items
Currently message data items are allocated with ceph_msg_data_create()
in setup_request_data() inside send_request(). send_request() has never
been allowed to fail, so each allocation is followed by a BUG_ON:
    data = ceph_msg_data_create(...);
    BUG_ON(!data);
It's been this way since support for multiple message data items was
added in commit 6644ed7b7e04 ("libceph: make message data be a pointer")
in 3.10.
There is no reason to delay the allocation of message data items until
the last possible moment and we certainly don't need a linked list of
them as they are only ever appended to the end and never erased. Make
ceph_msg_new2() take max_data_items and adapt the rest of the code.
Reported-by: Jerry Lee <leisurelysw24@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
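The idea behind the preallocation commit above, replacing a linked list that grows at send time with an array sized when the message is created, might look roughly like this in isolation. All names here (struct sketch_msg, sketch_msg_new(), sketch_msg_data_add()) are hypothetical and chosen for illustration only; the real ceph_msg_new2() signature is not reproduced.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical data item: a buffer attached to a message. */
struct sketch_data_item {
    const void *buf;
    size_t len;
};

struct sketch_msg {
    struct sketch_data_item *data;  /* preallocated array, not a linked list */
    int num_data_items;             /* slots in use */
    int max_data_items;             /* slots allocated up front */
};

/* All allocation happens here, where failure can still be reported. */
static struct sketch_msg *sketch_msg_new(int max_data_items)
{
    struct sketch_msg *msg = calloc(1, sizeof(*msg));

    if (!msg)
        return NULL;
    if (max_data_items > 0) {
        msg->data = calloc((size_t)max_data_items, sizeof(*msg->data));
        if (!msg->data) {
            free(msg);
            return NULL;
        }
    }
    msg->max_data_items = max_data_items;
    return msg;
}

/* Appending cannot fail: it only hands out the next preallocated slot. */
static struct sketch_data_item *sketch_msg_data_add(struct sketch_msg *msg)
{
    assert(msg->num_data_items < msg->max_data_items);
    return &msg->data[msg->num_data_items++];
}

static void sketch_msg_free(struct sketch_msg *msg)
{
    if (msg) {
        free(msg->data);
        free(msg);
    }
}
```

Because items are only ever appended and never erased, an index into the array is all the bookkeeping the append path needs.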
Ilya Dryomov [Mon, 15 Oct 2018 14:11:37 +0000 (16:11 +0200)]
libceph, rbd, ceph: move ceph_osdc_alloc_messages() calls
The current requirement is that ceph_osdc_alloc_messages() should be
called after oid and oloc are known. In preparation for preallocating
message data items, move ceph_osdc_alloc_messages() further down, so
that it is called when OSD op codes are known.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Ilya Dryomov [Mon, 15 Oct 2018 13:26:27 +0000 (15:26 +0200)]
libceph: introduce alloc_watch_request()
ceph_osdc_alloc_messages() call will be moved out of
alloc_linger_request() in the next commit, which means that
ceph_osdc_watch() will need to call ceph_osdc_alloc_messages()
twice. Add a helper for that.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Ilya Dryomov [Thu, 11 Oct 2018 10:58:33 +0000 (12:58 +0200)]
libceph: assign cookies in linger_submit()
Register lingers directly in linger_submit(). This avoids allocating
memory for notify pagelist while holding osdc->lock and simplifies both
callers of linger_submit().
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Ilya Dryomov [Thu, 11 Oct 2018 15:04:33 +0000 (17:04 +0200)]
libceph: enable fallback to ceph_msg_new() in ceph_msgpool_get()
ceph_msgpool_get() can fall back to ceph_msg_new() when it is asked for
a message whose front portion is larger than pool->front_len. However
the caller always passes 0, effectively disabling that code path. The
allocation goes to the message pool and returns a message with a front
that is smaller than requested, setting us up for a crash.
One example of this is a directory with a large number of snapshots.
If its snap context doesn't fit, we oops in encode_request_partial().
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
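The fallback described in the ceph_msgpool_get() commit above can be modelled in a few lines. This is a simplified sketch under stated assumptions, not the libceph code: struct msg_pool, struct pool_msg and msg_pool_get() are hypothetical names, and a plain calloc() stands in for both the pool and the regular allocator.

```c
#include <assert.h>
#include <stdlib.h>

struct pool_msg {
    size_t front_len;   /* size of the front buffer backing this message */
    int from_pool;      /* 1 if it came from the pool, 0 if freshly allocated */
};

struct msg_pool {
    size_t front_len;   /* front size of every pooled message */
};

/* Hypothetical getter: the caller passes the front size it really needs. */
static struct pool_msg *msg_pool_get(const struct msg_pool *pool, size_t front_len)
{
    struct pool_msg *msg = calloc(1, sizeof(*msg));

    if (!msg)
        return NULL;

    if (front_len > pool->front_len) {
        /* Too big for the pool: fall back to a regular allocation. */
        msg->front_len = front_len;
        msg->from_pool = 0;
    } else {
        msg->front_len = pool->front_len;
        msg->from_pool = 1;
    }

    /* The caller must never get a front smaller than it asked for. */
    assert(msg->front_len >= front_len);
    return msg;
}
```

If the caller always passed 0 as the requested size, the fallback branch would never be taken and an oversized request would silently receive a pool-sized front, which is the failure mode the commit describes.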
Ilya Dryomov [Thu, 11 Oct 2018 14:15:38 +0000 (16:15 +0200)]
ceph: num_ops is off by one in ceph_aio_retry_work()
Two OSD op slots are allocated, but only one is ever used.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Ilya Dryomov [Sat, 13 Oct 2018 09:36:52 +0000 (11:36 +0200)]
libceph: no need to call osd_req_opcode_valid() in osd_req_encode_op()
Any uninitialized or unknown ops will be caught by the default clause
anyway.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Xuehan Xu [Thu, 11 Oct 2018 09:55:39 +0000 (17:55 +0800)]
ceph: set timeout conditionally in __cap_delay_requeue
__cap_delay_requeue could be invoked through ceph_check_caps when there
exist caps that need to be sent and are delayed by "i_hold_caps_min"
or "i_hold_caps_max". If __cap_delay_requeue sets the timeout unconditionally,
there is a chance that some "wanted" caps cannot be released for a
long time, since their timeouts are reset every time they get delayed.
Fixes: http://tracker.ceph.com/issues/36369
Signed-off-by: Xuehan Xu <xuxuehan@360.cn>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Ilya Dryomov [Fri, 28 Sep 2018 14:02:53 +0000 (16:02 +0200)]
libceph: don't consume a ref on pagelist in ceph_msg_data_add_pagelist()
Because send_mds_reconnect() wants to send a message with a pagelist
and pass the ownership to the messenger, ceph_msg_data_add_pagelist()
consumes a ref which is then put in ceph_msg_data_destroy(). This
makes managing pagelists in the OSD client (where they are wrapped in
ceph_osd_data) unnecessarily hard because the handoff only happens in
ceph_osdc_start_request() instead of when the pagelist is passed to
ceph_osd_data_pagelist_init(). I counted several memory leaks on
various error paths.
Fix up ceph_msg_data_add_pagelist() and carry a pagelist ref in
ceph_osd_data.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Ilya Dryomov [Fri, 28 Sep 2018 13:38:34 +0000 (15:38 +0200)]
libceph: introduce ceph_pagelist_alloc()
struct ceph_pagelist cannot be embedded into anything else because it
has its own refcount. Merge allocation and initialization together.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Ilya Dryomov [Wed, 26 Sep 2018 17:12:07 +0000 (19:12 +0200)]
libceph: osd_req_op_cls_init() doesn't need to take opcode
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Ilya Dryomov [Wed, 26 Sep 2018 16:03:16 +0000 (18:03 +0200)]
libceph: bump CEPH_MSG_MAX_DATA_LEN
If the read is large enough, we end up spinning in the messenger:
    libceph: osd0 192.168.122.1:6801 io error
    libceph: osd0 192.168.122.1:6801 io error
    libceph: osd0 192.168.122.1:6801 io error
This is a receive side limit, so only reads were affected.
Cc: stable@vger.kernel.org
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Luis Henriques [Tue, 9 Oct 2018 17:54:28 +0000 (18:54 +0100)]
ceph: only allow punch hole mode in fallocate
The current implementation of cephfs fallocate isn't correct as it doesn't
really reserve the space in the cluster, which means that a subsequent
call to a write may actually fail due to lack of space. In fact, it is
currently possible to fallocate an amount of space that is larger than the
free space in the cluster. It has behaved this way since the initial
commit ad7a60de882a ("ceph: punch hole support").
Since there's no easy solution to fix this at the moment, this patch
simply removes support for all fallocate operations but
FALLOC_FL_PUNCH_HOLE (which implies FALLOC_FL_KEEP_SIZE).
Link: https://tracker.ceph.com/issues/36317
Signed-off-by: Luis Henriques <lhenriques@suse.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
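A mode check matching the description in the fallocate commit above could look like the following sketch. The helper name check_fallocate_mode() is hypothetical, and the sketch assumes that "punch hole only" means accepting exactly FALLOC_FL_PUNCH_HOLE combined with FALLOC_FL_KEEP_SIZE and rejecting everything else with -EOPNOTSUPP. The flag values are the ones from the Linux UAPI header <linux/falloc.h>, repeated here only so the sketch compiles on its own.

```c
#include <assert.h>
#include <errno.h>

#ifndef FALLOC_FL_KEEP_SIZE
#define FALLOC_FL_KEEP_SIZE  0x01
#endif
#ifndef FALLOC_FL_PUNCH_HOLE
#define FALLOC_FL_PUNCH_HOLE 0x02
#endif

/* Hypothetical helper: returns 0 if the mode is accepted, -EOPNOTSUPP otherwise. */
static int check_fallocate_mode(int mode)
{
    if (mode != (FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
        return -EOPNOTSUPP;
    return 0;
}

/* A few illustrative calls. */
static void fallocate_mode_examples(void)
{
    /* Hole punching with KEEP_SIZE is the only accepted combination. */
    assert(check_fallocate_mode(FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE) == 0);
    /* Plain preallocation (mode 0) cannot be honoured, so it is refused. */
    assert(check_fallocate_mode(0) == -EOPNOTSUPP);
    /* KEEP_SIZE alone is also a preallocation request. */
    assert(check_fallocate_mode(FALLOC_FL_KEEP_SIZE) == -EOPNOTSUPP);
}
```

Refusing preallocation outright lets callers see an error immediately instead of a write failing later for lack of space.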
Yan, Zheng [Sat, 29 Sep 2018 08:02:19 +0000 (16:02 +0800)]
ceph: refactor ceph_sync_read()
Avoid allocating memory for the entire user request: striped_read()
does a synchronous OSD request per object, so it doesn't need more than
object size worth of pages at a time.
[ Preserve the comment, changelog. ]
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Yan, Zheng [Fri, 28 Sep 2018 03:34:42 +0000 (11:34 +0800)]
ceph: check if LOOKUPNAME request was aborted when filling trace
d_lookup()/d_alloc() require the parent inode to be locked. The parent
inode is not locked if the request is aborted.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Yan, Zheng [Fri, 28 Sep 2018 01:10:29 +0000 (09:10 +0800)]
ceph: fix dentry leak in ceph_readdir_prepopulate
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Yan, Zheng [Thu, 27 Sep 2018 13:16:05 +0000 (21:16 +0800)]
Revert "ceph: fix dentry leak in splice_dentry()"
This reverts commit 8b8f53af1ed9df88a4c0fbfdf3db58f62060edf3.
splice_dentry() is used in three places. In two of them, req->r_dentry
is passed to splice_dentry(). In the case of an error, req->r_dentry does
not get updated, so splice_dentry() should not drop the reference.
Cc: stable@vger.kernel.org # 4.18+
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Chengguang Xu [Sun, 2 Sep 2018 15:21:09 +0000 (23:21 +0800)]
ceph: check snap first in ceph_set_acl()
Do the snap check first in ceph_set_acl(), so we can avoid
unnecessary operations when the inode has snap.
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Chengguang Xu [Sun, 12 Aug 2018 15:06:54 +0000 (23:06 +0800)]
rbd: add __init/__exit annotations
Add __init/__exit annotations to init/cleanup helpers
which are only called once in the module.
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Chengguang Xu [Mon, 30 Jul 2018 15:55:36 +0000 (23:55 +0800)]
ceph: reset cap hold timeout only for requeued inode
__cap_delay_requeue() only requeues inodes that do not have
the CEPH_I_FLUSH flag set, so avoid resetting the cap hold
timeout for inodes that are not requeued.
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Greg Kroah-Hartman [Mon, 22 Oct 2018 06:37:37 +0000 (07:37 +0100)]
Linux 4.19
Greg Kroah-Hartman [Fri, 19 Oct 2018 09:30:16 +0000 (11:30 +0200)]
MAINTAINERS: Add an entry for the code of conduct
As I introduced these files, I'm willing to be the maintainer of them as
well.
Acked-by: Chris Mason <clm@fb.com>
Acked-by: Olof Johansson <olof@lixom.net>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Greg Kroah-Hartman [Fri, 19 Oct 2018 09:08:12 +0000 (11:08 +0200)]
Code of Conduct: Change the contact email address
The contact point for the kernel's Code of Conduct should now be the
Code of Conduct Committee, not the full TAB. Change the email address
in the file to properly reflect this.
Acked-by: Chris Mason <clm@fb.com>
Acked-by: Olof Johansson <olof@lixom.net>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Greg Kroah-Hartman [Fri, 19 Oct 2018 09:04:07 +0000 (11:04 +0200)]
Code of Conduct Interpretation: Put in the proper URL for the committee
There was a blank <URL> reference for how to find the Code of Conduct
Committee. Fix that up by pointing it to the correct kernel.org website
page location.
Acked-by: Chris Mason <clm@fb.com>
Acked-by: Olof Johansson <olof@lixom.net>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Greg Kroah-Hartman [Fri, 19 Oct 2018 08:45:08 +0000 (10:45 +0200)]
Code of Conduct: Provide links between the two documents
Create a link between the Code of Conduct and the Code of Conduct
Interpretation so that people can see that they are related.
Acked-by: Chris Mason <clm@fb.com>
Acked-by: Olof Johansson <olof@lixom.net>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Greg Kroah-Hartman [Fri, 19 Oct 2018 08:28:14 +0000 (10:28 +0200)]
Code of Conduct Interpretation: Properly reference the TAB correctly
We use the term "TAB" before defining it later in the document. Fix
that up by defining it at the first location.
Reported-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Acked-by: Chris Mason <clm@fb.com>
Acked-by: Olof Johansson <olof@lixom.net>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Greg Kroah-Hartman [Sun, 14 Oct 2018 14:16:47 +0000 (16:16 +0200)]
Code of Conduct Interpretation: Add document explaining how the Code of Conduct is to be interpreted
The Contributor Covenant Code of Conduct is a general document meant to
provide a set of rules for almost any open source community. Every
open-source community is unique and the Linux kernel is no exception.
Because of this, this document describes how we in the Linux kernel
community will interpret it. We also do not expect this interpretation
to be static over time, and will adjust it as needed.
This document was created with the input and feedback of the TAB as well
as many current kernel maintainers.
Co-Developed-by: Thomas Gleixner <tglx@linutronix.de>
Co-Developed-by: Olof Johansson <olof@lixom.net>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Amir Goldstein <amir73il@gmail.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Boris Brezillon <boris.brezillon@bootlin.com>
Acked-by: Borislav Petkov <bp@kernel.org>
Acked-by: Chris Mason <clm@fb.com>
Acked-by: Christian Lütke-Stetzkamp <christian@lkamp.de>
Acked-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Dave Airlie <airlied@redhat.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: David Sterba <kdave@kernel.org>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Acked-by: Dominik Brodowski <linux@dominikbrodowski.de>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Felipe Balbi <balbi@kernel.org>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Hans Verkuil <hverkuil@xs4all.nl>
Acked-by: Hans de Goede <j.w.r.degoede@gmail.com>
Acked-by: Harry Wentland <harry.wentland@amd.com>
Acked-by: Heiko Stuebner <heiko@sntech.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>
Acked-by: James Smart <james.smart@broadcom.com>
Acked-by: James Smart <jsmart2021@gmail.com>
Acked-by: Jan Kara <jack@ucw.cz>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Jessica Yu <jeyu@kernel.org>
Acked-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Acked-by: Jiri Kosina <jikos@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Joerg Roedel <joro@8bytes.org>
Acked-by: Johan Hovold <johan@kernel.org>
Acked-by: Johannes Thumshirn <jth@kernel.org>
Acked-by: Jonathan Corbet <corbet@lwn.net>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Acked-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Acked-by: Lina Iyer <ilina@codeaurora.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Mark Brown <broonie@kernel.org>
Acked-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Matias Bjørling <mb@lightnvm.io>
Acked-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Acked-by: Maxime Ripard <maxime.ripard@bootlin.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Mimi Zohar <zohar@linux.ibm.com>
Acked-by: Miquel Raynal <miquel.raynal@bootlin.com>
Acked-by: Mishi Choudhary <mishi@linux.com>
Acked-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Acked-by: Oded Gabbay <oded.gabbay@gmail.com>
Acked-by: Palmer Dabbelt <palmer@dabbelt.com>
Acked-by: Paul E. McKenney <paulmck@linux.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Acked-by: Richard Weinberger <richard@nod.at>
Acked-by: Rik van Riel <riel@surriel.com>
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Rob Herring <robh@kernel.org>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Sean Paul <sean@poorly.run>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Sebastian Reichel <sre@kernel.org>
Acked-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
Acked-by: Shawn Guo <shawnguo@kernel.org>
Acked-by: Shuah Khan <shuah@kernel.org>
Acked-by: Simon Horman <horms@verge.net.au>
Acked-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Takashi Iwai <tiwai@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Acked-by: Thierry Reding <thierry.reding@gmail.com>
Acked-by: Todd Poynor <toddpoynor@google.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Chris Mason [Thu, 11 Oct 2018 16:09:31 +0000 (09:09 -0700)]
Code of conduct: Fix wording around maintainers enforcing the code of conduct
As it was originally worded, this paragraph requires maintainers to
enforce the code of conduct, or face potential repercussions. It sends
the wrong message, when really we just want maintainers to be part of
the solution and not violate the code of conduct themselves.
Removing it doesn't limit our ability to enforce the code of conduct,
and we can still encourage maintainers to help maintain high standards
for the level of discourse in their subsystem.
Signed-off-by: Chris Mason <clm@fb.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Amir Goldstein <amir73il@gmail.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Boris Brezillon <boris.brezillon@bootlin.com>
Acked-by: Borislav Petkov <bp@kernel.org>
Acked-by: Christian Lütke-Stetzkamp <christian@lkamp.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Acked-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Dave Airlie <airlied@redhat.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: David Sterba <kdave@kernel.org>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Acked-by: Dominik Brodowski <linux@dominikbrodowski.de>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Felipe Balbi <balbi@kernel.org>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Florian Westphal <fw@strlen.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Hans Verkuil <hverkuil@xs4all.nl>
Acked-by: Hans de Goede <j.w.r.degoede@gmail.com>
Acked-by: Harry Wentland <harry.wentland@amd.com>
Acked-by: Heiko Stuebner <heiko@sntech.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>
Acked-by: James Smart <james.smart@broadcom.com>
Acked-by: James Smart <jsmart2021@gmail.com>
Acked-by: Jan Kara <jack@ucw.cz>
Acked-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Jessica Yu <jeyu@kernel.org>
Acked-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Acked-by: Jiri Kosina <jikos@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Joerg Roedel <joro@8bytes.org>
Acked-by: Johan Hovold <johan@kernel.org>
Acked-by: Johannes Thumshirn <jth@kernel.org>
Acked-by: Jonathan Corbet <corbet@lwn.net>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Acked-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Acked-by: Lina Iyer <ilina@codeaurora.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Mark Brown <broonie@kernel.org>
Acked-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Matias Bjørling <mb@lightnvm.io>
Acked-by: Maxime Ripard <maxime.ripard@bootlin.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Mimi Zohar <zohar@linux.ibm.com>
Acked-by: Miquel Raynal <miquel.raynal@bootlin.com>
Acked-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Acked-by: Oded Gabbay <oded.gabbay@gmail.com>
Acked-by: Olof Johansson <olof@lixom.net>
Acked-by: Palmer Dabbelt <palmer@dabbelt.com>
Acked-by: Paul E. McKenney <paulmck@linux.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Acked-by: Richard Weinberger <richard@nod.at>
Acked-by: Rik van Riel <riel@surriel.com>
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Rob Herring <robh@kernel.org>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Sebastian Reichel <sre@kernel.org>
Acked-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
Acked-by: Shawn Guo <shawnguo@kernel.org>
Acked-by: Shuah Khan <shuah@kernel.org>
Acked-by: Simon Horman <horms@verge.net.au>
Acked-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Takashi Iwai <tiwai@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Acked-by: Thierry Reding <thierry.reding@gmail.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Tim Bird <tim.bird@sony.com>
Acked-by: Todd Poynor <toddpoynor@google.com>
Acked-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Greg Kroah-Hartman [Sun, 21 Oct 2018 11:51:36 +0000 (13:51 +0200)]
Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux
Wolfram writes:
"i2c for 4.19
Another driver bugfix and MAINTAINERS addition from I2C."
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: rcar: cleanup DMA for all kinds of failure
MAINTAINERS: Add entry for Broadcom STB I2C controller
Greg Kroah-Hartman [Sun, 21 Oct 2018 08:08:38 +0000 (10:08 +0200)]
Merge git://git./linux/kernel/git/davem/net
David writes:
"Networking:
A few straggler bug fixes:
1) Fix indexing of multi-pass dumps of ipv6 addresses, from David
Ahern.
2) Revert RCU locking change for bonding netpoll, causes worse
problems than it solves.
3) pskb_trim_rcsum_slow() doesn't handle odd trim offsets, resulting
in erroneous bad hw checksum triggers with CHECKSUM_COMPLETE
devices. From Dimitris Michailidis.
4) a revert to some neighbour code changes that adjust notifications
in a way that confuses some apps."
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
Revert "neighbour: force neigh_invalidate when NUD_FAILED update is from admin"
net/ipv6: Fix index counter for unicast addresses in in6_dump_addrs
net: fix pskb_trim_rcsum_slow() with odd trim offset
Revert "bond: take rcu lock in netpoll_send_skb_on_dev"
Roopa Prabhu [Sun, 21 Oct 2018 01:09:31 +0000 (18:09 -0700)]
Revert "neighbour: force neigh_invalidate when NUD_FAILED update is from admin"
This reverts commit 8e326289e3069dfc9fa9c209924668dd031ab8ef.
This patch results in unnecessary netlink notification when one
tries to delete a neigh entry already in NUD_FAILED state. Found
this with a buggy app that tries to delete a NUD_FAILED entry
repeatedly. While the notification issue can be fixed with more
checks, adding more complexity here seems unnecessary. Also,
recent tests with other changes in the neighbour code have
shown that the INCOMPLETE and PROBE checks are good enough for
the original issue.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Fri, 19 Oct 2018 17:00:19 +0000 (10:00 -0700)]
net/ipv6: Fix index counter for unicast addresses in in6_dump_addrs
The loop wants to skip previously dumped addresses, so it loops until
current index >= saved index. If the message fills up, it wants to save
the index of the next address to dump, i.e., the one that did not
fit in the current message.
Currently, it is incrementing the index counter before comparing to the
saved index, and then the saved index is off by 1 - it assumes the
current address is going to fit in the message.
Change the index handling to increment only after a successful dump.
Fixes: 502a2ffd7376a ("ipv6: convert idev_list to list macros")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
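The index handling described in the in6_dump_addrs commit above can be illustrated with a small resumable dump over an array. This is a simplified model rather than the kernel code: dump_pass() and its parameters are hypothetical, plain ints stand in for addresses, and a fixed-capacity array stands in for the netlink message.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of one dump pass. Items before *saved_idx were dumped
 * by earlier passes and are skipped. On return, *saved_idx is the index of
 * the first item that did NOT fit, so the next pass resumes exactly there.
 * Returns the number of entries written to msg.
 */
static size_t dump_pass(const int *items, size_t nr_items, size_t *saved_idx,
                        int *msg, size_t capacity)
{
    size_t idx;
    size_t written = 0;

    assert(*saved_idx <= nr_items);

    for (idx = 0; idx < nr_items; idx++) {
        if (idx < *saved_idx)
            continue;         /* dumped by an earlier pass */
        if (written == capacity)
            break;            /* full: idx is the item that did not fit */
        msg[written++] = items[idx];
    }

    /* idx has only moved past items that were skipped or dumped. */
    *saved_idx = idx;
    return written;
}
```

Incrementing the counter before the fit check would instead save an index one past the item that did not fit, and that item would be dropped from the multi-pass dump.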
Wolfram Sang [Fri, 19 Oct 2018 19:15:26 +0000 (21:15 +0200)]
i2c: rcar: cleanup DMA for all kinds of failure
DMA needs to be cleaned up not only on timeout, but on all errors where
it has been set up before.
Fixes: 73e8b0528346 ("i2c: rcar: add DMA support")
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Kamal Dasu [Wed, 17 Oct 2018 16:05:09 +0000 (12:05 -0400)]
MAINTAINERS: Add entry for Broadcom STB I2C controller
Add an entry for the Broadcom STB I2C controller in the MAINTAINERS file.
Signed-off-by: Kamal Dasu <kdasu.kdev@gmail.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
[wsa: fixed sorting and a whitespace error]
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Greg Kroah-Hartman [Sat, 20 Oct 2018 13:04:23 +0000 (15:04 +0200)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Ingo writes:
"x86 fixes:
It's 4 misc fixes, 3 build warning fixes and 3 comment fixes.
In hindsight I'd have left out the 3 comment fixes to make the pull
request look less scary at such a late point in the cycle. :-/"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/swiotlb: Enable swiotlb for > 4GiG RAM on 32-bit kernels
x86/fpu: Fix i486 + no387 boot crash by only saving FPU registers on context switch if there is an FPU
x86/fpu: Remove second definition of fpu in __fpu__restore_sig()
x86/entry/64: Further improve paranoid_entry comments
x86/entry/32: Clear the CS high bits
x86/boot: Add -Wno-pointer-sign to KBUILD_CFLAGS
x86/time: Correct the attribute on jiffies' definition
x86/entry: Add some paranoid entry/exit CR3 handling comments
x86/percpu: Fix this_cpu_read()
x86/tsc: Force inlining of cyc2ns bits
Greg Kroah-Hartman [Sat, 20 Oct 2018 13:03:45 +0000 (15:03 +0200)]
Merge branch 'sched-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Ingo writes:
"scheduler fixes:
Two fixes: a CFS-throttling bug fix, and an interactivity fix."
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Fix the min_vruntime update logic in dequeue_entity()
sched/fair: Fix throttle_list starvation with low CFS quota
Greg Kroah-Hartman [Sat, 20 Oct 2018 13:02:51 +0000 (15:02 +0200)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Ingo writes:
"perf fixes:
Misc perf tooling fixes."
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf tools: Stop fallbacking to kallsyms for vdso symbols lookup
perf tools: Pass build flags to traceevent build
perf report: Don't crash on invalid inline debug information
perf cpu_map: Align cpu map synthesized events properly.
perf tools: Fix tracing_path_mount proper path
perf tools: Fix use of alternatives to find JDIR
perf evsel: Store ids for events with their own cpus perf_event__synthesize_event_update_cpus
perf vendor events intel: Fix wrong filter_band* values for uncore events
Revert "perf tools: Fix PMU term format max value calculation"
tools headers uapi: Sync kvm.h copy
tools arch uapi: Sync the x86 kvm.h copy
Dimitris Michailidis [Sat, 20 Oct 2018 00:07:13 +0000 (17:07 -0700)]
net: fix pskb_trim_rcsum_slow() with odd trim offset
We've been getting checksum errors involving small UDP packets, usually
59B packets with 1 extra non-zero padding byte. netdev_rx_csum_fault()
has been complaining that HW is providing bad checksums. Turns out the
problem is in pskb_trim_rcsum_slow(), introduced in commit 88078d98d1bb
("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends").
The source of the problem is that when the bytes we are trimming start
at an odd address, as in the case of the 1 padding byte above,
skb_checksum() returns a byte-swapped value. We cannot just combine this
with skb->csum using csum_sub(). We need to use csum_block_sub() here
that takes into account the parity of the start address and handles the
swapping.
Matches existing code in __skb_postpull_rcsum() and esp_remove_trailer().
Fixes: 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends")
Signed-off-by: Dimitris Michailidis <dmichail@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
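The parity problem described in the pskb_trim_rcsum_slow() commit above can be reproduced with a small model. This is only a sketch: it uses a folded 16-bit big-endian one's-complement sum rather than the kernel's 32-bit partial sums, and the Python helpers below merely borrow the kernel function names; the packet contents are made up.

```python
def csum(data: bytes) -> int:
    """Fold data into a 16-bit one's-complement sum (big-endian words)."""
    if len(data) % 2:
        data += b"\x00"  # a trailing odd byte is padded with zero
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def csum_add(a: int, b: int) -> int:
    """One's-complement addition with end-around carry."""
    total = a + b
    return (total & 0xFFFF) + (total >> 16)


def csum_sub(a: int, b: int) -> int:
    """Subtract by adding the one's complement of b."""
    return csum_add(a, ~b & 0xFFFF)


def csum_block_sub(a: int, b: int, offset: int) -> int:
    """Like csum_sub(), but byte-swap b when the block starts at an odd offset."""
    if offset & 1:
        b = ((b & 0xFF) << 8) | (b >> 8)
    return csum_sub(a, b)


def same_sum(x: int, y: int) -> bool:
    """Compare sums, treating 0x0000 and 0xFFFF as the same value."""
    return x % 0xFFFF == y % 0xFFFF


# Hypothetical 59-byte packet followed by one non-zero padding byte.
packet = bytes(range(1, 60)) + b"\x5a"
keep = 59

full = csum(packet)              # checksum over the untrimmed data
tail = csum(packet[keep:])       # checksum of the trimmed byte, computed on its own
expected = csum(packet[:keep])   # checksum the trimmed packet should have

naive = csum_sub(full, tail)               # ignores the odd start offset
fixed = csum_block_sub(full, tail, keep)   # accounts for it
```

In this model `fixed` agrees with `expected`, while `naive` does not, because the trimmed byte sits in the low half of a word in the full sum but in the high half when summed on its own.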
Greg Kroah-Hartman [Sat, 20 Oct 2018 07:23:12 +0000 (09:23 +0200)]
Merge tag 'drm-fixes-2018-10-20-1' of git://anongit.freedesktop.org/drm/drm
Dave writes:
"drm fixes for 4.19 final (part 2)
Looks like two stragglers snuck in: one very urgent, where the
pageflipping code was missing a reference that could result in a GPF on
non-i915 drivers; the other is an overflow in the sun4i dotclock calcs
resulting in a mode not getting set."
* tag 'drm-fixes-2018-10-20-1' of git://anongit.freedesktop.org/drm/drm:
drm/sun4i: Fix an ulong overflow in the dotclock driver
drm: Get ref on CRTC commit object when waiting for flip_done
Greg Kroah-Hartman [Sat, 20 Oct 2018 07:20:48 +0000 (09:20 +0200)]
Merge tag 'trace-v4.19-rc8-2' of git://git./linux/kernel/git/rostedt/linux-trace
Steven writes:
"tracing: A few small fixes to synthetic events
Masami found some issues with the creation of synthetic events. The
first two patches fix handling of unsigned type, and handling of a
space before an ending semi-colon.
The third patch adds a selftest to test the processing of synthetic
events."
* tag 'trace-v4.19-rc8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
selftests: ftrace: Add synthetic event syntax testcase
tracing: Fix synthetic event to allow semicolon at end
tracing: Fix synthetic event to accept unsigned modifier
Greg Kroah-Hartman [Sat, 20 Oct 2018 06:42:56 +0000 (08:42 +0200)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input
Dmitry writes:
"Input updates for 4.19-rc8
Just an addition to elan touchpad driver ACPI table."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: elan_i2c - add ACPI ID for Lenovo IdeaPad 330-15IGM
Dave Airlie [Fri, 19 Oct 2018 21:18:12 +0000 (07:18 +1000)]
Merge tag 'drm-misc-fixes-2018-10-19' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
Second pull request for v4.19:
- Fix ulong overflow in sun4i
- Fix a serious GPF in waiting for flip_done from commit_tail().
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/97d1ed42-1d99-fcc5-291e-cd1dc29a4252@linux.intel.com
Masami Hiramatsu [Thu, 18 Oct 2018 13:13:02 +0000 (22:13 +0900)]
selftests: ftrace: Add synthetic event syntax testcase
Add a testcase to check the syntax and field types for
synthetic_events interface.
Link: http://lkml.kernel.org/r/153986838264.18251.16627517536956299922.stgit@devbox
Acked-by: Shuah Khan <shuah@kernel.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Masami Hiramatsu [Thu, 18 Oct 2018 13:12:34 +0000 (22:12 +0900)]
tracing: Fix synthetic event to allow semicolon at end
Fix synthetic event to allow independent semicolon at end.
The synthetic_events interface accepts a semicolon after the
last word if there is no space.
# echo "myevent u64 var;" >> synthetic_events
But if there is a space, it returns an error.
# echo "myevent u64 var ;" > synthetic_events
sh: write error: Invalid argument
This behavior is difficult for users to understand. Let's
allow the last independent semicolon too.
Link: http://lkml.kernel.org/r/153986835420.18251.2191216690677025744.stgit@devbox
Cc: Shuah Khan <shuah@kernel.org>
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Cc: stable@vger.kernel.org
Fixes: 4b147936fa50 ("tracing: Add support for 'synthetic' events")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Masami Hiramatsu [Thu, 18 Oct 2018 13:12:05 +0000 (22:12 +0900)]
tracing: Fix synthetic event to accept unsigned modifier
Fix synthetic event to accept unsigned modifier for its field type
correctly.
Currently, the synthetic_events interface returns an error for "unsigned"
modifiers, as shown below:
# echo "myevent unsigned long var" >> synthetic_events
sh: write error: Invalid argument
This is because argv_split() breaks "unsigned long" into "unsigned"
and "long", but parse_synth_field() doesn't expect it.
With this fix, synthetic_events can handle "unsigned long"
correctly, as shown below:
# echo "myevent unsigned long var" >> synthetic_events
# cat synthetic_events
myevent unsigned long var
Link: http://lkml.kernel.org/r/153986832571.18251.8448135724590496531.stgit@devbox
Cc: Shuah Khan <shuah@kernel.org>
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Cc: stable@vger.kernel.org
Fixes: 4b147936fa50 ("tracing: Add support for 'synthetic' events")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
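The two synthetic-event parsing fixes above (a standalone trailing semicolon and the "unsigned" modifier) can be illustrated with a toy parser. This is a sketch only: `parse_synth_event()` is a hypothetical Python stand-in, not the kernel's parse_synth_field(), and `str.split()` stands in for argv_split().

```python
def parse_synth_event(command: str):
    """Parse 'name type field; type field ...' into (name, [(type, field), ...])."""
    argv = command.split()  # whitespace split, standing in for argv_split()
    if not argv:
        raise ValueError("empty command")
    name, args = argv[0], argv[1:]
    fields = []
    i = 0
    while i < len(args):
        if args[i] == ";":
            # A semicolon separated by a space is simply skipped.
            i += 1
            continue
        type_words = [args[i]]
        i += 1
        if type_words[0] == "unsigned":
            # "unsigned long" arrives as two words; glue them back together.
            if i >= len(args):
                raise ValueError("'unsigned' without a base type")
            type_words.append(args[i])
            i += 1
        if i >= len(args):
            raise ValueError("field type without a field name")
        field_name = args[i].rstrip(";")  # "var;" and "var" name the same field
        i += 1
        if not field_name:
            raise ValueError("missing field name")
        fields.append((" ".join(type_words), field_name))
    return name, fields
```

With this model, "myevent u64 var;" and "myevent u64 var ;" both parse to the same single field, and "myevent unsigned long var" yields a field of type "unsigned long".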
David S. Miller [Fri, 19 Oct 2018 17:45:08 +0000 (10:45 -0700)]
Revert "bond: take rcu lock in netpoll_send_skb_on_dev"
This reverts commit 6fe9487892b32cb1c8b8b0d552ed7222a527fe30.
It is causing more serious regressions than the RCU warning
it is fixing.
Signed-off-by: David S. Miller <davem@davemloft.net>
Greg Kroah-Hartman [Fri, 19 Oct 2018 17:25:44 +0000 (19:25 +0200)]
Merge tag 'usb-4.19-final' of git://git./linux/kernel/git/gregkh/usb
I wrote:
"USB fixes for 4.19-final
Here are a small number of last-minute USB driver fixes
Included here are:
- spectre fix for usb storage gadgets
- xhci fixes
- cdc-acm fixes
- usbip fixes for reported problems
All of these have been in linux-next with no reported issues."
* tag 'usb-4.19-final' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: gadget: storage: Fix Spectre v1 vulnerability
USB: fix the usbfs flag sanitization for control transfers
usb: xhci: pci: Enable Intel USB role mux on Apollo Lake platforms
usb: roles: intel_xhci: Fix Unbalanced pm_runtime_enable
cdc-acm: correct counting of UART states in serial state notification
cdc-acm: do not reset notification buffer index upon urb unlinking
cdc-acm: fix race between reset and control messaging
usb: usbip: Fix BUG: KASAN: slab-out-of-bounds in vhci_hub_control()
selftests: usbip: add wait after attach and before checking port status
Greg Kroah-Hartman [Fri, 19 Oct 2018 16:51:07 +0000 (18:51 +0200)]
Merge tag 'for-linus-20181019' of git://git.kernel.dk/linux-block
Jens writes:
"Block fixes for 4.19-final
Two small fixes that should go into this release."
* tag 'for-linus-20181019' of git://git.kernel.dk/linux-block:
block: don't deal with discard limit in blkdev_issue_discard()
nvme: remove ns sibling before clearing path
Boris Brezillon [Thu, 18 Oct 2018 10:02:50 +0000 (12:02 +0200)]
drm/sun4i: Fix an ulong overflow in the dotclock driver
The calculated ideal rate can easily overflow an unsigned long, thus
making the best div selection buggy as soon as no ideal match is found
before the overflow occurs.
Fixes: 4731a72df273 ("drm/sun4i: request exact rates to our parents")
Cc: <stable@vger.kernel.org>
Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
Acked-by: Maxime Ripard <maxime.ripard@bootlin.com>
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181018100250.12565-1-boris.brezillon@bootlin.com
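The overflow described in the sun4i dotclock commit above is easy to see with concrete numbers. The sketch below assumes a 32-bit unsigned long and an ideal rate computed as pixel rate times divider; the 148.5 MHz rate and the divider values are illustrative, and the helper names are hypothetical rather than taken from the driver.

```python
ULONG_MAX_32 = 2**32 - 1  # assumption: unsigned long is 32 bits wide


def ideal_rate_wrapping(rate_hz: int, div: int) -> int:
    """What a 32-bit unsigned multiplication would produce (wraps silently)."""
    return (rate_hz * div) & ULONG_MAX_32


def ideal_rate_checked(rate_hz: int, div: int):
    """Compute in a wider type and report overflow instead of wrapping."""
    ideal = rate_hz * div
    return ideal if ideal <= ULONG_MAX_32 else None


rate = 148_500_000                        # illustrative pixel clock in Hz
wrapped = ideal_rate_wrapping(rate, 30)   # 160_032_704, far from the real product
checked = ideal_rate_checked(rate, 30)    # None: the caller can stop searching here
```

Once the product wraps, every later comparison against it picks dividers based on a meaningless rate, which matches the "buggy best div selection" the commit describes.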
Greg Kroah-Hartman [Fri, 19 Oct 2018 07:16:20 +0000 (09:16 +0200)]
Merge git://git./linux/kernel/git/davem/net
David writes:
"Networking
1) Fix gro_cells leak in xfrm layer, from Li RongQing.
2) BPF selftests change RLIMIT_MEMLOCK blindly, don't do that. From
Eric Dumazet.
3) AF_XDP calls synchronize_net() under RCU lock, fix from Björn
Töpel.
4) Out of bounds packet access in _decode_session6(), from Alexei
Starovoitov.
5) Several ethtool bugs, where we copy a struct into the kernel twice
and our validations of the values in the first copy can be
invalidated by the second copy due to asynchronous updates to the
memory by the user. From Wenwen Wang.
6) Missing netlink attribute validation in cls_api, from Davide
Caratti.
7) LLC SAP sockets need to be SOCK_RCU_FREE, from Cong Wang.
8) rxrpc operates on wrong kvec, from Yue Haibing.
9) A regression was introduced by the disassociation of route
neighbour references in rt6_probe(), causing probe for
neighbourless routes to not be properly rate limited. Fix from
Sabrina Dubroca.
10) Unsafe RCU locking in tipc, from Tung Nguyen.
11) Use after free in inet6_mc_check(), from Eric Dumazet.
12) PMTU from icmp packets should update the SCTP transport pathmtu,
from Xin Long.
13) Missing peer put on error in rxrpc, from David Howells.
14) Fix pedit in nfp driver, from Pieter Jansen van Vuuren.
15) Fix overflowing shift statement in qla3xxx driver, from Nathan
Chancellor.
16) Fix Spectre v1 in ptp code, from Gustavo A. R. Silva.
17) udp6_unicast_rcv_skb() interprets udpv6_queue_rcv_skb() return
value in an inverted manner, fix from Paolo Abeni.
18) Fix missed unresolved entries in ipmr dumps, from Nikolay
Aleksandrov.
19) Fix NAPI handling under high load, we can completely miss events
when NAPI has to loop more than one time in a cycle. From Heiner
Kallweit."
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (49 commits)
ip6_tunnel: Fix encapsulation layout
tipc: fix info leak from kernel tipc_event
net: socket: fix a missing-check bug
net: sched: Fix for duplicate class dump
r8169: fix NAPI handling under high load
net: ipmr: fix unresolved entry dumps
net: mscc: ocelot: Fix comment in ocelot_vlant_wait_for_completion()
sctp: fix the data size calculation in sctp_data_size
virtio_net: avoid using netif_tx_disable() for serializing tx routine
udp6: fix encap return code for resubmitting
mlxsw: core: Fix use-after-free when flashing firmware during init
sctp: not free the new asoc when sctp_wait_for_connect returns err
sctp: fix race on sctp_id2asoc
r8169: re-enable MSI-X on RTL8168g
net: bpfilter: use get_pid_task instead of pid_task
ptp: fix Spectre v1 vulnerability
net: qla3xxx: Remove overflowing shift statement
geneve, vxlan: Don't set exceptions if skb->len < mtu
geneve, vxlan: Don't check skb_dst() twice
sctp: get pr_assoc and pr_stream all status with SCTP_PR_SCTP_ALL instead
...
Greg Kroah-Hartman [Fri, 19 Oct 2018 07:15:12 +0000 (09:15 +0200)]
Merge git://git./linux/kernel/git/davem/sparc
David writes:
"Sparc fixes:
The main bit here is fixing how fallback system calls are handled in
the sparc vDSO.
Unfortunately, I fat fingered the commit and some perf debugging
hacks slipped into the vDSO fix, which I revert in the very next
commit."
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc: Revert unintended perf changes.
sparc: vDSO: Silence an uninitialized variable warning
sparc: Fix syscall fallback bugs in VDSO.
Greg Kroah-Hartman [Fri, 19 Oct 2018 06:31:22 +0000 (08:31 +0200)]
Merge tag 'drm-fixes-2018-10-19' of git://anongit.freedesktop.org/drm/drm
Dave writes:
"drm fixes for 4.19 final
Just a last set of misc core fixes for final.
4 fixes, one use after free, one fb integration fix, one EDID fix,
and one laptop panel quirk,"
* tag 'drm-fixes-2018-10-19' of git://anongit.freedesktop.org/drm/drm:
drm/edid: VSDB yCBCr420 Deep Color mode bit definitions
drm: fix use of freed memory in drm_mode_setcrtc
drm: fb-helper: Reject all pixel format changing requests
drm/edid: Add 6 bpc quirk for BOE panel in HP Pavilion 15-n233sl
Greg Kroah-Hartman [Fri, 19 Oct 2018 06:30:35 +0000 (08:30 +0200)]
Merge tag 'for-gkh' of git://git./linux/kernel/git/rdma/rdma
Doug writes:
"Really final for-rc pull request for 4.19
Ok, so last week I thought we had sent our final pull request for
4.19. Well, wouldn't ya know someone went and found a couple Spectre
v1 fixes were needed :-/. So, a couple *very* small Spectre patches
for this (hopefully) final -rc week."
* tag 'for-gkh' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/ucma: Fix Spectre v1 vulnerability
IB/ucm: Fix Spectre v1 vulnerability
Christoph Hellwig [Sun, 14 Oct 2018 07:52:08 +0000 (09:52 +0200)]
x86/swiotlb: Enable swiotlb for > 4GiG RAM on 32-bit kernels
We already build the swiotlb code for 32-bit kernels with PAE support,
but the code to actually use swiotlb has only been enabled for 64-bit
kernels for an unknown reason.
Before Linux v4.18 we papered over this fact because the networking code,
the SCSI layer and some random block drivers implemented their own
bounce buffering scheme.
[ mingo: Changelog fixes. ]
Fixes: 21e07dba9fb1 ("scsi: reduce use of block bounce buffers")
Fixes: ab74cfebafa3 ("net: remove the PCI_DMA_BUS_IS_PHYS check in illegal_highdma")
Reported-by: Matthew Whitehead <tedheadster@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Matthew Whitehead <tedheadster@gmail.com>
Cc: konrad.wilk@oracle.com
Cc: iommu@lists.linux-foundation.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20181014075208.2715-1-hch@lst.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Dave Airlie [Fri, 19 Oct 2018 03:51:55 +0000 (13:51 +1000)]
Merge tag 'drm-misc-fixes-2018-10-18' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
drm-misc-fixes for v4.19:
- Fix use of freed memory in drm_mode_setcrtc.
- Reject pixel format changing requests in fb helper.
- Add 6 bpc quirk for HP Pavilion 15-n233sl
- Fix VSDB yCBCr420 Deep Color mode bit definitions
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/647fe5d0-4ec5-57cc-9f23-a4836b29e278@linux.intel.com
Stefano Brivio [Thu, 18 Oct 2018 19:25:07 +0000 (21:25 +0200)]
ip6_tunnel: Fix encapsulation layout
Commit 058214a4d1df ("ip6_tun: Add infrastructure for doing
encapsulation") added the ip6_tnl_encap() call in ip6_tnl_xmit(), before
the call to ipv6_push_frag_opts() to append the IPv6 Tunnel Encapsulation
Limit option (option 4, RFC 2473, par. 5.1) to the outer IPv6 header.
As long as the option didn't actually end up in generated packets, this
wasn't an issue. Then commit 89a23c8b528b ("ip6_tunnel: Fix missing tunnel
encapsulation limit option") fixed sending of this option, and the
resulting layout, e.g. for FoU, is:
.-------------------.------------.----------.-------------------.----- - -
| Outer IPv6 Header | UDP header | Option 4 | Inner IPv6 Header | Payload
'-------------------'------------'----------'-------------------'----- - -
Needless to say, FoU and GUE (at least) won't work over IPv6. The option
is appended by default, and I couldn't find a way to disable it with the
current iproute2.
Turn this into a more reasonable:
.-------------------.----------.------------.-------------------.----- - -
| Outer IPv6 Header | Option 4 | UDP header | Inner IPv6 Header | Payload
'-------------------'----------'------------'-------------------'----- - -
With this, and with commit 84dad55951b0 ("udp6: fix encap return code for
resubmitting"), FoU and GUE work again over IPv6.
Fixes: 058214a4d1df ("ip6_tun: Add infrastructure for doing encapsulation")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Maloy [Thu, 18 Oct 2018 15:38:29 +0000 (17:38 +0200)]
tipc: fix info leak from kernel tipc_event
We initialize a struct tipc_event allocated on the kernel stack to
zero to avert info leak to user space.
Reported-by: syzbot+057458894bc8cada4dee@syzkaller.appspotmail.com
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wenwen Wang [Thu, 18 Oct 2018 14:36:46 +0000 (09:36 -0500)]
net: socket: fix a missing-check bug
In ethtool_ioctl(), the ioctl command 'ethcmd' is checked through a switch
statement to see whether it is necessary to pre-process the ethtool
structure, because, as mentioned in the comment, the structure
ethtool_rxnfc is defined with padding. If yes, a user-space buffer 'rxnfc'
is allocated through compat_alloc_user_space(). One thing to note here is
that, if 'ethcmd' is ETHTOOL_GRXCLSRLALL, the size of the buffer 'rxnfc' is
partially determined by 'rule_cnt', which is actually acquired from the
user-space buffer 'compat_rxnfc', i.e., 'compat_rxnfc->rule_cnt', through
get_user(). After 'rxnfc' is allocated, the data in the original user-space
buffer 'compat_rxnfc' is then copied to 'rxnfc' through copy_in_user(),
including the 'rule_cnt' field. However, after this copy, no check is
re-enforced on 'rxnfc->rule_cnt'. So it is possible for a malicious user to
race to change the value of 'compat_rxnfc->rule_cnt' between these two
copies. In this way, the attacker can bypass the previous check on
'rule_cnt' and inject malicious data. This can cause undefined behavior of
the kernel and introduce potential security risk.
This patch avoids the above issue by copying the value acquired by
get_user() to 'rxnfc->rule_cnt', if 'ethcmd' is ETHTOOL_GRXCLSRLALL.
Signed-off-by: Wenwen Wang <wang6495@umn.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
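The double-fetch pattern described in the ethtool commit above can be modelled in a few lines. This is a sketch under stated assumptions: `RacyUserBuffer`, `MAX_RULES` and both fetch functions are hypothetical, and the "race" is simulated by a buffer that returns a different rule_cnt on its second read.

```python
MAX_RULES = 16  # hypothetical limit applied to the first fetch


class RacyUserBuffer:
    """User memory whose rule_cnt is changed by another thread after the first read."""

    def __init__(self, first: int, later: int):
        self._values = (first, later)
        self.reads = 0

    def read_rule_cnt(self) -> int:
        value = self._values[min(self.reads, 1)]
        self.reads += 1
        return value


def double_fetch(user: RacyUserBuffer) -> dict:
    """Check the first read, then trust a second, unchecked read."""
    rule_cnt = user.read_rule_cnt()              # models get_user()
    if rule_cnt > MAX_RULES:
        raise ValueError("rule_cnt too large")
    return {"rule_cnt": user.read_rule_cnt()}    # models copy_in_user()


def single_fetch(user: RacyUserBuffer) -> dict:
    """Same copy, but the checked value overwrites the re-read field."""
    rule_cnt = user.read_rule_cnt()
    if rule_cnt > MAX_RULES:
        raise ValueError("rule_cnt too large")
    rxnfc = {"rule_cnt": user.read_rule_cnt()}   # the bulk copy still happens
    rxnfc["rule_cnt"] = rule_cnt                 # pin the field to the checked value
    return rxnfc
```

With a buffer that reports 4 and then 1000, `double_fetch()` ends up holding the unchecked 1000, while `single_fetch()` keeps the validated 4.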
Phil Sutter [Thu, 18 Oct 2018 08:34:26 +0000 (10:34 +0200)]
net: sched: Fix for duplicate class dump
When dumping classes by parent, kernel would return classes twice:
| # tc qdisc add dev lo root prio
| # tc class show dev lo
| class prio 8001:1 parent 8001:
| class prio 8001:2 parent 8001:
| class prio 8001:3 parent 8001:
| # tc class show dev lo parent 8001:
| class prio 8001:1 parent 8001:
| class prio 8001:2 parent 8001:
| class prio 8001:3 parent 8001:
| class prio 8001:1 parent 8001:
| class prio 8001:2 parent 8001:
| class prio 8001:3 parent 8001:
This comes from qdisc_match_from_root() potentially returning the root
qdisc itself if its handle matched. Though in that case, root's classes
were already dumped a few lines above.
Fixes: cb395b2010879 ("net: sched: optimize class dumps")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
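The duplicate dump described in the class-dump commit above comes down to one missing comparison. The following sketch uses plain dictionaries as stand-ins for qdiscs, and `dump_classes()` and its `skip_root_match` flag are hypothetical names; the dictionary lookup plays the role of qdisc_match_from_root().

```python
def dump_classes(root, qdiscs, parent_handle=None, skip_root_match=True):
    """Return the class list a dump by parent would produce in this toy model."""
    lines = list(root["classes"])          # the root's classes are dumped first
    if parent_handle is not None:
        match = qdiscs.get(parent_handle)  # may return the root qdisc itself
        if match is not None and not (skip_root_match and match is root):
            lines.extend(match["classes"])
    return lines


root = {"handle": "8001:", "classes": ["8001:1", "8001:2", "8001:3"]}
qdiscs = {"8001:": root}

before = dump_classes(root, qdiscs, "8001:", skip_root_match=False)  # six lines
after = dump_classes(root, qdiscs, "8001:", skip_root_match=True)    # three lines
```

Skipping the match when it is the root itself is enough to restore the expected three-line output in this model.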
Heiner Kallweit [Thu, 18 Oct 2018 17:56:01 +0000 (19:56 +0200)]
r8169: fix NAPI handling under high load
rtl_rx() and rtl_tx() are called only if the respective bits are set
in the interrupt status register. Under high load NAPI may not be
able to process all data (work_done == budget) and it will schedule
subsequent calls to the poll callback.
rtl_ack_events() however resets the bits in the interrupt status
register, therefore subsequent calls to rtl8169_poll() won't call
rtl_rx() and rtl_tx() - chip interrupts are still disabled.
Fix this by calling rtl_rx() and rtl_tx() independent of the bits
set in the interrupt status register. Both functions will detect
if there's nothing to do for them.
Fixes: da78dbff2e05 ("r8169: remove work from irq handler.")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
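The NAPI behaviour described in the r8169 commit above can be simulated with a small loop. This is a simplified sketch: `run_polls()` is a hypothetical model with only an RX path, a single status bit, and no further interrupts arriving while polling.

```python
def run_polls(pending: int, budget: int, check_status_bits: bool, max_polls: int = 10):
    """Return (processed, still_pending) after one NAPI scheduling round."""
    status_rx = True  # set by the interrupt that scheduled NAPI
    processed = 0
    for _ in range(max_polls):
        work_done = 0
        if status_rx or not check_status_bits:
            # Models rtl_rx(): handle at most `budget` packets per poll.
            work_done = min(pending, budget)
            pending -= work_done
            processed += work_done
        status_rx = False  # models rtl_ack_events() clearing the status bits
        if work_done < budget:
            break  # poll is complete; NAPI would re-enable interrupts here
    return processed, pending


old = run_polls(pending=100, budget=64, check_status_bits=True)
new = run_polls(pending=100, budget=64, check_status_bits=False)
```

In this model the old behaviour handles 64 packets and then does nothing on the follow-up poll, leaving 36 queued, whereas calling the RX handler unconditionally drains all 100.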
David S. Miller [Thu, 18 Oct 2018 18:32:29 +0000 (11:32 -0700)]
sparc: Revert unintended perf changes.
Some local debugging hacks accidentally slipped into the VDSO commit.
Sorry!
Signed-off-by: David S. Miller <davem@davemloft.net>
Leo Li [Mon, 15 Oct 2018 13:46:40 +0000 (09:46 -0400)]
drm: Get ref on CRTC commit object when waiting for flip_done
This fixes a general protection fault, caused by accessing the contents
of a flip_done completion object that has already been freed. It occurs
due to the preemption of a non-blocking commit worker thread W by
another commit thread X. X continues to clear its atomic state at the
end, destroying the CRTC commit object that W still needs. Switching
back to W and accessing the commit objects then leads to bad results.
Worker W becomes preemptable when waiting for flip_done to complete. At
this point, a frequently occurring commit thread X can take over. Here's
an example where W is a worker thread that flips on both CRTCs, and X
does a legacy cursor update on both CRTCs:
...
1. W does flip work
2. W runs commit_hw_done()
3. W waits for flip_done on CRTC 1
4. > flip_done for CRTC 1 completes
5. W finishes waiting for CRTC 1
6. W waits for flip_done on CRTC 2
7. > Preempted by X
8. > flip_done for CRTC 2 completes
9. X atomic_check: hw_done and flip_done are complete on all CRTCs
10. X updates cursor on both CRTCs
11. X destroys atomic state
12. X done
13. > Switch back to W
14. W waits for flip_done on CRTC 2
15. W raises general protection fault
The error looks like so:
general protection fault: 0000 [#1] PREEMPT SMP PTI
**snip**
Call Trace:
lock_acquire+0xa2/0x1b0
_raw_spin_lock_irq+0x39/0x70
wait_for_completion_timeout+0x31/0x130
drm_atomic_helper_wait_for_flip_done+0x64/0x90 [drm_kms_helper]
amdgpu_dm_atomic_commit_tail+0xcae/0xdd0 [amdgpu]
commit_tail+0x3d/0x70 [drm_kms_helper]
process_one_work+0x212/0x650
worker_thread+0x49/0x420
kthread+0xfb/0x130
ret_from_fork+0x3a/0x50
Modules linked in: x86_pkg_temp_thermal amdgpu(O) chash(O)
gpu_sched(O) drm_kms_helper(O) syscopyarea sysfillrect sysimgblt
fb_sys_fops ttm(O) drm(O)
Note that i915 has this issue masked, since hw_done is signaled after
waiting for flip_done. Doing so will block the cursor update from
happening until hw_done is signaled, preventing the cursor commit from
destroying the state.
v2: The reference on the commit object needs to be obtained before
hw_done() is signaled, since that's the point where another commit
is allowed to modify the state. Assuming that the
new_crtc_state->commit object still exists within flip_done() is
incorrect.
Fix by getting a reference in setup_commit(), and releasing it
during default_clear().
Signed-off-by: Leo Li <sunpeng.li@amd.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1539611200-6184-1-git-send-email-sunpeng.li@amd.com
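The lifetime issue in the flip_done commit above is a classic reference-counting problem. The sketch below is a toy model, not DRM code: `CrtcCommit` and `scenario()` are hypothetical, and the interleaving of W and X is reduced to a fixed sequence of get/put calls.

```python
class CrtcCommit:
    """Toy refcounted object standing in for the CRTC commit object."""

    def __init__(self):
        self.refcount = 1  # reference held by the atomic state
        self.freed = False

    def get(self):
        if self.freed:
            raise RuntimeError("get() on a freed object")
        self.refcount += 1

    def put(self):
        self.refcount -= 1
        if self.refcount == 0:
            self.freed = True


def scenario(take_extra_ref: bool):
    """Return (usable_by_W, freed_at_end) for the W/X interleaving."""
    commit = CrtcCommit()
    if take_extra_ref:
        commit.get()                # W takes its own reference before hw_done
    commit.put()                    # X destroys the atomic state
    usable_by_w = not commit.freed  # W resumes and waits for flip_done
    if take_extra_ref:
        commit.put()                # W drops its reference when it is done
    return usable_by_w, commit.freed
```

Without the extra reference the object is already gone when W resumes; with it, the object survives until W releases it and is still freed exactly once.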
David S. Miller [Thu, 18 Oct 2018 16:55:08 +0000 (09:55 -0700)]
Merge branch 'master' of git://git./linux/kernel/git/klassert/ipsec
Steffen Klassert says:
====================
pull request (net): ipsec 2018-10-18
1) Free the xfrm interface gro_cells when deleting the
interface, otherwise we leak it. From Li RongQing.
2) net/core/flow.c does not exist anymore, so remove it
from the MAINTAINERS file.
3) Fix a slab-out-of-bounds in _decode_session6.
From Alexei Starovoitov.
4) Fix RCU protection when policies are inserted into
their bydst lists. From Florian Westphal.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ming Lei [Fri, 12 Oct 2018 07:53:10 +0000 (15:53 +0800)]
block: don't deal with discard limit in blkdev_issue_discard()
blk_queue_split() does respect this limit via bio splitting, so no
need to do that in blkdev_issue_discard(), then we can align to
normal bio submit(bio_add_page() & submit_bio()).
More importantly, this patch fixes one issue introduced in commit
a22c4d7e34402cc ("block: re-add discard_granularity and alignment checks"), in which
zero discard bio may be generated in case of zero alignment.
Fixes: a22c4d7e34402ccdf3 ("block: re-add discard_granularity and alignment checks")
Cc: stable@vger.kernel.org
Cc: Ming Lin <ming.l@ssi.samsung.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Xiao Ni <xni@redhat.com>
Tested-by: Mariusz Dabrowski <mariusz.dabrowski@intel.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Eric Sandeen [Wed, 17 Oct 2018 14:23:59 +0000 (15:23 +0100)]
fscache: Fix out of bound read in long cookie keys
fscache_set_key() can incur an out-of-bounds read, reported by KASAN:
BUG: KASAN: slab-out-of-bounds in fscache_alloc_cookie+0x5b3/0x680 [fscache]
Read of size 4 at addr ffff88084ff056d4 by task mount.nfs/32615
and also reported by syzbot at https://lkml.org/lkml/2018/7/8/236
BUG: KASAN: slab-out-of-bounds in fscache_set_key fs/fscache/cookie.c:120 [inline]
BUG: KASAN: slab-out-of-bounds in fscache_alloc_cookie+0x7a9/0x880 fs/fscache/cookie.c:171
Read of size 4 at addr ffff8801d3cc8bb4 by task syz-executor907/4466
This happens for any index_key_len which is not divisible by 4 and is
larger than the size of the inline key, because the code allocates exactly
index_key_len for the key buffer, but the hashing loop is stepping through
it 4 bytes (u32) at a time in the buf[] array.
Fix this by calculating how many u32 buffers we'll need by using
DIV_ROUND_UP, and then using kcalloc() to allocate a precleared allocation
buffer to hold the index_key, then using that same count as the hashing
index limit.
Fixes: ec0328e46d6e ("fscache: Maintain a catalogue of allocated cookies")
Reported-by: syzbot+a95b989b2dde8e806af8@syzkaller.appspotmail.com
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
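The allocation fix described in the long-cookie-key commit above can be sketched as follows. The names are hypothetical and `toy_hash()` is not fscache's hash function; it only shares the property that it walks the buffer in 4-byte words. Python raises an error where C would read past the end of the allocation.

```python
import struct


def div_round_up(n: int, d: int) -> int:
    return (n + d - 1) // d


def make_key_buffer(index_key: bytes) -> bytearray:
    """Allocate whole, zero-filled 32-bit words, then copy the key in."""
    bufs = div_round_up(len(index_key), 4)
    buf = bytearray(bufs * 4)            # zero-filled, like a kcalloc() of bufs words
    buf[:len(index_key)] = index_key
    return buf


def toy_hash(buf) -> int:
    """Walk the buffer one little-endian u32 at a time (illustrative hash only)."""
    h = 0
    for (word,) in struct.iter_unpack("<I", bytes(buf)):
        h = (h * 31 + word) & 0xFFFFFFFF
    return h


key = b"k" * 18                  # 18 is not divisible by 4
padded = make_key_buffer(key)    # 20 bytes, last two are zero
digest = toy_hash(padded)        # reads exactly 5 words, all inside the buffer
```

Calling `toy_hash(key)` on the exact-length 18-byte key fails in this model, which corresponds to the out-of-bounds read KASAN reported.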
David Howells [Wed, 17 Oct 2018 14:23:45 +0000 (15:23 +0100)]
fscache: Fix incomplete initialisation of inline key space
The inline key in struct fscache_cookie is insufficiently initialized,
zeroing only 3 of the 4 slots, therefore an index_key_len between 13 and 15
bytes will end up hashing uninitialized memory because the memcpy only
partially fills the last buf[] element.
Fix this by clearing fscache_cookie objects on allocation rather than using
the slab constructor to initialise them. We're going to pretty much fill
in the entire struct anyway, so bringing it into our dcache writably
shouldn't incur much overhead.
This removes the need to do clearance in fscache_set_key() (where we aren't
doing it correctly anyway).
Also, we don't need to set cookie->key_len in fscache_set_key() as we
already did it in the only caller, so remove that.
Fixes: ec0328e46d6e ("fscache: Maintain a catalogue of allocated cookies")
Reported-by: syzbot+a95b989b2dde8e806af8@syzkaller.appspotmail.com
Reported-by: Eric Sandeen <sandeen@redhat.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Al Viro [Wed, 17 Oct 2018 14:23:26 +0000 (15:23 +0100)]
cachefiles: fix the race between cachefiles_bury_object() and rmdir(2)
the victim might've been rmdir'ed just before the lock_rename();
unlike the normal callers, we do not look the source up after the
parents are locked - we know it beforehand and just recheck that it's
still the child of what used to be its parent. Unfortunately,
the check is too weak - we don't spot a dead directory since its
->d_parent is unchanged, dentry is positive, etc. So we sail all
the way to ->rename(), with hosting filesystems _not_ expecting
to be asked to rename an rmdir'ed subdirectory.
The fix is easy, fortunately - the lock on parent is sufficient for
making IS_DEADDIR() on child safe.
Cc: stable@vger.kernel.org
Fixes: 9ae326a69004 (CacheFiles: A cache that backs onto a mounted filesystem)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Linus Torvalds [Fri, 12 Oct 2018 22:22:59 +0000 (15:22 -0700)]
mremap: properly flush TLB before releasing the page
Jann Horn points out that our TLB flushing was subtly wrong for the
mremap() case. What makes mremap() special is that we don't follow the
usual "add page to list of pages to be freed, then flush tlb, and then
free pages". No, mremap() obviously just _moves_ the page from one page
table location to another.
That matters, because mremap() thus doesn't directly control the
lifetime of the moved page with a freelist: instead, the lifetime of the
page is controlled by the page table locking, that serializes access to
the entry.
As a result, we need to flush the TLB not just before releasing the lock
for the source location (to avoid any concurrent accesses to the entry),
but also before we release the destination page table lock (to avoid the
TLB being flushed after somebody else has already done something to that
page).
This also makes the whole "need_flush" logic unnecessary, since we now
always end up flushing the TLB for every valid entry.
Reported-and-tested-by: Jann Horn <jannh@google.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Tested-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Christoph Hellwig [Thu, 18 Oct 2018 06:22:39 +0000 (08:22 +0200)]
LICENSES: Remove CC-BY-SA-4.0 license text
Using non-GPL licenses for our documentation is rather problematic,
as it can directly include other files, which generally are GPLv2
licensed and thus not compatible.
Remove this license now that the only user (idr.rst) is gone to avoid
people semi-accidentally using it again.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Greg Kroah-Hartman [Thu, 18 Oct 2018 09:24:32 +0000 (11:24 +0200)]
Merge branch 'ida-fixes-4.19-rc8' of git://git.infradead.org/users/willy/linux-dax
Matthew writes:
"IDA/IDR fixes for 4.19
I have two tiny fixes, one for the IDA test-suite and one for the IDR
documentation license."
* 'ida-fixes-4.19-rc8' of git://git.infradead.org/users/willy/linux-dax:
idr: Change documentation license
test_ida: Fix lockdep warning
Ingo Molnar [Thu, 18 Oct 2018 05:41:29 +0000 (07:41 +0200)]
Merge tag 'perf-urgent-for-mingo-4.19-20181017' of git://git./linux/kernel/git/acme/linux into perf/urgent
Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
- Stop falling back to kallsyms for vDSO symbols lookup, this wasn't
being really used and is not valid in arches such as Sparc, where
user and kernel space don't share the address space, relying only on
cpumode to figure out what DSOs to lookup (Arnaldo Carvalho de Melo)
- Align CPU map synthesized events properly, fixing SIGBUS in
CPUs like Sparc (David Miller)
- Fix use of alternatives to find JDIR (Jarod Wilson)
- Store IDs for events with their own CPUs when synthesizing user
level event details (scale, unit, etc) events, fixing a crash
when recording a PMU event with a cpumask defined (Jiri Olsa)
- Fix wrong filter_band* values for uncore Intel vendor events (Jiri Olsa)
- Fix detection of tracefs path in systems without tracefs, where
that path should be the debugfs mountpoint plus "/tracing/" (Jiri Olsa)
- Pass build flags to traceevent build, allowing using alternative
flags in distro packages, RPM, for instance (Jiri Olsa)
- Fix 'perf report' crash on invalid inline debug information (Milian Wolff)
- Synch KVM UAPI copies (Arnaldo Carvalho de Melo)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Nikolay Aleksandrov [Wed, 17 Oct 2018 19:34:34 +0000 (22:34 +0300)]
net: ipmr: fix unresolved entry dumps
If the skb space ends in an unresolved entry while dumping we'll miss
some unresolved entries. The reason is due to zeroing the entry counter
between dumping resolved and unresolved mfc entries. We should just
keep counting until the whole table is dumped and zero when we move to
the next as we have a separate table counter.
Reported-by: Colin Ian King <colin.king@canonical.com>
Fixes: 8fb472c09b9d ("ipmr: improve hash scalability")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
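The counting scheme the ipmr commit above moves to, one entry counter that keeps running across the resolved and unresolved lists, can be sketched as a resumable dump. This models only the fixed behaviour; `dump_table()`, `dump_all()` and the `room` parameter (how many entries fit in one buffer) are hypothetical.

```python
def dump_table(resolved, unresolved, start: int, room: int):
    """Dump up to `room` entries, resuming at combined index `start`.

    Returns (dumped, next_start, done). The counter `e` is never reset
    between the two lists, so `next_start` is valid for either of them.
    """
    if room < 1:
        raise ValueError("room must be at least 1")
    dumped = []
    e = 0
    for entry in list(resolved) + list(unresolved):
        if e >= start:
            if len(dumped) == room:
                return dumped, e, False  # buffer full: remember where to resume
            dumped.append(entry)
        e += 1
    return dumped, 0, True               # table finished: zero for the next table


def dump_all(resolved, unresolved, room: int):
    """Repeat dump_table() until the whole table has been emitted."""
    out, start, done = [], 0, False
    while not done:
        part, start, done = dump_table(resolved, unresolved, start, room)
        out.extend(part)
    return out
```

Because the resume position counts entries across both lists, a buffer that fills up in the middle of the unresolved entries resumes at the right place and every entry is emitted exactly once.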
Gregory CLEMENT [Wed, 17 Oct 2018 15:26:35 +0000 (17:26 +0200)]
net: mscc: ocelot: Fix comment in ocelot_vlant_wait_for_completion()
The ocelot_vlant_wait_for_completion() function is very similar to the
ocelot_mact_wait_for_completion(). It seems to have been copied but the
comment was not updated, so let's fix it.
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Wed, 17 Oct 2018 13:11:27 +0000 (21:11 +0800)]
sctp: fix the data size calculation in sctp_data_size
sctp data size should be calculated by subtracting data chunk header's
length from chunk_hdr->length, not just data header.
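As a rough illustration of the calculation described above, the sketch below uses stand-in structs laid out like the DATA chunk in RFC 4960 (a 4-byte chunk header followed by a 12-byte data header). All struct and function names are hypothetical, not the kernel's, and the length is assumed to be already converted to host byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the common chunk header: type, flags, length (4 bytes). */
struct ex_chunk_hdr {
	uint8_t type;
	uint8_t flags;
	uint16_t length;	/* covers the whole chunk, headers included */
};

/* Stand-in for the DATA-specific header that follows it (12 bytes). */
struct ex_data_hdr {
	uint32_t tsn;
	uint16_t stream;
	uint16_t ssn;
	uint32_t ppid;
};

/* The full DATA chunk header is both of the above, back to back. */
struct ex_data_chunk {
	struct ex_chunk_hdr chunk_hdr;
	struct ex_data_hdr data_hdr;
};

/* Subtracts only the data header: over-counts by the chunk header size.
 * Assumes chunk_len is at least sizeof(struct ex_data_hdr). */
size_t ex_data_size_short(uint16_t chunk_len)
{
	return chunk_len - sizeof(struct ex_data_hdr);
}

/* Subtracts the whole data chunk header; returns 0 for truncated chunks. */
size_t ex_data_size(uint16_t chunk_len)
{
	if (chunk_len < sizeof(struct ex_data_chunk))
		return 0;
	return chunk_len - sizeof(struct ex_data_chunk);
}
```

With a chunk length of 116 bytes, ex_data_size() yields 100 while ex_data_size_short() yields 104, i.e. the short variant is off by the 4-byte chunk header.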
Fixes: 668c9beb9020 ("sctp: implement assign_number for sctp_stream_interleave")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ake Koomsin [Wed, 17 Oct 2018 10:44:12 +0000 (19:44 +0900)]
virtio_net: avoid using netif_tx_disable() for serializing tx routine
Commit 713a98d90c5e ("virtio-net: serialize tx routine during reset")
introduces netif_tx_disable() after netif_device_detach() in order to
avoid use-after-free of tx queues. However, there are two issues.
1) Its operation is redundant with netif_device_detach() in case the
interface is running.
2) In case the interface is not running before suspending and
resuming, the tx does not get resumed by netif_device_attach().
This results in losing network connectivity.
It is better to use netif_tx_lock_bh()/netif_tx_unlock_bh() instead for
serializing tx routine during reset. This also preserves the symmetry
of netif_device_detach() and netif_device_attach().
Fixes: 713a98d90c5e ("virtio-net: serialize tx routine during reset")
Signed-off-by: Ake Koomsin <ake@igel.co.jp>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Greg Kroah-Hartman [Thu, 18 Oct 2018 05:29:05 +0000 (07:29 +0200)]
Merge tag 'trace-v4.19-rc8' of git://git./linux/kernel/git/rostedt/linux-trace
Steven writes:
"tracing: Two fixes for 4.19
This fixes two bugs:
- Fix size mismatch of tracepoint array
- Have preemptirq test module use same clock source of the selftest"
* tag 'trace-v4.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Use trace_clock_local() for looping in preemptirq_delay_test.c
tracepoint: Fix tracepoint array element size mismatch
Paolo Abeni [Wed, 17 Oct 2018 09:44:04 +0000 (11:44 +0200)]
udp6: fix encap return code for resubmitting
Commit eb63f2964dbe ("udp6: add missing checks on edumux packet
processing") used the same return code convention as its ipv4 counterpart,
but ipv6 uses the opposite one: positive values mean resubmit.
This change addresses the issue, using positive return value for
resubmitting. Also update the related comment, which was broken, too.
Fixes: eb63f2964dbe ("udp6: add missing checks on edumux packet processing")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 17 Oct 2018 08:05:45 +0000 (08:05 +0000)]
mlxsw: core: Fix use-after-free when flashing firmware during init
When the switch driver (e.g., mlxsw_spectrum) determines it needs to
flash a new firmware version it resets the ASIC after the flashing
process. The bus driver (e.g., mlxsw_pci) then registers itself again
with mlxsw_core which means (among other things) that the device
registers itself again with the hwmon subsystem.
Since the device was registered with the hwmon subsystem using
devm_hwmon_device_register_with_groups(), then the old hwmon device
(registered before the flashing) was never unregistered and was
referencing stale data, resulting in a use-after-free.
Fix by removing reliance on device managed APIs in mlxsw_hwmon_init().
Fixes: c86d62cc410c ("mlxsw: spectrum: Reset FW after flash")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Alexander Petrovskiy <alexpe@mellanox.com>
Tested-by: Alexander Petrovskiy <alexpe@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Tue, 16 Oct 2018 19:06:12 +0000 (03:06 +0800)]
sctp: not free the new asoc when sctp_wait_for_connect returns err
When sctp_wait_for_connect is called to wait for connect ready
for sp->strm_interleave in sctp_sendmsg_to_asoc, a panic could
be triggered if cpu is scheduled out and the new asoc is freed
elsewhere, as it will return err and later the asoc gets freed
again in sctp_sendmsg.
[ 285.840764] list_del corruption, ffff9f0f7b284078->next is LIST_POISON1 (dead000000000100)
[ 285.843590] WARNING: CPU: 1 PID: 8861 at lib/list_debug.c:47 __list_del_entry_valid+0x50/0xa0
[ 285.846193] Kernel panic - not syncing: panic_on_warn set ...
[ 285.846193]
[ 285.848206] CPU: 1 PID: 8861 Comm: sctp_ndata Kdump: loaded Not tainted 4.19.0-rc7.label #584
[ 285.850559] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 285.852164] Call Trace:
...
[ 285.872210] ? __list_del_entry_valid+0x50/0xa0
[ 285.872894] sctp_association_free+0x42/0x2d0 [sctp]
[ 285.873612] sctp_sendmsg+0x5a4/0x6b0 [sctp]
[ 285.874236] sock_sendmsg+0x30/0x40
[ 285.874741] ___sys_sendmsg+0x27a/0x290
[ 285.875304] ? __switch_to_asm+0x34/0x70
[ 285.875872] ? __switch_to_asm+0x40/0x70
[ 285.876438] ? ptep_set_access_flags+0x2a/0x30
[ 285.877083] ? do_wp_page+0x151/0x540
[ 285.877614] __sys_sendmsg+0x58/0xa0
[ 285.878138] do_syscall_64+0x55/0x180
[ 285.878669] entry_SYSCALL_64_after_hwframe+0x44/0xa9
This is a similar issue to the one fixed in commit ca3af4dd28cf
("sctp: do not free asoc when it is already dead in sctp_sendmsg").
But this one can't be fixed by returning -ESRCH for the dead asoc
in sctp_wait_for_connect, as it will break sctp_connect's return
value to users.
This patch is to simply set err to -ESRCH before it returns to
sctp_sendmsg when any err is returned by sctp_wait_for_connect
for sp->strm_interleave, so that no asoc would be freed due to
this.
When users see this error, they will know the packet hasn't been
sent. It also makes sense not to free the asoc just because waiting
for connect fails, as with the second call to sctp_wait_for_connect
in sctp_sendmsg_to_asoc.
Fixes: 668c9beb9020 ("sctp: implement assign_number for sctp_stream_interleave")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marcelo Ricardo Leitner [Tue, 16 Oct 2018 18:18:17 +0000 (15:18 -0300)]
sctp: fix race on sctp_id2asoc
syzbot reported a use-after-free involving sctp_id2asoc. Dmitry Vyukov
helped to root cause it and it is because of reading the asoc after it
was freed:
CPU 1                         CPU 2
(working on socket 1)         (working on socket 2)
                              sctp_association_destroy
sctp_id2asoc
  spin lock
    grab the asoc from idr
  spin unlock
                              spin lock
                                remove asoc from idr
                              spin unlock
                              free(asoc)
  if asoc->base.sk != sk ... [*]
This can only be hit if trying to fetch asocs from different sockets. As
we have a single IDR for all asocs, in all SCTP sockets, their id is
unique on the system. An application can try to send stuff on an id
that matches on another socket, and the if in [*] will protect from such
usage. But it didn't consider that as that asoc may belong to another
socket, it may be freed in parallel (read: under another socket lock).
We fix it by moving the checks in [*] into the protected region. This
fixes it because the asoc cannot be freed while the lock is held.
Reported-by: syzbot+c7dd55d7aec49d48e49a@syzkaller.appspotmail.com
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Tue, 16 Oct 2018 17:35:17 +0000 (19:35 +0200)]
r8169: re-enable MSI-X on RTL8168g
Similar to d49c88d7677b ("r8169: Enable MSI-X on RTL8106e"), after
e9d0ba506ea8 ("PCI: Reprogram bridge prefetch registers on resume")
we can safely assume that this also fixes the root cause of
the issue worked around by 7c53a722459c ("r8169: don't use MSI-X on
RTL8168g"). So let's revert it.
Fixes: 7c53a722459c ("r8169: don't use MSI-X on RTL8168g")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Taehee Yoo [Tue, 16 Oct 2018 15:35:10 +0000 (00:35 +0900)]
net: bpfilter: use get_pid_task instead of pid_task
pid_task() dereferences an RCU-protected tasks array, but shutdown_umh()
calls it without holding rcu_read_lock(), so rcu_read_lock() is needed.
get_pid_task() is a wrapper around pid_task(): it holds rcu_read_lock(),
calls pid_task() and, if the task isn't NULL, increases the task's
reference count.
test commands:
%modprobe bpfilter
%modprobe -rv bpfilter
splat looks like:
[15102.030932] =============================
[15102.030957] WARNING: suspicious RCU usage
[15102.030985] 4.19.0-rc7+ #21 Not tainted
[15102.031010] -----------------------------
[15102.031038] kernel/pid.c:330 suspicious rcu_dereference_check() usage!
[15102.031063]
other info that might help us debug this:
[15102.031332]
rcu_scheduler_active = 2, debug_locks = 1
[15102.031363] 1 lock held by modprobe/1570:
[15102.031389] #0: 00000000580ef2b0 (bpfilter_lock){+.+.}, at: stop_umh+0x13/0x52 [bpfilter]
[15102.031552]
stack backtrace:
[15102.031583] CPU: 1 PID: 1570 Comm: modprobe Not tainted 4.19.0-rc7+ #21
[15102.031607] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.6.5 07/08/2015
[15102.031628] Call Trace:
[15102.031676] dump_stack+0xc9/0x16b
[15102.031723] ? show_regs_print_info+0x5/0x5
[15102.031801] ? lockdep_rcu_suspicious+0x117/0x160
[15102.031855] pid_task+0x134/0x160
[15102.031900] ? find_vpid+0xf0/0xf0
[15102.032017] shutdown_umh.constprop.1+0x1e/0x53 [bpfilter]
[15102.032055] stop_umh+0x46/0x52 [bpfilter]
[15102.032092] __x64_sys_delete_module+0x47e/0x570
[ ... ]
Fixes: d2ba09c17a06 ("net: add skeleton of bpfilter kernel module")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gustavo A. R. Silva [Tue, 16 Oct 2018 13:06:41 +0000 (15:06 +0200)]
ptp: fix Spectre v1 vulnerability
pin_index can be indirectly controlled by user-space, hence leading
to a potential exploitation of the Spectre variant 1 vulnerability.
This issue was detected with the help of Smatch:
drivers/ptp/ptp_chardev.c:253 ptp_ioctl() warn: potential spectre issue
'ops->pin_config' [r] (local cap)
Fix this by sanitizing pin_index before using it to index
ops->pin_config, and before passing it as an argument to
function ptp_set_pinfunc(), in which it is used to index
info->pin_config.
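The sanitizing step can be pictured with a small userspace sketch. The kernel provides array_index_nospec() in <linux/nospec.h> for this purpose; the code below is only an imitation of the branch-free masking idea, with hypothetical names, and it omits the compiler barriers a real implementation needs.

```c
#include <limits.h>
#include <stddef.h>

#define EX_BITS_PER_LONG (sizeof(long) * CHAR_BIT)

/*
 * All-ones when index < size, zero otherwise, computed without a branch.
 * Assumes size <= LONG_MAX and an arithmetic right shift of negative
 * values (implementation-defined in C, but what GCC and Clang do).
 */
unsigned long ex_index_mask(unsigned long index, unsigned long size)
{
	return (unsigned long)(~(long)(index | (size - 1UL - index)) >>
			       (EX_BITS_PER_LONG - 1));
}

/* Clamp an index: in-range values pass through, anything else becomes 0. */
unsigned long ex_index_nospec(unsigned long index, unsigned long size)
{
	return index & ex_index_mask(index, size);
}

/* Hypothetical lookup: bounds check first, then sanitize, then index. */
int ex_read_pin(const int *pins, unsigned long nr_pins, unsigned long idx,
		int *out)
{
	if (idx >= nr_pins)
		return -1;
	idx = ex_index_nospec(idx, nr_pins);
	*out = pins[idx];
	return 0;
}
```

The point of the mask is that even if the bounds check is mispredicted, the index used for the load is forced to 0 rather than an attacker-chosen value.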
Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].
[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2
Cc: stable@vger.kernel.org
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Sat, 13 Oct 2018 10:26:53 +0000 (13:26 +0300)]
sparc: vDSO: Silence an uninitialized variable warning
Smatch complains that "val" would be uninitialized if kstrtoul() fails.
Fixes: 9a08862a5d2e ("vDSO for sparc")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nathan Chancellor [Sat, 13 Oct 2018 02:14:58 +0000 (19:14 -0700)]
net: qla3xxx: Remove overflowing shift statement
Clang currently warns:
drivers/net/ethernet/qlogic/qla3xxx.c:384:24: warning: signed shift
result (0xF00000000) requires 37 bits to represent, but 'int' only has
32 bits [-Wshift-overflow]
((ISP_NVRAM_MASK << 16) | qdev->eeprom_cmd_data));
~~~~~~~~~~~~~~ ^ ~~
1 warning generated.
The warning is certainly accurate since ISP_NVRAM_MASK is defined as
(0x000F << 16) which is then shifted by 16, resulting in 64424509440,
well above UINT_MAX.
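The arithmetic can be checked with a short sketch that performs the same double shift in 64 bits, where the result is well defined. The macro and function names are made up for illustration; only the mask value (0x000F << 16) comes from the message above.

```c
#include <stdint.h>

#define EX_NVRAM_MASK (0x000F << 16)	/* 0x000F0000, still fits in int */

/* The second shift by 16, widened to 64 bits first to avoid overflow. */
uint64_t ex_double_shift(void)
{
	return (uint64_t)EX_NVRAM_MASK << 16;
}

/* Number of bits needed to hold v as an unsigned value. */
unsigned int ex_bits_needed(uint64_t v)
{
	unsigned int bits = 0;

	while (v) {
		bits++;
		v >>= 1;
	}
	return bits;
}
```

ex_double_shift() returns 0xF00000000, which is 64424509440 and needs 36 value bits; adding a sign bit gives the 37 bits Clang reports for a signed result.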
Given that this is the only location in this driver where ISP_NVRAM_MASK
is shifted again, it seems likely that ISP_NVRAM_MASK was originally
defined without a shift and during the move of the shift to the
definition, this statement wasn't properly removed (since ISP_NVRAM_MASK
is used in the statement right above this). Only the maintainers can
confirm this since this statement has been here since the driver was
first added to the kernel.
Link: https://github.com/ClangBuiltLinux/linux/issues/127
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 18 Oct 2018 04:51:14 +0000 (21:51 -0700)]
Merge branch 'geneve-vxlan-mtu'
Stefano Brivio says:
====================
geneve, vxlan: Don't set exceptions if skb->len < mtu
This series fixes the exception abuse described in 2/2, and 1/2
is just a preparatory change to make 2/2 less ugly.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefano Brivio [Fri, 12 Oct 2018 21:53:59 +0000 (23:53 +0200)]
geneve, vxlan: Don't set exceptions if skb->len < mtu
We shouldn't abuse exceptions: if the destination MTU is already higher
than what we're transmitting, no exception should be created.
Fixes: 52a589d51f10 ("geneve: update skb dst pmtu on tx path")
Fixes: a93bf0ff4490 ("vxlan: update skb dst pmtu on tx path")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefano Brivio [Fri, 12 Oct 2018 21:53:58 +0000 (23:53 +0200)]
geneve, vxlan: Don't check skb_dst() twice
Commit f15ca723c1eb ("net: don't call update_pmtu unconditionally")
prevents us from updating the PMTU for a non-existent destination, but
didn't clean up cases where the check was already explicit. Drop those
redundant checks.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 18 Oct 2018 04:28:01 +0000 (21:28 -0700)]
sparc: Fix syscall fallback bugs in VDSO.
First, the trap number for 32-bit syscalls is 0x10.
Also, only negate the return value when syscall error is indicated by
the carry bit being set.
Signed-off-by: David S. Miller <davem@davemloft.net>
Steven Rostedt (VMware) [Tue, 16 Oct 2018 03:31:42 +0000 (23:31 -0400)]
tracing: Use trace_clock_local() for looping in preemptirq_delay_test.c
The preemptirq_delay_test module is used for the ftrace selftest code that
tests the latency tracers. The problem is that it uses ktime for the delay
loop, and then checks the tracer to see if the delay loop is caught, but the
tracer uses trace_clock_local() which uses various different other clocks to
measure the latency. As ktime uses the clock cycles, and the code then
converts that to nanoseconds, it causes rounding errors, and the preemptirq
latency tests are failing due to being off by 1 (it expects to see a delay
of 500000 us, but the delay is only 499999 us). This is happening due to a
rounding error in the ktime (which is totally legit). The purpose of the
test is to see if it can catch the delay, not to test the accuracy between
trace_clock_local() and ktime_get(). Best to use apples to apples, and have
the delay loop use the same clock as the latency tracer does.
Cc: stable@vger.kernel.org
Fixes: f96e8577da102 ("lib: Add module for testing preemptoff/irqsoff latency tracers")
Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Mathieu Desnoyers [Sat, 13 Oct 2018 19:10:50 +0000 (15:10 -0400)]
tracepoint: Fix tracepoint array element size mismatch
Commit 46e0c9be206f ("kernel: tracepoints: add support for relative
references") changes the layout of the __tracepoint_ptrs section on
architectures supporting relative references. However, it does so
without turning struct tracepoint * const into const int elsewhere in
the tracepoint code, which has the following side-effect:
Setting mod->num_tracepoints is done by module.c:
mod->tracepoints_ptrs = section_objs(info, "__tracepoints_ptrs",
sizeof(*mod->tracepoints_ptrs),
&mod->num_tracepoints);
Basically, since sizeof(*mod->tracepoints_ptrs) is a pointer size
(rather than sizeof(int)), num_tracepoints is erroneously set to half the
size it should be on 64-bit arch. So a module with an odd number of
tracepoints misses the last tracepoint due to effect of integer
division.
So in the module going notifier:
for_each_tracepoint_range(mod->tracepoints_ptrs,
mod->tracepoints_ptrs + mod->num_tracepoints,
tp_module_going_check_quiescent, NULL);
the expression (mod->tracepoints_ptrs + mod->num_tracepoints) actually
evaluates to something within the bounds of the array, but misses the
last tracepoint if the number of tracepoints is odd on 64-bit arch.
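The integer-division effect can be reproduced with plain arithmetic. The sketch below is a hypothetical model, not the module loader: it assumes 32-bit section entries and lets the caller pick the element size used to derive the count.

```c
#include <stddef.h>
#include <stdint.h>

/* Element count a section loader would derive for a given element size. */
size_t ex_section_count(size_t section_bytes, size_t elem_size)
{
	return section_bytes / elem_size;
}

/*
 * How many 32-bit entries lie below the end address "begin + count"
 * when count was derived using assumed_elem_size per element.
 */
size_t ex_entries_reached(size_t nr_entries, size_t assumed_elem_size)
{
	size_t section_bytes = nr_entries * sizeof(int32_t);
	size_t count = ex_section_count(section_bytes, assumed_elem_size);

	return count * assumed_elem_size / sizeof(int32_t);
}
```

For five 32-bit entries (20 bytes), an assumed element size of 8 gives a count of 2 and reaches only 4 entries, so the fifth is missed; with six entries the count is 3 and all 6 are reached, which matches the "odd number of tracepoints" condition.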
Fix this by introducing a new typedef: tracepoint_ptr_t, which
is either "const int" on architectures that have PREL32 relocations,
or "struct tracepoint * const" on architectures that do not have
this feature.
Also provide a new tracepoint_ptr_deref() static inline to
encapsulate dereferencing this type rather than duplicate code and
ugly ifdefs within the for_each_tracepoint_range() implementation.
This issue appears in 4.19-rc kernels, and should ideally be fixed
before the end of the rc cycle.
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Jessica Yu <jeyu@kernel.org>
Link: http://lkml.kernel.org/r/20181013191050.22389-1-mathieu.desnoyers@efficios.com
Link: http://lkml.kernel.org/r/20180704083651.24360-7-ard.biesheuvel@linaro.org
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morris <james.morris@microsoft.com>
Cc: James Morris <jmorris@namei.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Nicolas Pitre <nico@linaro.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Gustavo A. R. Silva [Tue, 16 Oct 2018 10:16:45 +0000 (12:16 +0200)]
usb: gadget: storage: Fix Spectre v1 vulnerability
num can be indirectly controlled by user-space, hence leading to
a potential exploitation of the Spectre variant 1 vulnerability.
This issue was detected with the help of Smatch:
drivers/usb/gadget/function/f_mass_storage.c:3177 fsg_lun_make() warn:
potential spectre issue 'fsg_opts->common->luns' [r] (local cap)
Fix this by sanitizing num before using it to index
fsg_opts->common->luns.
Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].
[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2
Cc: stable@vger.kernel.org
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Felipe Balbi <felipe.balbi@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Arnaldo Carvalho de Melo [Tue, 16 Oct 2018 20:08:29 +0000 (17:08 -0300)]
perf tools: Stop fallbacking to kallsyms for vdso symbols lookup
David reports that:
<quote>
Perf has this hack where it uses the kernel symbol map as a backup when
a symbol can't be found in the user's symbol table(s).
This causes problems because the tests driving this code path use
machine__kernel_ip(), and that is completely meaningless on Sparc. On
sparc64 the kernel and user live in physically separate virtual address
spaces, rather than a shared one. And the kernel lives at a virtual
address that overlaps common userspace addresses. So this test passes
almost all the time when a user symbol lookup fails.
The consequence of this is that, if the unfound user virtual address in
the sample doesn't match up to a kernel symbol either, we trigger things
like this code in builtin-top.c:
if (al.sym == NULL && al.map != NULL) {
	const char *msg = "Kernel samples will not be resolved.\n";
	/*
	 * As we do lazy loading of symtabs we only will know if the
	 * specified vmlinux file is invalid when we actually have a
	 * hit in kernel space and then try to load it. So if we get
	 * here and there are _no_ symbols in the DSO backing the
	 * kernel map, bail out.
	 *
	 * We may never get here, for instance, if we use -K/
	 * --hide-kernel-symbols, even if the user specifies an
	 * invalid --vmlinux ;-)
	 */
	if (!machine->kptr_restrict_warned && !top->vmlinux_warned &&
	    __map__is_kernel(al.map) && map__has_symbols(al.map)) {
		if (symbol_conf.vmlinux_name) {
			char serr[256];
			dso__strerror_load(al.map->dso, serr, sizeof(serr));
			ui__warning("The %s file can't be used: %s\n%s",
				    symbol_conf.vmlinux_name, serr, msg);
		} else {
			ui__warning("A vmlinux file was not found.\n%s",
				    msg);
		}
		if (use_browser <= 0)
			sleep(5);
		top->vmlinux_warned = true;
	}
}
When I fire up a compilation on sparc, this triggers immediately.
I'm trying to figure out what the "backup to kernel map" code is
accomplishing.
I see some language in the current code and in the changes that have
happened in this area talking about vdso. Does that really happen?
The vdso is mapped into userspace virtual addresses, not kernel ones.
More history. This didn't cause problems on sparc some time ago,
because the kernel IP check used to be "ip < 0" :-) Sparc kernel
addresses are not negative. But now with machine__kernel_ip(), which
works using the symbol table determined kernel address range, it does
trigger.
What it all boils down to is that on architectures like sparc,
machine__kernel_ip() should always return false in this scenario, and
therefore this kind of logic:
if (cpumode == PERF_RECORD_MISC_USER && machine &&
mg != &machine->kmaps &&
machine__kernel_ip(machine, al->addr)) {
is basically invalid. PERF_RECORD_MISC_USER implies no kernel address
can possibly match for the sample/event in question (no matter how
hard you try!) :-)
</quote>
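To make the two kernel-address checks in the quoted text concrete, here is a small sketch. Both functions are hypothetical stand-ins: the first models the old "ip < 0" test, the second is one reading of a check based on the kernel address range taken from the symbol table. The kernel start value used in the note below is purely illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sign-based check: true when the address, viewed as signed, is negative. */
bool ex_kernel_ip_by_sign(uint64_t ip)
{
	return (ip >> 63) != 0;
}

/* Range-based check against a kernel start address (caller-supplied). */
bool ex_kernel_ip_by_range(uint64_t ip, uint64_t kernel_start)
{
	return ip >= kernel_start;
}
```

Taking the user-space address 0x7ffcdb53a914 from the samples in this message: the sign-based check rejects it, and so does the range check when the kernel starts high (for example 0xffffffff81000000). If the kernel range instead starts at a low address such as 0x400000, as can happen when kernel and user live in separate address spaces, the range check accepts it, which is the false positive described in the quote.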
So, I thought something had changed and in the past we would somehow
find that address in the kallsyms, but I couldn't find anything to back
that up, the patch introducing this is over a decade old, lots of things
changed, so I was just thinking I was missing something.
I tried a gtod busy loop to generate vdso activity and added a 'perf
probe' at that branch, on x86_64 to see if it ever gets hit:
Made thread__find_map() noinline, as 'perf probe' in lines of inline
functions seems to not be working, only at function start. (Masami?)
# perf probe -x ~/bin/perf -L thread__find_map:57
<thread__find_map@/home/acme/git/perf/tools/perf/util/event.c:57>
57 if (cpumode == PERF_RECORD_MISC_USER && machine &&
58 mg != &machine->kmaps &&
59 machine__kernel_ip(machine, al->addr)) {
60 mg = &machine->kmaps;
61 load_map = true;
62 goto try_again;
}
} else {
/*
* Kernel maps might be changed when loading
* symbols so loading
* must be done prior to using kernel maps.
*/
69 if (load_map)
70 map__load(al->map);
71 al->addr = al->map->map_ip(al->map, al->addr);
# perf probe -x ~/bin/perf thread__find_map:60
Added new event:
probe_perf:thread__find_map (on thread__find_map:60 in /home/acme/bin/perf)
You can now use it in all perf tools, such as:
perf record -e probe_perf:thread__find_map -aR sleep 1
#
Then used this to see if, system wide, those probe points were being hit:
# perf trace -e *perf:thread*/max-stack=8/
^C[root@jouet ~]#
No hits when running 'perf top' and:
# cat gtod.c
#include <sys/time.h>
int main(void)
{
	struct timeval tv;
	while (1)
		gettimeofday(&tv, 0);
	return 0;
}
[root@jouet c]# ./gtod
^C
Pressed 'P' in 'perf top' and the [vdso] samples are there:
62.84% [vdso] [.] __vdso_gettimeofday
8.13% gtod [.] main
7.51% [vdso] [.] 0x0000000000000914
5.78% [vdso] [.] 0x0000000000000917
5.43% gtod [.] _init
2.71% [vdso] [.] 0x000000000000092d
0.35% [kernel] [k] native_io_delay
0.33% libc-2.26.so [.] __memmove_avx_unaligned_erms
0.20% [vdso] [.] 0x000000000000091d
0.17% [i2c_i801] [k] i801_access
0.06% firefox [.] free
0.06% libglib-2.0.so.0.5400.3 [.] g_source_iter_next
0.05% [vdso] [.] 0x0000000000000919
0.05% libpthread-2.26.so [.] __pthread_mutex_lock
0.05% libpixman-1.so.0.34.0 [.] 0x000000000006d3a7
0.04% [kernel] [k] entry_SYSCALL_64_trampoline
0.04% libxul.so [.] style::dom_apis::query_selector_slow
0.04% [kernel] [k] module_get_kallsym
0.04% firefox [.] malloc
0.04% [vdso] [.] 0x0000000000000910
I added a 'perf probe' to thread__find_map:69, and that surely got tons
of hits, i.e. for every map found, just to make sure the 'perf probe'
command was really working.
In the process I noticed a bug: we only have records for '[vdso]' for
pre-existing commands, i.e. ones that are running when we start 'perf top',
when we will generate the PERF_RECORD_MMAP by looking at /proc/PID/maps.
I.e. like this, for preexisting processes with a vdso map, again,
tracing for all the system, only pre-existing processes get a [vdso] map
(when having one):
[root@jouet ~]# perf probe -x ~/bin/perf __machine__addnew_vdso
Added new event:
probe_perf:__machine__addnew_vdso (on __machine__addnew_vdso in /home/acme/bin/perf)
You can now use it in all perf tools, such as:
perf record -e probe_perf:__machine__addnew_vdso -aR sleep 1
[root@jouet ~]# perf trace -e probe_perf:__machine__addnew_vdso/max-stack=8/
0.000 probe_perf:__machine__addnew_vdso:(568eb3)
__machine__addnew_vdso (/home/acme/bin/perf)
map__new (/home/acme/bin/perf)
machine__process_mmap2_event (/home/acme/bin/perf)
machine__process_event (/home/acme/bin/perf)
perf_event__process (/home/acme/bin/perf)
perf_tool__process_synth_event (/home/acme/bin/perf)
perf_event__synthesize_mmap_events (/home/acme/bin/perf)
__event__synthesize_thread (/home/acme/bin/perf)
The kernel is generating a PERF_RECORD_MMAP for vDSOs, but somehow
'perf top' is not getting those records while 'perf record' is:
# perf record ~acme/c/gtod
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.076 MB perf.data (1499 samples) ]
# perf report -D | grep PERF_RECORD_MMAP2
71293612401913 0x11b48 [0x70]: PERF_RECORD_MMAP2 25484/25484: [0x400000(0x1000) @ 0 fd:02 1137 541179306]: r-xp /home/acme/c/gtod
71293612419012 0x11be0 [0x70]: PERF_RECORD_MMAP2 25484/25484: [0x7fa4a2783000(0x227000) @ 0 fd:00 3146370 854107250]: r-xp /usr/lib64/ld-2.26.so
71293612432110 0x11c50 [0x60]: PERF_RECORD_MMAP2 25484/25484: [0x7ffcdb53a000(0x2000) @ 0 00:00 0 0]: r-xp [vdso]
71293612509944 0x11cb0 [0x70]: PERF_RECORD_MMAP2 25484/25484: [0x7fa4a23cd000(0x3b6000) @ 0 fd:00 3149723 262067164]: r-xp /usr/lib64/libc-2.26.so
#
# perf script | grep vdso | head
gtod 25484 71293.612768: 2485554 cycles:ppp: 7ffcdb53a914 [unknown] ([vdso])
gtod 25484 71293.613576: 2149343 cycles:ppp: 7ffcdb53a917 [unknown] ([vdso])
gtod 25484 71293.614274: 1814652 cycles:ppp: 7ffcdb53aca8 __vdso_gettimeofday+0x98 ([vdso])
gtod 25484 71293.614862: 1669070 cycles:ppp: 7ffcdb53acc5 __vdso_gettimeofday+0xb5 ([vdso])
gtod 25484 71293.615404: 1451589 cycles:ppp: 7ffcdb53acc5 __vdso_gettimeofday+0xb5 ([vdso])
gtod 25484 71293.615999: 1269941 cycles:ppp: 7ffcdb53ace6 __vdso_gettimeofday+0xd6 ([vdso])
gtod 25484 71293.616405: 1177946 cycles:ppp: 7ffcdb53a914 [unknown] ([vdso])
gtod 25484 71293.616775: 1121290 cycles:ppp: 7ffcdb53ac47 __vdso_gettimeofday+0x37 ([vdso])
gtod 25484 71293.617150: 1037721 cycles:ppp: 7ffcdb53ace6 __vdso_gettimeofday+0xd6 ([vdso])
gtod 25484 71293.617478: 994526 cycles:ppp: 7ffcdb53ace6 __vdso_gettimeofday+0xd6 ([vdso])
#
The patch is the obvious one and with it we also continue to resolve
vdso symbols for pre-existing processes in 'perf top' and for all
processes in 'perf record' + 'perf report/script'.
Suggested-by: David Miller <davem@davemloft.net>
Acked-by: David Miller <davem@davemloft.net>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-cs7skq9pp0kjypiju6o7trse@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jens Axboe [Wed, 17 Oct 2018 15:45:49 +0000 (09:45 -0600)]
Merge branch 'nvme-4.19' of git://git.infradead.org/nvme into for-linus
Pull single NVMe fix from Christoph.
* 'nvme-4.19' of git://git.infradead.org/nvme:
nvme: remove ns sibling before clearing path
Greg Kroah-Hartman [Wed, 17 Oct 2018 12:01:00 +0000 (14:01 +0200)]
Merge branch 'parisc-4.19-3' of git://git./linux/kernel/git/deller/parisc-linux
Helge writes:
"parisc fix:
Fix an uninitialized variable usage in the parisc unwind code."
* 'parisc-4.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Fix uninitialized variable usage in unwind.c
Greg Kroah-Hartman [Wed, 17 Oct 2018 11:40:10 +0000 (13:40 +0200)]
Merge tag 'clk-fixes-for-linus' of git://git./linux/kernel/git/clk/linux
Stephen writes:
"clk fixes for v4.19-rc8
One fix for the Allwinner A10 SoC's audio PLL that wasn't properly
set and generating noise."
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: sunxi-ng: sun4i: Set VCO and PLL bias current to lowest setting