linux-2.6-microblaze.git
6 years agoKVM: SVM: Add irqchip_split() checks before enabling AVIC
Suravee Suthikulpanit [Tue, 12 Sep 2017 15:42:42 +0000 (10:42 -0500)]
KVM: SVM: Add irqchip_split() checks before enabling AVIC

SVM AVIC hardware accelerates guest write to APIC_EOI register
(for edge-trigger interrupt), which means it does not trap to KVM.

So, only enable SVM AVIC only in split irqchip mode.
(e.g. launching qemu w/ option '-machine kernel_irqchip=split').

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Fixes: 44a95dae1d22 ("KVM: x86: Detect and Initialize AVIC support")
[Removed pr_debug - Radim.]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
6 years agodmi: Mark all struct dmi_system_id instances const
Christoph Hellwig [Thu, 14 Sep 2017 09:59:30 +0000 (11:59 +0200)]
dmi: Mark all struct dmi_system_id instances const

... and __initconst if applicable.

Based on similar work for an older kernel in the Grsecurity patch.

[JD: fix toshiba-wmi build]
[JD: add htcpen]
[JD: move __initconst where checkscript wants it]

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jean Delvare <jdelvare@suse.de>
6 years agomm, page_owner: skip unnecessary stack_trace entries
Prakash Gupta [Wed, 13 Sep 2017 23:28:35 +0000 (16:28 -0700)]
mm, page_owner: skip unnecessary stack_trace entries

The page_owner stacktrace always begin as follows:

  [<ffffff987bfd48f4>] save_stack+0x40/0xc8
  [<ffffff987bfd4da8>] __set_page_owner+0x3c/0x6c

These two entries do not provide any useful information and limits the
available stacktrace depth.  The page_owner stacktrace was skipping
caller function from stack entries but this was missed with commit
f2ca0b557107 ("mm/page_owner: use stackdepot to store stacktrace")

Example page_owner entry after the patch:

  Page allocated via order 0, mask 0x8(ffffff80085fb714)
  PFN 654411 type Movable Block 639 type CMA Flags 0x0(ffffffbe5c7f12c0)
  [<ffffff9b64989c14>] post_alloc_hook+0x70/0x80
  ...
  [<ffffff9b651216e8>] msm_comm_try_state+0x5f8/0x14f4
  [<ffffff9b6512486c>] msm_vidc_open+0x5e4/0x7d0
  [<ffffff9b65113674>] msm_v4l2_open+0xa8/0x224

Link: http://lkml.kernel.org/r/1504078343-28754-2-git-send-email-guptap@codeaurora.org
Fixes: f2ca0b557107 ("mm/page_owner: use stackdepot to store stacktrace")
Signed-off-by: Prakash Gupta <guptap@codeaurora.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agoarm64: stacktrace: avoid listing stacktrace functions in stacktrace
Prakash Gupta [Wed, 13 Sep 2017 23:28:32 +0000 (16:28 -0700)]
arm64: stacktrace: avoid listing stacktrace functions in stacktrace

The stacktraces always begin as follows:

  [<c00117b4>] save_stack_trace_tsk+0x0/0x98
  [<c0011870>] save_stack_trace+0x24/0x28
  ...

This is because the stack trace code includes the stack frames for
itself.  This is incorrect behaviour, and also leads to "skip" doing the
wrong thing (which is the number of stack frames to avoid recording.)

Perversely, it does the right thing when passed a non-current thread.
Fix this by ensuring that we have a known constant number of frames
above the main stack trace function, and always skip these.

This was fixed for arch arm by commit 3683f44c42e9 ("ARM: stacktrace:
avoid listing stacktrace functions in stacktrace")

Link: http://lkml.kernel.org/r/1504078343-28754-1-git-send-email-guptap@codeaurora.org
Signed-off-by: Prakash Gupta <guptap@codeaurora.org>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agomm: treewide: remove GFP_TEMPORARY allocation flag
Michal Hocko [Wed, 13 Sep 2017 23:28:29 +0000 (16:28 -0700)]
mm: treewide: remove GFP_TEMPORARY allocation flag

GFP_TEMPORARY was introduced by commit e12ba74d8ff3 ("Group short-lived
and reclaimable kernel allocations") along with __GFP_RECLAIMABLE.  It's
primary motivation was to allow users to tell that an allocation is
short lived and so the allocator can try to place such allocations close
together and prevent long term fragmentation.  As much as this sounds
like a reasonable semantic it becomes much less clear when to use the
highlevel GFP_TEMPORARY allocation flag.  How long is temporary? Can the
context holding that memory sleep? Can it take locks? It seems there is
no good answer for those questions.

The current implementation of GFP_TEMPORARY is basically GFP_KERNEL |
__GFP_RECLAIMABLE which in itself is tricky because basically none of
the existing caller provide a way to reclaim the allocated memory.  So
this is rather misleading and hard to evaluate for any benefits.

I have checked some random users and none of them has added the flag
with a specific justification.  I suspect most of them just copied from
other existing users and others just thought it might be a good idea to
use without any measuring.  This suggests that GFP_TEMPORARY just
motivates for cargo cult usage without any reasoning.

I believe that our gfp flags are quite complex already and especially
those with highlevel semantic should be clearly defined to prevent from
confusion and abuse.  Therefore I propose dropping GFP_TEMPORARY and
replace all existing users to simply use GFP_KERNEL.  Please note that
SLAB users with shrinkers will still get __GFP_RECLAIMABLE heuristic and
so they will be placed properly for memory fragmentation prevention.

I can see reasons we might want some gfp flag to reflect shorterm
allocations but I propose starting from a clear semantic definition and
only then add users with proper justification.

This was been brought up before LSF this year by Matthew [1] and it
turned out that GFP_TEMPORARY really doesn't have a clear semantic.  It
seems to be a heuristic without any measured advantage for most (if not
all) its current users.  The follow up discussion has revealed that
opinions on what might be temporary allocation differ a lot between
developers.  So rather than trying to tweak existing users into a
semantic which they haven't expected I propose to simply remove the flag
and start from scratch if we really need a semantic for short term
allocations.

[1] http://lkml.kernel.org/r/20170118054945.GD18349@bombadil.infradead.org

[akpm@linux-foundation.org: fix typo]
[akpm@linux-foundation.org: coding-style fixes]
[sfr@canb.auug.org.au: drm/i915: fix up]
Link: http://lkml.kernel.org/r/20170816144703.378d4f4d@canb.auug.org.au
Link: http://lkml.kernel.org/r/20170728091904.14627-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Neil Brown <neilb@suse.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agoIB/mlx4: fix sprintf format warning
Arnd Bergmann [Wed, 13 Sep 2017 23:28:26 +0000 (16:28 -0700)]
IB/mlx4: fix sprintf format warning

gcc-7 points out that a negative port_num value would overflow the
string buffer:

  drivers/infiniband/hw/mlx4/sysfs.c: In function 'mlx4_ib_device_register_sysfs':
  drivers/infiniband/hw/mlx4/sysfs.c:251:16: error: 'sprintf' may write a terminating nul past the end of the destination [-Werror=format-overflow=]
  drivers/infiniband/hw/mlx4/sysfs.c:251:2: note: 'sprintf' output between 2 and 11 bytes into a destination of size 10
  drivers/infiniband/hw/mlx4/sysfs.c:303:17: error: 'sprintf' may write a terminating nul past the end of the destination [-Werror=format-overflow=]
  drivers/infiniband/hw/mlx4/sysfs.c:303:3: note: 'sprintf' output between 2 and 11 bytes into a destination of size 10

While we should be able to assume that port_num is positive here, making
the buffer one byte longer has no downsides and avoids the warning.

Fixes: c1e7e466120b ("IB/mlx4: Add iov directory in sysfs under the ib device")
Link: http://lkml.kernel.org/r/20170714120720.906842-23-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agofscache: fix fscache_objlist_show format processing
Arnd Bergmann [Wed, 13 Sep 2017 23:28:23 +0000 (16:28 -0700)]
fscache: fix fscache_objlist_show format processing

gcc points out a minor bug in the handling of unknown cookie types,
which could result in a string overflow when the integer is copied into
a 3-byte string:

  fs/fscache/object-list.c: In function 'fscache_objlist_show':
  fs/fscache/object-list.c:265:19: error: 'sprintf' may write a terminating nul past the end of the destination [-Werror=format-overflow=]
   sprintf(_type, "%02u", cookie->def->type);
                  ^~~~~~
  fs/fscache/object-list.c:265:4: note: 'sprintf' output between 3 and 4 bytes into a destination of size 3

This is currently harmless as no code sets a type other than 0 or 1, but
it makes sense to use snprintf() here to avoid overflowing the array if
that changes.

Link: http://lkml.kernel.org/r/20170714120720.906842-22-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agolib/test_bitmap.c: use ULL suffix for 64-bit constants
Geert Uytterhoeven [Wed, 13 Sep 2017 23:28:20 +0000 (16:28 -0700)]
lib/test_bitmap.c: use ULL suffix for 64-bit constants

With gcc 4.1.2:

  lib/test_bitmap.c:189: warning: integer constant is too large for `long' type
  lib/test_bitmap.c:190: warning: integer constant is too large for `long' type
  lib/test_bitmap.c:194: warning: integer constant is too large for `long' type
  lib/test_bitmap.c:195: warning: integer constant is too large for `long' type

Add the missing "ULL" suffix to fix this.

Link: http://lkml.kernel.org/r/1505040523-31230-1-git-send-email-geert@linux-m68k.org
Fixes: 60ef690018b262dd ("bitmap: introduce BITMAP_FROM_U64()")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Yury Norov <ynorov@caviumnetworks.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agoprocfs: remove unused variable
Arnd Bergmann [Wed, 13 Sep 2017 23:28:17 +0000 (16:28 -0700)]
procfs: remove unused variable

In NOMMU configurations, we get a warning about a variable that has become
unused:

  fs/proc/task_nommu.c: In function 'nommu_vma_show':
  fs/proc/task_nommu.c:148:28: error: unused variable 'priv' [-Werror=unused-variable]

Link: http://lkml.kernel.org/r/20170911200231.3171415-1-arnd@arndb.de
Fixes: 1240ea0dc3bb ("fs, proc: remove priv argument from is_stack")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agodrivers/media/cec/cec-adap.c: fix build with gcc-4.4.4
Andrew Morton [Wed, 13 Sep 2017 23:28:14 +0000 (16:28 -0700)]
drivers/media/cec/cec-adap.c: fix build with gcc-4.4.4

gcc-4.4.4 has issues with initialization of anonymous unions:

  drivers/media/cec/cec-adap.c: In function 'cec_queue_msg_fh':
  drivers/media/cec/cec-adap.c:184: error: unknown field 'lost_msgs' specified in initializer

work around this.

Fixes: 6b2bbb08747a5 ("media: cec: rework the cec event handling")
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Hans Verkuil <hans.verkuil@cisco.com>
Cc: Maxime Ripard <maxime.ripard@free-electrons.com>
Cc: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agoidr: remove WARN_ON_ONCE() when trying to replace negative ID
Eric Biggers [Wed, 13 Sep 2017 23:28:11 +0000 (16:28 -0700)]
idr: remove WARN_ON_ONCE() when trying to replace negative ID

IDR only supports non-negative IDs.  There used to be a 'WARN_ON_ONCE(id <
0)' in idr_replace(), but it was intentionally removed by commit
2e1c9b286765 ("idr: remove WARN_ON_ONCE() on negative IDs").

Then it was added back by commit 0a835c4f090a ("Reimplement IDR and IDA
using the radix tree").  However it seems that adding it back was a
mistake, given that some users such as drm_gem_handle_delete()
(DRM_IOCTL_GEM_CLOSE) pass in a value from userspace to idr_replace(),
allowing the WARN_ON_ONCE to be triggered.  drm_gem_handle_delete()
actually just wants idr_replace() to return an error code if the ID is
not allocated, including in the case where the ID is invalid (negative).

So once again remove the bogus WARN_ON_ONCE().

This bug was found by syzkaller, which encountered the following
warning:

    WARNING: CPU: 3 PID: 3008 at lib/idr.c:157 idr_replace+0x1d8/0x240 lib/idr.c:157
    Kernel panic - not syncing: panic_on_warn set ...

    CPU: 3 PID: 3008 Comm: syzkaller218828 Not tainted 4.13.0-rc4-next-20170811 #2
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
    Call Trace:
     fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:190
     do_trap_no_signal arch/x86/kernel/traps.c:224 [inline]
     do_trap+0x260/0x390 arch/x86/kernel/traps.c:273
     do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:310
     do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:323
     invalid_op+0x1e/0x30 arch/x86/entry/entry_64.S:930
    RIP: 0010:idr_replace+0x1d8/0x240 lib/idr.c:157
    RSP: 0018:ffff8800394bf9f8 EFLAGS: 00010297
    RAX: ffff88003c6c60c0 RBX: 1ffff10007297f43 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8800394bfa78
    RBP: ffff8800394bfae0 R08: ffffffff82856487 R09: 0000000000000000
    R10: ffff8800394bf9a8 R11: ffff88006c8bae28 R12: ffffffffffffffff
    R13: ffff8800394bfab8 R14: dffffc0000000000 R15: ffff8800394bfbc8
     drm_gem_handle_delete+0x33/0xa0 drivers/gpu/drm/drm_gem.c:297
     drm_gem_close_ioctl+0xa1/0xe0 drivers/gpu/drm/drm_gem.c:671
     drm_ioctl_kernel+0x1e7/0x2e0 drivers/gpu/drm/drm_ioctl.c:729
     drm_ioctl+0x72e/0xa50 drivers/gpu/drm/drm_ioctl.c:825
     vfs_ioctl fs/ioctl.c:45 [inline]
     do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685
     SYSC_ioctl fs/ioctl.c:700 [inline]
     SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
     entry_SYSCALL_64_fastpath+0x1f/0xbe

Here is a C reproducer:

    #include <fcntl.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <drm/drm.h>

    int main(void)
    {
            int cardfd = open("/dev/dri/card0", O_RDONLY);

            ioctl(cardfd, DRM_IOCTL_GEM_CLOSE,
                  &(struct drm_gem_close) { .handle = -1 } );
    }

Link: http://lkml.kernel.org/r/20170906235306.20534-1-ebiggers3@gmail.com
Fixes: 0a835c4f090a ("Reimplement IDR and IDA using the radix tree")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: <stable@vger.kernel.org> [v4.11+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agosctp: potential read out of bounds in sctp_ulpevent_type_enabled()
Dan Carpenter [Wed, 13 Sep 2017 23:00:54 +0000 (02:00 +0300)]
sctp: potential read out of bounds in sctp_ulpevent_type_enabled()

This code causes a static checker warning because Smatch doesn't trust
anything that comes from skb->data.  I've reviewed this code and I do
think skb->data can be controlled by the user here.

The sctp_event_subscribe struct has 13 __u8 fields and we want to see
if ours is non-zero.  sn_type can be any value in the 0-USHRT_MAX range.
We're subtracting SCTP_SN_TYPE_BASE which is 1 << 15 so we could read
either before the start of the struct or after the end.

This is a very old bug and it's surprising that it would go undetected
for so long but my theory is that it just doesn't have a big impact so
it would be hard to notice.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoi2c: altera: Add Altera I2C Controller driver
Thor Thayer [Mon, 11 Sep 2017 21:17:20 +0000 (16:17 -0500)]
i2c: altera: Add Altera I2C Controller driver

Add driver support for the Altera I2C Controller. The I2C
controller is soft IP for use in FPGAs.

Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
6 years agodt-bindings: i2c: Add Altera I2C Controller
Thor Thayer [Mon, 11 Sep 2017 21:17:19 +0000 (16:17 -0500)]
dt-bindings: i2c: Add Altera I2C Controller

Add the documentation to support the Altera synthesizable
logic I2C Controller in FPGA.

Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
6 years agoMAINTAINERS: review Renesas DT bindings as well
Sergei Shtylyov [Wed, 13 Sep 2017 19:28:53 +0000 (22:28 +0300)]
MAINTAINERS: review Renesas DT bindings as well

When adding myself  as a reviewer for  the Renesas  Ethernet drivers
I somehow forgot about the bindings -- I want to review them as well.

Fixes: 8e6569af3a1b ("MAINTAINERS: add myself as Renesas Ethernet drivers reviewer")
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoum: return negative in tuntap_open_tramp()
Dan Carpenter [Fri, 25 Aug 2017 10:19:58 +0000 (13:19 +0300)]
um: return negative in tuntap_open_tramp()

The intention is to return negative error codes.  "pid" is already
negative but we accidentally negate it again back to positive.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
6 years agoum: remove a stray tab
Dan Carpenter [Fri, 25 Aug 2017 10:36:15 +0000 (13:36 +0300)]
um: remove a stray tab

Static checkers would urge us to add curly braces to this code, but
actually the code works correctly.  It just isn't indented right.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
6 years agoum: Use relative modversions with LD_SCRIPT_DYN
Thomas Meyer [Sun, 20 Aug 2017 11:26:05 +0000 (13:26 +0200)]
um: Use relative modversions with LD_SCRIPT_DYN

When building a dynamic kernel image use relative symbols with MODVERSIONS.

Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
6 years agoum: link vmlinux with -no-pie
Thomas Meyer [Sun, 20 Aug 2017 11:26:04 +0000 (13:26 +0200)]
um: link vmlinux with -no-pie

Debian's gcc defaults to pie. The global Makefile already defines the -fno-pie option.
Link UML dynamic kernel image also with -no-pie to fix the build.

Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
6 years agoum: Fix CONFIG_GCOV for modules.
Thomas Meyer [Sun, 13 Aug 2017 12:51:12 +0000 (14:51 +0200)]
um: Fix CONFIG_GCOV for modules.

Explicitly export symbols so modpost doesn't complain.

Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
6 years agoFix minor typos and grammar in UML start_up help
James Pack [Tue, 8 Aug 2017 20:19:41 +0000 (13:19 -0700)]
Fix minor typos and grammar in UML start_up help

Signed-off-by: James Pack <jpack61108@gmail.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
6 years agoum: defconfig: Cleanup from old Kconfig options
Krzysztof Kozlowski [Thu, 20 Jul 2017 05:01:10 +0000 (07:01 +0200)]
um: defconfig: Cleanup from old Kconfig options

Remove old, dead Kconfig option INET_LRO. It is gone since
commit 7bbf3cae65b6 ("ipv4: Remove inet_lro library").

Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Richard Weinberger <richard@nod.at>
6 years agonet_sched: gen_estimator: fix scaling error in bytes/packets samples
Eric Dumazet [Wed, 13 Sep 2017 18:16:45 +0000 (11:16 -0700)]
net_sched: gen_estimator: fix scaling error in bytes/packets samples

Denys reported wrong rate estimations with HTB classes.

It appears the bug was added in linux-4.10, since my tests
where using intervals of one second only.

HTB using 4 sec default rate estimators, reported rates
were 4x higher.

We need to properly scale the bytes/packets samples before
integrating them in EWMA.

Tested:
 echo 1 >/sys/module/sch_htb/parameters/htb_rate_est

 Setup HTB with one class with a rate/cail of 5Gbit

 Generate traffic on this class

 tc -s -d cl sh dev eth0 classid 7002:11
class htb 7002:11 parent 7002:1 prio 5 quantum 200000 rate 5Gbit ceil
5Gbit linklayer ethernet burst 80000b/1 mpu 0b cburst 80000b/1 mpu 0b
level 0 rate_handle 1
 Sent 1488215421648 bytes 982969243 pkt (dropped 0, overlimits 0
requeues 0)
 rate 5Gbit 412814pps backlog 136260b 2p requeues 0
 TCP pkts/rtx 982969327/45 bytes 1488215557414/68130
 lended: 22732826 borrowed: 0 giants: 0
 tokens: -1684 ctokens: -1684

Fixes: 1c0d32fde5bd ("net_sched: gen_estimator: complete rewrite of rate estimators")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'nfp-card-init'
David S. Miller [Wed, 13 Sep 2017 20:29:13 +0000 (13:29 -0700)]
Merge branch 'nfp-card-init'

Jakub Kicinski says:

====================
nfp: wait more carefully for card init

The first patch is a small fix for flower offload, we need a whitelist
of supported matches, otherwise the unsupported ones will be ignored.

The second and the third patch are adding wait/polling to the probe path.
We had reports of driver failing probe because it couldn't find the
control process (NSP) on the card.  Turns out the NSP will only announce
its existence after it's fully initialized.  Until now we assumed it
will be reachable, just not processing commands (hence we wait for
a NOOP command to execute successfully).

v2:
 - fix a bad merge which resulted in a build warning and retest.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonfp: wait for the NSP resource to appear on boot
Jakub Kicinski [Wed, 13 Sep 2017 17:16:00 +0000 (10:16 -0700)]
nfp: wait for the NSP resource to appear on boot

The control process (NSP) may take some time to complete its
initialization.  This is not a problem on most servers, but
on very fast-booting machines it may not be ready for operation
when driver probes the device.  There is also a version of the
flash in the wild where NSP tries to train the links as part
of init.  To wait for NSP initialization we should make sure
its resource has already been added to the resource table.
NSP adds itself there as last step of init.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonfp: wait for board state before talking to the NSP
Jakub Kicinski [Wed, 13 Sep 2017 17:15:59 +0000 (10:15 -0700)]
nfp: wait for board state before talking to the NSP

Board state informs us which low-level initialization stages the card
has completed.  We should wait for the card to be fully initialized
before trying to communicate with it, not only before we configure
passing traffic.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonfp: add whitelist of supported flow dissector
Pieter Jansen van Vuuren [Wed, 13 Sep 2017 17:15:58 +0000 (10:15 -0700)]
nfp: add whitelist of supported flow dissector

Previously we did not check the flow dissector against a list of allowed
and supported flow key dissectors. This patch introduces such a list and
correctly rejects unsupported flow keys.

Fixes: 43f84b72c50d ("nfp: add metadata to each flow offload")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoum: Fix FP register size for XSTATE/XSAVE
Thomas Meyer [Sat, 29 Jul 2017 15:03:23 +0000 (17:03 +0200)]
um: Fix FP register size for XSTATE/XSAVE

Hard code max size. Taken from
https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/common/x86-xstate.h

Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
6 years agoUBI: Fix two typos in comments
Uwe Kleine-König [Thu, 31 Aug 2017 19:31:57 +0000 (21:31 +0200)]
UBI: Fix two typos in comments

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
6 years agoubi: fastmap: fix spelling mistake: "invalidiate" -> "invalidate"
Colin Ian King [Mon, 3 Jul 2017 09:37:01 +0000 (10:37 +0100)]
ubi: fastmap: fix spelling mistake: "invalidiate" -> "invalidate"

Trivial fix to spelling mistake in ubi_err error message

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
6 years agoubi: pr_err() strings should end with newlines
Ben Dooks [Wed, 26 Jul 2017 09:53:50 +0000 (10:53 +0100)]
ubi: pr_err() strings should end with newlines

In build.c, the following pr_err calls should be terminated with
a new-line to avoid other messages being concatenated onto the end.

Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
6 years agoubi: pr_err() strings should end with newlines
Ben Dooks [Wed, 26 Jul 2017 09:51:10 +0000 (10:51 +0100)]
ubi: pr_err() strings should end with newlines

In ubi_attach_mtd_dev() the pr_err() calls should have their
messgaes terminated with a new-line to avoid other messages
being concatenated onto the end.

Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
6 years agoubi: pr_err() strings should end with newlines
Ben Dooks [Wed, 26 Jul 2017 09:42:54 +0000 (10:42 +0100)]
ubi: pr_err() strings should end with newlines

The ubi_init() function has a few error paths that use the
pr_err() to output errors. These should have new lines on
them as pr_err() does not automatically do this.

This fixes issues where if multiple mtd fail to bind to
ubi the console output starts wrapping around.

Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
6 years agoMerge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Wed, 13 Sep 2017 19:24:20 +0000 (12:24 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull perf fixes from Ingo Molnar:
 "A handful of tooling fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf stat: Wait for the correct child
  perf tools: Support running perf binaries with a dash in their name
  perf config: Check not only section->from_system_config but also item's
  perf ui progress: Fix progress update
  perf ui progress: Make sure we always define step value
  perf tools: Open perf.data with O_CLOEXEC flag
  tools lib api: Fix make DEBUG=1 build
  perf tests: Fix compile when libunwind's unwind.h is available
  tools include linux: Guard against redefinition of some macros

6 years agoMerge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Wed, 13 Sep 2017 19:22:32 +0000 (12:22 -0700)]
Merge branch 'sched-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull scheduler fixes from Ingo Molnar:
 "Three CPU hotplug related fixes and a debugging improvement"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/debug: Add debugfs knob for "sched_debug"
  sched/core: WARN() when migrating to an offline CPU
  sched/fair: Plug hole between hotplug and active_load_balance()
  sched/fair: Avoid newidle balance for !active CPUs

6 years agoMerge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Wed, 13 Sep 2017 18:56:16 +0000 (11:56 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:
 "The main changes are the PCID fixes from Andy, but there's also two
  hyperv fixes and two paravirt updates"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/hyper-v: Remove duplicated HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED definition
  x86/hyper-V: Allocate the IDT entry early in boot
  paravirt: Switch maintainer
  x86/paravirt: Remove no longer used paravirt functions
  x86/mm/64: Initialize CR4.PCIDE early
  x86/hibernate/64: Mask off CR3's PCID bits in the saved CR3
  x86/mm: Get rid of VM_BUG_ON in switch_tlb_irqs_off()

6 years agoMerge tag 'openrisc-for-linus' of git://github.com/openrisc/linux
Linus Torvalds [Wed, 13 Sep 2017 18:52:18 +0000 (11:52 -0700)]
Merge tag 'openrisc-for-linus' of git://github.com/openrisc/linux

Pull OpenRISC fixlet from Stafford Horne:
 "Fix warning for upcoming work to remove linux/vmalloc.h from
  asm-generic/io.h"

* tag 'openrisc-for-linus' of git://github.com/openrisc/linux:
  openrisc: add forward declaration for struct vm_area_struct

6 years agoMerge tag 'modules-for-v4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu...
Linus Torvalds [Wed, 13 Sep 2017 18:28:19 +0000 (11:28 -0700)]
Merge tag 'modules-for-v4.14' of git://git./linux/kernel/git/jeyu/linux

Pull modules updates from Jessica Yu:
 "Summary of modules changes for the 4.14 merge window:

   - minor code cleanups and fixes

   - modpost: avoid building modules that have names that exceed the
     size of the name field in struct module"

* tag 'modules-for-v4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
  module: Remove const attribute from alias for MODULE_DEVICE_TABLE
  module: fix ddebug_remove_module()
  modpost: abort if module name is too long

6 years agoFix up MAINTAINERS file sorting
Linus Torvalds [Wed, 13 Sep 2017 18:18:19 +0000 (11:18 -0700)]
Fix up MAINTAINERS file sorting

Another merge window, another MAINTAINERS file disaster.

People have serious problems with the alphabet and sorting, and poor
Jérôme Glisse and Radim Krčmář get their names mangled by locale issues,
turning them into some mangled mess (probably others do too, but those
two stood out when sorting things again).

And we now have two copies of the same 'AS3645A LED FLASH CONTROLLER
DRIVER' in the tree and in the MAINTAINERS file, but that's a separate
issue - the duplication is real, and I left them as two entries for the
same name.

This does not try to sort the actual section pattern entries, although I
may end up doing that later.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agoMerge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Linus Torvalds [Wed, 13 Sep 2017 18:04:14 +0000 (11:04 -0700)]
Merge tag 'clk-for-linus' of git://git./linux/kernel/git/clk/linux

Pull clk updates from Stephen Boyd:
 "The diff is dominated by the Allwinner A10/A20 SoCs getting converted
  to the sunxi-ng framework. Otherwise, the heavy hitters are various
  drivers for SoCs like AT91, Amlogic, Renesas, and Rockchip. There are
  some other new clk drivers in here too but overall this is just a
  bunch of clk drivers for various different pieces of hardware and a
  collection of non-critical fixes for clk drivers.

  New Drivers:
   - Allwinner R40 SoCs
   - Renesas R-Car Gen3 USB 2.0 clock selector PHY
   - Atmel AT91 audio PLL
   - Uniphier PXs3 SoCs
   - ARC HSDK Board PLLs
   - AXS10X Board PLLs
   - STMicroelectronics STM32H743 SoCs

  Removed Drivers:
   - Non-compiling mb86s7x support

  Updates:
   - Allwinner A10/A20 SoCs converted to sunxi-ng framework
   - Allwinner H3 CPU clk fixes
   - Renesas R-Car D3 SoC
   - Renesas V2H and M3-W modules
   - Samsung Exynos5420/5422/5800 audio fixes
   - Rockchip fractional clk approximation fixes
   - Rockchip rk3126 SoC support within the rk3128 driver
   - Amlogic gxbb CEC32 and sd_emmc clks
   - Amlogic meson8b reset controller support
   - IDT VersaClock 5P49V5925/5P49V6901 support
   - Qualcomm MSM8996 SMMU clks
   - Various 'const' applications for struct clk_ops
   - si5351 PLL reset bugfix
   - Uniphier audio on LD11/LD20 and ethernet support on LD11/LD20/Pro4/PXs2
   - Assorted Tegra clk driver fixes"

* tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (120 commits)
  clk: si5351: fix PLL reset
  ASoC: atmel-classd: remove aclk clock
  ASoC: atmel-classd: remove aclk clock from DT binding
  clk: at91: clk-generated: make gclk determine audio_pll rate
  clk: at91: clk-generated: create function to find best_diff
  clk: at91: add audio pll clock drivers
  dt-bindings: clk: at91: add audio plls to the compatible list
  clk: at91: clk-generated: remove useless divisor loop
  clk: mb86s7x: Drop non-building driver
  clk: ti: check for null return in strrchr to avoid null dereferencing
  clk: Don't write error code into divider register
  clk: uniphier: add video input subsystem clock
  clk: uniphier: add audio system clock
  clk: stm32h7: Add stm32h743 clock driver
  clk: gate: expose clk_gate_ops::is_enabled
  clk: nxp: clk-lpc32xx: rename clk_gate_is_enabled()
  clk: uniphier: add PXs3 clock data
  clk: hi6220: change watchdog clock source
  clk: Kconfig: Name RK805 in Kconfig for COMMON_CLK_RK808
  clk: cs2000: Add cs2000_set_saved_rate
  ...

6 years agoMerge tag 'rtc-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Linus Torvalds [Wed, 13 Sep 2017 17:56:00 +0000 (10:56 -0700)]
Merge tag 'rtc-4.14' of git://git./linux/kernel/git/abelloni/linux

Pull RTC updates from Alexandre Belloni:
 "Subsystem:
   - remove .open() and .release() RTC ops
   - constify i2c_device_id

  New driver:
   - Realtek RTD1295
   - Android emulator (goldfish) RTC

  Drivers:
   - ds1307: Beginning of a huge cleanup
   - s35390a: handle invalid RTC time
   - sun6i: external oscillator gate support"

* tag 'rtc-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (40 commits)
  rtc: ds1307: use octal permissions
  rtc: ds1307: fix braces
  rtc: ds1307: fix alignments and blank lines
  rtc: ds1307: use BIT
  rtc: ds1307: use u32
  rtc: ds1307: use sizeof
  rtc: ds1307: remove regs member
  rtc: Add Realtek RTD1295
  dt-bindings: rtc: Add Realtek RTD1295
  rtc: sun6i: Add support for the external oscillator gate
  rtc: goldfish: Add RTC driver for Android emulator
  dt-bindings: Add device tree binding for Goldfish RTC driver
  rtc: ds1307: add basic support for ds1341 chip
  rtc: ds1307: remove member nvram_offset from struct ds1307
  rtc: ds1307: factor out offset to struct chip_desc
  rtc: ds1307: factor out rtc_ops to struct chip_desc
  rtc: ds1307: factor out irq_handler to struct chip_desc
  rtc: ds1307: improve irq setup
  rtc: ds1307: constify struct chip_desc variables
  rtc: ds1307: improve trickle charger initialization
  ...

6 years agoMerge tag 'sound-fix-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai...
Linus Torvalds [Wed, 13 Sep 2017 17:50:06 +0000 (10:50 -0700)]
Merge tag 'sound-fix-4.14-rc1' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "Most of the commits are trivial cleanup patches, while one commit is a
  significant fix for the race at ALSA sequencer that was spotted by
  syzkaller"

* tag 'sound-fix-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: seq: Cancel pending autoload work at unbinding device
  ALSA: firewire: Use common error handling code in snd_motu_stream_start_duplex()
  ALSA: asihpi: Kill BUG_ON() usages
  ALSA: core: Use %pS printk format for direct addresses
  ALSA: ymfpci: Use common error handling code in snd_ymfpci_create()
  ALSA: ymfpci: Use common error handling code in snd_card_ymfpci_probe()
  ALSA: 6fire: Use common error handling code in usb6fire_chip_probe()
  ALSA: usx2y: Use common error handling code in submit_urbs()
  ALSA: us122l: Use common error handling code in us122l_create_card()
  ALSA: hdspm: Use common error handling code in snd_hdspm_probe()
  ALSA: rme9652: Use common code in hdsp_get_iobox_version()
  ALSA: maestro3: Use common error handling code in two functions

6 years agoMerge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Wed, 13 Sep 2017 17:47:14 +0000 (10:47 -0700)]
Merge tag 'scsi-misc' of git://git./linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "A tiny update: one patch corrects a Kconfig problem with the shift of
  the SAS SMP code to BSG and the other removes a vestige of user space
  target mode"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: scsi_transport_sas: select BLK_DEV_BSGLIB
  scsi: Remove Scsi_Host.uspace_req_q

6 years agoMerge branch 'for-linus' of git://git.kernel.dk/linux-block
Linus Torvalds [Wed, 13 Sep 2017 17:20:41 +0000 (10:20 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
 "Small collection of fixes that would be nice to have in -rc1. This
  contains:

   - NVMe pull request form Christoph, mostly with fixes for nvme-pci,
     host memory buffer in particular.

   - Error handling fixup for cgwb_create(), in case allocation of 'wb'
     fails. From Christophe Jaillet.

   - Ensure that trace_block_getrq() gets the 'dev' in an appropriate
     fashion, to avoid a potential NULL deref. From Greg Thelen.

   - Regression fix for dm-mq with blk-mq, fixing a problem with
     stacking IO schedulers. From me.

   - string.h fixup, fixing an issue with memcpy_and_pad(). This
     original change came in through an NVMe dependency, which is why
     I'm including it here. From Martin Wilck.

   - Fix potential int overflow in __blkdev_sectors_to_bio_pages(), from
     Mikulas.

   - MBR enable fix for sed-opal, from Scott"

* 'for-linus' of git://git.kernel.dk/linux-block:
  block: directly insert blk-mq request from blk_insert_cloned_request()
  mm/backing-dev.c: fix an error handling path in 'cgwb_create()'
  string.h: un-fortify memcpy_and_pad
  nvme-pci: implement the HMB entry number and size limitations
  nvme-pci: propagate (some) errors from host memory buffer setup
  nvme-pci: use appropriate initial chunk size for HMB allocation
  nvme-pci: fix host memory buffer allocation fallback
  nvme: fix lightnvm check
  block: fix integer overflow in __blkdev_sectors_to_bio_pages()
  block: sed-opal: Set MBRDone on S3 resume path if TPER is MBREnabled
  block: tolerate tracing of NULL bio

6 years agoMerge tag 'docs-4.14' of git://git.lwn.net/linux
Linus Torvalds [Wed, 13 Sep 2017 17:18:34 +0000 (10:18 -0700)]
Merge tag 'docs-4.14' of git://git.lwn.net/linux

Pull documentation fixes from Jonathan Corbet:
 "A cleanup from Mauro that needed to wait for the media pull, plus a
  handful of other fixes that wandered in"

* tag 'docs-4.14' of git://git.lwn.net/linux:
  kokr/memory-barriers.txt: Apply atomic_t.txt change
  kokr/doc: Update memory-barriers.txt for read-to-write dependencies
  docs-rst: don't require adjustbox anymore
  docs-rst: conf.py: only setup notice box colors if Sphinx < 1.6
  docs-rst: conf.py: remove lscape from LaTeX preamble

6 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi...
Linus Torvalds [Wed, 13 Sep 2017 17:10:19 +0000 (10:10 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/mszeredi/fuse

Pull fuse updates from Miklos Szeredi:
 "This fixes a regression (spotted by the Sandstorm.io folks) in the pid
  namespace handling introduced in 4.12.

  There's also a fix for honoring sync/dsync flags for pwritev2()"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: getattr cleanup
  fuse: honor iocb sync flags on write
  fuse: allow server to run in different pid_ns

6 years agonet: sched: fix use-after-free in tcf_action_destroy and tcf_del_walker
Jiri Pirko [Wed, 13 Sep 2017 15:32:37 +0000 (17:32 +0200)]
net: sched: fix use-after-free in tcf_action_destroy and tcf_del_walker

Recent commit d7fb60b9cafb ("net_sched: get rid of tcfa_rcu") removed
freeing in call_rcu, which changed already existing hard-to-hit
race condition into 100% hit:

[  598.599825] BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
[  598.607782] IP: tcf_action_destroy+0xc0/0x140

Or:

[   40.858924] BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
[   40.862840] IP: tcf_generic_walker+0x534/0x820

Fix this by storing the ops and use them directly for module_put call.

Fixes: a85a970af265 ("net_sched: move tc_action into tcf_common")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoKVM: Add struct kvm_vcpu pointer parameter to get_enable_apicv()
Suravee Suthikulpanit [Tue, 12 Sep 2017 15:42:41 +0000 (10:42 -0500)]
KVM: Add struct kvm_vcpu pointer parameter to get_enable_apicv()

Modify struct kvm_x86_ops.arch.apicv_active() to take struct kvm_vcpu
pointer as parameter in preparation to subsequent changes.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
6 years agoKVM: SVM: Refactor AVIC vcpu initialization into avic_init_vcpu()
Suravee Suthikulpanit [Tue, 12 Sep 2017 15:42:40 +0000 (10:42 -0500)]
KVM: SVM: Refactor AVIC vcpu initialization into avic_init_vcpu()

Preparing the base code for subsequent changes. This does not change
existing logic.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
6 years agobe2net: fix TSO6/GSO issue causing TX-stall on Lancer/BEx
Suresh Reddy [Wed, 13 Sep 2017 15:12:42 +0000 (11:12 -0400)]
be2net: fix TSO6/GSO issue causing TX-stall on Lancer/BEx

IPv6 TSO requests with extension hdrs are a problem to the
Lancer and BEx chips. Workaround is to disable TSO6 feature
for such packets.

Also in Lancer chips, MSS less than 256 was resulting in TX stall.
Fix this by disabling GSO when MSS less than 256.

Signed-off-by: Suresh Reddy <suresh.reddy@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszer...
Linus Torvalds [Wed, 13 Sep 2017 16:11:44 +0000 (09:11 -0700)]
Merge branch 'overlayfs-linus' of git://git./linux/kernel/git/mszeredi/vfs

Pull overlayfs updates from Miklos Szeredi:
 "This fixes d_ino correctness in readdir, which brings overlayfs on par
  with normal filesystems regarding inode number semantics, as long as
  all layers are on the same filesystem.

  There are also some bug fixes, one in particular (random ioctl's
  shouldn't be able to modify lower layers) that touches some vfs code,
  but of course no-op for non-overlay fs"

* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  ovl: fix false positive ESTALE on lookup
  ovl: don't allow writing ioctl on lower layer
  ovl: fix relatime for directories
  vfs: add flags to d_real()
  ovl: cleanup d_real for negative
  ovl: constant d_ino for non-merge dirs
  ovl: constant d_ino across copy up
  ovl: fix readdir error value
  ovl: check snprintf return

6 years agoKVM: x86: fix clang build
Radim Krčmář [Wed, 13 Sep 2017 13:13:46 +0000 (15:13 +0200)]
KVM: x86: fix clang build

Clang resolves __builtin_constant_p() to false even if the expression is
constant in the end.  The only purpose of that expression was to
differentiate a case where the following expression couldn't be checked
at compile-time, so we can just remove the check.

Clang handles the following two correctly.  Turn it into BUG_ON if there
are any more problems with this.

Fixes: d6321d493319 ("KVM: x86: generalize guest_cpuid_has_ helpers")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
6 years agoKVM: fix rcu warning on VM_CREATE errors
Christian Borntraeger [Wed, 13 Sep 2017 12:17:22 +0000 (14:17 +0200)]
KVM: fix rcu warning on VM_CREATE errors

commit 3898da947bba ("KVM: avoid using rcu_dereference_protected") can
trigger the following lockdep/rcu splat if the VM_CREATE ioctl fails,
for example if kvm_arch_init_vm fails:

WARNING: suspicious RCU usage
4.13.0+ #105 Not tainted
-----------------------------
./include/linux/kvm_host.h:481 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 1
no locks held by qemu-system-s39/79.
stack backtrace:
CPU: 0 PID: 79 Comm: qemu-system-s39 Not tainted 4.13.0+ #105
Hardware name: IBM 2964 NC9 704 (KVM/Linux)
Call Trace:
([<00000000001140b2>] show_stack+0xea/0xf0)
 [<00000000008a68a4>] dump_stack+0x94/0xd8
 [<0000000000134c12>] kvm_dev_ioctl+0x372/0x7a0
 [<000000000038f940>] do_vfs_ioctl+0xa8/0x6c8
 [<0000000000390004>] SyS_ioctl+0xa4/0xb8
 [<00000000008c7a8c>] system_call+0xc4/0x27c
no locks held by qemu-system-s39/79.

We have to reset the just created users_count back to 0 to
tell the check to not trigger.

Reported-by: Stefan Haberland <sth@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Fixes: 3898da947bba ("KVM: avoid using rcu_dereference_protected")
Cc: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
6 years agoKVM: x86: Fix immediate_exit handling for uninitialized AP
Jan H. Schönherr [Wed, 6 Sep 2017 16:34:06 +0000 (18:34 +0200)]
KVM: x86: Fix immediate_exit handling for uninitialized AP

When user space sets kvm_run->immediate_exit, KVM is supposed to
return quickly. However, when a vCPU is in KVM_MP_STATE_UNINITIALIZED,
the value is not considered and the vCPU blocks.

Fix that oversight.

Fixes: 460df4c1fc7c008 ("KVM: race-free exit from KVM_RUN without POSIX signals")
Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
6 years agoKVM: x86: Fix handling of pending signal on uninitialized AP
Jan H. Schönherr [Tue, 5 Sep 2017 22:27:19 +0000 (00:27 +0200)]
KVM: x86: Fix handling of pending signal on uninitialized AP

KVM API says that KVM_RUN will return with -EINTR when a signal is
pending. However, if a vCPU is in KVM_MP_STATE_UNINITIALIZED, then
the return value is unconditionally -EAGAIN.

Copy over some code from vcpu_run(), so that the case of a pending
signal results in the expected return value.

Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
6 years agoKVM: SVM: Add a missing 'break' statement
Jan H. Schönherr [Tue, 5 Sep 2017 21:58:44 +0000 (23:58 +0200)]
KVM: SVM: Add a missing 'break' statement

Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Fixes: f6511935f424 ("KVM: SVM: Add checks for IO instructions")
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
6 years agoKVM: x86: Remove .get_pkru() from kvm_x86_ops
Joerg Roedel [Mon, 28 Aug 2017 14:38:35 +0000 (16:38 +0200)]
KVM: x86: Remove .get_pkru() from kvm_x86_ops

The commit

9dd21e104bc ('KVM: x86: simplify handling of PKRU')

removed all users and providers of that call-back, but
didn't remove it. Remove it now.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
6 years agox86/hyper-v: Remove duplicated HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED definition
Vitaly Kuznetsov [Mon, 11 Sep 2017 15:06:20 +0000 (17:06 +0200)]
x86/hyper-v: Remove duplicated HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED definition

Commits:

  7dcf90e9e032 ("PCI: hv: Use vPCI protocol version 1.2")
  628f54cc6451 ("x86/hyper-v: Support extended CPU ranges for TLB flush hypercalls")

added the same definition and they came in through different trees.
Fix the duplication.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: devel@linuxdriverproject.org
Link: http://lkml.kernel.org/r/20170911150620.3998-1-vkuznets@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
6 years agox86/hyper-V: Allocate the IDT entry early in boot
K. Y. Srinivasan [Fri, 8 Sep 2017 23:15:57 +0000 (16:15 -0700)]
x86/hyper-V: Allocate the IDT entry early in boot

Allocate the hypervisor callback IDT entry early in the boot sequence.

The previous code would allocate the entry as part of registering the handler
when the vmbus driver loaded, and this caused a problem for the IDT cleanup
that Thomas is working on for v4.15.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: apw@canonical.com
Cc: devel@linuxdriverproject.org
Cc: gregkh@linuxfoundation.org
Cc: jasowang@redhat.com
Cc: olaf@aepfle.de
Link: http://lkml.kernel.org/r/20170908231557.2419-1-kys@exchange.microsoft.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
6 years agoparavirt: Switch maintainer
Juergen Gross [Tue, 5 Sep 2017 14:34:07 +0000 (16:34 +0200)]
paravirt: Switch maintainer

Jeremy Fitzhardinge is stepping down as a paravirt maintainer. I'll
replace him.

While at it, update the file list to the actual pattern.

Signed-off-by: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akataria@vmware.com
Cc: chrisw@sous-sol.org
Cc: jeremy@goop.org
Cc: rusty@rustcorp.com.au
Cc: virtualization@lists.linux-foundation.org
Link: http://lkml.kernel.org/r/20170905143407.9227-1-jgross@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
6 years agox86/paravirt: Remove no longer used paravirt functions
Juergen Gross [Mon, 4 Sep 2017 10:25:27 +0000 (12:25 +0200)]
x86/paravirt: Remove no longer used paravirt functions

With removal of lguest some of the paravirt functions are no longer
needed:

->read_cr4()
->store_idt()
->set_pmd_at()
->set_pud_at()
->pte_update()

Remove them.

Signed-off-by: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akataria@vmware.com
Cc: boris.ostrovsky@oracle.com
Cc: chrisw@sous-sol.org
Cc: jeremy@goop.org
Cc: rusty@rustcorp.com.au
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/20170904102527.25409-1-jgross@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
6 years agox86/mm/64: Initialize CR4.PCIDE early
Andy Lutomirski [Mon, 11 Sep 2017 00:48:27 +0000 (17:48 -0700)]
x86/mm/64: Initialize CR4.PCIDE early

cpu_init() is weird: it's called rather late (after early
identification and after most MMU state is initialized) on the boot
CPU but is called extremely early (before identification) on secondary
CPUs.  It's called just late enough on the boot CPU that its CR4 value
isn't propagated to mmu_cr4_features.

Even if we put CR4.PCIDE into mmu_cr4_features, we'd hit two
problems.  First, we'd crash in the trampoline code.  That's
fixable, and I tried that.  It turns out that mmu_cr4_features is
totally ignored by secondary_start_64(), though, so even with the
trampoline code fixed, it wouldn't help.

This means that we don't currently have CR4.PCIDE reliably initialized
before we start playing with cpu_tlbstate.  This is very fragile and
tends to cause boot failures if I make even small changes to the TLB
handling code.

Make it more robust: initialize CR4.PCIDE earlier on the boot CPU
and propagate it to secondary CPUs in start_secondary().

( Yes, this is ugly.  I think we should have improved mmu_cr4_features
  to actually control CR4 during secondary bootup, but that would be
  fairly intrusive at this stage. )

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reported-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Tested-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Fixes: 660da7c9228f ("x86/mm: Enable CR4.PCIDE on supported systems")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
6 years agox86/hibernate/64: Mask off CR3's PCID bits in the saved CR3
Andy Lutomirski [Fri, 8 Sep 2017 05:06:58 +0000 (22:06 -0700)]
x86/hibernate/64: Mask off CR3's PCID bits in the saved CR3

Jiri reported a resume-from-hibernation failure triggered by PCID.
The root cause appears to be rather odd.  The hibernation asm
restores a CR3 value that comes from the image header.  If the image
kernel has PCID on, it's entirely reasonable for this CR3 value to
have one of the low 12 bits set.  The restore code restores it with
CR4.PCIDE=0, which means that those low 12 bits are accepted by the
CPU but are either ignored or interpreted as a caching mode.  This
is odd, but still works.  We blow up later when the image kernel
restores CR4, though, since changing CR4.PCIDE with CR3[11:0] != 0
is illegal.  Boom!

FWIW, it's entirely unclear to me what's supposed to happen if a PAE
kernel restores a non-PAE image or vice versa.  Ditto for LA57.

Reported-by: Jiri Kosina <jikos@kernel.org>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 660da7c9228f ("x86/mm: Enable CR4.PCIDE on supported systems")
Link: http://lkml.kernel.org/r/18ca57090651a6341e97083883f9e814c4f14684.1504847163.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
6 years agox86/mm: Get rid of VM_BUG_ON in switch_tlb_irqs_off()
Andy Lutomirski [Fri, 8 Sep 2017 05:06:57 +0000 (22:06 -0700)]
x86/mm: Get rid of VM_BUG_ON in switch_tlb_irqs_off()

If we hit the VM_BUG_ON(), we're detecting a genuinely bad situation,
but we're very unlikely to get a useful call trace.

Make it a warning instead.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/3b4e06bbb382ca54a93218407c93925ff5871546.1504847163.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
6 years agoMerge tag 'perf-urgent-for-mingo-4.14-20170912' of git://git.kernel.org/pub/scm/linux...
Ingo Molnar [Wed, 13 Sep 2017 07:25:10 +0000 (09:25 +0200)]
Merge tag 'perf-urgent-for-mingo-4.14-20170912' of git://git./linux/kernel/git/acme/linux into perf/urgent

Pull perf/urgent fixes from Arnaldo Carvalho de Melo:

- Fix TUI progress bar when delta from new total from that of the
  previous update is greater than the progress "step" (screen width
  progress bar block))  (Jiri Olsa)

- Make tools/lib/api make DEBUG=1 build use -D_FORTIFY_SOURCE=2 not
  to cripple debuginfo, just like tools/perf/ does (Jiri Olsa)

- Avoid leaking the 'perf.data' file to workloads started from the
  'perf record' command line by using the O_CLOEXEC open flag (Jiri Olsa)

- Fix building when libunwind's 'unwind.h' file is present in the
  include path, clashing with tools/perf/util/unwind.h (Milian Wolff)

- Check per .perfconfig section entry flag, not just per section (Taeung Song)

- Support running perf binaries with a dash in their name, needed to
  run perf as an AppImage (Milian Wolff)

- Wait for the right child by using waitpid() when running workloads
  from 'perf stat', also to fix using perf as an AppImage (Milian Wolff)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
6 years agoMerge branch 'drm-next-4.14' of git://people.freedesktop.org/~agd5f/linux into drm...
Dave Airlie [Wed, 13 Sep 2017 04:34:11 +0000 (14:34 +1000)]
Merge branch 'drm-next-4.14' of git://people.freedesktop.org/~agd5f/linux into drm-next

A few fixes for 4.14.  Nothing too major.

6 years agow90p910_ether: include linux/interrupt.h
Arnd Bergmann [Tue, 12 Sep 2017 12:31:48 +0000 (14:31 +0200)]
w90p910_ether: include linux/interrupt.h

A randconfig build caused a compile failure:

drivers/net/ethernet/nuvoton/w90p910_ether.c: In function 'w90p910_ether_close':
drivers/net/ethernet/nuvoton/w90p910_ether.c:580:2: error: implicit declaration of function 'free_irq'; did you mean 'free_uid'? [-Werror=implicit-function-declaration]

Adding the correct include fixes the problem.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: bonding: fix tlb_dynamic_lb default value
Nikolay Aleksandrov [Tue, 12 Sep 2017 12:10:05 +0000 (15:10 +0300)]
net: bonding: fix tlb_dynamic_lb default value

Commit 8b426dc54cf4 ("bonding: remove hardcoded value") changed the
default value for tlb_dynamic_lb which lead to either broken ALB mode
(since tlb_dynamic_lb can be changed only in TLB) or setting TLB mode
with tlb_dynamic_lb equal to 0.
The first issue was recently fixed by setting tlb_dynamic_lb to 1 always
when switching to ALB mode, but the default value is still wrong and
we'll enter TLB mode with tlb_dynamic_lb equal to 0 if the mode is
changed via netlink or sysfs. In order to restore the previous behaviour
and default value simply remove the mode check around the default param
initialization for tlb_dynamic_lb which will always set it to 1 as
before.

Fixes: 8b426dc54cf4 ("bonding: remove hardcoded value")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoip6_tunnel: fix ip6 tunnel lookup in collect_md mode
Haishuang Yan [Tue, 12 Sep 2017 09:47:57 +0000 (17:47 +0800)]
ip6_tunnel: fix ip6 tunnel lookup in collect_md mode

In collect_md mode, if the tun dev is down, it still can call
__ip6_tnl_rcv to receive on packets, and the rx statistics increase
improperly.

When the md tunnel is down, it's not neccessary to increase RX drops
for the tunnel device, packets would be recieved on fallback tunnel,
and the RX drops on fallback device will be increased as expected.

Fixes: 8d79266bc48c ("ip6_tunnel: add collect_md mode to IPv6 tunnels")
Cc: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoip_tunnel: fix ip tunnel lookup in collect_md mode
Haishuang Yan [Tue, 12 Sep 2017 09:47:56 +0000 (17:47 +0800)]
ip_tunnel: fix ip tunnel lookup in collect_md mode

In collect_md mode, if the tun dev is down, it still can call
ip_tunnel_rcv to receive on packets, and the rx statistics increase
improperly.

When the md tunnel is down, it's not neccessary to increase RX drops
for the tunnel device, packets would be recieved on fallback tunnel,
and the RX drops on fallback device will be increased as expected.

Fixes: 2e15ea390e6f ("ip_gre: Add support to collect tunnel metadata.")
Cc: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomlxsw: spectrum: Prevent mirred-related crash on removal
Yuval Mintz [Tue, 12 Sep 2017 06:50:53 +0000 (08:50 +0200)]
mlxsw: spectrum: Prevent mirred-related crash on removal

When removing the offloading of mirred actions under
matchall classifiers, mlxsw would find the destination port
associated with the offloaded action and utilize it for undoing
the configuration.

Depending on the order by which ports are removed, it's possible that
the destination port would get removed before the source port.
In such a scenario, when actions would be flushed for the source port
mlxsw would perform an illegal dereference as the destination port is
no longer listed.

Since the only item necessary for undoing the configuration on the
destination side is the port-id and that in turn is already maintained
by mlxsw on the source-port, simply stop trying to access the
destination port and use the port-id directly instead.

Fixes: 763b4b70af ("mlxsw: spectrum: Add support in matchall mirror TC offloading")
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'net_sched-fix-filter-chain-reference-counting'
David S. Miller [Wed, 13 Sep 2017 03:41:02 +0000 (20:41 -0700)]
Merge branch 'net_sched-fix-filter-chain-reference-counting'

Cong Wang says:

====================
net_sched: fix filter chain reference counting

This patchset fixes tc filter chain reference counting and nasty race
conditions with RCU callbacks. Please see each patch for details.

v3: Rebase on the latest -net
    Add code comment in patch 1
    Improve comment and changelog for patch 2
    Add patch 3

v2: Add patch 1
    Get rid of more ugly code in patch 2
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet_sched: carefully handle tcf_block_put()
Cong Wang [Mon, 11 Sep 2017 23:33:32 +0000 (16:33 -0700)]
net_sched: carefully handle tcf_block_put()

As pointed out by Jiri, there is still a race condition between
tcf_block_put() and tcf_chain_destroy() in a RCU callback. There
is no way to make it correct without proper locking or synchronization,
because both operate on a shared list.

Locking is hard, because the only lock we can pick here is a spinlock,
however, in tc_dump_tfilter() we iterate this list with a sleeping
function called (tcf_chain_dump()), which makes using a lock to protect
chain_list almost impossible.

Jiri suggested the idea of holding a refcnt before flushing, this works
because it guarantees us there would be no parallel tcf_chain_destroy()
during the loop, therefore the race condition is gone. But we have to
be very careful with proper synchronization with RCU callbacks.

Suggested-by: Jiri Pirko <jiri@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet_sched: fix reference counting of tc filter chain
Cong Wang [Mon, 11 Sep 2017 23:33:31 +0000 (16:33 -0700)]
net_sched: fix reference counting of tc filter chain

This patch fixes the following ugliness of tc filter chain refcnt:

a) tp proto should hold a refcnt to the chain too. This significantly
   simplifies the logic.

b) Chain 0 is no longer special, it is created with refcnt=1 like any
   other chains. All the ugliness in tcf_chain_put() can be gone!

c) No need to handle the flushing oddly, because block still holds
   chain 0, it can not be released, this guarantees block is the last
   user.

d) The race condition with RCU callbacks is easier to handle with just
   a rcu_barrier(). Much easier to understand, nothing to hide. Thanks
   to the previous patch. Please see also the comments in code.

e) Make the code understandable by humans, much less error-prone.

Fixes: 744a4cf63e52 ("net: sched: fix use after free when tcf_chain_destroy is called multiple times")
Fixes: 5bc1701881e3 ("net: sched: introduce multichain support for filters")
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet_sched: get rid of tcfa_rcu
Cong Wang [Mon, 11 Sep 2017 23:33:30 +0000 (16:33 -0700)]
net_sched: get rid of tcfa_rcu

gen estimator has been rewritten in commit 1c0d32fde5bd
("net_sched: gen_estimator: complete rewrite of rate estimators"),
the caller is no longer needed to wait for a grace period.
So this patch gets rid of it.

This also completely closes a race condition between action free
path and filter chain add/remove path for the following patch.
Because otherwise the nested RCU callback can't be caught by
rcu_barrier().

Please see also the comments in code.

Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotcp/dccp: remove reqsk_put() from inet_child_forget()
Eric Dumazet [Mon, 11 Sep 2017 22:58:38 +0000 (15:58 -0700)]
tcp/dccp: remove reqsk_put() from inet_child_forget()

Back in linux-4.4, I inadvertently put a call to reqsk_put() in
inet_child_forget(), forgetting it could be called from two different
points.

In the case it is called from inet_csk_reqsk_queue_add(), we want to
keep the reference on the request socket, since it is released later by
the caller (tcp_v{4|6}_rcv())

This bug never showed up because atomic_dec_and_test() was not signaling
the underflow, and SLAB_DESTROY_BY RCU semantic for request sockets
prevented the request to be put in quarantine.

Recent conversion of socket refcount from atomic_t to refcount_t finally
exposed the bug.

So move the reqsk_put() to inet_csk_listen_stop() to fix this.

Thanks to Shankara Pailoor for using syzkaller and providing
a nice set of .config and C repro.

WARNING: CPU: 2 PID: 4277 at lib/refcount.c:186
refcount_sub_and_test+0x167/0x1b0 lib/refcount.c:186
Kernel panic - not syncing: panic_on_warn set ...

CPU: 2 PID: 4277 Comm: syz-executor0 Not tainted 4.13.0-rc7 #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
Ubuntu-1.8.2-1ubuntu1 04/01/2014
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:16 [inline]
 dump_stack+0xf7/0x1aa lib/dump_stack.c:52
 panic+0x1ae/0x3a7 kernel/panic.c:180
 __warn+0x1c4/0x1d9 kernel/panic.c:541
 report_bug+0x211/0x2d0 lib/bug.c:183
 fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:190
 do_trap_no_signal arch/x86/kernel/traps.c:224 [inline]
 do_trap+0x260/0x390 arch/x86/kernel/traps.c:273
 do_error_trap+0x118/0x340 arch/x86/kernel/traps.c:310
 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:323
 invalid_op+0x18/0x20 arch/x86/entry/entry_64.S:846
RIP: 0010:refcount_sub_and_test+0x167/0x1b0 lib/refcount.c:186
RSP: 0018:ffff88006e006b60 EFLAGS: 00010286
RAX: 0000000000000026 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000026 RSI: 1ffff1000dc00d2c RDI: ffffed000dc00d60
RBP: ffff88006e006bf0 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff1000dc00d6d
R13: 00000000ffffffff R14: 0000000000000001 R15: ffff88006ce9d340
 refcount_dec_and_test+0x1a/0x20 lib/refcount.c:211
 reqsk_put+0x71/0x2b0 include/net/request_sock.h:123
 tcp_v4_rcv+0x259e/0x2e20 net/ipv4/tcp_ipv4.c:1729
 ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216
 NF_HOOK include/linux/netfilter.h:248 [inline]
 ip_local_deliver+0x1ce/0x6d0 net/ipv4/ip_input.c:257
 dst_input include/net/dst.h:477 [inline]
 ip_rcv_finish+0x8db/0x19c0 net/ipv4/ip_input.c:397
 NF_HOOK include/linux/netfilter.h:248 [inline]
 ip_rcv+0xc3f/0x17d0 net/ipv4/ip_input.c:488
 __netif_receive_skb_core+0x1fb7/0x31f0 net/core/dev.c:4298
 __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4336
 process_backlog+0x1c5/0x6d0 net/core/dev.c:5102
 napi_poll net/core/dev.c:5499 [inline]
 net_rx_action+0x6d3/0x14a0 net/core/dev.c:5565
 __do_softirq+0x2cb/0xb2d kernel/softirq.c:284
 do_softirq_own_stack+0x1c/0x30 arch/x86/entry/entry_64.S:898
 </IRQ>
 do_softirq.part.16+0x63/0x80 kernel/softirq.c:328
 do_softirq kernel/softirq.c:176 [inline]
 __local_bh_enable_ip+0x84/0x90 kernel/softirq.c:181
 local_bh_enable include/linux/bottom_half.h:31 [inline]
 rcu_read_unlock_bh include/linux/rcupdate.h:705 [inline]
 ip_finish_output2+0x8ad/0x1360 net/ipv4/ip_output.c:231
 ip_finish_output+0x74e/0xb80 net/ipv4/ip_output.c:317
 NF_HOOK_COND include/linux/netfilter.h:237 [inline]
 ip_output+0x1cc/0x850 net/ipv4/ip_output.c:405
 dst_output include/net/dst.h:471 [inline]
 ip_local_out+0x95/0x160 net/ipv4/ip_output.c:124
 ip_queue_xmit+0x8c6/0x1810 net/ipv4/ip_output.c:504
 tcp_transmit_skb+0x1963/0x3320 net/ipv4/tcp_output.c:1123
 tcp_send_ack.part.35+0x38c/0x620 net/ipv4/tcp_output.c:3575
 tcp_send_ack+0x49/0x60 net/ipv4/tcp_output.c:3545
 tcp_rcv_synsent_state_process net/ipv4/tcp_input.c:5795 [inline]
 tcp_rcv_state_process+0x4876/0x4b60 net/ipv4/tcp_input.c:5930
 tcp_v4_do_rcv+0x58a/0x820 net/ipv4/tcp_ipv4.c:1483
 sk_backlog_rcv include/net/sock.h:907 [inline]
 __release_sock+0x124/0x360 net/core/sock.c:2223
 release_sock+0xa4/0x2a0 net/core/sock.c:2715
 inet_wait_for_connect net/ipv4/af_inet.c:557 [inline]
 __inet_stream_connect+0x671/0xf00 net/ipv4/af_inet.c:643
 inet_stream_connect+0x58/0xa0 net/ipv4/af_inet.c:682
 SYSC_connect+0x204/0x470 net/socket.c:1628
 SyS_connect+0x24/0x30 net/socket.c:1609
 entry_SYSCALL_64_fastpath+0x18/0xad
RIP: 0033:0x451e59
RSP: 002b:00007f474843fc08 EFLAGS: 00000216 ORIG_RAX: 000000000000002a
RAX: ffffffffffffffda RBX: 0000000000718000 RCX: 0000000000451e59
RDX: 0000000000000010 RSI: 0000000020002000 RDI: 0000000000000007
RBP: 0000000000000046 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000216 R12: 0000000000000000
R13: 00007ffc040a0f8f R14: 00007f47484409c0 R15: 0000000000000000

Fixes: ebb516af60e1 ("tcp/dccp: fix race at listener dismantle phase")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Shankara Pailoor <sp3485@columbia.edu>
Tested-by: Shankara Pailoor <sp3485@columbia.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoopenvswitch: Fix an error handling path in 'ovs_nla_init_match_and_action()'
Christophe JAILLET [Mon, 11 Sep 2017 19:56:20 +0000 (21:56 +0200)]
openvswitch: Fix an error handling path in 'ovs_nla_init_match_and_action()'

All other error handling paths in this function go through the 'error'
label. This one should do the same.

Fixes: 9cc9a5cb176c ("datapath: Avoid using stack larger than 1024.")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosmsc95xx: Configure pause time to 0xffff when tx flow control enabled
Nisar Sayed [Mon, 11 Sep 2017 17:43:11 +0000 (17:43 +0000)]
smsc95xx: Configure pause time to 0xffff when tx flow control enabled

Configure pause time to 0xffff when tx flow control enabled

Set pause time to 0xffff in the pause frame to indicate the
partner to stop sending the packets. When RX buffer frees up,
the device sends pause frame with pause time zero for partner to
resume transmission.

Fixes: 2f7ca802bdae ("Add SMSC LAN9500 USB2.0 10/100 ethernet adapter driver")
Signed-off-by: Nisar Sayed <Nisar.Sayed@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge tag 'f2fs-for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk...
Linus Torvalds [Wed, 13 Sep 2017 03:05:58 +0000 (20:05 -0700)]
Merge tag 'f2fs-for-4.14' of git://git./linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "In this round, we've mostly tuned f2fs to provide better user
  experience for Android. Especially, we've worked on atomic write
  feature again with SQLite community in order to support it officially.
  And we added or modified several facilities to analyze and enhance IO
  behaviors.

  Major changes include:
   - add app/fs io stat
   - add inode checksum feature
   - support project/journalled quota
   - enhance atomic write with new ioctl() which exposes feature set
   - enhance background gc/discard/fstrim flows with new gc_urgent mode
   - add F2FS_IOC_FS{GET,SET}XATTR
   - fix some quota flows"

* tag 'f2fs-for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (63 commits)
  f2fs: hurry up to issue discard after io interruption
  f2fs: fix to show correct discard_granularity in sysfs
  f2fs: detect dirty inode in evict_inode
  f2fs: clear radix tree dirty tag of pages whose dirty flag is cleared
  f2fs: speed up gc_urgent mode with SSR
  f2fs: better to wait for fstrim completion
  f2fs: avoid race in between read xattr & write xattr
  f2fs: make get_lock_data_page to handle encrypted inode
  f2fs: use generic terms used for encrypted block management
  f2fs: introduce f2fs_encrypted_file for clean-up
  Revert "f2fs: add a new function get_ssr_cost"
  f2fs: constify super_operations
  f2fs: fix to wake up all sleeping flusher
  f2fs: avoid race in between atomic_read & atomic_inc
  f2fs: remove unneeded parameter of change_curseg
  f2fs: update i_flags correctly
  f2fs: don't check inode's checksum if it was dirtied or writebacked
  f2fs: don't need to update inode checksum for recovery
  f2fs: trigger fdatasync for non-atomic_write file
  f2fs: fix to avoid race in between aio and gc
  ...

6 years agoMerge tag 'ceph-for-4.14-rc1' of git://github.com/ceph/ceph-client
Linus Torvalds [Wed, 13 Sep 2017 03:03:53 +0000 (20:03 -0700)]
Merge tag 'ceph-for-4.14-rc1' of git://github.com/ceph/ceph-client

Pull ceph updates from Ilya Dryomov:
 "The highlights include:

   - a large series of fixes and improvements to the snapshot-handling
     code (Zheng Yan)

   - individual read/write OSD requests passed down to libceph are now
     limited to 16M in size to avoid hitting OSD-side limits (Zheng Yan)

   - encode MStatfs v2 message to allow for more accurate space usage
     reporting (Douglas Fuller)

   - switch to the new writeback error tracking infrastructure (Jeff
     Layton)"

* tag 'ceph-for-4.14-rc1' of git://github.com/ceph/ceph-client: (35 commits)
  ceph: stop on-going cached readdir if mds revokes FILE_SHARED cap
  ceph: wait on writeback after writing snapshot data
  ceph: fix capsnap dirty pages accounting
  ceph: ignore wbc->range_{start,end} when write back snapshot data
  ceph: fix "range cyclic" mode writepages
  ceph: cleanup local variables in ceph_writepages_start()
  ceph: optimize pagevec iterating in ceph_writepages_start()
  ceph: make writepage_nounlock() invalidate page that beyonds EOF
  ceph: properly get capsnap's size in get_oldest_context()
  ceph: remove stale check in ceph_invalidatepage()
  ceph: queue cap snap only when snap realm's context changes
  ceph: handle race between vmtruncate and queuing cap snap
  ceph: fix message order check in handle_cap_export()
  ceph: fix NULL pointer dereference in ceph_flush_snaps()
  ceph: adjust 36 checks for NULL pointers
  ceph: delete an unnecessary return statement in update_dentry_lease()
  ceph: ENOMEM pr_err in __get_or_create_frag() is redundant
  ceph: check negative offsets in ceph_llseek()
  ceph: more accurate statfs
  ceph: properly set snap follows for cap reconnect
  ...

6 years agoxfs: XFS_IS_REALTIME_INODE() should be false if no rt device present
Richard Wareing [Tue, 12 Sep 2017 23:09:35 +0000 (09:09 +1000)]
xfs: XFS_IS_REALTIME_INODE() should be false if no rt device present

If using a kernel with CONFIG_XFS_RT=y and we set the RHINHERIT flag on
a directory in a filesystem that does not have a realtime device and
create a new file in that directory, it gets marked as a real time file.
When data is written and a fsync is issued, the filesystem attempts to
flush a non-existent rt device during the fsync process.

This results in a crash dereferencing a null buftarg pointer in
xfs_blkdev_issue_flush():

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
  IP: xfs_blkdev_issue_flush+0xd/0x20
  .....
  Call Trace:
    xfs_file_fsync+0x188/0x1c0
    vfs_fsync_range+0x3b/0xa0
    do_fsync+0x3d/0x70
    SyS_fsync+0x10/0x20
    do_syscall_64+0x4d/0xb0
    entry_SYSCALL64_slow_path+0x25/0x25

Setting RT inode flags does not require special privileges so any
unprivileged user can cause this oops to occur.  To reproduce, confirm
kernel is compiled with CONFIG_XFS_RT=y and run:

  # mkfs.xfs -f /dev/pmem0
  # mount /dev/pmem0 /mnt/test
  # mkdir /mnt/test/foo
  # xfs_io -c 'chattr +t' /mnt/test/foo
  # xfs_io -f -c 'pwrite 0 5m' -c fsync /mnt/test/foo/bar

Or just run xfstests with MKFS_OPTIONS="-d rtinherit=1" and wait.

Kernels built with CONFIG_XFS_RT=n are not exposed to this bug.

Fixes: f538d4da8d52 ("[XFS] write barrier support")
Cc: <stable@vger.kernel.org>
Signed-off-by: Richard Wareing <rwareing@fb.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agodrm/amdgpu: revert "fix deadlock of reservation between cs and gpu reset v2"
Christian König [Tue, 5 Sep 2017 13:10:50 +0000 (15:10 +0200)]
drm/amdgpu: revert "fix deadlock of reservation between cs and gpu reset v2"

This reverts commit 10e709cb296c98424c03408d23e3addeddcd4088.

The patch doesn't work at all:
1. The CS can still be blocked because of amdgpu_ctx_add_fence().
2. The order of submission isn't correct any more.
3. We could end up using freed up memory because we now drop the
   ctx reference to early.

This needs to be fixed cleanly by doing the context handling after the BO
handling, but this is a larger task just avoid the obvious crashes for now.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Monk Liu monk.liu@amd.com
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 years agoMerge tag 'dma-mapping-4.14' of git://git.infradead.org/users/hch/dma-mapping
Linus Torvalds [Tue, 12 Sep 2017 20:30:06 +0000 (13:30 -0700)]
Merge tag 'dma-mapping-4.14' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping updates from Christoph Hellwig:

 - removal of the old dma_alloc_noncoherent interface

 - remove unused flags to dma_declare_coherent_memory

 - restrict OF DMA configuration to specific physical busses

 - use the iommu mailing list for dma-mapping questions and patches

* tag 'dma-mapping-4.14' of git://git.infradead.org/users/hch/dma-mapping:
  dma-coherent: fix dma_declare_coherent_memory() logic error
  ARM: imx: mx31moboard: Remove unused 'dma' variable
  dma-coherent: remove an unused variable
  MAINTAINERS: use the iommu list for the dma-mapping subsystem
  dma-coherent: remove the DMA_MEMORY_MAP and DMA_MEMORY_IO flags
  dma-coherent: remove the DMA_MEMORY_INCLUDES_CHILDREN flag
  of: restrict DMA configuration
  dma-mapping: remove dma_alloc_noncoherent and dma_free_noncoherent
  i825xx: switch to switch to dma_alloc_attrs
  au1000_eth: switch to dma_alloc_attrs
  sgiseeq: switch to dma_alloc_attrs
  dma-mapping: reduce dma_mapping_error inline bloat

6 years agoMerge tag 'uuid-for-4.14' of git://git.infradead.org/users/hch/uuid
Linus Torvalds [Tue, 12 Sep 2017 20:27:21 +0000 (13:27 -0700)]
Merge tag 'uuid-for-4.14' of git://git.infradead.org/users/hch/uuid

Pull uuid updates from Christoph Hellwig:
 "Just a single conversion to the new UUID API for this merge window"

* tag 'uuid-for-4.14' of git://git.infradead.org/users/hch/uuid:
  efi: switch to use new generic UUID API

6 years agoMerge tag 'selinux-pr-20170831' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Tue, 12 Sep 2017 20:21:00 +0000 (13:21 -0700)]
Merge tag 'selinux-pr-20170831' of git://git./linux/kernel/git/pcmoore/selinux

Pull selinux updates from Paul Moore:
 "A relatively quiet period for SELinux, 11 patches with only two/three
  having any substantive changes.

  These noteworthy changes include another tweak to the NNP/nosuid
  handling, per-file labeling for cgroups, and an object class fix for
  AF_UNIX/SOCK_RAW sockets; the rest of the changes are minor tweaks or
  administrative updates (Stephen's email update explains the file
  explosion in the diffstat).

  Everything passes the selinux-testsuite"

[ Also a couple of small patches from the security tree from Tetsuo
  Handa for Tomoyo and LSM cleanup. The separation of security policy
  updates wasn't all that clean - Linus ]

* tag 'selinux-pr-20170831' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
  selinux: constify nf_hook_ops
  selinux: allow per-file labeling for cgroupfs
  lsm_audit: update my email address
  selinux: update my email address
  MAINTAINERS: update the NetLabel and Labeled Networking information
  selinux: use GFP_NOWAIT in the AVC kmem_caches
  selinux: Generalize support for NNP/nosuid SELinux domain transitions
  selinux: genheaders should fail if too many permissions are defined
  selinux: update the selinux info in MAINTAINERS
  credits: update Paul Moore's info
  selinux: Assign proper class to PF_UNIX/SOCK_RAW sockets
  tomoyo: Update URLs in Documentation/admin-guide/LSM/tomoyo.rst
  LSM: Remove security_task_create() hook.

6 years agoInput: xpad - validate USB endpoint type during probe
Cameron Gutman [Tue, 12 Sep 2017 18:27:44 +0000 (11:27 -0700)]
Input: xpad - validate USB endpoint type during probe

We should only see devices with interrupt endpoints. Ignore any other
endpoints that we find, so we don't send try to send them interrupt URBs
and trigger a WARN down in the USB stack.

Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Cc: <stable@vger.kernel.org> # c01b5e7464f0 Input: xpad - don't depend on endpoint order
Signed-off-by: Cameron Gutman <aicommander@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
6 years agoMerge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Tue, 12 Sep 2017 18:34:39 +0000 (11:34 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:
 "Two fixes: dead code removal, plus a SME memory encryption fix on
  32-bit kernels that crashed Xen guests"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Remove unused and undefined __generic_processor_info() declaration
  x86/mm: Make the SME mask a u64

6 years agoMerge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Tue, 12 Sep 2017 18:30:56 +0000 (11:30 -0700)]
Merge branch 'sched-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull scheduler fixes from Ingo Molnar:
 "Three fixes:

   - fix a suspend/resume cpusets bug

   - fix a !CONFIG_NUMA_BALANCING bug

   - fix a kerneldoc warning"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/fair: Fix nuisance kernel-doc warning
  sched/cpuset/pm: Fix cpuset vs. suspend-resume bugs
  sched/fair: Fix wake_affine_llc() balancing rules

6 years agoMerge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Tue, 12 Sep 2017 18:28:13 +0000 (11:28 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull perf tooling updates from Ingo Molnar:
 "Perf tooling updates and fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf annotate browser: Help for cycling thru hottest instructions with TAB/shift+TAB
  perf stat: Only auto-merge events that are PMU aliases
  perf test: Add test case for PERF_SAMPLE_PHYS_ADDR
  perf script: Support physical address
  perf mem: Support physical address
  perf sort: Add sort option for physical address
  perf tools: Support new sample type for physical address
  perf vendor events powerpc: Remove duplicate events
  perf intel-pt: Fix syntax in documentation of config option
  perf test powerpc: Fix 'Object code reading' test
  perf trace: Support syscall name globbing
  perf syscalltbl: Support glob matching on syscall names
  perf report: Calculate the average cycles of iterations

6 years agoMerge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Tue, 12 Sep 2017 18:25:56 +0000 (11:25 -0700)]
Merge branch 'irq-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull irq fixes from Ingo Molnar:
 "A sparse irq race/locking fix, and a MSI irq domains population fix"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq: Make sparse_irq_lock protect what it should protect
  genirq/msi: Fix populating multiple interrupts

6 years agof2fs: hurry up to issue discard after io interruption
Chao Yu [Tue, 12 Sep 2017 13:35:12 +0000 (21:35 +0800)]
f2fs: hurry up to issue discard after io interruption

Once we encounter I/O interruption during issuing discards, we will delay
long time before next round, but if system status is I/O idle during the
time, it may loses opportunity to issue discards. So this patch changes
to hurry up to issue discard after io interruption.

Besides, this patch also fixes to issue discards accurately with assigned
rate.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: fix to show correct discard_granularity in sysfs
Chao Yu [Tue, 12 Sep 2017 06:25:35 +0000 (14:25 +0800)]
f2fs: fix to show correct discard_granularity in sysfs

Fix below incorrect display when reading discard_granularity sysfs node.

$ cat /sys/fs/f2fs/<device>/discard_granularity
$ 16
$ echo 32 > /sys/fs/f2fs/<device>/discard_granularity
$ cat /sys/fs/f2fs/<device>/discard_granularity
$ 16

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: detect dirty inode in evict_inode
Chao Yu [Tue, 12 Sep 2017 06:04:05 +0000 (14:04 +0800)]
f2fs: detect dirty inode in evict_inode

Add a bugon in f2fs_evict_inode to detect inconsistent status between
inode cache and related node page cache.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agoperf stat: Wait for the correct child
Milian Wolff [Tue, 12 Sep 2017 15:25:23 +0000 (17:25 +0200)]
perf stat: Wait for the correct child

When packaging the perf userland application into an AppImage, the
wait() call in perf stat returned too early. It turned out that some
other child process exited, but not the one perf stat launched:

  $ sudo strace -e fork,execve,clone,wait4 -f ./perf-x86_64.AppImage stat sleep 1
  execve("./perf-git.3a73b7f9-x86_64.AppImage", ["./perf-git.3a73b7f9-x86_64.AppIm"..., "stat", "sleep", "1"], 0x7ffec1bbf050 /* 18 vars */) = 0
  clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f6a6e7efe50) = 3912
  strace: Process 3912 attached
  [pid  3912] clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f6a6e7efe50) = 3914
  strace: Process 3914 attached
  [pid  3912] +++ exited with 0 +++
  [pid  3911] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=3912, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
  [pid  3914] clone(strace: Process 3915 attached
  child_stack=0x7f6a6d9fefb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f6a6d9ff9d0, tls=0x7f6a6d9ff700, child_tidptr=0x7f6a6d9ff9d0) = 3915
  [pid  3911] execve("/tmp/.mount_perf-g6VYMpl/AppRun", ["./perf-git.3a73b7f9-x86_64.AppIm"..., "stat", "sleep", "1"], 0x14aab70 /* 21 vars */) = 0
  [pid  3911] clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f4ae113c4d0) = 3916
  strace: Process 3916 attached
  [pid  3911] wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 3912
  [pid  3916] execve("/usr/libexec/perf-core/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
  [pid  3916] execve("/tmp/./sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
  [pid  3916] execve("/home/milian/.bin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
  [pid  3916] execve("/usr/lib/icecream/libexec/icecc/bin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
  [pid  3916] execve("/ssd2/milian/projects/compiled/other/bin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
  [pid  3916] execve("/home/milian/.bin/kf5/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
  [pid  3916] execve("/ssd2/milian/projects/compiled/kf5/bin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
  [pid  3916] execve("/home/milian/projects/compiled/other/bin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
  [pid  3916] execve("/home/milian/projects/compiled/kf5/bin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
  [pid  3916] execve("/usr/local/sbin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
  [pid  3916] execve("/usr/local/bin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
  [pid  3916] execve("/usr/bin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */
   Performance counter stats for 'sleep 1':

       <not counted> task-clock
       <not counted> context-switches
       <not counted> cpu-migrations
       <not counted> page-faults
       <not counted> cycles
       <not counted> instructions
       <not counted>      branches
       <not counted>      branch-misses

         0.000047194 seconds time elapsed

  [pid  3916] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=3911, si_uid=0} ---
  [pid  3916] +++ killed by SIGTERM +++
  [pid  3911] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=3916, si_uid=0, si_status=SIGTERM, si_utime=0, si_stime=0} ---
  [pid  3915] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=3914, si_uid=0} ---
  [pid  3911] +++ exited with 0 +++
  [pid  3915] --- SIGHUP {si_signo=SIGHUP, si_code=SI_USER, si_pid=3914, si_uid=0} ---
  [pid  3915] +++ exited with 0 +++
  +++ exited with 0 +++

This patch uses waitpid instead to ensure the call waits for the
debuggee application launched by 'perf stat'. This fixes 'perf stat'
when launched from an AppImage:

  $ ./perf-x86_64.AppImage stat sleep 1

   Performance counter stats for 'sleep 1':

          0.357235      task-clock (msec)         #    0.000 CPUs utilized
                 1      context-switches          #    0.003 M/sec
                 0      cpu-migrations            #    0.000 K/sec
                50      page-faults               #    0.140 M/sec
           1269602      cycles                    #    3.554 GHz
            654278      instructions              #    0.52  insn per cycle
            129963      branches                  #  363.803 M/sec
              7082      branch-misses             #    5.45% of all branches

       1.000633420 seconds time elapsed

Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20170912152523.4497-1-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
6 years agoperf tools: Support running perf binaries with a dash in their name
Milian Wolff [Mon, 11 Sep 2017 11:14:22 +0000 (13:14 +0200)]
perf tools: Support running perf binaries with a dash in their name

Previously the part behind "perf-" was interpreted as an internal perf
command. If the suffix could not be handled, the execution was stopped.
This makes it impossible to launch perf binaries that got renamed to
have the `perf-` prefix. This is e.g. the case for appimages (e.g.
"perf-x86_64.AppImage"), but would also apply to all other scenarios
where users symlink or rename perf themselves:

Status quo with the broken behavior:

  $ ln -s ./perf ./perf-custom-suffix
  $ ./perf-custom-suffix list
  cannot handle custom-suffix internally$

Also note the missing newline at the end of the error message.

With this patch applied, the above works properly:

  $ ./perf-custom-suffix list

  List of pre-defined events (to be used in -e):
  ...

Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Acked-by: David Ahern <dsahern@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Yao Jin <yao.jin@linux.intel.com>
Link: http://lkml.kernel.org/r/20170911111422.31903-1-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
6 years agosched/debug: Add debugfs knob for "sched_debug"
Peter Zijlstra [Thu, 7 Sep 2017 15:03:53 +0000 (17:03 +0200)]
sched/debug: Add debugfs knob for "sched_debug"

I'm forever late for editing my kernel cmdline, add a runtime knob to
disable the "sched_debug" thing.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170907150614.142924283@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
6 years agosched/core: WARN() when migrating to an offline CPU
Peter Zijlstra [Thu, 7 Sep 2017 15:03:52 +0000 (17:03 +0200)]
sched/core: WARN() when migrating to an offline CPU

Migrating tasks to offline CPUs is a pretty big fail, warn about it.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170907150614.094206976@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
6 years agosched/fair: Plug hole between hotplug and active_load_balance()
Peter Zijlstra [Thu, 7 Sep 2017 15:03:51 +0000 (17:03 +0200)]
sched/fair: Plug hole between hotplug and active_load_balance()

The load balancer applies cpu_active_mask to whatever sched_domains it
finds, however in the case of active_balance there is a hole between
setting rq->{active_balance,push_cpu} and running the stop_machine
work doing the actual migration.

The @push_cpu can go offline in this window, which would result in us
moving a task onto a dead cpu, which is a fairly bad thing.

Double check the active mask before the stop work does the migration.

  CPU0 CPU1

  <SoftIRQ>
stop_machine(takedown_cpu)
    load_balance() cpu_stopper_thread()
      ...   work = multi_cpu_stop
      stop_one_cpu_nowait(     /* wait for CPU0 */
.func = active_load_balance_cpu_stop
      );
  </SoftIRQ>

  cpu_stopper_thread()
    work = multi_cpu_stop
      /* sync with CPU1 */
    take_cpu_down()
<idle>
  play_dead();

    work = active_load_balance_cpu_stop
      set_task_cpu(p, CPU1); /* oops!! */

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20170907150614.044460912@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
6 years agosched/fair: Avoid newidle balance for !active CPUs
Peter Zijlstra [Thu, 7 Sep 2017 15:03:50 +0000 (17:03 +0200)]
sched/fair: Avoid newidle balance for !active CPUs

On CPU hot unplug, when parking the last kthread we'll try and
schedule into idle to kill the CPU. This last schedule can (and does)
trigger newidle balance because at this point the sched domains are
still up because of commit:

  77d1dfda0e79 ("sched/topology, cpuset: Avoid spurious/wrong domain rebuilds")

Obviously pulling tasks to an already offline CPU is a bad idea, and
all balancing operations _should_ be subject to cpu_active_mask, make
it so.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: 77d1dfda0e79 ("sched/topology, cpuset: Avoid spurious/wrong domain rebuilds")
Link: http://lkml.kernel.org/r/20170907150613.994135806@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
6 years agoperf config: Check not only section->from_system_config but also item's
Taeung Song [Thu, 7 Sep 2017 03:18:45 +0000 (12:18 +0900)]
perf config: Check not only section->from_system_config but also item's

Currently section->from_system_config is being checked multiple times.
item->from_system_config should be checked instead, when iterating thru
the items in a section. Fix it.

Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1504754325-9724-1-git-send-email-treeze.taeung@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>