linux-2.6-microblaze.git
3 years agonet: phy: micrel: Add KS8851 PHY support
Marek Vasut [Tue, 5 Jan 2021 14:11:50 +0000 (15:11 +0100)]
net: phy: micrel: Add KS8851 PHY support

The KS8851 has a reduced internal PHY, which is accessible through its
registers at offset 0xe4. The PHY is compatible with KS886x PHY present
in Micrel switches, including the PHY ID Low/High registers swap, which
is present both in the MAC and the switch.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Cc: Lukas Wunner <lukas@wunner.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agomacvlan: remove redundant null check on data
Yunjian Wang [Tue, 5 Jan 2021 06:31:34 +0000 (14:31 +0800)]
macvlan: remove redundant null check on data

Because macvlan_common_newlink() and macvlan_changelink() already
checked NULL data parameter, so the additional check is unnecessary,
just remove it.

Fixes: 79cf79abce71 ("macvlan: add source mode")
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoocteontx2-pf: Add RSS multi group support
Geetha sowjanya [Mon, 4 Jan 2021 07:20:39 +0000 (12:50 +0530)]
octeontx2-pf: Add RSS multi group support

Hardware supports 8 RSS groups per interface. Currently we are using
only group '0'. This patch allows user to create new RSS groups/contexts
and use the same as destination for flow steering rules.

usage:
To steer the traffic to RQ 2,3

ethtool -X eth0 weight 0 0 1 1 context new
(It will print the allocated context id number)
New RSS context is 1

ethtool -N eth0 flow-type tcp4 dst-port 80 context 1 loc 1

To delete the context
ethtool -X eth0 context 1 delete

When an RSS context is removed, the active classification
rules using this context are also removed.

Change-log:

v4
- Fixed compiletime warning.
- Address Saeed's comments on v3.

v3
- Coverted otx2_set_rxfh() to use new function.

v2
- Removed unrelated whitespace
- Coverted otx2_get_rxfh() to use new function.

Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: nfc: nci: Change the NCI close sequence
Bongsu Jeon [Thu, 31 Dec 2020 02:59:26 +0000 (11:59 +0900)]
net: nfc: nci: Change the NCI close sequence

If there is a NCI command in work queue after closing the NCI device at
nci_unregister_device, The NCI command timer starts at flush_workqueue
function and then NCI command timeout handler would be called 5 second
after flushing the NCI command work queue and destroying the queue.
At that time, the timeout handler would try to use NCI command work queue
that is destroyed already. it will causes the problem. To avoid this
abnormal situation, change the sequence to prevent the NCI command timeout
handler from being called after destroying the NCI command work queue.

Signed-off-by: Bongsu Jeon <bongsu.jeon@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: kcm: Replace fput with sockfd_put
Zheng Yongjun [Wed, 30 Dec 2020 09:18:09 +0000 (17:18 +0800)]
net: kcm: Replace fput with sockfd_put

The function sockfd_lookup uses fget on the value that is stored in
the file field of the returned structure, so fput should ultimately be
applied to this value.  This can be done directly, but it seems better
to use the specific macro sockfd_put, which does the same thing.

Perform a source code refactoring by using the following semantic patch.

    // <smpl>
    @@
    expression s;
    @@

       s = sockfd_lookup(...)
       ...
    + sockfd_put(s);
    - fput(s->file);
    // </smpl>

Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet/mlxfw: Use kzalloc for allocating only one thing
Zheng Yongjun [Wed, 30 Dec 2020 08:18:35 +0000 (16:18 +0800)]
net/mlxfw: Use kzalloc for allocating only one thing

Use kzalloc rather than kcalloc(1,...)

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
@@

- kcalloc(1,
+ kzalloc(
          ...)
// </smpl>

Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge tag 'mlx5-updates-2021-01-05' of git://git.kernel.org/pub/scm/linux/kernel...
David S. Miller [Tue, 5 Jan 2021 23:44:45 +0000 (15:44 -0800)]
Merge tag 'mlx5-updates-2021-01-05' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2021-01-05

SW steering, Refactor to have a device specific STE layer below dr_ste

This series introduces some improvements and refactoring by adding a new layer
below dr_ste to allow support for different devices format.

It adds a struct of device specific callbacks for STE layer below dr_ste.
Each device will implement its HW-specific function, and a common logic
from the DR code will access these functions through the new ste_ctx API.

Connect-X5-style steering format is called STE_v0.
In the next patch series we bring the Connect-X6-style format - STE_v1.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoocteontx2-af: Use kzalloc for allocating only one thing
Zheng Yongjun [Tue, 29 Dec 2020 13:53:09 +0000 (21:53 +0800)]
octeontx2-af: Use kzalloc for allocating only one thing

Use kzalloc rather than kcalloc(1,...)

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
@@

- kcalloc(1,
+ kzalloc(
          ...)
// </smpl>

Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoiavf: Use kzalloc for allocating only one thing
Zheng Yongjun [Tue, 29 Dec 2020 13:53:01 +0000 (21:53 +0800)]
iavf: Use kzalloc for allocating only one thing

Use kzalloc rather than kcalloc(1,...)

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
@@

- kcalloc(1,
+ kzalloc(
          ...)
// </smpl>

Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoliquidio: Use kzalloc for allocating only one thing
Zheng Yongjun [Tue, 29 Dec 2020 13:52:54 +0000 (21:52 +0800)]
liquidio: Use kzalloc for allocating only one thing

Use kzalloc rather than kcalloc(1,...)

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
@@

- kcalloc(1,
+ kzalloc(
          ...)
// </smpl>

Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agobnxt_en: Use kzalloc for allocating only one thing
Zheng Yongjun [Tue, 29 Dec 2020 13:52:46 +0000 (21:52 +0800)]
bnxt_en: Use kzalloc for allocating only one thing

Use kzalloc rather than kcalloc(1,...)

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
@@

- kcalloc(1,
+ kzalloc(
          ...)
// </smpl>

Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dsa: sja1105: Use kzalloc for allocating only one thing
Zheng Yongjun [Tue, 29 Dec 2020 13:52:38 +0000 (21:52 +0800)]
net: dsa: sja1105: Use kzalloc for allocating only one thing

Use kzalloc rather than kcalloc(1,...)

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
@@

- kcalloc(1,
+ kzalloc(
          ...)
// </smpl>

Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agocavium/liquidio: Use DEFINE_SPINLOCK() for spinlock
Zheng Yongjun [Tue, 29 Dec 2020 13:49:54 +0000 (21:49 +0800)]
cavium/liquidio: Use DEFINE_SPINLOCK() for spinlock

spinlock can be initialized automatically with DEFINE_SPINLOCK()
rather than explicitly calling spin_lock_init().

Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: ixp4xx_eth: Use DEFINE_SPINLOCK() for spinlock
Zheng Yongjun [Tue, 29 Dec 2020 13:49:47 +0000 (21:49 +0800)]
net: ixp4xx_eth: Use DEFINE_SPINLOCK() for spinlock

spinlock can be initialized automatically with DEFINE_SPINLOCK()
rather than explicitly calling spin_lock_init().

Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: usb: Use DEFINE_SPINLOCK() for spinlock
Zheng Yongjun [Tue, 29 Dec 2020 13:49:27 +0000 (21:49 +0800)]
net: usb: Use DEFINE_SPINLOCK() for spinlock

spinlock can be initialized automatically with DEFINE_SPINLOCK()
rather than explicitly calling spin_lock_init().

Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: wan: Use DEFINE_SPINLOCK() for spinlock
Zheng Yongjun [Tue, 29 Dec 2020 13:49:18 +0000 (21:49 +0800)]
net: wan: Use DEFINE_SPINLOCK() for spinlock

spinlock can be initialized automatically with DEFINE_SPINLOCK()
rather than explicitly calling spin_lock_init().

Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: tipc: Replace expression with offsetof()
Zheng Yongjun [Tue, 29 Dec 2020 13:49:04 +0000 (21:49 +0800)]
net: tipc: Replace expression with offsetof()

Use the existing offsetof() macro instead of duplicating code.

Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: wan: Replace simple_strtol by simple_strtoul
Zheng Yongjun [Tue, 29 Dec 2020 13:48:56 +0000 (21:48 +0800)]
net: wan: Replace simple_strtol by simple_strtoul

The simple_strtol() function is deprecated, use simple_strtoul() instead.

Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: mhi: Add raw IP mode support
Loic Poulain [Tue, 29 Dec 2020 09:04:54 +0000 (10:04 +0100)]
net: mhi: Add raw IP mode support

MHI net is protocol agnostic, the payload protocol depends on the modem
configuration, which can be either RMNET (IP muxing and aggregation) or
raw IP. This patch adds support for incomming IPv4/IPv6 packets, that
was previously unconditionnaly reported as RMNET packets.

Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'enetc-code-cleanups'
David S. Miller [Tue, 5 Jan 2021 23:19:19 +0000 (15:19 -0800)]
Merge branch 'enetc-code-cleanups'

Michael Walle says:

====================
enetc: code cleanups

This are some code cleanups in the MDIO part of the enetc. They are
intended to make the code more readable.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoenetc: reorder macros and functions
Michael Walle [Mon, 28 Dec 2020 13:00:34 +0000 (14:00 +0100)]
enetc: reorder macros and functions

Now that there aren't any more macros with parameters, move the macros
above any functions.

Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoenetc: drop MDIO_DATA() macro
Michael Walle [Mon, 28 Dec 2020 13:00:33 +0000 (14:00 +0100)]
enetc: drop MDIO_DATA() macro

value is u16, masking with 0xffff is a nop. Drop it.

Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoenetc: don't use macro magic for the readx_poll_timeout() callback
Michael Walle [Mon, 28 Dec 2020 13:00:32 +0000 (14:00 +0100)]
enetc: don't use macro magic for the readx_poll_timeout() callback

The macro enetc_mdio_rd_reg() is just used in that particular case and
has a hardcoded parameter name "mdio_priv". Define a specific function
to use for readx_poll_timeout() instead. Also drop the TIMEOUT macro
since it is used just once.

Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoenetc: drop unneeded indirection
Michael Walle [Mon, 28 Dec 2020 13:00:31 +0000 (14:00 +0100)]
enetc: drop unneeded indirection

Before commit 6517798dd343 ("enetc: Make MDIO accessors more generic and
export to include/linux/fsl") these macros actually had some benefits.
But after the commit it just makes the code hard to read. Drop the macro
indirections.

Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet/mlx5: DR, Move STEv0 modify header logic
Yevgeny Kliteynik [Thu, 19 Nov 2020 05:15:49 +0000 (07:15 +0200)]
net/mlx5: DR, Move STEv0 modify header logic

Move HW specific modify header fields and logic to STEv0 file
and use the new STE context callbacks.
Since STEv0 and STEv1 modify actions values are different, each
version has its own implementation.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: DR, Add STE modify header actions per-device API
Yevgeny Kliteynik [Thu, 19 Nov 2020 05:14:27 +0000 (07:14 +0200)]
net/mlx5: DR, Add STE modify header actions per-device API

Extend the STE context struct with per-device modify header actions.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: DR, Move STEv0 action apply logic
Yevgeny Kliteynik [Thu, 19 Nov 2020 05:03:43 +0000 (07:03 +0200)]
net/mlx5: DR, Move STEv0 action apply logic

Use STE tx/rx actions per-device API: move HW specific
action apply logic from dr_ste to STEv0 file - STEv0 and
STEv1 actions format is different, each version should
have its own implementation.

Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: DR, Add STE tx/rx actions per-device API
Yevgeny Kliteynik [Thu, 19 Nov 2020 04:48:33 +0000 (06:48 +0200)]
net/mlx5: DR, Add STE tx/rx actions per-device API

Extend the STE context struct with per-device
tx/rx actions.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: DR, Move STEv0 setters and getters
Yevgeny Kliteynik [Thu, 19 Nov 2020 04:44:45 +0000 (06:44 +0200)]
net/mlx5: DR, Move STEv0 setters and getters

Use the new setters and getters API for STEv0: move HW specific setter and
getters from dr_ste to STEv0 file. Since STEv0 and STEv1 format are
different each version should implemented different setters and getters.
Rename remaining static functions w/o mlx5 prefix.

Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: DR, Add STE setters and getters per-device API
Yevgeny Kliteynik [Thu, 19 Nov 2020 04:41:57 +0000 (06:41 +0200)]
net/mlx5: DR, Add STE setters and getters per-device API

Extend the STE context struct with various per-device
setters and getters.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: DR, Move action apply logic to dr_ste
Yevgeny Kliteynik [Tue, 22 Sep 2020 01:22:31 +0000 (04:22 +0300)]
net/mlx5: DR, Move action apply logic to dr_ste

The action apply logic is device specific per STE version,
moving to the STE layer will allow implementing it for
both devices while keeping DR upper layers the same.

Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: DR, Refactor ICMP STE builder
Yevgeny Kliteynik [Tue, 22 Sep 2020 01:22:24 +0000 (04:22 +0300)]
net/mlx5: DR, Refactor ICMP STE builder

Reworked ICMP tag builder to better handle ICMP v4/6 fields and avoid unneeded
code duplication and 'if' statements, removed unused macro, changed bitfield
of len 8 to u8.

Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: DR, Move STEv0 look up types from mlx5_ifc_dr header
Yevgeny Kliteynik [Tue, 22 Sep 2020 01:22:12 +0000 (04:22 +0300)]
net/mlx5: DR, Move STEv0 look up types from mlx5_ifc_dr header

The lookup types are device specific and should not be
exposed to DR upper layers, matchers/tables.
Each HW STE version should keep them internal.
The lu_type size is updated to support larger lu_types as
required for STEv1.

Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: DR, Merge similar DR STE SET macros
Yevgeny Kliteynik [Tue, 22 Sep 2020 01:22:02 +0000 (04:22 +0300)]
net/mlx5: DR, Merge similar DR STE SET macros

Merge DR_STE_STE macros for better code reuse, the macro
DR_STE_SET_MASK_V and DR_STE_SET_TAG are merged to avoid
tag and bit_mask function creation which are usually the
same.

Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: DR, Fix STEv0 source_eswitch_owner_vhca_id support
Yevgeny Kliteynik [Wed, 28 Oct 2020 23:33:00 +0000 (01:33 +0200)]
net/mlx5: DR, Fix STEv0 source_eswitch_owner_vhca_id support

Check vport_cap only if match on source gvmi is required.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: DR, Remove unused macro definition from dr_ste
Yevgeny Kliteynik [Thu, 19 Nov 2020 04:20:07 +0000 (06:20 +0200)]
net/mlx5: DR, Remove unused macro definition from dr_ste

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: DR, Move HW STEv0 match logic to a separate file
Yevgeny Kliteynik [Thu, 19 Nov 2020 04:18:01 +0000 (06:18 +0200)]
net/mlx5: DR, Move HW STEv0 match logic to a separate file

Move current STE match logic to a seprate file.
This file will be used for HW specific STEv0.

Future patches will add functionality for v1 steering.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: DR, Use the new HW specific STE infrastructure
Yevgeny Kliteynik [Thu, 19 Nov 2020 04:03:40 +0000 (06:03 +0200)]
net/mlx5: DR, Use the new HW specific STE infrastructure

Split the STE builders functionality into the common part and
device-specific part. All the device-specific part (with 'v0' in
the function names) is accessed through the STE context structure.

Subsequent patches will have the device-specific logic moved to a
separate file.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: DR, Move macros from dr_ste.c to header
Yevgeny Kliteynik [Thu, 19 Nov 2020 03:16:38 +0000 (05:16 +0200)]
net/mlx5: DR, Move macros from dr_ste.c to header

Move some macros from dr_ste.c to header - these macros
will be used by all the format-specific functions.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: DR, Add infrastructure for supporting several steering formats
Yevgeny Kliteynik [Thu, 19 Nov 2020 02:53:53 +0000 (04:53 +0200)]
net/mlx5: DR, Add infrastructure for supporting several steering formats

Add a struct of device specific callbacks for STE layer below dr_ste.
Each device will implement its HW-specific function, and a comon logic
from the DR code will access these functions through the new ste_ctx API.

More callbacks will follow in the subsequent patches.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agoMerge tag 'arc-5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Linus Torvalds [Tue, 5 Jan 2021 20:46:27 +0000 (12:46 -0800)]
Merge tag 'arc-5.11-rc3' of git://git./linux/kernel/git/vgupta/arc

Pull ARC updates from Vineet Gupta:
 "Things are quieter on upstreaming front as we are mostly focusing on
  ARCv3/ARC64 port.

  This contains just build system updates from Masahiro Yamada"

* tag 'arc-5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
  ARC: build: use $(READELF) instead of hard-coded readelf
  ARC: build: remove unneeded extra-y
  ARC: build: move symlink creation to arch/arc/Makefile to avoid race
  ARC: build: add boot_targets to PHONY
  ARC: build: add uImage.lzma to the top-level target
  ARC: build: remove non-existing bootpImage from KBUILD_IMAGE

3 years agoMerge tag 'net-5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Tue, 5 Jan 2021 20:38:56 +0000 (12:38 -0800)]
Merge tag 'net-5.11-rc3' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Networking fixes, including fixes from netfilter, wireless and bpf
  trees.

  Current release - regressions:

   - mt76: fix NULL pointer dereference in mt76u_status_worker and
     mt76s_process_tx_queue

   - net: ipa: fix interconnect enable bug

  Current release - always broken:

   - netfilter: fixes possible oops in mtype_resize in ipset

   - ath11k: fix number of coding issues found by static analysis tools
     and spurious error messages

  Previous releases - regressions:

   - e1000e: re-enable s0ix power saving flows for systems with the
     Intel i219-LM Ethernet controllers to fix power use regression

   - virtio_net: fix recursive call to cpus_read_lock() to avoid a
     deadlock

   - ipv4: ignore ECN bits for fib lookups in fib_compute_spec_dst()

   - sysfs: take the rtnl lock around XPS configuration

   - xsk: fix memory leak for failed bind and rollback reservation at
     NETDEV_TX_BUSY

   - r8169: work around power-saving bug on some chip versions

  Previous releases - always broken:

   - dcb: validate netlink message in DCB handler

   - tun: fix return value when the number of iovs exceeds MAX_SKB_FRAGS
     to prevent unnecessary retries

   - vhost_net: fix ubuf refcount when sendmsg fails

   - bpf: save correct stopping point in file seq iteration

   - ncsi: use real net-device for response handler

   - neighbor: fix div by zero caused by a data race (TOCTOU)

   - bareudp: fix use of incorrect min_headroom size and a false
     positive lockdep splat from the TX lock

   - mvpp2:
      - clear force link UP during port init procedure in case
        bootloader had set it
      - add TCAM entry to drop flow control pause frames
      - fix PPPoE with ipv6 packet parsing
      - fix GoP Networking Complex Control config of port 3
      - fix pkt coalescing IRQ-threshold configuration

   - xsk: fix race in SKB mode transmit with shared cq

   - ionic: account for vlan tag len in rx buffer len

   - stmmac: ignore the second clock input, current clock framework does
     not handle exclusive clock use well, other drivers may reconfigure
     the second clock

  Misc:

   - ppp: change PPPIOCUNBRIDGECHAN ioctl request number to follow
     existing scheme"

* tag 'net-5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (99 commits)
  net: dsa: lantiq_gswip: Fix GSWIP_MII_CFG(p) register access
  net: dsa: lantiq_gswip: Enable GSWIP_MII_CFG_EN also for internal PHYs
  net: lapb: Decrease the refcount of "struct lapb_cb" in lapb_device_event
  r8169: work around power-saving bug on some chip versions
  net: usb: qmi_wwan: add Quectel EM160R-GL
  selftests: mlxsw: Set headroom size of correct port
  net: macb: Correct usage of MACB_CAPS_CLK_HW_CHG flag
  ibmvnic: fix: NULL pointer dereference.
  docs: networking: packet_mmap: fix old config reference
  docs: networking: packet_mmap: fix formatting for C macros
  vhost_net: fix ubuf refcount incorrectly when sendmsg fails
  bareudp: Fix use of incorrect min_headroom size
  bareudp: set NETIF_F_LLTX flag
  net: hdlc_ppp: Fix issues when mod_timer is called while timer is running
  atlantic: remove architecture depends
  erspan: fix version 1 check in gre_parse_header()
  net: hns: fix return value check in __lb_other_process()
  net: sched: prevent invalid Scell_log shift count
  net: neighbor: fix a crash caused by mod zero
  ipv4: Ignore ECN bits for fib lookups in fib_compute_spec_dst()
  ...

3 years agoMerge tag 'afs-fixes-04012021' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowe...
Linus Torvalds [Tue, 5 Jan 2021 19:55:46 +0000 (11:55 -0800)]
Merge tag 'afs-fixes-04012021' of git://git./linux/kernel/git/dhowells/linux-fs

Pull AFS fixes from David Howells:
 "Two fixes.

  The first is the fix for the strnlen() array limit check and the
  second fixes the calculation of the number of dirent records used to
  represent any particular filename length"

* tag 'afs-fixes-04012021' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  afs: Fix directory entry size calculation
  afs: Work around strnlen() oops with CONFIG_FORTIFIED_SOURCE=y

3 years agomm: make wait_on_page_writeback() wait for multiple pending writebacks
Linus Torvalds [Tue, 5 Jan 2021 19:33:00 +0000 (11:33 -0800)]
mm: make wait_on_page_writeback() wait for multiple pending writebacks

Ever since commit 2a9127fcf229 ("mm: rewrite wait_on_page_bit_common()
logic") we've had some very occasional reports of BUG_ON(PageWriteback)
in write_cache_pages(), which we thought we already fixed in commit
073861ed77b6 ("mm: fix VM_BUG_ON(PageTail) and BUG_ON(PageWriteback)").

But syzbot just reported another one, even with that commit in place.

And it turns out that there's a simpler way to trigger the BUG_ON() than
the one Hugh found with page re-use.  It all boils down to the fact that
the page writeback is ostensibly serialized by the page lock, but that
isn't actually really true.

Yes, the people _setting_ writeback all do so under the page lock, but
the actual clearing of the bit - and waking up any waiters - happens
without any page lock.

This gives us this fairly simple race condition:

  CPU1 = end previous writeback
  CPU2 = start new writeback under page lock
  CPU3 = write_cache_pages()

  CPU1          CPU2            CPU3
  ----          ----            ----

  end_page_writeback()
    test_clear_page_writeback(page)
    ... delayed...

                lock_page();
                set_page_writeback()
                unlock_page()

                                lock_page()
                                wait_on_page_writeback();

    wake_up_page(page, PG_writeback);
    .. wakes up CPU3 ..

                                BUG_ON(PageWriteback(page));

where the BUG_ON() happens because we woke up the PG_writeback bit
becasue of the _previous_ writeback, but a new one had already been
started because the clearing of the bit wasn't actually atomic wrt the
actual wakeup or serialized by the page lock.

The reason this didn't use to happen was that the old logic in waiting
on a page bit would just loop if it ever saw the bit set again.

The nice proper fix would probably be to get rid of the whole "wait for
writeback to clear, and then set it" logic in the writeback path, and
replace it with an atomic "wait-to-set" (ie the same as we have for page
locking: we set the page lock bit with a single "lock_page()", not with
"wait for lock bit to clear and then set it").

However, out current model for writeback is that the waiting for the
writeback bit is done by the generic VFS code (ie write_cache_pages()),
but the actual setting of the writeback bit is done much later by the
filesystem ".writepages()" function.

IOW, to make the writeback bit have that same kind of "wait-to-set"
behavior as we have for page locking, we'd have to change our roughly
~50 different writeback functions.  Painful.

Instead, just make "wait_on_page_writeback()" loop on the very unlikely
situation that the PG_writeback bit is still set, basically re-instating
the old behavior.  This is very non-optimal in case of contention, but
since we only ever set the bit under the page lock, that situation is
controlled.

Reported-by: syzbot+2fc0712f8f8b8b8fa0ef@syzkaller.appspotmail.com
Fixes: 2a9127fcf229 ("mm: rewrite wait_on_page_bit_common() logic")
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Jakub Kicinski [Mon, 4 Jan 2021 22:02:02 +0000 (14:02 -0800)]
Merge git://git./pub/scm/linux/kernel/git/pablo/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) Missing sanitization of rateest userspace string, bug has been
   triggered by syzbot, patch from Florian Westphal.

2) Report EOPNOTSUPP on missing set features in nft_dynset, otherwise
   error reporting to userspace via EINVAL is misleading since this is
   reserved for malformed netlink requests.

3) New binaries with old kernels might silently accept several set
   element expressions. New binaries set on the NFT_SET_EXPR and
   NFT_DYNSET_F_EXPR flags to request for several expressions per
   element, hence old kernels which do not support for this bail out
   with EOPNOTSUPP.

* git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf:
  netfilter: nftables: add set expression flags
  netfilter: nft_dynset: report EOPNOTSUPP on missing set feature
  netfilter: xt_RATEEST: reject non-null terminated string from userspace
====================

Link: https://lore.kernel.org/r/20210103192920.18639-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoMerge branch 'net-dsa-lantiq_gswip-two-fixes-for-net-stable'
Jakub Kicinski [Mon, 4 Jan 2021 21:47:17 +0000 (13:47 -0800)]
Merge branch 'net-dsa-lantiq_gswip-two-fixes-for-net-stable'

Martin Blumenstingl says:

====================
net: dsa: lantiq_gswip: two fixes for -net/-stable

While testing the lantiq_gswip driver in OpenWrt at least one board had
a non-working Ethernet port connected to an internal 100Mbit/s PHY22F
GPHY. The problem which could be observed:
- the PHY would detect the link just fine
- ethtool stats would see the TX counter rise
- the RX counter in ethtool was stuck at zero

It turns out that two independent patches are needed to fix this:
- first we need to enable the MII data lines also for internal PHYs
- second we need to program the GSWIP_MII_CFG registers for all ports
  except the CPU port

These two patches have also been tested by back-porting them on top of
Linux 5.4.86 in OpenWrt.

Special thanks to Hauke for debugging and brainstorming this on IRC
with me!
====================

Link: https://lore.kernel.org/r/20210103012544.3259029-1-martin.blumenstingl@googlemail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: dsa: lantiq_gswip: Fix GSWIP_MII_CFG(p) register access
Martin Blumenstingl [Sun, 3 Jan 2021 01:25:44 +0000 (02:25 +0100)]
net: dsa: lantiq_gswip: Fix GSWIP_MII_CFG(p) register access

There is one GSWIP_MII_CFG register for each switch-port except the CPU
port. The register offset for the first port is 0x0, 0x02 for the
second, 0x04 for the third and so on.

Update the driver to not only restrict the GSWIP_MII_CFG registers to
ports 0, 1 and 5. Handle ports 0..5 instead but skip the CPU port. This
means we are not overwriting the configuration for the third port (port
two since we start counting from zero) with the settings for the sixth
port (with number five) anymore.

The GSWIP_MII_PCDU(p) registers are not updated because there's really
only three (one for each of the following ports: 0, 1, 5).

Fixes: 14fceff4771e51 ("net: dsa: Add Lantiq / Intel DSA driver for vrx200")
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Acked-by: Hauke Mehrtens <hauke@hauke-m.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: dsa: lantiq_gswip: Enable GSWIP_MII_CFG_EN also for internal PHYs
Martin Blumenstingl [Sun, 3 Jan 2021 01:25:43 +0000 (02:25 +0100)]
net: dsa: lantiq_gswip: Enable GSWIP_MII_CFG_EN also for internal PHYs

Enable GSWIP_MII_CFG_EN also for internal PHYs to make traffic flow.
Without this the PHY link is detected properly and ethtool statistics
for TX are increasing but there's no RX traffic coming in.

Fixes: 14fceff4771e51 ("net: dsa: Add Lantiq / Intel DSA driver for vrx200")
Suggested-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Acked-by: Hauke Mehrtens <hauke@hauke-m.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: lapb: Decrease the refcount of "struct lapb_cb" in lapb_device_event
Xie He [Thu, 31 Dec 2020 17:43:31 +0000 (09:43 -0800)]
net: lapb: Decrease the refcount of "struct lapb_cb" in lapb_device_event

In lapb_device_event, lapb_devtostruct is called to get a reference to
an object of "struct lapb_cb". lapb_devtostruct increases the refcount
of the object and returns a pointer to it. However, we didn't decrease
the refcount after we finished using the pointer. This patch fixes this
problem.

Fixes: a4989fa91110 ("net/lapb: support netdev events")
Cc: Martin Schiller <ms@dev.tdt.de>
Signed-off-by: Xie He <xie.he.0141@gmail.com>
Link: https://lore.kernel.org/r/20201231174331.64539-1-xie.he.0141@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agor8169: work around power-saving bug on some chip versions
Heiner Kallweit [Wed, 30 Dec 2020 18:33:34 +0000 (19:33 +0100)]
r8169: work around power-saving bug on some chip versions

A user reported failing network with RTL8168dp (a quite rare chip
version). Realtek confirmed that few chip versions suffer from a PLL
power-down hw bug.

Fixes: 07df5bd874f0 ("r8169: power down chip in probe")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/a1c39460-d533-7f9e-fa9d-2b8990b02426@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: usb: qmi_wwan: add Quectel EM160R-GL
Bjørn Mork [Wed, 30 Dec 2020 15:24:51 +0000 (16:24 +0100)]
net: usb: qmi_wwan: add Quectel EM160R-GL

New modem using ff/ff/30 for QCDM, ff/00/00 for  AT and NMEA,
and ff/ff/ff for RMNET/QMI.

T: Bus=02 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 2 Spd=5000 MxCh= 0
D: Ver= 3.20 Cls=ef(misc ) Sub=02 Prot=01 MxPS= 9 #Cfgs= 1
P: Vendor=2c7c ProdID=0620 Rev= 4.09
S: Manufacturer=Quectel
S: Product=EM160R-GL
S: SerialNumber=e31cedc1
C:* #Ifs= 5 Cfg#= 1 Atr=a0 MxPwr=896mA
I:* If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=(none)
E: Ad=81(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms
E: Ad=01(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms
I:* If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=(none)
E: Ad=83(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
E: Ad=82(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms
E: Ad=02(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms
I:* If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=(none)
E: Ad=85(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
E: Ad=84(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms
E: Ad=03(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms
I:* If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=(none)
E: Ad=87(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
E: Ad=86(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms
E: Ad=04(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms
I:* If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=(none)
E: Ad=88(I) Atr=03(Int.) MxPS= 8 Ivl=32ms
E: Ad=8e(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms
E: Ad=0f(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms

Signed-off-by: Bjørn Mork <bjorn@mork.no>
Link: https://lore.kernel.org/r/20201230152451.245271-1-bjorn@mork.no
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoselftests: mlxsw: Set headroom size of correct port
Ido Schimmel [Wed, 30 Dec 2020 11:42:51 +0000 (13:42 +0200)]
selftests: mlxsw: Set headroom size of correct port

The test was setting the headroom size of the wrong port. This was not
visible because of a firmware bug that canceled this bug.

Set the headroom size of the correct port, so that the test will pass
with both old and new firmware versions.

Fixes: bfa804784e32 ("selftests: mlxsw: Add a PFC test")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/20201230114251.394009-1-idosch@idosch.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: macb: Correct usage of MACB_CAPS_CLK_HW_CHG flag
Charles Keepax [Mon, 4 Jan 2021 10:38:02 +0000 (10:38 +0000)]
net: macb: Correct usage of MACB_CAPS_CLK_HW_CHG flag

A new flag MACB_CAPS_CLK_HW_CHG was added and all callers of
macb_set_tx_clk were gated on the presence of this flag.

-   if (!clk)
+ if (!bp->tx_clk || !(bp->caps & MACB_CAPS_CLK_HW_CHG))

However the flag was not added to anything other than the new
sama7g5_gem, turning that function call into a no op for all other
systems. This breaks the networking on Zynq.

The commit message adding this states: a new capability so that
macb_set_tx_clock() to not be called for IPs having this
capability

This strongly implies that present of the flag was intended to skip
the function not absence of the flag. Update the if statement to
this effect, which repairs the existing users.

Fixes: daafa1d33cc9 ("net: macb: add capability to not set the clock rate")
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20210104103802.13091-1-ckeepax@opensource.cirrus.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoibmvnic: fix: NULL pointer dereference.
YANG LI [Wed, 30 Dec 2020 07:23:14 +0000 (15:23 +0800)]
ibmvnic: fix: NULL pointer dereference.

The error is due to dereference a null pointer in function
reset_one_sub_crq_queue():

if (!scrq) {
    netdev_dbg(adapter->netdev,
               "Invalid scrq reset. irq (%d) or msgs(%p).\n",
scrq->irq, scrq->msgs);
return -EINVAL;
}

If the expression is true, scrq must be a null pointer and cannot
dereference.

Fixes: 9281cf2d5840 ("ibmvnic: avoid memset null scrq msgs")
Signed-off-by: YANG LI <abaci-bugfix@linux.alibaba.com>
Reported-by: Abaci <abaci@linux.alibaba.com>
Acked-by: Lijun Pan <ljp@linux.ibm.com>
Link: https://lore.kernel.org/r/1609312994-121032-1-git-send-email-abaci-bugfix@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agodocs: networking: packet_mmap: fix old config reference
Baruch Siach [Tue, 29 Dec 2020 09:08:39 +0000 (11:08 +0200)]
docs: networking: packet_mmap: fix old config reference

Before commit 889b8f964f2f ("packet: Kill CONFIG_PACKET_MMAP.") there
used to be a CONFIG_PACKET_MMAP config symbol that depended on
CONFIG_PACKET. The text still implies that PACKET_MMAP can be disabled.
Remove that from the text, as well as reference to old kernel versions.

Also, drop reference to broken link to information for pre 2.6.5
kernels.

Make a slight working improvement (s/In/On/) while at it.

Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Link: https://lore.kernel.org/r/80089f3783372c8fd7833f28ce774a171b2ef252.1609232919.git.baruch@tkos.co.il
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agodocs: networking: packet_mmap: fix formatting for C macros
Baruch Siach [Tue, 29 Dec 2020 09:08:38 +0000 (11:08 +0200)]
docs: networking: packet_mmap: fix formatting for C macros

The citation of macro definitions should appear in a code block.

Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Link: https://lore.kernel.org/r/5cb47005e7a59b64299e038827e295822193384c.1609232919.git.baruch@tkos.co.il
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agovhost_net: fix ubuf refcount incorrectly when sendmsg fails
Yunjian Wang [Tue, 29 Dec 2020 02:01:48 +0000 (10:01 +0800)]
vhost_net: fix ubuf refcount incorrectly when sendmsg fails

Currently the vhost_zerocopy_callback() maybe be called to decrease
the refcount when sendmsg fails in tun. The error handling in vhost
handle_tx_zerocopy() will try to decrease the same refcount again.
This is wrong. To fix this issue, we only call vhost_net_ubuf_put()
when vq->heads[nvq->desc].len == VHOST_DMA_IN_PROGRESS.

Fixes: bab632d69ee4 ("vhost: vhost TX zero-copy support")
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/1609207308-20544-1-git-send-email-wangyunjian@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agobareudp: Fix use of incorrect min_headroom size
Taehee Yoo [Mon, 28 Dec 2020 15:21:46 +0000 (15:21 +0000)]
bareudp: Fix use of incorrect min_headroom size

In the bareudp6_xmit_skb(), it calculates min_headroom.
At that point, it uses struct iphdr, but it's not correct.
So panic could occur.
The struct ipv6hdr should be used.

Test commands:
    ip netns add A
    ip netns add B
    ip link add veth0 netns A type veth peer name veth1 netns B
    ip netns exec A ip link set veth0 up
    ip netns exec A ip a a 2001:db8:0::1/64 dev veth0
    ip netns exec B ip link set veth1 up
    ip netns exec B ip a a 2001:db8:0::2/64 dev veth1

    for i in {10..1}
    do
            let A=$i-1
            ip netns exec A ip link add bareudp$i type bareudp dstport $i \
    ethertype 0x86dd
            ip netns exec A ip link set bareudp$i up
            ip netns exec A ip -6 a a 2001:db8:$i::1/64 dev bareudp$i
            ip netns exec A ip -6 r a 2001:db8:$i::2 encap ip6 src \
    2001:db8:$A::1 dst 2001:db8:$A::2 via 2001:db8:$i::2 \
    dev bareudp$i

            ip netns exec B ip link add bareudp$i type bareudp dstport $i \
    ethertype 0x86dd
            ip netns exec B ip link set bareudp$i up
            ip netns exec B ip -6 a a 2001:db8:$i::2/64 dev bareudp$i
            ip netns exec B ip -6 r a 2001:db8:$i::1 encap ip6 src \
    2001:db8:$A::2 dst 2001:db8:$A::1 via 2001:db8:$i::1 \
    dev bareudp$i
    done
    ip netns exec A ping 2001:db8:7::2

Splat looks like:
[   66.436679][    C2] skbuff: skb_under_panic: text:ffffffff928614c8 len:454 put:14 head:ffff88810abb4000 data:ffff88810abb3ffa tail:0x1c0 end:0x3ec0 dev:veth0
[   66.441626][    C2] ------------[ cut here ]------------
[   66.443458][    C2] kernel BUG at net/core/skbuff.c:109!
[   66.445313][    C2] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[   66.447606][    C2] CPU: 2 PID: 913 Comm: ping Not tainted 5.10.0+ #819
[   66.450251][    C2] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[   66.453713][    C2] RIP: 0010:skb_panic+0x15d/0x15f
[   66.455345][    C2] Code: 98 fe 4c 8b 4c 24 10 53 8b 4d 70 45 89 e0 48 c7 c7 60 8b 78 93 41 57 41 56 41 55 48 8b 54 24 20 48 8b 74 24 28 e8 b5 40 f9 ff <0f> 0b 48 8b 6c 24 20 89 34 24 e8 08 c9 98 fe 8b 34 24 48 c7 c1 80
[   66.462314][    C2] RSP: 0018:ffff888119209648 EFLAGS: 00010286
[   66.464281][    C2] RAX: 0000000000000089 RBX: ffff888003159000 RCX: 0000000000000000
[   66.467216][    C2] RDX: 0000000000000089 RSI: 0000000000000008 RDI: ffffed10232412c0
[   66.469768][    C2] RBP: ffff88810a53d440 R08: ffffed102328018d R09: ffffed102328018d
[   66.472297][    C2] R10: ffff888119400c67 R11: ffffed102328018c R12: 000000000000000e
[   66.474833][    C2] R13: ffff88810abb3ffa R14: 00000000000001c0 R15: 0000000000003ec0
[   66.477361][    C2] FS:  00007f37c0c72f00(0000) GS:ffff888119200000(0000) knlGS:0000000000000000
[   66.480214][    C2] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   66.482296][    C2] CR2: 000055a058808570 CR3: 000000011039e002 CR4: 00000000003706e0
[   66.484811][    C2] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   66.487793][    C2] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   66.490424][    C2] Call Trace:
[   66.491469][    C2]  <IRQ>
[   66.492374][    C2]  ? eth_header+0x28/0x190
[   66.494054][    C2]  ? eth_header+0x28/0x190
[   66.495401][    C2]  skb_push.cold.99+0x22/0x22
[   66.496700][    C2]  eth_header+0x28/0x190
[   66.497867][    C2]  neigh_resolve_output+0x3de/0x720
[   66.499615][    C2]  ? __neigh_update+0x7e8/0x20a0
[   66.501176][    C2]  __neigh_update+0x8bd/0x20a0
[   66.502749][    C2]  ndisc_update+0x34/0xc0
[   66.504010][    C2]  ndisc_recv_na+0x8da/0xb80
[   66.505041][    C2]  ? pndisc_redo+0x20/0x20
[   66.505888][    C2]  ? rcu_read_lock_sched_held+0xc0/0xc0
[   66.506965][    C2]  ndisc_rcv+0x3a0/0x470
[   66.507797][    C2]  icmpv6_rcv+0xad9/0x1b00
[   66.508645][    C2]  ip6_protocol_deliver_rcu+0xcd6/0x1560
[   66.509719][    C2]  ip6_input_finish+0x5b/0xf0
[   66.510615][    C2]  ip6_input+0xcd/0x2d0
[   66.511406][    C2]  ? ip6_input_finish+0xf0/0xf0
[   66.512327][    C2]  ? rcu_read_lock_held+0x91/0xa0
[   66.513279][    C2]  ? ip6_protocol_deliver_rcu+0x1560/0x1560
[   66.514414][    C2]  ipv6_rcv+0xe8/0x300
[ ... ]

Acked-by: Guillaume Nault <gnault@redhat.com>
Fixes: 571912c69f0e ("net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS, IP, NSH etc.")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Link: https://lore.kernel.org/r/20201228152146.24270-1-ap420073@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agobareudp: set NETIF_F_LLTX flag
Taehee Yoo [Mon, 28 Dec 2020 15:21:36 +0000 (15:21 +0000)]
bareudp: set NETIF_F_LLTX flag

Like other tunneling interfaces, the bareudp doesn't need TXLOCK.
So, It is good to set the NETIF_F_LLTX flag to improve performance and
to avoid lockdep's false-positive warning.

Test commands:
    ip netns add A
    ip netns add B
    ip link add veth0 netns A type veth peer name veth1 netns B
    ip netns exec A ip link set veth0 up
    ip netns exec A ip a a 10.0.0.1/24 dev veth0
    ip netns exec B ip link set veth1 up
    ip netns exec B ip a a 10.0.0.2/24 dev veth1

    for i in {2..1}
    do
            let A=$i-1
            ip netns exec A ip link add bareudp$i type bareudp \
    dstport $i ethertype ip
            ip netns exec A ip link set bareudp$i up
            ip netns exec A ip a a 10.0.$i.1/24 dev bareudp$i
            ip netns exec A ip r a 10.0.$i.2 encap ip src 10.0.$A.1 \
    dst 10.0.$A.2 via 10.0.$i.2 dev bareudp$i

            ip netns exec B ip link add bareudp$i type bareudp \
    dstport $i ethertype ip
            ip netns exec B ip link set bareudp$i up
            ip netns exec B ip a a 10.0.$i.2/24 dev bareudp$i
            ip netns exec B ip r a 10.0.$i.1 encap ip src 10.0.$A.2 \
    dst 10.0.$A.1 via 10.0.$i.1 dev bareudp$i
    done
    ip netns exec A ping 10.0.2.2

Splat looks like:
[   96.992803][  T822] ============================================
[   96.993954][  T822] WARNING: possible recursive locking detected
[   96.995102][  T822] 5.10.0+ #819 Not tainted
[   96.995927][  T822] --------------------------------------------
[   96.997091][  T822] ping/822 is trying to acquire lock:
[   96.998083][  T822] ffff88810f753898 (_xmit_NONE#2){+.-.}-{2:2}, at: __dev_queue_xmit+0x1f52/0x2960
[   96.999813][  T822]
[   96.999813][  T822] but task is already holding lock:
[   97.001192][  T822] ffff88810c385498 (_xmit_NONE#2){+.-.}-{2:2}, at: __dev_queue_xmit+0x1f52/0x2960
[   97.002908][  T822]
[   97.002908][  T822] other info that might help us debug this:
[   97.004401][  T822]  Possible unsafe locking scenario:
[   97.004401][  T822]
[   97.005784][  T822]        CPU0
[   97.006407][  T822]        ----
[   97.007010][  T822]   lock(_xmit_NONE#2);
[   97.007779][  T822]   lock(_xmit_NONE#2);
[   97.008550][  T822]
[   97.008550][  T822]  *** DEADLOCK ***
[   97.008550][  T822]
[   97.010057][  T822]  May be due to missing lock nesting notation
[   97.010057][  T822]
[   97.011594][  T822] 7 locks held by ping/822:
[   97.012426][  T822]  #0: ffff888109a144f0 (sk_lock-AF_INET){+.+.}-{0:0}, at: raw_sendmsg+0x12f7/0x2b00
[   97.014191][  T822]  #1: ffffffffbce2f5a0 (rcu_read_lock_bh){....}-{1:2}, at: ip_finish_output2+0x249/0x2020
[   97.016045][  T822]  #2: ffffffffbce2f5a0 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x1fd/0x2960
[   97.017897][  T822]  #3: ffff88810c385498 (_xmit_NONE#2){+.-.}-{2:2}, at: __dev_queue_xmit+0x1f52/0x2960
[   97.019684][  T822]  #4: ffffffffbce2f600 (rcu_read_lock){....}-{1:2}, at: bareudp_xmit+0x31b/0x3690 [bareudp]
[   97.021573][  T822]  #5: ffffffffbce2f5a0 (rcu_read_lock_bh){....}-{1:2}, at: ip_finish_output2+0x249/0x2020
[   97.023424][  T822]  #6: ffffffffbce2f5a0 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x1fd/0x2960
[   97.025259][  T822]
[   97.025259][  T822] stack backtrace:
[   97.026349][  T822] CPU: 3 PID: 822 Comm: ping Not tainted 5.10.0+ #819
[   97.027609][  T822] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[   97.029407][  T822] Call Trace:
[   97.030015][  T822]  dump_stack+0x99/0xcb
[   97.030783][  T822]  __lock_acquire.cold.77+0x149/0x3a9
[   97.031773][  T822]  ? stack_trace_save+0x81/0xa0
[   97.032661][  T822]  ? register_lock_class+0x1910/0x1910
[   97.033673][  T822]  ? register_lock_class+0x1910/0x1910
[   97.034679][  T822]  ? rcu_read_lock_sched_held+0x91/0xc0
[   97.035697][  T822]  ? rcu_read_lock_bh_held+0xa0/0xa0
[   97.036690][  T822]  lock_acquire+0x1b2/0x730
[   97.037515][  T822]  ? __dev_queue_xmit+0x1f52/0x2960
[   97.038466][  T822]  ? check_flags+0x50/0x50
[   97.039277][  T822]  ? netif_skb_features+0x296/0x9c0
[   97.040226][  T822]  ? validate_xmit_skb+0x29/0xb10
[   97.041151][  T822]  _raw_spin_lock+0x30/0x70
[   97.041977][  T822]  ? __dev_queue_xmit+0x1f52/0x2960
[   97.042927][  T822]  __dev_queue_xmit+0x1f52/0x2960
[   97.043852][  T822]  ? netdev_core_pick_tx+0x290/0x290
[   97.044824][  T822]  ? mark_held_locks+0xb7/0x120
[   97.045712][  T822]  ? lockdep_hardirqs_on_prepare+0x12c/0x3e0
[   97.046824][  T822]  ? __local_bh_enable_ip+0xa5/0xf0
[   97.047771][  T822]  ? ___neigh_create+0x12a8/0x1eb0
[   97.048710][  T822]  ? trace_hardirqs_on+0x41/0x120
[   97.049626][  T822]  ? ___neigh_create+0x12a8/0x1eb0
[   97.050556][  T822]  ? __local_bh_enable_ip+0xa5/0xf0
[   97.051509][  T822]  ? ___neigh_create+0x12a8/0x1eb0
[   97.052443][  T822]  ? check_chain_key+0x244/0x5f0
[   97.053352][  T822]  ? rcu_read_lock_bh_held+0x56/0xa0
[   97.054317][  T822]  ? ip_finish_output2+0x6ea/0x2020
[   97.055263][  T822]  ? pneigh_lookup+0x410/0x410
[   97.056135][  T822]  ip_finish_output2+0x6ea/0x2020
[ ... ]

Acked-by: Guillaume Nault <gnault@redhat.com>
Fixes: 571912c69f0e ("net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS, IP, NSH etc.")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Link: https://lore.kernel.org/r/20201228152136.24215-1-ap420073@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoMerge branch 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck...
Linus Torvalds [Mon, 4 Jan 2021 18:55:19 +0000 (10:55 -0800)]
Merge branch 'rcu/urgent' of git://git./linux/kernel/git/paulmck/linux-rcu

Pull RCU fix from Paul McKenney:
 "This is a fix for a regression in the v5.10 merge window, but it was
  reported quite late in the v5.10 process, plus generating and testing
  the fix took some time.

  The regression is due to commit 36dadef23fcc ("kprobes: Init kprobes
  in early_initcall") which on powerpc can use RCU Tasks before
  initialization, resulting in boot failures.

  The fix is straightforward, simply moving initialization of RCU Tasks
  before the early_initcall()s. The fix has been exposed to -next and
  kbuild test robot testing, and has been tested by the PowerPC guys"

* 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
  rcu-tasks: Move RCU-tasks initialization to before early_initcall()

3 years agoMerge tag 'compiler-attributes-for-linus-v5.11' of git://github.com/ojeda/linux
Linus Torvalds [Mon, 4 Jan 2021 18:47:38 +0000 (10:47 -0800)]
Merge tag 'compiler-attributes-for-linus-v5.11' of git://github.com/ojeda/linux

Pull ENABLE_MUST_CHECK removal from Miguel Ojeda:
 "Remove CONFIG_ENABLE_MUST_CHECK (Masahiro Yamada)"

Note that this removes the config option by making the must-check
unconditional, not by removing must check itself.

* tag 'compiler-attributes-for-linus-v5.11' of git://github.com/ojeda/linux:
  Compiler Attributes: remove CONFIG_ENABLE_MUST_CHECK

3 years agoafs: Fix directory entry size calculation
David Howells [Wed, 23 Dec 2020 10:39:57 +0000 (10:39 +0000)]
afs: Fix directory entry size calculation

The number of dirent records used by an AFS directory entry should be
calculated using the assumption that there is a 16-byte name field in the
first block, rather than a 20-byte name field (which is actually the case).
This miscalculation is historic and effectively standard, so we have to use
it.

The calculation we need to use is:

1 + (((strlen(name) + 1) + 15) >> 5)

where we are adding one to the strlen() result to account for the NUL
termination.

Fix this by the following means:

 (1) Create an inline function to do the calculation for a given name
     length.

 (2) Use the function to calculate the number of records used for a dirent
     in afs_dir_iterate_block().

     Use this to move the over-end check out of the loop since it only
     needs to be done once.

     Further use this to only go through the loop for the 2nd+ records
     composing an entry.  The only test there now is for if the record is
     allocated - and we already checked the first block at the top of the
     outer loop.

 (3) Add a max name length check in afs_dir_iterate_block().

 (4) Make afs_edit_dir_add() and afs_edit_dir_remove() use the function
     from (1) to calculate the number of blocks rather than doing it
     incorrectly themselves.

Fixes: 63a4681ff39c ("afs: Locally edit directory data for mkdir/create/unlink/...")
Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Marc Dionne <marc.dionne@auristor.com>
3 years agoafs: Work around strnlen() oops with CONFIG_FORTIFIED_SOURCE=y
David Howells [Mon, 21 Dec 2020 22:37:58 +0000 (22:37 +0000)]
afs: Work around strnlen() oops with CONFIG_FORTIFIED_SOURCE=y

AFS has a structured layout in its directory contents (AFS dirs are
downloaded as files and parsed locally by the client for lookup/readdir).
The slots in the directory are defined by union afs_xdr_dirent.  This,
however, only directly allows a name of a length that will fit into that
union.  To support a longer name, the next 1-8 contiguous entries are
annexed to the first one and the name flows across these.

afs_dir_iterate_block() uses strnlen(), limited to the space to the end of
the page, to find out how long the name is.  This worked fine until
6a39e62abbaf.  With that commit, the compiler determines the size of the
array and asserts that the string fits inside that array.  This is a
problem for AFS because we *expect* it to overflow one or more arrays.

A similar problem also occurs in afs_dir_scan_block() when a directory file
is being locally edited to avoid the need to redownload it.  There strlen()
was being used safely because each page has the last byte set to 0 when the
file is downloaded and validated (in afs_dir_check_page()).

Fix this by changing the afs_xdr_dirent union name field to an
indeterminate-length array and dropping the overflow field.

(Note that whilst looking at this, I realised that the calculation of the
number of slots a dirent used is non-standard and not quite right, but I'll
address that in a separate patch.)

The issue can be triggered by something like:

        touch /afs/example.com/thisisaveryveryverylongname

and it generates a report that looks like:

        detected buffer overflow in strnlen
        ------------[ cut here ]------------
        kernel BUG at lib/string.c:1149!
        ...
        RIP: 0010:fortify_panic+0xf/0x11
        ...
        Call Trace:
         afs_dir_iterate_block+0x12b/0x35b
         afs_dir_iterate+0x14e/0x1ce
         afs_do_lookup+0x131/0x417
         afs_lookup+0x24f/0x344
         lookup_open.isra.0+0x1bb/0x27d
         open_last_lookups+0x166/0x237
         path_openat+0xe0/0x159
         do_filp_open+0x48/0xa4
         ? kmem_cache_alloc+0xf5/0x16e
         ? __clear_close_on_exec+0x13/0x22
         ? _raw_spin_unlock+0xa/0xb
         do_sys_openat2+0x72/0xde
         do_sys_open+0x3b/0x58
         do_syscall_64+0x2d/0x3a
         entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: 6a39e62abbaf ("lib: string.h: detect intra-object overflow in fortified string functions")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Marc Dionne <marc.dionne@auristor.com>
cc: Daniel Axtens <dja@axtens.net>

3 years agoLinux 5.11-rc2
Linus Torvalds [Sun, 3 Jan 2021 23:55:30 +0000 (15:55 -0800)]
Linux 5.11-rc2

3 years agoMerge tag 's390-5.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Linus Torvalds [Sat, 2 Jan 2021 20:22:46 +0000 (12:22 -0800)]
Merge tag 's390-5.11-3' of git://git./linux/kernel/git/s390/linux

Pull s390 cleanups from Vasily Gorbik:
 "Update defconfigs and sort config select list"

* tag 's390-5.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/Kconfig: sort config S390 select list once again
  s390: update defconfigs

3 years agoMerge tag 'pm-5.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Linus Torvalds [Sat, 2 Jan 2021 19:53:05 +0000 (11:53 -0800)]
Merge tag 'pm-5.11-rc2' of git://git./linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
 "These fix a crash in intel_pstate during resume from suspend-to-RAM
  that may occur after recent changes and two resource leaks in error
  paths in the operating performance points (OPP) framework, add a new
  C-states table to intel_idle and update the cpuidle MAINTAINERS entry
  to cover the governors too.

  Specifics:

   - Fix recently introduced crash in the intel_pstate driver that
     occurs if scale-invariance is disabled during resume from
     suspend-to-RAM due to inconsistent changes of APERF or MPERF MSR
     values made by the platform firmware (Rafael Wysocki).

   - Fix a memory leak and add a missing clk_put() in error paths in the
     OPP framework (Quanyang Wang, Viresh Kumar).

   - Add new C-states table for SnowRidge processors to the intel_idle
     driver (Artem Bityutskiy).

   - Update the MAINTAINERS entry for cpuidle to make it clear that the
     governors are covered by it too (Lukas Bulwahn)"

* tag 'pm-5.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  intel_idle: add SnowRidge C-state table
  cpufreq: intel_pstate: Fix fast-switch fallback path
  opp: Call the missing clk_put() on error
  opp: fix memory leak in _allocate_opp_table
  MAINTAINERS: include governors into CPU IDLE TIME MANAGEMENT FRAMEWORK

3 years agoMerge branches 'pm-cpufreq' and 'pm-cpuidle'
Rafael J. Wysocki [Sat, 2 Jan 2021 09:16:32 +0000 (10:16 +0100)]
Merge branches 'pm-cpufreq' and 'pm-cpuidle'

* pm-cpufreq:
  cpufreq: intel_pstate: Fix fast-switch fallback path

* pm-cpuidle:
  intel_idle: add SnowRidge C-state table
  MAINTAINERS: include governors into CPU IDLE TIME MANAGEMENT FRAMEWORK

3 years agoMerge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Fri, 1 Jan 2021 20:58:07 +0000 (12:58 -0800)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "This is a load of driver fixes (12 ufs, 1 mpt3sas, 1 cxgbi).

  The big core two fixes are for power management ("block: Do not accept
  any requests while suspended" and "block: Fix a race in the runtime
  power management code") which finally sorts out the resume problems
  we've occasionally been having.

  To make the resume fix, there are seven necessary precursors which
  effectively renames REQ_PREEMPT to REQ_PM, so every "special" request
  in block is automatically a power management exempt one.

  All of the non-PM preempt cases are removed except for the one in the
  SCSI Parallel Interface (spi) domain validation which is a genuine
  case where we have to run requests at high priority to validate the
  bus so this becomes an autopm get/put protected request"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (22 commits)
  scsi: cxgb4i: Fix TLS dependency
  scsi: ufs: Un-inline ufshcd_vops_device_reset function
  scsi: ufs: Re-enable WriteBooster after device reset
  scsi: ufs-mediatek: Use correct path to fix compile error
  scsi: mpt3sas: Signedness bug in _base_get_diag_triggers()
  scsi: block: Do not accept any requests while suspended
  scsi: block: Remove RQF_PREEMPT and BLK_MQ_REQ_PREEMPT
  scsi: core: Only process PM requests if rpm_status != RPM_ACTIVE
  scsi: scsi_transport_spi: Set RQF_PM for domain validation commands
  scsi: ide: Mark power management requests with RQF_PM instead of RQF_PREEMPT
  scsi: ide: Do not set the RQF_PREEMPT flag for sense requests
  scsi: block: Introduce BLK_MQ_REQ_PM
  scsi: block: Fix a race in the runtime power management code
  scsi: ufs-pci: Enable UFSHCD_CAP_RPM_AUTOSUSPEND for Intel controllers
  scsi: ufs-pci: Fix recovery from hibernate exit errors for Intel controllers
  scsi: ufs-pci: Ensure UFS device is in PowerDown mode for suspend-to-disk ->poweroff()
  scsi: ufs-pci: Fix restore from S4 for Intel controllers
  scsi: ufs-mediatek: Keep VCC always-on for specific devices
  scsi: ufs: Allow regulators being always-on
  scsi: ufs: Clear UAC for RPMB after ufshcd resets
  ...

3 years agoMerge tag 'block-5.11-2021-01-01' of git://git.kernel.dk/linux-block
Linus Torvalds [Fri, 1 Jan 2021 20:49:09 +0000 (12:49 -0800)]
Merge tag 'block-5.11-2021-01-01' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
 "Two minor block fixes from this last week that should go into 5.11:

   - Add missing NOWAIT debugfs definition (Andres)

   - Fix kerneldoc warning introduced this merge window (Randy)"

* tag 'block-5.11-2021-01-01' of git://git.kernel.dk/linux-block:
  block: add debugfs stanza for QUEUE_FLAG_NOWAIT
  fs: block_dev.c: fix kernel-doc warnings from struct block_device changes

3 years agoMerge tag 'io_uring-5.11-2021-01-01' of git://git.kernel.dk/linux-block
Linus Torvalds [Fri, 1 Jan 2021 20:29:49 +0000 (12:29 -0800)]
Merge tag 'io_uring-5.11-2021-01-01' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:
 "A few fixes that should go into 5.11, all marked for stable as well:

   - Fix issue around identity COW'ing and users that share a ring
     across processes

   - Fix a hang associated with unregistering fixed files (Pavel)

   - Move the 'process is exiting' cancelation a bit earlier, so
     task_works aren't affected by it (Pavel)"

* tag 'io_uring-5.11-2021-01-01' of git://git.kernel.dk/linux-block:
  kernel/io_uring: cancel io_uring before task works
  io_uring: fix io_sqe_files_unregister() hangs
  io_uring: add a helper for setting a ref node
  io_uring: don't assume mm is constant across submits

3 years agodepmod: handle the case of /sbin/depmod without /sbin in PATH
Linus Torvalds [Mon, 28 Dec 2020 19:40:22 +0000 (11:40 -0800)]
depmod: handle the case of /sbin/depmod without /sbin in PATH

Commit 436e980e2ed5 ("kbuild: don't hardcode depmod path") stopped
hard-coding the path of depmod, but in the process caused trouble for
distributions that had that /sbin location, but didn't have it in the
PATH (generally because /sbin is limited to the super-user path).

Work around it for now by just adding /sbin to the end of PATH in the
depmod.sh script.

Reported-and-tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agokernel/io_uring: cancel io_uring before task works
Pavel Begunkov [Wed, 30 Dec 2020 21:34:16 +0000 (21:34 +0000)]
kernel/io_uring: cancel io_uring before task works

For cancelling io_uring requests it needs either to be able to run
currently enqueued task_works or having it shut down by that moment.
Otherwise io_uring_cancel_files() may be waiting for requests that won't
ever complete.

Go with the first way and do cancellations before setting PF_EXITING and
so before putting the task_work infrastructure into a transition state
where task_work_run() would better not be called.

Cc: stable@vger.kernel.org # 5.5+
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: fix io_sqe_files_unregister() hangs
Pavel Begunkov [Wed, 30 Dec 2020 21:34:15 +0000 (21:34 +0000)]
io_uring: fix io_sqe_files_unregister() hangs

io_sqe_files_unregister() uninterruptibly waits for enqueued ref nodes,
however requests keeping them may never complete, e.g. because of some
userspace dependency. Make sure it's interruptible otherwise it would
hang forever.

Cc: stable@vger.kernel.org # 5.6+
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: add a helper for setting a ref node
Pavel Begunkov [Wed, 30 Dec 2020 21:34:14 +0000 (21:34 +0000)]
io_uring: add a helper for setting a ref node

Setting a new reference node to a file data is not trivial, don't repeat
it, add and use a helper.

Cc: stable@vger.kernel.org # 5.6+
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoMerge tag 'ceph-for-5.11-rc2' of git://github.com/ceph/ceph-client
Linus Torvalds [Wed, 30 Dec 2020 20:02:12 +0000 (12:02 -0800)]
Merge tag 'ceph-for-5.11-rc2' of git://github.com/ceph/ceph-client

Pull ceph fixes from Ilya Dryomov:
 "A fix for an edge case in MClientRequest encoding and a couple of
  trivial fixups for the new msgr2 support"

* tag 'ceph-for-5.11-rc2' of git://github.com/ceph/ceph-client:
  libceph: add __maybe_unused to DEFINE_MSGR2_FEATURE
  libceph: align session_key and con_secret to 16 bytes
  libceph: fix auth_signature buffer allocation in secure mode
  ceph: reencode gid_list when reconnecting

3 years agointel_idle: add SnowRidge C-state table
Artem Bityutskiy [Sun, 27 Dec 2020 10:11:16 +0000 (12:11 +0200)]
intel_idle: add SnowRidge C-state table

Add C-state table for the SnowRidge SoC which is found on Intel Jacobsville
platforms.

The following has been changed.

 1. C1E latency changed from 10us to 15us. It was measured using the
    open source "wult" tool (the "nic" method, 15us is the 99.99th
    percentile).

 2. C1E power break even changed from 20us to 25us, which may result
    in less C1E residency in some workloads.

 3. C6 latency changed from 50us to 130us. Measured the same way as C1E.

The C6 C-state is supported only by some SnowRidge revisions, so add a C-state
table commentary about this.

On SnowRidge, C6 support is enumerated via the usual mechanism: "mwait" leaf of
the "cpuid" instruction. The 'intel_idle' driver does check this leaf, so even
though C6 is present in the table, the driver will only use it if the CPU does
support it.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
3 years agocpufreq: intel_pstate: Fix fast-switch fallback path
Rafael J. Wysocki [Tue, 29 Dec 2020 17:08:18 +0000 (18:08 +0100)]
cpufreq: intel_pstate: Fix fast-switch fallback path

When sugov_update_single_perf() falls back to the "frequency"
path due to the missing scale-invariance, it will call
cpufreq_driver_fast_switch() via sugov_fast_switch()
and the driver's ->fast_switch() callback will be invoked,
so it must not be NULL.

However, after commit a365ab6b9dfb ("cpufreq: intel_pstate: Implement
the ->adjust_perf() callback") intel_pstate sets ->fast_switch() to
NULL when it is going to use intel_cpufreq_adjust_perf(), which is a
mistake, because on x86 the scale-invariance may be turned off
dynamically, so modify it to retain the original ->adjust_perf()
callback pointer.

Fixes: a365ab6b9dfb ("cpufreq: intel_pstate: Implement the ->adjust_perf() callback")
Reported-by: Kenneth R. Crudup <kenny@panix.com>
Tested-by: Kenneth R. Crudup <kenny@panix.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
3 years agoMerge branch 'opp/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm
Rafael J. Wysocki [Wed, 30 Dec 2020 17:19:34 +0000 (18:19 +0100)]
Merge branch 'opp/linux-next' of git://git./linux/kernel/git/vireshk/pm

Pull operating performance points (OPP) framework fixes for 5.11-rc2
from Viresh Kumar:

"This contains two patches to fix freeing of resources in error paths."

* 'opp/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
  opp: Call the missing clk_put() on error
  opp: fix memory leak in _allocate_opp_table

3 years agos390/Kconfig: sort config S390 select list once again
Heiko Carstens [Mon, 28 Dec 2020 08:51:57 +0000 (09:51 +0100)]
s390/Kconfig: sort config S390 select list once again

...and add comments at the top and bottom.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
3 years agos390: update defconfigs
Heiko Carstens [Wed, 18 Nov 2020 20:23:33 +0000 (21:23 +0100)]
s390: update defconfigs

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
3 years agoblock: add debugfs stanza for QUEUE_FLAG_NOWAIT
Andres Freund [Mon, 28 Dec 2020 19:27:18 +0000 (11:27 -0800)]
block: add debugfs stanza for QUEUE_FLAG_NOWAIT

This was missed in 021a24460dc2. Leads to the numeric value of
QUEUE_FLAG_NOWAIT (i.e. 29) showing up in
/sys/kernel/debug/block/*/state.

Fixes: 021a24460dc28e7412aecfae89f60e1847e685c0
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andres Freund <andres@anarazel.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agofs: block_dev.c: fix kernel-doc warnings from struct block_device changes
Randy Dunlap [Tue, 29 Dec 2020 03:47:06 +0000 (19:47 -0800)]
fs: block_dev.c: fix kernel-doc warnings from struct block_device changes

Fix new kernel-doc warnings in fs/block_dev.c:

../fs/block_dev.c:1066: warning: Excess function parameter 'whole' description in 'bd_abort_claiming'
../fs/block_dev.c:1837: warning: Function parameter or member 'dev' not described in 'lookup_bdev'

Fixes: 4e7b5671c6a8 ("block: remove i_bdev")
Fixes: 37c3fc9abb25 ("block: simplify the block device claiming interface")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: linux-fsdevel@vger.kernel.org
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoMerge branch 'akpm' (patches from Andrew)
Linus Torvalds [Tue, 29 Dec 2020 23:45:49 +0000 (15:45 -0800)]
Merge branch 'akpm' (patches from Andrew)

Merge misc fixes from Andrew Morton:
 "16 patches

  Subsystems affected by this patch series: mm (selftests, hugetlb,
  pagecache, mremap, kasan, and slub), kbuild, checkpatch, misc, and
  lib"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm: slub: call account_slab_page() after slab page initialization
  zlib: move EXPORT_SYMBOL() and MODULE_LICENSE() out of dfltcc_syms.c
  lib/zlib: fix inflating zlib streams on s390
  lib/genalloc: fix the overflow when size is too big
  kdev_t: always inline major/minor helper functions
  sizes.h: add SZ_8G/SZ_16G/SZ_32G macros
  local64.h: make <asm/local64.h> mandatory
  kasan: fix null pointer dereference in kasan_record_aux_stack
  mm: generalise COW SMC TLB flushing race comment
  mm/mremap.c: fix extent calculation
  mm: memmap defer init doesn't work as expected
  mm: add prototype for __add_to_page_cache_locked()
  checkpatch: prefer strscpy to strlcpy
  Revert "kbuild: avoid static_assert for genksyms"
  mm/hugetlb: fix deadlock in hugetlb_cow error path
  selftests/vm: fix building protection keys test

3 years agomm: slub: call account_slab_page() after slab page initialization
Roman Gushchin [Tue, 29 Dec 2020 23:15:07 +0000 (15:15 -0800)]
mm: slub: call account_slab_page() after slab page initialization

It's convenient to have page->objects initialized before calling into
account_slab_page().  In particular, this information can be used to
pre-alloc the obj_cgroup vector.

Let's call account_slab_page() a bit later, after the initialization of
page->objects.

This commit doesn't bring any functional change, but is required for
further optimizations.

[akpm@linux-foundation.org: undo changes needed by forthcoming mm-memcg-slab-pre-allocate-obj_cgroups-for-slab-caches-with-slab_account.patch]

Link: https://lkml.kernel.org/r/20201110195753.530157-1-guro@fb.com
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agozlib: move EXPORT_SYMBOL() and MODULE_LICENSE() out of dfltcc_syms.c
Randy Dunlap [Tue, 29 Dec 2020 23:15:04 +0000 (15:15 -0800)]
zlib: move EXPORT_SYMBOL() and MODULE_LICENSE() out of dfltcc_syms.c

In commit 11fb479ff5d9 ("zlib: export S390 symbols for zlib modules"), I
added EXPORT_SYMBOL()s to dfltcc_inflate.c but then Mikhail said that
these should probably be in dfltcc_syms.c with the other
EXPORT_SYMBOL()s.

However, that is contrary to the current kernel style, which places
EXPORT_SYMBOL() immediately after the function that it applies to, so
move all EXPORT_SYMBOL()s to their respective function locations and
drop the dfltcc_syms.c file.  Also move MODULE_LICENSE() from the
deleted file to dfltcc.c.

[rdunlap@infradead.org: remove dfltcc_syms.o from Makefile]
Link: https://lkml.kernel.org/r/20201227171837.15492-1-rdunlap@infradead.org
Link: https://lkml.kernel.org/r/20201219052530.28461-1-rdunlap@infradead.org
Fixes: 11fb479ff5d9 ("zlib: export S390 symbols for zlib modules")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Zaslonko Mikhail <zaslonko@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agolib/zlib: fix inflating zlib streams on s390
Ilya Leoshkevich [Tue, 29 Dec 2020 23:15:01 +0000 (15:15 -0800)]
lib/zlib: fix inflating zlib streams on s390

Decompressing zlib streams on s390 fails with "incorrect data check"
error.

Userspace zlib checks inflate_state.flags in order to byteswap checksums
only for zlib streams, and s390 hardware inflate code, which was ported
from there, tries to match this behavior.  At the same time, kernel zlib
does not use inflate_state.flags, so it contains essentially random
values.  For many use cases either zlib stream is zeroed out or checksum
is not used, so this problem is masked, but at least SquashFS is still
affected.

Fix by always passing a checksum to and from the hardware as is, which
matches zlib_inflate()'s expectations.

Link: https://lkml.kernel.org/r/20201215155551.894884-1-iii@linux.ibm.com
Fixes: 126196100063 ("lib/zlib: add s390 hardware support for kernel zlib_inflate")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Cc: <stable@vger.kernel.org> [5.6+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agolib/genalloc: fix the overflow when size is too big
Huang Shijie [Tue, 29 Dec 2020 23:14:58 +0000 (15:14 -0800)]
lib/genalloc: fix the overflow when size is too big

Some graphic card has very big memory on chip, such as 32G bytes.

In the following case, it will cause overflow:

    pool = gen_pool_create(PAGE_SHIFT, NUMA_NO_NODE);
    ret = gen_pool_add(pool, 0x1000000, SZ_32G, NUMA_NO_NODE);

    va = gen_pool_alloc(pool, SZ_4G);

The overflow occurs in gen_pool_alloc_algo_owner():

....
size = nbits << order;
....

The @nbits is "int" type, so it will overflow.
Then the gen_pool_avail() will return the wrong value.

This patch converts some "int" to "unsigned long", and
changes the compare code in while.

Link: https://lkml.kernel.org/r/20201229060657.3389-1-sjhuang@iluvatar.ai
Signed-off-by: Huang Shijie <sjhuang@iluvatar.ai>
Reported-by: Shi Jiasheng <jiasheng.shi@iluvatar.ai>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agokdev_t: always inline major/minor helper functions
Josh Poimboeuf [Tue, 29 Dec 2020 23:14:55 +0000 (15:14 -0800)]
kdev_t: always inline major/minor helper functions

Silly GCC doesn't always inline these trivial functions.

Fixes the following warning:

  arch/x86/kernel/sys_ia32.o: warning: objtool: cp_stat64()+0xd8: call to new_encode_dev() with UACCESS enabled

Link: https://lkml.kernel.org/r/984353b44a4484d86ba9f73884b7306232e25e30.1608737428.git.jpoimboe@redhat.com
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org> [build-tested]
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agosizes.h: add SZ_8G/SZ_16G/SZ_32G macros
Huang Shijie [Tue, 29 Dec 2020 23:14:52 +0000 (15:14 -0800)]
sizes.h: add SZ_8G/SZ_16G/SZ_32G macros

Add these macros, since we can use them in drivers.

Link: https://lkml.kernel.org/r/20201229072819.11183-1-sjhuang@iluvatar.ai
Signed-off-by: Huang Shijie <sjhuang@iluvatar.ai>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agolocal64.h: make <asm/local64.h> mandatory
Randy Dunlap [Tue, 29 Dec 2020 23:14:49 +0000 (15:14 -0800)]
local64.h: make <asm/local64.h> mandatory

Make <asm-generic/local64.h> mandatory in include/asm-generic/Kbuild and
remove all arch/*/include/asm/local64.h arch-specific files since they
only #include <asm-generic/local64.h>.

This fixes build errors on arch/c6x/ and arch/nios2/ for
block/blk-iocost.c.

Build-tested on 21 of 25 arch-es.  (tools problems on the others)

Yes, we could even rename <asm-generic/local64.h> to
<linux/local64.h> and change all #includes to use
<linux/local64.h> instead.

Link: https://lkml.kernel.org/r/20201227024446.17018-1-rdunlap@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Masahiro Yamada <masahiroy@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Aurelien Jacquiot <jacquiot.aurelien@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agokasan: fix null pointer dereference in kasan_record_aux_stack
Walter Wu [Tue, 29 Dec 2020 23:14:46 +0000 (15:14 -0800)]
kasan: fix null pointer dereference in kasan_record_aux_stack

Syzbot reported the following [1]:

  BUG: kernel NULL pointer dereference, address: 0000000000000008
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 2d993067 P4D 2d993067 PUD 19a3c067 PMD 0
  Oops: 0000 [#1] PREEMPT SMP KASAN
  CPU: 1 PID: 3852 Comm: kworker/1:2 Not tainted 5.10.0-syzkaller #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  Workqueue: events free_ipc
  RIP: 0010:kasan_record_aux_stack+0x77/0xb0

Add null checking slab object from kasan_get_alloc_meta() in order to
avoid null pointer dereference.

[1] https://syzkaller.appspot.com/x/log.txt?x=10a82a50d00000

Link: https://lkml.kernel.org/r/20201228080018.23041-1-walter-zh.wu@mediatek.com
Signed-off-by: Walter Wu <walter-zh.wu@mediatek.com>
Suggested-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Alexander Potapenko <glider@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agomm: generalise COW SMC TLB flushing race comment
Nicholas Piggin [Tue, 29 Dec 2020 23:14:43 +0000 (15:14 -0800)]
mm: generalise COW SMC TLB flushing race comment

I'm not sure if I'm completely missing something here, but AFAIKS the
reference to the mysterious "COW SMC race" confuses the issue.  The
original changelog and mailing list thread didn't help me either.

This SMC race is where the problem was detected, but isn't the general
problem bigger and more obvious: that the new PTE could be picked up at
any time by any TLB while entries for the old PTE exist in other TLBs
before the TLB flush takes effect?

The case where the iTLB and dTLB of a CPU are pointing at different pages
is an interesting one but follows from the general problem.

The other (minor) thing with the comment I think it makes it a bit clearer
to say what the old code was doing (i.e., it avoids the race as opposed to
what?).

References: 4ce072f1faf29 ("mm: fix a race condition under SMC + COW")
Link: https://lkml.kernel.org/r/20201215121119.351650-1-npiggin@gmail.com
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agomm/mremap.c: fix extent calculation
Kalesh Singh [Tue, 29 Dec 2020 23:14:40 +0000 (15:14 -0800)]
mm/mremap.c: fix extent calculation

When `next < old_addr`, `next - old_addr` arithmetic underflows causing
`extent` to be incorrect.

Make `extent` the smaller of `next - old_addr` or `old_end - old_addr`.

Link: https://lkml.kernel.org/r/20201219170433.2418867-1-kaleshsingh@google.com
Fixes: c49dd34018026 ("mm: speedup mremap on 1GB or larger regions")
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agomm: memmap defer init doesn't work as expected
Baoquan He [Tue, 29 Dec 2020 23:14:37 +0000 (15:14 -0800)]
mm: memmap defer init doesn't work as expected

VMware observed a performance regression during memmap init on their
platform, and bisected to commit 73a6e474cb376 ("mm: memmap_init:
iterate over memblock regions rather that check each PFN") causing it.

Before the commit:

  [0.033176] Normal zone: 1445888 pages used for memmap
  [0.033176] Normal zone: 89391104 pages, LIFO batch:63
  [0.035851] ACPI: PM-Timer IO Port: 0x448

With commit

  [0.026874] Normal zone: 1445888 pages used for memmap
  [0.026875] Normal zone: 89391104 pages, LIFO batch:63
  [2.028450] ACPI: PM-Timer IO Port: 0x448

The root cause is the current memmap defer init doesn't work as expected.

Before, memmap_init_zone() was used to do memmap init of one whole zone,
to initialize all low zones of one numa node, but defer memmap init of
the last zone in that numa node.  However, since commit 73a6e474cb376,
function memmap_init() is adapted to iterater over memblock regions
inside one zone, then call memmap_init_zone() to do memmap init for each
region.

E.g, on VMware's system, the memory layout is as below, there are two
memory regions in node 2.  The current code will mistakenly initialize the
whole 1st region [mem 0xab00000000-0xfcffffffff], then do memmap defer to
iniatialize only one memmory section on the 2nd region [mem
0x10000000000-0x1033fffffff].  In fact, we only expect to see that there's
only one memory section's memmap initialized.  That's why more time is
costed at the time.

[    0.008842] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x0009ffff]
[    0.008842] ACPI: SRAT: Node 0 PXM 0 [mem 0x00100000-0xbfffffff]
[    0.008843] ACPI: SRAT: Node 0 PXM 0 [mem 0x100000000-0x55ffffffff]
[    0.008844] ACPI: SRAT: Node 1 PXM 1 [mem 0x5600000000-0xaaffffffff]
[    0.008844] ACPI: SRAT: Node 2 PXM 2 [mem 0xab00000000-0xfcffffffff]
[    0.008845] ACPI: SRAT: Node 2 PXM 2 [mem 0x10000000000-0x1033fffffff]

Now, let's add a parameter 'zone_end_pfn' to memmap_init_zone() to pass
down the real zone end pfn so that defer_init() can use it to judge
whether defer need be taken in zone wide.

Link: https://lkml.kernel.org/r/20201223080811.16211-1-bhe@redhat.com
Link: https://lkml.kernel.org/r/20201223080811.16211-2-bhe@redhat.com
Fixes: commit 73a6e474cb376 ("mm: memmap_init: iterate over memblock regions rather that check each PFN")
Signed-off-by: Baoquan He <bhe@redhat.com>
Reported-by: Rahul Gopakumar <gopakumarr@vmware.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agomm: add prototype for __add_to_page_cache_locked()
Souptick Joarder [Tue, 29 Dec 2020 23:14:34 +0000 (15:14 -0800)]
mm: add prototype for __add_to_page_cache_locked()

Otherwise it causes a gcc warning:

  mm/filemap.c:830:14: warning: no previous prototype for `__add_to_page_cache_locked' [-Wmissing-prototypes]

A previous attempt to make this function static led to compilation
errors when CONFIG_DEBUG_INFO_BTF is enabled because
__add_to_page_cache_locked() is referred to by BPF code.

Adding a prototype will silence the warning.

Link: https://lkml.kernel.org/r/1608693702-4665-1-git-send-email-jrdr.linux@gmail.com
Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agocheckpatch: prefer strscpy to strlcpy
Joe Perches [Tue, 29 Dec 2020 23:14:31 +0000 (15:14 -0800)]
checkpatch: prefer strscpy to strlcpy

Prefer strscpy over the deprecated strlcpy function.

Link: https://lkml.kernel.org/r/19fe91084890e2c16fe56f960de6c570a93fa99b.camel@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Requested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoRevert "kbuild: avoid static_assert for genksyms"
Masahiro Yamada [Tue, 29 Dec 2020 23:14:28 +0000 (15:14 -0800)]
Revert "kbuild: avoid static_assert for genksyms"

This reverts commit 14dc3983b5dff513a90bd5a8cc90acaf7867c3d0.

Macro Elver had sent a fix proper fix earlier, and also pointed out
corner cases:
 "I guess what you propose is simpler, but might still have corner cases
  where we still get warnings. In particular, if some file (for whatever
  reason) does not include build_bug.h and uses a raw _Static_assert(),
  then we still get warnings. E.g. I see 1 user of raw _Static_assert()
  (drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h )."

I believe the raw use of _Static_assert() should be allowed, so this
should be fixed in genksyms.

Even after commit 14dc3983b5df ("kbuild: avoid static_assert for
genksyms"), I confirmed the following test code emits the warning.

  ---------------->8----------------
  #include <linux/export.h>

  _Static_assert((1 ?: 0), "");

  void foo(void) { }
  EXPORT_SYMBOL(foo);
  ---------------->8----------------

  WARNING: modpost: EXPORT symbol "foo" [vmlinux] version generation failed, symbol will not be versioned.

Now that commit 869b91992bce ("genksyms: Ignore module scoped
_Static_assert()") fixed this issue properly, the workaround should
be reverted.

Link: https://lkml.org/lkml/2020/12/10/845
Link: https://lkml.kernel.org/r/20201219183911.181442-1-masahiroy@kernel.org
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Cc: Marco Elver <elver@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agomm/hugetlb: fix deadlock in hugetlb_cow error path
Mike Kravetz [Tue, 29 Dec 2020 23:14:25 +0000 (15:14 -0800)]
mm/hugetlb: fix deadlock in hugetlb_cow error path

syzbot reported the deadlock here [1].  The issue is in hugetlb cow
error handling when there are not enough huge pages for the faulting
task which took the original reservation.  It is possible that other
(child) tasks could have consumed pages associated with the reservation.
In this case, we want the task which took the original reservation to
succeed.  So, we unmap any associated pages in children so that they can
be used by the faulting task that owns the reservation.

The unmapping code needs to hold i_mmap_rwsem in write mode.  However,
due to commit c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd
sharing synchronization") we are already holding i_mmap_rwsem in read
mode when hugetlb_cow is called.

Technically, i_mmap_rwsem does not need to be held in read mode for COW
mappings as they can not share pmd's.  Modifying the fault code to not
take i_mmap_rwsem in read mode for COW (and other non-sharable) mappings
is too involved for a stable fix.

Instead, we simply drop the hugetlb_fault_mutex and i_mmap_rwsem before
unmapping.  This is OK as it is technically not needed.  They are
reacquired after unmapping as expected by calling code.  Since this is
done in an uncommon error path, the overhead of dropping and reacquiring
mutexes is acceptable.

While making changes, remove redundant BUG_ON after unmap_ref_private.

[1] https://lkml.kernel.org/r/000000000000b73ccc05b5cf8558@google.com

Link: https://lkml.kernel.org/r/4c5781b8-3b00-761e-c0c7-c5edebb6ec1a@oracle.com
Fixes: c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: syzbot+5eee4145df3c15e96625@syzkaller.appspotmail.com
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoselftests/vm: fix building protection keys test
Harish [Tue, 29 Dec 2020 23:14:22 +0000 (15:14 -0800)]
selftests/vm: fix building protection keys test

Commit d8cbe8bfa7d ("tools/testing/selftests/vm: fix build error") tried
to include a ARCH check for powerpc, however ARCH is not defined in the
Makefile before including lib.mk.  This makes test building to skip on
both x86 and powerpc.

Fix the arch check by replacing it using machine type as it is already
defined and used in the test.

Link: https://lkml.kernel.org/r/20201215100402.257376-1-harish@linux.ibm.com
Fixes: d8cbe8bfa7d ("tools/testing/selftests/vm: fix build error")
Signed-off-by: Harish <harish@linux.ibm.com>
Reviewed-by: Sandipan Das <sandipan@linux.ibm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Sandipan Das <sandipan@linux.ibm.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoio_uring: don't assume mm is constant across submits
Jens Axboe [Tue, 29 Dec 2020 17:50:46 +0000 (10:50 -0700)]
io_uring: don't assume mm is constant across submits

If we COW the identity, we assume that ->mm never changes. But this
isn't true of multiple processes end up sharing the ring. Hence treat
id->mm like like any other process compontent when it comes to the
identity mapping. This is pretty trivial, just moving the existing grab
into io_grab_identity(), and including a check for the match.

Cc: stable@vger.kernel.org # 5.10
Fixes: 1e6fa5216a0e ("io_uring: COW io_identity on mismatch")
Reported-by: Christian Brauner <christian.brauner@ubuntu.com>:
Tested-by: Christian Brauner <christian.brauner@ubuntu.com>:
Signed-off-by: Jens Axboe <axboe@kernel.dk>