euid:= decimal value
fowner:= decimal value
lsm: are LSM specific
- option: appraise_type:= [imasig]
+ option: appraise_type:= [imasig] [imasig|modsig]
template:= name of a defined IMA template type
(eg, ima-ng). Only valid when action is "measure".
pcr:= decimal value
measure func=KEXEC_KERNEL_CHECK pcr=4
measure func=KEXEC_INITRAMFS_CHECK pcr=5
+
+ Example of appraise rule allowing modsig appended signatures:
+
+ appraise func=KEXEC_KERNEL_CHECK appraise_type=imasig|modsig
It is a read/write file. When read, the currently assigned
pretimeout governor is returned. When written, it sets
the pretimeout governor.
+
+What: /sys/class/watchdog/watchdog1/access_cs0
+Date: August 2019
+Contact: Ivan Mikhaylov <i.mikhaylov@yadro.com>,
+ Alexander Amelkin <a.amelkin@yadro.com>
+Description:
+ It is a read/write file. This attribute exists only if the
+ system has booted from the alternate flash chip due to
+ expiration of a watchdog timer of AST2400/AST2500 when
+ alternate boot function was enabled with 'aspeed,alt-boot'
+ devicetree option for that watchdog or with an appropriate
+ h/w strapping (for WDT2 only).
+
+ At alternate flash the 'access_cs0' sysfs node provides:
+ ast2400: a way to get access to the primary SPI flash
+ chip at CS0 after booting from the alternate
+ chip at CS1.
+ ast2500: a way to restore the normal address mapping
+ from (CS0->CS1, CS1->CS0) to (CS0->CS0,
+ CS1->CS1).
+
+ Clearing the boot code selection and timeout counter also
+ resets to the initial state the chip select line mapping. When
+ the SoC is in normal mapping state (i.e. booted from CS0),
+ clearing those bits does nothing for both versions of the SoC.
+ For alternate boot mode (booted from CS1 due to wdt2
+ expiration) the behavior differs as described above.
+
+ This option can be used with wdt2 (watchdog1) only.
+
+ When read, the current status of the boot code selection is
+ shown. When written with any non-zero value, it clears
+ the boot code selection and the timeout counter, which results
+ in chipselect reset for AST2400/AST2500.
lockd.nlm_udpport=M [NFS] Assign UDP port.
Format: <integer>
+ lockdown= [SECURITY]
+ { integrity | confidentiality }
+ Enable the kernel lockdown feature. If set to
+ integrity, kernel features that allow userland to
+ modify the running kernel are disabled. If set to
+ confidentiality, kernel features that allow userland
+ to extract confidential information from the kernel
+ are also disabled.
+
locktorture.nreaders_stress= [KNL]
Set the number of locking read-acquisition kthreads.
Defaults to being automatically set based on the
<&pd IMX_SC_R_DSP_RAM>;
mbox-names = "txdb0", "txdb1", "rxdb0", "rxdb1";
mboxes = <&lsio_mu13 2 0>, <&lsio_mu13 2 1>, <&lsio_mu13 3 0>, <&lsio_mu13 3 1>;
+ memory-region = <&dsp_reserved>;
};
dvdd-supply:
description: DVdd voltage supply
- items:
- - const: dvdd
avdd-supply:
description: AVdd voltage supply
- items:
- - const: avdd
adi,rejection-60-Hz-enable:
description: |
examples:
- |
spi0 {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
adc@0 {
compatible = "adi,ad7192";
reg = <0>;
- rc-genius-tvgo-a11mce
- rc-gotview7135
- rc-hauppauge
- - rc-hauppauge
- rc-hisi-poplar
- rc-hisi-tv-demo
- rc-imon-mce
enum: [ 4, 8, 12, 16, 20, 24 ]
default: 8
- adi,disable-energy-detect:
- description: |
- Disables Energy Detect Powerdown Mode (default disabled, i.e energy detect
- is enabled if this property is unspecified)
- type: boolean
-
examples:
- |
ethernet {
reg = <1>;
adi,fifo-depth-bits = <16>;
- adi,disable-energy-detect;
};
};
KSZ9021:
All skew control options are specified in picoseconds. The minimum
- value is 0, the maximum value is 3000, and it is incremented by 200ps
- steps.
+ value is 0, the maximum value is 3000, and it can be specified in 200ps
+ steps, *but* these values are in not fact what you get because this chip's
+ skew values actually increase in 120ps steps, starting from -840ps. The
+ incorrect values came from an error in the original KSZ9021 datasheet
+ before it was corrected in revision 1.2 (Feb 2014), but it is too late to
+ change the driver now because of the many existing device trees that have
+ been created using values that go up in increments of 200.
+
+ The following table shows the actual skew delay you will get for each of the
+ possible devicetree values, and the number that will be programmed into the
+ corresponding pad skew register:
+
+ Device Tree Value Delay Pad Skew Register Value
+ -----------------------------------------------------
+ 0 -840ps 0000
+ 200 -720ps 0001
+ 400 -600ps 0010
+ 600 -480ps 0011
+ 800 -360ps 0100
+ 1000 -240ps 0101
+ 1200 -120ps 0110
+ 1400 0ps 0111
+ 1600 120ps 1000
+ 1800 240ps 1001
+ 2000 360ps 1010
+ 2200 480ps 1011
+ 2400 600ps 1100
+ 2600 720ps 1101
+ 2800 840ps 1110
+ 3000 960ps 1111
Optional properties:
R-Car Gen2 and RZ/G1 devices.
- "renesas,etheravb-r8a774a1" for the R8A774A1 SoC.
+ - "renesas,etheravb-r8a774b1" for the R8A774B1 SoC.
- "renesas,etheravb-r8a774c0" for the R8A774C0 SoC.
- "renesas,etheravb-r8a7795" for the R8A7795 SoC.
- "renesas,etheravb-r8a7796" for the R8A7796 SoC.
const: stmmaceth
mac-mode:
- maxItems: 1
+ $ref: ethernet-controller.yaml#/properties/phy-connection-type
description:
The property is identical to 'phy-mode', and assumes that there is mode
converter in-between the MAC & PHY (e.g. GMII-to-RGMII). This converter
- description: exclusive PHY reset line
- description: shared reset line between the PCIe PHY and PCIe controller
- resets-names:
+ reset-names:
items:
- const: phy
- const: pcie
- "mediatek,mt7622-pwm": found on mt7622 SoC.
- "mediatek,mt7623-pwm": found on mt7623 SoC.
- "mediatek,mt7628-pwm": found on mt7628 SoC.
+ - "mediatek,mt7629-pwm", "mediatek,mt7622-pwm": found on mt7629 SoC.
+ - "mediatek,mt8516-pwm": found on mt8516 SoC.
- reg: physical base address and length of the controller's registers.
- #pwm-cells: must be 2. See pwm.txt in this directory for a description of
the cell format.
--- /dev/null
+Spreadtrum PWM controller
+
+Spreadtrum SoCs PWM controller provides 4 PWM channels.
+
+Required properties:
+- compatible : Should be "sprd,ums512-pwm".
+- reg: Physical base address and length of the controller's registers.
+- clocks: The phandle and specifier referencing the controller's clocks.
+- clock-names: Should contain following entries:
+ "pwmn": used to derive the functional clock for PWM channel n (n range: 0 ~ 3).
+ "enablen": for PWM channel n enable clock (n range: 0 ~ 3).
+- #pwm-cells: Should be 2. See pwm.txt in this directory for a description of
+ the cells format.
+
+Optional properties:
+- assigned-clocks: Reference to the PWM clock entries.
+- assigned-clock-parents: The phandle of the parent clock of PWM clock.
+
+Example:
+ pwms: pwm@32260000 {
+ compatible = "sprd,ums512-pwm";
+ reg = <0 0x32260000 0 0x10000>;
+ clock-names = "pwm0", "enable0",
+ "pwm1", "enable1",
+ "pwm2", "enable2",
+ "pwm3", "enable3";
+ clocks = <&aon_clk CLK_PWM0>, <&aonapb_gate CLK_PWM0_EB>,
+ <&aon_clk CLK_PWM1>, <&aonapb_gate CLK_PWM1_EB>,
+ <&aon_clk CLK_PWM2>, <&aonapb_gate CLK_PWM2_EB>,
+ <&aon_clk CLK_PWM3>, <&aonapb_gate CLK_PWM3_EB>;
+ assigned-clocks = <&aon_clk CLK_PWM0>,
+ <&aon_clk CLK_PWM1>,
+ <&aon_clk CLK_PWM2>,
+ <&aon_clk CLK_PWM3>;
+ assigned-clock-parents = <&ext_26m>,
+ <&ext_26m>,
+ <&ext_26m>,
+ <&ext_26m>;
+ #pwm-cells = <2>;
+ };
Optional property:
- little-endian : If present, the TMU registers are little endian. If absent,
the default is big endian.
+- clocks : the clock for clocking the TMU silicon.
Example:
--- /dev/null
+# SPDX-License-Identifier: GPL-2.0
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/watchdog/allwinner,sun4i-a10-wdt.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Allwinner A10 Watchdog Device Tree Bindings
+
+allOf:
+ - $ref: "watchdog.yaml#"
+
+maintainers:
+ - Chen-Yu Tsai <wens@csie.org>
+ - Maxime Ripard <maxime.ripard@bootlin.com>
+
+properties:
+ compatible:
+ oneOf:
+ - const: allwinner,sun4i-a10-wdt
+ - const: allwinner,sun6i-a31-wdt
+ - items:
+ - const: allwinner,sun50i-a64-wdt
+ - const: allwinner,sun6i-a31-wdt
+ - items:
+ - const: allwinner,sun50i-h6-wdt
+ - const: allwinner,sun6i-a31-wdt
+ - items:
+ - const: allwinner,suniv-f1c100s-wdt
+ - const: allwinner,sun4i-a10-wdt
+
+ reg:
+ maxItems: 1
+
+ clocks:
+ maxItems: 1
+
+ interrupts:
+ maxItems: 1
+
+required:
+ - compatible
+ - reg
+ - clocks
+ - interrupts
+
+unevaluatedProperties: false
+
+examples:
+ - |
+ wdt: watchdog@1c20c90 {
+ compatible = "allwinner,sun4i-a10-wdt";
+ reg = <0x01c20c90 0x10>;
+ interrupts = <24>;
+ clocks = <&osc24M>;
+ timeout-sec = <10>;
+ };
+
+...
- compatible: must be one of:
- "aspeed,ast2400-wdt"
- "aspeed,ast2500-wdt"
+ - "aspeed,ast2600-wdt"
- reg: physical base address of the controller and length of memory mapped
region
--- /dev/null
+* Freescale i.MX7ULP Watchdog Timer (WDT) Controller
+
+Required properties:
+- compatible : Should be "fsl,imx7ulp-wdt"
+- reg : Should contain WDT registers location and length
+- interrupts : Should contain WDT interrupt
+- clocks: Should contain a phandle pointing to the gated peripheral clock.
+
+Optional properties:
+- timeout-sec : Contains the watchdog timeout in seconds
+
+Examples:
+
+wdog1: watchdog@403d0000 {
+ compatible = "fsl,imx7ulp-wdt";
+ reg = <0x403d0000 0x10000>;
+ interrupts = <GIC_SPI 55 IRQ_TYPE_LEVEL_HIGH>;
+ clocks = <&pcc2 IMX7ULP_CLK_WDG1>;
+ assigned-clocks = <&pcc2 IMX7ULP_CLK_WDG1>;
+ assigned-clocks-parents = <&scg1 IMX7ULP_CLK_FIRC_BUS_CLK>;
+ timeout-sec = <40>;
+};
+++ /dev/null
-Allwinner SoCs Watchdog timer
-
-Required properties:
-
-- compatible : should be one of
- "allwinner,sun4i-a10-wdt"
- "allwinner,sun6i-a31-wdt"
- "allwinner,sun50i-a64-wdt","allwinner,sun6i-a31-wdt"
- "allwinner,sun50i-h6-wdt","allwinner,sun6i-a31-wdt"
- "allwinner,suniv-f1c100s-wdt", "allwinner,sun4i-a10-wdt"
-- reg : Specifies base physical address and size of the registers.
-
-Optional properties:
-- timeout-sec : Contains the watchdog timeout in seconds
-
-Example:
-
-wdt: watchdog@1c20c90 {
- compatible = "allwinner,sun4i-a10-wdt";
- reg = <0x01c20c90 0x10>;
- timeout-sec = <10>;
-};
--- /dev/null
+# SPDX-License-Identifier: GPL-2.0
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/watchdog/watchdog.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Watchdog Generic Bindings
+
+maintainers:
+ - Guenter Roeck <linux@roeck-us.net>
+ - Wim Van Sebroeck <wim@linux-watchdog.org>
+
+description: |
+ This document describes generic bindings which can be used to
+ describe watchdog devices in a device tree.
+
+properties:
+ $nodename:
+ pattern: "^watchdog(@.*|-[0-9a-f])?$"
+
+ timeout-sec:
+ $ref: /schemas/types.yaml#/definitions/uint32
+ description:
+ Contains the watchdog timeout in seconds.
+
+...
journalling
fscrypt
fsverity
+
+Filesystems
+===========
+
+Documentation for filesystem implementations.
+
+.. toctree::
+ :maxdepth: 2
+
+ virtiofs
--- /dev/null
+.. SPDX-License-Identifier: GPL-2.0
+
+===================================================
+virtiofs: virtio-fs host<->guest shared file system
+===================================================
+
+- Copyright (C) 2019 Red Hat, Inc.
+
+Introduction
+============
+The virtiofs file system for Linux implements a driver for the paravirtualized
+VIRTIO "virtio-fs" device for guest<->host file system sharing. It allows a
+guest to mount a directory that has been exported on the host.
+
+Guests often require access to files residing on the host or remote systems.
+Use cases include making files available to new guests during installation,
+booting from a root file system located on the host, persistent storage for
+stateless or ephemeral guests, and sharing a directory between guests.
+
+Although it is possible to use existing network file systems for some of these
+tasks, they require configuration steps that are hard to automate and they
+expose the storage network to the guest. The virtio-fs device was designed to
+solve these problems by providing file system access without networking.
+
+Furthermore the virtio-fs device takes advantage of the co-location of the
+guest and host to increase performance and provide semantics that are not
+possible with network file systems.
+
+Usage
+=====
+Mount file system with tag ``myfs`` on ``/mnt``:
+
+.. code-block:: sh
+
+ guest# mount -t virtiofs myfs /mnt
+
+Please see https://virtio-fs.gitlab.io/ for details on how to configure QEMU
+and the virtiofsd daemon.
+
+Internals
+=========
+Since the virtio-fs device uses the FUSE protocol for file system requests, the
+virtiofs file system for Linux is integrated closely with the FUSE file system
+client. The guest acts as the FUSE client while the host acts as the FUSE
+server. The /dev/fuse interface between the kernel and userspace is replaced
+with the virtio-fs device interface.
+
+FUSE requests are placed into a virtqueue and processed by the host. The
+response portion of the buffer is filled in by the host and the guest handles
+the request completion.
+
+Mapping /dev/fuse to virtqueues requires solving differences in semantics
+between /dev/fuse and virtqueues. Each time the /dev/fuse device is read, the
+FUSE client may choose which request to transfer, making it possible to
+prioritize certain requests over others. Virtqueues have queue semantics and
+it is not possible to change the order of requests that have been enqueued.
+This is especially important if the virtqueue becomes full since it is then
+impossible to add high priority requests. In order to address this difference,
+the virtio-fs device uses a "hiprio" virtqueue specifically for requests that
+have priority over normal requests.
From commandline LDFLAGS_MODULE shall be used (see kbuild.txt).
- KBUILD_ARFLAGS Options for $(AR) when creating archives
-
- $(KBUILD_ARFLAGS) set by the top level Makefile to "D" (deterministic
- mode) if this option is supported by $(AR).
-
KBUILD_LDS
The linker script with full path. Assigned by the top-level Makefile.
will be written containing all exported symbols that were not
defined in the kernel.
---- 6.3 Symbols From Another External Module
+6.3 Symbols From Another External Module
+----------------------------------------
Sometimes, an external module uses exported symbols from
- another external module. kbuild needs to have full knowledge of
+ another external module. Kbuild needs to have full knowledge of
all symbols to avoid spitting out warnings about undefined
symbols. Three solutions exist for this situation.
The top-level kbuild file would then look like::
#./Kbuild (or ./Makefile):
- obj-y := foo/ bar/
+ obj-m := foo/ bar/
And executing::
Timestamps
----------
-The kernel embeds a timestamp in two places:
+The kernel embeds timestamps in three places:
* The version string exposed by ``uname()`` and included in
``/proc/version``
* File timestamps in the embedded initramfs
-By default the timestamp is the current time. This must be overridden
-using the `KBUILD_BUILD_TIMESTAMP`_ variable. If you are building
-from a git commit, you could use its commit date.
+* If enabled via ``CONFIG_IKHEADERS``, file timestamps of kernel
+ headers embedded in the kernel or respective module,
+ exposed via ``/sys/kernel/kheaders.tar.xz``
+
+By default the timestamp is the current time and in the case of
+``kheaders`` the various files' modification times. This must
+be overridden using the `KBUILD_BUILD_TIMESTAMP`_ variable.
+If you are building from a git commit, you could use its commit date.
The kernel does *not* use the ``__DATE__`` and ``__TIME__`` macros,
and enables warnings if they are used. If you incorporate external
intel/ice
google/gve
mellanox/mlx5
+ netronome/nfp
pensando/ionic
.. only:: subproject and html
* - ``port_list_is_empty``
- ``drop``
- Traps packets that the device decided to drop in case they need to be
- flooded and the flood list is empty
+ flooded (e.g., unknown unicast, unregistered multicast) and there are
+ no ports the packets should be flooded to
* - ``port_loopback_filter``
- ``drop``
- Traps packets that the device decided to drop in case after layer 2
* MSG_DONTWAIT, i.e. non-blocking operation.
recvmsg(2)
-^^^^^^^^^
+^^^^^^^^^^
In most cases recvmsg(2) is needed if you want to extract more information than
recvfrom(2) can provide. For example package priority and timestamp. The
in their role as Linux kernel developers. They will, however, agree to
adhere to this documented process and the Memorandum of Understanding.
+The disclosing party should provide a list of contacts for all other
+entities who have already been, or should be, informed about the issue.
+This serves several purposes:
+
+ - The list of disclosed entities allows communication accross the
+ industry, e.g. other OS vendors, HW vendors, etc.
+
+ - The disclosed entities can be contacted to name experts who should
+ participate in the mitigation development.
+
+ - If an expert which is required to handle an issue is employed by an
+ listed entity or member of an listed entity, then the response teams can
+ request the disclosure of that expert from that entity. This ensures
+ that the expert is also part of the entity's response team.
Disclosure
""""""""""
""""""""""""""""""""""
The initial response team sets up an encrypted mailing-list or repurposes
-an existing one if appropriate. The disclosing party should provide a list
-of contacts for all other parties who have already been, or should be,
-informed about the issue. The response team contacts these parties so they
-can name experts who should be subscribed to the mailing-list.
+an existing one if appropriate.
Using a mailing-list is close to the normal Linux development process and
has been successfully used in developing mitigations for various hardware
stable kernel versions as necessary.
The initial response team will identify further experts from the Linux
-kernel developer community as needed and inform the disclosing party about
-their participation. Bringing in experts can happen at any time of the
-development process and often needs to be handled in a timely manner.
+kernel developer community as needed. Bringing in experts can happen at any
+time of the development process and needs to be handled in a timely manner.
+
+If an expert is employed by or member of an entity on the disclosure list
+provided by the disclosing party, then participation will be requested from
+the relevant entity.
+
+If not, then the disclosing party will be informed about the experts
+participation. The experts are covered by the Memorandum of Understanding
+and the disclosing party is requested to acknowledge the participation. In
+case that the disclosing party has a compelling reason to object, then this
+objection has to be raised within five work days and resolved with the
+incident team immediately. If the disclosing party does not react within
+five work days this is taken as silent acknowledgement.
+
+After acknowledgement or resolution of an objection the expert is disclosed
+by the incident team and brought into the development process.
+
Coordinated release
"""""""""""""""""""
ARM
AMD
IBM
- Intel
+ Intel Tony Luck <tony.luck@intel.com>
Qualcomm Trilok Soni <tsoni@codeaurora.org>
Microsoft Sasha Levin <sashal@kernel.org>
- 'd-ng': the digest of the event, calculated with an arbitrary hash
algorithm (field format: [<hash algo>:]digest, where the digest
prefix is shown only if the hash algorithm is not SHA1 or MD5);
+ - 'd-modsig': the digest of the event without the appended modsig;
- 'n-ng': the name of the event, without size limitations;
- 'sig': the file signature;
+ - 'modsig' the appended file signature;
- 'buf': the buffer data that was used to generate the hash without size limitations;
- "ima-ng" (default): its format is ``d-ng|n-ng``;
- "ima-sig": its format is ``d-ng|n-ng|sig``;
- "ima-buf": its format is ``d-ng|n-ng|buf``;
+ - "ima-modsig": its format is ``d-ng|n-ng|sig|d-modsig|modsig``;
Use
This capability indicates that KVM supports paravirtualized Hyper-V IPI send
hypercalls:
HvCallSendSyntheticClusterIpi, HvCallSendSyntheticClusterIpiEx.
+8.21 KVM_CAP_HYPERV_DIRECT_TLBFLUSH
+
+Architecture: x86
+
+This capability indicates that KVM running on top of Hyper-V hypervisor
+enables Direct TLB flush for its guests meaning that TLB flush
+hypercalls are handled by Level 0 hypervisor (Hyper-V) bypassing KVM.
+Due to the different ABI for hypercall parameters between Hyper-V and
+KVM, enabling this capability effectively disables all hypercall
+handling by KVM (as some KVM hypercall may be mistakenly treated as TLB
+flush hypercalls by Hyper-V) so userspace should disable KVM identification
+in CPUID and only exposes Hyper-V identification. In this case, guest
+thinks it's running on Hyper-V and only use Hyper-V hypercalls.
-------------------------------------------------
-ks8695_wdt:
- wdt_time:
- Watchdog time in seconds. (default=5)
- nowayout:
- Watchdog cannot be stopped once started
- (default=kernel config parameter)
-
--------------------------------------------------
-
machzwd:
nowayout:
Watchdog cannot be stopped once started
-------------------------------------------------
-nuc900_wdt:
- heartbeat:
- Watchdog heartbeats in seconds.
- (default = 15)
- nowayout:
- Watchdog cannot be stopped once started
- (default=kernel config parameter)
-
--------------------------------------------------
-
omap_wdt:
timer_margin:
initial watchdog timeout (in seconds)
FORCEDETH GIGABIT ETHERNET DRIVER
M: Rain River <rain.1986.08.12@gmail.com>
+M: Zhu Yanjun <yanjun.zhu@oracle.com>
L: netdev@vger.kernel.org
S: Maintained
F: drivers/net/ethernet/nvidia/*
M: Chao Yu <yuchao0@huawei.com>
L: linux-erofs@lists.ozlabs.org
S: Maintained
+T: git git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs.git
+F: Documentation/filesystems/erofs.txt
F: fs/erofs/
+F: include/trace/events/erofs.h
ERRSEQ ERROR TRACKING INFRASTRUCTURE
M: Jeff Layton <jlayton@kernel.org>
KEYS/KEYRINGS:
M: David Howells <dhowells@redhat.com>
+M: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
L: keyrings@vger.kernel.org
S: Maintained
F: Documentation/security/keys/core.rst
KGDB / KDB /debug_core
M: Jason Wessel <jason.wessel@windriver.com>
M: Daniel Thompson <daniel.thompson@linaro.org>
+R: Douglas Anderson <dianders@chromium.org>
W: http://kgdb.wiki.kernel.org/
L: kgdb-bugreport@lists.sourceforge.net
T: git git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb.git
PWM SUBSYSTEM
M: Thierry Reding <thierry.reding@gmail.com>
+R: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
L: linux-pwm@vger.kernel.org
S: Maintained
T: git git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm.git
+Q: https://patchwork.ozlabs.org/project/linux-pwm/list/
F: Documentation/driver-api/pwm.rst
F: Documentation/devicetree/bindings/pwm/
F: include/linux/pwm.h
F: include/linux/pwm_backlight.h
F: drivers/gpio/gpio-mvebu.c
F: Documentation/devicetree/bindings/gpio/gpio-mvebu.txt
+K: pwm_(config|apply_state|ops)
PXA GPIO DRIVER
M: Robert Jarzmik <robert.jarzmik@free.fr>
M: Zhang Rui <rui.zhang@intel.com>
M: Eduardo Valentin <edubezval@gmail.com>
R: Daniel Lezcano <daniel.lezcano@linaro.org>
+R: Amit Kucheria <amit.kucheria@verdurent.com>
L: linux-pm@vger.kernel.org
T: git git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux.git
T: git git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal.git
F: drivers/s390/virtio/
F: arch/s390/include/uapi/asm/virtio-ccw.h
+VIRTIO FILE SYSTEM
+M: Vivek Goyal <vgoyal@redhat.com>
+M: Stefan Hajnoczi <stefanha@redhat.com>
+M: Miklos Szeredi <miklos@szeredi.hu>
+L: virtualization@lists.linux-foundation.org
+L: linux-fsdevel@vger.kernel.org
+W: https://virtio-fs.gitlab.io/
+S: Supported
+F: fs/fuse/virtio_fs.c
+F: include/uapi/linux/virtio_fs.h
+F: Documentation/filesystems/virtiofs.rst
+
VIRTIO GPU DRIVER
M: David Airlie <airlied@linux.ie>
M: Gerd Hoffmann <kraxel@redhat.com>
# SPDX-License-Identifier: GPL-2.0
VERSION = 5
-PATCHLEVEL = 3
+PATCHLEVEL = 4
SUBLEVEL = 0
-EXTRAVERSION =
-NAME = Bobtail Squid
+EXTRAVERSION = -rc2
+NAME = Nesting Opossum
# *DOCUMENTATION*
# To see a list of typical targets execute "make help"
KBUILD_CHECKSRC = 0
endif
-# Use make M=dir to specify directory of external module to build
-# Old syntax make ... SUBDIRS=$PWD is still supported
-# Setting the environment variable KBUILD_EXTMOD take precedence
-ifdef SUBDIRS
- $(warning ================= WARNING ================)
- $(warning 'SUBDIRS' will be removed after Linux 5.3)
- $(warning )
- $(warning If you are building an individual subdirectory)
- $(warning in the kernel tree, you can do like this:)
- $(warning $$ make path/to/dir/you/want/to/build/)
- $(warning (Do not forget the trailing slash))
- $(warning )
- $(warning If you are building an external module,)
- $(warning Please use 'M=' or 'KBUILD_EXTMOD' instead)
- $(warning ==========================================)
- KBUILD_EXTMOD ?= $(SUBDIRS)
-endif
-
+# Use make M=dir or set the environment variable KBUILD_EXTMOD to specify the
+# directory of external module to build. Setting M= takes precedence.
ifeq ("$(origin M)", "command line")
KBUILD_EXTMOD := $(M)
endif
export KBUILD_AFLAGS AFLAGS_KERNEL AFLAGS_MODULE
export KBUILD_AFLAGS_MODULE KBUILD_CFLAGS_MODULE KBUILD_LDFLAGS_MODULE
export KBUILD_AFLAGS_KERNEL KBUILD_CFLAGS_KERNEL
-export KBUILD_ARFLAGS
# Files to ignore in find ... statements
KBUILD_CFLAGS += $(call cc-option,-fcf-protection=none)
endif
-# use the deterministic mode of AR if available
-KBUILD_ARFLAGS := $(call ar-option,D)
-
include scripts/Makefile.kasan
include scripts/Makefile.extrawarn
include scripts/Makefile.ubsan
pinctrl-0 = <&mmc0_pins_default>;
};
-&gpio0 {
+&gpio0_target {
/* Do not idle the GPIO used for holding the VTT regulator */
ti,no-reset-on-init;
ti,no-idle-on-init;
ranges = <0x0 0x5000 0x1000>;
};
- target-module@7000 { /* 0x44e07000, ap 14 20.0 */
+ gpio0_target: target-module@7000 { /* 0x44e07000, ap 14 20.0 */
compatible = "ti,sysc-omap2", "ti,sysc";
ti,hwmods = "gpio1";
reg = <0x7000 0x4>,
reg = <0xe000 0x4>,
<0xe054 0x4>;
reg-names = "rev", "sysc";
- ti,sysc-midle ;
+ ti,sysc-midle = <SYSC_IDLE_FORCE>,
+ <SYSC_IDLE_NO>,
+ <SYSC_IDLE_SMART>;
ti,sysc-sidle = <SYSC_IDLE_FORCE>,
<SYSC_IDLE_NO>,
<SYSC_IDLE_SMART>;
};
lcd0: display@0 {
- compatible = "panel-dpi";
+ /* This isn't the exact LCD, but the timings meet spec */
+ /* To make it work, set CONFIG_OMAP2_DSS_MIN_FCK_PER_PCK=4 */
+ compatible = "newhaven,nhd-4.3-480272ef-atxl";
label = "15";
- status = "okay";
- pinctrl-names = "default";
+ backlight = <&bl>;
enable-gpios = <&gpio6 16 GPIO_ACTIVE_HIGH>; /* gpio176, lcd INI */
vcc-supply = <&vdd_io_reg>;
remote-endpoint = <&dpi_out>;
};
};
-
- panel-timing {
- clock-frequency = <9000000>;
- hactive = <480>;
- vactive = <272>;
- hfront-porch = <3>;
- hback-porch = <2>;
- hsync-len = <42>;
- vback-porch = <3>;
- vfront-porch = <4>;
- vsync-len = <11>;
- hsync-active = <0>;
- vsync-active = <0>;
- de-active = <1>;
- pixelclk-active = <1>;
- };
};
bl: backlight {
ti,hwmods = "dss_dispc";
clocks = <&disp_clk>;
clock-names = "fck";
+
+ max-memory-bandwidth = <230000000>;
};
rfbi: rfbi@4832a800 {
interrupt-names = "tx", "rx";
dmas = <&edma_xbar 129 1>, <&edma_xbar 128 1>;
dma-names = "tx", "rx";
- clocks = <&ipu_clkctrl DRA7_IPU_MCASP1_CLKCTRL 22>,
+ clocks = <&ipu_clkctrl DRA7_IPU_MCASP1_CLKCTRL 0>,
<&ipu_clkctrl DRA7_IPU_MCASP1_CLKCTRL 24>,
<&ipu_clkctrl DRA7_IPU_MCASP1_CLKCTRL 28>;
clock-names = "fck", "ahclkx", "ahclkr";
interrupt-names = "tx", "rx";
dmas = <&edma_xbar 131 1>, <&edma_xbar 130 1>;
dma-names = "tx", "rx";
- clocks = <&l4per2_clkctrl DRA7_L4PER2_MCASP2_CLKCTRL 22>,
- <&l4per2_clkctrl DRA7_L4PER2_MCASP2_CLKCTRL 24>,
+ clocks = <&l4per2_clkctrl DRA7_L4PER2_MCASP2_CLKCTRL 0>,
+ <&ipu_clkctrl DRA7_IPU_MCASP1_CLKCTRL 24>,
<&l4per2_clkctrl DRA7_L4PER2_MCASP2_CLKCTRL 28>;
clock-names = "fck", "ahclkx", "ahclkr";
status = "disabled";
<SYSC_IDLE_SMART>;
/* Domains (P, C): l4per_pwrdm, l4per2_clkdm */
clocks = <&l4per2_clkctrl DRA7_L4PER2_MCASP3_CLKCTRL 0>,
- <&l4per2_clkctrl DRA7_L4PER2_MCASP3_CLKCTRL 24>,
- <&l4per2_clkctrl DRA7_L4PER2_MCASP3_CLKCTRL 28>;
- clock-names = "fck", "ahclkx", "ahclkr";
+ <&l4per2_clkctrl DRA7_L4PER2_MCASP3_CLKCTRL 24>;
+ clock-names = "fck", "ahclkx";
#address-cells = <1>;
#size-cells = <1>;
ranges = <0x0 0x68000 0x2000>,
interrupt-names = "tx", "rx";
dmas = <&edma_xbar 133 1>, <&edma_xbar 132 1>;
dma-names = "tx", "rx";
- clocks = <&l4per2_clkctrl DRA7_L4PER2_MCASP3_CLKCTRL 22>,
+ clocks = <&l4per2_clkctrl DRA7_L4PER2_MCASP3_CLKCTRL 0>,
<&l4per2_clkctrl DRA7_L4PER2_MCASP3_CLKCTRL 24>;
clock-names = "fck", "ahclkx";
status = "disabled";
<SYSC_IDLE_SMART>;
/* Domains (P, C): l4per_pwrdm, l4per2_clkdm */
clocks = <&l4per2_clkctrl DRA7_L4PER2_MCASP4_CLKCTRL 0>,
- <&l4per2_clkctrl DRA7_L4PER2_MCASP4_CLKCTRL 24>,
- <&l4per2_clkctrl DRA7_L4PER2_MCASP4_CLKCTRL 28>;
- clock-names = "fck", "ahclkx", "ahclkr";
+ <&l4per2_clkctrl DRA7_L4PER2_MCASP4_CLKCTRL 24>;
+ clock-names = "fck", "ahclkx";
#address-cells = <1>;
#size-cells = <1>;
ranges = <0x0 0x6c000 0x2000>,
interrupt-names = "tx", "rx";
dmas = <&edma_xbar 135 1>, <&edma_xbar 134 1>;
dma-names = "tx", "rx";
- clocks = <&l4per2_clkctrl DRA7_L4PER2_MCASP4_CLKCTRL 22>,
+ clocks = <&l4per2_clkctrl DRA7_L4PER2_MCASP4_CLKCTRL 0>,
<&l4per2_clkctrl DRA7_L4PER2_MCASP4_CLKCTRL 24>;
clock-names = "fck", "ahclkx";
status = "disabled";
<SYSC_IDLE_SMART>;
/* Domains (P, C): l4per_pwrdm, l4per2_clkdm */
clocks = <&l4per2_clkctrl DRA7_L4PER2_MCASP5_CLKCTRL 0>,
- <&l4per2_clkctrl DRA7_L4PER2_MCASP5_CLKCTRL 24>,
- <&l4per2_clkctrl DRA7_L4PER2_MCASP5_CLKCTRL 28>;
- clock-names = "fck", "ahclkx", "ahclkr";
+ <&l4per2_clkctrl DRA7_L4PER2_MCASP5_CLKCTRL 24>;
+ clock-names = "fck", "ahclkx";
#address-cells = <1>;
#size-cells = <1>;
ranges = <0x0 0x70000 0x2000>,
interrupt-names = "tx", "rx";
dmas = <&edma_xbar 137 1>, <&edma_xbar 136 1>;
dma-names = "tx", "rx";
- clocks = <&l4per2_clkctrl DRA7_L4PER2_MCASP5_CLKCTRL 22>,
+ clocks = <&l4per2_clkctrl DRA7_L4PER2_MCASP5_CLKCTRL 0>,
<&l4per2_clkctrl DRA7_L4PER2_MCASP5_CLKCTRL 24>;
clock-names = "fck", "ahclkx";
status = "disabled";
<SYSC_IDLE_SMART>;
/* Domains (P, C): l4per_pwrdm, l4per2_clkdm */
clocks = <&l4per2_clkctrl DRA7_L4PER2_MCASP6_CLKCTRL 0>,
- <&l4per2_clkctrl DRA7_L4PER2_MCASP6_CLKCTRL 24>,
- <&l4per2_clkctrl DRA7_L4PER2_MCASP6_CLKCTRL 28>;
- clock-names = "fck", "ahclkx", "ahclkr";
+ <&l4per2_clkctrl DRA7_L4PER2_MCASP6_CLKCTRL 24>;
+ clock-names = "fck", "ahclkx";
#address-cells = <1>;
#size-cells = <1>;
ranges = <0x0 0x74000 0x2000>,
interrupt-names = "tx", "rx";
dmas = <&edma_xbar 139 1>, <&edma_xbar 138 1>;
dma-names = "tx", "rx";
- clocks = <&l4per2_clkctrl DRA7_L4PER2_MCASP6_CLKCTRL 22>,
+ clocks = <&l4per2_clkctrl DRA7_L4PER2_MCASP6_CLKCTRL 0>,
<&l4per2_clkctrl DRA7_L4PER2_MCASP6_CLKCTRL 24>;
clock-names = "fck", "ahclkx";
status = "disabled";
<SYSC_IDLE_SMART>;
/* Domains (P, C): l4per_pwrdm, l4per2_clkdm */
clocks = <&l4per2_clkctrl DRA7_L4PER2_MCASP7_CLKCTRL 0>,
- <&l4per2_clkctrl DRA7_L4PER2_MCASP7_CLKCTRL 24>,
- <&l4per2_clkctrl DRA7_L4PER2_MCASP7_CLKCTRL 28>;
- clock-names = "fck", "ahclkx", "ahclkr";
+ <&l4per2_clkctrl DRA7_L4PER2_MCASP7_CLKCTRL 24>;
+ clock-names = "fck", "ahclkx";
#address-cells = <1>;
#size-cells = <1>;
ranges = <0x0 0x78000 0x2000>,
interrupt-names = "tx", "rx";
dmas = <&edma_xbar 141 1>, <&edma_xbar 140 1>;
dma-names = "tx", "rx";
- clocks = <&l4per2_clkctrl DRA7_L4PER2_MCASP7_CLKCTRL 22>,
+ clocks = <&l4per2_clkctrl DRA7_L4PER2_MCASP7_CLKCTRL 0>,
<&l4per2_clkctrl DRA7_L4PER2_MCASP7_CLKCTRL 24>;
clock-names = "fck", "ahclkx";
status = "disabled";
<SYSC_IDLE_SMART>;
/* Domains (P, C): l4per_pwrdm, l4per2_clkdm */
clocks = <&l4per2_clkctrl DRA7_L4PER2_MCASP8_CLKCTRL 0>,
- <&l4per2_clkctrl DRA7_L4PER2_MCASP8_CLKCTRL 24>,
- <&l4per2_clkctrl DRA7_L4PER2_MCASP8_CLKCTRL 28>;
- clock-names = "fck", "ahclkx", "ahclkr";
+ <&l4per2_clkctrl DRA7_L4PER2_MCASP8_CLKCTRL 24>;
+ clock-names = "fck", "ahclkx";
#address-cells = <1>;
#size-cells = <1>;
ranges = <0x0 0x7c000 0x2000>,
interrupt-names = "tx", "rx";
dmas = <&edma_xbar 143 1>, <&edma_xbar 142 1>;
dma-names = "tx", "rx";
- clocks = <&l4per2_clkctrl DRA7_L4PER2_MCASP8_CLKCTRL 22>,
+ clocks = <&l4per2_clkctrl DRA7_L4PER2_MCASP8_CLKCTRL 0>,
<&l4per2_clkctrl DRA7_L4PER2_MCASP8_CLKCTRL 24>;
clock-names = "fck", "ahclkx";
status = "disabled";
>;
};
+ i2c2_pins: pinmux_i2c2_pins {
+ pinctrl-single,pins = <
+ OMAP3_CORE1_IOPAD(0x21be, PIN_INPUT | MUX_MODE0) /* i2c2_scl */
+ OMAP3_CORE1_IOPAD(0x21c0, PIN_INPUT | MUX_MODE0) /* i2c2_sda */
+ >;
+ };
+
+ i2c3_pins: pinmux_i2c3_pins {
+ pinctrl-single,pins = <
+ OMAP3_CORE1_IOPAD(0x21c2, PIN_INPUT | MUX_MODE0) /* i2c3_scl */
+ OMAP3_CORE1_IOPAD(0x21c4, PIN_INPUT | MUX_MODE0) /* i2c3_sda */
+ >;
+ };
+
tsc2004_pins: pinmux_tsc2004_pins {
pinctrl-single,pins = <
OMAP3_CORE1_IOPAD(0x2186, PIN_INPUT | MUX_MODE4) /* mcbsp4_dr.gpio_153 */
OMAP3_WKUP_IOPAD(0x2a0c, PIN_OUTPUT | MUX_MODE4) /* sys_boot1.gpio_3 */
>;
};
- i2c2_pins: pinmux_i2c2_pins {
- pinctrl-single,pins = <
- OMAP3_CORE1_IOPAD(0x21be, PIN_INPUT | MUX_MODE0) /* i2c2_scl */
- OMAP3_CORE1_IOPAD(0x21c0, PIN_INPUT | MUX_MODE0) /* i2c2_sda */
- >;
- };
- i2c3_pins: pinmux_i2c3_pins {
- pinctrl-single,pins = <
- OMAP3_CORE1_IOPAD(0x21c2, PIN_INPUT | MUX_MODE0) /* i2c3_scl */
- OMAP3_CORE1_IOPAD(0x21c4, PIN_INPUT | MUX_MODE0) /* i2c3_sda */
- >;
- };
};
&omap3_pmx_core2 {
&dss {
status = "ok";
vdds_dsi-supply = <&vpll2>;
- vdda_video-supply = <&video_reg>;
pinctrl-names = "default";
pinctrl-0 = <&dss_dpi_pins1>;
port {
display0 = &lcd0;
};
- video_reg: video_reg {
- pinctrl-names = "default";
- pinctrl-0 = <&panel_pwr_pins>;
- compatible = "regulator-fixed";
- regulator-name = "fixed-supply";
- regulator-min-microvolt = <3300000>;
- regulator-max-microvolt = <3300000>;
- gpio = <&gpio5 27 GPIO_ACTIVE_HIGH>; /* gpio155, lcd INI */
- };
-
lcd0: display {
- compatible = "panel-dpi";
+ /* This isn't the exact LCD, but the timings meet spec */
+ /* To make it work, set CONFIG_OMAP2_DSS_MIN_FCK_PER_PCK=4 */
+ compatible = "newhaven,nhd-4.3-480272ef-atxl";
label = "15";
- status = "okay";
- /* default-on; */
pinctrl-names = "default";
-
+ pinctrl-0 = <&panel_pwr_pins>;
+ backlight = <&bl>;
+ enable-gpios = <&gpio5 27 GPIO_ACTIVE_HIGH>;
port {
lcd_in: endpoint {
remote-endpoint = <&dpi_out>;
};
};
-
- panel-timing {
- clock-frequency = <9000000>;
- hactive = <480>;
- vactive = <272>;
- hfront-porch = <3>;
- hback-porch = <2>;
- hsync-len = <42>;
- vback-porch = <3>;
- vfront-porch = <4>;
- vsync-len = <11>;
- hsync-active = <0>;
- vsync-active = <0>;
- de-active = <1>;
- pixelclk-active = <1>;
- };
};
bl: backlight {
spi-max-frequency = <100000>;
spi-cpol;
spi-cpha;
+ spi-cs-high;
backlight= <&backlight>;
label = "lcd";
#include <dt-bindings/mfd/dbx500-prcmu.h>
#include <dt-bindings/arm/ux500_pm_domains.h>
#include <dt-bindings/gpio/gpio.h>
+#include <dt-bindings/thermal/thermal.h>
/ {
#address-cells = <1>;
* cooling.
*/
cpu_thermal: cpu-thermal {
- polling-delay-passive = <0>;
- polling-delay = <1000>;
+ polling-delay-passive = <250>;
+ /*
+ * This sensor fires interrupts to update the thermal
+ * zone, so no polling is needed.
+ */
+ polling-delay = <0>;
thermal-sensors = <&thermal>;
cooling-maps {
trip = <&cpu_alert>;
- cooling-device = <&CPU0 0 2>;
+ cooling-device = <&CPU0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
contribution = <100>;
};
};
CONFIG_DMADEVICES=y
CONFIG_TI_EDMA=y
CONFIG_COMMON_CLK_PWM=m
-CONFIG_REMOTEPROC=m
+CONFIG_REMOTEPROC=y
CONFIG_DA8XX_REMOTEPROC=m
CONFIG_MEMORY=y
CONFIG_TI_AEMIF=m
CONFIG_SPI_SH_HSPI=y
CONFIG_SPI_SIRF=y
CONFIG_SPI_STM32=m
-CONFIG_SPI_STM32_QSPI=m
+CONFIG_SPI_STM32_QSPI=y
CONFIG_SPI_SUN4I=y
CONFIG_SPI_SUN6I=y
CONFIG_SPI_TEGRA114=y
CONFIG_ROCKCHIP_IOMMU=y
CONFIG_TEGRA_IOMMU_GART=y
CONFIG_TEGRA_IOMMU_SMMU=y
-CONFIG_REMOTEPROC=m
+CONFIG_REMOTEPROC=y
CONFIG_ST_REMOTEPROC=m
CONFIG_RPMSG_VIRTIO=m
CONFIG_ASPEED_LPC_CTRL=m
CONFIG_DRM_OMAP_PANEL_TPO_TD043MTEA1=m
CONFIG_DRM_OMAP_PANEL_NEC_NL8048HL11=m
CONFIG_DRM_TILCDC=m
+CONFIG_DRM_PANEL_SIMPLE=m
+CONFIG_DRM_TI_TFP410=m
CONFIG_FB=y
CONFIG_FIRMWARE_EDID=y
CONFIG_FB_MODE_HELPERS=y
CONFIG_USB_SERIAL_SIMPLE=m
CONFIG_USB_SERIAL_FTDI_SIO=m
CONFIG_USB_SERIAL_PL2303=m
+CONFIG_USB_SERIAL_OPTION=m
CONFIG_USB_TEST=m
CONFIG_NOP_USB_XCEIV=m
CONFIG_AM335X_PHY_USB=m
CONFIG_NEW_LEDS=y
CONFIG_LEDS_CLASS=m
CONFIG_LEDS_CPCAP=m
+CONFIG_LEDS_LM3532=m
CONFIG_LEDS_GPIO=m
CONFIG_LEDS_PCA963X=m
CONFIG_LEDS_PWM=m
CONFIG_RTC_DRV_CPCAP=m
CONFIG_DMADEVICES=y
CONFIG_OMAP_IOMMU=y
-CONFIG_REMOTEPROC=m
+CONFIG_REMOTEPROC=y
CONFIG_OMAP_REMOTEPROC=m
CONFIG_WKUP_M3_RPROC=m
CONFIG_SOC_TI=y
+++ /dev/null
-#ifndef _ASM_XEN_OPS_H
-#define _ASM_XEN_OPS_H
-
-void xen_efi_runtime_setup(void);
-
-#endif /* _ASM_XEN_OPS_H */
config MACH_ASPEED_G5
bool "Aspeed SoC 5th Generation"
depends on ARCH_MULTI_V6
- select CPU_V6
select PINCTRL_ASPEED_G5
select FTTMR010_TIMER
help
.rev_offs = 0x0000,
.sysc_offs = 0x0010,
.syss_offs = 0x0014,
- .sysc_flags = (SYSC_HAS_SIDLEMODE | SYSC_HAS_SOFTRESET),
+ .sysc_flags = SYSC_HAS_SIDLEMODE | SYSC_HAS_SOFTRESET |
+ SYSC_HAS_RESET_STATUS,
.idlemodes = (SIDLE_FORCE | SIDLE_NO | SIDLE_SMART |
SIDLE_SMART_WKUP),
.sysc_fields = &omap_hwmod_sysc_type2,
static struct omap_hwmod_class_sysconfig lcdc_sysc = {
.rev_offs = 0x0,
.sysc_offs = 0x54,
- .sysc_flags = (SYSC_HAS_SIDLEMODE | SYSC_HAS_MIDLEMODE),
- .idlemodes = (SIDLE_FORCE | SIDLE_NO | SIDLE_SMART),
+ .sysc_flags = SYSC_HAS_SIDLEMODE | SYSC_HAS_MIDLEMODE,
+ .idlemodes = SIDLE_FORCE | SIDLE_NO | SIDLE_SMART |
+ MSTANDBY_FORCE | MSTANDBY_NO | MSTANDBY_SMART,
.sysc_fields = &omap_hwmod_sysc_type2,
};
struct clk *fck, struct clk *ick,
struct ti_sysc_cookie *cookie)
{
- if (fck)
+ if (!IS_ERR(fck))
cookie->clkdm = ti_sysc_find_one_clockdomain(fck);
if (cookie->clkdm)
return 0;
- if (ick)
+ if (!IS_ERR(ick))
cookie->clkdm = ti_sysc_find_one_clockdomain(ick);
if (cookie->clkdm)
return 0;
return 0;
}
-/*
- * This API is to be called during init to set the various voltage
- * domains to the voltage as per the opp table. Typically we boot up
- * at the nominal voltage. So this function finds out the rate of
- * the clock associated with the voltage domain, finds out the correct
- * opp entry and sets the voltage domain to the voltage specified
- * in the opp entry
- */
-static int __init omap2_set_init_voltage(char *vdd_name, char *clk_name,
- const char *oh_name)
-{
- struct voltagedomain *voltdm;
- struct clk *clk;
- struct dev_pm_opp *opp;
- unsigned long freq, bootup_volt;
- struct device *dev;
-
- if (!vdd_name || !clk_name || !oh_name) {
- pr_err("%s: invalid parameters\n", __func__);
- goto exit;
- }
-
- if (!strncmp(oh_name, "mpu", 3))
- /*
- * All current OMAPs share voltage rail and clock
- * source, so CPU0 is used to represent the MPU-SS.
- */
- dev = get_cpu_device(0);
- else
- dev = omap_device_get_by_hwmod_name(oh_name);
-
- if (IS_ERR(dev)) {
- pr_err("%s: Unable to get dev pointer for hwmod %s\n",
- __func__, oh_name);
- goto exit;
- }
-
- voltdm = voltdm_lookup(vdd_name);
- if (!voltdm) {
- pr_err("%s: unable to get vdd pointer for vdd_%s\n",
- __func__, vdd_name);
- goto exit;
- }
-
- clk = clk_get(NULL, clk_name);
- if (IS_ERR(clk)) {
- pr_err("%s: unable to get clk %s\n", __func__, clk_name);
- goto exit;
- }
-
- freq = clk_get_rate(clk);
- clk_put(clk);
-
- opp = dev_pm_opp_find_freq_ceil(dev, &freq);
- if (IS_ERR(opp)) {
- pr_err("%s: unable to find boot up OPP for vdd_%s\n",
- __func__, vdd_name);
- goto exit;
- }
-
- bootup_volt = dev_pm_opp_get_voltage(opp);
- dev_pm_opp_put(opp);
-
- if (!bootup_volt) {
- pr_err("%s: unable to find voltage corresponding to the bootup OPP for vdd_%s\n",
- __func__, vdd_name);
- goto exit;
- }
-
- voltdm_scale(voltdm, bootup_volt);
- return 0;
-
-exit:
- pr_err("%s: unable to set vdd_%s\n", __func__, vdd_name);
- return -EINVAL;
-}
-
#ifdef CONFIG_SUSPEND
static int omap_pm_enter(suspend_state_t suspend_state)
{
}
#endif /* CONFIG_SUSPEND */
-static void __init omap3_init_voltages(void)
-{
- if (!soc_is_omap34xx())
- return;
-
- omap2_set_init_voltage("mpu_iva", "dpll1_ck", "mpu");
- omap2_set_init_voltage("core", "l3_ick", "l3_main");
-}
-
-static void __init omap4_init_voltages(void)
-{
- if (!soc_is_omap44xx())
- return;
-
- omap2_set_init_voltage("mpu", "dpll_mpu_ck", "mpu");
- omap2_set_init_voltage("core", "l3_div_ck", "l3_main_1");
- omap2_set_init_voltage("iva", "dpll_iva_m5x2_ck", "iva");
-}
-
int __maybe_unused omap_pm_nop_init(void)
{
return 0;
omap4_twl_init();
omap_voltage_late_init();
- /* Initialize the voltages */
- omap3_init_voltages();
- omap4_init_voltages();
-
/* Smartreflex device init */
omap_devinit_smartreflex();
# SPDX-License-Identifier: GPL-2.0-only
obj-y := enlighten.o hypercall.o grant-table.o p2m.o mm.o
-obj-$(CONFIG_XEN_EFI) += efi.o
+++ /dev/null
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * Copyright (c) 2015, Linaro Limited, Shannon Zhao
- */
-
-#include <linux/efi.h>
-#include <xen/xen-ops.h>
-#include <asm/xen/xen-ops.h>
-
-/* Set XEN EFI runtime services function pointers. Other fields of struct efi,
- * e.g. efi.systab, will be set like normal EFI.
- */
-void __init xen_efi_runtime_setup(void)
-{
- efi.get_time = xen_efi_get_time;
- efi.set_time = xen_efi_set_time;
- efi.get_wakeup_time = xen_efi_get_wakeup_time;
- efi.set_wakeup_time = xen_efi_set_wakeup_time;
- efi.get_variable = xen_efi_get_variable;
- efi.get_next_variable = xen_efi_get_next_variable;
- efi.set_variable = xen_efi_set_variable;
- efi.query_variable_info = xen_efi_query_variable_info;
- efi.update_capsule = xen_efi_update_capsule;
- efi.query_capsule_caps = xen_efi_query_capsule_caps;
- efi.get_next_high_mono_count = xen_efi_get_next_high_mono_count;
- efi.reset_system = xen_efi_reset_system;
-}
-EXPORT_SYMBOL_GPL(xen_efi_runtime_setup);
#include <xen/xen-ops.h>
#include <asm/xen/hypervisor.h>
#include <asm/xen/hypercall.h>
-#include <asm/xen/xen-ops.h>
#include <asm/system_misc.h>
#include <asm/efi.h>
#include <linux/interrupt.h>
EXPORT_SYMBOL_GPL(HYPERVISOR_physdev_op);
EXPORT_SYMBOL_GPL(HYPERVISOR_vcpu_op);
EXPORT_SYMBOL_GPL(HYPERVISOR_tmem_op);
-EXPORT_SYMBOL_GPL(HYPERVISOR_platform_op);
+EXPORT_SYMBOL_GPL(HYPERVISOR_platform_op_raw);
EXPORT_SYMBOL_GPL(HYPERVISOR_multicall);
EXPORT_SYMBOL_GPL(HYPERVISOR_vm_assist);
EXPORT_SYMBOL_GPL(HYPERVISOR_dm_op);
for_each_memblock(memory, reg) {
if (reg->base < (phys_addr_t)0xffffffff) {
- flags |= __GFP_DMA;
+ if (IS_ENABLED(CONFIG_ZONE_DMA32))
+ flags |= __GFP_DMA32;
+ else
+ flags |= __GFP_DMA;
break;
}
}
for kernel and initramfs as opposed to list of segments as
accepted by previous system call.
-config KEXEC_VERIFY_SIG
+config KEXEC_SIG
bool "Verify kernel signature during kexec_file_load() syscall"
depends on KEXEC_FILE
help
config KEXEC_IMAGE_VERIFY_SIG
bool "Enable Image signature verification support"
default y
- depends on KEXEC_VERIFY_SIG
+ depends on KEXEC_SIG
depends on EFI && SIGNED_PE_FILE_VERIFICATION
help
Enable Image signature verification support.
comment "Support for PE file signature verification disabled"
- depends on KEXEC_VERIFY_SIG
+ depends on KEXEC_SIG
depends on !EFI || !SIGNED_PE_FILE_VERIFICATION
config CRASH_DUMP
CONFIG_ARM_SMMU=y
CONFIG_ARM_SMMU_V3=y
CONFIG_QCOM_IOMMU=y
-CONFIG_REMOTEPROC=m
+CONFIG_REMOTEPROC=y
CONFIG_QCOM_Q6V5_MSS=m
CONFIG_QCOM_Q6V5_PAS=m
CONFIG_QCOM_SYSMON=m
#define read_sysreg_el2(r) read_sysreg_elx(r, _EL2, _EL1)
#define write_sysreg_el2(v,r) write_sysreg_elx(v, r, _EL2, _EL1)
-/**
- * hyp_alternate_select - Generates patchable code sequences that are
- * used to switch between two implementations of a function, depending
- * on the availability of a feature.
- *
- * @fname: a symbol name that will be defined as a function returning a
- * function pointer whose type will match @orig and @alt
- * @orig: A pointer to the default function, as returned by @fname when
- * @cond doesn't hold
- * @alt: A pointer to the alternate function, as returned by @fname
- * when @cond holds
- * @cond: a CPU feature (as described in asm/cpufeature.h)
- */
-#define hyp_alternate_select(fname, orig, alt, cond) \
-typeof(orig) * __hyp_text fname(void) \
-{ \
- typeof(alt) *val = orig; \
- asm volatile(ALTERNATIVE("nop \n", \
- "mov %0, %1 \n", \
- cond) \
- : "+r" (val) : "r" (alt)); \
- return val; \
-}
-
int __vgic_v2_perform_cpuif_access(struct kvm_vcpu *vcpu);
void __vgic_v3_save_state(struct kvm_vcpu *vcpu);
+++ /dev/null
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _ASM_XEN_OPS_H
-#define _ASM_XEN_OPS_H
-
-void xen_efi_runtime_setup(void);
-
-#endif /* _ASM_XEN_OPS_H */
}
}
-static bool __hyp_text __true_value(void)
-{
- return true;
-}
-
-static bool __hyp_text __false_value(void)
-{
- return false;
-}
-
-static hyp_alternate_select(__check_arm_834220,
- __false_value, __true_value,
- ARM64_WORKAROUND_834220);
-
static bool __hyp_text __translate_far_to_hpfar(u64 far, u64 *hpfar)
{
u64 par, tmp;
* resolve the IPA using the AT instruction.
*/
if (!(esr & ESR_ELx_S1PTW) &&
- (__check_arm_834220()() || (esr & ESR_ELx_FSC_TYPE) == FSC_PERM)) {
+ (cpus_have_const_cap(ARM64_WORKAROUND_834220) ||
+ (esr & ESR_ELx_FSC_TYPE) == FSC_PERM)) {
if (!__translate_far_to_hpfar(far, &hpfar))
return false;
} else {
isb();
}
-static hyp_alternate_select(__tlb_switch_to_guest,
- __tlb_switch_to_guest_nvhe,
- __tlb_switch_to_guest_vhe,
- ARM64_HAS_VIRT_HOST_EXTN);
+static void __hyp_text __tlb_switch_to_guest(struct kvm *kvm,
+ struct tlb_inv_context *cxt)
+{
+ if (has_vhe())
+ __tlb_switch_to_guest_vhe(kvm, cxt);
+ else
+ __tlb_switch_to_guest_nvhe(kvm, cxt);
+}
static void __hyp_text __tlb_switch_to_host_vhe(struct kvm *kvm,
struct tlb_inv_context *cxt)
write_sysreg(0, vttbr_el2);
}
-static hyp_alternate_select(__tlb_switch_to_host,
- __tlb_switch_to_host_nvhe,
- __tlb_switch_to_host_vhe,
- ARM64_HAS_VIRT_HOST_EXTN);
+static void __hyp_text __tlb_switch_to_host(struct kvm *kvm,
+ struct tlb_inv_context *cxt)
+{
+ if (has_vhe())
+ __tlb_switch_to_host_vhe(kvm, cxt);
+ else
+ __tlb_switch_to_host_nvhe(kvm, cxt);
+}
void __hyp_text __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
{
/* Switch to requested VMID */
kvm = kern_hyp_va(kvm);
- __tlb_switch_to_guest()(kvm, &cxt);
+ __tlb_switch_to_guest(kvm, &cxt);
/*
* We could do so much better if we had the VA as well.
if (!has_vhe() && icache_is_vpipt())
__flush_icache_all();
- __tlb_switch_to_host()(kvm, &cxt);
+ __tlb_switch_to_host(kvm, &cxt);
}
void __hyp_text __kvm_tlb_flush_vmid(struct kvm *kvm)
/* Switch to requested VMID */
kvm = kern_hyp_va(kvm);
- __tlb_switch_to_guest()(kvm, &cxt);
+ __tlb_switch_to_guest(kvm, &cxt);
__tlbi(vmalls12e1is);
dsb(ish);
isb();
- __tlb_switch_to_host()(kvm, &cxt);
+ __tlb_switch_to_host(kvm, &cxt);
}
void __hyp_text __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu)
struct tlb_inv_context cxt;
/* Switch to requested VMID */
- __tlb_switch_to_guest()(kvm, &cxt);
+ __tlb_switch_to_guest(kvm, &cxt);
__tlbi(vmalle1);
dsb(nsh);
isb();
- __tlb_switch_to_host()(kvm, &cxt);
+ __tlb_switch_to_host(kvm, &cxt);
}
void __hyp_text __kvm_flush_vm_context(void)
# SPDX-License-Identifier: GPL-2.0-only
xen-arm-y += $(addprefix ../../arm/xen/, enlighten.o grant-table.o p2m.o mm.o)
obj-y := xen-arm.o hypercall.o
-obj-$(CONFIG_XEN_EFI) += $(addprefix ../../arm/xen/, efi.o)
#include <linux/uaccess.h>
#include <linux/ptrace.h>
-static int align_enable = 1;
-static int align_count;
+static int align_kern_enable = 1;
+static int align_usr_enable = 1;
+static int align_kern_count = 0;
+static int align_usr_count = 0;
static inline uint32_t get_ptreg(struct pt_regs *regs, uint32_t rx)
{
uint32_t val;
int err;
- if (!access_ok((void *)addr, 1))
- return 1;
-
asm volatile (
"movi %0, 0\n"
"1:\n"
{
int err;
- if (!access_ok((void *)addr, 1))
- return 1;
-
asm volatile (
"movi %0, 0\n"
"1:\n"
if (stb_asm(addr, byte3))
return 1;
- align_count++;
-
return 0;
}
uint32_t addr = 0;
if (!user_mode(regs))
+ goto kernel_area;
+
+ if (!align_usr_enable) {
+ pr_err("%s user disabled.\n", __func__);
goto bad_area;
+ }
+
+ align_usr_count++;
ret = get_user(tmp, (uint16_t *)instruction_pointer(regs));
if (ret) {
goto bad_area;
}
+ goto good_area;
+
+kernel_area:
+ if (!align_kern_enable) {
+ pr_err("%s kernel disabled.\n", __func__);
+ goto bad_area;
+ }
+
+ align_kern_count++;
+
+ tmp = *(uint16_t *)instruction_pointer(regs);
+
+good_area:
opcode = (uint32_t)tmp;
rx = opcode & 0xf;
force_sig_fault(SIGBUS, BUS_ADRALN, (void __user *)addr);
}
-static struct ctl_table alignment_tbl[4] = {
+static struct ctl_table alignment_tbl[5] = {
+ {
+ .procname = "kernel_enable",
+ .data = &align_kern_enable,
+ .maxlen = sizeof(align_kern_enable),
+ .mode = 0666,
+ .proc_handler = &proc_dointvec
+ },
+ {
+ .procname = "user_enable",
+ .data = &align_usr_enable,
+ .maxlen = sizeof(align_usr_enable),
+ .mode = 0666,
+ .proc_handler = &proc_dointvec
+ },
{
- .procname = "enable",
- .data = &align_enable,
- .maxlen = sizeof(align_enable),
+ .procname = "kernel_count",
+ .data = &align_kern_count,
+ .maxlen = sizeof(align_kern_count),
.mode = 0666,
.proc_handler = &proc_dointvec
},
{
- .procname = "count",
- .data = &align_count,
- .maxlen = sizeof(align_count),
+ .procname = "user_count",
+ .data = &align_usr_count,
+ .maxlen = sizeof(align_usr_count),
.mode = 0666,
.proc_handler = &proc_dointvec
},
#include <asm/cacheflush.h>
#include <asm/cachectl.h>
+#define PG_dcache_clean PG_arch_1
+
void flush_dcache_page(struct page *page)
{
- struct address_space *mapping = page_mapping(page);
- unsigned long addr;
+ struct address_space *mapping;
- if (mapping && !mapping_mapped(mapping)) {
- set_bit(PG_arch_1, &(page)->flags);
+ if (page == ZERO_PAGE(0))
return;
- }
- /*
- * We could delay the flush for the !page_mapping case too. But that
- * case is for exec env/arg pages and those are %99 certainly going to
- * get faulted into the tlb (and thus flushed) anyways.
- */
- addr = (unsigned long) page_address(page);
- dcache_wb_range(addr, addr + PAGE_SIZE);
+ mapping = page_mapping_file(page);
+
+ if (mapping && !page_mapcount(page))
+ clear_bit(PG_dcache_clean, &page->flags);
+ else {
+ dcache_wbinv_all();
+ if (mapping)
+ icache_inv_all();
+ set_bit(PG_dcache_clean, &page->flags);
+ }
}
+EXPORT_SYMBOL(flush_dcache_page);
-void update_mmu_cache(struct vm_area_struct *vma, unsigned long address,
- pte_t *pte)
+void update_mmu_cache(struct vm_area_struct *vma, unsigned long addr,
+ pte_t *ptep)
{
- unsigned long addr;
+ unsigned long pfn = pte_pfn(*ptep);
struct page *page;
- unsigned long pfn;
- pfn = pte_pfn(*pte);
- if (unlikely(!pfn_valid(pfn)))
+ if (!pfn_valid(pfn))
return;
page = pfn_to_page(pfn);
- addr = (unsigned long) page_address(page);
+ if (page == ZERO_PAGE(0))
+ return;
+
+ if (!test_and_set_bit(PG_dcache_clean, &page->flags))
+ dcache_wbinv_all();
- if (vma->vm_flags & VM_EXEC ||
- pages_do_alias(addr, address & PAGE_MASK))
- cache_wbinv_all();
+ if (page_mapping_file(page)) {
+ if (vma->vm_flags & VM_EXEC)
+ icache_inv_all();
+ }
+}
+
+void flush_kernel_dcache_page(struct page *page)
+{
+ struct address_space *mapping;
+
+ mapping = page_mapping_file(page);
+
+ if (!mapping || mapping_mapped(mapping))
+ dcache_wbinv_all();
+}
+EXPORT_SYMBOL(flush_kernel_dcache_page);
+
+void flush_cache_range(struct vm_area_struct *vma, unsigned long start,
+ unsigned long end)
+{
+ dcache_wbinv_all();
- clear_bit(PG_arch_1, &(page)->flags);
+ if (vma->vm_flags & VM_EXEC)
+ icache_inv_all();
}
#ifndef __ABI_CSKY_CACHEFLUSH_H
#define __ABI_CSKY_CACHEFLUSH_H
-#include <linux/compiler.h>
+#include <linux/mm.h>
#include <asm/string.h>
#include <asm/cache.h>
#define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
extern void flush_dcache_page(struct page *);
-#define flush_cache_mm(mm) cache_wbinv_all()
+#define flush_cache_mm(mm) dcache_wbinv_all()
#define flush_cache_page(vma, page, pfn) cache_wbinv_all()
#define flush_cache_dup_mm(mm) cache_wbinv_all()
+#define ARCH_HAS_FLUSH_KERNEL_DCACHE_PAGE
+extern void flush_kernel_dcache_page(struct page *);
+
+#define flush_dcache_mmap_lock(mapping) xa_lock_irq(&mapping->i_pages)
+#define flush_dcache_mmap_unlock(mapping) xa_unlock_irq(&mapping->i_pages)
+
+static inline void flush_kernel_vmap_range(void *addr, int size)
+{
+ dcache_wbinv_all();
+}
+static inline void invalidate_kernel_vmap_range(void *addr, int size)
+{
+ dcache_wbinv_all();
+}
+
+#define ARCH_HAS_FLUSH_ANON_PAGE
+static inline void flush_anon_page(struct vm_area_struct *vma,
+ struct page *page, unsigned long vmaddr)
+{
+ if (PageAnon(page))
+ cache_wbinv_all();
+}
+
/*
* if (current_mm != vma->mm) cache_wbinv_range(start, end) will be broken.
* Use cache_wbinv_all() here and need to be improved in future.
*/
-#define flush_cache_range(vma, start, end) cache_wbinv_all()
-#define flush_cache_vmap(start, end) cache_wbinv_range(start, end)
-#define flush_cache_vunmap(start, end) cache_wbinv_range(start, end)
+extern void flush_cache_range(struct vm_area_struct *vma, unsigned long start, unsigned long end);
+#define flush_cache_vmap(start, end) cache_wbinv_all()
+#define flush_cache_vunmap(start, end) cache_wbinv_all()
-#define flush_icache_page(vma, page) cache_wbinv_all()
+#define flush_icache_page(vma, page) do {} while (0);
#define flush_icache_range(start, end) cache_wbinv_range(start, end)
-#define flush_icache_user_range(vma, pg, adr, len) \
- cache_wbinv_range(adr, adr + len)
+#define flush_icache_user_range(vma,page,addr,len) \
+ flush_dcache_page(page)
#define copy_from_user_page(vma, page, vaddr, dst, src, len) \
do { \
- cache_wbinv_all(); \
memcpy(dst, src, len); \
- cache_wbinv_all(); \
} while (0)
#define copy_to_user_page(vma, page, vaddr, dst, src, len) \
do { \
- cache_wbinv_all(); \
memcpy(dst, src, len); \
cache_wbinv_all(); \
} while (0)
-#define flush_dcache_mmap_lock(mapping) do {} while (0)
-#define flush_dcache_mmap_unlock(mapping) do {} while (0)
-
#endif /* __ABI_CSKY_CACHEFLUSH_H */
/* SPDX-License-Identifier: GPL-2.0 */
// Copyright (C) 2018 Hangzhou C-SKY Microsystems co.,ltd.
-extern unsigned long shm_align_mask;
+#include <asm/shmparam.h>
+
extern void flush_dcache_page(struct page *page);
static inline unsigned long pages_do_alias(unsigned long addr1,
unsigned long addr2)
{
- return (addr1 ^ addr2) & shm_align_mask;
+ return (addr1 ^ addr2) & (SHMLBA-1);
}
static inline void clear_user_page(void *addr, unsigned long vaddr,
#include <linux/random.h>
#include <linux/io.h>
-unsigned long shm_align_mask = (0x4000 >> 1) - 1; /* Sane caches */
+#define COLOUR_ALIGN(addr,pgoff) \
+ ((((addr)+SHMLBA-1)&~(SHMLBA-1)) + \
+ (((pgoff)<<PAGE_SHIFT) & (SHMLBA-1)))
-#define COLOUR_ALIGN(addr, pgoff) \
- ((((addr) + shm_align_mask) & ~shm_align_mask) + \
- (((pgoff) << PAGE_SHIFT) & shm_align_mask))
-
-unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr,
+/*
+ * We need to ensure that shared mappings are correctly aligned to
+ * avoid aliasing issues with VIPT caches. We need to ensure that
+ * a specific page of an object is always mapped at a multiple of
+ * SHMLBA bytes.
+ *
+ * We unconditionally provide this function for all cases.
+ */
+unsigned long
+arch_get_unmapped_area(struct file *filp, unsigned long addr,
unsigned long len, unsigned long pgoff, unsigned long flags)
{
- struct vm_area_struct *vmm;
- int do_color_align;
+ struct mm_struct *mm = current->mm;
+ struct vm_area_struct *vma;
+ int do_align = 0;
+ struct vm_unmapped_area_info info;
+
+ /*
+ * We only need to do colour alignment if either the I or D
+ * caches alias.
+ */
+ do_align = filp || (flags & MAP_SHARED);
+ /*
+ * We enforce the MAP_FIXED case.
+ */
if (flags & MAP_FIXED) {
- /*
- * We do not accept a shared mapping if it would violate
- * cache aliasing constraints.
- */
- if ((flags & MAP_SHARED) &&
- ((addr - (pgoff << PAGE_SHIFT)) & shm_align_mask))
+ if (flags & MAP_SHARED &&
+ (addr - (pgoff << PAGE_SHIFT)) & (SHMLBA - 1))
return -EINVAL;
return addr;
}
if (len > TASK_SIZE)
return -ENOMEM;
- do_color_align = 0;
- if (filp || (flags & MAP_SHARED))
- do_color_align = 1;
+
if (addr) {
- if (do_color_align)
+ if (do_align)
addr = COLOUR_ALIGN(addr, pgoff);
else
addr = PAGE_ALIGN(addr);
- vmm = find_vma(current->mm, addr);
+
+ vma = find_vma(mm, addr);
if (TASK_SIZE - len >= addr &&
- (!vmm || addr + len <= vmm->vm_start))
+ (!vma || addr + len <= vm_start_gap(vma)))
return addr;
}
- addr = TASK_UNMAPPED_BASE;
- if (do_color_align)
- addr = COLOUR_ALIGN(addr, pgoff);
- else
- addr = PAGE_ALIGN(addr);
- for (vmm = find_vma(current->mm, addr); ; vmm = vmm->vm_next) {
- /* At this point: (!vmm || addr < vmm->vm_end). */
- if (TASK_SIZE - len < addr)
- return -ENOMEM;
- if (!vmm || addr + len <= vmm->vm_start)
- return addr;
- addr = vmm->vm_end;
- if (do_color_align)
- addr = COLOUR_ALIGN(addr, pgoff);
- }
+ info.flags = 0;
+ info.length = len;
+ info.low_limit = mm->mmap_base;
+ info.high_limit = TASK_SIZE;
+ info.align_mask = do_align ? (PAGE_MASK & (SHMLBA - 1)) : 0;
+ info.align_offset = pgoff << PAGE_SHIFT;
+ return vm_unmapped_area(&info);
}
#define nop() asm volatile ("nop\n":::"memory")
/*
- * sync: completion barrier
- * sync.s: completion barrier and shareable to other cores
- * sync.i: completion barrier with flush cpu pipeline
- * sync.is: completion barrier with flush cpu pipeline and shareable to
- * other cores
+ * sync: completion barrier, all sync.xx instructions
+ * guarantee the last response recieved by bus transaction
+ * made by ld/st instructions before sync.s
+ * sync.s: inherit from sync, but also shareable to other cores
+ * sync.i: inherit from sync, but also flush cpu pipeline
+ * sync.is: the same with sync.i + sync.s
*
* bar.brwarw: ordering barrier for all load/store instructions before it
* bar.brwarws: ordering barrier for all load/store instructions before it
*/
#ifdef CONFIG_CPU_HAS_CACHEV2
-#define mb() asm volatile ("bar.brwarw\n":::"memory")
-#define rmb() asm volatile ("bar.brar\n":::"memory")
-#define wmb() asm volatile ("bar.bwaw\n":::"memory")
+#define mb() asm volatile ("sync.s\n":::"memory")
#ifdef CONFIG_SMP
#define __smp_mb() asm volatile ("bar.brwarws\n":::"memory")
void cache_wbinv_all(void);
void dma_wbinv_range(unsigned long start, unsigned long end);
+void dma_inv_range(unsigned long start, unsigned long end);
void dma_wb_range(unsigned long start, unsigned long end);
#endif
#ifndef __ASM_CSKY_IO_H
#define __ASM_CSKY_IO_H
-#include <abi/pgtable-bits.h>
+#include <asm/pgtable.h>
#include <linux/types.h>
#include <linux/version.h>
-extern void __iomem *ioremap(phys_addr_t offset, size_t size);
-
-extern void iounmap(void *addr);
-
-extern int remap_area_pages(unsigned long address, phys_addr_t phys_addr,
- size_t size, unsigned long flags);
-
/*
* I/O memory access primitives. Reads are ordered relative to any
* following Normal memory access. Writes are ordered relative to any prior
#define writel(v,c) ({ wmb(); writel_relaxed((v),(c)); mb(); })
#endif
-#define ioremap_nocache(phy, sz) ioremap(phy, sz)
-#define ioremap_wc ioremap_nocache
-#define ioremap_wt ioremap_nocache
+/*
+ * I/O memory mapping functions.
+ */
+extern void __iomem *ioremap_cache(phys_addr_t addr, size_t size);
+extern void __iomem *__ioremap(phys_addr_t addr, size_t size, pgprot_t prot);
+extern void iounmap(void *addr);
+
+#define ioremap(addr, size) __ioremap((addr), (size), pgprot_noncached(PAGE_KERNEL))
+#define ioremap_wc(addr, size) __ioremap((addr), (size), pgprot_writecombine(PAGE_KERNEL))
+#define ioremap_nocache(addr, size) ioremap((addr), (size))
+#define ioremap_cache ioremap_cache
#include <asm-generic/io.h>
{
unsigned long prot = pgprot_val(_prot);
+ prot = (prot & ~_CACHE_MASK) | _CACHE_UNCACHED | _PAGE_SO;
+
+ return __pgprot(prot);
+}
+
+#define pgprot_writecombine pgprot_writecombine
+static inline pgprot_t pgprot_writecombine(pgprot_t _prot)
+{
+ unsigned long prot = pgprot_val(_prot);
+
prot = (prot & ~_CACHE_MASK) | _CACHE_UNCACHED;
return __pgprot(prot);
#define PTE_INDX_SHIFT 10
#define _PGDIR_SHIFT 22
+.macro zero_fp
+#ifdef CONFIG_STACKTRACE
+ movi r8, 0
+#endif
+.endm
+
.macro tlbop_begin name, val0, val1, val2
ENTRY(csky_\name)
mtcr a3, ss2
SAVE_ALL 0
.endm
.macro tlbop_end is_write
+ zero_fp
RD_MEH a2
psrset ee, ie
mov a0, sp
ENTRY(csky_systemcall)
SAVE_ALL TRAP0_SIZE
+ zero_fp
psrset ee, ie
mov r9, sp
bmaski r10, THREAD_SHIFT
andn r9, r10
- ldw r8, (r9, TINFO_FLAGS)
- ANDI_R3 r8, (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_TRACEPOINT | _TIF_SYSCALL_AUDIT)
- cmpnei r8, 0
+ ldw r12, (r9, TINFO_FLAGS)
+ ANDI_R3 r12, (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_TRACEPOINT | _TIF_SYSCALL_AUDIT)
+ cmpnei r12, 0
bt csky_syscall_trace
#if defined(__CSKYABIV2__)
subi sp, 8
ENTRY(ret_from_kernel_thread)
jbsr schedule_tail
- mov a0, r8
+ mov a0, r10
jsr r9
jbsr ret_from_exception
mov r9, sp
bmaski r10, THREAD_SHIFT
andn r9, r10
- ldw r8, (r9, TINFO_FLAGS)
- ANDI_R3 r8, (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_TRACEPOINT | _TIF_SYSCALL_AUDIT)
- cmpnei r8, 0
+ ldw r12, (r9, TINFO_FLAGS)
+ ANDI_R3 r12, (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_TRACEPOINT | _TIF_SYSCALL_AUDIT)
+ cmpnei r12, 0
bf ret_from_exception
mov a0, sp /* sp = pt_regs pointer */
jbsr syscall_trace_exit
bmaski r10, THREAD_SHIFT
andn r9, r10
- ldw r8, (r9, TINFO_FLAGS)
- andi r8, (_TIF_SIGPENDING | _TIF_NOTIFY_RESUME | _TIF_NEED_RESCHED)
- cmpnei r8, 0
+ ldw r12, (r9, TINFO_FLAGS)
+ andi r12, (_TIF_SIGPENDING | _TIF_NOTIFY_RESUME | _TIF_NEED_RESCHED)
+ cmpnei r12, 0
bt exit_work
1:
RESTORE_ALL
lrw syscallid, ret_from_exception
mov lr, syscallid
- btsti r8, TIF_NEED_RESCHED
+ btsti r12, TIF_NEED_RESCHED
bt work_resched
mov a0, sp
- mov a1, r8
+ mov a1, r12
jmpi do_notify_resume
work_resched:
ENTRY(csky_trap)
SAVE_ALL 0
+ zero_fp
psrset ee
mov a0, sp /* Push Stack pointer arg */
jbsr trap_c /* Call C-level trap handler */
ENTRY(csky_irq)
SAVE_ALL 0
+ zero_fp
psrset ee
#ifdef CONFIG_PREEMPT
* Get task_struct->stack.preempt_count for current,
* and increase 1.
*/
- ldw r8, (r9, TINFO_PREEMPT)
- addi r8, 1
- stw r8, (r9, TINFO_PREEMPT)
+ ldw r12, (r9, TINFO_PREEMPT)
+ addi r12, 1
+ stw r12, (r9, TINFO_PREEMPT)
#endif
mov a0, sp
jbsr csky_do_IRQ
#ifdef CONFIG_PREEMPT
- subi r8, 1
- stw r8, (r9, TINFO_PREEMPT)
- cmpnei r8, 0
+ subi r12, 1
+ stw r12, (r9, TINFO_PREEMPT)
+ cmpnei r12, 0
bt 2f
- ldw r8, (r9, TINFO_FLAGS)
- btsti r8, TIF_NEED_RESCHED
+ ldw r12, (r9, TINFO_FLAGS)
+ btsti r12, TIF_NEED_RESCHED
bf 2f
-1:
jbsr preempt_schedule_irq /* irq en/disable is done inside */
- ldw r7, (r9, TINFO_FLAGS) /* get new tasks TI_FLAGS */
- btsti r7, TIF_NEED_RESCHED
- bt 1b /* go again */
#endif
2:
jmpi ret_from_exception
&csky_pmu.count_width)) {
csky_pmu.count_width = DEFAULT_COUNT_WIDTH;
}
- csky_pmu.max_period = BIT(csky_pmu.count_width) - 1;
+ csky_pmu.max_period = BIT_ULL(csky_pmu.count_width) - 1;
csky_pmu.plat_device = pdev;
return ret;
}
-const static struct of_device_id csky_pmu_of_device_ids[] = {
+static const struct of_device_id csky_pmu_of_device_ids[] = {
{.compatible = "csky,csky-pmu"},
{},
};
if (unlikely(p->flags & PF_KTHREAD)) {
memset(childregs, 0, sizeof(struct pt_regs));
childstack->r15 = (unsigned long) ret_from_kernel_thread;
- childstack->r8 = kthread_arg;
+ childstack->r10 = kthread_arg;
childstack->r9 = usp;
childregs->sr = mfcr("psr");
} else {
cache_op_range(start, end, DATA_CACHE|CACHE_CLR|CACHE_INV, 1);
}
+void dma_inv_range(unsigned long start, unsigned long end)
+{
+ cache_op_range(start, end, DATA_CACHE|CACHE_CLR|CACHE_INV, 1);
+}
+
void dma_wb_range(unsigned long start, unsigned long end)
{
- cache_op_range(start, end, DATA_CACHE|CACHE_INV, 1);
+ cache_op_range(start, end, DATA_CACHE|CACHE_CLR|CACHE_INV, 1);
}
sync_is();
}
+void dma_inv_range(unsigned long start, unsigned long end)
+{
+ unsigned long i = start & ~(L1_CACHE_BYTES - 1);
+
+ for (; i < end; i += L1_CACHE_BYTES)
+ asm volatile("dcache.iva %0\n"::"r"(i):"memory");
+ sync_is();
+}
+
void dma_wb_range(unsigned long start, unsigned long end)
{
unsigned long i = start & ~(L1_CACHE_BYTES - 1);
for (; i < end; i += L1_CACHE_BYTES)
- asm volatile("dcache.civa %0\n"::"r"(i):"memory");
+ asm volatile("dcache.cva %0\n"::"r"(i):"memory");
sync_is();
}
#include <linux/version.h>
#include <asm/cache.h>
-void arch_dma_prep_coherent(struct page *page, size_t size)
-{
- if (PageHighMem(page)) {
- unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
-
- do {
- void *ptr = kmap_atomic(page);
- size_t _size = (size < PAGE_SIZE) ? size : PAGE_SIZE;
-
- memset(ptr, 0, _size);
- dma_wbinv_range((unsigned long)ptr,
- (unsigned long)ptr + _size);
-
- kunmap_atomic(ptr);
-
- page++;
- size -= PAGE_SIZE;
- count--;
- } while (count);
- } else {
- void *ptr = page_address(page);
-
- memset(ptr, 0, size);
- dma_wbinv_range((unsigned long)ptr, (unsigned long)ptr + size);
- }
-}
-
static inline void cache_op(phys_addr_t paddr, size_t size,
void (*fn)(unsigned long start, unsigned long end))
{
- struct page *page = pfn_to_page(paddr >> PAGE_SHIFT);
- unsigned int offset = paddr & ~PAGE_MASK;
- size_t left = size;
- unsigned long start;
+ struct page *page = phys_to_page(paddr);
+ void *start = __va(page_to_phys(page));
+ unsigned long offset = offset_in_page(paddr);
+ size_t left = size;
do {
size_t len = left;
+ if (offset + len > PAGE_SIZE)
+ len = PAGE_SIZE - offset;
+
if (PageHighMem(page)) {
- void *addr;
+ start = kmap_atomic(page);
- if (offset + len > PAGE_SIZE) {
- if (offset >= PAGE_SIZE) {
- page += offset >> PAGE_SHIFT;
- offset &= ~PAGE_MASK;
- }
- len = PAGE_SIZE - offset;
- }
+ fn((unsigned long)start + offset,
+ (unsigned long)start + offset + len);
- addr = kmap_atomic(page);
- start = (unsigned long)(addr + offset);
- fn(start, start + len);
- kunmap_atomic(addr);
+ kunmap_atomic(start);
} else {
- start = (unsigned long)phys_to_virt(paddr);
- fn(start, start + size);
+ fn((unsigned long)start + offset,
+ (unsigned long)start + offset + len);
}
offset = 0;
+
page++;
+ start += PAGE_SIZE;
left -= len;
} while (left);
}
+static void dma_wbinv_set_zero_range(unsigned long start, unsigned long end)
+{
+ memset((void *)start, 0, end - start);
+ dma_wbinv_range(start, end);
+}
+
+void arch_dma_prep_coherent(struct page *page, size_t size)
+{
+ cache_op(page_to_phys(page), size, dma_wbinv_set_zero_range);
+}
+
void arch_sync_dma_for_device(struct device *dev, phys_addr_t paddr,
size_t size, enum dma_data_direction dir)
{
{
switch (dir) {
case DMA_TO_DEVICE:
- cache_op(paddr, size, dma_wb_range);
- break;
+ return;
case DMA_FROM_DEVICE:
case DMA_BIDIRECTIONAL:
- cache_op(paddr, size, dma_wbinv_range);
+ cache_op(paddr, size, dma_inv_range);
break;
default:
BUG();
mem_init_print_info(NULL);
}
-#ifdef CONFIG_BLK_DEV_INITRD
-void free_initrd_mem(unsigned long start, unsigned long end)
-{
- if (start < end)
- pr_info("Freeing initrd memory: %ldk freed\n",
- (end - start) >> 10);
-
- for (; start < end; start += PAGE_SIZE) {
- ClearPageReserved(virt_to_page(start));
- init_page_count(virt_to_page(start));
- free_page(start);
- totalram_pages_inc();
- }
-}
-#endif
-
extern char __init_begin[], __init_end[];
void free_initmem(void)
#include <asm/pgtable.h>
-void __iomem *ioremap(phys_addr_t addr, size_t size)
+static void __iomem *__ioremap_caller(phys_addr_t addr, size_t size,
+ pgprot_t prot, void *caller)
{
phys_addr_t last_addr;
unsigned long offset, vaddr;
struct vm_struct *area;
- pgprot_t prot;
last_addr = addr + size - 1;
if (!size || last_addr < addr)
addr &= PAGE_MASK;
size = PAGE_ALIGN(size + offset);
- area = get_vm_area_caller(size, VM_ALLOC, __builtin_return_address(0));
+ area = get_vm_area_caller(size, VM_IOREMAP, caller);
if (!area)
return NULL;
vaddr = (unsigned long)area->addr;
- prot = __pgprot(_PAGE_PRESENT | __READABLE | __WRITEABLE |
- _PAGE_GLOBAL | _CACHE_UNCACHED | _PAGE_SO);
-
if (ioremap_page_range(vaddr, vaddr + size, addr, prot)) {
free_vm_area(area);
return NULL;
return (void __iomem *)(vaddr + offset);
}
-EXPORT_SYMBOL(ioremap);
+
+void __iomem *__ioremap(phys_addr_t phys_addr, size_t size, pgprot_t prot)
+{
+ return __ioremap_caller(phys_addr, size, prot,
+ __builtin_return_address(0));
+}
+EXPORT_SYMBOL(__ioremap);
+
+void __iomem *ioremap_cache(phys_addr_t phys_addr, size_t size)
+{
+ return __ioremap_caller(phys_addr, size, PAGE_KERNEL,
+ __builtin_return_address(0));
+}
+EXPORT_SYMBOL(ioremap_cache);
void iounmap(void __iomem *addr)
{
unsigned long size, pgprot_t vma_prot)
{
if (!pfn_valid(pfn)) {
- vma_prot.pgprot |= _PAGE_SO;
return pgprot_noncached(vma_prot);
} else if (file->f_flags & O_SYNC) {
- return pgprot_noncached(vma_prot);
+ return pgprot_writecombine(vma_prot);
}
return vma_prot;
miscintc: interrupt-controller@18060010 {
compatible = "qca,ar7240-misc-intc";
- reg = <0x18060010 0x4>;
+ reg = <0x18060010 0x8>;
interrupt-parent = <&cpuintc>;
interrupts = <6>;
void __init prom_free_prom_memory(void)
{
- unsigned long addr;
int i;
if (prom_flags & PROM_FLAG_DONT_FREE_TEMP)
#include <asm/octeon/octeon-feature.h>
#include <asm/octeon/cvmx-ipd-defs.h>
+#include <asm/octeon/cvmx-pip-defs.h>
enum cvmx_ipd_mode {
CVMX_IPD_OPC_MODE_STT = 0LL, /* All blocks DRAM, not cached in L2 */
# endif
#define __ARCH_WANT_SYS_FORK
#define __ARCH_WANT_SYS_CLONE
+#define __ARCH_WANT_SYS_CLONE3
/* whitelists for checksyscalls */
#define __IGNORE_fadvise64_64
static char daddiwar[] __initdata =
"Enable CPU_DADDI_WORKAROUNDS to rectify.";
-static inline void align_mod(const int align, const int mod)
+static __always_inline __init
+void align_mod(const int align, const int mod)
{
asm volatile(
".set push\n\t"
: "n"(align), "n"(mod));
}
-static __always_inline void mult_sh_align_mod(long *v1, long *v2, long *w,
- const int align, const int mod)
+static __always_inline __init
+void mult_sh_align_mod(long *v1, long *v2, long *w,
+ const int align, const int mod)
{
unsigned long flags;
int m1, m2;
*w = lw;
}
-static inline void check_mult_sh(void)
+static __always_inline __init void check_mult_sh(void)
{
long v1[8], v2[8], w[8];
int bug, fix, i;
exception_exit(prev_state);
}
-static inline void check_daddi(void)
+static __init void check_daddi(void)
{
extern asmlinkage void handle_daddi_ov(void);
unsigned long flags;
int daddiu_bug = IS_ENABLED(CONFIG_CPU_MIPSR6) ? 0 : -1;
-static inline void check_daddiu(void)
+static __init void check_daddiu(void)
{
long v, w, tmp;
return;
}
+ if (start < PHYS_OFFSET)
+ return;
+
memblock_add(start, size);
/* Reserve any memory except the ordinary RAM ranges. */
switch (type) {
* Reserve any memory between the start of RAM and PHYS_OFFSET
*/
if (ramstart > PHYS_OFFSET)
- memblock_reserve(PHYS_OFFSET, PFN_UP(ramstart) - PHYS_OFFSET);
+ memblock_reserve(PHYS_OFFSET, ramstart - PHYS_OFFSET);
if (PFN_UP(ramstart) > ARCH_PFN_OFFSET) {
pr_info("Wasting %lu bytes for tracking %lu unused pages\n",
save_static_function(sys_fork);
save_static_function(sys_clone);
+save_static_function(sys_clone3);
SYSCALL_DEFINE1(set_thread_area, unsigned long, addr)
{
432 n32 fsmount sys_fsmount
433 n32 fspick sys_fspick
434 n32 pidfd_open sys_pidfd_open
-# 435 reserved for clone3
+435 n32 clone3 __sys_clone3
432 n64 fsmount sys_fsmount
433 n64 fspick sys_fspick
434 n64 pidfd_open sys_pidfd_open
-# 435 reserved for clone3
+435 n64 clone3 __sys_clone3
432 o32 fsmount sys_fsmount
433 o32 fspick sys_fspick
434 o32 pidfd_open sys_pidfd_open
-# 435 reserved for clone3
+435 o32 clone3 __sys_clone3
*/
#include <linux/fs.h>
#include <linux/fcntl.h>
+#include <linux/memblock.h>
#include <linux/mm.h>
#include <asm/bootinfo.h>
node_id = loongson_memmap->map[i].node_id;
mem_type = loongson_memmap->map[i].mem_type;
- if (node_id == 0) {
- switch (mem_type) {
- case SYSTEM_RAM_LOW:
- add_memory_region(loongson_memmap->map[i].mem_start,
- (u64)loongson_memmap->map[i].mem_size << 20,
- BOOT_MEM_RAM);
- break;
- case SYSTEM_RAM_HIGH:
- add_memory_region(loongson_memmap->map[i].mem_start,
- (u64)loongson_memmap->map[i].mem_size << 20,
- BOOT_MEM_RAM);
- break;
- case SYSTEM_RAM_RESERVED:
- add_memory_region(loongson_memmap->map[i].mem_start,
- (u64)loongson_memmap->map[i].mem_size << 20,
- BOOT_MEM_RESERVED);
- break;
- }
+ if (node_id != 0)
+ continue;
+
+ switch (mem_type) {
+ case SYSTEM_RAM_LOW:
+ memblock_add(loongson_memmap->map[i].mem_start,
+ (u64)loongson_memmap->map[i].mem_size << 20);
+ break;
+ case SYSTEM_RAM_HIGH:
+ memblock_add(loongson_memmap->map[i].mem_start,
+ (u64)loongson_memmap->map[i].mem_size << 20);
+ break;
+ case SYSTEM_RAM_RESERVED:
+ memblock_reserve(loongson_memmap->map[i].mem_start,
+ (u64)loongson_memmap->map[i].mem_size << 20);
+ break;
}
}
}
}
module_init(serial_init);
-static void __init serial_exit(void)
+static void __exit serial_exit(void)
{
platform_device_unregister(&uart8250_device);
}
(u32)node_id, mem_type, mem_start, mem_size);
pr_info(" start_pfn:0x%llx, end_pfn:0x%llx, num_physpages:0x%lx\n",
start_pfn, end_pfn, num_physpages);
- add_memory_region((node_id << 44) + mem_start,
- (u64)mem_size << 20, BOOT_MEM_RAM);
memblock_add_node(PFN_PHYS(start_pfn),
PFN_PHYS(end_pfn - start_pfn), node);
break;
(u32)node_id, mem_type, mem_start, mem_size);
pr_info(" start_pfn:0x%llx, end_pfn:0x%llx, num_physpages:0x%lx\n",
start_pfn, end_pfn, num_physpages);
- add_memory_region((node_id << 44) + mem_start,
- (u64)mem_size << 20, BOOT_MEM_RAM);
memblock_add_node(PFN_PHYS(start_pfn),
PFN_PHYS(end_pfn - start_pfn), node);
break;
case SYSTEM_RAM_RESERVED:
pr_info("Node%d: mem_type:%d, mem_start:0x%llx, mem_size:0x%llx MB\n",
(u32)node_id, mem_type, mem_start, mem_size);
- add_memory_region((node_id << 44) + mem_start,
- (u64)mem_size << 20, BOOT_MEM_RESERVED);
memblock_reserve(((node_id << 44) + mem_start),
mem_size << 20);
break;
NODE_DATA(node)->node_start_pfn = start_pfn;
NODE_DATA(node)->node_spanned_pages = end_pfn - start_pfn;
- free_bootmem_with_active_regions(node, end_pfn);
-
if (node == 0) {
/* kernel end address */
unsigned long kernel_end_pfn = PFN_UP(__pa_symbol(&_end));
memblock_reserve((node_addrspace_offset | 0xfe000000),
32 << 20);
}
-
- sparse_memory_present_with_active_regions(node);
}
static __init void prom_meminit(void)
cpumask_clear(&__node_data[(node)]->cpumask);
}
}
+ memblocks_present();
max_low_pfn = PHYS_PFN(memblock_end_of_DRAM());
for (cpu = 0; cpu < loongson_sysconf.nr_cpus; cpu++) {
/* memory blocks */
struct prom_pmemblock mdesc[PROM_MAX_PMEMBLOCKS];
+#define MAX_PROM_MEM 5
static phys_addr_t prom_mem_base[MAX_PROM_MEM] __initdata;
static phys_addr_t prom_mem_size[MAX_PROM_MEM] __initdata;
static unsigned int nr_prom_mem __initdata;
p++;
if (type == BOOT_MEM_ROM_DATA) {
- if (nr_prom_mem >= 5) {
+ if (nr_prom_mem >= MAX_PROM_MEM) {
pr_err("Too many ROM DATA regions");
continue;
}
char *ptr;
int len = 0;
int i;
- unsigned long addr;
/*
* preserve environment variables and command line from pmon/bbload
ifndef CONFIG_CPU_MIPSR6
ifeq ($(call ld-ifversion, -lt, 225000000, y),y)
$(warning MIPS VDSO requires binutils >= 2.25)
- obj-vdso-y := $(filter-out gettimeofday.o, $(obj-vdso-y))
+ obj-vdso-y := $(filter-out vgettimeofday.o, $(obj-vdso-y))
ccflags-vdso += -DDISABLE_MIPS_VDSO
endif
endif
+++ /dev/null
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * Copyright (C) 2015 Imagination Technologies
- * Author: Alex Smith <alex.smith@imgtec.com>
- */
-
-#include "vdso.h"
-
-#include <linux/compiler.h>
-#include <linux/time.h>
-
-#include <asm/clocksource.h>
-#include <asm/io.h>
-#include <asm/unistd.h>
-#include <asm/vdso.h>
-
-#ifdef CONFIG_MIPS_CLOCK_VSYSCALL
-
-static __always_inline long gettimeofday_fallback(struct timeval *_tv,
- struct timezone *_tz)
-{
- register struct timezone *tz asm("a1") = _tz;
- register struct timeval *tv asm("a0") = _tv;
- register long ret asm("v0");
- register long nr asm("v0") = __NR_gettimeofday;
- register long error asm("a3");
-
- asm volatile(
- " syscall\n"
- : "=r" (ret), "=r" (error)
- : "r" (tv), "r" (tz), "r" (nr)
- : "$1", "$3", "$8", "$9", "$10", "$11", "$12", "$13",
- "$14", "$15", "$24", "$25", "hi", "lo", "memory");
-
- return error ? -ret : ret;
-}
-
-#endif
-
-static __always_inline long clock_gettime_fallback(clockid_t _clkid,
- struct timespec *_ts)
-{
- register struct timespec *ts asm("a1") = _ts;
- register clockid_t clkid asm("a0") = _clkid;
- register long ret asm("v0");
- register long nr asm("v0") = __NR_clock_gettime;
- register long error asm("a3");
-
- asm volatile(
- " syscall\n"
- : "=r" (ret), "=r" (error)
- : "r" (clkid), "r" (ts), "r" (nr)
- : "$1", "$3", "$8", "$9", "$10", "$11", "$12", "$13",
- "$14", "$15", "$24", "$25", "hi", "lo", "memory");
-
- return error ? -ret : ret;
-}
-
-static __always_inline int do_realtime_coarse(struct timespec *ts,
- const union mips_vdso_data *data)
-{
- u32 start_seq;
-
- do {
- start_seq = vdso_data_read_begin(data);
-
- ts->tv_sec = data->xtime_sec;
- ts->tv_nsec = data->xtime_nsec >> data->cs_shift;
- } while (vdso_data_read_retry(data, start_seq));
-
- return 0;
-}
-
-static __always_inline int do_monotonic_coarse(struct timespec *ts,
- const union mips_vdso_data *data)
-{
- u32 start_seq;
- u64 to_mono_sec;
- u64 to_mono_nsec;
-
- do {
- start_seq = vdso_data_read_begin(data);
-
- ts->tv_sec = data->xtime_sec;
- ts->tv_nsec = data->xtime_nsec >> data->cs_shift;
-
- to_mono_sec = data->wall_to_mono_sec;
- to_mono_nsec = data->wall_to_mono_nsec;
- } while (vdso_data_read_retry(data, start_seq));
-
- ts->tv_sec += to_mono_sec;
- timespec_add_ns(ts, to_mono_nsec);
-
- return 0;
-}
-
-#ifdef CONFIG_CSRC_R4K
-
-static __always_inline u64 read_r4k_count(void)
-{
- unsigned int count;
-
- __asm__ __volatile__(
- " .set push\n"
- " .set mips32r2\n"
- " rdhwr %0, $2\n"
- " .set pop\n"
- : "=r" (count));
-
- return count;
-}
-
-#endif
-
-#ifdef CONFIG_CLKSRC_MIPS_GIC
-
-static __always_inline u64 read_gic_count(const union mips_vdso_data *data)
-{
- void __iomem *gic = get_gic(data);
- u32 hi, hi2, lo;
-
- do {
- hi = __raw_readl(gic + sizeof(lo));
- lo = __raw_readl(gic);
- hi2 = __raw_readl(gic + sizeof(lo));
- } while (hi2 != hi);
-
- return (((u64)hi) << 32) + lo;
-}
-
-#endif
-
-static __always_inline u64 get_ns(const union mips_vdso_data *data)
-{
- u64 cycle_now, delta, nsec;
-
- switch (data->clock_mode) {
-#ifdef CONFIG_CSRC_R4K
- case VDSO_CLOCK_R4K:
- cycle_now = read_r4k_count();
- break;
-#endif
-#ifdef CONFIG_CLKSRC_MIPS_GIC
- case VDSO_CLOCK_GIC:
- cycle_now = read_gic_count(data);
- break;
-#endif
- default:
- return 0;
- }
-
- delta = (cycle_now - data->cs_cycle_last) & data->cs_mask;
-
- nsec = (delta * data->cs_mult) + data->xtime_nsec;
- nsec >>= data->cs_shift;
-
- return nsec;
-}
-
-static __always_inline int do_realtime(struct timespec *ts,
- const union mips_vdso_data *data)
-{
- u32 start_seq;
- u64 ns;
-
- do {
- start_seq = vdso_data_read_begin(data);
-
- if (data->clock_mode == VDSO_CLOCK_NONE)
- return -ENOSYS;
-
- ts->tv_sec = data->xtime_sec;
- ns = get_ns(data);
- } while (vdso_data_read_retry(data, start_seq));
-
- ts->tv_nsec = 0;
- timespec_add_ns(ts, ns);
-
- return 0;
-}
-
-static __always_inline int do_monotonic(struct timespec *ts,
- const union mips_vdso_data *data)
-{
- u32 start_seq;
- u64 ns;
- u64 to_mono_sec;
- u64 to_mono_nsec;
-
- do {
- start_seq = vdso_data_read_begin(data);
-
- if (data->clock_mode == VDSO_CLOCK_NONE)
- return -ENOSYS;
-
- ts->tv_sec = data->xtime_sec;
- ns = get_ns(data);
-
- to_mono_sec = data->wall_to_mono_sec;
- to_mono_nsec = data->wall_to_mono_nsec;
- } while (vdso_data_read_retry(data, start_seq));
-
- ts->tv_sec += to_mono_sec;
- ts->tv_nsec = 0;
- timespec_add_ns(ts, ns + to_mono_nsec);
-
- return 0;
-}
-
-#ifdef CONFIG_MIPS_CLOCK_VSYSCALL
-
-/*
- * This is behind the ifdef so that we don't provide the symbol when there's no
- * possibility of there being a usable clocksource, because there's nothing we
- * can do without it. When libc fails the symbol lookup it should fall back on
- * the standard syscall path.
- */
-int __vdso_gettimeofday(struct timeval *tv, struct timezone *tz)
-{
- const union mips_vdso_data *data = get_vdso_data();
- struct timespec ts;
- int ret;
-
- ret = do_realtime(&ts, data);
- if (ret)
- return gettimeofday_fallback(tv, tz);
-
- if (tv) {
- tv->tv_sec = ts.tv_sec;
- tv->tv_usec = ts.tv_nsec / 1000;
- }
-
- if (tz) {
- tz->tz_minuteswest = data->tz_minuteswest;
- tz->tz_dsttime = data->tz_dsttime;
- }
-
- return 0;
-}
-
-#endif /* CONFIG_MIPS_CLOCK_VSYSCALL */
-
-int __vdso_clock_gettime(clockid_t clkid, struct timespec *ts)
-{
- const union mips_vdso_data *data = get_vdso_data();
- int ret = -1;
-
- switch (clkid) {
- case CLOCK_REALTIME_COARSE:
- ret = do_realtime_coarse(ts, data);
- break;
- case CLOCK_MONOTONIC_COARSE:
- ret = do_monotonic_coarse(ts, data);
- break;
- case CLOCK_REALTIME:
- ret = do_realtime(ts, data);
- break;
- case CLOCK_MONOTONIC:
- ret = do_monotonic(ts, data);
- break;
- default:
- break;
- }
-
- if (ret)
- ret = clock_gettime_fallback(clkid, ts);
-
- return ret;
-}
dtb_passed = r6;
if (r7)
- strncpy(cmdline_passed, (char *)r7, COMMAND_LINE_SIZE);
+ strlcpy(cmdline_passed, (char *)r7, COMMAND_LINE_SIZE);
}
#endif
#ifndef CONFIG_CMDLINE_FORCE
if (cmdline_passed[0])
- strncpy(boot_command_line, cmdline_passed, COMMAND_LINE_SIZE);
+ strlcpy(boot_command_line, cmdline_passed, COMMAND_LINE_SIZE);
#ifdef CONFIG_NIOS2_CMDLINE_IGNORE_DTB
else
- strncpy(boot_command_line, CONFIG_CMDLINE, COMMAND_LINE_SIZE);
+ strlcpy(boot_command_line, CONFIG_CMDLINE, COMMAND_LINE_SIZE);
#endif
#endif
BOOTAFLAGS := -D__ASSEMBLY__ $(BOOTCFLAGS) -nostdinc
-BOOTARFLAGS := -cr$(KBUILD_ARFLAGS)
+BOOTARFLAGS := -crD
ifdef CONFIG_CC_IS_CLANG
BOOTCFLAGS += $(CLANG_FLAGS)
extern pgtable_t radix__pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp);
extern pmd_t radix__pmdp_huge_get_and_clear(struct mm_struct *mm,
unsigned long addr, pmd_t *pmdp);
-extern int radix__has_transparent_hugepage(void);
+static inline int radix__has_transparent_hugepage(void)
+{
+ /* For radix 2M at PMD level means thp */
+ if (mmu_psize_defs[MMU_PAGE_2M].shift == PMD_SHIFT)
+ return 1;
+ return 0;
+}
#endif
extern int __meminit radix__vmemmap_create_mapping(unsigned long start,
#define CPU_FTR_POWER9_DD2_1 LONG_ASM_CONST(0x0000080000000000)
#define CPU_FTR_P9_TM_HV_ASSIST LONG_ASM_CONST(0x0000100000000000)
#define CPU_FTR_P9_TM_XER_SO_BUG LONG_ASM_CONST(0x0000200000000000)
-#define CPU_FTR_P9_TLBIE_BUG LONG_ASM_CONST(0x0000400000000000)
+#define CPU_FTR_P9_TLBIE_STQ_BUG LONG_ASM_CONST(0x0000400000000000)
#define CPU_FTR_P9_TIDR LONG_ASM_CONST(0x0000800000000000)
+#define CPU_FTR_P9_TLBIE_ERAT_BUG LONG_ASM_CONST(0x0001000000000000)
#ifndef __ASSEMBLY__
CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \
CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_ARCH_207S | \
CPU_FTR_TM_COMP | CPU_FTR_ARCH_300 | CPU_FTR_PKEY | \
- CPU_FTR_P9_TLBIE_BUG | CPU_FTR_P9_TIDR)
+ CPU_FTR_P9_TLBIE_STQ_BUG | CPU_FTR_P9_TLBIE_ERAT_BUG | CPU_FTR_P9_TIDR)
#define CPU_FTRS_POWER9_DD2_0 CPU_FTRS_POWER9
#define CPU_FTRS_POWER9_DD2_1 (CPU_FTRS_POWER9 | CPU_FTR_POWER9_DD2_1)
#define CPU_FTRS_POWER9_DD2_2 (CPU_FTRS_POWER9 | CPU_FTR_POWER9_DD2_1 | \
return xirr;
}
-static inline void kvmppc_set_host_ipi(int cpu, u8 host_ipi)
+/*
+ * To avoid the need to unnecessarily exit fully to the host kernel, an IPI to
+ * a CPU thread that's running/napping inside of a guest is by default regarded
+ * as a request to wake the CPU (if needed) and continue execution within the
+ * guest, potentially to process new state like externally-generated
+ * interrupts or IPIs sent from within the guest itself (e.g. H_PROD/H_IPI).
+ *
+ * To force an exit to the host kernel, kvmppc_set_host_ipi() must be called
+ * prior to issuing the IPI to set the corresponding 'host_ipi' flag in the
+ * target CPU's PACA. To avoid unnecessary exits to the host, this flag should
+ * be immediately cleared via kvmppc_clear_host_ipi() by the IPI handler on
+ * the receiving side prior to processing the IPI work.
+ *
+ * NOTE:
+ *
+ * We currently issue an smp_mb() at the beginning of kvmppc_set_host_ipi().
+ * This is to guard against sequences such as the following:
+ *
+ * CPU
+ * X: smp_muxed_ipi_set_message():
+ * X: smp_mb()
+ * X: message[RESCHEDULE] = 1
+ * X: doorbell_global_ipi(42):
+ * X: kvmppc_set_host_ipi(42)
+ * X: ppc_msgsnd_sync()/smp_mb()
+ * X: ppc_msgsnd() -> 42
+ * 42: doorbell_exception(): // from CPU X
+ * 42: ppc_msgsync()
+ * 105: smp_muxed_ipi_set_message():
+ * 105: smb_mb()
+ * // STORE DEFERRED DUE TO RE-ORDERING
+ * --105: message[CALL_FUNCTION] = 1
+ * | 105: doorbell_global_ipi(42):
+ * | 105: kvmppc_set_host_ipi(42)
+ * | 42: kvmppc_clear_host_ipi(42)
+ * | 42: smp_ipi_demux_relaxed()
+ * | 42: // returns to executing guest
+ * | // RE-ORDERED STORE COMPLETES
+ * ->105: message[CALL_FUNCTION] = 1
+ * 105: ppc_msgsnd_sync()/smp_mb()
+ * 105: ppc_msgsnd() -> 42
+ * 42: local_paca->kvm_hstate.host_ipi == 0 // IPI ignored
+ * 105: // hangs waiting on 42 to process messages/call_single_queue
+ *
+ * We also issue an smp_mb() at the end of kvmppc_clear_host_ipi(). This is
+ * to guard against sequences such as the following (as well as to create
+ * a read-side pairing with the barrier in kvmppc_set_host_ipi()):
+ *
+ * CPU
+ * X: smp_muxed_ipi_set_message():
+ * X: smp_mb()
+ * X: message[RESCHEDULE] = 1
+ * X: doorbell_global_ipi(42):
+ * X: kvmppc_set_host_ipi(42)
+ * X: ppc_msgsnd_sync()/smp_mb()
+ * X: ppc_msgsnd() -> 42
+ * 42: doorbell_exception(): // from CPU X
+ * 42: ppc_msgsync()
+ * // STORE DEFERRED DUE TO RE-ORDERING
+ * -- 42: kvmppc_clear_host_ipi(42)
+ * | 42: smp_ipi_demux_relaxed()
+ * | 105: smp_muxed_ipi_set_message():
+ * | 105: smb_mb()
+ * | 105: message[CALL_FUNCTION] = 1
+ * | 105: doorbell_global_ipi(42):
+ * | 105: kvmppc_set_host_ipi(42)
+ * | // RE-ORDERED STORE COMPLETES
+ * -> 42: kvmppc_clear_host_ipi(42)
+ * 42: // returns to executing guest
+ * 105: ppc_msgsnd_sync()/smp_mb()
+ * 105: ppc_msgsnd() -> 42
+ * 42: local_paca->kvm_hstate.host_ipi == 0 // IPI ignored
+ * 105: // hangs waiting on 42 to process messages/call_single_queue
+ */
+static inline void kvmppc_set_host_ipi(int cpu)
{
- paca_ptrs[cpu]->kvm_hstate.host_ipi = host_ipi;
+ /*
+ * order stores of IPI messages vs. setting of host_ipi flag
+ *
+ * pairs with the barrier in kvmppc_clear_host_ipi()
+ */
+ smp_mb();
+ paca_ptrs[cpu]->kvm_hstate.host_ipi = 1;
+}
+
+static inline void kvmppc_clear_host_ipi(int cpu)
+{
+ paca_ptrs[cpu]->kvm_hstate.host_ipi = 0;
+ /*
+ * order clearing of host_ipi flag vs. processing of IPI messages
+ *
+ * pairs with the barrier in kvmppc_set_host_ipi()
+ */
+ smp_mb();
}
static inline void kvmppc_fast_vcpu_kick(struct kvm_vcpu *vcpu)
return 0;
}
-static inline void kvmppc_set_host_ipi(int cpu, u8 host_ipi)
+static inline void kvmppc_set_host_ipi(int cpu)
+{}
+
+static inline void kvmppc_clear_host_ipi(int cpu)
{}
static inline void kvmppc_fast_vcpu_kick(struct kvm_vcpu *vcpu)
#define HMER_DEBUG_TRIG (1ul << (63 - 17)) /* Debug trigger */
#define SPRN_HMEER 0x151 /* Hyp maintenance exception enable reg */
#define SPRN_PCR 0x152 /* Processor compatibility register */
-#define PCR_VEC_DIS (1ul << (63-0)) /* Vec. disable (bit NA since POWER8) */
-#define PCR_VSX_DIS (1ul << (63-1)) /* VSX disable (bit NA since POWER8) */
-#define PCR_TM_DIS (1ul << (63-2)) /* Trans. memory disable (POWER8) */
+#define PCR_VEC_DIS (__MASK(63-0)) /* Vec. disable (bit NA since POWER8) */
+#define PCR_VSX_DIS (__MASK(63-1)) /* VSX disable (bit NA since POWER8) */
+#define PCR_TM_DIS (__MASK(63-2)) /* Trans. memory disable (POWER8) */
+#define PCR_HIGH_BITS (PCR_VEC_DIS | PCR_VSX_DIS | PCR_TM_DIS)
/*
* These bits are used in the function kvmppc_set_arch_compat() to specify and
* determine both the compatibility level which we want to emulate and the
#define PCR_ARCH_207 0x8 /* Architecture 2.07 */
#define PCR_ARCH_206 0x4 /* Architecture 2.06 */
#define PCR_ARCH_205 0x2 /* Architecture 2.05 */
+#define PCR_LOW_BITS (PCR_ARCH_207 | PCR_ARCH_206 | PCR_ARCH_205)
+#define PCR_MASK ~(PCR_HIGH_BITS | PCR_LOW_BITS) /* PCR Reserved Bits */
#define SPRN_HEIR 0x153 /* Hypervisor Emulated Instruction Register */
#define SPRN_TLBINDEXR 0x154 /* P7 TLB control register */
#define SPRN_TLBVPNR 0x155 /* P7 TLB control register */
beqlr
li r0,0
mtspr SPRN_LPID,r0
+ LOAD_REG_IMMEDIATE(r0, PCR_MASK)
mtspr SPRN_PCR,r0
mfspr r3,SPRN_LPCR
li r4,(LPCR_LPES1 >> LPCR_LPES_SH)
beqlr
li r0,0
mtspr SPRN_LPID,r0
+ LOAD_REG_IMMEDIATE(r0, PCR_MASK)
mtspr SPRN_PCR,r0
mfspr r3,SPRN_LPCR
li r4,(LPCR_LPES1 >> LPCR_LPES_SH)
beqlr
li r0,0
mtspr SPRN_LPID,r0
+ LOAD_REG_IMMEDIATE(r0, PCR_MASK)
mtspr SPRN_PCR,r0
mfspr r3,SPRN_LPCR
ori r3, r3, LPCR_PECEDH
beqlr
li r0,0
mtspr SPRN_LPID,r0
+ LOAD_REG_IMMEDIATE(r0, PCR_MASK)
mtspr SPRN_PCR,r0
mfspr r3,SPRN_LPCR
ori r3, r3, LPCR_PECEDH
mtspr SPRN_PSSCR,r0
mtspr SPRN_LPID,r0
mtspr SPRN_PID,r0
+ LOAD_REG_IMMEDIATE(r0, PCR_MASK)
mtspr SPRN_PCR,r0
mfspr r3,SPRN_LPCR
LOAD_REG_IMMEDIATE(r4, LPCR_PECEDH | LPCR_PECE_HVEE | LPCR_HVICE | LPCR_HEIC)
mtspr SPRN_PSSCR,r0
mtspr SPRN_LPID,r0
mtspr SPRN_PID,r0
+ LOAD_REG_IMMEDIATE(r0, PCR_MASK)
mtspr SPRN_PCR,r0
mfspr r3,SPRN_LPCR
LOAD_REG_IMMEDIATE(r4, LPCR_PECEDH | LPCR_PECE_HVEE | LPCR_HVICE | LPCR_HEIC)
{
u32 tag = get_hard_smp_processor_id(cpu);
- kvmppc_set_host_ipi(cpu, 1);
+ kvmppc_set_host_ipi(cpu);
/* Order previous accesses vs. msgsnd, which is treated as a store */
ppc_msgsnd_sync();
ppc_msgsnd(PPC_DBELL_MSGTYPE, 0, tag);
{
u32 tag = cpu_thread_in_core(cpu);
- kvmppc_set_host_ipi(cpu, 1);
+ kvmppc_set_host_ipi(cpu);
/* Order previous accesses vs. msgsnd, which is treated as a store */
ppc_msgsnd_sync();
ppc_msgsnd(PPC_DBELL_MSGTYPE, 0, tag);
may_hard_irq_enable();
- kvmppc_set_host_ipi(smp_processor_id(), 0);
+ kvmppc_clear_host_ipi(smp_processor_id());
__this_cpu_inc(irq_stat.doorbell_irqs);
smp_ipi_demux_relaxed(); /* already performed the barrier */
if (hv_mode) {
mtspr(SPRN_LPID, 0);
mtspr(SPRN_HFSCR, system_registers.hfscr);
- mtspr(SPRN_PCR, 0);
+ mtspr(SPRN_PCR, PCR_MASK);
}
mtspr(SPRN_FSCR, system_registers.fscr);
mtspr(SPRN_HFSCR, 0);
}
mtspr(SPRN_FSCR, 0);
+ mtspr(SPRN_PCR, PCR_MASK);
/*
* LPCR does not get cleared, to match behaviour with secondaries
return true;
}
+/*
+ * Handle POWER9 broadcast tlbie invalidation issue using
+ * cpu feature flag.
+ */
+static __init void update_tlbie_feature_flag(unsigned long pvr)
+{
+ if (PVR_VER(pvr) == PVR_POWER9) {
+ /*
+ * Set the tlbie feature flag for anything below
+ * Nimbus DD 2.3 and Cumulus DD 1.3
+ */
+ if ((pvr & 0xe000) == 0) {
+ /* Nimbus */
+ if ((pvr & 0xfff) < 0x203)
+ cur_cpu_spec->cpu_features |= CPU_FTR_P9_TLBIE_STQ_BUG;
+ } else if ((pvr & 0xc000) == 0) {
+ /* Cumulus */
+ if ((pvr & 0xfff) < 0x103)
+ cur_cpu_spec->cpu_features |= CPU_FTR_P9_TLBIE_STQ_BUG;
+ } else {
+ WARN_ONCE(1, "Unknown PVR");
+ cur_cpu_spec->cpu_features |= CPU_FTR_P9_TLBIE_STQ_BUG;
+ }
+
+ cur_cpu_spec->cpu_features |= CPU_FTR_P9_TLBIE_ERAT_BUG;
+ }
+}
+
static __init void cpufeatures_cpu_quirks(void)
{
- int version = mfspr(SPRN_PVR);
+ unsigned long version = mfspr(SPRN_PVR);
/*
* Not all quirks can be derived from the cpufeatures device tree.
if ((version & 0xffff0000) == 0x004e0000) {
cur_cpu_spec->cpu_features &= ~(CPU_FTR_DAWR);
- cur_cpu_spec->cpu_features |= CPU_FTR_P9_TLBIE_BUG;
cur_cpu_spec->cpu_features |= CPU_FTR_P9_TIDR;
}
+ update_tlbie_feature_flag(version);
/*
* PKEY was not in the initial base or feature node
* specification, but it should become optional in the next
pci_err(pdev, "Going to break: %pR\n", bar);
if (pdev->is_virtfn) {
-#ifndef CONFIG_IOV
+#ifndef CONFIG_PCI_IOV
return -ENXIO;
#else
/*
pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_SRIOV);
pos += PCI_SRIOV_CTRL;
bit = PCI_SRIOV_CTRL_MSE;
-#endif /* !CONFIG_IOV */
+#endif /* !CONFIG_PCI_IOV */
} else {
bit = PCI_COMMAND_MEMORY;
pos = PCI_COMMAND;
#include "book3s.h"
#include "trace.h"
-#define VM_STAT(x) offsetof(struct kvm, stat.x), KVM_STAT_VM
-#define VCPU_STAT(x) offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU
+#define VM_STAT(x, ...) offsetof(struct kvm, stat.x), KVM_STAT_VM, ## __VA_ARGS__
+#define VCPU_STAT(x, ...) offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU, ## __VA_ARGS__
/* #define EXIT_DEBUG */
{ "pthru_all", VCPU_STAT(pthru_all) },
{ "pthru_host", VCPU_STAT(pthru_host) },
{ "pthru_bad_aff", VCPU_STAT(pthru_bad_aff) },
- { "largepages_2M", VM_STAT(num_2M_pages) },
- { "largepages_1G", VM_STAT(num_1G_pages) },
+ { "largepages_2M", VM_STAT(num_2M_pages, .mode = 0444) },
+ { "largepages_1G", VM_STAT(num_1G_pages, .mode = 0444) },
{ NULL }
};
spin_lock(&vc->lock);
vc->arch_compat = arch_compat;
- /* Set all PCR bits for which guest_pcr_bit <= bit < host_pcr_bit */
- vc->pcr = host_pcr_bit - guest_pcr_bit;
+ /*
+ * Set all PCR bits for which guest_pcr_bit <= bit < host_pcr_bit
+ * Also set all reserved PCR bits
+ */
+ vc->pcr = (host_pcr_bit - guest_pcr_bit) | PCR_MASK;
spin_unlock(&vc->lock);
return 0;
}
if (vc->pcr)
- mtspr(SPRN_PCR, vc->pcr);
+ mtspr(SPRN_PCR, vc->pcr | PCR_MASK);
mtspr(SPRN_DPDES, vc->dpdes);
mtspr(SPRN_VTB, vc->vtb);
vc->vtb = mfspr(SPRN_VTB);
mtspr(SPRN_DPDES, 0);
if (vc->pcr)
- mtspr(SPRN_PCR, 0);
+ mtspr(SPRN_PCR, PCR_MASK);
if (vc->tb_offset_applied) {
u64 new_tb = mftb() - vc->tb_offset_applied;
{
struct kvmppc_vcore *vc = vcpu->arch.vcore;
- hr->pcr = vc->pcr;
+ hr->pcr = vc->pcr | PCR_MASK;
hr->dpdes = vc->dpdes;
hr->hfscr = vcpu->arch.hfscr;
hr->tb_offset = vc->tb_offset;
hr->lpid = swab32(hr->lpid);
hr->vcpu_token = swab32(hr->vcpu_token);
hr->lpcr = swab64(hr->lpcr);
- hr->pcr = swab64(hr->pcr);
+ hr->pcr = swab64(hr->pcr) | PCR_MASK;
hr->amor = swab64(hr->amor);
hr->dpdes = swab64(hr->dpdes);
hr->hfscr = swab64(hr->hfscr);
{
struct kvmppc_vcore *vc = vcpu->arch.vcore;
- vc->pcr = hr->pcr;
+ vc->pcr = hr->pcr | PCR_MASK;
vc->dpdes = hr->dpdes;
vcpu->arch.hfscr = hr->hfscr;
vcpu->arch.dawr = hr->dawr0;
(HPTE_R_KEY_HI | HPTE_R_KEY_LO));
}
+static inline void fixup_tlbie_lpid(unsigned long rb_value, unsigned long lpid)
+{
+
+ if (cpu_has_feature(CPU_FTR_P9_TLBIE_ERAT_BUG)) {
+ /* Radix flush for a hash guest */
+
+ unsigned long rb,rs,prs,r,ric;
+
+ rb = PPC_BIT(52); /* IS = 2 */
+ rs = 0; /* lpid = 0 */
+ prs = 0; /* partition scoped */
+ r = 1; /* radix format */
+ ric = 0; /* RIC_FLSUH_TLB */
+
+ /*
+ * Need the extra ptesync to make sure we don't
+ * re-order the tlbie
+ */
+ asm volatile("ptesync": : :"memory");
+ asm volatile(PPC_TLBIE_5(%0, %4, %3, %2, %1)
+ : : "r"(rb), "i"(r), "i"(prs),
+ "i"(ric), "r"(rs) : "memory");
+ }
+
+ if (cpu_has_feature(CPU_FTR_P9_TLBIE_STQ_BUG)) {
+ asm volatile("ptesync": : :"memory");
+ asm volatile(PPC_TLBIE_5(%0,%1,0,0,0) : :
+ "r" (rb_value), "r" (lpid));
+ }
+}
+
static void do_tlbies(struct kvm *kvm, unsigned long *rbvalues,
long npages, int global, bool need_sync)
{
"r" (rbvalues[i]), "r" (kvm->arch.lpid));
}
- if (cpu_has_feature(CPU_FTR_P9_TLBIE_BUG)) {
- /*
- * Need the extra ptesync to make sure we don't
- * re-order the tlbie
- */
- asm volatile("ptesync": : :"memory");
- asm volatile(PPC_TLBIE_5(%0,%1,0,0,0) : :
- "r" (rbvalues[0]), "r" (kvm->arch.lpid));
- }
-
+ fixup_tlbie_lpid(rbvalues[i - 1], kvm->arch.lpid);
asm volatile("eieio; tlbsync; ptesync" : : : "memory");
} else {
if (need_sync)
hcpu = hcore << threads_shift;
kvmppc_host_rm_ops_hv->rm_core[hcore].rm_data = vcpu;
smp_muxed_ipi_set_message(hcpu, PPC_MSG_RM_HOST_ACTION);
- kvmppc_set_host_ipi(hcpu, 1);
+ kvmppc_set_host_ipi(hcpu);
smp_mb();
kvmhv_rm_send_ipi(hcpu);
}
/* Load guest PCR value to select appropriate compat mode */
37: ld r7, VCORE_PCR(r5)
- cmpdi r7, 0
+ LOAD_REG_IMMEDIATE(r6, PCR_MASK)
+ cmpld r7, r6
beq 38f
+ or r7, r7, r6
mtspr SPRN_PCR, r7
38:
/* Reset PCR */
ld r0, VCORE_PCR(r5)
- cmpdi r0, 0
+ LOAD_REG_IMMEDIATE(r6, PCR_MASK)
+ cmpld r0, r6
beq 18f
- li r0, 0
- mtspr SPRN_PCR, r0
+ mtspr SPRN_PCR, r6
18:
/* Signal secondary CPUs to continue */
stb r0,VCORE_IN_GUEST(r5)
return va;
}
-static inline void fixup_tlbie(unsigned long vpn, int psize, int apsize, int ssize)
+static inline void fixup_tlbie_vpn(unsigned long vpn, int psize,
+ int apsize, int ssize)
{
- if (cpu_has_feature(CPU_FTR_P9_TLBIE_BUG)) {
+ if (cpu_has_feature(CPU_FTR_P9_TLBIE_ERAT_BUG)) {
+ /* Radix flush for a hash guest */
+
+ unsigned long rb,rs,prs,r,ric;
+
+ rb = PPC_BIT(52); /* IS = 2 */
+ rs = 0; /* lpid = 0 */
+ prs = 0; /* partition scoped */
+ r = 1; /* radix format */
+ ric = 0; /* RIC_FLSUH_TLB */
+
+ /*
+ * Need the extra ptesync to make sure we don't
+ * re-order the tlbie
+ */
+ asm volatile("ptesync": : :"memory");
+ asm volatile(PPC_TLBIE_5(%0, %4, %3, %2, %1)
+ : : "r"(rb), "i"(r), "i"(prs),
+ "i"(ric), "r"(rs) : "memory");
+ }
+
+
+ if (cpu_has_feature(CPU_FTR_P9_TLBIE_STQ_BUG)) {
/* Need the extra ptesync to ensure we don't reorder tlbie*/
asm volatile("ptesync": : :"memory");
___tlbie(vpn, psize, apsize, ssize);
asm volatile("ptesync": : :"memory");
} else {
__tlbie(vpn, psize, apsize, ssize);
- fixup_tlbie(vpn, psize, apsize, ssize);
+ fixup_tlbie_vpn(vpn, psize, apsize, ssize);
asm volatile("eieio; tlbsync; ptesync": : :"memory");
}
if (lock_tlbie && !use_local)
/*
* Just do one more with the last used values.
*/
- fixup_tlbie(vpn, psize, psize, ssize);
+ fixup_tlbie_vpn(vpn, psize, psize, ssize);
asm volatile("eieio; tlbsync; ptesync":::"memory");
if (lock_tlbie)
return 1;
}
+EXPORT_SYMBOL_GPL(hash__has_transparent_hugepage);
+
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
#ifdef CONFIG_STRICT_KERNEL_RWX
#ifdef CONFIG_SPAPR_TCE_IOMMU
WARN_ON_ONCE(!list_empty(&mm->context.iommu_group_mem_list));
#endif
+ /*
+ * For tasks which were successfully initialized we end up calling
+ * arch_exit_mmap() which clears the process table entry. And
+ * arch_exit_mmap() is called before the required fullmm TLB flush
+ * which does a RIC=2 flush. Hence for an initialized task, we do clear
+ * any cached process table entries.
+ *
+ * The condition below handles the error case during task init. We have
+ * set the process table entry early and if we fail a task
+ * initialization, we need to ensure the process table entry is zeroed.
+ * We need not worry about process table entry caches because the task
+ * never ran with the PID value.
+ */
if (radix_enabled())
- WARN_ON(process_tb[mm->context.id].prtb0 != 0);
+ process_tb[mm->context.id].prtb0 = 0;
else
subpage_prot_free(mm);
destroy_contexts(&mm->context);
return old_pmd;
}
-int radix__has_transparent_hugepage(void)
-{
- /* For radix 2M at PMD level means thp */
- if (mmu_psize_defs[MMU_PAGE_2M].shift == PMD_SHIFT)
- return 1;
- return 0;
-}
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
void radix__ptep_set_access_flags(struct vm_area_struct *vma, pte_t *ptep,
trace_tlbie(lpid, 0, rb, rs, ric, prs, r);
}
-static inline void fixup_tlbie(void)
+
+static inline void fixup_tlbie_va(unsigned long va, unsigned long pid,
+ unsigned long ap)
+{
+ if (cpu_has_feature(CPU_FTR_P9_TLBIE_ERAT_BUG)) {
+ asm volatile("ptesync": : :"memory");
+ __tlbie_va(va, 0, ap, RIC_FLUSH_TLB);
+ }
+
+ if (cpu_has_feature(CPU_FTR_P9_TLBIE_STQ_BUG)) {
+ asm volatile("ptesync": : :"memory");
+ __tlbie_va(va, pid, ap, RIC_FLUSH_TLB);
+ }
+}
+
+static inline void fixup_tlbie_va_range(unsigned long va, unsigned long pid,
+ unsigned long ap)
+{
+ if (cpu_has_feature(CPU_FTR_P9_TLBIE_ERAT_BUG)) {
+ asm volatile("ptesync": : :"memory");
+ __tlbie_pid(0, RIC_FLUSH_TLB);
+ }
+
+ if (cpu_has_feature(CPU_FTR_P9_TLBIE_STQ_BUG)) {
+ asm volatile("ptesync": : :"memory");
+ __tlbie_va(va, pid, ap, RIC_FLUSH_TLB);
+ }
+}
+
+static inline void fixup_tlbie_pid(unsigned long pid)
{
- unsigned long pid = 0;
+ /*
+ * We can use any address for the invalidation, pick one which is
+ * probably unused as an optimisation.
+ */
unsigned long va = ((1UL << 52) - 1);
- if (cpu_has_feature(CPU_FTR_P9_TLBIE_BUG)) {
+ if (cpu_has_feature(CPU_FTR_P9_TLBIE_ERAT_BUG)) {
+ asm volatile("ptesync": : :"memory");
+ __tlbie_pid(0, RIC_FLUSH_TLB);
+ }
+
+ if (cpu_has_feature(CPU_FTR_P9_TLBIE_STQ_BUG)) {
asm volatile("ptesync": : :"memory");
__tlbie_va(va, pid, mmu_get_ap(MMU_PAGE_64K), RIC_FLUSH_TLB);
}
}
+
+static inline void fixup_tlbie_lpid_va(unsigned long va, unsigned long lpid,
+ unsigned long ap)
+{
+ if (cpu_has_feature(CPU_FTR_P9_TLBIE_ERAT_BUG)) {
+ asm volatile("ptesync": : :"memory");
+ __tlbie_lpid_va(va, 0, ap, RIC_FLUSH_TLB);
+ }
+
+ if (cpu_has_feature(CPU_FTR_P9_TLBIE_STQ_BUG)) {
+ asm volatile("ptesync": : :"memory");
+ __tlbie_lpid_va(va, lpid, ap, RIC_FLUSH_TLB);
+ }
+}
+
static inline void fixup_tlbie_lpid(unsigned long lpid)
{
+ /*
+ * We can use any address for the invalidation, pick one which is
+ * probably unused as an optimisation.
+ */
unsigned long va = ((1UL << 52) - 1);
- if (cpu_has_feature(CPU_FTR_P9_TLBIE_BUG)) {
+ if (cpu_has_feature(CPU_FTR_P9_TLBIE_ERAT_BUG)) {
+ asm volatile("ptesync": : :"memory");
+ __tlbie_lpid(0, RIC_FLUSH_TLB);
+ }
+
+ if (cpu_has_feature(CPU_FTR_P9_TLBIE_STQ_BUG)) {
asm volatile("ptesync": : :"memory");
__tlbie_lpid_va(va, lpid, mmu_get_ap(MMU_PAGE_64K), RIC_FLUSH_TLB);
}
switch (ric) {
case RIC_FLUSH_TLB:
__tlbie_pid(pid, RIC_FLUSH_TLB);
+ fixup_tlbie_pid(pid);
break;
case RIC_FLUSH_PWC:
__tlbie_pid(pid, RIC_FLUSH_PWC);
case RIC_FLUSH_ALL:
default:
__tlbie_pid(pid, RIC_FLUSH_ALL);
+ fixup_tlbie_pid(pid);
}
- fixup_tlbie();
asm volatile("eieio; tlbsync; ptesync": : :"memory");
}
switch (ric) {
case RIC_FLUSH_TLB:
__tlbie_lpid(lpid, RIC_FLUSH_TLB);
+ fixup_tlbie_lpid(lpid);
break;
case RIC_FLUSH_PWC:
__tlbie_lpid(lpid, RIC_FLUSH_PWC);
case RIC_FLUSH_ALL:
default:
__tlbie_lpid(lpid, RIC_FLUSH_ALL);
+ fixup_tlbie_lpid(lpid);
}
- fixup_tlbie_lpid(lpid);
asm volatile("eieio; tlbsync; ptesync": : :"memory");
}
for (addr = start; addr < end; addr += page_size)
__tlbie_va(addr, pid, ap, RIC_FLUSH_TLB);
+
+ fixup_tlbie_va_range(addr - page_size, pid, ap);
}
static __always_inline void _tlbie_va(unsigned long va, unsigned long pid,
asm volatile("ptesync": : :"memory");
__tlbie_va(va, pid, ap, ric);
- fixup_tlbie();
+ fixup_tlbie_va(va, pid, ap);
asm volatile("eieio; tlbsync; ptesync": : :"memory");
}
asm volatile("ptesync": : :"memory");
__tlbie_lpid_va(va, lpid, ap, ric);
- fixup_tlbie_lpid(lpid);
+ fixup_tlbie_lpid_va(va, lpid, ap);
asm volatile("eieio; tlbsync; ptesync": : :"memory");
}
if (also_pwc)
__tlbie_pid(pid, RIC_FLUSH_PWC);
__tlbie_va_range(start, end, pid, page_size, psize);
- fixup_tlbie();
asm volatile("eieio; tlbsync; ptesync": : :"memory");
}
if (gflush)
__tlbie_va_range(gstart, gend, pid,
PUD_SIZE, MMU_PAGE_1G);
- fixup_tlbie();
+
asm volatile("eieio; tlbsync; ptesync": : :"memory");
} else {
_tlbiel_va_range_multicast(mm,
vmemmap_list = vmem_back;
}
+static bool altmap_cross_boundary(struct vmem_altmap *altmap, unsigned long start,
+ unsigned long page_size)
+{
+ unsigned long nr_pfn = page_size / sizeof(struct page);
+ unsigned long start_pfn = page_to_pfn((struct page *)start);
+
+ if ((start_pfn + nr_pfn) > altmap->end_pfn)
+ return true;
+
+ if (start_pfn < altmap->base_pfn)
+ return true;
+
+ return false;
+}
+
int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
struct vmem_altmap *altmap)
{
* fail due to alignment issues when using 16MB hugepages, so
* fall back to system memory if the altmap allocation fail.
*/
- if (altmap) {
+ if (altmap && !altmap_cross_boundary(altmap, start, page_size)) {
p = altmap_alloc_block_buf(page_size, altmap);
if (!p)
pr_debug("altmap block allocation failed, falling back to system memory");
#include <asm/code-patching.h>
#include <mm/mmu_decl.h>
+static pgprot_t kasan_prot_ro(void)
+{
+ if (early_mmu_has_feature(MMU_FTR_HPTE_TABLE))
+ return PAGE_READONLY;
+
+ return PAGE_KERNEL_RO;
+}
+
static void kasan_populate_pte(pte_t *ptep, pgprot_t prot)
{
unsigned long va = (unsigned long)kasan_early_shadow_page;
{
pmd_t *pmd;
unsigned long k_cur, k_next;
+ pgprot_t prot = slab_is_available() ? kasan_prot_ro() : PAGE_KERNEL;
pmd = pmd_offset(pud_offset(pgd_offset_k(k_start), k_start), k_start);
if (!new)
return -ENOMEM;
- if (early_mmu_has_feature(MMU_FTR_HPTE_TABLE))
- kasan_populate_pte(new, PAGE_READONLY);
- else
- kasan_populate_pte(new, PAGE_KERNEL_RO);
+ kasan_populate_pte(new, prot);
smp_wmb(); /* See comment in __pte_alloc */
static void __init kasan_remap_early_shadow_ro(void)
{
- if (early_mmu_has_feature(MMU_FTR_HPTE_TABLE))
- kasan_populate_pte(kasan_early_shadow_pte, PAGE_READONLY);
- else
- kasan_populate_pte(kasan_early_shadow_pte, PAGE_KERNEL_RO);
+ pgprot_t prot = kasan_prot_ro();
+ unsigned long k_start = KASAN_SHADOW_START;
+ unsigned long k_end = KASAN_SHADOW_END;
+ unsigned long k_cur;
+ phys_addr_t pa = __pa(kasan_early_shadow_page);
+
+ kasan_populate_pte(kasan_early_shadow_pte, prot);
+
+ for (k_cur = k_start & PAGE_MASK; k_cur < k_end; k_cur += PAGE_SIZE) {
+ pmd_t *pmd = pmd_offset(pud_offset(pgd_offset_k(k_cur), k_cur), k_cur);
+ pte_t *ptep = pte_offset_kernel(pmd, k_cur);
+
+ if ((pte_val(*ptep) & PTE_RPN_MASK) != pa)
+ continue;
+ __set_pte_at(&init_mm, k_cur, ptep, pfn_pte(PHYS_PFN(pa), prot), 0);
+ }
flush_tlb_kernel_range(KASAN_SHADOW_START, KASAN_SHADOW_END);
}
* for coming online, which are handled via
* generic_check_cpu_restart() calls.
*/
- kvmppc_set_host_ipi(cpu, 0);
+ kvmppc_clear_host_ipi(cpu);
srr1 = pnv_cpu_offline(cpu);
EXPORT_SYMBOL(plpar_hcall9);
EXPORT_SYMBOL(plpar_hcall_norets);
+/*
+ * H_BLOCK_REMOVE supported block size for this page size in segment who's base
+ * page size is that page size.
+ *
+ * The first index is the segment base page size, the second one is the actual
+ * page size.
+ */
+static int hblkrm_size[MMU_PAGE_COUNT][MMU_PAGE_COUNT] __ro_after_init;
+
+/*
+ * Due to the involved complexity, and that the current hypervisor is only
+ * returning this value or 0, we are limiting the support of the H_BLOCK_REMOVE
+ * buffer size to 8 size block.
+ */
+#define HBLKRM_SUPPORTED_BLOCK_SIZE 8
+
#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
static u8 dtl_mask = DTL_LOG_PREEMPT;
#else
#define HBLKR_CTRL_ERRNOTFOUND 0x8800000000000000UL
#define HBLKR_CTRL_ERRBUSY 0xa000000000000000UL
+/*
+ * Returned true if we are supporting this block size for the specified segment
+ * base page size and actual page size.
+ *
+ * Currently, we only support 8 size block.
+ */
+static inline bool is_supported_hlbkrm(int bpsize, int psize)
+{
+ return (hblkrm_size[bpsize][psize] == HBLKRM_SUPPORTED_BLOCK_SIZE);
+}
+
/**
* H_BLOCK_REMOVE caller.
* @idx should point to the latest @param entry set with a PTEX.
if (lock_tlbie)
spin_lock_irqsave(&pSeries_lpar_tlbie_lock, flags);
- if (firmware_has_feature(FW_FEATURE_BLOCK_REMOVE))
+ /* Assuming THP size is 16M */
+ if (is_supported_hlbkrm(psize, MMU_PAGE_16M))
hugepage_block_invalidate(slot, vpn, count, psize, ssize);
else
hugepage_bulk_invalidate(slot, vpn, count, psize, ssize);
(void)call_block_remove(pix, param, true);
}
+/*
+ * TLB Block Invalidate Characteristics
+ *
+ * These characteristics define the size of the block the hcall H_BLOCK_REMOVE
+ * is able to process for each couple segment base page size, actual page size.
+ *
+ * The ibm,get-system-parameter properties is returning a buffer with the
+ * following layout:
+ *
+ * [ 2 bytes size of the RTAS buffer (excluding these 2 bytes) ]
+ * -----------------
+ * TLB Block Invalidate Specifiers:
+ * [ 1 byte LOG base 2 of the TLB invalidate block size being specified ]
+ * [ 1 byte Number of page sizes (N) that are supported for the specified
+ * TLB invalidate block size ]
+ * [ 1 byte Encoded segment base page size and actual page size
+ * MSB=0 means 4k segment base page size and actual page size
+ * MSB=1 the penc value in mmu_psize_def ]
+ * ...
+ * -----------------
+ * Next TLB Block Invalidate Specifiers...
+ * -----------------
+ * [ 0 ]
+ */
+static inline void set_hblkrm_bloc_size(int bpsize, int psize,
+ unsigned int block_size)
+{
+ if (block_size > hblkrm_size[bpsize][psize])
+ hblkrm_size[bpsize][psize] = block_size;
+}
+
+/*
+ * Decode the Encoded segment base page size and actual page size.
+ * PAPR specifies:
+ * - bit 7 is the L bit
+ * - bits 0-5 are the penc value
+ * If the L bit is 0, this means 4K segment base page size and actual page size
+ * otherwise the penc value should be read.
+ */
+#define HBLKRM_L_MASK 0x80
+#define HBLKRM_PENC_MASK 0x3f
+static inline void __init check_lp_set_hblkrm(unsigned int lp,
+ unsigned int block_size)
+{
+ unsigned int bpsize, psize;
+
+ /* First, check the L bit, if not set, this means 4K */
+ if ((lp & HBLKRM_L_MASK) == 0) {
+ set_hblkrm_bloc_size(MMU_PAGE_4K, MMU_PAGE_4K, block_size);
+ return;
+ }
+
+ lp &= HBLKRM_PENC_MASK;
+ for (bpsize = 0; bpsize < MMU_PAGE_COUNT; bpsize++) {
+ struct mmu_psize_def *def = &mmu_psize_defs[bpsize];
+
+ for (psize = 0; psize < MMU_PAGE_COUNT; psize++) {
+ if (def->penc[psize] == lp) {
+ set_hblkrm_bloc_size(bpsize, psize, block_size);
+ return;
+ }
+ }
+ }
+}
+
+#define SPLPAR_TLB_BIC_TOKEN 50
+
+/*
+ * The size of the TLB Block Invalidate Characteristics is variable. But at the
+ * maximum it will be the number of possible page sizes *2 + 10 bytes.
+ * Currently MMU_PAGE_COUNT is 16, which means 42 bytes. Use a cache line size
+ * (128 bytes) for the buffer to get plenty of space.
+ */
+#define SPLPAR_TLB_BIC_MAXLENGTH 128
+
+void __init pseries_lpar_read_hblkrm_characteristics(void)
+{
+ unsigned char local_buffer[SPLPAR_TLB_BIC_MAXLENGTH];
+ int call_status, len, idx, bpsize;
+
+ spin_lock(&rtas_data_buf_lock);
+ memset(rtas_data_buf, 0, RTAS_DATA_BUF_SIZE);
+ call_status = rtas_call(rtas_token("ibm,get-system-parameter"), 3, 1,
+ NULL,
+ SPLPAR_TLB_BIC_TOKEN,
+ __pa(rtas_data_buf),
+ RTAS_DATA_BUF_SIZE);
+ memcpy(local_buffer, rtas_data_buf, SPLPAR_TLB_BIC_MAXLENGTH);
+ local_buffer[SPLPAR_TLB_BIC_MAXLENGTH - 1] = '\0';
+ spin_unlock(&rtas_data_buf_lock);
+
+ if (call_status != 0) {
+ pr_warn("%s %s Error calling get-system-parameter (0x%x)\n",
+ __FILE__, __func__, call_status);
+ return;
+ }
+
+ /*
+ * The first two (2) bytes of the data in the buffer are the length of
+ * the returned data, not counting these first two (2) bytes.
+ */
+ len = be16_to_cpu(*((u16 *)local_buffer)) + 2;
+ if (len > SPLPAR_TLB_BIC_MAXLENGTH) {
+ pr_warn("%s too large returned buffer %d", __func__, len);
+ return;
+ }
+
+ idx = 2;
+ while (idx < len) {
+ u8 block_shift = local_buffer[idx++];
+ u32 block_size;
+ unsigned int npsize;
+
+ if (!block_shift)
+ break;
+
+ block_size = 1 << block_shift;
+
+ for (npsize = local_buffer[idx++];
+ npsize > 0 && idx < len; npsize--)
+ check_lp_set_hblkrm((unsigned int) local_buffer[idx++],
+ block_size);
+ }
+
+ for (bpsize = 0; bpsize < MMU_PAGE_COUNT; bpsize++)
+ for (idx = 0; idx < MMU_PAGE_COUNT; idx++)
+ if (hblkrm_size[bpsize][idx])
+ pr_info("H_BLOCK_REMOVE supports base psize:%d psize:%d block size:%d",
+ bpsize, idx, hblkrm_size[bpsize][idx]);
+}
+
/*
* Take a spinlock around flushes to avoid bouncing the hypervisor tlbie
* lock.
if (lock_tlbie)
spin_lock_irqsave(&pSeries_lpar_tlbie_lock, flags);
- if (firmware_has_feature(FW_FEATURE_BLOCK_REMOVE)) {
+ if (is_supported_hlbkrm(batch->psize, batch->psize)) {
do_block_remove(number, batch, param);
goto out;
}
cond_resched();
} while (rc == H_BUSY);
- if (rc) {
- /* H_OVERLAP needs a separate error path */
- if (rc == H_OVERLAP)
- return -EBUSY;
-
- dev_err(&p->pdev->dev, "bind err: %lld\n", rc);
- return -ENXIO;
- }
+ if (rc)
+ return rc;
p->bound_addr = saved;
-
- dev_dbg(&p->pdev->dev, "bound drc %x to %pR\n", p->drc_index, &p->res);
-
- return 0;
+ dev_dbg(&p->pdev->dev, "bound drc 0x%x to %pR\n", p->drc_index, &p->res);
+ return rc;
}
-static int drc_pmem_unbind(struct papr_scm_priv *p)
+static void drc_pmem_unbind(struct papr_scm_priv *p)
{
unsigned long ret[PLPAR_HCALL_BUFSIZE];
uint64_t token = 0;
int64_t rc;
- dev_dbg(&p->pdev->dev, "unbind drc %x\n", p->drc_index);
+ dev_dbg(&p->pdev->dev, "unbind drc 0x%x\n", p->drc_index);
/* NB: unbind has the same retry requirements as drc_pmem_bind() */
do {
if (rc)
dev_err(&p->pdev->dev, "unbind error: %lld\n", rc);
else
- dev_dbg(&p->pdev->dev, "unbind drc %x complete\n",
+ dev_dbg(&p->pdev->dev, "unbind drc 0x%x complete\n",
p->drc_index);
- return rc == H_SUCCESS ? 0 : -ENXIO;
+ return;
}
+static int drc_pmem_query_n_bind(struct papr_scm_priv *p)
+{
+ unsigned long start_addr;
+ unsigned long end_addr;
+ unsigned long ret[PLPAR_HCALL_BUFSIZE];
+ int64_t rc;
+
+
+ rc = plpar_hcall(H_SCM_QUERY_BLOCK_MEM_BINDING, ret,
+ p->drc_index, 0);
+ if (rc)
+ goto err_out;
+ start_addr = ret[0];
+
+ /* Make sure the full region is bound. */
+ rc = plpar_hcall(H_SCM_QUERY_BLOCK_MEM_BINDING, ret,
+ p->drc_index, p->blocks - 1);
+ if (rc)
+ goto err_out;
+ end_addr = ret[0];
+
+ if ((end_addr - start_addr) != ((p->blocks - 1) * p->block_size))
+ goto err_out;
+
+ p->bound_addr = start_addr;
+ dev_dbg(&p->pdev->dev, "bound drc 0x%x to %pR\n", p->drc_index, &p->res);
+ return rc;
+
+err_out:
+ dev_info(&p->pdev->dev,
+ "Failed to query, trying an unbind followed by bind");
+ drc_pmem_unbind(p);
+ return drc_pmem_bind(p);
+}
+
+
static int papr_scm_meta_get(struct papr_scm_priv *p,
struct nd_cmd_get_config_data_hdr *hdr)
{
rc = drc_pmem_bind(p);
/* If phyp says drc memory still bound then force unbound and retry */
- if (rc == -EBUSY) {
- dev_warn(&pdev->dev, "Retrying bind after unbinding\n");
- drc_pmem_unbind(p);
- rc = drc_pmem_bind(p);
- }
+ if (rc == H_OVERLAP)
+ rc = drc_pmem_query_n_bind(p);
- if (rc)
+ if (rc != H_SUCCESS) {
+ dev_err(&p->pdev->dev, "bind err: %d\n", rc);
+ rc = -ENXIO;
goto err;
+ }
/* setup the resource for the newly bound range */
p->res.start = p->bound_addr;
int dlpar_workqueue_init(void);
void pseries_setup_rfi_flush(void);
+void pseries_lpar_read_hblkrm_characteristics(void);
#endif /* _PSERIES_PSERIES_H */
pseries_setup_rfi_flush();
setup_stf_barrier();
+ pseries_lpar_read_hblkrm_characteristics();
/* By default, only probe PCI (can be overridden by rtas_pci) */
pci_add_flags(PCI_PROBE_ONLY);
static void icp_native_cause_ipi(int cpu)
{
- kvmppc_set_host_ipi(cpu, 1);
+ kvmppc_set_host_ipi(cpu);
icp_native_set_qirr(cpu, IPI_PRIORITY);
}
if (vec == XICS_IPI) {
/* Clear pending IPI */
int cpu = smp_processor_id();
- kvmppc_set_host_ipi(cpu, 0);
+ kvmppc_clear_host_ipi(cpu);
icp_native_set_qirr(cpu, 0xff);
} else {
pr_err("XICS: hw interrupt 0x%x to offline cpu, disabling\n",
{
int cpu = smp_processor_id();
- kvmppc_set_host_ipi(cpu, 0);
+ kvmppc_clear_host_ipi(cpu);
icp_native_set_qirr(cpu, 0xff);
return smp_ipi_demux();
{
int hw_cpu = get_hard_smp_processor_id(cpu);
- kvmppc_set_host_ipi(cpu, 1);
+ kvmppc_set_host_ipi(cpu);
opal_int_set_mfrr(hw_cpu, IPI_PRIORITY);
}
{
int cpu = smp_processor_id();
- kvmppc_set_host_ipi(cpu, 0);
+ kvmppc_clear_host_ipi(cpu);
opal_int_set_mfrr(get_hard_smp_processor_id(cpu), 0xff);
return smp_ipi_demux();
if (vec == XICS_IPI) {
/* Clear pending IPI */
int cpu = smp_processor_id();
- kvmppc_set_host_ipi(cpu, 0);
+ kvmppc_clear_host_ipi(cpu);
opal_int_set_mfrr(get_hard_smp_processor_id(cpu), 0xff);
} else {
pr_err("XICS: hw interrupt 0x%x to offline cpu, "
aliases {
serial0 = &uart0;
serial1 = &uart1;
+ ethernet0 = ð0;
};
chosen {
};
};
cpu2: cpu@2 {
- clock-frequency = <0>;
compatible = "sifive,u54-mc", "sifive,rocket0", "riscv";
d-cache-block-size = <64>;
d-cache-sets = <64>;
};
};
cpu3: cpu@3 {
- clock-frequency = <0>;
compatible = "sifive,u54-mc", "sifive,rocket0", "riscv";
d-cache-block-size = <64>;
d-cache-sets = <64>;
};
};
cpu4: cpu@4 {
- clock-frequency = <0>;
compatible = "sifive,u54-mc", "sifive,rocket0", "riscv";
d-cache-block-size = <64>;
d-cache-sets = <64>;
#size-cells = <0>;
status = "disabled";
};
+ pwm0: pwm@10020000 {
+ compatible = "sifive,fu540-c000-pwm", "sifive,pwm0";
+ reg = <0x0 0x10020000 0x0 0x1000>;
+ interrupt-parent = <&plic0>;
+ interrupts = <42 43 44 45>;
+ clocks = <&prci PRCI_CLK_TLCLK>;
+ #pwm-cells = <3>;
+ status = "disabled";
+ };
+ pwm1: pwm@10021000 {
+ compatible = "sifive,fu540-c000-pwm", "sifive,pwm0";
+ reg = <0x0 0x10021000 0x0 0x1000>;
+ interrupt-parent = <&plic0>;
+ interrupts = <46 47 48 49>;
+ clocks = <&prci PRCI_CLK_TLCLK>;
+ #pwm-cells = <3>;
+ status = "disabled";
+ };
};
};
reg = <0>;
};
};
+
+&pwm0 {
+ status = "okay";
+};
+
+&pwm1 {
+ status = "okay";
+};
CONFIG_IP_PNP_BOOTP=y
CONFIG_IP_PNP_RARP=y
CONFIG_NETLINK_DIAG=y
+CONFIG_NET_9P=y
+CONFIG_NET_9P_VIRTIO=y
CONFIG_PCI=y
CONFIG_PCIEPORTBUS=y
CONFIG_PCI_HOST_GENERIC=y
CONFIG_VIRTIO_BLK=y
CONFIG_BLK_DEV_SD=y
CONFIG_BLK_DEV_SR=y
+CONFIG_SCSI_VIRTIO=y
CONFIG_ATA=y
CONFIG_SATA_AHCI=y
CONFIG_SATA_AHCI_PLATFORM=y
CONFIG_SERIAL_OF_PLATFORM=y
CONFIG_SERIAL_EARLYCON_RISCV_SBI=y
CONFIG_HVC_RISCV_SBI=y
+CONFIG_VIRTIO_CONSOLE=y
CONFIG_HW_RANDOM=y
CONFIG_HW_RANDOM_VIRTIO=y
CONFIG_SPI=y
# CONFIG_PTP_1588_CLOCK is not set
CONFIG_DRM=y
CONFIG_DRM_RADEON=y
+CONFIG_DRM_VIRTIO_GPU=y
CONFIG_FRAMEBUFFER_CONSOLE=y
CONFIG_USB=y
CONFIG_USB_XHCI_HCD=y
CONFIG_USB_UAS=y
CONFIG_MMC=y
CONFIG_MMC_SPI=y
+CONFIG_VIRTIO_PCI=y
+CONFIG_VIRTIO_BALLOON=y
+CONFIG_VIRTIO_INPUT=y
CONFIG_VIRTIO_MMIO=y
+CONFIG_RPMSG_CHAR=y
+CONFIG_RPMSG_VIRTIO=y
CONFIG_EXT4_FS=y
CONFIG_EXT4_FS_POSIX_ACL=y
CONFIG_AUTOFS4_FS=y
CONFIG_NFS_V4_1=y
CONFIG_NFS_V4_2=y
CONFIG_ROOT_NFS=y
+CONFIG_9P_FS=y
CONFIG_CRYPTO_USER_API_HASH=y
CONFIG_CRYPTO_DEV_VIRTIO=y
CONFIG_PRINTK_TIME=y
CONFIG_IP_PNP_BOOTP=y
CONFIG_IP_PNP_RARP=y
CONFIG_NETLINK_DIAG=y
+CONFIG_NET_9P=y
+CONFIG_NET_9P_VIRTIO=y
CONFIG_PCI=y
CONFIG_PCIEPORTBUS=y
CONFIG_PCI_HOST_GENERIC=y
CONFIG_VIRTIO_BLK=y
CONFIG_BLK_DEV_SD=y
CONFIG_BLK_DEV_SR=y
+CONFIG_SCSI_VIRTIO=y
CONFIG_ATA=y
CONFIG_SATA_AHCI=y
CONFIG_SATA_AHCI_PLATFORM=y
CONFIG_SERIAL_OF_PLATFORM=y
CONFIG_SERIAL_EARLYCON_RISCV_SBI=y
CONFIG_HVC_RISCV_SBI=y
+CONFIG_VIRTIO_CONSOLE=y
CONFIG_HW_RANDOM=y
CONFIG_HW_RANDOM_VIRTIO=y
# CONFIG_PTP_1588_CLOCK is not set
CONFIG_DRM=y
CONFIG_DRM_RADEON=y
+CONFIG_DRM_VIRTIO_GPU=y
CONFIG_FRAMEBUFFER_CONSOLE=y
CONFIG_USB=y
CONFIG_USB_XHCI_HCD=y
CONFIG_USB_OHCI_HCD_PLATFORM=y
CONFIG_USB_STORAGE=y
CONFIG_USB_UAS=y
+CONFIG_VIRTIO_PCI=y
+CONFIG_VIRTIO_BALLOON=y
+CONFIG_VIRTIO_INPUT=y
CONFIG_VIRTIO_MMIO=y
+CONFIG_RPMSG_CHAR=y
+CONFIG_RPMSG_VIRTIO=y
CONFIG_SIFIVE_PLIC=y
CONFIG_EXT4_FS=y
CONFIG_EXT4_FS_POSIX_ACL=y
CONFIG_NFS_V4_1=y
CONFIG_NFS_V4_2=y
CONFIG_ROOT_NFS=y
+CONFIG_9P_FS=y
CONFIG_CRYPTO_USER_API_HASH=y
CONFIG_CRYPTO_DEV_VIRTIO=y
CONFIG_PRINTK_TIME=y
#define REG_L __REG_SEL(ld, lw)
#define REG_S __REG_SEL(sd, sw)
+#define REG_SC __REG_SEL(sc.d, sc.w)
#define SZREG __REG_SEL(8, 4)
#define LGREG __REG_SEL(3, 2)
#define __S110 PAGE_SHARED_EXEC
#define __S111 PAGE_SHARED_EXEC
+#define VMALLOC_SIZE (KERN_VIRT_SIZE >> 1)
+#define VMALLOC_END (PAGE_OFFSET - 1)
+#define VMALLOC_START (PAGE_OFFSET - VMALLOC_SIZE)
+
+#define FIXADDR_TOP VMALLOC_START
+#ifdef CONFIG_64BIT
+#define FIXADDR_SIZE PMD_SIZE
+#else
+#define FIXADDR_SIZE PGDIR_SIZE
+#endif
+#define FIXADDR_START (FIXADDR_TOP - FIXADDR_SIZE)
+
/*
* Roughly size the vmemmap space to be large enough to fit enough
* struct pages to map half the virtual address space. Then
extern void setup_bootmem(void);
extern void paging_init(void);
-#define VMALLOC_SIZE (KERN_VIRT_SIZE >> 1)
-#define VMALLOC_END (PAGE_OFFSET - 1)
-#define VMALLOC_START (PAGE_OFFSET - VMALLOC_SIZE)
-
-#define FIXADDR_TOP VMALLOC_START
-#ifdef CONFIG_64BIT
-#define FIXADDR_SIZE PMD_SIZE
-#else
-#define FIXADDR_SIZE PGDIR_SIZE
-#endif
-#define FIXADDR_START (FIXADDR_TOP - FIXADDR_SIZE)
-
/*
* Task size is 0x4000000000 for RV64 or 0x9fc00000 for RV32.
* Note that PGDIR_SIZE must evenly divide TASK_SIZE.
*/
.macro RESTORE_ALL
REG_L a0, PT_SSTATUS(sp)
- REG_L a2, PT_SEPC(sp)
+ /*
+ * The current load reservation is effectively part of the processor's
+ * state, in the sense that load reservations cannot be shared between
+ * different hart contexts. We can't actually save and restore a load
+ * reservation, so instead here we clear any existing reservation --
+ * it's always legal for implementations to clear load reservations at
+ * any point (as long as the forward progress guarantee is kept, but
+ * we'll ignore that here).
+ *
+ * Dangling load reservations can be the result of taking a trap in the
+ * middle of an LR/SC sequence, but can also be the result of a taken
+ * forward branch around an SC -- which is how we implement CAS. As a
+ * result we need to clear reservations between the last CAS and the
+ * jump back to the new context. While it is unlikely the store
+ * completes, implementations are allowed to expand reservations to be
+ * arbitrarily large.
+ */
+ REG_L a2, PT_SEPC(sp)
+ REG_SC x0, a2, PT_SEPC(sp)
+
csrw CSR_SSTATUS, a0
csrw CSR_SEPC, a2
move a0, sp /* pt_regs */
tail do_IRQ
1:
- /* Exceptions run with interrupts enabled */
+ /* Exceptions run with interrupts enabled or disabled
+ depending on the state of sstatus.SR_SPIE */
+ andi t0, s1, SR_SPIE
+ beqz t0, 1f
csrs CSR_SSTATUS, SR_SIE
+1:
/* Handle syscalls */
li t0, EXC_SYSCALL
beq s4, t0, handle_syscall
li t0, SR_FS
csrc CSR_SSTATUS, t0
+#ifdef CONFIG_SMP
+ li t0, CONFIG_NR_CPUS
+ bgeu a0, t0, .Lsecondary_park
+#endif
+
/* Pick one hart to run the main boot sequence */
la a3, hart_lottery
li a2, 1
.Lsecondary_start:
#ifdef CONFIG_SMP
- li a1, CONFIG_NR_CPUS
- bgeu a0, a1, .Lsecondary_park
-
/* Set trap vector to spin forever to help debug */
la a3, .Lsecondary_park
csrw CSR_STVEC, a3
{
send_ipi_single(cpu, IPI_RESCHEDULE);
}
+EXPORT_SYMBOL_GPL(smp_send_reschedule);
#include <asm/sbi.h>
unsigned long riscv_timebase;
+EXPORT_SYMBOL_GPL(riscv_timebase);
void __init time_init(void)
{
#include <linux/swap.h>
#include <linux/sizes.h>
#include <linux/of_fdt.h>
+#include <linux/libfdt.h>
#include <asm/fixmap.h>
#include <asm/tlbflush.h>
}
#endif /* CONFIG_BLK_DEV_INITRD */
+static phys_addr_t dtb_early_pa __initdata;
+
void __init setup_bootmem(void)
{
struct memblock_region *reg;
setup_initrd();
#endif /* CONFIG_BLK_DEV_INITRD */
- early_init_fdt_reserve_self();
+ /*
+ * Avoid using early_init_fdt_reserve_self() since __pa() does
+ * not work for DTB pointers that are fixmap addresses
+ */
+ memblock_reserve(dtb_early_pa, fdt_totalsize(dtb_early_va));
+
early_init_fdt_scan_reserved_mem();
memblock_allow_resize();
memblock_dump_all();
/* Save pointer to DTB for early FDT parsing */
dtb_early_va = (void *)fix_to_virt(FIX_FDT) + (dtb_pa & ~PAGE_MASK);
+ /* Save physical address for memblock reservation */
+ dtb_early_pa = dtb_pa;
}
static void __init setup_vm_final(void)
def_bool y
depends on KEXEC_FILE
-config KEXEC_VERIFY_SIG
+config KEXEC_SIG
bool "Verify kernel signature during kexec_file_load() syscall"
- depends on KEXEC_FILE && SYSTEM_DATA_VERIFICATION
+ depends on KEXEC_FILE && MODULE_SIG_FORMAT
help
This option makes kernel signature verification mandatory for
the kexec_file_load() syscall.
CONFIG_NUMA=y
CONFIG_HZ_100=y
CONFIG_KEXEC_FILE=y
+CONFIG_KEXEC_SIG=y
CONFIG_EXPOLINE=y
CONFIG_EXPOLINE_AUTO=y
CONFIG_CHSC_SCH=y
CONFIG_MODULE_FORCE_UNLOAD=y
CONFIG_MODVERSIONS=y
CONFIG_MODULE_SRCVERSION_ALL=y
-CONFIG_MODULE_SIG=y
CONFIG_MODULE_SIG_SHA256=y
+CONFIG_UNUSED_SYMBOLS=y
CONFIG_BLK_DEV_INTEGRITY=y
CONFIG_BLK_DEV_THROTTLING=y
CONFIG_BLK_WBT=y
CONFIG_BLK_CGROUP_IOLATENCY=y
+CONFIG_BLK_CGROUP_IOCOST=y
CONFIG_PARTITION_ADVANCED=y
CONFIG_IBM_PARTITION=y
CONFIG_BSD_DISKLABEL=y
CONFIG_CGROUP_NET_PRIO=y
CONFIG_BPF_JIT=y
CONFIG_NET_PKTGEN=m
+# CONFIG_NET_DROP_MONITOR is not set
CONFIG_PCI=y
CONFIG_PCI_DEBUG=y
CONFIG_HOTPLUG_PCI=y
CONFIG_DM_SNAPSHOT=m
CONFIG_DM_THIN_PROVISIONING=m
CONFIG_DM_WRITECACHE=m
+CONFIG_DM_CLONE=m
CONFIG_DM_MIRROR=m
CONFIG_DM_LOG_USERSPACE=m
CONFIG_DM_RAID=m
CONFIG_DM_UEVENT=y
CONFIG_DM_FLAKEY=m
CONFIG_DM_VERITY=m
+CONFIG_DM_VERITY_VERIFY_ROOTHASH_SIG=y
CONFIG_DM_SWITCH=m
CONFIG_NETDEVICES=y
CONFIG_BONDING=m
# CONFIG_NET_VENDOR_NVIDIA is not set
# CONFIG_NET_VENDOR_OKI is not set
# CONFIG_NET_VENDOR_PACKET_ENGINES is not set
+# CONFIG_NET_VENDOR_PENSANDO is not set
# CONFIG_NET_VENDOR_QLOGIC is not set
# CONFIG_NET_VENDOR_QUALCOMM is not set
# CONFIG_NET_VENDOR_RDC is not set
CONFIG_WATCHDOG_NOWAYOUT=y
CONFIG_SOFT_WATCHDOG=m
CONFIG_DIAG288_WATCHDOG=m
-CONFIG_DRM=y
-CONFIG_DRM_VIRTIO_GPU=y
+CONFIG_FB=y
CONFIG_FRAMEBUFFER_CONSOLE=y
+CONFIG_FRAMEBUFFER_CONSOLE_DETECT_PRIMARY=y
# CONFIG_HID is not set
# CONFIG_USB_SUPPORT is not set
CONFIG_INFINIBAND=m
CONFIG_INFINIBAND_USER_ACCESS=m
CONFIG_MLX4_INFINIBAND=m
CONFIG_MLX5_INFINIBAND=m
+CONFIG_SYNC_FILE=y
CONFIG_VFIO=m
CONFIG_VFIO_PCI=m
CONFIG_VFIO_MDEV=m
CONFIG_FS_DAX=y
CONFIG_EXPORTFS_BLOCK_OPS=y
CONFIG_FS_ENCRYPTION=y
+CONFIG_FS_VERITY=y
+CONFIG_FS_VERITY_BUILTIN_SIGNATURES=y
CONFIG_FANOTIFY=y
CONFIG_FANOTIFY_ACCESS_PERMISSIONS=y
CONFIG_QUOTA_NETLINK_INTERFACE=y
CONFIG_AUTOFS4_FS=m
CONFIG_FUSE_FS=y
CONFIG_CUSE=m
+CONFIG_VIRTIO_FS=m
CONFIG_OVERLAY_FS=m
CONFIG_FSCACHE=m
CONFIG_CACHEFILES=m
CONFIG_SECURITY_SELINUX=y
CONFIG_SECURITY_SELINUX_BOOTPARAM=y
CONFIG_SECURITY_SELINUX_DISABLE=y
+CONFIG_SECURITY_LOCKDOWN_LSM=y
+CONFIG_SECURITY_LOCKDOWN_LSM_EARLY=y
CONFIG_INTEGRITY_SIGNATURE=y
CONFIG_INTEGRITY_ASYMMETRIC_KEYS=y
CONFIG_IMA=y
CONFIG_IMA_DEFAULT_HASH_SHA256=y
CONFIG_IMA_WRITE_POLICY=y
CONFIG_IMA_APPRAISE=y
+CONFIG_LSM="yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor"
CONFIG_CRYPTO_USER=m
# CONFIG_CRYPTO_MANAGER_DISABLE_TESTS is not set
CONFIG_CRYPTO_PCRYPT=m
CONFIG_CRYPTO_ECRDSA=m
CONFIG_CRYPTO_CHACHA20POLY1305=m
CONFIG_CRYPTO_AEGIS128=m
-CONFIG_CRYPTO_AEGIS128L=m
-CONFIG_CRYPTO_AEGIS256=m
-CONFIG_CRYPTO_MORUS640=m
-CONFIG_CRYPTO_MORUS1280=m
CONFIG_CRYPTO_CFB=m
CONFIG_CRYPTO_LRW=m
CONFIG_CRYPTO_PCBC=m
CONFIG_DEBUG_INFO_DWARF4=y
CONFIG_GDB_SCRIPTS=y
CONFIG_FRAME_WARN=1024
-CONFIG_UNUSED_SYMBOLS=y
CONFIG_HEADERS_INSTALL=y
CONFIG_HEADERS_CHECK=y
CONFIG_DEBUG_SECTION_MISMATCH=y
# CONFIG_NUMA_EMU is not set
CONFIG_HZ_100=y
CONFIG_KEXEC_FILE=y
+CONFIG_KEXEC_SIG=y
CONFIG_EXPOLINE=y
CONFIG_EXPOLINE_AUTO=y
CONFIG_CHSC_SCH=y
CONFIG_MODULE_FORCE_UNLOAD=y
CONFIG_MODVERSIONS=y
CONFIG_MODULE_SRCVERSION_ALL=y
-CONFIG_MODULE_SIG=y
CONFIG_MODULE_SIG_SHA256=y
+CONFIG_UNUSED_SYMBOLS=y
CONFIG_BLK_DEV_THROTTLING=y
CONFIG_BLK_WBT=y
CONFIG_BLK_CGROUP_IOLATENCY=y
+CONFIG_BLK_CGROUP_IOCOST=y
CONFIG_PARTITION_ADVANCED=y
CONFIG_IBM_PARTITION=y
CONFIG_BSD_DISKLABEL=y
CONFIG_CGROUP_NET_PRIO=y
CONFIG_BPF_JIT=y
CONFIG_NET_PKTGEN=m
+# CONFIG_NET_DROP_MONITOR is not set
CONFIG_PCI=y
CONFIG_HOTPLUG_PCI=y
CONFIG_HOTPLUG_PCI_S390=y
CONFIG_DM_SNAPSHOT=m
CONFIG_DM_THIN_PROVISIONING=m
CONFIG_DM_WRITECACHE=m
+CONFIG_DM_CLONE=m
CONFIG_DM_MIRROR=m
CONFIG_DM_LOG_USERSPACE=m
CONFIG_DM_RAID=m
CONFIG_DM_UEVENT=y
CONFIG_DM_FLAKEY=m
CONFIG_DM_VERITY=m
+CONFIG_DM_VERITY_VERIFY_ROOTHASH_SIG=y
CONFIG_DM_SWITCH=m
CONFIG_DM_INTEGRITY=m
CONFIG_NETDEVICES=y
# CONFIG_NET_VENDOR_NVIDIA is not set
# CONFIG_NET_VENDOR_OKI is not set
# CONFIG_NET_VENDOR_PACKET_ENGINES is not set
+# CONFIG_NET_VENDOR_PENSANDO is not set
# CONFIG_NET_VENDOR_QLOGIC is not set
# CONFIG_NET_VENDOR_QUALCOMM is not set
# CONFIG_NET_VENDOR_RDC is not set
CONFIG_WATCHDOG_NOWAYOUT=y
CONFIG_SOFT_WATCHDOG=m
CONFIG_DIAG288_WATCHDOG=m
-CONFIG_DRM=y
-CONFIG_DRM_VIRTIO_GPU=y
-# CONFIG_BACKLIGHT_CLASS_DEVICE is not set
+CONFIG_FB=y
CONFIG_FRAMEBUFFER_CONSOLE=y
+CONFIG_FRAMEBUFFER_CONSOLE_DETECT_PRIMARY=y
# CONFIG_HID is not set
# CONFIG_USB_SUPPORT is not set
CONFIG_INFINIBAND=m
CONFIG_INFINIBAND_USER_ACCESS=m
CONFIG_MLX4_INFINIBAND=m
CONFIG_MLX5_INFINIBAND=m
+CONFIG_SYNC_FILE=y
CONFIG_VFIO=m
CONFIG_VFIO_PCI=m
CONFIG_VFIO_MDEV=m
CONFIG_FS_DAX=y
CONFIG_EXPORTFS_BLOCK_OPS=y
CONFIG_FS_ENCRYPTION=y
+CONFIG_FS_VERITY=y
+CONFIG_FS_VERITY_BUILTIN_SIGNATURES=y
CONFIG_FANOTIFY=y
CONFIG_FANOTIFY_ACCESS_PERMISSIONS=y
CONFIG_QUOTA_NETLINK_INTERFACE=y
CONFIG_AUTOFS4_FS=m
CONFIG_FUSE_FS=y
CONFIG_CUSE=m
+CONFIG_VIRTIO_FS=m
CONFIG_OVERLAY_FS=m
CONFIG_FSCACHE=m
CONFIG_CACHEFILES=m
CONFIG_SECURITY_SELINUX=y
CONFIG_SECURITY_SELINUX_BOOTPARAM=y
CONFIG_SECURITY_SELINUX_DISABLE=y
+CONFIG_SECURITY_LOCKDOWN_LSM=y
+CONFIG_SECURITY_LOCKDOWN_LSM_EARLY=y
CONFIG_INTEGRITY_SIGNATURE=y
CONFIG_INTEGRITY_ASYMMETRIC_KEYS=y
CONFIG_IMA=y
CONFIG_IMA_DEFAULT_HASH_SHA256=y
CONFIG_IMA_WRITE_POLICY=y
CONFIG_IMA_APPRAISE=y
+CONFIG_LSM="yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor"
CONFIG_CRYPTO_FIPS=y
CONFIG_CRYPTO_USER=m
# CONFIG_CRYPTO_MANAGER_DISABLE_TESTS is not set
CONFIG_CRYPTO_ECRDSA=m
CONFIG_CRYPTO_CHACHA20POLY1305=m
CONFIG_CRYPTO_AEGIS128=m
-CONFIG_CRYPTO_AEGIS128L=m
-CONFIG_CRYPTO_AEGIS256=m
-CONFIG_CRYPTO_MORUS640=m
-CONFIG_CRYPTO_MORUS1280=m
CONFIG_CRYPTO_CFB=m
CONFIG_CRYPTO_LRW=m
CONFIG_CRYPTO_OFB=m
CONFIG_DEBUG_INFO_DWARF4=y
CONFIG_GDB_SCRIPTS=y
CONFIG_FRAME_WARN=1024
-CONFIG_UNUSED_SYMBOLS=y
CONFIG_DEBUG_SECTION_MISMATCH=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_DEBUG_MEMORY_INIT=y
CONFIG_CONFIGFS_FS=y
# CONFIG_MISC_FILESYSTEMS is not set
# CONFIG_NETWORK_FILESYSTEMS is not set
-# CONFIG_DIMLIB is not set
+CONFIG_LSM="yama,loadpin,safesetid,integrity"
CONFIG_PRINTK_TIME=y
CONFIG_DEBUG_INFO=y
CONFIG_DEBUG_FS=y
#undef __ATOMIC_OP
#define __ATOMIC_CONST_OP(op_name, op_type, op_string, op_barrier) \
-static inline void op_name(op_type val, op_type *ptr) \
+static __always_inline void op_name(op_type val, op_type *ptr) \
{ \
asm volatile( \
op_string " %[ptr],%[val]\n" \
return ((unsigned char *)ptr) + ((nr ^ (BITS_PER_LONG - 8)) >> 3);
}
-static inline void arch_set_bit(unsigned long nr, volatile unsigned long *ptr)
+static __always_inline void arch_set_bit(unsigned long nr, volatile unsigned long *ptr)
{
unsigned long *addr = __bitops_word(nr, ptr);
unsigned long mask;
__atomic64_or(mask, (long *)addr);
}
-static inline void arch_clear_bit(unsigned long nr, volatile unsigned long *ptr)
+static __always_inline void arch_clear_bit(unsigned long nr, volatile unsigned long *ptr)
{
unsigned long *addr = __bitops_word(nr, ptr);
unsigned long mask;
__atomic64_and(mask, (long *)addr);
}
-static inline void arch_change_bit(unsigned long nr,
- volatile unsigned long *ptr)
+static __always_inline void arch_change_bit(unsigned long nr,
+ volatile unsigned long *ptr)
{
unsigned long *addr = __bitops_word(nr, ptr);
unsigned long mask;
*
* Returns 1 if @func is available for @opcode, 0 otherwise
*/
-static inline void __cpacf_query(unsigned int opcode, cpacf_mask_t *mask)
+static __always_inline void __cpacf_query(unsigned int opcode, cpacf_mask_t *mask)
{
register unsigned long r0 asm("0") = 0; /* query function */
register unsigned long r1 asm("1") = (unsigned long) mask;
CPU_MF_INT_SF_PRA|CPU_MF_INT_SF_SACA| \
CPU_MF_INT_SF_LSDA)
+#define CPU_MF_SF_RIBM_NOTAV 0x1 /* Sampling unavailable */
+
/* CPU measurement facility support */
static inline int cpum_cf_avail(void)
{
unsigned long max_sampl_rate; /* 16-23: maximum sampling interval*/
unsigned long tear; /* 24-31: TEAR contents */
unsigned long dear; /* 32-39: DEAR contents */
- unsigned int rsvrd0; /* 40-43: reserved */
+ unsigned int rsvrd0:24; /* 40-42: reserved */
+ unsigned int ribm:8; /* 43: Reserved by IBM */
unsigned int cpu_speed; /* 44-47: CPU speed */
unsigned long long rsvrd1; /* 48-55: reserved */
unsigned long long rsvrd2; /* 56-63: reserved */
MT_DIAG = 5,
MT_DIAG_CLEARING = 9, /* clears loss-of-MT-ctr-data alert */
};
-static inline int stcctm(enum stcctm_ctr_set set, u64 range, u64 *dest)
+
+static __always_inline int stcctm(enum stcctm_ctr_set set, u64 range, u64 *dest)
{
int cc;
#include <asm/page.h>
#include <asm/pgtable.h>
-
-#define is_hugepage_only_range(mm, addr, len) 0
#define hugetlb_free_pgd_range free_pgd_range
#define hugepages_supported() (MACHINE_HAS_EDAT1)
pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
unsigned long addr, pte_t *ptep);
+static inline bool is_hugepage_only_range(struct mm_struct *mm,
+ unsigned long addr,
+ unsigned long len)
+{
+ return false;
+}
+
/*
* If the arch doesn't supply something else, assume that hugepage
* size aligned regions are ok without further preparation.
* We use a brcl 0,2 instruction for jump labels at compile time so it
* can be easily distinguished from a hotpatch generated instruction.
*/
-static inline bool arch_static_branch(struct static_key *key, bool branch)
+static __always_inline bool arch_static_branch(struct static_key *key, bool branch)
{
asm_volatile_goto("0: brcl 0,"__stringify(JUMP_LABEL_NOP_OFFSET)"\n"
".pushsection __jump_table,\"aw\"\n"
return true;
}
-static inline bool arch_static_branch_jump(struct static_key *key, bool branch)
+static __always_inline bool arch_static_branch_jump(struct static_key *key, bool branch)
{
asm_volatile_goto("0: brcl 15,%l[label]\n"
".pushsection __jump_table,\"aw\"\n"
#define IPTE_NODAT 0x400
#define IPTE_GUEST_ASCE 0x800
-static inline void __ptep_ipte(unsigned long address, pte_t *ptep,
- unsigned long opt, unsigned long asce,
- int local)
+static __always_inline void __ptep_ipte(unsigned long address, pte_t *ptep,
+ unsigned long opt, unsigned long asce,
+ int local)
{
unsigned long pto = (unsigned long) ptep;
: [r1] "a" (pto), [m4] "i" (local) : "memory");
}
-static inline void __ptep_ipte_range(unsigned long address, int nr,
- pte_t *ptep, int local)
+static __always_inline void __ptep_ipte_range(unsigned long address, int nr,
+ pte_t *ptep, int local)
{
unsigned long pto = (unsigned long) ptep;
#define pte_offset_kernel(pmd, address) pte_offset(pmd, address)
#define pte_offset_map(pmd, address) pte_offset_kernel(pmd, address)
-#define pte_unmap(pte) do { } while (0)
+
+static inline void pte_unmap(pte_t *pte) { }
static inline bool gup_fast_permitted(unsigned long start, unsigned long end)
{
#define IDTE_NODAT 0x1000
#define IDTE_GUEST_ASCE 0x2000
-static inline void __pmdp_idte(unsigned long addr, pmd_t *pmdp,
- unsigned long opt, unsigned long asce,
- int local)
+static __always_inline void __pmdp_idte(unsigned long addr, pmd_t *pmdp,
+ unsigned long opt, unsigned long asce,
+ int local)
{
unsigned long sto;
}
}
-static inline void __pudp_idte(unsigned long addr, pud_t *pudp,
- unsigned long opt, unsigned long asce,
- int local)
+static __always_inline void __pudp_idte(unsigned long addr, pud_t *pudp,
+ unsigned long opt, unsigned long asce,
+ int local)
{
unsigned long r3o;
/* private: */
u8 res[88];
/* public: */
- u8 parm[QDIO_MAX_BUFFERS_PER_Q];
+ u8 parm[128];
} __attribute__ ((packed, aligned(256)));
/**
const struct kexec_file_ops s390_kexec_elf_ops = {
.probe = s390_elf_probe,
.load = s390_elf_load,
-#ifdef CONFIG_KEXEC_VERIFY_SIG
+#ifdef CONFIG_KEXEC_SIG
.verify_sig = s390_verify_sig,
-#endif /* CONFIG_KEXEC_VERIFY_SIG */
+#endif /* CONFIG_KEXEC_SIG */
};
const struct kexec_file_ops s390_kexec_image_ops = {
.probe = s390_image_probe,
.load = s390_image_load,
-#ifdef CONFIG_KEXEC_VERIFY_SIG
+#ifdef CONFIG_KEXEC_SIG
.verify_sig = s390_verify_sig,
-#endif /* CONFIG_KEXEC_VERIFY_SIG */
+#endif /* CONFIG_KEXEC_SIG */
};
#include <linux/elf.h>
#include <linux/errno.h>
#include <linux/kexec.h>
-#include <linux/module.h>
+#include <linux/module_signature.h>
#include <linux/verification.h>
#include <asm/boot_data.h>
#include <asm/ipl.h>
NULL,
};
-#ifdef CONFIG_KEXEC_VERIFY_SIG
-/*
- * Module signature information block.
- *
- * The constituents of the signature section are, in order:
- *
- * - Signer's name
- * - Key identifier
- * - Signature data
- * - Information block
- */
-struct module_signature {
- u8 algo; /* Public-key crypto algorithm [0] */
- u8 hash; /* Digest algorithm [0] */
- u8 id_type; /* Key identifier type [PKEY_ID_PKCS7] */
- u8 signer_len; /* Length of signer's name [0] */
- u8 key_id_len; /* Length of key identifier [0] */
- u8 __pad[3];
- __be32 sig_len; /* Length of signature data */
-};
-
-#define PKEY_ID_PKCS7 2
-
+#ifdef CONFIG_KEXEC_SIG
int s390_verify_sig(const char *kernel, unsigned long kernel_len)
{
const unsigned long marker_len = sizeof(MODULE_SIG_STRING) - 1;
VERIFYING_MODULE_SIGNATURE,
NULL, NULL);
}
-#endif /* CONFIG_KEXEC_VERIFY_SIG */
+#endif /* CONFIG_KEXEC_SIG */
static int kexec_file_update_purgatory(struct kimage *image,
struct s390_load_data *data)
debug_sprintf_event(cf_diag_dbg, 6,
"%s ctrset %d ctrset_size %zu cfvn %d csvn %d"
- " need %zd rc:%d\n",
+ " need %zd rc %d\n",
__func__, ctrset, ctrset_size, cpuhw->info.cfvn,
cpuhw->info.csvn, need, rc);
return need;
int err = 0;
debug_sprintf_event(cf_diag_dbg, 5,
- "%s event %p cpu %d flags %#x cpuhw:%p\n",
+ "%s event %p cpu %d flags %#x cpuhw %p\n",
__func__, event, event->cpu, flags, cpuhw);
if (cpuhw->flags & PMU_F_IN_USE) {
goto out;
}
+ if (si.ribm & CPU_MF_SF_RIBM_NOTAV) {
+ pr_warn("CPU Measurement Facility sampling is temporarily not available\n");
+ err = -EBUSY;
+ goto out;
+ }
+
/* Always enable basic sampling */
SAMPL_FLAGS(hwc) = PERF_CPUM_SF_BASIC_MODE;
/* Check online status of the CPU to which the event is pinned */
if (event->cpu >= 0 && !cpu_online(event->cpu))
- return -ENODEV;
+ return -ENODEV;
/* Force reset of idle/hv excludes regardless of what the
* user requested.
return cc == 0;
}
-static inline void __insn32_query(unsigned int opcode, u8 query[32])
+static __always_inline void __insn32_query(unsigned int opcode, u8 *query)
{
register unsigned long r0 asm("0") = 0; /* query function */
register unsigned long r1 asm("1") = (unsigned long) query;
asm volatile(
/* Parameter regs are ignored */
" .insn rrf,%[opc] << 16,2,4,6,0\n"
- : "=m" (*query)
+ :
: "d" (r0), "a" (r1), [opc] "i" (opcode)
- : "cc");
+ : "cc", "memory");
}
#define INSN_SORTL 0xb938
/*
* Call Logical Processor with c=0, the give constant lps and an lpcb request.
*/
-static inline int clp_req(void *data, unsigned int lps)
+static __always_inline int clp_req(void *data, unsigned int lps)
{
struct { u8 _[CLP_BLK_SIZE]; } *req = data;
u64 ignored;
config ARCH_HAS_KEXEC_PURGATORY
def_bool KEXEC_FILE
-config KEXEC_VERIFY_SIG
+config KEXEC_SIG
bool "Verify kernel signature during kexec_file_load() syscall"
depends on KEXEC_FILE
---help---
- This option makes kernel signature verification mandatory for
- the kexec_file_load() syscall.
- In addition to that option, you need to enable signature
+ This option makes the kexec_file_load() syscall check for a valid
+ signature of the kernel image. The image can still be loaded without
+ a valid signature unless you also enable KEXEC_SIG_FORCE, though if
+ there's a signature that we can check, then it must be valid.
+
+ In addition to this option, you need to enable signature
verification for the corresponding kernel image type being
loaded in order for this to work.
+config KEXEC_SIG_FORCE
+ bool "Require a valid signature in kexec_file_load() syscall"
+ depends on KEXEC_SIG
+ ---help---
+ This option makes kernel signature verification mandatory for
+ the kexec_file_load() syscall.
+
config KEXEC_BZIMAGE_VERIFY_SIG
bool "Enable bzImage signature verification support"
- depends on KEXEC_VERIFY_SIG
+ depends on KEXEC_SIG
depends on SIGNED_PE_FILE_VERIFICATION
select SYSTEM_TRUSTED_KEYRING
---help---
*/
#define MAX_ADDR_LEN 19
-static acpi_physical_address get_acpi_rsdp(void)
+static acpi_physical_address get_cmdline_acpi_rsdp(void)
{
acpi_physical_address addr = 0;
{
acpi_physical_address pa;
- pa = get_acpi_rsdp();
-
- if (!pa)
- pa = boot_params->acpi_rsdp_addr;
+ pa = boot_params->acpi_rsdp_addr;
/*
* Try to get EFI data from setup_data. This can happen when we're a
char arg[10];
u8 *entry;
- rsdp = (struct acpi_table_rsdp *)(long)boot_params->acpi_rsdp_addr;
+ /*
+ * Check whether we were given an RSDP on the command line. We don't
+ * stash this in boot params because the kernel itself may have
+ * different ideas about whether to trust a command-line parameter.
+ */
+ rsdp = (struct acpi_table_rsdp *)get_cmdline_acpi_rsdp();
+
+ if (!rsdp)
+ rsdp = (struct acpi_table_rsdp *)(long)
+ boot_params->acpi_rsdp_addr;
+
if (!rsdp)
return 0;
return !!acpi_lapic;
}
+#define ACPI_HAVE_ARCH_SET_ROOT_POINTER
+static inline void acpi_arch_set_root_pointer(u64 addr)
+{
+ x86_init.acpi.set_root_pointer(addr);
+}
+
#define ACPI_HAVE_ARCH_GET_ROOT_POINTER
static inline u64 acpi_arch_get_root_pointer(void)
{
void acpi_generic_reduced_hw_init(void);
+void x86_default_set_root_pointer(u64 addr);
u64 x86_default_get_root_pointer(void);
#else /* !CONFIG_ACPI */
static inline void acpi_generic_reduced_hw_init(void) { }
+static inline void x86_default_set_root_pointer(u64 addr) { }
+
static inline u64 x86_default_get_root_pointer(void)
{
return 0;
/* Recommend using enlightened VMCS */
#define HV_X64_ENLIGHTENED_VMCS_RECOMMENDED BIT(14)
+/*
+ * Virtual processor will never share a physical core with another virtual
+ * processor, except for virtual processors that are reported as sibling SMT
+ * threads.
+ */
+#define HV_X64_NO_NONARCH_CORESHARING BIT(18)
+
/* Nested features. These are HYPERV_CPUID_NESTED_FEATURES.EAX bits. */
+#define HV_X64_NESTED_DIRECT_FLUSH BIT(17)
#define HV_X64_NESTED_GUEST_MAPPING_FLUSH BIT(18)
#define HV_X64_NESTED_MSR_BITMAP BIT(19)
__u64 delivery_time; /* When the message was delivered */
} __packed;
+struct hv_nested_enlightenments_control {
+ struct {
+ __u32 directhypercall:1;
+ __u32 reserved:31;
+ } features;
+ struct {
+ __u32 reserved;
+ } hypercallControls;
+} __packed;
+
/* Define virtual processor assist page structure. */
struct hv_vp_assist_page {
__u32 apic_assist;
- __u32 reserved;
- __u64 vtl_control[2];
- __u64 nested_enlightenments_control[2];
- __u32 enlighten_vmentry;
- __u32 padding;
+ __u32 reserved1;
+ __u64 vtl_control[3];
+ struct hv_nested_enlightenments_control nested_control;
+ __u8 enlighten_vmentry;
+ __u8 reserved2[7];
__u64 current_nested_vmcs;
} __packed;
u64 gva_list[];
} __packed;
+struct hv_partition_assist_pg {
+ u32 tlb_lock_count;
+};
#endif
PFERR_WRITE_MASK | \
PFERR_PRESENT_MASK)
-/*
- * The mask used to denote special SPTEs, which can be either MMIO SPTEs or
- * Access Tracking SPTEs. We use bit 62 instead of bit 63 to avoid conflicting
- * with the SVE bit in EPT PTEs.
- */
-#define SPTE_SPECIAL_MASK (1ULL << 62)
-
/* apic attention bits */
#define KVM_APIC_CHECK_VAPIC 0
/*
struct list_head link;
struct hlist_node hash_link;
bool unsync;
+ u8 mmu_valid_gen;
bool mmio_cached;
/*
int root_count; /* Currently serving as active root */
unsigned int unsync_children;
struct kvm_rmap_head parent_ptes; /* rmap pointers to parent sptes */
- unsigned long mmu_valid_gen;
DECLARE_BITMAP(unsync_child_bitmap, 512);
#ifdef CONFIG_X86_32
/* How many vCPUs have VP index != vCPU index */
atomic_t num_mismatched_vp_indexes;
+
+ struct hv_partition_assist_pg *hv_pa_pg;
};
enum kvm_irqchip_mode {
unsigned long n_requested_mmu_pages;
unsigned long n_max_mmu_pages;
unsigned int indirect_shadow_pages;
- unsigned long mmu_valid_gen;
+ u8 mmu_valid_gen;
struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
/*
* Hash table of struct kvm_mmu_page.
*/
struct list_head active_mmu_pages;
+ struct list_head zapped_obsolete_pages;
struct kvm_page_track_notifier_node mmu_sp_tracker;
struct kvm_page_track_notifier_head track_notifier_head;
bool (*need_emulation_on_page_fault)(struct kvm_vcpu *vcpu);
bool (*apic_init_signal_blocked)(struct kvm_vcpu *vcpu);
+ int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
};
struct kvm_arch_async_pf {
extern u64 kvm_mce_cap_supported;
-enum emulation_result {
- EMULATE_DONE, /* no further processing */
- EMULATE_USER_EXIT, /* kvm_run ready for userspace exit */
- EMULATE_FAIL, /* can't emulate this instruction */
-};
-
+/*
+ * EMULTYPE_NO_DECODE - Set when re-emulating an instruction (after completing
+ * userspace I/O) to indicate that the emulation context
+ * should be resued as is, i.e. skip initialization of
+ * emulation context, instruction fetch and decode.
+ *
+ * EMULTYPE_TRAP_UD - Set when emulating an intercepted #UD from hardware.
+ * Indicates that only select instructions (tagged with
+ * EmulateOnUD) should be emulated (to minimize the emulator
+ * attack surface). See also EMULTYPE_TRAP_UD_FORCED.
+ *
+ * EMULTYPE_SKIP - Set when emulating solely to skip an instruction, i.e. to
+ * decode the instruction length. For use *only* by
+ * kvm_x86_ops->skip_emulated_instruction() implementations.
+ *
+ * EMULTYPE_ALLOW_RETRY - Set when the emulator should resume the guest to
+ * retry native execution under certain conditions.
+ *
+ * EMULTYPE_TRAP_UD_FORCED - Set when emulating an intercepted #UD that was
+ * triggered by KVM's magic "force emulation" prefix,
+ * which is opt in via module param (off by default).
+ * Bypasses EmulateOnUD restriction despite emulating
+ * due to an intercepted #UD (see EMULTYPE_TRAP_UD).
+ * Used to test the full emulator from userspace.
+ *
+ * EMULTYPE_VMWARE_GP - Set when emulating an intercepted #GP for VMware
+ * backdoor emulation, which is opt in via module param.
+ * VMware backoor emulation handles select instructions
+ * and reinjects the #GP for all other cases.
+ */
#define EMULTYPE_NO_DECODE (1 << 0)
#define EMULTYPE_TRAP_UD (1 << 1)
#define EMULTYPE_SKIP (1 << 2)
#define EMULTYPE_ALLOW_RETRY (1 << 3)
-#define EMULTYPE_NO_UD_ON_FAIL (1 << 4)
-#define EMULTYPE_VMWARE (1 << 5)
+#define EMULTYPE_TRAP_UD_FORCED (1 << 4)
+#define EMULTYPE_VMWARE_GP (1 << 5)
int kvm_emulate_instruction(struct kvm_vcpu *vcpu, int emulation_type);
int kvm_emulate_instruction_from_buffer(struct kvm_vcpu *vcpu,
void *insn, int insn_len);
#define kvm_arch_vcpu_memslots_id(vcpu) ((vcpu)->arch.hflags & HF_SMM_MASK ? 1 : 0)
#define kvm_memslots_for_spte_role(kvm, role) __kvm_memslots(kvm, (role).smm)
-asmlinkage void __noreturn kvm_spurious_fault(void);
+asmlinkage void kvm_spurious_fault(void);
/*
* Hardware virtualization extension instructions may fault if a
* Usually after catching the fault we just panic; during reboot
* instead the instruction is ignored.
*/
-#define ____kvm_handle_fault_on_reboot(insn, cleanup_insn) \
+#define __kvm_handle_fault_on_reboot(insn) \
"666: \n\t" \
insn "\n\t" \
"jmp 668f \n\t" \
"667: \n\t" \
"call kvm_spurious_fault \n\t" \
"668: \n\t" \
- ".pushsection .fixup, \"ax\" \n\t" \
- "700: \n\t" \
- cleanup_insn "\n\t" \
- "cmpb $0, kvm_rebooting\n\t" \
- "je 667b \n\t" \
- "jmp 668b \n\t" \
- ".popsection \n\t" \
- _ASM_EXTABLE(666b, 700b)
-
-#define __kvm_handle_fault_on_reboot(insn) \
- ____kvm_handle_fault_on_reboot(insn, "")
+ _ASM_EXTABLE(666b, 667b)
#define KVM_ARCH_WANT_MMU_NOTIFIER
int kvm_unmap_hva_range(struct kvm *kvm, unsigned long start, unsigned long end);
INTERCEPT_MWAIT,
INTERCEPT_MWAIT_COND,
INTERCEPT_XSETBV,
+ INTERCEPT_RDPRU,
};
#define SECONDARY_EXEC_PT_USE_GPA 0x01000000
#define SECONDARY_EXEC_MODE_BASED_EPT_EXEC 0x00400000
#define SECONDARY_EXEC_TSC_SCALING 0x02000000
+#define SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE 0x04000000
#define PIN_BASED_EXT_INTR_MASK 0x00000001
#define PIN_BASED_NMI_EXITING 0x00000008
#define VMX_MISC_SAVE_EFER_LMA 0x00000020
#define VMX_MISC_ACTIVITY_HLT 0x00000040
#define VMX_MISC_ZERO_LEN_INS 0x40000000
+#define VMX_MISC_MSR_LIST_MULTIPLIER 512
/* VMFUNC functions */
#define VMX_VMFUNC_EPTP_SWITCHING 0x00000001
/**
* struct x86_init_acpi - x86 ACPI init functions
+ * @set_root_poitner: set RSDP address
* @get_root_pointer: get RSDP address
* @reduced_hw_early_init: hardware reduced platform early init
*/
struct x86_init_acpi {
+ void (*set_root_pointer)(u64 addr);
u64 (*get_root_pointer)(void);
void (*reduced_hw_early_init)(void);
};
#define SVM_EXIT_MWAIT 0x08b
#define SVM_EXIT_MWAIT_COND 0x08c
#define SVM_EXIT_XSETBV 0x08d
+#define SVM_EXIT_RDPRU 0x08e
#define SVM_EXIT_NPF 0x400
#define SVM_EXIT_AVIC_INCOMPLETE_IPI 0x401
#define SVM_EXIT_AVIC_UNACCELERATED_ACCESS 0x402
#define EXIT_REASON_PML_FULL 62
#define EXIT_REASON_XSAVES 63
#define EXIT_REASON_XRSTORS 64
+#define EXIT_REASON_UMWAIT 67
+#define EXIT_REASON_TPAUSE 68
#define VMX_EXIT_REASONS \
{ EXIT_REASON_EXCEPTION_NMI, "EXCEPTION_NMI" }, \
{ EXIT_REASON_RDSEED, "RDSEED" }, \
{ EXIT_REASON_PML_FULL, "PML_FULL" }, \
{ EXIT_REASON_XSAVES, "XSAVES" }, \
- { EXIT_REASON_XRSTORS, "XRSTORS" }
+ { EXIT_REASON_XRSTORS, "XRSTORS" }, \
+ { EXIT_REASON_UMWAIT, "UMWAIT" }, \
+ { EXIT_REASON_TPAUSE, "TPAUSE" }
#define VMX_ABORT_SAVE_GUEST_MSR_FAIL 1
#define VMX_ABORT_LOAD_HOST_PDPTE_FAIL 2
e820__update_table_print();
}
+void x86_default_set_root_pointer(u64 addr)
+{
+ boot_params.acpi_rsdp_addr = addr;
+}
+
u64 x86_default_get_root_pointer(void)
{
return boot_params.acpi_rsdp_addr;
*/
static u32 umwait_control_cached = UMWAIT_CTRL_VAL(100000, UMWAIT_C02_ENABLE);
+u32 get_umwait_control_msr(void)
+{
+ return umwait_control_cached;
+}
+EXPORT_SYMBOL_GPL(get_umwait_control_msr);
+
/*
* Cache the original IA32_UMWAIT_CONTROL MSR value which is configured by
* hardware or BIOS before kernel boot.
/* secureboot arch rules */
static const char * const sb_arch_rules[] = {
-#if !IS_ENABLED(CONFIG_KEXEC_VERIFY_SIG)
+#if !IS_ENABLED(CONFIG_KEXEC_SIG)
"appraise func=KEXEC_KERNEL_CHECK appraise_type=imasig",
-#endif /* CONFIG_KEXEC_VERIFY_SIG */
+#endif /* CONFIG_KEXEC_SIG */
"measure func=KEXEC_KERNEL_CHECK",
#if !IS_ENABLED(CONFIG_MODULE_SIG)
"appraise func=MODULE_CHECK appraise_type=imasig",
#include <linux/errno.h>
#include <linux/types.h>
#include <linux/ioport.h>
+#include <linux/security.h>
#include <linux/smp.h>
#include <linux/stddef.h>
#include <linux/slab.h>
if ((from + num <= from) || (from + num > IO_BITMAP_BITS))
return -EINVAL;
- if (turn_on && !capable(CAP_SYS_RAWIO))
+ if (turn_on && (!capable(CAP_SYS_RAWIO) ||
+ security_locked_down(LOCKDOWN_IOPORT)))
return -EPERM;
/*
return -EINVAL;
/* Trying to gain more privileges? */
if (level > old) {
- if (!capable(CAP_SYS_RAWIO))
+ if (!capable(CAP_SYS_RAWIO) ||
+ security_locked_down(LOCKDOWN_IOPORT))
return -EPERM;
}
regs->flags = (regs->flags & ~X86_EFLAGS_IOPL) |
if (efi_enabled(EFI_OLD_MEMMAP))
return 0;
+ params->secure_boot = boot_params.secure_boot;
ei->efi_loader_signature = current_ei->efi_loader_signature;
ei->efi_systab = current_ei->efi_systab;
ei->efi_systab_hi = current_ei->efi_systab_hi;
#include <linux/notifier.h>
#include <linux/uaccess.h>
#include <linux/gfp.h>
+#include <linux/security.h>
#include <asm/cpufeature.h>
#include <asm/msr.h>
int err = 0;
ssize_t bytes = 0;
+ err = security_locked_down(LOCKDOWN_MSR);
+ if (err)
+ return err;
+
if (count % 8)
return -EINVAL; /* Invalid chunk size */
err = -EFAULT;
break;
}
+ err = security_locked_down(LOCKDOWN_MSR);
+ if (err)
+ break;
err = wrmsr_safe_regs_on_cpu(cpu, regs);
if (err)
break;
},
.acpi = {
+ .set_root_pointer = x86_default_set_root_pointer,
.get_root_pointer = x86_default_get_root_pointer,
.reduced_hw_early_init = acpi_generic_reduced_hw_init,
},
case 7:
case 0xb:
case 0xd:
+ case 0xf:
+ case 0x10:
+ case 0x12:
case 0x14:
+ case 0x17:
+ case 0x18:
+ case 0x1f:
case 0x8000001d:
entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
break;
F(AVX512VBMI) | F(LA57) | F(PKU) | 0 /*OSPKE*/ |
F(AVX512_VPOPCNTDQ) | F(UMIP) | F(AVX512_VBMI2) | F(GFNI) |
F(VAES) | F(VPCLMULQDQ) | F(AVX512_VNNI) | F(AVX512_BITALG) |
- F(CLDEMOTE) | F(MOVDIRI) | F(MOVDIR64B);
+ F(CLDEMOTE) | F(MOVDIRI) | F(MOVDIR64B) | 0 /*WAITPKG*/;
/* cpuid 7.0.edx*/
const u32 kvm_cpuid_7_0_edx_x86_features =
/* cpuid 0x80000008.ebx */
const u32 kvm_cpuid_8000_0008_ebx_x86_features =
+ F(CLZERO) | F(XSAVEERPTR) |
F(WBNOINVD) | F(AMD_IBPB) | F(AMD_IBRS) | F(AMD_SSBD) | F(VIRT_SSBD) |
F(AMD_SSB_NO) | F(AMD_STIBP) | F(AMD_STIBP_ALWAYS_ON);
*/
case 0x1f:
case 0xb: {
- int i, level_type;
+ int i;
- /* read more entries until level_type is zero */
- for (i = 1; ; ++i) {
+ /*
+ * We filled in entry[0] for CPUID(EAX=<function>,
+ * ECX=00H) above. If its level type (ECX[15:8]) is
+ * zero, then the leaf is unimplemented, and we're
+ * done. Otherwise, continue to populate entries
+ * until the level type (ECX[15:8]) of the previously
+ * added entry is zero.
+ */
+ for (i = 1; entry[i - 1].ecx & 0xff00; ++i) {
if (*nent >= maxnent)
goto out;
- level_type = entry[i - 1].ecx & 0xff00;
- if (!level_type)
- break;
do_host_cpuid(&entry[i], function, i);
++*nent;
}
EXPORT_SYMBOL_GPL(kvm_find_cpuid_entry);
/*
- * If no match is found, check whether we exceed the vCPU's limit
- * and return the content of the highest valid _standard_ leaf instead.
- * This is to satisfy the CPUID specification.
+ * If the basic or extended CPUID leaf requested is higher than the
+ * maximum supported basic or extended leaf, respectively, then it is
+ * out of range.
*/
-static struct kvm_cpuid_entry2* check_cpuid_limit(struct kvm_vcpu *vcpu,
- u32 function, u32 index)
+static bool cpuid_function_in_range(struct kvm_vcpu *vcpu, u32 function)
{
- struct kvm_cpuid_entry2 *maxlevel;
-
- maxlevel = kvm_find_cpuid_entry(vcpu, function & 0x80000000, 0);
- if (!maxlevel || maxlevel->eax >= function)
- return NULL;
- if (function & 0x80000000) {
- maxlevel = kvm_find_cpuid_entry(vcpu, 0, 0);
- if (!maxlevel)
- return NULL;
- }
- return kvm_find_cpuid_entry(vcpu, maxlevel->eax, index);
+ struct kvm_cpuid_entry2 *max;
+
+ max = kvm_find_cpuid_entry(vcpu, function & 0x80000000, 0);
+ return max && function <= max->eax;
}
bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx,
u32 *ecx, u32 *edx, bool check_limit)
{
u32 function = *eax, index = *ecx;
- struct kvm_cpuid_entry2 *best;
- bool entry_found = true;
-
- best = kvm_find_cpuid_entry(vcpu, function, index);
-
- if (!best) {
- entry_found = false;
- if (!check_limit)
- goto out;
+ struct kvm_cpuid_entry2 *entry;
+ struct kvm_cpuid_entry2 *max;
+ bool found;
- best = check_cpuid_limit(vcpu, function, index);
+ entry = kvm_find_cpuid_entry(vcpu, function, index);
+ found = entry;
+ /*
+ * Intel CPUID semantics treats any query for an out-of-range
+ * leaf as if the highest basic leaf (i.e. CPUID.0H:EAX) were
+ * requested. AMD CPUID semantics returns all zeroes for any
+ * undefined leaf, whether or not the leaf is in range.
+ */
+ if (!entry && check_limit && !guest_cpuid_is_amd(vcpu) &&
+ !cpuid_function_in_range(vcpu, function)) {
+ max = kvm_find_cpuid_entry(vcpu, 0, 0);
+ if (max) {
+ function = max->eax;
+ entry = kvm_find_cpuid_entry(vcpu, function, index);
+ }
}
-
-out:
- if (best) {
- *eax = best->eax;
- *ebx = best->ebx;
- *ecx = best->ecx;
- *edx = best->edx;
- } else
+ if (entry) {
+ *eax = entry->eax;
+ *ebx = entry->ebx;
+ *ecx = entry->ecx;
+ *edx = entry->edx;
+ } else {
*eax = *ebx = *ecx = *edx = 0;
- trace_kvm_cpuid(function, *eax, *ebx, *ecx, *edx, entry_found);
- return entry_found;
+ /*
+ * When leaf 0BH or 1FH is defined, CL is pass-through
+ * and EDX is always the x2APIC ID, even for undefined
+ * subleaves. Index 1 will exist iff the leaf is
+ * implemented, so we pass through CL iff leaf 1
+ * exists. EDX can be copied from any existing index.
+ */
+ if (function == 0xb || function == 0x1f) {
+ entry = kvm_find_cpuid_entry(vcpu, function, 1);
+ if (entry) {
+ *ecx = index & 0xff;
+ *edx = entry->edx;
+ }
+ }
+ }
+ trace_kvm_cpuid(function, *eax, *ebx, *ecx, *edx, found);
+ return found;
}
EXPORT_SYMBOL_GPL(kvm_cpuid);
#include "ioapic.h"
#include "hyperv.h"
+#include <linux/cpu.h>
#include <linux/kvm_host.h>
#include <linux/highmem.h>
#include <linux/sched/cputime.h>
.vector = stimer->config.apic_vector
};
- return !kvm_apic_set_irq(vcpu, &irq, NULL);
+ if (lapic_in_kernel(vcpu))
+ return !kvm_apic_set_irq(vcpu, &irq, NULL);
+ return 0;
}
static void stimer_expiration(struct kvm_vcpu_hv_stimer *stimer)
ent->edx |= HV_FEATURE_FREQUENCY_MSRS_AVAILABLE;
ent->edx |= HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE;
- ent->edx |= HV_STIMER_DIRECT_MODE_AVAILABLE;
+
+ /*
+ * Direct Synthetic timers only make sense with in-kernel
+ * LAPIC
+ */
+ if (lapic_in_kernel(vcpu))
+ ent->edx |= HV_STIMER_DIRECT_MODE_AVAILABLE;
break;
ent->eax |= HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED;
if (evmcs_ver)
ent->eax |= HV_X64_ENLIGHTENED_VMCS_RECOMMENDED;
-
+ if (!cpu_smt_possible())
+ ent->eax |= HV_X64_NO_NONARCH_CORESHARING;
/*
* Default number of spinlock retry attempts, matches
* HyperV 2016.
#define APIC_BROADCAST 0xFF
#define X2APIC_BROADCAST 0xFFFFFFFFul
-#define LAPIC_TIMER_ADVANCE_ADJUST_DONE 100
-#define LAPIC_TIMER_ADVANCE_ADJUST_INIT 1000
+static bool lapic_timer_advance_dynamic __read_mostly;
+#define LAPIC_TIMER_ADVANCE_ADJUST_MIN 100 /* clock cycles */
+#define LAPIC_TIMER_ADVANCE_ADJUST_MAX 10000 /* clock cycles */
+#define LAPIC_TIMER_ADVANCE_NS_INIT 1000
+#define LAPIC_TIMER_ADVANCE_NS_MAX 5000
/* step-by-step approximation to mitigate fluctuation */
#define LAPIC_TIMER_ADVANCE_ADJUST_STEP 8
u32 timer_advance_ns = apic->lapic_timer.timer_advance_ns;
u64 ns;
+ /* Do not adjust for tiny fluctuations or large random spikes. */
+ if (abs(advance_expire_delta) > LAPIC_TIMER_ADVANCE_ADJUST_MAX ||
+ abs(advance_expire_delta) < LAPIC_TIMER_ADVANCE_ADJUST_MIN)
+ return;
+
/* too early */
if (advance_expire_delta < 0) {
ns = -advance_expire_delta * 1000000ULL;
do_div(ns, vcpu->arch.virtual_tsc_khz);
- timer_advance_ns -= min((u32)ns,
- timer_advance_ns / LAPIC_TIMER_ADVANCE_ADJUST_STEP);
+ timer_advance_ns -= ns/LAPIC_TIMER_ADVANCE_ADJUST_STEP;
} else {
/* too late */
ns = advance_expire_delta * 1000000ULL;
do_div(ns, vcpu->arch.virtual_tsc_khz);
- timer_advance_ns += min((u32)ns,
- timer_advance_ns / LAPIC_TIMER_ADVANCE_ADJUST_STEP);
+ timer_advance_ns += ns/LAPIC_TIMER_ADVANCE_ADJUST_STEP;
}
- if (abs(advance_expire_delta) < LAPIC_TIMER_ADVANCE_ADJUST_DONE)
- apic->lapic_timer.timer_advance_adjust_done = true;
- if (unlikely(timer_advance_ns > 5000)) {
- timer_advance_ns = LAPIC_TIMER_ADVANCE_ADJUST_INIT;
- apic->lapic_timer.timer_advance_adjust_done = false;
- }
+ if (unlikely(timer_advance_ns > LAPIC_TIMER_ADVANCE_NS_MAX))
+ timer_advance_ns = LAPIC_TIMER_ADVANCE_NS_INIT;
apic->lapic_timer.timer_advance_ns = timer_advance_ns;
}
if (guest_tsc < tsc_deadline)
__wait_lapic_expire(vcpu, tsc_deadline - guest_tsc);
- if (unlikely(!apic->lapic_timer.timer_advance_adjust_done))
+ if (lapic_timer_advance_dynamic)
adjust_lapic_timer_advance(vcpu, apic->lapic_timer.advance_expire_delta);
}
HRTIMER_MODE_ABS_HARD);
apic->lapic_timer.timer.function = apic_timer_fn;
if (timer_advance_ns == -1) {
- apic->lapic_timer.timer_advance_ns = LAPIC_TIMER_ADVANCE_ADJUST_INIT;
- apic->lapic_timer.timer_advance_adjust_done = false;
+ apic->lapic_timer.timer_advance_ns = LAPIC_TIMER_ADVANCE_NS_INIT;
+ lapic_timer_advance_dynamic = true;
} else {
apic->lapic_timer.timer_advance_ns = timer_advance_ns;
- apic->lapic_timer.timer_advance_adjust_done = true;
+ lapic_timer_advance_dynamic = false;
}
-
/*
* APIC is created enabled. This will prevent kvm_lapic_set_base from
* thinking that APIC state has changed.
s64 advance_expire_delta;
atomic_t pending; /* accumulated triggered timers */
bool hv_timer_in_use;
- bool timer_advance_adjust_done;
};
struct kvm_lapic {
#define PTE_PREFETCH_NUM 8
#define PT_FIRST_AVAIL_BITS_SHIFT 10
-#define PT64_SECOND_AVAIL_BITS_SHIFT 52
+#define PT64_SECOND_AVAIL_BITS_SHIFT 54
+
+/*
+ * The mask used to denote special SPTEs, which can be either MMIO SPTEs or
+ * Access Tracking SPTEs.
+ */
+#define SPTE_SPECIAL_MASK (3ULL << 52)
+#define SPTE_AD_ENABLED_MASK (0ULL << 52)
+#define SPTE_AD_DISABLED_MASK (1ULL << 52)
+#define SPTE_AD_WRPROT_ONLY_MASK (2ULL << 52)
+#define SPTE_MMIO_MASK (3ULL << 52)
#define PT64_LEVEL_BITS 9
static u64 __read_mostly shadow_me_mask;
/*
- * SPTEs used by MMUs without A/D bits are marked with shadow_acc_track_value.
- * Non-present SPTEs with shadow_acc_track_value set are in place for access
- * tracking.
+ * SPTEs used by MMUs without A/D bits are marked with SPTE_AD_DISABLED_MASK;
+ * shadow_acc_track_mask is the set of bits to be cleared in non-accessed
+ * pages.
*/
static u64 __read_mostly shadow_acc_track_mask;
-static const u64 shadow_acc_track_value = SPTE_SPECIAL_MASK;
/*
* The mask/shift to use for saving the original R/X bits when marking the PTE
{
BUG_ON((u64)(unsigned)access_mask != access_mask);
BUG_ON((mmio_mask & mmio_value) != mmio_value);
- shadow_mmio_value = mmio_value | SPTE_SPECIAL_MASK;
+ shadow_mmio_value = mmio_value | SPTE_MMIO_MASK;
shadow_mmio_mask = mmio_mask | SPTE_SPECIAL_MASK;
shadow_mmio_access_mask = access_mask;
}
return sp->role.ad_disabled;
}
+static inline bool kvm_vcpu_ad_need_write_protect(struct kvm_vcpu *vcpu)
+{
+ /*
+ * When using the EPT page-modification log, the GPAs in the log
+ * would come from L2 rather than L1. Therefore, we need to rely
+ * on write protection to record dirty pages. This also bypasses
+ * PML, since writes now result in a vmexit.
+ */
+ return vcpu->arch.mmu == &vcpu->arch.guest_mmu;
+}
+
static inline bool spte_ad_enabled(u64 spte)
{
MMU_WARN_ON(is_mmio_spte(spte));
- return !(spte & shadow_acc_track_value);
+ return (spte & SPTE_SPECIAL_MASK) != SPTE_AD_DISABLED_MASK;
+}
+
+static inline bool spte_ad_need_write_protect(u64 spte)
+{
+ MMU_WARN_ON(is_mmio_spte(spte));
+ return (spte & SPTE_SPECIAL_MASK) != SPTE_AD_ENABLED_MASK;
}
static inline u64 spte_shadow_accessed_mask(u64 spte)
mask |= (gpa & shadow_nonpresent_or_rsvd_mask)
<< shadow_nonpresent_or_rsvd_mask_len;
- page_header(__pa(sptep))->mmio_cached = true;
-
trace_mark_mmio_spte(sptep, gfn, access, gen);
mmu_spte_set(sptep, mask);
}
{
BUG_ON(!dirty_mask != !accessed_mask);
BUG_ON(!accessed_mask && !acc_track_mask);
- BUG_ON(acc_track_mask & shadow_acc_track_value);
+ BUG_ON(acc_track_mask & SPTE_SPECIAL_MASK);
shadow_user_mask = user_mask;
shadow_accessed_mask = accessed_mask;
rmap_printk("rmap_clear_dirty: spte %p %llx\n", sptep, *sptep);
+ MMU_WARN_ON(!spte_ad_enabled(spte));
spte &= ~shadow_dirty_mask;
-
return mmu_spte_update(sptep, spte);
}
-static bool wrprot_ad_disabled_spte(u64 *sptep)
+static bool spte_wrprot_for_clear_dirty(u64 *sptep)
{
bool was_writable = test_and_clear_bit(PT_WRITABLE_SHIFT,
(unsigned long *)sptep);
- if (was_writable)
+ if (was_writable && !spte_ad_enabled(*sptep))
kvm_set_pfn_dirty(spte_to_pfn(*sptep));
return was_writable;
bool flush = false;
for_each_rmap_spte(rmap_head, &iter, sptep)
- if (spte_ad_enabled(*sptep))
- flush |= spte_clear_dirty(sptep);
+ if (spte_ad_need_write_protect(*sptep))
+ flush |= spte_wrprot_for_clear_dirty(sptep);
else
- flush |= wrprot_ad_disabled_spte(sptep);
+ flush |= spte_clear_dirty(sptep);
return flush;
}
rmap_printk("rmap_set_dirty: spte %p %llx\n", sptep, *sptep);
+ /*
+ * Similar to the !kvm_x86_ops->slot_disable_log_dirty case,
+ * do not bother adding back write access to pages marked
+ * SPTE_AD_WRPROT_ONLY_MASK.
+ */
spte |= shadow_dirty_mask;
return mmu_spte_update(sptep, spte);
* depends on valid pages being added to the head of the list. See
* comments in kvm_zap_obsolete_pages().
*/
+ sp->mmu_valid_gen = vcpu->kvm->arch.mmu_valid_gen;
list_add(&sp->link, &vcpu->kvm->arch.active_mmu_pages);
kvm_mod_used_mmu_pages(vcpu->kvm, +1);
return sp;
#define for_each_valid_sp(_kvm, _sp, _gfn) \
hlist_for_each_entry(_sp, \
&(_kvm)->arch.mmu_page_hash[kvm_page_table_hashfn(_gfn)], hash_link) \
- if (is_obsolete_sp((_kvm), (_sp)) || (_sp)->role.invalid) { \
+ if (is_obsolete_sp((_kvm), (_sp))) { \
} else
#define for_each_gfn_indirect_valid_sp(_kvm, _sp, _gfn) \
static bool is_obsolete_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
{
- return unlikely(sp->mmu_valid_gen != kvm->arch.mmu_valid_gen);
+ return sp->role.invalid ||
+ unlikely(sp->mmu_valid_gen != kvm->arch.mmu_valid_gen);
}
static bool kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
if (level > PT_PAGE_TABLE_LEVEL && need_sync)
flush |= kvm_sync_pages(vcpu, gfn, &invalid_list);
}
- sp->mmu_valid_gen = vcpu->kvm->arch.mmu_valid_gen;
clear_page(sp->spt);
trace_kvm_mmu_get_page(sp, true);
shadow_user_mask | shadow_x_mask | shadow_me_mask;
if (sp_ad_disabled(sp))
- spte |= shadow_acc_track_value;
+ spte |= SPTE_AD_DISABLED_MASK;
else
spte |= shadow_accessed_mask;
} else {
list_move(&sp->link, &kvm->arch.active_mmu_pages);
- if (!sp->role.invalid)
+ /*
+ * Obsolete pages cannot be used on any vCPUs, see the comment
+ * in kvm_mmu_zap_all_fast(). Note, is_obsolete_sp() also
+ * treats invalid shadow pages as being obsolete.
+ */
+ if (!is_obsolete_sp(kvm, sp))
kvm_reload_remote_mmus(kvm);
}
sp = page_header(__pa(sptep));
if (sp_ad_disabled(sp))
- spte |= shadow_acc_track_value;
+ spte |= SPTE_AD_DISABLED_MASK;
+ else if (kvm_vcpu_ad_need_write_protect(vcpu))
+ spte |= SPTE_AD_WRPROT_ONLY_MASK;
/*
* For the EPT case, shadow_present_mask is 0 if hardware
void *insn, int insn_len)
{
int r, emulation_type = 0;
- enum emulation_result er;
bool direct = vcpu->arch.mmu->direct_map;
/* With shadow page tables, fault_address contains a GVA or nGPA. */
return 1;
}
- er = x86_emulate_instruction(vcpu, cr2, emulation_type, insn, insn_len);
-
- switch (er) {
- case EMULATE_DONE:
- return 1;
- case EMULATE_USER_EXIT:
- ++vcpu->stat.mmio_exits;
- /* fall through */
- case EMULATE_FAIL:
- return 0;
- default:
- BUG();
- }
+ return x86_emulate_instruction(vcpu, cr2, emulation_type, insn,
+ insn_len);
}
EXPORT_SYMBOL_GPL(kvm_mmu_page_fault);
return ret;
}
-
+#define BATCH_ZAP_PAGES 10
static void kvm_zap_obsolete_pages(struct kvm *kvm)
{
struct kvm_mmu_page *sp, *node;
- LIST_HEAD(invalid_list);
- int ign;
+ int nr_zapped, batch = 0;
restart:
list_for_each_entry_safe_reverse(sp, node,
break;
/*
- * Do not repeatedly zap a root page to avoid unnecessary
- * KVM_REQ_MMU_RELOAD, otherwise we may not be able to
- * progress:
- * vcpu 0 vcpu 1
- * call vcpu_enter_guest():
- * 1): handle KVM_REQ_MMU_RELOAD
- * and require mmu-lock to
- * load mmu
- * repeat:
- * 1): zap root page and
- * send KVM_REQ_MMU_RELOAD
- *
- * 2): if (cond_resched_lock(mmu-lock))
- *
- * 2): hold mmu-lock and load mmu
- *
- * 3): see KVM_REQ_MMU_RELOAD bit
- * on vcpu->requests is set
- * then return 1 to call
- * vcpu_enter_guest() again.
- * goto repeat;
- *
- * Since we are reversely walking the list and the invalid
- * list will be moved to the head, skip the invalid page
- * can help us to avoid the infinity list walking.
+ * Skip invalid pages with a non-zero root count, zapping pages
+ * with a non-zero root count will never succeed, i.e. the page
+ * will get thrown back on active_mmu_pages and we'll get stuck
+ * in an infinite loop.
*/
- if (sp->role.invalid)
+ if (sp->role.invalid && sp->root_count)
continue;
- if (need_resched() || spin_needbreak(&kvm->mmu_lock)) {
- kvm_mmu_commit_zap_page(kvm, &invalid_list);
- cond_resched_lock(&kvm->mmu_lock);
+ /*
+ * No need to flush the TLB since we're only zapping shadow
+ * pages with an obsolete generation number and all vCPUS have
+ * loaded a new root, i.e. the shadow pages being zapped cannot
+ * be in active use by the guest.
+ */
+ if (batch >= BATCH_ZAP_PAGES &&
+ cond_resched_lock(&kvm->mmu_lock)) {
+ batch = 0;
goto restart;
}
- if (__kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list, &ign))
+ if (__kvm_mmu_prepare_zap_page(kvm, sp,
+ &kvm->arch.zapped_obsolete_pages, &nr_zapped)) {
+ batch += nr_zapped;
goto restart;
+ }
}
- kvm_mmu_commit_zap_page(kvm, &invalid_list);
+ /*
+ * Trigger a remote TLB flush before freeing the page tables to ensure
+ * KVM is not in the middle of a lockless shadow page table walk, which
+ * may reference the pages.
+ */
+ kvm_mmu_commit_zap_page(kvm, &kvm->arch.zapped_obsolete_pages);
}
/*
*/
static void kvm_mmu_zap_all_fast(struct kvm *kvm)
{
+ lockdep_assert_held(&kvm->slots_lock);
+
spin_lock(&kvm->mmu_lock);
- kvm->arch.mmu_valid_gen++;
+ trace_kvm_mmu_zap_all_fast(kvm);
+
+ /*
+ * Toggle mmu_valid_gen between '0' and '1'. Because slots_lock is
+ * held for the entire duration of zapping obsolete pages, it's
+ * impossible for there to be multiple invalid generations associated
+ * with *valid* shadow pages at any given time, i.e. there is exactly
+ * one valid generation and (at most) one invalid generation.
+ */
+ kvm->arch.mmu_valid_gen = kvm->arch.mmu_valid_gen ? 0 : 1;
+
+ /*
+ * Notify all vcpus to reload its shadow page table and flush TLB.
+ * Then all vcpus will switch to new shadow page table with the new
+ * mmu_valid_gen.
+ *
+ * Note: we need to do this under the protection of mmu_lock,
+ * otherwise, vcpu would purge shadow page but miss tlb flush.
+ */
+ kvm_reload_remote_mmus(kvm);
kvm_zap_obsolete_pages(kvm);
spin_unlock(&kvm->mmu_lock);
}
+static bool kvm_has_zapped_obsolete_pages(struct kvm *kvm)
+{
+ return unlikely(!list_empty_careful(&kvm->arch.zapped_obsolete_pages));
+}
+
static void kvm_mmu_invalidate_zap_pages_in_memslot(struct kvm *kvm,
struct kvm_memory_slot *slot,
struct kvm_page_track_notifier_node *node)
}
EXPORT_SYMBOL_GPL(kvm_mmu_slot_set_dirty);
-static void __kvm_mmu_zap_all(struct kvm *kvm, bool mmio_only)
+void kvm_mmu_zap_all(struct kvm *kvm)
{
struct kvm_mmu_page *sp, *node;
LIST_HEAD(invalid_list);
spin_lock(&kvm->mmu_lock);
restart:
list_for_each_entry_safe(sp, node, &kvm->arch.active_mmu_pages, link) {
- if (mmio_only && !sp->mmio_cached)
- continue;
if (sp->role.invalid && sp->root_count)
continue;
- if (__kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list, &ign)) {
- WARN_ON_ONCE(mmio_only);
+ if (__kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list, &ign))
goto restart;
- }
if (cond_resched_lock(&kvm->mmu_lock))
goto restart;
}
spin_unlock(&kvm->mmu_lock);
}
-void kvm_mmu_zap_all(struct kvm *kvm)
-{
- return __kvm_mmu_zap_all(kvm, false);
-}
-
void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen)
{
WARN_ON(gen & KVM_MEMSLOT_GEN_UPDATE_IN_PROGRESS);
*/
if (unlikely(gen == 0)) {
kvm_debug_ratelimited("kvm: zapping shadow pages for mmio generation wraparound\n");
- __kvm_mmu_zap_all(kvm, true);
+ kvm_mmu_zap_all_fast(kvm);
}
}
* want to shrink a VM that only started to populate its MMU
* anyway.
*/
- if (!kvm->arch.n_used_mmu_pages)
+ if (!kvm->arch.n_used_mmu_pages &&
+ !kvm_has_zapped_obsolete_pages(kvm))
continue;
idx = srcu_read_lock(&kvm->srcu);
spin_lock(&kvm->mmu_lock);
+ if (kvm_has_zapped_obsolete_pages(kvm)) {
+ kvm_mmu_commit_zap_page(kvm,
+ &kvm->arch.zapped_obsolete_pages);
+ goto unlock;
+ }
+
if (prepare_zap_oldest_mmu_page(kvm, &invalid_list))
freed++;
kvm_mmu_commit_zap_page(kvm, &invalid_list);
+unlock:
spin_unlock(&kvm->mmu_lock);
srcu_read_unlock(&kvm->srcu, idx);
#undef TRACE_SYSTEM
#define TRACE_SYSTEM kvmmmu
-#define KVM_MMU_PAGE_FIELDS \
- __field(__u64, gfn) \
- __field(__u32, role) \
- __field(__u32, root_count) \
+#define KVM_MMU_PAGE_FIELDS \
+ __field(__u8, mmu_valid_gen) \
+ __field(__u64, gfn) \
+ __field(__u32, role) \
+ __field(__u32, root_count) \
__field(bool, unsync)
-#define KVM_MMU_PAGE_ASSIGN(sp) \
- __entry->gfn = sp->gfn; \
- __entry->role = sp->role.word; \
- __entry->root_count = sp->root_count; \
+#define KVM_MMU_PAGE_ASSIGN(sp) \
+ __entry->mmu_valid_gen = sp->mmu_valid_gen; \
+ __entry->gfn = sp->gfn; \
+ __entry->role = sp->role.word; \
+ __entry->root_count = sp->root_count; \
__entry->unsync = sp->unsync;
#define KVM_MMU_PAGE_PRINTK() ({ \
\
role.word = __entry->role; \
\
- trace_seq_printf(p, "sp gfn %llx l%u %u-byte q%u%s %s%s" \
+ trace_seq_printf(p, "sp gen %u gfn %llx l%u %u-byte q%u%s %s%s" \
" %snxe %sad root %u %s%c", \
+ __entry->mmu_valid_gen, \
__entry->gfn, role.level, \
role.gpte_is_8_bytes ? 8 : 4, \
role.quadrant, \
)
);
+TRACE_EVENT(
+ kvm_mmu_zap_all_fast,
+ TP_PROTO(struct kvm *kvm),
+ TP_ARGS(kvm),
+
+ TP_STRUCT__entry(
+ __field(__u8, mmu_valid_gen)
+ __field(unsigned int, mmu_used_pages)
+ ),
+
+ TP_fast_assign(
+ __entry->mmu_valid_gen = kvm->arch.mmu_valid_gen;
+ __entry->mmu_used_pages = kvm->arch.n_used_mmu_pages;
+ ),
+
+ TP_printk("kvm-mmu-valid-gen %u used_pages %x",
+ __entry->mmu_valid_gen, __entry->mmu_used_pages
+ )
+);
+
+
TRACE_EVENT(
check_mmio_spte,
TP_PROTO(u64 spte, unsigned int kvm_gen, unsigned int spte_gen),
svm->next_rip = svm->vmcb->control.next_rip;
}
- if (!svm->next_rip)
- return kvm_emulate_instruction(vcpu, EMULTYPE_SKIP);
-
- if (svm->next_rip - kvm_rip_read(vcpu) > MAX_INST_SIZE)
- printk(KERN_ERR "%s: ip 0x%lx next 0x%llx\n",
- __func__, kvm_rip_read(vcpu), svm->next_rip);
-
- kvm_rip_write(vcpu, svm->next_rip);
+ if (!svm->next_rip) {
+ if (!kvm_emulate_instruction(vcpu, EMULTYPE_SKIP))
+ return 0;
+ } else {
+ if (svm->next_rip - kvm_rip_read(vcpu) > MAX_INST_SIZE)
+ pr_err("%s: ip 0x%lx next 0x%llx\n",
+ __func__, kvm_rip_read(vcpu), svm->next_rip);
+ kvm_rip_write(vcpu, svm->next_rip);
+ }
svm_set_interrupt_shadow(vcpu, 0);
- return EMULATE_DONE;
+ return 1;
}
static void svm_queue_exception(struct kvm_vcpu *vcpu)
set_intercept(svm, INTERCEPT_SKINIT);
set_intercept(svm, INTERCEPT_WBINVD);
set_intercept(svm, INTERCEPT_XSETBV);
+ set_intercept(svm, INTERCEPT_RDPRU);
set_intercept(svm, INTERCEPT_RSM);
if (!kvm_mwait_in_guest(svm->vcpu.kvm)) {
{
struct kvm_vcpu *vcpu = &svm->vcpu;
u32 error_code = svm->vmcb->control.exit_info_1;
- int er;
WARN_ON_ONCE(!enable_vmware_backdoor);
- er = kvm_emulate_instruction(vcpu,
- EMULTYPE_VMWARE | EMULTYPE_NO_UD_ON_FAIL);
- if (er == EMULATE_USER_EXIT)
- return 0;
- else if (er != EMULATE_DONE)
+ /*
+ * VMware backdoor emulation on #GP interception only handles IN{S},
+ * OUT{S}, and RDPMC, none of which generate a non-zero error code.
+ */
+ if (error_code) {
kvm_queue_exception_e(vcpu, GP_VECTOR, error_code);
- return 1;
+ return 1;
+ }
+ return kvm_emulate_instruction(vcpu, EMULTYPE_VMWARE_GP);
}
static bool is_erratum_383(void)
string = (io_info & SVM_IOIO_STR_MASK) != 0;
in = (io_info & SVM_IOIO_TYPE_MASK) != 0;
if (string)
- return kvm_emulate_instruction(vcpu, 0) == EMULATE_DONE;
+ return kvm_emulate_instruction(vcpu, 0);
port = io_info >> 16;
size = (io_info & SVM_IOIO_SIZE_MASK) >> SVM_IOIO_SIZE_SHIFT;
return 1;
}
+static int rdpru_interception(struct vcpu_svm *svm)
+{
+ kvm_queue_exception(&svm->vcpu, UD_VECTOR);
+ return 1;
+}
+
static int task_switch_interception(struct vcpu_svm *svm)
{
u16 tss_selector;
int_type == SVM_EXITINTINFO_TYPE_SOFT ||
(int_type == SVM_EXITINTINFO_TYPE_EXEPT &&
(int_vec == OF_VECTOR || int_vec == BP_VECTOR))) {
- if (skip_emulated_instruction(&svm->vcpu) != EMULATE_DONE)
- goto fail;
+ if (!skip_emulated_instruction(&svm->vcpu))
+ return 0;
}
if (int_type != SVM_EXITINTINFO_TYPE_SOFT)
int_vec = -1;
- if (kvm_task_switch(&svm->vcpu, tss_selector, int_vec, reason,
- has_error_code, error_code) == EMULATE_FAIL)
- goto fail;
-
- return 1;
-
-fail:
- svm->vcpu.run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
- svm->vcpu.run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION;
- svm->vcpu.run->internal.ndata = 0;
- return 0;
+ return kvm_task_switch(&svm->vcpu, tss_selector, int_vec, reason,
+ has_error_code, error_code);
}
static int cpuid_interception(struct vcpu_svm *svm)
static int invlpg_interception(struct vcpu_svm *svm)
{
if (!static_cpu_has(X86_FEATURE_DECODEASSISTS))
- return kvm_emulate_instruction(&svm->vcpu, 0) == EMULATE_DONE;
+ return kvm_emulate_instruction(&svm->vcpu, 0);
kvm_mmu_invlpg(&svm->vcpu, svm->vmcb->control.exit_info_1);
return kvm_skip_emulated_instruction(&svm->vcpu);
static int emulate_on_interception(struct vcpu_svm *svm)
{
- return kvm_emulate_instruction(&svm->vcpu, 0) == EMULATE_DONE;
+ return kvm_emulate_instruction(&svm->vcpu, 0);
}
static int rsm_interception(struct vcpu_svm *svm)
{
- return kvm_emulate_instruction_from_buffer(&svm->vcpu,
- rsm_ins_bytes, 2) == EMULATE_DONE;
+ return kvm_emulate_instruction_from_buffer(&svm->vcpu, rsm_ins_bytes, 2);
}
static int rdpmc_interception(struct vcpu_svm *svm)
ret = avic_unaccel_trap_write(svm);
} else {
/* Handling Fault */
- ret = (kvm_emulate_instruction(&svm->vcpu, 0) == EMULATE_DONE);
+ ret = kvm_emulate_instruction(&svm->vcpu, 0);
}
return ret;
[SVM_EXIT_MONITOR] = monitor_interception,
[SVM_EXIT_MWAIT] = mwait_interception,
[SVM_EXIT_XSETBV] = xsetbv_interception,
+ [SVM_EXIT_RDPRU] = rdpru_interception,
[SVM_EXIT_NPF] = npf_interception,
[SVM_EXIT_RSM] = rsm_interception,
[SVM_EXIT_AVIC_INCOMPLETE_IPI] = avic_incomplete_ipi_interception,
return ret;
}
-static int nested_enable_evmcs(struct kvm_vcpu *vcpu,
- uint16_t *vmcs_version)
-{
- /* Intel-only feature */
- return -ENODEV;
-}
-
static bool svm_need_emulation_on_page_fault(struct kvm_vcpu *vcpu)
{
unsigned long cr4 = kvm_read_cr4(vcpu);
.mem_enc_reg_region = svm_register_enc_region,
.mem_enc_unreg_region = svm_unregister_enc_region,
- .nested_enable_evmcs = nested_enable_evmcs,
+ .nested_enable_evmcs = NULL,
.nested_get_evmcs_version = NULL,
.need_emulation_on_page_fault = svm_need_emulation_on_page_fault,
SECONDARY_EXEC_XSAVES;
}
+static inline bool vmx_waitpkg_supported(void)
+{
+ return vmcs_config.cpu_based_2nd_exec_ctrl &
+ SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE;
+}
+
static inline bool cpu_has_vmx_tsc_scaling(void)
{
return vmcs_config.cpu_based_2nd_exec_ctrl &
struct hv_vp_assist_page *vp_ap =
hv_get_vp_assist_page(smp_processor_id());
+ if (current_evmcs->hv_enlightenments_control.nested_flush_hypercall)
+ vp_ap->nested_control.features.directhypercall = 1;
vp_ap->current_nested_vmcs = phys_addr;
vp_ap->enlighten_vmentry = 1;
}
pr_debug_ratelimited("kvm: nested vmx abort, indicator %d\n", indicator);
}
+static inline bool vmx_control_verify(u32 control, u32 low, u32 high)
+{
+ return fixed_bits_valid(control, low, high);
+}
+
+static inline u64 vmx_control_msr(u32 low, u32 high)
+{
+ return low | ((u64)high << 32);
+}
+
static void vmx_disable_shadow_vmcs(struct vcpu_vmx *vmx)
{
secondary_exec_controls_clearbit(vmx, SECONDARY_EXEC_SHADOW_VMCS);
return 0;
}
+static u32 nested_vmx_max_atomic_switch_msrs(struct kvm_vcpu *vcpu)
+{
+ struct vcpu_vmx *vmx = to_vmx(vcpu);
+ u64 vmx_misc = vmx_control_msr(vmx->nested.msrs.misc_low,
+ vmx->nested.msrs.misc_high);
+
+ return (vmx_misc_max_msr(vmx_misc) + 1) * VMX_MISC_MSR_LIST_MULTIPLIER;
+}
+
/*
* Load guest's/host's msr at nested entry/exit.
* return 0 for success, entry index for failure.
+ *
+ * One of the failure modes for MSR load/store is when a list exceeds the
+ * virtual hardware's capacity. To maintain compatibility with hardware inasmuch
+ * as possible, process all valid entries before failing rather than precheck
+ * for a capacity violation.
*/
static u32 nested_vmx_load_msr(struct kvm_vcpu *vcpu, u64 gpa, u32 count)
{
u32 i;
struct vmx_msr_entry e;
+ u32 max_msr_list_size = nested_vmx_max_atomic_switch_msrs(vcpu);
for (i = 0; i < count; i++) {
+ if (unlikely(i >= max_msr_list_size))
+ goto fail;
+
if (kvm_vcpu_read_guest(vcpu, gpa + i * sizeof(e),
&e, sizeof(e))) {
pr_debug_ratelimited(
u64 data;
u32 i;
struct vmx_msr_entry e;
+ u32 max_msr_list_size = nested_vmx_max_atomic_switch_msrs(vcpu);
for (i = 0; i < count; i++) {
+ if (unlikely(i >= max_msr_list_size))
+ return -EINVAL;
+
if (kvm_vcpu_read_guest(vcpu,
gpa + i * sizeof(e),
&e, 2 * sizeof(u32))) {
return vmx->nested.vpid02 ? vmx->nested.vpid02 : vmx->vpid;
}
-
-static inline bool vmx_control_verify(u32 control, u32 low, u32 high)
-{
- return fixed_bits_valid(control, low, high);
-}
-
-static inline u64 vmx_control_msr(u32 low, u32 high)
-{
- return low | ((u64)high << 32);
-}
-
static bool is_bitwise_subset(u64 superset, u64 subset, u64 mask)
{
superset &= mask;
SECONDARY_EXEC_ENABLE_INVPCID |
SECONDARY_EXEC_RDTSCP |
SECONDARY_EXEC_XSAVES |
+ SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE |
SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
SECONDARY_EXEC_APIC_REGISTER_VIRT |
SECONDARY_EXEC_ENABLE_VMFUNC);
/* VM-entry exception error code */
if (CC(has_error_code &&
- vmcs12->vm_entry_exception_error_code & GENMASK(31, 15)))
+ vmcs12->vm_entry_exception_error_code & GENMASK(31, 16)))
return -EINVAL;
/* VM-entry interruption-info field: reserved bits */
CC(!kvm_pat_valid(vmcs12->host_ia32_pat)))
return -EINVAL;
- ia32e = (vmcs12->vm_exit_controls &
- VM_EXIT_HOST_ADDR_SPACE_SIZE) != 0;
+#ifdef CONFIG_X86_64
+ ia32e = !!(vcpu->arch.efer & EFER_LMA);
+#else
+ ia32e = false;
+#endif
+
+ if (ia32e) {
+ if (CC(!(vmcs12->vm_exit_controls & VM_EXIT_HOST_ADDR_SPACE_SIZE)) ||
+ CC(!(vmcs12->host_cr4 & X86_CR4_PAE)))
+ return -EINVAL;
+ } else {
+ if (CC(vmcs12->vm_exit_controls & VM_EXIT_HOST_ADDR_SPACE_SIZE) ||
+ CC(vmcs12->vm_entry_controls & VM_ENTRY_IA32E_MODE) ||
+ CC(vmcs12->host_cr4 & X86_CR4_PCIDE) ||
+ CC((vmcs12->host_rip) >> 32))
+ return -EINVAL;
+ }
if (CC(vmcs12->host_cs_selector & (SEGMENT_RPL_MASK | SEGMENT_TI_MASK)) ||
CC(vmcs12->host_ss_selector & (SEGMENT_RPL_MASK | SEGMENT_TI_MASK)) ||
CC(is_noncanonical_address(vmcs12->host_gs_base, vcpu)) ||
CC(is_noncanonical_address(vmcs12->host_gdtr_base, vcpu)) ||
CC(is_noncanonical_address(vmcs12->host_idtr_base, vcpu)) ||
- CC(is_noncanonical_address(vmcs12->host_tr_base, vcpu)))
+ CC(is_noncanonical_address(vmcs12->host_tr_base, vcpu)) ||
+ CC(is_noncanonical_address(vmcs12->host_rip, vcpu)))
return -EINVAL;
#endif
case EXIT_REASON_ENCLS:
/* SGX is never exposed to L1 */
return false;
+ case EXIT_REASON_UMWAIT:
+ case EXIT_REASON_TPAUSE:
+ return nested_cpu_has2(vmcs12,
+ SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE);
default:
return true;
}
#include "vmcs.h"
#define __ex(x) __kvm_handle_fault_on_reboot(x)
-#define __ex_clear(x, reg) \
- ____kvm_handle_fault_on_reboot(x, "xor " reg ", " reg)
+
+asmlinkage void vmread_error(unsigned long field, bool fault);
+void vmwrite_error(unsigned long field, unsigned long value);
+void vmclear_error(struct vmcs *vmcs, u64 phys_addr);
+void vmptrld_error(struct vmcs *vmcs, u64 phys_addr);
+void invvpid_error(unsigned long ext, u16 vpid, gva_t gva);
+void invept_error(unsigned long ext, u64 eptp, gpa_t gpa);
static __always_inline void vmcs_check16(unsigned long field)
{
{
unsigned long value;
- asm volatile (__ex_clear("vmread %1, %0", "%k0")
- : "=r"(value) : "r"(field));
+ asm volatile("1: vmread %2, %1\n\t"
+ ".byte 0x3e\n\t" /* branch taken hint */
+ "ja 3f\n\t"
+ "mov %2, %%" _ASM_ARG1 "\n\t"
+ "xor %%" _ASM_ARG2 ", %%" _ASM_ARG2 "\n\t"
+ "2: call vmread_error\n\t"
+ "xor %k1, %k1\n\t"
+ "3:\n\t"
+
+ ".pushsection .fixup, \"ax\"\n\t"
+ "4: mov %2, %%" _ASM_ARG1 "\n\t"
+ "mov $1, %%" _ASM_ARG2 "\n\t"
+ "jmp 2b\n\t"
+ ".popsection\n\t"
+ _ASM_EXTABLE(1b, 4b)
+ : ASM_CALL_CONSTRAINT, "=r"(value) : "r"(field) : "cc");
return value;
}
return __vmcs_readl(field);
}
-static noinline void vmwrite_error(unsigned long field, unsigned long value)
-{
- printk(KERN_ERR "vmwrite error: reg %lx value %lx (err %d)\n",
- field, value, vmcs_read32(VM_INSTRUCTION_ERROR));
- dump_stack();
-}
+#define vmx_asm1(insn, op1, error_args...) \
+do { \
+ asm_volatile_goto("1: " __stringify(insn) " %0\n\t" \
+ ".byte 0x2e\n\t" /* branch not taken hint */ \
+ "jna %l[error]\n\t" \
+ _ASM_EXTABLE(1b, %l[fault]) \
+ : : op1 : "cc" : error, fault); \
+ return; \
+error: \
+ insn##_error(error_args); \
+ return; \
+fault: \
+ kvm_spurious_fault(); \
+} while (0)
+
+#define vmx_asm2(insn, op1, op2, error_args...) \
+do { \
+ asm_volatile_goto("1: " __stringify(insn) " %1, %0\n\t" \
+ ".byte 0x2e\n\t" /* branch not taken hint */ \
+ "jna %l[error]\n\t" \
+ _ASM_EXTABLE(1b, %l[fault]) \
+ : : op1, op2 : "cc" : error, fault); \
+ return; \
+error: \
+ insn##_error(error_args); \
+ return; \
+fault: \
+ kvm_spurious_fault(); \
+} while (0)
static __always_inline void __vmcs_writel(unsigned long field, unsigned long value)
{
- bool error;
-
- asm volatile (__ex("vmwrite %2, %1") CC_SET(na)
- : CC_OUT(na) (error) : "r"(field), "rm"(value));
- if (unlikely(error))
- vmwrite_error(field, value);
+ vmx_asm2(vmwrite, "r"(field), "rm"(value), field, value);
}
static __always_inline void vmcs_write16(unsigned long field, u16 value)
static inline void vmcs_clear(struct vmcs *vmcs)
{
u64 phys_addr = __pa(vmcs);
- bool error;
- asm volatile (__ex("vmclear %1") CC_SET(na)
- : CC_OUT(na) (error) : "m"(phys_addr));
- if (unlikely(error))
- printk(KERN_ERR "kvm: vmclear fail: %p/%llx\n",
- vmcs, phys_addr);
+ vmx_asm1(vmclear, "m"(phys_addr), vmcs, phys_addr);
}
static inline void vmcs_load(struct vmcs *vmcs)
{
u64 phys_addr = __pa(vmcs);
- bool error;
if (static_branch_unlikely(&enable_evmcs))
return evmcs_load(phys_addr);
- asm volatile (__ex("vmptrld %1") CC_SET(na)
- : CC_OUT(na) (error) : "m"(phys_addr));
- if (unlikely(error))
- printk(KERN_ERR "kvm: vmptrld %p/%llx failed\n",
- vmcs, phys_addr);
+ vmx_asm1(vmptrld, "m"(phys_addr), vmcs, phys_addr);
}
static inline void __invvpid(unsigned long ext, u16 vpid, gva_t gva)
u64 rsvd : 48;
u64 gva;
} operand = { vpid, 0, gva };
- bool error;
- asm volatile (__ex("invvpid %2, %1") CC_SET(na)
- : CC_OUT(na) (error) : "r"(ext), "m"(operand));
- BUG_ON(error);
+ vmx_asm2(invvpid, "r"(ext), "m"(operand), ext, vpid, gva);
}
static inline void __invept(unsigned long ext, u64 eptp, gpa_t gpa)
struct {
u64 eptp, gpa;
} operand = {eptp, gpa};
- bool error;
- asm volatile (__ex("invept %2, %1") CC_SET(na)
- : CC_OUT(na) (error) : "r"(ext), "m"(operand));
- BUG_ON(error);
+ vmx_asm2(invept, "r"(ext), "m"(operand), ext, eptp, gpa);
}
static inline bool vpid_sync_vcpu_addr(int vpid, gva_t addr)
static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
{
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+ struct x86_pmu_capability x86_pmu;
struct kvm_cpuid_entry2 *entry;
union cpuid10_eax eax;
union cpuid10_edx edx;
if (!pmu->version)
return;
+ perf_get_x86_pmu_capability(&x86_pmu);
+
pmu->nr_arch_gp_counters = min_t(int, eax.split.num_counters,
- INTEL_PMC_MAX_GENERIC);
+ x86_pmu.num_counters_gp);
pmu->counter_bitmask[KVM_PMC_GP] = ((u64)1 << eax.split.bit_width) - 1;
pmu->available_event_types = ~entry->ebx &
((1ull << eax.split.mask_length) - 1);
} else {
pmu->nr_arch_fixed_counters =
min_t(int, edx.split.num_counters_fixed,
- INTEL_PMC_MAX_FIXED);
+ x86_pmu.num_counters_fixed);
pmu->counter_bitmask[KVM_PMC_FIXED] =
((u64)1 << edx.split.bit_width_fixed) - 1;
}
struct page *page;
unsigned int i;
+ if (!boot_cpu_has_bug(X86_BUG_L1TF)) {
+ l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_NOT_REQUIRED;
+ return 0;
+ }
+
if (!enable_ept) {
l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_EPT_DISABLED;
return 0;
void vmx_vmexit(void);
+#define vmx_insn_failed(fmt...) \
+do { \
+ WARN_ONCE(1, fmt); \
+ pr_warn_ratelimited(fmt); \
+} while (0)
+
+asmlinkage void vmread_error(unsigned long field, bool fault)
+{
+ if (fault)
+ kvm_spurious_fault();
+ else
+ vmx_insn_failed("kvm: vmread failed: field=%lx\n", field);
+}
+
+noinline void vmwrite_error(unsigned long field, unsigned long value)
+{
+ vmx_insn_failed("kvm: vmwrite failed: field=%lx val=%lx err=%d\n",
+ field, value, vmcs_read32(VM_INSTRUCTION_ERROR));
+}
+
+noinline void vmclear_error(struct vmcs *vmcs, u64 phys_addr)
+{
+ vmx_insn_failed("kvm: vmclear failed: %p/%llx\n", vmcs, phys_addr);
+}
+
+noinline void vmptrld_error(struct vmcs *vmcs, u64 phys_addr)
+{
+ vmx_insn_failed("kvm: vmptrld failed: %p/%llx\n", vmcs, phys_addr);
+}
+
+noinline void invvpid_error(unsigned long ext, u16 vpid, gva_t gva)
+{
+ vmx_insn_failed("kvm: invvpid failed: ext=0x%lx vpid=%u gva=0x%lx\n",
+ ext, vpid, gva);
+}
+
+noinline void invept_error(unsigned long ext, u64 eptp, gpa_t gpa)
+{
+ vmx_insn_failed("kvm: invept failed: ext=0x%lx eptp=%llx gpa=0x%llx\n",
+ ext, eptp, gpa);
+}
+
static DEFINE_PER_CPU(struct vmcs *, vmxarea);
DEFINE_PER_CPU(struct vmcs *, current_vmcs);
/*
return hv_remote_flush_tlb_with_range(kvm, NULL);
}
+static int hv_enable_direct_tlbflush(struct kvm_vcpu *vcpu)
+{
+ struct hv_enlightened_vmcs *evmcs;
+ struct hv_partition_assist_pg **p_hv_pa_pg =
+ &vcpu->kvm->arch.hyperv.hv_pa_pg;
+ /*
+ * Synthetic VM-Exit is not enabled in current code and so All
+ * evmcs in singe VM shares same assist page.
+ */
+ if (!*p_hv_pa_pg)
+ *p_hv_pa_pg = kzalloc(PAGE_SIZE, GFP_KERNEL);
+
+ if (!*p_hv_pa_pg)
+ return -ENOMEM;
+
+ evmcs = (struct hv_enlightened_vmcs *)to_vmx(vcpu)->loaded_vmcs->vmcs;
+
+ evmcs->partition_assist_page =
+ __pa(*p_hv_pa_pg);
+ evmcs->hv_vm_id = (unsigned long)vcpu->kvm;
+ evmcs->hv_enlightenments_control.nested_flush_hypercall = 1;
+
+ return 0;
+}
+
#endif /* IS_ENABLED(CONFIG_HYPERV) */
/*
return 0;
}
-/*
- * Returns an int to be compatible with SVM implementation (which can fail).
- * Do not use directly, use skip_emulated_instruction() instead.
- */
-static int __skip_emulated_instruction(struct kvm_vcpu *vcpu)
+static int skip_emulated_instruction(struct kvm_vcpu *vcpu)
{
unsigned long rip;
- rip = kvm_rip_read(vcpu);
- rip += vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
- kvm_rip_write(vcpu, rip);
+ /*
+ * Using VMCS.VM_EXIT_INSTRUCTION_LEN on EPT misconfig depends on
+ * undefined behavior: Intel's SDM doesn't mandate the VMCS field be
+ * set when EPT misconfig occurs. In practice, real hardware updates
+ * VM_EXIT_INSTRUCTION_LEN on EPT misconfig, but other hypervisors
+ * (namely Hyper-V) don't set it due to it being undefined behavior,
+ * i.e. we end up advancing IP with some random value.
+ */
+ if (!static_cpu_has(X86_FEATURE_HYPERVISOR) ||
+ to_vmx(vcpu)->exit_reason != EXIT_REASON_EPT_MISCONFIG) {
+ rip = kvm_rip_read(vcpu);
+ rip += vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
+ kvm_rip_write(vcpu, rip);
+ } else {
+ if (!kvm_emulate_instruction(vcpu, EMULTYPE_SKIP))
+ return 0;
+ }
/* skipping an emulated instruction also counts */
vmx_set_interrupt_shadow(vcpu, 0);
- return EMULATE_DONE;
-}
-
-static inline void skip_emulated_instruction(struct kvm_vcpu *vcpu)
-{
- (void)__skip_emulated_instruction(vcpu);
+ return 1;
}
static void vmx_clear_hlt(struct kvm_vcpu *vcpu)
int inc_eip = 0;
if (kvm_exception_is_soft(nr))
inc_eip = vcpu->arch.event_exit_inst_len;
- if (kvm_inject_realmode_interrupt(vcpu, nr, inc_eip) != EMULATE_DONE)
- kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
+ kvm_inject_realmode_interrupt(vcpu, nr, inc_eip);
return;
}
#endif
case MSR_EFER:
return kvm_get_msr_common(vcpu, msr_info);
+ case MSR_IA32_UMWAIT_CONTROL:
+ if (!msr_info->host_initiated && !vmx_has_waitpkg(vmx))
+ return 1;
+
+ msr_info->data = vmx->msr_ia32_umwait_control;
+ break;
case MSR_IA32_SPEC_CTRL:
if (!msr_info->host_initiated &&
!guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
return 1;
vmcs_write64(GUEST_BNDCFGS, data);
break;
+ case MSR_IA32_UMWAIT_CONTROL:
+ if (!msr_info->host_initiated && !vmx_has_waitpkg(vmx))
+ return 1;
+
+ /* The reserved bit 1 and non-32 bit [63:32] should be zero */
+ if (data & (BIT_ULL(1) | GENMASK_ULL(63, 32)))
+ return 1;
+
+ vmx->msr_ia32_umwait_control = data;
+ break;
case MSR_IA32_SPEC_CTRL:
if (!msr_info->host_initiated &&
!guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
SECONDARY_EXEC_RDRAND_EXITING |
SECONDARY_EXEC_ENABLE_PML |
SECONDARY_EXEC_TSC_SCALING |
+ SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE |
SECONDARY_EXEC_PT_USE_GPA |
SECONDARY_EXEC_PT_CONCEAL_VMX |
SECONDARY_EXEC_ENABLE_VMFUNC |
}
}
+ if (vmx_waitpkg_supported()) {
+ bool waitpkg_enabled =
+ guest_cpuid_has(vcpu, X86_FEATURE_WAITPKG);
+
+ if (!waitpkg_enabled)
+ exec_control &= ~SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE;
+
+ if (nested) {
+ if (waitpkg_enabled)
+ vmx->nested.msrs.secondary_ctls_high |=
+ SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE;
+ else
+ vmx->nested.msrs.secondary_ctls_high &=
+ ~SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE;
+ }
+ }
+
vmx->secondary_exec_control = exec_control;
}
vmx->rmode.vm86_active = 0;
vmx->spec_ctrl = 0;
+ vmx->msr_ia32_umwait_control = 0;
+
vcpu->arch.microcode_version = 0x100000000ULL;
vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
vmx->hv_deadline_tsc = -1;
int inc_eip = 0;
if (vcpu->arch.interrupt.soft)
inc_eip = vcpu->arch.event_exit_inst_len;
- if (kvm_inject_realmode_interrupt(vcpu, irq, inc_eip) != EMULATE_DONE)
- kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
+ kvm_inject_realmode_interrupt(vcpu, irq, inc_eip);
return;
}
intr = irq | INTR_INFO_VALID_MASK;
vmx->loaded_vmcs->nmi_known_unmasked = false;
if (vmx->rmode.vm86_active) {
- if (kvm_inject_realmode_interrupt(vcpu, NMI_VECTOR, 0) != EMULATE_DONE)
- kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
+ kvm_inject_realmode_interrupt(vcpu, NMI_VECTOR, 0);
return;
}
* Cause the #SS fault with 0 error code in VM86 mode.
*/
if (((vec == GP_VECTOR) || (vec == SS_VECTOR)) && err_code == 0) {
- if (kvm_emulate_instruction(vcpu, 0) == EMULATE_DONE) {
+ if (kvm_emulate_instruction(vcpu, 0)) {
if (vcpu->arch.halt_request) {
vcpu->arch.halt_request = 0;
return kvm_vcpu_halt(vcpu);
u32 intr_info, ex_no, error_code;
unsigned long cr2, rip, dr6;
u32 vect_info;
- enum emulation_result er;
vect_info = vmx->idt_vectoring_info;
intr_info = vmx->exit_intr_info;
if (!vmx->rmode.vm86_active && is_gp_fault(intr_info)) {
WARN_ON_ONCE(!enable_vmware_backdoor);
- er = kvm_emulate_instruction(vcpu,
- EMULTYPE_VMWARE | EMULTYPE_NO_UD_ON_FAIL);
- if (er == EMULATE_USER_EXIT)
- return 0;
- else if (er != EMULATE_DONE)
+
+ /*
+ * VMware backdoor emulation on #GP interception only handles
+ * IN{S}, OUT{S}, and RDPMC, none of which generate a non-zero
+ * error code on #GP.
+ */
+ if (error_code) {
kvm_queue_exception_e(vcpu, GP_VECTOR, error_code);
- return 1;
+ return 1;
+ }
+ return kvm_emulate_instruction(vcpu, EMULTYPE_VMWARE_GP);
}
/*
vcpu->arch.dr6 &= ~DR_TRAP_BITS;
vcpu->arch.dr6 |= dr6 | DR6_RTM;
if (is_icebp(intr_info))
- skip_emulated_instruction(vcpu);
+ WARN_ON(!skip_emulated_instruction(vcpu));
kvm_queue_exception(vcpu, DB_VECTOR);
return 1;
++vcpu->stat.io_exits;
if (string)
- return kvm_emulate_instruction(vcpu, 0) == EMULATE_DONE;
+ return kvm_emulate_instruction(vcpu, 0);
port = exit_qualification >> 16;
size = (exit_qualification & 7) + 1;
static int handle_desc(struct kvm_vcpu *vcpu)
{
WARN_ON(!(vcpu->arch.cr4 & X86_CR4_UMIP));
- return kvm_emulate_instruction(vcpu, 0) == EMULATE_DONE;
+ return kvm_emulate_instruction(vcpu, 0);
}
static int handle_cr(struct kvm_vcpu *vcpu)
static int handle_invd(struct kvm_vcpu *vcpu)
{
- return kvm_emulate_instruction(vcpu, 0) == EMULATE_DONE;
+ return kvm_emulate_instruction(vcpu, 0);
}
static int handle_invlpg(struct kvm_vcpu *vcpu)
return 1;
}
-static int handle_xsaves(struct kvm_vcpu *vcpu)
-{
- kvm_skip_emulated_instruction(vcpu);
- WARN(1, "this should never happen\n");
- return 1;
-}
-
-static int handle_xrstors(struct kvm_vcpu *vcpu)
-{
- kvm_skip_emulated_instruction(vcpu);
- WARN(1, "this should never happen\n");
- return 1;
-}
-
static int handle_apic_access(struct kvm_vcpu *vcpu)
{
if (likely(fasteoi)) {
return kvm_skip_emulated_instruction(vcpu);
}
}
- return kvm_emulate_instruction(vcpu, 0) == EMULATE_DONE;
+ return kvm_emulate_instruction(vcpu, 0);
}
static int handle_apic_eoi_induced(struct kvm_vcpu *vcpu)
if (!idt_v || (type != INTR_TYPE_HARD_EXCEPTION &&
type != INTR_TYPE_EXT_INTR &&
type != INTR_TYPE_NMI_INTR))
- skip_emulated_instruction(vcpu);
-
- if (kvm_task_switch(vcpu, tss_selector,
- type == INTR_TYPE_SOFT_INTR ? idt_index : -1, reason,
- has_error_code, error_code) == EMULATE_FAIL) {
- vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
- vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION;
- vcpu->run->internal.ndata = 0;
- return 0;
- }
+ WARN_ON(!skip_emulated_instruction(vcpu));
/*
* TODO: What about debug traps on tss switch?
* Are we supposed to inject them and update dr6?
*/
-
- return 1;
+ return kvm_task_switch(vcpu, tss_selector,
+ type == INTR_TYPE_SOFT_INTR ? idt_index : -1,
+ reason, has_error_code, error_code);
}
static int handle_ept_violation(struct kvm_vcpu *vcpu)
if (!is_guest_mode(vcpu) &&
!kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) {
trace_kvm_fast_mmio(gpa);
- /*
- * Doing kvm_skip_emulated_instruction() depends on undefined
- * behavior: Intel's manual doesn't mandate
- * VM_EXIT_INSTRUCTION_LEN to be set in VMCS when EPT MISCONFIG
- * occurs and while on real hardware it was observed to be set,
- * other hypervisors (namely Hyper-V) don't set it, we end up
- * advancing IP with some random value. Disable fast mmio when
- * running nested and keep it for real hardware in hope that
- * VM_EXIT_INSTRUCTION_LEN will always be set correctly.
- */
- if (!static_cpu_has(X86_FEATURE_HYPERVISOR))
- return kvm_skip_emulated_instruction(vcpu);
- else
- return kvm_emulate_instruction(vcpu, EMULTYPE_SKIP) ==
- EMULATE_DONE;
+ return kvm_skip_emulated_instruction(vcpu);
}
return kvm_mmu_page_fault(vcpu, gpa, PFERR_RSVD_MASK, NULL, 0);
static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
- enum emulation_result err = EMULATE_DONE;
- int ret = 1;
bool intr_window_requested;
unsigned count = 130;
if (kvm_test_request(KVM_REQ_EVENT, vcpu))
return 1;
- err = kvm_emulate_instruction(vcpu, 0);
-
- if (err == EMULATE_USER_EXIT) {
- ++vcpu->stat.mmio_exits;
- ret = 0;
- goto out;
- }
-
- if (err != EMULATE_DONE)
- goto emulation_error;
+ if (!kvm_emulate_instruction(vcpu, 0))
+ return 0;
if (vmx->emulation_required && !vmx->rmode.vm86_active &&
- vcpu->arch.exception.pending)
- goto emulation_error;
+ vcpu->arch.exception.pending) {
+ vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
+ vcpu->run->internal.suberror =
+ KVM_INTERNAL_ERROR_EMULATION;
+ vcpu->run->internal.ndata = 0;
+ return 0;
+ }
if (vcpu->arch.halt_request) {
vcpu->arch.halt_request = 0;
- ret = kvm_vcpu_halt(vcpu);
- goto out;
+ return kvm_vcpu_halt(vcpu);
}
+ /*
+ * Note, return 1 and not 0, vcpu_run() is responsible for
+ * morphing the pending signal into the proper return code.
+ */
if (signal_pending(current))
- goto out;
+ return 1;
+
if (need_resched())
schedule();
}
-out:
- return ret;
-
-emulation_error:
- vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
- vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION;
- vcpu->run->internal.ndata = 0;
- return 0;
+ return 1;
}
static void grow_ple_window(struct kvm_vcpu *vcpu)
return 1;
}
+static int handle_unexpected_vmexit(struct kvm_vcpu *vcpu)
+{
+ kvm_skip_emulated_instruction(vcpu);
+ WARN_ONCE(1, "Unexpected VM-Exit Reason = 0x%x",
+ vmcs_read32(VM_EXIT_REASON));
+ return 1;
+}
+
/*
* The exit handlers return 1 if the exit was handled fully and guest execution
* may resume. Otherwise they set the kvm_run parameter to indicate what needs
[EXIT_REASON_INVVPID] = handle_vmx_instruction,
[EXIT_REASON_RDRAND] = handle_invalid_op,
[EXIT_REASON_RDSEED] = handle_invalid_op,
- [EXIT_REASON_XSAVES] = handle_xsaves,
- [EXIT_REASON_XRSTORS] = handle_xrstors,
+ [EXIT_REASON_XSAVES] = handle_unexpected_vmexit,
+ [EXIT_REASON_XRSTORS] = handle_unexpected_vmexit,
[EXIT_REASON_PML_FULL] = handle_pml_full,
[EXIT_REASON_INVPCID] = handle_invpcid,
[EXIT_REASON_VMFUNC] = handle_vmx_instruction,
[EXIT_REASON_PREEMPTION_TIMER] = handle_preemption_timer,
[EXIT_REASON_ENCLS] = handle_encls,
+ [EXIT_REASON_UMWAIT] = handle_unexpected_vmexit,
+ [EXIT_REASON_TPAUSE] = handle_unexpected_vmexit,
};
static const int kvm_vmx_max_exit_handlers =
msrs[i].host, false);
}
+static void atomic_switch_umwait_control_msr(struct vcpu_vmx *vmx)
+{
+ u32 host_umwait_control;
+
+ if (!vmx_has_waitpkg(vmx))
+ return;
+
+ host_umwait_control = get_umwait_control_msr();
+
+ if (vmx->msr_ia32_umwait_control != host_umwait_control)
+ add_atomic_switch_msr(vmx, MSR_IA32_UMWAIT_CONTROL,
+ vmx->msr_ia32_umwait_control,
+ host_umwait_control, false);
+ else
+ clear_atomic_switch_msr(vmx, MSR_IA32_UMWAIT_CONTROL);
+}
+
static void vmx_update_hv_timer(struct kvm_vcpu *vcpu)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
pt_guest_enter(vmx);
atomic_switch_perf_msrs(vmx);
+ atomic_switch_umwait_control_msr(vmx);
if (enable_preemption_timer)
vmx_update_hv_timer(vcpu);
current_evmcs->hv_clean_fields |=
HV_VMX_ENLIGHTENED_CLEAN_FIELD_ALL;
+ if (static_branch_unlikely(&enable_evmcs))
+ current_evmcs->hv_vp_id = vcpu->arch.hyperv.vp_index;
+
/* MSR_IA32_DEBUGCTLMSR is zeroed on vmexit. Restore it if needed */
if (vmx->host_debugctlmsr)
update_debugctlmsr(vmx->host_debugctlmsr);
static void vmx_vm_free(struct kvm *kvm)
{
+ kfree(kvm->arch.hyperv.hv_pa_pg);
vfree(to_kvm_vmx(kvm));
}
.run = vmx_vcpu_run,
.handle_exit = vmx_handle_exit,
- .skip_emulated_instruction = __skip_emulated_instruction,
+ .skip_emulated_instruction = skip_emulated_instruction,
.set_interrupt_shadow = vmx_set_interrupt_shadow,
.get_interrupt_shadow = vmx_get_interrupt_shadow,
.patch_hypercall = vmx_patch_hypercall,
if (!vp_ap)
continue;
+ vp_ap->nested_control.features.directhypercall = 0;
vp_ap->current_nested_vmcs = 0;
vp_ap->enlighten_vmentry = 0;
}
pr_info("KVM: vmx: using Hyper-V Enlightened VMCS\n");
static_branch_enable(&enable_evmcs);
}
+
+ if (ms_hyperv.nested_features & HV_X64_NESTED_DIRECT_FLUSH)
+ vmx_x86_ops.enable_direct_tlbflush
+ = hv_enable_direct_tlbflush;
+
} else {
enlightened_vmcs = false;
}
* contain 'auto' which will be turned into the default 'cond'
* mitigation mode.
*/
- if (boot_cpu_has(X86_BUG_L1TF)) {
- r = vmx_setup_l1d_flush(vmentry_l1d_flush_param);
- if (r) {
- vmx_exit();
- return r;
- }
+ r = vmx_setup_l1d_flush(vmentry_l1d_flush_param);
+ if (r) {
+ vmx_exit();
+ return r;
}
#ifdef CONFIG_KEXEC_CORE
extern const u32 vmx_msr_index[];
extern u64 host_efer;
+extern u32 get_umwait_control_msr(void);
+
#define MSR_TYPE_R 1
#define MSR_TYPE_W 2
#define MSR_TYPE_RW 3
#endif
u64 spec_ctrl;
+ u32 msr_ia32_umwait_control;
u32 secondary_exec_control;
vmcs_write64(TSC_MULTIPLIER, vmx->current_tsc_ratio);
}
+static inline bool vmx_has_waitpkg(struct vcpu_vmx *vmx)
+{
+ return vmx->secondary_exec_control &
+ SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE;
+}
+
void dump_vmcs(void);
#endif /* __KVM_X86_VMX_H */
static u64 __read_mostly efer_reserved_bits = ~((u64)EFER_SCE);
#endif
-#define VM_STAT(x) offsetof(struct kvm, stat.x), KVM_STAT_VM
-#define VCPU_STAT(x) offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU
+#define VM_STAT(x, ...) offsetof(struct kvm, stat.x), KVM_STAT_VM, ## __VA_ARGS__
+#define VCPU_STAT(x, ...) offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU, ## __VA_ARGS__
#define KVM_X2APIC_API_VALID_FLAGS (KVM_X2APIC_API_USE_32BIT_IDS | \
KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK)
{ "mmu_cache_miss", VM_STAT(mmu_cache_miss) },
{ "mmu_unsync", VM_STAT(mmu_unsync) },
{ "remote_tlb_flush", VM_STAT(remote_tlb_flush) },
- { "largepages", VM_STAT(lpages) },
+ { "largepages", VM_STAT(lpages, .mode = 0444) },
{ "max_mmu_page_hash_collisions",
VM_STAT(max_mmu_page_hash_collisions) },
{ NULL }
asmlinkage __visible void kvm_spurious_fault(void)
{
/* Fault while not rebooting. We want the trace. */
- BUG();
+ if (!kvm_rebooting)
+ BUG();
}
EXPORT_SYMBOL_GPL(kvm_spurious_fault);
}
EXPORT_SYMBOL_GPL(kvm_set_xcr);
-int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
+static int kvm_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
{
- unsigned long old_cr4 = kvm_read_cr4(vcpu);
- unsigned long pdptr_bits = X86_CR4_PGE | X86_CR4_PSE | X86_CR4_PAE |
- X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE;
-
if (cr4 & CR4_RESERVED_BITS)
- return 1;
+ return -EINVAL;
if (!guest_cpuid_has(vcpu, X86_FEATURE_XSAVE) && (cr4 & X86_CR4_OSXSAVE))
- return 1;
+ return -EINVAL;
if (!guest_cpuid_has(vcpu, X86_FEATURE_SMEP) && (cr4 & X86_CR4_SMEP))
- return 1;
+ return -EINVAL;
if (!guest_cpuid_has(vcpu, X86_FEATURE_SMAP) && (cr4 & X86_CR4_SMAP))
- return 1;
+ return -EINVAL;
if (!guest_cpuid_has(vcpu, X86_FEATURE_FSGSBASE) && (cr4 & X86_CR4_FSGSBASE))
- return 1;
+ return -EINVAL;
if (!guest_cpuid_has(vcpu, X86_FEATURE_PKU) && (cr4 & X86_CR4_PKE))
- return 1;
+ return -EINVAL;
if (!guest_cpuid_has(vcpu, X86_FEATURE_LA57) && (cr4 & X86_CR4_LA57))
- return 1;
+ return -EINVAL;
if (!guest_cpuid_has(vcpu, X86_FEATURE_UMIP) && (cr4 & X86_CR4_UMIP))
+ return -EINVAL;
+
+ return 0;
+}
+
+int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
+{
+ unsigned long old_cr4 = kvm_read_cr4(vcpu);
+ unsigned long pdptr_bits = X86_CR4_PGE | X86_CR4_PSE | X86_CR4_PAE |
+ X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE;
+
+ if (kvm_valid_cr4(vcpu, cr4))
return 1;
if (is_long_mode(vcpu)) {
MSR_IA32_RTIT_ADDR1_A, MSR_IA32_RTIT_ADDR1_B,
MSR_IA32_RTIT_ADDR2_A, MSR_IA32_RTIT_ADDR2_B,
MSR_IA32_RTIT_ADDR3_A, MSR_IA32_RTIT_ADDR3_B,
+ MSR_IA32_UMWAIT_CONTROL,
+
+ MSR_ARCH_PERFMON_FIXED_CTR0, MSR_ARCH_PERFMON_FIXED_CTR1,
+ MSR_ARCH_PERFMON_FIXED_CTR0 + 2, MSR_ARCH_PERFMON_FIXED_CTR0 + 3,
+ MSR_CORE_PERF_FIXED_CTR_CTRL, MSR_CORE_PERF_GLOBAL_STATUS,
+ MSR_CORE_PERF_GLOBAL_CTRL, MSR_CORE_PERF_GLOBAL_OVF_CTRL,
+ MSR_ARCH_PERFMON_PERFCTR0, MSR_ARCH_PERFMON_PERFCTR1,
+ MSR_ARCH_PERFMON_PERFCTR0 + 2, MSR_ARCH_PERFMON_PERFCTR0 + 3,
+ MSR_ARCH_PERFMON_PERFCTR0 + 4, MSR_ARCH_PERFMON_PERFCTR0 + 5,
+ MSR_ARCH_PERFMON_PERFCTR0 + 6, MSR_ARCH_PERFMON_PERFCTR0 + 7,
+ MSR_ARCH_PERFMON_PERFCTR0 + 8, MSR_ARCH_PERFMON_PERFCTR0 + 9,
+ MSR_ARCH_PERFMON_PERFCTR0 + 10, MSR_ARCH_PERFMON_PERFCTR0 + 11,
+ MSR_ARCH_PERFMON_PERFCTR0 + 12, MSR_ARCH_PERFMON_PERFCTR0 + 13,
+ MSR_ARCH_PERFMON_PERFCTR0 + 14, MSR_ARCH_PERFMON_PERFCTR0 + 15,
+ MSR_ARCH_PERFMON_PERFCTR0 + 16, MSR_ARCH_PERFMON_PERFCTR0 + 17,
+ MSR_ARCH_PERFMON_EVENTSEL0, MSR_ARCH_PERFMON_EVENTSEL1,
+ MSR_ARCH_PERFMON_EVENTSEL0 + 2, MSR_ARCH_PERFMON_EVENTSEL0 + 3,
+ MSR_ARCH_PERFMON_EVENTSEL0 + 4, MSR_ARCH_PERFMON_EVENTSEL0 + 5,
+ MSR_ARCH_PERFMON_EVENTSEL0 + 6, MSR_ARCH_PERFMON_EVENTSEL0 + 7,
+ MSR_ARCH_PERFMON_EVENTSEL0 + 8, MSR_ARCH_PERFMON_EVENTSEL0 + 9,
+ MSR_ARCH_PERFMON_EVENTSEL0 + 10, MSR_ARCH_PERFMON_EVENTSEL0 + 11,
+ MSR_ARCH_PERFMON_EVENTSEL0 + 12, MSR_ARCH_PERFMON_EVENTSEL0 + 13,
+ MSR_ARCH_PERFMON_EVENTSEL0 + 14, MSR_ARCH_PERFMON_EVENTSEL0 + 15,
+ MSR_ARCH_PERFMON_EVENTSEL0 + 16, MSR_ARCH_PERFMON_EVENTSEL0 + 17,
};
static unsigned num_msrs_to_save;
case KVM_CAP_HYPERV_EVENTFD:
case KVM_CAP_HYPERV_TLBFLUSH:
case KVM_CAP_HYPERV_SEND_IPI:
- case KVM_CAP_HYPERV_ENLIGHTENED_VMCS:
case KVM_CAP_HYPERV_CPUID:
case KVM_CAP_PCI_SEGMENT:
case KVM_CAP_DEBUGREGS:
r = kvm_x86_ops->get_nested_state ?
kvm_x86_ops->get_nested_state(NULL, NULL, 0) : 0;
break;
+ case KVM_CAP_HYPERV_DIRECT_TLBFLUSH:
+ r = kvm_x86_ops->enable_direct_tlbflush != NULL;
+ break;
+ case KVM_CAP_HYPERV_ENLIGHTENED_VMCS:
+ r = kvm_x86_ops->nested_enable_evmcs != NULL;
+ break;
default:
break;
}
r = -EFAULT;
}
return r;
+ case KVM_CAP_HYPERV_DIRECT_TLBFLUSH:
+ if (!kvm_x86_ops->enable_direct_tlbflush)
+ return -ENOTTY;
+
+ return kvm_x86_ops->enable_direct_tlbflush(vcpu);
default:
return -EINVAL;
static void kvm_init_msr_list(void)
{
+ struct x86_pmu_capability x86_pmu;
u32 dummy[2];
unsigned i, j;
+ BUILD_BUG_ON_MSG(INTEL_PMC_MAX_FIXED != 4,
+ "Please update the fixed PMCs in msrs_to_save[]");
+
+ perf_get_x86_pmu_capability(&x86_pmu);
+
for (i = j = 0; i < ARRAY_SIZE(msrs_to_save); i++) {
if (rdmsr_safe(msrs_to_save[i], &dummy[0], &dummy[1]) < 0)
continue;
intel_pt_validate_hw_cap(PT_CAP_num_address_ranges) * 2)
continue;
break;
+ case MSR_ARCH_PERFMON_PERFCTR0 ... MSR_ARCH_PERFMON_PERFCTR0 + 17:
+ if (msrs_to_save[i] - MSR_ARCH_PERFMON_PERFCTR0 >=
+ min(INTEL_PMC_MAX_GENERIC, x86_pmu.num_counters_gp))
+ continue;
+ break;
+ case MSR_ARCH_PERFMON_EVENTSEL0 ... MSR_ARCH_PERFMON_EVENTSEL0 + 17:
+ if (msrs_to_save[i] - MSR_ARCH_PERFMON_EVENTSEL0 >=
+ min(INTEL_PMC_MAX_GENERIC, x86_pmu.num_counters_gp))
+ continue;
}
default:
break;
int handle_ud(struct kvm_vcpu *vcpu)
{
int emul_type = EMULTYPE_TRAP_UD;
- enum emulation_result er;
char sig[5]; /* ud2; .ascii "kvm" */
struct x86_exception e;
sig, sizeof(sig), &e) == 0 &&
memcmp(sig, "\xf\xbkvm", sizeof(sig)) == 0) {
kvm_rip_write(vcpu, kvm_rip_read(vcpu) + sizeof(sig));
- emul_type = 0;
+ emul_type = EMULTYPE_TRAP_UD_FORCED;
}
- er = kvm_emulate_instruction(vcpu, emul_type);
- if (er == EMULATE_USER_EXIT)
- return 0;
- if (er != EMULATE_DONE)
- kvm_queue_exception(vcpu, UD_VECTOR);
- return 1;
+ return kvm_emulate_instruction(vcpu, emul_type);
}
EXPORT_SYMBOL_GPL(handle_ud);
vcpu->arch.emulate_regs_need_sync_from_vcpu = false;
}
-int kvm_inject_realmode_interrupt(struct kvm_vcpu *vcpu, int irq, int inc_eip)
+void kvm_inject_realmode_interrupt(struct kvm_vcpu *vcpu, int irq, int inc_eip)
{
struct x86_emulate_ctxt *ctxt = &vcpu->arch.emulate_ctxt;
int ret;
ctxt->_eip = ctxt->eip + inc_eip;
ret = emulate_int_real(ctxt, irq);
- if (ret != X86EMUL_CONTINUE)
- return EMULATE_FAIL;
-
- ctxt->eip = ctxt->_eip;
- kvm_rip_write(vcpu, ctxt->eip);
- kvm_set_rflags(vcpu, ctxt->eflags);
-
- return EMULATE_DONE;
+ if (ret != X86EMUL_CONTINUE) {
+ kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
+ } else {
+ ctxt->eip = ctxt->_eip;
+ kvm_rip_write(vcpu, ctxt->eip);
+ kvm_set_rflags(vcpu, ctxt->eflags);
+ }
}
EXPORT_SYMBOL_GPL(kvm_inject_realmode_interrupt);
static int handle_emulation_failure(struct kvm_vcpu *vcpu, int emulation_type)
{
- int r = EMULATE_DONE;
-
++vcpu->stat.insn_emulation_fail;
trace_kvm_emulate_insn_failed(vcpu);
- if (emulation_type & EMULTYPE_NO_UD_ON_FAIL)
- return EMULATE_FAIL;
+ if (emulation_type & EMULTYPE_VMWARE_GP) {
+ kvm_queue_exception_e(vcpu, GP_VECTOR, 0);
+ return 1;
+ }
- if (!is_guest_mode(vcpu) && kvm_x86_ops->get_cpl(vcpu) == 0) {
+ if (emulation_type & EMULTYPE_SKIP) {
vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION;
vcpu->run->internal.ndata = 0;
- r = EMULATE_USER_EXIT;
+ return 0;
}
kvm_queue_exception(vcpu, UD_VECTOR);
- return r;
+ if (!is_guest_mode(vcpu) && kvm_x86_ops->get_cpl(vcpu) == 0) {
+ vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
+ vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION;
+ vcpu->run->internal.ndata = 0;
+ return 0;
+ }
+
+ return 1;
}
static bool reexecute_instruction(struct kvm_vcpu *vcpu, gva_t cr2,
return dr6;
}
-static void kvm_vcpu_do_singlestep(struct kvm_vcpu *vcpu, int *r)
+static int kvm_vcpu_do_singlestep(struct kvm_vcpu *vcpu)
{
struct kvm_run *kvm_run = vcpu->run;
kvm_run->debug.arch.pc = vcpu->arch.singlestep_rip;
kvm_run->debug.arch.exception = DB_VECTOR;
kvm_run->exit_reason = KVM_EXIT_DEBUG;
- *r = EMULATE_USER_EXIT;
- } else {
- kvm_queue_exception_p(vcpu, DB_VECTOR, DR6_BS);
+ return 0;
}
+ kvm_queue_exception_p(vcpu, DB_VECTOR, DR6_BS);
+ return 1;
}
int kvm_skip_emulated_instruction(struct kvm_vcpu *vcpu)
int r;
r = kvm_x86_ops->skip_emulated_instruction(vcpu);
- if (unlikely(r != EMULATE_DONE))
+ if (unlikely(!r))
return 0;
/*
* that sets the TF flag".
*/
if (unlikely(rflags & X86_EFLAGS_TF))
- kvm_vcpu_do_singlestep(vcpu, &r);
- return r == EMULATE_DONE;
+ r = kvm_vcpu_do_singlestep(vcpu);
+ return r;
}
EXPORT_SYMBOL_GPL(kvm_skip_emulated_instruction);
kvm_run->debug.arch.pc = eip;
kvm_run->debug.arch.exception = DB_VECTOR;
kvm_run->exit_reason = KVM_EXIT_DEBUG;
- *r = EMULATE_USER_EXIT;
+ *r = 0;
return true;
}
}
vcpu->arch.dr6 &= ~DR_TRAP_BITS;
vcpu->arch.dr6 |= dr6 | DR6_RTM;
kvm_queue_exception(vcpu, DB_VECTOR);
- *r = EMULATE_DONE;
+ *r = 1;
return true;
}
}
trace_kvm_emulate_insn_start(vcpu);
++vcpu->stat.insn_emulation;
if (r != EMULATION_OK) {
- if (emulation_type & EMULTYPE_TRAP_UD)
- return EMULATE_FAIL;
+ if ((emulation_type & EMULTYPE_TRAP_UD) ||
+ (emulation_type & EMULTYPE_TRAP_UD_FORCED)) {
+ kvm_queue_exception(vcpu, UD_VECTOR);
+ return 1;
+ }
if (reexecute_instruction(vcpu, cr2, write_fault_to_spt,
emulation_type))
- return EMULATE_DONE;
+ return 1;
if (ctxt->have_exception) {
/*
* #UD should result in just EMULATION_FAILED, and trap-like
WARN_ON_ONCE(ctxt->exception.vector == UD_VECTOR ||
exception_type(ctxt->exception.vector) == EXCPT_TRAP);
inject_emulated_exception(vcpu);
- return EMULATE_DONE;
+ return 1;
}
- if (emulation_type & EMULTYPE_SKIP)
- return EMULATE_FAIL;
return handle_emulation_failure(vcpu, emulation_type);
}
}
- if ((emulation_type & EMULTYPE_VMWARE) &&
- !is_vmware_backdoor_opcode(ctxt))
- return EMULATE_FAIL;
+ if ((emulation_type & EMULTYPE_VMWARE_GP) &&
+ !is_vmware_backdoor_opcode(ctxt)) {
+ kvm_queue_exception_e(vcpu, GP_VECTOR, 0);
+ return 1;
+ }
+ /*
+ * Note, EMULTYPE_SKIP is intended for use *only* by vendor callbacks
+ * for kvm_skip_emulated_instruction(). The caller is responsible for
+ * updating interruptibility state and injecting single-step #DBs.
+ */
if (emulation_type & EMULTYPE_SKIP) {
kvm_rip_write(vcpu, ctxt->_eip);
if (ctxt->eflags & X86_EFLAGS_RF)
kvm_set_rflags(vcpu, ctxt->eflags & ~X86_EFLAGS_RF);
- kvm_x86_ops->set_interrupt_shadow(vcpu, 0);
- return EMULATE_DONE;
+ return 1;
}
if (retry_instruction(ctxt, cr2, emulation_type))
- return EMULATE_DONE;
+ return 1;
/* this is needed for vmware backdoor interface to work since it
changes registers values during IO operation */
r = x86_emulate_insn(ctxt);
if (r == EMULATION_INTERCEPTED)
- return EMULATE_DONE;
+ return 1;
if (r == EMULATION_FAILED) {
if (reexecute_instruction(vcpu, cr2, write_fault_to_spt,
emulation_type))
- return EMULATE_DONE;
+ return 1;
return handle_emulation_failure(vcpu, emulation_type);
}
if (ctxt->have_exception) {
- r = EMULATE_DONE;
+ r = 1;
if (inject_emulated_exception(vcpu))
return r;
} else if (vcpu->arch.pio.count) {
writeback = false;
vcpu->arch.complete_userspace_io = complete_emulated_pio;
}
- r = EMULATE_USER_EXIT;
+ r = 0;
} else if (vcpu->mmio_needed) {
+ ++vcpu->stat.mmio_exits;
+
if (!vcpu->mmio_is_write)
writeback = false;
- r = EMULATE_USER_EXIT;
+ r = 0;
vcpu->arch.complete_userspace_io = complete_emulated_mmio;
} else if (r == EMULATION_RESTART)
goto restart;
else
- r = EMULATE_DONE;
+ r = 1;
if (writeback) {
unsigned long rflags = kvm_x86_ops->get_rflags(vcpu);
if (!ctxt->have_exception ||
exception_type(ctxt->exception.vector) == EXCPT_TRAP) {
kvm_rip_write(vcpu, ctxt->eip);
- if (r == EMULATE_DONE && ctxt->tf)
- kvm_vcpu_do_singlestep(vcpu, &r);
+ if (r && ctxt->tf)
+ r = kvm_vcpu_do_singlestep(vcpu);
__kvm_set_rflags(vcpu, ctxt->eflags);
}
static inline int complete_emulated_io(struct kvm_vcpu *vcpu)
{
int r;
+
vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
r = kvm_emulate_instruction(vcpu, EMULTYPE_NO_DECODE);
srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
- if (r != EMULATE_DONE)
- return 0;
- return 1;
+ return r;
}
static int complete_emulated_pio(struct kvm_vcpu *vcpu)
ret = emulator_task_switch(ctxt, tss_selector, idt_index, reason,
has_error_code, error_code);
-
- if (ret)
- return EMULATE_FAIL;
+ if (ret) {
+ vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
+ vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION;
+ vcpu->run->internal.ndata = 0;
+ return 0;
+ }
kvm_rip_write(vcpu, ctxt->eip);
kvm_set_rflags(vcpu, ctxt->eflags);
kvm_make_request(KVM_REQ_EVENT, vcpu);
- return EMULATE_DONE;
+ return 1;
}
EXPORT_SYMBOL_GPL(kvm_task_switch);
static int kvm_valid_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
{
- if (!guest_cpuid_has(vcpu, X86_FEATURE_XSAVE) &&
- (sregs->cr4 & X86_CR4_OSXSAVE))
- return -EINVAL;
-
if ((sregs->efer & EFER_LME) && (sregs->cr0 & X86_CR0_PG)) {
/*
* When EFER.LME and CR0.PG are set, the processor is in
return -EINVAL;
}
- return 0;
+ return kvm_valid_cr4(vcpu, sregs->cr4);
}
static int __set_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
INIT_HLIST_HEAD(&kvm->arch.mask_notifier_list);
INIT_LIST_HEAD(&kvm->arch.active_mmu_pages);
+ INIT_LIST_HEAD(&kvm->arch.zapped_obsolete_pages);
INIT_LIST_HEAD(&kvm->arch.assigned_dev_head);
atomic_set(&kvm->arch.noncoherent_dma_count, 0);
* Scan sptes if dirty logging has been stopped, dropping those
* which can be collapsed into a single large-page spte. Later
* page faults will create the large-page sptes.
+ *
+ * There is no need to do this in any of the following cases:
+ * CREATE: No dirty mappings will already exist.
+ * MOVE/DELETE: The old mappings will already have been cleaned up by
+ * kvm_arch_flush_shadow_memslot()
*/
- if ((change != KVM_MR_DELETE) &&
+ if (change == KVM_MR_FLAGS_ONLY &&
(old->flags & KVM_MEM_LOG_DIRTY_PAGES) &&
!(new->flags & KVM_MEM_LOG_DIRTY_PAGES))
kvm_mmu_zap_collapsible_sptes(kvm, new);
}
void kvm_set_pending_timer(struct kvm_vcpu *vcpu);
-int kvm_inject_realmode_interrupt(struct kvm_vcpu *vcpu, int irq, int inc_eip);
+void kvm_inject_realmode_interrupt(struct kvm_vcpu *vcpu, int irq, int inc_eip);
void kvm_write_tsc(struct kvm_vcpu *vcpu, struct msr_data *msr);
u64 get_kvmclock_ns(struct kvm *kvm);
#include <linux/module.h>
#include <linux/io.h>
#include <linux/mmiotrace.h>
+#include <linux/security.h>
static unsigned long mmio_address;
module_param_hw(mmio_address, ulong, iomem, 0);
static int __init init(void)
{
unsigned long size = (read_far) ? (8 << 20) : (16 << 10);
+ int ret = security_locked_down(LOCKDOWN_MMIOTRACE);
+
+ if (ret)
+ return ret;
if (mmio_address == 0) {
pr_err("you have to use the module argument mmio_address.\n");
PURGATORY_CFLAGS_REMOVE := -mcmodel=kernel
PURGATORY_CFLAGS := -mcmodel=large -ffreestanding -fno-zero-initialized-in-bss
+PURGATORY_CFLAGS += $(DISABLE_STACKLEAK_PLUGIN)
# Default KBUILD_CFLAGS can have -pg option set when FTRACE is enabled. That
# in turn leaves some undefined symbols like __fentry__ in purgatory and not
return NULL;
/* Here we know that Xen runs on EFI platform. */
-
- efi.get_time = xen_efi_get_time;
- efi.set_time = xen_efi_set_time;
- efi.get_wakeup_time = xen_efi_get_wakeup_time;
- efi.set_wakeup_time = xen_efi_set_wakeup_time;
- efi.get_variable = xen_efi_get_variable;
- efi.get_next_variable = xen_efi_get_next_variable;
- efi.set_variable = xen_efi_set_variable;
- efi.query_variable_info = xen_efi_query_variable_info;
- efi.update_capsule = xen_efi_update_capsule;
- efi.query_capsule_caps = xen_efi_query_capsule_caps;
- efi.get_next_high_mono_count = xen_efi_get_next_high_mono_count;
- efi.reset_system = xen_efi_reset_system;
+ xen_efi_runtime_setup();
efi_systab_xen.tables = info->cfg.addr;
efi_systab_xen.nr_tables = info->cfg.nent;
/* release the tag's ownership to the req cloned from */
spin_lock_irqsave(&fq->mq_flush_lock, flags);
+
+ if (!refcount_dec_and_test(&flush_rq->ref)) {
+ fq->rq_status = error;
+ spin_unlock_irqrestore(&fq->mq_flush_lock, flags);
+ return;
+ }
+
+ if (fq->rq_status != BLK_STS_OK)
+ error = fq->rq_status;
+
hctx = flush_rq->mq_hctx;
if (!q->elevator) {
blk_mq_tag_set_rq(hctx, flush_rq->tag, fq->orig_rq);
static const struct ioc_params autop[] = {
[AUTOP_HDD] = {
.qos = {
- [QOS_RLAT] = 50000, /* 50ms */
- [QOS_WLAT] = 50000,
+ [QOS_RLAT] = 250000, /* 250ms */
+ [QOS_WLAT] = 250000,
[QOS_MIN] = VRATE_MIN_PPM,
[QOS_MAX] = VRATE_MAX_PPM,
},
u32 ppm_wthr = MILLION - ioc->params.qos[QOS_WPPM];
u32 missed_ppm[2], rq_wait_pct;
u64 period_vtime;
- int i;
+ int prev_busy_level, i;
/* how were the latencies during the period? */
ioc_lat_stat(ioc, missed_ppm, &rq_wait_pct);
* comparing vdone against period start. If lagging behind
* IOs from past periods, don't increase vrate.
*/
- if (!atomic_read(&iocg_to_blkg(iocg)->use_delay) &&
+ if ((ppm_rthr != MILLION || ppm_wthr != MILLION) &&
+ !atomic_read(&iocg_to_blkg(iocg)->use_delay) &&
time_after64(vtime, vdone) &&
time_after64(vtime, now.vnow -
MAX_LAGGING_PERIODS * period_vtime) &&
* and experiencing shortages but not surpluses, we're too stingy
* and should increase vtime rate.
*/
+ prev_busy_level = ioc->busy_level;
if (rq_wait_pct > RQ_WAIT_BUSY_PCT ||
missed_ppm[READ] > ppm_rthr ||
missed_ppm[WRITE] > ppm_wthr) {
ioc->busy_level = max(ioc->busy_level, 0);
ioc->busy_level++;
- } else if (nr_lagging) {
- ioc->busy_level = max(ioc->busy_level, 0);
- } else if (nr_shortages && !nr_surpluses &&
- rq_wait_pct <= RQ_WAIT_BUSY_PCT * UNBUSY_THR_PCT / 100 &&
+ } else if (rq_wait_pct <= RQ_WAIT_BUSY_PCT * UNBUSY_THR_PCT / 100 &&
missed_ppm[READ] <= ppm_rthr * UNBUSY_THR_PCT / 100 &&
missed_ppm[WRITE] <= ppm_wthr * UNBUSY_THR_PCT / 100) {
- ioc->busy_level = min(ioc->busy_level, 0);
- ioc->busy_level--;
+ /* take action iff there is contention */
+ if (nr_shortages && !nr_lagging) {
+ ioc->busy_level = min(ioc->busy_level, 0);
+ /* redistribute surpluses first */
+ if (!nr_surpluses)
+ ioc->busy_level--;
+ }
} else {
ioc->busy_level = 0;
}
ioc->busy_level = clamp(ioc->busy_level, -1000, 1000);
- if (ioc->busy_level) {
+ if (ioc->busy_level > 0 || (ioc->busy_level < 0 && !nr_lagging)) {
u64 vrate = atomic64_read(&ioc->vtime_rate);
u64 vrate_min = ioc->vrate_min, vrate_max = ioc->vrate_max;
atomic64_set(&ioc->vtime_rate, vrate);
ioc->inuse_margin_vtime = DIV64_U64_ROUND_UP(
ioc->period_us * vrate * INUSE_MARGIN_PCT, 100);
+ } else if (ioc->busy_level != prev_busy_level || nr_lagging) {
+ trace_iocost_ioc_vrate_adj(ioc, atomic64_read(&ioc->vtime_rate),
+ &missed_ppm, rq_wait_pct, nr_lagging,
+ nr_shortages, nr_surpluses);
}
ioc_refresh_params(ioc, false);
struct blk_mq_hw_ctx *hctx;
int i;
- lockdep_assert_held(&q->sysfs_lock);
-
queue_for_each_hw_ctx(q, hctx, i) {
if (hctx->sched_tags)
blk_mq_free_rqs(q->tag_set, hctx->sched_tags, i);
*/
if (blk_mq_req_expired(rq, next))
blk_mq_rq_timed_out(rq, reserved);
- if (refcount_dec_and_test(&rq->ref))
+
+ if (is_flush_rq(rq, hctx))
+ rq->end_io(rq, 0);
+ else if (refcount_dec_and_test(&rq->ref))
__blk_mq_free_request(rq);
return true;
/* bypass scheduler for flush rq */
blk_insert_flush(rq);
blk_mq_run_hw_queue(data.hctx, true);
- } else if (plug && (q->nr_hw_queues == 1 || q->mq_ops->commit_rqs)) {
+ } else if (plug && (q->nr_hw_queues == 1 || q->mq_ops->commit_rqs ||
+ !blk_queue_nonrot(q))) {
/*
* Use plugging if we have a ->commit_rqs() hook as well, as
* we know the driver uses bd->last in a smart fashion.
+ *
+ * Use normal plugging if this disk is slow HDD, as sequential
+ * IO may benefit a lot from plug merging.
*/
unsigned int request_count = plug->rq_count;
struct request *last = NULL;
}
blk_add_rq_to_plug(plug, rq);
+ } else if (q->elevator) {
+ blk_mq_sched_insert_request(rq, false, true, true);
} else if (plug && !blk_queue_nomerges(q)) {
/*
* We do limited plugging. If the bio can be merged, do that.
blk_mq_try_issue_directly(data.hctx, same_queue_rq,
&cookie);
}
- } else if ((q->nr_hw_queues > 1 && is_sync) || (!q->elevator &&
- !data.hctx->dispatch_busy)) {
+ } else if ((q->nr_hw_queues > 1 && is_sync) ||
+ !data.hctx->dispatch_busy) {
blk_mq_try_issue_directly(data.hctx, rq, &cookie);
} else {
blk_mq_sched_insert_request(rq, false, true, true);
blk_mq_quiesce_queue(q);
wbt_set_min_lat(q, val);
- wbt_update_limits(q);
blk_mq_unquiesce_queue(q);
blk_mq_unfreeze_queue(q);
blk_mq_debugfs_register(q);
}
- /*
- * The flag of QUEUE_FLAG_REGISTERED isn't set yet, so elevator
- * switch won't happen at all.
- */
+ mutex_lock(&q->sysfs_lock);
if (q->elevator) {
ret = elv_register_queue(q, false);
if (ret) {
+ mutex_unlock(&q->sysfs_lock);
mutex_unlock(&q->sysfs_dir_lock);
kobject_del(&q->kobj);
blk_trace_remove_sysfs(dev);
has_elevator = true;
}
- mutex_lock(&q->sysfs_lock);
blk_queue_flag_set(QUEUE_FLAG_REGISTERED, q);
wbt_enable_default(q);
blk_throtl_register_queue(q);
kobject_del(&q->kobj);
blk_trace_remove_sysfs(disk_to_dev(disk));
- /*
- * q->kobj has been removed, so it is safe to check if elevator
- * exists without holding q->sysfs_lock.
- */
+ mutex_lock(&q->sysfs_lock);
if (q->elevator)
elv_unregister_queue(q);
+ mutex_unlock(&q->sysfs_lock);
mutex_unlock(&q->sysfs_dir_lock);
kobject_put(&disk_to_dev(disk)->kobj);
unsigned int flush_queue_delayed:1;
unsigned int flush_pending_idx:1;
unsigned int flush_running_idx:1;
+ blk_status_t rq_status;
unsigned long flush_pending_since;
struct list_head flush_queue[2];
struct list_head flush_data_in_flight;
kobject_get(&q->kobj);
}
+static inline bool
+is_flush_rq(struct request *req, struct blk_mq_hw_ctx *hctx)
+{
+ return hctx->fq->flush_rq == req;
+}
+
struct blk_flush_queue *blk_alloc_flush_queue(struct request_queue *q,
int node, int cmd_size, gfp_t flags);
void blk_free_flush_queue(struct blk_flush_queue *q);
static inline void elevator_exit(struct request_queue *q,
struct elevator_queue *e)
{
+ lockdep_assert_held(&q->sysfs_lock);
+
blk_mq_sched_free_requests(q);
__elevator_exit(q, e);
}
if (uevent)
kobject_uevent(&e->kobj, KOBJ_ADD);
- mutex_lock(&q->sysfs_lock);
e->registered = 1;
- mutex_unlock(&q->sysfs_lock);
}
return error;
}
kobject_uevent(&e->kobj, KOBJ_REMOVE);
kobject_del(&e->kobj);
- mutex_lock(&q->sysfs_lock);
e->registered = 0;
/* Re-enable throttling in case elevator disabled it */
wbt_enable_default(q);
- mutex_unlock(&q->sysfs_lock);
}
}
lockdep_assert_held(&q->sysfs_lock);
if (q->elevator) {
- if (q->elevator->registered) {
- mutex_unlock(&q->sysfs_lock);
-
- /*
- * Concurrent elevator switch can't happen becasue
- * sysfs write is always exclusively on same file.
- *
- * Also the elevator queue won't be freed after
- * sysfs_lock is released becasue kobject_del() in
- * blk_unregister_queue() waits for completion of
- * .store & .show on its attributes.
- */
+ if (q->elevator->registered)
elv_unregister_queue(q);
- mutex_lock(&q->sysfs_lock);
- }
ioc_clear_queue(q);
elevator_exit(q, q->elevator);
-
- /*
- * sysfs_lock may be dropped, so re-check if queue is
- * unregistered. If yes, don't switch to new elevator
- * any more
- */
- if (!blk_queue_registered(q))
- return 0;
}
ret = blk_mq_init_sched(q, new_e);
goto out;
if (new_e) {
- mutex_unlock(&q->sysfs_lock);
-
ret = elv_register_queue(q, true);
-
- mutex_lock(&q->sysfs_lock);
if (ret) {
elevator_exit(q, q->elevator);
goto out;
{ 0x00, 0x00, 0x00, 0x09, 0x00, 0x00, 0x84, 0x01 },
/* tables */
- [OPAL_TABLE_TABLE]
+ [OPAL_TABLE_TABLE] =
{ 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x01 },
[OPAL_LOCKINGRANGE_GLOBAL] =
{ 0x00, 0x00, 0x08, 0x02, 0x00, 0x00, 0x00, 0x01 },
{
const struct d0_geometry_features *geo = data;
- dev->align = geo->alignment_granularity;
- dev->lowest_lba = geo->lowest_aligned_lba;
+ dev->align = be64_to_cpu(geo->alignment_granularity);
+ dev->lowest_lba = be64_to_cpu(geo->lowest_aligned_lba);
}
static int execute_step(struct opal_dev *dev,
#ifdef CONFIG_SYSTEM_DATA_VERIFICATION
/**
- * verify_pkcs7_signature - Verify a PKCS#7-based signature on system data.
+ * verify_pkcs7_message_sig - Verify a PKCS#7-based signature on system data.
* @data: The data to be verified (NULL if expecting internal data).
* @len: Size of @data.
- * @raw_pkcs7: The PKCS#7 message that is the signature.
- * @pkcs7_len: The size of @raw_pkcs7.
+ * @pkcs7: The PKCS#7 message that is the signature.
* @trusted_keys: Trusted keys to use (NULL for builtin trusted keys only,
* (void *)1UL for all trusted keys).
* @usage: The use to which the key is being put.
* @view_content: Callback to gain access to content.
* @ctx: Context for callback.
*/
-int verify_pkcs7_signature(const void *data, size_t len,
- const void *raw_pkcs7, size_t pkcs7_len,
- struct key *trusted_keys,
- enum key_being_used_for usage,
- int (*view_content)(void *ctx,
- const void *data, size_t len,
- size_t asn1hdrlen),
- void *ctx)
+int verify_pkcs7_message_sig(const void *data, size_t len,
+ struct pkcs7_message *pkcs7,
+ struct key *trusted_keys,
+ enum key_being_used_for usage,
+ int (*view_content)(void *ctx,
+ const void *data, size_t len,
+ size_t asn1hdrlen),
+ void *ctx)
{
- struct pkcs7_message *pkcs7;
int ret;
- pkcs7 = pkcs7_parse_message(raw_pkcs7, pkcs7_len);
- if (IS_ERR(pkcs7))
- return PTR_ERR(pkcs7);
-
/* The data should be detached - so we need to supply it. */
if (data && pkcs7_supply_detached_data(pkcs7, data, len) < 0) {
pr_err("PKCS#7 signature with non-detached data\n");
}
error:
+ pr_devel("<==%s() = %d\n", __func__, ret);
+ return ret;
+}
+
+/**
+ * verify_pkcs7_signature - Verify a PKCS#7-based signature on system data.
+ * @data: The data to be verified (NULL if expecting internal data).
+ * @len: Size of @data.
+ * @raw_pkcs7: The PKCS#7 message that is the signature.
+ * @pkcs7_len: The size of @raw_pkcs7.
+ * @trusted_keys: Trusted keys to use (NULL for builtin trusted keys only,
+ * (void *)1UL for all trusted keys).
+ * @usage: The use to which the key is being put.
+ * @view_content: Callback to gain access to content.
+ * @ctx: Context for callback.
+ */
+int verify_pkcs7_signature(const void *data, size_t len,
+ const void *raw_pkcs7, size_t pkcs7_len,
+ struct key *trusted_keys,
+ enum key_being_used_for usage,
+ int (*view_content)(void *ctx,
+ const void *data, size_t len,
+ size_t asn1hdrlen),
+ void *ctx)
+{
+ struct pkcs7_message *pkcs7;
+ int ret;
+
+ pkcs7 = pkcs7_parse_message(raw_pkcs7, pkcs7_len);
+ if (IS_ERR(pkcs7))
+ return PTR_ERR(pkcs7);
+
+ ret = verify_pkcs7_message_sig(data, len, pkcs7, trusted_keys, usage,
+ view_content, ctx);
+
pkcs7_free_message(pkcs7);
pr_devel("<==%s() = %d\n", __func__, ret);
return ret;
#include <linux/err.h>
#include <linux/asn1.h>
#include <crypto/hash.h>
+#include <crypto/hash_info.h>
#include <crypto/public_key.h>
#include "pkcs7_parser.h"
kenter(",%u,%s", sinfo->index, sinfo->sig->hash_algo);
+ /* The digest was calculated already. */
+ if (sig->digest)
+ return 0;
+
if (!sinfo->sig->hash_algo)
return -ENOPKG;
return ret;
}
+int pkcs7_get_digest(struct pkcs7_message *pkcs7, const u8 **buf, u32 *len,
+ enum hash_algo *hash_algo)
+{
+ struct pkcs7_signed_info *sinfo = pkcs7->signed_infos;
+ int i, ret;
+
+ /*
+ * This function doesn't support messages with more than one signature.
+ */
+ if (sinfo == NULL || sinfo->next != NULL)
+ return -EBADMSG;
+
+ ret = pkcs7_digest(pkcs7, sinfo);
+ if (ret)
+ return ret;
+
+ *buf = sinfo->sig->digest;
+ *len = sinfo->sig->digest_size;
+
+ for (i = 0; i < HASH_ALGO__LAST; i++)
+ if (!strcmp(hash_algo_name[i], sinfo->sig->hash_algo)) {
+ *hash_algo = i;
+ break;
+ }
+
+ return 0;
+}
+
/*
* Find the key (X.509 certificate) to use to verify a PKCS#7 message. PKCS#7
* uses the issuer's name and the issuing certificate serial number for
if (!ddir->certs.virtual_address || !ddir->certs.size) {
pr_debug("Unsigned PE binary\n");
- return -EKEYREJECTED;
+ return -ENODATA;
}
chkaddr(ctx->header_size, ddir->certs.virtual_address,
* (*) 0 if at least one signature chain intersects with the keys in the trust
* keyring, or:
*
+ * (*) -ENODATA if there is no signature present.
+ *
* (*) -ENOPKG if a suitable crypto module couldn't be found for a check on a
* chain.
*
#include <linux/uaccess.h>
#include <linux/debugfs.h>
#include <linux/acpi.h>
+#include <linux/security.h>
#include "internal.h"
struct acpi_table_header table;
acpi_status status;
+ int ret;
+
+ ret = security_locked_down(LOCKDOWN_ACPI_TABLES);
+ if (ret)
+ return ret;
if (!(*ppos)) {
/* parse the table header to get the table length */
#include <linux/list.h>
#include <linux/jiffies.h>
#include <linux/semaphore.h>
+#include <linux/security.h>
#include <asm/io.h>
#include <linux/uaccess.h>
acpi_physical_address pa;
#ifdef CONFIG_KEXEC
- if (acpi_rsdp)
+ /*
+ * We may have been provided with an RSDP on the command line,
+ * but if a malicious user has done so they may be pointing us
+ * at modified ACPI tables that could alter kernel behaviour -
+ * so, we check the lockdown status before making use of
+ * it. If we trust it then also stash it in an architecture
+ * specific location (if appropriate) so it can be carried
+ * over further kexec()s.
+ */
+ if (acpi_rsdp && !security_locked_down(LOCKDOWN_ACPI_TABLES)) {
+ acpi_arch_set_root_pointer(acpi_rsdp);
return acpi_rsdp;
+ }
#endif
pa = acpi_arch_get_root_pointer();
if (pa)
#include <linux/memblock.h>
#include <linux/earlycpio.h>
#include <linux/initrd.h>
+#include <linux/security.h>
#include "internal.h"
#ifdef CONFIG_ACPI_CUSTOM_DSDT
if (table_nr == 0)
return;
+ if (security_locked_down(LOCKDOWN_ACPI_TABLES)) {
+ pr_notice("kernel is locked down, ignoring table override\n");
+ return;
+ }
+
acpi_tables_addr =
memblock_find_in_range(0, ACPI_TABLE_UPGRADE_MAX_PHYS,
all_tables_size, PAGE_SIZE);
if (RBRQ_HBUF_ERR(he_dev->rbrq_head)) {
hprintk("HBUF_ERR! (cid 0x%x)\n", cid);
- atomic_inc(&vcc->stats->rx_drop);
+ atomic_inc(&vcc->stats->rx_drop);
goto return_host_buffers;
}
if (!(lo_flags & LO_FLAGS_READ_ONLY) && file->f_op->fsync)
blk_queue_write_cache(lo->lo_queue, true, false);
+ if (io_is_direct(lo->lo_backing_file) && inode->i_sb->s_bdev) {
+ /* In case of direct I/O, match underlying block size */
+ unsigned short bsize = bdev_logical_block_size(
+ inode->i_sb->s_bdev);
+
+ blk_queue_logical_block_size(lo->lo_queue, bsize);
+ blk_queue_physical_block_size(lo->lo_queue, bsize);
+ blk_queue_io_min(lo->lo_queue, bsize);
+ }
+
loop_update_rotational(lo);
loop_update_dio(lo);
set_capacity(lo->lo_disk, size);
ddata->clocks[index] = devm_clk_get(ddata->dev, name);
if (IS_ERR(ddata->clocks[index])) {
- if (PTR_ERR(ddata->clocks[index]) == -ENOENT)
- return 0;
-
dev_err(ddata->dev, "clock get error for %s: %li\n",
name, PTR_ERR(ddata->clocks[index]));
continue;
error = sysc_get_one_clock(ddata, name);
- if (error && error != -ENOENT)
+ if (error)
return error;
}
if (error)
return error;
- if (manage_clocks) {
- sysc_clkdm_deny_idle(ddata);
+ sysc_clkdm_deny_idle(ddata);
- error = sysc_enable_opt_clocks(ddata);
- if (error)
- return error;
+ /*
+ * Always enable clocks. The bootloader may or may not have enabled
+ * the related clocks.
+ */
+ error = sysc_enable_opt_clocks(ddata);
+ if (error)
+ return error;
- error = sysc_enable_main_clocks(ddata);
- if (error)
- goto err_opt_clocks;
- }
+ error = sysc_enable_main_clocks(ddata);
+ if (error)
+ goto err_opt_clocks;
if (!(ddata->cfg.quirks & SYSC_QUIRK_NO_RESET_ON_INIT)) {
error = sysc_rstctrl_reset_deassert(ddata, true);
goto err_main_clocks;
}
- if (!ddata->legacy_mode && manage_clocks) {
+ if (!ddata->legacy_mode) {
error = sysc_enable_module(ddata->dev);
if (error)
goto err_main_clocks;
if (manage_clocks)
sysc_disable_main_clocks(ddata);
err_opt_clocks:
+ /* No re-enable of clockdomain autoidle to prevent module autoidle */
if (manage_clocks) {
sysc_disable_opt_clocks(ddata);
sysc_clkdm_allow_idle(ddata);
ddata = container_of(work, struct sysc, idle_work.work);
+ /*
+ * One time decrement of clock usage counts if left on from init.
+ * Note that we disable opt clocks unconditionally in this case
+ * as they are enabled unconditionally during init without
+ * considering sysc_opt_clks_needed() at that point.
+ */
+ if (ddata->cfg.quirks & (SYSC_QUIRK_NO_IDLE |
+ SYSC_QUIRK_NO_IDLE_ON_INIT)) {
+ sysc_disable_main_clocks(ddata);
+ sysc_disable_opt_clocks(ddata);
+ sysc_clkdm_allow_idle(ddata);
+ }
+
+ /* Keep permanent PM runtime usage count for SYSC_QUIRK_NO_IDLE */
+ if (ddata->cfg.quirks & SYSC_QUIRK_NO_IDLE)
+ return;
+
+ /*
+ * Decrement PM runtime usage count for SYSC_QUIRK_NO_IDLE_ON_INIT
+ * and SYSC_QUIRK_NO_RESET_ON_INIT
+ */
if (pm_runtime_active(ddata->dev))
pm_runtime_put_sync(ddata->dev);
}
INIT_DELAYED_WORK(&ddata->idle_work, ti_sysc_idle);
/* At least earlycon won't survive without deferred idle */
- if (ddata->cfg.quirks & (SYSC_QUIRK_NO_IDLE_ON_INIT |
+ if (ddata->cfg.quirks & (SYSC_QUIRK_NO_IDLE |
+ SYSC_QUIRK_NO_IDLE_ON_INIT |
SYSC_QUIRK_NO_RESET_ON_INIT)) {
schedule_delayed_work(&ddata->idle_work, 3000);
} else {
#include <linux/export.h>
#include <linux/io.h>
#include <linux/uio.h>
-
#include <linux/uaccess.h>
+#include <linux/security.h>
#ifdef CONFIG_IA64
# include <linux/efi.h>
static int open_port(struct inode *inode, struct file *filp)
{
- return capable(CAP_SYS_RAWIO) ? 0 : -EPERM;
+ if (!capable(CAP_SYS_RAWIO))
+ return -EPERM;
+
+ return security_locked_down(LOCKDOWN_DEV_MEM);
}
#define zero_lseek null_lseek
}
EXPORT_SYMBOL(get_random_bytes);
+
+/*
+ * Each time the timer fires, we expect that we got an unpredictable
+ * jump in the cycle counter. Even if the timer is running on another
+ * CPU, the timer activity will be touching the stack of the CPU that is
+ * generating entropy..
+ *
+ * Note that we don't re-arm the timer in the timer itself - we are
+ * happy to be scheduled away, since that just makes the load more
+ * complex, but we do not want the timer to keep ticking unless the
+ * entropy loop is running.
+ *
+ * So the re-arming always happens in the entropy loop itself.
+ */
+static void entropy_timer(struct timer_list *t)
+{
+ credit_entropy_bits(&input_pool, 1);
+}
+
+/*
+ * If we have an actual cycle counter, see if we can
+ * generate enough entropy with timing noise
+ */
+static void try_to_generate_entropy(void)
+{
+ struct {
+ unsigned long now;
+ struct timer_list timer;
+ } stack;
+
+ stack.now = random_get_entropy();
+
+ /* Slow counter - or none. Don't even bother */
+ if (stack.now == random_get_entropy())
+ return;
+
+ timer_setup_on_stack(&stack.timer, entropy_timer, 0);
+ while (!crng_ready()) {
+ if (!timer_pending(&stack.timer))
+ mod_timer(&stack.timer, jiffies+1);
+ mix_pool_bytes(&input_pool, &stack.now, sizeof(stack.now));
+ schedule();
+ stack.now = random_get_entropy();
+ }
+
+ del_timer_sync(&stack.timer);
+ destroy_timer_on_stack(&stack.timer);
+ mix_pool_bytes(&input_pool, &stack.now, sizeof(stack.now));
+}
+
/*
* Wait for the urandom pool to be seeded and thus guaranteed to supply
* cryptographically secure random numbers. This applies to: the /dev/urandom
{
if (likely(crng_ready()))
return 0;
- return wait_event_interruptible(crng_init_wait, crng_ready());
+
+ do {
+ int ret;
+ ret = wait_event_interruptible_timeout(crng_init_wait, crng_ready(), HZ);
+ if (ret)
+ return ret > 0 ? 0 : ret;
+
+ try_to_generate_entropy();
+ } while (!crng_ready());
+
+ return 0;
}
EXPORT_SYMBOL(wait_for_random_bytes);
else
add_device_randomness(buf, size);
}
-EXPORT_SYMBOL_GPL(add_bootloader_randomness);
\ No newline at end of file
+EXPORT_SYMBOL_GPL(add_bootloader_randomness);
{ DRA7_L4PER2_MCASP2_CLKCTRL, dra7_mcasp2_bit_data, CLKF_SW_SUP, "l4per2-clkctrl:0154:22" },
{ DRA7_L4PER2_MCASP3_CLKCTRL, dra7_mcasp3_bit_data, CLKF_SW_SUP, "l4per2-clkctrl:015c:22" },
{ DRA7_L4PER2_MCASP5_CLKCTRL, dra7_mcasp5_bit_data, CLKF_SW_SUP, "l4per2-clkctrl:016c:22" },
- { DRA7_L4PER2_MCASP8_CLKCTRL, dra7_mcasp8_bit_data, CLKF_SW_SUP, "l4per2-clkctrl:0184:24" },
+ { DRA7_L4PER2_MCASP8_CLKCTRL, dra7_mcasp8_bit_data, CLKF_SW_SUP, "l4per2-clkctrl:0184:22" },
{ DRA7_L4PER2_MCASP4_CLKCTRL, dra7_mcasp4_bit_data, CLKF_SW_SUP, "l4per2-clkctrl:018c:22" },
{ DRA7_L4PER2_UART7_CLKCTRL, dra7_uart7_bit_data, CLKF_SW_SUP, "l4per2-clkctrl:01c4:24" },
{ DRA7_L4PER2_UART8_CLKCTRL, dra7_uart8_bit_data, CLKF_SW_SUP, "l4per2-clkctrl:01d4:24" },
DT_CLK(NULL, "mcasp6_aux_gfclk_mux", "l4per2-clkctrl:01f8:22"),
DT_CLK(NULL, "mcasp7_ahclkx_mux", "l4per2-clkctrl:01fc:24"),
DT_CLK(NULL, "mcasp7_aux_gfclk_mux", "l4per2-clkctrl:01fc:22"),
- DT_CLK(NULL, "mcasp8_ahclkx_mux", "l4per2-clkctrl:0184:22"),
- DT_CLK(NULL, "mcasp8_aux_gfclk_mux", "l4per2-clkctrl:0184:24"),
+ DT_CLK(NULL, "mcasp8_ahclkx_mux", "l4per2-clkctrl:0184:24"),
+ DT_CLK(NULL, "mcasp8_aux_gfclk_mux", "l4per2-clkctrl:0184:22"),
DT_CLK(NULL, "mmc1_clk32k", "l3init-clkctrl:0008:8"),
DT_CLK(NULL, "mmc1_fclk_div", "l3init-clkctrl:0008:25"),
DT_CLK(NULL, "mmc1_fclk_mux", "l3init-clkctrl:0008:24"),
struct clock_event_device *clkevt = &to->clkevt;
- of_irq->percpu ? free_percpu_irq(of_irq->irq, clkevt) :
+ if (of_irq->percpu)
+ free_percpu_irq(of_irq->irq, clkevt);
+ else
free_irq(of_irq->irq, clkevt);
}
dom = t->tx.buf;
dom->domain_id = cpu_to_le32(domain);
dom->flags = cpu_to_le32(flags);
- dom->domain_id = cpu_to_le32(state);
+ dom->reset_state = cpu_to_le32(state);
if (rdom->async_reset)
ret = scmi_do_xfer_with_response(handle, t);
#include <linux/acpi.h>
#include <linux/ucs2_string.h>
#include <linux/memblock.h>
+#include <linux/security.h>
#include <asm/early_ioremap.h>
static char efivar_ssdt[EFIVAR_SSDT_NAME_MAX] __initdata;
static int __init efivar_ssdt_setup(char *str)
{
+ int ret = security_locked_down(LOCKDOWN_ACPI_TABLES);
+
+ if (ret)
+ return ret;
+
if (strlen(str) < sizeof(efivar_ssdt))
memcpy(efivar_ssdt, str, strlen(str));
else
}
static int mvebu_pwm_apply(struct pwm_chip *chip, struct pwm_device *pwm,
- struct pwm_state *state)
+ const struct pwm_state *state)
{
struct mvebu_pwm *mvpwm = to_mvebu_pwm(chip);
struct mvebu_gpio_chip *mvchip = mvpwm->mvchip;
amdgpu_gtt_mgr.o amdgpu_vram_mgr.o amdgpu_virt.o amdgpu_atomfirmware.o \
amdgpu_vf_error.o amdgpu_sched.o amdgpu_debugfs.o amdgpu_ids.o \
amdgpu_gmc.o amdgpu_xgmi.o amdgpu_csa.o amdgpu_ras.o amdgpu_vm_cpu.o \
- amdgpu_vm_sdma.o amdgpu_pmu.o amdgpu_discovery.o amdgpu_ras_eeprom.o smu_v11_0_i2c.o
+ amdgpu_vm_sdma.o amdgpu_discovery.o amdgpu_ras_eeprom.o smu_v11_0_i2c.o
amdgpu-$(CONFIG_PERF_EVENTS) += amdgpu_pmu.o
u32 val = 0;
u32 count = 0;
struct device *dev;
- struct i2s_platform_data *i2s_pdata;
+ struct i2s_platform_data *i2s_pdata = NULL;
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
adev->acp.acp_cell = kcalloc(ACP_DEVS, sizeof(struct mfd_cell),
GFP_KERNEL);
- if (adev->acp.acp_cell == NULL)
- return -ENOMEM;
+ if (adev->acp.acp_cell == NULL) {
+ r = -ENOMEM;
+ goto failure;
+ }
adev->acp.acp_res = kcalloc(5, sizeof(struct resource), GFP_KERNEL);
if (adev->acp.acp_res == NULL) {
- kfree(adev->acp.acp_cell);
- return -ENOMEM;
+ r = -ENOMEM;
+ goto failure;
}
i2s_pdata = kcalloc(3, sizeof(struct i2s_platform_data), GFP_KERNEL);
if (i2s_pdata == NULL) {
- kfree(adev->acp.acp_res);
- kfree(adev->acp.acp_cell);
- return -ENOMEM;
+ r = -ENOMEM;
+ goto failure;
}
switch (adev->asic_type) {
r = mfd_add_hotplug_devices(adev->acp.parent, adev->acp.acp_cell,
ACP_DEVS);
if (r)
- return r;
+ goto failure;
for (i = 0; i < ACP_DEVS ; i++) {
dev = get_mfd_cell_dev(adev->acp.acp_cell[i].name, i);
r = pm_genpd_add_device(&adev->acp.acp_genpd->gpd, dev);
if (r) {
dev_err(dev, "Failed to add dev to genpd\n");
- return r;
+ goto failure;
}
}
break;
if (--count == 0) {
dev_err(&adev->pdev->dev, "Failed to reset ACP\n");
- return -ETIMEDOUT;
+ r = -ETIMEDOUT;
+ goto failure;
}
udelay(100);
}
break;
if (--count == 0) {
dev_err(&adev->pdev->dev, "Failed to reset ACP\n");
- return -ETIMEDOUT;
+ r = -ETIMEDOUT;
+ goto failure;
}
udelay(100);
}
val &= ~ACP_SOFT_RESET__SoftResetAud_MASK;
cgs_write_register(adev->acp.cgs_device, mmACP_SOFT_RESET, val);
return 0;
+
+failure:
+ kfree(i2s_pdata);
+ kfree(adev->acp.acp_res);
+ kfree(adev->acp.acp_cell);
+ kfree(adev->acp.acp_genpd);
+ return r;
}
/**
case AMD_IP_BLOCK_TYPE_UVD:
case AMD_IP_BLOCK_TYPE_VCN:
case AMD_IP_BLOCK_TYPE_VCE:
+ case AMD_IP_BLOCK_TYPE_SDMA:
if (swsmu)
ret = smu_dpm_set_power_gate(&adev->smu, block_type, gate);
else
break;
case AMD_IP_BLOCK_TYPE_GMC:
case AMD_IP_BLOCK_TYPE_ACP:
- case AMD_IP_BLOCK_TYPE_SDMA:
ret = ((adev)->powerplay.pp_funcs->set_powergating_by_smu(
(adev)->powerplay.pp_handle, block_type, gate));
break;
* - 3.32.0 - Add syncobj timeline support to AMDGPU_CS.
* - 3.33.0 - Fixes for GDS ENOMEM failures in AMDGPU_CS.
* - 3.34.0 - Non-DC can flip correctly between buffers with different pitches
+ * - 3.35.0 - Add drm_amdgpu_info_device::tcc_disabled_mask
*/
#define KMS_DRIVER_MAJOR 3
-#define KMS_DRIVER_MINOR 34
+#define KMS_DRIVER_MINOR 35
#define KMS_DRIVER_PATCHLEVEL 0
#define AMDGPU_MAX_TIMEOUT_PARAM_LENTH 256
{0x1002, 0x731B, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_NAVI10},
{0x1002, 0x731F, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_NAVI10},
/* Navi14 */
- {0x1002, 0x7340, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_NAVI14},
+ {0x1002, 0x7340, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_NAVI14|AMD_EXP_HW_SUPPORT},
+ {0x1002, 0x7341, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_NAVI14|AMD_EXP_HW_SUPPORT},
+ {0x1002, 0x7347, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_NAVI14|AMD_EXP_HW_SUPPORT},
/* Renoir */
{0x1002, 0x1636, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_RENOIR|AMD_IS_APU|AMD_EXP_HW_SUPPORT},
+ /* Navi12 */
+ {0x1002, 0x7360, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_NAVI12|AMD_EXP_HW_SUPPORT},
+
{0, 0, 0}
};
uint32_t num_sc_per_sh;
uint32_t num_packer_per_sc;
uint32_t pa_sc_tile_steering_override;
+ uint64_t tcc_disabled_mask;
};
struct amdgpu_cu_info {
/* ring tests don't use a job */
if (job) {
vm = job->vm;
- fence_ctx = job->base.s_fence->scheduled.context;
+ fence_ctx = job->base.s_fence ?
+ job->base.s_fence->scheduled.context : 0;
} else {
vm = NULL;
fence_ctx = 0;
if (sh_num == AMDGPU_INFO_MMR_SH_INDEX_MASK)
sh_num = 0xffffffff;
+ if (info->read_mmr_reg.count > 128)
+ return -EINVAL;
+
regs = kmalloc_array(info->read_mmr_reg.count, sizeof(*regs), GFP_KERNEL);
if (!regs)
return -ENOMEM;
dev_info.pa_sc_tile_steering_override =
adev->gfx.config.pa_sc_tile_steering_override;
+ dev_info.tcc_disabled_mask = adev->gfx.config.tcc_disabled_mask;
+
return copy_to_user(out, &dev_info,
min((size_t)size, sizeof(dev_info))) ? -EFAULT : 0;
}
struct ttm_bo_global *glob = adev->mman.bdev.glob;
struct amdgpu_vm_bo_base *bo_base;
-#if 0
if (vm->bulk_moveable) {
spin_lock(&glob->lru_lock);
ttm_bo_bulk_move_lru_tail(&vm->lru_bulk_move);
spin_unlock(&glob->lru_lock);
return;
}
-#endif
memset(&vm->lru_bulk_move, 0, sizeof(vm->lru_bulk_move));
MODULE_FIRMWARE("amdgpu/navi10_mec2.bin");
MODULE_FIRMWARE("amdgpu/navi10_rlc.bin");
+MODULE_FIRMWARE("amdgpu/navi14_ce_wks.bin");
+MODULE_FIRMWARE("amdgpu/navi14_pfp_wks.bin");
+MODULE_FIRMWARE("amdgpu/navi14_me_wks.bin");
+MODULE_FIRMWARE("amdgpu/navi14_mec_wks.bin");
+MODULE_FIRMWARE("amdgpu/navi14_mec2_wks.bin");
MODULE_FIRMWARE("amdgpu/navi14_ce.bin");
MODULE_FIRMWARE("amdgpu/navi14_pfp.bin");
MODULE_FIRMWARE("amdgpu/navi14_me.bin");
static int gfx_v10_0_init_microcode(struct amdgpu_device *adev)
{
const char *chip_name;
- char fw_name[30];
+ char fw_name[40];
+ char wks[10];
int err;
struct amdgpu_firmware_info *info = NULL;
const struct common_firmware_header *header = NULL;
DRM_DEBUG("\n");
+ memset(wks, 0, sizeof(wks));
switch (adev->asic_type) {
case CHIP_NAVI10:
chip_name = "navi10";
break;
case CHIP_NAVI14:
chip_name = "navi14";
+ if (!(adev->pdev->device == 0x7340 &&
+ adev->pdev->revision != 0x00))
+ snprintf(wks, sizeof(wks), "_wks");
break;
case CHIP_NAVI12:
chip_name = "navi12";
BUG();
}
- snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_pfp.bin", chip_name);
+ snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_pfp%s.bin", chip_name, wks);
err = request_firmware(&adev->gfx.pfp_fw, fw_name, adev->dev);
if (err)
goto out;
adev->gfx.pfp_fw_version = le32_to_cpu(cp_hdr->header.ucode_version);
adev->gfx.pfp_feature_version = le32_to_cpu(cp_hdr->ucode_feature_version);
- snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_me.bin", chip_name);
+ snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_me%s.bin", chip_name, wks);
err = request_firmware(&adev->gfx.me_fw, fw_name, adev->dev);
if (err)
goto out;
adev->gfx.me_fw_version = le32_to_cpu(cp_hdr->header.ucode_version);
adev->gfx.me_feature_version = le32_to_cpu(cp_hdr->ucode_feature_version);
- snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_ce.bin", chip_name);
+ snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_ce%s.bin", chip_name, wks);
err = request_firmware(&adev->gfx.ce_fw, fw_name, adev->dev);
if (err)
goto out;
if (adev->gfx.rlc.is_rlc_v2_1)
gfx_v10_0_init_rlc_ext_microcode(adev);
- snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mec.bin", chip_name);
+ snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mec%s.bin", chip_name, wks);
err = request_firmware(&adev->gfx.mec_fw, fw_name, adev->dev);
if (err)
goto out;
adev->gfx.mec_fw_version = le32_to_cpu(cp_hdr->header.ucode_version);
adev->gfx.mec_feature_version = le32_to_cpu(cp_hdr->ucode_feature_version);
- snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mec2.bin", chip_name);
+ snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mec2%s.bin", chip_name, wks);
err = request_firmware(&adev->gfx.mec2_fw, fw_name, adev->dev);
if (!err) {
err = amdgpu_ucode_validate(adev->gfx.mec2_fw);
}
}
+static void gfx_v10_0_get_tcc_info(struct amdgpu_device *adev)
+{
+ /* TCCs are global (not instanced). */
+ uint32_t tcc_disable = RREG32_SOC15(GC, 0, mmCGTS_TCC_DISABLE) |
+ RREG32_SOC15(GC, 0, mmCGTS_USER_TCC_DISABLE);
+
+ adev->gfx.config.tcc_disabled_mask =
+ REG_GET_FIELD(tcc_disable, CGTS_TCC_DISABLE, TCC_DISABLE) |
+ (REG_GET_FIELD(tcc_disable, CGTS_TCC_DISABLE, HI_TCC_DISABLE) << 16);
+}
+
static void gfx_v10_0_constants_init(struct amdgpu_device *adev)
{
u32 tmp;
gfx_v10_0_setup_rb(adev);
gfx_v10_0_get_cu_info(adev, &adev->gfx.cu_info);
+ gfx_v10_0_get_tcc_info(adev);
adev->gfx.config.pa_sc_tile_steering_override =
gfx_v10_0_init_pa_sc_tile_steering_override(adev);
switch (adev->asic_type) {
case CHIP_RAVEN:
- case CHIP_RENOIR:
gfx_v9_0_init_lbpw(adev);
break;
case CHIP_VEGA20:
switch (adev->asic_type) {
case CHIP_RAVEN:
- case CHIP_RENOIR:
if (amdgpu_lbpw == 0)
gfx_v9_0_enable_lbpw(adev, false);
else
struct smu_context *smu = &adev->smu;
if (nv_asic_reset_method(adev) == AMD_RESET_METHOD_BACO) {
- amdgpu_inc_vram_lost(adev);
+ if (!adev->in_suspend)
+ amdgpu_inc_vram_lost(adev);
ret = smu_baco_reset(smu);
} else {
- amdgpu_inc_vram_lost(adev);
+ if (!adev->in_suspend)
+ amdgpu_inc_vram_lost(adev);
ret = nv_asic_mode1_reset(adev);
}
int r;
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
- if (adev->asic_type == CHIP_RAVEN && adev->powerplay.pp_funcs &&
- adev->powerplay.pp_funcs->set_powergating_by_smu)
+ if ((adev->asic_type == CHIP_RAVEN && adev->powerplay.pp_funcs &&
+ adev->powerplay.pp_funcs->set_powergating_by_smu) ||
+ adev->asic_type == CHIP_RENOIR)
amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_SDMA, false);
if (!amdgpu_sriov_vf(adev))
sdma_v4_0_ctx_switch_enable(adev, false);
sdma_v4_0_enable(adev, false);
- if (adev->asic_type == CHIP_RAVEN && adev->powerplay.pp_funcs
- && adev->powerplay.pp_funcs->set_powergating_by_smu)
+ if ((adev->asic_type == CHIP_RAVEN && adev->powerplay.pp_funcs
+ && adev->powerplay.pp_funcs->set_powergating_by_smu) ||
+ adev->asic_type == CHIP_RENOIR)
amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_SDMA, true);
return 0;
}
/* Restore clock gating */
- smu_v11_0_i2c_set_clock_gating(control, true);
+
+ /*
+ * TODO Reenabling clock gating seems to break subsequent SMU operation
+ * on the I2C bus. My guess is that SMU doesn't disable clock gating like
+ * we do here before working with the bus. So for now just don't restore
+ * it but later work with SMU to see if they have this issue and can
+ * update their code appropriately
+ */
+ /* smu_v11_0_i2c_set_clock_gating(control, true); */
}
{
switch (soc15_asic_reset_method(adev)) {
case AMD_RESET_METHOD_BACO:
- amdgpu_inc_vram_lost(adev);
+ if (!adev->in_suspend)
+ amdgpu_inc_vram_lost(adev);
return soc15_asic_baco_reset(adev);
case AMD_RESET_METHOD_MODE2:
return soc15_mode2_reset(adev);
default:
- amdgpu_inc_vram_lost(adev);
+ if (!adev->in_suspend)
+ amdgpu_inc_vram_lost(adev);
return soc15_asic_mode1_reset(adev);
}
}
#if defined(CONFIG_DRM_AMD_DC)
else if (amdgpu_device_has_dc_support(adev))
amdgpu_device_ip_block_add(adev, &dm_ip_block);
-#else
-# warning "Enable CONFIG_DRM_AMD_DC for display support on SOC15."
#endif
amdgpu_device_ip_block_add(adev, &vcn_v2_0_ip_block);
break;
0x003f8000, 0x8f6f896f,
0x88776f77, 0x8a6eff6e,
0x023f8000, 0xb9eef807,
- 0xb970f812, 0xb971f813,
- 0x8ff08870, 0xf4051bb8,
+ 0xb97af812, 0xb97bf813,
+ 0x8ffa887a, 0xf4051bbd,
0xfa000000, 0xbf8cc07f,
- 0xf4051c38, 0xfa000008,
+ 0xf4051ebd, 0xfa000008,
0xbf8cc07f, 0x87ee6e6e,
0xbf840001, 0xbe80206e,
0xb971f803, 0x8771ff71,
// Read second-level TBA/TMA from first-level TMA and jump if available.
// ttmp[2:5] and ttmp12 can be used (others hold SPI-initialized debug data)
// ttmp12 holds SQ_WAVE_STATUS
- s_getreg_b32 ttmp4, hwreg(HW_REG_SHADER_TMA_LO)
- s_getreg_b32 ttmp5, hwreg(HW_REG_SHADER_TMA_HI)
- s_lshl_b64 [ttmp4, ttmp5], [ttmp4, ttmp5], 0x8
- s_load_dwordx2 [ttmp2, ttmp3], [ttmp4, ttmp5], 0x0 glc:1 // second-level TBA
+ s_getreg_b32 ttmp14, hwreg(HW_REG_SHADER_TMA_LO)
+ s_getreg_b32 ttmp15, hwreg(HW_REG_SHADER_TMA_HI)
+ s_lshl_b64 [ttmp14, ttmp15], [ttmp14, ttmp15], 0x8
+ s_load_dwordx2 [ttmp2, ttmp3], [ttmp14, ttmp15], 0x0 glc:1 // second-level TBA
s_waitcnt lgkmcnt(0)
- s_load_dwordx2 [ttmp4, ttmp5], [ttmp4, ttmp5], 0x8 glc:1 // second-level TMA
+ s_load_dwordx2 [ttmp14, ttmp15], [ttmp14, ttmp15], 0x8 glc:1 // second-level TMA
s_waitcnt lgkmcnt(0)
s_and_b64 [ttmp2, ttmp3], [ttmp2, ttmp3], [ttmp2, ttmp3]
s_cbranch_scc0 L_NO_NEXT_TRAP // second-level trap handler not been set
}
static const struct backlight_ops amdgpu_dm_backlight_ops = {
+ .options = BL_CORE_SUSPENDRESUME,
.get_brightness = amdgpu_dm_backlight_get_brightness,
.update_status = amdgpu_dm_backlight_update_status,
};
* change FB pitch, DCC state, rotation or mirroing.
*/
bundle->flip_addrs[planes_count].flip_immediate =
- (crtc->state->pageflip_flags &
- DRM_MODE_PAGE_FLIP_ASYNC) != 0 &&
+ crtc->state->async_flip &&
acrtc_state->update_type == UPDATE_TYPE_FAST;
timestamp_ns = ktime_get_ns();
struct drm_crtc *crtc;
struct drm_crtc_state *old_crtc_state, *new_crtc_state;
int i;
+#ifdef CONFIG_DEBUG_FS
enum amdgpu_dm_pipe_crc_source source;
+#endif
for_each_oldnew_crtc_in_state(state, crtc, old_crtc_state,
new_crtc_state, i) {
amdgpu_dm_enable_crtc_interrupts(dev, state, true);
for_each_new_crtc_in_state(state, crtc, new_crtc_state, j)
- if (new_crtc_state->pageflip_flags & DRM_MODE_PAGE_FLIP_ASYNC)
+ if (new_crtc_state->async_flip)
wait_for_vblank = false;
/* update planes when needed per crtc*/
unsigned int get_highest_allowed_voltage_level(uint32_t hw_internal_rev)
{
+ /* for dali, the highest voltage level we want is 0 */
+ if (ASICREV_IS_DALI(hw_internal_rev))
+ return 0;
+
/* we are ok with all levels */
return 4;
}
struct dc_stream_state *stream = context->streams[j];
uint32_t vertical_blank_in_pixels = 0;
uint32_t vertical_blank_time = 0;
+ uint32_t vertical_total_min = stream->timing.v_total;
+ struct dc_crtc_timing_adjust adjust = stream->adjust;
+ if (adjust.v_total_max != adjust.v_total_min)
+ vertical_total_min = adjust.v_total_min;
vertical_blank_in_pixels = stream->timing.h_total *
- (stream->timing.v_total
+ (vertical_total_min
- stream->timing.v_addressable);
-
vertical_blank_time = vertical_blank_in_pixels
* 10000 / stream->timing.pix_clk_100hz;
struct dc_state *context)
{
struct dm_pp_display_configuration *pp_display_cfg = &context->pp_display_cfg;
+ int memory_type_multiplier = MEMORY_TYPE_MULTIPLIER_CZ;
+
+ if (dc->bw_vbios && dc->bw_vbios->memory_type == bw_def_hbm)
+ memory_type_multiplier = MEMORY_TYPE_HBM;
pp_display_cfg->all_displays_in_sync =
context->bw_ctx.bw.dce.all_displays_in_sync;
pp_display_cfg->cpu_pstate_separation_time =
context->bw_ctx.bw.dce.blackout_recovery_time_us;
- pp_display_cfg->min_memory_clock_khz = context->bw_ctx.bw.dce.yclk_khz
- / MEMORY_TYPE_MULTIPLIER_CZ;
+ /*
+ * TODO: determine whether the bandwidth has reached memory's limitation
+ * , then change minimum memory clock based on real-time bandwidth
+ * limitation.
+ */
+ if (ASICREV_IS_VEGA20_P(dc->ctx->asic_id.hw_internal_rev) && (context->stream_count >= 2)) {
+ pp_display_cfg->min_memory_clock_khz = max(pp_display_cfg->min_memory_clock_khz,
+ (uint32_t) div64_s64(
+ div64_s64(dc->bw_vbios->high_yclk.value,
+ memory_type_multiplier), 10000));
+ } else {
+ pp_display_cfg->min_memory_clock_khz = context->bw_ctx.bw.dce.yclk_khz
+ / memory_type_multiplier;
+ }
pp_display_cfg->min_engine_clock_khz = determine_sclk_from_bounding_box(
dc,
pte->min_pte_before_flip_horiz_scan;
REG_UPDATE(GRPH_PIPE_OUTSTANDING_REQUEST_LIMIT,
- GRPH_PIPE_OUTSTANDING_REQUEST_LIMIT, 0xff);
+ GRPH_PIPE_OUTSTANDING_REQUEST_LIMIT, 0x7f);
REG_UPDATE_3(DVMM_PTE_CONTROL,
DVMM_PAGE_WIDTH, page_width,
REG_UPDATE_2(DVMM_PTE_ARB_CONTROL,
DVMM_PTE_REQ_PER_CHUNK, pte->pte_req_per_chunk,
- DVMM_MAX_PTE_REQ_OUTSTANDING, 0xff);
+ DVMM_MAX_PTE_REQ_OUTSTANDING, 0x7f);
}
static void program_urgency_watermark(
return &clk_src->base;
}
+ kfree(clk_src);
BREAK_TO_DEBUGGER();
return NULL;
}
if (construct(num_virtual_links, dc, pool))
return &pool->base;
+ kfree(pool);
BREAK_TO_DEBUGGER();
return NULL;
}
return &clk_src->base;
}
+ kfree(clk_src);
BREAK_TO_DEBUGGER();
return NULL;
}
if (construct(num_virtual_links, dc, pool, asic_id))
return &pool->base;
+ kfree(pool);
BREAK_TO_DEBUGGER();
return NULL;
}
return &clk_src->base;
}
+ kfree(clk_src);
BREAK_TO_DEBUGGER();
return NULL;
}
struct dm_pp_clock_levels_with_latency mem_clks = {0};
struct dm_pp_wm_sets_with_clock_ranges clk_ranges = {0};
struct dm_pp_clock_levels clks = {0};
+ int memory_type_multiplier = MEMORY_TYPE_MULTIPLIER_CZ;
+
+ if (dc->bw_vbios && dc->bw_vbios->memory_type == bw_def_hbm)
+ memory_type_multiplier = MEMORY_TYPE_HBM;
/*do system clock TODO PPLIB: after PPLIB implement,
* then remove old way
&clks);
dc->bw_vbios->low_yclk = bw_frc_to_fixed(
- clks.clocks_in_khz[0] * MEMORY_TYPE_MULTIPLIER_CZ, 1000);
+ clks.clocks_in_khz[0] * memory_type_multiplier, 1000);
dc->bw_vbios->mid_yclk = bw_frc_to_fixed(
- clks.clocks_in_khz[clks.num_levels>>1] * MEMORY_TYPE_MULTIPLIER_CZ,
+ clks.clocks_in_khz[clks.num_levels>>1] * memory_type_multiplier,
1000);
dc->bw_vbios->high_yclk = bw_frc_to_fixed(
- clks.clocks_in_khz[clks.num_levels-1] * MEMORY_TYPE_MULTIPLIER_CZ,
+ clks.clocks_in_khz[clks.num_levels-1] * memory_type_multiplier,
1000);
return;
* YCLK = UMACLK*m_memoryTypeMultiplier
*/
dc->bw_vbios->low_yclk = bw_frc_to_fixed(
- mem_clks.data[0].clocks_in_khz * MEMORY_TYPE_MULTIPLIER_CZ, 1000);
+ mem_clks.data[0].clocks_in_khz * memory_type_multiplier, 1000);
dc->bw_vbios->mid_yclk = bw_frc_to_fixed(
- mem_clks.data[mem_clks.num_levels>>1].clocks_in_khz * MEMORY_TYPE_MULTIPLIER_CZ,
+ mem_clks.data[mem_clks.num_levels>>1].clocks_in_khz * memory_type_multiplier,
1000);
dc->bw_vbios->high_yclk = bw_frc_to_fixed(
- mem_clks.data[mem_clks.num_levels-1].clocks_in_khz * MEMORY_TYPE_MULTIPLIER_CZ,
+ mem_clks.data[mem_clks.num_levels-1].clocks_in_khz * memory_type_multiplier,
1000);
/* Now notify PPLib/SMU about which Watermarks sets they should select
if (construct(num_virtual_links, dc, pool))
return &pool->base;
+ kfree(pool);
BREAK_TO_DEBUGGER();
return NULL;
}
return &clk_src->base;
}
+ kfree(clk_src);
BREAK_TO_DEBUGGER();
return NULL;
}
int i;
unsigned int clk;
unsigned int latency;
+ /*original logic in dal3*/
+ int memory_type_multiplier = MEMORY_TYPE_MULTIPLIER_CZ;
/*do system clock*/
if (!dm_pp_get_clock_levels_by_type_with_latency(
* ALSO always convert UMA clock (from PPLIB) to YCLK (HW formula):
* YCLK = UMACLK*m_memoryTypeMultiplier
*/
+ if (dc->bw_vbios->memory_type == bw_def_hbm)
+ memory_type_multiplier = MEMORY_TYPE_HBM;
+
dc->bw_vbios->low_yclk = bw_frc_to_fixed(
- mem_clks.data[0].clocks_in_khz * MEMORY_TYPE_MULTIPLIER_CZ, 1000);
+ mem_clks.data[0].clocks_in_khz * memory_type_multiplier, 1000);
dc->bw_vbios->mid_yclk = bw_frc_to_fixed(
- mem_clks.data[mem_clks.num_levels>>1].clocks_in_khz * MEMORY_TYPE_MULTIPLIER_CZ,
+ mem_clks.data[mem_clks.num_levels>>1].clocks_in_khz * memory_type_multiplier,
1000);
dc->bw_vbios->high_yclk = bw_frc_to_fixed(
- mem_clks.data[mem_clks.num_levels-1].clocks_in_khz * MEMORY_TYPE_MULTIPLIER_CZ,
+ mem_clks.data[mem_clks.num_levels-1].clocks_in_khz * memory_type_multiplier,
1000);
/* Now notify PPLib/SMU about which Watermarks sets they should select
if (construct(num_virtual_links, dc, pool))
return &pool->base;
+ kfree(pool);
BREAK_TO_DEBUGGER();
return NULL;
}
return &clk_src->base;
}
+ kfree(clk_src);
BREAK_TO_DEBUGGER();
return NULL;
}
return &clk_src->base;
}
+ kfree(clk_src);
BREAK_TO_DEBUGGER();
return NULL;
}
if (construct(init_data->num_virtual_links, dc, pool))
return &pool->base;
+ kfree(pool);
BREAK_TO_DEBUGGER();
return NULL;
}
return &clk_src->base;
}
+ kfree(clk_src);
BREAK_TO_DEBUGGER();
return NULL;
}
DCN21 = dcn21_hubp.o dcn21_hubbub.o dcn21_resource.o
-CFLAGS_$(AMDDALPATH)/dc/dcn21/dcn21_resource.o := -mhard-float -msse -mpreferred-stack-boundary=4
+ifneq ($(call cc-option, -mpreferred-stack-boundary=4),)
+ cc_stack_align := -mpreferred-stack-boundary=4
+else ifneq ($(call cc-option, -mstack-alignment=16),)
+ cc_stack_align := -mstack-alignment=16
+endif
+
+CFLAGS_$(AMDDALPATH)/dc/dcn21/dcn21_resource.o := -mhard-float -msse $(cc_stack_align)
+
+ifdef CONFIG_CC_IS_CLANG
+CFLAGS_$(AMDDALPATH)/dc/dcn21/dcn21_resource.o += -msse2
+endif
AMD_DAL_DCN21 = $(addprefix $(AMDDALPATH)/dc/dcn21/,$(DCN21))
*
*/
+#include <linux/slab.h>
+
#include "dm_services.h"
#include "dc.h"
* ways. Unless there is something clearly wrong with it the code should
* remain as-is as it provides us with a guarantee from HW that it is correct.
*/
-
-typedef unsigned int uint;
-
typedef struct {
double DPPCLK;
double DISPCLK;
mode_lib->vba.MaximumReadBandwidthWithoutPrefetch = 0.0;
mode_lib->vba.MaximumReadBandwidthWithPrefetch = 0.0;
for (k = 0; k <= mode_lib->vba.NumberOfActivePlanes - 1; k++) {
- uint m;
+ unsigned int m;
locals->cursor_bw[k] = 0;
locals->cursor_bw_pre[k] = 0;
double SecondMinActiveDRAMClockChangeMarginOneDisplayInVBLank;
double FullDETBufferingTimeYStutterCriticalPlane = 0;
double TimeToFinishSwathTransferStutterCriticalPlane = 0;
- uint k, j;
+ unsigned int k, j;
mode_lib->vba.TotalActiveDPP = 0;
mode_lib->vba.TotalDCCActiveDPP = 0;
double DPPCLK[],
double *DCFCLKDeepSleep)
{
- uint k;
+ unsigned int k;
double DisplayPipeLineDeliveryTimeLuma;
double DisplayPipeLineDeliveryTimeChroma;
//double DCFCLKDeepSleepPerPlane[DC__NUM_DPP__MAX];
double DisplayPipeRequestDeliveryTimeChromaPrefetch[])
{
double req_per_swath_ub;
- uint k;
+ unsigned int k;
for (k = 0; k < NumberOfActivePlanes; ++k) {
if (VRatio[k] <= 1) {
unsigned int dpte_groups_per_row_chroma_ub;
unsigned int num_group_per_lower_vm_stage;
unsigned int num_req_per_lower_vm_stage;
- uint k;
+ unsigned int k;
for (k = 0; k < NumberOfActivePlanes; ++k) {
if (GPUVMEnable == true) {
#include "hw_factory_dcn21.h"
-
#include "dcn/dcn_2_1_0_offset.h"
#include "dcn/dcn_2_1_0_sh_mask.h"
#include "renoir_ip_offset.h"
-
#include "reg_helper.h"
#include "../hpd_regs.h"
/* begin *********************
DDC_MASK_SH_LIST_DCN2(_MASK, 6)
};
+#include "../generic_regs.h"
+
+/* set field name */
+#define SF_GENERIC(reg_name, field_name, post_fix)\
+ .field_name = reg_name ## __ ## field_name ## post_fix
+
+#define generic_regs(id) \
+{\
+ GENERIC_REG_LIST(id)\
+}
+
+static const struct generic_registers generic_regs[] = {
+ generic_regs(A),
+};
+
+static const struct generic_sh_mask generic_shift[] = {
+ GENERIC_MASK_SH_LIST(__SHIFT, A),
+};
+
+static const struct generic_sh_mask generic_mask[] = {
+ GENERIC_MASK_SH_LIST(_MASK, A),
+};
+
+static void define_generic_registers(struct hw_gpio_pin *pin, uint32_t en)
+{
+ struct hw_generic *generic = HW_GENERIC_FROM_BASE(pin);
+
+ generic->regs = &generic_regs[en];
+ generic->shifts = &generic_shift[en];
+ generic->masks = &generic_mask[en];
+ generic->base.regs = &generic_regs[en].gpio;
+}
+
static void define_ddc_registers(
struct hw_gpio_pin *pin,
uint32_t en)
.get_hpd_pin = dal_hw_hpd_get_pin,
.get_generic_pin = dal_hw_generic_get_pin,
.define_hpd_registers = define_hpd_registers,
- .define_ddc_registers = define_ddc_registers
+ .define_ddc_registers = define_ddc_registers,
+ .define_generic_registers = define_generic_registers
};
/*
* dal_hw_factory_dcn10_init
#define SF_HPD(reg_name, field_name, post_fix)\
.field_name = reg_name ## __ ## field_name ## post_fix
-
/* macros to expend register list macro defined in HW object header file
* end *********************/
{
switch (offset) {
/* GENERIC */
- case REG(DC_GENERICA):
+ case REG(DC_GPIO_GENERIC_A):
*id = GPIO_ID_GENERIC;
switch (mask) {
case DC_GPIO_GENERIC_A__DC_GPIO_GENERICA_A_MASK:
#include "dm_pp_smu.h"
#define MEMORY_TYPE_MULTIPLIER_CZ 4
+#define MEMORY_TYPE_HBM 2
+
enum dce_version resource_parse_asic_id(
struct hw_asic_id asic_id);
#define RAVEN1_F0 0xF0
#define RAVEN_UNKNOWN 0xFF
+#define PICASSO_15D8_REV_E3 0xE3
+#define PICASSO_15D8_REV_E4 0xE4
+
#define ASICREV_IS_RAVEN(eChipRev) ((eChipRev >= RAVEN_A0) && eChipRev < RAVEN_UNKNOWN)
#define ASICREV_IS_PICASSO(eChipRev) ((eChipRev >= PICASSO_A0) && (eChipRev < RAVEN2_A0))
-#define ASICREV_IS_RAVEN2(eChipRev) ((eChipRev >= RAVEN2_A0) && (eChipRev < 0xF0))
-
+#define ASICREV_IS_RAVEN2(eChipRev) ((eChipRev >= RAVEN2_A0) && (eChipRev < PICASSO_15D8_REV_E3))
+#define ASICREV_IS_DALI(eChipRev) ((eChipRev >= PICASSO_15D8_REV_E3) && (eChipRev < RAVEN1_F0))
#define ASICREV_IS_RV1_F0(eChipRev) ((eChipRev >= RAVEN1_F0) && (eChipRev < RAVEN_UNKNOWN))
{ { 0, 0, 0, 0, 0 } },
{ { 0, 0, 0, 0, 0 } },
{ { 0, 0, 0, 0, 0 } } } };
-static const struct IP_BASE MP1_BASE ={ { { { 0x00016200, 0x02400400, 0x00E80000, 0x00EC0000, 0x00F00000 } },
+static const struct IP_BASE MP1_BASE ={ { { { 0x00016000, 0x02400400, 0x00E80000, 0x00EC0000, 0x00F00000 } },
{ { 0, 0, 0, 0, 0 } },
{ { 0, 0, 0, 0, 0 } },
{ { 0, 0, 0, 0, 0 } },
static int pp_smu_i2c_bus_access(void *handle, bool acquire)
{
struct pp_hwmgr *hwmgr = handle;
+ int ret = 0;
if (!hwmgr || !hwmgr->pm_en)
return -EINVAL;
return -EINVAL;
}
- return hwmgr->hwmgr_func->smu_i2c_bus_access(hwmgr, acquire);
+ mutex_lock(&hwmgr->smu_lock);
+ ret = hwmgr->hwmgr_func->smu_i2c_bus_access(hwmgr, acquire);
+ mutex_unlock(&hwmgr->smu_lock);
+
+ return ret;
}
static const struct amd_pm_funcs pp_dpm_funcs = {
case AMD_IP_BLOCK_TYPE_GFX:
ret = smu_gfx_off_control(smu, gate);
break;
+ case AMD_IP_BLOCK_TYPE_SDMA:
+ ret = smu_powergate_sdma(smu, gate);
+ break;
default:
break;
}
smu->smu_baco.state = SMU_BACO_STATE_EXIT;
smu->smu_baco.platform_support = false;
+ mutex_init(&smu->sensor_lock);
+
smu->watermarks_bitmap = 0;
smu->power_profile_mode = PP_SMC_POWER_PROFILE_BOOTUP_DEFAULT;
smu->default_power_profile_mode = PP_SMC_POWER_PROFILE_BOOTUP_DEFAULT;
if (!data || !size)
return -EINVAL;
+ mutex_lock(&smu->sensor_lock);
switch (sensor) {
case AMDGPU_PP_SENSOR_MAX_FAN_RPM:
*(uint32_t *)data = pptable->FanMaximumRpm;
default:
ret = smu_smc_read_sensor(smu, sensor, data, size);
}
+ mutex_unlock(&smu->sensor_lock);
return ret;
}
const struct smu_funcs *funcs;
const struct pptable_funcs *ppt_funcs;
struct mutex mutex;
+ struct mutex sensor_lock;
uint64_t pool_size;
struct smu_table_context smu_table;
struct smu_table_context *smu_table= &smu->smu_table;
int ret = 0;
- if (!smu_table->metrics_time || time_after(jiffies, smu_table->metrics_time + HZ / 1000)) {
+ if (!smu_table->metrics_time || time_after(jiffies, smu_table->metrics_time + msecs_to_jiffies(100))) {
ret = smu_update_table(smu, SMU_TABLE_SMU_METRICS, 0,
(void *)smu_table->metrics_table, false);
if (ret) {
if(!data || !size)
return -EINVAL;
+ mutex_lock(&smu->sensor_lock);
switch (sensor) {
case AMDGPU_PP_SENSOR_MAX_FAN_RPM:
*(uint32_t *)data = pptable->FanMaximumRpm;
default:
ret = smu_smc_read_sensor(smu, sensor, data, size);
}
+ mutex_unlock(&smu->sensor_lock);
return ret;
}
}
+static int renoir_print_clk_levels(struct smu_context *smu,
+ enum smu_clk_type clk_type, char *buf)
+{
+ int i, size = 0, ret = 0;
+ uint32_t cur_value = 0, value = 0, count = 0, min = 0, max = 0;
+ DpmClocks_t *clk_table = smu->smu_table.clocks_table;
+ SmuMetrics_t metrics = {0};
+
+ if (!clk_table || clk_type >= SMU_CLK_COUNT)
+ return -EINVAL;
+
+ ret = smu_update_table(smu, SMU_TABLE_SMU_METRICS, 0,
+ (void *)&metrics, false);
+ if (ret)
+ return ret;
+
+ switch (clk_type) {
+ case SMU_GFXCLK:
+ case SMU_SCLK:
+ /* retirve table returned paramters unit is MHz */
+ cur_value = metrics.ClockFrequency[CLOCK_GFXCLK];
+ ret = smu_get_dpm_freq_range(smu, SMU_GFXCLK, &min, &max);
+ if (!ret) {
+ /* driver only know min/max gfx_clk, Add level 1 for all other gfx clks */
+ if (cur_value == max)
+ i = 2;
+ else if (cur_value == min)
+ i = 0;
+ else
+ i = 1;
+
+ size += sprintf(buf + size, "0: %uMhz %s\n", min,
+ i == 0 ? "*" : "");
+ size += sprintf(buf + size, "1: %uMhz %s\n",
+ i == 1 ? cur_value : RENOIR_UMD_PSTATE_GFXCLK,
+ i == 1 ? "*" : "");
+ size += sprintf(buf + size, "2: %uMhz %s\n", max,
+ i == 2 ? "*" : "");
+ }
+ return size;
+ case SMU_SOCCLK:
+ count = NUM_SOCCLK_DPM_LEVELS;
+ cur_value = metrics.ClockFrequency[CLOCK_SOCCLK];
+ break;
+ case SMU_MCLK:
+ count = NUM_MEMCLK_DPM_LEVELS;
+ cur_value = metrics.ClockFrequency[CLOCK_UMCCLK];
+ break;
+ case SMU_DCEFCLK:
+ count = NUM_DCFCLK_DPM_LEVELS;
+ cur_value = metrics.ClockFrequency[CLOCK_DCFCLK];
+ break;
+ case SMU_FCLK:
+ count = NUM_FCLK_DPM_LEVELS;
+ cur_value = metrics.ClockFrequency[CLOCK_FCLK];
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ for (i = 0; i < count; i++) {
+ GET_DPM_CUR_FREQ(clk_table, clk_type, i, value);
+ size += sprintf(buf + size, "%d: %uMhz %s\n", i, value,
+ cur_value == value ? "*" : "");
+ }
+
+ return size;
+}
+
static const struct pptable_funcs renoir_ppt_funcs = {
.get_smu_msg_index = renoir_get_smu_msg_index,
.get_smu_table_index = renoir_get_smu_table_index,
.tables_init = renoir_tables_init,
.set_power_state = NULL,
.get_dpm_uclk_limited = renoir_get_dpm_uclk_limited,
+ .print_clk_levels = renoir_print_clk_levels,
};
void renoir_set_ppt_funcs(struct smu_context *smu)
extern void renoir_set_ppt_funcs(struct smu_context *smu);
+/* UMD PState Renoir Msg Parameters in MHz */
+#define RENOIR_UMD_PSTATE_GFXCLK 700
+#define RENOIR_UMD_PSTATE_SOCCLK 678
+#define RENOIR_UMD_PSTATE_FCLK 800
+
+#define GET_DPM_CUR_FREQ(table, clk_type, dpm_level, freq) \
+ do { \
+ switch (clk_type) { \
+ case SMU_SOCCLK: \
+ freq = table->SocClocks[dpm_level].Freq; \
+ break; \
+ case SMU_MCLK: \
+ freq = table->MemClocks[dpm_level].Freq; \
+ break; \
+ case SMU_DCEFCLK: \
+ freq = table->DcfClocks[dpm_level].Freq; \
+ break; \
+ case SMU_FCLK: \
+ freq = table->FClocks[dpm_level].Freq; \
+ break; \
+ default: \
+ break; \
+ } \
+ } while (0)
+
#endif
if(!data || !size)
return -EINVAL;
+ mutex_lock(&smu->sensor_lock);
switch (sensor) {
case AMDGPU_PP_SENSOR_MAX_FAN_RPM:
*(uint32_t *)data = pptable->FanMaximumRpm;
default:
ret = smu_smc_read_sensor(smu, sensor, data, size);
}
+ mutex_unlock(&smu->sensor_lock);
return ret;
}
struct komeda_data_flow_cfg dflow;
int err;
- if (!writeback_job || !writeback_job->fb) {
+ if (!writeback_job)
return 0;
- }
if (!crtc_st->active) {
DRM_DEBUG_ATOMIC("Cannot write the composition result out on a inactive CRTC.\n");
&komeda_wb_encoder_helper_funcs,
formats, n_formats);
komeda_put_fourcc_list(formats);
- if (err)
+ if (err) {
+ kfree(kwb_conn);
return err;
+ }
drm_connector_helper_add(&wb_conn->base, &komeda_wb_conn_helper_funcs);
struct drm_framebuffer *fb;
int i, n_planes;
- if (!conn_state->writeback_job || !conn_state->writeback_job->fb)
+ if (!conn_state->writeback_job)
return 0;
fb = conn_state->writeback_job->fb;
mw_state = to_mw_state(conn_state);
- if (conn_state->writeback_job && conn_state->writeback_job->fb) {
+ if (conn_state->writeback_job) {
struct drm_framebuffer *fb = conn_state->writeback_job->fb;
DRM_DEV_DEBUG_DRIVER(drm->dev,
&adv7511_connector_helper_funcs);
drm_connector_attach_encoder(&adv->connector, bridge->encoder);
+ if (adv->type == ADV7533)
+ ret = adv7533_attach_dsi(adv);
+
if (adv->i2c_main->irq)
regmap_write(adv->regmap, ADV7511_REG_INT_ENABLE(0),
ADV7511_INT0_HPD);
drm_bridge_add(&adv7511->bridge);
adv7511_audio_init(dev, adv7511);
-
- if (adv7511->type == ADV7533) {
- ret = adv7533_attach_dsi(adv7511);
- if (ret)
- goto err_remove_bridge;
- }
-
return 0;
-err_remove_bridge:
- drm_bridge_remove(&adv7511->bridge);
err_unregister_cec:
i2c_unregister_device(adv7511->i2c_cec);
if (adv7511->cec_clk)
return -EINVAL;
}
- if (writeback_job->out_fence && !writeback_job->fb) {
- DRM_DEBUG_ATOMIC("[CONNECTOR:%d:%s] requesting out-fence without framebuffer\n",
- connector->base.id, connector->name);
- return -EINVAL;
+ if (!writeback_job->fb) {
+ if (writeback_job->out_fence) {
+ DRM_DEBUG_ATOMIC("[CONNECTOR:%d:%s] requesting out-fence without framebuffer\n",
+ connector->base.id, connector->name);
+ return -EINVAL;
+ }
+
+ drm_writeback_cleanup_job(writeback_job);
+ state->writeback_job = NULL;
}
return 0;
*/
#include <linux/dma-fence.h>
+#include <linux/ktime.h>
#include <drm/drm_atomic.h>
#include <drm/drm_atomic_helper.h>
{
struct drm_device *dev = old_state->dev;
const struct drm_mode_config_helper_funcs *funcs;
+ ktime_t start;
+ s64 commit_time_ms;
funcs = dev->mode_config.helper_private;
+ /*
+ * We're measuring the _entire_ commit, so the time will vary depending
+ * on how many fences and objects are involved. For the purposes of self
+ * refresh, this is desirable since it'll give us an idea of how
+ * congested things are. This will inform our decision on how often we
+ * should enter self refresh after idle.
+ *
+ * These times will be averaged out in the self refresh helpers to avoid
+ * overreacting over one outlier frame
+ */
+ start = ktime_get();
+
drm_atomic_helper_wait_for_fences(dev, old_state, false);
drm_atomic_helper_wait_for_dependencies(old_state);
else
drm_atomic_helper_commit_tail(old_state);
+ commit_time_ms = ktime_ms_delta(ktime_get(), start);
+ if (commit_time_ms > 0)
+ drm_self_refresh_helper_update_avg_times(old_state,
+ (unsigned long)commit_time_ms);
+
drm_atomic_helper_commit_cleanup_done(old_state);
drm_atomic_state_put(old_state);
return PTR_ERR(crtc_state);
crtc_state->event = event;
- crtc_state->pageflip_flags = flags;
+ crtc_state->async_flip = flags & DRM_MODE_PAGE_FLIP_ASYNC;
plane_state = drm_atomic_get_plane_state(state, plane);
if (IS_ERR(plane_state))
state->zpos_changed = false;
state->commit = NULL;
state->event = NULL;
- state->pageflip_flags = 0;
+ state->async_flip = false;
/* Self refresh should be canceled when a new update is available */
state->active = drm_atomic_crtc_effectively_active(state);
if (arg->reserved)
return -EINVAL;
- if ((arg->flags & DRM_MODE_PAGE_FLIP_ASYNC) &&
- !dev->mode_config.async_page_flip)
+ if (arg->flags & DRM_MODE_PAGE_FLIP_ASYNC)
return -EINVAL;
/* can't test and expect an event at the same time. */
if (ret)
goto err_minors;
+ dev->registered = true;
+
if (dev->driver->load) {
ret = dev->driver->load(dev, flags);
if (ret)
goto err_minors;
}
- dev->registered = true;
-
if (drm_core_check_feature(dev, DRIVER_MODESET))
drm_modeset_register_all(dev);
case DRM_CLIENT_CAP_ATOMIC:
if (!drm_core_check_feature(dev, DRIVER_ATOMIC))
return -EOPNOTSUPP;
- if (req->value > 1)
+ /* The modesetting DDX has a totally broken idea of atomic. */
+ if (current->comm[0] == 'X' && req->value == 1) {
+ pr_info("broken atomic modeset userspace detected, disabling atomic\n");
+ return -EOPNOTSUPP;
+ }
+ if (req->value > 2)
return -EINVAL;
file_priv->atomic = req->value;
file_priv->universal_planes = req->value;
{
int ret;
- WARN_ON(dev->registered && !obj_free_cb);
+ WARN_ON(!dev->driver->load && dev->registered && !obj_free_cb);
mutex_lock(&dev->mode_config.idr_mutex);
ret = idr_alloc(&dev->mode_config.object_idr, register_obj ? obj : NULL,
void drm_mode_object_unregister(struct drm_device *dev,
struct drm_mode_object *object)
{
- WARN_ON(dev->registered && !object->free_cb);
+ WARN_ON(!dev->driver->load && dev->registered && !object->free_cb);
mutex_lock(&dev->mode_config.idr_mutex);
if (object->id) {
* Authors:
* Sean Paul <seanpaul@chromium.org>
*/
+#include <linux/average.h>
#include <linux/bitops.h>
#include <linux/slab.h>
#include <linux/workqueue.h>
* atomic_check when &drm_crtc_state.self_refresh_active is true.
*/
+#define SELF_REFRESH_AVG_SEED_MS 200
+
+DECLARE_EWMA(psr_time, 4, 4)
+
struct drm_self_refresh_data {
struct drm_crtc *crtc;
struct delayed_work entry_work;
- struct drm_atomic_state *save_state;
- unsigned int entry_delay_ms;
+
+ struct mutex avg_mutex;
+ struct ewma_psr_time entry_avg_ms;
+ struct ewma_psr_time exit_avg_ms;
};
static void drm_self_refresh_helper_entry_work(struct work_struct *work)
drm_modeset_acquire_fini(&ctx);
}
+/**
+ * drm_self_refresh_helper_update_avg_times - Updates a crtc's SR time averages
+ * @state: the state which has just been applied to hardware
+ * @commit_time_ms: the amount of time in ms that this commit took to complete
+ *
+ * Called after &drm_mode_config_funcs.atomic_commit_tail, this function will
+ * update the average entry/exit self refresh times on self refresh transitions.
+ * These averages will be used when calculating how long to delay before
+ * entering self refresh mode after activity.
+ */
+void drm_self_refresh_helper_update_avg_times(struct drm_atomic_state *state,
+ unsigned int commit_time_ms)
+{
+ struct drm_crtc *crtc;
+ struct drm_crtc_state *old_crtc_state, *new_crtc_state;
+ int i;
+
+ for_each_oldnew_crtc_in_state(state, crtc, old_crtc_state,
+ new_crtc_state, i) {
+ struct drm_self_refresh_data *sr_data = crtc->self_refresh_data;
+ struct ewma_psr_time *time;
+
+ if (old_crtc_state->self_refresh_active ==
+ new_crtc_state->self_refresh_active)
+ continue;
+
+ if (new_crtc_state->self_refresh_active)
+ time = &sr_data->entry_avg_ms;
+ else
+ time = &sr_data->exit_avg_ms;
+
+ mutex_lock(&sr_data->avg_mutex);
+ ewma_psr_time_add(time, commit_time_ms);
+ mutex_unlock(&sr_data->avg_mutex);
+ }
+}
+EXPORT_SYMBOL(drm_self_refresh_helper_update_avg_times);
+
/**
* drm_self_refresh_helper_alter_state - Alters the atomic state for SR exit
* @state: the state currently being checked
for_each_new_crtc_in_state(state, crtc, crtc_state, i) {
struct drm_self_refresh_data *sr_data;
+ unsigned int delay;
/* Don't trigger the entry timer when we're already in SR */
if (crtc_state->self_refresh_active)
if (!sr_data)
continue;
+ mutex_lock(&sr_data->avg_mutex);
+ delay = (ewma_psr_time_read(&sr_data->entry_avg_ms) +
+ ewma_psr_time_read(&sr_data->exit_avg_ms)) * 2;
+ mutex_unlock(&sr_data->avg_mutex);
+
mod_delayed_work(system_wq, &sr_data->entry_work,
- msecs_to_jiffies(sr_data->entry_delay_ms));
+ msecs_to_jiffies(delay));
}
}
EXPORT_SYMBOL(drm_self_refresh_helper_alter_state);
/**
* drm_self_refresh_helper_init - Initializes self refresh helpers for a crtc
* @crtc: the crtc which supports self refresh supported displays
- * @entry_delay_ms: amount of inactivity to wait before entering self refresh
*
* Returns zero if successful or -errno on failure
*/
-int drm_self_refresh_helper_init(struct drm_crtc *crtc,
- unsigned int entry_delay_ms)
+int drm_self_refresh_helper_init(struct drm_crtc *crtc)
{
struct drm_self_refresh_data *sr_data = crtc->self_refresh_data;
INIT_DELAYED_WORK(&sr_data->entry_work,
drm_self_refresh_helper_entry_work);
- sr_data->entry_delay_ms = entry_delay_ms;
sr_data->crtc = crtc;
+ mutex_init(&sr_data->avg_mutex);
+ ewma_psr_time_init(&sr_data->entry_avg_ms);
+ ewma_psr_time_init(&sr_data->exit_avg_ms);
+
+ /*
+ * Seed the averages so they're non-zero (and sufficiently large
+ * for even poorly performing panels). As time goes on, this will be
+ * averaged out and the values will trend to their true value.
+ */
+ ewma_psr_time_add(&sr_data->entry_avg_ms, SELF_REFRESH_AVG_SEED_MS);
+ ewma_psr_time_add(&sr_data->exit_avg_ms, SELF_REFRESH_AVG_SEED_MS);
crtc->self_refresh_data = sr_data;
return 0;
if (job->fb)
drm_framebuffer_put(job->fb);
+ if (job->out_fence)
+ dma_fence_put(job->out_fence);
+
kfree(job);
}
EXPORT_SYMBOL(drm_writeback_cleanup_job);
{
unsigned long flags;
struct drm_writeback_job *job;
+ struct dma_fence *out_fence;
spin_lock_irqsave(&wb_connector->job_lock, flags);
job = list_first_entry_or_null(&wb_connector->job_queue,
struct drm_writeback_job,
list_entry);
- if (job) {
+ if (job)
list_del(&job->list_entry);
- if (job->out_fence) {
- if (status)
- dma_fence_set_error(job->out_fence, status);
- dma_fence_signal(job->out_fence);
- dma_fence_put(job->out_fence);
- }
- }
+
spin_unlock_irqrestore(&wb_connector->job_lock, flags);
if (WARN_ON(!job))
return;
+ out_fence = job->out_fence;
+ if (out_fence) {
+ if (status)
+ dma_fence_set_error(out_fence, status);
+ dma_fence_signal(out_fence);
+ dma_fence_put(out_fence);
+ job->out_fence = NULL;
+ }
+
INIT_WORK(&job->cleanup_work, cleanup_work);
queue_work(system_long_wq, &job->cleanup_work);
}
pipe_config->fdi_lanes = lane;
intel_link_compute_m_n(pipe_config->pipe_bpp, lane, fdi_dotclock,
- link_bw, &pipe_config->fdi_m_n, false);
+ link_bw, &pipe_config->fdi_m_n, false, false);
ret = ironlake_check_fdi_lanes(dev, intel_crtc->pipe, pipe_config);
if (ret == -EDEADLK)
intel_link_compute_m_n(u16 bits_per_pixel, int nlanes,
int pixel_clock, int link_clock,
struct intel_link_m_n *m_n,
- bool constant_n)
+ bool constant_n, bool fec_enable)
{
- m_n->tu = 64;
+ u32 data_clock = bits_per_pixel * pixel_clock;
+
+ if (fec_enable)
+ data_clock = intel_dp_mode_to_fec_clock(data_clock);
- compute_m_n(bits_per_pixel * pixel_clock,
+ m_n->tu = 64;
+ compute_m_n(data_clock,
link_clock * nlanes * 8,
&m_n->gmch_m, &m_n->gmch_n,
constant_n);
void intel_link_compute_m_n(u16 bpp, int nlanes,
int pixel_clock, int link_clock,
struct intel_link_m_n *m_n,
- bool constant_n);
+ bool constant_n, bool fec_enable);
bool is_ccs_modifier(u64 modifier);
void lpt_disable_clkout_dp(struct drm_i915_private *dev_priv);
u32 intel_plane_fb_max_stride(struct drm_i915_private *dev_priv,
#define DP_DSC_MAX_ENC_THROUGHPUT_0 340000
#define DP_DSC_MAX_ENC_THROUGHPUT_1 400000
-/* DP DSC FEC Overhead factor = (100 - 2.4)/100 */
-#define DP_DSC_FEC_OVERHEAD_FACTOR 976
+/* DP DSC FEC Overhead factor = 1/(0.972261) */
+#define DP_DSC_FEC_OVERHEAD_FACTOR 972261
/* Compliance test status bits */
#define INTEL_DP_RESOLUTION_SHIFT_MASK 0
return 0;
}
+u32 intel_dp_mode_to_fec_clock(u32 mode_clock)
+{
+ return div_u64(mul_u32_u32(mode_clock, 1000000U),
+ DP_DSC_FEC_OVERHEAD_FACTOR);
+}
+
+static u16 intel_dp_dsc_get_output_bpp(u32 link_clock, u32 lane_count,
+ u32 mode_clock, u32 mode_hdisplay)
+{
+ u32 bits_per_pixel, max_bpp_small_joiner_ram;
+ int i;
+
+ /*
+ * Available Link Bandwidth(Kbits/sec) = (NumberOfLanes)*
+ * (LinkSymbolClock)* 8 * (TimeSlotsPerMTP)
+ * for SST -> TimeSlotsPerMTP is 1,
+ * for MST -> TimeSlotsPerMTP has to be calculated
+ */
+ bits_per_pixel = (link_clock * lane_count * 8) /
+ intel_dp_mode_to_fec_clock(mode_clock);
+ DRM_DEBUG_KMS("Max link bpp: %u\n", bits_per_pixel);
+
+ /* Small Joiner Check: output bpp <= joiner RAM (bits) / Horiz. width */
+ max_bpp_small_joiner_ram = DP_DSC_MAX_SMALL_JOINER_RAM_BUFFER / mode_hdisplay;
+ DRM_DEBUG_KMS("Max small joiner bpp: %u\n", max_bpp_small_joiner_ram);
+
+ /*
+ * Greatest allowed DSC BPP = MIN (output BPP from available Link BW
+ * check, output bpp from small joiner RAM check)
+ */
+ bits_per_pixel = min(bits_per_pixel, max_bpp_small_joiner_ram);
+
+ /* Error out if the max bpp is less than smallest allowed valid bpp */
+ if (bits_per_pixel < valid_dsc_bpp[0]) {
+ DRM_DEBUG_KMS("Unsupported BPP %u, min %u\n",
+ bits_per_pixel, valid_dsc_bpp[0]);
+ return 0;
+ }
+
+ /* Find the nearest match in the array of known BPPs from VESA */
+ for (i = 0; i < ARRAY_SIZE(valid_dsc_bpp) - 1; i++) {
+ if (bits_per_pixel < valid_dsc_bpp[i + 1])
+ break;
+ }
+ bits_per_pixel = valid_dsc_bpp[i];
+
+ /*
+ * Compressed BPP in U6.4 format so multiply by 16, for Gen 11,
+ * fractional part is 0
+ */
+ return bits_per_pixel << 4;
+}
+
+static u8 intel_dp_dsc_get_slice_count(struct intel_dp *intel_dp,
+ int mode_clock, int mode_hdisplay)
+{
+ u8 min_slice_count, i;
+ int max_slice_width;
+
+ if (mode_clock <= DP_DSC_PEAK_PIXEL_RATE)
+ min_slice_count = DIV_ROUND_UP(mode_clock,
+ DP_DSC_MAX_ENC_THROUGHPUT_0);
+ else
+ min_slice_count = DIV_ROUND_UP(mode_clock,
+ DP_DSC_MAX_ENC_THROUGHPUT_1);
+
+ max_slice_width = drm_dp_dsc_sink_max_slice_width(intel_dp->dsc_dpcd);
+ if (max_slice_width < DP_DSC_MIN_SLICE_WIDTH_VALUE) {
+ DRM_DEBUG_KMS("Unsupported slice width %d by DP DSC Sink device\n",
+ max_slice_width);
+ return 0;
+ }
+ /* Also take into account max slice width */
+ min_slice_count = min_t(u8, min_slice_count,
+ DIV_ROUND_UP(mode_hdisplay,
+ max_slice_width));
+
+ /* Find the closest match to the valid slice count values */
+ for (i = 0; i < ARRAY_SIZE(valid_dsc_slicecount); i++) {
+ if (valid_dsc_slicecount[i] >
+ drm_dp_dsc_sink_max_slice_count(intel_dp->dsc_dpcd,
+ false))
+ break;
+ if (min_slice_count <= valid_dsc_slicecount[i])
+ return valid_dsc_slicecount[i];
+ }
+
+ DRM_DEBUG_KMS("Unsupported Slice Count %d\n", min_slice_count);
+ return 0;
+}
+
static enum drm_mode_status
intel_dp_mode_valid(struct drm_connector *connector,
struct drm_display_mode *mode)
adjusted_mode->crtc_clock,
pipe_config->port_clock,
&pipe_config->dp_m_n,
- constant_n);
+ constant_n, pipe_config->fec_enable);
if (intel_connector->panel.downclock_mode != NULL &&
dev_priv->drrs.type == SEAMLESS_DRRS_SUPPORT) {
intel_connector->panel.downclock_mode->clock,
pipe_config->port_clock,
&pipe_config->dp_m2_n2,
- constant_n);
+ constant_n, pipe_config->fec_enable);
}
if (!HAS_DDI(dev_priv))
DP_DPRX_ESI_LEN;
}
-u16 intel_dp_dsc_get_output_bpp(int link_clock, u8 lane_count,
- int mode_clock, int mode_hdisplay)
-{
- u16 bits_per_pixel, max_bpp_small_joiner_ram;
- int i;
-
- /*
- * Available Link Bandwidth(Kbits/sec) = (NumberOfLanes)*
- * (LinkSymbolClock)* 8 * ((100-FECOverhead)/100)*(TimeSlotsPerMTP)
- * FECOverhead = 2.4%, for SST -> TimeSlotsPerMTP is 1,
- * for MST -> TimeSlotsPerMTP has to be calculated
- */
- bits_per_pixel = (link_clock * lane_count * 8 *
- DP_DSC_FEC_OVERHEAD_FACTOR) /
- mode_clock;
-
- /* Small Joiner Check: output bpp <= joiner RAM (bits) / Horiz. width */
- max_bpp_small_joiner_ram = DP_DSC_MAX_SMALL_JOINER_RAM_BUFFER /
- mode_hdisplay;
-
- /*
- * Greatest allowed DSC BPP = MIN (output BPP from avaialble Link BW
- * check, output bpp from small joiner RAM check)
- */
- bits_per_pixel = min(bits_per_pixel, max_bpp_small_joiner_ram);
-
- /* Error out if the max bpp is less than smallest allowed valid bpp */
- if (bits_per_pixel < valid_dsc_bpp[0]) {
- DRM_DEBUG_KMS("Unsupported BPP %d\n", bits_per_pixel);
- return 0;
- }
-
- /* Find the nearest match in the array of known BPPs from VESA */
- for (i = 0; i < ARRAY_SIZE(valid_dsc_bpp) - 1; i++) {
- if (bits_per_pixel < valid_dsc_bpp[i + 1])
- break;
- }
- bits_per_pixel = valid_dsc_bpp[i];
-
- /*
- * Compressed BPP in U6.4 format so multiply by 16, for Gen 11,
- * fractional part is 0
- */
- return bits_per_pixel << 4;
-}
-
-u8 intel_dp_dsc_get_slice_count(struct intel_dp *intel_dp,
- int mode_clock,
- int mode_hdisplay)
-{
- u8 min_slice_count, i;
- int max_slice_width;
-
- if (mode_clock <= DP_DSC_PEAK_PIXEL_RATE)
- min_slice_count = DIV_ROUND_UP(mode_clock,
- DP_DSC_MAX_ENC_THROUGHPUT_0);
- else
- min_slice_count = DIV_ROUND_UP(mode_clock,
- DP_DSC_MAX_ENC_THROUGHPUT_1);
-
- max_slice_width = drm_dp_dsc_sink_max_slice_width(intel_dp->dsc_dpcd);
- if (max_slice_width < DP_DSC_MIN_SLICE_WIDTH_VALUE) {
- DRM_DEBUG_KMS("Unsupported slice width %d by DP DSC Sink device\n",
- max_slice_width);
- return 0;
- }
- /* Also take into account max slice width */
- min_slice_count = min_t(u8, min_slice_count,
- DIV_ROUND_UP(mode_hdisplay,
- max_slice_width));
-
- /* Find the closest match to the valid slice count values */
- for (i = 0; i < ARRAY_SIZE(valid_dsc_slicecount); i++) {
- if (valid_dsc_slicecount[i] >
- drm_dp_dsc_sink_max_slice_count(intel_dp->dsc_dpcd,
- false))
- break;
- if (min_slice_count <= valid_dsc_slicecount[i])
- return valid_dsc_slicecount[i];
- }
-
- DRM_DEBUG_KMS("Unsupported Slice Count %d\n", min_slice_count);
- return 0;
-}
-
static void
intel_pixel_encoding_setup_vsc(struct intel_dp *intel_dp,
const struct intel_crtc_state *crtc_state)
bool intel_dp_source_supports_hbr3(struct intel_dp *intel_dp);
bool
intel_dp_get_link_status(struct intel_dp *intel_dp, u8 *link_status);
-u16 intel_dp_dsc_get_output_bpp(int link_clock, u8 lane_count,
- int mode_clock, int mode_hdisplay);
-u8 intel_dp_dsc_get_slice_count(struct intel_dp *intel_dp, int mode_clock,
- int mode_hdisplay);
bool intel_dp_read_dpcd(struct intel_dp *intel_dp);
bool intel_dp_get_colorimetry_status(struct intel_dp *intel_dp);
return ~((1 << lane_count) - 1) & 0xf;
}
+u32 intel_dp_mode_to_fec_clock(u32 mode_clock);
+
#endif /* __INTEL_DP_H__ */
adjusted_mode->crtc_clock,
crtc_state->port_clock,
&crtc_state->dp_m_n,
- constant_n);
+ constant_n, crtc_state->fec_enable);
crtc_state->dp_m_n.tu = slots;
return 0;
intel_encoder->type = INTEL_OUTPUT_DP_MST;
intel_encoder->power_domain = intel_dig_port->base.power_domain;
intel_encoder->port = intel_dig_port->base.port;
- intel_encoder->crtc_mask = BIT(pipe);
+ intel_encoder->crtc_mask = 0x7;
intel_encoder->cloneable = 0;
intel_encoder->compute_config = intel_dp_mst_compute_config;
int src_x, src_w, src_h, crtc_w, crtc_h;
const struct drm_display_mode *adjusted_mode =
&crtc_state->base.adjusted_mode;
+ unsigned int stride = plane_state->color_plane[0].stride;
unsigned int cpp = fb->format->cpp[0];
unsigned int width_bytes;
int min_width, min_height;
return -EINVAL;
}
- if (width_bytes > 4096 || fb->pitches[0] > 4096) {
+ if (stride > 4096) {
DRM_DEBUG_KMS("Stride (%u) exceeds hardware max with scaling (%u)\n",
- fb->pitches[0], 4096);
+ stride, 4096);
return -EINVAL;
}
asyw->image.pitch[0] = fb->base.pitches[0];
}
- if (!(asyh->state.pageflip_flags & DRM_MODE_PAGE_FLIP_ASYNC))
+ if (!asyh->state.async_flip)
asyw->image.interval = 1;
else
asyw->image.interval = 0;
}
/* Can't do an immediate flip while changing the LUT. */
- asyh->state.pageflip_flags &= ~DRM_MODE_PAGE_FLIP_ASYNC;
+ asyh->state.async_flip = false;
}
static int
static const struct dss_features omap3630_dss_feats = {
.model = DSS_MODEL_OMAP3,
- .fck_div_max = 32,
+ .fck_div_max = 31,
.fck_freq_max = 173000000,
.dss_fck_multiplier = 1,
.parent_clk_name = "dpll4_ck",
* If frequency scaling from low to high, adjust voltage first.
* If frequency scaling from high to low, adjust frequency first.
*/
- if (old_clk_rate < target_rate && pfdev->regulator) {
+ if (old_clk_rate < target_rate) {
err = regulator_set_voltage(pfdev->regulator, target_volt,
target_volt);
if (err) {
if (err) {
dev_err(dev, "Cannot set frequency %lu (%d)\n", target_rate,
err);
- if (pfdev->regulator)
- regulator_set_voltage(pfdev->regulator,
- pfdev->devfreq.cur_volt,
- pfdev->devfreq.cur_volt);
+ regulator_set_voltage(pfdev->regulator, pfdev->devfreq.cur_volt,
+ pfdev->devfreq.cur_volt);
return err;
}
- if (old_clk_rate > target_rate && pfdev->regulator) {
+ if (old_clk_rate > target_rate) {
err = regulator_set_voltage(pfdev->regulator, target_volt,
target_volt);
if (err)
{
int ret;
- pfdev->regulator = devm_regulator_get_optional(pfdev->dev, "mali");
+ pfdev->regulator = devm_regulator_get(pfdev->dev, "mali");
if (IS_ERR(pfdev->regulator)) {
ret = PTR_ERR(pfdev->regulator);
- pfdev->regulator = NULL;
- if (ret == -ENODEV)
- return 0;
dev_err(pfdev->dev, "failed to get regulator: %d\n", ret);
return ret;
}
static void panfrost_regulator_fini(struct panfrost_device *pfdev)
{
- if (pfdev->regulator)
- regulator_disable(pfdev->regulator);
+ regulator_disable(pfdev->regulator);
}
int panfrost_device_init(struct panfrost_device *pfdev)
free_io_pgtable_ops(mmu->pgtbl_ops);
}
-static struct drm_mm_node *addr_to_drm_mm_node(struct panfrost_device *pfdev, int as, u64 addr)
+static struct panfrost_gem_object *
+addr_to_drm_mm_node(struct panfrost_device *pfdev, int as, u64 addr)
{
- struct drm_mm_node *node = NULL;
+ struct panfrost_gem_object *bo = NULL;
+ struct panfrost_file_priv *priv;
+ struct drm_mm_node *node;
u64 offset = addr >> PAGE_SHIFT;
struct panfrost_mmu *mmu;
spin_lock(&pfdev->as_lock);
list_for_each_entry(mmu, &pfdev->as_lru_list, list) {
- struct panfrost_file_priv *priv;
- if (as != mmu->as)
- continue;
+ if (as == mmu->as)
+ break;
+ }
+ if (as != mmu->as)
+ goto out;
+
+ priv = container_of(mmu, struct panfrost_file_priv, mmu);
- priv = container_of(mmu, struct panfrost_file_priv, mmu);
- drm_mm_for_each_node(node, &priv->mm) {
- if (offset >= node->start && offset < (node->start + node->size))
- goto out;
+ spin_lock(&priv->mm_lock);
+
+ drm_mm_for_each_node(node, &priv->mm) {
+ if (offset >= node->start &&
+ offset < (node->start + node->size)) {
+ bo = drm_mm_node_to_panfrost_bo(node);
+ drm_gem_object_get(&bo->base.base);
+ break;
}
}
+ spin_unlock(&priv->mm_lock);
out:
spin_unlock(&pfdev->as_lock);
- return node;
+ return bo;
}
#define NUM_FAULT_PAGES (SZ_2M / PAGE_SIZE)
int panfrost_mmu_map_fault_addr(struct panfrost_device *pfdev, int as, u64 addr)
{
int ret, i;
- struct drm_mm_node *node;
struct panfrost_gem_object *bo;
struct address_space *mapping;
pgoff_t page_offset;
struct sg_table *sgt;
struct page **pages;
- node = addr_to_drm_mm_node(pfdev, as, addr);
- if (!node)
+ bo = addr_to_drm_mm_node(pfdev, as, addr);
+ if (!bo)
return -ENOENT;
- bo = drm_mm_node_to_panfrost_bo(node);
if (!bo->is_heap) {
dev_WARN(pfdev->dev, "matching BO is not heap type (GPU VA = %llx)",
- node->start << PAGE_SHIFT);
- return -EINVAL;
+ bo->node.start << PAGE_SHIFT);
+ ret = -EINVAL;
+ goto err_bo;
}
WARN_ON(bo->mmu->as != as);
/* Assume 2MB alignment and size multiple */
addr &= ~((u64)SZ_2M - 1);
page_offset = addr >> PAGE_SHIFT;
- page_offset -= node->start;
+ page_offset -= bo->node.start;
mutex_lock(&bo->base.pages_lock);
sizeof(struct sg_table), GFP_KERNEL | __GFP_ZERO);
if (!bo->sgts) {
mutex_unlock(&bo->base.pages_lock);
- return -ENOMEM;
+ ret = -ENOMEM;
+ goto err_bo;
}
pages = kvmalloc_array(bo->base.base.size >> PAGE_SHIFT,
kfree(bo->sgts);
bo->sgts = NULL;
mutex_unlock(&bo->base.pages_lock);
- return -ENOMEM;
+ ret = -ENOMEM;
+ goto err_bo;
}
bo->base.pages = pages;
bo->base.pages_use_count = 1;
dev_dbg(pfdev->dev, "mapped page fault @ AS%d %llx", as, addr);
+ drm_gem_object_put_unlocked(&bo->base.base);
+
return 0;
err_map:
sg_free_table(sgt);
err_pages:
drm_gem_shmem_put_pages(&bo->base);
+err_bo:
+ drm_gem_object_put_unlocked(&bo->base.base);
return ret;
}
static int radeon_pci_probe(struct pci_dev *pdev,
const struct pci_device_id *ent)
{
+ unsigned long flags = 0;
int ret;
+ if (!ent)
+ return -ENODEV; /* Avoid NULL-ptr deref in drm_get_pci_dev */
+
+ flags = ent->driver_data;
+
+ if (!radeon_si_support) {
+ switch (flags & RADEON_FAMILY_MASK) {
+ case CHIP_TAHITI:
+ case CHIP_PITCAIRN:
+ case CHIP_VERDE:
+ case CHIP_OLAND:
+ case CHIP_HAINAN:
+ dev_info(&pdev->dev,
+ "SI support disabled by module param\n");
+ return -ENODEV;
+ }
+ }
+ if (!radeon_cik_support) {
+ switch (flags & RADEON_FAMILY_MASK) {
+ case CHIP_KAVERI:
+ case CHIP_BONAIRE:
+ case CHIP_HAWAII:
+ case CHIP_KABINI:
+ case CHIP_MULLINS:
+ dev_info(&pdev->dev,
+ "CIK support disabled by module param\n");
+ return -ENODEV;
+ }
+ }
+
if (vga_switcheroo_client_probe_defer(pdev))
return -EPROBE_DEFER;
struct radeon_device *rdev;
int r, acpi_status;
- if (!radeon_si_support) {
- switch (flags & RADEON_FAMILY_MASK) {
- case CHIP_TAHITI:
- case CHIP_PITCAIRN:
- case CHIP_VERDE:
- case CHIP_OLAND:
- case CHIP_HAINAN:
- dev_info(dev->dev,
- "SI support disabled by module param\n");
- return -ENODEV;
- }
- }
- if (!radeon_cik_support) {
- switch (flags & RADEON_FAMILY_MASK) {
- case CHIP_KAVERI:
- case CHIP_BONAIRE:
- case CHIP_HAWAII:
- case CHIP_KABINI:
- case CHIP_MULLINS:
- dev_info(dev->dev,
- "CIK support disabled by module param\n");
- return -ENODEV;
- }
- }
-
rdev = kzalloc(sizeof(struct radeon_device), GFP_KERNEL);
if (rdev == NULL) {
return -ENOMEM;
struct drm_device *dev = encoder->dev;
struct drm_framebuffer *fb;
- if (!conn_state->writeback_job || !conn_state->writeback_job->fb)
+ if (!conn_state->writeback_job)
return 0;
fb = conn_state->writeback_job->fb;
unsigned int i;
state = rcrtc->writeback.base.state;
- if (!state || !state->writeback_job || !state->writeback_job->fb)
+ if (!state || !state->writeback_job)
return;
fb = state->writeback_job->fb;
#include "rockchip_drm_vop.h"
#include "rockchip_rgb.h"
-#define VOP_SELF_REFRESH_ENTRY_DELAY_MS 100
-
#define VOP_WIN_SET(vop, win, name, v) \
vop_reg_set(vop, &win->phy->name, win->base, ~0, v, #name)
#define VOP_SCL_SET(vop, win, name, v) \
init_completion(&vop->line_flag_completion);
crtc->port = port;
- ret = drm_self_refresh_helper_init(crtc,
- VOP_SELF_REFRESH_ENTRY_DELAY_MS);
+ ret = drm_self_refresh_helper_init(crtc);
if (ret)
DRM_DEV_DEBUG_KMS(vop->dev,
"Failed to init %s with SR helpers %d, ignoring\n",
#include <linux/gpio.h>
#include <linux/mod_devicetable.h>
#include <linux/of_gpio.h>
+#include <linux/pinctrl/consumer.h>
#include <linux/platform_device.h>
#include <drm/drm_atomic_helper.h>
int i;
conn_state = drm_atomic_get_new_connector_state(state, conn);
- if (!conn_state->writeback_job || !conn_state->writeback_job->fb)
+ if (!conn_state->writeback_job)
return 0;
crtc_state = drm_atomic_get_new_crtc_state(state, conn_state->crtc);
u32 ctrl;
int i;
- if (WARN_ON(!conn_state->writeback_job ||
- !conn_state->writeback_job->fb))
+ if (WARN_ON(!conn_state->writeback_job))
return;
mode = &conn_state->crtc->state->adjusted_mode;
case PCI_DEVICE_ID_INTEL_LEWISBURG_SSKU_SMBUS:
case PCI_DEVICE_ID_INTEL_DNV_SMBUS:
case PCI_DEVICE_ID_INTEL_KABYLAKE_PCH_H_SMBUS:
+ priv->features |= FEATURE_BLOCK_PROC;
priv->features |= FEATURE_I2C_BLOCK_READ;
priv->features |= FEATURE_IRQ;
priv->features |= FEATURE_SMBUS_PEC;
{
dma_addr_t rx_dma;
unsigned long time_left;
- void *dma_buf;
+ void *dma_buf = NULL;
struct geni_se *se = &gi2c->se;
size_t len = msg->len;
- dma_buf = i2c_get_dma_safe_msg_buf(msg, 32);
+ if (!of_machine_is_compatible("lenovo,yoga-c630"))
+ dma_buf = i2c_get_dma_safe_msg_buf(msg, 32);
+
if (dma_buf)
geni_se_select_mode(se, GENI_SE_DMA);
else
{
dma_addr_t tx_dma;
unsigned long time_left;
- void *dma_buf;
+ void *dma_buf = NULL;
struct geni_se *se = &gi2c->se;
size_t len = msg->len;
- dma_buf = i2c_get_dma_safe_msg_buf(msg, 32);
+ if (!of_machine_is_compatible("lenovo,yoga-c630"))
+ dma_buf = i2c_get_dma_safe_msg_buf(msg, 32);
+
if (dma_buf)
geni_se_select_mode(se, GENI_SE_DMA);
else
if (readb(riic->base + RIIC_ICSR2) & ICSR2_NACKF) {
/* We got a NACKIE */
readb(riic->base + RIIC_ICDRR); /* dummy read */
+ riic_clear_set_bit(riic, ICSR2_NACKF, 0, RIIC_ICSR2);
riic->err = -ENXIO;
} else if (riic->bytes_left) {
return IRQ_NONE;
u16 address_mask;
u8 num_address_bytes;
u8 idx_write_cnt;
+ bool read_only;
u8 buffer[];
};
#define I2C_SLAVE_BYTELEN GENMASK(15, 0)
#define I2C_SLAVE_FLAG_ADDR16 BIT(16)
+#define I2C_SLAVE_FLAG_RO BIT(17)
#define I2C_SLAVE_DEVICE_MAGIC(_len, _flags) ((_flags) | (_len))
static int i2c_slave_eeprom_slave_cb(struct i2c_client *client,
eeprom->buffer_idx = *val | (eeprom->buffer_idx << 8);
eeprom->idx_write_cnt++;
} else {
- spin_lock(&eeprom->buffer_lock);
- eeprom->buffer[eeprom->buffer_idx++ & eeprom->address_mask] = *val;
- spin_unlock(&eeprom->buffer_lock);
+ if (!eeprom->read_only) {
+ spin_lock(&eeprom->buffer_lock);
+ eeprom->buffer[eeprom->buffer_idx++ & eeprom->address_mask] = *val;
+ spin_unlock(&eeprom->buffer_lock);
+ }
}
break;
eeprom->idx_write_cnt = 0;
eeprom->num_address_bytes = flag_addr16 ? 2 : 1;
eeprom->address_mask = size - 1;
+ eeprom->read_only = FIELD_GET(I2C_SLAVE_FLAG_RO, id->driver_data);
spin_lock_init(&eeprom->buffer_lock);
i2c_set_clientdata(client, eeprom);
static const struct i2c_device_id i2c_slave_eeprom_id[] = {
{ "slave-24c02", I2C_SLAVE_DEVICE_MAGIC(2048 / 8, 0) },
+ { "slave-24c02ro", I2C_SLAVE_DEVICE_MAGIC(2048 / 8, I2C_SLAVE_FLAG_RO) },
{ "slave-24c32", I2C_SLAVE_DEVICE_MAGIC(32768 / 8, I2C_SLAVE_FLAG_ADDR16) },
+ { "slave-24c32ro", I2C_SLAVE_DEVICE_MAGIC(32768 / 8, I2C_SLAVE_FLAG_ADDR16 | I2C_SLAVE_FLAG_RO) },
{ "slave-24c64", I2C_SLAVE_DEVICE_MAGIC(65536 / 8, I2C_SLAVE_FLAG_ADDR16) },
+ { "slave-24c64ro", I2C_SLAVE_DEVICE_MAGIC(65536 / 8, I2C_SLAVE_FLAG_ADDR16 | I2C_SLAVE_FLAG_RO) },
{ }
};
MODULE_DEVICE_TABLE(i2c, i2c_slave_eeprom_id);
if (family == AF_INET) {
rt = container_of(dst, struct rtable, dst);
- return rt->rt_gw_family == AF_INET;
+ return rt->rt_uses_gateway;
}
rt6 = container_of(dst, struct rt6_info, dst);
*/
#define AMD_IOMMU_PGSIZES ((~0xFFFUL) & ~(2ULL << 38))
-static DEFINE_SPINLOCK(amd_iommu_devtable_lock);
static DEFINE_SPINLOCK(pd_bitmap_lock);
/* List of all available dev_data structures */
if (!dev_data)
return NULL;
+ spin_lock_init(&dev_data->lock);
dev_data->devid = devid;
ratelimit_default_init(&dev_data->rs);
*/
}
+/*
+ * Helper function to get the first pte of a large mapping
+ */
+static u64 *first_pte_l7(u64 *pte, unsigned long *page_size,
+ unsigned long *count)
+{
+ unsigned long pte_mask, pg_size, cnt;
+ u64 *fpte;
+
+ pg_size = PTE_PAGE_SIZE(*pte);
+ cnt = PAGE_SIZE_PTE_COUNT(pg_size);
+ pte_mask = ~((cnt << 3) - 1);
+ fpte = (u64 *)(((unsigned long)pte) & pte_mask);
+
+ if (page_size)
+ *page_size = pg_size;
+
+ if (count)
+ *count = cnt;
+
+ return fpte;
+}
+
/****************************************************************************
*
* Interrupt handling functions
dma_addr_t iova, size_t size)
{
if (unlikely(amd_iommu_np_cache)) {
+ unsigned long flags;
+
+ spin_lock_irqsave(&domain->lock, flags);
domain_flush_pages(domain, iova, size);
domain_flush_complete(domain);
+ spin_unlock_irqrestore(&domain->lock, flags);
}
}
BUG_ON(domain->mode < PAGE_MODE_NONE ||
domain->mode > PAGE_MODE_6_LEVEL);
- free_sub_pt(root, domain->mode, freelist);
+ freelist = free_sub_pt(root, domain->mode, freelist);
free_page_list(freelist);
}
* another level increases the size of the address space by 9 bits to a size up
* to 64 bits.
*/
-static void increase_address_space(struct protection_domain *domain,
+static bool increase_address_space(struct protection_domain *domain,
gfp_t gfp)
{
unsigned long flags;
+ bool ret = false;
u64 *pte;
spin_lock_irqsave(&domain->lock, flags);
iommu_virt_to_phys(domain->pt_root));
domain->pt_root = pte;
domain->mode += 1;
- domain->updated = true;
+
+ ret = true;
out:
spin_unlock_irqrestore(&domain->lock, flags);
- return;
+ return ret;
}
static u64 *alloc_pte(struct protection_domain *domain,
unsigned long address,
unsigned long page_size,
u64 **pte_page,
- gfp_t gfp)
+ gfp_t gfp,
+ bool *updated)
{
int level, end_lvl;
u64 *pte, *page;
BUG_ON(!is_power_of_2(page_size));
while (address > PM_LEVEL_SIZE(domain->mode))
- increase_address_space(domain, gfp);
+ *updated = increase_address_space(domain, gfp) || *updated;
level = domain->mode - 1;
pte = &domain->pt_root[PM_LEVEL_INDEX(level, address)];
__pte = *pte;
pte_level = PM_PTE_LEVEL(__pte);
- if (!IOMMU_PTE_PRESENT(__pte) ||
+ /*
+ * If we replace a series of large PTEs, we need
+ * to tear down all of them.
+ */
+ if (IOMMU_PTE_PRESENT(__pte) &&
pte_level == PAGE_MODE_7_LEVEL) {
+ unsigned long count, i;
+ u64 *lpte;
+
+ lpte = first_pte_l7(pte, NULL, &count);
+
+ /*
+ * Unmap the replicated PTEs that still match the
+ * original large mapping
+ */
+ for (i = 0; i < count; ++i)
+ cmpxchg64(&lpte[i], __pte, 0ULL);
+
+ *updated = true;
+ continue;
+ }
+
+ if (!IOMMU_PTE_PRESENT(__pte) ||
+ pte_level == PAGE_MODE_NONE) {
page = (u64 *)get_zeroed_page(gfp);
+
if (!page)
return NULL;
/* pte could have been changed somewhere. */
if (cmpxchg64(pte, __pte, __npte) != __pte)
free_page((unsigned long)page);
- else if (pte_level == PAGE_MODE_7_LEVEL)
- domain->updated = true;
+ else if (IOMMU_PTE_PRESENT(__pte))
+ *updated = true;
continue;
}
*page_size = PTE_LEVEL_PAGE_SIZE(level);
}
- if (PM_PTE_LEVEL(*pte) == 0x07) {
- unsigned long pte_mask;
-
- /*
- * If we have a series of large PTEs, make
- * sure to return a pointer to the first one.
- */
- *page_size = pte_mask = PTE_PAGE_SIZE(*pte);
- pte_mask = ~((PAGE_SIZE_PTE_COUNT(pte_mask) << 3) - 1);
- pte = (u64 *)(((unsigned long)pte) & pte_mask);
- }
+ /*
+ * If we have a series of large PTEs, make
+ * sure to return a pointer to the first one.
+ */
+ if (PM_PTE_LEVEL(*pte) == PAGE_MODE_7_LEVEL)
+ pte = first_pte_l7(pte, page_size, NULL);
return pte;
}
gfp_t gfp)
{
struct page *freelist = NULL;
+ bool updated = false;
u64 __pte, *pte;
- int i, count;
+ int ret, i, count;
BUG_ON(!IS_ALIGNED(bus_addr, page_size));
BUG_ON(!IS_ALIGNED(phys_addr, page_size));
+ ret = -EINVAL;
if (!(prot & IOMMU_PROT_MASK))
- return -EINVAL;
+ goto out;
count = PAGE_SIZE_PTE_COUNT(page_size);
- pte = alloc_pte(dom, bus_addr, page_size, NULL, gfp);
+ pte = alloc_pte(dom, bus_addr, page_size, NULL, gfp, &updated);
+ ret = -ENOMEM;
if (!pte)
- return -ENOMEM;
+ goto out;
for (i = 0; i < count; ++i)
freelist = free_clear_pte(&pte[i], pte[i], freelist);
if (freelist != NULL)
- dom->updated = true;
+ updated = true;
if (count > 1) {
__pte = PAGE_SIZE_PTE(__sme_set(phys_addr), page_size);
for (i = 0; i < count; ++i)
pte[i] = __pte;
- update_domain(dom);
+ ret = 0;
+
+out:
+ if (updated) {
+ unsigned long flags;
+
+ spin_lock_irqsave(&dom->lock, flags);
+ update_domain(dom);
+ spin_unlock_irqrestore(&dom->lock, flags);
+ }
/* Everything flushed out, free pages now */
free_page_list(freelist);
- return 0;
+ return ret;
}
static unsigned long iommu_unmap_page(struct protection_domain *dom,
static void dma_ops_domain_flush_tlb(struct dma_ops_domain *dom)
{
+ unsigned long flags;
+
+ spin_lock_irqsave(&dom->domain.lock, flags);
domain_flush_tlb(&dom->domain);
domain_flush_complete(&dom->domain);
+ spin_unlock_irqrestore(&dom->domain.lock, flags);
}
static void iova_domain_flush_tlb(struct iova_domain *iovad)
domain->dev_cnt -= 1;
}
-/*
- * If a device is not yet associated with a domain, this function makes the
- * device visible in the domain
- */
-static int __attach_device(struct iommu_dev_data *dev_data,
- struct protection_domain *domain)
-{
- int ret;
-
- /* lock domain */
- spin_lock(&domain->lock);
-
- ret = -EBUSY;
- if (dev_data->domain != NULL)
- goto out_unlock;
-
- /* Attach alias group root */
- do_attach(dev_data, domain);
-
- ret = 0;
-
-out_unlock:
-
- /* ready */
- spin_unlock(&domain->lock);
-
- return ret;
-}
-
-
static void pdev_iommuv2_disable(struct pci_dev *pdev)
{
pci_disable_ats(pdev);
unsigned long flags;
int ret;
+ spin_lock_irqsave(&domain->lock, flags);
+
dev_data = get_dev_data(dev);
+ spin_lock(&dev_data->lock);
+
+ ret = -EBUSY;
+ if (dev_data->domain != NULL)
+ goto out;
+
if (!dev_is_pci(dev))
goto skip_ats_check;
pdev = to_pci_dev(dev);
if (domain->flags & PD_IOMMUV2_MASK) {
+ ret = -EINVAL;
if (!dev_data->passthrough)
- return -EINVAL;
+ goto out;
if (dev_data->iommu_v2) {
if (pdev_iommuv2_enable(pdev) != 0)
- return -EINVAL;
+ goto out;
dev_data->ats.enabled = true;
dev_data->ats.qdep = pci_ats_queue_depth(pdev);
}
skip_ats_check:
- spin_lock_irqsave(&amd_iommu_devtable_lock, flags);
- ret = __attach_device(dev_data, domain);
- spin_unlock_irqrestore(&amd_iommu_devtable_lock, flags);
+ ret = 0;
+
+ do_attach(dev_data, domain);
/*
* We might boot into a crash-kernel here. The crashed kernel
*/
domain_flush_tlb_pde(domain);
- return ret;
-}
-
-/*
- * Removes a device from a protection domain (unlocked)
- */
-static void __detach_device(struct iommu_dev_data *dev_data)
-{
- struct protection_domain *domain;
-
- domain = dev_data->domain;
+ domain_flush_complete(domain);
- spin_lock(&domain->lock);
+out:
+ spin_unlock(&dev_data->lock);
- do_detach(dev_data);
+ spin_unlock_irqrestore(&domain->lock, flags);
- spin_unlock(&domain->lock);
+ return ret;
}
/*
dev_data = get_dev_data(dev);
domain = dev_data->domain;
+ spin_lock_irqsave(&domain->lock, flags);
+
+ spin_lock(&dev_data->lock);
+
/*
* First check if the device is still attached. It might already
* be detached from its domain because the generic
* our alias handling.
*/
if (WARN_ON(!dev_data->domain))
- return;
+ goto out;
- /* lock device table */
- spin_lock_irqsave(&amd_iommu_devtable_lock, flags);
- __detach_device(dev_data);
- spin_unlock_irqrestore(&amd_iommu_devtable_lock, flags);
+ do_detach(dev_data);
if (!dev_is_pci(dev))
- return;
+ goto out;
if (domain->flags & PD_IOMMUV2_MASK && dev_data->iommu_v2)
pdev_iommuv2_disable(to_pci_dev(dev));
pci_disable_ats(to_pci_dev(dev));
dev_data->ats.enabled = false;
+
+out:
+ spin_unlock(&dev_data->lock);
+
+ spin_unlock_irqrestore(&domain->lock, flags);
}
static int amd_iommu_add_device(struct device *dev)
static void update_domain(struct protection_domain *domain)
{
- if (!domain->updated)
- return;
-
update_device_table(domain);
domain_flush_devices(domain);
domain_flush_tlb_pde(domain);
-
- domain->updated = false;
}
static int dir2prot(enum dma_data_direction direction)
{
dma_addr_t offset = paddr & ~PAGE_MASK;
dma_addr_t address, start, ret;
+ unsigned long flags;
unsigned int pages;
int prot = 0;
int i;
iommu_unmap_page(&dma_dom->domain, start, PAGE_SIZE);
}
+ spin_lock_irqsave(&dma_dom->domain.lock, flags);
domain_flush_tlb(&dma_dom->domain);
domain_flush_complete(&dma_dom->domain);
+ spin_unlock_irqrestore(&dma_dom->domain.lock, flags);
dma_ops_free_iova(dma_dom, address, pages);
}
if (amd_iommu_unmap_flush) {
+ unsigned long flags;
+
+ spin_lock_irqsave(&dma_dom->domain.lock, flags);
domain_flush_tlb(&dma_dom->domain);
domain_flush_complete(&dma_dom->domain);
+ spin_unlock_irqrestore(&dma_dom->domain.lock, flags);
dma_ops_free_iova(dma_dom, dma_addr, pages);
} else {
pages = __roundup_pow_of_two(pages);
struct iommu_dev_data *entry;
unsigned long flags;
- spin_lock_irqsave(&amd_iommu_devtable_lock, flags);
+ spin_lock_irqsave(&domain->lock, flags);
while (!list_empty(&domain->dev_list)) {
entry = list_first_entry(&domain->dev_list,
struct iommu_dev_data, list);
BUG_ON(!entry->domain);
- __detach_device(entry);
+ do_detach(entry);
}
- spin_unlock_irqrestore(&amd_iommu_devtable_lock, flags);
+ spin_unlock_irqrestore(&domain->lock, flags);
}
static void protection_domain_free(struct protection_domain *domain)
static void amd_iommu_flush_iotlb_all(struct iommu_domain *domain)
{
struct protection_domain *dom = to_pdomain(domain);
+ unsigned long flags;
+ spin_lock_irqsave(&dom->lock, flags);
domain_flush_tlb_pde(dom);
domain_flush_complete(dom);
+ spin_unlock_irqrestore(&dom->lock, flags);
}
static void amd_iommu_iotlb_sync(struct iommu_domain *domain,
/* Update data structure */
domain->mode = PAGE_MODE_NONE;
- domain->updated = true;
/* Make changes visible to IOMMUs */
update_domain(domain);
domain->glx = levels;
domain->flags |= PD_IOMMUV2_MASK;
- domain->updated = true;
update_domain(domain);
int glx; /* Number of levels for GCR3 table */
u64 *gcr3_tbl; /* Guest CR3 table */
unsigned long flags; /* flags to find out type of domain */
- bool updated; /* complete domain flush required */
unsigned dev_cnt; /* devices assigned to this domain */
unsigned dev_iommu[MAX_IOMMUS]; /* per-IOMMU reference count */
};
* This struct contains device specific data for the IOMMU
*/
struct iommu_dev_data {
+ /*Protect against attach/detach races */
+ spinlock_t lock;
+
struct list_head list; /* For domain->dev_list */
struct llist_node dev_data_list; /* For global dev_data_list */
struct protection_domain *domain; /* Domain the device is bound to */
if (sock->type != SOCK_RAW)
return -ESOCKTNOSUPPORT;
+ if (!capable(CAP_NET_RAW))
+ return -EPERM;
sk = sk_alloc(net, PF_ISDN, GFP_KERNEL, &mISDN_proto, kern);
if (!sk)
#include <linux/regulator/db8500-prcmu.h>
#include <linux/regulator/machine.h>
#include <linux/platform_data/ux500_wdt.h>
-#include <linux/platform_data/db8500_thermal.h>
#include "dbx500-prcmu-regs.h"
/* Index of different voltages to be used when accessing AVSData */
.timeout = 600, /* 10 minutes */
.has_28_bits_resolution = true,
};
-/*
- * Thermal Sensor
- */
-
-static struct resource db8500_thsens_resources[] = {
- {
- .name = "IRQ_HOTMON_LOW",
- .start = IRQ_PRCMU_HOTMON_LOW,
- .end = IRQ_PRCMU_HOTMON_LOW,
- .flags = IORESOURCE_IRQ,
- },
- {
- .name = "IRQ_HOTMON_HIGH",
- .start = IRQ_PRCMU_HOTMON_HIGH,
- .end = IRQ_PRCMU_HOTMON_HIGH,
- .flags = IORESOURCE_IRQ,
- },
-};
-
-static struct db8500_thsens_platform_data db8500_thsens_data = {
- .trip_points[0] = {
- .temp = 70000,
- .type = THERMAL_TRIP_ACTIVE,
- .cdev_name = {
- [0] = "thermal-cpufreq-0",
- },
- },
- .trip_points[1] = {
- .temp = 75000,
- .type = THERMAL_TRIP_ACTIVE,
- .cdev_name = {
- [0] = "thermal-cpufreq-0",
- },
- },
- .trip_points[2] = {
- .temp = 80000,
- .type = THERMAL_TRIP_ACTIVE,
- .cdev_name = {
- [0] = "thermal-cpufreq-0",
- },
- },
- .trip_points[3] = {
- .temp = 85000,
- .type = THERMAL_TRIP_CRITICAL,
- },
- .num_trips = 4,
-};
static const struct mfd_cell common_prcmu_devs[] = {
{
},
{
.name = "db8500-thermal",
- .num_resources = ARRAY_SIZE(db8500_thsens_resources),
- .resources = db8500_thsens_resources,
- .platform_data = &db8500_thsens_data,
- .pdata_size = sizeof(db8500_thsens_data),
+ .of_compatible = "stericsson,db8500-thermal",
},
};
depends on MMC_SDHCI && PCI
select MMC_CQHCI
select IOSF_MBI if X86
+ select MMC_SDHCI_IO_ACCESSORS
help
This selects the PCI Secure Digital Host Controller Interface.
Most controllers found today are PCI devices.
obj-$(CONFIG_MMC_SDHCI) += sdhci.o
obj-$(CONFIG_MMC_SDHCI_PCI) += sdhci-pci.o
sdhci-pci-y += sdhci-pci-core.o sdhci-pci-o2micro.o sdhci-pci-arasan.o \
- sdhci-pci-dwc-mshc.o
+ sdhci-pci-dwc-mshc.o sdhci-pci-gli.o
obj-$(subst m,y,$(CONFIG_MMC_SDHCI_PCI)) += sdhci-pci-data.o
obj-$(CONFIG_MMC_SDHCI_ACPI) += sdhci-acpi.o
obj-$(CONFIG_MMC_SDHCI_PXAV3) += sdhci-pxav3.o
dma_set_mask_and_coherent(dev, DMA_BIT_MASK(40));
value = sdhci_readl(host, ESDHC_DMA_SYSCTL);
- value |= ESDHC_DMA_SNOOP;
+
+ if (of_dma_is_coherent(dev->of_node))
+ value |= ESDHC_DMA_SNOOP;
+ else
+ value &= ~ESDHC_DMA_SNOOP;
+
sdhci_writel(host, value, ESDHC_DMA_SYSCTL);
return 0;
}
SDHCI_PCI_DEVICE(O2, SEABIRD1, o2),
SDHCI_PCI_DEVICE(ARASAN, PHY_EMMC, arasan),
SDHCI_PCI_DEVICE(SYNOPSYS, DWC_MSHC, snps),
+ SDHCI_PCI_DEVICE(GLI, 9750, gl9750),
+ SDHCI_PCI_DEVICE(GLI, 9755, gl9755),
SDHCI_PCI_DEVICE_CLASS(AMD, SYSTEM_SDHCI, PCI_CLASS_MASK, amd),
/* Generic SD host controller */
{PCI_DEVICE_CLASS(SYSTEM_SDHCI, PCI_CLASS_MASK)},
--- /dev/null
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Copyright (C) 2019 Genesys Logic, Inc.
+ *
+ * Authors: Ben Chuang <ben.chuang@genesyslogic.com.tw>
+ *
+ * Version: v0.9.0 (2019-08-08)
+ */
+
+#include <linux/bitfield.h>
+#include <linux/bits.h>
+#include <linux/pci.h>
+#include <linux/mmc/mmc.h>
+#include <linux/delay.h>
+#include "sdhci.h"
+#include "sdhci-pci.h"
+
+/* Genesys Logic extra registers */
+#define SDHCI_GLI_9750_WT 0x800
+#define SDHCI_GLI_9750_WT_EN BIT(0)
+#define GLI_9750_WT_EN_ON 0x1
+#define GLI_9750_WT_EN_OFF 0x0
+
+#define SDHCI_GLI_9750_DRIVING 0x860
+#define SDHCI_GLI_9750_DRIVING_1 GENMASK(11, 0)
+#define SDHCI_GLI_9750_DRIVING_2 GENMASK(27, 26)
+#define GLI_9750_DRIVING_1_VALUE 0xFFF
+#define GLI_9750_DRIVING_2_VALUE 0x3
+
+#define SDHCI_GLI_9750_PLL 0x864
+#define SDHCI_GLI_9750_PLL_TX2_INV BIT(23)
+#define SDHCI_GLI_9750_PLL_TX2_DLY GENMASK(22, 20)
+#define GLI_9750_PLL_TX2_INV_VALUE 0x1
+#define GLI_9750_PLL_TX2_DLY_VALUE 0x0
+
+#define SDHCI_GLI_9750_SW_CTRL 0x874
+#define SDHCI_GLI_9750_SW_CTRL_4 GENMASK(7, 6)
+#define GLI_9750_SW_CTRL_4_VALUE 0x3
+
+#define SDHCI_GLI_9750_MISC 0x878
+#define SDHCI_GLI_9750_MISC_TX1_INV BIT(2)
+#define SDHCI_GLI_9750_MISC_RX_INV BIT(3)
+#define SDHCI_GLI_9750_MISC_TX1_DLY GENMASK(6, 4)
+#define GLI_9750_MISC_TX1_INV_VALUE 0x0
+#define GLI_9750_MISC_RX_INV_ON 0x1
+#define GLI_9750_MISC_RX_INV_OFF 0x0
+#define GLI_9750_MISC_RX_INV_VALUE GLI_9750_MISC_RX_INV_OFF
+#define GLI_9750_MISC_TX1_DLY_VALUE 0x5
+
+#define SDHCI_GLI_9750_TUNING_CONTROL 0x540
+#define SDHCI_GLI_9750_TUNING_CONTROL_EN BIT(4)
+#define GLI_9750_TUNING_CONTROL_EN_ON 0x1
+#define GLI_9750_TUNING_CONTROL_EN_OFF 0x0
+#define SDHCI_GLI_9750_TUNING_CONTROL_GLITCH_1 BIT(16)
+#define SDHCI_GLI_9750_TUNING_CONTROL_GLITCH_2 GENMASK(20, 19)
+#define GLI_9750_TUNING_CONTROL_GLITCH_1_VALUE 0x1
+#define GLI_9750_TUNING_CONTROL_GLITCH_2_VALUE 0x2
+
+#define SDHCI_GLI_9750_TUNING_PARAMETERS 0x544
+#define SDHCI_GLI_9750_TUNING_PARAMETERS_RX_DLY GENMASK(2, 0)
+#define GLI_9750_TUNING_PARAMETERS_RX_DLY_VALUE 0x1
+
+#define GLI_MAX_TUNING_LOOP 40
+
+/* Genesys Logic chipset */
+static inline void gl9750_wt_on(struct sdhci_host *host)
+{
+ u32 wt_value;
+ u32 wt_enable;
+
+ wt_value = sdhci_readl(host, SDHCI_GLI_9750_WT);
+ wt_enable = FIELD_GET(SDHCI_GLI_9750_WT_EN, wt_value);
+
+ if (wt_enable == GLI_9750_WT_EN_ON)
+ return;
+
+ wt_value &= ~SDHCI_GLI_9750_WT_EN;
+ wt_value |= FIELD_PREP(SDHCI_GLI_9750_WT_EN, GLI_9750_WT_EN_ON);
+
+ sdhci_writel(host, wt_value, SDHCI_GLI_9750_WT);
+}
+
+static inline void gl9750_wt_off(struct sdhci_host *host)
+{
+ u32 wt_value;
+ u32 wt_enable;
+
+ wt_value = sdhci_readl(host, SDHCI_GLI_9750_WT);
+ wt_enable = FIELD_GET(SDHCI_GLI_9750_WT_EN, wt_value);
+
+ if (wt_enable == GLI_9750_WT_EN_OFF)
+ return;
+
+ wt_value &= ~SDHCI_GLI_9750_WT_EN;
+ wt_value |= FIELD_PREP(SDHCI_GLI_9750_WT_EN, GLI_9750_WT_EN_OFF);
+
+ sdhci_writel(host, wt_value, SDHCI_GLI_9750_WT);
+}
+
+static void gli_set_9750(struct sdhci_host *host)
+{
+ u32 driving_value;
+ u32 pll_value;
+ u32 sw_ctrl_value;
+ u32 misc_value;
+ u32 parameter_value;
+ u32 control_value;
+ u16 ctrl2;
+
+ gl9750_wt_on(host);
+
+ driving_value = sdhci_readl(host, SDHCI_GLI_9750_DRIVING);
+ pll_value = sdhci_readl(host, SDHCI_GLI_9750_PLL);
+ sw_ctrl_value = sdhci_readl(host, SDHCI_GLI_9750_SW_CTRL);
+ misc_value = sdhci_readl(host, SDHCI_GLI_9750_MISC);
+ parameter_value = sdhci_readl(host, SDHCI_GLI_9750_TUNING_PARAMETERS);
+ control_value = sdhci_readl(host, SDHCI_GLI_9750_TUNING_CONTROL);
+
+ driving_value &= ~(SDHCI_GLI_9750_DRIVING_1);
+ driving_value &= ~(SDHCI_GLI_9750_DRIVING_2);
+ driving_value |= FIELD_PREP(SDHCI_GLI_9750_DRIVING_1,
+ GLI_9750_DRIVING_1_VALUE);
+ driving_value |= FIELD_PREP(SDHCI_GLI_9750_DRIVING_2,
+ GLI_9750_DRIVING_2_VALUE);
+ sdhci_writel(host, driving_value, SDHCI_GLI_9750_DRIVING);
+
+ sw_ctrl_value &= ~SDHCI_GLI_9750_SW_CTRL_4;
+ sw_ctrl_value |= FIELD_PREP(SDHCI_GLI_9750_SW_CTRL_4,
+ GLI_9750_SW_CTRL_4_VALUE);
+ sdhci_writel(host, sw_ctrl_value, SDHCI_GLI_9750_SW_CTRL);
+
+ /* reset the tuning flow after reinit and before starting tuning */
+ pll_value &= ~SDHCI_GLI_9750_PLL_TX2_INV;
+ pll_value &= ~SDHCI_GLI_9750_PLL_TX2_DLY;
+ pll_value |= FIELD_PREP(SDHCI_GLI_9750_PLL_TX2_INV,
+ GLI_9750_PLL_TX2_INV_VALUE);
+ pll_value |= FIELD_PREP(SDHCI_GLI_9750_PLL_TX2_DLY,
+ GLI_9750_PLL_TX2_DLY_VALUE);
+
+ misc_value &= ~SDHCI_GLI_9750_MISC_TX1_INV;
+ misc_value &= ~SDHCI_GLI_9750_MISC_RX_INV;
+ misc_value &= ~SDHCI_GLI_9750_MISC_TX1_DLY;
+ misc_value |= FIELD_PREP(SDHCI_GLI_9750_MISC_TX1_INV,
+ GLI_9750_MISC_TX1_INV_VALUE);
+ misc_value |= FIELD_PREP(SDHCI_GLI_9750_MISC_RX_INV,
+ GLI_9750_MISC_RX_INV_VALUE);
+ misc_value |= FIELD_PREP(SDHCI_GLI_9750_MISC_TX1_DLY,
+ GLI_9750_MISC_TX1_DLY_VALUE);
+
+ parameter_value &= ~SDHCI_GLI_9750_TUNING_PARAMETERS_RX_DLY;
+ parameter_value |= FIELD_PREP(SDHCI_GLI_9750_TUNING_PARAMETERS_RX_DLY,
+ GLI_9750_TUNING_PARAMETERS_RX_DLY_VALUE);
+
+ control_value &= ~SDHCI_GLI_9750_TUNING_CONTROL_GLITCH_1;
+ control_value &= ~SDHCI_GLI_9750_TUNING_CONTROL_GLITCH_2;
+ control_value |= FIELD_PREP(SDHCI_GLI_9750_TUNING_CONTROL_GLITCH_1,
+ GLI_9750_TUNING_CONTROL_GLITCH_1_VALUE);
+ control_value |= FIELD_PREP(SDHCI_GLI_9750_TUNING_CONTROL_GLITCH_2,
+ GLI_9750_TUNING_CONTROL_GLITCH_2_VALUE);
+
+ sdhci_writel(host, pll_value, SDHCI_GLI_9750_PLL);
+ sdhci_writel(host, misc_value, SDHCI_GLI_9750_MISC);
+
+ /* disable tuned clk */
+ ctrl2 = sdhci_readw(host, SDHCI_HOST_CONTROL2);
+ ctrl2 &= ~SDHCI_CTRL_TUNED_CLK;
+ sdhci_writew(host, ctrl2, SDHCI_HOST_CONTROL2);
+
+ /* enable tuning parameters control */
+ control_value &= ~SDHCI_GLI_9750_TUNING_CONTROL_EN;
+ control_value |= FIELD_PREP(SDHCI_GLI_9750_TUNING_CONTROL_EN,
+ GLI_9750_TUNING_CONTROL_EN_ON);
+ sdhci_writel(host, control_value, SDHCI_GLI_9750_TUNING_CONTROL);
+
+ /* write tuning parameters */
+ sdhci_writel(host, parameter_value, SDHCI_GLI_9750_TUNING_PARAMETERS);
+
+ /* disable tuning parameters control */
+ control_value &= ~SDHCI_GLI_9750_TUNING_CONTROL_EN;
+ control_value |= FIELD_PREP(SDHCI_GLI_9750_TUNING_CONTROL_EN,
+ GLI_9750_TUNING_CONTROL_EN_OFF);
+ sdhci_writel(host, control_value, SDHCI_GLI_9750_TUNING_CONTROL);
+
+ /* clear tuned clk */
+ ctrl2 = sdhci_readw(host, SDHCI_HOST_CONTROL2);
+ ctrl2 &= ~SDHCI_CTRL_TUNED_CLK;
+ sdhci_writew(host, ctrl2, SDHCI_HOST_CONTROL2);
+
+ gl9750_wt_off(host);
+}
+
+static void gli_set_9750_rx_inv(struct sdhci_host *host, bool b)
+{
+ u32 misc_value;
+
+ gl9750_wt_on(host);
+
+ misc_value = sdhci_readl(host, SDHCI_GLI_9750_MISC);
+ misc_value &= ~SDHCI_GLI_9750_MISC_RX_INV;
+ if (b) {
+ misc_value |= FIELD_PREP(SDHCI_GLI_9750_MISC_RX_INV,
+ GLI_9750_MISC_RX_INV_ON);
+ } else {
+ misc_value |= FIELD_PREP(SDHCI_GLI_9750_MISC_RX_INV,
+ GLI_9750_MISC_RX_INV_OFF);
+ }
+ sdhci_writel(host, misc_value, SDHCI_GLI_9750_MISC);
+
+ gl9750_wt_off(host);
+}
+
+static int __sdhci_execute_tuning_9750(struct sdhci_host *host, u32 opcode)
+{
+ int i;
+ int rx_inv;
+
+ for (rx_inv = 0; rx_inv < 2; rx_inv++) {
+ gli_set_9750_rx_inv(host, !!rx_inv);
+ sdhci_start_tuning(host);
+
+ for (i = 0; i < GLI_MAX_TUNING_LOOP; i++) {
+ u16 ctrl;
+
+ sdhci_send_tuning(host, opcode);
+
+ if (!host->tuning_done) {
+ sdhci_abort_tuning(host, opcode);
+ break;
+ }
+
+ ctrl = sdhci_readw(host, SDHCI_HOST_CONTROL2);
+ if (!(ctrl & SDHCI_CTRL_EXEC_TUNING)) {
+ if (ctrl & SDHCI_CTRL_TUNED_CLK)
+ return 0; /* Success! */
+ break;
+ }
+ }
+ }
+ if (!host->tuning_done) {
+ pr_info("%s: Tuning timeout, falling back to fixed sampling clock\n",
+ mmc_hostname(host->mmc));
+ return -ETIMEDOUT;
+ }
+
+ pr_info("%s: Tuning failed, falling back to fixed sampling clock\n",
+ mmc_hostname(host->mmc));
+ sdhci_reset_tuning(host);
+
+ return -EAGAIN;
+}
+
+static int gl9750_execute_tuning(struct sdhci_host *host, u32 opcode)
+{
+ host->mmc->retune_period = 0;
+ if (host->tuning_mode == SDHCI_TUNING_MODE_1)
+ host->mmc->retune_period = host->tuning_count;
+
+ gli_set_9750(host);
+ host->tuning_err = __sdhci_execute_tuning_9750(host, opcode);
+ sdhci_end_tuning(host);
+
+ return 0;
+}
+
+static int gli_probe_slot_gl9750(struct sdhci_pci_slot *slot)
+{
+ struct sdhci_host *host = slot->host;
+
+ slot->host->mmc->caps2 |= MMC_CAP2_NO_SDIO;
+ sdhci_enable_v4_mode(host);
+
+ return 0;
+}
+
+static int gli_probe_slot_gl9755(struct sdhci_pci_slot *slot)
+{
+ struct sdhci_host *host = slot->host;
+
+ slot->host->mmc->caps2 |= MMC_CAP2_NO_SDIO;
+ sdhci_enable_v4_mode(host);
+
+ return 0;
+}
+
+static void sdhci_gli_voltage_switch(struct sdhci_host *host)
+{
+ /*
+ * According to Section 3.6.1 signal voltage switch procedure in
+ * SD Host Controller Simplified Spec. 4.20, steps 6~8 are as
+ * follows:
+ * (6) Set 1.8V Signal Enable in the Host Control 2 register.
+ * (7) Wait 5ms. 1.8V voltage regulator shall be stable within this
+ * period.
+ * (8) If 1.8V Signal Enable is cleared by Host Controller, go to
+ * step (12).
+ *
+ * Wait 5ms after set 1.8V signal enable in Host Control 2 register
+ * to ensure 1.8V signal enable bit is set by GL9750/GL9755.
+ */
+ usleep_range(5000, 5500);
+}
+
+static void sdhci_gl9750_reset(struct sdhci_host *host, u8 mask)
+{
+ sdhci_reset(host, mask);
+ gli_set_9750(host);
+}
+
+static u32 sdhci_gl9750_readl(struct sdhci_host *host, int reg)
+{
+ u32 value;
+
+ value = readl(host->ioaddr + reg);
+ if (unlikely(reg == SDHCI_MAX_CURRENT && !(value & 0xff)))
+ value |= 0xc8;
+
+ return value;
+}
+
+static const struct sdhci_ops sdhci_gl9755_ops = {
+ .set_clock = sdhci_set_clock,
+ .enable_dma = sdhci_pci_enable_dma,
+ .set_bus_width = sdhci_set_bus_width,
+ .reset = sdhci_reset,
+ .set_uhs_signaling = sdhci_set_uhs_signaling,
+ .voltage_switch = sdhci_gli_voltage_switch,
+};
+
+const struct sdhci_pci_fixes sdhci_gl9755 = {
+ .quirks = SDHCI_QUIRK_NO_ENDATTR_IN_NOPDESC,
+ .quirks2 = SDHCI_QUIRK2_BROKEN_DDR50,
+ .probe_slot = gli_probe_slot_gl9755,
+ .ops = &sdhci_gl9755_ops,
+};
+
+static const struct sdhci_ops sdhci_gl9750_ops = {
+ .read_l = sdhci_gl9750_readl,
+ .set_clock = sdhci_set_clock,
+ .enable_dma = sdhci_pci_enable_dma,
+ .set_bus_width = sdhci_set_bus_width,
+ .reset = sdhci_gl9750_reset,
+ .set_uhs_signaling = sdhci_set_uhs_signaling,
+ .voltage_switch = sdhci_gli_voltage_switch,
+ .platform_execute_tuning = gl9750_execute_tuning,
+};
+
+const struct sdhci_pci_fixes sdhci_gl9750 = {
+ .quirks = SDHCI_QUIRK_NO_ENDATTR_IN_NOPDESC,
+ .quirks2 = SDHCI_QUIRK2_BROKEN_DDR50,
+ .probe_slot = gli_probe_slot_gl9750,
+ .ops = &sdhci_gl9750_ops,
+};
#define PCI_DEVICE_ID_SYNOPSYS_DWC_MSHC 0xc202
+#define PCI_DEVICE_ID_GLI_9755 0x9755
+#define PCI_DEVICE_ID_GLI_9750 0x9750
+
/*
* PCI device class and mask
*/
extern const struct sdhci_pci_fixes sdhci_arasan;
extern const struct sdhci_pci_fixes sdhci_snps;
extern const struct sdhci_pci_fixes sdhci_o2;
+extern const struct sdhci_pci_fixes sdhci_gl9750;
+extern const struct sdhci_pci_fixes sdhci_gl9755;
#endif /* __SDHCI_PCI_H */
*/
#include <linux/delay.h>
+#include <linux/dma-mapping.h>
#include <linux/err.h>
#include <linux/module.h>
#include <linux/init.h>
struct sdhci_tegra_soc_data {
const struct sdhci_pltfm_data *pdata;
+ u64 dma_mask;
u32 nvquirks;
u8 min_tap_delay;
u8 max_tap_delay;
.update_dcmd_desc = sdhci_tegra_update_dcmd_desc,
};
+static int tegra_sdhci_set_dma_mask(struct sdhci_host *host)
+{
+ struct sdhci_pltfm_host *platform = sdhci_priv(host);
+ struct sdhci_tegra *tegra = sdhci_pltfm_priv(platform);
+ const struct sdhci_tegra_soc_data *soc = tegra->soc_data;
+ struct device *dev = mmc_dev(host->mmc);
+
+ if (soc->dma_mask)
+ return dma_set_mask_and_coherent(dev, soc->dma_mask);
+
+ return 0;
+}
+
static const struct sdhci_ops tegra_sdhci_ops = {
.get_ro = tegra_sdhci_get_ro,
.read_w = tegra_sdhci_readw,
.write_l = tegra_sdhci_writel,
.set_clock = tegra_sdhci_set_clock,
+ .set_dma_mask = tegra_sdhci_set_dma_mask,
.set_bus_width = sdhci_set_bus_width,
.reset = tegra_sdhci_reset,
.platform_execute_tuning = tegra_sdhci_execute_tuning,
static const struct sdhci_tegra_soc_data soc_data_tegra20 = {
.pdata = &sdhci_tegra20_pdata,
+ .dma_mask = DMA_BIT_MASK(32),
.nvquirks = NVQUIRK_FORCE_SDHCI_SPEC_200 |
NVQUIRK_ENABLE_BLOCK_GAP_DET,
};
static const struct sdhci_tegra_soc_data soc_data_tegra30 = {
.pdata = &sdhci_tegra30_pdata,
+ .dma_mask = DMA_BIT_MASK(32),
.nvquirks = NVQUIRK_ENABLE_SDHCI_SPEC_300 |
NVQUIRK_ENABLE_SDR50 |
NVQUIRK_ENABLE_SDR104 |
.write_w = tegra_sdhci_writew,
.write_l = tegra_sdhci_writel,
.set_clock = tegra_sdhci_set_clock,
+ .set_dma_mask = tegra_sdhci_set_dma_mask,
.set_bus_width = sdhci_set_bus_width,
.reset = tegra_sdhci_reset,
.platform_execute_tuning = tegra_sdhci_execute_tuning,
static const struct sdhci_tegra_soc_data soc_data_tegra114 = {
.pdata = &sdhci_tegra114_pdata,
+ .dma_mask = DMA_BIT_MASK(32),
};
static const struct sdhci_pltfm_data sdhci_tegra124_pdata = {
SDHCI_QUIRK_NO_HISPD_BIT |
SDHCI_QUIRK_BROKEN_ADMA_ZEROLEN_DESC |
SDHCI_QUIRK_CAP_CLOCK_BASE_BROKEN,
- .quirks2 = SDHCI_QUIRK2_PRESET_VALUE_BROKEN |
- /*
- * The TRM states that the SD/MMC controller found on
- * Tegra124 can address 34 bits (the maximum supported by
- * the Tegra memory controller), but tests show that DMA
- * to or from above 4 GiB doesn't work. This is possibly
- * caused by missing programming, though it's not obvious
- * what sequence is required. Mark 64-bit DMA broken for
- * now to fix this for existing users (e.g. Nyan boards).
- */
- SDHCI_QUIRK2_BROKEN_64_BIT_DMA,
+ .quirks2 = SDHCI_QUIRK2_PRESET_VALUE_BROKEN,
.ops = &tegra114_sdhci_ops,
};
static const struct sdhci_tegra_soc_data soc_data_tegra124 = {
.pdata = &sdhci_tegra124_pdata,
+ .dma_mask = DMA_BIT_MASK(34),
};
static const struct sdhci_ops tegra210_sdhci_ops = {
.write_w = tegra210_sdhci_writew,
.write_l = tegra_sdhci_writel,
.set_clock = tegra_sdhci_set_clock,
+ .set_dma_mask = tegra_sdhci_set_dma_mask,
.set_bus_width = sdhci_set_bus_width,
.reset = tegra_sdhci_reset,
.set_uhs_signaling = tegra_sdhci_set_uhs_signaling,
static const struct sdhci_tegra_soc_data soc_data_tegra210 = {
.pdata = &sdhci_tegra210_pdata,
+ .dma_mask = DMA_BIT_MASK(34),
.nvquirks = NVQUIRK_NEEDS_PAD_CONTROL |
NVQUIRK_HAS_PADCALIB |
NVQUIRK_DIS_CARD_CLK_CONFIG_TAP |
.read_w = tegra_sdhci_readw,
.write_l = tegra_sdhci_writel,
.set_clock = tegra_sdhci_set_clock,
+ .set_dma_mask = tegra_sdhci_set_dma_mask,
.set_bus_width = sdhci_set_bus_width,
.reset = tegra_sdhci_reset,
.set_uhs_signaling = tegra_sdhci_set_uhs_signaling,
SDHCI_QUIRK_NO_HISPD_BIT |
SDHCI_QUIRK_BROKEN_ADMA_ZEROLEN_DESC |
SDHCI_QUIRK_CAP_CLOCK_BASE_BROKEN,
- .quirks2 = SDHCI_QUIRK2_PRESET_VALUE_BROKEN |
- /* SDHCI controllers on Tegra186 support 40-bit addressing.
- * IOVA addresses are 48-bit wide on Tegra186.
- * With 64-bit dma mask used for SDHCI, accesses can
- * be broken. Disable 64-bit dma, which would fall back
- * to 32-bit dma mask. Ideally 40-bit dma mask would work,
- * But it is not supported as of now.
- */
- SDHCI_QUIRK2_BROKEN_64_BIT_DMA,
+ .quirks2 = SDHCI_QUIRK2_PRESET_VALUE_BROKEN,
.ops = &tegra186_sdhci_ops,
};
static const struct sdhci_tegra_soc_data soc_data_tegra186 = {
.pdata = &sdhci_tegra186_pdata,
+ .dma_mask = DMA_BIT_MASK(40),
.nvquirks = NVQUIRK_NEEDS_PAD_CONTROL |
NVQUIRK_HAS_PADCALIB |
NVQUIRK_DIS_CARD_CLK_CONFIG_TAP |
static const struct sdhci_tegra_soc_data soc_data_tegra194 = {
.pdata = &sdhci_tegra186_pdata,
+ .dma_mask = DMA_BIT_MASK(39),
.nvquirks = NVQUIRK_NEEDS_PAD_CONTROL |
NVQUIRK_HAS_PADCALIB |
NVQUIRK_DIS_CARD_CLK_CONFIG_TAP |
static void sdhci_adma_show_error(struct sdhci_host *host)
{
void *desc = host->adma_table;
+ dma_addr_t dma = host->adma_addr;
sdhci_dumpregs(host);
struct sdhci_adma2_64_desc *dma_desc = desc;
if (host->flags & SDHCI_USE_64_BIT_DMA)
- DBG("%p: DMA 0x%08x%08x, LEN 0x%04x, Attr=0x%02x\n",
- desc, le32_to_cpu(dma_desc->addr_hi),
+ SDHCI_DUMP("%08llx: DMA 0x%08x%08x, LEN 0x%04x, Attr=0x%02x\n",
+ (unsigned long long)dma,
+ le32_to_cpu(dma_desc->addr_hi),
le32_to_cpu(dma_desc->addr_lo),
le16_to_cpu(dma_desc->len),
le16_to_cpu(dma_desc->cmd));
else
- DBG("%p: DMA 0x%08x, LEN 0x%04x, Attr=0x%02x\n",
- desc, le32_to_cpu(dma_desc->addr_lo),
+ SDHCI_DUMP("%08llx: DMA 0x%08x, LEN 0x%04x, Attr=0x%02x\n",
+ (unsigned long long)dma,
+ le32_to_cpu(dma_desc->addr_lo),
le16_to_cpu(dma_desc->len),
le16_to_cpu(dma_desc->cmd));
desc += host->desc_sz;
+ dma += host->desc_sz;
if (dma_desc->cmd & cpu_to_le16(ADMA2_END))
break;
!= MMC_BUS_TEST_R)
host->data->error = -EILSEQ;
else if (intmask & SDHCI_INT_ADMA_ERROR) {
- pr_err("%s: ADMA error\n", mmc_hostname(host->mmc));
+ pr_err("%s: ADMA error: 0x%08x\n", mmc_hostname(host->mmc),
+ intmask);
sdhci_adma_show_error(host);
host->data->error = -EIO;
if (host->ops->adma_workaround)
host->flags &= ~SDHCI_USE_ADMA;
}
- /*
- * It is assumed that a 64-bit capable device has set a 64-bit DMA mask
- * and *must* do 64-bit DMA. A driver has the opportunity to change
- * that during the first call to ->enable_dma(). Similarly
- * SDHCI_QUIRK2_BROKEN_64_BIT_DMA must be left to the drivers to
- * implement.
- */
if (sdhci_can_64bit_dma(host))
host->flags |= SDHCI_USE_64_BIT_DMA;
if (host->flags & (SDHCI_USE_SDMA | SDHCI_USE_ADMA)) {
- ret = sdhci_set_dma_mask(host);
+ if (host->ops->set_dma_mask)
+ ret = host->ops->set_dma_mask(host);
+ else
+ ret = sdhci_set_dma_mask(host);
if (!ret && host->ops->enable_dma)
ret = host->ops->enable_dma(host);
u32 (*irq)(struct sdhci_host *host, u32 intmask);
+ int (*set_dma_mask)(struct sdhci_host *host);
int (*enable_dma)(struct sdhci_host *host);
unsigned int (*get_max_clock)(struct sdhci_host *host);
unsigned int (*get_min_clock)(struct sdhci_host *host);
depends on ACPI
help
This driver provides support for Extended Socket network device
- on Extended Partitioning of FUJITSU PRIMEQUEST 2000 E2 series.
+ on Extended Partitioning of FUJITSU PRIMEQUEST 2000 E2 series.
config THUNDERBOLT_NET
tristate "Networking over Thunderbolt cable"
tristate "Enable CAP mode packet interface"
help
ARCnet "cap mode" packet encapsulation. Used to get the hardware
- acknowledge back to userspace. After the initial protocol byte every
- packet is stuffed with an extra 4 byte "cookie" which doesn't
- actually appear on the network. After transmit the driver will send
- back a packet with protocol byte 0 containing the status of the
- transmission:
- 0=no hardware acknowledge
- 1=excessive nak
- 2=transmission accepted by the receiver hardware
-
- Received packets are also stuffed with the extra 4 bytes but it will
- be random data.
-
- Cap only listens to protocol 1-8.
+ acknowledge back to userspace. After the initial protocol byte every
+ packet is stuffed with an extra 4 byte "cookie" which doesn't
+ actually appear on the network. After transmit the driver will send
+ back a packet with protocol byte 0 containing the status of the
+ transmission:
+ 0=no hardware acknowledge
+ 1=excessive nak
+ 2=transmission accepted by the receiver hardware
+
+ Received packets are also stuffed with the extra 4 bytes but it will
+ be random data.
+
+ Cap only listens to protocol 1-8.
config ARCNET_COM90xx
tristate "ARCnet COM90xx (normal) chipset driver"
static void arcnet_rx(struct net_device *dev, int bufnum)
{
struct arcnet_local *lp = netdev_priv(dev);
- struct archdr pkt;
+ union {
+ struct archdr pkt;
+ char buf[512];
+ } rxdata;
struct arc_rfc1201 *soft;
int length, ofs;
- soft = &pkt.soft.rfc1201;
+ soft = &rxdata.pkt.soft.rfc1201;
- lp->hw.copy_from_card(dev, bufnum, 0, &pkt, ARC_HDR_SIZE);
- if (pkt.hard.offset[0]) {
- ofs = pkt.hard.offset[0];
+ lp->hw.copy_from_card(dev, bufnum, 0, &rxdata.pkt, ARC_HDR_SIZE);
+ if (rxdata.pkt.hard.offset[0]) {
+ ofs = rxdata.pkt.hard.offset[0];
length = 256 - ofs;
} else {
- ofs = pkt.hard.offset[1];
+ ofs = rxdata.pkt.hard.offset[1];
length = 512 - ofs;
}
/* get the full header, if possible */
- if (sizeof(pkt.soft) <= length) {
- lp->hw.copy_from_card(dev, bufnum, ofs, soft, sizeof(pkt.soft));
+ if (sizeof(rxdata.pkt.soft) <= length) {
+ lp->hw.copy_from_card(dev, bufnum, ofs, soft, sizeof(rxdata.pkt.soft));
} else {
- memset(&pkt.soft, 0, sizeof(pkt.soft));
+ memset(&rxdata.pkt.soft, 0, sizeof(rxdata.pkt.soft));
lp->hw.copy_from_card(dev, bufnum, ofs, soft, length);
}
arc_printk(D_DURING, dev, "Buffer #%d: received packet from %02Xh to %02Xh (%d+4 bytes)\n",
- bufnum, pkt.hard.source, pkt.hard.dest, length);
+ bufnum, rxdata.pkt.hard.source, rxdata.pkt.hard.dest, length);
dev->stats.rx_packets++;
dev->stats.rx_bytes += length + ARC_HDR_SIZE;
if (arc_proto_map[soft->proto]->is_ip) {
if (BUGLVL(D_PROTO)) {
struct ArcProto
- *oldp = arc_proto_map[lp->default_proto[pkt.hard.source]],
+ *oldp = arc_proto_map[lp->default_proto[rxdata.pkt.hard.source]],
*newp = arc_proto_map[soft->proto];
if (oldp != newp) {
arc_printk(D_PROTO, dev,
"got protocol %02Xh; encap for host %02Xh is now '%c' (was '%c')\n",
- soft->proto, pkt.hard.source,
+ soft->proto, rxdata.pkt.hard.source,
newp->suffix, oldp->suffix);
}
}
lp->default_proto[0] = soft->proto;
/* in striking contrast, the following isn't a hack. */
- lp->default_proto[pkt.hard.source] = soft->proto;
+ lp->default_proto[rxdata.pkt.hard.source] = soft->proto;
}
/* call the protocol-specific receiver. */
- arc_proto_map[soft->proto]->rx(dev, bufnum, &pkt, length);
+ arc_proto_map[soft->proto]->rx(dev, bufnum, &rxdata.pkt, length);
}
static void null_rx(struct net_device *dev, int bufnum,
from EMS Dr. Thomas Wuensche (http://www.ems-wuensche.de).
config CAN_ESD_USB2
- tristate "ESD USB/2 CAN/USB interface"
- ---help---
- This driver supports the CAN-USB/2 interface
- from esd electronic system design gmbh (http://www.esd.eu).
+ tristate "ESD USB/2 CAN/USB interface"
+ ---help---
+ This driver supports the CAN-USB/2 interface
+ from esd electronic system design gmbh (http://www.esd.eu).
config CAN_GS_USB
tristate "Geschwister Schneider UG interfaces"
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause
- *
+/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
+/*
* Northstar Plus switch SerDes/SGMII PHY definitions
*
* Copyright (C) 2018 Florian Fainelli <f.fainelli@gmail.com>
-// SPDX-License-Identifier: GPL-2.0
+/* SPDX-License-Identifier: GPL-2.0 */
/*
* PCE microcode extracted from UGW 7.1.1 switch api
*
{ \
.name = #width, \
.val_bits = (width), \
- .reg_stride = (width) / 8, \
+ .reg_stride = 1, \
.reg_bits = (regbits) + (regalign), \
.pad_bits = (regpad), \
.max_register = BIT(regbits) - 1, \
BIT(0) << QCA8K_GLOBAL_FW_CTRL1_UC_DP_S);
/* Setup connection between CPU port & user ports */
- for (i = 0; i < DSA_MAX_PORTS; i++) {
+ for (i = 0; i < QCA8K_NUM_PORTS; i++) {
/* CPU port gets connected to all user ports of the switch */
if (dsa_is_cpu_port(ds, i)) {
qca8k_rmw(priv, QCA8K_PORT_LOOKUP_CTRL(QCA8K_CPU_PORT),
{
struct qca8k_priv *priv = (struct qca8k_priv *)ds->priv;
+ if (!dsa_is_user_port(ds, port))
+ return 0;
+
qca8k_port_set_status(priv, port, 1);
priv->port_sts[port].enabled = 1;
if (id != QCA8K_ID_QCA8337)
return -ENODEV;
- priv->ds = dsa_switch_alloc(&mdiodev->dev, DSA_MAX_PORTS);
+ priv->ds = dsa_switch_alloc(&mdiodev->dev, QCA8K_NUM_PORTS);
if (!priv->ds)
return -ENOMEM;
const struct switchdev_obj_port_vlan *vlan)
{
struct realtek_smi *smi = ds->priv;
+ u16 vid;
int ret;
- if (!smi->ops->is_vlan_valid(smi, port))
- return -EINVAL;
+ for (vid = vlan->vid_begin; vid < vlan->vid_end; vid++)
+ if (!smi->ops->is_vlan_valid(smi, vid))
+ return -EINVAL;
dev_info(smi->dev, "prepare VLANs %04x..%04x\n",
vlan->vid_begin, vlan->vid_end);
u16 vid;
int ret;
- if (!smi->ops->is_vlan_valid(smi, port))
- return;
+ for (vid = vlan->vid_begin; vid < vlan->vid_end; vid++)
+ if (!smi->ops->is_vlan_valid(smi, vid))
+ return;
dev_info(smi->dev, "add VLAN on port %d, %s, %s\n",
port,
irq = of_irq_get(intc, 0);
if (irq <= 0) {
dev_err(smi->dev, "failed to get parent IRQ\n");
- return irq ? irq : -EINVAL;
+ ret = irq ? irq : -EINVAL;
+ goto out_put_node;
}
/* This clears the IRQ status register */
&val);
if (ret) {
dev_err(smi->dev, "can't read interrupt status\n");
- return ret;
+ goto out_put_node;
}
/* Fetch IRQ edge information from the descriptor */
val);
if (ret) {
dev_err(smi->dev, "could not configure IRQ polarity\n");
- return ret;
+ goto out_put_node;
}
ret = devm_request_threaded_irq(smi->dev, irq, NULL,
"RTL8366RB", smi);
if (ret) {
dev_err(smi->dev, "unable to request irq: %d\n", ret);
- return ret;
+ goto out_put_node;
}
smi->irqdomain = irq_domain_add_linear(intc,
RTL8366RB_NUM_INTERRUPT,
smi);
if (!smi->irqdomain) {
dev_err(smi->dev, "failed to create IRQ domain\n");
- return -EINVAL;
+ ret = -EINVAL;
+ goto out_put_node;
}
for (i = 0; i < smi->num_ports; i++)
irq_set_parent(irq_create_mapping(smi->irqdomain, i), irq);
- return 0;
+out_put_node:
+ of_node_put(intc);
+ return ret;
}
static int rtl8366rb_set_addr(struct realtek_smi *smi)
config NET_DSA_SJA1105_TAS
bool "Support for the Time-Aware Scheduler on NXP SJA1105"
depends on NET_DSA_SJA1105
+ depends on NET_SCH_TAPRIO
help
This enables support for the TTEthernet-based egress scheduling
engine in the SJA1105 DSA driver, which is controlled using a
return sja1105_static_config_reload(priv);
}
-/* Caller must hold priv->tagger_data.meta_lock */
+/* Must be called only with priv->tagger_data.state bit
+ * SJA1105_HWTS_RX_EN cleared
+ */
static int sja1105_change_rxtstamping(struct sja1105_private *priv,
bool on)
{
break;
}
- if (rx_on != priv->tagger_data.hwts_rx_en) {
- spin_lock(&priv->tagger_data.meta_lock);
+ if (rx_on != test_bit(SJA1105_HWTS_RX_EN, &priv->tagger_data.state)) {
+ clear_bit(SJA1105_HWTS_RX_EN, &priv->tagger_data.state);
+
rc = sja1105_change_rxtstamping(priv, rx_on);
- spin_unlock(&priv->tagger_data.meta_lock);
if (rc < 0) {
dev_err(ds->dev,
"Failed to change RX timestamping: %d\n", rc);
- return -EFAULT;
+ return rc;
}
- priv->tagger_data.hwts_rx_en = rx_on;
+ if (rx_on)
+ set_bit(SJA1105_HWTS_RX_EN, &priv->tagger_data.state);
}
if (copy_to_user(ifr->ifr_data, &config, sizeof(config)))
config.tx_type = HWTSTAMP_TX_ON;
else
config.tx_type = HWTSTAMP_TX_OFF;
- if (priv->tagger_data.hwts_rx_en)
+ if (test_bit(SJA1105_HWTS_RX_EN, &priv->tagger_data.state))
config.rx_filter = HWTSTAMP_FILTER_PTP_V2_L2_EVENT;
else
config.rx_filter = HWTSTAMP_FILTER_NONE;
mutex_lock(&priv->ptp_lock);
- now = priv->tstamp_cc.read(&priv->tstamp_cc);
-
while ((skb = skb_dequeue(&data->skb_rxtstamp_queue)) != NULL) {
struct skb_shared_hwtstamps *shwt = skb_hwtstamps(skb);
u64 ts;
+ now = priv->tstamp_cc.read(&priv->tstamp_cc);
+
*shwt = (struct skb_shared_hwtstamps) {0};
ts = SJA1105_SKB_CB(skb)->meta_tstamp;
struct sja1105_private *priv = ds->priv;
struct sja1105_tagger_data *data = &priv->tagger_data;
- if (!data->hwts_rx_en)
+ if (!test_bit(SJA1105_HWTS_RX_EN, &data->state))
return false;
/* We need to read the full PTP clock to reconstruct the Rx
tagger_data = &priv->tagger_data;
skb_queue_head_init(&tagger_data->skb_rxtstamp_queue);
INIT_WORK(&tagger_data->rxtstamp_work, sja1105_rxtstamp_work);
+ spin_lock_init(&tagger_data->meta_lock);
/* Connections between dsa_port and sja1105_port */
for (i = 0; i < SJA1105_NUM_PORTS; i++) {
rc = static_config_buf_prepare_for_upload(priv, config_buf, buf_len);
if (rc < 0) {
dev_err(dev, "Invalid config, cannot upload\n");
- return -EINVAL;
+ rc = -EINVAL;
+ goto out;
}
/* Prevent PHY jabbering during switch reset by inhibiting
* Tx on all ports and waiting for current packet to drain.
rc = sja1105_inhibit_tx(priv, port_bitmap, true);
if (rc < 0) {
dev_err(dev, "Failed to inhibit Tx on ports\n");
- return -ENXIO;
+ rc = -ENXIO;
+ goto out;
}
/* Wait for an eventual egress packet to finish transmission
* (reach IFG). It is guaranteed that a second one will not
source "drivers/net/ethernet/netronome/Kconfig"
source "drivers/net/ethernet/ni/Kconfig"
source "drivers/net/ethernet/8390/Kconfig"
-
-config NET_NETX
- tristate "NetX Ethernet support"
- select MII
- depends on ARCH_NETX
- ---help---
- This is support for the Hilscher netX builtin Ethernet ports
-
- To compile this driver as a module, choose M here. The module
- will be called netx-eth.
-
source "drivers/net/ethernet/nvidia/Kconfig"
source "drivers/net/ethernet/nxp/Kconfig"
source "drivers/net/ethernet/oki-semi/Kconfig"
obj-$(CONFIG_NET_VENDOR_NETERION) += neterion/
obj-$(CONFIG_NET_VENDOR_NETRONOME) += netronome/
obj-$(CONFIG_NET_VENDOR_NI) += ni/
-obj-$(CONFIG_NET_NETX) += netx-eth.o
obj-$(CONFIG_NET_VENDOR_NVIDIA) += nvidia/
obj-$(CONFIG_LPC_ENET) += nxp/
obj-$(CONFIG_NET_VENDOR_OKI) += oki-semi/
if NET_VENDOR_ALLWINNER
config SUN4I_EMAC
- tristate "Allwinner A10 EMAC support"
+ tristate "Allwinner A10 EMAC support"
depends on ARCH_SUNXI
depends on OF
select CRC32
select MII
select PHYLIB
select MDIO_SUN4I
- ---help---
- Support for Allwinner A10 EMAC ethernet driver.
+ ---help---
+ Support for Allwinner A10 EMAC ethernet driver.
- To compile this driver as a module, choose M here. The module
- will be called sun4i-emac.
+ To compile this driver as a module, choose M here. The module
+ will be called sun4i-emac.
endif # NET_VENDOR_ALLWINNER
config ENA_ETHERNET
tristate "Elastic Network Adapter (ENA) support"
depends on PCI_MSI && !CPU_BIG_ENDIAN
+ select DIMLIB
---help---
This driver supports Elastic Network Adapter (ENA)"
pkt_ctrl->curr_bounce_buf =
ena_com_get_next_bounce_buffer(&io_sq->bounce_buf_ctrl);
- memset(io_sq->llq_buf_ctrl.curr_bounce_buf,
- 0x0, llq_info->desc_list_entry_size);
+ memset(io_sq->llq_buf_ctrl.curr_bounce_buf,
+ 0x0, llq_info->desc_list_entry_size);
pkt_ctrl->idx = 0;
if (unlikely(llq_info->desc_stride_ctrl == ENA_ADMIN_SINGLE_DESC_PER_ENTRY))
{
struct aq_vec_s *self = private;
u64 irq_mask = 0U;
- irqreturn_t err = 0;
+ int err;
- if (!self) {
- err = -EINVAL;
- goto err_exit;
- }
+ if (!self)
+ return IRQ_NONE;
err = self->aq_hw_ops->hw_irq_read(self->aq_hw, &irq_mask);
if (err < 0)
- goto err_exit;
+ return IRQ_NONE;
if (irq_mask) {
self->aq_hw_ops->hw_irq_disable(self->aq_hw,
napi_schedule(&self->napi);
} else {
self->aq_hw_ops->hw_irq_enable(self->aq_hw, 1U);
- err = IRQ_NONE;
+ return IRQ_NONE;
}
-err_exit:
- return err >= 0 ? IRQ_HANDLED : IRQ_NONE;
+ return IRQ_HANDLED;
}
cpumask_t *aq_vec_get_affinity_mask(struct aq_vec_s *self)
struct device *dev = &ag->pdev->dev;
struct net_device *ndev = ag->ndev;
static struct mii_bus *mii_bus;
- struct device_node *np;
+ struct device_node *np, *mnp;
int err;
np = dev->of_node;
msleep(200);
}
- err = of_mdiobus_register(mii_bus, np);
+ mnp = of_get_child_by_name(np, "mdio");
+ err = of_mdiobus_register(mii_bus, mnp);
+ of_node_put(mnp);
if (err)
goto mdio_err_put_clk;
priv->phy_interface = of_get_phy_mode(dn);
/* Default to GMII interface mode */
- if (priv->phy_interface < 0)
+ if ((int)priv->phy_interface < 0)
priv->phy_interface = PHY_INTERFACE_MODE_GMII;
/* In the case of a fixed PHY, the DT node associated
#ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
static struct macb_dma_desc_64 *macb_64b_desc(struct macb *bp, struct macb_dma_desc *desc)
{
- if (bp->hw_dma_cap & HW_DMA_CAP_64B)
- return (struct macb_dma_desc_64 *)((void *)desc + sizeof(struct macb_dma_desc));
- return NULL;
+ return (struct macb_dma_desc_64 *)((void *)desc
+ + sizeof(struct macb_dma_desc));
}
#endif
whoami = t4_read_reg(adapter, PL_WHOAMI_A);
pci_read_config_word(pdev, PCI_DEVICE_ID, &device_id);
chip = t4_get_chip_type(adapter, CHELSIO_PCI_ID_VER(device_id));
- if (chip < 0) {
+ if ((int)chip < 0) {
dev_err(&pdev->dev, "Device %d is not supported\n", device_id);
err = chip;
goto out_free_adapter;
static int alloc_uld_rxqs(struct adapter *adap,
struct sge_uld_rxq_info *rxq_info, bool lro)
{
- struct sge *s = &adap->sge;
unsigned int nq = rxq_info->nrxq + rxq_info->nciq;
+ int i, err, msi_idx, que_idx = 0, bmap_idx = 0;
struct sge_ofld_rxq *q = rxq_info->uldrxq;
unsigned short *ids = rxq_info->rspq_id;
- unsigned int bmap_idx = 0;
+ struct sge *s = &adap->sge;
unsigned int per_chan;
- int i, err, msi_idx, que_idx = 0;
per_chan = rxq_info->nrxq / adap->params.nports;
if (msi_idx >= 0) {
bmap_idx = get_msix_idx_from_bmap(adap);
+ if (bmap_idx < 0) {
+ err = -ENOSPC;
+ goto freeout;
+ }
msi_idx = adap->msix_info_ulds[bmap_idx].idx;
}
err = t4_sge_alloc_rxq(adap, &q->rspq, false,
chipsets. (e.g. OneConnect OCe14xxx)
comment "WARNING: be2net is useless without any enabled chip"
- depends on BE2NET_BE2=n && BE2NET_BE3=n && BE2NET_LANCER=n && \
+ depends on BE2NET_BE2=n && BE2NET_BE3=n && BE2NET_LANCER=n && \
BE2NET_SKYHAWK=n && BE2NET
}
priv->if_mode = of_get_phy_mode(np);
- if (priv->if_mode < 0) {
+ if ((int)priv->if_mode < 0) {
dev_err(priv->dev, "missing phy type\n");
of_node_put(priv->phy_node);
if (of_phy_is_fixed_link(np))
return 0;
}
-void reset_gfar(struct net_device *ndev)
+static void reset_gfar(struct net_device *ndev)
{
struct gfar_private *priv = netdev_priv(ndev);
goto err_free_mdio;
priv->phy_mode = of_get_phy_mode(node);
- if (priv->phy_mode < 0) {
+ if ((int)priv->phy_mode < 0) {
netdev_err(ndev, "not find phy-mode\n");
ret = -EINVAL;
goto err_mdiobus;
{
u32 time_cnt;
u32 reg_value;
+ int ret;
regmap_write(mdio_dev->subctrl_vbase, cfg_reg, set_val);
for (time_cnt = MDIO_TIMEOUT; time_cnt; time_cnt--) {
- regmap_read(mdio_dev->subctrl_vbase, st_reg, ®_value);
+ ret = regmap_read(mdio_dev->subctrl_vbase, st_reg, ®_value);
+ if (ret)
+ return ret;
+
reg_value &= st_msk;
if ((!!check_st) == (!!reg_value))
break;
struct ibmvnic_adapter *adapter = netdev_priv(netdev);
/* ensure that transmissions are stopped if called by do_reset */
- if (adapter->resetting)
+ if (test_bit(0, &adapter->resetting))
netif_tx_disable(netdev);
else
netif_tx_stop_all_queues(netdev);
u8 proto = 0;
netdev_tx_t ret = NETDEV_TX_OK;
- if (adapter->resetting) {
+ if (test_bit(0, &adapter->resetting)) {
if (!netif_subqueue_stopped(netdev, skb))
netif_stop_subqueue(netdev, queue_num);
dev_kfree_skb_any(skb);
return rc;
}
+/**
+ * do_change_param_reset returns zero if we are able to keep processing reset
+ * events, or non-zero if we hit a fatal error and must halt.
+ */
+static int do_change_param_reset(struct ibmvnic_adapter *adapter,
+ struct ibmvnic_rwi *rwi,
+ u32 reset_state)
+{
+ struct net_device *netdev = adapter->netdev;
+ int i, rc;
+
+ netdev_dbg(adapter->netdev, "Change param resetting driver (%d)\n",
+ rwi->reset_reason);
+
+ netif_carrier_off(netdev);
+ adapter->reset_reason = rwi->reset_reason;
+
+ ibmvnic_cleanup(netdev);
+
+ if (reset_state == VNIC_OPEN) {
+ rc = __ibmvnic_close(netdev);
+ if (rc)
+ return rc;
+ }
+
+ release_resources(adapter);
+ release_sub_crqs(adapter, 1);
+ release_crq_queue(adapter);
+
+ adapter->state = VNIC_PROBED;
+
+ rc = init_crq_queue(adapter);
+
+ if (rc) {
+ netdev_err(adapter->netdev,
+ "Couldn't initialize crq. rc=%d\n", rc);
+ return rc;
+ }
+
+ rc = ibmvnic_reset_init(adapter);
+ if (rc)
+ return IBMVNIC_INIT_FAILED;
+
+ /* If the adapter was in PROBE state prior to the reset,
+ * exit here.
+ */
+ if (reset_state == VNIC_PROBED)
+ return 0;
+
+ rc = ibmvnic_login(netdev);
+ if (rc) {
+ adapter->state = reset_state;
+ return rc;
+ }
+
+ rc = init_resources(adapter);
+ if (rc)
+ return rc;
+
+ ibmvnic_disable_irqs(adapter);
+
+ adapter->state = VNIC_CLOSED;
+
+ if (reset_state == VNIC_CLOSED)
+ return 0;
+
+ rc = __ibmvnic_open(netdev);
+ if (rc)
+ return IBMVNIC_OPEN_FAILED;
+
+ /* refresh device's multicast list */
+ ibmvnic_set_multi(netdev);
+
+ /* kick napi */
+ for (i = 0; i < adapter->req_rx_queues; i++)
+ napi_schedule(&adapter->napi[i]);
+
+ return 0;
+}
+
/**
* do_reset returns zero if we are able to keep processing reset events, or
* non-zero if we hit a fatal error and must halt.
netdev_dbg(adapter->netdev, "Re-setting driver (%d)\n",
rwi->reset_reason);
+ rtnl_lock();
+
netif_carrier_off(netdev);
adapter->reset_reason = rwi->reset_reason;
if (reset_state == VNIC_OPEN &&
adapter->reset_reason != VNIC_RESET_MOBILITY &&
adapter->reset_reason != VNIC_RESET_FAILOVER) {
- rc = __ibmvnic_close(netdev);
+ adapter->state = VNIC_CLOSING;
+
+ /* Release the RTNL lock before link state change and
+ * re-acquire after the link state change to allow
+ * linkwatch_event to grab the RTNL lock and run during
+ * a reset.
+ */
+ rtnl_unlock();
+ rc = set_link_state(adapter, IBMVNIC_LOGICAL_LNK_DN);
+ rtnl_lock();
if (rc)
- return rc;
- }
+ goto out;
- if (adapter->reset_reason == VNIC_RESET_CHANGE_PARAM ||
- adapter->wait_for_reset) {
- release_resources(adapter);
- release_sub_crqs(adapter, 1);
- release_crq_queue(adapter);
+ if (adapter->state != VNIC_CLOSING) {
+ rc = -1;
+ goto out;
+ }
+
+ adapter->state = VNIC_CLOSED;
}
if (adapter->reset_reason != VNIC_RESET_NON_FATAL) {
*/
adapter->state = VNIC_PROBED;
- if (adapter->wait_for_reset) {
- rc = init_crq_queue(adapter);
- } else if (adapter->reset_reason == VNIC_RESET_MOBILITY) {
+ if (adapter->reset_reason == VNIC_RESET_MOBILITY) {
rc = ibmvnic_reenable_crq_queue(adapter);
release_sub_crqs(adapter, 1);
} else {
if (rc) {
netdev_err(adapter->netdev,
"Couldn't initialize crq. rc=%d\n", rc);
- return rc;
+ goto out;
}
rc = ibmvnic_reset_init(adapter);
- if (rc)
- return IBMVNIC_INIT_FAILED;
+ if (rc) {
+ rc = IBMVNIC_INIT_FAILED;
+ goto out;
+ }
/* If the adapter was in PROBE state prior to the reset,
* exit here.
*/
- if (reset_state == VNIC_PROBED)
- return 0;
+ if (reset_state == VNIC_PROBED) {
+ rc = 0;
+ goto out;
+ }
rc = ibmvnic_login(netdev);
if (rc) {
adapter->state = reset_state;
- return rc;
+ goto out;
}
- if (adapter->reset_reason == VNIC_RESET_CHANGE_PARAM ||
- adapter->wait_for_reset) {
- rc = init_resources(adapter);
- if (rc)
- return rc;
- } else if (adapter->req_rx_queues != old_num_rx_queues ||
- adapter->req_tx_queues != old_num_tx_queues ||
- adapter->req_rx_add_entries_per_subcrq !=
- old_num_rx_slots ||
- adapter->req_tx_entries_per_subcrq !=
- old_num_tx_slots) {
+ if (adapter->req_rx_queues != old_num_rx_queues ||
+ adapter->req_tx_queues != old_num_tx_queues ||
+ adapter->req_rx_add_entries_per_subcrq !=
+ old_num_rx_slots ||
+ adapter->req_tx_entries_per_subcrq !=
+ old_num_tx_slots) {
release_rx_pools(adapter);
release_tx_pools(adapter);
release_napi(adapter);
rc = init_resources(adapter);
if (rc)
- return rc;
+ goto out;
} else {
rc = reset_tx_pools(adapter);
if (rc)
- return rc;
+ goto out;
rc = reset_rx_pools(adapter);
if (rc)
- return rc;
+ goto out;
}
ibmvnic_disable_irqs(adapter);
}
adapter->state = VNIC_CLOSED;
- if (reset_state == VNIC_CLOSED)
- return 0;
+ if (reset_state == VNIC_CLOSED) {
+ rc = 0;
+ goto out;
+ }
rc = __ibmvnic_open(netdev);
if (rc) {
- if (list_empty(&adapter->rwi_list))
- adapter->state = VNIC_CLOSED;
- else
- adapter->state = reset_state;
-
- return 0;
+ rc = IBMVNIC_OPEN_FAILED;
+ goto out;
}
/* refresh device's multicast list */
for (i = 0; i < adapter->req_rx_queues; i++)
napi_schedule(&adapter->napi[i]);
- if (adapter->reset_reason != VNIC_RESET_FAILOVER &&
- adapter->reset_reason != VNIC_RESET_CHANGE_PARAM)
+ if (adapter->reset_reason != VNIC_RESET_FAILOVER)
call_netdevice_notifiers(NETDEV_NOTIFY_PEERS, netdev);
- return 0;
+ rc = 0;
+
+out:
+ rtnl_unlock();
+
+ return rc;
}
static int do_hard_reset(struct ibmvnic_adapter *adapter,
return 0;
rc = __ibmvnic_open(netdev);
- if (rc) {
- if (list_empty(&adapter->rwi_list))
- adapter->state = VNIC_CLOSED;
- else
- adapter->state = reset_state;
-
- return 0;
- }
+ if (rc)
+ return IBMVNIC_OPEN_FAILED;
return 0;
}
{
struct ibmvnic_rwi *rwi;
struct ibmvnic_adapter *adapter;
- bool we_lock_rtnl = false;
u32 reset_state;
int rc = 0;
adapter = container_of(work, struct ibmvnic_adapter, ibmvnic_reset);
- /* netif_set_real_num_xx_queues needs to take rtnl lock here
- * unless wait_for_reset is set, in which case the rtnl lock
- * has already been taken before initializing the reset
- */
- if (!adapter->wait_for_reset) {
- rtnl_lock();
- we_lock_rtnl = true;
+ if (test_and_set_bit_lock(0, &adapter->resetting)) {
+ schedule_delayed_work(&adapter->ibmvnic_delayed_reset,
+ IBMVNIC_RESET_DELAY);
+ return;
}
+
reset_state = adapter->state;
rwi = get_next_rwi(adapter);
break;
}
- if (adapter->force_reset_recovery) {
- adapter->force_reset_recovery = false;
- rc = do_hard_reset(adapter, rwi, reset_state);
+ if (rwi->reset_reason == VNIC_RESET_CHANGE_PARAM) {
+ /* CHANGE_PARAM requestor holds rtnl_lock */
+ rc = do_change_param_reset(adapter, rwi, reset_state);
+ } else if (adapter->force_reset_recovery) {
+ /* Transport event occurred during previous reset */
+ if (adapter->wait_for_reset) {
+ /* Previous was CHANGE_PARAM; caller locked */
+ adapter->force_reset_recovery = false;
+ rc = do_hard_reset(adapter, rwi, reset_state);
+ } else {
+ rtnl_lock();
+ adapter->force_reset_recovery = false;
+ rc = do_hard_reset(adapter, rwi, reset_state);
+ rtnl_unlock();
+ }
} else {
rc = do_reset(adapter, rwi, reset_state);
}
kfree(rwi);
- if (rc && rc != IBMVNIC_INIT_FAILED &&
+ if (rc == IBMVNIC_OPEN_FAILED) {
+ if (list_empty(&adapter->rwi_list))
+ adapter->state = VNIC_CLOSED;
+ else
+ adapter->state = reset_state;
+ rc = 0;
+ } else if (rc && rc != IBMVNIC_INIT_FAILED &&
!adapter->force_reset_recovery)
break;
rwi = get_next_rwi(adapter);
+
+ if (rwi && (rwi->reset_reason == VNIC_RESET_FAILOVER ||
+ rwi->reset_reason == VNIC_RESET_MOBILITY))
+ adapter->force_reset_recovery = true;
}
if (adapter->wait_for_reset) {
- adapter->wait_for_reset = false;
adapter->reset_done_rc = rc;
complete(&adapter->reset_done);
}
free_all_rwi(adapter);
}
- adapter->resetting = false;
- if (we_lock_rtnl)
- rtnl_unlock();
+ clear_bit_unlock(0, &adapter->resetting);
+}
+
+static void __ibmvnic_delayed_reset(struct work_struct *work)
+{
+ struct ibmvnic_adapter *adapter;
+
+ adapter = container_of(work, struct ibmvnic_adapter,
+ ibmvnic_delayed_reset.work);
+ __ibmvnic_reset(&adapter->ibmvnic_reset);
}
static int ibmvnic_reset(struct ibmvnic_adapter *adapter,
rwi->reset_reason = reason;
list_add_tail(&rwi->list, &adapter->rwi_list);
spin_unlock_irqrestore(&adapter->rwi_lock, flags);
- adapter->resetting = true;
netdev_dbg(adapter->netdev, "Scheduling reset (reason %d)\n", reason);
schedule_work(&adapter->ibmvnic_reset);
return 0;
err:
- if (adapter->wait_for_reset)
- adapter->wait_for_reset = false;
return -ret;
}
u16 offset;
u8 flags = 0;
- if (unlikely(adapter->resetting &&
+ if (unlikely(test_bit(0, &adapter->resetting) &&
adapter->reset_reason != VNIC_RESET_NON_FATAL)) {
enable_scrq_irq(adapter, adapter->rx_scrq[scrq_num]);
napi_complete_done(napi, frames_processed);
return 1;
}
- if (adapter->resetting &&
+ if (test_bit(0, &adapter->resetting) &&
adapter->reset_reason == VNIC_RESET_MOBILITY) {
u64 val = (0xff000000) | scrq->hw_irq;
if (rc) {
if (rc == H_CLOSED) {
dev_warn(dev, "CRQ Queue closed\n");
- if (adapter->resetting)
+ if (test_bit(0, &adapter->resetting))
ibmvnic_reset(adapter, VNIC_RESET_FATAL);
}
{
struct net_device *netdev = adapter->netdev;
int rc;
+ __be32 rspeed = cpu_to_be32(crq->query_phys_parms_rsp.speed);
rc = crq->query_phys_parms_rsp.rc.code;
if (rc) {
netdev_err(netdev, "Error %d in QUERY_PHYS_PARMS\n", rc);
return rc;
}
- switch (cpu_to_be32(crq->query_phys_parms_rsp.speed)) {
+ switch (rspeed) {
case IBMVNIC_10MBPS:
adapter->speed = SPEED_10;
break;
adapter->speed = SPEED_100000;
break;
default:
- netdev_warn(netdev, "Unknown speed 0x%08x\n",
- cpu_to_be32(crq->query_phys_parms_rsp.speed));
+ if (netif_carrier_ok(netdev))
+ netdev_warn(netdev, "Unknown speed 0x%08x\n", rspeed);
adapter->speed = SPEED_UNKNOWN;
}
if (crq->query_phys_parms_rsp.flags1 & IBMVNIC_FULL_DUPLEX)
case IBMVNIC_CRQ_XPORT_EVENT:
netif_carrier_off(netdev);
adapter->crq.active = false;
- if (adapter->resetting)
+ if (test_bit(0, &adapter->resetting))
adapter->force_reset_recovery = true;
if (gen_crq->cmd == IBMVNIC_PARTITION_MIGRATED) {
dev_info(dev, "Migrated, re-enabling adapter\n");
return -1;
}
- if (adapter->resetting && !adapter->wait_for_reset &&
+ if (test_bit(0, &adapter->resetting) && !adapter->wait_for_reset &&
adapter->reset_reason != VNIC_RESET_MOBILITY) {
if (adapter->req_rx_queues != old_num_rx_queues ||
adapter->req_tx_queues != old_num_tx_queues) {
spin_lock_init(&adapter->stats_lock);
INIT_WORK(&adapter->ibmvnic_reset, __ibmvnic_reset);
+ INIT_DELAYED_WORK(&adapter->ibmvnic_delayed_reset,
+ __ibmvnic_delayed_reset);
INIT_LIST_HEAD(&adapter->rwi_list);
spin_lock_init(&adapter->rwi_lock);
init_completion(&adapter->init_done);
- adapter->resetting = false;
+ clear_bit(0, &adapter->resetting);
do {
rc = init_crq_queue(adapter);
#define IBMVNIC_INVALID_MAP -1
#define IBMVNIC_STATS_TIMEOUT 1
#define IBMVNIC_INIT_FAILED 2
+#define IBMVNIC_OPEN_FAILED 3
/* basic structures plus 100 2k buffers */
#define IBMVNIC_IO_ENTITLEMENT_DEFAULT 610305
#define IBMVNIC_MAX_LTB_SIZE ((1 << (MAX_ORDER - 1)) * PAGE_SIZE)
#define IBMVNIC_BUFFER_HLEN 500
+#define IBMVNIC_RESET_DELAY 100
+
static const char ibmvnic_priv_flags[][ETH_GSTRING_LEN] = {
#define IBMVNIC_USE_SERVER_MAXES 0x1
"use-server-maxes"
spinlock_t rwi_lock;
struct list_head rwi_list;
struct work_struct ibmvnic_reset;
- bool resetting;
+ struct delayed_work ibmvnic_delayed_reset;
+ unsigned long resetting;
bool napi_enabled, from_passive_init;
bool failover_pending;
skb_put(skb, len);
if (dev->features & NETIF_F_RXCSUM) {
- skb->csum = csum;
+ skb->csum = le16_to_cpu(csum);
skb->ip_summed = CHECKSUM_COMPLETE;
}
bool
config MLX5_FPGA
- bool "Mellanox Technologies Innova support"
- depends on MLX5_CORE
+ bool "Mellanox Technologies Innova support"
+ depends on MLX5_CORE
select MLX5_ACCEL
- ---help---
- Build support for the Innova family of network cards by Mellanox
- Technologies. Innova network cards are comprised of a ConnectX chip
- and an FPGA chip on one board. If you select this option, the
- mlx5_core driver will include the Innova FPGA core and allow building
- sandbox-specific client drivers.
+ ---help---
+ Build support for the Innova family of network cards by Mellanox
+ Technologies. Innova network cards are comprised of a ConnectX chip
+ and an FPGA chip on one board. If you select this option, the
+ mlx5_core driver will include the Innova FPGA core and allow building
+ sandbox-specific client drivers.
config MLX5_CORE_EN
bool "Mellanox 5th generation network adapters (ConnectX series) Ethernet support"
API.
config MLX5_MPFS
- bool "Mellanox Technologies MLX5 MPFS support"
- depends on MLX5_CORE_EN
+ bool "Mellanox Technologies MLX5 MPFS support"
+ depends on MLX5_CORE_EN
default y
- ---help---
+ ---help---
Mellanox Technologies Ethernet Multi-Physical Function Switch (MPFS)
- support in ConnectX NIC. MPFs is required for when multi-PF configuration
- is enabled to allow passing user configured unicast MAC addresses to the
- requesting PF.
+ support in ConnectX NIC. MPFs is required for when multi-PF configuration
+ is enabled to allow passing user configured unicast MAC addresses to the
+ requesting PF.
config MLX5_ESWITCH
bool "Mellanox Technologies MLX5 SRIOV E-Switch support"
default y
---help---
Mellanox Technologies Ethernet SRIOV E-Switch support in ConnectX NIC.
- E-Switch provides internal SRIOV packet steering and switching for the
- enabled VFs and PF in two available modes:
- Legacy SRIOV mode (L2 mac vlan steering based).
- Switchdev mode (eswitch offloads).
+ E-Switch provides internal SRIOV packet steering and switching for the
+ enabled VFs and PF in two available modes:
+ Legacy SRIOV mode (L2 mac vlan steering based).
+ Switchdev mode (eswitch offloads).
config MLX5_CORE_EN_DCB
bool "Data Center Bridging (DCB) Support"
struct mlx5_flow_table *ft,
struct ethtool_rx_flow_spec *fs)
{
+ struct mlx5_flow_act flow_act = { .flags = FLOW_ACT_NO_APPEND };
struct mlx5_flow_destination *dst = NULL;
- struct mlx5_flow_act flow_act = {0};
- struct mlx5_flow_spec *spec;
struct mlx5_flow_handle *rule;
+ struct mlx5_flow_spec *spec;
int err = 0;
spec = kvzalloc(sizeof(*spec), GFP_KERNEL);
return err;
}
- if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_ENC_IPV4_ADDRS)) {
- struct flow_match_ipv4_addrs match;
+ if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_ENC_CONTROL)) {
+ struct flow_match_control match;
+ u16 addr_type;
- flow_rule_match_enc_ipv4_addrs(rule, &match);
- MLX5_SET(fte_match_set_lyr_2_4, headers_c,
- src_ipv4_src_ipv6.ipv4_layout.ipv4,
- ntohl(match.mask->src));
- MLX5_SET(fte_match_set_lyr_2_4, headers_v,
- src_ipv4_src_ipv6.ipv4_layout.ipv4,
- ntohl(match.key->src));
-
- MLX5_SET(fte_match_set_lyr_2_4, headers_c,
- dst_ipv4_dst_ipv6.ipv4_layout.ipv4,
- ntohl(match.mask->dst));
- MLX5_SET(fte_match_set_lyr_2_4, headers_v,
- dst_ipv4_dst_ipv6.ipv4_layout.ipv4,
- ntohl(match.key->dst));
-
- MLX5_SET_TO_ONES(fte_match_set_lyr_2_4, headers_c, ethertype);
- MLX5_SET(fte_match_set_lyr_2_4, headers_v, ethertype, ETH_P_IP);
- } else if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_ENC_IPV6_ADDRS)) {
- struct flow_match_ipv6_addrs match;
+ flow_rule_match_enc_control(rule, &match);
+ addr_type = match.key->addr_type;
- flow_rule_match_enc_ipv6_addrs(rule, &match);
- memcpy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, headers_c,
- src_ipv4_src_ipv6.ipv6_layout.ipv6),
- &match.mask->src, MLX5_FLD_SZ_BYTES(ipv6_layout, ipv6));
- memcpy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, headers_v,
- src_ipv4_src_ipv6.ipv6_layout.ipv6),
- &match.key->src, MLX5_FLD_SZ_BYTES(ipv6_layout, ipv6));
+ /* For tunnel addr_type used same key id`s as for non-tunnel */
+ if (addr_type == FLOW_DISSECTOR_KEY_IPV4_ADDRS) {
+ struct flow_match_ipv4_addrs match;
- memcpy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, headers_c,
- dst_ipv4_dst_ipv6.ipv6_layout.ipv6),
- &match.mask->dst, MLX5_FLD_SZ_BYTES(ipv6_layout, ipv6));
- memcpy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, headers_v,
- dst_ipv4_dst_ipv6.ipv6_layout.ipv6),
- &match.key->dst, MLX5_FLD_SZ_BYTES(ipv6_layout, ipv6));
+ flow_rule_match_enc_ipv4_addrs(rule, &match);
+ MLX5_SET(fte_match_set_lyr_2_4, headers_c,
+ src_ipv4_src_ipv6.ipv4_layout.ipv4,
+ ntohl(match.mask->src));
+ MLX5_SET(fte_match_set_lyr_2_4, headers_v,
+ src_ipv4_src_ipv6.ipv4_layout.ipv4,
+ ntohl(match.key->src));
- MLX5_SET_TO_ONES(fte_match_set_lyr_2_4, headers_c, ethertype);
- MLX5_SET(fte_match_set_lyr_2_4, headers_v, ethertype, ETH_P_IPV6);
+ MLX5_SET(fte_match_set_lyr_2_4, headers_c,
+ dst_ipv4_dst_ipv6.ipv4_layout.ipv4,
+ ntohl(match.mask->dst));
+ MLX5_SET(fte_match_set_lyr_2_4, headers_v,
+ dst_ipv4_dst_ipv6.ipv4_layout.ipv4,
+ ntohl(match.key->dst));
+
+ MLX5_SET_TO_ONES(fte_match_set_lyr_2_4, headers_c,
+ ethertype);
+ MLX5_SET(fte_match_set_lyr_2_4, headers_v, ethertype,
+ ETH_P_IP);
+ } else if (addr_type == FLOW_DISSECTOR_KEY_IPV6_ADDRS) {
+ struct flow_match_ipv6_addrs match;
+
+ flow_rule_match_enc_ipv6_addrs(rule, &match);
+ memcpy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, headers_c,
+ src_ipv4_src_ipv6.ipv6_layout.ipv6),
+ &match.mask->src, MLX5_FLD_SZ_BYTES(ipv6_layout,
+ ipv6));
+ memcpy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, headers_v,
+ src_ipv4_src_ipv6.ipv6_layout.ipv6),
+ &match.key->src, MLX5_FLD_SZ_BYTES(ipv6_layout,
+ ipv6));
+
+ memcpy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, headers_c,
+ dst_ipv4_dst_ipv6.ipv6_layout.ipv6),
+ &match.mask->dst, MLX5_FLD_SZ_BYTES(ipv6_layout,
+ ipv6));
+ memcpy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, headers_v,
+ dst_ipv4_dst_ipv6.ipv6_layout.ipv6),
+ &match.key->dst, MLX5_FLD_SZ_BYTES(ipv6_layout,
+ ipv6));
+
+ MLX5_SET_TO_ONES(fte_match_set_lyr_2_4, headers_c,
+ ethertype);
+ MLX5_SET(fte_match_set_lyr_2_4, headers_v, ethertype,
+ ETH_P_IPV6);
+ }
}
if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_ENC_IP)) {
{ PCI_VDEVICE(MELLANOX, 0x101e), MLX5_PCI_DEV_IS_VF}, /* ConnectX Family mlx5Gen Virtual Function */
{ PCI_VDEVICE(MELLANOX, 0xa2d2) }, /* BlueField integrated ConnectX-5 network controller */
{ PCI_VDEVICE(MELLANOX, 0xa2d3), MLX5_PCI_DEV_IS_VF}, /* BlueField integrated ConnectX-5 network controller VF */
+ { PCI_VDEVICE(MELLANOX, 0xa2d6) }, /* BlueField-2 integrated ConnectX-6 Dx network controller */
{ 0, }
};
* that recalculates the CS and forwards to the vport.
*/
ret = mlx5dr_domain_cache_get_recalc_cs_ft_addr(dest_action->vport.dmn,
- dest_action->vport.num,
+ dest_action->vport.caps->num,
final_icm_addr);
if (ret) {
mlx5dr_err(dmn, "Failed to get FW cs recalc flow table\n");
dest_action = action;
if (rx_rule) {
/* Loopback on WIRE vport is not supported */
- if (action->vport.num == WIRE_PORT)
+ if (action->vport.caps->num == WIRE_PORT)
goto out_invalid_arg;
attr.final_icm_addr = action->vport.caps->icm_address_rx;
icm_mr->icm_start_addr = icm_mr->dm.addr;
- align_diff = icm_mr->icm_start_addr % align_base;
+ /* align_base is always a power of 2 */
+ align_diff = icm_mr->icm_start_addr & (align_base - 1);
if (align_diff)
icm_mr->used_length = align_base - align_diff;
(dmn->type == MLX5DR_DOMAIN_TYPE_FDB ||
dmn->type == MLX5DR_DOMAIN_TYPE_NIC_RX)) {
ret = mlx5dr_ste_build_src_gvmi_qpn(&sb[idx++], &mask,
- &dmn->info.caps,
- inner, rx);
+ dmn, inner, rx);
if (ret)
return ret;
}
prev_matcher = NULL;
if (next_matcher && !first)
- prev_matcher = list_entry(next_matcher->matcher_list.prev,
- struct mlx5dr_matcher,
- matcher_list);
+ prev_matcher = list_prev_entry(next_matcher, matcher_list);
else if (!first)
- prev_matcher = list_entry(tbl->matcher_list.prev,
- struct mlx5dr_matcher,
- matcher_list);
+ prev_matcher = list_last_entry(&tbl->matcher_list,
+ struct mlx5dr_matcher,
+ matcher_list);
if (dmn->type == MLX5DR_DOMAIN_TYPE_FDB ||
dmn->type == MLX5DR_DOMAIN_TYPE_NIC_RX) {
struct mlx5dr_ste *last_ste;
/* The new entry will be inserted after the last */
- last_ste = list_entry(miss_list->prev, struct mlx5dr_ste, miss_list_node);
+ last_ste = list_last_entry(miss_list, struct mlx5dr_ste, miss_list_node);
WARN_ON(!last_ste);
ste_info_last = kzalloc(sizeof(*ste_info_last), GFP_KERNEL);
struct mlx5dr_ste *prev_ste;
u64 miss_addr;
- prev_ste = list_entry(mlx5dr_ste_get_miss_list(ste)->prev, struct mlx5dr_ste,
- miss_list_node);
- if (!prev_ste) {
- WARN_ON(true);
+ prev_ste = list_prev_entry(ste, miss_list_node);
+ if (WARN_ON(!prev_ste))
return;
- }
miss_addr = mlx5dr_ste_get_miss_addr(ste->hw_ste);
mlx5dr_ste_set_miss_addr(prev_ste->hw_ste, miss_addr);
struct mlx5dr_ste_htbl *stats_tbl;
LIST_HEAD(send_ste_list);
- first_ste = list_entry(mlx5dr_ste_get_miss_list(ste)->next,
- struct mlx5dr_ste, miss_list_node);
+ first_ste = list_first_entry(mlx5dr_ste_get_miss_list(ste),
+ struct mlx5dr_ste, miss_list_node);
stats_tbl = first_ste->htbl;
/* Two options:
if (last_ste == first_ste)
next_ste = NULL;
else
- next_ste = list_entry(ste->miss_list_node.next,
- struct mlx5dr_ste, miss_list_node);
+ next_ste = list_next_entry(ste, miss_list_node);
if (!next_ste) {
/* One and only entry in the list */
spec->source_sqn = MLX5_GET(fte_match_set_misc, mask, source_sqn);
spec->source_port = MLX5_GET(fte_match_set_misc, mask, source_port);
+ spec->source_eswitch_owner_vhca_id = MLX5_GET(fte_match_set_misc, mask,
+ source_eswitch_owner_vhca_id);
spec->outer_second_prio = MLX5_GET(fte_match_set_misc, mask, outer_second_prio);
spec->outer_second_cfi = MLX5_GET(fte_match_set_misc, mask, outer_second_cfi);
{
struct mlx5dr_match_misc *misc_mask = &value->misc;
- if (misc_mask->source_port != 0xffff)
+ /* Partial misc source_port is not supported */
+ if (misc_mask->source_port && misc_mask->source_port != 0xffff)
+ return -EINVAL;
+
+ /* Partial misc source_eswitch_owner_vhca_id is not supported */
+ if (misc_mask->source_eswitch_owner_vhca_id &&
+ misc_mask->source_eswitch_owner_vhca_id != 0xffff)
return -EINVAL;
DR_STE_SET_MASK(src_gvmi_qp, bit_mask, source_gvmi, misc_mask, source_port);
DR_STE_SET_MASK(src_gvmi_qp, bit_mask, source_qp, misc_mask, source_sqn);
+ misc_mask->source_eswitch_owner_vhca_id = 0;
return 0;
}
struct dr_hw_ste_format *hw_ste = (struct dr_hw_ste_format *)hw_ste_p;
struct mlx5dr_match_misc *misc = &value->misc;
struct mlx5dr_cmd_vport_cap *vport_cap;
+ struct mlx5dr_domain *dmn = sb->dmn;
+ struct mlx5dr_cmd_caps *caps;
u8 *tag = hw_ste->tag;
DR_STE_SET_TAG(src_gvmi_qp, tag, source_qp, misc, source_sqn);
- vport_cap = mlx5dr_get_vport_cap(sb->caps, misc->source_port);
+ if (sb->vhca_id_valid) {
+ /* Find port GVMI based on the eswitch_owner_vhca_id */
+ if (misc->source_eswitch_owner_vhca_id == dmn->info.caps.gvmi)
+ caps = &dmn->info.caps;
+ else if (dmn->peer_dmn && (misc->source_eswitch_owner_vhca_id ==
+ dmn->peer_dmn->info.caps.gvmi))
+ caps = &dmn->peer_dmn->info.caps;
+ else
+ return -EINVAL;
+ } else {
+ caps = &dmn->info.caps;
+ }
+
+ vport_cap = mlx5dr_get_vport_cap(caps, misc->source_port);
if (!vport_cap)
return -EINVAL;
if (vport_cap->vport_gvmi)
MLX5_SET(ste_src_gvmi_qp, tag, source_gvmi, vport_cap->vport_gvmi);
+ misc->source_eswitch_owner_vhca_id = 0;
misc->source_port = 0;
return 0;
int mlx5dr_ste_build_src_gvmi_qpn(struct mlx5dr_ste_build *sb,
struct mlx5dr_match_param *mask,
- struct mlx5dr_cmd_caps *caps,
+ struct mlx5dr_domain *dmn,
bool inner, bool rx)
{
int ret;
+ /* Set vhca_id_valid before we reset source_eswitch_owner_vhca_id */
+ sb->vhca_id_valid = mask->misc.source_eswitch_owner_vhca_id;
+
ret = dr_ste_build_src_gvmi_qpn_bit_mask(mask, sb->bit_mask);
if (ret)
return ret;
sb->rx = rx;
- sb->caps = caps;
+ sb->dmn = dmn;
sb->inner = inner;
sb->lu_type = MLX5DR_STE_LU_TYPE_SRC_GVMI_AND_QP;
sb->byte_mask = dr_ste_conv_bit_to_byte_mask(sb->bit_mask);
struct mlx5dr_ste_build {
u8 inner:1;
u8 rx:1;
+ u8 vhca_id_valid:1;
+ struct mlx5dr_domain *dmn;
struct mlx5dr_cmd_caps *caps;
u8 lu_type;
u16 byte_mask;
bool inner, bool rx);
int mlx5dr_ste_build_src_gvmi_qpn(struct mlx5dr_ste_build *sb,
struct mlx5dr_match_param *mask,
- struct mlx5dr_cmd_caps *caps,
+ struct mlx5dr_domain *dmn,
bool inner, bool rx);
void mlx5dr_ste_build_empty_always_hit(struct mlx5dr_ste_build *sb, bool rx);
u32 gre_c_present:1;
/* Source port.;0xffff determines wire port */
u32 source_port:16;
- u32 reserved_auto2:16;
+ u32 source_eswitch_owner_vhca_id:16;
/* VLAN ID of first VLAN tag the inner header of the incoming packet.
* Valid only when inner_second_cvlan_tag ==1 or inner_second_svlan_tag ==1
*/
struct {
struct mlx5dr_domain *dmn;
struct mlx5dr_cmd_vport_cap *caps;
- u32 num;
} vport;
struct {
u32 vlan_hdr; /* tpid_pcp_dei_vid */
goto err_port_qdiscs_init;
}
+ err = mlxsw_sp_port_vlan_set(mlxsw_sp_port, 0, VLAN_N_VID - 1, false,
+ false);
+ if (err) {
+ dev_err(mlxsw_sp->bus_info->dev, "Port %d: Failed to clear VLAN filter\n",
+ mlxsw_sp_port->local_port);
+ goto err_port_vlan_clear;
+ }
+
err = mlxsw_sp_port_nve_init(mlxsw_sp_port);
if (err) {
dev_err(mlxsw_sp->bus_info->dev, "Port %d: Failed to initialize NVE\n",
err_port_pvid_set:
mlxsw_sp_port_nve_fini(mlxsw_sp_port);
err_port_nve_init:
+err_port_vlan_clear:
mlxsw_sp_tc_qdisc_fini(mlxsw_sp_port);
err_port_qdiscs_init:
mlxsw_sp_port_fids_fini(mlxsw_sp_port);
struct netlink_ext_ack *extack)
{
const struct flow_action_entry *act;
+ int mirror_act_count = 0;
int err, i;
if (!flow_action_has_entries(flow_action))
case FLOW_ACTION_MIRRED: {
struct net_device *out_dev = act->dev;
+ if (mirror_act_count++) {
+ NL_SET_ERR_MSG_MOD(extack, "Multiple mirror actions per rule are not supported");
+ return -EOPNOTSUPP;
+ }
+
err = mlxsw_sp_acl_rulei_act_mirror(mlxsw_sp, rulei,
block, out_dev,
extack);
continue;
phy = of_phy_find_device(phy_node);
+ of_node_put(phy_node);
if (!phy)
continue;
err = ocelot_probe_port(ocelot, port, regs, phy);
if (err) {
of_node_put(portnp);
- return err;
+ goto out_put_ports;
}
phy_mode = of_get_phy_mode(portnp);
"invalid phy mode for port%d, (Q)SGMII only\n",
port);
of_node_put(portnp);
- return -EINVAL;
+ err = -EINVAL;
+ goto out_put_ports;
}
serdes = devm_of_phy_get(ocelot->dev, portnp, NULL);
"missing SerDes phys for port%d\n",
port);
- goto err_probe_ports;
+ of_node_put(portnp);
+ goto out_put_ports;
}
ocelot->ports[port]->serdes = serdes;
dev_info(&pdev->dev, "Ocelot switch probed\n");
- return 0;
-
-err_probe_ports:
+out_put_ports:
+ of_node_put(ports);
return err;
}
u8 mask, val;
int err;
- if (!nfp_abm_u32_check_knode(alink->abm, knode, proto, extack))
+ if (!nfp_abm_u32_check_knode(alink->abm, knode, proto, extack)) {
+ err = -EOPNOTSUPP;
goto err_delete;
+ }
tos_off = proto == htons(ETH_P_IP) ? 16 : 20;
if ((iter->val & cmask) == (val & cmask) &&
iter->band != knode->res->classid) {
NL_SET_ERR_MSG_MOD(extack, "conflict with already offloaded filter");
+ err = -EOPNOTSUPP;
goto err_delete;
}
}
if (!match) {
match = kzalloc(sizeof(*match), GFP_KERNEL);
- if (!match)
- return -ENOMEM;
+ if (!match) {
+ err = -ENOMEM;
+ goto err_delete;
+ }
+
list_add(&match->list, &alink->dscp_map);
}
match->handle = knode->handle;
err_delete:
nfp_abm_u32_knode_delete(alink, knode);
- return -EOPNOTSUPP;
+ return err;
}
static int nfp_abm_setup_tc_block_cb(enum tc_setup_type type,
repr_priv = kzalloc(sizeof(*repr_priv), GFP_KERNEL);
if (!repr_priv) {
err = -ENOMEM;
+ nfp_repr_free(repr);
goto err_reprs_clean;
}
port = nfp_port_alloc(app, port_type, repr);
if (IS_ERR(port)) {
err = PTR_ERR(port);
+ kfree(repr_priv);
nfp_repr_free(repr);
goto err_reprs_clean;
}
err = nfp_repr_init(app, repr,
port_id, port, priv->nn->dp.netdev);
if (err) {
+ kfree(repr_priv);
nfp_port_free(port);
nfp_repr_free(repr);
goto err_reprs_clean;
repr_priv = kzalloc(sizeof(*repr_priv), GFP_KERNEL);
if (!repr_priv) {
err = -ENOMEM;
+ nfp_repr_free(repr);
goto err_reprs_clean;
}
port = nfp_port_alloc(app, NFP_PORT_PHYS_PORT, repr);
if (IS_ERR(port)) {
err = PTR_ERR(port);
+ kfree(repr_priv);
nfp_repr_free(repr);
goto err_reprs_clean;
}
err = nfp_port_init_phy_port(app->pf, app, port, i);
if (err) {
+ kfree(repr_priv);
nfp_port_free(port);
nfp_repr_free(repr);
goto err_reprs_clean;
err = nfp_repr_init(app, repr,
cmsg_port_id, port, priv->nn->dp.netdev);
if (err) {
+ kfree(repr_priv);
nfp_port_free(port);
nfp_repr_free(repr);
goto err_reprs_clean;
+++ /dev/null
-// SPDX-License-Identifier: GPL-2.0-only
-/*
- * drivers/net/ethernet/netx-eth.c
- *
- * Copyright (c) 2005 Sascha Hauer <s.hauer@pengutronix.de>, Pengutronix
- */
-
-#include <linux/init.h>
-#include <linux/interrupt.h>
-#include <linux/module.h>
-#include <linux/kernel.h>
-#include <linux/delay.h>
-
-#include <linux/netdevice.h>
-#include <linux/platform_device.h>
-#include <linux/etherdevice.h>
-#include <linux/skbuff.h>
-#include <linux/mii.h>
-
-#include <asm/io.h>
-#include <mach/hardware.h>
-#include <mach/netx-regs.h>
-#include <mach/pfifo.h>
-#include <mach/xc.h>
-#include <linux/platform_data/eth-netx.h>
-
-/* XC Fifo Offsets */
-#define EMPTY_PTR_FIFO(xcno) (0 + ((xcno) << 3)) /* Index of the empty pointer FIFO */
-#define IND_FIFO_PORT_HI(xcno) (1 + ((xcno) << 3)) /* Index of the FIFO where received */
- /* Data packages are indicated by XC */
-#define IND_FIFO_PORT_LO(xcno) (2 + ((xcno) << 3)) /* Index of the FIFO where received */
- /* Data packages are indicated by XC */
-#define REQ_FIFO_PORT_HI(xcno) (3 + ((xcno) << 3)) /* Index of the FIFO where Data packages */
- /* have to be indicated by ARM which */
- /* shall be sent */
-#define REQ_FIFO_PORT_LO(xcno) (4 + ((xcno) << 3)) /* Index of the FIFO where Data packages */
- /* have to be indicated by ARM which shall */
- /* be sent */
-#define CON_FIFO_PORT_HI(xcno) (5 + ((xcno) << 3)) /* Index of the FIFO where sent Data packages */
- /* are confirmed */
-#define CON_FIFO_PORT_LO(xcno) (6 + ((xcno) << 3)) /* Index of the FIFO where sent Data */
- /* packages are confirmed */
-#define PFIFO_MASK(xcno) (0x7f << (xcno*8))
-
-#define FIFO_PTR_FRAMELEN_SHIFT 0
-#define FIFO_PTR_FRAMELEN_MASK (0x7ff << 0)
-#define FIFO_PTR_FRAMELEN(len) (((len) << 0) & FIFO_PTR_FRAMELEN_MASK)
-#define FIFO_PTR_TIMETRIG (1<<11)
-#define FIFO_PTR_MULTI_REQ
-#define FIFO_PTR_ORIGIN (1<<14)
-#define FIFO_PTR_VLAN (1<<15)
-#define FIFO_PTR_FRAMENO_SHIFT 16
-#define FIFO_PTR_FRAMENO_MASK (0x3f << 16)
-#define FIFO_PTR_FRAMENO(no) (((no) << 16) & FIFO_PTR_FRAMENO_MASK)
-#define FIFO_PTR_SEGMENT_SHIFT 22
-#define FIFO_PTR_SEGMENT_MASK (0xf << 22)
-#define FIFO_PTR_SEGMENT(seg) (((seg) & 0xf) << 22)
-#define FIFO_PTR_ERROR_SHIFT 28
-#define FIFO_PTR_ERROR_MASK (0xf << 28)
-
-#define ISR_LINK_STATUS_CHANGE (1<<4)
-#define ISR_IND_LO (1<<3)
-#define ISR_CON_LO (1<<2)
-#define ISR_IND_HI (1<<1)
-#define ISR_CON_HI (1<<0)
-
-#define ETH_MAC_LOCAL_CONFIG 0x1560
-#define ETH_MAC_4321 0x1564
-#define ETH_MAC_65 0x1568
-
-#define MAC_TRAFFIC_CLASS_ARRANGEMENT_SHIFT 16
-#define MAC_TRAFFIC_CLASS_ARRANGEMENT_MASK (0xf<<MAC_TRAFFIC_CLASS_ARRANGEMENT_SHIFT)
-#define MAC_TRAFFIC_CLASS_ARRANGEMENT(x) (((x)<<MAC_TRAFFIC_CLASS_ARRANGEMENT_SHIFT) & MAC_TRAFFIC_CLASS_ARRANGEMENT_MASK)
-#define LOCAL_CONFIG_LINK_STATUS_IRQ_EN (1<<24)
-#define LOCAL_CONFIG_CON_LO_IRQ_EN (1<<23)
-#define LOCAL_CONFIG_CON_HI_IRQ_EN (1<<22)
-#define LOCAL_CONFIG_IND_LO_IRQ_EN (1<<21)
-#define LOCAL_CONFIG_IND_HI_IRQ_EN (1<<20)
-
-#define CARDNAME "netx-eth"
-
-/* LSB must be zero */
-#define INTERNAL_PHY_ADR 0x1c
-
-struct netx_eth_priv {
- void __iomem *sram_base, *xpec_base, *xmac_base;
- int id;
- struct mii_if_info mii;
- u32 msg_enable;
- struct xc *xc;
- spinlock_t lock;
-};
-
-static void netx_eth_set_multicast_list(struct net_device *ndev)
-{
- /* implement me */
-}
-
-static int
-netx_eth_hard_start_xmit(struct sk_buff *skb, struct net_device *ndev)
-{
- struct netx_eth_priv *priv = netdev_priv(ndev);
- unsigned char *buf = skb->data;
- unsigned int len = skb->len;
-
- spin_lock_irq(&priv->lock);
- memcpy_toio(priv->sram_base + 1560, (void *)buf, len);
- if (len < 60) {
- memset_io(priv->sram_base + 1560 + len, 0, 60 - len);
- len = 60;
- }
-
- pfifo_push(REQ_FIFO_PORT_LO(priv->id),
- FIFO_PTR_SEGMENT(priv->id) |
- FIFO_PTR_FRAMENO(1) |
- FIFO_PTR_FRAMELEN(len));
-
- ndev->stats.tx_packets++;
- ndev->stats.tx_bytes += skb->len;
-
- netif_stop_queue(ndev);
- spin_unlock_irq(&priv->lock);
- dev_kfree_skb(skb);
-
- return NETDEV_TX_OK;
-}
-
-static void netx_eth_receive(struct net_device *ndev)
-{
- struct netx_eth_priv *priv = netdev_priv(ndev);
- unsigned int val, frameno, seg, len;
- unsigned char *data;
- struct sk_buff *skb;
-
- val = pfifo_pop(IND_FIFO_PORT_LO(priv->id));
-
- frameno = (val & FIFO_PTR_FRAMENO_MASK) >> FIFO_PTR_FRAMENO_SHIFT;
- seg = (val & FIFO_PTR_SEGMENT_MASK) >> FIFO_PTR_SEGMENT_SHIFT;
- len = (val & FIFO_PTR_FRAMELEN_MASK) >> FIFO_PTR_FRAMELEN_SHIFT;
-
- skb = netdev_alloc_skb(ndev, len);
- if (unlikely(skb == NULL)) {
- ndev->stats.rx_dropped++;
- return;
- }
-
- data = skb_put(skb, len);
-
- memcpy_fromio(data, priv->sram_base + frameno * 1560, len);
-
- pfifo_push(EMPTY_PTR_FIFO(priv->id),
- FIFO_PTR_SEGMENT(seg) | FIFO_PTR_FRAMENO(frameno));
-
- skb->protocol = eth_type_trans(skb, ndev);
- netif_rx(skb);
- ndev->stats.rx_packets++;
- ndev->stats.rx_bytes += len;
-}
-
-static irqreturn_t
-netx_eth_interrupt(int irq, void *dev_id)
-{
- struct net_device *ndev = dev_id;
- struct netx_eth_priv *priv = netdev_priv(ndev);
- int status;
- unsigned long flags;
-
- spin_lock_irqsave(&priv->lock, flags);
-
- status = readl(NETX_PFIFO_XPEC_ISR(priv->id));
- while (status) {
- int fill_level;
- writel(status, NETX_PFIFO_XPEC_ISR(priv->id));
-
- if ((status & ISR_CON_HI) || (status & ISR_IND_HI))
- printk("%s: unexpected status: 0x%08x\n",
- __func__, status);
-
- fill_level =
- readl(NETX_PFIFO_FILL_LEVEL(IND_FIFO_PORT_LO(priv->id)));
- while (fill_level--)
- netx_eth_receive(ndev);
-
- if (status & ISR_CON_LO)
- netif_wake_queue(ndev);
-
- if (status & ISR_LINK_STATUS_CHANGE)
- mii_check_media(&priv->mii, netif_msg_link(priv), 1);
-
- status = readl(NETX_PFIFO_XPEC_ISR(priv->id));
- }
- spin_unlock_irqrestore(&priv->lock, flags);
- return IRQ_HANDLED;
-}
-
-static int netx_eth_open(struct net_device *ndev)
-{
- struct netx_eth_priv *priv = netdev_priv(ndev);
-
- if (request_irq
- (ndev->irq, netx_eth_interrupt, IRQF_SHARED, ndev->name, ndev))
- return -EAGAIN;
-
- writel(ndev->dev_addr[0] |
- ndev->dev_addr[1]<<8 |
- ndev->dev_addr[2]<<16 |
- ndev->dev_addr[3]<<24,
- priv->xpec_base + NETX_XPEC_RAM_START_OFS + ETH_MAC_4321);
- writel(ndev->dev_addr[4] |
- ndev->dev_addr[5]<<8,
- priv->xpec_base + NETX_XPEC_RAM_START_OFS + ETH_MAC_65);
-
- writel(LOCAL_CONFIG_LINK_STATUS_IRQ_EN |
- LOCAL_CONFIG_CON_LO_IRQ_EN |
- LOCAL_CONFIG_CON_HI_IRQ_EN |
- LOCAL_CONFIG_IND_LO_IRQ_EN |
- LOCAL_CONFIG_IND_HI_IRQ_EN,
- priv->xpec_base + NETX_XPEC_RAM_START_OFS +
- ETH_MAC_LOCAL_CONFIG);
-
- mii_check_media(&priv->mii, netif_msg_link(priv), 1);
- netif_start_queue(ndev);
-
- return 0;
-}
-
-static int netx_eth_close(struct net_device *ndev)
-{
- struct netx_eth_priv *priv = netdev_priv(ndev);
-
- netif_stop_queue(ndev);
-
- writel(0,
- priv->xpec_base + NETX_XPEC_RAM_START_OFS + ETH_MAC_LOCAL_CONFIG);
-
- free_irq(ndev->irq, ndev);
-
- return 0;
-}
-
-static void netx_eth_timeout(struct net_device *ndev)
-{
- struct netx_eth_priv *priv = netdev_priv(ndev);
- int i;
-
- printk(KERN_ERR "%s: transmit timed out, resetting\n", ndev->name);
-
- spin_lock_irq(&priv->lock);
-
- xc_reset(priv->xc);
- xc_start(priv->xc);
-
- for (i=2; i<=18; i++)
- pfifo_push(EMPTY_PTR_FIFO(priv->id),
- FIFO_PTR_FRAMENO(i) | FIFO_PTR_SEGMENT(priv->id));
-
- spin_unlock_irq(&priv->lock);
-
- netif_wake_queue(ndev);
-}
-
-static int
-netx_eth_phy_read(struct net_device *ndev, int phy_id, int reg)
-{
- unsigned int val;
-
- val = MIIMU_SNRDY | MIIMU_PREAMBLE | MIIMU_PHYADDR(phy_id) |
- MIIMU_REGADDR(reg) | MIIMU_PHY_NRES;
-
- writel(val, NETX_MIIMU);
- while (readl(NETX_MIIMU) & MIIMU_SNRDY);
-
- return readl(NETX_MIIMU) >> 16;
-
-}
-
-static void
-netx_eth_phy_write(struct net_device *ndev, int phy_id, int reg, int value)
-{
- unsigned int val;
-
- val = MIIMU_SNRDY | MIIMU_PREAMBLE | MIIMU_PHYADDR(phy_id) |
- MIIMU_REGADDR(reg) | MIIMU_PHY_NRES | MIIMU_OPMODE_WRITE |
- MIIMU_DATA(value);
-
- writel(val, NETX_MIIMU);
- while (readl(NETX_MIIMU) & MIIMU_SNRDY);
-}
-
-static const struct net_device_ops netx_eth_netdev_ops = {
- .ndo_open = netx_eth_open,
- .ndo_stop = netx_eth_close,
- .ndo_start_xmit = netx_eth_hard_start_xmit,
- .ndo_tx_timeout = netx_eth_timeout,
- .ndo_set_rx_mode = netx_eth_set_multicast_list,
- .ndo_validate_addr = eth_validate_addr,
- .ndo_set_mac_address = eth_mac_addr,
-};
-
-static int netx_eth_enable(struct net_device *ndev)
-{
- struct netx_eth_priv *priv = netdev_priv(ndev);
- unsigned int mac4321, mac65;
- int running, i, ret;
- bool inv_mac_addr = false;
-
- ndev->netdev_ops = &netx_eth_netdev_ops;
- ndev->watchdog_timeo = msecs_to_jiffies(5000);
-
- priv->msg_enable = NETIF_MSG_LINK;
- priv->mii.phy_id_mask = 0x1f;
- priv->mii.reg_num_mask = 0x1f;
- priv->mii.force_media = 0;
- priv->mii.full_duplex = 0;
- priv->mii.dev = ndev;
- priv->mii.mdio_read = netx_eth_phy_read;
- priv->mii.mdio_write = netx_eth_phy_write;
- priv->mii.phy_id = INTERNAL_PHY_ADR + priv->id;
-
- running = xc_running(priv->xc);
- xc_stop(priv->xc);
-
- /* if the xc engine is already running, assume the bootloader has
- * loaded the firmware for us
- */
- if (running) {
- /* get Node Address from hardware */
- mac4321 = readl(priv->xpec_base +
- NETX_XPEC_RAM_START_OFS + ETH_MAC_4321);
- mac65 = readl(priv->xpec_base +
- NETX_XPEC_RAM_START_OFS + ETH_MAC_65);
-
- ndev->dev_addr[0] = mac4321 & 0xff;
- ndev->dev_addr[1] = (mac4321 >> 8) & 0xff;
- ndev->dev_addr[2] = (mac4321 >> 16) & 0xff;
- ndev->dev_addr[3] = (mac4321 >> 24) & 0xff;
- ndev->dev_addr[4] = mac65 & 0xff;
- ndev->dev_addr[5] = (mac65 >> 8) & 0xff;
- } else {
- if (xc_request_firmware(priv->xc)) {
- printk(CARDNAME ": requesting firmware failed\n");
- return -ENODEV;
- }
- }
-
- xc_reset(priv->xc);
- xc_start(priv->xc);
-
- if (!is_valid_ether_addr(ndev->dev_addr))
- inv_mac_addr = true;
-
- for (i=2; i<=18; i++)
- pfifo_push(EMPTY_PTR_FIFO(priv->id),
- FIFO_PTR_FRAMENO(i) | FIFO_PTR_SEGMENT(priv->id));
-
- ret = register_netdev(ndev);
- if (inv_mac_addr)
- printk("%s: Invalid ethernet MAC address. Please set using ip\n",
- ndev->name);
-
- return ret;
-}
-
-static int netx_eth_drv_probe(struct platform_device *pdev)
-{
- struct netx_eth_priv *priv;
- struct net_device *ndev;
- struct netxeth_platform_data *pdata;
- int ret;
-
- ndev = alloc_etherdev(sizeof (struct netx_eth_priv));
- if (!ndev) {
- ret = -ENOMEM;
- goto exit;
- }
- SET_NETDEV_DEV(ndev, &pdev->dev);
-
- platform_set_drvdata(pdev, ndev);
-
- priv = netdev_priv(ndev);
-
- pdata = dev_get_platdata(&pdev->dev);
- priv->xc = request_xc(pdata->xcno, &pdev->dev);
- if (!priv->xc) {
- dev_err(&pdev->dev, "unable to request xc engine\n");
- ret = -ENODEV;
- goto exit_free_netdev;
- }
-
- ndev->irq = priv->xc->irq;
- priv->id = pdev->id;
- priv->xpec_base = priv->xc->xpec_base;
- priv->xmac_base = priv->xc->xmac_base;
- priv->sram_base = priv->xc->sram_base;
-
- spin_lock_init(&priv->lock);
-
- ret = pfifo_request(PFIFO_MASK(priv->id));
- if (ret) {
- printk("unable to request PFIFO\n");
- goto exit_free_xc;
- }
-
- ret = netx_eth_enable(ndev);
- if (ret)
- goto exit_free_pfifo;
-
- return 0;
-exit_free_pfifo:
- pfifo_free(PFIFO_MASK(priv->id));
-exit_free_xc:
- free_xc(priv->xc);
-exit_free_netdev:
- free_netdev(ndev);
-exit:
- return ret;
-}
-
-static int netx_eth_drv_remove(struct platform_device *pdev)
-{
- struct net_device *ndev = platform_get_drvdata(pdev);
- struct netx_eth_priv *priv = netdev_priv(ndev);
-
- unregister_netdev(ndev);
- xc_stop(priv->xc);
- free_xc(priv->xc);
- free_netdev(ndev);
- pfifo_free(PFIFO_MASK(priv->id));
-
- return 0;
-}
-
-static int netx_eth_drv_suspend(struct platform_device *pdev, pm_message_t state)
-{
- dev_err(&pdev->dev, "suspend not implemented\n");
- return 0;
-}
-
-static int netx_eth_drv_resume(struct platform_device *pdev)
-{
- dev_err(&pdev->dev, "resume not implemented\n");
- return 0;
-}
-
-static struct platform_driver netx_eth_driver = {
- .probe = netx_eth_drv_probe,
- .remove = netx_eth_drv_remove,
- .suspend = netx_eth_drv_suspend,
- .resume = netx_eth_drv_resume,
- .driver = {
- .name = CARDNAME,
- },
-};
-
-static int __init netx_eth_init(void)
-{
- unsigned int phy_control, val;
-
- printk("NetX Ethernet driver\n");
-
- phy_control = PHY_CONTROL_PHY_ADDRESS(INTERNAL_PHY_ADR>>1) |
- PHY_CONTROL_PHY1_MODE(PHY_MODE_ALL) |
- PHY_CONTROL_PHY1_AUTOMDIX |
- PHY_CONTROL_PHY1_EN |
- PHY_CONTROL_PHY0_MODE(PHY_MODE_ALL) |
- PHY_CONTROL_PHY0_AUTOMDIX |
- PHY_CONTROL_PHY0_EN |
- PHY_CONTROL_CLK_XLATIN;
-
- val = readl(NETX_SYSTEM_IOC_ACCESS_KEY);
- writel(val, NETX_SYSTEM_IOC_ACCESS_KEY);
-
- writel(phy_control | PHY_CONTROL_RESET, NETX_SYSTEM_PHY_CONTROL);
- udelay(100);
-
- val = readl(NETX_SYSTEM_IOC_ACCESS_KEY);
- writel(val, NETX_SYSTEM_IOC_ACCESS_KEY);
-
- writel(phy_control, NETX_SYSTEM_PHY_CONTROL);
-
- return platform_driver_register(&netx_eth_driver);
-}
-
-static void __exit netx_eth_cleanup(void)
-{
- platform_driver_unregister(&netx_eth_driver);
-}
-
-module_init(netx_eth_init);
-module_exit(netx_eth_cleanup);
-
-MODULE_AUTHOR("Sascha Hauer, Pengutronix");
-MODULE_LICENSE("GPL");
-MODULE_ALIAS("platform:" CARDNAME);
-MODULE_FIRMWARE("xc0.bin");
-MODULE_FIRMWARE("xc1.bin");
-MODULE_FIRMWARE("xc2.bin");
}
priv->phy_mode = of_get_phy_mode(pdev->dev.of_node);
- if (priv->phy_mode < 0) {
+ if ((int)priv->phy_mode < 0) {
netdev_err(ndev, "not find \"phy-mode\" property\n");
err = -EINVAL;
goto unregister_mdio;
# SPDX-License-Identifier: GPL-2.0-only
config LPC_ENET
- tristate "NXP ethernet MAC on LPC devices"
- depends on ARCH_LPC32XX || COMPILE_TEST
- select PHYLIB
- help
+ tristate "NXP ethernet MAC on LPC devices"
+ depends on ARCH_LPC32XX || COMPILE_TEST
+ select PHYLIB
+ help
Say Y or M here if you want to use the NXP ethernet MAC included on
some NXP LPC devices. You can safely enable this option for LPC32xx
SoC. Also available as a module.
config IONIC
tristate "Pensando Ethernet IONIC Support"
depends on 64BIT && PCI
+ select NET_DEVLINK
help
This enables the support for the Pensando family of Ethernet
adapters. More specific information on this driver can be
found in
<file:Documentation/networking/device_drivers/pensando/ionic.rst>.
- To compile this driver as a module, choose M here. The module
- will be called ionic.
+ To compile this driver as a module, choose M here. The module
+ will be called ionic.
endif # NET_VENDOR_PENSANDO
void ionic_debugfs_add_ident(struct ionic *ionic)
{
debugfs_create_file("identity", 0400, ionic->dentry,
- ionic, &identity_fops) ? 0 : -EOPNOTSUPP;
+ ionic, &identity_fops);
}
void ionic_debugfs_add_sizes(struct ionic *ionic)
GFP_KERNEL);
if (!lif->rss_ind_tbl) {
+ err = -ENOMEM;
dev_err(dev, "Failed to allocate rss indirection table, aborting\n");
goto err_out_free_qcqs;
}
return NULL;
skb_reserve(skb, pad);
- memcpy(skb_put(skb, len),
- page_address(bd->data) + offset, len);
+ skb_put_data(skb, page_address(bd->data) + offset, len);
qede_reuse_page(rxq, bd);
goto out;
}
netdev_err(qdev->ndev,
"PCI mapping failed with error: %d\n",
err);
+ dev_kfree_skb_irq(skb);
ql_free_large_buffers(qdev);
return -ENOMEM;
}
void *vaddr;
u16 head, tail;
u16 xdp_xmit; /* netsec_xdp_xmit packets */
- bool is_xdp;
struct page_pool *page_pool;
struct xdp_rxq_info xdp_rxq;
spinlock_t lock; /* XDP tx queue locking */
unsigned int bytes;
int cnt = 0;
- if (dring->is_xdp)
- spin_lock(&dring->lock);
+ spin_lock(&dring->lock);
bytes = 0;
entry = dring->vaddr + DESC_SZ * tail;
entry = dring->vaddr + DESC_SZ * tail;
cnt++;
}
- if (dring->is_xdp)
- spin_unlock(&dring->lock);
+
+ spin_unlock(&dring->lock);
if (!cnt)
return false;
de->data_buf_addr_lw = lower_32_bits(desc->dma_addr);
de->buf_len_info = (tx_ctrl->tcp_seg_len << 16) | desc->len;
de->attr = attr;
- /* under spin_lock if using XDP */
- if (!dring->is_xdp)
- dma_wmb();
dring->desc[idx] = *desc;
if (desc->buf_type == TYPE_NETSEC_SKB)
u16 tso_seg_len = 0;
int filled;
- if (dring->is_xdp)
- spin_lock_bh(&dring->lock);
+ spin_lock_bh(&dring->lock);
filled = netsec_desc_used(dring);
if (netsec_check_stop_tx(priv, filled)) {
- if (dring->is_xdp)
- spin_unlock_bh(&dring->lock);
+ spin_unlock_bh(&dring->lock);
net_warn_ratelimited("%s %s Tx queue full\n",
dev_name(priv->dev), ndev->name);
return NETDEV_TX_BUSY;
tx_desc.dma_addr = dma_map_single(priv->dev, skb->data,
skb_headlen(skb), DMA_TO_DEVICE);
if (dma_mapping_error(priv->dev, tx_desc.dma_addr)) {
- if (dring->is_xdp)
- spin_unlock_bh(&dring->lock);
+ spin_unlock_bh(&dring->lock);
netif_err(priv, drv, priv->ndev,
"%s: DMA mapping failed\n", __func__);
ndev->stats.tx_dropped++;
netdev_sent_queue(priv->ndev, skb->len);
netsec_set_tx_de(priv, dring, &tx_ctrl, &tx_desc, skb);
- if (dring->is_xdp)
- spin_unlock_bh(&dring->lock);
+ spin_unlock_bh(&dring->lock);
netsec_write(priv, NETSEC_REG_NRM_TX_PKTCNT, 1); /* submit another tx */
return NETDEV_TX_OK;
static void netsec_setup_tx_dring(struct netsec_priv *priv)
{
struct netsec_desc_ring *dring = &priv->desc_ring[NETSEC_RING_TX];
- struct bpf_prog *xdp_prog = READ_ONCE(priv->xdp_prog);
int i;
for (i = 0; i < DESC_NUM; i++) {
*/
de->attr = 1U << NETSEC_TX_SHIFT_OWN_FIELD;
}
-
- if (xdp_prog)
- dring->is_xdp = true;
- else
- dring->is_xdp = false;
-
}
static int netsec_setup_rx_dring(struct netsec_priv *priv)
NETIF_MSG_LINK | NETIF_MSG_PROBE;
priv->phy_interface = device_get_phy_mode(&pdev->dev);
- if (priv->phy_interface < 0) {
+ if ((int)priv->phy_interface < 0) {
dev_err(&pdev->dev, "missing required property 'phy-mode'\n");
ret = -ENODEV;
goto free_ndev;
np = dev->of_node;
phy_mode = of_get_phy_mode(np);
- if (phy_mode < 0) {
+ if ((int)phy_mode < 0) {
dev_err(dev, "phy-mode not found\n");
return -EINVAL;
}
"socionext,syscon-phy-mode",
1, 0, &args);
if (ret) {
- netdev_err(ndev, "can't get syscon-phy-mode property\n");
+ dev_err(dev, "can't get syscon-phy-mode property\n");
goto out_free_netdev;
}
priv->regmap = syscon_node_to_regmap(args.np);
of_node_put(args.np);
if (IS_ERR(priv->regmap)) {
- netdev_err(ndev, "can't map syscon-phy-mode\n");
+ dev_err(dev, "can't map syscon-phy-mode\n");
ret = PTR_ERR(priv->regmap);
goto out_free_netdev;
}
ret = priv->data->get_pinmode(priv, phy_mode, args.args[0]);
if (ret) {
- netdev_err(ndev, "invalid phy-mode setting\n");
+ dev_err(dev, "invalid phy-mode setting\n");
goto out_free_netdev;
}
struct device *dev = &gmac->pdev->dev;
gmac->phy_mode = of_get_phy_mode(dev->of_node);
- if (gmac->phy_mode < 0) {
+ if ((int)gmac->phy_mode < 0) {
dev_err(dev, "missing phy mode property\n");
return -EINVAL;
}
dwmac->dev = &pdev->dev;
dwmac->phy_mode = of_get_phy_mode(pdev->dev.of_node);
- if (dwmac->phy_mode < 0) {
+ if ((int)dwmac->phy_mode < 0) {
dev_err(&pdev->dev, "missing phy-mode property\n");
ret = -EINVAL;
goto err_remove_config_dt;
int numhashregs = (hw->multicast_filter_bins >> 5);
int mcbitslog2 = hw->mcast_bits_log2;
unsigned int value;
+ u32 mc_filter[8];
int i;
+ memset(mc_filter, 0, sizeof(mc_filter));
+
value = readl(ioaddr + GMAC_PACKET_FILTER);
value &= ~GMAC_PACKET_FILTER_HMC;
value &= ~GMAC_PACKET_FILTER_HPF;
/* Pass all multi */
value |= GMAC_PACKET_FILTER_PM;
/* Set all the bits of the HASH tab */
- for (i = 0; i < numhashregs; i++)
- writel(0xffffffff, ioaddr + GMAC_HASH_TAB(i));
+ memset(mc_filter, 0xff, sizeof(mc_filter));
} else if (!netdev_mc_empty(dev)) {
struct netdev_hw_addr *ha;
- u32 mc_filter[8];
/* Hash filter for multicast */
value |= GMAC_PACKET_FILTER_HMC;
- memset(mc_filter, 0, sizeof(mc_filter));
netdev_for_each_mc_addr(ha, dev) {
/* The upper n bits of the calculated CRC are used to
* index the contents of the hash table. The number of
*/
mc_filter[bit_nr >> 5] |= (1 << (bit_nr & 0x1f));
}
- for (i = 0; i < numhashregs; i++)
- writel(mc_filter[i], ioaddr + GMAC_HASH_TAB(i));
}
+ for (i = 0; i < numhashregs; i++)
+ writel(mc_filter[i], ioaddr + GMAC_HASH_TAB(i));
+
value |= GMAC_PACKET_FILTER_HPF;
/* Handle multiple unicast addresses */
#define XGMAC_TSIE BIT(12)
#define XGMAC_LPIIE BIT(5)
#define XGMAC_PMTIE BIT(4)
-#define XGMAC_INT_DEFAULT_EN (XGMAC_LPIIE | XGMAC_PMTIE | XGMAC_TSIE)
+#define XGMAC_INT_DEFAULT_EN (XGMAC_LPIIE | XGMAC_PMTIE)
#define XGMAC_Qx_TX_FLOW_CTRL(x) (0x00000070 + (x) * 4)
#define XGMAC_PT GENMASK(31, 16)
#define XGMAC_PT_SHIFT 16
#define XGMAC_HWFEAT_GMIISEL BIT(1)
#define XGMAC_HW_FEATURE1 0x00000120
#define XGMAC_HWFEAT_L3L4FNUM GENMASK(30, 27)
+#define XGMAC_HWFEAT_HASHTBLSZ GENMASK(25, 24)
#define XGMAC_HWFEAT_RSSEN BIT(20)
#define XGMAC_HWFEAT_TSOEN BIT(18)
#define XGMAC_HWFEAT_SPHEN BIT(17)
dwxgmac2_set_mchash(ioaddr, mc_filter, mcbitslog2);
/* Handle multiple unicast addresses */
- if (netdev_uc_count(dev) > XGMAC_ADDR_MAX) {
+ if (netdev_uc_count(dev) > hw->unicast_filter_entries) {
value |= XGMAC_FILTER_PR;
} else {
struct netdev_hw_addr *ha;
struct stmmac_rss *cfg, u32 num_rxq)
{
void __iomem *ioaddr = hw->pcsr;
- u32 *key = (u32 *)cfg->key;
+ u32 value, *key;
int i, ret;
- u32 value;
value = readl(ioaddr + XGMAC_RSS_CTRL);
- if (!cfg->enable) {
+ if (!cfg || !cfg->enable) {
value &= ~XGMAC_RSSE;
writel(value, ioaddr + XGMAC_RSS_CTRL);
return 0;
}
- for (i = 0; i < (sizeof(cfg->key) / sizeof(u32)); i++) {
- ret = dwxgmac2_rss_write_reg(ioaddr, true, i, *key++);
+ key = (u32 *)cfg->key;
+ for (i = 0; i < (ARRAY_SIZE(cfg->key) / sizeof(u32)); i++) {
+ ret = dwxgmac2_rss_write_reg(ioaddr, true, i, key[i]);
if (ret)
return ret;
}
/* MAC HW feature 1 */
hw_cap = readl(ioaddr + XGMAC_HW_FEATURE1);
dma_cap->l3l4fnum = (hw_cap & XGMAC_HWFEAT_L3L4FNUM) >> 27;
+ dma_cap->hash_tb_sz = (hw_cap & XGMAC_HWFEAT_HASHTBLSZ) >> 24;
dma_cap->rssen = (hw_cap & XGMAC_HWFEAT_RSSEN) >> 20;
dma_cap->tsoen = (hw_cap & XGMAC_HWFEAT_TSOEN) >> 18;
dma_cap->sphen = (hw_cap & XGMAC_HWFEAT_SPHEN) >> 17;
config.rx_filter = HWTSTAMP_FILTER_PTP_V2_EVENT;
ptp_v2 = PTP_TCR_TSVER2ENA;
snap_type_sel = PTP_TCR_SNAPTYPSEL_1;
+ ts_event_en = PTP_TCR_TSEVNTENA;
ptp_over_ipv4_udp = PTP_TCR_TSIPV4ENA;
ptp_over_ipv6_udp = PTP_TCR_TSIPV6ENA;
ptp_over_ethernet = PTP_TCR_TSIPENA;
for (queue = 0; queue < rx_count; queue++) {
struct stmmac_rx_queue *rx_q = &priv->rx_queue[queue];
struct page_pool_params pp_params = { 0 };
+ unsigned int num_pages;
rx_q->queue_index = queue;
rx_q->priv_data = priv;
pp_params.flags = PP_FLAG_DMA_MAP;
pp_params.pool_size = DMA_RX_SIZE;
- pp_params.order = DIV_ROUND_UP(priv->dma_buf_sz, PAGE_SIZE);
+ num_pages = DIV_ROUND_UP(priv->dma_buf_sz, PAGE_SIZE);
+ pp_params.order = ilog2(num_pages);
pp_params.nid = dev_to_node(priv->device);
pp_params.dev = priv->device;
pp_params.dma_dir = DMA_FROM_DEVICE;
if (!ndev || !netif_running(ndev))
return 0;
- mutex_lock(&priv->lock);
+ phylink_mac_change(priv->phylink, false);
- rtnl_lock();
- phylink_stop(priv->phylink);
- rtnl_unlock();
+ mutex_lock(&priv->lock);
netif_device_detach(ndev);
stmmac_stop_all_queues(priv);
stmmac_pmt(priv, priv->hw, priv->wolopts);
priv->irq_wake = 1;
} else {
+ mutex_unlock(&priv->lock);
+ rtnl_lock();
+ phylink_stop(priv->phylink);
+ rtnl_unlock();
+ mutex_lock(&priv->lock);
+
stmmac_mac_set(priv, priv->ioaddr, false);
pinctrl_pm_select_sleep_state(priv->device);
/* Disable clock in case of PWM is off */
stmmac_start_all_queues(priv);
- rtnl_lock();
- phylink_start(priv->phylink);
- rtnl_unlock();
-
mutex_unlock(&priv->lock);
+ if (!device_may_wakeup(priv->device)) {
+ rtnl_lock();
+ phylink_start(priv->phylink);
+ rtnl_unlock();
+ }
+
+ phylink_mac_change(priv->phylink, true);
+
return 0;
}
EXPORT_SYMBOL_GPL(stmmac_resume);
unsigned int pkt_count;
int i, ret = 0;
- if (!phydev || !phydev->pause)
+ if (!phydev || (!phydev->pause && !phydev->asym_pause))
return -EOPNOTSUPP;
tpriv = kzalloc(sizeof(*tpriv), GFP_KERNEL);
return -EOPNOTSUPP;
if (!priv->dma_cap.l3l4fnum)
return -EOPNOTSUPP;
- if (priv->rss.enable) {
- struct stmmac_rss rss = { .enable = false, };
-
- stmmac_rss_configure(priv, priv->hw, &rss,
+ if (priv->rss.enable)
+ stmmac_rss_configure(priv, priv->hw, NULL,
priv->plat->rx_queues_to_use);
- }
dissector = kzalloc(sizeof(*dissector), GFP_KERNEL);
if (!dissector) {
return -EOPNOTSUPP;
if (!priv->dma_cap.l3l4fnum)
return -EOPNOTSUPP;
- if (priv->rss.enable) {
- struct stmmac_rss rss = { .enable = false, };
-
- stmmac_rss_configure(priv, priv->hw, &rss,
+ if (priv->rss.enable)
+ stmmac_rss_configure(priv, priv->hw, NULL,
priv->plat->rx_queues_to_use);
- }
dissector = kzalloc(sizeof(*dissector), GFP_KERNEL);
if (!dissector) {
struct stmmac_packet_attrs attr = { };
int size = priv->dma_buf_sz;
- /* Only XGMAC has SW support for multiple RX descs in same packet */
- if (priv->plat->has_xgmac)
- size = priv->dev->max_mtu;
-
attr.dst = priv->dev->dev_addr;
attr.max_size = size - ETH_FCS_LEN;
attr.queue_mapping = queue;
}
} else {
lp->phy_mode = of_get_phy_mode(pdev->dev.of_node);
- if (lp->phy_mode < 0) {
+ if ((int)lp->phy_mode < 0) {
ret = -EINVAL;
goto free_netdev;
}
ieee802154_unregister_hw(atusb->hw);
+ usb_put_dev(atusb->usb_dev);
+
ieee802154_free_hw(atusb->hw);
usb_set_intfdata(interface, NULL);
- usb_put_dev(atusb->usb_dev);
pr_debug("%s done\n", __func__);
}
goto error;
}
+ priv->spi->dev.platform_data = pdata;
ret = ca8210_get_platform_data(priv->spi, pdata);
if (ret) {
dev_crit(&spi_device->dev, "ca8210_get_platform_data failed\n");
goto error;
}
- priv->spi->dev.platform_data = pdata;
ret = ca8210_dev_com_init(priv);
if (ret) {
if (!skb)
return;
- memcpy(skb_put(skb, len), lp->rx_buf, len);
+ __skb_put_data(skb, lp->rx_buf, len);
ieee802154_rx_irqsafe(lp->hw, skb, lp->rx_lqi[0]);
print_hex_dump_debug("mcr20a rx: ", DUMP_PREFIX_OFFSET, 16, 1,
macsec_rxsa_put(rx_sa);
macsec_rxsc_put(rx_sc);
+ skb_orphan(skb);
ret = gro_cells_receive(&macsec->gro_cells, skb);
if (ret == NET_RX_SUCCESS)
count_rx(dev, skb->len);
Supports the Renesas PHYs uPD60620 and uPD60620A.
config ROCKCHIP_PHY
- tristate "Driver for Rockchip Ethernet PHYs"
- ---help---
- Currently supports the integrated Ethernet PHY.
+ tristate "Driver for Rockchip Ethernet PHYs"
+ ---help---
+ Currently supports the integrated Ethernet PHY.
config SMSC_PHY
tristate "SMSC PHYs"
#include <linux/of_gpio.h>
#include <linux/gpio/consumer.h>
+#define AT803X_SPECIFIC_STATUS 0x11
+#define AT803X_SS_SPEED_MASK (3 << 14)
+#define AT803X_SS_SPEED_1000 (2 << 14)
+#define AT803X_SS_SPEED_100 (1 << 14)
+#define AT803X_SS_SPEED_10 (0 << 14)
+#define AT803X_SS_DUPLEX BIT(13)
+#define AT803X_SS_SPEED_DUPLEX_RESOLVED BIT(11)
+#define AT803X_SS_MDIX BIT(6)
+
#define AT803X_INTR_ENABLE 0x12
#define AT803X_INTR_ENABLE_AUTONEG_ERR BIT(15)
#define AT803X_INTR_ENABLE_SPEED_CHANGED BIT(14)
return aneg_done;
}
+static int at803x_read_status(struct phy_device *phydev)
+{
+ int ss, err, old_link = phydev->link;
+
+ /* Update the link, but return if there was an error */
+ err = genphy_update_link(phydev);
+ if (err)
+ return err;
+
+ /* why bother the PHY if nothing can have changed */
+ if (phydev->autoneg == AUTONEG_ENABLE && old_link && phydev->link)
+ return 0;
+
+ phydev->speed = SPEED_UNKNOWN;
+ phydev->duplex = DUPLEX_UNKNOWN;
+ phydev->pause = 0;
+ phydev->asym_pause = 0;
+
+ err = genphy_read_lpa(phydev);
+ if (err < 0)
+ return err;
+
+ /* Read the AT8035 PHY-Specific Status register, which indicates the
+ * speed and duplex that the PHY is actually using, irrespective of
+ * whether we are in autoneg mode or not.
+ */
+ ss = phy_read(phydev, AT803X_SPECIFIC_STATUS);
+ if (ss < 0)
+ return ss;
+
+ if (ss & AT803X_SS_SPEED_DUPLEX_RESOLVED) {
+ switch (ss & AT803X_SS_SPEED_MASK) {
+ case AT803X_SS_SPEED_10:
+ phydev->speed = SPEED_10;
+ break;
+ case AT803X_SS_SPEED_100:
+ phydev->speed = SPEED_100;
+ break;
+ case AT803X_SS_SPEED_1000:
+ phydev->speed = SPEED_1000;
+ break;
+ }
+ if (ss & AT803X_SS_DUPLEX)
+ phydev->duplex = DUPLEX_FULL;
+ else
+ phydev->duplex = DUPLEX_HALF;
+ if (ss & AT803X_SS_MDIX)
+ phydev->mdix = ETH_TP_MDI_X;
+ else
+ phydev->mdix = ETH_TP_MDI;
+ }
+
+ if (phydev->autoneg == AUTONEG_ENABLE && phydev->autoneg_complete)
+ phy_resolve_aneg_pause(phydev);
+
+ return 0;
+}
+
static struct phy_driver at803x_driver[] = {
{
/* ATHEROS 8035 */
.suspend = at803x_suspend,
.resume = at803x_resume,
/* PHY_GBIT_FEATURES */
+ .read_status = at803x_read_status,
.ack_interrupt = at803x_ack_interrupt,
.config_intr = at803x_config_intr,
}, {
.suspend = at803x_suspend,
.resume = at803x_resume,
/* PHY_GBIT_FEATURES */
+ .read_status = at803x_read_status,
.aneg_done = at803x_aneg_done,
.ack_interrupt = &at803x_ack_interrupt,
.config_intr = &at803x_config_intr,
return;
if (mdiodev->reset_gpio)
- gpiod_set_value(mdiodev->reset_gpio, value);
+ gpiod_set_value_cansleep(mdiodev->reset_gpio, value);
if (mdiodev->reset_ctrl) {
if (value)
* Whenever the device's Asymmetric Pause capability is set to 1,
* link-up may fail after a link-up to link-down transition.
*
+ * The Errata Sheet is for ksz9031, but ksz9021 has the same issue
+ *
* Workaround:
* Do not enable the Asymmetric Pause capability bit.
*/
/* PHY_GBIT_FEATURES */
.driver_data = &ksz9021_type,
.probe = kszphy_probe,
+ .get_features = ksz9031_get_features,
.config_init = ksz9021_config_init,
.ack_interrupt = kszphy_ack_interrupt,
.config_intr = kszphy_config_intr,
static void ns_10_base_t_hdx_loopack(struct phy_device *phydev, int disable)
{
+ u16 lb_dis = BIT(1);
+
if (disable)
- ns_exp_write(phydev, 0x1c0, ns_exp_read(phydev, 0x1c0) | 1);
+ ns_exp_write(phydev, 0x1c0,
+ ns_exp_read(phydev, 0x1c0) | lb_dis);
else
ns_exp_write(phydev, 0x1c0,
- ns_exp_read(phydev, 0x1c0) & 0xfffe);
+ ns_exp_read(phydev, 0x1c0) & ~lb_dis);
pr_debug("10BASE-T HDX loopback %s\n",
- (ns_exp_read(phydev, 0x1c0) & 0x0001) ? "off" : "on");
+ (ns_exp_read(phydev, 0x1c0) & lb_dis) ? "off" : "on");
}
static int ns_config_init(struct phy_device *phydev)
phydev->eee_broken_modes = broken;
}
+void phy_resolve_aneg_pause(struct phy_device *phydev)
+{
+ if (phydev->duplex == DUPLEX_FULL) {
+ phydev->pause = linkmode_test_bit(ETHTOOL_LINK_MODE_Pause_BIT,
+ phydev->lp_advertising);
+ phydev->asym_pause = linkmode_test_bit(
+ ETHTOOL_LINK_MODE_Asym_Pause_BIT,
+ phydev->lp_advertising);
+ }
+}
+EXPORT_SYMBOL_GPL(phy_resolve_aneg_pause);
+
/**
* phy_resolve_aneg_linkmode - resolve the advertisements into phy settings
* @phydev: The phy_device struct
break;
}
- if (phydev->duplex == DUPLEX_FULL) {
- phydev->pause = linkmode_test_bit(ETHTOOL_LINK_MODE_Pause_BIT,
- phydev->lp_advertising);
- phydev->asym_pause = linkmode_test_bit(
- ETHTOOL_LINK_MODE_Asym_Pause_BIT,
- phydev->lp_advertising);
- }
+ phy_resolve_aneg_pause(phydev);
}
EXPORT_SYMBOL_GPL(phy_resolve_aneg_linkmode);
val);
change_autoneg = true;
break;
+ case MII_CTRL1000:
+ mii_ctrl1000_mod_linkmode_adv_t(phydev->advertising,
+ val);
+ change_autoneg = true;
+ break;
default:
/* do nothing */
break;
}
EXPORT_SYMBOL(genphy_update_link);
-/**
- * genphy_read_status - check the link status and update current link state
- * @phydev: target phy_device struct
- *
- * Description: Check the link, then figure out the current state
- * by comparing what we advertise with what the link partner
- * advertises. Start by checking the gigabit possibilities,
- * then move on to 10/100.
- */
-int genphy_read_status(struct phy_device *phydev)
+int genphy_read_lpa(struct phy_device *phydev)
{
- int lpa, lpagb, err, old_link = phydev->link;
-
- /* Update the link, but return if there was an error */
- err = genphy_update_link(phydev);
- if (err)
- return err;
-
- /* why bother the PHY if nothing can have changed */
- if (phydev->autoneg == AUTONEG_ENABLE && old_link && phydev->link)
- return 0;
-
- phydev->speed = SPEED_UNKNOWN;
- phydev->duplex = DUPLEX_UNKNOWN;
- phydev->pause = 0;
- phydev->asym_pause = 0;
+ int lpa, lpagb;
if (phydev->autoneg == AUTONEG_ENABLE && phydev->autoneg_complete) {
if (phydev->is_gigabit_capable) {
return lpa;
mii_lpa_mod_linkmode_lpa_t(phydev->lp_advertising, lpa);
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL(genphy_read_lpa);
+
+/**
+ * genphy_read_status - check the link status and update current link state
+ * @phydev: target phy_device struct
+ *
+ * Description: Check the link, then figure out the current state
+ * by comparing what we advertise with what the link partner
+ * advertises. Start by checking the gigabit possibilities,
+ * then move on to 10/100.
+ */
+int genphy_read_status(struct phy_device *phydev)
+{
+ int err, old_link = phydev->link;
+
+ /* Update the link, but return if there was an error */
+ err = genphy_update_link(phydev);
+ if (err)
+ return err;
+
+ /* why bother the PHY if nothing can have changed */
+ if (phydev->autoneg == AUTONEG_ENABLE && old_link && phydev->link)
+ return 0;
+
+ phydev->speed = SPEED_UNKNOWN;
+ phydev->duplex = DUPLEX_UNKNOWN;
+ phydev->pause = 0;
+ phydev->asym_pause = 0;
+
+ err = genphy_read_lpa(phydev);
+ if (err < 0)
+ return err;
+
+ if (phydev->autoneg == AUTONEG_ENABLE && phydev->autoneg_complete) {
phy_resolve_aneg_linkmode(phydev);
} else if (phydev->autoneg == AUTONEG_DISABLE) {
int bmcr = phy_read(phydev, MII_BMCR);
netif_wake_queue(ppp->dev);
else
netif_stop_queue(ppp->dev);
+ } else {
+ kfree_skb(skb);
}
ppp_xmit_unlock(ppp);
}
skb_dst_drop(skb);
skb_dst_set(skb, &rt->dst);
- nf_reset(skb);
+ nf_reset_ct(skb);
skb->ip_summed = CHECKSUM_NONE;
ip_select_ident(net, skb, NULL);
po = lookup_chan(htons(header->call_id), iph->saddr);
if (po) {
skb_dst_drop(skb);
- nf_reset(skb);
+ nf_reset_ct(skb);
return sk_receive_skb(sk_pppox(po), skb, 0);
}
drop:
kfree_skb(skb);
err:
rcu_read_lock();
- tap = rcu_dereference(q->tap);
+ tap = rcu_dereference(q->tap);
if (tap && tap->count_tx_dropped)
tap->count_tx_dropped(tap);
rcu_read_unlock();
*/
skb_orphan(skb);
- nf_reset(skb);
+ nf_reset_ct(skb);
if (ptr_ring_produce(&tfile->tx_ring, skb))
goto drop;
u8 ep;
for (ep = 0; ep < intf->cur_altsetting->desc.bNumEndpoints; ep++) {
-
e = intf->cur_altsetting->endpoint + ep;
+
+ /* ignore endpoints which cannot transfer data */
+ if (!usb_endpoint_maxp(&e->desc))
+ continue;
+
switch (e->desc.bmAttributes & USB_ENDPOINT_XFERTYPE_MASK) {
case USB_ENDPOINT_XFER_INT:
if (usb_endpoint_dir_in(&e->desc)) {
*/
if (serial->tiocmget) {
tiocmget = serial->tiocmget;
+ tiocmget->endp = hso_get_ep(interface,
+ USB_ENDPOINT_XFER_INT,
+ USB_DIR_IN);
+ if (!tiocmget->endp) {
+ dev_err(&interface->dev, "Failed to find INT IN ep\n");
+ goto exit;
+ }
+
tiocmget->urb = usb_alloc_urb(0, GFP_KERNEL);
if (tiocmget->urb) {
mutex_init(&tiocmget->mutex);
init_waitqueue_head(&tiocmget->waitq);
- tiocmget->endp = hso_get_ep(
- interface,
- USB_ENDPOINT_XFER_INT,
- USB_DIR_IN);
} else
hso_free_tiomget(serial);
}
{QMI_FIXED_INTF(0x1e2d, 0x0082, 4)}, /* Cinterion PHxx,PXxx (2 RmNet) */
{QMI_FIXED_INTF(0x1e2d, 0x0082, 5)}, /* Cinterion PHxx,PXxx (2 RmNet) */
{QMI_FIXED_INTF(0x1e2d, 0x0083, 4)}, /* Cinterion PHxx,PXxx (1 RmNet + USB Audio)*/
+ {QMI_QUIRK_SET_DTR(0x1e2d, 0x00b0, 4)}, /* Cinterion CLS8 */
{QMI_FIXED_INTF(0x413c, 0x81a2, 8)}, /* Dell Wireless 5806 Gobi(TM) 4G LTE Mobile Broadband Card */
{QMI_FIXED_INTF(0x413c, 0x81a3, 8)}, /* Dell Wireless 5570 HSPA+ (42Mbps) Mobile Broadband Card */
{QMI_FIXED_INTF(0x413c, 0x81a4, 8)}, /* Dell Wireless 5570e HSPA+ (42Mbps) Mobile Broadband Card */
struct r8152 *tp = usb_get_intfdata(intf);
clear_bit(SELECTIVE_SUSPEND, &tp->flags);
- mutex_lock(&tp->control);
tp->rtl_ops.init(tp);
queue_delayed_work(system_long_wq, &tp->hw_phy_work, 0);
- mutex_unlock(&tp->control);
+ set_ethernet_addr(tp);
return rtl8152_resume(intf);
}
int intr = 0;
e = alt->endpoint + ep;
+
+ /* ignore endpoints which cannot transfer data */
+ if (!usb_endpoint_maxp(&e->desc))
+ continue;
+
switch (e->desc.bmAttributes) {
case USB_ENDPOINT_XFER_INT:
if (!usb_endpoint_dir_in(&e->desc))
{
enum usb_device_speed speed = dev->udev->speed;
+ if (!dev->rx_urb_size || !dev->hard_mtu)
+ goto insanity;
switch (speed) {
case USB_SPEED_HIGH:
dev->rx_qlen = MAX_QUEUE_MEMORY / dev->rx_urb_size;
dev->tx_qlen = 5 * MAX_QUEUE_MEMORY / dev->hard_mtu;
break;
default:
+insanity:
dev->rx_qlen = dev->tx_qlen = 4;
}
}
/* Don't wait up for transmitted skbs to be freed. */
if (!use_napi) {
skb_orphan(skb);
- nf_reset(skb);
+ nf_reset_ct(skb);
}
/* If running out of space, stop queue to avoid getting packets that we
struct neighbour *neigh;
int ret;
- nf_reset(skb);
+ nf_reset_ct(skb);
skb->protocol = htons(ETH_P_IPV6);
skb->dev = dev;
/* reset skb device */
if (likely(err == 1))
- nf_reset(skb);
+ nf_reset_ct(skb);
else
skb = NULL;
bool is_v6gw = false;
int ret = -EINVAL;
- nf_reset(skb);
+ nf_reset_ct(skb);
/* Be paranoid, rather than too clever. */
if (unlikely(skb_headroom(skb) < hh_len && dev->header_ops)) {
/* reset skb device */
if (likely(err == 1))
- nf_reset(skb);
+ nf_reset_ct(skb);
else
skb = NULL;
struct sk_buff *skb;
int err;
- if (family == AF_INET6 && !ipv6_mod_enabled())
+ if ((family == AF_INET6 || family == RTNL_FAMILY_IP6MR) &&
+ !ipv6_mod_enabled())
return 0;
skb = nlmsg_new(vrf_fib_rule_nl_size(), GFP_KERNEL);
depends on ATH_DEBUG
depends on EVENT_TRACING
---help---
- This option enables tracepoints for atheros wireless drivers.
+ This option enables tracepoints for atheros wireless drivers.
Currently, ath9k makes use of this facility.
config ATH_REG_DYNAMIC_USER_REG_HINTS
select ATH_COMMON
select FW_LOADER
---help---
- This module add support for AR5523 based USB dongles such as D-Link
- DWL-G132, Netgear WPN111 and many more.
+ This module add support for AR5523 based USB dongles such as D-Link
+ DWL-G132, Netgear WPN111 and many more.
config ATH6KL
tristate "Atheros mobile chipsets support"
depends on CFG80211
- ---help---
+ ---help---
This module adds core support for wireless adapters based on
Atheros AR6003 and AR6004 chipsets. You still need separate
bus drivers for USB and SDIO to be able to use real devices.
depends on ATH9K
default n
---help---
- This option enables channel context support in ath9k, which is needed
+ This option enables channel context support in ath9k, which is needed
for multi-channel concurrency. Enable this if P2P PowerSave support
is required.
default y
config CARL9170_HWRNG
- bool "Random number generator"
- depends on CARL9170 && (HW_RANDOM = y || HW_RANDOM = CARL9170)
- default n
+ bool "Random number generator"
+ depends on CARL9170 && (HW_RANDOM = y || HW_RANDOM = CARL9170)
+ default n
help
Provides a hardware random number generator to the kernel.
skb_orphan(skb);
if (security && (wil->txrx_ops.rx_crypto_check(wil, skb) != 0)) {
+ wil_dbg_txrx(wil, "Rx drop %d bytes\n", skb->len);
dev_kfree_skb(skb);
ndev->stats.rx_dropped++;
stats->rx_replay++;
stats->rx_dropped++;
- wil_dbg_txrx(wil, "Rx drop %d bytes\n", skb->len);
return;
}
select FW_LOADER
select CRC32
---help---
- A driver 802.11b wireless cards based on the Atmel fast-vnet
- chips. This driver supports standard Linux wireless extensions.
+ A driver 802.11b wireless cards based on the Atmel fast-vnet
+ chips. This driver supports standard Linux wireless extensions.
- Many cards based on this chipset do not have flash memory
- and need their firmware loaded at start-up. If yours is
- one of these, you will need to provide a firmware image
- to be loaded into the card by the driver. The Atmel
- firmware package can be downloaded from
- <http://www.thekelleys.org.uk/atmel>
+ Many cards based on this chipset do not have flash memory
+ and need their firmware loaded at start-up. If yours is
+ one of these, you will need to provide a firmware image
+ to be loaded into the card by the driver. The Atmel
+ firmware package can be downloaded from
+ <http://www.thekelleys.org.uk/atmel>
config PCI_ATMEL
tristate "Atmel at76c506 PCI cards"
depends on ATMEL && PCI
---help---
- Enable support for PCI and mini-PCI cards containing the
- Atmel at76c506 chip.
+ Enable support for PCI and mini-PCI cards containing the
+ Atmel at76c506 chip.
config PCMCIA_ATMEL
tristate "Atmel at76c502/at76c504 PCMCIA cards"
Atmel at76c502 and at76c504 chips.
config AT76C50X_USB
- tristate "Atmel at76c503/at76c505/at76c505a USB cards"
- depends on MAC80211 && USB
- select FW_LOADER
- ---help---
- Enable support for USB Wireless devices using Atmel at76c503,
- at76c505 or at76c505a chips.
+ tristate "Atmel at76c503/at76c505/at76c505a USB cards"
+ depends on MAC80211 && USB
+ select FW_LOADER
+ ---help---
+ Enable support for USB Wireless devices using Atmel at76c503,
+ at76c505 or at76c505a chips.
endif # WLAN_VENDOR_ATMEL
select LIB80211
select LIBIPW
---help---
- A driver for the Intel PRO/Wireless 2100 Network
+ A driver for the Intel PRO/Wireless 2100 Network
Connection 802.11b wireless network adapter.
- See <file:Documentation/networking/device_drivers/intel/ipw2100.txt>
+ See <file:Documentation/networking/device_drivers/intel/ipw2100.txt>
for information on the capabilities currently enabled in this driver
and for tips for debugging issues and problems.
In order to use this driver, you will need a firmware image for it.
- You can obtain the firmware from
- <http://ipw2100.sf.net/>. Once you have the firmware image, you
+ You can obtain the firmware from
+ <http://ipw2100.sf.net/>. Once you have the firmware image, you
will need to place it in /lib/firmware.
- You will also very likely need the Wireless Tools in order to
- configure your card:
+ You will also very likely need the Wireless Tools in order to
+ configure your card:
- <http://www.hpl.hp.com/personal/Jean_Tourrilhes/Linux/Tools.html>.
+ <http://www.hpl.hp.com/personal/Jean_Tourrilhes/Linux/Tools.html>.
+
+ It is recommended that you compile this driver as a module (M)
+ rather than built-in (Y). This driver requires firmware at device
+ initialization time, and when built-in this typically happens
+ before the filesystem is accessible (hence firmware will be
+ unavailable and initialization will fail). If you do choose to build
+ this driver into your kernel image, you can avoid this problem by
+ including the firmware and a firmware loader in an initramfs.
- It is recommended that you compile this driver as a module (M)
- rather than built-in (Y). This driver requires firmware at device
- initialization time, and when built-in this typically happens
- before the filesystem is accessible (hence firmware will be
- unavailable and initialization will fail). If you do choose to build
- this driver into your kernel image, you can avoid this problem by
- including the firmware and a firmware loader in an initramfs.
-
config IPW2100_MONITOR
- bool "Enable promiscuous mode"
- depends on IPW2100
- ---help---
+ bool "Enable promiscuous mode"
+ depends on IPW2100
+ ---help---
Enables promiscuous/monitor mode support for the ipw2100 driver.
- With this feature compiled into the driver, you can switch to
+ With this feature compiled into the driver, you can switch to
promiscuous mode via the Wireless Tool's Monitor mode. While in this
mode, no packets can be sent.
bool "Enable full debugging output in IPW2100 module."
depends on IPW2100
---help---
- This option will enable debug tracing output for the IPW2100.
+ This option will enable debug tracing output for the IPW2100.
- This will result in the kernel module being ~60k larger. You can
- control which debug output is sent to the kernel log by setting the
- value in
+ This will result in the kernel module being ~60k larger. You can
+ control which debug output is sent to the kernel log by setting the
+ value in
/sys/bus/pci/drivers/ipw2100/debug_level
This entry will only exist if this option is enabled.
- If you are not trying to debug or develop the IPW2100 driver, you
+ If you are not trying to debug or develop the IPW2100 driver, you
most likely want to say N here.
config IPW2200
select LIB80211
select LIBIPW
---help---
- A driver for the Intel PRO/Wireless 2200BG and 2915ABG Network
- Connection adapters.
+ A driver for the Intel PRO/Wireless 2200BG and 2915ABG Network
+ Connection adapters.
- See <file:Documentation/networking/device_drivers/intel/ipw2200.txt>
+ See <file:Documentation/networking/device_drivers/intel/ipw2200.txt>
for information on the capabilities currently enabled in this
driver and for tips for debugging issues and problems.
In order to use this driver, you will need a firmware image for it.
- You can obtain the firmware from
- <http://ipw2200.sf.net/>. See the above referenced README.ipw2200
+ You can obtain the firmware from
+ <http://ipw2200.sf.net/>. See the above referenced README.ipw2200
for information on where to install the firmware images.
- You will also very likely need the Wireless Tools in order to
- configure your card:
+ You will also very likely need the Wireless Tools in order to
+ configure your card:
- <http://www.hpl.hp.com/personal/Jean_Tourrilhes/Linux/Tools.html>.
+ <http://www.hpl.hp.com/personal/Jean_Tourrilhes/Linux/Tools.html>.
- It is recommended that you compile this driver as a module (M)
- rather than built-in (Y). This driver requires firmware at device
- initialization time, and when built-in this typically happens
- before the filesystem is accessible (hence firmware will be
- unavailable and initialization will fail). If you do choose to build
- this driver into your kernel image, you can avoid this problem by
- including the firmware and a firmware loader in an initramfs.
+ It is recommended that you compile this driver as a module (M)
+ rather than built-in (Y). This driver requires firmware at device
+ initialization time, and when built-in this typically happens
+ before the filesystem is accessible (hence firmware will be
+ unavailable and initialization will fail). If you do choose to build
+ this driver into your kernel image, you can avoid this problem by
+ including the firmware and a firmware loader in an initramfs.
config IPW2200_MONITOR
- bool "Enable promiscuous mode"
- depends on IPW2200
- ---help---
+ bool "Enable promiscuous mode"
+ depends on IPW2200
+ ---help---
Enables promiscuous/monitor mode support for the ipw2200 driver.
- With this feature compiled into the driver, you can switch to
+ With this feature compiled into the driver, you can switch to
promiscuous mode via the Wireless Tool's Monitor mode. While in this
mode, no packets can be sent.
depends on IPW2200_MONITOR
select IPW2200_RADIOTAP
---help---
- Enables the creation of a second interface prefixed 'rtap'.
- This second interface will provide every received in radiotap
+ Enables the creation of a second interface prefixed 'rtap'.
+ This second interface will provide every received in radiotap
format.
- This is useful for performing wireless network analysis while
- maintaining an active association.
+ This is useful for performing wireless network analysis while
+ maintaining an active association.
+
+ Example usage:
- Example usage:
+ % modprobe ipw2200 rtap_iface=1
+ % ifconfig rtap0 up
+ % tethereal -i rtap0
- % modprobe ipw2200 rtap_iface=1
- % ifconfig rtap0 up
- % tethereal -i rtap0
+ If you do not specify 'rtap_iface=1' as a module parameter then
+ the rtap interface will not be created and you will need to turn
+ it on via sysfs:
- If you do not specify 'rtap_iface=1' as a module parameter then
- the rtap interface will not be created and you will need to turn
- it on via sysfs:
-
- % echo 1 > /sys/bus/pci/drivers/ipw2200/*/rtap_iface
+ % echo 1 > /sys/bus/pci/drivers/ipw2200/*/rtap_iface
config IPW2200_QOS
- bool "Enable QoS support"
- depends on IPW2200
+ bool "Enable QoS support"
+ depends on IPW2200
config IPW2200_DEBUG
bool "Enable full debugging output in IPW2200 module."
any problems you may encounter.
config IWLEGACY_DEBUGFS
- bool "iwlegacy (iwl 3945/4965) debugfs support"
- depends on IWLEGACY && MAC80211_DEBUGFS
- ---help---
+ bool "iwlegacy (iwl 3945/4965) debugfs support"
+ depends on IWLEGACY && MAC80211_DEBUGFS
+ ---help---
Enable creation of debugfs files for the iwlegacy drivers. This
is a low-impact option that allows getting insight into the
driver's state at runtime.
any problems you may encounter.
config IWLWIFI_DEBUGFS
- bool "iwlwifi debugfs support"
- depends on MAC80211_DEBUGFS
- ---help---
+ bool "iwlwifi debugfs support"
+ depends on MAC80211_DEBUGFS
+ ---help---
Enable creation of debugfs files for the iwlwifi drivers. This
is a low-impact option that allows getting insight into the
driver's state at runtime.
* firmware versions. Unfortunately, we don't have a TLV API
* flag to rely on, so rely on the major version which is in
* the first byte of ucode_ver. This was implemented
- * initially on version 38 and then backported to 36, 29 and
- * 17.
+ * initially on version 38 and then backported to29 and 17.
+ * The intention was to have it in 36 as well, but not all
+ * 8000 family got this feature enabled. The 8000 family is
+ * the only one using version 36, so skip this version
+ * entirely.
*/
return IWL_UCODE_SERIAL(mvm->fw->ucode_ver) >= 38 ||
- IWL_UCODE_SERIAL(mvm->fw->ucode_ver) == 36 ||
IWL_UCODE_SERIAL(mvm->fw->ucode_ver) == 29 ||
IWL_UCODE_SERIAL(mvm->fw->ucode_ver) == 17;
}
return ((s16)le16_to_cpu(*(__le16 *)a) -
(s16)le16_to_cpu(*(__le16 *)b));
}
+#endif
int iwl_mvm_send_temp_report_ths_cmd(struct iwl_mvm *mvm)
{
struct temp_report_ths_cmd cmd = {0};
- int ret, i, j, idx = 0;
+ int ret;
+#ifdef CONFIG_THERMAL
+ int i, j, idx = 0;
lockdep_assert_held(&mvm->mutex);
if (!mvm->tz_device.tzone)
- return -EINVAL;
+ goto send;
/* The driver holds array of temperature trips that are unsorted
* and uncompressed, the FW should get it compressed and sorted
}
send:
+#endif
ret = iwl_mvm_send_cmd_pdu(mvm, WIDE_ID(PHY_OPS_GROUP,
TEMP_REPORTING_THRESHOLDS_CMD),
0, sizeof(cmd), &cmd);
return ret;
}
+#ifdef CONFIG_THERMAL
static int iwl_mvm_tzone_get_temp(struct thermal_zone_device *device,
int *temperature)
{
skb_orphan(skb);
skb_dst_drop(skb);
skb->mark = 0;
- secpath_reset(skb);
- nf_reset(skb);
+ skb_ext_reset(skb);
+ nf_reset_ct(skb);
/*
* Get absolute mactime here so all HWs RX at the "same time", and
static int mt7615_load_patch(struct mt7615_dev *dev)
{
- const char *firmware = MT7615_ROM_PATCH;
const struct mt7615_patch_hdr *hdr;
const struct firmware *fw = NULL;
int len, ret, sem;
return -EAGAIN;
}
- ret = request_firmware(&fw, firmware, dev->mt76.dev);
+ ret = request_firmware(&fw, MT7615_ROM_PATCH, dev->mt76.dev);
if (ret)
goto out;
static int mt7615_load_ram(struct mt7615_dev *dev)
{
- const struct firmware *fw;
const struct mt7615_fw_trailer *hdr;
- const char *n9_firmware = MT7615_FIRMWARE_N9;
- const char *cr4_firmware = MT7615_FIRMWARE_CR4;
+ const struct firmware *fw;
int ret;
- ret = request_firmware(&fw, n9_firmware, dev->mt76.dev);
+ ret = request_firmware(&fw, MT7615_FIRMWARE_N9, dev->mt76.dev);
if (ret)
return ret;
release_firmware(fw);
- ret = request_firmware(&fw, cr4_firmware, dev->mt76.dev);
+ ret = request_firmware(&fw, MT7615_FIRMWARE_CR4, dev->mt76.dev);
if (ret)
return ret;
#define MT7615_RX_RING_SIZE 1024
#define MT7615_RX_MCU_RING_SIZE 512
-#define MT7615_FIRMWARE_CR4 "mt7615_cr4.bin"
-#define MT7615_FIRMWARE_N9 "mt7615_n9.bin"
-#define MT7615_ROM_PATCH "mt7615_rom_patch.bin"
+#define MT7615_FIRMWARE_CR4 "mediatek/mt7615_cr4.bin"
+#define MT7615_FIRMWARE_N9 "mediatek/mt7615_n9.bin"
+#define MT7615_ROM_PATCH "mediatek/mt7615_rom_patch.bin"
#define MT7615_EEPROM_SIZE 1024
#define MT7615_TOKEN_SIZE 4096
bool "rt2800pci - Include support for rt53xx devices (EXPERIMENTAL)"
default y
---help---
- This adds support for rt53xx wireless chipset family to the
- rt2800pci driver.
- Supported chips: RT5390
+ This adds support for rt53xx wireless chipset family to the
+ rt2800pci driver.
+ Supported chips: RT5390
config RT2800PCI_RT3290
bool "rt2800pci - Include support for rt3290 devices (EXPERIMENTAL)"
default y
---help---
- This adds support for rt3290 wireless chipset family to the
- rt2800pci driver.
- Supported chips: RT3290
+ This adds support for rt3290 wireless chipset family to the
+ rt2800pci driver.
+ Supported chips: RT3290
endif
config RT2500USB
config RT2800USB_RT53XX
bool "rt2800usb - Include support for rt53xx devices (EXPERIMENTAL)"
---help---
- This adds support for rt53xx wireless chipset family to the
- rt2800usb driver.
- Supported chips: RT5370
+ This adds support for rt53xx wireless chipset family to the
+ rt2800usb driver.
+ Supported chips: RT5370
config RT2800USB_RT55XX
bool "rt2800usb - Include support for rt55xx devices (EXPERIMENTAL)"
---help---
- This adds support for rt55xx wireless chipset family to the
- rt2800usb driver.
- Supported chips: RT5572
+ This adds support for rt55xx wireless chipset family to the
+ rt2800usb driver.
+ Supported chips: RT5572
config RT2800USB_UNKNOWN
bool "rt2800usb - Include support for unknown (USB) devices"
rtwdev->h2c.last_box_num = 0;
rtwdev->h2c.seq = 0;
- rtw_fw_send_general_info(rtwdev);
- rtw_fw_send_phydm_info(rtwdev);
-
rtw_flag_set(rtwdev, RTW_FLAG_FW_RUNNING);
return 0;
goto err_off;
}
+ /* send H2C after HCI has started */
+ rtw_fw_send_general_info(rtwdev);
+ rtw_fw_send_phydm_info(rtwdev);
+
wifi_only = !rtwdev->efuse.btcoex;
rtw_coex_power_on_setting(rtwdev);
rtw_coex_init_hw_config(rtwdev, wifi_only);
return tx_ring->r.head + offset;
}
-static void rtw_pci_free_tx_ring(struct rtw_dev *rtwdev,
- struct rtw_pci_tx_ring *tx_ring)
+static void rtw_pci_free_tx_ring_skbs(struct rtw_dev *rtwdev,
+ struct rtw_pci_tx_ring *tx_ring)
{
struct pci_dev *pdev = to_pci_dev(rtwdev->dev);
struct rtw_pci_tx_data *tx_data;
struct sk_buff *skb, *tmp;
dma_addr_t dma;
- u8 *head = tx_ring->r.head;
- u32 len = tx_ring->r.len;
- int ring_sz = len * tx_ring->r.desc_size;
/* free every skb remained in tx list */
skb_queue_walk_safe(&tx_ring->queue, skb, tmp) {
pci_unmap_single(pdev, dma, skb->len, PCI_DMA_TODEVICE);
dev_kfree_skb_any(skb);
}
+}
+
+static void rtw_pci_free_tx_ring(struct rtw_dev *rtwdev,
+ struct rtw_pci_tx_ring *tx_ring)
+{
+ struct pci_dev *pdev = to_pci_dev(rtwdev->dev);
+ u8 *head = tx_ring->r.head;
+ u32 len = tx_ring->r.len;
+ int ring_sz = len * tx_ring->r.desc_size;
+
+ rtw_pci_free_tx_ring_skbs(rtwdev, tx_ring);
/* free the ring itself */
pci_free_consistent(pdev, ring_sz, head, tx_ring->r.dma);
tx_ring->r.head = NULL;
}
-static void rtw_pci_free_rx_ring(struct rtw_dev *rtwdev,
- struct rtw_pci_rx_ring *rx_ring)
+static void rtw_pci_free_rx_ring_skbs(struct rtw_dev *rtwdev,
+ struct rtw_pci_rx_ring *rx_ring)
{
struct pci_dev *pdev = to_pci_dev(rtwdev->dev);
struct sk_buff *skb;
- dma_addr_t dma;
- u8 *head = rx_ring->r.head;
int buf_sz = RTK_PCI_RX_BUF_SIZE;
- int ring_sz = rx_ring->r.desc_size * rx_ring->r.len;
+ dma_addr_t dma;
int i;
for (i = 0; i < rx_ring->r.len; i++) {
dev_kfree_skb(skb);
rx_ring->buf[i] = NULL;
}
+}
+
+static void rtw_pci_free_rx_ring(struct rtw_dev *rtwdev,
+ struct rtw_pci_rx_ring *rx_ring)
+{
+ struct pci_dev *pdev = to_pci_dev(rtwdev->dev);
+ u8 *head = rx_ring->r.head;
+ int ring_sz = rx_ring->r.desc_size * rx_ring->r.len;
+
+ rtw_pci_free_rx_ring_skbs(rtwdev, rx_ring);
pci_free_consistent(pdev, ring_sz, head, rx_ring->r.dma);
}
rtwpci->rx_tag = 0;
}
+static void rtw_pci_dma_release(struct rtw_dev *rtwdev, struct rtw_pci *rtwpci)
+{
+ struct rtw_pci_tx_ring *tx_ring;
+ u8 queue;
+
+ for (queue = 0; queue < RTK_MAX_TX_QUEUE_NUM; queue++) {
+ tx_ring = &rtwpci->tx_rings[queue];
+ rtw_pci_free_tx_ring_skbs(rtwdev, tx_ring);
+ }
+}
+
static int rtw_pci_start(struct rtw_dev *rtwdev)
{
struct rtw_pci *rtwpci = (struct rtw_pci *)rtwdev->priv;
spin_lock_irqsave(&rtwpci->irq_lock, flags);
rtw_pci_disable_interrupt(rtwdev, rtwpci);
+ rtw_pci_dma_release(rtwdev, rtwpci);
spin_unlock_irqrestore(&rtwpci->irq_lock, flags);
}
*/
if (rr->length < struct_size(regs, regs, count)) {
dev_dbg_f(zd_usb_dev(usb),
- "error: actual length %d less than expected %ld\n",
+ "error: actual length %d less than expected %zu\n",
rr->length, struct_size(regs, regs, count));
return false;
}
return 0;
}
-static RING_IDX xennet_fill_frags(struct netfront_queue *queue,
- struct sk_buff *skb,
- struct sk_buff_head *list)
+static int xennet_fill_frags(struct netfront_queue *queue,
+ struct sk_buff *skb,
+ struct sk_buff_head *list)
{
RING_IDX cons = queue->rx.rsp_cons;
struct sk_buff *nskb;
if (unlikely(skb_shinfo(skb)->nr_frags >= MAX_SKB_FRAGS)) {
queue->rx.rsp_cons = ++cons + skb_queue_len(list);
kfree_skb(nskb);
- return ~0U;
+ return -ENOENT;
}
skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags,
kfree_skb(nskb);
}
- return cons;
+ queue->rx.rsp_cons = cons;
+
+ return 0;
}
static int checksum_setup(struct net_device *dev, struct sk_buff *skb)
skb->data_len = rx->status;
skb->len += rx->status;
- i = xennet_fill_frags(queue, skb, &tmpq);
- if (unlikely(i == ~0U))
+ if (unlikely(xennet_fill_frags(queue, skb, &tmpq)))
goto err;
if (rx->flags & XEN_NETRXF_csum_blank)
__skb_queue_tail(&rxq, skb);
- queue->rx.rsp_cons = ++i;
+ i = ++queue->rx.rsp_cons;
work_done++;
}
result = -ETIMEDOUT;
else
result = -EIO;
- return result;
+ return result;
}
/* Check for CRC err only if CRC is present in the tag response */
if (idx < 0 || idx > ndev->mw_count)
return -EINVAL;
- return 1 << idx;
+ return ndev->dev_data->mw_idx << idx;
}
static int amd_ntb_mw_count(struct ntb_dev *ntb, int pidx)
{
void __iomem *mmio = ndev->self_mmio;
- ndev->mw_count = AMD_MW_CNT;
+ ndev->mw_count = ndev->dev_data->mw_count;
ndev->spad_count = AMD_SPADS_CNT;
ndev->db_count = AMD_DB_CNT;
goto err_ndev;
}
+ ndev->dev_data = (struct ntb_dev_data *)id->driver_data;
+
ndev_init_struct(ndev, pdev);
rc = amd_ntb_init_pci(ndev, pdev);
.read = ndev_debugfs_read,
};
+static const struct ntb_dev_data dev_data[] = {
+ { /* for device 145b */
+ .mw_count = 3,
+ .mw_idx = 1,
+ },
+ { /* for device 148b */
+ .mw_count = 2,
+ .mw_idx = 2,
+ },
+};
+
static const struct pci_device_id amd_ntb_pci_tbl[] = {
- {PCI_VDEVICE(AMD, PCI_DEVICE_ID_AMD_NTB)},
- {0}
+ { PCI_VDEVICE(AMD, 0x145b), (kernel_ulong_t)&dev_data[0] },
+ { PCI_VDEVICE(AMD, 0x148b), (kernel_ulong_t)&dev_data[1] },
+ { 0, }
};
MODULE_DEVICE_TABLE(pci, amd_ntb_pci_tbl);
#include <linux/ntb.h>
#include <linux/pci.h>
-#define PCI_DEVICE_ID_AMD_NTB 0x145B
#define AMD_LINK_HB_TIMEOUT msecs_to_jiffies(1000)
#define AMD_LINK_STATUS_OFFSET 0x68
#define NTB_LIN_STA_ACTIVE_BIT 0x00000002
enum {
/* AMD NTB Capability */
- AMD_MW_CNT = 3,
AMD_DB_CNT = 16,
AMD_MSIX_VECTOR_CNT = 24,
AMD_SPADS_CNT = 16,
AMD_PEER_OFFSET = 0x400,
};
+struct ntb_dev_data {
+ const unsigned char mw_count;
+ const unsigned int mw_idx;
+};
+
struct amd_ntb_dev;
struct amd_ntb_vec {
u32 cntl_sta;
u32 peer_sta;
+ struct ntb_dev_data *dev_data;
unsigned char mw_count;
unsigned char spad_count;
unsigned char db_count;
depends on PCI
select HWMON
help
- This driver supports NTB of cappable IDT PCIe-switches.
+ This driver supports NTB of capable IDT PCIe-switches.
Some of the pre-initializations must be made before IDT PCIe-switch
- exposes it NT-functions correctly. It should be done by either proper
- initialisation of EEPROM connected to master smbus of the switch or
+ exposes its NT-functions correctly. It should be done by either proper
+ initialization of EEPROM connected to master SMbus of the switch or
by BIOS using slave-SMBus interface changing corresponding registers
value. Evidently it must be done before PCI bus enumeration is
finished in Linux kernel.
if (rc)
return rc;
- if (addr == 0 || size == 0) {
+ if (size == 0) {
if (widx < nr_direct_mw)
switchtec_ntb_mw_clr_direct(sndev, widx);
else
static int ntb_transport_bus_probe(struct device *dev)
{
const struct ntb_transport_client *client;
- int rc = -EINVAL;
+ int rc;
get_device(dev);
int ret;
/* Get outbound MW parameters and map it */
- ret = ntb_peer_mw_get_addr(perf->ntb, peer->gidx, &phys_addr,
+ ret = ntb_peer_mw_get_addr(perf->ntb, perf->gidx, &phys_addr,
&peer->outbuf_size);
if (ret)
return ret;
arena->freelist[lane].sub = 1 - arena->freelist[lane].sub;
if (++(arena->freelist[lane].seq) == 4)
arena->freelist[lane].seq = 1;
- if (ent_e_flag(ent->old_map))
+ if (ent_e_flag(le32_to_cpu(ent->old_map)))
arena->freelist[lane].has_err = 1;
- arena->freelist[lane].block = le32_to_cpu(ent_lba(ent->old_map));
+ arena->freelist[lane].block = ent_lba(le32_to_cpu(ent->old_map));
return ret;
}
* FIXME: if error clearing fails during init, we want to make
* the BTT read-only
*/
- if (ent_e_flag(log_new.old_map) &&
- !ent_normal(log_new.old_map)) {
+ if (ent_e_flag(le32_to_cpu(log_new.old_map)) &&
+ !ent_normal(le32_to_cpu(log_new.old_map))) {
arena->freelist[i].has_err = 1;
ret = arena_clear_freelist_error(arena, i);
if (ret)
sector_t sector;
/* make sure device is a region */
- if (!is_nd_pmem(dev))
+ if (!is_memory(dev))
return 0;
nd_region = to_nd_region(dev);
nd_mapping = &nd_region->mapping[i];
label_ent = list_first_entry_or_null(&nd_mapping->labels,
typeof(*label_ent), list);
- label0 = label_ent ? label_ent->label : 0;
+ label0 = label_ent ? label_ent->label : NULL;
if (!label0) {
WARN_ON(1);
continue;
/* skip labels that describe extents outside of the region */
- if (nd_label->dpa < nd_mapping->start || nd_label->dpa > map_end)
- continue;
+ if (__le64_to_cpu(nd_label->dpa) < nd_mapping->start ||
+ __le64_to_cpu(nd_label->dpa) > map_end)
+ continue;
i = add_namespace_resource(nd_region, nd_label, devs, count);
if (i < 0)
struct nd_pfn *to_nd_pfn(struct device *dev);
#if IS_ENABLED(CONFIG_NVDIMM_PFN)
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-#define PFN_DEFAULT_ALIGNMENT HPAGE_PMD_SIZE
-#else
-#define PFN_DEFAULT_ALIGNMENT PAGE_SIZE
-#endif
+#define MAX_NVDIMM_ALIGN 4
int nd_pfn_probe(struct device *dev, struct nd_namespace_common *ndns);
bool is_nd_pfn(struct device *dev);
return sprintf(buf, "%ld\n", nd_pfn->align);
}
-static const unsigned long *nd_pfn_supported_alignments(void)
+static unsigned long *nd_pfn_supported_alignments(unsigned long *alignments)
{
- /*
- * This needs to be a non-static variable because the *_SIZE
- * macros aren't always constants.
- */
- const unsigned long supported_alignments[] = {
- PAGE_SIZE,
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
- HPAGE_PMD_SIZE,
-#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
- HPAGE_PUD_SIZE,
-#endif
-#endif
- 0,
- };
- static unsigned long data[ARRAY_SIZE(supported_alignments)];
- memcpy(data, supported_alignments, sizeof(data));
+ alignments[0] = PAGE_SIZE;
+
+ if (has_transparent_hugepage()) {
+ alignments[1] = HPAGE_PMD_SIZE;
+ if (IS_ENABLED(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD))
+ alignments[2] = HPAGE_PUD_SIZE;
+ }
+
+ return alignments;
+}
+
+/*
+ * Use pmd mapping if supported as default alignment
+ */
+static unsigned long nd_pfn_default_alignment(void)
+{
- return data;
+ if (has_transparent_hugepage())
+ return HPAGE_PMD_SIZE;
+ return PAGE_SIZE;
}
static ssize_t align_store(struct device *dev,
struct device_attribute *attr, const char *buf, size_t len)
{
struct nd_pfn *nd_pfn = to_nd_pfn_safe(dev);
+ unsigned long aligns[MAX_NVDIMM_ALIGN] = { [0] = 0, };
ssize_t rc;
nd_device_lock(dev);
nvdimm_bus_lock(dev);
rc = nd_size_select_store(dev, buf, &nd_pfn->align,
- nd_pfn_supported_alignments());
+ nd_pfn_supported_alignments(aligns));
dev_dbg(dev, "result: %zd wrote: %s%s", rc, buf,
buf[len - 1] == '\n' ? "" : "\n");
nvdimm_bus_unlock(dev);
static ssize_t supported_alignments_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
- return nd_size_select_show(0, nd_pfn_supported_alignments(), buf);
+ unsigned long aligns[MAX_NVDIMM_ALIGN] = { [0] = 0, };
+
+ return nd_size_select_show(0,
+ nd_pfn_supported_alignments(aligns), buf);
}
static DEVICE_ATTR_RO(supported_alignments);
return NULL;
nd_pfn->mode = PFN_MODE_NONE;
- nd_pfn->align = PFN_DEFAULT_ALIGNMENT;
+ nd_pfn->align = nd_pfn_default_alignment();
dev = &nd_pfn->dev;
device_initialize(&nd_pfn->dev);
if (ndns && !__nd_attach_ndns(&nd_pfn->dev, ndns, &nd_pfn->ndns)) {
return 0;
}
+static bool nd_supported_alignment(unsigned long align)
+{
+ int i;
+ unsigned long supported[MAX_NVDIMM_ALIGN] = { [0] = 0, };
+
+ if (align == 0)
+ return false;
+
+ nd_pfn_supported_alignments(supported);
+ for (i = 0; supported[i]; i++)
+ if (align == supported[i])
+ return true;
+ return false;
+}
+
/**
* nd_pfn_validate - read and validate info-block
* @nd_pfn: fsdax namespace runtime state / properties
return -EOPNOTSUPP;
}
+ /*
+ * Check whether the we support the alignment. For Dax if the
+ * superblock alignment is not matching, we won't initialize
+ * the device.
+ */
+ if (!nd_supported_alignment(align) &&
+ !memcmp(pfn_sb->signature, DAX_SIG, PFN_SIG_LEN)) {
+ dev_err(&nd_pfn->dev, "init failed, alignment mismatch: "
+ "%ld:%ld\n", nd_pfn->align, align);
+ return -EOPNOTSUPP;
+ }
+
if (!nd_pfn->uuid) {
/*
* When probing a namepace via nd_pfn_probe() the uuid
struct nd_namespace_common *ndns = nd_pfn->ndns;
struct nd_namespace_io *nsio = to_nd_namespace_io(&ndns->dev);
resource_size_t base = nsio->res.start + start_pad;
+ resource_size_t end = nsio->res.end - end_trunc;
struct vmem_altmap __altmap = {
.base_pfn = init_altmap_base(base),
.reserve = init_altmap_reserve(base),
+ .end_pfn = PHYS_PFN(end),
};
memcpy(res, &nsio->res, sizeof(*res));
if (rc)
return rc;
- if (is_nd_pmem(&nd_region->dev)) {
+ if (is_memory(&nd_region->dev)) {
struct resource ndr_res;
if (devm_init_badblocks(dev, &nd_region->bb))
struct nd_region *nd_region = to_nd_region(dev);
struct resource res;
- if (is_nd_pmem(&nd_region->dev)) {
+ if (is_memory(&nd_region->dev)) {
res.start = nd_region->ndr_start;
res.end = nd_region->ndr_start +
nd_region->ndr_size - 1;
if (!is_memory(dev) && a == &dev_attr_dax_seed.attr)
return 0;
- if (!is_nd_pmem(dev) && a == &dev_attr_badblocks.attr)
+ if (!is_memory(dev) && a == &dev_attr_badblocks.attr)
return 0;
if (a == &dev_attr_resource.attr) {
- if (is_nd_pmem(dev))
+ if (is_memory(dev))
return 0400;
else
return 0;
bool is_nvdimm_sync(struct nd_region *nd_region)
{
+ if (is_nd_volatile(&nd_region->dev))
+ return true;
+
return is_nd_pmem(&nd_region->dev) &&
!test_bit(ND_REGION_ASYNC, &nd_region->flags);
}
|| !nvdimm->sec.flags)
return -EIO;
+ /* No need to go further if security is disabled */
+ if (test_bit(NVDIMM_SECURITY_DISABLED, &nvdimm->sec.flags))
+ return 0;
+
if (test_bit(NDD_SECURITY_OVERWRITE, &nvdimm->flags)) {
dev_dbg(dev, "Security operation in progress.\n");
return -EBUSY;
*/
if (!ns->disk || test_and_set_bit(NVME_NS_DEAD, &ns->flags))
return;
- revalidate_disk(ns->disk);
blk_set_queue_dying(ns->queue);
/* Forcibly unquiesce queues to avoid blocking dispatch */
blk_mq_unquiesce_queue(ns->queue);
+ /*
+ * Revalidate after unblocking dispatchers that may be holding bd_butex
+ */
+ revalidate_disk(ns->disk);
}
static void nvme_queue_scan(struct nvme_ctrl *ctrl)
static int nvme_submit_user_cmd(struct request_queue *q,
struct nvme_command *cmd, void __user *ubuffer,
unsigned bufflen, void __user *meta_buffer, unsigned meta_len,
- u32 meta_seed, u32 *result, unsigned timeout)
+ u32 meta_seed, u64 *result, unsigned timeout)
{
bool write = nvme_is_write(cmd);
struct nvme_ns *ns = q->queuedata;
else
ret = nvme_req(req)->status;
if (result)
- *result = le32_to_cpu(nvme_req(req)->result.u32);
+ *result = le64_to_cpu(nvme_req(req)->result.u64);
if (meta && !ret && !write) {
if (copy_to_user(meta_buffer, meta, meta_len))
ret = -EFAULT;
struct nvme_command c;
unsigned timeout = 0;
u32 effects;
+ u64 result;
+ int status;
+
+ if (!capable(CAP_SYS_ADMIN))
+ return -EACCES;
+ if (copy_from_user(&cmd, ucmd, sizeof(cmd)))
+ return -EFAULT;
+ if (cmd.flags)
+ return -EINVAL;
+
+ memset(&c, 0, sizeof(c));
+ c.common.opcode = cmd.opcode;
+ c.common.flags = cmd.flags;
+ c.common.nsid = cpu_to_le32(cmd.nsid);
+ c.common.cdw2[0] = cpu_to_le32(cmd.cdw2);
+ c.common.cdw2[1] = cpu_to_le32(cmd.cdw3);
+ c.common.cdw10 = cpu_to_le32(cmd.cdw10);
+ c.common.cdw11 = cpu_to_le32(cmd.cdw11);
+ c.common.cdw12 = cpu_to_le32(cmd.cdw12);
+ c.common.cdw13 = cpu_to_le32(cmd.cdw13);
+ c.common.cdw14 = cpu_to_le32(cmd.cdw14);
+ c.common.cdw15 = cpu_to_le32(cmd.cdw15);
+
+ if (cmd.timeout_ms)
+ timeout = msecs_to_jiffies(cmd.timeout_ms);
+
+ effects = nvme_passthru_start(ctrl, ns, cmd.opcode);
+ status = nvme_submit_user_cmd(ns ? ns->queue : ctrl->admin_q, &c,
+ (void __user *)(uintptr_t)cmd.addr, cmd.data_len,
+ (void __user *)(uintptr_t)cmd.metadata,
+ cmd.metadata_len, 0, &result, timeout);
+ nvme_passthru_end(ctrl, effects);
+
+ if (status >= 0) {
+ if (put_user(result, &ucmd->result))
+ return -EFAULT;
+ }
+
+ return status;
+}
+
+static int nvme_user_cmd64(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
+ struct nvme_passthru_cmd64 __user *ucmd)
+{
+ struct nvme_passthru_cmd64 cmd;
+ struct nvme_command c;
+ unsigned timeout = 0;
+ u32 effects;
int status;
if (!capable(CAP_SYS_ADMIN))
srcu_read_unlock(&head->srcu, idx);
}
+static bool is_ctrl_ioctl(unsigned int cmd)
+{
+ if (cmd == NVME_IOCTL_ADMIN_CMD || cmd == NVME_IOCTL_ADMIN64_CMD)
+ return true;
+ if (is_sed_ioctl(cmd))
+ return true;
+ return false;
+}
+
+static int nvme_handle_ctrl_ioctl(struct nvme_ns *ns, unsigned int cmd,
+ void __user *argp,
+ struct nvme_ns_head *head,
+ int srcu_idx)
+{
+ struct nvme_ctrl *ctrl = ns->ctrl;
+ int ret;
+
+ nvme_get_ctrl(ns->ctrl);
+ nvme_put_ns_from_disk(head, srcu_idx);
+
+ switch (cmd) {
+ case NVME_IOCTL_ADMIN_CMD:
+ ret = nvme_user_cmd(ctrl, NULL, argp);
+ break;
+ case NVME_IOCTL_ADMIN64_CMD:
+ ret = nvme_user_cmd64(ctrl, NULL, argp);
+ break;
+ default:
+ ret = sed_ioctl(ctrl->opal_dev, cmd, argp);
+ break;
+ }
+ nvme_put_ctrl(ctrl);
+ return ret;
+}
+
static int nvme_ioctl(struct block_device *bdev, fmode_t mode,
unsigned int cmd, unsigned long arg)
{
* seperately and drop the ns SRCU reference early. This avoids a
* deadlock when deleting namespaces using the passthrough interface.
*/
- if (cmd == NVME_IOCTL_ADMIN_CMD || is_sed_ioctl(cmd)) {
- struct nvme_ctrl *ctrl = ns->ctrl;
-
- nvme_get_ctrl(ns->ctrl);
- nvme_put_ns_from_disk(head, srcu_idx);
-
- if (cmd == NVME_IOCTL_ADMIN_CMD)
- ret = nvme_user_cmd(ctrl, NULL, argp);
- else
- ret = sed_ioctl(ctrl->opal_dev, cmd, argp);
-
- nvme_put_ctrl(ctrl);
- return ret;
- }
+ if (is_ctrl_ioctl(cmd))
+ return nvme_handle_ctrl_ioctl(ns, cmd, argp, head, srcu_idx);
switch (cmd) {
case NVME_IOCTL_ID:
case NVME_IOCTL_SUBMIT_IO:
ret = nvme_submit_io(ns, argp);
break;
+ case NVME_IOCTL_IO64_CMD:
+ ret = nvme_user_cmd64(ns->ctrl, ns, argp);
+ break;
default:
if (ns->ndev)
ret = nvme_nvm_ioctl(ns, cmd, arg);
.vid = 0x14a4,
.fr = "22301111",
.quirks = NVME_QUIRK_SIMPLE_SUSPEND,
+ },
+ {
+ /*
+ * This Kingston E8FK11.T firmware version has no interrupt
+ * after resume with actions related to suspend to idle
+ * https://bugzilla.kernel.org/show_bug.cgi?id=204887
+ */
+ .vid = 0x2646,
+ .fr = "E8FK11.T",
+ .quirks = NVME_QUIRK_SIMPLE_SUSPEND,
}
};
list_add_tail(&subsys->entry, &nvme_subsystems);
}
- if (sysfs_create_link(&subsys->dev.kobj, &ctrl->device->kobj,
- dev_name(ctrl->device))) {
+ ret = sysfs_create_link(&subsys->dev.kobj, &ctrl->device->kobj,
+ dev_name(ctrl->device));
+ if (ret) {
dev_err(ctrl->device,
"failed to create sysfs link from subsystem.\n");
goto out_put_subsystem;
switch (cmd) {
case NVME_IOCTL_ADMIN_CMD:
return nvme_user_cmd(ctrl, NULL, argp);
+ case NVME_IOCTL_ADMIN64_CMD:
+ return nvme_user_cmd64(ctrl, NULL, argp);
case NVME_IOCTL_IO_CMD:
return nvme_dev_user_cmd(ctrl, argp);
case NVME_IOCTL_RESET:
nvme_show_int_function(cntlid);
nvme_show_int_function(numa_node);
+nvme_show_int_function(queue_count);
+nvme_show_int_function(sqsize);
static ssize_t nvme_sysfs_delete(struct device *dev,
struct device_attribute *attr, const char *buf,
&dev_attr_address.attr,
&dev_attr_state.attr,
&dev_attr_numa_node.attr,
+ &dev_attr_queue_count.attr,
+ &dev_attr_sqsize.attr,
NULL
};
u16 oacs;
u16 nssa;
u16 nr_streams;
+ u16 sqsize;
u32 max_namespaces;
atomic_t abort_limit;
u8 vwc;
u16 hmmaxd;
/* Fabrics only */
- u16 sqsize;
u32 ioccsz;
u32 iorcsz;
u16 icdoff;
if (ret < 0)
goto unfreeze;
+ /*
+ * A saved state prevents pci pm from generically controlling the
+ * device's power. If we're using protocol specific settings, we don't
+ * want pci interfering.
+ */
+ pci_save_state(pdev);
+
ret = nvme_set_power_state(ctrl, ctrl->npss);
if (ret < 0)
goto unfreeze;
if (ret) {
+ /* discard the saved state */
+ pci_load_saved_state(pdev, NULL);
+
/*
* Clearing npss forces a controller reset on resume. The
* correct value will be resdicovered then.
nvme_dev_disable(ndev, true);
ctrl->npss = 0;
ret = 0;
- goto unfreeze;
}
- /*
- * A saved state prevents pci pm from generically controlling the
- * device's power. If we're using protocol specific settings, we don't
- * want pci interfering.
- */
- pci_save_state(pdev);
unfreeze:
nvme_unfreeze(ctrl);
return ret;
.driver_data = NVME_QUIRK_LIGHTNVM, },
{ PCI_DEVICE(0x10ec, 0x5762), /* ADATA SX6000LNP */
.driver_data = NVME_QUIRK_IGNORE_DEV_SUBNQN, },
+ { PCI_DEVICE(0x1cc1, 0x8201), /* ADATA SX8200PNP 512GB */
+ .driver_data = NVME_QUIRK_NO_DEEPEST_PS |
+ NVME_QUIRK_IGNORE_DEV_SUBNQN, },
{ PCI_DEVICE_CLASS(PCI_CLASS_STORAGE_EXPRESS, 0xffffff) },
{ PCI_DEVICE(PCI_VENDOR_ID_APPLE, 0x2001) },
{ PCI_DEVICE(PCI_VENDOR_ID_APPLE, 0x2003) },
static int nvme_rdma_get_max_fr_pages(struct ib_device *ibdev)
{
return min_t(u32, NVME_RDMA_MAX_SEGMENTS,
- ibdev->attrs.max_fast_reg_page_list_len);
+ ibdev->attrs.max_fast_reg_page_list_len - 1);
}
static int nvme_rdma_create_queue_ib(struct nvme_rdma_queue *queue)
const int cq_factor = send_wr_factor + 1; /* + RECV */
int comp_vector, idx = nvme_rdma_queue_idx(queue);
enum ib_poll_context poll_ctx;
- int ret;
+ int ret, pages_per_mr;
queue->device = nvme_rdma_find_get_device(queue->cm_id);
if (!queue->device) {
goto out_destroy_qp;
}
+ /*
+ * Currently we don't use SG_GAPS MR's so if the first entry is
+ * misaligned we'll end up using two entries for a single data page,
+ * so one additional entry is required.
+ */
+ pages_per_mr = nvme_rdma_get_max_fr_pages(ibdev) + 1;
ret = ib_mr_pool_init(queue->qp, &queue->qp->rdma_mrs,
queue->queue_size,
IB_MR_TYPE_MEM_REG,
- nvme_rdma_get_max_fr_pages(ibdev), 0);
+ pages_per_mr, 0);
if (ret) {
dev_err(queue->ctrl->ctrl.device,
"failed to initialize MR pool sized %d for QID %d\n",
if (!ret) {
set_bit(NVME_RDMA_Q_LIVE, &queue->flags);
} else {
- __nvme_rdma_stop_queue(queue);
+ if (test_bit(NVME_RDMA_Q_ALLOCATED, &queue->flags))
+ __nvme_rdma_stop_queue(queue);
dev_info(ctrl->ctrl.device,
"failed to connect queue: %d ret=%d\n", idx, ret);
}
if (error)
goto out_stop_queue;
- ctrl->ctrl.max_hw_sectors =
- (ctrl->max_fr_pages - 1) << (ilog2(SZ_4K) - 9);
+ ctrl->ctrl.max_segments = ctrl->max_fr_pages;
+ ctrl->ctrl.max_hw_sectors = ctrl->max_fr_pages << (ilog2(SZ_4K) - 9);
blk_mq_unquiesce_queue(ctrl->ctrl.admin_q);
{
struct nvme_tcp_queue *queue =
container_of(w, struct nvme_tcp_queue, io_work);
- unsigned long start = jiffies + msecs_to_jiffies(1);
+ unsigned long deadline = jiffies + msecs_to_jiffies(1);
do {
bool pending = false;
if (!pending)
return;
- } while (time_after(jiffies, start)); /* quota is exhausted */
+ } while (!time_after(jiffies, deadline)); /* quota is exhausted */
queue_work_on(queue->io_cpu, nvme_tcp_wq, &queue->io_work);
}
void nvmet_bdev_set_limits(struct block_device *bdev, struct nvme_id_ns *id)
{
const struct queue_limits *ql = &bdev_get_queue(bdev)->limits;
- /* Number of physical blocks per logical block. */
- const u32 ppl = ql->physical_block_size / ql->logical_block_size;
- /* Physical blocks per logical block, 0's based. */
- const __le16 ppl0b = to0based(ppl);
+ /* Number of logical blocks per physical block. */
+ const u32 lpp = ql->physical_block_size / ql->logical_block_size;
+ /* Logical blocks per physical block, 0's based. */
+ const __le16 lpp0b = to0based(lpp);
/*
* For NVMe 1.2 and later, bit 1 indicates that the fields NAWUN,
* field from the identify controller data structure should be used.
*/
id->nsfeat |= 1 << 1;
- id->nawun = ppl0b;
- id->nawupf = ppl0b;
- id->nacwu = ppl0b;
+ id->nawun = lpp0b;
+ id->nawupf = lpp0b;
+ id->nacwu = lpp0b;
/*
* Bit 4 indicates that the fields NPWG, NPWA, NPDG, NPDA, and
*/
id->nsfeat |= 1 << 4;
/* NPWG = Namespace Preferred Write Granularity. 0's based */
- id->npwg = ppl0b;
+ id->npwg = lpp0b;
/* NPWA = Namespace Preferred Write Alignment. 0's based */
id->npwa = id->npwg;
/* NPDG = Namespace Preferred Deallocate Granularity. 0's based */
return 0;
err:
- if (cmd->req.sg_cnt)
- sgl_free(cmd->req.sg);
+ sgl_free(cmd->req.sg);
return NVME_SC_INTERNAL;
}
if (queue->nvme_sq.sqhd_disabled) {
kfree(cmd->iov);
- if (cmd->req.sg_cnt)
- sgl_free(cmd->req.sg);
+ sgl_free(cmd->req.sg);
}
return 1;
return -EAGAIN;
kfree(cmd->iov);
- if (cmd->req.sg_cnt)
- sgl_free(cmd->req.sg);
+ sgl_free(cmd->req.sg);
cmd->queue->snd_cmd = NULL;
nvmet_tcp_put_cmd(cmd);
return 1;
nvmet_req_uninit(&cmd->req);
nvmet_tcp_unmap_pdu_iovec(cmd);
kfree(cmd->iov);
- if (cmd->req.sg_cnt)
- sgl_free(cmd->req.sg);
+ sgl_free(cmd->req.sg);
}
static void nvmet_tcp_uninit_data_in_cmds(struct nvmet_tcp_queue *queue)
int ret;
iface = of_get_phy_mode(np);
- if (iface < 0)
+ if ((int)iface < 0)
return NULL;
if (of_phy_is_fixed_link(np)) {
ret = of_phy_register_fixed_link(np);
unsigned int size = count;
loff_t init_off = off;
u8 *data = (u8 *) buf;
+ int ret;
+
+ ret = security_locked_down(LOCKDOWN_PCI_ACCESS);
+ if (ret)
+ return ret;
if (off > dev->cfg_size)
return 0;
int bar = (unsigned long)attr->private;
enum pci_mmap_state mmap_type;
struct resource *res = &pdev->resource[bar];
+ int ret;
+
+ ret = security_locked_down(LOCKDOWN_PCI_ACCESS);
+ if (ret)
+ return ret;
if (res->flags & IORESOURCE_MEM && iomem_is_exclusive(res->start))
return -EINVAL;
struct bin_attribute *attr, char *buf,
loff_t off, size_t count)
{
+ int ret;
+
+ ret = security_locked_down(LOCKDOWN_PCI_ACCESS);
+ if (ret)
+ return ret;
+
return pci_resource_io(filp, kobj, attr, buf, off, count, true);
}
#include <linux/seq_file.h>
#include <linux/capability.h>
#include <linux/uaccess.h>
+#include <linux/security.h>
#include <asm/byteorder.h>
#include "pci.h"
struct pci_dev *dev = PDE_DATA(ino);
int pos = *ppos;
int size = dev->cfg_size;
- int cnt;
+ int cnt, ret;
+
+ ret = security_locked_down(LOCKDOWN_PCI_ACCESS);
+ if (ret)
+ return ret;
if (pos >= size)
return 0;
#endif /* HAVE_PCI_MMAP */
int ret = 0;
+ ret = security_locked_down(LOCKDOWN_PCI_ACCESS);
+ if (ret)
+ return ret;
+
switch (cmd) {
case PCIIOC_CONTROLLER:
ret = pci_domain_nr(dev->bus);
struct pci_filp_private *fpriv = file->private_data;
int i, ret, write_combine = 0, res_bit = IORESOURCE_MEM;
- if (!capable(CAP_SYS_RAWIO))
+ if (!capable(CAP_SYS_RAWIO) ||
+ security_locked_down(LOCKDOWN_PCI_ACCESS))
return -EPERM;
if (fpriv->mmap_state == pci_mmap_io) {
#include <linux/errno.h>
#include <linux/pci.h>
+#include <linux/security.h>
#include <linux/syscalls.h>
#include <linux/uaccess.h>
#include "pci.h"
u32 dword;
int err = 0;
- if (!capable(CAP_SYS_ADMIN))
+ if (!capable(CAP_SYS_ADMIN) ||
+ security_locked_down(LOCKDOWN_PCI_ACCESS))
return -EPERM;
dev = pci_get_domain_bus_and_slot(0, bus, dfn);
#include <linux/pci.h>
#include <linux/ioport.h>
#include <linux/io.h>
+#include <linux/security.h>
#include <asm/byteorder.h>
#include <asm/unaligned.h>
struct pcmcia_socket *s;
int error;
+ error = security_locked_down(LOCKDOWN_PCMCIA_CIS);
+ if (error)
+ return error;
+
s = to_socket(container_of(kobj, struct device, kobj));
if (off)
err = -EINVAL;
break;
} else if (cmd == PTP_EXTTS_REQUEST) {
- req.extts.flags &= ~PTP_EXTTS_VALID_FLAGS;
+ req.extts.flags &= PTP_EXTTS_V1_VALID_FLAGS;
req.extts.rsv[0] = 0;
req.extts.rsv[1] = 0;
}
err = -EINVAL;
break;
} else if (cmd == PTP_PEROUT_REQUEST) {
- req.perout.flags &= ~PTP_PEROUT_VALID_FLAGS;
+ req.perout.flags &= PTP_PEROUT_V1_VALID_FLAGS;
req.perout.rsv[0] = 0;
req.perout.rsv[1] = 0;
req.perout.rsv[2] = 0;
ptp_qoriq->regs.etts_regs = base + ETTS_REGS_OFFSET;
}
+ spin_lock_init(&ptp_qoriq->lock);
+
ktime_get_real_ts64(&now);
ptp_qoriq_settime(&ptp_qoriq->caps, &now);
(ptp_qoriq->tclk_period & TCLK_PERIOD_MASK) << TCLK_PERIOD_SHIFT |
(ptp_qoriq->cksel & CKSEL_MASK) << CKSEL_SHIFT;
- spin_lock_init(&ptp_qoriq->lock);
spin_lock_irqsave(&ptp_qoriq->lock, flags);
regs = &ptp_qoriq->regs;
config PWM_ATMEL
tristate "Atmel PWM support"
- depends on ARCH_AT91
+ depends on ARCH_AT91 && OF
help
Generic PWM framework driver for Atmel SoC.
To compile this driver as a module, choose M here: the module
will be called pwm-spear.
+config PWM_SPRD
+ tristate "Spreadtrum PWM support"
+ depends on ARCH_SPRD || COMPILE_TEST
+ depends on HAS_IOMEM
+ help
+ Generic PWM framework driver for the PWM controller on
+ Spreadtrum SoCs.
+
+ To compile this driver as a module, choose M here: the module
+ will be called pwm-sprd.
+
config PWM_STI
tristate "STiH4xx PWM support"
depends on ARCH_STI
obj-$(CONFIG_PWM_SAMSUNG) += pwm-samsung.o
obj-$(CONFIG_PWM_SIFIVE) += pwm-sifive.o
obj-$(CONFIG_PWM_SPEAR) += pwm-spear.o
+obj-$(CONFIG_PWM_SPRD) += pwm-sprd.o
obj-$(CONFIG_PWM_STI) += pwm-sti.o
obj-$(CONFIG_PWM_STM32) += pwm-stm32.o
obj-$(CONFIG_PWM_STM32_LP) += pwm-stm32-lp.o
/**
* pwm_apply_state() - atomically apply a new state to a PWM device
* @pwm: PWM device
- * @state: new state to apply. This can be adjusted by the PWM driver
- * if the requested config is not achievable, for example,
- * ->duty_cycle and ->period might be approximated.
+ * @state: new state to apply
*/
-int pwm_apply_state(struct pwm_device *pwm, struct pwm_state *state)
+int pwm_apply_state(struct pwm_device *pwm, const struct pwm_state *state)
{
+ struct pwm_chip *chip;
int err;
if (!pwm || !state || !state->period ||
state->duty_cycle > state->period)
return -EINVAL;
+ chip = pwm->chip;
+
if (state->period == pwm->state.period &&
state->duty_cycle == pwm->state.duty_cycle &&
state->polarity == pwm->state.polarity &&
state->enabled == pwm->state.enabled)
return 0;
- if (pwm->chip->ops->apply) {
- err = pwm->chip->ops->apply(pwm->chip, pwm, state);
+ if (chip->ops->apply) {
+ err = chip->ops->apply(chip, pwm, state);
if (err)
return err;
- pwm->state = *state;
+ /*
+ * .apply might have to round some values in *state, if possible
+ * read the actually implemented value back.
+ */
+ if (chip->ops->get_state)
+ chip->ops->get_state(chip, pwm, &pwm->state);
+ else
+ pwm->state = *state;
} else {
/*
* FIXME: restore the initial state in case of error.
*/
if (state->polarity != pwm->state.polarity) {
- if (!pwm->chip->ops->set_polarity)
+ if (!chip->ops->set_polarity)
return -ENOTSUPP;
/*
* ->apply().
*/
if (pwm->state.enabled) {
- pwm->chip->ops->disable(pwm->chip, pwm);
+ chip->ops->disable(chip, pwm);
pwm->state.enabled = false;
}
- err = pwm->chip->ops->set_polarity(pwm->chip, pwm,
- state->polarity);
+ err = chip->ops->set_polarity(chip, pwm,
+ state->polarity);
if (err)
return err;
if (state->period != pwm->state.period ||
state->duty_cycle != pwm->state.duty_cycle) {
- err = pwm->chip->ops->config(pwm->chip, pwm,
- state->duty_cycle,
- state->period);
+ err = chip->ops->config(pwm->chip, pwm,
+ state->duty_cycle,
+ state->period);
if (err)
return err;
if (state->enabled != pwm->state.enabled) {
if (state->enabled) {
- err = pwm->chip->ops->enable(pwm->chip, pwm);
+ err = chip->ops->enable(chip, pwm);
if (err)
return err;
} else {
- pwm->chip->ops->disable(pwm->chip, pwm);
+ chip->ops->disable(chip, pwm);
}
pwm->state.enabled = state->enabled;
}
static int atmel_hlcdc_pwm_apply(struct pwm_chip *c, struct pwm_device *pwm,
- struct pwm_state *state)
+ const struct pwm_state *state)
{
struct atmel_hlcdc_pwm *chip = to_atmel_hlcdc_pwm(c);
struct atmel_hlcdc *hlcdc = chip->hlcdc;
}
static int atmel_pwm_apply(struct pwm_chip *chip, struct pwm_device *pwm,
- struct pwm_state *state)
+ const struct pwm_state *state)
{
struct atmel_pwm_chip *atmel_pwm = to_atmel_pwm_chip(chip);
struct pwm_state cstate;
},
};
-static const struct platform_device_id atmel_pwm_devtypes[] = {
- {
- .name = "at91sam9rl-pwm",
- .driver_data = (kernel_ulong_t)&atmel_sam9rl_pwm_data,
- }, {
- .name = "sama5d3-pwm",
- .driver_data = (kernel_ulong_t)&atmel_sama5_pwm_data,
- }, {
- /* sentinel */
- },
-};
-MODULE_DEVICE_TABLE(platform, atmel_pwm_devtypes);
-
static const struct of_device_id atmel_pwm_dt_ids[] = {
{
.compatible = "atmel,at91sam9rl-pwm",
};
MODULE_DEVICE_TABLE(of, atmel_pwm_dt_ids);
-static inline const struct atmel_pwm_data *
-atmel_pwm_get_driver_data(struct platform_device *pdev)
-{
- const struct platform_device_id *id;
-
- if (pdev->dev.of_node)
- return of_device_get_match_data(&pdev->dev);
-
- id = platform_get_device_id(pdev);
-
- return (struct atmel_pwm_data *)id->driver_data;
-}
-
static int atmel_pwm_probe(struct platform_device *pdev)
{
- const struct atmel_pwm_data *data;
struct atmel_pwm_chip *atmel_pwm;
struct resource *res;
int ret;
- data = atmel_pwm_get_driver_data(pdev);
- if (!data)
- return -ENODEV;
-
atmel_pwm = devm_kzalloc(&pdev->dev, sizeof(*atmel_pwm), GFP_KERNEL);
if (!atmel_pwm)
return -ENOMEM;
+ mutex_init(&atmel_pwm->isr_lock);
+ atmel_pwm->data = of_device_get_match_data(&pdev->dev);
+ atmel_pwm->updated_pwms = 0;
+
res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
atmel_pwm->base = devm_ioremap_resource(&pdev->dev, res);
if (IS_ERR(atmel_pwm->base))
atmel_pwm->chip.dev = &pdev->dev;
atmel_pwm->chip.ops = &atmel_pwm_ops;
-
- if (pdev->dev.of_node) {
- atmel_pwm->chip.of_xlate = of_pwm_xlate_with_flags;
- atmel_pwm->chip.of_pwm_n_cells = 3;
- }
-
+ atmel_pwm->chip.of_xlate = of_pwm_xlate_with_flags;
+ atmel_pwm->chip.of_pwm_n_cells = 3;
atmel_pwm->chip.base = -1;
atmel_pwm->chip.npwm = 4;
- atmel_pwm->data = data;
- atmel_pwm->updated_pwms = 0;
- mutex_init(&atmel_pwm->isr_lock);
ret = pwmchip_add(&atmel_pwm->chip);
if (ret < 0) {
.name = "atmel-pwm",
.of_match_table = of_match_ptr(atmel_pwm_dt_ids),
},
- .id_table = atmel_pwm_devtypes,
.probe = atmel_pwm_probe,
.remove = atmel_pwm_remove,
};
}
static int iproc_pwmc_apply(struct pwm_chip *chip, struct pwm_device *pwm,
- struct pwm_state *state)
+ const struct pwm_state *state)
{
unsigned long prescale = IPROC_PWM_PRESCALE_MIN;
struct iproc_pwmc *ip = to_iproc_pwmc(chip);
#define PERIOD(x) (((x) * 0x10) + 0x10)
#define DUTY(x) (((x) * 0x10) + 0x14)
-#define MIN_PERIOD 108 /* 9.2 MHz max. PWM clock */
+#define PERIOD_MIN 0x2
struct bcm2835_pwm {
struct pwm_chip chip;
struct bcm2835_pwm *pc = to_bcm2835_pwm(chip);
unsigned long rate = clk_get_rate(pc->clk);
unsigned long scaler;
+ u32 period;
if (!rate) {
dev_err(pc->dev, "failed to get clock rate\n");
}
scaler = DIV_ROUND_CLOSEST(NSEC_PER_SEC, rate);
+ period = DIV_ROUND_CLOSEST(period_ns, scaler);
- if (period_ns <= MIN_PERIOD) {
- dev_err(pc->dev, "period %d not supported, minimum %d\n",
- period_ns, MIN_PERIOD);
+ if (period < PERIOD_MIN)
return -EINVAL;
- }
writel(DIV_ROUND_CLOSEST(duty_ns, scaler),
pc->base + DUTY(pwm->hwpwm));
- writel(DIV_ROUND_CLOSEST(period_ns, scaler),
- pc->base + PERIOD(pwm->hwpwm));
+ writel(period, pc->base + PERIOD(pwm->hwpwm));
return 0;
}
pc->clk = devm_clk_get(&pdev->dev, NULL);
if (IS_ERR(pc->clk)) {
- dev_err(&pdev->dev, "clock not found: %ld\n", PTR_ERR(pc->clk));
- return PTR_ERR(pc->clk);
+ ret = PTR_ERR(pc->clk);
+ if (ret != -EPROBE_DEFER)
+ dev_err(&pdev->dev, "clock not found: %d\n", ret);
+
+ return ret;
}
ret = clk_prepare_enable(pc->clk);
}
static int cros_ec_pwm_apply(struct pwm_chip *chip, struct pwm_device *pwm,
- struct pwm_state *state)
+ const struct pwm_state *state)
{
struct cros_ec_pwm_device *ec_pwm = pwm_to_cros_ec_pwm(chip);
int duty_cycle;
static int fsl_pwm_apply_config(struct fsl_pwm_chip *fpc,
struct pwm_device *pwm,
- struct pwm_state *newstate)
+ const struct pwm_state *newstate)
{
unsigned int duty;
u32 reg_polarity;
regmap_update_bits(fpc->regmap, FTM_POL, BIT(pwm->hwpwm), reg_polarity);
- newstate->period = fsl_pwm_ticks_to_ns(fpc,
- fpc->period.mod_period + 1);
- newstate->duty_cycle = fsl_pwm_ticks_to_ns(fpc, duty);
-
ftm_set_write_protection(fpc);
return 0;
}
static int fsl_pwm_apply(struct pwm_chip *chip, struct pwm_device *pwm,
- struct pwm_state *newstate)
+ const struct pwm_state *newstate)
{
struct fsl_pwm_chip *fpc = to_fsl_chip(chip);
struct pwm_state *oldstate = &pwm->state;
}
static int hibvt_pwm_apply(struct pwm_chip *chip, struct pwm_device *pwm,
- struct pwm_state *state)
+ const struct pwm_state *state)
{
struct hibvt_pwm_chip *hi_pwm_chip = to_hibvt_pwm_chip(chip);
static int pwm_imx_tpm_round_state(struct pwm_chip *chip,
struct imx_tpm_pwm_param *p,
struct pwm_state *real_state,
- struct pwm_state *state)
+ const struct pwm_state *state)
{
struct imx_tpm_pwm_chip *tpm = to_imx_tpm_pwm_chip(chip);
u32 rate, prescale, period_count, clock_unit;
static int pwm_imx_tpm_apply(struct pwm_chip *chip,
struct pwm_device *pwm,
- struct pwm_state *state)
+ const struct pwm_state *state)
{
struct imx_tpm_pwm_chip *tpm = to_imx_tpm_pwm_chip(chip);
struct imx_tpm_pwm_param param;
* simple driver for PWM (Pulse Width Modulator) controller
*
* Derived from pxa PWM driver by eric miao <eric.miao@marvell.com>
+ *
+ * Limitations:
+ * - When disabled the output is driven to 0 independent of the configured
+ * polarity.
*/
#include <linux/bitfield.h>
}
static int pwm_imx27_apply(struct pwm_chip *chip, struct pwm_device *pwm,
- struct pwm_state *state)
+ const struct pwm_state *state)
{
unsigned long period_cycles, duty_cycles, prescale;
struct pwm_imx27_chip *imx = to_pwm_imx27_chip(chip);
/*
* Copyright (C) 2010, Lars-Peter Clausen <lars@metafoo.de>
* JZ4740 platform PWM support
+ *
+ * Limitations:
+ * - The .apply callback doesn't complete the currently running period before
+ * reconfiguring the hardware.
+ * - Each period starts with the inactive part.
*/
#include <linux/clk.h>
}
static int jz4740_pwm_apply(struct pwm_chip *chip, struct pwm_device *pwm,
- struct pwm_state *state)
+ const struct pwm_state *state)
{
struct jz4740_pwm_chip *jz4740 = to_jz4740(pwm->chip);
unsigned long long tmp;
}
static int pwm_lpss_apply(struct pwm_chip *chip, struct pwm_device *pwm,
- struct pwm_state *state)
+ const struct pwm_state *state)
{
struct pwm_lpss_chip *lpwm = to_lpwm(chip);
int ret;
+// SPDX-License-Identifier: GPL-2.0
/*
- * Mediatek Pulse Width Modulator driver
+ * MediaTek Pulse Width Modulator driver
*
* Copyright (C) 2015 John Crispin <blogic@openwrt.org>
* Copyright (C) 2017 Zhi Mao <zhi.mao@mediatek.com>
*
- * This file is licensed under the terms of the GNU General Public
- * License version 2. This program is licensed "as is" without any
- * warranty of any kind, whether express or implied.
*/
#include <linux/err.h>
#define PWM_CLK_DIV_MAX 7
-enum {
- MTK_CLK_MAIN = 0,
- MTK_CLK_TOP,
- MTK_CLK_PWM1,
- MTK_CLK_PWM2,
- MTK_CLK_PWM3,
- MTK_CLK_PWM4,
- MTK_CLK_PWM5,
- MTK_CLK_PWM6,
- MTK_CLK_PWM7,
- MTK_CLK_PWM8,
- MTK_CLK_MAX,
-};
-
-static const char * const mtk_pwm_clk_name[MTK_CLK_MAX] = {
- "main", "top", "pwm1", "pwm2", "pwm3", "pwm4", "pwm5", "pwm6", "pwm7",
- "pwm8"
-};
-
-struct mtk_pwm_platform_data {
+struct pwm_mediatek_of_data {
unsigned int num_pwms;
bool pwm45_fixup;
- bool has_clks;
};
/**
- * struct mtk_pwm_chip - struct representing PWM chip
+ * struct pwm_mediatek_chip - struct representing PWM chip
* @chip: linux PWM chip representation
* @regs: base address of PWM chip
- * @clks: list of clocks
+ * @clk_top: the top clock generator
+ * @clk_main: the clock used by PWM core
+ * @clk_pwms: the clock used by each PWM channel
+ * @clk_freq: the fix clock frequency of legacy MIPS SoC
*/
-struct mtk_pwm_chip {
+struct pwm_mediatek_chip {
struct pwm_chip chip;
void __iomem *regs;
- struct clk *clks[MTK_CLK_MAX];
- const struct mtk_pwm_platform_data *soc;
+ struct clk *clk_top;
+ struct clk *clk_main;
+ struct clk **clk_pwms;
+ const struct pwm_mediatek_of_data *soc;
};
-static const unsigned int mtk_pwm_reg_offset[] = {
+static const unsigned int pwm_mediatek_reg_offset[] = {
0x0010, 0x0050, 0x0090, 0x00d0, 0x0110, 0x0150, 0x0190, 0x0220
};
-static inline struct mtk_pwm_chip *to_mtk_pwm_chip(struct pwm_chip *chip)
+static inline struct pwm_mediatek_chip *
+to_pwm_mediatek_chip(struct pwm_chip *chip)
{
- return container_of(chip, struct mtk_pwm_chip, chip);
+ return container_of(chip, struct pwm_mediatek_chip, chip);
}
-static int mtk_pwm_clk_enable(struct pwm_chip *chip, struct pwm_device *pwm)
+static int pwm_mediatek_clk_enable(struct pwm_chip *chip,
+ struct pwm_device *pwm)
{
- struct mtk_pwm_chip *pc = to_mtk_pwm_chip(chip);
+ struct pwm_mediatek_chip *pc = to_pwm_mediatek_chip(chip);
int ret;
- if (!pc->soc->has_clks)
- return 0;
-
- ret = clk_prepare_enable(pc->clks[MTK_CLK_TOP]);
+ ret = clk_prepare_enable(pc->clk_top);
if (ret < 0)
return ret;
- ret = clk_prepare_enable(pc->clks[MTK_CLK_MAIN]);
+ ret = clk_prepare_enable(pc->clk_main);
if (ret < 0)
goto disable_clk_top;
- ret = clk_prepare_enable(pc->clks[MTK_CLK_PWM1 + pwm->hwpwm]);
+ ret = clk_prepare_enable(pc->clk_pwms[pwm->hwpwm]);
if (ret < 0)
goto disable_clk_main;
return 0;
disable_clk_main:
- clk_disable_unprepare(pc->clks[MTK_CLK_MAIN]);
+ clk_disable_unprepare(pc->clk_main);
disable_clk_top:
- clk_disable_unprepare(pc->clks[MTK_CLK_TOP]);
+ clk_disable_unprepare(pc->clk_top);
return ret;
}
-static void mtk_pwm_clk_disable(struct pwm_chip *chip, struct pwm_device *pwm)
+static void pwm_mediatek_clk_disable(struct pwm_chip *chip,
+ struct pwm_device *pwm)
{
- struct mtk_pwm_chip *pc = to_mtk_pwm_chip(chip);
-
- if (!pc->soc->has_clks)
- return;
+ struct pwm_mediatek_chip *pc = to_pwm_mediatek_chip(chip);
- clk_disable_unprepare(pc->clks[MTK_CLK_PWM1 + pwm->hwpwm]);
- clk_disable_unprepare(pc->clks[MTK_CLK_MAIN]);
- clk_disable_unprepare(pc->clks[MTK_CLK_TOP]);
+ clk_disable_unprepare(pc->clk_pwms[pwm->hwpwm]);
+ clk_disable_unprepare(pc->clk_main);
+ clk_disable_unprepare(pc->clk_top);
}
-static inline u32 mtk_pwm_readl(struct mtk_pwm_chip *chip, unsigned int num,
- unsigned int offset)
+static inline u32 pwm_mediatek_readl(struct pwm_mediatek_chip *chip,
+ unsigned int num, unsigned int offset)
{
- return readl(chip->regs + mtk_pwm_reg_offset[num] + offset);
+ return readl(chip->regs + pwm_mediatek_reg_offset[num] + offset);
}
-static inline void mtk_pwm_writel(struct mtk_pwm_chip *chip,
- unsigned int num, unsigned int offset,
- u32 value)
+static inline void pwm_mediatek_writel(struct pwm_mediatek_chip *chip,
+ unsigned int num, unsigned int offset,
+ u32 value)
{
- writel(value, chip->regs + mtk_pwm_reg_offset[num] + offset);
+ writel(value, chip->regs + pwm_mediatek_reg_offset[num] + offset);
}
-static int mtk_pwm_config(struct pwm_chip *chip, struct pwm_device *pwm,
- int duty_ns, int period_ns)
+static int pwm_mediatek_config(struct pwm_chip *chip, struct pwm_device *pwm,
+ int duty_ns, int period_ns)
{
- struct mtk_pwm_chip *pc = to_mtk_pwm_chip(chip);
- struct clk *clk = pc->clks[MTK_CLK_PWM1 + pwm->hwpwm];
+ struct pwm_mediatek_chip *pc = to_pwm_mediatek_chip(chip);
u32 clkdiv = 0, cnt_period, cnt_duty, reg_width = PWMDWIDTH,
reg_thres = PWMTHRES;
u64 resolution;
int ret;
- ret = mtk_pwm_clk_enable(chip, pwm);
+ ret = pwm_mediatek_clk_enable(chip, pwm);
+
if (ret < 0)
return ret;
/* Using resolution in picosecond gets accuracy higher */
resolution = (u64)NSEC_PER_SEC * 1000;
- do_div(resolution, clk_get_rate(clk));
+ do_div(resolution, clk_get_rate(pc->clk_pwms[pwm->hwpwm]));
cnt_period = DIV_ROUND_CLOSEST_ULL((u64)period_ns * 1000, resolution);
while (cnt_period > 8191) {
}
if (clkdiv > PWM_CLK_DIV_MAX) {
- mtk_pwm_clk_disable(chip, pwm);
+ pwm_mediatek_clk_disable(chip, pwm);
dev_err(chip->dev, "period %d not supported\n", period_ns);
return -EINVAL;
}
}
cnt_duty = DIV_ROUND_CLOSEST_ULL((u64)duty_ns * 1000, resolution);
- mtk_pwm_writel(pc, pwm->hwpwm, PWMCON, BIT(15) | clkdiv);
- mtk_pwm_writel(pc, pwm->hwpwm, reg_width, cnt_period);
- mtk_pwm_writel(pc, pwm->hwpwm, reg_thres, cnt_duty);
+ pwm_mediatek_writel(pc, pwm->hwpwm, PWMCON, BIT(15) | clkdiv);
+ pwm_mediatek_writel(pc, pwm->hwpwm, reg_width, cnt_period);
+ pwm_mediatek_writel(pc, pwm->hwpwm, reg_thres, cnt_duty);
- mtk_pwm_clk_disable(chip, pwm);
+ pwm_mediatek_clk_disable(chip, pwm);
return 0;
}
-static int mtk_pwm_enable(struct pwm_chip *chip, struct pwm_device *pwm)
+static int pwm_mediatek_enable(struct pwm_chip *chip, struct pwm_device *pwm)
{
- struct mtk_pwm_chip *pc = to_mtk_pwm_chip(chip);
+ struct pwm_mediatek_chip *pc = to_pwm_mediatek_chip(chip);
u32 value;
int ret;
- ret = mtk_pwm_clk_enable(chip, pwm);
+ ret = pwm_mediatek_clk_enable(chip, pwm);
if (ret < 0)
return ret;
return 0;
}
-static void mtk_pwm_disable(struct pwm_chip *chip, struct pwm_device *pwm)
+static void pwm_mediatek_disable(struct pwm_chip *chip, struct pwm_device *pwm)
{
- struct mtk_pwm_chip *pc = to_mtk_pwm_chip(chip);
+ struct pwm_mediatek_chip *pc = to_pwm_mediatek_chip(chip);
u32 value;
value = readl(pc->regs);
value &= ~BIT(pwm->hwpwm);
writel(value, pc->regs);
- mtk_pwm_clk_disable(chip, pwm);
+ pwm_mediatek_clk_disable(chip, pwm);
}
-static const struct pwm_ops mtk_pwm_ops = {
- .config = mtk_pwm_config,
- .enable = mtk_pwm_enable,
- .disable = mtk_pwm_disable,
+static const struct pwm_ops pwm_mediatek_ops = {
+ .config = pwm_mediatek_config,
+ .enable = pwm_mediatek_enable,
+ .disable = pwm_mediatek_disable,
.owner = THIS_MODULE,
};
-static int mtk_pwm_probe(struct platform_device *pdev)
+static int pwm_mediatek_probe(struct platform_device *pdev)
{
- const struct mtk_pwm_platform_data *data;
- struct mtk_pwm_chip *pc;
+ struct pwm_mediatek_chip *pc;
struct resource *res;
unsigned int i;
int ret;
if (!pc)
return -ENOMEM;
- data = of_device_get_match_data(&pdev->dev);
- if (data == NULL)
- return -EINVAL;
- pc->soc = data;
+ pc->soc = of_device_get_match_data(&pdev->dev);
res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
pc->regs = devm_ioremap_resource(&pdev->dev, res);
if (IS_ERR(pc->regs))
return PTR_ERR(pc->regs);
- for (i = 0; i < data->num_pwms + 2 && pc->soc->has_clks; i++) {
- pc->clks[i] = devm_clk_get(&pdev->dev, mtk_pwm_clk_name[i]);
- if (IS_ERR(pc->clks[i])) {
+ pc->clk_pwms = devm_kcalloc(&pdev->dev, pc->soc->num_pwms,
+ sizeof(*pc->clk_pwms), GFP_KERNEL);
+ if (!pc->clk_pwms)
+ return -ENOMEM;
+
+ pc->clk_top = devm_clk_get(&pdev->dev, "top");
+ if (IS_ERR(pc->clk_top)) {
+ dev_err(&pdev->dev, "clock: top fail: %ld\n",
+ PTR_ERR(pc->clk_top));
+ return PTR_ERR(pc->clk_top);
+ }
+
+ pc->clk_main = devm_clk_get(&pdev->dev, "main");
+ if (IS_ERR(pc->clk_main)) {
+ dev_err(&pdev->dev, "clock: main fail: %ld\n",
+ PTR_ERR(pc->clk_main));
+ return PTR_ERR(pc->clk_main);
+ }
+
+ for (i = 0; i < pc->soc->num_pwms; i++) {
+ char name[8];
+
+ snprintf(name, sizeof(name), "pwm%d", i + 1);
+
+ pc->clk_pwms[i] = devm_clk_get(&pdev->dev, name);
+ if (IS_ERR(pc->clk_pwms[i])) {
dev_err(&pdev->dev, "clock: %s fail: %ld\n",
- mtk_pwm_clk_name[i], PTR_ERR(pc->clks[i]));
- return PTR_ERR(pc->clks[i]);
+ name, PTR_ERR(pc->clk_pwms[i]));
+ return PTR_ERR(pc->clk_pwms[i]);
}
}
platform_set_drvdata(pdev, pc);
pc->chip.dev = &pdev->dev;
- pc->chip.ops = &mtk_pwm_ops;
+ pc->chip.ops = &pwm_mediatek_ops;
pc->chip.base = -1;
- pc->chip.npwm = data->num_pwms;
+ pc->chip.npwm = pc->soc->num_pwms;
ret = pwmchip_add(&pc->chip);
if (ret < 0) {
return 0;
}
-static int mtk_pwm_remove(struct platform_device *pdev)
+static int pwm_mediatek_remove(struct platform_device *pdev)
{
- struct mtk_pwm_chip *pc = platform_get_drvdata(pdev);
+ struct pwm_mediatek_chip *pc = platform_get_drvdata(pdev);
return pwmchip_remove(&pc->chip);
}
-static const struct mtk_pwm_platform_data mt2712_pwm_data = {
+static const struct pwm_mediatek_of_data mt2712_pwm_data = {
.num_pwms = 8,
.pwm45_fixup = false,
- .has_clks = true,
};
-static const struct mtk_pwm_platform_data mt7622_pwm_data = {
+static const struct pwm_mediatek_of_data mt7622_pwm_data = {
.num_pwms = 6,
.pwm45_fixup = false,
- .has_clks = true,
};
-static const struct mtk_pwm_platform_data mt7623_pwm_data = {
+static const struct pwm_mediatek_of_data mt7623_pwm_data = {
.num_pwms = 5,
.pwm45_fixup = true,
- .has_clks = true,
};
-static const struct mtk_pwm_platform_data mt7628_pwm_data = {
+static const struct pwm_mediatek_of_data mt7628_pwm_data = {
.num_pwms = 4,
.pwm45_fixup = true,
- .has_clks = false,
};
-static const struct of_device_id mtk_pwm_of_match[] = {
+static const struct pwm_mediatek_of_data mt7629_pwm_data = {
+ .num_pwms = 1,
+ .pwm45_fixup = false,
+};
+
+static const struct pwm_mediatek_of_data mt8516_pwm_data = {
+ .num_pwms = 5,
+ .pwm45_fixup = false,
+};
+
+static const struct of_device_id pwm_mediatek_of_match[] = {
{ .compatible = "mediatek,mt2712-pwm", .data = &mt2712_pwm_data },
{ .compatible = "mediatek,mt7622-pwm", .data = &mt7622_pwm_data },
{ .compatible = "mediatek,mt7623-pwm", .data = &mt7623_pwm_data },
{ .compatible = "mediatek,mt7628-pwm", .data = &mt7628_pwm_data },
+ { .compatible = "mediatek,mt7629-pwm", .data = &mt7629_pwm_data },
+ { .compatible = "mediatek,mt8516-pwm", .data = &mt8516_pwm_data },
{ },
};
-MODULE_DEVICE_TABLE(of, mtk_pwm_of_match);
+MODULE_DEVICE_TABLE(of, pwm_mediatek_of_match);
-static struct platform_driver mtk_pwm_driver = {
+static struct platform_driver pwm_mediatek_driver = {
.driver = {
- .name = "mtk-pwm",
- .of_match_table = mtk_pwm_of_match,
+ .name = "pwm-mediatek",
+ .of_match_table = pwm_mediatek_of_match,
},
- .probe = mtk_pwm_probe,
- .remove = mtk_pwm_remove,
+ .probe = pwm_mediatek_probe,
+ .remove = pwm_mediatek_remove,
};
-module_platform_driver(mtk_pwm_driver);
+module_platform_driver(pwm_mediatek_driver);
MODULE_AUTHOR("John Crispin <blogic@openwrt.org>");
-MODULE_LICENSE("GPL");
+MODULE_LICENSE("GPL v2");
}
static int meson_pwm_calc(struct meson_pwm *meson, struct pwm_device *pwm,
- struct pwm_state *state)
+ const struct pwm_state *state)
{
struct meson_pwm_channel *channel = pwm_get_chip_data(pwm);
unsigned int duty, period, pre_div, cnt, duty_cnt;
}
static int meson_pwm_apply(struct pwm_chip *chip, struct pwm_device *pwm,
- struct pwm_state *state)
+ const struct pwm_state *state)
{
struct meson_pwm_channel *channel = pwm_get_chip_data(pwm);
struct meson_pwm *meson = to_meson_pwm(chip);
{
struct device_node *np = pdev->dev.of_node;
struct mxs_pwm_chip *mxs;
- struct resource *res;
int ret;
mxs = devm_kzalloc(&pdev->dev, sizeof(*mxs), GFP_KERNEL);
if (!mxs)
return -ENOMEM;
- res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
- mxs->base = devm_ioremap_resource(&pdev->dev, res);
+ mxs->base = devm_platform_ioremap_resource(pdev, 0);
if (IS_ERR(mxs->base))
return PTR_ERR(mxs->base);
}
static int rcar_pwm_apply(struct pwm_chip *chip, struct pwm_device *pwm,
- struct pwm_state *state)
+ const struct pwm_state *state)
{
struct rcar_pwm_chip *rp = to_rcar_pwm_chip(chip);
struct pwm_state cur_state;
/* The SYNC should be set to 0 even if rcar_pwm_set_counter failed */
rcar_pwm_update(rp, RCAR_PWMCR_SYNC, 0, RCAR_PWMCR);
- if (!ret && state->enabled)
+ if (!ret)
ret = rcar_pwm_enable(rp);
return ret;
state->enabled = ((val & enable_conf) == enable_conf) ?
true : false;
- if (pc->data->supports_polarity) {
- if (!(val & PWM_DUTY_POSITIVE))
- state->polarity = PWM_POLARITY_INVERSED;
- }
+ if (pc->data->supports_polarity && !(val & PWM_DUTY_POSITIVE))
+ state->polarity = PWM_POLARITY_INVERSED;
+ else
+ state->polarity = PWM_POLARITY_NORMAL;
clk_disable(pc->pclk);
}
static void rockchip_pwm_config(struct pwm_chip *chip, struct pwm_device *pwm,
- struct pwm_state *state)
+ const struct pwm_state *state)
{
struct rockchip_pwm_chip *pc = to_rockchip_pwm_chip(chip);
unsigned long period, duty;
}
static int rockchip_pwm_apply(struct pwm_chip *chip, struct pwm_device *pwm,
- struct pwm_state *state)
+ const struct pwm_state *state)
{
struct rockchip_pwm_chip *pc = to_rockchip_pwm_chip(chip);
struct pwm_state curstate;
goto out;
}
- /*
- * Update the state with the real hardware, which can differ a bit
- * because of period/duty_cycle approximation.
- */
- rockchip_pwm_get_state(chip, pwm, state);
-
out:
clk_disable(pc->pclk);
}
static int pwm_sifive_apply(struct pwm_chip *chip, struct pwm_device *pwm,
- struct pwm_state *state)
+ const struct pwm_state *state)
{
struct pwm_sifive_ddata *ddata = pwm_sifive_chip_to_ddata(chip);
struct pwm_state cur_state;
res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
ddata->regs = devm_ioremap_resource(dev, res);
- if (IS_ERR(ddata->regs)) {
- dev_err(dev, "Unable to map IO resources\n");
+ if (IS_ERR(ddata->regs))
return PTR_ERR(ddata->regs);
- }
ddata->clk = devm_clk_get(dev, NULL);
if (IS_ERR(ddata->clk)) {
--- /dev/null
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 Spreadtrum Communications Inc.
+ */
+
+#include <linux/clk.h>
+#include <linux/err.h>
+#include <linux/io.h>
+#include <linux/math64.h>
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/pwm.h>
+
+#define SPRD_PWM_PRESCALE 0x0
+#define SPRD_PWM_MOD 0x4
+#define SPRD_PWM_DUTY 0x8
+#define SPRD_PWM_ENABLE 0x18
+
+#define SPRD_PWM_MOD_MAX GENMASK(7, 0)
+#define SPRD_PWM_DUTY_MSK GENMASK(15, 0)
+#define SPRD_PWM_PRESCALE_MSK GENMASK(7, 0)
+#define SPRD_PWM_ENABLE_BIT BIT(0)
+
+#define SPRD_PWM_CHN_NUM 4
+#define SPRD_PWM_REGS_SHIFT 5
+#define SPRD_PWM_CHN_CLKS_NUM 2
+#define SPRD_PWM_CHN_OUTPUT_CLK 1
+
+struct sprd_pwm_chn {
+ struct clk_bulk_data clks[SPRD_PWM_CHN_CLKS_NUM];
+ u32 clk_rate;
+};
+
+struct sprd_pwm_chip {
+ void __iomem *base;
+ struct device *dev;
+ struct pwm_chip chip;
+ int num_pwms;
+ struct sprd_pwm_chn chn[SPRD_PWM_CHN_NUM];
+};
+
+/*
+ * The list of clocks required by PWM channels, and each channel has 2 clocks:
+ * enable clock and pwm clock.
+ */
+static const char * const sprd_pwm_clks[] = {
+ "enable0", "pwm0",
+ "enable1", "pwm1",
+ "enable2", "pwm2",
+ "enable3", "pwm3",
+};
+
+static u32 sprd_pwm_read(struct sprd_pwm_chip *spc, u32 hwid, u32 reg)
+{
+ u32 offset = reg + (hwid << SPRD_PWM_REGS_SHIFT);
+
+ return readl_relaxed(spc->base + offset);
+}
+
+static void sprd_pwm_write(struct sprd_pwm_chip *spc, u32 hwid,
+ u32 reg, u32 val)
+{
+ u32 offset = reg + (hwid << SPRD_PWM_REGS_SHIFT);
+
+ writel_relaxed(val, spc->base + offset);
+}
+
+static void sprd_pwm_get_state(struct pwm_chip *chip, struct pwm_device *pwm,
+ struct pwm_state *state)
+{
+ struct sprd_pwm_chip *spc =
+ container_of(chip, struct sprd_pwm_chip, chip);
+ struct sprd_pwm_chn *chn = &spc->chn[pwm->hwpwm];
+ u32 val, duty, prescale;
+ u64 tmp;
+ int ret;
+
+ /*
+ * The clocks to PWM channel has to be enabled first before
+ * reading to the registers.
+ */
+ ret = clk_bulk_prepare_enable(SPRD_PWM_CHN_CLKS_NUM, chn->clks);
+ if (ret) {
+ dev_err(spc->dev, "failed to enable pwm%u clocks\n",
+ pwm->hwpwm);
+ return;
+ }
+
+ val = sprd_pwm_read(spc, pwm->hwpwm, SPRD_PWM_ENABLE);
+ if (val & SPRD_PWM_ENABLE_BIT)
+ state->enabled = true;
+ else
+ state->enabled = false;
+
+ /*
+ * The hardware provides a counter that is feed by the source clock.
+ * The period length is (PRESCALE + 1) * MOD counter steps.
+ * The duty cycle length is (PRESCALE + 1) * DUTY counter steps.
+ * Thus the period_ns and duty_ns calculation formula should be:
+ * period_ns = NSEC_PER_SEC * (prescale + 1) * mod / clk_rate
+ * duty_ns = NSEC_PER_SEC * (prescale + 1) * duty / clk_rate
+ */
+ val = sprd_pwm_read(spc, pwm->hwpwm, SPRD_PWM_PRESCALE);
+ prescale = val & SPRD_PWM_PRESCALE_MSK;
+ tmp = (prescale + 1) * NSEC_PER_SEC * SPRD_PWM_MOD_MAX;
+ state->period = DIV_ROUND_CLOSEST_ULL(tmp, chn->clk_rate);
+
+ val = sprd_pwm_read(spc, pwm->hwpwm, SPRD_PWM_DUTY);
+ duty = val & SPRD_PWM_DUTY_MSK;
+ tmp = (prescale + 1) * NSEC_PER_SEC * duty;
+ state->duty_cycle = DIV_ROUND_CLOSEST_ULL(tmp, chn->clk_rate);
+
+ /* Disable PWM clocks if the PWM channel is not in enable state. */
+ if (!state->enabled)
+ clk_bulk_disable_unprepare(SPRD_PWM_CHN_CLKS_NUM, chn->clks);
+}
+
+static int sprd_pwm_config(struct sprd_pwm_chip *spc, struct pwm_device *pwm,
+ int duty_ns, int period_ns)
+{
+ struct sprd_pwm_chn *chn = &spc->chn[pwm->hwpwm];
+ u32 prescale, duty;
+ u64 tmp;
+
+ /*
+ * The hardware provides a counter that is feed by the source clock.
+ * The period length is (PRESCALE + 1) * MOD counter steps.
+ * The duty cycle length is (PRESCALE + 1) * DUTY counter steps.
+ *
+ * To keep the maths simple we're always using MOD = SPRD_PWM_MOD_MAX.
+ * The value for PRESCALE is selected such that the resulting period
+ * gets the maximal length not bigger than the requested one with the
+ * given settings (MOD = SPRD_PWM_MOD_MAX and input clock).
+ */
+ duty = duty_ns * SPRD_PWM_MOD_MAX / period_ns;
+
+ tmp = (u64)chn->clk_rate * period_ns;
+ do_div(tmp, NSEC_PER_SEC);
+ prescale = DIV_ROUND_CLOSEST_ULL(tmp, SPRD_PWM_MOD_MAX) - 1;
+ if (prescale > SPRD_PWM_PRESCALE_MSK)
+ prescale = SPRD_PWM_PRESCALE_MSK;
+
+ /*
+ * Note: Writing DUTY triggers the hardware to actually apply the
+ * values written to MOD and DUTY to the output, so must keep writing
+ * DUTY last.
+ *
+ * The hardware can ensures that current running period is completed
+ * before changing a new configuration to avoid mixed settings.
+ */
+ sprd_pwm_write(spc, pwm->hwpwm, SPRD_PWM_PRESCALE, prescale);
+ sprd_pwm_write(spc, pwm->hwpwm, SPRD_PWM_MOD, SPRD_PWM_MOD_MAX);
+ sprd_pwm_write(spc, pwm->hwpwm, SPRD_PWM_DUTY, duty);
+
+ return 0;
+}
+
+static int sprd_pwm_apply(struct pwm_chip *chip, struct pwm_device *pwm,
+ const struct pwm_state *state)
+{
+ struct sprd_pwm_chip *spc =
+ container_of(chip, struct sprd_pwm_chip, chip);
+ struct sprd_pwm_chn *chn = &spc->chn[pwm->hwpwm];
+ struct pwm_state *cstate = &pwm->state;
+ int ret;
+
+ if (state->enabled) {
+ if (!cstate->enabled) {
+ /*
+ * The clocks to PWM channel has to be enabled first
+ * before writing to the registers.
+ */
+ ret = clk_bulk_prepare_enable(SPRD_PWM_CHN_CLKS_NUM,
+ chn->clks);
+ if (ret) {
+ dev_err(spc->dev,
+ "failed to enable pwm%u clocks\n",
+ pwm->hwpwm);
+ return ret;
+ }
+ }
+
+ if (state->period != cstate->period ||
+ state->duty_cycle != cstate->duty_cycle) {
+ ret = sprd_pwm_config(spc, pwm, state->duty_cycle,
+ state->period);
+ if (ret)
+ return ret;
+ }
+
+ sprd_pwm_write(spc, pwm->hwpwm, SPRD_PWM_ENABLE, 1);
+ } else if (cstate->enabled) {
+ /*
+ * Note: After setting SPRD_PWM_ENABLE to zero, the controller
+ * will not wait for current period to be completed, instead it
+ * will stop the PWM channel immediately.
+ */
+ sprd_pwm_write(spc, pwm->hwpwm, SPRD_PWM_ENABLE, 0);
+
+ clk_bulk_disable_unprepare(SPRD_PWM_CHN_CLKS_NUM, chn->clks);
+ }
+
+ return 0;
+}
+
+static const struct pwm_ops sprd_pwm_ops = {
+ .apply = sprd_pwm_apply,
+ .get_state = sprd_pwm_get_state,
+ .owner = THIS_MODULE,
+};
+
+static int sprd_pwm_clk_init(struct sprd_pwm_chip *spc)
+{
+ struct clk *clk_pwm;
+ int ret, i;
+
+ for (i = 0; i < SPRD_PWM_CHN_NUM; i++) {
+ struct sprd_pwm_chn *chn = &spc->chn[i];
+ int j;
+
+ for (j = 0; j < SPRD_PWM_CHN_CLKS_NUM; ++j)
+ chn->clks[j].id =
+ sprd_pwm_clks[i * SPRD_PWM_CHN_CLKS_NUM + j];
+
+ ret = devm_clk_bulk_get(spc->dev, SPRD_PWM_CHN_CLKS_NUM,
+ chn->clks);
+ if (ret) {
+ if (ret == -ENOENT)
+ break;
+
+ if (ret != -EPROBE_DEFER)
+ dev_err(spc->dev,
+ "failed to get channel clocks\n");
+
+ return ret;
+ }
+
+ clk_pwm = chn->clks[SPRD_PWM_CHN_OUTPUT_CLK].clk;
+ chn->clk_rate = clk_get_rate(clk_pwm);
+ }
+
+ if (!i) {
+ dev_err(spc->dev, "no available PWM channels\n");
+ return -ENODEV;
+ }
+
+ spc->num_pwms = i;
+
+ return 0;
+}
+
+static int sprd_pwm_probe(struct platform_device *pdev)
+{
+ struct sprd_pwm_chip *spc;
+ int ret;
+
+ spc = devm_kzalloc(&pdev->dev, sizeof(*spc), GFP_KERNEL);
+ if (!spc)
+ return -ENOMEM;
+
+ spc->base = devm_platform_ioremap_resource(pdev, 0);
+ if (IS_ERR(spc->base))
+ return PTR_ERR(spc->base);
+
+ spc->dev = &pdev->dev;
+ platform_set_drvdata(pdev, spc);
+
+ ret = sprd_pwm_clk_init(spc);
+ if (ret)
+ return ret;
+
+ spc->chip.dev = &pdev->dev;
+ spc->chip.ops = &sprd_pwm_ops;
+ spc->chip.base = -1;
+ spc->chip.npwm = spc->num_pwms;
+
+ ret = pwmchip_add(&spc->chip);
+ if (ret)
+ dev_err(&pdev->dev, "failed to add PWM chip\n");
+
+ return ret;
+}
+
+static int sprd_pwm_remove(struct platform_device *pdev)
+{
+ struct sprd_pwm_chip *spc = platform_get_drvdata(pdev);
+
+ return pwmchip_remove(&spc->chip);
+}
+
+static const struct of_device_id sprd_pwm_of_match[] = {
+ { .compatible = "sprd,ums512-pwm", },
+ { },
+};
+MODULE_DEVICE_TABLE(of, sprd_pwm_of_match);
+
+static struct platform_driver sprd_pwm_driver = {
+ .driver = {
+ .name = "sprd-pwm",
+ .of_match_table = sprd_pwm_of_match,
+ },
+ .probe = sprd_pwm_probe,
+ .remove = sprd_pwm_remove,
+};
+
+module_platform_driver(sprd_pwm_driver);
+
+MODULE_DESCRIPTION("Spreadtrum PWM Driver");
+MODULE_LICENSE("GPL v2");
return PTR_ERR(pc->regmap);
irq = platform_get_irq(pdev, 0);
- if (irq < 0) {
- dev_err(&pdev->dev, "Failed to obtain IRQ\n");
+ if (irq < 0)
return irq;
- }
ret = devm_request_irq(&pdev->dev, irq, sti_pwm_interrupt, 0,
pdev->name, pc);
#define STM32_LPTIM_MAX_PRESCALER 128
static int stm32_pwm_lp_apply(struct pwm_chip *chip, struct pwm_device *pwm,
- struct pwm_state *state)
+ const struct pwm_state *state)
{
struct stm32_pwm_lp *priv = to_stm32_pwm_lp(chip);
unsigned long long prd, div, dty;
/* Calculate the period and prescaler value */
div = (unsigned long long)clk_get_rate(priv->clk) * state->period;
do_div(div, NSEC_PER_SEC);
+ if (!div) {
+ /* Clock is too slow to achieve requested period. */
+ dev_dbg(priv->chip.dev, "Can't reach %u ns\n", state->period);
+ return -EINVAL;
+ }
+
prd = div;
while (div > STM32_LPTIM_MAX_ARR) {
presc++;
}
static int stm32_pwm_apply(struct pwm_chip *chip, struct pwm_device *pwm,
- struct pwm_state *state)
+ const struct pwm_state *state)
{
bool enabled;
struct stm32_pwm *priv = to_stm32_pwm_dev(chip);
}
static int stm32_pwm_apply_locked(struct pwm_chip *chip, struct pwm_device *pwm,
- struct pwm_state *state)
+ const struct pwm_state *state)
{
struct stm32_pwm *priv = to_stm32_pwm_dev(chip);
int ret;
}
static int sun4i_pwm_calculate(struct sun4i_pwm_chip *sun4i_pwm,
- struct pwm_state *state,
+ const struct pwm_state *state,
u32 *dty, u32 *prd, unsigned int *prsclr)
{
u64 clk_rate, div = 0;
*dty = div;
*prsclr = prescaler;
- div = (u64)pval * NSEC_PER_SEC * *prd;
- state->period = DIV_ROUND_CLOSEST_ULL(div, clk_rate);
-
- div = (u64)pval * NSEC_PER_SEC * *dty;
- state->duty_cycle = DIV_ROUND_CLOSEST_ULL(div, clk_rate);
-
return 0;
}
static int sun4i_pwm_apply(struct pwm_chip *chip, struct pwm_device *pwm,
- struct pwm_state *state)
+ const struct pwm_state *state)
{
struct sun4i_pwm_chip *sun4i_pwm = to_sun4i_pwm_chip(chip);
struct pwm_state cstate;
}
static int zx_pwm_apply(struct pwm_chip *chip, struct pwm_device *pwm,
- struct pwm_state *state)
+ const struct pwm_state *state)
{
struct zx_pwm_chip *zpc = to_zx_pwm_chip(chip);
struct pwm_state cstate;
data->rcdev.owner = THIS_MODULE;
data->rcdev.of_node = np;
data->rcdev.nr_resets = handle->reset_ops->num_domains_get(handle);
+ data->handle = handle;
return devm_reset_controller_register(dev, &data->rcdev);
}
if (rc == 0) {
memcpy(&private->vsq, vsq, sizeof(*vsq));
} else {
- dev_warn(&device->cdev->dev,
- "Reading the volume storage information failed with rc=%d\n", rc);
+ DBF_EVENT_DEVID(DBF_WARNING, device->cdev,
+ "Reading the volume storage information failed with rc=%d", rc);
}
if (useglobal)
if (rc == 0) {
dasd_eckd_cpy_ext_pool_data(device, lcq);
} else {
- dev_warn(&device->cdev->dev,
- "Reading the logical configuration failed with rc=%d\n", rc);
+ DBF_EVENT_DEVID(DBF_WARNING, device->cdev,
+ "Reading the logical configuration failed with rc=%d", rc);
}
dasd_sfree_request(cqr, cqr->memdev);
dasd_eckd_read_features(device);
/* Read Volume Information */
- rc = dasd_eckd_read_vol_info(device);
- if (rc)
- goto out_err3;
+ dasd_eckd_read_vol_info(device);
/* Read Extent Pool Information */
- rc = dasd_eckd_read_ext_pool_info(device);
- if (rc)
- goto out_err3;
+ dasd_eckd_read_ext_pool_info(device);
/* Read Device Characteristics */
rc = dasd_generic_read_dev_chars(device, DASD_ECKD_MAGIC,
if (readonly)
set_bit(DASD_FLAG_DEVICE_RO, &device->flags);
- if (dasd_eckd_is_ese(device))
- dasd_set_feature(device->cdev, DASD_FEATURE_DISCARD, 1);
-
dev_info(&device->cdev->dev, "New DASD %04X/%02X (CU %04X/%02X) "
"with %d cylinders, %d heads, %d sectors%s\n",
private->rdc_data.dev_type,
return -EINVAL;
}
-static struct dasd_ccw_req *
-dasd_eckd_build_cp_discard(struct dasd_device *device, struct dasd_block *block,
- struct request *req, sector_t first_trk,
- sector_t last_trk)
-{
- return dasd_eckd_dso_ras(device, block, req, first_trk, last_trk, 1);
-}
-
static struct dasd_ccw_req *dasd_eckd_build_cp_cmd_single(
struct dasd_device *startdev,
struct dasd_block *block,
cmdwtd = private->features.feature[12] & 0x40;
use_prefix = private->features.feature[8] & 0x01;
- if (req_op(req) == REQ_OP_DISCARD)
- return dasd_eckd_build_cp_discard(startdev, block, req,
- first_trk, last_trk);
-
cqr = NULL;
if (cdlspecial || dasd_page_cache) {
/* do nothing, just fall through to the cmd mode single case */
struct dasd_block *block,
struct request *req)
{
- struct dasd_device *startdev = NULL;
struct dasd_eckd_private *private;
- struct dasd_ccw_req *cqr;
+ struct dasd_device *startdev;
unsigned long flags;
+ struct dasd_ccw_req *cqr;
- /* Discard requests can only be processed on base devices */
- if (req_op(req) != REQ_OP_DISCARD)
- startdev = dasd_alias_get_start_dev(base);
+ startdev = dasd_alias_get_start_dev(base);
if (!startdev)
startdev = base;
private = startdev->private;
dasd_eckd_read_features(device);
/* Read Volume Information */
- rc = dasd_eckd_read_vol_info(device);
- if (rc)
- goto out_err2;
+ dasd_eckd_read_vol_info(device);
/* Read Extent Pool Information */
- rc = dasd_eckd_read_ext_pool_info(device);
- if (rc)
- goto out_err2;
+ dasd_eckd_read_ext_pool_info(device);
/* Read Device Characteristics */
rc = dasd_generic_read_dev_chars(device, DASD_ECKD_MAGIC,
unsigned int logical_block_size = block->bp_block;
struct request_queue *q = block->request_queue;
struct dasd_device *device = block->base;
- struct dasd_eckd_private *private;
- unsigned int max_discard_sectors;
- unsigned int max_bytes;
- unsigned int ext_bytes; /* Extent Size in Bytes */
- int recs_per_trk;
- int trks_per_cyl;
- int ext_limit;
- int ext_size; /* Extent Size in Cylinders */
int max;
- private = device->private;
- trks_per_cyl = private->rdc_data.trk_per_cyl;
- recs_per_trk = recs_per_track(&private->rdc_data, 0, logical_block_size);
-
if (device->features & DASD_FEATURE_USERAW) {
/*
* the max_blocks value for raw_track access is 256
/* With page sized segments each segment can be translated into one idaw/tidaw */
blk_queue_max_segment_size(q, PAGE_SIZE);
blk_queue_segment_boundary(q, PAGE_SIZE - 1);
-
- if (dasd_eckd_is_ese(device)) {
- /*
- * Depending on the extent size, up to UINT_MAX bytes can be
- * accepted. However, neither DASD_ECKD_RAS_EXTS_MAX nor the
- * device limits should be exceeded.
- */
- ext_size = dasd_eckd_ext_size(device);
- ext_limit = min(private->real_cyl / ext_size, DASD_ECKD_RAS_EXTS_MAX);
- ext_bytes = ext_size * trks_per_cyl * recs_per_trk *
- logical_block_size;
- max_bytes = UINT_MAX - (UINT_MAX % ext_bytes);
- if (max_bytes / ext_bytes > ext_limit)
- max_bytes = ext_bytes * ext_limit;
-
- max_discard_sectors = max_bytes / 512;
-
- blk_queue_max_discard_sectors(q, max_discard_sectors);
- blk_queue_flag_set(QUEUE_FLAG_DISCARD, q);
- q->limits.discard_granularity = ext_bytes;
- q->limits.discard_alignment = ext_bytes;
- }
}
static struct ccw_driver dasd_eckd_driver = {
irq_ptr->qib.pfmt = qib_param_field_format;
if (qib_param_field)
memcpy(irq_ptr->qib.parm, qib_param_field,
- QDIO_MAX_BUFFERS_PER_Q);
+ sizeof(irq_ptr->qib.parm));
if (!input_slib_elements)
goto output;
QETH_CARD_TEXT(card, 2, "qdioest");
- qib_param_field = kzalloc(QDIO_MAX_BUFFERS_PER_Q,
- GFP_KERNEL);
+ qib_param_field = kzalloc(FIELD_SIZEOF(struct qib, parm), GFP_KERNEL);
if (!qib_param_field) {
rc = -ENOMEM;
goto out_free_nothing;
struct fcoe_fcp_rsp_payload *fcp_rsp;
struct bnx2fc_rport *tgt = io_req->tgt;
struct scsi_cmnd *sc_cmd;
+ u16 scope = 0, qualifier = 0;
/* scsi_cmd_cmpl is called with tgt lock held */
if (io_req->cdb_status == SAM_STAT_TASK_SET_FULL ||
io_req->cdb_status == SAM_STAT_BUSY) {
- /* Set the jiffies + retry_delay_timer * 100ms
- for the rport/tgt */
- tgt->retry_delay_timestamp = jiffies +
- fcp_rsp->retry_delay_timer * HZ / 10;
+ /* Newer array firmware with BUSY or
+ * TASK_SET_FULL may return a status that needs
+ * the scope bits masked.
+ * Or a huge delay timestamp up to 27 minutes
+ * can result.
+ */
+ if (fcp_rsp->retry_delay_timer) {
+ /* Upper 2 bits */
+ scope = fcp_rsp->retry_delay_timer
+ & 0xC000;
+ /* Lower 14 bits */
+ qualifier = fcp_rsp->retry_delay_timer
+ & 0x3FFF;
+ }
+ if (scope > 0 && qualifier > 0 &&
+ qualifier <= 0x3FEF) {
+ /* Set the jiffies +
+ * retry_delay_timer * 100ms
+ * for the rport/tgt
+ */
+ tgt->retry_delay_timestamp = jiffies +
+ (qualifier * HZ / 10);
+ }
}
-
}
if (io_req->fcp_resid)
scsi_set_resid(sc_cmd, io_req->fcp_resid);
}
EXPORT_SYMBOL_GPL(hisi_sas_debugfs_work_handler);
-void hisi_sas_debugfs_release(struct hisi_hba *hisi_hba)
+static void hisi_sas_debugfs_release(struct hisi_hba *hisi_hba)
{
struct device *dev = hisi_hba->dev;
int i;
devm_kfree(dev, hisi_hba->debugfs_port_reg[i]);
}
-int hisi_sas_debugfs_alloc(struct hisi_hba *hisi_hba)
+static int hisi_sas_debugfs_alloc(struct hisi_hba *hisi_hba)
{
const struct hisi_sas_hw *hw = hisi_hba->hw;
struct device *dev = hisi_hba->dev;
return -ENOMEM;
}
-void hisi_sas_debugfs_bist_init(struct hisi_hba *hisi_hba)
+static void hisi_sas_debugfs_bist_init(struct hisi_hba *hisi_hba)
{
hisi_hba->debugfs_bist_dentry =
debugfs_create_dir("bist", hisi_hba->debugfs_dir);
*/
if (pdev->subsystem_vendor == PCI_VENDOR_ID_COMPAQ &&
pdev->subsystem_device == 0xC000)
- return -ENODEV;
+ goto out_disable_device;
/* Now check the magic signature byte */
pci_read_config_word(pdev, PCI_CONF_AMISIG, &magic);
if (magic != HBA_SIGNATURE_471 && magic != HBA_SIGNATURE)
- return -ENODEV;
+ goto out_disable_device;
/* Ok it is probably a megaraid */
}
tmp_prio = get->operational.app_prio.fcoe;
if (qedf_default_prio > -1)
qedf->prio = qedf_default_prio;
- else if (tmp_prio < 0 || tmp_prio > 7) {
+ else if (tmp_prio > 7) {
QEDF_INFO(&(qedf->dbg_ctx), QEDF_LOG_DISC,
"FIP/FCoE prio %d out of range, setting to %d.\n",
tmp_prio, QEDF_DEFAULT_PRIO);
struct qla_hw_data *ha = vha->hw;
uint16_t id = vha->vp_idx;
+ set_bit(VPORT_DELETE, &vha->dpc_flags);
+
while (test_bit(LOOP_RESYNC_ACTIVE, &vha->dpc_flags) ||
test_bit(FCPORT_UPDATE_NEEDED, &vha->dpc_flags))
msleep(1000);
unsigned int query:1;
unsigned int id_changed:1;
unsigned int scan_needed:1;
+ unsigned int n2n_flag:1;
struct completion nvme_del_done;
uint32_t nvme_prli_service_param;
uint8_t fc4_type;
uint8_t fc4f_nvme;
uint8_t scan_state;
- uint8_t n2n_flag;
unsigned long last_queue_full;
unsigned long last_ramp_up;
enum fc4type_t {
FS_FC4TYPE_FCP = BIT_0,
FS_FC4TYPE_NVME = BIT_1,
+ FS_FCP_IS_N2N = BIT_7,
};
struct fab_scan_rp {
#define IOCB_WORK_ACTIVE 31
#define SET_ZIO_THRESHOLD_NEEDED 32
#define ISP_ABORT_TO_ROM 33
+#define VPORT_DELETE 34
unsigned long pci_flags;
#define PFLG_DISCONNECTED 0 /* PCI device removed */
{
struct qla_work_evt *e;
- if (test_bit(UNLOADING, &vha->dpc_flags))
+ if (test_bit(UNLOADING, &vha->dpc_flags) ||
+ (vha->vp_idx && test_bit(VPORT_DELETE, &vha->dpc_flags)))
return 0;
e = qla2x00_alloc_work(vha, QLA_EVT_GPNID);
break;
default:
if ((id.b24 != fcport->d_id.b24 &&
- fcport->d_id.b24) ||
+ fcport->d_id.b24 &&
+ fcport->loop_id != FC_NO_LOOP_ID) ||
(fcport->loop_id != FC_NO_LOOP_ID &&
fcport->loop_id != loop_id)) {
ql_dbg(ql_dbg_disc, vha, 0x20e3,
"%s %d %8phC post del sess\n",
__func__, __LINE__, fcport->port_name);
+ if (fcport->n2n_flag)
+ fcport->d_id.b24 = 0;
qlt_schedule_sess_for_deletion(fcport);
return;
}
}
fcport->loop_id = loop_id;
+ if (fcport->n2n_flag)
+ fcport->d_id.b24 = id.b24;
wwn = wwn_to_u64(fcport->port_name);
qlt_find_sess_invalidate_other(vha, wwn,
wwn = wwn_to_u64(e->port_name);
ql_dbg(ql_dbg_disc + ql_dbg_verbose, vha, 0x20e8,
- "%s %8phC %02x:%02x:%02x state %d/%d lid %x \n",
+ "%s %8phC %02x:%02x:%02x CLS %x/%x lid %x \n",
__func__, (void *)&wwn, e->port_id[2], e->port_id[1],
e->port_id[0], e->current_login_state, e->last_login_state,
(loop_id & 0x7fff));
(fcport->fw_login_state == DSC_LS_PRLI_PEND)))
return 0;
- if (fcport->fw_login_state == DSC_LS_PLOGI_COMP) {
+ if (fcport->fw_login_state == DSC_LS_PLOGI_COMP &&
+ !N2N_TOPO(vha->hw)) {
if (time_before_eq(jiffies, fcport->plogi_nack_done_deadline)) {
set_bit(RELOGIN_NEEDED, &vha->dpc_flags);
return 0;
qla24xx_post_gpdb_work(vha, fcport, 0);
} else {
ql_dbg(ql_dbg_disc, vha, 0x2118,
- "%s %d %8phC post NVMe PRLI\n",
- __func__, __LINE__, fcport->port_name);
+ "%s %d %8phC post %s PRLI\n",
+ __func__, __LINE__, fcport->port_name,
+ fcport->fc4f_nvme ? "NVME" : "FC");
qla24xx_post_prli_work(vha, fcport);
}
break;
break;
}
- if (ea->fcport->n2n_flag) {
+ if (ea->fcport->fc4f_nvme) {
ql_dbg(ql_dbg_disc, vha, 0x2118,
"%s %d %8phC post fc4 prli\n",
__func__, __LINE__, ea->fcport->port_name);
ea->fcport->fc4f_nvme = 0;
- ea->fcport->n2n_flag = 0;
qla24xx_post_prli_work(vha, ea->fcport);
+ return;
+ }
+
+ /* at this point both PRLI NVME & PRLI FCP failed */
+ if (N2N_TOPO(vha->hw)) {
+ if (ea->fcport->n2n_link_reset_cnt < 3) {
+ ea->fcport->n2n_link_reset_cnt++;
+ /*
+ * remote port is not sending Plogi. Reset
+ * link to kick start his state machine
+ */
+ set_bit(N2N_LINK_RESET, &vha->dpc_flags);
+ } else {
+ ql_log(ql_log_warn, vha, 0x2119,
+ "%s %d %8phC Unable to reconnect\n",
+ __func__, __LINE__, ea->fcport->port_name);
+ }
+ } else {
+ /*
+ * switch connect. login failed. Take connection
+ * down and allow relogin to retrigger
+ */
+ ea->fcport->flags &= ~FCF_ASYNC_SENT;
+ ea->fcport->keep_nport_handle = 0;
+ qlt_schedule_sess_for_deletion(ea->fcport);
}
- ql_dbg(ql_dbg_disc, vha, 0x2119,
- "%s %d %8phC unhandle event of %x\n",
- __func__, __LINE__, ea->fcport->port_name, ea->data[0]);
break;
}
}
for (j = 0; j < 2; j++, fwdt++) {
if (!fwdt->template) {
- ql_log(ql_log_warn, vha, 0x00ba,
+ ql_dbg(ql_dbg_init, vha, 0x00ba,
"-> fwdt%u no template\n", j);
continue;
}
unsigned long flags;
/* Inititae N2N login. */
- if (test_and_clear_bit(N2N_LOGIN_NEEDED, &vha->dpc_flags)) {
- /* borrowing */
- u32 *bp, i, sz;
-
- memset(ha->init_cb, 0, ha->init_cb_size);
- sz = min_t(int, sizeof(struct els_plogi_payload),
- ha->init_cb_size);
- rval = qla24xx_get_port_login_templ(vha, ha->init_cb_dma,
- (void *)ha->init_cb, sz);
- if (rval == QLA_SUCCESS) {
- bp = (uint32_t *)ha->init_cb;
- for (i = 0; i < sz/4 ; i++, bp++)
- *bp = cpu_to_be32(*bp);
+ if (N2N_TOPO(ha)) {
+ if (test_and_clear_bit(N2N_LOGIN_NEEDED, &vha->dpc_flags)) {
+ /* borrowing */
+ u32 *bp, i, sz;
+
+ memset(ha->init_cb, 0, ha->init_cb_size);
+ sz = min_t(int, sizeof(struct els_plogi_payload),
+ ha->init_cb_size);
+ rval = qla24xx_get_port_login_templ(vha,
+ ha->init_cb_dma, (void *)ha->init_cb, sz);
+ if (rval == QLA_SUCCESS) {
+ bp = (uint32_t *)ha->init_cb;
+ for (i = 0; i < sz/4 ; i++, bp++)
+ *bp = cpu_to_be32(*bp);
- memcpy(&ha->plogi_els_payld.data, (void *)ha->init_cb,
- sizeof(ha->plogi_els_payld.data));
- set_bit(RELOGIN_NEEDED, &vha->dpc_flags);
- } else {
- ql_dbg(ql_dbg_init, vha, 0x00d1,
- "PLOGI ELS param read fail.\n");
+ memcpy(&ha->plogi_els_payld.data,
+ (void *)ha->init_cb,
+ sizeof(ha->plogi_els_payld.data));
+ set_bit(RELOGIN_NEEDED, &vha->dpc_flags);
+ } else {
+ ql_dbg(ql_dbg_init, vha, 0x00d1,
+ "PLOGI ELS param read fail.\n");
+ goto skip_login;
+ }
+ }
+
+ list_for_each_entry(fcport, &vha->vp_fcports, list) {
+ if (fcport->n2n_flag) {
+ qla24xx_fcport_handle_login(vha, fcport);
+ return QLA_SUCCESS;
+ }
+ }
+skip_login:
+ spin_lock_irqsave(&vha->work_lock, flags);
+ vha->scan.scan_retry++;
+ spin_unlock_irqrestore(&vha->work_lock, flags);
+
+ if (vha->scan.scan_retry < MAX_SCAN_RETRIES) {
+ set_bit(LOCAL_LOOP_UPDATE, &vha->dpc_flags);
+ set_bit(LOOP_RESYNC_NEEDED, &vha->dpc_flags);
}
- return QLA_SUCCESS;
}
found_devs = 0;
els_iocb->port_id[0] = sp->fcport->d_id.b.al_pa;
els_iocb->port_id[1] = sp->fcport->d_id.b.area;
els_iocb->port_id[2] = sp->fcport->d_id.b.domain;
- els_iocb->s_id[0] = vha->d_id.b.al_pa;
- els_iocb->s_id[1] = vha->d_id.b.area;
- els_iocb->s_id[2] = vha->d_id.b.domain;
+ /* For SID the byte order is different than DID */
+ els_iocb->s_id[1] = vha->d_id.b.al_pa;
+ els_iocb->s_id[2] = vha->d_id.b.area;
+ els_iocb->s_id[0] = vha->d_id.b.domain;
if (elsio->u.els_logo.els_cmd == ELS_DCMD_PLOGI) {
els_iocb->control_flags = 0;
mbx_cmd_t mc;
mbx_cmd_t *mcp = &mc;
- ql_dbg(ql_dbg_mbx + ql_dbg_verbose, vha, 0x105a,
+ ql_dbg(ql_dbg_disc, vha, 0x105a,
"Entered %s.\n", __func__);
if (IS_CNA_CAPABLE(vha->hw)) {
case TOPO_N2N:
ha->current_topology = ISP_CFG_N;
spin_lock_irqsave(&vha->hw->tgt.sess_lock, flags);
+ list_for_each_entry(fcport, &vha->vp_fcports, list) {
+ fcport->scan_state = QLA_FCPORT_SCAN;
+ fcport->n2n_flag = 0;
+ }
+
fcport = qla2x00_find_fcport_by_wwpn(vha,
rptid_entry->u.f1.port_name, 1);
spin_unlock_irqrestore(&vha->hw->tgt.sess_lock, flags);
if (fcport) {
fcport->plogi_nack_done_deadline = jiffies + HZ;
- fcport->dm_login_expire = jiffies + 3*HZ;
+ fcport->dm_login_expire = jiffies + 2*HZ;
fcport->scan_state = QLA_FCPORT_FOUND;
+ fcport->n2n_flag = 1;
+ fcport->keep_nport_handle = 1;
+ if (vha->flags.nvme_enabled)
+ fcport->fc4f_nvme = 1;
+
switch (fcport->disc_state) {
case DSC_DELETED:
set_bit(RELOGIN_NEEDED,
rptid_entry->u.f1.port_name,
rptid_entry->u.f1.node_name,
NULL,
- FC4_TYPE_UNKNOWN);
+ FS_FCP_IS_N2N);
}
/* if our portname is higher then initiate N2N login */
list_for_each_entry(fcport, &vha->vp_fcports, list) {
fcport->scan_state = QLA_FCPORT_SCAN;
+ fcport->n2n_flag = 0;
}
fcport = qla2x00_find_fcport_by_wwpn(vha,
fcport->login_retry = vha->hw->login_retry_count;
fcport->plogi_nack_done_deadline = jiffies + HZ;
fcport->scan_state = QLA_FCPORT_FOUND;
+ fcport->keep_nport_handle = 1;
+ fcport->n2n_flag = 1;
+ fcport->d_id.b.domain =
+ rptid_entry->u.f2.remote_nport_id[2];
+ fcport->d_id.b.area =
+ rptid_entry->u.f2.remote_nport_id[1];
+ fcport->d_id.b.al_pa =
+ rptid_entry->u.f2.remote_nport_id[0];
}
}
}
uint16_t vp_id;
struct qla_hw_data *ha = vha->hw;
unsigned long flags = 0;
+ u8 i;
mutex_lock(&ha->vport_lock);
/*
* ensures no active vp_list traversal while the vport is removed
* from the queue)
*/
- wait_event_timeout(vha->vref_waitq, !atomic_read(&vha->vref_count),
- 10*HZ);
+ for (i = 0; i < 10 && atomic_read(&vha->vref_count); i++)
+ wait_event_timeout(vha->vref_waitq,
+ atomic_read(&vha->vref_count), HZ);
spin_lock_irqsave(&ha->vport_slock, flags);
if (atomic_read(&vha->vref_count)) {
spin_lock_irqsave(&ha->vport_slock, flags);
list_for_each_entry(vha, &ha->vp_list, list) {
if (vha->vp_idx) {
+ if (test_bit(VPORT_DELETE, &vha->dpc_flags))
+ continue;
+
atomic_inc(&vha->vref_count);
spin_unlock_irqrestore(&ha->vport_slock, flags);
int
qla2x00_vp_abort_isp(scsi_qla_host_t *vha)
{
+ fc_port_t *fcport;
+
+ /*
+ * To exclusively reset vport, we need to log it out first.
+ * Note: This control_vp can fail if ISP reset is already
+ * issued, this is expected, as the vp would be already
+ * logged out due to ISP reset.
+ */
+ if (!test_bit(ABORT_ISP_ACTIVE, &vha->dpc_flags)) {
+ qla24xx_control_vp(vha, VCE_COMMAND_DISABLE_VPS_LOGO_ALL);
+ list_for_each_entry(fcport, &vha->vp_fcports, list)
+ fcport->logout_on_delete = 0;
+ }
+
/*
* Physical port will do most of the abort and recovery work. We can
* just treat it as a loop down
atomic_set(&vha->loop_down_timer, LOOP_DOWN_TIME);
}
- /*
- * To exclusively reset vport, we need to log it out first. Note: this
- * control_vp can fail if ISP reset is already issued, this is
- * expected, as the vp would be already logged out due to ISP reset.
- */
- if (!test_bit(ABORT_ISP_ACTIVE, &vha->dpc_flags))
- qla24xx_control_vp(vha, VCE_COMMAND_DISABLE_VPS_LOGO_ALL);
-
ql_dbg(ql_dbg_taskm, vha, 0x801d,
"Scheduling enable of Vport %d.\n", vha->vp_idx);
+
return qla24xx_enable_vp(vha);
}
void
qla2x00_wait_for_sess_deletion(scsi_qla_host_t *vha)
{
+ u8 i;
+
qla2x00_mark_all_devices_lost(vha, 0);
- wait_event_timeout(vha->fcport_waitQ, test_fcport_count(vha), 10*HZ);
+ for (i = 0; i < 10; i++)
+ wait_event_timeout(vha->fcport_waitQ, test_fcport_count(vha),
+ HZ);
+
+ flush_workqueue(vha->hw->wq);
}
/*
memcpy(fcport->port_name, e->u.new_sess.port_name,
WWN_SIZE);
+
+ if (e->u.new_sess.fc4_type & FS_FCP_IS_N2N)
+ fcport->n2n_flag = 1;
+
} else {
ql_dbg(ql_dbg_disc, vha, 0xffff,
"%s %8phC mem alloc fail.\n",
if (dfcp)
qlt_schedule_sess_for_deletion(tfcp);
-
- if (N2N_TOPO(vha->hw))
- fcport->flags &= ~FCF_FABRIC_DEVICE;
-
if (N2N_TOPO(vha->hw)) {
+ fcport->flags &= ~FCF_FABRIC_DEVICE;
+ fcport->keep_nport_handle = 1;
if (vha->flags.nvme_enabled) {
fcport->fc4f_nvme = 1;
fcport->n2n_flag = 1;
struct qla_hw_data *ha = vha->hw;
unsigned long flags;
bool logout_started = false;
- scsi_qla_host_t *base_vha;
+ scsi_qla_host_t *base_vha = pci_get_drvdata(ha->pdev);
struct qlt_plogi_ack_t *own =
sess->plogi_link[QLT_PLOGI_LINK_SAME_WWN];
if (logout_started) {
bool traced = false;
+ u16 cnt = 0;
while (!READ_ONCE(sess->logout_completed)) {
if (!traced) {
traced = true;
}
msleep(100);
+ cnt++;
+ if (cnt > 200)
+ break;
}
ql_dbg(ql_dbg_disc, vha, 0xf087,
}
spin_unlock_irqrestore(&ha->tgt.sess_lock, flags);
+ sess->free_pending = 0;
ql_dbg(ql_dbg_tgt_mgt, vha, 0xf001,
"Unregistration of sess %p %8phC finished fcp_cnt %d\n",
if (tgt && (tgt->sess_count == 0))
wake_up_all(&tgt->waitQ);
- if (vha->fcport_count == 0)
- wake_up_all(&vha->fcport_waitQ);
-
- base_vha = pci_get_drvdata(ha->pdev);
-
- sess->free_pending = 0;
-
- if (test_bit(PFLG_DRIVER_REMOVING, &base_vha->pci_flags))
- return;
-
- if ((!tgt || !tgt->tgt_stop) && !LOOP_TRANSITION(vha)) {
+ if (!test_bit(PFLG_DRIVER_REMOVING, &base_vha->pci_flags) &&
+ !(vha->vp_idx && test_bit(VPORT_DELETE, &vha->dpc_flags)) &&
+ (!tgt || !tgt->tgt_stop) && !LOOP_TRANSITION(vha)) {
switch (vha->host->active_mode) {
case MODE_INITIATOR:
case MODE_DUAL:
break;
}
}
+
+ if (vha->fcport_count == 0)
+ wake_up_all(&vha->fcport_waitQ);
}
/* ha->tgt.sess_lock supposed to be held on entry */
sess->last_login_gen = sess->login_gen;
INIT_WORK(&sess->free_work, qlt_free_session_done);
- schedule_work(&sess->free_work);
+ queue_work(sess->vha->hw->wq, &sess->free_work);
}
EXPORT_SYMBOL(qlt_unreg_sess);
/*
* Set the number of HW queues we are supporting.
*/
- if (stor_device->num_sc != 0)
- host->nr_hw_queues = stor_device->num_sc + 1;
+ host->nr_hw_queues = num_present_cpus();
/*
* Set the error handler work queue.
{
int ret = 0;
+ if (!hba->is_powered)
+ goto out;
+
if (ufshcd_is_ufs_dev_poweroff(hba) && ufshcd_is_link_off(hba))
goto out;
*/
dst_release(skb_dst(skb));
skb_dst_set(skb, NULL);
-#ifdef CONFIG_XFRM
- secpath_reset(skb);
-#endif
- nf_reset(skb);
+ skb_ext_reset(skb);
+ nf_reset_ct(skb);
#ifdef CONFIG_NET_SCHED
skb->tc_index = 0;
config DB8500_THERMAL
tristate "DB8500 thermal management"
- depends on MFD_DB8500_PRCMU
+ depends on MFD_DB8500_PRCMU && OF
default y
help
Adds DB8500 thermal management implementation according to the thermal
#define CONTROL0_TSEN_MODE_EXTERNAL 0x2
#define CONTROL0_TSEN_MODE_MASK 0x3
-#define CONTROL1_TSEN_AVG_SHIFT 0
#define CONTROL1_TSEN_AVG_MASK 0x7
#define CONTROL1_EXT_TSEN_SW_RESET BIT(7)
#define CONTROL1_EXT_TSEN_HW_RESETn BIT(8)
/* Average the output value over 2^1 = 2 samples */
regmap_read(priv->syscon, data->syscon_control1_off, ®);
- reg &= ~CONTROL1_TSEN_AVG_MASK << CONTROL1_TSEN_AVG_SHIFT;
- reg |= 1 << CONTROL1_TSEN_AVG_SHIFT;
+ reg &= ~CONTROL1_TSEN_AVG_MASK;
+ reg |= 1;
regmap_write(priv->syscon, data->syscon_control1_off, reg);
}
* db8500_thermal.c - DB8500 Thermal Management Implementation
*
* Copyright (C) 2012 ST-Ericsson
- * Copyright (C) 2012 Linaro Ltd.
+ * Copyright (C) 2012-2019 Linaro Ltd.
*
- * Author: Hongbo Zhang <hongbo.zhang@linaro.com>
+ * Authors: Hongbo Zhang, Linus Walleij
*/
#include <linux/cpu_cooling.h>
#include <linux/mfd/dbx500-prcmu.h>
#include <linux/module.h>
#include <linux/of.h>
-#include <linux/platform_data/db8500_thermal.h>
#include <linux/platform_device.h>
#include <linux/slab.h>
#include <linux/thermal.h>
#define PRCMU_DEFAULT_MEASURE_TIME 0xFFF
#define PRCMU_DEFAULT_LOW_TEMP 0
+/**
+ * db8500_thermal_points - the interpolation points that trigger
+ * interrupts
+ */
+static const unsigned long db8500_thermal_points[] = {
+ 15000,
+ 20000,
+ 25000,
+ 30000,
+ 35000,
+ 40000,
+ 45000,
+ 50000,
+ 55000,
+ 60000,
+ 65000,
+ 70000,
+ 75000,
+ 80000,
+ /*
+ * This is where things start to get really bad for the
+ * SoC and the thermal zones should be set up to trigger
+ * critical temperature at 85000 mC so we don't get above
+ * this point.
+ */
+ 85000,
+ 90000,
+ 95000,
+ 100000,
+};
+
struct db8500_thermal_zone {
- struct thermal_zone_device *therm_dev;
- struct mutex th_lock;
- struct work_struct therm_work;
- struct db8500_thsens_platform_data *trip_tab;
- enum thermal_device_mode mode;
+ struct thermal_zone_device *tz;
enum thermal_trend trend;
- unsigned long cur_temp_pseudo;
+ unsigned long interpolated_temp;
unsigned int cur_index;
};
-/* Local function to check if thermal zone matches cooling devices */
-static int db8500_thermal_match_cdev(struct thermal_cooling_device *cdev,
- struct db8500_trip_point *trip_point)
-{
- int i;
-
- if (!strlen(cdev->type))
- return -EINVAL;
-
- for (i = 0; i < COOLING_DEV_MAX; i++) {
- if (!strcmp(trip_point->cdev_name[i], cdev->type))
- return 0;
- }
-
- return -ENODEV;
-}
-
-/* Callback to bind cooling device to thermal zone */
-static int db8500_cdev_bind(struct thermal_zone_device *thermal,
- struct thermal_cooling_device *cdev)
-{
- struct db8500_thermal_zone *pzone = thermal->devdata;
- struct db8500_thsens_platform_data *ptrips = pzone->trip_tab;
- unsigned long max_state, upper, lower;
- int i, ret = -EINVAL;
-
- cdev->ops->get_max_state(cdev, &max_state);
-
- for (i = 0; i < ptrips->num_trips; i++) {
- if (db8500_thermal_match_cdev(cdev, &ptrips->trip_points[i]))
- continue;
-
- upper = lower = i > max_state ? max_state : i;
-
- ret = thermal_zone_bind_cooling_device(thermal, i, cdev,
- upper, lower, THERMAL_WEIGHT_DEFAULT);
-
- dev_info(&cdev->device, "%s bind to %d: %d-%s\n", cdev->type,
- i, ret, ret ? "fail" : "succeed");
- }
-
- return ret;
-}
-
-/* Callback to unbind cooling device from thermal zone */
-static int db8500_cdev_unbind(struct thermal_zone_device *thermal,
- struct thermal_cooling_device *cdev)
-{
- struct db8500_thermal_zone *pzone = thermal->devdata;
- struct db8500_thsens_platform_data *ptrips = pzone->trip_tab;
- int i, ret = -EINVAL;
-
- for (i = 0; i < ptrips->num_trips; i++) {
- if (db8500_thermal_match_cdev(cdev, &ptrips->trip_points[i]))
- continue;
-
- ret = thermal_zone_unbind_cooling_device(thermal, i, cdev);
-
- dev_info(&cdev->device, "%s unbind from %d: %s\n", cdev->type,
- i, ret ? "fail" : "succeed");
- }
-
- return ret;
-}
-
/* Callback to get current temperature */
-static int db8500_sys_get_temp(struct thermal_zone_device *thermal, int *temp)
+static int db8500_thermal_get_temp(void *data, int *temp)
{
- struct db8500_thermal_zone *pzone = thermal->devdata;
+ struct db8500_thermal_zone *th = data;
/*
* TODO: There is no PRCMU interface to get temperature data currently,
* so a pseudo temperature is returned , it works for thermal framework
* and this will be fixed when the PRCMU interface is available.
*/
- *temp = pzone->cur_temp_pseudo;
+ *temp = th->interpolated_temp;
return 0;
}
/* Callback to get temperature changing trend */
-static int db8500_sys_get_trend(struct thermal_zone_device *thermal,
- int trip, enum thermal_trend *trend)
-{
- struct db8500_thermal_zone *pzone = thermal->devdata;
-
- *trend = pzone->trend;
-
- return 0;
-}
-
-/* Callback to get thermal zone mode */
-static int db8500_sys_get_mode(struct thermal_zone_device *thermal,
- enum thermal_device_mode *mode)
-{
- struct db8500_thermal_zone *pzone = thermal->devdata;
-
- mutex_lock(&pzone->th_lock);
- *mode = pzone->mode;
- mutex_unlock(&pzone->th_lock);
-
- return 0;
-}
-
-/* Callback to set thermal zone mode */
-static int db8500_sys_set_mode(struct thermal_zone_device *thermal,
- enum thermal_device_mode mode)
-{
- struct db8500_thermal_zone *pzone = thermal->devdata;
-
- mutex_lock(&pzone->th_lock);
-
- pzone->mode = mode;
- if (mode == THERMAL_DEVICE_ENABLED)
- schedule_work(&pzone->therm_work);
-
- mutex_unlock(&pzone->th_lock);
-
- return 0;
-}
-
-/* Callback to get trip point type */
-static int db8500_sys_get_trip_type(struct thermal_zone_device *thermal,
- int trip, enum thermal_trip_type *type)
-{
- struct db8500_thermal_zone *pzone = thermal->devdata;
- struct db8500_thsens_platform_data *ptrips = pzone->trip_tab;
-
- if (trip >= ptrips->num_trips)
- return -EINVAL;
-
- *type = ptrips->trip_points[trip].type;
-
- return 0;
-}
-
-/* Callback to get trip point temperature */
-static int db8500_sys_get_trip_temp(struct thermal_zone_device *thermal,
- int trip, int *temp)
+static int db8500_thermal_get_trend(void *data, int trip, enum thermal_trend *trend)
{
- struct db8500_thermal_zone *pzone = thermal->devdata;
- struct db8500_thsens_platform_data *ptrips = pzone->trip_tab;
+ struct db8500_thermal_zone *th = data;
- if (trip >= ptrips->num_trips)
- return -EINVAL;
-
- *temp = ptrips->trip_points[trip].temp;
+ *trend = th->trend;
return 0;
}
-/* Callback to get critical trip point temperature */
-static int db8500_sys_get_crit_temp(struct thermal_zone_device *thermal,
- int *temp)
-{
- struct db8500_thermal_zone *pzone = thermal->devdata;
- struct db8500_thsens_platform_data *ptrips = pzone->trip_tab;
- int i;
-
- for (i = ptrips->num_trips - 1; i > 0; i--) {
- if (ptrips->trip_points[i].type == THERMAL_TRIP_CRITICAL) {
- *temp = ptrips->trip_points[i].temp;
- return 0;
- }
- }
-
- return -EINVAL;
-}
-
-static struct thermal_zone_device_ops thdev_ops = {
- .bind = db8500_cdev_bind,
- .unbind = db8500_cdev_unbind,
- .get_temp = db8500_sys_get_temp,
- .get_trend = db8500_sys_get_trend,
- .get_mode = db8500_sys_get_mode,
- .set_mode = db8500_sys_set_mode,
- .get_trip_type = db8500_sys_get_trip_type,
- .get_trip_temp = db8500_sys_get_trip_temp,
- .get_crit_temp = db8500_sys_get_crit_temp,
+static struct thermal_zone_of_device_ops thdev_ops = {
+ .get_temp = db8500_thermal_get_temp,
+ .get_trend = db8500_thermal_get_trend,
};
-static void db8500_thermal_update_config(struct db8500_thermal_zone *pzone,
- unsigned int idx, enum thermal_trend trend,
- unsigned long next_low, unsigned long next_high)
+static void db8500_thermal_update_config(struct db8500_thermal_zone *th,
+ unsigned int idx,
+ enum thermal_trend trend,
+ unsigned long next_low,
+ unsigned long next_high)
{
prcmu_stop_temp_sense();
- pzone->cur_index = idx;
- pzone->cur_temp_pseudo = (next_low + next_high)/2;
- pzone->trend = trend;
+ th->cur_index = idx;
+ th->interpolated_temp = (next_low + next_high)/2;
+ th->trend = trend;
+ /*
+ * The PRCMU accept absolute temperatures in celsius so divide
+ * down the millicelsius with 1000
+ */
prcmu_config_hotmon((u8)(next_low/1000), (u8)(next_high/1000));
prcmu_start_temp_sense(PRCMU_DEFAULT_MEASURE_TIME);
}
static irqreturn_t prcmu_low_irq_handler(int irq, void *irq_data)
{
- struct db8500_thermal_zone *pzone = irq_data;
- struct db8500_thsens_platform_data *ptrips = pzone->trip_tab;
- unsigned int idx = pzone->cur_index;
+ struct db8500_thermal_zone *th = irq_data;
+ unsigned int idx = th->cur_index;
unsigned long next_low, next_high;
- if (unlikely(idx == 0))
+ if (idx == 0)
/* Meaningless for thermal management, ignoring it */
return IRQ_HANDLED;
if (idx == 1) {
- next_high = ptrips->trip_points[0].temp;
+ next_high = db8500_thermal_points[0];
next_low = PRCMU_DEFAULT_LOW_TEMP;
} else {
- next_high = ptrips->trip_points[idx-1].temp;
- next_low = ptrips->trip_points[idx-2].temp;
+ next_high = db8500_thermal_points[idx - 1];
+ next_low = db8500_thermal_points[idx - 2];
}
idx -= 1;
- db8500_thermal_update_config(pzone, idx, THERMAL_TREND_DROPPING,
- next_low, next_high);
-
- dev_dbg(&pzone->therm_dev->device,
+ db8500_thermal_update_config(th, idx, THERMAL_TREND_DROPPING,
+ next_low, next_high);
+ dev_dbg(&th->tz->device,
"PRCMU set max %ld, min %ld\n", next_high, next_low);
- schedule_work(&pzone->therm_work);
+ thermal_zone_device_update(th->tz, THERMAL_EVENT_UNSPECIFIED);
return IRQ_HANDLED;
}
static irqreturn_t prcmu_high_irq_handler(int irq, void *irq_data)
{
- struct db8500_thermal_zone *pzone = irq_data;
- struct db8500_thsens_platform_data *ptrips = pzone->trip_tab;
- unsigned int idx = pzone->cur_index;
+ struct db8500_thermal_zone *th = irq_data;
+ unsigned int idx = th->cur_index;
unsigned long next_low, next_high;
+ int num_points = ARRAY_SIZE(db8500_thermal_points);
- if (idx < ptrips->num_trips - 1) {
- next_high = ptrips->trip_points[idx+1].temp;
- next_low = ptrips->trip_points[idx].temp;
+ if (idx < num_points - 1) {
+ next_high = db8500_thermal_points[idx+1];
+ next_low = db8500_thermal_points[idx];
idx += 1;
- db8500_thermal_update_config(pzone, idx, THERMAL_TREND_RAISING,
- next_low, next_high);
+ db8500_thermal_update_config(th, idx, THERMAL_TREND_RAISING,
+ next_low, next_high);
- dev_dbg(&pzone->therm_dev->device,
- "PRCMU set max %ld, min %ld\n", next_high, next_low);
- } else if (idx == ptrips->num_trips - 1)
- pzone->cur_temp_pseudo = ptrips->trip_points[idx].temp + 1;
+ dev_info(&th->tz->device,
+ "PRCMU set max %ld, min %ld\n", next_high, next_low);
+ } else if (idx == num_points - 1)
+ /* So we roof out 1 degree over the max point */
+ th->interpolated_temp = db8500_thermal_points[idx] + 1;
- schedule_work(&pzone->therm_work);
+ thermal_zone_device_update(th->tz, THERMAL_EVENT_UNSPECIFIED);
return IRQ_HANDLED;
}
-static void db8500_thermal_work(struct work_struct *work)
-{
- enum thermal_device_mode cur_mode;
- struct db8500_thermal_zone *pzone;
-
- pzone = container_of(work, struct db8500_thermal_zone, therm_work);
-
- mutex_lock(&pzone->th_lock);
- cur_mode = pzone->mode;
- mutex_unlock(&pzone->th_lock);
-
- if (cur_mode == THERMAL_DEVICE_DISABLED)
- return;
-
- thermal_zone_device_update(pzone->therm_dev, THERMAL_EVENT_UNSPECIFIED);
- dev_dbg(&pzone->therm_dev->device, "thermal work finished.\n");
-}
-
-#ifdef CONFIG_OF
-static struct db8500_thsens_platform_data*
- db8500_thermal_parse_dt(struct platform_device *pdev)
-{
- struct db8500_thsens_platform_data *ptrips;
- struct device_node *np = pdev->dev.of_node;
- char prop_name[32];
- const char *tmp_str;
- u32 tmp_data;
- int i, j;
-
- ptrips = devm_kzalloc(&pdev->dev, sizeof(*ptrips), GFP_KERNEL);
- if (!ptrips)
- return NULL;
-
- if (of_property_read_u32(np, "num-trips", &tmp_data))
- goto err_parse_dt;
-
- if (tmp_data > THERMAL_MAX_TRIPS)
- goto err_parse_dt;
-
- ptrips->num_trips = tmp_data;
-
- for (i = 0; i < ptrips->num_trips; i++) {
- sprintf(prop_name, "trip%d-temp", i);
- if (of_property_read_u32(np, prop_name, &tmp_data))
- goto err_parse_dt;
-
- ptrips->trip_points[i].temp = tmp_data;
-
- sprintf(prop_name, "trip%d-type", i);
- if (of_property_read_string(np, prop_name, &tmp_str))
- goto err_parse_dt;
-
- if (!strcmp(tmp_str, "active"))
- ptrips->trip_points[i].type = THERMAL_TRIP_ACTIVE;
- else if (!strcmp(tmp_str, "passive"))
- ptrips->trip_points[i].type = THERMAL_TRIP_PASSIVE;
- else if (!strcmp(tmp_str, "hot"))
- ptrips->trip_points[i].type = THERMAL_TRIP_HOT;
- else if (!strcmp(tmp_str, "critical"))
- ptrips->trip_points[i].type = THERMAL_TRIP_CRITICAL;
- else
- goto err_parse_dt;
-
- sprintf(prop_name, "trip%d-cdev-num", i);
- if (of_property_read_u32(np, prop_name, &tmp_data))
- goto err_parse_dt;
-
- if (tmp_data > COOLING_DEV_MAX)
- goto err_parse_dt;
-
- for (j = 0; j < tmp_data; j++) {
- sprintf(prop_name, "trip%d-cdev-name%d", i, j);
- if (of_property_read_string(np, prop_name, &tmp_str))
- goto err_parse_dt;
-
- if (strlen(tmp_str) >= THERMAL_NAME_LENGTH)
- goto err_parse_dt;
-
- strcpy(ptrips->trip_points[i].cdev_name[j], tmp_str);
- }
- }
- return ptrips;
-
-err_parse_dt:
- dev_err(&pdev->dev, "Parsing device tree data error.\n");
- return NULL;
-}
-#else
-static inline struct db8500_thsens_platform_data*
- db8500_thermal_parse_dt(struct platform_device *pdev)
-{
- return NULL;
-}
-#endif
-
static int db8500_thermal_probe(struct platform_device *pdev)
{
- struct db8500_thermal_zone *pzone = NULL;
- struct db8500_thsens_platform_data *ptrips = NULL;
- struct device_node *np = pdev->dev.of_node;
+ struct db8500_thermal_zone *th = NULL;
+ struct device *dev = &pdev->dev;
int low_irq, high_irq, ret = 0;
- unsigned long dft_low, dft_high;
- if (np)
- ptrips = db8500_thermal_parse_dt(pdev);
- else
- ptrips = dev_get_platdata(&pdev->dev);
-
- if (!ptrips)
- return -EINVAL;
-
- pzone = devm_kzalloc(&pdev->dev, sizeof(*pzone), GFP_KERNEL);
- if (!pzone)
+ th = devm_kzalloc(dev, sizeof(*th), GFP_KERNEL);
+ if (!th)
return -ENOMEM;
- mutex_init(&pzone->th_lock);
- mutex_lock(&pzone->th_lock);
-
- pzone->mode = THERMAL_DEVICE_DISABLED;
- pzone->trip_tab = ptrips;
-
- INIT_WORK(&pzone->therm_work, db8500_thermal_work);
-
low_irq = platform_get_irq_byname(pdev, "IRQ_HOTMON_LOW");
if (low_irq < 0) {
- dev_err(&pdev->dev, "Get IRQ_HOTMON_LOW failed.\n");
- ret = low_irq;
- goto out_unlock;
+ dev_err(dev, "Get IRQ_HOTMON_LOW failed\n");
+ return low_irq;
}
- ret = devm_request_threaded_irq(&pdev->dev, low_irq, NULL,
+ ret = devm_request_threaded_irq(dev, low_irq, NULL,
prcmu_low_irq_handler, IRQF_NO_SUSPEND | IRQF_ONESHOT,
- "dbx500_temp_low", pzone);
+ "dbx500_temp_low", th);
if (ret < 0) {
- dev_err(&pdev->dev, "Failed to allocate temp low irq.\n");
- goto out_unlock;
+ dev_err(dev, "failed to allocate temp low irq\n");
+ return ret;
}
high_irq = platform_get_irq_byname(pdev, "IRQ_HOTMON_HIGH");
if (high_irq < 0) {
- dev_err(&pdev->dev, "Get IRQ_HOTMON_HIGH failed.\n");
- ret = high_irq;
- goto out_unlock;
+ dev_err(dev, "Get IRQ_HOTMON_HIGH failed\n");
+ return high_irq;
}
- ret = devm_request_threaded_irq(&pdev->dev, high_irq, NULL,
+ ret = devm_request_threaded_irq(dev, high_irq, NULL,
prcmu_high_irq_handler, IRQF_NO_SUSPEND | IRQF_ONESHOT,
- "dbx500_temp_high", pzone);
+ "dbx500_temp_high", th);
if (ret < 0) {
- dev_err(&pdev->dev, "Failed to allocate temp high irq.\n");
- goto out_unlock;
+ dev_err(dev, "failed to allocate temp high irq\n");
+ return ret;
}
- pzone->therm_dev = thermal_zone_device_register("db8500_thermal_zone",
- ptrips->num_trips, 0, pzone, &thdev_ops, NULL, 0, 0);
-
- if (IS_ERR(pzone->therm_dev)) {
- dev_err(&pdev->dev, "Register thermal zone device failed.\n");
- ret = PTR_ERR(pzone->therm_dev);
- goto out_unlock;
+ /* register of thermal sensor and get info from DT */
+ th->tz = devm_thermal_zone_of_sensor_register(dev, 0, th, &thdev_ops);
+ if (IS_ERR(th->tz)) {
+ dev_err(dev, "register thermal zone sensor failed\n");
+ return PTR_ERR(th->tz);
}
- dev_info(&pdev->dev, "Thermal zone device registered.\n");
-
- dft_low = PRCMU_DEFAULT_LOW_TEMP;
- dft_high = ptrips->trip_points[0].temp;
-
- db8500_thermal_update_config(pzone, 0, THERMAL_TREND_STABLE,
- dft_low, dft_high);
-
- platform_set_drvdata(pdev, pzone);
- pzone->mode = THERMAL_DEVICE_ENABLED;
+ dev_info(dev, "thermal zone sensor registered\n");
-out_unlock:
- mutex_unlock(&pzone->th_lock);
+ /* Start measuring at the lowest point */
+ db8500_thermal_update_config(th, 0, THERMAL_TREND_STABLE,
+ PRCMU_DEFAULT_LOW_TEMP,
+ db8500_thermal_points[0]);
- return ret;
-}
-
-static int db8500_thermal_remove(struct platform_device *pdev)
-{
- struct db8500_thermal_zone *pzone = platform_get_drvdata(pdev);
-
- thermal_zone_device_unregister(pzone->therm_dev);
- cancel_work_sync(&pzone->therm_work);
- mutex_destroy(&pzone->th_lock);
+ platform_set_drvdata(pdev, th);
return 0;
}
static int db8500_thermal_suspend(struct platform_device *pdev,
pm_message_t state)
{
- struct db8500_thermal_zone *pzone = platform_get_drvdata(pdev);
-
- flush_work(&pzone->therm_work);
prcmu_stop_temp_sense();
return 0;
static int db8500_thermal_resume(struct platform_device *pdev)
{
- struct db8500_thermal_zone *pzone = platform_get_drvdata(pdev);
- struct db8500_thsens_platform_data *ptrips = pzone->trip_tab;
- unsigned long dft_low, dft_high;
-
- dft_low = PRCMU_DEFAULT_LOW_TEMP;
- dft_high = ptrips->trip_points[0].temp;
+ struct db8500_thermal_zone *th = platform_get_drvdata(pdev);
- db8500_thermal_update_config(pzone, 0, THERMAL_TREND_STABLE,
- dft_low, dft_high);
+ /* Resume and start measuring at the lowest point */
+ db8500_thermal_update_config(th, 0, THERMAL_TREND_STABLE,
+ PRCMU_DEFAULT_LOW_TEMP,
+ db8500_thermal_points[0]);
return 0;
}
-#ifdef CONFIG_OF
static const struct of_device_id db8500_thermal_match[] = {
{ .compatible = "stericsson,db8500-thermal" },
{},
};
MODULE_DEVICE_TABLE(of, db8500_thermal_match);
-#endif
static struct platform_driver db8500_thermal_driver = {
.driver = {
.probe = db8500_thermal_probe,
.suspend = db8500_thermal_suspend,
.resume = db8500_thermal_resume,
- .remove = db8500_thermal_remove,
};
module_platform_driver(db8500_thermal_driver);
struct acpi_buffer element = { 0, NULL };
struct acpi_buffer trt_format = { sizeof("RRNNNNNN"), "RRNNNNNN" };
- if (!acpi_has_method(handle, "_TRT"))
- return -ENODEV;
-
status = acpi_evaluate_object(handle, "_TRT", NULL, &buffer);
if (ACPI_FAILURE(status))
return -ENODEV;
struct acpi_buffer art_format = {
sizeof("RRNNNNNNNNNNN"), "RRNNNNNNNNNNN" };
- if (!acpi_has_method(handle, "_ART"))
- return -ENODEV;
-
status = acpi_evaluate_object(handle, "_ART", NULL, &buffer);
if (ACPI_FAILURE(status))
return -ENODEV;
p = buf.pointer;
if (!p || (p->type != ACPI_TYPE_PACKAGE)) {
- printk(KERN_WARNING "Invalid PPSS data\n");
+ pr_warn("Invalid PPSS data\n");
kfree(buf.pointer);
return -EFAULT;
}
/* GeminiLake thermal reporting device */
#define PCI_DEVICE_ID_PROC_GLK_THERMAL 0x318C
+/* IceLake thermal reporting device */
+#define PCI_DEVICE_ID_PROC_ICL_THERMAL 0x8a03
+
#define DRV_NAME "proc_thermal"
struct power_config {
.name = "power_limits"
};
+static ssize_t tcc_offset_degree_celsius_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ u64 val;
+ int err;
+
+ err = rdmsrl_safe(MSR_IA32_TEMPERATURE_TARGET, &val);
+ if (err)
+ return err;
+
+ val = (val >> 24) & 0xff;
+ return sprintf(buf, "%d\n", (int)val);
+}
+
+static int tcc_offset_update(int tcc)
+{
+ u64 val;
+ int err;
+
+ if (!tcc)
+ return -EINVAL;
+
+ err = rdmsrl_safe(MSR_IA32_TEMPERATURE_TARGET, &val);
+ if (err)
+ return err;
+
+ val &= ~GENMASK_ULL(31, 24);
+ val |= (tcc & 0xff) << 24;
+
+ err = wrmsrl_safe(MSR_IA32_TEMPERATURE_TARGET, val);
+ if (err)
+ return err;
+
+ return 0;
+}
+
+static int tcc_offset_save;
+
+static ssize_t tcc_offset_degree_celsius_store(struct device *dev,
+ struct device_attribute *attr, const char *buf,
+ size_t count)
+{
+ u64 val;
+ int tcc, err;
+
+ err = rdmsrl_safe(MSR_PLATFORM_INFO, &val);
+ if (err)
+ return err;
+
+ if (!(val & BIT(30)))
+ return -EACCES;
+
+ if (kstrtoint(buf, 0, &tcc))
+ return -EINVAL;
+
+ err = tcc_offset_update(tcc);
+ if (err)
+ return err;
+
+ tcc_offset_save = tcc;
+
+ return count;
+}
+
+static DEVICE_ATTR_RW(tcc_offset_degree_celsius);
+
static int stored_tjmax; /* since it is fixed, we can have local storage */
static int get_tjmax(void)
acpi_remove_notify_handler(proc_priv->adev->handle,
ACPI_DEVICE_NOTIFY, proc_thermal_notify);
int340x_thermal_zone_remove(proc_priv->int340x_zone);
+ sysfs_remove_file(&proc_priv->dev->kobj, &dev_attr_tcc_offset_degree_celsius.attr);
sysfs_remove_group(&proc_priv->dev->kobj,
&power_limit_attribute_group);
}
dev_info(&pdev->dev, "Creating sysfs group for PROC_THERMAL_PLATFORM_DEV\n");
- return sysfs_create_group(&pdev->dev.kobj,
- &power_limit_attribute_group);
+ ret = sysfs_create_file(&pdev->dev.kobj, &dev_attr_tcc_offset_degree_celsius.attr);
+ if (ret)
+ return ret;
+
+ ret = sysfs_create_group(&pdev->dev.kobj, &power_limit_attribute_group);
+ if (ret)
+ sysfs_remove_file(&pdev->dev.kobj, &dev_attr_tcc_offset_degree_celsius.attr);
+
+ return ret;
}
static int int3401_remove(struct platform_device *pdev)
dev_info(&pdev->dev, "Creating sysfs group for PROC_THERMAL_PCI\n");
- return sysfs_create_group(&pdev->dev.kobj,
- &power_limit_attribute_group);
+ ret = sysfs_create_file(&pdev->dev.kobj, &dev_attr_tcc_offset_degree_celsius.attr);
+ if (ret)
+ return ret;
+
+ ret = sysfs_create_group(&pdev->dev.kobj, &power_limit_attribute_group);
+ if (ret)
+ sysfs_remove_file(&pdev->dev.kobj, &dev_attr_tcc_offset_degree_celsius.attr);
+
+ return ret;
}
static void proc_thermal_pci_remove(struct pci_dev *pdev)
proc_dev = dev_get_drvdata(dev);
proc_thermal_read_ppcc(proc_dev);
+ tcc_offset_update(tcc_offset_save);
+
return 0;
}
#else
{ PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_PROC_CNL_THERMAL)},
{ PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_PROC_CFL_THERMAL)},
{ PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_PROC_GLK_THERMAL)},
+ { PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_PROC_ICL_THERMAL),
+ .driver_data = (kernel_ulong_t)&rapl_mmio_hsw, },
{ 0, },
};
static int intel_pch_thermal_suspend(struct device *device)
{
- struct pci_dev *pdev = to_pci_dev(device);
- struct pch_thermal_device *ptd = pci_get_drvdata(pdev);
+ struct pch_thermal_device *ptd = dev_get_drvdata(device);
return ptd->ops->suspend(ptd);
}
static int intel_pch_thermal_resume(struct device *device)
{
- struct pci_dev *pdev = to_pci_dev(device);
- struct pch_thermal_device *ptd = pci_get_drvdata(pdev);
+ struct pch_thermal_device *ptd = dev_get_drvdata(device);
return ptd->ops->resume(ptd);
}
for (i = 0; i < num_read; i++, s++)
s->offset = data[i];
+ kfree(data);
+
return 0;
}
return PTR_ERR(qfprom_cdata);
qfprom_csel = (u32 *)qfprom_read(priv->dev, "calib_sel");
- if (IS_ERR(qfprom_csel))
+ if (IS_ERR(qfprom_csel)) {
+ kfree(qfprom_cdata);
return PTR_ERR(qfprom_csel);
+ }
mode = (qfprom_csel[0] & MSM8916_CAL_SEL_MASK) >> MSM8916_CAL_SEL_SHIFT;
dev_dbg(priv->dev, "calibration mode is %d\n", mode);
}
compute_intercept_slope(priv, p1, p2, mode);
+ kfree(qfprom_cdata);
+ kfree(qfprom_csel);
return 0;
}
return PTR_ERR(calib);
bkp = (u32 *)qfprom_read(priv->dev, "calib_backup");
- if (IS_ERR(bkp))
+ if (IS_ERR(bkp)) {
+ kfree(calib);
return PTR_ERR(bkp);
+ }
calib_redun_sel = bkp[1] & BKP_REDUN_SEL;
calib_redun_sel >>= BKP_REDUN_SHIFT;
}
compute_intercept_slope(priv, p1, p2, mode);
+ kfree(calib);
+ kfree(bkp);
return 0;
}
}
compute_intercept_slope(priv, p1, p2, mode);
+ kfree(qfprom_cdata);
return 0;
}
#include <linux/thermal.h>
#include <linux/regmap.h>
+#include <linux/slab.h>
struct tsens_priv;
//
// Copyright 2016 Freescale Semiconductor, Inc.
+#include <linux/clk.h>
#include <linux/module.h>
#include <linux/platform_device.h>
#include <linux/err.h>
struct qoriq_tmu_data {
struct qoriq_tmu_regs __iomem *regs;
+ struct clk *clk;
bool little_endian;
struct qoriq_sensor *sensor[SITES_MAX];
};
data->little_endian = of_property_read_bool(np, "little-endian");
- data->regs = of_iomap(np, 0);
- if (!data->regs) {
+ data->regs = devm_platform_ioremap_resource(pdev, 0);
+ if (IS_ERR(data->regs)) {
dev_err(&pdev->dev, "Failed to get memory region\n");
- ret = -ENODEV;
- goto err_iomap;
+ return PTR_ERR(data->regs);
+ }
+
+ data->clk = devm_clk_get_optional(&pdev->dev, NULL);
+ if (IS_ERR(data->clk))
+ return PTR_ERR(data->clk);
+
+ ret = clk_prepare_enable(data->clk);
+ if (ret) {
+ dev_err(&pdev->dev, "Failed to enable clock\n");
+ return ret;
}
qoriq_tmu_init_device(data); /* TMU initialization */
ret = qoriq_tmu_calibration(pdev); /* TMU calibration */
if (ret < 0)
- goto err_tmu;
+ goto err;
ret = qoriq_tmu_register_tmu_zone(pdev);
if (ret < 0) {
dev_err(&pdev->dev, "Failed to register sensors\n");
ret = -ENODEV;
- goto err_iomap;
+ goto err;
}
return 0;
-err_tmu:
- iounmap(data->regs);
-
-err_iomap:
+err:
+ clk_disable_unprepare(data->clk);
platform_set_drvdata(pdev, NULL);
return ret;
/* Disable monitoring */
tmu_write(data, TMR_DISABLE, &data->regs->tmr);
- iounmap(data->regs);
+ clk_disable_unprepare(data->clk);
+
platform_set_drvdata(pdev, NULL);
return 0;
}
-#ifdef CONFIG_PM_SLEEP
-static int qoriq_tmu_suspend(struct device *dev)
+static int __maybe_unused qoriq_tmu_suspend(struct device *dev)
{
u32 tmr;
struct qoriq_tmu_data *data = dev_get_drvdata(dev);
tmr &= ~TMR_ME;
tmu_write(data, tmr, &data->regs->tmr);
+ clk_disable_unprepare(data->clk);
+
return 0;
}
-static int qoriq_tmu_resume(struct device *dev)
+static int __maybe_unused qoriq_tmu_resume(struct device *dev)
{
u32 tmr;
+ int ret;
struct qoriq_tmu_data *data = dev_get_drvdata(dev);
+ ret = clk_prepare_enable(data->clk);
+ if (ret)
+ return ret;
+
/* Enable monitoring */
tmr = tmu_read(data, &data->regs->tmr);
tmr |= TMR_ME;
return 0;
}
-#endif
static SIMPLE_DEV_PM_OPS(qoriq_tmu_pm_ops,
qoriq_tmu_suspend, qoriq_tmu_resume);
if (ret)
goto error_unregister;
- ret = devm_add_action(dev, rcar_gen3_hwmon_action, zone);
+ ret = devm_add_action_or_reset(dev, rcar_gen3_hwmon_action, zone);
if (ret) {
- rcar_gen3_hwmon_action(zone);
goto error_unregister;
}
/* get dividend from the depth */
#define THROT_DEPTH_DIVIDEND(depth) ((256 * (100 - (depth)) / 100) - 1)
-/* gk20a nv_therm interface N:3 Mapping. Levels defined in tegra124-sochterm.h
+/* gk20a nv_therm interface N:3 Mapping. Levels defined in tegra124-soctherm.h
* level vector
* NONE 3'b000
* LOW 3'b001
&tz->poll_queue,
msecs_to_jiffies(delay));
else
- cancel_delayed_work(&tz->poll_queue);
+ cancel_delayed_work_sync(&tz->poll_queue);
}
static void monitor_thermal_zone(struct thermal_zone_device *tz)
result = device_register(&cdev->device);
if (result) {
ida_simple_remove(&thermal_cdev_ida, cdev->id);
- kfree(cdev);
+ put_device(&cdev->device);
return ERR_PTR(result);
}
struct thermal_zone_device *tz;
enum thermal_trip_type trip_type;
int trip_temp;
+ int id;
int result;
int count;
struct thermal_governor *governor;
- if (!type || strlen(type) == 0)
+ if (!type || strlen(type) == 0) {
+ pr_err("Error: No thermal zone type defined\n");
return ERR_PTR(-EINVAL);
+ }
- if (type && strlen(type) >= THERMAL_NAME_LENGTH)
+ if (type && strlen(type) >= THERMAL_NAME_LENGTH) {
+ pr_err("Error: Thermal zone name (%s) too long, should be under %d chars\n",
+ type, THERMAL_NAME_LENGTH);
return ERR_PTR(-EINVAL);
+ }
- if (trips > THERMAL_MAX_TRIPS || trips < 0 || mask >> trips)
+ if (trips > THERMAL_MAX_TRIPS || trips < 0 || mask >> trips) {
+ pr_err("Error: Incorrect number of thermal trips\n");
return ERR_PTR(-EINVAL);
+ }
- if (!ops)
+ if (!ops) {
+ pr_err("Error: Thermal zone device ops not defined\n");
return ERR_PTR(-EINVAL);
+ }
if (trips > 0 && (!ops->get_trip_type || !ops->get_trip_temp))
return ERR_PTR(-EINVAL);
INIT_LIST_HEAD(&tz->thermal_instances);
ida_init(&tz->ida);
mutex_init(&tz->lock);
- result = ida_simple_get(&thermal_tz_ida, 0, 0, GFP_KERNEL);
- if (result < 0)
+ id = ida_simple_get(&thermal_tz_ida, 0, 0, GFP_KERNEL);
+ if (id < 0) {
+ result = id;
goto free_tz;
+ }
- tz->id = result;
+ tz->id = id;
strlcpy(tz->type, type, sizeof(tz->type));
tz->ops = ops;
tz->tzp = tzp;
dev_set_name(&tz->device, "thermal_zone%d", tz->id);
result = device_register(&tz->device);
if (result)
- goto remove_device_groups;
+ goto release_device;
for (count = 0; count < trips; count++) {
if (tz->ops->get_trip_type(tz, count, &trip_type))
return tz;
unregister:
- ida_simple_remove(&thermal_tz_ida, tz->id);
- device_unregister(&tz->device);
- return ERR_PTR(result);
-
-remove_device_groups:
- thermal_zone_destroy_device_groups(tz);
+ device_del(&tz->device);
+release_device:
+ put_device(&tz->device);
+ tz = NULL;
remove_id:
- ida_simple_remove(&thermal_tz_ida, tz->id);
+ ida_simple_remove(&thermal_tz_ida, id);
free_tz:
kfree(tz);
return ERR_PTR(result);
thermal_hwmon_lookup_by_type(const struct thermal_zone_device *tz)
{
struct thermal_hwmon_device *hwmon;
+ char type[THERMAL_NAME_LENGTH];
mutex_lock(&thermal_hwmon_list_lock);
- list_for_each_entry(hwmon, &thermal_hwmon_list, node)
- if (!strcmp(hwmon->type, tz->type)) {
+ list_for_each_entry(hwmon, &thermal_hwmon_list, node) {
+ strcpy(type, tz->type);
+ strreplace(type, '-', '_');
+ if (!strcmp(hwmon->type, type)) {
mutex_unlock(&thermal_hwmon_list_lock);
return hwmon;
}
+ }
mutex_unlock(&thermal_hwmon_list_lock);
return NULL;
return -ENOMEM;
resource = platform_get_resource(pdev, IORESOURCE_MEM, 0);
- if (IS_ERR(resource)) {
- dev_err(&pdev->dev,
- "fail to get platform memory resource (%ld)\n",
- PTR_ERR(resource));
- return PTR_ERR(resource);
- }
-
sensor->mmio_base = devm_ioremap_resource(&pdev->dev, resource);
if (IS_ERR(sensor->mmio_base)) {
dev_err(&pdev->dev, "failed to ioremap memory (%ld)\n",
#include <linux/serial_core.h>
#include <linux/delay.h>
#include <linux/mutex.h>
+#include <linux/security.h>
#include <linux/irq.h>
#include <linux/uaccess.h>
goto check_and_exit;
}
+ retval = security_locked_down(LOCKDOWN_TIOCSSERIAL);
+ if (retval && (change_irq || change_port))
+ goto exit;
+
/*
* Ask the low level driver to verify the settings.
*/
# How to generate logo's
-# Use logo-cfiles to retrieve list of .c files to be built
-logo-cfiles = $(notdir $(patsubst %.$(2), %.c, \
- $(wildcard $(srctree)/$(src)/*$(1).$(2))))
-
-
-# Mono logos
-extra-y += $(call logo-cfiles,_mono,pbm)
-
-# VGA16 logos
-extra-y += $(call logo-cfiles,_vga16,ppm)
-
-# 224 Logos
-extra-y += $(call logo-cfiles,_clut224,ppm)
-
-# Gray 256
-extra-y += $(call logo-cfiles,_gray256,pgm)
-
pnmtologo := scripts/pnmtologo
# Create commands like "pnmtologo -t mono -n logo_mac_mono -o ..."
$(obj)/%_gray256.c: $(src)/%_gray256.pgm $(pnmtologo) FORCE
$(call if_changed,logo)
-# Files generated that shall be removed upon make clean
-clean-files := *.o *_mono.c *_vga16.c *_clut224.c *_gray256.c
+# generated C files
+targets += *_mono.c *_vga16.c *_clut224.c *_gray256.c
Say N if you are unsure.
-config KS8695_WATCHDOG
- tristate "KS8695 watchdog"
- depends on ARCH_KS8695
- help
- Watchdog timer embedded into KS8695 processor. This will reboot your
- system when the timeout is reached.
-
config HAVE_S3C2410_WATCHDOG
bool
help
To compile this driver as a module, choose M here: the
module will be called stmp3xxx_rtc_wdt.
-config NUC900_WATCHDOG
- tristate "Nuvoton NUC900 watchdog"
- depends on ARCH_W90X900 || COMPILE_TEST
- help
- Say Y here if to include support for the watchdog timer
- for the Nuvoton NUC900 series SoCs.
- To compile this driver as a module, choose M here: the
- module will be called nuc900_wdt.
-
config TS4800_WATCHDOG
tristate "TS-4800 Watchdog"
depends on HAS_IOMEM && OF
To compile this driver as a module, choose M here: the
module will be called imx_sc_wdt.
+config IMX7ULP_WDT
+ tristate "IMX7ULP Watchdog"
+ depends on ARCH_MXC || COMPILE_TEST
+ select WATCHDOG_CORE
+ help
+ This is the driver for the hardware watchdog on the Freescale
+ IMX7ULP and later processors. If you have one of these
+ processors and wish to have watchdog support enabled,
+ say Y, otherwise say N.
+
+ To compile this driver as a module, choose M here: the
+ module will be called imx7ulp_wdt.
+
config UX500_WATCHDOG
tristate "ST-Ericsson Ux500 watchdog"
depends on MFD_DB8500_PRCMU
depends on X86
help
This is the driver for the hardware watchdog on the Fintek F71808E,
- F71862FG, F71868, F71869, F71882FG, F71889FG, F81865 and F81866
- Super I/O controllers.
+ F71862FG, F71868, F71869, F71882FG, F71889FG, F81803, F81865, and
+ F81866 Super I/O controllers.
You can compile this driver directly into the kernel, or use
it as a module. The module will be called f71808e_wdt.
obj-$(CONFIG_977_WATCHDOG) += wdt977.o
obj-$(CONFIG_FTWDT010_WATCHDOG) += ftwdt010_wdt.o
obj-$(CONFIG_IXP4XX_WATCHDOG) += ixp4xx_wdt.o
-obj-$(CONFIG_KS8695_WATCHDOG) += ks8695_wdt.o
obj-$(CONFIG_S3C2410_WATCHDOG) += s3c2410_wdt.o
obj-$(CONFIG_SA1100_WATCHDOG) += sa1100_wdt.o
obj-$(CONFIG_SAMA5D4_WATCHDOG) += sama5d4_wdt.o
obj-$(CONFIG_COH901327_WATCHDOG) += coh901327_wdt.o
obj-$(CONFIG_NPCM7XX_WATCHDOG) += npcm_wdt.o
obj-$(CONFIG_STMP3XXX_RTC_WATCHDOG) += stmp3xxx_rtc_wdt.o
-obj-$(CONFIG_NUC900_WATCHDOG) += nuc900_wdt.o
obj-$(CONFIG_TS4800_WATCHDOG) += ts4800_wdt.o
obj-$(CONFIG_TS72XX_WATCHDOG) += ts72xx_wdt.o
obj-$(CONFIG_IMX2_WDT) += imx2_wdt.o
obj-$(CONFIG_IMX_SC_WDT) += imx_sc_wdt.o
+obj-$(CONFIG_IMX7ULP_WDT) += imx7ulp_wdt.o
obj-$(CONFIG_UX500_WATCHDOG) += ux500_wdt.o
obj-$(CONFIG_RETU_WATCHDOG) += retu_wdt.o
obj-$(CONFIG_BCM2835_WDT) += bcm2835_wdt.o
static const struct of_device_id aspeed_wdt_of_table[] = {
{ .compatible = "aspeed,ast2400-wdt", .data = &ast2400_config },
{ .compatible = "aspeed,ast2500-wdt", .data = &ast2500_config },
+ { .compatible = "aspeed,ast2600-wdt", .data = &ast2500_config },
{ },
};
MODULE_DEVICE_TABLE(of, aspeed_wdt_of_table);
#define WDT_CTRL_ENABLE BIT(0)
#define WDT_TIMEOUT_STATUS 0x10
#define WDT_TIMEOUT_STATUS_BOOT_SECONDARY BIT(1)
+#define WDT_CLEAR_TIMEOUT_STATUS 0x14
+#define WDT_CLEAR_TIMEOUT_AND_BOOT_CODE_SELECTION BIT(0)
/*
* WDT_RESET_WIDTH controls the characteristics of the external pulse (if
return 0;
}
+/* access_cs0 shows if cs0 is accessible, hence the reverted bit */
+static ssize_t access_cs0_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct aspeed_wdt *wdt = dev_get_drvdata(dev);
+ u32 status = readl(wdt->base + WDT_TIMEOUT_STATUS);
+
+ return sprintf(buf, "%u\n",
+ !(status & WDT_TIMEOUT_STATUS_BOOT_SECONDARY));
+}
+
+static ssize_t access_cs0_store(struct device *dev,
+ struct device_attribute *attr, const char *buf,
+ size_t size)
+{
+ struct aspeed_wdt *wdt = dev_get_drvdata(dev);
+ unsigned long val;
+
+ if (kstrtoul(buf, 10, &val))
+ return -EINVAL;
+
+ if (val)
+ writel(WDT_CLEAR_TIMEOUT_AND_BOOT_CODE_SELECTION,
+ wdt->base + WDT_CLEAR_TIMEOUT_STATUS);
+
+ return size;
+}
+
+/*
+ * This attribute exists only if the system has booted from the alternate
+ * flash with 'alt-boot' option.
+ *
+ * At alternate flash the 'access_cs0' sysfs node provides:
+ * ast2400: a way to get access to the primary SPI flash chip at CS0
+ * after booting from the alternate chip at CS1.
+ * ast2500: a way to restore the normal address mapping from
+ * (CS0->CS1, CS1->CS0) to (CS0->CS0, CS1->CS1).
+ *
+ * Clearing the boot code selection and timeout counter also resets to the
+ * initial state the chip select line mapping. When the SoC is in normal
+ * mapping state (i.e. booted from CS0), clearing those bits does nothing for
+ * both versions of the SoC. For alternate boot mode (booted from CS1 due to
+ * wdt2 expiration) the behavior differs as described above.
+ *
+ * This option can be used with wdt2 (watchdog1) only.
+ */
+static DEVICE_ATTR_RW(access_cs0);
+
+static struct attribute *bswitch_attrs[] = {
+ &dev_attr_access_cs0.attr,
+ NULL
+};
+ATTRIBUTE_GROUPS(bswitch);
+
static const struct watchdog_ops aspeed_wdt_ops = {
.start = aspeed_wdt_start,
.stop = aspeed_wdt_stop,
set_bit(WDOG_HW_RUNNING, &wdt->wdd.status);
}
- if (of_device_is_compatible(np, "aspeed,ast2500-wdt")) {
+ if ((of_device_is_compatible(np, "aspeed,ast2500-wdt")) ||
+ (of_device_is_compatible(np, "aspeed,ast2600-wdt"))) {
u32 reg = readl(wdt->base + WDT_RESET_WIDTH);
reg &= config->ext_pulse_width_mask;
}
status = readl(wdt->base + WDT_TIMEOUT_STATUS);
- if (status & WDT_TIMEOUT_STATUS_BOOT_SECONDARY)
+ if (status & WDT_TIMEOUT_STATUS_BOOT_SECONDARY) {
wdt->wdd.bootstatus = WDIOF_CARDRESET;
+ if (of_device_is_compatible(np, "aspeed,ast2400-wdt") ||
+ of_device_is_compatible(np, "aspeed,ast2500-wdt"))
+ wdt->wdd.groups = bswitch_groups;
+ }
+
+ dev_set_drvdata(dev, wdt);
+
return devm_watchdog_register_device(dev, &wdt->wdd);
}
return 0;
}
-static void ath97_wdt_shutdown(struct platform_device *pdev)
+static void ath79_wdt_shutdown(struct platform_device *pdev)
{
ath79_wdt_disable();
}
static struct platform_driver ath79_wdt_driver = {
.probe = ath79_wdt_probe,
.remove = ath79_wdt_remove,
- .shutdown = ath97_wdt_shutdown,
+ .shutdown = ath79_wdt_shutdown,
.driver = {
.name = DRIVER_NAME,
.of_match_table = of_match_ptr(ath79_wdt_match),
return 0;
}
-static long cpwd_compat_ioctl(struct file *file, unsigned int cmd,
- unsigned long arg)
-{
- int rval = -ENOIOCTLCMD;
-
- switch (cmd) {
- /* solaris ioctls are specific to this driver */
- case WIOCSTART:
- case WIOCSTOP:
- case WIOCGSTAT:
- mutex_lock(&cpwd_mutex);
- rval = cpwd_ioctl(file, cmd, arg);
- mutex_unlock(&cpwd_mutex);
- break;
-
- /* everything else is handled by the generic compat layer */
- default:
- break;
- }
-
- return rval;
-}
-
static ssize_t cpwd_write(struct file *file, const char __user *buf,
size_t count, loff_t *ppos)
{
static const struct file_operations cpwd_fops = {
.owner = THIS_MODULE,
.unlocked_ioctl = cpwd_ioctl,
- .compat_ioctl = cpwd_compat_ioctl,
+ .compat_ioctl = compat_ptr_ioctl,
.open = cpwd_open,
.write = cpwd_write,
.read = cpwd_read,
#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/slab.h>
-#include <linux/miscdevice.h>
#include <linux/watchdog.h>
#include <linux/suspend.h>
#include <asm/ebcdic.h>
#include <asm/diag.h>
#include <linux/io.h>
-#include <linux/uaccess.h>
#define MAX_CMDLEN 240
#define DEFAULT_CMD "SYSTEM RESTART"
module_param_named(nowayout, nowayout_info, bool, 0444);
MODULE_PARM_DESC(nowayout, "Watchdog cannot be stopped once started (default = CONFIG_WATCHDOG_NOWAYOUT)");
-MODULE_ALIAS_MISCDEV(WATCHDOG_MINOR);
MODULE_ALIAS("vmwatchdog");
static int __diag288(unsigned int func, unsigned int timeout,
#define SIO_REG_DEVID 0x20 /* Device ID (2 bytes) */
#define SIO_REG_DEVREV 0x22 /* Device revision */
#define SIO_REG_MANID 0x23 /* Fintek ID (2 bytes) */
+#define SIO_REG_CLOCK_SEL 0x26 /* Clock select */
#define SIO_REG_ROM_ADDR_SEL 0x27 /* ROM address select */
#define SIO_F81866_REG_PORT_SEL 0x27 /* F81866 Multi-Function Register */
+#define SIO_REG_TSI_LEVEL_SEL 0x28 /* TSI Level select */
#define SIO_REG_MFUNCT1 0x29 /* Multi function select 1 */
#define SIO_REG_MFUNCT2 0x2a /* Multi function select 2 */
#define SIO_REG_MFUNCT3 0x2b /* Multi function select 3 */
#define SIO_F71869A_ID 0x1007 /* Chipset ID */
#define SIO_F71882_ID 0x0541 /* Chipset ID */
#define SIO_F71889_ID 0x0723 /* Chipset ID */
+#define SIO_F81803_ID 0x1210 /* Chipset ID */
#define SIO_F81865_ID 0x0704 /* Chipset ID */
#define SIO_F81866_ID 0x1010 /* Chipset ID */
" given initial timeout. Zero (default) disables this feature.");
enum chips { f71808fg, f71858fg, f71862fg, f71868, f71869, f71882fg, f71889fg,
- f81865, f81866};
+ f81803, f81865, f81866};
static const char *f71808e_names[] = {
"f71808fg",
"f71869",
"f71882fg",
"f71889fg",
+ "f81803",
"f81865",
"f81866",
};
superio_inb(watchdog.sioaddr, SIO_REG_MFUNCT3) & 0xcf);
break;
+ case f81803:
+ /* Enable TSI Level register bank */
+ superio_clear_bit(watchdog.sioaddr, SIO_REG_CLOCK_SEL, 3);
+ /* Set pin 27 to WDTRST# */
+ superio_outb(watchdog.sioaddr, SIO_REG_TSI_LEVEL_SEL, 0x5f &
+ superio_inb(watchdog.sioaddr, SIO_REG_TSI_LEVEL_SEL));
+ break;
+
case f81865:
/* Set pin 70 to WDTRST# */
superio_clear_bit(watchdog.sioaddr, SIO_REG_MFUNCT3, 5);
/* Confirmed (by datasheet) not to have a watchdog. */
err = -ENODEV;
goto exit;
+ case SIO_F81803_ID:
+ watchdog.type = f81803;
+ break;
case SIO_F81865_ID:
watchdog.type = f81865;
break;
#define IMX2_WDT_WMCR 0x08 /* Misc Register */
-#define IMX2_WDT_MAX_TIME 128
+#define IMX2_WDT_MAX_TIME 128U
#define IMX2_WDT_DEFAULT_TIME 60 /* in seconds */
#define WDOG_SEC_TO_COUNT(s) ((s * 2 - 1) << 8)
{
unsigned int actual;
- actual = min(new_timeout, wdog->max_hw_heartbeat_ms * 1000);
+ actual = min(new_timeout, IMX2_WDT_MAX_TIME);
__imx2_wdt_set_timeout(wdog, actual);
wdog->timeout = new_timeout;
return 0;
--- /dev/null
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2019 NXP.
+ */
+
+#include <linux/clk.h>
+#include <linux/init.h>
+#include <linux/io.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/platform_device.h>
+#include <linux/reboot.h>
+#include <linux/watchdog.h>
+
+#define WDOG_CS 0x0
+#define WDOG_CS_CMD32EN BIT(13)
+#define WDOG_CS_ULK BIT(11)
+#define WDOG_CS_RCS BIT(10)
+#define WDOG_CS_EN BIT(7)
+#define WDOG_CS_UPDATE BIT(5)
+
+#define WDOG_CNT 0x4
+#define WDOG_TOVAL 0x8
+
+#define REFRESH_SEQ0 0xA602
+#define REFRESH_SEQ1 0xB480
+#define REFRESH ((REFRESH_SEQ1 << 16) | REFRESH_SEQ0)
+
+#define UNLOCK_SEQ0 0xC520
+#define UNLOCK_SEQ1 0xD928
+#define UNLOCK ((UNLOCK_SEQ1 << 16) | UNLOCK_SEQ0)
+
+#define DEFAULT_TIMEOUT 60
+#define MAX_TIMEOUT 128
+#define WDOG_CLOCK_RATE 1000
+
+static bool nowayout = WATCHDOG_NOWAYOUT;
+module_param(nowayout, bool, 0000);
+MODULE_PARM_DESC(nowayout, "Watchdog cannot be stopped once started (default="
+ __MODULE_STRING(WATCHDOG_NOWAYOUT) ")");
+
+struct imx7ulp_wdt_device {
+ struct notifier_block restart_handler;
+ struct watchdog_device wdd;
+ void __iomem *base;
+ struct clk *clk;
+};
+
+static inline void imx7ulp_wdt_enable(void __iomem *base, bool enable)
+{
+ u32 val = readl(base + WDOG_CS);
+
+ writel(UNLOCK, base + WDOG_CNT);
+ if (enable)
+ writel(val | WDOG_CS_EN, base + WDOG_CS);
+ else
+ writel(val & ~WDOG_CS_EN, base + WDOG_CS);
+}
+
+static inline bool imx7ulp_wdt_is_enabled(void __iomem *base)
+{
+ u32 val = readl(base + WDOG_CS);
+
+ return val & WDOG_CS_EN;
+}
+
+static int imx7ulp_wdt_ping(struct watchdog_device *wdog)
+{
+ struct imx7ulp_wdt_device *wdt = watchdog_get_drvdata(wdog);
+
+ writel(REFRESH, wdt->base + WDOG_CNT);
+
+ return 0;
+}
+
+static int imx7ulp_wdt_start(struct watchdog_device *wdog)
+{
+ struct imx7ulp_wdt_device *wdt = watchdog_get_drvdata(wdog);
+
+ imx7ulp_wdt_enable(wdt->base, true);
+
+ return 0;
+}
+
+static int imx7ulp_wdt_stop(struct watchdog_device *wdog)
+{
+ struct imx7ulp_wdt_device *wdt = watchdog_get_drvdata(wdog);
+
+ imx7ulp_wdt_enable(wdt->base, false);
+
+ return 0;
+}
+
+static int imx7ulp_wdt_set_timeout(struct watchdog_device *wdog,
+ unsigned int timeout)
+{
+ struct imx7ulp_wdt_device *wdt = watchdog_get_drvdata(wdog);
+ u32 val = WDOG_CLOCK_RATE * timeout;
+
+ writel(UNLOCK, wdt->base + WDOG_CNT);
+ writel(val, wdt->base + WDOG_TOVAL);
+
+ wdog->timeout = timeout;
+
+ return 0;
+}
+
+static const struct watchdog_ops imx7ulp_wdt_ops = {
+ .owner = THIS_MODULE,
+ .start = imx7ulp_wdt_start,
+ .stop = imx7ulp_wdt_stop,
+ .ping = imx7ulp_wdt_ping,
+ .set_timeout = imx7ulp_wdt_set_timeout,
+};
+
+static const struct watchdog_info imx7ulp_wdt_info = {
+ .identity = "i.MX7ULP watchdog timer",
+ .options = WDIOF_SETTIMEOUT | WDIOF_KEEPALIVEPING |
+ WDIOF_MAGICCLOSE,
+};
+
+static inline void imx7ulp_wdt_init(void __iomem *base, unsigned int timeout)
+{
+ u32 val;
+
+ /* unlock the wdog for reconfiguration */
+ writel_relaxed(UNLOCK_SEQ0, base + WDOG_CNT);
+ writel_relaxed(UNLOCK_SEQ1, base + WDOG_CNT);
+
+ /* set an initial timeout value in TOVAL */
+ writel(timeout, base + WDOG_TOVAL);
+ /* enable 32bit command sequence and reconfigure */
+ val = BIT(13) | BIT(8) | BIT(5);
+ writel(val, base + WDOG_CS);
+}
+
+static void imx7ulp_wdt_action(void *data)
+{
+ clk_disable_unprepare(data);
+}
+
+static int imx7ulp_wdt_probe(struct platform_device *pdev)
+{
+ struct imx7ulp_wdt_device *imx7ulp_wdt;
+ struct device *dev = &pdev->dev;
+ struct watchdog_device *wdog;
+ int ret;
+
+ imx7ulp_wdt = devm_kzalloc(dev, sizeof(*imx7ulp_wdt), GFP_KERNEL);
+ if (!imx7ulp_wdt)
+ return -ENOMEM;
+
+ platform_set_drvdata(pdev, imx7ulp_wdt);
+
+ imx7ulp_wdt->base = devm_platform_ioremap_resource(pdev, 0);
+ if (IS_ERR(imx7ulp_wdt->base))
+ return PTR_ERR(imx7ulp_wdt->base);
+
+ imx7ulp_wdt->clk = devm_clk_get(dev, NULL);
+ if (IS_ERR(imx7ulp_wdt->clk)) {
+ dev_err(dev, "Failed to get watchdog clock\n");
+ return PTR_ERR(imx7ulp_wdt->clk);
+ }
+
+ ret = clk_prepare_enable(imx7ulp_wdt->clk);
+ if (ret)
+ return ret;
+
+ ret = devm_add_action_or_reset(dev, imx7ulp_wdt_action, imx7ulp_wdt->clk);
+ if (ret)
+ return ret;
+
+ wdog = &imx7ulp_wdt->wdd;
+ wdog->info = &imx7ulp_wdt_info;
+ wdog->ops = &imx7ulp_wdt_ops;
+ wdog->min_timeout = 1;
+ wdog->max_timeout = MAX_TIMEOUT;
+ wdog->parent = dev;
+ wdog->timeout = DEFAULT_TIMEOUT;
+
+ watchdog_init_timeout(wdog, 0, dev);
+ watchdog_stop_on_reboot(wdog);
+ watchdog_stop_on_unregister(wdog);
+ watchdog_set_drvdata(wdog, imx7ulp_wdt);
+ imx7ulp_wdt_init(imx7ulp_wdt->base, wdog->timeout * WDOG_CLOCK_RATE);
+
+ return devm_watchdog_register_device(dev, wdog);
+}
+
+static int __maybe_unused imx7ulp_wdt_suspend(struct device *dev)
+{
+ struct imx7ulp_wdt_device *imx7ulp_wdt = dev_get_drvdata(dev);
+
+ if (watchdog_active(&imx7ulp_wdt->wdd))
+ imx7ulp_wdt_stop(&imx7ulp_wdt->wdd);
+
+ clk_disable_unprepare(imx7ulp_wdt->clk);
+
+ return 0;
+}
+
+static int __maybe_unused imx7ulp_wdt_resume(struct device *dev)
+{
+ struct imx7ulp_wdt_device *imx7ulp_wdt = dev_get_drvdata(dev);
+ u32 timeout = imx7ulp_wdt->wdd.timeout * WDOG_CLOCK_RATE;
+ int ret;
+
+ ret = clk_prepare_enable(imx7ulp_wdt->clk);
+ if (ret)
+ return ret;
+
+ if (imx7ulp_wdt_is_enabled(imx7ulp_wdt->base))
+ imx7ulp_wdt_init(imx7ulp_wdt->base, timeout);
+
+ if (watchdog_active(&imx7ulp_wdt->wdd))
+ imx7ulp_wdt_start(&imx7ulp_wdt->wdd);
+
+ return 0;
+}
+
+static SIMPLE_DEV_PM_OPS(imx7ulp_wdt_pm_ops, imx7ulp_wdt_suspend,
+ imx7ulp_wdt_resume);
+
+static const struct of_device_id imx7ulp_wdt_dt_ids[] = {
+ { .compatible = "fsl,imx7ulp-wdt", },
+ { /* sentinel */ }
+};
+MODULE_DEVICE_TABLE(of, imx7ulp_wdt_dt_ids);
+
+static struct platform_driver imx7ulp_wdt_driver = {
+ .probe = imx7ulp_wdt_probe,
+ .driver = {
+ .name = "imx7ulp-wdt",
+ .pm = &imx7ulp_wdt_pm_ops,
+ .of_match_table = imx7ulp_wdt_dt_ids,
+ },
+};
+module_platform_driver(imx7ulp_wdt_driver);
+
+MODULE_AUTHOR("Anson Huang <Anson.Huang@nxp.com>");
+MODULE_DESCRIPTION("Freescale i.MX7ULP watchdog driver");
+MODULE_LICENSE("GPL v2");
watchdog_stop_on_unregister(wdog);
ret = devm_watchdog_register_device(dev, wdog);
-
- if (ret) {
- dev_err(dev, "Failed to register watchdog device\n");
- return ret;
- }
-
+ if (ret)
+ return ret;
+
ret = imx_scu_irq_group_enable(SC_IRQ_GROUP_WDOG,
SC_IRQ_WDOG,
true);
struct device *dev = &pdev->dev;
struct jz4740_wdt_drvdata *drvdata;
struct watchdog_device *jz4740_wdt;
- int ret;
drvdata = devm_kzalloc(dev, sizeof(struct jz4740_wdt_drvdata),
GFP_KERNEL);
+++ /dev/null
-// SPDX-License-Identifier: GPL-2.0-only
-/*
- * Watchdog driver for Kendin/Micrel KS8695.
- *
- * (C) 2007 Andrew Victor
- */
-
-#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
-
-#include <linux/bitops.h>
-#include <linux/errno.h>
-#include <linux/fs.h>
-#include <linux/init.h>
-#include <linux/kernel.h>
-#include <linux/miscdevice.h>
-#include <linux/module.h>
-#include <linux/moduleparam.h>
-#include <linux/platform_device.h>
-#include <linux/types.h>
-#include <linux/watchdog.h>
-#include <linux/io.h>
-#include <linux/uaccess.h>
-#include <mach/hardware.h>
-
-#define KS8695_TMR_OFFSET (0xF0000 + 0xE400)
-#define KS8695_TMR_VA (KS8695_IO_VA + KS8695_TMR_OFFSET)
-
-/*
- * Timer registers
- */
-#define KS8695_TMCON (0x00) /* Timer Control Register */
-#define KS8695_T0TC (0x08) /* Timer 0 Timeout Count Register */
-#define TMCON_T0EN (1 << 0) /* Timer 0 Enable */
-
-/* Timer0 Timeout Counter Register */
-#define T0TC_WATCHDOG (0xff) /* Enable watchdog mode */
-
-#define WDT_DEFAULT_TIME 5 /* seconds */
-#define WDT_MAX_TIME 171 /* seconds */
-
-static int wdt_time = WDT_DEFAULT_TIME;
-static bool nowayout = WATCHDOG_NOWAYOUT;
-
-module_param(wdt_time, int, 0);
-MODULE_PARM_DESC(wdt_time, "Watchdog time in seconds. (default="
- __MODULE_STRING(WDT_DEFAULT_TIME) ")");
-
-#ifdef CONFIG_WATCHDOG_NOWAYOUT
-module_param(nowayout, bool, 0);
-MODULE_PARM_DESC(nowayout, "Watchdog cannot be stopped once started (default="
- __MODULE_STRING(WATCHDOG_NOWAYOUT) ")");
-#endif
-
-
-static unsigned long ks8695wdt_busy;
-static DEFINE_SPINLOCK(ks8695_lock);
-
-/* ......................................................................... */
-
-/*
- * Disable the watchdog.
- */
-static inline void ks8695_wdt_stop(void)
-{
- unsigned long tmcon;
-
- spin_lock(&ks8695_lock);
- /* disable timer0 */
- tmcon = __raw_readl(KS8695_TMR_VA + KS8695_TMCON);
- __raw_writel(tmcon & ~TMCON_T0EN, KS8695_TMR_VA + KS8695_TMCON);
- spin_unlock(&ks8695_lock);
-}
-
-/*
- * Enable and reset the watchdog.
- */
-static inline void ks8695_wdt_start(void)
-{
- unsigned long tmcon;
- unsigned long tval = wdt_time * KS8695_CLOCK_RATE;
-
- spin_lock(&ks8695_lock);
- /* disable timer0 */
- tmcon = __raw_readl(KS8695_TMR_VA + KS8695_TMCON);
- __raw_writel(tmcon & ~TMCON_T0EN, KS8695_TMR_VA + KS8695_TMCON);
-
- /* program timer0 */
- __raw_writel(tval | T0TC_WATCHDOG, KS8695_TMR_VA + KS8695_T0TC);
-
- /* re-enable timer0 */
- tmcon = __raw_readl(KS8695_TMR_VA + KS8695_TMCON);
- __raw_writel(tmcon | TMCON_T0EN, KS8695_TMR_VA + KS8695_TMCON);
- spin_unlock(&ks8695_lock);
-}
-
-/*
- * Reload the watchdog timer. (ie, pat the watchdog)
- */
-static inline void ks8695_wdt_reload(void)
-{
- unsigned long tmcon;
-
- spin_lock(&ks8695_lock);
- /* disable, then re-enable timer0 */
- tmcon = __raw_readl(KS8695_TMR_VA + KS8695_TMCON);
- __raw_writel(tmcon & ~TMCON_T0EN, KS8695_TMR_VA + KS8695_TMCON);
- __raw_writel(tmcon | TMCON_T0EN, KS8695_TMR_VA + KS8695_TMCON);
- spin_unlock(&ks8695_lock);
-}
-
-/*
- * Change the watchdog time interval.
- */
-static int ks8695_wdt_settimeout(int new_time)
-{
- /*
- * All counting occurs at KS8695_CLOCK_RATE / 128 = 0.256 Hz
- *
- * Since WDV is a 16-bit counter, the maximum period is
- * 65536 / 0.256 = 256 seconds.
- */
- if ((new_time <= 0) || (new_time > WDT_MAX_TIME))
- return -EINVAL;
-
- /* Set new watchdog time. It will be used when
- ks8695_wdt_start() is called. */
- wdt_time = new_time;
- return 0;
-}
-
-/* ......................................................................... */
-
-/*
- * Watchdog device is opened, and watchdog starts running.
- */
-static int ks8695_wdt_open(struct inode *inode, struct file *file)
-{
- if (test_and_set_bit(0, &ks8695wdt_busy))
- return -EBUSY;
-
- ks8695_wdt_start();
- return stream_open(inode, file);
-}
-
-/*
- * Close the watchdog device.
- * If CONFIG_WATCHDOG_NOWAYOUT is NOT defined then the watchdog is also
- * disabled.
- */
-static int ks8695_wdt_close(struct inode *inode, struct file *file)
-{
- /* Disable the watchdog when file is closed */
- if (!nowayout)
- ks8695_wdt_stop();
- clear_bit(0, &ks8695wdt_busy);
- return 0;
-}
-
-static const struct watchdog_info ks8695_wdt_info = {
- .identity = "ks8695 watchdog",
- .options = WDIOF_SETTIMEOUT | WDIOF_KEEPALIVEPING,
-};
-
-/*
- * Handle commands from user-space.
- */
-static long ks8695_wdt_ioctl(struct file *file, unsigned int cmd,
- unsigned long arg)
-{
- void __user *argp = (void __user *)arg;
- int __user *p = argp;
- int new_value;
-
- switch (cmd) {
- case WDIOC_GETSUPPORT:
- return copy_to_user(argp, &ks8695_wdt_info,
- sizeof(ks8695_wdt_info)) ? -EFAULT : 0;
- case WDIOC_GETSTATUS:
- case WDIOC_GETBOOTSTATUS:
- return put_user(0, p);
- case WDIOC_SETOPTIONS:
- if (get_user(new_value, p))
- return -EFAULT;
- if (new_value & WDIOS_DISABLECARD)
- ks8695_wdt_stop();
- if (new_value & WDIOS_ENABLECARD)
- ks8695_wdt_start();
- return 0;
- case WDIOC_KEEPALIVE:
- ks8695_wdt_reload(); /* pat the watchdog */
- return 0;
- case WDIOC_SETTIMEOUT:
- if (get_user(new_value, p))
- return -EFAULT;
- if (ks8695_wdt_settimeout(new_value))
- return -EINVAL;
- /* Enable new time value */
- ks8695_wdt_start();
- /* Return current value */
- return put_user(wdt_time, p);
- case WDIOC_GETTIMEOUT:
- return put_user(wdt_time, p);
- default:
- return -ENOTTY;
- }
-}
-
-/*
- * Pat the watchdog whenever device is written to.
- */
-static ssize_t ks8695_wdt_write(struct file *file, const char *data,
- size_t len, loff_t *ppos)
-{
- ks8695_wdt_reload(); /* pat the watchdog */
- return len;
-}
-
-/* ......................................................................... */
-
-static const struct file_operations ks8695wdt_fops = {
- .owner = THIS_MODULE,
- .llseek = no_llseek,
- .unlocked_ioctl = ks8695_wdt_ioctl,
- .open = ks8695_wdt_open,
- .release = ks8695_wdt_close,
- .write = ks8695_wdt_write,
-};
-
-static struct miscdevice ks8695wdt_miscdev = {
- .minor = WATCHDOG_MINOR,
- .name = "watchdog",
- .fops = &ks8695wdt_fops,
-};
-
-static int ks8695wdt_probe(struct platform_device *pdev)
-{
- int res;
-
- if (ks8695wdt_miscdev.parent)
- return -EBUSY;
- ks8695wdt_miscdev.parent = &pdev->dev;
-
- res = misc_register(&ks8695wdt_miscdev);
- if (res)
- return res;
-
- pr_info("KS8695 Watchdog Timer enabled (%d seconds%s)\n",
- wdt_time, nowayout ? ", nowayout" : "");
- return 0;
-}
-
-static int ks8695wdt_remove(struct platform_device *pdev)
-{
- misc_deregister(&ks8695wdt_miscdev);
- ks8695wdt_miscdev.parent = NULL;
-
- return 0;
-}
-
-static void ks8695wdt_shutdown(struct platform_device *pdev)
-{
- ks8695_wdt_stop();
-}
-
-#ifdef CONFIG_PM
-
-static int ks8695wdt_suspend(struct platform_device *pdev, pm_message_t message)
-{
- ks8695_wdt_stop();
- return 0;
-}
-
-static int ks8695wdt_resume(struct platform_device *pdev)
-{
- if (ks8695wdt_busy)
- ks8695_wdt_start();
- return 0;
-}
-
-#else
-#define ks8695wdt_suspend NULL
-#define ks8695wdt_resume NULL
-#endif
-
-static struct platform_driver ks8695wdt_driver = {
- .probe = ks8695wdt_probe,
- .remove = ks8695wdt_remove,
- .shutdown = ks8695wdt_shutdown,
- .suspend = ks8695wdt_suspend,
- .resume = ks8695wdt_resume,
- .driver = {
- .name = "ks8695_wdt",
- },
-};
-
-static int __init ks8695_wdt_init(void)
-{
- /* Check that the heartbeat value is within range;
- if not reset to the default */
- if (ks8695_wdt_settimeout(wdt_time)) {
- ks8695_wdt_settimeout(WDT_DEFAULT_TIME);
- pr_info("ks8695_wdt: wdt_time value must be 1 <= wdt_time <= %i"
- ", using %d\n", wdt_time, WDT_MAX_TIME);
- }
- return platform_driver_register(&ks8695wdt_driver);
-}
-
-static void __exit ks8695_wdt_exit(void)
-{
- platform_driver_unregister(&ks8695wdt_driver);
-}
-
-module_init(ks8695_wdt_init);
-module_exit(ks8695_wdt_exit);
-
-MODULE_AUTHOR("Andrew Victor");
-MODULE_DESCRIPTION("Watchdog driver for KS8695");
-MODULE_LICENSE("GPL");
-MODULE_ALIAS("platform:ks8695_wdt");
+++ /dev/null
-// SPDX-License-Identifier: GPL-2.0-only
-/*
- * Copyright (c) 2009 Nuvoton technology corporation.
- *
- * Wan ZongShun <mcuos.com@gmail.com>
- */
-
-#include <linux/bitops.h>
-#include <linux/errno.h>
-#include <linux/fs.h>
-#include <linux/io.h>
-#include <linux/clk.h>
-#include <linux/kernel.h>
-#include <linux/miscdevice.h>
-#include <linux/module.h>
-#include <linux/moduleparam.h>
-#include <linux/platform_device.h>
-#include <linux/slab.h>
-#include <linux/interrupt.h>
-#include <linux/types.h>
-#include <linux/watchdog.h>
-#include <linux/uaccess.h>
-
-#define REG_WTCR 0x1c
-#define WTCLK (0x01 << 10)
-#define WTE (0x01 << 7) /*wdt enable*/
-#define WTIS (0x03 << 4)
-#define WTIF (0x01 << 3)
-#define WTRF (0x01 << 2)
-#define WTRE (0x01 << 1)
-#define WTR (0x01 << 0)
-/*
- * The watchdog time interval can be calculated via following formula:
- * WTIS real time interval (formula)
- * 0x00 ((2^ 14 ) * ((external crystal freq) / 256))seconds
- * 0x01 ((2^ 16 ) * ((external crystal freq) / 256))seconds
- * 0x02 ((2^ 18 ) * ((external crystal freq) / 256))seconds
- * 0x03 ((2^ 20 ) * ((external crystal freq) / 256))seconds
- *
- * The external crystal freq is 15Mhz in the nuc900 evaluation board.
- * So 0x00 = +-0.28 seconds, 0x01 = +-1.12 seconds, 0x02 = +-4.48 seconds,
- * 0x03 = +- 16.92 seconds..
- */
-#define WDT_HW_TIMEOUT 0x02
-#define WDT_TIMEOUT (HZ/2)
-#define WDT_HEARTBEAT 15
-
-static int heartbeat = WDT_HEARTBEAT;
-module_param(heartbeat, int, 0);
-MODULE_PARM_DESC(heartbeat, "Watchdog heartbeats in seconds. "
- "(default = " __MODULE_STRING(WDT_HEARTBEAT) ")");
-
-static bool nowayout = WATCHDOG_NOWAYOUT;
-module_param(nowayout, bool, 0);
-MODULE_PARM_DESC(nowayout, "Watchdog cannot be stopped once started "
- "(default=" __MODULE_STRING(WATCHDOG_NOWAYOUT) ")");
-
-struct nuc900_wdt {
- struct clk *wdt_clock;
- struct platform_device *pdev;
- void __iomem *wdt_base;
- char expect_close;
- struct timer_list timer;
- spinlock_t wdt_lock;
- unsigned long next_heartbeat;
-};
-
-static unsigned long nuc900wdt_busy;
-static struct nuc900_wdt *nuc900_wdt;
-
-static inline void nuc900_wdt_keepalive(void)
-{
- unsigned int val;
-
- spin_lock(&nuc900_wdt->wdt_lock);
-
- val = __raw_readl(nuc900_wdt->wdt_base + REG_WTCR);
- val |= (WTR | WTIF);
- __raw_writel(val, nuc900_wdt->wdt_base + REG_WTCR);
-
- spin_unlock(&nuc900_wdt->wdt_lock);
-}
-
-static inline void nuc900_wdt_start(void)
-{
- unsigned int val;
-
- spin_lock(&nuc900_wdt->wdt_lock);
-
- val = __raw_readl(nuc900_wdt->wdt_base + REG_WTCR);
- val |= (WTRE | WTE | WTR | WTCLK | WTIF);
- val &= ~WTIS;
- val |= (WDT_HW_TIMEOUT << 0x04);
- __raw_writel(val, nuc900_wdt->wdt_base + REG_WTCR);
-
- spin_unlock(&nuc900_wdt->wdt_lock);
-
- nuc900_wdt->next_heartbeat = jiffies + heartbeat * HZ;
- mod_timer(&nuc900_wdt->timer, jiffies + WDT_TIMEOUT);
-}
-
-static inline void nuc900_wdt_stop(void)
-{
- unsigned int val;
-
- del_timer(&nuc900_wdt->timer);
-
- spin_lock(&nuc900_wdt->wdt_lock);
-
- val = __raw_readl(nuc900_wdt->wdt_base + REG_WTCR);
- val &= ~WTE;
- __raw_writel(val, nuc900_wdt->wdt_base + REG_WTCR);
-
- spin_unlock(&nuc900_wdt->wdt_lock);
-}
-
-static inline void nuc900_wdt_ping(void)
-{
- nuc900_wdt->next_heartbeat = jiffies + heartbeat * HZ;
-}
-
-static int nuc900_wdt_open(struct inode *inode, struct file *file)
-{
-
- if (test_and_set_bit(0, &nuc900wdt_busy))
- return -EBUSY;
-
- nuc900_wdt_start();
-
- return stream_open(inode, file);
-}
-
-static int nuc900_wdt_close(struct inode *inode, struct file *file)
-{
- if (nuc900_wdt->expect_close == 42)
- nuc900_wdt_stop();
- else {
- dev_crit(&nuc900_wdt->pdev->dev,
- "Unexpected close, not stopping watchdog!\n");
- nuc900_wdt_ping();
- }
-
- nuc900_wdt->expect_close = 0;
- clear_bit(0, &nuc900wdt_busy);
- return 0;
-}
-
-static const struct watchdog_info nuc900_wdt_info = {
- .identity = "nuc900 watchdog",
- .options = WDIOF_SETTIMEOUT | WDIOF_KEEPALIVEPING |
- WDIOF_MAGICCLOSE,
-};
-
-static long nuc900_wdt_ioctl(struct file *file,
- unsigned int cmd, unsigned long arg)
-{
- void __user *argp = (void __user *)arg;
- int __user *p = argp;
- int new_value;
-
- switch (cmd) {
- case WDIOC_GETSUPPORT:
- return copy_to_user(argp, &nuc900_wdt_info,
- sizeof(nuc900_wdt_info)) ? -EFAULT : 0;
- case WDIOC_GETSTATUS:
- case WDIOC_GETBOOTSTATUS:
- return put_user(0, p);
-
- case WDIOC_KEEPALIVE:
- nuc900_wdt_ping();
- return 0;
-
- case WDIOC_SETTIMEOUT:
- if (get_user(new_value, p))
- return -EFAULT;
-
- heartbeat = new_value;
- nuc900_wdt_ping();
-
- return put_user(new_value, p);
- case WDIOC_GETTIMEOUT:
- return put_user(heartbeat, p);
- default:
- return -ENOTTY;
- }
-}
-
-static ssize_t nuc900_wdt_write(struct file *file, const char __user *data,
- size_t len, loff_t *ppos)
-{
- if (!len)
- return 0;
-
- /* Scan for magic character */
- if (!nowayout) {
- size_t i;
-
- nuc900_wdt->expect_close = 0;
-
- for (i = 0; i < len; i++) {
- char c;
- if (get_user(c, data + i))
- return -EFAULT;
- if (c == 'V') {
- nuc900_wdt->expect_close = 42;
- break;
- }
- }
- }
-
- nuc900_wdt_ping();
- return len;
-}
-
-static void nuc900_wdt_timer_ping(struct timer_list *unused)
-{
- if (time_before(jiffies, nuc900_wdt->next_heartbeat)) {
- nuc900_wdt_keepalive();
- mod_timer(&nuc900_wdt->timer, jiffies + WDT_TIMEOUT);
- } else
- dev_warn(&nuc900_wdt->pdev->dev, "Will reset the machine !\n");
-}
-
-static const struct file_operations nuc900wdt_fops = {
- .owner = THIS_MODULE,
- .llseek = no_llseek,
- .unlocked_ioctl = nuc900_wdt_ioctl,
- .open = nuc900_wdt_open,
- .release = nuc900_wdt_close,
- .write = nuc900_wdt_write,
-};
-
-static struct miscdevice nuc900wdt_miscdev = {
- .minor = WATCHDOG_MINOR,
- .name = "watchdog",
- .fops = &nuc900wdt_fops,
-};
-
-static int nuc900wdt_probe(struct platform_device *pdev)
-{
- int ret = 0;
-
- nuc900_wdt = devm_kzalloc(&pdev->dev, sizeof(*nuc900_wdt),
- GFP_KERNEL);
- if (!nuc900_wdt)
- return -ENOMEM;
-
- nuc900_wdt->pdev = pdev;
-
- spin_lock_init(&nuc900_wdt->wdt_lock);
-
- nuc900_wdt->wdt_base = devm_platform_ioremap_resource(pdev, 0);
- if (IS_ERR(nuc900_wdt->wdt_base))
- return PTR_ERR(nuc900_wdt->wdt_base);
-
- nuc900_wdt->wdt_clock = devm_clk_get(&pdev->dev, NULL);
- if (IS_ERR(nuc900_wdt->wdt_clock)) {
- dev_err(&pdev->dev, "failed to find watchdog clock source\n");
- return PTR_ERR(nuc900_wdt->wdt_clock);
- }
-
- clk_enable(nuc900_wdt->wdt_clock);
-
- timer_setup(&nuc900_wdt->timer, nuc900_wdt_timer_ping, 0);
-
- ret = misc_register(&nuc900wdt_miscdev);
- if (ret) {
- dev_err(&pdev->dev, "err register miscdev on minor=%d (%d)\n",
- WATCHDOG_MINOR, ret);
- goto err_clk;
- }
-
- return 0;
-
-err_clk:
- clk_disable(nuc900_wdt->wdt_clock);
- return ret;
-}
-
-static int nuc900wdt_remove(struct platform_device *pdev)
-{
- misc_deregister(&nuc900wdt_miscdev);
-
- clk_disable(nuc900_wdt->wdt_clock);
-
- return 0;
-}
-
-static struct platform_driver nuc900wdt_driver = {
- .probe = nuc900wdt_probe,
- .remove = nuc900wdt_remove,
- .driver = {
- .name = "nuc900-wdt",
- },
-};
-
-module_platform_driver(nuc900wdt_driver);
-
-MODULE_AUTHOR("Wan ZongShun <mcuos.com@gmail.com>");
-MODULE_DESCRIPTION("Watchdog driver for NUC900");
-MODULE_LICENSE("GPL");
-MODULE_ALIAS("platform:nuc900-wdt");
* Watchdog timer block registers.
*/
#define TIMER_CTRL 0x0000
-#define TIMER_A370_STATUS 0x04
+#define TIMER1_FIXED_ENABLE_BIT BIT(12)
+#define WDT_AXP_FIXED_ENABLE_BIT BIT(10)
+#define TIMER1_ENABLE_BIT BIT(2)
+
+#define TIMER_A370_STATUS 0x0004
+#define WDT_A370_EXPIRED BIT(31)
+#define TIMER1_STATUS_BIT BIT(8)
+
+#define TIMER1_VAL_OFF 0x001c
#define WDT_MAX_CYCLE_COUNT 0xffffffff
#define WDT_A370_RATIO_SHIFT 5
#define WDT_A370_RATIO (1 << WDT_A370_RATIO_SHIFT)
-#define WDT_AXP_FIXED_ENABLE_BIT BIT(10)
-#define WDT_A370_EXPIRED BIT(31)
-
static bool nowayout = WATCHDOG_NOWAYOUT;
static int heartbeat = -1; /* module parameter (seconds) */
struct orion_watchdog *dev)
{
int ret;
+ u32 val;
dev->clk = of_clk_get_by_name(pdev->dev.of_node, "fixed");
if (IS_ERR(dev->clk))
return ret;
}
- /* Enable the fixed watchdog clock input */
- atomic_io_modify(dev->reg + TIMER_CTRL,
- WDT_AXP_FIXED_ENABLE_BIT,
- WDT_AXP_FIXED_ENABLE_BIT);
+ /* Fix the wdt and timer1 clock freqency to 25MHz */
+ val = WDT_AXP_FIXED_ENABLE_BIT | TIMER1_FIXED_ENABLE_BIT;
+ atomic_io_modify(dev->reg + TIMER_CTRL, val, val);
dev->clk_rate = clk_get_rate(dev->clk);
return 0;
/* Reload watchdog duration */
writel(dev->clk_rate * wdt_dev->timeout,
dev->reg + dev->data->wdt_counter_offset);
+ if (dev->wdt.info->options & WDIOF_PRETIMEOUT)
+ writel(dev->clk_rate * (wdt_dev->timeout - wdt_dev->pretimeout),
+ dev->reg + TIMER1_VAL_OFF);
+
return 0;
}
/* Set watchdog duration */
writel(dev->clk_rate * wdt_dev->timeout,
dev->reg + dev->data->wdt_counter_offset);
+ if (dev->wdt.info->options & WDIOF_PRETIMEOUT)
+ writel(dev->clk_rate * (wdt_dev->timeout - wdt_dev->pretimeout),
+ dev->reg + TIMER1_VAL_OFF);
/* Clear the watchdog expiration bit */
atomic_io_modify(dev->reg + TIMER_A370_STATUS, WDT_A370_EXPIRED, 0);
/* Enable watchdog timer */
- atomic_io_modify(dev->reg + TIMER_CTRL, dev->data->wdt_enable_bit,
- dev->data->wdt_enable_bit);
+ reg = dev->data->wdt_enable_bit;
+ if (dev->wdt.info->options & WDIOF_PRETIMEOUT)
+ reg |= TIMER1_ENABLE_BIT;
+ atomic_io_modify(dev->reg + TIMER_CTRL, reg, reg);
/* Enable reset on watchdog */
reg = readl(dev->rstout);
static int armada375_stop(struct watchdog_device *wdt_dev)
{
struct orion_watchdog *dev = watchdog_get_drvdata(wdt_dev);
- u32 reg;
+ u32 reg, mask;
/* Disable reset on watchdog */
atomic_io_modify(dev->rstout_mask, dev->data->rstout_mask_bit,
writel(reg, dev->rstout);
/* Disable watchdog timer */
- atomic_io_modify(dev->reg + TIMER_CTRL, dev->data->wdt_enable_bit, 0);
+ mask = dev->data->wdt_enable_bit;
+ if (wdt_dev->info->options & WDIOF_PRETIMEOUT)
+ mask |= TIMER1_ENABLE_BIT;
+ atomic_io_modify(dev->reg + TIMER_CTRL, mask, 0);
return 0;
}
return readl(dev->reg + dev->data->wdt_counter_offset) / dev->clk_rate;
}
-static const struct watchdog_info orion_wdt_info = {
+static struct watchdog_info orion_wdt_info = {
.options = WDIOF_SETTIMEOUT | WDIOF_KEEPALIVEPING | WDIOF_MAGICCLOSE,
.identity = "Orion Watchdog",
};
return IRQ_HANDLED;
}
+static irqreturn_t orion_wdt_pre_irq(int irq, void *devid)
+{
+ struct orion_watchdog *dev = devid;
+
+ atomic_io_modify(dev->reg + TIMER_A370_STATUS,
+ TIMER1_STATUS_BIT, 0);
+ watchdog_notify_pretimeout(&dev->wdt);
+ return IRQ_HANDLED;
+}
+
/*
* The original devicetree binding for this driver specified only
* one memory resource, so in order to keep DT backwards compatibility
}
}
+ /* Optional 2nd interrupt for pretimeout */
+ irq = platform_get_irq(pdev, 1);
+ if (irq > 0) {
+ orion_wdt_info.options |= WDIOF_PRETIMEOUT;
+ ret = devm_request_irq(&pdev->dev, irq, orion_wdt_pre_irq,
+ 0, pdev->name, dev);
+ if (ret < 0) {
+ dev_err(&pdev->dev, "failed to request IRQ\n");
+ goto disable_clk;
+ }
+ }
+
+
watchdog_set_nowayout(&dev->wdt, nowayout);
ret = watchdog_register_device(&dev->wdt);
if (ret)
// SPDX-License-Identifier: GPL-2.0-only
/* Copyright (c) 2014, The Linux Foundation. All rights reserved.
*/
+#include <linux/bits.h>
#include <linux/clk.h>
#include <linux/delay.h>
+#include <linux/interrupt.h>
#include <linux/io.h>
#include <linux/kernel.h>
#include <linux/module.h>
WDT_BITE_TIME,
};
+#define QCOM_WDT_ENABLE BIT(0)
+#define QCOM_WDT_ENABLE_IRQ BIT(1)
+
static const u32 reg_offset_data_apcs_tmr[] = {
[WDT_RST] = 0x38,
[WDT_EN] = 0x40,
struct qcom_wdt {
struct watchdog_device wdd;
- struct clk *clk;
unsigned long rate;
void __iomem *base;
const u32 *layout;
return container_of(wdd, struct qcom_wdt, wdd);
}
+static inline int qcom_get_enable(struct watchdog_device *wdd)
+{
+ int enable = QCOM_WDT_ENABLE;
+
+ if (wdd->pretimeout)
+ enable |= QCOM_WDT_ENABLE_IRQ;
+
+ return enable;
+}
+
+static irqreturn_t qcom_wdt_isr(int irq, void *arg)
+{
+ struct watchdog_device *wdd = arg;
+
+ watchdog_notify_pretimeout(wdd);
+
+ return IRQ_HANDLED;
+}
+
static int qcom_wdt_start(struct watchdog_device *wdd)
{
struct qcom_wdt *wdt = to_qcom_wdt(wdd);
+ unsigned int bark = wdd->timeout - wdd->pretimeout;
writel(0, wdt_addr(wdt, WDT_EN));
writel(1, wdt_addr(wdt, WDT_RST));
- writel(wdd->timeout * wdt->rate, wdt_addr(wdt, WDT_BARK_TIME));
+ writel(bark * wdt->rate, wdt_addr(wdt, WDT_BARK_TIME));
writel(wdd->timeout * wdt->rate, wdt_addr(wdt, WDT_BITE_TIME));
- writel(1, wdt_addr(wdt, WDT_EN));
+ writel(qcom_get_enable(wdd), wdt_addr(wdt, WDT_EN));
return 0;
}
return qcom_wdt_start(wdd);
}
+static int qcom_wdt_set_pretimeout(struct watchdog_device *wdd,
+ unsigned int timeout)
+{
+ wdd->pretimeout = timeout;
+ return qcom_wdt_start(wdd);
+}
+
static int qcom_wdt_restart(struct watchdog_device *wdd, unsigned long action,
void *data)
{
writel(1, wdt_addr(wdt, WDT_RST));
writel(timeout, wdt_addr(wdt, WDT_BARK_TIME));
writel(timeout, wdt_addr(wdt, WDT_BITE_TIME));
- writel(1, wdt_addr(wdt, WDT_EN));
+ writel(QCOM_WDT_ENABLE, wdt_addr(wdt, WDT_EN));
/*
* Actually make sure the above sequence hits hardware before sleeping.
.stop = qcom_wdt_stop,
.ping = qcom_wdt_ping,
.set_timeout = qcom_wdt_set_timeout,
+ .set_pretimeout = qcom_wdt_set_pretimeout,
.restart = qcom_wdt_restart,
.owner = THIS_MODULE,
};
.identity = KBUILD_MODNAME,
};
+static const struct watchdog_info qcom_wdt_pt_info = {
+ .options = WDIOF_KEEPALIVEPING
+ | WDIOF_MAGICCLOSE
+ | WDIOF_SETTIMEOUT
+ | WDIOF_PRETIMEOUT
+ | WDIOF_CARDRESET,
+ .identity = KBUILD_MODNAME,
+};
+
static void qcom_clk_disable_unprepare(void *data)
{
clk_disable_unprepare(data);
struct device_node *np = dev->of_node;
const u32 *regs;
u32 percpu_offset;
- int ret;
+ int irq, ret;
+ struct clk *clk;
regs = of_device_get_match_data(dev);
if (!regs) {
if (IS_ERR(wdt->base))
return PTR_ERR(wdt->base);
- wdt->clk = devm_clk_get(dev, NULL);
- if (IS_ERR(wdt->clk)) {
+ clk = devm_clk_get(dev, NULL);
+ if (IS_ERR(clk)) {
dev_err(dev, "failed to get input clock\n");
- return PTR_ERR(wdt->clk);
+ return PTR_ERR(clk);
}
- ret = clk_prepare_enable(wdt->clk);
+ ret = clk_prepare_enable(clk);
if (ret) {
dev_err(dev, "failed to setup clock\n");
return ret;
}
- ret = devm_add_action_or_reset(dev, qcom_clk_disable_unprepare,
- wdt->clk);
+ ret = devm_add_action_or_reset(dev, qcom_clk_disable_unprepare, clk);
if (ret)
return ret;
* that it would bite before a second elapses it's usefulness is
* limited. Bail if this is the case.
*/
- wdt->rate = clk_get_rate(wdt->clk);
+ wdt->rate = clk_get_rate(clk);
if (wdt->rate == 0 ||
wdt->rate > 0x10000000U) {
dev_err(dev, "invalid clock rate\n");
return -EINVAL;
}
- wdt->wdd.info = &qcom_wdt_info;
+ /* check if there is pretimeout support */
+ irq = platform_get_irq(pdev, 0);
+ if (irq > 0) {
+ ret = devm_request_irq(dev, irq, qcom_wdt_isr,
+ IRQF_TRIGGER_RISING,
+ "wdt_bark", &wdt->wdd);
+ if (ret)
+ return ret;
+
+ wdt->wdd.info = &qcom_wdt_pt_info;
+ wdt->wdd.pretimeout = 1;
+ } else {
+ if (irq == -EPROBE_DEFER)
+ return -EPROBE_DEFER;
+
+ wdt->wdd.info = &qcom_wdt_info;
+ }
+
wdt->wdd.ops = &qcom_wdt_ops;
wdt->wdd.min_timeout = 1;
wdt->wdd.max_timeout = 0x10000000U / wdt->rate;
}
wdt->irq = platform_get_irq(pdev, 0);
- if (wdt->irq < 0) {
- dev_err(dev, "failed to get IRQ resource\n");
+ if (wdt->irq < 0)
return wdt->irq;
- }
ret = devm_request_irq(dev, wdt->irq, sprd_wdt_isr, IRQF_NO_SUSPEND,
"sprd-wdt", (void *)wdt);
#include <linux/version.h>
#include <linux/watchdog.h>
+#include <asm/unaligned.h>
+
#define ZIIRAVE_TIMEOUT_MIN 3
#define ZIIRAVE_TIMEOUT_MAX 255
+#define ZIIRAVE_TIMEOUT_DEFAULT 30
#define ZIIRAVE_PING_VALUE 0x0
#define ZIIRAVE_FIRM_PKT_TOTAL_SIZE 20
#define ZIIRAVE_FIRM_PKT_DATA_SIZE 16
-#define ZIIRAVE_FIRM_FLASH_MEMORY_START 0x1600
-#define ZIIRAVE_FIRM_FLASH_MEMORY_END 0x2bbf
+#define ZIIRAVE_FIRM_FLASH_MEMORY_START (2 * 0x1600)
+#define ZIIRAVE_FIRM_FLASH_MEMORY_END (2 * 0x2bbf)
+#define ZIIRAVE_FIRM_PAGE_SIZE 128
/* Received and ready for next Download packet. */
#define ZIIRAVE_FIRM_DOWNLOAD_ACK 1
-/* Currently writing to flash. Retry Download status in a moment! */
-#define ZIIRAVE_FIRM_DOWNLOAD_BUSY 2
-
-/* Wait for ACK timeout in ms */
-#define ZIIRAVE_FIRM_WAIT_FOR_ACK_TIMEOUT 50
/* Firmware commands */
#define ZIIRAVE_CMD_DOWNLOAD_START 0x10
#define ZIIRAVE_CMD_JUMP_TO_BOOTLOADER 0x0c
#define ZIIRAVE_CMD_DOWNLOAD_PACKET 0x0e
+#define ZIIRAVE_CMD_JUMP_TO_BOOTLOADER_MAGIC 1
+#define ZIIRAVE_CMD_RESET_PROCESSOR_MAGIC 1
+
+#define ZIIRAVE_FW_VERSION_FMT "02.%02u.%02u"
+#define ZIIRAVE_BL_VERSION_FMT "01.%02u.%02u"
+
struct ziirave_wdt_rev {
unsigned char major;
unsigned char minor;
return ret;
}
-static int ziirave_firm_wait_for_ack(struct watchdog_device *wdd)
+static int ziirave_firm_read_ack(struct watchdog_device *wdd)
{
struct i2c_client *client = to_i2c_client(wdd->parent);
int ret;
- unsigned long timeout;
- timeout = jiffies + msecs_to_jiffies(ZIIRAVE_FIRM_WAIT_FOR_ACK_TIMEOUT);
- do {
- if (time_after(jiffies, timeout))
- return -ETIMEDOUT;
-
- usleep_range(5000, 10000);
-
- ret = i2c_smbus_read_byte(client);
- if (ret < 0) {
- dev_err(&client->dev, "Failed to read byte\n");
- return ret;
- }
- } while (ret == ZIIRAVE_FIRM_DOWNLOAD_BUSY);
+ ret = i2c_smbus_read_byte(client);
+ if (ret < 0) {
+ dev_err(&client->dev, "Failed to read status byte\n");
+ return ret;
+ }
return ret == ZIIRAVE_FIRM_DOWNLOAD_ACK ? 0 : -EIO;
}
-static int ziirave_firm_set_read_addr(struct watchdog_device *wdd, u16 addr)
+static int ziirave_firm_set_read_addr(struct watchdog_device *wdd, u32 addr)
{
struct i2c_client *client = to_i2c_client(wdd->parent);
+ const u16 addr16 = (u16)addr / 2;
u8 address[2];
- address[0] = addr & 0xff;
- address[1] = (addr >> 8) & 0xff;
+ put_unaligned_le16(addr16, address);
return i2c_smbus_write_block_data(client,
ZIIRAVE_CMD_DOWNLOAD_SET_READ_ADDR,
- ARRAY_SIZE(address), address);
-}
-
-static int ziirave_firm_write_block_data(struct watchdog_device *wdd,
- u8 command, u8 length, const u8 *data,
- bool wait_for_ack)
-{
- struct i2c_client *client = to_i2c_client(wdd->parent);
- int ret;
-
- ret = i2c_smbus_write_block_data(client, command, length, data);
- if (ret) {
- dev_err(&client->dev,
- "Failed to send command 0x%02x: %d\n", command, ret);
- return ret;
- }
-
- if (wait_for_ack)
- ret = ziirave_firm_wait_for_ack(wdd);
-
- return ret;
+ sizeof(address), address);
}
-static int ziirave_firm_write_byte(struct watchdog_device *wdd, u8 command,
- u8 byte, bool wait_for_ack)
+static bool ziirave_firm_addr_readonly(u32 addr)
{
- return ziirave_firm_write_block_data(wdd, command, 1, &byte,
- wait_for_ack);
+ return addr < ZIIRAVE_FIRM_FLASH_MEMORY_START ||
+ addr > ZIIRAVE_FIRM_FLASH_MEMORY_END;
}
/*
* Data0 .. Data15: Array of 16 bytes of data.
* Checksum: Checksum byte to verify data integrity.
*/
-static int ziirave_firm_write_pkt(struct watchdog_device *wdd,
- const struct ihex_binrec *rec)
+static int __ziirave_firm_write_pkt(struct watchdog_device *wdd,
+ u32 addr, const u8 *data, u8 len)
{
+ const u16 addr16 = (u16)addr / 2;
struct i2c_client *client = to_i2c_client(wdd->parent);
u8 i, checksum = 0, packet[ZIIRAVE_FIRM_PKT_TOTAL_SIZE];
int ret;
- u16 addr;
- memset(packet, 0, ARRAY_SIZE(packet));
+ /* Check max data size */
+ if (len > ZIIRAVE_FIRM_PKT_DATA_SIZE) {
+ dev_err(&client->dev, "Firmware packet too long (%d)\n",
+ len);
+ return -EMSGSIZE;
+ }
+
+ /*
+ * Ignore packets that are targeting program memory outisde of
+ * app partition, since they will be ignored by the
+ * bootloader. At the same time, we need to make sure we'll
+ * allow zero length packet that will be sent as the last step
+ * of firmware update
+ */
+ if (len && ziirave_firm_addr_readonly(addr))
+ return 0;
/* Packet length */
- packet[0] = (u8)be16_to_cpu(rec->len);
+ packet[0] = len;
/* Packet address */
- addr = (be32_to_cpu(rec->addr) & 0xffff) >> 1;
- packet[1] = addr & 0xff;
- packet[2] = (addr & 0xff00) >> 8;
+ put_unaligned_le16(addr16, packet + 1);
- /* Packet data */
- if (be16_to_cpu(rec->len) > ZIIRAVE_FIRM_PKT_DATA_SIZE)
- return -EMSGSIZE;
- memcpy(packet + 3, rec->data, be16_to_cpu(rec->len));
+ memcpy(packet + 3, data, len);
+ memset(packet + 3 + len, 0, ZIIRAVE_FIRM_PKT_DATA_SIZE - len);
/* Packet checksum */
- for (i = 0; i < ZIIRAVE_FIRM_PKT_TOTAL_SIZE - 1; i++)
+ for (i = 0; i < len + 3; i++)
checksum += packet[i];
packet[ZIIRAVE_FIRM_PKT_TOTAL_SIZE - 1] = checksum;
- ret = ziirave_firm_write_block_data(wdd, ZIIRAVE_CMD_DOWNLOAD_PACKET,
- ARRAY_SIZE(packet), packet, true);
+ ret = i2c_smbus_write_block_data(client, ZIIRAVE_CMD_DOWNLOAD_PACKET,
+ sizeof(packet), packet);
+ if (ret) {
+ dev_err(&client->dev,
+ "Failed to send DOWNLOAD_PACKET: %d\n", ret);
+ return ret;
+ }
+
+ ret = ziirave_firm_read_ack(wdd);
if (ret)
dev_err(&client->dev,
"Failed to write firmware packet at address 0x%04x: %d\n",
return ret;
}
+static int ziirave_firm_write_pkt(struct watchdog_device *wdd,
+ u32 addr, const u8 *data, u8 len)
+{
+ const u8 max_write_len = ZIIRAVE_FIRM_PAGE_SIZE -
+ (addr - ALIGN_DOWN(addr, ZIIRAVE_FIRM_PAGE_SIZE));
+ int ret;
+
+ if (len > max_write_len) {
+ /*
+ * If data crossed page boundary we need to split this
+ * write in two
+ */
+ ret = __ziirave_firm_write_pkt(wdd, addr, data, max_write_len);
+ if (ret)
+ return ret;
+
+ addr += max_write_len;
+ data += max_write_len;
+ len -= max_write_len;
+ }
+
+ return __ziirave_firm_write_pkt(wdd, addr, data, len);
+}
+
static int ziirave_firm_verify(struct watchdog_device *wdd,
const struct firmware *fw)
{
const struct ihex_binrec *rec;
int i, ret;
u8 data[ZIIRAVE_FIRM_PKT_DATA_SIZE];
- u16 addr;
for (rec = (void *)fw->data; rec; rec = ihex_next_binrec(rec)) {
- /* Zero length marks end of records */
- if (!be16_to_cpu(rec->len))
- break;
+ const u16 len = be16_to_cpu(rec->len);
+ const u32 addr = be32_to_cpu(rec->addr);
- addr = (be32_to_cpu(rec->addr) & 0xffff) >> 1;
- if (addr < ZIIRAVE_FIRM_FLASH_MEMORY_START ||
- addr > ZIIRAVE_FIRM_FLASH_MEMORY_END)
+ if (ziirave_firm_addr_readonly(addr))
continue;
ret = ziirave_firm_set_read_addr(wdd, addr);
return ret;
}
- for (i = 0; i < ARRAY_SIZE(data); i++) {
+ for (i = 0; i < len; i++) {
ret = i2c_smbus_read_byte_data(client,
ZIIRAVE_CMD_DOWNLOAD_READ_BYTE);
if (ret < 0) {
data[i] = ret;
}
- if (memcmp(data, rec->data, be16_to_cpu(rec->len))) {
+ if (memcmp(data, rec->data, len)) {
dev_err(&client->dev,
"Firmware mismatch at address 0x%04x\n", addr);
return -EINVAL;
const struct firmware *fw)
{
struct i2c_client *client = to_i2c_client(wdd->parent);
- int ret, words_till_page_break;
const struct ihex_binrec *rec;
- struct ihex_binrec *rec_new;
+ int ret;
- ret = ziirave_firm_write_byte(wdd, ZIIRAVE_CMD_JUMP_TO_BOOTLOADER, 1,
- false);
- if (ret)
+ ret = i2c_smbus_write_byte_data(client,
+ ZIIRAVE_CMD_JUMP_TO_BOOTLOADER,
+ ZIIRAVE_CMD_JUMP_TO_BOOTLOADER_MAGIC);
+ if (ret) {
+ dev_err(&client->dev, "Failed to jump to bootloader\n");
return ret;
+ }
msleep(500);
- ret = ziirave_firm_write_byte(wdd, ZIIRAVE_CMD_DOWNLOAD_START, 1, true);
- if (ret)
+ ret = i2c_smbus_write_byte(client, ZIIRAVE_CMD_DOWNLOAD_START);
+ if (ret) {
+ dev_err(&client->dev, "Failed to start download\n");
return ret;
+ }
+
+ ret = ziirave_firm_read_ack(wdd);
+ if (ret) {
+ dev_err(&client->dev, "No ACK for start download\n");
+ return ret;
+ }
msleep(500);
for (rec = (void *)fw->data; rec; rec = ihex_next_binrec(rec)) {
- /* Zero length marks end of records */
- if (!be16_to_cpu(rec->len))
- break;
-
- /* Check max data size */
- if (be16_to_cpu(rec->len) > ZIIRAVE_FIRM_PKT_DATA_SIZE) {
- dev_err(&client->dev, "Firmware packet too long (%d)\n",
- be16_to_cpu(rec->len));
- return -EMSGSIZE;
- }
-
- /* Calculate words till page break */
- words_till_page_break = (64 - ((be32_to_cpu(rec->addr) >> 1) &
- 0x3f));
- if ((be16_to_cpu(rec->len) >> 1) > words_till_page_break) {
- /*
- * Data in passes page boundary, so we need to split in
- * two blocks of data. Create a packet with the first
- * block of data.
- */
- rec_new = kzalloc(sizeof(struct ihex_binrec) +
- (words_till_page_break << 1),
- GFP_KERNEL);
- if (!rec_new)
- return -ENOMEM;
-
- rec_new->len = cpu_to_be16(words_till_page_break << 1);
- rec_new->addr = rec->addr;
- memcpy(rec_new->data, rec->data,
- be16_to_cpu(rec_new->len));
-
- ret = ziirave_firm_write_pkt(wdd, rec_new);
- kfree(rec_new);
- if (ret)
- return ret;
-
- /* Create a packet with the second block of data */
- rec_new = kzalloc(sizeof(struct ihex_binrec) +
- be16_to_cpu(rec->len) -
- (words_till_page_break << 1),
- GFP_KERNEL);
- if (!rec_new)
- return -ENOMEM;
-
- /* Remaining bytes */
- rec_new->len = rec->len -
- cpu_to_be16(words_till_page_break << 1);
-
- rec_new->addr = cpu_to_be32(be32_to_cpu(rec->addr) +
- (words_till_page_break << 1));
-
- memcpy(rec_new->data,
- rec->data + (words_till_page_break << 1),
- be16_to_cpu(rec_new->len));
-
- ret = ziirave_firm_write_pkt(wdd, rec_new);
- kfree(rec_new);
- if (ret)
- return ret;
- } else {
- ret = ziirave_firm_write_pkt(wdd, rec);
- if (ret)
- return ret;
- }
+ ret = ziirave_firm_write_pkt(wdd, be32_to_cpu(rec->addr),
+ rec->data, be16_to_cpu(rec->len));
+ if (ret)
+ return ret;
}
- /* For end of download, the length field will be set to 0 */
- rec_new = kzalloc(sizeof(struct ihex_binrec) + 1, GFP_KERNEL);
- if (!rec_new)
- return -ENOMEM;
-
- ret = ziirave_firm_write_pkt(wdd, rec_new);
- kfree(rec_new);
+ /*
+ * Finish firmware download process by sending a zero length
+ * payload
+ */
+ ret = ziirave_firm_write_pkt(wdd, 0, NULL, 0);
if (ret) {
dev_err(&client->dev, "Failed to send EMPTY packet: %d\n", ret);
return ret;
}
/* End download operation */
- ret = ziirave_firm_write_byte(wdd, ZIIRAVE_CMD_DOWNLOAD_END, 1, false);
- if (ret)
+ ret = i2c_smbus_write_byte(client, ZIIRAVE_CMD_DOWNLOAD_END);
+ if (ret) {
+ dev_err(&client->dev,
+ "Failed to end firmware download: %d\n", ret);
return ret;
+ }
/* Reset the processor */
- ret = ziirave_firm_write_byte(wdd, ZIIRAVE_CMD_RESET_PROCESSOR, 1,
- false);
- if (ret)
+ ret = i2c_smbus_write_byte_data(client,
+ ZIIRAVE_CMD_RESET_PROCESSOR,
+ ZIIRAVE_CMD_RESET_PROCESSOR_MAGIC);
+ if (ret) {
+ dev_err(&client->dev,
+ "Failed to reset the watchdog: %d\n", ret);
return ret;
+ }
msleep(500);
if (ret)
return ret;
- ret = sprintf(buf, "02.%02u.%02u", w_priv->firmware_rev.major,
+ ret = sprintf(buf, ZIIRAVE_FW_VERSION_FMT, w_priv->firmware_rev.major,
w_priv->firmware_rev.minor);
mutex_unlock(&w_priv->sysfs_mutex);
if (ret)
return ret;
- ret = sprintf(buf, "01.%02u.%02u", w_priv->bootloader_rev.major,
+ ret = sprintf(buf, ZIIRAVE_BL_VERSION_FMT, w_priv->bootloader_rev.major,
w_priv->bootloader_rev.minor);
mutex_unlock(&w_priv->sysfs_mutex);
goto unlock_mutex;
}
- dev_info(&client->dev, "Firmware updated to version 02.%02u.%02u\n",
+ dev_info(&client->dev,
+ "Firmware updated to version " ZIIRAVE_FW_VERSION_FMT "\n",
w_priv->firmware_rev.major, w_priv->firmware_rev.minor);
/* Restore the watchdog timeout */
&reset_duration);
if (ret) {
dev_info(&client->dev,
- "Unable to set reset pulse duration, using default\n");
+ "No reset pulse duration specified, using default\n");
return 0;
}
}
struct ziirave_wdt_data *w_priv;
int val;
- if (!i2c_check_functionality(client->adapter, I2C_FUNC_SMBUS_BYTE_DATA))
+ if (!i2c_check_functionality(client->adapter,
+ I2C_FUNC_SMBUS_BYTE |
+ I2C_FUNC_SMBUS_BYTE_DATA |
+ I2C_FUNC_SMBUS_WRITE_BLOCK_DATA))
return -ENODEV;
w_priv = devm_kzalloc(&client->dev, sizeof(*w_priv), GFP_KERNEL);
*/
if (w_priv->wdd.timeout == 0) {
val = i2c_smbus_read_byte_data(client, ZIIRAVE_WDT_TIMEOUT);
- if (val < 0)
+ if (val < 0) {
+ dev_err(&client->dev, "Failed to read timeout\n");
return val;
+ }
- if (val < ZIIRAVE_TIMEOUT_MIN)
- return -ENODEV;
+ if (val > ZIIRAVE_TIMEOUT_MAX ||
+ val < ZIIRAVE_TIMEOUT_MIN)
+ val = ZIIRAVE_TIMEOUT_DEFAULT;
w_priv->wdd.timeout = val;
- } else {
- ret = ziirave_wdt_set_timeout(&w_priv->wdd,
- w_priv->wdd.timeout);
- if (ret)
- return ret;
+ }
- dev_info(&client->dev, "Timeout set to %ds.",
- w_priv->wdd.timeout);
+ ret = ziirave_wdt_set_timeout(&w_priv->wdd, w_priv->wdd.timeout);
+ if (ret) {
+ dev_err(&client->dev, "Failed to set timeout\n");
+ return ret;
}
+ dev_info(&client->dev, "Timeout set to %ds\n", w_priv->wdd.timeout);
+
watchdog_set_nowayout(&w_priv->wdd, nowayout);
i2c_set_clientdata(client, w_priv);
/* If in unconfigured state, set to stopped */
val = i2c_smbus_read_byte_data(client, ZIIRAVE_WDT_STATE);
- if (val < 0)
+ if (val < 0) {
+ dev_err(&client->dev, "Failed to read state\n");
return val;
+ }
if (val == ZIIRAVE_STATE_INITIAL)
ziirave_wdt_stop(&w_priv->wdd);
ret = ziirave_wdt_init_duration(client);
- if (ret)
+ if (ret) {
+ dev_err(&client->dev, "Failed to init duration\n");
return ret;
+ }
ret = ziirave_wdt_revision(client, &w_priv->firmware_rev,
ZIIRAVE_WDT_FIRM_VER_MAJOR);
- if (ret)
+ if (ret) {
+ dev_err(&client->dev, "Failed to read firmware version\n");
return ret;
+ }
+
+ dev_info(&client->dev,
+ "Firmware version: " ZIIRAVE_FW_VERSION_FMT "\n",
+ w_priv->firmware_rev.major, w_priv->firmware_rev.minor);
ret = ziirave_wdt_revision(client, &w_priv->bootloader_rev,
ZIIRAVE_WDT_BOOT_VER_MAJOR);
- if (ret)
+ if (ret) {
+ dev_err(&client->dev, "Failed to read bootloader version\n");
return ret;
+ }
+
+ dev_info(&client->dev,
+ "Bootloader version: " ZIIRAVE_BL_VERSION_FMT "\n",
+ w_priv->bootloader_rev.major, w_priv->bootloader_rev.minor);
w_priv->reset_reason = i2c_smbus_read_byte_data(client,
ZIIRAVE_WDT_RESET_REASON);
- if (w_priv->reset_reason < 0)
+ if (w_priv->reset_reason < 0) {
+ dev_err(&client->dev, "Failed to read reset reason\n");
return w_priv->reset_reason;
+ }
if (w_priv->reset_reason >= ARRAY_SIZE(ziirave_reasons) ||
- !ziirave_reasons[w_priv->reset_reason])
+ !ziirave_reasons[w_priv->reset_reason]) {
+ dev_err(&client->dev, "Invalid reset reason\n");
return -ENODEV;
+ }
ret = watchdog_register_device(&w_priv->wdd);
(GFP_HIGHUSER | __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC)
/* balloon_append: add the given page to the balloon. */
-static void __balloon_append(struct page *page)
+static void balloon_append(struct page *page)
{
+ __SetPageOffline(page);
+
/* Lowmem is re-populated first, so highmem pages go at list tail. */
if (PageHighMem(page)) {
list_add_tail(&page->lru, &ballooned_pages);
wake_up(&balloon_wq);
}
-static void balloon_append(struct page *page)
-{
- __balloon_append(page);
-}
-
/* balloon_retrieve: rescue a page from the balloon, if it is not empty. */
static struct page *balloon_retrieve(bool require_lowmem)
{
else
balloon_stats.balloon_low--;
+ __ClearPageOffline(page);
return page;
}
for (i = 0; i < size; i++) {
p = pfn_to_page(start_pfn + i);
__online_page_set_limits(p);
- __SetPageOffline(p);
- __balloon_append(p);
+ balloon_append(p);
}
mutex_unlock(&balloon_mutex);
}
xenmem_reservation_va_mapping_update(1, &page, &frame_list[i]);
/* Relinquish the page back to the allocator. */
- __ClearPageOffline(page);
free_reserved_page(page);
}
state = BP_EAGAIN;
break;
}
- __SetPageOffline(page);
adjust_managed_page_count(page, -1);
xenmem_reservation_scrub_page(page);
list_add(&page->lru, &pages);
while (pgno < nr_pages) {
page = balloon_retrieve(true);
if (page) {
- __ClearPageOffline(page);
pages[pgno++] = page;
#ifdef CONFIG_XEN_HAVE_PVMMU
/*
mutex_lock(&balloon_mutex);
for (i = 0; i < nr_pages; i++) {
- if (pages[i]) {
- __SetPageOffline(pages[i]);
+ if (pages[i])
balloon_append(pages[i]);
- }
}
balloon_stats.target_unpopulated -= nr_pages;
unsigned long pages)
{
unsigned long pfn, extra_pfn_end;
- struct page *page;
/*
* If the amount of usable memory has been limited (e.g., with
extra_pfn_end = min(max_pfn, start_pfn + pages);
for (pfn = start_pfn; pfn < extra_pfn_end; pfn++) {
- page = pfn_to_page(pfn);
/* totalram_pages and totalhigh_pages do not
include the boot-time balloon extension, so
don't subtract from it. */
- __balloon_append(page);
+ balloon_append(pfn_to_page(pfn));
}
balloon_stats.total_pages += extra_pfn_end - start_pfn;
#define efi_data(op) (op.u.efi_runtime_call)
-efi_status_t xen_efi_get_time(efi_time_t *tm, efi_time_cap_t *tc)
+static efi_status_t xen_efi_get_time(efi_time_t *tm, efi_time_cap_t *tc)
{
struct xen_platform_op op = INIT_EFI_OP(get_time);
return efi_data(op).status;
}
-EXPORT_SYMBOL_GPL(xen_efi_get_time);
-efi_status_t xen_efi_set_time(efi_time_t *tm)
+static efi_status_t xen_efi_set_time(efi_time_t *tm)
{
struct xen_platform_op op = INIT_EFI_OP(set_time);
return efi_data(op).status;
}
-EXPORT_SYMBOL_GPL(xen_efi_set_time);
-efi_status_t xen_efi_get_wakeup_time(efi_bool_t *enabled, efi_bool_t *pending,
- efi_time_t *tm)
+static efi_status_t xen_efi_get_wakeup_time(efi_bool_t *enabled,
+ efi_bool_t *pending,
+ efi_time_t *tm)
{
struct xen_platform_op op = INIT_EFI_OP(get_wakeup_time);
return efi_data(op).status;
}
-EXPORT_SYMBOL_GPL(xen_efi_get_wakeup_time);
-efi_status_t xen_efi_set_wakeup_time(efi_bool_t enabled, efi_time_t *tm)
+static efi_status_t xen_efi_set_wakeup_time(efi_bool_t enabled, efi_time_t *tm)
{
struct xen_platform_op op = INIT_EFI_OP(set_wakeup_time);
return efi_data(op).status;
}
-EXPORT_SYMBOL_GPL(xen_efi_set_wakeup_time);
-efi_status_t xen_efi_get_variable(efi_char16_t *name, efi_guid_t *vendor,
- u32 *attr, unsigned long *data_size,
- void *data)
+static efi_status_t xen_efi_get_variable(efi_char16_t *name, efi_guid_t *vendor,
+ u32 *attr, unsigned long *data_size,
+ void *data)
{
struct xen_platform_op op = INIT_EFI_OP(get_variable);
return efi_data(op).status;
}
-EXPORT_SYMBOL_GPL(xen_efi_get_variable);
-efi_status_t xen_efi_get_next_variable(unsigned long *name_size,
- efi_char16_t *name,
- efi_guid_t *vendor)
+static efi_status_t xen_efi_get_next_variable(unsigned long *name_size,
+ efi_char16_t *name,
+ efi_guid_t *vendor)
{
struct xen_platform_op op = INIT_EFI_OP(get_next_variable_name);
return efi_data(op).status;
}
-EXPORT_SYMBOL_GPL(xen_efi_get_next_variable);
-efi_status_t xen_efi_set_variable(efi_char16_t *name, efi_guid_t *vendor,
- u32 attr, unsigned long data_size,
- void *data)
+static efi_status_t xen_efi_set_variable(efi_char16_t *name, efi_guid_t *vendor,
+ u32 attr, unsigned long data_size,
+ void *data)
{
struct xen_platform_op op = INIT_EFI_OP(set_variable);
return efi_data(op).status;
}
-EXPORT_SYMBOL_GPL(xen_efi_set_variable);
-efi_status_t xen_efi_query_variable_info(u32 attr, u64 *storage_space,
- u64 *remaining_space,
- u64 *max_variable_size)
+static efi_status_t xen_efi_query_variable_info(u32 attr, u64 *storage_space,
+ u64 *remaining_space,
+ u64 *max_variable_size)
{
struct xen_platform_op op = INIT_EFI_OP(query_variable_info);
return efi_data(op).status;
}
-EXPORT_SYMBOL_GPL(xen_efi_query_variable_info);
-efi_status_t xen_efi_get_next_high_mono_count(u32 *count)
+static efi_status_t xen_efi_get_next_high_mono_count(u32 *count)
{
struct xen_platform_op op = INIT_EFI_OP(get_next_high_monotonic_count);
return efi_data(op).status;
}
-EXPORT_SYMBOL_GPL(xen_efi_get_next_high_mono_count);
-efi_status_t xen_efi_update_capsule(efi_capsule_header_t **capsules,
- unsigned long count, unsigned long sg_list)
+static efi_status_t xen_efi_update_capsule(efi_capsule_header_t **capsules,
+ unsigned long count, unsigned long sg_list)
{
struct xen_platform_op op = INIT_EFI_OP(update_capsule);
return efi_data(op).status;
}
-EXPORT_SYMBOL_GPL(xen_efi_update_capsule);
-efi_status_t xen_efi_query_capsule_caps(efi_capsule_header_t **capsules,
- unsigned long count, u64 *max_size,
- int *reset_type)
+static efi_status_t xen_efi_query_capsule_caps(efi_capsule_header_t **capsules,
+ unsigned long count, u64 *max_size, int *reset_type)
{
struct xen_platform_op op = INIT_EFI_OP(query_capsule_capabilities);
return efi_data(op).status;
}
-EXPORT_SYMBOL_GPL(xen_efi_query_capsule_caps);
-void xen_efi_reset_system(int reset_type, efi_status_t status,
- unsigned long data_size, efi_char16_t *data)
+static void xen_efi_reset_system(int reset_type, efi_status_t status,
+ unsigned long data_size, efi_char16_t *data)
{
switch (reset_type) {
case EFI_RESET_COLD:
BUG();
}
}
-EXPORT_SYMBOL_GPL(xen_efi_reset_system);
+
+/*
+ * Set XEN EFI runtime services function pointers. Other fields of struct efi,
+ * e.g. efi.systab, will be set like normal EFI.
+ */
+void __init xen_efi_runtime_setup(void)
+{
+ efi.get_time = xen_efi_get_time;
+ efi.set_time = xen_efi_set_time;
+ efi.get_wakeup_time = xen_efi_get_wakeup_time;
+ efi.set_wakeup_time = xen_efi_set_wakeup_time;
+ efi.get_variable = xen_efi_get_variable;
+ efi.get_next_variable = xen_efi_get_next_variable;
+ efi.set_variable = xen_efi_set_variable;
+ efi.set_variable_nonblocking = xen_efi_set_variable;
+ efi.query_variable_info = xen_efi_query_variable_info;
+ efi.query_variable_info_nonblocking = xen_efi_query_variable_info;
+ efi.update_capsule = xen_efi_update_capsule;
+ efi.query_capsule_caps = xen_efi_query_capsule_caps;
+ efi.get_next_high_mono_count = xen_efi_get_next_high_mono_count;
+ efi.reset_system = xen_efi_reset_system;
+}
#include <linux/string.h>
#include <linux/slab.h>
#include <linux/miscdevice.h>
+#include <linux/workqueue.h>
#include <xen/xenbus.h>
#include <xen/xen.h>
wait_queue_head_t read_waitq;
struct kref kref;
+
+ struct work_struct wq;
};
/* Read out any raw xenbus messages queued up. */
mutex_unlock(&adap->dev_data->reply_mutex);
}
-static void xenbus_file_free(struct kref *kref)
+static void xenbus_worker(struct work_struct *wq)
{
struct xenbus_file_priv *u;
struct xenbus_transaction_holder *trans, *tmp;
struct watch_adapter *watch, *tmp_watch;
struct read_buffer *rb, *tmp_rb;
- u = container_of(kref, struct xenbus_file_priv, kref);
+ u = container_of(wq, struct xenbus_file_priv, wq);
/*
* No need for locking here because there are no other users,
kfree(u);
}
+static void xenbus_file_free(struct kref *kref)
+{
+ struct xenbus_file_priv *u;
+
+ /*
+ * We might be called in xenbus_thread().
+ * Use workqueue to avoid deadlock.
+ */
+ u = container_of(kref, struct xenbus_file_priv, kref);
+ schedule_work(&u->wq);
+}
+
static struct xenbus_transaction_holder *xenbus_get_transaction(
struct xenbus_file_priv *u, uint32_t tx_id)
{
INIT_LIST_HEAD(&u->watches);
INIT_LIST_HEAD(&u->read_buffers);
init_waitqueue_head(&u->read_waitq);
+ INIT_WORK(&u->wq, xenbus_worker);
mutex_init(&u->reply_mutex);
mutex_init(&u->msgbuffer_mutex);
if (!v9ses->cachetag) {
if (v9fs_random_cachetag(v9ses) < 0) {
v9ses->fscache = NULL;
+ kfree(v9ses->cachetag);
+ v9ses->cachetag = NULL;
return;
}
}
v9inode = V9FS_I(inode);
mutex_lock(&v9inode->v_mutex);
if (!v9inode->writeback_fid &&
+ (vma->vm_flags & VM_SHARED) &&
(vma->vm_flags & VM_WRITE)) {
/*
* clone a fid and add it to writeback_fid
(vma->vm_end - vma->vm_start - 1),
};
+ if (!(vma->vm_flags & VM_SHARED))
+ return;
p9_debug(P9_DEBUG_VFS, "9p VMA close, %p, flushing", vma);
static int
v9fs_fill_super(struct super_block *sb, struct v9fs_session_info *v9ses,
- int flags, void *data)
+ int flags)
{
int ret;
retval = PTR_ERR(sb);
goto clunk_fid;
}
- retval = v9fs_fill_super(sb, v9ses, flags, data);
+ retval = v9fs_fill_super(sb, v9ses, flags);
if (retval)
goto release_sb;
#include <linux/dns_resolver.h>
#include "internal.h"
-const struct file_operations afs_dynroot_file_operations = {
- .open = dcache_dir_open,
- .release = dcache_dir_close,
- .iterate_shared = dcache_readdir,
- .llseek = dcache_dir_lseek,
-};
-
/*
* Probe to see if a cell may exist. This prevents positive dentries from
* being created unnecessarily.
inode->i_mode = S_IFDIR | S_IRUGO | S_IXUGO;
if (root) {
inode->i_op = &afs_dynroot_inode_operations;
- inode->i_fop = &afs_dynroot_file_operations;
+ inode->i_fop = &simple_dir_operations;
} else {
inode->i_op = &afs_autocell_inode_operations;
}
/*
* dynroot.c
*/
-extern const struct file_operations afs_dynroot_file_operations;
extern const struct inode_operations afs_dynroot_inode_operations;
extern const struct dentry_operations afs_dynroot_dentry_operations;
the correct location in memory. */
for(i = 0, elf_ppnt = elf_phdata;
i < loc->elf_ex.e_phnum; i++, elf_ppnt++) {
- int elf_prot, elf_flags, elf_fixed = MAP_FIXED_NOREPLACE;
+ int elf_prot, elf_flags;
unsigned long k, vaddr;
unsigned long total_size = 0;
*/
}
}
-
- /*
- * Some binaries have overlapping elf segments and then
- * we have to forcefully map over an existing mapping
- * e.g. over this newly established brk mapping.
- */
- elf_fixed = MAP_FIXED;
}
elf_prot = make_prot(elf_ppnt->p_flags);
* the ET_DYN load_addr calculations, proceed normally.
*/
if (loc->elf_ex.e_type == ET_EXEC || load_addr_set) {
- elf_flags |= elf_fixed;
+ elf_flags |= MAP_FIXED;
} else if (loc->elf_ex.e_type == ET_DYN) {
/*
* This logic is run once for the first LOAD Program
load_bias = ELF_ET_DYN_BASE;
if (current->flags & PF_RANDOMIZE)
load_bias += arch_mmap_rnd();
- elf_flags |= elf_fixed;
+ elf_flags |= MAP_FIXED;
} else
load_bias = 0;
static void set_btree_ioerr(struct page *page)
{
struct extent_buffer *eb = (struct extent_buffer *)page->private;
+ struct btrfs_fs_info *fs_info;
SetPageError(page);
if (test_and_set_bit(EXTENT_BUFFER_WRITE_ERR, &eb->bflags))
return;
+ /*
+ * If we error out, we should add back the dirty_metadata_bytes
+ * to make it consistent.
+ */
+ fs_info = eb->fs_info;
+ percpu_counter_add_batch(&fs_info->dirty_metadata_bytes,
+ eb->len, fs_info->dirty_metadata_batch);
+
/*
* If writeback for a btree extent that doesn't belong to a log tree
* failed, increment the counter transaction->eb_write_errors.
if (!ret) {
free_extent_buffer(eb);
continue;
+ } else if (ret < 0) {
+ done = 1;
+ free_extent_buffer(eb);
+ break;
}
ret = write_one_eb(eb, wbc, &epd);
btrfs_free_path(path);
mutex_lock(&fs_info->qgroup_rescan_lock);
- if (!btrfs_fs_closing(fs_info))
- fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_RESCAN;
-
if (err > 0 &&
fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT) {
fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
trans = btrfs_start_transaction(fs_info->quota_root, 1);
if (IS_ERR(trans)) {
err = PTR_ERR(trans);
+ trans = NULL;
btrfs_err(fs_info,
"fail to start transaction for status update: %d",
err);
- goto done;
}
- ret = update_qgroup_status_item(trans);
- if (ret < 0) {
- err = ret;
- btrfs_err(fs_info, "fail to update qgroup status: %d", err);
+
+ mutex_lock(&fs_info->qgroup_rescan_lock);
+ if (!btrfs_fs_closing(fs_info))
+ fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_RESCAN;
+ if (trans) {
+ ret = update_qgroup_status_item(trans);
+ if (ret < 0) {
+ err = ret;
+ btrfs_err(fs_info, "fail to update qgroup status: %d",
+ err);
+ }
}
+ fs_info->qgroup_rescan_running = false;
+ complete_all(&fs_info->qgroup_rescan_completion);
+ mutex_unlock(&fs_info->qgroup_rescan_lock);
+
+ if (!trans)
+ return;
+
btrfs_end_transaction(trans);
if (btrfs_fs_closing(fs_info)) {
} else {
btrfs_err(fs_info, "qgroup scan failed with %d", err);
}
-
-done:
- mutex_lock(&fs_info->qgroup_rescan_lock);
- fs_info->qgroup_rescan_running = false;
- mutex_unlock(&fs_info->qgroup_rescan_lock);
- complete_all(&fs_info->qgroup_rescan_completion);
}
/*
while ((unode = ulist_next(&reserved->range_changed, &uiter)))
clear_extent_bit(&BTRFS_I(inode)->io_tree, unode->val,
unode->aux, EXTENT_QGROUP_RESERVED, 0, 0, NULL);
+ /* Also free data bytes of already reserved one */
+ btrfs_qgroup_free_refroot(root->fs_info, root->root_key.objectid,
+ orig_reserved, BTRFS_QGROUP_RSV_DATA);
extent_changeset_release(reserved);
return ret;
}
* EXTENT_QGROUP_RESERVED, we won't double free.
* So not need to rush.
*/
- ret = clear_record_extent_bits(&BTRFS_I(inode)->io_failure_tree,
+ ret = clear_record_extent_bits(&BTRFS_I(inode)->io_tree,
free_start, free_start + free_len - 1,
EXTENT_QGROUP_RESERVED, &changeset);
if (ret < 0)
int clear_rsv = 0;
int ret;
+ /*
+ * The subvolume has reloc tree but the swap is finished, no need to
+ * create/update the dead reloc tree
+ */
+ if (test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state))
+ return 0;
+
if (root->reloc_root) {
reloc_root = root->reloc_root;
reloc_root->last_trans = trans->transid;
/* Merged subvolume, cleanup its reloc root */
struct btrfs_root *reloc_root = root->reloc_root;
- clear_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state);
list_del_init(&root->reloc_dirty_list);
root->reloc_root = NULL;
if (reloc_root) {
if (ret2 < 0 && !ret)
ret = ret2;
}
+ clear_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state);
btrfs_put_fs_root(root);
} else {
/* Orphan reloc tree, just clean it up */
struct inode *btrfs_new_test_inode(void)
{
- return new_inode(test_mnt->mnt_sb);
+ struct inode *inode;
+
+ inode = new_inode(test_mnt->mnt_sb);
+ if (inode)
+ inode_init_owner(inode, NULL, S_IFREG);
+
+ return inode;
}
static int btrfs_init_test_fs(void)
}
num_devices = btrfs_num_devices(fs_info);
- allowed = 0;
+
+ /*
+ * SINGLE profile on-disk has no profile bit, but in-memory we have a
+ * special bit for it, to make it easier to distinguish. Thus we need
+ * to set it manually, or balance would refuse the profile.
+ */
+ allowed = BTRFS_AVAIL_ALLOC_BIT_SINGLE;
for (i = 0; i < ARRAY_SIZE(btrfs_raid_array); i++)
if (num_devices >= btrfs_raid_array[i].devs_min)
allowed |= btrfs_raid_array[i].bg_flag;
/* char buffer[]; */
} __packed;
+struct smb3_key_debug_info {
+ __u64 Suid;
+ __u16 cipher_type;
+ __u8 auth_key[16]; /* SMB2_NTLMV2_SESSKEY_SIZE */
+ __u8 smb3encryptionkey[SMB3_SIGN_KEY_SIZE];
+ __u8 smb3decryptionkey[SMB3_SIGN_KEY_SIZE];
+} __packed;
+
#define CIFS_IOCTL_MAGIC 0xCF
#define CIFS_IOC_COPYCHUNK_FILE _IOW(CIFS_IOCTL_MAGIC, 3, int)
#define CIFS_IOC_SET_INTEGRITY _IO(CIFS_IOCTL_MAGIC, 4)
#define CIFS_IOC_GET_MNT_INFO _IOR(CIFS_IOCTL_MAGIC, 5, struct smb_mnt_fs_info)
#define CIFS_ENUMERATE_SNAPSHOTS _IOR(CIFS_IOCTL_MAGIC, 6, struct smb_snapshot_array)
#define CIFS_QUERY_INFO _IOWR(CIFS_IOCTL_MAGIC, 7, struct smb_query_info)
+#define CIFS_DUMP_KEY _IOWR(CIFS_IOCTL_MAGIC, 8, struct smb3_key_debug_info)
__le32 num_aces;
} __attribute__((packed));
+/* ACE types - see MS-DTYP 2.4.4.1 */
+#define ACCESS_ALLOWED_ACE_TYPE 0x00
+#define ACCESS_DENIED_ACE_TYPE 0x01
+#define SYSTEM_AUDIT_ACE_TYPE 0x02
+#define SYSTEM_ALARM_ACE_TYPE 0x03
+#define ACCESS_ALLOWED_COMPOUND_ACE_TYPE 0x04
+#define ACCESS_ALLOWED_OBJECT_ACE_TYPE 0x05
+#define ACCESS_DENIED_OBJECT_ACE_TYPE 0x06
+#define SYSTEM_AUDIT_OBJECT_ACE_TYPE 0x07
+#define SYSTEM_ALARM_OBJECT_ACE_TYPE 0x08
+#define ACCESS_ALLOWED_CALLBACK_ACE_TYPE 0x09
+#define ACCESS_DENIED_CALLBACK_ACE_TYPE 0x0A
+#define ACCESS_ALLOWED_CALLBACK_OBJECT_ACE_TYPE 0x0B
+#define ACCESS_DENIED_CALLBACK_OBJECT_ACE_TYPE 0x0C
+#define SYSTEM_AUDIT_CALLBACK_ACE_TYPE 0x0D
+#define SYSTEM_ALARM_CALLBACK_ACE_TYPE 0x0E /* Reserved */
+#define SYSTEM_AUDIT_CALLBACK_OBJECT_ACE_TYPE 0x0F
+#define SYSTEM_ALARM_CALLBACK_OBJECT_ACE_TYPE 0x10 /* reserved */
+#define SYSTEM_MANDATORY_LABEL_ACE_TYPE 0x11
+#define SYSTEM_RESOURCE_ATTRIBUTE_ACE_TYPE 0x12
+#define SYSTEM_SCOPED_POLICY_ID_ACE_TYPE 0x13
+
+/* ACE flags */
+#define OBJECT_INHERIT_ACE 0x01
+#define CONTAINER_INHERIT_ACE 0x02
+#define NO_PROPAGATE_INHERIT_ACE 0x04
+#define INHERIT_ONLY_ACE 0x08
+#define INHERITED_ACE 0x10
+#define SUCCESSFUL_ACCESS_ACE_FLAG 0x40
+#define FAILED_ACCESS_ACE_FLAG 0x80
+
struct cifs_ace {
- __u8 type;
+ __u8 type; /* see above and MS-DTYP 2.4.4.1 */
__u8 flags;
__le16 size;
__le32 access_req;
struct cifs_sid sid; /* ie UUID of user or group who gets these perms */
} __attribute__((packed));
+/*
+ * The current SMB3 form of security descriptor is similar to what was used for
+ * cifs (see above) but some fields are split, and fields in the struct below
+ * matches names of fields to the the spec, MS-DTYP (see sections 2.4.5 and
+ * 2.4.6). Note that "CamelCase" fields are used in this struct in order to
+ * match the MS-DTYP and MS-SMB2 specs which define the wire format.
+ */
+struct smb3_sd {
+ __u8 Revision; /* revision level, MUST be one */
+ __u8 Sbz1; /* only meaningful if 'RM' flag set below */
+ __le16 Control;
+ __le32 OffsetOwner;
+ __le32 OffsetGroup;
+ __le32 OffsetSacl;
+ __le32 OffsetDacl;
+} __packed;
+
+/* Meaning of 'Control' field flags */
+#define ACL_CONTROL_SR 0x0001 /* Self relative */
+#define ACL_CONTROL_RM 0x0002 /* Resource manager control bits */
+#define ACL_CONTROL_PS 0x0004 /* SACL protected from inherits */
+#define ACL_CONTROL_PD 0x0008 /* DACL protected from inherits */
+#define ACL_CONTROL_SI 0x0010 /* SACL Auto-Inherited */
+#define ACL_CONTROL_DI 0x0020 /* DACL Auto-Inherited */
+#define ACL_CONTROL_SC 0x0040 /* SACL computed through inheritance */
+#define ACL_CONTROL_DC 0x0080 /* DACL computed through inheritence */
+#define ACL_CONTROL_SS 0x0100 /* Create server ACL */
+#define ACL_CONTROL_DT 0x0200 /* DACL provided by trusteed source */
+#define ACL_CONTROL_SD 0x0400 /* SACL defaulted */
+#define ACL_CONTROL_SP 0x0800 /* SACL is present on object */
+#define ACL_CONTROL_DD 0x1000 /* DACL defaulted */
+#define ACL_CONTROL_DP 0x2000 /* DACL is present on object */
+#define ACL_CONTROL_GD 0x4000 /* Group was defaulted */
+#define ACL_CONTROL_OD 0x8000 /* User was defaulted */
+
+/* Meaning of AclRevision flags */
+#define ACL_REVISION 0x02 /* See section 2.4.4.1 of MS-DTYP */
+#define ACL_REVISION_DS 0x04 /* Additional AceTypes allowed */
+
+struct smb3_acl {
+ u8 AclRevision; /* revision level */
+ u8 Sbz1; /* MBZ */
+ __le16 AclSize;
+ __le16 AceCount;
+ __le16 Sbz2; /* MBZ */
+} __packed;
+
+
/*
* Minimum security identifier can be one for system defined Users
* and Groups such as NULL SID and World or Built-in accounts such
umode_t mode, struct cifs_tcon *tcon,
const char *full_path,
struct cifs_sb_info *cifs_sb);
- int (*mkdir)(const unsigned int, struct cifs_tcon *, const char *,
- struct cifs_sb_info *);
+ int (*mkdir)(const unsigned int xid, struct inode *inode, umode_t mode,
+ struct cifs_tcon *tcon, const char *name,
+ struct cifs_sb_info *sb);
/* set info on created directory */
void (*mkdir_setinfo)(struct inode *, const char *,
struct cifs_sb_info *, struct cifs_tcon *,
bool smallBuf:1; /* so we know which buf_release function to call */
};
+#define ACL_NO_MODE -1
struct cifs_open_parms {
struct cifs_tcon *tcon;
struct cifs_sb_info *cifs_sb;
const struct nls_table *nls_codepage,
int remap);
-extern int CIFSSMBMkDir(const unsigned int xid, struct cifs_tcon *tcon,
+extern int CIFSSMBMkDir(const unsigned int xid, struct inode *inode,
+ umode_t mode, struct cifs_tcon *tcon,
const char *name, struct cifs_sb_info *cifs_sb);
extern int CIFSSMBRmDir(const unsigned int xid, struct cifs_tcon *tcon,
const char *name, struct cifs_sb_info *cifs_sb);
}
int
-CIFSSMBMkDir(const unsigned int xid, struct cifs_tcon *tcon, const char *name,
+CIFSSMBMkDir(const unsigned int xid, struct inode *inode, umode_t mode,
+ struct cifs_tcon *tcon, const char *name,
struct cifs_sb_info *cifs_sb)
{
int rc = 0;
}
/* BB add setting the equivalent of mode via CreateX w/ACLs */
- rc = server->ops->mkdir(xid, tcon, full_path, cifs_sb);
+ rc = server->ops->mkdir(xid, inode, mode, tcon, full_path, cifs_sb);
if (rc) {
cifs_dbg(FYI, "cifs_mkdir returned 0x%x\n", rc);
d_drop(direntry);
goto mkdir_out;
}
+ /* TODO: skip this for smb2/smb3 */
rc = cifs_mkdir_qinfo(inode, direntry, mode, full_path, cifs_sb, tcon,
xid);
mkdir_out:
long cifs_ioctl(struct file *filep, unsigned int command, unsigned long arg)
{
struct inode *inode = file_inode(filep);
+ struct smb3_key_debug_info pkey_inf;
int rc = -ENOTTY; /* strange error - but the precedent */
unsigned int xid;
struct cifsFileInfo *pSMBFile = filep->private_data;
else
rc = -EOPNOTSUPP;
break;
+ case CIFS_DUMP_KEY:
+ if (pSMBFile == NULL)
+ break;
+ if (!capable(CAP_SYS_ADMIN)) {
+ rc = -EACCES;
+ break;
+ }
+
+ tcon = tlink_tcon(pSMBFile->tlink);
+ if (!smb3_encryption_required(tcon)) {
+ rc = -EOPNOTSUPP;
+ break;
+ }
+ pkey_inf.cipher_type =
+ le16_to_cpu(tcon->ses->server->cipher_type);
+ pkey_inf.Suid = tcon->ses->Suid;
+ memcpy(pkey_inf.auth_key, tcon->ses->auth_key.response,
+ 16 /* SMB2_NTLMV2_SESSKEY_SIZE */);
+ memcpy(pkey_inf.smb3decryptionkey,
+ tcon->ses->smb3decryptionkey, SMB3_SIGN_KEY_SIZE);
+ memcpy(pkey_inf.smb3encryptionkey,
+ tcon->ses->smb3encryptionkey, SMB3_SIGN_KEY_SIZE);
+ if (copy_to_user((void __user *)arg, &pkey_inf,
+ sizeof(struct smb3_key_debug_info)))
+ rc = -EFAULT;
+ else
+ rc = 0;
+ break;
default:
cifs_dbg(FYI, "unsupported ioctl\n");
break;
char *bcc_ptr;
struct cifs_ses *ses = sess_data->ses;
char lnm_session_key[CIFS_AUTH_RESP_SIZE];
- __u32 capabilities;
__u16 bytes_remaining;
/* lanman 2 style sessionsetup */
pSMB = (SESSION_SETUP_ANDX *)sess_data->iov[0].iov_base;
bcc_ptr = sess_data->iov[2].iov_base;
- capabilities = cifs_ssetup_hdr(ses, pSMB);
+ (void)cifs_ssetup_hdr(ses, pSMB);
pSMB->req.hdr.Flags2 &= ~SMBFLG2_UNICODE;
smb2_compound_op(const unsigned int xid, struct cifs_tcon *tcon,
struct cifs_sb_info *cifs_sb, const char *full_path,
__u32 desired_access, __u32 create_disposition,
- __u32 create_options, void *ptr, int command,
+ __u32 create_options, umode_t mode, void *ptr, int command,
struct cifsFileInfo *cfile)
{
int rc;
oparms.create_options |= CREATE_OPEN_BACKUP_INTENT;
oparms.fid = &fid;
oparms.reconnect = false;
+ oparms.mode = mode;
memset(&open_iov, 0, sizeof(open_iov));
rqst[num_rqst].rq_iov = open_iov;
cifs_get_readable_path(tcon, full_path, &cfile);
rc = smb2_compound_op(xid, tcon, cifs_sb, full_path,
FILE_READ_ATTRIBUTES, FILE_OPEN, create_options,
- smb2_data, SMB2_OP_QUERY_INFO, cfile);
+ ACL_NO_MODE, smb2_data, SMB2_OP_QUERY_INFO, cfile);
if (rc == -EOPNOTSUPP) {
*symlink = true;
create_options |= OPEN_REPARSE_POINT;
/* Failed on a symbolic link - query a reparse point info */
rc = smb2_compound_op(xid, tcon, cifs_sb, full_path,
FILE_READ_ATTRIBUTES, FILE_OPEN,
- create_options, smb2_data,
- SMB2_OP_QUERY_INFO, NULL);
+ create_options, ACL_NO_MODE,
+ smb2_data, SMB2_OP_QUERY_INFO, NULL);
}
if (rc)
goto out;
}
int
-smb2_mkdir(const unsigned int xid, struct cifs_tcon *tcon, const char *name,
+smb2_mkdir(const unsigned int xid, struct inode *parent_inode, umode_t mode,
+ struct cifs_tcon *tcon, const char *name,
struct cifs_sb_info *cifs_sb)
{
return smb2_compound_op(xid, tcon, cifs_sb, name,
FILE_WRITE_ATTRIBUTES, FILE_CREATE,
- CREATE_NOT_FILE, NULL, SMB2_OP_MKDIR, NULL);
+ CREATE_NOT_FILE, mode, NULL, SMB2_OP_MKDIR,
+ NULL);
}
void
cifs_get_writable_path(tcon, name, &cfile);
tmprc = smb2_compound_op(xid, tcon, cifs_sb, name,
FILE_WRITE_ATTRIBUTES, FILE_CREATE,
- CREATE_NOT_FILE, &data, SMB2_OP_SET_INFO,
- cfile);
+ CREATE_NOT_FILE, ACL_NO_MODE,
+ &data, SMB2_OP_SET_INFO, cfile);
if (tmprc == 0)
cifs_i->cifsAttrs = dosattrs;
}
struct cifs_sb_info *cifs_sb)
{
return smb2_compound_op(xid, tcon, cifs_sb, name, DELETE, FILE_OPEN,
- CREATE_NOT_FILE,
+ CREATE_NOT_FILE, ACL_NO_MODE,
NULL, SMB2_OP_RMDIR, NULL);
}
{
return smb2_compound_op(xid, tcon, cifs_sb, name, DELETE, FILE_OPEN,
CREATE_DELETE_ON_CLOSE | OPEN_REPARSE_POINT,
- NULL, SMB2_OP_DELETE, NULL);
+ ACL_NO_MODE, NULL, SMB2_OP_DELETE, NULL);
}
static int
goto smb2_rename_path;
}
rc = smb2_compound_op(xid, tcon, cifs_sb, from_name, access,
- FILE_OPEN, 0, smb2_to_name, command, cfile);
+ FILE_OPEN, 0, ACL_NO_MODE, smb2_to_name,
+ command, cfile);
smb2_rename_path:
kfree(smb2_to_name);
return rc;
__le64 eof = cpu_to_le64(size);
return smb2_compound_op(xid, tcon, cifs_sb, full_path,
- FILE_WRITE_DATA, FILE_OPEN, 0, &eof,
- SMB2_OP_SET_EOF, NULL);
+ FILE_WRITE_DATA, FILE_OPEN, 0, ACL_NO_MODE,
+ &eof, SMB2_OP_SET_EOF, NULL);
}
int
return PTR_ERR(tlink);
rc = smb2_compound_op(xid, tlink_tcon(tlink), cifs_sb, full_path,
- FILE_WRITE_ATTRIBUTES, FILE_OPEN, 0, buf,
- SMB2_OP_SET_INFO, NULL);
+ FILE_WRITE_ATTRIBUTES, FILE_OPEN,
+ 0, ACL_NO_MODE, buf, SMB2_OP_SET_INFO, NULL);
cifs_put_tlink(tlink);
return rc;
}
goto oshr_exit;
}
+ atomic_inc(&tcon->num_remote_opens);
+
o_rsp = (struct smb2_create_rsp *)rsp_iov[0].iov_base;
oparms.fid->persistent_fid = o_rsp->PersistentFileId;
oparms.fid->volatile_fid = o_rsp->VolatileFileId;
rc = compound_send_recv(xid, ses, flags, 3, rqst,
resp_buftype, rsp_iov);
+ /* no need to bump num_remote_opens because handle immediately closed */
sea_exit:
kfree(ea);
resp_buftype, rsp_iov);
if (rc)
goto iqinf_exit;
+
+ /* No need to bump num_remote_opens since handle immediately closed */
if (qi.flags & PASSTHRU_FSCTL) {
pqi = (struct smb_query_info __user *)arg;
io_rsp = (struct smb2_ioctl_rsp *)rsp_iov[1].iov_base;
if (oplock == SMB2_OPLOCK_LEVEL_NOCHANGE)
return;
+ /* Check if the server granted an oplock rather than a lease */
+ if (oplock & SMB2_OPLOCK_LEVEL_EXCLUSIVE)
+ return smb2_set_oplock_level(cinode, oplock, epoch,
+ purge_cache);
+
if (oplock & SMB2_LEASE_READ_CACHING_HE) {
new_oplock |= CIFS_CACHE_READ_FLG;
strcat(message, "R");
unsigned int num = *num_iovec;
iov[num].iov_base = create_posix_buf(mode);
+ if (mode == -1)
+ cifs_dbg(VFS, "illegal mode\n"); /* BB REMOVEME */
if (iov[num].iov_base == NULL)
return -ENOMEM;
iov[num].iov_len = sizeof(struct create_posix);
rqst.rq_iov = iov;
rqst.rq_nvec = n_iov;
+ /* no need to inc num_remote_opens because we close it just below */
trace_smb3_posix_mkdir_enter(xid, tcon->tid, ses->Suid, CREATE_NOT_FILE,
FILE_WRITE_ATTRIBUTES);
/* resource #4: response buffer */
/* File attributes ignored on open (used in create though) */
req->FileAttributes = cpu_to_le32(file_attributes);
req->ShareAccess = FILE_SHARE_ALL_LE;
+
req->CreateDisposition = cpu_to_le32(oparms->disposition);
req->CreateOptions = cpu_to_le32(oparms->create_options & CREATE_OPTIONS_MASK);
req->NameOffset = cpu_to_le16(sizeof(struct smb2_create_req));
return rc;
}
+ /* TODO: add handling for the mode on create */
+ if (oparms->disposition == FILE_CREATE)
+ cifs_dbg(VFS, "mode is 0x%x\n", oparms->mode); /* BB REMOVEME */
+
+ if ((oparms->disposition == FILE_CREATE) && (oparms->mode != -1)) {
+ if (n_iov > 2) {
+ struct create_context *ccontext =
+ (struct create_context *)iov[n_iov-1].iov_base;
+ ccontext->Next =
+ cpu_to_le32(iov[n_iov-1].iov_len);
+ }
+
+ /* rc = add_sd_context(iov, &n_iov, oparms->mode); */
+ if (rc)
+ return rc;
+ }
+
if (n_iov > 2) {
struct create_context *ccontext =
(struct create_context *)iov[n_iov-1].iov_base;
* See MS-SMB2 2.2.35 and 2.2.36
*/
-int
+static int
SMB2_notify_init(const unsigned int xid, struct smb_rqst *rqst,
struct cifs_tcon *tcon, u64 persistent_fid, u64 volatile_fid,
u32 completion_filter, bool watch_tree)
umode_t mode, struct cifs_tcon *tcon,
const char *full_path,
struct cifs_sb_info *cifs_sb);
-extern int smb2_mkdir(const unsigned int xid, struct cifs_tcon *tcon,
+extern int smb2_mkdir(const unsigned int xid, struct inode *inode,
+ umode_t mode, struct cifs_tcon *tcon,
const char *name, struct cifs_sb_info *cifs_sb);
extern void smb2_mkdir_setinfo(struct inode *inode, const char *full_path,
struct cifs_sb_info *cifs_sb,
#define IO_REPARSE_APPXSTREAM 0xC0000014
/* NFS symlinks, Win 8/SMB3 and later */
#define IO_REPARSE_TAG_NFS 0x80000014
+/*
+ * AzureFileSync - see
+ * https://docs.microsoft.com/en-us/azure/storage/files/storage-sync-cloud-tiering
+ */
+#define IO_REPARSE_TAG_AZ_FILE_SYNC 0x8000001e
+/* WSL reparse tags */
+#define IO_REPARSE_TAG_LX_SYMLINK 0xA000001D
+#define IO_REPARSE_TAG_AF_UNIX 0x80000023
+#define IO_REPARSE_TAG_LX_FIFO 0x80000024
+#define IO_REPARSE_TAG_LX_CHR 0x80000025
+#define IO_REPARSE_TAG_LX_BLK 0x80000026
/* fsctl flags */
/* If Flags is set to this value, the request is an FSCTL not ioctl request */
#include "cifs_fs_sb.h"
#include "cifs_unicode.h"
-#define MAX_EA_VALUE_SIZE 65535
+#define MAX_EA_VALUE_SIZE CIFSMaxBufSize
#define CIFS_XATTR_CIFS_ACL "system.cifs_acl"
#define CIFS_XATTR_ATTRIB "cifs.dosattrib" /* full name: user.cifs.dosattrib */
#define CIFS_XATTR_CREATETIME "cifs.creationtime" /* user.cifs.creationtime */
#include <linux/atomic.h>
#include <linux/device.h>
#include <linux/poll.h>
+#include <linux/security.h>
#include "internal.h"
}
EXPORT_SYMBOL_GPL(debugfs_file_put);
+/*
+ * Only permit access to world-readable files when the kernel is locked down.
+ * We also need to exclude any file that has ways to write or alter it as root
+ * can bypass the permissions check.
+ */
+static bool debugfs_is_locked_down(struct inode *inode,
+ struct file *filp,
+ const struct file_operations *real_fops)
+{
+ if ((inode->i_mode & 07777) == 0444 &&
+ !(filp->f_mode & FMODE_WRITE) &&
+ !real_fops->unlocked_ioctl &&
+ !real_fops->compat_ioctl &&
+ !real_fops->mmap)
+ return false;
+
+ return security_locked_down(LOCKDOWN_DEBUGFS);
+}
+
static int open_proxy_open(struct inode *inode, struct file *filp)
{
struct dentry *dentry = F_DENTRY(filp);
return r == -EIO ? -ENOENT : r;
real_fops = debugfs_real_fops(filp);
+
+ r = debugfs_is_locked_down(inode, filp, real_fops);
+ if (r)
+ goto out;
+
real_fops = fops_get(real_fops);
if (!real_fops) {
/* Huh? Module did not clean up after itself at exit? */
return r == -EIO ? -ENOENT : r;
real_fops = debugfs_real_fops(filp);
+
+ r = debugfs_is_locked_down(inode, filp, real_fops);
+ if (r)
+ goto out;
+
real_fops = fops_get(real_fops);
if (!real_fops) {
/* Huh? Module did not cleanup after itself at exit? */
#include <linux/parser.h>
#include <linux/magic.h>
#include <linux/slab.h>
+#include <linux/security.h>
#include "internal.h"
static int debugfs_mount_count;
static bool debugfs_registered;
+/*
+ * Don't allow access attributes to be changed whilst the kernel is locked down
+ * so that we can use the file mode as part of a heuristic to determine whether
+ * to lock down individual files.
+ */
+static int debugfs_setattr(struct dentry *dentry, struct iattr *ia)
+{
+ int ret = security_locked_down(LOCKDOWN_DEBUGFS);
+
+ if (ret && (ia->ia_valid & (ATTR_MODE | ATTR_UID | ATTR_GID)))
+ return ret;
+ return simple_setattr(dentry, ia);
+}
+
+static const struct inode_operations debugfs_file_inode_operations = {
+ .setattr = debugfs_setattr,
+};
+static const struct inode_operations debugfs_dir_inode_operations = {
+ .lookup = simple_lookup,
+ .setattr = debugfs_setattr,
+};
+static const struct inode_operations debugfs_symlink_inode_operations = {
+ .get_link = simple_get_link,
+ .setattr = debugfs_setattr,
+};
+
static struct inode *debugfs_get_inode(struct super_block *sb)
{
struct inode *inode = new_inode(sb);
inode->i_mode = mode;
inode->i_private = data;
+ inode->i_op = &debugfs_file_inode_operations;
inode->i_fop = proxy_fops;
dentry->d_fsdata = (void *)((unsigned long)real_fops |
DEBUGFS_FSDATA_IS_REAL_FOPS_BIT);
}
inode->i_mode = S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO;
- inode->i_op = &simple_dir_inode_operations;
+ inode->i_op = &debugfs_dir_inode_operations;
inode->i_fop = &simple_dir_operations;
/* directory inodes start off with i_nlink == 2 (for "." entry) */
return failed_creating(dentry);
}
inode->i_mode = S_IFLNK | S_IRWXUGO;
- inode->i_op = &simple_symlink_inode_operations;
+ inode->i_op = &debugfs_symlink_inode_operations;
inode->i_link = link;
d_instantiate(dentry, inode);
return end_creating(dentry);
struct page *erofs_get_meta_page(struct super_block *sb, erofs_blk_t blkaddr)
{
- struct inode *const bd_inode = sb->s_bdev->bd_inode;
- struct address_space *const mapping = bd_inode->i_mapping;
+ struct address_space *const mapping = sb->s_bdev->bd_inode->i_mapping;
+ struct page *page;
- return read_cache_page_gfp(mapping, blkaddr,
+ page = read_cache_page_gfp(mapping, blkaddr,
mapping_gfp_constraint(mapping, ~__GFP_FS));
+ /* should already be PageUptodate */
+ if (!IS_ERR(page))
+ lock_page(page);
+ return page;
}
static int erofs_map_blocks_flatmode(struct inode *inode,
int ret;
page = read_mapping_page(sb->s_bdev->bd_inode->i_mapping, 0, NULL);
- if (!page) {
+ if (IS_ERR(page)) {
erofs_err(sb, "cannot read erofs superblock");
- return -EIO;
+ return PTR_ERR(page);
}
sbi = EROFS_SB(sb);
struct erofs_map_blocks *const map = &fe->map;
struct z_erofs_collector *const clt = &fe->clt;
const loff_t offset = page_offset(page);
- bool tight = (clt->mode >= COLLECT_PRIMARY_HOOKED);
+ bool tight = true;
enum z_erofs_cache_alloctype cache_strategy;
enum z_erofs_page_type page_type;
preload_compressed_pages(clt, MNGD_MAPPING(sbi),
cache_strategy, pagepool);
- tight &= (clt->mode >= COLLECT_PRIMARY_HOOKED);
hitted:
+ /*
+ * Ensure the current partial page belongs to this submit chain rather
+ * than other concurrent submit chains or the noio(bypass) chain since
+ * those chains are handled asynchronously thus the page cannot be used
+ * for inplace I/O or pagevec (should be processed in strict order.)
+ */
+ tight &= (clt->mode >= COLLECT_PRIMARY_HOOKED &&
+ clt->mode != COLLECT_PRIMARY_FOLLOWED_NOINPLACE);
+
cur = end - min_t(unsigned int, offset + end - map->m_la, end);
if (!(map->m_flags & EROFS_MAP_MAPPED)) {
zero_user_segment(page, cur, end);
}
task_lock(tsk);
active_mm = tsk->active_mm;
+ membarrier_exec_mmap(mm);
tsk->mm = mm;
tsk->active_mm = mm;
activate_mm(active_mm, mm);
/* execve succeeded */
current->fs->in_exec = 0;
current->in_execve = 0;
- membarrier_execve(current);
rseq_execve(current);
acct_update_integrals(current);
task_numa_free(current, false);
struct buffer_head *bh;
struct super_block *sb = inode->i_sb;
ext4_fsblk_t block;
+ struct blk_plug plug;
int inodes_per_block, inode_offset;
iloc->bh = NULL;
* If we need to do any I/O, try to pre-readahead extra
* blocks from the inode table.
*/
+ blk_start_plug(&plug);
if (EXT4_SB(sb)->s_inode_readahead_blks) {
ext4_fsblk_t b, end, table;
unsigned num;
get_bh(bh);
bh->b_end_io = end_buffer_read_sync;
submit_bh(REQ_OP_READ, REQ_META | REQ_PRIO, bh);
+ blk_finish_plug(&plug);
wait_on_buffer(bh);
if (!buffer_uptodate(bh)) {
EXT4_ERROR_INODE_BLOCK(inode, block,
* sys_open_by_handle_at: Open the file handle
* @mountdirfd: directory file descriptor
* @handle: file handle to be opened
- * @flag: open flags.
+ * @flags: open flags.
*
* @mountdirfd indicate the directory file descriptor
* of the mount point. file handle is decoded relative
{
delayed_fput(NULL);
}
+EXPORT_SYMBOL_GPL(flush_delayed_fput);
static DECLARE_DELAYED_WORK(delayed_fput_work, delayed_fput);
If you want to develop or use a userspace character device
based on CUSE, answer Y or M.
+
+config VIRTIO_FS
+ tristate "Virtio Filesystem"
+ depends on FUSE_FS
+ select VIRTIO
+ help
+ The Virtio Filesystem allows guests to mount file systems from the
+ host.
+
+ If you want to share files between guests or with the host, answer Y
+ or M.
obj-$(CONFIG_FUSE_FS) += fuse.o
obj-$(CONFIG_CUSE) += cuse.o
+obj-$(CONFIG_VIRTIO_FS) += virtio_fs.o
fuse-objs := dev.o dir.o file.o inode.o control.o xattr.o acl.o readdir.o
/** Used to wake up the task waiting for completion of request*/
wait_queue_head_t waitq;
+#if IS_ENABLED(CONFIG_VIRTIO_FS)
+ /** virtio-fs's physically contiguous buffer for in and out args */
+ void *argbuf;
+#endif
};
struct fuse_iqueue;
*/
void (*wake_pending_and_unlock)(struct fuse_iqueue *fiq)
__releases(fiq->lock);
+
+ /**
+ * Clean up when fuse_iqueue is destroyed
+ */
+ void (*release)(struct fuse_iqueue *fiq);
};
/** /dev/fuse input queue operations */
void fuse_conn_put(struct fuse_conn *fc)
{
if (refcount_dec_and_test(&fc->count)) {
+ struct fuse_iqueue *fiq = &fc->iq;
+
+ if (fiq->ops->release)
+ fiq->ops->release(fiq);
put_pid_ns(fc->pid_ns);
put_user_ns(fc->user_ns);
fc->release(fc);
--- /dev/null
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * virtio-fs: Virtio Filesystem
+ * Copyright (C) 2018 Red Hat, Inc.
+ */
+
+#include <linux/fs.h>
+#include <linux/module.h>
+#include <linux/virtio.h>
+#include <linux/virtio_fs.h>
+#include <linux/delay.h>
+#include <linux/fs_context.h>
+#include <linux/highmem.h>
+#include "fuse_i.h"
+
+/* List of virtio-fs device instances and a lock for the list. Also provides
+ * mutual exclusion in device removal and mounting path
+ */
+static DEFINE_MUTEX(virtio_fs_mutex);
+static LIST_HEAD(virtio_fs_instances);
+
+enum {
+ VQ_HIPRIO,
+ VQ_REQUEST
+};
+
+/* Per-virtqueue state */
+struct virtio_fs_vq {
+ spinlock_t lock;
+ struct virtqueue *vq; /* protected by ->lock */
+ struct work_struct done_work;
+ struct list_head queued_reqs;
+ struct delayed_work dispatch_work;
+ struct fuse_dev *fud;
+ bool connected;
+ long in_flight;
+ char name[24];
+} ____cacheline_aligned_in_smp;
+
+/* A virtio-fs device instance */
+struct virtio_fs {
+ struct kref refcount;
+ struct list_head list; /* on virtio_fs_instances */
+ char *tag;
+ struct virtio_fs_vq *vqs;
+ unsigned int nvqs; /* number of virtqueues */
+ unsigned int num_request_queues; /* number of request queues */
+};
+
+struct virtio_fs_forget {
+ struct fuse_in_header ih;
+ struct fuse_forget_in arg;
+ /* This request can be temporarily queued on virt queue */
+ struct list_head list;
+};
+
+static inline struct virtio_fs_vq *vq_to_fsvq(struct virtqueue *vq)
+{
+ struct virtio_fs *fs = vq->vdev->priv;
+
+ return &fs->vqs[vq->index];
+}
+
+static inline struct fuse_pqueue *vq_to_fpq(struct virtqueue *vq)
+{
+ return &vq_to_fsvq(vq)->fud->pq;
+}
+
+static void release_virtio_fs_obj(struct kref *ref)
+{
+ struct virtio_fs *vfs = container_of(ref, struct virtio_fs, refcount);
+
+ kfree(vfs->vqs);
+ kfree(vfs);
+}
+
+/* Make sure virtiofs_mutex is held */
+static void virtio_fs_put(struct virtio_fs *fs)
+{
+ kref_put(&fs->refcount, release_virtio_fs_obj);
+}
+
+static void virtio_fs_fiq_release(struct fuse_iqueue *fiq)
+{
+ struct virtio_fs *vfs = fiq->priv;
+
+ mutex_lock(&virtio_fs_mutex);
+ virtio_fs_put(vfs);
+ mutex_unlock(&virtio_fs_mutex);
+}
+
+static void virtio_fs_drain_queue(struct virtio_fs_vq *fsvq)
+{
+ WARN_ON(fsvq->in_flight < 0);
+
+ /* Wait for in flight requests to finish.*/
+ while (1) {
+ spin_lock(&fsvq->lock);
+ if (!fsvq->in_flight) {
+ spin_unlock(&fsvq->lock);
+ break;
+ }
+ spin_unlock(&fsvq->lock);
+ /* TODO use completion instead of timeout */
+ usleep_range(1000, 2000);
+ }
+
+ flush_work(&fsvq->done_work);
+ flush_delayed_work(&fsvq->dispatch_work);
+}
+
+static inline void drain_hiprio_queued_reqs(struct virtio_fs_vq *fsvq)
+{
+ struct virtio_fs_forget *forget;
+
+ spin_lock(&fsvq->lock);
+ while (1) {
+ forget = list_first_entry_or_null(&fsvq->queued_reqs,
+ struct virtio_fs_forget, list);
+ if (!forget)
+ break;
+ list_del(&forget->list);
+ kfree(forget);
+ }
+ spin_unlock(&fsvq->lock);
+}
+
+static void virtio_fs_drain_all_queues(struct virtio_fs *fs)
+{
+ struct virtio_fs_vq *fsvq;
+ int i;
+
+ for (i = 0; i < fs->nvqs; i++) {
+ fsvq = &fs->vqs[i];
+ if (i == VQ_HIPRIO)
+ drain_hiprio_queued_reqs(fsvq);
+
+ virtio_fs_drain_queue(fsvq);
+ }
+}
+
+static void virtio_fs_start_all_queues(struct virtio_fs *fs)
+{
+ struct virtio_fs_vq *fsvq;
+ int i;
+
+ for (i = 0; i < fs->nvqs; i++) {
+ fsvq = &fs->vqs[i];
+ spin_lock(&fsvq->lock);
+ fsvq->connected = true;
+ spin_unlock(&fsvq->lock);
+ }
+}
+
+/* Add a new instance to the list or return -EEXIST if tag name exists*/
+static int virtio_fs_add_instance(struct virtio_fs *fs)
+{
+ struct virtio_fs *fs2;
+ bool duplicate = false;
+
+ mutex_lock(&virtio_fs_mutex);
+
+ list_for_each_entry(fs2, &virtio_fs_instances, list) {
+ if (strcmp(fs->tag, fs2->tag) == 0)
+ duplicate = true;
+ }
+
+ if (!duplicate)
+ list_add_tail(&fs->list, &virtio_fs_instances);
+
+ mutex_unlock(&virtio_fs_mutex);
+
+ if (duplicate)
+ return -EEXIST;
+ return 0;
+}
+
+/* Return the virtio_fs with a given tag, or NULL */
+static struct virtio_fs *virtio_fs_find_instance(const char *tag)
+{
+ struct virtio_fs *fs;
+
+ mutex_lock(&virtio_fs_mutex);
+
+ list_for_each_entry(fs, &virtio_fs_instances, list) {
+ if (strcmp(fs->tag, tag) == 0) {
+ kref_get(&fs->refcount);
+ goto found;
+ }
+ }
+
+ fs = NULL; /* not found */
+
+found:
+ mutex_unlock(&virtio_fs_mutex);
+
+ return fs;
+}
+
+static void virtio_fs_free_devs(struct virtio_fs *fs)
+{
+ unsigned int i;
+
+ for (i = 0; i < fs->nvqs; i++) {
+ struct virtio_fs_vq *fsvq = &fs->vqs[i];
+
+ if (!fsvq->fud)
+ continue;
+
+ fuse_dev_free(fsvq->fud);
+ fsvq->fud = NULL;
+ }
+}
+
+/* Read filesystem name from virtio config into fs->tag (must kfree()). */
+static int virtio_fs_read_tag(struct virtio_device *vdev, struct virtio_fs *fs)
+{
+ char tag_buf[sizeof_field(struct virtio_fs_config, tag)];
+ char *end;
+ size_t len;
+
+ virtio_cread_bytes(vdev, offsetof(struct virtio_fs_config, tag),
+ &tag_buf, sizeof(tag_buf));
+ end = memchr(tag_buf, '\0', sizeof(tag_buf));
+ if (end == tag_buf)
+ return -EINVAL; /* empty tag */
+ if (!end)
+ end = &tag_buf[sizeof(tag_buf)];
+
+ len = end - tag_buf;
+ fs->tag = devm_kmalloc(&vdev->dev, len + 1, GFP_KERNEL);
+ if (!fs->tag)
+ return -ENOMEM;
+ memcpy(fs->tag, tag_buf, len);
+ fs->tag[len] = '\0';
+ return 0;
+}
+
+/* Work function for hiprio completion */
+static void virtio_fs_hiprio_done_work(struct work_struct *work)
+{
+ struct virtio_fs_vq *fsvq = container_of(work, struct virtio_fs_vq,
+ done_work);
+ struct virtqueue *vq = fsvq->vq;
+
+ /* Free completed FUSE_FORGET requests */
+ spin_lock(&fsvq->lock);
+ do {
+ unsigned int len;
+ void *req;
+
+ virtqueue_disable_cb(vq);
+
+ while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
+ kfree(req);
+ fsvq->in_flight--;
+ }
+ } while (!virtqueue_enable_cb(vq) && likely(!virtqueue_is_broken(vq)));
+ spin_unlock(&fsvq->lock);
+}
+
+static void virtio_fs_dummy_dispatch_work(struct work_struct *work)
+{
+}
+
+static void virtio_fs_hiprio_dispatch_work(struct work_struct *work)
+{
+ struct virtio_fs_forget *forget;
+ struct virtio_fs_vq *fsvq = container_of(work, struct virtio_fs_vq,
+ dispatch_work.work);
+ struct virtqueue *vq = fsvq->vq;
+ struct scatterlist sg;
+ struct scatterlist *sgs[] = {&sg};
+ bool notify;
+ int ret;
+
+ pr_debug("virtio-fs: worker %s called.\n", __func__);
+ while (1) {
+ spin_lock(&fsvq->lock);
+ forget = list_first_entry_or_null(&fsvq->queued_reqs,
+ struct virtio_fs_forget, list);
+ if (!forget) {
+ spin_unlock(&fsvq->lock);
+ return;
+ }
+
+ list_del(&forget->list);
+ if (!fsvq->connected) {
+ spin_unlock(&fsvq->lock);
+ kfree(forget);
+ continue;
+ }
+
+ sg_init_one(&sg, forget, sizeof(*forget));
+
+ /* Enqueue the request */
+ dev_dbg(&vq->vdev->dev, "%s\n", __func__);
+ ret = virtqueue_add_sgs(vq, sgs, 1, 0, forget, GFP_ATOMIC);
+ if (ret < 0) {
+ if (ret == -ENOMEM || ret == -ENOSPC) {
+ pr_debug("virtio-fs: Could not queue FORGET: err=%d. Will try later\n",
+ ret);
+ list_add_tail(&forget->list,
+ &fsvq->queued_reqs);
+ schedule_delayed_work(&fsvq->dispatch_work,
+ msecs_to_jiffies(1));
+ } else {
+ pr_debug("virtio-fs: Could not queue FORGET: err=%d. Dropping it.\n",
+ ret);
+ kfree(forget);
+ }
+ spin_unlock(&fsvq->lock);
+ return;
+ }
+
+ fsvq->in_flight++;
+ notify = virtqueue_kick_prepare(vq);
+ spin_unlock(&fsvq->lock);
+
+ if (notify)
+ virtqueue_notify(vq);
+ pr_debug("virtio-fs: worker %s dispatched one forget request.\n",
+ __func__);
+ }
+}
+
+/* Allocate and copy args into req->argbuf */
+static int copy_args_to_argbuf(struct fuse_req *req)
+{
+ struct fuse_args *args = req->args;
+ unsigned int offset = 0;
+ unsigned int num_in;
+ unsigned int num_out;
+ unsigned int len;
+ unsigned int i;
+
+ num_in = args->in_numargs - args->in_pages;
+ num_out = args->out_numargs - args->out_pages;
+ len = fuse_len_args(num_in, (struct fuse_arg *) args->in_args) +
+ fuse_len_args(num_out, args->out_args);
+
+ req->argbuf = kmalloc(len, GFP_ATOMIC);
+ if (!req->argbuf)
+ return -ENOMEM;
+
+ for (i = 0; i < num_in; i++) {
+ memcpy(req->argbuf + offset,
+ args->in_args[i].value,
+ args->in_args[i].size);
+ offset += args->in_args[i].size;
+ }
+
+ return 0;
+}
+
+/* Copy args out of and free req->argbuf */
+static void copy_args_from_argbuf(struct fuse_args *args, struct fuse_req *req)
+{
+ unsigned int remaining;
+ unsigned int offset;
+ unsigned int num_in;
+ unsigned int num_out;
+ unsigned int i;
+
+ remaining = req->out.h.len - sizeof(req->out.h);
+ num_in = args->in_numargs - args->in_pages;
+ num_out = args->out_numargs - args->out_pages;
+ offset = fuse_len_args(num_in, (struct fuse_arg *)args->in_args);
+
+ for (i = 0; i < num_out; i++) {
+ unsigned int argsize = args->out_args[i].size;
+
+ if (args->out_argvar &&
+ i == args->out_numargs - 1 &&
+ argsize > remaining) {
+ argsize = remaining;
+ }
+
+ memcpy(args->out_args[i].value, req->argbuf + offset, argsize);
+ offset += argsize;
+
+ if (i != args->out_numargs - 1)
+ remaining -= argsize;
+ }
+
+ /* Store the actual size of the variable-length arg */
+ if (args->out_argvar)
+ args->out_args[args->out_numargs - 1].size = remaining;
+
+ kfree(req->argbuf);
+ req->argbuf = NULL;
+}
+
+/* Work function for request completion */
+static void virtio_fs_requests_done_work(struct work_struct *work)
+{
+ struct virtio_fs_vq *fsvq = container_of(work, struct virtio_fs_vq,
+ done_work);
+ struct fuse_pqueue *fpq = &fsvq->fud->pq;
+ struct fuse_conn *fc = fsvq->fud->fc;
+ struct virtqueue *vq = fsvq->vq;
+ struct fuse_req *req;
+ struct fuse_args_pages *ap;
+ struct fuse_req *next;
+ struct fuse_args *args;
+ unsigned int len, i, thislen;
+ struct page *page;
+ LIST_HEAD(reqs);
+
+ /* Collect completed requests off the virtqueue */
+ spin_lock(&fsvq->lock);
+ do {
+ virtqueue_disable_cb(vq);
+
+ while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
+ spin_lock(&fpq->lock);
+ list_move_tail(&req->list, &reqs);
+ spin_unlock(&fpq->lock);
+ }
+ } while (!virtqueue_enable_cb(vq) && likely(!virtqueue_is_broken(vq)));
+ spin_unlock(&fsvq->lock);
+
+ /* End requests */
+ list_for_each_entry_safe(req, next, &reqs, list) {
+ /*
+ * TODO verify that server properly follows FUSE protocol
+ * (oh.uniq, oh.len)
+ */
+ args = req->args;
+ copy_args_from_argbuf(args, req);
+
+ if (args->out_pages && args->page_zeroing) {
+ len = args->out_args[args->out_numargs - 1].size;
+ ap = container_of(args, typeof(*ap), args);
+ for (i = 0; i < ap->num_pages; i++) {
+ thislen = ap->descs[i].length;
+ if (len < thislen) {
+ WARN_ON(ap->descs[i].offset);
+ page = ap->pages[i];
+ zero_user_segment(page, len, thislen);
+ len = 0;
+ } else {
+ len -= thislen;
+ }
+ }
+ }
+
+ spin_lock(&fpq->lock);
+ clear_bit(FR_SENT, &req->flags);
+ list_del_init(&req->list);
+ spin_unlock(&fpq->lock);
+
+ fuse_request_end(fc, req);
+ spin_lock(&fsvq->lock);
+ fsvq->in_flight--;
+ spin_unlock(&fsvq->lock);
+ }
+}
+
+/* Virtqueue interrupt handler */
+static void virtio_fs_vq_done(struct virtqueue *vq)
+{
+ struct virtio_fs_vq *fsvq = vq_to_fsvq(vq);
+
+ dev_dbg(&vq->vdev->dev, "%s %s\n", __func__, fsvq->name);
+
+ schedule_work(&fsvq->done_work);
+}
+
+/* Initialize virtqueues */
+static int virtio_fs_setup_vqs(struct virtio_device *vdev,
+ struct virtio_fs *fs)
+{
+ struct virtqueue **vqs;
+ vq_callback_t **callbacks;
+ const char **names;
+ unsigned int i;
+ int ret = 0;
+
+ virtio_cread(vdev, struct virtio_fs_config, num_request_queues,
+ &fs->num_request_queues);
+ if (fs->num_request_queues == 0)
+ return -EINVAL;
+
+ fs->nvqs = 1 + fs->num_request_queues;
+ fs->vqs = kcalloc(fs->nvqs, sizeof(fs->vqs[VQ_HIPRIO]), GFP_KERNEL);
+ if (!fs->vqs)
+ return -ENOMEM;
+
+ vqs = kmalloc_array(fs->nvqs, sizeof(vqs[VQ_HIPRIO]), GFP_KERNEL);
+ callbacks = kmalloc_array(fs->nvqs, sizeof(callbacks[VQ_HIPRIO]),
+ GFP_KERNEL);
+ names = kmalloc_array(fs->nvqs, sizeof(names[VQ_HIPRIO]), GFP_KERNEL);
+ if (!vqs || !callbacks || !names) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ callbacks[VQ_HIPRIO] = virtio_fs_vq_done;
+ snprintf(fs->vqs[VQ_HIPRIO].name, sizeof(fs->vqs[VQ_HIPRIO].name),
+ "hiprio");
+ names[VQ_HIPRIO] = fs->vqs[VQ_HIPRIO].name;
+ INIT_WORK(&fs->vqs[VQ_HIPRIO].done_work, virtio_fs_hiprio_done_work);
+ INIT_LIST_HEAD(&fs->vqs[VQ_HIPRIO].queued_reqs);
+ INIT_DELAYED_WORK(&fs->vqs[VQ_HIPRIO].dispatch_work,
+ virtio_fs_hiprio_dispatch_work);
+ spin_lock_init(&fs->vqs[VQ_HIPRIO].lock);
+
+ /* Initialize the requests virtqueues */
+ for (i = VQ_REQUEST; i < fs->nvqs; i++) {
+ spin_lock_init(&fs->vqs[i].lock);
+ INIT_WORK(&fs->vqs[i].done_work, virtio_fs_requests_done_work);
+ INIT_DELAYED_WORK(&fs->vqs[i].dispatch_work,
+ virtio_fs_dummy_dispatch_work);
+ INIT_LIST_HEAD(&fs->vqs[i].queued_reqs);
+ snprintf(fs->vqs[i].name, sizeof(fs->vqs[i].name),
+ "requests.%u", i - VQ_REQUEST);
+ callbacks[i] = virtio_fs_vq_done;
+ names[i] = fs->vqs[i].name;
+ }
+
+ ret = virtio_find_vqs(vdev, fs->nvqs, vqs, callbacks, names, NULL);
+ if (ret < 0)
+ goto out;
+
+ for (i = 0; i < fs->nvqs; i++)
+ fs->vqs[i].vq = vqs[i];
+
+ virtio_fs_start_all_queues(fs);
+out:
+ kfree(names);
+ kfree(callbacks);
+ kfree(vqs);
+ if (ret)
+ kfree(fs->vqs);
+ return ret;
+}
+
+/* Free virtqueues (device must already be reset) */
+static void virtio_fs_cleanup_vqs(struct virtio_device *vdev,
+ struct virtio_fs *fs)
+{
+ vdev->config->del_vqs(vdev);
+}
+
+static int virtio_fs_probe(struct virtio_device *vdev)
+{
+ struct virtio_fs *fs;
+ int ret;
+
+ fs = kzalloc(sizeof(*fs), GFP_KERNEL);
+ if (!fs)
+ return -ENOMEM;
+ kref_init(&fs->refcount);
+ vdev->priv = fs;
+
+ ret = virtio_fs_read_tag(vdev, fs);
+ if (ret < 0)
+ goto out;
+
+ ret = virtio_fs_setup_vqs(vdev, fs);
+ if (ret < 0)
+ goto out;
+
+ /* TODO vq affinity */
+
+ /* Bring the device online in case the filesystem is mounted and
+ * requests need to be sent before we return.
+ */
+ virtio_device_ready(vdev);
+
+ ret = virtio_fs_add_instance(fs);
+ if (ret < 0)
+ goto out_vqs;
+
+ return 0;
+
+out_vqs:
+ vdev->config->reset(vdev);
+ virtio_fs_cleanup_vqs(vdev, fs);
+
+out:
+ vdev->priv = NULL;
+ kfree(fs);
+ return ret;
+}
+
+static void virtio_fs_stop_all_queues(struct virtio_fs *fs)
+{
+ struct virtio_fs_vq *fsvq;
+ int i;
+
+ for (i = 0; i < fs->nvqs; i++) {
+ fsvq = &fs->vqs[i];
+ spin_lock(&fsvq->lock);
+ fsvq->connected = false;
+ spin_unlock(&fsvq->lock);
+ }
+}
+
+static void virtio_fs_remove(struct virtio_device *vdev)
+{
+ struct virtio_fs *fs = vdev->priv;
+
+ mutex_lock(&virtio_fs_mutex);
+ /* This device is going away. No one should get new reference */
+ list_del_init(&fs->list);
+ virtio_fs_stop_all_queues(fs);
+ virtio_fs_drain_all_queues(fs);
+ vdev->config->reset(vdev);
+ virtio_fs_cleanup_vqs(vdev, fs);
+
+ vdev->priv = NULL;
+ /* Put device reference on virtio_fs object */
+ virtio_fs_put(fs);
+ mutex_unlock(&virtio_fs_mutex);
+}
+
+#ifdef CONFIG_PM_SLEEP
+static int virtio_fs_freeze(struct virtio_device *vdev)
+{
+ /* TODO need to save state here */
+ pr_warn("virtio-fs: suspend/resume not yet supported\n");
+ return -EOPNOTSUPP;
+}
+
+static int virtio_fs_restore(struct virtio_device *vdev)
+{
+ /* TODO need to restore state here */
+ return 0;
+}
+#endif /* CONFIG_PM_SLEEP */
+
+const static struct virtio_device_id id_table[] = {
+ { VIRTIO_ID_FS, VIRTIO_DEV_ANY_ID },
+ {},
+};
+
+const static unsigned int feature_table[] = {};
+
+static struct virtio_driver virtio_fs_driver = {
+ .driver.name = KBUILD_MODNAME,
+ .driver.owner = THIS_MODULE,
+ .id_table = id_table,
+ .feature_table = feature_table,
+ .feature_table_size = ARRAY_SIZE(feature_table),
+ .probe = virtio_fs_probe,
+ .remove = virtio_fs_remove,
+#ifdef CONFIG_PM_SLEEP
+ .freeze = virtio_fs_freeze,
+ .restore = virtio_fs_restore,
+#endif
+};
+
+static void virtio_fs_wake_forget_and_unlock(struct fuse_iqueue *fiq)
+__releases(fiq->lock)
+{
+ struct fuse_forget_link *link;
+ struct virtio_fs_forget *forget;
+ struct scatterlist sg;
+ struct scatterlist *sgs[] = {&sg};
+ struct virtio_fs *fs;
+ struct virtqueue *vq;
+ struct virtio_fs_vq *fsvq;
+ bool notify;
+ u64 unique;
+ int ret;
+
+ link = fuse_dequeue_forget(fiq, 1, NULL);
+ unique = fuse_get_unique(fiq);
+
+ fs = fiq->priv;
+ fsvq = &fs->vqs[VQ_HIPRIO];
+ spin_unlock(&fiq->lock);
+
+ /* Allocate a buffer for the request */
+ forget = kmalloc(sizeof(*forget), GFP_NOFS | __GFP_NOFAIL);
+
+ forget->ih = (struct fuse_in_header){
+ .opcode = FUSE_FORGET,
+ .nodeid = link->forget_one.nodeid,
+ .unique = unique,
+ .len = sizeof(*forget),
+ };
+ forget->arg = (struct fuse_forget_in){
+ .nlookup = link->forget_one.nlookup,
+ };
+
+ sg_init_one(&sg, forget, sizeof(*forget));
+
+ /* Enqueue the request */
+ spin_lock(&fsvq->lock);
+
+ if (!fsvq->connected) {
+ kfree(forget);
+ spin_unlock(&fsvq->lock);
+ goto out;
+ }
+
+ vq = fsvq->vq;
+ dev_dbg(&vq->vdev->dev, "%s\n", __func__);
+
+ ret = virtqueue_add_sgs(vq, sgs, 1, 0, forget, GFP_ATOMIC);
+ if (ret < 0) {
+ if (ret == -ENOMEM || ret == -ENOSPC) {
+ pr_debug("virtio-fs: Could not queue FORGET: err=%d. Will try later.\n",
+ ret);
+ list_add_tail(&forget->list, &fsvq->queued_reqs);
+ schedule_delayed_work(&fsvq->dispatch_work,
+ msecs_to_jiffies(1));
+ } else {
+ pr_debug("virtio-fs: Could not queue FORGET: err=%d. Dropping it.\n",
+ ret);
+ kfree(forget);
+ }
+ spin_unlock(&fsvq->lock);
+ goto out;
+ }
+
+ fsvq->in_flight++;
+ notify = virtqueue_kick_prepare(vq);
+
+ spin_unlock(&fsvq->lock);
+
+ if (notify)
+ virtqueue_notify(vq);
+out:
+ kfree(link);
+}
+
+static void virtio_fs_wake_interrupt_and_unlock(struct fuse_iqueue *fiq)
+__releases(fiq->lock)
+{
+ /*
+ * TODO interrupts.
+ *
+ * Normal fs operations on a local filesystems aren't interruptible.
+ * Exceptions are blocking lock operations; for example fcntl(F_SETLKW)
+ * with shared lock between host and guest.
+ */
+ spin_unlock(&fiq->lock);
+}
+
+/* Return the number of scatter-gather list elements required */
+static unsigned int sg_count_fuse_req(struct fuse_req *req)
+{
+ struct fuse_args *args = req->args;
+ struct fuse_args_pages *ap = container_of(args, typeof(*ap), args);
+ unsigned int total_sgs = 1 /* fuse_in_header */;
+
+ if (args->in_numargs - args->in_pages)
+ total_sgs += 1;
+
+ if (args->in_pages)
+ total_sgs += ap->num_pages;
+
+ if (!test_bit(FR_ISREPLY, &req->flags))
+ return total_sgs;
+
+ total_sgs += 1 /* fuse_out_header */;
+
+ if (args->out_numargs - args->out_pages)
+ total_sgs += 1;
+
+ if (args->out_pages)
+ total_sgs += ap->num_pages;
+
+ return total_sgs;
+}
+
+/* Add pages to scatter-gather list and return number of elements used */
+static unsigned int sg_init_fuse_pages(struct scatterlist *sg,
+ struct page **pages,
+ struct fuse_page_desc *page_descs,
+ unsigned int num_pages,
+ unsigned int total_len)
+{
+ unsigned int i;
+ unsigned int this_len;
+
+ for (i = 0; i < num_pages && total_len; i++) {
+ sg_init_table(&sg[i], 1);
+ this_len = min(page_descs[i].length, total_len);
+ sg_set_page(&sg[i], pages[i], this_len, page_descs[i].offset);
+ total_len -= this_len;
+ }
+
+ return i;
+}
+
+/* Add args to scatter-gather list and return number of elements used */
+static unsigned int sg_init_fuse_args(struct scatterlist *sg,
+ struct fuse_req *req,
+ struct fuse_arg *args,
+ unsigned int numargs,
+ bool argpages,
+ void *argbuf,
+ unsigned int *len_used)
+{
+ struct fuse_args_pages *ap = container_of(req->args, typeof(*ap), args);
+ unsigned int total_sgs = 0;
+ unsigned int len;
+
+ len = fuse_len_args(numargs - argpages, args);
+ if (len)
+ sg_init_one(&sg[total_sgs++], argbuf, len);
+
+ if (argpages)
+ total_sgs += sg_init_fuse_pages(&sg[total_sgs],
+ ap->pages, ap->descs,
+ ap->num_pages,
+ args[numargs - 1].size);
+
+ if (len_used)
+ *len_used = len;
+
+ return total_sgs;
+}
+
+/* Add a request to a virtqueue and kick the device */
+static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
+ struct fuse_req *req)
+{
+ /* requests need at least 4 elements */
+ struct scatterlist *stack_sgs[6];
+ struct scatterlist stack_sg[ARRAY_SIZE(stack_sgs)];
+ struct scatterlist **sgs = stack_sgs;
+ struct scatterlist *sg = stack_sg;
+ struct virtqueue *vq;
+ struct fuse_args *args = req->args;
+ unsigned int argbuf_used = 0;
+ unsigned int out_sgs = 0;
+ unsigned int in_sgs = 0;
+ unsigned int total_sgs;
+ unsigned int i;
+ int ret;
+ bool notify;
+
+ /* Does the sglist fit on the stack? */
+ total_sgs = sg_count_fuse_req(req);
+ if (total_sgs > ARRAY_SIZE(stack_sgs)) {
+ sgs = kmalloc_array(total_sgs, sizeof(sgs[0]), GFP_ATOMIC);
+ sg = kmalloc_array(total_sgs, sizeof(sg[0]), GFP_ATOMIC);
+ if (!sgs || !sg) {
+ ret = -ENOMEM;
+ goto out;
+ }
+ }
+
+ /* Use a bounce buffer since stack args cannot be mapped */
+ ret = copy_args_to_argbuf(req);
+ if (ret < 0)
+ goto out;
+
+ /* Request elements */
+ sg_init_one(&sg[out_sgs++], &req->in.h, sizeof(req->in.h));
+ out_sgs += sg_init_fuse_args(&sg[out_sgs], req,
+ (struct fuse_arg *)args->in_args,
+ args->in_numargs, args->in_pages,
+ req->argbuf, &argbuf_used);
+
+ /* Reply elements */
+ if (test_bit(FR_ISREPLY, &req->flags)) {
+ sg_init_one(&sg[out_sgs + in_sgs++],
+ &req->out.h, sizeof(req->out.h));
+ in_sgs += sg_init_fuse_args(&sg[out_sgs + in_sgs], req,
+ args->out_args, args->out_numargs,
+ args->out_pages,
+ req->argbuf + argbuf_used, NULL);
+ }
+
+ WARN_ON(out_sgs + in_sgs != total_sgs);
+
+ for (i = 0; i < total_sgs; i++)
+ sgs[i] = &sg[i];
+
+ spin_lock(&fsvq->lock);
+
+ if (!fsvq->connected) {
+ spin_unlock(&fsvq->lock);
+ ret = -ENOTCONN;
+ goto out;
+ }
+
+ vq = fsvq->vq;
+ ret = virtqueue_add_sgs(vq, sgs, out_sgs, in_sgs, req, GFP_ATOMIC);
+ if (ret < 0) {
+ spin_unlock(&fsvq->lock);
+ goto out;
+ }
+
+ fsvq->in_flight++;
+ notify = virtqueue_kick_prepare(vq);
+
+ spin_unlock(&fsvq->lock);
+
+ if (notify)
+ virtqueue_notify(vq);
+
+out:
+ if (ret < 0 && req->argbuf) {
+ kfree(req->argbuf);
+ req->argbuf = NULL;
+ }
+ if (sgs != stack_sgs) {
+ kfree(sgs);
+ kfree(sg);
+ }
+
+ return ret;
+}
+
+static void virtio_fs_wake_pending_and_unlock(struct fuse_iqueue *fiq)
+__releases(fiq->lock)
+{
+ unsigned int queue_id = VQ_REQUEST; /* TODO multiqueue */
+ struct virtio_fs *fs;
+ struct fuse_conn *fc;
+ struct fuse_req *req;
+ struct fuse_pqueue *fpq;
+ int ret;
+
+ WARN_ON(list_empty(&fiq->pending));
+ req = list_last_entry(&fiq->pending, struct fuse_req, list);
+ clear_bit(FR_PENDING, &req->flags);
+ list_del_init(&req->list);
+ WARN_ON(!list_empty(&fiq->pending));
+ spin_unlock(&fiq->lock);
+
+ fs = fiq->priv;
+ fc = fs->vqs[queue_id].fud->fc;
+
+ pr_debug("%s: opcode %u unique %#llx nodeid %#llx in.len %u out.len %u\n",
+ __func__, req->in.h.opcode, req->in.h.unique,
+ req->in.h.nodeid, req->in.h.len,
+ fuse_len_args(req->args->out_numargs, req->args->out_args));
+
+ fpq = &fs->vqs[queue_id].fud->pq;
+ spin_lock(&fpq->lock);
+ if (!fpq->connected) {
+ spin_unlock(&fpq->lock);
+ req->out.h.error = -ENODEV;
+ pr_err("virtio-fs: %s disconnected\n", __func__);
+ fuse_request_end(fc, req);
+ return;
+ }
+ list_add_tail(&req->list, fpq->processing);
+ spin_unlock(&fpq->lock);
+ set_bit(FR_SENT, &req->flags);
+ /* matches barrier in request_wait_answer() */
+ smp_mb__after_atomic();
+
+retry:
+ ret = virtio_fs_enqueue_req(&fs->vqs[queue_id], req);
+ if (ret < 0) {
+ if (ret == -ENOMEM || ret == -ENOSPC) {
+ /* Virtqueue full. Retry submission */
+ /* TODO use completion instead of timeout */
+ usleep_range(20, 30);
+ goto retry;
+ }
+ req->out.h.error = ret;
+ pr_err("virtio-fs: virtio_fs_enqueue_req() failed %d\n", ret);
+ spin_lock(&fpq->lock);
+ clear_bit(FR_SENT, &req->flags);
+ list_del_init(&req->list);
+ spin_unlock(&fpq->lock);
+ fuse_request_end(fc, req);
+ return;
+ }
+}
+
+const static struct fuse_iqueue_ops virtio_fs_fiq_ops = {
+ .wake_forget_and_unlock = virtio_fs_wake_forget_and_unlock,
+ .wake_interrupt_and_unlock = virtio_fs_wake_interrupt_and_unlock,
+ .wake_pending_and_unlock = virtio_fs_wake_pending_and_unlock,
+ .release = virtio_fs_fiq_release,
+};
+
+static int virtio_fs_fill_super(struct super_block *sb)
+{
+ struct fuse_conn *fc = get_fuse_conn_super(sb);
+ struct virtio_fs *fs = fc->iq.priv;
+ unsigned int i;
+ int err;
+ struct fuse_fs_context ctx = {
+ .rootmode = S_IFDIR,
+ .default_permissions = 1,
+ .allow_other = 1,
+ .max_read = UINT_MAX,
+ .blksize = 512,
+ .destroy = true,
+ .no_control = true,
+ .no_force_umount = true,
+ };
+
+ mutex_lock(&virtio_fs_mutex);
+
+ /* After holding mutex, make sure virtiofs device is still there.
+ * Though we are holding a reference to it, drive ->remove might
+ * still have cleaned up virtual queues. In that case bail out.
+ */
+ err = -EINVAL;
+ if (list_empty(&fs->list)) {
+ pr_info("virtio-fs: tag <%s> not found\n", fs->tag);
+ goto err;
+ }
+
+ err = -ENOMEM;
+ /* Allocate fuse_dev for hiprio and notification queues */
+ for (i = 0; i < VQ_REQUEST; i++) {
+ struct virtio_fs_vq *fsvq = &fs->vqs[i];
+
+ fsvq->fud = fuse_dev_alloc();
+ if (!fsvq->fud)
+ goto err_free_fuse_devs;
+ }
+
+ ctx.fudptr = (void **)&fs->vqs[VQ_REQUEST].fud;
+ err = fuse_fill_super_common(sb, &ctx);
+ if (err < 0)
+ goto err_free_fuse_devs;
+
+ fc = fs->vqs[VQ_REQUEST].fud->fc;
+
+ for (i = 0; i < fs->nvqs; i++) {
+ struct virtio_fs_vq *fsvq = &fs->vqs[i];
+
+ if (i == VQ_REQUEST)
+ continue; /* already initialized */
+ fuse_dev_install(fsvq->fud, fc);
+ }
+
+ /* Previous unmount will stop all queues. Start these again */
+ virtio_fs_start_all_queues(fs);
+ fuse_send_init(fc);
+ mutex_unlock(&virtio_fs_mutex);
+ return 0;
+
+err_free_fuse_devs:
+ virtio_fs_free_devs(fs);
+err:
+ mutex_unlock(&virtio_fs_mutex);
+ return err;
+}
+
+static void virtio_kill_sb(struct super_block *sb)
+{
+ struct fuse_conn *fc = get_fuse_conn_super(sb);
+ struct virtio_fs *vfs;
+ struct virtio_fs_vq *fsvq;
+
+ /* If mount failed, we can still be called without any fc */
+ if (!fc)
+ return fuse_kill_sb_anon(sb);
+
+ vfs = fc->iq.priv;
+ fsvq = &vfs->vqs[VQ_HIPRIO];
+
+ /* Stop forget queue. Soon destroy will be sent */
+ spin_lock(&fsvq->lock);
+ fsvq->connected = false;
+ spin_unlock(&fsvq->lock);
+ virtio_fs_drain_all_queues(vfs);
+
+ fuse_kill_sb_anon(sb);
+
+ /* fuse_kill_sb_anon() must have sent destroy. Stop all queues
+ * and drain one more time and free fuse devices. Freeing fuse
+ * devices will drop their reference on fuse_conn and that in
+ * turn will drop its reference on virtio_fs object.
+ */
+ virtio_fs_stop_all_queues(vfs);
+ virtio_fs_drain_all_queues(vfs);
+ virtio_fs_free_devs(vfs);
+}
+
+static int virtio_fs_test_super(struct super_block *sb,
+ struct fs_context *fsc)
+{
+ struct fuse_conn *fc = fsc->s_fs_info;
+
+ return fc->iq.priv == get_fuse_conn_super(sb)->iq.priv;
+}
+
+static int virtio_fs_set_super(struct super_block *sb,
+ struct fs_context *fsc)
+{
+ int err;
+
+ err = get_anon_bdev(&sb->s_dev);
+ if (!err)
+ fuse_conn_get(fsc->s_fs_info);
+
+ return err;
+}
+
+static int virtio_fs_get_tree(struct fs_context *fsc)
+{
+ struct virtio_fs *fs;
+ struct super_block *sb;
+ struct fuse_conn *fc;
+ int err;
+
+ /* This gets a reference on virtio_fs object. This ptr gets installed
+ * in fc->iq->priv. Once fuse_conn is going away, it calls ->put()
+ * to drop the reference to this object.
+ */
+ fs = virtio_fs_find_instance(fsc->source);
+ if (!fs) {
+ pr_info("virtio-fs: tag <%s> not found\n", fsc->source);
+ return -EINVAL;
+ }
+
+ fc = kzalloc(sizeof(struct fuse_conn), GFP_KERNEL);
+ if (!fc) {
+ mutex_lock(&virtio_fs_mutex);
+ virtio_fs_put(fs);
+ mutex_unlock(&virtio_fs_mutex);
+ return -ENOMEM;
+ }
+
+ fuse_conn_init(fc, get_user_ns(current_user_ns()), &virtio_fs_fiq_ops,
+ fs);
+ fc->release = fuse_free_conn;
+ fc->delete_stale = true;
+
+ fsc->s_fs_info = fc;
+ sb = sget_fc(fsc, virtio_fs_test_super, virtio_fs_set_super);
+ fuse_conn_put(fc);
+ if (IS_ERR(sb))
+ return PTR_ERR(sb);
+
+ if (!sb->s_root) {
+ err = virtio_fs_fill_super(sb);
+ if (err) {
+ deactivate_locked_super(sb);
+ return err;
+ }
+
+ sb->s_flags |= SB_ACTIVE;
+ }
+
+ WARN_ON(fsc->root);
+ fsc->root = dget(sb->s_root);
+ return 0;
+}
+
+static const struct fs_context_operations virtio_fs_context_ops = {
+ .get_tree = virtio_fs_get_tree,
+};
+
+static int virtio_fs_init_fs_context(struct fs_context *fsc)
+{
+ fsc->ops = &virtio_fs_context_ops;
+ return 0;
+}
+
+static struct file_system_type virtio_fs_type = {
+ .owner = THIS_MODULE,
+ .name = "virtiofs",
+ .init_fs_context = virtio_fs_init_fs_context,
+ .kill_sb = virtio_kill_sb,
+};
+
+static int __init virtio_fs_init(void)
+{
+ int ret;
+
+ ret = register_virtio_driver(&virtio_fs_driver);
+ if (ret < 0)
+ return ret;
+
+ ret = register_filesystem(&virtio_fs_type);
+ if (ret < 0) {
+ unregister_virtio_driver(&virtio_fs_driver);
+ return ret;
+ }
+
+ return 0;
+}
+module_init(virtio_fs_init);
+
+static void __exit virtio_fs_exit(void)
+{
+ unregister_filesystem(&virtio_fs_type);
+ unregister_virtio_driver(&virtio_fs_driver);
+}
+module_exit(virtio_fs_exit);
+
+MODULE_AUTHOR("Stefan Hajnoczi <stefanha@redhat.com>");
+MODULE_DESCRIPTION("Virtio Filesystem");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_FS(KBUILD_MODNAME);
+MODULE_DEVICE_TABLE(virtio, id_table);
unsigned count, req_dist, tail_index;
struct io_ring_ctx *ctx = req->ctx;
struct list_head *entry;
- struct timespec ts;
+ struct timespec64 ts;
if (unlikely(ctx->flags & IORING_SETUP_IOPOLL))
return -EINVAL;
if (sqe->flags || sqe->ioprio || sqe->buf_index || sqe->timeout_flags ||
sqe->len != 1)
return -EINVAL;
- if (copy_from_user(&ts, (void __user *) (unsigned long) sqe->addr,
- sizeof(ts)))
+
+ if (get_timespec64(&ts, u64_to_user_ptr(sqe->addr)))
return -EFAULT;
/*
hrtimer_init(&req->timeout.timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
req->timeout.timer.function = io_timeout_fn;
- hrtimer_start(&req->timeout.timer, timespec_to_ktime(ts),
+ hrtimer_start(&req->timeout.timer, timespec64_to_ktime(ts),
HRTIMER_MODE_REL);
return 0;
}
return submit;
}
+struct io_wait_queue {
+ struct wait_queue_entry wq;
+ struct io_ring_ctx *ctx;
+ unsigned to_wait;
+ unsigned nr_timeouts;
+};
+
+static inline bool io_should_wake(struct io_wait_queue *iowq)
+{
+ struct io_ring_ctx *ctx = iowq->ctx;
+
+ /*
+ * Wake up if we have enough events, or if a timeout occured since we
+ * started waiting. For timeouts, we always want to return to userspace,
+ * regardless of event count.
+ */
+ return io_cqring_events(ctx->rings) >= iowq->to_wait ||
+ atomic_read(&ctx->cq_timeouts) != iowq->nr_timeouts;
+}
+
+static int io_wake_function(struct wait_queue_entry *curr, unsigned int mode,
+ int wake_flags, void *key)
+{
+ struct io_wait_queue *iowq = container_of(curr, struct io_wait_queue,
+ wq);
+
+ if (!io_should_wake(iowq))
+ return -1;
+
+ return autoremove_wake_function(curr, mode, wake_flags, key);
+}
+
/*
* Wait until events become available, if we don't already have some. The
* application must reap them itself, as they reside on the shared cq ring.
static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
const sigset_t __user *sig, size_t sigsz)
{
+ struct io_wait_queue iowq = {
+ .wq = {
+ .private = current,
+ .func = io_wake_function,
+ .entry = LIST_HEAD_INIT(iowq.wq.entry),
+ },
+ .ctx = ctx,
+ .to_wait = min_events,
+ };
struct io_rings *rings = ctx->rings;
- unsigned nr_timeouts;
int ret;
if (io_cqring_events(rings) >= min_events)
return ret;
}
- nr_timeouts = atomic_read(&ctx->cq_timeouts);
- /*
- * Return if we have enough events, or if a timeout occured since
- * we started waiting. For timeouts, we always want to return to
- * userspace.
- */
- ret = wait_event_interruptible(ctx->wait,
- io_cqring_events(rings) >= min_events ||
- atomic_read(&ctx->cq_timeouts) != nr_timeouts);
+ ret = 0;
+ iowq.nr_timeouts = atomic_read(&ctx->cq_timeouts);
+ do {
+ prepare_to_wait_exclusive(&ctx->wait, &iowq.wq,
+ TASK_INTERRUPTIBLE);
+ if (io_should_wake(&iowq))
+ break;
+ schedule();
+ if (signal_pending(current)) {
+ ret = -ERESTARTSYS;
+ break;
+ }
+ } while (1);
+ finish_wait(&ctx->wait, &iowq.wq);
+
restore_saved_sigmask_unless(ret == -ERESTARTSYS);
if (ret == -ERESTARTSYS)
ret = -EINTR;
if (READ_ONCE(ctx->rings->sq.tail) - ctx->cached_sq_head !=
ctx->rings->sq_ring_entries)
mask |= EPOLLOUT | EPOLLWRNORM;
- if (READ_ONCE(ctx->rings->sq.head) != ctx->cached_cq_tail)
+ if (READ_ONCE(ctx->rings->cq.head) != ctx->cached_cq_tail)
mask |= EPOLLIN | EPOLLRDNORM;
return mask;
static DEFINE_PER_CPU(struct file_lock_list_struct, file_lock_list);
DEFINE_STATIC_PERCPU_RWSEM(file_rwsem);
+
/*
* The blocked_hash is used to find POSIX lock loops for deadlock detection.
* It is protected by blocked_lock_lock.
}
EXPORT_SYMBOL(generic_setlease);
+#if IS_ENABLED(CONFIG_SRCU)
+/*
+ * Kernel subsystems can register to be notified on any attempt to set
+ * a new lease with the lease_notifier_chain. This is used by (e.g.) nfsd
+ * to close files that it may have cached when there is an attempt to set a
+ * conflicting lease.
+ */
+static struct srcu_notifier_head lease_notifier_chain;
+
+static inline void
+lease_notifier_chain_init(void)
+{
+ srcu_init_notifier_head(&lease_notifier_chain);
+}
+
+static inline void
+setlease_notifier(long arg, struct file_lock *lease)
+{
+ if (arg != F_UNLCK)
+ srcu_notifier_call_chain(&lease_notifier_chain, arg, lease);
+}
+
+int lease_register_notifier(struct notifier_block *nb)
+{
+ return srcu_notifier_chain_register(&lease_notifier_chain, nb);
+}
+EXPORT_SYMBOL_GPL(lease_register_notifier);
+
+void lease_unregister_notifier(struct notifier_block *nb)
+{
+ srcu_notifier_chain_unregister(&lease_notifier_chain, nb);
+}
+EXPORT_SYMBOL_GPL(lease_unregister_notifier);
+
+#else /* !IS_ENABLED(CONFIG_SRCU) */
+static inline void
+lease_notifier_chain_init(void)
+{
+}
+
+static inline void
+setlease_notifier(long arg, struct file_lock *lease)
+{
+}
+
+int lease_register_notifier(struct notifier_block *nb)
+{
+ return 0;
+}
+EXPORT_SYMBOL_GPL(lease_register_notifier);
+
+void lease_unregister_notifier(struct notifier_block *nb)
+{
+}
+EXPORT_SYMBOL_GPL(lease_unregister_notifier);
+
+#endif /* IS_ENABLED(CONFIG_SRCU) */
+
/**
* vfs_setlease - sets a lease on an open file
* @filp: file pointer
int
vfs_setlease(struct file *filp, long arg, struct file_lock **lease, void **priv)
{
+ if (lease)
+ setlease_notifier(arg, *lease);
if (filp->f_op->setlease)
return filp->f_op->setlease(filp, arg, lease, priv);
else
INIT_HLIST_HEAD(&fll->hlist);
}
+ lease_notifier_chain_init();
return 0;
}
core_initcall(filelock_init);
tristate "NFS server support"
depends on INET
depends on FILE_LOCKING
+ depends on FSNOTIFY
select LOCKD
select SUNRPC
select EXPORTFS
config NFSD_FAULT_INJECTION
bool "NFS server manual fault injection"
- depends on NFSD_V4 && DEBUG_KERNEL && DEBUG_FS
+ depends on NFSD_V4 && DEBUG_KERNEL && DEBUG_FS && BROKEN
help
This option enables support for manually injecting faults
into the NFS server. This is intended to be used for
nfsd-y += trace.o
nfsd-y += nfssvc.o nfsctl.o nfsproc.o nfsfh.o vfs.o \
- export.o auth.o lockd.o nfscache.o nfsxdr.o stats.o
+ export.o auth.o lockd.o nfscache.o nfsxdr.o \
+ stats.o filecache.o
nfsd-$(CONFIG_NFSD_FAULT_INJECTION) += fault_inject.o
nfsd-$(CONFIG_NFSD_V2_ACL) += nfs2acl.o
nfsd-$(CONFIG_NFSD_V3) += nfs3proc.o nfs3xdr.o
struct svc_fh;
struct svc_rqst;
-/*
- * Maximum ACL we'll accept from a client; chosen (somewhat
- * arbitrarily) so that kmalloc'ing the ACL shouldn't require a
- * high-order allocation. This allows 204 ACEs on x86_64:
- */
-#define NFS4_ACL_MAX ((PAGE_SIZE - sizeof(struct nfs4_acl)) \
- / sizeof(struct nfs4_ace))
-
int nfs4_acl_bytes(int entries);
int nfs4_acl_get_whotype(char *, u32);
__be32 nfs4_acl_write_who(struct xdr_stream *xdr, int who);
#include "blocklayoutxdr.h"
#include "pnfs.h"
+#include "filecache.h"
#define NFSDDBG_FACILITY NFSDDBG_PNFS
nfsd4_scsi_fence_client(struct nfs4_layout_stateid *ls)
{
struct nfs4_client *clp = ls->ls_stid.sc_client;
- struct block_device *bdev = ls->ls_file->f_path.mnt->mnt_sb->s_bdev;
+ struct block_device *bdev = ls->ls_file->nf_file->f_path.mnt->mnt_sb->s_bdev;
bdev->bd_disk->fops->pr_ops->pr_preempt(bdev, NFSD_MDS_PR_KEY,
nfsd4_scsi_pr_key(clp), 0, true);
#include "nfsfh.h"
#include "netns.h"
#include "pnfs.h"
+#include "filecache.h"
#define NFSDDBG_FACILITY NFSDDBG_EXPORT
return NULL;
}
+static void expkey_flush(void)
+{
+ /*
+ * Take the nfsd_mutex here to ensure that the file cache is not
+ * destroyed while we're in the middle of flushing.
+ */
+ mutex_lock(&nfsd_mutex);
+ nfsd_file_cache_purge(current->nsproxy->net_ns);
+ mutex_unlock(&nfsd_mutex);
+}
+
static const struct cache_detail svc_expkey_cache_template = {
.owner = THIS_MODULE,
.hash_size = EXPKEY_HASHMAX,
.init = expkey_init,
.update = expkey_update,
.alloc = expkey_alloc,
+ .flush = expkey_flush,
};
static int
--- /dev/null
+/*
+ * Open file cache.
+ *
+ * (c) 2015 - Jeff Layton <jeff.layton@primarydata.com>
+ */
+
+#include <linux/hash.h>
+#include <linux/slab.h>
+#include <linux/file.h>
+#include <linux/sched.h>
+#include <linux/list_lru.h>
+#include <linux/fsnotify_backend.h>
+#include <linux/fsnotify.h>
+#include <linux/seq_file.h>
+
+#include "vfs.h"
+#include "nfsd.h"
+#include "nfsfh.h"
+#include "netns.h"
+#include "filecache.h"
+#include "trace.h"
+
+#define NFSDDBG_FACILITY NFSDDBG_FH
+
+/* FIXME: dynamically size this for the machine somehow? */
+#define NFSD_FILE_HASH_BITS 12
+#define NFSD_FILE_HASH_SIZE (1 << NFSD_FILE_HASH_BITS)
+#define NFSD_LAUNDRETTE_DELAY (2 * HZ)
+
+#define NFSD_FILE_LRU_RESCAN (0)
+#define NFSD_FILE_SHUTDOWN (1)
+#define NFSD_FILE_LRU_THRESHOLD (4096UL)
+#define NFSD_FILE_LRU_LIMIT (NFSD_FILE_LRU_THRESHOLD << 2)
+
+/* We only care about NFSD_MAY_READ/WRITE for this cache */
+#define NFSD_FILE_MAY_MASK (NFSD_MAY_READ|NFSD_MAY_WRITE)
+
+struct nfsd_fcache_bucket {
+ struct hlist_head nfb_head;
+ spinlock_t nfb_lock;
+ unsigned int nfb_count;
+ unsigned int nfb_maxcount;
+};
+
+static DEFINE_PER_CPU(unsigned long, nfsd_file_cache_hits);
+
+static struct kmem_cache *nfsd_file_slab;
+static struct kmem_cache *nfsd_file_mark_slab;
+static struct nfsd_fcache_bucket *nfsd_file_hashtbl;
+static struct list_lru nfsd_file_lru;
+static long nfsd_file_lru_flags;
+static struct fsnotify_group *nfsd_file_fsnotify_group;
+static atomic_long_t nfsd_filecache_count;
+static struct delayed_work nfsd_filecache_laundrette;
+
+enum nfsd_file_laundrette_ctl {
+ NFSD_FILE_LAUNDRETTE_NOFLUSH = 0,
+ NFSD_FILE_LAUNDRETTE_MAY_FLUSH
+};
+
+static void
+nfsd_file_schedule_laundrette(enum nfsd_file_laundrette_ctl ctl)
+{
+ long count = atomic_long_read(&nfsd_filecache_count);
+
+ if (count == 0 || test_bit(NFSD_FILE_SHUTDOWN, &nfsd_file_lru_flags))
+ return;
+
+ /* Be more aggressive about scanning if over the threshold */
+ if (count > NFSD_FILE_LRU_THRESHOLD)
+ mod_delayed_work(system_wq, &nfsd_filecache_laundrette, 0);
+ else
+ schedule_delayed_work(&nfsd_filecache_laundrette, NFSD_LAUNDRETTE_DELAY);
+
+ if (ctl == NFSD_FILE_LAUNDRETTE_NOFLUSH)
+ return;
+
+ /* ...and don't delay flushing if we're out of control */
+ if (count >= NFSD_FILE_LRU_LIMIT)
+ flush_delayed_work(&nfsd_filecache_laundrette);
+}
+
+static void
+nfsd_file_slab_free(struct rcu_head *rcu)
+{
+ struct nfsd_file *nf = container_of(rcu, struct nfsd_file, nf_rcu);
+
+ put_cred(nf->nf_cred);
+ kmem_cache_free(nfsd_file_slab, nf);
+}
+
+static void
+nfsd_file_mark_free(struct fsnotify_mark *mark)
+{
+ struct nfsd_file_mark *nfm = container_of(mark, struct nfsd_file_mark,
+ nfm_mark);
+
+ kmem_cache_free(nfsd_file_mark_slab, nfm);
+}
+
+static struct nfsd_file_mark *
+nfsd_file_mark_get(struct nfsd_file_mark *nfm)
+{
+ if (!atomic_inc_not_zero(&nfm->nfm_ref))
+ return NULL;
+ return nfm;
+}
+
+static void
+nfsd_file_mark_put(struct nfsd_file_mark *nfm)
+{
+ if (atomic_dec_and_test(&nfm->nfm_ref)) {
+
+ fsnotify_destroy_mark(&nfm->nfm_mark, nfsd_file_fsnotify_group);
+ fsnotify_put_mark(&nfm->nfm_mark);
+ }
+}
+
+static struct nfsd_file_mark *
+nfsd_file_mark_find_or_create(struct nfsd_file *nf)
+{
+ int err;
+ struct fsnotify_mark *mark;
+ struct nfsd_file_mark *nfm = NULL, *new;
+ struct inode *inode = nf->nf_inode;
+
+ do {
+ mutex_lock(&nfsd_file_fsnotify_group->mark_mutex);
+ mark = fsnotify_find_mark(&inode->i_fsnotify_marks,
+ nfsd_file_fsnotify_group);
+ if (mark) {
+ nfm = nfsd_file_mark_get(container_of(mark,
+ struct nfsd_file_mark,
+ nfm_mark));
+ mutex_unlock(&nfsd_file_fsnotify_group->mark_mutex);
+ fsnotify_put_mark(mark);
+ if (likely(nfm))
+ break;
+ } else
+ mutex_unlock(&nfsd_file_fsnotify_group->mark_mutex);
+
+ /* allocate a new nfm */
+ new = kmem_cache_alloc(nfsd_file_mark_slab, GFP_KERNEL);
+ if (!new)
+ return NULL;
+ fsnotify_init_mark(&new->nfm_mark, nfsd_file_fsnotify_group);
+ new->nfm_mark.mask = FS_ATTRIB|FS_DELETE_SELF;
+ atomic_set(&new->nfm_ref, 1);
+
+ err = fsnotify_add_inode_mark(&new->nfm_mark, inode, 0);
+
+ /*
+ * If the add was successful, then return the object.
+ * Otherwise, we need to put the reference we hold on the
+ * nfm_mark. The fsnotify code will take a reference and put
+ * it on failure, so we can't just free it directly. It's also
+ * not safe to call fsnotify_destroy_mark on it as the
+ * mark->group will be NULL. Thus, we can't let the nfm_ref
+ * counter drive the destruction at this point.
+ */
+ if (likely(!err))
+ nfm = new;
+ else
+ fsnotify_put_mark(&new->nfm_mark);
+ } while (unlikely(err == -EEXIST));
+
+ return nfm;
+}
+
+static struct nfsd_file *
+nfsd_file_alloc(struct inode *inode, unsigned int may, unsigned int hashval,
+ struct net *net)
+{
+ struct nfsd_file *nf;
+
+ nf = kmem_cache_alloc(nfsd_file_slab, GFP_KERNEL);
+ if (nf) {
+ INIT_HLIST_NODE(&nf->nf_node);
+ INIT_LIST_HEAD(&nf->nf_lru);
+ nf->nf_file = NULL;
+ nf->nf_cred = get_current_cred();
+ nf->nf_net = net;
+ nf->nf_flags = 0;
+ nf->nf_inode = inode;
+ nf->nf_hashval = hashval;
+ atomic_set(&nf->nf_ref, 1);
+ nf->nf_may = may & NFSD_FILE_MAY_MASK;
+ if (may & NFSD_MAY_NOT_BREAK_LEASE) {
+ if (may & NFSD_MAY_WRITE)
+ __set_bit(NFSD_FILE_BREAK_WRITE, &nf->nf_flags);
+ if (may & NFSD_MAY_READ)
+ __set_bit(NFSD_FILE_BREAK_READ, &nf->nf_flags);
+ }
+ nf->nf_mark = NULL;
+ trace_nfsd_file_alloc(nf);
+ }
+ return nf;
+}
+
+static bool
+nfsd_file_free(struct nfsd_file *nf)
+{
+ bool flush = false;
+
+ trace_nfsd_file_put_final(nf);
+ if (nf->nf_mark)
+ nfsd_file_mark_put(nf->nf_mark);
+ if (nf->nf_file) {
+ get_file(nf->nf_file);
+ filp_close(nf->nf_file, NULL);
+ fput(nf->nf_file);
+ flush = true;
+ }
+ call_rcu(&nf->nf_rcu, nfsd_file_slab_free);
+ return flush;
+}
+
+static bool
+nfsd_file_check_writeback(struct nfsd_file *nf)
+{
+ struct file *file = nf->nf_file;
+ struct address_space *mapping;
+
+ if (!file || !(file->f_mode & FMODE_WRITE))
+ return false;
+ mapping = file->f_mapping;
+ return mapping_tagged(mapping, PAGECACHE_TAG_DIRTY) ||
+ mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK);
+}
+
+static int
+nfsd_file_check_write_error(struct nfsd_file *nf)
+{
+ struct file *file = nf->nf_file;
+
+ if (!file || !(file->f_mode & FMODE_WRITE))
+ return 0;
+ return filemap_check_wb_err(file->f_mapping, READ_ONCE(file->f_wb_err));
+}
+
+static bool
+nfsd_file_in_use(struct nfsd_file *nf)
+{
+ return nfsd_file_check_writeback(nf) ||
+ nfsd_file_check_write_error(nf);
+}
+
+static void
+nfsd_file_do_unhash(struct nfsd_file *nf)
+{
+ lockdep_assert_held(&nfsd_file_hashtbl[nf->nf_hashval].nfb_lock);
+
+ trace_nfsd_file_unhash(nf);
+
+ if (nfsd_file_check_write_error(nf))
+ nfsd_reset_boot_verifier(net_generic(nf->nf_net, nfsd_net_id));
+ --nfsd_file_hashtbl[nf->nf_hashval].nfb_count;
+ hlist_del_rcu(&nf->nf_node);
+ if (!list_empty(&nf->nf_lru))
+ list_lru_del(&nfsd_file_lru, &nf->nf_lru);
+ atomic_long_dec(&nfsd_filecache_count);
+}
+
+static bool
+nfsd_file_unhash(struct nfsd_file *nf)
+{
+ if (test_and_clear_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
+ nfsd_file_do_unhash(nf);
+ return true;
+ }
+ return false;
+}
+
+/*
+ * Return true if the file was unhashed.
+ */
+static bool
+nfsd_file_unhash_and_release_locked(struct nfsd_file *nf, struct list_head *dispose)
+{
+ lockdep_assert_held(&nfsd_file_hashtbl[nf->nf_hashval].nfb_lock);
+
+ trace_nfsd_file_unhash_and_release_locked(nf);
+ if (!nfsd_file_unhash(nf))
+ return false;
+ /* keep final reference for nfsd_file_lru_dispose */
+ if (atomic_add_unless(&nf->nf_ref, -1, 1))
+ return true;
+
+ list_add(&nf->nf_lru, dispose);
+ return true;
+}
+
+static int
+nfsd_file_put_noref(struct nfsd_file *nf)
+{
+ int count;
+ trace_nfsd_file_put(nf);
+
+ count = atomic_dec_return(&nf->nf_ref);
+ if (!count) {
+ WARN_ON(test_bit(NFSD_FILE_HASHED, &nf->nf_flags));
+ nfsd_file_free(nf);
+ }
+ return count;
+}
+
+void
+nfsd_file_put(struct nfsd_file *nf)
+{
+ bool is_hashed = test_bit(NFSD_FILE_HASHED, &nf->nf_flags) != 0;
+ bool unused = !nfsd_file_in_use(nf);
+
+ set_bit(NFSD_FILE_REFERENCED, &nf->nf_flags);
+ if (nfsd_file_put_noref(nf) == 1 && is_hashed && unused)
+ nfsd_file_schedule_laundrette(NFSD_FILE_LAUNDRETTE_MAY_FLUSH);
+}
+
+struct nfsd_file *
+nfsd_file_get(struct nfsd_file *nf)
+{
+ if (likely(atomic_inc_not_zero(&nf->nf_ref)))
+ return nf;
+ return NULL;
+}
+
+static void
+nfsd_file_dispose_list(struct list_head *dispose)
+{
+ struct nfsd_file *nf;
+
+ while(!list_empty(dispose)) {
+ nf = list_first_entry(dispose, struct nfsd_file, nf_lru);
+ list_del(&nf->nf_lru);
+ nfsd_file_put_noref(nf);
+ }
+}
+
+static void
+nfsd_file_dispose_list_sync(struct list_head *dispose)
+{
+ bool flush = false;
+ struct nfsd_file *nf;
+
+ while(!list_empty(dispose)) {
+ nf = list_first_entry(dispose, struct nfsd_file, nf_lru);
+ list_del(&nf->nf_lru);
+ if (!atomic_dec_and_test(&nf->nf_ref))
+ continue;
+ if (nfsd_file_free(nf))
+ flush = true;
+ }
+ if (flush)
+ flush_delayed_fput();
+}
+
+/*
+ * Note this can deadlock with nfsd_file_cache_purge.
+ */
+static enum lru_status
+nfsd_file_lru_cb(struct list_head *item, struct list_lru_one *lru,
+ spinlock_t *lock, void *arg)
+ __releases(lock)
+ __acquires(lock)
+{
+ struct list_head *head = arg;
+ struct nfsd_file *nf = list_entry(item, struct nfsd_file, nf_lru);
+
+ /*
+ * Do a lockless refcount check. The hashtable holds one reference, so
+ * we look to see if anything else has a reference, or if any have
+ * been put since the shrinker last ran. Those don't get unhashed and
+ * released.
+ *
+ * Note that in the put path, we set the flag and then decrement the
+ * counter. Here we check the counter and then test and clear the flag.
+ * That order is deliberate to ensure that we can do this locklessly.
+ */
+ if (atomic_read(&nf->nf_ref) > 1)
+ goto out_skip;
+
+ /*
+ * Don't throw out files that are still undergoing I/O or
+ * that have uncleared errors pending.
+ */
+ if (nfsd_file_check_writeback(nf))
+ goto out_skip;
+
+ if (test_and_clear_bit(NFSD_FILE_REFERENCED, &nf->nf_flags))
+ goto out_rescan;
+
+ if (!test_and_clear_bit(NFSD_FILE_HASHED, &nf->nf_flags))
+ goto out_skip;
+
+ list_lru_isolate_move(lru, &nf->nf_lru, head);
+ return LRU_REMOVED;
+out_rescan:
+ set_bit(NFSD_FILE_LRU_RESCAN, &nfsd_file_lru_flags);
+out_skip:
+ return LRU_SKIP;
+}
+
+static void
+nfsd_file_lru_dispose(struct list_head *head)
+{
+ while(!list_empty(head)) {
+ struct nfsd_file *nf = list_first_entry(head,
+ struct nfsd_file, nf_lru);
+ list_del_init(&nf->nf_lru);
+ spin_lock(&nfsd_file_hashtbl[nf->nf_hashval].nfb_lock);
+ nfsd_file_do_unhash(nf);
+ spin_unlock(&nfsd_file_hashtbl[nf->nf_hashval].nfb_lock);
+ nfsd_file_put_noref(nf);
+ }
+}
+
+static unsigned long
+nfsd_file_lru_count(struct shrinker *s, struct shrink_control *sc)
+{
+ return list_lru_count(&nfsd_file_lru);
+}
+
+static unsigned long
+nfsd_file_lru_scan(struct shrinker *s, struct shrink_control *sc)
+{
+ LIST_HEAD(head);
+ unsigned long ret;
+
+ ret = list_lru_shrink_walk(&nfsd_file_lru, sc, nfsd_file_lru_cb, &head);
+ nfsd_file_lru_dispose(&head);
+ return ret;
+}
+
+static struct shrinker nfsd_file_shrinker = {
+ .scan_objects = nfsd_file_lru_scan,
+ .count_objects = nfsd_file_lru_count,
+ .seeks = 1,
+};
+
+static void
+__nfsd_file_close_inode(struct inode *inode, unsigned int hashval,
+ struct list_head *dispose)
+{
+ struct nfsd_file *nf;
+ struct hlist_node *tmp;
+
+ spin_lock(&nfsd_file_hashtbl[hashval].nfb_lock);
+ hlist_for_each_entry_safe(nf, tmp, &nfsd_file_hashtbl[hashval].nfb_head, nf_node) {
+ if (inode == nf->nf_inode)
+ nfsd_file_unhash_and_release_locked(nf, dispose);
+ }
+ spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock);
+}
+
+/**
+ * nfsd_file_close_inode_sync - attempt to forcibly close a nfsd_file
+ * @inode: inode of the file to attempt to remove
+ *
+ * Walk the whole hash bucket, looking for any files that correspond to "inode".
+ * If any do, then unhash them and put the hashtable reference to them and
+ * destroy any that had their last reference put. Also ensure that any of the
+ * fputs also have their final __fput done as well.
+ */
+void
+nfsd_file_close_inode_sync(struct inode *inode)
+{
+ unsigned int hashval = (unsigned int)hash_long(inode->i_ino,
+ NFSD_FILE_HASH_BITS);
+ LIST_HEAD(dispose);
+
+ __nfsd_file_close_inode(inode, hashval, &dispose);
+ trace_nfsd_file_close_inode_sync(inode, hashval, !list_empty(&dispose));
+ nfsd_file_dispose_list_sync(&dispose);
+}
+
+/**
+ * nfsd_file_close_inode_sync - attempt to forcibly close a nfsd_file
+ * @inode: inode of the file to attempt to remove
+ *
+ * Walk the whole hash bucket, looking for any files that correspond to "inode".
+ * If any do, then unhash them and put the hashtable reference to them and
+ * destroy any that had their last reference put.
+ */
+static void
+nfsd_file_close_inode(struct inode *inode)
+{
+ unsigned int hashval = (unsigned int)hash_long(inode->i_ino,
+ NFSD_FILE_HASH_BITS);
+ LIST_HEAD(dispose);
+
+ __nfsd_file_close_inode(inode, hashval, &dispose);
+ trace_nfsd_file_close_inode(inode, hashval, !list_empty(&dispose));
+ nfsd_file_dispose_list(&dispose);
+}
+
+/**
+ * nfsd_file_delayed_close - close unused nfsd_files
+ * @work: dummy
+ *
+ * Walk the LRU list and close any entries that have not been used since
+ * the last scan.
+ *
+ * Note this can deadlock with nfsd_file_cache_purge.
+ */
+static void
+nfsd_file_delayed_close(struct work_struct *work)
+{
+ LIST_HEAD(head);
+
+ list_lru_walk(&nfsd_file_lru, nfsd_file_lru_cb, &head, LONG_MAX);
+
+ if (test_and_clear_bit(NFSD_FILE_LRU_RESCAN, &nfsd_file_lru_flags))
+ nfsd_file_schedule_laundrette(NFSD_FILE_LAUNDRETTE_NOFLUSH);
+
+ if (!list_empty(&head)) {
+ nfsd_file_lru_dispose(&head);
+ flush_delayed_fput();
+ }
+}
+
+static int
+nfsd_file_lease_notifier_call(struct notifier_block *nb, unsigned long arg,
+ void *data)
+{
+ struct file_lock *fl = data;
+
+ /* Only close files for F_SETLEASE leases */
+ if (fl->fl_flags & FL_LEASE)
+ nfsd_file_close_inode_sync(file_inode(fl->fl_file));
+ return 0;
+}
+
+static struct notifier_block nfsd_file_lease_notifier = {
+ .notifier_call = nfsd_file_lease_notifier_call,
+};
+
+static int
+nfsd_file_fsnotify_handle_event(struct fsnotify_group *group,
+ struct inode *inode,
+ u32 mask, const void *data, int data_type,
+ const struct qstr *file_name, u32 cookie,
+ struct fsnotify_iter_info *iter_info)
+{
+ trace_nfsd_file_fsnotify_handle_event(inode, mask);
+
+ /* Should be no marks on non-regular files */
+ if (!S_ISREG(inode->i_mode)) {
+ WARN_ON_ONCE(1);
+ return 0;
+ }
+
+ /* don't close files if this was not the last link */
+ if (mask & FS_ATTRIB) {
+ if (inode->i_nlink)
+ return 0;
+ }
+
+ nfsd_file_close_inode(inode);
+ return 0;
+}
+
+
+static const struct fsnotify_ops nfsd_file_fsnotify_ops = {
+ .handle_event = nfsd_file_fsnotify_handle_event,
+ .free_mark = nfsd_file_mark_free,
+};
+
+int
+nfsd_file_cache_init(void)
+{
+ int ret = -ENOMEM;
+ unsigned int i;
+
+ clear_bit(NFSD_FILE_SHUTDOWN, &nfsd_file_lru_flags);
+
+ if (nfsd_file_hashtbl)
+ return 0;
+
+ nfsd_file_hashtbl = kcalloc(NFSD_FILE_HASH_SIZE,
+ sizeof(*nfsd_file_hashtbl), GFP_KERNEL);
+ if (!nfsd_file_hashtbl) {
+ pr_err("nfsd: unable to allocate nfsd_file_hashtbl\n");
+ goto out_err;
+ }
+
+ nfsd_file_slab = kmem_cache_create("nfsd_file",
+ sizeof(struct nfsd_file), 0, 0, NULL);
+ if (!nfsd_file_slab) {
+ pr_err("nfsd: unable to create nfsd_file_slab\n");
+ goto out_err;
+ }
+
+ nfsd_file_mark_slab = kmem_cache_create("nfsd_file_mark",
+ sizeof(struct nfsd_file_mark), 0, 0, NULL);
+ if (!nfsd_file_mark_slab) {
+ pr_err("nfsd: unable to create nfsd_file_mark_slab\n");
+ goto out_err;
+ }
+
+
+ ret = list_lru_init(&nfsd_file_lru);
+ if (ret) {
+ pr_err("nfsd: failed to init nfsd_file_lru: %d\n", ret);
+ goto out_err;
+ }
+
+ ret = register_shrinker(&nfsd_file_shrinker);
+ if (ret) {
+ pr_err("nfsd: failed to register nfsd_file_shrinker: %d\n", ret);
+ goto out_lru;
+ }
+
+ ret = lease_register_notifier(&nfsd_file_lease_notifier);
+ if (ret) {
+ pr_err("nfsd: unable to register lease notifier: %d\n", ret);
+ goto out_shrinker;
+ }
+
+ nfsd_file_fsnotify_group = fsnotify_alloc_group(&nfsd_file_fsnotify_ops);
+ if (IS_ERR(nfsd_file_fsnotify_group)) {
+ pr_err("nfsd: unable to create fsnotify group: %ld\n",
+ PTR_ERR(nfsd_file_fsnotify_group));
+ nfsd_file_fsnotify_group = NULL;
+ goto out_notifier;
+ }
+
+ for (i = 0; i < NFSD_FILE_HASH_SIZE; i++) {
+ INIT_HLIST_HEAD(&nfsd_file_hashtbl[i].nfb_head);
+ spin_lock_init(&nfsd_file_hashtbl[i].nfb_lock);
+ }
+
+ INIT_DELAYED_WORK(&nfsd_filecache_laundrette, nfsd_file_delayed_close);
+out:
+ return ret;
+out_notifier:
+ lease_unregister_notifier(&nfsd_file_lease_notifier);
+out_shrinker:
+ unregister_shrinker(&nfsd_file_shrinker);
+out_lru:
+ list_lru_destroy(&nfsd_file_lru);
+out_err:
+ kmem_cache_destroy(nfsd_file_slab);
+ nfsd_file_slab = NULL;
+ kmem_cache_destroy(nfsd_file_mark_slab);
+ nfsd_file_mark_slab = NULL;
+ kfree(nfsd_file_hashtbl);
+ nfsd_file_hashtbl = NULL;
+ goto out;
+}
+
+/*
+ * Note this can deadlock with nfsd_file_lru_cb.
+ */
+void
+nfsd_file_cache_purge(struct net *net)
+{
+ unsigned int i;
+ struct nfsd_file *nf;
+ struct hlist_node *next;
+ LIST_HEAD(dispose);
+ bool del;
+
+ if (!nfsd_file_hashtbl)
+ return;
+
+ for (i = 0; i < NFSD_FILE_HASH_SIZE; i++) {
+ struct nfsd_fcache_bucket *nfb = &nfsd_file_hashtbl[i];
+
+ spin_lock(&nfb->nfb_lock);
+ hlist_for_each_entry_safe(nf, next, &nfb->nfb_head, nf_node) {
+ if (net && nf->nf_net != net)
+ continue;
+ del = nfsd_file_unhash_and_release_locked(nf, &dispose);
+
+ /*
+ * Deadlock detected! Something marked this entry as
+ * unhased, but hasn't removed it from the hash list.
+ */
+ WARN_ON_ONCE(!del);
+ }
+ spin_unlock(&nfb->nfb_lock);
+ nfsd_file_dispose_list(&dispose);
+ }
+}
+
+void
+nfsd_file_cache_shutdown(void)
+{
+ LIST_HEAD(dispose);
+
+ set_bit(NFSD_FILE_SHUTDOWN, &nfsd_file_lru_flags);
+
+ lease_unregister_notifier(&nfsd_file_lease_notifier);
+ unregister_shrinker(&nfsd_file_shrinker);
+ /*
+ * make sure all callers of nfsd_file_lru_cb are done before
+ * calling nfsd_file_cache_purge
+ */
+ cancel_delayed_work_sync(&nfsd_filecache_laundrette);
+ nfsd_file_cache_purge(NULL);
+ list_lru_destroy(&nfsd_file_lru);
+ rcu_barrier();
+ fsnotify_put_group(nfsd_file_fsnotify_group);
+ nfsd_file_fsnotify_group = NULL;
+ kmem_cache_destroy(nfsd_file_slab);
+ nfsd_file_slab = NULL;
+ fsnotify_wait_marks_destroyed();
+ kmem_cache_destroy(nfsd_file_mark_slab);
+ nfsd_file_mark_slab = NULL;
+ kfree(nfsd_file_hashtbl);
+ nfsd_file_hashtbl = NULL;
+}
+
+static bool
+nfsd_match_cred(const struct cred *c1, const struct cred *c2)
+{
+ int i;
+
+ if (!uid_eq(c1->fsuid, c2->fsuid))
+ return false;
+ if (!gid_eq(c1->fsgid, c2->fsgid))
+ return false;
+ if (c1->group_info == NULL || c2->group_info == NULL)
+ return c1->group_info == c2->group_info;
+ if (c1->group_info->ngroups != c2->group_info->ngroups)
+ return false;
+ for (i = 0; i < c1->group_info->ngroups; i++) {
+ if (!gid_eq(c1->group_info->gid[i], c2->group_info->gid[i]))
+ return false;
+ }
+ return true;
+}
+
+static struct nfsd_file *
+nfsd_file_find_locked(struct inode *inode, unsigned int may_flags,
+ unsigned int hashval, struct net *net)
+{
+ struct nfsd_file *nf;
+ unsigned char need = may_flags & NFSD_FILE_MAY_MASK;
+
+ hlist_for_each_entry_rcu(nf, &nfsd_file_hashtbl[hashval].nfb_head,
+ nf_node) {
+ if ((need & nf->nf_may) != need)
+ continue;
+ if (nf->nf_inode != inode)
+ continue;
+ if (nf->nf_net != net)
+ continue;
+ if (!nfsd_match_cred(nf->nf_cred, current_cred()))
+ continue;
+ if (nfsd_file_get(nf) != NULL)
+ return nf;
+ }
+ return NULL;
+}
+
+/**
+ * nfsd_file_is_cached - are there any cached open files for this fh?
+ * @inode: inode of the file to check
+ *
+ * Scan the hashtable for open files that match this fh. Returns true if there
+ * are any, and false if not.
+ */
+bool
+nfsd_file_is_cached(struct inode *inode)
+{
+ bool ret = false;
+ struct nfsd_file *nf;
+ unsigned int hashval;
+
+ hashval = (unsigned int)hash_long(inode->i_ino, NFSD_FILE_HASH_BITS);
+
+ rcu_read_lock();
+ hlist_for_each_entry_rcu(nf, &nfsd_file_hashtbl[hashval].nfb_head,
+ nf_node) {
+ if (inode == nf->nf_inode) {
+ ret = true;
+ break;
+ }
+ }
+ rcu_read_unlock();
+ trace_nfsd_file_is_cached(inode, hashval, (int)ret);
+ return ret;
+}
+
+__be32
+nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
+ unsigned int may_flags, struct nfsd_file **pnf)
+{
+ __be32 status;
+ struct net *net = SVC_NET(rqstp);
+ struct nfsd_file *nf, *new;
+ struct inode *inode;
+ unsigned int hashval;
+
+ /* FIXME: skip this if fh_dentry is already set? */
+ status = fh_verify(rqstp, fhp, S_IFREG,
+ may_flags|NFSD_MAY_OWNER_OVERRIDE);
+ if (status != nfs_ok)
+ return status;
+
+ inode = d_inode(fhp->fh_dentry);
+ hashval = (unsigned int)hash_long(inode->i_ino, NFSD_FILE_HASH_BITS);
+retry:
+ rcu_read_lock();
+ nf = nfsd_file_find_locked(inode, may_flags, hashval, net);
+ rcu_read_unlock();
+ if (nf)
+ goto wait_for_construction;
+
+ new = nfsd_file_alloc(inode, may_flags, hashval, net);
+ if (!new) {
+ trace_nfsd_file_acquire(rqstp, hashval, inode, may_flags,
+ NULL, nfserr_jukebox);
+ return nfserr_jukebox;
+ }
+
+ spin_lock(&nfsd_file_hashtbl[hashval].nfb_lock);
+ nf = nfsd_file_find_locked(inode, may_flags, hashval, net);
+ if (nf == NULL)
+ goto open_file;
+ spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock);
+ nfsd_file_slab_free(&new->nf_rcu);
+
+wait_for_construction:
+ wait_on_bit(&nf->nf_flags, NFSD_FILE_PENDING, TASK_UNINTERRUPTIBLE);
+
+ /* Did construction of this file fail? */
+ if (!test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
+ nfsd_file_put_noref(nf);
+ goto retry;
+ }
+
+ this_cpu_inc(nfsd_file_cache_hits);
+
+ if (!(may_flags & NFSD_MAY_NOT_BREAK_LEASE)) {
+ bool write = (may_flags & NFSD_MAY_WRITE);
+
+ if (test_bit(NFSD_FILE_BREAK_READ, &nf->nf_flags) ||
+ (test_bit(NFSD_FILE_BREAK_WRITE, &nf->nf_flags) && write)) {
+ status = nfserrno(nfsd_open_break_lease(
+ file_inode(nf->nf_file), may_flags));
+ if (status == nfs_ok) {
+ clear_bit(NFSD_FILE_BREAK_READ, &nf->nf_flags);
+ if (write)
+ clear_bit(NFSD_FILE_BREAK_WRITE,
+ &nf->nf_flags);
+ }
+ }
+ }
+out:
+ if (status == nfs_ok) {
+ *pnf = nf;
+ } else {
+ nfsd_file_put(nf);
+ nf = NULL;
+ }
+
+ trace_nfsd_file_acquire(rqstp, hashval, inode, may_flags, nf, status);
+ return status;
+open_file:
+ nf = new;
+ /* Take reference for the hashtable */
+ atomic_inc(&nf->nf_ref);
+ __set_bit(NFSD_FILE_HASHED, &nf->nf_flags);
+ __set_bit(NFSD_FILE_PENDING, &nf->nf_flags);
+ list_lru_add(&nfsd_file_lru, &nf->nf_lru);
+ hlist_add_head_rcu(&nf->nf_node, &nfsd_file_hashtbl[hashval].nfb_head);
+ ++nfsd_file_hashtbl[hashval].nfb_count;
+ nfsd_file_hashtbl[hashval].nfb_maxcount = max(nfsd_file_hashtbl[hashval].nfb_maxcount,
+ nfsd_file_hashtbl[hashval].nfb_count);
+ spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock);
+ atomic_long_inc(&nfsd_filecache_count);
+
+ nf->nf_mark = nfsd_file_mark_find_or_create(nf);
+ if (nf->nf_mark)
+ status = nfsd_open_verified(rqstp, fhp, S_IFREG,
+ may_flags, &nf->nf_file);
+ else
+ status = nfserr_jukebox;
+ /*
+ * If construction failed, or we raced with a call to unlink()
+ * then unhash.
+ */
+ if (status != nfs_ok || inode->i_nlink == 0) {
+ bool do_free;
+ spin_lock(&nfsd_file_hashtbl[hashval].nfb_lock);
+ do_free = nfsd_file_unhash(nf);
+ spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock);
+ if (do_free)
+ nfsd_file_put_noref(nf);
+ }
+ clear_bit_unlock(NFSD_FILE_PENDING, &nf->nf_flags);
+ smp_mb__after_atomic();
+ wake_up_bit(&nf->nf_flags, NFSD_FILE_PENDING);
+ goto out;
+}
+
+/*
+ * Note that fields may be added, removed or reordered in the future. Programs
+ * scraping this file for info should test the labels to ensure they're
+ * getting the correct field.
+ */
+static int nfsd_file_cache_stats_show(struct seq_file *m, void *v)
+{
+ unsigned int i, count = 0, longest = 0;
+ unsigned long hits = 0;
+
+ /*
+ * No need for spinlocks here since we're not terribly interested in
+ * accuracy. We do take the nfsd_mutex simply to ensure that we
+ * don't end up racing with server shutdown
+ */
+ mutex_lock(&nfsd_mutex);
+ if (nfsd_file_hashtbl) {
+ for (i = 0; i < NFSD_FILE_HASH_SIZE; i++) {
+ count += nfsd_file_hashtbl[i].nfb_count;
+ longest = max(longest, nfsd_file_hashtbl[i].nfb_count);
+ }
+ }
+ mutex_unlock(&nfsd_mutex);
+
+ for_each_possible_cpu(i)
+ hits += per_cpu(nfsd_file_cache_hits, i);
+
+ seq_printf(m, "total entries: %u\n", count);
+ seq_printf(m, "longest chain: %u\n", longest);
+ seq_printf(m, "cache hits: %lu\n", hits);
+ return 0;
+}
+
+int nfsd_file_cache_stats_open(struct inode *inode, struct file *file)
+{
+ return single_open(file, nfsd_file_cache_stats_show, NULL);
+}
--- /dev/null
+#ifndef _FS_NFSD_FILECACHE_H
+#define _FS_NFSD_FILECACHE_H
+
+#include <linux/fsnotify_backend.h>
+
+/*
+ * This is the fsnotify_mark container that nfsd attaches to the files that it
+ * is holding open. Note that we have a separate refcount here aside from the
+ * one in the fsnotify_mark. We only want a single fsnotify_mark attached to
+ * the inode, and for each nfsd_file to hold a reference to it.
+ *
+ * The fsnotify_mark is itself refcounted, but that's not sufficient to tell us
+ * how to put that reference. If there are still outstanding nfsd_files that
+ * reference the mark, then we would want to call fsnotify_put_mark on it.
+ * If there were not, then we'd need to call fsnotify_destroy_mark. Since we
+ * can't really tell the difference, we use the nfm_mark to keep track of how
+ * many nfsd_files hold references to the mark. When that counter goes to zero
+ * then we know to call fsnotify_destroy_mark on it.
+ */
+struct nfsd_file_mark {
+ struct fsnotify_mark nfm_mark;
+ atomic_t nfm_ref;
+};
+
+/*
+ * A representation of a file that has been opened by knfsd. These are hashed
+ * in the hashtable by inode pointer value. Note that this object doesn't
+ * hold a reference to the inode by itself, so the nf_inode pointer should
+ * never be dereferenced, only used for comparison.
+ */
+struct nfsd_file {
+ struct hlist_node nf_node;
+ struct list_head nf_lru;
+ struct rcu_head nf_rcu;
+ struct file *nf_file;
+ const struct cred *nf_cred;
+ struct net *nf_net;
+#define NFSD_FILE_HASHED (0)
+#define NFSD_FILE_PENDING (1)
+#define NFSD_FILE_BREAK_READ (2)
+#define NFSD_FILE_BREAK_WRITE (3)
+#define NFSD_FILE_REFERENCED (4)
+ unsigned long nf_flags;
+ struct inode *nf_inode;
+ unsigned int nf_hashval;
+ atomic_t nf_ref;
+ unsigned char nf_may;
+ struct nfsd_file_mark *nf_mark;
+};
+
+int nfsd_file_cache_init(void);
+void nfsd_file_cache_purge(struct net *);
+void nfsd_file_cache_shutdown(void);
+void nfsd_file_put(struct nfsd_file *nf);
+struct nfsd_file *nfsd_file_get(struct nfsd_file *nf);
+void nfsd_file_close_inode_sync(struct inode *inode);
+bool nfsd_file_is_cached(struct inode *inode);
+__be32 nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
+ unsigned int may_flags, struct nfsd_file **nfp);
+int nfsd_file_cache_stats_open(struct inode *, struct file *);
+#endif /* _FS_NFSD_FILECACHE_H */
/* Time of server startup */
struct timespec64 nfssvc_boot;
+ seqlock_t boot_lock;
/*
* Max number of connections this nfsd container will allow. Defaults
extern void nfsd_netns_free_versions(struct nfsd_net *nn);
extern unsigned int nfsd_net_id;
+
+void nfsd_copy_boot_verifier(__be32 verf[2], struct nfsd_net *nn);
+void nfsd_reset_boot_verifier(struct nfsd_net *nn);
#endif /* __NFSD_NETNS_H__ */
nfserr = nfsd_read(rqstp, &resp->fh,
argp->offset,
rqstp->rq_vec, argp->vlen,
- &resp->count);
- if (nfserr == 0) {
- struct inode *inode = d_inode(resp->fh.fh_dentry);
- resp->eof = nfsd_eof_on_read(cnt, resp->count, argp->offset,
- inode->i_size);
- }
-
+ &resp->count,
+ &resp->eof);
RETURN_STATUS(nfserr);
}
NF3SOCK, NF3BAD, NF3LNK, NF3BAD,
};
+
/*
* XDR functions for basic NFS types
*/
{
struct nfsd3_writeres *resp = rqstp->rq_resp;
struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
+ __be32 verf[2];
p = encode_wcc_data(rqstp, p, &resp->fh);
if (resp->status == 0) {
*p++ = htonl(resp->count);
*p++ = htonl(resp->committed);
/* unique identifier, y2038 overflow can be ignored */
- *p++ = htonl((u32)nn->nfssvc_boot.tv_sec);
- *p++ = htonl(nn->nfssvc_boot.tv_nsec);
+ nfsd_copy_boot_verifier(verf, nn);
+ *p++ = verf[0];
+ *p++ = verf[1];
}
return xdr_ressize_check(rqstp, p);
}
{
struct nfsd3_commitres *resp = rqstp->rq_resp;
struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
+ __be32 verf[2];
p = encode_wcc_data(rqstp, p, &resp->fh);
/* Write verifier */
if (resp->status == 0) {
/* unique identifier, y2038 overflow can be ignored */
- *p++ = htonl((u32)nn->nfssvc_boot.tv_sec);
- *p++ = htonl(nn->nfssvc_boot.tv_nsec);
+ nfsd_copy_boot_verifier(verf, nn);
+ *p++ = verf[0];
+ *p++ = verf[1];
}
return xdr_ressize_check(rqstp, p);
}
if (unlikely(status))
return status;
- if (cb != NULL) {
- status = decode_cb_sequence4res(xdr, cb);
- if (unlikely(status || cb->cb_seq_status))
- return status;
- }
+ status = decode_cb_sequence4res(xdr, cb);
+ if (unlikely(status || cb->cb_seq_status))
+ return status;
return decode_cb_op_status(xdr, OP_CB_RECALL, &cb->cb_status);
}
if (unlikely(status))
return status;
- if (cb) {
- status = decode_cb_sequence4res(xdr, cb);
- if (unlikely(status || cb->cb_seq_status))
- return status;
- }
+ status = decode_cb_sequence4res(xdr, cb);
+ if (unlikely(status || cb->cb_seq_status))
+ return status;
+
return decode_cb_op_status(xdr, OP_CB_LAYOUTRECALL, &cb->cb_status);
}
#endif /* CONFIG_NFSD_PNFS */
if (unlikely(status))
return status;
- if (cb) {
- status = decode_cb_sequence4res(xdr, cb);
- if (unlikely(status || cb->cb_seq_status))
- return status;
- }
+ status = decode_cb_sequence4res(xdr, cb);
+ if (unlikely(status || cb->cb_seq_status))
+ return status;
+
return decode_cb_op_status(xdr, OP_CB_NOTIFY_LOCK, &cb->cb_status);
}
if (unlikely(status))
return status;
- if (cb) {
- status = decode_cb_sequence4res(xdr, cb);
- if (unlikely(status || cb->cb_seq_status))
- return status;
- }
+ status = decode_cb_sequence4res(xdr, cb);
+ if (unlikely(status || cb->cb_seq_status))
+ return status;
+
return decode_cb_op_status(xdr, OP_CB_OFFLOAD, &cb->cb_status);
}
/*
spin_unlock(&fp->fi_lock);
if (!nfsd4_layout_ops[ls->ls_layout_type]->disable_recalls)
- vfs_setlease(ls->ls_file, F_UNLCK, NULL, (void **)&ls);
- fput(ls->ls_file);
+ vfs_setlease(ls->ls_file->nf_file, F_UNLCK, NULL, (void **)&ls);
+ nfsd_file_put(ls->ls_file);
if (ls->ls_recalled)
atomic_dec(&ls->ls_stid.sc_file->fi_lo_recalls);
fl->fl_end = OFFSET_MAX;
fl->fl_owner = ls;
fl->fl_pid = current->tgid;
- fl->fl_file = ls->ls_file;
+ fl->fl_file = ls->ls_file->nf_file;
status = vfs_setlease(fl->fl_file, fl->fl_type, &fl, NULL);
if (status) {
NFSPROC4_CLNT_CB_LAYOUT);
if (parent->sc_type == NFS4_DELEG_STID)
- ls->ls_file = get_file(fp->fi_deleg_file);
+ ls->ls_file = nfsd_file_get(fp->fi_deleg_file);
else
ls->ls_file = find_any_file(fp);
BUG_ON(!ls->ls_file);
if (nfsd4_layout_setlease(ls)) {
- fput(ls->ls_file);
+ nfsd_file_put(ls->ls_file);
put_nfs4_file(fp);
kmem_cache_free(nfs4_layout_stateid_cache, ls);
return NULL;
argv[0] = (char *)nfsd_recall_failed;
argv[1] = addr_str;
- argv[2] = ls->ls_file->f_path.mnt->mnt_sb->s_id;
+ argv[2] = ls->ls_file->nf_file->f_path.mnt->mnt_sb->s_id;
argv[3] = NULL;
error = call_usermodehelper(nfsd_recall_failed, argv, envp,
static void gen_boot_verifier(nfs4_verifier *verifier, struct net *net)
{
- __be32 verf[2];
- struct nfsd_net *nn = net_generic(net, nfsd_net_id);
+ __be32 *verf = (__be32 *)verifier->data;
- /*
- * This is opaque to client, so no need to byte-swap. Use
- * __force to keep sparse happy. y2038 time_t overflow is
- * irrelevant in this usage.
- */
- verf[0] = (__force __be32)nn->nfssvc_boot.tv_sec;
- verf[1] = (__force __be32)nn->nfssvc_boot.tv_nsec;
- memcpy(verifier->data, verf, sizeof(verifier->data));
+ BUILD_BUG_ON(2*sizeof(*verf) != sizeof(verifier->data));
+
+ nfsd_copy_boot_verifier(verf, net_generic(net, nfsd_net_id));
}
static __be32
struct nfsd4_read *read = &u->read;
__be32 status;
- read->rd_filp = NULL;
+ read->rd_nf = NULL;
if (read->rd_offset >= OFFSET_MAX)
return nfserr_inval;
/* check stateid */
status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
&read->rd_stateid, RD_STATE,
- &read->rd_filp, &read->rd_tmp_file);
+ &read->rd_nf);
if (status) {
dprintk("NFSD: nfsd4_read: couldn't process stateid!\n");
goto out;
static void
nfsd4_read_release(union nfsd4_op_u *u)
{
- if (u->read.rd_filp)
- fput(u->read.rd_filp);
+ if (u->read.rd_nf)
+ nfsd_file_put(u->read.rd_nf);
trace_nfsd_read_done(u->read.rd_rqstp, u->read.rd_fhp,
u->read.rd_offset, u->read.rd_length);
}
if (setattr->sa_iattr.ia_valid & ATTR_SIZE) {
status = nfs4_preprocess_stateid_op(rqstp, cstate,
&cstate->current_fh, &setattr->sa_stateid,
- WR_STATE, NULL, NULL);
+ WR_STATE, NULL);
if (status) {
dprintk("NFSD: nfsd4_setattr: couldn't process stateid!\n");
return status;
{
struct nfsd4_write *write = &u->write;
stateid_t *stateid = &write->wr_stateid;
- struct file *filp = NULL;
+ struct nfsd_file *nf = NULL;
__be32 status = nfs_ok;
unsigned long cnt;
int nvecs;
trace_nfsd_write_start(rqstp, &cstate->current_fh,
write->wr_offset, cnt);
status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
- stateid, WR_STATE, &filp, NULL);
+ stateid, WR_STATE, &nf);
if (status) {
dprintk("NFSD: nfsd4_write: couldn't process stateid!\n");
return status;
&write->wr_head, write->wr_buflen);
WARN_ON_ONCE(nvecs > ARRAY_SIZE(rqstp->rq_vec));
- status = nfsd_vfs_write(rqstp, &cstate->current_fh, filp,
+ status = nfsd_vfs_write(rqstp, &cstate->current_fh, nf->nf_file,
write->wr_offset, rqstp->rq_vec, nvecs, &cnt,
write->wr_how_written);
- fput(filp);
+ nfsd_file_put(nf);
write->wr_bytes_written = cnt;
trace_nfsd_write_done(rqstp, &cstate->current_fh,
static __be32
nfsd4_verify_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
- stateid_t *src_stateid, struct file **src,
- stateid_t *dst_stateid, struct file **dst)
+ stateid_t *src_stateid, struct nfsd_file **src,
+ stateid_t *dst_stateid, struct nfsd_file **dst)
{
__be32 status;
return nfserr_nofilehandle;
status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->save_fh,
- src_stateid, RD_STATE, src, NULL);
+ src_stateid, RD_STATE, src);
if (status) {
dprintk("NFSD: %s: couldn't process src stateid!\n", __func__);
goto out;
}
status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
- dst_stateid, WR_STATE, dst, NULL);
+ dst_stateid, WR_STATE, dst);
if (status) {
dprintk("NFSD: %s: couldn't process dst stateid!\n", __func__);
goto out_put_src;
}
/* fix up for NFS-specific error code */
- if (!S_ISREG(file_inode(*src)->i_mode) ||
- !S_ISREG(file_inode(*dst)->i_mode)) {
+ if (!S_ISREG(file_inode((*src)->nf_file)->i_mode) ||
+ !S_ISREG(file_inode((*dst)->nf_file)->i_mode)) {
status = nfserr_wrong_type;
goto out_put_dst;
}
out:
return status;
out_put_dst:
- fput(*dst);
+ nfsd_file_put(*dst);
out_put_src:
- fput(*src);
+ nfsd_file_put(*src);
goto out;
}
union nfsd4_op_u *u)
{
struct nfsd4_clone *clone = &u->clone;
- struct file *src, *dst;
+ struct nfsd_file *src, *dst;
__be32 status;
status = nfsd4_verify_copy(rqstp, cstate, &clone->cl_src_stateid, &src,
if (status)
goto out;
- status = nfsd4_clone_file_range(src, clone->cl_src_pos,
- dst, clone->cl_dst_pos, clone->cl_count);
+ status = nfsd4_clone_file_range(src->nf_file, clone->cl_src_pos,
+ dst->nf_file, clone->cl_dst_pos, clone->cl_count);
- fput(dst);
- fput(src);
+ nfsd_file_put(dst);
+ nfsd_file_put(src);
out:
return status;
}
do {
if (kthread_should_stop())
break;
- bytes_copied = nfsd_copy_file_range(copy->file_src, src_pos,
- copy->file_dst, dst_pos, bytes_total);
+ bytes_copied = nfsd_copy_file_range(copy->nf_src->nf_file,
+ src_pos, copy->nf_dst->nf_file, dst_pos,
+ bytes_total);
if (bytes_copied <= 0)
break;
bytes_total -= bytes_copied;
status = nfs_ok;
}
- fput(copy->file_src);
- fput(copy->file_dst);
+ nfsd_file_put(copy->nf_src);
+ nfsd_file_put(copy->nf_dst);
return status;
}
memcpy(&dst->cp_res, &src->cp_res, sizeof(src->cp_res));
memcpy(&dst->fh, &src->fh, sizeof(src->fh));
dst->cp_clp = src->cp_clp;
- dst->file_dst = get_file(src->file_dst);
- dst->file_src = get_file(src->file_src);
+ dst->nf_dst = nfsd_file_get(src->nf_dst);
+ dst->nf_src = nfsd_file_get(src->nf_src);
memcpy(&dst->cp_stateid, &src->cp_stateid, sizeof(src->cp_stateid));
}
static void cleanup_async_copy(struct nfsd4_copy *copy)
{
nfs4_free_cp_state(copy);
- fput(copy->file_dst);
- fput(copy->file_src);
+ nfsd_file_put(copy->nf_dst);
+ nfsd_file_put(copy->nf_src);
spin_lock(©->cp_clp->async_lock);
list_del(©->copies);
spin_unlock(©->cp_clp->async_lock);
struct nfsd4_copy *async_copy = NULL;
status = nfsd4_verify_copy(rqstp, cstate, ©->cp_src_stateid,
- ©->file_src, ©->cp_dst_stateid,
- ©->file_dst);
+ ©->nf_src, ©->cp_dst_stateid,
+ ©->nf_dst);
if (status)
goto out;
struct nfsd4_fallocate *fallocate, int flags)
{
__be32 status;
- struct file *file;
+ struct nfsd_file *nf;
status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
&fallocate->falloc_stateid,
- WR_STATE, &file, NULL);
+ WR_STATE, &nf);
if (status != nfs_ok) {
dprintk("NFSD: nfsd4_fallocate: couldn't process stateid!\n");
return status;
}
- status = nfsd4_vfs_fallocate(rqstp, &cstate->current_fh, file,
+ status = nfsd4_vfs_fallocate(rqstp, &cstate->current_fh, nf->nf_file,
fallocate->falloc_offset,
fallocate->falloc_length,
flags);
- fput(file);
+ nfsd_file_put(nf);
return status;
}
static __be32
struct nfsd4_seek *seek = &u->seek;
int whence;
__be32 status;
- struct file *file;
+ struct nfsd_file *nf;
status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
&seek->seek_stateid,
- RD_STATE, &file, NULL);
+ RD_STATE, &nf);
if (status) {
dprintk("NFSD: nfsd4_seek: couldn't process stateid!\n");
return status;
* Note: This call does change file->f_pos, but nothing in NFSD
* should ever file->f_pos.
*/
- seek->seek_pos = vfs_llseek(file, seek->seek_offset, whence);
+ seek->seek_pos = vfs_llseek(nf->nf_file, seek->seek_offset, whence);
if (seek->seek_pos < 0)
status = nfserrno(seek->seek_pos);
- else if (seek->seek_pos >= i_size_read(file_inode(file)))
+ else if (seek->seek_pos >= i_size_read(file_inode(nf->nf_file)))
seek->seek_eof = true;
out:
- fput(file);
+ nfsd_file_put(nf);
return status;
}
void (*remove)(struct nfs4_client *);
int (*check)(struct nfs4_client *);
void (*grace_done)(struct nfsd_net *);
+ uint8_t version;
+ size_t msglen;
};
+static const struct nfsd4_client_tracking_ops nfsd4_cld_tracking_ops;
+static const struct nfsd4_client_tracking_ops nfsd4_cld_tracking_ops_v2;
+
/* Globals */
static char user_recovery_dirname[PATH_MAX] = "/var/lib/nfs/v4recovery";
const char *dname, int len, struct nfsd_net *nn)
{
struct xdr_netobj name;
+ struct xdr_netobj princhash = { .len = 0, .data = NULL };
struct nfs4_client_reclaim *crp;
name.data = kmemdup(dname, len, GFP_KERNEL);
return;
}
name.len = len;
- crp = nfs4_client_to_reclaim(name, nn);
+ crp = nfs4_client_to_reclaim(name, princhash, nn);
if (!crp) {
kfree(name.data);
return;
load_recdir(struct dentry *parent, struct dentry *child, struct nfsd_net *nn)
{
struct xdr_netobj name;
+ struct xdr_netobj princhash = { .len = 0, .data = NULL };
if (child->d_name.len != HEXDIR_LEN - 1) {
printk("%s: illegal name %pd in recovery directory\n",
goto out;
}
name.len = HEXDIR_LEN;
- if (!nfs4_client_to_reclaim(name, nn))
+ if (!nfs4_client_to_reclaim(name, princhash, nn))
kfree(name.data);
out:
return 0;
.remove = nfsd4_remove_clid_dir,
.check = nfsd4_check_legacy_client,
.grace_done = nfsd4_recdir_purge_old,
+ .version = 1,
+ .msglen = 0,
};
/* Globals */
struct list_head cn_list;
unsigned int cn_xid;
bool cn_has_legacy;
+ struct crypto_shash *cn_tfm;
};
struct cld_upcall {
struct list_head cu_list;
struct cld_net *cu_net;
struct completion cu_done;
- struct cld_msg cu_msg;
+ union {
+ struct cld_msg_hdr cu_hdr;
+ struct cld_msg cu_msg;
+ struct cld_msg_v2 cu_msg_v2;
+ } cu_u;
};
static int
-__cld_pipe_upcall(struct rpc_pipe *pipe, struct cld_msg *cmsg)
+__cld_pipe_upcall(struct rpc_pipe *pipe, void *cmsg)
{
int ret;
struct rpc_pipe_msg msg;
- struct cld_upcall *cup = container_of(cmsg, struct cld_upcall, cu_msg);
+ struct cld_upcall *cup = container_of(cmsg, struct cld_upcall, cu_u);
+ struct nfsd_net *nn = net_generic(pipe->dentry->d_sb->s_fs_info,
+ nfsd_net_id);
memset(&msg, 0, sizeof(msg));
msg.data = cmsg;
- msg.len = sizeof(*cmsg);
+ msg.len = nn->client_tracking_ops->msglen;
ret = rpc_queue_upcall(pipe, &msg);
if (ret < 0) {
}
static int
-cld_pipe_upcall(struct rpc_pipe *pipe, struct cld_msg *cmsg)
+cld_pipe_upcall(struct rpc_pipe *pipe, void *cmsg)
{
int ret;
}
static ssize_t
-__cld_pipe_inprogress_downcall(const struct cld_msg __user *cmsg,
+__cld_pipe_inprogress_downcall(const struct cld_msg_v2 __user *cmsg,
struct nfsd_net *nn)
{
- uint8_t cmd;
- struct xdr_netobj name;
+ uint8_t cmd, princhashlen;
+ struct xdr_netobj name, princhash = { .len = 0, .data = NULL };
uint16_t namelen;
struct cld_net *cn = nn->cld_net;
return -EFAULT;
}
if (cmd == Cld_GraceStart) {
- if (get_user(namelen, &cmsg->cm_u.cm_name.cn_len))
- return -EFAULT;
- name.data = memdup_user(&cmsg->cm_u.cm_name.cn_id, namelen);
- if (IS_ERR_OR_NULL(name.data))
- return -EFAULT;
- name.len = namelen;
+ if (nn->client_tracking_ops->version >= 2) {
+ const struct cld_clntinfo __user *ci;
+
+ ci = &cmsg->cm_u.cm_clntinfo;
+ if (get_user(namelen, &ci->cc_name.cn_len))
+ return -EFAULT;
+ name.data = memdup_user(&ci->cc_name.cn_id, namelen);
+ if (IS_ERR_OR_NULL(name.data))
+ return -EFAULT;
+ name.len = namelen;
+ get_user(princhashlen, &ci->cc_princhash.cp_len);
+ if (princhashlen > 0) {
+ princhash.data = memdup_user(
+ &ci->cc_princhash.cp_data,
+ princhashlen);
+ if (IS_ERR_OR_NULL(princhash.data))
+ return -EFAULT;
+ princhash.len = princhashlen;
+ } else
+ princhash.len = 0;
+ } else {
+ const struct cld_name __user *cnm;
+
+ cnm = &cmsg->cm_u.cm_name;
+ if (get_user(namelen, &cnm->cn_len))
+ return -EFAULT;
+ name.data = memdup_user(&cnm->cn_id, namelen);
+ if (IS_ERR_OR_NULL(name.data))
+ return -EFAULT;
+ name.len = namelen;
+ }
if (name.len > 5 && memcmp(name.data, "hash:", 5) == 0) {
name.len = name.len - 5;
memmove(name.data, name.data + 5, name.len);
cn->cn_has_legacy = true;
}
- if (!nfs4_client_to_reclaim(name, nn)) {
+ if (!nfs4_client_to_reclaim(name, princhash, nn)) {
kfree(name.data);
+ kfree(princhash.data);
return -EFAULT;
}
- return sizeof(*cmsg);
+ return nn->client_tracking_ops->msglen;
}
return -EFAULT;
}
cld_pipe_downcall(struct file *filp, const char __user *src, size_t mlen)
{
struct cld_upcall *tmp, *cup;
- struct cld_msg __user *cmsg = (struct cld_msg __user *)src;
+ struct cld_msg_hdr __user *hdr = (struct cld_msg_hdr __user *)src;
+ struct cld_msg_v2 __user *cmsg = (struct cld_msg_v2 __user *)src;
uint32_t xid;
struct nfsd_net *nn = net_generic(file_inode(filp)->i_sb->s_fs_info,
nfsd_net_id);
struct cld_net *cn = nn->cld_net;
int16_t status;
- if (mlen != sizeof(*cmsg)) {
+ if (mlen != nn->client_tracking_ops->msglen) {
dprintk("%s: got %zu bytes, expected %zu\n", __func__, mlen,
- sizeof(*cmsg));
+ nn->client_tracking_ops->msglen);
return -EINVAL;
}
/* copy just the xid so we can try to find that */
- if (copy_from_user(&xid, &cmsg->cm_xid, sizeof(xid)) != 0) {
+ if (copy_from_user(&xid, &hdr->cm_xid, sizeof(xid)) != 0) {
dprintk("%s: error when copying xid from userspace", __func__);
return -EFAULT;
}
* list (for -EINPROGRESS, we just want to make sure the xid is
* valid, not remove the upcall from the list)
*/
- if (get_user(status, &cmsg->cm_status)) {
+ if (get_user(status, &hdr->cm_status)) {
dprintk("%s: error when copying status from userspace", __func__);
return -EFAULT;
}
cup = NULL;
spin_lock(&cn->cn_lock);
list_for_each_entry(tmp, &cn->cn_list, cu_list) {
- if (get_unaligned(&tmp->cu_msg.cm_xid) == xid) {
+ if (get_unaligned(&tmp->cu_u.cu_hdr.cm_xid) == xid) {
cup = tmp;
if (status != -EINPROGRESS)
list_del_init(&cup->cu_list);
if (status == -EINPROGRESS)
return __cld_pipe_inprogress_downcall(cmsg, nn);
- if (copy_from_user(&cup->cu_msg, src, mlen) != 0)
+ if (copy_from_user(&cup->cu_u.cu_msg_v2, src, mlen) != 0)
return -EFAULT;
complete(&cup->cu_done);
{
struct cld_msg *cmsg = msg->data;
struct cld_upcall *cup = container_of(cmsg, struct cld_upcall,
- cu_msg);
+ cu_u.cu_msg);
/* errno >= 0 means we got a downcall */
if (msg->errno >= 0)
nfsd4_cld_unregister_net(net, cn->cn_pipe);
rpc_destroy_pipe_data(cn->cn_pipe);
+ if (cn->cn_tfm)
+ crypto_free_shash(cn->cn_tfm);
kfree(nn->cld_net);
nn->cld_net = NULL;
}
static struct cld_upcall *
-alloc_cld_upcall(struct cld_net *cn)
+alloc_cld_upcall(struct nfsd_net *nn)
{
struct cld_upcall *new, *tmp;
+ struct cld_net *cn = nn->cld_net;
new = kzalloc(sizeof(*new), GFP_KERNEL);
if (!new)
restart_search:
spin_lock(&cn->cn_lock);
list_for_each_entry(tmp, &cn->cn_list, cu_list) {
- if (tmp->cu_msg.cm_xid == cn->cn_xid) {
+ if (tmp->cu_u.cu_msg.cm_xid == cn->cn_xid) {
cn->cn_xid++;
spin_unlock(&cn->cn_lock);
goto restart_search;
}
}
init_completion(&new->cu_done);
- new->cu_msg.cm_vers = CLD_UPCALL_VERSION;
- put_unaligned(cn->cn_xid++, &new->cu_msg.cm_xid);
+ new->cu_u.cu_msg.cm_vers = nn->client_tracking_ops->version;
+ put_unaligned(cn->cn_xid++, &new->cu_u.cu_msg.cm_xid);
new->cu_net = cn;
list_add(&new->cu_list, &cn->cn_list);
spin_unlock(&cn->cn_lock);
- dprintk("%s: allocated xid %u\n", __func__, new->cu_msg.cm_xid);
+ dprintk("%s: allocated xid %u\n", __func__, new->cu_u.cu_msg.cm_xid);
return new;
}
if (test_bit(NFSD4_CLIENT_STABLE, &clp->cl_flags))
return;
- cup = alloc_cld_upcall(cn);
+ cup = alloc_cld_upcall(nn);
if (!cup) {
ret = -ENOMEM;
goto out_err;
}
- cup->cu_msg.cm_cmd = Cld_Create;
- cup->cu_msg.cm_u.cm_name.cn_len = clp->cl_name.len;
- memcpy(cup->cu_msg.cm_u.cm_name.cn_id, clp->cl_name.data,
+ cup->cu_u.cu_msg.cm_cmd = Cld_Create;
+ cup->cu_u.cu_msg.cm_u.cm_name.cn_len = clp->cl_name.len;
+ memcpy(cup->cu_u.cu_msg.cm_u.cm_name.cn_id, clp->cl_name.data,
clp->cl_name.len);
- ret = cld_pipe_upcall(cn->cn_pipe, &cup->cu_msg);
+ ret = cld_pipe_upcall(cn->cn_pipe, &cup->cu_u.cu_msg);
if (!ret) {
- ret = cup->cu_msg.cm_status;
+ ret = cup->cu_u.cu_msg.cm_status;
set_bit(NFSD4_CLIENT_STABLE, &clp->cl_flags);
}
"record on stable storage: %d\n", ret);
}
+/* Ask daemon to create a new record */
+static void
+nfsd4_cld_create_v2(struct nfs4_client *clp)
+{
+ int ret;
+ struct cld_upcall *cup;
+ struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id);
+ struct cld_net *cn = nn->cld_net;
+ struct cld_msg_v2 *cmsg;
+ struct crypto_shash *tfm = cn->cn_tfm;
+ struct xdr_netobj cksum;
+ char *principal = NULL;
+ SHASH_DESC_ON_STACK(desc, tfm);
+
+ /* Don't upcall if it's already stored */
+ if (test_bit(NFSD4_CLIENT_STABLE, &clp->cl_flags))
+ return;
+
+ cup = alloc_cld_upcall(nn);
+ if (!cup) {
+ ret = -ENOMEM;
+ goto out_err;
+ }
+
+ cmsg = &cup->cu_u.cu_msg_v2;
+ cmsg->cm_cmd = Cld_Create;
+ cmsg->cm_u.cm_clntinfo.cc_name.cn_len = clp->cl_name.len;
+ memcpy(cmsg->cm_u.cm_clntinfo.cc_name.cn_id, clp->cl_name.data,
+ clp->cl_name.len);
+ if (clp->cl_cred.cr_raw_principal)
+ principal = clp->cl_cred.cr_raw_principal;
+ else if (clp->cl_cred.cr_principal)
+ principal = clp->cl_cred.cr_principal;
+ if (principal) {
+ desc->tfm = tfm;
+ cksum.len = crypto_shash_digestsize(tfm);
+ cksum.data = kmalloc(cksum.len, GFP_KERNEL);
+ if (cksum.data == NULL) {
+ ret = -ENOMEM;
+ goto out;
+ }
+ ret = crypto_shash_digest(desc, principal, strlen(principal),
+ cksum.data);
+ shash_desc_zero(desc);
+ if (ret) {
+ kfree(cksum.data);
+ goto out;
+ }
+ cmsg->cm_u.cm_clntinfo.cc_princhash.cp_len = cksum.len;
+ memcpy(cmsg->cm_u.cm_clntinfo.cc_princhash.cp_data,
+ cksum.data, cksum.len);
+ kfree(cksum.data);
+ } else
+ cmsg->cm_u.cm_clntinfo.cc_princhash.cp_len = 0;
+
+ ret = cld_pipe_upcall(cn->cn_pipe, cmsg);
+ if (!ret) {
+ ret = cmsg->cm_status;
+ set_bit(NFSD4_CLIENT_STABLE, &clp->cl_flags);
+ }
+
+out:
+ free_cld_upcall(cup);
+out_err:
+ if (ret)
+ pr_err("NFSD: Unable to create client record on stable storage: %d\n",
+ ret);
+}
+
/* Ask daemon to create a new record */
static void
nfsd4_cld_remove(struct nfs4_client *clp)
if (!test_bit(NFSD4_CLIENT_STABLE, &clp->cl_flags))
return;
- cup = alloc_cld_upcall(cn);
+ cup = alloc_cld_upcall(nn);
if (!cup) {
ret = -ENOMEM;
goto out_err;
}
- cup->cu_msg.cm_cmd = Cld_Remove;
- cup->cu_msg.cm_u.cm_name.cn_len = clp->cl_name.len;
- memcpy(cup->cu_msg.cm_u.cm_name.cn_id, clp->cl_name.data,
+ cup->cu_u.cu_msg.cm_cmd = Cld_Remove;
+ cup->cu_u.cu_msg.cm_u.cm_name.cn_len = clp->cl_name.len;
+ memcpy(cup->cu_u.cu_msg.cm_u.cm_name.cn_id, clp->cl_name.data,
clp->cl_name.len);
- ret = cld_pipe_upcall(cn->cn_pipe, &cup->cu_msg);
+ ret = cld_pipe_upcall(cn->cn_pipe, &cup->cu_u.cu_msg);
if (!ret) {
- ret = cup->cu_msg.cm_status;
+ ret = cup->cu_u.cu_msg.cm_status;
clear_bit(NFSD4_CLIENT_STABLE, &clp->cl_flags);
}
if (test_bit(NFSD4_CLIENT_STABLE, &clp->cl_flags))
return 0;
- cup = alloc_cld_upcall(cn);
+ cup = alloc_cld_upcall(nn);
if (!cup) {
printk(KERN_ERR "NFSD: Unable to check client record on "
"stable storage: %d\n", -ENOMEM);
return -ENOMEM;
}
- cup->cu_msg.cm_cmd = Cld_Check;
- cup->cu_msg.cm_u.cm_name.cn_len = clp->cl_name.len;
- memcpy(cup->cu_msg.cm_u.cm_name.cn_id, clp->cl_name.data,
+ cup->cu_u.cu_msg.cm_cmd = Cld_Check;
+ cup->cu_u.cu_msg.cm_u.cm_name.cn_len = clp->cl_name.len;
+ memcpy(cup->cu_u.cu_msg.cm_u.cm_name.cn_id, clp->cl_name.data,
clp->cl_name.len);
- ret = cld_pipe_upcall(cn->cn_pipe, &cup->cu_msg);
+ ret = cld_pipe_upcall(cn->cn_pipe, &cup->cu_u.cu_msg);
if (!ret) {
- ret = cup->cu_msg.cm_status;
+ ret = cup->cu_u.cu_msg.cm_status;
set_bit(NFSD4_CLIENT_STABLE, &clp->cl_flags);
}
return 0;
}
+static int
+nfsd4_cld_check_v2(struct nfs4_client *clp)
+{
+ struct nfs4_client_reclaim *crp;
+ struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id);
+ struct cld_net *cn = nn->cld_net;
+ int status;
+ char dname[HEXDIR_LEN];
+ struct xdr_netobj name;
+ struct crypto_shash *tfm = cn->cn_tfm;
+ struct xdr_netobj cksum;
+ char *principal = NULL;
+ SHASH_DESC_ON_STACK(desc, tfm);
+
+ /* did we already find that this client is stable? */
+ if (test_bit(NFSD4_CLIENT_STABLE, &clp->cl_flags))
+ return 0;
+
+ /* look for it in the reclaim hashtable otherwise */
+ crp = nfsd4_find_reclaim_client(clp->cl_name, nn);
+ if (crp)
+ goto found;
+
+ if (cn->cn_has_legacy) {
+ status = nfs4_make_rec_clidname(dname, &clp->cl_name);
+ if (status)
+ return -ENOENT;
+
+ name.data = kmemdup(dname, HEXDIR_LEN, GFP_KERNEL);
+ if (!name.data) {
+ dprintk("%s: failed to allocate memory for name.data\n",
+ __func__);
+ return -ENOENT;
+ }
+ name.len = HEXDIR_LEN;
+ crp = nfsd4_find_reclaim_client(name, nn);
+ kfree(name.data);
+ if (crp)
+ goto found;
+
+ }
+ return -ENOENT;
+found:
+ if (crp->cr_princhash.len) {
+ if (clp->cl_cred.cr_raw_principal)
+ principal = clp->cl_cred.cr_raw_principal;
+ else if (clp->cl_cred.cr_principal)
+ principal = clp->cl_cred.cr_principal;
+ if (principal == NULL)
+ return -ENOENT;
+ desc->tfm = tfm;
+ cksum.len = crypto_shash_digestsize(tfm);
+ cksum.data = kmalloc(cksum.len, GFP_KERNEL);
+ if (cksum.data == NULL)
+ return -ENOENT;
+ status = crypto_shash_digest(desc, principal, strlen(principal),
+ cksum.data);
+ shash_desc_zero(desc);
+ if (status) {
+ kfree(cksum.data);
+ return -ENOENT;
+ }
+ if (memcmp(crp->cr_princhash.data, cksum.data,
+ crp->cr_princhash.len)) {
+ kfree(cksum.data);
+ return -ENOENT;
+ }
+ kfree(cksum.data);
+ }
+ crp->cr_clp = clp;
+ return 0;
+}
+
static int
nfsd4_cld_grace_start(struct nfsd_net *nn)
{
struct cld_upcall *cup;
struct cld_net *cn = nn->cld_net;
- cup = alloc_cld_upcall(cn);
+ cup = alloc_cld_upcall(nn);
if (!cup) {
ret = -ENOMEM;
goto out_err;
}
- cup->cu_msg.cm_cmd = Cld_GraceStart;
- ret = cld_pipe_upcall(cn->cn_pipe, &cup->cu_msg);
+ cup->cu_u.cu_msg.cm_cmd = Cld_GraceStart;
+ ret = cld_pipe_upcall(cn->cn_pipe, &cup->cu_u.cu_msg);
if (!ret)
- ret = cup->cu_msg.cm_status;
+ ret = cup->cu_u.cu_msg.cm_status;
free_cld_upcall(cup);
out_err:
struct cld_upcall *cup;
struct cld_net *cn = nn->cld_net;
- cup = alloc_cld_upcall(cn);
+ cup = alloc_cld_upcall(nn);
if (!cup) {
ret = -ENOMEM;
goto out_err;
}
- cup->cu_msg.cm_cmd = Cld_GraceDone;
- cup->cu_msg.cm_u.cm_gracetime = (int64_t)nn->boot_time;
- ret = cld_pipe_upcall(cn->cn_pipe, &cup->cu_msg);
+ cup->cu_u.cu_msg.cm_cmd = Cld_GraceDone;
+ cup->cu_u.cu_msg.cm_u.cm_gracetime = (int64_t)nn->boot_time;
+ ret = cld_pipe_upcall(cn->cn_pipe, &cup->cu_u.cu_msg);
if (!ret)
- ret = cup->cu_msg.cm_status;
+ ret = cup->cu_u.cu_msg.cm_status;
free_cld_upcall(cup);
out_err:
struct cld_upcall *cup;
struct cld_net *cn = nn->cld_net;
- cup = alloc_cld_upcall(cn);
+ cup = alloc_cld_upcall(nn);
if (!cup) {
ret = -ENOMEM;
goto out_err;
}
- cup->cu_msg.cm_cmd = Cld_GraceDone;
- ret = cld_pipe_upcall(cn->cn_pipe, &cup->cu_msg);
+ cup->cu_u.cu_msg.cm_cmd = Cld_GraceDone;
+ ret = cld_pipe_upcall(cn->cn_pipe, &cup->cu_u.cu_msg);
if (!ret)
- ret = cup->cu_msg.cm_status;
+ ret = cup->cu_u.cu_msg.cm_status;
free_cld_upcall(cup);
out_err:
return pipe->nreaders || pipe->nwriters;
}
+static int
+nfsd4_cld_get_version(struct nfsd_net *nn)
+{
+ int ret = 0;
+ struct cld_upcall *cup;
+ struct cld_net *cn = nn->cld_net;
+ uint8_t version;
+
+ cup = alloc_cld_upcall(nn);
+ if (!cup) {
+ ret = -ENOMEM;
+ goto out_err;
+ }
+ cup->cu_u.cu_msg.cm_cmd = Cld_GetVersion;
+ ret = cld_pipe_upcall(cn->cn_pipe, &cup->cu_u.cu_msg);
+ if (!ret) {
+ ret = cup->cu_u.cu_msg.cm_status;
+ if (ret)
+ goto out_free;
+ version = cup->cu_u.cu_msg.cm_u.cm_version;
+ dprintk("%s: userspace returned version %u\n",
+ __func__, version);
+ if (version < 1)
+ version = 1;
+ else if (version > CLD_UPCALL_VERSION)
+ version = CLD_UPCALL_VERSION;
+
+ switch (version) {
+ case 1:
+ nn->client_tracking_ops = &nfsd4_cld_tracking_ops;
+ break;
+ case 2:
+ nn->client_tracking_ops = &nfsd4_cld_tracking_ops_v2;
+ break;
+ default:
+ break;
+ }
+ }
+out_free:
+ free_cld_upcall(cup);
+out_err:
+ if (ret)
+ dprintk("%s: Unable to get version from userspace: %d\n",
+ __func__, ret);
+ return ret;
+}
+
static int
nfsd4_cld_tracking_init(struct net *net)
{
status = __nfsd4_init_cld_pipe(net);
if (status)
goto err_shutdown;
+ nn->cld_net->cn_tfm = crypto_alloc_shash("sha256", 0, 0);
+ if (IS_ERR(nn->cld_net->cn_tfm)) {
+ status = PTR_ERR(nn->cld_net->cn_tfm);
+ goto err_remove;
+ }
/*
* rpc pipe upcalls take 30 seconds to time out, so we don't want to
goto err_remove;
}
+ status = nfsd4_cld_get_version(nn);
+ if (status == -EOPNOTSUPP)
+ pr_warn("NFSD: nfsdcld GetVersion upcall failed. Please upgrade nfsdcld.\n");
+
status = nfsd4_cld_grace_start(nn);
if (status) {
if (status == -EOPNOTSUPP)
- printk(KERN_WARNING "NFSD: Please upgrade nfsdcld.\n");
+ pr_warn("NFSD: nfsdcld GraceStart upcall failed. Please upgrade nfsdcld.\n");
nfs4_release_reclaim(nn);
goto err_remove;
} else
.remove = nfsd4_cld_remove,
.check = nfsd4_cld_check_v0,
.grace_done = nfsd4_cld_grace_done_v0,
+ .version = 1,
+ .msglen = sizeof(struct cld_msg),
};
/* For newer nfsdcld's */
.remove = nfsd4_cld_remove,
.check = nfsd4_cld_check,
.grace_done = nfsd4_cld_grace_done,
+ .version = 1,
+ .msglen = sizeof(struct cld_msg),
+};
+
+/* v2 create/check ops include the principal, if available */
+static const struct nfsd4_client_tracking_ops nfsd4_cld_tracking_ops_v2 = {
+ .init = nfsd4_cld_tracking_init,
+ .exit = nfsd4_cld_tracking_exit,
+ .create = nfsd4_cld_create_v2,
+ .remove = nfsd4_cld_remove,
+ .check = nfsd4_cld_check_v2,
+ .grace_done = nfsd4_cld_grace_done,
+ .version = 2,
+ .msglen = sizeof(struct cld_msg_v2),
};
/* upcall via usermodehelper */
.remove = nfsd4_umh_cltrack_remove,
.check = nfsd4_umh_cltrack_check,
.grace_done = nfsd4_umh_cltrack_grace_done,
+ .version = 1,
+ .msglen = 0,
};
int
#include "netns.h"
#include "pnfs.h"
+#include "filecache.h"
#define NFSDDBG_FACILITY NFSDDBG_PROC
}
}
-static struct file *
+static struct nfsd_file *
__nfs4_get_fd(struct nfs4_file *f, int oflag)
{
if (f->fi_fds[oflag])
- return get_file(f->fi_fds[oflag]);
+ return nfsd_file_get(f->fi_fds[oflag]);
return NULL;
}
-static struct file *
+static struct nfsd_file *
find_writeable_file_locked(struct nfs4_file *f)
{
- struct file *ret;
+ struct nfsd_file *ret;
lockdep_assert_held(&f->fi_lock);
return ret;
}
-static struct file *
+static struct nfsd_file *
find_writeable_file(struct nfs4_file *f)
{
- struct file *ret;
+ struct nfsd_file *ret;
spin_lock(&f->fi_lock);
ret = find_writeable_file_locked(f);
return ret;
}
-static struct file *find_readable_file_locked(struct nfs4_file *f)
+static struct nfsd_file *
+find_readable_file_locked(struct nfs4_file *f)
{
- struct file *ret;
+ struct nfsd_file *ret;
lockdep_assert_held(&f->fi_lock);
return ret;
}
-static struct file *
+static struct nfsd_file *
find_readable_file(struct nfs4_file *f)
{
- struct file *ret;
+ struct nfsd_file *ret;
spin_lock(&f->fi_lock);
ret = find_readable_file_locked(f);
return ret;
}
-struct file *
+struct nfsd_file *
find_any_file(struct nfs4_file *f)
{
- struct file *ret;
+ struct nfsd_file *ret;
spin_lock(&f->fi_lock);
ret = __nfs4_get_fd(f, O_RDWR);
might_lock(&fp->fi_lock);
if (atomic_dec_and_lock(&fp->fi_access[oflag], &fp->fi_lock)) {
- struct file *f1 = NULL;
- struct file *f2 = NULL;
+ struct nfsd_file *f1 = NULL;
+ struct nfsd_file *f2 = NULL;
swap(f1, fp->fi_fds[oflag]);
if (atomic_read(&fp->fi_access[1 - oflag]) == 0)
swap(f2, fp->fi_fds[O_RDWR]);
spin_unlock(&fp->fi_lock);
if (f1)
- fput(f1);
+ nfsd_file_put(f1);
if (f2)
- fput(f2);
+ nfsd_file_put(f2);
}
}
static void put_deleg_file(struct nfs4_file *fp)
{
- struct file *filp = NULL;
+ struct nfsd_file *nf = NULL;
spin_lock(&fp->fi_lock);
if (--fp->fi_delegees == 0)
- swap(filp, fp->fi_deleg_file);
+ swap(nf, fp->fi_deleg_file);
spin_unlock(&fp->fi_lock);
- if (filp)
- fput(filp);
+ if (nf)
+ nfsd_file_put(nf);
}
static void nfs4_unlock_deleg_lease(struct nfs4_delegation *dp)
{
struct nfs4_file *fp = dp->dl_stid.sc_file;
- struct file *filp = fp->fi_deleg_file;
+ struct nfsd_file *nf = fp->fi_deleg_file;
WARN_ON_ONCE(!fp->fi_delegees);
- vfs_setlease(filp, F_UNLCK, NULL, (void **)&dp);
+ vfs_setlease(nf->nf_file, F_UNLCK, NULL, (void **)&dp);
put_deleg_file(fp);
}
{
struct nfs4_ol_stateid *stp = openlockstateid(stid);
struct nfs4_lockowner *lo = lockowner(stp->st_stateowner);
- struct file *file;
+ struct nfsd_file *nf;
- file = find_any_file(stp->st_stid.sc_file);
- if (file)
- filp_close(file, (fl_owner_t)lo);
+ nf = find_any_file(stp->st_stid.sc_file);
+ if (nf) {
+ get_file(nf->nf_file);
+ filp_close(nf->nf_file, (fl_owner_t)lo);
+ nfsd_file_put(nf);
+ }
nfs4_free_ol_stateid(stid);
}
* re-negotiate active sessions and reduce their slot usage to make
* room for new connections. For now we just fail the create session.
*/
-static u32 nfsd4_get_drc_mem(struct nfsd4_channel_attrs *ca)
+static u32 nfsd4_get_drc_mem(struct nfsd4_channel_attrs *ca, struct nfsd_net *nn)
{
u32 slotsize = slot_bytes(ca);
u32 num = ca->maxreqs;
unsigned long avail, total_avail;
+ unsigned int scale_factor;
spin_lock(&nfsd_drc_lock);
- total_avail = nfsd_drc_max_mem - nfsd_drc_mem_used;
+ if (nfsd_drc_max_mem > nfsd_drc_mem_used)
+ total_avail = nfsd_drc_max_mem - nfsd_drc_mem_used;
+ else
+ /* We have handed out more space than we chose in
+ * set_max_drc() to allow. That isn't really a
+ * problem as long as that doesn't make us think we
+ * have lots more due to integer overflow.
+ */
+ total_avail = 0;
avail = min((unsigned long)NFSD_MAX_MEM_PER_SESSION, total_avail);
/*
- * Never use more than a third of the remaining memory,
- * unless it's the only way to give this client a slot:
+ * Never use more than a fraction of the remaining memory,
+ * unless it's the only way to give this client a slot.
+ * The chosen fraction is either 1/8 or 1/number of threads,
+ * whichever is smaller. This ensures there are adequate
+ * slots to support multiple clients per thread.
+ * Give the client one slot even if that would require
+ * over-allocation--it is better than failure.
*/
- avail = clamp_t(unsigned long, avail, slotsize, total_avail/3);
+ scale_factor = max_t(unsigned int, 8, nn->nfsd_serv->sv_nrthreads);
+
+ avail = clamp_t(unsigned long, avail, slotsize,
+ total_avail/scale_factor);
num = min_t(int, num, avail / slotsize);
+ num = max_t(int, num, 1);
nfsd_drc_mem_used += num * slotsize;
spin_unlock(&nfsd_drc_lock);
spin_unlock(&clp->cl_lock);
}
-static void nfs4_show_superblock(struct seq_file *s, struct file *f)
+static void nfs4_show_superblock(struct seq_file *s, struct nfsd_file *f)
{
- struct inode *inode = file_inode(f);
+ struct inode *inode = f->nf_inode;
seq_printf(s, "superblock: \"%02x:%02x:%ld\"",
MAJOR(inode->i_sb->s_dev),
{
struct nfs4_ol_stateid *ols;
struct nfs4_file *nf;
- struct file *file;
+ struct nfsd_file *file;
struct nfs4_stateowner *oo;
unsigned int access, deny;
seq_printf(s, ", ");
nfs4_show_owner(s, oo);
seq_printf(s, " }\n");
- fput(file);
+ nfsd_file_put(file);
return 0;
}
{
struct nfs4_ol_stateid *ols;
struct nfs4_file *nf;
- struct file *file;
+ struct nfsd_file *file;
struct nfs4_stateowner *oo;
ols = openlockstateid(st);
seq_printf(s, ", ");
nfs4_show_owner(s, oo);
seq_printf(s, " }\n");
- fput(file);
+ nfsd_file_put(file);
return 0;
}
{
struct nfs4_delegation *ds;
struct nfs4_file *nf;
- struct file *file;
+ struct nfsd_file *file;
ds = delegstateid(st);
nf = st->sc_file;
static int nfs4_show_layout(struct seq_file *s, struct nfs4_stid *st)
{
struct nfs4_layout_stateid *ls;
- struct file *file;
+ struct nfsd_file *file;
ls = container_of(st, struct nfs4_layout_stateid, ls_stid);
file = ls->ls_file;
* performance. When short on memory we therefore prefer to
* decrease number of slots instead of their size. Clients that
* request larger slots than they need will get poor results:
+ * Note that we always allow at least one slot, because our
+ * accounting is soft and provides no guarantees either way.
*/
- ca->maxreqs = nfsd4_get_drc_mem(ca);
- if (!ca->maxreqs)
- return nfserr_jukebox;
+ ca->maxreqs = nfsd4_get_drc_mem(ca, nn);
return nfs_ok;
}
struct svc_fh *cur_fh, struct nfs4_ol_stateid *stp,
struct nfsd4_open *open)
{
- struct file *filp = NULL;
+ struct nfsd_file *nf = NULL;
__be32 status;
int oflag = nfs4_access_to_omode(open->op_share_access);
int access = nfs4_access_to_access(open->op_share_access);
if (!fp->fi_fds[oflag]) {
spin_unlock(&fp->fi_lock);
- status = nfsd_open(rqstp, cur_fh, S_IFREG, access, &filp);
+ status = nfsd_file_acquire(rqstp, cur_fh, access, &nf);
if (status)
goto out_put_access;
spin_lock(&fp->fi_lock);
if (!fp->fi_fds[oflag]) {
- fp->fi_fds[oflag] = filp;
- filp = NULL;
+ fp->fi_fds[oflag] = nf;
+ nf = NULL;
}
}
spin_unlock(&fp->fi_lock);
- if (filp)
- fput(filp);
+ if (nf)
+ nfsd_file_put(nf);
status = nfsd4_truncate(rqstp, cur_fh, open);
if (status)
fl->fl_end = OFFSET_MAX;
fl->fl_owner = (fl_owner_t)dp;
fl->fl_pid = current->tgid;
- fl->fl_file = dp->dl_stid.sc_file->fi_deleg_file;
+ fl->fl_file = dp->dl_stid.sc_file->fi_deleg_file->nf_file;
return fl;
}
{
int status = 0;
struct nfs4_delegation *dp;
- struct file *filp;
+ struct nfsd_file *nf;
struct file_lock *fl;
/*
if (fp->fi_had_conflict)
return ERR_PTR(-EAGAIN);
- filp = find_readable_file(fp);
- if (!filp) {
+ nf = find_readable_file(fp);
+ if (!nf) {
/* We should always have a readable file here */
WARN_ON_ONCE(1);
return ERR_PTR(-EBADF);
if (nfs4_delegation_exists(clp, fp))
status = -EAGAIN;
else if (!fp->fi_deleg_file) {
- fp->fi_deleg_file = filp;
+ fp->fi_deleg_file = nf;
/* increment early to prevent fi_deleg_file from being
* cleared */
fp->fi_delegees = 1;
- filp = NULL;
+ nf = NULL;
} else
fp->fi_delegees++;
spin_unlock(&fp->fi_lock);
spin_unlock(&state_lock);
- if (filp)
- fput(filp);
+ if (nf)
+ nfsd_file_put(nf);
if (status)
return ERR_PTR(status);
if (!fl)
goto out_clnt_odstate;
- status = vfs_setlease(fp->fi_deleg_file, fl->fl_type, &fl, NULL);
+ status = vfs_setlease(fp->fi_deleg_file->nf_file, fl->fl_type, &fl, NULL);
if (fl)
locks_free_lock(fl);
if (status)
return dp;
out_unlock:
- vfs_setlease(fp->fi_deleg_file, F_UNLCK, NULL, (void **)&dp);
+ vfs_setlease(fp->fi_deleg_file->nf_file, F_UNLCK, NULL, (void **)&dp);
out_clnt_odstate:
put_clnt_odstate(dp->dl_clnt_odstate);
nfs4_put_stid(&dp->dl_stid);
return nfs_ok;
}
-static struct file *
+static struct nfsd_file *
nfs4_find_file(struct nfs4_stid *s, int flags)
{
if (!s)
case NFS4_DELEG_STID:
if (WARN_ON_ONCE(!s->sc_file->fi_deleg_file))
return NULL;
- return get_file(s->sc_file->fi_deleg_file);
+ return nfsd_file_get(s->sc_file->fi_deleg_file);
case NFS4_OPEN_STID:
case NFS4_LOCK_STID:
if (flags & RD_STATE)
static __be32
nfs4_check_file(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfs4_stid *s,
- struct file **filpp, bool *tmp_file, int flags)
+ struct nfsd_file **nfp, int flags)
{
int acc = (flags & RD_STATE) ? NFSD_MAY_READ : NFSD_MAY_WRITE;
- struct file *file;
+ struct nfsd_file *nf;
__be32 status;
- file = nfs4_find_file(s, flags);
- if (file) {
+ nf = nfs4_find_file(s, flags);
+ if (nf) {
status = nfsd_permission(rqstp, fhp->fh_export, fhp->fh_dentry,
acc | NFSD_MAY_OWNER_OVERRIDE);
if (status) {
- fput(file);
- return status;
+ nfsd_file_put(nf);
+ goto out;
}
-
- *filpp = file;
} else {
- status = nfsd_open(rqstp, fhp, S_IFREG, acc, filpp);
+ status = nfsd_file_acquire(rqstp, fhp, acc, &nf);
if (status)
return status;
-
- if (tmp_file)
- *tmp_file = true;
}
-
- return 0;
+ *nfp = nf;
+out:
+ return status;
}
/*
__be32
nfs4_preprocess_stateid_op(struct svc_rqst *rqstp,
struct nfsd4_compound_state *cstate, struct svc_fh *fhp,
- stateid_t *stateid, int flags, struct file **filpp, bool *tmp_file)
+ stateid_t *stateid, int flags, struct nfsd_file **nfp)
{
struct inode *ino = d_inode(fhp->fh_dentry);
struct net *net = SVC_NET(rqstp);
struct nfs4_stid *s = NULL;
__be32 status;
- if (filpp)
- *filpp = NULL;
- if (tmp_file)
- *tmp_file = false;
+ if (nfp)
+ *nfp = NULL;
if (grace_disallows_io(net, ino))
return nfserr_grace;
status = nfs4_check_fh(fhp, s);
done:
- if (!status && filpp)
- status = nfs4_check_file(rqstp, fhp, s, filpp, tmp_file, flags);
+ if (status == nfs_ok && nfp)
+ status = nfs4_check_file(rqstp, fhp, s, nfp, flags);
out:
if (s)
nfs4_put_stid(s);
struct nfs4_ol_stateid *lock_stp = NULL;
struct nfs4_ol_stateid *open_stp = NULL;
struct nfs4_file *fp;
- struct file *filp = NULL;
+ struct nfsd_file *nf = NULL;
struct nfsd4_blocked_lock *nbl = NULL;
struct file_lock *file_lock = NULL;
struct file_lock *conflock = NULL;
/* Fallthrough */
case NFS4_READ_LT:
spin_lock(&fp->fi_lock);
- filp = find_readable_file_locked(fp);
- if (filp)
+ nf = find_readable_file_locked(fp);
+ if (nf)
get_lock_access(lock_stp, NFS4_SHARE_ACCESS_READ);
spin_unlock(&fp->fi_lock);
fl_type = F_RDLCK;
/* Fallthrough */
case NFS4_WRITE_LT:
spin_lock(&fp->fi_lock);
- filp = find_writeable_file_locked(fp);
- if (filp)
+ nf = find_writeable_file_locked(fp);
+ if (nf)
get_lock_access(lock_stp, NFS4_SHARE_ACCESS_WRITE);
spin_unlock(&fp->fi_lock);
fl_type = F_WRLCK;
goto out;
}
- if (!filp) {
+ if (!nf) {
status = nfserr_openmode;
goto out;
}
file_lock->fl_type = fl_type;
file_lock->fl_owner = (fl_owner_t)lockowner(nfs4_get_stateowner(&lock_sop->lo_owner));
file_lock->fl_pid = current->tgid;
- file_lock->fl_file = filp;
+ file_lock->fl_file = nf->nf_file;
file_lock->fl_flags = fl_flags;
file_lock->fl_lmops = &nfsd_posix_mng_ops;
file_lock->fl_start = lock->lk_offset;
spin_unlock(&nn->blocked_locks_lock);
}
- err = vfs_lock_file(filp, F_SETLK, file_lock, conflock);
+ err = vfs_lock_file(nf->nf_file, F_SETLK, file_lock, conflock);
switch (err) {
case 0: /* success! */
nfs4_inc_and_copy_stateid(&lock->lk_resp_stateid, &lock_stp->st_stid);
}
free_blocked_lock(nbl);
}
- if (filp)
- fput(filp);
+ if (nf)
+ nfsd_file_put(nf);
if (lock_stp) {
/* Bump seqid manually if the 4.0 replay owner is openowner */
if (cstate->replay_owner &&
*/
static __be32 nfsd_test_lock(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file_lock *lock)
{
- struct file *file;
- __be32 err = nfsd_open(rqstp, fhp, S_IFREG, NFSD_MAY_READ, &file);
+ struct nfsd_file *nf;
+ __be32 err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_READ, &nf);
if (!err) {
- err = nfserrno(vfs_test_lock(file, lock));
- fput(file);
+ err = nfserrno(vfs_test_lock(nf->nf_file, lock));
+ nfsd_file_put(nf);
}
return err;
}
{
struct nfsd4_locku *locku = &u->locku;
struct nfs4_ol_stateid *stp;
- struct file *filp = NULL;
+ struct nfsd_file *nf = NULL;
struct file_lock *file_lock = NULL;
__be32 status;
int err;
&stp, nn);
if (status)
goto out;
- filp = find_any_file(stp->st_stid.sc_file);
- if (!filp) {
+ nf = find_any_file(stp->st_stid.sc_file);
+ if (!nf) {
status = nfserr_lock_range;
goto put_stateid;
}
if (!file_lock) {
dprintk("NFSD: %s: unable to allocate lock!\n", __func__);
status = nfserr_jukebox;
- goto fput;
+ goto put_file;
}
file_lock->fl_type = F_UNLCK;
file_lock->fl_owner = (fl_owner_t)lockowner(nfs4_get_stateowner(stp->st_stateowner));
file_lock->fl_pid = current->tgid;
- file_lock->fl_file = filp;
+ file_lock->fl_file = nf->nf_file;
file_lock->fl_flags = FL_POSIX;
file_lock->fl_lmops = &nfsd_posix_mng_ops;
file_lock->fl_start = locku->lu_offset;
locku->lu_length);
nfs4_transform_lock_offset(file_lock);
- err = vfs_lock_file(filp, F_SETLK, file_lock, NULL);
+ err = vfs_lock_file(nf->nf_file, F_SETLK, file_lock, NULL);
if (err) {
dprintk("NFSD: nfs4_locku: vfs_lock_file failed!\n");
goto out_nfserr;
}
nfs4_inc_and_copy_stateid(&locku->lu_stateid, &stp->st_stid);
-fput:
- fput(filp);
+put_file:
+ nfsd_file_put(nf);
put_stateid:
mutex_unlock(&stp->st_mutex);
nfs4_put_stid(&stp->st_stid);
out_nfserr:
status = nfserrno(err);
- goto fput;
+ goto put_file;
}
/*
{
struct file_lock *fl;
int status = false;
- struct file *filp = find_any_file(fp);
+ struct nfsd_file *nf = find_any_file(fp);
struct inode *inode;
struct file_lock_context *flctx;
- if (!filp) {
+ if (!nf) {
/* Any valid lock stateid should have some sort of access */
WARN_ON_ONCE(1);
return status;
}
- inode = locks_inode(filp);
+ inode = locks_inode(nf->nf_file);
flctx = inode->i_flctx;
if (flctx && !list_empty_careful(&flctx->flc_posix)) {
}
spin_unlock(&flctx->flc_lock);
}
- fput(filp);
+ nfsd_file_put(nf);
return status;
}
* will be freed in nfs4_remove_reclaim_record in the normal case).
*/
struct nfs4_client_reclaim *
-nfs4_client_to_reclaim(struct xdr_netobj name, struct nfsd_net *nn)
+nfs4_client_to_reclaim(struct xdr_netobj name, struct xdr_netobj princhash,
+ struct nfsd_net *nn)
{
unsigned int strhashval;
struct nfs4_client_reclaim *crp;
list_add(&crp->cr_strhash, &nn->reclaim_str_hashtbl[strhashval]);
crp->cr_name.data = name.data;
crp->cr_name.len = name.len;
+ crp->cr_princhash.data = princhash.data;
+ crp->cr_princhash.len = princhash.len;
crp->cr_clp = NULL;
nn->reclaim_str_hashtbl_size++;
}
{
list_del(&crp->cr_strhash);
kfree(crp->cr_name.data);
+ kfree(crp->cr_princhash.data);
kfree(crp);
nn->reclaim_str_hashtbl_size--;
}
#include "cache.h"
#include "netns.h"
#include "pnfs.h"
+#include "filecache.h"
#ifdef CONFIG_NFSD_V4_SECURITY_LABEL
#include <linux/security.h>
return p;
}
+static unsigned int compoundargs_bytes_left(struct nfsd4_compoundargs *argp)
+{
+ unsigned int this = (char *)argp->end - (char *)argp->p;
+
+ return this + argp->pagelen;
+}
+
static int zero_clientid(clientid_t *clid)
{
return (clid->cl_boot == 0) && (clid->cl_id == 0);
/**
* svcxdr_tmpalloc - allocate memory to be freed after compound processing
* @argp: NFSv4 compound argument structure
- * @p: pointer to be freed (with kfree())
+ * @len: length of buffer to allocate
*
- * Marks @p to be freed when processing the compound operation
- * described in @argp finishes.
+ * Allocates a buffer of size @len to be freed when processing the compound
+ * operation described in @argp finishes.
*/
static void *
svcxdr_tmpalloc(struct nfsd4_compoundargs *argp, u32 len)
READ_BUF(4); len += 4;
nace = be32_to_cpup(p++);
- if (nace > NFS4_ACL_MAX)
+ if (nace > compoundargs_bytes_left(argp)/20)
+ /*
+ * Even with 4-byte names there wouldn't be
+ * space for that many aces; something fishy is
+ * going on:
+ */
return nfserr_fbig;
*acl = svcxdr_tmpalloc(argp, nfs4_acl_bytes(nace));
struct nfsd4_create_session *sess)
{
DECODE_HEAD;
- u32 dummy;
READ_BUF(16);
COPYMEM(&sess->clientid, 8);
/* Fore channel attrs */
READ_BUF(28);
- dummy = be32_to_cpup(p++); /* headerpadsz is always 0 */
+ p++; /* headerpadsz is always 0 */
sess->fore_channel.maxreq_sz = be32_to_cpup(p++);
sess->fore_channel.maxresp_sz = be32_to_cpup(p++);
sess->fore_channel.maxresp_cached = be32_to_cpup(p++);
/* Back channel attrs */
READ_BUF(28);
- dummy = be32_to_cpup(p++); /* headerpadsz is always 0 */
+ p++; /* headerpadsz is always 0 */
sess->back_channel.maxreq_sz = be32_to_cpup(p++);
sess->back_channel.maxresp_sz = be32_to_cpup(p++);
sess->back_channel.maxresp_cached = be32_to_cpup(p++);
nfsd4_decode_copy(struct nfsd4_compoundargs *argp, struct nfsd4_copy *copy)
{
DECODE_HEAD;
- unsigned int tmp;
status = nfsd4_decode_stateid(argp, ©->cp_src_stateid);
if (status)
p = xdr_decode_hyper(p, ©->cp_count);
p++; /* ca_consecutive: we always do consecutive copies */
copy->cp_synchronous = be32_to_cpup(p++);
- tmp = be32_to_cpup(p); /* Source server list not supported */
+ /* tmp = be32_to_cpup(p); Source server list not supported */
DECODE_TAIL;
}
if (!p)
return nfserr_resource;
encode_cinfo(p, &create->cr_cinfo);
- nfserr = nfsd4_encode_bitmap(xdr, create->cr_bmval[0],
+ return nfsd4_encode_bitmap(xdr, create->cr_bmval[0],
create->cr_bmval[1], create->cr_bmval[2]);
- return 0;
}
static __be32
len = maxcount;
nfserr = nfsd_splice_read(read->rd_rqstp, read->rd_fhp,
- file, read->rd_offset, &maxcount);
+ file, read->rd_offset, &maxcount, &eof);
read->rd_length = maxcount;
if (nfserr) {
/*
return nfserr;
}
- eof = nfsd_eof_on_read(len, maxcount, read->rd_offset,
- d_inode(read->rd_fhp->fh_dentry)->i_size);
-
*(p++) = htonl(eof);
*(p++) = htonl(maxcount);
len = maxcount;
nfserr = nfsd_readv(resp->rqstp, read->rd_fhp, file, read->rd_offset,
- resp->rqstp->rq_vec, read->rd_vlen, &maxcount);
+ resp->rqstp->rq_vec, read->rd_vlen, &maxcount,
+ &eof);
read->rd_length = maxcount;
if (nfserr)
return nfserr;
xdr_truncate_encode(xdr, starting_len + 8 + ((maxcount+3)&~3));
- eof = nfsd_eof_on_read(len, maxcount, read->rd_offset,
- d_inode(read->rd_fhp->fh_dentry)->i_size);
-
tmp = htonl(eof);
write_bytes_to_xdr_buf(xdr->buf, starting_len , &tmp, 4);
tmp = htonl(maxcount);
{
unsigned long maxcount;
struct xdr_stream *xdr = &resp->xdr;
- struct file *file = read->rd_filp;
+ struct file *file;
int starting_len = xdr->buf->len;
- struct raparms *ra = NULL;
__be32 *p;
+ if (nfserr)
+ return nfserr;
+ file = read->rd_nf->nf_file;
+
p = xdr_reserve_space(xdr, 8); /* eof flag and byte count */
if (!p) {
WARN_ON_ONCE(test_bit(RQ_SPLICE_OK, &resp->rqstp->rq_flags));
(xdr->buf->buflen - xdr->buf->len));
maxcount = min_t(unsigned long, maxcount, read->rd_length);
- if (read->rd_tmp_file)
- ra = nfsd_init_raparms(file);
-
if (file->f_op->splice_read &&
test_bit(RQ_SPLICE_OK, &resp->rqstp->rq_flags))
nfserr = nfsd4_encode_splice_read(resp, read, file, maxcount);
else
nfserr = nfsd4_encode_readv(resp, read, file, maxcount);
- if (ra)
- nfsd_put_raparams(file, ra);
-
if (nfserr)
xdr_truncate_encode(xdr, starting_len);
atomic_set(&nn->ntf_refcnt, 0);
init_waitqueue_head(&nn->ntf_wq);
+ seqlock_init(&nn->boot_lock);
mnt = vfs_kern_mount(&nfsd_fs_type, SB_KERNMOUNT, "nfsd", NULL);
if (IS_ERR(mnt)) {
struct nfsd_readargs *argp = rqstp->rq_argp;
struct nfsd_readres *resp = rqstp->rq_resp;
__be32 nfserr;
+ u32 eof;
dprintk("nfsd: READ %s %d bytes at %d\n",
SVCFH_fmt(&argp->fh),
nfserr = nfsd_read(rqstp, fh_copy(&resp->fh, &argp->fh),
argp->offset,
rqstp->rq_vec, argp->vlen,
- &resp->count);
+ &resp->count,
+ &eof);
if (nfserr) return nfserr;
return fh_getattr(&resp->fh, &resp->stat);
#include "cache.h"
#include "vfs.h"
#include "netns.h"
+#include "filecache.h"
#define NFSDDBG_FACILITY NFSDDBG_SVC
if (nfsd_users++)
return 0;
- /*
- * Readahead param cache - will no-op if it already exists.
- * (Note therefore results will be suboptimal if number of
- * threads is modified after nfsd start.)
- */
- ret = nfsd_racache_init(2*nrservs);
+ ret = nfsd_file_cache_init();
if (ret)
goto dec_users;
ret = nfs4_state_start();
if (ret)
- goto out_racache;
+ goto out_file_cache;
return 0;
-out_racache:
- nfsd_racache_shutdown();
+out_file_cache:
+ nfsd_file_cache_shutdown();
dec_users:
nfsd_users--;
return ret;
return;
nfs4_state_shutdown();
- nfsd_racache_shutdown();
+ nfsd_file_cache_shutdown();
}
static bool nfsd_needs_lockd(struct nfsd_net *nn)
return nfsd_vers(nn, 2, NFSD_TEST) || nfsd_vers(nn, 3, NFSD_TEST);
}
+void nfsd_copy_boot_verifier(__be32 verf[2], struct nfsd_net *nn)
+{
+ int seq = 0;
+
+ do {
+ read_seqbegin_or_lock(&nn->boot_lock, &seq);
+ /*
+ * This is opaque to client, so no need to byte-swap. Use
+ * __force to keep sparse happy. y2038 time_t overflow is
+ * irrelevant in this usage
+ */
+ verf[0] = (__force __be32)nn->nfssvc_boot.tv_sec;
+ verf[1] = (__force __be32)nn->nfssvc_boot.tv_nsec;
+ } while (need_seqretry(&nn->boot_lock, seq));
+ done_seqretry(&nn->boot_lock, seq);
+}
+
+static void nfsd_reset_boot_verifier_locked(struct nfsd_net *nn)
+{
+ ktime_get_real_ts64(&nn->nfssvc_boot);
+}
+
+void nfsd_reset_boot_verifier(struct nfsd_net *nn)
+{
+ write_seqlock(&nn->boot_lock);
+ nfsd_reset_boot_verifier_locked(nn);
+ write_sequnlock(&nn->boot_lock);
+}
+
static int nfsd_startup_net(int nrservs, struct net *net, const struct cred *cred)
{
struct nfsd_net *nn = net_generic(net, nfsd_net_id);
{
struct nfsd_net *nn = net_generic(net, nfsd_net_id);
+ nfsd_file_cache_purge(net);
nfs4_state_shutdown_net(net);
if (nn->lockd_up) {
lockd_down(net);
#endif
}
atomic_inc(&nn->ntf_refcnt);
- ktime_get_real_ts64(&nn->nfssvc_boot); /* record boot time */
+ nfsd_reset_boot_verifier(nn);
return 0;
}
struct list_head cr_strhash; /* hash by cr_name */
struct nfs4_client *cr_clp; /* pointer to associated clp */
struct xdr_netobj cr_name; /* recovery dir name */
+ struct xdr_netobj cr_princhash;
};
/* A reasonable value for REPLAY_ISIZE was estimated as follows:
};
struct list_head fi_clnt_odstate;
/* One each for O_RDONLY, O_WRONLY, O_RDWR: */
- struct file * fi_fds[3];
+ struct nfsd_file *fi_fds[3];
/*
* Each open or lock stateid contributes 0-4 to the counts
* below depending on which bits are set in st_access_bitmap:
*/
atomic_t fi_access[2];
u32 fi_share_deny;
- struct file *fi_deleg_file;
+ struct nfsd_file *fi_deleg_file;
int fi_delegees;
struct knfsd_fh fi_fhandle;
bool fi_had_conflict;
spinlock_t ls_lock;
struct list_head ls_layouts;
u32 ls_layout_type;
- struct file *ls_file;
+ struct nfsd_file *ls_file;
struct nfsd4_callback ls_recall;
stateid_t ls_recall_sid;
bool ls_recalled;
extern __be32 nfs4_preprocess_stateid_op(struct svc_rqst *rqstp,
struct nfsd4_compound_state *cstate, struct svc_fh *fhp,
- stateid_t *stateid, int flags, struct file **filp, bool *tmp_file);
+ stateid_t *stateid, int flags, struct nfsd_file **filp);
__be32 nfsd4_lookup_stateid(struct nfsd4_compound_state *cstate,
stateid_t *stateid, unsigned char typemask,
struct nfs4_stid **s, struct nfsd_net *nn);
extern void nfsd4_shutdown_copy(struct nfs4_client *clp);
extern void nfsd4_prepare_cb_recall(struct nfs4_delegation *dp);
extern struct nfs4_client_reclaim *nfs4_client_to_reclaim(struct xdr_netobj name,
- struct nfsd_net *nn);
+ struct xdr_netobj princhash, struct nfsd_net *nn);
extern bool nfs4_has_reclaimed_state(struct xdr_netobj name, struct nfsd_net *nn);
struct nfs4_file *find_file(struct knfsd_fh *fh);
{
refcount_inc(&fi->fi_ref);
}
-struct file *find_any_file(struct nfs4_file *f);
+struct nfsd_file *find_any_file(struct nfs4_file *f);
/* grace period management */
void nfsd4_end_grace(struct nfsd_net *nn);
DEFINE_NFSD_ERR_EVENT(write_err);
#include "state.h"
+#include "filecache.h"
+#include "vfs.h"
DECLARE_EVENT_CLASS(nfsd_stateid_class,
TP_PROTO(stateid_t *stp),
DEFINE_STATEID_EVENT(layout_recall_fail);
DEFINE_STATEID_EVENT(layout_recall_release);
+#define show_nf_flags(val) \
+ __print_flags(val, "|", \
+ { 1 << NFSD_FILE_HASHED, "HASHED" }, \
+ { 1 << NFSD_FILE_PENDING, "PENDING" }, \
+ { 1 << NFSD_FILE_BREAK_READ, "BREAK_READ" }, \
+ { 1 << NFSD_FILE_BREAK_WRITE, "BREAK_WRITE" }, \
+ { 1 << NFSD_FILE_REFERENCED, "REFERENCED"})
+
+/* FIXME: This should probably be fleshed out in the future. */
+#define show_nf_may(val) \
+ __print_flags(val, "|", \
+ { NFSD_MAY_READ, "READ" }, \
+ { NFSD_MAY_WRITE, "WRITE" }, \
+ { NFSD_MAY_NOT_BREAK_LEASE, "NOT_BREAK_LEASE" })
+
+DECLARE_EVENT_CLASS(nfsd_file_class,
+ TP_PROTO(struct nfsd_file *nf),
+ TP_ARGS(nf),
+ TP_STRUCT__entry(
+ __field(unsigned int, nf_hashval)
+ __field(void *, nf_inode)
+ __field(int, nf_ref)
+ __field(unsigned long, nf_flags)
+ __field(unsigned char, nf_may)
+ __field(struct file *, nf_file)
+ ),
+ TP_fast_assign(
+ __entry->nf_hashval = nf->nf_hashval;
+ __entry->nf_inode = nf->nf_inode;
+ __entry->nf_ref = atomic_read(&nf->nf_ref);
+ __entry->nf_flags = nf->nf_flags;
+ __entry->nf_may = nf->nf_may;
+ __entry->nf_file = nf->nf_file;
+ ),
+ TP_printk("hash=0x%x inode=0x%p ref=%d flags=%s may=%s file=%p",
+ __entry->nf_hashval,
+ __entry->nf_inode,
+ __entry->nf_ref,
+ show_nf_flags(__entry->nf_flags),
+ show_nf_may(__entry->nf_may),
+ __entry->nf_file)
+)
+
+#define DEFINE_NFSD_FILE_EVENT(name) \
+DEFINE_EVENT(nfsd_file_class, name, \
+ TP_PROTO(struct nfsd_file *nf), \
+ TP_ARGS(nf))
+
+DEFINE_NFSD_FILE_EVENT(nfsd_file_alloc);
+DEFINE_NFSD_FILE_EVENT(nfsd_file_put_final);
+DEFINE_NFSD_FILE_EVENT(nfsd_file_unhash);
+DEFINE_NFSD_FILE_EVENT(nfsd_file_put);
+DEFINE_NFSD_FILE_EVENT(nfsd_file_unhash_and_release_locked);
+
+TRACE_EVENT(nfsd_file_acquire,
+ TP_PROTO(struct svc_rqst *rqstp, unsigned int hash,
+ struct inode *inode, unsigned int may_flags,
+ struct nfsd_file *nf, __be32 status),
+
+ TP_ARGS(rqstp, hash, inode, may_flags, nf, status),
+
+ TP_STRUCT__entry(
+ __field(__be32, xid)
+ __field(unsigned int, hash)
+ __field(void *, inode)
+ __field(unsigned int, may_flags)
+ __field(int, nf_ref)
+ __field(unsigned long, nf_flags)
+ __field(unsigned char, nf_may)
+ __field(struct file *, nf_file)
+ __field(__be32, status)
+ ),
+
+ TP_fast_assign(
+ __entry->xid = rqstp->rq_xid;
+ __entry->hash = hash;
+ __entry->inode = inode;
+ __entry->may_flags = may_flags;
+ __entry->nf_ref = nf ? atomic_read(&nf->nf_ref) : 0;
+ __entry->nf_flags = nf ? nf->nf_flags : 0;
+ __entry->nf_may = nf ? nf->nf_may : 0;
+ __entry->nf_file = nf ? nf->nf_file : NULL;
+ __entry->status = status;
+ ),
+
+ TP_printk("xid=0x%x hash=0x%x inode=0x%p may_flags=%s ref=%d nf_flags=%s nf_may=%s nf_file=0x%p status=%u",
+ be32_to_cpu(__entry->xid), __entry->hash, __entry->inode,
+ show_nf_may(__entry->may_flags), __entry->nf_ref,
+ show_nf_flags(__entry->nf_flags),
+ show_nf_may(__entry->nf_may), __entry->nf_file,
+ be32_to_cpu(__entry->status))
+);
+
+DECLARE_EVENT_CLASS(nfsd_file_search_class,
+ TP_PROTO(struct inode *inode, unsigned int hash, int found),
+ TP_ARGS(inode, hash, found),
+ TP_STRUCT__entry(
+ __field(struct inode *, inode)
+ __field(unsigned int, hash)
+ __field(int, found)
+ ),
+ TP_fast_assign(
+ __entry->inode = inode;
+ __entry->hash = hash;
+ __entry->found = found;
+ ),
+ TP_printk("hash=0x%x inode=0x%p found=%d", __entry->hash,
+ __entry->inode, __entry->found)
+);
+
+#define DEFINE_NFSD_FILE_SEARCH_EVENT(name) \
+DEFINE_EVENT(nfsd_file_search_class, name, \
+ TP_PROTO(struct inode *inode, unsigned int hash, int found), \
+ TP_ARGS(inode, hash, found))
+
+DEFINE_NFSD_FILE_SEARCH_EVENT(nfsd_file_close_inode_sync);
+DEFINE_NFSD_FILE_SEARCH_EVENT(nfsd_file_close_inode);
+DEFINE_NFSD_FILE_SEARCH_EVENT(nfsd_file_is_cached);
+
+TRACE_EVENT(nfsd_file_fsnotify_handle_event,
+ TP_PROTO(struct inode *inode, u32 mask),
+ TP_ARGS(inode, mask),
+ TP_STRUCT__entry(
+ __field(struct inode *, inode)
+ __field(unsigned int, nlink)
+ __field(umode_t, mode)
+ __field(u32, mask)
+ ),
+ TP_fast_assign(
+ __entry->inode = inode;
+ __entry->nlink = inode->i_nlink;
+ __entry->mode = inode->i_mode;
+ __entry->mask = mask;
+ ),
+ TP_printk("inode=0x%p nlink=%u mode=0%ho mask=0x%x", __entry->inode,
+ __entry->nlink, __entry->mode, __entry->mask)
+);
+
#endif /* _NFSD_TRACE_H */
#undef TRACE_INCLUDE_PATH
#include "nfsd.h"
#include "vfs.h"
+#include "filecache.h"
#include "trace.h"
#define NFSDDBG_FACILITY NFSDDBG_FILEOP
-
-/*
- * This is a cache of readahead params that help us choose the proper
- * readahead strategy. Initially, we set all readahead parameters to 0
- * and let the VFS handle things.
- * If you increase the number of cached files very much, you'll need to
- * add a hash table here.
- */
-struct raparms {
- struct raparms *p_next;
- unsigned int p_count;
- ino_t p_ino;
- dev_t p_dev;
- int p_set;
- struct file_ra_state p_ra;
- unsigned int p_hindex;
-};
-
-struct raparm_hbucket {
- struct raparms *pb_head;
- spinlock_t pb_lock;
-} ____cacheline_aligned_in_smp;
-
-#define RAPARM_HASH_BITS 4
-#define RAPARM_HASH_SIZE (1<<RAPARM_HASH_BITS)
-#define RAPARM_HASH_MASK (RAPARM_HASH_SIZE-1)
-static struct raparm_hbucket raparm_hash[RAPARM_HASH_SIZE];
-
/*
* Called from nfsd_lookup and encode_dirent. Check if we have crossed
* a mount point.
}
#endif /* CONFIG_NFSD_V3 */
-static int nfsd_open_break_lease(struct inode *inode, int access)
+int nfsd_open_break_lease(struct inode *inode, int access)
{
unsigned int mode;
* and additional flags.
* N.B. After this call fhp needs an fh_put
*/
-__be32
-nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type,
+static __be32
+__nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type,
int may_flags, struct file **filp)
{
struct path path;
__be32 err;
int host_err = 0;
- validate_process_creds();
-
- /*
- * If we get here, then the client has already done an "open",
- * and (hopefully) checked permission - so allow OWNER_OVERRIDE
- * in case a chmod has now revoked permission.
- *
- * Arguably we should also allow the owner override for
- * directories, but we never have and it doesn't seem to have
- * caused anyone a problem. If we were to change this, note
- * also that our filldir callbacks would need a variant of
- * lookup_one_len that doesn't check permissions.
- */
- if (type == S_IFREG)
- may_flags |= NFSD_MAY_OWNER_OVERRIDE;
- err = fh_verify(rqstp, fhp, type, may_flags);
- if (err)
- goto out;
-
path.mnt = fhp->fh_export->ex_path.mnt;
path.dentry = fhp->fh_dentry;
inode = d_inode(path.dentry);
out_nfserr:
err = nfserrno(host_err);
out:
- validate_process_creds();
return err;
}
-struct raparms *
-nfsd_init_raparms(struct file *file)
+__be32
+nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type,
+ int may_flags, struct file **filp)
{
- struct inode *inode = file_inode(file);
- dev_t dev = inode->i_sb->s_dev;
- ino_t ino = inode->i_ino;
- struct raparms *ra, **rap, **frap = NULL;
- int depth = 0;
- unsigned int hash;
- struct raparm_hbucket *rab;
-
- hash = jhash_2words(dev, ino, 0xfeedbeef) & RAPARM_HASH_MASK;
- rab = &raparm_hash[hash];
-
- spin_lock(&rab->pb_lock);
- for (rap = &rab->pb_head; (ra = *rap); rap = &ra->p_next) {
- if (ra->p_ino == ino && ra->p_dev == dev)
- goto found;
- depth++;
- if (ra->p_count == 0)
- frap = rap;
- }
- depth = nfsdstats.ra_size;
- if (!frap) {
- spin_unlock(&rab->pb_lock);
- return NULL;
- }
- rap = frap;
- ra = *frap;
- ra->p_dev = dev;
- ra->p_ino = ino;
- ra->p_set = 0;
- ra->p_hindex = hash;
-found:
- if (rap != &rab->pb_head) {
- *rap = ra->p_next;
- ra->p_next = rab->pb_head;
- rab->pb_head = ra;
- }
- ra->p_count++;
- nfsdstats.ra_depth[depth*10/nfsdstats.ra_size]++;
- spin_unlock(&rab->pb_lock);
+ __be32 err;
- if (ra->p_set)
- file->f_ra = ra->p_ra;
- return ra;
+ validate_process_creds();
+ /*
+ * If we get here, then the client has already done an "open",
+ * and (hopefully) checked permission - so allow OWNER_OVERRIDE
+ * in case a chmod has now revoked permission.
+ *
+ * Arguably we should also allow the owner override for
+ * directories, but we never have and it doesn't seem to have
+ * caused anyone a problem. If we were to change this, note
+ * also that our filldir callbacks would need a variant of
+ * lookup_one_len that doesn't check permissions.
+ */
+ if (type == S_IFREG)
+ may_flags |= NFSD_MAY_OWNER_OVERRIDE;
+ err = fh_verify(rqstp, fhp, type, may_flags);
+ if (!err)
+ err = __nfsd_open(rqstp, fhp, type, may_flags, filp);
+ validate_process_creds();
+ return err;
}
-void nfsd_put_raparams(struct file *file, struct raparms *ra)
+__be32
+nfsd_open_verified(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type,
+ int may_flags, struct file **filp)
{
- struct raparm_hbucket *rab = &raparm_hash[ra->p_hindex];
+ __be32 err;
- spin_lock(&rab->pb_lock);
- ra->p_ra = file->f_ra;
- ra->p_set = 1;
- ra->p_count--;
- spin_unlock(&rab->pb_lock);
+ validate_process_creds();
+ err = __nfsd_open(rqstp, fhp, type, may_flags, filp);
+ validate_process_creds();
+ return err;
}
/*
return __splice_from_pipe(pipe, sd, nfsd_splice_actor);
}
+static u32 nfsd_eof_on_read(struct file *file, loff_t offset, ssize_t len,
+ size_t expected)
+{
+ if (expected != 0 && len == 0)
+ return 1;
+ if (offset+len >= i_size_read(file_inode(file)))
+ return 1;
+ return 0;
+}
+
static __be32 nfsd_finish_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
struct file *file, loff_t offset,
- unsigned long *count, int host_err)
+ unsigned long *count, u32 *eof, ssize_t host_err)
{
if (host_err >= 0) {
nfsdstats.io_read += host_err;
+ *eof = nfsd_eof_on_read(file, offset, host_err, *count);
*count = host_err;
fsnotify_access(file);
trace_nfsd_read_io_done(rqstp, fhp, offset, *count);
}
__be32 nfsd_splice_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
- struct file *file, loff_t offset, unsigned long *count)
+ struct file *file, loff_t offset, unsigned long *count,
+ u32 *eof)
{
struct splice_desc sd = {
.len = 0,
.pos = offset,
.u.data = rqstp,
};
- int host_err;
+ ssize_t host_err;
trace_nfsd_read_splice(rqstp, fhp, offset, *count);
rqstp->rq_next_page = rqstp->rq_respages + 1;
host_err = splice_direct_to_actor(file, &sd, nfsd_direct_splice_actor);
- return nfsd_finish_read(rqstp, fhp, file, offset, count, host_err);
+ return nfsd_finish_read(rqstp, fhp, file, offset, count, eof, host_err);
}
__be32 nfsd_readv(struct svc_rqst *rqstp, struct svc_fh *fhp,
struct file *file, loff_t offset,
- struct kvec *vec, int vlen, unsigned long *count)
+ struct kvec *vec, int vlen, unsigned long *count,
+ u32 *eof)
{
struct iov_iter iter;
- int host_err;
+ loff_t ppos = offset;
+ ssize_t host_err;
trace_nfsd_read_vector(rqstp, fhp, offset, *count);
iov_iter_kvec(&iter, READ, vec, vlen, *count);
- host_err = vfs_iter_read(file, &iter, &offset, 0);
- return nfsd_finish_read(rqstp, fhp, file, offset, count, host_err);
+ host_err = vfs_iter_read(file, &iter, &ppos, 0);
+ return nfsd_finish_read(rqstp, fhp, file, offset, count, eof, host_err);
}
/*
nfsdstats.io_write += *cnt;
fsnotify_modify(file);
- if (stable && use_wgather)
+ if (stable && use_wgather) {
host_err = wait_for_concurrent_writes(file);
+ if (host_err < 0)
+ nfsd_reset_boot_verifier(net_generic(SVC_NET(rqstp),
+ nfsd_net_id));
+ }
out_nfserr:
if (host_err >= 0) {
* N.B. After this call fhp needs an fh_put
*/
__be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
- loff_t offset, struct kvec *vec, int vlen, unsigned long *count)
+ loff_t offset, struct kvec *vec, int vlen, unsigned long *count,
+ u32 *eof)
{
+ struct nfsd_file *nf;
struct file *file;
- struct raparms *ra;
__be32 err;
trace_nfsd_read_start(rqstp, fhp, offset, *count);
- err = nfsd_open(rqstp, fhp, S_IFREG, NFSD_MAY_READ, &file);
+ err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_READ, &nf);
if (err)
return err;
- ra = nfsd_init_raparms(file);
-
+ file = nf->nf_file;
if (file->f_op->splice_read && test_bit(RQ_SPLICE_OK, &rqstp->rq_flags))
- err = nfsd_splice_read(rqstp, fhp, file, offset, count);
+ err = nfsd_splice_read(rqstp, fhp, file, offset, count, eof);
else
- err = nfsd_readv(rqstp, fhp, file, offset, vec, vlen, count);
+ err = nfsd_readv(rqstp, fhp, file, offset, vec, vlen, count, eof);
- if (ra)
- nfsd_put_raparams(file, ra);
- fput(file);
+ nfsd_file_put(nf);
trace_nfsd_read_done(rqstp, fhp, offset, *count);
nfsd_write(struct svc_rqst *rqstp, struct svc_fh *fhp, loff_t offset,
struct kvec *vec, int vlen, unsigned long *cnt, int stable)
{
- struct file *file = NULL;
- __be32 err = 0;
+ struct nfsd_file *nf;
+ __be32 err;
trace_nfsd_write_start(rqstp, fhp, offset, *cnt);
- err = nfsd_open(rqstp, fhp, S_IFREG, NFSD_MAY_WRITE, &file);
+ err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_WRITE, &nf);
if (err)
goto out;
- err = nfsd_vfs_write(rqstp, fhp, file, offset, vec, vlen, cnt, stable);
- fput(file);
+ err = nfsd_vfs_write(rqstp, fhp, nf->nf_file, offset, vec,
+ vlen, cnt, stable);
+ nfsd_file_put(nf);
out:
trace_nfsd_write_done(rqstp, fhp, offset, *cnt);
return err;
nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp,
loff_t offset, unsigned long count)
{
- struct file *file;
- loff_t end = LLONG_MAX;
- __be32 err = nfserr_inval;
+ struct nfsd_file *nf;
+ loff_t end = LLONG_MAX;
+ __be32 err = nfserr_inval;
if (offset < 0)
goto out;
goto out;
}
- err = nfsd_open(rqstp, fhp, S_IFREG,
- NFSD_MAY_WRITE|NFSD_MAY_NOT_BREAK_LEASE, &file);
+ err = nfsd_file_acquire(rqstp, fhp,
+ NFSD_MAY_WRITE|NFSD_MAY_NOT_BREAK_LEASE, &nf);
if (err)
goto out;
if (EX_ISSYNC(fhp->fh_export)) {
- int err2 = vfs_fsync_range(file, offset, end, 0);
+ int err2 = vfs_fsync_range(nf->nf_file, offset, end, 0);
- if (err2 != -EINVAL)
- err = nfserrno(err2);
- else
+ switch (err2) {
+ case 0:
+ break;
+ case -EINVAL:
err = nfserr_notsupp;
+ break;
+ default:
+ err = nfserrno(err2);
+ nfsd_reset_boot_verifier(net_generic(nf->nf_net,
+ nfsd_net_id));
+ }
}
- fput(file);
+ nfsd_file_put(nf);
out:
return err;
}
goto out_unlock;
}
+static void
+nfsd_close_cached_files(struct dentry *dentry)
+{
+ struct inode *inode = d_inode(dentry);
+
+ if (inode && S_ISREG(inode->i_mode))
+ nfsd_file_close_inode_sync(inode);
+}
+
+static bool
+nfsd_has_cached_files(struct dentry *dentry)
+{
+ bool ret = false;
+ struct inode *inode = d_inode(dentry);
+
+ if (inode && S_ISREG(inode->i_mode))
+ ret = nfsd_file_is_cached(inode);
+ return ret;
+}
+
/*
* Rename a file
* N.B. After this call _both_ ffhp and tfhp need an fh_put
struct inode *fdir, *tdir;
__be32 err;
int host_err;
+ bool has_cached = false;
err = fh_verify(rqstp, ffhp, S_IFDIR, NFSD_MAY_REMOVE);
if (err)
if (!flen || isdotent(fname, flen) || !tlen || isdotent(tname, tlen))
goto out;
+retry:
host_err = fh_want_write(ffhp);
if (host_err) {
err = nfserrno(host_err);
if (ffhp->fh_export->ex_path.dentry != tfhp->fh_export->ex_path.dentry)
goto out_dput_new;
- host_err = vfs_rename(fdir, odentry, tdir, ndentry, NULL, 0);
- if (!host_err) {
- host_err = commit_metadata(tfhp);
- if (!host_err)
- host_err = commit_metadata(ffhp);
+ if (nfsd_has_cached_files(ndentry)) {
+ has_cached = true;
+ goto out_dput_old;
+ } else {
+ host_err = vfs_rename(fdir, odentry, tdir, ndentry, NULL, 0);
+ if (!host_err) {
+ host_err = commit_metadata(tfhp);
+ if (!host_err)
+ host_err = commit_metadata(ffhp);
+ }
}
out_dput_new:
dput(ndentry);
* as that would do the wrong thing if the two directories
* were the same, so again we do it by hand.
*/
- fill_post_wcc(ffhp);
- fill_post_wcc(tfhp);
+ if (!has_cached) {
+ fill_post_wcc(ffhp);
+ fill_post_wcc(tfhp);
+ }
unlock_rename(tdentry, fdentry);
ffhp->fh_locked = tfhp->fh_locked = false;
fh_drop_write(ffhp);
+ /*
+ * If the target dentry has cached open files, then we need to try to
+ * close them prior to doing the rename. Flushing delayed fput
+ * shouldn't be done with locks held however, so we delay it until this
+ * point and then reattempt the whole shebang.
+ */
+ if (has_cached) {
+ has_cached = false;
+ nfsd_close_cached_files(ndentry);
+ dput(ndentry);
+ goto retry;
+ }
out:
return err;
}
if (!type)
type = d_inode(rdentry)->i_mode & S_IFMT;
- if (type != S_IFDIR)
+ if (type != S_IFDIR) {
+ nfsd_close_cached_files(rdentry);
host_err = vfs_unlink(dirp, rdentry, NULL);
- else
+ } else {
host_err = vfs_rmdir(dirp, rdentry);
+ }
+
if (!host_err)
host_err = commit_metadata(fhp);
dput(rdentry);
return err? nfserrno(err) : 0;
}
-
-void
-nfsd_racache_shutdown(void)
-{
- struct raparms *raparm, *last_raparm;
- unsigned int i;
-
- dprintk("nfsd: freeing readahead buffers.\n");
-
- for (i = 0; i < RAPARM_HASH_SIZE; i++) {
- raparm = raparm_hash[i].pb_head;
- while(raparm) {
- last_raparm = raparm;
- raparm = raparm->p_next;
- kfree(last_raparm);
- }
- raparm_hash[i].pb_head = NULL;
- }
-}
-/*
- * Initialize readahead param cache
- */
-int
-nfsd_racache_init(int cache_size)
-{
- int i;
- int j = 0;
- int nperbucket;
- struct raparms **raparm = NULL;
-
-
- if (raparm_hash[0].pb_head)
- return 0;
- nperbucket = DIV_ROUND_UP(cache_size, RAPARM_HASH_SIZE);
- nperbucket = max(2, nperbucket);
- cache_size = nperbucket * RAPARM_HASH_SIZE;
-
- dprintk("nfsd: allocating %d readahead buffers.\n", cache_size);
-
- for (i = 0; i < RAPARM_HASH_SIZE; i++) {
- spin_lock_init(&raparm_hash[i].pb_lock);
-
- raparm = &raparm_hash[i].pb_head;
- for (j = 0; j < nperbucket; j++) {
- *raparm = kzalloc(sizeof(struct raparms), GFP_KERNEL);
- if (!*raparm)
- goto out_nomem;
- raparm = &(*raparm)->p_next;
- }
- *raparm = NULL;
- }
-
- nfsdstats.ra_size = cache_size;
- return 0;
-
-out_nomem:
- dprintk("nfsd: kmalloc failed, freeing readahead buffers\n");
- nfsd_racache_shutdown();
- return -ENOMEM;
-}
typedef int (*nfsd_filldir_t)(void *, const char *, int, loff_t, u64, unsigned);
/* nfsd/vfs.c */
-int nfsd_racache_init(int);
-void nfsd_racache_shutdown(void);
int nfsd_cross_mnt(struct svc_rqst *rqstp, struct dentry **dpp,
struct svc_export **expp);
__be32 nfsd_lookup(struct svc_rqst *, struct svc_fh *,
__be32 nfsd_commit(struct svc_rqst *, struct svc_fh *,
loff_t, unsigned long);
#endif /* CONFIG_NFSD_V3 */
+int nfsd_open_break_lease(struct inode *, int);
__be32 nfsd_open(struct svc_rqst *, struct svc_fh *, umode_t,
int, struct file **);
-struct raparms;
+__be32 nfsd_open_verified(struct svc_rqst *, struct svc_fh *, umode_t,
+ int, struct file **);
__be32 nfsd_splice_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
struct file *file, loff_t offset,
- unsigned long *count);
+ unsigned long *count,
+ u32 *eof);
__be32 nfsd_readv(struct svc_rqst *rqstp, struct svc_fh *fhp,
struct file *file, loff_t offset,
struct kvec *vec, int vlen,
- unsigned long *count);
+ unsigned long *count,
+ u32 *eof);
__be32 nfsd_read(struct svc_rqst *, struct svc_fh *,
- loff_t, struct kvec *, int, unsigned long *);
+ loff_t, struct kvec *, int, unsigned long *,
+ u32 *eof);
__be32 nfsd_write(struct svc_rqst *, struct svc_fh *, loff_t,
struct kvec *, int, unsigned long *, int);
__be32 nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
__be32 nfsd_permission(struct svc_rqst *, struct svc_export *,
struct dentry *, int);
-struct raparms *nfsd_init_raparms(struct file *file);
-void nfsd_put_raparams(struct file *file, struct raparms *ra);
-
static inline int fh_want_write(struct svc_fh *fh)
{
int ret;
|| createmode == NFS4_CREATE_EXCLUSIVE4_1;
}
-static inline bool nfsd_eof_on_read(long requested, long read,
- loff_t offset, loff_t size)
-{
- /* We assume a short read means eof: */
- if (requested > read)
- return true;
- /*
- * A non-short read might also reach end of file. The spec
- * still requires us to set eof in that case.
- *
- * Further operations may have modified the file size since
- * the read, so the following check is not atomic with the read.
- * We've only seen that cause a problem for a client in the case
- * where the read returned a count of 0 without setting eof.
- * That case was fixed by the addition of the above check.
- */
- return (offset + read >= size);
-}
-
#endif /* LINUX_NFSD_VFS_H */
__be32 status;
struct svc_fh fh;
unsigned long count;
- int eof;
+ __u32 eof;
};
struct nfsd3_writeres {
struct nfsd4_read {
- stateid_t rd_stateid; /* request */
- u64 rd_offset; /* request */
- u32 rd_length; /* request */
- int rd_vlen;
- struct file *rd_filp;
- bool rd_tmp_file;
+ stateid_t rd_stateid; /* request */
+ u64 rd_offset; /* request */
+ u32 rd_length; /* request */
+ int rd_vlen;
+ struct nfsd_file *rd_nf;
- struct svc_rqst *rd_rqstp; /* response */
- struct svc_fh * rd_fhp; /* response */
+ struct svc_rqst *rd_rqstp; /* response */
+ struct svc_fh *rd_fhp; /* response */
};
struct nfsd4_readdir {
struct nfs4_client *cp_clp;
- struct file *file_src;
- struct file *file_dst;
+ struct nfsd_file *nf_src;
+ struct nfsd_file *nf_dst;
stateid_t cp_stateid;
{
fsnotify_destroy_marks(&sb->s_fsnotify_marks);
}
-/* Wait until all marks queued for destruction are destroyed */
-extern void fsnotify_wait_marks_destroyed(void);
/*
* update the dentry->d_flags of all of inode's children to indicate if inode cares
if (refcount_dec_and_test(&group->refcnt))
fsnotify_final_destroy_group(group);
}
+EXPORT_SYMBOL_GPL(fsnotify_put_group);
/*
* Create a new fsnotify_group and hold a reference for the group returned.
return group;
}
+EXPORT_SYMBOL_GPL(fsnotify_alloc_group);
int fsnotify_fasync(int fd, struct file *file, int on)
{
queue_delayed_work(system_unbound_wq, &reaper_work,
FSNOTIFY_REAPER_DELAY);
}
+EXPORT_SYMBOL_GPL(fsnotify_put_mark);
/*
* Get mark reference when we found the mark via lockless traversal of object
mutex_unlock(&group->mark_mutex);
fsnotify_free_mark(mark);
}
+EXPORT_SYMBOL_GPL(fsnotify_destroy_mark);
/*
* Sorting function for lists of fsnotify marks.
mutex_unlock(&group->mark_mutex);
return ret;
}
+EXPORT_SYMBOL_GPL(fsnotify_add_mark);
/*
* Given a list of marks, find the mark associated with given group. If found
spin_unlock(&conn->lock);
return NULL;
}
+EXPORT_SYMBOL_GPL(fsnotify_find_mark);
/* Clear any marks in a group with given type mask */
void fsnotify_clear_marks_by_group(struct fsnotify_group *group,
mark->group = group;
WRITE_ONCE(mark->connector, NULL);
}
+EXPORT_SYMBOL_GPL(fsnotify_init_mark);
/*
* Destroy all marks in destroy_list, waits for SRCU period to finish before
{
flush_delayed_work(&reaper_work);
}
+EXPORT_SYMBOL_GPL(fsnotify_wait_marks_destroyed);
#include <linux/ioport.h>
#include <linux/memory.h>
#include <linux/sched/task.h>
+#include <linux/security.h>
#include <asm/sections.h>
#include "internal.h"
static int open_kcore(struct inode *inode, struct file *filp)
{
+ int ret = security_locked_down(LOCKDOWN_KCORE);
+
if (!capable(CAP_SYS_RAWIO))
return -EPERM;
+ if (ret)
+ return ret;
+
filp->private_data = kmalloc(PAGE_SIZE, GFP_KERNEL);
if (!filp->private_data)
return -ENOMEM;
#include <linux/syscalls.h>
#include <linux/unistd.h>
#include <linux/compat.h>
-
#include <linux/uaccess.h>
+#include <asm/unaligned.h>
+
+/*
+ * Note the "unsafe_put_user() semantics: we goto a
+ * label for errors.
+ *
+ * Also note how we use a "while()" loop here, even though
+ * only the biggest size needs to loop. The compiler (well,
+ * at least gcc) is smart enough to turn the smaller sizes
+ * into just if-statements, and this way we don't need to
+ * care whether 'u64' or 'u32' is the biggest size.
+ */
+#define unsafe_copy_loop(dst, src, len, type, label) \
+ while (len >= sizeof(type)) { \
+ unsafe_put_user(get_unaligned((type *)src), \
+ (type __user *)dst, label); \
+ dst += sizeof(type); \
+ src += sizeof(type); \
+ len -= sizeof(type); \
+ }
+
+/*
+ * We avoid doing 64-bit copies on 32-bit architectures. They
+ * might be better, but the component names are mostly small,
+ * and the 64-bit cases can end up being much more complex and
+ * put much more register pressure on the code, so it's likely
+ * not worth the pain of unaligned accesses etc.
+ *
+ * So limit the copies to "unsigned long" size. I did verify
+ * that at least the x86-32 case is ok without this limiting,
+ * but I worry about random other legacy 32-bit cases that
+ * might not do as well.
+ */
+#define unsafe_copy_type(dst, src, len, type, label) do { \
+ if (sizeof(type) <= sizeof(unsigned long)) \
+ unsafe_copy_loop(dst, src, len, type, label); \
+} while (0)
+
+/*
+ * Copy the dirent name to user space, and NUL-terminate
+ * it. This should not be a function call, since we're doing
+ * the copy inside a "user_access_begin/end()" section.
+ */
+#define unsafe_copy_dirent_name(_dst, _src, _len, label) do { \
+ char __user *dst = (_dst); \
+ const char *src = (_src); \
+ size_t len = (_len); \
+ unsafe_copy_type(dst, src, len, u64, label); \
+ unsafe_copy_type(dst, src, len, u32, label); \
+ unsafe_copy_type(dst, src, len, u16, label); \
+ unsafe_copy_type(dst, src, len, u8, label); \
+ unsafe_put_user(0, dst, label); \
+} while (0)
+
+
int iterate_dir(struct file *file, struct dir_context *ctx)
{
struct inode *inode = file_inode(file);
}
EXPORT_SYMBOL(iterate_dir);
+/*
+ * POSIX says that a dirent name cannot contain NULL or a '/'.
+ *
+ * It's not 100% clear what we should really do in this case.
+ * The filesystem is clearly corrupted, but returning a hard
+ * error means that you now don't see any of the other names
+ * either, so that isn't a perfect alternative.
+ *
+ * And if you return an error, what error do you use? Several
+ * filesystems seem to have decided on EUCLEAN being the error
+ * code for EFSCORRUPTED, and that may be the error to use. Or
+ * just EIO, which is perhaps more obvious to users.
+ *
+ * In order to see the other file names in the directory, the
+ * caller might want to make this a "soft" error: skip the
+ * entry, and return the error at the end instead.
+ *
+ * Note that this should likely do a "memchr(name, 0, len)"
+ * check too, since that would be filesystem corruption as
+ * well. However, that case can't actually confuse user space,
+ * which has to do a strlen() on the name anyway to find the
+ * filename length, and the above "soft error" worry means
+ * that it's probably better left alone until we have that
+ * issue clarified.
+ */
+static int verify_dirent_name(const char *name, int len)
+{
+ if (WARN_ON_ONCE(!len))
+ return -EIO;
+ if (WARN_ON_ONCE(memchr(name, '/', len)))
+ return -EIO;
+ return 0;
+}
+
/*
* Traditional linux readdir() handling..
*
int reclen = ALIGN(offsetof(struct linux_dirent, d_name) + namlen + 2,
sizeof(long));
+ buf->error = verify_dirent_name(name, namlen);
+ if (unlikely(buf->error))
+ return buf->error;
buf->error = -EINVAL; /* only used if we fail.. */
if (reclen > buf->count)
return -EINVAL;
return -EOVERFLOW;
}
dirent = buf->previous;
- if (dirent) {
- if (signal_pending(current))
- return -EINTR;
- if (__put_user(offset, &dirent->d_off))
- goto efault;
- }
- dirent = buf->current_dir;
- if (__put_user(d_ino, &dirent->d_ino))
- goto efault;
- if (__put_user(reclen, &dirent->d_reclen))
- goto efault;
- if (copy_to_user(dirent->d_name, name, namlen))
- goto efault;
- if (__put_user(0, dirent->d_name + namlen))
- goto efault;
- if (__put_user(d_type, (char __user *) dirent + reclen - 1))
+ if (dirent && signal_pending(current))
+ return -EINTR;
+
+ /*
+ * Note! This range-checks 'previous' (which may be NULL).
+ * The real range was checked in getdents
+ */
+ if (!user_access_begin(dirent, sizeof(*dirent)))
goto efault;
+ if (dirent)
+ unsafe_put_user(offset, &dirent->d_off, efault_end);
+ dirent = buf->current_dir;
+ unsafe_put_user(d_ino, &dirent->d_ino, efault_end);
+ unsafe_put_user(reclen, &dirent->d_reclen, efault_end);
+ unsafe_put_user(d_type, (char __user *) dirent + reclen - 1, efault_end);
+ unsafe_copy_dirent_name(dirent->d_name, name, namlen, efault_end);
+ user_access_end();
+
buf->previous = dirent;
dirent = (void __user *)dirent + reclen;
buf->current_dir = dirent;
buf->count -= reclen;
return 0;
+efault_end:
+ user_access_end();
efault:
buf->error = -EFAULT;
return -EFAULT;
int reclen = ALIGN(offsetof(struct linux_dirent64, d_name) + namlen + 1,
sizeof(u64));
+ buf->error = verify_dirent_name(name, namlen);
+ if (unlikely(buf->error))
+ return buf->error;
buf->error = -EINVAL; /* only used if we fail.. */
if (reclen > buf->count)
return -EINVAL;
dirent = buf->previous;
- if (dirent) {
- if (signal_pending(current))
- return -EINTR;
- if (__put_user(offset, &dirent->d_off))
- goto efault;
- }
- dirent = buf->current_dir;
- if (__put_user(ino, &dirent->d_ino))
- goto efault;
- if (__put_user(0, &dirent->d_off))
- goto efault;
- if (__put_user(reclen, &dirent->d_reclen))
- goto efault;
- if (__put_user(d_type, &dirent->d_type))
- goto efault;
- if (copy_to_user(dirent->d_name, name, namlen))
- goto efault;
- if (__put_user(0, dirent->d_name + namlen))
+ if (dirent && signal_pending(current))
+ return -EINTR;
+
+ /*
+ * Note! This range-checks 'previous' (which may be NULL).
+ * The real range was checked in getdents
+ */
+ if (!user_access_begin(dirent, sizeof(*dirent)))
goto efault;
+ if (dirent)
+ unsafe_put_user(offset, &dirent->d_off, efault_end);
+ dirent = buf->current_dir;
+ unsafe_put_user(ino, &dirent->d_ino, efault_end);
+ unsafe_put_user(reclen, &dirent->d_reclen, efault_end);
+ unsafe_put_user(d_type, &dirent->d_type, efault_end);
+ unsafe_copy_dirent_name(dirent->d_name, name, namlen, efault_end);
+ user_access_end();
+
buf->previous = dirent;
dirent = (void __user *)dirent + reclen;
buf->current_dir = dirent;
buf->count -= reclen;
return 0;
+efault_end:
+ user_access_end();
efault:
buf->error = -EFAULT;
return -EFAULT;
static int put_compat_statfs64(struct compat_statfs64 __user *ubuf, struct kstatfs *kbuf)
{
struct compat_statfs64 buf;
- if (sizeof(ubuf->f_bsize) == 4) {
- if ((kbuf->f_type | kbuf->f_bsize | kbuf->f_namelen |
- kbuf->f_frsize | kbuf->f_flags) & 0xffffffff00000000ULL)
- return -EOVERFLOW;
- /* f_files and f_ffree may be -1; it's okay
- * to stuff that into 32 bits */
- if (kbuf->f_files != 0xffffffffffffffffULL
- && (kbuf->f_files & 0xffffffff00000000ULL))
- return -EOVERFLOW;
- if (kbuf->f_ffree != 0xffffffffffffffffULL
- && (kbuf->f_ffree & 0xffffffff00000000ULL))
- return -EOVERFLOW;
- }
+
+ if ((kbuf->f_bsize | kbuf->f_frsize) & 0xffffffff00000000ULL)
+ return -EOVERFLOW;
+
memset(&buf, 0, sizeof(struct compat_statfs64));
buf.f_type = kbuf->f_type;
buf.f_bsize = kbuf->f_bsize;
#include <linux/parser.h>
#include <linux/magic.h>
#include <linux/slab.h>
+#include <linux/security.h>
#define TRACEFS_DEFAULT_MODE 0700
static int tracefs_mount_count;
static bool tracefs_registered;
+static int default_open_file(struct inode *inode, struct file *filp)
+{
+ struct dentry *dentry = filp->f_path.dentry;
+ struct file_operations *real_fops;
+ int ret;
+
+ if (!dentry)
+ return -EINVAL;
+
+ ret = security_locked_down(LOCKDOWN_TRACEFS);
+ if (ret)
+ return ret;
+
+ real_fops = dentry->d_fsdata;
+ if (!real_fops->open)
+ return 0;
+ return real_fops->open(inode, filp);
+}
+
static ssize_t default_read_file(struct file *file, char __user *buf,
size_t count, loff_t *ppos)
{
return 0;
}
+static void tracefs_destroy_inode(struct inode *inode)
+{
+ if (S_ISREG(inode->i_mode))
+ kfree(inode->i_fop);
+}
+
static int tracefs_remount(struct super_block *sb, int *flags, char *data)
{
int err;
static const struct super_operations tracefs_super_operations = {
.statfs = simple_statfs,
.remount_fs = tracefs_remount,
+ .destroy_inode = tracefs_destroy_inode,
.show_options = tracefs_show_options,
};
struct dentry *parent, void *data,
const struct file_operations *fops)
{
+ struct file_operations *proxy_fops;
struct dentry *dentry;
struct inode *inode;
if (unlikely(!inode))
return failed_creating(dentry);
+ proxy_fops = kzalloc(sizeof(struct file_operations), GFP_KERNEL);
+ if (unlikely(!proxy_fops)) {
+ iput(inode);
+ return failed_creating(dentry);
+ }
+
+ if (!fops)
+ fops = &tracefs_file_operations;
+
+ dentry->d_fsdata = (void *)fops;
+ memcpy(proxy_fops, fops, sizeof(*proxy_fops));
+ proxy_fops->open = default_open_file;
inode->i_mode = mode;
- inode->i_fop = fops ? fops : &tracefs_file_operations;
+ inode->i_fop = proxy_fops;
inode->i_private = data;
d_instantiate(dentry, inode);
fsnotify_create(dentry->d_parent->d_inode, dentry);
__start_lsm_info = .; \
KEEP(*(.lsm_info.init)) \
__end_lsm_info = .;
+#define EARLY_LSM_TABLE() . = ALIGN(8); \
+ __start_early_lsm_info = .; \
+ KEEP(*(.early_lsm_info.init)) \
+ __end_early_lsm_info = .;
#else
#define LSM_TABLE()
+#define EARLY_LSM_TABLE()
#endif
#define ___OF_TABLE(cfg, name) _OF_TABLE_##cfg(name)
ACPI_PROBE_TABLE(timer) \
THERMAL_TABLE(governor) \
EARLYCON_TABLE() \
- LSM_TABLE()
+ LSM_TABLE() \
+ EARLY_LSM_TABLE()
#define INIT_TEXT \
*(.init.text .init.text.*) \
#define _CRYPTO_PKCS7_H
#include <linux/verification.h>
+#include <linux/hash_info.h>
#include <crypto/public_key.h>
struct key;
extern int pkcs7_supply_detached_data(struct pkcs7_message *pkcs7,
const void *data, size_t datalen);
+extern int pkcs7_get_digest(struct pkcs7_message *pkcs7, const u8 **buf,
+ u32 *len, enum hash_algo *hash_algo);
+
#endif /* _CRYPTO_PKCS7_H */
u32 target_vblank;
/**
- * @pageflip_flags:
+ * @async_flip:
*
- * DRM_MODE_PAGE_FLIP_* flags, as passed to the page flip ioctl.
- * Zero in any other case.
+ * This is set when DRM_MODE_PAGE_FLIP_ASYNC is set in the legacy
+ * PAGE_FLIP IOCTL. It's not wired up for the atomic IOCTL itself yet.
*/
- u32 pageflip_flags;
+ bool async_flip;
/**
* @vrr_enabled:
/**
* @self_refresh_data: Holds the state for the self refresh helpers
*
- * Initialized via drm_self_refresh_helper_register().
+ * Initialized via drm_self_refresh_helper_init().
*/
struct drm_self_refresh_data *self_refresh_data;
};
struct drm_crtc;
void drm_self_refresh_helper_alter_state(struct drm_atomic_state *state);
+void drm_self_refresh_helper_update_avg_times(struct drm_atomic_state *state,
+ unsigned int commit_time_ms);
-int drm_self_refresh_helper_init(struct drm_crtc *crtc,
- unsigned int entry_delay_ms);
-
+int drm_self_refresh_helper_init(struct drm_crtc *crtc);
void drm_self_refresh_helper_cleanup(struct drm_crtc *crtc);
#endif
int acpi_arch_timer_mem_init(struct arch_timer_mem *timer_mem, int *timer_count);
#endif
+#ifndef ACPI_HAVE_ARCH_SET_ROOT_POINTER
+static inline void acpi_arch_set_root_pointer(u64 addr)
+{
+}
+#endif
+
#ifndef ACPI_HAVE_ARCH_GET_ROOT_POINTER
static inline u64 acpi_arch_get_root_pointer(void)
{
#include <asm/types.h>
#include <linux/bits.h>
+/* Set bits in the first 'n' bytes when loaded from memory */
+#ifdef __LITTLE_ENDIAN
+# define aligned_byte_mask(n) ((1UL << 8*(n))-1)
+#else
+# define aligned_byte_mask(n) (~0xffUL << (BITS_PER_LONG - 8 - 8*(n)))
+#endif
+
#define BITS_PER_TYPE(type) (sizeof(type) * BITS_PER_BYTE)
#define BITS_TO_LONGS(nr) DIV_ROUND_UP(nr, BITS_PER_TYPE(long))
extern enum cpuhp_smt_control cpu_smt_control;
extern void cpu_smt_disable(bool force);
extern void cpu_smt_check_topology(void);
+extern bool cpu_smt_possible(void);
extern int cpuhp_smt_enable(void);
extern int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval);
#else
# define cpu_smt_control (CPU_SMT_NOT_IMPLEMENTED)
static inline void cpu_smt_disable(bool force) { }
static inline void cpu_smt_check_topology(void) { }
+static inline bool cpu_smt_possible(void) { return false; }
static inline int cpuhp_smt_enable(void) { return 0; }
static inline int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval) { return 0; }
#endif
#define SJA1105_META_SMAC 0x222222222222ull
#define SJA1105_META_DMAC 0x0180C200000Eull
+#define SJA1105_HWTS_RX_EN 0
+
/* Global tagger data: each struct sja1105_port has a reference to
* the structure defined in struct sja1105_private.
*/
* from taggers running on multiple ports on SMP systems
*/
spinlock_t meta_lock;
- bool hwts_rx_en;
+ unsigned long state;
};
struct sja1105_skb_cb {
extern int generic_setlease(struct file *, long, struct file_lock **, void **priv);
extern int vfs_setlease(struct file *, long, struct file_lock **, void **);
extern int lease_modify(struct file_lock *, int, struct list_head *);
+
+struct notifier_block;
+extern int lease_register_notifier(struct notifier_block *);
+extern void lease_unregister_notifier(struct notifier_block *);
+
struct files_struct;
extern void show_fd_locks(struct seq_file *f,
struct file *filp, struct files_struct *files);
extern void fsnotify_detach_mark(struct fsnotify_mark *mark);
/* free mark */
extern void fsnotify_free_mark(struct fsnotify_mark *mark);
+/* Wait until all marks queued for destruction are destroyed */
+extern void fsnotify_wait_marks_destroyed(void);
/* run all the marks in a group, and clear all of the marks attached to given object type */
extern void fsnotify_clear_marks_by_group(struct fsnotify_group *group, unsigned int type);
/* run all the marks in a group, and clear all of the vfsmount marks */
}
extern struct page *alloc_pages_vma(gfp_t gfp_mask, int order,
struct vm_area_struct *vma, unsigned long addr,
- int node);
+ int node, bool hugepage);
+#define alloc_hugepage_vma(gfp_mask, vma, addr, order) \
+ alloc_pages_vma(gfp_mask, order, vma, addr, numa_node_id(), true)
#else
#define alloc_pages(gfp_mask, order) \
alloc_pages_node(numa_node_id(), gfp_mask, order)
-#define alloc_pages_vma(gfp_mask, order, vma, addr, node)\
+#define alloc_pages_vma(gfp_mask, order, vma, addr, node, false)\
+ alloc_pages(gfp_mask, order)
+#define alloc_hugepage_vma(gfp_mask, vma, addr, order) \
alloc_pages(gfp_mask, order)
#endif
#define alloc_page(gfp_mask) alloc_pages(gfp_mask, 0)
#define alloc_page_vma(gfp_mask, vma, addr) \
- alloc_pages_vma(gfp_mask, 0, vma, addr, numa_node_id())
+ alloc_pages_vma(gfp_mask, 0, vma, addr, numa_node_id(), false)
#define alloc_page_vma_node(gfp_mask, vma, addr, node) \
- alloc_pages_vma(gfp_mask, 0, vma, addr, node)
+ alloc_pages_vma(gfp_mask, 0, vma, addr, node, false)
extern unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order);
extern unsigned long get_zeroed_page(gfp_t gfp_mask);
if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_FLAG))
return true;
-
+ /*
+ * For dax vmas, try to always use hugepage mappings. If the kernel does
+ * not support hugepages, fsdax mappings will fallback to PAGE_SIZE
+ * mappings, and device-dax namespaces, that try to guarantee a given
+ * mapping size, will fail to enable
+ */
if (vma_is_dax(vma))
return true;
return 0;
}
#endif /* CONFIG_IMA_APPRAISE */
+
+#if defined(CONFIG_IMA_APPRAISE) && defined(CONFIG_INTEGRITY_TRUSTED_KEYRING)
+extern bool ima_appraise_signature(enum kernel_read_file_id func);
+#else
+static inline bool ima_appraise_signature(enum kernel_read_file_id func)
+{
+ return false;
+}
+#endif /* CONFIG_IMA_APPRAISE && CONFIG_INTEGRITY_TRUSTED_KEYRING */
#endif /* _LINUX_IMA_H */
unsigned long cmdline_len);
typedef int (kexec_cleanup_t)(void *loader_data);
-#ifdef CONFIG_KEXEC_VERIFY_SIG
+#ifdef CONFIG_KEXEC_SIG
typedef int (kexec_verify_sig_t)(const char *kernel_buf,
unsigned long kernel_len);
#endif
kexec_probe_t *probe;
kexec_load_t *load;
kexec_cleanup_t *cleanup;
-#ifdef CONFIG_KEXEC_VERIFY_SIG
+#ifdef CONFIG_KEXEC_SIG
kexec_verify_sig_t *verify_sig;
#endif
};
struct kvm_stat_data {
int offset;
+ int mode;
struct kvm *kvm;
};
const char *name;
int offset;
enum kvm_stat_kind kind;
+ int mode;
};
extern struct kvm_stats_debugfs_item debugfs_entries[];
extern struct dentry *kvm_debugfs_dir;
* @bpf_prog_free_security:
* Clean up the security information stored inside bpf prog.
*
+ * @locked_down
+ * Determine whether a kernel feature that potentially enables arbitrary
+ * code execution in kernel space should be permitted.
+ *
+ * @what: kernel feature being accessed
*/
union security_list_options {
int (*binder_set_context_mgr)(struct task_struct *mgr);
int (*bpf_prog_alloc_security)(struct bpf_prog_aux *aux);
void (*bpf_prog_free_security)(struct bpf_prog_aux *aux);
#endif /* CONFIG_BPF_SYSCALL */
+ int (*locked_down)(enum lockdown_reason what);
};
struct security_hook_heads {
struct hlist_head bpf_prog_alloc_security;
struct hlist_head bpf_prog_free_security;
#endif /* CONFIG_BPF_SYSCALL */
+ struct hlist_head locked_down;
} __randomize_layout;
/*
};
extern struct lsm_info __start_lsm_info[], __end_lsm_info[];
+extern struct lsm_info __start_early_lsm_info[], __end_early_lsm_info[];
#define DEFINE_LSM(lsm) \
static struct lsm_info __lsm_##lsm \
__used __section(.lsm_info.init) \
__aligned(sizeof(unsigned long))
+#define DEFINE_EARLY_LSM(lsm) \
+ static struct lsm_info __early_lsm_##lsm \
+ __used __section(.early_lsm_info.init) \
+ __aligned(sizeof(unsigned long))
+
#ifdef CONFIG_SECURITY_SELINUX_DISABLE
/*
* Assuring the safety of deleting a security module is up to
struct mempolicy *get_task_policy(struct task_struct *p);
struct mempolicy *__get_vma_policy(struct vm_area_struct *vma,
unsigned long addr);
-struct mempolicy *get_vma_policy(struct vm_area_struct *vma,
- unsigned long addr);
bool vma_policy_mof(struct vm_area_struct *vma);
extern void numa_default_policy(void);
*/
struct vmem_altmap {
const unsigned long base_pfn;
+ const unsigned long end_pfn;
const unsigned long reserve;
unsigned long free;
unsigned long align;
lp_advertising, lpa & LPA_LPACK);
}
+static inline void mii_ctrl1000_mod_linkmode_adv_t(unsigned long *advertising,
+ u32 ctrl1000)
+{
+ linkmode_mod_bit(ETHTOOL_LINK_MODE_1000baseT_Half_BIT, advertising,
+ ctrl1000 & ADVERTISE_1000HALF);
+ linkmode_mod_bit(ETHTOOL_LINK_MODE_1000baseT_Full_BIT, advertising,
+ ctrl1000 & ADVERTISE_1000FULL);
+}
+
/**
* linkmode_adv_to_lcl_adv_t
* @advertising:pointer to linkmode advertising
MLX5_CMD_OP_ALLOC_MODIFY_HEADER_CONTEXT = 0x940,
MLX5_CMD_OP_DEALLOC_MODIFY_HEADER_CONTEXT = 0x941,
MLX5_CMD_OP_QUERY_MODIFY_HEADER_CONTEXT = 0x942,
- MLX5_CMD_OP_SYNC_STEERING = 0xb00,
MLX5_CMD_OP_FPGA_CREATE_QP = 0x960,
MLX5_CMD_OP_FPGA_MODIFY_QP = 0x961,
MLX5_CMD_OP_FPGA_QUERY_QP = 0x962,
MLX5_CMD_OP_DESTROY_UCTX = 0xa06,
MLX5_CMD_OP_CREATE_UMEM = 0xa08,
MLX5_CMD_OP_DESTROY_UMEM = 0xa0a,
+ MLX5_CMD_OP_SYNC_STEERING = 0xb00,
MLX5_CMD_OP_MAX
};
struct mlx5_ifc_fte_match_set_misc_bits {
u8 gre_c_present[0x1];
- u8 reserved_auto1[0x1];
+ u8 reserved_at_1[0x1];
u8 gre_k_present[0x1];
u8 gre_s_present[0x1];
u8 source_vhca_port[0x4];
struct mlx5_ifc_other_hca_cap_bits {
u8 roce[0x1];
- u8 reserved_0[0x27f];
+ u8 reserved_at_1[0x27f];
};
struct mlx5_ifc_query_other_hca_cap_out_bits {
u8 status[0x8];
- u8 reserved_0[0x18];
+ u8 reserved_at_8[0x18];
u8 syndrome[0x20];
- u8 reserved_1[0x40];
+ u8 reserved_at_40[0x40];
struct mlx5_ifc_other_hca_cap_bits other_capability;
};
struct mlx5_ifc_query_other_hca_cap_in_bits {
u8 opcode[0x10];
- u8 reserved_0[0x10];
+ u8 reserved_at_10[0x10];
- u8 reserved_1[0x10];
+ u8 reserved_at_20[0x10];
u8 op_mod[0x10];
- u8 reserved_2[0x10];
+ u8 reserved_at_40[0x10];
u8 function_id[0x10];
- u8 reserved_3[0x20];
+ u8 reserved_at_60[0x20];
};
struct mlx5_ifc_modify_other_hca_cap_out_bits {
u8 status[0x8];
- u8 reserved_0[0x18];
+ u8 reserved_at_8[0x18];
u8 syndrome[0x20];
- u8 reserved_1[0x40];
+ u8 reserved_at_40[0x40];
};
struct mlx5_ifc_modify_other_hca_cap_in_bits {
u8 opcode[0x10];
- u8 reserved_0[0x10];
+ u8 reserved_at_10[0x10];
- u8 reserved_1[0x10];
+ u8 reserved_at_20[0x10];
u8 op_mod[0x10];
- u8 reserved_2[0x10];
+ u8 reserved_at_40[0x10];
u8 function_id[0x10];
u8 field_select[0x20];
unsigned long highest_vm_end; /* highest vma end address */
pgd_t * pgd;
+#ifdef CONFIG_MEMBARRIER
+ /**
+ * @membarrier_state: Flags controlling membarrier behavior.
+ *
+ * This field is close to @pgd to hopefully fit in the same
+ * cache-line, which needs to be touched by switch_mm().
+ */
+ atomic_t membarrier_state;
+#endif
+
/**
* @mm_users: The number of users including userspace.
*
unsigned long flags; /* Must use atomic bitops to access */
struct core_state *core_state; /* coredumping support */
-#ifdef CONFIG_MEMBARRIER
- atomic_t membarrier_state;
-#endif
+
#ifdef CONFIG_AIO
spinlock_t ioctx_lock;
struct kioctx_table __rcu *ioctx_table;
#include <linux/percpu.h>
#include <asm/module.h>
-/* In stripped ARM and x86-64 modules, ~ is surprisingly rare. */
-#define MODULE_SIG_STRING "~Module signature appended~\n"
-
/* Not Yet Implemented */
#define MODULE_SUPPORTED_DEVICE(name)
--- /dev/null
+/* SPDX-License-Identifier: GPL-2.0+ */
+/*
+ * Module signature handling.
+ *
+ * Copyright (C) 2012 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ */
+
+#ifndef _LINUX_MODULE_SIGNATURE_H
+#define _LINUX_MODULE_SIGNATURE_H
+
+#include <linux/types.h>
+
+/* In stripped ARM and x86-64 modules, ~ is surprisingly rare. */
+#define MODULE_SIG_STRING "~Module signature appended~\n"
+
+enum pkey_id_type {
+ PKEY_ID_PGP, /* OpenPGP generated key ID */
+ PKEY_ID_X509, /* X.509 arbitrary subjectKeyIdentifier */
+ PKEY_ID_PKCS7, /* Signature in PKCS#7 message */
+};
+
+/*
+ * Module signature information block.
+ *
+ * The constituents of the signature section are, in order:
+ *
+ * - Signer's name
+ * - Key identifier
+ * - Signature data
+ * - Information block
+ */
+struct module_signature {
+ u8 algo; /* Public-key crypto algorithm [0] */
+ u8 hash; /* Digest algorithm [0] */
+ u8 id_type; /* Key identifier type [PKEY_ID_PKCS7] */
+ u8 signer_len; /* Length of signer's name [0] */
+ u8 key_id_len; /* Length of key identifier [0] */
+ u8 __pad[3];
+ __be32 sig_len; /* Length of signature data */
+};
+
+int mod_check_sig(const struct module_signature *ms, size_t file_len,
+ const char *name);
+
+#endif /* _LINUX_MODULE_SIGNATURE_H */
return phydev->state >= PHY_UP;
}
+void phy_resolve_aneg_pause(struct phy_device *phydev);
void phy_resolve_aneg_linkmode(struct phy_device *phydev);
/**
int __genphy_config_aneg(struct phy_device *phydev, bool changed);
int genphy_aneg_done(struct phy_device *phydev);
int genphy_update_link(struct phy_device *phydev);
+int genphy_read_lpa(struct phy_device *phydev);
int genphy_read_status(struct phy_device *phydev);
int genphy_suspend(struct phy_device *phydev);
int genphy_resume(struct phy_device *phydev);
+++ /dev/null
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * db8500_thermal.h - DB8500 Thermal Management Implementation
- *
- * Copyright (C) 2012 ST-Ericsson
- * Copyright (C) 2012 Linaro Ltd.
- *
- * Author: Hongbo Zhang <hongbo.zhang@linaro.com>
- */
-
-#ifndef _DB8500_THERMAL_H_
-#define _DB8500_THERMAL_H_
-
-#include <linux/thermal.h>
-
-#define COOLING_DEV_MAX 8
-
-struct db8500_trip_point {
- unsigned long temp;
- enum thermal_trip_type type;
- char cdev_name[COOLING_DEV_MAX][THERMAL_NAME_LENGTH];
-};
-
-struct db8500_thsens_platform_data {
- struct db8500_trip_point trip_points[THERMAL_MAX_TRIPS];
- int num_trips;
-};
-
-#endif /* _DB8500_THERMAL_H_ */
+++ /dev/null
-/* SPDX-License-Identifier: GPL-2.0-only */
-/*
- * Copyright (c) 2005 Sascha Hauer <s.hauer@pengutronix.de>, Pengutronix
- */
-
-#ifndef __ETH_NETX_H
-#define __ETH_NETX_H
-
-struct netxeth_platform_data {
- unsigned int xcno; /* number of xmac/xpec engine this eth uses */
-};
-
-#endif
int (*capture)(struct pwm_chip *chip, struct pwm_device *pwm,
struct pwm_capture *result, unsigned long timeout);
int (*apply)(struct pwm_chip *chip, struct pwm_device *pwm,
- struct pwm_state *state);
+ const struct pwm_state *state);
void (*get_state)(struct pwm_chip *chip, struct pwm_device *pwm,
struct pwm_state *state);
struct module *owner;
/* PWM user APIs */
struct pwm_device *pwm_request(int pwm_id, const char *label);
void pwm_free(struct pwm_device *pwm);
-int pwm_apply_state(struct pwm_device *pwm, struct pwm_state *state);
+int pwm_apply_state(struct pwm_device *pwm, const struct pwm_state *state);
int pwm_adjust_config(struct pwm_device *pwm);
/**
/*
* rcuwait provides a way of blocking and waking up a single
- * task in an rcu-safe manner; where it is forbidden to use
- * after exit_notify(). task_struct is not properly rcu protected,
- * unless dealing with rcu-aware lists, ie: find_task_by_*().
+ * task in an rcu-safe manner.
*
- * Alternatively we have task_rcu_dereference(), but the return
- * semantics have different implications which would break the
- * wakeup side. The only time @task is non-nil is when a user is
- * blocked (or checking if it needs to) on a condition, and reset
- * as soon as we know that the condition has succeeded and are
- * awoken.
+ * The only time @task is non-nil is when a user is blocked (or
+ * checking if it needs to) on a condition, and reset as soon as we
+ * know that the condition has succeeded and are awoken.
*/
struct rcuwait {
struct task_struct __rcu *task;
*/
#define rcuwait_wait_event(w, condition) \
({ \
- /* \
- * Complain if we are called after do_exit()/exit_notify(), \
- * as we cannot rely on the rcu critical region for the \
- * wakeup side. \
- */ \
- WARN_ON(current->exit_state); \
- \
rcu_assign_pointer((w)->task, current); \
for (;;) { \
/* \
struct tlbflush_unmap_batch tlb_ubc;
- struct rcu_head rcu;
+ union {
+ refcount_t rcu_users;
+ struct rcu_head rcu;
+ };
/* Cache last used pipe for splice(): */
struct pipe_inode_info *splice_pipe;
* running or not.
*/
#ifndef vcpu_is_preempted
-# define vcpu_is_preempted(cpu) false
+static inline bool vcpu_is_preempted(int cpu)
+{
+ return false;
+}
#endif
extern long sched_setaffinity(pid_t pid, const struct cpumask *new_mask);
static inline void membarrier_mm_sync_core_before_usermode(struct mm_struct *mm)
{
+ if (current->mm != mm)
+ return;
if (likely(!(atomic_read(&mm->membarrier_state) &
MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE)))
return;
sync_core_before_usermode();
}
-static inline void membarrier_execve(struct task_struct *t)
-{
- atomic_set(&t->mm->membarrier_state, 0);
-}
+extern void membarrier_exec_mmap(struct mm_struct *mm);
+
#else
#ifdef CONFIG_ARCH_HAS_MEMBARRIER_CALLBACKS
static inline void membarrier_arch_switch_mm(struct mm_struct *prev,
{
}
#endif
-static inline void membarrier_execve(struct task_struct *t)
+static inline void membarrier_exec_mmap(struct mm_struct *mm)
{
}
static inline void membarrier_mm_sync_core_before_usermode(struct mm_struct *mm)
__put_task_struct(t);
}
-struct task_struct *task_rcu_dereference(struct task_struct **ptask);
+void put_task_struct_rcu_user(struct task_struct *task);
#ifdef CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT
extern int arch_task_struct_size __read_mostly;
LSM_POLICY_CHANGE,
};
+/*
+ * These are reasons that can be passed to the security_locked_down()
+ * LSM hook. Lockdown reasons that protect kernel integrity (ie, the
+ * ability for userland to modify kernel code) are placed before
+ * LOCKDOWN_INTEGRITY_MAX. Lockdown reasons that protect kernel
+ * confidentiality (ie, the ability for userland to extract
+ * information from the running kernel that would otherwise be
+ * restricted) are placed before LOCKDOWN_CONFIDENTIALITY_MAX.
+ *
+ * LSM authors should note that the semantics of any given lockdown
+ * reason are not guaranteed to be stable - the same reason may block
+ * one set of features in one kernel release, and a slightly different
+ * set of features in a later kernel release. LSMs that seek to expose
+ * lockdown policy at any level of granularity other than "none",
+ * "integrity" or "confidentiality" are responsible for either
+ * ensuring that they expose a consistent level of functionality to
+ * userland, or ensuring that userland is aware that this is
+ * potentially a moving target. It is easy to misuse this information
+ * in a way that could break userspace. Please be careful not to do
+ * so.
+ *
+ * If you add to this, remember to extend lockdown_reasons in
+ * security/lockdown/lockdown.c.
+ */
+enum lockdown_reason {
+ LOCKDOWN_NONE,
+ LOCKDOWN_MODULE_SIGNATURE,
+ LOCKDOWN_DEV_MEM,
+ LOCKDOWN_KEXEC,
+ LOCKDOWN_HIBERNATION,
+ LOCKDOWN_PCI_ACCESS,
+ LOCKDOWN_IOPORT,
+ LOCKDOWN_MSR,
+ LOCKDOWN_ACPI_TABLES,
+ LOCKDOWN_PCMCIA_CIS,
+ LOCKDOWN_TIOCSSERIAL,
+ LOCKDOWN_MODULE_PARAMETERS,
+ LOCKDOWN_MMIOTRACE,
+ LOCKDOWN_DEBUGFS,
+ LOCKDOWN_INTEGRITY_MAX,
+ LOCKDOWN_KCORE,
+ LOCKDOWN_KPROBES,
+ LOCKDOWN_BPF_READ,
+ LOCKDOWN_PERF,
+ LOCKDOWN_TRACEFS,
+ LOCKDOWN_CONFIDENTIALITY_MAX,
+};
+
/* These functions are in security/commoncap.c */
extern int cap_capable(const struct cred *cred, struct user_namespace *ns,
int cap, unsigned int opts);
/* prototypes */
extern int security_init(void);
+extern int early_security_init(void);
/* Security operations */
int security_binder_set_context_mgr(struct task_struct *mgr);
int security_inode_notifysecctx(struct inode *inode, void *ctx, u32 ctxlen);
int security_inode_setsecctx(struct dentry *dentry, void *ctx, u32 ctxlen);
int security_inode_getsecctx(struct inode *inode, void **ctx, u32 *ctxlen);
+int security_locked_down(enum lockdown_reason what);
#else /* CONFIG_SECURITY */
static inline int call_blocking_lsm_notifier(enum lsm_event event, void *data)
return 0;
}
+static inline int early_security_init(void)
+{
+ return 0;
+}
+
static inline int security_binder_set_context_mgr(struct task_struct *mgr)
{
return 0;
{
return -EOPNOTSUPP;
}
+static inline int security_locked_down(enum lockdown_reason what)
+{
+ return 0;
+}
#endif /* CONFIG_SECURITY */
#ifdef CONFIG_SECURITY_NETWORK
return NULL;
}
+
+static inline void skb_ext_reset(struct sk_buff *skb)
+{
+ if (unlikely(skb->active_extensions)) {
+ __skb_ext_put(skb->extensions);
+ skb->active_extensions = 0;
+ }
+}
#else
static inline void skb_ext_put(struct sk_buff *skb) {}
+static inline void skb_ext_reset(struct sk_buff *skb) {}
static inline void skb_ext_del(struct sk_buff *skb, int unused) {}
static inline void __skb_ext_copy(struct sk_buff *d, const struct sk_buff *s) {}
static inline void skb_ext_copy(struct sk_buff *dst, const struct sk_buff *s) {}
#endif /* CONFIG_SKB_EXTENSIONS */
-static inline void nf_reset(struct sk_buff *skb)
+static inline void nf_reset_ct(struct sk_buff *skb)
{
#if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
nf_conntrack_put(skb_nfct(skb));
skb->_nfct = 0;
#endif
-#if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
- skb_ext_del(skb, SKB_EXT_BRIDGE_NF);
-#endif
}
static inline void nf_reset_trace(struct sk_buff *skb)
int has_died);
struct cache_head * (*alloc)(void);
+ void (*flush)(void);
int (*match)(struct cache_head *orig, struct cache_head *new);
void (*init)(struct cache_head *orig, struct cache_head *new);
void (*update)(struct cache_head *orig, struct cache_head *new);
/* fields for communication over channel */
struct list_head queue;
- atomic_t readers; /* how many time is /chennel open */
- time_t last_close; /* if no readers, when did last close */
- time_t last_warn; /* when we last warned about no readers */
+ atomic_t writers; /* how many time is /channel open */
+ time_t last_close; /* if no writers, when did last close */
+ time_t last_warn; /* when we last warned about no writers */
union {
struct proc_dir_entry *procfs;
#ifndef SVC_RDMA_H
#define SVC_RDMA_H
+#include <linux/llist.h>
#include <linux/sunrpc/xdr.h>
#include <linux/sunrpc/svcsock.h>
#include <linux/sunrpc/rpc_rdma.h>
struct list_head sc_read_complete_q;
struct work_struct sc_work;
- spinlock_t sc_recv_lock;
- struct list_head sc_recv_ctxts;
+ struct llist_head sc_recv_ctxts;
};
/* sc_flags */
#define RDMAXPRT_CONN_PENDING 3
#define RPCSVC_MAXPAYLOAD_RDMA RPCSVC_MAXPAYLOAD
struct svc_rdma_recv_ctxt {
+ struct llist_node rc_node;
struct list_head rc_list;
struct ib_recv_wr rc_recv_wr;
struct ib_cqe rc_cqe;
#endif
/* svc_rdma.c */
-extern struct workqueue_struct *svc_rdma_wq;
extern int svc_rdma_init(void);
extern void svc_rdma_cleanup(void);
#endif /* ARCH_HAS_NOCACHE_UACCESS */
+extern __must_check int check_zeroed_user(const void __user *from, size_t size);
+
+/**
+ * copy_struct_from_user: copy a struct from userspace
+ * @dst: Destination address, in kernel space. This buffer must be @ksize
+ * bytes long.
+ * @ksize: Size of @dst struct.
+ * @src: Source address, in userspace.
+ * @usize: (Alleged) size of @src struct.
+ *
+ * Copies a struct from userspace to kernel space, in a way that guarantees
+ * backwards-compatibility for struct syscall arguments (as long as future
+ * struct extensions are made such that all new fields are *appended* to the
+ * old struct, and zeroed-out new fields have the same meaning as the old
+ * struct).
+ *
+ * @ksize is just sizeof(*dst), and @usize should've been passed by userspace.
+ * The recommended usage is something like the following:
+ *
+ * SYSCALL_DEFINE2(foobar, const struct foo __user *, uarg, size_t, usize)
+ * {
+ * int err;
+ * struct foo karg = {};
+ *
+ * if (usize > PAGE_SIZE)
+ * return -E2BIG;
+ * if (usize < FOO_SIZE_VER0)
+ * return -EINVAL;
+ *
+ * err = copy_struct_from_user(&karg, sizeof(karg), uarg, usize);
+ * if (err)
+ * return err;
+ *
+ * // ...
+ * }
+ *
+ * There are three cases to consider:
+ * * If @usize == @ksize, then it's copied verbatim.
+ * * If @usize < @ksize, then the userspace has passed an old struct to a
+ * newer kernel. The rest of the trailing bytes in @dst (@ksize - @usize)
+ * are to be zero-filled.
+ * * If @usize > @ksize, then the userspace has passed a new struct to an
+ * older kernel. The trailing bytes unknown to the kernel (@usize - @ksize)
+ * are checked to ensure they are zeroed, otherwise -E2BIG is returned.
+ *
+ * Returns (in all cases, some data may have been copied):
+ * * -E2BIG: (@usize > @ksize) and there are non-zero trailing bytes in @src.
+ * * -EFAULT: access to userspace failed.
+ */
+static __always_inline __must_check int
+copy_struct_from_user(void *dst, size_t ksize, const void __user *src,
+ size_t usize)
+{
+ size_t size = min(ksize, usize);
+ size_t rest = max(ksize, usize) - size;
+
+ /* Deal with trailing bytes. */
+ if (usize < ksize) {
+ memset(dst + size, 0, rest);
+ } else if (usize > ksize) {
+ int ret = check_zeroed_user(src + size, rest);
+ if (ret <= 0)
+ return ret ?: -E2BIG;
+ }
+ /* Copy the interoperable parts of the struct. */
+ if (copy_from_user(dst, src, size))
+ return -EFAULT;
+ return 0;
+}
+
/*
* probe_kernel_read(): safely attempt to read from a location
* @dst: pointer to the buffer that shall take the data
#ifdef CONFIG_SYSTEM_DATA_VERIFICATION
struct key;
+struct pkcs7_message;
extern int verify_pkcs7_signature(const void *data, size_t len,
const void *raw_pkcs7, size_t pkcs7_len,
const void *data, size_t len,
size_t asn1hdrlen),
void *ctx);
+extern int verify_pkcs7_message_sig(const void *data, size_t len,
+ struct pkcs7_message *pkcs7,
+ struct key *trusted_keys,
+ enum key_being_used_for usage,
+ int (*view_content)(void *ctx,
+ const void *data,
+ size_t len,
+ size_t asn1hdrlen),
+ void *ctx);
#ifdef CONFIG_SIGNED_PE_FILE_VERIFICATION
extern int verify_pefile_signature(const void *pebuf, unsigned pelen,
tw_pad : 2, /* 2 bits hole */
tw_tos : 8;
u32 tw_txhash;
+ u32 tw_priority;
struct timer_list tw_timer;
struct inet_bind_bucket *tw_tb;
};
* upper-layer output functions
*/
int ip6_xmit(const struct sock *sk, struct sk_buff *skb, struct flowi6 *fl6,
- __u32 mark, struct ipv6_txoptions *opt, int tclass);
+ __u32 mark, struct ipv6_txoptions *opt, int tclass, u32 priority);
int ip6_find_1stfragopt(struct sk_buff *skb, u8 **nexthdr);
NFT_CHAIN_HW_OFFLOAD = 0x2,
};
+#define NFT_CHAIN_POLICY_UNSET U8_MAX
+
/**
* struct nft_chain - nf_tables chain
*
const struct nlattr *nla,
u8 genmask);
+void nf_tables_deactivate_flowtable(const struct nft_ctx *ctx,
+ struct nft_flowtable *flowtable,
+ enum nft_trans_phase phase);
+
void nft_register_flowtable_type(struct nf_flowtable_type *type);
void nft_unregister_flowtable_type(struct nf_flowtable_type *type);
unsigned int rt_flags;
__u16 rt_type;
__u8 rt_is_input;
- u8 rt_gw_family;
+ __u8 rt_uses_gateway;
int rt_iif;
+ u8 rt_gw_family;
/* Info on neighbour */
union {
__be32 rt_gw4;
return q;
}
+static inline struct Qdisc *qdisc_root_bh(const struct Qdisc *qdisc)
+{
+ return rcu_dereference_bh(qdisc->dev_queue->qdisc);
+}
+
static inline struct Qdisc *qdisc_root_sleeping(const struct Qdisc *qdisc)
{
return qdisc->dev_queue->qdisc_sleeping;
__entry->gfp_flags = gfp_flags;
),
- TP_printk("call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s",
- __entry->call_site,
+ TP_printk("call_site=%pS ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s",
+ (void *)__entry->call_site,
__entry->ptr,
__entry->bytes_req,
__entry->bytes_alloc,
__entry->ptr = ptr;
),
- TP_printk("call_site=%lx ptr=%p", __entry->call_site, __entry->ptr)
+ TP_printk("call_site=%pS ptr=%p",
+ (void *)__entry->call_site, __entry->ptr)
);
DEFINE_EVENT(kmem_free, kfree,
),
TP_fast_assign(
- __entry->call = call->debug_id;
+ __entry->call = call ? call->debug_id : 0;
__entry->why = why;
__entry->seq = seq;
__entry->offset = offset;
__u64 high_va_max;
/* gfx10 pa_sc_tile_steering_override */
__u32 pa_sc_tile_steering_override;
+ /* disabled TCCs */
+ __u64 tcc_disabled_mask;
};
struct drm_amdgpu_info_hw_ip {
};
/* Max # of type identifier */
-#define BTF_MAX_TYPE 0x0000ffff
+#define BTF_MAX_TYPE 0x000fffff
/* Max offset into the string section */
-#define BTF_MAX_NAME_OFFSET 0x0000ffff
+#define BTF_MAX_NAME_OFFSET 0x00ffffff
/* Max # of struct/union/enum members or func args */
#define BTF_MAX_VLEN 0xffff
*
* 7.31
* - add FUSE_WRITE_KILL_PRIV flag
+ * - add FUSE_SETUPMAPPING and FUSE_REMOVEMAPPING
+ * - add map_alignment to fuse_init_out, add FUSE_MAP_ALIGNMENT flag
*/
#ifndef _LINUX_FUSE_H
* FUSE_CACHE_SYMLINKS: cache READLINK responses
* FUSE_NO_OPENDIR_SUPPORT: kernel supports zero-message opendir
* FUSE_EXPLICIT_INVAL_DATA: only invalidate cached pages on explicit request
+ * FUSE_MAP_ALIGNMENT: map_alignment field is valid
*/
#define FUSE_ASYNC_READ (1 << 0)
#define FUSE_POSIX_LOCKS (1 << 1)
#define FUSE_CACHE_SYMLINKS (1 << 23)
#define FUSE_NO_OPENDIR_SUPPORT (1 << 24)
#define FUSE_EXPLICIT_INVAL_DATA (1 << 25)
+#define FUSE_MAP_ALIGNMENT (1 << 26)
/**
* CUSE INIT request/reply flags
FUSE_RENAME2 = 45,
FUSE_LSEEK = 46,
FUSE_COPY_FILE_RANGE = 47,
+ FUSE_SETUPMAPPING = 48,
+ FUSE_REMOVEMAPPING = 49,
/* CUSE specific operations */
CUSE_INIT = 4096,
uint32_t max_write;
uint32_t time_gran;
uint16_t max_pages;
- uint16_t padding;
+ uint16_t map_alignment;
uint32_t unused[8];
};
#define KVM_CAP_ARM_PTRAUTH_GENERIC 172
#define KVM_CAP_PMU_EVENT_FILTER 173
#define KVM_CAP_ARM_IRQ_LINE_LAYOUT_2 174
+#define KVM_CAP_HYPERV_DIRECT_TLBFLUSH 175
#ifdef KVM_CAP_IRQ_ROUTING
#define KVM_REG_S390 0x5000000000000000ULL
#define KVM_REG_ARM64 0x6000000000000000ULL
#define KVM_REG_MIPS 0x7000000000000000ULL
+#define KVM_REG_RISCV 0x8000000000000000ULL
#define KVM_REG_SIZE_SHIFT 52
#define KVM_REG_SIZE_MASK 0x00f0000000000000ULL
union {
struct {
char name[EBT_EXTENSION_MAXNAMELEN];
- uint8_t revision;
+ __u8 revision;
};
struct xt_match *match;
} u;
union {
struct {
char name[EBT_EXTENSION_MAXNAMELEN];
- uint8_t revision;
+ __u8 revision;
};
struct xt_target *watcher;
} u;
union {
struct {
char name[EBT_EXTENSION_MAXNAMELEN];
- uint8_t revision;
+ __u8 revision;
};
struct xt_target *target;
} u;
#include <linux/types.h>
/* latest upcall version available */
-#define CLD_UPCALL_VERSION 1
+#define CLD_UPCALL_VERSION 2
/* defined by RFC3530 */
#define NFS4_OPAQUE_LIMIT 1024
+#ifndef SHA256_DIGEST_SIZE
+#define SHA256_DIGEST_SIZE 32
+#endif
+
enum cld_command {
Cld_Create, /* create a record for this cm_id */
Cld_Remove, /* remove record of this cm_id */
Cld_Check, /* is this cm_id allowed? */
Cld_GraceDone, /* grace period is complete */
- Cld_GraceStart,
+ Cld_GraceStart, /* grace start (upload client records) */
+ Cld_GetVersion, /* query max supported upcall version */
};
/* representation of long-form NFSv4 client ID */
unsigned char cn_id[NFS4_OPAQUE_LIMIT]; /* client-provided */
} __attribute__((packed));
+/* sha256 hash of the kerberos principal */
+struct cld_princhash {
+ __u8 cp_len; /* length of cp_data */
+ unsigned char cp_data[SHA256_DIGEST_SIZE]; /* hash of principal */
+} __attribute__((packed));
+
+struct cld_clntinfo {
+ struct cld_name cc_name;
+ struct cld_princhash cc_princhash;
+} __attribute__((packed));
+
/* message struct for communication with userspace */
struct cld_msg {
__u8 cm_vers; /* upcall version */
union {
__s64 cm_gracetime; /* grace period start time */
struct cld_name cm_name;
+ __u8 cm_version; /* for getting max version */
+ } __attribute__((packed)) cm_u;
+} __attribute__((packed));
+
+/* version 2 message can include hash of kerberos principal */
+struct cld_msg_v2 {
+ __u8 cm_vers; /* upcall version */
+ __u8 cm_cmd; /* upcall command */
+ __s16 cm_status; /* return code */
+ __u32 cm_xid; /* transaction id */
+ union {
+ struct cld_name cm_name;
+ __u8 cm_version; /* for getting max version */
+ struct cld_clntinfo cm_clntinfo; /* name & princ hash */
} __attribute__((packed)) cm_u;
} __attribute__((packed));
+struct cld_msg_hdr {
+ __u8 cm_vers; /* upcall version */
+ __u8 cm_cmd; /* upcall command */
+ __s16 cm_status; /* return code */
+ __u32 cm_xid; /* transaction id */
+} __attribute__((packed));
+
#endif /* !_NFSD_CLD_H */
__u32 result;
};
+struct nvme_passthru_cmd64 {
+ __u8 opcode;
+ __u8 flags;
+ __u16 rsvd1;
+ __u32 nsid;
+ __u32 cdw2;
+ __u32 cdw3;
+ __u64 metadata;
+ __u64 addr;
+ __u32 metadata_len;
+ __u32 data_len;
+ __u32 cdw10;
+ __u32 cdw11;
+ __u32 cdw12;
+ __u32 cdw13;
+ __u32 cdw14;
+ __u32 cdw15;
+ __u32 timeout_ms;
+ __u64 result;
+};
+
#define nvme_admin_cmd nvme_passthru_cmd
#define NVME_IOCTL_ID _IO('N', 0x40)
#define NVME_IOCTL_RESET _IO('N', 0x44)
#define NVME_IOCTL_SUBSYS_RESET _IO('N', 0x45)
#define NVME_IOCTL_RESCAN _IO('N', 0x46)
+#define NVME_IOCTL_ADMIN64_CMD _IOWR('N', 0x47, struct nvme_passthru_cmd64)
+#define NVME_IOCTL_IO64_CMD _IOWR('N', 0x48, struct nvme_passthru_cmd64)
#endif /* _UAPI_LINUX_NVME_IOCTL_H */
*/
+#ifndef _UAPI_LINUX_PG_H
+#define _UAPI_LINUX_PG_H
+
#define PG_MAGIC 'P'
#define PG_RESET 'Z'
#define PG_COMMAND 'C'
};
-/* end of pg.h */
+#endif /* _UAPI_LINUX_PG_H */
#define PTP_ENABLE_FEATURE (1<<0)
#define PTP_RISING_EDGE (1<<1)
#define PTP_FALLING_EDGE (1<<2)
+
+/*
+ * flag fields valid for the new PTP_EXTTS_REQUEST2 ioctl.
+ */
#define PTP_EXTTS_VALID_FLAGS (PTP_ENABLE_FEATURE | \
PTP_RISING_EDGE | \
PTP_FALLING_EDGE)
+/*
+ * flag fields valid for the original PTP_EXTTS_REQUEST ioctl.
+ * DO NOT ADD NEW FLAGS HERE.
+ */
+#define PTP_EXTTS_V1_VALID_FLAGS (PTP_ENABLE_FEATURE | \
+ PTP_RISING_EDGE | \
+ PTP_FALLING_EDGE)
+
/*
* Bits of the ptp_perout_request.flags field:
*/
#define PTP_PEROUT_ONE_SHOT (1<<0)
+
+/*
+ * flag fields valid for the new PTP_PEROUT_REQUEST2 ioctl.
+ */
#define PTP_PEROUT_VALID_FLAGS (PTP_PEROUT_ONE_SHOT)
+
+/*
+ * No flags are valid for the original PTP_PEROUT_REQUEST ioctl
+ */
+#define PTP_PEROUT_V1_VALID_FLAGS (0)
+
/*
* struct ptp_clock_time - represents a time value
*
#define CLONE_NEWNET 0x40000000 /* New network namespace */
#define CLONE_IO 0x80000000 /* Clone io context */
-/*
- * Arguments for the clone3 syscall
+#ifndef __ASSEMBLY__
+/**
+ * struct clone_args - arguments for the clone3 syscall
+ * @flags: Flags for the new process as listed above.
+ * All flags are valid except for CSIGNAL and
+ * CLONE_DETACHED.
+ * @pidfd: If CLONE_PIDFD is set, a pidfd will be
+ * returned in this argument.
+ * @child_tid: If CLONE_CHILD_SETTID is set, the TID of the
+ * child process will be returned in the child's
+ * memory.
+ * @parent_tid: If CLONE_PARENT_SETTID is set, the TID of
+ * the child process will be returned in the
+ * parent's memory.
+ * @exit_signal: The exit_signal the parent process will be
+ * sent when the child exits.
+ * @stack: Specify the location of the stack for the
+ * child process.
+ * @stack_size: The size of the stack for the child process.
+ * @tls: If CLONE_SETTLS is set, the tls descriptor
+ * is set to tls.
+ *
+ * The structure is versioned by size and thus extensible.
+ * New struct members must go at the end of the struct and
+ * must be properly 64bit aligned.
*/
struct clone_args {
__aligned_u64 flags;
__aligned_u64 stack_size;
__aligned_u64 tls;
};
+#endif
+
+#define CLONE_ARGS_SIZE_VER0 64 /* sizeof first published struct */
/*
* Scheduling policies
--- /dev/null
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) */
+
+#ifndef _UAPI_LINUX_VIRTIO_FS_H
+#define _UAPI_LINUX_VIRTIO_FS_H
+
+#include <linux/types.h>
+#include <linux/virtio_ids.h>
+#include <linux/virtio_config.h>
+#include <linux/virtio_types.h>
+
+struct virtio_fs_config {
+ /* Filesystem name (UTF-8, not NUL-terminated, padded with NULs) */
+ __u8 tag[36];
+
+ /* Number of request queues */
+ __u32 num_request_queues;
+} __attribute__((packed));
+
+#endif /* _UAPI_LINUX_VIRTIO_FS_H */
#define VIRTIO_ID_VSOCK 19 /* virtio vsock transport */
#define VIRTIO_ID_CRYPTO 20 /* virtio crypto */
#define VIRTIO_ID_IOMMU 23 /* virtio IOMMU */
+#define VIRTIO_ID_FS 26 /* virtio filesystem */
#define VIRTIO_ID_PMEM 27 /* virtio pmem */
#endif /* _LINUX_VIRTIO_IDS_H */
bool xen_running_on_version_or_later(unsigned int major, unsigned int minor);
-efi_status_t xen_efi_get_time(efi_time_t *tm, efi_time_cap_t *tc);
-efi_status_t xen_efi_set_time(efi_time_t *tm);
-efi_status_t xen_efi_get_wakeup_time(efi_bool_t *enabled, efi_bool_t *pending,
- efi_time_t *tm);
-efi_status_t xen_efi_set_wakeup_time(efi_bool_t enabled, efi_time_t *tm);
-efi_status_t xen_efi_get_variable(efi_char16_t *name, efi_guid_t *vendor,
- u32 *attr, unsigned long *data_size,
- void *data);
-efi_status_t xen_efi_get_next_variable(unsigned long *name_size,
- efi_char16_t *name, efi_guid_t *vendor);
-efi_status_t xen_efi_set_variable(efi_char16_t *name, efi_guid_t *vendor,
- u32 attr, unsigned long data_size,
- void *data);
-efi_status_t xen_efi_query_variable_info(u32 attr, u64 *storage_space,
- u64 *remaining_space,
- u64 *max_variable_size);
-efi_status_t xen_efi_get_next_high_mono_count(u32 *count);
-efi_status_t xen_efi_update_capsule(efi_capsule_header_t **capsules,
- unsigned long count, unsigned long sg_list);
-efi_status_t xen_efi_query_capsule_caps(efi_capsule_header_t **capsules,
- unsigned long count, u64 *max_size,
- int *reset_type);
-void xen_efi_reset_system(int reset_type, efi_status_t status,
- unsigned long data_size, efi_char16_t *data);
+void xen_efi_runtime_setup(void);
#ifdef CONFIG_PREEMPT
default 0 if BASE_FULL
default 1 if !BASE_FULL
+config MODULE_SIG_FORMAT
+ def_bool n
+ select SYSTEM_DATA_VERIFICATION
+
menuconfig MODULES
bool "Enable loadable module support"
option modules
config MODULE_SIG
bool "Module signature verification"
- select SYSTEM_DATA_VERIFICATION
+ select MODULE_SIG_FORMAT
help
Check modules for valid signatures upon load: the signature
is simply appended to the module. For more information see
kernel build dependency so that the signing tool can use its crypto
library.
+ You should enable this option if you wish to use either
+ CONFIG_SECURITY_LOCKDOWN_LSM or lockdown functionality imposed via
+ another LSM - otherwise unsigned modules will be loadable regardless
+ of the lockdown policy.
+
!!!WARNING!!! If you enable this option, you MUST make sure that the
module DOES NOT get stripped after being signed. This includes the
debuginfo strip done by some packagers (such as rpmbuild) and
boot_cpu_init();
page_address_init();
pr_notice("%s", linux_banner);
+ early_security_init();
setup_arch(&command_line);
setup_command_line(command_line);
setup_nr_cpu_ids();
obj-$(CONFIG_UID16) += uid16.o
obj-$(CONFIG_MODULES) += module.o
obj-$(CONFIG_MODULE_SIG) += module_signing.o
+obj-$(CONFIG_MODULE_SIG_FORMAT) += module_signature.o
obj-$(CONFIG_KALLSYMS) += kallsyms.o
obj-$(CONFIG_BSD_PROCESS_ACCT) += acct.o
obj-$(CONFIG_CRASH_CORE) += crash_core.o
if (BITS_PER_BYTE_MASKED(struct_bits_off)) {
btf_verifier_log_member(env, struct_type, member,
"Member is not byte aligned");
- return -EINVAL;
+ return -EINVAL;
}
nr_bits = int_bitsize;
return -EINVAL;
}
- if (t->size != sizeof(int)) {
- btf_verifier_log_type(env, t, "Expected size:%zu",
- sizeof(int));
+ if (t->size > 8 || !is_power_of_2(t->size)) {
+ btf_verifier_log_type(env, t, "Unexpected size");
return -EINVAL;
}
node = kzalloc(sizeof(*node), GFP_ATOMIC | __GFP_NOWARN);
if (!node)
- return NULL;
+ return ERR_PTR(-ENOMEM);
err = xsk_map_inc(map);
if (err) {
void __init cpu_smt_disable(bool force)
{
- if (cpu_smt_control == CPU_SMT_FORCE_DISABLED ||
- cpu_smt_control == CPU_SMT_NOT_SUPPORTED)
+ if (!cpu_smt_possible())
return;
if (force) {
*/
return !cpumask_test_cpu(cpu, &cpus_booted_once_mask);
}
+
+/* Returns true if SMT is not supported of forcefully (irreversibly) disabled */
+bool cpu_smt_possible(void)
+{
+ return cpu_smt_control != CPU_SMT_FORCE_DISABLED &&
+ cpu_smt_control != CPU_SMT_NOT_SUPPORTED;
+}
+EXPORT_SYMBOL_GPL(cpu_smt_possible);
#else
static inline bool cpu_smt_allowed(unsigned int cpu) { return true; }
#endif
*/
void dma_common_free_remap(void *cpu_addr, size_t size)
{
- struct page **pages = dma_common_find_pages(cpu_addr);
+ struct vm_struct *area = find_vm_area(cpu_addr);
- if (!pages) {
+ if (!area || area->flags != VM_DMA_COHERENT) {
WARN(1, "trying to free invalid coherent area: %p\n", cpu_addr);
return;
}
u32 size;
int ret;
- if (!access_ok(uattr, PERF_ATTR_SIZE_VER0))
- return -EFAULT;
-
- /*
- * zero the full structure, so that a short copy will be nice.
- */
+ /* Zero the full structure, so that a short copy will be nice. */
memset(attr, 0, sizeof(*attr));
ret = get_user(size, &uattr->size);
if (ret)
return ret;
- if (size > PAGE_SIZE) /* silly large */
- goto err_size;
-
- if (!size) /* abi compat */
+ /* ABI compatibility quirk: */
+ if (!size)
size = PERF_ATTR_SIZE_VER0;
-
- if (size < PERF_ATTR_SIZE_VER0)
+ if (size < PERF_ATTR_SIZE_VER0 || size > PAGE_SIZE)
goto err_size;
- /*
- * If we're handed a bigger struct than we know of,
- * ensure all the unknown bits are 0 - i.e. new
- * user-space does not rely on any kernel feature
- * extensions we dont know about yet.
- */
- if (size > sizeof(*attr)) {
- unsigned char __user *addr;
- unsigned char __user *end;
- unsigned char val;
-
- addr = (void __user *)uattr + sizeof(*attr);
- end = (void __user *)uattr + size;
-
- for (; addr < end; addr++) {
- ret = get_user(val, addr);
- if (ret)
- return ret;
- if (val)
- goto err_size;
- }
- size = sizeof(*attr);
+ ret = copy_struct_from_user(attr, sizeof(*attr), uattr, size);
+ if (ret) {
+ if (ret == -E2BIG)
+ goto err_size;
+ return ret;
}
- ret = copy_from_user(attr, uattr, size);
- if (ret)
- return -EFAULT;
-
attr->size = size;
if (attr->__reserved_1)
perf_paranoid_kernel() && !capable(CAP_SYS_ADMIN))
return -EACCES;
+ err = security_locked_down(LOCKDOWN_PERF);
+ if (err && (attr.sample_type & PERF_SAMPLE_REGS_INTR))
+ /* REGS_INTR can leak data, lockdown must prevent this */
+ return err;
+
+ err = 0;
+
/*
* In cgroup mode, the pid argument is used to pass the fd
* opened to the cgroup directory in cgroupfs. The cpu argument
put_task_struct(tsk);
}
+void put_task_struct_rcu_user(struct task_struct *task)
+{
+ if (refcount_dec_and_test(&task->rcu_users))
+ call_rcu(&task->rcu, delayed_put_task_struct);
+}
void release_task(struct task_struct *p)
{
write_unlock_irq(&tasklist_lock);
release_thread(p);
- call_rcu(&p->rcu, delayed_put_task_struct);
+ put_task_struct_rcu_user(p);
p = leader;
if (unlikely(zap_leader))
goto repeat;
}
-/*
- * Note that if this function returns a valid task_struct pointer (!NULL)
- * task->usage must remain >0 for the duration of the RCU critical section.
- */
-struct task_struct *task_rcu_dereference(struct task_struct **ptask)
-{
- struct sighand_struct *sighand;
- struct task_struct *task;
-
- /*
- * We need to verify that release_task() was not called and thus
- * delayed_put_task_struct() can't run and drop the last reference
- * before rcu_read_unlock(). We check task->sighand != NULL,
- * but we can read the already freed and reused memory.
- */
-retry:
- task = rcu_dereference(*ptask);
- if (!task)
- return NULL;
-
- probe_kernel_address(&task->sighand, sighand);
-
- /*
- * Pairs with atomic_dec_and_test() in put_task_struct(). If this task
- * was already freed we can not miss the preceding update of this
- * pointer.
- */
- smp_rmb();
- if (unlikely(task != READ_ONCE(*ptask)))
- goto retry;
-
- /*
- * We've re-checked that "task == *ptask", now we have two different
- * cases:
- *
- * 1. This is actually the same task/task_struct. In this case
- * sighand != NULL tells us it is still alive.
- *
- * 2. This is another task which got the same memory for task_struct.
- * We can't know this of course, and we can not trust
- * sighand != NULL.
- *
- * In this case we actually return a random value, but this is
- * correct.
- *
- * If we return NULL - we can pretend that we actually noticed that
- * *ptask was updated when the previous task has exited. Or pretend
- * that probe_slab_address(&sighand) reads NULL.
- *
- * If we return the new task (because sighand is not NULL for any
- * reason) - this is fine too. This (new) task can't go away before
- * another gp pass.
- *
- * And note: We could even eliminate the false positive if re-read
- * task->sighand once again to avoid the falsely NULL. But this case
- * is very unlikely so we don't care.
- */
- if (!sighand)
- return NULL;
-
- return task;
-}
-
void rcuwait_wake_up(struct rcuwait *w)
{
struct task_struct *task;
*/
smp_mb(); /* (B) */
- /*
- * Avoid using task_rcu_dereference() magic as long as we are careful,
- * see comment in rcuwait_wait_event() regarding ->exit_state.
- */
task = rcu_dereference(w->task);
if (task)
wake_up_process(task);
tsk->cpus_ptr = &tsk->cpus_mask;
/*
- * One for us, one for whoever does the "release_task()" (usually
- * parent)
+ * One for the user space visible state that goes away when reaped.
+ * One for the scheduler.
*/
- refcount_set(&tsk->usage, 2);
+ refcount_set(&tsk->rcu_users, 2);
+ /* One for the rcu users */
+ refcount_set(&tsk->usage, 1);
#ifdef CONFIG_BLK_DEV_IO_TRACE
tsk->btrace_seq = 0;
#endif
#ifdef __ARCH_WANT_SYS_CLONE3
noinline static int copy_clone_args_from_user(struct kernel_clone_args *kargs,
struct clone_args __user *uargs,
- size_t size)
+ size_t usize)
{
+ int err;
struct clone_args args;
- if (unlikely(size > PAGE_SIZE))
+ if (unlikely(usize > PAGE_SIZE))
return -E2BIG;
-
- if (unlikely(size < sizeof(struct clone_args)))
+ if (unlikely(usize < CLONE_ARGS_SIZE_VER0))
return -EINVAL;
- if (unlikely(!access_ok(uargs, size)))
- return -EFAULT;
-
- if (size > sizeof(struct clone_args)) {
- unsigned char __user *addr;
- unsigned char __user *end;
- unsigned char val;
-
- addr = (void __user *)uargs + sizeof(struct clone_args);
- end = (void __user *)uargs + size;
-
- for (; addr < end; addr++) {
- if (get_user(val, addr))
- return -EFAULT;
- if (val)
- return -E2BIG;
- }
-
- size = sizeof(struct clone_args);
- }
-
- if (copy_from_user(&args, uargs, size))
- return -EFAULT;
+ err = copy_struct_from_user(&args, sizeof(args), uargs, usize);
+ if (err)
+ return err;
/*
* Verify that higher 32bits of exit_signal are unset and that
return true;
}
+/**
+ * clone3 - create a new process with specific properties
+ * @uargs: argument structure
+ * @size: size of @uargs
+ *
+ * clone3() is the extensible successor to clone()/clone2().
+ * It takes a struct as argument that is versioned by its size.
+ *
+ * Return: On success, a positive PID for the child process.
+ * On error, a negative errno number.
+ */
SYSCALL_DEFINE2(clone3, struct clone_args __user *, uargs, size_t, size)
{
int err;
find $cpio_dir -type f -print0 |
xargs -0 -P8 -n1 perl -pi -e 'BEGIN {undef $/;}; s/\/\*((?!SPDX).)*?\*\///smg;'
-tar -Jcf $tarfile -C $cpio_dir/ . > /dev/null
+# Create archive and try to normalize metadata for reproducibility
+tar "${KBUILD_BUILD_TIMESTAMP:+--mtime=$KBUILD_BUILD_TIMESTAMP}" \
+ --owner=0 --group=0 --sort=name --numeric-owner \
+ -Jcf $tarfile -C $cpio_dir/ . > /dev/null
echo "$src_files_md5" > kernel/kheaders.md5
echo "$obj_files_md5" >> kernel/kheaders.md5
if (result < 0)
return result;
+ /*
+ * kexec can be used to circumvent module loading restrictions, so
+ * prevent loading in that case
+ */
+ result = security_locked_down(LOCKDOWN_KEXEC);
+ if (result)
+ return result;
+
/*
* Verify we have a legal set of flags
* This leaves us room for future extensions.
return kexec_image_post_load_cleanup_default(image);
}
-#ifdef CONFIG_KEXEC_VERIFY_SIG
+#ifdef CONFIG_KEXEC_SIG
static int kexec_image_verify_sig_default(struct kimage *image, void *buf,
unsigned long buf_len)
{
image->image_loader_data = NULL;
}
+#ifdef CONFIG_KEXEC_SIG
+static int
+kimage_validate_signature(struct kimage *image)
+{
+ const char *reason;
+ int ret;
+
+ ret = arch_kexec_kernel_verify_sig(image, image->kernel_buf,
+ image->kernel_buf_len);
+ switch (ret) {
+ case 0:
+ break;
+
+ /* Certain verification errors are non-fatal if we're not
+ * checking errors, provided we aren't mandating that there
+ * must be a valid signature.
+ */
+ case -ENODATA:
+ reason = "kexec of unsigned image";
+ goto decide;
+ case -ENOPKG:
+ reason = "kexec of image with unsupported crypto";
+ goto decide;
+ case -ENOKEY:
+ reason = "kexec of image with unavailable key";
+ decide:
+ if (IS_ENABLED(CONFIG_KEXEC_SIG_FORCE)) {
+ pr_notice("%s rejected\n", reason);
+ return ret;
+ }
+
+ /* If IMA is guaranteed to appraise a signature on the kexec
+ * image, permit it even if the kernel is otherwise locked
+ * down.
+ */
+ if (!ima_appraise_signature(READING_KEXEC_IMAGE) &&
+ security_locked_down(LOCKDOWN_KEXEC))
+ return -EPERM;
+
+ return 0;
+
+ /* All other errors are fatal, including nomem, unparseable
+ * signatures and signature check failures - even if signatures
+ * aren't required.
+ */
+ default:
+ pr_notice("kernel signature verification failed (%d).\n", ret);
+ }
+
+ return ret;
+}
+#endif
+
/*
* In file mode list of segments is prepared by kernel. Copy relevant
* data from user space, do error checking, prepare segment list
const char __user *cmdline_ptr,
unsigned long cmdline_len, unsigned flags)
{
- int ret = 0;
+ int ret;
void *ldata;
loff_t size;
if (ret)
goto out;
-#ifdef CONFIG_KEXEC_VERIFY_SIG
- ret = arch_kexec_kernel_verify_sig(image, image->kernel_buf,
- image->kernel_buf_len);
- if (ret) {
- pr_debug("kernel signature verification failed.\n");
+#ifdef CONFIG_KEXEC_SIG
+ ret = kimage_validate_signature(image);
+
+ if (ret)
goto out;
- }
- pr_debug("kernel signature verification successful.\n");
#endif
/* It is possible that there no initramfs is being loaded */
if (!(flags & KEXEC_FILE_NO_INITRAMFS)) {
if ((loop & PV_PREV_CHECK_MASK) != 0)
return false;
- return READ_ONCE(prev->state) != vcpu_running || vcpu_is_preempted(prev->cpu);
+ return READ_ONCE(prev->state) != vcpu_running;
}
/*
#include <linux/export.h>
#include <linux/extable.h>
#include <linux/moduleloader.h>
+#include <linux/module_signature.h>
#include <linux/trace_events.h>
#include <linux/init.h>
#include <linux/kallsyms.h>
#ifdef CONFIG_MODULE_SIG
static int module_sig_check(struct load_info *info, int flags)
{
- int err = -ENOKEY;
+ int err = -ENODATA;
const unsigned long markerlen = sizeof(MODULE_SIG_STRING) - 1;
+ const char *reason;
const void *mod = info->hdr;
/*
err = mod_verify_sig(mod, info);
}
- if (!err) {
+ switch (err) {
+ case 0:
info->sig_ok = true;
return 0;
- }
- /* Not having a signature is only an error if we're strict. */
- if (err == -ENOKEY && !is_module_sig_enforced())
- err = 0;
+ /* We don't permit modules to be loaded into trusted kernels
+ * without a valid signature on them, but if we're not
+ * enforcing, certain errors are non-fatal.
+ */
+ case -ENODATA:
+ reason = "Loading of unsigned module";
+ goto decide;
+ case -ENOPKG:
+ reason = "Loading of module with unsupported crypto";
+ goto decide;
+ case -ENOKEY:
+ reason = "Loading of module with unavailable key";
+ decide:
+ if (is_module_sig_enforced()) {
+ pr_notice("%s is rejected\n", reason);
+ return -EKEYREJECTED;
+ }
- return err;
+ return security_locked_down(LOCKDOWN_MODULE_SIGNATURE);
+
+ /* All other errors are fatal, including nomem, unparseable
+ * signatures and signature check failures - even if signatures
+ * aren't required.
+ */
+ default:
+ return err;
+ }
}
#else /* !CONFIG_MODULE_SIG */
static int module_sig_check(struct load_info *info, int flags)
--- /dev/null
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Module signature checker
+ *
+ * Copyright (C) 2012 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ */
+
+#include <linux/errno.h>
+#include <linux/printk.h>
+#include <linux/module_signature.h>
+#include <asm/byteorder.h>
+
+/**
+ * mod_check_sig - check that the given signature is sane
+ *
+ * @ms: Signature to check.
+ * @file_len: Size of the file to which @ms is appended.
+ * @name: What is being checked. Used for error messages.
+ */
+int mod_check_sig(const struct module_signature *ms, size_t file_len,
+ const char *name)
+{
+ if (be32_to_cpu(ms->sig_len) >= file_len - sizeof(*ms))
+ return -EBADMSG;
+
+ if (ms->id_type != PKEY_ID_PKCS7) {
+ pr_err("%s: Module is not signed with expected PKCS#7 message\n",
+ name);
+ return -ENOPKG;
+ }
+
+ if (ms->algo != 0 ||
+ ms->hash != 0 ||
+ ms->signer_len != 0 ||
+ ms->key_id_len != 0 ||
+ ms->__pad[0] != 0 ||
+ ms->__pad[1] != 0 ||
+ ms->__pad[2] != 0) {
+ pr_err("%s: PKCS#7 signature info has unexpected non-zero params\n",
+ name);
+ return -EBADMSG;
+ }
+
+ return 0;
+}
#include <linux/kernel.h>
#include <linux/errno.h>
+#include <linux/module.h>
+#include <linux/module_signature.h>
#include <linux/string.h>
#include <linux/verification.h>
#include <crypto/public_key.h>
#include "module-internal.h"
-enum pkey_id_type {
- PKEY_ID_PGP, /* OpenPGP generated key ID */
- PKEY_ID_X509, /* X.509 arbitrary subjectKeyIdentifier */
- PKEY_ID_PKCS7, /* Signature in PKCS#7 message */
-};
-
-/*
- * Module signature information block.
- *
- * The constituents of the signature section are, in order:
- *
- * - Signer's name
- * - Key identifier
- * - Signature data
- * - Information block
- */
-struct module_signature {
- u8 algo; /* Public-key crypto algorithm [0] */
- u8 hash; /* Digest algorithm [0] */
- u8 id_type; /* Key identifier type [PKEY_ID_PKCS7] */
- u8 signer_len; /* Length of signer's name [0] */
- u8 key_id_len; /* Length of key identifier [0] */
- u8 __pad[3];
- __be32 sig_len; /* Length of signature data */
-};
-
/*
* Verify the signature on a module.
*/
{
struct module_signature ms;
size_t sig_len, modlen = info->len;
+ int ret;
pr_devel("==>%s(,%zu)\n", __func__, modlen);
return -EBADMSG;
memcpy(&ms, mod + (modlen - sizeof(ms)), sizeof(ms));
- modlen -= sizeof(ms);
+
+ ret = mod_check_sig(&ms, modlen, info->name);
+ if (ret)
+ return ret;
sig_len = be32_to_cpu(ms.sig_len);
- if (sig_len >= modlen)
- return -EBADMSG;
- modlen -= sig_len;
+ modlen -= sig_len + sizeof(ms);
info->len = modlen;
- if (ms.id_type != PKEY_ID_PKCS7) {
- pr_err("%s: Module is not signed with expected PKCS#7 message\n",
- info->name);
- return -ENOPKG;
- }
-
- if (ms.algo != 0 ||
- ms.hash != 0 ||
- ms.signer_len != 0 ||
- ms.key_id_len != 0 ||
- ms.__pad[0] != 0 ||
- ms.__pad[1] != 0 ||
- ms.__pad[2] != 0) {
- pr_err("%s: PKCS#7 signature info has unexpected non-zero params\n",
- info->name);
- return -EBADMSG;
- }
-
return verify_pkcs7_signature(mod, modlen, mod + modlen, sig_len,
VERIFY_USE_SECONDARY_KEYRING,
VERIFYING_MODULE_SIGNATURE,
#include <linux/err.h>
#include <linux/slab.h>
#include <linux/ctype.h>
+#include <linux/security.h>
#ifdef CONFIG_SYSFS
/* Protects all built-in parameters, modules use their own param_lock */
return parameqn(a, b, strlen(a)+1);
}
-static void param_check_unsafe(const struct kernel_param *kp)
+static bool param_check_unsafe(const struct kernel_param *kp)
{
+ if (kp->flags & KERNEL_PARAM_FL_HWPARAM &&
+ security_locked_down(LOCKDOWN_MODULE_PARAMETERS))
+ return false;
+
if (kp->flags & KERNEL_PARAM_FL_UNSAFE) {
pr_notice("Setting dangerous option %s - tainting kernel\n",
kp->name);
add_taint(TAINT_USER, LOCKDEP_STILL_OK);
}
+
+ return true;
}
static int parse_one(char *param,
pr_debug("handling %s with %p\n", param,
params[i].ops->set);
kernel_param_lock(params[i].mod);
- param_check_unsafe(¶ms[i]);
- err = params[i].ops->set(val, ¶ms[i]);
+ if (param_check_unsafe(¶ms[i]))
+ err = params[i].ops->set(val, ¶ms[i]);
+ else
+ err = -EPERM;
kernel_param_unlock(params[i].mod);
return err;
}
return -EPERM;
kernel_param_lock(mk->mod);
- param_check_unsafe(attribute->param);
- err = attribute->param->ops->set(buf, attribute->param);
+ if (param_check_unsafe(attribute->param))
+ err = attribute->param->ops->set(buf, attribute->param);
+ else
+ err = -EPERM;
kernel_param_unlock(mk->mod);
if (!err)
return len;
#include <linux/ctype.h>
#include <linux/genhd.h>
#include <linux/ktime.h>
+#include <linux/security.h>
#include <trace/events/power.h>
#include "power.h"
bool hibernation_available(void)
{
- return (nohibernate == 0);
+ return nohibernate == 0 && !security_locked_down(LOCKDOWN_HIBERNATION);
}
/**
if (cpumask_equal(p->cpus_ptr, new_mask))
goto out;
- if (!cpumask_intersects(new_mask, cpu_valid_mask)) {
+ dest_cpu = cpumask_any_and(cpu_valid_mask, new_mask);
+ if (dest_cpu >= nr_cpu_ids) {
ret = -EINVAL;
goto out;
}
if (cpumask_test_cpu(task_cpu(p), new_mask))
goto out;
- dest_cpu = cpumask_any_and(cpu_valid_mask, new_mask);
if (task_running(rq, p) || p->state == TASK_WAKING) {
struct migration_arg arg = { p, dest_cpu };
/* Need help from migration thread: drop lock and wait. */
/* Task is done with its stack. */
put_task_stack(prev);
- put_task_struct(prev);
+ put_task_struct_rcu_user(prev);
}
tick_nohz_task_switch();
else
prev->active_mm = NULL;
} else { // to user
+ membarrier_switch_mm(rq, prev->active_mm, next->mm);
/*
* sys_membarrier() requires an smp_mb() between setting
- * rq->curr and returning to userspace.
+ * rq->curr / membarrier_switch_mm() and returning to userspace.
*
* The below provides this either through switch_mm(), or in
* case 'prev->active_mm == next->mm' through
* finish_task_switch()'s mmdrop().
*/
-
switch_mm_irqs_off(prev->active_mm, next->mm, next);
if (!prev->mm) { // from kernel
if (likely(prev != next)) {
rq->nr_switches++;
- rq->curr = next;
+ /*
+ * RCU users of rcu_dereference(rq->curr) may not see
+ * changes to task_struct made by pick_next_task().
+ */
+ RCU_INIT_POINTER(rq->curr, next);
/*
* The membarrier system call requires each architecture
* to have a full memory barrier after updating
#ifdef CONFIG_PREEMPTION
/*
- * this is the entry point to schedule() from in-kernel preemption
- * off of preempt_enable. Kernel preemptions off return from interrupt
- * occur there and call schedule directly.
+ * This is the entry point to schedule() from in-kernel preemption
+ * off of preempt_enable.
*/
asmlinkage __visible void __sched notrace preempt_schedule(void)
{
#endif /* CONFIG_PREEMPTION */
/*
- * this is the entry point to schedule() from kernel preemption
+ * This is the entry point to schedule() from kernel preemption
* off of irq context.
* Note, that this is called and return with irqs disabled. This will
* protect us against recursive calling from irq.
u32 size;
int ret;
- if (!access_ok(uattr, SCHED_ATTR_SIZE_VER0))
- return -EFAULT;
-
/* Zero the full structure, so that a short copy will be nice: */
memset(attr, 0, sizeof(*attr));
if (ret)
return ret;
- /* Bail out on silly large: */
- if (size > PAGE_SIZE)
- goto err_size;
-
/* ABI compatibility quirk: */
if (!size)
size = SCHED_ATTR_SIZE_VER0;
-
- if (size < SCHED_ATTR_SIZE_VER0)
+ if (size < SCHED_ATTR_SIZE_VER0 || size > PAGE_SIZE)
goto err_size;
- /*
- * If we're handed a bigger struct than we know of,
- * ensure all the unknown bits are 0 - i.e. new
- * user-space does not rely on any kernel feature
- * extensions we dont know about yet.
- */
- if (size > sizeof(*attr)) {
- unsigned char __user *addr;
- unsigned char __user *end;
- unsigned char val;
-
- addr = (void __user *)uattr + sizeof(*attr);
- end = (void __user *)uattr + size;
-
- for (; addr < end; addr++) {
- ret = get_user(val, addr);
- if (ret)
- return ret;
- if (val)
- goto err_size;
- }
- size = sizeof(*attr);
+ ret = copy_struct_from_user(attr, sizeof(*attr), uattr, size);
+ if (ret) {
+ if (ret == -E2BIG)
+ goto err_size;
+ return ret;
}
- ret = copy_from_user(attr, uattr, size);
- if (ret)
- return -EFAULT;
-
if ((attr->sched_flags & SCHED_FLAG_UTIL_CLAMP) &&
size < SCHED_ATTR_SIZE_VER1)
return -EINVAL;
* sys_sched_getattr - similar to sched_getparam, but with sched_attr
* @pid: the pid in question.
* @uattr: structure containing the extended parameters.
- * @usize: sizeof(attr) that user-space knows about, for forwards and backwards compatibility.
+ * @usize: sizeof(attr) for fwd/bwd comp.
* @flags: for future extension.
*/
SYSCALL_DEFINE4(sched_getattr, pid_t, pid, struct sched_attr __user *, uattr,
__set_task_cpu(idle, cpu);
rcu_read_unlock();
- rq->curr = rq->idle = idle;
+ rq->idle = idle;
+ rcu_assign_pointer(rq->curr, idle);
idle->on_rq = TASK_ON_RQ_QUEUED;
#ifdef CONFIG_SMP
idle->on_cpu = 1;
}
rq_unlock_irqrestore(rq, &rf);
- update_max_interval();
-
return 0;
}
/* when this task enqueue'ed, it will contribute to its cfs_rq's load_avg */
}
-static inline u64 cfs_rq_clock_task(struct cfs_rq *cfs_rq);
static void attach_entity_cfs_rq(struct sched_entity *se);
/*
return;
rcu_read_lock();
- cur = task_rcu_dereference(&dst_rq->curr);
+ cur = rcu_dereference(dst_rq->curr);
if (cur && ((cur->flags & PF_EXITING) || is_idle_task(cur)))
cur = NULL;
}
/*
- * Replenish runtime according to assigned quota and update expiration time.
- * We use sched_clock_cpu directly instead of rq->clock to avoid adding
- * additional synchronization around rq->lock.
+ * Replenish runtime according to assigned quota. We use sched_clock_cpu
+ * directly instead of rq->clock to avoid adding additional synchronization
+ * around rq->lock.
*
* requires cfs_b->lock
*/
void __refill_cfs_bandwidth_runtime(struct cfs_bandwidth *cfs_b)
{
- u64 now;
-
- if (cfs_b->quota == RUNTIME_INF)
- return;
-
- now = sched_clock_cpu(smp_processor_id());
- cfs_b->runtime = cfs_b->quota;
+ if (cfs_b->quota != RUNTIME_INF)
+ cfs_b->runtime = cfs_b->quota;
}
static inline struct cfs_bandwidth *tg_cfs_bandwidth(struct task_group *tg)
return &tg->cfs_bandwidth;
}
-/* rq->task_clock normalized against any time this cfs_rq has spent throttled */
-static inline u64 cfs_rq_clock_task(struct cfs_rq *cfs_rq)
-{
- if (unlikely(cfs_rq->throttle_count))
- return cfs_rq->throttled_clock_task - cfs_rq->throttled_clock_task_time;
-
- return rq_clock_task(rq_of(cfs_rq)) - cfs_rq->throttled_clock_task_time;
-}
-
/* returns 0 on failure to allocate runtime */
static int assign_cfs_rq_runtime(struct cfs_rq *cfs_rq)
{
cfs_rq->throttle_count--;
if (!cfs_rq->throttle_count) {
- /* adjust cfs_rq_clock_task() */
cfs_rq->throttled_clock_task_time += rq_clock_task(rq) -
cfs_rq->throttled_clock_task;
void start_cfs_bandwidth(struct cfs_bandwidth *cfs_b)
{
- u64 overrun;
-
lockdep_assert_held(&cfs_b->lock);
if (cfs_b->period_active)
return;
cfs_b->period_active = 1;
- overrun = hrtimer_forward_now(&cfs_b->period_timer, cfs_b->period);
+ hrtimer_forward_now(&cfs_b->period_timer, cfs_b->period);
hrtimer_start_expires(&cfs_b->period_timer, HRTIMER_MODE_ABS_PINNED);
}
return false;
}
-static inline u64 cfs_rq_clock_task(struct cfs_rq *cfs_rq)
-{
- return rq_clock_task(rq_of(cfs_rq));
-}
-
static void account_cfs_rq_runtime(struct cfs_rq *cfs_rq, u64 delta_exec) {}
static bool check_cfs_rq_runtime(struct cfs_rq *cfs_rq) { return false; }
static void check_enqueue_throttle(struct cfs_rq *cfs_rq) {}
}
/* Evaluate the energy impact of using this CPU. */
- if (max_spare_cap_cpu >= 0) {
+ if (max_spare_cap_cpu >= 0 && max_spare_cap_cpu != prev_cpu) {
cur_delta = compute_energy(p, max_spare_cap_cpu, pd);
cur_delta -= base_energy_pd;
if (cur_delta < best_delta) {
smp_mb(); /* IPIs should be serializing but paranoid. */
}
+static void ipi_sync_rq_state(void *info)
+{
+ struct mm_struct *mm = (struct mm_struct *) info;
+
+ if (current->mm != mm)
+ return;
+ this_cpu_write(runqueues.membarrier_state,
+ atomic_read(&mm->membarrier_state));
+ /*
+ * Issue a memory barrier after setting
+ * MEMBARRIER_STATE_GLOBAL_EXPEDITED in the current runqueue to
+ * guarantee that no memory access following registration is reordered
+ * before registration.
+ */
+ smp_mb();
+}
+
+void membarrier_exec_mmap(struct mm_struct *mm)
+{
+ /*
+ * Issue a memory barrier before clearing membarrier_state to
+ * guarantee that no memory access prior to exec is reordered after
+ * clearing this state.
+ */
+ smp_mb();
+ atomic_set(&mm->membarrier_state, 0);
+ /*
+ * Keep the runqueue membarrier_state in sync with this mm
+ * membarrier_state.
+ */
+ this_cpu_write(runqueues.membarrier_state, 0);
+}
+
static int membarrier_global_expedited(void)
{
int cpu;
- bool fallback = false;
cpumask_var_t tmpmask;
if (num_online_cpus() == 1)
*/
smp_mb(); /* system call entry is not a mb. */
- /*
- * Expedited membarrier commands guarantee that they won't
- * block, hence the GFP_NOWAIT allocation flag and fallback
- * implementation.
- */
- if (!zalloc_cpumask_var(&tmpmask, GFP_NOWAIT)) {
- /* Fallback for OOM. */
- fallback = true;
- }
+ if (!zalloc_cpumask_var(&tmpmask, GFP_KERNEL))
+ return -ENOMEM;
cpus_read_lock();
+ rcu_read_lock();
for_each_online_cpu(cpu) {
struct task_struct *p;
if (cpu == raw_smp_processor_id())
continue;
- rcu_read_lock();
- p = task_rcu_dereference(&cpu_rq(cpu)->curr);
- if (p && p->mm && (atomic_read(&p->mm->membarrier_state) &
- MEMBARRIER_STATE_GLOBAL_EXPEDITED)) {
- if (!fallback)
- __cpumask_set_cpu(cpu, tmpmask);
- else
- smp_call_function_single(cpu, ipi_mb, NULL, 1);
- }
- rcu_read_unlock();
- }
- if (!fallback) {
- preempt_disable();
- smp_call_function_many(tmpmask, ipi_mb, NULL, 1);
- preempt_enable();
- free_cpumask_var(tmpmask);
+ if (!(READ_ONCE(cpu_rq(cpu)->membarrier_state) &
+ MEMBARRIER_STATE_GLOBAL_EXPEDITED))
+ continue;
+
+ /*
+ * Skip the CPU if it runs a kernel thread. The scheduler
+ * leaves the prior task mm in place as an optimization when
+ * scheduling a kthread.
+ */
+ p = rcu_dereference(cpu_rq(cpu)->curr);
+ if (p->flags & PF_KTHREAD)
+ continue;
+
+ __cpumask_set_cpu(cpu, tmpmask);
}
+ rcu_read_unlock();
+
+ preempt_disable();
+ smp_call_function_many(tmpmask, ipi_mb, NULL, 1);
+ preempt_enable();
+
+ free_cpumask_var(tmpmask);
cpus_read_unlock();
/*
static int membarrier_private_expedited(int flags)
{
int cpu;
- bool fallback = false;
cpumask_var_t tmpmask;
+ struct mm_struct *mm = current->mm;
if (flags & MEMBARRIER_FLAG_SYNC_CORE) {
if (!IS_ENABLED(CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE))
return -EINVAL;
- if (!(atomic_read(¤t->mm->membarrier_state) &
+ if (!(atomic_read(&mm->membarrier_state) &
MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE_READY))
return -EPERM;
} else {
- if (!(atomic_read(¤t->mm->membarrier_state) &
+ if (!(atomic_read(&mm->membarrier_state) &
MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY))
return -EPERM;
}
- if (num_online_cpus() == 1)
+ if (atomic_read(&mm->mm_users) == 1 || num_online_cpus() == 1)
return 0;
/*
*/
smp_mb(); /* system call entry is not a mb. */
- /*
- * Expedited membarrier commands guarantee that they won't
- * block, hence the GFP_NOWAIT allocation flag and fallback
- * implementation.
- */
- if (!zalloc_cpumask_var(&tmpmask, GFP_NOWAIT)) {
- /* Fallback for OOM. */
- fallback = true;
- }
+ if (!zalloc_cpumask_var(&tmpmask, GFP_KERNEL))
+ return -ENOMEM;
cpus_read_lock();
+ rcu_read_lock();
for_each_online_cpu(cpu) {
struct task_struct *p;
*/
if (cpu == raw_smp_processor_id())
continue;
- rcu_read_lock();
- p = task_rcu_dereference(&cpu_rq(cpu)->curr);
- if (p && p->mm == current->mm) {
- if (!fallback)
- __cpumask_set_cpu(cpu, tmpmask);
- else
- smp_call_function_single(cpu, ipi_mb, NULL, 1);
- }
- rcu_read_unlock();
- }
- if (!fallback) {
- preempt_disable();
- smp_call_function_many(tmpmask, ipi_mb, NULL, 1);
- preempt_enable();
- free_cpumask_var(tmpmask);
+ p = rcu_dereference(cpu_rq(cpu)->curr);
+ if (p && p->mm == mm)
+ __cpumask_set_cpu(cpu, tmpmask);
}
+ rcu_read_unlock();
+
+ preempt_disable();
+ smp_call_function_many(tmpmask, ipi_mb, NULL, 1);
+ preempt_enable();
+
+ free_cpumask_var(tmpmask);
cpus_read_unlock();
/*
return 0;
}
+static int sync_runqueues_membarrier_state(struct mm_struct *mm)
+{
+ int membarrier_state = atomic_read(&mm->membarrier_state);
+ cpumask_var_t tmpmask;
+ int cpu;
+
+ if (atomic_read(&mm->mm_users) == 1 || num_online_cpus() == 1) {
+ this_cpu_write(runqueues.membarrier_state, membarrier_state);
+
+ /*
+ * For single mm user, we can simply issue a memory barrier
+ * after setting MEMBARRIER_STATE_GLOBAL_EXPEDITED in the
+ * mm and in the current runqueue to guarantee that no memory
+ * access following registration is reordered before
+ * registration.
+ */
+ smp_mb();
+ return 0;
+ }
+
+ if (!zalloc_cpumask_var(&tmpmask, GFP_KERNEL))
+ return -ENOMEM;
+
+ /*
+ * For mm with multiple users, we need to ensure all future
+ * scheduler executions will observe @mm's new membarrier
+ * state.
+ */
+ synchronize_rcu();
+
+ /*
+ * For each cpu runqueue, if the task's mm match @mm, ensure that all
+ * @mm's membarrier state set bits are also set in in the runqueue's
+ * membarrier state. This ensures that a runqueue scheduling
+ * between threads which are users of @mm has its membarrier state
+ * updated.
+ */
+ cpus_read_lock();
+ rcu_read_lock();
+ for_each_online_cpu(cpu) {
+ struct rq *rq = cpu_rq(cpu);
+ struct task_struct *p;
+
+ p = rcu_dereference(rq->curr);
+ if (p && p->mm == mm)
+ __cpumask_set_cpu(cpu, tmpmask);
+ }
+ rcu_read_unlock();
+
+ preempt_disable();
+ smp_call_function_many(tmpmask, ipi_sync_rq_state, mm, 1);
+ preempt_enable();
+
+ free_cpumask_var(tmpmask);
+ cpus_read_unlock();
+
+ return 0;
+}
+
static int membarrier_register_global_expedited(void)
{
struct task_struct *p = current;
struct mm_struct *mm = p->mm;
+ int ret;
if (atomic_read(&mm->membarrier_state) &
MEMBARRIER_STATE_GLOBAL_EXPEDITED_READY)
return 0;
atomic_or(MEMBARRIER_STATE_GLOBAL_EXPEDITED, &mm->membarrier_state);
- if (atomic_read(&mm->mm_users) == 1 && get_nr_threads(p) == 1) {
- /*
- * For single mm user, single threaded process, we can
- * simply issue a memory barrier after setting
- * MEMBARRIER_STATE_GLOBAL_EXPEDITED to guarantee that
- * no memory access following registration is reordered
- * before registration.
- */
- smp_mb();
- } else {
- /*
- * For multi-mm user threads, we need to ensure all
- * future scheduler executions will observe the new
- * thread flag state for this mm.
- */
- synchronize_rcu();
- }
+ ret = sync_runqueues_membarrier_state(mm);
+ if (ret)
+ return ret;
atomic_or(MEMBARRIER_STATE_GLOBAL_EXPEDITED_READY,
&mm->membarrier_state);
{
struct task_struct *p = current;
struct mm_struct *mm = p->mm;
- int state = MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY;
+ int ready_state = MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY,
+ set_state = MEMBARRIER_STATE_PRIVATE_EXPEDITED,
+ ret;
if (flags & MEMBARRIER_FLAG_SYNC_CORE) {
if (!IS_ENABLED(CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE))
return -EINVAL;
- state = MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE_READY;
+ ready_state =
+ MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE_READY;
}
/*
* groups, which use the same mm. (CLONE_VM but not
* CLONE_THREAD).
*/
- if (atomic_read(&mm->membarrier_state) & state)
+ if ((atomic_read(&mm->membarrier_state) & ready_state) == ready_state)
return 0;
- atomic_or(MEMBARRIER_STATE_PRIVATE_EXPEDITED, &mm->membarrier_state);
if (flags & MEMBARRIER_FLAG_SYNC_CORE)
- atomic_or(MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE,
- &mm->membarrier_state);
- if (!(atomic_read(&mm->mm_users) == 1 && get_nr_threads(p) == 1)) {
- /*
- * Ensure all future scheduler executions will observe the
- * new thread flag state for this process.
- */
- synchronize_rcu();
- }
- atomic_or(state, &mm->membarrier_state);
+ set_state |= MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE;
+ atomic_or(set_state, &mm->membarrier_state);
+ ret = sync_runqueues_membarrier_state(mm);
+ if (ret)
+ return ret;
+ atomic_or(ready_state, &mm->membarrier_state);
return 0;
}
* command specified does not exist, not available on the running
* kernel, or if the command argument is invalid, this system call
* returns -EINVAL. For a given command, with flags argument set to 0,
- * this system call is guaranteed to always return the same value until
- * reboot.
+ * if this system call returns -ENOSYS or -EINVAL, it is guaranteed to
+ * always return the same value until reboot. In addition, it can return
+ * -ENOMEM if there is not enough memory available to perform the system
+ * call.
*
* All memory accesses performed in program order from each targeted thread
* is guaranteed to be ordered with respect to sys_membarrier(). If we use
atomic_t nr_iowait;
+#ifdef CONFIG_MEMBARRIER
+ int membarrier_state;
+#endif
+
#ifdef CONFIG_SMP
struct root_domain *rd;
struct sched_domain __rcu *sd;
static inline bool sched_energy_enabled(void) { return false; }
#endif /* CONFIG_ENERGY_MODEL && CONFIG_CPU_FREQ_GOV_SCHEDUTIL */
+
+#ifdef CONFIG_MEMBARRIER
+/*
+ * The scheduler provides memory barriers required by membarrier between:
+ * - prior user-space memory accesses and store to rq->membarrier_state,
+ * - store to rq->membarrier_state and following user-space memory accesses.
+ * In the same way it provides those guarantees around store to rq->curr.
+ */
+static inline void membarrier_switch_mm(struct rq *rq,
+ struct mm_struct *prev_mm,
+ struct mm_struct *next_mm)
+{
+ int membarrier_state;
+
+ if (prev_mm == next_mm)
+ return;
+
+ membarrier_state = atomic_read(&next_mm->membarrier_state);
+ if (READ_ONCE(rq->membarrier_state) == membarrier_state)
+ return;
+
+ WRITE_ONCE(rq->membarrier_state, membarrier_state);
+}
+#else
+static inline void membarrier_switch_mm(struct rq *rq,
+ struct mm_struct *prev_mm,
+ struct mm_struct *next_mm)
+{
+}
+#endif
*/
static int bc_set_next(ktime_t expires, struct clock_event_device *bc)
{
- int bc_moved;
/*
- * We try to cancel the timer first. If the callback is on
- * flight on some other cpu then we let it handle it. If we
- * were able to cancel the timer nothing can rearm it as we
- * own broadcast_lock.
+ * This is called either from enter/exit idle code or from the
+ * broadcast handler. In all cases tick_broadcast_lock is held.
*
- * However we can also be called from the event handler of
- * ce_broadcast_hrtimer itself when it expires. We cannot
- * restart the timer because we are in the callback, but we
- * can set the expiry time and let the callback return
- * HRTIMER_RESTART.
+ * hrtimer_cancel() cannot be called here neither from the
+ * broadcast handler nor from the enter/exit idle code. The idle
+ * code can run into the problem described in bc_shutdown() and the
+ * broadcast handler cannot wait for itself to complete for obvious
+ * reasons.
*
- * Since we are in the idle loop at this point and because
- * hrtimer_{start/cancel} functions call into tracing,
- * calls to these functions must be bound within RCU_NONIDLE.
+ * Each caller tries to arm the hrtimer on its own CPU, but if the
+ * hrtimer callbback function is currently running, then
+ * hrtimer_start() cannot move it and the timer stays on the CPU on
+ * which it is assigned at the moment.
+ *
+ * As this can be called from idle code, the hrtimer_start()
+ * invocation has to be wrapped with RCU_NONIDLE() as
+ * hrtimer_start() can call into tracing.
*/
- RCU_NONIDLE(
- {
- bc_moved = hrtimer_try_to_cancel(&bctimer) >= 0;
- if (bc_moved) {
- hrtimer_start(&bctimer, expires,
- HRTIMER_MODE_ABS_PINNED_HARD);
- }
- }
- );
-
- if (bc_moved) {
- /* Bind the "device" to the cpu */
- bc->bound_on = smp_processor_id();
- } else if (bc->bound_on == smp_processor_id()) {
- hrtimer_set_expires(&bctimer, expires);
- }
+ RCU_NONIDLE( {
+ hrtimer_start(&bctimer, expires, HRTIMER_MODE_ABS_PINNED_HARD);
+ /*
+ * The core tick broadcast mode expects bc->bound_on to be set
+ * correctly to prevent a CPU which has the broadcast hrtimer
+ * armed from going deep idle.
+ *
+ * As tick_broadcast_lock is held, nothing can change the cpu
+ * base which was just established in hrtimer_start() above. So
+ * the below access is safe even without holding the hrtimer
+ * base lock.
+ */
+ bc->bound_on = bctimer.base->cpu_base->cpu;
+ } );
return 0;
}
{
ce_broadcast_hrtimer.event_handler(&ce_broadcast_hrtimer);
- if (clockevent_state_oneshot(&ce_broadcast_hrtimer))
- if (ce_broadcast_hrtimer.next_event != KTIME_MAX)
- return HRTIMER_RESTART;
-
return HRTIMER_NORESTART;
}
{
int ret;
+ ret = security_locked_down(LOCKDOWN_BPF_READ);
+ if (ret < 0)
+ goto out;
+
ret = probe_kernel_read(dst, unsafe_ptr, size);
if (unlikely(ret < 0))
+out:
memset(dst, 0, size);
return ret;
.arg5_type = ARG_CONST_SIZE_OR_ZERO,
};
-static DEFINE_PER_CPU(struct pt_regs, bpf_pt_regs);
-static DEFINE_PER_CPU(struct perf_sample_data, bpf_misc_sd);
+static DEFINE_PER_CPU(int, bpf_event_output_nest_level);
+struct bpf_nested_pt_regs {
+ struct pt_regs regs[3];
+};
+static DEFINE_PER_CPU(struct bpf_nested_pt_regs, bpf_pt_regs);
+static DEFINE_PER_CPU(struct bpf_trace_sample_data, bpf_misc_sds);
u64 bpf_event_output(struct bpf_map *map, u64 flags, void *meta, u64 meta_size,
void *ctx, u64 ctx_size, bpf_ctx_copy_t ctx_copy)
{
- struct perf_sample_data *sd = this_cpu_ptr(&bpf_misc_sd);
- struct pt_regs *regs = this_cpu_ptr(&bpf_pt_regs);
+ int nest_level = this_cpu_inc_return(bpf_event_output_nest_level);
struct perf_raw_frag frag = {
.copy = ctx_copy,
.size = ctx_size,
.data = meta,
},
};
+ struct perf_sample_data *sd;
+ struct pt_regs *regs;
+ u64 ret;
+
+ if (WARN_ON_ONCE(nest_level > ARRAY_SIZE(bpf_misc_sds.sds))) {
+ ret = -EBUSY;
+ goto out;
+ }
+ sd = this_cpu_ptr(&bpf_misc_sds.sds[nest_level - 1]);
+ regs = this_cpu_ptr(&bpf_pt_regs.regs[nest_level - 1]);
perf_fetch_caller_regs(regs);
perf_sample_data_init(sd, 0, 0);
sd->raw = &raw;
- return __bpf_perf_event_output(regs, map, flags, sd);
+ ret = __bpf_perf_event_output(regs, map, flags, sd);
+out:
+ this_cpu_dec(bpf_event_output_nest_level);
+ return ret;
}
BPF_CALL_0(bpf_get_current_task)
{
int ret;
+ ret = security_locked_down(LOCKDOWN_BPF_READ);
+ if (ret < 0)
+ goto out;
+
/*
* The strncpy_from_unsafe() call will likely not fill the entire
* buffer, but that's okay in this circumstance as we're probing
*/
ret = strncpy_from_unsafe(dst, unsafe_ptr, size);
if (unlikely(ret < 0))
+out:
memset(dst, 0, size);
return ret;
__builtin_types_compatible_p(typeof(var), type *)
#undef IF_ASSIGN
-#define IF_ASSIGN(var, entry, etype, id) \
- if (FTRACE_CMP_TYPE(var, etype)) { \
- var = (typeof(var))(entry); \
- WARN_ON(id && (entry)->type != id); \
- break; \
+#define IF_ASSIGN(var, entry, etype, id) \
+ if (FTRACE_CMP_TYPE(var, etype)) { \
+ var = (typeof(var))(entry); \
+ WARN_ON(id != 0 && (entry)->type != id); \
+ break; \
}
/* Will cause compile errors if type is not found. */
switch (*next) {
case '(': /* #2 */
- if (top - op_stack > nr_parens)
- return ERR_PTR(-EINVAL);
+ if (top - op_stack > nr_parens) {
+ ret = -EINVAL;
+ goto out_free;
+ }
*(++top) = invert;
continue;
case '!': /* #3 */
#include <linux/uaccess.h>
#include <linux/rculist.h>
#include <linux/error-injection.h>
+#include <linux/security.h>
#include <asm/setup.h> /* for COMMAND_LINE_SIZE */
{
int i, ret;
+ ret = security_locked_down(LOCKDOWN_KPROBES);
+ if (ret)
+ return ret;
+
if (trace_kprobe_is_registered(tk))
return -EINVAL;
if (!command)
return;
+ if (trace_probe_log.index >= trace_probe_log.argc) {
+ /**
+ * Set the error position is next to the last arg + space.
+ * Note that len includes the terminal null and the cursor
+ * appaers at pos + 1.
+ */
+ pos = len;
+ offset = 0;
+ }
+
/* And make a command string from argv array */
p = command;
for (i = 0; i < trace_probe_log.argc; i++) {
{
int i;
+ /* In case of more arguments */
+ if (a->nr_args < b->nr_args)
+ return a->nr_args + 1;
+ if (a->nr_args > b->nr_args)
+ return b->nr_args + 1;
+
for (i = 0; i < a->nr_args; i++) {
if ((b->nr_args <= i) ||
((a->args[i].type != b->args[i].type) ||
Implementation is done using GnuPG MPI library
config DIMLIB
- bool "DIM library"
- default y
+ bool
help
Dynamic Interrupt Moderation library.
- Implements an algorithm for dynamically change CQ modertion values
+ Implements an algorithm for dynamically changing CQ moderation values
according to run time performance.
#
#include <linux/export.h>
#include <linux/uaccess.h>
#include <linux/mm.h>
+#include <linux/bitops.h>
#include <asm/word-at-a-time.h>
-/* Set bits in the first 'n' bytes when loaded from memory */
-#ifdef __LITTLE_ENDIAN
-# define aligned_byte_mask(n) ((1ul << 8*(n))-1)
-#else
-# define aligned_byte_mask(n) (~0xfful << (BITS_PER_LONG - 8 - 8*(n)))
-#endif
-
/*
* Do a strnlen, return length of string *with* final '\0'.
* 'count' is the user-supplied count, while 'max' is the
# define TEST_U64
#endif
-#define test(condition, msg) \
-({ \
- int cond = (condition); \
- if (cond) \
- pr_warn("%s\n", msg); \
- cond; \
+#define test(condition, msg, ...) \
+({ \
+ int cond = (condition); \
+ if (cond) \
+ pr_warn("[%d] " msg "\n", __LINE__, ##__VA_ARGS__); \
+ cond; \
})
+static bool is_zeroed(void *from, size_t size)
+{
+ return memchr_inv(from, 0x0, size) == NULL;
+}
+
+static int test_check_nonzero_user(char *kmem, char __user *umem, size_t size)
+{
+ int ret = 0;
+ size_t start, end, i;
+ size_t zero_start = size / 4;
+ size_t zero_end = size - zero_start;
+
+ /*
+ * We conduct a series of check_nonzero_user() tests on a block of memory
+ * with the following byte-pattern (trying every possible [start,end]
+ * pair):
+ *
+ * [ 00 ff 00 ff ... 00 00 00 00 ... ff 00 ff 00 ]
+ *
+ * And we verify that check_nonzero_user() acts identically to memchr_inv().
+ */
+
+ memset(kmem, 0x0, size);
+ for (i = 1; i < zero_start; i += 2)
+ kmem[i] = 0xff;
+ for (i = zero_end; i < size; i += 2)
+ kmem[i] = 0xff;
+
+ ret |= test(copy_to_user(umem, kmem, size),
+ "legitimate copy_to_user failed");
+
+ for (start = 0; start <= size; start++) {
+ for (end = start; end <= size; end++) {
+ size_t len = end - start;
+ int retval = check_zeroed_user(umem + start, len);
+ int expected = is_zeroed(kmem + start, len);
+
+ ret |= test(retval != expected,
+ "check_nonzero_user(=%d) != memchr_inv(=%d) mismatch (start=%zu, end=%zu)",
+ retval, expected, start, end);
+ }
+ }
+
+ return ret;
+}
+
+static int test_copy_struct_from_user(char *kmem, char __user *umem,
+ size_t size)
+{
+ int ret = 0;
+ char *umem_src = NULL, *expected = NULL;
+ size_t ksize, usize;
+
+ umem_src = kmalloc(size, GFP_KERNEL);
+ if ((ret |= test(umem_src == NULL, "kmalloc failed")))
+ goto out_free;
+
+ expected = kmalloc(size, GFP_KERNEL);
+ if ((ret |= test(expected == NULL, "kmalloc failed")))
+ goto out_free;
+
+ /* Fill umem with a fixed byte pattern. */
+ memset(umem_src, 0x3e, size);
+ ret |= test(copy_to_user(umem, umem_src, size),
+ "legitimate copy_to_user failed");
+
+ /* Check basic case -- (usize == ksize). */
+ ksize = size;
+ usize = size;
+
+ memcpy(expected, umem_src, ksize);
+
+ memset(kmem, 0x0, size);
+ ret |= test(copy_struct_from_user(kmem, ksize, umem, usize),
+ "copy_struct_from_user(usize == ksize) failed");
+ ret |= test(memcmp(kmem, expected, ksize),
+ "copy_struct_from_user(usize == ksize) gives unexpected copy");
+
+ /* Old userspace case -- (usize < ksize). */
+ ksize = size;
+ usize = size / 2;
+
+ memcpy(expected, umem_src, usize);
+ memset(expected + usize, 0x0, ksize - usize);
+
+ memset(kmem, 0x0, size);
+ ret |= test(copy_struct_from_user(kmem, ksize, umem, usize),
+ "copy_struct_from_user(usize < ksize) failed");
+ ret |= test(memcmp(kmem, expected, ksize),
+ "copy_struct_from_user(usize < ksize) gives unexpected copy");
+
+ /* New userspace (-E2BIG) case -- (usize > ksize). */
+ ksize = size / 2;
+ usize = size;
+
+ memset(kmem, 0x0, size);
+ ret |= test(copy_struct_from_user(kmem, ksize, umem, usize) != -E2BIG,
+ "copy_struct_from_user(usize > ksize) didn't give E2BIG");
+
+ /* New userspace (success) case -- (usize > ksize). */
+ ksize = size / 2;
+ usize = size;
+
+ memcpy(expected, umem_src, ksize);
+ ret |= test(clear_user(umem + ksize, usize - ksize),
+ "legitimate clear_user failed");
+
+ memset(kmem, 0x0, size);
+ ret |= test(copy_struct_from_user(kmem, ksize, umem, usize),
+ "copy_struct_from_user(usize > ksize) failed");
+ ret |= test(memcmp(kmem, expected, ksize),
+ "copy_struct_from_user(usize > ksize) gives unexpected copy");
+
+out_free:
+ kfree(expected);
+ kfree(umem_src);
+ return ret;
+}
+
static int __init test_user_copy_init(void)
{
int ret = 0;
#endif
#undef test_legit
+ /* Test usage of check_nonzero_user(). */
+ ret |= test_check_nonzero_user(kmem, usermem, 2 * PAGE_SIZE);
+ /* Test usage of copy_struct_from_user(). */
+ ret |= test_copy_struct_from_user(kmem, usermem, 2 * PAGE_SIZE);
+
/*
* Invalid usage: none of these copies should succeed.
*/
* goto errout;
* }
*
- * pos = textsearch_find_continuous(conf, \&state, example, strlen(example));
+ * pos = textsearch_find_continuous(conf, &state, example, strlen(example));
* if (pos != UINT_MAX)
- * panic("Oh my god, dancing chickens at \%d\n", pos);
+ * panic("Oh my god, dancing chickens at %d\n", pos);
*
* textsearch_destroy(conf);
*/
// SPDX-License-Identifier: GPL-2.0
#include <linux/uaccess.h>
+#include <linux/bitops.h>
/* out-of-line parts */
}
EXPORT_SYMBOL(_copy_to_user);
#endif
+
+/**
+ * check_zeroed_user: check if a userspace buffer only contains zero bytes
+ * @from: Source address, in userspace.
+ * @size: Size of buffer.
+ *
+ * This is effectively shorthand for "memchr_inv(from, 0, size) == NULL" for
+ * userspace addresses (and is more efficient because we don't care where the
+ * first non-zero byte is).
+ *
+ * Returns:
+ * * 0: There were non-zero bytes present in the buffer.
+ * * 1: The buffer was full of zero bytes.
+ * * -EFAULT: access to userspace failed.
+ */
+int check_zeroed_user(const void __user *from, size_t size)
+{
+ unsigned long val;
+ uintptr_t align = (uintptr_t) from % sizeof(unsigned long);
+
+ if (unlikely(size == 0))
+ return 1;
+
+ from -= align;
+ size += align;
+
+ if (!user_access_begin(from, size))
+ return -EFAULT;
+
+ unsafe_get_user(val, (unsigned long __user *) from, err_fault);
+ if (align)
+ val &= ~aligned_byte_mask(align);
+
+ while (size > sizeof(unsigned long)) {
+ if (unlikely(val))
+ goto done;
+
+ from += sizeof(unsigned long);
+ size -= sizeof(unsigned long);
+
+ unsafe_get_user(val, (unsigned long __user *) from, err_fault);
+ }
+
+ if (size < sizeof(unsigned long))
+ val &= aligned_byte_mask(size);
+
+done:
+ user_access_end();
+ return (val == 0);
+err_fault:
+ user_access_end();
+ return -EFAULT;
+}
+EXPORT_SYMBOL(check_zeroed_user);
* available
* never: never stall for any thp allocation
*/
-static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma, unsigned long addr)
+static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma)
{
const bool vma_madvised = !!(vma->vm_flags & VM_HUGEPAGE);
- gfp_t this_node = 0;
-
-#ifdef CONFIG_NUMA
- struct mempolicy *pol;
- /*
- * __GFP_THISNODE is used only when __GFP_DIRECT_RECLAIM is not
- * specified, to express a general desire to stay on the current
- * node for optimistic allocation attempts. If the defrag mode
- * and/or madvise hint requires the direct reclaim then we prefer
- * to fallback to other node rather than node reclaim because that
- * can lead to excessive reclaim even though there is free memory
- * on other nodes. We expect that NUMA preferences are specified
- * by memory policies.
- */
- pol = get_vma_policy(vma, addr);
- if (pol->mode != MPOL_BIND)
- this_node = __GFP_THISNODE;
- mpol_cond_put(pol);
-#endif
+ /* Always do synchronous compaction */
if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, &transparent_hugepage_flags))
return GFP_TRANSHUGE | (vma_madvised ? 0 : __GFP_NORETRY);
+
+ /* Kick kcompactd and fail quickly */
if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, &transparent_hugepage_flags))
- return GFP_TRANSHUGE_LIGHT | __GFP_KSWAPD_RECLAIM | this_node;
+ return GFP_TRANSHUGE_LIGHT | __GFP_KSWAPD_RECLAIM;
+
+ /* Synchronous compaction if madvised, otherwise kick kcompactd */
if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG, &transparent_hugepage_flags))
- return GFP_TRANSHUGE_LIGHT | (vma_madvised ? __GFP_DIRECT_RECLAIM :
- __GFP_KSWAPD_RECLAIM | this_node);
+ return GFP_TRANSHUGE_LIGHT |
+ (vma_madvised ? __GFP_DIRECT_RECLAIM :
+ __GFP_KSWAPD_RECLAIM);
+
+ /* Only do synchronous compaction if madvised */
if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG, &transparent_hugepage_flags))
- return GFP_TRANSHUGE_LIGHT | (vma_madvised ? __GFP_DIRECT_RECLAIM :
- this_node);
- return GFP_TRANSHUGE_LIGHT | this_node;
+ return GFP_TRANSHUGE_LIGHT |
+ (vma_madvised ? __GFP_DIRECT_RECLAIM : 0);
+
+ return GFP_TRANSHUGE_LIGHT;
}
/* Caller must hold page table lock. */
pte_free(vma->vm_mm, pgtable);
return ret;
}
- gfp = alloc_hugepage_direct_gfpmask(vma, haddr);
- page = alloc_pages_vma(gfp, HPAGE_PMD_ORDER, vma, haddr, numa_node_id());
+ gfp = alloc_hugepage_direct_gfpmask(vma);
+ page = alloc_hugepage_vma(gfp, vma, haddr, HPAGE_PMD_ORDER);
if (unlikely(!page)) {
count_vm_event(THP_FAULT_FALLBACK);
return VM_FAULT_FALLBACK;
alloc:
if (__transparent_hugepage_enabled(vma) &&
!transparent_hugepage_debug_cow()) {
- huge_gfp = alloc_hugepage_direct_gfpmask(vma, haddr);
- new_page = alloc_pages_vma(huge_gfp, HPAGE_PMD_ORDER, vma,
- haddr, numa_node_id());
+ huge_gfp = alloc_hugepage_direct_gfpmask(vma);
+ new_page = alloc_hugepage_vma(huge_gfp, vma, haddr, HPAGE_PMD_ORDER);
} else
new_page = NULL;
} else if (PageTransHuge(page)) {
struct page *thp;
- thp = alloc_pages_vma(GFP_TRANSHUGE, HPAGE_PMD_ORDER, vma,
- address, numa_node_id());
+ thp = alloc_hugepage_vma(GFP_TRANSHUGE, vma, address,
+ HPAGE_PMD_ORDER);
if (!thp)
return NULL;
prep_transhuge_page(thp);
* freeing by another task. It is the caller's responsibility to free the
* extra reference for shared policies.
*/
-struct mempolicy *get_vma_policy(struct vm_area_struct *vma,
+static struct mempolicy *get_vma_policy(struct vm_area_struct *vma,
unsigned long addr)
{
struct mempolicy *pol = __get_vma_policy(vma, addr);
* @vma: Pointer to VMA or NULL if not available.
* @addr: Virtual Address of the allocation. Must be inside the VMA.
* @node: Which node to prefer for allocation (modulo policy).
+ * @hugepage: for hugepages try only the preferred node if possible
*
* This function allocates a page from the kernel page pool and applies
* a NUMA policy associated with the VMA or the current process.
*/
struct page *
alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma,
- unsigned long addr, int node)
+ unsigned long addr, int node, bool hugepage)
{
struct mempolicy *pol;
struct page *page;
goto out;
}
+ if (unlikely(IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && hugepage)) {
+ int hpage_node = node;
+
+ /*
+ * For hugepage allocation and non-interleave policy which
+ * allows the current node (or other explicitly preferred
+ * node) we only try to allocate from the current/preferred
+ * node and don't fall back to other nodes, as the cost of
+ * remote accesses would likely offset THP benefits.
+ *
+ * If the policy is interleave, or does not allow the current
+ * node in its nodemask, we allocate the standard way.
+ */
+ if (pol->mode == MPOL_PREFERRED && !(pol->flags & MPOL_F_LOCAL))
+ hpage_node = pol->v.preferred_node;
+
+ nmask = policy_nodemask(gfp, pol);
+ if (!nmask || node_isset(hpage_node, *nmask)) {
+ mpol_cond_put(pol);
+ page = __alloc_pages_node(hpage_node,
+ gfp | __GFP_THISNODE, order);
+
+ /*
+ * If hugepage allocations are configured to always
+ * synchronous compact or the vma has been madvised
+ * to prefer hugepage backing, retry allowing remote
+ * memory as well.
+ */
+ if (!page && (gfp & __GFP_DIRECT_RECLAIM))
+ page = __alloc_pages_node(hpage_node,
+ gfp | __GFP_NORETRY, order);
+
+ goto out;
+ }
+ }
+
nmask = policy_nodemask(gfp, pol);
preferred_nid = policy_node(gfp, pol, node);
page = __alloc_pages_nodemask(gfp, order, preferred_nid, nmask);
if (page)
goto got_pg;
+ if (order >= pageblock_order && (gfp_mask & __GFP_IO)) {
+ /*
+ * If allocating entire pageblock(s) and compaction
+ * failed because all zones are below low watermarks
+ * or is prohibited because it recently failed at this
+ * order, fail immediately.
+ *
+ * Reclaim is
+ * - potentially very expensive because zones are far
+ * below their low watermarks or this is part of very
+ * bursty high order allocations,
+ * - not guaranteed to help because isolate_freepages()
+ * may not iterate over freed pages as part of its
+ * linear scan, and
+ * - unlikely to make entire pageblocks free on its
+ * own.
+ */
+ if (compact_result == COMPACT_SKIPPED ||
+ compact_result == COMPACT_DEFERRED)
+ goto nopage;
+ }
+
/*
* Checks for costly allocations with __GFP_NORETRY, which
* includes THP page fault allocations
shmem_pseudo_vma_init(&pvma, info, hindex);
page = alloc_pages_vma(gfp | __GFP_COMP | __GFP_NORETRY | __GFP_NOWARN,
- HPAGE_PMD_ORDER, &pvma, 0, numa_node_id());
+ HPAGE_PMD_ORDER, &pvma, 0, numa_node_id(), true);
shmem_pseudo_vma_destroy(&pvma);
if (page)
prep_transhuge_page(page);
p9pdu_reset(&req->tc);
p9pdu_reset(&req->rc);
+ req->t_err = 0;
req->status = REQ_STATUS_ALLOC;
init_waitqueue_head(&req->wq);
INIT_LIST_HEAD(&req->req_list);
*/
if (sock->type != SOCK_RAW && sock->type != SOCK_DGRAM)
goto out;
+
+ rc = -EPERM;
+ if (sock->type == SOCK_RAW && !kern && !capable(CAP_NET_RAW))
+ goto out;
+
rc = -ENOMEM;
sk = sk_alloc(net, PF_APPLETALK, GFP_KERNEL, &ddp_proto, kern);
if (!sk)
break;
case SOCK_RAW:
+ if (!capable(CAP_NET_RAW))
+ return -EPERM;
break;
default:
return -ESOCKTNOSUPPORT;
depends on NET
select LIBCRC32C
help
- B.A.T.M.A.N. (better approach to mobile ad-hoc networking) is
- a routing protocol for multi-hop ad-hoc mesh networks. The
- networks may be wired or wireless. See
- https://www.open-mesh.org/ for more information and user space
- tools.
+ B.A.T.M.A.N. (better approach to mobile ad-hoc networking) is
+ a routing protocol for multi-hop ad-hoc mesh networks. The
+ networks may be wired or wireless. See
+ https://www.open-mesh.org/ for more information and user space
+ tools.
config BATMAN_ADV_BATMAN_V
bool "B.A.T.M.A.N. V protocol"
/* clean the netfilter state now that the batman-adv header has been
* removed
*/
- nf_reset(skb);
+ nf_reset_ct(skb);
if (unlikely(!pskb_may_pull(skb, ETH_HLEN)))
goto dropped;
static void napi_skb_free_stolen_head(struct sk_buff *skb)
{
skb_dst_drop(skb);
- secpath_reset(skb);
+ skb_ext_put(skb);
kmem_cache_free(skbuff_head_cache, skb);
}
skb->encapsulation = 0;
skb_shinfo(skb)->gso_type = 0;
skb->truesize = SKB_TRUESIZE(skb_end_offset(skb));
- secpath_reset(skb);
+ skb_ext_reset(skb);
napi->skb = skb;
}
NETLINK_CB(cb->skb).portid,
cb->nlh->nlmsg_seq,
NLM_F_MULTI);
- if (err) {
+ if (err && err != -EOPNOTSUPP) {
mutex_unlock(&devlink->lock);
goto out;
}
NETLINK_CB(cb->skb).portid,
cb->nlh->nlmsg_seq,
NLM_F_MULTI);
- if (err) {
+ if (err && err != -EOPNOTSUPP) {
mutex_unlock(&devlink->lock);
goto out;
}
cb->nlh->nlmsg_seq, NLM_F_MULTI,
cb->extack);
mutex_unlock(&devlink->lock);
- if (err)
+ if (err && err != -EOPNOTSUPP)
break;
idx++;
}
int newrefcnt;
newrefcnt = atomic_dec_return(&dst->__refcnt);
- if (unlikely(newrefcnt < 0))
+ if (WARN_ONCE(newrefcnt < 0, "dst_release underflow"))
net_warn_ratelimited("%s: dst:%p refcnt:%d\n",
__func__, dst, newrefcnt);
if (!newrefcnt)
int newrefcnt;
newrefcnt = atomic_dec_return(&dst->__refcnt);
- if (unlikely(newrefcnt < 0))
+ if (WARN_ONCE(newrefcnt < 0, "dst_release_immediate underflow"))
net_warn_ratelimited("%s: dst:%p refcnt:%d\n",
__func__, dst, newrefcnt);
if (!newrefcnt)
skb->skb_iif = 0;
skb->ignore_df = 0;
skb_dst_drop(skb);
- secpath_reset(skb);
- nf_reset(skb);
+ skb_ext_reset(skb);
+ nf_reset_ct(skb);
nf_reset_trace(skb);
#ifdef CONFIG_NET_SWITCHDEV
sk_filter_uncharge(sk, filter);
RCU_INIT_POINTER(sk->sk_filter, NULL);
}
- if (rcu_access_pointer(sk->sk_reuseport_cb))
- reuseport_detach_sock(sk);
sock_disable_timestamp(sk, SK_FLAGS_TIMESTAMP);
void sk_destruct(struct sock *sk)
{
- if (sock_flag(sk, SOCK_RCU_FREE))
+ bool use_call_rcu = sock_flag(sk, SOCK_RCU_FREE);
+
+ if (rcu_access_pointer(sk->sk_reuseport_cb)) {
+ reuseport_detach_sock(sk);
+ use_call_rcu = true;
+ }
+
+ if (use_call_rcu)
call_rcu(&sk->sk_rcu, __sk_destruct);
else
__sk_destruct(&sk->sk_rcu);
return proto->memory_allocated != NULL ? proto_memory_allocated(proto) : -1L;
}
-static char *sock_prot_memory_pressure(struct proto *proto)
+static const char *sock_prot_memory_pressure(struct proto *proto)
{
return proto->memory_pressure != NULL ?
proto_memory_pressure(proto) ? "yes" : "no" : "NI";
if (!xfrm4_policy_check(sk, XFRM_POLICY_IN, skb))
goto discard_and_relse;
- nf_reset(skb);
+ nf_reset_ct(skb);
return __sk_receive_skb(sk, skb, 1, dh->dccph_doff * 4, refcounted);
opt = ireq->ipv6_opt;
if (!opt)
opt = rcu_dereference(np->opt);
- err = ip6_xmit(sk, skb, &fl6, sk->sk_mark, opt, np->tclass);
+ err = ip6_xmit(sk, skb, &fl6, sk->sk_mark, opt, np->tclass,
+ sk->sk_priority);
rcu_read_unlock();
err = net_xmit_eval(err);
}
dst = ip6_dst_lookup_flow(ctl_sk, &fl6, NULL);
if (!IS_ERR(dst)) {
skb_dst_set(skb, dst);
- ip6_xmit(ctl_sk, skb, &fl6, 0, NULL, 0);
+ ip6_xmit(ctl_sk, skb, &fl6, 0, NULL, 0, 0);
DCCP_INC_STATS(DCCP_MIB_OUTSEGS);
DCCP_INC_STATS(DCCP_MIB_OUTRSTS);
return;
/* Step 1: A timestampable frame was received.
* Buffer it until we get its meta frame.
*/
- if (is_link_local && sp->data->hwts_rx_en) {
+ if (is_link_local) {
+ if (!test_bit(SJA1105_HWTS_RX_EN, &sp->data->state))
+ /* Do normal processing. */
+ return skb;
+
spin_lock(&sp->data->meta_lock);
/* Was this a link-local frame instead of the meta
* that we were expecting?
} else if (is_meta) {
struct sk_buff *stampable_skb;
+ /* Drop the meta frame if we're not in the right state
+ * to process it.
+ */
+ if (!test_bit(SJA1105_HWTS_RX_EN, &sp->data->state))
+ return NULL;
+
spin_lock(&sp->data->meta_lock);
stampable_skb = sp->data->stampable_skb;
switch (sock->type) {
case SOCK_RAW:
+ rc = -EPERM;
+ if (!capable(CAP_NET_RAW))
+ goto out;
proto = &ieee802154_raw_prot;
ops = &ieee802154_raw_ops;
break;
menuconfig NET_IFE
depends on NET
- tristate "Inter-FE based on IETF ForCES InterFE LFB"
+ tristate "Inter-FE based on IETF ForCES InterFE LFB"
default n
help
Say Y here to add support of IFE encapsulation protocol
wired networks and throughput over wireless links.
config TCP_CONG_HTCP
- tristate "H-TCP"
- default m
+ tristate "H-TCP"
+ default m
---help---
H-TCP is a send-side only modifications of the TCP Reno
protocol stack that optimizes the performance of TCP
rt = ip_route_output_flow(net, fl4, sk);
if (IS_ERR(rt))
goto no_route;
- if (opt && opt->opt.is_strictroute && rt->rt_gw_family)
+ if (opt && opt->opt.is_strictroute && rt->rt_uses_gateway)
goto route_err;
rcu_read_unlock();
return &rt->dst;
rt = ip_route_output_flow(net, fl4, sk);
if (IS_ERR(rt))
goto no_route;
- if (opt && opt->opt.is_strictroute && rt->rt_gw_family)
+ if (opt && opt->opt.is_strictroute && rt->rt_uses_gateway)
goto route_err;
return &rt->dst;
rt = skb_rtable(skb);
- if (opt->is_strictroute && rt->rt_gw_family)
+ if (opt->is_strictroute && rt->rt_uses_gateway)
goto sr_failed;
IPCB(skb)->flags |= IPSKB_FORWARDED;
struct ip_tunnel *t = netdev_priv(dev);
ether_setup(dev);
+ dev->max_mtu = 0;
dev->netdev_ops = &erspan_netdev_ops;
dev->priv_flags &= ~IFF_TX_SKB_SHARING;
dev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
kfree_skb(skb);
return;
}
- nf_reset(skb);
+ nf_reset_ct(skb);
}
ret = INDIRECT_CALL_2(ipprot->handler, tcp_v4_rcv, udp_rcv,
skb);
skb_dst_set_noref(skb, &rt->dst);
packet_routed:
- if (inet_opt && inet_opt->opt.is_strictroute && rt->rt_gw_family)
+ if (inet_opt && inet_opt->opt.is_strictroute && rt->rt_uses_gateway)
goto no_route;
/* OK, we know where to send it, allocate and build IP header. */
inet_sk(sk)->tos = arg->tos;
- sk->sk_priority = skb->priority;
sk->sk_protocol = ip_hdr(skb)->protocol;
sk->sk_bound_dev_if = arg->bound_dev_if;
sk->sk_sndbuf = sysctl_wmem_default;
ip_send_check(iph);
memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt));
- nf_reset(skb);
+ nf_reset_ct(skb);
}
static inline int ipmr_forward_finish(struct net *net, struct sock *sk,
mroute_sk = rcu_dereference(mrt->mroute_sk);
if (mroute_sk) {
- nf_reset(skb);
+ nf_reset_ct(skb);
raw_rcv(mroute_sk, skb);
return 0;
}
#if IS_ENABLED(CONFIG_NF_CONNTRACK)
/* Avoid counting cloned packets towards the original connection. */
- nf_reset(skb);
+ nf_reset_ct(skb);
nf_ct_set(skb, NULL, IP_CT_UNTRACKED);
#endif
/*
kfree_skb(skb);
return NET_RX_DROP;
}
- nf_reset(skb);
+ nf_reset_ct(skb);
skb_push(skb, skb->data - skb_network_header(skb));
if (fnhe->fnhe_gw) {
rt->rt_flags |= RTCF_REDIRECTED;
+ rt->rt_uses_gateway = 1;
rt->rt_gw_family = AF_INET;
rt->rt_gw4 = fnhe->fnhe_gw;
}
if (peer->rate_tokens == 0 ||
time_after(jiffies,
(peer->rate_last +
- (ip_rt_redirect_load << peer->rate_tokens)))) {
+ (ip_rt_redirect_load << peer->n_redirects)))) {
__be32 gw = rt_nexthop(rt, ip_hdr(skb)->daddr);
icmp_send(skb, ICMP_REDIRECT, ICMP_REDIR_HOST, gw);
peer->rate_last = jiffies;
- ++peer->rate_tokens;
++peer->n_redirects;
#ifdef CONFIG_IP_ROUTE_VERBOSE
if (log_martians &&
- peer->rate_tokens == ip_rt_redirect_number)
+ peer->n_redirects == ip_rt_redirect_number)
net_warn_ratelimited("host %pI4/if%d ignores redirects for %pI4 to %pI4\n",
&ip_hdr(skb)->saddr, inet_iif(skb),
&ip_hdr(skb)->daddr, &gw);
mtu = READ_ONCE(dst->dev->mtu);
if (unlikely(ip_mtu_locked(dst))) {
- if (rt->rt_gw_family && mtu > 576)
+ if (rt->rt_uses_gateway && mtu > 576)
mtu = 576;
}
struct fib_nh_common *nhc = FIB_RES_NHC(*res);
if (nhc->nhc_gw_family && nhc->nhc_scope == RT_SCOPE_LINK) {
+ rt->rt_uses_gateway = 1;
rt->rt_gw_family = nhc->nhc_gw_family;
/* only INET and INET6 are supported */
if (likely(nhc->nhc_gw_family == AF_INET))
rt->rt_iif = 0;
rt->rt_pmtu = 0;
rt->rt_mtu_locked = 0;
+ rt->rt_uses_gateway = 0;
rt->rt_gw_family = 0;
rt->rt_gw4 = 0;
INIT_LIST_HEAD(&rt->rt_uncached);
rt->rt_genid = rt_genid_ipv4(net);
rt->rt_flags = ort->rt_flags;
rt->rt_type = ort->rt_type;
+ rt->rt_uses_gateway = ort->rt_uses_gateway;
rt->rt_gw_family = ort->rt_gw_family;
if (rt->rt_gw_family == AF_INET)
rt->rt_gw4 = ort->rt_gw4;
if (nla_put_in_addr(skb, RTA_PREFSRC, fl4->saddr))
goto nla_put_failure;
}
- if (rt->rt_gw_family == AF_INET &&
- nla_put_in_addr(skb, RTA_GATEWAY, rt->rt_gw4)) {
- goto nla_put_failure;
- } else if (rt->rt_gw_family == AF_INET6) {
- int alen = sizeof(struct in6_addr);
- struct nlattr *nla;
- struct rtvia *via;
-
- nla = nla_reserve(skb, RTA_VIA, alen + 2);
- if (!nla)
+ if (rt->rt_uses_gateway) {
+ if (rt->rt_gw_family == AF_INET &&
+ nla_put_in_addr(skb, RTA_GATEWAY, rt->rt_gw4)) {
goto nla_put_failure;
-
- via = nla_data(nla);
- via->rtvia_family = AF_INET6;
- memcpy(via->rtvia_addr, &rt->rt_gw6, alen);
+ } else if (rt->rt_gw_family == AF_INET6) {
+ int alen = sizeof(struct in6_addr);
+ struct nlattr *nla;
+ struct rtvia *via;
+
+ nla = nla_reserve(skb, RTA_VIA, alen + 2);
+ if (!nla)
+ goto nla_put_failure;
+
+ via = nla_data(nla);
+ via->rtvia_family = AF_INET6;
+ memcpy(via->rtvia_addr, &rt->rt_gw6, alen);
+ }
}
expires = rt->dst.expires;
}
if (skb_frag_size(frags) != PAGE_SIZE || skb_frag_off(frags)) {
int remaining = zc->recv_skip_hint;
- int size = skb_frag_size(frags);
- while (remaining && (size != PAGE_SIZE ||
+ while (remaining && (skb_frag_size(frags) != PAGE_SIZE ||
skb_frag_off(frags))) {
- remaining -= size;
+ remaining -= skb_frag_size(frags);
frags++;
- size = skb_frag_size(frags);
}
zc->recv_skip_hint -= remaining;
break;
* which allows 2 outstanding 2-packet sequences, to try to keep pipe
* full even with ACK-every-other-packet delayed ACKs.
*/
-static u32 bbr_quantization_budget(struct sock *sk, u32 cwnd, int gain)
+static u32 bbr_quantization_budget(struct sock *sk, u32 cwnd)
{
struct bbr *bbr = inet_csk_ca(sk);
cwnd = (cwnd + 1) & ~1U;
/* Ensure gain cycling gets inflight above BDP even for small BDPs. */
- if (bbr->mode == BBR_PROBE_BW && gain > BBR_UNIT)
+ if (bbr->mode == BBR_PROBE_BW && bbr->cycle_idx == 0)
cwnd += 2;
return cwnd;
u32 inflight;
inflight = bbr_bdp(sk, bw, gain);
- inflight = bbr_quantization_budget(sk, inflight, gain);
+ inflight = bbr_quantization_budget(sk, inflight);
return inflight;
}
* due to aggregation (of data and/or ACKs) visible in the ACK stream.
*/
target_cwnd += bbr_ack_aggregation_cwnd(sk);
- target_cwnd = bbr_quantization_budget(sk, target_cwnd, gain);
+ target_cwnd = bbr_quantization_budget(sk, target_cwnd);
/* If we're below target cwnd, slow start cwnd toward target cwnd. */
if (bbr_full_bw_reached(sk)) /* only cut cwnd if we filled the pipe */
if (sk) {
ctl_sk->sk_mark = (sk->sk_state == TCP_TIME_WAIT) ?
inet_twsk(sk)->tw_mark : sk->sk_mark;
+ ctl_sk->sk_priority = (sk->sk_state == TCP_TIME_WAIT) ?
+ inet_twsk(sk)->tw_priority : sk->sk_priority;
transmit_time = tcp_transmit_time(sk);
}
ip_send_unicast_reply(ctl_sk,
ctl_sk = this_cpu_read(*net->ipv4.tcp_sk);
ctl_sk->sk_mark = (sk->sk_state == TCP_TIME_WAIT) ?
inet_twsk(sk)->tw_mark : sk->sk_mark;
+ ctl_sk->sk_priority = (sk->sk_state == TCP_TIME_WAIT) ?
+ inet_twsk(sk)->tw_priority : sk->sk_priority;
transmit_time = tcp_transmit_time(sk);
ip_send_unicast_reply(ctl_sk,
skb, &TCP_SKB_CB(skb)->header.h4.opt,
if (tcp_v4_inbound_md5_hash(sk, skb))
goto discard_and_relse;
- nf_reset(skb);
+ nf_reset_ct(skb);
if (tcp_filter(sk, skb))
goto discard_and_relse;
tw->tw_transparent = inet->transparent;
tw->tw_mark = sk->sk_mark;
+ tw->tw_priority = sk->sk_priority;
tw->tw_rcv_wscale = tp->rx_opt.rcv_wscale;
tcptw->tw_rcv_nxt = tp->rcv_nxt;
tcptw->tw_snd_nxt = tp->snd_nxt;
return false;
start_ts = tcp_sk(sk)->retrans_stamp;
- if (likely(timeout == 0))
- timeout = tcp_model_timeout(sk, boundary, TCP_RTO_MIN);
+ if (likely(timeout == 0)) {
+ unsigned int rto_base = TCP_RTO_MIN;
+
+ if ((1 << sk->sk_state) & (TCPF_SYN_SENT | TCPF_SYN_RECV))
+ rto_base = tcp_timeout_init(sk);
+ timeout = tcp_model_timeout(sk, boundary, rto_base);
+ }
return (s32)(tcp_time_stamp(tcp_sk(sk)) - start_ts - timeout) >= 0;
}
struct inet_connection_sock *icsk = inet_csk(sk);
struct tcp_sock *tp = tcp_sk(sk);
struct net *net = sock_net(sk);
- bool expired, do_reset;
+ bool expired = false, do_reset;
int retry_until;
if ((1 << sk->sk_state) & (TCPF_SYN_SENT | TCPF_SYN_RECV)) {
if (tcp_out_of_resources(sk, do_reset))
return 1;
}
+ }
+ if (!expired)
expired = retransmits_timed_out(sk, retry_until,
icsk->icsk_user_timeout);
- }
tcp_fastopen_active_detect_blackhole(sk, expired);
if (BPF_SOCK_OPS_TEST_FLAG(tp, BPF_SOCK_OPS_RTO_CB_FLAG))
int is_udplite = IS_UDPLITE(sk);
int offset = skb_transport_offset(skb);
int len = skb->len - offset;
+ int datalen = len - sizeof(*uh);
__wsum csum = 0;
/*
return -EIO;
}
- skb_shinfo(skb)->gso_size = cork->gso_size;
- skb_shinfo(skb)->gso_type = SKB_GSO_UDP_L4;
- skb_shinfo(skb)->gso_segs = DIV_ROUND_UP(len - sizeof(uh),
- cork->gso_size);
+ if (datalen > cork->gso_size) {
+ skb_shinfo(skb)->gso_size = cork->gso_size;
+ skb_shinfo(skb)->gso_type = SKB_GSO_UDP_L4;
+ skb_shinfo(skb)->gso_segs = DIV_ROUND_UP(datalen,
+ cork->gso_size);
+ }
goto csum_partial;
}
*/
if (!xfrm4_policy_check(sk, XFRM_POLICY_IN, skb))
goto drop;
- nf_reset(skb);
+ nf_reset_ct(skb);
if (static_branch_unlikely(&udp_encap_needed_key) && up->encap_type) {
int (*encap_rcv)(struct sock *sk, struct sk_buff *skb);
if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb))
goto drop;
- nf_reset(skb);
+ nf_reset_ct(skb);
/* No socket. Drop packet silently, if checksum is wrong */
if (udp_lib_checksum_complete(skb))
xdst->u.rt.rt_flags = rt->rt_flags & (RTCF_BROADCAST | RTCF_MULTICAST |
RTCF_LOCAL);
xdst->u.rt.rt_type = rt->rt_type;
+ xdst->u.rt.rt_uses_gateway = rt->rt_uses_gateway;
xdst->u.rt.rt_gw_family = rt->rt_gw_family;
if (rt->rt_gw_family == AF_INET)
xdst->u.rt.rt_gw4 = rt->rt_gw4;
switch (event) {
case RTM_NEWADDR:
/*
- * If the address was optimistic
- * we inserted the route at the start of
- * our DAD process, so we don't need
- * to do it again
+ * If the address was optimistic we inserted the route at the
+ * start of our DAD process, so we don't need to do it again.
+ * If the device was taken down in the middle of the DAD
+ * cycle there is a race where we could get here without a
+ * host route, so nothing to insert. That will be fixed when
+ * the device is brought up.
*/
- if (!rcu_access_pointer(ifp->rt->fib6_node))
+ if (ifp->rt && !rcu_access_pointer(ifp->rt->fib6_node)) {
ip6_ins_rt(net, ifp->rt);
+ } else if (!ifp->rt && (ifp->idev->dev->flags & IFF_UP)) {
+ pr_warn("BUG: Address %pI6c on device %s is missing its host route.\n",
+ &ifp->addr, ifp->idev->dev->name);
+ }
+
if (ifp->idev->cnf.forwarding)
addrconf_join_anycast(ifp);
if (!ipv6_addr_any(&ifp->peer_addr))
return false;
suppress_route:
- ip6_rt_put(rt);
+ if (!(arg->flags & FIB_LOOKUP_NOREF))
+ ip6_rt_put(rt);
return true;
}
fl6.daddr = sk->sk_v6_daddr;
res = ip6_xmit(sk, skb, &fl6, sk->sk_mark, rcu_dereference(np->opt),
- np->tclass);
+ np->tclass, sk->sk_priority);
rcu_read_unlock();
return res;
}
if (rt->dst.error == -EAGAIN) {
ip6_rt_put_flags(rt, flags);
rt = net->ipv6.ip6_null_entry;
- if (!(flags | RT6_LOOKUP_F_DST_NOREF))
+ if (!(flags & RT6_LOOKUP_F_DST_NOREF))
dst_hold(&rt->dst);
}
if (ipv6_addr_is_multicast(&hdr->saddr))
goto err;
+ /* While RFC4291 is not explicit about v4mapped addresses
+ * in IPv6 headers, it seems clear linux dual-stack
+ * model can not deal properly with these.
+ * Security models could be fooled by ::ffff:127.0.0.1 for example.
+ *
+ * https://tools.ietf.org/html/draft-itojun-v6ops-v4mapped-harmful-02
+ */
+ if (ipv6_addr_v4mapped(&hdr->saddr))
+ goto err;
+
skb->transport_header = skb->network_header + sizeof(*hdr);
IP6CB(skb)->nhoff = offsetof(struct ipv6hdr, nexthdr);
/* Free reference early: we don't need it any more,
and it may hold ip_conntrack module loaded
indefinitely. */
- nf_reset(skb);
+ nf_reset_ct(skb);
skb_postpull_rcsum(skb, skb_network_header(skb),
skb_network_header_len(skb));
* which are using proper atomic operations or spinlocks.
*/
int ip6_xmit(const struct sock *sk, struct sk_buff *skb, struct flowi6 *fl6,
- __u32 mark, struct ipv6_txoptions *opt, int tclass)
+ __u32 mark, struct ipv6_txoptions *opt, int tclass, u32 priority)
{
struct net *net = sock_net(sk);
const struct ipv6_pinfo *np = inet6_sk(sk);
hdr->daddr = *first_hop;
skb->protocol = htons(ETH_P_IPV6);
- skb->priority = sk->sk_priority;
+ skb->priority = priority;
skb->mark = mark;
mtu = dst_mtu(dst);
To compile it as a module, choose M here. If unsure, say N.
config IP6_NF_MATCH_SRH
- tristate '"srh" Segment Routing header match support'
- depends on NETFILTER_ADVANCED
- help
- srh matching allows you to match packets based on the segment
+ tristate '"srh" Segment Routing header match support'
+ depends on NETFILTER_ADVANCED
+ help
+ srh matching allows you to match packets based on the segment
routing header of the packet.
- To compile it as a module, choose M here. If unsure, say N.
+ To compile it as a module, choose M here. If unsure, say N.
# The targets
config IP6_NF_TARGET_HL
depends on SECURITY
depends on NETFILTER_ADVANCED
help
- This option adds a `security' table to iptables, for use
- with Mandatory Access Control (MAC) policy.
+ This option adds a `security' table to iptables, for use
+ with Mandatory Access Control (MAC) policy.
- If unsure, say N.
+ If unsure, say N.
config IP6_NF_NAT
tristate "ip6tables NAT support"
return;
#if IS_ENABLED(CONFIG_NF_CONNTRACK)
- nf_reset(skb);
+ nf_reset_ct(skb);
nf_ct_set(skb, NULL, IP_CT_UNTRACKED);
#endif
if (hooknum == NF_INET_PRE_ROUTING ||
/* Not releasing hash table! */
if (clone) {
- nf_reset(clone);
+ nf_reset_ct(clone);
rawv6_rcv(sk, clone);
}
}
opt = ireq->ipv6_opt;
if (!opt)
opt = rcu_dereference(np->opt);
- err = ip6_xmit(sk, skb, fl6, sk->sk_mark, opt, np->tclass);
+ err = ip6_xmit(sk, skb, fl6, sk->sk_mark, opt, np->tclass,
+ sk->sk_priority);
rcu_read_unlock();
err = net_xmit_eval(err);
}
static void tcp_v6_send_response(const struct sock *sk, struct sk_buff *skb, u32 seq,
u32 ack, u32 win, u32 tsval, u32 tsecr,
int oif, struct tcp_md5sig_key *key, int rst,
- u8 tclass, __be32 label)
+ u8 tclass, __be32 label, u32 priority)
{
const struct tcphdr *th = tcp_hdr(skb);
struct tcphdr *t1;
dst = ip6_dst_lookup_flow(ctl_sk, &fl6, NULL);
if (!IS_ERR(dst)) {
skb_dst_set(buff, dst);
- ip6_xmit(ctl_sk, buff, &fl6, fl6.flowi6_mark, NULL, tclass);
+ ip6_xmit(ctl_sk, buff, &fl6, fl6.flowi6_mark, NULL, tclass,
+ priority);
TCP_INC_STATS(net, TCP_MIB_OUTSEGS);
if (rst)
TCP_INC_STATS(net, TCP_MIB_OUTRSTS);
struct sock *sk1 = NULL;
#endif
__be32 label = 0;
+ u32 priority = 0;
struct net *net;
int oif = 0;
trace_tcp_send_reset(sk, skb);
if (np->repflow)
label = ip6_flowlabel(ipv6h);
+ priority = sk->sk_priority;
}
- if (sk->sk_state == TCP_TIME_WAIT)
+ if (sk->sk_state == TCP_TIME_WAIT) {
label = cpu_to_be32(inet_twsk(sk)->tw_flowlabel);
+ priority = inet_twsk(sk)->tw_priority;
+ }
} else {
if (net->ipv6.sysctl.flowlabel_reflect & FLOWLABEL_REFLECT_TCP_RESET)
label = ip6_flowlabel(ipv6h);
}
tcp_v6_send_response(sk, skb, seq, ack_seq, 0, 0, 0, oif, key, 1, 0,
- label);
+ label, priority);
#ifdef CONFIG_TCP_MD5SIG
out:
static void tcp_v6_send_ack(const struct sock *sk, struct sk_buff *skb, u32 seq,
u32 ack, u32 win, u32 tsval, u32 tsecr, int oif,
struct tcp_md5sig_key *key, u8 tclass,
- __be32 label)
+ __be32 label, u32 priority)
{
tcp_v6_send_response(sk, skb, seq, ack, win, tsval, tsecr, oif, key, 0,
- tclass, label);
+ tclass, label, priority);
}
static void tcp_v6_timewait_ack(struct sock *sk, struct sk_buff *skb)
tcptw->tw_rcv_wnd >> tw->tw_rcv_wscale,
tcp_time_stamp_raw() + tcptw->tw_ts_offset,
tcptw->tw_ts_recent, tw->tw_bound_dev_if, tcp_twsk_md5_key(tcptw),
- tw->tw_tclass, cpu_to_be32(tw->tw_flowlabel));
+ tw->tw_tclass, cpu_to_be32(tw->tw_flowlabel), tw->tw_priority);
inet_twsk_put(tw);
}
tcp_time_stamp_raw() + tcp_rsk(req)->ts_off,
req->ts_recent, sk->sk_bound_dev_if,
tcp_v6_md5_do_lookup(sk, &ipv6_hdr(skb)->saddr),
- 0, 0);
+ 0, 0, sk->sk_priority);
}
__wsum csum = 0;
int offset = skb_transport_offset(skb);
int len = skb->len - offset;
+ int datalen = len - sizeof(*uh);
/*
* Create a UDP header
return -EIO;
}
- skb_shinfo(skb)->gso_size = cork->gso_size;
- skb_shinfo(skb)->gso_type = SKB_GSO_UDP_L4;
+ if (datalen > cork->gso_size) {
+ skb_shinfo(skb)->gso_size = cork->gso_size;
+ skb_shinfo(skb)->gso_type = SKB_GSO_UDP_L4;
+ skb_shinfo(skb)->gso_segs = DIV_ROUND_UP(datalen,
+ cork->gso_size);
+ }
goto csum_partial;
}
{
struct kcm_psock *psock = container_of(strp, struct kcm_psock, strp);
struct bpf_prog *prog = psock->bpf_prog;
+ int res;
- return BPF_PROG_RUN(prog, skb);
+ preempt_disable();
+ res = BPF_PROG_RUN(prog, skb);
+ preempt_enable();
+ return res;
}
static int kcm_read_sock_done(struct strparser *strp, int err)
memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt));
IPCB(skb)->flags &= ~(IPSKB_XFRM_TUNNEL_SIZE | IPSKB_XFRM_TRANSFORMED |
IPSKB_REROUTED);
- nf_reset(skb);
+ nf_reset_ct(skb);
bh_lock_sock(sk);
if (sock_owned_by_user(sk)) {
skb->ip_summed = CHECKSUM_NONE;
skb_dst_drop(skb);
- nf_reset(skb);
+ nf_reset_ct(skb);
rcu_read_lock();
dev = rcu_dereference(spriv->dev);
if (!xfrm4_policy_check(sk, XFRM_POLICY_IN, skb))
goto discard_put;
- nf_reset(skb);
+ nf_reset_ct(skb);
return sk_receive_skb(sk, skb, 1);
if (!xfrm6_policy_check(sk, XFRM_POLICY_IN, skb))
goto discard_put;
- nf_reset(skb);
+ nf_reset_ct(skb);
return sk_receive_skb(sk, skb, 1);
const struct ieee80211_sub_if_data *sdata, char *buf, int buflen)
{
struct ieee80211_local *local = sdata->local;
- struct txq_info *txqi = to_txq_info(sdata->vif.txq);
+ struct txq_info *txqi;
int len;
+ if (!sdata->vif.txq)
+ return 0;
+
+ txqi = to_txq_info(sdata->vif.txq);
+
spin_lock_bh(&local->fq.lock);
rcu_read_lock();
DEBUGFS_ADD(rc_rateidx_vht_mcs_mask_5ghz);
DEBUGFS_ADD(hw_queues);
- if (sdata->local->ops->wake_tx_queue)
+ if (sdata->local->ops->wake_tx_queue &&
+ sdata->vif.type != NL80211_IFTYPE_P2P_DEVICE &&
+ sdata->vif.type != NL80211_IFTYPE_NAN)
DEBUGFS_ADD(aqm);
}
struct sta_info *sta;
int i;
- spin_lock_bh(&fq->lock);
+ local_bh_disable();
+ spin_lock(&fq->lock);
if (sdata->vif.type == NL80211_IFTYPE_AP)
ps = &sdata->bss->ps;
&txqi->flags))
continue;
- spin_unlock_bh(&fq->lock);
+ spin_unlock(&fq->lock);
drv_wake_tx_queue(local, txqi);
- spin_lock_bh(&fq->lock);
+ spin_lock(&fq->lock);
}
}
(ps && atomic_read(&ps->num_sta_ps)) || ac != vif->txq->ac)
goto out;
- spin_unlock_bh(&fq->lock);
+ spin_unlock(&fq->lock);
drv_wake_tx_queue(local, txqi);
+ local_bh_enable();
return;
out:
- spin_unlock_bh(&fq->lock);
+ spin_unlock(&fq->lock);
+ local_bh_enable();
}
static void
ncsi_dev_state_config_ev,
ncsi_dev_state_config_sma,
ncsi_dev_state_config_ebf,
-#if IS_ENABLED(CONFIG_IPV6)
- ncsi_dev_state_config_egmf,
-#endif
+ ncsi_dev_state_config_dgmf,
ncsi_dev_state_config_ecnt,
ncsi_dev_state_config_ec,
ncsi_dev_state_config_ae,
#define NCSI_DEV_RESET 8 /* Reset state of NC */
unsigned int gma_flag; /* OEM GMA flag */
spinlock_t lock; /* Protect the NCSI device */
-#if IS_ENABLED(CONFIG_IPV6)
- unsigned int inet6_addr_num; /* Number of IPv6 addresses */
-#endif
unsigned int package_probe_id;/* Current ID during probe */
unsigned int package_num; /* Number of packages */
struct list_head packages; /* List of packages */
#include <net/sock.h>
#include <net/addrconf.h>
#include <net/ipv6.h>
-#include <net/if_inet6.h>
#include <net/genetlink.h>
#include "internal.h"
case ncsi_dev_state_config_ev:
case ncsi_dev_state_config_sma:
case ncsi_dev_state_config_ebf:
-#if IS_ENABLED(CONFIG_IPV6)
- case ncsi_dev_state_config_egmf:
-#endif
+ case ncsi_dev_state_config_dgmf:
case ncsi_dev_state_config_ecnt:
case ncsi_dev_state_config_ec:
case ncsi_dev_state_config_ae:
} else if (nd->state == ncsi_dev_state_config_ebf) {
nca.type = NCSI_PKT_CMD_EBF;
nca.dwords[0] = nc->caps[NCSI_CAP_BC].cap;
- if (ncsi_channel_is_tx(ndp, nc))
+ /* if multicast global filtering is supported then
+ * disable it so that all multicast packet will be
+ * forwarded to management controller
+ */
+ if (nc->caps[NCSI_CAP_GENERIC].cap &
+ NCSI_CAP_GENERIC_MC)
+ nd->state = ncsi_dev_state_config_dgmf;
+ else if (ncsi_channel_is_tx(ndp, nc))
nd->state = ncsi_dev_state_config_ecnt;
else
nd->state = ncsi_dev_state_config_ec;
-#if IS_ENABLED(CONFIG_IPV6)
- if (ndp->inet6_addr_num > 0 &&
- (nc->caps[NCSI_CAP_GENERIC].cap &
- NCSI_CAP_GENERIC_MC))
- nd->state = ncsi_dev_state_config_egmf;
- } else if (nd->state == ncsi_dev_state_config_egmf) {
- nca.type = NCSI_PKT_CMD_EGMF;
- nca.dwords[0] = nc->caps[NCSI_CAP_MC].cap;
+ } else if (nd->state == ncsi_dev_state_config_dgmf) {
+ nca.type = NCSI_PKT_CMD_DGMF;
if (ncsi_channel_is_tx(ndp, nc))
nd->state = ncsi_dev_state_config_ecnt;
else
nd->state = ncsi_dev_state_config_ec;
-#endif /* CONFIG_IPV6 */
} else if (nd->state == ncsi_dev_state_config_ecnt) {
if (np->preferred_channel &&
nc != np->preferred_channel)
return -ENODEV;
}
-#if IS_ENABLED(CONFIG_IPV6)
-static int ncsi_inet6addr_event(struct notifier_block *this,
- unsigned long event, void *data)
-{
- struct inet6_ifaddr *ifa = data;
- struct net_device *dev = ifa->idev->dev;
- struct ncsi_dev *nd = ncsi_find_dev(dev);
- struct ncsi_dev_priv *ndp = nd ? TO_NCSI_DEV_PRIV(nd) : NULL;
- struct ncsi_package *np;
- struct ncsi_channel *nc;
- struct ncsi_cmd_arg nca;
- bool action;
- int ret;
-
- if (!ndp || (ipv6_addr_type(&ifa->addr) &
- (IPV6_ADDR_LINKLOCAL | IPV6_ADDR_LOOPBACK)))
- return NOTIFY_OK;
-
- switch (event) {
- case NETDEV_UP:
- action = (++ndp->inet6_addr_num) == 1;
- nca.type = NCSI_PKT_CMD_EGMF;
- break;
- case NETDEV_DOWN:
- action = (--ndp->inet6_addr_num == 0);
- nca.type = NCSI_PKT_CMD_DGMF;
- break;
- default:
- return NOTIFY_OK;
- }
-
- /* We might not have active channel or packages. The IPv6
- * required multicast will be enabled when active channel
- * or packages are chosen.
- */
- np = ndp->active_package;
- nc = ndp->active_channel;
- if (!action || !np || !nc)
- return NOTIFY_OK;
-
- /* We needn't enable or disable it if the function isn't supported */
- if (!(nc->caps[NCSI_CAP_GENERIC].cap & NCSI_CAP_GENERIC_MC))
- return NOTIFY_OK;
-
- nca.ndp = ndp;
- nca.req_flags = 0;
- nca.package = np->id;
- nca.channel = nc->id;
- nca.dwords[0] = nc->caps[NCSI_CAP_MC].cap;
- ret = ncsi_xmit_cmd(&nca);
- if (ret) {
- netdev_warn(dev, "Fail to %s global multicast filter (%d)\n",
- (event == NETDEV_UP) ? "enable" : "disable", ret);
- return NOTIFY_DONE;
- }
-
- return NOTIFY_OK;
-}
-
-static struct notifier_block ncsi_inet6addr_notifier = {
- .notifier_call = ncsi_inet6addr_event,
-};
-#endif /* CONFIG_IPV6 */
-
static int ncsi_kick_channels(struct ncsi_dev_priv *ndp)
{
struct ncsi_dev *nd = &ndp->ndev;
}
spin_lock_irqsave(&ncsi_dev_lock, flags);
-#if IS_ENABLED(CONFIG_IPV6)
- ndp->inet6_addr_num = 0;
- if (list_empty(&ncsi_dev_list))
- register_inet6addr_notifier(&ncsi_inet6addr_notifier);
-#endif
list_add_tail_rcu(&ndp->node, &ncsi_dev_list);
spin_unlock_irqrestore(&ncsi_dev_lock, flags);
spin_lock_irqsave(&ncsi_dev_lock, flags);
list_del_rcu(&ndp->node);
-#if IS_ENABLED(CONFIG_IPV6)
- if (list_empty(&ncsi_dev_list))
- unregister_inet6addr_notifier(&ncsi_inet6addr_notifier);
-#endif
spin_unlock_irqrestore(&ncsi_dev_lock, flags);
ncsi_unregister_netlink(nd->dev);
tristate "Netfilter flow table mixed IPv4/IPv6 module"
depends on NF_FLOW_TABLE
help
- This option adds the flow table mixed IPv4/IPv6 support.
+ This option adds the flow table mixed IPv4/IPv6 support.
To compile it as a module, choose M here.
module, choose M here. If unsure, say N.
config IP_VS_LC
- tristate "least-connection scheduling"
+ tristate "least-connection scheduling"
---help---
The least-connection scheduling algorithm directs network
connections to the server with the least number of active
module, choose M here. If unsure, say N.
config IP_VS_WLC
- tristate "weighted least-connection scheduling"
+ tristate "weighted least-connection scheduling"
---help---
The weighted least-connection scheduling algorithm directs network
connections to the server with the least active connections
config IP_VS_PE_SIP
tristate "SIP persistence engine"
- depends on IP_VS_PROTO_UDP
+ depends on IP_VS_PROTO_UDP
depends on NF_CONNTRACK_SIP
---help---
Allow persistence based on the SIP Call-ID
if (unlikely(cp->flags & IP_VS_CONN_F_NFCT))
ret = ip_vs_confirm_conntrack(skb);
if (ret == NF_ACCEPT) {
- nf_reset(skb);
+ nf_reset_ct(skb);
skb_forward_csum(skb);
}
return ret;
goto err2;
}
- nft_trans_chain_policy(trans) = -1;
+ nft_trans_chain_policy(trans) = NFT_CHAIN_POLICY_UNSET;
if (nft_is_base_chain(chain))
nft_trans_chain_policy(trans) = policy;
NFT_SET_OBJECT))
return -EINVAL;
/* Only one of these operations is supported */
- if ((flags & (NFT_SET_MAP | NFT_SET_EVAL | NFT_SET_OBJECT)) ==
- (NFT_SET_MAP | NFT_SET_EVAL | NFT_SET_OBJECT))
+ if ((flags & (NFT_SET_MAP | NFT_SET_OBJECT)) ==
+ (NFT_SET_MAP | NFT_SET_OBJECT))
+ return -EOPNOTSUPP;
+ if ((flags & (NFT_SET_EVAL | NFT_SET_OBJECT)) ==
+ (NFT_SET_EVAL | NFT_SET_OBJECT))
return -EOPNOTSUPP;
}
}
EXPORT_SYMBOL_GPL(nft_flowtable_lookup);
+void nf_tables_deactivate_flowtable(const struct nft_ctx *ctx,
+ struct nft_flowtable *flowtable,
+ enum nft_trans_phase phase)
+{
+ switch (phase) {
+ case NFT_TRANS_PREPARE:
+ case NFT_TRANS_ABORT:
+ case NFT_TRANS_RELEASE:
+ flowtable->use--;
+ /* fall through */
+ default:
+ return;
+ }
+}
+EXPORT_SYMBOL_GPL(nf_tables_deactivate_flowtable);
+
static struct nft_flowtable *
nft_flowtable_lookup_byhandle(const struct nft_table *table,
const struct nlattr *nla, u8 genmask)
policy = ppolicy ? *ppolicy : basechain->policy;
/* Only default policy to accept is supported for now. */
- if (cmd == FLOW_BLOCK_BIND && policy != -1 && policy != NF_ACCEPT)
+ if (cmd == FLOW_BLOCK_BIND && policy == NF_DROP)
return -EOPNOTSUPP;
if (dev->netdev_ops->ndo_setup_tc)
static bool nft_connlimit_gc(struct net *net, const struct nft_expr *expr)
{
struct nft_connlimit *priv = nft_expr_priv(expr);
+ bool ret;
- return nf_conncount_gc_list(net, &priv->list);
+ local_bh_disable();
+ ret = nf_conncount_gc_list(net, &priv->list);
+ local_bh_enable();
+
+ return ret;
}
static struct nft_expr_type nft_connlimit_type;
return nf_ct_netns_get(ctx->net, ctx->family);
}
+static void nft_flow_offload_deactivate(const struct nft_ctx *ctx,
+ const struct nft_expr *expr,
+ enum nft_trans_phase phase)
+{
+ struct nft_flow_offload *priv = nft_expr_priv(expr);
+
+ nf_tables_deactivate_flowtable(ctx, priv->flowtable, phase);
+}
+
+static void nft_flow_offload_activate(const struct nft_ctx *ctx,
+ const struct nft_expr *expr)
+{
+ struct nft_flow_offload *priv = nft_expr_priv(expr);
+
+ priv->flowtable->use++;
+}
+
static void nft_flow_offload_destroy(const struct nft_ctx *ctx,
const struct nft_expr *expr)
{
.size = NFT_EXPR_SIZE(sizeof(struct nft_flow_offload)),
.eval = nft_flow_offload_eval,
.init = nft_flow_offload_init,
+ .activate = nft_flow_offload_activate,
+ .deactivate = nft_flow_offload_deactivate,
.destroy = nft_flow_offload_destroy,
.validate = nft_flow_offload_validate,
.dump = nft_flow_offload_dump,
if (IS_ERR(set))
return PTR_ERR(set);
- if (set->flags & NFT_SET_EVAL)
- return -EOPNOTSUPP;
-
priv->sreg = nft_parse_register(tb[NFTA_LOOKUP_SREG]);
err = nft_validate_register_load(priv->sreg, set->klen);
if (err < 0)
llcp_sock->service_name = kmemdup(llcp_addr.service_name,
llcp_sock->service_name_len,
GFP_KERNEL);
-
+ if (!llcp_sock->service_name) {
+ ret = -ENOMEM;
+ goto put_dev;
+ }
llcp_sock->ssap = nfc_llcp_get_sdp_ssap(local, llcp_sock);
if (llcp_sock->ssap == LLCP_SAP_MAX) {
+ kfree(llcp_sock->service_name);
+ llcp_sock->service_name = NULL;
ret = -EADDRINUSE;
goto put_dev;
}
sock->type != SOCK_RAW)
return -ESOCKTNOSUPPORT;
- if (sock->type == SOCK_RAW)
+ if (sock->type == SOCK_RAW) {
+ if (!capable(CAP_NET_RAW))
+ return -EPERM;
sock->ops = &llcp_rawsock_ops;
- else
+ } else {
sock->ops = &llcp_sock_ops;
+ }
sk = nfc_llcp_sock_alloc(sock, sock->type, GFP_ATOMIC, kern);
if (sk == NULL)
[OVS_VPORT_ATTR_STATS] = { .len = sizeof(struct ovs_vport_stats) },
[OVS_VPORT_ATTR_PORT_NO] = { .type = NLA_U32 },
[OVS_VPORT_ATTR_TYPE] = { .type = NLA_U32 },
- [OVS_VPORT_ATTR_UPCALL_PID] = { .type = NLA_U32 },
+ [OVS_VPORT_ATTR_UPCALL_PID] = { .type = NLA_UNSPEC },
[OVS_VPORT_ATTR_OPTIONS] = { .type = NLA_NESTED },
[OVS_VPORT_ATTR_IFINDEX] = { .type = NLA_U32 },
[OVS_VPORT_ATTR_NETNSID] = { .type = NLA_S32 },
}
skb_dst_drop(skb);
- nf_reset(skb);
+ nf_reset_ct(skb);
secpath_reset(skb);
skb->pkt_type = PACKET_HOST;
skb_dst_drop(skb);
/* drop conntrack reference */
- nf_reset(skb);
+ nf_reset_ct(skb);
spkt = &PACKET_SKB_CB(skb)->sa.pkt;
skb_dst_drop(skb);
/* drop conntrack reference */
- nf_reset(skb);
+ nf_reset_ct(skb);
spin_lock(&sk->sk_receive_queue.lock);
po->stats.stats1.tp_packets++;
list_del(&node->item);
mutex_unlock(&qrtr_node_lock);
+ cancel_work_sync(&node->work);
skb_queue_purge(&node->rx_queue);
kfree(node);
}
This transport does not support RDMA operations.
config RDS_DEBUG
- bool "RDS debugging messages"
+ bool "RDS debugging messages"
depends on RDS
- default n
+ default n
*/
if (rs->rs_transport) {
trans = rs->rs_transport;
- if (trans->laddr_check(sock_net(sock->sk),
+ if (!trans->laddr_check ||
+ trans->laddr_check(sock_net(sock->sk),
binding_addr, scope_id) != 0) {
ret = -ENOPROTOOPT;
goto out;
sock_set_flag(sk, SOCK_RCU_FREE);
ret = rds_add_bound(rs, binding_addr, &port, scope_id);
+ if (ret)
+ rs->rs_transport = NULL;
out:
release_sock(sk);
refcount_set(&rds_ibdev->refcount, 1);
INIT_WORK(&rds_ibdev->free_work, rds_ib_dev_free);
+ INIT_LIST_HEAD(&rds_ibdev->ipaddr_list);
+ INIT_LIST_HEAD(&rds_ibdev->conn_list);
+
rds_ibdev->max_wrs = device->attrs.max_qp_wr;
rds_ibdev->max_sge = min(device->attrs.max_send_sge, RDS_IB_MAX_SGE);
device->name,
rds_ibdev->use_fastreg ? "FRMR" : "FMR");
- INIT_LIST_HEAD(&rds_ibdev->ipaddr_list);
- INIT_LIST_HEAD(&rds_ibdev->conn_list);
-
down_write(&rds_ib_devices_lock);
list_add_tail_rcu(&rds_ibdev->list, &rds_ib_devices);
up_write(&rds_ib_devices_lock);
tristate "Common Applications Kept Enhanced (CAKE)"
help
Say Y here if you want to use the Common Applications Kept Enhanced
- (CAKE) queue management algorithm.
+ (CAKE) queue management algorithm.
To compile this driver as a module, choose M here: the module
will be called sch_cake.
config NET_ACT_POLICE
tristate "Traffic Policing"
- depends on NET_CLS_ACT
- ---help---
+ depends on NET_CLS_ACT
+ ---help---
Say Y here if you want to do traffic policing, i.e. strict
bandwidth limiting. This action replaces the existing policing
module.
module will be called act_police.
config NET_ACT_GACT
- tristate "Generic actions"
- depends on NET_CLS_ACT
- ---help---
+ tristate "Generic actions"
+ depends on NET_CLS_ACT
+ ---help---
Say Y here to take generic actions such as dropping and
accepting packets.
module will be called act_gact.
config GACT_PROB
- bool "Probability support"
- depends on NET_ACT_GACT
- ---help---
+ bool "Probability support"
+ depends on NET_ACT_GACT
+ ---help---
Say Y here to use the generic action randomly or deterministically.
config NET_ACT_MIRRED
- tristate "Redirecting and Mirroring"
- depends on NET_CLS_ACT
- ---help---
+ tristate "Redirecting and Mirroring"
+ depends on NET_CLS_ACT
+ ---help---
Say Y here to allow packets to be mirrored or redirected to
other devices.
module will be called act_mirred.
config NET_ACT_SAMPLE
- tristate "Traffic Sampling"
- depends on NET_CLS_ACT
- select PSAMPLE
- ---help---
+ tristate "Traffic Sampling"
+ depends on NET_CLS_ACT
+ select PSAMPLE
+ ---help---
Say Y here to allow packet sampling tc action. The packet sample
action consists of statistically choosing packets and sampling
them using the psample module.
module will be called act_sample.
config NET_ACT_IPT
- tristate "IPtables targets"
- depends on NET_CLS_ACT && NETFILTER && IP_NF_IPTABLES
- ---help---
+ tristate "IPtables targets"
+ depends on NET_CLS_ACT && NETFILTER && IP_NF_IPTABLES
+ ---help---
Say Y here to be able to invoke iptables targets after successful
classification.
module will be called act_ipt.
config NET_ACT_NAT
- tristate "Stateless NAT"
- depends on NET_CLS_ACT
- ---help---
+ tristate "Stateless NAT"
+ depends on NET_CLS_ACT
+ ---help---
Say Y here to do stateless NAT on IPv4 packets. You should use
netfilter for NAT unless you know what you are doing.
module will be called act_nat.
config NET_ACT_PEDIT
- tristate "Packet Editing"
- depends on NET_CLS_ACT
- ---help---
+ tristate "Packet Editing"
+ depends on NET_CLS_ACT
+ ---help---
Say Y here if you want to mangle the content of packets.
To compile this code as a module, choose M here: the
module will be called act_pedit.
config NET_ACT_SIMP
- tristate "Simple Example (Debug)"
- depends on NET_CLS_ACT
- ---help---
+ tristate "Simple Example (Debug)"
+ depends on NET_CLS_ACT
+ ---help---
Say Y here to add a simple action for demonstration purposes.
It is meant as an example and for debugging purposes. It will
print a configured policy string followed by the packet count
module will be called act_simple.
config NET_ACT_SKBEDIT
- tristate "SKB Editing"
- depends on NET_CLS_ACT
- ---help---
+ tristate "SKB Editing"
+ depends on NET_CLS_ACT
+ ---help---
Say Y here to change skb priority or queue_mapping settings.
If unsure, say N.
module will be called act_skbedit.
config NET_ACT_CSUM
- tristate "Checksum Updating"
- depends on NET_CLS_ACT && INET
- select LIBCRC32C
- ---help---
+ tristate "Checksum Updating"
+ depends on NET_CLS_ACT && INET
+ select LIBCRC32C
+ ---help---
Say Y here to update some common checksum after some direct
packet alterations.
module will be called act_mpls.
config NET_ACT_VLAN
- tristate "Vlan manipulation"
- depends on NET_CLS_ACT
- ---help---
+ tristate "Vlan manipulation"
+ depends on NET_CLS_ACT
+ ---help---
Say Y here to push or pop vlan headers.
If unsure, say N.
module will be called act_vlan.
config NET_ACT_BPF
- tristate "BPF based action"
- depends on NET_CLS_ACT
- ---help---
+ tristate "BPF based action"
+ depends on NET_CLS_ACT
+ ---help---
Say Y here to execute BPF code on packets. The BPF code will decide
if the packet should be dropped or not.
module will be called act_bpf.
config NET_ACT_CONNMARK
- tristate "Netfilter Connection Mark Retriever"
- depends on NET_CLS_ACT && NETFILTER && IP_NF_IPTABLES
- depends on NF_CONNTRACK && NF_CONNTRACK_MARK
- ---help---
+ tristate "Netfilter Connection Mark Retriever"
+ depends on NET_CLS_ACT && NETFILTER && IP_NF_IPTABLES
+ depends on NF_CONNTRACK && NF_CONNTRACK_MARK
+ ---help---
Say Y here to allow retrieving of conn mark
If unsure, say N.
module will be called act_connmark.
config NET_ACT_CTINFO
- tristate "Netfilter Connection Mark Actions"
- depends on NET_CLS_ACT && NETFILTER && IP_NF_IPTABLES
- depends on NF_CONNTRACK && NF_CONNTRACK_MARK
- help
+ tristate "Netfilter Connection Mark Actions"
+ depends on NET_CLS_ACT && NETFILTER && IP_NF_IPTABLES
+ depends on NF_CONNTRACK && NF_CONNTRACK_MARK
+ help
Say Y here to allow transfer of a connmark stored information.
Current actions transfer connmark stored DSCP into
ipv4/v6 diffserv and/or to transfer connmark to packet
module will be called act_ctinfo.
config NET_ACT_SKBMOD
- tristate "skb data modification action"
- depends on NET_CLS_ACT
- ---help---
- Say Y here to allow modification of skb data
+ tristate "skb data modification action"
+ depends on NET_CLS_ACT
+ ---help---
+ Say Y here to allow modification of skb data
- If unsure, say N.
+ If unsure, say N.
- To compile this code as a module, choose M here: the
- module will be called act_skbmod.
+ To compile this code as a module, choose M here: the
+ module will be called act_skbmod.
config NET_ACT_IFE
- tristate "Inter-FE action based on IETF ForCES InterFE LFB"
- depends on NET_CLS_ACT
- select NET_IFE
- ---help---
+ tristate "Inter-FE action based on IETF ForCES InterFE LFB"
+ depends on NET_CLS_ACT
+ select NET_IFE
+ ---help---
Say Y here to allow for sourcing and terminating metadata
For details refer to netdev01 paper:
"Distributing Linux Traffic Control Classifier-Action Subsystem"
module will be called act_ife.
config NET_ACT_TUNNEL_KEY
- tristate "IP tunnel metadata manipulation"
- depends on NET_CLS_ACT
- ---help---
+ tristate "IP tunnel metadata manipulation"
+ depends on NET_CLS_ACT
+ ---help---
Say Y here to set/release ip tunnel metadata.
If unsure, say N.
module will be called act_tunnel_key.
config NET_ACT_CT
- tristate "connection tracking tc action"
- depends on NET_CLS_ACT && NF_CONNTRACK && NF_NAT
- help
+ tristate "connection tracking tc action"
+ depends on NET_CLS_ACT && NF_CONNTRACK && NF_NAT
+ help
Say Y here to allow sending the packets to conntrack module.
If unsure, say N.
module will be called act_ct.
config NET_IFE_SKBMARK
- tristate "Support to encoding decoding skb mark on IFE action"
- depends on NET_ACT_IFE
+ tristate "Support to encoding decoding skb mark on IFE action"
+ depends on NET_ACT_IFE
config NET_IFE_SKBPRIO
- tristate "Support to encoding decoding skb prio on IFE action"
- depends on NET_ACT_IFE
+ tristate "Support to encoding decoding skb prio on IFE action"
+ depends on NET_ACT_IFE
config NET_IFE_SKBTCINDEX
- tristate "Support to encoding decoding skb tcindex on IFE action"
- depends on NET_ACT_IFE
+ tristate "Support to encoding decoding skb tcindex on IFE action"
+ depends on NET_ACT_IFE
config NET_TC_SKB_EXT
bool "TC recirculation support"
depends on NET_CLS_ACT
- default y if NET_CLS_ACT
select SKB_EXTENSIONS
help
return c;
}
+static const struct nla_policy tcf_action_policy[TCA_ACT_MAX + 1] = {
+ [TCA_ACT_KIND] = { .type = NLA_NUL_STRING,
+ .len = IFNAMSIZ - 1 },
+ [TCA_ACT_INDEX] = { .type = NLA_U32 },
+ [TCA_ACT_COOKIE] = { .type = NLA_BINARY,
+ .len = TC_COOKIE_MAX_SIZE },
+ [TCA_ACT_OPTIONS] = { .type = NLA_NESTED },
+};
+
struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp,
struct nlattr *nla, struct nlattr *est,
char *name, int ovr, int bind,
int err;
if (name == NULL) {
- err = nla_parse_nested_deprecated(tb, TCA_ACT_MAX, nla, NULL,
- extack);
+ err = nla_parse_nested_deprecated(tb, TCA_ACT_MAX, nla,
+ tcf_action_policy, extack);
if (err < 0)
goto err_out;
err = -EINVAL;
NL_SET_ERR_MSG(extack, "TC action kind must be specified");
goto err_out;
}
- if (nla_strlcpy(act_name, kind, IFNAMSIZ) >= IFNAMSIZ) {
- NL_SET_ERR_MSG(extack, "TC action name too long");
- goto err_out;
- }
- if (tb[TCA_ACT_COOKIE]) {
- int cklen = nla_len(tb[TCA_ACT_COOKIE]);
-
- if (cklen > TC_COOKIE_MAX_SIZE) {
- NL_SET_ERR_MSG(extack, "TC cookie size above the maximum");
- goto err_out;
- }
+ nla_strlcpy(act_name, kind, IFNAMSIZ);
+ if (tb[TCA_ACT_COOKIE]) {
cookie = nla_memdup_cookie(tb);
if (!cookie) {
NL_SET_ERR_MSG(extack, "No memory to generate TC cookie");
int index;
int err;
- err = nla_parse_nested_deprecated(tb, TCA_ACT_MAX, nla, NULL, extack);
+ err = nla_parse_nested_deprecated(tb, TCA_ACT_MAX, nla,
+ tcf_action_policy, extack);
if (err < 0)
goto err_out;
b = skb_tail_pointer(skb);
- err = nla_parse_nested_deprecated(tb, TCA_ACT_MAX, nla, NULL, extack);
+ err = nla_parse_nested_deprecated(tb, TCA_ACT_MAX, nla,
+ tcf_action_policy, extack);
if (err < 0)
goto err_out;
if (tb[1] == NULL)
return NULL;
- if (nla_parse_nested_deprecated(tb2, TCA_ACT_MAX, tb[1], NULL, NULL) < 0)
+ if (nla_parse_nested_deprecated(tb2, TCA_ACT_MAX, tb[1], tcf_action_policy, NULL) < 0)
return NULL;
kind = tb2[TCA_ACT_KIND];
case ARPHRD_TUNNEL6:
case ARPHRD_SIT:
case ARPHRD_IPGRE:
+ case ARPHRD_IP6GRE:
case ARPHRD_VOID:
case ARPHRD_NONE:
return false;
void tcf_exts_destroy(struct tcf_exts *exts)
{
#ifdef CONFIG_NET_CLS_ACT
- tcf_action_destroy(exts->actions, TCA_ACT_UNBIND);
- kfree(exts->actions);
+ if (exts->actions) {
+ tcf_action_destroy(exts->actions, TCA_ACT_UNBIND);
+ kfree(exts->actions);
+ }
exts->nr_actions = 0;
#endif
}
}
const struct nla_policy rtm_tca_policy[TCA_MAX + 1] = {
- [TCA_KIND] = { .type = NLA_STRING },
+ [TCA_KIND] = { .type = NLA_NUL_STRING,
+ .len = IFNAMSIZ - 1 },
[TCA_RATE] = { .type = NLA_BINARY,
.len = sizeof(struct tc_estimator) },
[TCA_STAB] = { .type = NLA_NESTED },
[TCA_CBQ_POLICE] = { .len = sizeof(struct tc_cbq_police) },
};
+static int cbq_opt_parse(struct nlattr *tb[TCA_CBQ_MAX + 1],
+ struct nlattr *opt,
+ struct netlink_ext_ack *extack)
+{
+ int err;
+
+ if (!opt) {
+ NL_SET_ERR_MSG(extack, "CBQ options are required for this operation");
+ return -EINVAL;
+ }
+
+ err = nla_parse_nested_deprecated(tb, TCA_CBQ_MAX, opt,
+ cbq_policy, extack);
+ if (err < 0)
+ return err;
+
+ if (tb[TCA_CBQ_WRROPT]) {
+ const struct tc_cbq_wrropt *wrr = nla_data(tb[TCA_CBQ_WRROPT]);
+
+ if (wrr->priority > TC_CBQ_MAXPRIO) {
+ NL_SET_ERR_MSG(extack, "priority is bigger than TC_CBQ_MAXPRIO");
+ err = -EINVAL;
+ }
+ }
+ return err;
+}
+
static int cbq_init(struct Qdisc *sch, struct nlattr *opt,
struct netlink_ext_ack *extack)
{
hrtimer_init(&q->delay_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_PINNED);
q->delay_timer.function = cbq_undelay;
- if (!opt) {
- NL_SET_ERR_MSG(extack, "CBQ options are required for this operation");
- return -EINVAL;
- }
-
- err = nla_parse_nested_deprecated(tb, TCA_CBQ_MAX, opt, cbq_policy,
- extack);
+ err = cbq_opt_parse(tb, opt, extack);
if (err < 0)
return err;
struct cbq_class *parent;
struct qdisc_rate_table *rtab = NULL;
- if (!opt) {
- NL_SET_ERR_MSG(extack, "Mandatory qdisc options missing");
- return -EINVAL;
- }
-
- err = nla_parse_nested_deprecated(tb, TCA_CBQ_MAX, opt, cbq_policy,
- extack);
+ err = cbq_opt_parse(tb, opt, extack);
if (err < 0)
return err;
if (err < 0)
goto skip;
- if (ecmd.base.speed != SPEED_UNKNOWN)
+ if (ecmd.base.speed && ecmd.base.speed != SPEED_UNKNOWN)
speed = ecmd.base.speed;
skip:
{
struct cbs_sched_data *q = qdisc_priv(sch);
struct net_device *dev = qdisc_dev(sch);
- int err;
if (!opt) {
NL_SET_ERR_MSG(extack, "Missing CBS qdisc options which are mandatory");
if (!q->qdisc)
return -ENOMEM;
+ spin_lock(&cbs_list_lock);
+ list_add(&q->cbs_list, &cbs_list);
+ spin_unlock(&cbs_list_lock);
+
qdisc_hash_add(q->qdisc, false);
q->queue = sch->dev_queue - netdev_get_tx_queue(dev, 0);
qdisc_watchdog_init(&q->watchdog, sch);
- err = cbs_change(sch, opt, extack);
- if (err)
- return err;
-
- if (!q->offload) {
- spin_lock(&cbs_list_lock);
- list_add(&q->cbs_list, &cbs_list);
- spin_unlock(&cbs_list_lock);
- }
-
- return 0;
+ return cbs_change(sch, opt, extack);
}
static void cbs_destroy(struct Qdisc *sch)
struct cbs_sched_data *q = qdisc_priv(sch);
struct net_device *dev = qdisc_dev(sch);
- spin_lock(&cbs_list_lock);
- list_del(&q->cbs_list);
- spin_unlock(&cbs_list_lock);
+ /* Nothing to do if we couldn't create the underlying qdisc */
+ if (!q->qdisc)
+ return;
qdisc_watchdog_cancel(&q->watchdog);
cbs_disable_offload(dev, q);
- if (q->qdisc)
- qdisc_put(q->qdisc);
+ spin_lock(&cbs_list_lock);
+ list_del(&q->cbs_list);
+ spin_unlock(&cbs_list_lock);
+
+ qdisc_put(q->qdisc);
}
static int cbs_dump(struct Qdisc *sch, struct sk_buff *skb)
goto errout;
err = -EINVAL;
+ if (!tb[TCA_DSMARK_INDICES])
+ goto errout;
indices = nla_get_u16(tb[TCA_DSMARK_INDICES]);
if (hweight32(indices) != 1)
struct htb_class *cl = (struct htb_class *)*arg, *parent;
struct nlattr *opt = tca[TCA_OPTIONS];
struct nlattr *tb[TCA_HTB_MAX + 1];
+ struct Qdisc *parent_qdisc = NULL;
struct tc_htb_opt *hopt;
u64 rate64, ceil64;
int warn = 0;
if (parent && !parent->level) {
/* turn parent into inner node */
qdisc_purge_queue(parent->leaf.q);
- qdisc_put(parent->leaf.q);
+ parent_qdisc = parent->leaf.q;
if (parent->prio_activity)
htb_deactivate(q, parent);
cl->cbuffer = PSCHED_TICKS2NS(hopt->cbuffer);
sch_tree_unlock(sch);
+ qdisc_put(parent_qdisc);
if (warn)
pr_warn("HTB: quantum of class %X is %s. Consider r2q change.\n",
{
struct multiq_sched_data *q = qdisc_priv(sch);
struct tc_multiq_qopt *qopt;
- int i;
+ struct Qdisc **removed;
+ int i, n_removed = 0;
if (!netif_is_multiqueue(qdisc_dev(sch)))
return -EOPNOTSUPP;
qopt->bands = qdisc_dev(sch)->real_num_tx_queues;
+ removed = kmalloc(sizeof(*removed) * (q->max_bands - q->bands),
+ GFP_KERNEL);
+ if (!removed)
+ return -ENOMEM;
+
sch_tree_lock(sch);
q->bands = qopt->bands;
for (i = q->bands; i < q->max_bands; i++) {
struct Qdisc *child = q->queues[i];
q->queues[i] = &noop_qdisc;
- qdisc_tree_flush_backlog(child);
- qdisc_put(child);
+ qdisc_purge_queue(child);
+ removed[n_removed++] = child;
}
}
sch_tree_unlock(sch);
+ for (i = 0; i < n_removed; i++)
+ qdisc_put(removed[i]);
+ kfree(removed);
+
for (i = 0; i < q->bands; i++) {
if (q->queues[i] == &noop_qdisc) {
struct Qdisc *child, *old;
if (child != &noop_qdisc)
qdisc_hash_add(child, true);
- if (old != &noop_qdisc) {
- qdisc_tree_flush_backlog(old);
- qdisc_put(old);
- }
+ if (old != &noop_qdisc)
+ qdisc_purge_queue(old);
sch_tree_unlock(sch);
+ qdisc_put(old);
}
}
}
* skb will be queued.
*/
if (count > 1 && (skb2 = skb_clone(skb, GFP_ATOMIC)) != NULL) {
- struct Qdisc *rootq = qdisc_root(sch);
+ struct Qdisc *rootq = qdisc_root_bh(sch);
u32 dupsave = q->duplicate; /* prevent duplicating a dup... */
q->duplicate = 0;
struct disttable *d;
int i;
- if (n > NETEM_DIST_MAX)
+ if (!n || n > NETEM_DIST_MAX)
return -EINVAL;
d = kvmalloc(sizeof(struct disttable) + n * sizeof(s16), GFP_KERNEL);
struct netlink_ext_ack *extack)
{
struct sfb_sched_data *q = qdisc_priv(sch);
- struct Qdisc *child;
+ struct Qdisc *child, *old;
struct nlattr *tb[TCA_SFB_MAX + 1];
const struct tc_sfb_qopt *ctl = &sfb_default_ops;
u32 limit;
qdisc_hash_add(child, true);
sch_tree_lock(sch);
- qdisc_tree_flush_backlog(q->qdisc);
- qdisc_put(q->qdisc);
+ qdisc_purge_queue(q->qdisc);
+ old = q->qdisc;
q->qdisc = child;
q->rehash_interval = msecs_to_jiffies(ctl->rehash_interval);
sfb_init_perturbation(1, q);
sch_tree_unlock(sch);
+ qdisc_put(old);
return 0;
}
if (err < 0)
goto skip;
- if (ecmd.base.speed != SPEED_UNKNOWN)
+ if (ecmd.base.speed && ecmd.base.speed != SPEED_UNKNOWN)
speed = ecmd.base.speed;
skip:
- picos_per_byte = div64_s64(NSEC_PER_SEC * 1000LL * 8,
- speed * 1000 * 1000);
+ picos_per_byte = (USEC_PER_SEC * 8) / speed;
atomic64_set(&q->picos_per_byte, picos_per_byte);
netdev_dbg(dev, "taprio: set %s's picos_per_byte to: %lld, linkspeed: %d\n",
if (!xfrm_policy_check(sk, XFRM_POLICY_IN, skb, family))
goto discard_release;
- nf_reset(skb);
+ nf_reset_ct(skb);
if (sk_filter(sk, skb))
goto discard_release;
rcu_read_lock();
res = ip6_xmit(sk, skb, fl6, sk->sk_mark, rcu_dereference(np->opt),
- tclass);
+ tclass, sk->sk_priority);
rcu_read_unlock();
return res;
}
spin_lock(&cache_list_lock);
cd->nextcheck = 0;
cd->entries = 0;
- atomic_set(&cd->readers, 0);
+ atomic_set(&cd->writers, 0);
cd->last_close = 0;
cd->last_warn = -1;
list_add(&cd->others, &cache_list);
}
rp->offset = 0;
rp->q.reader = 1;
- atomic_inc(&cd->readers);
+
spin_lock(&queue_lock);
list_add(&rp->q.list, &cd->queue);
spin_unlock(&queue_lock);
}
+ if (filp->f_mode & FMODE_WRITE)
+ atomic_inc(&cd->writers);
filp->private_data = rp;
return 0;
}
filp->private_data = NULL;
kfree(rp);
+ }
+ if (filp->f_mode & FMODE_WRITE) {
+ atomic_dec(&cd->writers);
cd->last_close = seconds_since_boot();
- atomic_dec(&cd->readers);
}
module_put(cd->owner);
return 0;
static bool cache_listeners_exist(struct cache_detail *detail)
{
- if (atomic_read(&detail->readers))
+ if (atomic_read(&detail->writers))
return true;
if (detail->last_close == 0)
/* This cache was never opened */
cd->nextcheck = now;
cache_flush();
+ if (cd->flush)
+ cd->flush();
+
*ppos += count;
return count;
}
if (rqstp->rq_vers >= progp->pg_nvers )
goto err_bad_vers;
- versp = progp->pg_vers[rqstp->rq_vers];
- if (!versp)
+ versp = progp->pg_vers[rqstp->rq_vers];
+ if (!versp)
goto err_bad_vers;
/*
atomic_t rdma_stat_sq_poll;
atomic_t rdma_stat_sq_prod;
-struct workqueue_struct *svc_rdma_wq;
-
/*
* This function implements reading and resetting an atomic_t stat
* variable through read/write to a proc file. Any write to the file
void svc_rdma_cleanup(void)
{
dprintk("SVCRDMA Module Removed, deregister RPC RDMA transport\n");
- destroy_workqueue(svc_rdma_wq);
if (svcrdma_table_header) {
unregister_sysctl_table(svcrdma_table_header);
svcrdma_table_header = NULL;
dprintk("\tmax_bc_requests : %u\n", svcrdma_max_bc_requests);
dprintk("\tmax_inline : %d\n", svcrdma_max_req_size);
- svc_rdma_wq = alloc_workqueue("svc_rdma", 0, 0);
- if (!svc_rdma_wq)
- return -ENOMEM;
-
if (!svcrdma_table_header)
svcrdma_table_header =
register_sysctl_table(svcrdma_root_table);
void svc_rdma_recv_ctxts_destroy(struct svcxprt_rdma *rdma)
{
struct svc_rdma_recv_ctxt *ctxt;
+ struct llist_node *node;
- while ((ctxt = svc_rdma_next_recv_ctxt(&rdma->sc_recv_ctxts))) {
- list_del(&ctxt->rc_list);
+ while ((node = llist_del_first(&rdma->sc_recv_ctxts))) {
+ ctxt = llist_entry(node, struct svc_rdma_recv_ctxt, rc_node);
svc_rdma_recv_ctxt_destroy(rdma, ctxt);
}
}
svc_rdma_recv_ctxt_get(struct svcxprt_rdma *rdma)
{
struct svc_rdma_recv_ctxt *ctxt;
+ struct llist_node *node;
- spin_lock(&rdma->sc_recv_lock);
- ctxt = svc_rdma_next_recv_ctxt(&rdma->sc_recv_ctxts);
- if (!ctxt)
+ node = llist_del_first(&rdma->sc_recv_ctxts);
+ if (!node)
goto out_empty;
- list_del(&ctxt->rc_list);
- spin_unlock(&rdma->sc_recv_lock);
+ ctxt = llist_entry(node, struct svc_rdma_recv_ctxt, rc_node);
out:
ctxt->rc_page_count = 0;
return ctxt;
out_empty:
- spin_unlock(&rdma->sc_recv_lock);
-
ctxt = svc_rdma_recv_ctxt_alloc(rdma);
if (!ctxt)
return NULL;
for (i = 0; i < ctxt->rc_page_count; i++)
put_page(ctxt->rc_pages[i]);
- if (!ctxt->rc_temp) {
- spin_lock(&rdma->sc_recv_lock);
- list_add(&ctxt->rc_list, &rdma->sc_recv_ctxts);
- spin_unlock(&rdma->sc_recv_lock);
- } else
+ if (!ctxt->rc_temp)
+ llist_add(&ctxt->rc_node, &rdma->sc_recv_ctxts);
+ else
svc_rdma_recv_ctxt_destroy(rdma, ctxt);
}
INIT_LIST_HEAD(&cma_xprt->sc_rq_dto_q);
INIT_LIST_HEAD(&cma_xprt->sc_read_complete_q);
INIT_LIST_HEAD(&cma_xprt->sc_send_ctxts);
- INIT_LIST_HEAD(&cma_xprt->sc_recv_ctxts);
+ init_llist_head(&cma_xprt->sc_recv_ctxts);
INIT_LIST_HEAD(&cma_xprt->sc_rw_ctxts);
init_waitqueue_head(&cma_xprt->sc_send_wait);
spin_lock_init(&cma_xprt->sc_lock);
spin_lock_init(&cma_xprt->sc_rq_dto_lock);
spin_lock_init(&cma_xprt->sc_send_lock);
- spin_lock_init(&cma_xprt->sc_recv_lock);
spin_lock_init(&cma_xprt->sc_rw_ctxt_lock);
/*
{
struct svcxprt_rdma *rdma =
container_of(xprt, struct svcxprt_rdma, sc_xprt);
+
INIT_WORK(&rdma->sc_work, __svc_rdma_free);
- queue_work(svc_rdma_wq, &rdma->sc_work);
+ schedule_work(&rdma->sc_work);
}
static int svc_rdma_has_wspace(struct svc_xprt *xprt)
struct {
u16 len;
u16 limit;
+ struct sk_buff *target_bskb;
} backlog[5];
u16 snd_nxt;
u16 window;
void tipc_link_reset(struct tipc_link *l)
{
struct sk_buff_head list;
+ u32 imp;
__skb_queue_head_init(&list);
__skb_queue_purge(&l->deferdq);
__skb_queue_purge(&l->backlogq);
__skb_queue_purge(&l->failover_deferdq);
- l->backlog[TIPC_LOW_IMPORTANCE].len = 0;
- l->backlog[TIPC_MEDIUM_IMPORTANCE].len = 0;
- l->backlog[TIPC_HIGH_IMPORTANCE].len = 0;
- l->backlog[TIPC_CRITICAL_IMPORTANCE].len = 0;
- l->backlog[TIPC_SYSTEM_IMPORTANCE].len = 0;
+ for (imp = 0; imp <= TIPC_SYSTEM_IMPORTANCE; imp++) {
+ l->backlog[imp].len = 0;
+ l->backlog[imp].target_bskb = NULL;
+ }
kfree_skb(l->reasm_buf);
kfree_skb(l->reasm_tnlmsg);
kfree_skb(l->failover_reasm_skb);
u16 bc_ack = l->bc_rcvlink->rcv_nxt - 1;
struct sk_buff_head *transmq = &l->transmq;
struct sk_buff_head *backlogq = &l->backlogq;
- struct sk_buff *skb, *_skb, *bskb;
+ struct sk_buff *skb, *_skb, **tskb;
int pkt_cnt = skb_queue_len(list);
int rc = 0;
seqno++;
continue;
}
- if (tipc_msg_bundle(skb_peek_tail(backlogq), hdr, mtu)) {
+ tskb = &l->backlog[imp].target_bskb;
+ if (tipc_msg_bundle(*tskb, hdr, mtu)) {
kfree_skb(__skb_dequeue(list));
l->stats.sent_bundled++;
continue;
}
- if (tipc_msg_make_bundle(&bskb, hdr, mtu, l->addr)) {
+ if (tipc_msg_make_bundle(tskb, hdr, mtu, l->addr)) {
kfree_skb(__skb_dequeue(list));
- __skb_queue_tail(backlogq, bskb);
- l->backlog[msg_importance(buf_msg(bskb))].len++;
+ __skb_queue_tail(backlogq, *tskb);
+ l->backlog[imp].len++;
l->stats.sent_bundled++;
l->stats.sent_bundles++;
continue;
}
+ l->backlog[imp].target_bskb = NULL;
l->backlog[imp].len += skb_queue_len(list);
skb_queue_splice_tail_init(list, backlogq);
}
u16 seqno = l->snd_nxt;
u16 ack = l->rcv_nxt - 1;
u16 bc_ack = l->bc_rcvlink->rcv_nxt - 1;
+ u32 imp;
while (skb_queue_len(&l->transmq) < l->window) {
skb = skb_peek(&l->backlogq);
break;
__skb_dequeue(&l->backlogq);
hdr = buf_msg(skb);
- l->backlog[msg_importance(hdr)].len--;
+ imp = msg_importance(hdr);
+ l->backlog[imp].len--;
+ if (unlikely(skb == l->backlog[imp].target_bskb))
+ l->backlog[imp].target_bskb = NULL;
__skb_queue_tail(&l->transmq, skb);
/* next retransmit attempt */
if (link_is_bc_sndlink(l))
bmsg = buf_msg(_skb);
tipc_msg_init(msg_prevnode(msg), bmsg, MSG_BUNDLER, 0,
INT_H_SIZE, dnode);
- if (msg_isdata(msg))
- msg_set_importance(bmsg, TIPC_CRITICAL_IMPORTANCE);
- else
- msg_set_importance(bmsg, TIPC_SYSTEM_IMPORTANCE);
+ msg_set_importance(bmsg, msg_importance(msg));
msg_set_seqno(bmsg, msg_seqno(msg));
msg_set_ack(bmsg, msg_ack(msg));
msg_set_bcast_ack(bmsg, msg_bcast_ack(msg));
}
EXPORT_SYMBOL_GPL(__vsock_create);
-static void __vsock_release(struct sock *sk)
+static void __vsock_release(struct sock *sk, int level)
{
if (sk) {
struct sk_buff *skb;
vsk = vsock_sk(sk);
pending = NULL; /* Compiler warning. */
+ /* The release call is supposed to use lock_sock_nested()
+ * rather than lock_sock(), if a sock lock should be acquired.
+ */
transport->release(vsk);
- lock_sock(sk);
+ /* When "level" is SINGLE_DEPTH_NESTING, use the nested
+ * version to avoid the warning "possible recursive locking
+ * detected". When "level" is 0, lock_sock_nested(sk, level)
+ * is the same as lock_sock(sk).
+ */
+ lock_sock_nested(sk, level);
sock_orphan(sk);
sk->sk_shutdown = SHUTDOWN_MASK;
/* Clean up any sockets that never were accepted. */
while ((pending = vsock_dequeue_accept(sk)) != NULL) {
- __vsock_release(pending);
+ __vsock_release(pending, SINGLE_DEPTH_NESTING);
sock_put(pending);
}
static int vsock_release(struct socket *sock)
{
- __vsock_release(sock->sk);
+ __vsock_release(sock->sk, 0);
sock->sk = NULL;
sock->state = SS_FREE;
struct sock *sk = sk_vsock(vsk);
bool remove_sock;
- lock_sock(sk);
+ lock_sock_nested(sk, SINGLE_DEPTH_NESTING);
remove_sock = hvs_close_lock_held(vsk);
release_sock(sk);
if (remove_sock)
struct sock *sk = &vsk->sk;
bool remove_sock = true;
- lock_sock(sk);
+ lock_sock_nested(sk, SINGLE_DEPTH_NESTING);
if (sk->sk_type == SOCK_STREAM)
remove_sock = virtio_transport_close(vsk);
return __cfg80211_rdev_from_attrs(netns, info->attrs);
}
+static int validate_beacon_head(const struct nlattr *attr,
+ struct netlink_ext_ack *extack)
+{
+ const u8 *data = nla_data(attr);
+ unsigned int len = nla_len(attr);
+ const struct element *elem;
+ const struct ieee80211_mgmt *mgmt = (void *)data;
+ unsigned int fixedlen = offsetof(struct ieee80211_mgmt,
+ u.beacon.variable);
+
+ if (len < fixedlen)
+ goto err;
+
+ if (ieee80211_hdrlen(mgmt->frame_control) !=
+ offsetof(struct ieee80211_mgmt, u.beacon))
+ goto err;
+
+ data += fixedlen;
+ len -= fixedlen;
+
+ for_each_element(elem, data, len) {
+ /* nothing */
+ }
+
+ if (for_each_element_completed(elem, data, len))
+ return 0;
+
+err:
+ NL_SET_ERR_MSG_ATTR(extack, attr, "malformed beacon head");
+ return -EINVAL;
+}
+
static int validate_ie_attr(const struct nlattr *attr,
struct netlink_ext_ack *extack)
{
[NL80211_ATTR_BEACON_INTERVAL] = { .type = NLA_U32 },
[NL80211_ATTR_DTIM_PERIOD] = { .type = NLA_U32 },
- [NL80211_ATTR_BEACON_HEAD] = { .type = NLA_BINARY,
- .len = IEEE80211_MAX_DATA_LEN },
+ [NL80211_ATTR_BEACON_HEAD] =
+ NLA_POLICY_VALIDATE_FN(NLA_BINARY, validate_beacon_head,
+ IEEE80211_MAX_DATA_LEN),
[NL80211_ATTR_BEACON_TAIL] =
NLA_POLICY_VALIDATE_FN(NLA_BINARY, validate_ie_attr,
IEEE80211_MAX_DATA_LEN),
control_freq = nla_get_u32(attrs[NL80211_ATTR_WIPHY_FREQ]);
+ memset(chandef, 0, sizeof(*chandef));
+
chandef->chan = ieee80211_get_channel(&rdev->wiphy, control_freq);
chandef->width = NL80211_CHAN_WIDTH_20_NOHT;
chandef->center_freq1 = control_freq;
if (rdev->ops->get_channel) {
int ret;
- struct cfg80211_chan_def chandef;
+ struct cfg80211_chan_def chandef = {};
ret = rdev_get_channel(rdev, wdev, &chandef);
if (ret == 0) {
if (!rdev->ops->del_mpath)
return -EOPNOTSUPP;
+ if (dev->ieee80211_ptr->iftype != NL80211_IFTYPE_MESH_POINT)
+ return -EOPNOTSUPP;
+
return rdev_del_mpath(rdev, dev, dst);
}
static bool reg_wdev_chan_valid(struct wiphy *wiphy, struct wireless_dev *wdev)
{
- struct cfg80211_chan_def chandef;
+ struct cfg80211_chan_def chandef = {};
struct cfg80211_registered_device *rdev = wiphy_to_rdev(wiphy);
enum nl80211_iftype iftype;
return;
new_ie_len -= trans_ssid[1];
mbssid = cfg80211_find_ie(WLAN_EID_MULTIPLE_BSSID, ie, ielen);
- if (!mbssid)
+ /*
+ * It's not valid to have the MBSSID element before SSID
+ * ignore if that happens - the code below assumes it is
+ * after (while copying things inbetween).
+ */
+ if (!mbssid || mbssid < trans_ssid)
return;
new_ie_len -= mbssid[1];
rcu_read_lock();
{
struct wireless_dev *wdev = dev->ieee80211_ptr;
struct cfg80211_registered_device *rdev = wiphy_to_rdev(wdev->wiphy);
- struct cfg80211_chan_def chandef;
+ struct cfg80211_chan_def chandef = {};
int ret;
switch (wdev->iftype) {
return -EINVAL;
}
- headroom = ALIGN(headroom, 64);
-
size_chk = chunk_size - headroom - XDP_PACKET_HEADROOM;
if (size_chk < 0)
return -EINVAL;
if (err)
goto drop;
- nf_reset(skb);
+ nf_reset_ct(skb);
if (decaps) {
sp = skb_sec_path(skb);
skb->skb_iif = 0;
skb->ignore_df = 0;
skb_dst_drop(skb);
- nf_reset(skb);
+ nf_reset_ct(skb);
nf_reset_trace(skb);
if (!xnet)
struct net *net = xs_net(skb_dst(skb)->xfrm);
while (likely((err = xfrm_output_one(skb, err)) == 0)) {
- nf_reset(skb);
+ nf_reset_ct(skb);
err = skb_dst(skb)->ops->local_out(net, skb->sk, skb);
if (unlikely(err != 1))
continue;
}
- nf_reset(skb);
+ nf_reset_ct(skb);
skb_dst_drop(skb);
skb_dst_set(skb, dst);
# Usage: KBUILD_LDFLAGS += $(call ld-option, -X, -Y)
ld-option = $(call try-run, $(LD) $(KBUILD_LDFLAGS) $(1) -v,$(1),$(2),$(3))
-# ar-option
-# Usage: KBUILD_ARFLAGS := $(call ar-option,D)
-# Important: no spaces around options
-ar-option = $(call try-run, $(AR) rc$(1) "$$TMP",$(1),$(2))
-
# ld-version
# Note this is mainly for HJ Lu's 3 number binutil versions
ld-version = $(shell $(LD) --version | $(srctree)/scripts/ld-version.sh)
hostprogs-$(BUILD_C_RECORDMCOUNT) += recordmcount
hostprogs-$(CONFIG_BUILDTIME_EXTABLE_SORT) += sortextable
hostprogs-$(CONFIG_ASN1) += asn1_compiler
-hostprogs-$(CONFIG_MODULE_SIG) += sign-file
+hostprogs-$(CONFIG_MODULE_SIG_FORMAT) += sign-file
hostprogs-$(CONFIG_SYSTEM_TRUSTED_KEYRING) += extract-cert
hostprogs-$(CONFIG_SYSTEM_EXTRA_CERTIFICATE) += insert-sys-cert
ifdef builtin-target
quiet_cmd_ar_builtin = AR $@
- cmd_ar_builtin = rm -f $@; $(AR) rcSTP$(KBUILD_ARFLAGS) $@ $(real-prereqs)
+ cmd_ar_builtin = rm -f $@; $(AR) cDPrST $@ $(real-prereqs)
$(builtin-target): $(real-obj-y) FORCE
$(call if_changed,ar_builtin)
# ---------------------------------------------------------------------------
quiet_cmd_ar = AR $@
- cmd_ar = rm -f $@; $(AR) rcsTP$(KBUILD_ARFLAGS) $@ $(real-prereqs)
+ cmd_ar = rm -f $@; $(AR) cDPrsT $@ $(real-prereqs)
# Objcopy
# ---------------------------------------------------------------------------
fatal("modpost: Section mismatches detected.\n"
"Set CONFIG_SECTION_MISMATCH_WARN_ONLY=y to allow them.\n");
for (n = 0; n < SYMBOL_HASH_SIZE; n++) {
- struct symbol *s = symbolhash[n];
+ struct symbol *s;
+
+ for (s = symbolhash[n]; s; s = s->next) {
+ /*
+ * Do not check "vmlinux". This avoids the same warnings
+ * shown twice, and false-positives for ARCH=um.
+ */
+ if (is_vmlinux(s->module->name) && !s->module->is_dot_o)
+ continue;
- while (s) {
if (s->is_static)
warn("\"%s\" [%s] is a static %s\n",
s->name, s->module->name,
export_str(s->export));
-
- s = s->next;
}
}
use warnings;
use strict;
use File::Find;
+use File::Spec;
my $nm = ($ENV{'NM'} || "nm") . " -p";
my $objdump = ($ENV{'OBJDUMP'} || "objdump") . " -s -j .comment";
-my $srctree = "";
-my $objtree = "";
-$srctree = "$ENV{'srctree'}/" if (exists($ENV{'srctree'}));
-$objtree = "$ENV{'objtree'}/" if (exists($ENV{'objtree'}));
+my $srctree = File::Spec->curdir();
+my $objtree = File::Spec->curdir();
+$srctree = File::Spec->rel2abs($ENV{'srctree'}) if (exists($ENV{'srctree'}));
+$objtree = File::Spec->rel2abs($ENV{'objtree'}) if (exists($ENV{'objtree'}));
if ($#ARGV != -1) {
print STDERR "usage: $0 takes no parameters\n";
}
($source = $basename) =~ s/\.o$//;
if (-e "$source.c" || -e "$source.S") {
- $source = "$objtree$File::Find::dir/$source";
+ $source = File::Spec->catfile($objtree, $File::Find::dir, $source)
} else {
- $source = "$srctree$File::Find::dir/$source";
+ $source = File::Spec->catfile($srctree, $File::Find::dir, $source)
}
if (! -e "$source.c" && ! -e "$source.S") {
# No obvious source, exclude the object if it is conglomerate
collect_files()
{
- local file res
+ local file res=
for file; do
case "$file" in
source "security/loadpin/Kconfig"
source "security/yama/Kconfig"
source "security/safesetid/Kconfig"
+source "security/lockdown/Kconfig"
source "security/integrity/Kconfig"
config LSM
string "Ordered list of enabled LSMs"
- default "yama,loadpin,safesetid,integrity,smack,selinux,tomoyo,apparmor" if DEFAULT_SECURITY_SMACK
- default "yama,loadpin,safesetid,integrity,apparmor,selinux,smack,tomoyo" if DEFAULT_SECURITY_APPARMOR
- default "yama,loadpin,safesetid,integrity,tomoyo" if DEFAULT_SECURITY_TOMOYO
- default "yama,loadpin,safesetid,integrity" if DEFAULT_SECURITY_DAC
- default "yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor"
+ default "lockdown,yama,loadpin,safesetid,integrity,smack,selinux,tomoyo,apparmor" if DEFAULT_SECURITY_SMACK
+ default "lockdown,yama,loadpin,safesetid,integrity,apparmor,selinux,smack,tomoyo" if DEFAULT_SECURITY_APPARMOR
+ default "lockdown,yama,loadpin,safesetid,integrity,tomoyo" if DEFAULT_SECURITY_TOMOYO
+ default "lockdown,yama,loadpin,safesetid,integrity" if DEFAULT_SECURITY_DAC
+ default "lockdown,yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor"
help
A comma-separated list of LSMs, in initialization order.
Any LSMs left off this list will be ignored. This can be
subdir-$(CONFIG_SECURITY_YAMA) += yama
subdir-$(CONFIG_SECURITY_LOADPIN) += loadpin
subdir-$(CONFIG_SECURITY_SAFESETID) += safesetid
+subdir-$(CONFIG_SECURITY_LOCKDOWN_LSM) += lockdown
# always enable default capabilities
obj-y += commoncap.o
obj-$(CONFIG_SECURITY_YAMA) += yama/
obj-$(CONFIG_SECURITY_LOADPIN) += loadpin/
obj-$(CONFIG_SECURITY_SAFESETID) += safesetid/
+obj-$(CONFIG_SECURITY_LOCKDOWN_LSM) += lockdown/
obj-$(CONFIG_CGROUP_DEVICE) += device_cgroup.o
# Object integrity file lists
config INTEGRITY_SIGNATURE
bool "Digital signature verification using multiple keyrings"
- depends on KEYS
default n
+ select KEYS
select SIGNATURE
help
This option enables digital signature verification support
integrity-$(CONFIG_LOAD_UEFI_KEYS) += platform_certs/efi_parser.o \
platform_certs/load_uefi.o
integrity-$(CONFIG_LOAD_IPL_KEYS) += platform_certs/load_ipl_s390.o
-$(obj)/load_uefi.o: KBUILD_CFLAGS += -fshort-wchar
-subdir-$(CONFIG_IMA) += ima
obj-$(CONFIG_IMA) += ima/
-subdir-$(CONFIG_EVM) += evm
obj-$(CONFIG_EVM) += evm/
#define restrict_link_to_ima restrict_link_by_builtin_trusted
#endif
-int integrity_digsig_verify(const unsigned int id, const char *sig, int siglen,
- const char *digest, int digestlen)
+static struct key *integrity_keyring_from_id(const unsigned int id)
{
- if (id >= INTEGRITY_KEYRING_MAX || siglen < 2)
- return -EINVAL;
+ if (id >= INTEGRITY_KEYRING_MAX)
+ return ERR_PTR(-EINVAL);
if (!keyring[id]) {
keyring[id] =
int err = PTR_ERR(keyring[id]);
pr_err("no %s keyring: %d\n", keyring_name[id], err);
keyring[id] = NULL;
- return err;
+ return ERR_PTR(err);
}
}
+ return keyring[id];
+}
+
+int integrity_digsig_verify(const unsigned int id, const char *sig, int siglen,
+ const char *digest, int digestlen)
+{
+ struct key *keyring;
+
+ if (siglen < 2)
+ return -EINVAL;
+
+ keyring = integrity_keyring_from_id(id);
+ if (IS_ERR(keyring))
+ return PTR_ERR(keyring);
+
switch (sig[1]) {
case 1:
/* v1 API expect signature without xattr type */
- return digsig_verify(keyring[id], sig + 1, siglen - 1,
- digest, digestlen);
+ return digsig_verify(keyring, sig + 1, siglen - 1, digest,
+ digestlen);
case 2:
- return asymmetric_verify(keyring[id], sig, siglen,
- digest, digestlen);
+ return asymmetric_verify(keyring, sig, siglen, digest,
+ digestlen);
}
return -EOPNOTSUPP;
}
+int integrity_modsig_verify(const unsigned int id, const struct modsig *modsig)
+{
+ struct key *keyring;
+
+ keyring = integrity_keyring_from_id(id);
+ if (IS_ERR(keyring))
+ return PTR_ERR(keyring);
+
+ return ima_modsig_verify(keyring, modsig);
+}
+
static int __init __integrity_init_keyring(const unsigned int id,
key_perm_t perm,
struct key_restriction *restriction)
config IMA_ARCH_POLICY
bool "Enable loading an IMA architecture specific policy"
- depends on (KEXEC_VERIFY_SIG && IMA) || IMA_APPRAISE \
+ depends on (KEXEC_SIG && IMA) || IMA_APPRAISE \
&& INTEGRITY_ASYMMETRIC_KEYS
default n
help
This option enables the different "ima_appraise=" modes
(eg. fix, log) from the boot command line.
+config IMA_APPRAISE_MODSIG
+ bool "Support module-style signatures for appraisal"
+ depends on IMA_APPRAISE
+ depends on INTEGRITY_ASYMMETRIC_KEYS
+ select PKCS7_MESSAGE_PARSER
+ select MODULE_SIG_FORMAT
+ default n
+ help
+ Adds support for signatures appended to files. The format of the
+ appended signature is the same used for signed kernel modules.
+ The modsig keyword can be used in the IMA policy to allow a hook
+ to accept such signatures.
+
config IMA_TRUSTED_KEYRING
bool "Require all keys on the .ima keyring be signed (deprecated)"
depends on IMA_APPRAISE && SYSTEM_TRUSTED_KEYRING
ima-y := ima_fs.o ima_queue.o ima_init.o ima_main.o ima_crypto.o ima_api.o \
ima_policy.o ima_template.o ima_template_lib.o
ima-$(CONFIG_IMA_APPRAISE) += ima_appraise.o
+ima-$(CONFIG_IMA_APPRAISE_MODSIG) += ima_modsig.o
ima-$(CONFIG_HAVE_IMA_KEXEC) += ima_kexec.o
obj-$(CONFIG_IMA_BLACKLIST_KEYRING) += ima_mok.o
const unsigned char *filename;
struct evm_ima_xattr_data *xattr_value;
int xattr_len;
+ const struct modsig *modsig;
const char *violation;
const void *buf;
int buf_len;
u64 count;
};
+extern const int read_idmap[];
+
#ifdef CONFIG_HAVE_IMA_KEXEC
void ima_load_kexec_buffer(void);
#else
int *num_fields);
struct ima_template_desc *ima_template_desc_current(void);
struct ima_template_desc *lookup_template_desc(const char *name);
+bool ima_template_has_modsig(const struct ima_template_desc *ima_template);
int ima_restore_measurement_entry(struct ima_template_entry *entry);
int ima_restore_measurement_list(loff_t bufsize, void *buf);
int ima_measurements_show(struct seq_file *m, void *v);
__ima_hooks(__ima_hook_enumify)
};
+extern const char *const func_tokens[];
+
+struct modsig;
+
/* LIM API function definitions */
int ima_get_action(struct inode *inode, const struct cred *cred, u32 secid,
int mask, enum ima_hooks func, int *pcr,
int ima_must_measure(struct inode *inode, int mask, enum ima_hooks func);
int ima_collect_measurement(struct integrity_iint_cache *iint,
struct file *file, void *buf, loff_t size,
- enum hash_algo algo);
+ enum hash_algo algo, struct modsig *modsig);
void ima_store_measurement(struct integrity_iint_cache *iint, struct file *file,
const unsigned char *filename,
struct evm_ima_xattr_data *xattr_value,
- int xattr_len, int pcr,
+ int xattr_len, const struct modsig *modsig, int pcr,
struct ima_template_desc *template_desc);
void ima_audit_measurement(struct integrity_iint_cache *iint,
const unsigned char *filename);
struct integrity_iint_cache *iint,
struct file *file, const unsigned char *filename,
struct evm_ima_xattr_data *xattr_value,
- int xattr_len);
+ int xattr_len, const struct modsig *modsig);
int ima_must_appraise(struct inode *inode, int mask, enum ima_hooks func);
void ima_update_xattr(struct integrity_iint_cache *iint, struct file *file);
enum integrity_status ima_get_cache_status(struct integrity_iint_cache *iint,
struct file *file,
const unsigned char *filename,
struct evm_ima_xattr_data *xattr_value,
- int xattr_len)
+ int xattr_len,
+ const struct modsig *modsig)
{
return INTEGRITY_UNKNOWN;
}
#endif /* CONFIG_IMA_APPRAISE */
+#ifdef CONFIG_IMA_APPRAISE_MODSIG
+bool ima_hook_supports_modsig(enum ima_hooks func);
+int ima_read_modsig(enum ima_hooks func, const void *buf, loff_t buf_len,
+ struct modsig **modsig);
+void ima_collect_modsig(struct modsig *modsig, const void *buf, loff_t size);
+int ima_get_modsig_digest(const struct modsig *modsig, enum hash_algo *algo,
+ const u8 **digest, u32 *digest_size);
+int ima_get_raw_modsig(const struct modsig *modsig, const void **data,
+ u32 *data_len);
+void ima_free_modsig(struct modsig *modsig);
+#else
+static inline bool ima_hook_supports_modsig(enum ima_hooks func)
+{
+ return false;
+}
+
+static inline int ima_read_modsig(enum ima_hooks func, const void *buf,
+ loff_t buf_len, struct modsig **modsig)
+{
+ return -EOPNOTSUPP;
+}
+
+static inline void ima_collect_modsig(struct modsig *modsig, const void *buf,
+ loff_t size)
+{
+}
+
+static inline int ima_get_modsig_digest(const struct modsig *modsig,
+ enum hash_algo *algo, const u8 **digest,
+ u32 *digest_size)
+{
+ return -EOPNOTSUPP;
+}
+
+static inline int ima_get_raw_modsig(const struct modsig *modsig,
+ const void **data, u32 *data_len)
+{
+ return -EOPNOTSUPP;
+}
+
+static inline void ima_free_modsig(struct modsig *modsig)
+{
+}
+#endif /* CONFIG_IMA_APPRAISE_MODSIG */
+
/* LSM based policy rules require audit */
#ifdef CONFIG_IMA_LSM_RULES
else
template_desc = ima_template_desc_current();
- *entry = kzalloc(sizeof(**entry) + template_desc->num_fields *
- sizeof(struct ima_field_data), GFP_NOFS);
+ *entry = kzalloc(struct_size(*entry, template_data,
+ template_desc->num_fields), GFP_NOFS);
if (!*entry)
return -ENOMEM;
*/
int ima_collect_measurement(struct integrity_iint_cache *iint,
struct file *file, void *buf, loff_t size,
- enum hash_algo algo)
+ enum hash_algo algo, struct modsig *modsig)
{
const char *audit_cause = "failed";
struct inode *inode = file_inode(file);
char digest[IMA_MAX_DIGEST_SIZE];
} hash;
+ /*
+ * Always collect the modsig, because IMA might have already collected
+ * the file digest without collecting the modsig in a previous
+ * measurement rule.
+ */
+ if (modsig)
+ ima_collect_modsig(modsig, buf, size);
+
if (iint->flags & IMA_COLLECTED)
goto out;
void ima_store_measurement(struct integrity_iint_cache *iint,
struct file *file, const unsigned char *filename,
struct evm_ima_xattr_data *xattr_value,
- int xattr_len, int pcr,
+ int xattr_len, const struct modsig *modsig, int pcr,
struct ima_template_desc *template_desc)
{
static const char op[] = "add_template_measure";
.file = file,
.filename = filename,
.xattr_value = xattr_value,
- .xattr_len = xattr_len };
+ .xattr_len = xattr_len,
+ .modsig = modsig };
int violation = 0;
- if (iint->measured_pcrs & (0x1 << pcr))
+ /*
+ * We still need to store the measurement in the case of MODSIG because
+ * we only have its contents to put in the list at the time of
+ * appraisal, but a file measurement from earlier might already exist in
+ * the measurement list.
+ */
+ if (iint->measured_pcrs & (0x1 << pcr) && !modsig)
return;
result = ima_alloc_init_template(&event_data, &entry, template_desc);
return ret;
}
+/*
+ * xattr_verify - verify xattr digest or signature
+ *
+ * Verify whether the hash or signature matches the file contents.
+ *
+ * Return 0 on success, error code otherwise.
+ */
+static int xattr_verify(enum ima_hooks func, struct integrity_iint_cache *iint,
+ struct evm_ima_xattr_data *xattr_value, int xattr_len,
+ enum integrity_status *status, const char **cause)
+{
+ int rc = -EINVAL, hash_start = 0;
+
+ switch (xattr_value->type) {
+ case IMA_XATTR_DIGEST_NG:
+ /* first byte contains algorithm id */
+ hash_start = 1;
+ /* fall through */
+ case IMA_XATTR_DIGEST:
+ if (iint->flags & IMA_DIGSIG_REQUIRED) {
+ *cause = "IMA-signature-required";
+ *status = INTEGRITY_FAIL;
+ break;
+ }
+ clear_bit(IMA_DIGSIG, &iint->atomic_flags);
+ if (xattr_len - sizeof(xattr_value->type) - hash_start >=
+ iint->ima_hash->length)
+ /*
+ * xattr length may be longer. md5 hash in previous
+ * version occupied 20 bytes in xattr, instead of 16
+ */
+ rc = memcmp(&xattr_value->data[hash_start],
+ iint->ima_hash->digest,
+ iint->ima_hash->length);
+ else
+ rc = -EINVAL;
+ if (rc) {
+ *cause = "invalid-hash";
+ *status = INTEGRITY_FAIL;
+ break;
+ }
+ *status = INTEGRITY_PASS;
+ break;
+ case EVM_IMA_XATTR_DIGSIG:
+ set_bit(IMA_DIGSIG, &iint->atomic_flags);
+ rc = integrity_digsig_verify(INTEGRITY_KEYRING_IMA,
+ (const char *)xattr_value,
+ xattr_len,
+ iint->ima_hash->digest,
+ iint->ima_hash->length);
+ if (rc == -EOPNOTSUPP) {
+ *status = INTEGRITY_UNKNOWN;
+ break;
+ }
+ if (IS_ENABLED(CONFIG_INTEGRITY_PLATFORM_KEYRING) && rc &&
+ func == KEXEC_KERNEL_CHECK)
+ rc = integrity_digsig_verify(INTEGRITY_KEYRING_PLATFORM,
+ (const char *)xattr_value,
+ xattr_len,
+ iint->ima_hash->digest,
+ iint->ima_hash->length);
+ if (rc) {
+ *cause = "invalid-signature";
+ *status = INTEGRITY_FAIL;
+ } else {
+ *status = INTEGRITY_PASS;
+ }
+ break;
+ default:
+ *status = INTEGRITY_UNKNOWN;
+ *cause = "unknown-ima-data";
+ break;
+ }
+
+ return rc;
+}
+
+/*
+ * modsig_verify - verify modsig signature
+ *
+ * Verify whether the signature matches the file contents.
+ *
+ * Return 0 on success, error code otherwise.
+ */
+static int modsig_verify(enum ima_hooks func, const struct modsig *modsig,
+ enum integrity_status *status, const char **cause)
+{
+ int rc;
+
+ rc = integrity_modsig_verify(INTEGRITY_KEYRING_IMA, modsig);
+ if (IS_ENABLED(CONFIG_INTEGRITY_PLATFORM_KEYRING) && rc &&
+ func == KEXEC_KERNEL_CHECK)
+ rc = integrity_modsig_verify(INTEGRITY_KEYRING_PLATFORM,
+ modsig);
+ if (rc) {
+ *cause = "invalid-signature";
+ *status = INTEGRITY_FAIL;
+ } else {
+ *status = INTEGRITY_PASS;
+ }
+
+ return rc;
+}
+
/*
* ima_appraise_measurement - appraise file measurement
*
struct integrity_iint_cache *iint,
struct file *file, const unsigned char *filename,
struct evm_ima_xattr_data *xattr_value,
- int xattr_len)
+ int xattr_len, const struct modsig *modsig)
{
static const char op[] = "appraise_data";
const char *cause = "unknown";
struct dentry *dentry = file_dentry(file);
struct inode *inode = d_backing_inode(dentry);
enum integrity_status status = INTEGRITY_UNKNOWN;
- int rc = xattr_len, hash_start = 0;
+ int rc = xattr_len;
+ bool try_modsig = iint->flags & IMA_MODSIG_ALLOWED && modsig;
- if (!(inode->i_opflags & IOP_XATTR))
+ /* If not appraising a modsig, we need an xattr. */
+ if (!(inode->i_opflags & IOP_XATTR) && !try_modsig)
return INTEGRITY_UNKNOWN;
- if (rc <= 0) {
+ /* If reading the xattr failed and there's no modsig, error out. */
+ if (rc <= 0 && !try_modsig) {
if (rc && rc != -ENODATA)
goto out;
case INTEGRITY_UNKNOWN:
break;
case INTEGRITY_NOXATTRS: /* No EVM protected xattrs. */
+ /* It's fine not to have xattrs when using a modsig. */
+ if (try_modsig)
+ break;
+ /* fall through */
case INTEGRITY_NOLABEL: /* No security.evm xattr. */
cause = "missing-HMAC";
goto out;
WARN_ONCE(true, "Unexpected integrity status %d\n", status);
}
- switch (xattr_value->type) {
- case IMA_XATTR_DIGEST_NG:
- /* first byte contains algorithm id */
- hash_start = 1;
- /* fall through */
- case IMA_XATTR_DIGEST:
- if (iint->flags & IMA_DIGSIG_REQUIRED) {
- cause = "IMA-signature-required";
- status = INTEGRITY_FAIL;
- break;
- }
- clear_bit(IMA_DIGSIG, &iint->atomic_flags);
- if (xattr_len - sizeof(xattr_value->type) - hash_start >=
- iint->ima_hash->length)
- /* xattr length may be longer. md5 hash in previous
- version occupied 20 bytes in xattr, instead of 16
- */
- rc = memcmp(&xattr_value->data[hash_start],
- iint->ima_hash->digest,
- iint->ima_hash->length);
- else
- rc = -EINVAL;
- if (rc) {
- cause = "invalid-hash";
- status = INTEGRITY_FAIL;
- break;
- }
- status = INTEGRITY_PASS;
- break;
- case EVM_IMA_XATTR_DIGSIG:
- set_bit(IMA_DIGSIG, &iint->atomic_flags);
- rc = integrity_digsig_verify(INTEGRITY_KEYRING_IMA,
- (const char *)xattr_value,
- xattr_len,
- iint->ima_hash->digest,
- iint->ima_hash->length);
- if (rc == -EOPNOTSUPP) {
- status = INTEGRITY_UNKNOWN;
- break;
- }
- if (IS_ENABLED(CONFIG_INTEGRITY_PLATFORM_KEYRING) && rc &&
- func == KEXEC_KERNEL_CHECK)
- rc = integrity_digsig_verify(INTEGRITY_KEYRING_PLATFORM,
- (const char *)xattr_value,
- xattr_len,
- iint->ima_hash->digest,
- iint->ima_hash->length);
- if (rc) {
- cause = "invalid-signature";
- status = INTEGRITY_FAIL;
- } else {
- status = INTEGRITY_PASS;
- }
- break;
- default:
- status = INTEGRITY_UNKNOWN;
- cause = "unknown-ima-data";
- break;
- }
+ if (xattr_value)
+ rc = xattr_verify(func, iint, xattr_value, xattr_len, &status,
+ &cause);
+
+ /*
+ * If we have a modsig and either no imasig or the imasig's key isn't
+ * known, then try verifying the modsig.
+ */
+ if (try_modsig &&
+ (!xattr_value || xattr_value->type == IMA_XATTR_DIGEST_NG ||
+ rc == -ENOKEY))
+ rc = modsig_verify(func, modsig, &status, &cause);
out:
/*
op, cause, rc, 0);
} else if (status != INTEGRITY_PASS) {
/* Fix mode, but don't replace file signatures. */
- if ((ima_appraise & IMA_APPRAISE_FIX) &&
+ if ((ima_appraise & IMA_APPRAISE_FIX) && !try_modsig &&
(!xattr_value ||
xattr_value->type != EVM_IMA_XATTR_DIGSIG)) {
if (!ima_fix_xattr(dentry, iint))
!(iint->flags & IMA_HASH))
return;
- rc = ima_collect_measurement(iint, file, NULL, 0, ima_hash_algo);
+ rc = ima_collect_measurement(iint, file, NULL, 0, ima_hash_algo, NULL);
if (rc < 0)
return;
rbuf_len = min_t(loff_t, i_size - offset, rbuf_size[active]);
rc = integrity_kernel_read(file, offset, rbuf[active],
rbuf_len);
- if (rc != rbuf_len)
+ if (rc != rbuf_len) {
+ if (rc >= 0)
+ rc = -EINVAL;
+ /*
+ * Forward current rc, do not overwrite with return value
+ * from ahash_wait()
+ */
+ ahash_wait(ahash_rc, &wait);
goto out3;
+ }
if (rbuf[1] && offset) {
/* Using two buffers, and it is not the first
int rc = 0, action, must_appraise = 0;
int pcr = CONFIG_IMA_MEASURE_PCR_IDX;
struct evm_ima_xattr_data *xattr_value = NULL;
+ struct modsig *modsig = NULL;
int xattr_len = 0;
bool violation_check;
enum hash_algo hash_algo;
}
if ((action & IMA_APPRAISE_SUBMASK) ||
- strcmp(template_desc->name, IMA_TEMPLATE_IMA_NAME) != 0)
+ strcmp(template_desc->name, IMA_TEMPLATE_IMA_NAME) != 0) {
/* read 'security.ima' */
xattr_len = ima_read_xattr(file_dentry(file), &xattr_value);
+ /*
+ * Read the appended modsig if allowed by the policy, and allow
+ * an additional measurement list entry, if needed, based on the
+ * template format and whether the file was already measured.
+ */
+ if (iint->flags & IMA_MODSIG_ALLOWED) {
+ rc = ima_read_modsig(func, buf, size, &modsig);
+
+ if (!rc && ima_template_has_modsig(template_desc) &&
+ iint->flags & IMA_MEASURED)
+ action |= IMA_MEASURE;
+ }
+ }
+
hash_algo = ima_get_hash_algo(xattr_value, xattr_len);
- rc = ima_collect_measurement(iint, file, buf, size, hash_algo);
+ rc = ima_collect_measurement(iint, file, buf, size, hash_algo, modsig);
if (rc != 0 && rc != -EBADF && rc != -EINVAL)
goto out_locked;
if (action & IMA_MEASURE)
ima_store_measurement(iint, file, pathname,
- xattr_value, xattr_len, pcr,
+ xattr_value, xattr_len, modsig, pcr,
template_desc);
if (rc == 0 && (action & IMA_APPRAISE_SUBMASK)) {
inode_lock(inode);
rc = ima_appraise_measurement(func, iint, file, pathname,
- xattr_value, xattr_len);
+ xattr_value, xattr_len, modsig);
inode_unlock(inode);
if (!rc)
rc = mmap_violation_check(func, file, &pathbuf,
rc = -EACCES;
mutex_unlock(&iint->mutex);
kfree(xattr_value);
+ ima_free_modsig(modsig);
out:
if (pathbuf)
__putname(pathbuf);
return 0;
}
-static const int read_idmap[READING_MAX_ID] = {
+const int read_idmap[READING_MAX_ID] = {
[READING_FIRMWARE] = FIRMWARE_CHECK,
[READING_FIRMWARE_PREALLOC_BUFFER] = FIRMWARE_CHECK,
[READING_MODULE] = MODULE_CHECK,
switch (id) {
case LOADING_KEXEC_IMAGE:
- if (IS_ENABLED(CONFIG_KEXEC_VERIFY_SIG)
+ if (IS_ENABLED(CONFIG_KEXEC_SIG)
&& arch_ima_get_secureboot()) {
pr_err("impossible to appraise a kernel image without a file descriptor; try using kexec_file_load syscall.\n");
return -EACCES;
--- /dev/null
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * IMA support for appraising module-style appended signatures.
+ *
+ * Copyright (C) 2019 IBM Corporation
+ *
+ * Author:
+ * Thiago Jung Bauermann <bauerman@linux.ibm.com>
+ */
+
+#include <linux/types.h>
+#include <linux/module_signature.h>
+#include <keys/asymmetric-type.h>
+#include <crypto/pkcs7.h>
+
+#include "ima.h"
+
+struct modsig {
+ struct pkcs7_message *pkcs7_msg;
+
+ enum hash_algo hash_algo;
+
+ /* This digest will go in the 'd-modsig' field of the IMA template. */
+ const u8 *digest;
+ u32 digest_size;
+
+ /*
+ * This is what will go to the measurement list if the template requires
+ * storing the signature.
+ */
+ int raw_pkcs7_len;
+ u8 raw_pkcs7[];
+};
+
+/**
+ * ima_hook_supports_modsig - can the policy allow modsig for this hook?
+ *
+ * modsig is only supported by hooks using ima_post_read_file(), because only
+ * they preload the contents of the file in a buffer. FILE_CHECK does that in
+ * some cases, but not when reached from vfs_open(). POLICY_CHECK can support
+ * it, but it's not useful in practice because it's a text file so deny.
+ */
+bool ima_hook_supports_modsig(enum ima_hooks func)
+{
+ switch (func) {
+ case KEXEC_KERNEL_CHECK:
+ case KEXEC_INITRAMFS_CHECK:
+ case MODULE_CHECK:
+ return true;
+ default:
+ return false;
+ }
+}
+
+/*
+ * ima_read_modsig - Read modsig from buf.
+ *
+ * Return: 0 on success, error code otherwise.
+ */
+int ima_read_modsig(enum ima_hooks func, const void *buf, loff_t buf_len,
+ struct modsig **modsig)
+{
+ const size_t marker_len = strlen(MODULE_SIG_STRING);
+ const struct module_signature *sig;
+ struct modsig *hdr;
+ size_t sig_len;
+ const void *p;
+ int rc;
+
+ if (buf_len <= marker_len + sizeof(*sig))
+ return -ENOENT;
+
+ p = buf + buf_len - marker_len;
+ if (memcmp(p, MODULE_SIG_STRING, marker_len))
+ return -ENOENT;
+
+ buf_len -= marker_len;
+ sig = (const struct module_signature *)(p - sizeof(*sig));
+
+ rc = mod_check_sig(sig, buf_len, func_tokens[func]);
+ if (rc)
+ return rc;
+
+ sig_len = be32_to_cpu(sig->sig_len);
+ buf_len -= sig_len + sizeof(*sig);
+
+ /* Allocate sig_len additional bytes to hold the raw PKCS#7 data. */
+ hdr = kzalloc(sizeof(*hdr) + sig_len, GFP_KERNEL);
+ if (!hdr)
+ return -ENOMEM;
+
+ hdr->pkcs7_msg = pkcs7_parse_message(buf + buf_len, sig_len);
+ if (IS_ERR(hdr->pkcs7_msg)) {
+ rc = PTR_ERR(hdr->pkcs7_msg);
+ kfree(hdr);
+ return rc;
+ }
+
+ memcpy(hdr->raw_pkcs7, buf + buf_len, sig_len);
+ hdr->raw_pkcs7_len = sig_len;
+
+ /* We don't know the hash algorithm yet. */
+ hdr->hash_algo = HASH_ALGO__LAST;
+
+ *modsig = hdr;
+
+ return 0;
+}
+
+/**
+ * ima_collect_modsig - Calculate the file hash without the appended signature.
+ *
+ * Since the modsig is part of the file contents, the hash used in its signature
+ * isn't the same one ordinarily calculated by IMA. Therefore PKCS7 code
+ * calculates a separate one for signature verification.
+ */
+void ima_collect_modsig(struct modsig *modsig, const void *buf, loff_t size)
+{
+ int rc;
+
+ /*
+ * Provide the file contents (minus the appended sig) so that the PKCS7
+ * code can calculate the file hash.
+ */
+ size -= modsig->raw_pkcs7_len + strlen(MODULE_SIG_STRING) +
+ sizeof(struct module_signature);
+ rc = pkcs7_supply_detached_data(modsig->pkcs7_msg, buf, size);
+ if (rc)
+ return;
+
+ /* Ask the PKCS7 code to calculate the file hash. */
+ rc = pkcs7_get_digest(modsig->pkcs7_msg, &modsig->digest,
+ &modsig->digest_size, &modsig->hash_algo);
+}
+
+int ima_modsig_verify(struct key *keyring, const struct modsig *modsig)
+{
+ return verify_pkcs7_message_sig(NULL, 0, modsig->pkcs7_msg, keyring,
+ VERIFYING_MODULE_SIGNATURE, NULL, NULL);
+}
+
+int ima_get_modsig_digest(const struct modsig *modsig, enum hash_algo *algo,
+ const u8 **digest, u32 *digest_size)
+{
+ *algo = modsig->hash_algo;
+ *digest = modsig->digest;
+ *digest_size = modsig->digest_size;
+
+ return 0;
+}
+
+int ima_get_raw_modsig(const struct modsig *modsig, const void **data,
+ u32 *data_len)
+{
+ *data = &modsig->raw_pkcs7;
+ *data_len = modsig->raw_pkcs7_len;
+
+ return 0;
+}
+
+void ima_free_modsig(struct modsig *modsig)
+{
+ if (!modsig)
+ return;
+
+ pkcs7_free_message(modsig->pkcs7_msg);
+ kfree(modsig);
+}
* ima_policy.c
* - initialize default measure policy rules
*/
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
#include <linux/init.h>
#include <linux/list.h>
#include <linux/fs.h>
struct ima_rule_entry *entry;
int action = 0, actmask = flags | (flags << 1);
+ if (template_desc)
+ *template_desc = ima_template_desc_current();
+
rcu_read_lock();
list_for_each_entry_rcu(entry, ima_rules, list) {
action |= IMA_FAIL_UNVERIFIABLE_SIGS;
}
+
if (entry->action & IMA_DO_MASK)
actmask &= ~(entry->action | entry->action << 1);
else
if (template_desc && entry->template)
*template_desc = entry->template;
- else if (template_desc)
- *template_desc = ima_template_desc_current();
if (!actmask)
break;
ima_log_string_op(ab, key, value, NULL);
}
+/*
+ * Validating the appended signature included in the measurement list requires
+ * the file hash calculated without the appended signature (i.e., the 'd-modsig'
+ * field). Therefore, notify the user if they have the 'modsig' field but not
+ * the 'd-modsig' field in the template.
+ */
+static void check_template_modsig(const struct ima_template_desc *template)
+{
+#define MSG "template with 'modsig' field also needs 'd-modsig' field\n"
+ bool has_modsig, has_dmodsig;
+ static bool checked;
+ int i;
+
+ /* We only need to notify the user once. */
+ if (checked)
+ return;
+
+ has_modsig = has_dmodsig = false;
+ for (i = 0; i < template->num_fields; i++) {
+ if (!strcmp(template->fields[i]->field_id, "modsig"))
+ has_modsig = true;
+ else if (!strcmp(template->fields[i]->field_id, "d-modsig"))
+ has_dmodsig = true;
+ }
+
+ if (has_modsig && !has_dmodsig)
+ pr_notice(MSG);
+
+ checked = true;
+#undef MSG
+}
+
static int ima_parse_rule(char *rule, struct ima_rule_entry *entry)
{
struct audit_buffer *ab;
ima_log_string(ab, "appraise_type", args[0].from);
if ((strcmp(args[0].from, "imasig")) == 0)
entry->flags |= IMA_DIGSIG_REQUIRED;
+ else if (ima_hook_supports_modsig(entry->func) &&
+ strcmp(args[0].from, "imasig|modsig") == 0)
+ entry->flags |= IMA_DIGSIG_REQUIRED |
+ IMA_MODSIG_ALLOWED;
else
result = -EINVAL;
break;
else if (entry->action == APPRAISE)
temp_ima_appraise |= ima_appraise_flag(entry->func);
+ if (!result && entry->flags & IMA_MODSIG_ALLOWED) {
+ template_desc = entry->template ? entry->template :
+ ima_template_desc_current();
+ check_template_modsig(template_desc);
+ }
+
audit_log_format(ab, "res=%d", !result);
audit_log_end(ab);
return result;
}
}
+#define __ima_hook_stringify(str) (#str),
+
+const char *const func_tokens[] = {
+ __ima_hooks(__ima_hook_stringify)
+};
+
#ifdef CONFIG_IMA_READ_POLICY
enum {
mask_exec = 0, mask_write, mask_read, mask_append
"^MAY_APPEND"
};
-#define __ima_hook_stringify(str) (#str),
-
-static const char *const func_tokens[] = {
- __ima_hooks(__ima_hook_stringify)
-};
-
void *ima_policy_start(struct seq_file *m, loff_t *pos)
{
loff_t l = *pos;
}
if (entry->template)
seq_printf(m, "template=%s ", entry->template->name);
- if (entry->flags & IMA_DIGSIG_REQUIRED)
- seq_puts(m, "appraise_type=imasig ");
+ if (entry->flags & IMA_DIGSIG_REQUIRED) {
+ if (entry->flags & IMA_MODSIG_ALLOWED)
+ seq_puts(m, "appraise_type=imasig|modsig ");
+ else
+ seq_puts(m, "appraise_type=imasig ");
+ }
if (entry->flags & IMA_PERMIT_DIRECTIO)
seq_puts(m, "permit_directio ");
rcu_read_unlock();
return 0;
}
#endif /* CONFIG_IMA_READ_POLICY */
+
+#if defined(CONFIG_IMA_APPRAISE) && defined(CONFIG_INTEGRITY_TRUSTED_KEYRING)
+/*
+ * ima_appraise_signature: whether IMA will appraise a given function using
+ * an IMA digital signature. This is restricted to cases where the kernel
+ * has a set of built-in trusted keys in order to avoid an attacker simply
+ * loading additional keys.
+ */
+bool ima_appraise_signature(enum kernel_read_file_id id)
+{
+ struct ima_rule_entry *entry;
+ bool found = false;
+ enum ima_hooks func;
+
+ if (id >= READING_MAX_ID)
+ return false;
+
+ func = read_idmap[id] ?: FILE_CHECK;
+
+ rcu_read_lock();
+ list_for_each_entry_rcu(entry, ima_rules, list) {
+ if (entry->action != APPRAISE)
+ continue;
+
+ /*
+ * A generic entry will match, but otherwise require that it
+ * match the func we're looking for
+ */
+ if (entry->func && entry->func != func)
+ continue;
+
+ /*
+ * We require this to be a digital signature, not a raw IMA
+ * hash.
+ */
+ if (entry->flags & IMA_DIGSIG_REQUIRED)
+ found = true;
+
+ /*
+ * We've found a rule that matches, so break now even if it
+ * didn't require a digital signature - a later rule that does
+ * won't override it, so would be a false positive.
+ */
+ break;
+ }
+
+ rcu_read_unlock();
+ return found;
+}
+#endif /* CONFIG_IMA_APPRAISE && CONFIG_INTEGRITY_TRUSTED_KEYRING */
{.name = "ima-ng", .fmt = "d-ng|n-ng"},
{.name = "ima-sig", .fmt = "d-ng|n-ng|sig"},
{.name = "ima-buf", .fmt = "d-ng|n-ng|buf"},
+ {.name = "ima-modsig", .fmt = "d-ng|n-ng|sig|d-modsig|modsig"},
{.name = "", .fmt = ""}, /* placeholder for a custom format */
};
.field_show = ima_show_template_sig},
{.field_id = "buf", .field_init = ima_eventbuf_init,
.field_show = ima_show_template_buf},
+ {.field_id = "d-modsig", .field_init = ima_eventdigest_modsig_init,
+ .field_show = ima_show_template_digest_ng},
+ {.field_id = "modsig", .field_init = ima_eventmodsig_init,
+ .field_show = ima_show_template_sig},
};
/*
* need to be accounted for since they shouldn't be defined in the same template
* description as 'd-ng' and 'n-ng' respectively.
*/
-#define MAX_TEMPLATE_NAME_LEN sizeof("d-ng|n-ng|sig|buf")
+#define MAX_TEMPLATE_NAME_LEN sizeof("d-ng|n-ng|sig|buf|d-modisg|modsig")
static struct ima_template_desc *ima_template;
+/**
+ * ima_template_has_modsig - Check whether template has modsig-related fields.
+ * @ima_template: IMA template to check.
+ *
+ * Tells whether the given template has fields referencing a file's appended
+ * signature.
+ */
+bool ima_template_has_modsig(const struct ima_template_desc *ima_template)
+{
+ int i;
+
+ for (i = 0; i < ima_template->num_fields; i++)
+ if (!strcmp(ima_template->fields[i]->field_id, "modsig") ||
+ !strcmp(ima_template->fields[i]->field_id, "d-modsig"))
+ return true;
+
+ return false;
+}
+
static int __init ima_template_setup(char *str)
{
struct ima_template_desc *template_desc;
int ret = 0;
int i;
- *entry = kzalloc(sizeof(**entry) +
- template_desc->num_fields * sizeof(struct ima_field_data),
- GFP_NOFS);
+ *entry = kzalloc(struct_size(*entry, template_data,
+ template_desc->num_fields), GFP_NOFS);
if (!*entry)
return -ENOMEM;
return 0;
}
-static int ima_eventdigest_init_common(u8 *digest, u32 digestsize, u8 hash_algo,
+static int ima_eventdigest_init_common(const u8 *digest, u32 digestsize,
+ u8 hash_algo,
struct ima_field_data *field_data)
{
/*
hash_algo, field_data);
}
+/*
+ * This function writes the digest of the file which is expected to match the
+ * digest contained in the file's appended signature.
+ */
+int ima_eventdigest_modsig_init(struct ima_event_data *event_data,
+ struct ima_field_data *field_data)
+{
+ enum hash_algo hash_algo;
+ const u8 *cur_digest;
+ u32 cur_digestsize;
+
+ if (!event_data->modsig)
+ return 0;
+
+ if (event_data->violation) {
+ /* Recording a violation. */
+ hash_algo = HASH_ALGO_SHA1;
+ cur_digest = NULL;
+ cur_digestsize = 0;
+ } else {
+ int rc;
+
+ rc = ima_get_modsig_digest(event_data->modsig, &hash_algo,
+ &cur_digest, &cur_digestsize);
+ if (rc)
+ return rc;
+ else if (hash_algo == HASH_ALGO__LAST || cur_digestsize == 0)
+ /* There was some error collecting the digest. */
+ return -EINVAL;
+ }
+
+ return ima_eventdigest_init_common(cur_digest, cur_digestsize,
+ hash_algo, field_data);
+}
+
static int ima_eventname_init_common(struct ima_event_data *event_data,
struct ima_field_data *field_data,
bool size_limit)
event_data->buf_len, DATA_FMT_HEX,
field_data);
}
+
+/*
+ * ima_eventmodsig_init - include the appended file signature as part of the
+ * template data
+ */
+int ima_eventmodsig_init(struct ima_event_data *event_data,
+ struct ima_field_data *field_data)
+{
+ const void *data;
+ u32 data_len;
+ int rc;
+
+ if (!event_data->modsig)
+ return 0;
+
+ /*
+ * modsig is a runtime structure containing pointers. Get its raw data
+ * instead.
+ */
+ rc = ima_get_raw_modsig(event_data->modsig, &data, &data_len);
+ if (rc)
+ return rc;
+
+ return ima_write_template_field_data(data, data_len, DATA_FMT_HEX,
+ field_data);
+}
struct ima_field_data *field_data);
int ima_eventdigest_ng_init(struct ima_event_data *event_data,
struct ima_field_data *field_data);
+int ima_eventdigest_modsig_init(struct ima_event_data *event_data,
+ struct ima_field_data *field_data);
int ima_eventname_ng_init(struct ima_event_data *event_data,
struct ima_field_data *field_data);
int ima_eventsig_init(struct ima_event_data *event_data,
struct ima_field_data *field_data);
int ima_eventbuf_init(struct ima_event_data *event_data,
struct ima_field_data *field_data);
+int ima_eventmodsig_init(struct ima_event_data *event_data,
+ struct ima_field_data *field_data);
#endif /* __LINUX_IMA_TEMPLATE_LIB_H */
#define IMA_NEW_FILE 0x04000000
#define EVM_IMMUTABLE_DIGSIG 0x08000000
#define IMA_FAIL_UNVERIFIABLE_SIGS 0x10000000
+#define IMA_MODSIG_ALLOWED 0x20000000
#define IMA_DO_MASK (IMA_MEASURE | IMA_APPRAISE | IMA_AUDIT | \
IMA_HASH | IMA_APPRAISE_SUBMASK)
extern struct dentry *integrity_dir;
+struct modsig;
+
#ifdef CONFIG_INTEGRITY_SIGNATURE
int integrity_digsig_verify(const unsigned int id, const char *sig, int siglen,
const char *digest, int digestlen);
+int integrity_modsig_verify(unsigned int id, const struct modsig *modsig);
int __init integrity_init_keyring(const unsigned int id);
int __init integrity_load_x509(const unsigned int id, const char *path);
return -EOPNOTSUPP;
}
+static inline int integrity_modsig_verify(unsigned int id,
+ const struct modsig *modsig)
+{
+ return -EOPNOTSUPP;
+}
+
static inline int integrity_init_keyring(const unsigned int id)
{
return 0;
}
#endif
+#ifdef CONFIG_IMA_APPRAISE_MODSIG
+int ima_modsig_verify(struct key *keyring, const struct modsig *modsig);
+#else
+static inline int ima_modsig_verify(struct key *keyring,
+ const struct modsig *modsig)
+{
+ return -EOPNOTSUPP;
+}
+#endif
+
#ifdef CONFIG_IMA_LOAD_X509
void __init ima_load_x509(void);
#else
--- /dev/null
+config SECURITY_LOCKDOWN_LSM
+ bool "Basic module for enforcing kernel lockdown"
+ depends on SECURITY
+ select MODULE_SIG if MODULES
+ help
+ Build support for an LSM that enforces a coarse kernel lockdown
+ behaviour.
+
+config SECURITY_LOCKDOWN_LSM_EARLY
+ bool "Enable lockdown LSM early in init"
+ depends on SECURITY_LOCKDOWN_LSM
+ help
+ Enable the lockdown LSM early in boot. This is necessary in order
+ to ensure that lockdown enforcement can be carried out on kernel
+ boot parameters that are otherwise parsed before the security
+ subsystem is fully initialised. If enabled, lockdown will
+ unconditionally be called before any other LSMs.
+
+choice
+ prompt "Kernel default lockdown mode"
+ default LOCK_DOWN_KERNEL_FORCE_NONE
+ depends on SECURITY_LOCKDOWN_LSM
+ help
+ The kernel can be configured to default to differing levels of
+ lockdown.
+
+config LOCK_DOWN_KERNEL_FORCE_NONE
+ bool "None"
+ help
+ No lockdown functionality is enabled by default. Lockdown may be
+ enabled via the kernel commandline or /sys/kernel/security/lockdown.
+
+config LOCK_DOWN_KERNEL_FORCE_INTEGRITY
+ bool "Integrity"
+ help
+ The kernel runs in integrity mode by default. Features that allow
+ the kernel to be modified at runtime are disabled.
+
+config LOCK_DOWN_KERNEL_FORCE_CONFIDENTIALITY
+ bool "Confidentiality"
+ help
+ The kernel runs in confidentiality mode by default. Features that
+ allow the kernel to be modified at runtime or that permit userland
+ code to read confidential material held inside the kernel are
+ disabled.
+
+endchoice
--- /dev/null
+obj-$(CONFIG_SECURITY_LOCKDOWN_LSM) += lockdown.o
--- /dev/null
+// SPDX-License-Identifier: GPL-2.0
+/* Lock down the kernel
+ *
+ * Copyright (C) 2016 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public Licence
+ * as published by the Free Software Foundation; either version
+ * 2 of the Licence, or (at your option) any later version.
+ */
+
+#include <linux/security.h>
+#include <linux/export.h>
+#include <linux/lsm_hooks.h>
+
+static enum lockdown_reason kernel_locked_down;
+
+static const char *const lockdown_reasons[LOCKDOWN_CONFIDENTIALITY_MAX+1] = {
+ [LOCKDOWN_NONE] = "none",
+ [LOCKDOWN_MODULE_SIGNATURE] = "unsigned module loading",
+ [LOCKDOWN_DEV_MEM] = "/dev/mem,kmem,port",
+ [LOCKDOWN_KEXEC] = "kexec of unsigned images",
+ [LOCKDOWN_HIBERNATION] = "hibernation",
+ [LOCKDOWN_PCI_ACCESS] = "direct PCI access",
+ [LOCKDOWN_IOPORT] = "raw io port access",
+ [LOCKDOWN_MSR] = "raw MSR access",
+ [LOCKDOWN_ACPI_TABLES] = "modifying ACPI tables",
+ [LOCKDOWN_PCMCIA_CIS] = "direct PCMCIA CIS storage",
+ [LOCKDOWN_TIOCSSERIAL] = "reconfiguration of serial port IO",
+ [LOCKDOWN_MODULE_PARAMETERS] = "unsafe module parameters",
+ [LOCKDOWN_MMIOTRACE] = "unsafe mmio",
+ [LOCKDOWN_DEBUGFS] = "debugfs access",
+ [LOCKDOWN_INTEGRITY_MAX] = "integrity",
+ [LOCKDOWN_KCORE] = "/proc/kcore access",
+ [LOCKDOWN_KPROBES] = "use of kprobes",
+ [LOCKDOWN_BPF_READ] = "use of bpf to read kernel RAM",
+ [LOCKDOWN_PERF] = "unsafe use of perf",
+ [LOCKDOWN_TRACEFS] = "use of tracefs",
+ [LOCKDOWN_CONFIDENTIALITY_MAX] = "confidentiality",
+};
+
+static const enum lockdown_reason lockdown_levels[] = {LOCKDOWN_NONE,
+ LOCKDOWN_INTEGRITY_MAX,
+ LOCKDOWN_CONFIDENTIALITY_MAX};
+
+/*
+ * Put the kernel into lock-down mode.
+ */
+static int lock_kernel_down(const char *where, enum lockdown_reason level)
+{
+ if (kernel_locked_down >= level)
+ return -EPERM;
+
+ kernel_locked_down = level;
+ pr_notice("Kernel is locked down from %s; see man kernel_lockdown.7\n",
+ where);
+ return 0;
+}
+
+static int __init lockdown_param(char *level)
+{
+ if (!level)
+ return -EINVAL;
+
+ if (strcmp(level, "integrity") == 0)
+ lock_kernel_down("command line", LOCKDOWN_INTEGRITY_MAX);
+ else if (strcmp(level, "confidentiality") == 0)
+ lock_kernel_down("command line", LOCKDOWN_CONFIDENTIALITY_MAX);
+ else
+ return -EINVAL;
+
+ return 0;
+}
+
+early_param("lockdown", lockdown_param);
+
+/**
+ * lockdown_is_locked_down - Find out if the kernel is locked down
+ * @what: Tag to use in notice generated if lockdown is in effect
+ */
+static int lockdown_is_locked_down(enum lockdown_reason what)
+{
+ if (WARN(what >= LOCKDOWN_CONFIDENTIALITY_MAX,
+ "Invalid lockdown reason"))
+ return -EPERM;
+
+ if (kernel_locked_down >= what) {
+ if (lockdown_reasons[what])
+ pr_notice("Lockdown: %s: %s is restricted; see man kernel_lockdown.7\n",
+ current->comm, lockdown_reasons[what]);
+ return -EPERM;
+ }
+
+ return 0;
+}
+
+static struct security_hook_list lockdown_hooks[] __lsm_ro_after_init = {
+ LSM_HOOK_INIT(locked_down, lockdown_is_locked_down),
+};
+
+static int __init lockdown_lsm_init(void)
+{
+#if defined(CONFIG_LOCK_DOWN_KERNEL_FORCE_INTEGRITY)
+ lock_kernel_down("Kernel configuration", LOCKDOWN_INTEGRITY_MAX);
+#elif defined(CONFIG_LOCK_DOWN_KERNEL_FORCE_CONFIDENTIALITY)
+ lock_kernel_down("Kernel configuration", LOCKDOWN_CONFIDENTIALITY_MAX);
+#endif
+ security_add_hooks(lockdown_hooks, ARRAY_SIZE(lockdown_hooks),
+ "lockdown");
+ return 0;
+}
+
+static ssize_t lockdown_read(struct file *filp, char __user *buf, size_t count,
+ loff_t *ppos)
+{
+ char temp[80];
+ int i, offset = 0;
+
+ for (i = 0; i < ARRAY_SIZE(lockdown_levels); i++) {
+ enum lockdown_reason level = lockdown_levels[i];
+
+ if (lockdown_reasons[level]) {
+ const char *label = lockdown_reasons[level];
+
+ if (kernel_locked_down == level)
+ offset += sprintf(temp+offset, "[%s] ", label);
+ else
+ offset += sprintf(temp+offset, "%s ", label);
+ }
+ }
+
+ /* Convert the last space to a newline if needed. */
+ if (offset > 0)
+ temp[offset-1] = '\n';
+
+ return simple_read_from_buffer(buf, count, ppos, temp, strlen(temp));
+}
+
+static ssize_t lockdown_write(struct file *file, const char __user *buf,
+ size_t n, loff_t *ppos)
+{
+ char *state;
+ int i, len, err = -EINVAL;
+
+ state = memdup_user_nul(buf, n);
+ if (IS_ERR(state))
+ return PTR_ERR(state);
+
+ len = strlen(state);
+ if (len && state[len-1] == '\n') {
+ state[len-1] = '\0';
+ len--;
+ }
+
+ for (i = 0; i < ARRAY_SIZE(lockdown_levels); i++) {
+ enum lockdown_reason level = lockdown_levels[i];
+ const char *label = lockdown_reasons[level];
+
+ if (label && !strcmp(state, label))
+ err = lock_kernel_down("securityfs", level);
+ }
+
+ kfree(state);
+ return err ? err : n;
+}
+
+static const struct file_operations lockdown_ops = {
+ .read = lockdown_read,
+ .write = lockdown_write,
+};
+
+static int __init lockdown_secfs_init(void)
+{
+ struct dentry *dentry;
+
+ dentry = securityfs_create_file("lockdown", 0600, NULL, NULL,
+ &lockdown_ops);
+ return PTR_ERR_OR_ZERO(dentry);
+}
+
+core_initcall(lockdown_secfs_init);
+
+#ifdef CONFIG_SECURITY_LOCKDOWN_LSM_EARLY
+DEFINE_EARLY_LSM(lockdown) = {
+#else
+DEFINE_LSM(lockdown) = {
+#endif
+ .name = "lockdown",
+ .init = lockdown_lsm_init,
+};
/* How many LSMs were built into the kernel? */
#define LSM_COUNT (__end_lsm_info - __start_lsm_info)
+#define EARLY_LSM_COUNT (__end_early_lsm_info - __start_early_lsm_info)
struct security_hook_heads security_hook_heads __lsm_ro_after_init;
static BLOCKING_NOTIFIER_HEAD(blocking_lsm_notifier_chain);
static void __init lsm_early_cred(struct cred *cred);
static void __init lsm_early_task(struct task_struct *task);
+static int lsm_append(const char *new, char **result);
+
static void __init ordered_lsm_init(void)
{
struct lsm_info **lsm;
kfree(ordered_lsms);
}
+int __init early_security_init(void)
+{
+ int i;
+ struct hlist_head *list = (struct hlist_head *) &security_hook_heads;
+ struct lsm_info *lsm;
+
+ for (i = 0; i < sizeof(security_hook_heads) / sizeof(struct hlist_head);
+ i++)
+ INIT_HLIST_HEAD(&list[i]);
+
+ for (lsm = __start_early_lsm_info; lsm < __end_early_lsm_info; lsm++) {
+ if (!lsm->enabled)
+ lsm->enabled = &lsm_enabled_true;
+ prepare_lsm(lsm);
+ initialize_lsm(lsm);
+ }
+
+ return 0;
+}
+
/**
* security_init - initializes the security framework
*
*/
int __init security_init(void)
{
- int i;
- struct hlist_head *list = (struct hlist_head *) &security_hook_heads;
+ struct lsm_info *lsm;
pr_info("Security Framework initializing\n");
- for (i = 0; i < sizeof(security_hook_heads) / sizeof(struct hlist_head);
- i++)
- INIT_HLIST_HEAD(&list[i]);
+ /*
+ * Append the names of the early LSM modules now that kmalloc() is
+ * available
+ */
+ for (lsm = __start_early_lsm_info; lsm < __end_early_lsm_info; lsm++) {
+ if (lsm->enabled)
+ lsm_append(lsm->name, &lsm_names);
+ }
/* Load LSMs in specified order. */
ordered_lsm_init();
return !strcmp(last, lsm);
}
-static int lsm_append(char *new, char **result)
+static int lsm_append(const char *new, char **result)
{
char *cp;
hooks[i].lsm = lsm;
hlist_add_tail_rcu(&hooks[i].list, hooks[i].head);
}
- if (lsm_append(lsm, &lsm_names) < 0)
- panic("%s - Cannot get early memory.\n", __func__);
+
+ /*
+ * Don't try to append during early_security_init(), we'll come back
+ * and fix this up afterwards.
+ */
+ if (slab_is_available()) {
+ if (lsm_append(lsm, &lsm_names) < 0)
+ panic("%s - Cannot get early memory.\n", __func__);
+ }
}
int call_blocking_lsm_notifier(enum lsm_event event, void *data)
call_void_hook(bpf_prog_free_security, aux);
}
#endif /* CONFIG_BPF_SYSCALL */
+
+int security_locked_down(enum lockdown_reason what)
+{
+ return call_int_hook(locked_down, 0, what);
+}
+EXPORT_SYMBOL(security_locked_down);
__u8 fwd_emitted: 1;
/* whether unique non-duplicate name was already assigned */
__u8 name_resolved: 1;
+ /* whether type is referenced from any other type */
+ __u8 referenced: 1;
};
struct btf_dump {
free(d);
}
+static int btf_dump_mark_referenced(struct btf_dump *d);
static int btf_dump_order_type(struct btf_dump *d, __u32 id, bool through_ptr);
static void btf_dump_emit_type(struct btf_dump *d, __u32 id, __u32 cont_id);
/* VOID is special */
d->type_states[0].order_state = ORDERED;
d->type_states[0].emit_state = EMITTED;
+
+ /* eagerly determine referenced types for anon enums */
+ err = btf_dump_mark_referenced(d);
+ if (err)
+ return err;
}
d->emit_queue_cnt = 0;
return 0;
}
+/*
+ * Mark all types that are referenced from any other type. This is used to
+ * determine top-level anonymous enums that need to be emitted as an
+ * independent type declarations.
+ * Anonymous enums come in two flavors: either embedded in a struct's field
+ * definition, in which case they have to be declared inline as part of field
+ * type declaration; or as a top-level anonymous enum, typically used for
+ * declaring global constants. It's impossible to distinguish between two
+ * without knowning whether given enum type was referenced from other type:
+ * top-level anonymous enum won't be referenced by anything, while embedded
+ * one will.
+ */
+static int btf_dump_mark_referenced(struct btf_dump *d)
+{
+ int i, j, n = btf__get_nr_types(d->btf);
+ const struct btf_type *t;
+ __u16 vlen;
+
+ for (i = 1; i <= n; i++) {
+ t = btf__type_by_id(d->btf, i);
+ vlen = btf_vlen(t);
+
+ switch (btf_kind(t)) {
+ case BTF_KIND_INT:
+ case BTF_KIND_ENUM:
+ case BTF_KIND_FWD:
+ break;
+
+ case BTF_KIND_VOLATILE:
+ case BTF_KIND_CONST:
+ case BTF_KIND_RESTRICT:
+ case BTF_KIND_PTR:
+ case BTF_KIND_TYPEDEF:
+ case BTF_KIND_FUNC:
+ case BTF_KIND_VAR:
+ d->type_states[t->type].referenced = 1;
+ break;
+
+ case BTF_KIND_ARRAY: {
+ const struct btf_array *a = btf_array(t);
+
+ d->type_states[a->index_type].referenced = 1;
+ d->type_states[a->type].referenced = 1;
+ break;
+ }
+ case BTF_KIND_STRUCT:
+ case BTF_KIND_UNION: {
+ const struct btf_member *m = btf_members(t);
+
+ for (j = 0; j < vlen; j++, m++)
+ d->type_states[m->type].referenced = 1;
+ break;
+ }
+ case BTF_KIND_FUNC_PROTO: {
+ const struct btf_param *p = btf_params(t);
+
+ for (j = 0; j < vlen; j++, p++)
+ d->type_states[p->type].referenced = 1;
+ break;
+ }
+ case BTF_KIND_DATASEC: {
+ const struct btf_var_secinfo *v = btf_var_secinfos(t);
+
+ for (j = 0; j < vlen; j++, v++)
+ d->type_states[v->type].referenced = 1;
+ break;
+ }
+ default:
+ return -EINVAL;
+ }
+ }
+ return 0;
+}
static int btf_dump_add_emit_queue_id(struct btf_dump *d, __u32 id)
{
__u32 *new_queue;
}
case BTF_KIND_ENUM:
case BTF_KIND_FWD:
- if (t->name_off != 0) {
+ /*
+ * non-anonymous or non-referenced enums are top-level
+ * declarations and should be emitted. Same logic can be
+ * applied to FWDs, it won't hurt anyways.
+ */
+ if (t->name_off != 0 || !tstate->referenced) {
err = btf_dump_add_emit_queue_id(d, id);
if (err)
return err;
t = btf__type_by_id(d->btf, id);
kind = btf_kind(t);
- if (top_level_def && t->name_off == 0) {
- pr_warning("unexpected nameless definition, id:[%u]\n", id);
- return;
- }
-
if (tstate->emit_state == EMITTING) {
if (tstate->fwd_emitted)
return;
return;
}
+ next_id = decls->ids[decls->cnt - 1];
next_t = btf__type_by_id(d->btf, next_id);
multidim = btf_is_array(next_t);
/* we need space if we have named non-pointer */
int xsks_map_fd;
__u32 queue_id;
char ifname[IFNAMSIZ];
- bool zc;
};
struct xsk_nl_info {
void *rx_map = NULL, *tx_map = NULL;
struct sockaddr_xdp sxdp = {};
struct xdp_mmap_offsets off;
- struct xdp_options opts;
struct xsk_socket *xsk;
socklen_t optlen;
int err;
xsk->prog_fd = -1;
- optlen = sizeof(opts);
- err = getsockopt(xsk->fd, SOL_XDP, XDP_OPTIONS, &opts, &optlen);
- if (err) {
- err = -errno;
- goto out_mmap_tx;
- }
-
- xsk->zc = opts.flags & XDP_OPTIONS_ZEROCOPY;
-
if (!(xsk->config.libbpf_flags & XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD)) {
err = xsk_setup_xdp_prog(xsk);
if (err)
"do_task_dead",
"__module_put_and_exit",
"complete_and_exit",
- "kvm_spurious_fault",
"__reiserfs_panic",
"lbug_with_loc",
"fortify_panic",
*/
#ifndef __NFIT_TEST_H__
#define __NFIT_TEST_H__
+#include <linux/acpi.h>
#include <linux/list.h>
#include <linux/uuid.h>
#include <linux/ioport.h>
__u32 status;
} __packed;
-union acpi_object;
-typedef void *acpi_handle;
-
typedef struct nfit_test_resource *(*nfit_test_lookup_fn)(resource_size_t);
typedef union acpi_object *(*nfit_test_evaluate_dsm_fn)(acpi_handle handle,
const guid_t *guid, u64 rev, u64 func,
return fd;
}
+static pthread_mutex_t server_started_mtx = PTHREAD_MUTEX_INITIALIZER;
+static pthread_cond_t server_started = PTHREAD_COND_INITIALIZER;
+
static void *server_thread(void *arg)
{
struct sockaddr_storage addr;
socklen_t len = sizeof(addr);
int fd = *(int *)arg;
int client_fd;
+ int err;
+
+ err = listen(fd, 1);
+
+ pthread_mutex_lock(&server_started_mtx);
+ pthread_cond_signal(&server_started);
+ pthread_mutex_unlock(&server_started_mtx);
- if (CHECK_FAIL(listen(fd, 1)) < 0) {
+ if (CHECK_FAIL(err < 0)) {
perror("Failed to listed on socket");
return NULL;
}
if (CHECK_FAIL(server_fd < 0))
goto close_cgroup_fd;
- pthread_create(&tid, NULL, server_thread, (void *)&server_fd);
+ if (CHECK_FAIL(pthread_create(&tid, NULL, server_thread,
+ (void *)&server_fd)))
+ goto close_cgroup_fd;
+
+ pthread_mutex_lock(&server_started_mtx);
+ pthread_cond_wait(&server_started, &server_started_mtx);
+ pthread_mutex_unlock(&server_started_mtx);
+
CHECK_FAIL(run_test(cgroup_fd, server_fd));
close(server_fd);
close_cgroup_fd:
#else
#pragma unroll
#endif
- for (int i = 0; i < STROBE_MAX_MAP_ENTRIES && i < map.cnt; ++i) {
+ for (int i = 0; i < STROBE_MAX_MAP_ENTRIES; ++i) {
+ if (i >= map.cnt)
+ break;
+
descr->key_lens[i] = 0;
len = bpf_probe_read_str(payload, STROBE_MAX_STR_LEN,
map.entries[i].key);
uint8_t raw[sizeof(uint64_t)];
uint64_t num;
} value = {};
- uint8_t c, i;
if (buf_len > sizeof(value)) {
log_err("Value is too big (%zd) to use in fixup", buf_len);
local vid=10
bridge vlan add vid $vid dev $swp2 master
- # During initialization the firmware enables all the VLAN filters and
- # the driver does not turn them off since the traffic will be discarded
- # by the STP filter whose default is DISCARD state. Add the VID on the
- # ingress bridge port and then remove it to make sure it is not member
- # in the VLAN.
- bridge vlan add vid $vid dev $swp1 master
- bridge vlan del vid $vid dev $swp1 master
RET=0
check_error 'p:kprobes/testevent _do_fork ^bcd=\1' # DIFF_ARG_TYPE
check_error 'p:kprobes/testevent _do_fork ^abcd=\1:u8' # DIFF_ARG_TYPE
check_error 'p:kprobes/testevent _do_fork ^abcd=\"foo"' # DIFF_ARG_TYPE
-check_error '^p:kprobes/testevent _do_fork' # SAME_PROBE
+check_error '^p:kprobes/testevent _do_fork abcd=\1' # SAME_PROBE
fi
exit 0
# sequentially. As a result, a policy rule may be defined, but
# might not necessarily be used. This test assumes if a policy
# rule is specified, that is the intent.
+
+ # First check for appended signature (modsig), then xattr
if [ $ima_read_policy -eq 1 ]; then
check_ima_policy "appraise" "func=KEXEC_KERNEL_CHECK" \
- "appraise_type=imasig"
+ "appraise_type=imasig|modsig"
ret=$?
- [ $ret -eq 1 ] && log_info "IMA signature required";
+ if [ $ret -eq 1 ]; then
+ log_info "IMA or appended(modsig) signature required"
+ else
+ check_ima_policy "appraise" "func=KEXEC_KERNEL_CHECK" \
+ "appraise_type=imasig"
+ ret=$?
+ [ $ret -eq 1 ] && log_info "IMA signature required";
+ fi
fi
return $ret
}
return $ret
}
+# Return 1 for appended signature (modsig) found and 0 for not found.
+check_for_modsig()
+{
+ local module_sig_string="~Module signature appended~"
+ local sig="$(tail --bytes $((${#module_sig_string} + 1)) $KERNEL_IMAGE)"
+ local ret=0
+
+ if [ "$sig" == "$module_sig_string" ]; then
+ ret=1
+ log_info "kexec kernel image modsig signed"
+ else
+ log_info "kexec kernel image not modsig signed"
+ fi
+ return $ret
+}
+
kexec_file_load_test()
{
local succeed_msg="kexec_file_load succeeded"
# In secureboot mode with an architecture specific
# policy, make sure either an IMA or PE signature exists.
if [ $secureboot -eq 1 ] && [ $arch_policy -eq 1 ] && \
- [ $ima_signed -eq 0 ] && [ $pe_signed -eq 0 ]; then
+ [ $ima_signed -eq 0 ] && [ $pe_signed -eq 0 ] \
+ && [ $ima_modsig -eq 0 ]; then
log_fail "$succeed_msg (missing sig)"
fi
log_fail "$succeed_msg (missing PE sig)"
fi
- if [ $ima_sig_required -eq 1 ] && [ $ima_signed -eq 0 ]; then
+ if [ $ima_sig_required -eq 1 ] && [ $ima_signed -eq 0 ] \
+ && [ $ima_modsig -eq 0 ]; then
log_fail "$succeed_msg (missing IMA sig)"
fi
check_for_imasig
ima_signed=$?
+check_for_modsig
+ima_modsig=$?
+
# Test loading the kernel image via kexec_file_load syscall
kexec_file_load_test
TEST_GEN_PROGS_x86_64 += x86_64/state_test
TEST_GEN_PROGS_x86_64 += x86_64/sync_regs_test
TEST_GEN_PROGS_x86_64 += x86_64/vmx_close_while_nested_test
+TEST_GEN_PROGS_x86_64 += x86_64/vmx_dirty_log_test
TEST_GEN_PROGS_x86_64 += x86_64/vmx_set_nested_state_test
TEST_GEN_PROGS_x86_64 += x86_64/vmx_tsc_adjust_test
TEST_GEN_PROGS_x86_64 += clear_dirty_log_test
-I$(LINUX_HDR_PATH) -Iinclude -I$(<D) -Iinclude/$(UNAME_M) -I..
no-pie-option := $(call try-run, echo 'int main() { return 0; }' | \
- $(CC) -Werror $(KBUILD_CPPFLAGS) $(CC_OPTION_CFLAGS) -no-pie -x c - -o "$$TMP", -no-pie)
+ $(CC) -Werror -no-pie -x c - -o "$$TMP", -no-pie)
# On s390, build the testcases KVM-enabled
pgste-option = $(call try-run, echo 'int main() { return 0; }' | \
#include "kvm_util.h"
#include "processor.h"
-#define DEBUG printf
-
#define VCPU_ID 1
/* The memory slot index to track dirty pages */
}
static struct kvm_vm *create_vm(enum vm_guest_mode mode, uint32_t vcpuid,
- uint64_t extra_mem_pages, void *guest_code,
- unsigned long type)
+ uint64_t extra_mem_pages, void *guest_code)
{
struct kvm_vm *vm;
uint64_t extra_pg_pages = extra_mem_pages / 512 * 2;
- vm = _vm_create(mode, DEFAULT_GUEST_PHY_PAGES + extra_pg_pages,
- O_RDWR, type);
+ vm = _vm_create(mode, DEFAULT_GUEST_PHY_PAGES + extra_pg_pages, O_RDWR);
kvm_vm_elf_load(vm, program_invocation_name, 0, 0);
#ifdef __x86_64__
vm_create_irqchip(vm);
return vm;
}
+#define DIRTY_MEM_BITS 30 /* 1G */
+#define PAGE_SHIFT_4K 12
+
static void run_test(enum vm_guest_mode mode, unsigned long iterations,
unsigned long interval, uint64_t phys_offset)
{
- unsigned int guest_pa_bits, guest_page_shift;
pthread_t vcpu_thread;
struct kvm_vm *vm;
- uint64_t max_gfn;
unsigned long *bmap;
- unsigned long type = 0;
-
- switch (mode) {
- case VM_MODE_P52V48_4K:
- guest_pa_bits = 52;
- guest_page_shift = 12;
- break;
- case VM_MODE_P52V48_64K:
- guest_pa_bits = 52;
- guest_page_shift = 16;
- break;
- case VM_MODE_P48V48_4K:
- guest_pa_bits = 48;
- guest_page_shift = 12;
- break;
- case VM_MODE_P48V48_64K:
- guest_pa_bits = 48;
- guest_page_shift = 16;
- break;
- case VM_MODE_P40V48_4K:
- guest_pa_bits = 40;
- guest_page_shift = 12;
- break;
- case VM_MODE_P40V48_64K:
- guest_pa_bits = 40;
- guest_page_shift = 16;
- break;
- default:
- TEST_ASSERT(false, "Unknown guest mode, mode: 0x%x", mode);
- }
- DEBUG("Testing guest mode: %s\n", vm_guest_mode_string(mode));
-
-#ifdef __x86_64__
/*
- * FIXME
- * The x86_64 kvm selftests framework currently only supports a
- * single PML4 which restricts the number of physical address
- * bits we can change to 39.
+ * We reserve page table for 2 times of extra dirty mem which
+ * will definitely cover the original (1G+) test range. Here
+ * we do the calculation with 4K page size which is the
+ * smallest so the page number will be enough for all archs
+ * (e.g., 64K page size guest will need even less memory for
+ * page tables).
*/
- guest_pa_bits = 39;
-#endif
-#ifdef __aarch64__
- if (guest_pa_bits != 40)
- type = KVM_VM_TYPE_ARM_IPA_SIZE(guest_pa_bits);
-#endif
- max_gfn = (1ul << (guest_pa_bits - guest_page_shift)) - 1;
- guest_page_size = (1ul << guest_page_shift);
+ vm = create_vm(mode, VCPU_ID,
+ 2ul << (DIRTY_MEM_BITS - PAGE_SHIFT_4K),
+ guest_code);
+
+ guest_page_size = vm_get_page_size(vm);
/*
* A little more than 1G of guest page sized pages. Cover the
* case where the size is not aligned to 64 pages.
*/
- guest_num_pages = (1ul << (30 - guest_page_shift)) + 16;
+ guest_num_pages = (1ul << (DIRTY_MEM_BITS -
+ vm_get_page_shift(vm))) + 16;
#ifdef __s390x__
/* Round up to multiple of 1M (segment size) */
guest_num_pages = (guest_num_pages + 0xff) & ~0xffUL;
!!((guest_num_pages * guest_page_size) % host_page_size);
if (!phys_offset) {
- guest_test_phys_mem = (max_gfn - guest_num_pages) * guest_page_size;
+ guest_test_phys_mem = (vm_get_max_gfn(vm) -
+ guest_num_pages) * guest_page_size;
guest_test_phys_mem &= ~(host_page_size - 1);
} else {
guest_test_phys_mem = phys_offset;
bmap = bitmap_alloc(host_num_pages);
host_bmap_track = bitmap_alloc(host_num_pages);
- vm = create_vm(mode, VCPU_ID, guest_num_pages, guest_code, type);
-
#ifdef USE_CLEAR_DIRTY_LOG
struct kvm_enable_cap cap = {};
#endif
#ifdef __x86_64__
- vm_guest_mode_params_init(VM_MODE_P52V48_4K, true, true);
+ vm_guest_mode_params_init(VM_MODE_PXXV48_4K, true, true);
#endif
#ifdef __aarch64__
vm_guest_mode_params_init(VM_MODE_P40V48_4K, true, true);
typedef uint64_t vm_paddr_t; /* Virtual Machine (Guest) physical address */
typedef uint64_t vm_vaddr_t; /* Virtual Machine (Guest) virtual address */
+#ifndef NDEBUG
+#define DEBUG(...) printf(__VA_ARGS__);
+#else
+#define DEBUG(...)
+#endif
+
/* Minimum allocated guest virtual and physical addresses */
#define KVM_UTIL_MIN_VADDR 0x2000
VM_MODE_P48V48_64K,
VM_MODE_P40V48_4K,
VM_MODE_P40V48_64K,
+ VM_MODE_PXXV48_4K, /* For 48bits VA but ANY bits PA */
NUM_VM_MODES,
};
-#ifdef __aarch64__
+#if defined(__aarch64__)
#define VM_MODE_DEFAULT VM_MODE_P40V48_4K
+#elif defined(__x86_64__)
+#define VM_MODE_DEFAULT VM_MODE_PXXV48_4K
#else
#define VM_MODE_DEFAULT VM_MODE_P52V48_4K
#endif
int vm_enable_cap(struct kvm_vm *vm, struct kvm_enable_cap *cap);
struct kvm_vm *vm_create(enum vm_guest_mode mode, uint64_t phy_pages, int perm);
-struct kvm_vm *_vm_create(enum vm_guest_mode mode, uint64_t phy_pages,
- int perm, unsigned long type);
+struct kvm_vm *_vm_create(enum vm_guest_mode mode, uint64_t phy_pages, int perm);
void kvm_vm_free(struct kvm_vm *vmp);
void kvm_vm_restart(struct kvm_vm *vmp, int perm);
void kvm_vm_release(struct kvm_vm *vmp);
bool vm_is_unrestricted_guest(struct kvm_vm *vm);
+unsigned int vm_get_page_size(struct kvm_vm *vm);
+unsigned int vm_get_page_shift(struct kvm_vm *vm);
+unsigned int vm_get_max_gfn(struct kvm_vm *vm);
+
struct kvm_userspace_memory_region *
kvm_userspace_memory_region_find(struct kvm_vm *vm, uint64_t start,
uint64_t end);
void vcpu_set_msr(struct kvm_vm *vm, uint32_t vcpuid, uint64_t msr_index,
uint64_t msr_value);
+uint32_t kvm_get_cpuid_max(void);
+void kvm_get_cpu_address_width(unsigned int *pa_bits, unsigned int *va_bits);
+
/*
* Basic CPU control in CR0
*/
#define VMX_BASIC_MEM_TYPE_WB 6LLU
#define VMX_BASIC_INOUT 0x0040000000000000LLU
+/* VMX_EPT_VPID_CAP bits */
+#define VMX_EPT_VPID_CAP_AD_BITS (1ULL << 21)
+
/* MSR_IA32_VMX_MISC bits */
#define MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS (1ULL << 29)
#define MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE 0x1F
void *enlightened_vmcs_hva;
uint64_t enlightened_vmcs_gpa;
void *enlightened_vmcs;
+
+ void *eptp_hva;
+ uint64_t eptp_gpa;
+ void *eptp;
};
struct vmx_pages *vcpu_alloc_vmx(struct kvm_vm *vm, vm_vaddr_t *p_vmx_gva);
void prepare_vmcs(struct vmx_pages *vmx, void *guest_rip, void *guest_rsp);
bool load_vmcs(struct vmx_pages *vmx);
+void nested_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
+ uint64_t nested_paddr, uint64_t paddr, uint32_t eptp_memslot);
+void nested_map(struct vmx_pages *vmx, struct kvm_vm *vm,
+ uint64_t nested_paddr, uint64_t paddr, uint64_t size,
+ uint32_t eptp_memslot);
+void nested_map_memslot(struct vmx_pages *vmx, struct kvm_vm *vm,
+ uint32_t memslot, uint32_t eptp_memslot);
+void prepare_eptp(struct vmx_pages *vmx, struct kvm_vm *vm,
+ uint32_t eptp_memslot);
+
#endif /* SELFTEST_KVM_VMX_H */
case VM_MODE_P52V48_4K:
TEST_ASSERT(false, "AArch64 does not support 4K sized pages "
"with 52-bit physical address ranges");
+ case VM_MODE_PXXV48_4K:
+ TEST_ASSERT(false, "AArch64 does not support 4K sized pages "
+ "with ANY-bit physical address ranges");
case VM_MODE_P52V48_64K:
tcr_el1 |= 1ul << 14; /* TG0 = 64KB */
tcr_el1 |= 6ul << 32; /* IPS = 52 bits */
#include "test_util.h"
#include "kvm_util.h"
#include "kvm_util_internal.h"
+#include "processor.h"
#include <assert.h>
#include <sys/mman.h>
return ret;
}
-static void vm_open(struct kvm_vm *vm, int perm, unsigned long type)
+static void vm_open(struct kvm_vm *vm, int perm)
{
vm->kvm_fd = open(KVM_DEV_PATH, perm);
if (vm->kvm_fd < 0)
exit(KSFT_SKIP);
}
- vm->fd = ioctl(vm->kvm_fd, KVM_CREATE_VM, type);
+ vm->fd = ioctl(vm->kvm_fd, KVM_CREATE_VM, vm->type);
TEST_ASSERT(vm->fd >= 0, "KVM_CREATE_VM ioctl failed, "
"rc: %i errno: %i", vm->fd, errno);
}
const char * const vm_guest_mode_string[] = {
- "PA-bits:52, VA-bits:48, 4K pages",
- "PA-bits:52, VA-bits:48, 64K pages",
- "PA-bits:48, VA-bits:48, 4K pages",
- "PA-bits:48, VA-bits:48, 64K pages",
- "PA-bits:40, VA-bits:48, 4K pages",
- "PA-bits:40, VA-bits:48, 64K pages",
+ "PA-bits:52, VA-bits:48, 4K pages",
+ "PA-bits:52, VA-bits:48, 64K pages",
+ "PA-bits:48, VA-bits:48, 4K pages",
+ "PA-bits:48, VA-bits:48, 64K pages",
+ "PA-bits:40, VA-bits:48, 4K pages",
+ "PA-bits:40, VA-bits:48, 64K pages",
+ "PA-bits:ANY, VA-bits:48, 4K pages",
};
_Static_assert(sizeof(vm_guest_mode_string)/sizeof(char *) == NUM_VM_MODES,
"Missing new mode strings?");
* descriptor to control the created VM is created with the permissions
* given by perm (e.g. O_RDWR).
*/
-struct kvm_vm *_vm_create(enum vm_guest_mode mode, uint64_t phy_pages,
- int perm, unsigned long type)
+struct kvm_vm *_vm_create(enum vm_guest_mode mode, uint64_t phy_pages, int perm)
{
struct kvm_vm *vm;
+ DEBUG("Testing guest mode: %s\n", vm_guest_mode_string(mode));
+
vm = calloc(1, sizeof(*vm));
TEST_ASSERT(vm != NULL, "Insufficient Memory");
vm->mode = mode;
- vm->type = type;
- vm_open(vm, perm, type);
+ vm->type = 0;
/* Setup mode specific traits. */
switch (vm->mode) {
vm->page_size = 0x10000;
vm->page_shift = 16;
break;
+ case VM_MODE_PXXV48_4K:
+#ifdef __x86_64__
+ kvm_get_cpu_address_width(&vm->pa_bits, &vm->va_bits);
+ TEST_ASSERT(vm->va_bits == 48, "Linear address width "
+ "(%d bits) not supported", vm->va_bits);
+ vm->pgtable_levels = 4;
+ vm->page_size = 0x1000;
+ vm->page_shift = 12;
+ DEBUG("Guest physical address width detected: %d\n",
+ vm->pa_bits);
+#else
+ TEST_ASSERT(false, "VM_MODE_PXXV48_4K not supported on "
+ "non-x86 platforms");
+#endif
+ break;
default:
TEST_ASSERT(false, "Unknown guest mode, mode: 0x%x", mode);
}
+#ifdef __aarch64__
+ if (vm->pa_bits != 40)
+ vm->type = KVM_VM_TYPE_ARM_IPA_SIZE(vm->pa_bits);
+#endif
+
+ vm_open(vm, perm);
+
/* Limit to VA-bit canonical virtual addresses. */
vm->vpages_valid = sparsebit_alloc();
sparsebit_set_num(vm->vpages_valid,
struct kvm_vm *vm_create(enum vm_guest_mode mode, uint64_t phy_pages, int perm)
{
- return _vm_create(mode, phy_pages, perm, 0);
+ return _vm_create(mode, phy_pages, perm);
}
/*
{
struct userspace_mem_region *region;
- vm_open(vmp, perm, vmp->type);
+ vm_open(vmp, perm);
if (vmp->has_irqchip)
vm_create_irqchip(vmp);
* on error (e.g. currently no memory region using memslot as a KVM
* memory slot ID).
*/
-static struct userspace_mem_region *
+struct userspace_mem_region *
memslot2region(struct kvm_vm *vm, uint32_t memslot)
{
struct userspace_mem_region *region;
return val == 'Y';
}
+
+unsigned int vm_get_page_size(struct kvm_vm *vm)
+{
+ return vm->page_size;
+}
+
+unsigned int vm_get_page_shift(struct kvm_vm *vm)
+{
+ return vm->page_shift;
+}
+
+unsigned int vm_get_max_gfn(struct kvm_vm *vm)
+{
+ return vm->max_gfn;
+}
void regs_dump(FILE *stream, struct kvm_regs *regs, uint8_t indent);
void sregs_dump(FILE *stream, struct kvm_sregs *sregs, uint8_t indent);
+struct userspace_mem_region *
+memslot2region(struct kvm_vm *vm, uint32_t memslot);
+
#endif /* SELFTEST_KVM_UTIL_INTERNAL_H */
void virt_pgd_alloc(struct kvm_vm *vm, uint32_t pgd_memslot)
{
- TEST_ASSERT(vm->mode == VM_MODE_P52V48_4K, "Attempt to use "
+ TEST_ASSERT(vm->mode == VM_MODE_PXXV48_4K, "Attempt to use "
"unknown or unsupported guest mode, mode: 0x%x", vm->mode);
/* If needed, create page map l4 table. */
uint16_t index[4];
struct pageMapL4Entry *pml4e;
- TEST_ASSERT(vm->mode == VM_MODE_P52V48_4K, "Attempt to use "
+ TEST_ASSERT(vm->mode == VM_MODE_PXXV48_4K, "Attempt to use "
"unknown or unsupported guest mode, mode: 0x%x", vm->mode);
TEST_ASSERT((vaddr % vm->page_size) == 0,
struct pageDirectoryEntry *pde;
struct pageTableEntry *pte;
- TEST_ASSERT(vm->mode == VM_MODE_P52V48_4K, "Attempt to use "
+ TEST_ASSERT(vm->mode == VM_MODE_PXXV48_4K, "Attempt to use "
"unknown or unsupported guest mode, mode: 0x%x", vm->mode);
index[0] = (gva >> 12) & 0x1ffu;
kvm_setup_gdt(vm, &sregs.gdt, gdt_memslot, pgd_memslot);
switch (vm->mode) {
- case VM_MODE_P52V48_4K:
+ case VM_MODE_PXXV48_4K:
sregs.cr0 = X86_CR0_PE | X86_CR0_NE | X86_CR0_PG;
sregs.cr4 |= X86_CR4_PAE | X86_CR4_OSFXSR;
sregs.efer |= (EFER_LME | EFER_LMA | EFER_NX);
for (i = 0; i < nmsrs; i++)
state->msrs.entries[i].index = list->indices[i];
r = ioctl(vcpu->fd, KVM_GET_MSRS, &state->msrs);
- TEST_ASSERT(r == nmsrs, "Unexpected result from KVM_GET_MSRS, r: %i (failed at %x)",
+ TEST_ASSERT(r == nmsrs, "Unexpected result from KVM_GET_MSRS, r: %i (failed MSR was 0x%x)",
r, r == nmsrs ? -1 : list->indices[r]);
r = ioctl(vcpu->fd, KVM_GET_DEBUGREGS, &state->debugregs);
chunk = (const uint32_t *)("GenuineIntel");
return (ebx == chunk[0] && edx == chunk[1] && ecx == chunk[2]);
}
+
+uint32_t kvm_get_cpuid_max(void)
+{
+ return kvm_get_supported_cpuid_entry(0x80000000)->eax;
+}
+
+void kvm_get_cpu_address_width(unsigned int *pa_bits, unsigned int *va_bits)
+{
+ struct kvm_cpuid_entry2 *entry;
+ bool pae;
+
+ /* SDM 4.1.4 */
+ if (kvm_get_cpuid_max() < 0x80000008) {
+ pae = kvm_get_supported_cpuid_entry(1)->edx & (1 << 6);
+ *pa_bits = pae ? 36 : 32;
+ *va_bits = 32;
+ } else {
+ entry = kvm_get_supported_cpuid_entry(0x80000008);
+ *pa_bits = entry->eax & 0xff;
+ *va_bits = (entry->eax >> 8) & 0xff;
+ }
+}
va_end(va);
asm volatile("in %[port], %%al"
- : : [port] "d" (UCALL_PIO_PORT), "D" (&uc) : "rax");
+ : : [port] "d" (UCALL_PIO_PORT), "D" (&uc) : "rax", "memory");
}
uint64_t get_ucall(struct kvm_vm *vm, uint32_t vcpu_id, struct ucall *uc)
#include "test_util.h"
#include "kvm_util.h"
+#include "../kvm_util_internal.h"
#include "processor.h"
#include "vmx.h"
+#define PAGE_SHIFT_4K 12
+
+#define KVM_EPT_PAGE_TABLE_MIN_PADDR 0x1c0000
+
bool enable_evmcs;
+struct eptPageTableEntry {
+ uint64_t readable:1;
+ uint64_t writable:1;
+ uint64_t executable:1;
+ uint64_t memory_type:3;
+ uint64_t ignore_pat:1;
+ uint64_t page_size:1;
+ uint64_t accessed:1;
+ uint64_t dirty:1;
+ uint64_t ignored_11_10:2;
+ uint64_t address:40;
+ uint64_t ignored_62_52:11;
+ uint64_t suppress_ve:1;
+};
+
+struct eptPageTablePointer {
+ uint64_t memory_type:3;
+ uint64_t page_walk_length:3;
+ uint64_t ad_enabled:1;
+ uint64_t reserved_11_07:5;
+ uint64_t address:40;
+ uint64_t reserved_63_52:12;
+};
int vcpu_enable_evmcs(struct kvm_vm *vm, int vcpu_id)
{
uint16_t evmcs_ver;
*/
static inline void init_vmcs_control_fields(struct vmx_pages *vmx)
{
+ uint32_t sec_exec_ctl = 0;
+
vmwrite(VIRTUAL_PROCESSOR_ID, 0);
vmwrite(POSTED_INTR_NV, 0);
vmwrite(PIN_BASED_VM_EXEC_CONTROL, rdmsr(MSR_IA32_VMX_TRUE_PINBASED_CTLS));
- if (!vmwrite(SECONDARY_VM_EXEC_CONTROL, 0))
+
+ if (vmx->eptp_gpa) {
+ uint64_t ept_paddr;
+ struct eptPageTablePointer eptp = {
+ .memory_type = VMX_BASIC_MEM_TYPE_WB,
+ .page_walk_length = 3, /* + 1 */
+ .ad_enabled = !!(rdmsr(MSR_IA32_VMX_EPT_VPID_CAP) & VMX_EPT_VPID_CAP_AD_BITS),
+ .address = vmx->eptp_gpa >> PAGE_SHIFT_4K,
+ };
+
+ memcpy(&ept_paddr, &eptp, sizeof(ept_paddr));
+ vmwrite(EPT_POINTER, ept_paddr);
+ sec_exec_ctl |= SECONDARY_EXEC_ENABLE_EPT;
+ }
+
+ if (!vmwrite(SECONDARY_VM_EXEC_CONTROL, sec_exec_ctl))
vmwrite(CPU_BASED_VM_EXEC_CONTROL,
rdmsr(MSR_IA32_VMX_TRUE_PROCBASED_CTLS) | CPU_BASED_ACTIVATE_SECONDARY_CONTROLS);
- else
+ else {
vmwrite(CPU_BASED_VM_EXEC_CONTROL, rdmsr(MSR_IA32_VMX_TRUE_PROCBASED_CTLS));
+ GUEST_ASSERT(!sec_exec_ctl);
+ }
+
vmwrite(EXCEPTION_BITMAP, 0);
vmwrite(PAGE_FAULT_ERROR_CODE_MASK, 0);
vmwrite(PAGE_FAULT_ERROR_CODE_MATCH, -1); /* Never match */
init_vmcs_host_state();
init_vmcs_guest_state(guest_rip, guest_rsp);
}
+
+void nested_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
+ uint64_t nested_paddr, uint64_t paddr, uint32_t eptp_memslot)
+{
+ uint16_t index[4];
+ struct eptPageTableEntry *pml4e;
+
+ TEST_ASSERT(vm->mode == VM_MODE_PXXV48_4K, "Attempt to use "
+ "unknown or unsupported guest mode, mode: 0x%x", vm->mode);
+
+ TEST_ASSERT((nested_paddr % vm->page_size) == 0,
+ "Nested physical address not on page boundary,\n"
+ " nested_paddr: 0x%lx vm->page_size: 0x%x",
+ nested_paddr, vm->page_size);
+ TEST_ASSERT((nested_paddr >> vm->page_shift) <= vm->max_gfn,
+ "Physical address beyond beyond maximum supported,\n"
+ " nested_paddr: 0x%lx vm->max_gfn: 0x%lx vm->page_size: 0x%x",
+ paddr, vm->max_gfn, vm->page_size);
+ TEST_ASSERT((paddr % vm->page_size) == 0,
+ "Physical address not on page boundary,\n"
+ " paddr: 0x%lx vm->page_size: 0x%x",
+ paddr, vm->page_size);
+ TEST_ASSERT((paddr >> vm->page_shift) <= vm->max_gfn,
+ "Physical address beyond beyond maximum supported,\n"
+ " paddr: 0x%lx vm->max_gfn: 0x%lx vm->page_size: 0x%x",
+ paddr, vm->max_gfn, vm->page_size);
+
+ index[0] = (nested_paddr >> 12) & 0x1ffu;
+ index[1] = (nested_paddr >> 21) & 0x1ffu;
+ index[2] = (nested_paddr >> 30) & 0x1ffu;
+ index[3] = (nested_paddr >> 39) & 0x1ffu;
+
+ /* Allocate page directory pointer table if not present. */
+ pml4e = vmx->eptp_hva;
+ if (!pml4e[index[3]].readable) {
+ pml4e[index[3]].address = vm_phy_page_alloc(vm,
+ KVM_EPT_PAGE_TABLE_MIN_PADDR, eptp_memslot)
+ >> vm->page_shift;
+ pml4e[index[3]].writable = true;
+ pml4e[index[3]].readable = true;
+ pml4e[index[3]].executable = true;
+ }
+
+ /* Allocate page directory table if not present. */
+ struct eptPageTableEntry *pdpe;
+ pdpe = addr_gpa2hva(vm, pml4e[index[3]].address * vm->page_size);
+ if (!pdpe[index[2]].readable) {
+ pdpe[index[2]].address = vm_phy_page_alloc(vm,
+ KVM_EPT_PAGE_TABLE_MIN_PADDR, eptp_memslot)
+ >> vm->page_shift;
+ pdpe[index[2]].writable = true;
+ pdpe[index[2]].readable = true;
+ pdpe[index[2]].executable = true;
+ }
+
+ /* Allocate page table if not present. */
+ struct eptPageTableEntry *pde;
+ pde = addr_gpa2hva(vm, pdpe[index[2]].address * vm->page_size);
+ if (!pde[index[1]].readable) {
+ pde[index[1]].address = vm_phy_page_alloc(vm,
+ KVM_EPT_PAGE_TABLE_MIN_PADDR, eptp_memslot)
+ >> vm->page_shift;
+ pde[index[1]].writable = true;
+ pde[index[1]].readable = true;
+ pde[index[1]].executable = true;
+ }
+
+ /* Fill in page table entry. */
+ struct eptPageTableEntry *pte;
+ pte = addr_gpa2hva(vm, pde[index[1]].address * vm->page_size);
+ pte[index[0]].address = paddr >> vm->page_shift;
+ pte[index[0]].writable = true;
+ pte[index[0]].readable = true;
+ pte[index[0]].executable = true;
+
+ /*
+ * For now mark these as accessed and dirty because the only
+ * testcase we have needs that. Can be reconsidered later.
+ */
+ pte[index[0]].accessed = true;
+ pte[index[0]].dirty = true;
+}
+
+/*
+ * Map a range of EPT guest physical addresses to the VM's physical address
+ *
+ * Input Args:
+ * vm - Virtual Machine
+ * nested_paddr - Nested guest physical address to map
+ * paddr - VM Physical Address
+ * size - The size of the range to map
+ * eptp_memslot - Memory region slot for new virtual translation tables
+ *
+ * Output Args: None
+ *
+ * Return: None
+ *
+ * Within the VM given by vm, creates a nested guest translation for the
+ * page range starting at nested_paddr to the page range starting at paddr.
+ */
+void nested_map(struct vmx_pages *vmx, struct kvm_vm *vm,
+ uint64_t nested_paddr, uint64_t paddr, uint64_t size,
+ uint32_t eptp_memslot)
+{
+ size_t page_size = vm->page_size;
+ size_t npages = size / page_size;
+
+ TEST_ASSERT(nested_paddr + size > nested_paddr, "Vaddr overflow");
+ TEST_ASSERT(paddr + size > paddr, "Paddr overflow");
+
+ while (npages--) {
+ nested_pg_map(vmx, vm, nested_paddr, paddr, eptp_memslot);
+ nested_paddr += page_size;
+ paddr += page_size;
+ }
+}
+
+/* Prepare an identity extended page table that maps all the
+ * physical pages in VM.
+ */
+void nested_map_memslot(struct vmx_pages *vmx, struct kvm_vm *vm,
+ uint32_t memslot, uint32_t eptp_memslot)
+{
+ sparsebit_idx_t i, last;
+ struct userspace_mem_region *region =
+ memslot2region(vm, memslot);
+
+ i = (region->region.guest_phys_addr >> vm->page_shift) - 1;
+ last = i + (region->region.memory_size >> vm->page_shift);
+ for (;;) {
+ i = sparsebit_next_clear(region->unused_phy_pages, i);
+ if (i > last)
+ break;
+
+ nested_map(vmx, vm,
+ (uint64_t)i << vm->page_shift,
+ (uint64_t)i << vm->page_shift,
+ 1 << vm->page_shift,
+ eptp_memslot);
+ }
+}
+
+void prepare_eptp(struct vmx_pages *vmx, struct kvm_vm *vm,
+ uint32_t eptp_memslot)
+{
+ vmx->eptp = (void *)vm_vaddr_alloc(vm, getpagesize(), 0x10000, 0, 0);
+ vmx->eptp_hva = addr_gva2hva(vm, (uintptr_t)vmx->eptp);
+ vmx->eptp_gpa = addr_gva2gpa(vm, (uintptr_t)vmx->eptp);
+}
{
}
+static int smt_possible(void)
+{
+ char buf[16];
+ FILE *f;
+ bool res = 1;
+
+ f = fopen("/sys/devices/system/cpu/smt/control", "r");
+ if (f) {
+ if (fread(buf, sizeof(*buf), sizeof(buf), f) > 0) {
+ if (!strncmp(buf, "forceoff", 8) ||
+ !strncmp(buf, "notsupported", 12))
+ res = 0;
+ }
+ fclose(f);
+ }
+
+ return res;
+}
+
static void test_hv_cpuid(struct kvm_cpuid2 *hv_cpuid_entries,
int evmcs_enabled)
{
TEST_ASSERT(!entry->padding[0] && !entry->padding[1] &&
!entry->padding[2], "padding should be zero");
+ if (entry->function == 0x40000004) {
+ int nononarchcs = !!(entry->eax & (1UL << 18));
+
+ TEST_ASSERT(nononarchcs == !smt_possible(),
+ "NoNonArchitecturalCoreSharing bit"
+ " doesn't reflect SMT setting");
+ }
+
/*
* If needed for debug:
* fprintf(stdout,
--- /dev/null
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * KVM dirty page logging test
+ *
+ * Copyright (C) 2018, Red Hat, Inc.
+ */
+
+#define _GNU_SOURCE /* for program_invocation_name */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <linux/bitmap.h>
+#include <linux/bitops.h>
+
+#include "test_util.h"
+#include "kvm_util.h"
+#include "processor.h"
+#include "vmx.h"
+
+#define VCPU_ID 1
+
+/* The memory slot index to track dirty pages */
+#define TEST_MEM_SLOT_INDEX 1
+#define TEST_MEM_SIZE 3
+
+/* L1 guest test virtual memory offset */
+#define GUEST_TEST_MEM 0xc0000000
+
+/* L2 guest test virtual memory offset */
+#define NESTED_TEST_MEM1 0xc0001000
+#define NESTED_TEST_MEM2 0xc0002000
+
+static void l2_guest_code(void)
+{
+ *(volatile uint64_t *)NESTED_TEST_MEM1;
+ *(volatile uint64_t *)NESTED_TEST_MEM1 = 1;
+ GUEST_SYNC(true);
+ GUEST_SYNC(false);
+
+ *(volatile uint64_t *)NESTED_TEST_MEM2 = 1;
+ GUEST_SYNC(true);
+ *(volatile uint64_t *)NESTED_TEST_MEM2 = 1;
+ GUEST_SYNC(true);
+ GUEST_SYNC(false);
+
+ /* Exit to L1 and never come back. */
+ vmcall();
+}
+
+void l1_guest_code(struct vmx_pages *vmx)
+{
+#define L2_GUEST_STACK_SIZE 64
+ unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
+
+ GUEST_ASSERT(vmx->vmcs_gpa);
+ GUEST_ASSERT(prepare_for_vmx_operation(vmx));
+ GUEST_ASSERT(load_vmcs(vmx));
+
+ prepare_vmcs(vmx, l2_guest_code,
+ &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+
+ GUEST_SYNC(false);
+ GUEST_ASSERT(!vmlaunch());
+ GUEST_SYNC(false);
+ GUEST_ASSERT(vmreadz(VM_EXIT_REASON) == EXIT_REASON_VMCALL);
+ GUEST_DONE();
+}
+
+int main(int argc, char *argv[])
+{
+ vm_vaddr_t vmx_pages_gva = 0;
+ struct vmx_pages *vmx;
+ unsigned long *bmap;
+ uint64_t *host_test_mem;
+
+ struct kvm_vm *vm;
+ struct kvm_run *run;
+ struct ucall uc;
+ bool done = false;
+
+ /* Create VM */
+ vm = vm_create_default(VCPU_ID, 0, l1_guest_code);
+ vcpu_set_cpuid(vm, VCPU_ID, kvm_get_supported_cpuid());
+ vmx = vcpu_alloc_vmx(vm, &vmx_pages_gva);
+ vcpu_args_set(vm, VCPU_ID, 1, vmx_pages_gva);
+ run = vcpu_state(vm, VCPU_ID);
+
+ /* Add an extra memory slot for testing dirty logging */
+ vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS,
+ GUEST_TEST_MEM,
+ TEST_MEM_SLOT_INDEX,
+ TEST_MEM_SIZE,
+ KVM_MEM_LOG_DIRTY_PAGES);
+
+ /*
+ * Add an identity map for GVA range [0xc0000000, 0xc0002000). This
+ * affects both L1 and L2. However...
+ */
+ virt_map(vm, GUEST_TEST_MEM, GUEST_TEST_MEM,
+ TEST_MEM_SIZE * 4096, 0);
+
+ /*
+ * ... pages in the L2 GPA range [0xc0001000, 0xc0003000) will map to
+ * 0xc0000000.
+ *
+ * Note that prepare_eptp should be called only L1's GPA map is done,
+ * meaning after the last call to virt_map.
+ */
+ prepare_eptp(vmx, vm, 0);
+ nested_map_memslot(vmx, vm, 0, 0);
+ nested_map(vmx, vm, NESTED_TEST_MEM1, GUEST_TEST_MEM, 4096, 0);
+ nested_map(vmx, vm, NESTED_TEST_MEM2, GUEST_TEST_MEM, 4096, 0);
+
+ bmap = bitmap_alloc(TEST_MEM_SIZE);
+ host_test_mem = addr_gpa2hva(vm, GUEST_TEST_MEM);
+
+ while (!done) {
+ memset(host_test_mem, 0xaa, TEST_MEM_SIZE * 4096);
+ _vcpu_run(vm, VCPU_ID);
+ TEST_ASSERT(run->exit_reason == KVM_EXIT_IO,
+ "Unexpected exit reason: %u (%s),\n",
+ run->exit_reason,
+ exit_reason_str(run->exit_reason));
+
+ switch (get_ucall(vm, VCPU_ID, &uc)) {
+ case UCALL_ABORT:
+ TEST_ASSERT(false, "%s at %s:%d", (const char *)uc.args[0],
+ __FILE__, uc.args[1]);
+ /* NOT REACHED */
+ case UCALL_SYNC:
+ /*
+ * The nested guest wrote at offset 0x1000 in the memslot, but the
+ * dirty bitmap must be filled in according to L1 GPA, not L2.
+ */
+ kvm_vm_get_dirty_log(vm, TEST_MEM_SLOT_INDEX, bmap);
+ if (uc.args[1]) {
+ TEST_ASSERT(test_bit(0, bmap), "Page 0 incorrectly reported clean\n");
+ TEST_ASSERT(host_test_mem[0] == 1, "Page 0 not written by guest\n");
+ } else {
+ TEST_ASSERT(!test_bit(0, bmap), "Page 0 incorrectly reported dirty\n");
+ TEST_ASSERT(host_test_mem[0] == 0xaaaaaaaaaaaaaaaaULL, "Page 0 written by guest\n");
+ }
+
+ TEST_ASSERT(!test_bit(1, bmap), "Page 1 incorrectly reported dirty\n");
+ TEST_ASSERT(host_test_mem[4096 / 8] == 0xaaaaaaaaaaaaaaaaULL, "Page 1 written by guest\n");
+ TEST_ASSERT(!test_bit(2, bmap), "Page 2 incorrectly reported dirty\n");
+ TEST_ASSERT(host_test_mem[8192 / 8] == 0xaaaaaaaaaaaaaaaaULL, "Page 2 written by guest\n");
+ break;
+ case UCALL_DONE:
+ done = true;
+ break;
+ default:
+ TEST_ASSERT(false, "Unknown ucall 0x%x.", uc.cmd);
+ }
+ }
+}
-membarrier_test
+membarrier_test_multi_thread
+membarrier_test_single_thread
# SPDX-License-Identifier: GPL-2.0-only
CFLAGS += -g -I../../../../usr/include/
+LDLIBS += -lpthread
-TEST_GEN_PROGS := membarrier_test
+TEST_GEN_PROGS := membarrier_test_single_thread \
+ membarrier_test_multi_thread
include ../lib.mk
-
+++ /dev/null
-// SPDX-License-Identifier: GPL-2.0
-#define _GNU_SOURCE
-#include <linux/membarrier.h>
-#include <syscall.h>
-#include <stdio.h>
-#include <errno.h>
-#include <string.h>
-
-#include "../kselftest.h"
-
-static int sys_membarrier(int cmd, int flags)
-{
- return syscall(__NR_membarrier, cmd, flags);
-}
-
-static int test_membarrier_cmd_fail(void)
-{
- int cmd = -1, flags = 0;
- const char *test_name = "sys membarrier invalid command";
-
- if (sys_membarrier(cmd, flags) != -1) {
- ksft_exit_fail_msg(
- "%s test: command = %d, flags = %d. Should fail, but passed\n",
- test_name, cmd, flags);
- }
- if (errno != EINVAL) {
- ksft_exit_fail_msg(
- "%s test: flags = %d. Should return (%d: \"%s\"), but returned (%d: \"%s\").\n",
- test_name, flags, EINVAL, strerror(EINVAL),
- errno, strerror(errno));
- }
-
- ksft_test_result_pass(
- "%s test: command = %d, flags = %d, errno = %d. Failed as expected\n",
- test_name, cmd, flags, errno);
- return 0;
-}
-
-static int test_membarrier_flags_fail(void)
-{
- int cmd = MEMBARRIER_CMD_QUERY, flags = 1;
- const char *test_name = "sys membarrier MEMBARRIER_CMD_QUERY invalid flags";
-
- if (sys_membarrier(cmd, flags) != -1) {
- ksft_exit_fail_msg(
- "%s test: flags = %d. Should fail, but passed\n",
- test_name, flags);
- }
- if (errno != EINVAL) {
- ksft_exit_fail_msg(
- "%s test: flags = %d. Should return (%d: \"%s\"), but returned (%d: \"%s\").\n",
- test_name, flags, EINVAL, strerror(EINVAL),
- errno, strerror(errno));
- }
-
- ksft_test_result_pass(
- "%s test: flags = %d, errno = %d. Failed as expected\n",
- test_name, flags, errno);
- return 0;
-}
-
-static int test_membarrier_global_success(void)
-{
- int cmd = MEMBARRIER_CMD_GLOBAL, flags = 0;
- const char *test_name = "sys membarrier MEMBARRIER_CMD_GLOBAL";
-
- if (sys_membarrier(cmd, flags) != 0) {
- ksft_exit_fail_msg(
- "%s test: flags = %d, errno = %d\n",
- test_name, flags, errno);
- }
-
- ksft_test_result_pass(
- "%s test: flags = %d\n", test_name, flags);
- return 0;
-}
-
-static int test_membarrier_private_expedited_fail(void)
-{
- int cmd = MEMBARRIER_CMD_PRIVATE_EXPEDITED, flags = 0;
- const char *test_name = "sys membarrier MEMBARRIER_CMD_PRIVATE_EXPEDITED not registered failure";
-
- if (sys_membarrier(cmd, flags) != -1) {
- ksft_exit_fail_msg(
- "%s test: flags = %d. Should fail, but passed\n",
- test_name, flags);
- }
- if (errno != EPERM) {
- ksft_exit_fail_msg(
- "%s test: flags = %d. Should return (%d: \"%s\"), but returned (%d: \"%s\").\n",
- test_name, flags, EPERM, strerror(EPERM),
- errno, strerror(errno));
- }
-
- ksft_test_result_pass(
- "%s test: flags = %d, errno = %d\n",
- test_name, flags, errno);
- return 0;
-}
-
-static int test_membarrier_register_private_expedited_success(void)
-{
- int cmd = MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, flags = 0;
- const char *test_name = "sys membarrier MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED";
-
- if (sys_membarrier(cmd, flags) != 0) {
- ksft_exit_fail_msg(
- "%s test: flags = %d, errno = %d\n",
- test_name, flags, errno);
- }
-
- ksft_test_result_pass(
- "%s test: flags = %d\n",
- test_name, flags);
- return 0;
-}
-
-static int test_membarrier_private_expedited_success(void)
-{
- int cmd = MEMBARRIER_CMD_PRIVATE_EXPEDITED, flags = 0;
- const char *test_name = "sys membarrier MEMBARRIER_CMD_PRIVATE_EXPEDITED";
-
- if (sys_membarrier(cmd, flags) != 0) {
- ksft_exit_fail_msg(
- "%s test: flags = %d, errno = %d\n",
- test_name, flags, errno);
- }
-
- ksft_test_result_pass(
- "%s test: flags = %d\n",
- test_name, flags);
- return 0;
-}
-
-static int test_membarrier_private_expedited_sync_core_fail(void)
-{
- int cmd = MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE, flags = 0;
- const char *test_name = "sys membarrier MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE not registered failure";
-
- if (sys_membarrier(cmd, flags) != -1) {
- ksft_exit_fail_msg(
- "%s test: flags = %d. Should fail, but passed\n",
- test_name, flags);
- }
- if (errno != EPERM) {
- ksft_exit_fail_msg(
- "%s test: flags = %d. Should return (%d: \"%s\"), but returned (%d: \"%s\").\n",
- test_name, flags, EPERM, strerror(EPERM),
- errno, strerror(errno));
- }
-
- ksft_test_result_pass(
- "%s test: flags = %d, errno = %d\n",
- test_name, flags, errno);
- return 0;
-}
-
-static int test_membarrier_register_private_expedited_sync_core_success(void)
-{
- int cmd = MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE, flags = 0;
- const char *test_name = "sys membarrier MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE";
-
- if (sys_membarrier(cmd, flags) != 0) {
- ksft_exit_fail_msg(
- "%s test: flags = %d, errno = %d\n",
- test_name, flags, errno);
- }
-
- ksft_test_result_pass(
- "%s test: flags = %d\n",
- test_name, flags);
- return 0;
-}
-
-static int test_membarrier_private_expedited_sync_core_success(void)
-{
- int cmd = MEMBARRIER_CMD_PRIVATE_EXPEDITED, flags = 0;
- const char *test_name = "sys membarrier MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE";
-
- if (sys_membarrier(cmd, flags) != 0) {
- ksft_exit_fail_msg(
- "%s test: flags = %d, errno = %d\n",
- test_name, flags, errno);
- }
-
- ksft_test_result_pass(
- "%s test: flags = %d\n",
- test_name, flags);
- return 0;
-}
-
-static int test_membarrier_register_global_expedited_success(void)
-{
- int cmd = MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED, flags = 0;
- const char *test_name = "sys membarrier MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED";
-
- if (sys_membarrier(cmd, flags) != 0) {
- ksft_exit_fail_msg(
- "%s test: flags = %d, errno = %d\n",
- test_name, flags, errno);
- }
-
- ksft_test_result_pass(
- "%s test: flags = %d\n",
- test_name, flags);
- return 0;
-}
-
-static int test_membarrier_global_expedited_success(void)
-{
- int cmd = MEMBARRIER_CMD_GLOBAL_EXPEDITED, flags = 0;
- const char *test_name = "sys membarrier MEMBARRIER_CMD_GLOBAL_EXPEDITED";
-
- if (sys_membarrier(cmd, flags) != 0) {
- ksft_exit_fail_msg(
- "%s test: flags = %d, errno = %d\n",
- test_name, flags, errno);
- }
-
- ksft_test_result_pass(
- "%s test: flags = %d\n",
- test_name, flags);
- return 0;
-}
-
-static int test_membarrier(void)
-{
- int status;
-
- status = test_membarrier_cmd_fail();
- if (status)
- return status;
- status = test_membarrier_flags_fail();
- if (status)
- return status;
- status = test_membarrier_global_success();
- if (status)
- return status;
- status = test_membarrier_private_expedited_fail();
- if (status)
- return status;
- status = test_membarrier_register_private_expedited_success();
- if (status)
- return status;
- status = test_membarrier_private_expedited_success();
- if (status)
- return status;
- status = sys_membarrier(MEMBARRIER_CMD_QUERY, 0);
- if (status < 0) {
- ksft_test_result_fail("sys_membarrier() failed\n");
- return status;
- }
- if (status & MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE) {
- status = test_membarrier_private_expedited_sync_core_fail();
- if (status)
- return status;
- status = test_membarrier_register_private_expedited_sync_core_success();
- if (status)
- return status;
- status = test_membarrier_private_expedited_sync_core_success();
- if (status)
- return status;
- }
- /*
- * It is valid to send a global membarrier from a non-registered
- * process.
- */
- status = test_membarrier_global_expedited_success();
- if (status)
- return status;
- status = test_membarrier_register_global_expedited_success();
- if (status)
- return status;
- status = test_membarrier_global_expedited_success();
- if (status)
- return status;
- return 0;
-}
-
-static int test_membarrier_query(void)
-{
- int flags = 0, ret;
-
- ret = sys_membarrier(MEMBARRIER_CMD_QUERY, flags);
- if (ret < 0) {
- if (errno == ENOSYS) {
- /*
- * It is valid to build a kernel with
- * CONFIG_MEMBARRIER=n. However, this skips the tests.
- */
- ksft_exit_skip(
- "sys membarrier (CONFIG_MEMBARRIER) is disabled.\n");
- }
- ksft_exit_fail_msg("sys_membarrier() failed\n");
- }
- if (!(ret & MEMBARRIER_CMD_GLOBAL))
- ksft_exit_skip(
- "sys_membarrier unsupported: CMD_GLOBAL not found.\n");
-
- ksft_test_result_pass("sys_membarrier available\n");
- return 0;
-}
-
-int main(int argc, char **argv)
-{
- ksft_print_header();
- ksft_set_plan(13);
-
- test_membarrier_query();
- test_membarrier();
-
- return ksft_exit_pass();
-}
--- /dev/null
+/* SPDX-License-Identifier: GPL-2.0 */
+#define _GNU_SOURCE
+#include <linux/membarrier.h>
+#include <syscall.h>
+#include <stdio.h>
+#include <errno.h>
+#include <string.h>
+#include <pthread.h>
+
+#include "../kselftest.h"
+
+static int sys_membarrier(int cmd, int flags)
+{
+ return syscall(__NR_membarrier, cmd, flags);
+}
+
+static int test_membarrier_cmd_fail(void)
+{
+ int cmd = -1, flags = 0;
+ const char *test_name = "sys membarrier invalid command";
+
+ if (sys_membarrier(cmd, flags) != -1) {
+ ksft_exit_fail_msg(
+ "%s test: command = %d, flags = %d. Should fail, but passed\n",
+ test_name, cmd, flags);
+ }
+ if (errno != EINVAL) {
+ ksft_exit_fail_msg(
+ "%s test: flags = %d. Should return (%d: \"%s\"), but returned (%d: \"%s\").\n",
+ test_name, flags, EINVAL, strerror(EINVAL),
+ errno, strerror(errno));
+ }
+
+ ksft_test_result_pass(
+ "%s test: command = %d, flags = %d, errno = %d. Failed as expected\n",
+ test_name, cmd, flags, errno);
+ return 0;
+}
+
+static int test_membarrier_flags_fail(void)
+{
+ int cmd = MEMBARRIER_CMD_QUERY, flags = 1;
+ const char *test_name = "sys membarrier MEMBARRIER_CMD_QUERY invalid flags";
+
+ if (sys_membarrier(cmd, flags) != -1) {
+ ksft_exit_fail_msg(
+ "%s test: flags = %d. Should fail, but passed\n",
+ test_name, flags);
+ }
+ if (errno != EINVAL) {
+ ksft_exit_fail_msg(
+ "%s test: flags = %d. Should return (%d: \"%s\"), but returned (%d: \"%s\").\n",
+ test_name, flags, EINVAL, strerror(EINVAL),
+ errno, strerror(errno));
+ }
+
+ ksft_test_result_pass(
+ "%s test: flags = %d, errno = %d. Failed as expected\n",
+ test_name, flags, errno);
+ return 0;
+}
+
+static int test_membarrier_global_success(void)
+{
+ int cmd = MEMBARRIER_CMD_GLOBAL, flags = 0;
+ const char *test_name = "sys membarrier MEMBARRIER_CMD_GLOBAL";
+
+ if (sys_membarrier(cmd, flags) != 0) {
+ ksft_exit_fail_msg(
+ "%s test: flags = %d, errno = %d\n",
+ test_name, flags, errno);
+ }
+
+ ksft_test_result_pass(
+ "%s test: flags = %d\n", test_name, flags);
+ return 0;
+}
+
+static int test_membarrier_private_expedited_fail(void)
+{
+ int cmd = MEMBARRIER_CMD_PRIVATE_EXPEDITED, flags = 0;
+ const char *test_name = "sys membarrier MEMBARRIER_CMD_PRIVATE_EXPEDITED not registered failure";
+
+ if (sys_membarrier(cmd, flags) != -1) {
+ ksft_exit_fail_msg(
+ "%s test: flags = %d. Should fail, but passed\n",
+ test_name, flags);
+ }
+ if (errno != EPERM) {
+ ksft_exit_fail_msg(
+ "%s test: flags = %d. Should return (%d: \"%s\"), but returned (%d: \"%s\").\n",
+ test_name, flags, EPERM, strerror(EPERM),
+ errno, strerror(errno));
+ }
+
+ ksft_test_result_pass(
+ "%s test: flags = %d, errno = %d\n",
+ test_name, flags, errno);
+ return 0;
+}
+
+static int test_membarrier_register_private_expedited_success(void)
+{
+ int cmd = MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, flags = 0;
+ const char *test_name = "sys membarrier MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED";
+
+ if (sys_membarrier(cmd, flags) != 0) {
+ ksft_exit_fail_msg(
+ "%s test: flags = %d, errno = %d\n",
+ test_name, flags, errno);
+ }
+
+ ksft_test_result_pass(
+ "%s test: flags = %d\n",
+ test_name, flags);
+ return 0;
+}
+
+static int test_membarrier_private_expedited_success(void)
+{
+ int cmd = MEMBARRIER_CMD_PRIVATE_EXPEDITED, flags = 0;
+ const char *test_name = "sys membarrier MEMBARRIER_CMD_PRIVATE_EXPEDITED";
+
+ if (sys_membarrier(cmd, flags) != 0) {
+ ksft_exit_fail_msg(
+ "%s test: flags = %d, errno = %d\n",
+ test_name, flags, errno);
+ }
+
+ ksft_test_result_pass(
+ "%s test: flags = %d\n",
+ test_name, flags);
+ return 0;
+}
+
+static int test_membarrier_private_expedited_sync_core_fail(void)
+{
+ int cmd = MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE, flags = 0;
+ const char *test_name = "sys membarrier MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE not registered failure";
+
+ if (sys_membarrier(cmd, flags) != -1) {
+ ksft_exit_fail_msg(
+ "%s test: flags = %d. Should fail, but passed\n",
+ test_name, flags);
+ }
+ if (errno != EPERM) {
+ ksft_exit_fail_msg(
+ "%s test: flags = %d. Should return (%d: \"%s\"), but returned (%d: \"%s\").\n",
+ test_name, flags, EPERM, strerror(EPERM),
+ errno, strerror(errno));
+ }
+
+ ksft_test_result_pass(
+ "%s test: flags = %d, errno = %d\n",
+ test_name, flags, errno);
+ return 0;
+}
+
+static int test_membarrier_register_private_expedited_sync_core_success(void)
+{
+ int cmd = MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE, flags = 0;
+ const char *test_name = "sys membarrier MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE";
+
+ if (sys_membarrier(cmd, flags) != 0) {
+ ksft_exit_fail_msg(
+ "%s test: flags = %d, errno = %d\n",
+ test_name, flags, errno);
+ }
+
+ ksft_test_result_pass(
+ "%s test: flags = %d\n",
+ test_name, flags);
+ return 0;
+}
+
+static int test_membarrier_private_expedited_sync_core_success(void)
+{
+ int cmd = MEMBARRIER_CMD_PRIVATE_EXPEDITED, flags = 0;
+ const char *test_name = "sys membarrier MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE";
+
+ if (sys_membarrier(cmd, flags) != 0) {
+ ksft_exit_fail_msg(
+ "%s test: flags = %d, errno = %d\n",
+ test_name, flags, errno);
+ }
+
+ ksft_test_result_pass(
+ "%s test: flags = %d\n",
+ test_name, flags);
+ return 0;
+}
+
+static int test_membarrier_register_global_expedited_success(void)
+{
+ int cmd = MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED, flags = 0;
+ const char *test_name = "sys membarrier MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED";
+
+ if (sys_membarrier(cmd, flags) != 0) {
+ ksft_exit_fail_msg(
+ "%s test: flags = %d, errno = %d\n",
+ test_name, flags, errno);
+ }
+
+ ksft_test_result_pass(
+ "%s test: flags = %d\n",
+ test_name, flags);
+ return 0;
+}
+
+static int test_membarrier_global_expedited_success(void)
+{
+ int cmd = MEMBARRIER_CMD_GLOBAL_EXPEDITED, flags = 0;
+ const char *test_name = "sys membarrier MEMBARRIER_CMD_GLOBAL_EXPEDITED";
+
+ if (sys_membarrier(cmd, flags) != 0) {
+ ksft_exit_fail_msg(
+ "%s test: flags = %d, errno = %d\n",
+ test_name, flags, errno);
+ }
+
+ ksft_test_result_pass(
+ "%s test: flags = %d\n",
+ test_name, flags);
+ return 0;
+}
+
+static int test_membarrier_fail(void)
+{
+ int status;
+
+ status = test_membarrier_cmd_fail();
+ if (status)
+ return status;
+ status = test_membarrier_flags_fail();
+ if (status)
+ return status;
+ status = test_membarrier_private_expedited_fail();
+ if (status)
+ return status;
+ status = sys_membarrier(MEMBARRIER_CMD_QUERY, 0);
+ if (status < 0) {
+ ksft_test_result_fail("sys_membarrier() failed\n");
+ return status;
+ }
+ if (status & MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE) {
+ status = test_membarrier_private_expedited_sync_core_fail();
+ if (status)
+ return status;
+ }
+ return 0;
+}
+
+static int test_membarrier_success(void)
+{
+ int status;
+
+ status = test_membarrier_global_success();
+ if (status)
+ return status;
+ status = test_membarrier_register_private_expedited_success();
+ if (status)
+ return status;
+ status = test_membarrier_private_expedited_success();
+ if (status)
+ return status;
+ status = sys_membarrier(MEMBARRIER_CMD_QUERY, 0);
+ if (status < 0) {
+ ksft_test_result_fail("sys_membarrier() failed\n");
+ return status;
+ }
+ if (status & MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE) {
+ status = test_membarrier_register_private_expedited_sync_core_success();
+ if (status)
+ return status;
+ status = test_membarrier_private_expedited_sync_core_success();
+ if (status)
+ return status;
+ }
+ /*
+ * It is valid to send a global membarrier from a non-registered
+ * process.
+ */
+ status = test_membarrier_global_expedited_success();
+ if (status)
+ return status;
+ status = test_membarrier_register_global_expedited_success();
+ if (status)
+ return status;
+ status = test_membarrier_global_expedited_success();
+ if (status)
+ return status;
+ return 0;
+}
+
+static int test_membarrier_query(void)
+{
+ int flags = 0, ret;
+
+ ret = sys_membarrier(MEMBARRIER_CMD_QUERY, flags);
+ if (ret < 0) {
+ if (errno == ENOSYS) {
+ /*
+ * It is valid to build a kernel with
+ * CONFIG_MEMBARRIER=n. However, this skips the tests.
+ */
+ ksft_exit_skip(
+ "sys membarrier (CONFIG_MEMBARRIER) is disabled.\n");
+ }
+ ksft_exit_fail_msg("sys_membarrier() failed\n");
+ }
+ if (!(ret & MEMBARRIER_CMD_GLOBAL))
+ ksft_exit_skip(
+ "sys_membarrier unsupported: CMD_GLOBAL not found.\n");
+
+ ksft_test_result_pass("sys_membarrier available\n");
+ return 0;
+}
--- /dev/null
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+#include <linux/membarrier.h>
+#include <syscall.h>
+#include <stdio.h>
+#include <errno.h>
+#include <string.h>
+#include <pthread.h>
+
+#include "membarrier_test_impl.h"
+
+static int thread_ready, thread_quit;
+static pthread_mutex_t test_membarrier_thread_mutex =
+ PTHREAD_MUTEX_INITIALIZER;
+static pthread_cond_t test_membarrier_thread_cond =
+ PTHREAD_COND_INITIALIZER;
+
+void *test_membarrier_thread(void *arg)
+{
+ pthread_mutex_lock(&test_membarrier_thread_mutex);
+ thread_ready = 1;
+ pthread_cond_broadcast(&test_membarrier_thread_cond);
+ pthread_mutex_unlock(&test_membarrier_thread_mutex);
+
+ pthread_mutex_lock(&test_membarrier_thread_mutex);
+ while (!thread_quit)
+ pthread_cond_wait(&test_membarrier_thread_cond,
+ &test_membarrier_thread_mutex);
+ pthread_mutex_unlock(&test_membarrier_thread_mutex);
+
+ return NULL;
+}
+
+static int test_mt_membarrier(void)
+{
+ int i;
+ pthread_t test_thread;
+
+ pthread_create(&test_thread, NULL,
+ test_membarrier_thread, NULL);
+
+ pthread_mutex_lock(&test_membarrier_thread_mutex);
+ while (!thread_ready)
+ pthread_cond_wait(&test_membarrier_thread_cond,
+ &test_membarrier_thread_mutex);
+ pthread_mutex_unlock(&test_membarrier_thread_mutex);
+
+ test_membarrier_fail();
+
+ test_membarrier_success();
+
+ pthread_mutex_lock(&test_membarrier_thread_mutex);
+ thread_quit = 1;
+ pthread_cond_broadcast(&test_membarrier_thread_cond);
+ pthread_mutex_unlock(&test_membarrier_thread_mutex);
+
+ pthread_join(test_thread, NULL);
+
+ return 0;
+}
+
+int main(int argc, char **argv)
+{
+ ksft_print_header();
+ ksft_set_plan(13);
+
+ test_membarrier_query();
+
+ /* Multi-threaded */
+ test_mt_membarrier();
+
+ return ksft_exit_pass();
+}
--- /dev/null
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+#include <linux/membarrier.h>
+#include <syscall.h>
+#include <stdio.h>
+#include <errno.h>
+#include <string.h>
+#include <pthread.h>
+
+#include "membarrier_test_impl.h"
+
+int main(int argc, char **argv)
+{
+ ksft_print_header();
+ ksft_set_plan(13);
+
+ test_membarrier_query();
+
+ test_membarrier_fail();
+
+ test_membarrier_success();
+
+ return ksft_exit_pass();
+}
ipv6_flowlabel_mgr
so_txtime
tcp_fastopen_backup_key
+nettest
PAUSE_ON_FAIL=no
VERBOSE=0
+which ping6 > /dev/null 2>&1 && ping6=$(which ping6) || ping6=$(which ping)
+
################################################################################
# helpers
local rc
if [ ${ping_sz} != "0" ]; then
- run_cmd ip netns exec h0 ping6 -s ${ping_sz} -c5 -w5 ${dst}
+ run_cmd ip netns exec h0 ${ping6} -s ${ping_sz} -c5 -w5 ${dst}
fi
if [ "$VERBOSE" = "1" ]; then
run_cmd taskset -c ${c} ip netns exec h0 ping -c1 -w1 172.16.10${i}.1
[ $? -ne 0 ] && printf "\nERROR: ping to h${i} failed\n" && ret=1
- run_cmd taskset -c ${c} ip netns exec h0 ping6 -c1 -w1 2001:db8:10${i}::1
+ run_cmd taskset -c ${c} ip netns exec h0 ${ping6} -c1 -w1 2001:db8:10${i}::1
[ $? -ne 0 ] && printf "\nERROR: ping6 to h${i} failed\n" && ret=1
[ $ret -ne 0 ] && break
run_cmd "$IP nexthop add id 104 group 1 dev veth1"
log_test $? 2 "Nexthop group and device"
+ # Tests to ensure that flushing works as expected.
+ run_cmd "$IP nexthop add id 105 blackhole proto 99"
+ run_cmd "$IP nexthop add id 106 blackhole proto 100"
+ run_cmd "$IP nexthop add id 107 blackhole proto 99"
+ run_cmd "$IP nexthop flush proto 99"
+ check_nexthop "id 105" ""
+ check_nexthop "id 106" "id 106 blackhole proto 100"
+ check_nexthop "id 107" ""
+ run_cmd "$IP nexthop flush proto 100"
+ check_nexthop "id 106" ""
+
+ run_cmd "$IP nexthop flush proto 100"
+ log_test $? 0 "Test proto flush"
+
run_cmd "$IP nexthop add id 104 group 1 blackhole"
log_test $? 2 "Nexthop group and blackhole"
ksft_skip=4
# all tests in this script. Can be overridden with -t option
-TESTS="unregister down carrier nexthop ipv6_rt ipv4_rt ipv6_addr_metric ipv4_addr_metric ipv6_route_metrics ipv4_route_metrics ipv4_route_v6_gw rp_filter"
+TESTS="unregister down carrier nexthop suppress ipv6_rt ipv4_rt ipv6_addr_metric ipv4_addr_metric ipv6_route_metrics ipv4_route_metrics ipv4_route_v6_gw rp_filter"
VERBOSE=0
PAUSE_ON_FAIL=no
IP="ip -netns ns1"
NS_EXEC="ip netns exec ns1"
+which ping6 > /dev/null 2>&1 && ping6=$(which ping6) || ping6=$(which ping)
+
log_test()
{
local rc=$1
cleanup
}
+fib_suppress_test()
+{
+ $IP link add dummy1 type dummy
+ $IP link set dummy1 up
+ $IP -6 route add default dev dummy1
+ $IP -6 rule add table main suppress_prefixlength 0
+ ping -f -c 1000 -W 1 1234::1 || true
+ $IP -6 rule del table main suppress_prefixlength 0
+ $IP link del dummy1
+
+ # If we got here without crashing, we're good.
+ return 0
+}
+
################################################################################
# Tests on route add and replace
log_test $rc 0 "Multipath route with mtu metric"
$IP -6 ro add 2001:db8:104::/64 via 2001:db8:101::2 mtu 1300
- run_cmd "ip netns exec ns1 ping6 -w1 -c1 -s 1500 2001:db8:104::1"
+ run_cmd "ip netns exec ns1 ${ping6} -w1 -c1 -s 1500 2001:db8:104::1"
log_test $? 0 "Using route with mtu metric"
run_cmd "$IP -6 ro add 2001:db8:114::/64 via 2001:db8:101::2 congctl lock foo"
fib_carrier_test|carrier) fib_carrier_test;;
fib_rp_filter_test|rp_filter) fib_rp_filter_test;;
fib_nexthop_test|nexthop) fib_nexthop_test;;
+ fib_suppress_test|suppress) fib_suppress_test;;
ipv6_route_test|ipv6_rt) ipv6_route_test;;
ipv4_route_test|ipv4_rt) ipv4_route_test;;
ipv6_addr_metric) ipv6_addr_metric_test;;
.tfail = true,
},
{
- /* send a single MSS: will fail with GSO, because the segment
- * logic in udp4_ufo_fragment demands a gso skb to be > MTU
- */
+ /* send a single MSS: will fall back to no GSO */
.tlen = CONST_MSS_V4,
.gso_len = CONST_MSS_V4,
- .tfail = true,
.r_num_mss = 1,
},
{
.tfail = true,
},
{
- /* send a single 1B MSS: will fail, see single MSS above */
+ /* send a single 1B MSS: will fall back to no GSO */
.tlen = 1,
.gso_len = 1,
- .tfail = true,
.r_num_mss = 1,
},
{
.tfail = true,
},
{
- /* send a single MSS: will fail with GSO, because the segment
- * logic in udp4_ufo_fragment demands a gso skb to be > MTU
- */
+ /* send a single MSS: will fall back to no GSO */
.tlen = CONST_MSS_V6,
.gso_len = CONST_MSS_V6,
- .tfail = true,
.r_num_mss = 1,
},
{
.tfail = true,
},
{
- /* send a single 1B MSS: will fail, see single MSS above */
+ /* send a single 1B MSS: will fall back to no GSO */
.tlen = 1,
.gso_len = 1,
- .tfail = true,
.r_num_mss = 1,
},
{
# SPDX-License-Identifier: GPL-2.0-only
-CFLAGS += -g -I../../../../usr/include/ -lpthread
+CFLAGS += -g -I../../../../usr/include/ -pthread
TEST_GEN_PROGS := pidfd_test pidfd_open_test pidfd_poll_test pidfd_wait
TEST_GEN_PROGS := hugetlb_vs_thp_test subpage_prot prot_sao segv_errors wild_bctr \
large_vm_fork_separation
+TEST_GEN_PROGS_EXTENDED := tlbie_test
TEST_GEN_FILES := tempfile
top_srcdir = ../../../../..
$(OUTPUT)/tempfile:
dd if=/dev/zero of=$@ bs=64k count=1
+$(OUTPUT)/tlbie_test: LDLIBS += -lpthread
--- /dev/null
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright 2019, Nick Piggin, Gautham R. Shenoy, Aneesh Kumar K.V, IBM Corp.
+ */
+
+/*
+ *
+ * Test tlbie/mtpidr race. We have 4 threads doing flush/load/compare/store
+ * sequence in a loop. The same threads also rung a context switch task
+ * that does sched_yield() in loop.
+ *
+ * The snapshot thread mark the mmap area PROT_READ in between, make a copy
+ * and copy it back to the original area. This helps us to detect if any
+ * store continued to happen after we marked the memory PROT_READ.
+ */
+
+#define _GNU_SOURCE
+#include <stdio.h>
+#include <sys/mman.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <sys/ipc.h>
+#include <sys/shm.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include <linux/futex.h>
+#include <unistd.h>
+#include <asm/unistd.h>
+#include <string.h>
+#include <stdlib.h>
+#include <fcntl.h>
+#include <sched.h>
+#include <time.h>
+#include <stdarg.h>
+#include <sched.h>
+#include <pthread.h>
+#include <signal.h>
+#include <sys/prctl.h>
+
+static inline void dcbf(volatile unsigned int *addr)
+{
+ __asm__ __volatile__ ("dcbf %y0; sync" : : "Z"(*(unsigned char *)addr) : "memory");
+}
+
+static void err_msg(char *msg)
+{
+
+ time_t now;
+ time(&now);
+ printf("=================================\n");
+ printf(" Error: %s\n", msg);
+ printf(" %s", ctime(&now));
+ printf("=================================\n");
+ exit(1);
+}
+
+static char *map1;
+static char *map2;
+static pid_t rim_process_pid;
+
+/*
+ * A "rim-sequence" is defined to be the sequence of the following
+ * operations performed on a memory word:
+ * 1) FLUSH the contents of that word.
+ * 2) LOAD the contents of that word.
+ * 3) COMPARE the contents of that word with the content that was
+ * previously stored at that word
+ * 4) STORE new content into that word.
+ *
+ * The threads in this test that perform the rim-sequence are termed
+ * as rim_threads.
+ */
+
+/*
+ * A "corruption" is defined to be the failed COMPARE operation in a
+ * rim-sequence.
+ *
+ * A rim_thread that detects a corruption informs about it to all the
+ * other rim_threads, and the mem_snapshot thread.
+ */
+static volatile unsigned int corruption_found;
+
+/*
+ * This defines the maximum number of rim_threads in this test.
+ *
+ * The THREAD_ID_BITS denote the number of bits required
+ * to represent the thread_ids [0..MAX_THREADS - 1].
+ * We are being a bit paranoid here and set it to 8 bits,
+ * though 6 bits suffice.
+ *
+ */
+#define MAX_THREADS 64
+#define THREAD_ID_BITS 8
+#define THREAD_ID_MASK ((1 << THREAD_ID_BITS) - 1)
+static unsigned int rim_thread_ids[MAX_THREADS];
+static pthread_t rim_threads[MAX_THREADS];
+
+
+/*
+ * Each rim_thread works on an exclusive "chunk" of size
+ * RIM_CHUNK_SIZE.
+ *
+ * The ith rim_thread works on the ith chunk.
+ *
+ * The ith chunk begins at
+ * map1 + (i * RIM_CHUNK_SIZE)
+ */
+#define RIM_CHUNK_SIZE 1024
+#define BITS_PER_BYTE 8
+#define WORD_SIZE (sizeof(unsigned int))
+#define WORD_BITS (WORD_SIZE * BITS_PER_BYTE)
+#define WORDS_PER_CHUNK (RIM_CHUNK_SIZE/WORD_SIZE)
+
+static inline char *compute_chunk_start_addr(unsigned int thread_id)
+{
+ char *chunk_start;
+
+ chunk_start = (char *)((unsigned long)map1 +
+ (thread_id * RIM_CHUNK_SIZE));
+
+ return chunk_start;
+}
+
+/*
+ * The "word-offset" of a word-aligned address inside a chunk, is
+ * defined to be the number of words that precede the address in that
+ * chunk.
+ *
+ * WORD_OFFSET_BITS denote the number of bits required to represent
+ * the word-offsets of all the word-aligned addresses of a chunk.
+ */
+#define WORD_OFFSET_BITS (__builtin_ctz(WORDS_PER_CHUNK))
+#define WORD_OFFSET_MASK ((1 << WORD_OFFSET_BITS) - 1)
+
+static inline unsigned int compute_word_offset(char *start, unsigned int *addr)
+{
+ unsigned int delta_bytes, ret;
+ delta_bytes = (unsigned long)addr - (unsigned long)start;
+
+ ret = delta_bytes/WORD_SIZE;
+
+ return ret;
+}
+
+/*
+ * A "sweep" is defined to be the sequential execution of the
+ * rim-sequence by a rim_thread on its chunk one word at a time,
+ * starting from the first word of its chunk and ending with the last
+ * word of its chunk.
+ *
+ * Each sweep of a rim_thread is uniquely identified by a sweep_id.
+ * SWEEP_ID_BITS denote the number of bits required to represent
+ * the sweep_ids of rim_threads.
+ *
+ * As to why SWEEP_ID_BITS are computed as a function of THREAD_ID_BITS,
+ * WORD_OFFSET_BITS, and WORD_BITS, see the "store-pattern" below.
+ */
+#define SWEEP_ID_BITS (WORD_BITS - (THREAD_ID_BITS + WORD_OFFSET_BITS))
+#define SWEEP_ID_MASK ((1 << SWEEP_ID_BITS) - 1)
+
+/*
+ * A "store-pattern" is the word-pattern that is stored into a word
+ * location in the 4)STORE step of the rim-sequence.
+ *
+ * In the store-pattern, we shall encode:
+ *
+ * - The thread-id of the rim_thread performing the store
+ * (The most significant THREAD_ID_BITS)
+ *
+ * - The word-offset of the address into which the store is being
+ * performed (The next WORD_OFFSET_BITS)
+ *
+ * - The sweep_id of the current sweep in which the store is
+ * being performed. (The lower SWEEP_ID_BITS)
+ *
+ * Store Pattern: 32 bits
+ * |------------------|--------------------|---------------------------------|
+ * | Thread id | Word offset | sweep_id |
+ * |------------------|--------------------|---------------------------------|
+ * THREAD_ID_BITS WORD_OFFSET_BITS SWEEP_ID_BITS
+ *
+ * In the store pattern, the (Thread-id + Word-offset) uniquely identify the
+ * address to which the store is being performed i.e,
+ * address == map1 +
+ * (Thread-id * RIM_CHUNK_SIZE) + (Word-offset * WORD_SIZE)
+ *
+ * And the sweep_id in the store pattern identifies the time when the
+ * store was performed by the rim_thread.
+ *
+ * We shall use this property in the 3)COMPARE step of the
+ * rim-sequence.
+ */
+#define SWEEP_ID_SHIFT 0
+#define WORD_OFFSET_SHIFT (SWEEP_ID_BITS)
+#define THREAD_ID_SHIFT (WORD_OFFSET_BITS + SWEEP_ID_BITS)
+
+/*
+ * Compute the store pattern for a given thread with id @tid, at
+ * location @addr in the sweep identified by @sweep_id
+ */
+static inline unsigned int compute_store_pattern(unsigned int tid,
+ unsigned int *addr,
+ unsigned int sweep_id)
+{
+ unsigned int ret = 0;
+ char *start = compute_chunk_start_addr(tid);
+ unsigned int word_offset = compute_word_offset(start, addr);
+
+ ret += (tid & THREAD_ID_MASK) << THREAD_ID_SHIFT;
+ ret += (word_offset & WORD_OFFSET_MASK) << WORD_OFFSET_SHIFT;
+ ret += (sweep_id & SWEEP_ID_MASK) << SWEEP_ID_SHIFT;
+ return ret;
+}
+
+/* Extract the thread-id from the given store-pattern */
+static inline unsigned int extract_tid(unsigned int pattern)
+{
+ unsigned int ret;
+
+ ret = (pattern >> THREAD_ID_SHIFT) & THREAD_ID_MASK;
+ return ret;
+}
+
+/* Extract the word-offset from the given store-pattern */
+static inline unsigned int extract_word_offset(unsigned int pattern)
+{
+ unsigned int ret;
+
+ ret = (pattern >> WORD_OFFSET_SHIFT) & WORD_OFFSET_MASK;
+
+ return ret;
+}
+
+/* Extract the sweep-id from the given store-pattern */
+static inline unsigned int extract_sweep_id(unsigned int pattern)
+
+{
+ unsigned int ret;
+
+ ret = (pattern >> SWEEP_ID_SHIFT) & SWEEP_ID_MASK;
+
+ return ret;
+}
+
+/************************************************************
+ * *
+ * Logging the output of the verification *
+ * *
+ ************************************************************/
+#define LOGDIR_NAME_SIZE 100
+static char logdir[LOGDIR_NAME_SIZE];
+
+static FILE *fp[MAX_THREADS];
+static const char logfilename[] ="Thread-%02d-Chunk";
+
+static inline void start_verification_log(unsigned int tid,
+ unsigned int *addr,
+ unsigned int cur_sweep_id,
+ unsigned int prev_sweep_id)
+{
+ FILE *f;
+ char logfile[30];
+ char path[LOGDIR_NAME_SIZE + 30];
+ char separator[2] = "/";
+ char *chunk_start = compute_chunk_start_addr(tid);
+ unsigned int size = RIM_CHUNK_SIZE;
+
+ sprintf(logfile, logfilename, tid);
+ strcpy(path, logdir);
+ strcat(path, separator);
+ strcat(path, logfile);
+ f = fopen(path, "w");
+
+ if (!f) {
+ err_msg("Unable to create logfile\n");
+ }
+
+ fp[tid] = f;
+
+ fprintf(f, "----------------------------------------------------------\n");
+ fprintf(f, "PID = %d\n", rim_process_pid);
+ fprintf(f, "Thread id = %02d\n", tid);
+ fprintf(f, "Chunk Start Addr = 0x%016lx\n", (unsigned long)chunk_start);
+ fprintf(f, "Chunk Size = %d\n", size);
+ fprintf(f, "Next Store Addr = 0x%016lx\n", (unsigned long)addr);
+ fprintf(f, "Current sweep-id = 0x%08x\n", cur_sweep_id);
+ fprintf(f, "Previous sweep-id = 0x%08x\n", prev_sweep_id);
+ fprintf(f, "----------------------------------------------------------\n");
+}
+
+static inline void log_anamoly(unsigned int tid, unsigned int *addr,
+ unsigned int expected, unsigned int observed)
+{
+ FILE *f = fp[tid];
+
+ fprintf(f, "Thread %02d: Addr 0x%lx: Expected 0x%x, Observed 0x%x\n",
+ tid, (unsigned long)addr, expected, observed);
+ fprintf(f, "Thread %02d: Expected Thread id = %02d\n", tid, extract_tid(expected));
+ fprintf(f, "Thread %02d: Observed Thread id = %02d\n", tid, extract_tid(observed));
+ fprintf(f, "Thread %02d: Expected Word offset = %03d\n", tid, extract_word_offset(expected));
+ fprintf(f, "Thread %02d: Observed Word offset = %03d\n", tid, extract_word_offset(observed));
+ fprintf(f, "Thread %02d: Expected sweep-id = 0x%x\n", tid, extract_sweep_id(expected));
+ fprintf(f, "Thread %02d: Observed sweep-id = 0x%x\n", tid, extract_sweep_id(observed));
+ fprintf(f, "----------------------------------------------------------\n");
+}
+
+static inline void end_verification_log(unsigned int tid, unsigned nr_anamolies)
+{
+ FILE *f = fp[tid];
+ char logfile[30];
+ char path[LOGDIR_NAME_SIZE + 30];
+ char separator[] = "/";
+
+ fclose(f);
+
+ if (nr_anamolies == 0) {
+ remove(path);
+ return;
+ }
+
+ sprintf(logfile, logfilename, tid);
+ strcpy(path, logdir);
+ strcat(path, separator);
+ strcat(path, logfile);
+
+ printf("Thread %02d chunk has %d corrupted words. For details check %s\n",
+ tid, nr_anamolies, path);
+}
+
+/*
+ * When a COMPARE step of a rim-sequence fails, the rim_thread informs
+ * everyone else via the shared_memory pointed to by
+ * corruption_found variable. On seeing this, every thread verifies the
+ * content of its chunk as follows.
+ *
+ * Suppose a thread identified with @tid was about to store (but not
+ * yet stored) to @next_store_addr in its current sweep identified
+ * @cur_sweep_id. Let @prev_sweep_id indicate the previous sweep_id.
+ *
+ * This implies that for all the addresses @addr < @next_store_addr,
+ * Thread @tid has already performed a store as part of its current
+ * sweep. Hence we expect the content of such @addr to be:
+ * |-------------------------------------------------|
+ * | tid | word_offset(addr) | cur_sweep_id |
+ * |-------------------------------------------------|
+ *
+ * Since Thread @tid is yet to perform stores on address
+ * @next_store_addr and above, we expect the content of such an
+ * address @addr to be:
+ * |-------------------------------------------------|
+ * | tid | word_offset(addr) | prev_sweep_id |
+ * |-------------------------------------------------|
+ *
+ * The verifier function @verify_chunk does this verification and logs
+ * any anamolies that it finds.
+ */
+static void verify_chunk(unsigned int tid, unsigned int *next_store_addr,
+ unsigned int cur_sweep_id,
+ unsigned int prev_sweep_id)
+{
+ unsigned int *iter_ptr;
+ unsigned int size = RIM_CHUNK_SIZE;
+ unsigned int expected;
+ unsigned int observed;
+ char *chunk_start = compute_chunk_start_addr(tid);
+
+ int nr_anamolies = 0;
+
+ start_verification_log(tid, next_store_addr,
+ cur_sweep_id, prev_sweep_id);
+
+ for (iter_ptr = (unsigned int *)chunk_start;
+ (unsigned long)iter_ptr < (unsigned long)chunk_start + size;
+ iter_ptr++) {
+ unsigned int expected_sweep_id;
+
+ if (iter_ptr < next_store_addr) {
+ expected_sweep_id = cur_sweep_id;
+ } else {
+ expected_sweep_id = prev_sweep_id;
+ }
+
+ expected = compute_store_pattern(tid, iter_ptr, expected_sweep_id);
+
+ dcbf((volatile unsigned int*)iter_ptr); //Flush before reading
+ observed = *iter_ptr;
+
+ if (observed != expected) {
+ nr_anamolies++;
+ log_anamoly(tid, iter_ptr, expected, observed);
+ }
+ }
+
+ end_verification_log(tid, nr_anamolies);
+}
+
+static void set_pthread_cpu(pthread_t th, int cpu)
+{
+ cpu_set_t run_cpu_mask;
+ struct sched_param param;
+
+ CPU_ZERO(&run_cpu_mask);
+ CPU_SET(cpu, &run_cpu_mask);
+ pthread_setaffinity_np(th, sizeof(cpu_set_t), &run_cpu_mask);
+
+ param.sched_priority = 1;
+ if (0 && sched_setscheduler(0, SCHED_FIFO, ¶m) == -1) {
+ /* haven't reproduced with this setting, it kills random preemption which may be a factor */
+ fprintf(stderr, "could not set SCHED_FIFO, run as root?\n");
+ }
+}
+
+static void set_mycpu(int cpu)
+{
+ cpu_set_t run_cpu_mask;
+ struct sched_param param;
+
+ CPU_ZERO(&run_cpu_mask);
+ CPU_SET(cpu, &run_cpu_mask);
+ sched_setaffinity(0, sizeof(cpu_set_t), &run_cpu_mask);
+
+ param.sched_priority = 1;
+ if (0 && sched_setscheduler(0, SCHED_FIFO, ¶m) == -1) {
+ fprintf(stderr, "could not set SCHED_FIFO, run as root?\n");
+ }
+}
+
+static volatile int segv_wait;
+
+static void segv_handler(int signo, siginfo_t *info, void *extra)
+{
+ while (segv_wait) {
+ sched_yield();
+ }
+
+}
+
+static void set_segv_handler(void)
+{
+ struct sigaction sa;
+
+ sa.sa_flags = SA_SIGINFO;
+ sa.sa_sigaction = segv_handler;
+
+ if (sigaction(SIGSEGV, &sa, NULL) == -1) {
+ perror("sigaction");
+ exit(EXIT_FAILURE);
+ }
+}
+
+int timeout = 0;
+/*
+ * This function is executed by every rim_thread.
+ *
+ * This function performs sweeps over the exclusive chunks of the
+ * rim_threads executing the rim-sequence one word at a time.
+ */
+static void *rim_fn(void *arg)
+{
+ unsigned int tid = *((unsigned int *)arg);
+
+ int size = RIM_CHUNK_SIZE;
+ char *chunk_start = compute_chunk_start_addr(tid);
+
+ unsigned int prev_sweep_id;
+ unsigned int cur_sweep_id = 0;
+
+ /* word access */
+ unsigned int pattern = cur_sweep_id;
+ unsigned int *pattern_ptr = &pattern;
+ unsigned int *w_ptr, read_data;
+
+ set_segv_handler();
+
+ /*
+ * Let us initialize the chunk:
+ *
+ * Each word-aligned address addr in the chunk,
+ * is initialized to :
+ * |-------------------------------------------------|
+ * | tid | word_offset(addr) | 0 |
+ * |-------------------------------------------------|
+ */
+ for (w_ptr = (unsigned int *)chunk_start;
+ (unsigned long)w_ptr < (unsigned long)(chunk_start) + size;
+ w_ptr++) {
+
+ *pattern_ptr = compute_store_pattern(tid, w_ptr, cur_sweep_id);
+ *w_ptr = *pattern_ptr;
+ }
+
+ while (!corruption_found && !timeout) {
+ prev_sweep_id = cur_sweep_id;
+ cur_sweep_id = cur_sweep_id + 1;
+
+ for (w_ptr = (unsigned int *)chunk_start;
+ (unsigned long)w_ptr < (unsigned long)(chunk_start) + size;
+ w_ptr++) {
+ unsigned int old_pattern;
+
+ /*
+ * Compute the pattern that we would have
+ * stored at this location in the previous
+ * sweep.
+ */
+ old_pattern = compute_store_pattern(tid, w_ptr, prev_sweep_id);
+
+ /*
+ * FLUSH:Ensure that we flush the contents of
+ * the cache before loading
+ */
+ dcbf((volatile unsigned int*)w_ptr); //Flush
+
+ /* LOAD: Read the value */
+ read_data = *w_ptr; //Load
+
+ /*
+ * COMPARE: Is it the same as what we had stored
+ * in the previous sweep ? It better be!
+ */
+ if (read_data != old_pattern) {
+ /* No it isn't! Tell everyone */
+ corruption_found = 1;
+ }
+
+ /*
+ * Before performing a store, let us check if
+ * any rim_thread has found a corruption.
+ */
+ if (corruption_found || timeout) {
+ /*
+ * Yes. Someone (including us!) has found
+ * a corruption :(
+ *
+ * Let us verify that our chunk is
+ * correct.
+ */
+ /* But first, let us allow the dust to settle down! */
+ verify_chunk(tid, w_ptr, cur_sweep_id, prev_sweep_id);
+
+ return 0;
+ }
+
+ /*
+ * Compute the new pattern that we are going
+ * to write to this location
+ */
+ *pattern_ptr = compute_store_pattern(tid, w_ptr, cur_sweep_id);
+
+ /*
+ * STORE: Now let us write this pattern into
+ * the location
+ */
+ *w_ptr = *pattern_ptr;
+ }
+ }
+
+ return NULL;
+}
+
+
+static unsigned long start_cpu = 0;
+static unsigned long nrthreads = 4;
+
+static pthread_t mem_snapshot_thread;
+
+static void *mem_snapshot_fn(void *arg)
+{
+ int page_size = getpagesize();
+ size_t size = page_size;
+ void *tmp = malloc(size);
+
+ while (!corruption_found && !timeout) {
+ /* Stop memory migration once corruption is found */
+ segv_wait = 1;
+
+ mprotect(map1, size, PROT_READ);
+
+ /*
+ * Load from the working alias (map1). Loading from map2
+ * also fails.
+ */
+ memcpy(tmp, map1, size);
+
+ /*
+ * Stores must go via map2 which has write permissions, but
+ * the corrupted data tends to be seen in the snapshot buffer,
+ * so corruption does not appear to be introduced at the
+ * copy-back via map2 alias here.
+ */
+ memcpy(map2, tmp, size);
+ /*
+ * Before releasing other threads, must ensure the copy
+ * back to
+ */
+ asm volatile("sync" ::: "memory");
+ mprotect(map1, size, PROT_READ|PROT_WRITE);
+ asm volatile("sync" ::: "memory");
+ segv_wait = 0;
+
+ usleep(1); /* This value makes a big difference */
+ }
+
+ return 0;
+}
+
+void alrm_sighandler(int sig)
+{
+ timeout = 1;
+}
+
+int main(int argc, char *argv[])
+{
+ int c;
+ int page_size = getpagesize();
+ time_t now;
+ int i, dir_error;
+ pthread_attr_t attr;
+ key_t shm_key = (key_t) getpid();
+ int shmid, run_time = 20 * 60;
+ struct sigaction sa_alrm;
+
+ snprintf(logdir, LOGDIR_NAME_SIZE,
+ "/tmp/logdir-%u", (unsigned int)getpid());
+ while ((c = getopt(argc, argv, "r:hn:l:t:")) != -1) {
+ switch(c) {
+ case 'r':
+ start_cpu = strtoul(optarg, NULL, 10);
+ break;
+ case 'h':
+ printf("%s [-r <start_cpu>] [-n <nrthreads>] [-l <logdir>] [-t <timeout>]\n", argv[0]);
+ exit(0);
+ break;
+ case 'n':
+ nrthreads = strtoul(optarg, NULL, 10);
+ break;
+ case 'l':
+ strncpy(logdir, optarg, LOGDIR_NAME_SIZE);
+ break;
+ case 't':
+ run_time = strtoul(optarg, NULL, 10);
+ break;
+ default:
+ printf("invalid option\n");
+ exit(0);
+ break;
+ }
+ }
+
+ if (nrthreads > MAX_THREADS)
+ nrthreads = MAX_THREADS;
+
+ shmid = shmget(shm_key, page_size, IPC_CREAT|0666);
+ if (shmid < 0) {
+ err_msg("Failed shmget\n");
+ }
+
+ map1 = shmat(shmid, NULL, 0);
+ if (map1 == (void *) -1) {
+ err_msg("Failed shmat");
+ }
+
+ map2 = shmat(shmid, NULL, 0);
+ if (map2 == (void *) -1) {
+ err_msg("Failed shmat");
+ }
+
+ dir_error = mkdir(logdir, 0755);
+
+ if (dir_error) {
+ err_msg("Failed mkdir");
+ }
+
+ printf("start_cpu list:%lu\n", start_cpu);
+ printf("number of worker threads:%lu + 1 snapshot thread\n", nrthreads);
+ printf("Allocated address:0x%016lx + secondary map:0x%016lx\n", (unsigned long)map1, (unsigned long)map2);
+ printf("logdir at : %s\n", logdir);
+ printf("Timeout: %d seconds\n", run_time);
+
+ time(&now);
+ printf("=================================\n");
+ printf(" Starting Test\n");
+ printf(" %s", ctime(&now));
+ printf("=================================\n");
+
+ for (i = 0; i < nrthreads; i++) {
+ if (1 && !fork()) {
+ prctl(PR_SET_PDEATHSIG, SIGKILL);
+ set_mycpu(start_cpu + i);
+ for (;;)
+ sched_yield();
+ exit(0);
+ }
+ }
+
+
+ sa_alrm.sa_handler = &alrm_sighandler;
+ sigemptyset(&sa_alrm.sa_mask);
+ sa_alrm.sa_flags = 0;
+
+ if (sigaction(SIGALRM, &sa_alrm, 0) == -1) {
+ err_msg("Failed signal handler registration\n");
+ }
+
+ alarm(run_time);
+
+ pthread_attr_init(&attr);
+ for (i = 0; i < nrthreads; i++) {
+ rim_thread_ids[i] = i;
+ pthread_create(&rim_threads[i], &attr, rim_fn, &rim_thread_ids[i]);
+ set_pthread_cpu(rim_threads[i], start_cpu + i);
+ }
+
+ pthread_create(&mem_snapshot_thread, &attr, mem_snapshot_fn, map1);
+ set_pthread_cpu(mem_snapshot_thread, start_cpu + i);
+
+
+ pthread_join(mem_snapshot_thread, NULL);
+ for (i = 0; i < nrthreads; i++) {
+ pthread_join(rim_threads[i], NULL);
+ }
+
+ if (!timeout) {
+ time(&now);
+ printf("=================================\n");
+ printf(" Data Corruption Detected\n");
+ printf(" %s", ctime(&now));
+ printf(" See logfiles in %s\n", logdir);
+ printf("=================================\n");
+ return 1;
+ }
+ return 0;
+}
tm-unavailable
tm-trap
tm-sigreturn
+tm-poison
TEST_GEN_PROGS := tm-resched-dscr tm-syscall tm-signal-msr-resv tm-signal-stack \
tm-vmxcopy tm-fork tm-tar tm-tmspr tm-vmx-unavail tm-unavailable tm-trap \
$(SIGNAL_CONTEXT_CHK_TESTS) tm-sigreturn tm-signal-sigreturn-nt \
- tm-signal-context-force-tm
+ tm-signal-context-force-tm tm-poison
top_srcdir = ../../../../..
include ../../lib.mk
--- /dev/null
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright 2019, Gustavo Romero, Michael Neuling, IBM Corp.
+ *
+ * This test will spawn two processes. Both will be attached to the same
+ * CPU (CPU 0). The child will be in a loop writing to FP register f31 and
+ * VMX/VEC/Altivec register vr31 a known value, called poison, calling
+ * sched_yield syscall after to allow the parent to switch on the CPU.
+ * Parent will set f31 and vr31 to 1 and in a loop will check if f31 and
+ * vr31 remain 1 as expected until a given timeout (2m). If the issue is
+ * present child's poison will leak into parent's f31 or vr31 registers,
+ * otherwise, poison will never leak into parent's f31 and vr31 registers.
+ */
+
+#define _GNU_SOURCE
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <inttypes.h>
+#include <sched.h>
+#include <sys/types.h>
+#include <signal.h>
+#include <inttypes.h>
+
+#include "tm.h"
+
+int tm_poison_test(void)
+{
+ int pid;
+ cpu_set_t cpuset;
+ uint64_t poison = 0xdeadbeefc0dec0fe;
+ uint64_t unknown = 0;
+ bool fail_fp = false;
+ bool fail_vr = false;
+
+ SKIP_IF(!have_htm());
+
+ /* Attach both Child and Parent to CPU 0 */
+ CPU_ZERO(&cpuset);
+ CPU_SET(0, &cpuset);
+ sched_setaffinity(0, sizeof(cpuset), &cpuset);
+
+ pid = fork();
+ if (!pid) {
+ /**
+ * child
+ */
+ while (1) {
+ sched_yield();
+ asm (
+ "mtvsrd 31, %[poison];" // f31 = poison
+ "mtvsrd 63, %[poison];" // vr31 = poison
+
+ : : [poison] "r" (poison) : );
+ }
+ }
+
+ /**
+ * parent
+ */
+ asm (
+ /*
+ * Set r3, r4, and f31 to known value 1 before entering
+ * in transaction. They won't be written after that.
+ */
+ " li 3, 0x1 ;"
+ " li 4, 0x1 ;"
+ " mtvsrd 31, 4 ;"
+
+ /*
+ * The Time Base (TB) is a 64-bit counter register that is
+ * independent of the CPU clock and which is incremented
+ * at a frequency of 512000000 Hz, so every 1.953125ns.
+ * So it's necessary 120s/0.000000001953125s = 61440000000
+ * increments to get a 2 minutes timeout. Below we set that
+ * value in r5 and then use r6 to track initial TB value,
+ * updating TB values in r7 at every iteration and comparing it
+ * to r6. When r7 (current) - r6 (initial) > 61440000000 we bail
+ * out since for sure we spent already 2 minutes in the loop.
+ * SPR 268 is the TB register.
+ */
+ " lis 5, 14 ;"
+ " ori 5, 5, 19996 ;"
+ " sldi 5, 5, 16 ;" // r5 = 61440000000
+
+ " mfspr 6, 268 ;" // r6 (TB initial)
+ "1: mfspr 7, 268 ;" // r7 (TB current)
+ " subf 7, 6, 7 ;" // r7 - r6 > 61440000000 ?
+ " cmpd 7, 5 ;"
+ " bgt 3f ;" // yes, exit
+
+ /*
+ * Main loop to check f31
+ */
+ " tbegin. ;" // no, try again
+ " beq 1b ;" // restart if no timeout
+ " mfvsrd 3, 31 ;" // read f31
+ " cmpd 3, 4 ;" // f31 == 1 ?
+ " bne 2f ;" // broken :-(
+ " tabort. 3 ;" // try another transaction
+ "2: tend. ;" // commit transaction
+ "3: mr %[unknown], 3 ;" // record r3
+
+ : [unknown] "=r" (unknown)
+ :
+ : "cr0", "r3", "r4", "r5", "r6", "r7", "vs31"
+
+ );
+
+ /*
+ * On leak 'unknown' will contain 'poison' value from child,
+ * otherwise (no leak) 'unknown' will contain the same value
+ * as r3 before entering in transactional mode, i.e. 0x1.
+ */
+ fail_fp = unknown != 0x1;
+ if (fail_fp)
+ printf("Unknown value %#"PRIx64" leaked into f31!\n", unknown);
+ else
+ printf("Good, no poison or leaked value into FP registers\n");
+
+ asm (
+ /*
+ * Set r3, r4, and vr31 to known value 1 before entering
+ * in transaction. They won't be written after that.
+ */
+ " li 3, 0x1 ;"
+ " li 4, 0x1 ;"
+ " mtvsrd 63, 4 ;"
+
+ " lis 5, 14 ;"
+ " ori 5, 5, 19996 ;"
+ " sldi 5, 5, 16 ;" // r5 = 61440000000
+
+ " mfspr 6, 268 ;" // r6 (TB initial)
+ "1: mfspr 7, 268 ;" // r7 (TB current)
+ " subf 7, 6, 7 ;" // r7 - r6 > 61440000000 ?
+ " cmpd 7, 5 ;"
+ " bgt 3f ;" // yes, exit
+
+ /*
+ * Main loop to check vr31
+ */
+ " tbegin. ;" // no, try again
+ " beq 1b ;" // restart if no timeout
+ " mfvsrd 3, 63 ;" // read vr31
+ " cmpd 3, 4 ;" // vr31 == 1 ?
+ " bne 2f ;" // broken :-(
+ " tabort. 3 ;" // try another transaction
+ "2: tend. ;" // commit transaction
+ "3: mr %[unknown], 3 ;" // record r3
+
+ : [unknown] "=r" (unknown)
+ :
+ : "cr0", "r3", "r4", "r5", "r6", "r7", "vs63"
+
+ );
+
+ /*
+ * On leak 'unknown' will contain 'poison' value from child,
+ * otherwise (no leak) 'unknown' will contain the same value
+ * as r3 before entering in transactional mode, i.e. 0x1.
+ */
+ fail_vr = unknown != 0x1;
+ if (fail_vr)
+ printf("Unknown value %#"PRIx64" leaked into vr31!\n", unknown);
+ else
+ printf("Good, no poison or leaked value into VEC registers\n");
+
+ kill(pid, SIGKILL);
+
+ return (fail_fp | fail_vr);
+}
+
+int main(int argc, char *argv[])
+{
+ /* Test completes in about 4m */
+ test_harness_set_timeout(250);
+ return test_harness(tm_poison_test, "tm_poison_test");
+}
header-test-$(CONFIG_CPU_BIG_ENDIAN) += linux/byteorder/big_endian.h
header-test-$(CONFIG_CPU_LITTLE_ENDIAN) += linux/byteorder/little_endian.h
header-test- += linux/coda.h
-header-test- += linux/coda_psdev.h
header-test- += linux/elfcore.h
header-test- += linux/errqueue.h
header-test- += linux/fsmap.h
header-test- += linux/hdlc/ioctl.h
header-test- += linux/ivtv.h
-header-test- += linux/jffs2.h
header-test- += linux/kexec.h
header-test- += linux/matroxfb.h
-header-test- += linux/netfilter_bridge/ebtables.h
header-test- += linux/netfilter_ipv4/ipt_LOG.h
header-test- += linux/netfilter_ipv6/ip6t_LOG.h
header-test- += linux/nfc.h
header-test- += linux/v4l2-subdev.h
header-test- += linux/videodev2.h
header-test- += linux/vm_sockets.h
-header-test- += scsi/scsi_bsg_fc.h
-header-test- += scsi/scsi_netlink.h
-header-test- += scsi/scsi_netlink_fc.h
header-test- += sound/asequencer.h
header-test- += sound/asoc.h
header-test- += sound/asound.h
header-test- += sound/compress_offload.h
header-test- += sound/emu10k1.h
header-test- += sound/sfnt_info.h
-header-test- += sound/sof/eq.h
-header-test- += sound/sof/fw.h
-header-test- += sound/sof/header.h
-header-test- += sound/sof/manifest.h
-header-test- += sound/sof/trace.h
header-test- += xen/evtchn.h
header-test- += xen/gntdev.h
header-test- += xen/privcmd.h
#endif /* _TRACE_VGIC_H */
#undef TRACE_INCLUDE_PATH
-#define TRACE_INCLUDE_PATH ../../../virt/kvm/arm/vgic
+#define TRACE_INCLUDE_PATH ../../virt/kvm/arm/vgic
#undef TRACE_INCLUDE_FILE
#define TRACE_INCLUDE_FILE trace
stat_data->kvm = kvm;
stat_data->offset = p->offset;
+ stat_data->mode = p->mode ? p->mode : 0644;
kvm->debugfs_stat_data[p - debugfs_entries] = stat_data;
- debugfs_create_file(p->name, 0644, kvm->debugfs_dentry,
+ debugfs_create_file(p->name, stat_data->mode, kvm->debugfs_dentry,
stat_data, stat_fops_per_vm[p->kind]);
}
return 0;
if (!refcount_inc_not_zero(&stat_data->kvm->users_count))
return -ENOENT;
- if (simple_attr_open(inode, file, get, set, fmt)) {
+ if (simple_attr_open(inode, file, get,
+ stat_data->mode & S_IWUGO ? set : NULL,
+ fmt)) {
kvm_put_kvm(stat_data->kvm);
return -ENOMEM;
}
kvm_debugfs_num_entries = 0;
for (p = debugfs_entries; p->name; ++p, kvm_debugfs_num_entries++) {
- debugfs_create_file(p->name, 0644, kvm_debugfs_dir,
+ int mode = p->mode ? p->mode : 0644;
+ debugfs_create_file(p->name, mode, kvm_debugfs_dir,
(void *)(long)p->offset,
stat_fops[p->kind]);
}