Gao Xiang <xiang@kernel.org> <hsiangkao@aol.com>
Gao Xiang <xiang@kernel.org> <hsiangkao@linux.alibaba.com>
Gao Xiang <xiang@kernel.org> <hsiangkao@redhat.com>
+Geliang Tang <geliang.tang@linux.dev> <geliang.tang@suse.com>
+Geliang Tang <geliang.tang@linux.dev> <geliangtang@xiaomi.com>
+Geliang Tang <geliang.tang@linux.dev> <geliangtang@gmail.com>
+Geliang Tang <geliang.tang@linux.dev> <geliangtang@163.com>
Georgi Djakov <djakov@kernel.org> <georgi.djakov@linaro.org>
Gerald Schaefer <gerald.schaefer@linux.ibm.com> <geraldsc@de.ibm.com>
Gerald Schaefer <gerald.schaefer@linux.ibm.com> <gerald.schaefer@de.ibm.com>
Martin Kepplinger <martink@posteo.de> <martin.kepplinger@puri.sm>
Martin Kepplinger <martink@posteo.de> <martin.kepplinger@theobroma-systems.com>
Martyna Szapar-Mudlaw <martyna.szapar-mudlaw@linux.intel.com> <martyna.szapar-mudlaw@intel.com>
-Mathieu Othacehe <m.othacehe@gmail.com>
+Mathieu Othacehe <m.othacehe@gmail.com> <othacehe@gnu.org>
Mat Martineau <martineau@kernel.org> <mathew.j.martineau@linux.intel.com>
Mat Martineau <martineau@kernel.org> <mathewm@codeaurora.org>
Matthew Wilcox <willy@infradead.org> <matthew.r.wilcox@intel.com>
Murali Nalajala <quic_mnalajal@quicinc.com> <mnalajal@codeaurora.org>
Mythri P K <mythripk@ti.com>
Nadia Yvette Chambers <nyc@holomorphy.com> William Lee Irwin III <wli@holomorphy.com>
+Naoya Horiguchi <naoya.horiguchi@nec.com> <n-horiguchi@ah.jp.nec.com>
Nathan Chancellor <nathan@kernel.org> <natechancellor@gmail.com>
Neeraj Upadhyay <quic_neeraju@quicinc.com> <neeraju@codeaurora.org>
Neil Armstrong <neil.armstrong@linaro.org> <narmstrong@baylibre.com>
Wolfram Sang <wsa@kernel.org> <wsa@the-dreams.de>
Yakir Yang <kuankuan.y@gmail.com> <ykk@rock-chips.com>
Yusuke Goda <goda.yusuke@renesas.com>
+Zack Rusin <zack.rusin@broadcom.com> <zackr@vmware.com>
Zhu Yanjun <zyjzyj2000@gmail.com> <yanjunz@nvidia.com>
D: 'Trinity' and similar fuzz testing work.
D: Misc/Other.
+N: Tom Joseph
+E: tjoseph@cadence.com
+D: Cadence PCIe driver
+
N: Martin Josfsson
E: gandalf@wlug.westbo.se
P: 1024D/F6B6D3B1 7610 7CED 5C34 4AA6 DBA2 8BE1 5A6D AF95 F6B6 D3B1
S: San Jose, CA 95123
S: USA
+N: Mike Kravetz
+E: mike.kravetz@oracle.com
+D: Maintenance and development of the hugetlb subsystem
+
N: Andreas S. Krebs
E: akrebs@altavista.net
D: CYPRESS CY82C693 chipset IDE, Digital's PC-Alpha 164SX boards
--- /dev/null
+.. SPDX-License-Identifier: GPL-2.0
+
+Reliability, Availability and Serviceability features
+=====================================================
+
+This documents different aspects of the RAS functionality present in the
+kernel.
+
+Error decoding
+---------------
+
+* x86
+
+Error decoding on AMD systems should be done using the rasdaemon tool:
+https://github.com/mchehab/rasdaemon/
+
+While the daemon is running, it would automatically log and decode
+errors. If not, one can still decode such errors by supplying the
+hardware information from the error::
+
+ $ rasdaemon -p --status <STATUS> --ipid <IPID> --smca
+
+Also, the user can pass particular family and model to decode the error
+string::
+
+ $ rasdaemon -p --status <STATUS> --ipid <IPID> --smca --family <CPU Family> --model <CPU Model> --bank <BANK_NUM>
--- /dev/null
+======================================================================
+Synopsys DesignWare Cores (DWC) PCIe Performance Monitoring Unit (PMU)
+======================================================================
+
+DesignWare Cores (DWC) PCIe PMU
+===============================
+
+The PMU is a PCIe configuration space register block provided by each PCIe Root
+Port in a Vendor-Specific Extended Capability named RAS D.E.S (Debug, Error
+injection, and Statistics).
+
+As the name indicates, the RAS DES capability supports system level
+debugging, AER error injection, and collection of statistics. To facilitate
+collection of statistics, Synopsys DesignWare Cores PCIe controller
+provides the following two features:
+
+- one 64-bit counter for Time Based Analysis (RX/TX data throughput and
+ time spent in each low-power LTSSM state) and
+- one 32-bit counter for Event Counting (error and non-error events for
+ a specified lane)
+
+Note: There is no interrupt for counter overflow.
+
+Time Based Analysis
+-------------------
+
+Using this feature you can obtain information regarding RX/TX data
+throughput and time spent in each low-power LTSSM state by the controller.
+The PMU measures data in two categories:
+
+- Group#0: Percentage of time the controller stays in LTSSM states.
+- Group#1: Amount of data processed (Units of 16 bytes).
+
+Lane Event counters
+-------------------
+
+Using this feature you can obtain Error and Non-Error information in
+specific lane by the controller. The PMU event is selected by all of:
+
+- Group i
+- Event j within the Group i
+- Lane k
+
+Some of the events only exist for specific configurations.
+
+DesignWare Cores (DWC) PCIe PMU Driver
+=======================================
+
+This driver adds PMU devices for each PCIe Root Port named based on the BDF of
+the Root Port. For example,
+
+ 30:03.0 PCI bridge: Device 1ded:8000 (rev 01)
+
+the PMU device name for this Root Port is dwc_rootport_3018.
+
+The DWC PCIe PMU driver registers a perf PMU driver, which provides
+description of available events and configuration options in sysfs, see
+/sys/bus/event_source/devices/dwc_rootport_{bdf}.
+
+The "format" directory describes format of the config fields of the
+perf_event_attr structure. The "events" directory provides configuration
+templates for all documented events. For example,
+"Rx_PCIe_TLP_Data_Payload" is an equivalent of "eventid=0x22,type=0x1".
+
+The "perf list" command shall list the available events from sysfs, e.g.::
+
+ $# perf list | grep dwc_rootport
+ <...>
+ dwc_rootport_3018/Rx_PCIe_TLP_Data_Payload/ [Kernel PMU event]
+ <...>
+ dwc_rootport_3018/rx_memory_read,lane=?/ [Kernel PMU event]
+
+Time Based Analysis Event Usage
+-------------------------------
+
+Example usage of counting PCIe RX TLP data payload (Units of bytes)::
+
+ $# perf stat -a -e dwc_rootport_3018/Rx_PCIe_TLP_Data_Payload/
+
+The average RX/TX bandwidth can be calculated using the following formula:
+
+ PCIe RX Bandwidth = Rx_PCIe_TLP_Data_Payload / Measure_Time_Window
+ PCIe TX Bandwidth = Tx_PCIe_TLP_Data_Payload / Measure_Time_Window
+
+Lane Event Usage
+-------------------------------
+
+Each lane has the same event set and to avoid generating a list of hundreds
+of events, the user need to specify the lane ID explicitly, e.g.::
+
+ $# perf stat -a -e dwc_rootport_3018/rx_memory_read,lane=4/
+
+The driver does not support sampling, therefore "perf record" will not
+work. Per-task (without "-a") perf sessions are not supported.
interrupt is raised. If any other counter overflows, it continues counting, and
no interrupt is raised.
-The "format" directory describes format of the config (event ID) and config1
-(AXI filtering) fields of the perf_event_attr structure, see /sys/bus/event_source/
+The "format" directory describes format of the config (event ID) and config1/2
+(AXI filter setting) fields of the perf_event_attr structure, see /sys/bus/event_source/
devices/imx8_ddr0/format/. The "events" directory describes the events types
hardware supported that can be used with perf tool, see /sys/bus/event_source/
devices/imx8_ddr0/events/. The "caps" directory describes filter features implemented
AXI filtering is only used by CSV modes 0x41 (axid-read) and 0x42 (axid-write)
to count reading or writing matches filter setting. Filter setting is various
from different DRAM controller implementations, which is distinguished by quirks
-in the driver. You also can dump info from userspace, filter in "caps" directory
-indicates whether PMU supports AXI ID filter or not; enhanced_filter indicates
-whether PMU supports enhanced AXI ID filter or not. Value 0 for un-supported, and
-value 1 for supported.
+in the driver. You also can dump info from userspace, "caps" directory show the
+type of AXI filter (filter, enhanced_filter and super_filter). Value 0 for
+un-supported, and value 1 for supported.
-* With DDR_CAP_AXI_ID_FILTER quirk(filter: 1, enhanced_filter: 0).
+* With DDR_CAP_AXI_ID_FILTER quirk(filter: 1, enhanced_filter: 0, super_filter: 0).
Filter is defined with two configuration parts:
--AXI_ID defines AxID matching value.
--AXI_MASKING defines which bits of AxID are meaningful for the matching.
perf stat -a -e imx8_ddr0/axid-read,axi_id=0x12/ cmd, which will monitor ARID=0x12
-* With DDR_CAP_AXI_ID_FILTER_ENHANCED quirk(filter: 1, enhanced_filter: 1).
+* With DDR_CAP_AXI_ID_FILTER_ENHANCED quirk(filter: 1, enhanced_filter: 1, super_filter: 0).
This is an extension to the DDR_CAP_AXI_ID_FILTER quirk which permits
counting the number of bytes (as opposed to the number of bursts) from DDR
read and write transactions concurrently with another set of data counters.
+
+* With DDR_CAP_AXI_ID_PORT_CHANNEL_FILTER quirk(filter: 0, enhanced_filter: 0, super_filter: 1).
+ There is a limitation in previous AXI filter, it cannot filter different IDs
+ at the same time as the filter is shared between counters. This quirk is the
+ extension of AXI ID filter. One improvement is that counter 1-3 has their own
+ filter, means that it supports concurrently filter various IDs. Another
+ improvement is that counter 1-3 supports AXI PORT and CHANNEL selection. Support
+ selecting address channel or data channel.
+
+ Filter is defined with 2 configuration registers per counter 1-3.
+ --Counter N MASK COMP register - including AXI_ID and AXI_MASKING.
+ --Counter N MUX CNTL register - including AXI CHANNEL and AXI PORT.
+
+ - 0: address channel
+ - 1: data channel
+
+ PMU in DDR subsystem, only one single port0 exists, so axi_port is reserved
+ which should be 0.
+
+ .. code-block:: bash
+
+ perf stat -a -e imx8_ddr0/axid-read,axi_mask=0xMMMM,axi_id=0xDDDD,axi_channel=0xH/ cmd
+ perf stat -a -e imx8_ddr0/axid-write,axi_mask=0xMMMM,axi_id=0xDDDD,axi_channel=0xH/ cmd
+
+ .. note::
+
+ axi_channel is inverted in userspace, and it will be reverted in driver
+ automatically. So that users do not need specify axi_channel if want to
+ monitor data channel from DDR transactions, since data channel is more
+ meaningful.
arm_dsu_pmu
thunderx2-pmu
alibaba_pmu
+ dwc_pcie_pmu
nvidia-pmu
meson-ddr-pmu
cxl
or in some very unusual cases, both. If no command line parameters are used,
the kernel will try to use DT for device enumeration; if there is no DT
present, the kernel will try to use ACPI tables, but only if they are present.
-In neither is available, the kernel will not boot. If acpi=force is used
+If neither is available, the kernel will not boot. If acpi=force is used
on the command line, the kernel will attempt to use ACPI tables first, but
fall back to DT if there are no ACPI tables present. The basic idea is that
the kernel will not fail to boot unless it absolutely has no other choice.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/arch/arm64/tests/user-events.c
.. _tools/lib/perf/tests/test-evsel.c:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/lib/perf/tests/test-evsel.c
+
+Event Counting Threshold
+==========================================
+
+Overview
+--------
+
+FEAT_PMUv3_TH (Armv8.8) permits a PMU counter to increment only on
+events whose count meets a specified threshold condition. For example if
+threshold_compare is set to 2 ('Greater than or equal'), and the
+threshold is set to 2, then the PMU counter will now only increment by
+when an event would have previously incremented the PMU counter by 2 or
+more on a single processor cycle.
+
+To increment by 1 after passing the threshold condition instead of the
+number of events on that cycle, add the 'threshold_count' option to the
+commandline.
+
+How-to
+------
+
+These are the parameters for controlling the feature:
+
+.. list-table::
+ :header-rows: 1
+
+ * - Parameter
+ - Description
+ * - threshold
+ - Value to threshold the event by. A value of 0 means that
+ thresholding is disabled and the other parameters have no effect.
+ * - threshold_compare
+ - | Comparison function to use, with the following values supported:
+ |
+ | 0: Not-equal
+ | 1: Equals
+ | 2: Greater-than-or-equal
+ | 3: Less-than
+ * - threshold_count
+ - If this is set, count by 1 after passing the threshold condition
+ instead of the value of the event on this cycle.
+
+The threshold, threshold_compare and threshold_count values can be
+provided per event, for example:
+
+.. code-block:: sh
+
+ perf stat -e stall_slot/threshold=2,threshold_compare=2/ \
+ -e dtlb_walk/threshold=10,threshold_compare=3,threshold_count/
+
+In this example the stall_slot event will count by 2 or more on every
+cycle where 2 or more stalls happen. And dtlb_walk will count by 1 on
+every cycle where the number of dtlb walks were less than 10.
+
+The maximum supported threshold value can be read from the caps of each
+PMU, for example:
+
+.. code-block:: sh
+
+ cat /sys/bus/event_source/devices/armv8_pmuv3/caps/threshold_max
+
+ 0x000000ff
+
+If a value higher than this is given, then opening the event will result
+in an error. The highest possible maximum is 4095, as the config field
+for threshold is limited to 12 bits, and the Perf tool will refuse to
+parse higher values.
+
+If the PMU doesn't support FEAT_PMUv3_TH, then threshold_max will read
+0, and attempting to set a threshold value will also result in an error.
+threshold_max will also read as 0 on aarch32 guests, even if the host
+is running on hardware with the feature.
Introduction
============
-On x86, flags appearing in /proc/cpuinfo have an X86_FEATURE definition
-in arch/x86/include/asm/cpufeatures.h. If the kernel cares about a feature
-or KVM want to expose the feature to a KVM guest, it can and should have
-an X86_FEATURE_* defined. These flags represent hardware features as
-well as software features.
-
-If users want to know if a feature is available on a given system, they
-try to find the flag in /proc/cpuinfo. If a given flag is present, it
-means that the kernel supports it and is currently making it available.
-If such flag represents a hardware feature, it also means that the
-hardware supports it.
-
-If the expected flag does not appear in /proc/cpuinfo, things are murkier.
-Users need to find out the reason why the flag is missing and find the way
-how to enable it, which is not always easy. There are several factors that
-can explain missing flags: the expected feature failed to enable, the feature
-is missing in hardware, platform firmware did not enable it, the feature is
-disabled at build or run time, an old kernel is in use, or the kernel does
-not support the feature and thus has not enabled it. In general, /proc/cpuinfo
-shows features which the kernel supports. For a full list of CPUID flags
-which the CPU supports, use tools/arch/x86/kcpuid.
+The list of feature flags in /proc/cpuinfo is not complete and
+represents an ill-fated attempt from long time ago to put feature flags
+in an easy to find place for userspace.
+
+However, the amount of feature flags is growing by the CPU generation,
+leading to unparseable and unwieldy /proc/cpuinfo.
+
+What is more, those feature flags do not even need to be in that file
+because userspace doesn't care about them - glibc et al already use
+CPUID to find out what the target machine supports and what not.
+
+And even if it doesn't show a particular feature flag - although the CPU
+still does have support for the respective hardware functionality and
+said CPU supports CPUID faulting - userspace can simply probe for the
+feature and figure out if it is supported or not, regardless of whether
+it is being advertised somewhere.
+
+Furthermore, those flag strings become an ABI the moment they appear
+there and maintaining them forever when nothing even uses them is a lot
+of wasted effort.
+
+So, the current use of /proc/cpuinfo is to show features which the
+kernel has *enabled* and *supports*. As in: the CPUID feature flag is
+there, there's an additional setup which the kernel has done while
+booting and the functionality is ready to use. A perfect example for
+that is "user_shstk" where additional code enablement is present in the
+kernel to support shadow stack for user programs.
+
+So, if users want to know if a feature is available on a given system,
+they try to find the flag in /proc/cpuinfo. If a given flag is present,
+it means that
+
+* the kernel knows about the feature enough to have an X86_FEATURE bit
+
+* the kernel supports it and is currently making it available either to
+ userspace or some other part of the kernel
+
+* if the flag represents a hardware feature the hardware supports it.
+
+The absence of a flag in /proc/cpuinfo by itself means almost nothing to
+an end user.
+
+On the one hand, a feature like "vaes" might be fully available to user
+applications on a kernel that has not defined X86_FEATURE_VAES and thus
+there is no "vaes" in /proc/cpuinfo.
+
+On the other hand, a new kernel running on non-VAES hardware would also
+have no "vaes" in /proc/cpuinfo. There's no way for an application or
+user to tell the difference.
+
+The end result is that the flags field in /proc/cpuinfo is marginally
+useful for kernel debugging, but not really for anything else.
+Applications should instead use things like the glibc facilities for
+querying CPU support. Users should rely on tools like
+tools/arch/x86/kcpuid and cpuid(1).
+
+Regarding implementation, flags appearing in /proc/cpuinfo have an
+X86_FEATURE definition in arch/x86/include/asm/cpufeatures.h. These flags
+represent hardware features as well as software features.
+
+If the kernel cares about a feature or KVM want to expose the feature to
+a KVM guest, it should only then expose it to the guest when the guest
+needs to parse /proc/cpuinfo. Which, as mentioned above, is highly
+unlikely. KVM can synthesize the CPUID bit and the KVM guest can simply
+query CPUID and figure out what the hypervisor supports and what not. As
+already stated, /proc/cpuinfo is not a dumping ground for useless
+feature flags.
+
How are feature flags created?
==============================
properties:
compatible:
- enum:
- - fsl,imx23-ocotp
- - fsl,imx28-ocotp
+ items:
+ - enum:
+ - fsl,imx23-ocotp
+ - fsl,imx28-ocotp
+ - const: fsl,ocotp
reg:
maxItems: 1
examples:
- |
ocotp: efuse@8002c000 {
- compatible = "fsl,imx28-ocotp";
+ compatible = "fsl,imx28-ocotp", "fsl,ocotp";
#address-cells = <1>;
#size-cells = <1>;
reg = <0x8002c000 0x2000>;
- fsl,imx8mq-ddr-pmu
- fsl,imx8mp-ddr-pmu
- const: fsl,imx8m-ddr-pmu
+ - items:
+ - const: fsl,imx8dxl-ddr-pmu
+ - const: fsl,imx8-ddr-pmu
reg:
maxItems: 1
encode FILEID_INO32_GEN* file handles.
Filesystems that used the default implementation may use the generic helper
generic_encode_ino32_fh() explicitly.
+
+---
+
+**recommended**
+
+Block device freezing and thawing have been moved to holder operations.
+
+Before this change, get_active_super() would only be able to find the
+superblock of the main block device, i.e., the one stored in sb->s_bdev. Block
+device freezing now works for any block device owned by a given superblock, not
+just the main block device. The get_active_super() helper and bd_fsfreeze_sb
+pointer are gone.
- Physical I2C transaction on bus A, slave address 0x20
- ATR chip detects transaction on address 0x20, finds it in table,
propagates transaction on bus B with address translated to 0x10,
- keeps clock streched on bus A waiting for reply
+ keeps clock stretched on bus A waiting for reply
- Slave X chip (on bus B) detects transaction at its own physical
address 0x10 and replies normally
- ATR chip stops clock stretching and forwards reply on bus A,
:maxdepth: 1
staging/index
+ RAS/ras
Translations
temp_prefered_lft - INTEGER
Preferred lifetime (in seconds) for temporary addresses. If
temp_prefered_lft is less than the minimum required lifetime (typically
- 5 seconds), the preferred lifetime is the minimum required. If
+ 5 seconds), temporary addresses will not be created. If
temp_prefered_lft is greater than temp_valid_lft, the preferred lifetime
is temp_valid_lft.
<mailto:vgo@ratio.de>
0xB1 00-1F PPPoX
<mailto:mostrows@styx.uwaterloo.ca>
+0xB2 00 arch/powerpc/include/uapi/asm/papr-vpd.h powerpc/pseries VPD API
+ <mailto:linuxppc-dev>
+0xB2 01-02 arch/powerpc/include/uapi/asm/papr-sysparm.h powerpc/pseries system parameter API
+ <mailto:linuxppc-dev>
0xB3 00 linux/mmc/ioctl.h
0xB4 00-0F linux/gpio.h <mailto:linux-gpio@vger.kernel.org>
0xB5 00-0F uapi/linux/rpmsg.h <mailto:linux-remoteproc@vger.kernel.org>
M: Hante Meuleman <hante.meuleman@broadcom.com>
L: linux-wireless@vger.kernel.org
L: brcm80211-dev-list.pdl@broadcom.com
-L: SHA-cyfmac-dev-list@infineon.com
S: Supported
F: drivers/net/wireless/broadcom/brcm80211/
M: dm-devel@lists.linux.dev
L: dm-devel@lists.linux.dev
S: Maintained
-W: http://sources.redhat.com/dm
Q: http://patchwork.kernel.org/project/dm-devel/list/
T: git git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git
-T: quilt http://people.redhat.com/agk/patches/linux/editing/
F: Documentation/admin-guide/device-mapper/
F: drivers/md/Kconfig
F: drivers/md/Makefile
F: drivers/gpu/drm/vboxvideo/
DRM DRIVER FOR VMWARE VIRTUAL GPU
-M: Zack Rusin <zackr@vmware.com>
-R: VMware Graphics Reviewers <linux-graphics-maintainer@vmware.com>
+M: Zack Rusin <zack.rusin@broadcom.com>
+R: Broadcom internal kernel review list <bcm-kernel-feedback-list@broadcom.com>
L: dri-devel@lists.freedesktop.org
S: Supported
T: git git://anongit.freedesktop.org/drm/drm-misc
FILESYSTEMS (VFS and infrastructure)
M: Alexander Viro <viro@zeniv.linux.org.uk>
M: Christian Brauner <brauner@kernel.org>
+R: Jan Kara <jack@suse.cz>
L: linux-fsdevel@vger.kernel.org
S: Maintained
F: fs/*
F: fs/fhandle.c
F: include/linux/exportfs.h
+FILESYSTEMS [IDMAPPED MOUNTS]
+M: Christian Brauner <brauner@kernel.org>
+M: Seth Forshee <sforshee@kernel.org>
+L: linux-fsdevel@vger.kernel.org
+S: Maintained
+F: Documentation/filesystems/idmappings.rst
+F: fs/mnt_idmapping.c
+F: include/linux/mnt_idmapping.*
+F: tools/testing/selftests/mount_setattr/
+
FILESYSTEMS [IOMAP]
M: Christian Brauner <brauner@kernel.org>
R: Darrick J. Wong <djwong@kernel.org>
F: fs/iomap/
F: include/linux/iomap.h
+FILESYSTEMS [STACKABLE]
+M: Miklos Szeredi <miklos@szeredi.hu>
+M: Amir Goldstein <amir73il@gmail.com>
+L: linux-fsdevel@vger.kernel.org
+L: linux-unionfs@vger.kernel.org
+S: Maintained
+F: fs/backing-file.c
+F: include/linux/backing-file.h
+
FINTEK F75375S HARDWARE MONITOR AND FAN CONTROLLER DRIVER
M: Riku Voipio <riku.voipio@iki.fi>
L: linux-hwmon@vger.kernel.org
GPIO SUBSYSTEM
M: Linus Walleij <linus.walleij@linaro.org>
M: Bartosz Golaszewski <brgl@bgdev.pl>
-R: Andy Shevchenko <andy@kernel.org>
L: linux-gpio@vger.kernel.org
S: Maintained
T: git git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux.git
-F: Documentation/ABI/obsolete/sysfs-gpio
-F: Documentation/ABI/testing/gpio-cdev
F: Documentation/admin-guide/gpio/
F: Documentation/devicetree/bindings/gpio/
F: Documentation/driver-api/gpio/
F: include/linux/gpio.h
F: include/linux/gpio/
F: include/linux/of_gpio.h
+
+GPIO UAPI
+M: Bartosz Golaszewski <brgl@bgdev.pl>
+R: Kent Gibson <warthog618@gmail.com>
+L: linux-gpio@vger.kernel.org
+S: Maintained
+T: git git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux.git
+F: Documentation/ABI/obsolete/sysfs-gpio
+F: Documentation/ABI/testing/gpio-cdev
+F: drivers/gpio/gpiolib-cdev.c
F: include/uapi/linux/gpio.h
F: tools/gpio/
HISILICON NETWORK SUBSYSTEM 3 DRIVER (HNS3)
M: Yisen Zhuang <yisen.zhuang@huawei.com>
M: Salil Mehta <salil.mehta@huawei.com>
+M: Jijie Shao <shaojijie@huawei.com>
L: netdev@vger.kernel.org
S: Maintained
W: http://www.hisilicon.com
F: drivers/net/ethernet/huawei/hinic/
HUGETLB SUBSYSTEM
-M: Mike Kravetz <mike.kravetz@oracle.com>
M: Muchun Song <muchun.song@linux.dev>
L: linux-mm@kvack.org
S: Maintained
F: drivers/media/platform/st/sti/hva
HWPOISON MEMORY FAILURE HANDLING
-M: Naoya Horiguchi <naoya.horiguchi@nec.com>
-R: Miaohe Lin <linmiaohe@huawei.com>
+M: Miaohe Lin <linmiaohe@huawei.com>
+R: Naoya Horiguchi <naoya.horiguchi@nec.com>
L: linux-mm@kvack.org
S: Maintained
F: mm/hwpoison-inject.c
W: https://github.com/o2genum/ideapad-slidebar
F: drivers/input/misc/ideapad_slidebar.c
-IDMAPPED MOUNTS
-M: Christian Brauner <brauner@kernel.org>
-M: Seth Forshee <sforshee@kernel.org>
-L: linux-fsdevel@vger.kernel.org
-S: Maintained
-T: git git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping.git
-F: Documentation/filesystems/idmappings.rst
-F: include/linux/mnt_idmapping.*
-F: tools/testing/selftests/mount_setattr/
-
IDT VersaClock 5 CLOCK DRIVER
M: Luca Ceresoli <luca@lucaceresoli.net>
S: Maintained
F: drivers/gpio/gpio-sch.c
F: drivers/gpio/gpio-sodaville.c
F: drivers/gpio/gpio-tangier.c
+F: drivers/gpio/gpio-tangier.h
INTEL GVT-g DRIVERS (Intel GPU Virtualization)
M: Zhenyu Wang <zhenyuw@linux.intel.com>
F: scripts/Kbuild*
F: scripts/Makefile*
F: scripts/basic/
+F: scripts/clang-tools/
F: scripts/dummy-tools/
F: scripts/mk*
F: scripts/mod/
F: arch/powerpc/platforms/40x/
F: arch/powerpc/platforms/44x/
-LINUX FOR POWERPC EMBEDDED PPC83XX AND PPC85XX
+LINUX FOR POWERPC EMBEDDED PPC85XX
M: Scott Wood <oss@buserror.net>
L: linuxppc-dev@lists.ozlabs.org
S: Odd fixes
T: git git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux.git
F: Documentation/devicetree/bindings/cache/freescale-l2cache.txt
F: Documentation/devicetree/bindings/powerpc/fsl/
-F: arch/powerpc/platforms/83xx/
F: arch/powerpc/platforms/85xx/
-LINUX FOR POWERPC EMBEDDED PPC8XX
+LINUX FOR POWERPC EMBEDDED PPC8XX AND PPC83XX
M: Christophe Leroy <christophe.leroy@csgroup.eu>
L: linuxppc-dev@lists.ozlabs.org
S: Maintained
F: arch/powerpc/platforms/8xx/
+F: arch/powerpc/platforms/83xx/
LINUX KERNEL DUMP TEST MODULE (LKDTM)
M: Kees Cook <keescook@chromium.org>
F: drivers/net/ethernet/marvell/mvneta.*
MARVELL MVPP2 ETHERNET DRIVER
-M: Marcin Wojtas <mw@semihalf.com>
+M: Marcin Wojtas <marcin.s.wojtas@gmail.com>
M: Russell King <linux@armlinux.org.uk>
L: netdev@vger.kernel.org
S: Maintained
NETWORKING [MPTCP]
M: Matthieu Baerts <matttbe@kernel.org>
M: Mat Martineau <martineau@kernel.org>
+R: Geliang Tang <geliang.tang@linux.dev>
L: netdev@vger.kernel.org
L: mptcp@lists.linux.dev
S: Maintained
F: drivers/bluetooth/btnxpuart.c
NXP C45 TJA11XX PHY DRIVER
-M: Radu Pirea <radu-nicolae.pirea@oss.nxp.com>
+M: Andrei Botila <andrei.botila@oss.nxp.com>
L: netdev@vger.kernel.org
S: Maintained
F: drivers/net/phy/nxp-c45-tja11xx.c
F: drivers/pci/controller/dwc/pcie-armada8k.c
PCI DRIVER FOR CADENCE PCIE IP
-M: Tom Joseph <tjoseph@cadence.com>
L: linux-pci@vger.kernel.org
-S: Maintained
+S: Orphan
F: Documentation/devicetree/bindings/pci/cdns,*
-F: drivers/pci/controller/cadence/
+F: drivers/pci/controller/cadence/*cadence*
PCI DRIVER FOR FREESCALE LAYERSCAPE
M: Minghuan Lian <minghuan.Lian@nxp.com>
S: Maintained
F: drivers/mmc/host/dw_mmc*
+SYNOPSYS DESIGNWARE PCIE PMU DRIVER
+M: Shuai Xue <xueshuai@linux.alibaba.com>
+M: Jing Zhang <renyu.zj@linux.alibaba.com>
+S: Supported
+F: Documentation/admin-guide/perf/dwc_pcie_pmu.rst
+F: drivers/perf/dwc_pcie_pmu.c
+
SYNOPSYS HSDK RESET CONTROLLER DRIVER
M: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
S: Supported
F: include/linux/vmw_vmci*
VMWARE VMMOUSE SUBDRIVER
-M: Zack Rusin <zackr@vmware.com>
-R: VMware Graphics Reviewers <linux-graphics-maintainer@vmware.com>
-R: VMware PV-Drivers Reviewers <pv-drivers@vmware.com>
+M: Zack Rusin <zack.rusin@broadcom.com>
+R: Broadcom internal kernel review list <bcm-kernel-feedback-list@broadcom.com>
L: linux-input@vger.kernel.org
S: Supported
F: drivers/input/mouse/vmmouse.c
VERSION = 6
PATCHLEVEL = 7
SUBLEVEL = 0
-EXTRAVERSION = -rc6
+EXTRAVERSION =
NAME = Hurr durr I'ma ninja sloth
# *DOCUMENTATION*
564 common futex_wake sys_futex_wake
565 common futex_wait sys_futex_wait
566 common futex_requeue sys_futex_requeue
+567 common statmount sys_statmount
+568 common listmount sys_listmount
select OF
select OF_EARLY_FLATTREE
select PCI_SYSCALL if PCI
- select PERF_USE_VMALLOC if ARC_CACHE_VIPT_ALIASING
select HAVE_ARCH_JUMP_LABEL if ISA_ARCV2 && !CPU_ENDIAN_BE32
select TRACE_IRQFLAGS_SUPPORT
Note that Global I/D ENABLE + Per Page DISABLE works but corollary
Global DISABLE + Per Page ENABLE won't work
-config ARC_CACHE_VIPT_ALIASING
- bool "Support VIPT Aliasing D$"
- depends on ARC_HAS_DCACHE && ISA_ARCOMPACT
-
endif #ARC_CACHE
config ARC_HAS_ICCM
#define flush_cache_dup_mm(mm) /* called on fork (VIVT only) */
-#ifndef CONFIG_ARC_CACHE_VIPT_ALIASING
-
#define flush_cache_mm(mm) /* called on munmap/exit */
#define flush_cache_range(mm, u_vstart, u_vend)
#define flush_cache_page(vma, u_vaddr, pfn) /* PF handling/COW-break */
-#else /* VIPT aliasing dcache */
-
-/* To clear out stale userspace mappings */
-void flush_cache_mm(struct mm_struct *mm);
-void flush_cache_range(struct vm_area_struct *vma,
- unsigned long start,unsigned long end);
-void flush_cache_page(struct vm_area_struct *vma,
- unsigned long user_addr, unsigned long page);
-
-/*
- * To make sure that userspace mapping is flushed to memory before
- * get_user_pages() uses a kernel mapping to access the page
- */
-#define ARCH_HAS_FLUSH_ANON_PAGE
-void flush_anon_page(struct vm_area_struct *vma,
- struct page *page, unsigned long u_vaddr);
-
-#endif /* CONFIG_ARC_CACHE_VIPT_ALIASING */
-
/*
* A new pagecache page has PG_arch_1 clear - thus dcache dirty by default
* This works around some PIO based drivers which don't call flush_dcache_page
*/
#define PG_dc_clean PG_arch_1
-#define CACHE_COLORS_NUM 4
-#define CACHE_COLORS_MSK (CACHE_COLORS_NUM - 1)
-#define CACHE_COLOR(addr) (((unsigned long)(addr) >> (PAGE_SHIFT)) & CACHE_COLORS_MSK)
-
-/*
- * Simple wrapper over config option
- * Bootup code ensures that hardware matches kernel configuration
- */
-static inline int cache_is_vipt_aliasing(void)
-{
- return IS_ENABLED(CONFIG_ARC_CACHE_VIPT_ALIASING);
-}
-
-/*
- * checks if two addresses (after page aligning) index into same cache set
- */
-#define addr_not_cache_congruent(addr1, addr2) \
-({ \
- cache_is_vipt_aliasing() ? \
- (CACHE_COLOR(addr1) != CACHE_COLOR(addr2)) : 0; \
-})
-
#define copy_to_user_page(vma, page, vaddr, dst, src, len) \
do { \
memcpy(dst, src, len); \
/* M = 8-1 N = 8 */
.endm
+.macro SAVE_ABI_CALLEE_REGS
+ push r13
+ push r14
+ push r15
+ push r16
+ push r17
+ push r18
+ push r19
+ push r20
+ push r21
+ push r22
+ push r23
+ push r24
+ push r25
+.endm
+
+.macro RESTORE_ABI_CALLEE_REGS
+ pop r25
+ pop r24
+ pop r23
+ pop r22
+ pop r21
+ pop r20
+ pop r19
+ pop r18
+ pop r17
+ pop r16
+ pop r15
+ pop r14
+ pop r13
+.endm
+
#endif
#include <asm/irqflags-compact.h>
#include <asm/thread_info.h> /* For THREAD_SIZE */
+/* Note on the LD/ST addr modes with addr reg wback
+ *
+ * LD.a same as LD.aw
+ *
+ * LD.a reg1, [reg2, x] => Pre Incr
+ * Eff Addr for load = [reg2 + x]
+ *
+ * LD.ab reg1, [reg2, x] => Post Incr
+ * Eff Addr for load = [reg2]
+ */
+
+.macro PUSHAX aux
+ lr r9, [\aux]
+ push r9
+.endm
+
+.macro POPAX aux
+ pop r9
+ sr r9, [\aux]
+.endm
+
+.macro SAVE_R0_TO_R12
+ push r0
+ push r1
+ push r2
+ push r3
+ push r4
+ push r5
+ push r6
+ push r7
+ push r8
+ push r9
+ push r10
+ push r11
+ push r12
+.endm
+
+.macro RESTORE_R12_TO_R0
+ pop r12
+ pop r11
+ pop r10
+ pop r9
+ pop r8
+ pop r7
+ pop r6
+ pop r5
+ pop r4
+ pop r3
+ pop r2
+ pop r1
+ pop r0
+.endm
+
+.macro SAVE_ABI_CALLEE_REGS
+ push r13
+ push r14
+ push r15
+ push r16
+ push r17
+ push r18
+ push r19
+ push r20
+ push r21
+ push r22
+ push r23
+ push r24
+ push r25
+.endm
+
+.macro RESTORE_ABI_CALLEE_REGS
+ pop r25
+ pop r24
+ pop r23
+ pop r22
+ pop r21
+ pop r20
+ pop r19
+ pop r18
+ pop r17
+ pop r16
+ pop r15
+ pop r14
+ pop r13
+.endm
+
/*--------------------------------------------------------------
* Switch to Kernel Mode stack if SP points to User Mode stack
*
SWITCH_TO_KERNEL_STK
- PUSH 0x003\LVL\()abcd /* Dummy ECR */
+ st.a 0x003\LVL\()abcd, [sp, -4] /* Dummy ECR */
sub sp, sp, 8 /* skip orig_r0 (not needed)
skip pt_regs->sp, already saved above */
#include <asm/entry-arcv2.h>
#endif
-/* Note on the LD/ST addr modes with addr reg wback
- *
- * LD.a same as LD.aw
- *
- * LD.a reg1, [reg2, x] => Pre Incr
- * Eff Addr for load = [reg2 + x]
- *
- * LD.ab reg1, [reg2, x] => Post Incr
- * Eff Addr for load = [reg2]
- */
-
-.macro PUSH reg
- st.a \reg, [sp, -4]
-.endm
-
-.macro PUSHAX aux
- lr r9, [\aux]
- PUSH r9
-.endm
-
-.macro POP reg
- ld.ab \reg, [sp, 4]
-.endm
-
-.macro POPAX aux
- POP r9
- sr r9, [\aux]
-.endm
-
-/*--------------------------------------------------------------
- * Helpers to save/restore Scratch Regs:
- * used by Interrupt/Exception Prologue/Epilogue
- *-------------------------------------------------------------*/
-.macro SAVE_R0_TO_R12
- PUSH r0
- PUSH r1
- PUSH r2
- PUSH r3
- PUSH r4
- PUSH r5
- PUSH r6
- PUSH r7
- PUSH r8
- PUSH r9
- PUSH r10
- PUSH r11
- PUSH r12
-.endm
-
-.macro RESTORE_R12_TO_R0
- POP r12
- POP r11
- POP r10
- POP r9
- POP r8
- POP r7
- POP r6
- POP r5
- POP r4
- POP r3
- POP r2
- POP r1
- POP r0
-
-.endm
-
-/*--------------------------------------------------------------
- * Helpers to save/restore callee-saved regs:
- * used by several macros below
- *-------------------------------------------------------------*/
-.macro SAVE_R13_TO_R25
- PUSH r13
- PUSH r14
- PUSH r15
- PUSH r16
- PUSH r17
- PUSH r18
- PUSH r19
- PUSH r20
- PUSH r21
- PUSH r22
- PUSH r23
- PUSH r24
- PUSH r25
-.endm
-
-.macro RESTORE_R25_TO_R13
- POP r25
- POP r24
- POP r23
- POP r22
- POP r21
- POP r20
- POP r19
- POP r18
- POP r17
- POP r16
- POP r15
- POP r14
- POP r13
-.endm
-
/*
* save user mode callee regs as struct callee_regs
* - needed by fork/do_signal/unaligned-access-emulation.
*/
.macro SAVE_CALLEE_SAVED_USER
- SAVE_R13_TO_R25
+ SAVE_ABI_CALLEE_REGS
.endm
/*
* - could have been changed by ptrace tracer or unaligned-access fixup
*/
.macro RESTORE_CALLEE_SAVED_USER
- RESTORE_R25_TO_R13
+ RESTORE_ABI_CALLEE_REGS
.endm
/*
* save/restore kernel mode callee regs at the time of context switch
*/
.macro SAVE_CALLEE_SAVED_KERNEL
- SAVE_R13_TO_R25
+ SAVE_ABI_CALLEE_REGS
.endm
.macro RESTORE_CALLEE_SAVED_KERNEL
- RESTORE_R25_TO_R13
+ RESTORE_ABI_CALLEE_REGS
.endm
/*--------------------------------------------------------------
#include <linux/types.h>
#include <asm-generic/pgtable-nopmd.h>
+/*
+ * Hugetlb definitions.
+ */
+#define HPAGE_SHIFT PMD_SHIFT
+#define HPAGE_SIZE (_AC(1, UL) << HPAGE_SHIFT)
+#define HPAGE_MASK (~(HPAGE_SIZE - 1))
+
static inline pte_t pmd_pte(pmd_t pmd)
{
return __pte(pmd_val(pmd));
ecr_reg ecr;
};
+struct callee_regs {
+ unsigned long r25, r24, r23, r22, r21, r20, r19, r18, r17, r16, r15, r14, r13;
+};
+
#define MAX_REG_OFFSET offsetof(struct pt_regs, ecr)
#else
unsigned long status32;
};
-#define MAX_REG_OFFSET offsetof(struct pt_regs, status32)
-
-#endif
-
-/* Callee saved registers - need to be saved only when you are scheduled out */
-
struct callee_regs {
unsigned long r25, r24, r23, r22, r21, r20, r19, r18, r17, r16, r15, r14, r13;
};
+#define MAX_REG_OFFSET offsetof(struct pt_regs, status32)
+
+#endif
+
#define instruction_pointer(regs) ((regs)->ret)
#define profile_pc(regs) instruction_pointer(regs)
{
int n = 0;
#ifdef CONFIG_ISA_ARCV2
- const char *release, *cpu_nm, *isa_nm = "ARCv2";
+ const char *release = "", *cpu_nm = "HS38", *isa_nm = "ARCv2";
int dual_issue = 0, dual_enb = 0, mpy_opt, present;
int bpu_full, bpu_cache, bpu_pred, bpu_ret_stk;
char mpy_nm[16], lpb_nm[32];
* releases only update it.
*/
- cpu_nm = "HS38";
-
if (info->arcver > 0x50 && info->arcver <= 0x53) {
release = arc_hs_rel[info->arcver - 0x51].str;
} else {
unsigned int sigret_magic;
};
-static int save_arcv2_regs(struct sigcontext *mctx, struct pt_regs *regs)
+static int save_arcv2_regs(struct sigcontext __user *mctx, struct pt_regs *regs)
{
int err = 0;
#ifndef CONFIG_ISA_ARCOMPACT
#else
v2abi.r58 = v2abi.r59 = 0;
#endif
- err = __copy_to_user(&mctx->v2abi, &v2abi, sizeof(v2abi));
+ err = __copy_to_user(&mctx->v2abi, (void const *)&v2abi, sizeof(v2abi));
#endif
return err;
}
-static int restore_arcv2_regs(struct sigcontext *mctx, struct pt_regs *regs)
+static int restore_arcv2_regs(struct sigcontext __user *mctx, struct pt_regs *regs)
{
int err = 0;
#ifndef CONFIG_ISA_ARCOMPACT
p_dc->sz_k = 1 << (dbcr.sz - 1);
n += scnprintf(buf + n, len - n,
- "D-Cache\t\t: %uK, %dway/set, %uB Line, %s%s%s\n",
+ "D-Cache\t\t: %uK, %dway/set, %uB Line, %s%s\n",
p_dc->sz_k, assoc, p_dc->line_len,
vipt ? "VIPT" : "PIPT",
- p_dc->colors > 1 ? " aliasing" : "",
IS_USED_CFG(CONFIG_ARC_HAS_DCACHE));
slc_chk:
* Exported APIs
*/
-/*
- * Handle cache congruency of kernel and userspace mappings of page when kernel
- * writes-to/reads-from
- *
- * The idea is to defer flushing of kernel mapping after a WRITE, possible if:
- * -dcache is NOT aliasing, hence any U/K-mappings of page are congruent
- * -U-mapping doesn't exist yet for page (finalised in update_mmu_cache)
- * -In SMP, if hardware caches are coherent
- *
- * There's a corollary case, where kernel READs from a userspace mapped page.
- * If the U-mapping is not congruent to K-mapping, former needs flushing.
- */
void flush_dcache_folio(struct folio *folio)
{
- struct address_space *mapping;
-
- if (!cache_is_vipt_aliasing()) {
- clear_bit(PG_dc_clean, &folio->flags);
- return;
- }
-
- /* don't handle anon pages here */
- mapping = folio_flush_mapping(folio);
- if (!mapping)
- return;
-
- /*
- * pagecache page, file not yet mapped to userspace
- * Make a note that K-mapping is dirty
- */
- if (!mapping_mapped(mapping)) {
- clear_bit(PG_dc_clean, &folio->flags);
- } else if (folio_mapped(folio)) {
- /* kernel reading from page with U-mapping */
- phys_addr_t paddr = (unsigned long)folio_address(folio);
- unsigned long vaddr = folio_pos(folio);
-
- /*
- * vaddr is not actually the virtual address, but is
- * congruent to every user mapping.
- */
- if (addr_not_cache_congruent(paddr, vaddr))
- __flush_dcache_pages(paddr, vaddr,
- folio_nr_pages(folio));
- }
+ clear_bit(PG_dc_clean, &folio->flags);
+ return;
}
EXPORT_SYMBOL(flush_dcache_folio);
}
-#ifdef CONFIG_ARC_CACHE_VIPT_ALIASING
-
-void flush_cache_mm(struct mm_struct *mm)
-{
- flush_cache_all();
-}
-
-void flush_cache_page(struct vm_area_struct *vma, unsigned long u_vaddr,
- unsigned long pfn)
-{
- phys_addr_t paddr = pfn << PAGE_SHIFT;
-
- u_vaddr &= PAGE_MASK;
-
- __flush_dcache_pages(paddr, u_vaddr, 1);
-
- if (vma->vm_flags & VM_EXEC)
- __inv_icache_pages(paddr, u_vaddr, 1);
-}
-
-void flush_cache_range(struct vm_area_struct *vma, unsigned long start,
- unsigned long end)
-{
- flush_cache_all();
-}
-
-void flush_anon_page(struct vm_area_struct *vma, struct page *page,
- unsigned long u_vaddr)
-{
- /* TBD: do we really need to clear the kernel mapping */
- __flush_dcache_pages((phys_addr_t)page_address(page), u_vaddr, 1);
- __flush_dcache_pages((phys_addr_t)page_address(page),
- (phys_addr_t)page_address(page), 1);
-
-}
-
-#endif
-
void copy_user_highpage(struct page *to, struct page *from,
unsigned long u_vaddr, struct vm_area_struct *vma)
{
struct folio *dst = page_folio(to);
void *kfrom = kmap_atomic(from);
void *kto = kmap_atomic(to);
- int clean_src_k_mappings = 0;
-
- /*
- * If SRC page was already mapped in userspace AND it's U-mapping is
- * not congruent with K-mapping, sync former to physical page so that
- * K-mapping in memcpy below, sees the right data
- *
- * Note that while @u_vaddr refers to DST page's userspace vaddr, it is
- * equally valid for SRC page as well
- *
- * For !VIPT cache, all of this gets compiled out as
- * addr_not_cache_congruent() is 0
- */
- if (page_mapcount(from) && addr_not_cache_congruent(kfrom, u_vaddr)) {
- __flush_dcache_pages((unsigned long)kfrom, u_vaddr, 1);
- clean_src_k_mappings = 1;
- }
copy_page(kto, kfrom);
- /*
- * Mark DST page K-mapping as dirty for a later finalization by
- * update_mmu_cache(). Although the finalization could have been done
- * here as well (given that both vaddr/paddr are available).
- * But update_mmu_cache() already has code to do that for other
- * non copied user pages (e.g. read faults which wire in pagecache page
- * directly).
- */
clear_bit(PG_dc_clean, &dst->flags);
-
- /*
- * if SRC was already usermapped and non-congruent to kernel mapping
- * sync the kernel mapping back to physical page
- */
- if (clean_src_k_mappings) {
- __flush_dcache_pages((unsigned long)kfrom,
- (unsigned long)kfrom, 1);
- } else {
- clear_bit(PG_dc_clean, &src->flags);
- }
+ clear_bit(PG_dc_clean, &src->flags);
kunmap_atomic(kto);
kunmap_atomic(kfrom);
dc->line_len, L1_CACHE_BYTES);
/* check for D-Cache aliasing on ARCompact: ARCv2 has PIPT */
- if (is_isa_arcompact()) {
- int handled = IS_ENABLED(CONFIG_ARC_CACHE_VIPT_ALIASING);
-
- if (dc->colors > 1) {
- if (!handled)
- panic("Enable CONFIG_ARC_CACHE_VIPT_ALIASING\n");
- if (CACHE_COLORS_NUM != dc->colors)
- panic("CACHE_COLORS_NUM not optimized for config\n");
- } else if (handled && dc->colors == 1) {
- panic("Disable CONFIG_ARC_CACHE_VIPT_ALIASING\n");
- }
+ if (is_isa_arcompact() && dc->colors > 1) {
+ panic("Aliasing VIPT cache not supported\n");
}
}
#include <asm/cacheflush.h>
-#define COLOUR_ALIGN(addr, pgoff) \
- ((((addr) + SHMLBA - 1) & ~(SHMLBA - 1)) + \
- (((pgoff) << PAGE_SHIFT) & (SHMLBA - 1)))
-
/*
* Ensure that shared mappings are correctly aligned to
* avoid aliasing issues with VIPT caches.
{
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma;
- int do_align = 0;
- int aliasing = cache_is_vipt_aliasing();
struct vm_unmapped_area_info info;
- /*
- * We only need to do colour alignment if D cache aliases.
- */
- if (aliasing)
- do_align = filp || (flags & MAP_SHARED);
-
/*
* We enforce the MAP_FIXED case.
*/
if (flags & MAP_FIXED) {
- if (aliasing && flags & MAP_SHARED &&
+ if (flags & MAP_SHARED &&
(addr - (pgoff << PAGE_SHIFT)) & (SHMLBA - 1))
return -EINVAL;
return addr;
return -ENOMEM;
if (addr) {
- if (do_align)
- addr = COLOUR_ALIGN(addr, pgoff);
- else
- addr = PAGE_ALIGN(addr);
+ addr = PAGE_ALIGN(addr);
vma = find_vma(mm, addr);
if (TASK_SIZE - len >= addr &&
info.length = len;
info.low_limit = mm->mmap_base;
info.high_limit = TASK_SIZE;
- info.align_mask = do_align ? (PAGE_MASK & (SHMLBA - 1)) : 0;
+ info.align_mask = 0;
info.align_offset = pgoff << PAGE_SHIFT;
return vm_unmapped_area(&info);
}
create_tlb(vma, vaddr, ptep);
- if (page == ZERO_PAGE(0)) {
+ if (page == ZERO_PAGE(0))
return;
- }
/*
- * Exec page : Independent of aliasing/page-color considerations,
- * since icache doesn't snoop dcache on ARC, any dirty
- * K-mapping of a code page needs to be wback+inv so that
- * icache fetch by userspace sees code correctly.
- * !EXEC page: If K-mapping is NOT congruent to U-mapping, flush it
- * so userspace sees the right data.
- * (Avoids the flush for Non-exec + congruent mapping case)
+ * For executable pages, since icache doesn't snoop dcache, any
+ * dirty K-mapping of a code page needs to be wback+inv so that
+ * icache fetch by userspace sees code correctly.
*/
- if ((vma->vm_flags & VM_EXEC) ||
- addr_not_cache_congruent(paddr, vaddr)) {
+ if (vma->vm_flags & VM_EXEC) {
struct folio *folio = page_folio(page);
int dirty = !test_and_set_bit(PG_dc_clean, &folio->flags);
if (dirty) {
<SYSC_IDLE_NO>,
<SYSC_IDLE_SMART>,
<SYSC_IDLE_SMART_WKUP>;
+ ti,sysc-delay-us = <2>;
clocks = <&l3s_clkctrl AM3_L3S_USB_OTG_HS_CLKCTRL 0>;
clock-names = "fck";
#address-cells = <1>;
l3-noc@44000000 {
compatible = "ti,dra7-l3-noc";
- reg = <0x44000000 0x1000>,
+ reg = <0x44000000 0x1000000>,
<0x45000000 0x1000>;
interrupts-extended = <&crossbar_mpu GIC_SPI 4 IRQ_TYPE_LEVEL_HIGH>,
<&wakeupgen GIC_SPI 10 IRQ_TYPE_LEVEL_HIGH>;
static void armv6pmu_enable_event(struct perf_event *event)
{
- unsigned long val, mask, evt, flags;
- struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
+ unsigned long val, mask, evt;
struct hw_perf_event *hwc = &event->hw;
- struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
int idx = hwc->idx;
if (ARMV6_CYCLE_COUNTER == idx) {
* Mask out the current event and set the counter to count the event
* that we're interested in.
*/
- raw_spin_lock_irqsave(&events->pmu_lock, flags);
val = armv6_pmcr_read();
val &= ~mask;
val |= evt;
armv6_pmcr_write(val);
- raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
}
static irqreturn_t
static void armv6pmu_start(struct arm_pmu *cpu_pmu)
{
- unsigned long flags, val;
- struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
+ unsigned long val;
- raw_spin_lock_irqsave(&events->pmu_lock, flags);
val = armv6_pmcr_read();
val |= ARMV6_PMCR_ENABLE;
armv6_pmcr_write(val);
- raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
}
static void armv6pmu_stop(struct arm_pmu *cpu_pmu)
{
- unsigned long flags, val;
- struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
+ unsigned long val;
- raw_spin_lock_irqsave(&events->pmu_lock, flags);
val = armv6_pmcr_read();
val &= ~ARMV6_PMCR_ENABLE;
armv6_pmcr_write(val);
- raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
}
static int
static void armv6pmu_disable_event(struct perf_event *event)
{
- unsigned long val, mask, evt, flags;
- struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
+ unsigned long val, mask, evt;
struct hw_perf_event *hwc = &event->hw;
- struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
int idx = hwc->idx;
if (ARMV6_CYCLE_COUNTER == idx) {
* of ETM bus signal assertion cycles. The external reporting should
* be disabled and so this should never increment.
*/
- raw_spin_lock_irqsave(&events->pmu_lock, flags);
val = armv6_pmcr_read();
val &= ~mask;
val |= evt;
armv6_pmcr_write(val);
- raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
}
static void armv6mpcore_pmu_disable_event(struct perf_event *event)
{
- unsigned long val, mask, flags, evt = 0;
- struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
+ unsigned long val, mask, evt = 0;
struct hw_perf_event *hwc = &event->hw;
- struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
int idx = hwc->idx;
if (ARMV6_CYCLE_COUNTER == idx) {
* Unlike UP ARMv6, we don't have a way of stopping the counters. We
* simply disable the interrupt reporting.
*/
- raw_spin_lock_irqsave(&events->pmu_lock, flags);
val = armv6_pmcr_read();
val &= ~mask;
val |= evt;
armv6_pmcr_write(val);
- raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
}
static int armv6_map_event(struct perf_event *event)
static void armv7pmu_enable_event(struct perf_event *event)
{
- unsigned long flags;
struct hw_perf_event *hwc = &event->hw;
struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
- struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
int idx = hwc->idx;
if (!armv7_pmnc_counter_valid(cpu_pmu, idx)) {
* Enable counter and interrupt, and set the counter to count
* the event that we're interested in.
*/
- raw_spin_lock_irqsave(&events->pmu_lock, flags);
/*
* Disable counter
* Enable counter
*/
armv7_pmnc_enable_counter(idx);
-
- raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
}
static void armv7pmu_disable_event(struct perf_event *event)
{
- unsigned long flags;
struct hw_perf_event *hwc = &event->hw;
struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
- struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
int idx = hwc->idx;
if (!armv7_pmnc_counter_valid(cpu_pmu, idx)) {
/*
* Disable counter and interrupt
*/
- raw_spin_lock_irqsave(&events->pmu_lock, flags);
/*
* Disable counter
* Disable interrupt for this counter
*/
armv7_pmnc_disable_intens(idx);
-
- raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
}
static irqreturn_t armv7pmu_handle_irq(struct arm_pmu *cpu_pmu)
static void armv7pmu_start(struct arm_pmu *cpu_pmu)
{
- unsigned long flags;
- struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
-
- raw_spin_lock_irqsave(&events->pmu_lock, flags);
/* Enable all counters */
armv7_pmnc_write(armv7_pmnc_read() | ARMV7_PMNC_E);
- raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
}
static void armv7pmu_stop(struct arm_pmu *cpu_pmu)
{
- unsigned long flags;
- struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
-
- raw_spin_lock_irqsave(&events->pmu_lock, flags);
/* Disable all counters */
armv7_pmnc_write(armv7_pmnc_read() & ~ARMV7_PMNC_E);
- raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
}
static int armv7pmu_get_event_idx(struct pmu_hw_events *cpuc,
{
unsigned long config_base = 0;
- if (attr->exclude_idle)
- return -EPERM;
+ if (attr->exclude_idle) {
+ pr_debug("ARM performance counters do not support mode exclusion\n");
+ return -EOPNOTSUPP;
+ }
if (attr->exclude_user)
config_base |= ARMV7_EXCLUDE_USER;
if (attr->exclude_kernel)
static void krait_pmu_disable_event(struct perf_event *event)
{
- unsigned long flags;
struct hw_perf_event *hwc = &event->hw;
int idx = hwc->idx;
- struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
- struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
/* Disable counter and interrupt */
- raw_spin_lock_irqsave(&events->pmu_lock, flags);
/* Disable counter */
armv7_pmnc_disable_counter(idx);
/* Disable interrupt for this counter */
armv7_pmnc_disable_intens(idx);
-
- raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
}
static void krait_pmu_enable_event(struct perf_event *event)
{
- unsigned long flags;
struct hw_perf_event *hwc = &event->hw;
int idx = hwc->idx;
- struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
- struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
/*
* Enable counter and interrupt, and set the counter to count
* the event that we're interested in.
*/
- raw_spin_lock_irqsave(&events->pmu_lock, flags);
/* Disable counter */
armv7_pmnc_disable_counter(idx);
/* Enable counter */
armv7_pmnc_enable_counter(idx);
-
- raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
}
static void krait_pmu_reset(void *info)
static void scorpion_pmu_disable_event(struct perf_event *event)
{
- unsigned long flags;
struct hw_perf_event *hwc = &event->hw;
int idx = hwc->idx;
- struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
- struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
/* Disable counter and interrupt */
- raw_spin_lock_irqsave(&events->pmu_lock, flags);
/* Disable counter */
armv7_pmnc_disable_counter(idx);
/* Disable interrupt for this counter */
armv7_pmnc_disable_intens(idx);
-
- raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
}
static void scorpion_pmu_enable_event(struct perf_event *event)
{
- unsigned long flags;
struct hw_perf_event *hwc = &event->hw;
int idx = hwc->idx;
- struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
- struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
/*
* Enable counter and interrupt, and set the counter to count
* the event that we're interested in.
*/
- raw_spin_lock_irqsave(&events->pmu_lock, flags);
/* Disable counter */
armv7_pmnc_disable_counter(idx);
/* Enable counter */
armv7_pmnc_enable_counter(idx);
-
- raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
}
static void scorpion_pmu_reset(void *info)
static void xscale1pmu_enable_event(struct perf_event *event)
{
- unsigned long val, mask, evt, flags;
- struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
+ unsigned long val, mask, evt;
struct hw_perf_event *hwc = &event->hw;
- struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
int idx = hwc->idx;
switch (idx) {
return;
}
- raw_spin_lock_irqsave(&events->pmu_lock, flags);
val = xscale1pmu_read_pmnc();
val &= ~mask;
val |= evt;
xscale1pmu_write_pmnc(val);
- raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
}
static void xscale1pmu_disable_event(struct perf_event *event)
{
- unsigned long val, mask, evt, flags;
- struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
+ unsigned long val, mask, evt;
struct hw_perf_event *hwc = &event->hw;
- struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
int idx = hwc->idx;
switch (idx) {
return;
}
- raw_spin_lock_irqsave(&events->pmu_lock, flags);
val = xscale1pmu_read_pmnc();
val &= ~mask;
val |= evt;
xscale1pmu_write_pmnc(val);
- raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
}
static int
static void xscale1pmu_start(struct arm_pmu *cpu_pmu)
{
- unsigned long flags, val;
- struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
+ unsigned long val;
- raw_spin_lock_irqsave(&events->pmu_lock, flags);
val = xscale1pmu_read_pmnc();
val |= XSCALE_PMU_ENABLE;
xscale1pmu_write_pmnc(val);
- raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
}
static void xscale1pmu_stop(struct arm_pmu *cpu_pmu)
{
- unsigned long flags, val;
- struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
+ unsigned long val;
- raw_spin_lock_irqsave(&events->pmu_lock, flags);
val = xscale1pmu_read_pmnc();
val &= ~XSCALE_PMU_ENABLE;
xscale1pmu_write_pmnc(val);
- raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
}
static inline u64 xscale1pmu_read_counter(struct perf_event *event)
static void xscale2pmu_enable_event(struct perf_event *event)
{
- unsigned long flags, ien, evtsel;
- struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
+ unsigned long ien, evtsel;
struct hw_perf_event *hwc = &event->hw;
- struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
int idx = hwc->idx;
ien = xscale2pmu_read_int_enable();
return;
}
- raw_spin_lock_irqsave(&events->pmu_lock, flags);
xscale2pmu_write_event_select(evtsel);
xscale2pmu_write_int_enable(ien);
- raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
}
static void xscale2pmu_disable_event(struct perf_event *event)
{
- unsigned long flags, ien, evtsel, of_flags;
- struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
+ unsigned long ien, evtsel, of_flags;
struct hw_perf_event *hwc = &event->hw;
- struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
int idx = hwc->idx;
ien = xscale2pmu_read_int_enable();
return;
}
- raw_spin_lock_irqsave(&events->pmu_lock, flags);
xscale2pmu_write_event_select(evtsel);
xscale2pmu_write_int_enable(ien);
xscale2pmu_write_overflow_flags(of_flags);
- raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
}
static int
static void xscale2pmu_start(struct arm_pmu *cpu_pmu)
{
- unsigned long flags, val;
- struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
+ unsigned long val;
- raw_spin_lock_irqsave(&events->pmu_lock, flags);
val = xscale2pmu_read_pmnc() & ~XSCALE_PMU_CNT64;
val |= XSCALE_PMU_ENABLE;
xscale2pmu_write_pmnc(val);
- raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
}
static void xscale2pmu_stop(struct arm_pmu *cpu_pmu)
{
- unsigned long flags, val;
- struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
+ unsigned long val;
- raw_spin_lock_irqsave(&events->pmu_lock, flags);
val = xscale2pmu_read_pmnc();
val &= ~XSCALE_PMU_ENABLE;
xscale2pmu_write_pmnc(val);
- raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
}
static inline u64 xscale2pmu_read_counter(struct perf_event *event)
soc_dev_attr->machine = soc_name;
soc_dev_attr->family = omap_get_family();
+ if (!soc_dev_attr->family) {
+ kfree(soc_dev_attr);
+ return;
+ }
soc_dev_attr->revision = soc_rev;
soc_dev_attr->custom_attr_group = omap_soc_groups[0];
soc_dev = soc_device_register(soc_dev_attr);
if (IS_ERR(soc_dev)) {
+ kfree(soc_dev_attr->family);
kfree(soc_dev_attr);
return;
}
for (i = 0; i < ARRAY_SIZE(sunxi_mc_smp_data); i++) {
ret = of_property_match_string(node, "enable-method",
sunxi_mc_smp_data[i].enable_method);
- if (!ret)
+ if (ret >= 0)
break;
}
- is_a83t = sunxi_mc_smp_data[i].is_a83t;
-
of_node_put(node);
- if (ret)
+ if (ret < 0)
return -ENODEV;
+ is_a83t = sunxi_mc_smp_data[i].is_a83t;
+
if (!sunxi_mc_smp_cpu_table_init())
return -EINVAL;
454 common futex_wake sys_futex_wake
455 common futex_wait sys_futex_wait
456 common futex_requeue sys_futex_requeue
+457 common statmount sys_statmount
+458 common listmount sys_listmount
Don't change if unsure.
config UNMAP_KERNEL_AT_EL0
- bool "Unmap kernel when running in userspace (aka \"KAISER\")" if EXPERT
+ bool "Unmap kernel when running in userspace (KPTI)" if EXPERT
default y
help
Speculation attacks against some high-performance processors can
endif
vdso-install-y += arch/arm64/kernel/vdso/vdso.so.dbg
-vdso-install-$(CONFIG_COMPAT_VDSO) += arch/arm64/kernel/vdso32/vdso.so.dbg:vdso32.so
+vdso-install-$(CONFIG_COMPAT_VDSO) += arch/arm64/kernel/vdso32/vdso32.so.dbg
include $(srctree)/scripts/Makefile.defconf
EFI_ZBOOT_MACH_TYPE := ARM64
EFI_ZBOOT_FORWARD_CFI := $(CONFIG_ARM64_BTI_KERNEL)
-EFI_ZBOOT_OBJCOPY_FLAGS = --add-symbol zboot_code_size=0x$(shell \
+EFI_ZBOOT_OBJCOPY_FLAGS = --add-symbol zboot_code_size=0x$$( \
$(NM) vmlinux|grep _kernel_codesize|cut -d' ' -f1)
include $(srctree)/drivers/firmware/efi/libstub/Makefile.zboot
&emac0 {
pinctrl-names = "default";
pinctrl-0 = <&ext_rgmii_pins>;
- phy-mode = "rgmii";
phy-handle = <&ext_rgmii_phy>;
- allwinner,rx-delay-ps = <3100>;
- allwinner,tx-delay-ps = <700>;
status = "okay";
};
};
&emac0 {
+ allwinner,rx-delay-ps = <3100>;
+ allwinner,tx-delay-ps = <700>;
+ phy-mode = "rgmii";
phy-supply = <®_dcdce>;
};
};
&emac0 {
+ allwinner,tx-delay-ps = <700>;
+ phy-mode = "rgmii-rxid";
phy-supply = <®_dldo1>;
};
mt6360: pmic@34 {
compatible = "mediatek,mt6360";
reg = <0x34>;
+ interrupt-parent = <&pio>;
interrupts = <128 IRQ_TYPE_EDGE_FALLING>;
interrupt-names = "IRQB";
interrupt-controller;
# $3 - kernel map file
# $4 - default install path (blank if root directory)
-if [ "$(basename $2)" = "Image.gz" ]; then
+if [ "$(basename $2)" = "Image.gz" ] || [ "$(basename $2)" = "vmlinuz.efi" ]
+then
# Compressed install
echo "Installing compressed kernel"
base=vmlinuz
#ifndef __ASM_ASSEMBLER_H
#define __ASM_ASSEMBLER_H
-#include <asm-generic/export.h>
+#include <linux/export.h>
#include <asm/alternative.h>
#include <asm/asm-bug.h>
#define CTR_L1IP(ctr) SYS_FIELD_GET(CTR_EL0, L1Ip, ctr)
#define ICACHEF_ALIASING 0
-#define ICACHEF_VPIPT 1
extern unsigned long __icache_flags;
/*
return test_bit(ICACHEF_ALIASING, &__icache_flags);
}
-static __always_inline int icache_is_vpipt(void)
-{
- return test_bit(ICACHEF_VPIPT, &__icache_flags);
-}
-
static inline u32 cache_type_cwg(void)
{
return SYS_FIELD_GET(CTR_EL0, CWG, read_cpuid_cachetype());
return val >= ID_AA64PFR1_EL1_MTE_MTE2;
}
+void __init setup_boot_cpu_features(void);
void __init setup_system_features(void);
void __init setup_user_features(void);
return alternative_has_cap_unlikely(ARM64_HAS_TLB_RANGE);
}
+static inline bool system_supports_lpa2(void)
+{
+ return cpus_have_final_cap(ARM64_HAS_LPA2);
+}
+
int do_emulate_mrs(struct pt_regs *regs, u32 sys_reg, u32 rt);
bool try_emulate_mrs(struct pt_regs *regs, u32 isn);
| (\nx << 5)
.endm
-/*
- * Zero the entire ZA array
- * ZERO ZA
- */
-.macro zero_za
- .inst 0xc00800ff
-.endm
-
.macro __for from:req, to:req
.if (\from) == (\to)
_for__body %\from
/*
- * If KASLR is enabled, then an offset K is added to the kernel address
- * space. The bottom 21 bits of this offset are zero to guarantee 2MB
- * alignment for PA and VA.
- *
- * For each pagetable level of the swapper, we know that the shift will
- * be larger than 21 (for the 4KB granule case we use section maps thus
- * the smallest shift is actually 30) thus there is the possibility that
- * KASLR can increase the number of pagetable entries by 1, so we make
- * room for this extra entry.
- *
- * Note KASLR cannot increase the number of required entries for a level
- * by more than one because it increments both the virtual start and end
- * addresses equally (the extra entry comes from the case where the end
- * address is just pushed over a boundary and the start address isn't).
+ * A relocatable kernel may execute from an address that differs from the one at
+ * which it was linked. In the worst case, its runtime placement may intersect
+ * with two adjacent PGDIR entries, which means that an additional page table
+ * may be needed at each subordinate level.
*/
-
-#ifdef CONFIG_RANDOMIZE_BASE
-#define EARLY_KASLR (1)
-#else
-#define EARLY_KASLR (0)
-#endif
+#define EXTRA_PAGE __is_defined(CONFIG_RELOCATABLE)
#define SPAN_NR_ENTRIES(vstart, vend, shift) \
((((vend) - 1) >> (shift)) - ((vstart) >> (shift)) + 1)
+ EARLY_PGDS((vstart), (vend), add) /* each PGDIR needs a next level page table */ \
+ EARLY_PUDS((vstart), (vend), add) /* each PUD needs a next level page table */ \
+ EARLY_PMDS((vstart), (vend), add)) /* each PMD needs a next level page table */
-#define INIT_DIR_SIZE (PAGE_SIZE * EARLY_PAGES(KIMAGE_VADDR, _end, EARLY_KASLR))
+#define INIT_DIR_SIZE (PAGE_SIZE * EARLY_PAGES(KIMAGE_VADDR, _end, EXTRA_PAGE))
/* the initial ID map may need two extra pages if it needs to be extended */
#if VA_BITS < 48
static inline void __invalidate_icache_guest_page(void *va, size_t size)
{
- /*
- * VPIPT I-cache maintenance must be done from EL2. See comment in the
- * nVHE flavor of __kvm_tlb_flush_vmid_ipa().
- */
- if (icache_is_vpipt() && read_sysreg(CurrentEL) != CurrentEL_EL2)
- return;
-
/*
* Blow the whole I-cache if it is aliasing (i.e. VIPT) or the
* invalidation range exceeds our arbitrary limit on invadations by
#define KVM_PGTABLE_MIN_BLOCK_LEVEL 2U
#endif
+#define kvm_lpa2_is_enabled() false
+
static inline u64 kvm_get_parange(u64 mmfr0)
{
u64 parange = cpuid_feature_extract_unsigned_field(mmfr0,
#include <linux/types.h>
#include <asm/boot.h>
#include <asm/bug.h>
+#include <asm/sections.h>
#if VA_BITS > 48
extern u64 vabits_actual;
/* PHYS_OFFSET - the physical address of the start of memory. */
#define PHYS_OFFSET ({ VM_BUG_ON(memstart_addr & 1); memstart_addr; })
-/* the virtual base of the kernel image */
-extern u64 kimage_vaddr;
-
/* the offset between the kernel virtual and physical mappings */
extern u64 kimage_voffset;
static inline unsigned long kaslr_offset(void)
{
- return kimage_vaddr - KIMAGE_VADDR;
+ return (u64)&_text - KIMAGE_VADDR;
}
#ifdef CONFIG_RANDOMIZE_BASE
#define INIT_MEMBLOCK_MEMORY_REGIONS (INIT_MEMBLOCK_REGIONS * 8)
#endif
-#include <asm-generic/memory_model.h>
#endif /* __ASM_MEMORY_H */
#define PTE_MAYBE_NG (arm64_use_ng_mappings ? PTE_NG : 0)
#define PMD_MAYBE_NG (arm64_use_ng_mappings ? PMD_SECT_NG : 0)
+#define lpa2_is_enabled() false
+
/*
* If we have userspace only BTI we don't want to mark kernel pages
* guarded even if the system does support BTI.
unsigned long fault_address; /* fault info */
unsigned long fault_code; /* ESR_EL1 value */
struct debug_info debug; /* debugging */
+
+ struct user_fpsimd_state kernel_fpsimd_state;
+ unsigned int kernel_fpsimd_cpu;
#ifdef CONFIG_ARM64_PTR_AUTH
struct ptrauth_keys_user keys_user;
#ifdef CONFIG_ARM64_PTR_AUTH_KERNEL
#include <linux/preempt.h>
#include <linux/types.h>
-DECLARE_PER_CPU(bool, fpsimd_context_busy);
-
#ifdef CONFIG_KERNEL_MODE_NEON
/*
/*
* We must make sure that the SVE has been initialized properly
* before using the SIMD in kernel.
- * fpsimd_context_busy is only set while preemption is disabled,
- * and is clear whenever preemption is enabled. Since
- * this_cpu_read() is atomic w.r.t. preemption, fpsimd_context_busy
- * cannot change under our feet -- if it's set we cannot be
- * migrated, and if it's clear we cannot be migrated to a CPU
- * where it is set.
*/
return !WARN_ON(!system_capabilities_finalized()) &&
system_supports_fpsimd() &&
- !in_hardirq() && !irqs_disabled() && !in_nmi() &&
- !this_cpu_read(fpsimd_context_busy);
+ !in_hardirq() && !irqs_disabled() && !in_nmi();
}
#else /* ! CONFIG_KERNEL_MODE_NEON */
#ifndef __ASM_STACKTRACE_COMMON_H
#define __ASM_STACKTRACE_COMMON_H
-#include <linux/kprobes.h>
#include <linux/types.h>
struct stack_info {
* @fp: The fp value in the frame record (or the real fp)
* @pc: The lr value in the frame record (or the real lr)
*
- * @kr_cur: When KRETPROBES is selected, holds the kretprobe instance
- * associated with the most recently encountered replacement lr
- * value.
- *
- * @task: The task being unwound.
- *
* @stack: The stack currently being unwound.
* @stacks: An array of stacks which can be unwound.
* @nr_stacks: The number of stacks in @stacks.
struct unwind_state {
unsigned long fp;
unsigned long pc;
-#ifdef CONFIG_KRETPROBES
- struct llist_node *kr_cur;
-#endif
- struct task_struct *task;
struct stack_info stack;
struct stack_info *stacks;
return true;
}
-static inline void unwind_init_common(struct unwind_state *state,
- struct task_struct *task)
+static inline void unwind_init_common(struct unwind_state *state)
{
- state->task = task;
-#ifdef CONFIG_KRETPROBES
- state->kr_cur = NULL;
-#endif
-
state->stack = stackinfo_get_unknown();
}
unsigned long fp,
unsigned long pc)
{
- unwind_init_common(state, NULL);
+ unwind_init_common(state);
state->fp = fp;
state->pc = pc;
return sys_ni_syscall(); \
}
-#define COMPAT_SYS_NI(name) \
- SYSCALL_ALIAS(__arm64_compat_sys_##name, sys_ni_posix_timers);
-
#endif /* CONFIG_COMPAT */
#define __SYSCALL_DEFINEx(x, name, ...) \
}
asmlinkage long __arm64_sys_ni_syscall(const struct pt_regs *__unused);
-#define SYS_NI(name) SYSCALL_ALIAS(__arm64_sys_##name, sys_ni_posix_timers);
#endif /* __ASM_SYSCALL_WRAPPER_H */
#define OP_AT_S1E0W sys_insn(AT_Op0, 0, AT_CRn, 8, 3)
#define OP_AT_S1E1RP sys_insn(AT_Op0, 0, AT_CRn, 9, 0)
#define OP_AT_S1E1WP sys_insn(AT_Op0, 0, AT_CRn, 9, 1)
+#define OP_AT_S1E1A sys_insn(AT_Op0, 0, AT_CRn, 9, 2)
#define OP_AT_S1E2R sys_insn(AT_Op0, 4, AT_CRn, 8, 0)
#define OP_AT_S1E2W sys_insn(AT_Op0, 4, AT_CRn, 8, 1)
#define OP_AT_S12E1R sys_insn(AT_Op0, 4, AT_CRn, 8, 4)
#define OP_TLBI_VMALLS12E1NXS sys_insn(1, 4, 9, 7, 6)
/* Misc instructions */
+#define OP_GCSPUSHX sys_insn(1, 0, 7, 7, 4)
+#define OP_GCSPOPCX sys_insn(1, 0, 7, 7, 5)
+#define OP_GCSPOPX sys_insn(1, 0, 7, 7, 6)
+#define OP_GCSPUSHM sys_insn(1, 3, 7, 7, 0)
+
#define OP_BRB_IALL sys_insn(1, 1, 7, 2, 4)
#define OP_BRB_INJ sys_insn(1, 1, 7, 2, 5)
#define OP_CFP_RCTX sys_insn(1, 3, 7, 3, 4)
#define OP_DVP_RCTX sys_insn(1, 3, 7, 3, 5)
+#define OP_COSP_RCTX sys_insn(1, 3, 7, 3, 6)
#define OP_CPP_RCTX sys_insn(1, 3, 7, 3, 7)
/* Common SCTLR_ELx flags. */
/* id_aa64mmfr0 */
#define ID_AA64MMFR0_EL1_TGRAN4_SUPPORTED_MIN 0x0
+#define ID_AA64MMFR0_EL1_TGRAN4_LPA2 ID_AA64MMFR0_EL1_TGRAN4_52_BIT
#define ID_AA64MMFR0_EL1_TGRAN4_SUPPORTED_MAX 0x7
#define ID_AA64MMFR0_EL1_TGRAN64_SUPPORTED_MIN 0x0
#define ID_AA64MMFR0_EL1_TGRAN64_SUPPORTED_MAX 0x7
#define ID_AA64MMFR0_EL1_TGRAN16_SUPPORTED_MIN 0x1
+#define ID_AA64MMFR0_EL1_TGRAN16_LPA2 ID_AA64MMFR0_EL1_TGRAN16_52_BIT
#define ID_AA64MMFR0_EL1_TGRAN16_SUPPORTED_MAX 0xf
#define ARM64_MIN_PARANGE_BITS 32
#define ID_AA64MMFR0_EL1_TGRAN_2_SUPPORTED_DEFAULT 0x0
#define ID_AA64MMFR0_EL1_TGRAN_2_SUPPORTED_NONE 0x1
#define ID_AA64MMFR0_EL1_TGRAN_2_SUPPORTED_MIN 0x2
+#define ID_AA64MMFR0_EL1_TGRAN_2_SUPPORTED_LPA2 0x3
#define ID_AA64MMFR0_EL1_TGRAN_2_SUPPORTED_MAX 0x7
#ifdef CONFIG_ARM64_PA_BITS_52
#if defined(CONFIG_ARM64_4K_PAGES)
#define ID_AA64MMFR0_EL1_TGRAN_SHIFT ID_AA64MMFR0_EL1_TGRAN4_SHIFT
+#define ID_AA64MMFR0_EL1_TGRAN_LPA2 ID_AA64MMFR0_EL1_TGRAN4_52_BIT
#define ID_AA64MMFR0_EL1_TGRAN_SUPPORTED_MIN ID_AA64MMFR0_EL1_TGRAN4_SUPPORTED_MIN
#define ID_AA64MMFR0_EL1_TGRAN_SUPPORTED_MAX ID_AA64MMFR0_EL1_TGRAN4_SUPPORTED_MAX
#define ID_AA64MMFR0_EL1_TGRAN_2_SHIFT ID_AA64MMFR0_EL1_TGRAN4_2_SHIFT
#elif defined(CONFIG_ARM64_16K_PAGES)
#define ID_AA64MMFR0_EL1_TGRAN_SHIFT ID_AA64MMFR0_EL1_TGRAN16_SHIFT
+#define ID_AA64MMFR0_EL1_TGRAN_LPA2 ID_AA64MMFR0_EL1_TGRAN16_52_BIT
#define ID_AA64MMFR0_EL1_TGRAN_SUPPORTED_MIN ID_AA64MMFR0_EL1_TGRAN16_SUPPORTED_MIN
#define ID_AA64MMFR0_EL1_TGRAN_SUPPORTED_MAX ID_AA64MMFR0_EL1_TGRAN16_SUPPORTED_MAX
#define ID_AA64MMFR0_EL1_TGRAN_2_SHIFT ID_AA64MMFR0_EL1_TGRAN16_2_SHIFT
#define PIRx_ELx_PERM(idx, perm) ((perm) << ((idx) * 4))
+/*
+ * Permission Overlay Extension (POE) permission encodings.
+ */
+#define POE_NONE UL(0x0)
+#define POE_R UL(0x1)
+#define POE_X UL(0x2)
+#define POE_RX UL(0x3)
+#define POE_W UL(0x4)
+#define POE_RW UL(0x5)
+#define POE_XW UL(0x6)
+#define POE_RXW UL(0x7)
+#define POE_MASK UL(0xf)
+
#define ARM64_FEATURE_FIELD_BITS 4
/* Defined for compatibility only, do not add new users. */
#define TIF_TAGGED_ADDR 26 /* Allow tagged user addresses */
#define TIF_SME 27 /* SME in use */
#define TIF_SME_VL_INHERIT 28 /* Inherit SME vl_onexec across exec */
+#define TIF_KERNEL_FPSTATE 29 /* Task is in a kernel mode FPSIMD section */
#define _TIF_SIGPENDING (1 << TIF_SIGPENDING)
#define _TIF_NEED_RESCHED (1 << TIF_NEED_RESCHED)
#include <asm-generic/tlb.h>
/*
- * get the tlbi levels in arm64. Default value is 0 if more than one
- * of cleared_* is set or neither is set.
- * Arm64 doesn't support p4ds now.
+ * get the tlbi levels in arm64. Default value is TLBI_TTL_UNKNOWN if more than
+ * one of cleared_* is set or neither is set - this elides the level hinting to
+ * the hardware.
*/
static inline int tlb_get_level(struct mmu_gather *tlb)
{
/* The TTL field is only valid for the leaf entry. */
if (tlb->freed_tables)
- return 0;
+ return TLBI_TTL_UNKNOWN;
if (tlb->cleared_ptes && !(tlb->cleared_pmds ||
tlb->cleared_puds ||
tlb->cleared_p4ds))
return 1;
- return 0;
+ if (tlb->cleared_p4ds && !(tlb->cleared_ptes ||
+ tlb->cleared_pmds ||
+ tlb->cleared_puds))
+ return 0;
+
+ return TLBI_TTL_UNKNOWN;
}
static inline void tlb_flush(struct mmu_gather *tlb)
* When ARMv8.4-TTL exists, TLBI operations take an additional hint for
* the level at which the invalidation must take place. If the level is
* wrong, no invalidation may take place. In the case where the level
- * cannot be easily determined, a 0 value for the level parameter will
- * perform a non-hinted invalidation.
+ * cannot be easily determined, the value TLBI_TTL_UNKNOWN will perform
+ * a non-hinted invalidation. Any provided level outside the hint range
+ * will also cause fall-back to non-hinted invalidation.
*
* For Stage-2 invalidation, use the level values provided to that effect
* in asm/stage2_pgtable.h.
*/
#define TLBI_TTL_MASK GENMASK_ULL(47, 44)
+#define TLBI_TTL_UNKNOWN INT_MAX
+
#define __tlbi_level(op, addr, level) do { \
u64 arg = addr; \
\
if (alternative_has_cap_unlikely(ARM64_HAS_ARMv8_4_TTL) && \
- level) { \
+ level >= 0 && level <= 3) { \
u64 ttl = level & 3; \
ttl |= get_trans_granule() << 2; \
arg &= ~TLBI_TTL_MASK; \
} while (0)
/*
- * This macro creates a properly formatted VA operand for the TLB RANGE.
- * The value bit assignments are:
+ * This macro creates a properly formatted VA operand for the TLB RANGE. The
+ * value bit assignments are:
*
* +----------+------+-------+-------+-------+----------------------+
* | ASID | TG | SCALE | NUM | TTL | BADDR |
* +-----------------+-------+-------+-------+----------------------+
* |63 48|47 46|45 44|43 39|38 37|36 0|
*
- * The address range is determined by below formula:
- * [BADDR, BADDR + (NUM + 1) * 2^(5*SCALE + 1) * PAGESIZE)
+ * The address range is determined by below formula: [BADDR, BADDR + (NUM + 1) *
+ * 2^(5*SCALE + 1) * PAGESIZE)
+ *
+ * Note that the first argument, baddr, is pre-shifted; If LPA2 is in use, BADDR
+ * holds addr[52:16]. Else BADDR holds page number. See for example ARM DDI
+ * 0487J.a section C5.5.60 "TLBI VAE1IS, TLBI VAE1ISNXS, TLB Invalidate by VA,
+ * EL1, Inner Shareable".
*
*/
-#define __TLBI_VADDR_RANGE(addr, asid, scale, num, ttl) \
- ({ \
- unsigned long __ta = (addr) >> PAGE_SHIFT; \
- __ta &= GENMASK_ULL(36, 0); \
- __ta |= (unsigned long)(ttl) << 37; \
- __ta |= (unsigned long)(num) << 39; \
- __ta |= (unsigned long)(scale) << 44; \
- __ta |= get_trans_granule() << 46; \
- __ta |= (unsigned long)(asid) << 48; \
- __ta; \
+#define __TLBI_VADDR_RANGE(baddr, asid, scale, num, ttl) \
+ ({ \
+ unsigned long __ta = (baddr); \
+ unsigned long __ttl = (ttl >= 1 && ttl <= 3) ? ttl : 0; \
+ __ta &= GENMASK_ULL(36, 0); \
+ __ta |= __ttl << 37; \
+ __ta |= (unsigned long)(num) << 39; \
+ __ta |= (unsigned long)(scale) << 44; \
+ __ta |= get_trans_granule() << 46; \
+ __ta |= (unsigned long)(asid) << 48; \
+ __ta; \
})
/* These macros are used by the TLBI RANGE feature. */
* CPUs, ensuring that any walk-cache entries associated with the
* translation are also invalidated.
*
- * __flush_tlb_range(vma, start, end, stride, last_level)
+ * __flush_tlb_range(vma, start, end, stride, last_level, tlb_level)
* Invalidate the virtual-address range '[start, end)' on all
* CPUs for the user address space corresponding to 'vma->mm'.
* The invalidation operations are issued at a granularity
* determined by 'stride' and only affect any walk-cache entries
- * if 'last_level' is equal to false.
+ * if 'last_level' is equal to false. tlb_level is the level at
+ * which the invalidation must take place. If the level is wrong,
+ * no invalidation may take place. In the case where the level
+ * cannot be easily determined, the value TLBI_TTL_UNKNOWN will
+ * perform a non-hinted invalidation.
*
*
* Finally, take a look at asm/tlb.h to see how tlb_flush() is implemented
* @tlb_level: Translation Table level hint, if known
* @tlbi_user: If 'true', call an additional __tlbi_user()
* (typically for user ASIDs). 'flase' for IPA instructions
+ * @lpa2: If 'true', the lpa2 scheme is used as set out below
*
* When the CPU does not support TLB range operations, flush the TLB
* entries one by one at the granularity of 'stride'. If the TLB
* range ops are supported, then:
*
- * 1. If 'pages' is odd, flush the first page through non-range
- * operations;
+ * 1. If FEAT_LPA2 is in use, the start address of a range operation must be
+ * 64KB aligned, so flush pages one by one until the alignment is reached
+ * using the non-range operations. This step is skipped if LPA2 is not in
+ * use.
+ *
+ * 2. The minimum range granularity is decided by 'scale', so multiple range
+ * TLBI operations may be required. Start from scale = 3, flush the largest
+ * possible number of pages ((num+1)*2^(5*scale+1)) that fit into the
+ * requested range, then decrement scale and continue until one or zero pages
+ * are left. We must start from highest scale to ensure 64KB start alignment
+ * is maintained in the LPA2 case.
*
- * 2. For remaining pages: the minimum range granularity is decided
- * by 'scale', so multiple range TLBI operations may be required.
- * Start from scale = 0, flush the corresponding number of pages
- * ((num+1)*2^(5*scale+1) starting from 'addr'), then increase it
- * until no pages left.
+ * 3. If there is 1 page remaining, flush it through non-range operations. Range
+ * operations can only span an even number of pages. We save this for last to
+ * ensure 64KB start alignment is maintained for the LPA2 case.
*
* Note that certain ranges can be represented by either num = 31 and
* scale or num = 0 and scale + 1. The loop below favours the latter
* since num is limited to 30 by the __TLBI_RANGE_NUM() macro.
*/
#define __flush_tlb_range_op(op, start, pages, stride, \
- asid, tlb_level, tlbi_user) \
+ asid, tlb_level, tlbi_user, lpa2) \
do { \
int num = 0; \
- int scale = 0; \
+ int scale = 3; \
+ int shift = lpa2 ? 16 : PAGE_SHIFT; \
unsigned long addr; \
\
while (pages > 0) { \
if (!system_supports_tlb_range() || \
- pages % 2 == 1) { \
+ pages == 1 || \
+ (lpa2 && start != ALIGN(start, SZ_64K))) { \
addr = __TLBI_VADDR(start, asid); \
__tlbi_level(op, addr, tlb_level); \
if (tlbi_user) \
\
num = __TLBI_RANGE_NUM(pages, scale); \
if (num >= 0) { \
- addr = __TLBI_VADDR_RANGE(start, asid, scale, \
- num, tlb_level); \
+ addr = __TLBI_VADDR_RANGE(start >> shift, asid, \
+ scale, num, tlb_level); \
__tlbi(r##op, addr); \
if (tlbi_user) \
__tlbi_user(r##op, addr); \
start += __TLBI_RANGE_PAGES(num, scale) << PAGE_SHIFT; \
pages -= __TLBI_RANGE_PAGES(num, scale); \
} \
- scale++; \
+ scale--; \
} \
} while (0)
#define __flush_s2_tlb_range_op(op, start, pages, stride, tlb_level) \
- __flush_tlb_range_op(op, start, pages, stride, 0, tlb_level, false)
+ __flush_tlb_range_op(op, start, pages, stride, 0, tlb_level, false, kvm_lpa2_is_enabled());
static inline void __flush_tlb_range(struct vm_area_struct *vma,
unsigned long start, unsigned long end,
asid = ASID(vma->vm_mm);
if (last_level)
- __flush_tlb_range_op(vale1is, start, pages, stride, asid, tlb_level, true);
+ __flush_tlb_range_op(vale1is, start, pages, stride, asid,
+ tlb_level, true, lpa2_is_enabled());
else
- __flush_tlb_range_op(vae1is, start, pages, stride, asid, tlb_level, true);
+ __flush_tlb_range_op(vae1is, start, pages, stride, asid,
+ tlb_level, true, lpa2_is_enabled());
dsb(ish);
mmu_notifier_arch_invalidate_secondary_tlbs(vma->vm_mm, start, end);
/*
* We cannot use leaf-only invalidation here, since we may be invalidating
* table entries as part of collapsing hugepages or moving page tables.
- * Set the tlb_level to 0 because we can not get enough information here.
+ * Set the tlb_level to TLBI_TTL_UNKNOWN because we can not get enough
+ * information here.
*/
- __flush_tlb_range(vma, start, end, PAGE_SIZE, false, 0);
+ __flush_tlb_range(vma, start, end, PAGE_SIZE, false, TLBI_TTL_UNKNOWN);
}
static inline void flush_tlb_kernel_range(unsigned long start, unsigned long end)
__SYSCALL(__NR_futex_wait, sys_futex_wait)
#define __NR_futex_requeue 456
__SYSCALL(__NR_futex_requeue, sys_futex_requeue)
+#define __NR_statmount 457
+__SYSCALL(__NR_statmount, sys_statmount)
+#define __NR_listmount 458
+__SYSCALL(__NR_listmount, sys_listmount)
/*
* Please add new compat syscalls above this comment and update
if (id_aa64pfr1_mte(info->reg_id_aa64pfr1))
init_cpu_ftr_reg(SYS_GMID_EL1, info->reg_gmid);
-
- /*
- * Initialize the indirect array of CPU capabilities pointers before we
- * handle the boot CPU below.
- */
- init_cpucap_indirect_list();
-
- /*
- * Detect broken pseudo-NMI. Must be called _before_ the call to
- * setup_boot_cpu_capabilities() since it interacts with
- * can_use_gic_priorities().
- */
- detect_system_supports_pseudo_nmi();
-
- /*
- * Detect and enable early CPU capabilities based on the boot CPU,
- * after we have initialised the CPU feature infrastructure.
- */
- setup_boot_cpu_capabilities();
}
static void update_cpu_ftr_reg(struct arm64_ftr_reg *reg, u64 new)
return has_sre;
}
-static bool has_no_hw_prefetch(const struct arm64_cpu_capabilities *entry, int __unused)
-{
- u32 midr = read_cpuid_id();
-
- /* Cavium ThunderX pass 1.x and 2.x */
- return midr_is_cpu_model_range(midr, MIDR_THUNDERX,
- MIDR_CPU_VAR_REV(0, 0),
- MIDR_CPU_VAR_REV(1, MIDR_REVISION_MASK));
-}
-
static bool has_cache_idc(const struct arm64_cpu_capabilities *entry,
int scope)
{
return !meltdown_safe;
}
+#if defined(ID_AA64MMFR0_EL1_TGRAN_LPA2) && defined(ID_AA64MMFR0_EL1_TGRAN_2_SUPPORTED_LPA2)
+static bool has_lpa2_at_stage1(u64 mmfr0)
+{
+ unsigned int tgran;
+
+ tgran = cpuid_feature_extract_unsigned_field(mmfr0,
+ ID_AA64MMFR0_EL1_TGRAN_SHIFT);
+ return tgran == ID_AA64MMFR0_EL1_TGRAN_LPA2;
+}
+
+static bool has_lpa2_at_stage2(u64 mmfr0)
+{
+ unsigned int tgran;
+
+ tgran = cpuid_feature_extract_unsigned_field(mmfr0,
+ ID_AA64MMFR0_EL1_TGRAN_2_SHIFT);
+ return tgran == ID_AA64MMFR0_EL1_TGRAN_2_SUPPORTED_LPA2;
+}
+
+static bool has_lpa2(const struct arm64_cpu_capabilities *entry, int scope)
+{
+ u64 mmfr0;
+
+ mmfr0 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
+ return has_lpa2_at_stage1(mmfr0) && has_lpa2_at_stage2(mmfr0);
+}
+#else
+static bool has_lpa2(const struct arm64_cpu_capabilities *entry, int scope)
+{
+ return false;
+}
+#endif
+
#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
#define KPTI_NG_TEMP_VA (-(1UL << PMD_SHIFT))
static void __init kpti_install_ng_mappings(void)
{
/* Check whether KPTI is going to be used */
- if (!cpus_have_cap(ARM64_UNMAP_KERNEL_AT_EL0))
+ if (!arm64_kernel_unmapped_at_el0())
return;
/*
ARM64_CPUID_FIELDS(ID_AA64ISAR0_EL1, ATOMIC, IMP)
},
#endif /* CONFIG_ARM64_LSE_ATOMICS */
- {
- .desc = "Software prefetching using PRFM",
- .capability = ARM64_HAS_NO_HW_PREFETCH,
- .type = ARM64_CPUCAP_WEAK_LOCAL_CPU_FEATURE,
- .matches = has_no_hw_prefetch,
- },
{
.desc = "Virtualization Host Extensions",
.capability = ARM64_HAS_VIRT_HOST_EXTN,
.matches = has_cpuid_feature,
ARM64_CPUID_FIELDS(ID_AA64MMFR2_EL1, EVT, IMP)
},
+ {
+ .desc = "52-bit Virtual Addressing for KVM (LPA2)",
+ .capability = ARM64_HAS_LPA2,
+ .type = ARM64_CPUCAP_SYSTEM_FEATURE,
+ .matches = has_lpa2,
+ },
{},
};
verify_local_cpu_capabilities();
}
-static void __init setup_boot_cpu_capabilities(void)
-{
- /* Detect capabilities with either SCOPE_BOOT_CPU or SCOPE_LOCAL_CPU */
- update_cpu_capabilities(SCOPE_BOOT_CPU | SCOPE_LOCAL_CPU);
- /* Enable the SCOPE_BOOT_CPU capabilities alone right away */
- enable_cpu_capabilities(SCOPE_BOOT_CPU);
-}
-
bool this_cpu_has_cap(unsigned int n)
{
if (!WARN_ON(preemptible()) && n < ARM64_NCAPS) {
return elf_hwcap[1];
}
-void __init setup_system_features(void)
+static void __init setup_boot_cpu_capabilities(void)
{
- int i;
/*
- * The system-wide safe feature feature register values have been
- * finalized. Finalize and log the available system capabilities.
+ * The boot CPU's feature register values have been recorded. Detect
+ * boot cpucaps and local cpucaps for the boot CPU, then enable and
+ * patch alternatives for the available boot cpucaps.
*/
- update_cpu_capabilities(SCOPE_SYSTEM);
- if (IS_ENABLED(CONFIG_ARM64_SW_TTBR0_PAN) &&
- !cpus_have_cap(ARM64_HAS_PAN))
- pr_info("emulated: Privileged Access Never (PAN) using TTBR0_EL1 switching\n");
+ update_cpu_capabilities(SCOPE_BOOT_CPU | SCOPE_LOCAL_CPU);
+ enable_cpu_capabilities(SCOPE_BOOT_CPU);
+ apply_boot_alternatives();
+}
+void __init setup_boot_cpu_features(void)
+{
/*
- * Enable all the available capabilities which have not been enabled
- * already.
+ * Initialize the indirect array of CPU capabilities pointers before we
+ * handle the boot CPU.
*/
- enable_cpu_capabilities(SCOPE_ALL & ~SCOPE_BOOT_CPU);
+ init_cpucap_indirect_list();
- kpti_install_ng_mappings();
+ /*
+ * Detect broken pseudo-NMI. Must be called _before_ the call to
+ * setup_boot_cpu_capabilities() since it interacts with
+ * can_use_gic_priorities().
+ */
+ detect_system_supports_pseudo_nmi();
- sve_setup();
- sme_setup();
+ setup_boot_cpu_capabilities();
+}
+static void __init setup_system_capabilities(void)
+{
/*
- * Check for sane CTR_EL0.CWG value.
+ * The system-wide safe feature register values have been finalized.
+ * Detect, enable, and patch alternatives for the available system
+ * cpucaps.
*/
- if (!cache_type_cwg())
- pr_warn("No Cache Writeback Granule information, assuming %d\n",
- ARCH_DMA_MINALIGN);
+ update_cpu_capabilities(SCOPE_SYSTEM);
+ enable_cpu_capabilities(SCOPE_ALL & ~SCOPE_BOOT_CPU);
+ apply_alternatives_all();
- for (i = 0; i < ARM64_NCAPS; i++) {
+ /*
+ * Log any cpucaps with a cpumask as these aren't logged by
+ * update_cpu_capabilities().
+ */
+ for (int i = 0; i < ARM64_NCAPS; i++) {
const struct arm64_cpu_capabilities *caps = cpucap_ptrs[i];
if (caps && caps->cpus && caps->desc &&
pr_info("detected: %s on CPU%*pbl\n",
caps->desc, cpumask_pr_args(caps->cpus));
}
+
+ /*
+ * TTBR0 PAN doesn't have its own cpucap, so log it manually.
+ */
+ if (system_uses_ttbr0_pan())
+ pr_info("emulated: Privileged Access Never (PAN) using TTBR0_EL1 switching\n");
+}
+
+void __init setup_system_features(void)
+{
+ setup_system_capabilities();
+
+ kpti_install_ng_mappings();
+
+ sve_setup();
+ sme_setup();
+
+ /*
+ * Check for sane CTR_EL0.CWG value.
+ */
+ if (!cache_type_cwg())
+ pr_warn("No Cache Writeback Granule information, assuming %d\n",
+ ARCH_DMA_MINALIGN);
}
void __init setup_user_features(void)
static inline const char *icache_policy_str(int l1ip)
{
switch (l1ip) {
- case CTR_EL0_L1Ip_VPIPT:
- return "VPIPT";
case CTR_EL0_L1Ip_VIPT:
return "VIPT";
case CTR_EL0_L1Ip_PIPT:
switch (l1ip) {
case CTR_EL0_L1Ip_PIPT:
break;
- case CTR_EL0_L1Ip_VPIPT:
- set_bit(ICACHEF_VPIPT, &__icache_flags);
- break;
case CTR_EL0_L1Ip_VIPT:
default:
/* Assume aliasing */
* softirq kicks in. Upon vcpu_put(), KVM will save the vcpu FP state and
* flag the register state as invalid.
*
- * In order to allow softirq handlers to use FPSIMD, kernel_neon_begin() may
- * save the task's FPSIMD context back to task_struct from softirq context.
- * To prevent this from racing with the manipulation of the task's FPSIMD state
- * from task context and thereby corrupting the state, it is necessary to
- * protect any manipulation of a task's fpsimd_state or TIF_FOREIGN_FPSTATE
- * flag with {, __}get_cpu_fpsimd_context(). This will still allow softirqs to
- * run but prevent them to use FPSIMD.
+ * In order to allow softirq handlers to use FPSIMD, kernel_neon_begin() may be
+ * called from softirq context, which will save the task's FPSIMD context back
+ * to task_struct. To prevent this from racing with the manipulation of the
+ * task's FPSIMD state from task context and thereby corrupting the state, it
+ * is necessary to protect any manipulation of a task's fpsimd_state or
+ * TIF_FOREIGN_FPSTATE flag with get_cpu_fpsimd_context(), which will suspend
+ * softirq servicing entirely until put_cpu_fpsimd_context() is called.
*
* For a certain task, the sequence may look something like this:
* - the task gets scheduled in; if both the task's fpsimd_cpu field
#endif
-DEFINE_PER_CPU(bool, fpsimd_context_busy);
-EXPORT_PER_CPU_SYMBOL(fpsimd_context_busy);
-
static void fpsimd_bind_task_to_cpu(void);
-static void __get_cpu_fpsimd_context(void)
-{
- bool busy = __this_cpu_xchg(fpsimd_context_busy, true);
-
- WARN_ON(busy);
-}
-
/*
* Claim ownership of the CPU FPSIMD context for use by the calling context.
*
* The caller may freely manipulate the FPSIMD context metadata until
* put_cpu_fpsimd_context() is called.
*
- * The double-underscore version must only be called if you know the task
- * can't be preempted.
- *
* On RT kernels local_bh_disable() is not sufficient because it only
* serializes soft interrupt related sections via a local lock, but stays
* preemptible. Disabling preemption is the right choice here as bottom
local_bh_disable();
else
preempt_disable();
- __get_cpu_fpsimd_context();
-}
-
-static void __put_cpu_fpsimd_context(void)
-{
- bool busy = __this_cpu_xchg(fpsimd_context_busy, false);
-
- WARN_ON(!busy); /* No matching get_cpu_fpsimd_context()? */
}
/*
*/
static void put_cpu_fpsimd_context(void)
{
- __put_cpu_fpsimd_context();
if (!IS_ENABLED(CONFIG_PREEMPT_RT))
local_bh_enable();
else
preempt_enable();
}
-static bool have_cpu_fpsimd_context(void)
-{
- return !preemptible() && __this_cpu_read(fpsimd_context_busy);
-}
-
unsigned int task_get_vl(const struct task_struct *task, enum vec_type type)
{
return task->thread.vl[type];
bool restore_ffr;
WARN_ON(!system_supports_fpsimd());
- WARN_ON(!have_cpu_fpsimd_context());
+ WARN_ON(preemptible());
+ WARN_ON(test_thread_flag(TIF_KERNEL_FPSTATE));
if (system_supports_sve() || system_supports_sme()) {
switch (current->thread.fp_type) {
default:
/*
* This indicates either a bug in
- * fpsimd_save() or memory corruption, we
+ * fpsimd_save_user_state() or memory corruption, we
* should always record an explicit format
* when we save. We always at least have the
* memory allocated for FPSMID registers so
* than via current, if we are saving KVM state then it will have
* ensured that the type of registers to save is set in last->to_save.
*/
-static void fpsimd_save(void)
+static void fpsimd_save_user_state(void)
{
struct cpu_fp_state const *last =
this_cpu_ptr(&fpsimd_last_state);
unsigned int vl;
WARN_ON(!system_supports_fpsimd());
- WARN_ON(!have_cpu_fpsimd_context());
+ WARN_ON(preemptible());
if (test_thread_flag(TIF_FOREIGN_FPSTATE))
return;
if (task == current) {
get_cpu_fpsimd_context();
- fpsimd_save();
+ fpsimd_save_user_state();
}
fpsimd_flush_task_state(task);
unsigned long b;
int max_bit;
- if (!cpus_have_cap(ARM64_SVE))
+ if (!system_supports_sve())
return;
/*
struct vl_info *info = &vl_info[ARM64_VEC_SME];
int min_bit, max_bit;
- if (!cpus_have_cap(ARM64_SME))
+ if (!system_supports_sme())
return;
/*
current);
}
+static void fpsimd_load_kernel_state(struct task_struct *task)
+{
+ struct cpu_fp_state *last = this_cpu_ptr(&fpsimd_last_state);
+
+ /*
+ * Elide the load if this CPU holds the most recent kernel mode
+ * FPSIMD context of the current task.
+ */
+ if (last->st == &task->thread.kernel_fpsimd_state &&
+ task->thread.kernel_fpsimd_cpu == smp_processor_id())
+ return;
+
+ fpsimd_load_state(&task->thread.kernel_fpsimd_state);
+}
+
+static void fpsimd_save_kernel_state(struct task_struct *task)
+{
+ struct cpu_fp_state cpu_fp_state = {
+ .st = &task->thread.kernel_fpsimd_state,
+ .to_save = FP_STATE_FPSIMD,
+ };
+
+ fpsimd_save_state(&task->thread.kernel_fpsimd_state);
+ fpsimd_bind_state_to_cpu(&cpu_fp_state);
+
+ task->thread.kernel_fpsimd_cpu = smp_processor_id();
+}
+
void fpsimd_thread_switch(struct task_struct *next)
{
bool wrong_task, wrong_cpu;
if (!system_supports_fpsimd())
return;
- __get_cpu_fpsimd_context();
+ WARN_ON_ONCE(!irqs_disabled());
/* Save unsaved fpsimd state, if any: */
- fpsimd_save();
-
- /*
- * Fix up TIF_FOREIGN_FPSTATE to correctly describe next's
- * state. For kernel threads, FPSIMD registers are never loaded
- * and wrong_task and wrong_cpu will always be true.
- */
- wrong_task = __this_cpu_read(fpsimd_last_state.st) !=
- &next->thread.uw.fpsimd_state;
- wrong_cpu = next->thread.fpsimd_cpu != smp_processor_id();
+ if (test_thread_flag(TIF_KERNEL_FPSTATE))
+ fpsimd_save_kernel_state(current);
+ else
+ fpsimd_save_user_state();
- update_tsk_thread_flag(next, TIF_FOREIGN_FPSTATE,
- wrong_task || wrong_cpu);
+ if (test_tsk_thread_flag(next, TIF_KERNEL_FPSTATE)) {
+ fpsimd_load_kernel_state(next);
+ set_tsk_thread_flag(next, TIF_FOREIGN_FPSTATE);
+ } else {
+ /*
+ * Fix up TIF_FOREIGN_FPSTATE to correctly describe next's
+ * state. For kernel threads, FPSIMD registers are never
+ * loaded with user mode FPSIMD state and so wrong_task and
+ * wrong_cpu will always be true.
+ */
+ wrong_task = __this_cpu_read(fpsimd_last_state.st) !=
+ &next->thread.uw.fpsimd_state;
+ wrong_cpu = next->thread.fpsimd_cpu != smp_processor_id();
- __put_cpu_fpsimd_context();
+ update_tsk_thread_flag(next, TIF_FOREIGN_FPSTATE,
+ wrong_task || wrong_cpu);
+ }
}
static void fpsimd_flush_thread_vl(enum vec_type type)
return;
get_cpu_fpsimd_context();
- fpsimd_save();
+ fpsimd_save_user_state();
put_cpu_fpsimd_context();
}
*/
void fpsimd_save_and_flush_cpu_state(void)
{
+ unsigned long flags;
+
if (!system_supports_fpsimd())
return;
WARN_ON(preemptible());
- __get_cpu_fpsimd_context();
- fpsimd_save();
+ local_irq_save(flags);
+ fpsimd_save_user_state();
fpsimd_flush_cpu_state();
- __put_cpu_fpsimd_context();
+ local_irq_restore(flags);
}
#ifdef CONFIG_KERNEL_MODE_NEON
get_cpu_fpsimd_context();
/* Save unsaved fpsimd state, if any: */
- fpsimd_save();
+ if (test_thread_flag(TIF_KERNEL_FPSTATE)) {
+ BUG_ON(IS_ENABLED(CONFIG_PREEMPT_RT) || !in_serving_softirq());
+ fpsimd_save_kernel_state(current);
+ } else {
+ fpsimd_save_user_state();
+
+ /*
+ * Set the thread flag so that the kernel mode FPSIMD state
+ * will be context switched along with the rest of the task
+ * state.
+ *
+ * On non-PREEMPT_RT, softirqs may interrupt task level kernel
+ * mode FPSIMD, but the task will not be preemptible so setting
+ * TIF_KERNEL_FPSTATE for those would be both wrong (as it
+ * would mark the task context FPSIMD state as requiring a
+ * context switch) and unnecessary.
+ *
+ * On PREEMPT_RT, softirqs are serviced from a separate thread,
+ * which is scheduled as usual, and this guarantees that these
+ * softirqs are not interrupting use of the FPSIMD in kernel
+ * mode in task context. So in this case, setting the flag here
+ * is always appropriate.
+ */
+ if (IS_ENABLED(CONFIG_PREEMPT_RT) || !in_serving_softirq())
+ set_thread_flag(TIF_KERNEL_FPSTATE);
+ }
/* Invalidate any task state remaining in the fpsimd regs: */
fpsimd_flush_cpu_state();
+
+ put_cpu_fpsimd_context();
}
EXPORT_SYMBOL_GPL(kernel_neon_begin);
if (!system_supports_fpsimd())
return;
- put_cpu_fpsimd_context();
+ /*
+ * If we are returning from a nested use of kernel mode FPSIMD, restore
+ * the task context kernel mode FPSIMD state. This can only happen when
+ * running in softirq context on non-PREEMPT_RT.
+ */
+ if (!IS_ENABLED(CONFIG_PREEMPT_RT) && in_serving_softirq() &&
+ test_thread_flag(TIF_KERNEL_FPSTATE))
+ fpsimd_load_kernel_state(current);
+ else
+ clear_thread_flag(TIF_KERNEL_FPSTATE);
}
EXPORT_SYMBOL_GPL(kernel_neon_end);
str_l x21, __fdt_pointer, x5 // Save FDT pointer
- ldr_l x4, kimage_vaddr // Save the offset between
+ adrp x4, _text // Save the offset between
sub x4, x4, x0 // the kernel virtual and
str_l x4, kimage_voffset, x5 // physical mappings
static u64 __boot_status __initdata;
+// temporary __prel64 related definitions
+// to be removed when this code is moved under pi/
+
+#define __prel64_initconst __initconst
+
+#define PREL64(type, name) union { type *name; }
+
+#define prel64_pointer(__d) (__d)
+
+typedef bool filter_t(u64 val);
+
struct ftr_set_desc {
char name[FTR_DESC_NAME_LEN];
- struct arm64_ftr_override *override;
+ PREL64(struct arm64_ftr_override, override);
struct {
char name[FTR_DESC_FIELD_LEN];
u8 shift;
u8 width;
- bool (*filter)(u64 val);
+ PREL64(filter_t, filter);
} fields[];
};
val == 0);
}
-static const struct ftr_set_desc mmfr1 __initconst = {
+static const struct ftr_set_desc mmfr1 __prel64_initconst = {
.name = "id_aa64mmfr1",
.override = &id_aa64mmfr1_override,
.fields = {
return true;
}
-static const struct ftr_set_desc pfr0 __initconst = {
+static const struct ftr_set_desc pfr0 __prel64_initconst = {
.name = "id_aa64pfr0",
.override = &id_aa64pfr0_override,
.fields = {
return true;
}
-static const struct ftr_set_desc pfr1 __initconst = {
+static const struct ftr_set_desc pfr1 __prel64_initconst = {
.name = "id_aa64pfr1",
.override = &id_aa64pfr1_override,
.fields = {
},
};
-static const struct ftr_set_desc isar1 __initconst = {
+static const struct ftr_set_desc isar1 __prel64_initconst = {
.name = "id_aa64isar1",
.override = &id_aa64isar1_override,
.fields = {
},
};
-static const struct ftr_set_desc isar2 __initconst = {
+static const struct ftr_set_desc isar2 __prel64_initconst = {
.name = "id_aa64isar2",
.override = &id_aa64isar2_override,
.fields = {
},
};
-static const struct ftr_set_desc smfr0 __initconst = {
+static const struct ftr_set_desc smfr0 __prel64_initconst = {
.name = "id_aa64smfr0",
.override = &id_aa64smfr0_override,
.fields = {
ID_AA64MMFR1_EL1_VH_SHIFT));
}
-static const struct ftr_set_desc sw_features __initconst = {
+static const struct ftr_set_desc sw_features __prel64_initconst = {
.name = "arm64_sw",
.override = &arm64_sw_feature_override,
.fields = {
},
};
-static const struct ftr_set_desc * const regs[] __initconst = {
- &mmfr1,
- &pfr0,
- &pfr1,
- &isar1,
- &isar2,
- &smfr0,
- &sw_features,
+static const
+PREL64(const struct ftr_set_desc, reg) regs[] __prel64_initconst = {
+ { &mmfr1 },
+ { &pfr0 },
+ { &pfr1 },
+ { &isar1 },
+ { &isar2 },
+ { &smfr0 },
+ { &sw_features },
};
static const struct {
char alias[FTR_ALIAS_NAME_LEN];
char feature[FTR_ALIAS_OPTION_LEN];
} aliases[] __initconst = {
- { "kvm-arm.mode=nvhe", "id_aa64mmfr1.vh=0" },
- { "kvm-arm.mode=protected", "id_aa64mmfr1.vh=0" },
+ { "kvm_arm.mode=nvhe", "id_aa64mmfr1.vh=0" },
+ { "kvm_arm.mode=protected", "id_aa64mmfr1.vh=0" },
{ "arm64.nosve", "id_aa64pfr0.sve=0" },
{ "arm64.nosme", "id_aa64pfr1.sme=0" },
{ "arm64.nobti", "id_aa64pfr1.bt=0" },
{ "nokaslr", "arm64_sw.nokaslr=1" },
};
-static int __init parse_nokaslr(char *unused)
+static int __init parse_hexdigit(const char *p, u64 *v)
{
- /* nokaslr param handling is done by early cpufeature code */
+ // skip "0x" if it comes next
+ if (p[0] == '0' && tolower(p[1]) == 'x')
+ p += 2;
+
+ // check whether the RHS is a single hex digit
+ if (!isxdigit(p[0]) || (p[1] && !isspace(p[1])))
+ return -EINVAL;
+
+ *v = tolower(*p) - (isdigit(*p) ? '0' : 'a' - 10);
return 0;
}
-early_param("nokaslr", parse_nokaslr);
-static int __init find_field(const char *cmdline,
+static int __init find_field(const char *cmdline, char *opt, int len,
const struct ftr_set_desc *reg, int f, u64 *v)
{
- char opt[FTR_DESC_NAME_LEN + FTR_DESC_FIELD_LEN + 2];
- int len;
+ int flen = strlen(reg->fields[f].name);
- len = snprintf(opt, ARRAY_SIZE(opt), "%s.%s=",
- reg->name, reg->fields[f].name);
+ // append '<fieldname>=' to obtain '<name>.<fieldname>='
+ memcpy(opt + len, reg->fields[f].name, flen);
+ len += flen;
+ opt[len++] = '=';
- if (!parameqn(cmdline, opt, len))
+ if (memcmp(cmdline, opt, len))
return -1;
- return kstrtou64(cmdline + len, 0, v);
+ return parse_hexdigit(cmdline + len, v);
}
static void __init match_options(const char *cmdline)
{
+ char opt[FTR_DESC_NAME_LEN + FTR_DESC_FIELD_LEN + 2];
int i;
for (i = 0; i < ARRAY_SIZE(regs); i++) {
+ const struct ftr_set_desc *reg = prel64_pointer(regs[i].reg);
+ struct arm64_ftr_override *override;
+ int len = strlen(reg->name);
int f;
- if (!regs[i]->override)
- continue;
+ override = prel64_pointer(reg->override);
- for (f = 0; strlen(regs[i]->fields[f].name); f++) {
- u64 shift = regs[i]->fields[f].shift;
- u64 width = regs[i]->fields[f].width ?: 4;
+ // set opt[] to '<name>.'
+ memcpy(opt, reg->name, len);
+ opt[len++] = '.';
+
+ for (f = 0; reg->fields[f].name[0] != '\0'; f++) {
+ u64 shift = reg->fields[f].shift;
+ u64 width = reg->fields[f].width ?: 4;
u64 mask = GENMASK_ULL(shift + width - 1, shift);
+ bool (*filter)(u64 val);
u64 v;
- if (find_field(cmdline, regs[i], f, &v))
+ if (find_field(cmdline, opt, len, reg, f, &v))
continue;
/*
* it by setting the value to the all-ones while
* clearing the mask... Yes, this is fragile.
*/
- if (regs[i]->fields[f].filter &&
- !regs[i]->fields[f].filter(v)) {
- regs[i]->override->val |= mask;
- regs[i]->override->mask &= ~mask;
+ filter = prel64_pointer(reg->fields[f].filter);
+ if (filter && !filter(v)) {
+ override->val |= mask;
+ override->mask &= ~mask;
continue;
}
- regs[i]->override->val &= ~mask;
- regs[i]->override->val |= (v << shift) & mask;
- regs[i]->override->mask |= mask;
+ override->val &= ~mask;
+ override->val |= (v << shift) & mask;
+ override->mask |= mask;
return;
}
cmdline = skip_spaces(cmdline);
- for (len = 0; cmdline[len] && !isspace(cmdline[len]); len++);
- if (!len)
+ /* terminate on "--" appearing on the command line by itself */
+ if (cmdline[0] == '-' && cmdline[1] == '-' && isspace(cmdline[2]))
return;
- len = min(len, ARRAY_SIZE(buf) - 1);
- memcpy(buf, cmdline, len);
- buf[len] = '\0';
-
- if (strcmp(buf, "--") == 0)
+ for (len = 0; cmdline[len] && !isspace(cmdline[len]); len++) {
+ if (len >= sizeof(buf) - 1)
+ break;
+ if (cmdline[len] == '-')
+ buf[len] = '_';
+ else
+ buf[len] = cmdline[len];
+ }
+ if (!len)
return;
+ buf[len] = 0;
+
cmdline += len;
match_options(buf);
for (i = 0; parse_aliases && i < ARRAY_SIZE(aliases); i++)
- if (parameq(buf, aliases[i].alias))
+ if (!memcmp(buf, aliases[i].alias, len + 1))
__parse_cmdline(aliases[i].feature, false);
} while (1);
}
asmlinkage void __init init_feature_override(u64 boot_status)
{
+ struct arm64_ftr_override *override;
+ const struct ftr_set_desc *reg;
int i;
for (i = 0; i < ARRAY_SIZE(regs); i++) {
- if (regs[i]->override) {
- regs[i]->override->val = 0;
- regs[i]->override->mask = 0;
- }
+ reg = prel64_pointer(regs[i].reg);
+ override = prel64_pointer(reg->override);
+
+ override->val = 0;
+ override->mask = 0;
}
__boot_status = boot_status;
parse_cmdline();
for (i = 0; i < ARRAY_SIZE(regs); i++) {
- if (regs[i]->override)
- dcache_clean_inval_poc((unsigned long)regs[i]->override,
- (unsigned long)regs[i]->override +
- sizeof(*regs[i]->override));
+ reg = prel64_pointer(regs[i].reg);
+ override = prel64_pointer(reg->override);
+ dcache_clean_inval_poc((unsigned long)override,
+ (unsigned long)(override + 1));
}
}
#include <linux/vmalloc.h>
#include <asm/daifflags.h>
#include <asm/exception.h>
+#include <asm/numa.h>
#include <asm/softirq_stack.h>
#include <asm/stacktrace.h>
#include <asm/vmap_stack.h>
for_each_possible_cpu(cpu)
per_cpu(irq_shadow_call_stack_ptr, cpu) =
- scs_alloc(cpu_to_node(cpu));
+ scs_alloc(early_cpu_to_node(cpu));
}
#ifdef CONFIG_VMAP_STACK
-static void init_irq_stacks(void)
+static void __init init_irq_stacks(void)
{
int cpu;
unsigned long *p;
for_each_possible_cpu(cpu) {
- p = arch_alloc_vmap_stack(IRQ_STACK_SIZE, cpu_to_node(cpu));
+ p = arch_alloc_vmap_stack(IRQ_STACK_SIZE, early_cpu_to_node(cpu));
per_cpu(irq_stack_ptr, cpu) = p;
}
}
pr_info("KASLR enabled\n");
__kaslr_is_enabled = true;
}
+
+static int __init parse_nokaslr(char *unused)
+{
+ /* nokaslr param handling is done by early cpufeature code */
+ return 0;
+}
+early_param("nokaslr", parse_nokaslr);
KBUILD_CFLAGS := $(subst $(CC_FLAGS_FTRACE),,$(KBUILD_CFLAGS)) -fpie \
-Os -DDISABLE_BRANCH_PROFILING $(DISABLE_STACKLEAK_PLUGIN) \
+ $(DISABLE_LATENT_ENTROPY_PLUGIN) \
$(call cc-option,-mbranch-protection=none) \
-I$(srctree)/scripts/dtc/libfdt -fno-stack-protector \
-include $(srctree)/include/linux/hidden.h \
void __init smp_cpus_done(unsigned int max_cpus)
{
pr_info("SMP: Total of %d processors activated.\n", num_online_cpus());
- setup_system_features();
hyp_mode_check();
- apply_alternatives_all();
+ setup_system_features();
setup_user_features();
mark_linear_text_alias_ro();
}
* freed shortly, so we must move over to the runtime per-cpu area.
*/
set_my_cpu_offset(per_cpu_offset(smp_processor_id()));
- cpuinfo_store_boot_cpu();
- /*
- * We now know enough about the boot CPU to apply the
- * alternatives that cannot wait until interrupt handling
- * and/or scheduling is enabled.
- */
- apply_boot_alternatives();
+ cpuinfo_store_boot_cpu();
+ setup_boot_cpu_features();
/* Conditionally switch to GIC PMR for interrupt masking */
if (system_uses_irq_prio_masking())
#include <linux/efi.h>
#include <linux/export.h>
#include <linux/ftrace.h>
+#include <linux/kprobes.h>
#include <linux/sched.h>
#include <linux/sched/debug.h>
#include <linux/sched/task_stack.h>
#include <asm/stack_pointer.h>
#include <asm/stacktrace.h>
+/*
+ * Kernel unwind state
+ *
+ * @common: Common unwind state.
+ * @task: The task being unwound.
+ * @kr_cur: When KRETPROBES is selected, holds the kretprobe instance
+ * associated with the most recently encountered replacement lr
+ * value.
+ */
+struct kunwind_state {
+ struct unwind_state common;
+ struct task_struct *task;
+#ifdef CONFIG_KRETPROBES
+ struct llist_node *kr_cur;
+#endif
+};
+
+static __always_inline void
+kunwind_init(struct kunwind_state *state,
+ struct task_struct *task)
+{
+ unwind_init_common(&state->common);
+ state->task = task;
+}
+
/*
* Start an unwind from a pt_regs.
*
* The regs must be on a stack currently owned by the calling task.
*/
static __always_inline void
-unwind_init_from_regs(struct unwind_state *state,
- struct pt_regs *regs)
+kunwind_init_from_regs(struct kunwind_state *state,
+ struct pt_regs *regs)
{
- unwind_init_common(state, current);
+ kunwind_init(state, current);
- state->fp = regs->regs[29];
- state->pc = regs->pc;
+ state->common.fp = regs->regs[29];
+ state->common.pc = regs->pc;
}
/*
* The function which invokes this must be noinline.
*/
static __always_inline void
-unwind_init_from_caller(struct unwind_state *state)
+kunwind_init_from_caller(struct kunwind_state *state)
{
- unwind_init_common(state, current);
+ kunwind_init(state, current);
- state->fp = (unsigned long)__builtin_frame_address(1);
- state->pc = (unsigned long)__builtin_return_address(0);
+ state->common.fp = (unsigned long)__builtin_frame_address(1);
+ state->common.pc = (unsigned long)__builtin_return_address(0);
}
/*
* call this for the current task.
*/
static __always_inline void
-unwind_init_from_task(struct unwind_state *state,
- struct task_struct *task)
+kunwind_init_from_task(struct kunwind_state *state,
+ struct task_struct *task)
{
- unwind_init_common(state, task);
+ kunwind_init(state, task);
- state->fp = thread_saved_fp(task);
- state->pc = thread_saved_pc(task);
+ state->common.fp = thread_saved_fp(task);
+ state->common.pc = thread_saved_pc(task);
}
static __always_inline int
-unwind_recover_return_address(struct unwind_state *state)
+kunwind_recover_return_address(struct kunwind_state *state)
{
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
if (state->task->ret_stack &&
- (state->pc == (unsigned long)return_to_handler)) {
+ (state->common.pc == (unsigned long)return_to_handler)) {
unsigned long orig_pc;
- orig_pc = ftrace_graph_ret_addr(state->task, NULL, state->pc,
- (void *)state->fp);
- if (WARN_ON_ONCE(state->pc == orig_pc))
+ orig_pc = ftrace_graph_ret_addr(state->task, NULL,
+ state->common.pc,
+ (void *)state->common.fp);
+ if (WARN_ON_ONCE(state->common.pc == orig_pc))
return -EINVAL;
- state->pc = orig_pc;
+ state->common.pc = orig_pc;
}
#endif /* CONFIG_FUNCTION_GRAPH_TRACER */
#ifdef CONFIG_KRETPROBES
- if (is_kretprobe_trampoline(state->pc)) {
- state->pc = kretprobe_find_ret_addr(state->task,
- (void *)state->fp,
- &state->kr_cur);
+ if (is_kretprobe_trampoline(state->common.pc)) {
+ unsigned long orig_pc;
+ orig_pc = kretprobe_find_ret_addr(state->task,
+ (void *)state->common.fp,
+ &state->kr_cur);
+ state->common.pc = orig_pc;
}
#endif /* CONFIG_KRETPROBES */
* and the location (but not the fp value) of B.
*/
static __always_inline int
-unwind_next(struct unwind_state *state)
+kunwind_next(struct kunwind_state *state)
{
struct task_struct *tsk = state->task;
- unsigned long fp = state->fp;
+ unsigned long fp = state->common.fp;
int err;
/* Final frame; nothing to unwind */
if (fp == (unsigned long)task_pt_regs(tsk)->stackframe)
return -ENOENT;
- err = unwind_next_frame_record(state);
+ err = unwind_next_frame_record(&state->common);
if (err)
return err;
- state->pc = ptrauth_strip_kernel_insn_pac(state->pc);
+ state->common.pc = ptrauth_strip_kernel_insn_pac(state->common.pc);
- return unwind_recover_return_address(state);
+ return kunwind_recover_return_address(state);
}
+typedef bool (*kunwind_consume_fn)(const struct kunwind_state *state, void *cookie);
+
static __always_inline void
-unwind(struct unwind_state *state, stack_trace_consume_fn consume_entry,
- void *cookie)
+do_kunwind(struct kunwind_state *state, kunwind_consume_fn consume_state,
+ void *cookie)
{
- if (unwind_recover_return_address(state))
+ if (kunwind_recover_return_address(state))
return;
while (1) {
int ret;
- if (!consume_entry(cookie, state->pc))
+ if (!consume_state(state, cookie))
break;
- ret = unwind_next(state);
+ ret = kunwind_next(state);
if (ret < 0)
break;
}
: stackinfo_get_unknown(); \
})
-noinline noinstr void arch_stack_walk(stack_trace_consume_fn consume_entry,
- void *cookie, struct task_struct *task,
- struct pt_regs *regs)
+static __always_inline void
+kunwind_stack_walk(kunwind_consume_fn consume_state,
+ void *cookie, struct task_struct *task,
+ struct pt_regs *regs)
{
struct stack_info stacks[] = {
stackinfo_get_task(task),
STACKINFO_EFI,
#endif
};
- struct unwind_state state = {
- .stacks = stacks,
- .nr_stacks = ARRAY_SIZE(stacks),
+ struct kunwind_state state = {
+ .common = {
+ .stacks = stacks,
+ .nr_stacks = ARRAY_SIZE(stacks),
+ },
};
if (regs) {
if (task != current)
return;
- unwind_init_from_regs(&state, regs);
+ kunwind_init_from_regs(&state, regs);
} else if (task == current) {
- unwind_init_from_caller(&state);
+ kunwind_init_from_caller(&state);
} else {
- unwind_init_from_task(&state, task);
+ kunwind_init_from_task(&state, task);
}
- unwind(&state, consume_entry, cookie);
+ do_kunwind(&state, consume_state, cookie);
+}
+
+struct kunwind_consume_entry_data {
+ stack_trace_consume_fn consume_entry;
+ void *cookie;
+};
+
+static bool
+arch_kunwind_consume_entry(const struct kunwind_state *state, void *cookie)
+{
+ struct kunwind_consume_entry_data *data = cookie;
+ return data->consume_entry(data->cookie, state->common.pc);
+}
+
+noinline noinstr void arch_stack_walk(stack_trace_consume_fn consume_entry,
+ void *cookie, struct task_struct *task,
+ struct pt_regs *regs)
+{
+ struct kunwind_consume_entry_data data = {
+ .consume_entry = consume_entry,
+ .cookie = cookie,
+ };
+
+ kunwind_stack_walk(arch_kunwind_consume_entry, &data, task, regs);
}
static bool dump_backtrace_entry(void *arg, unsigned long where)
VDSO_CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os
# Build rules
-targets := $(c-obj-vdso) $(c-obj-vdso-gettimeofday) $(asm-obj-vdso) vdso.so vdso.so.dbg vdso.so.raw
+targets := $(c-obj-vdso) $(c-obj-vdso-gettimeofday) $(asm-obj-vdso) vdso.so vdso32.so.dbg vdso.so.raw
c-obj-vdso := $(addprefix $(obj)/, $(c-obj-vdso))
c-obj-vdso-gettimeofday := $(addprefix $(obj)/, $(c-obj-vdso-gettimeofday))
asm-obj-vdso := $(addprefix $(obj)/, $(asm-obj-vdso))
targets += vdso.lds
CPPFLAGS_vdso.lds += -P -C -U$(ARCH)
-include/generated/vdso32-offsets.h: $(obj)/vdso.so.dbg FORCE
+include/generated/vdso32-offsets.h: $(obj)/vdso32.so.dbg FORCE
$(call if_changed,vdsosym)
# Strip rule for vdso.so
$(obj)/vdso.so: OBJCOPYFLAGS := -S
-$(obj)/vdso.so: $(obj)/vdso.so.dbg FORCE
+$(obj)/vdso.so: $(obj)/vdso32.so.dbg FORCE
$(call if_changed,objcopy)
-$(obj)/vdso.so.dbg: $(obj)/vdso.so.raw $(obj)/$(munge) FORCE
+$(obj)/vdso32.so.dbg: $(obj)/vdso.so.raw $(obj)/$(munge) FORCE
$(call if_changed,vdsomunge)
# Link rule for the .so file, .lds has to be first
kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
kvm_timer_vcpu_terminate(vcpu);
kvm_pmu_vcpu_destroy(vcpu);
-
+ kvm_vgic_vcpu_destroy(vcpu);
kvm_arm_vcpu_destroy(vcpu);
}
#include <nvhe/pkvm.h>
#include <nvhe/trap_handler.h>
-/* Used by icache_is_vpipt(). */
+/* Used by icache_is_aliasing(). */
unsigned long __icache_flags;
/* Used by kvm_get_vttbr(). */
dsb(ish);
isb();
- /*
- * If the host is running at EL1 and we have a VPIPT I-cache,
- * then we must perform I-cache maintenance at EL2 in order for
- * it to have an effect on the guest. Since the guest cannot hit
- * I-cache lines allocated with a different VMID, we don't need
- * to worry about junk out of guest reset (we nuke the I-cache on
- * VMID rollover), but we do need to be careful when remapping
- * executable pages for the same guest. This can happen when KSM
- * takes a CoW fault on an executable page, copies the page into
- * a page that was previously mapped in the guest and then needs
- * to invalidate the guest view of the I-cache for that page
- * from EL1. To solve this, we invalidate the entire I-cache when
- * unmapping a page from a guest if we have a VPIPT I-cache but
- * the host is running at EL1. As above, we could do better if
- * we had the VA.
- *
- * The moral of this story is: if you have a VPIPT I-cache, then
- * you should be running with VHE enabled.
- */
- if (icache_is_vpipt())
- icache_inval_all_pou();
-
__tlb_switch_to_host(&cxt);
}
dsb(nsh);
isb();
- /*
- * If the host is running at EL1 and we have a VPIPT I-cache,
- * then we must perform I-cache maintenance at EL2 in order for
- * it to have an effect on the guest. Since the guest cannot hit
- * I-cache lines allocated with a different VMID, we don't need
- * to worry about junk out of guest reset (we nuke the I-cache on
- * VMID rollover), but we do need to be careful when remapping
- * executable pages for the same guest. This can happen when KSM
- * takes a CoW fault on an executable page, copies the page into
- * a page that was previously mapped in the guest and then needs
- * to invalidate the guest view of the I-cache for that page
- * from EL1. To solve this, we invalidate the entire I-cache when
- * unmapping a page from a guest if we have a VPIPT I-cache but
- * the host is running at EL1. As above, we could do better if
- * we had the VA.
- *
- * The moral of this story is: if you have a VPIPT I-cache, then
- * you should be running with VHE enabled.
- */
- if (icache_is_vpipt())
- icache_inval_all_pou();
-
__tlb_switch_to_host(&cxt);
}
dsb(ish);
isb();
- /* See the comment in __kvm_tlb_flush_vmid_ipa() */
- if (icache_is_vpipt())
- icache_inval_all_pou();
-
__tlb_switch_to_host(&cxt);
}
/* Same remark as in __tlb_switch_to_guest() */
dsb(ish);
__tlbi(alle1is);
-
- /*
- * VIPT and PIPT caches are not affected by VMID, so no maintenance
- * is necessary across a VMID rollover.
- *
- * VPIPT caches constrain lookup and maintenance to the active VMID,
- * so we need to invalidate lines with a stale VMID to avoid an ABA
- * race after multiple rollovers.
- *
- */
- if (icache_is_vpipt())
- asm volatile("ic ialluis");
-
dsb(ish);
}
{
dsb(ishst);
__tlbi(alle1is);
-
- /*
- * VIPT and PIPT caches are not affected by VMID, so no maintenance
- * is necessary across a VMID rollover.
- *
- * VPIPT caches constrain lookup and maintenance to the active VMID,
- * so we need to invalidate lines with a stale VMID to avoid an ABA
- * race after multiple rollovers.
- *
- */
- if (icache_is_vpipt())
- asm volatile("ic ialluis");
-
dsb(ish);
}
u64 kvm_pmu_valid_counter_mask(struct kvm_vcpu *vcpu)
{
- u64 val = kvm_vcpu_read_pmcr(vcpu) >> ARMV8_PMU_PMCR_N_SHIFT;
+ u64 val = FIELD_GET(ARMV8_PMU_PMCR_N, kvm_vcpu_read_pmcr(vcpu));
- val &= ARMV8_PMU_PMCR_N_MASK;
if (val == 0)
return BIT(ARMV8_PMU_CYCLE_IDX);
else
*/
u64 kvm_vcpu_read_pmcr(struct kvm_vcpu *vcpu)
{
- u64 pmcr = __vcpu_sys_reg(vcpu, PMCR_EL0) &
- ~(ARMV8_PMU_PMCR_N_MASK << ARMV8_PMU_PMCR_N_SHIFT);
+ u64 pmcr = __vcpu_sys_reg(vcpu, PMCR_EL0);
- return pmcr | ((u64)vcpu->kvm->arch.pmcr_n << ARMV8_PMU_PMCR_N_SHIFT);
+ return u64_replace_bits(pmcr, vcpu->kvm->arch.pmcr_n, ARMV8_PMU_PMCR_N);
}
u64 pmcr, val;
pmcr = kvm_vcpu_read_pmcr(vcpu);
- val = (pmcr >> ARMV8_PMU_PMCR_N_SHIFT) & ARMV8_PMU_PMCR_N_MASK;
+ val = FIELD_GET(ARMV8_PMU_PMCR_N, pmcr);
if (idx >= val && idx != ARMV8_PMU_CYCLE_IDX) {
kvm_inject_undefined(vcpu);
return false;
static int set_pmcr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r,
u64 val)
{
- u8 new_n = (val >> ARMV8_PMU_PMCR_N_SHIFT) & ARMV8_PMU_PMCR_N_MASK;
+ u8 new_n = FIELD_GET(ARMV8_PMU_PMCR_N, val);
struct kvm *kvm = vcpu->kvm;
mutex_lock(&kvm->arch.config_lock);
vgic_v4_teardown(kvm);
}
-void kvm_vgic_vcpu_destroy(struct kvm_vcpu *vcpu)
+static void __kvm_vgic_vcpu_destroy(struct kvm_vcpu *vcpu)
{
struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
vgic_flush_pending_lpis(vcpu);
INIT_LIST_HEAD(&vgic_cpu->ap_list_head);
- vgic_cpu->rd_iodev.base_addr = VGIC_ADDR_UNDEF;
+ if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3) {
+ vgic_unregister_redist_iodev(vcpu);
+ vgic_cpu->rd_iodev.base_addr = VGIC_ADDR_UNDEF;
+ }
}
-static void __kvm_vgic_destroy(struct kvm *kvm)
+void kvm_vgic_vcpu_destroy(struct kvm_vcpu *vcpu)
+{
+ struct kvm *kvm = vcpu->kvm;
+
+ mutex_lock(&kvm->slots_lock);
+ __kvm_vgic_vcpu_destroy(vcpu);
+ mutex_unlock(&kvm->slots_lock);
+}
+
+void kvm_vgic_destroy(struct kvm *kvm)
{
struct kvm_vcpu *vcpu;
unsigned long i;
- lockdep_assert_held(&kvm->arch.config_lock);
+ mutex_lock(&kvm->slots_lock);
vgic_debug_destroy(kvm);
kvm_for_each_vcpu(i, vcpu, kvm)
- kvm_vgic_vcpu_destroy(vcpu);
+ __kvm_vgic_vcpu_destroy(vcpu);
+
+ mutex_lock(&kvm->arch.config_lock);
kvm_vgic_dist_destroy(kvm);
-}
-void kvm_vgic_destroy(struct kvm *kvm)
-{
- mutex_lock(&kvm->arch.config_lock);
- __kvm_vgic_destroy(kvm);
mutex_unlock(&kvm->arch.config_lock);
+ mutex_unlock(&kvm->slots_lock);
}
/**
type = VGIC_V3;
}
- if (ret) {
- __kvm_vgic_destroy(kvm);
+ if (ret)
goto out;
- }
+
dist->ready = true;
dist_base = dist->vgic_dist_base;
mutex_unlock(&kvm->arch.config_lock);
ret = vgic_register_dist_iodev(kvm, dist_base, type);
- if (ret) {
+ if (ret)
kvm_err("Unable to register VGIC dist MMIO regions\n");
- kvm_vgic_destroy(kvm);
- }
- mutex_unlock(&kvm->slots_lock);
- return ret;
+ goto out_slots;
out:
mutex_unlock(&kvm->arch.config_lock);
+out_slots:
mutex_unlock(&kvm->slots_lock);
+
+ if (ret)
+ kvm_vgic_destroy(kvm);
+
return ret;
}
return ret;
}
-static void vgic_unregister_redist_iodev(struct kvm_vcpu *vcpu)
+void vgic_unregister_redist_iodev(struct kvm_vcpu *vcpu)
{
struct vgic_io_device *rd_dev = &vcpu->arch.vgic_cpu.rd_iodev;
unsigned long c;
int ret = 0;
+ lockdep_assert_held(&kvm->slots_lock);
+
kvm_for_each_vcpu(c, vcpu, kvm) {
ret = vgic_register_redist_iodev(vcpu);
if (ret)
int vgic_v3_save_pending_tables(struct kvm *kvm);
int vgic_v3_set_redist_base(struct kvm *kvm, u32 index, u64 addr, u32 count);
int vgic_register_redist_iodev(struct kvm_vcpu *vcpu);
+void vgic_unregister_redist_iodev(struct kvm_vcpu *vcpu);
bool vgic_v3_check_base(struct kvm *kvm);
void vgic_v3_load(struct kvm_vcpu *vcpu);
* x1 - src
*/
SYM_FUNC_START(__pi_copy_page)
-alternative_if ARM64_HAS_NO_HW_PREFETCH
- // Prefetch three cache lines ahead.
- prfm pldl1strm, [x1, #128]
- prfm pldl1strm, [x1, #256]
- prfm pldl1strm, [x1, #384]
-alternative_else_nop_endif
-
ldp x2, x3, [x1]
ldp x4, x5, [x1, #16]
ldp x6, x7, [x1, #32]
1:
tst x0, #(PAGE_SIZE - 1)
-alternative_if ARM64_HAS_NO_HW_PREFETCH
- prfm pldl1strm, [x1, #384]
-alternative_else_nop_endif
-
stnp x2, x3, [x0, #-256]
ldp x2, x3, [x1]
stnp x4, x5, [x0, #16 - 256]
goto done;
}
count_vm_vma_lock_event(VMA_LOCK_RETRY);
+ if (fault & VM_FAULT_MAJOR)
+ mm_flags |= FAULT_FLAG_TRIED;
/* Quick path to respond to signals */
if (fault_signal_pending(fault, regs)) {
EXPORT_SYMBOL(vabits_actual);
#endif
-u64 kimage_vaddr __ro_after_init = (u64)&_text;
-EXPORT_SYMBOL(kimage_vaddr);
-
u64 kimage_voffset __ro_after_init;
EXPORT_SYMBOL(kimage_voffset);
{
int i;
+ if (!arm64_kernel_unmapped_at_el0())
+ return 0;
+
pgprot_t prot = kernel_exec_prot();
phys_addr_t pa_start = __pa_symbol(__entry_tramp_text_start);
HAS_GIC_PRIO_RELAXED_SYNC
HAS_HCX
HAS_LDAPR
+HAS_LPA2
HAS_LSE_ATOMICS
HAS_MOPS
HAS_NESTED_VIRT
-HAS_NO_HW_PREFETCH
HAS_PAN
HAS_S1PIE
HAS_RAS_EXTN
EndEnum
EndSysreg
+Sysreg ID_AA64PFR2_EL1 3 0 0 4 2
+Res0 63:36
+UnsignedEnum 35:32 FPMR
+ 0b0000 NI
+ 0b0001 IMP
+EndEnum
+Res0 31:12
+UnsignedEnum 11:8 MTEFAR
+ 0b0000 NI
+ 0b0001 IMP
+EndEnum
+UnsignedEnum 7:4 MTESTOREONLY
+ 0b0000 NI
+ 0b0001 IMP
+EndEnum
+UnsignedEnum 3:0 MTEPERM
+ 0b0000 NI
+ 0b0001 IMP
+EndEnum
+EndSysreg
+
Sysreg ID_AA64ZFR0_EL1 3 0 0 4 4
Res0 63:60
UnsignedEnum 59:56 F64MM
0b0 NI
0b1 IMP
EndEnum
-Res0 62:60
+Res0 62:61
+UnsignedEnum 60 LUTv2
+ 0b0 NI
+ 0b1 IMP
+EndEnum
UnsignedEnum 59:56 SMEver
0b0000 SME
0b0001 SME2
0b0 NI
0b1 IMP
EndEnum
-Res0 41:40
+UnsignedEnum 41 F8F16
+ 0b0 NI
+ 0b1 IMP
+EndEnum
+UnsignedEnum 40 F8F32
+ 0b0 NI
+ 0b1 IMP
+EndEnum
UnsignedEnum 39:36 I8I32
0b0000 NI
0b1111 IMP
0b0 NI
0b1 IMP
EndEnum
-Res0 31:0
+Res0 31
+UnsignedEnum 30 SF8FMA
+ 0b0 NI
+ 0b1 IMP
+EndEnum
+UnsignedEnum 29 SF8DP4
+ 0b0 NI
+ 0b1 IMP
+EndEnum
+UnsignedEnum 28 SF8DP2
+ 0b0 NI
+ 0b1 IMP
+EndEnum
+Res0 27:0
+EndSysreg
+
+Sysreg ID_AA64FPFR0_EL1 3 0 0 4 7
+Res0 63:32
+UnsignedEnum 31 F8CVT
+ 0b0 NI
+ 0b1 IMP
+EndEnum
+UnsignedEnum 30 F8FMA
+ 0b0 NI
+ 0b1 IMP
+EndEnum
+UnsignedEnum 29 F8DP4
+ 0b0 NI
+ 0b1 IMP
+EndEnum
+UnsignedEnum 28 F8DP2
+ 0b0 NI
+ 0b1 IMP
+EndEnum
+Res0 27:2
+UnsignedEnum 1 F8E4M3
+ 0b0 NI
+ 0b1 IMP
+EndEnum
+UnsignedEnum 0 F8E5M2
+ 0b0 NI
+ 0b1 IMP
+EndEnum
EndSysreg
Sysreg ID_AA64DFR0_EL1 3 0 0 5 0
0b0000 UNPREDICTABLE
0b0001 DEF
EndEnum
-Res0 59:56
+UnsignedEnum 59:56 ExtTrcBuff
+ 0b0000 NI
+ 0b0001 IMP
+EndEnum
UnsignedEnum 55:52 BRBE
0b0000 NI
0b0001 IMP
0b0011 PAuth2
0b0100 FPAC
0b0101 FPACCOMBINE
+ 0b0110 PAuth_LR
EndEnum
UnsignedEnum 7:4 APA
0b0000 NI
0b0011 PAuth2
0b0100 FPAC
0b0101 FPACCOMBINE
+ 0b0110 PAuth_LR
EndEnum
UnsignedEnum 3:0 DPB
0b0000 NI
EndSysreg
Sysreg ID_AA64ISAR2_EL1 3 0 0 6 2
-Res0 63:56
+UnsignedEnum 63:60 ATS1A
+ 0b0000 NI
+ 0b0001 IMP
+EndEnum
+UnsignedEnum 59:56 LUT
+ 0b0000 NI
+ 0b0001 IMP
+EndEnum
UnsignedEnum 55:52 CSSC
0b0000 NI
0b0001 IMP
0b0000 NI
0b0001 IMP
EndEnum
-Res0 47:32
+Res0 47:44
+UnsignedEnum 43:40 PRFMSLC
+ 0b0000 NI
+ 0b0001 IMP
+EndEnum
+UnsignedEnum 39:36 SYSINSTR_128
+ 0b0000 NI
+ 0b0001 IMP
+EndEnum
+UnsignedEnum 35:32 SYSREG_128
+ 0b0000 NI
+ 0b0001 IMP
+EndEnum
UnsignedEnum 31:28 CLRBHB
0b0000 NI
0b0001 IMP
0b0011 PAuth2
0b0100 FPAC
0b0101 FPACCOMBINE
+ 0b0110 PAuth_LR
EndEnum
UnsignedEnum 11:8 GPA3
0b0000 NI
EndEnum
EndSysreg
+Sysreg ID_AA64ISAR3_EL1 3 0 0 6 3
+Res0 63:12
+UnsignedEnum 11:8 TLBIW
+ 0b0000 NI
+ 0b0001 IMP
+EndEnum
+UnsignedEnum 7:4 FAMINMAX
+ 0b0000 NI
+ 0b0001 IMP
+EndEnum
+UnsignedEnum 3:0 CPA
+ 0b0000 NI
+ 0b0001 IMP
+ 0b0010 CPA2
+EndEnum
+EndSysreg
+
Sysreg ID_AA64MMFR0_EL1 3 0 0 7 0
UnsignedEnum 63:60 ECV
0b0000 NI
Field 62 SPINTMASK
Field 61 NMI
Field 60 EnTP2
-Res0 59:58
+Field 59 TCSO
+Field 58 TCSO0
Field 57 EPAN
Field 56 EnALS
Field 55 EnAS0
Field 37 ITFSB
Field 36 BT1
Field 35 BT0
-Res0 34
+Field 34 EnFPM
Field 33 MSCEn
Field 32 CMOW
Field 31 EnIA
EndSysreg
SysregFields CPACR_ELx
-Res0 63:29
+Res0 63:30
+Field 29 E0POE
Field 28 TTA
Res0 27:26
Field 25:24 SMEN
Fields SMCR_ELx
EndSysreg
+SysregFields GCSCR_ELx
+Res0 63:10
+Field 9 STREn
+Field 8 PUSHMEn
+Res0 7
+Field 6 EXLOCKEN
+Field 5 RVCHKEN
+Res0 4:1
+Field 0 PCRSEL
+EndSysregFields
+
+Sysreg GCSCR_EL1 3 0 2 5 0
+Fields GCSCR_ELx
+EndSysreg
+
+SysregFields GCSPR_ELx
+Field 63:3 PTR
+Res0 2:0
+EndSysregFields
+
+Sysreg GCSPR_EL1 3 0 2 5 1
+Fields GCSPR_ELx
+EndSysreg
+
+Sysreg GCSCRE0_EL1 3 0 2 5 2
+Res0 63:11
+Field 10 nTR
+Field 9 STREn
+Field 8 PUSHMEn
+Res0 7:6
+Field 5 RVCHKEN
+Res0 4:1
+Field 0 PCRSEL
+EndSysreg
+
Sysreg ALLINT 3 0 4 3 0
Res0 63:14
Field 13 ALLINT
Fields CONTEXTIDR_ELx
EndSysreg
+Sysreg RCWSMASK_EL1 3 0 13 0 3
+Field 63:0 RCWSMASK
+EndSysreg
+
Sysreg TPIDR_EL1 3 0 13 0 4
Field 63:0 ThreadID
EndSysreg
+Sysreg RCWMASK_EL1 3 0 13 0 6
+Field 63:0 RCWMASK
+EndSysreg
+
Sysreg SCXTNUM_EL1 3 0 13 0 7
Field 63:0 SoftwareContextNumber
EndSysreg
Field 23:20 ERG
Field 19:16 DminLine
Enum 15:14 L1Ip
- 0b00 VPIPT
+ # This was named as VPIPT in the ARM but now documented as reserved
+ 0b00 RESERVED_VPIPT
# This is named as AIVIVT in the ARM but documented as reserved
- 0b01 RESERVED
+ 0b01 RESERVED_AIVIVT
0b10 VIPT
0b11 PIPT
EndEnum
Field 3:0 BS
EndSysreg
+Sysreg GCSPR_EL0 3 3 2 5 1
+Fields GCSPR_ELx
+EndSysreg
+
Sysreg SVCR 3 3 4 2 2
Res0 63:2
Field 1 ZA
Field 0 SM
EndSysreg
+Sysreg FPMR 3 3 4 4 2
+Res0 63:38
+Field 37:32 LSCALE2
+Field 31:24 NSCALE
+Res0 23
+Field 22:16 LSCALE
+Field 15 OSC
+Field 14 OSM
+Res0 13:9
+UnsignedEnum 8:6 F8D
+ 0b000 E5M2
+ 0b001 E4M3
+EndEnum
+UnsignedEnum 5:3 F8S2
+ 0b000 E5M2
+ 0b001 E4M3
+EndEnum
+UnsignedEnum 2:0 F8S1
+ 0b000 E5M2
+ 0b001 E4M3
+EndEnum
+EndSysreg
+
SysregFields HFGxTR_EL2
Field 63 nAMAIR2_EL1
Field 62 nMAIR2_EL1
EndSysreg
Sysreg HFGITR_EL2 3 4 1 1 6
-Res0 63:61
+Res0 63
+Field 62 ATS1E1A
+Res0 61
Field 60 COSPRCTX
Field 59 nGCSEPP
Field 58 nGCSSTR_EL1
Field 0 DBGBCRn_EL1
EndSysreg
+Sysreg HAFGRTR_EL2 3 4 3 1 6
+Res0 63:50
+Field 49 AMEVTYPER115_EL0
+Field 48 AMEVCNTR115_EL0
+Field 47 AMEVTYPER114_EL0
+Field 46 AMEVCNTR114_EL0
+Field 45 AMEVTYPER113_EL0
+Field 44 AMEVCNTR113_EL0
+Field 43 AMEVTYPER112_EL0
+Field 42 AMEVCNTR112_EL0
+Field 41 AMEVTYPER111_EL0
+Field 40 AMEVCNTR111_EL0
+Field 39 AMEVTYPER110_EL0
+Field 38 AMEVCNTR110_EL0
+Field 37 AMEVTYPER19_EL0
+Field 36 AMEVCNTR19_EL0
+Field 35 AMEVTYPER18_EL0
+Field 34 AMEVCNTR18_EL0
+Field 33 AMEVTYPER17_EL0
+Field 32 AMEVCNTR17_EL0
+Field 31 AMEVTYPER16_EL0
+Field 30 AMEVCNTR16_EL0
+Field 29 AMEVTYPER15_EL0
+Field 28 AMEVCNTR15_EL0
+Field 27 AMEVTYPER14_EL0
+Field 26 AMEVCNTR14_EL0
+Field 25 AMEVTYPER13_EL0
+Field 24 AMEVCNTR13_EL0
+Field 23 AMEVTYPER12_EL0
+Field 22 AMEVCNTR12_EL0
+Field 21 AMEVTYPER11_EL0
+Field 20 AMEVCNTR11_EL0
+Field 19 AMEVTYPER10_EL0
+Field 18 AMEVCNTR10_EL0
+Field 17 AMCNTEN1
+Res0 16:5
+Field 4 AMEVCNTR03_EL0
+Field 3 AMEVCNTR02_EL0
+Field 2 AMEVCNTR01_EL0
+Field 1 AMEVCNTR00_EL0
+Field 0 AMCNTEN0
+EndSysreg
+
Sysreg ZCR_EL2 3 4 1 2 0
Fields ZCR_ELx
EndSysreg
Sysreg HCRX_EL2 3 4 1 2 2
-Res0 63:23
+Res0 63:25
+Field 24 PACMEn
+Field 23 EnFPM
Field 22 GCSEn
Field 21 EnIDCP128
Field 20 EnSDERR
Fields SMCR_ELx
EndSysreg
+Sysreg GCSCR_EL2 3 4 2 5 0
+Fields GCSCR_ELx
+EndSysreg
+
+Sysreg GCSPR_EL2 3 4 2 5 1
+Fields GCSPR_ELx
+EndSysreg
+
Sysreg DACR32_EL2 3 4 3 0 0
Res0 63:32
Field 31:30 D15
Fields SMCR_ELx
EndSysreg
+Sysreg GCSCR_EL12 3 5 2 5 0
+Fields GCSCR_ELx
+EndSysreg
+
+Sysreg GCSPR_EL12 3 5 2 5 1
+Fields GCSPR_ELx
+EndSysreg
+
Sysreg FAR_EL12 3 5 6 0 0
Field 63:0 ADDR
EndSysreg
Field 0 PnCH
EndSysreg
+SysregFields MAIR2_ELx
+Field 63:56 Attr7
+Field 55:48 Attr6
+Field 47:40 Attr5
+Field 39:32 Attr4
+Field 31:24 Attr3
+Field 23:16 Attr2
+Field 15:8 Attr1
+Field 7:0 Attr0
+EndSysregFields
+
+Sysreg MAIR2_EL1 3 0 10 2 1
+Fields MAIR2_ELx
+EndSysreg
+
+Sysreg MAIR2_EL2 3 4 10 1 1
+Fields MAIR2_ELx
+EndSysreg
+
+Sysreg AMAIR2_EL1 3 0 10 3 1
+Field 63:0 ImpDef
+EndSysreg
+
+Sysreg AMAIR2_EL2 3 4 10 3 1
+Field 63:0 ImpDef
+EndSysreg
+
SysregFields PIRx_ELx
Field 63:60 Perm15
Field 59:56 Perm14
Fields PIRx_ELx
EndSysreg
+Sysreg POR_EL0 3 3 10 2 4
+Fields PIRx_ELx
+EndSysreg
+
+Sysreg POR_EL1 3 0 10 2 4
+Fields PIRx_ELx
+EndSysreg
+
+Sysreg POR_EL12 3 5 10 2 4
+Fields PIRx_ELx
+EndSysreg
+
+Sysreg S2POR_EL1 3 0 10 2 5
+Fields PIRx_ELx
+EndSysreg
+
+Sysreg S2PIR_EL2 3 4 10 2 5
+Fields PIRx_ELx
+EndSysreg
+
Sysreg LORSA_EL1 3 0 10 4 0
Res0 63:52
Field 51:16 SA
CONFIG_JFS_FS=m
CONFIG_OCFS2_FS=m
# CONFIG_OCFS2_DEBUG_MASKLOG is not set
+CONFIG_BCACHEFS_FS=m
CONFIG_FANOTIFY=y
CONFIG_QUOTA_NETLINK_INTERFACE=y
CONFIG_AUTOFS_FS=m
CONFIG_DLM=m
CONFIG_ENCRYPTED_KEYS=m
CONFIG_HARDENED_USERCOPY=y
-CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_USER=m
CONFIG_CRYPTO_CRYPTD=m
CONFIG_CRYPTO_TEST=m
CONFIG_JFS_FS=m
CONFIG_OCFS2_FS=m
# CONFIG_OCFS2_DEBUG_MASKLOG is not set
+CONFIG_BCACHEFS_FS=m
CONFIG_FANOTIFY=y
CONFIG_QUOTA_NETLINK_INTERFACE=y
CONFIG_AUTOFS_FS=m
CONFIG_DLM=m
CONFIG_ENCRYPTED_KEYS=m
CONFIG_HARDENED_USERCOPY=y
-CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_USER=m
CONFIG_CRYPTO_CRYPTD=m
CONFIG_CRYPTO_TEST=m
CONFIG_JFS_FS=m
CONFIG_OCFS2_FS=m
# CONFIG_OCFS2_DEBUG_MASKLOG is not set
+CONFIG_BCACHEFS_FS=m
CONFIG_FANOTIFY=y
CONFIG_QUOTA_NETLINK_INTERFACE=y
CONFIG_AUTOFS_FS=m
CONFIG_DLM=m
CONFIG_ENCRYPTED_KEYS=m
CONFIG_HARDENED_USERCOPY=y
-CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_USER=m
CONFIG_CRYPTO_CRYPTD=m
CONFIG_CRYPTO_TEST=m
CONFIG_JFS_FS=m
CONFIG_OCFS2_FS=m
# CONFIG_OCFS2_DEBUG_MASKLOG is not set
+CONFIG_BCACHEFS_FS=m
CONFIG_FANOTIFY=y
CONFIG_QUOTA_NETLINK_INTERFACE=y
CONFIG_AUTOFS_FS=m
CONFIG_DLM=m
CONFIG_ENCRYPTED_KEYS=m
CONFIG_HARDENED_USERCOPY=y
-CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_USER=m
CONFIG_CRYPTO_CRYPTD=m
CONFIG_CRYPTO_TEST=m
CONFIG_JFS_FS=m
CONFIG_OCFS2_FS=m
# CONFIG_OCFS2_DEBUG_MASKLOG is not set
+CONFIG_BCACHEFS_FS=m
CONFIG_FANOTIFY=y
CONFIG_QUOTA_NETLINK_INTERFACE=y
CONFIG_AUTOFS_FS=m
CONFIG_DLM=m
CONFIG_ENCRYPTED_KEYS=m
CONFIG_HARDENED_USERCOPY=y
-CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_USER=m
CONFIG_CRYPTO_CRYPTD=m
CONFIG_CRYPTO_TEST=m
CONFIG_L2TP=m
CONFIG_BRIDGE=m
CONFIG_ATALK=m
-CONFIG_DEV_APPLETALK=m
-CONFIG_IPDDP=m
-CONFIG_IPDDP_ENCAP=y
CONFIG_6LOWPAN=m
CONFIG_6LOWPAN_GHC_EXT_HDR_HOP=m
CONFIG_6LOWPAN_GHC_UDP=m
CONFIG_JFS_FS=m
CONFIG_OCFS2_FS=m
# CONFIG_OCFS2_DEBUG_MASKLOG is not set
+CONFIG_BCACHEFS_FS=m
CONFIG_FANOTIFY=y
CONFIG_QUOTA_NETLINK_INTERFACE=y
CONFIG_AUTOFS_FS=m
CONFIG_DLM=m
CONFIG_ENCRYPTED_KEYS=m
CONFIG_HARDENED_USERCOPY=y
-CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_USER=m
CONFIG_CRYPTO_CRYPTD=m
CONFIG_CRYPTO_TEST=m
CONFIG_L2TP=m
CONFIG_BRIDGE=m
CONFIG_ATALK=m
-CONFIG_DEV_APPLETALK=m
-CONFIG_IPDDP=m
-CONFIG_IPDDP_ENCAP=y
CONFIG_6LOWPAN=m
CONFIG_6LOWPAN_GHC_EXT_HDR_HOP=m
CONFIG_6LOWPAN_GHC_UDP=m
CONFIG_JFS_FS=m
CONFIG_OCFS2_FS=m
# CONFIG_OCFS2_DEBUG_MASKLOG is not set
+CONFIG_BCACHEFS_FS=m
CONFIG_FANOTIFY=y
CONFIG_QUOTA_NETLINK_INTERFACE=y
CONFIG_AUTOFS_FS=m
CONFIG_DLM=m
CONFIG_ENCRYPTED_KEYS=m
CONFIG_HARDENED_USERCOPY=y
-CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_USER=m
CONFIG_CRYPTO_CRYPTD=m
CONFIG_CRYPTO_TEST=m
CONFIG_JFS_FS=m
CONFIG_OCFS2_FS=m
# CONFIG_OCFS2_DEBUG_MASKLOG is not set
+CONFIG_BCACHEFS_FS=m
CONFIG_FANOTIFY=y
CONFIG_QUOTA_NETLINK_INTERFACE=y
CONFIG_AUTOFS_FS=m
CONFIG_DLM=m
CONFIG_ENCRYPTED_KEYS=m
CONFIG_HARDENED_USERCOPY=y
-CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_USER=m
CONFIG_CRYPTO_CRYPTD=m
CONFIG_CRYPTO_TEST=m
CONFIG_JFS_FS=m
CONFIG_OCFS2_FS=m
# CONFIG_OCFS2_DEBUG_MASKLOG is not set
+CONFIG_BCACHEFS_FS=m
CONFIG_FANOTIFY=y
CONFIG_QUOTA_NETLINK_INTERFACE=y
CONFIG_AUTOFS_FS=m
CONFIG_DLM=m
CONFIG_ENCRYPTED_KEYS=m
CONFIG_HARDENED_USERCOPY=y
-CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_USER=m
CONFIG_CRYPTO_CRYPTD=m
CONFIG_CRYPTO_TEST=m
CONFIG_JFS_FS=m
CONFIG_OCFS2_FS=m
# CONFIG_OCFS2_DEBUG_MASKLOG is not set
+CONFIG_BCACHEFS_FS=m
CONFIG_FANOTIFY=y
CONFIG_QUOTA_NETLINK_INTERFACE=y
CONFIG_AUTOFS_FS=m
CONFIG_DLM=m
CONFIG_ENCRYPTED_KEYS=m
CONFIG_HARDENED_USERCOPY=y
-CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_USER=m
CONFIG_CRYPTO_CRYPTD=m
CONFIG_CRYPTO_TEST=m
CONFIG_JFS_FS=m
CONFIG_OCFS2_FS=m
# CONFIG_OCFS2_DEBUG_MASKLOG is not set
+CONFIG_BCACHEFS_FS=m
CONFIG_FANOTIFY=y
CONFIG_QUOTA_NETLINK_INTERFACE=y
CONFIG_AUTOFS_FS=m
CONFIG_DLM=m
CONFIG_ENCRYPTED_KEYS=m
CONFIG_HARDENED_USERCOPY=y
-CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_USER=m
CONFIG_CRYPTO_CRYPTD=m
CONFIG_CRYPTO_TEST=m
CONFIG_JFS_FS=m
CONFIG_OCFS2_FS=m
# CONFIG_OCFS2_DEBUG_MASKLOG is not set
+CONFIG_BCACHEFS_FS=m
CONFIG_FANOTIFY=y
CONFIG_QUOTA_NETLINK_INTERFACE=y
CONFIG_AUTOFS_FS=m
CONFIG_DLM=m
CONFIG_ENCRYPTED_KEYS=m
CONFIG_HARDENED_USERCOPY=y
-CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_USER=m
CONFIG_CRYPTO_CRYPTD=m
CONFIG_CRYPTO_TEST=m
454 common futex_wake sys_futex_wake
455 common futex_wait sys_futex_wait
456 common futex_requeue sys_futex_requeue
+457 common statmount sys_statmount
+458 common listmount sys_listmount
454 common futex_wake sys_futex_wake
455 common futex_wait sys_futex_wait
456 common futex_requeue sys_futex_requeue
+457 common statmount sys_statmount
+458 common listmount sys_listmount
454 n32 futex_wake sys_futex_wake
455 n32 futex_wait sys_futex_wait
456 n32 futex_requeue sys_futex_requeue
+457 n32 statmount sys_statmount
+458 n32 listmount sys_listmount
454 n64 futex_wake sys_futex_wake
455 n64 futex_wait sys_futex_wait
456 n64 futex_requeue sys_futex_requeue
+457 n64 statmount sys_statmount
+458 n64 listmount sys_listmount
454 o32 futex_wake sys_futex_wake
455 o32 futex_wait sys_futex_wait
456 o32 futex_requeue sys_futex_requeue
+457 o32 statmount sys_statmount
+458 o32 listmount sys_listmount
454 common futex_wake sys_futex_wake
455 common futex_wait sys_futex_wait
456 common futex_requeue sys_futex_requeue
+457 common statmount sys_statmount
+458 common listmount sys_listmount
select EDAC_ATOMIC_SCRUB
select EDAC_SUPPORT
select FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY if ARCH_USING_PATCHABLE_FUNCTION_ENTRY
+ select FUNCTION_ALIGNMENT_4B
select GENERIC_ATOMIC64 if PPC32
select GENERIC_CLOCKEVENTS_BROADCAST if SMP
select GENERIC_CMOS_UPDATE
def_bool PPC_BOOK3S || PPC_E500 || (44x && !SMP)
config ARCH_SUPPORTS_KEXEC_FILE
- def_bool PPC64 && CRYPTO=y && CRYPTO_SHA256=y
+ def_bool PPC64
config ARCH_SUPPORTS_KEXEC_PURGATORY
- def_bool KEXEC_FILE
+ def_bool y
config ARCH_SELECTS_KEXEC_FILE
def_bool y
config PPC_EARLY_DEBUG_PS3GELIC
bool "Early debugging through the PS3 Ethernet port"
depends on PPC_PS3
- select PS3GELIC_UDBG
help
Select this to enable early debugging for the PlayStation3 via
UDP broadcasts sent out through the Ethernet port.
# Rewritten by Cort Dougan and Paul Mackerras
#
+ifdef cross_compiling
+ ifeq ($(CROSS_COMPILE),)
+ # Auto detect cross compiler prefix.
+ # Look for: (powerpc(64(le)?)?)(-unknown)?-linux(-gnu)?-
+ CC_ARCHES := powerpc powerpc64 powerpc64le
+ CC_SUFFIXES := linux linux-gnu unknown-linux-gnu
+ CROSS_COMPILE := $(call cc-cross-prefix, $(foreach a,$(CC_ARCHES), \
+ $(foreach s,$(CC_SUFFIXES),$(a)-$(s)-)))
+ endif
+endif
+
HAS_BIARCH := $(call cc-option-yn, -m32)
# Set default 32 bits cross compilers for vdso and boot wrapper
CROSS32_COMPILE ?=
# If we're on a ppc/ppc64/ppc64le machine use that defconfig, otherwise just use
-# ppc64_defconfig because we have nothing better to go on.
+# ppc64le_defconfig because we have nothing better to go on.
uname := $(shell uname -m)
-KBUILD_DEFCONFIG := $(if $(filter ppc%,$(uname)),$(uname),ppc64)_defconfig
+KBUILD_DEFCONFIG := $(if $(filter ppc%,$(uname)),$(uname),ppc64le)_defconfig
new_nm := $(shell if $(NM) --help 2>&1 | grep -- '--synthetic' > /dev/null; then echo y; else echo n; fi)
asinstr := $(call as-instr,lis 9$(comma)foo@high,-DHAVE_AS_ATHIGH=1)
-KBUILD_CPPFLAGS += -I $(srctree)/arch/$(ARCH) $(asinstr)
+KBUILD_CPPFLAGS += -I $(srctree)/arch/powerpc $(asinstr)
KBUILD_AFLAGS += $(AFLAGS-y)
KBUILD_CFLAGS += $(call cc-option,-msoft-float)
KBUILD_CFLAGS += $(CFLAGS-y)
PHONY += $(BOOT_TARGETS1) $(BOOT_TARGETS2)
-boot := arch/$(ARCH)/boot
+boot := arch/powerpc/boot
$(BOOT_TARGETS1): vmlinux
$(Q)$(MAKE) $(build)=$(boot) $(patsubst %,$(boot)/%,$@)
define archhelp
echo '* zImage - Build default images selected by kernel config'
- echo ' zImage.* - Compressed kernel image (arch/$(ARCH)/boot/zImage.*)'
+ echo ' zImage.* - Compressed kernel image (arch/powerpc/boot/zImage.*)'
echo ' uImage - U-Boot native image format'
echo ' cuImage.<dt> - Backwards compatible U-Boot image for older'
echo ' versions which do not support device trees'
echo ' (your) ~/bin/$(INSTALLKERNEL) or'
echo ' (distribution) /sbin/$(INSTALLKERNEL) or'
echo ' install to $$(INSTALL_PATH) and run lilo'
- echo ' *_defconfig - Select default config from arch/$(ARCH)/configs'
+ echo ' *_defconfig - Select default config from arch/powerpc/configs'
echo ''
echo ' Targets with <dt> embed a device tree blob inside the image'
echo ' These targets support board with firmware that does not'
echo ' support passing a device tree directly. Replace <dt> with the'
- echo ' name of a dts file from the arch/$(ARCH)/boot/dts/ directory'
+ echo ' name of a dts file from the arch/powerpc/boot/dts/ directory'
echo ' (minus the .dts extension).'
echo
$(foreach cfg,$(generated_configs),
reg = <0xf0000 0x1000>;
interrupts = <18 2 0 0>;
fsl,tmu-range = <0xb0000 0xa0026 0x80048 0x30061>;
- fsl,tmu-calibration = <0x00000000 0x0000000f
- 0x00000001 0x00000017
- 0x00000002 0x0000001e
- 0x00000003 0x00000026
- 0x00000004 0x0000002e
- 0x00000005 0x00000035
- 0x00000006 0x0000003d
- 0x00000007 0x00000044
- 0x00000008 0x0000004c
- 0x00000009 0x00000053
- 0x0000000a 0x0000005b
- 0x0000000b 0x00000064
-
- 0x00010000 0x00000011
- 0x00010001 0x0000001c
- 0x00010002 0x00000024
- 0x00010003 0x0000002b
- 0x00010004 0x00000034
- 0x00010005 0x00000039
- 0x00010006 0x00000042
- 0x00010007 0x0000004c
- 0x00010008 0x00000051
- 0x00010009 0x0000005a
- 0x0001000a 0x00000063
-
- 0x00020000 0x00000013
- 0x00020001 0x00000019
- 0x00020002 0x00000024
- 0x00020003 0x0000002c
- 0x00020004 0x00000035
- 0x00020005 0x0000003d
- 0x00020006 0x00000046
- 0x00020007 0x00000050
- 0x00020008 0x00000059
-
- 0x00030000 0x00000002
- 0x00030001 0x0000000d
- 0x00030002 0x00000019
- 0x00030003 0x00000024>;
+ fsl,tmu-calibration =
+ <0x00000000 0x0000000f>,
+ <0x00000001 0x00000017>,
+ <0x00000002 0x0000001e>,
+ <0x00000003 0x00000026>,
+ <0x00000004 0x0000002e>,
+ <0x00000005 0x00000035>,
+ <0x00000006 0x0000003d>,
+ <0x00000007 0x00000044>,
+ <0x00000008 0x0000004c>,
+ <0x00000009 0x00000053>,
+ <0x0000000a 0x0000005b>,
+ <0x0000000b 0x00000064>,
+
+ <0x00010000 0x00000011>,
+ <0x00010001 0x0000001c>,
+ <0x00010002 0x00000024>,
+ <0x00010003 0x0000002b>,
+ <0x00010004 0x00000034>,
+ <0x00010005 0x00000039>,
+ <0x00010006 0x00000042>,
+ <0x00010007 0x0000004c>,
+ <0x00010008 0x00000051>,
+ <0x00010009 0x0000005a>,
+ <0x0001000a 0x00000063>,
+
+ <0x00020000 0x00000013>,
+ <0x00020001 0x00000019>,
+ <0x00020002 0x00000024>,
+ <0x00020003 0x0000002c>,
+ <0x00020004 0x00000035>,
+ <0x00020005 0x0000003d>,
+ <0x00020006 0x00000046>,
+ <0x00020007 0x00000050>,
+ <0x00020008 0x00000059>,
+
+ <0x00030000 0x00000002>,
+ <0x00030001 0x0000000d>,
+ <0x00030002 0x00000019>,
+ <0x00030003 0x00000024>;
#thermal-sensor-cells = <1>;
};
reg = <0xf0000 0x1000>;
interrupts = <18 2 0 0>;
fsl,tmu-range = <0xa0000 0x90026 0x8004a 0x1006a>;
- fsl,tmu-calibration = <0x00000000 0x00000025
- 0x00000001 0x00000028
- 0x00000002 0x0000002d
- 0x00000003 0x00000031
- 0x00000004 0x00000036
- 0x00000005 0x0000003a
- 0x00000006 0x00000040
- 0x00000007 0x00000044
- 0x00000008 0x0000004a
- 0x00000009 0x0000004f
- 0x0000000a 0x00000054
-
- 0x00010000 0x0000000d
- 0x00010001 0x00000013
- 0x00010002 0x00000019
- 0x00010003 0x0000001f
- 0x00010004 0x00000025
- 0x00010005 0x0000002d
- 0x00010006 0x00000033
- 0x00010007 0x00000043
- 0x00010008 0x0000004b
- 0x00010009 0x00000053
-
- 0x00020000 0x00000010
- 0x00020001 0x00000017
- 0x00020002 0x0000001f
- 0x00020003 0x00000029
- 0x00020004 0x00000031
- 0x00020005 0x0000003c
- 0x00020006 0x00000042
- 0x00020007 0x0000004d
- 0x00020008 0x00000056
-
- 0x00030000 0x00000012
- 0x00030001 0x0000001d>;
+ fsl,tmu-calibration =
+ <0x00000000 0x00000025>,
+ <0x00000001 0x00000028>,
+ <0x00000002 0x0000002d>,
+ <0x00000003 0x00000031>,
+ <0x00000004 0x00000036>,
+ <0x00000005 0x0000003a>,
+ <0x00000006 0x00000040>,
+ <0x00000007 0x00000044>,
+ <0x00000008 0x0000004a>,
+ <0x00000009 0x0000004f>,
+ <0x0000000a 0x00000054>,
+
+ <0x00010000 0x0000000d>,
+ <0x00010001 0x00000013>,
+ <0x00010002 0x00000019>,
+ <0x00010003 0x0000001f>,
+ <0x00010004 0x00000025>,
+ <0x00010005 0x0000002d>,
+ <0x00010006 0x00000033>,
+ <0x00010007 0x00000043>,
+ <0x00010008 0x0000004b>,
+ <0x00010009 0x00000053>,
+
+ <0x00020000 0x00000010>,
+ <0x00020001 0x00000017>,
+ <0x00020002 0x0000001f>,
+ <0x00020003 0x00000029>,
+ <0x00020004 0x00000031>,
+ <0x00020005 0x0000003c>,
+ <0x00020006 0x00000042>,
+ <0x00020007 0x0000004d>,
+ <0x00020008 0x00000056>,
+
+ <0x00030000 0x00000012>,
+ <0x00030001 0x0000001d>;
#thermal-sensor-cells = <1>;
};
CONFIG_MEMORY_HOTREMOVE=y
CONFIG_KSM=y
CONFIG_TRANSPARENT_HUGEPAGE=y
+CONFIG_MEM_SOFT_DIRTY=y
CONFIG_ZONE_DEVICE=y
CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_PS3_LPM=m
# CONFIG_PPC_OF_BOOT_TRAMPOLINE is not set
CONFIG_KEXEC=y
+# CONFIG_PPC64_BIG_ENDIAN_ELF_ABI_V2 is not set
CONFIG_PPC_4K_PAGES=y
CONFIG_SCHED_SMT=y
CONFIG_PM=y
#define _PAGE_EXEC 0x00001 /* execute permission */
#define _PAGE_WRITE 0x00002 /* write access allowed */
#define _PAGE_READ 0x00004 /* read access allowed */
-#define _PAGE_NA _PAGE_PRIVILEGED
-#define _PAGE_NAX _PAGE_EXEC
-#define _PAGE_RO _PAGE_READ
-#define _PAGE_ROX (_PAGE_READ | _PAGE_EXEC)
-#define _PAGE_RW (_PAGE_READ | _PAGE_WRITE)
-#define _PAGE_RWX (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC)
#define _PAGE_PRIVILEGED 0x00008 /* kernel access only */
#define _PAGE_SAO 0x00010 /* Strong access order */
#define _PAGE_NON_IDEMPOTENT 0x00020 /* non idempotent memory */
static inline bool pte_access_permitted(pte_t pte, bool write)
{
/*
- * _PAGE_READ is needed for any access and will be
- * cleared for PROT_NONE
+ * _PAGE_READ is needed for any access and will be cleared for
+ * PROT_NONE. Execute-only mapping via PROT_EXEC also returns false.
*/
if (!pte_present(pte) || !pte_user(pte) || !pte_read(pte))
return false;
*/
}
-static inline bool __pte_protnone(unsigned long pte)
-{
- return (pte & (pgprot_val(PAGE_NONE) | _PAGE_RWX)) == pgprot_val(PAGE_NONE);
-}
-
static inline bool __pte_flags_need_flush(unsigned long oldval,
unsigned long newval)
{
/*
* We do not expect kernel mappings or non-PTEs or not-present PTEs.
*/
- VM_WARN_ON_ONCE(!__pte_protnone(oldval) && oldval & _PAGE_PRIVILEGED);
- VM_WARN_ON_ONCE(!__pte_protnone(newval) && newval & _PAGE_PRIVILEGED);
+ VM_WARN_ON_ONCE(oldval & _PAGE_PRIVILEGED);
+ VM_WARN_ON_ONCE(newval & _PAGE_PRIVILEGED);
VM_WARN_ON_ONCE(!(oldval & _PAGE_PTE));
VM_WARN_ON_ONCE(!(newval & _PAGE_PTE));
VM_WARN_ON_ONCE(!(oldval & _PAGE_PRESENT));
if (IS_ENABLED(CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY))
addr += MCOUNT_INSN_SIZE;
- return addr;
+ return addr;
}
unsigned long prepare_ftrace_return(unsigned long parent, unsigned long ip,
#define H_GET_ENERGY_SCALE_INFO 0x450
#define H_PKS_SIGNED_UPDATE 0x454
#define H_WATCHDOG 0x45C
-#define MAX_HCALL_OPCODE H_WATCHDOG
+#define H_GUEST_GET_CAPABILITIES 0x460
+#define H_GUEST_SET_CAPABILITIES 0x464
+#define H_GUEST_CREATE 0x470
+#define H_GUEST_CREATE_VCPU 0x474
+#define H_GUEST_GET_STATE 0x478
+#define H_GUEST_SET_STATE 0x47C
+#define H_GUEST_RUN_VCPU 0x480
+#define H_GUEST_COPY_MEMORY 0x484
+#define H_GUEST_DELETE 0x488
+#define MAX_HCALL_OPCODE H_GUEST_DELETE
/* Scope args for H_SCM_UNBIND_ALL */
#define H_UNBIND_SCOPE_ALL (0x1)
#define H_ENTER_NESTED 0xF804
#define H_TLB_INVALIDATE 0xF808
#define H_COPY_TOFROM_GUEST 0xF80C
-#define H_GUEST_GET_CAPABILITIES 0x460
-#define H_GUEST_SET_CAPABILITIES 0x464
-#define H_GUEST_CREATE 0x470
-#define H_GUEST_CREATE_VCPU 0x474
-#define H_GUEST_GET_STATE 0x478
-#define H_GUEST_SET_STATE 0x47C
-#define H_GUEST_RUN_VCPU 0x480
-#define H_GUEST_COPY_MEMORY 0x484
-#define H_GUEST_DELETE 0x488
/* Flags for H_SVM_PAGE_IN */
#define H_PAGE_IN_SHARED 0x1
void kvmhv_vm_nested_init(struct kvm *kvm);
long kvmhv_set_partition_table(struct kvm_vcpu *vcpu);
long kvmhv_copy_tofrom_guest_nested(struct kvm_vcpu *vcpu);
+void kvmhv_flush_lpid(u64 lpid);
void kvmhv_set_ptbl_entry(u64 lpid, u64 dw0, u64 dw1);
void kvmhv_release_all_nested(struct kvm *kvm);
long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu);
KVMPPC_BOOK3S_VCORE_ACCESSOR(vtb, 64, KVMPPC_GSID_VTB)
-KVMPPC_BOOK3S_VCORE_ACCESSOR(tb_offset, 64, KVMPPC_GSID_TB_OFFSET)
KVMPPC_BOOK3S_VCORE_ACCESSOR_GET(arch_compat, 32, KVMPPC_GSID_LOGICAL_PVR)
KVMPPC_BOOK3S_VCORE_ACCESSOR_GET(lpcr, 64, KVMPPC_GSID_LPCR)
+KVMPPC_BOOK3S_VCORE_ACCESSOR_SET(tb_offset, 64, KVMPPC_GSID_TB_OFFSET)
+
+static inline u64 kvmppc_get_tb_offset(struct kvm_vcpu *vcpu)
+{
+ return vcpu->arch.vcore->tb_offset;
+}
static inline u64 kvmppc_get_dec_expires(struct kvm_vcpu *vcpu)
{
- WARN_ON(kvmhv_nestedv2_cached_reload(vcpu, KVMPPC_GSID_TB_OFFSET) < 0);
WARN_ON(kvmhv_nestedv2_cached_reload(vcpu, KVMPPC_GSID_DEC_EXPIRY_TB) < 0);
return vcpu->arch.dec_expires;
}
static inline void kvmppc_set_dec_expires(struct kvm_vcpu *vcpu, u64 val)
{
vcpu->arch.dec_expires = val;
- WARN_ON(kvmhv_nestedv2_cached_reload(vcpu, KVMPPC_GSID_TB_OFFSET) < 0);
kvmhv_nestedv2_mark_dirty(vcpu, KVMPPC_GSID_DEC_EXPIRY_TB);
}
int kvmhv_nestedv2_flush_vcpu(struct kvm_vcpu *vcpu, u64 time_limit);
int kvmhv_nestedv2_set_ptbl_entry(unsigned long lpid, u64 dw0, u64 dw1);
int kvmhv_nestedv2_parse_output(struct kvm_vcpu *vcpu);
+int kvmhv_nestedv2_set_vpa(struct kvm_vcpu *vcpu, unsigned long vpa);
#endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
#include <asm/types.h>
-#define __ALIGN .align 2
-#define __ALIGN_STR ".align 2"
-
#ifdef CONFIG_PPC64_ELF_ABI_V1
#define cond_syscall(x) \
asm ("\t.weak " #x "\n\t.set " #x ", sys_ni_syscall\n" \
#include <asm/nohash/mmu.h>
#endif
+#if defined(CONFIG_FA_DUMP) || defined(CONFIG_PRESERVE_FA_DUMP)
+#define __HAVE_ARCH_RESERVED_KERNEL_PAGES
+#endif
+
#endif /* __KERNEL__ */
#endif /* _ASM_POWERPC_MMU_H_ */
#else
#define memory_hotplug_max() memblock_end_of_DRAM()
#endif /* CONFIG_NUMA */
-#ifdef CONFIG_FA_DUMP
-#define __HAVE_ARCH_RESERVED_KERNEL_PAGES
-#endif
-
-#ifdef CONFIG_MEMORY_HOTPLUG
-extern int create_section_mapping(unsigned long start, unsigned long end,
- int nid, pgprot_t prot);
-#endif
#endif /* __KERNEL__ */
#endif /* _ASM_MMZONE_H_ */
#ifndef _ASM_POWERPC_PAPR_SYSPARM_H
#define _ASM_POWERPC_PAPR_SYSPARM_H
+#include <uapi/asm/papr-sysparm.h>
+
typedef struct {
- const u32 token;
+ u32 token;
} papr_sysparm_t;
#define mk_papr_sysparm(x_) ((papr_sysparm_t){ .token = x_, })
#define PAPR_SYSPARM_TLB_BLOCK_INVALIDATE_ATTRS mk_papr_sysparm(50)
#define PAPR_SYSPARM_LPAR_NAME mk_papr_sysparm(55)
-enum {
- PAPR_SYSPARM_MAX_INPUT = 1024,
- PAPR_SYSPARM_MAX_OUTPUT = 4000,
-};
-
+/**
+ * struct papr_sysparm_buf - RTAS work area layout for system parameter functions.
+ *
+ * This is the memory layout of the buffers passed to/from
+ * ibm,get-system-parameter and ibm,set-system-parameter. It is
+ * distinct from the papr_sysparm_io_block structure that is passed
+ * between user space and the kernel.
+ */
struct papr_sysparm_buf {
__be16 len;
char val[PAPR_SYSPARM_MAX_OUTPUT];
{
return lppaca_of(vcpu).idle;
}
+
+static inline bool vcpu_is_dispatched(int vcpu)
+{
+ /*
+ * This is the yield_count. An "odd" value (low bit on) means that
+ * the processor is yielded (either because of an OS yield or a
+ * hypervisor preempt). An even value implies that the processor is
+ * currently executing.
+ */
+ return (!(yield_count_of(vcpu) & 1));
+}
#else
static inline bool is_shared_processor(void)
{
{
return false;
}
+static inline bool vcpu_is_dispatched(int vcpu)
+{
+ return true;
+}
#endif
#define vcpu_is_preempted vcpu_is_preempted
* If the hypervisor has dispatched the target CPU on a physical
* processor, then the target CPU is definitely not preempted.
*/
- if (!(yield_count_of(cpu) & 1))
+ if (vcpu_is_dispatched(cpu))
return false;
/*
- * If the target CPU has yielded to Hypervisor but OS has not
- * requested idle then the target CPU is definitely preempted.
+ * if the target CPU is not dispatched and the guest OS
+ * has not marked the CPU idle, then it is hypervisor preempted.
*/
if (!is_vcpu_idle(cpu))
return true;
/*
* The PowerVM hypervisor dispatches VMs on a whole core
- * basis. So we know that a thread sibling of the local CPU
+ * basis. So we know that a thread sibling of the executing CPU
* cannot have been preempted by the hypervisor, even if it
* has called H_CONFER, which will set the yield bit.
*/
return false;
/*
- * If any of the threads of the target CPU's core are not
- * preempted or ceded, then consider target CPU to be
- * non-preempted.
+ * The specific target CPU was marked by guest OS as idle, but
+ * then also check all other cpus in the core for PowerVM
+ * because it does core scheduling and one of the vcpu
+ * of the core getting preempted by hypervisor implies
+ * other vcpus can also be considered preempted.
*/
first_cpu = cpu_first_thread_sibling(cpu);
for (i = first_cpu; i < first_cpu + threads_per_core; i++) {
if (i == cpu)
continue;
- if (!(yield_count_of(i) & 1))
+ if (vcpu_is_dispatched(i))
return false;
if (!is_vcpu_idle(i))
return true;
extern unsigned long get_phb_buid (struct device_node *);
extern int rtas_setup_phb(struct pci_controller *phb);
+int rtas_pci_dn_read_config(struct pci_dn *pdn, int where, int size, u32 *val);
+int rtas_pci_dn_write_config(struct pci_dn *pdn, int where, int size, u32 val);
+
#ifdef CONFIG_EEH
void eeh_addr_cache_insert_dev(struct pci_dev *dev);
int eeh_pci_enable(struct eeh_pe *pe, int function);
int eeh_pe_reset_full(struct eeh_pe *pe, bool include_passed);
void eeh_save_bars(struct eeh_dev *edev);
-int rtas_write_config(struct pci_dn *, int where, int size, u32 val);
-int rtas_read_config(struct pci_dn *, int where, int size, u32 *val);
void eeh_pe_state_mark(struct eeh_pe *pe, int state);
void eeh_pe_mark_isolated(struct eeh_pe *pe);
void eeh_pe_state_clear(struct eeh_pe *pe, int state, bool include_passed);
void ps3_early_mm_init(void);
+#ifdef CONFIG_PPC_EARLY_DEBUG_PS3GELIC
+void udbg_shutdown_ps3gelic(void);
+#else
+static inline void udbg_shutdown_ps3gelic(void) {}
+#endif
+
#endif
#define PVR_POWER8E 0x004B
#define PVR_POWER8NVL 0x004C
#define PVR_POWER8 0x004D
+#define PVR_HX_C2000 0x0066
#define PVR_POWER9 0x004E
#define PVR_POWER10 0x0080
#define PVR_BE 0x0070
+++ /dev/null
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * Register definitions specific to the A2 core
- *
- * Copyright (C) 2008 Ben. Herrenschmidt (benh@kernel.crashing.org), IBM Corp.
- */
-
-#ifndef __ASM_POWERPC_REG_A2_H__
-#define __ASM_POWERPC_REG_A2_H__
-
-#include <asm/asm-const.h>
-
-#define SPRN_TENSR 0x1b5
-#define SPRN_TENS 0x1b6 /* Thread ENable Set */
-#define SPRN_TENC 0x1b7 /* Thread ENable Clear */
-
-#define SPRN_A2_CCR0 0x3f0 /* Core Configuration Register 0 */
-#define SPRN_A2_CCR1 0x3f1 /* Core Configuration Register 1 */
-#define SPRN_A2_CCR2 0x3f2 /* Core Configuration Register 2 */
-#define SPRN_MMUCR0 0x3fc /* MMU Control Register 0 */
-#define SPRN_MMUCR1 0x3fd /* MMU Control Register 1 */
-#define SPRN_MMUCR2 0x3fe /* MMU Control Register 2 */
-#define SPRN_MMUCR3 0x3ff /* MMU Control Register 3 */
-
-#define SPRN_IAR 0x372
-
-#define SPRN_IUCR0 0x3f3
-#define IUCR0_ICBI_ACK 0x1000
-
-#define SPRN_XUCR0 0x3f6 /* Execution Unit Config Register 0 */
-
-#define A2_IERAT_SIZE 16
-#define A2_DERAT_SIZE 32
-
-/* A2 MMUCR0 bits */
-#define MMUCR0_ECL 0x80000000 /* Extended Class for TLB fills */
-#define MMUCR0_TID_NZ 0x40000000 /* TID is non-zero */
-#define MMUCR0_TS 0x10000000 /* Translation space for TLB fills */
-#define MMUCR0_TGS 0x20000000 /* Guest space for TLB fills */
-#define MMUCR0_TLBSEL 0x0c000000 /* TLB or ERAT target for TLB fills */
-#define MMUCR0_TLBSEL_U 0x00000000 /* TLBSEL = UTLB */
-#define MMUCR0_TLBSEL_I 0x08000000 /* TLBSEL = I-ERAT */
-#define MMUCR0_TLBSEL_D 0x0c000000 /* TLBSEL = D-ERAT */
-#define MMUCR0_LOCKSRSH 0x02000000 /* Use TLB lock on tlbsx. */
-#define MMUCR0_TID_MASK 0x000000ff /* TID field */
-
-/* A2 MMUCR1 bits */
-#define MMUCR1_IRRE 0x80000000 /* I-ERAT round robin enable */
-#define MMUCR1_DRRE 0x40000000 /* D-ERAT round robin enable */
-#define MMUCR1_REE 0x20000000 /* Reference Exception Enable*/
-#define MMUCR1_CEE 0x10000000 /* Change exception enable */
-#define MMUCR1_CSINV_ALL 0x00000000 /* Inval ERAT on all CS evts */
-#define MMUCR1_CSINV_NISYNC 0x04000000 /* Inval ERAT on all ex isync*/
-#define MMUCR1_CSINV_NEVER 0x0c000000 /* Don't inval ERAT on CS */
-#define MMUCR1_ICTID 0x00080000 /* IERAT class field as TID */
-#define MMUCR1_ITTID 0x00040000 /* IERAT thdid field as TID */
-#define MMUCR1_DCTID 0x00020000 /* DERAT class field as TID */
-#define MMUCR1_DTTID 0x00010000 /* DERAT thdid field as TID */
-#define MMUCR1_DCCD 0x00008000 /* DERAT class ignore */
-#define MMUCR1_TLBWE_BINV 0x00004000 /* back invalidate on tlbwe */
-
-/* A2 MMUCR2 bits */
-#define MMUCR2_PSSEL_SHIFT 4
-
-/* A2 MMUCR3 bits */
-#define MMUCR3_THID 0x0000000f /* Thread ID */
-
-/* *** ERAT TLB bits definitions */
-#define TLB0_EPN_MASK ASM_CONST(0xfffffffffffff000)
-#define TLB0_CLASS_MASK ASM_CONST(0x0000000000000c00)
-#define TLB0_CLASS_00 ASM_CONST(0x0000000000000000)
-#define TLB0_CLASS_01 ASM_CONST(0x0000000000000400)
-#define TLB0_CLASS_10 ASM_CONST(0x0000000000000800)
-#define TLB0_CLASS_11 ASM_CONST(0x0000000000000c00)
-#define TLB0_V ASM_CONST(0x0000000000000200)
-#define TLB0_X ASM_CONST(0x0000000000000100)
-#define TLB0_SIZE_MASK ASM_CONST(0x00000000000000f0)
-#define TLB0_SIZE_4K ASM_CONST(0x0000000000000010)
-#define TLB0_SIZE_64K ASM_CONST(0x0000000000000030)
-#define TLB0_SIZE_1M ASM_CONST(0x0000000000000050)
-#define TLB0_SIZE_16M ASM_CONST(0x0000000000000070)
-#define TLB0_SIZE_1G ASM_CONST(0x00000000000000a0)
-#define TLB0_THDID_MASK ASM_CONST(0x000000000000000f)
-#define TLB0_THDID_0 ASM_CONST(0x0000000000000001)
-#define TLB0_THDID_1 ASM_CONST(0x0000000000000002)
-#define TLB0_THDID_2 ASM_CONST(0x0000000000000004)
-#define TLB0_THDID_3 ASM_CONST(0x0000000000000008)
-#define TLB0_THDID_ALL ASM_CONST(0x000000000000000f)
-
-#define TLB1_RESVATTR ASM_CONST(0x00f0000000000000)
-#define TLB1_U0 ASM_CONST(0x0008000000000000)
-#define TLB1_U1 ASM_CONST(0x0004000000000000)
-#define TLB1_U2 ASM_CONST(0x0002000000000000)
-#define TLB1_U3 ASM_CONST(0x0001000000000000)
-#define TLB1_R ASM_CONST(0x0000800000000000)
-#define TLB1_C ASM_CONST(0x0000400000000000)
-#define TLB1_RPN_MASK ASM_CONST(0x000003fffffff000)
-#define TLB1_W ASM_CONST(0x0000000000000800)
-#define TLB1_I ASM_CONST(0x0000000000000400)
-#define TLB1_M ASM_CONST(0x0000000000000200)
-#define TLB1_G ASM_CONST(0x0000000000000100)
-#define TLB1_E ASM_CONST(0x0000000000000080)
-#define TLB1_VF ASM_CONST(0x0000000000000040)
-#define TLB1_UX ASM_CONST(0x0000000000000020)
-#define TLB1_SX ASM_CONST(0x0000000000000010)
-#define TLB1_UW ASM_CONST(0x0000000000000008)
-#define TLB1_SW ASM_CONST(0x0000000000000004)
-#define TLB1_UR ASM_CONST(0x0000000000000002)
-#define TLB1_SR ASM_CONST(0x0000000000000001)
-
-/* A2 erativax attributes definitions */
-#define ERATIVAX_RS_IS_ALL 0x000
-#define ERATIVAX_RS_IS_TID 0x040
-#define ERATIVAX_RS_IS_CLASS 0x080
-#define ERATIVAX_RS_IS_FULLMATCH 0x0c0
-#define ERATIVAX_CLASS_00 0x000
-#define ERATIVAX_CLASS_01 0x010
-#define ERATIVAX_CLASS_10 0x020
-#define ERATIVAX_CLASS_11 0x030
-#define ERATIVAX_PSIZE_4K (TLB_PSIZE_4K >> 1)
-#define ERATIVAX_PSIZE_64K (TLB_PSIZE_64K >> 1)
-#define ERATIVAX_PSIZE_1M (TLB_PSIZE_1M >> 1)
-#define ERATIVAX_PSIZE_16M (TLB_PSIZE_16M >> 1)
-#define ERATIVAX_PSIZE_1G (TLB_PSIZE_1G >> 1)
-
-/* A2 eratilx attributes definitions */
-#define ERATILX_T_ALL 0
-#define ERATILX_T_TID 1
-#define ERATILX_T_TGS 2
-#define ERATILX_T_FULLMATCH 3
-#define ERATILX_T_CLASS0 4
-#define ERATILX_T_CLASS1 5
-#define ERATILX_T_CLASS2 6
-#define ERATILX_T_CLASS3 7
-
-/* XUCR0 bits */
-#define XUCR0_TRACE_UM_T0 0x40000000 /* Thread 0 */
-#define XUCR0_TRACE_UM_T1 0x20000000 /* Thread 1 */
-#define XUCR0_TRACE_UM_T2 0x10000000 /* Thread 2 */
-#define XUCR0_TRACE_UM_T3 0x08000000 /* Thread 3 */
-
-/* A2 CCR0 register */
-#define A2_CCR0_PME_DISABLED 0x00000000
-#define A2_CCR0_PME_SLEEP 0x40000000
-#define A2_CCR0_PME_RVW 0x80000000
-#define A2_CCR0_PME_DISABLED2 0xc0000000
-
-/* A2 CCR2 register */
-#define A2_CCR2_ERAT_ONLY_MODE 0x00000001
-#define A2_CCR2_ENABLE_ICSWX 0x00000002
-#define A2_CCR2_ENABLE_PC 0x20000000
-#define A2_CCR2_ENABLE_TRACE 0x40000000
-
-#endif /* __ASM_POWERPC_REG_A2_H__ */
#define _POWERPC_RTAS_H
#ifdef __KERNEL__
+#include <linux/mutex.h>
#include <linux/spinlock.h>
#include <asm/page.h>
#include <asm/rtas-types.h>
/* Memory set aside for sys_rtas to use with calls that need a work area. */
#define RTAS_USER_REGION_SIZE (64 * 1024)
-/* RTAS return status codes */
-#define RTAS_HARDWARE_ERROR -1 /* Hardware Error */
-#define RTAS_BUSY -2 /* RTAS Busy */
-#define RTAS_INVALID_PARAMETER -3 /* Invalid indicator/domain/sensor etc. */
-#define RTAS_EXTENDED_DELAY_MIN 9900
-#define RTAS_EXTENDED_DELAY_MAX 9905
+/*
+ * Common RTAS function return values, derived from the table "RTAS
+ * Status Word Values" in PAPR+ v2.13 7.2.8: "Return Codes". If a
+ * function can return a value in this table then generally it has the
+ * meaning listed here. More extended commentary in the documentation
+ * for rtas_call().
+ *
+ * RTAS functions may use negative and positive numbers not in this
+ * set for function-specific error and success conditions,
+ * respectively.
+ */
+#define RTAS_SUCCESS 0 /* Success. */
+#define RTAS_HARDWARE_ERROR -1 /* Hardware or other unspecified error. */
+#define RTAS_BUSY -2 /* Retry immediately. */
+#define RTAS_INVALID_PARAMETER -3 /* Invalid indicator/domain/sensor etc. */
+#define RTAS_UNEXPECTED_STATE_CHANGE -7 /* Seems limited to EEH and slot reset. */
+#define RTAS_EXTENDED_DELAY_MIN 9900 /* Retry after delaying for ~1ms. */
+#define RTAS_EXTENDED_DELAY_MAX 9905 /* Retry after delaying for ~100s. */
+#define RTAS_ML_ISOLATION_ERROR -9000 /* Multi-level isolation error. */
/* statuses specific to ibm,suspend-me */
#define RTAS_SUSPEND_ABORTED 9000 /* Suspension aborted */
#define RTAS_TYPE_DEALLOC 0xE3
#define RTAS_TYPE_DUMP 0xE4
#define RTAS_TYPE_HOTPLUG 0xE5
-/* I don't add PowerMGM events right now, this is a different topic */
+/* I don't add PowerMGM events right now, this is a different topic */
#define RTAS_TYPE_PMGM_POWER_SW_ON 0x60
#define RTAS_TYPE_PMGM_POWER_SW_OFF 0x61
#define RTAS_TYPE_PMGM_LID_OPEN 0x62
{
return rtas_function_token(handle) != RTAS_UNKNOWN_SERVICE;
}
-extern int rtas_token(const char *service);
-extern int rtas_service_present(const char *service);
-extern int rtas_call(int token, int, int, int *, ...);
+int rtas_token(const char *service);
+int rtas_call(int token, int nargs, int nret, int *outputs, ...);
void rtas_call_unlocked(struct rtas_args *args, int token, int nargs,
int nret, ...);
-extern void __noreturn rtas_restart(char *cmd);
-extern void rtas_power_off(void);
-extern void __noreturn rtas_halt(void);
-extern void rtas_os_term(char *str);
+void __noreturn rtas_restart(char *cmd);
+void rtas_power_off(void);
+void __noreturn rtas_halt(void);
+void rtas_os_term(char *str);
void rtas_activate_firmware(void);
-extern int rtas_get_sensor(int sensor, int index, int *state);
-extern int rtas_get_sensor_fast(int sensor, int index, int *state);
-extern int rtas_get_power_level(int powerdomain, int *level);
-extern int rtas_set_power_level(int powerdomain, int level, int *setlevel);
-extern bool rtas_indicator_present(int token, int *maxindex);
-extern int rtas_set_indicator(int indicator, int index, int new_value);
-extern int rtas_set_indicator_fast(int indicator, int index, int new_value);
-extern void rtas_progress(char *s, unsigned short hex);
+int rtas_get_sensor(int sensor, int index, int *state);
+int rtas_get_sensor_fast(int sensor, int index, int *state);
+int rtas_get_power_level(int powerdomain, int *level);
+int rtas_set_power_level(int powerdomain, int level, int *setlevel);
+bool rtas_indicator_present(int token, int *maxindex);
+int rtas_set_indicator(int indicator, int index, int new_value);
+int rtas_set_indicator_fast(int indicator, int index, int new_value);
+void rtas_progress(char *s, unsigned short hex);
int rtas_ibm_suspend_me(int *fw_status);
int rtas_error_rc(int rtas_rc);
struct rtc_time;
-extern time64_t rtas_get_boot_time(void);
-extern void rtas_get_rtc_time(struct rtc_time *rtc_time);
-extern int rtas_set_rtc_time(struct rtc_time *rtc_time);
+time64_t rtas_get_boot_time(void);
+void rtas_get_rtc_time(struct rtc_time *rtc_time);
+int rtas_set_rtc_time(struct rtc_time *rtc_time);
-extern unsigned int rtas_busy_delay_time(int status);
+unsigned int rtas_busy_delay_time(int status);
bool rtas_busy_delay(int status);
-extern int early_init_dt_scan_rtas(unsigned long node,
- const char *uname, int depth, void *data);
+int early_init_dt_scan_rtas(unsigned long node, const char *uname, int depth, void *data);
-extern void pSeries_log_error(char *buf, unsigned int err_type, int fatal);
+void pSeries_log_error(char *buf, unsigned int err_type, int fatal);
#ifdef CONFIG_PPC_PSERIES
extern time64_t last_rtas_event;
-extern int clobbering_unread_rtas_event(void);
-extern void post_mobility_fixup(void);
+int clobbering_unread_rtas_event(void);
int rtas_syscall_dispatch_ibm_suspend_me(u64 handle);
#else
static inline int clobbering_unread_rtas_event(void) { return 0; }
#endif
#ifdef CONFIG_PPC_RTAS_DAEMON
-extern void rtas_cancel_event_scan(void);
+void rtas_cancel_event_scan(void);
#else
static inline void rtas_cancel_event_scan(void) { }
#endif
/* Error types logged. */
#define ERR_FLAG_ALREADY_LOGGED 0x0
-#define ERR_FLAG_BOOT 0x1 /* log was pulled from NVRAM on boot */
+#define ERR_FLAG_BOOT 0x1 /* log was pulled from NVRAM on boot */
#define ERR_TYPE_RTAS_LOG 0x2 /* from rtas event-scan */
#define ERR_TYPE_KERNEL_PANIC 0x4 /* from die()/panic() */
#define ERR_TYPE_KERNEL_PANIC_GZ 0x8 /* ditto, compressed */
(ERR_TYPE_RTAS_LOG | ERR_TYPE_KERNEL_PANIC | ERR_TYPE_KERNEL_PANIC_GZ)
#define RTAS_DEBUG KERN_DEBUG "RTAS: "
-
+
#define RTAS_ERROR_LOG_MAX 2048
/*
* for all rtas calls that require an error buffer argument.
* This includes 'check-exception' and 'rtas-last-error'.
*/
-extern int rtas_get_error_log_max(void);
+int rtas_get_error_log_max(void);
/* Event Scan Parameters */
#define EVENT_SCAN_ALL_EVENTS 0xf0000000
/* RMO buffer reserved for user-space RTAS use */
extern unsigned long rtas_rmo_buf;
+extern struct mutex rtas_ibm_get_vpd_lock;
+
#define GLOBAL_INTERRUPT_QUEUE 9005
/**
(devfn << 8) | (reg & 0xff);
}
-extern void rtas_give_timebase(void);
-extern void rtas_take_timebase(void);
+void rtas_give_timebase(void);
+void rtas_take_timebase(void);
#ifdef CONFIG_PPC_RTAS
static inline int page_is_rtas_user_buf(unsigned long pfn)
/* Not the best place to put pSeries_coalesce_init, will be fixed when we
* move some of the rtas suspend-me stuff to pseries */
-extern void pSeries_coalesce_init(void);
+void pSeries_coalesce_init(void);
void rtas_initialize(void);
#else
static inline int page_is_rtas_user_buf(unsigned long pfn) { return 0;}
static inline void rtas_initialize(void) { }
#endif
-extern int call_rtas(const char *, int, int, unsigned long *, ...);
-
#ifdef CONFIG_HV_PERF_CTRS
void read_24x7_sys_info(void);
#else
--- /dev/null
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _UAPI_PAPR_MISCDEV_H_
+#define _UAPI_PAPR_MISCDEV_H_
+
+enum {
+ PAPR_MISCDEV_IOC_ID = 0xb2,
+};
+
+#endif /* _UAPI_PAPR_MISCDEV_H_ */
--- /dev/null
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _UAPI_PAPR_SYSPARM_H_
+#define _UAPI_PAPR_SYSPARM_H_
+
+#include <linux/types.h>
+#include <asm/ioctl.h>
+#include <asm/papr-miscdev.h>
+
+enum {
+ PAPR_SYSPARM_MAX_INPUT = 1024,
+ PAPR_SYSPARM_MAX_OUTPUT = 4000,
+};
+
+struct papr_sysparm_io_block {
+ __u32 parameter;
+ __u16 length;
+ char data[PAPR_SYSPARM_MAX_OUTPUT];
+};
+
+/**
+ * PAPR_SYSPARM_IOC_GET - Retrieve the value of a PAPR system parameter.
+ *
+ * Uses _IOWR because of one corner case: Retrieving the value of the
+ * "OS Service Entitlement Status" parameter (60) requires the caller
+ * to supply input data (a date string) in the buffer passed to
+ * firmware. So the @length and @data of the incoming
+ * papr_sysparm_io_block are always used to initialize the work area
+ * supplied to ibm,get-system-parameter. No other parameters are known
+ * to parameterize the result this way, and callers are encouraged
+ * (but not required) to zero-initialize @length and @data in the
+ * common case.
+ *
+ * On error the contents of the ioblock are indeterminate.
+ *
+ * Return:
+ * 0: Success; @length is the length of valid data in @data, not to exceed @PAPR_SYSPARM_MAX_OUTPUT.
+ * -EIO: Platform error. (-1)
+ * -EINVAL: Incorrect data length or format. (-9999)
+ * -EPERM: The calling partition is not allowed to access this parameter. (-9002)
+ * -EOPNOTSUPP: Parameter not supported on this platform (-3)
+ */
+#define PAPR_SYSPARM_IOC_GET _IOWR(PAPR_MISCDEV_IOC_ID, 1, struct papr_sysparm_io_block)
+
+/**
+ * PAPR_SYSPARM_IOC_SET - Update the value of a PAPR system parameter.
+ *
+ * The contents of the ioblock are unchanged regardless of success.
+ *
+ * Return:
+ * 0: Success; the parameter has been updated.
+ * -EIO: Platform error. (-1)
+ * -EINVAL: Incorrect data length or format. (-9999)
+ * -EPERM: The calling partition is not allowed to access this parameter. (-9002)
+ * -EOPNOTSUPP: Parameter not supported on this platform (-3)
+ */
+#define PAPR_SYSPARM_IOC_SET _IOW(PAPR_MISCDEV_IOC_ID, 2, struct papr_sysparm_io_block)
+
+#endif /* _UAPI_PAPR_SYSPARM_H_ */
--- /dev/null
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _UAPI_PAPR_VPD_H_
+#define _UAPI_PAPR_VPD_H_
+
+#include <asm/ioctl.h>
+#include <asm/papr-miscdev.h>
+
+struct papr_location_code {
+ /*
+ * PAPR+ v2.13 12.3.2.4 Converged Location Code Rules - Length
+ * Restrictions. 79 characters plus nul.
+ */
+ char str[80];
+};
+
+/*
+ * ioctl for /dev/papr-vpd. Returns a VPD handle fd corresponding to
+ * the location code.
+ */
+#define PAPR_VPD_IOC_CREATE_HANDLE _IOW(PAPR_MISCDEV_IOC_ID, 0, struct papr_location_code)
+
+#endif /* _UAPI_PAPR_VPD_H_ */
.machine_check_early = __machine_check_early_realmode_p8,
.platform = "power8",
},
+ { /* 2.07-compliant processor, HeXin C2000 processor */
+ .pvr_mask = 0xffff0000,
+ .pvr_value = 0x00660000,
+ .cpu_name = "HX-C2000",
+ .cpu_features = CPU_FTRS_POWER8,
+ .cpu_user_features = COMMON_USER_POWER8,
+ .cpu_user_features2 = COMMON_USER2_POWER8,
+ .mmu_features = MMU_FTRS_POWER8,
+ .icache_bsize = 128,
+ .dcache_bsize = 128,
+ .cpu_setup = __setup_cpu_power8,
+ .cpu_restore = __restore_cpu_power8,
+ .machine_check_early = __machine_check_early_realmode_p8,
+ .platform = "power8",
+ },
{ /* 3.00-compliant processor, i.e. Power9 "architected" mode */
.pvr_mask = 0xffffffff,
.pvr_value = 0x0f000005,
#include <asm/setup.h>
#include <asm/cpu_setup.h>
-static struct cpu_spec the_cpu_spec __read_mostly;
+static struct cpu_spec the_cpu_spec __ro_after_init;
-struct cpu_spec* cur_cpu_spec __read_mostly = NULL;
+struct cpu_spec *cur_cpu_spec __ro_after_init = NULL;
EXPORT_SYMBOL(cur_cpu_spec);
/* The platform string corresponding to the real PVR */
#include <asm/cputable.h>
#include <asm/setup.h>
#include <asm/thread_info.h>
-#include <asm/reg_a2.h>
#include <asm/exception-64e.h>
#include <asm/bug.h>
#include <asm/irqflags.h>
#include <linux/kernel.h>
#include <linux/lockdep.h>
#include <linux/memblock.h>
+#include <linux/mutex.h>
#include <linux/of.h>
#include <linux/of_fdt.h>
#include <linux/reboot.h>
* ppc64le, and we want to keep it that way. It does
* not make sense for this to be set when @filter
* is NULL.
+ * @lock: Pointer to an optional dedicated per-function mutex. This
+ * should be set for functions that require multiple calls in
+ * sequence to complete a single operation, and such sequences
+ * will disrupt each other if allowed to interleave. Users of
+ * this function are required to hold the associated lock for
+ * the duration of the call sequence. Add an explanatory
+ * comment to the function table entry if setting this member.
*/
struct rtas_function {
s32 token;
const bool banned_for_syscall_on_le:1;
const char * const name;
const struct rtas_filter *filter;
+ struct mutex *lock;
};
+/*
+ * Per-function locks for sequence-based RTAS functions.
+ */
+static DEFINE_MUTEX(rtas_ibm_activate_firmware_lock);
+static DEFINE_MUTEX(rtas_ibm_get_dynamic_sensor_state_lock);
+static DEFINE_MUTEX(rtas_ibm_get_indices_lock);
+static DEFINE_MUTEX(rtas_ibm_lpar_perftools_lock);
+static DEFINE_MUTEX(rtas_ibm_physical_attestation_lock);
+static DEFINE_MUTEX(rtas_ibm_set_dynamic_indicator_lock);
+DEFINE_MUTEX(rtas_ibm_get_vpd_lock);
+
static struct rtas_function rtas_function_table[] __ro_after_init = {
[RTAS_FNIDX__CHECK_EXCEPTION] = {
.name = "check-exception",
.buf_idx1 = -1, .size_idx1 = -1,
.buf_idx2 = -1, .size_idx2 = -1,
},
+ /*
+ * PAPR+ as of v2.13 doesn't explicitly impose any
+ * restriction, but this typically requires multiple
+ * calls before success, and there's no reason to
+ * allow sequences to interleave.
+ */
+ .lock = &rtas_ibm_activate_firmware_lock,
},
[RTAS_FNIDX__IBM_CBE_START_PTCAL] = {
.name = "ibm,cbe-start-ptcal",
.buf_idx1 = 1, .size_idx1 = -1,
.buf_idx2 = -1, .size_idx2 = -1,
},
+ /*
+ * PAPR+ v2.13 R1–7.3.19–3 is explicit that the OS
+ * must not call ibm,get-dynamic-sensor-state with
+ * different inputs until a non-retry status has been
+ * returned.
+ */
+ .lock = &rtas_ibm_get_dynamic_sensor_state_lock,
},
[RTAS_FNIDX__IBM_GET_INDICES] = {
.name = "ibm,get-indices",
.buf_idx1 = 2, .size_idx1 = 3,
.buf_idx2 = -1, .size_idx2 = -1,
},
+ /*
+ * PAPR+ v2.13 R1–7.3.17–2 says that the OS must not
+ * interleave ibm,get-indices call sequences with
+ * different inputs.
+ */
+ .lock = &rtas_ibm_get_indices_lock,
},
[RTAS_FNIDX__IBM_GET_RIO_TOPOLOGY] = {
.name = "ibm,get-rio-topology",
.buf_idx1 = 0, .size_idx1 = -1,
.buf_idx2 = 1, .size_idx2 = 2,
},
+ /*
+ * PAPR+ v2.13 R1–7.3.20–4 indicates that sequences
+ * should not be allowed to interleave.
+ */
+ .lock = &rtas_ibm_get_vpd_lock,
},
[RTAS_FNIDX__IBM_GET_XIVE] = {
.name = "ibm,get-xive",
.buf_idx1 = 2, .size_idx1 = 3,
.buf_idx2 = -1, .size_idx2 = -1,
},
+ /*
+ * PAPR+ v2.13 R1–7.3.26–6 says the OS should allow
+ * only one call sequence in progress at a time.
+ */
+ .lock = &rtas_ibm_lpar_perftools_lock,
},
[RTAS_FNIDX__IBM_MANAGE_FLASH_IMAGE] = {
.name = "ibm,manage-flash-image",
.buf_idx1 = 0, .size_idx1 = 1,
.buf_idx2 = -1, .size_idx2 = -1,
},
+ /*
+ * This follows a sequence-based pattern similar to
+ * ibm,get-vpd et al. Since PAPR+ restricts
+ * interleaving call sequences for other functions of
+ * this style, assume the restriction applies here,
+ * even though it's not explicit in the spec.
+ */
+ .lock = &rtas_ibm_physical_attestation_lock,
},
[RTAS_FNIDX__IBM_PLATFORM_DUMP] = {
.name = "ibm,platform-dump",
.buf_idx1 = 4, .size_idx1 = 5,
.buf_idx2 = -1, .size_idx2 = -1,
},
+ /*
+ * PAPR+ v2.13 7.3.3.4.1 indicates that concurrent
+ * sequences of ibm,platform-dump are allowed if they
+ * are operating on different dump tags. So leave the
+ * lock pointer unset for now. This may need
+ * reconsideration if kernel-internal users appear.
+ */
},
[RTAS_FNIDX__IBM_POWER_OFF_UPS] = {
.name = "ibm,power-off-ups",
.buf_idx1 = 2, .size_idx1 = -1,
.buf_idx2 = -1, .size_idx2 = -1,
},
+ /*
+ * PAPR+ v2.13 R1–7.3.18–3 says the OS must not call
+ * this function with different inputs until a
+ * non-retry status has been returned.
+ */
+ .lock = &rtas_ibm_set_dynamic_indicator_lock,
},
[RTAS_FNIDX__IBM_SET_EEH_OPTION] = {
.name = "ibm,set-eeh-option",
},
};
+#define for_each_rtas_function(funcp) \
+ for (funcp = &rtas_function_table[0]; \
+ funcp < &rtas_function_table[ARRAY_SIZE(rtas_function_table)]; \
+ ++funcp)
+
/*
* Nearly all RTAS calls need to be serialized. All uses of the
* default rtas_args block must hold rtas_lock.
static int __init rtas_token_to_function_xarray_init(void)
{
+ const struct rtas_function *func;
int err = 0;
- for (size_t i = 0; i < ARRAY_SIZE(rtas_function_table); ++i) {
- const struct rtas_function *func = &rtas_function_table[i];
+ for_each_rtas_function(func) {
const s32 token = func->token;
if (token == RTAS_UNKNOWN_SERVICE)
}
arch_initcall(rtas_token_to_function_xarray_init);
+/*
+ * For use by sys_rtas(), where the token value is provided by user
+ * space and we don't want to warn on failed lookups.
+ */
+static const struct rtas_function *rtas_token_to_function_untrusted(s32 token)
+{
+ return xa_load(&rtas_token_to_function_xarray, token);
+}
+
+/*
+ * Reverse lookup for deriving the function descriptor from a
+ * known-good token value in contexts where the former is not already
+ * available. @token must be valid, e.g. derived from the result of a
+ * prior lookup against the function table.
+ */
static const struct rtas_function *rtas_token_to_function(s32 token)
{
const struct rtas_function *func;
if (WARN_ONCE(token < 0, "invalid token %d", token))
return NULL;
- func = xa_load(&rtas_token_to_function_xarray, token);
-
- if (WARN_ONCE(!func, "unexpected failed lookup for token %d", token))
- return NULL;
+ func = rtas_token_to_function_untrusted(token);
+ if (func)
+ return func;
+ /*
+ * Fall back to linear scan in case the reverse mapping hasn't
+ * been initialized yet.
+ */
+ if (xa_empty(&rtas_token_to_function_xarray)) {
+ for_each_rtas_function(func) {
+ if (func->token == token)
+ return func;
+ }
+ }
- return func;
+ WARN_ONCE(true, "unexpected failed lookup for token %d", token);
+ return NULL;
}
/* This is here deliberately so it's only used in this file */
static void __do_enter_rtas_trace(struct rtas_args *args)
{
- const char *name = NULL;
+ const struct rtas_function *func = rtas_token_to_function(be32_to_cpu(args->token));
- if (args == &rtas_args)
- lockdep_assert_held(&rtas_lock);
/*
- * If the tracepoints that consume the function name aren't
- * active, avoid the lookup.
+ * If there is a per-function lock, it must be held by the
+ * caller.
*/
- if ((trace_rtas_input_enabled() || trace_rtas_output_enabled())) {
- const s32 token = be32_to_cpu(args->token);
- const struct rtas_function *func = rtas_token_to_function(token);
+ if (func->lock)
+ lockdep_assert_held(func->lock);
- name = func->name;
- }
+ if (args == &rtas_args)
+ lockdep_assert_held(&rtas_lock);
- trace_rtas_input(args, name);
+ trace_rtas_input(args, func->name);
trace_rtas_ll_entry(args);
__do_enter_rtas(args);
trace_rtas_ll_exit(args);
- trace_rtas_output(args, name);
+ trace_rtas_output(args, func->name);
}
static void do_enter_rtas(struct rtas_args *args)
static int pending_newline = 0; /* did last write end with unprinted newline? */
static int width = 16;
- if (c == '\n') {
+ if (c == '\n') {
while (width-- > 0)
call_rtas_display_status(' ');
width = 16;
if (pending_newline) {
call_rtas_display_status('\r');
call_rtas_display_status('\n');
- }
+ }
pending_newline = 0;
if (width--) {
call_rtas_display_status(c);
else
rtas_call(display_character, 1, 1, NULL, '\r');
}
-
+
if (row_width)
width = row_width[current_line];
else
spin_unlock(&progress_lock);
return;
}
-
+
/* RTAS wants CR-LF, not just LF */
-
+
if (*os == '\n') {
rtas_call(display_character, 1, 1, NULL, '\r');
rtas_call(display_character, 1, 1, NULL, '\n');
*/
rtas_call(display_character, 1, 1, NULL, *os);
}
-
+
if (row_width)
width = row_width[current_line];
else
width--;
rtas_call(display_character, 1, 1, NULL, *os);
}
-
+
os++;
-
+
/* if we overwrite the screen length */
if (width <= 0)
while ((*os != 0) && (*os != '\n') && (*os != '\r'))
os++;
}
-
+
spin_unlock(&progress_lock);
}
EXPORT_SYMBOL_GPL(rtas_progress); /* needed by rtas_flash module */
}
EXPORT_SYMBOL_GPL(rtas_token);
-int rtas_service_present(const char *service)
-{
- return rtas_token(service) != RTAS_UNKNOWN_SERVICE;
-}
-
#ifdef CONFIG_RTAS_ERROR_LOGGING
static u32 rtas_error_log_max __ro_after_init = RTAS_ERROR_LOG_MAX;
return;
}
+ mutex_lock(&rtas_ibm_activate_firmware_lock);
+
do {
fwrc = rtas_call(token, 0, 1, NULL);
} while (rtas_busy_delay(fwrc));
+ mutex_unlock(&rtas_ibm_activate_firmware_lock);
+
if (fwrc)
pr_err("ibm,activate-firmware failed (%i)\n", fwrc);
}
end < (rtas_rmo_buf + RTAS_USER_REGION_SIZE);
}
-static bool block_rtas_call(int token, int nargs,
+static bool block_rtas_call(const struct rtas_function *func, int nargs,
struct rtas_args *args)
{
- const struct rtas_function *func;
const struct rtas_filter *f;
- const bool is_platform_dump = token == rtas_function_token(RTAS_FN_IBM_PLATFORM_DUMP);
- const bool is_config_conn = token == rtas_function_token(RTAS_FN_IBM_CONFIGURE_CONNECTOR);
+ const bool is_platform_dump =
+ func == &rtas_function_table[RTAS_FNIDX__IBM_PLATFORM_DUMP];
+ const bool is_config_conn =
+ func == &rtas_function_table[RTAS_FNIDX__IBM_CONFIGURE_CONNECTOR];
u32 base, size, end;
/*
- * If this token doesn't correspond to a function the kernel
- * understands, you're not allowed to call it.
- */
- func = rtas_token_to_function(token);
- if (!func)
- goto err;
- /*
- * And only functions with filters attached are allowed.
+ * Only functions with filters attached are allowed.
*/
f = func->filter;
if (!f)
return false;
err:
pr_err_ratelimited("sys_rtas: RTAS call blocked - exploit attempt?\n");
- pr_err_ratelimited("sys_rtas: token=0x%x, nargs=%d (called by %s)\n",
- token, nargs, current->comm);
+ pr_err_ratelimited("sys_rtas: %s nargs=%d (called by %s)\n",
+ func->name, nargs, current->comm);
return true;
}
/* We assume to be passed big endian arguments */
SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
{
+ const struct rtas_function *func;
struct pin_cookie cookie;
struct rtas_args args;
unsigned long flags;
nargs * sizeof(rtas_arg_t)) != 0)
return -EFAULT;
- if (token == RTAS_UNKNOWN_SERVICE)
+ /*
+ * If this token doesn't correspond to a function the kernel
+ * understands, you're not allowed to call it.
+ */
+ func = rtas_token_to_function_untrusted(token);
+ if (!func)
return -EINVAL;
args.rets = &args.args[nargs];
memset(args.rets, 0, nret * sizeof(rtas_arg_t));
- if (block_rtas_call(token, nargs, &args))
+ if (block_rtas_call(func, nargs, &args))
return -EINVAL;
if (token_is_restricted_errinjct(token)) {
buff_copy = get_errorlog_buffer();
+ /*
+ * If this function has a mutex assigned to it, we must
+ * acquire it to avoid interleaving with any kernel-based uses
+ * of the same function. Kernel-based sequences acquire the
+ * appropriate mutex explicitly.
+ */
+ if (func->lock)
+ mutex_lock(func->lock);
+
raw_spin_lock_irqsave(&rtas_lock, flags);
cookie = lockdep_pin_lock(&rtas_lock);
lockdep_unpin_lock(&rtas_lock, cookie);
raw_spin_unlock_irqrestore(&rtas_lock, flags);
+ if (func->lock)
+ mutex_unlock(func->lock);
+
if (buff_copy) {
if (errbuf)
log_error(errbuf, ERR_TYPE_RTAS_LOG, 0);
return 0;
}
-int rtas_read_config(struct pci_dn *pdn, int where, int size, u32 *val)
+int rtas_pci_dn_read_config(struct pci_dn *pdn, int where, int size, u32 *val)
{
int returnval = -1;
unsigned long buid, addr;
pdn = pci_get_pdn_by_devfn(bus, devfn);
/* Validity of pdn is checked in here */
- ret = rtas_read_config(pdn, where, size, val);
+ ret = rtas_pci_dn_read_config(pdn, where, size, val);
if (*val == EEH_IO_ERROR_VALUE(size) &&
eeh_dev_check_failure(pdn_to_eeh_dev(pdn)))
return PCIBIOS_DEVICE_NOT_FOUND;
return ret;
}
-int rtas_write_config(struct pci_dn *pdn, int where, int size, u32 val)
+int rtas_pci_dn_write_config(struct pci_dn *pdn, int where, int size, u32 val)
{
unsigned long buid, addr;
int ret;
pdn = pci_get_pdn_by_devfn(bus, devfn);
/* Validity of pdn is checked in here. */
- return rtas_write_config(pdn, where, size, val);
+ return rtas_pci_dn_write_config(pdn, where, size, val);
}
static struct pci_ops rtas_pci_ops = {
#endif
struct task_struct *secondary_current;
-bool has_big_cores;
-bool coregroup_enabled;
-bool thread_group_shares_l2;
-bool thread_group_shares_l3;
+bool has_big_cores __ro_after_init;
+bool coregroup_enabled __ro_after_init;
+bool thread_group_shares_l2 __ro_after_init;
+bool thread_group_shares_l3 __ro_after_init;
DEFINE_PER_CPU(cpumask_var_t, cpu_sibling_map);
DEFINE_PER_CPU(cpumask_var_t, cpu_smallcore_map);
EXPORT_PER_CPU_SYMBOL(cpu_core_map);
EXPORT_SYMBOL_GPL(has_big_cores);
-enum {
-#ifdef CONFIG_SCHED_SMT
- smt_idx,
-#endif
- cache_idx,
- mc_idx,
- die_idx,
-};
-
#define MAX_THREAD_LIST_SIZE 8
#define THREAD_GROUP_SHARE_L1 1
#define THREAD_GROUP_SHARE_L2_L3 2
return 0;
}
-static bool shared_caches;
+static bool shared_caches __ro_after_init;
#ifdef CONFIG_SCHED_SMT
/* cpumask of CPUs with asymmetric SMT dependency */
}
#endif
+/*
+ * On shared processor LPARs scheduled on a big core (which has two or more
+ * independent thread groups per core), prefer lower numbered CPUs, so
+ * that workload consolidates to lesser number of cores.
+ */
+static __ro_after_init DEFINE_STATIC_KEY_FALSE(splpar_asym_pack);
+
/*
* P9 has a slightly odd architecture where pairs of cores share an L2 cache.
* This topology makes it *much* cheaper to migrate tasks between adjacent cores
*/
static int powerpc_shared_cache_flags(void)
{
+ if (static_branch_unlikely(&splpar_asym_pack))
+ return SD_SHARE_PKG_RESOURCES | SD_ASYM_PACKING;
+
return SD_SHARE_PKG_RESOURCES;
}
+static int powerpc_shared_proc_flags(void)
+{
+ if (static_branch_unlikely(&splpar_asym_pack))
+ return SD_ASYM_PACKING;
+
+ return 0;
+}
+
/*
* We can't just pass cpu_l2_cache_mask() directly because
* returns a non-const pointer and the compiler barfs on that.
static bool has_coregroup_support(void)
{
+ /* Coregroup identification not available on shared systems */
+ if (is_shared_processor())
+ return 0;
+
return coregroup_enabled;
}
return cpu_coregroup_mask(cpu);
}
-static struct sched_domain_topology_level powerpc_topology[] = {
-#ifdef CONFIG_SCHED_SMT
- { cpu_smt_mask, powerpc_smt_flags, SD_INIT_NAME(SMT) },
-#endif
- { shared_cache_mask, powerpc_shared_cache_flags, SD_INIT_NAME(CACHE) },
- { cpu_mc_mask, SD_INIT_NAME(MC) },
- { cpu_cpu_mask, SD_INIT_NAME(PKG) },
- { NULL, },
-};
-
static int __init init_big_cores(void)
{
int cpu;
BUG();
}
-static void __init fixup_topology(void)
+static struct sched_domain_topology_level powerpc_topology[6];
+
+static void __init build_sched_topology(void)
{
- int i;
+ int i = 0;
+
+ if (is_shared_processor() && has_big_cores)
+ static_branch_enable(&splpar_asym_pack);
#ifdef CONFIG_SCHED_SMT
if (has_big_cores) {
pr_info("Big cores detected but using small core scheduling\n");
- powerpc_topology[smt_idx].mask = smallcore_smt_mask;
+ powerpc_topology[i++] = (struct sched_domain_topology_level){
+ smallcore_smt_mask, powerpc_smt_flags, SD_INIT_NAME(SMT)
+ };
+ } else {
+ powerpc_topology[i++] = (struct sched_domain_topology_level){
+ cpu_smt_mask, powerpc_smt_flags, SD_INIT_NAME(SMT)
+ };
}
#endif
+ if (shared_caches) {
+ powerpc_topology[i++] = (struct sched_domain_topology_level){
+ shared_cache_mask, powerpc_shared_cache_flags, SD_INIT_NAME(CACHE)
+ };
+ }
+ if (has_coregroup_support()) {
+ powerpc_topology[i++] = (struct sched_domain_topology_level){
+ cpu_mc_mask, powerpc_shared_proc_flags, SD_INIT_NAME(MC)
+ };
+ }
+ powerpc_topology[i++] = (struct sched_domain_topology_level){
+ cpu_cpu_mask, powerpc_shared_proc_flags, SD_INIT_NAME(PKG)
+ };
- if (!has_coregroup_support())
- powerpc_topology[mc_idx].mask = powerpc_topology[cache_idx].mask;
-
- /*
- * Try to consolidate topology levels here instead of
- * allowing scheduler to degenerate.
- * - Dont consolidate if masks are different.
- * - Dont consolidate if sd_flags exists and are different.
- */
- for (i = 1; i <= die_idx; i++) {
- if (powerpc_topology[i].mask != powerpc_topology[i - 1].mask)
- continue;
-
- if (powerpc_topology[i].sd_flags && powerpc_topology[i - 1].sd_flags &&
- powerpc_topology[i].sd_flags != powerpc_topology[i - 1].sd_flags)
- continue;
-
- if (!powerpc_topology[i - 1].sd_flags)
- powerpc_topology[i - 1].sd_flags = powerpc_topology[i].sd_flags;
+ /* There must be one trailing NULL entry left. */
+ BUG_ON(i >= ARRAY_SIZE(powerpc_topology) - 1);
- powerpc_topology[i].mask = powerpc_topology[i + 1].mask;
- powerpc_topology[i].sd_flags = powerpc_topology[i + 1].sd_flags;
-#ifdef CONFIG_SCHED_DEBUG
- powerpc_topology[i].name = powerpc_topology[i + 1].name;
-#endif
- }
+ set_sched_topology(powerpc_topology);
}
void __init smp_cpus_done(unsigned int max_cpus)
smp_ops->bringup_done();
dump_numa_cpu_topology();
+ build_sched_topology();
+}
- fixup_topology();
- set_sched_topology(powerpc_topology);
+/*
+ * For asym packing, by default lower numbered CPU has higher priority.
+ * On shared processors, pack to lower numbered core. However avoid moving
+ * between thread_groups within the same core.
+ */
+int arch_asym_cpu_priority(int cpu)
+{
+ if (static_branch_unlikely(&splpar_asym_pack))
+ return -cpu / threads_per_core;
+
+ return -cpu;
}
#ifdef CONFIG_HOTPLUG_CPU
#include <linux/interrupt.h>
#include <linux/nmi.h>
+void do_after_copyback(void);
+
void do_after_copyback(void)
{
iommu_restore();
454 common futex_wake sys_futex_wake
455 common futex_wait sys_futex_wait
456 common futex_requeue sys_futex_requeue
+457 common statmount sys_statmount
+458 common listmount sys_listmount
.globl ftrace_regs_call
ftrace_regs_call:
bl ftrace_stub
- nop
ftrace_regs_exit 1
_GLOBAL(ftrace_caller)
.globl ftrace_call
ftrace_call:
bl ftrace_stub
- nop
ftrace_regs_exit 0
_GLOBAL(ftrace_stub)
return -EINVAL;
}
+#ifdef CONFIG_GENERIC_BUG
int is_valid_bugaddr(unsigned long addr)
{
return is_kernel_addr(addr);
}
+#endif
#ifdef CONFIG_MATH_EMULATION
static int emulate_math(struct pt_regs *regs)
#include <linux/types.h>
#include <asm/udbg.h>
#include <asm/io.h>
-#include <asm/reg_a2.h>
#include <asm/early_ioremap.h>
extern u8 real_readb(volatile u8 __iomem *addr);
targets += vdso32.lds
CPPFLAGS_vdso32.lds += -P -C -Upowerpc
targets += vdso64.lds
-CPPFLAGS_vdso64.lds += -P -C -U$(ARCH)
+CPPFLAGS_vdso64.lds += -P -C
# link rule for the .so file, .lds has to be first
$(obj)/vdso32.so.dbg: $(src)/vdso32.lds $(obj-vdso32) $(obj)/vgettimeofday-32.o FORCE
VMCOREINFO_OFFSET(mmu_psize_def, shift);
#endif
VMCOREINFO_SYMBOL(cur_cpu_spec);
+ VMCOREINFO_OFFSET(cpu_spec, cpu_features);
VMCOREINFO_OFFSET(cpu_spec, mmu_features);
vmcoreinfo_append_str("NUMBER(RADIX_MMU)=%d\n", early_radix_enabled());
vmcoreinfo_append_str("KERNELOFFSET=%lx\n", kaslr_offset());
switch (priority) {
case BOOK3S_IRQPRIO_DECREMENTER:
- deliver = (kvmppc_get_msr(vcpu) & MSR_EE) && !crit;
+ deliver = !kvmhv_is_nestedv2() && (kvmppc_get_msr(vcpu) & MSR_EE) && !crit;
vec = BOOK3S_INTERRUPT_DECREMENTER;
break;
case BOOK3S_IRQPRIO_EXTERNAL:
- deliver = (kvmppc_get_msr(vcpu) & MSR_EE) && !crit;
+ deliver = !kvmhv_is_nestedv2() && (kvmppc_get_msr(vcpu) & MSR_EE) && !crit;
vec = BOOK3S_INTERRUPT_EXTERNAL;
break;
case BOOK3S_IRQPRIO_SYSTEM_RESET:
unsigned long quadrant, ret = n;
bool is_load = !!to;
+ if (kvmhv_is_nestedv2())
+ return H_UNSUPPORTED;
+
/* Can't access quadrants 1 or 2 in non-HV mode, call the HV to do it */
if (kvmhv_on_pseries())
return plpar_hcall_norets(H_COPY_TOFROM_GUEST, lpid, pid, eaddr,
void *to, void *from, unsigned long n)
{
int lpid = vcpu->kvm->arch.lpid;
- int pid = kvmppc_get_pid(vcpu);
+ int pid;
/* This would cause a data segment intr so don't allow the access */
if (eaddr & (0x3FFUL << 52))
/* If accessing quadrant 3 then pid is expected to be 0 */
if (((eaddr >> 62) & 0x3) == 0x3)
pid = 0;
+ else
+ pid = kvmppc_get_pid(vcpu);
eaddr &= ~(0xFFFUL << 52);
return err;
}
-static void kvmppc_update_vpa(struct kvm_vcpu *vcpu, struct kvmppc_vpa *vpap)
+static void kvmppc_update_vpa(struct kvm_vcpu *vcpu, struct kvmppc_vpa *vpap,
+ struct kvmppc_vpa *old_vpap)
{
struct kvm *kvm = vcpu->kvm;
void *va;
kvmppc_unpin_guest_page(kvm, va, gpa, false);
va = NULL;
}
- if (vpap->pinned_addr)
- kvmppc_unpin_guest_page(kvm, vpap->pinned_addr, vpap->gpa,
- vpap->dirty);
+ *old_vpap = *vpap;
+
vpap->gpa = gpa;
vpap->pinned_addr = va;
vpap->dirty = false;
static void kvmppc_update_vpas(struct kvm_vcpu *vcpu)
{
+ struct kvm *kvm = vcpu->kvm;
+ struct kvmppc_vpa old_vpa = { 0 };
+
if (!(vcpu->arch.vpa.update_pending ||
vcpu->arch.slb_shadow.update_pending ||
vcpu->arch.dtl.update_pending))
spin_lock(&vcpu->arch.vpa_update_lock);
if (vcpu->arch.vpa.update_pending) {
- kvmppc_update_vpa(vcpu, &vcpu->arch.vpa);
- if (vcpu->arch.vpa.pinned_addr)
+ kvmppc_update_vpa(vcpu, &vcpu->arch.vpa, &old_vpa);
+ if (old_vpa.pinned_addr) {
+ if (kvmhv_is_nestedv2())
+ kvmhv_nestedv2_set_vpa(vcpu, ~0ull);
+ kvmppc_unpin_guest_page(kvm, old_vpa.pinned_addr, old_vpa.gpa,
+ old_vpa.dirty);
+ }
+ if (vcpu->arch.vpa.pinned_addr) {
init_vpa(vcpu, vcpu->arch.vpa.pinned_addr);
+ if (kvmhv_is_nestedv2())
+ kvmhv_nestedv2_set_vpa(vcpu, __pa(vcpu->arch.vpa.pinned_addr));
+ }
}
if (vcpu->arch.dtl.update_pending) {
- kvmppc_update_vpa(vcpu, &vcpu->arch.dtl);
+ kvmppc_update_vpa(vcpu, &vcpu->arch.dtl, &old_vpa);
+ if (old_vpa.pinned_addr)
+ kvmppc_unpin_guest_page(kvm, old_vpa.pinned_addr, old_vpa.gpa,
+ old_vpa.dirty);
vcpu->arch.dtl_ptr = vcpu->arch.dtl.pinned_addr;
vcpu->arch.dtl_index = 0;
}
- if (vcpu->arch.slb_shadow.update_pending)
- kvmppc_update_vpa(vcpu, &vcpu->arch.slb_shadow);
+ if (vcpu->arch.slb_shadow.update_pending) {
+ kvmppc_update_vpa(vcpu, &vcpu->arch.slb_shadow, &old_vpa);
+ if (old_vpa.pinned_addr)
+ kvmppc_unpin_guest_page(kvm, old_vpa.pinned_addr, old_vpa.gpa,
+ old_vpa.dirty);
+ }
+
spin_unlock(&vcpu->arch.vpa_update_lock);
}
* That can happen due to a bug, or due to a machine check
* occurring at just the wrong time.
*/
- if (__kvmppc_get_msr_hv(vcpu) & MSR_HV) {
+ if (!kvmhv_is_nestedv2() && (__kvmppc_get_msr_hv(vcpu) & MSR_HV)) {
printk(KERN_EMERG "KVM trap in HV mode!\n");
printk(KERN_EMERG "trap=0x%x | pc=0x%lx | msr=0x%llx\n",
vcpu->arch.trap, kvmppc_get_pc(vcpu),
{
int i;
- if (unlikely(__kvmppc_get_msr_hv(vcpu) & MSR_PR)) {
+ if (!kvmhv_is_nestedv2() && unlikely(__kvmppc_get_msr_hv(vcpu) & MSR_PR)) {
/*
* Guest userspace executed sc 1. This can only be
* reached by the P9 path because the old path
if (rc < 0)
return -EINVAL;
+ kvmppc_gse_put_u64(io->vcpu_run_input, KVMPPC_GSID_LPCR, lpcr);
+
accumulate_time(vcpu, &vcpu->arch.in_guest);
rc = plpar_guest_run_vcpu(0, vcpu->kvm->arch.lpid, vcpu->vcpu_id,
&trap, &i);
if (!nested) {
kvmppc_core_prepare_to_enter(vcpu);
- if (__kvmppc_get_msr_hv(vcpu) & MSR_EE) {
- if (xive_interrupt_pending(vcpu))
+ if (test_bit(BOOK3S_IRQPRIO_EXTERNAL,
+ &vcpu->arch.pending_exceptions) ||
+ xive_interrupt_pending(vcpu)) {
+ /*
+ * For nested HV, don't synthesize but always pass MER,
+ * the L0 will be able to optimise that more
+ * effectively than manipulating registers directly.
+ */
+ if (!kvmhv_on_pseries() && (__kvmppc_get_msr_hv(vcpu) & MSR_EE))
kvmppc_inject_interrupt_hv(vcpu,
- BOOK3S_INTERRUPT_EXTERNAL, 0);
- } else if (test_bit(BOOK3S_IRQPRIO_EXTERNAL,
- &vcpu->arch.pending_exceptions)) {
- lpcr |= LPCR_MER;
+ BOOK3S_INTERRUPT_EXTERNAL, 0);
+ else
+ lpcr |= LPCR_MER;
}
} else if (vcpu->arch.pending_exceptions ||
vcpu->arch.doorbell_request ||
* entering a nested guest in which case the decrementer is now owned
* by L2 and the L1 decrementer is provided in hdec_expires
*/
- if (kvmppc_core_pending_dec(vcpu) &&
+ if (!kvmhv_is_nestedv2() && kvmppc_core_pending_dec(vcpu) &&
((tb < kvmppc_dec_expires_host_tb(vcpu)) ||
(trap == BOOK3S_INTERRUPT_SYSCALL &&
kvmppc_get_gpr(vcpu, 3) == H_ENTER_NESTED)))
if (run->exit_reason == KVM_EXIT_PAPR_HCALL) {
accumulate_time(vcpu, &vcpu->arch.hcall);
- if (WARN_ON_ONCE(__kvmppc_get_msr_hv(vcpu) & MSR_PR)) {
+ if (!kvmhv_is_nestedv2() && WARN_ON_ONCE(__kvmppc_get_msr_hv(vcpu) & MSR_PR)) {
/*
* These should have been caught reflected
* into the guest by now. Final sanity check:
kvmhv_set_ptbl_entry(kvm->arch.lpid, 0, 0);
}
- if (kvmhv_is_nestedv2())
+ if (kvmhv_is_nestedv2()) {
+ kvmhv_flush_lpid(kvm->arch.lpid);
plpar_guest_delete(0, kvm->arch.lpid);
- else
+ } else {
kvmppc_free_lpid(kvm->arch.lpid);
+ }
kvmppc_free_pimap(kvm);
}
}
}
-static void kvmhv_flush_lpid(u64 lpid)
+void kvmhv_flush_lpid(u64 lpid)
{
long rc;
}
EXPORT_SYMBOL_GPL(kvmhv_nestedv2_set_ptbl_entry);
+/**
+ * kvmhv_nestedv2_set_vpa() - register L2 VPA with L0
+ * @vcpu: vcpu
+ * @vpa: L1 logical real address
+ */
+int kvmhv_nestedv2_set_vpa(struct kvm_vcpu *vcpu, unsigned long vpa)
+{
+ struct kvmhv_nestedv2_io *io;
+ struct kvmppc_gs_buff *gsb;
+ int rc = 0;
+
+ io = &vcpu->arch.nestedv2_io;
+ gsb = io->vcpu_run_input;
+
+ kvmppc_gsb_reset(gsb);
+ rc = kvmppc_gse_put_u64(gsb, KVMPPC_GSID_VPA, vpa);
+ if (rc < 0)
+ goto out;
+
+ rc = kvmppc_gsb_send(gsb, 0);
+ if (rc < 0)
+ pr_err("KVM-NESTEDv2: couldn't register the L2 VPA (rc=%d)\n", rc);
+
+out:
+ kvmppc_gsb_reset(gsb);
+ return rc;
+}
+EXPORT_SYMBOL_GPL(kvmhv_nestedv2_set_vpa);
+
/**
* kvmhv_nestedv2_parse_output() - receive values from H_GUEST_RUN_VCPU output
* @vcpu: vcpu
case PVR_POWER8:
case PVR_POWER8E:
case PVR_POWER8NVL:
+ case PVR_HX_C2000:
case PVR_POWER9:
vcpu->arch.hflags |= BOOK3S_HFLAG_MULTI_PGSIZE |
BOOK3S_HFLAG_NEW_TLBIE;
emulated = EMULATE_FAIL;
vcpu->arch.regs.msr = kvmppc_get_msr(vcpu);
- kvmhv_nestedv2_reload_ptregs(vcpu, &vcpu->arch.regs);
if (analyse_instr(&op, &vcpu->arch.regs, inst) == 0) {
int type = op.type & INSTR_TYPE_MASK;
int size = GETSIZE(op.type);
op.reg, size, !instr_byte_swap);
if ((op.type & UPDATE) && (emulated != EMULATE_FAIL))
- kvmppc_set_gpr(vcpu, op.update_reg, op.ea);
+ kvmppc_set_gpr(vcpu, op.update_reg, vcpu->arch.vaddr_accessed);
break;
}
KVM_MMIO_REG_FPR|op.reg, size, 1);
if ((op.type & UPDATE) && (emulated != EMULATE_FAIL))
- kvmppc_set_gpr(vcpu, op.update_reg, op.ea);
+ kvmppc_set_gpr(vcpu, op.update_reg, vcpu->arch.vaddr_accessed);
break;
#endif
break;
}
#endif
- case STORE:
- /* if need byte reverse, op.val has been reversed by
- * analyse_instr().
- */
- emulated = kvmppc_handle_store(vcpu, op.val, size, 1);
+ case STORE: {
+ int instr_byte_swap = op.type & BYTEREV;
+
+ emulated = kvmppc_handle_store(vcpu, kvmppc_get_gpr(vcpu, op.reg),
+ size, !instr_byte_swap);
if ((op.type & UPDATE) && (emulated != EMULATE_FAIL))
- kvmppc_set_gpr(vcpu, op.update_reg, op.ea);
+ kvmppc_set_gpr(vcpu, op.update_reg, vcpu->arch.vaddr_accessed);
break;
+ }
#ifdef CONFIG_PPC_FPU
case STORE_FP:
if (kvmppc_check_fp_disabled(vcpu))
kvmppc_get_fpr(vcpu, op.reg), size, 1);
if ((op.type & UPDATE) && (emulated != EMULATE_FAIL))
- kvmppc_set_gpr(vcpu, op.update_reg, op.ea);
+ kvmppc_set_gpr(vcpu, op.update_reg, vcpu->arch.vaddr_accessed);
break;
#endif
}
trace_kvm_ppc_instr(ppc_inst_val(inst), kvmppc_get_pc(vcpu), emulated);
- kvmhv_nestedv2_mark_dirty_ptregs(vcpu, &vcpu->arch.regs);
/* Advance past emulated instruction. */
if (emulated != EMULATE_FAIL)
# so it is only needed for modules, and only for older linkers which
# do not support --save-restore-funcs
ifndef CONFIG_LD_IS_BFD
-extra-$(CONFIG_PPC64) += crtsavres.o
+always-$(CONFIG_PPC64) += crtsavres.o
endif
obj-$(CONFIG_PPC_BOOK3S_64) += copyuser_power7.o copypage_power7.o \
} u;
nb = GETSIZE(op->type);
+ if (nb > sizeof(u))
+ return -EINVAL;
if (!address_ok(regs, ea, nb))
return -EFAULT;
rn = op->reg;
} u;
nb = GETSIZE(op->type);
+ if (nb > sizeof(u))
+ return -EINVAL;
if (!address_ok(regs, ea, nb))
return -EFAULT;
rn = op->reg;
u8 b[sizeof(__vector128)];
} u = {};
+ if (size > sizeof(u))
+ return -EINVAL;
+
if (!address_ok(regs, ea & ~0xfUL, 16))
return -EFAULT;
/* align to multiple of size */
if (err)
return err;
if (unlikely(cross_endian))
- do_byte_reverse(&u.b[ea & 0xf], size);
+ do_byte_reverse(&u.b[ea & 0xf], min_t(size_t, size, sizeof(u)));
preempt_disable();
if (regs->msr & MSR_VEC)
put_vr(rn, &u.v);
u8 b[sizeof(__vector128)];
} u;
+ if (size > sizeof(u))
+ return -EINVAL;
+
if (!address_ok(regs, ea & ~0xfUL, 16))
return -EFAULT;
/* align to multiple of size */
u.v = current->thread.vr_state.vr[rn];
preempt_enable();
if (unlikely(cross_endian))
- do_byte_reverse(&u.b[ea & 0xf], size);
+ do_byte_reverse(&u.b[ea & 0xf], min_t(size_t, size, sizeof(u)));
return copy_mem_out(&u.b[ea & 0xf], ea, size, regs);
}
#endif /* CONFIG_ALTIVEC */
else
rflags |= 0x3;
}
+ VM_WARN_ONCE(!(pteflags & _PAGE_RWX), "no-access mapping request");
} else {
if (pteflags & _PAGE_RWX)
rflags |= 0x2;
+ /*
+ * We should never hit this in normal fault handling because
+ * a permission check (check_pte_access()) will bubble this
+ * to higher level linux handler even for PAGE_NONE.
+ */
+ VM_WARN_ONCE(!(pteflags & _PAGE_RWX), "no-access mapping request");
if (!((pteflags & _PAGE_WRITE) && (pteflags & _PAGE_DIRTY)))
rflags |= 0x1;
}
set_pte_at(vma->vm_mm, addr, ptep, pte);
}
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
/*
* For hash translation mode, we use the deposited table to store hash slot
* information and they are stored at PTRS_PER_PMD offset from related pmd
return true;
}
+#endif
/*
* Does the CPU support tlbie?
unsigned long pvr = mfspr(SPRN_PVR);
if (PVR_VER(pvr) == PVR_POWER8 || PVR_VER(pvr) == PVR_POWER8E ||
- PVR_VER(pvr) == PVR_POWER8NVL || PVR_VER(pvr) == PVR_POWER9)
+ PVR_VER(pvr) == PVR_POWER8NVL || PVR_VER(pvr) == PVR_POWER9 ||
+ PVR_VER(pvr) == PVR_HX_C2000)
pkeys_total = 32;
}
}
goto done;
}
count_vm_vma_lock_event(VMA_LOCK_RETRY);
+ if (fault & VM_FAULT_MAJOR)
+ flags |= FAULT_FLAG_TRIED;
if (fault_signal_pending(fault, regs))
return user_mode(regs) ? 0 : SIGBUS;
* as to leave enough 0 bits in the address to contain it. */
unsigned long minalign = max(MAX_PGTABLE_INDEX_SIZE + 1,
HUGEPD_SHIFT_MASK + 1);
- struct kmem_cache *new;
+ struct kmem_cache *new = NULL;
/* It would be nice if this was a BUILD_BUG_ON(), but at the
* moment, gcc doesn't seem to recognize is_power_of_2 as a
align = max_t(unsigned long, align, minalign);
name = kasprintf(GFP_KERNEL, "pgtable-2^%d", shift);
- new = kmem_cache_create(name, table_size, align, 0, ctor(shift));
+ if (name)
+ new = kmem_cache_create(name, table_size, align, 0, ctor(shift));
if (!new)
panic("Could not allocate pgtable cache for order %d", shift);
{
return IS_ENABLED(CONFIG_KFENCE) || debug_pagealloc_enabled();
}
+
+#ifdef CONFIG_MEMORY_HOTPLUG
+int create_section_mapping(unsigned long start, unsigned long end,
+ int nid, pgprot_t prot);
+#endif
if (!ret)
goto parse_result;
+ if (ret && (ret != H_PARAMETER))
+ goto out;
+
/*
* ret value as 'H_PARAMETER' implies that the current buffer size
* can't accommodate all the information, and a partial buffer
attr_group->attrs = attrs;
do {
ev_val_str = kasprintf(GFP_KERNEL, "event=0x%x", pmu->events[i].value);
+ if (!ev_val_str)
+ continue;
dev_str = device_str_attr_create(pmu->events[i].name, ev_val_str);
if (!dev_str)
continue;
attrs[j++] = dev_str;
if (pmu->events[i].scale) {
ev_scale_str = kasprintf(GFP_KERNEL, "%s.scale", pmu->events[i].name);
+ if (!ev_scale_str)
+ continue;
dev_str = device_str_attr_create(ev_scale_str, pmu->events[i].scale);
if (!dev_str)
continue;
if (pmu->events[i].unit) {
ev_unit_str = kasprintf(GFP_KERNEL, "%s.unit", pmu->events[i].name);
+ if (!ev_unit_str)
+ continue;
dev_str = device_str_attr_create(ev_unit_str, pmu->events[i].unit);
if (!dev_str)
continue;
config CURRITUCK
bool "IBM Currituck (476fpe) Support"
depends on PPC_47x
+ select I2C
select SWIOTLB
select 476FPE
select FORCE_PCI
isync();
}
-int __init ppc44x_idle_init(void)
+static int __init ppc44x_idle_init(void)
{
if (!mode_spin) {
/* If we are not setting spin mode
#include <linux/of_address.h>
#include <linux/of_irq.h>
+#include "mpc5121_ads.h"
+
static struct device_node *cpld_pic_node;
static struct irq_domain *cpld_pic_host;
}
#endif /* CONFIG_TOUCHSCREEN_ADS7846 */
-void __init pdm360ng_init(void)
+static void __init pdm360ng_init(void)
{
mpc512x_init();
pdm360ng_touchscreen_init();
static int agent_thread_fn(void *data)
{
+ set_freezable();
+
while (1) {
- wait_event_interruptible(agent_wq, pci_pm_state >= 2);
- try_to_freeze();
+ wait_event_freezable(agent_wq, pci_pm_state >= 2);
if (signal_pending(current) || pci_pm_state < 2)
continue;
/* P1025 has pins muxed for QE and other functions. To
* enable QE UEC mode, we need to set bit QE0 for UCC1
* in Eth mode, QE0 and QE3 for UCC5 in Eth mode, QE9
- * and QE12 for QE MII management singals in PMUXCR
+ * and QE12 for QE MII management signals in PMUXCR
* register.
*/
setbits32(&guts->pmuxcr, MPC85xx_PMUXCR_QE(0) |
select MPIC
default y if GEF_SBC610 || GEF_SBC310 || GEF_PPC9A \
|| MVME7100
-
-config MPC8610
- bool
- select HAVE_PCI
- select FSL_PCI if PCI
- select PPC_UDBG_16550
- select MPIC
}
#ifdef CONFIG_PPC_PASEMI_NEMO
-void pas_shutdown(void)
+static void pas_shutdown(void)
{
/* Set the PLD bit that makes the SB600 think the power button is being pressed */
void __iomem *pld_map = ioremap(0xf5000000,4096);
printk(KERN_ERR "Couldn't get primary IPI interrupt");
}
-void __init smp_psurge_take_timebase(void)
+static void __init smp_psurge_take_timebase(void)
{
if (psurge_type != PSURGE_DUAL)
return;
set_dec(tb_ticks_per_jiffy/2);
}
-void __init smp_psurge_give_timebase(void)
+static void __init smp_psurge_give_timebase(void)
{
/* Nothing to do here */
}
else
name = kasprintf(GFP_KERNEL, "opal");
+ if (!name)
+ continue;
/* Install interrupt handler */
rc = request_irq(r->start, opal_interrupt, r->flags & IRQD_TRIGGER_MASK,
name, NULL);
j = 0;
pcaps[i].pg.name = kasprintf(GFP_KERNEL, "%pOFn", node);
+ if (!pcaps[i].pg.name) {
+ kfree(pcaps[i].pattrs);
+ kfree(pcaps[i].pg.attrs);
+ goto out_pcaps_pattrs;
+ }
+
if (has_min) {
powercap_add_attr(min, "powercap-min",
&pcaps[i].pattrs[j]);
const char *label;
addrp = of_get_address(node, 0, &range_size, NULL);
+ if (!addrp)
+ continue;
range_addr = of_read_number(addrp, 2);
range_end = range_addr + range_size;
ent->chip = chip;
snprintf(ent->name, 16, "%08x", chip);
ent->path.data = (void *)kasprintf(GFP_KERNEL, "%pOF", dn);
+ if (!ent->path.data) {
+ kfree(ent);
+ return -ENOMEM;
+ }
+
ent->path.size = strlen((char *)ent->path.data);
dir = debugfs_create_dir(ent->name, root);
if (pvr_ver != PVR_POWER8 &&
pvr_ver != PVR_POWER8E &&
- pvr_ver != PVR_POWER8NVL)
+ pvr_ver != PVR_POWER8NVL &&
+ pvr_ver != PVR_HX_C2000)
return 0;
/*
profiling support of the Cell processor with programs like
perfmon2, then say Y or M, otherwise say N.
-config PS3GELIC_UDBG
- bool "PS3 udbg output via UDP broadcasts on Ethernet"
- depends on PPC_PS3
- help
- Enables udbg early debugging output by sending broadcast UDP
- via the Ethernet port (UDP port number 18194).
-
- This driver uses a trivial implementation and is independent
- from the main PS3 gelic network driver.
-
- If in doubt, say N here.
-
endmenu
obj-y += interrupt.o exports.o os-area.o
obj-y += system-bus.o
-obj-$(CONFIG_PS3GELIC_UDBG) += gelic_udbg.o
+obj-$(CONFIG_PPC_EARLY_DEBUG_PS3GELIC) += gelic_udbg.o
obj-$(CONFIG_SMP) += smp.o
obj-$(CONFIG_SPU_BASE) += spu.o
obj-y += device-init.o
if (res)
goto fail_free_irq;
+ set_freezable();
/* Loop here processing the requested notification events. */
do {
try_to_freeze();
#include <linux/ip.h>
#include <linux/udp.h>
+#include <asm/ps3.h>
#include <asm/io.h>
#include <asm/udbg.h>
#include <asm/lv1call.h>
obj-y := lpar.o hvCall.o nvram.o reconfig.o \
of_helpers.o rtas-work-area.o papr-sysparm.o \
+ papr-vpd.o \
setup.o iommu.o event_sources.o ras.o \
firmware.o power.o dlpar.o mobility.o rng.o \
pci.o pci_dlpar.o eeh_pseries.o msi.o \
if (!pdn)
return 0;
- rtas_read_config(pdn, PCI_STATUS, 2, &status);
+ rtas_pci_dn_read_config(pdn, PCI_STATUS, 2, &status);
if (!(status & PCI_STATUS_CAP_LIST))
return 0;
return 0;
while (cnt--) {
- rtas_read_config(pdn, pos, 1, &pos);
+ rtas_pci_dn_read_config(pdn, pos, 1, &pos);
if (pos < 0x40)
break;
pos &= ~3;
- rtas_read_config(pdn, pos + PCI_CAP_LIST_ID, 1, &id);
+ rtas_pci_dn_read_config(pdn, pos + PCI_CAP_LIST_ID, 1, &id);
if (id == 0xff)
break;
if (id == cap)
if (!edev || !edev->pcie_cap)
return 0;
- if (rtas_read_config(pdn, pos, 4, &header) != PCIBIOS_SUCCESSFUL)
+ if (rtas_pci_dn_read_config(pdn, pos, 4, &header) != PCIBIOS_SUCCESSFUL)
return 0;
else if (!header)
return 0;
if (pos < 256)
break;
- if (rtas_read_config(pdn, pos, 4, &header) != PCIBIOS_SUCCESSFUL)
+ if (rtas_pci_dn_read_config(pdn, pos, 4, &header) != PCIBIOS_SUCCESSFUL)
break;
}
if ((pdn->class_code >> 8) == PCI_CLASS_BRIDGE_PCI) {
edev->mode |= EEH_DEV_BRIDGE;
if (edev->pcie_cap) {
- rtas_read_config(pdn, edev->pcie_cap + PCI_EXP_FLAGS,
- 2, &pcie_flags);
+ rtas_pci_dn_read_config(pdn, edev->pcie_cap + PCI_EXP_FLAGS,
+ 2, &pcie_flags);
pcie_flags = (pcie_flags & PCI_EXP_FLAGS_TYPE) >> 4;
if (pcie_flags == PCI_EXP_TYPE_ROOT_PORT)
edev->mode |= EEH_DEV_ROOT_PORT;
{
struct pci_dn *pdn = eeh_dev_to_pdn(edev);
- return rtas_read_config(pdn, where, size, val);
+ return rtas_pci_dn_read_config(pdn, where, size, val);
}
/**
{
struct pci_dn *pdn = eeh_dev_to_pdn(edev);
- return rtas_write_config(pdn, where, size, val);
+ return rtas_pci_dn_write_config(pdn, where, size, val);
}
#ifdef CONFIG_PCI_IOV
int rc;
mem_block = lmb_to_memblock(lmb);
- if (!mem_block)
+ if (!mem_block) {
+ pr_err("Failed memory block lookup for LMB 0x%x\n", lmb->drc_index);
return -EINVAL;
+ }
if (online && mem_block->dev.offline)
rc = device_online(&mem_block->dev);
}
}
- if (!lmb_found)
+ if (!lmb_found) {
+ pr_debug("Failed to look up LMB for drc index %x\n", drc_index);
rc = -EINVAL;
-
- if (rc)
+ } else if (rc) {
pr_debug("Failed to hot-remove memory at %llx\n",
lmb->base_addr);
- else
+ } else {
pr_debug("Memory at %llx was hot-removed\n", lmb->base_addr);
+ }
return rc;
}
rc = update_lmb_associativity_index(lmb);
if (rc) {
dlpar_release_drc(lmb->drc_index);
+ pr_err("Failed to configure LMB 0x%x\n", lmb->drc_index);
return rc;
}
/* Add the memory */
rc = __add_memory(nid, lmb->base_addr, block_sz, MHP_MEMMAP_ON_MEMORY);
if (rc) {
+ pr_err("Failed to add LMB 0x%x to node %u", lmb->drc_index, nid);
invalidate_lmb_associativity_index(lmb);
return rc;
}
rc = dlpar_online_lmb(lmb);
if (rc) {
+ pr_err("Failed to online LMB 0x%x on node %u\n", lmb->drc_index, nid);
__remove_memory(lmb->base_addr, block_sz);
invalidate_lmb_associativity_index(lmb);
} else {
#define pr_fmt(fmt) "papr-sysparm: " fmt
+#include <linux/anon_inodes.h>
#include <linux/bug.h>
+#include <linux/file.h>
+#include <linux/fs.h>
#include <linux/init.h>
#include <linux/kernel.h>
+#include <linux/miscdevice.h>
#include <linux/printk.h>
#include <linux/slab.h>
-#include <asm/rtas.h>
+#include <linux/uaccess.h>
+#include <asm/machdep.h>
#include <asm/papr-sysparm.h>
#include <asm/rtas-work-area.h>
+#include <asm/rtas.h>
struct papr_sysparm_buf *papr_sysparm_buf_alloc(void)
{
kfree(buf);
}
+static size_t papr_sysparm_buf_get_length(const struct papr_sysparm_buf *buf)
+{
+ return be16_to_cpu(buf->len);
+}
+
+static void papr_sysparm_buf_set_length(struct papr_sysparm_buf *buf, size_t length)
+{
+ WARN_ONCE(length > sizeof(buf->val),
+ "bogus length %zu, clamping to safe value", length);
+ length = min(sizeof(buf->val), length);
+ buf->len = cpu_to_be16(length);
+}
+
+/*
+ * For use on buffers returned from ibm,get-system-parameter before
+ * returning them to callers. Ensures the encoded length of valid data
+ * cannot overrun buf->val[].
+ */
+static void papr_sysparm_buf_clamp_length(struct papr_sysparm_buf *buf)
+{
+ papr_sysparm_buf_set_length(buf, papr_sysparm_buf_get_length(buf));
+}
+
+/*
+ * Perform some basic diligence on the system parameter buffer before
+ * submitting it to RTAS.
+ */
+static bool papr_sysparm_buf_can_submit(const struct papr_sysparm_buf *buf)
+{
+ /*
+ * Firmware ought to reject buffer lengths that exceed the
+ * maximum specified in PAPR, but there's no reason for the
+ * kernel to allow them either.
+ */
+ if (papr_sysparm_buf_get_length(buf) > sizeof(buf->val))
+ return false;
+
+ return true;
+}
+
/**
* papr_sysparm_get() - Retrieve the value of a PAPR system parameter.
* @param: PAPR system parameter token as described in
*
* Return: 0 on success, -errno otherwise. @buf is unmodified on error.
*/
-
int papr_sysparm_get(papr_sysparm_t param, struct papr_sysparm_buf *buf)
{
const s32 token = rtas_function_token(RTAS_FN_IBM_GET_SYSTEM_PARAMETER);
if (token == RTAS_UNKNOWN_SERVICE)
return -ENOENT;
+ if (!papr_sysparm_buf_can_submit(buf))
+ return -EINVAL;
+
work_area = rtas_work_area_alloc(sizeof(*buf));
memcpy(rtas_work_area_raw_buf(work_area), buf, sizeof(*buf));
case 0:
ret = 0;
memcpy(buf, rtas_work_area_raw_buf(work_area), sizeof(*buf));
+ papr_sysparm_buf_clamp_length(buf);
break;
case -3: /* parameter not implemented */
ret = -EOPNOTSUPP;
if (token == RTAS_UNKNOWN_SERVICE)
return -ENOENT;
+ if (!papr_sysparm_buf_can_submit(buf))
+ return -EINVAL;
+
work_area = rtas_work_area_alloc(sizeof(*buf));
memcpy(rtas_work_area_raw_buf(work_area), buf, sizeof(*buf));
return ret;
}
+
+static struct papr_sysparm_buf *
+papr_sysparm_buf_from_user(const struct papr_sysparm_io_block __user *user_iob)
+{
+ struct papr_sysparm_buf *kern_spbuf;
+ long err;
+ u16 len;
+
+ /*
+ * The length of valid data that userspace claims to be in
+ * user_iob->data[].
+ */
+ if (get_user(len, &user_iob->length))
+ return ERR_PTR(-EFAULT);
+
+ static_assert(sizeof(user_iob->data) >= PAPR_SYSPARM_MAX_INPUT);
+ static_assert(sizeof(kern_spbuf->val) >= PAPR_SYSPARM_MAX_INPUT);
+
+ if (len > PAPR_SYSPARM_MAX_INPUT)
+ return ERR_PTR(-EINVAL);
+
+ kern_spbuf = papr_sysparm_buf_alloc();
+ if (!kern_spbuf)
+ return ERR_PTR(-ENOMEM);
+
+ papr_sysparm_buf_set_length(kern_spbuf, len);
+
+ if (len > 0 && copy_from_user(kern_spbuf->val, user_iob->data, len)) {
+ err = -EFAULT;
+ goto free_sysparm_buf;
+ }
+
+ return kern_spbuf;
+
+free_sysparm_buf:
+ papr_sysparm_buf_free(kern_spbuf);
+ return ERR_PTR(err);
+}
+
+static int papr_sysparm_buf_to_user(const struct papr_sysparm_buf *kern_spbuf,
+ struct papr_sysparm_io_block __user *user_iob)
+{
+ u16 len_out = papr_sysparm_buf_get_length(kern_spbuf);
+
+ if (put_user(len_out, &user_iob->length))
+ return -EFAULT;
+
+ static_assert(sizeof(user_iob->data) >= PAPR_SYSPARM_MAX_OUTPUT);
+ static_assert(sizeof(kern_spbuf->val) >= PAPR_SYSPARM_MAX_OUTPUT);
+
+ if (copy_to_user(user_iob->data, kern_spbuf->val, PAPR_SYSPARM_MAX_OUTPUT))
+ return -EFAULT;
+
+ return 0;
+}
+
+static long papr_sysparm_ioctl_get(struct papr_sysparm_io_block __user *user_iob)
+{
+ struct papr_sysparm_buf *kern_spbuf;
+ papr_sysparm_t param;
+ long ret;
+
+ if (get_user(param.token, &user_iob->parameter))
+ return -EFAULT;
+
+ kern_spbuf = papr_sysparm_buf_from_user(user_iob);
+ if (IS_ERR(kern_spbuf))
+ return PTR_ERR(kern_spbuf);
+
+ ret = papr_sysparm_get(param, kern_spbuf);
+ if (ret)
+ goto free_sysparm_buf;
+
+ ret = papr_sysparm_buf_to_user(kern_spbuf, user_iob);
+ if (ret)
+ goto free_sysparm_buf;
+
+ ret = 0;
+
+free_sysparm_buf:
+ papr_sysparm_buf_free(kern_spbuf);
+ return ret;
+}
+
+
+static long papr_sysparm_ioctl_set(struct papr_sysparm_io_block __user *user_iob)
+{
+ struct papr_sysparm_buf *kern_spbuf;
+ papr_sysparm_t param;
+ long ret;
+
+ if (get_user(param.token, &user_iob->parameter))
+ return -EFAULT;
+
+ kern_spbuf = papr_sysparm_buf_from_user(user_iob);
+ if (IS_ERR(kern_spbuf))
+ return PTR_ERR(kern_spbuf);
+
+ ret = papr_sysparm_set(param, kern_spbuf);
+ if (ret)
+ goto free_sysparm_buf;
+
+ ret = 0;
+
+free_sysparm_buf:
+ papr_sysparm_buf_free(kern_spbuf);
+ return ret;
+}
+
+static long papr_sysparm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
+{
+ void __user *argp = (__force void __user *)arg;
+ long ret;
+
+ switch (ioctl) {
+ case PAPR_SYSPARM_IOC_GET:
+ ret = papr_sysparm_ioctl_get(argp);
+ break;
+ case PAPR_SYSPARM_IOC_SET:
+ if (filp->f_mode & FMODE_WRITE)
+ ret = papr_sysparm_ioctl_set(argp);
+ else
+ ret = -EBADF;
+ break;
+ default:
+ ret = -ENOIOCTLCMD;
+ break;
+ }
+ return ret;
+}
+
+static const struct file_operations papr_sysparm_ops = {
+ .unlocked_ioctl = papr_sysparm_ioctl,
+};
+
+static struct miscdevice papr_sysparm_dev = {
+ .minor = MISC_DYNAMIC_MINOR,
+ .name = "papr-sysparm",
+ .fops = &papr_sysparm_ops,
+};
+
+static __init int papr_sysparm_init(void)
+{
+ if (!rtas_function_implemented(RTAS_FN_IBM_GET_SYSTEM_PARAMETER))
+ return -ENODEV;
+
+ return misc_register(&papr_sysparm_dev);
+}
+machine_device_initcall(pseries, papr_sysparm_init);
--- /dev/null
+// SPDX-License-Identifier: GPL-2.0-only
+
+#define pr_fmt(fmt) "papr-vpd: " fmt
+
+#include <linux/anon_inodes.h>
+#include <linux/build_bug.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/init.h>
+#include <linux/lockdep.h>
+#include <linux/kernel.h>
+#include <linux/miscdevice.h>
+#include <linux/signal.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+#include <linux/string_helpers.h>
+#include <linux/uaccess.h>
+#include <asm/machdep.h>
+#include <asm/papr-vpd.h>
+#include <asm/rtas-work-area.h>
+#include <asm/rtas.h>
+#include <uapi/asm/papr-vpd.h>
+
+/*
+ * Function-specific return values for ibm,get-vpd, derived from PAPR+
+ * v2.13 7.3.20 "ibm,get-vpd RTAS Call".
+ */
+#define RTAS_IBM_GET_VPD_COMPLETE 0 /* All VPD has been retrieved. */
+#define RTAS_IBM_GET_VPD_MORE_DATA 1 /* More VPD is available. */
+#define RTAS_IBM_GET_VPD_START_OVER -4 /* VPD changed, restart call sequence. */
+
+/**
+ * struct rtas_ibm_get_vpd_params - Parameters (in and out) for ibm,get-vpd.
+ * @loc_code: In: Caller-provided location code buffer. Must be RTAS-addressable.
+ * @work_area: In: Caller-provided work area buffer for results.
+ * @sequence: In: Sequence number. Out: Next sequence number.
+ * @written: Out: Bytes written by ibm,get-vpd to @work_area.
+ * @status: Out: RTAS call status.
+ */
+struct rtas_ibm_get_vpd_params {
+ const struct papr_location_code *loc_code;
+ struct rtas_work_area *work_area;
+ u32 sequence;
+ u32 written;
+ s32 status;
+};
+
+/**
+ * rtas_ibm_get_vpd() - Call ibm,get-vpd to fill a work area buffer.
+ * @params: See &struct rtas_ibm_get_vpd_params.
+ *
+ * Calls ibm,get-vpd until it errors or successfully deposits data
+ * into the supplied work area. Handles RTAS retry statuses. Maps RTAS
+ * error statuses to reasonable errno values.
+ *
+ * The caller is expected to invoke rtas_ibm_get_vpd() multiple times
+ * to retrieve all the VPD for the provided location code. Only one
+ * sequence should be in progress at any time; starting a new sequence
+ * will disrupt any sequence already in progress. Serialization of VPD
+ * retrieval sequences is the responsibility of the caller.
+ *
+ * The caller should inspect @params.status to determine whether more
+ * calls are needed to complete the sequence.
+ *
+ * Context: May sleep.
+ * Return: -ve on error, 0 otherwise.
+ */
+static int rtas_ibm_get_vpd(struct rtas_ibm_get_vpd_params *params)
+{
+ const struct papr_location_code *loc_code = params->loc_code;
+ struct rtas_work_area *work_area = params->work_area;
+ u32 rets[2];
+ s32 fwrc;
+ int ret;
+
+ lockdep_assert_held(&rtas_ibm_get_vpd_lock);
+
+ do {
+ fwrc = rtas_call(rtas_function_token(RTAS_FN_IBM_GET_VPD), 4, 3,
+ rets,
+ __pa(loc_code),
+ rtas_work_area_phys(work_area),
+ rtas_work_area_size(work_area),
+ params->sequence);
+ } while (rtas_busy_delay(fwrc));
+
+ switch (fwrc) {
+ case RTAS_HARDWARE_ERROR:
+ ret = -EIO;
+ break;
+ case RTAS_INVALID_PARAMETER:
+ ret = -EINVAL;
+ break;
+ case RTAS_IBM_GET_VPD_START_OVER:
+ ret = -EAGAIN;
+ break;
+ case RTAS_IBM_GET_VPD_MORE_DATA:
+ params->sequence = rets[0];
+ fallthrough;
+ case RTAS_IBM_GET_VPD_COMPLETE:
+ params->written = rets[1];
+ /*
+ * Kernel or firmware bug, do not continue.
+ */
+ if (WARN(params->written > rtas_work_area_size(work_area),
+ "possible write beyond end of work area"))
+ ret = -EFAULT;
+ else
+ ret = 0;
+ break;
+ default:
+ ret = -EIO;
+ pr_err_ratelimited("unexpected ibm,get-vpd status %d\n", fwrc);
+ break;
+ }
+
+ params->status = fwrc;
+ return ret;
+}
+
+/*
+ * Internal VPD "blob" APIs for accumulating ibm,get-vpd results into
+ * an immutable buffer to be attached to a file descriptor.
+ */
+struct vpd_blob {
+ const char *data;
+ size_t len;
+};
+
+static bool vpd_blob_has_data(const struct vpd_blob *blob)
+{
+ return blob->data && blob->len;
+}
+
+static void vpd_blob_free(const struct vpd_blob *blob)
+{
+ if (blob) {
+ kvfree(blob->data);
+ kfree(blob);
+ }
+}
+
+/**
+ * vpd_blob_extend() - Append data to a &struct vpd_blob.
+ * @blob: The blob to extend.
+ * @data: The new data to append to @blob.
+ * @len: The length of @data.
+ *
+ * Context: May sleep.
+ * Return: -ENOMEM on allocation failure, 0 otherwise.
+ */
+static int vpd_blob_extend(struct vpd_blob *blob, const char *data, size_t len)
+{
+ const size_t new_len = blob->len + len;
+ const size_t old_len = blob->len;
+ const char *old_ptr = blob->data;
+ char *new_ptr;
+
+ new_ptr = old_ptr ?
+ kvrealloc(old_ptr, old_len, new_len, GFP_KERNEL_ACCOUNT) :
+ kvmalloc(len, GFP_KERNEL_ACCOUNT);
+
+ if (!new_ptr)
+ return -ENOMEM;
+
+ memcpy(&new_ptr[old_len], data, len);
+ blob->data = new_ptr;
+ blob->len = new_len;
+ return 0;
+}
+
+/**
+ * vpd_blob_generate() - Construct a new &struct vpd_blob.
+ * @generator: Function that supplies the blob data.
+ * @arg: Context pointer supplied by caller, passed to @generator.
+ *
+ * The @generator callback is invoked until it returns NULL. @arg is
+ * passed to @generator in its first argument on each call. When
+ * @generator returns data, it should store the data length in its
+ * second argument.
+ *
+ * Context: May sleep.
+ * Return: A completely populated &struct vpd_blob, or NULL on error.
+ */
+static const struct vpd_blob *
+vpd_blob_generate(const char * (*generator)(void *, size_t *), void *arg)
+{
+ struct vpd_blob *blob;
+ const char *buf;
+ size_t len;
+ int err = 0;
+
+ blob = kzalloc(sizeof(*blob), GFP_KERNEL_ACCOUNT);
+ if (!blob)
+ return NULL;
+
+ while (err == 0 && (buf = generator(arg, &len)))
+ err = vpd_blob_extend(blob, buf, len);
+
+ if (err != 0 || !vpd_blob_has_data(blob))
+ goto free_blob;
+
+ return blob;
+free_blob:
+ vpd_blob_free(blob);
+ return NULL;
+}
+
+/*
+ * Internal VPD sequence APIs. A VPD sequence is a series of calls to
+ * ibm,get-vpd for a given location code. The sequence ends when an
+ * error is encountered or all VPD for the location code has been
+ * returned.
+ */
+
+/**
+ * struct vpd_sequence - State for managing a VPD sequence.
+ * @error: Shall be zero as long as the sequence has not encountered an error,
+ * -ve errno otherwise. Use vpd_sequence_set_err() to update this.
+ * @params: Parameter block to pass to rtas_ibm_get_vpd().
+ */
+struct vpd_sequence {
+ int error;
+ struct rtas_ibm_get_vpd_params params;
+};
+
+/**
+ * vpd_sequence_begin() - Begin a VPD retrieval sequence.
+ * @seq: Uninitialized sequence state.
+ * @loc_code: Location code that defines the scope of the VPD to return.
+ *
+ * Initializes @seq with the resources necessary to carry out a VPD
+ * sequence. Callers must pass @seq to vpd_sequence_end() regardless
+ * of whether the sequence succeeds.
+ *
+ * Context: May sleep.
+ */
+static void vpd_sequence_begin(struct vpd_sequence *seq,
+ const struct papr_location_code *loc_code)
+{
+ /*
+ * Use a static data structure for the location code passed to
+ * RTAS to ensure it's in the RMA and avoid a separate work
+ * area allocation. Guarded by the function lock.
+ */
+ static struct papr_location_code static_loc_code;
+
+ /*
+ * We could allocate the work area before acquiring the
+ * function lock, but that would allow concurrent requests to
+ * exhaust the limited work area pool for no benefit. So
+ * allocate the work area under the lock.
+ */
+ mutex_lock(&rtas_ibm_get_vpd_lock);
+ static_loc_code = *loc_code;
+ *seq = (struct vpd_sequence) {
+ .params = {
+ .work_area = rtas_work_area_alloc(SZ_4K),
+ .loc_code = &static_loc_code,
+ .sequence = 1,
+ },
+ };
+}
+
+/**
+ * vpd_sequence_end() - Finalize a VPD retrieval sequence.
+ * @seq: Sequence state.
+ *
+ * Releases resources obtained by vpd_sequence_begin().
+ */
+static void vpd_sequence_end(struct vpd_sequence *seq)
+{
+ rtas_work_area_free(seq->params.work_area);
+ mutex_unlock(&rtas_ibm_get_vpd_lock);
+}
+
+/**
+ * vpd_sequence_should_stop() - Determine whether a VPD retrieval sequence
+ * should continue.
+ * @seq: VPD sequence state.
+ *
+ * Examines the sequence error state and outputs of the last call to
+ * ibm,get-vpd to determine whether the sequence in progress should
+ * continue or stop.
+ *
+ * Return: True if the sequence has encountered an error or if all VPD for
+ * this sequence has been retrieved. False otherwise.
+ */
+static bool vpd_sequence_should_stop(const struct vpd_sequence *seq)
+{
+ bool done;
+
+ if (seq->error)
+ return true;
+
+ switch (seq->params.status) {
+ case 0:
+ if (seq->params.written == 0)
+ done = false; /* Initial state. */
+ else
+ done = true; /* All data consumed. */
+ break;
+ case 1:
+ done = false; /* More data available. */
+ break;
+ default:
+ done = true; /* Error encountered. */
+ break;
+ }
+
+ return done;
+}
+
+static int vpd_sequence_set_err(struct vpd_sequence *seq, int err)
+{
+ /* Preserve the first error recorded. */
+ if (seq->error == 0)
+ seq->error = err;
+
+ return seq->error;
+}
+
+/*
+ * Generator function to be passed to vpd_blob_generate().
+ */
+static const char *vpd_sequence_fill_work_area(void *arg, size_t *len)
+{
+ struct vpd_sequence *seq = arg;
+ struct rtas_ibm_get_vpd_params *p = &seq->params;
+
+ if (vpd_sequence_should_stop(seq))
+ return NULL;
+ if (vpd_sequence_set_err(seq, rtas_ibm_get_vpd(p)))
+ return NULL;
+ *len = p->written;
+ return rtas_work_area_raw_buf(p->work_area);
+}
+
+/*
+ * Higher-level VPD retrieval code below. These functions use the
+ * vpd_blob_* and vpd_sequence_* APIs defined above to create fd-based
+ * VPD handles for consumption by user space.
+ */
+
+/**
+ * papr_vpd_run_sequence() - Run a single VPD retrieval sequence.
+ * @loc_code: Location code that defines the scope of VPD to return.
+ *
+ * Context: May sleep. Holds a mutex and an RTAS work area for its
+ * duration. Typically performs multiple sleepable slab
+ * allocations.
+ *
+ * Return: A populated &struct vpd_blob on success. Encoded error
+ * pointer otherwise.
+ */
+static const struct vpd_blob *papr_vpd_run_sequence(const struct papr_location_code *loc_code)
+{
+ const struct vpd_blob *blob;
+ struct vpd_sequence seq;
+
+ vpd_sequence_begin(&seq, loc_code);
+ blob = vpd_blob_generate(vpd_sequence_fill_work_area, &seq);
+ if (!blob)
+ vpd_sequence_set_err(&seq, -ENOMEM);
+ vpd_sequence_end(&seq);
+
+ if (seq.error) {
+ vpd_blob_free(blob);
+ return ERR_PTR(seq.error);
+ }
+
+ return blob;
+}
+
+/**
+ * papr_vpd_retrieve() - Return the VPD for a location code.
+ * @loc_code: Location code that defines the scope of VPD to return.
+ *
+ * Run VPD sequences against @loc_code until a blob is successfully
+ * instantiated, or a hard error is encountered, or a fatal signal is
+ * pending.
+ *
+ * Context: May sleep.
+ * Return: A fully populated VPD blob when successful. Encoded error
+ * pointer otherwise.
+ */
+static const struct vpd_blob *papr_vpd_retrieve(const struct papr_location_code *loc_code)
+{
+ const struct vpd_blob *blob;
+
+ /*
+ * EAGAIN means the sequence errored with a -4 (VPD changed)
+ * status from ibm,get-vpd, and we should attempt a new
+ * sequence. PAPR+ v2.13 R1–7.3.20–5 indicates that this
+ * should be a transient condition, not something that happens
+ * continuously. But we'll stop trying on a fatal signal.
+ */
+ do {
+ blob = papr_vpd_run_sequence(loc_code);
+ if (!IS_ERR(blob)) /* Success. */
+ break;
+ if (PTR_ERR(blob) != -EAGAIN) /* Hard error. */
+ break;
+ pr_info_ratelimited("VPD changed during retrieval, retrying\n");
+ cond_resched();
+ } while (!fatal_signal_pending(current));
+
+ return blob;
+}
+
+static ssize_t papr_vpd_handle_read(struct file *file, char __user *buf, size_t size, loff_t *off)
+{
+ const struct vpd_blob *blob = file->private_data;
+
+ /* bug: we should not instantiate a handle without any data attached. */
+ if (!vpd_blob_has_data(blob)) {
+ pr_err_once("handle without data\n");
+ return -EIO;
+ }
+
+ return simple_read_from_buffer(buf, size, off, blob->data, blob->len);
+}
+
+static int papr_vpd_handle_release(struct inode *inode, struct file *file)
+{
+ const struct vpd_blob *blob = file->private_data;
+
+ vpd_blob_free(blob);
+
+ return 0;
+}
+
+static loff_t papr_vpd_handle_seek(struct file *file, loff_t off, int whence)
+{
+ const struct vpd_blob *blob = file->private_data;
+
+ return fixed_size_llseek(file, off, whence, blob->len);
+}
+
+
+static const struct file_operations papr_vpd_handle_ops = {
+ .read = papr_vpd_handle_read,
+ .llseek = papr_vpd_handle_seek,
+ .release = papr_vpd_handle_release,
+};
+
+/**
+ * papr_vpd_create_handle() - Create a fd-based handle for reading VPD.
+ * @ulc: Location code in user memory; defines the scope of the VPD to
+ * retrieve.
+ *
+ * Handler for PAPR_VPD_IOC_CREATE_HANDLE ioctl command. Validates
+ * @ulc and instantiates an immutable VPD "blob" for it. The blob is
+ * attached to a file descriptor for reading by user space. The memory
+ * backing the blob is freed when the file is released.
+ *
+ * The entire requested VPD is retrieved by this call and all
+ * necessary RTAS interactions are performed before returning the fd
+ * to user space. This keeps the read handler simple and ensures that
+ * the kernel can prevent interleaving of ibm,get-vpd call sequences.
+ *
+ * Return: The installed fd number if successful, -ve errno otherwise.
+ */
+static long papr_vpd_create_handle(struct papr_location_code __user *ulc)
+{
+ struct papr_location_code klc;
+ const struct vpd_blob *blob;
+ struct file *file;
+ long err;
+ int fd;
+
+ if (copy_from_user(&klc, ulc, sizeof(klc)))
+ return -EFAULT;
+
+ if (!string_is_terminated(klc.str, ARRAY_SIZE(klc.str)))
+ return -EINVAL;
+
+ blob = papr_vpd_retrieve(&klc);
+ if (IS_ERR(blob))
+ return PTR_ERR(blob);
+
+ fd = get_unused_fd_flags(O_RDONLY | O_CLOEXEC);
+ if (fd < 0) {
+ err = fd;
+ goto free_blob;
+ }
+
+ file = anon_inode_getfile("[papr-vpd]", &papr_vpd_handle_ops,
+ (void *)blob, O_RDONLY);
+ if (IS_ERR(file)) {
+ err = PTR_ERR(file);
+ goto put_fd;
+ }
+
+ file->f_mode |= FMODE_LSEEK | FMODE_PREAD;
+ fd_install(fd, file);
+ return fd;
+put_fd:
+ put_unused_fd(fd);
+free_blob:
+ vpd_blob_free(blob);
+ return err;
+}
+
+/*
+ * Top-level ioctl handler for /dev/papr-vpd.
+ */
+static long papr_vpd_dev_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
+{
+ void __user *argp = (__force void __user *)arg;
+ long ret;
+
+ switch (ioctl) {
+ case PAPR_VPD_IOC_CREATE_HANDLE:
+ ret = papr_vpd_create_handle(argp);
+ break;
+ default:
+ ret = -ENOIOCTLCMD;
+ break;
+ }
+ return ret;
+}
+
+static const struct file_operations papr_vpd_ops = {
+ .unlocked_ioctl = papr_vpd_dev_ioctl,
+};
+
+static struct miscdevice papr_vpd_dev = {
+ .minor = MISC_DYNAMIC_MINOR,
+ .name = "papr-vpd",
+ .fops = &papr_vpd_ops,
+};
+
+static __init int papr_vpd_init(void)
+{
+ if (!rtas_function_implemented(RTAS_FN_IBM_GET_VPD))
+ return -ENODEV;
+
+ return misc_register(&papr_vpd_dev);
+}
+machine_device_initcall(pseries, papr_vpd_init);
extern int dlpar_acquire_drc(u32 drc_index);
extern int dlpar_release_drc(u32 drc_index);
extern int dlpar_unisolate_drc(u32 drc_index);
+extern void post_mobility_fixup(void);
void queue_hotplug_event(struct pseries_hp_errorlog *hp_errlog);
int handle_dlpar_errorlog(struct pseries_hp_errorlog *hp_errlog);
#include <asm/mmu.h>
#include <asm/rtas.h>
#include <asm/topology.h>
+#include "pseries.h"
static struct device suspend_dev;
#define GRACKLE_CFA(b, d, o) (0x80 | ((b) << 8) | ((d) << 16) \
| (((o) & ~3) << 24))
-#define GRACKLE_PICR1_STG 0x00000040
#define GRACKLE_PICR1_LOOPSNOOP 0x00000010
-/* N.B. this is called before bridges is initialized, so we can't
- use grackle_pcibios_{read,write}_config_dword. */
-static inline void grackle_set_stg(struct pci_controller* bp, int enable)
-{
- unsigned int val;
-
- out_be32(bp->cfg_addr, GRACKLE_CFA(0, 0, 0xa8));
- val = in_le32(bp->cfg_data);
- val = enable? (val | GRACKLE_PICR1_STG) :
- (val & ~GRACKLE_PICR1_STG);
- out_be32(bp->cfg_addr, GRACKLE_CFA(0, 0, 0xa8));
- out_le32(bp->cfg_data, val);
- (void)in_le32(bp->cfg_data);
-}
-
static inline void grackle_set_loop_snoop(struct pci_controller *bp, int enable)
{
unsigned int val;
pci_add_flags(PCI_REASSIGN_ALL_BUS);
if (of_machine_is_compatible("AAPL,PowerBook1998"))
grackle_set_loop_snoop(hose, 1);
-#if 0 /* Disabled for now, HW problems ??? */
- grackle_set_stg(hose, 1);
-#endif
}
rname = kasprintf(GFP_KERNEL, "CPU %d [0x%x] Interrupt Presentation",
cpu, hw_id);
+ if (!rname)
+ return -ENOMEM;
if (!request_mem_region(addr, size, rname)) {
pr_warn("icp_native: Could not reserve ICP MMIO for CPU %d, interrupt server #0x%x\n",
cpu, hw_id);
select KEXEC_ELF
config ARCH_SUPPORTS_KEXEC_PURGATORY
- def_bool KEXEC_FILE
- depends on CRYPTO=y
- depends on CRYPTO_SHA256=y
+ def_bool ARCH_SUPPORTS_KEXEC_FILE
config ARCH_SUPPORTS_CRASH_DUMP
def_bool y
return sys_ni_syscall(); \
}
-#define COMPAT_SYS_NI(name) \
- SYSCALL_ALIAS(__riscv_compat_sys_##name, sys_ni_posix_timers);
-
#endif /* CONFIG_COMPAT */
#define __SYSCALL_DEFINEx(x, name, ...) \
return sys_ni_syscall(); \
}
-#define SYS_NI(name) SYSCALL_ALIAS(__riscv_sys_##name, sys_ni_posix_timers);
-
#endif /* __ASM_SYSCALL_WRAPPER_H */
/* IMSIC SW-file */
struct imsic_mrif *swfile;
phys_addr_t swfile_pa;
+ spinlock_t swfile_extirq_lock;
};
#define imsic_vs_csr_read(__c) \
{
struct imsic *imsic = vcpu->arch.aia_context.imsic_state;
struct imsic_mrif *mrif = imsic->swfile;
+ unsigned long flags;
+
+ /*
+ * The critical section is necessary during external interrupt
+ * updates to avoid the risk of losing interrupts due to potential
+ * interruptions between reading topei and updating pending status.
+ */
+
+ spin_lock_irqsave(&imsic->swfile_extirq_lock, flags);
if (imsic_mrif_atomic_read(mrif, &mrif->eidelivery) &&
imsic_mrif_topei(mrif, imsic->nr_eix, imsic->nr_msis))
kvm_riscv_vcpu_set_interrupt(vcpu, IRQ_VS_EXT);
else
kvm_riscv_vcpu_unset_interrupt(vcpu, IRQ_VS_EXT);
+
+ spin_unlock_irqrestore(&imsic->swfile_extirq_lock, flags);
}
static void imsic_swfile_read(struct kvm_vcpu *vcpu, bool clear,
}
imsic->swfile = page_to_virt(swfile_page);
imsic->swfile_pa = page_to_phys(swfile_page);
+ spin_lock_init(&imsic->swfile_extirq_lock);
/* Setup IO device */
kvm_iodevice_init(&imsic->iodev, &imsic_iodoev_ops);
goto done;
}
count_vm_vma_lock_event(VMA_LOCK_RETRY);
+ if (fault & VM_FAULT_MAJOR)
+ flags |= FAULT_FLAG_TRIED;
if (fault_signal_pending(fault, regs)) {
if (!user_mode(regs))
def_bool y
config ARCH_SUPPORTS_KEXEC_FILE
- def_bool CRYPTO && CRYPTO_SHA256 && CRYPTO_SHA256_S390
+ def_bool y
config ARCH_SUPPORTS_KEXEC_SIG
def_bool MODULE_SIG_FORMAT
config ARCH_SUPPORTS_KEXEC_PURGATORY
- def_bool KEXEC_FILE
+ def_bool y
config ARCH_SUPPORTS_CRASH_DUMP
def_bool y
CONFIG_KEXEC_SIG=y
CONFIG_CRASH_DUMP=y
CONFIG_LIVEPATCH=y
-CONFIG_MARCH_ZEC12=y
-CONFIG_TUNE_ZEC12=y
+CONFIG_MARCH_Z13=y
CONFIG_NR_CPUS=512
CONFIG_NUMA=y
CONFIG_HZ_100=y
CONFIG_MODULE_UNLOAD_TAINT_TRACKING=y
CONFIG_MODVERSIONS=y
CONFIG_MODULE_SRCVERSION_ALL=y
-CONFIG_MODULE_SIG_SHA256=y
CONFIG_BLK_DEV_THROTTLING=y
CONFIG_BLK_WBT=y
CONFIG_BLK_CGROUP_IOLATENCY=y
CONFIG_IOSCHED_BFQ=y
CONFIG_BINFMT_MISC=m
CONFIG_ZSWAP=y
+CONFIG_ZSWAP_ZPOOL_DEFAULT_ZBUD=y
CONFIG_ZSMALLOC_STAT=y
CONFIG_SLUB_STATS=y
# CONFIG_COMPAT_BRK is not set
CONFIG_BTRFS_DEBUG=y
CONFIG_BTRFS_ASSERT=y
CONFIG_NILFS2_FS=m
+CONFIG_BCACHEFS_FS=y
+CONFIG_BCACHEFS_QUOTA=y
+CONFIG_BCACHEFS_POSIX_ACL=y
CONFIG_FS_DAX=y
CONFIG_EXPORTFS_BLOCK_OPS=y
CONFIG_FS_ENCRYPTION=y
CONFIG_ENCRYPTED_KEYS=m
CONFIG_KEY_NOTIFICATIONS=y
CONFIG_SECURITY=y
-CONFIG_SECURITY_NETWORK=y
CONFIG_HARDENED_USERCOPY=y
CONFIG_FORTIFY_SOURCE=y
CONFIG_SECURITY_SELINUX=y
CONFIG_KEXEC_SIG=y
CONFIG_CRASH_DUMP=y
CONFIG_LIVEPATCH=y
-CONFIG_MARCH_ZEC12=y
-CONFIG_TUNE_ZEC12=y
+CONFIG_MARCH_Z13=y
CONFIG_NR_CPUS=512
CONFIG_NUMA=y
CONFIG_HZ_100=y
CONFIG_MODULE_UNLOAD_TAINT_TRACKING=y
CONFIG_MODVERSIONS=y
CONFIG_MODULE_SRCVERSION_ALL=y
-CONFIG_MODULE_SIG_SHA256=y
CONFIG_BLK_DEV_THROTTLING=y
CONFIG_BLK_WBT=y
CONFIG_BLK_CGROUP_IOLATENCY=y
CONFIG_IOSCHED_BFQ=y
CONFIG_BINFMT_MISC=m
CONFIG_ZSWAP=y
+CONFIG_ZSWAP_ZPOOL_DEFAULT_ZBUD=y
CONFIG_ZSMALLOC_STAT=y
# CONFIG_COMPAT_BRK is not set
CONFIG_MEMORY_HOTPLUG=y
CONFIG_BTRFS_FS=y
CONFIG_BTRFS_FS_POSIX_ACL=y
CONFIG_NILFS2_FS=m
+CONFIG_BCACHEFS_FS=m
+CONFIG_BCACHEFS_QUOTA=y
+CONFIG_BCACHEFS_POSIX_ACL=y
CONFIG_FS_DAX=y
CONFIG_EXPORTFS_BLOCK_OPS=y
CONFIG_FS_ENCRYPTION=y
CONFIG_ENCRYPTED_KEYS=m
CONFIG_KEY_NOTIFICATIONS=y
CONFIG_SECURITY=y
-CONFIG_SECURITY_NETWORK=y
CONFIG_SECURITY_SELINUX=y
CONFIG_SECURITY_SELINUX_BOOTPARAM=y
CONFIG_SECURITY_LOCKDOWN_LSM=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_CRASH_DUMP=y
-CONFIG_MARCH_ZEC12=y
-CONFIG_TUNE_ZEC12=y
+CONFIG_MARCH_Z13=y
# CONFIG_COMPAT is not set
CONFIG_NR_CPUS=2
CONFIG_HZ_100=y
#define KERNEL_VXR_HIGH (KERNEL_VXR_V16V23|KERNEL_VXR_V24V31)
#define KERNEL_VXR (KERNEL_VXR_LOW|KERNEL_VXR_HIGH)
-#define KERNEL_FPR (KERNEL_FPC|KERNEL_VXR_V0V7)
+#define KERNEL_FPR (KERNEL_FPC|KERNEL_VXR_LOW)
struct kernel_fpu;
cond_syscall(__s390x_sys_##name); \
cond_syscall(__s390_sys_##name)
-#define SYS_NI(name) \
- SYSCALL_ALIAS(__s390x_sys_##name, sys_ni_posix_timers); \
- SYSCALL_ALIAS(__s390_sys_##name, sys_ni_posix_timers)
-
#define COMPAT_SYSCALL_DEFINEx(x, name, ...) \
long __s390_compat_sys##name(struct pt_regs *regs); \
ALLOW_ERROR_INJECTION(__s390_compat_sys##name, ERRNO); \
/*
* As some compat syscalls may not be implemented, we need to expand
- * COND_SYSCALL_COMPAT in kernel/sys_ni.c and COMPAT_SYS_NI in
- * kernel/time/posix-stubs.c to cover this case as well.
+ * COND_SYSCALL_COMPAT in kernel/sys_ni.c to cover this case as well.
*/
#define COND_SYSCALL_COMPAT(name) \
cond_syscall(__s390_compat_sys_##name)
-#define COMPAT_SYS_NI(name) \
- SYSCALL_ALIAS(__s390_compat_sys_##name, sys_ni_posix_timers)
-
#define __S390_SYS_STUBx(x, name, ...) \
long __s390_sys##name(struct pt_regs *regs); \
ALLOW_ERROR_INJECTION(__s390_sys##name, ERRNO); \
#define COND_SYSCALL(name) \
cond_syscall(__s390x_sys_##name)
-#define SYS_NI(name) \
- SYSCALL_ALIAS(__s390x_sys_##name, sys_ni_posix_timers)
-
#define __S390_SYS_STUBx(x, fullname, name, ...)
#endif /* CONFIG_COMPAT */
454 common futex_wake sys_futex_wake sys_futex_wake
455 common futex_wait sys_futex_wait sys_futex_wait
456 common futex_requeue sys_futex_requeue sys_futex_requeue
+457 common statmount sys_statmount sys_statmount
+458 common listmount sys_listmount sys_listmount
return;
}
count_vm_vma_lock_event(VMA_LOCK_RETRY);
+ if (fault & VM_FAULT_MAJOR)
+ flags |= FAULT_FLAG_TRIED;
+
/* Quick path to respond to signals */
if (fault_signal_pending(fault, regs)) {
if (!user_mode(regs))
454 common futex_wake sys_futex_wake
455 common futex_wait sys_futex_wait
456 common futex_requeue sys_futex_requeue
+457 common statmount sys_statmount
+458 common listmount sys_listmount
454 common futex_wake sys_futex_wake
455 common futex_wait sys_futex_wait
456 common futex_requeue sys_futex_requeue
+457 common statmount sys_statmount
+458 common listmount sys_listmount
def_bool y
config ARCH_SUPPORTS_KEXEC_FILE
- def_bool X86_64 && CRYPTO && CRYPTO_SHA256
+ def_bool X86_64
config ARCH_SELECTS_KEXEC_FILE
def_bool y
select HAVE_IMA_KEXEC if IMA
config ARCH_SUPPORTS_KEXEC_PURGATORY
- def_bool KEXEC_FILE
+ def_bool y
config ARCH_SUPPORTS_KEXEC_SIG
def_bool y
454 i386 futex_wake sys_futex_wake
455 i386 futex_wait sys_futex_wait
456 i386 futex_requeue sys_futex_requeue
+457 i386 statmount sys_statmount
+458 i386 listmount sys_listmount
454 common futex_wake sys_futex_wake
455 common futex_wait sys_futex_wait
456 common futex_requeue sys_futex_requeue
+457 common statmount sys_statmount
+458 common listmount sys_listmount
#
# Due to a historical design error, certain syscalls are numbered differently
u64 pebs_mask = cpuc->pebs_enabled & x86_pmu.pebs_capable;
int global_ctrl, pebs_enable;
+ /*
+ * In addition to obeying exclude_guest/exclude_host, remove bits being
+ * used for PEBS when running a guest, because PEBS writes to virtual
+ * addresses (not physical addresses).
+ */
*nr = 0;
global_ctrl = (*nr)++;
arr[global_ctrl] = (struct perf_guest_switch_msr){
.msr = MSR_CORE_PERF_GLOBAL_CTRL,
.host = intel_ctrl & ~cpuc->intel_ctrl_guest_mask,
- .guest = intel_ctrl & (~cpuc->intel_ctrl_host_mask | ~pebs_mask),
+ .guest = intel_ctrl & ~cpuc->intel_ctrl_host_mask & ~pebs_mask,
};
if (!x86_pmu.pebs)
#define ALT_FLAG_NOT (1 << 0)
#define ALT_NOT(feature) ((ALT_FLAG_NOT << ALT_FLAGS_SHIFT) | (feature))
+#define ALT_FLAG_DIRECT_CALL (1 << 1)
+#define ALT_DIRECT_CALL(feature) ((ALT_FLAG_DIRECT_CALL << ALT_FLAGS_SHIFT) | (feature))
+#define ALT_CALL_ALWAYS ALT_DIRECT_CALL(X86_FEATURE_ALWAYS)
#ifndef __ASSEMBLY__
u8 replacementlen; /* length of new instruction */
} __packed;
+extern struct alt_instr __alt_instructions[], __alt_instructions_end[];
+
/*
* Debug flag that can be tested to see whether alternative
* instructions were patched in already:
s32 *start_cfi, s32 *end_cfi);
struct module;
-struct paravirt_patch_site;
struct callthunk_sites {
s32 *call_start, *call_end;
- struct paravirt_patch_site *pv_start, *pv_end;
+ struct alt_instr *alt_start, *alt_end;
};
#ifdef CONFIG_CALL_THUNKS
}
#endif /* CONFIG_SMP */
+#define ALT_CALL_INSTR "call BUG_func"
+
#define b_replacement(num) "664"#num
#define e_replacement(num) "665"#num
*/
#define ASM_NO_INPUT_CLOBBER(clbr...) "i" (0) : clbr
+/* Macro for creating assembler functions avoiding any C magic. */
+#define DEFINE_ASM_FUNC(func, instr, sec) \
+ asm (".pushsection " #sec ", \"ax\"\n" \
+ ".global " #func "\n\t" \
+ ".type " #func ", @function\n\t" \
+ ASM_FUNC_ALIGN "\n" \
+ #func ":\n\t" \
+ ASM_ENDBR \
+ instr "\n\t" \
+ ASM_RET \
+ ".size " #func ", . - " #func "\n\t" \
+ ".popsection")
+
+void BUG_func(void);
+void nop_func(void);
+
#else /* __ASSEMBLY__ */
#ifdef CONFIG_SMP
.byte \alt_len
.endm
+.macro ALT_CALL_INSTR
+ call BUG_func
+.endm
+
/*
* Define an alternative between two instructions. If @feature is
* present, early code in apply_alternatives() replaces @oldinstr with
void (*send_IPI_all)(int vector);
void (*send_IPI_self)(int vector);
- enum apic_delivery_modes delivery_mode;
-
u32 disable_esr : 1,
dest_mode_logical : 1,
x2apic_set_max_apicid : 1,
*/
#define IO_APIC_SLOT_SIZE 1024
+#define APIC_DELIVERY_MODE_FIXED 0
+#define APIC_DELIVERY_MODE_LOWESTPRIO 1
+#define APIC_DELIVERY_MODE_SMI 2
+#define APIC_DELIVERY_MODE_NMI 4
+#define APIC_DELIVERY_MODE_INIT 5
+#define APIC_DELIVERY_MODE_EXTINT 7
+
#define APIC_ID 0x20
#define APIC_LVR 0x30
#define APIC_CPUID(apicid) ((apicid) & XAPIC_DEST_CPUS_MASK)
#define NUM_APIC_CLUSTERS ((BAD_APICID + 1) >> XAPIC_DEST_CPUS_SHIFT)
-#ifndef __ASSEMBLY__
-/*
- * the local APIC register structure, memory mapped. Not terribly well
- * tested, but we might eventually use this one in the future - the
- * problem why we cannot use it right now is the P5 APIC, it has an
- * errata which cannot take 8-bit reads and writes, only 32-bit ones ...
- */
-#define u32 unsigned int
-
-struct local_apic {
-
-/*000*/ struct { u32 __reserved[4]; } __reserved_01;
-
-/*010*/ struct { u32 __reserved[4]; } __reserved_02;
-
-/*020*/ struct { /* APIC ID Register */
- u32 __reserved_1 : 24,
- phys_apic_id : 4,
- __reserved_2 : 4;
- u32 __reserved[3];
- } id;
-
-/*030*/ const
- struct { /* APIC Version Register */
- u32 version : 8,
- __reserved_1 : 8,
- max_lvt : 8,
- __reserved_2 : 8;
- u32 __reserved[3];
- } version;
-
-/*040*/ struct { u32 __reserved[4]; } __reserved_03;
-
-/*050*/ struct { u32 __reserved[4]; } __reserved_04;
-
-/*060*/ struct { u32 __reserved[4]; } __reserved_05;
-
-/*070*/ struct { u32 __reserved[4]; } __reserved_06;
-
-/*080*/ struct { /* Task Priority Register */
- u32 priority : 8,
- __reserved_1 : 24;
- u32 __reserved_2[3];
- } tpr;
-
-/*090*/ const
- struct { /* Arbitration Priority Register */
- u32 priority : 8,
- __reserved_1 : 24;
- u32 __reserved_2[3];
- } apr;
-
-/*0A0*/ const
- struct { /* Processor Priority Register */
- u32 priority : 8,
- __reserved_1 : 24;
- u32 __reserved_2[3];
- } ppr;
-
-/*0B0*/ struct { /* End Of Interrupt Register */
- u32 eoi;
- u32 __reserved[3];
- } eoi;
-
-/*0C0*/ struct { u32 __reserved[4]; } __reserved_07;
-
-/*0D0*/ struct { /* Logical Destination Register */
- u32 __reserved_1 : 24,
- logical_dest : 8;
- u32 __reserved_2[3];
- } ldr;
-
-/*0E0*/ struct { /* Destination Format Register */
- u32 __reserved_1 : 28,
- model : 4;
- u32 __reserved_2[3];
- } dfr;
-
-/*0F0*/ struct { /* Spurious Interrupt Vector Register */
- u32 spurious_vector : 8,
- apic_enabled : 1,
- focus_cpu : 1,
- __reserved_2 : 22;
- u32 __reserved_3[3];
- } svr;
-
-/*100*/ struct { /* In Service Register */
-/*170*/ u32 bitfield;
- u32 __reserved[3];
- } isr [8];
-
-/*180*/ struct { /* Trigger Mode Register */
-/*1F0*/ u32 bitfield;
- u32 __reserved[3];
- } tmr [8];
-
-/*200*/ struct { /* Interrupt Request Register */
-/*270*/ u32 bitfield;
- u32 __reserved[3];
- } irr [8];
-
-/*280*/ union { /* Error Status Register */
- struct {
- u32 send_cs_error : 1,
- receive_cs_error : 1,
- send_accept_error : 1,
- receive_accept_error : 1,
- __reserved_1 : 1,
- send_illegal_vector : 1,
- receive_illegal_vector : 1,
- illegal_register_address : 1,
- __reserved_2 : 24;
- u32 __reserved_3[3];
- } error_bits;
- struct {
- u32 errors;
- u32 __reserved_3[3];
- } all_errors;
- } esr;
-
-/*290*/ struct { u32 __reserved[4]; } __reserved_08;
-
-/*2A0*/ struct { u32 __reserved[4]; } __reserved_09;
-
-/*2B0*/ struct { u32 __reserved[4]; } __reserved_10;
-
-/*2C0*/ struct { u32 __reserved[4]; } __reserved_11;
-
-/*2D0*/ struct { u32 __reserved[4]; } __reserved_12;
-
-/*2E0*/ struct { u32 __reserved[4]; } __reserved_13;
-
-/*2F0*/ struct { u32 __reserved[4]; } __reserved_14;
-
-/*300*/ struct { /* Interrupt Command Register 1 */
- u32 vector : 8,
- delivery_mode : 3,
- destination_mode : 1,
- delivery_status : 1,
- __reserved_1 : 1,
- level : 1,
- trigger : 1,
- __reserved_2 : 2,
- shorthand : 2,
- __reserved_3 : 12;
- u32 __reserved_4[3];
- } icr1;
-
-/*310*/ struct { /* Interrupt Command Register 2 */
- union {
- u32 __reserved_1 : 24,
- phys_dest : 4,
- __reserved_2 : 4;
- u32 __reserved_3 : 24,
- logical_dest : 8;
- } dest;
- u32 __reserved_4[3];
- } icr2;
-
-/*320*/ struct { /* LVT - Timer */
- u32 vector : 8,
- __reserved_1 : 4,
- delivery_status : 1,
- __reserved_2 : 3,
- mask : 1,
- timer_mode : 1,
- __reserved_3 : 14;
- u32 __reserved_4[3];
- } lvt_timer;
-
-/*330*/ struct { /* LVT - Thermal Sensor */
- u32 vector : 8,
- delivery_mode : 3,
- __reserved_1 : 1,
- delivery_status : 1,
- __reserved_2 : 3,
- mask : 1,
- __reserved_3 : 15;
- u32 __reserved_4[3];
- } lvt_thermal;
-
-/*340*/ struct { /* LVT - Performance Counter */
- u32 vector : 8,
- delivery_mode : 3,
- __reserved_1 : 1,
- delivery_status : 1,
- __reserved_2 : 3,
- mask : 1,
- __reserved_3 : 15;
- u32 __reserved_4[3];
- } lvt_pc;
-
-/*350*/ struct { /* LVT - LINT0 */
- u32 vector : 8,
- delivery_mode : 3,
- __reserved_1 : 1,
- delivery_status : 1,
- polarity : 1,
- remote_irr : 1,
- trigger : 1,
- mask : 1,
- __reserved_2 : 15;
- u32 __reserved_3[3];
- } lvt_lint0;
-
-/*360*/ struct { /* LVT - LINT1 */
- u32 vector : 8,
- delivery_mode : 3,
- __reserved_1 : 1,
- delivery_status : 1,
- polarity : 1,
- remote_irr : 1,
- trigger : 1,
- mask : 1,
- __reserved_2 : 15;
- u32 __reserved_3[3];
- } lvt_lint1;
-
-/*370*/ struct { /* LVT - Error */
- u32 vector : 8,
- __reserved_1 : 4,
- delivery_status : 1,
- __reserved_2 : 3,
- mask : 1,
- __reserved_3 : 15;
- u32 __reserved_4[3];
- } lvt_error;
-
-/*380*/ struct { /* Timer Initial Count Register */
- u32 initial_count;
- u32 __reserved_2[3];
- } timer_icr;
-
-/*390*/ const
- struct { /* Timer Current Count Register */
- u32 curr_count;
- u32 __reserved_2[3];
- } timer_ccr;
-
-/*3A0*/ struct { u32 __reserved[4]; } __reserved_16;
-
-/*3B0*/ struct { u32 __reserved[4]; } __reserved_17;
-
-/*3C0*/ struct { u32 __reserved[4]; } __reserved_18;
-
-/*3D0*/ struct { u32 __reserved[4]; } __reserved_19;
-
-/*3E0*/ struct { /* Timer Divide Configuration Register */
- u32 divisor : 4,
- __reserved_1 : 28;
- u32 __reserved_2[3];
- } timer_dcr;
-
-/*3F0*/ struct { u32 __reserved[4]; } __reserved_20;
-
-} __attribute__ ((packed));
-
-#undef u32
-
#ifdef CONFIG_X86_32
#define BAD_APICID 0xFFu
#else
#define BAD_APICID 0xFFFFu
#endif
-enum apic_delivery_modes {
- APIC_DELIVERY_MODE_FIXED = 0,
- APIC_DELIVERY_MODE_LOWESTPRIO = 1,
- APIC_DELIVERY_MODE_SMI = 2,
- APIC_DELIVERY_MODE_NMI = 4,
- APIC_DELIVERY_MODE_INIT = 5,
- APIC_DELIVERY_MODE_EXTINT = 7,
-};
-
-#endif /* !__ASSEMBLY__ */
#endif /* _ASM_X86_APICDEF_H */
#include <asm-generic/barrier.h>
-/*
- * Make previous memory operations globally visible before
- * a WRMSR.
- *
- * MFENCE makes writes visible, but only affects load/store
- * instructions. WRMSR is unfortunately not a load/store
- * instruction and is unaffected by MFENCE. The LFENCE ensures
- * that the WRMSR is not reordered.
- *
- * Most WRMSRs are full serializing instructions themselves and
- * do not require this barrier. This is only required for the
- * IA32_TSC_DEADLINE and X2APIC MSRs.
- */
-static inline void weak_wrmsr_fence(void)
-{
- asm volatile("mfence; lfence" : : : "memory");
-}
-
#endif /* _ASM_X86_BARRIER_H */
#define X86_FEATURE_IBRS ( 7*32+25) /* Indirect Branch Restricted Speculation */
#define X86_FEATURE_IBPB ( 7*32+26) /* Indirect Branch Prediction Barrier */
#define X86_FEATURE_STIBP ( 7*32+27) /* Single Thread Indirect Branch Predictors */
-#define X86_FEATURE_ZEN (7*32+28) /* "" CPU based on Zen microarchitecture */
+#define X86_FEATURE_ZEN ( 7*32+28) /* "" Generic flag for all Zen and newer */
#define X86_FEATURE_L1TF_PTEINV ( 7*32+29) /* "" L1TF workaround PTE inversion */
#define X86_FEATURE_IBRS_ENHANCED ( 7*32+30) /* Enhanced IBRS */
#define X86_FEATURE_MSR_IA32_FEAT_CTL ( 7*32+31) /* "" MSR IA32_FEAT_CTL configured */
#define X86_FEATURE_SMBA (11*32+21) /* "" Slow Memory Bandwidth Allocation */
#define X86_FEATURE_BMEC (11*32+22) /* "" Bandwidth Monitoring Event Configuration */
#define X86_FEATURE_USER_SHSTK (11*32+23) /* Shadow stack support for user mode applications */
-
#define X86_FEATURE_SRSO (11*32+24) /* "" AMD BTB untrain RETs */
#define X86_FEATURE_SRSO_ALIAS (11*32+25) /* "" AMD BTB untrain RETs through aliasing */
#define X86_FEATURE_IBPB_ON_VMEXIT (11*32+26) /* "" Issue an IBPB only on VMEXIT */
+#define X86_FEATURE_APIC_MSRS_FENCE (11*32+27) /* "" IA32_TSC_DEADLINE and X2APIC MSRs need fencing */
+#define X86_FEATURE_ZEN2 (11*32+28) /* "" CPU based on Zen2 microarchitecture */
+#define X86_FEATURE_ZEN3 (11*32+29) /* "" CPU based on Zen3 microarchitecture */
+#define X86_FEATURE_ZEN4 (11*32+30) /* "" CPU based on Zen4 microarchitecture */
+#define X86_FEATURE_ZEN1 (11*32+31) /* "" CPU based on Zen1 microarchitecture */
/* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
#define X86_FEATURE_AVX_VNNI (12*32+ 4) /* AVX VNNI instructions */
((x)->e_machine == EM_X86_64)
#define compat_elf_check_arch(x) \
- ((elf_check_arch_ia32(x) && ia32_enabled()) || \
+ ((elf_check_arch_ia32(x) && ia32_enabled_verbose()) || \
(IS_ENABLED(CONFIG_X86_X32_ABI) && (x)->e_machine == EM_X86_64))
static inline void elf_common_init(struct thread_struct *t,
#ifndef _ASM_X86_IA32_H
#define _ASM_X86_IA32_H
-
#ifdef CONFIG_IA32_EMULATION
#include <linux/compat.h>
#endif
+static inline bool ia32_enabled_verbose(void)
+{
+ bool enabled = ia32_enabled();
+
+ if (IS_ENABLED(CONFIG_IA32_EMULATION) && !enabled)
+ pr_notice_once("32-bit emulation disabled. You can reenable with ia32_emulation=on\n");
+
+ return enabled;
+}
+
#endif /* _ASM_X86_IA32_H */
SMCA_PIE, /* Power, Interrupts, etc. */
SMCA_UMC, /* Unified Memory Controller */
SMCA_UMC_V2,
+ SMCA_MA_LLC, /* Memory Attached Last Level Cache */
SMCA_PB, /* Parameter Block */
SMCA_PSP, /* Platform Security Processor */
SMCA_PSP_V2,
SMCA_SHUB, /* System HUB Unit */
SMCA_SATA, /* SATA Unit */
SMCA_USB, /* USB Unit */
+ SMCA_USR_DP, /* Ultra Short Reach Data Plane Controller */
+ SMCA_USR_CP, /* Ultra Short Reach Control Plane Controller */
SMCA_GMI_PCS, /* GMI PCS Unit */
SMCA_XGMI_PHY, /* xGMI PHY Unit */
SMCA_WAFL_PHY, /* WAFL PHY Unit */
N_SMCA_BANK_TYPES
};
-extern const char *smca_get_long_name(enum smca_bank_types t);
extern bool amd_mce_is_memory_error(struct mce *m);
extern int mce_threshold_create_device(unsigned int cpu);
static __always_inline unsigned long read_cr2(void)
{
return PVOP_ALT_CALLEE0(unsigned long, mmu.read_cr2,
- "mov %%cr2, %%rax;",
- ALT_NOT(X86_FEATURE_XENPV));
+ "mov %%cr2, %%rax;", ALT_NOT_XEN);
}
static __always_inline void write_cr2(unsigned long x)
static inline unsigned long __read_cr3(void)
{
return PVOP_ALT_CALL0(unsigned long, mmu.read_cr3,
- "mov %%cr3, %%rax;", ALT_NOT(X86_FEATURE_XENPV));
+ "mov %%cr3, %%rax;", ALT_NOT_XEN);
}
static inline void write_cr3(unsigned long x)
{
- PVOP_ALT_VCALL1(mmu.write_cr3, x,
- "mov %%rdi, %%cr3", ALT_NOT(X86_FEATURE_XENPV));
+ PVOP_ALT_VCALL1(mmu.write_cr3, x, "mov %%rdi, %%cr3", ALT_NOT_XEN);
}
static inline void __write_cr4(unsigned long x)
static __always_inline void wbinvd(void)
{
- PVOP_ALT_VCALL0(cpu.wbinvd, "wbinvd", ALT_NOT(X86_FEATURE_XENPV));
+ PVOP_ALT_VCALL0(cpu.wbinvd, "wbinvd", ALT_NOT_XEN);
}
static inline u64 paravirt_read_msr(unsigned msr)
static inline pte_t __pte(pteval_t val)
{
return (pte_t) { PVOP_ALT_CALLEE1(pteval_t, mmu.make_pte, val,
- "mov %%rdi, %%rax",
- ALT_NOT(X86_FEATURE_XENPV)) };
+ "mov %%rdi, %%rax", ALT_NOT_XEN) };
}
static inline pteval_t pte_val(pte_t pte)
{
return PVOP_ALT_CALLEE1(pteval_t, mmu.pte_val, pte.pte,
- "mov %%rdi, %%rax", ALT_NOT(X86_FEATURE_XENPV));
+ "mov %%rdi, %%rax", ALT_NOT_XEN);
}
static inline pgd_t __pgd(pgdval_t val)
{
return (pgd_t) { PVOP_ALT_CALLEE1(pgdval_t, mmu.make_pgd, val,
- "mov %%rdi, %%rax",
- ALT_NOT(X86_FEATURE_XENPV)) };
+ "mov %%rdi, %%rax", ALT_NOT_XEN) };
}
static inline pgdval_t pgd_val(pgd_t pgd)
{
return PVOP_ALT_CALLEE1(pgdval_t, mmu.pgd_val, pgd.pgd,
- "mov %%rdi, %%rax", ALT_NOT(X86_FEATURE_XENPV));
+ "mov %%rdi, %%rax", ALT_NOT_XEN);
}
#define __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION
static inline pmd_t __pmd(pmdval_t val)
{
return (pmd_t) { PVOP_ALT_CALLEE1(pmdval_t, mmu.make_pmd, val,
- "mov %%rdi, %%rax",
- ALT_NOT(X86_FEATURE_XENPV)) };
+ "mov %%rdi, %%rax", ALT_NOT_XEN) };
}
static inline pmdval_t pmd_val(pmd_t pmd)
{
return PVOP_ALT_CALLEE1(pmdval_t, mmu.pmd_val, pmd.pmd,
- "mov %%rdi, %%rax", ALT_NOT(X86_FEATURE_XENPV));
+ "mov %%rdi, %%rax", ALT_NOT_XEN);
}
static inline void set_pud(pud_t *pudp, pud_t pud)
pudval_t ret;
ret = PVOP_ALT_CALLEE1(pudval_t, mmu.make_pud, val,
- "mov %%rdi, %%rax", ALT_NOT(X86_FEATURE_XENPV));
+ "mov %%rdi, %%rax", ALT_NOT_XEN);
return (pud_t) { ret };
}
static inline pudval_t pud_val(pud_t pud)
{
return PVOP_ALT_CALLEE1(pudval_t, mmu.pud_val, pud.pud,
- "mov %%rdi, %%rax", ALT_NOT(X86_FEATURE_XENPV));
+ "mov %%rdi, %%rax", ALT_NOT_XEN);
}
static inline void pud_clear(pud_t *pudp)
static inline p4d_t __p4d(p4dval_t val)
{
p4dval_t ret = PVOP_ALT_CALLEE1(p4dval_t, mmu.make_p4d, val,
- "mov %%rdi, %%rax",
- ALT_NOT(X86_FEATURE_XENPV));
+ "mov %%rdi, %%rax", ALT_NOT_XEN);
return (p4d_t) { ret };
}
static inline p4dval_t p4d_val(p4d_t p4d)
{
return PVOP_ALT_CALLEE1(p4dval_t, mmu.p4d_val, p4d.p4d,
- "mov %%rdi, %%rax", ALT_NOT(X86_FEATURE_XENPV));
+ "mov %%rdi, %%rax", ALT_NOT_XEN);
}
static inline void __set_pgd(pgd_t *pgdp, pgd_t pgd)
static __always_inline unsigned long arch_local_save_flags(void)
{
return PVOP_ALT_CALLEE0(unsigned long, irq.save_fl, "pushf; pop %%rax;",
- ALT_NOT(X86_FEATURE_XENPV));
+ ALT_NOT_XEN);
}
static __always_inline void arch_local_irq_disable(void)
{
- PVOP_ALT_VCALLEE0(irq.irq_disable, "cli;", ALT_NOT(X86_FEATURE_XENPV));
+ PVOP_ALT_VCALLEE0(irq.irq_disable, "cli;", ALT_NOT_XEN);
}
static __always_inline void arch_local_irq_enable(void)
{
- PVOP_ALT_VCALLEE0(irq.irq_enable, "sti;", ALT_NOT(X86_FEATURE_XENPV));
+ PVOP_ALT_VCALLEE0(irq.irq_enable, "sti;", ALT_NOT_XEN);
}
static __always_inline unsigned long arch_local_irq_save(void)
#undef PVOP_VCALL4
#undef PVOP_CALL4
-#define DEFINE_PARAVIRT_ASM(func, instr, sec) \
- asm (".pushsection " #sec ", \"ax\"\n" \
- ".global " #func "\n\t" \
- ".type " #func ", @function\n\t" \
- ASM_FUNC_ALIGN "\n" \
- #func ":\n\t" \
- ASM_ENDBR \
- instr "\n\t" \
- ASM_RET \
- ".size " #func ", . - " #func "\n\t" \
- ".popsection")
-
extern void default_banner(void);
void native_pv_lock_init(void) __init;
#else /* __ASSEMBLY__ */
-#define _PVSITE(ptype, ops, word, algn) \
-771:; \
- ops; \
-772:; \
- .pushsection .parainstructions,"a"; \
- .align algn; \
- word 771b; \
- .byte ptype; \
- .byte 772b-771b; \
- _ASM_ALIGN; \
- .popsection
-
-
#ifdef CONFIG_X86_64
#ifdef CONFIG_PARAVIRT_XXL
+#ifdef CONFIG_DEBUG_ENTRY
-#define PARA_PATCH(off) ((off) / 8)
-#define PARA_SITE(ptype, ops) _PVSITE(ptype, ops, .quad, 8)
#define PARA_INDIRECT(addr) *addr(%rip)
-#ifdef CONFIG_DEBUG_ENTRY
.macro PARA_IRQ_save_fl
- PARA_SITE(PARA_PATCH(PV_IRQ_save_fl),
- ANNOTATE_RETPOLINE_SAFE;
- call PARA_INDIRECT(pv_ops+PV_IRQ_save_fl);)
+ ANNOTATE_RETPOLINE_SAFE;
+ call PARA_INDIRECT(pv_ops+PV_IRQ_save_fl);
.endm
-#define SAVE_FLAGS ALTERNATIVE "PARA_IRQ_save_fl;", "pushf; pop %rax;", \
- ALT_NOT(X86_FEATURE_XENPV)
+#define SAVE_FLAGS ALTERNATIVE_2 "PARA_IRQ_save_fl;", \
+ "ALT_CALL_INSTR;", ALT_CALL_ALWAYS, \
+ "pushf; pop %rax;", ALT_NOT_XEN
#endif
#endif /* CONFIG_PARAVIRT_XXL */
#endif /* CONFIG_X86_64 */
#ifndef _ASM_X86_PARAVIRT_TYPES_H
#define _ASM_X86_PARAVIRT_TYPES_H
-#ifndef __ASSEMBLY__
-/* These all sit in the .parainstructions section to tell us what to patch. */
-struct paravirt_patch_site {
- u8 *instr; /* original instructions */
- u8 type; /* type of this instruction */
- u8 len; /* length of original instruction */
-};
-#endif
-
#ifdef CONFIG_PARAVIRT
#ifndef __ASSEMBLY__
extern struct pv_info pv_info;
extern struct paravirt_patch_template pv_ops;
-#define PARAVIRT_PATCH(x) \
- (offsetof(struct paravirt_patch_template, x) / sizeof(void *))
-
-#define paravirt_type(op) \
- [paravirt_typenum] "i" (PARAVIRT_PATCH(op)), \
- [paravirt_opptr] "m" (pv_ops.op)
-/*
- * Generate some code, and mark it as patchable by the
- * apply_paravirt() alternate instruction patcher.
- */
-#define _paravirt_alt(insn_string, type) \
- "771:\n\t" insn_string "\n" "772:\n" \
- ".pushsection .parainstructions,\"a\"\n" \
- _ASM_ALIGN "\n" \
- _ASM_PTR " 771b\n" \
- " .byte " type "\n" \
- " .byte 772b-771b\n" \
- _ASM_ALIGN "\n" \
- ".popsection\n"
-
-/* Generate patchable code, with the default asm parameters. */
-#define paravirt_alt(insn_string) \
- _paravirt_alt(insn_string, "%c[paravirt_typenum]")
-
-/* Simple instruction patching code. */
-#define NATIVE_LABEL(a,x,b) "\n\t.globl " a #x "_" #b "\n" a #x "_" #b ":\n\t"
-
-unsigned int paravirt_patch(u8 type, void *insn_buff, unsigned long addr, unsigned int len);
+#define paravirt_ptr(op) [paravirt_opptr] "m" (pv_ops.op)
int paravirt_disable_iospace(void);
-/*
- * This generates an indirect call based on the operation type number.
- * The type number, computed in PARAVIRT_PATCH, is derived from the
- * offset into the paravirt_patch_template structure, and can therefore be
- * freely converted back into a structure offset.
- */
+/* This generates an indirect call based on the operation type number. */
#define PARAVIRT_CALL \
ANNOTATE_RETPOLINE_SAFE \
"call *%[paravirt_opptr];"
* However, x86_64 also has to clobber all caller saved registers, which
* unfortunately, are quite a bit (r8 - r11)
*
- * The call instruction itself is marked by placing its start address
- * and size into the .parainstructions section, so that
- * apply_paravirt() in arch/i386/kernel/alternative.c can do the
- * appropriate patching under the control of the backend pv_init_ops
- * implementation.
- *
* Unfortunately there's no way to get gcc to generate the args setup
* for the call, and then allow the call itself to be generated by an
* inline asm. Because of this, we must do the complete arg setup and
__mask & __eax; \
})
-
+/*
+ * Use alternative patching for paravirt calls:
+ * - For replacing an indirect call with a direct one, use the "normal"
+ * ALTERNATIVE() macro with the indirect call as the initial code sequence,
+ * which will be replaced with the related direct call by using the
+ * ALT_FLAG_DIRECT_CALL special case and the "always on" feature.
+ * - In case the replacement is either a direct call or a short code sequence
+ * depending on a feature bit, the ALTERNATIVE_2() macro is being used.
+ * The indirect call is the initial code sequence again, while the special
+ * code sequence is selected with the specified feature bit. In case the
+ * feature is not active, the direct call is used as above via the
+ * ALT_FLAG_DIRECT_CALL special case and the "always on" feature.
+ */
#define ____PVOP_CALL(ret, op, call_clbr, extra_clbr, ...) \
({ \
PVOP_CALL_ARGS; \
PVOP_TEST_NULL(op); \
- asm volatile(paravirt_alt(PARAVIRT_CALL) \
+ asm volatile(ALTERNATIVE(PARAVIRT_CALL, ALT_CALL_INSTR, \
+ ALT_CALL_ALWAYS) \
: call_clbr, ASM_CALL_CONSTRAINT \
- : paravirt_type(op), \
+ : paravirt_ptr(op), \
##__VA_ARGS__ \
: "memory", "cc" extra_clbr); \
ret; \
({ \
PVOP_CALL_ARGS; \
PVOP_TEST_NULL(op); \
- asm volatile(ALTERNATIVE(paravirt_alt(PARAVIRT_CALL), \
- alt, cond) \
+ asm volatile(ALTERNATIVE_2(PARAVIRT_CALL, \
+ ALT_CALL_INSTR, ALT_CALL_ALWAYS, \
+ alt, cond) \
: call_clbr, ASM_CALL_CONSTRAINT \
- : paravirt_type(op), \
+ : paravirt_ptr(op), \
##__VA_ARGS__ \
: "memory", "cc" extra_clbr); \
ret; \
__PVOP_VCALL(op, PVOP_CALL_ARG1(arg1), PVOP_CALL_ARG2(arg2), \
PVOP_CALL_ARG3(arg3), PVOP_CALL_ARG4(arg4))
-void _paravirt_nop(void);
-void paravirt_BUG(void);
unsigned long paravirt_ret0(void);
#ifdef CONFIG_PARAVIRT_XXL
u64 _paravirt_ident_64(u64);
unsigned long pv_native_read_cr2(void);
#endif
-#define paravirt_nop ((void *)_paravirt_nop)
-
-extern struct paravirt_patch_site __parainstructions[],
- __parainstructions_end[];
+#define paravirt_nop ((void *)nop_func)
#endif /* __ASSEMBLY__ */
+
+#define ALT_NOT_XEN ALT_NOT(X86_FEATURE_XENPV)
+
#endif /* CONFIG_PARAVIRT */
#endif /* _ASM_X86_PARAVIRT_TYPES_H */
extern bool gds_ucode_mitigated(void);
+/*
+ * Make previous memory operations globally visible before
+ * a WRMSR.
+ *
+ * MFENCE makes writes visible, but only affects load/store
+ * instructions. WRMSR is unfortunately not a load/store
+ * instruction and is unaffected by MFENCE. The LFENCE ensures
+ * that the WRMSR is not reordered.
+ *
+ * Most WRMSRs are full serializing instructions themselves and
+ * do not require this barrier. This is only required for the
+ * IA32_TSC_DEADLINE and X2APIC MSRs.
+ */
+static inline void weak_wrmsr_fence(void)
+{
+ alternative("mfence; lfence", "", ALT_NOT(X86_FEATURE_APIC_MSRS_FENCE));
+}
+
#endif /* _ASM_X86_PROCESSOR_H */
"pop %rdx\n\t" \
FRAME_END
-DEFINE_PARAVIRT_ASM(__raw_callee_save___pv_queued_spin_unlock,
- PV_UNLOCK_ASM, .spinlock.text);
+DEFINE_ASM_FUNC(__raw_callee_save___pv_queued_spin_unlock,
+ PV_UNLOCK_ASM, .spinlock.text);
#else /* CONFIG_64BIT */
return sys_ni_syscall(); \
}
-#define __SYS_NI(abi, name) \
- SYSCALL_ALIAS(__##abi##_##name, sys_ni_posix_timers);
-
#ifdef CONFIG_X86_64
#define __X64_SYS_STUB0(name) \
__SYS_STUB0(x64, sys_##name)
#define __X64_COND_SYSCALL(name) \
__COND_SYSCALL(x64, sys_##name)
-#define __X64_SYS_NI(name) \
- __SYS_NI(x64, sys_##name)
#else /* CONFIG_X86_64 */
#define __X64_SYS_STUB0(name)
#define __X64_SYS_STUBx(x, name, ...)
#define __X64_COND_SYSCALL(name)
-#define __X64_SYS_NI(name)
#endif /* CONFIG_X86_64 */
#if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)
#define __IA32_COND_SYSCALL(name) \
__COND_SYSCALL(ia32, sys_##name)
-#define __IA32_SYS_NI(name) \
- __SYS_NI(ia32, sys_##name)
#else /* CONFIG_X86_32 || CONFIG_IA32_EMULATION */
#define __IA32_SYS_STUB0(name)
#define __IA32_SYS_STUBx(x, name, ...)
#define __IA32_COND_SYSCALL(name)
-#define __IA32_SYS_NI(name)
#endif /* CONFIG_X86_32 || CONFIG_IA32_EMULATION */
#ifdef CONFIG_IA32_EMULATION
* additional wrappers (aptly named __ia32_sys_xyzzy) which decode the
* ia32 regs in the proper order for shared or "common" syscalls. As some
* syscalls may not be implemented, we need to expand COND_SYSCALL in
- * kernel/sys_ni.c and SYS_NI in kernel/time/posix-stubs.c to cover this
- * case as well.
+ * kernel/sys_ni.c to cover this case as well.
*/
#define __IA32_COMPAT_SYS_STUB0(name) \
__SYS_STUB0(ia32, compat_sys_##name)
#define __IA32_COMPAT_COND_SYSCALL(name) \
__COND_SYSCALL(ia32, compat_sys_##name)
-#define __IA32_COMPAT_SYS_NI(name) \
- __SYS_NI(ia32, compat_sys_##name)
-
#else /* CONFIG_IA32_EMULATION */
#define __IA32_COMPAT_SYS_STUB0(name)
#define __IA32_COMPAT_SYS_STUBx(x, name, ...)
#define __IA32_COMPAT_COND_SYSCALL(name)
-#define __IA32_COMPAT_SYS_NI(name)
#endif /* CONFIG_IA32_EMULATION */
#define __X32_COMPAT_COND_SYSCALL(name) \
__COND_SYSCALL(x64, compat_sys_##name)
-#define __X32_COMPAT_SYS_NI(name) \
- __SYS_NI(x64, compat_sys_##name)
#else /* CONFIG_X86_X32_ABI */
#define __X32_COMPAT_SYS_STUB0(name)
#define __X32_COMPAT_SYS_STUBx(x, name, ...)
#define __X32_COMPAT_COND_SYSCALL(name)
-#define __X32_COMPAT_SYS_NI(name)
#endif /* CONFIG_X86_X32_ABI */
/*
* As some compat syscalls may not be implemented, we need to expand
- * COND_SYSCALL_COMPAT in kernel/sys_ni.c and COMPAT_SYS_NI in
- * kernel/time/posix-stubs.c to cover this case as well.
+ * COND_SYSCALL_COMPAT in kernel/sys_ni.c to cover this case as well.
*/
#define COND_SYSCALL_COMPAT(name) \
__IA32_COMPAT_COND_SYSCALL(name) \
__X32_COMPAT_COND_SYSCALL(name)
-#define COMPAT_SYS_NI(name) \
- __IA32_COMPAT_SYS_NI(name) \
- __X32_COMPAT_SYS_NI(name)
-
#endif /* CONFIG_COMPAT */
#define __SYSCALL_DEFINEx(x, name, ...) \
* As the generic SYSCALL_DEFINE0() macro does not decode any parameters for
* obvious reasons, and passing struct pt_regs *regs to it in %rdi does not
* hurt, we only need to re-define it here to keep the naming congruent to
- * SYSCALL_DEFINEx() -- which is essential for the COND_SYSCALL() and SYS_NI()
- * macros to work correctly.
+ * SYSCALL_DEFINEx() -- which is essential for the COND_SYSCALL() macro
+ * to work correctly.
*/
#define SYSCALL_DEFINE0(sname) \
SYSCALL_METADATA(_##sname, 0); \
__X64_COND_SYSCALL(name) \
__IA32_COND_SYSCALL(name)
-#define SYS_NI(name) \
- __X64_SYS_NI(name) \
- __IA32_SYS_NI(name)
-
/*
* For VSYSCALLS, we need to declare these three syscalls with the new
#include <linux/stddef.h>
#include <asm/ptrace.h>
-struct paravirt_patch_site;
-#ifdef CONFIG_PARAVIRT
-void apply_paravirt(struct paravirt_patch_site *start,
- struct paravirt_patch_site *end);
-#else
-static inline void apply_paravirt(struct paravirt_patch_site *start,
- struct paravirt_patch_site *end)
-{}
-#define __parainstructions NULL
-#define __parainstructions_end NULL
-#endif
-
/*
* Currently, the max observed size in the kernel code is
* JUMP_LABEL_NOP_SIZE/RELATIVEJUMP_SIZE, which are 5.
processor->processor_id, /* ACPI ID */
processor->lapic_flags & ACPI_MADT_ENABLED);
+ has_lapic_cpus = true;
return 0;
}
if (!count) {
count = acpi_table_parse_madt(ACPI_MADT_TYPE_LOCAL_APIC,
acpi_parse_lapic, MAX_LOCAL_APIC);
- has_lapic_cpus = count > 0;
x2count = acpi_table_parse_madt(ACPI_MADT_TYPE_LOCAL_X2APIC,
acpi_parse_x2apic, MAX_LOCAL_APIC);
}
extern s32 __return_sites[], __return_sites_end[];
extern s32 __cfi_sites[], __cfi_sites_end[];
extern s32 __ibt_endbr_seal[], __ibt_endbr_seal_end[];
-extern struct alt_instr __alt_instructions[], __alt_instructions_end[];
extern s32 __smp_locks[], __smp_locks_end[];
void text_poke_early(void *addr, const void *opcode, size_t len);
}
}
+static void __init_or_module noinline optimize_nops_inplace(u8 *instr, size_t len)
+{
+ unsigned long flags;
+
+ local_irq_save(flags);
+ optimize_nops(instr, len);
+ sync_core();
+ local_irq_restore(flags);
+}
+
/*
* In this context, "source" is where the instructions are placed in the
* section .altinstr_replacement, for example during kernel build by the
}
}
+/* Low-level backend functions usable from alternative code replacements. */
+DEFINE_ASM_FUNC(nop_func, "", .entry.text);
+EXPORT_SYMBOL_GPL(nop_func);
+
+noinstr void BUG_func(void)
+{
+ BUG();
+}
+EXPORT_SYMBOL_GPL(BUG_func);
+
+#define CALL_RIP_REL_OPCODE 0xff
+#define CALL_RIP_REL_MODRM 0x15
+
+/*
+ * Rewrite the "call BUG_func" replacement to point to the target of the
+ * indirect pv_ops call "call *disp(%ip)".
+ */
+static int alt_replace_call(u8 *instr, u8 *insn_buff, struct alt_instr *a)
+{
+ void *target, *bug = &BUG_func;
+ s32 disp;
+
+ if (a->replacementlen != 5 || insn_buff[0] != CALL_INSN_OPCODE) {
+ pr_err("ALT_FLAG_DIRECT_CALL set for a non-call replacement instruction\n");
+ BUG();
+ }
+
+ if (a->instrlen != 6 ||
+ instr[0] != CALL_RIP_REL_OPCODE ||
+ instr[1] != CALL_RIP_REL_MODRM) {
+ pr_err("ALT_FLAG_DIRECT_CALL set for unrecognized indirect call\n");
+ BUG();
+ }
+
+ /* Skip CALL_RIP_REL_OPCODE and CALL_RIP_REL_MODRM */
+ disp = *(s32 *)(instr + 2);
+#ifdef CONFIG_X86_64
+ /* ff 15 00 00 00 00 call *0x0(%rip) */
+ /* target address is stored at "next instruction + disp". */
+ target = *(void **)(instr + a->instrlen + disp);
+#else
+ /* ff 15 00 00 00 00 call *0x0 */
+ /* target address is stored at disp. */
+ target = *(void **)disp;
+#endif
+ if (!target)
+ target = bug;
+
+ /* (BUG_func - .) + (target - BUG_func) := target - . */
+ *(s32 *)(insn_buff + 1) += target - bug;
+
+ if (target == &nop_func)
+ return 0;
+
+ return 5;
+}
+
/*
* Replace instructions with better alternatives for this CPU type. This runs
* before SMP is initialized to avoid SMP problems with self modifying code.
* patch if feature is *NOT* present.
*/
if (!boot_cpu_has(a->cpuid) == !(a->flags & ALT_FLAG_NOT)) {
- optimize_nops(instr, a->instrlen);
+ optimize_nops_inplace(instr, a->instrlen);
continue;
}
- DPRINTK(ALT, "feat: %s%d*32+%d, old: (%pS (%px) len: %d), repl: (%px, len: %d)",
- (a->flags & ALT_FLAG_NOT) ? "!" : "",
+ DPRINTK(ALT, "feat: %d*32+%d, old: (%pS (%px) len: %d), repl: (%px, len: %d) flags: 0x%x",
a->cpuid >> 5,
a->cpuid & 0x1f,
instr, instr, a->instrlen,
- replacement, a->replacementlen);
+ replacement, a->replacementlen, a->flags);
memcpy(insn_buff, replacement, a->replacementlen);
insn_buff_sz = a->replacementlen;
+ if (a->flags & ALT_FLAG_DIRECT_CALL) {
+ insn_buff_sz = alt_replace_call(instr, insn_buff, a);
+ if (insn_buff_sz < 0)
+ continue;
+ }
+
for (; insn_buff_sz < a->instrlen; insn_buff_sz++)
insn_buff[insn_buff_sz] = 0x90;
}
#endif /* CONFIG_SMP */
-#ifdef CONFIG_PARAVIRT
-
-/* Use this to add nops to a buffer, then text_poke the whole buffer. */
-static void __init_or_module add_nops(void *insns, unsigned int len)
-{
- while (len > 0) {
- unsigned int noplen = len;
- if (noplen > ASM_NOP_MAX)
- noplen = ASM_NOP_MAX;
- memcpy(insns, x86_nops[noplen], noplen);
- insns += noplen;
- len -= noplen;
- }
-}
-
-void __init_or_module apply_paravirt(struct paravirt_patch_site *start,
- struct paravirt_patch_site *end)
-{
- struct paravirt_patch_site *p;
- char insn_buff[MAX_PATCH_LEN];
-
- for (p = start; p < end; p++) {
- unsigned int used;
-
- BUG_ON(p->len > MAX_PATCH_LEN);
- /* prep the buffer with the original instructions */
- memcpy(insn_buff, p->instr, p->len);
- used = paravirt_patch(p->type, insn_buff, (unsigned long)p->instr, p->len);
-
- BUG_ON(used > p->len);
-
- /* Pad the rest with nops */
- add_nops(insn_buff + used, p->len - used);
- text_poke_early(p->instr, insn_buff, p->len);
- }
-}
-extern struct paravirt_patch_site __start_parainstructions[],
- __stop_parainstructions[];
-#endif /* CONFIG_PARAVIRT */
-
/*
* Self-test for the INT3 based CALL emulation code.
*
*/
/*
- * Paravirt patching and alternative patching can be combined to
- * replace a function call with a short direct code sequence (e.g.
- * by setting a constant return value instead of doing that in an
- * external function).
- * In order to make this work the following sequence is required:
- * 1. set (artificial) features depending on used paravirt
- * functions which can later influence alternative patching
- * 2. apply paravirt patching (generally replacing an indirect
- * function call with a direct one)
- * 3. apply alternative patching (e.g. replacing a direct function
- * call with a custom code sequence)
- * Doing paravirt patching after alternative patching would clobber
- * the optimization of the custom code with a function call again.
+ * Make sure to set (artificial) features depending on used paravirt
+ * functions which can later influence alternative patching.
*/
paravirt_set_cap();
- /*
- * First patch paravirt functions, such that we overwrite the indirect
- * call with the direct call.
- */
- apply_paravirt(__parainstructions, __parainstructions_end);
-
__apply_fineibt(__retpoline_sites, __retpoline_sites_end,
__cfi_sites, __cfi_sites_end, true);
apply_retpolines(__retpoline_sites, __retpoline_sites_end);
apply_returns(__return_sites, __return_sites_end);
- /*
- * Then patch alternatives, such that those paravirt calls that are in
- * alternatives can be overwritten by their immediate fragments.
- */
apply_alternatives(__alt_instructions, __alt_instructions_end);
/*
} else {
local_irq_save(flags);
memcpy(addr, opcode, len);
- local_irq_restore(flags);
sync_core();
+ local_irq_restore(flags);
/*
* Could also do a CLFLUSH here to speed up CPU recovery; but
.acpi_madt_oem_check = flat_acpi_madt_oem_check,
.apic_id_registered = default_apic_id_registered,
- .delivery_mode = APIC_DELIVERY_MODE_FIXED,
.dest_mode_logical = true,
.disable_esr = 0,
.acpi_madt_oem_check = physflat_acpi_madt_oem_check,
.apic_id_registered = default_apic_id_registered,
- .delivery_mode = APIC_DELIVERY_MODE_FIXED,
.dest_mode_logical = false,
.disable_esr = 0,
struct apic apic_noop __ro_after_init = {
.name = "noop",
- .delivery_mode = APIC_DELIVERY_MODE_FIXED,
.dest_mode_logical = true,
.disable_esr = 0,
.probe = numachip1_probe,
.acpi_madt_oem_check = numachip1_acpi_madt_oem_check,
- .delivery_mode = APIC_DELIVERY_MODE_FIXED,
.dest_mode_logical = false,
.disable_esr = 0,
.probe = numachip2_probe,
.acpi_madt_oem_check = numachip2_acpi_madt_oem_check,
- .delivery_mode = APIC_DELIVERY_MODE_FIXED,
.dest_mode_logical = false,
.disable_esr = 0,
.name = "bigsmp",
.probe = probe_bigsmp,
- .delivery_mode = APIC_DELIVERY_MODE_FIXED,
.dest_mode_logical = false,
.disable_esr = 1,
/*
* Legacy ISA IRQ has already been allocated, just add pin to
* the pin list associated with this IRQ and program the IOAPIC
- * entry. The IOAPIC entry
+ * entry.
*/
if (irq_data && irq_data->parent_data) {
if (!mp_check_pin_attr(irq, info))
.probe = probe_default,
.apic_id_registered = default_apic_id_registered,
- .delivery_mode = APIC_DELIVERY_MODE_FIXED,
.dest_mode_logical = true,
.disable_esr = 0,
.probe = x2apic_cluster_probe,
.acpi_madt_oem_check = x2apic_acpi_madt_oem_check,
- .delivery_mode = APIC_DELIVERY_MODE_FIXED,
.dest_mode_logical = true,
.disable_esr = 0,
.probe = x2apic_phys_probe,
.acpi_madt_oem_check = x2apic_acpi_madt_oem_check,
- .delivery_mode = APIC_DELIVERY_MODE_FIXED,
.dest_mode_logical = false,
.disable_esr = 0,
.probe = uv_probe,
.acpi_madt_oem_check = uv_acpi_madt_oem_check,
- .delivery_mode = APIC_DELIVERY_MODE_FIXED,
.dest_mode_logical = false,
.disable_esr = 0,
}
static __init_or_module void
-patch_paravirt_call_sites(struct paravirt_patch_site *start,
- struct paravirt_patch_site *end,
- const struct core_text *ct)
+patch_alt_call_sites(struct alt_instr *start, struct alt_instr *end,
+ const struct core_text *ct)
{
- struct paravirt_patch_site *p;
+ struct alt_instr *a;
- for (p = start; p < end; p++)
- patch_call(p->instr, ct);
+ for (a = start; a < end; a++)
+ patch_call((void *)&a->instr_offset + a->instr_offset, ct);
}
static __init_or_module void
{
prdbg("Patching call sites %s\n", ct->name);
patch_call_sites(cs->call_start, cs->call_end, ct);
- patch_paravirt_call_sites(cs->pv_start, cs->pv_end, ct);
+ patch_alt_call_sites(cs->alt_start, cs->alt_end, ct);
prdbg("Patching call sites done%s\n", ct->name);
}
struct callthunk_sites cs = {
.call_start = __call_sites,
.call_end = __call_sites_end,
- .pv_start = __parainstructions,
- .pv_end = __parainstructions_end
+ .alt_start = __alt_instructions,
+ .alt_end = __alt_instructions_end
};
if (!cpu_feature_enabled(X86_FEATURE_CALL_DEPTH))
*/
static u32 nodes_per_socket = 1;
-/*
- * AMD errata checking
- *
- * Errata are defined as arrays of ints using the AMD_LEGACY_ERRATUM() or
- * AMD_OSVW_ERRATUM() macros. The latter is intended for newer errata that
- * have an OSVW id assigned, which it takes as first argument. Both take a
- * variable number of family-specific model-stepping ranges created by
- * AMD_MODEL_RANGE().
- *
- * Example:
- *
- * const int amd_erratum_319[] =
- * AMD_LEGACY_ERRATUM(AMD_MODEL_RANGE(0x10, 0x2, 0x1, 0x4, 0x2),
- * AMD_MODEL_RANGE(0x10, 0x8, 0x0, 0x8, 0x0),
- * AMD_MODEL_RANGE(0x10, 0x9, 0x0, 0x9, 0x0));
- */
-
-#define AMD_LEGACY_ERRATUM(...) { -1, __VA_ARGS__, 0 }
-#define AMD_OSVW_ERRATUM(osvw_id, ...) { osvw_id, __VA_ARGS__, 0 }
-#define AMD_MODEL_RANGE(f, m_start, s_start, m_end, s_end) \
- ((f << 24) | (m_start << 16) | (s_start << 12) | (m_end << 4) | (s_end))
-#define AMD_MODEL_RANGE_FAMILY(range) (((range) >> 24) & 0xff)
-#define AMD_MODEL_RANGE_START(range) (((range) >> 12) & 0xfff)
-#define AMD_MODEL_RANGE_END(range) ((range) & 0xfff)
-
-static const int amd_erratum_400[] =
- AMD_OSVW_ERRATUM(1, AMD_MODEL_RANGE(0xf, 0x41, 0x2, 0xff, 0xf),
- AMD_MODEL_RANGE(0x10, 0x2, 0x1, 0xff, 0xf));
-
-static const int amd_erratum_383[] =
- AMD_OSVW_ERRATUM(3, AMD_MODEL_RANGE(0x10, 0, 0, 0xff, 0xf));
-
-/* #1054: Instructions Retired Performance Counter May Be Inaccurate */
-static const int amd_erratum_1054[] =
- AMD_LEGACY_ERRATUM(AMD_MODEL_RANGE(0x17, 0, 0, 0x2f, 0xf));
-
-static const int amd_zenbleed[] =
- AMD_LEGACY_ERRATUM(AMD_MODEL_RANGE(0x17, 0x30, 0x0, 0x4f, 0xf),
- AMD_MODEL_RANGE(0x17, 0x60, 0x0, 0x7f, 0xf),
- AMD_MODEL_RANGE(0x17, 0x90, 0x0, 0x91, 0xf),
- AMD_MODEL_RANGE(0x17, 0xa0, 0x0, 0xaf, 0xf));
-
-static const int amd_div0[] =
- AMD_LEGACY_ERRATUM(AMD_MODEL_RANGE(0x17, 0x00, 0x0, 0x2f, 0xf),
- AMD_MODEL_RANGE(0x17, 0x50, 0x0, 0x5f, 0xf));
-
-static const int amd_erratum_1485[] =
- AMD_LEGACY_ERRATUM(AMD_MODEL_RANGE(0x19, 0x10, 0x0, 0x1f, 0xf),
- AMD_MODEL_RANGE(0x19, 0x60, 0x0, 0xaf, 0xf));
-
-static bool cpu_has_amd_erratum(struct cpuinfo_x86 *cpu, const int *erratum)
-{
- int osvw_id = *erratum++;
- u32 range;
- u32 ms;
-
- if (osvw_id >= 0 && osvw_id < 65536 &&
- cpu_has(cpu, X86_FEATURE_OSVW)) {
- u64 osvw_len;
-
- rdmsrl(MSR_AMD64_OSVW_ID_LENGTH, osvw_len);
- if (osvw_id < osvw_len) {
- u64 osvw_bits;
-
- rdmsrl(MSR_AMD64_OSVW_STATUS + (osvw_id >> 6),
- osvw_bits);
- return osvw_bits & (1ULL << (osvw_id & 0x3f));
- }
- }
-
- /* OSVW unavailable or ID unknown, match family-model-stepping range */
- ms = (cpu->x86_model << 4) | cpu->x86_stepping;
- while ((range = *erratum++))
- if ((cpu->x86 == AMD_MODEL_RANGE_FAMILY(range)) &&
- (ms >= AMD_MODEL_RANGE_START(range)) &&
- (ms <= AMD_MODEL_RANGE_END(range)))
- return true;
-
- return false;
-}
-
static inline int rdmsrl_amd_safe(unsigned msr, unsigned long long *p)
{
u32 gprs[8] = { 0 };
}
resctrl_cpu_detect(c);
+
+ /* Figure out Zen generations: */
+ switch (c->x86) {
+ case 0x17: {
+ switch (c->x86_model) {
+ case 0x00 ... 0x2f:
+ case 0x50 ... 0x5f:
+ setup_force_cpu_cap(X86_FEATURE_ZEN1);
+ break;
+ case 0x30 ... 0x4f:
+ case 0x60 ... 0x7f:
+ case 0x90 ... 0x91:
+ case 0xa0 ... 0xaf:
+ setup_force_cpu_cap(X86_FEATURE_ZEN2);
+ break;
+ default:
+ goto warn;
+ }
+ break;
+ }
+ case 0x19: {
+ switch (c->x86_model) {
+ case 0x00 ... 0x0f:
+ case 0x20 ... 0x5f:
+ setup_force_cpu_cap(X86_FEATURE_ZEN3);
+ break;
+ case 0x10 ... 0x1f:
+ case 0x60 ... 0xaf:
+ setup_force_cpu_cap(X86_FEATURE_ZEN4);
+ break;
+ default:
+ goto warn;
+ }
+ break;
+ }
+ default:
+ break;
+ }
+
+ return;
+
+warn:
+ WARN_ONCE(1, "Family 0x%x, model: 0x%x??\n", c->x86, c->x86_model);
}
static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
if (c->x86 == 0x16 && c->x86_model <= 0xf)
msr_set_bit(MSR_AMD64_LS_CFG, 15);
- /*
- * Check whether the machine is affected by erratum 400. This is
- * used to select the proper idle routine and to enable the check
- * whether the machine is affected in arch_post_acpi_init(), which
- * sets the X86_BUG_AMD_APIC_C1E bug depending on the MSR check.
- */
- if (cpu_has_amd_erratum(c, amd_erratum_400))
- set_cpu_bug(c, X86_BUG_AMD_E400);
-
early_detect_mem_encrypt(c);
/* Re-enable TopologyExtensions if switched off by BIOS */
msr_set_bit(MSR_K7_HWCR, 6);
#endif
set_cpu_bug(c, X86_BUG_SWAPGS_FENCE);
+
+ /*
+ * Check models and steppings affected by erratum 400. This is
+ * used to select the proper idle routine and to enable the
+ * check whether the machine is affected in arch_post_acpi_subsys_init()
+ * which sets the X86_BUG_AMD_APIC_C1E bug depending on the MSR check.
+ */
+ if (c->x86_model > 0x41 ||
+ (c->x86_model == 0x41 && c->x86_stepping >= 0x2))
+ setup_force_cpu_bug(X86_BUG_AMD_E400);
}
static void init_amd_gh(struct cpuinfo_x86 *c)
*/
msr_clear_bit(MSR_AMD64_BU_CFG2, 24);
- if (cpu_has_amd_erratum(c, amd_erratum_383))
- set_cpu_bug(c, X86_BUG_AMD_TLB_MMATCH);
+ set_cpu_bug(c, X86_BUG_AMD_TLB_MMATCH);
+
+ /*
+ * Check models and steppings affected by erratum 400. This is
+ * used to select the proper idle routine and to enable the
+ * check whether the machine is affected in arch_post_acpi_subsys_init()
+ * which sets the X86_BUG_AMD_APIC_C1E bug depending on the MSR check.
+ */
+ if (c->x86_model > 0x2 ||
+ (c->x86_model == 0x2 && c->x86_stepping >= 0x1))
+ setup_force_cpu_bug(X86_BUG_AMD_E400);
}
static void init_amd_ln(struct cpuinfo_x86 *c)
clear_rdrand_cpuid_bit(c);
}
+static void fix_erratum_1386(struct cpuinfo_x86 *c)
+{
+ /*
+ * Work around Erratum 1386. The XSAVES instruction malfunctions in
+ * certain circumstances on Zen1/2 uarch, and not all parts have had
+ * updated microcode at the time of writing (March 2023).
+ *
+ * Affected parts all have no supervisor XSAVE states, meaning that
+ * the XSAVEC instruction (which works fine) is equivalent.
+ */
+ clear_cpu_cap(c, X86_FEATURE_XSAVES);
+}
+
void init_spectral_chicken(struct cpuinfo_x86 *c)
{
#ifdef CONFIG_CPU_UNRET_ENTRY
*
* This suppresses speculation from the middle of a basic block, i.e. it
* suppresses non-branch predictions.
- *
- * We use STIBP as a heuristic to filter out Zen2 from the rest of F17H
*/
- if (!cpu_has(c, X86_FEATURE_HYPERVISOR) && cpu_has(c, X86_FEATURE_AMD_STIBP)) {
+ if (!cpu_has(c, X86_FEATURE_HYPERVISOR)) {
if (!rdmsrl_safe(MSR_ZEN2_SPECTRAL_CHICKEN, &value)) {
value |= MSR_ZEN2_SPECTRAL_CHICKEN_BIT;
wrmsrl_safe(MSR_ZEN2_SPECTRAL_CHICKEN, value);
}
}
#endif
- /*
- * Work around Erratum 1386. The XSAVES instruction malfunctions in
- * certain circumstances on Zen1/2 uarch, and not all parts have had
- * updated microcode at the time of writing (March 2023).
- *
- * Affected parts all have no supervisor XSAVE states, meaning that
- * the XSAVEC instruction (which works fine) is equivalent.
- */
- clear_cpu_cap(c, X86_FEATURE_XSAVES);
}
-static void init_amd_zn(struct cpuinfo_x86 *c)
+static void init_amd_zen_common(void)
{
- set_cpu_cap(c, X86_FEATURE_ZEN);
-
+ setup_force_cpu_cap(X86_FEATURE_ZEN);
#ifdef CONFIG_NUMA
node_reclaim_distance = 32;
#endif
+}
+
+static void init_amd_zen1(struct cpuinfo_x86 *c)
+{
+ init_amd_zen_common();
+ fix_erratum_1386(c);
/* Fix up CPUID bits, but only if not virtualised. */
if (!cpu_has(c, X86_FEATURE_HYPERVISOR)) {
/* Erratum 1076: CPB feature bit not being set in CPUID. */
if (!cpu_has(c, X86_FEATURE_CPB))
set_cpu_cap(c, X86_FEATURE_CPB);
-
- /*
- * Zen3 (Fam19 model < 0x10) parts are not susceptible to
- * Branch Type Confusion, but predate the allocation of the
- * BTC_NO bit.
- */
- if (c->x86 == 0x19 && !cpu_has(c, X86_FEATURE_BTC_NO))
- set_cpu_cap(c, X86_FEATURE_BTC_NO);
}
+
+ pr_notice_once("AMD Zen1 DIV0 bug detected. Disable SMT for full protection.\n");
+ setup_force_cpu_bug(X86_BUG_DIV0);
}
static bool cpu_has_zenbleed_microcode(void)
return true;
}
-static void zenbleed_check(struct cpuinfo_x86 *c)
+static void zen2_zenbleed_check(struct cpuinfo_x86 *c)
{
- if (!cpu_has_amd_erratum(c, amd_zenbleed))
- return;
-
if (cpu_has(c, X86_FEATURE_HYPERVISOR))
return;
}
}
+static void init_amd_zen2(struct cpuinfo_x86 *c)
+{
+ init_amd_zen_common();
+ init_spectral_chicken(c);
+ fix_erratum_1386(c);
+ zen2_zenbleed_check(c);
+}
+
+static void init_amd_zen3(struct cpuinfo_x86 *c)
+{
+ init_amd_zen_common();
+
+ if (!cpu_has(c, X86_FEATURE_HYPERVISOR)) {
+ /*
+ * Zen3 (Fam19 model < 0x10) parts are not susceptible to
+ * Branch Type Confusion, but predate the allocation of the
+ * BTC_NO bit.
+ */
+ if (!cpu_has(c, X86_FEATURE_BTC_NO))
+ set_cpu_cap(c, X86_FEATURE_BTC_NO);
+ }
+}
+
+static void init_amd_zen4(struct cpuinfo_x86 *c)
+{
+ init_amd_zen_common();
+
+ if (!cpu_has(c, X86_FEATURE_HYPERVISOR))
+ msr_set_bit(MSR_ZEN4_BP_CFG, MSR_ZEN4_BP_CFG_SHARED_BTB_FIX_BIT);
+}
+
static void init_amd(struct cpuinfo_x86 *c)
{
u64 vm_cr;
case 0x12: init_amd_ln(c); break;
case 0x15: init_amd_bd(c); break;
case 0x16: init_amd_jg(c); break;
- case 0x17: init_spectral_chicken(c);
- fallthrough;
- case 0x19: init_amd_zn(c); break;
}
+ if (boot_cpu_has(X86_FEATURE_ZEN1))
+ init_amd_zen1(c);
+ else if (boot_cpu_has(X86_FEATURE_ZEN2))
+ init_amd_zen2(c);
+ else if (boot_cpu_has(X86_FEATURE_ZEN3))
+ init_amd_zen3(c);
+ else if (boot_cpu_has(X86_FEATURE_ZEN4))
+ init_amd_zen4(c);
+
/*
* Enable workaround for FXSAVE leak on CPUs
* without a XSaveErPtr feature
* Counter May Be Inaccurate".
*/
if (cpu_has(c, X86_FEATURE_IRPERF) &&
- !cpu_has_amd_erratum(c, amd_erratum_1054))
+ (boot_cpu_has(X86_FEATURE_ZEN1) && c->x86_model > 0x2f))
msr_set_bit(MSR_K7_HWCR, MSR_K7_HWCR_IRPERF_EN_BIT);
check_null_seg_clears_base(c);
cpu_has(c, X86_FEATURE_AUTOIBRS))
WARN_ON_ONCE(msr_set_bit(MSR_EFER, _EFER_AUTOIBRS));
- zenbleed_check(c);
-
- if (cpu_has_amd_erratum(c, amd_div0)) {
- pr_notice_once("AMD Zen1 DIV0 bug detected. Disable SMT for full protection.\n");
- setup_force_cpu_bug(X86_BUG_DIV0);
- }
-
- if (!cpu_has(c, X86_FEATURE_HYPERVISOR) &&
- cpu_has_amd_erratum(c, amd_erratum_1485))
- msr_set_bit(MSR_ZEN4_BP_CFG, MSR_ZEN4_BP_CFG_SHARED_BTB_FIX_BIT);
+ /* AMD CPUs don't need fencing after x2APIC/TSC_DEADLINE MSR writes. */
+ clear_cpu_cap(c, X86_FEATURE_APIC_MSRS_FENCE);
}
#ifdef CONFIG_X86_32
{
struct cpuinfo_x86 *c = &cpu_data(smp_processor_id());
- zenbleed_check(c);
+ zen2_zenbleed_check(c);
}
void amd_check_microcode(void)
c->topo.apicid = apic->phys_pkg_id(c->topo.initial_apicid, 0);
#endif
+
+ /*
+ * Set default APIC and TSC_DEADLINE MSR fencing flag. AMD and
+ * Hygon will clear it in ->c_init() below.
+ */
+ set_cpu_cap(c, X86_FEATURE_APIC_MSRS_FENCE);
+
/*
* Vendor-specific initialization. In this section we
* canonicalize the feature flags, meaning if there are
set_cpu_bug(c, X86_BUG_SYSRET_SS_ATTRS);
check_null_seg_clears_base(c);
+
+ /* Hygon CPUs don't need fencing after x2APIC/TSC_DEADLINE MSR writes. */
+ clear_cpu_cap(c, X86_FEATURE_APIC_MSRS_FENCE);
}
static void cpu_detect_tlb_hygon(struct cpuinfo_x86 *c)
cpuhp_remove_state(CPUHP_AP_X86_INTEL_EPB_ONLINE);
return ret;
}
-subsys_initcall(intel_epb_init);
+late_initcall(intel_epb_init);
static DEFINE_PER_CPU_READ_MOSTLY(struct smca_bank[MAX_NR_BANKS], smca_banks);
static DEFINE_PER_CPU_READ_MOSTLY(u8[N_SMCA_BANK_TYPES], smca_bank_counts);
-struct smca_bank_name {
- const char *name; /* Short name for sysfs */
- const char *long_name; /* Long name for pretty-printing */
-};
-
-static struct smca_bank_name smca_names[] = {
- [SMCA_LS ... SMCA_LS_V2] = { "load_store", "Load Store Unit" },
- [SMCA_IF] = { "insn_fetch", "Instruction Fetch Unit" },
- [SMCA_L2_CACHE] = { "l2_cache", "L2 Cache" },
- [SMCA_DE] = { "decode_unit", "Decode Unit" },
- [SMCA_RESERVED] = { "reserved", "Reserved" },
- [SMCA_EX] = { "execution_unit", "Execution Unit" },
- [SMCA_FP] = { "floating_point", "Floating Point Unit" },
- [SMCA_L3_CACHE] = { "l3_cache", "L3 Cache" },
- [SMCA_CS ... SMCA_CS_V2] = { "coherent_slave", "Coherent Slave" },
- [SMCA_PIE] = { "pie", "Power, Interrupts, etc." },
+static const char * const smca_names[] = {
+ [SMCA_LS ... SMCA_LS_V2] = "load_store",
+ [SMCA_IF] = "insn_fetch",
+ [SMCA_L2_CACHE] = "l2_cache",
+ [SMCA_DE] = "decode_unit",
+ [SMCA_RESERVED] = "reserved",
+ [SMCA_EX] = "execution_unit",
+ [SMCA_FP] = "floating_point",
+ [SMCA_L3_CACHE] = "l3_cache",
+ [SMCA_CS ... SMCA_CS_V2] = "coherent_slave",
+ [SMCA_PIE] = "pie",
/* UMC v2 is separate because both of them can exist in a single system. */
- [SMCA_UMC] = { "umc", "Unified Memory Controller" },
- [SMCA_UMC_V2] = { "umc_v2", "Unified Memory Controller v2" },
- [SMCA_PB] = { "param_block", "Parameter Block" },
- [SMCA_PSP ... SMCA_PSP_V2] = { "psp", "Platform Security Processor" },
- [SMCA_SMU ... SMCA_SMU_V2] = { "smu", "System Management Unit" },
- [SMCA_MP5] = { "mp5", "Microprocessor 5 Unit" },
- [SMCA_MPDMA] = { "mpdma", "MPDMA Unit" },
- [SMCA_NBIO] = { "nbio", "Northbridge IO Unit" },
- [SMCA_PCIE ... SMCA_PCIE_V2] = { "pcie", "PCI Express Unit" },
- [SMCA_XGMI_PCS] = { "xgmi_pcs", "Ext Global Memory Interconnect PCS Unit" },
- [SMCA_NBIF] = { "nbif", "NBIF Unit" },
- [SMCA_SHUB] = { "shub", "System Hub Unit" },
- [SMCA_SATA] = { "sata", "SATA Unit" },
- [SMCA_USB] = { "usb", "USB Unit" },
- [SMCA_GMI_PCS] = { "gmi_pcs", "Global Memory Interconnect PCS Unit" },
- [SMCA_XGMI_PHY] = { "xgmi_phy", "Ext Global Memory Interconnect PHY Unit" },
- [SMCA_WAFL_PHY] = { "wafl_phy", "WAFL PHY Unit" },
- [SMCA_GMI_PHY] = { "gmi_phy", "Global Memory Interconnect PHY Unit" },
+ [SMCA_UMC] = "umc",
+ [SMCA_UMC_V2] = "umc_v2",
+ [SMCA_MA_LLC] = "ma_llc",
+ [SMCA_PB] = "param_block",
+ [SMCA_PSP ... SMCA_PSP_V2] = "psp",
+ [SMCA_SMU ... SMCA_SMU_V2] = "smu",
+ [SMCA_MP5] = "mp5",
+ [SMCA_MPDMA] = "mpdma",
+ [SMCA_NBIO] = "nbio",
+ [SMCA_PCIE ... SMCA_PCIE_V2] = "pcie",
+ [SMCA_XGMI_PCS] = "xgmi_pcs",
+ [SMCA_NBIF] = "nbif",
+ [SMCA_SHUB] = "shub",
+ [SMCA_SATA] = "sata",
+ [SMCA_USB] = "usb",
+ [SMCA_USR_DP] = "usr_dp",
+ [SMCA_USR_CP] = "usr_cp",
+ [SMCA_GMI_PCS] = "gmi_pcs",
+ [SMCA_XGMI_PHY] = "xgmi_phy",
+ [SMCA_WAFL_PHY] = "wafl_phy",
+ [SMCA_GMI_PHY] = "gmi_phy",
};
static const char *smca_get_name(enum smca_bank_types t)
if (t >= N_SMCA_BANK_TYPES)
return NULL;
- return smca_names[t].name;
-}
-
-const char *smca_get_long_name(enum smca_bank_types t)
-{
- if (t >= N_SMCA_BANK_TYPES)
- return NULL;
-
- return smca_names[t].long_name;
+ return smca_names[t];
}
-EXPORT_SYMBOL_GPL(smca_get_long_name);
enum smca_bank_types smca_get_bank_type(unsigned int cpu, unsigned int bank)
{
{ SMCA_CS, HWID_MCATYPE(0x2E, 0x0) },
{ SMCA_PIE, HWID_MCATYPE(0x2E, 0x1) },
{ SMCA_CS_V2, HWID_MCATYPE(0x2E, 0x2) },
+ { SMCA_MA_LLC, HWID_MCATYPE(0x2E, 0x4) },
/* Unified Memory Controller MCA type */
{ SMCA_UMC, HWID_MCATYPE(0x96, 0x0) },
{ SMCA_SHUB, HWID_MCATYPE(0x80, 0x0) },
{ SMCA_SATA, HWID_MCATYPE(0xA8, 0x0) },
{ SMCA_USB, HWID_MCATYPE(0xAA, 0x0) },
+ { SMCA_USR_DP, HWID_MCATYPE(0x170, 0x0) },
+ { SMCA_USR_CP, HWID_MCATYPE(0x180, 0x0) },
{ SMCA_GMI_PCS, HWID_MCATYPE(0x241, 0x0) },
{ SMCA_XGMI_PHY, HWID_MCATYPE(0x259, 0x0) },
{ SMCA_WAFL_PHY, HWID_MCATYPE(0x267, 0x0) },
#include <linux/sync_core.h>
#include <linux/task_work.h>
#include <linux/hardirq.h>
+#include <linux/kexec.h>
#include <asm/intel-family.h>
#include <asm/processor.h>
struct llist_node *pending;
struct mce_evt_llist *l;
int apei_err = 0;
+ struct page *p;
/*
* Allow instrumentation around external facilities usage. Not that it
if (!fake_panic) {
if (panic_timeout == 0)
panic_timeout = mca_cfg.panic_timeout;
+
+ /*
+ * Kdump skips the poisoned page in order to avoid
+ * touching the error bits again. Poison the page even
+ * if the error is fatal and the machine is about to
+ * panic.
+ */
+ if (kexec_crash_loaded()) {
+ if (final && (final->status & MCI_STATUS_ADDRV)) {
+ p = pfn_to_online_page(final->addr >> PAGE_SHIFT);
+ if (p)
+ SetPageHWPoison(p);
+ }
+ }
panic(msg);
} else
pr_emerg(HW_ERR "Fake kernel panic: %s\n", msg);
barrier();
m.status = mce_rdmsrl(mca_msr_reg(i, MCA_STATUS));
+ /*
+ * Update storm tracking here, before checking for the
+ * MCI_STATUS_VAL bit. Valid corrected errors count
+ * towards declaring, or maintaining, storm status. No
+ * error in a bank counts towards avoiding, or ending,
+ * storm status.
+ */
+ if (!mca_cfg.cmci_disabled)
+ mce_track_storm(&m);
+
/* If this entry is not valid, ignore it */
if (!(m.status & MCI_STATUS_VAL))
continue;
static DEFINE_PER_CPU(unsigned long, mce_next_interval); /* in jiffies */
static DEFINE_PER_CPU(struct timer_list, mce_timer);
-static unsigned long mce_adjust_timer_default(unsigned long interval)
-{
- return interval;
-}
-
-static unsigned long (*mce_adjust_timer)(unsigned long interval) = mce_adjust_timer_default;
-
static void __start_timer(struct timer_list *t, unsigned long interval)
{
unsigned long when = jiffies + interval;
iv = __this_cpu_read(mce_next_interval);
- if (mce_available(this_cpu_ptr(&cpu_info))) {
+ if (mce_available(this_cpu_ptr(&cpu_info)))
mc_poll_banks();
- if (mce_intel_cmci_poll()) {
- iv = mce_adjust_timer(iv);
- goto done;
- }
- }
-
/*
* Alert userspace if needed. If we logged an MCE, reduce the polling
* interval, otherwise increase the polling interval.
else
iv = min(iv * 2, round_jiffies_relative(check_interval * HZ));
-done:
- __this_cpu_write(mce_next_interval, iv);
- __start_timer(t, iv);
+ if (mce_get_storm_mode()) {
+ __start_timer(t, HZ);
+ } else {
+ __this_cpu_write(mce_next_interval, iv);
+ __start_timer(t, iv);
+ }
}
/*
- * Ensure that the timer is firing in @interval from now.
+ * When a storm starts on any bank on this CPU, switch to polling
+ * once per second. When the storm ends, revert to the default
+ * polling interval.
*/
-void mce_timer_kick(unsigned long interval)
+void mce_timer_kick(bool storm)
{
struct timer_list *t = this_cpu_ptr(&mce_timer);
- unsigned long iv = __this_cpu_read(mce_next_interval);
- __start_timer(t, interval);
+ mce_set_storm_mode(storm);
- if (interval < iv)
- __this_cpu_write(mce_next_interval, interval);
+ if (storm)
+ __start_timer(t, HZ);
+ else
+ __this_cpu_write(mce_next_interval, check_interval * HZ);
}
/* Must not be called in IRQ context where del_timer_sync() can deadlock */
intel_init_cmci();
intel_init_lmce();
- mce_adjust_timer = cmci_intel_adjust_timer;
}
static void mce_zhaoxin_feature_clear(struct cpuinfo_x86 *c)
switch (c->x86_vendor) {
case X86_VENDOR_INTEL:
mce_intel_feature_init(c);
- mce_adjust_timer = cmci_intel_adjust_timer;
break;
case X86_VENDOR_AMD: {
int err;
int i, j;
- if (!mce_available(&boot_cpu_data))
- return -EIO;
-
dev = per_cpu(mce_device, cpu);
if (dev)
return 0;
static int mce_cpu_dead(unsigned int cpu)
{
- mce_intel_hcpu_update(cpu);
-
/* intentionally ignoring frozen here */
if (!cpuhp_tasks_frozen)
cmci_rediscover();
wrmsrl_safe(mca_msr_reg(bank, MCA_STATUS), status);
rdmsrl_safe(mca_msr_reg(bank, MCA_STATUS), &status);
+ wrmsrl_safe(mca_msr_reg(bank, MCA_STATUS), 0);
if (!status) {
hw_injection_possible = false;
*/
static DEFINE_PER_CPU(mce_banks_t, mce_banks_owned);
-/*
- * CMCI storm detection backoff counter
- *
- * During storm, we reset this counter to INITIAL_CHECK_INTERVAL in case we've
- * encountered an error. If not, we decrement it by one. We signal the end of
- * the CMCI storm when it reaches 0.
- */
-static DEFINE_PER_CPU(int, cmci_backoff_cnt);
-
/*
* cmci_discover_lock protects against parallel discovery attempts
* which could race against each other.
*/
static DEFINE_SPINLOCK(cmci_poll_lock);
+/* Linux non-storm CMCI threshold (may be overridden by BIOS) */
#define CMCI_THRESHOLD 1
-#define CMCI_POLL_INTERVAL (30 * HZ)
-#define CMCI_STORM_INTERVAL (HZ)
-#define CMCI_STORM_THRESHOLD 15
-static DEFINE_PER_CPU(unsigned long, cmci_time_stamp);
-static DEFINE_PER_CPU(unsigned int, cmci_storm_cnt);
-static DEFINE_PER_CPU(unsigned int, cmci_storm_state);
-
-enum {
- CMCI_STORM_NONE,
- CMCI_STORM_ACTIVE,
- CMCI_STORM_SUBSIDED,
-};
+/*
+ * MCi_CTL2 threshold for each bank when there is no storm.
+ * Default value for each bank may have been set by BIOS.
+ */
+static u16 cmci_threshold[MAX_NR_BANKS];
-static atomic_t cmci_storm_on_cpus;
+/*
+ * High threshold to limit CMCI rate during storms. Max supported is
+ * 0x7FFF. Use this slightly smaller value so it has a distinctive
+ * signature when some asks "Why am I not seeing all corrected errors?"
+ * A high threshold is used instead of just disabling CMCI for a
+ * bank because both corrected and uncorrected errors may be logged
+ * in the same bank and signalled with CMCI. The threshold only applies
+ * to corrected errors, so keeping CMCI enabled means that uncorrected
+ * errors will still be processed in a timely fashion.
+ */
+#define CMCI_STORM_THRESHOLD 32749
static int cmci_supported(int *banks)
{
return tmp & FEAT_CTL_LMCE_ENABLED;
}
-bool mce_intel_cmci_poll(void)
+/*
+ * Set a new CMCI threshold value. Preserve the state of the
+ * MCI_CTL2_CMCI_EN bit in case this happens during a
+ * cmci_rediscover() operation.
+ */
+static void cmci_set_threshold(int bank, int thresh)
{
- if (__this_cpu_read(cmci_storm_state) == CMCI_STORM_NONE)
- return false;
-
- /*
- * Reset the counter if we've logged an error in the last poll
- * during the storm.
- */
- if (machine_check_poll(0, this_cpu_ptr(&mce_banks_owned)))
- this_cpu_write(cmci_backoff_cnt, INITIAL_CHECK_INTERVAL);
- else
- this_cpu_dec(cmci_backoff_cnt);
+ unsigned long flags;
+ u64 val;
- return true;
+ raw_spin_lock_irqsave(&cmci_discover_lock, flags);
+ rdmsrl(MSR_IA32_MCx_CTL2(bank), val);
+ val &= ~MCI_CTL2_CMCI_THRESHOLD_MASK;
+ wrmsrl(MSR_IA32_MCx_CTL2(bank), val | thresh);
+ raw_spin_unlock_irqrestore(&cmci_discover_lock, flags);
}
-void mce_intel_hcpu_update(unsigned long cpu)
+void mce_intel_handle_storm(int bank, bool on)
{
- if (per_cpu(cmci_storm_state, cpu) == CMCI_STORM_ACTIVE)
- atomic_dec(&cmci_storm_on_cpus);
+ if (on)
+ cmci_set_threshold(bank, CMCI_STORM_THRESHOLD);
+ else
+ cmci_set_threshold(bank, cmci_threshold[bank]);
+}
- per_cpu(cmci_storm_state, cpu) = CMCI_STORM_NONE;
+/*
+ * The interrupt handler. This is called on every event.
+ * Just call the poller directly to log any events.
+ * This could in theory increase the threshold under high load,
+ * but doesn't for now.
+ */
+static void intel_threshold_interrupt(void)
+{
+ machine_check_poll(MCP_TIMESTAMP, this_cpu_ptr(&mce_banks_owned));
}
-static void cmci_toggle_interrupt_mode(bool on)
+/*
+ * Check all the reasons why current CPU cannot claim
+ * ownership of a bank.
+ * 1: CPU already owns this bank
+ * 2: BIOS owns this bank
+ * 3: Some other CPU owns this bank
+ */
+static bool cmci_skip_bank(int bank, u64 *val)
{
- unsigned long flags, *owned;
- int bank;
- u64 val;
+ unsigned long *owned = (void *)this_cpu_ptr(&mce_banks_owned);
- raw_spin_lock_irqsave(&cmci_discover_lock, flags);
- owned = this_cpu_ptr(mce_banks_owned);
- for_each_set_bit(bank, owned, MAX_NR_BANKS) {
- rdmsrl(MSR_IA32_MCx_CTL2(bank), val);
+ if (test_bit(bank, owned))
+ return true;
- if (on)
- val |= MCI_CTL2_CMCI_EN;
- else
- val &= ~MCI_CTL2_CMCI_EN;
+ /* Skip banks in firmware first mode */
+ if (test_bit(bank, mce_banks_ce_disabled))
+ return true;
- wrmsrl(MSR_IA32_MCx_CTL2(bank), val);
- }
- raw_spin_unlock_irqrestore(&cmci_discover_lock, flags);
-}
+ rdmsrl(MSR_IA32_MCx_CTL2(bank), *val);
-unsigned long cmci_intel_adjust_timer(unsigned long interval)
-{
- if ((this_cpu_read(cmci_backoff_cnt) > 0) &&
- (__this_cpu_read(cmci_storm_state) == CMCI_STORM_ACTIVE)) {
- mce_notify_irq();
- return CMCI_STORM_INTERVAL;
+ /* Already owned by someone else? */
+ if (*val & MCI_CTL2_CMCI_EN) {
+ clear_bit(bank, owned);
+ __clear_bit(bank, this_cpu_ptr(mce_poll_banks));
+ return true;
}
- switch (__this_cpu_read(cmci_storm_state)) {
- case CMCI_STORM_ACTIVE:
-
- /*
- * We switch back to interrupt mode once the poll timer has
- * silenced itself. That means no events recorded and the timer
- * interval is back to our poll interval.
- */
- __this_cpu_write(cmci_storm_state, CMCI_STORM_SUBSIDED);
- if (!atomic_sub_return(1, &cmci_storm_on_cpus))
- pr_notice("CMCI storm subsided: switching to interrupt mode\n");
+ return false;
+}
- fallthrough;
+/*
+ * Decide which CMCI interrupt threshold to use:
+ * 1: If this bank is in storm mode from whichever CPU was
+ * the previous owner, stay in storm mode.
+ * 2: If ignoring any threshold set by BIOS, set Linux default
+ * 3: Try to honor BIOS threshold (unless buggy BIOS set it at zero).
+ */
+static u64 cmci_pick_threshold(u64 val, int *bios_zero_thresh)
+{
+ if ((val & MCI_CTL2_CMCI_THRESHOLD_MASK) == CMCI_STORM_THRESHOLD)
+ return val;
- case CMCI_STORM_SUBSIDED:
+ if (!mca_cfg.bios_cmci_threshold) {
+ val &= ~MCI_CTL2_CMCI_THRESHOLD_MASK;
+ val |= CMCI_THRESHOLD;
+ } else if (!(val & MCI_CTL2_CMCI_THRESHOLD_MASK)) {
/*
- * We wait for all CPUs to go back to SUBSIDED state. When that
- * happens we switch back to interrupt mode.
+ * If bios_cmci_threshold boot option was specified
+ * but the threshold is zero, we'll try to initialize
+ * it to 1.
*/
- if (!atomic_read(&cmci_storm_on_cpus)) {
- __this_cpu_write(cmci_storm_state, CMCI_STORM_NONE);
- cmci_toggle_interrupt_mode(true);
- cmci_recheck();
- }
- return CMCI_POLL_INTERVAL;
- default:
-
- /* We have shiny weather. Let the poll do whatever it thinks. */
- return interval;
+ *bios_zero_thresh = 1;
+ val |= CMCI_THRESHOLD;
}
+
+ return val;
}
-static bool cmci_storm_detect(void)
+/*
+ * Try to claim ownership of a bank.
+ */
+static void cmci_claim_bank(int bank, u64 val, int bios_zero_thresh, int *bios_wrong_thresh)
{
- unsigned int cnt = __this_cpu_read(cmci_storm_cnt);
- unsigned long ts = __this_cpu_read(cmci_time_stamp);
- unsigned long now = jiffies;
- int r;
+ struct mca_storm_desc *storm = this_cpu_ptr(&storm_desc);
- if (__this_cpu_read(cmci_storm_state) != CMCI_STORM_NONE)
- return true;
+ val |= MCI_CTL2_CMCI_EN;
+ wrmsrl(MSR_IA32_MCx_CTL2(bank), val);
+ rdmsrl(MSR_IA32_MCx_CTL2(bank), val);
- if (time_before_eq(now, ts + CMCI_STORM_INTERVAL)) {
- cnt++;
- } else {
- cnt = 1;
- __this_cpu_write(cmci_time_stamp, now);
+ /* If the enable bit did not stick, this bank should be polled. */
+ if (!(val & MCI_CTL2_CMCI_EN)) {
+ WARN_ON(!test_bit(bank, this_cpu_ptr(mce_poll_banks)));
+ storm->banks[bank].poll_only = true;
+ return;
}
- __this_cpu_write(cmci_storm_cnt, cnt);
- if (cnt <= CMCI_STORM_THRESHOLD)
- return false;
-
- cmci_toggle_interrupt_mode(false);
- __this_cpu_write(cmci_storm_state, CMCI_STORM_ACTIVE);
- r = atomic_add_return(1, &cmci_storm_on_cpus);
- mce_timer_kick(CMCI_STORM_INTERVAL);
- this_cpu_write(cmci_backoff_cnt, INITIAL_CHECK_INTERVAL);
+ /* This CPU successfully set the enable bit. */
+ set_bit(bank, (void *)this_cpu_ptr(&mce_banks_owned));
- if (r == 1)
- pr_notice("CMCI storm detected: switching to poll mode\n");
- return true;
-}
+ if ((val & MCI_CTL2_CMCI_THRESHOLD_MASK) == CMCI_STORM_THRESHOLD) {
+ pr_notice("CPU%d BANK%d CMCI inherited storm\n", smp_processor_id(), bank);
+ mce_inherit_storm(bank);
+ cmci_storm_begin(bank);
+ } else {
+ __clear_bit(bank, this_cpu_ptr(mce_poll_banks));
+ }
-/*
- * The interrupt handler. This is called on every event.
- * Just call the poller directly to log any events.
- * This could in theory increase the threshold under high load,
- * but doesn't for now.
- */
-static void intel_threshold_interrupt(void)
-{
- if (cmci_storm_detect())
- return;
+ /*
+ * We are able to set thresholds for some banks that
+ * had a threshold of 0. This means the BIOS has not
+ * set the thresholds properly or does not work with
+ * this boot option. Note down now and report later.
+ */
+ if (mca_cfg.bios_cmci_threshold && bios_zero_thresh &&
+ (val & MCI_CTL2_CMCI_THRESHOLD_MASK))
+ *bios_wrong_thresh = 1;
- machine_check_poll(MCP_TIMESTAMP, this_cpu_ptr(&mce_banks_owned));
+ /* Save default threshold for each bank */
+ if (cmci_threshold[bank] == 0)
+ cmci_threshold[bank] = val & MCI_CTL2_CMCI_THRESHOLD_MASK;
}
/*
* Enable CMCI (Corrected Machine Check Interrupt) for available MCE banks
* on this CPU. Use the algorithm recommended in the SDM to discover shared
- * banks.
+ * banks. Called during initial bootstrap, and also for hotplug CPU operations
+ * to rediscover/reassign machine check banks.
*/
static void cmci_discover(int banks)
{
- unsigned long *owned = (void *)this_cpu_ptr(&mce_banks_owned);
+ int bios_wrong_thresh = 0;
unsigned long flags;
int i;
- int bios_wrong_thresh = 0;
raw_spin_lock_irqsave(&cmci_discover_lock, flags);
for (i = 0; i < banks; i++) {
u64 val;
int bios_zero_thresh = 0;
- if (test_bit(i, owned))
+ if (cmci_skip_bank(i, &val))
continue;
- /* Skip banks in firmware first mode */
- if (test_bit(i, mce_banks_ce_disabled))
- continue;
-
- rdmsrl(MSR_IA32_MCx_CTL2(i), val);
-
- /* Already owned by someone else? */
- if (val & MCI_CTL2_CMCI_EN) {
- clear_bit(i, owned);
- __clear_bit(i, this_cpu_ptr(mce_poll_banks));
- continue;
- }
-
- if (!mca_cfg.bios_cmci_threshold) {
- val &= ~MCI_CTL2_CMCI_THRESHOLD_MASK;
- val |= CMCI_THRESHOLD;
- } else if (!(val & MCI_CTL2_CMCI_THRESHOLD_MASK)) {
- /*
- * If bios_cmci_threshold boot option was specified
- * but the threshold is zero, we'll try to initialize
- * it to 1.
- */
- bios_zero_thresh = 1;
- val |= CMCI_THRESHOLD;
- }
-
- val |= MCI_CTL2_CMCI_EN;
- wrmsrl(MSR_IA32_MCx_CTL2(i), val);
- rdmsrl(MSR_IA32_MCx_CTL2(i), val);
-
- /* Did the enable bit stick? -- the bank supports CMCI */
- if (val & MCI_CTL2_CMCI_EN) {
- set_bit(i, owned);
- __clear_bit(i, this_cpu_ptr(mce_poll_banks));
- /*
- * We are able to set thresholds for some banks that
- * had a threshold of 0. This means the BIOS has not
- * set the thresholds properly or does not work with
- * this boot option. Note down now and report later.
- */
- if (mca_cfg.bios_cmci_threshold && bios_zero_thresh &&
- (val & MCI_CTL2_CMCI_THRESHOLD_MASK))
- bios_wrong_thresh = 1;
- } else {
- WARN_ON(!test_bit(i, this_cpu_ptr(mce_poll_banks)));
- }
+ val = cmci_pick_threshold(val, &bios_zero_thresh);
+ cmci_claim_bank(i, val, bios_zero_thresh, &bios_wrong_thresh);
}
raw_spin_unlock_irqrestore(&cmci_discover_lock, flags);
if (mca_cfg.bios_cmci_threshold && bios_wrong_thresh) {
val &= ~MCI_CTL2_CMCI_EN;
wrmsrl(MSR_IA32_MCx_CTL2(bank), val);
__clear_bit(bank, this_cpu_ptr(mce_banks_owned));
+
+ if ((val & MCI_CTL2_CMCI_THRESHOLD_MASK) == CMCI_STORM_THRESHOLD)
+ cmci_storm_end(bank);
}
/*
extern mce_banks_t mce_banks_ce_disabled;
#ifdef CONFIG_X86_MCE_INTEL
-unsigned long cmci_intel_adjust_timer(unsigned long interval);
-bool mce_intel_cmci_poll(void);
-void mce_intel_hcpu_update(unsigned long cpu);
+void mce_intel_handle_storm(int bank, bool on);
void cmci_disable_bank(int bank);
void intel_init_cmci(void);
void intel_init_lmce(void);
bool intel_filter_mce(struct mce *m);
bool intel_mce_usable_address(struct mce *m);
#else
-# define cmci_intel_adjust_timer mce_adjust_timer_default
-static inline bool mce_intel_cmci_poll(void) { return false; }
-static inline void mce_intel_hcpu_update(unsigned long cpu) { }
+static inline void mce_intel_handle_storm(int bank, bool on) { }
static inline void cmci_disable_bank(int bank) { }
static inline void intel_init_cmci(void) { }
static inline void intel_init_lmce(void) { }
static inline bool intel_mce_usable_address(struct mce *m) { return false; }
#endif
-void mce_timer_kick(unsigned long interval);
+void mce_timer_kick(bool storm);
+
+#ifdef CONFIG_X86_MCE_THRESHOLD
+void cmci_storm_begin(unsigned int bank);
+void cmci_storm_end(unsigned int bank);
+void mce_track_storm(struct mce *mce);
+void mce_inherit_storm(unsigned int bank);
+bool mce_get_storm_mode(void);
+void mce_set_storm_mode(bool storm);
+#else
+static inline void cmci_storm_begin(unsigned int bank) {}
+static inline void cmci_storm_end(unsigned int bank) {}
+static inline void mce_track_storm(struct mce *mce) {}
+static inline void mce_inherit_storm(unsigned int bank) {}
+static inline bool mce_get_storm_mode(void) { return false; }
+static inline void mce_set_storm_mode(bool storm) {}
+#endif
+
+/*
+ * history: Bitmask tracking errors occurrence. Each set bit
+ * represents an error seen.
+ *
+ * timestamp: Last time (in jiffies) that the bank was polled.
+ * in_storm_mode: Is this bank in storm mode?
+ * poll_only: Bank does not support CMCI, skip storm tracking.
+ */
+struct storm_bank {
+ u64 history;
+ u64 timestamp;
+ bool in_storm_mode;
+ bool poll_only;
+};
+
+#define NUM_HISTORY_BITS (sizeof(u64) * BITS_PER_BYTE)
+
+/* How many errors within the history buffer mark the start of a storm. */
+#define STORM_BEGIN_THRESHOLD 5
+
+/*
+ * How many polls of machine check bank without an error before declaring
+ * the storm is over. Since it is tracked by the bitmasks in the history
+ * field of struct storm_bank the mask is 30 bits [0 ... 29].
+ */
+#define STORM_END_POLL_THRESHOLD 29
+
+/*
+ * banks: per-cpu, per-bank details
+ * stormy_bank_count: count of MC banks in storm state
+ * poll_mode: CPU is in poll mode
+ */
+struct mca_storm_desc {
+ struct storm_bank banks[MAX_NR_BANKS];
+ u8 stormy_bank_count;
+ bool poll_mode;
+};
+
+DECLARE_PER_CPU(struct mca_storm_desc, storm_desc);
#ifdef CONFIG_ACPI_APEI
int apei_write_mce(struct mce *m);
trace_threshold_apic_exit(THRESHOLD_APIC_VECTOR);
apic_eoi();
}
+
+DEFINE_PER_CPU(struct mca_storm_desc, storm_desc);
+
+void mce_inherit_storm(unsigned int bank)
+{
+ struct mca_storm_desc *storm = this_cpu_ptr(&storm_desc);
+
+ /*
+ * Previous CPU owning this bank had put it into storm mode,
+ * but the precise history of that storm is unknown. Assume
+ * the worst (all recent polls of the bank found a valid error
+ * logged). This will avoid the new owner prematurely declaring
+ * the storm has ended.
+ */
+ storm->banks[bank].history = ~0ull;
+ storm->banks[bank].timestamp = jiffies;
+}
+
+bool mce_get_storm_mode(void)
+{
+ return __this_cpu_read(storm_desc.poll_mode);
+}
+
+void mce_set_storm_mode(bool storm)
+{
+ __this_cpu_write(storm_desc.poll_mode, storm);
+}
+
+static void mce_handle_storm(unsigned int bank, bool on)
+{
+ switch (boot_cpu_data.x86_vendor) {
+ case X86_VENDOR_INTEL:
+ mce_intel_handle_storm(bank, on);
+ break;
+ }
+}
+
+void cmci_storm_begin(unsigned int bank)
+{
+ struct mca_storm_desc *storm = this_cpu_ptr(&storm_desc);
+
+ __set_bit(bank, this_cpu_ptr(mce_poll_banks));
+ storm->banks[bank].in_storm_mode = true;
+
+ /*
+ * If this is the first bank on this CPU to enter storm mode
+ * start polling.
+ */
+ if (++storm->stormy_bank_count == 1)
+ mce_timer_kick(true);
+}
+
+void cmci_storm_end(unsigned int bank)
+{
+ struct mca_storm_desc *storm = this_cpu_ptr(&storm_desc);
+
+ __clear_bit(bank, this_cpu_ptr(mce_poll_banks));
+ storm->banks[bank].history = 0;
+ storm->banks[bank].in_storm_mode = false;
+
+ /* If no banks left in storm mode, stop polling. */
+ if (!this_cpu_dec_return(storm_desc.stormy_bank_count))
+ mce_timer_kick(false);
+}
+
+void mce_track_storm(struct mce *mce)
+{
+ struct mca_storm_desc *storm = this_cpu_ptr(&storm_desc);
+ unsigned long now = jiffies, delta;
+ unsigned int shift = 1;
+ u64 history = 0;
+
+ /* No tracking needed for banks that do not support CMCI */
+ if (storm->banks[mce->bank].poll_only)
+ return;
+
+ /*
+ * When a bank is in storm mode it is polled once per second and
+ * the history mask will record about the last minute of poll results.
+ * If it is not in storm mode, then the bank is only checked when
+ * there is a CMCI interrupt. Check how long it has been since
+ * this bank was last checked, and adjust the amount of "shift"
+ * to apply to history.
+ */
+ if (!storm->banks[mce->bank].in_storm_mode) {
+ delta = now - storm->banks[mce->bank].timestamp;
+ shift = (delta + HZ) / HZ;
+ }
+
+ /* If it has been a long time since the last poll, clear history. */
+ if (shift < NUM_HISTORY_BITS)
+ history = storm->banks[mce->bank].history << shift;
+
+ storm->banks[mce->bank].timestamp = now;
+
+ /* History keeps track of corrected errors. VAL=1 && UC=0 */
+ if ((mce->status & MCI_STATUS_VAL) && mce_is_correctable(mce))
+ history |= 1;
+
+ storm->banks[mce->bank].history = history;
+
+ if (storm->banks[mce->bank].in_storm_mode) {
+ if (history & GENMASK_ULL(STORM_END_POLL_THRESHOLD, 0))
+ return;
+ printk_deferred(KERN_NOTICE "CPU%d BANK%d CMCI storm subsided\n", smp_processor_id(), mce->bank);
+ mce_handle_storm(mce->bank, false);
+ cmci_storm_end(mce->bank);
+ } else {
+ if (hweight64(history) < STORM_BEGIN_THRESHOLD)
+ return;
+ printk_deferred(KERN_NOTICE "CPU%d BANK%d CMCI storm detected\n", smp_processor_id(), mce->bank);
+ mce_handle_storm(mce->bank, true);
+ cmci_storm_begin(mce->bank);
+ }
+}
{
struct cpio_data cp;
+ intel_collect_cpu_info(&uci->cpu_sig);
+
if (!load_builtin_intel_microcode(&cp))
cp = find_microcode_in_initrd(ucode_path);
if (!(cp.data && cp.size))
return NULL;
- intel_collect_cpu_info(&uci->cpu_sig);
-
return scan_microcode(cp.data, cp.size, uci, save);
}
{
struct ucode_cpu_info uci;
- ed->old_rev = intel_get_microcode_revision();
-
uci.mc = get_microcode_blob(&uci, false);
- if (uci.mc && apply_microcode_early(&uci) == UCODE_UPDATED)
- ucode_patch_va = UCODE_BSP_LOADED;
+ ed->old_rev = uci.cpu_sig.rev;
- ed->new_rev = uci.cpu_sig.rev;
+ if (uci.mc && apply_microcode_early(&uci) == UCODE_UPDATED) {
+ ucode_patch_va = UCODE_BSP_LOADED;
+ ed->new_rev = uci.cpu_sig.rev;
+ }
}
void load_ucode_intel_ap(void)
if (ret != UCODE_UPDATED && ret != UCODE_OK)
return ret;
- if (!cpu && uci->cpu_sig.rev != cur_rev) {
- pr_info("Updated to revision 0x%x, date = %04x-%02x-%02x\n",
- uci->cpu_sig.rev, mc->hdr.date & 0xffff, mc->hdr.date >> 24,
- (mc->hdr.date >> 16) & 0xff);
- }
-
cpu_data(cpu).microcode = uci->cpu_sig.rev;
if (!cpu)
boot_cpu_data.microcode = uci->cpu_sig.rev;
/* Form the CR3 value being sure to include the CR3 modifier */
addq $(early_top_pgt - __START_KERNEL_map), %rax
+
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+ mov %rax, %rdi
+ mov %rax, %r14
+
+ addq phys_base(%rip), %rdi
+
+ /*
+ * For SEV guests: Verify that the C-bit is correct. A malicious
+ * hypervisor could lie about the C-bit position to perform a ROP
+ * attack on the guest by writing to the unencrypted stack and wait for
+ * the next RET instruction.
+ */
+ call sev_verify_cbit
+
+ /*
+ * Restore CR3 value without the phys_base which will be added
+ * below, before writing %cr3.
+ */
+ mov %r14, %rax
+#endif
+
jmp 1f
SYM_CODE_END(startup_64)
/* Setup early boot stage 4-/5-level pagetables. */
addq phys_base(%rip), %rax
- /*
- * For SEV guests: Verify that the C-bit is correct. A malicious
- * hypervisor could lie about the C-bit position to perform a ROP
- * attack on the guest by writing to the unencrypted stack and wait for
- * the next RET instruction.
- */
- movq %rax, %rdi
- call sev_verify_cbit
-
/*
* Switch to new page-table
*
testl $X2APIC_ENABLE, %eax
jnz .Lread_apicid_msr
+#ifdef CONFIG_X86_X2APIC
+ /*
+ * If system is in X2APIC mode then MMIO base might not be
+ * mapped causing the MMIO read below to fault. Faults can't
+ * be handled at that point.
+ */
+ cmpl $0, x2apic_mode(%rip)
+ jz .Lread_apicid_mmio
+
+ /* Force the AP into X2APIC mode. */
+ orl $X2APIC_ENABLE, %eax
+ wrmsr
+ jmp .Lread_apicid_msr
+#endif
+
+.Lread_apicid_mmio:
/* Read the APIC ID from the fix-mapped MMIO space. */
movq apic_mmio_base(%rip), %rcx
addq $APIC_ID, %rcx
{
unsigned long offs = addrmode_regoffs[p->ainsn.indirect.reg];
- int3_emulate_call(regs, regs_get_register(regs, offs));
+ int3_emulate_push(regs, regs->ip - INT3_INSN_SIZE + p->ainsn.size);
+ int3_emulate_jmp(regs, regs_get_register(regs, offs));
}
NOKPROBE_SYMBOL(kprobe_emulate_call_indirect);
"cmpb $0, " __stringify(KVM_STEAL_TIME_preempted) "+steal_time(%rax)\n\t" \
"setne %al\n\t"
-DEFINE_PARAVIRT_ASM(__raw_callee_save___kvm_vcpu_is_preempted,
- PV_VCPU_PREEMPTED_ASM, .text);
+DEFINE_ASM_FUNC(__raw_callee_save___kvm_vcpu_is_preempted,
+ PV_VCPU_PREEMPTED_ASM, .text);
#endif
static void __init kvm_guest_init(void)
struct module *me)
{
const Elf_Shdr *s, *alt = NULL, *locks = NULL,
- *para = NULL, *orc = NULL, *orc_ip = NULL,
+ *orc = NULL, *orc_ip = NULL,
*retpolines = NULL, *returns = NULL, *ibt_endbr = NULL,
*calls = NULL, *cfi = NULL;
char *secstrings = (void *)hdr + sechdrs[hdr->e_shstrndx].sh_offset;
alt = s;
if (!strcmp(".smp_locks", secstrings + s->sh_name))
locks = s;
- if (!strcmp(".parainstructions", secstrings + s->sh_name))
- para = s;
if (!strcmp(".orc_unwind", secstrings + s->sh_name))
orc = s;
if (!strcmp(".orc_unwind_ip", secstrings + s->sh_name))
ibt_endbr = s;
}
- /*
- * See alternative_instructions() for the ordering rules between the
- * various patching types.
- */
- if (para) {
- void *pseg = (void *)para->sh_addr;
- apply_paravirt(pseg, pseg + para->sh_size);
- }
if (retpolines || cfi) {
void *rseg = NULL, *cseg = NULL;
unsigned int rsize = 0, csize = 0;
void *aseg = (void *)alt->sh_addr;
apply_alternatives(aseg, aseg + alt->sh_size);
}
- if (calls || para) {
+ if (calls || alt) {
struct callthunk_sites cs = {};
if (calls) {
cs.call_end = (void *)calls->sh_addr + calls->sh_size;
}
- if (para) {
- cs.pv_start = (void *)para->sh_addr;
- cs.pv_end = (void *)para->sh_addr + para->sh_size;
+ if (alt) {
+ cs.alt_start = (void *)alt->sh_addr;
+ cs.alt_end = (void *)alt->sh_addr + alt->sh_size;
}
callthunks_patch_module_calls(&cs, me);
#include <asm/io_bitmap.h>
#include <asm/gsseg.h>
-/*
- * nop stub, which must not clobber anything *including the stack* to
- * avoid confusing the entry prologues.
- */
-DEFINE_PARAVIRT_ASM(_paravirt_nop, "", .entry.text);
-
/* stub always returning 0. */
-DEFINE_PARAVIRT_ASM(paravirt_ret0, "xor %eax,%eax", .entry.text);
+DEFINE_ASM_FUNC(paravirt_ret0, "xor %eax,%eax", .entry.text);
void __init default_banner(void)
{
pv_info.name);
}
-/* Undefined instruction for dealing with missing ops pointers. */
-noinstr void paravirt_BUG(void)
-{
- BUG();
-}
-
-static unsigned paravirt_patch_call(void *insn_buff, const void *target,
- unsigned long addr, unsigned len)
-{
- __text_gen_insn(insn_buff, CALL_INSN_OPCODE,
- (void *)addr, target, CALL_INSN_SIZE);
- return CALL_INSN_SIZE;
-}
-
#ifdef CONFIG_PARAVIRT_XXL
-DEFINE_PARAVIRT_ASM(_paravirt_ident_64, "mov %rdi, %rax", .text);
-DEFINE_PARAVIRT_ASM(pv_native_save_fl, "pushf; pop %rax", .noinstr.text);
-DEFINE_PARAVIRT_ASM(pv_native_irq_disable, "cli", .noinstr.text);
-DEFINE_PARAVIRT_ASM(pv_native_irq_enable, "sti", .noinstr.text);
-DEFINE_PARAVIRT_ASM(pv_native_read_cr2, "mov %cr2, %rax", .noinstr.text);
+DEFINE_ASM_FUNC(_paravirt_ident_64, "mov %rdi, %rax", .text);
+DEFINE_ASM_FUNC(pv_native_save_fl, "pushf; pop %rax", .noinstr.text);
+DEFINE_ASM_FUNC(pv_native_irq_disable, "cli", .noinstr.text);
+DEFINE_ASM_FUNC(pv_native_irq_enable, "sti", .noinstr.text);
+DEFINE_ASM_FUNC(pv_native_read_cr2, "mov %cr2, %rax", .noinstr.text);
#endif
DEFINE_STATIC_KEY_TRUE(virt_spin_lock_key);
tlb_remove_page(tlb, table);
}
-unsigned int paravirt_patch(u8 type, void *insn_buff, unsigned long addr,
- unsigned int len)
-{
- /*
- * Neat trick to map patch type back to the call within the
- * corresponding structure.
- */
- void *opfunc = *((void **)&pv_ops + type);
- unsigned ret;
-
- if (opfunc == NULL)
- /* If there's no function, patch it with paravirt_BUG() */
- ret = paravirt_patch_call(insn_buff, paravirt_BUG, addr, len);
- else if (opfunc == _paravirt_nop)
- ret = 0;
- else
- /* Otherwise call the function. */
- ret = paravirt_patch_call(insn_buff, opfunc, addr, len);
-
- return ret;
-}
-
struct static_key paravirt_steal_enabled;
struct static_key paravirt_steal_rq_enabled;
}
#endif
- /*
- * start address and size of operations which during runtime
- * can be patched with virtualization friendly instructions or
- * baremetal native ones. Think page table operations.
- * Details in paravirt_types.h
- */
- . = ALIGN(8);
- .parainstructions : AT(ADDR(.parainstructions) - LOAD_OFFSET) {
- __parainstructions = .;
- *(.parainstructions)
- __parainstructions_end = .;
- }
-
#ifdef CONFIG_RETPOLINE
/*
* List of instructions that call/jmp/jcc to retpoline thunks
if (!eventfd)
return HV_STATUS_INVALID_PORT_ID;
- eventfd_signal(eventfd, 1);
+ eventfd_signal(eventfd);
return HV_STATUS_SUCCESS;
}
set_msr_interception(vcpu, svm->msrpm, MSR_TSC_AUX, v_tsc_aux, v_tsc_aux);
}
+
+ /*
+ * For SEV-ES, accesses to MSR_IA32_XSS should not be intercepted if
+ * the host/guest supports its use.
+ *
+ * guest_can_use() checks a number of requirements on the host/guest to
+ * ensure that MSR_IA32_XSS is available, but it might report true even
+ * if X86_FEATURE_XSAVES isn't configured in the guest to ensure host
+ * MSR_IA32_XSS is always properly restored. For SEV-ES, it is better
+ * to further check that the guest CPUID actually supports
+ * X86_FEATURE_XSAVES so that accesses to MSR_IA32_XSS by misbehaved
+ * guests will still get intercepted and caught in the normal
+ * kvm_emulate_rdmsr()/kvm_emulated_wrmsr() paths.
+ */
+ if (guest_can_use(vcpu, X86_FEATURE_XSAVES) &&
+ guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))
+ set_msr_interception(vcpu, svm->msrpm, MSR_IA32_XSS, 1, 1);
+ else
+ set_msr_interception(vcpu, svm->msrpm, MSR_IA32_XSS, 0, 0);
}
void sev_vcpu_after_set_cpuid(struct vcpu_svm *svm)
{ .index = MSR_IA32_LASTBRANCHTOIP, .always = false },
{ .index = MSR_IA32_LASTINTFROMIP, .always = false },
{ .index = MSR_IA32_LASTINTTOIP, .always = false },
+ { .index = MSR_IA32_XSS, .always = false },
{ .index = MSR_EFER, .always = false },
{ .index = MSR_IA32_CR_PAT, .always = false },
{ .index = MSR_AMD64_SEV_ES_GHCB, .always = true },
#define IOPM_SIZE PAGE_SIZE * 3
#define MSRPM_SIZE PAGE_SIZE * 2
-#define MAX_DIRECT_ACCESS_MSRS 46
+#define MAX_DIRECT_ACCESS_MSRS 47
#define MSRPM_OFFSETS 32
extern u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly;
extern bool npt_enabled;
if (ret < 0 && ret != -ENOTCONN)
return false;
} else {
- eventfd_signal(evtchnfd->deliver.eventfd.ctx, 1);
+ eventfd_signal(evtchnfd->deliver.eventfd.ctx);
}
*r = 0;
#include <asm/checksum.h>
#include <asm/word-at-a-time.h>
-static inline unsigned short from32to16(unsigned a)
+static inline __wsum csum_finalize_sum(u64 temp64)
{
- unsigned short b = a >> 16;
- asm("addw %w2,%w0\n\t"
- "adcw $0,%w0\n"
- : "=r" (b)
- : "0" (b), "r" (a));
- return b;
+ return (__force __wsum)((temp64 + ror64(temp64, 32)) >> 32);
}
-static inline __wsum csum_tail(u64 temp64, int odd)
+static inline unsigned long update_csum_40b(unsigned long sum, const unsigned long m[5])
{
- unsigned int result;
-
- result = add32_with_carry(temp64 >> 32, temp64 & 0xffffffff);
- if (unlikely(odd)) {
- result = from32to16(result);
- result = ((result >> 8) & 0xff) | ((result & 0xff) << 8);
- }
- return (__force __wsum)result;
+ asm("addq %1,%0\n\t"
+ "adcq %2,%0\n\t"
+ "adcq %3,%0\n\t"
+ "adcq %4,%0\n\t"
+ "adcq %5,%0\n\t"
+ "adcq $0,%0"
+ :"+r" (sum)
+ :"m" (m[0]), "m" (m[1]), "m" (m[2]),
+ "m" (m[3]), "m" (m[4]));
+ return sum;
}
/*
__wsum csum_partial(const void *buff, int len, __wsum sum)
{
u64 temp64 = (__force u64)sum;
- unsigned odd;
- odd = 1 & (unsigned long) buff;
- if (unlikely(odd)) {
- if (unlikely(len == 0))
- return sum;
- temp64 = ror32((__force u32)sum, 8);
- temp64 += (*(unsigned char *)buff << 8);
- len--;
- buff++;
+ /* Do two 40-byte chunks in parallel to get better ILP */
+ if (likely(len >= 80)) {
+ u64 temp64_2 = 0;
+ do {
+ temp64 = update_csum_40b(temp64, buff);
+ temp64_2 = update_csum_40b(temp64_2, buff + 40);
+ buff += 80;
+ len -= 80;
+ } while (len >= 80);
+
+ asm("addq %1,%0\n\t"
+ "adcq $0,%0"
+ :"+r" (temp64): "r" (temp64_2));
}
/*
- * len == 40 is the hot case due to IPv6 headers, but annotating it likely()
- * has noticeable negative affect on codegen for all other cases with
- * minimal performance benefit here.
+ * len == 40 is the hot case due to IPv6 headers, so return
+ * early for that exact case without checking the tail bytes.
*/
- if (len == 40) {
- asm("addq 0*8(%[src]),%[res]\n\t"
- "adcq 1*8(%[src]),%[res]\n\t"
- "adcq 2*8(%[src]),%[res]\n\t"
- "adcq 3*8(%[src]),%[res]\n\t"
- "adcq 4*8(%[src]),%[res]\n\t"
- "adcq $0,%[res]"
- : [res] "+r"(temp64)
- : [src] "r"(buff), "m"(*(const char(*)[40])buff));
- return csum_tail(temp64, odd);
- }
- if (unlikely(len >= 64)) {
- /*
- * Extra accumulators for better ILP in the loop.
- */
- u64 tmp_accum, tmp_carries;
-
- asm("xorl %k[tmp_accum],%k[tmp_accum]\n\t"
- "xorl %k[tmp_carries],%k[tmp_carries]\n\t"
- "subl $64, %[len]\n\t"
- "1:\n\t"
- "addq 0*8(%[src]),%[res]\n\t"
- "adcq 1*8(%[src]),%[res]\n\t"
- "adcq 2*8(%[src]),%[res]\n\t"
- "adcq 3*8(%[src]),%[res]\n\t"
- "adcl $0,%k[tmp_carries]\n\t"
- "addq 4*8(%[src]),%[tmp_accum]\n\t"
- "adcq 5*8(%[src]),%[tmp_accum]\n\t"
- "adcq 6*8(%[src]),%[tmp_accum]\n\t"
- "adcq 7*8(%[src]),%[tmp_accum]\n\t"
- "adcl $0,%k[tmp_carries]\n\t"
- "addq $64, %[src]\n\t"
- "subl $64, %[len]\n\t"
- "jge 1b\n\t"
- "addq %[tmp_accum],%[res]\n\t"
- "adcq %[tmp_carries],%[res]\n\t"
- "adcq $0,%[res]"
- : [tmp_accum] "=&r"(tmp_accum),
- [tmp_carries] "=&r"(tmp_carries), [res] "+r"(temp64),
- [len] "+r"(len), [src] "+r"(buff)
- : "m"(*(const char *)buff));
+ if (len >= 40) {
+ temp64 = update_csum_40b(temp64, buff);
+ len -= 40;
+ if (!len)
+ return csum_finalize_sum(temp64);
+ buff += 40;
}
if (len & 32) {
: [res] "+r"(temp64)
: [trail] "r"(trail));
}
- return csum_tail(temp64, odd);
+ return csum_finalize_sum(temp64);
}
EXPORT_SYMBOL(csum_partial);
*/
int num_digits(int val)
{
- int m = 10;
+ long long m = 10;
int d = 1;
if (val < 0) {
goto done;
}
count_vm_vma_lock_event(VMA_LOCK_RETRY);
+ if (fault & VM_FAULT_MAJOR)
+ flags |= FAULT_FLAG_TRIED;
/* Quick path to respond to signals */
if (fault_signal_pending(fault, regs)) {
mmr_value = 0;
entry = (struct uv_IO_APIC_route_entry *)&mmr_value;
entry->vector = cfg->vector;
- entry->delivery_mode = apic->delivery_mode;
+ entry->delivery_mode = APIC_DELIVERY_MODE_FIXED;
entry->dest_mode = apic->dest_mode_logical;
entry->polarity = 0;
entry->trigger = 0;
[S_REL] =
"^(__init_(begin|end)|"
"__x86_cpu_dev_(start|end)|"
- "(__parainstructions|__alt_instructions)(_end)?|"
+ "__alt_instructions(_end)?|"
"(__iommu_table|__apicdrivers|__smp_locks)(_end)?|"
"__(start|end)_pci_.*|"
#if CONFIG_FW_LOADER
select PARAVIRT_CLOCK
select X86_HV_CALLBACK_VECTOR
depends on X86_64 || (X86_32 && X86_PAE)
+ depends on X86_64 || (X86_GENERIC || MPENTIUM4 || MCORE2 || MATOM || MK8)
depends on X86_LOCAL_APIC && X86_TSC
help
This is the Linux Xen port. Enabling this will allow the
/* Initial interrupt flag handling only called while interrupts off. */
.save_fl = __PV_IS_CALLEE_SAVE(paravirt_ret0),
.irq_disable = __PV_IS_CALLEE_SAVE(paravirt_nop),
- .irq_enable = __PV_IS_CALLEE_SAVE(paravirt_BUG),
+ .irq_enable = __PV_IS_CALLEE_SAVE(BUG_func),
.safe_halt = xen_safe_halt,
.halt = xen_halt,
454 common futex_wake sys_futex_wake
455 common futex_wait sys_futex_wait
456 common futex_requeue sys_futex_requeue
+457 common statmount sys_statmount
+458 common listmount sys_listmount
select CRC_T10DIF
select CRC64_ROCKSOFT
+config BLK_DEV_WRITE_MOUNTED
+ bool "Allow writing to mounted block devices"
+ default y
+ help
+ When a block device is mounted, writing to its buffer cache is very
+ likely going to cause filesystem corruption. It is also rather easy to
+ crash the kernel in this way since the filesystem has no practical way
+ of detecting these writes to buffer cache and verifying its metadata
+ integrity. However there are some setups that need this capability
+ like running fsck on read-only mounted root device, modifying some
+ features on mounted ext4 filesystem, and similar. If you say N, the
+ kernel will prevent processes from writing to block devices that are
+ mounted by filesystems which provides some more protection from runaway
+ privileged processes and generally makes it much harder to crash
+ filesystem drivers. Note however that this does not prevent
+ underlying device(s) from being modified by other means, e.g. by
+ directly submitting SCSI commands or through access to lower layers of
+ storage stack. If in doubt, say Y. The configuration can be overridden
+ with the bdev_allow_write_mounted boot option.
+
config BLK_DEV_ZONED
bool "Zoned block device support"
select MQ_IOSCHED_DEADLINE
prev = prev_badblocks(bb, &bad, hint);
/* start after all badblocks */
- if ((prev + 1) >= bb->count && !overlap_front(bb, prev, &bad)) {
+ if ((prev >= 0) &&
+ ((prev + 1) >= bb->count) && !overlap_front(bb, prev, &bad)) {
len = sectors;
goto update_sectors;
}
- if (overlap_front(bb, prev, &bad)) {
+ /* Overlapped with front badblocks record */
+ if ((prev >= 0) && overlap_front(bb, prev, &bad)) {
if (BB_ACK(p[prev]))
acked_badblocks++;
else
#include "../fs/internal.h"
#include "blk.h"
+/* Should we allow writing to mounted block devices? */
+static bool bdev_allow_write_mounted = IS_ENABLED(CONFIG_BLK_DEV_WRITE_MOUNTED);
+
struct bdev_inode {
struct block_device bdev;
struct inode vfs_inode;
EXPORT_SYMBOL(sync_blockdev_range);
/**
- * freeze_bdev - lock a filesystem and force it into a consistent state
+ * bdev_freeze - lock a filesystem and force it into a consistent state
* @bdev: blockdevice to lock
*
* If a superblock is found on this device, we take the s_umount semaphore
* on it to make sure nobody unmounts until the snapshot creation is done.
* The reference counter (bd_fsfreeze_count) guarantees that only the last
* unfreeze process can unfreeze the frozen filesystem actually when multiple
- * freeze requests arrive simultaneously. It counts up in freeze_bdev() and
- * count down in thaw_bdev(). When it becomes 0, thaw_bdev() will unfreeze
+ * freeze requests arrive simultaneously. It counts up in bdev_freeze() and
+ * count down in bdev_thaw(). When it becomes 0, thaw_bdev() will unfreeze
* actually.
+ *
+ * Return: On success zero is returned, negative error code on failure.
*/
-int freeze_bdev(struct block_device *bdev)
+int bdev_freeze(struct block_device *bdev)
{
- struct super_block *sb;
int error = 0;
mutex_lock(&bdev->bd_fsfreeze_mutex);
- if (++bdev->bd_fsfreeze_count > 1)
- goto done;
-
- sb = get_active_super(bdev);
- if (!sb)
- goto sync;
- if (sb->s_op->freeze_super)
- error = sb->s_op->freeze_super(sb, FREEZE_HOLDER_USERSPACE);
- else
- error = freeze_super(sb, FREEZE_HOLDER_USERSPACE);
- deactivate_super(sb);
- if (error) {
- bdev->bd_fsfreeze_count--;
- goto done;
+ if (atomic_inc_return(&bdev->bd_fsfreeze_count) > 1) {
+ mutex_unlock(&bdev->bd_fsfreeze_mutex);
+ return 0;
}
- bdev->bd_fsfreeze_sb = sb;
-sync:
- sync_blockdev(bdev);
-done:
+ mutex_lock(&bdev->bd_holder_lock);
+ if (bdev->bd_holder_ops && bdev->bd_holder_ops->freeze) {
+ error = bdev->bd_holder_ops->freeze(bdev);
+ lockdep_assert_not_held(&bdev->bd_holder_lock);
+ } else {
+ mutex_unlock(&bdev->bd_holder_lock);
+ error = sync_blockdev(bdev);
+ }
+
+ if (error)
+ atomic_dec(&bdev->bd_fsfreeze_count);
+
mutex_unlock(&bdev->bd_fsfreeze_mutex);
return error;
}
-EXPORT_SYMBOL(freeze_bdev);
+EXPORT_SYMBOL(bdev_freeze);
/**
- * thaw_bdev - unlock filesystem
+ * bdev_thaw - unlock filesystem
* @bdev: blockdevice to unlock
*
- * Unlocks the filesystem and marks it writeable again after freeze_bdev().
+ * Unlocks the filesystem and marks it writeable again after bdev_freeze().
+ *
+ * Return: On success zero is returned, negative error code on failure.
*/
-int thaw_bdev(struct block_device *bdev)
+int bdev_thaw(struct block_device *bdev)
{
- struct super_block *sb;
- int error = -EINVAL;
+ int error = -EINVAL, nr_freeze;
mutex_lock(&bdev->bd_fsfreeze_mutex);
- if (!bdev->bd_fsfreeze_count)
+
+ /*
+ * If this returns < 0 it means that @bd_fsfreeze_count was
+ * already 0 and no decrement was performed.
+ */
+ nr_freeze = atomic_dec_if_positive(&bdev->bd_fsfreeze_count);
+ if (nr_freeze < 0)
goto out;
error = 0;
- if (--bdev->bd_fsfreeze_count > 0)
+ if (nr_freeze > 0)
goto out;
- sb = bdev->bd_fsfreeze_sb;
- if (!sb)
- goto out;
+ mutex_lock(&bdev->bd_holder_lock);
+ if (bdev->bd_holder_ops && bdev->bd_holder_ops->thaw) {
+ error = bdev->bd_holder_ops->thaw(bdev);
+ lockdep_assert_not_held(&bdev->bd_holder_lock);
+ } else {
+ mutex_unlock(&bdev->bd_holder_lock);
+ }
- if (sb->s_op->thaw_super)
- error = sb->s_op->thaw_super(sb, FREEZE_HOLDER_USERSPACE);
- else
- error = thaw_super(sb, FREEZE_HOLDER_USERSPACE);
if (error)
- bdev->bd_fsfreeze_count++;
- else
- bdev->bd_fsfreeze_sb = NULL;
+ atomic_inc(&bdev->bd_fsfreeze_count);
out:
mutex_unlock(&bdev->bd_fsfreeze_mutex);
return error;
}
-EXPORT_SYMBOL(thaw_bdev);
+EXPORT_SYMBOL(bdev_thaw);
/*
* pseudo-fs
{
put_device(&bdev->bd_device);
}
-
+
+static bool bdev_writes_blocked(struct block_device *bdev)
+{
+ return bdev->bd_writers == -1;
+}
+
+static void bdev_block_writes(struct block_device *bdev)
+{
+ bdev->bd_writers = -1;
+}
+
+static void bdev_unblock_writes(struct block_device *bdev)
+{
+ bdev->bd_writers = 0;
+}
+
+static bool bdev_may_open(struct block_device *bdev, blk_mode_t mode)
+{
+ if (bdev_allow_write_mounted)
+ return true;
+ /* Writes blocked? */
+ if (mode & BLK_OPEN_WRITE && bdev_writes_blocked(bdev))
+ return false;
+ if (mode & BLK_OPEN_RESTRICT_WRITES && bdev->bd_writers > 0)
+ return false;
+ return true;
+}
+
+static void bdev_claim_write_access(struct block_device *bdev, blk_mode_t mode)
+{
+ if (bdev_allow_write_mounted)
+ return;
+
+ /* Claim exclusive or shared write access. */
+ if (mode & BLK_OPEN_RESTRICT_WRITES)
+ bdev_block_writes(bdev);
+ else if (mode & BLK_OPEN_WRITE)
+ bdev->bd_writers++;
+}
+
+static void bdev_yield_write_access(struct block_device *bdev, blk_mode_t mode)
+{
+ if (bdev_allow_write_mounted)
+ return;
+
+ /* Yield exclusive or shared write access. */
+ if (mode & BLK_OPEN_RESTRICT_WRITES)
+ bdev_unblock_writes(bdev);
+ else if (mode & BLK_OPEN_WRITE)
+ bdev->bd_writers--;
+}
+
/**
- * blkdev_get_by_dev - open a block device by device number
+ * bdev_open_by_dev - open a block device by device number
* @dev: device number of block device to open
* @mode: open mode (BLK_OPEN_*)
* @holder: exclusive holder identifier
*
* Use this interface ONLY if you really do not have anything better - i.e. when
* you are behind a truly sucky interface and all you are given is a device
- * number. Everything else should use blkdev_get_by_path().
+ * number. Everything else should use bdev_open_by_path().
*
* CONTEXT:
* Might sleep.
*
* RETURNS:
- * Reference to the block_device on success, ERR_PTR(-errno) on failure.
+ * Handle with a reference to the block_device on success, ERR_PTR(-errno) on
+ * failure.
*/
-struct block_device *blkdev_get_by_dev(dev_t dev, blk_mode_t mode, void *holder,
- const struct blk_holder_ops *hops)
+struct bdev_handle *bdev_open_by_dev(dev_t dev, blk_mode_t mode, void *holder,
+ const struct blk_holder_ops *hops)
{
- bool unblock_events = true;
+ struct bdev_handle *handle = kmalloc(sizeof(struct bdev_handle),
+ GFP_KERNEL);
struct block_device *bdev;
+ bool unblock_events = true;
struct gendisk *disk;
int ret;
+ if (!handle)
+ return ERR_PTR(-ENOMEM);
+
ret = devcgroup_check_permission(DEVCG_DEV_BLOCK,
MAJOR(dev), MINOR(dev),
((mode & BLK_OPEN_READ) ? DEVCG_ACC_READ : 0) |
((mode & BLK_OPEN_WRITE) ? DEVCG_ACC_WRITE : 0));
if (ret)
- return ERR_PTR(ret);
+ goto free_handle;
+
+ /* Blocking writes requires exclusive opener */
+ if (mode & BLK_OPEN_RESTRICT_WRITES && !holder) {
+ ret = -EINVAL;
+ goto free_handle;
+ }
bdev = blkdev_get_no_open(dev);
- if (!bdev)
- return ERR_PTR(-ENXIO);
+ if (!bdev) {
+ ret = -ENXIO;
+ goto free_handle;
+ }
disk = bdev->bd_disk;
if (holder) {
goto abort_claiming;
if (!try_module_get(disk->fops->owner))
goto abort_claiming;
+ ret = -EBUSY;
+ if (!bdev_may_open(bdev, mode))
+ goto abort_claiming;
if (bdev_is_partition(bdev))
ret = blkdev_get_part(bdev, mode);
else
ret = blkdev_get_whole(bdev, mode);
if (ret)
goto put_module;
+ bdev_claim_write_access(bdev, mode);
if (holder) {
bd_finish_claiming(bdev, holder, hops);
if (unblock_events)
disk_unblock_events(disk);
- return bdev;
+ handle->bdev = bdev;
+ handle->holder = holder;
+ handle->mode = mode;
+ return handle;
put_module:
module_put(disk->fops->owner);
abort_claiming:
disk_unblock_events(disk);
put_blkdev:
blkdev_put_no_open(bdev);
+free_handle:
+ kfree(handle);
return ERR_PTR(ret);
}
-EXPORT_SYMBOL(blkdev_get_by_dev);
-
-struct bdev_handle *bdev_open_by_dev(dev_t dev, blk_mode_t mode, void *holder,
- const struct blk_holder_ops *hops)
-{
- struct bdev_handle *handle = kmalloc(sizeof(*handle), GFP_KERNEL);
- struct block_device *bdev;
-
- if (!handle)
- return ERR_PTR(-ENOMEM);
- bdev = blkdev_get_by_dev(dev, mode, holder, hops);
- if (IS_ERR(bdev)) {
- kfree(handle);
- return ERR_CAST(bdev);
- }
- handle->bdev = bdev;
- handle->holder = holder;
- if (holder)
- mode |= BLK_OPEN_EXCL;
- handle->mode = mode;
- return handle;
-}
EXPORT_SYMBOL(bdev_open_by_dev);
/**
- * blkdev_get_by_path - open a block device by name
+ * bdev_open_by_path - open a block device by name
* @path: path to the block device to open
* @mode: open mode (BLK_OPEN_*)
* @holder: exclusive holder identifier
* Might sleep.
*
* RETURNS:
- * Reference to the block_device on success, ERR_PTR(-errno) on failure.
+ * Handle with a reference to the block_device on success, ERR_PTR(-errno) on
+ * failure.
*/
-struct block_device *blkdev_get_by_path(const char *path, blk_mode_t mode,
- void *holder, const struct blk_holder_ops *hops)
-{
- struct block_device *bdev;
- dev_t dev;
- int error;
-
- error = lookup_bdev(path, &dev);
- if (error)
- return ERR_PTR(error);
-
- bdev = blkdev_get_by_dev(dev, mode, holder, hops);
- if (!IS_ERR(bdev) && (mode & BLK_OPEN_WRITE) && bdev_read_only(bdev)) {
- blkdev_put(bdev, holder);
- return ERR_PTR(-EACCES);
- }
-
- return bdev;
-}
-EXPORT_SYMBOL(blkdev_get_by_path);
-
struct bdev_handle *bdev_open_by_path(const char *path, blk_mode_t mode,
void *holder, const struct blk_holder_ops *hops)
{
}
EXPORT_SYMBOL(bdev_open_by_path);
-void blkdev_put(struct block_device *bdev, void *holder)
+void bdev_release(struct bdev_handle *handle)
{
+ struct block_device *bdev = handle->bdev;
struct gendisk *disk = bdev->bd_disk;
/*
sync_blockdev(bdev);
mutex_lock(&disk->open_mutex);
- if (holder)
- bd_end_claim(bdev, holder);
+ bdev_yield_write_access(bdev, handle->mode);
+
+ if (handle->holder)
+ bd_end_claim(bdev, handle->holder);
/*
* Trigger event checking and tell drivers to flush MEDIA_CHANGE
module_put(disk->fops->owner);
blkdev_put_no_open(bdev);
-}
-EXPORT_SYMBOL(blkdev_put);
-
-void bdev_release(struct bdev_handle *handle)
-{
- blkdev_put(handle->bdev, handle->holder);
kfree(handle);
}
EXPORT_SYMBOL(bdev_release);
blkdev_put_no_open(bdev);
}
+
+static int __init setup_bdev_allow_write_mounted(char *str)
+{
+ if (kstrtobool(str, &bdev_allow_write_mounted))
+ pr_warn("Invalid option string for bdev_allow_write_mounted:"
+ " '%s'\n", str);
+ return 1;
+}
+__setup("bdev_allow_write_mounted=", setup_bdev_allow_write_mounted);
notifier_event->events_mask |= event_mask;
if (notifier_event->eventfd)
- eventfd_signal(notifier_event->eventfd, 1);
+ eventfd_signal(notifier_event->eventfd);
mutex_unlock(¬ifier_event->lock);
}
static int mhi_read_reg(struct mhi_controller *mhi_cntrl, void __iomem *addr, u32 *out)
{
- u32 tmp = readl_relaxed(addr);
+ u32 tmp;
+ /*
+ * SOC_HW_VERSION quirk
+ * The SOC_HW_VERSION register (offset 0x224) is not reliable and
+ * may contain uninitialized values, including 0xFFFFFFFF. This could
+ * cause a false positive link down error. Instead, intercept any
+ * reads and provide the correct value of the register.
+ */
+ if (addr - mhi_cntrl->regs == 0x224) {
+ *out = 0x60110200;
+ return 0;
+ }
+
+ tmp = readl_relaxed(addr);
if (tmp == U32_MAX)
return -EIO;
struct dma_buf_attachment *attach;
struct drm_gem_object *obj;
struct qaic_bo *bo;
- size_t size;
int ret;
bo = qaic_alloc_init_bo();
goto attach_fail;
}
- size = PAGE_ALIGN(attach->dmabuf->size);
- if (size == 0) {
+ if (!attach->dmabuf->size) {
ret = -EINVAL;
goto size_align_fail;
}
- drm_gem_private_object_init(dev, obj, size);
+ drm_gem_private_object_init(dev, obj, attach->dmabuf->size);
/*
* skipping dma_buf_map_attachment() as we do not know the direction
* just yet. Once the direction is known in the subsequent IOCTL to
if (!twcb)
return;
init_task_work(&twcb->twork, binder_do_fd_close);
- twcb->file = close_fd_get_file(fd);
+ twcb->file = file_close_fd(fd);
if (twcb->file) {
// pin it until binder_do_fd_close(); see comments there
get_file(twcb->file);
unsigned long __per_cpu_offset[NR_CPUS] __read_mostly;
EXPORT_SYMBOL(__per_cpu_offset);
-static int __init early_cpu_to_node(int cpu)
+int __init early_cpu_to_node(int cpu)
{
return cpu_to_node_map[cpu];
}
iov_iter_bvec(&i, ITER_SOURCE, bvec, 1, bvec->bv_len);
- file_start_write(file);
bw = vfs_iter_write(file, &i, ppos, 0);
- file_end_write(file);
if (likely(bw == bvec->bv_len))
return 0;
*/
if (ublk_need_map_req(req)) {
struct iov_iter iter;
- struct iovec iov;
const int dir = ITER_DEST;
- import_single_range(dir, u64_to_user_ptr(io->addr), rq_bytes,
- &iov, &iter);
-
+ import_ubuf(dir, u64_to_user_ptr(io->addr), rq_bytes, &iter);
return ublk_copy_user_pages(req, 0, &iter, dir);
}
return rq_bytes;
if (ublk_need_unmap_req(req)) {
struct iov_iter iter;
- struct iovec iov;
const int dir = ITER_SOURCE;
WARN_ON_ONCE(io->res > rq_bytes);
- import_single_range(dir, u64_to_user_ptr(io->addr), io->res,
- &iov, &iter);
+ import_ubuf(dir, u64_to_user_ptr(io->addr), io->res, &iter);
return ublk_copy_user_pages(req, 0, &iter, dir);
}
return rq_bytes;
static int init_vq(struct virtio_blk *vblk)
{
int err;
- int i;
+ unsigned short i;
vq_callback_t **callbacks;
const char **names;
struct virtqueue **vqs;
unsigned short num_vqs;
- unsigned int num_poll_vqs;
+ unsigned short num_poll_vqs;
struct virtio_device *vdev = vblk->vdev;
struct irq_affinity desc = { 0, };
for (i = 0; i < num_vqs - num_poll_vqs; i++) {
callbacks[i] = virtblk_done;
- snprintf(vblk->vqs[i].name, VQ_NAME_LEN, "req.%d", i);
+ snprintf(vblk->vqs[i].name, VQ_NAME_LEN, "req.%u", i);
names[i] = vblk->vqs[i].name;
}
for (; i < num_vqs; i++) {
callbacks[i] = NULL;
- snprintf(vblk->vqs[i].name, VQ_NAME_LEN, "req_poll.%d", i);
+ snprintf(vblk->vqs[i].name, VQ_NAME_LEN, "req_poll.%u", i);
names[i] = vblk->vqs[i].name;
}
#include <linux/module.h>
#include <asm/unaligned.h>
+#include <linux/atomic.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/slab.h>
bool wakeup;
__u16 msft_opcode;
bool aosp_capable;
+ atomic_t initialized;
};
static int vhci_open_dev(struct hci_dev *hdev)
memcpy(skb_push(skb, 1), &hci_skb_pkt_type(skb), 1);
- mutex_lock(&data->open_mutex);
skb_queue_tail(&data->readq, skb);
- mutex_unlock(&data->open_mutex);
- wake_up_interruptible(&data->read_wait);
+ if (atomic_read(&data->initialized))
+ wake_up_interruptible(&data->read_wait);
return 0;
}
skb_put_u8(skb, 0xff);
skb_put_u8(skb, opcode);
put_unaligned_le16(hdev->id, skb_put(skb, 2));
- skb_queue_tail(&data->readq, skb);
+ skb_queue_head(&data->readq, skb);
+ atomic_inc(&data->initialized);
wake_up_interruptible(&data->read_wait);
return 0;
sysc_val = sysc_read_sysconfig(ddata);
sysc_val |= sysc_mask;
sysc_write(ddata, sysc_offset, sysc_val);
- /* Flush posted write */
+
+ /*
+ * Some devices need a delay before reading registers
+ * after reset. Presumably a srst_udelay is not needed
+ * for devices that use a rstctrl register reset.
+ */
+ if (ddata->cfg.srst_udelay)
+ fsleep(ddata->cfg.srst_udelay);
+
+ /*
+ * Flush posted write. For devices needing srst_udelay
+ * this should trigger an interconnect error if the
+ * srst_udelay value is needed but not configured.
+ */
sysc_val = sysc_read_sysconfig(ddata);
}
- if (ddata->cfg.srst_udelay)
- fsleep(ddata->cfg.srst_udelay);
-
if (ddata->post_reset_quirk)
ddata->post_reset_quirk(ddata);
SYSCALL_DEFINE3(getrandom, char __user *, ubuf, size_t, len, unsigned int, flags)
{
struct iov_iter iter;
- struct iovec iov;
int ret;
if (flags & ~(GRND_NONBLOCK | GRND_RANDOM | GRND_INSECURE))
return ret;
}
- ret = import_single_range(ITER_DEST, ubuf, len, &iov, &iter);
+ ret = import_ubuf(ITER_DEST, ubuf, len, &iter);
if (unlikely(ret))
return ret;
return get_random_bytes_user(&iter);
return 0;
case RNDADDENTROPY: {
struct iov_iter iter;
- struct iovec iov;
ssize_t ret;
int len;
return -EINVAL;
if (get_user(len, p++))
return -EFAULT;
- ret = import_single_range(ITER_SOURCE, p, len, &iov, &iter);
+ ret = import_ubuf(ITER_SOURCE, p, len, &iter);
if (unlikely(ret))
return ret;
ret = write_pool_user(&iter);
filter_data[1] = 0;
}
- cn_netlink_send_mult(msg, msg->len, 0, CN_IDX_PROC, GFP_NOWAIT,
- cn_filter, (void *)filter_data);
+ if (cn_netlink_send_mult(msg, msg->len, 0, CN_IDX_PROC, GFP_NOWAIT,
+ cn_filter, (void *)filter_data) == -ESRCH)
+ atomic_set(&proc_event_num_listeners, 0);
local_unlock(&local_event.lock);
}
#include <linux/of_platform.h>
#include <linux/panic_notifier.h>
#include <linux/platform_device.h>
+#include <linux/property.h>
#include <linux/regmap.h>
#include <linux/types.h>
#include <linux/uaccess.h>
static int altr_sdram_probe(struct platform_device *pdev)
{
- const struct of_device_id *id;
struct edac_mc_layer layers[2];
struct mem_ctl_info *mci;
struct altr_sdram_mc_data *drvdata;
int irq, irq2, res = 0;
unsigned long mem_size, irqflags = 0;
- id = of_match_device(altr_sdram_ctrl_of_match, &pdev->dev);
- if (!id)
- return -ENODEV;
-
/* Grab the register range from the sdr controller in device tree */
mc_vbase = syscon_regmap_lookup_by_phandle(pdev->dev.of_node,
"altr,sdr-syscon");
}
/* Check specific dependencies for the module */
- priv = of_match_node(altr_sdram_ctrl_of_match,
- pdev->dev.of_node)->data;
+ priv = device_get_match_data(&pdev->dev);
/* Validate the SDRAM controller has ECC enabled */
if (regmap_read(mc_vbase, priv->ecc_ctrl_offset, &read_reg) ||
return res;
}
-static int altr_sdram_remove(struct platform_device *pdev)
+static void altr_sdram_remove(struct platform_device *pdev)
{
struct mem_ctl_info *mci = platform_get_drvdata(pdev);
edac_mc_del_mc(&pdev->dev);
edac_mc_free(mci);
platform_set_drvdata(pdev, NULL);
-
- return 0;
}
/*
static struct platform_driver altr_sdram_edac_driver = {
.probe = altr_sdram_probe,
- .remove = altr_sdram_remove,
+ .remove_new = altr_sdram_remove,
.driver = {
.name = "altr_sdram_edac",
#ifdef CONFIG_PM
return res;
}
-static int altr_edac_device_remove(struct platform_device *pdev)
+static void altr_edac_device_remove(struct platform_device *pdev)
{
struct edac_device_ctl_info *dci = platform_get_drvdata(pdev);
struct altr_edac_device_dev *drvdata = dci->pvt_info;
debugfs_remove_recursive(drvdata->debugfs_dir);
edac_device_del_device(&pdev->dev);
edac_device_free_ctl_info(dci);
-
- return 0;
}
static struct platform_driver altr_edac_device_driver = {
.probe = altr_edac_device_probe,
- .remove = altr_edac_device_remove,
+ .remove_new = altr_edac_device_remove,
.driver = {
.name = "altr_edac_device",
.of_match_table = altr_edac_device_of_match,
#define LNTM_NODE_COUNT GENMASK(27, 16)
#define LNTM_BASE_NODE_ID GENMASK(11, 0)
-static int gpu_get_node_map(void)
+static int gpu_get_node_map(struct amd64_pvt *pvt)
{
struct pci_dev *pdev;
int ret;
u32 tmp;
/*
- * Node ID 0 is reserved for CPUs.
- * Therefore, a non-zero Node ID means we've already cached the values.
+ * Mapping of nodes from hardware-provided AMD Node ID to a
+ * Linux logical one is applicable for MI200 models. Therefore,
+ * return early for other heterogeneous systems.
+ */
+ if (pvt->F3->device != PCI_DEVICE_ID_AMD_MI200_DF_F3)
+ return 0;
+
+ /*
+ * Node ID 0 is reserved for CPUs. Therefore, a non-zero Node ID
+ * means the values have been already cached.
*/
if (gpu_node_map.base_node_id)
return 0;
dimm->nr_pages = gpu_get_csrow_nr_pages(pvt, umc, cs);
dimm->edac_mode = EDAC_SECDED;
- dimm->mtype = MEM_HBM2;
+ dimm->mtype = pvt->dram_type;
dimm->dtype = DEV_X16;
dimm->grain = 64;
}
return true;
}
-static inline u32 gpu_get_umc_base(u8 umc, u8 channel)
+static inline u32 gpu_get_umc_base(struct amd64_pvt *pvt, u8 umc, u8 channel)
{
/*
* On CPUs, there is one channel per UMC, so UMC numbering equals
* On GPU nodes channels are selected in 3rd nibble
* HBM chX[3:0]= [Y ]5X[3:0]000;
* HBM chX[7:4]= [Y+1]5X[3:0]000
+ *
+ * On MI300 APU nodes, same as GPU nodes but channels are selected
+ * in the base address of 0x90000
*/
umc *= 2;
if (channel >= 4)
umc++;
- return 0x50000 + (umc << 20) + ((channel % 4) << 12);
+ return pvt->gpu_umc_base + (umc << 20) + ((channel % 4) << 12);
}
static void gpu_read_mc_regs(struct amd64_pvt *pvt)
/* Read registers from each UMC */
for_each_umc(i) {
- umc_base = gpu_get_umc_base(i, 0);
+ umc_base = gpu_get_umc_base(pvt, i, 0);
umc = &pvt->umc[i];
amd_smn_read(nid, umc_base + UMCCH_UMC_CFG, &umc->umc_cfg);
for_each_umc(umc) {
for_each_chip_select(cs, umc, pvt) {
- base_reg = gpu_get_umc_base(umc, cs) + UMCCH_BASE_ADDR;
+ base_reg = gpu_get_umc_base(pvt, umc, cs) + UMCCH_BASE_ADDR;
base = &pvt->csels[umc].csbases[cs];
if (!amd_smn_read(pvt->mc_node_id, base_reg, base)) {
umc, cs, *base, base_reg);
}
- mask_reg = gpu_get_umc_base(umc, cs) + UMCCH_ADDR_MASK;
+ mask_reg = gpu_get_umc_base(pvt, umc, cs) + UMCCH_ADDR_MASK;
mask = &pvt->csels[umc].csmasks[cs];
if (!amd_smn_read(pvt->mc_node_id, mask_reg, mask)) {
{
int ret;
- ret = gpu_get_node_map();
+ ret = gpu_get_node_map(pvt);
if (ret)
return ret;
if (pvt->F3->device == PCI_DEVICE_ID_AMD_MI200_DF_F3) {
pvt->ctl_name = "MI200";
pvt->max_mcs = 4;
+ pvt->dram_type = MEM_HBM2;
+ pvt->gpu_umc_base = 0x50000;
pvt->ops = &gpu_ops;
} else {
pvt->ctl_name = "F19h_M30h";
pvt->ctl_name = "F19h_M70h";
pvt->flags.zn_regs_v2 = 1;
break;
+ case 0x90 ... 0x9f:
+ pvt->ctl_name = "F19h_M90h";
+ pvt->max_mcs = 4;
+ pvt->dram_type = MEM_HBM3;
+ pvt->gpu_umc_base = 0x90000;
+ pvt->ops = &gpu_ops;
+ break;
case 0xa0 ... 0xaf:
pvt->ctl_name = "F19h_MA0h";
pvt->max_mcs = 12;
NULL
};
+/*
+ * For heterogeneous and APU models EDAC CHIP_SELECT and CHANNEL layers
+ * should be swapped to fit into the layers.
+ */
+static unsigned int get_layer_size(struct amd64_pvt *pvt, u8 layer)
+{
+ bool is_gpu = (pvt->ops == &gpu_ops);
+
+ if (!layer)
+ return is_gpu ? pvt->max_mcs
+ : pvt->csels[0].b_cnt;
+ else
+ return is_gpu ? pvt->csels[0].b_cnt
+ : pvt->max_mcs;
+}
+
static int init_one_instance(struct amd64_pvt *pvt)
{
struct mem_ctl_info *mci = NULL;
struct edac_mc_layer layers[2];
int ret = -ENOMEM;
- /*
- * For Heterogeneous family EDAC CHIP_SELECT and CHANNEL layers should
- * be swapped to fit into the layers.
- */
layers[0].type = EDAC_MC_LAYER_CHIP_SELECT;
- layers[0].size = (pvt->F3->device == PCI_DEVICE_ID_AMD_MI200_DF_F3) ?
- pvt->max_mcs : pvt->csels[0].b_cnt;
+ layers[0].size = get_layer_size(pvt, 0);
layers[0].is_virt_csrow = true;
layers[1].type = EDAC_MC_LAYER_CHANNEL;
- layers[1].size = (pvt->F3->device == PCI_DEVICE_ID_AMD_MI200_DF_F3) ?
- pvt->csels[0].b_cnt : pvt->max_mcs;
+ layers[1].size = get_layer_size(pvt, 1);
layers[1].is_virt_csrow = false;
mci = edac_mc_alloc(pvt->mc_node_id, ARRAY_SIZE(layers), layers, 0);
u32 dct_sel_lo; /* DRAM Controller Select Low */
u32 dct_sel_hi; /* DRAM Controller Select High */
u32 online_spare; /* On-Line spare Reg */
+ u32 gpu_umc_base; /* Base address used for channel selection on GPUs */
/* x4, x8, or x16 syndromes in use */
u8 ecc_sym_sz;
#include <linux/kernel.h>
#include <linux/edac.h>
-#include <linux/of_platform.h>
+#include <linux/of.h>
+#include <linux/of_device.h>
+#include <linux/platform_device.h>
#include <asm/hardware/cache-l2x0.h>
#include <asm/hardware/cache-aurora-l2.h>
return 0;
}
-static int axp_mc_remove(struct platform_device *pdev)
+static void axp_mc_remove(struct platform_device *pdev)
{
struct mem_ctl_info *mci = platform_get_drvdata(pdev);
edac_mc_del_mc(&pdev->dev);
edac_mc_free(mci);
platform_set_drvdata(pdev, NULL);
-
- return 0;
}
static struct platform_driver axp_mc_driver = {
.probe = axp_mc_probe,
- .remove = axp_mc_remove,
+ .remove_new = axp_mc_remove,
.driver = {
.name = "armada_xp_mc_edac",
.of_match_table = of_match_ptr(axp_mc_of_match),
return 0;
}
-static int aurora_l2_remove(struct platform_device *pdev)
+static void aurora_l2_remove(struct platform_device *pdev)
{
struct edac_device_ctl_info *dci = platform_get_drvdata(pdev);
#ifdef CONFIG_EDAC_DEBUG
edac_device_del_device(&pdev->dev);
edac_device_free_ctl_info(dci);
platform_set_drvdata(pdev, NULL);
-
- return 0;
}
static struct platform_driver aurora_l2_driver = {
.probe = aurora_l2_probe,
- .remove = aurora_l2_remove,
+ .remove_new = aurora_l2_remove,
.driver = {
.name = "aurora_l2_edac",
.of_match_table = of_match_ptr(aurora_l2_of_match),
}
-static int aspeed_remove(struct platform_device *pdev)
+static void aspeed_remove(struct platform_device *pdev)
{
struct mem_ctl_info *mci;
mci = edac_mc_del_mc(&pdev->dev);
if (mci)
edac_mc_free(mci);
-
- return 0;
}
.of_match_table = aspeed_of_match
},
.probe = aspeed_probe,
- .remove = aspeed_remove
+ .remove_new = aspeed_remove
};
module_platform_driver(aspeed_driver);
}
-static int bluefield_edac_mc_remove(struct platform_device *pdev)
+static void bluefield_edac_mc_remove(struct platform_device *pdev)
{
struct mem_ctl_info *mci = platform_get_drvdata(pdev);
edac_mc_del_mc(&pdev->dev);
edac_mc_free(mci);
-
- return 0;
}
static const struct acpi_device_id bluefield_mc_acpi_ids[] = {
.acpi_match_table = bluefield_mc_acpi_ids,
},
.probe = bluefield_edac_mc_probe,
- .remove = bluefield_edac_mc_remove,
+ .remove_new = bluefield_edac_mc_remove,
};
module_platform_driver(bluefield_edac_mc_driver);
return 0;
}
-static int cell_edac_remove(struct platform_device *pdev)
+static void cell_edac_remove(struct platform_device *pdev)
{
struct mem_ctl_info *mci = edac_mc_del_mc(&pdev->dev);
if (mci)
edac_mc_free(mci);
- return 0;
}
static struct platform_driver cell_edac_driver = {
.name = "cbe-mic",
},
.probe = cell_edac_probe,
- .remove = cell_edac_remove,
+ .remove_new = cell_edac_remove,
};
static int __init cell_edac_init(void)
return res;
}
-static int cpc925_remove(struct platform_device *pdev)
+static void cpc925_remove(struct platform_device *pdev)
{
struct mem_ctl_info *mci = platform_get_drvdata(pdev);
edac_mc_del_mc(&pdev->dev);
edac_mc_free(mci);
-
- return 0;
}
static struct platform_driver cpc925_edac_driver = {
.probe = cpc925_probe,
- .remove = cpc925_remove,
+ .remove_new = cpc925_remove,
.driver = {
.name = "cpc925_edac",
}
return ret;
}
-static int dmc520_edac_remove(struct platform_device *pdev)
+static void dmc520_edac_remove(struct platform_device *pdev)
{
u32 reg_val, idx, irq_mask_all = 0;
struct mem_ctl_info *mci;
edac_mc_del_mc(&pdev->dev);
edac_mc_free(mci);
-
- return 0;
}
static const struct of_device_id dmc520_edac_driver_id[] = {
},
.probe = dmc520_edac_probe,
- .remove = dmc520_edac_remove
+ .remove_new = dmc520_edac_remove
};
module_platform_driver(dmc520_edac_driver);
[MEM_NVDIMM] = "Non-volatile-RAM",
[MEM_WIO2] = "Wide-IO-2",
[MEM_HBM2] = "High-bandwidth-memory-Gen2",
+ [MEM_HBM3] = "High-bandwidth-memory-Gen3",
};
EXPORT_SYMBOL_GPL(edac_mem_types);
/* read the device TYPE, looking for bridges */
pci_read_config_byte(dev, PCI_HEADER_TYPE, &header_type);
- if ((header_type & 0x7F) == PCI_HEADER_TYPE_BRIDGE)
+ if ((header_type & PCI_HEADER_TYPE_MASK) == PCI_HEADER_TYPE_BRIDGE)
get_pci_parity_status(dev, 1);
}
edac_dbg(4, "PCI HEADER TYPE= 0x%02x %s\n",
header_type, dev_name(&dev->dev));
- if ((header_type & 0x7F) == PCI_HEADER_TYPE_BRIDGE) {
+ if ((header_type & PCI_HEADER_TYPE_MASK) == PCI_HEADER_TYPE_BRIDGE) {
/* On bridges, need to examine secondary status register */
status = get_pci_parity_status(dev, 1);
return res;
}
-int fsl_mc_err_remove(struct platform_device *op)
+void fsl_mc_err_remove(struct platform_device *op)
{
struct mem_ctl_info *mci = dev_get_drvdata(&op->dev);
struct fsl_mc_pdata *pdata = mci->pvt_info;
edac_mc_del_mc(&op->dev);
edac_mc_free(mci);
- return 0;
}
int irq;
};
int fsl_mc_err_probe(struct platform_device *op);
-int fsl_mc_err_remove(struct platform_device *op);
+void fsl_mc_err_remove(struct platform_device *op);
#endif
return res;
}
-static int highbank_l2_err_remove(struct platform_device *pdev)
+static void highbank_l2_err_remove(struct platform_device *pdev)
{
struct edac_device_ctl_info *dci = platform_get_drvdata(pdev);
edac_device_del_device(&pdev->dev);
edac_device_free_ctl_info(dci);
- return 0;
}
static struct platform_driver highbank_l2_edac_driver = {
.probe = highbank_l2_err_probe,
- .remove = highbank_l2_err_remove,
+ .remove_new = highbank_l2_err_remove,
.driver = {
.name = "hb_l2_edac",
.of_match_table = hb_l2_err_of_match,
return res;
}
-static int highbank_mc_remove(struct platform_device *pdev)
+static void highbank_mc_remove(struct platform_device *pdev)
{
struct mem_ctl_info *mci = platform_get_drvdata(pdev);
edac_mc_del_mc(&pdev->dev);
edac_mc_free(mci);
- return 0;
}
static struct platform_driver highbank_mc_edac_driver = {
.probe = highbank_mc_probe,
- .remove = highbank_mc_remove,
+ .remove_new = highbank_mc_remove,
.driver = {
.name = "hb_mc_edac",
.of_match_table = hb_ddr_ctrl_of_match,
PCI_ID_TABLE_ENTRY(pci_dev_descr_i7core_nehalem),
PCI_ID_TABLE_ENTRY(pci_dev_descr_lynnfield),
PCI_ID_TABLE_ENTRY(pci_dev_descr_i7core_westmere),
- {0,} /* 0 terminated list. */
+ { NULL, }
};
/*
static const struct pci_device_id i7core_pci_tbl[] = {
{PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_X58_HUB_MGMT)},
{PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_LYNNFIELD_QPI_LINK0)},
- {0,} /* 0 terminated list. */
+ { 0, }
};
/****************************************************************************
/* Capability register E */
#define CAPID_E_OFFSET 0xf0
#define CAPID_E_IBECC BIT(12)
+#define CAPID_E_IBECC_BIT18 BIT(18)
/* Error Status */
#define ERRSTS_OFFSET 0xc8
#define ECC_ERROR_LOG_UE BIT_ULL(63)
#define ECC_ERROR_LOG_ADDR_SHIFT 5
#define ECC_ERROR_LOG_ADDR(v) GET_BITFIELD(v, 5, 38)
+#define ECC_ERROR_LOG_ADDR45(v) GET_BITFIELD(v, 5, 45)
#define ECC_ERROR_LOG_SYND(v) GET_BITFIELD(v, 46, 61)
/* Host MMIO base address */
u32 ibecc_base;
u32 ibecc_error_log_offset;
bool (*ibecc_available)(struct pci_dev *pdev);
+ /* Extract error address logged in IBECC */
+ u64 (*err_addr)(u64 ecclog);
/* Convert error address logged in IBECC to system physical address */
u64 (*err_addr_to_sys_addr)(u64 eaddr, int mc);
/* Convert error address logged in IBECC to integrated memory controller address */
#define DID_ADL_SKU3 0x4621
#define DID_ADL_SKU4 0x4641
+/* Compute die IDs for Alder Lake-N with IBECC */
+#define DID_ADL_N_SKU1 0x4614
+#define DID_ADL_N_SKU2 0x4617
+#define DID_ADL_N_SKU3 0x461b
+#define DID_ADL_N_SKU4 0x461c
+#define DID_ADL_N_SKU5 0x4673
+#define DID_ADL_N_SKU6 0x4674
+#define DID_ADL_N_SKU7 0x4675
+#define DID_ADL_N_SKU8 0x4677
+#define DID_ADL_N_SKU9 0x4678
+#define DID_ADL_N_SKU10 0x4679
+#define DID_ADL_N_SKU11 0x467c
+
+/* Compute die IDs for Raptor Lake-P with IBECC */
+#define DID_RPL_P_SKU1 0xa706
+#define DID_RPL_P_SKU2 0xa707
+#define DID_RPL_P_SKU3 0xa708
+#define DID_RPL_P_SKU4 0xa716
+#define DID_RPL_P_SKU5 0xa718
+
+/* Compute die IDs for Meteor Lake-PS with IBECC */
+#define DID_MTL_PS_SKU1 0x7d21
+#define DID_MTL_PS_SKU2 0x7d22
+#define DID_MTL_PS_SKU3 0x7d23
+#define DID_MTL_PS_SKU4 0x7d24
+
+/* Compute die IDs for Meteor Lake-P with IBECC */
+#define DID_MTL_P_SKU1 0x7d01
+#define DID_MTL_P_SKU2 0x7d02
+#define DID_MTL_P_SKU3 0x7d14
+
+static int get_mchbar(struct pci_dev *pdev, u64 *mchbar)
+{
+ union {
+ u64 v;
+ struct {
+ u32 v_lo;
+ u32 v_hi;
+ };
+ } u;
+
+ if (pci_read_config_dword(pdev, MCHBAR_OFFSET, &u.v_lo)) {
+ igen6_printk(KERN_ERR, "Failed to read lower MCHBAR\n");
+ return -ENODEV;
+ }
+
+ if (pci_read_config_dword(pdev, MCHBAR_OFFSET + 4, &u.v_hi)) {
+ igen6_printk(KERN_ERR, "Failed to read upper MCHBAR\n");
+ return -ENODEV;
+ }
+
+ if (!(u.v & MCHBAR_EN)) {
+ igen6_printk(KERN_ERR, "MCHBAR is disabled\n");
+ return -ENODEV;
+ }
+
+ *mchbar = MCHBAR_BASE(u.v);
+
+ return 0;
+}
+
static bool ehl_ibecc_available(struct pci_dev *pdev)
{
u32 v;
return !(CAPID_E_IBECC & v);
}
+static bool mtl_p_ibecc_available(struct pci_dev *pdev)
+{
+ u32 v;
+
+ if (pci_read_config_dword(pdev, CAPID_E_OFFSET, &v))
+ return false;
+
+ return !(CAPID_E_IBECC_BIT18 & v);
+}
+
+static bool mtl_ps_ibecc_available(struct pci_dev *pdev)
+{
+#define MCHBAR_MEMSS_IBECCDIS 0x13c00
+ void __iomem *window;
+ u64 mchbar;
+ u32 val;
+
+ if (get_mchbar(pdev, &mchbar))
+ return false;
+
+ window = ioremap(mchbar, MCHBAR_SIZE * 2);
+ if (!window) {
+ igen6_printk(KERN_ERR, "Failed to ioremap 0x%llx\n", mchbar);
+ return false;
+ }
+
+ val = readl(window + MCHBAR_MEMSS_IBECCDIS);
+ iounmap(window);
+
+ /* Bit6: 1 - IBECC is disabled, 0 - IBECC isn't disabled */
+ return !GET_BITFIELD(val, 6, 6);
+}
+
static u64 mem_addr_to_sys_addr(u64 maddr)
{
if (maddr < igen6_tolud)
return imc_addr;
}
+static u64 rpl_p_err_addr(u64 ecclog)
+{
+ return ECC_ERROR_LOG_ADDR45(ecclog);
+}
+
static struct res_config ehl_cfg = {
.num_imc = 1,
.imc_base = 0x5000,
.err_addr_to_imc_addr = adl_err_addr_to_imc_addr,
};
+static struct res_config adl_n_cfg = {
+ .machine_check = true,
+ .num_imc = 1,
+ .imc_base = 0xd800,
+ .ibecc_base = 0xd400,
+ .ibecc_error_log_offset = 0x68,
+ .ibecc_available = tgl_ibecc_available,
+ .err_addr_to_sys_addr = adl_err_addr_to_sys_addr,
+ .err_addr_to_imc_addr = adl_err_addr_to_imc_addr,
+};
+
+static struct res_config rpl_p_cfg = {
+ .machine_check = true,
+ .num_imc = 2,
+ .imc_base = 0xd800,
+ .ibecc_base = 0xd400,
+ .ibecc_error_log_offset = 0x68,
+ .ibecc_available = tgl_ibecc_available,
+ .err_addr = rpl_p_err_addr,
+ .err_addr_to_sys_addr = adl_err_addr_to_sys_addr,
+ .err_addr_to_imc_addr = adl_err_addr_to_imc_addr,
+};
+
+static struct res_config mtl_ps_cfg = {
+ .machine_check = true,
+ .num_imc = 2,
+ .imc_base = 0xd800,
+ .ibecc_base = 0xd400,
+ .ibecc_error_log_offset = 0x170,
+ .ibecc_available = mtl_ps_ibecc_available,
+ .err_addr_to_sys_addr = adl_err_addr_to_sys_addr,
+ .err_addr_to_imc_addr = adl_err_addr_to_imc_addr,
+};
+
+static struct res_config mtl_p_cfg = {
+ .machine_check = true,
+ .num_imc = 2,
+ .imc_base = 0xd800,
+ .ibecc_base = 0xd400,
+ .ibecc_error_log_offset = 0x170,
+ .ibecc_available = mtl_p_ibecc_available,
+ .err_addr_to_sys_addr = adl_err_addr_to_sys_addr,
+ .err_addr_to_imc_addr = adl_err_addr_to_imc_addr,
+};
+
static const struct pci_device_id igen6_pci_tbl[] = {
{ PCI_VDEVICE(INTEL, DID_EHL_SKU5), (kernel_ulong_t)&ehl_cfg },
{ PCI_VDEVICE(INTEL, DID_EHL_SKU6), (kernel_ulong_t)&ehl_cfg },
{ PCI_VDEVICE(INTEL, DID_ADL_SKU2), (kernel_ulong_t)&adl_cfg },
{ PCI_VDEVICE(INTEL, DID_ADL_SKU3), (kernel_ulong_t)&adl_cfg },
{ PCI_VDEVICE(INTEL, DID_ADL_SKU4), (kernel_ulong_t)&adl_cfg },
+ { PCI_VDEVICE(INTEL, DID_ADL_N_SKU1), (kernel_ulong_t)&adl_n_cfg },
+ { PCI_VDEVICE(INTEL, DID_ADL_N_SKU2), (kernel_ulong_t)&adl_n_cfg },
+ { PCI_VDEVICE(INTEL, DID_ADL_N_SKU3), (kernel_ulong_t)&adl_n_cfg },
+ { PCI_VDEVICE(INTEL, DID_ADL_N_SKU4), (kernel_ulong_t)&adl_n_cfg },
+ { PCI_VDEVICE(INTEL, DID_ADL_N_SKU5), (kernel_ulong_t)&adl_n_cfg },
+ { PCI_VDEVICE(INTEL, DID_ADL_N_SKU6), (kernel_ulong_t)&adl_n_cfg },
+ { PCI_VDEVICE(INTEL, DID_ADL_N_SKU7), (kernel_ulong_t)&adl_n_cfg },
+ { PCI_VDEVICE(INTEL, DID_ADL_N_SKU8), (kernel_ulong_t)&adl_n_cfg },
+ { PCI_VDEVICE(INTEL, DID_ADL_N_SKU9), (kernel_ulong_t)&adl_n_cfg },
+ { PCI_VDEVICE(INTEL, DID_ADL_N_SKU10), (kernel_ulong_t)&adl_n_cfg },
+ { PCI_VDEVICE(INTEL, DID_ADL_N_SKU11), (kernel_ulong_t)&adl_n_cfg },
+ { PCI_VDEVICE(INTEL, DID_RPL_P_SKU1), (kernel_ulong_t)&rpl_p_cfg },
+ { PCI_VDEVICE(INTEL, DID_RPL_P_SKU2), (kernel_ulong_t)&rpl_p_cfg },
+ { PCI_VDEVICE(INTEL, DID_RPL_P_SKU3), (kernel_ulong_t)&rpl_p_cfg },
+ { PCI_VDEVICE(INTEL, DID_RPL_P_SKU4), (kernel_ulong_t)&rpl_p_cfg },
+ { PCI_VDEVICE(INTEL, DID_RPL_P_SKU5), (kernel_ulong_t)&rpl_p_cfg },
+ { PCI_VDEVICE(INTEL, DID_MTL_PS_SKU1), (kernel_ulong_t)&mtl_ps_cfg },
+ { PCI_VDEVICE(INTEL, DID_MTL_PS_SKU2), (kernel_ulong_t)&mtl_ps_cfg },
+ { PCI_VDEVICE(INTEL, DID_MTL_PS_SKU3), (kernel_ulong_t)&mtl_ps_cfg },
+ { PCI_VDEVICE(INTEL, DID_MTL_PS_SKU4), (kernel_ulong_t)&mtl_ps_cfg },
+ { PCI_VDEVICE(INTEL, DID_MTL_P_SKU1), (kernel_ulong_t)&mtl_p_cfg },
+ { PCI_VDEVICE(INTEL, DID_MTL_P_SKU2), (kernel_ulong_t)&mtl_p_cfg },
+ { PCI_VDEVICE(INTEL, DID_MTL_P_SKU3), (kernel_ulong_t)&mtl_p_cfg },
{ },
};
MODULE_DEVICE_TABLE(pci, igen6_pci_tbl);
llist_for_each_entry_safe(node, tmp, head, llnode) {
memset(&res, 0, sizeof(res));
- eaddr = ECC_ERROR_LOG_ADDR(node->ecclog) <<
- ECC_ERROR_LOG_ADDR_SHIFT;
+ if (res_cfg->err_addr)
+ eaddr = res_cfg->err_addr(node->ecclog);
+ else
+ eaddr = ECC_ERROR_LOG_ADDR(node->ecclog) <<
+ ECC_ERROR_LOG_ADDR_SHIFT;
res.mc = node->mc;
res.sys_addr = res_cfg->err_addr_to_sys_addr(eaddr, res.mc);
res.imc_addr = res_cfg->err_addr_to_imc_addr(eaddr, res.mc);
igen6_tom = u.v & GENMASK_ULL(38, 20);
- if (pci_read_config_dword(pdev, MCHBAR_OFFSET, &u.v_lo)) {
- igen6_printk(KERN_ERR, "Failed to read lower MCHBAR\n");
+ if (get_mchbar(pdev, mchbar))
goto fail;
- }
-
- if (pci_read_config_dword(pdev, MCHBAR_OFFSET + 4, &u.v_hi)) {
- igen6_printk(KERN_ERR, "Failed to read upper MCHBAR\n");
- goto fail;
- }
-
- if (!(u.v & MCHBAR_EN)) {
- igen6_printk(KERN_ERR, "MCHBAR is disabled\n");
- goto fail;
- }
-
- *mchbar = MCHBAR_BASE(u.v);
#ifdef CONFIG_EDAC_DEBUG
if (pci_read_config_dword(pdev, TOUUD_OFFSET, &u.v_lo))
static struct platform_driver fsl_ddr_mc_err_driver = {
.probe = fsl_mc_err_probe,
- .remove = fsl_mc_err_remove,
+ .remove_new = fsl_mc_err_remove,
.driver = {
.name = "fsl_ddr_mc_err",
.of_match_table = fsl_ddr_mc_err_of_match,
"Status Register File",
};
-/* Scalable MCA error strings */
-static const char * const smca_ls_mce_desc[] = {
- "Load queue parity error",
- "Store queue parity error",
- "Miss address buffer payload parity error",
- "Level 1 TLB parity error",
- "DC Tag error type 5",
- "DC Tag error type 6",
- "DC Tag error type 1",
- "Internal error type 1",
- "Internal error type 2",
- "System Read Data Error Thread 0",
- "System Read Data Error Thread 1",
- "DC Tag error type 2",
- "DC Data error type 1 and poison consumption",
- "DC Data error type 2",
- "DC Data error type 3",
- "DC Tag error type 4",
- "Level 2 TLB parity error",
- "PDC parity error",
- "DC Tag error type 3",
- "DC Tag error type 5",
- "L2 Fill Data error",
-};
-
-static const char * const smca_ls2_mce_desc[] = {
- "An ECC error was detected on a data cache read by a probe or victimization",
- "An ECC error or L2 poison was detected on a data cache read by a load",
- "An ECC error was detected on a data cache read-modify-write by a store",
- "An ECC error or poison bit mismatch was detected on a tag read by a probe or victimization",
- "An ECC error or poison bit mismatch was detected on a tag read by a load",
- "An ECC error or poison bit mismatch was detected on a tag read by a store",
- "An ECC error was detected on an EMEM read by a load",
- "An ECC error was detected on an EMEM read-modify-write by a store",
- "A parity error was detected in an L1 TLB entry by any access",
- "A parity error was detected in an L2 TLB entry by any access",
- "A parity error was detected in a PWC entry by any access",
- "A parity error was detected in an STQ entry by any access",
- "A parity error was detected in an LDQ entry by any access",
- "A parity error was detected in a MAB entry by any access",
- "A parity error was detected in an SCB entry state field by any access",
- "A parity error was detected in an SCB entry address field by any access",
- "A parity error was detected in an SCB entry data field by any access",
- "A parity error was detected in a WCB entry by any access",
- "A poisoned line was detected in an SCB entry by any access",
- "A SystemReadDataError error was reported on read data returned from L2 for a load",
- "A SystemReadDataError error was reported on read data returned from L2 for an SCB store",
- "A SystemReadDataError error was reported on read data returned from L2 for a WCB store",
- "A hardware assertion error was reported",
- "A parity error was detected in an STLF, SCB EMEM entry or SRB store data by any access",
-};
-
-static const char * const smca_if_mce_desc[] = {
- "Op Cache Microtag Probe Port Parity Error",
- "IC Microtag or Full Tag Multi-hit Error",
- "IC Full Tag Parity Error",
- "IC Data Array Parity Error",
- "Decoupling Queue PhysAddr Parity Error",
- "L0 ITLB Parity Error",
- "L1 ITLB Parity Error",
- "L2 ITLB Parity Error",
- "BPQ Thread 0 Snoop Parity Error",
- "BPQ Thread 1 Snoop Parity Error",
- "L1 BTB Multi-Match Error",
- "L2 BTB Multi-Match Error",
- "L2 Cache Response Poison Error",
- "System Read Data Error",
- "Hardware Assertion Error",
- "L1-TLB Multi-Hit",
- "L2-TLB Multi-Hit",
- "BSR Parity Error",
- "CT MCE",
-};
-
-static const char * const smca_l2_mce_desc[] = {
- "L2M Tag Multiple-Way-Hit error",
- "L2M Tag or State Array ECC Error",
- "L2M Data Array ECC Error",
- "Hardware Assert Error",
-};
-
-static const char * const smca_de_mce_desc[] = {
- "Micro-op cache tag parity error",
- "Micro-op cache data parity error",
- "Instruction buffer parity error",
- "Micro-op queue parity error",
- "Instruction dispatch queue parity error",
- "Fetch address FIFO parity error",
- "Patch RAM data parity error",
- "Patch RAM sequencer parity error",
- "Micro-op buffer parity error",
- "Hardware Assertion MCA Error",
-};
-
-static const char * const smca_ex_mce_desc[] = {
- "Watchdog Timeout error",
- "Physical register file parity error",
- "Flag register file parity error",
- "Immediate displacement register file parity error",
- "Address generator payload parity error",
- "EX payload parity error",
- "Checkpoint queue parity error",
- "Retire dispatch queue parity error",
- "Retire status queue parity error",
- "Scheduling queue parity error",
- "Branch buffer queue parity error",
- "Hardware Assertion error",
- "Spec Map parity error",
- "Retire Map parity error",
-};
-
-static const char * const smca_fp_mce_desc[] = {
- "Physical register file (PRF) parity error",
- "Freelist (FL) parity error",
- "Schedule queue parity error",
- "NSQ parity error",
- "Retire queue (RQ) parity error",
- "Status register file (SRF) parity error",
- "Hardware assertion",
-};
-
-static const char * const smca_l3_mce_desc[] = {
- "Shadow Tag Macro ECC Error",
- "Shadow Tag Macro Multi-way-hit Error",
- "L3M Tag ECC Error",
- "L3M Tag Multi-way-hit Error",
- "L3M Data ECC Error",
- "SDP Parity Error or SystemReadDataError from XI",
- "L3 Victim Queue Parity Error",
- "L3 Hardware Assertion",
-};
-
-static const char * const smca_cs_mce_desc[] = {
- "Illegal Request",
- "Address Violation",
- "Security Violation",
- "Illegal Response",
- "Unexpected Response",
- "Request or Probe Parity Error",
- "Read Response Parity Error",
- "Atomic Request Parity Error",
- "Probe Filter ECC Error",
-};
-
-static const char * const smca_cs2_mce_desc[] = {
- "Illegal Request",
- "Address Violation",
- "Security Violation",
- "Illegal Response",
- "Unexpected Response",
- "Request or Probe Parity Error",
- "Read Response Parity Error",
- "Atomic Request Parity Error",
- "SDP read response had no match in the CS queue",
- "Probe Filter Protocol Error",
- "Probe Filter ECC Error",
- "SDP read response had an unexpected RETRY error",
- "Counter overflow error",
- "Counter underflow error",
-};
-
-static const char * const smca_pie_mce_desc[] = {
- "Hardware Assert",
- "Register security violation",
- "Link Error",
- "Poison data consumption",
- "A deferred error was detected in the DF"
-};
-
-static const char * const smca_umc_mce_desc[] = {
- "DRAM ECC error",
- "Data poison error",
- "SDP parity error",
- "Advanced peripheral bus error",
- "Address/Command parity error",
- "Write data CRC error",
- "DCQ SRAM ECC error",
- "AES SRAM ECC error",
-};
-
-static const char * const smca_umc2_mce_desc[] = {
- "DRAM ECC error",
- "Data poison error",
- "SDP parity error",
- "Reserved",
- "Address/Command parity error",
- "Write data parity error",
- "DCQ SRAM ECC error",
- "Reserved",
- "Read data parity error",
- "Rdb SRAM ECC error",
- "RdRsp SRAM ECC error",
- "LM32 MP errors",
-};
-
-static const char * const smca_pb_mce_desc[] = {
- "An ECC error in the Parameter Block RAM array",
-};
-
-static const char * const smca_psp_mce_desc[] = {
- "An ECC or parity error in a PSP RAM instance",
-};
-
-static const char * const smca_psp2_mce_desc[] = {
- "High SRAM ECC or parity error",
- "Low SRAM ECC or parity error",
- "Instruction Cache Bank 0 ECC or parity error",
- "Instruction Cache Bank 1 ECC or parity error",
- "Instruction Tag Ram 0 parity error",
- "Instruction Tag Ram 1 parity error",
- "Data Cache Bank 0 ECC or parity error",
- "Data Cache Bank 1 ECC or parity error",
- "Data Cache Bank 2 ECC or parity error",
- "Data Cache Bank 3 ECC or parity error",
- "Data Tag Bank 0 parity error",
- "Data Tag Bank 1 parity error",
- "Data Tag Bank 2 parity error",
- "Data Tag Bank 3 parity error",
- "Dirty Data Ram parity error",
- "TLB Bank 0 parity error",
- "TLB Bank 1 parity error",
- "System Hub Read Buffer ECC or parity error",
-};
-
-static const char * const smca_smu_mce_desc[] = {
- "An ECC or parity error in an SMU RAM instance",
-};
-
-static const char * const smca_smu2_mce_desc[] = {
- "High SRAM ECC or parity error",
- "Low SRAM ECC or parity error",
- "Data Cache Bank A ECC or parity error",
- "Data Cache Bank B ECC or parity error",
- "Data Tag Cache Bank A ECC or parity error",
- "Data Tag Cache Bank B ECC or parity error",
- "Instruction Cache Bank A ECC or parity error",
- "Instruction Cache Bank B ECC or parity error",
- "Instruction Tag Cache Bank A ECC or parity error",
- "Instruction Tag Cache Bank B ECC or parity error",
- "System Hub Read Buffer ECC or parity error",
- "PHY RAM ECC error",
-};
-
-static const char * const smca_mp5_mce_desc[] = {
- "High SRAM ECC or parity error",
- "Low SRAM ECC or parity error",
- "Data Cache Bank A ECC or parity error",
- "Data Cache Bank B ECC or parity error",
- "Data Tag Cache Bank A ECC or parity error",
- "Data Tag Cache Bank B ECC or parity error",
- "Instruction Cache Bank A ECC or parity error",
- "Instruction Cache Bank B ECC or parity error",
- "Instruction Tag Cache Bank A ECC or parity error",
- "Instruction Tag Cache Bank B ECC or parity error",
-};
-
-static const char * const smca_mpdma_mce_desc[] = {
- "Main SRAM [31:0] bank ECC or parity error",
- "Main SRAM [63:32] bank ECC or parity error",
- "Main SRAM [95:64] bank ECC or parity error",
- "Main SRAM [127:96] bank ECC or parity error",
- "Data Cache Bank A ECC or parity error",
- "Data Cache Bank B ECC or parity error",
- "Data Tag Cache Bank A ECC or parity error",
- "Data Tag Cache Bank B ECC or parity error",
- "Instruction Cache Bank A ECC or parity error",
- "Instruction Cache Bank B ECC or parity error",
- "Instruction Tag Cache Bank A ECC or parity error",
- "Instruction Tag Cache Bank B ECC or parity error",
- "Data Cache Bank A ECC or parity error",
- "Data Cache Bank B ECC or parity error",
- "Data Tag Cache Bank A ECC or parity error",
- "Data Tag Cache Bank B ECC or parity error",
- "Instruction Cache Bank A ECC or parity error",
- "Instruction Cache Bank B ECC or parity error",
- "Instruction Tag Cache Bank A ECC or parity error",
- "Instruction Tag Cache Bank B ECC or parity error",
- "Data Cache Bank A ECC or parity error",
- "Data Cache Bank B ECC or parity error",
- "Data Tag Cache Bank A ECC or parity error",
- "Data Tag Cache Bank B ECC or parity error",
- "Instruction Cache Bank A ECC or parity error",
- "Instruction Cache Bank B ECC or parity error",
- "Instruction Tag Cache Bank A ECC or parity error",
- "Instruction Tag Cache Bank B ECC or parity error",
- "System Hub Read Buffer ECC or parity error",
- "MPDMA TVF DVSEC Memory ECC or parity error",
- "MPDMA TVF MMIO Mailbox0 ECC or parity error",
- "MPDMA TVF MMIO Mailbox1 ECC or parity error",
- "MPDMA TVF Doorbell Memory ECC or parity error",
- "MPDMA TVF SDP Slave Memory 0 ECC or parity error",
- "MPDMA TVF SDP Slave Memory 1 ECC or parity error",
- "MPDMA TVF SDP Slave Memory 2 ECC or parity error",
- "MPDMA TVF SDP Master Memory 0 ECC or parity error",
- "MPDMA TVF SDP Master Memory 1 ECC or parity error",
- "MPDMA TVF SDP Master Memory 2 ECC or parity error",
- "MPDMA TVF SDP Master Memory 3 ECC or parity error",
- "MPDMA TVF SDP Master Memory 4 ECC or parity error",
- "MPDMA TVF SDP Master Memory 5 ECC or parity error",
- "MPDMA TVF SDP Master Memory 6 ECC or parity error",
- "MPDMA PTE Command FIFO ECC or parity error",
- "MPDMA PTE Hub Data FIFO ECC or parity error",
- "MPDMA PTE Internal Data FIFO ECC or parity error",
- "MPDMA PTE Command Memory DMA ECC or parity error",
- "MPDMA PTE Command Memory Internal ECC or parity error",
- "MPDMA PTE DMA Completion FIFO ECC or parity error",
- "MPDMA PTE Tablewalk Completion FIFO ECC or parity error",
- "MPDMA PTE Descriptor Completion FIFO ECC or parity error",
- "MPDMA PTE ReadOnly Completion FIFO ECC or parity error",
- "MPDMA PTE DirectWrite Completion FIFO ECC or parity error",
- "SDP Watchdog Timer expired",
-};
-
-static const char * const smca_nbio_mce_desc[] = {
- "ECC or Parity error",
- "PCIE error",
- "SDP ErrEvent error",
- "SDP Egress Poison Error",
- "IOHC Internal Poison Error",
-};
-
-static const char * const smca_pcie_mce_desc[] = {
- "CCIX PER Message logging",
- "CCIX Read Response with Status: Non-Data Error",
- "CCIX Write Response with Status: Non-Data Error",
- "CCIX Read Response with Status: Data Error",
- "CCIX Non-okay write response with data error",
-};
-
-static const char * const smca_pcie2_mce_desc[] = {
- "SDP Parity Error logging",
-};
-
-static const char * const smca_xgmipcs_mce_desc[] = {
- "Data Loss Error",
- "Training Error",
- "Flow Control Acknowledge Error",
- "Rx Fifo Underflow Error",
- "Rx Fifo Overflow Error",
- "CRC Error",
- "BER Exceeded Error",
- "Tx Vcid Data Error",
- "Replay Buffer Parity Error",
- "Data Parity Error",
- "Replay Fifo Overflow Error",
- "Replay Fifo Underflow Error",
- "Elastic Fifo Overflow Error",
- "Deskew Error",
- "Flow Control CRC Error",
- "Data Startup Limit Error",
- "FC Init Timeout Error",
- "Recovery Timeout Error",
- "Ready Serial Timeout Error",
- "Ready Serial Attempt Error",
- "Recovery Attempt Error",
- "Recovery Relock Attempt Error",
- "Replay Attempt Error",
- "Sync Header Error",
- "Tx Replay Timeout Error",
- "Rx Replay Timeout Error",
- "LinkSub Tx Timeout Error",
- "LinkSub Rx Timeout Error",
- "Rx CMD Packet Error",
-};
-
-static const char * const smca_xgmiphy_mce_desc[] = {
- "RAM ECC Error",
- "ARC instruction buffer parity error",
- "ARC data buffer parity error",
- "PHY APB error",
-};
-
-static const char * const smca_nbif_mce_desc[] = {
- "Timeout error from GMI",
- "SRAM ECC error",
- "NTB Error Event",
- "SDP Parity error",
-};
-
-static const char * const smca_sata_mce_desc[] = {
- "Parity error for port 0",
- "Parity error for port 1",
- "Parity error for port 2",
- "Parity error for port 3",
- "Parity error for port 4",
- "Parity error for port 5",
- "Parity error for port 6",
- "Parity error for port 7",
-};
-
-static const char * const smca_usb_mce_desc[] = {
- "Parity error or ECC error for S0 RAM0",
- "Parity error or ECC error for S0 RAM1",
- "Parity error or ECC error for S0 RAM2",
- "Parity error for PHY RAM0",
- "Parity error for PHY RAM1",
- "AXI Slave Response error",
-};
-
-static const char * const smca_gmipcs_mce_desc[] = {
- "Data Loss Error",
- "Training Error",
- "Replay Parity Error",
- "Rx Fifo Underflow Error",
- "Rx Fifo Overflow Error",
- "CRC Error",
- "BER Exceeded Error",
- "Tx Fifo Underflow Error",
- "Replay Buffer Parity Error",
- "Tx Overflow Error",
- "Replay Fifo Overflow Error",
- "Replay Fifo Underflow Error",
- "Elastic Fifo Overflow Error",
- "Deskew Error",
- "Offline Error",
- "Data Startup Limit Error",
- "FC Init Timeout Error",
- "Recovery Timeout Error",
- "Ready Serial Timeout Error",
- "Ready Serial Attempt Error",
- "Recovery Attempt Error",
- "Recovery Relock Attempt Error",
- "Deskew Abort Error",
- "Rx Buffer Error",
- "Rx LFDS Fifo Overflow Error",
- "Rx LFDS Fifo Underflow Error",
- "LinkSub Tx Timeout Error",
- "LinkSub Rx Timeout Error",
- "Rx CMD Packet Error",
- "LFDS Training Timeout Error",
- "LFDS FC Init Timeout Error",
- "Data Loss Error",
-};
-
-struct smca_mce_desc {
- const char * const *descs;
- unsigned int num_descs;
-};
-
-static struct smca_mce_desc smca_mce_descs[] = {
- [SMCA_LS] = { smca_ls_mce_desc, ARRAY_SIZE(smca_ls_mce_desc) },
- [SMCA_LS_V2] = { smca_ls2_mce_desc, ARRAY_SIZE(smca_ls2_mce_desc) },
- [SMCA_IF] = { smca_if_mce_desc, ARRAY_SIZE(smca_if_mce_desc) },
- [SMCA_L2_CACHE] = { smca_l2_mce_desc, ARRAY_SIZE(smca_l2_mce_desc) },
- [SMCA_DE] = { smca_de_mce_desc, ARRAY_SIZE(smca_de_mce_desc) },
- [SMCA_EX] = { smca_ex_mce_desc, ARRAY_SIZE(smca_ex_mce_desc) },
- [SMCA_FP] = { smca_fp_mce_desc, ARRAY_SIZE(smca_fp_mce_desc) },
- [SMCA_L3_CACHE] = { smca_l3_mce_desc, ARRAY_SIZE(smca_l3_mce_desc) },
- [SMCA_CS] = { smca_cs_mce_desc, ARRAY_SIZE(smca_cs_mce_desc) },
- [SMCA_CS_V2] = { smca_cs2_mce_desc, ARRAY_SIZE(smca_cs2_mce_desc) },
- [SMCA_PIE] = { smca_pie_mce_desc, ARRAY_SIZE(smca_pie_mce_desc) },
- [SMCA_UMC] = { smca_umc_mce_desc, ARRAY_SIZE(smca_umc_mce_desc) },
- [SMCA_UMC_V2] = { smca_umc2_mce_desc, ARRAY_SIZE(smca_umc2_mce_desc) },
- [SMCA_PB] = { smca_pb_mce_desc, ARRAY_SIZE(smca_pb_mce_desc) },
- [SMCA_PSP] = { smca_psp_mce_desc, ARRAY_SIZE(smca_psp_mce_desc) },
- [SMCA_PSP_V2] = { smca_psp2_mce_desc, ARRAY_SIZE(smca_psp2_mce_desc) },
- [SMCA_SMU] = { smca_smu_mce_desc, ARRAY_SIZE(smca_smu_mce_desc) },
- [SMCA_SMU_V2] = { smca_smu2_mce_desc, ARRAY_SIZE(smca_smu2_mce_desc) },
- [SMCA_MP5] = { smca_mp5_mce_desc, ARRAY_SIZE(smca_mp5_mce_desc) },
- [SMCA_MPDMA] = { smca_mpdma_mce_desc, ARRAY_SIZE(smca_mpdma_mce_desc) },
- [SMCA_NBIO] = { smca_nbio_mce_desc, ARRAY_SIZE(smca_nbio_mce_desc) },
- [SMCA_PCIE] = { smca_pcie_mce_desc, ARRAY_SIZE(smca_pcie_mce_desc) },
- [SMCA_PCIE_V2] = { smca_pcie2_mce_desc, ARRAY_SIZE(smca_pcie2_mce_desc) },
- [SMCA_XGMI_PCS] = { smca_xgmipcs_mce_desc, ARRAY_SIZE(smca_xgmipcs_mce_desc) },
- /* NBIF and SHUB have the same error descriptions, for now. */
- [SMCA_NBIF] = { smca_nbif_mce_desc, ARRAY_SIZE(smca_nbif_mce_desc) },
- [SMCA_SHUB] = { smca_nbif_mce_desc, ARRAY_SIZE(smca_nbif_mce_desc) },
- [SMCA_SATA] = { smca_sata_mce_desc, ARRAY_SIZE(smca_sata_mce_desc) },
- [SMCA_USB] = { smca_usb_mce_desc, ARRAY_SIZE(smca_usb_mce_desc) },
- [SMCA_GMI_PCS] = { smca_gmipcs_mce_desc, ARRAY_SIZE(smca_gmipcs_mce_desc) },
- /* All the PHY bank types have the same error descriptions, for now. */
- [SMCA_XGMI_PHY] = { smca_xgmiphy_mce_desc, ARRAY_SIZE(smca_xgmiphy_mce_desc) },
- [SMCA_WAFL_PHY] = { smca_xgmiphy_mce_desc, ARRAY_SIZE(smca_xgmiphy_mce_desc) },
- [SMCA_GMI_PHY] = { smca_xgmiphy_mce_desc, ARRAY_SIZE(smca_xgmiphy_mce_desc) },
-};
-
static bool f12h_mc0_mce(u16 ec, u8 xec)
{
bool ret = false;
pr_emerg(HW_ERR "Corrupted MC6 MCE info?\n");
}
+static const char * const smca_long_names[] = {
+ [SMCA_LS ... SMCA_LS_V2] = "Load Store Unit",
+ [SMCA_IF] = "Instruction Fetch Unit",
+ [SMCA_L2_CACHE] = "L2 Cache",
+ [SMCA_DE] = "Decode Unit",
+ [SMCA_RESERVED] = "Reserved",
+ [SMCA_EX] = "Execution Unit",
+ [SMCA_FP] = "Floating Point Unit",
+ [SMCA_L3_CACHE] = "L3 Cache",
+ [SMCA_CS ... SMCA_CS_V2] = "Coherent Slave",
+ [SMCA_PIE] = "Power, Interrupts, etc.",
+
+ /* UMC v2 is separate because both of them can exist in a single system. */
+ [SMCA_UMC] = "Unified Memory Controller",
+ [SMCA_UMC_V2] = "Unified Memory Controller v2",
+ [SMCA_PB] = "Parameter Block",
+ [SMCA_PSP ... SMCA_PSP_V2] = "Platform Security Processor",
+ [SMCA_SMU ... SMCA_SMU_V2] = "System Management Unit",
+ [SMCA_MP5] = "Microprocessor 5 Unit",
+ [SMCA_MPDMA] = "MPDMA Unit",
+ [SMCA_NBIO] = "Northbridge IO Unit",
+ [SMCA_PCIE ... SMCA_PCIE_V2] = "PCI Express Unit",
+ [SMCA_XGMI_PCS] = "Ext Global Memory Interconnect PCS Unit",
+ [SMCA_NBIF] = "NBIF Unit",
+ [SMCA_SHUB] = "System Hub Unit",
+ [SMCA_SATA] = "SATA Unit",
+ [SMCA_USB] = "USB Unit",
+ [SMCA_GMI_PCS] = "Global Memory Interconnect PCS Unit",
+ [SMCA_XGMI_PHY] = "Ext Global Memory Interconnect PHY Unit",
+ [SMCA_WAFL_PHY] = "WAFL PHY Unit",
+ [SMCA_GMI_PHY] = "Global Memory Interconnect PHY Unit",
+};
+
+static const char *smca_get_long_name(enum smca_bank_types t)
+{
+ if (t >= N_SMCA_BANK_TYPES)
+ return NULL;
+
+ return smca_long_names[t];
+}
+
/* Decode errors according to Scalable MCA specification */
static void decode_smca_error(struct mce *m)
{
enum smca_bank_types bank_type = smca_get_bank_type(m->extcpu, m->bank);
- const char *ip_name;
u8 xec = XEC(m->status, xec_mask);
if (bank_type >= N_SMCA_BANK_TYPES)
return;
}
- ip_name = smca_get_long_name(bank_type);
-
- pr_emerg(HW_ERR "%s Ext. Error Code: %d", ip_name, xec);
-
- /* Only print the decode of valid error codes */
- if (xec < smca_mce_descs[bank_type].num_descs)
- pr_cont(", %s.\n", smca_mce_descs[bank_type].descs[xec]);
+ pr_emerg(HW_ERR "%s Ext. Error Code: %d", smca_get_long_name(bank_type), xec);
if ((bank_type == SMCA_UMC || bank_type == SMCA_UMC_V2) &&
xec == 0 && decode_dram_ecc)
return res;
}
-static int mpc85xx_pci_err_remove(struct platform_device *op)
+static void mpc85xx_pci_err_remove(struct platform_device *op)
{
struct edac_pci_ctl_info *pci = dev_get_drvdata(&op->dev);
struct mpc85xx_pci_pdata *pdata = pci->pvt_info;
edac_pci_del_device(&op->dev);
edac_pci_free_ctl_info(pci);
-
- return 0;
}
static const struct platform_device_id mpc85xx_pci_err_match[] = {
static struct platform_driver mpc85xx_pci_err_driver = {
.probe = mpc85xx_pci_err_probe,
- .remove = mpc85xx_pci_err_remove,
+ .remove_new = mpc85xx_pci_err_remove,
.id_table = mpc85xx_pci_err_match,
.driver = {
.name = "mpc85xx_pci_err",
return res;
}
-static int mpc85xx_l2_err_remove(struct platform_device *op)
+static void mpc85xx_l2_err_remove(struct platform_device *op)
{
struct edac_device_ctl_info *edac_dev = dev_get_drvdata(&op->dev);
struct mpc85xx_l2_pdata *pdata = edac_dev->pvt_info;
out_be32(pdata->l2_vbase + MPC85XX_L2_ERRDIS, orig_l2_err_disable);
edac_device_del_device(&op->dev);
edac_device_free_ctl_info(edac_dev);
- return 0;
}
static const struct of_device_id mpc85xx_l2_err_of_match[] = {
static struct platform_driver mpc85xx_l2_err_driver = {
.probe = mpc85xx_l2_err_probe,
- .remove = mpc85xx_l2_err_remove,
+ .remove_new = mpc85xx_l2_err_remove,
.driver = {
.name = "mpc85xx_l2_err",
.of_match_table = mpc85xx_l2_err_of_match,
static struct platform_driver mpc85xx_mc_err_driver = {
.probe = fsl_mc_err_probe,
- .remove = fsl_mc_err_remove,
+ .remove_new = fsl_mc_err_remove,
.driver = {
.name = "mpc85xx_mc_err",
.of_match_table = mpc85xx_mc_err_of_match,
return rc;
}
-static int edac_remove(struct platform_device *pdev)
+static void edac_remove(struct platform_device *pdev)
{
struct mem_ctl_info *mci = platform_get_drvdata(pdev);
struct priv_data *priv = mci->pvt_info;
regmap_write(npcm_regmap, pdata->ctl_int_mask_master,
pdata->int_mask_master_global_mask);
regmap_update_bits(npcm_regmap, pdata->ctl_ecc_en, pdata->ecc_en_mask, 0);
-
- return 0;
}
static const struct npcm_platform_data npcm750_edac = {
.of_match_table = npcm_edac_of_match,
},
.probe = edac_probe,
- .remove = edac_remove,
+ .remove_new = edac_remove,
};
module_platform_driver(npcm_edac_driver);
return -ENXIO;
}
-static int octeon_l2c_remove(struct platform_device *pdev)
+static void octeon_l2c_remove(struct platform_device *pdev)
{
struct edac_device_ctl_info *l2c = platform_get_drvdata(pdev);
edac_device_del_device(&pdev->dev);
edac_device_free_ctl_info(l2c);
-
- return 0;
}
static struct platform_driver octeon_l2c_driver = {
.probe = octeon_l2c_probe,
- .remove = octeon_l2c_remove,
+ .remove_new = octeon_l2c_remove,
.driver = {
.name = "octeon_l2c_edac",
}
return 0;
}
-static int octeon_lmc_edac_remove(struct platform_device *pdev)
+static void octeon_lmc_edac_remove(struct platform_device *pdev)
{
struct mem_ctl_info *mci = platform_get_drvdata(pdev);
edac_mc_del_mc(&pdev->dev);
edac_mc_free(mci);
- return 0;
}
static struct platform_driver octeon_lmc_edac_driver = {
.probe = octeon_lmc_edac_probe,
- .remove = octeon_lmc_edac_remove,
+ .remove_new = octeon_lmc_edac_remove,
.driver = {
.name = "octeon_lmc_edac",
}
return -ENXIO;
}
-static int co_cache_error_remove(struct platform_device *pdev)
+static void co_cache_error_remove(struct platform_device *pdev)
{
struct co_cache_error *p = platform_get_drvdata(pdev);
unregister_co_cache_error_notifier(&p->notifier);
edac_device_del_device(&pdev->dev);
edac_device_free_ctl_info(p->ed);
- return 0;
}
static struct platform_driver co_cache_error_driver = {
.probe = co_cache_error_probe,
- .remove = co_cache_error_remove,
+ .remove_new = co_cache_error_remove,
.driver = {
.name = "octeon_pc_edac",
}
return res;
}
-static int octeon_pci_remove(struct platform_device *pdev)
+static void octeon_pci_remove(struct platform_device *pdev)
{
struct edac_pci_ctl_info *pci = platform_get_drvdata(pdev);
edac_pci_del_device(&pdev->dev);
edac_pci_free_ctl_info(pci);
-
- return 0;
}
static struct platform_driver octeon_pci_driver = {
.probe = octeon_pci_probe,
- .remove = octeon_pci_remove,
+ .remove_new = octeon_pci_remove,
.driver = {
.name = "octeon_pci_edac",
}
* rank, bank, row and column using the appropriate "dunit_ops" functions/parameters.
*/
-#include <linux/module.h>
+#include <linux/bitmap.h>
+#include <linux/delay.h>
+#include <linux/edac.h>
#include <linux/init.h>
+#include <linux/math64.h>
+#include <linux/mmzone.h>
+#include <linux/mod_devicetable.h>
+#include <linux/module.h>
#include <linux/pci.h>
#include <linux/pci_ids.h>
+#include <linux/sizes.h>
#include <linux/slab.h>
-#include <linux/delay.h>
-#include <linux/edac.h>
-#include <linux/mmzone.h>
#include <linux/smp.h>
-#include <linux/bitmap.h>
-#include <linux/math64.h>
-#include <linux/mod_devicetable.h>
+
#include <linux/platform_data/x86/p2sb.h>
#include <asm/cpu_device_id.h>
#define MOT_CHAN_INTLV_BIT_1SLC_2CH 12
#define MOT_CHAN_INTLV_BIT_2SLC_2CH 13
#define SELECTOR_DISABLED (-1)
-#define _4GB (1ul << 32)
#define PMI_ADDRESS_WIDTH 31
#define PND_MAX_PHYS_BIT 39
}
P2SB_READ(dword, P2SB_DATA_OFF, data);
- ret = (status >> 1) & 0x3;
+ ret = (status >> 1) & GENMASK(1, 0);
out:
/* Hide the P2SB device, if it was hidden before */
if (hidden)
static u8 sym_chan_mask;
static u8 asym_chan_mask;
-static u8 chan_mask;
+static unsigned long chan_mask;
static int slice_selector = -1;
static int chan_selector = -1;
return;
}
if (mask != GENMASK_ULL(PND_MAX_PHYS_BIT, __ffs(mask))) {
- pr_info(FW_BUG "MOT mask not power of two\n");
+ pr_info(FW_BUG "MOT mask is invalid\n");
return;
}
if (base & ~mask) {
/* Get a contiguous memory address (remove the MMIO gap) */
static u64 remove_mmio_gap(u64 sys)
{
- return (sys < _4GB) ? sys : sys - (_4GB - top_lm);
+ return (sys < SZ_4G) ? sys : sys - (SZ_4G - top_lm);
}
/* Squeeze out one address bit, shift upper part down to fill gap */
if (bitidx == -1)
return;
- mask = (1ull << bitidx) - 1;
+ mask = BIT_ULL(bitidx) - 1;
*addr = ((*addr >> 1) & ~mask) | (*addr & mask);
}
int sym_chan_shift = sym_channels >> 1;
/* Give up if address is out of range, or in MMIO gap */
- if (addr >= (1ul << PND_MAX_PHYS_BIT) ||
- (addr >= top_lm && addr < _4GB) || addr >= top_hm) {
+ if (addr >= BIT(PND_MAX_PHYS_BIT) ||
+ (addr >= top_lm && addr < SZ_4G) || addr >= top_hm) {
snprintf(msg, PND2_MSG_SIZE, "Error address 0x%llx is not DRAM", addr);
return -EINVAL;
}
}
/* Translate PMI address to memory (rank, row, bank, column) */
-#define C(n) (0x10 | (n)) /* column */
-#define B(n) (0x20 | (n)) /* bank */
-#define R(n) (0x40 | (n)) /* row */
-#define RS (0x80) /* rank */
+#define C(n) (BIT(4) | (n)) /* column */
+#define B(n) (BIT(5) | (n)) /* bank */
+#define R(n) (BIT(6) | (n)) /* row */
+#define RS (BIT(7)) /* rank */
/* addrdec values */
#define AMAP_1KB 0
int i, ret = 0;
/* Check dramtype and ECC mode for each present DIMM */
- for (i = 0; i < APL_NUM_CHANNELS; i++)
- if (chan_mask & BIT(i))
- ret += check_channel(i);
+ for_each_set_bit(i, &chan_mask, APL_NUM_CHANNELS)
+ ret += check_channel(i);
+
return ret ? -EINVAL : 0;
}
u64 capacity;
int i, g;
- for (i = 0; i < APL_NUM_CHANNELS; i++) {
- if (!(chan_mask & BIT(i)))
- continue;
-
+ for_each_set_bit(i, &chan_mask, APL_NUM_CHANNELS) {
dimm = edac_get_dimm(mci, i, 0, 0);
if (!dimm) {
edac_dbg(0, "No allocated DIMM for channel %d\n", i);
}
pvt->dimm_geom[i] = g;
- capacity = (d->rken0 + d->rken1) * 8 * (1ul << dimms[g].rowbits) *
- (1ul << dimms[g].colbits);
+ capacity = (d->rken0 + d->rken1) * 8 * BIT(dimms[g].rowbits + dimms[g].colbits);
edac_dbg(0, "Channel %d: %lld MByte DIMM\n", i, capacity >> (20 - 3));
dimm->nr_pages = MiB_TO_PAGES(capacity >> (20 - 3));
dimm->grain = 32;
continue;
}
- capacity = ranks_of_dimm[j] * banks * (1ul << rowbits) * (1ul << colbits);
+ capacity = ranks_of_dimm[j] * banks * BIT(rowbits + colbits);
edac_dbg(0, "Channel %d DIMM %d: %lld MByte DIMM\n", i, j, capacity >> (20 - 3));
dimm->nr_pages = MiB_TO_PAGES(capacity >> (20 - 3));
dimm->grain = 32;
*
* Unconditionally returns 0.
*/
-static int
-ppc4xx_edac_remove(struct platform_device *op)
+static void ppc4xx_edac_remove(struct platform_device *op)
{
struct mem_ctl_info *mci = dev_get_drvdata(&op->dev);
struct ppc4xx_edac_pdata *pdata = mci->pvt_info;
edac_mc_del_mc(mci->pdev);
edac_mc_free(mci);
-
- return 0;
}
/**
static struct platform_driver ppc4xx_edac_driver = {
.probe = ppc4xx_edac_probe,
- .remove = ppc4xx_edac_remove,
+ .remove_new = ppc4xx_edac_remove,
.driver = {
.name = PPC4XX_EDAC_MODULE_NAME,
.of_match_table = ppc4xx_edac_match,
return rc;
}
-static int qcom_llcc_edac_remove(struct platform_device *pdev)
+static void qcom_llcc_edac_remove(struct platform_device *pdev)
{
struct edac_device_ctl_info *edev_ctl = dev_get_drvdata(&pdev->dev);
edac_device_del_device(edev_ctl->dev);
edac_device_free_ctl_info(edev_ctl);
-
- return 0;
}
static const struct platform_device_id qcom_llcc_edac_id_table[] = {
static struct platform_driver qcom_llcc_edac_driver = {
.probe = qcom_llcc_edac_probe,
- .remove = qcom_llcc_edac_remove,
+ .remove_new = qcom_llcc_edac_remove,
.driver = {
.name = "qcom_llcc_edac",
},
static const struct pci_id_table pci_dev_descr_sbridge_table[] = {
PCI_ID_TABLE_ENTRY(pci_dev_descr_sbridge, ARRAY_SIZE(pci_dev_descr_sbridge), 1, SANDY_BRIDGE),
- {0,} /* 0 terminated list. */
+ { NULL, }
};
/* This changes depending if 1HA or 2HA:
static const struct pci_id_table pci_dev_descr_ibridge_table[] = {
PCI_ID_TABLE_ENTRY(pci_dev_descr_ibridge, 12, 2, IVY_BRIDGE),
- {0,} /* 0 terminated list. */
+ { NULL, }
};
/* Haswell support */
static const struct pci_id_table pci_dev_descr_haswell_table[] = {
PCI_ID_TABLE_ENTRY(pci_dev_descr_haswell, 13, 2, HASWELL),
- {0,} /* 0 terminated list. */
+ { NULL, }
};
/* Knight's Landing Support */
static const struct pci_id_table pci_dev_descr_knl_table[] = {
PCI_ID_TABLE_ENTRY(pci_dev_descr_knl, ARRAY_SIZE(pci_dev_descr_knl), 1, KNIGHTS_LANDING),
- {0,}
+ { NULL, }
};
/*
static const struct pci_id_table pci_dev_descr_broadwell_table[] = {
PCI_ID_TABLE_ENTRY(pci_dev_descr_broadwell, 10, 2, BROADWELL),
- {0,} /* 0 terminated list. */
+ { NULL, }
};
memset(&res, 0, sizeof(res));
res.mce = mce;
res.addr = mce->addr & MCI_ADDR_PHYSADDR;
+ if (!pfn_to_online_page(res.addr >> PAGE_SHIFT)) {
+ pr_err("Invalid address 0x%llx in IA32_MC%d_ADDR\n", mce->addr, mce->bank);
+ return NOTIFY_DONE;
+ }
/* Try driver decoder first */
if (!(driver_decode && driver_decode(&res))) {
*
* Return: Unconditionally 0
*/
-static int mc_remove(struct platform_device *pdev)
+static void mc_remove(struct platform_device *pdev)
{
struct mem_ctl_info *mci = platform_get_drvdata(pdev);
struct synps_edac_priv *priv = mci->pvt_info;
edac_mc_del_mc(&pdev->dev);
edac_mc_free(mci);
-
- return 0;
}
static struct platform_driver synps_edac_mc_driver = {
.of_match_table = synps_edac_match,
},
.probe = mc_probe,
- .remove = mc_remove,
+ .remove_new = mc_remove,
};
module_platform_driver(synps_edac_mc_driver);
decode_register(other, OCX_OTHER_SIZE,
ocx_com_errors, ctx->reg_com_int);
- strncat(msg, other, OCX_MESSAGE_SIZE);
+ strlcat(msg, other, OCX_MESSAGE_SIZE);
for (lane = 0; lane < OCX_RX_LANES; lane++)
if (ctx->reg_com_int & BIT(lane)) {
lane, ctx->reg_lane_int[lane],
lane, ctx->reg_lane_stat11[lane]);
- strncat(msg, other, OCX_MESSAGE_SIZE);
+ strlcat(msg, other, OCX_MESSAGE_SIZE);
decode_register(other, OCX_OTHER_SIZE,
ocx_lane_errors,
ctx->reg_lane_int[lane]);
- strncat(msg, other, OCX_MESSAGE_SIZE);
+ strlcat(msg, other, OCX_MESSAGE_SIZE);
}
if (ctx->reg_com_int & OCX_COM_INT_CE)
decode_register(other, OCX_OTHER_SIZE,
ocx_com_link_errors, ctx->reg_com_link_int);
- strncat(msg, other, OCX_MESSAGE_SIZE);
+ strlcat(msg, other, OCX_MESSAGE_SIZE);
if (ctx->reg_com_link_int & OCX_COM_LINK_INT_UE)
edac_device_handle_ue(ocx->edac_dev, 0, 0, msg);
decode_register(other, L2C_OTHER_SIZE, l2_errors, ctx->reg_int);
- strncat(msg, other, L2C_MESSAGE_SIZE);
+ strlcat(msg, other, L2C_MESSAGE_SIZE);
if (ctx->reg_int & mask_ue)
edac_device_handle_ue(l2c->edac_dev, 0, 0, msg);
return ret;
}
-static int ti_edac_remove(struct platform_device *pdev)
+static void ti_edac_remove(struct platform_device *pdev)
{
struct mem_ctl_info *mci = platform_get_drvdata(pdev);
edac_mc_del_mc(&pdev->dev);
edac_mc_free(mci);
-
- return 0;
}
static struct platform_driver ti_edac_driver = {
.probe = ti_edac_probe,
- .remove = ti_edac_remove,
+ .remove_new = ti_edac_remove,
.driver = {
.name = EDAC_MOD_NAME,
.of_match_table = ti_edac_of_match,
return rc;
}
-static int xgene_edac_remove(struct platform_device *pdev)
+static void xgene_edac_remove(struct platform_device *pdev)
{
struct xgene_edac *edac = dev_get_drvdata(&pdev->dev);
struct xgene_edac_mc_ctx *mcu;
list_for_each_entry_safe(node, temp_node, &edac->socs, next)
xgene_edac_soc_remove(node);
-
- return 0;
}
static const struct of_device_id xgene_edac_of_match[] = {
static struct platform_driver xgene_edac_driver = {
.probe = xgene_edac_probe,
- .remove = xgene_edac_remove,
+ .remove_new = xgene_edac_remove,
.driver = {
.name = "xgene-edac",
.of_match_table = xgene_edac_of_match,
return ret;
}
-static int edac_remove(struct platform_device *pdev)
+static void edac_remove(struct platform_device *pdev)
{
struct edac_device_ctl_info *dci = platform_get_drvdata(pdev);
struct edac_priv *priv = dci->pvt_info;
edac_device_del_device(&pdev->dev);
edac_device_free_ctl_info(dci);
-
- return 0;
}
static const struct of_device_id zynqmp_ocm_edac_match[] = {
.of_match_table = zynqmp_ocm_edac_match,
},
.probe = edac_probe,
- .remove = edac_remove,
+ .remove_new = edac_remove,
};
module_platform_driver(zynqmp_ocm_edac_driver);
#define QUIRK_TI_SLLZ059 0x20
#define QUIRK_IR_WAKE 0x40
+// On PCI Express Root Complex in any type of AMD Ryzen machine, VIA VT6306/6307/6308 with Asmedia
+// ASM1083/1085 brings an inconvenience that the read accesses to 'Isochronous Cycle Timer' register
+// (at offset 0xf0 in PCI I/O space) often causes unexpected system reboot. The mechanism is not
+// clear, since the read access to the other registers is enough safe; e.g. 'Node ID' register,
+// while it is probable due to detection of any type of PCIe error.
+#define QUIRK_REBOOT_BY_CYCLE_TIMER_READ 0x80000000
+
+#if IS_ENABLED(CONFIG_X86)
+
+static bool has_reboot_by_cycle_timer_read_quirk(const struct fw_ohci *ohci)
+{
+ return !!(ohci->quirks & QUIRK_REBOOT_BY_CYCLE_TIMER_READ);
+}
+
+#define PCI_DEVICE_ID_ASMEDIA_ASM108X 0x1080
+
+static bool detect_vt630x_with_asm1083_on_amd_ryzen_machine(const struct pci_dev *pdev)
+{
+ const struct pci_dev *pcie_to_pci_bridge;
+
+ // Detect any type of AMD Ryzen machine.
+ if (!static_cpu_has(X86_FEATURE_ZEN))
+ return false;
+
+ // Detect VIA VT6306/6307/6308.
+ if (pdev->vendor != PCI_VENDOR_ID_VIA)
+ return false;
+ if (pdev->device != PCI_DEVICE_ID_VIA_VT630X)
+ return false;
+
+ // Detect Asmedia ASM1083/1085.
+ pcie_to_pci_bridge = pdev->bus->self;
+ if (pcie_to_pci_bridge->vendor != PCI_VENDOR_ID_ASMEDIA)
+ return false;
+ if (pcie_to_pci_bridge->device != PCI_DEVICE_ID_ASMEDIA_ASM108X)
+ return false;
+
+ return true;
+}
+
+#else
+#define has_reboot_by_cycle_timer_read_quirk(ohci) false
+#define detect_vt630x_with_asm1083_on_amd_ryzen_machine(pdev) false
+#endif
+
/* In case of multiple matches in ohci_quirks[], only the first one is used. */
static const struct {
unsigned short vendor, device, revision, flags;
s32 diff01, diff12;
int i;
+ if (has_reboot_by_cycle_timer_read_quirk(ohci))
+ return 0;
+
c2 = reg_read(ohci, OHCI1394_IsochronousCycleTimer);
if (ohci->quirks & QUIRK_CYCLE_TIMER) {
if (param_quirks)
ohci->quirks = param_quirks;
+ if (detect_vt630x_with_asm1083_on_amd_ryzen_machine(dev))
+ ohci->quirks |= QUIRK_REBOOT_BY_CYCLE_TIMER_READ;
+
/*
* Because dma_alloc_coherent() allocates at least one page,
* we save space by using a common buffer for the AR request/
# EFI_ZBOOT_FORWARD_CFI
quiet_cmd_copy_and_pad = PAD $@
- cmd_copy_and_pad = cp $< $@ && \
- truncate -s $(shell hexdump -s16 -n4 -e '"%u"' $<) $@
+ cmd_copy_and_pad = cp $< $@; \
+ truncate -s $$(hexdump -s16 -n4 -e '"%u"' $<) $@
# Pad the file to the size of the uncompressed image in memory, including BSS
$(obj)/vmlinux.bin: $(obj)/$(EFI_ZBOOT_PAYLOAD) FORCE
efi_debug("AMI firmware v2.0 or older detected - disabling physical KASLR\n");
seed[0] = 0;
}
+
+ boot_params_ptr->hdr.loadflags |= KASLR_FLAG;
}
status = efi_random_alloc(alloc_size, CONFIG_PHYSICAL_ALIGN, &addr,
{
struct eventfd_ctx *trigger = arg;
- eventfd_signal(trigger, 1);
+ eventfd_signal(trigger);
return IRQ_HANDLED;
}
{
struct gpio_chip *gc = irq_data_get_irq_chip_data(d);
struct dwapb_gpio *gpio = to_dwapb_gpio(gc);
+ irq_hw_number_t hwirq = irqd_to_hwirq(d);
unsigned long flags;
u32 val;
raw_spin_lock_irqsave(&gc->bgpio_lock, flags);
- val = dwapb_read(gpio, GPIO_INTEN);
- val |= BIT(irqd_to_hwirq(d));
+ val = dwapb_read(gpio, GPIO_INTEN) | BIT(hwirq);
dwapb_write(gpio, GPIO_INTEN, val);
+ val = dwapb_read(gpio, GPIO_INTMASK) & ~BIT(hwirq);
+ dwapb_write(gpio, GPIO_INTMASK, val);
raw_spin_unlock_irqrestore(&gc->bgpio_lock, flags);
}
{
struct gpio_chip *gc = irq_data_get_irq_chip_data(d);
struct dwapb_gpio *gpio = to_dwapb_gpio(gc);
+ irq_hw_number_t hwirq = irqd_to_hwirq(d);
unsigned long flags;
u32 val;
raw_spin_lock_irqsave(&gc->bgpio_lock, flags);
- val = dwapb_read(gpio, GPIO_INTEN);
- val &= ~BIT(irqd_to_hwirq(d));
+ val = dwapb_read(gpio, GPIO_INTMASK) | BIT(hwirq);
+ dwapb_write(gpio, GPIO_INTMASK, val);
+ val = dwapb_read(gpio, GPIO_INTEN) & ~BIT(hwirq);
dwapb_write(gpio, GPIO_INTEN, val);
raw_spin_unlock_irqrestore(&gc->bgpio_lock, flags);
}
return 0;
}
-/*
- * gpio_ioctl() - ioctl handler for the GPIO chardev
- */
-static long gpio_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static long gpio_ioctl_unlocked(struct file *file, unsigned int cmd, unsigned long arg)
{
struct gpio_chardev_data *cdev = file->private_data;
struct gpio_device *gdev = cdev->gdev;
}
}
+/*
+ * gpio_ioctl() - ioctl handler for the GPIO chardev
+ */
+static long gpio_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+{
+ struct gpio_chardev_data *cdev = file->private_data;
+
+ return call_ioctl_locked(file, cmd, arg, cdev->gdev,
+ gpio_ioctl_unlocked);
+}
+
#ifdef CONFIG_COMPAT
static long gpio_ioctl_compat(struct file *file, unsigned int cmd,
unsigned long arg)
adev->firmware.gpu_info_fw = NULL;
- if (adev->mman.discovery_bin) {
- /*
- * FIXME: The bounding box is still needed by Navi12, so
- * temporarily read it from gpu_info firmware. Should be dropped
- * when DAL no longer needs it.
- */
- if (adev->asic_type != CHIP_NAVI12)
- return 0;
- }
+ if (adev->mman.discovery_bin)
+ return 0;
switch (adev->asic_type) {
default:
list_for_each_entry_safe(vm_bo, tmp, &vm->idle, vm_status) {
struct amdgpu_bo *bo = vm_bo->bo;
+ vm_bo->moved = true;
if (!bo || bo->tbo.type != ttm_bo_type_kernel)
list_move(&vm_bo->vm_status, &vm_bo->vm->moved);
else if (bo->parent)
if (test_bit(gpuidx, prange->bitmap_access))
bitmap_set(ctx->bitmap, gpuidx, 1);
}
+
+ /*
+ * If prange is already mapped or with always mapped flag,
+ * update mapping on GPUs with ACCESS attribute
+ */
+ if (bitmap_empty(ctx->bitmap, MAX_GPU_INSTANCE)) {
+ if (prange->mapped_to_gpu ||
+ prange->flags & KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED)
+ bitmap_copy(ctx->bitmap, prange->bitmap_access, MAX_GPU_INSTANCE);
+ }
} else {
bitmap_or(ctx->bitmap, prange->bitmap_access,
prange->bitmap_aip, MAX_GPU_INSTANCE);
}
if (bitmap_empty(ctx->bitmap, MAX_GPU_INSTANCE)) {
- bitmap_copy(ctx->bitmap, prange->bitmap_access, MAX_GPU_INSTANCE);
- if (!prange->mapped_to_gpu ||
- bitmap_empty(ctx->bitmap, MAX_GPU_INSTANCE)) {
- r = 0;
- goto free_ctx;
- }
+ r = 0;
+ goto free_ctx;
}
if (prange->actual_loc && !prange->ttm_res) {
if (stream->signal == SIGNAL_TYPE_HDMI_TYPE_A)
mod_build_hf_vsif_infopacket(stream, &stream->vsp_infopacket);
-
- if (stream->link->psr_settings.psr_feature_enabled || stream->link->replay_settings.replay_feature_enabled) {
+ else if (stream->signal == SIGNAL_TYPE_DISPLAY_PORT ||
+ stream->signal == SIGNAL_TYPE_DISPLAY_PORT_MST ||
+ stream->signal == SIGNAL_TYPE_EDP) {
//
// should decide stream support vsc sdp colorimetry capability
// before building vsc info packet
if (stream->out_transfer_func->tf == TRANSFER_FUNCTION_GAMMA22)
tf = TRANSFER_FUNC_GAMMA_22;
mod_build_vsc_infopacket(stream, &stream->vsc_infopacket, stream->output_color_space, tf);
- aconnector->psr_skip_count = AMDGPU_DM_PSR_ENTRY_DELAY;
+ if (stream->link->psr_settings.psr_feature_enabled)
+ aconnector->psr_skip_count = AMDGPU_DM_PSR_ENTRY_DELAY;
}
finish:
dc_sink_release(sink);
if (IS_ERR(mst_state))
return PTR_ERR(mst_state);
- if (!mst_state->pbn_div)
- mst_state->pbn_div = dm_mst_get_pbn_divider(aconnector->mst_root->dc_link);
+ mst_state->pbn_div = dm_mst_get_pbn_divider(aconnector->mst_root->dc_link);
if (!state->duplicated) {
int max_bpc = conn_state->max_requested_bpc;
DC_LOG_BIOS("AS_SIGNAL_TYPE_HDMI ss_percentage: %d\n", ss_info->spread_spectrum_percentage);
break;
case AS_SIGNAL_TYPE_DISPLAY_PORT:
- ss_info->spread_spectrum_percentage =
+ if (bp->base.integrated_info) {
+ DC_LOG_BIOS("gpuclk_ss_percentage (unit of 0.001 percent): %d\n", bp->base.integrated_info->gpuclk_ss_percentage);
+ ss_info->spread_spectrum_percentage =
+ bp->base.integrated_info->gpuclk_ss_percentage;
+ ss_info->type.CENTER_MODE =
+ bp->base.integrated_info->gpuclk_ss_type;
+ } else {
+ ss_info->spread_spectrum_percentage =
disp_cntl_tbl->dp_ss_percentage;
- ss_info->spread_spectrum_range =
+ ss_info->spread_spectrum_range =
disp_cntl_tbl->dp_ss_rate_10hz * 10;
- if (disp_cntl_tbl->dp_ss_mode & ATOM_SS_CENTRE_SPREAD_MODE)
- ss_info->type.CENTER_MODE = true;
-
+ if (disp_cntl_tbl->dp_ss_mode & ATOM_SS_CENTRE_SPREAD_MODE)
+ ss_info->type.CENTER_MODE = true;
+ }
DC_LOG_BIOS("AS_SIGNAL_TYPE_DISPLAY_PORT ss_percentage: %d\n", ss_info->spread_spectrum_percentage);
break;
case AS_SIGNAL_TYPE_GPU_PLL:
return BP_RESULT_BADBIOSTABLE;
info->num_chans = info_v30->channel_num;
- /* As suggested by VBIOS we should always use
- * dram_channel_width_bytes = 2 when using VRAM
- * table version 3.0. This is because the channel_width
- * param in the VRAM info table is changed in 7000 series and
- * no longer represents the memory channel width.
- */
- info->dram_channel_width_bytes = 2;
+ info->dram_channel_width_bytes = (1 << info_v30->channel_width) / 8;
return result;
}
info->ma_channel_number = info_v2_2->umachannelnumber;
info->dp_ss_control =
le16_to_cpu(info_v2_2->reserved1);
+ info->gpuclk_ss_percentage = info_v2_2->gpuclk_ss_percentage;
+ info->gpuclk_ss_type = info_v2_2->gpuclk_ss_type;
for (i = 0; i < NUMBER_OF_UCHAR_FOR_GUID; ++i) {
info->ext_disp_conn_info.gu_id[i] =
*/
bool dc_is_dmub_outbox_supported(struct dc *dc)
{
- /* DCN31 B0 USB4 DPIA needs dmub notifications for interrupts */
- if (dc->ctx->asic_id.chip_family == FAMILY_YELLOW_CARP &&
- dc->ctx->asic_id.hw_internal_rev == YELLOW_CARP_B0 &&
- !dc->debug.dpia_debug.bits.disable_dpia)
- return true;
+ switch (dc->ctx->asic_id.chip_family) {
- if (dc->ctx->asic_id.chip_family == AMDGPU_FAMILY_GC_11_0_1 &&
- !dc->debug.dpia_debug.bits.disable_dpia)
- return true;
+ case FAMILY_YELLOW_CARP:
+ /* DCN31 B0 USB4 DPIA needs dmub notifications for interrupts */
+ if (dc->ctx->asic_id.hw_internal_rev == YELLOW_CARP_B0 &&
+ !dc->debug.dpia_debug.bits.disable_dpia)
+ return true;
+ break;
+
+ case AMDGPU_FAMILY_GC_11_0_1:
+ case AMDGPU_FAMILY_GC_11_5_0:
+ if (!dc->debug.dpia_debug.bits.disable_dpia)
+ return true;
+ break;
+
+ default:
+ break;
+ }
/* dmub aux needs dmub notifications to be enabled */
return dc->debug.enable_dmub_aux_for_legacy_ddc;
+
}
/**
.use_urgent_burst_bw = 0
};
-struct _vcs_dpi_soc_bounding_box_st dcn2_0_nv12_soc = { 0 };
+struct _vcs_dpi_soc_bounding_box_st dcn2_0_nv12_soc = {
+ .clock_limits = {
+ {
+ .state = 0,
+ .dcfclk_mhz = 560.0,
+ .fabricclk_mhz = 560.0,
+ .dispclk_mhz = 513.0,
+ .dppclk_mhz = 513.0,
+ .phyclk_mhz = 540.0,
+ .socclk_mhz = 560.0,
+ .dscclk_mhz = 171.0,
+ .dram_speed_mts = 1069.0,
+ },
+ {
+ .state = 1,
+ .dcfclk_mhz = 694.0,
+ .fabricclk_mhz = 694.0,
+ .dispclk_mhz = 642.0,
+ .dppclk_mhz = 642.0,
+ .phyclk_mhz = 600.0,
+ .socclk_mhz = 694.0,
+ .dscclk_mhz = 214.0,
+ .dram_speed_mts = 1324.0,
+ },
+ {
+ .state = 2,
+ .dcfclk_mhz = 875.0,
+ .fabricclk_mhz = 875.0,
+ .dispclk_mhz = 734.0,
+ .dppclk_mhz = 734.0,
+ .phyclk_mhz = 810.0,
+ .socclk_mhz = 875.0,
+ .dscclk_mhz = 245.0,
+ .dram_speed_mts = 1670.0,
+ },
+ {
+ .state = 3,
+ .dcfclk_mhz = 1000.0,
+ .fabricclk_mhz = 1000.0,
+ .dispclk_mhz = 1100.0,
+ .dppclk_mhz = 1100.0,
+ .phyclk_mhz = 810.0,
+ .socclk_mhz = 1000.0,
+ .dscclk_mhz = 367.0,
+ .dram_speed_mts = 2000.0,
+ },
+ {
+ .state = 4,
+ .dcfclk_mhz = 1200.0,
+ .fabricclk_mhz = 1200.0,
+ .dispclk_mhz = 1284.0,
+ .dppclk_mhz = 1284.0,
+ .phyclk_mhz = 810.0,
+ .socclk_mhz = 1200.0,
+ .dscclk_mhz = 428.0,
+ .dram_speed_mts = 2000.0,
+ },
+ {
+ .state = 5,
+ .dcfclk_mhz = 1200.0,
+ .fabricclk_mhz = 1200.0,
+ .dispclk_mhz = 1284.0,
+ .dppclk_mhz = 1284.0,
+ .phyclk_mhz = 810.0,
+ .socclk_mhz = 1200.0,
+ .dscclk_mhz = 428.0,
+ .dram_speed_mts = 2000.0,
+ },
+ },
+
+ .num_states = 5,
+ .sr_exit_time_us = 1.9,
+ .sr_enter_plus_exit_time_us = 4.4,
+ .urgent_latency_us = 3.0,
+ .urgent_latency_pixel_data_only_us = 4.0,
+ .urgent_latency_pixel_mixed_with_vm_data_us = 4.0,
+ .urgent_latency_vm_data_only_us = 4.0,
+ .urgent_out_of_order_return_per_channel_pixel_only_bytes = 4096,
+ .urgent_out_of_order_return_per_channel_pixel_and_vm_bytes = 4096,
+ .urgent_out_of_order_return_per_channel_vm_only_bytes = 4096,
+ .pct_ideal_dram_sdp_bw_after_urgent_pixel_only = 40.0,
+ .pct_ideal_dram_sdp_bw_after_urgent_pixel_and_vm = 40.0,
+ .pct_ideal_dram_sdp_bw_after_urgent_vm_only = 40.0,
+ .max_avg_sdp_bw_use_normal_percent = 40.0,
+ .max_avg_dram_bw_use_normal_percent = 40.0,
+ .writeback_latency_us = 12.0,
+ .ideal_dram_bw_after_urgent_percent = 40.0,
+ .max_request_size_bytes = 256,
+ .dram_channel_width_bytes = 16,
+ .fabric_datapath_to_dcn_data_return_bytes = 64,
+ .dcn_downspread_percent = 0.5,
+ .downspread_percent = 0.5,
+ .dram_page_open_time_ns = 50.0,
+ .dram_rw_turnaround_time_ns = 17.5,
+ .dram_return_buffer_per_channel_bytes = 8192,
+ .round_trip_ping_latency_dcfclk_cycles = 131,
+ .urgent_out_of_order_return_per_channel_bytes = 4096,
+ .channel_interleave_bytes = 256,
+ .num_banks = 8,
+ .num_chans = 16,
+ .vmm_page_size_bytes = 4096,
+ .dram_clock_change_latency_us = 45.0,
+ .writeback_dram_clock_change_latency_us = 23.0,
+ .return_bus_width_bytes = 64,
+ .dispclk_dppclk_vco_speed_mhz = 3850,
+ .xfc_bus_transport_time_us = 20,
+ .xfc_xbuf_latency_tolerance_us = 50,
+ .use_urgent_burst_bw = 0,
+};
struct _vcs_dpi_ip_params_st dcn2_1_ip = {
.odm_capable = 1,
*OutBpp = TruncToValidBPP((1 - Downspreading / 100) * 13500, OutputLinkDPLanes, HTotal, HActive, PixelClockBackEnd, ForcedOutputLinkBPP, LinkDSCEnable, Output,
OutputFormat, DSCInputBitPerComponent, NumberOfDSCSlices, (dml_uint_t)AudioSampleRate, AudioSampleLayout, ODMModeNoDSC, ODMModeDSC, RequiredSlots);
- if (OutBpp == 0 && PHYCLKD32PerState < 20000 / 32 && DSCEnable == dml_dsc_enable_if_necessary && ForcedOutputLinkBPP == 0) {
+ if (*OutBpp == 0 && PHYCLKD32PerState < 20000 / 32 && DSCEnable == dml_dsc_enable_if_necessary && ForcedOutputLinkBPP == 0) {
*RequiresDSC = true;
LinkDSCEnable = true;
*OutBpp = TruncToValidBPP((1 - Downspreading / 100) * 13500, OutputLinkDPLanes, HTotal, HActive, PixelClockBackEnd, ForcedOutputLinkBPP, LinkDSCEnable, Output,
dc->caps.dmub_caps.subvp_psr = dc->ctx->dmub_srv->dmub->feature_caps.subvp_psr_support;
dc->caps.dmub_caps.gecc_enable = dc->ctx->dmub_srv->dmub->feature_caps.gecc_enable;
dc->caps.dmub_caps.mclk_sw = dc->ctx->dmub_srv->dmub->feature_caps.fw_assisted_mclk_switch;
+
+ if (dc->ctx->dmub_srv->dmub->fw_version <
+ DMUB_FW_VERSION(7, 0, 35)) {
+ dc->debug.force_disable_subvp = true;
+ dc->debug.disable_fpo_optimizations = true;
+ }
}
}
/* V2.1 */
struct edp_info edp1_info;
struct edp_info edp2_info;
+ uint32_t gpuclk_ss_percentage;
+ uint32_t gpuclk_ss_type;
};
/*
}
/* VSC packet set to 4 for PSR-SU, or 2 for PSR1 */
- if (stream->link->psr_settings.psr_version == DC_PSR_VERSION_SU_1)
- vsc_packet_revision = vsc_packet_rev4;
- else if (stream->link->replay_settings.config.replay_supported)
+ if (stream->link->psr_settings.psr_feature_enabled) {
+ if (stream->link->psr_settings.psr_version == DC_PSR_VERSION_SU_1)
+ vsc_packet_revision = vsc_packet_rev4;
+ else if (stream->link->psr_settings.psr_version == DC_PSR_VERSION_1)
+ vsc_packet_revision = vsc_packet_rev2;
+ }
+
+ if (stream->link->replay_settings.config.replay_supported)
vsc_packet_revision = vsc_packet_rev4;
- else if (stream->link->psr_settings.psr_version == DC_PSR_VERSION_1)
- vsc_packet_revision = vsc_packet_rev2;
/* Update to revision 5 for extended colorimetry support */
if (stream->use_vsc_sdp_for_colorimetry)
#define MAX_GFX_CLKS 8
#define MAX_CLKS 4
#define NUM_VCN 4
+#define NUM_JPEG_ENG 32
struct seq_file;
enum amd_pp_clock_type;
uint16_t padding;
};
+struct gpu_metrics_v1_5 {
+ struct metrics_table_header common_header;
+
+ /* Temperature (Celsius) */
+ uint16_t temperature_hotspot;
+ uint16_t temperature_mem;
+ uint16_t temperature_vrsoc;
+
+ /* Power (Watts) */
+ uint16_t curr_socket_power;
+
+ /* Utilization (%) */
+ uint16_t average_gfx_activity;
+ uint16_t average_umc_activity; // memory controller
+ uint16_t vcn_activity[NUM_VCN];
+ uint16_t jpeg_activity[NUM_JPEG_ENG];
+
+ /* Energy (15.259uJ (2^-16) units) */
+ uint64_t energy_accumulator;
+
+ /* Driver attached timestamp (in ns) */
+ uint64_t system_clock_counter;
+
+ /* Throttle status */
+ uint32_t throttle_status;
+
+ /* Clock Lock Status. Each bit corresponds to clock instance */
+ uint32_t gfxclk_lock_status;
+
+ /* Link width (number of lanes) and speed (in 0.1 GT/s) */
+ uint16_t pcie_link_width;
+ uint16_t pcie_link_speed;
+
+ /* XGMI bus width and bitrate (in Gbps) */
+ uint16_t xgmi_link_width;
+ uint16_t xgmi_link_speed;
+
+ /* Utilization Accumulated (%) */
+ uint32_t gfx_activity_acc;
+ uint32_t mem_activity_acc;
+
+ /*PCIE accumulated bandwidth (GB/sec) */
+ uint64_t pcie_bandwidth_acc;
+
+ /*PCIE instantaneous bandwidth (GB/sec) */
+ uint64_t pcie_bandwidth_inst;
+
+ /* PCIE L0 to recovery state transition accumulated count */
+ uint64_t pcie_l0_to_recov_count_acc;
+
+ /* PCIE replay accumulated count */
+ uint64_t pcie_replay_count_acc;
+
+ /* PCIE replay rollover accumulated count */
+ uint64_t pcie_replay_rover_count_acc;
+
+ /* PCIE NAK sent accumulated count */
+ uint32_t pcie_nak_sent_count_acc;
+
+ /* PCIE NAK received accumulated count */
+ uint32_t pcie_nak_rcvd_count_acc;
+
+ /* XGMI accumulated data transfer size(KiloBytes) */
+ uint64_t xgmi_read_data_acc[NUM_XGMI_LINKS];
+ uint64_t xgmi_write_data_acc[NUM_XGMI_LINKS];
+
+ /* PMFW attached timestamp (10ns resolution) */
+ uint64_t firmware_timestamp;
+
+ /* Current clocks (Mhz) */
+ uint16_t current_gfxclk[MAX_GFX_CLKS];
+ uint16_t current_socclk[MAX_CLKS];
+ uint16_t current_vclk0[MAX_CLKS];
+ uint16_t current_dclk0[MAX_CLKS];
+ uint16_t current_uclk;
+
+ uint16_t padding;
+};
+
/*
* gpu_metrics_v2_0 is not recommended as it's not naturally aligned.
* Use gpu_metrics_v2_1 or later instead.
if (amdgpu_dpm_is_overdrive_supported(adev))
*states = ATTR_STATE_SUPPORTED;
} else if (DEVICE_ATTR_IS(mem_busy_percent)) {
- if (adev->flags & AMD_IS_APU || gc_ver == IP_VERSION(9, 0, 1))
+ if ((adev->flags & AMD_IS_APU &&
+ gc_ver != IP_VERSION(9, 4, 3)) ||
+ gc_ver == IP_VERSION(9, 0, 1))
*states = ATTR_STATE_UNSUPPORTED;
} else if (DEVICE_ATTR_IS(pcie_bw)) {
/* PCIe Perf counters won't work on APU nodes */
VOLTAGE_GUARDBAND_COUNT
} GFX_GUARDBAND_e;
-#define SMU_METRICS_TABLE_VERSION 0x9
+#define SMU_METRICS_TABLE_VERSION 0xB
typedef struct __attribute__((packed, aligned(4))) {
uint32_t AccumulationCounter;
uint32_t PCIenReplayARolloverCountAcc; // The Pcie counter itself is accumulated
uint32_t PCIeNAKSentCountAcc; // The Pcie counter itself is accumulated
uint32_t PCIeNAKReceivedCountAcc; // The Pcie counter itself is accumulated
-} MetricsTable_t;
+
+ // VCN/JPEG ACTIVITY
+ uint32_t VcnBusy[4];
+ uint32_t JpegBusy[32];
+} MetricsTableX_t;
+
+typedef struct __attribute__((packed, aligned(4))) {
+ uint32_t AccumulationCounter;
+
+ //TEMPERATURE
+ uint32_t MaxSocketTemperature;
+ uint32_t MaxVrTemperature;
+ uint32_t MaxHbmTemperature;
+ uint64_t MaxSocketTemperatureAcc;
+ uint64_t MaxVrTemperatureAcc;
+ uint64_t MaxHbmTemperatureAcc;
+
+ //POWER
+ uint32_t SocketPowerLimit;
+ uint32_t MaxSocketPowerLimit;
+ uint32_t SocketPower;
+
+ //ENERGY
+ uint64_t Timestamp;
+ uint64_t SocketEnergyAcc;
+ uint64_t CcdEnergyAcc;
+ uint64_t XcdEnergyAcc;
+ uint64_t AidEnergyAcc;
+ uint64_t HbmEnergyAcc;
+
+ //FREQUENCY
+ uint32_t CclkFrequencyLimit;
+ uint32_t GfxclkFrequencyLimit;
+ uint32_t FclkFrequency;
+ uint32_t UclkFrequency;
+ uint32_t SocclkFrequency[4];
+ uint32_t VclkFrequency[4];
+ uint32_t DclkFrequency[4];
+ uint32_t LclkFrequency[4];
+ uint64_t GfxclkFrequencyAcc[8];
+ uint64_t CclkFrequencyAcc[96];
+
+ //FREQUENCY RANGE
+ uint32_t MaxCclkFrequency;
+ uint32_t MinCclkFrequency;
+ uint32_t MaxGfxclkFrequency;
+ uint32_t MinGfxclkFrequency;
+ uint32_t FclkFrequencyTable[4];
+ uint32_t UclkFrequencyTable[4];
+ uint32_t SocclkFrequencyTable[4];
+ uint32_t VclkFrequencyTable[4];
+ uint32_t DclkFrequencyTable[4];
+ uint32_t LclkFrequencyTable[4];
+ uint32_t MaxLclkDpmRange;
+ uint32_t MinLclkDpmRange;
+
+ //XGMI
+ uint32_t XgmiWidth;
+ uint32_t XgmiBitrate;
+ uint64_t XgmiReadBandwidthAcc[8];
+ uint64_t XgmiWriteBandwidthAcc[8];
+
+ //ACTIVITY
+ uint32_t SocketC0Residency;
+ uint32_t SocketGfxBusy;
+ uint32_t DramBandwidthUtilization;
+ uint64_t SocketC0ResidencyAcc;
+ uint64_t SocketGfxBusyAcc;
+ uint64_t DramBandwidthAcc;
+ uint32_t MaxDramBandwidth;
+ uint64_t DramBandwidthUtilizationAcc;
+ uint64_t PcieBandwidthAcc[4];
+
+ //THROTTLERS
+ uint32_t ProchotResidencyAcc;
+ uint32_t PptResidencyAcc;
+ uint32_t SocketThmResidencyAcc;
+ uint32_t VrThmResidencyAcc;
+ uint32_t HbmThmResidencyAcc;
+ uint32_t GfxLockXCDMak;
+
+ // New Items at end to maintain driver compatibility
+ uint32_t GfxclkFrequency[8];
+
+ //PSNs
+ uint64_t PublicSerialNumber_AID[4];
+ uint64_t PublicSerialNumber_XCD[8];
+ uint64_t PublicSerialNumber_CCD[12];
+
+ //XGMI Data tranfser size
+ uint64_t XgmiReadDataSizeAcc[8];//in KByte
+ uint64_t XgmiWriteDataSizeAcc[8];//in KByte
+
+ // VCN/JPEG ACTIVITY
+ uint32_t VcnBusy[4];
+ uint32_t JpegBusy[32];
+} MetricsTableA_t;
#define SMU_VF_METRICS_TABLE_VERSION 0x3
#define SMUQ10_TO_UINT(x) ((x) >> 10)
#define SMUQ10_FRAC(x) ((x) & 0x3ff)
#define SMUQ10_ROUND(x) ((SMUQ10_TO_UINT(x)) + ((SMUQ10_FRAC(x)) >= 0x200))
+#define GET_METRIC_FIELD(field) ((adev->flags & AMD_IS_APU) ?\
+ (metrics_a->field) : (metrics_x->field))
struct smu_v13_0_6_dpm_map {
enum smu_clk_type clk_type;
SMU_TABLE_INIT(tables, SMU_TABLE_PMSTATUSLOG, SMU13_TOOL_SIZE,
PAGE_SIZE, AMDGPU_GEM_DOMAIN_VRAM);
- SMU_TABLE_INIT(tables, SMU_TABLE_SMU_METRICS, sizeof(MetricsTable_t),
+ SMU_TABLE_INIT(tables, SMU_TABLE_SMU_METRICS,
+ max(sizeof(MetricsTableX_t), sizeof(MetricsTableA_t)),
PAGE_SIZE,
AMDGPU_GEM_DOMAIN_VRAM | AMDGPU_GEM_DOMAIN_GTT);
PAGE_SIZE,
AMDGPU_GEM_DOMAIN_VRAM | AMDGPU_GEM_DOMAIN_GTT);
- smu_table->metrics_table = kzalloc(sizeof(MetricsTable_t), GFP_KERNEL);
+ smu_table->metrics_table = kzalloc(max(sizeof(MetricsTableX_t),
+ sizeof(MetricsTableA_t)), GFP_KERNEL);
if (!smu_table->metrics_table)
return -ENOMEM;
smu_table->metrics_time = 0;
- smu_table->gpu_metrics_table_size = sizeof(struct gpu_metrics_v1_4);
+ smu_table->gpu_metrics_table_size = sizeof(struct gpu_metrics_v1_5);
smu_table->gpu_metrics_table =
kzalloc(smu_table->gpu_metrics_table_size, GFP_KERNEL);
if (!smu_table->gpu_metrics_table) {
static int smu_v13_0_6_setup_driver_pptable(struct smu_context *smu)
{
struct smu_table_context *smu_table = &smu->smu_table;
- MetricsTable_t *metrics = (MetricsTable_t *)smu_table->metrics_table;
+ MetricsTableX_t *metrics_x = (MetricsTableX_t *)smu_table->metrics_table;
+ MetricsTableA_t *metrics_a = (MetricsTableA_t *)smu_table->metrics_table;
struct PPTable_t *pptable =
(struct PPTable_t *)smu_table->driver_pptable;
+ struct amdgpu_device *adev = smu->adev;
int ret, i, retry = 100;
/* Store one-time values in driver PPTable */
return ret;
/* Ensure that metrics have been updated */
- if (metrics->AccumulationCounter)
+ if (GET_METRIC_FIELD(AccumulationCounter))
break;
usleep_range(1000, 1100);
return -ETIME;
pptable->MaxSocketPowerLimit =
- SMUQ10_ROUND(metrics->MaxSocketPowerLimit);
+ SMUQ10_ROUND(GET_METRIC_FIELD(MaxSocketPowerLimit));
pptable->MaxGfxclkFrequency =
- SMUQ10_ROUND(metrics->MaxGfxclkFrequency);
+ SMUQ10_ROUND(GET_METRIC_FIELD(MaxGfxclkFrequency));
pptable->MinGfxclkFrequency =
- SMUQ10_ROUND(metrics->MinGfxclkFrequency);
+ SMUQ10_ROUND(GET_METRIC_FIELD(MinGfxclkFrequency));
for (i = 0; i < 4; ++i) {
pptable->FclkFrequencyTable[i] =
- SMUQ10_ROUND(metrics->FclkFrequencyTable[i]);
+ SMUQ10_ROUND(GET_METRIC_FIELD(FclkFrequencyTable)[i]);
pptable->UclkFrequencyTable[i] =
- SMUQ10_ROUND(metrics->UclkFrequencyTable[i]);
+ SMUQ10_ROUND(GET_METRIC_FIELD(UclkFrequencyTable)[i]);
pptable->SocclkFrequencyTable[i] = SMUQ10_ROUND(
- metrics->SocclkFrequencyTable[i]);
+ GET_METRIC_FIELD(SocclkFrequencyTable)[i]);
pptable->VclkFrequencyTable[i] =
- SMUQ10_ROUND(metrics->VclkFrequencyTable[i]);
+ SMUQ10_ROUND(GET_METRIC_FIELD(VclkFrequencyTable)[i]);
pptable->DclkFrequencyTable[i] =
- SMUQ10_ROUND(metrics->DclkFrequencyTable[i]);
+ SMUQ10_ROUND(GET_METRIC_FIELD(DclkFrequencyTable)[i]);
pptable->LclkFrequencyTable[i] =
- SMUQ10_ROUND(metrics->LclkFrequencyTable[i]);
+ SMUQ10_ROUND(GET_METRIC_FIELD(LclkFrequencyTable)[i]);
}
/* use AID0 serial number by default */
- pptable->PublicSerialNumber_AID = metrics->PublicSerialNumber_AID[0];
+ pptable->PublicSerialNumber_AID = GET_METRIC_FIELD(PublicSerialNumber_AID)[0];
pptable->Init = true;
}
uint32_t *value)
{
struct smu_table_context *smu_table = &smu->smu_table;
- MetricsTable_t *metrics = (MetricsTable_t *)smu_table->metrics_table;
+ MetricsTableX_t *metrics_x = (MetricsTableX_t *)smu_table->metrics_table;
+ MetricsTableA_t *metrics_a = (MetricsTableA_t *)smu_table->metrics_table;
struct amdgpu_device *adev = smu->adev;
int ret = 0;
int xcc_id;
case METRICS_AVERAGE_GFXCLK:
if (smu->smc_fw_version >= 0x552F00) {
xcc_id = GET_INST(GC, 0);
- *value = SMUQ10_ROUND(metrics->GfxclkFrequency[xcc_id]);
+ *value = SMUQ10_ROUND(GET_METRIC_FIELD(GfxclkFrequency)[xcc_id]);
} else {
*value = 0;
}
break;
case METRICS_CURR_SOCCLK:
case METRICS_AVERAGE_SOCCLK:
- *value = SMUQ10_ROUND(metrics->SocclkFrequency[0]);
+ *value = SMUQ10_ROUND(GET_METRIC_FIELD(SocclkFrequency)[0]);
break;
case METRICS_CURR_UCLK:
case METRICS_AVERAGE_UCLK:
- *value = SMUQ10_ROUND(metrics->UclkFrequency);
+ *value = SMUQ10_ROUND(GET_METRIC_FIELD(UclkFrequency));
break;
case METRICS_CURR_VCLK:
- *value = SMUQ10_ROUND(metrics->VclkFrequency[0]);
+ *value = SMUQ10_ROUND(GET_METRIC_FIELD(VclkFrequency)[0]);
break;
case METRICS_CURR_DCLK:
- *value = SMUQ10_ROUND(metrics->DclkFrequency[0]);
+ *value = SMUQ10_ROUND(GET_METRIC_FIELD(DclkFrequency)[0]);
break;
case METRICS_CURR_FCLK:
- *value = SMUQ10_ROUND(metrics->FclkFrequency);
+ *value = SMUQ10_ROUND(GET_METRIC_FIELD(FclkFrequency));
break;
case METRICS_AVERAGE_GFXACTIVITY:
- *value = SMUQ10_ROUND(metrics->SocketGfxBusy);
+ *value = SMUQ10_ROUND(GET_METRIC_FIELD(SocketGfxBusy));
break;
case METRICS_AVERAGE_MEMACTIVITY:
- *value = SMUQ10_ROUND(metrics->DramBandwidthUtilization);
+ *value = SMUQ10_ROUND(GET_METRIC_FIELD(DramBandwidthUtilization));
break;
case METRICS_CURR_SOCKETPOWER:
- *value = SMUQ10_ROUND(metrics->SocketPower) << 8;
+ *value = SMUQ10_ROUND(GET_METRIC_FIELD(SocketPower)) << 8;
break;
case METRICS_TEMPERATURE_HOTSPOT:
- *value = SMUQ10_ROUND(metrics->MaxSocketTemperature) *
+ *value = SMUQ10_ROUND(GET_METRIC_FIELD(MaxSocketTemperature)) *
SMU_TEMPERATURE_UNITS_PER_CENTIGRADES;
break;
case METRICS_TEMPERATURE_MEM:
- *value = SMUQ10_ROUND(metrics->MaxHbmTemperature) *
+ *value = SMUQ10_ROUND(GET_METRIC_FIELD(MaxHbmTemperature)) *
SMU_TEMPERATURE_UNITS_PER_CENTIGRADES;
break;
/* This is the max of all VRs and not just SOC VR.
* No need to define another data type for the same.
*/
case METRICS_TEMPERATURE_VRSOC:
- *value = SMUQ10_ROUND(metrics->MaxVrTemperature) *
+ *value = SMUQ10_ROUND(GET_METRIC_FIELD(MaxVrTemperature)) *
SMU_TEMPERATURE_UNITS_PER_CENTIGRADES;
break;
default:
static ssize_t smu_v13_0_6_get_gpu_metrics(struct smu_context *smu, void **table)
{
struct smu_table_context *smu_table = &smu->smu_table;
- struct gpu_metrics_v1_4 *gpu_metrics =
- (struct gpu_metrics_v1_4 *)smu_table->gpu_metrics_table;
+ struct gpu_metrics_v1_5 *gpu_metrics =
+ (struct gpu_metrics_v1_5 *)smu_table->gpu_metrics_table;
struct amdgpu_device *adev = smu->adev;
- int ret = 0, xcc_id, inst, i;
- MetricsTable_t *metrics;
+ int ret = 0, xcc_id, inst, i, j;
+ MetricsTableX_t *metrics_x;
+ MetricsTableA_t *metrics_a;
u16 link_width_level;
- metrics = kzalloc(sizeof(MetricsTable_t), GFP_KERNEL);
- ret = smu_v13_0_6_get_metrics_table(smu, metrics, true);
+ metrics_x = kzalloc(max(sizeof(MetricsTableX_t), sizeof(MetricsTableA_t)), GFP_KERNEL);
+ ret = smu_v13_0_6_get_metrics_table(smu, metrics_x, true);
if (ret) {
- kfree(metrics);
+ kfree(metrics_x);
return ret;
}
- smu_cmn_init_soft_gpu_metrics(gpu_metrics, 1, 4);
+ metrics_a = (MetricsTableA_t *)metrics_x;
+
+ smu_cmn_init_soft_gpu_metrics(gpu_metrics, 1, 5);
gpu_metrics->temperature_hotspot =
- SMUQ10_ROUND(metrics->MaxSocketTemperature);
+ SMUQ10_ROUND(GET_METRIC_FIELD(MaxSocketTemperature));
/* Individual HBM stack temperature is not reported */
gpu_metrics->temperature_mem =
- SMUQ10_ROUND(metrics->MaxHbmTemperature);
+ SMUQ10_ROUND(GET_METRIC_FIELD(MaxHbmTemperature));
/* Reports max temperature of all voltage rails */
gpu_metrics->temperature_vrsoc =
- SMUQ10_ROUND(metrics->MaxVrTemperature);
+ SMUQ10_ROUND(GET_METRIC_FIELD(MaxVrTemperature));
gpu_metrics->average_gfx_activity =
- SMUQ10_ROUND(metrics->SocketGfxBusy);
+ SMUQ10_ROUND(GET_METRIC_FIELD(SocketGfxBusy));
gpu_metrics->average_umc_activity =
- SMUQ10_ROUND(metrics->DramBandwidthUtilization);
+ SMUQ10_ROUND(GET_METRIC_FIELD(DramBandwidthUtilization));
gpu_metrics->curr_socket_power =
- SMUQ10_ROUND(metrics->SocketPower);
+ SMUQ10_ROUND(GET_METRIC_FIELD(SocketPower));
/* Energy counter reported in 15.259uJ (2^-16) units */
- gpu_metrics->energy_accumulator = metrics->SocketEnergyAcc;
+ gpu_metrics->energy_accumulator = GET_METRIC_FIELD(SocketEnergyAcc);
for (i = 0; i < MAX_GFX_CLKS; i++) {
xcc_id = GET_INST(GC, i);
if (xcc_id >= 0)
gpu_metrics->current_gfxclk[i] =
- SMUQ10_ROUND(metrics->GfxclkFrequency[xcc_id]);
+ SMUQ10_ROUND(GET_METRIC_FIELD(GfxclkFrequency)[xcc_id]);
if (i < MAX_CLKS) {
gpu_metrics->current_socclk[i] =
- SMUQ10_ROUND(metrics->SocclkFrequency[i]);
+ SMUQ10_ROUND(GET_METRIC_FIELD(SocclkFrequency)[i]);
inst = GET_INST(VCN, i);
if (inst >= 0) {
gpu_metrics->current_vclk0[i] =
- SMUQ10_ROUND(metrics->VclkFrequency[inst]);
+ SMUQ10_ROUND(GET_METRIC_FIELD(VclkFrequency)[inst]);
gpu_metrics->current_dclk0[i] =
- SMUQ10_ROUND(metrics->DclkFrequency[inst]);
+ SMUQ10_ROUND(GET_METRIC_FIELD(DclkFrequency)[inst]);
}
}
}
- gpu_metrics->current_uclk = SMUQ10_ROUND(metrics->UclkFrequency);
+ gpu_metrics->current_uclk = SMUQ10_ROUND(GET_METRIC_FIELD(UclkFrequency));
/* Throttle status is not reported through metrics now */
gpu_metrics->throttle_status = 0;
/* Clock Lock Status. Each bit corresponds to each GFXCLK instance */
- gpu_metrics->gfxclk_lock_status = metrics->GfxLockXCDMak >> GET_INST(GC, 0);
+ gpu_metrics->gfxclk_lock_status = GET_METRIC_FIELD(GfxLockXCDMak) >> GET_INST(GC, 0);
if (!(adev->flags & AMD_IS_APU)) {
link_width_level = smu_v13_0_6_get_current_pcie_link_width_level(smu);
gpu_metrics->pcie_link_speed =
smu_v13_0_6_get_current_pcie_link_speed(smu);
gpu_metrics->pcie_bandwidth_acc =
- SMUQ10_ROUND(metrics->PcieBandwidthAcc[0]);
+ SMUQ10_ROUND(metrics_x->PcieBandwidthAcc[0]);
gpu_metrics->pcie_bandwidth_inst =
- SMUQ10_ROUND(metrics->PcieBandwidth[0]);
+ SMUQ10_ROUND(metrics_x->PcieBandwidth[0]);
gpu_metrics->pcie_l0_to_recov_count_acc =
- metrics->PCIeL0ToRecoveryCountAcc;
+ metrics_x->PCIeL0ToRecoveryCountAcc;
gpu_metrics->pcie_replay_count_acc =
- metrics->PCIenReplayAAcc;
+ metrics_x->PCIenReplayAAcc;
gpu_metrics->pcie_replay_rover_count_acc =
- metrics->PCIenReplayARolloverCountAcc;
+ metrics_x->PCIenReplayARolloverCountAcc;
+ gpu_metrics->pcie_nak_sent_count_acc =
+ metrics_x->PCIeNAKSentCountAcc;
+ gpu_metrics->pcie_nak_rcvd_count_acc =
+ metrics_x->PCIeNAKReceivedCountAcc;
}
gpu_metrics->system_clock_counter = ktime_get_boottime_ns();
gpu_metrics->gfx_activity_acc =
- SMUQ10_ROUND(metrics->SocketGfxBusyAcc);
+ SMUQ10_ROUND(GET_METRIC_FIELD(SocketGfxBusyAcc));
gpu_metrics->mem_activity_acc =
- SMUQ10_ROUND(metrics->DramBandwidthUtilizationAcc);
+ SMUQ10_ROUND(GET_METRIC_FIELD(DramBandwidthUtilizationAcc));
for (i = 0; i < NUM_XGMI_LINKS; i++) {
gpu_metrics->xgmi_read_data_acc[i] =
- SMUQ10_ROUND(metrics->XgmiReadDataSizeAcc[i]);
+ SMUQ10_ROUND(GET_METRIC_FIELD(XgmiReadDataSizeAcc)[i]);
gpu_metrics->xgmi_write_data_acc[i] =
- SMUQ10_ROUND(metrics->XgmiWriteDataSizeAcc[i]);
+ SMUQ10_ROUND(GET_METRIC_FIELD(XgmiWriteDataSizeAcc)[i]);
+ }
+
+ for (i = 0; i < adev->jpeg.num_jpeg_inst; ++i) {
+ inst = GET_INST(JPEG, i);
+ for (j = 0; j < adev->jpeg.num_jpeg_rings; ++j) {
+ gpu_metrics->jpeg_activity[(i * adev->jpeg.num_jpeg_rings) + j] =
+ SMUQ10_ROUND(GET_METRIC_FIELD(JpegBusy)
+ [(inst * adev->jpeg.num_jpeg_rings) + j]);
+ }
+ }
+
+ for (i = 0; i < adev->vcn.num_vcn_inst; ++i) {
+ inst = GET_INST(VCN, i);
+ gpu_metrics->vcn_activity[i] =
+ SMUQ10_ROUND(GET_METRIC_FIELD(VcnBusy)[inst]);
}
- gpu_metrics->xgmi_link_width = SMUQ10_ROUND(metrics->XgmiWidth);
- gpu_metrics->xgmi_link_speed = SMUQ10_ROUND(metrics->XgmiBitrate);
+ gpu_metrics->xgmi_link_width = SMUQ10_ROUND(GET_METRIC_FIELD(XgmiWidth));
+ gpu_metrics->xgmi_link_speed = SMUQ10_ROUND(GET_METRIC_FIELD(XgmiBitrate));
- gpu_metrics->firmware_timestamp = metrics->Timestamp;
+ gpu_metrics->firmware_timestamp = GET_METRIC_FIELD(Timestamp);
*table = (void *)gpu_metrics;
- kfree(metrics);
+ kfree(metrics_x);
return sizeof(*gpu_metrics);
}
case METRICS_VERSION(1, 4):
structure_size = sizeof(struct gpu_metrics_v1_4);
break;
+ case METRICS_VERSION(1, 5):
+ structure_size = sizeof(struct gpu_metrics_v1_5);
+ break;
case METRICS_VERSION(2, 0):
structure_size = sizeof(struct gpu_metrics_v2_0);
break;
struct ps8640 *ps_bridge = aux_to_ps8640(aux);
struct regmap *map = ps_bridge->regmap[PAGE0_DP_CNTL];
struct device *dev = &ps_bridge->page[PAGE0_DP_CNTL]->dev;
- unsigned int len = msg->size;
+ size_t len = msg->size;
unsigned int data;
unsigned int base;
int ret;
return ret;
}
- buf[i] = data;
+ if (i < msg->size)
+ buf[i] = data;
}
}
- return len;
+ return min(len, msg->size);
}
static ssize_t ps8640_aux_transfer(struct drm_dp_aux *aux,
u32 request_val = AUX_CMD_REQ(msg->request);
u8 *buf = msg->buffer;
unsigned int len = msg->size;
+ unsigned int short_len;
unsigned int val;
int ret;
u8 addr_len[SN_AUX_LENGTH_REG + 1 - SN_AUX_ADDR_19_16_REG];
}
if (val & AUX_IRQ_STATUS_AUX_SHORT) {
- ret = regmap_read(pdata->regmap, SN_AUX_LENGTH_REG, &len);
+ ret = regmap_read(pdata->regmap, SN_AUX_LENGTH_REG, &short_len);
+ len = min(len, short_len);
if (ret)
goto exit;
} else if (val & AUX_IRQ_STATUS_NAT_I2C_FAIL) {
struct syncobj_eventfd_entry *entry =
container_of(cb, struct syncobj_eventfd_entry, fence_cb);
- eventfd_signal(entry->ev_fd_ctx, 1);
+ eventfd_signal(entry->ev_fd_ctx);
syncobj_eventfd_entry_free(entry);
}
entry->fence = fence;
if (entry->flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE) {
- eventfd_signal(entry->ev_fd_ctx, 1);
+ eventfd_signal(entry->ev_fd_ctx);
syncobj_eventfd_entry_free(entry);
} else {
ret = dma_fence_add_callback(fence, &entry->fence_cb,
syncobj_eventfd_entry_fence_func);
if (ret == -ENOENT) {
- eventfd_signal(entry->ev_fd_ctx, 1);
+ eventfd_signal(entry->ev_fd_ctx);
syncobj_eventfd_entry_free(entry);
}
}
val |= XELPDP_FORWARD_CLOCK_UNGATE;
- if (is_hdmi_frl(crtc_state->port_clock))
+ if (intel_crtc_has_type(crtc_state, INTEL_OUTPUT_HDMI) &&
+ is_hdmi_frl(crtc_state->port_clock))
val |= XELPDP_DDI_CLOCK_SELECT(XELPDP_DDI_CLOCK_SELECT_DIV18CLK);
else
val |= XELPDP_DDI_CLOCK_SELECT(XELPDP_DDI_CLOCK_SELECT_MAXPCLK);
if (!active)
goto out;
- intel_dsc_get_config(pipe_config);
intel_bigjoiner_get_config(pipe_config);
+ intel_dsc_get_config(pipe_config);
if (!transcoder_is_dsi(pipe_config->cpu_transcoder) ||
DISPLAY_VER(dev_priv) >= 11)
return -EINVAL;
}
+ /*
+ * FIXME: Bigjoiner+async flip is busted currently.
+ * Remove this check once the issues are fixed.
+ */
+ if (new_crtc_state->bigjoiner_pipes) {
+ drm_dbg_kms(&i915->drm,
+ "[CRTC:%d:%s] async flip disallowed with bigjoiner\n",
+ crtc->base.base.id, crtc->base.name);
+ return -EINVAL;
+ }
+
for_each_oldnew_intel_plane_in_state(state, plane, old_plane_state,
new_plane_state, i) {
if (plane->pipe != crtc->pipe)
enum intel_dmc_id dmc_id;
/* TODO: check if the following applies to all D13+ platforms. */
- if (!IS_DG2(i915) && !IS_TIGERLAKE(i915))
+ if (!IS_TIGERLAKE(i915))
return;
for_each_dmc_id(dmc_id) {
intel_de_rmw(i915, PIPEDMC_CONTROL(pipe), PIPEDMC_ENABLE, 0);
}
+static bool is_dmc_evt_ctl_reg(struct drm_i915_private *i915,
+ enum intel_dmc_id dmc_id, i915_reg_t reg)
+{
+ u32 offset = i915_mmio_reg_offset(reg);
+ u32 start = i915_mmio_reg_offset(DMC_EVT_CTL(i915, dmc_id, 0));
+ u32 end = i915_mmio_reg_offset(DMC_EVT_CTL(i915, dmc_id, DMC_EVENT_HANDLER_COUNT_GEN12));
+
+ return offset >= start && offset < end;
+}
+
+static bool disable_dmc_evt(struct drm_i915_private *i915,
+ enum intel_dmc_id dmc_id,
+ i915_reg_t reg, u32 data)
+{
+ if (!is_dmc_evt_ctl_reg(i915, dmc_id, reg))
+ return false;
+
+ /* keep all pipe DMC events disabled by default */
+ if (dmc_id != DMC_FW_MAIN)
+ return true;
+
+ return false;
+}
+
+static u32 dmc_mmiodata(struct drm_i915_private *i915,
+ struct intel_dmc *dmc,
+ enum intel_dmc_id dmc_id, int i)
+{
+ if (disable_dmc_evt(i915, dmc_id,
+ dmc->dmc_info[dmc_id].mmioaddr[i],
+ dmc->dmc_info[dmc_id].mmiodata[i]))
+ return REG_FIELD_PREP(DMC_EVT_CTL_TYPE_MASK,
+ DMC_EVT_CTL_TYPE_EDGE_0_1) |
+ REG_FIELD_PREP(DMC_EVT_CTL_EVENT_ID_MASK,
+ DMC_EVT_CTL_EVENT_ID_FALSE);
+ else
+ return dmc->dmc_info[dmc_id].mmiodata[i];
+}
+
/**
* intel_dmc_load_program() - write the firmware from memory to register.
* @i915: i915 drm device.
for_each_dmc_id(dmc_id) {
for (i = 0; i < dmc->dmc_info[dmc_id].mmio_count; i++) {
intel_de_write(i915, dmc->dmc_info[dmc_id].mmioaddr[i],
- dmc->dmc_info[dmc_id].mmiodata[i]);
+ dmc_mmiodata(i915, dmc, dmc_id, i));
}
}
intel_dp->train_set, crtc_state->lane_count);
drm_dp_set_phy_test_pattern(&intel_dp->aux, data,
- link_status[DP_DPCD_REV]);
+ intel_dp->dpcd[DP_DPCD_REV]);
}
static u8 intel_dp_autotest_phy_pattern(struct intel_dp *intel_dp)
#define MSI_CAP_DATA(offset) (offset + 8)
#define MSI_CAP_EN 0x1
-static int inject_virtual_interrupt(struct intel_vgpu *vgpu)
+static void inject_virtual_interrupt(struct intel_vgpu *vgpu)
{
unsigned long offset = vgpu->gvt->device_info.msi_cap_offset;
u16 control, data;
/* Do not generate MSI if MSIEN is disabled */
if (!(control & MSI_CAP_EN))
- return 0;
+ return;
if (WARN(control & GENMASK(15, 1), "only support one MSI format\n"))
- return -EINVAL;
+ return;
trace_inject_msi(vgpu->id, addr, data);
* returned and don't inject interrupt into guest.
*/
if (!test_bit(INTEL_VGPU_STATUS_ATTACHED, vgpu->status))
- return -ESRCH;
- if (vgpu->msi_trigger && eventfd_signal(vgpu->msi_trigger, 1) != 1)
- return -EFAULT;
- return 0;
+ return;
+ if (vgpu->msi_trigger)
+ eventfd_signal(vgpu->msi_trigger);
}
static void propagate_event(struct intel_gvt_irq *irq,
* tau4 = (4 | x) << y
* but add 2 when doing the final right shift to account for units
*/
- tau4 = ((1 << x_w) | x) << y;
+ tau4 = (u64)((1 << x_w) | x) << y;
/* val in hwmon interface units (millisec) */
out = mul_u64_u32_shr(tau4, SF_TIME, hwmon->scl_shift_time + x_w);
r = FIELD_PREP(PKG_MAX_WIN, PKG_MAX_WIN_DEFAULT);
x = REG_FIELD_GET(PKG_MAX_WIN_X, r);
y = REG_FIELD_GET(PKG_MAX_WIN_Y, r);
- tau4 = ((1 << x_w) | x) << y;
+ tau4 = (u64)((1 << x_w) | x) << y;
max_win = mul_u64_u32_shr(tau4, SF_TIME, hwmon->scl_shift_time + x_w);
if (val > max_win)
* The reason field includes flags identifying what
* triggered this specific report (mostly timer
* triggered or e.g. due to a context switch).
- *
- * In MMIO triggered reports, some platforms do not set the
- * reason bit in this field and it is valid to have a reason
- * field of zero.
*/
reason = oa_report_reason(stream, report);
ctx_id = oa_context_id(stream, report32);
*
* Note: that we don't clear the valid_ctx_bit so userspace can
* understand that the ID has been squashed by the kernel.
+ *
+ * Update:
+ *
+ * On XEHP platforms the behavior of context id valid bit has
+ * changed compared to prior platforms. To describe this, we
+ * define a few terms:
+ *
+ * context-switch-report: This is a report with the reason type
+ * being context-switch. It is generated when a context switches
+ * out.
+ *
+ * context-valid-bit: A bit that is set in the report ID field
+ * to indicate that a valid context has been loaded.
+ *
+ * gpu-idle: A condition characterized by a
+ * context-switch-report with context-valid-bit set to 0.
+ *
+ * On prior platforms, context-id-valid bit is set to 0 only
+ * when GPU goes idle. In all other reports, it is set to 1.
+ *
+ * On XEHP platforms, context-valid-bit is set to 1 in a context
+ * switch report if a new context switched in. For all other
+ * reports it is set to 0.
+ *
+ * This change in behavior causes an issue with MMIO triggered
+ * reports. MMIO triggered reports have the markers in the
+ * context ID field and the context-valid-bit is 0. The logic
+ * below to squash the context ID would render the report
+ * useless since the user will not be able to find it in the OA
+ * buffer. Since MMIO triggered reports exist only on XEHP,
+ * we should avoid squashing these for XEHP platforms.
*/
- if (oa_report_ctx_invalid(stream, report)) {
+
+ if (oa_report_ctx_invalid(stream, report) &&
+ GRAPHICS_VER_FULL(stream->engine->i915) < IP_VER(12, 50)) {
ctx_id = INVALID_CTX_ID;
oa_context_id_squash(stream, report32);
}
.destroy = drm_plane_cleanup, \
DRM_GEM_SHADOW_PLANE_FUNCS
+void mgag200_crtc_set_gamma_linear(struct mga_device *mdev, const struct drm_format_info *format);
+void mgag200_crtc_set_gamma(struct mga_device *mdev,
+ const struct drm_format_info *format,
+ struct drm_color_lut *lut);
+
enum drm_mode_status mgag200_crtc_helper_mode_valid(struct drm_crtc *crtc,
const struct drm_display_mode *mode);
int mgag200_crtc_helper_atomic_check(struct drm_crtc *crtc, struct drm_atomic_state *new_state);
mgag200_g200er_reset_tagfifo(mdev);
+ if (crtc_state->gamma_lut)
+ mgag200_crtc_set_gamma(mdev, format, crtc_state->gamma_lut->data);
+ else
+ mgag200_crtc_set_gamma_linear(mdev, format);
+
mgag200_enable_display(mdev);
if (funcs->enable_vidrst)
mgag200_g200ev_set_hiprilvl(mdev);
+ if (crtc_state->gamma_lut)
+ mgag200_crtc_set_gamma(mdev, format, crtc_state->gamma_lut->data);
+ else
+ mgag200_crtc_set_gamma_linear(mdev, format);
+
mgag200_enable_display(mdev);
if (funcs->enable_vidrst)
mgag200_g200se_set_hiprilvl(mdev, adjusted_mode, format);
+ if (crtc_state->gamma_lut)
+ mgag200_crtc_set_gamma(mdev, format, crtc_state->gamma_lut->data);
+ else
+ mgag200_crtc_set_gamma_linear(mdev, format);
+
mgag200_enable_display(mdev);
if (funcs->enable_vidrst)
* This file contains setup code for the CRTC.
*/
-static void mgag200_crtc_set_gamma_linear(struct mga_device *mdev,
- const struct drm_format_info *format)
+void mgag200_crtc_set_gamma_linear(struct mga_device *mdev,
+ const struct drm_format_info *format)
{
int i;
}
}
-static void mgag200_crtc_set_gamma(struct mga_device *mdev,
- const struct drm_format_info *format,
- struct drm_color_lut *lut)
+void mgag200_crtc_set_gamma(struct mga_device *mdev,
+ const struct drm_format_info *format,
+ struct drm_color_lut *lut)
{
int i;
void (*rpc_done)(struct nvkm_gsp *gsp, void *repv);
void *(*rm_ctrl_get)(struct nvkm_gsp_object *, u32 cmd, u32 argc);
- void *(*rm_ctrl_push)(struct nvkm_gsp_object *, void *argv, u32 repc);
+ int (*rm_ctrl_push)(struct nvkm_gsp_object *, void **argv, u32 repc);
void (*rm_ctrl_done)(struct nvkm_gsp_object *, void *repv);
void *(*rm_alloc_get)(struct nvkm_gsp_object *, u32 oclass, u32 argc);
return object->client->gsp->rm->rm_ctrl_get(object, cmd, argc);
}
-static inline void *
+static inline int
nvkm_gsp_rm_ctrl_push(struct nvkm_gsp_object *object, void *argv, u32 repc)
{
return object->client->gsp->rm->rm_ctrl_push(object, argv, repc);
nvkm_gsp_rm_ctrl_rd(struct nvkm_gsp_object *object, u32 cmd, u32 repc)
{
void *argv = nvkm_gsp_rm_ctrl_get(object, cmd, repc);
+ int ret;
if (IS_ERR(argv))
return argv;
- return nvkm_gsp_rm_ctrl_push(object, argv, repc);
+ ret = nvkm_gsp_rm_ctrl_push(object, &argv, repc);
+ if (ret)
+ return ERR_PTR(ret);
+ return argv;
}
static inline int
nvkm_gsp_rm_ctrl_wr(struct nvkm_gsp_object *object, void *argv)
{
- void *repv = nvkm_gsp_rm_ctrl_push(object, argv, 0);
-
- if (IS_ERR(repv))
- return PTR_ERR(repv);
+ int ret = nvkm_gsp_rm_ctrl_push(object, &argv, 0);
+ if (ret)
+ return ret;
return 0;
}
if (test_bit(DMA_FENCE_FLAG_USER_BITS, &fence->base.flags)) {
struct nouveau_fence_chan *fctx = nouveau_fctx(fence);
- if (!--fctx->notify_ref)
+ if (atomic_dec_and_test(&fctx->notify_ref))
drop = 1;
}
void
nouveau_fence_context_del(struct nouveau_fence_chan *fctx)
{
+ cancel_work_sync(&fctx->allow_block_work);
nouveau_fence_context_kill(fctx, 0);
nvif_event_dtor(&fctx->event);
fctx->dead = 1;
return ret;
}
+static void
+nouveau_fence_work_allow_block(struct work_struct *work)
+{
+ struct nouveau_fence_chan *fctx = container_of(work, struct nouveau_fence_chan,
+ allow_block_work);
+
+ if (atomic_read(&fctx->notify_ref) == 0)
+ nvif_event_block(&fctx->event);
+ else
+ nvif_event_allow(&fctx->event);
+}
+
void
nouveau_fence_context_new(struct nouveau_channel *chan, struct nouveau_fence_chan *fctx)
{
} args;
int ret;
+ INIT_WORK(&fctx->allow_block_work, nouveau_fence_work_allow_block);
INIT_LIST_HEAD(&fctx->flip);
INIT_LIST_HEAD(&fctx->pending);
spin_lock_init(&fctx->lock);
struct nouveau_fence *fence = from_fence(f);
struct nouveau_fence_chan *fctx = nouveau_fctx(fence);
bool ret;
+ bool do_work;
- if (!fctx->notify_ref++)
- nvif_event_allow(&fctx->event);
+ if (atomic_inc_return(&fctx->notify_ref) == 0)
+ do_work = true;
ret = nouveau_fence_no_signaling(f);
if (ret)
set_bit(DMA_FENCE_FLAG_USER_BITS, &fence->base.flags);
- else if (!--fctx->notify_ref)
- nvif_event_block(&fctx->event);
+ else if (atomic_dec_and_test(&fctx->notify_ref))
+ do_work = true;
+
+ if (do_work)
+ schedule_work(&fctx->allow_block_work);
return ret;
}
#define __NOUVEAU_FENCE_H__
#include <linux/dma-fence.h>
+#include <linux/workqueue.h>
#include <nvif/event.h>
struct nouveau_drm;
char name[32];
struct nvif_event event;
- int notify_ref, dead, killed;
+ struct work_struct allow_block_work;
+ atomic_t notify_ref;
+ int dead, killed;
};
struct nouveau_fence_priv {
nvkm_head_del(&head);
}
- if (disp->func->dtor)
+ if (disp->func && disp->func->dtor)
disp->func->dtor(disp);
return data;
spin_lock_init(&disp->client.lock);
ret = nvkm_engine_ctor(&nvkm_disp, device, type, inst, true, &disp->engine);
- if (ret)
+ if (ret) {
+ disp->func = NULL;
return ret;
+ }
if (func->super) {
disp->super.wq = create_singlethread_workqueue("nvkm-disp");
{
struct nvkm_disp *disp = sor->disp;
NV0073_CTRL_SPECIFIC_BACKLIGHT_BRIGHTNESS_PARAMS *ctrl;
- int lvl;
+ int ret, lvl;
ctrl = nvkm_gsp_rm_ctrl_get(&disp->rm.objcom,
NV0073_CTRL_CMD_SPECIFIC_GET_BACKLIGHT_BRIGHTNESS,
ctrl->displayId = BIT(sor->asy.outp->index);
- ctrl = nvkm_gsp_rm_ctrl_push(&disp->rm.objcom, ctrl, sizeof(*ctrl));
- if (IS_ERR(ctrl))
- return PTR_ERR(ctrl);
+ ret = nvkm_gsp_rm_ctrl_push(&disp->rm.objcom, &ctrl, sizeof(*ctrl));
+ if (ret) {
+ nvkm_gsp_rm_ctrl_done(&disp->rm.objcom, ctrl);
+ return ret;
+ }
lvl = ctrl->brightness;
nvkm_gsp_rm_ctrl_done(&disp->rm.objcom, ctrl);
ctrl->subDeviceInstance = 0;
ctrl->displayId = BIT(id);
- ctrl = nvkm_gsp_rm_ctrl_push(&disp->rm.objcom, ctrl, sizeof(*ctrl));
- if (IS_ERR(ctrl))
- return (void *)ctrl;
+ ret = nvkm_gsp_rm_ctrl_push(&disp->rm.objcom, &ctrl, sizeof(*ctrl));
+ if (ret) {
+ nvkm_gsp_rm_ctrl_done(&disp->rm.objcom, ctrl);
+ return ERR_PTR(ret);
+ }
list_for_each_entry(conn, &disp->conns, head) {
if (conn->index == ctrl->data[0].index) {
struct nvkm_disp *disp = outp->disp;
struct nvkm_ior *ior;
NV0073_CTRL_DFP_ASSIGN_SOR_PARAMS *ctrl;
- int or;
+ int ret, or;
ctrl = nvkm_gsp_rm_ctrl_get(&disp->rm.objcom,
NV0073_CTRL_CMD_DFP_ASSIGN_SOR, sizeof(*ctrl));
if (hda)
ctrl->flags |= NVDEF(NV0073_CTRL, DFP_ASSIGN_SOR_FLAGS, AUDIO, OPTIMAL);
- ctrl = nvkm_gsp_rm_ctrl_push(&disp->rm.objcom, ctrl, sizeof(*ctrl));
- if (IS_ERR(ctrl))
- return PTR_ERR(ctrl);
+ ret = nvkm_gsp_rm_ctrl_push(&disp->rm.objcom, &ctrl, sizeof(*ctrl));
+ if (ret) {
+ nvkm_gsp_rm_ctrl_done(&disp->rm.objcom, ctrl);
+ return ret;
+ }
for (or = 0; or < ARRAY_SIZE(ctrl->sorAssignListWithTag); or++) {
if (ctrl->sorAssignListWithTag[or].displayMask & BIT(outp->index)) {
r535_disp_head_displayid(struct nvkm_disp *disp, int head, u32 *displayid)
{
NV0073_CTRL_SYSTEM_GET_ACTIVE_PARAMS *ctrl;
+ int ret;
ctrl = nvkm_gsp_rm_ctrl_get(&disp->rm.objcom,
NV0073_CTRL_CMD_SYSTEM_GET_ACTIVE, sizeof(*ctrl));
ctrl->subDeviceInstance = 0;
ctrl->head = head;
- ctrl = nvkm_gsp_rm_ctrl_push(&disp->rm.objcom, ctrl, sizeof(*ctrl));
- if (IS_ERR(ctrl))
- return PTR_ERR(ctrl);
+ ret = nvkm_gsp_rm_ctrl_push(&disp->rm.objcom, &ctrl, sizeof(*ctrl));
+ if (ret) {
+ nvkm_gsp_rm_ctrl_done(&disp->rm.objcom, ctrl);
+ return ret;
+ }
*displayid = ctrl->displayId;
nvkm_gsp_rm_ctrl_done(&disp->rm.objcom, ctrl);
ctrl->subDeviceInstance = 0;
ctrl->displayId = displayid;
- ctrl = nvkm_gsp_rm_ctrl_push(&disp->rm.objcom, ctrl, sizeof(*ctrl));
- if (IS_ERR(ctrl))
+ ret = nvkm_gsp_rm_ctrl_push(&disp->rm.objcom, &ctrl, sizeof(*ctrl));
+ if (ret) {
+ nvkm_gsp_rm_ctrl_done(&disp->rm.objcom, ctrl);
return NULL;
+ }
id = ctrl->index;
proto = ctrl->protocol;
{
NV0073_CTRL_DFP_GET_INFO_PARAMS *ctrl;
struct nvkm_disp *disp = outp->disp;
+ int ret;
ctrl = nvkm_gsp_rm_ctrl_get(&disp->rm.objcom, NV0073_CTRL_CMD_DFP_GET_INFO, sizeof(*ctrl));
if (IS_ERR(ctrl))
ctrl->displayId = BIT(outp->index);
- ctrl = nvkm_gsp_rm_ctrl_push(&disp->rm.objcom, ctrl, sizeof(*ctrl));
- if (IS_ERR(ctrl))
- return PTR_ERR(ctrl);
+ ret = nvkm_gsp_rm_ctrl_push(&disp->rm.objcom, &ctrl, sizeof(*ctrl));
+ if (ret) {
+ nvkm_gsp_rm_ctrl_done(&disp->rm.objcom, ctrl);
+ return ret;
+ }
nvkm_debug(&disp->engine.subdev, "DFP %08x: flags:%08x flags2:%08x\n",
ctrl->displayId, ctrl->flags, ctrl->flags2);
ctrl->subDeviceInstance = 0;
ctrl->displayMask = BIT(outp->index);
- ctrl = nvkm_gsp_rm_ctrl_push(&disp->rm.objcom, ctrl, sizeof(*ctrl));
- if (IS_ERR(ctrl))
- return PTR_ERR(ctrl);
+ ret = nvkm_gsp_rm_ctrl_push(&disp->rm.objcom, &ctrl, sizeof(*ctrl));
+ if (ret) {
+ nvkm_gsp_rm_ctrl_done(&disp->rm.objcom, ctrl);
+ return ret;
+ }
if (ctrl->displayMask & BIT(outp->index)) {
ret = r535_outp_dfp_get_info(outp);
{
NV0073_CTRL_CMD_DP_TOPOLOGY_ALLOCATE_DISPLAYID_PARAMS *ctrl;
struct nvkm_disp *disp = outp->disp;
+ int ret;
ctrl = nvkm_gsp_rm_ctrl_get(&disp->rm.objcom,
NV0073_CTRL_CMD_DP_TOPOLOGY_ALLOCATE_DISPLAYID,
ctrl->subDeviceInstance = 0;
ctrl->displayId = BIT(outp->index);
- ctrl = nvkm_gsp_rm_ctrl_push(&disp->rm.objcom, ctrl, sizeof(*ctrl));
- if (IS_ERR(ctrl))
- return PTR_ERR(ctrl);
+ ret = nvkm_gsp_rm_ctrl_push(&disp->rm.objcom, &ctrl, sizeof(*ctrl));
+ if (ret) {
+ nvkm_gsp_rm_ctrl_done(&disp->rm.objcom, ctrl);
+ return ret;
+ }
*pid = ctrl->displayIdAssigned;
nvkm_gsp_rm_ctrl_done(&disp->rm.objcom, ctrl);
{
struct nvkm_disp *disp = outp->disp;
NV0073_CTRL_DP_CTRL_PARAMS *ctrl;
- int ret;
-
- ctrl = nvkm_gsp_rm_ctrl_get(&disp->rm.objcom, NV0073_CTRL_CMD_DP_CTRL, sizeof(*ctrl));
- if (IS_ERR(ctrl))
- return PTR_ERR(ctrl);
+ int ret, retries;
+ u32 cmd, data;
- ctrl->subDeviceInstance = 0;
- ctrl->displayId = BIT(outp->index);
- ctrl->cmd = NVDEF(NV0073_CTRL, DP_CMD, SET_LANE_COUNT, TRUE) |
- NVDEF(NV0073_CTRL, DP_CMD, SET_LINK_BW, TRUE) |
- NVDEF(NV0073_CTRL, DP_CMD, TRAIN_PHY_REPEATER, YES);
- ctrl->data = NVVAL(NV0073_CTRL, DP_DATA, SET_LANE_COUNT, link_nr) |
- NVVAL(NV0073_CTRL, DP_DATA, SET_LINK_BW, link_bw) |
- NVVAL(NV0073_CTRL, DP_DATA, TARGET, target);
+ cmd = NVDEF(NV0073_CTRL, DP_CMD, SET_LANE_COUNT, TRUE) |
+ NVDEF(NV0073_CTRL, DP_CMD, SET_LINK_BW, TRUE) |
+ NVDEF(NV0073_CTRL, DP_CMD, TRAIN_PHY_REPEATER, YES);
+ data = NVVAL(NV0073_CTRL, DP_DATA, SET_LANE_COUNT, link_nr) |
+ NVVAL(NV0073_CTRL, DP_DATA, SET_LINK_BW, link_bw) |
+ NVVAL(NV0073_CTRL, DP_DATA, TARGET, target);
if (mst)
- ctrl->cmd |= NVDEF(NV0073_CTRL, DP_CMD, SET_FORMAT_MODE, MULTI_STREAM);
+ cmd |= NVDEF(NV0073_CTRL, DP_CMD, SET_FORMAT_MODE, MULTI_STREAM);
if (outp->dp.dpcd[DPCD_RC02] & DPCD_RC02_ENHANCED_FRAME_CAP)
- ctrl->cmd |= NVDEF(NV0073_CTRL, DP_CMD, SET_ENHANCED_FRAMING, TRUE);
+ cmd |= NVDEF(NV0073_CTRL, DP_CMD, SET_ENHANCED_FRAMING, TRUE);
if (target == 0 &&
(outp->dp.dpcd[DPCD_RC02] & 0x20) &&
!(outp->dp.dpcd[DPCD_RC03] & DPCD_RC03_TPS4_SUPPORTED))
- ctrl->cmd |= NVDEF(NV0073_CTRL, DP_CMD, POST_LT_ADJ_REQ_GRANTED, YES);
+ cmd |= NVDEF(NV0073_CTRL, DP_CMD, POST_LT_ADJ_REQ_GRANTED, YES);
- ctrl = nvkm_gsp_rm_ctrl_push(&disp->rm.objcom, ctrl, sizeof(*ctrl));
- if (IS_ERR(ctrl))
- return PTR_ERR(ctrl);
+ /* We should retry up to 3 times, but only if GSP asks politely */
+ for (retries = 0; retries < 3; ++retries) {
+ ctrl = nvkm_gsp_rm_ctrl_get(&disp->rm.objcom, NV0073_CTRL_CMD_DP_CTRL,
+ sizeof(*ctrl));
+ if (IS_ERR(ctrl))
+ return PTR_ERR(ctrl);
+
+ ctrl->subDeviceInstance = 0;
+ ctrl->displayId = BIT(outp->index);
+ ctrl->retryTimeMs = 0;
+ ctrl->cmd = cmd;
+ ctrl->data = data;
+
+ ret = nvkm_gsp_rm_ctrl_push(&disp->rm.objcom, &ctrl, sizeof(*ctrl));
+ if (ret == -EAGAIN && ctrl->retryTimeMs) {
+ /*
+ * Device (likely an eDP panel) isn't ready yet, wait for the time specified
+ * by GSP before retrying again
+ */
+ nvkm_debug(&disp->engine.subdev,
+ "Waiting %dms for GSP LT panel delay before retrying\n",
+ ctrl->retryTimeMs);
+ msleep(ctrl->retryTimeMs);
+ nvkm_gsp_rm_ctrl_done(&disp->rm.objcom, ctrl);
+ } else {
+ /* GSP didn't say to retry, or we were successful */
+ if (ctrl->err)
+ ret = -EIO;
+ nvkm_gsp_rm_ctrl_done(&disp->rm.objcom, ctrl);
+ break;
+ }
+ }
- ret = ctrl->err ? -EIO : 0;
- nvkm_gsp_rm_ctrl_done(&disp->rm.objcom, ctrl);
return ret;
}
ctrl->size = !ctrl->bAddrOnly ? (size - 1) : 0;
memcpy(ctrl->data, data, size);
- ctrl = nvkm_gsp_rm_ctrl_push(&disp->rm.objcom, ctrl, sizeof(*ctrl));
- if (IS_ERR(ctrl))
+ ret = nvkm_gsp_rm_ctrl_push(&disp->rm.objcom, &ctrl, sizeof(*ctrl));
+ if (ret) {
+ nvkm_gsp_rm_ctrl_done(&disp->rm.objcom, ctrl);
return PTR_ERR(ctrl);
+ }
memcpy(data, ctrl->data, size);
*psize = ctrl->size;
ctrl->subDeviceInstance = 0;
ctrl->displayId = BIT(outp->index);
- ctrl = nvkm_gsp_rm_ctrl_push(&disp->rm.objcom, ctrl, sizeof(*ctrl));
- if (IS_ERR(ctrl))
- return PTR_ERR(ctrl);
+ ret = nvkm_gsp_rm_ctrl_push(&disp->rm.objcom, &ctrl, sizeof(*ctrl));
+ if (ret) {
+ nvkm_gsp_rm_ctrl_done(&disp->rm.objcom, ctrl);
+ return ret;
+ }
+ ret = -E2BIG;
if (ctrl->bufferSize <= *psize) {
memcpy(data, ctrl->edidBuffer, ctrl->bufferSize);
*psize = ctrl->bufferSize;
ctrl->subDeviceInstance = 0;
ctrl->displayId = BIT(id);
- ctrl = nvkm_gsp_rm_ctrl_push(&disp->rm.objcom, ctrl, sizeof(*ctrl));
- if (IS_ERR(ctrl))
- return PTR_ERR(ctrl);
+ ret = nvkm_gsp_rm_ctrl_push(&disp->rm.objcom, &ctrl, sizeof(*ctrl));
+ if (ret) {
+ nvkm_gsp_rm_ctrl_done(&disp->rm.objcom, ctrl);
+ return ret;
+ }
switch (ctrl->type) {
case NV0073_CTRL_SPECIFIC_OR_TYPE_NONE:
ctrl->sorIndex = ~0;
- ctrl = nvkm_gsp_rm_ctrl_push(&disp->rm.objcom, ctrl, sizeof(*ctrl));
- if (IS_ERR(ctrl))
- return PTR_ERR(ctrl);
+ ret = nvkm_gsp_rm_ctrl_push(&disp->rm.objcom, &ctrl, sizeof(*ctrl));
+ if (ret) {
+ nvkm_gsp_rm_ctrl_done(&disp->rm.objcom, ctrl);
+ return ret;
+ }
switch (NVVAL_GET(ctrl->maxLinkRate, NV0073_CTRL_CMD, DP_GET_CAPS, MAX_LINK_RATE)) {
case NV0073_CTRL_CMD_DP_GET_CAPS_MAX_LINK_RATE_1_62:
bool nvhg = acpi_check_dsm(handle, &NVHG_DSM_GUID, NVHG_DSM_REV,
1ULL << 0x00000014);
- printk(KERN_ERR "bl: nbci:%d nvhg:%d\n", nbci, nvhg);
-
if (nbci || nvhg) {
union acpi_object argv4 = {
.buffer.type = ACPI_TYPE_BUFFER,
if (!obj) {
acpi_handle_info(handle, "failed to evaluate _DSM\n");
} else {
- printk(KERN_ERR "bl: obj type %d\n", obj->type);
- printk(KERN_ERR "bl: obj len %d\n", obj->package.count);
-
for (int i = 0; i < obj->package.count; i++) {
union acpi_object *elt = &obj->package.elements[i];
u32 size;
else
size = 4;
- printk(KERN_ERR "elt %03d: type %d size %d\n", i, elt->type, size);
memcpy(&ctrl->backLightData[ctrl->backLightDataSize], &elt->integer.value, size);
ctrl->backLightDataSize += size;
}
- printk(KERN_ERR "bl: data size %d\n", ctrl->backLightDataSize);
ctrl->status = 0;
ACPI_FREE(obj);
}
nvkm_memory_unref(&userd->mem);
nvkm_chid_put(runl->chid, userd->chid, &chan->cgrp->lock);
list_del(&userd->head);
+ kfree(userd);
}
break;
#define GSP_MSG_HDR_SIZE offsetof(struct r535_gsp_msg, data)
+static int
+r535_rpc_status_to_errno(uint32_t rpc_status)
+{
+ switch (rpc_status) {
+ case 0x55: /* NV_ERR_NOT_READY */
+ case 0x66: /* NV_ERR_TIMEOUT_RETRY */
+ return -EAGAIN;
+ case 0x51: /* NV_ERR_NO_MEMORY */
+ return -ENOMEM;
+ default:
+ return -EINVAL;
+ }
+}
+
static void *
r535_gsp_msgq_wait(struct nvkm_gsp *gsp, u32 repc, u32 *prepc, int *ptime)
{
struct nvkm_gsp_msgq_ntfy *ntfy = &gsp->msgq.ntfy[i];
if (ntfy->fn == msg->function) {
- ntfy->func(ntfy->priv, ntfy->fn, msg->data, msg->length - sizeof(*msg));
+ if (ntfy->func)
+ ntfy->func(ntfy->priv, ntfy->fn, msg->data, msg->length - sizeof(*msg));
break;
}
}
return rpc;
if (rpc->status) {
- nvkm_error(&gsp->subdev, "RM_ALLOC: 0x%x\n", rpc->status);
- ret = ERR_PTR(-EINVAL);
+ ret = ERR_PTR(r535_rpc_status_to_errno(rpc->status));
+ if (PTR_ERR(ret) != -EAGAIN)
+ nvkm_error(&gsp->subdev, "RM_ALLOC: 0x%x\n", rpc->status);
} else {
ret = repc ? rpc->params : NULL;
}
- if (IS_ERR_OR_NULL(ret))
- nvkm_gsp_rpc_done(gsp, rpc);
+ nvkm_gsp_rpc_done(gsp, rpc);
return ret;
}
{
rpc_gsp_rm_control_v03_00 *rpc = container_of(repv, typeof(*rpc), params);
+ if (!repv)
+ return;
nvkm_gsp_rpc_done(object->client->gsp, rpc);
}
-static void *
-r535_gsp_rpc_rm_ctrl_push(struct nvkm_gsp_object *object, void *argv, u32 repc)
+static int
+r535_gsp_rpc_rm_ctrl_push(struct nvkm_gsp_object *object, void **argv, u32 repc)
{
- rpc_gsp_rm_control_v03_00 *rpc = container_of(argv, typeof(*rpc), params);
+ rpc_gsp_rm_control_v03_00 *rpc = container_of((*argv), typeof(*rpc), params);
struct nvkm_gsp *gsp = object->client->gsp;
- void *ret;
+ int ret = 0;
rpc = nvkm_gsp_rpc_push(gsp, rpc, true, repc);
- if (IS_ERR_OR_NULL(rpc))
- return rpc;
+ if (IS_ERR_OR_NULL(rpc)) {
+ *argv = NULL;
+ return PTR_ERR(rpc);
+ }
if (rpc->status) {
- nvkm_error(&gsp->subdev, "cli:0x%08x obj:0x%08x ctrl cmd:0x%08x failed: 0x%08x\n",
- object->client->object.handle, object->handle, rpc->cmd, rpc->status);
- ret = ERR_PTR(-EINVAL);
- } else {
- ret = repc ? rpc->params : NULL;
+ ret = r535_rpc_status_to_errno(rpc->status);
+ if (ret != -EAGAIN)
+ nvkm_error(&gsp->subdev, "cli:0x%08x obj:0x%08x ctrl cmd:0x%08x failed: 0x%08x\n",
+ object->client->object.handle, object->handle, rpc->cmd, rpc->status);
}
- if (IS_ERR_OR_NULL(ret))
+ if (repc)
+ *argv = rpc->params;
+ else
nvkm_gsp_rpc_done(gsp, rpc);
return ret;
if (IS_ERR(ctrl))
return PTR_ERR(ctrl);
- ctrl = nvkm_gsp_rm_ctrl_push(&gsp->internal.device.subdevice, ctrl, sizeof(*ctrl));
- if (WARN_ON(IS_ERR(ctrl)))
- return PTR_ERR(ctrl);
+ ret = nvkm_gsp_rm_ctrl_push(&gsp->internal.device.subdevice, &ctrl, sizeof(*ctrl));
+ if (WARN_ON(ret)) {
+ nvkm_gsp_rm_ctrl_done(&gsp->internal.device.subdevice, ctrl);
+ return ret;
+ }
for (unsigned i = 0; i < ctrl->tableLen; i++) {
enum nvkm_subdev_type type;
if (!obj)
return;
- printk(KERN_ERR "nvop: obj type %d\n", obj->type);
- printk(KERN_ERR "nvop: obj len %d\n", obj->buffer.length);
-
if (WARN_ON(obj->type != ACPI_TYPE_BUFFER) ||
WARN_ON(obj->buffer.length != 4))
return;
caps->status = 0;
caps->optimusCaps = *(u32 *)obj->buffer.pointer;
- printk(KERN_ERR "nvop: caps %08x\n", caps->optimusCaps);
ACPI_FREE(obj);
if (!obj)
return;
- printk(KERN_ERR "jt: obj type %d\n", obj->type);
- printk(KERN_ERR "jt: obj len %d\n", obj->buffer.length);
-
if (WARN_ON(obj->type != ACPI_TYPE_BUFFER) ||
WARN_ON(obj->buffer.length != 4))
return;
jt->jtCaps = *(u32 *)obj->buffer.pointer;
jt->jtRevId = (jt->jtCaps & 0xfff00000) >> 20;
jt->bSBIOSCaps = 0;
- printk(KERN_ERR "jt: caps %08x rev:%04x\n", jt->jtCaps, jt->jtRevId);
ACPI_FREE(obj);
r535_gsp_acpi_mux_id(acpi_handle handle, u32 id, MUX_METHOD_DATA_ELEMENT *mode,
MUX_METHOD_DATA_ELEMENT *part)
{
+ union acpi_object mux_arg = { ACPI_TYPE_INTEGER };
+ struct acpi_object_list input = { 1, &mux_arg };
acpi_handle iter = NULL, handle_mux = NULL;
acpi_status status;
unsigned long long value;
if (!handle_mux)
return;
- status = acpi_evaluate_integer(handle_mux, "MXDM", NULL, &value);
+ /* I -think- 0 means "acquire" according to nvidia's driver source */
+ input.pointer->integer.type = ACPI_TYPE_INTEGER;
+ input.pointer->integer.value = 0;
+
+ status = acpi_evaluate_integer(handle_mux, "MXDM", &input, &value);
if (ACPI_SUCCESS(status)) {
mode->acpiId = id;
mode->mode = value;
mode->status = 0;
}
- status = acpi_evaluate_integer(handle_mux, "MXDS", NULL, &value);
+ status = acpi_evaluate_integer(handle_mux, "MXDS", &input, &value);
if (ACPI_SUCCESS(status)) {
part->acpiId = id;
part->mode = value;
dod->acpiIdListLen += sizeof(dod->acpiIdList[0]);
}
- printk(KERN_ERR "_DOD: ok! len:%d\n", dod->acpiIdListLen);
dod->status = 0;
+ kfree(output.pointer);
}
#endif
r535_gsp_msg_ntfy_add(gsp, NV_VGPU_MSG_EVENT_MMU_FAULT_QUEUED,
r535_gsp_msg_mmu_fault_queued, gsp);
r535_gsp_msg_ntfy_add(gsp, NV_VGPU_MSG_EVENT_OS_ERROR_LOG, r535_gsp_msg_os_error_log, gsp);
-
+ r535_gsp_msg_ntfy_add(gsp, NV_VGPU_MSG_EVENT_PERF_BRIDGELESS_INFO_UPDATE, NULL, NULL);
+ r535_gsp_msg_ntfy_add(gsp, NV_VGPU_MSG_EVENT_UCODE_LIBOS_PRINT, NULL, NULL);
+ r535_gsp_msg_ntfy_add(gsp, NV_VGPU_MSG_EVENT_GSP_SEND_USER_SHARED_DATA, NULL, NULL);
ret = r535_gsp_rm_boot_ctor(gsp);
if (ret)
return ret;
* All the controller's button values are stored in a u32.
* They can be accessed with bitwise ANDs.
*/
-static const u32 JC_BTN_Y = BIT(0);
-static const u32 JC_BTN_X = BIT(1);
-static const u32 JC_BTN_B = BIT(2);
-static const u32 JC_BTN_A = BIT(3);
-static const u32 JC_BTN_SR_R = BIT(4);
-static const u32 JC_BTN_SL_R = BIT(5);
-static const u32 JC_BTN_R = BIT(6);
-static const u32 JC_BTN_ZR = BIT(7);
-static const u32 JC_BTN_MINUS = BIT(8);
-static const u32 JC_BTN_PLUS = BIT(9);
-static const u32 JC_BTN_RSTICK = BIT(10);
-static const u32 JC_BTN_LSTICK = BIT(11);
-static const u32 JC_BTN_HOME = BIT(12);
-static const u32 JC_BTN_CAP = BIT(13); /* capture button */
-static const u32 JC_BTN_DOWN = BIT(16);
-static const u32 JC_BTN_UP = BIT(17);
-static const u32 JC_BTN_RIGHT = BIT(18);
-static const u32 JC_BTN_LEFT = BIT(19);
-static const u32 JC_BTN_SR_L = BIT(20);
-static const u32 JC_BTN_SL_L = BIT(21);
-static const u32 JC_BTN_L = BIT(22);
-static const u32 JC_BTN_ZL = BIT(23);
+#define JC_BTN_Y BIT(0)
+#define JC_BTN_X BIT(1)
+#define JC_BTN_B BIT(2)
+#define JC_BTN_A BIT(3)
+#define JC_BTN_SR_R BIT(4)
+#define JC_BTN_SL_R BIT(5)
+#define JC_BTN_R BIT(6)
+#define JC_BTN_ZR BIT(7)
+#define JC_BTN_MINUS BIT(8)
+#define JC_BTN_PLUS BIT(9)
+#define JC_BTN_RSTICK BIT(10)
+#define JC_BTN_LSTICK BIT(11)
+#define JC_BTN_HOME BIT(12)
+#define JC_BTN_CAP BIT(13) /* capture button */
+#define JC_BTN_DOWN BIT(16)
+#define JC_BTN_UP BIT(17)
+#define JC_BTN_RIGHT BIT(18)
+#define JC_BTN_LEFT BIT(19)
+#define JC_BTN_SR_L BIT(20)
+#define JC_BTN_SL_L BIT(21)
+#define JC_BTN_L BIT(22)
+#define JC_BTN_ZL BIT(23)
enum joycon_msg_type {
JOYCON_MSG_TYPE_NONE,
*/
static void joycon_calc_imu_cal_divisors(struct joycon_ctlr *ctlr)
{
- int i;
+ int i, divz = 0;
for (i = 0; i < 3; i++) {
ctlr->imu_cal_accel_divisor[i] = ctlr->accel_cal.scale[i] -
ctlr->accel_cal.offset[i];
ctlr->imu_cal_gyro_divisor[i] = ctlr->gyro_cal.scale[i] -
ctlr->gyro_cal.offset[i];
+
+ if (ctlr->imu_cal_accel_divisor[i] == 0) {
+ ctlr->imu_cal_accel_divisor[i] = 1;
+ divz++;
+ }
+
+ if (ctlr->imu_cal_gyro_divisor[i] == 0) {
+ ctlr->imu_cal_gyro_divisor[i] = 1;
+ divz++;
+ }
}
+
+ if (divz)
+ hid_warn(ctlr->hdev, "inaccurate IMU divisors (%d)\n", divz);
}
static const s16 DFLT_ACCEL_OFFSET /*= 0*/;
JC_IMU_SAMPLES_PER_DELTA_AVG) {
ctlr->imu_avg_delta_ms = ctlr->imu_delta_samples_sum /
ctlr->imu_delta_samples_count;
- /* don't ever want divide by zero shenanigans */
- if (ctlr->imu_avg_delta_ms == 0) {
- ctlr->imu_avg_delta_ms = 1;
- hid_warn(ctlr->hdev,
- "calculated avg imu delta of 0\n");
- }
ctlr->imu_delta_samples_count = 0;
ctlr->imu_delta_samples_sum = 0;
}
+ /* don't ever want divide by zero shenanigans */
+ if (ctlr->imu_avg_delta_ms == 0) {
+ ctlr->imu_avg_delta_ms = 1;
+ hid_warn(ctlr->hdev, "calculated avg imu delta of 0\n");
+ }
+
/* useful for debugging IMU sample rate */
hid_dbg(ctlr->hdev,
"imu_report: ms=%u last_ms=%u delta=%u avg_delta=%u\n",
if (!slave)
return 0;
- command = readl(bus->base + ASPEED_I2C_CMD_REG);
+ /*
+ * Handle stop conditions early, prior to SLAVE_MATCH. Some masters may drive
+ * transfers with low enough latency between the nak/stop phase of the current
+ * command and the start/address phase of the following command that the
+ * interrupts are coalesced by the time we process them.
+ */
+ if (irq_status & ASPEED_I2CD_INTR_NORMAL_STOP) {
+ irq_handled |= ASPEED_I2CD_INTR_NORMAL_STOP;
+ bus->slave_state = ASPEED_I2C_SLAVE_STOP;
+ }
+
+ if (irq_status & ASPEED_I2CD_INTR_TX_NAK &&
+ bus->slave_state == ASPEED_I2C_SLAVE_READ_PROCESSED) {
+ irq_handled |= ASPEED_I2CD_INTR_TX_NAK;
+ bus->slave_state = ASPEED_I2C_SLAVE_STOP;
+ }
+
+ /* Propagate any stop conditions to the slave implementation. */
+ if (bus->slave_state == ASPEED_I2C_SLAVE_STOP) {
+ i2c_slave_event(slave, I2C_SLAVE_STOP, &value);
+ bus->slave_state = ASPEED_I2C_SLAVE_INACTIVE;
+ }
- /* Slave was requested, restart state machine. */
+ /*
+ * Now that we've dealt with any potentially coalesced stop conditions,
+ * address any start conditions.
+ */
if (irq_status & ASPEED_I2CD_INTR_SLAVE_MATCH) {
irq_handled |= ASPEED_I2CD_INTR_SLAVE_MATCH;
bus->slave_state = ASPEED_I2C_SLAVE_START;
}
- /* Slave is not currently active, irq was for someone else. */
+ /*
+ * If the slave has been stopped and not started then slave interrupt
+ * handling is complete.
+ */
if (bus->slave_state == ASPEED_I2C_SLAVE_INACTIVE)
return irq_handled;
+ command = readl(bus->base + ASPEED_I2C_CMD_REG);
dev_dbg(bus->dev, "slave irq status 0x%08x, cmd 0x%08x\n",
irq_status, command);
irq_handled |= ASPEED_I2CD_INTR_RX_DONE;
}
- /* Slave was asked to stop. */
- if (irq_status & ASPEED_I2CD_INTR_NORMAL_STOP) {
- irq_handled |= ASPEED_I2CD_INTR_NORMAL_STOP;
- bus->slave_state = ASPEED_I2C_SLAVE_STOP;
- }
- if (irq_status & ASPEED_I2CD_INTR_TX_NAK &&
- bus->slave_state == ASPEED_I2C_SLAVE_READ_PROCESSED) {
- irq_handled |= ASPEED_I2CD_INTR_TX_NAK;
- bus->slave_state = ASPEED_I2C_SLAVE_STOP;
- }
-
switch (bus->slave_state) {
case ASPEED_I2C_SLAVE_READ_REQUESTED:
if (unlikely(irq_status & ASPEED_I2CD_INTR_TX_ACK))
i2c_slave_event(slave, I2C_SLAVE_WRITE_RECEIVED, &value);
break;
case ASPEED_I2C_SLAVE_STOP:
- i2c_slave_event(slave, I2C_SLAVE_STOP, &value);
- bus->slave_state = ASPEED_I2C_SLAVE_INACTIVE;
+ /* Stop event handling is done early. Unreachable. */
break;
case ASPEED_I2C_SLAVE_START:
/* Slave was just started. Waiting for the next event. */;
ret = geni_se_resources_on(&gi2c->se);
if (ret) {
dev_err(dev, "Error turning on resources %d\n", ret);
+ clk_disable_unprepare(gi2c->core_clk);
return ret;
}
proto = geni_se_read_proto(&gi2c->se);
/* FIFO is disabled, so we can only use GPI DMA */
gi2c->gpi_mode = true;
ret = setup_gpi_dma(gi2c);
- if (ret)
+ if (ret) {
+ geni_se_resources_off(&gi2c->se);
+ clk_disable_unprepare(gi2c->core_clk);
return dev_err_probe(dev, ret, "Failed to setup GPI DMA mode\n");
+ }
dev_dbg(dev, "Using GPI DMA mode for I2C\n");
} else {
if (!tx_depth) {
dev_err(dev, "Invalid TX FIFO depth\n");
+ geni_se_resources_off(&gi2c->se);
+ clk_disable_unprepare(gi2c->core_clk);
return -EINVAL;
}
* @clk: function clk for rk3399 or function & Bus clks for others
* @pclk: Bus clk for rk3399
* @clk_rate_nb: i2c clk rate change notify
+ * @irq: irq number
* @t: I2C known timing information
* @lock: spinlock for the i2c bus
* @wait: the waitqueue to wait for i2c transfer
struct clk *clk;
struct clk *pclk;
struct notifier_block clk_rate_nb;
+ int irq;
/* Settings */
struct i2c_timings t;
spin_unlock_irqrestore(&i2c->lock, flags);
- rk3x_i2c_start(i2c);
-
if (!polling) {
+ rk3x_i2c_start(i2c);
+
timeout = wait_event_timeout(i2c->wait, !i2c->busy,
msecs_to_jiffies(WAIT_TIMEOUT));
} else {
+ disable_irq(i2c->irq);
+ rk3x_i2c_start(i2c);
+
timeout = rk3x_i2c_wait_xfer_poll(i2c);
+
+ enable_irq(i2c->irq);
}
spin_lock_irqsave(&i2c->lock, flags);
return ret;
}
+ i2c->irq = irq;
+
platform_set_drvdata(pdev, i2c);
if (i2c->soc_data->calc_timings == rk3x_i2c_v0_calc_timings) {
* i2c-core.h - interfaces internal to the I2C framework
*/
+#include <linux/kconfig.h>
#include <linux/rwsem.h>
struct i2c_devinfo {
*/
static inline bool i2c_in_atomic_xfer_mode(void)
{
- return system_state > SYSTEM_RUNNING && !preemptible();
+ return system_state > SYSTEM_RUNNING &&
+ (IS_ENABLED(CONFIG_PREEMPT_COUNT) ? !preemptible() : irqs_disabled());
}
static inline int __i2c_lock_bus_helper(struct i2c_adapter *adap)
* (range / 2^bits) * g = (range / 2^bits) * 9.80665 m/s^2
* => KX022A uses 16 bit (HiRes mode - assume the low 8 bits are zeroed
* in low-power mode(?) )
- * => +/-2G => 4 / 2^16 * 9,80665 * 10^6 (to scale to micro)
- * => +/-2G - 598.550415
- * +/-4G - 1197.10083
- * +/-8G - 2394.20166
- * +/-16G - 4788.40332
+ * => +/-2G => 4 / 2^16 * 9,80665
+ * => +/-2G - 0.000598550415
+ * +/-4G - 0.00119710083
+ * +/-8G - 0.00239420166
+ * +/-16G - 0.00478840332
*/
static const int kx022a_scale_table[][2] = {
- { 598, 550415 },
- { 1197, 100830 },
- { 2394, 201660 },
- { 4788, 403320 },
+ { 0, 598550 },
+ { 0, 1197101 },
+ { 0, 2394202 },
+ { 0, 4788403 },
};
static int kx022a_read_avail(struct iio_dev *indio_dev,
*vals = (const int *)kx022a_scale_table;
*length = ARRAY_SIZE(kx022a_scale_table) *
ARRAY_SIZE(kx022a_scale_table[0]);
- *type = IIO_VAL_INT_PLUS_MICRO;
+ *type = IIO_VAL_INT_PLUS_NANO;
return IIO_AVAIL_LIST;
default:
return -EINVAL;
return ret;
}
+static int kx022a_write_raw_get_fmt(struct iio_dev *idev,
+ struct iio_chan_spec const *chan,
+ long mask)
+{
+ switch (mask) {
+ case IIO_CHAN_INFO_SCALE:
+ return IIO_VAL_INT_PLUS_NANO;
+ case IIO_CHAN_INFO_SAMP_FREQ:
+ return IIO_VAL_INT_PLUS_MICRO;
+ default:
+ return -EINVAL;
+ }
+}
+
static int kx022a_write_raw(struct iio_dev *idev,
struct iio_chan_spec const *chan,
int val, int val2, long mask)
kx022a_reg2scale(regval, val, val2);
- return IIO_VAL_INT_PLUS_MICRO;
+ return IIO_VAL_INT_PLUS_NANO;
}
return -EINVAL;
static const struct iio_info kx022a_info = {
.read_raw = &kx022a_read_raw,
.write_raw = &kx022a_write_raw,
+ .write_raw_get_fmt = &kx022a_write_raw_get_fmt,
.read_avail = &kx022a_read_avail,
.validate_trigger = iio_validate_own_trigger,
IMX93_ADC_CHAN(1),
IMX93_ADC_CHAN(2),
IMX93_ADC_CHAN(3),
+ IMX93_ADC_CHAN(4),
+ IMX93_ADC_CHAN(5),
+ IMX93_ADC_CHAN(6),
+ IMX93_ADC_CHAN(7),
};
static void imx93_adc_power_down(struct imx93_adc *adc)
mutex_unlock(&adc->lock);
return ret;
case IIO_CHAN_INFO_CALIBBIAS:
- if (val < mcp3564_calib_bias[0] && val > mcp3564_calib_bias[2])
+ if (val < mcp3564_calib_bias[0] || val > mcp3564_calib_bias[2])
return -EINVAL;
mutex_lock(&adc->lock);
mutex_unlock(&adc->lock);
return ret;
case IIO_CHAN_INFO_CALIBSCALE:
- if (val < mcp3564_calib_scale[0] && val > mcp3564_calib_scale[2])
+ if (val < mcp3564_calib_scale[0] || val > mcp3564_calib_scale[2])
return -EINVAL;
if (adc->calib_scale == val)
enum mcp3564_ids ids;
int ret = 0;
unsigned int tmp = 0x01;
- bool err = true;
+ bool err = false;
/*
* The address is set on a per-device basis by fuses in the factory,
module_spi_driver(mcp3564_driver);
MODULE_AUTHOR("Marius Cristea <marius.cristea@microchip.com>");
-MODULE_DESCRIPTION("Microchip MCP346x/MCP346xR and MCP356x/MCP346xR ADCs");
+MODULE_DESCRIPTION("Microchip MCP346x/MCP346xR and MCP356x/MCP356xR ADCs");
MODULE_LICENSE("GPL v2");
.cmv_select = 1,
};
+static const struct meson_sar_adc_param meson_sar_adc_axg_param = {
+ .has_bl30_integration = true,
+ .clock_rate = 1200000,
+ .bandgap_reg = MESON_SAR_ADC_REG11,
+ .regmap_config = &meson_sar_adc_regmap_config_gxbb,
+ .resolution = 12,
+ .disable_ring_counter = 1,
+ .has_reg11 = true,
+ .vref_volatge = 1,
+ .has_vref_select = true,
+ .vref_select = VREF_VDDA,
+ .cmv_select = 1,
+};
+
static const struct meson_sar_adc_param meson_sar_adc_g12a_param = {
.has_bl30_integration = false,
.clock_rate = 1200000,
};
static const struct meson_sar_adc_data meson_sar_adc_axg_data = {
- .param = &meson_sar_adc_gxl_param,
+ .param = &meson_sar_adc_axg_param,
.name = "meson-axg-saradc",
};
platform_set_drvdata(pdev, indio_dev);
err = tiadc_request_dma(pdev, adc_dev);
- if (err && err == -EPROBE_DEFER)
+ if (err && err != -ENODEV) {
+ dev_err_probe(&pdev->dev, err, "DMA request failed\n");
goto err_dma;
+ }
return 0;
struct iio_buffer *buffer;
int ret;
+ /*
+ * iio_triggered_buffer_cleanup() assumes that the buffer allocated here
+ * is assigned to indio_dev->buffer but this is only the case if this
+ * function is the first caller to iio_device_attach_buffer(). If
+ * indio_dev->buffer is already set then we can't proceed otherwise the
+ * cleanup function will try to free a buffer that was not allocated here.
+ */
+ if (indio_dev->buffer)
+ return -EADDRINUSE;
+
buffer = iio_kfifo_allocate();
if (!buffer) {
ret = -ENOMEM;
/* Conversion times in us */
static const u16 ms_sensors_ht_t_conversion_time[] = { 50000, 25000,
13000, 7000 };
-static const u16 ms_sensors_ht_h_conversion_time[] = { 16000, 3000,
- 5000, 8000 };
+static const u16 ms_sensors_ht_h_conversion_time[] = { 16000, 5000,
+ 3000, 8000 };
static const u16 ms_sensors_tp_conversion_time[] = { 500, 1100, 2100,
4100, 8220, 16440 };
#define ADIS16475_MAX_SCAN_DATA 20
/* spi max speed in brust mode */
#define ADIS16475_BURST_MAX_SPEED 1000000
-#define ADIS16475_LSB_DEC_MASK BIT(0)
-#define ADIS16475_LSB_FIR_MASK BIT(1)
+#define ADIS16475_LSB_DEC_MASK 0
+#define ADIS16475_LSB_FIR_MASK 1
#define ADIS16500_BURST_DATA_SEL_0_CHN_MASK GENMASK(5, 0)
#define ADIS16500_BURST_DATA_SEL_1_CHN_MASK GENMASK(12, 7)
return 0;
}
-static const struct of_device_id adis16475_of_match[] = {
- { .compatible = "adi,adis16470",
- .data = &adis16475_chip_info[ADIS16470] },
- { .compatible = "adi,adis16475-1",
- .data = &adis16475_chip_info[ADIS16475_1] },
- { .compatible = "adi,adis16475-2",
- .data = &adis16475_chip_info[ADIS16475_2] },
- { .compatible = "adi,adis16475-3",
- .data = &adis16475_chip_info[ADIS16475_3] },
- { .compatible = "adi,adis16477-1",
- .data = &adis16475_chip_info[ADIS16477_1] },
- { .compatible = "adi,adis16477-2",
- .data = &adis16475_chip_info[ADIS16477_2] },
- { .compatible = "adi,adis16477-3",
- .data = &adis16475_chip_info[ADIS16477_3] },
- { .compatible = "adi,adis16465-1",
- .data = &adis16475_chip_info[ADIS16465_1] },
- { .compatible = "adi,adis16465-2",
- .data = &adis16475_chip_info[ADIS16465_2] },
- { .compatible = "adi,adis16465-3",
- .data = &adis16475_chip_info[ADIS16465_3] },
- { .compatible = "adi,adis16467-1",
- .data = &adis16475_chip_info[ADIS16467_1] },
- { .compatible = "adi,adis16467-2",
- .data = &adis16475_chip_info[ADIS16467_2] },
- { .compatible = "adi,adis16467-3",
- .data = &adis16475_chip_info[ADIS16467_3] },
- { .compatible = "adi,adis16500",
- .data = &adis16475_chip_info[ADIS16500] },
- { .compatible = "adi,adis16505-1",
- .data = &adis16475_chip_info[ADIS16505_1] },
- { .compatible = "adi,adis16505-2",
- .data = &adis16475_chip_info[ADIS16505_2] },
- { .compatible = "adi,adis16505-3",
- .data = &adis16475_chip_info[ADIS16505_3] },
- { .compatible = "adi,adis16507-1",
- .data = &adis16475_chip_info[ADIS16507_1] },
- { .compatible = "adi,adis16507-2",
- .data = &adis16475_chip_info[ADIS16507_2] },
- { .compatible = "adi,adis16507-3",
- .data = &adis16475_chip_info[ADIS16507_3] },
- { },
-};
-MODULE_DEVICE_TABLE(of, adis16475_of_match);
static int adis16475_probe(struct spi_device *spi)
{
st = iio_priv(indio_dev);
- st->info = device_get_match_data(&spi->dev);
+ st->info = spi_get_device_match_data(spi);
if (!st->info)
return -EINVAL;
return 0;
}
+static const struct of_device_id adis16475_of_match[] = {
+ { .compatible = "adi,adis16470",
+ .data = &adis16475_chip_info[ADIS16470] },
+ { .compatible = "adi,adis16475-1",
+ .data = &adis16475_chip_info[ADIS16475_1] },
+ { .compatible = "adi,adis16475-2",
+ .data = &adis16475_chip_info[ADIS16475_2] },
+ { .compatible = "adi,adis16475-3",
+ .data = &adis16475_chip_info[ADIS16475_3] },
+ { .compatible = "adi,adis16477-1",
+ .data = &adis16475_chip_info[ADIS16477_1] },
+ { .compatible = "adi,adis16477-2",
+ .data = &adis16475_chip_info[ADIS16477_2] },
+ { .compatible = "adi,adis16477-3",
+ .data = &adis16475_chip_info[ADIS16477_3] },
+ { .compatible = "adi,adis16465-1",
+ .data = &adis16475_chip_info[ADIS16465_1] },
+ { .compatible = "adi,adis16465-2",
+ .data = &adis16475_chip_info[ADIS16465_2] },
+ { .compatible = "adi,adis16465-3",
+ .data = &adis16475_chip_info[ADIS16465_3] },
+ { .compatible = "adi,adis16467-1",
+ .data = &adis16475_chip_info[ADIS16467_1] },
+ { .compatible = "adi,adis16467-2",
+ .data = &adis16475_chip_info[ADIS16467_2] },
+ { .compatible = "adi,adis16467-3",
+ .data = &adis16475_chip_info[ADIS16467_3] },
+ { .compatible = "adi,adis16500",
+ .data = &adis16475_chip_info[ADIS16500] },
+ { .compatible = "adi,adis16505-1",
+ .data = &adis16475_chip_info[ADIS16505_1] },
+ { .compatible = "adi,adis16505-2",
+ .data = &adis16475_chip_info[ADIS16505_2] },
+ { .compatible = "adi,adis16505-3",
+ .data = &adis16475_chip_info[ADIS16505_3] },
+ { .compatible = "adi,adis16507-1",
+ .data = &adis16475_chip_info[ADIS16507_1] },
+ { .compatible = "adi,adis16507-2",
+ .data = &adis16475_chip_info[ADIS16507_2] },
+ { .compatible = "adi,adis16507-3",
+ .data = &adis16475_chip_info[ADIS16507_3] },
+ { },
+};
+MODULE_DEVICE_TABLE(of, adis16475_of_match);
+
+static const struct spi_device_id adis16475_ids[] = {
+ { "adis16470", (kernel_ulong_t)&adis16475_chip_info[ADIS16470] },
+ { "adis16475-1", (kernel_ulong_t)&adis16475_chip_info[ADIS16475_1] },
+ { "adis16475-2", (kernel_ulong_t)&adis16475_chip_info[ADIS16475_2] },
+ { "adis16475-3", (kernel_ulong_t)&adis16475_chip_info[ADIS16475_3] },
+ { "adis16477-1", (kernel_ulong_t)&adis16475_chip_info[ADIS16477_1] },
+ { "adis16477-2", (kernel_ulong_t)&adis16475_chip_info[ADIS16477_2] },
+ { "adis16477-3", (kernel_ulong_t)&adis16475_chip_info[ADIS16477_3] },
+ { "adis16465-1", (kernel_ulong_t)&adis16475_chip_info[ADIS16465_1] },
+ { "adis16465-2", (kernel_ulong_t)&adis16475_chip_info[ADIS16465_2] },
+ { "adis16465-3", (kernel_ulong_t)&adis16475_chip_info[ADIS16465_3] },
+ { "adis16467-1", (kernel_ulong_t)&adis16475_chip_info[ADIS16467_1] },
+ { "adis16467-2", (kernel_ulong_t)&adis16475_chip_info[ADIS16467_2] },
+ { "adis16467-3", (kernel_ulong_t)&adis16475_chip_info[ADIS16467_3] },
+ { "adis16500", (kernel_ulong_t)&adis16475_chip_info[ADIS16500] },
+ { "adis16505-1", (kernel_ulong_t)&adis16475_chip_info[ADIS16505_1] },
+ { "adis16505-2", (kernel_ulong_t)&adis16475_chip_info[ADIS16505_2] },
+ { "adis16505-3", (kernel_ulong_t)&adis16475_chip_info[ADIS16505_3] },
+ { "adis16507-1", (kernel_ulong_t)&adis16475_chip_info[ADIS16507_1] },
+ { "adis16507-2", (kernel_ulong_t)&adis16475_chip_info[ADIS16507_2] },
+ { "adis16507-3", (kernel_ulong_t)&adis16475_chip_info[ADIS16507_3] },
+ { }
+};
+MODULE_DEVICE_TABLE(spi, adis16475_ids);
+
static struct spi_driver adis16475_driver = {
.driver = {
.name = "adis16475",
.of_match_table = adis16475_of_match,
},
.probe = adis16475_probe,
+ .id_table = adis16475_ids,
};
module_spi_driver(adis16475_driver);
ret = inv_mpu6050_sensor_show(st, st->reg->gyro_offset,
chan->channel2, val);
mutex_unlock(&st->lock);
- return IIO_VAL_INT;
+ return ret;
case IIO_ACCEL:
mutex_lock(&st->lock);
ret = inv_mpu6050_sensor_show(st, st->reg->accl_offset,
chan->channel2, val);
mutex_unlock(&st->lock);
- return IIO_VAL_INT;
+ return ret;
default:
return -EINVAL;
#include "../common/hid-sensors/hid-sensor-trigger.h"
enum {
- CHANNEL_SCAN_INDEX_INTENSITY,
- CHANNEL_SCAN_INDEX_ILLUM,
- CHANNEL_SCAN_INDEX_COLOR_TEMP,
- CHANNEL_SCAN_INDEX_CHROMATICITY_X,
- CHANNEL_SCAN_INDEX_CHROMATICITY_Y,
+ CHANNEL_SCAN_INDEX_INTENSITY = 0,
+ CHANNEL_SCAN_INDEX_ILLUM = 1,
CHANNEL_SCAN_INDEX_MAX
};
BIT(IIO_CHAN_INFO_HYSTERESIS_RELATIVE),
.scan_index = CHANNEL_SCAN_INDEX_ILLUM,
},
- {
- .type = IIO_COLORTEMP,
- .info_mask_separate = BIT(IIO_CHAN_INFO_RAW),
- .info_mask_shared_by_type = BIT(IIO_CHAN_INFO_OFFSET) |
- BIT(IIO_CHAN_INFO_SCALE) |
- BIT(IIO_CHAN_INFO_SAMP_FREQ) |
- BIT(IIO_CHAN_INFO_HYSTERESIS) |
- BIT(IIO_CHAN_INFO_HYSTERESIS_RELATIVE),
- .scan_index = CHANNEL_SCAN_INDEX_COLOR_TEMP,
- },
- {
- .type = IIO_CHROMATICITY,
- .modified = 1,
- .channel2 = IIO_MOD_X,
- .info_mask_separate = BIT(IIO_CHAN_INFO_RAW),
- .info_mask_shared_by_type = BIT(IIO_CHAN_INFO_OFFSET) |
- BIT(IIO_CHAN_INFO_SCALE) |
- BIT(IIO_CHAN_INFO_SAMP_FREQ) |
- BIT(IIO_CHAN_INFO_HYSTERESIS) |
- BIT(IIO_CHAN_INFO_HYSTERESIS_RELATIVE),
- .scan_index = CHANNEL_SCAN_INDEX_CHROMATICITY_X,
- },
- {
- .type = IIO_CHROMATICITY,
- .modified = 1,
- .channel2 = IIO_MOD_Y,
- .info_mask_separate = BIT(IIO_CHAN_INFO_RAW),
- .info_mask_shared_by_type = BIT(IIO_CHAN_INFO_OFFSET) |
- BIT(IIO_CHAN_INFO_SCALE) |
- BIT(IIO_CHAN_INFO_SAMP_FREQ) |
- BIT(IIO_CHAN_INFO_HYSTERESIS) |
- BIT(IIO_CHAN_INFO_HYSTERESIS_RELATIVE),
- .scan_index = CHANNEL_SCAN_INDEX_CHROMATICITY_Y,
- },
IIO_CHAN_SOFT_TIMESTAMP(CHANNEL_SCAN_INDEX_TIMESTAMP)
};
min = als_state->als[chan->scan_index].logical_minimum;
address = HID_USAGE_SENSOR_LIGHT_ILLUM;
break;
- case CHANNEL_SCAN_INDEX_COLOR_TEMP:
- report_id = als_state->als[chan->scan_index].report_id;
- min = als_state->als[chan->scan_index].logical_minimum;
- address = HID_USAGE_SENSOR_LIGHT_COLOR_TEMPERATURE;
- break;
- case CHANNEL_SCAN_INDEX_CHROMATICITY_X:
- report_id = als_state->als[chan->scan_index].report_id;
- min = als_state->als[chan->scan_index].logical_minimum;
- address = HID_USAGE_SENSOR_LIGHT_CHROMATICITY_X;
- break;
- case CHANNEL_SCAN_INDEX_CHROMATICITY_Y:
- report_id = als_state->als[chan->scan_index].report_id;
- min = als_state->als[chan->scan_index].logical_minimum;
- address = HID_USAGE_SENSOR_LIGHT_CHROMATICITY_Y;
- break;
default:
report_id = -1;
break;
als_state->scan.illum[CHANNEL_SCAN_INDEX_ILLUM] = sample_data;
ret = 0;
break;
- case HID_USAGE_SENSOR_LIGHT_COLOR_TEMPERATURE:
- als_state->scan.illum[CHANNEL_SCAN_INDEX_COLOR_TEMP] = sample_data;
- ret = 0;
- break;
- case HID_USAGE_SENSOR_LIGHT_CHROMATICITY_X:
- als_state->scan.illum[CHANNEL_SCAN_INDEX_CHROMATICITY_X] = sample_data;
- ret = 0;
- break;
- case HID_USAGE_SENSOR_LIGHT_CHROMATICITY_Y:
- als_state->scan.illum[CHANNEL_SCAN_INDEX_CHROMATICITY_Y] = sample_data;
- ret = 0;
- break;
case HID_USAGE_SENSOR_TIME_TIMESTAMP:
als_state->timestamp = hid_sensor_convert_timestamp(&als_state->common_attributes,
*(s64 *)raw_data);
st->als[i].report_id);
}
- ret = sensor_hub_input_get_attribute_info(hsdev, HID_INPUT_REPORT,
- usage_id,
- HID_USAGE_SENSOR_LIGHT_COLOR_TEMPERATURE,
- &st->als[CHANNEL_SCAN_INDEX_COLOR_TEMP]);
- if (ret < 0)
- return ret;
- als_adjust_channel_bit_mask(channels, CHANNEL_SCAN_INDEX_COLOR_TEMP,
- st->als[CHANNEL_SCAN_INDEX_COLOR_TEMP].size);
-
- dev_dbg(&pdev->dev, "als %x:%x\n",
- st->als[CHANNEL_SCAN_INDEX_COLOR_TEMP].index,
- st->als[CHANNEL_SCAN_INDEX_COLOR_TEMP].report_id);
-
- for (i = 0; i < 2; i++) {
- int next_scan_index = CHANNEL_SCAN_INDEX_CHROMATICITY_X + i;
-
- ret = sensor_hub_input_get_attribute_info(hsdev,
- HID_INPUT_REPORT, usage_id,
- HID_USAGE_SENSOR_LIGHT_CHROMATICITY_X + i,
- &st->als[next_scan_index]);
- if (ret < 0)
- return ret;
-
- als_adjust_channel_bit_mask(channels,
- CHANNEL_SCAN_INDEX_CHROMATICITY_X + i,
- st->als[next_scan_index].size);
-
- dev_dbg(&pdev->dev, "als %x:%x\n",
- st->als[next_scan_index].index,
- st->als[next_scan_index].report_id);
- }
-
st->scale_precision = hid_sensor_format_scale(usage_id,
&st->als[CHANNEL_SCAN_INDEX_INTENSITY],
&st->scale_pre_decml, &st->scale_post_decml);
case IIO_CHAN_INFO_OFFSET:
switch (chan->type) {
case IIO_TEMP:
- *val = -266314;
+ *val = -16005;
return IIO_VAL_INT;
default:
return -EINVAL;
#include <linux/types.h>
/* PCIe device related definition. */
-#define PCI_VENDOR_ID_ALIBABA 0x1ded
-
#define ERDMA_PCI_WIDTH 64
#define ERDMA_FUNC_BAR 0
#define ERDMA_MISX_BAR 2
list_for_each_entry_rcu(item, fd_list, xa_list) {
if (item->eventfd)
- eventfd_signal(item->eventfd, 1);
+ eventfd_signal(item->eventfd);
else
deliver_event(item, data);
}
{ 0x146b, 0x0604, "Bigben Interactive DAIJA Arcade Stick", MAP_TRIGGERS_TO_BUTTONS, XTYPE_XBOX360 },
{ 0x1532, 0x0a00, "Razer Atrox Arcade Stick", MAP_TRIGGERS_TO_BUTTONS, XTYPE_XBOXONE },
{ 0x1532, 0x0a03, "Razer Wildcat", 0, XTYPE_XBOXONE },
+ { 0x1532, 0x0a29, "Razer Wolverine V2", 0, XTYPE_XBOXONE },
{ 0x15e4, 0x3f00, "Power A Mini Pro Elite", 0, XTYPE_XBOX360 },
{ 0x15e4, 0x3f0a, "Xbox Airflo wired controller", 0, XTYPE_XBOX360 },
{ 0x15e4, 0x3f10, "Batarang Xbox 360 controller", 0, XTYPE_XBOX360 },
ps2dev->serio->phys);
}
+#ifdef CONFIG_X86
+static bool atkbd_is_portable_device(void)
+{
+ static const char * const chassis_types[] = {
+ "8", /* Portable */
+ "9", /* Laptop */
+ "10", /* Notebook */
+ "14", /* Sub-Notebook */
+ "31", /* Convertible */
+ "32", /* Detachable */
+ };
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(chassis_types); i++)
+ if (dmi_match(DMI_CHASSIS_TYPE, chassis_types[i]))
+ return true;
+
+ return false;
+}
+
+/*
+ * On many modern laptops ATKBD_CMD_GETID may cause problems, on these laptops
+ * the controller is always in translated mode. In this mode mice/touchpads will
+ * not work. So in this case simply assume a keyboard is connected to avoid
+ * confusing some laptop keyboards.
+ *
+ * Skipping ATKBD_CMD_GETID ends up using a fake keyboard id. Using a fake id is
+ * ok in translated mode, only atkbd_select_set() checks atkbd->id and in
+ * translated mode that is a no-op.
+ */
+static bool atkbd_skip_getid(struct atkbd *atkbd)
+{
+ return atkbd->translated && atkbd_is_portable_device();
+}
+#else
+static inline bool atkbd_skip_getid(struct atkbd *atkbd) { return false; }
+#endif
+
/*
* atkbd_probe() probes for an AT keyboard on a serio port.
*/
*/
param[0] = param[1] = 0xa5; /* initialize with invalid values */
- if (ps2_command(ps2dev, param, ATKBD_CMD_GETID)) {
+ if (atkbd_skip_getid(atkbd) || ps2_command(ps2dev, param, ATKBD_CMD_GETID)) {
/*
- * If the get ID command failed, we check if we can at least set the LEDs on
- * the keyboard. This should work on every keyboard out there. It also turns
- * the LEDs off, which we want anyway.
+ * If the get ID command was skipped or failed, we check if we can at least set
+ * the LEDs on the keyboard. This should work on every keyboard out there.
+ * It also turns the LEDs off, which we want anyway.
*/
param[0] = 0;
if (ps2_command(ps2dev, param, ATKBD_CMD_SETLEDS))
keys->codes = devm_kmemdup(&pdev->dev, micro_keycodes,
keys->input->keycodesize * keys->input->keycodemax,
GFP_KERNEL);
+ if (!keys->codes)
+ return -ENOMEM;
+
keys->input->keycode = keys->codes;
__set_bit(EV_KEY, keys->input->evbit);
info->name = "power";
info->event_code = KEY_POWER;
info->wakeup = true;
+ } else if (upage == 0x01 && usage == 0xc6) {
+ info->name = "airplane mode switch";
+ info->event_type = EV_SW;
+ info->event_code = SW_RFKILL_ALL;
+ info->active_low = false;
} else if (upage == 0x01 && usage == 0xca) {
info->name = "rotation lock switch";
info->event_type = EV_SW;
return 0;
}
-static int __exit amimouse_remove(struct platform_device *pdev)
+static void __exit amimouse_remove(struct platform_device *pdev)
{
struct input_dev *dev = platform_get_drvdata(pdev);
input_unregister_device(dev);
- return 0;
}
static struct platform_driver amimouse_driver = {
- .remove = __exit_p(amimouse_remove),
+ .remove_new = __exit_p(amimouse_remove),
.driver = {
.name = "amiga-mouse",
},
"LEN009b", /* T580 */
"LEN0402", /* X1 Extreme Gen 2 / P1 Gen 2 */
"LEN040f", /* P1 Gen 3 */
+ "LEN0411", /* L14 Gen 1 */
"LEN200f", /* T450s */
"LEN2044", /* L470 */
"LEN2054", /* E480 */
},
.driver_data = (void *)(SERIO_QUIRK_DRITEK)
},
+ {
+ /* Acer TravelMate P459-G2-M */
+ .matches = {
+ DMI_MATCH(DMI_SYS_VENDOR, "Acer"),
+ DMI_MATCH(DMI_PRODUCT_NAME, "TravelMate P459-G2-M"),
+ },
+ .driver_data = (void *)(SERIO_QUIRK_NOMUX)
+ },
{
/* Amoi M636/A737 */
.matches = {
}
mutex_unlock(&icc_lock);
+ if (!node)
+ return ERR_PTR(-EINVAL);
+
if (IS_ERR(node))
return ERR_CAST(node);
if (qn->ib_coeff) {
agg_peak_rate = qn->max_peak[ctx] * 100;
- agg_peak_rate = div_u64(qn->max_peak[ctx], qn->ib_coeff);
+ agg_peak_rate = div_u64(agg_peak_rate, qn->ib_coeff);
} else {
agg_peak_rate = qn->max_peak[ctx];
}
.driver = {
.name = "qnoc-sm8250",
.of_match_table = qnoc_of_match,
+ .sync_state = icc_sync_state,
},
};
module_platform_driver(qnoc_driver);
data->irq_2_irte.devid = devid;
data->irq_2_irte.index = index + sub_handle;
- iommu->irte_ops->prepare(data->entry, apic->delivery_mode,
+ iommu->irte_ops->prepare(data->entry, APIC_DELIVERY_MODE_FIXED,
apic->dest_mode_logical, irq_cfg->vector,
irq_cfg->dest_apicid, devid);
entry->lo.fields_remap.valid = valid;
entry->lo.fields_remap.dm = apic->dest_mode_logical;
- entry->lo.fields_remap.int_type = apic->delivery_mode;
+ entry->lo.fields_remap.int_type = APIC_DELIVERY_MODE_FIXED;
entry->hi.fields.vector = cfg->vector;
entry->lo.fields_remap.destination =
APICID_TO_IRTE_DEST_LO(cfg->dest_apicid);
* irq migration in the presence of interrupt-remapping.
*/
irte->trigger_mode = 0;
- irte->dlvry_mode = apic->delivery_mode;
+ irte->dlvry_mode = APIC_DELIVERY_MODE_FIXED;
irte->vector = vector;
irte->dest_id = IRTE_DEST(dest);
irte->redir_hint = 1;
config DM_AUDIT
bool "DM audit events"
+ depends on BLK_DEV_DM
depends on AUDIT
help
Generate audit events for device-mapper.
sectors_to_process = dio->range.n_sectors;
__bio_for_each_segment(bv, bio, iter, dio->bio_details.bi_iter) {
+ struct bio_vec bv_copy = bv;
unsigned int pos;
char *mem, *checksums_ptr;
again:
- mem = bvec_kmap_local(&bv);
+ mem = bvec_kmap_local(&bv_copy);
pos = 0;
checksums_ptr = checksums;
do {
sectors_to_process -= ic->sectors_per_block;
pos += ic->sectors_per_block << SECTOR_SHIFT;
sector += ic->sectors_per_block;
- } while (pos < bv.bv_len && sectors_to_process && checksums != checksums_onstack);
+ } while (pos < bv_copy.bv_len && sectors_to_process && checksums != checksums_onstack);
kunmap_local(mem);
r = dm_integrity_rw_tag(ic, checksums, &dio->metadata_block, &dio->metadata_offset,
if (!sectors_to_process)
break;
- if (unlikely(pos < bv.bv_len)) {
- bv.bv_offset += pos;
- bv.bv_len -= pos;
+ if (unlikely(pos < bv_copy.bv_len)) {
+ bv_copy.bv_offset += pos;
+ bv_copy.bv_len -= pos;
goto again;
}
}
mddev_lock_nointr(&rs->md);
md_stop(&rs->md);
mddev_unlock(&rs->md);
+
+ if (work_pending(&rs->md.event_work))
+ flush_work(&rs->md.event_work);
raid_set_free(rs);
}
WARN_ON(test_bit(DMF_FROZEN, &md->flags));
- r = freeze_bdev(md->disk->part0);
+ r = bdev_freeze(md->disk->part0);
if (!r)
set_bit(DMF_FROZEN, &md->flags);
return r;
{
if (!test_bit(DMF_FROZEN, &md->flags))
return;
- thaw_bdev(md->disk->part0);
+ bdev_thaw(md->disk->part0);
clear_bit(DMF_FROZEN, &md->flags);
}
static DECLARE_WAIT_QUEUE_HEAD(resync_wait);
static struct workqueue_struct *md_wq;
+
+/*
+ * This workqueue is used for sync_work to register new sync_thread, and for
+ * del_work to remove rdev, and for event_work that is only set by dm-raid.
+ *
+ * Noted that sync_work will grab reconfig_mutex, hence never flush this
+ * workqueue whith reconfig_mutex grabbed.
+ */
static struct workqueue_struct *md_misc_wq;
struct workqueue_struct *md_bitmap_wq;
struct md_personality *pers = mddev->pers;
md_bitmap_destroy(mddev);
mddev_detach(mddev);
- /* Ensure ->event_work is done */
- if (mddev->event_work.func)
- flush_workqueue(md_misc_wq);
spin_lock(&mddev->lock);
mddev->pers = NULL;
spin_unlock(&mddev->lock);
{
if ((pvr_version_is(PVR_POWER8E)) ||
(pvr_version_is(PVR_POWER8NVL)) ||
- (pvr_version_is(PVR_POWER8)))
+ (pvr_version_is(PVR_POWER8)) ||
+ (pvr_version_is(PVR_HX_C2000)))
return true;
return false;
}
static irqreturn_t afu_irq_handler(int virq, void *data)
{
- struct afu_irq *irq = (struct afu_irq *) data;
+ struct afu_irq *irq = data;
trace_ocxl_afu_irq_receive(virq);
*/
static void xsl_fault_error(void *data, u64 addr, u64 dsisr)
{
- struct ocxl_context *ctx = (struct ocxl_context *) data;
+ struct ocxl_context *ctx = data;
mutex_lock(&ctx->xsl_error_lock);
ctx->xsl_error.addr = addr;
{
struct eventfd_ctx *ev_ctx = private;
- eventfd_signal(ev_ctx, 1);
+ eventfd_signal(ev_ctx);
return IRQ_HANDLED;
}
static irqreturn_t xsl_fault_handler(int irq, void *data)
{
- struct ocxl_link *link = (struct ocxl_link *) data;
+ struct ocxl_link *link = data;
struct spa *spa = link->spa;
u64 dsisr, dar, pe_handle;
struct pe_data *pe_data;
void ocxl_link_release(struct pci_dev *dev, void *link_handle)
{
- struct ocxl_link *link = (struct ocxl_link *) link_handle;
+ struct ocxl_link *link = link_handle;
mutex_lock(&links_list_lock);
kref_put(&link->ref, release_xsl);
void (*xsl_err_cb)(void *data, u64 addr, u64 dsisr),
void *xsl_err_data)
{
- struct ocxl_link *link = (struct ocxl_link *) link_handle;
+ struct ocxl_link *link = link_handle;
struct spa *spa = link->spa;
struct ocxl_process_element *pe;
int pe_handle, rc = 0;
int ocxl_link_update_pe(void *link_handle, int pasid, __u16 tid)
{
- struct ocxl_link *link = (struct ocxl_link *) link_handle;
+ struct ocxl_link *link = link_handle;
struct spa *spa = link->spa;
struct ocxl_process_element *pe;
int pe_handle, rc;
int ocxl_link_remove_pe(void *link_handle, int pasid)
{
- struct ocxl_link *link = (struct ocxl_link *) link_handle;
+ struct ocxl_link *link = link_handle;
struct spa *spa = link->spa;
struct ocxl_process_element *pe;
struct pe_data *pe_data;
int ocxl_link_irq_alloc(void *link_handle, int *hw_irq)
{
- struct ocxl_link *link = (struct ocxl_link *) link_handle;
+ struct ocxl_link *link = link_handle;
int irq;
if (atomic_dec_if_positive(&link->irq_available) < 0)
void ocxl_link_free_irq(void *link_handle, int hw_irq)
{
- struct ocxl_link *link = (struct ocxl_link *) link_handle;
+ struct ocxl_link *link = link_handle;
xive_native_free_irq(hw_irq);
atomic_inc(&link->irq_available);
static int __init init_ocxl(void)
{
- int rc = 0;
+ int rc;
if (!tlbie_capable)
return -EINVAL;
static int mmc_blk_part_switch_pre(struct mmc_card *card,
unsigned int part_type)
{
+ const unsigned int mask = EXT_CSD_PART_CONFIG_ACC_RPMB;
int ret = 0;
- if (part_type == EXT_CSD_PART_CONFIG_ACC_RPMB) {
+ if ((part_type & mask) == mask) {
if (card->ext_csd.cmdq_en) {
ret = mmc_cmdq_disable(card);
if (ret)
static int mmc_blk_part_switch_post(struct mmc_card *card,
unsigned int part_type)
{
+ const unsigned int mask = EXT_CSD_PART_CONFIG_ACC_RPMB;
int ret = 0;
- if (part_type == EXT_CSD_PART_CONFIG_ACC_RPMB) {
+ if ((part_type & mask) == mask) {
mmc_retune_unpause(card->host);
if (card->reenable_cmdq && !card->ext_csd.cmdq_en)
ret = mmc_cmdq_enable(card);
MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("Multimedia Card (MMC) block device driver");
-
*/
void mmc_free_host(struct mmc_host *host)
{
+ cancel_delayed_work_sync(&host->detect);
mmc_pwrseq_free(host);
put_device(&host->class_dev);
}
static int meson_mx_sdhc_set_clk(struct mmc_host *mmc, struct mmc_ios *ios)
{
struct meson_mx_sdhc_host *host = mmc_priv(mmc);
- u32 rx_clk_phase;
+ u32 val, rx_clk_phase;
int ret;
meson_mx_sdhc_disable_clks(mmc);
mmc->actual_clock = clk_get_rate(host->sd_clk);
/*
- * according to Amlogic the following latching points are
- * selected with empirical values, there is no (known) formula
- * to calculate these.
+ * Phase 90 should work in most cases. For data transmission,
+ * meson_mx_sdhc_execute_tuning() will find a accurate value
*/
- if (mmc->actual_clock > 100000000) {
- rx_clk_phase = 1;
- } else if (mmc->actual_clock > 45000000) {
- if (ios->signal_voltage == MMC_SIGNAL_VOLTAGE_330)
- rx_clk_phase = 15;
- else
- rx_clk_phase = 11;
- } else if (mmc->actual_clock >= 25000000) {
- rx_clk_phase = 15;
- } else if (mmc->actual_clock > 5000000) {
- rx_clk_phase = 23;
- } else if (mmc->actual_clock > 1000000) {
- rx_clk_phase = 55;
- } else {
- rx_clk_phase = 1061;
- }
-
+ regmap_read(host->regmap, MESON_SDHC_CLKC, &val);
+ rx_clk_phase = FIELD_GET(MESON_SDHC_CLKC_CLK_DIV, val) / 4;
regmap_update_bits(host->regmap, MESON_SDHC_CLK2,
MESON_SDHC_CLK2_RX_CLK_PHASE,
FIELD_PREP(MESON_SDHC_CLK2_RX_CLK_PHASE,
div = ((div & 0x300) >> 2) | ((div & 0xFF) << 8);
sdhci_enable_clk(host, div);
+ val = sdhci_readl(host, SDHCI_SPRD_REG_32_BUSY_POSI);
+ mask = SDHCI_SPRD_BIT_OUTR_CLK_AUTO_EN | SDHCI_SPRD_BIT_INNR_CLK_AUTO_EN;
/* Enable CLK_AUTO when the clock is greater than 400K. */
if (clk > 400000) {
- val = sdhci_readl(host, SDHCI_SPRD_REG_32_BUSY_POSI);
- mask = SDHCI_SPRD_BIT_OUTR_CLK_AUTO_EN |
- SDHCI_SPRD_BIT_INNR_CLK_AUTO_EN;
if (mask != (val & mask)) {
val |= mask;
sdhci_writel(host, val, SDHCI_SPRD_REG_32_BUSY_POSI);
}
+ } else {
+ if (val & mask) {
+ val &= ~mask;
+ sdhci_writel(host, val, SDHCI_SPRD_REG_32_BUSY_POSI);
+ }
}
}
netdev_err(adapter->netdev, "offset(%d) > ring size(%d) !!\n",
offset, adapter->ring_size);
err = -1;
- goto failed;
+ goto free_buffer;
}
return 0;
+free_buffer:
+ kfree(tx_ring->tx_buffer);
+ tx_ring->tx_buffer = NULL;
failed:
if (adapter->ring_vir_addr != NULL) {
dma_free_coherent(&pdev->dev, adapter->ring_size,
bnxt_cfg_ntp_filters(bp);
if (test_and_clear_bit(BNXT_HWRM_EXEC_FWD_REQ_SP_EVENT, &bp->sp_event))
bnxt_hwrm_exec_fwd_req(bp);
+ if (test_and_clear_bit(BNXT_HWRM_PF_UNLOAD_SP_EVENT, &bp->sp_event))
+ netdev_info(bp->dev, "Receive PF driver unload event!\n");
if (test_and_clear_bit(BNXT_PERIODIC_STATS_SP_EVENT, &bp->sp_event)) {
bnxt_hwrm_port_qstats(bp, 0);
bnxt_hwrm_port_qstats_ext(bp, 0);
}
}
}
- if (test_and_clear_bit(BNXT_HWRM_PF_UNLOAD_SP_EVENT, &bp->sp_event))
- netdev_info(bp->dev, "Receive PF driver unload event!\n");
}
#else
for (i = 0; i < num_frags ; i++) {
skb_frag_t *frag = &sinfo->frags[i];
struct bnxt_sw_tx_bd *frag_tx_buf;
- struct pci_dev *pdev = bp->pdev;
dma_addr_t frag_mapping;
int frag_len;
txbd = &txr->tx_desc_ring[TX_RING(prod)][TX_IDX(prod)];
frag_len = skb_frag_size(frag);
- frag_mapping = skb_frag_dma_map(&pdev->dev, frag, 0,
- frag_len, DMA_TO_DEVICE);
-
- if (unlikely(dma_mapping_error(&pdev->dev, frag_mapping)))
- return NULL;
-
- dma_unmap_addr_set(frag_tx_buf, mapping, frag_mapping);
-
flags = frag_len << TX_BD_LEN_SHIFT;
txbd->tx_bd_len_flags_type = cpu_to_le32(flags);
+ frag_mapping = page_pool_get_dma_addr(skb_frag_page(frag)) +
+ skb_frag_off(frag);
txbd->tx_bd_haddr = cpu_to_le64(frag_mapping);
len = frag_len;
/* Note: if we ever change from DMA_TX_APPEND_CRC below we
* will need to restore software padding of "runt" packets
*/
+ len_stat |= DMA_TX_APPEND_CRC;
+
if (!i) {
- len_stat |= DMA_TX_APPEND_CRC | DMA_SOP;
+ len_stat |= DMA_SOP;
if (skb->ip_summed == CHECKSUM_PARTIAL)
len_stat |= DMA_TX_DO_CSUM;
}
static void netdev_hw_addr_refcnt(struct i40e_mac_filter *f,
struct net_device *netdev, int delta)
{
+ struct netdev_hw_addr_list *ha_list;
struct netdev_hw_addr *ha;
if (!f || !netdev)
return;
- netdev_for_each_mc_addr(ha, netdev) {
+ if (is_unicast_ether_addr(f->macaddr) || is_link_local_ether_addr(f->macaddr))
+ ha_list = &netdev->uc;
+ else
+ ha_list = &netdev->mc;
+
+ netdev_hw_addr_list_for_each(ha, ha_list) {
if (ether_addr_equal(ha->addr, f->macaddr)) {
ha->refcount += delta;
if (ha->refcount <= 0)
return;
i40e_reset_and_rebuild(pf, false, false);
+#ifdef CONFIG_PCI_IOV
+ i40e_restore_all_vfs_msi_state(pdev);
+#endif /* CONFIG_PCI_IOV */
}
/**
#define I40E_GLGEN_MSCA_OPCODE_SHIFT 26
#define I40E_GLGEN_MSCA_OPCODE_MASK(_i) I40E_MASK(_i, I40E_GLGEN_MSCA_OPCODE_SHIFT)
#define I40E_GLGEN_MSCA_STCODE_SHIFT 28
-#define I40E_GLGEN_MSCA_STCODE_MASK I40E_MASK(0x1, I40E_GLGEN_MSCA_STCODE_SHIFT)
+#define I40E_GLGEN_MSCA_STCODE_MASK(_i) I40E_MASK(_i, I40E_GLGEN_MSCA_STCODE_SHIFT)
#define I40E_GLGEN_MSCA_MDICMD_SHIFT 30
#define I40E_GLGEN_MSCA_MDICMD_MASK I40E_MASK(0x1, I40E_GLGEN_MSCA_MDICMD_SHIFT)
#define I40E_GLGEN_MSCA_MDIINPROGEN_SHIFT 31
#define I40E_QTX_CTL_VM_QUEUE 0x1
#define I40E_QTX_CTL_PF_QUEUE 0x2
-#define I40E_MDIO_CLAUSE22_STCODE_MASK I40E_GLGEN_MSCA_STCODE_MASK
+#define I40E_MDIO_CLAUSE22_STCODE_MASK I40E_GLGEN_MSCA_STCODE_MASK(1)
#define I40E_MDIO_CLAUSE22_OPCODE_WRITE_MASK I40E_GLGEN_MSCA_OPCODE_MASK(1)
#define I40E_MDIO_CLAUSE22_OPCODE_READ_MASK I40E_GLGEN_MSCA_OPCODE_MASK(2)
-#define I40E_MDIO_CLAUSE45_STCODE_MASK I40E_GLGEN_MSCA_STCODE_MASK
+#define I40E_MDIO_CLAUSE45_STCODE_MASK I40E_GLGEN_MSCA_STCODE_MASK(0)
#define I40E_MDIO_CLAUSE45_OPCODE_ADDRESS_MASK I40E_GLGEN_MSCA_OPCODE_MASK(0)
#define I40E_MDIO_CLAUSE45_OPCODE_WRITE_MASK I40E_GLGEN_MSCA_OPCODE_MASK(1)
#define I40E_MDIO_CLAUSE45_OPCODE_READ_MASK I40E_GLGEN_MSCA_OPCODE_MASK(3)
(u8 *)&pfe, sizeof(struct virtchnl_pf_event));
}
+#ifdef CONFIG_PCI_IOV
+void i40e_restore_all_vfs_msi_state(struct pci_dev *pdev)
+{
+ u16 vf_id;
+ u16 pos;
+
+ /* Continue only if this is a PF */
+ if (!pdev->is_physfn)
+ return;
+
+ if (!pci_num_vf(pdev))
+ return;
+
+ pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_SRIOV);
+ if (pos) {
+ struct pci_dev *vf_dev = NULL;
+
+ pci_read_config_word(pdev, pos + PCI_SRIOV_VF_DID, &vf_id);
+ while ((vf_dev = pci_get_device(pdev->vendor, vf_id, vf_dev))) {
+ if (vf_dev->is_virtfn && vf_dev->physfn == pdev)
+ pci_restore_msi_state(vf_dev);
+ }
+ }
+}
+#endif /* CONFIG_PCI_IOV */
+
/**
* i40e_vc_notify_vf_reset
* @vf: pointer to the VF structure
bool found = false;
int bkt;
- if (!tc_filter->action) {
+ if (tc_filter->action != VIRTCHNL_ACTION_TC_REDIRECT) {
dev_info(&pf->pdev->dev,
- "VF %d: Currently ADq doesn't support Drop Action\n",
- vf->vf_id);
+ "VF %d: ADQ doesn't support this action (%d)\n",
+ vf->vf_id, tc_filter->action);
goto err;
}
/* action_meta is TC number here to which the filter is applied */
if (!tc_filter->action_meta ||
- tc_filter->action_meta > I40E_MAX_VF_VSI) {
+ tc_filter->action_meta > vf->num_tc) {
dev_info(&pf->pdev->dev, "VF %d: Invalid TC number %u\n",
vf->vf_id, tc_filter->action_meta);
goto err;
void i40e_vc_notify_link_state(struct i40e_pf *pf);
void i40e_vc_notify_reset(struct i40e_pf *pf);
+#ifdef CONFIG_PCI_IOV
+void i40e_restore_all_vfs_msi_state(struct pci_dev *pdev);
+#endif /* CONFIG_PCI_IOV */
int i40e_get_vf_stats(struct net_device *netdev, int vf_id,
struct ifla_vf_stats *vf_stats);
u8 lp_flowcontrol;
#define ICE_AQ_LINK_LP_PAUSE_ADV BIT(0)
#define ICE_AQ_LINK_LP_ASM_DIR_ADV BIT(1)
+ u8 reserved5[5];
#define ICE_AQC_LS_DATA_SIZE_V2 \
- offsetofend(struct ice_aqc_get_link_status_data, lp_flowcontrol)
+ offsetofend(struct ice_aqc_get_link_status_data, reserved5)
} __packed;
/* Set event mask command (direct 0x0613) */
u8 *eec_mode)
{
struct ice_aqc_get_cgu_dpll_status *cmd;
- const s64 nsec_per_psec = 1000LL;
struct ice_aq_desc desc;
int status;
*phase_offset = le32_to_cpu(cmd->phase_offset_h);
*phase_offset <<= 32;
*phase_offset += le32_to_cpu(cmd->phase_offset_l);
- *phase_offset = div64_s64(sign_extend64(*phase_offset, 47),
- nsec_per_psec);
+ *phase_offset = sign_extend64(*phase_offset, 47);
*eec_mode = cmd->eec_mode;
}
linkmode_zero(ks->link_modes.supported);
linkmode_zero(ks->link_modes.advertising);
- for (i = 0; i < BITS_PER_TYPE(u64); i++) {
+ for (i = 0; i < ARRAY_SIZE(phy_type_low_lkup); i++) {
if (phy_types_low & BIT_ULL(i))
ice_linkmode_set_bit(&phy_type_low_lkup[i], ks,
req_speeds, advert_phy_type_lo,
i);
}
- for (i = 0; i < BITS_PER_TYPE(u64); i++) {
+ for (i = 0; i < ARRAY_SIZE(phy_type_high_lkup); i++) {
if (phy_types_high & BIT_ULL(i))
ice_linkmode_set_bit(&phy_type_high_lkup[i], ks,
req_speeds, advert_phy_type_hi,
int n, err;
ice_lag_init_feature_support_flag(pf);
+ if (!ice_is_feature_supported(pf, ICE_F_SRIOV_LAG))
+ return 0;
pf->lag = kzalloc(sizeof(*lag), GFP_KERNEL);
if (!pf->lag)
} else {
max_txqs[i] = vsi->alloc_txq;
}
+
+ if (vsi->type == ICE_VSI_PF)
+ max_txqs[i] += vsi->num_xdp_txq;
}
dev_dbg(dev, "vsi->tc_cfg.ena_tc = %d\n", vsi->tc_cfg.ena_tc);
if (vsi->type == ICE_VSI_VF &&
vsi->agg_node && vsi->agg_node->valid)
vsi->agg_node->num_vsis--;
- if (vsi->agg_node) {
- vsi->agg_node->valid = false;
- vsi->agg_node->agg_id = 0;
- }
}
/**
/* Ensure we have media as we cannot configure a medialess port */
if (!(phy->link_info.link_info & ICE_AQ_MEDIA_AVAILABLE))
- return -EPERM;
+ return -ENOMEDIUM;
ice_print_topo_conflict(vsi);
int link_err = ice_force_phys_link_state(vsi, false);
if (link_err) {
- netdev_err(vsi->netdev, "Failed to set physical link down, VSI %d error %d\n",
- vsi->vsi_num, link_err);
+ if (link_err == -ENOMEDIUM)
+ netdev_info(vsi->netdev, "Skipping link reconfig - no media attached, VSI %d\n",
+ vsi->vsi_num);
+ else
+ netdev_err(vsi->netdev, "Failed to set physical link down, VSI %d error %d\n",
+ vsi->vsi_num, link_err);
+
+ ice_vsi_close(vsi);
return -EIO;
}
}
}
idpf_rx_sync_for_cpu(rx_buf, fields.size);
- skb = rx_q->skb;
if (skb)
idpf_rx_add_frag(rx_buf, skb, fields.size);
else
if (!rxq)
return;
- if (!bufq && idpf_is_queue_model_split(q_model) && rxq->skb) {
+ if (rxq->skb) {
dev_kfree_skb_any(rxq->skb);
rxq->skb = NULL;
}
__le32 vport_id;
__le16 key_len;
u8 pad;
- __DECLARE_FLEX_ARRAY(u8, key_flex);
-};
-VIRTCHNL2_CHECK_STRUCT_LEN(8, virtchnl2_rss_key);
+ u8 key_flex[];
+} __packed;
+VIRTCHNL2_CHECK_STRUCT_LEN(7, virtchnl2_rss_key);
/**
* struct virtchnl2_queue_chunk - chunk of contiguous queues
u16 etype;
__be16 vlan_etype;
u16 vlan_tci;
+ u16 vlan_tci_mask;
u8 src_addr[ETH_ALEN];
u8 dst_addr[ETH_ALEN];
u8 user_data[8];
}
#define ETHER_TYPE_FULL_MASK ((__force __be16)~0)
+#define VLAN_TCI_FULL_MASK ((__force __be16)~0)
static int igc_ethtool_get_nfc_rule(struct igc_adapter *adapter,
struct ethtool_rxnfc *cmd)
{
fsp->m_u.ether_spec.h_proto = ETHER_TYPE_FULL_MASK;
}
+ if (rule->filter.match_flags & IGC_FILTER_FLAG_VLAN_ETYPE) {
+ fsp->flow_type |= FLOW_EXT;
+ fsp->h_ext.vlan_etype = rule->filter.vlan_etype;
+ fsp->m_ext.vlan_etype = ETHER_TYPE_FULL_MASK;
+ }
+
if (rule->filter.match_flags & IGC_FILTER_FLAG_VLAN_TCI) {
fsp->flow_type |= FLOW_EXT;
fsp->h_ext.vlan_tci = htons(rule->filter.vlan_tci);
- fsp->m_ext.vlan_tci = htons(VLAN_PRIO_MASK);
+ fsp->m_ext.vlan_tci = htons(rule->filter.vlan_tci_mask);
}
if (rule->filter.match_flags & IGC_FILTER_FLAG_DST_MAC_ADDR) {
if ((fsp->flow_type & FLOW_EXT) && fsp->m_ext.vlan_tci) {
rule->filter.vlan_tci = ntohs(fsp->h_ext.vlan_tci);
+ rule->filter.vlan_tci_mask = ntohs(fsp->m_ext.vlan_tci);
rule->filter.match_flags |= IGC_FILTER_FLAG_VLAN_TCI;
}
memcpy(rule->filter.user_mask, fsp->m_ext.data, sizeof(fsp->m_ext.data));
}
- /* When multiple filter options or user data or vlan etype is set, use a
- * flex filter.
+ /* The i225/i226 has various different filters. Flex filters provide a
+ * way to match up to the first 128 bytes of a packet. Use them for:
+ * a) For specific user data
+ * b) For VLAN EtherType
+ * c) For full TCI match
+ * d) Or in case multiple filter criteria are set
+ *
+ * Otherwise, use the simple MAC, VLAN PRIO or EtherType filters.
*/
if ((rule->filter.match_flags & IGC_FILTER_FLAG_USER_DATA) ||
(rule->filter.match_flags & IGC_FILTER_FLAG_VLAN_ETYPE) ||
+ ((rule->filter.match_flags & IGC_FILTER_FLAG_VLAN_TCI) &&
+ rule->filter.vlan_tci_mask == ntohs(VLAN_TCI_FULL_MASK)) ||
(rule->filter.match_flags & (rule->filter.match_flags - 1)))
rule->flex = true;
else
return -EINVAL;
}
+ /* There are two ways to match the VLAN TCI:
+ * 1. Match on PCP field and use vlan prio filter for it
+ * 2. Match on complete TCI field and use flex filter for it
+ */
+ if ((fsp->flow_type & FLOW_EXT) &&
+ fsp->m_ext.vlan_tci &&
+ fsp->m_ext.vlan_tci != htons(VLAN_PRIO_MASK) &&
+ fsp->m_ext.vlan_tci != VLAN_TCI_FULL_MASK) {
+ netdev_dbg(netdev, "VLAN mask not supported\n");
+ return -EOPNOTSUPP;
+ }
+
+ /* VLAN EtherType can only be matched by full mask. */
+ if ((fsp->flow_type & FLOW_EXT) &&
+ fsp->m_ext.vlan_etype &&
+ fsp->m_ext.vlan_etype != ETHER_TYPE_FULL_MASK) {
+ netdev_dbg(netdev, "VLAN EtherType mask not supported\n");
+ return -EOPNOTSUPP;
+ }
+
if (fsp->location >= IGC_MAX_RXNFC_RULES) {
netdev_dbg(netdev, "Invalid location\n");
return -EINVAL;
wr32(IGC_TQAVCC(i), tqavcc);
wr32(IGC_TQAVHC(i),
- 0x80000000 + ring->hicredit * 0x7735);
+ 0x80000000 + ring->hicredit * 0x7736);
} else {
/* Disable any CBS for the queue */
txqctl &= ~(IGC_TXQCTL_QAV_SEL_MASK);
u8 ltype_mask;
u8 ltype_match;
u8 lid;
-};
+} __packed;
struct npc_lt_def_ipsec {
u8 ltype_mask;
u8 lid;
u8 spi_offset;
u8 spi_nz;
-};
+} __packed;
struct npc_lt_def_apad {
u8 ltype_mask;
void *rvu_first_cgx_pdata(struct rvu *rvu);
int cgxlmac_to_pf(struct rvu *rvu, int cgx_id, int lmac_id);
int rvu_cgx_config_tx(void *cgxd, int lmac_id, bool enable);
+int rvu_cgx_tx_enable(struct rvu *rvu, u16 pcifunc, bool enable);
int rvu_cgx_prio_flow_ctrl_cfg(struct rvu *rvu, u16 pcifunc, u8 tx_pause, u8 rx_pause,
u16 pfc_en);
int rvu_cgx_cfg_pause_frm(struct rvu *rvu, u16 pcifunc, u8 tx_pause, u8 rx_pause);
return mac_ops->mac_rx_tx_enable(cgxd, lmac_id, start);
}
+int rvu_cgx_tx_enable(struct rvu *rvu, u16 pcifunc, bool enable)
+{
+ int pf = rvu_get_pf(pcifunc);
+ struct mac_ops *mac_ops;
+ u8 cgx_id, lmac_id;
+ void *cgxd;
+
+ if (!is_cgx_config_permitted(rvu, pcifunc))
+ return LMAC_AF_ERR_PERM_DENIED;
+
+ rvu_get_cgx_lmac_id(rvu->pf2cgxlmac_map[pf], &cgx_id, &lmac_id);
+ cgxd = rvu_cgx_pdata(cgx_id, rvu);
+ mac_ops = get_mac_ops(cgxd);
+
+ return mac_ops->mac_tx_enable(cgxd, lmac_id, enable);
+}
+
int rvu_cgx_config_tx(void *cgxd, int lmac_id, bool enable)
{
struct mac_ops *mac_ops;
req->minlen = minlen;
}
-static int
-nix_config_link_credits(struct rvu *rvu, int blkaddr, int link,
- u16 pcifunc, u64 tx_credits)
-{
- struct rvu_hwinfo *hw = rvu->hw;
- int pf = rvu_get_pf(pcifunc);
- u8 cgx_id = 0, lmac_id = 0;
- unsigned long poll_tmo;
- bool restore_tx_en = 0;
- struct nix_hw *nix_hw;
- u64 cfg, sw_xoff = 0;
- u32 schq = 0;
- u32 credits;
- int rc;
-
- nix_hw = get_nix_hw(rvu->hw, blkaddr);
- if (!nix_hw)
- return NIX_AF_ERR_INVALID_NIXBLK;
-
- if (tx_credits == nix_hw->tx_credits[link])
- return 0;
-
- /* Enable cgx tx if disabled for credits to be back */
- if (is_pf_cgxmapped(rvu, pf)) {
- rvu_get_cgx_lmac_id(rvu->pf2cgxlmac_map[pf], &cgx_id, &lmac_id);
- restore_tx_en = !rvu_cgx_config_tx(rvu_cgx_pdata(cgx_id, rvu),
- lmac_id, true);
- }
-
- mutex_lock(&rvu->rsrc_lock);
- /* Disable new traffic to link */
- if (hw->cap.nix_shaping) {
- schq = nix_get_tx_link(rvu, pcifunc);
- sw_xoff = rvu_read64(rvu, blkaddr, NIX_AF_TL1X_SW_XOFF(schq));
- rvu_write64(rvu, blkaddr,
- NIX_AF_TL1X_SW_XOFF(schq), BIT_ULL(0));
- }
-
- rc = NIX_AF_ERR_LINK_CREDITS;
- poll_tmo = jiffies + usecs_to_jiffies(200000);
- /* Wait for credits to return */
- do {
- if (time_after(jiffies, poll_tmo))
- goto exit;
- usleep_range(100, 200);
-
- cfg = rvu_read64(rvu, blkaddr,
- NIX_AF_TX_LINKX_NORM_CREDIT(link));
- credits = (cfg >> 12) & 0xFFFFFULL;
- } while (credits != nix_hw->tx_credits[link]);
-
- cfg &= ~(0xFFFFFULL << 12);
- cfg |= (tx_credits << 12);
- rvu_write64(rvu, blkaddr, NIX_AF_TX_LINKX_NORM_CREDIT(link), cfg);
- rc = 0;
-
- nix_hw->tx_credits[link] = tx_credits;
-
-exit:
- /* Enable traffic back */
- if (hw->cap.nix_shaping && !sw_xoff)
- rvu_write64(rvu, blkaddr, NIX_AF_TL1X_SW_XOFF(schq), 0);
-
- /* Restore state of cgx tx */
- if (restore_tx_en)
- rvu_cgx_config_tx(rvu_cgx_pdata(cgx_id, rvu), lmac_id, false);
-
- mutex_unlock(&rvu->rsrc_lock);
- return rc;
-}
-
int rvu_mbox_handler_nix_set_hw_frs(struct rvu *rvu, struct nix_frs_cfg *req,
struct msg_rsp *rsp)
{
struct rvu_hwinfo *hw = rvu->hw;
u16 pcifunc = req->hdr.pcifunc;
int pf = rvu_get_pf(pcifunc);
- int blkaddr, schq, link = -1;
- struct nix_txsch *txsch;
- u64 cfg, lmac_fifo_len;
+ int blkaddr, link = -1;
struct nix_hw *nix_hw;
struct rvu_pfvf *pfvf;
u8 cgx = 0, lmac = 0;
u16 max_mtu;
+ u64 cfg;
blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NIX, pcifunc);
if (blkaddr < 0)
if (req->update_minlen && req->minlen < NIC_HW_MIN_FRS)
return NIX_AF_ERR_FRS_INVALID;
- /* Check if requester wants to update SMQ's */
- if (!req->update_smq)
- goto rx_frscfg;
-
- /* Update min/maxlen in each of the SMQ attached to this PF/VF */
- txsch = &nix_hw->txsch[NIX_TXSCH_LVL_SMQ];
- mutex_lock(&rvu->rsrc_lock);
- for (schq = 0; schq < txsch->schq.max; schq++) {
- if (TXSCH_MAP_FUNC(txsch->pfvf_map[schq]) != pcifunc)
- continue;
- cfg = rvu_read64(rvu, blkaddr, NIX_AF_SMQX_CFG(schq));
- cfg = (cfg & ~(0xFFFFULL << 8)) | ((u64)req->maxlen << 8);
- if (req->update_minlen)
- cfg = (cfg & ~0x7FULL) | ((u64)req->minlen & 0x7F);
- rvu_write64(rvu, blkaddr, NIX_AF_SMQX_CFG(schq), cfg);
- }
- mutex_unlock(&rvu->rsrc_lock);
-
-rx_frscfg:
/* Check if config is for SDP link */
if (req->sdp_link) {
if (!hw->sdp_links)
if (link < 0)
return NIX_AF_ERR_RX_LINK_INVALID;
-
linkcfg:
nix_find_link_frs(rvu, req, pcifunc);
cfg = (cfg & ~0xFFFFULL) | req->minlen;
rvu_write64(rvu, blkaddr, NIX_AF_RX_LINKX_CFG(link), cfg);
- if (req->sdp_link || pf == 0)
- return 0;
-
- /* Update transmit credits for CGX links */
- lmac_fifo_len = rvu_cgx_get_lmac_fifolen(rvu, cgx, lmac);
- if (!lmac_fifo_len) {
- dev_err(rvu->dev,
- "%s: Failed to get CGX/RPM%d:LMAC%d FIFO size\n",
- __func__, cgx, lmac);
- return 0;
- }
- return nix_config_link_credits(rvu, blkaddr, link, pcifunc,
- (lmac_fifo_len - req->maxlen) / 16);
+ return 0;
}
int rvu_mbox_handler_nix_set_rx_cfg(struct rvu *rvu, struct nix_rx_cfg *req,
pfvf = rvu_get_pfvf(rvu, pcifunc);
clear_bit(NIXLF_INITIALIZED, &pfvf->flags);
- return rvu_cgx_start_stop_io(rvu, pcifunc, false);
+ err = rvu_cgx_start_stop_io(rvu, pcifunc, false);
+ if (err)
+ return err;
+
+ rvu_cgx_tx_enable(rvu, pcifunc, true);
+
+ return 0;
}
#define RX_SA_BASE GENMASK_ULL(52, 7)
static int otx2_dcbnl_ieee_setpfc(struct net_device *dev, struct ieee_pfc *pfc)
{
struct otx2_nic *pfvf = netdev_priv(dev);
+ u8 old_pfc_en;
int err;
- /* Save PFC configuration to interface */
+ old_pfc_en = pfvf->pfc_en;
pfvf->pfc_en = pfc->pfc_en;
if (pfvf->hw.tx_queues >= NIX_PF_PFC_PRIO_MAX)
* supported by the tx queue configuration
*/
err = otx2_check_pfc_config(pfvf);
- if (err)
+ if (err) {
+ pfvf->pfc_en = old_pfc_en;
return err;
+ }
process_pfc:
err = otx2_config_priority_flow_ctrl(pfvf);
- if (err)
+ if (err) {
+ pfvf->pfc_en = old_pfc_en;
return err;
+ }
/* Request Per channel Bpids */
if (pfc->pfc_en)
err = otx2_pfc_txschq_update(pfvf);
if (err) {
+ if (pfc->pfc_en)
+ otx2_nix_config_bp(pfvf, false);
+
+ otx2_pfc_txschq_stop(pfvf);
+ pfvf->pfc_en = old_pfc_en;
+ otx2_config_priority_flow_ctrl(pfvf);
dev_err(pfvf->dev, "%s failed to update TX schedulers\n", __func__);
return err;
}
for (i = 0; i < q->n_desc; i++) {
struct mtk_wed_wo_queue_entry *entry = &q->entry[i];
+ if (!entry->buf)
+ continue;
+
dma_unmap_single(wo->hw->dev, entry->addr, entry->len,
DMA_TO_DEVICE);
skb_free_frag(entry->buf);
return token;
}
-static int cmd_alloc_index(struct mlx5_cmd *cmd)
+static int cmd_alloc_index(struct mlx5_cmd *cmd, struct mlx5_cmd_work_ent *ent)
{
unsigned long flags;
int ret;
spin_lock_irqsave(&cmd->alloc_lock, flags);
ret = find_first_bit(&cmd->vars.bitmask, cmd->vars.max_reg_cmds);
- if (ret < cmd->vars.max_reg_cmds)
+ if (ret < cmd->vars.max_reg_cmds) {
clear_bit(ret, &cmd->vars.bitmask);
+ ent->idx = ret;
+ cmd->ent_arr[ent->idx] = ent;
+ }
spin_unlock_irqrestore(&cmd->alloc_lock, flags);
return ret < cmd->vars.max_reg_cmds ? ret : -ENOMEM;
sem = ent->page_queue ? &cmd->vars.pages_sem : &cmd->vars.sem;
down(sem);
if (!ent->page_queue) {
- alloc_ret = cmd_alloc_index(cmd);
+ alloc_ret = cmd_alloc_index(cmd, ent);
if (alloc_ret < 0) {
mlx5_core_err_rl(dev, "failed to allocate command entry\n");
if (ent->callback) {
up(sem);
return;
}
- ent->idx = alloc_ret;
} else {
ent->idx = cmd->vars.max_reg_cmds;
spin_lock_irqsave(&cmd->alloc_lock, flags);
clear_bit(ent->idx, &cmd->vars.bitmask);
+ cmd->ent_arr[ent->idx] = ent;
spin_unlock_irqrestore(&cmd->alloc_lock, flags);
}
- cmd->ent_arr[ent->idx] = ent;
lay = get_inst(cmd, ent->idx);
ent->lay = lay;
memset(lay, 0, sizeof(*lay));
while (block_timestamp > tracer->last_timestamp) {
/* Check block override if it's not the first block */
- if (!tracer->last_timestamp) {
+ if (tracer->last_timestamp) {
u64 *ts_event;
/* To avoid block override be the HW in case of buffer
* wraparound, the time stamp of the previous block
in = kvzalloc(inlen, GFP_KERNEL);
if (!in || !ft->g) {
kfree(ft->g);
+ ft->g = NULL;
kvfree(in);
return -ENOMEM;
}
}
esw_attr->dests[esw_attr->out_count].flags |= MLX5_ESW_DEST_ENCAP;
esw_attr->out_count++;
- /* attr->dests[].rep is resolved when we handle encap */
+ /* attr->dests[].vport is resolved when we handle encap */
return 0;
}
out_priv = netdev_priv(out_dev);
rpriv = out_priv->ppriv;
- esw_attr->dests[esw_attr->out_count].rep = rpriv->rep;
+ esw_attr->dests[esw_attr->out_count].vport_valid = true;
+ esw_attr->dests[esw_attr->out_count].vport = rpriv->rep->vport;
esw_attr->dests[esw_attr->out_count].mdev = out_priv->mdev;
esw_attr->out_count++;
if (err)
goto destroy_neigh_entry;
+ e->encap_size = ipv4_encap_size;
+ e->encap_header = encap_header;
+ encap_header = NULL;
+
if (!(nud_state & NUD_VALID)) {
neigh_event_send(attr.n, NULL);
/* the encap entry will be made valid on neigh update event
memset(&reformat_params, 0, sizeof(reformat_params));
reformat_params.type = e->reformat_type;
- reformat_params.size = ipv4_encap_size;
- reformat_params.data = encap_header;
+ reformat_params.size = e->encap_size;
+ reformat_params.data = e->encap_header;
e->pkt_reformat = mlx5_packet_reformat_alloc(priv->mdev, &reformat_params,
MLX5_FLOW_NAMESPACE_FDB);
if (IS_ERR(e->pkt_reformat)) {
goto destroy_neigh_entry;
}
- e->encap_size = ipv4_encap_size;
- e->encap_header = encap_header;
e->flags |= MLX5_ENCAP_ENTRY_VALID;
mlx5e_rep_queue_neigh_stats_work(netdev_priv(attr.out_dev));
mlx5e_route_lookup_ipv4_put(&attr);
if (err)
goto free_encap;
+ e->encap_size = ipv4_encap_size;
+ kfree(e->encap_header);
+ e->encap_header = encap_header;
+ encap_header = NULL;
+
if (!(nud_state & NUD_VALID)) {
neigh_event_send(attr.n, NULL);
/* the encap entry will be made valid on neigh update event
* and not used before that.
*/
- goto free_encap;
+ goto release_neigh;
}
memset(&reformat_params, 0, sizeof(reformat_params));
reformat_params.type = e->reformat_type;
- reformat_params.size = ipv4_encap_size;
- reformat_params.data = encap_header;
+ reformat_params.size = e->encap_size;
+ reformat_params.data = e->encap_header;
e->pkt_reformat = mlx5_packet_reformat_alloc(priv->mdev, &reformat_params,
MLX5_FLOW_NAMESPACE_FDB);
if (IS_ERR(e->pkt_reformat)) {
goto free_encap;
}
- e->encap_size = ipv4_encap_size;
- kfree(e->encap_header);
- e->encap_header = encap_header;
-
e->flags |= MLX5_ENCAP_ENTRY_VALID;
mlx5e_rep_queue_neigh_stats_work(netdev_priv(attr.out_dev));
mlx5e_route_lookup_ipv4_put(&attr);
if (err)
goto destroy_neigh_entry;
+ e->encap_size = ipv6_encap_size;
+ e->encap_header = encap_header;
+ encap_header = NULL;
+
if (!(nud_state & NUD_VALID)) {
neigh_event_send(attr.n, NULL);
/* the encap entry will be made valid on neigh update event
memset(&reformat_params, 0, sizeof(reformat_params));
reformat_params.type = e->reformat_type;
- reformat_params.size = ipv6_encap_size;
- reformat_params.data = encap_header;
+ reformat_params.size = e->encap_size;
+ reformat_params.data = e->encap_header;
e->pkt_reformat = mlx5_packet_reformat_alloc(priv->mdev, &reformat_params,
MLX5_FLOW_NAMESPACE_FDB);
if (IS_ERR(e->pkt_reformat)) {
goto destroy_neigh_entry;
}
- e->encap_size = ipv6_encap_size;
- e->encap_header = encap_header;
e->flags |= MLX5_ENCAP_ENTRY_VALID;
mlx5e_rep_queue_neigh_stats_work(netdev_priv(attr.out_dev));
mlx5e_route_lookup_ipv6_put(&attr);
if (err)
goto free_encap;
+ e->encap_size = ipv6_encap_size;
+ kfree(e->encap_header);
+ e->encap_header = encap_header;
+ encap_header = NULL;
+
if (!(nud_state & NUD_VALID)) {
neigh_event_send(attr.n, NULL);
/* the encap entry will be made valid on neigh update event
* and not used before that.
*/
- goto free_encap;
+ goto release_neigh;
}
memset(&reformat_params, 0, sizeof(reformat_params));
reformat_params.type = e->reformat_type;
- reformat_params.size = ipv6_encap_size;
- reformat_params.data = encap_header;
+ reformat_params.size = e->encap_size;
+ reformat_params.data = e->encap_header;
e->pkt_reformat = mlx5_packet_reformat_alloc(priv->mdev, &reformat_params,
MLX5_FLOW_NAMESPACE_FDB);
if (IS_ERR(e->pkt_reformat)) {
goto free_encap;
}
- e->encap_size = ipv6_encap_size;
- kfree(e->encap_header);
- e->encap_header = encap_header;
-
e->flags |= MLX5_ENCAP_ENTRY_VALID;
mlx5e_rep_queue_neigh_stats_work(netdev_priv(attr.out_dev));
mlx5e_route_lookup_ipv6_put(&attr);
out_priv = netdev_priv(encap_dev);
rpriv = out_priv->ppriv;
- esw_attr->dests[out_index].rep = rpriv->rep;
+ esw_attr->dests[out_index].vport_valid = true;
+ esw_attr->dests[out_index].vport = rpriv->rep->vport;
esw_attr->dests[out_index].mdev = out_priv->mdev;
}
dma_addr_t dma_addr = xdptxd->dma_addr;
u32 dma_len = xdptxd->len;
u16 ds_cnt, inline_hdr_sz;
+ unsigned int frags_size;
u8 num_wqebbs = 1;
int num_frags = 0;
bool inline_ok;
inline_ok = sq->min_inline_mode == MLX5_INLINE_MODE_NONE ||
dma_len >= MLX5E_XDP_MIN_INLINE;
+ frags_size = xdptxd->has_frags ? xdptxdf->sinfo->xdp_frags_size : 0;
- if (unlikely(!inline_ok || sq->hw_mtu < dma_len)) {
+ if (unlikely(!inline_ok || sq->hw_mtu < dma_len + frags_size)) {
stats->err++;
return false;
}
static void mlx5e_ipsec_unblock_tc_offload(struct mlx5_core_dev *mdev)
{
- mdev->num_block_tc++;
+ mdev->num_block_tc--;
}
int mlx5e_accel_ipsec_fs_add_rule(struct mlx5e_ipsec_sa_entry *sa_entry)
count = snprintf(drvinfo->fw_version, sizeof(drvinfo->fw_version),
"%d.%d.%04d (%.16s)", fw_rev_maj(mdev),
fw_rev_min(mdev), fw_rev_sub(mdev), mdev->board_id);
- if (count == sizeof(drvinfo->fw_version))
+ if (count >= sizeof(drvinfo->fw_version))
snprintf(drvinfo->fw_version, sizeof(drvinfo->fw_version),
"%d.%d.%04d", fw_rev_maj(mdev),
fw_rev_min(mdev), fw_rev_sub(mdev));
count = snprintf(drvinfo->fw_version, sizeof(drvinfo->fw_version),
"%d.%d.%04d (%.16s)", fw_rev_maj(mdev),
fw_rev_min(mdev), fw_rev_sub(mdev), mdev->board_id);
- if (count == sizeof(drvinfo->fw_version))
+ if (count >= sizeof(drvinfo->fw_version))
snprintf(drvinfo->fw_version, sizeof(drvinfo->fw_version),
"%d.%d.%04d", fw_rev_maj(mdev),
fw_rev_min(mdev), fw_rev_sub(mdev));
break;
case FLOW_ACTION_ACCEPT:
case FLOW_ACTION_PIPE:
- if (set_branch_dest_ft(flow->priv, attr))
+ err = set_branch_dest_ft(flow->priv, attr);
+ if (err)
goto out_err;
break;
case FLOW_ACTION_JUMP:
goto out_err;
}
*jump_count = cond->extval;
- if (set_branch_dest_ft(flow->priv, attr))
+ err = set_branch_dest_ft(flow->priv, attr);
+ if (err)
goto out_err;
break;
default:
esw = priv->mdev->priv.eswitch;
attr->act_id_restore_rule = esw_add_restore_rule(esw, *act_miss_mapping);
- if (IS_ERR(attr->act_id_restore_rule))
+ if (IS_ERR(attr->act_id_restore_rule)) {
+ err = PTR_ERR(attr->act_id_restore_rule);
goto err_rule;
+ }
return 0;
u8 total_vlan;
struct {
u32 flags;
- struct mlx5_eswitch_rep *rep;
+ bool vport_valid;
+ u16 vport;
struct mlx5_pkt_reformat *pkt_reformat;
struct mlx5_core_dev *mdev;
struct mlx5_termtbl_handle *termtbl;
for (i = from; i < to; i++)
if (esw_attr->dests[i].flags & MLX5_ESW_DEST_CHAIN_WITH_SRC_PORT_CHANGE)
mlx5_chains_put_table(chains, 0, 1, 0);
- else if (mlx5_esw_indir_table_needed(esw, attr, esw_attr->dests[i].rep->vport,
+ else if (mlx5_esw_indir_table_needed(esw, attr, esw_attr->dests[i].vport,
esw_attr->dests[i].mdev))
- mlx5_esw_indir_table_put(esw, esw_attr->dests[i].rep->vport,
- false);
+ mlx5_esw_indir_table_put(esw, esw_attr->dests[i].vport, false);
}
static bool
* this criteria.
*/
for (i = esw_attr->split_count; i < esw_attr->out_count; i++) {
- if (esw_attr->dests[i].rep &&
- mlx5_esw_indir_table_needed(esw, attr, esw_attr->dests[i].rep->vport,
+ if (esw_attr->dests[i].vport_valid &&
+ mlx5_esw_indir_table_needed(esw, attr, esw_attr->dests[i].vport,
esw_attr->dests[i].mdev)) {
result = true;
} else {
dest[*i].type = MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE;
dest[*i].ft = mlx5_esw_indir_table_get(esw, attr,
- esw_attr->dests[j].rep->vport, false);
+ esw_attr->dests[j].vport, false);
if (IS_ERR(dest[*i].ft)) {
err = PTR_ERR(dest[*i].ft);
goto err_indir_tbl_get;
int attr_idx)
{
if (esw->offloads.ft_ipsec_tx_pol &&
- esw_attr->dests[attr_idx].rep &&
- esw_attr->dests[attr_idx].rep->vport == MLX5_VPORT_UPLINK &&
+ esw_attr->dests[attr_idx].vport_valid &&
+ esw_attr->dests[attr_idx].vport == MLX5_VPORT_UPLINK &&
/* To be aligned with software, encryption is needed only for tunnel device */
(esw_attr->dests[attr_idx].flags & MLX5_ESW_DEST_ENCAP_VALID) &&
- esw_attr->dests[attr_idx].rep != esw_attr->in_rep &&
+ esw_attr->dests[attr_idx].vport != esw_attr->in_rep->vport &&
esw_same_vhca_id(esw_attr->dests[attr_idx].mdev, esw->dev))
return true;
int attr_idx, int dest_idx, bool pkt_reformat)
{
dest[dest_idx].type = MLX5_FLOW_DESTINATION_TYPE_VPORT;
- dest[dest_idx].vport.num = esw_attr->dests[attr_idx].rep->vport;
+ dest[dest_idx].vport.num = esw_attr->dests[attr_idx].vport;
if (MLX5_CAP_ESW(esw->dev, merged_eswitch)) {
dest[dest_idx].vport.vhca_id =
MLX5_CAP_GEN(esw_attr->dests[attr_idx].mdev, vhca_id);
struct mlx5_flow_handle *flow;
struct mlx5_flow_spec *spec;
struct mlx5_vport *vport;
+ int err, pfindex;
unsigned long i;
void *misc;
- int err;
if (!MLX5_VPORT_MANAGER(esw->dev) && !mlx5_core_is_ecpf_esw_manager(esw->dev))
return 0;
flows[vport->index] = flow;
}
}
- esw->fdb_table.offloads.peer_miss_rules[mlx5_get_dev_index(peer_dev)] = flows;
+
+ pfindex = mlx5_get_dev_index(peer_dev);
+ if (pfindex >= MLX5_MAX_PORTS) {
+ esw_warn(esw->dev, "Peer dev index(%d) is over the max num defined(%d)\n",
+ pfindex, MLX5_MAX_PORTS);
+ err = -EINVAL;
+ goto add_ec_vf_flow_err;
+ }
+ esw->fdb_table.offloads.peer_miss_rules[pfindex] = flows;
kvfree(spec);
return 0;
/* hairpin */
for (i = esw_attr->split_count; i < esw_attr->out_count; i++)
- if (!esw_attr->dest_int_port && esw_attr->dests[i].rep &&
- esw_attr->dests[i].rep->vport == MLX5_VPORT_UPLINK)
+ if (!esw_attr->dest_int_port && esw_attr->dests[i].vport_valid &&
+ esw_attr->dests[i].vport == MLX5_VPORT_UPLINK)
return true;
return false;
req_list_size = max_list_size;
}
- out_sz = MLX5_ST_SZ_BYTES(query_nic_vport_context_in) +
+ out_sz = MLX5_ST_SZ_BYTES(query_nic_vport_context_out) +
req_list_size * MLX5_ST_SZ_BYTES(mac_address_layout);
out = kvzalloc(out_sz, GFP_KERNEL);
priv->stats.rx_truncate_errors++;
}
+ /* Read receive consumer index before replenish so that this routine
+ * returns accurate return value even if packet is received into
+ * just-replenished buffer prior to exiting this routine.
+ */
+ rx_ci = readq(priv->base + MLXBF_GIGE_RX_CQE_PACKET_CI);
+ rx_ci_rem = rx_ci % priv->rx_q_entries;
+
/* Let hardware know we've replenished one buffer */
rx_pi++;
rx_pi_rem = rx_pi % priv->rx_q_entries;
if (rx_pi_rem == 0)
priv->valid_polarity ^= 1;
- rx_ci = readq(priv->base + MLXBF_GIGE_RX_CQE_PACKET_CI);
- rx_ci_rem = rx_ci % priv->rx_q_entries;
if (skb)
netif_receive_skb(skb);
* @rxd: Space for receiving SPI data, in DMA-able space.
* @txd: Space for transmitting SPI data, in DMA-able space.
* @msg_enable: The message flags controlling driver output (see ethtool).
+ * @tx_space: Free space in the hardware TX buffer (cached copy of KS_TXMIR).
+ * @queued_len: Space required in hardware TX buffer for queued packets in txq.
* @fid: Incrementing frame id tag.
* @rc_ier: Cached copy of KS_IER.
* @rc_ccr: Cached copy of KS_CCR.
struct work_struct rxctrl_work;
struct sk_buff_head txq;
+ unsigned int queued_len;
struct eeprom_93cx6 eeprom;
struct regulator *vdd_reg;
handled |= IRQ_RXPSI;
if (status & IRQ_TXI) {
- handled |= IRQ_TXI;
+ unsigned short tx_space = ks8851_rdreg16(ks, KS_TXMIR);
- /* no lock here, tx queue should have been stopped */
+ netif_dbg(ks, intr, ks->netdev,
+ "%s: txspace %d\n", __func__, tx_space);
- /* update our idea of how much tx space is available to the
- * system */
- ks->tx_space = ks8851_rdreg16(ks, KS_TXMIR);
+ spin_lock(&ks->statelock);
+ ks->tx_space = tx_space;
+ if (netif_queue_stopped(ks->netdev))
+ netif_wake_queue(ks->netdev);
+ spin_unlock(&ks->statelock);
- netif_dbg(ks, intr, ks->netdev,
- "%s: txspace %d\n", __func__, ks->tx_space);
+ handled |= IRQ_TXI;
}
if (status & IRQ_RXI)
if (status & IRQ_LCI)
mii_check_link(&ks->mii);
- if (status & IRQ_TXI)
- netif_wake_queue(ks->netdev);
-
return IRQ_HANDLED;
}
ks8851_wrreg16(ks, KS_ISR, ks->rc_ier);
ks8851_wrreg16(ks, KS_IER, ks->rc_ier);
+ ks->queued_len = 0;
netif_start_queue(ks->netdev);
netif_dbg(ks, ifup, ks->netdev, "network device up\n");
netdev_err(ks->netdev, "%s: spi_sync() failed\n", __func__);
}
+/**
+ * calc_txlen - calculate size of message to send packet
+ * @len: Length of data
+ *
+ * Returns the size of the TXFIFO message needed to send
+ * this packet.
+ */
+static unsigned int calc_txlen(unsigned int len)
+{
+ return ALIGN(len + 4, 4);
+}
+
/**
* ks8851_rx_skb_spi - receive skbuff
* @ks: The device state
*/
static void ks8851_tx_work(struct work_struct *work)
{
+ unsigned int dequeued_len = 0;
struct ks8851_net_spi *kss;
+ unsigned short tx_space;
struct ks8851_net *ks;
unsigned long flags;
struct sk_buff *txb;
last = skb_queue_empty(&ks->txq);
if (txb) {
+ dequeued_len += calc_txlen(txb->len);
+
ks8851_wrreg16_spi(ks, KS_RXQCR,
ks->rc_rxqcr | RXQCR_SDA);
ks8851_wrfifo_spi(ks, txb, last);
}
}
+ tx_space = ks8851_rdreg16_spi(ks, KS_TXMIR);
+
+ spin_lock(&ks->statelock);
+ ks->queued_len -= dequeued_len;
+ ks->tx_space = tx_space;
+ spin_unlock(&ks->statelock);
+
ks8851_unlock_spi(ks, &flags);
}
flush_work(&kss->tx_work);
}
-/**
- * calc_txlen - calculate size of message to send packet
- * @len: Length of data
- *
- * Returns the size of the TXFIFO message needed to send
- * this packet.
- */
-static unsigned int calc_txlen(unsigned int len)
-{
- return ALIGN(len + 4, 4);
-}
-
/**
* ks8851_start_xmit_spi - transmit packet using SPI
* @skb: The buffer to transmit
spin_lock(&ks->statelock);
- if (needed > ks->tx_space) {
+ if (ks->queued_len + needed > ks->tx_space) {
netif_stop_queue(dev);
ret = NETDEV_TX_BUSY;
} else {
- ks->tx_space -= needed;
+ ks->queued_len += needed;
skb_queue_tail(&ks->txq, skb);
}
spin_unlock(&ks->statelock);
- schedule_work(&kss->tx_work);
+ if (ret == NETDEV_TX_OK)
+ schedule_work(&kss->tx_work);
return ret;
}
depends on PCI_MSI && X86_64
depends on PCI_HYPERV
select AUXILIARY_BUS
+ select PAGE_POOL
help
This driver supports Microsoft Azure Network Adapter (MANA).
So far, the driver is only supported on X86_64.
rmon_stats->hist_tx[0] = s[OCELOT_STAT_TX_64];
rmon_stats->hist_tx[1] = s[OCELOT_STAT_TX_65_127];
rmon_stats->hist_tx[2] = s[OCELOT_STAT_TX_128_255];
- rmon_stats->hist_tx[3] = s[OCELOT_STAT_TX_128_255];
- rmon_stats->hist_tx[4] = s[OCELOT_STAT_TX_256_511];
- rmon_stats->hist_tx[5] = s[OCELOT_STAT_TX_512_1023];
- rmon_stats->hist_tx[6] = s[OCELOT_STAT_TX_1024_1526];
+ rmon_stats->hist_tx[3] = s[OCELOT_STAT_TX_256_511];
+ rmon_stats->hist_tx[4] = s[OCELOT_STAT_TX_512_1023];
+ rmon_stats->hist_tx[5] = s[OCELOT_STAT_TX_1024_1526];
+ rmon_stats->hist_tx[6] = s[OCELOT_STAT_TX_1527_MAX];
}
static void ocelot_port_pmac_rmon_stats_cb(struct ocelot *ocelot, int port,
rmon_stats->hist_tx[0] = s[OCELOT_STAT_TX_PMAC_64];
rmon_stats->hist_tx[1] = s[OCELOT_STAT_TX_PMAC_65_127];
rmon_stats->hist_tx[2] = s[OCELOT_STAT_TX_PMAC_128_255];
- rmon_stats->hist_tx[3] = s[OCELOT_STAT_TX_PMAC_128_255];
- rmon_stats->hist_tx[4] = s[OCELOT_STAT_TX_PMAC_256_511];
- rmon_stats->hist_tx[5] = s[OCELOT_STAT_TX_PMAC_512_1023];
- rmon_stats->hist_tx[6] = s[OCELOT_STAT_TX_PMAC_1024_1526];
+ rmon_stats->hist_tx[3] = s[OCELOT_STAT_TX_PMAC_256_511];
+ rmon_stats->hist_tx[4] = s[OCELOT_STAT_TX_PMAC_512_1023];
+ rmon_stats->hist_tx[5] = s[OCELOT_STAT_TX_PMAC_1024_1526];
+ rmon_stats->hist_tx[6] = s[OCELOT_STAT_TX_PMAC_1527_MAX];
}
void ocelot_port_get_rmon_stats(struct ocelot *ocelot, int port,
if (qdev->lrg_buf_q_alloc_virt_addr == NULL) {
netdev_err(qdev->ndev, "lBufQ failed\n");
+ kfree(qdev->lrg_buf);
return -ENOMEM;
}
qdev->lrg_buf_q_virt_addr = qdev->lrg_buf_q_alloc_virt_addr;
qdev->lrg_buf_q_alloc_size,
qdev->lrg_buf_q_alloc_virt_addr,
qdev->lrg_buf_q_alloc_phy_addr);
+ kfree(qdev->lrg_buf);
return -ENOMEM;
}
{
r8168ep_ocp_write(tp, 0x01, 0x180, OOB_CMD_DRIVER_START);
r8168ep_ocp_write(tp, 0x01, 0x30, r8168ep_ocp_read(tp, 0x30) | 0x01);
- rtl_loop_wait_high(tp, &rtl_ep_ocp_read_cond, 10000, 10);
+ rtl_loop_wait_high(tp, &rtl_ep_ocp_read_cond, 10000, 30);
}
static void rtl8168_driver_start(struct rtl8169_private *tp)
return -ETIMEDOUT;
}
-static int ravb_config(struct net_device *ndev)
+static int ravb_set_opmode(struct net_device *ndev, u32 opmode)
{
+ u32 csr_ops = 1U << (opmode & CCC_OPC);
+ u32 ccc_mask = CCC_OPC;
int error;
- /* Set config mode */
- ravb_modify(ndev, CCC, CCC_OPC, CCC_OPC_CONFIG);
- /* Check if the operating mode is changed to the config mode */
- error = ravb_wait(ndev, CSR, CSR_OPS, CSR_OPS_CONFIG);
- if (error)
- netdev_err(ndev, "failed to switch device to config mode\n");
+ /* If gPTP active in config mode is supported it needs to be configured
+ * along with CSEL and operating mode in the same access. This is a
+ * hardware limitation.
+ */
+ if (opmode & CCC_GAC)
+ ccc_mask |= CCC_GAC | CCC_CSEL;
+
+ /* Set operating mode */
+ ravb_modify(ndev, CCC, ccc_mask, opmode);
+ /* Check if the operating mode is changed to the requested one */
+ error = ravb_wait(ndev, CSR, CSR_OPS, csr_ops);
+ if (error) {
+ netdev_err(ndev, "failed to switch device to requested mode (%u)\n",
+ opmode & CCC_OPC);
+ }
return error;
}
int error;
/* Set CONFIG mode */
- error = ravb_config(ndev);
+ error = ravb_set_opmode(ndev, CCC_OPC_CONFIG);
if (error)
return error;
return error;
/* Setting the control will start the AVB-DMAC process. */
- ravb_modify(ndev, CCC, CCC_OPC, CCC_OPC_OPERATION);
-
- return 0;
+ return ravb_set_opmode(ndev, CCC_OPC_OPERATION);
}
static void ravb_get_tx_tstamp(struct net_device *ndev)
return error;
/* Stop AVB-DMAC process */
- return ravb_config(ndev);
+ return ravb_set_opmode(ndev, CCC_OPC_CONFIG);
}
/* E-MAC interrupt handler */
return 0;
}
-static void ravb_set_config_mode(struct net_device *ndev)
+static int ravb_set_config_mode(struct net_device *ndev)
{
struct ravb_private *priv = netdev_priv(ndev);
const struct ravb_hw_info *info = priv->info;
+ int error;
if (info->gptp) {
- ravb_modify(ndev, CCC, CCC_OPC, CCC_OPC_CONFIG);
+ error = ravb_set_opmode(ndev, CCC_OPC_CONFIG);
+ if (error)
+ return error;
/* Set CSEL value */
ravb_modify(ndev, CCC, CCC_CSEL, CCC_CSEL_HPB);
} else if (info->ccc_gac) {
- ravb_modify(ndev, CCC, CCC_OPC, CCC_OPC_CONFIG |
- CCC_GAC | CCC_CSEL_HPB);
+ error = ravb_set_opmode(ndev, CCC_OPC_CONFIG | CCC_GAC | CCC_CSEL_HPB);
} else {
- ravb_modify(ndev, CCC, CCC_OPC, CCC_OPC_CONFIG);
+ error = ravb_set_opmode(ndev, CCC_OPC_CONFIG);
}
+
+ return error;
}
/* Set tx and rx clock internal delay modes */
ndev->ethtool_ops = &ravb_ethtool_ops;
/* Set AVB config mode */
- ravb_set_config_mode(ndev);
+ error = ravb_set_config_mode(ndev);
+ if (error)
+ goto out_disable_gptp_clk;
if (info->gptp || info->ccc_gac) {
/* Set GTI value */
dma_free_coherent(ndev->dev.parent, priv->desc_bat_size, priv->desc_bat,
priv->desc_bat_dma);
- /* Set reset mode */
- ravb_write(ndev, CCC_OPC_RESET, CCC);
+ ravb_set_opmode(ndev, CCC_OPC_RESET);
clk_disable_unprepare(priv->gptp_clk);
clk_disable_unprepare(priv->refclk);
int ret = 0;
/* If WoL is enabled set reset mode to rearm the WoL logic */
- if (priv->wol_enabled)
- ravb_write(ndev, CCC_OPC_RESET, CCC);
+ if (priv->wol_enabled) {
+ ret = ravb_set_opmode(ndev, CCC_OPC_RESET);
+ if (ret)
+ return ret;
+ }
/* All register have been reset to default values.
* Restore all registers which where setup at probe time and
*/
/* Set AVB config mode */
- ravb_set_config_mode(ndev);
+ ret = ravb_set_config_mode(ndev);
+ if (ret)
+ return ret;
if (info->gptp || info->ccc_gac) {
/* Set GTI value */
}
if (!success) {
- efx_for_each_channel(channel, efx)
+ efx_for_each_channel(channel, efx) {
kfree(channel->rps_flow_id);
+ channel->rps_flow_id = NULL;
+ }
efx->type->filter_table_remove(efx);
rc = -ENOMEM;
goto out_unlock;
*/
ts_status = readl(priv->ioaddr + GMAC_TIMESTAMP_STATUS);
- if (priv->plat->flags & STMMAC_FLAG_EXT_SNAPSHOT_EN)
+ if (!(priv->plat->flags & STMMAC_FLAG_EXT_SNAPSHOT_EN))
return;
num_snapshot = (ts_status & GMAC_TIMESTAMP_ATSNS_MASK) >>
return port->priv;
}
-#ifdef CONFIG_PPC_EARLY_DEBUG_PS3GELIC
-void udbg_shutdown_ps3gelic(void);
-#else
-static inline void udbg_shutdown_ps3gelic(void) {}
-#endif
-
int gelic_card_set_irq_mask(struct gelic_card *card, u64 mask);
/* shared netdev ops */
void gelic_card_up(struct gelic_card *card);
return rx_desc->wb.upper.status_error & cpu_to_le32(stat_err_bits);
}
-static bool wx_can_reuse_rx_page(struct wx_rx_buffer *rx_buffer,
- int rx_buffer_pgcnt)
-{
- unsigned int pagecnt_bias = rx_buffer->pagecnt_bias;
- struct page *page = rx_buffer->page;
-
- /* avoid re-using remote and pfmemalloc pages */
- if (!dev_page_is_reusable(page))
- return false;
-
-#if (PAGE_SIZE < 8192)
- /* if we are only owner of page we can reuse it */
- if (unlikely((rx_buffer_pgcnt - pagecnt_bias) > 1))
- return false;
-#endif
-
- /* If we have drained the page fragment pool we need to update
- * the pagecnt_bias and page count so that we fully restock the
- * number of references the driver holds.
- */
- if (unlikely(pagecnt_bias == 1)) {
- page_ref_add(page, USHRT_MAX - 1);
- rx_buffer->pagecnt_bias = USHRT_MAX;
- }
-
- return true;
-}
-
-/**
- * wx_reuse_rx_page - page flip buffer and store it back on the ring
- * @rx_ring: rx descriptor ring to store buffers on
- * @old_buff: donor buffer to have page reused
- *
- * Synchronizes page for reuse by the adapter
- **/
-static void wx_reuse_rx_page(struct wx_ring *rx_ring,
- struct wx_rx_buffer *old_buff)
-{
- u16 nta = rx_ring->next_to_alloc;
- struct wx_rx_buffer *new_buff;
-
- new_buff = &rx_ring->rx_buffer_info[nta];
-
- /* update, and store next to alloc */
- nta++;
- rx_ring->next_to_alloc = (nta < rx_ring->count) ? nta : 0;
-
- /* transfer page from old buffer to new buffer */
- new_buff->page = old_buff->page;
- new_buff->page_dma = old_buff->page_dma;
- new_buff->page_offset = old_buff->page_offset;
- new_buff->pagecnt_bias = old_buff->pagecnt_bias;
-}
-
static void wx_dma_sync_frag(struct wx_ring *rx_ring,
struct wx_rx_buffer *rx_buffer)
{
size,
DMA_FROM_DEVICE);
skip_sync:
- rx_buffer->pagecnt_bias--;
-
return rx_buffer;
}
struct sk_buff *skb,
int rx_buffer_pgcnt)
{
- if (wx_can_reuse_rx_page(rx_buffer, rx_buffer_pgcnt)) {
- /* hand second half of page back to the ring */
- wx_reuse_rx_page(rx_ring, rx_buffer);
- } else {
- if (!IS_ERR(skb) && WX_CB(skb)->dma == rx_buffer->dma)
- /* the page has been released from the ring */
- WX_CB(skb)->page_released = true;
- else
- page_pool_put_full_page(rx_ring->page_pool, rx_buffer->page, false);
-
- __page_frag_cache_drain(rx_buffer->page,
- rx_buffer->pagecnt_bias);
- }
+ if (!IS_ERR(skb) && WX_CB(skb)->dma == rx_buffer->dma)
+ /* the page has been released from the ring */
+ WX_CB(skb)->page_released = true;
/* clear contents of rx_buffer */
rx_buffer->page = NULL;
if (size <= WX_RXBUFFER_256) {
memcpy(__skb_put(skb, size), page_addr,
ALIGN(size, sizeof(long)));
- rx_buffer->pagecnt_bias++;
-
+ page_pool_put_full_page(rx_ring->page_pool, rx_buffer->page, true);
return skb;
}
+ skb_mark_for_recycle(skb);
+
if (!wx_test_staterr(rx_desc, WX_RXD_STAT_EOP))
WX_CB(skb)->dma = rx_buffer->dma;
bi->page_dma = dma;
bi->page = page;
bi->page_offset = 0;
- page_ref_add(page, USHRT_MAX - 1);
- bi->pagecnt_bias = USHRT_MAX;
return true;
}
/* exit if we failed to retrieve a buffer */
if (!skb) {
rx_ring->rx_stats.alloc_rx_buff_failed++;
- rx_buffer->pagecnt_bias++;
break;
}
/* free resources associated with mapping */
page_pool_put_full_page(rx_ring->page_pool, rx_buffer->page, false);
- __page_frag_cache_drain(rx_buffer->page,
- rx_buffer->pagecnt_bias);
i++;
rx_buffer++;
dma_addr_t page_dma;
struct page *page;
unsigned int page_offset;
- u16 pagecnt_bias;
};
struct wx_queue_stats {
goto error;
phy_resume(phydev);
- phy_led_triggers_register(phydev);
+ if (!phydev->is_on_sfp_module)
+ phy_led_triggers_register(phydev);
/**
* If the external phy used by current mac interface is managed by
}
phydev->phylink = NULL;
- phy_led_triggers_unregister(phydev);
+ if (!phydev->is_on_sfp_module)
+ phy_led_triggers_unregister(phydev);
if (phydev->mdio.dev.driver)
module_put(phydev->mdio.dev.driver->owner);
u8 buf[ETH_ALEN];
struct ax88172a_private *priv;
- usbnet_get_endpoints(dev, intf);
+ ret = usbnet_get_endpoints(dev, intf);
+ if (ret)
+ return ret;
priv = kzalloc(sizeof(*priv), GFP_KERNEL);
if (!priv)
u8 in_pm;
u32 wol_supported;
u32 wolopts;
+ u8 disconnecting;
};
struct ax88179_int_data {
{
int ret;
int (*fn)(struct usbnet *, u8, u8, u16, u16, void *, u16);
+ struct ax88179_data *ax179_data = dev->driver_priv;
BUG_ON(!dev);
ret = fn(dev, cmd, USB_DIR_IN | USB_TYPE_VENDOR | USB_RECIP_DEVICE,
value, index, data, size);
- if (unlikely(ret < 0))
+ if (unlikely((ret < 0) && !(ret == -ENODEV && ax179_data->disconnecting)))
netdev_warn(dev->net, "Failed to read reg index 0x%04x: %d\n",
index, ret);
{
int ret;
int (*fn)(struct usbnet *, u8, u8, u16, u16, const void *, u16);
+ struct ax88179_data *ax179_data = dev->driver_priv;
BUG_ON(!dev);
ret = fn(dev, cmd, USB_DIR_OUT | USB_TYPE_VENDOR | USB_RECIP_DEVICE,
value, index, data, size);
- if (unlikely(ret < 0))
+ if (unlikely((ret < 0) && !(ret == -ENODEV && ax179_data->disconnecting)))
netdev_warn(dev->net, "Failed to write reg index 0x%04x: %d\n",
index, ret);
return usbnet_resume(intf);
}
+static void ax88179_disconnect(struct usb_interface *intf)
+{
+ struct usbnet *dev = usb_get_intfdata(intf);
+ struct ax88179_data *ax179_data;
+
+ if (!dev)
+ return;
+
+ ax179_data = dev->driver_priv;
+ ax179_data->disconnecting = 1;
+
+ usbnet_disconnect(intf);
+}
+
static void
ax88179_get_wol(struct net_device *net, struct ethtool_wolinfo *wolinfo)
{
.suspend = ax88179_suspend,
.resume = ax88179_resume,
.reset_resume = ax88179_resume,
- .disconnect = usbnet_disconnect,
+ .disconnect = ax88179_disconnect,
.supports_autosuspend = 1,
.disable_hub_initiated_lpm = 1,
};
};
};
-static void virtnet_rq_free_unused_buf(struct virtqueue *vq, void *buf);
static void virtnet_sq_free_unused_buf(struct virtqueue *vq, void *buf);
static bool is_xdp_frame(void *ptr)
return p;
}
+static void virtnet_rq_free_buf(struct virtnet_info *vi,
+ struct receive_queue *rq, void *buf)
+{
+ if (vi->mergeable_rx_bufs)
+ put_page(virt_to_head_page(buf));
+ else if (vi->big_packets)
+ give_pages(rq, buf);
+ else
+ put_page(virt_to_head_page(buf));
+}
+
static void enable_delayed_refill(struct virtnet_info *vi)
{
spin_lock_bh(&vi->refill_lock);
return buf;
}
-static void *virtnet_rq_detach_unused_buf(struct receive_queue *rq)
-{
- void *buf;
-
- buf = virtqueue_detach_unused_buf(rq->vq);
- if (buf && rq->do_dma)
- virtnet_rq_unmap(rq, buf, 0);
-
- return buf;
-}
-
static void virtnet_rq_init_one_sg(struct receive_queue *rq, void *buf, u32 len)
{
struct virtnet_rq_dma *dma;
}
}
+static void virtnet_rq_unmap_free_buf(struct virtqueue *vq, void *buf)
+{
+ struct virtnet_info *vi = vq->vdev->priv;
+ struct receive_queue *rq;
+ int i = vq2rxq(vq);
+
+ rq = &vi->rq[i];
+
+ if (rq->do_dma)
+ virtnet_rq_unmap(rq, buf, 0);
+
+ virtnet_rq_free_buf(vi, rq, buf);
+}
+
static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
{
unsigned int len;
if (unlikely(len < vi->hdr_len + ETH_HLEN)) {
pr_debug("%s: short packet %i\n", dev->name, len);
DEV_STATS_INC(dev, rx_length_errors);
- virtnet_rq_free_unused_buf(rq->vq, buf);
+ virtnet_rq_free_buf(vi, rq, buf);
return;
}
if (running)
napi_disable(&rq->napi);
- err = virtqueue_resize(rq->vq, ring_num, virtnet_rq_free_unused_buf);
+ err = virtqueue_resize(rq->vq, ring_num, virtnet_rq_unmap_free_buf);
if (err)
netdev_err(vi->dev, "resize rx fail: rx queue index: %d err: %d\n", qindex, err);
xdp_return_frame(ptr_to_xdp(buf));
}
-static void virtnet_rq_free_unused_buf(struct virtqueue *vq, void *buf)
-{
- struct virtnet_info *vi = vq->vdev->priv;
- int i = vq2rxq(vq);
-
- if (vi->mergeable_rx_bufs)
- put_page(virt_to_head_page(buf));
- else if (vi->big_packets)
- give_pages(&vi->rq[i], buf);
- else
- put_page(virt_to_head_page(buf));
-}
-
static void free_unused_bufs(struct virtnet_info *vi)
{
void *buf;
}
for (i = 0; i < vi->max_queue_pairs; i++) {
- struct receive_queue *rq = &vi->rq[i];
+ struct virtqueue *vq = vi->rq[i].vq;
- while ((buf = virtnet_rq_detach_unused_buf(rq)) != NULL)
- virtnet_rq_free_unused_buf(rq->vq, buf);
+ while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
+ virtnet_rq_unmap_free_buf(vq, buf);
cond_resched();
}
}
}
}
-void iwl_pcie_handle_rfkill_irq(struct iwl_trans *trans);
+void iwl_pcie_handle_rfkill_irq(struct iwl_trans *trans, bool from_irq);
static inline bool iwl_is_rfkill_set(struct iwl_trans *trans)
{
return (trans->dbg.dest_tlv || iwl_trans_dbg_ini_valid(trans));
}
-void iwl_trans_pcie_rf_kill(struct iwl_trans *trans, bool state);
+void iwl_trans_pcie_rf_kill(struct iwl_trans *trans, bool state, bool from_irq);
void iwl_trans_pcie_dump_regs(struct iwl_trans *trans);
#ifdef CONFIG_IWLWIFI_DEBUGFS
* if it is true then one of the handlers took the page.
*/
- if (reclaim) {
+ if (reclaim && txq) {
u16 sequence = le16_to_cpu(pkt->hdr.sequence);
int index = SEQ_TO_INDEX(sequence);
int cmd_index = iwl_txq_get_cmd_index(txq, index);
return inta;
}
-void iwl_pcie_handle_rfkill_irq(struct iwl_trans *trans)
+void iwl_pcie_handle_rfkill_irq(struct iwl_trans *trans, bool from_irq)
{
struct iwl_trans_pcie *trans_pcie = IWL_TRANS_GET_PCIE_TRANS(trans);
struct isr_statistics *isr_stats = &trans_pcie->isr_stats;
isr_stats->rfkill++;
if (prev != report)
- iwl_trans_pcie_rf_kill(trans, report);
+ iwl_trans_pcie_rf_kill(trans, report, from_irq);
mutex_unlock(&trans_pcie->mutex);
if (hw_rfkill) {
/* HW RF KILL switch toggled */
if (inta & CSR_INT_BIT_RF_KILL) {
- iwl_pcie_handle_rfkill_irq(trans);
+ iwl_pcie_handle_rfkill_irq(trans, true);
handled |= CSR_INT_BIT_RF_KILL;
}
/* HW RF KILL switch toggled */
if (inta_hw & MSIX_HW_INT_CAUSES_REG_RF_KILL)
- iwl_pcie_handle_rfkill_irq(trans);
+ iwl_pcie_handle_rfkill_irq(trans, true);
if (inta_hw & MSIX_HW_INT_CAUSES_REG_HW_ERR) {
IWL_ERR(trans,
report = test_bit(STATUS_RFKILL_OPMODE, &trans->status);
if (prev != report)
- iwl_trans_pcie_rf_kill(trans, report);
+ iwl_trans_pcie_rf_kill(trans, report, false);
return hw_rfkill;
}
trans_pcie->hw_mask = trans_pcie->hw_init_mask;
}
-static void _iwl_trans_pcie_stop_device(struct iwl_trans *trans)
+static void _iwl_trans_pcie_stop_device(struct iwl_trans *trans, bool from_irq)
{
struct iwl_trans_pcie *trans_pcie = IWL_TRANS_GET_PCIE_TRANS(trans);
if (test_and_clear_bit(STATUS_DEVICE_ENABLED, &trans->status)) {
IWL_DEBUG_INFO(trans,
"DEVICE_ENABLED bit was set and is now cleared\n");
- iwl_pcie_synchronize_irqs(trans);
+ if (!from_irq)
+ iwl_pcie_synchronize_irqs(trans);
iwl_pcie_rx_napi_sync(trans);
iwl_pcie_tx_stop(trans);
iwl_pcie_rx_stop(trans);
clear_bit(STATUS_RFKILL_OPMODE, &trans->status);
}
if (hw_rfkill != was_in_rfkill)
- iwl_trans_pcie_rf_kill(trans, hw_rfkill);
+ iwl_trans_pcie_rf_kill(trans, hw_rfkill, false);
}
static void iwl_trans_pcie_stop_device(struct iwl_trans *trans)
mutex_lock(&trans_pcie->mutex);
trans_pcie->opmode_down = true;
was_in_rfkill = test_bit(STATUS_RFKILL_OPMODE, &trans->status);
- _iwl_trans_pcie_stop_device(trans);
+ _iwl_trans_pcie_stop_device(trans, false);
iwl_trans_pcie_handle_stop_rfkill(trans, was_in_rfkill);
mutex_unlock(&trans_pcie->mutex);
}
-void iwl_trans_pcie_rf_kill(struct iwl_trans *trans, bool state)
+void iwl_trans_pcie_rf_kill(struct iwl_trans *trans, bool state, bool from_irq)
{
struct iwl_trans_pcie __maybe_unused *trans_pcie =
IWL_TRANS_GET_PCIE_TRANS(trans);
if (trans->trans_cfg->gen2)
_iwl_trans_pcie_gen2_stop_device(trans);
else
- _iwl_trans_pcie_stop_device(trans);
+ _iwl_trans_pcie_stop_device(trans, from_irq);
}
}
IWL_WARN(trans, "changing debug rfkill %d->%d\n",
trans_pcie->debug_rfkill, new_value);
trans_pcie->debug_rfkill = new_value;
- iwl_pcie_handle_rfkill_irq(trans);
+ iwl_pcie_handle_rfkill_irq(trans, false);
return count;
}
struct iwl_rxq *rxq = &trans_pcie->rxq[0];
u32 i, r, j, rb_len = 0;
- spin_lock(&rxq->lock);
+ spin_lock_bh(&rxq->lock);
r = iwl_get_closed_rb_stts(trans, rxq);
*data = iwl_fw_error_next_data(*data);
}
- spin_unlock(&rxq->lock);
+ spin_unlock_bh(&rxq->lock);
return rb_len;
}
static void
mt76_add_fragment(struct mt76_dev *dev, struct mt76_queue *q, void *data,
- int len, bool more, u32 info)
+ int len, bool more, u32 info, bool allow_direct)
{
struct sk_buff *skb = q->rx_head;
struct skb_shared_info *shinfo = skb_shinfo(skb);
skb_add_rx_frag(skb, nr_frags, page, offset, len, q->buf_size);
} else {
- mt76_put_page_pool_buf(data, true);
+ mt76_put_page_pool_buf(data, allow_direct);
}
if (more)
struct sk_buff *skb;
unsigned char *data;
bool check_ddone = false;
+ bool allow_direct = !mt76_queue_is_wed_rx(q);
bool more;
if (IS_ENABLED(CONFIG_NET_MEDIATEK_SOC_WED) &&
}
if (q->rx_head) {
- mt76_add_fragment(dev, q, data, len, more, info);
+ mt76_add_fragment(dev, q, data, len, more, info,
+ allow_direct);
continue;
}
continue;
free_frag:
- mt76_put_page_pool_buf(data, true);
+ mt76_put_page_pool_buf(data, allow_direct);
}
mt76_dma_rx_fill(dev, q, true);
ndrv->remove(to_nubus_board(dev));
}
-struct bus_type nubus_bus_type = {
+static const struct bus_type nubus_bus_type = {
.name = "nubus",
.probe = nubus_device_probe,
.remove = nubus_device_remove,
};
-EXPORT_SYMBOL(nubus_bus_type);
int nubus_driver_register(struct nubus_driver *ndrv)
{
struct nvme_ctrl, fw_act_work);
unsigned long fw_act_timeout;
+ nvme_auth_stop(ctrl);
+
if (ctrl->mtfa)
fw_act_timeout = jiffies +
msecs_to_jiffies(ctrl->mtfa * 100);
* firmware activation.
*/
if (nvme_change_ctrl_state(ctrl, NVME_CTRL_RESETTING)) {
- nvme_auth_stop(ctrl);
requeue = false;
queue_work(nvme_wq, &ctrl->fw_act_work);
}
* the controller. Abort any ios on the association and let the
* create_association error path resolve things.
*/
- enum nvme_ctrl_state state;
- unsigned long flags;
-
- spin_lock_irqsave(&ctrl->lock, flags);
- state = ctrl->ctrl.state;
- if (state == NVME_CTRL_CONNECTING) {
- set_bit(ASSOC_FAILED, &ctrl->flags);
- spin_unlock_irqrestore(&ctrl->lock, flags);
+ if (ctrl->ctrl.state == NVME_CTRL_CONNECTING) {
__nvme_fc_abort_outstanding_ios(ctrl, true);
+ set_bit(ASSOC_FAILED, &ctrl->flags);
dev_warn(ctrl->ctrl.device,
"NVME-FC{%d}: transport error during (re)connect\n",
ctrl->cnum);
return;
}
- spin_unlock_irqrestore(&ctrl->lock, flags);
/* Otherwise, only proceed if in LIVE state - e.g. on first error */
- if (state != NVME_CTRL_LIVE)
+ if (ctrl->ctrl.state != NVME_CTRL_LIVE)
return;
dev_warn(ctrl->ctrl.device,
else
ret = nvme_fc_recreate_io_queues(ctrl);
}
-
- spin_lock_irqsave(&ctrl->lock, flags);
if (!ret && test_bit(ASSOC_FAILED, &ctrl->flags))
ret = -EIO;
- if (ret) {
- spin_unlock_irqrestore(&ctrl->lock, flags);
+ if (ret)
goto out_term_aen_ops;
- }
+
changed = nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE);
- spin_unlock_irqrestore(&ctrl->lock, flags);
ctrl->ctrl.nr_reconnects = 0;
#define NVRAM_MAGIC "FLSH"
+/**
+ * struct brcm_nvram - driver state internal struct
+ *
+ * @dev: NVMEM device pointer
+ * @nvmem_size: Size of the whole space available for NVRAM
+ * @data: NVRAM data copy stored to avoid poking underlaying flash controller
+ * @data_len: NVRAM data size
+ * @padding_byte: Padding value used to fill remaining space
+ * @cells: Array of discovered NVMEM cells
+ * @ncells: Number of elements in cells
+ */
struct brcm_nvram {
struct device *dev;
- void __iomem *base;
+ size_t nvmem_size;
+ uint8_t *data;
+ size_t data_len;
+ uint8_t padding_byte;
struct nvmem_cell_info *cells;
int ncells;
};
size_t bytes)
{
struct brcm_nvram *priv = context;
- u8 *dst = val;
+ size_t to_copy;
+
+ if (offset + bytes > priv->data_len)
+ to_copy = max_t(ssize_t, (ssize_t)priv->data_len - offset, 0);
+ else
+ to_copy = bytes;
+
+ memcpy(val, priv->data + offset, to_copy);
+
+ memset((uint8_t *)val + to_copy, priv->padding_byte, bytes - to_copy);
+
+ return 0;
+}
+
+static int brcm_nvram_copy_data(struct brcm_nvram *priv, struct platform_device *pdev)
+{
+ struct resource *res;
+ void __iomem *base;
+
+ base = devm_platform_get_and_ioremap_resource(pdev, 0, &res);
+ if (IS_ERR(base))
+ return PTR_ERR(base);
+
+ priv->nvmem_size = resource_size(res);
+
+ priv->padding_byte = readb(base + priv->nvmem_size - 1);
+ for (priv->data_len = priv->nvmem_size;
+ priv->data_len;
+ priv->data_len--) {
+ if (readb(base + priv->data_len - 1) != priv->padding_byte)
+ break;
+ }
+ WARN(priv->data_len > SZ_128K, "Unexpected (big) NVRAM size: %zu B\n", priv->data_len);
+
+ priv->data = devm_kzalloc(priv->dev, priv->data_len, GFP_KERNEL);
+ if (!priv->data)
+ return -ENOMEM;
+
+ memcpy_fromio(priv->data, base, priv->data_len);
- while (bytes--)
- *dst++ = readb(priv->base + offset++);
+ bcm47xx_nvram_init_from_iomem(base, priv->data_len);
return 0;
}
size_t len)
{
struct device *dev = priv->dev;
- char *var, *value, *eq;
+ char *var, *value;
+ uint8_t tmp;
int idx;
+ int err = 0;
+
+ tmp = priv->data[len - 1];
+ priv->data[len - 1] = '\0';
priv->ncells = 0;
for (var = data + sizeof(struct brcm_nvram_header);
}
priv->cells = devm_kcalloc(dev, priv->ncells, sizeof(*priv->cells), GFP_KERNEL);
- if (!priv->cells)
- return -ENOMEM;
+ if (!priv->cells) {
+ err = -ENOMEM;
+ goto out;
+ }
for (var = data + sizeof(struct brcm_nvram_header), idx = 0;
var < (char *)data + len && *var;
var = value + strlen(value) + 1, idx++) {
+ char *eq, *name;
+
eq = strchr(var, '=');
if (!eq)
break;
*eq = '\0';
+ name = devm_kstrdup(dev, var, GFP_KERNEL);
+ *eq = '=';
+ if (!name) {
+ err = -ENOMEM;
+ goto out;
+ }
value = eq + 1;
- priv->cells[idx].name = devm_kstrdup(dev, var, GFP_KERNEL);
- if (!priv->cells[idx].name)
- return -ENOMEM;
+ priv->cells[idx].name = name;
priv->cells[idx].offset = value - (char *)data;
priv->cells[idx].bytes = strlen(value);
priv->cells[idx].np = of_get_child_by_name(dev->of_node, priv->cells[idx].name);
- if (!strcmp(var, "et0macaddr") ||
- !strcmp(var, "et1macaddr") ||
- !strcmp(var, "et2macaddr")) {
+ if (!strcmp(name, "et0macaddr") ||
+ !strcmp(name, "et1macaddr") ||
+ !strcmp(name, "et2macaddr")) {
priv->cells[idx].raw_len = strlen(value);
priv->cells[idx].bytes = ETH_ALEN;
priv->cells[idx].read_post_process = brcm_nvram_read_post_process_macaddr;
}
}
- return 0;
+out:
+ priv->data[len - 1] = tmp;
+ return err;
}
static int brcm_nvram_parse(struct brcm_nvram *priv)
{
+ struct brcm_nvram_header *header = (struct brcm_nvram_header *)priv->data;
struct device *dev = priv->dev;
- struct brcm_nvram_header header;
- uint8_t *data;
size_t len;
int err;
- memcpy_fromio(&header, priv->base, sizeof(header));
-
- if (memcmp(header.magic, NVRAM_MAGIC, 4)) {
+ if (memcmp(header->magic, NVRAM_MAGIC, 4)) {
dev_err(dev, "Invalid NVRAM magic\n");
return -EINVAL;
}
- len = le32_to_cpu(header.len);
-
- data = kzalloc(len, GFP_KERNEL);
- if (!data)
- return -ENOMEM;
-
- memcpy_fromio(data, priv->base, len);
- data[len - 1] = '\0';
-
- err = brcm_nvram_add_cells(priv, data, len);
- if (err) {
- dev_err(dev, "Failed to add cells: %d\n", err);
- return err;
+ len = le32_to_cpu(header->len);
+ if (len > priv->nvmem_size) {
+ dev_err(dev, "NVRAM length (%zd) exceeds mapped size (%zd)\n", len,
+ priv->nvmem_size);
+ return -EINVAL;
}
- kfree(data);
+ err = brcm_nvram_add_cells(priv, priv->data, len);
+ if (err)
+ dev_err(dev, "Failed to add cells: %d\n", err);
return 0;
}
.reg_read = brcm_nvram_read,
};
struct device *dev = &pdev->dev;
- struct resource *res;
struct brcm_nvram *priv;
int err;
return -ENOMEM;
priv->dev = dev;
- priv->base = devm_platform_get_and_ioremap_resource(pdev, 0, &res);
- if (IS_ERR(priv->base))
- return PTR_ERR(priv->base);
+ err = brcm_nvram_copy_data(priv, pdev);
+ if (err)
+ return err;
err = brcm_nvram_parse(priv);
if (err)
return err;
- bcm47xx_nvram_init_from_iomem(priv->base, resource_size(res));
-
config.dev = dev;
config.cells = priv->cells;
config.ncells = priv->ncells;
config.priv = priv;
- config.size = resource_size(res);
+ config.size = priv->nvmem_size;
return PTR_ERR_OR_ZERO(devm_nvmem_register(dev, &config));
}
return pci_bus_write_config_dword(dev->bus, dev->devfn, where, val);
}
EXPORT_SYMBOL(pci_write_config_dword);
+
+void pci_clear_and_set_config_dword(const struct pci_dev *dev, int pos,
+ u32 clear, u32 set)
+{
+ u32 val;
+
+ pci_read_config_dword(dev, pos, &val);
+ val &= ~clear;
+ val |= set;
+ pci_write_config_dword(dev, pos, val);
+}
+EXPORT_SYMBOL(pci_clear_and_set_config_dword);
PCI_FUNC(pdev->devfn);
params->int_target.vector = hv_msi_get_int_vector(data);
- /*
- * Honoring apic->delivery_mode set to APIC_DELIVERY_MODE_FIXED by
- * setting the HV_DEVICE_INTERRUPT_TARGET_MULTICAST flag results in a
- * spurious interrupt storm. Not doing so does not seem to have a
- * negative effect (yet?).
- */
-
if (hbus->protocol_version >= PCI_PROTOCOL_VERSION_1_2) {
/*
* PCI_PROTOCOL_VERSION_1_2 supports the VP_SET version of the
pci_restore_bars(dev);
}
+ if (dev->bus->self)
+ pcie_aspm_pm_state_change(dev->bus->self);
+
return 0;
}
pci_power_name(dev->current_state),
pci_power_name(state));
+ if (dev->bus->self)
+ pcie_aspm_pm_state_change(dev->bus->self);
+
return 0;
}
#ifdef CONFIG_PCIEASPM
void pcie_aspm_init_link_state(struct pci_dev *pdev);
void pcie_aspm_exit_link_state(struct pci_dev *pdev);
+void pcie_aspm_pm_state_change(struct pci_dev *pdev);
void pcie_aspm_powersave_config_link(struct pci_dev *pdev);
#else
static inline void pcie_aspm_init_link_state(struct pci_dev *pdev) { }
static inline void pcie_aspm_exit_link_state(struct pci_dev *pdev) { }
+static inline void pcie_aspm_pm_state_change(struct pci_dev *pdev) { }
static inline void pcie_aspm_powersave_config_link(struct pci_dev *pdev) { }
#endif
}
}
-static void pci_clear_and_set_dword(struct pci_dev *pdev, int pos,
- u32 clear, u32 set)
-{
- u32 val;
-
- pci_read_config_dword(pdev, pos, &val);
- val &= ~clear;
- val |= set;
- pci_write_config_dword(pdev, pos, val);
-}
-
/* Calculate L1.2 PM substate timing parameters */
static void aspm_calc_l12_info(struct pcie_link_state *link,
u32 parent_l1ss_cap, u32 child_l1ss_cap)
cl1_2_enables = cctl1 & PCI_L1SS_CTL1_L1_2_MASK;
if (pl1_2_enables || cl1_2_enables) {
- pci_clear_and_set_dword(child, child->l1ss + PCI_L1SS_CTL1,
- PCI_L1SS_CTL1_L1_2_MASK, 0);
- pci_clear_and_set_dword(parent, parent->l1ss + PCI_L1SS_CTL1,
- PCI_L1SS_CTL1_L1_2_MASK, 0);
+ pci_clear_and_set_config_dword(child,
+ child->l1ss + PCI_L1SS_CTL1,
+ PCI_L1SS_CTL1_L1_2_MASK, 0);
+ pci_clear_and_set_config_dword(parent,
+ parent->l1ss + PCI_L1SS_CTL1,
+ PCI_L1SS_CTL1_L1_2_MASK, 0);
}
/* Program T_POWER_ON times in both ports */
pci_write_config_dword(child, child->l1ss + PCI_L1SS_CTL2, ctl2);
/* Program Common_Mode_Restore_Time in upstream device */
- pci_clear_and_set_dword(parent, parent->l1ss + PCI_L1SS_CTL1,
- PCI_L1SS_CTL1_CM_RESTORE_TIME, ctl1);
+ pci_clear_and_set_config_dword(parent, parent->l1ss + PCI_L1SS_CTL1,
+ PCI_L1SS_CTL1_CM_RESTORE_TIME, ctl1);
/* Program LTR_L1.2_THRESHOLD time in both ports */
- pci_clear_and_set_dword(parent, parent->l1ss + PCI_L1SS_CTL1,
- PCI_L1SS_CTL1_LTR_L12_TH_VALUE |
- PCI_L1SS_CTL1_LTR_L12_TH_SCALE, ctl1);
- pci_clear_and_set_dword(child, child->l1ss + PCI_L1SS_CTL1,
- PCI_L1SS_CTL1_LTR_L12_TH_VALUE |
- PCI_L1SS_CTL1_LTR_L12_TH_SCALE, ctl1);
+ pci_clear_and_set_config_dword(parent, parent->l1ss + PCI_L1SS_CTL1,
+ PCI_L1SS_CTL1_LTR_L12_TH_VALUE |
+ PCI_L1SS_CTL1_LTR_L12_TH_SCALE,
+ ctl1);
+ pci_clear_and_set_config_dword(child, child->l1ss + PCI_L1SS_CTL1,
+ PCI_L1SS_CTL1_LTR_L12_TH_VALUE |
+ PCI_L1SS_CTL1_LTR_L12_TH_SCALE,
+ ctl1);
if (pl1_2_enables || cl1_2_enables) {
- pci_clear_and_set_dword(parent, parent->l1ss + PCI_L1SS_CTL1, 0,
- pl1_2_enables);
- pci_clear_and_set_dword(child, child->l1ss + PCI_L1SS_CTL1, 0,
- cl1_2_enables);
+ pci_clear_and_set_config_dword(parent,
+ parent->l1ss + PCI_L1SS_CTL1, 0,
+ pl1_2_enables);
+ pci_clear_and_set_config_dword(child,
+ child->l1ss + PCI_L1SS_CTL1, 0,
+ cl1_2_enables);
}
}
*/
/* Disable all L1 substates */
- pci_clear_and_set_dword(child, child->l1ss + PCI_L1SS_CTL1,
- PCI_L1SS_CTL1_L1SS_MASK, 0);
- pci_clear_and_set_dword(parent, parent->l1ss + PCI_L1SS_CTL1,
- PCI_L1SS_CTL1_L1SS_MASK, 0);
+ pci_clear_and_set_config_dword(child, child->l1ss + PCI_L1SS_CTL1,
+ PCI_L1SS_CTL1_L1SS_MASK, 0);
+ pci_clear_and_set_config_dword(parent, parent->l1ss + PCI_L1SS_CTL1,
+ PCI_L1SS_CTL1_L1SS_MASK, 0);
/*
* If needed, disable L1, and it gets enabled later
* in pcie_config_aspm_link().
val |= PCI_L1SS_CTL1_PCIPM_L1_2;
/* Enable what we need to enable */
- pci_clear_and_set_dword(parent, parent->l1ss + PCI_L1SS_CTL1,
- PCI_L1SS_CTL1_L1SS_MASK, val);
- pci_clear_and_set_dword(child, child->l1ss + PCI_L1SS_CTL1,
- PCI_L1SS_CTL1_L1SS_MASK, val);
+ pci_clear_and_set_config_dword(parent, parent->l1ss + PCI_L1SS_CTL1,
+ PCI_L1SS_CTL1_L1SS_MASK, val);
+ pci_clear_and_set_config_dword(child, child->l1ss + PCI_L1SS_CTL1,
+ PCI_L1SS_CTL1_L1SS_MASK, val);
}
static void pcie_config_aspm_dev(struct pci_dev *pdev, u32 val)
up_read(&pci_bus_sem);
}
+/* @pdev: the root port or switch downstream port */
+void pcie_aspm_pm_state_change(struct pci_dev *pdev)
+{
+ struct pcie_link_state *link = pdev->link_state;
+
+ if (aspm_disabled || !link)
+ return;
+ /*
+ * Devices changed PM state, we should recheck if latency
+ * meets all functions' requirement
+ */
+ down_read(&pci_bus_sem);
+ mutex_lock(&aspm_lock);
+ pcie_update_aspm_capable(link->root);
+ pcie_config_aspm_path(link);
+ mutex_unlock(&aspm_lock);
+ up_read(&pci_bus_sem);
+}
+
void pcie_aspm_powersave_config_link(struct pci_dev *pdev)
{
struct pcie_link_state *link = pdev->link_state;
Enable perf support for Marvell DDR Performance monitoring
event on CN10K platform.
+config DWC_PCIE_PMU
+ tristate "Synopsys DesignWare PCIe PMU"
+ depends on PCI
+ help
+ Enable perf support for Synopsys DesignWare PCIe PMU Performance
+ monitoring event on platform including the Alibaba Yitian 710.
+
source "drivers/perf/arm_cspmu/Kconfig"
source "drivers/perf/amlogic/Kconfig"
obj-$(CONFIG_MARVELL_CN10K_DDR_PMU) += marvell_cn10k_ddr_pmu.o
obj-$(CONFIG_APPLE_M1_CPU_PMU) += apple_m1_cpu_pmu.o
obj-$(CONFIG_ALIBABA_UNCORE_DRW_PMU) += alibaba_uncore_drw_pmu.o
+obj-$(CONFIG_DWC_PCIE_PMU) += dwc_pcie_pmu.o
obj-$(CONFIG_ARM_CORESIGHT_PMU_ARCH_SYSTEM_PMU) += arm_cspmu/
obj-$(CONFIG_MESON_DDR_PMU) += amlogic/
obj-$(CONFIG_CXL_PMU) += cxl_pmu.o
{
unsigned long config_base = 0;
- if (!attr->exclude_guest)
- return -EINVAL;
+ if (!attr->exclude_guest) {
+ pr_debug("ARM performance counters do not support mode exclusion\n");
+ return -EOPNOTSUPP;
+ }
if (!attr->exclude_kernel)
config_base |= M1_PMU_CFG_COUNT_KERNEL;
if (!attr->exclude_user)
#define CMN_EVENT_HNF_OCC(_model, _name, _event) \
CMN_EVENT_HN_OCC(_model, hnf_##_name, CMN_TYPE_HNF, _event)
#define CMN_EVENT_HNF_CLS(_model, _name, _event) \
- CMN_EVENT_HN_CLS(_model, hnf_##_name, CMN_TYPE_HNS, _event)
+ CMN_EVENT_HN_CLS(_model, hnf_##_name, CMN_TYPE_HNF, _event)
#define CMN_EVENT_HNF_SNT(_model, _name, _event) \
CMN_EVENT_HN_SNT(_model, hnf_##_name, CMN_TYPE_HNF, _event)
return __dsu_pmu_get_reset_overflow();
}
-/**
+/*
* dsu_pmu_set_event_period: Set the period for the counter.
*
* All DSU PMU event counters, except the cycle counter are 32bit
return dsu_pmu;
}
-/**
+/*
* dsu_pmu_dt_get_cpus: Get the list of CPUs in the cluster
* from device tree.
*/
return 0;
}
-/**
+/*
* dsu_pmu_acpi_get_cpus: Get the list of CPUs in the cluster
* from ACPI.
*/
{
struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
struct hw_perf_event *hwc = &event->hw;
- int mapping;
+ int mapping, ret;
hwc->flags = 0;
mapping = armpmu->map_event(event);
/*
* Check whether we need to exclude the counter from certain modes.
*/
- if (armpmu->set_event_filter &&
- armpmu->set_event_filter(hwc, &event->attr)) {
- pr_debug("ARM performance counters do not support "
- "mode exclusion\n");
- return -EOPNOTSUPP;
+ if (armpmu->set_event_filter) {
+ ret = armpmu->set_event_filter(hwc, &event->attr);
+ if (ret)
+ return ret;
}
/*
struct pmu_hw_events *events;
events = per_cpu_ptr(pmu->hw_events, cpu);
- raw_spin_lock_init(&events->pmu_lock);
events->percpu_pmu = pmu;
}
#include <clocksource/arm_arch_timer.h>
#include <linux/acpi.h>
+#include <linux/bitfield.h>
#include <linux/clocksource.h>
#include <linux/of.h>
#include <linux/perf/arm_pmu.h>
PMU_EVENT_ATTR_ID(name, armv8pmu_events_sysfs_show, config)
static struct attribute *armv8_pmuv3_event_attrs[] = {
- ARMV8_EVENT_ATTR(sw_incr, ARMV8_PMUV3_PERFCTR_SW_INCR),
+ /*
+ * Don't expose the sw_incr event in /sys. It's not usable as writes to
+ * PMSWINC_EL0 will trap as PMUSERENR.{SW,EN}=={0,0} and event rotation
+ * means we don't have a fixed event<->counter relationship regardless.
+ */
ARMV8_EVENT_ATTR(l1i_cache_refill, ARMV8_PMUV3_PERFCTR_L1I_CACHE_REFILL),
ARMV8_EVENT_ATTR(l1i_tlb_refill, ARMV8_PMUV3_PERFCTR_L1I_TLB_REFILL),
ARMV8_EVENT_ATTR(l1d_cache_refill, ARMV8_PMUV3_PERFCTR_L1D_CACHE_REFILL),
.is_visible = armv8pmu_event_attr_is_visible,
};
-PMU_FORMAT_ATTR(event, "config:0-15");
-PMU_FORMAT_ATTR(long, "config1:0");
-PMU_FORMAT_ATTR(rdpmc, "config1:1");
+/* User ABI */
+#define ATTR_CFG_FLD_event_CFG config
+#define ATTR_CFG_FLD_event_LO 0
+#define ATTR_CFG_FLD_event_HI 15
+#define ATTR_CFG_FLD_long_CFG config1
+#define ATTR_CFG_FLD_long_LO 0
+#define ATTR_CFG_FLD_long_HI 0
+#define ATTR_CFG_FLD_rdpmc_CFG config1
+#define ATTR_CFG_FLD_rdpmc_LO 1
+#define ATTR_CFG_FLD_rdpmc_HI 1
+#define ATTR_CFG_FLD_threshold_count_CFG config1 /* PMEVTYPER.TC[0] */
+#define ATTR_CFG_FLD_threshold_count_LO 2
+#define ATTR_CFG_FLD_threshold_count_HI 2
+#define ATTR_CFG_FLD_threshold_compare_CFG config1 /* PMEVTYPER.TC[2:1] */
+#define ATTR_CFG_FLD_threshold_compare_LO 3
+#define ATTR_CFG_FLD_threshold_compare_HI 4
+#define ATTR_CFG_FLD_threshold_CFG config1 /* PMEVTYPER.TH */
+#define ATTR_CFG_FLD_threshold_LO 5
+#define ATTR_CFG_FLD_threshold_HI 16
+
+GEN_PMU_FORMAT_ATTR(event);
+GEN_PMU_FORMAT_ATTR(long);
+GEN_PMU_FORMAT_ATTR(rdpmc);
+GEN_PMU_FORMAT_ATTR(threshold_count);
+GEN_PMU_FORMAT_ATTR(threshold_compare);
+GEN_PMU_FORMAT_ATTR(threshold);
static int sysctl_perf_user_access __read_mostly;
-static inline bool armv8pmu_event_is_64bit(struct perf_event *event)
+static bool armv8pmu_event_is_64bit(struct perf_event *event)
+{
+ return ATTR_CFG_GET_FLD(&event->attr, long);
+}
+
+static bool armv8pmu_event_want_user_access(struct perf_event *event)
{
- return event->attr.config1 & 0x1;
+ return ATTR_CFG_GET_FLD(&event->attr, rdpmc);
}
-static inline bool armv8pmu_event_want_user_access(struct perf_event *event)
+static u8 armv8pmu_event_threshold_control(struct perf_event_attr *attr)
{
- return event->attr.config1 & 0x2;
+ u8 th_compare = ATTR_CFG_GET_FLD(attr, threshold_compare);
+ u8 th_count = ATTR_CFG_GET_FLD(attr, threshold_count);
+
+ /*
+ * The count bit is always the bottom bit of the full control field, and
+ * the comparison is the upper two bits, but it's not explicitly
+ * labelled in the Arm ARM. For the Perf interface we split it into two
+ * fields, so reconstruct it here.
+ */
+ return (th_compare << 1) | th_count;
}
static struct attribute *armv8_pmuv3_format_attrs[] = {
&format_attr_event.attr,
&format_attr_long.attr,
&format_attr_rdpmc.attr,
+ &format_attr_threshold.attr,
+ &format_attr_threshold_compare.attr,
+ &format_attr_threshold_count.attr,
NULL,
};
{
struct pmu *pmu = dev_get_drvdata(dev);
struct arm_pmu *cpu_pmu = container_of(pmu, struct arm_pmu, pmu);
- u32 slots = cpu_pmu->reg_pmmir & ARMV8_PMU_SLOTS_MASK;
+ u32 slots = FIELD_GET(ARMV8_PMU_SLOTS, cpu_pmu->reg_pmmir);
return sysfs_emit(page, "0x%08x\n", slots);
}
{
struct pmu *pmu = dev_get_drvdata(dev);
struct arm_pmu *cpu_pmu = container_of(pmu, struct arm_pmu, pmu);
- u32 bus_slots = (cpu_pmu->reg_pmmir >> ARMV8_PMU_BUS_SLOTS_SHIFT)
- & ARMV8_PMU_BUS_SLOTS_MASK;
+ u32 bus_slots = FIELD_GET(ARMV8_PMU_BUS_SLOTS, cpu_pmu->reg_pmmir);
return sysfs_emit(page, "0x%08x\n", bus_slots);
}
{
struct pmu *pmu = dev_get_drvdata(dev);
struct arm_pmu *cpu_pmu = container_of(pmu, struct arm_pmu, pmu);
- u32 bus_width = (cpu_pmu->reg_pmmir >> ARMV8_PMU_BUS_WIDTH_SHIFT)
- & ARMV8_PMU_BUS_WIDTH_MASK;
+ u32 bus_width = FIELD_GET(ARMV8_PMU_BUS_WIDTH, cpu_pmu->reg_pmmir);
u32 val = 0;
/* Encoded as Log2(number of bytes), plus one */
static DEVICE_ATTR_RO(bus_width);
+static u32 threshold_max(struct arm_pmu *cpu_pmu)
+{
+ /*
+ * PMMIR.THWIDTH is readable and non-zero on aarch32, but it would be
+ * impossible to write the threshold in the upper 32 bits of PMEVTYPER.
+ */
+ if (IS_ENABLED(CONFIG_ARM))
+ return 0;
+
+ /*
+ * The largest value that can be written to PMEVTYPER<n>_EL0.TH is
+ * (2 ^ PMMIR.THWIDTH) - 1.
+ */
+ return (1 << FIELD_GET(ARMV8_PMU_THWIDTH, cpu_pmu->reg_pmmir)) - 1;
+}
+
+static ssize_t threshold_max_show(struct device *dev,
+ struct device_attribute *attr, char *page)
+{
+ struct pmu *pmu = dev_get_drvdata(dev);
+ struct arm_pmu *cpu_pmu = container_of(pmu, struct arm_pmu, pmu);
+
+ return sysfs_emit(page, "0x%08x\n", threshold_max(cpu_pmu));
+}
+
+static DEVICE_ATTR_RO(threshold_max);
+
static struct attribute *armv8_pmuv3_caps_attrs[] = {
&dev_attr_slots.attr,
&dev_attr_bus_slots.attr,
&dev_attr_bus_width.attr,
+ &dev_attr_threshold_max.attr,
NULL,
};
return (IS_ENABLED(CONFIG_ARM64) && is_pmuv3p5(cpu_pmu->pmuver));
}
-static inline bool armv8pmu_event_has_user_read(struct perf_event *event)
+static bool armv8pmu_event_has_user_read(struct perf_event *event)
{
return event->hw.flags & PERF_EVENT_FLAG_USER_READ_CNT;
}
* except when we have allocated the 64bit cycle counter (for CPU
* cycles event) or when user space counter access is enabled.
*/
-static inline bool armv8pmu_event_is_chained(struct perf_event *event)
+static bool armv8pmu_event_is_chained(struct perf_event *event)
{
int idx = event->hw.idx;
struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
#define ARMV8_IDX_TO_COUNTER(x) \
(((x) - ARMV8_IDX_COUNTER0) & ARMV8_PMU_COUNTER_MASK)
-static inline u64 armv8pmu_pmcr_read(void)
+static u64 armv8pmu_pmcr_read(void)
{
return read_pmcr();
}
-static inline void armv8pmu_pmcr_write(u64 val)
+static void armv8pmu_pmcr_write(u64 val)
{
val &= ARMV8_PMU_PMCR_MASK;
isb();
write_pmcr(val);
}
-static inline int armv8pmu_has_overflowed(u32 pmovsr)
+static int armv8pmu_has_overflowed(u32 pmovsr)
{
return pmovsr & ARMV8_PMU_OVERFLOWED_MASK;
}
-static inline int armv8pmu_counter_has_overflowed(u32 pmnc, int idx)
+static int armv8pmu_counter_has_overflowed(u32 pmnc, int idx)
{
return pmnc & BIT(ARMV8_IDX_TO_COUNTER(idx));
}
-static inline u64 armv8pmu_read_evcntr(int idx)
+static u64 armv8pmu_read_evcntr(int idx)
{
u32 counter = ARMV8_IDX_TO_COUNTER(idx);
return read_pmevcntrn(counter);
}
-static inline u64 armv8pmu_read_hw_counter(struct perf_event *event)
+static u64 armv8pmu_read_hw_counter(struct perf_event *event)
{
int idx = event->hw.idx;
u64 val = armv8pmu_read_evcntr(idx);
return armv8pmu_unbias_long_counter(event, value);
}
-static inline void armv8pmu_write_evcntr(int idx, u64 value)
+static void armv8pmu_write_evcntr(int idx, u64 value)
{
u32 counter = ARMV8_IDX_TO_COUNTER(idx);
write_pmevcntrn(counter, value);
}
-static inline void armv8pmu_write_hw_counter(struct perf_event *event,
+static void armv8pmu_write_hw_counter(struct perf_event *event,
u64 value)
{
int idx = event->hw.idx;
armv8pmu_write_hw_counter(event, value);
}
-static inline void armv8pmu_write_evtype(int idx, u32 val)
+static void armv8pmu_write_evtype(int idx, unsigned long val)
{
u32 counter = ARMV8_IDX_TO_COUNTER(idx);
+ unsigned long mask = ARMV8_PMU_EVTYPE_EVENT |
+ ARMV8_PMU_INCLUDE_EL2 |
+ ARMV8_PMU_EXCLUDE_EL0 |
+ ARMV8_PMU_EXCLUDE_EL1;
- val &= ARMV8_PMU_EVTYPE_MASK;
+ if (IS_ENABLED(CONFIG_ARM64))
+ mask |= ARMV8_PMU_EVTYPE_TC | ARMV8_PMU_EVTYPE_TH;
+
+ val &= mask;
write_pmevtypern(counter, val);
}
-static inline void armv8pmu_write_event_type(struct perf_event *event)
+static void armv8pmu_write_event_type(struct perf_event *event)
{
struct hw_perf_event *hwc = &event->hw;
int idx = hwc->idx;
return mask;
}
-static inline void armv8pmu_enable_counter(u32 mask)
+static void armv8pmu_enable_counter(u32 mask)
{
/*
* Make sure event configuration register writes are visible before we
write_pmcntenset(mask);
}
-static inline void armv8pmu_enable_event_counter(struct perf_event *event)
+static void armv8pmu_enable_event_counter(struct perf_event *event)
{
struct perf_event_attr *attr = &event->attr;
u32 mask = armv8pmu_event_cnten_mask(event);
armv8pmu_enable_counter(mask);
}
-static inline void armv8pmu_disable_counter(u32 mask)
+static void armv8pmu_disable_counter(u32 mask)
{
write_pmcntenclr(mask);
/*
isb();
}
-static inline void armv8pmu_disable_event_counter(struct perf_event *event)
+static void armv8pmu_disable_event_counter(struct perf_event *event)
{
struct perf_event_attr *attr = &event->attr;
u32 mask = armv8pmu_event_cnten_mask(event);
armv8pmu_disable_counter(mask);
}
-static inline void armv8pmu_enable_intens(u32 mask)
+static void armv8pmu_enable_intens(u32 mask)
{
write_pmintenset(mask);
}
-static inline void armv8pmu_enable_event_irq(struct perf_event *event)
+static void armv8pmu_enable_event_irq(struct perf_event *event)
{
u32 counter = ARMV8_IDX_TO_COUNTER(event->hw.idx);
armv8pmu_enable_intens(BIT(counter));
}
-static inline void armv8pmu_disable_intens(u32 mask)
+static void armv8pmu_disable_intens(u32 mask)
{
write_pmintenclr(mask);
isb();
isb();
}
-static inline void armv8pmu_disable_event_irq(struct perf_event *event)
+static void armv8pmu_disable_event_irq(struct perf_event *event)
{
u32 counter = ARMV8_IDX_TO_COUNTER(event->hw.idx);
armv8pmu_disable_intens(BIT(counter));
}
-static inline u32 armv8pmu_getreset_flags(void)
+static u32 armv8pmu_getreset_flags(void)
{
u32 value;
value = read_pmovsclr();
/* Write to clear flags */
- value &= ARMV8_PMU_OVSR_MASK;
+ value &= ARMV8_PMU_OVERFLOWED_MASK;
write_pmovsclr(value);
return value;
struct perf_event_attr *attr)
{
unsigned long config_base = 0;
-
- if (attr->exclude_idle)
- return -EPERM;
+ struct perf_event *perf_event = container_of(attr, struct perf_event,
+ attr);
+ struct arm_pmu *cpu_pmu = to_arm_pmu(perf_event->pmu);
+ u32 th;
+
+ if (attr->exclude_idle) {
+ pr_debug("ARM performance counters do not support mode exclusion\n");
+ return -EOPNOTSUPP;
+ }
/*
* If we're running in hyp mode, then we *are* the hypervisor.
if (attr->exclude_user)
config_base |= ARMV8_PMU_EXCLUDE_EL0;
+ /*
+ * If FEAT_PMUv3_TH isn't implemented, then THWIDTH (threshold_max) will
+ * be 0 and will also trigger this check, preventing it from being used.
+ */
+ th = ATTR_CFG_GET_FLD(attr, threshold);
+ if (th > threshold_max(cpu_pmu)) {
+ pr_debug("PMU event threshold exceeds max value\n");
+ return -EINVAL;
+ }
+
+ if (IS_ENABLED(CONFIG_ARM64) && th) {
+ config_base |= FIELD_PREP(ARMV8_PMU_EVTYPE_TH, th);
+ config_base |= FIELD_PREP(ARMV8_PMU_EVTYPE_TC,
+ armv8pmu_event_threshold_control(attr));
+ }
+
/*
* Install the filter into config_base as this is used to
* construct the event type.
probe->present = true;
/* Read the nb of CNTx counters supported from PMNC */
- cpu_pmu->num_events = (armv8pmu_pmcr_read() >> ARMV8_PMU_PMCR_N_SHIFT)
- & ARMV8_PMU_PMCR_N_MASK;
+ cpu_pmu->num_events = FIELD_GET(ARMV8_PMU_PMCR_N, armv8pmu_pmcr_read());
/* Add the CPU cycles counter */
cpu_pmu->num_events += 1;
return armv8_pmu_init(cpu_pmu, #name, armv8_pmuv3_map_event); \
}
+#define PMUV3_INIT_MAP_EVENT(name, map_event) \
+static int name##_pmu_init(struct arm_pmu *cpu_pmu) \
+{ \
+ return armv8_pmu_init(cpu_pmu, #name, map_event); \
+}
+
PMUV3_INIT_SIMPLE(armv8_pmuv3)
PMUV3_INIT_SIMPLE(armv8_cortex_a34)
PMUV3_INIT_SIMPLE(armv8_nvidia_carmel)
PMUV3_INIT_SIMPLE(armv8_nvidia_denver)
-static int armv8_a35_pmu_init(struct arm_pmu *cpu_pmu)
-{
- return armv8_pmu_init(cpu_pmu, "armv8_cortex_a35", armv8_a53_map_event);
-}
-
-static int armv8_a53_pmu_init(struct arm_pmu *cpu_pmu)
-{
- return armv8_pmu_init(cpu_pmu, "armv8_cortex_a53", armv8_a53_map_event);
-}
-
-static int armv8_a57_pmu_init(struct arm_pmu *cpu_pmu)
-{
- return armv8_pmu_init(cpu_pmu, "armv8_cortex_a57", armv8_a57_map_event);
-}
-
-static int armv8_a72_pmu_init(struct arm_pmu *cpu_pmu)
-{
- return armv8_pmu_init(cpu_pmu, "armv8_cortex_a72", armv8_a57_map_event);
-}
-
-static int armv8_a73_pmu_init(struct arm_pmu *cpu_pmu)
-{
- return armv8_pmu_init(cpu_pmu, "armv8_cortex_a73", armv8_a73_map_event);
-}
-
-static int armv8_thunder_pmu_init(struct arm_pmu *cpu_pmu)
-{
- return armv8_pmu_init(cpu_pmu, "armv8_cavium_thunder", armv8_thunder_map_event);
-}
-
-static int armv8_vulcan_pmu_init(struct arm_pmu *cpu_pmu)
-{
- return armv8_pmu_init(cpu_pmu, "armv8_brcm_vulcan", armv8_vulcan_map_event);
-}
+PMUV3_INIT_MAP_EVENT(armv8_cortex_a35, armv8_a53_map_event)
+PMUV3_INIT_MAP_EVENT(armv8_cortex_a53, armv8_a53_map_event)
+PMUV3_INIT_MAP_EVENT(armv8_cortex_a57, armv8_a57_map_event)
+PMUV3_INIT_MAP_EVENT(armv8_cortex_a72, armv8_a57_map_event)
+PMUV3_INIT_MAP_EVENT(armv8_cortex_a73, armv8_a73_map_event)
+PMUV3_INIT_MAP_EVENT(armv8_cavium_thunder, armv8_thunder_map_event)
+PMUV3_INIT_MAP_EVENT(armv8_brcm_vulcan, armv8_vulcan_map_event)
static const struct of_device_id armv8_pmu_of_device_ids[] = {
{.compatible = "arm,armv8-pmuv3", .data = armv8_pmuv3_pmu_init},
{.compatible = "arm,cortex-a34-pmu", .data = armv8_cortex_a34_pmu_init},
- {.compatible = "arm,cortex-a35-pmu", .data = armv8_a35_pmu_init},
- {.compatible = "arm,cortex-a53-pmu", .data = armv8_a53_pmu_init},
+ {.compatible = "arm,cortex-a35-pmu", .data = armv8_cortex_a35_pmu_init},
+ {.compatible = "arm,cortex-a53-pmu", .data = armv8_cortex_a53_pmu_init},
{.compatible = "arm,cortex-a55-pmu", .data = armv8_cortex_a55_pmu_init},
- {.compatible = "arm,cortex-a57-pmu", .data = armv8_a57_pmu_init},
+ {.compatible = "arm,cortex-a57-pmu", .data = armv8_cortex_a57_pmu_init},
{.compatible = "arm,cortex-a65-pmu", .data = armv8_cortex_a65_pmu_init},
- {.compatible = "arm,cortex-a72-pmu", .data = armv8_a72_pmu_init},
- {.compatible = "arm,cortex-a73-pmu", .data = armv8_a73_pmu_init},
+ {.compatible = "arm,cortex-a72-pmu", .data = armv8_cortex_a72_pmu_init},
+ {.compatible = "arm,cortex-a73-pmu", .data = armv8_cortex_a73_pmu_init},
{.compatible = "arm,cortex-a75-pmu", .data = armv8_cortex_a75_pmu_init},
{.compatible = "arm,cortex-a76-pmu", .data = armv8_cortex_a76_pmu_init},
{.compatible = "arm,cortex-a77-pmu", .data = armv8_cortex_a77_pmu_init},
{.compatible = "arm,neoverse-n1-pmu", .data = armv8_neoverse_n1_pmu_init},
{.compatible = "arm,neoverse-n2-pmu", .data = armv9_neoverse_n2_pmu_init},
{.compatible = "arm,neoverse-v1-pmu", .data = armv8_neoverse_v1_pmu_init},
- {.compatible = "cavium,thunder-pmu", .data = armv8_thunder_pmu_init},
- {.compatible = "brcm,vulcan-pmu", .data = armv8_vulcan_pmu_init},
+ {.compatible = "cavium,thunder-pmu", .data = armv8_cavium_thunder_pmu_init},
+ {.compatible = "brcm,vulcan-pmu", .data = armv8_brcm_vulcan_pmu_init},
{.compatible = "nvidia,carmel-pmu", .data = armv8_nvidia_carmel_pmu_init},
{.compatible = "nvidia,denver-pmu", .data = armv8_nvidia_denver_pmu_init},
{},
#define ATTR_CFG_FLD_inv_event_filter_LO 0
#define ATTR_CFG_FLD_inv_event_filter_HI 63
-/* Why does everything I do descend into this? */
-#define __GEN_PMU_FORMAT_ATTR(cfg, lo, hi) \
- (lo) == (hi) ? #cfg ":" #lo "\n" : #cfg ":" #lo "-" #hi
-
-#define _GEN_PMU_FORMAT_ATTR(cfg, lo, hi) \
- __GEN_PMU_FORMAT_ATTR(cfg, lo, hi)
-
-#define GEN_PMU_FORMAT_ATTR(name) \
- PMU_FORMAT_ATTR(name, \
- _GEN_PMU_FORMAT_ATTR(ATTR_CFG_FLD_##name##_CFG, \
- ATTR_CFG_FLD_##name##_LO, \
- ATTR_CFG_FLD_##name##_HI))
-
-#define _ATTR_CFG_GET_FLD(attr, cfg, lo, hi) \
- ((((attr)->cfg) >> lo) & GENMASK(hi - lo, 0))
-
-#define ATTR_CFG_GET_FLD(attr, name) \
- _ATTR_CFG_GET_FLD(attr, \
- ATTR_CFG_FLD_##name##_CFG, \
- ATTR_CFG_FLD_##name##_LO, \
- ATTR_CFG_FLD_##name##_HI)
-
GEN_PMU_FORMAT_ATTR(ts_enable);
GEN_PMU_FORMAT_ATTR(pa_enable);
GEN_PMU_FORMAT_ATTR(pct_enable);
--- /dev/null
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Synopsys DesignWare PCIe PMU driver
+ *
+ * Copyright (C) 2021-2023 Alibaba Inc.
+ */
+
+#include <linux/bitfield.h>
+#include <linux/bitops.h>
+#include <linux/cpuhotplug.h>
+#include <linux/cpumask.h>
+#include <linux/device.h>
+#include <linux/errno.h>
+#include <linux/kernel.h>
+#include <linux/list.h>
+#include <linux/perf_event.h>
+#include <linux/pci.h>
+#include <linux/platform_device.h>
+#include <linux/smp.h>
+#include <linux/sysfs.h>
+#include <linux/types.h>
+
+#define DWC_PCIE_VSEC_RAS_DES_ID 0x02
+#define DWC_PCIE_EVENT_CNT_CTL 0x8
+
+/*
+ * Event Counter Data Select includes two parts:
+ * - 27-24: Group number(4-bit: 0..0x7)
+ * - 23-16: Event number(8-bit: 0..0x13) within the Group
+ *
+ * Put them together as in TRM.
+ */
+#define DWC_PCIE_CNT_EVENT_SEL GENMASK(27, 16)
+#define DWC_PCIE_CNT_LANE_SEL GENMASK(11, 8)
+#define DWC_PCIE_CNT_STATUS BIT(7)
+#define DWC_PCIE_CNT_ENABLE GENMASK(4, 2)
+#define DWC_PCIE_PER_EVENT_OFF 0x1
+#define DWC_PCIE_PER_EVENT_ON 0x3
+#define DWC_PCIE_EVENT_CLEAR GENMASK(1, 0)
+#define DWC_PCIE_EVENT_PER_CLEAR 0x1
+
+#define DWC_PCIE_EVENT_CNT_DATA 0xC
+
+#define DWC_PCIE_TIME_BASED_ANAL_CTL 0x10
+#define DWC_PCIE_TIME_BASED_REPORT_SEL GENMASK(31, 24)
+#define DWC_PCIE_TIME_BASED_DURATION_SEL GENMASK(15, 8)
+#define DWC_PCIE_DURATION_MANUAL_CTL 0x0
+#define DWC_PCIE_DURATION_1MS 0x1
+#define DWC_PCIE_DURATION_10MS 0x2
+#define DWC_PCIE_DURATION_100MS 0x3
+#define DWC_PCIE_DURATION_1S 0x4
+#define DWC_PCIE_DURATION_2S 0x5
+#define DWC_PCIE_DURATION_4S 0x6
+#define DWC_PCIE_DURATION_4US 0xFF
+#define DWC_PCIE_TIME_BASED_TIMER_START BIT(0)
+#define DWC_PCIE_TIME_BASED_CNT_ENABLE 0x1
+
+#define DWC_PCIE_TIME_BASED_ANAL_DATA_REG_LOW 0x14
+#define DWC_PCIE_TIME_BASED_ANAL_DATA_REG_HIGH 0x18
+
+/* Event attributes */
+#define DWC_PCIE_CONFIG_EVENTID GENMASK(15, 0)
+#define DWC_PCIE_CONFIG_TYPE GENMASK(19, 16)
+#define DWC_PCIE_CONFIG_LANE GENMASK(27, 20)
+
+#define DWC_PCIE_EVENT_ID(event) FIELD_GET(DWC_PCIE_CONFIG_EVENTID, (event)->attr.config)
+#define DWC_PCIE_EVENT_TYPE(event) FIELD_GET(DWC_PCIE_CONFIG_TYPE, (event)->attr.config)
+#define DWC_PCIE_EVENT_LANE(event) FIELD_GET(DWC_PCIE_CONFIG_LANE, (event)->attr.config)
+
+enum dwc_pcie_event_type {
+ DWC_PCIE_TIME_BASE_EVENT,
+ DWC_PCIE_LANE_EVENT,
+ DWC_PCIE_EVENT_TYPE_MAX,
+};
+
+#define DWC_PCIE_LANE_EVENT_MAX_PERIOD GENMASK_ULL(31, 0)
+#define DWC_PCIE_MAX_PERIOD GENMASK_ULL(63, 0)
+
+struct dwc_pcie_pmu {
+ struct pmu pmu;
+ struct pci_dev *pdev; /* Root Port device */
+ u16 ras_des_offset;
+ u32 nr_lanes;
+
+ struct list_head pmu_node;
+ struct hlist_node cpuhp_node;
+ struct perf_event *event[DWC_PCIE_EVENT_TYPE_MAX];
+ int on_cpu;
+};
+
+#define to_dwc_pcie_pmu(p) (container_of(p, struct dwc_pcie_pmu, pmu))
+
+static int dwc_pcie_pmu_hp_state;
+static struct list_head dwc_pcie_dev_info_head =
+ LIST_HEAD_INIT(dwc_pcie_dev_info_head);
+static bool notify;
+
+struct dwc_pcie_dev_info {
+ struct platform_device *plat_dev;
+ struct pci_dev *pdev;
+ struct list_head dev_node;
+};
+
+struct dwc_pcie_vendor_id {
+ int vendor_id;
+};
+
+static const struct dwc_pcie_vendor_id dwc_pcie_vendor_ids[] = {
+ {.vendor_id = PCI_VENDOR_ID_ALIBABA },
+ {} /* terminator */
+};
+
+static ssize_t cpumask_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct dwc_pcie_pmu *pcie_pmu = to_dwc_pcie_pmu(dev_get_drvdata(dev));
+
+ return cpumap_print_to_pagebuf(true, buf, cpumask_of(pcie_pmu->on_cpu));
+}
+static DEVICE_ATTR_RO(cpumask);
+
+static struct attribute *dwc_pcie_pmu_cpumask_attrs[] = {
+ &dev_attr_cpumask.attr,
+ NULL
+};
+
+static struct attribute_group dwc_pcie_cpumask_attr_group = {
+ .attrs = dwc_pcie_pmu_cpumask_attrs,
+};
+
+struct dwc_pcie_format_attr {
+ struct device_attribute attr;
+ u64 field;
+ int config;
+};
+
+PMU_FORMAT_ATTR(eventid, "config:0-15");
+PMU_FORMAT_ATTR(type, "config:16-19");
+PMU_FORMAT_ATTR(lane, "config:20-27");
+
+static struct attribute *dwc_pcie_format_attrs[] = {
+ &format_attr_type.attr,
+ &format_attr_eventid.attr,
+ &format_attr_lane.attr,
+ NULL,
+};
+
+static struct attribute_group dwc_pcie_format_attrs_group = {
+ .name = "format",
+ .attrs = dwc_pcie_format_attrs,
+};
+
+struct dwc_pcie_event_attr {
+ struct device_attribute attr;
+ enum dwc_pcie_event_type type;
+ u16 eventid;
+ u8 lane;
+};
+
+static ssize_t dwc_pcie_event_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct dwc_pcie_event_attr *eattr;
+
+ eattr = container_of(attr, typeof(*eattr), attr);
+
+ if (eattr->type == DWC_PCIE_LANE_EVENT)
+ return sysfs_emit(buf, "eventid=0x%x,type=0x%x,lane=?\n",
+ eattr->eventid, eattr->type);
+ else if (eattr->type == DWC_PCIE_TIME_BASE_EVENT)
+ return sysfs_emit(buf, "eventid=0x%x,type=0x%x\n",
+ eattr->eventid, eattr->type);
+
+ return 0;
+}
+
+#define DWC_PCIE_EVENT_ATTR(_name, _type, _eventid, _lane) \
+ (&((struct dwc_pcie_event_attr[]) {{ \
+ .attr = __ATTR(_name, 0444, dwc_pcie_event_show, NULL), \
+ .type = _type, \
+ .eventid = _eventid, \
+ .lane = _lane, \
+ }})[0].attr.attr)
+
+#define DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(_name, _eventid) \
+ DWC_PCIE_EVENT_ATTR(_name, DWC_PCIE_TIME_BASE_EVENT, _eventid, 0)
+#define DWC_PCIE_PMU_LANE_EVENT_ATTR(_name, _eventid) \
+ DWC_PCIE_EVENT_ATTR(_name, DWC_PCIE_LANE_EVENT, _eventid, 0)
+
+static struct attribute *dwc_pcie_pmu_time_event_attrs[] = {
+ /* Group #0 */
+ DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(one_cycle, 0x00),
+ DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(TX_L0S, 0x01),
+ DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(RX_L0S, 0x02),
+ DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(L0, 0x03),
+ DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(L1, 0x04),
+ DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(L1_1, 0x05),
+ DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(L1_2, 0x06),
+ DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(CFG_RCVRY, 0x07),
+ DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(TX_RX_L0S, 0x08),
+ DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(L1_AUX, 0x09),
+
+ /* Group #1 */
+ DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(Tx_PCIe_TLP_Data_Payload, 0x20),
+ DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(Rx_PCIe_TLP_Data_Payload, 0x21),
+ DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(Tx_CCIX_TLP_Data_Payload, 0x22),
+ DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(Rx_CCIX_TLP_Data_Payload, 0x23),
+
+ /*
+ * Leave it to the user to specify the lane ID to avoid generating
+ * a list of hundreds of events.
+ */
+ DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_ack_dllp, 0x600),
+ DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_update_fc_dllp, 0x601),
+ DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_ack_dllp, 0x602),
+ DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_update_fc_dllp, 0x603),
+ DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_nulified_tlp, 0x604),
+ DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_nulified_tlp, 0x605),
+ DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_duplicate_tl, 0x606),
+ DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_memory_write, 0x700),
+ DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_memory_read, 0x701),
+ DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_configuration_write, 0x702),
+ DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_configuration_read, 0x703),
+ DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_io_write, 0x704),
+ DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_io_read, 0x705),
+ DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_completion_without_data, 0x706),
+ DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_completion_with_data, 0x707),
+ DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_message_tlp, 0x708),
+ DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_atomic, 0x709),
+ DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_tlp_with_prefix, 0x70A),
+ DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_memory_write, 0x70B),
+ DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_memory_read, 0x70C),
+ DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_io_write, 0x70F),
+ DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_io_read, 0x710),
+ DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_completion_without_data, 0x711),
+ DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_completion_with_data, 0x712),
+ DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_message_tlp, 0x713),
+ DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_atomic, 0x714),
+ DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_tlp_with_prefix, 0x715),
+ DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_ccix_tlp, 0x716),
+ DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_ccix_tlp, 0x717),
+ NULL
+};
+
+static const struct attribute_group dwc_pcie_event_attrs_group = {
+ .name = "events",
+ .attrs = dwc_pcie_pmu_time_event_attrs,
+};
+
+static const struct attribute_group *dwc_pcie_attr_groups[] = {
+ &dwc_pcie_event_attrs_group,
+ &dwc_pcie_format_attrs_group,
+ &dwc_pcie_cpumask_attr_group,
+ NULL
+};
+
+static void dwc_pcie_pmu_lane_event_enable(struct dwc_pcie_pmu *pcie_pmu,
+ bool enable)
+{
+ struct pci_dev *pdev = pcie_pmu->pdev;
+ u16 ras_des_offset = pcie_pmu->ras_des_offset;
+
+ if (enable)
+ pci_clear_and_set_config_dword(pdev,
+ ras_des_offset + DWC_PCIE_EVENT_CNT_CTL,
+ DWC_PCIE_CNT_ENABLE, DWC_PCIE_PER_EVENT_ON);
+ else
+ pci_clear_and_set_config_dword(pdev,
+ ras_des_offset + DWC_PCIE_EVENT_CNT_CTL,
+ DWC_PCIE_CNT_ENABLE, DWC_PCIE_PER_EVENT_OFF);
+}
+
+static void dwc_pcie_pmu_time_based_event_enable(struct dwc_pcie_pmu *pcie_pmu,
+ bool enable)
+{
+ struct pci_dev *pdev = pcie_pmu->pdev;
+ u16 ras_des_offset = pcie_pmu->ras_des_offset;
+
+ pci_clear_and_set_config_dword(pdev,
+ ras_des_offset + DWC_PCIE_TIME_BASED_ANAL_CTL,
+ DWC_PCIE_TIME_BASED_TIMER_START, enable);
+}
+
+static u64 dwc_pcie_pmu_read_lane_event_counter(struct perf_event *event)
+{
+ struct dwc_pcie_pmu *pcie_pmu = to_dwc_pcie_pmu(event->pmu);
+ struct pci_dev *pdev = pcie_pmu->pdev;
+ u16 ras_des_offset = pcie_pmu->ras_des_offset;
+ u32 val;
+
+ pci_read_config_dword(pdev, ras_des_offset + DWC_PCIE_EVENT_CNT_DATA, &val);
+
+ return val;
+}
+
+static u64 dwc_pcie_pmu_read_time_based_counter(struct perf_event *event)
+{
+ struct dwc_pcie_pmu *pcie_pmu = to_dwc_pcie_pmu(event->pmu);
+ struct pci_dev *pdev = pcie_pmu->pdev;
+ int event_id = DWC_PCIE_EVENT_ID(event);
+ u16 ras_des_offset = pcie_pmu->ras_des_offset;
+ u32 lo, hi, ss;
+ u64 val;
+
+ /*
+ * The 64-bit value of the data counter is spread across two
+ * registers that are not synchronized. In order to read them
+ * atomically, ensure that the high 32 bits match before and after
+ * reading the low 32 bits.
+ */
+ pci_read_config_dword(pdev,
+ ras_des_offset + DWC_PCIE_TIME_BASED_ANAL_DATA_REG_HIGH, &hi);
+ do {
+ /* snapshot the high 32 bits */
+ ss = hi;
+
+ pci_read_config_dword(
+ pdev, ras_des_offset + DWC_PCIE_TIME_BASED_ANAL_DATA_REG_LOW,
+ &lo);
+ pci_read_config_dword(
+ pdev, ras_des_offset + DWC_PCIE_TIME_BASED_ANAL_DATA_REG_HIGH,
+ &hi);
+ } while (hi != ss);
+
+ val = ((u64)hi << 32) | lo;
+ /*
+ * The Group#1 event measures the amount of data processed in 16-byte
+ * units. Simplify the end-user interface by multiplying the counter
+ * at the point of read.
+ */
+ if (event_id >= 0x20 && event_id <= 0x23)
+ val *= 16;
+
+ return val;
+}
+
+static void dwc_pcie_pmu_event_update(struct perf_event *event)
+{
+ struct hw_perf_event *hwc = &event->hw;
+ enum dwc_pcie_event_type type = DWC_PCIE_EVENT_TYPE(event);
+ u64 delta, prev, now = 0;
+
+ do {
+ prev = local64_read(&hwc->prev_count);
+
+ if (type == DWC_PCIE_LANE_EVENT)
+ now = dwc_pcie_pmu_read_lane_event_counter(event);
+ else if (type == DWC_PCIE_TIME_BASE_EVENT)
+ now = dwc_pcie_pmu_read_time_based_counter(event);
+
+ } while (local64_cmpxchg(&hwc->prev_count, prev, now) != prev);
+
+ delta = (now - prev) & DWC_PCIE_MAX_PERIOD;
+ /* 32-bit counter for Lane Event Counting */
+ if (type == DWC_PCIE_LANE_EVENT)
+ delta &= DWC_PCIE_LANE_EVENT_MAX_PERIOD;
+
+ local64_add(delta, &event->count);
+}
+
+static int dwc_pcie_pmu_event_init(struct perf_event *event)
+{
+ struct dwc_pcie_pmu *pcie_pmu = to_dwc_pcie_pmu(event->pmu);
+ enum dwc_pcie_event_type type = DWC_PCIE_EVENT_TYPE(event);
+ struct perf_event *sibling;
+ u32 lane;
+
+ if (event->attr.type != event->pmu->type)
+ return -ENOENT;
+
+ /* We don't support sampling */
+ if (is_sampling_event(event))
+ return -EINVAL;
+
+ /* We cannot support task bound events */
+ if (event->cpu < 0 || event->attach_state & PERF_ATTACH_TASK)
+ return -EINVAL;
+
+ if (event->group_leader != event &&
+ !is_software_event(event->group_leader))
+ return -EINVAL;
+
+ for_each_sibling_event(sibling, event->group_leader) {
+ if (sibling->pmu != event->pmu && !is_software_event(sibling))
+ return -EINVAL;
+ }
+
+ if (type < 0 || type >= DWC_PCIE_EVENT_TYPE_MAX)
+ return -EINVAL;
+
+ if (type == DWC_PCIE_LANE_EVENT) {
+ lane = DWC_PCIE_EVENT_LANE(event);
+ if (lane < 0 || lane >= pcie_pmu->nr_lanes)
+ return -EINVAL;
+ }
+
+ event->cpu = pcie_pmu->on_cpu;
+
+ return 0;
+}
+
+static void dwc_pcie_pmu_event_start(struct perf_event *event, int flags)
+{
+ struct hw_perf_event *hwc = &event->hw;
+ struct dwc_pcie_pmu *pcie_pmu = to_dwc_pcie_pmu(event->pmu);
+ enum dwc_pcie_event_type type = DWC_PCIE_EVENT_TYPE(event);
+
+ hwc->state = 0;
+ local64_set(&hwc->prev_count, 0);
+
+ if (type == DWC_PCIE_LANE_EVENT)
+ dwc_pcie_pmu_lane_event_enable(pcie_pmu, true);
+ else if (type == DWC_PCIE_TIME_BASE_EVENT)
+ dwc_pcie_pmu_time_based_event_enable(pcie_pmu, true);
+}
+
+static void dwc_pcie_pmu_event_stop(struct perf_event *event, int flags)
+{
+ struct dwc_pcie_pmu *pcie_pmu = to_dwc_pcie_pmu(event->pmu);
+ enum dwc_pcie_event_type type = DWC_PCIE_EVENT_TYPE(event);
+ struct hw_perf_event *hwc = &event->hw;
+
+ if (event->hw.state & PERF_HES_STOPPED)
+ return;
+
+ if (type == DWC_PCIE_LANE_EVENT)
+ dwc_pcie_pmu_lane_event_enable(pcie_pmu, false);
+ else if (type == DWC_PCIE_TIME_BASE_EVENT)
+ dwc_pcie_pmu_time_based_event_enable(pcie_pmu, false);
+
+ dwc_pcie_pmu_event_update(event);
+ hwc->state |= PERF_HES_STOPPED | PERF_HES_UPTODATE;
+}
+
+static int dwc_pcie_pmu_event_add(struct perf_event *event, int flags)
+{
+ struct dwc_pcie_pmu *pcie_pmu = to_dwc_pcie_pmu(event->pmu);
+ struct pci_dev *pdev = pcie_pmu->pdev;
+ struct hw_perf_event *hwc = &event->hw;
+ enum dwc_pcie_event_type type = DWC_PCIE_EVENT_TYPE(event);
+ int event_id = DWC_PCIE_EVENT_ID(event);
+ int lane = DWC_PCIE_EVENT_LANE(event);
+ u16 ras_des_offset = pcie_pmu->ras_des_offset;
+ u32 ctrl;
+
+ /* one counter for each type and it is in use */
+ if (pcie_pmu->event[type])
+ return -ENOSPC;
+
+ pcie_pmu->event[type] = event;
+ hwc->state = PERF_HES_STOPPED | PERF_HES_UPTODATE;
+
+ if (type == DWC_PCIE_LANE_EVENT) {
+ /* EVENT_COUNTER_DATA_REG needs clear manually */
+ ctrl = FIELD_PREP(DWC_PCIE_CNT_EVENT_SEL, event_id) |
+ FIELD_PREP(DWC_PCIE_CNT_LANE_SEL, lane) |
+ FIELD_PREP(DWC_PCIE_CNT_ENABLE, DWC_PCIE_PER_EVENT_OFF) |
+ FIELD_PREP(DWC_PCIE_EVENT_CLEAR, DWC_PCIE_EVENT_PER_CLEAR);
+ pci_write_config_dword(pdev, ras_des_offset + DWC_PCIE_EVENT_CNT_CTL,
+ ctrl);
+ } else if (type == DWC_PCIE_TIME_BASE_EVENT) {
+ /*
+ * TIME_BASED_ANAL_DATA_REG is a 64 bit register, we can safely
+ * use it with any manually controlled duration. And it is
+ * cleared when next measurement starts.
+ */
+ ctrl = FIELD_PREP(DWC_PCIE_TIME_BASED_REPORT_SEL, event_id) |
+ FIELD_PREP(DWC_PCIE_TIME_BASED_DURATION_SEL,
+ DWC_PCIE_DURATION_MANUAL_CTL) |
+ DWC_PCIE_TIME_BASED_CNT_ENABLE;
+ pci_write_config_dword(
+ pdev, ras_des_offset + DWC_PCIE_TIME_BASED_ANAL_CTL, ctrl);
+ }
+
+ if (flags & PERF_EF_START)
+ dwc_pcie_pmu_event_start(event, PERF_EF_RELOAD);
+
+ perf_event_update_userpage(event);
+
+ return 0;
+}
+
+static void dwc_pcie_pmu_event_del(struct perf_event *event, int flags)
+{
+ struct dwc_pcie_pmu *pcie_pmu = to_dwc_pcie_pmu(event->pmu);
+ enum dwc_pcie_event_type type = DWC_PCIE_EVENT_TYPE(event);
+
+ dwc_pcie_pmu_event_stop(event, flags | PERF_EF_UPDATE);
+ perf_event_update_userpage(event);
+ pcie_pmu->event[type] = NULL;
+}
+
+static void dwc_pcie_pmu_remove_cpuhp_instance(void *hotplug_node)
+{
+ cpuhp_state_remove_instance_nocalls(dwc_pcie_pmu_hp_state, hotplug_node);
+}
+
+/*
+ * Find the binded DES capability device info of a PCI device.
+ * @pdev: The PCI device.
+ */
+static struct dwc_pcie_dev_info *dwc_pcie_find_dev_info(struct pci_dev *pdev)
+{
+ struct dwc_pcie_dev_info *dev_info;
+
+ list_for_each_entry(dev_info, &dwc_pcie_dev_info_head, dev_node)
+ if (dev_info->pdev == pdev)
+ return dev_info;
+
+ return NULL;
+}
+
+static void dwc_pcie_unregister_pmu(void *data)
+{
+ struct dwc_pcie_pmu *pcie_pmu = data;
+
+ perf_pmu_unregister(&pcie_pmu->pmu);
+}
+
+static bool dwc_pcie_match_des_cap(struct pci_dev *pdev)
+{
+ const struct dwc_pcie_vendor_id *vid;
+ u16 vsec = 0;
+ u32 val;
+
+ if (!pci_is_pcie(pdev) || !(pci_pcie_type(pdev) == PCI_EXP_TYPE_ROOT_PORT))
+ return false;
+
+ for (vid = dwc_pcie_vendor_ids; vid->vendor_id; vid++) {
+ vsec = pci_find_vsec_capability(pdev, vid->vendor_id,
+ DWC_PCIE_VSEC_RAS_DES_ID);
+ if (vsec)
+ break;
+ }
+ if (!vsec)
+ return false;
+
+ pci_read_config_dword(pdev, vsec + PCI_VNDR_HEADER, &val);
+ if (PCI_VNDR_HEADER_REV(val) != 0x04)
+ return false;
+
+ pci_dbg(pdev,
+ "Detected PCIe Vendor-Specific Extended Capability RAS DES\n");
+ return true;
+}
+
+static void dwc_pcie_unregister_dev(struct dwc_pcie_dev_info *dev_info)
+{
+ platform_device_unregister(dev_info->plat_dev);
+ list_del(&dev_info->dev_node);
+ kfree(dev_info);
+}
+
+static int dwc_pcie_register_dev(struct pci_dev *pdev)
+{
+ struct platform_device *plat_dev;
+ struct dwc_pcie_dev_info *dev_info;
+ u32 bdf;
+
+ bdf = PCI_DEVID(pdev->bus->number, pdev->devfn);
+ plat_dev = platform_device_register_data(NULL, "dwc_pcie_pmu", bdf,
+ pdev, sizeof(*pdev));
+
+ if (IS_ERR(plat_dev))
+ return PTR_ERR(plat_dev);
+
+ dev_info = kzalloc(sizeof(*dev_info), GFP_KERNEL);
+ if (!dev_info)
+ return -ENOMEM;
+
+ /* Cache platform device to handle pci device hotplug */
+ dev_info->plat_dev = plat_dev;
+ dev_info->pdev = pdev;
+ list_add(&dev_info->dev_node, &dwc_pcie_dev_info_head);
+
+ return 0;
+}
+
+static int dwc_pcie_pmu_notifier(struct notifier_block *nb,
+ unsigned long action, void *data)
+{
+ struct device *dev = data;
+ struct pci_dev *pdev = to_pci_dev(dev);
+ struct dwc_pcie_dev_info *dev_info;
+
+ switch (action) {
+ case BUS_NOTIFY_ADD_DEVICE:
+ if (!dwc_pcie_match_des_cap(pdev))
+ return NOTIFY_DONE;
+ if (dwc_pcie_register_dev(pdev))
+ return NOTIFY_BAD;
+ break;
+ case BUS_NOTIFY_DEL_DEVICE:
+ dev_info = dwc_pcie_find_dev_info(pdev);
+ if (!dev_info)
+ return NOTIFY_DONE;
+ dwc_pcie_unregister_dev(dev_info);
+ break;
+ }
+
+ return NOTIFY_OK;
+}
+
+static struct notifier_block dwc_pcie_pmu_nb = {
+ .notifier_call = dwc_pcie_pmu_notifier,
+};
+
+static int dwc_pcie_pmu_probe(struct platform_device *plat_dev)
+{
+ struct pci_dev *pdev = plat_dev->dev.platform_data;
+ struct dwc_pcie_pmu *pcie_pmu;
+ char *name;
+ u32 bdf, val;
+ u16 vsec;
+ int ret;
+
+ vsec = pci_find_vsec_capability(pdev, pdev->vendor,
+ DWC_PCIE_VSEC_RAS_DES_ID);
+ pci_read_config_dword(pdev, vsec + PCI_VNDR_HEADER, &val);
+ bdf = PCI_DEVID(pdev->bus->number, pdev->devfn);
+ name = devm_kasprintf(&plat_dev->dev, GFP_KERNEL, "dwc_rootport_%x", bdf);
+ if (!name)
+ return -ENOMEM;
+
+ pcie_pmu = devm_kzalloc(&plat_dev->dev, sizeof(*pcie_pmu), GFP_KERNEL);
+ if (!pcie_pmu)
+ return -ENOMEM;
+
+ pcie_pmu->pdev = pdev;
+ pcie_pmu->ras_des_offset = vsec;
+ pcie_pmu->nr_lanes = pcie_get_width_cap(pdev);
+ pcie_pmu->on_cpu = -1;
+ pcie_pmu->pmu = (struct pmu){
+ .name = name,
+ .parent = &pdev->dev,
+ .module = THIS_MODULE,
+ .attr_groups = dwc_pcie_attr_groups,
+ .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
+ .task_ctx_nr = perf_invalid_context,
+ .event_init = dwc_pcie_pmu_event_init,
+ .add = dwc_pcie_pmu_event_add,
+ .del = dwc_pcie_pmu_event_del,
+ .start = dwc_pcie_pmu_event_start,
+ .stop = dwc_pcie_pmu_event_stop,
+ .read = dwc_pcie_pmu_event_update,
+ };
+
+ /* Add this instance to the list used by the offline callback */
+ ret = cpuhp_state_add_instance(dwc_pcie_pmu_hp_state,
+ &pcie_pmu->cpuhp_node);
+ if (ret) {
+ pci_err(pdev, "Error %d registering hotplug @%x\n", ret, bdf);
+ return ret;
+ }
+
+ /* Unwind when platform driver removes */
+ ret = devm_add_action_or_reset(&plat_dev->dev,
+ dwc_pcie_pmu_remove_cpuhp_instance,
+ &pcie_pmu->cpuhp_node);
+ if (ret)
+ return ret;
+
+ ret = perf_pmu_register(&pcie_pmu->pmu, name, -1);
+ if (ret) {
+ pci_err(pdev, "Error %d registering PMU @%x\n", ret, bdf);
+ return ret;
+ }
+ ret = devm_add_action_or_reset(&plat_dev->dev, dwc_pcie_unregister_pmu,
+ pcie_pmu);
+ if (ret)
+ return ret;
+
+ return 0;
+}
+
+static int dwc_pcie_pmu_online_cpu(unsigned int cpu, struct hlist_node *cpuhp_node)
+{
+ struct dwc_pcie_pmu *pcie_pmu;
+
+ pcie_pmu = hlist_entry_safe(cpuhp_node, struct dwc_pcie_pmu, cpuhp_node);
+ if (pcie_pmu->on_cpu == -1)
+ pcie_pmu->on_cpu = cpumask_local_spread(
+ 0, dev_to_node(&pcie_pmu->pdev->dev));
+
+ return 0;
+}
+
+static int dwc_pcie_pmu_offline_cpu(unsigned int cpu, struct hlist_node *cpuhp_node)
+{
+ struct dwc_pcie_pmu *pcie_pmu;
+ struct pci_dev *pdev;
+ int node;
+ cpumask_t mask;
+ unsigned int target;
+
+ pcie_pmu = hlist_entry_safe(cpuhp_node, struct dwc_pcie_pmu, cpuhp_node);
+ /* Nothing to do if this CPU doesn't own the PMU */
+ if (cpu != pcie_pmu->on_cpu)
+ return 0;
+
+ pcie_pmu->on_cpu = -1;
+ pdev = pcie_pmu->pdev;
+ node = dev_to_node(&pdev->dev);
+ if (cpumask_and(&mask, cpumask_of_node(node), cpu_online_mask) &&
+ cpumask_andnot(&mask, &mask, cpumask_of(cpu)))
+ target = cpumask_any(&mask);
+ else
+ target = cpumask_any_but(cpu_online_mask, cpu);
+
+ if (target >= nr_cpu_ids) {
+ pci_err(pdev, "There is no CPU to set\n");
+ return 0;
+ }
+
+ /* This PMU does NOT support interrupt, just migrate context. */
+ perf_pmu_migrate_context(&pcie_pmu->pmu, cpu, target);
+ pcie_pmu->on_cpu = target;
+
+ return 0;
+}
+
+static struct platform_driver dwc_pcie_pmu_driver = {
+ .probe = dwc_pcie_pmu_probe,
+ .driver = {.name = "dwc_pcie_pmu",},
+};
+
+static int __init dwc_pcie_pmu_init(void)
+{
+ struct pci_dev *pdev = NULL;
+ bool found = false;
+ int ret;
+
+ for_each_pci_dev(pdev) {
+ if (!dwc_pcie_match_des_cap(pdev))
+ continue;
+
+ ret = dwc_pcie_register_dev(pdev);
+ if (ret) {
+ pci_dev_put(pdev);
+ return ret;
+ }
+
+ found = true;
+ }
+ if (!found)
+ return -ENODEV;
+
+ ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN,
+ "perf/dwc_pcie_pmu:online",
+ dwc_pcie_pmu_online_cpu,
+ dwc_pcie_pmu_offline_cpu);
+ if (ret < 0)
+ return ret;
+
+ dwc_pcie_pmu_hp_state = ret;
+
+ ret = platform_driver_register(&dwc_pcie_pmu_driver);
+ if (ret)
+ goto platform_driver_register_err;
+
+ ret = bus_register_notifier(&pci_bus_type, &dwc_pcie_pmu_nb);
+ if (ret)
+ goto platform_driver_register_err;
+ notify = true;
+
+ return 0;
+
+platform_driver_register_err:
+ cpuhp_remove_multi_state(dwc_pcie_pmu_hp_state);
+
+ return ret;
+}
+
+static void __exit dwc_pcie_pmu_exit(void)
+{
+ struct dwc_pcie_dev_info *dev_info, *tmp;
+
+ if (notify)
+ bus_unregister_notifier(&pci_bus_type, &dwc_pcie_pmu_nb);
+ list_for_each_entry_safe(dev_info, tmp, &dwc_pcie_dev_info_head, dev_node)
+ dwc_pcie_unregister_dev(dev_info);
+ platform_driver_unregister(&dwc_pcie_pmu_driver);
+ cpuhp_remove_multi_state(dwc_pcie_pmu_hp_state);
+}
+
+module_init(dwc_pcie_pmu_init);
+module_exit(dwc_pcie_pmu_exit);
+
+MODULE_DESCRIPTION("PMU driver for DesignWare Cores PCI Express Controller");
+MODULE_AUTHOR("Shuai Xue <xueshuai@linux.alibaba.com>");
+MODULE_LICENSE("GPL v2");
#define COUNTER_READ 0x20
#define COUNTER_DPCR1 0x30
+#define COUNTER_MUX_CNTL 0x50
+#define COUNTER_MASK_COMP 0x54
#define CNTL_OVER 0x1
#define CNTL_CLEAR 0x2
#define CNTL_CSV_SHIFT 24
#define CNTL_CSV_MASK (0xFFU << CNTL_CSV_SHIFT)
+#define READ_PORT_SHIFT 0
+#define READ_PORT_MASK (0x7 << READ_PORT_SHIFT)
+#define READ_CHANNEL_REVERT 0x00000008 /* bit 3 for read channel select */
+#define WRITE_PORT_SHIFT 8
+#define WRITE_PORT_MASK (0x7 << WRITE_PORT_SHIFT)
+#define WRITE_CHANNEL_REVERT 0x00000800 /* bit 11 for write channel select */
+
#define EVENT_CYCLES_ID 0
#define EVENT_CYCLES_COUNTER 0
#define NUM_COUNTERS 4
/* DDR Perf hardware feature */
#define DDR_CAP_AXI_ID_FILTER 0x1 /* support AXI ID filter */
#define DDR_CAP_AXI_ID_FILTER_ENHANCED 0x3 /* support enhanced AXI ID filter */
+#define DDR_CAP_AXI_ID_PORT_CHANNEL_FILTER 0x4 /* support AXI ID PORT CHANNEL filter */
struct fsl_ddr_devtype_data {
unsigned int quirks; /* quirks needed for different DDR Perf core */
.identifier = "i.MX8MP",
};
+static const struct fsl_ddr_devtype_data imx8dxl_devtype_data = {
+ .quirks = DDR_CAP_AXI_ID_PORT_CHANNEL_FILTER,
+ .identifier = "i.MX8DXL",
+};
+
static const struct of_device_id imx_ddr_pmu_dt_ids[] = {
{ .compatible = "fsl,imx8-ddr-pmu", .data = &imx8_devtype_data},
{ .compatible = "fsl,imx8m-ddr-pmu", .data = &imx8m_devtype_data},
{ .compatible = "fsl,imx8mm-ddr-pmu", .data = &imx8mm_devtype_data},
{ .compatible = "fsl,imx8mn-ddr-pmu", .data = &imx8mn_devtype_data},
{ .compatible = "fsl,imx8mp-ddr-pmu", .data = &imx8mp_devtype_data},
+ { .compatible = "fsl,imx8dxl-ddr-pmu", .data = &imx8dxl_devtype_data},
{ /* sentinel */ }
};
MODULE_DEVICE_TABLE(of, imx_ddr_pmu_dt_ids);
enum ddr_perf_filter_capabilities {
PERF_CAP_AXI_ID_FILTER = 0,
PERF_CAP_AXI_ID_FILTER_ENHANCED,
+ PERF_CAP_AXI_ID_PORT_CHANNEL_FILTER,
PERF_CAP_AXI_ID_FEAT_MAX,
};
case PERF_CAP_AXI_ID_FILTER_ENHANCED:
quirks &= DDR_CAP_AXI_ID_FILTER_ENHANCED;
return quirks == DDR_CAP_AXI_ID_FILTER_ENHANCED;
+ case PERF_CAP_AXI_ID_PORT_CHANNEL_FILTER:
+ return !!(quirks & DDR_CAP_AXI_ID_PORT_CHANNEL_FILTER);
default:
WARN(1, "unknown filter cap %d\n", cap);
}
static struct attribute *ddr_perf_filter_cap_attr[] = {
PERF_FILTER_EXT_ATTR_ENTRY(filter, PERF_CAP_AXI_ID_FILTER),
PERF_FILTER_EXT_ATTR_ENTRY(enhanced_filter, PERF_CAP_AXI_ID_FILTER_ENHANCED),
+ PERF_FILTER_EXT_ATTR_ENTRY(super_filter, PERF_CAP_AXI_ID_PORT_CHANNEL_FILTER),
NULL,
};
PMU_FORMAT_ATTR(event, "config:0-7");
PMU_FORMAT_ATTR(axi_id, "config1:0-15");
PMU_FORMAT_ATTR(axi_mask, "config1:16-31");
+PMU_FORMAT_ATTR(axi_port, "config2:0-2");
+PMU_FORMAT_ATTR(axi_channel, "config2:3-3");
static struct attribute *ddr_perf_format_attrs[] = {
&format_attr_event.attr,
&format_attr_axi_id.attr,
&format_attr_axi_mask.attr,
+ &format_attr_axi_port.attr,
+ &format_attr_axi_channel.attr,
NULL,
};
int counter;
int cfg = event->attr.config;
int cfg1 = event->attr.config1;
+ int cfg2 = event->attr.config2;
if (pmu->devtype_data->quirks & DDR_CAP_AXI_ID_FILTER) {
int i;
return -EOPNOTSUPP;
}
+ if (pmu->devtype_data->quirks & DDR_CAP_AXI_ID_PORT_CHANNEL_FILTER) {
+ if (ddr_perf_is_filtered(event)) {
+ /* revert axi id masking(axi_mask) value */
+ cfg1 ^= AXI_MASKING_REVERT;
+ writel(cfg1, pmu->base + COUNTER_MASK_COMP + ((counter - 1) << 4));
+
+ if (cfg == 0x41) {
+ /* revert axi read channel(axi_channel) value */
+ cfg2 ^= READ_CHANNEL_REVERT;
+ cfg2 |= FIELD_PREP(READ_PORT_MASK, cfg2);
+ } else {
+ /* revert axi write channel(axi_channel) value */
+ cfg2 ^= WRITE_CHANNEL_REVERT;
+ cfg2 |= FIELD_PREP(WRITE_PORT_MASK, cfg2);
+ }
+
+ writel(cfg2, pmu->base + COUNTER_MUX_CNTL + ((counter - 1) << 4));
+ }
+ }
+
pmu->events[counter] = event;
hwc->idx = counter;
platform_set_drvdata(pdev, pmu);
- pmu->id = ida_simple_get(&ddr_ida, 0, 0, GFP_KERNEL);
+ pmu->id = ida_alloc(&ddr_ida, GFP_KERNEL);
name = devm_kasprintf(&pdev->dev, GFP_KERNEL, DDR_PERF_DEV_NAME "%d", pmu->id);
if (!name) {
ret = -ENOMEM;
cpuhp_remove_multi_state(pmu->cpuhp_state);
cpuhp_state_err:
format_string_err:
- ida_simple_remove(&ddr_ida, pmu->id);
+ ida_free(&ddr_ida, pmu->id);
dev_warn(&pdev->dev, "i.MX9 DDR Perf PMU failed (%d), disabled\n", ret);
return ret;
}
perf_pmu_unregister(&pmu->pmu);
- ida_simple_remove(&ddr_ida, pmu->id);
+ ida_free(&ddr_ida, pmu->id);
return 0;
}
HISI_PMU_EVENT_ATTR(cpu_rd, 0x10),
HISI_PMU_EVENT_ATTR(cpu_rd64, 0x17),
HISI_PMU_EVENT_ATTR(cpu_rs64, 0x19),
- HISI_PMU_EVENT_ATTR(cpu_mru, 0x1a),
- HISI_PMU_EVENT_ATTR(cycles, 0x9c),
+ HISI_PMU_EVENT_ATTR(cpu_mru, 0x1c),
+ HISI_PMU_EVENT_ATTR(cycles, 0x95),
HISI_PMU_EVENT_ATTR(spipe_hit, 0xb3),
HISI_PMU_EVENT_ATTR(hpipe_hit, 0xdb),
HISI_PMU_EVENT_ATTR(cring_rxdat_cnt, 0xfa),
raw_spin_lock_irqsave(&gpio_dev->lock, flags);
gpio_dev->saved_regs[i] = readl(gpio_dev->base + pin * 4) & ~PIN_IRQ_PENDING;
+
+ /* mask any interrupts not intended to be a wake source */
+ if (!(gpio_dev->saved_regs[i] & WAKE_SOURCE)) {
+ writel(gpio_dev->saved_regs[i] & ~BIT(INTERRUPT_MASK_OFF),
+ gpio_dev->base + pin * 4);
+ pm_pr_dbg("Disabling GPIO #%d interrupt for suspend.\n",
+ pin);
+ }
+
raw_spin_unlock_irqrestore(&gpio_dev->lock, flags);
}
#define FUNCTION_MASK GENMASK(1, 0)
#define FUNCTION_INVALID GENMASK(7, 0)
+#define WAKE_SOURCE (BIT(WAKE_CNTRL_OFF_S0I3) | \
+ BIT(WAKE_CNTRL_OFF_S3) | \
+ BIT(WAKE_CNTRL_OFF_S4) | \
+ BIT(WAKECNTRL_Z_OFF))
+
struct amd_function {
const char *name;
const char * const groups[NSELECTS];
}
};
+/*
+ * This lock class allows to tell lockdep that parent IRQ and children IRQ do
+ * not share the same class so it does not raise false positive
+ */
+static struct lock_class_key atmel_lock_key;
+static struct lock_class_key atmel_request_key;
+
static int atmel_pinctrl_probe(struct platform_device *pdev)
{
struct device *dev = &pdev->dev;
irq_set_chip_and_handler(irq, &atmel_gpio_irq_chip,
handle_simple_irq);
irq_set_chip_data(irq, atmel_pioctrl);
+ irq_set_lockdep_class(irq, &atmel_lock_key, &atmel_request_key);
dev_dbg(dev,
"atmel gpio irq domain: hwirq: %d, linux irq: %d\n",
i, irq);
"gp77",
};
+static int cy8c95x0_pinmux_direction(struct cy8c95x0_pinctrl *chip,
+ unsigned int pin, bool input);
+
static inline u8 cypress_get_port(struct cy8c95x0_pinctrl *chip, unsigned int pin)
{
/* Account for GPORT2 which only has 4 bits */
ret = regmap_read(chip->regmap, reg, ®_val);
if (reg_val & bit)
arg = 1;
+ if (param == PIN_CONFIG_OUTPUT_ENABLE)
+ arg = !arg;
*config = pinconf_to_config_packed(param, (u16)arg);
out:
u8 port = cypress_get_port(chip, off);
u8 bit = cypress_get_pin_mask(chip, off);
unsigned long param = pinconf_to_config_param(config);
+ unsigned long arg = pinconf_to_config_argument(config);
unsigned int reg;
int ret;
case PIN_CONFIG_MODE_PWM:
reg = CY8C95X0_PWMSEL;
break;
+ case PIN_CONFIG_OUTPUT_ENABLE:
+ ret = cy8c95x0_pinmux_direction(chip, off, !arg);
+ goto out;
+ case PIN_CONFIG_INPUT_ENABLE:
+ ret = cy8c95x0_pinmux_direction(chip, off, arg);
+ goto out;
default:
ret = -ENOTSUPP;
goto out;
gc->get_direction = cy8c95x0_gpio_get_direction;
gc->get_multiple = cy8c95x0_gpio_get_multiple;
gc->set_multiple = cy8c95x0_gpio_set_multiple;
- gc->set_config = gpiochip_generic_config,
+ gc->set_config = gpiochip_generic_config;
gc->can_sleep = true;
gc->add_pin_ranges = cy8c95x0_add_pin_ranges;
nmaps = 0;
ngroups = 0;
- for_each_child_of_node(np, child) {
+ for_each_available_child_of_node(np, child) {
int npinmux = of_property_count_u32_elems(child, "pinmux");
int npins = of_property_count_u32_elems(child, "pins");
nmaps = 0;
ngroups = 0;
mutex_lock(&sfp->mutex);
- for_each_child_of_node(np, child) {
+ for_each_available_child_of_node(np, child) {
int npins;
int i;
int ret;
ngroups = 0;
- for_each_child_of_node(np, child)
+ for_each_available_child_of_node(np, child)
ngroups += 1;
nmaps = 2 * ngroups;
nmaps = 0;
ngroups = 0;
mutex_lock(&sfp->mutex);
- for_each_child_of_node(np, child) {
+ for_each_available_child_of_node(np, child) {
int npins = of_property_count_u32_elems(child, "pinmux");
int *pins;
u32 *pinmux;
struct quirk_entry {
u32 s2idle_bug_mmio;
+ bool spurious_8042;
};
static struct quirk_entry quirk_s2idle_bug = {
.s2idle_bug_mmio = 0xfed80380,
};
+static struct quirk_entry quirk_spurious_8042 = {
+ .spurious_8042 = true,
+};
+
static const struct dmi_system_id fwbug_list[] = {
{
.ident = "L14 Gen2 AMD",
DMI_MATCH(DMI_PRODUCT_NAME, "HP Laptop 15s-eq2xxx"),
}
},
+ /* https://community.frame.work/t/tracking-framework-amd-ryzen-7040-series-lid-wakeup-behavior-feedback/39128 */
+ {
+ .ident = "Framework Laptop 13 (Phoenix)",
+ .driver_data = &quirk_spurious_8042,
+ .matches = {
+ DMI_MATCH(DMI_SYS_VENDOR, "Framework"),
+ DMI_MATCH(DMI_PRODUCT_NAME, "Laptop 13 (AMD Ryzen 7040Series)"),
+ DMI_MATCH(DMI_BIOS_VERSION, "03.03"),
+ }
+ },
{}
};
{
const struct dmi_system_id *dmi_id;
+ if (dev->cpu_id == AMD_CPU_ID_CZN)
+ dev->disable_8042_wakeup = true;
+
dmi_id = dmi_first_match(fwbug_list);
if (!dmi_id)
return;
if (dev->quirks->s2idle_bug_mmio)
pr_info("Using s2idle quirk to avoid %s platform firmware bug\n",
dmi_id->ident);
+ if (dev->quirks->spurious_8042)
+ dev->disable_8042_wakeup = true;
}
#define SMU_MSG_LOG_RESET 0x07
#define SMU_MSG_LOG_DUMP_DATA 0x08
#define SMU_MSG_GET_SUP_CONSTRAINTS 0x09
-/* List of supported CPU ids */
-#define AMD_CPU_ID_RV 0x15D0
-#define AMD_CPU_ID_RN 0x1630
-#define AMD_CPU_ID_PCO AMD_CPU_ID_RV
-#define AMD_CPU_ID_CZN AMD_CPU_ID_RN
-#define AMD_CPU_ID_YC 0x14B5
-#define AMD_CPU_ID_CB 0x14D8
-#define AMD_CPU_ID_PS 0x14E8
-#define AMD_CPU_ID_SP 0x14A4
-#define PCI_DEVICE_ID_AMD_1AH_M20H_ROOT 0x1507
#define PMC_MSG_DELAY_MIN_US 50
#define RESPONSE_REGISTER_LOOP_MAX 20000
return -EINVAL;
}
-static int amd_pmc_czn_wa_irq1(struct amd_pmc_dev *pdev)
+static int amd_pmc_wa_irq1(struct amd_pmc_dev *pdev)
{
struct device *d;
int rc;
- if (!pdev->major) {
- rc = amd_pmc_get_smu_version(pdev);
- if (rc)
- return rc;
- }
+ /* cezanne platform firmware has a fix in 64.66.0 */
+ if (pdev->cpu_id == AMD_CPU_ID_CZN) {
+ if (!pdev->major) {
+ rc = amd_pmc_get_smu_version(pdev);
+ if (rc)
+ return rc;
+ }
- if (pdev->major > 64 || (pdev->major == 64 && pdev->minor > 65))
- return 0;
+ if (pdev->major > 64 || (pdev->major == 64 && pdev->minor > 65))
+ return 0;
+ }
d = bus_find_device_by_name(&serio_bus, NULL, "serio0");
if (!d)
{
struct amd_pmc_dev *pdev = dev_get_drvdata(dev);
- if (pdev->cpu_id == AMD_CPU_ID_CZN && !disable_workarounds) {
- int rc = amd_pmc_czn_wa_irq1(pdev);
+ if (pdev->disable_8042_wakeup && !disable_workarounds) {
+ int rc = amd_pmc_wa_irq1(pdev);
if (rc) {
dev_err(pdev->dev, "failed to adjust keyboard wakeup: %d\n", rc);
struct mutex lock; /* generic mutex lock */
struct dentry *dbgfs_dir;
struct quirk_entry *quirks;
+ bool disable_8042_wakeup;
};
void amd_pmc_process_restore_quirks(struct amd_pmc_dev *dev);
void amd_pmc_quirks_init(struct amd_pmc_dev *dev);
+/* List of supported CPU ids */
+#define AMD_CPU_ID_RV 0x15D0
+#define AMD_CPU_ID_RN 0x1630
+#define AMD_CPU_ID_PCO AMD_CPU_ID_RV
+#define AMD_CPU_ID_CZN AMD_CPU_ID_RN
+#define AMD_CPU_ID_YC 0x14B5
+#define AMD_CPU_ID_CB 0x14D8
+#define AMD_CPU_ID_PS 0x14E8
+#define AMD_CPU_ID_SP 0x14A4
+#define PCI_DEVICE_ID_AMD_1AH_M20H_ROOT 0x1507
+
#endif /* PMC_H */
struct pmc *pmc = pmcdev->pmcs[PMC_IDX_MAIN];
int ret;
+ pmcdev->suspend = cnl_suspend;
+ pmcdev->resume = cnl_resume;
+
pmc->map = &adl_reg_map;
ret = get_primary_reg_base(pmc);
if (ret)
return ret;
- /* Due to a hardware limitation, the GBE LTR blocks PC10
- * when a cable is attached. Tell the PMC to ignore it.
- */
- dev_dbg(&pmcdev->pdev->dev, "ignoring GBE LTR\n");
- pmc_core_send_ltr_ignore(pmcdev, 3);
-
return 0;
}
.etr3_offset = ETR3_OFFSET,
};
+void cnl_suspend(struct pmc_dev *pmcdev)
+{
+ /*
+ * Due to a hardware limitation, the GBE LTR blocks PC10
+ * when a cable is attached. To unblock PC10 during suspend,
+ * tell the PMC to ignore it.
+ */
+ pmc_core_send_ltr_ignore(pmcdev, 3, 1);
+}
+
+int cnl_resume(struct pmc_dev *pmcdev)
+{
+ pmc_core_send_ltr_ignore(pmcdev, 3, 0);
+
+ return pmc_core_resume_common(pmcdev);
+}
+
int cnp_core_init(struct pmc_dev *pmcdev)
{
struct pmc *pmc = pmcdev->pmcs[PMC_IDX_MAIN];
int ret;
+ pmcdev->suspend = cnl_suspend;
+ pmcdev->resume = cnl_resume;
+
pmc->map = &cnp_reg_map;
ret = get_primary_reg_base(pmc);
if (ret)
return ret;
- /* Due to a hardware limitation, the GBE LTR blocks PC10
- * when a cable is attached. Tell the PMC to ignore it.
- */
- dev_dbg(&pmcdev->pdev->dev, "ignoring GBE LTR\n");
- pmc_core_send_ltr_ignore(pmcdev, 3);
-
return 0;
}
}
DEFINE_SHOW_ATTRIBUTE(pmc_core_pll);
-int pmc_core_send_ltr_ignore(struct pmc_dev *pmcdev, u32 value)
+int pmc_core_send_ltr_ignore(struct pmc_dev *pmcdev, u32 value, int ignore)
{
struct pmc *pmc;
const struct pmc_reg_map *map;
* is based on the contiguous indexes from ltr_show output.
* pmc index and ltr index needs to be calculated from it.
*/
- for (pmc_index = 0; pmc_index < ARRAY_SIZE(pmcdev->pmcs) && ltr_index > 0; pmc_index++) {
+ for (pmc_index = 0; pmc_index < ARRAY_SIZE(pmcdev->pmcs) && ltr_index >= 0; pmc_index++) {
pmc = pmcdev->pmcs[pmc_index];
if (!pmc)
mutex_lock(&pmcdev->lock);
reg = pmc_core_reg_read(pmc, map->ltr_ignore_offset);
- reg |= BIT(ltr_index);
+ if (ignore)
+ reg |= BIT(ltr_index);
+ else
+ reg &= ~BIT(ltr_index);
pmc_core_reg_write(pmc, map->ltr_ignore_offset, reg);
mutex_unlock(&pmcdev->lock);
if (err)
return err;
- err = pmc_core_send_ltr_ignore(pmcdev, value);
+ err = pmc_core_send_ltr_ignore(pmcdev, value, 1);
return err == 0 ? count : err;
}
struct pmc_dev *pmcdev = dev_get_drvdata(dev);
struct pmc *pmc = pmcdev->pmcs[PMC_IDX_MAIN];
+ if (pmcdev->suspend)
+ pmcdev->suspend(pmcdev);
+
/* Check if the syspend will actually use S0ix */
if (pm_suspend_via_firmware())
return 0;
* @s0ix_counter: S0ix residency (step adjusted)
* @num_lpm_modes: Count of enabled modes
* @lpm_en_modes: Array of enabled modes from lowest to highest priority
+ * @suspend: Function to perform platform specific suspend
* @resume: Function to perform platform specific resume
*
* pmc_dev contains info about power management controller device.
u64 s0ix_counter;
int num_lpm_modes;
int lpm_en_modes[LPM_MAX_NUM_MODES];
+ void (*suspend)(struct pmc_dev *pmcdev);
int (*resume)(struct pmc_dev *pmcdev);
bool has_die_c6;
extern const struct pmc_reg_map mtl_ioem_reg_map;
extern void pmc_core_get_tgl_lpm_reqs(struct platform_device *pdev);
-extern int pmc_core_send_ltr_ignore(struct pmc_dev *pmcdev, u32 value);
+int pmc_core_send_ltr_ignore(struct pmc_dev *pmcdev, u32 value, int ignore);
int pmc_core_resume_common(struct pmc_dev *pmcdev);
int get_primary_reg_base(struct pmc *pmc);
int adl_core_init(struct pmc_dev *pmcdev);
int mtl_core_init(struct pmc_dev *pmcdev);
+void cnl_suspend(struct pmc_dev *pmcdev);
+int cnl_resume(struct pmc_dev *pmcdev);
+
#define pmc_for_each_mode(i, mode, pmcdev) \
for (i = 0, mode = pmcdev->lpm_en_modes[i]; \
i < pmcdev->num_lpm_modes; \
static int mtl_resume(struct pmc_dev *pmcdev)
{
mtl_d3_fixup();
+ pmc_core_send_ltr_ignore(pmcdev, 3, 0);
+
return pmc_core_resume_common(pmcdev);
}
mtl_d3_fixup();
+ pmcdev->suspend = cnl_suspend;
pmcdev->resume = mtl_resume;
pmcdev->regmap_list = mtl_pmc_info_list;
return ret;
}
- /* Due to a hardware limitation, the GBE LTR blocks PC10
- * when a cable is attached. Tell the PMC to ignore it.
- */
- dev_dbg(&pmcdev->pdev->dev, "ignoring GBE LTR\n");
- pmc_core_send_ltr_ignore(pmcdev, 3);
-
return 0;
}
int ret;
pmc->map = &tgl_reg_map;
+
+ pmcdev->suspend = cnl_suspend;
+ pmcdev->resume = cnl_resume;
+
ret = get_primary_reg_base(pmc);
if (ret)
return ret;
pmc_core_get_tgl_lpm_reqs(pmcdev->pdev);
- /* Due to a hardware limitation, the GBE LTR blocks PC10
- * when a cable is attached. Tell the PMC to ignore it.
- */
- dev_dbg(&pmcdev->pdev->dev, "ignoring GBE LTR\n");
- pmc_core_send_ltr_ignore(pmcdev, 3);
return 0;
}
* TPACPI_FAN_WR_TPEC is also available and should be used to
* command the fan. The X31/X40/X41 seems to have 8 fan levels,
* but the ACPI tables just mention level 7.
+ *
+ * TPACPI_FAN_RD_TPEC_NS:
+ * This mode is used for a few ThinkPads (L13 Yoga Gen2, X13 Yoga Gen2 etc.)
+ * that are using non-standard EC locations for reporting fan speeds.
+ * Currently these platforms only provide fan rpm reporting.
+ *
*/
+#define FAN_RPM_CAL_CONST 491520 /* FAN RPM calculation offset for some non-standard ECFW */
+
+#define FAN_NS_CTRL_STATUS BIT(2) /* Bit which determines control is enabled or not */
+#define FAN_NS_CTRL BIT(4) /* Bit which determines control is by host or EC */
+
enum { /* Fan control constants */
fan_status_offset = 0x2f, /* EC register 0x2f */
fan_rpm_offset = 0x84, /* EC register 0x84: LSB, 0x85 MSB (RPM)
fan_select_offset = 0x31, /* EC register 0x31 (Firmware 7M)
bit 0 selects which fan is active */
+ fan_status_offset_ns = 0x93, /* Special status/control offset for non-standard EC Fan1 */
+ fan2_status_offset_ns = 0x96, /* Special status/control offset for non-standard EC Fan2 */
+ fan_rpm_status_ns = 0x95, /* Special offset for Fan1 RPM status for non-standard EC */
+ fan2_rpm_status_ns = 0x98, /* Special offset for Fan2 RPM status for non-standard EC */
+
TP_EC_FAN_FULLSPEED = 0x40, /* EC fan mode: full speed */
TP_EC_FAN_AUTO = 0x80, /* EC fan mode: auto fan control */
TPACPI_FAN_NONE = 0, /* No fan status or control */
TPACPI_FAN_RD_ACPI_GFAN, /* Use ACPI GFAN */
TPACPI_FAN_RD_TPEC, /* Use ACPI EC regs 0x2f, 0x84-0x85 */
+ TPACPI_FAN_RD_TPEC_NS, /* Use non-standard ACPI EC regs (eg: L13 Yoga gen2 etc.) */
};
enum fan_control_access_mode {
static u8 fan_control_resume_level;
static int fan_watchdog_maxinterval;
+static bool fan_with_ns_addr;
+
static struct mutex fan_mutex;
static void fan_watchdog_fire(struct work_struct *ignored);
}
break;
+ case TPACPI_FAN_RD_TPEC_NS:
+ /* Default mode is AUTO which means controlled by EC */
+ if (!acpi_ec_read(fan_status_offset_ns, &s))
+ return -EIO;
+
+ if (status)
+ *status = s;
+
+ break;
default:
return -ENXIO;
if (mutex_lock_killable(&fan_mutex))
return -ERESTARTSYS;
rc = fan_get_status(&s);
- if (!rc)
+ /* NS EC doesn't have register with level settings */
+ if (!rc && !fan_with_ns_addr)
fan_update_desired_level(s);
mutex_unlock(&fan_mutex);
if (likely(speed))
*speed = (hi << 8) | lo;
+ break;
+ case TPACPI_FAN_RD_TPEC_NS:
+ if (!acpi_ec_read(fan_rpm_status_ns, &lo))
+ return -EIO;
+ if (speed)
+ *speed = lo ? FAN_RPM_CAL_CONST / lo : 0;
break;
default:
static int fan2_get_speed(unsigned int *speed)
{
- u8 hi, lo;
+ u8 hi, lo, status;
bool rc;
switch (fan_status_access_mode) {
if (likely(speed))
*speed = (hi << 8) | lo;
+ break;
+ case TPACPI_FAN_RD_TPEC_NS:
+ rc = !acpi_ec_read(fan2_status_offset_ns, &status);
+ if (rc)
+ return -EIO;
+ if (!(status & FAN_NS_CTRL_STATUS)) {
+ pr_info("secondary fan control not supported\n");
+ return -EIO;
+ }
+ rc = !acpi_ec_read(fan2_rpm_status_ns, &lo);
+ if (rc)
+ return -EIO;
+ if (speed)
+ *speed = lo ? FAN_RPM_CAL_CONST / lo : 0;
break;
default:
#define TPACPI_FAN_2FAN 0x0002 /* EC 0x31 bit 0 selects fan2 */
#define TPACPI_FAN_2CTL 0x0004 /* selects fan2 control */
#define TPACPI_FAN_NOFAN 0x0008 /* no fan available */
+#define TPACPI_FAN_NS 0x0010 /* For EC with non-Standard register addresses */
static const struct tpacpi_quirk fan_quirk_table[] __initconst = {
TPACPI_QEC_IBM('1', 'Y', TPACPI_FAN_Q1),
TPACPI_Q_LNV3('N', '2', 'O', TPACPI_FAN_2CTL), /* P1 / X1 Extreme (2nd gen) */
TPACPI_Q_LNV3('N', '3', '0', TPACPI_FAN_2CTL), /* P15 (1st gen) / P15v (1st gen) */
TPACPI_Q_LNV3('N', '3', '7', TPACPI_FAN_2CTL), /* T15g (2nd gen) */
+ TPACPI_Q_LNV3('R', '1', 'F', TPACPI_FAN_NS), /* L13 Yoga Gen 2 */
+ TPACPI_Q_LNV3('N', '2', 'U', TPACPI_FAN_NS), /* X13 Yoga Gen 2*/
TPACPI_Q_LNV3('N', '1', 'O', TPACPI_FAN_NOFAN), /* X1 Tablet (2nd gen) */
};
return -ENODEV;
}
+ if (quirks & TPACPI_FAN_NS) {
+ pr_info("ECFW with non-standard fan reg control found\n");
+ fan_with_ns_addr = 1;
+ /* Fan ctrl support from host is undefined for now */
+ tp_features.fan_ctrl_status_undef = 1;
+ }
+
if (gfan_handle) {
/* 570, 600e/x, 770e, 770x */
fan_status_access_mode = TPACPI_FAN_RD_ACPI_GFAN;
} else {
/* all other ThinkPads: note that even old-style
* ThinkPad ECs supports the fan control register */
- if (likely(acpi_ec_read(fan_status_offset,
- &fan_control_initial_status))) {
+ if (fan_with_ns_addr ||
+ likely(acpi_ec_read(fan_status_offset, &fan_control_initial_status))) {
int res;
unsigned int speed;
- fan_status_access_mode = TPACPI_FAN_RD_TPEC;
+ fan_status_access_mode = fan_with_ns_addr ?
+ TPACPI_FAN_RD_TPEC_NS : TPACPI_FAN_RD_TPEC;
+
if (quirks & TPACPI_FAN_Q1)
fan_quirk1_setup();
/* Try and probe the 2nd fan */
if (res >= 0 && speed != FAN_NOT_PRESENT) {
/* It responded - so let's assume it's there */
tp_features.second_fan = 1;
- tp_features.second_fan_ctl = 1;
+ /* fan control not currently available for ns ECFW */
+ tp_features.second_fan_ctl = !fan_with_ns_addr;
pr_info("secondary fan control detected & enabled\n");
} else {
/* Fan not auto-detected */
str_enabled_disabled(status), status);
break;
+ case TPACPI_FAN_RD_TPEC_NS:
case TPACPI_FAN_RD_TPEC:
/* all except 570, 600e/x, 770e, 770x */
rc = fan_get_status_safe(&status);
seq_printf(m, "speed:\t\t%d\n", speed);
- if (status & TP_EC_FAN_FULLSPEED)
- /* Disengaged mode takes precedence */
- seq_printf(m, "level:\t\tdisengaged\n");
- else if (status & TP_EC_FAN_AUTO)
- seq_printf(m, "level:\t\tauto\n");
- else
- seq_printf(m, "level:\t\t%d\n", status);
+ if (fan_status_access_mode == TPACPI_FAN_RD_TPEC_NS) {
+ /*
+ * No full speed bit in NS EC
+ * EC Auto mode is set by default.
+ * No other levels settings available
+ */
+ seq_printf(m, "level:\t\t%s\n", status & FAN_NS_CTRL ? "unknown" : "auto");
+ } else {
+ if (status & TP_EC_FAN_FULLSPEED)
+ /* Disengaged mode takes precedence */
+ seq_printf(m, "level:\t\tdisengaged\n");
+ else if (status & TP_EC_FAN_AUTO)
+ seq_printf(m, "level:\t\tauto\n");
+ else
+ seq_printf(m, "level:\t\t%d\n", status);
+ }
break;
case TPACPI_FAN_NONE:
cancel_delayed_work_sync(&bp->sync_work);
for (i = 0; i < OCP_SMA_NUM; i++) {
if (bp->sma[i].dpll_pin) {
- dpll_pin_unregister(bp->dpll, bp->sma[i].dpll_pin, &dpll_pins_ops, bp);
+ dpll_pin_unregister(bp->dpll, bp->sma[i].dpll_pin, &dpll_pins_ops, &bp->sma[i]);
dpll_pin_put(bp->sma[i].dpll_pin);
}
}
{
lockdep_assert_held(&reset_list_mutex);
+ if (IS_ERR_OR_NULL(rstc))
+ return;
+
kref_put(&rstc->refcnt, __reset_control_release);
}
void reset_control_bulk_put(int num_rstcs, struct reset_control_bulk_data *rstcs)
{
mutex_lock(&reset_list_mutex);
- while (num_rstcs--) {
- if (IS_ERR_OR_NULL(rstcs[num_rstcs].rstc))
- continue;
+ while (num_rstcs--)
__reset_control_put_internal(rstcs[num_rstcs].rstc);
- }
mutex_unlock(&reset_list_mutex);
}
EXPORT_SYMBOL_GPL(reset_control_bulk_put);
if (!data)
return -ENOMEM;
- type = (enum hi6220_reset_ctrl_type)of_device_get_match_data(dev);
+ type = (uintptr_t)of_device_get_match_data(dev);
regmap = syscon_node_to_regmap(np);
if (IS_ERR(regmap)) {
#include <linux/blk-mq.h>
#include <linux/slab.h>
#include <linux/list.h>
+#include <linux/io.h>
#include <asm/eadm.h>
#include "scm_blk.h"
for (i = 0; i < nr_requests_per_io && scmrq->request[i]; i++) {
msb = &scmrq->aob->msb[i];
- aidaw = msb->data_addr;
+ aidaw = (u64)phys_to_virt(msb->data_addr);
if ((msb->flags & MSB_FLAG_IDA) && aidaw &&
IS_ALIGNED(aidaw, PAGE_SIZE))
msb->scm_addr = scmdev->address + ((u64) blk_rq_pos(req) << 9);
msb->oc = (rq_data_dir(req) == READ) ? MSB_OC_READ : MSB_OC_WRITE;
msb->flags |= MSB_FLAG_IDA;
- msb->data_addr = (u64) aidaw;
+ msb->data_addr = (u64)virt_to_phys(aidaw);
rq_for_each_segment(bv, req, iter) {
WARN_ON(bv.bv_offset);
msb->blk_count += bv.bv_len >> 12;
- aidaw->data_addr = (u64) page_address(bv.bv_page);
+ aidaw->data_addr = virt_to_phys(page_address(bv.bv_page));
aidaw++;
}
/* Notify the guest if more CRWs are on our queue */
if (!list_empty(&private->crw) && private->crw_trigger)
- eventfd_signal(private->crw_trigger, 1);
+ eventfd_signal(private->crw_trigger);
return ret;
}
private->state = VFIO_CCW_STATE_IDLE;
if (private->io_trigger)
- eventfd_signal(private->io_trigger, 1);
+ eventfd_signal(private->io_trigger);
}
void vfio_ccw_crw_todo(struct work_struct *work)
private = container_of(work, struct vfio_ccw_private, crw_work);
if (!list_empty(&private->crw) && private->crw_trigger)
- eventfd_signal(private->crw_trigger, 1);
+ eventfd_signal(private->crw_trigger);
}
/*
case VFIO_IRQ_SET_DATA_NONE:
{
if (*ctx)
- eventfd_signal(*ctx, 1);
+ eventfd_signal(*ctx);
return 0;
}
case VFIO_IRQ_SET_DATA_BOOL:
return -EFAULT;
if (trigger && *ctx)
- eventfd_signal(*ctx, 1);
+ eventfd_signal(*ctx);
return 0;
}
case VFIO_IRQ_SET_DATA_EVENTFD:
"Relaying device request to user (#%u)\n",
count);
- eventfd_signal(private->req_trigger, 1);
+ eventfd_signal(private->req_trigger);
} else if (count == 0) {
dev_notice(dev,
"No device request channel registered, blocked until released by user\n");
"Relaying device request to user (#%u)\n",
count);
- eventfd_signal(matrix_mdev->req_trigger, 1);
+ eventfd_signal(matrix_mdev->req_trigger);
} else if (count == 0) {
dev_notice(dev,
"No device request registered, blocked until released by user\n");
u32 handle_pci_error;
bool init_reset;
u8 soft_reset_support;
- u8 use_map_queue;
};
#define aac_adapter_interrupt(dev) \
struct fib *aac_fib_alloc_tag(struct aac_dev *dev, struct scsi_cmnd *scmd)
{
struct fib *fibptr;
- u32 blk_tag;
- int i;
- blk_tag = blk_mq_unique_tag(scsi_cmd_to_rq(scmd));
- i = blk_mq_unique_tag_to_tag(blk_tag);
- fibptr = &dev->fibs[i];
+ fibptr = &dev->fibs[scsi_cmd_to_rq(scmd)->tag];
/*
* Null out fields that depend on being zero at the start of
* each I/O
#include <linux/compat.h>
#include <linux/blkdev.h>
-#include <linux/blk-mq-pci.h>
#include <linux/completion.h>
#include <linux/init.h>
#include <linux/interrupt.h>
return 0;
}
-static void aac_map_queues(struct Scsi_Host *shost)
-{
- struct aac_dev *aac = (struct aac_dev *)shost->hostdata;
-
- blk_mq_pci_map_queues(&shost->tag_set.map[HCTX_TYPE_DEFAULT],
- aac->pdev, 0);
- aac->use_map_queue = true;
-}
-
/**
* aac_change_queue_depth - alter queue depths
* @sdev: SCSI device we are considering
.bios_param = aac_biosparm,
.shost_groups = aac_host_groups,
.slave_configure = aac_slave_configure,
- .map_queues = aac_map_queues,
.change_queue_depth = aac_change_queue_depth,
.sdev_groups = aac_dev_groups,
.eh_abort_handler = aac_eh_abort,
shost->max_lun = AAC_MAX_LUN;
pci_set_drvdata(pdev, shost);
- shost->nr_hw_queues = aac->max_msix;
- shost->host_tagset = 1;
error = scsi_add_host(shost, &pdev->dev);
if (error)
struct aac_dev *aac = (struct aac_dev *)shost->hostdata;
aac_cancel_rescan_worker(aac);
- aac->use_map_queue = false;
scsi_remove_host(shost);
__aac_shutdown(aac);
#endif
u16 vector_no;
- struct scsi_cmnd *scmd;
- u32 blk_tag;
- struct Scsi_Host *shost = dev->scsi_host_ptr;
- struct blk_mq_queue_map *qmap;
atomic_inc(&q->numpending);
if ((dev->comm_interface == AAC_COMM_MESSAGE_TYPE3)
&& dev->sa_firmware)
vector_no = aac_get_vector(dev);
- else {
- if (!fib->vector_no || !fib->callback_data) {
- if (shost && dev->use_map_queue) {
- qmap = &shost->tag_set.map[HCTX_TYPE_DEFAULT];
- vector_no = qmap->mq_map[raw_smp_processor_id()];
- }
- /*
- * We hardcode the vector_no for
- * reserved commands as a valid shost is
- * absent during the init
- */
- else
- vector_no = 0;
- } else {
- scmd = (struct scsi_cmnd *)fib->callback_data;
- blk_tag = blk_mq_unique_tag(scsi_cmd_to_rq(scmd));
- vector_no = blk_mq_unique_tag_to_hwq(blk_tag);
- }
- }
+ else
+ vector_no = fib->vector_no;
if (native_hba) {
if (fib->flags & FIB_CONTEXT_FLAG_NATIVE_HBA_TMF) {
struct fcoe_ctlr *ctlr;
struct fcoe_rcv_info *fr;
struct fcoe_percpu_s *bg;
- struct sk_buff *tmp_skb;
interface = container_of(ptype, struct bnx2fc_interface,
fcoe_packet_type);
goto err;
}
- tmp_skb = skb_share_check(skb, GFP_ATOMIC);
- if (!tmp_skb)
- goto err;
-
- skb = tmp_skb;
+ skb = skb_share_check(skb, GFP_ATOMIC);
+ if (!skb)
+ return -1;
if (unlikely(eth_hdr(skb)->h_proto != htons(ETH_P_FCOE))) {
printk(KERN_ERR PFX "bnx2fc_rcv: Wrong FC type frame\n");
scsi_log_send(scmd);
scmd->submitter = SUBMITTED_BY_SCSI_ERROR_HANDLER;
+ scmd->flags |= SCMD_LAST;
/*
* Lock sdev->state_mutex to avoid that scsi_device_quiesce() can
scsi_init_command(dev, scmd);
scmd->submitter = SUBMITTED_BY_SCSI_RESET_IOCTL;
+ scmd->flags |= SCMD_LAST;
memset(&scmd->sdb, 0, sizeof(scmd->sdb));
scmd->cmd_len = 0;
#include <linux/gpio/consumer.h>
#include <linux/pinctrl/consumer.h>
#include <linux/pm_runtime.h>
+#include <linux/iopoll.h>
#include <trace/events/spi.h>
/* SPI register offsets */
*/
#define DMA_MIN_BYTES 16
-#define SPI_DMA_MIN_TIMEOUT (msecs_to_jiffies(1000))
-#define SPI_DMA_TIMEOUT_PER_10K (msecs_to_jiffies(4))
-
#define AUTOSUSPEND_TIMEOUT 2000
struct atmel_spi_caps {
bool keep_cs;
u32 fifo_size;
+ bool last_polarity;
u8 native_cs_free;
u8 native_cs_for_gpio;
};
#define SPI_MAX_DMA_XFER 65535 /* true for both PDC and DMA */
#define INVALID_DMA_ADDRESS 0xffffffff
+/*
+ * This frequency can be anything supported by the controller, but to avoid
+ * unnecessary delay, the highest possible frequency is chosen.
+ *
+ * This frequency is the highest possible which is not interfering with other
+ * chip select registers (see Note for Serial Clock Bit Rate configuration in
+ * Atmel-11121F-ATARM-SAMA5D3-Series-Datasheet_02-Feb-16, page 1283)
+ */
+#define DUMMY_MSG_FREQUENCY 0x02
+/*
+ * 8 bits is the minimum data the controller is capable of sending.
+ *
+ * This message can be anything as it should not be treated by any SPI device.
+ */
+#define DUMMY_MSG 0xAA
+
/*
* Version 2 of the SPI controller has
* - CR.LASTXFER
return as->caps.is_spi2;
}
+/*
+ * Send a dummy message.
+ *
+ * This is sometimes needed when using a CS GPIO to force clock transition when
+ * switching between devices with different polarities.
+ */
+static void atmel_spi_send_dummy(struct atmel_spi *as, struct spi_device *spi, int chip_select)
+{
+ u32 status;
+ u32 csr;
+
+ /*
+ * Set a clock frequency to allow sending message on SPI bus.
+ * The frequency here can be anything, but is needed for
+ * the controller to send the data.
+ */
+ csr = spi_readl(as, CSR0 + 4 * chip_select);
+ csr = SPI_BFINS(SCBR, DUMMY_MSG_FREQUENCY, csr);
+ spi_writel(as, CSR0 + 4 * chip_select, csr);
+
+ /*
+ * Read all data coming from SPI bus, needed to be able to send
+ * the message.
+ */
+ spi_readl(as, RDR);
+ while (spi_readl(as, SR) & SPI_BIT(RDRF)) {
+ spi_readl(as, RDR);
+ cpu_relax();
+ }
+
+ spi_writel(as, TDR, DUMMY_MSG);
+
+ readl_poll_timeout_atomic(as->regs + SPI_SR, status,
+ (status & SPI_BIT(TXEMPTY)), 1, 1000);
+}
+
+
/*
* Earlier SPI controllers (e.g. on at91rm9200) have a design bug whereby
* they assume that spi slave device state will not change on deselect, so
* Master on Chip Select 0.") No workaround exists for that ... so for
* nCS0 on that chip, we (a) don't use the GPIO, (b) can't support CS_HIGH,
* and (c) will trigger that first erratum in some cases.
+ *
+ * When changing the clock polarity, the SPI controller waits for the next
+ * transmission to enforce the default clock state. This may be an issue when
+ * using a GPIO as Chip Select: the clock level is applied only when the first
+ * packet is sent, once the CS has already been asserted. The workaround is to
+ * avoid this by sending a first (dummy) message before toggling the CS state.
*/
-
static void cs_activate(struct atmel_spi *as, struct spi_device *spi)
{
struct atmel_spi_device *asd = spi->controller_state;
+ bool new_polarity;
int chip_select;
u32 mr;
}
mr = spi_readl(as, MR);
+
+ /*
+ * Ensures the clock polarity is valid before we actually
+ * assert the CS to avoid spurious clock edges to be
+ * processed by the spi devices.
+ */
+ if (spi_get_csgpiod(spi, 0)) {
+ new_polarity = (asd->csr & SPI_BIT(CPOL)) != 0;
+ if (new_polarity != as->last_polarity) {
+ /*
+ * Need to disable the GPIO before sending the dummy
+ * message because it is already set by the spi core.
+ */
+ gpiod_set_value_cansleep(spi_get_csgpiod(spi, 0), 0);
+ atmel_spi_send_dummy(as, spi, chip_select);
+ as->last_polarity = new_polarity;
+ gpiod_set_value_cansleep(spi_get_csgpiod(spi, 0), 1);
+ }
+ }
} else {
u32 cpol = (spi->mode & SPI_CPOL) ? SPI_BIT(CPOL) : 0;
int i;
}
dma_timeout = msecs_to_jiffies(spi_controller_xfer_timeout(host, xfer));
- ret_timeout = wait_for_completion_interruptible_timeout(&as->xfer_completion,
- dma_timeout);
- if (ret_timeout <= 0) {
- dev_err(&spi->dev, "spi transfer %s\n",
- !ret_timeout ? "timeout" : "canceled");
- as->done_status = ret_timeout < 0 ? ret_timeout : -EIO;
+ ret_timeout = wait_for_completion_timeout(&as->xfer_completion, dma_timeout);
+ if (!ret_timeout) {
+ dev_err(&spi->dev, "spi transfer timeout\n");
+ as->done_status = -EIO;
}
if (as->done_status)
udelay(10);
cdns_spi_process_fifo(xspi, xspi->tx_fifo_depth, 0);
- spi_transfer_delay_exec(transfer);
cdns_spi_write(xspi, CDNS_SPI_IER, CDNS_SPI_IXR_DEFAULT);
return transfer->len;
ctrl |= (spi_imx->target_burst * 8 - 1)
<< MX51_ECSPI_CTRL_BL_OFFSET;
else {
- if (spi_imx->count >= 512)
- ctrl |= 0xFFF << MX51_ECSPI_CTRL_BL_OFFSET;
- else
- ctrl |= (spi_imx->count * spi_imx->bits_per_word - 1)
+ if (spi_imx->usedma) {
+ ctrl |= (spi_imx->bits_per_word *
+ spi_imx_bytes_per_word(spi_imx->bits_per_word) - 1)
<< MX51_ECSPI_CTRL_BL_OFFSET;
+ } else {
+ if (spi_imx->count >= MX51_ECSPI_CTRL_MAX_BURST)
+ ctrl |= (MX51_ECSPI_CTRL_MAX_BURST - 1)
+ << MX51_ECSPI_CTRL_BL_OFFSET;
+ else
+ ctrl |= (spi_imx->count * spi_imx->bits_per_word - 1)
+ << MX51_ECSPI_CTRL_BL_OFFSET;
+ }
}
/* set clock speed */
snprintf(dir_name, sizeof(dir_name), "port%d", port->port);
parent = debugfs_lookup(dir_name, port->sw->debugfs_dir);
if (parent)
- debugfs_remove_recursive(debugfs_lookup("margining", parent));
+ debugfs_lookup_and_remove("margining", parent);
kfree(port->usb4->margining);
port->usb4->margining = NULL;
goto err_request;
/*
- * Always keep 1000 Mb/s to make sure xHCI has at least some
+ * Always keep 900 Mb/s to make sure xHCI has at least some
* bandwidth available for isochronous traffic.
*/
- if (consumed_up < 1000)
- consumed_up = 1000;
- if (consumed_down < 1000)
- consumed_down = 1000;
+ if (consumed_up < 900)
+ consumed_up = 900;
+ if (consumed_down < 900)
+ consumed_down = 900;
ret = usb4_usb3_port_write_allocated_bandwidth(port, consumed_up,
consumed_down);
if (is_mcq_enabled(hba)) {
int utrd_size = sizeof(struct utp_transfer_req_desc);
struct utp_transfer_req_desc *src = lrbp->utr_descriptor_ptr;
- struct utp_transfer_req_desc *dest = hwq->sqe_base_addr + hwq->sq_tail_slot;
+ struct utp_transfer_req_desc *dest;
spin_lock(&hwq->sq_lock);
+ dest = hwq->sqe_base_addr + hwq->sq_tail_slot;
memcpy(dest, src, utrd_size);
ufshcd_inc_sq_tail(hwq);
spin_unlock(&hwq->sq_lock);
err = ufs_qcom_clk_scale_up_pre_change(hba);
else
err = ufs_qcom_clk_scale_down_pre_change(hba);
- if (err)
- ufshcd_uic_hibern8_exit(hba);
+ if (err) {
+ ufshcd_uic_hibern8_exit(hba);
+ return err;
+ }
} else {
if (scale_up)
err = ufs_qcom_clk_scale_up_post_change(hba);
* Vinayak Holikatti <h.vinayak@samsung.com>
*/
+#include <linux/clk.h>
#include <linux/module.h>
#include <linux/platform_device.h>
#include <linux/pm_opp.h>
}
}
+/**
+ * ufshcd_parse_clock_min_max_freq - Parse MIN and MAX clocks freq
+ * @hba: per adapter instance
+ *
+ * This function parses MIN and MAX frequencies of all clocks required
+ * by the host drivers.
+ *
+ * Returns 0 for success and non-zero for failure
+ */
+static int ufshcd_parse_clock_min_max_freq(struct ufs_hba *hba)
+{
+ struct list_head *head = &hba->clk_list_head;
+ struct ufs_clk_info *clki;
+ struct dev_pm_opp *opp;
+ unsigned long freq;
+ u8 idx = 0;
+
+ list_for_each_entry(clki, head, list) {
+ if (!clki->name)
+ continue;
+
+ clki->clk = devm_clk_get(hba->dev, clki->name);
+ if (IS_ERR(clki->clk))
+ continue;
+
+ /* Find Max Freq */
+ freq = ULONG_MAX;
+ opp = dev_pm_opp_find_freq_floor_indexed(hba->dev, &freq, idx);
+ if (IS_ERR(opp)) {
+ dev_err(hba->dev, "Failed to find OPP for MAX frequency\n");
+ return PTR_ERR(opp);
+ }
+ clki->max_freq = dev_pm_opp_get_freq_indexed(opp, idx);
+ dev_pm_opp_put(opp);
+
+ /* Find Min Freq */
+ freq = 0;
+ opp = dev_pm_opp_find_freq_ceil_indexed(hba->dev, &freq, idx);
+ if (IS_ERR(opp)) {
+ dev_err(hba->dev, "Failed to find OPP for MIN frequency\n");
+ return PTR_ERR(opp);
+ }
+ clki->min_freq = dev_pm_opp_get_freq_indexed(opp, idx++);
+ dev_pm_opp_put(opp);
+ }
+
+ return 0;
+}
+
static int ufshcd_parse_operating_points(struct ufs_hba *hba)
{
struct device *dev = hba->dev;
return ret;
}
+ ret = ufshcd_parse_clock_min_max_freq(hba);
+ if (ret)
+ return ret;
+
hba->use_pm_opp = true;
return 0;
temp = size;
size -= temp;
next += temp;
- if (temp == size)
- goto done;
}
temp = snprintf(next, size, "\n");
size -= temp;
next += temp;
-done:
*sizep = size;
*nextp = next;
}
io_data->kiocb->ki_complete(io_data->kiocb, ret);
if (io_data->ffs->ffs_eventfd && !kiocb_has_eventfd)
- eventfd_signal(io_data->ffs->ffs_eventfd, 1);
+ eventfd_signal(io_data->ffs->ffs_eventfd);
if (io_data->read)
kfree(io_data->to_free);
ffs->ev.types[ffs->ev.count++] = type;
wake_up_locked(&ffs->ev.waitq);
if (ffs->ffs_eventfd)
- eventfd_signal(ffs->ffs_eventfd, 1);
+ eventfd_signal(ffs->ffs_eventfd);
}
static void ffs_event_add(struct ffs_data *ffs,
{ USB_DEVICE(FTDI_VID, ACTISENSE_USG_PID) },
{ USB_DEVICE(FTDI_VID, ACTISENSE_NGT_PID) },
{ USB_DEVICE(FTDI_VID, ACTISENSE_NGW_PID) },
- { USB_DEVICE(FTDI_VID, ACTISENSE_D9AC_PID) },
- { USB_DEVICE(FTDI_VID, ACTISENSE_D9AD_PID) },
- { USB_DEVICE(FTDI_VID, ACTISENSE_D9AE_PID) },
+ { USB_DEVICE(FTDI_VID, ACTISENSE_UID_PID) },
+ { USB_DEVICE(FTDI_VID, ACTISENSE_USA_PID) },
+ { USB_DEVICE(FTDI_VID, ACTISENSE_NGX_PID) },
{ USB_DEVICE(FTDI_VID, ACTISENSE_D9AF_PID) },
{ USB_DEVICE(FTDI_VID, CHETCO_SEAGAUGE_PID) },
{ USB_DEVICE(FTDI_VID, CHETCO_SEASWITCH_PID) },
#define ACTISENSE_USG_PID 0xD9A9 /* USG USB Serial Adapter */
#define ACTISENSE_NGT_PID 0xD9AA /* NGT NMEA2000 Interface */
#define ACTISENSE_NGW_PID 0xD9AB /* NGW NMEA2000 Gateway */
-#define ACTISENSE_D9AC_PID 0xD9AC /* Actisense Reserved */
-#define ACTISENSE_D9AD_PID 0xD9AD /* Actisense Reserved */
-#define ACTISENSE_D9AE_PID 0xD9AE /* Actisense Reserved */
+#define ACTISENSE_UID_PID 0xD9AC /* USB Isolating Device */
+#define ACTISENSE_USA_PID 0xD9AD /* USB to Serial Adapter */
+#define ACTISENSE_NGX_PID 0xD9AE /* NGX NMEA2000 Gateway */
#define ACTISENSE_D9AF_PID 0xD9AF /* Actisense Reserved */
#define CHETCO_SEAGAUGE_PID 0xA548 /* SeaGauge USB Adapter */
#define CHETCO_SEASWITCH_PID 0xA549 /* SeaSwitch USB Adapter */
#define QUECTEL_PRODUCT_RM500Q 0x0800
#define QUECTEL_PRODUCT_RM520N 0x0801
#define QUECTEL_PRODUCT_EC200U 0x0901
+#define QUECTEL_PRODUCT_EG912Y 0x6001
#define QUECTEL_PRODUCT_EC200S_CN 0x6002
#define QUECTEL_PRODUCT_EC200A 0x6005
#define QUECTEL_PRODUCT_EM061K_LWW 0x6008
{ USB_DEVICE_INTERFACE_CLASS(QUECTEL_VENDOR_ID, 0x0700, 0xff), /* BG95 */
.driver_info = RSVD(3) | ZLP },
{ USB_DEVICE_AND_INTERFACE_INFO(QUECTEL_VENDOR_ID, QUECTEL_PRODUCT_RM500Q, 0xff, 0xff, 0x30) },
+ { USB_DEVICE_AND_INTERFACE_INFO(QUECTEL_VENDOR_ID, QUECTEL_PRODUCT_RM500Q, 0xff, 0, 0x40) },
{ USB_DEVICE_AND_INTERFACE_INFO(QUECTEL_VENDOR_ID, QUECTEL_PRODUCT_RM500Q, 0xff, 0, 0) },
{ USB_DEVICE_AND_INTERFACE_INFO(QUECTEL_VENDOR_ID, QUECTEL_PRODUCT_RM500Q, 0xff, 0xff, 0x10),
.driver_info = ZLP },
{ USB_DEVICE_AND_INTERFACE_INFO(QUECTEL_VENDOR_ID, QUECTEL_PRODUCT_EC200U, 0xff, 0, 0) },
{ USB_DEVICE_AND_INTERFACE_INFO(QUECTEL_VENDOR_ID, QUECTEL_PRODUCT_EC200S_CN, 0xff, 0, 0) },
{ USB_DEVICE_AND_INTERFACE_INFO(QUECTEL_VENDOR_ID, QUECTEL_PRODUCT_EC200T, 0xff, 0, 0) },
+ { USB_DEVICE_AND_INTERFACE_INFO(QUECTEL_VENDOR_ID, QUECTEL_PRODUCT_EG912Y, 0xff, 0, 0) },
{ USB_DEVICE_AND_INTERFACE_INFO(QUECTEL_VENDOR_ID, QUECTEL_PRODUCT_RM500K, 0xff, 0x00, 0x00) },
{ USB_DEVICE(CMOTECH_VENDOR_ID, CMOTECH_PRODUCT_6001) },
.driver_info = RSVD(0) | RSVD(1) | RSVD(6) },
{ USB_DEVICE(0x0489, 0xe0b5), /* Foxconn T77W968 ESIM */
.driver_info = RSVD(0) | RSVD(1) | RSVD(6) },
+ { USB_DEVICE_INTERFACE_CLASS(0x0489, 0xe0da, 0xff), /* Foxconn T99W265 MBIM variant */
+ .driver_info = RSVD(3) | RSVD(5) },
{ USB_DEVICE_INTERFACE_CLASS(0x0489, 0xe0db, 0xff), /* Foxconn T99W265 MBIM */
.driver_info = RSVD(3) },
{ USB_DEVICE_INTERFACE_CLASS(0x0489, 0xe0ee, 0xff), /* Foxconn T99W368 MBIM */
USB_SC_DEVICE, USB_PR_DEVICE, NULL,
US_FL_INITIAL_READ10 ),
+/*
+ * Patch by Tasos Sahanidis <tasos@tasossah.com>
+ * This flash drive always shows up with write protect enabled
+ * during the first mode sense.
+ */
+UNUSUAL_DEV(0x0951, 0x1697, 0x0100, 0x0100,
+ "Kingston",
+ "DT Ultimate G3",
+ USB_SC_DEVICE, USB_PR_DEVICE, NULL,
+ US_FL_NO_WP_DETECT),
+
/*
* This Pentax still camera is not conformant
* to the USB storage specification: -
con_num = UCSI_CCI_CONNECTOR(cci);
if (con_num) {
- if (con_num < PMIC_GLINK_MAX_PORTS &&
+ if (con_num <= PMIC_GLINK_MAX_PORTS &&
ucsi->port_orientation[con_num - 1]) {
int orientation = gpiod_get_value(ucsi->port_orientation[con_num - 1]);
goto unlock;
if (vq->kickfd)
- eventfd_signal(vq->kickfd, 1);
+ eventfd_signal(vq->kickfd);
else
vq->kicked = true;
unlock:
eventfd_ctx_put(vq->kickfd);
vq->kickfd = ctx;
if (vq->ready && vq->kicked && vq->kickfd) {
- eventfd_signal(vq->kickfd, 1);
+ eventfd_signal(vq->kickfd);
vq->kicked = false;
}
spin_unlock(&vq->kick_lock);
spin_lock_irq(&vq->irq_lock);
if (vq->ready && vq->cb.trigger) {
- eventfd_signal(vq->cb.trigger, 1);
+ eventfd_signal(vq->cb.trigger);
signal = true;
}
spin_unlock_irq(&vq->irq_lock);
fput(f);
break;
}
- ret = receive_fd(f, perm_to_file_flags(entry.perm));
+ ret = receive_fd(f, NULL, perm_to_file_flags(entry.perm));
fput(f);
break;
}
{
struct vfio_fsl_mc_irq *mc_irq = (struct vfio_fsl_mc_irq *)arg;
- eventfd_signal(mc_irq->trigger, 1);
+ eventfd_signal(mc_irq->trigger);
return IRQ_HANDLED;
}
*/
down_write(&vdev->memory_lock);
if (vdev->pm_wake_eventfd_ctx) {
- eventfd_signal(vdev->pm_wake_eventfd_ctx, 1);
+ eventfd_signal(vdev->pm_wake_eventfd_ctx);
__vfio_pci_runtime_pm_exit(vdev);
}
up_write(&vdev->memory_lock);
pci_notice_ratelimited(pdev,
"Relaying device request to user (#%u)\n",
count);
- eventfd_signal(vdev->req_trigger, 1);
+ eventfd_signal(vdev->req_trigger);
} else if (count == 0) {
pci_warn(pdev,
"No device request channel registered, blocked until released by user\n");
mutex_lock(&vdev->igate);
if (vdev->err_trigger)
- eventfd_signal(vdev->err_trigger, 1);
+ eventfd_signal(vdev->err_trigger);
mutex_unlock(&vdev->igate);
ctx = vfio_irq_ctx_get(vdev, 0);
if (WARN_ON_ONCE(!ctx))
return;
- eventfd_signal(ctx->trigger, 1);
+ eventfd_signal(ctx->trigger);
}
}
{
struct eventfd_ctx *trigger = arg;
- eventfd_signal(trigger, 1);
+ eventfd_signal(trigger);
return IRQ_HANDLED;
}
if (!ctx)
continue;
if (flags & VFIO_IRQ_SET_DATA_NONE) {
- eventfd_signal(ctx->trigger, 1);
+ eventfd_signal(ctx->trigger);
} else if (flags & VFIO_IRQ_SET_DATA_BOOL) {
uint8_t *bools = data;
if (bools[i - start])
- eventfd_signal(ctx->trigger, 1);
+ eventfd_signal(ctx->trigger);
}
}
return 0;
if (flags & VFIO_IRQ_SET_DATA_NONE) {
if (*ctx) {
if (count) {
- eventfd_signal(*ctx, 1);
+ eventfd_signal(*ctx);
} else {
eventfd_ctx_put(*ctx);
*ctx = NULL;
trigger = *(uint8_t *)data;
if (trigger && *ctx)
- eventfd_signal(*ctx, 1);
+ eventfd_signal(*ctx);
return 0;
} else if (flags & VFIO_IRQ_SET_DATA_EVENTFD) {
spin_unlock_irqrestore(&irq_ctx->lock, flags);
if (ret == IRQ_HANDLED)
- eventfd_signal(irq_ctx->trigger, 1);
+ eventfd_signal(irq_ctx->trigger);
return ret;
}
{
struct vfio_platform_irq *irq_ctx = dev_id;
- eventfd_signal(irq_ctx->trigger, 1);
+ eventfd_signal(irq_ctx->trigger);
return IRQ_HANDLED;
}
struct eventfd_ctx *call_ctx = vq->call_ctx.ctx;
if (call_ctx)
- eventfd_signal(call_ctx, 1);
+ eventfd_signal(call_ctx);
return IRQ_HANDLED;
}
struct eventfd_ctx *config_ctx = v->config_ctx;
if (config_ctx)
- eventfd_signal(config_ctx, 1);
+ eventfd_signal(config_ctx);
return IRQ_HANDLED;
}
len -= l;
if (!len) {
if (vq->log_ctx)
- eventfd_signal(vq->log_ctx, 1);
+ eventfd_signal(vq->log_ctx);
return 0;
}
}
log_used(vq, (used - (void __user *)vq->used),
sizeof vq->used->flags);
if (vq->log_ctx)
- eventfd_signal(vq->log_ctx, 1);
+ eventfd_signal(vq->log_ctx);
}
return 0;
}
log_used(vq, (used - (void __user *)vq->used),
sizeof *vhost_avail_event(vq));
if (vq->log_ctx)
- eventfd_signal(vq->log_ctx, 1);
+ eventfd_signal(vq->log_ctx);
}
return 0;
}
log_used(vq, offsetof(struct vring_used, idx),
sizeof vq->used->idx);
if (vq->log_ctx)
- eventfd_signal(vq->log_ctx, 1);
+ eventfd_signal(vq->log_ctx);
}
return r;
}
{
/* Signal the Guest tell them we used something up. */
if (vq->call_ctx.ctx && vhost_notify(dev, vq))
- eventfd_signal(vq->call_ctx.ctx, 1);
+ eventfd_signal(vq->call_ctx.ctx);
}
EXPORT_SYMBOL_GPL(vhost_signal);
#define vq_err(vq, fmt, ...) do { \
pr_debug(pr_fmt(fmt), ##__VA_ARGS__); \
if ((vq)->error_ctx) \
- eventfd_signal((vq)->error_ctx, 1);\
+ eventfd_signal((vq)->error_ctx);\
} while (0)
enum {
mutex_lock(&client->vm->ioeventfds_lock);
p = hsm_ioeventfd_match(client->vm, addr, val, size, req->type);
if (p)
- eventfd_signal(p->eventfd, 1);
+ eventfd_signal(p->eventfd);
mutex_unlock(&client->vm->ioeventfds_lock);
return 0;
return ret;
}
-static int __exit sev_guest_remove(struct platform_device *pdev)
+static void __exit sev_guest_remove(struct platform_device *pdev)
{
struct snp_guest_dev *snp_dev = platform_get_drvdata(pdev);
free_shared_pages(snp_dev->request, sizeof(struct snp_guest_msg));
deinit_crypto(snp_dev->crypto);
misc_deregister(&snp_dev->misc);
-
- return 0;
}
/*
* with the SEV-SNP support, it is named "sev-guest".
*/
static struct platform_driver sev_guest_driver = {
- .remove = __exit_p(sev_guest_remove),
+ .remove_new = __exit_p(sev_guest_remove),
.driver = {
.name = "sev-guest",
},
if (!vq->use_dma_api)
return;
- dma_sync_single_range_for_cpu(dev, addr, offset, size,
- DMA_BIDIRECTIONAL);
+ dma_sync_single_range_for_cpu(dev, addr, offset, size, dir);
}
EXPORT_SYMBOL_GPL(virtqueue_dma_sync_single_range_for_cpu);
if (!vq->use_dma_api)
return;
- dma_sync_single_range_for_device(dev, addr, offset, size,
- DMA_BIDIRECTIONAL);
+ dma_sync_single_range_for_device(dev, addr, offset, size, dir);
}
EXPORT_SYMBOL_GPL(virtqueue_dma_sync_single_range_for_device);
if (ioreq->addr == kioeventfd->addr + VIRTIO_MMIO_QUEUE_NOTIFY &&
ioreq->size == kioeventfd->addr_len &&
(ioreq->data & QUEUE_NOTIFY_VQ_MASK) == kioeventfd->vq) {
- eventfd_signal(kioeventfd->eventfd, 1);
+ eventfd_signal(kioeventfd->eventfd);
state = STATE_IORESP_READY;
break;
}
config FS_IOMAP
bool
+# Stackable filesystems
+config FS_STACK
+ bool
+
config BUFFER_HEAD
bool
obj-$(CONFIG_BINFMT_ELF_FDPIC) += binfmt_elf_fdpic.o
obj-$(CONFIG_BINFMT_FLAT) += binfmt_flat.o
+obj-$(CONFIG_FS_STACK) += backing-file.o
obj-$(CONFIG_FS_MBCACHE) += mbcache.o
obj-$(CONFIG_FS_POSIX_ACL) += posix_acl.o
obj-$(CONFIG_NFS_COMMON) += nfs_common/
if (ret == -ENOMEM)
goto out_wake;
- ret = -ENOMEM;
vllist = afs_alloc_vlserver_list(0);
- if (!vllist)
+ if (!vllist) {
+ if (ret >= 0)
+ ret = -ENOMEM;
goto out_wake;
+ }
switch (ret) {
case -ENODATA:
struct afs_net *net = afs_d2net(dentry);
const char *name = dentry->d_name.name;
size_t len = dentry->d_name.len;
+ char *result = NULL;
int ret;
/* Names prefixed with a dot are R/W mounts. */
}
ret = dns_query(net->net, "afsdb", name, len, "srv=1",
- NULL, NULL, false);
- if (ret == -ENODATA || ret == -ENOKEY)
+ &result, NULL, false);
+ if (ret == -ENODATA || ret == -ENOKEY || ret == 0)
ret = -ENOENT;
+ if (ret > 0 && ret >= sizeof(struct dns_server_list_v1_header)) {
+ struct dns_server_list_v1_header *v1 = (void *)result;
+
+ if (v1->hdr.zero == 0 &&
+ v1->hdr.content == DNS_PAYLOAD_IS_SERVER_LIST &&
+ v1->hdr.version == 1 &&
+ (v1->status != DNS_LOOKUP_GOOD &&
+ v1->status != DNS_LOOKUP_GOOD_WITH_BAD))
+ return -ENOENT;
+
+ }
+
+ kfree(result);
return ret;
}
return 1;
}
-/*
- * Allow the VFS to enquire as to whether a dentry should be unhashed (mustn't
- * sleep)
- * - called from dput() when d_count is going to 0.
- * - return 1 to request dentry be unhashed, 0 otherwise
- */
-static int afs_dynroot_d_delete(const struct dentry *dentry)
-{
- return d_really_is_positive(dentry);
-}
-
const struct dentry_operations afs_dynroot_dentry_operations = {
.d_revalidate = afs_dynroot_d_revalidate,
- .d_delete = afs_dynroot_d_delete,
+ .d_delete = always_delete_dentry,
.d_release = afs_d_release,
.d_automount = afs_d_automount,
};
#define AFS_VOLUME_OFFLINE 4 /* - T if volume offline notice given */
#define AFS_VOLUME_BUSY 5 /* - T if volume busy notice given */
#define AFS_VOLUME_MAYBE_NO_IBULK 6 /* - T if some servers don't have InlineBulkStatus */
+#define AFS_VOLUME_RM_TREE 7 /* - Set if volume removed from cell->volumes */
#ifdef CONFIG_AFS_FSCACHE
struct fscache_volume *cache; /* Caching cookie */
#endif
extern struct afs_volume *afs_create_volume(struct afs_fs_context *);
extern int afs_activate_volume(struct afs_volume *);
extern void afs_deactivate_volume(struct afs_volume *);
+bool afs_try_get_volume(struct afs_volume *volume, enum afs_volume_trace reason);
extern struct afs_volume *afs_get_volume(struct afs_volume *, enum afs_volume_trace);
extern void afs_put_volume(struct afs_net *, struct afs_volume *, enum afs_volume_trace);
extern int afs_check_volume_status(struct afs_volume *, struct afs_operation *);
} else if (p->vid > volume->vid) {
pp = &(*pp)->rb_right;
} else {
- volume = afs_get_volume(p, afs_volume_trace_get_cell_insert);
- goto found;
+ if (afs_try_get_volume(p, afs_volume_trace_get_cell_insert)) {
+ volume = p;
+ goto found;
+ }
+
+ set_bit(AFS_VOLUME_RM_TREE, &volume->flags);
+ rb_replace_node_rcu(&p->cell_node, &volume->cell_node, &cell->volumes);
}
}
afs_volume_trace_remove);
write_seqlock(&cell->volume_lock);
hlist_del_rcu(&volume->proc_link);
- rb_erase(&volume->cell_node, &cell->volumes);
+ if (!test_and_set_bit(AFS_VOLUME_RM_TREE, &volume->flags))
+ rb_erase(&volume->cell_node, &cell->volumes);
write_sequnlock(&cell->volume_lock);
}
}
_leave(" [destroyed]");
}
+/*
+ * Try to get a reference on a volume record.
+ */
+bool afs_try_get_volume(struct afs_volume *volume, enum afs_volume_trace reason)
+{
+ int r;
+
+ if (__refcount_inc_not_zero(&volume->ref, &r)) {
+ trace_afs_volume(volume->vid, r + 1, reason);
+ return true;
+ }
+ return false;
+}
+
/*
* Get a reference on a volume record.
*/
return ERR_CAST(inode);
inode->i_mapping->a_ops = &aio_ctx_aops;
- inode->i_mapping->private_data = ctx;
+ inode->i_mapping->i_private_data = ctx;
inode->i_size = PAGE_SIZE * nr_pages;
file = alloc_file_pseudo(inode, aio_mnt, "[aio]",
/* Prevent further access to the kioctx from migratepages */
i_mapping = aio_ring_file->f_mapping;
- spin_lock(&i_mapping->private_lock);
- i_mapping->private_data = NULL;
+ spin_lock(&i_mapping->i_private_lock);
+ i_mapping->i_private_data = NULL;
ctx->aio_ring_file = NULL;
- spin_unlock(&i_mapping->private_lock);
+ spin_unlock(&i_mapping->i_private_lock);
fput(aio_ring_file);
}
rc = 0;
- /* mapping->private_lock here protects against the kioctx teardown. */
- spin_lock(&mapping->private_lock);
- ctx = mapping->private_data;
+ /* mapping->i_private_lock here protects against the kioctx teardown. */
+ spin_lock(&mapping->i_private_lock);
+ ctx = mapping->i_private_data;
if (!ctx) {
rc = -EINVAL;
goto out;
out_unlock:
mutex_unlock(&ctx->ring_lock);
out:
- spin_unlock(&mapping->private_lock);
+ spin_unlock(&mapping->i_private_lock);
return rc;
}
#else
kmem_cache_free(kiocb_cachep, iocb);
}
+struct aio_waiter {
+ struct wait_queue_entry w;
+ size_t min_nr;
+};
+
/* aio_complete
* Called when the io request on the given iocb is complete.
*/
struct kioctx *ctx = iocb->ki_ctx;
struct aio_ring *ring;
struct io_event *ev_page, *event;
- unsigned tail, pos, head;
+ unsigned tail, pos, head, avail;
unsigned long flags;
/*
ctx->completed_events++;
if (ctx->completed_events > 1)
refill_reqs_available(ctx, head, tail);
+
+ avail = tail > head
+ ? tail - head
+ : tail + ctx->nr_events - head;
spin_unlock_irqrestore(&ctx->completion_lock, flags);
pr_debug("added to ring %p at [%u]\n", iocb, tail);
* from IRQ context.
*/
if (iocb->ki_eventfd)
- eventfd_signal(iocb->ki_eventfd, 1);
+ eventfd_signal(iocb->ki_eventfd);
/*
* We have to order our ring_info tail store above and test
*/
smp_mb();
- if (waitqueue_active(&ctx->wait))
- wake_up(&ctx->wait);
+ if (waitqueue_active(&ctx->wait)) {
+ struct aio_waiter *curr, *next;
+ unsigned long flags;
+
+ spin_lock_irqsave(&ctx->wait.lock, flags);
+ list_for_each_entry_safe(curr, next, &ctx->wait.head, w.entry)
+ if (avail >= curr->min_nr) {
+ list_del_init_careful(&curr->w.entry);
+ wake_up_process(curr->w.private);
+ }
+ spin_unlock_irqrestore(&ctx->wait.lock, flags);
+ }
}
static inline void iocb_put(struct aio_kiocb *iocb)
struct io_event __user *event,
ktime_t until)
{
- long ret = 0;
+ struct hrtimer_sleeper t;
+ struct aio_waiter w;
+ long ret = 0, ret2 = 0;
/*
* Note that aio_read_events() is being called as the conditional - i.e.
* the ringbuffer empty. So in practice we should be ok, but it's
* something to be aware of when touching this code.
*/
- if (until == 0)
- aio_read_events(ctx, min_nr, nr, event, &ret);
- else
- wait_event_interruptible_hrtimeout(ctx->wait,
- aio_read_events(ctx, min_nr, nr, event, &ret),
- until);
+ aio_read_events(ctx, min_nr, nr, event, &ret);
+ if (until == 0 || ret < 0 || ret >= min_nr)
+ return ret;
+
+ hrtimer_init_sleeper_on_stack(&t, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+ if (until != KTIME_MAX) {
+ hrtimer_set_expires_range_ns(&t.timer, until, current->timer_slack_ns);
+ hrtimer_sleeper_start_expires(&t, HRTIMER_MODE_REL);
+ }
+
+ init_wait(&w.w);
+
+ while (1) {
+ unsigned long nr_got = ret;
+
+ w.min_nr = min_nr - ret;
+
+ ret2 = prepare_to_wait_event(&ctx->wait, &w.w, TASK_INTERRUPTIBLE);
+ if (!ret2 && !t.task)
+ ret2 = -ETIME;
+
+ if (aio_read_events(ctx, min_nr, nr, event, &ret) || ret2)
+ break;
+
+ if (nr_got == ret)
+ schedule();
+ }
+
+ finish_wait(&ctx->wait, &w.w);
+ hrtimer_cancel(&t.timer);
+ destroy_hrtimer_on_stack(&t.timer);
+
return ret;
}
size_t len = iocb->aio_nbytes;
if (!vectored) {
- ssize_t ret = import_single_range(rw, buf, len, *iovec, iter);
+ ssize_t ret = import_ubuf(rw, buf, len, iter);
*iovec = NULL;
return ret;
}
* the vfsmount must be passed through @idmap. This function will then
* take care to map the inode according to @idmap before checking
* permissions. On non-idmapped mounts or if permission checking is to be
- * performed on the raw inode simply passs @nop_mnt_idmap.
+ * performed on the raw inode simply pass @nop_mnt_idmap.
*
* Should be called as the first thing in ->setattr implementations,
* possibly after taking additional locks.
--- /dev/null
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Common helpers for stackable filesystems and backing files.
+ *
+ * Forked from fs/overlayfs/file.c.
+ *
+ * Copyright (C) 2017 Red Hat, Inc.
+ * Copyright (C) 2023 CTERA Networks.
+ */
+
+#include <linux/fs.h>
+#include <linux/backing-file.h>
+#include <linux/splice.h>
+#include <linux/mm.h>
+
+#include "internal.h"
+
+/**
+ * backing_file_open - open a backing file for kernel internal use
+ * @user_path: path that the user reuqested to open
+ * @flags: open flags
+ * @real_path: path of the backing file
+ * @cred: credentials for open
+ *
+ * Open a backing file for a stackable filesystem (e.g., overlayfs).
+ * @user_path may be on the stackable filesystem and @real_path on the
+ * underlying filesystem. In this case, we want to be able to return the
+ * @user_path of the stackable filesystem. This is done by embedding the
+ * returned file into a container structure that also stores the stacked
+ * file's path, which can be retrieved using backing_file_user_path().
+ */
+struct file *backing_file_open(const struct path *user_path, int flags,
+ const struct path *real_path,
+ const struct cred *cred)
+{
+ struct file *f;
+ int error;
+
+ f = alloc_empty_backing_file(flags, cred);
+ if (IS_ERR(f))
+ return f;
+
+ path_get(user_path);
+ *backing_file_user_path(f) = *user_path;
+ error = vfs_open(real_path, f);
+ if (error) {
+ fput(f);
+ f = ERR_PTR(error);
+ }
+
+ return f;
+}
+EXPORT_SYMBOL_GPL(backing_file_open);
+
+struct backing_aio {
+ struct kiocb iocb;
+ refcount_t ref;
+ struct kiocb *orig_iocb;
+ /* used for aio completion */
+ void (*end_write)(struct file *);
+ struct work_struct work;
+ long res;
+};
+
+static struct kmem_cache *backing_aio_cachep;
+
+#define BACKING_IOCB_MASK \
+ (IOCB_NOWAIT | IOCB_HIPRI | IOCB_DSYNC | IOCB_SYNC | IOCB_APPEND)
+
+static rwf_t iocb_to_rw_flags(int flags)
+{
+ return (__force rwf_t)(flags & BACKING_IOCB_MASK);
+}
+
+static void backing_aio_put(struct backing_aio *aio)
+{
+ if (refcount_dec_and_test(&aio->ref)) {
+ fput(aio->iocb.ki_filp);
+ kmem_cache_free(backing_aio_cachep, aio);
+ }
+}
+
+static void backing_aio_cleanup(struct backing_aio *aio, long res)
+{
+ struct kiocb *iocb = &aio->iocb;
+ struct kiocb *orig_iocb = aio->orig_iocb;
+
+ if (aio->end_write)
+ aio->end_write(orig_iocb->ki_filp);
+
+ orig_iocb->ki_pos = iocb->ki_pos;
+ backing_aio_put(aio);
+}
+
+static void backing_aio_rw_complete(struct kiocb *iocb, long res)
+{
+ struct backing_aio *aio = container_of(iocb, struct backing_aio, iocb);
+ struct kiocb *orig_iocb = aio->orig_iocb;
+
+ if (iocb->ki_flags & IOCB_WRITE)
+ kiocb_end_write(iocb);
+
+ backing_aio_cleanup(aio, res);
+ orig_iocb->ki_complete(orig_iocb, res);
+}
+
+static void backing_aio_complete_work(struct work_struct *work)
+{
+ struct backing_aio *aio = container_of(work, struct backing_aio, work);
+
+ backing_aio_rw_complete(&aio->iocb, aio->res);
+}
+
+static void backing_aio_queue_completion(struct kiocb *iocb, long res)
+{
+ struct backing_aio *aio = container_of(iocb, struct backing_aio, iocb);
+
+ /*
+ * Punt to a work queue to serialize updates of mtime/size.
+ */
+ aio->res = res;
+ INIT_WORK(&aio->work, backing_aio_complete_work);
+ queue_work(file_inode(aio->orig_iocb->ki_filp)->i_sb->s_dio_done_wq,
+ &aio->work);
+}
+
+static int backing_aio_init_wq(struct kiocb *iocb)
+{
+ struct super_block *sb = file_inode(iocb->ki_filp)->i_sb;
+
+ if (sb->s_dio_done_wq)
+ return 0;
+
+ return sb_init_dio_done_wq(sb);
+}
+
+
+ssize_t backing_file_read_iter(struct file *file, struct iov_iter *iter,
+ struct kiocb *iocb, int flags,
+ struct backing_file_ctx *ctx)
+{
+ struct backing_aio *aio = NULL;
+ const struct cred *old_cred;
+ ssize_t ret;
+
+ if (WARN_ON_ONCE(!(file->f_mode & FMODE_BACKING)))
+ return -EIO;
+
+ if (!iov_iter_count(iter))
+ return 0;
+
+ if (iocb->ki_flags & IOCB_DIRECT &&
+ !(file->f_mode & FMODE_CAN_ODIRECT))
+ return -EINVAL;
+
+ old_cred = override_creds(ctx->cred);
+ if (is_sync_kiocb(iocb)) {
+ rwf_t rwf = iocb_to_rw_flags(flags);
+
+ ret = vfs_iter_read(file, iter, &iocb->ki_pos, rwf);
+ } else {
+ ret = -ENOMEM;
+ aio = kmem_cache_zalloc(backing_aio_cachep, GFP_KERNEL);
+ if (!aio)
+ goto out;
+
+ aio->orig_iocb = iocb;
+ kiocb_clone(&aio->iocb, iocb, get_file(file));
+ aio->iocb.ki_complete = backing_aio_rw_complete;
+ refcount_set(&aio->ref, 2);
+ ret = vfs_iocb_iter_read(file, &aio->iocb, iter);
+ backing_aio_put(aio);
+ if (ret != -EIOCBQUEUED)
+ backing_aio_cleanup(aio, ret);
+ }
+out:
+ revert_creds(old_cred);
+
+ if (ctx->accessed)
+ ctx->accessed(ctx->user_file);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(backing_file_read_iter);
+
+ssize_t backing_file_write_iter(struct file *file, struct iov_iter *iter,
+ struct kiocb *iocb, int flags,
+ struct backing_file_ctx *ctx)
+{
+ const struct cred *old_cred;
+ ssize_t ret;
+
+ if (WARN_ON_ONCE(!(file->f_mode & FMODE_BACKING)))
+ return -EIO;
+
+ if (!iov_iter_count(iter))
+ return 0;
+
+ ret = file_remove_privs(ctx->user_file);
+ if (ret)
+ return ret;
+
+ if (iocb->ki_flags & IOCB_DIRECT &&
+ !(file->f_mode & FMODE_CAN_ODIRECT))
+ return -EINVAL;
+
+ /*
+ * Stacked filesystems don't support deferred completions, don't copy
+ * this property in case it is set by the issuer.
+ */
+ flags &= ~IOCB_DIO_CALLER_COMP;
+
+ old_cred = override_creds(ctx->cred);
+ if (is_sync_kiocb(iocb)) {
+ rwf_t rwf = iocb_to_rw_flags(flags);
+
+ ret = vfs_iter_write(file, iter, &iocb->ki_pos, rwf);
+ if (ctx->end_write)
+ ctx->end_write(ctx->user_file);
+ } else {
+ struct backing_aio *aio;
+
+ ret = backing_aio_init_wq(iocb);
+ if (ret)
+ goto out;
+
+ ret = -ENOMEM;
+ aio = kmem_cache_zalloc(backing_aio_cachep, GFP_KERNEL);
+ if (!aio)
+ goto out;
+
+ aio->orig_iocb = iocb;
+ aio->end_write = ctx->end_write;
+ kiocb_clone(&aio->iocb, iocb, get_file(file));
+ aio->iocb.ki_flags = flags;
+ aio->iocb.ki_complete = backing_aio_queue_completion;
+ refcount_set(&aio->ref, 2);
+ ret = vfs_iocb_iter_write(file, &aio->iocb, iter);
+ backing_aio_put(aio);
+ if (ret != -EIOCBQUEUED)
+ backing_aio_cleanup(aio, ret);
+ }
+out:
+ revert_creds(old_cred);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(backing_file_write_iter);
+
+ssize_t backing_file_splice_read(struct file *in, loff_t *ppos,
+ struct pipe_inode_info *pipe, size_t len,
+ unsigned int flags,
+ struct backing_file_ctx *ctx)
+{
+ const struct cred *old_cred;
+ ssize_t ret;
+
+ if (WARN_ON_ONCE(!(in->f_mode & FMODE_BACKING)))
+ return -EIO;
+
+ old_cred = override_creds(ctx->cred);
+ ret = vfs_splice_read(in, ppos, pipe, len, flags);
+ revert_creds(old_cred);
+
+ if (ctx->accessed)
+ ctx->accessed(ctx->user_file);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(backing_file_splice_read);
+
+ssize_t backing_file_splice_write(struct pipe_inode_info *pipe,
+ struct file *out, loff_t *ppos, size_t len,
+ unsigned int flags,
+ struct backing_file_ctx *ctx)
+{
+ const struct cred *old_cred;
+ ssize_t ret;
+
+ if (WARN_ON_ONCE(!(out->f_mode & FMODE_BACKING)))
+ return -EIO;
+
+ ret = file_remove_privs(ctx->user_file);
+ if (ret)
+ return ret;
+
+ old_cred = override_creds(ctx->cred);
+ file_start_write(out);
+ ret = iter_file_splice_write(pipe, out, ppos, len, flags);
+ file_end_write(out);
+ revert_creds(old_cred);
+
+ if (ctx->end_write)
+ ctx->end_write(ctx->user_file);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(backing_file_splice_write);
+
+int backing_file_mmap(struct file *file, struct vm_area_struct *vma,
+ struct backing_file_ctx *ctx)
+{
+ const struct cred *old_cred;
+ int ret;
+
+ if (WARN_ON_ONCE(!(file->f_mode & FMODE_BACKING)) ||
+ WARN_ON_ONCE(ctx->user_file != vma->vm_file))
+ return -EIO;
+
+ if (!file->f_op->mmap)
+ return -ENODEV;
+
+ vma_set_file(vma, file);
+
+ old_cred = override_creds(ctx->cred);
+ ret = call_mmap(vma->vm_file, vma);
+ revert_creds(old_cred);
+
+ if (ctx->accessed)
+ ctx->accessed(ctx->user_file);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(backing_file_mmap);
+
+static int __init backing_aio_init(void)
+{
+ backing_aio_cachep = kmem_cache_create("backing_aio",
+ sizeof(struct backing_aio),
+ 0, SLAB_HWCACHE_ALIGN, NULL);
+ if (!backing_aio_cachep)
+ return -ENOMEM;
+
+ return 0;
+}
+fs_initcall(backing_aio_init);
clock.o \
compress.o \
counters.o \
+ darray.o \
debug.o \
dirent.o \
disk_groups.o \
reflink.o \
replicas.o \
sb-clean.o \
+ sb-downgrade.o \
sb-errors.o \
sb-members.o \
siphash.o \
bch2_trans_begin(trans);
acl = _acl;
- ret = bch2_inode_peek(trans, &inode_iter, &inode_u, inode_inum(inode),
+ ret = bch2_subvol_is_ro_trans(trans, inode->ei_subvol) ?:
+ bch2_inode_peek(trans, &inode_iter, &inode_u, inode_inum(inode),
BTREE_ITER_INTENT);
if (ret)
goto btree_err;
goto alloc_done;
/* Don't retry from all devices if we're out of open buckets: */
- if (bch2_err_matches(ret, BCH_ERR_open_buckets_empty))
- goto allocate_blocking;
+ if (bch2_err_matches(ret, BCH_ERR_open_buckets_empty)) {
+ int ret = open_bucket_add_buckets(trans, &ptrs, wp, devs_have,
+ target, erasure_code,
+ nr_replicas, &nr_effective,
+ &have_cache, watermark,
+ flags, cl);
+ if (!ret ||
+ bch2_err_matches(ret, BCH_ERR_transaction_restart) ||
+ bch2_err_matches(ret, BCH_ERR_open_buckets_empty))
+ goto alloc_done;
+ }
/*
* Only try to allocate cache (durability = 0 devices) from the
&have_cache, watermark,
flags, cl);
} else {
-allocate_blocking:
ret = open_bucket_add_buckets(trans, &ptrs, wp, devs_have,
target, erasure_code,
nr_replicas, &nr_effective,
unsigned nsec_per_time_unit;
u64 features;
u64 compat;
+ unsigned long errors_silent[BITS_TO_LONGS(BCH_SB_ERR_MAX)];
} sb;
};
#define BCH_SB_FIELDS() \
- x(journal, 0) \
- x(members_v1, 1) \
- x(crypt, 2) \
- x(replicas_v0, 3) \
- x(quota, 4) \
- x(disk_groups, 5) \
- x(clean, 6) \
- x(replicas, 7) \
- x(journal_seq_blacklist, 8) \
- x(journal_v2, 9) \
- x(counters, 10) \
- x(members_v2, 11) \
- x(errors, 12)
+ x(journal, 0) \
+ x(members_v1, 1) \
+ x(crypt, 2) \
+ x(replicas_v0, 3) \
+ x(quota, 4) \
+ x(disk_groups, 5) \
+ x(clean, 6) \
+ x(replicas, 7) \
+ x(journal_seq_blacklist, 8) \
+ x(journal_v2, 9) \
+ x(counters, 10) \
+ x(members_v2, 11) \
+ x(errors, 12) \
+ x(ext, 13) \
+ x(downgrade, 14)
enum bch_sb_field_type {
#define x(f, nr) BCH_SB_FIELD_##f = nr,
LE64_BITMASK(BCH_SB_ERROR_ENTRY_ID, struct bch_sb_field_error_entry, v, 0, 16);
LE64_BITMASK(BCH_SB_ERROR_ENTRY_NR, struct bch_sb_field_error_entry, v, 16, 64);
+struct bch_sb_field_ext {
+ struct bch_sb_field field;
+ __le64 recovery_passes_required[2];
+ __le64 errors_silent[8];
+};
+
+struct bch_sb_field_downgrade_entry {
+ __le16 version;
+ __le64 recovery_passes[2];
+ __le16 nr_errors;
+ __le16 errors[] __counted_by(nr_errors);
+} __packed __aligned(2);
+
+struct bch_sb_field_downgrade {
+ struct bch_sb_field field;
+ struct bch_sb_field_downgrade_entry entries[];
+};
+
/* Superblock: */
/*
#define RECOVERY_PASS_ALL_FSCK (1ULL << 63)
+/*
+ * field 1: version name
+ * field 2: BCH_VERSION(major, minor)
+ * field 3: recovery passess required on upgrade
+ */
#define BCH_METADATA_VERSIONS() \
x(bkey_renumber, BCH_VERSION(0, 10), \
RECOVERY_PASS_ALL_FSCK) \
goto out_no_locked;
/*
- * iter->pos should be mononotically increasing, and always be
- * equal to the key we just returned - except extents can
- * straddle iter->pos:
+ * We need to check against @end before FILTER_SNAPSHOTS because
+ * if we get to a different inode that requested we might be
+ * seeing keys for a different snapshot tree that will all be
+ * filtered out.
+ *
+ * But we can't do the full check here, because bkey_start_pos()
+ * isn't monotonically increasing before FILTER_SNAPSHOTS, and
+ * that's what we check against in extents mode:
*/
- if (!(iter->flags & BTREE_ITER_IS_EXTENTS))
- iter_pos = k.k->p;
- else
- iter_pos = bkey_max(iter->pos, bkey_start_pos(k.k));
-
- if (unlikely(!(iter->flags & BTREE_ITER_IS_EXTENTS)
- ? bkey_gt(iter_pos, end)
- : bkey_ge(iter_pos, end)))
+ if (k.k->p.inode > end.inode)
goto end;
if (iter->update_path &&
continue;
}
+ /*
+ * iter->pos should be mononotically increasing, and always be
+ * equal to the key we just returned - except extents can
+ * straddle iter->pos:
+ */
+ if (!(iter->flags & BTREE_ITER_IS_EXTENTS))
+ iter_pos = k.k->p;
+ else
+ iter_pos = bkey_max(iter->pos, bkey_start_pos(k.k));
+
+ if (unlikely(!(iter->flags & BTREE_ITER_IS_EXTENTS)
+ ? bkey_gt(iter_pos, end)
+ : bkey_ge(iter_pos, end)))
+ goto end;
+
break;
}
mempool_exit(&c->btree_trans_pool);
}
-int bch2_fs_btree_iter_init(struct bch_fs *c)
+void bch2_fs_btree_iter_init_early(struct bch_fs *c)
{
struct btree_transaction_stats *s;
- int ret;
for (s = c->btree_transaction_stats;
s < c->btree_transaction_stats + ARRAY_SIZE(c->btree_transaction_stats);
INIT_LIST_HEAD(&c->btree_trans_list);
seqmutex_init(&c->btree_trans_lock);
+}
+
+int bch2_fs_btree_iter_init(struct bch_fs *c)
+{
+ int ret;
c->btree_trans_bufs = alloc_percpu(struct btree_trans_buf);
if (!c->btree_trans_bufs)
void bch2_btree_trans_to_text(struct printbuf *, struct btree_trans *);
void bch2_fs_btree_iter_exit(struct bch_fs *);
+void bch2_fs_btree_iter_init_early(struct bch_fs *);
int bch2_fs_btree_iter_init(struct bch_fs *);
#endif /* _BCACHEFS_BTREE_ITER_H */
enum btree_id btree_id = iter->btree_id;
struct bkey_i *update;
struct bpos new_start = bkey_start_pos(new.k);
- bool front_split = bkey_lt(bkey_start_pos(old.k), new_start);
- bool back_split = bkey_gt(old.k->p, new.k->p);
+ unsigned front_split = bkey_lt(bkey_start_pos(old.k), new_start);
+ unsigned back_split = bkey_gt(old.k->p, new.k->p);
+ unsigned middle_split = (front_split || back_split) &&
+ old.k->p.snapshot != new.k->p.snapshot;
+ unsigned nr_splits = front_split + back_split + middle_split;
int ret = 0, compressed_sectors;
/*
* so that __bch2_trans_commit() can increase our disk
* reservation:
*/
- if (((front_split && back_split) ||
- ((front_split || back_split) && old.k->p.snapshot != new.k->p.snapshot)) &&
+ if (nr_splits > 1 &&
(compressed_sectors = bch2_bkey_sectors_compressed(old)))
- trans->extra_journal_res += compressed_sectors;
+ trans->extra_journal_res += compressed_sectors * (nr_splits - 1);
if (front_split) {
update = bch2_bkey_make_mut_noupdate(trans, old);
}
/* If we're overwriting in a different snapshot - middle split: */
- if (old.k->p.snapshot != new.k->p.snapshot &&
- (front_split || back_split)) {
+ if (middle_split) {
update = bch2_bkey_make_mut_noupdate(trans, old);
if ((ret = PTR_ERR_OR_ZERO(update)))
return ret;
/* Calculate ideal packed bkey format for new btree nodes: */
-void __bch2_btree_calc_format(struct bkey_format_state *s, struct btree *b)
+static void __bch2_btree_calc_format(struct bkey_format_state *s, struct btree *b)
{
struct bkey_packed *k;
struct bset_tree *t;
return bch2_bkey_format_done(&s);
}
-static size_t btree_node_u64s_with_format(struct btree *b,
+static size_t btree_node_u64s_with_format(struct btree_nr_keys nr,
+ struct bkey_format *old_f,
struct bkey_format *new_f)
{
- struct bkey_format *old_f = &b->format;
-
/* stupid integer promotion rules */
ssize_t delta =
(((int) new_f->key_u64s - old_f->key_u64s) *
- (int) b->nr.packed_keys) +
+ (int) nr.packed_keys) +
(((int) new_f->key_u64s - BKEY_U64s) *
- (int) b->nr.unpacked_keys);
+ (int) nr.unpacked_keys);
- BUG_ON(delta + b->nr.live_u64s < 0);
+ BUG_ON(delta + nr.live_u64s < 0);
- return b->nr.live_u64s + delta;
+ return nr.live_u64s + delta;
}
/**
*
* @c: filesystem handle
* @b: btree node to rewrite
+ * @nr: number of keys for new node (i.e. b->nr)
* @new_f: bkey format to translate keys to
*
* Returns: true if all re-packed keys will be able to fit in a new node.
*
* Assumes all keys will successfully pack with the new format.
*/
-bool bch2_btree_node_format_fits(struct bch_fs *c, struct btree *b,
+static bool bch2_btree_node_format_fits(struct bch_fs *c, struct btree *b,
+ struct btree_nr_keys nr,
struct bkey_format *new_f)
{
- size_t u64s = btree_node_u64s_with_format(b, new_f);
+ size_t u64s = btree_node_u64s_with_format(nr, &b->format, new_f);
return __vstruct_bytes(struct btree_node, u64s) < btree_bytes(c);
}
* The keys might expand with the new format - if they wouldn't fit in
* the btree node anymore, use the old format for now:
*/
- if (!bch2_btree_node_format_fits(as->c, b, &format))
+ if (!bch2_btree_node_format_fits(as->c, b, b->nr, &format))
format = b->format;
SET_BTREE_NODE_SEQ(n->data, BTREE_NODE_SEQ(b->data) + 1);
struct bkey_packed *out[2];
struct bkey uk;
unsigned u64s, n1_u64s = (b->nr.live_u64s * 3) / 5;
+ struct { unsigned nr_keys, val_u64s; } nr_keys[2];
int i;
+ memset(&nr_keys, 0, sizeof(nr_keys));
+
for (i = 0; i < 2; i++) {
BUG_ON(n[i]->nsets != 1);
if (!i)
n1_pos = uk.p;
bch2_bkey_format_add_key(&format[i], &uk);
+
+ nr_keys[i].nr_keys++;
+ nr_keys[i].val_u64s += bkeyp_val_u64s(&b->format, k);
}
btree_set_min(n[0], b->data->min_key);
bch2_bkey_format_add_pos(&format[i], n[i]->data->max_key);
n[i]->data->format = bch2_bkey_format_done(&format[i]);
+
+ unsigned u64s = nr_keys[i].nr_keys * n[i]->data->format.key_u64s +
+ nr_keys[i].val_u64s;
+ if (__vstruct_bytes(struct btree_node, u64s) > btree_bytes(as->c))
+ n[i]->data->format = b->format;
+
btree_node_set_format(n[i], n[i]->data->format);
}
bch2_bkey_format_add_pos(&new_s, next->data->max_key);
new_f = bch2_bkey_format_done(&new_s);
- sib_u64s = btree_node_u64s_with_format(b, &new_f) +
- btree_node_u64s_with_format(m, &new_f);
+ sib_u64s = btree_node_u64s_with_format(b->nr, &b->format, &new_f) +
+ btree_node_u64s_with_format(m->nr, &m->format, &new_f);
if (sib_u64s > BTREE_FOREGROUND_MERGE_HYSTERESIS(c)) {
sib_u64s -= BTREE_FOREGROUND_MERGE_HYSTERESIS(c);
#include "btree_locking.h"
#include "btree_update.h"
-void __bch2_btree_calc_format(struct bkey_format_state *, struct btree *);
-bool bch2_btree_node_format_fits(struct bch_fs *c, struct btree *,
- struct bkey_format *);
-
#define BTREE_UPDATE_NODES_MAX ((BTREE_MAX_DEPTH - 2) * 2 + GC_MERGE_NODES)
#define BTREE_UPDATE_JOURNAL_RES (BTREE_UPDATE_NODES_MAX * (BKEY_BTREE_PTR_U64s_MAX + 1))
--- /dev/null
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/log2.h>
+#include <linux/slab.h>
+#include "darray.h"
+
+int __bch2_darray_resize(darray_char *d, size_t element_size, size_t new_size, gfp_t gfp)
+{
+ if (new_size > d->size) {
+ new_size = roundup_pow_of_two(new_size);
+
+ void *data = kvmalloc_array(new_size, element_size, gfp);
+ if (!data)
+ return -ENOMEM;
+
+ memcpy(data, d->data, d->size * element_size);
+ if (d->data != d->preallocated)
+ kvfree(d->data);
+ d->data = data;
+ d->size = new_size;
+ }
+
+ return 0;
+}
* Inspired by CCAN's darray
*/
-#include "util.h"
#include <linux/slab.h>
-#define DARRAY(type) \
+#define DARRAY_PREALLOCATED(_type, _nr) \
struct { \
size_t nr, size; \
- type *data; \
+ _type *data; \
+ _type preallocated[_nr]; \
}
-typedef DARRAY(void) darray_void;
+#define DARRAY(_type) DARRAY_PREALLOCATED(_type, 0)
-static inline int __darray_make_room(darray_void *d, size_t t_size, size_t more, gfp_t gfp)
+typedef DARRAY(char) darray_char;
+
+int __bch2_darray_resize(darray_char *, size_t, size_t, gfp_t);
+
+static inline int __darray_resize(darray_char *d, size_t element_size,
+ size_t new_size, gfp_t gfp)
{
- if (d->nr + more > d->size) {
- size_t new_size = roundup_pow_of_two(d->nr + more);
- void *data = krealloc_array(d->data, new_size, t_size, gfp);
+ return unlikely(new_size > d->size)
+ ? __bch2_darray_resize(d, element_size, new_size, gfp)
+ : 0;
+}
- if (!data)
- return -ENOMEM;
+#define darray_resize_gfp(_d, _new_size, _gfp) \
+ unlikely(__darray_resize((darray_char *) (_d), sizeof((_d)->data[0]), (_new_size), _gfp))
- d->data = data;
- d->size = new_size;
- }
+#define darray_resize(_d, _new_size) \
+ darray_resize_gfp(_d, _new_size, GFP_KERNEL)
- return 0;
+static inline int __darray_make_room(darray_char *d, size_t t_size, size_t more, gfp_t gfp)
+{
+ return __darray_resize(d, t_size, d->nr + more, gfp);
}
#define darray_make_room_gfp(_d, _more, _gfp) \
- __darray_make_room((darray_void *) (_d), sizeof((_d)->data[0]), (_more), _gfp)
+ __darray_make_room((darray_char *) (_d), sizeof((_d)->data[0]), (_more), _gfp)
#define darray_make_room(_d, _more) \
darray_make_room_gfp(_d, _more, GFP_KERNEL)
+#define darray_room(_d) ((_d).size - (_d).nr)
+
#define darray_top(_d) ((_d).data[(_d).nr])
#define darray_push_gfp(_d, _item, _gfp) \
#define darray_init(_d) \
do { \
- (_d)->data = NULL; \
- (_d)->nr = (_d)->size = 0; \
+ (_d)->nr = 0; \
+ (_d)->size = ARRAY_SIZE((_d)->preallocated); \
+ (_d)->data = (_d)->size ? (_d)->preallocated : NULL; \
} while (0)
#define darray_exit(_d) \
do { \
- kfree((_d)->data); \
+ if (!ARRAY_SIZE((_d)->preallocated) || \
+ (_d)->data != (_d)->preallocated) \
+ kvfree((_d)->data); \
darray_init(_d); \
} while (0)
move_ctxt_wait_event(ctxt,
(locked = bch2_bucket_nocow_trylock(&c->nocow_locks,
PTR_BUCKET_POS(c, &p.ptr), 0)) ||
- !atomic_read(&ctxt->read_sectors));
+ (!atomic_read(&ctxt->read_sectors) &&
+ !atomic_read(&ctxt->write_sectors)));
if (!locked)
bch2_bucket_nocow_lock(&c->nocow_locks,
* Increasing replication is an explicit operation triggered by
* rereplicate, currently, so that users don't get an unexpected -ENOSPC
*/
- if (durability_have >= io_opts.data_replicas) {
+ if (!(m->data_opts.write_flags & BCH_WRITE_CACHED) &&
+ durability_have >= io_opts.data_replicas) {
m->data_opts.kill_ptrs |= m->data_opts.rewrite_ptrs;
m->data_opts.rewrite_ptrs = 0;
/* if iter == NULL, it's just a promote */
x(ENOSPC, ENOSPC_sb_members) \
x(ENOSPC, ENOSPC_sb_members_v2) \
x(ENOSPC, ENOSPC_sb_crypt) \
+ x(ENOSPC, ENOSPC_sb_downgrade) \
x(ENOSPC, ENOSPC_btree_slot) \
x(ENOSPC, ENOSPC_snapshot_tree) \
x(ENOENT, ENOENT_bkey_type_mismatch) \
x(BCH_ERR_invalid_sb, invalid_sb_quota) \
x(BCH_ERR_invalid_sb, invalid_sb_errors) \
x(BCH_ERR_invalid_sb, invalid_sb_opt_compression) \
+ x(BCH_ERR_invalid_sb, invalid_sb_ext) \
+ x(BCH_ERR_invalid_sb, invalid_sb_downgrade) \
x(BCH_ERR_invalid, invalid_bkey) \
x(BCH_ERR_operation_blocked, nocow_lock_blocked) \
x(EIO, btree_node_read_err) \
struct printbuf buf = PRINTBUF, *out = &buf;
int ret = -BCH_ERR_fsck_ignore;
+ if (test_bit(err, c->sb.errors_silent))
+ return -BCH_ERR_fsck_fix;
+
bch2_sb_error_count(c, err);
va_start(args, fmt);
#define fsck_err_on(cond, c, _err_type, ...) \
__fsck_err_on(cond, c, FSCK_CAN_FIX|FSCK_CAN_IGNORE, _err_type, __VA_ARGS__)
+__printf(4, 0)
static inline void bch2_bkey_fsck_err(struct bch_fs *c,
struct printbuf *err_msg,
enum bch_sb_error_id err_type,
va_start(args, fmt);
prt_vprintf(err_msg, fmt, args);
va_end(args);
-
}
#define bkey_fsck_err(c, _err_msg, _err_type, ...) \
struct address_space *mapping;
struct bch_inode_info *inode;
struct mm_struct *mm;
+ const struct iovec *iov;
unsigned loop:1,
extending:1,
sync:1,
- flush:1,
- free_iov:1;
+ flush:1;
struct quota_res quota_res;
u64 written;
return -1;
if (dio->iter.nr_segs > ARRAY_SIZE(dio->inline_vecs)) {
- iov = kmalloc_array(dio->iter.nr_segs, sizeof(*iov),
+ dio->iov = iov = kmalloc_array(dio->iter.nr_segs, sizeof(*iov),
GFP_KERNEL);
if (unlikely(!iov))
return -ENOMEM;
-
- dio->free_iov = true;
}
memcpy(iov, dio->iter.__iov, dio->iter.nr_segs * sizeof(*iov));
bch2_pagecache_block_put(inode);
- if (dio->free_iov)
- kfree(dio->iter.__iov);
+ kfree(dio->iov);
ret = dio->op.error ?: ((long) dio->written << 9);
bio_put(&dio->op.wbio.bio);
dio->mapping = mapping;
dio->inode = inode;
dio->mm = current->mm;
+ dio->iov = NULL;
dio->loop = false;
dio->extending = extending;
dio->sync = is_sync_kiocb(req) || extending;
dio->flush = iocb_is_dsync(req) && !c->opts.journal_flush_disabled;
- dio->free_iov = false;
dio->quota_res.sectors = 0;
dio->written = 0;
dio->iter = *iter;
}
mutex_lock(&inode->ei_update_lock);
- ret = bch2_write_inode(c, inode, bch2_inode_flags_set, &s,
+ ret = bch2_subvol_is_ro(c, inode->ei_subvol) ?:
+ bch2_write_inode(c, inode, bch2_inode_flags_set, &s,
ATTR_CTIME);
mutex_unlock(&inode->ei_update_lock);
}
mutex_lock(&inode->ei_update_lock);
- ret = bch2_set_projid(c, inode, fa.fsx_projid);
- if (ret)
- goto err_unlock;
-
- ret = bch2_write_inode(c, inode, fssetxattr_inode_update_fn, &s,
+ ret = bch2_subvol_is_ro(c, inode->ei_subvol) ?:
+ bch2_set_projid(c, inode, fa.fsx_projid) ?:
+ bch2_write_inode(c, inode, fssetxattr_inode_update_fn, &s,
ATTR_CTIME);
-err_unlock:
mutex_unlock(&inode->ei_update_lock);
err:
inode_unlock(&inode->v);
switch (flags) {
case FSOP_GOING_FLAGS_DEFAULT:
- ret = freeze_bdev(c->vfs_sb->s_bdev);
+ ret = bdev_freeze(c->vfs_sb->s_bdev);
if (ret)
goto err;
bch2_journal_flush(&c->journal);
c->vfs_sb->s_flags |= SB_RDONLY;
bch2_fs_emergency_read_only(c);
- thaw_bdev(c->vfs_sb->s_bdev);
+ bdev_thaw(c->vfs_sb->s_bdev);
break;
case FSOP_GOING_FLAGS_LOGFLUSH:
retry:
bch2_trans_begin(trans);
- ret = bch2_create_trans(trans,
+ ret = bch2_subvol_is_ro_trans(trans, dir->ei_subvol) ?:
+ bch2_create_trans(trans,
inode_inum(dir), &dir_u, &inode_u,
!(flags & BCH_CREATE_TMPFILE)
? &dentry->d_name : NULL,
lockdep_assert_held(&inode->v.i_rwsem);
- ret = __bch2_link(c, inode, dir, dentry);
+ ret = bch2_subvol_is_ro(c, dir->ei_subvol) ?:
+ bch2_subvol_is_ro(c, inode->ei_subvol) ?:
+ __bch2_link(c, inode, dir, dentry);
if (unlikely(ret))
return ret;
static int bch2_unlink(struct inode *vdir, struct dentry *dentry)
{
- return __bch2_unlink(vdir, dentry, false);
+ struct bch_inode_info *dir= to_bch_ei(vdir);
+ struct bch_fs *c = dir->v.i_sb->s_fs_info;
+
+ return bch2_subvol_is_ro(c, dir->ei_subvol) ?:
+ __bch2_unlink(vdir, dentry, false);
}
static int bch2_symlink(struct mnt_idmap *idmap,
src_inode,
dst_inode);
+ ret = bch2_subvol_is_ro_trans(trans, src_dir->ei_subvol) ?:
+ bch2_subvol_is_ro_trans(trans, dst_dir->ei_subvol);
+ if (ret)
+ goto err;
+
if (inode_attr_changing(dst_dir, src_inode, Inode_opt_project)) {
ret = bch2_fs_quota_transfer(c, src_inode,
dst_dir->ei_qid,
struct dentry *dentry, struct iattr *iattr)
{
struct bch_inode_info *inode = to_bch_ei(dentry->d_inode);
+ struct bch_fs *c = inode->v.i_sb->s_fs_info;
int ret;
lockdep_assert_held(&inode->v.i_rwsem);
- ret = setattr_prepare(idmap, dentry, iattr);
+ ret = bch2_subvol_is_ro(c, inode->ei_subvol) ?:
+ setattr_prepare(idmap, dentry, iattr);
if (ret)
return ret;
return bch2_err_class(ret);
}
+static int bch2_open(struct inode *vinode, struct file *file)
+{
+ if (file->f_flags & (O_WRONLY|O_RDWR)) {
+ struct bch_inode_info *inode = to_bch_ei(vinode);
+ struct bch_fs *c = inode->v.i_sb->s_fs_info;
+
+ int ret = bch2_subvol_is_ro(c, inode->ei_subvol);
+ if (ret)
+ return ret;
+ }
+
+ return generic_file_open(vinode, file);
+}
+
static const struct file_operations bch_file_operations = {
+ .open = bch2_open,
.llseek = bch2_llseek,
.read_iter = bch2_read_iter,
.write_iter = bch2_write_iter,
.mmap = bch2_mmap,
- .open = generic_file_open,
.fsync = bch2_fsync,
.splice_read = filemap_splice_read,
.splice_write = iter_file_splice_write,
{
struct bch_inode_info *inode = to_bch_ei(vinode);
struct bch_inode_info *dir = to_bch_ei(vdir);
-
- if (*len < sizeof(struct bcachefs_fid_with_parent) / sizeof(u32))
- return FILEID_INVALID;
+ int min_len;
if (!S_ISDIR(inode->v.i_mode) && dir) {
struct bcachefs_fid_with_parent *fid = (void *) fh;
+ min_len = sizeof(*fid) / sizeof(u32);
+ if (*len < min_len) {
+ *len = min_len;
+ return FILEID_INVALID;
+ }
+
fid->fid = bch2_inode_to_fid(inode);
fid->dir = bch2_inode_to_fid(dir);
- *len = sizeof(*fid) / sizeof(u32);
+ *len = min_len;
return FILEID_BCACHEFS_WITH_PARENT;
} else {
struct bcachefs_fid *fid = (void *) fh;
+ min_len = sizeof(*fid) / sizeof(u32);
+ if (*len < min_len) {
+ *len = min_len;
+ return FILEID_INVALID;
+ }
*fid = bch2_inode_to_fid(inode);
- *len = sizeof(*fid) / sizeof(u32);
+ *len = min_len;
return FILEID_BCACHEFS_WITHOUT_PARENT;
}
}
bch2_write_done(cl);
}
+struct bucket_to_lock {
+ struct bpos b;
+ unsigned gen;
+ struct nocow_lock_bucket *l;
+};
+
static void bch2_nocow_write(struct bch_write_op *op)
{
struct bch_fs *c = op->c;
struct bkey_s_c k;
struct bkey_ptrs_c ptrs;
const struct bch_extent_ptr *ptr;
- struct {
- struct bpos b;
- unsigned gen;
- struct nocow_lock_bucket *l;
- } buckets[BCH_REPLICAS_MAX];
- unsigned nr_buckets = 0;
+ DARRAY_PREALLOCATED(struct bucket_to_lock, 3) buckets;
+ struct bucket_to_lock *i;
u32 snapshot;
- int ret, i;
+ struct bucket_to_lock *stale_at;
+ int ret;
if (op->flags & BCH_WRITE_MOVE)
return;
+ darray_init(&buckets);
trans = bch2_trans_get(c);
retry:
bch2_trans_begin(trans);
while (1) {
struct bio *bio = &op->wbio.bio;
- nr_buckets = 0;
+ buckets.nr = 0;
k = bch2_btree_iter_peek_slot(&iter);
ret = bkey_err(k);
break;
if (bch2_keylist_realloc(&op->insert_keys,
- op->inline_keys,
- ARRAY_SIZE(op->inline_keys),
- k.k->u64s))
+ op->inline_keys,
+ ARRAY_SIZE(op->inline_keys),
+ k.k->u64s))
break;
/* Get iorefs before dropping btree locks: */
ptrs = bch2_bkey_ptrs_c(k);
bkey_for_each_ptr(ptrs, ptr) {
- buckets[nr_buckets].b = PTR_BUCKET_POS(c, ptr);
- buckets[nr_buckets].gen = ptr->gen;
- buckets[nr_buckets].l =
- bucket_nocow_lock(&c->nocow_locks,
- bucket_to_u64(buckets[nr_buckets].b));
-
- prefetch(buckets[nr_buckets].l);
+ struct bpos b = PTR_BUCKET_POS(c, ptr);
+ struct nocow_lock_bucket *l =
+ bucket_nocow_lock(&c->nocow_locks, bucket_to_u64(b));
+ prefetch(l);
if (unlikely(!bch2_dev_get_ioref(bch_dev_bkey_exists(c, ptr->dev), WRITE)))
goto err_get_ioref;
- nr_buckets++;
+ /* XXX allocating memory with btree locks held - rare */
+ darray_push_gfp(&buckets, ((struct bucket_to_lock) {
+ .b = b, .gen = ptr->gen, .l = l,
+ }), GFP_KERNEL|__GFP_NOFAIL);
if (ptr->unwritten)
op->flags |= BCH_WRITE_CONVERT_UNWRITTEN;
if (op->flags & BCH_WRITE_CONVERT_UNWRITTEN)
bch2_cut_back(POS(op->pos.inode, op->pos.offset + bio_sectors(bio)), op->insert_keys.top);
- for (i = 0; i < nr_buckets; i++) {
- struct bch_dev *ca = bch_dev_bkey_exists(c, buckets[i].b.inode);
- struct nocow_lock_bucket *l = buckets[i].l;
- bool stale;
+ darray_for_each(buckets, i) {
+ struct bch_dev *ca = bch_dev_bkey_exists(c, i->b.inode);
- __bch2_bucket_nocow_lock(&c->nocow_locks, l,
- bucket_to_u64(buckets[i].b),
+ __bch2_bucket_nocow_lock(&c->nocow_locks, i->l,
+ bucket_to_u64(i->b),
BUCKET_NOCOW_LOCK_UPDATE);
rcu_read_lock();
- stale = gen_after(*bucket_gen(ca, buckets[i].b.offset), buckets[i].gen);
+ bool stale = gen_after(*bucket_gen(ca, i->b.offset), i->gen);
rcu_read_unlock();
- if (unlikely(stale))
+ if (unlikely(stale)) {
+ stale_at = i;
goto err_bucket_stale;
+ }
}
bio = &op->wbio.bio;
if (ret) {
bch_err_inum_offset_ratelimited(c,
- op->pos.inode,
- op->pos.offset << 9,
- "%s: btree lookup error %s",
- __func__, bch2_err_str(ret));
+ op->pos.inode, op->pos.offset << 9,
+ "%s: btree lookup error %s", __func__, bch2_err_str(ret));
op->error = ret;
op->flags |= BCH_WRITE_DONE;
}
bch2_trans_put(trans);
+ darray_exit(&buckets);
/* fallback to cow write path? */
if (!(op->flags & BCH_WRITE_DONE)) {
}
return;
err_get_ioref:
- for (i = 0; i < nr_buckets; i++)
- percpu_ref_put(&bch_dev_bkey_exists(c, buckets[i].b.inode)->io_ref);
+ darray_for_each(buckets, i)
+ percpu_ref_put(&bch_dev_bkey_exists(c, i->b.inode)->io_ref);
/* Fall back to COW path: */
goto out;
err_bucket_stale:
- while (i >= 0) {
- bch2_bucket_nocow_unlock(&c->nocow_locks,
- buckets[i].b,
- BUCKET_NOCOW_LOCK_UPDATE);
- --i;
+ darray_for_each(buckets, i) {
+ bch2_bucket_nocow_unlock(&c->nocow_locks, i->b, BUCKET_NOCOW_LOCK_UPDATE);
+ if (i == stale_at)
+ break;
}
- for (i = 0; i < nr_buckets; i++)
- percpu_ref_put(&bch_dev_bkey_exists(c, buckets[i].b.inode)->io_ref);
/* We can retry this: */
ret = -BCH_ERR_transaction_restart;
- goto out;
+ goto err_get_ioref;
}
static void __bch2_write(struct bch_write_op *op)
return 0;
}
- return journal_validate_key(c, jset, entry, 1, entry->btree_id, k,
- version, big_endian, flags);
+ ret = journal_validate_key(c, jset, entry, 1, entry->btree_id, k,
+ version, big_endian, flags);
+ if (ret == FSCK_DELETED_KEY)
+ ret = 0;
fsck_err:
return ret;
}
// SPDX-License-Identifier: LGPL-2.1+
/* Copyright (C) 2022 Kent Overstreet */
+#include <linux/bitmap.h>
#include <linux/err.h>
#include <linux/export.h>
#include <linux/kernel.h>
flags ^= BIT_ULL(bit);
}
}
+
+void bch2_prt_bitflags_vector(struct printbuf *out,
+ const char * const list[],
+ unsigned long *v, unsigned nr)
+{
+ bool first = true;
+ unsigned i;
+
+ for (i = 0; i < nr; i++)
+ if (!list[i]) {
+ nr = i - 1;
+ break;
+ }
+
+ for_each_set_bit(i, v, nr) {
+ if (!first)
+ bch2_prt_printf(out, ",");
+ first = false;
+ bch2_prt_printf(out, "%s", list[i]);
+ }
+}
void bch2_prt_units_s64(struct printbuf *, s64);
void bch2_prt_string_option(struct printbuf *, const char * const[], size_t);
void bch2_prt_bitflags(struct printbuf *, const char * const[], u64);
+void bch2_prt_bitflags_vector(struct printbuf *, const char * const[],
+ unsigned long *, unsigned);
/* Initializer for a heap allocated printbuf: */
#define PRINTBUF ((struct printbuf) { .heap_allocated = true })
#include "recovery.h"
#include "replicas.h"
#include "sb-clean.h"
+#include "sb-downgrade.h"
#include "snapshot.h"
#include "subvolume.h"
#include "super-io.h"
}
const char * const bch2_recovery_passes[] = {
-#define x(_fn, _when) #_fn,
+#define x(_fn, ...) #_fn,
BCH_RECOVERY_PASSES()
#undef x
NULL
};
static struct recovery_pass_fn recovery_pass_fns[] = {
-#define x(_fn, _when) { .fn = bch2_##_fn, .when = _when },
+#define x(_fn, _id, _when) { .fn = bch2_##_fn, .when = _when },
BCH_RECOVERY_PASSES()
#undef x
};
-static void check_version_upgrade(struct bch_fs *c)
+u64 bch2_recovery_passes_to_stable(u64 v)
+{
+ static const u8 map[] = {
+#define x(n, id, ...) [BCH_RECOVERY_PASS_##n] = BCH_RECOVERY_PASS_STABLE_##n,
+ BCH_RECOVERY_PASSES()
+#undef x
+ };
+
+ u64 ret = 0;
+ for (unsigned i = 0; i < ARRAY_SIZE(map); i++)
+ if (v & BIT_ULL(i))
+ ret |= BIT_ULL(map[i]);
+ return ret;
+}
+
+u64 bch2_recovery_passes_from_stable(u64 v)
+{
+ static const u8 map[] = {
+#define x(n, id, ...) [BCH_RECOVERY_PASS_STABLE_##n] = BCH_RECOVERY_PASS_##n,
+ BCH_RECOVERY_PASSES()
+#undef x
+ };
+
+ u64 ret = 0;
+ for (unsigned i = 0; i < ARRAY_SIZE(map); i++)
+ if (v & BIT_ULL(i))
+ ret |= BIT_ULL(map[i]);
+ return ret;
+}
+
+static bool check_version_upgrade(struct bch_fs *c)
{
unsigned latest_compatible = bch2_latest_compatible_version(c->sb.version);
unsigned latest_version = bcachefs_metadata_version_current;
unsigned old_version = c->sb.version_upgrade_complete ?: c->sb.version;
unsigned new_version = 0;
- u64 recovery_passes;
if (old_version < bcachefs_metadata_required_upgrade_below) {
if (c->opts.version_upgrade == BCH_VERSION_UPGRADE_incompatible ||
bch2_version_to_text(&buf, new_version);
prt_newline(&buf);
- recovery_passes = bch2_upgrade_recovery_passes(c, old_version, new_version);
+ u64 recovery_passes = bch2_upgrade_recovery_passes(c, old_version, new_version);
if (recovery_passes) {
if ((recovery_passes & RECOVERY_PASS_ALL_FSCK) == RECOVERY_PASS_ALL_FSCK)
prt_str(&buf, "fsck required");
bch_info(c, "%s", buf.buf);
- mutex_lock(&c->sb_lock);
bch2_sb_upgrade(c, new_version);
- mutex_unlock(&c->sb_lock);
printbuf_exit(&buf);
+ return true;
}
+
+ return false;
}
u64 bch2_fsck_recovery_passes(void)
struct bch_sb_field_clean *clean = NULL;
struct jset *last_journal_entry = NULL;
u64 last_seq = 0, blacklist_seq, journal_seq;
- bool write_sb = false;
int ret = 0;
if (c->sb.clean) {
goto err;
}
- if (c->opts.fsck || !(c->opts.nochanges && c->opts.norecovery))
- check_version_upgrade(c);
-
if (c->opts.fsck && c->opts.norecovery) {
bch_err(c, "cannot select both norecovery and fsck");
ret = -EINVAL;
goto err;
}
+ if (!(c->opts.nochanges && c->opts.norecovery)) {
+ mutex_lock(&c->sb_lock);
+ bool write_sb = false;
+
+ struct bch_sb_field_ext *ext =
+ bch2_sb_field_get_minsize(&c->disk_sb, ext, sizeof(*ext) / sizeof(u64));
+ if (!ext) {
+ ret = -BCH_ERR_ENOSPC_sb;
+ mutex_unlock(&c->sb_lock);
+ goto err;
+ }
+
+ if (BCH_SB_HAS_TOPOLOGY_ERRORS(c->disk_sb.sb)) {
+ ext->recovery_passes_required[0] |=
+ cpu_to_le64(bch2_recovery_passes_to_stable(BIT_ULL(BCH_RECOVERY_PASS_check_topology)));
+ write_sb = true;
+ }
+
+ u64 sb_passes = bch2_recovery_passes_from_stable(le64_to_cpu(ext->recovery_passes_required[0]));
+ if (sb_passes) {
+ struct printbuf buf = PRINTBUF;
+ prt_str(&buf, "superblock requires following recovery passes to be run:\n ");
+ prt_bitflags(&buf, bch2_recovery_passes, sb_passes);
+ bch_info(c, "%s", buf.buf);
+ printbuf_exit(&buf);
+ }
+
+ if (bch2_check_version_downgrade(c)) {
+ struct printbuf buf = PRINTBUF;
+
+ prt_str(&buf, "Version downgrade required:\n");
+
+ __le64 passes = ext->recovery_passes_required[0];
+ bch2_sb_set_downgrade(c,
+ BCH_VERSION_MINOR(bcachefs_metadata_version_current),
+ BCH_VERSION_MINOR(c->sb.version));
+ passes = ext->recovery_passes_required[0] & ~passes;
+ if (passes) {
+ prt_str(&buf, " running recovery passes: ");
+ prt_bitflags(&buf, bch2_recovery_passes,
+ bch2_recovery_passes_from_stable(le64_to_cpu(passes)));
+ }
+
+ bch_info(c, "%s", buf.buf);
+ printbuf_exit(&buf);
+ write_sb = true;
+ }
+
+ if (check_version_upgrade(c))
+ write_sb = true;
+
+ if (write_sb)
+ bch2_write_super(c);
+
+ c->recovery_passes_explicit |= bch2_recovery_passes_from_stable(le64_to_cpu(ext->recovery_passes_required[0]));
+ mutex_unlock(&c->sb_lock);
+ }
+
+ if (c->opts.fsck && IS_ENABLED(CONFIG_BCACHEFS_DEBUG))
+ c->recovery_passes_explicit |= BIT_ULL(BCH_RECOVERY_PASS_check_topology);
+
ret = bch2_blacklist_table_initialize(c);
if (ret) {
bch_err(c, "error initializing blacklist table");
if (ret)
goto err;
- if (c->opts.fsck &&
- (IS_ENABLED(CONFIG_BCACHEFS_DEBUG) ||
- BCH_SB_HAS_TOPOLOGY_ERRORS(c->disk_sb.sb)))
- c->recovery_passes_explicit |= BIT_ULL(BCH_RECOVERY_PASS_check_topology);
-
ret = bch2_run_recovery_passes(c);
if (ret)
goto err;
}
mutex_lock(&c->sb_lock);
- if (BCH_SB_VERSION_UPGRADE_COMPLETE(c->disk_sb.sb) != c->sb.version) {
- SET_BCH_SB_VERSION_UPGRADE_COMPLETE(c->disk_sb.sb, c->sb.version);
+ bool write_sb = false;
+
+ if (BCH_SB_VERSION_UPGRADE_COMPLETE(c->disk_sb.sb) != le16_to_cpu(c->disk_sb.sb->version)) {
+ SET_BCH_SB_VERSION_UPGRADE_COMPLETE(c->disk_sb.sb, le16_to_cpu(c->disk_sb.sb->version));
write_sb = true;
}
- if (!test_bit(BCH_FS_ERROR, &c->flags)) {
+ if (!test_bit(BCH_FS_ERROR, &c->flags) &&
+ !(c->disk_sb.sb->compat[0] & cpu_to_le64(1ULL << BCH_COMPAT_alloc_info))) {
c->disk_sb.sb->compat[0] |= cpu_to_le64(1ULL << BCH_COMPAT_alloc_info);
write_sb = true;
}
+ if (!test_bit(BCH_FS_ERROR, &c->flags)) {
+ struct bch_sb_field_ext *ext = bch2_sb_field_get(c->disk_sb.sb, ext);
+ if (ext &&
+ (!bch2_is_zero(ext->recovery_passes_required, sizeof(ext->recovery_passes_required)) ||
+ !bch2_is_zero(ext->errors_silent, sizeof(ext->errors_silent)))) {
+ memset(ext->recovery_passes_required, 0, sizeof(ext->recovery_passes_required));
+ memset(ext->errors_silent, 0, sizeof(ext->errors_silent));
+ write_sb = true;
+ }
+ }
+
if (c->opts.fsck &&
!test_bit(BCH_FS_ERROR, &c->flags) &&
!test_bit(BCH_FS_ERRORS_NOT_FIXED, &c->flags)) {
c->disk_sb.sb->compat[0] |= cpu_to_le64(1ULL << BCH_COMPAT_extents_above_btree_updates_done);
c->disk_sb.sb->compat[0] |= cpu_to_le64(1ULL << BCH_COMPAT_bformat_overflow_done);
- bch2_sb_maybe_downgrade(c);
+ bch2_check_version_downgrade(c);
if (c->opts.version_upgrade != BCH_VERSION_UPGRADE_none) {
bch2_sb_upgrade(c, bcachefs_metadata_version_current);
extern const char * const bch2_recovery_passes[];
+u64 bch2_recovery_passes_to_stable(u64 v);
+u64 bch2_recovery_passes_from_stable(u64 v);
+
/*
* For when we need to rewind recovery passes and run a pass we skipped:
*/
static inline int bch2_run_explicit_recovery_pass(struct bch_fs *c,
enum bch_recovery_pass pass)
{
+ if (c->recovery_passes_explicit & BIT_ULL(pass))
+ return 0;
+
bch_info(c, "running explicit recovery pass %s (%u), currently at %s (%u)",
bch2_recovery_passes[pass], pass,
bch2_recovery_passes[c->curr_recovery_pass], c->curr_recovery_pass);
#define PASS_UNCLEAN BIT(2)
#define PASS_ALWAYS BIT(3)
-#define BCH_RECOVERY_PASSES() \
- x(alloc_read, PASS_ALWAYS) \
- x(stripes_read, PASS_ALWAYS) \
- x(initialize_subvolumes, 0) \
- x(snapshots_read, PASS_ALWAYS) \
- x(check_topology, 0) \
- x(check_allocations, PASS_FSCK) \
- x(trans_mark_dev_sbs, PASS_ALWAYS|PASS_SILENT) \
- x(fs_journal_alloc, PASS_ALWAYS|PASS_SILENT) \
- x(set_may_go_rw, PASS_ALWAYS|PASS_SILENT) \
- x(journal_replay, PASS_ALWAYS) \
- x(check_alloc_info, PASS_FSCK) \
- x(check_lrus, PASS_FSCK) \
- x(check_btree_backpointers, PASS_FSCK) \
- x(check_backpointers_to_extents,PASS_FSCK) \
- x(check_extents_to_backpointers,PASS_FSCK) \
- x(check_alloc_to_lru_refs, PASS_FSCK) \
- x(fs_freespace_init, PASS_ALWAYS|PASS_SILENT) \
- x(bucket_gens_init, 0) \
- x(check_snapshot_trees, PASS_FSCK) \
- x(check_snapshots, PASS_FSCK) \
- x(check_subvols, PASS_FSCK) \
- x(delete_dead_snapshots, PASS_FSCK) \
- x(fs_upgrade_for_subvolumes, 0) \
- x(resume_logged_ops, PASS_ALWAYS) \
- x(check_inodes, PASS_FSCK) \
- x(check_extents, PASS_FSCK) \
- x(check_indirect_extents, PASS_FSCK) \
- x(check_dirents, PASS_FSCK) \
- x(check_xattrs, PASS_FSCK) \
- x(check_root, PASS_FSCK) \
- x(check_directory_structure, PASS_FSCK) \
- x(check_nlinks, PASS_FSCK) \
- x(delete_dead_inodes, PASS_FSCK|PASS_UNCLEAN) \
- x(fix_reflink_p, 0) \
- x(set_fs_needs_rebalance, 0) \
+/*
+ * Passes may be reordered, but the second field is a persistent identifier and
+ * must never change:
+ */
+#define BCH_RECOVERY_PASSES() \
+ x(alloc_read, 0, PASS_ALWAYS) \
+ x(stripes_read, 1, PASS_ALWAYS) \
+ x(initialize_subvolumes, 2, 0) \
+ x(snapshots_read, 3, PASS_ALWAYS) \
+ x(check_topology, 4, 0) \
+ x(check_allocations, 5, PASS_FSCK) \
+ x(trans_mark_dev_sbs, 6, PASS_ALWAYS|PASS_SILENT) \
+ x(fs_journal_alloc, 7, PASS_ALWAYS|PASS_SILENT) \
+ x(set_may_go_rw, 8, PASS_ALWAYS|PASS_SILENT) \
+ x(journal_replay, 9, PASS_ALWAYS) \
+ x(check_alloc_info, 10, PASS_FSCK) \
+ x(check_lrus, 11, PASS_FSCK) \
+ x(check_btree_backpointers, 12, PASS_FSCK) \
+ x(check_backpointers_to_extents, 13, PASS_FSCK) \
+ x(check_extents_to_backpointers, 14, PASS_FSCK) \
+ x(check_alloc_to_lru_refs, 15, PASS_FSCK) \
+ x(fs_freespace_init, 16, PASS_ALWAYS|PASS_SILENT) \
+ x(bucket_gens_init, 17, 0) \
+ x(check_snapshot_trees, 18, PASS_FSCK) \
+ x(check_snapshots, 19, PASS_FSCK) \
+ x(check_subvols, 20, PASS_FSCK) \
+ x(delete_dead_snapshots, 21, PASS_FSCK) \
+ x(fs_upgrade_for_subvolumes, 22, 0) \
+ x(resume_logged_ops, 23, PASS_ALWAYS) \
+ x(check_inodes, 24, PASS_FSCK) \
+ x(check_extents, 25, PASS_FSCK) \
+ x(check_indirect_extents, 26, PASS_FSCK) \
+ x(check_dirents, 27, PASS_FSCK) \
+ x(check_xattrs, 28, PASS_FSCK) \
+ x(check_root, 29, PASS_FSCK) \
+ x(check_directory_structure, 30, PASS_FSCK) \
+ x(check_nlinks, 31, PASS_FSCK) \
+ x(delete_dead_inodes, 32, PASS_FSCK|PASS_UNCLEAN) \
+ x(fix_reflink_p, 33, 0) \
+ x(set_fs_needs_rebalance, 34, 0) \
+/* We normally enumerate recovery passes in the order we run them: */
enum bch_recovery_pass {
-#define x(n, when) BCH_RECOVERY_PASS_##n,
+#define x(n, id, when) BCH_RECOVERY_PASS_##n,
+ BCH_RECOVERY_PASSES()
+#undef x
+};
+
+/* But we also need stable identifiers that can be used in the superblock */
+enum bch_recovery_pass_stable {
+#define x(n, id, when) BCH_RECOVERY_PASS_STABLE_##n = id,
BCH_RECOVERY_PASSES()
#undef x
};
mutex_lock(&c->sb_lock);
SET_BCH_SB_CLEAN(c->disk_sb.sb, false);
-
- bch2_sb_maybe_downgrade(c);
c->disk_sb.sb->features[0] |= cpu_to_le64(BCH_SB_FEATURES_ALWAYS);
ret = bch2_write_super(c);
--- /dev/null
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Superblock section that contains a list of recovery passes to run when
+ * downgrading past a given version
+ */
+
+#include "bcachefs.h"
+#include "darray.h"
+#include "recovery.h"
+#include "sb-downgrade.h"
+#include "sb-errors.h"
+#include "super-io.h"
+
+/*
+ * Downgrade table:
+ * When dowgrading past certain versions, we need to run certain recovery passes
+ * and fix certain errors:
+ *
+ * x(version, recovery_passes, errors...)
+ */
+
+#define DOWNGRADE_TABLE()
+
+struct downgrade_entry {
+ u64 recovery_passes;
+ u16 version;
+ u16 nr_errors;
+ const u16 *errors;
+};
+
+#define x(ver, passes, ...) static const u16 ver_##errors[] = { __VA_ARGS__ };
+DOWNGRADE_TABLE()
+#undef x
+
+static const struct downgrade_entry downgrade_table[] = {
+#define x(ver, passes, ...) { \
+ .recovery_passes = passes, \
+ .version = bcachefs_metadata_version_##ver,\
+ .nr_errors = ARRAY_SIZE(ver_##errors), \
+ .errors = ver_##errors, \
+},
+DOWNGRADE_TABLE()
+#undef x
+};
+
+static inline const struct bch_sb_field_downgrade_entry *
+downgrade_entry_next_c(const struct bch_sb_field_downgrade_entry *e)
+{
+ return (void *) &e->errors[le16_to_cpu(e->nr_errors)];
+}
+
+#define for_each_downgrade_entry(_d, _i) \
+ for (const struct bch_sb_field_downgrade_entry *_i = (_d)->entries; \
+ (void *) _i < vstruct_end(&(_d)->field) && \
+ (void *) &_i->errors[0] < vstruct_end(&(_d)->field); \
+ _i = downgrade_entry_next_c(_i))
+
+static int bch2_sb_downgrade_validate(struct bch_sb *sb, struct bch_sb_field *f,
+ struct printbuf *err)
+{
+ struct bch_sb_field_downgrade *e = field_to_type(f, downgrade);
+
+ for_each_downgrade_entry(e, i) {
+ if (BCH_VERSION_MAJOR(le16_to_cpu(i->version)) !=
+ BCH_VERSION_MAJOR(le16_to_cpu(sb->version))) {
+ prt_printf(err, "downgrade entry with mismatched major version (%u != %u)",
+ BCH_VERSION_MAJOR(le16_to_cpu(i->version)),
+ BCH_VERSION_MAJOR(le16_to_cpu(sb->version)));
+ return -BCH_ERR_invalid_sb_downgrade;
+ }
+ }
+
+ return 0;
+}
+
+static void bch2_sb_downgrade_to_text(struct printbuf *out, struct bch_sb *sb,
+ struct bch_sb_field *f)
+{
+ struct bch_sb_field_downgrade *e = field_to_type(f, downgrade);
+
+ if (out->nr_tabstops <= 1)
+ printbuf_tabstop_push(out, 16);
+
+ for_each_downgrade_entry(e, i) {
+ prt_str(out, "version:");
+ prt_tab(out);
+ bch2_version_to_text(out, le16_to_cpu(i->version));
+ prt_newline(out);
+
+ prt_str(out, "recovery passes:");
+ prt_tab(out);
+ prt_bitflags(out, bch2_recovery_passes,
+ bch2_recovery_passes_from_stable(le64_to_cpu(i->recovery_passes[0])));
+ prt_newline(out);
+
+ prt_str(out, "errors:");
+ prt_tab(out);
+ bool first = true;
+ for (unsigned j = 0; j < le16_to_cpu(i->nr_errors); j++) {
+ if (!first)
+ prt_char(out, ',');
+ first = false;
+ unsigned e = le16_to_cpu(i->errors[j]);
+ prt_str(out, e < BCH_SB_ERR_MAX ? bch2_sb_error_strs[e] : "(unknown)");
+ }
+ prt_newline(out);
+ }
+}
+
+const struct bch_sb_field_ops bch_sb_field_ops_downgrade = {
+ .validate = bch2_sb_downgrade_validate,
+ .to_text = bch2_sb_downgrade_to_text,
+};
+
+int bch2_sb_downgrade_update(struct bch_fs *c)
+{
+ darray_char table = {};
+ int ret = 0;
+
+ for (const struct downgrade_entry *src = downgrade_table;
+ src < downgrade_table + ARRAY_SIZE(downgrade_table);
+ src++) {
+ if (BCH_VERSION_MAJOR(src->version) != BCH_VERSION_MAJOR(le16_to_cpu(c->disk_sb.sb->version)))
+ continue;
+
+ struct bch_sb_field_downgrade_entry *dst;
+ unsigned bytes = sizeof(*dst) + sizeof(dst->errors[0]) * src->nr_errors;
+
+ ret = darray_make_room(&table, bytes);
+ if (ret)
+ goto out;
+
+ dst = (void *) &darray_top(table);
+ dst->version = cpu_to_le16(src->version);
+ dst->recovery_passes[0] = cpu_to_le64(src->recovery_passes);
+ dst->recovery_passes[1] = 0;
+ dst->nr_errors = cpu_to_le16(src->nr_errors);
+ for (unsigned i = 0; i < src->nr_errors; i++)
+ dst->errors[i] = cpu_to_le16(src->errors[i]);
+
+ table.nr += bytes;
+ }
+
+ struct bch_sb_field_downgrade *d = bch2_sb_field_get(c->disk_sb.sb, downgrade);
+
+ unsigned sb_u64s = DIV_ROUND_UP(sizeof(*d) + table.nr, sizeof(u64));
+
+ if (d && le32_to_cpu(d->field.u64s) > sb_u64s)
+ goto out;
+
+ d = bch2_sb_field_resize(&c->disk_sb, downgrade, sb_u64s);
+ if (!d) {
+ ret = -BCH_ERR_ENOSPC_sb_downgrade;
+ goto out;
+ }
+
+ memcpy(d->entries, table.data, table.nr);
+ memset_u64s_tail(d->entries, 0, table.nr);
+out:
+ darray_exit(&table);
+ return ret;
+}
+
+void bch2_sb_set_downgrade(struct bch_fs *c, unsigned new_minor, unsigned old_minor)
+{
+ struct bch_sb_field_downgrade *d = bch2_sb_field_get(c->disk_sb.sb, downgrade);
+ if (!d)
+ return;
+
+ struct bch_sb_field_ext *ext = bch2_sb_field_get(c->disk_sb.sb, ext);
+
+ for_each_downgrade_entry(d, i) {
+ unsigned minor = BCH_VERSION_MINOR(le16_to_cpu(i->version));
+ if (new_minor < minor && minor <= old_minor) {
+ ext->recovery_passes_required[0] |= i->recovery_passes[0];
+ ext->recovery_passes_required[1] |= i->recovery_passes[1];
+
+ for (unsigned j = 0; j < le16_to_cpu(i->nr_errors); j++) {
+ unsigned e = le16_to_cpu(i->errors[j]);
+ if (e < BCH_SB_ERR_MAX)
+ __set_bit(e, c->sb.errors_silent);
+ if (e < sizeof(ext->errors_silent) * 8)
+ ext->errors_silent[e / 64] |= cpu_to_le64(BIT_ULL(e % 64));
+ }
+ }
+ }
+}
--- /dev/null
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _BCACHEFS_SB_DOWNGRADE_H
+#define _BCACHEFS_SB_DOWNGRADE_H
+
+extern const struct bch_sb_field_ops bch_sb_field_ops_downgrade;
+
+int bch2_sb_downgrade_update(struct bch_fs *);
+void bch2_sb_set_downgrade(struct bch_fs *, unsigned, unsigned);
+
+#endif /* _BCACHEFS_SB_DOWNGRADE_H */
#include "sb-errors.h"
#include "super-io.h"
-static const char * const bch2_sb_error_strs[] = {
+const char * const bch2_sb_error_strs[] = {
#define x(t, n, ...) [n] = #t,
BCH_SB_ERRS()
NULL
static inline unsigned bch2_sb_field_errors_nr_entries(struct bch_sb_field_errors *e)
{
- return e
- ? (bch2_sb_field_bytes(&e->field) - sizeof(*e)) / sizeof(e->entries[0])
- : 0;
+ return bch2_sb_field_nr_entries(e);
}
static inline unsigned bch2_sb_field_errors_u64s(unsigned nr)
#include "sb-errors_types.h"
-#define BCH_SB_ERRS() \
- x(clean_but_journal_not_empty, 0) \
- x(dirty_but_no_journal_entries, 1) \
- x(dirty_but_no_journal_entries_post_drop_nonflushes, 2) \
- x(sb_clean_journal_seq_mismatch, 3) \
- x(sb_clean_btree_root_mismatch, 4) \
- x(sb_clean_missing, 5) \
- x(jset_unsupported_version, 6) \
- x(jset_unknown_csum, 7) \
- x(jset_last_seq_newer_than_seq, 8) \
- x(jset_past_bucket_end, 9) \
- x(jset_seq_blacklisted, 10) \
- x(journal_entries_missing, 11) \
- x(journal_entry_replicas_not_marked, 12) \
- x(journal_entry_past_jset_end, 13) \
- x(journal_entry_replicas_data_mismatch, 14) \
- x(journal_entry_bkey_u64s_0, 15) \
- x(journal_entry_bkey_past_end, 16) \
- x(journal_entry_bkey_bad_format, 17) \
- x(journal_entry_bkey_invalid, 18) \
- x(journal_entry_btree_root_bad_size, 19) \
- x(journal_entry_blacklist_bad_size, 20) \
- x(journal_entry_blacklist_v2_bad_size, 21) \
- x(journal_entry_blacklist_v2_start_past_end, 22) \
- x(journal_entry_usage_bad_size, 23) \
- x(journal_entry_data_usage_bad_size, 24) \
- x(journal_entry_clock_bad_size, 25) \
- x(journal_entry_clock_bad_rw, 26) \
- x(journal_entry_dev_usage_bad_size, 27) \
- x(journal_entry_dev_usage_bad_dev, 28) \
- x(journal_entry_dev_usage_bad_pad, 29) \
- x(btree_node_unreadable, 30) \
- x(btree_node_fault_injected, 31) \
- x(btree_node_bad_magic, 32) \
- x(btree_node_bad_seq, 33) \
- x(btree_node_unsupported_version, 34) \
- x(btree_node_bset_older_than_sb_min, 35) \
- x(btree_node_bset_newer_than_sb, 36) \
- x(btree_node_data_missing, 37) \
- x(btree_node_bset_after_end, 38) \
- x(btree_node_replicas_sectors_written_mismatch, 39) \
- x(btree_node_replicas_data_mismatch, 40) \
- x(bset_unknown_csum, 41) \
- x(bset_bad_csum, 42) \
- x(bset_past_end_of_btree_node, 43) \
- x(bset_wrong_sector_offset, 44) \
- x(bset_empty, 45) \
- x(bset_bad_seq, 46) \
- x(bset_blacklisted_journal_seq, 47) \
- x(first_bset_blacklisted_journal_seq, 48) \
- x(btree_node_bad_btree, 49) \
- x(btree_node_bad_level, 50) \
- x(btree_node_bad_min_key, 51) \
- x(btree_node_bad_max_key, 52) \
- x(btree_node_bad_format, 53) \
- x(btree_node_bkey_past_bset_end, 54) \
- x(btree_node_bkey_bad_format, 55) \
- x(btree_node_bad_bkey, 56) \
- x(btree_node_bkey_out_of_order, 57) \
- x(btree_root_bkey_invalid, 58) \
- x(btree_root_read_error, 59) \
- x(btree_root_bad_min_key, 50) \
- x(btree_root_bad_max_key, 61) \
- x(btree_node_read_error, 62) \
- x(btree_node_topology_bad_min_key, 63) \
- x(btree_node_topology_bad_max_key, 64) \
- x(btree_node_topology_overwritten_by_prev_node, 65) \
- x(btree_node_topology_overwritten_by_next_node, 66) \
- x(btree_node_topology_interior_node_empty, 67) \
- x(fs_usage_hidden_wrong, 68) \
- x(fs_usage_btree_wrong, 69) \
- x(fs_usage_data_wrong, 70) \
- x(fs_usage_cached_wrong, 71) \
- x(fs_usage_reserved_wrong, 72) \
- x(fs_usage_persistent_reserved_wrong, 73) \
- x(fs_usage_nr_inodes_wrong, 74) \
- x(fs_usage_replicas_wrong, 75) \
- x(dev_usage_buckets_wrong, 76) \
- x(dev_usage_sectors_wrong, 77) \
- x(dev_usage_fragmented_wrong, 78) \
- x(dev_usage_buckets_ec_wrong, 79) \
- x(bkey_version_in_future, 80) \
- x(bkey_u64s_too_small, 81) \
- x(bkey_invalid_type_for_btree, 82) \
- x(bkey_extent_size_zero, 83) \
- x(bkey_extent_size_greater_than_offset, 84) \
- x(bkey_size_nonzero, 85) \
- x(bkey_snapshot_nonzero, 86) \
- x(bkey_snapshot_zero, 87) \
- x(bkey_at_pos_max, 88) \
- x(bkey_before_start_of_btree_node, 89) \
- x(bkey_after_end_of_btree_node, 90) \
- x(bkey_val_size_nonzero, 91) \
- x(bkey_val_size_too_small, 92) \
- x(alloc_v1_val_size_bad, 93) \
- x(alloc_v2_unpack_error, 94) \
- x(alloc_v3_unpack_error, 95) \
- x(alloc_v4_val_size_bad, 96) \
- x(alloc_v4_backpointers_start_bad, 97) \
- x(alloc_key_data_type_bad, 98) \
- x(alloc_key_empty_but_have_data, 99) \
- x(alloc_key_dirty_sectors_0, 100) \
- x(alloc_key_data_type_inconsistency, 101) \
- x(alloc_key_to_missing_dev_bucket, 102) \
- x(alloc_key_cached_inconsistency, 103) \
- x(alloc_key_cached_but_read_time_zero, 104) \
- x(alloc_key_to_missing_lru_entry, 105) \
- x(alloc_key_data_type_wrong, 106) \
- x(alloc_key_gen_wrong, 107) \
- x(alloc_key_dirty_sectors_wrong, 108) \
- x(alloc_key_cached_sectors_wrong, 109) \
- x(alloc_key_stripe_wrong, 110) \
- x(alloc_key_stripe_redundancy_wrong, 111) \
- x(bucket_sector_count_overflow, 112) \
- x(bucket_metadata_type_mismatch, 113) \
- x(need_discard_key_wrong, 114) \
- x(freespace_key_wrong, 115) \
- x(freespace_hole_missing, 116) \
- x(bucket_gens_val_size_bad, 117) \
- x(bucket_gens_key_wrong, 118) \
- x(bucket_gens_hole_wrong, 119) \
- x(bucket_gens_to_invalid_dev, 120) \
- x(bucket_gens_to_invalid_buckets, 121) \
- x(bucket_gens_nonzero_for_invalid_buckets, 122) \
- x(need_discard_freespace_key_to_invalid_dev_bucket, 123) \
- x(need_discard_freespace_key_bad, 124) \
- x(backpointer_pos_wrong, 125) \
- x(backpointer_to_missing_device, 126) \
- x(backpointer_to_missing_alloc, 127) \
- x(backpointer_to_missing_ptr, 128) \
- x(lru_entry_at_time_0, 129) \
- x(lru_entry_to_invalid_bucket, 130) \
- x(lru_entry_bad, 131) \
- x(btree_ptr_val_too_big, 132) \
- x(btree_ptr_v2_val_too_big, 133) \
- x(btree_ptr_has_non_ptr, 134) \
- x(extent_ptrs_invalid_entry, 135) \
- x(extent_ptrs_no_ptrs, 136) \
- x(extent_ptrs_too_many_ptrs, 137) \
- x(extent_ptrs_redundant_crc, 138) \
- x(extent_ptrs_redundant_stripe, 139) \
- x(extent_ptrs_unwritten, 140) \
- x(extent_ptrs_written_and_unwritten, 141) \
- x(ptr_to_invalid_device, 142) \
- x(ptr_to_duplicate_device, 143) \
- x(ptr_after_last_bucket, 144) \
- x(ptr_before_first_bucket, 145) \
- x(ptr_spans_multiple_buckets, 146) \
- x(ptr_to_missing_backpointer, 147) \
- x(ptr_to_missing_alloc_key, 148) \
- x(ptr_to_missing_replicas_entry, 149) \
- x(ptr_to_missing_stripe, 150) \
- x(ptr_to_incorrect_stripe, 151) \
- x(ptr_gen_newer_than_bucket_gen, 152) \
- x(ptr_too_stale, 153) \
- x(stale_dirty_ptr, 154) \
- x(ptr_bucket_data_type_mismatch, 155) \
- x(ptr_cached_and_erasure_coded, 156) \
- x(ptr_crc_uncompressed_size_too_small, 157) \
- x(ptr_crc_csum_type_unknown, 158) \
- x(ptr_crc_compression_type_unknown, 159) \
- x(ptr_crc_redundant, 160) \
- x(ptr_crc_uncompressed_size_too_big, 161) \
- x(ptr_crc_nonce_mismatch, 162) \
- x(ptr_stripe_redundant, 163) \
- x(reservation_key_nr_replicas_invalid, 164) \
- x(reflink_v_refcount_wrong, 165) \
- x(reflink_p_to_missing_reflink_v, 166) \
- x(stripe_pos_bad, 167) \
- x(stripe_val_size_bad, 168) \
- x(stripe_sector_count_wrong, 169) \
- x(snapshot_tree_pos_bad, 170) \
- x(snapshot_tree_to_missing_snapshot, 171) \
- x(snapshot_tree_to_missing_subvol, 172) \
- x(snapshot_tree_to_wrong_subvol, 173) \
- x(snapshot_tree_to_snapshot_subvol, 174) \
- x(snapshot_pos_bad, 175) \
- x(snapshot_parent_bad, 176) \
- x(snapshot_children_not_normalized, 177) \
- x(snapshot_child_duplicate, 178) \
- x(snapshot_child_bad, 179) \
- x(snapshot_skiplist_not_normalized, 180) \
- x(snapshot_skiplist_bad, 181) \
- x(snapshot_should_not_have_subvol, 182) \
- x(snapshot_to_bad_snapshot_tree, 183) \
- x(snapshot_bad_depth, 184) \
- x(snapshot_bad_skiplist, 185) \
- x(subvol_pos_bad, 186) \
- x(subvol_not_master_and_not_snapshot, 187) \
- x(subvol_to_missing_root, 188) \
- x(subvol_root_wrong_bi_subvol, 189) \
- x(bkey_in_missing_snapshot, 190) \
- x(inode_pos_inode_nonzero, 191) \
- x(inode_pos_blockdev_range, 192) \
- x(inode_unpack_error, 193) \
- x(inode_str_hash_invalid, 194) \
- x(inode_v3_fields_start_bad, 195) \
- x(inode_snapshot_mismatch, 196) \
- x(inode_unlinked_but_clean, 197) \
- x(inode_unlinked_but_nlink_nonzero, 198) \
- x(inode_checksum_type_invalid, 199) \
- x(inode_compression_type_invalid, 200) \
- x(inode_subvol_root_but_not_dir, 201) \
- x(inode_i_size_dirty_but_clean, 202) \
- x(inode_i_sectors_dirty_but_clean, 203) \
- x(inode_i_sectors_wrong, 204) \
- x(inode_dir_wrong_nlink, 205) \
- x(inode_dir_multiple_links, 206) \
- x(inode_multiple_links_but_nlink_0, 207) \
- x(inode_wrong_backpointer, 208) \
- x(inode_wrong_nlink, 209) \
- x(inode_unreachable, 210) \
- x(deleted_inode_but_clean, 211) \
- x(deleted_inode_missing, 212) \
- x(deleted_inode_is_dir, 213) \
- x(deleted_inode_not_unlinked, 214) \
- x(extent_overlapping, 215) \
- x(extent_in_missing_inode, 216) \
- x(extent_in_non_reg_inode, 217) \
- x(extent_past_end_of_inode, 218) \
- x(dirent_empty_name, 219) \
- x(dirent_val_too_big, 220) \
- x(dirent_name_too_long, 221) \
- x(dirent_name_embedded_nul, 222) \
- x(dirent_name_dot_or_dotdot, 223) \
- x(dirent_name_has_slash, 224) \
- x(dirent_d_type_wrong, 225) \
- x(dirent_d_parent_subvol_wrong, 226) \
- x(dirent_in_missing_dir_inode, 227) \
- x(dirent_in_non_dir_inode, 228) \
- x(dirent_to_missing_inode, 229) \
- x(dirent_to_missing_subvol, 230) \
- x(dirent_to_itself, 231) \
- x(quota_type_invalid, 232) \
- x(xattr_val_size_too_small, 233) \
- x(xattr_val_size_too_big, 234) \
- x(xattr_invalid_type, 235) \
- x(xattr_name_invalid_chars, 236) \
- x(xattr_in_missing_inode, 237) \
- x(root_subvol_missing, 238) \
- x(root_dir_missing, 239) \
- x(root_inode_not_dir, 240) \
- x(dir_loop, 241) \
- x(hash_table_key_duplicate, 242) \
- x(hash_table_key_wrong_offset, 243)
-
-enum bch_sb_error_id {
-#define x(t, n) BCH_FSCK_ERR_##t = n,
- BCH_SB_ERRS()
-#undef x
- BCH_SB_ERR_MAX
-};
+extern const char * const bch2_sb_error_strs[];
extern const struct bch_sb_field_ops bch_sb_field_ops_errors;
#include "darray.h"
+#define BCH_SB_ERRS() \
+ x(clean_but_journal_not_empty, 0) \
+ x(dirty_but_no_journal_entries, 1) \
+ x(dirty_but_no_journal_entries_post_drop_nonflushes, 2) \
+ x(sb_clean_journal_seq_mismatch, 3) \
+ x(sb_clean_btree_root_mismatch, 4) \
+ x(sb_clean_missing, 5) \
+ x(jset_unsupported_version, 6) \
+ x(jset_unknown_csum, 7) \
+ x(jset_last_seq_newer_than_seq, 8) \
+ x(jset_past_bucket_end, 9) \
+ x(jset_seq_blacklisted, 10) \
+ x(journal_entries_missing, 11) \
+ x(journal_entry_replicas_not_marked, 12) \
+ x(journal_entry_past_jset_end, 13) \
+ x(journal_entry_replicas_data_mismatch, 14) \
+ x(journal_entry_bkey_u64s_0, 15) \
+ x(journal_entry_bkey_past_end, 16) \
+ x(journal_entry_bkey_bad_format, 17) \
+ x(journal_entry_bkey_invalid, 18) \
+ x(journal_entry_btree_root_bad_size, 19) \
+ x(journal_entry_blacklist_bad_size, 20) \
+ x(journal_entry_blacklist_v2_bad_size, 21) \
+ x(journal_entry_blacklist_v2_start_past_end, 22) \
+ x(journal_entry_usage_bad_size, 23) \
+ x(journal_entry_data_usage_bad_size, 24) \
+ x(journal_entry_clock_bad_size, 25) \
+ x(journal_entry_clock_bad_rw, 26) \
+ x(journal_entry_dev_usage_bad_size, 27) \
+ x(journal_entry_dev_usage_bad_dev, 28) \
+ x(journal_entry_dev_usage_bad_pad, 29) \
+ x(btree_node_unreadable, 30) \
+ x(btree_node_fault_injected, 31) \
+ x(btree_node_bad_magic, 32) \
+ x(btree_node_bad_seq, 33) \
+ x(btree_node_unsupported_version, 34) \
+ x(btree_node_bset_older_than_sb_min, 35) \
+ x(btree_node_bset_newer_than_sb, 36) \
+ x(btree_node_data_missing, 37) \
+ x(btree_node_bset_after_end, 38) \
+ x(btree_node_replicas_sectors_written_mismatch, 39) \
+ x(btree_node_replicas_data_mismatch, 40) \
+ x(bset_unknown_csum, 41) \
+ x(bset_bad_csum, 42) \
+ x(bset_past_end_of_btree_node, 43) \
+ x(bset_wrong_sector_offset, 44) \
+ x(bset_empty, 45) \
+ x(bset_bad_seq, 46) \
+ x(bset_blacklisted_journal_seq, 47) \
+ x(first_bset_blacklisted_journal_seq, 48) \
+ x(btree_node_bad_btree, 49) \
+ x(btree_node_bad_level, 50) \
+ x(btree_node_bad_min_key, 51) \
+ x(btree_node_bad_max_key, 52) \
+ x(btree_node_bad_format, 53) \
+ x(btree_node_bkey_past_bset_end, 54) \
+ x(btree_node_bkey_bad_format, 55) \
+ x(btree_node_bad_bkey, 56) \
+ x(btree_node_bkey_out_of_order, 57) \
+ x(btree_root_bkey_invalid, 58) \
+ x(btree_root_read_error, 59) \
+ x(btree_root_bad_min_key, 60) \
+ x(btree_root_bad_max_key, 61) \
+ x(btree_node_read_error, 62) \
+ x(btree_node_topology_bad_min_key, 63) \
+ x(btree_node_topology_bad_max_key, 64) \
+ x(btree_node_topology_overwritten_by_prev_node, 65) \
+ x(btree_node_topology_overwritten_by_next_node, 66) \
+ x(btree_node_topology_interior_node_empty, 67) \
+ x(fs_usage_hidden_wrong, 68) \
+ x(fs_usage_btree_wrong, 69) \
+ x(fs_usage_data_wrong, 70) \
+ x(fs_usage_cached_wrong, 71) \
+ x(fs_usage_reserved_wrong, 72) \
+ x(fs_usage_persistent_reserved_wrong, 73) \
+ x(fs_usage_nr_inodes_wrong, 74) \
+ x(fs_usage_replicas_wrong, 75) \
+ x(dev_usage_buckets_wrong, 76) \
+ x(dev_usage_sectors_wrong, 77) \
+ x(dev_usage_fragmented_wrong, 78) \
+ x(dev_usage_buckets_ec_wrong, 79) \
+ x(bkey_version_in_future, 80) \
+ x(bkey_u64s_too_small, 81) \
+ x(bkey_invalid_type_for_btree, 82) \
+ x(bkey_extent_size_zero, 83) \
+ x(bkey_extent_size_greater_than_offset, 84) \
+ x(bkey_size_nonzero, 85) \
+ x(bkey_snapshot_nonzero, 86) \
+ x(bkey_snapshot_zero, 87) \
+ x(bkey_at_pos_max, 88) \
+ x(bkey_before_start_of_btree_node, 89) \
+ x(bkey_after_end_of_btree_node, 90) \
+ x(bkey_val_size_nonzero, 91) \
+ x(bkey_val_size_too_small, 92) \
+ x(alloc_v1_val_size_bad, 93) \
+ x(alloc_v2_unpack_error, 94) \
+ x(alloc_v3_unpack_error, 95) \
+ x(alloc_v4_val_size_bad, 96) \
+ x(alloc_v4_backpointers_start_bad, 97) \
+ x(alloc_key_data_type_bad, 98) \
+ x(alloc_key_empty_but_have_data, 99) \
+ x(alloc_key_dirty_sectors_0, 100) \
+ x(alloc_key_data_type_inconsistency, 101) \
+ x(alloc_key_to_missing_dev_bucket, 102) \
+ x(alloc_key_cached_inconsistency, 103) \
+ x(alloc_key_cached_but_read_time_zero, 104) \
+ x(alloc_key_to_missing_lru_entry, 105) \
+ x(alloc_key_data_type_wrong, 106) \
+ x(alloc_key_gen_wrong, 107) \
+ x(alloc_key_dirty_sectors_wrong, 108) \
+ x(alloc_key_cached_sectors_wrong, 109) \
+ x(alloc_key_stripe_wrong, 110) \
+ x(alloc_key_stripe_redundancy_wrong, 111) \
+ x(bucket_sector_count_overflow, 112) \
+ x(bucket_metadata_type_mismatch, 113) \
+ x(need_discard_key_wrong, 114) \
+ x(freespace_key_wrong, 115) \
+ x(freespace_hole_missing, 116) \
+ x(bucket_gens_val_size_bad, 117) \
+ x(bucket_gens_key_wrong, 118) \
+ x(bucket_gens_hole_wrong, 119) \
+ x(bucket_gens_to_invalid_dev, 120) \
+ x(bucket_gens_to_invalid_buckets, 121) \
+ x(bucket_gens_nonzero_for_invalid_buckets, 122) \
+ x(need_discard_freespace_key_to_invalid_dev_bucket, 123) \
+ x(need_discard_freespace_key_bad, 124) \
+ x(backpointer_pos_wrong, 125) \
+ x(backpointer_to_missing_device, 126) \
+ x(backpointer_to_missing_alloc, 127) \
+ x(backpointer_to_missing_ptr, 128) \
+ x(lru_entry_at_time_0, 129) \
+ x(lru_entry_to_invalid_bucket, 130) \
+ x(lru_entry_bad, 131) \
+ x(btree_ptr_val_too_big, 132) \
+ x(btree_ptr_v2_val_too_big, 133) \
+ x(btree_ptr_has_non_ptr, 134) \
+ x(extent_ptrs_invalid_entry, 135) \
+ x(extent_ptrs_no_ptrs, 136) \
+ x(extent_ptrs_too_many_ptrs, 137) \
+ x(extent_ptrs_redundant_crc, 138) \
+ x(extent_ptrs_redundant_stripe, 139) \
+ x(extent_ptrs_unwritten, 140) \
+ x(extent_ptrs_written_and_unwritten, 141) \
+ x(ptr_to_invalid_device, 142) \
+ x(ptr_to_duplicate_device, 143) \
+ x(ptr_after_last_bucket, 144) \
+ x(ptr_before_first_bucket, 145) \
+ x(ptr_spans_multiple_buckets, 146) \
+ x(ptr_to_missing_backpointer, 147) \
+ x(ptr_to_missing_alloc_key, 148) \
+ x(ptr_to_missing_replicas_entry, 149) \
+ x(ptr_to_missing_stripe, 150) \
+ x(ptr_to_incorrect_stripe, 151) \
+ x(ptr_gen_newer_than_bucket_gen, 152) \
+ x(ptr_too_stale, 153) \
+ x(stale_dirty_ptr, 154) \
+ x(ptr_bucket_data_type_mismatch, 155) \
+ x(ptr_cached_and_erasure_coded, 156) \
+ x(ptr_crc_uncompressed_size_too_small, 157) \
+ x(ptr_crc_csum_type_unknown, 158) \
+ x(ptr_crc_compression_type_unknown, 159) \
+ x(ptr_crc_redundant, 160) \
+ x(ptr_crc_uncompressed_size_too_big, 161) \
+ x(ptr_crc_nonce_mismatch, 162) \
+ x(ptr_stripe_redundant, 163) \
+ x(reservation_key_nr_replicas_invalid, 164) \
+ x(reflink_v_refcount_wrong, 165) \
+ x(reflink_p_to_missing_reflink_v, 166) \
+ x(stripe_pos_bad, 167) \
+ x(stripe_val_size_bad, 168) \
+ x(stripe_sector_count_wrong, 169) \
+ x(snapshot_tree_pos_bad, 170) \
+ x(snapshot_tree_to_missing_snapshot, 171) \
+ x(snapshot_tree_to_missing_subvol, 172) \
+ x(snapshot_tree_to_wrong_subvol, 173) \
+ x(snapshot_tree_to_snapshot_subvol, 174) \
+ x(snapshot_pos_bad, 175) \
+ x(snapshot_parent_bad, 176) \
+ x(snapshot_children_not_normalized, 177) \
+ x(snapshot_child_duplicate, 178) \
+ x(snapshot_child_bad, 179) \
+ x(snapshot_skiplist_not_normalized, 180) \
+ x(snapshot_skiplist_bad, 181) \
+ x(snapshot_should_not_have_subvol, 182) \
+ x(snapshot_to_bad_snapshot_tree, 183) \
+ x(snapshot_bad_depth, 184) \
+ x(snapshot_bad_skiplist, 185) \
+ x(subvol_pos_bad, 186) \
+ x(subvol_not_master_and_not_snapshot, 187) \
+ x(subvol_to_missing_root, 188) \
+ x(subvol_root_wrong_bi_subvol, 189) \
+ x(bkey_in_missing_snapshot, 190) \
+ x(inode_pos_inode_nonzero, 191) \
+ x(inode_pos_blockdev_range, 192) \
+ x(inode_unpack_error, 193) \
+ x(inode_str_hash_invalid, 194) \
+ x(inode_v3_fields_start_bad, 195) \
+ x(inode_snapshot_mismatch, 196) \
+ x(inode_unlinked_but_clean, 197) \
+ x(inode_unlinked_but_nlink_nonzero, 198) \
+ x(inode_checksum_type_invalid, 199) \
+ x(inode_compression_type_invalid, 200) \
+ x(inode_subvol_root_but_not_dir, 201) \
+ x(inode_i_size_dirty_but_clean, 202) \
+ x(inode_i_sectors_dirty_but_clean, 203) \
+ x(inode_i_sectors_wrong, 204) \
+ x(inode_dir_wrong_nlink, 205) \
+ x(inode_dir_multiple_links, 206) \
+ x(inode_multiple_links_but_nlink_0, 207) \
+ x(inode_wrong_backpointer, 208) \
+ x(inode_wrong_nlink, 209) \
+ x(inode_unreachable, 210) \
+ x(deleted_inode_but_clean, 211) \
+ x(deleted_inode_missing, 212) \
+ x(deleted_inode_is_dir, 213) \
+ x(deleted_inode_not_unlinked, 214) \
+ x(extent_overlapping, 215) \
+ x(extent_in_missing_inode, 216) \
+ x(extent_in_non_reg_inode, 217) \
+ x(extent_past_end_of_inode, 218) \
+ x(dirent_empty_name, 219) \
+ x(dirent_val_too_big, 220) \
+ x(dirent_name_too_long, 221) \
+ x(dirent_name_embedded_nul, 222) \
+ x(dirent_name_dot_or_dotdot, 223) \
+ x(dirent_name_has_slash, 224) \
+ x(dirent_d_type_wrong, 225) \
+ x(dirent_d_parent_subvol_wrong, 226) \
+ x(dirent_in_missing_dir_inode, 227) \
+ x(dirent_in_non_dir_inode, 228) \
+ x(dirent_to_missing_inode, 229) \
+ x(dirent_to_missing_subvol, 230) \
+ x(dirent_to_itself, 231) \
+ x(quota_type_invalid, 232) \
+ x(xattr_val_size_too_small, 233) \
+ x(xattr_val_size_too_big, 234) \
+ x(xattr_invalid_type, 235) \
+ x(xattr_name_invalid_chars, 236) \
+ x(xattr_in_missing_inode, 237) \
+ x(root_subvol_missing, 238) \
+ x(root_dir_missing, 239) \
+ x(root_inode_not_dir, 240) \
+ x(dir_loop, 241) \
+ x(hash_table_key_duplicate, 242) \
+ x(hash_table_key_wrong_offset, 243)
+
+enum bch_sb_error_id {
+#define x(t, n) BCH_FSCK_ERR_##t = n,
+ BCH_SB_ERRS()
+#undef x
+ BCH_SB_ERR_MAX
+};
+
struct bch_sb_error_entry_cpu {
u64 id:16,
nr:48;
return bch2_subvolume_get_inlined(trans, subvol, inconsistent_if_not_found, iter_flags, s);
}
+int bch2_subvol_is_ro_trans(struct btree_trans *trans, u32 subvol)
+{
+ struct bch_subvolume s;
+ int ret = bch2_subvolume_get_inlined(trans, subvol, true, 0, &s);
+ if (ret)
+ return ret;
+
+ if (BCH_SUBVOLUME_RO(&s))
+ return -EROFS;
+ return 0;
+}
+
+int bch2_subvol_is_ro(struct bch_fs *c, u32 subvol)
+{
+ return bch2_trans_do(c, NULL, NULL, 0,
+ bch2_subvol_is_ro_trans(trans, subvol));
+}
+
int bch2_snapshot_get_subvol(struct btree_trans *trans, u32 snapshot,
struct bch_subvolume *subvol)
{
bool, int, struct bch_subvolume *);
int bch2_subvolume_get_snapshot(struct btree_trans *, u32, u32 *);
+int bch2_subvol_is_ro_trans(struct btree_trans *, u32);
+int bch2_subvol_is_ro(struct bch_fs *, u32);
+
int bch2_delete_dead_snapshots(struct bch_fs *);
void bch2_delete_dead_snapshots_async(struct bch_fs *);
#include "replicas.h"
#include "quota.h"
#include "sb-clean.h"
+#include "sb-downgrade.h"
#include "sb-errors.h"
#include "sb-members.h"
#include "super-io.h"
void bch2_free_super(struct bch_sb_handle *sb)
{
kfree(sb->bio);
- if (!IS_ERR_OR_NULL(sb->bdev))
- blkdev_put(sb->bdev, sb->holder);
+ if (!IS_ERR_OR_NULL(sb->bdev_handle))
+ bdev_release(sb->bdev_handle);
kfree(sb->holder);
kfree(sb->sb_name);
return f;
}
+struct bch_sb_field *bch2_sb_field_get_minsize_id(struct bch_sb_handle *sb,
+ enum bch_sb_field_type type,
+ unsigned u64s)
+{
+ struct bch_sb_field *f = bch2_sb_field_get_id(sb->sb, type);
+
+ if (!f || le32_to_cpu(f->u64s) < u64s)
+ f = bch2_sb_field_resize_id(sb, type, u64s);
+ return f;
+}
+
/* Superblock validate: */
static int validate_sb_layout(struct bch_sb_layout *layout, struct printbuf *out)
/* device open: */
+static unsigned long le_ulong_to_cpu(unsigned long v)
+{
+ return sizeof(unsigned long) == 8
+ ? le64_to_cpu(v)
+ : le32_to_cpu(v);
+}
+
+static void le_bitvector_to_cpu(unsigned long *dst, unsigned long *src, unsigned nr)
+{
+ BUG_ON(nr & (BITS_PER_TYPE(long) - 1));
+
+ for (unsigned i = 0; i < BITS_TO_LONGS(nr); i++)
+ dst[i] = le_ulong_to_cpu(src[i]);
+}
+
static void bch2_sb_update(struct bch_fs *c)
{
struct bch_sb *src = c->disk_sb.sb;
c->sb.features = le64_to_cpu(src->features[0]);
c->sb.compat = le64_to_cpu(src->compat[0]);
+ memset(c->sb.errors_silent, 0, sizeof(c->sb.errors_silent));
+
+ struct bch_sb_field_ext *ext = bch2_sb_field_get(src, ext);
+ if (ext)
+ le_bitvector_to_cpu(c->sb.errors_silent, (void *) ext->errors_silent,
+ sizeof(c->sb.errors_silent) * 8);
+
for_each_member_device(ca, c, i) {
- struct bch_member m = bch2_sb_member_get(src, i);
+ struct bch_member m = bch2_sb_member_get(src, ca->dev_idx);
ca->mi = bch2_mi_to_cpu(&m);
}
}
if (!opt_get(*opts, nochanges))
sb->mode |= BLK_OPEN_WRITE;
- sb->bdev = blkdev_get_by_path(path, sb->mode, sb->holder, &bch2_sb_handle_bdev_ops);
- if (IS_ERR(sb->bdev) &&
- PTR_ERR(sb->bdev) == -EACCES &&
+ sb->bdev_handle = bdev_open_by_path(path, sb->mode, sb->holder, &bch2_sb_handle_bdev_ops);
+ if (IS_ERR(sb->bdev_handle) &&
+ PTR_ERR(sb->bdev_handle) == -EACCES &&
opt_get(*opts, read_only)) {
sb->mode &= ~BLK_OPEN_WRITE;
- sb->bdev = blkdev_get_by_path(path, sb->mode, sb->holder, &bch2_sb_handle_bdev_ops);
- if (!IS_ERR(sb->bdev))
+ sb->bdev_handle = bdev_open_by_path(path, sb->mode, sb->holder, &bch2_sb_handle_bdev_ops);
+ if (!IS_ERR(sb->bdev_handle))
opt_set(*opts, nochanges, true);
}
- if (IS_ERR(sb->bdev)) {
- ret = PTR_ERR(sb->bdev);
+ if (IS_ERR(sb->bdev_handle)) {
+ ret = PTR_ERR(sb->bdev_handle);
goto out;
}
+ sb->bdev = sb->bdev_handle->bdev;
ret = bch2_sb_realloc(sb, 0);
if (ret) {
bch2_sb_members_from_cpu(c);
bch2_sb_members_cpy_v2_v1(&c->disk_sb);
bch2_sb_errors_from_cpu(c);
+ bch2_sb_downgrade_update(c);
for_each_online_member(ca, c, i)
bch2_sb_from_fs(c, ca);
}
/* Downgrade if superblock is at a higher version than currently supported: */
-void bch2_sb_maybe_downgrade(struct bch_fs *c)
+bool bch2_check_version_downgrade(struct bch_fs *c)
{
+ bool ret = bcachefs_metadata_version_current < c->sb.version;
+
lockdep_assert_held(&c->sb_lock);
/*
if (c->sb.version_min > bcachefs_metadata_version_current)
c->disk_sb.sb->version_min = cpu_to_le16(bcachefs_metadata_version_current);
c->disk_sb.sb->compat[0] &= cpu_to_le64((1ULL << BCH_COMPAT_NR) - 1);
+ return ret;
}
void bch2_sb_upgrade(struct bch_fs *c, unsigned new_version)
{
lockdep_assert_held(&c->sb_lock);
+ if (BCH_VERSION_MAJOR(new_version) >
+ BCH_VERSION_MAJOR(le16_to_cpu(c->disk_sb.sb->version)))
+ bch2_sb_field_resize(&c->disk_sb, downgrade, 0);
+
c->disk_sb.sb->version = cpu_to_le16(new_version);
c->disk_sb.sb->features[0] |= cpu_to_le64(BCH_SB_FEATURES_ALL);
}
+static int bch2_sb_ext_validate(struct bch_sb *sb, struct bch_sb_field *f,
+ struct printbuf *err)
+{
+ if (vstruct_bytes(f) < 88) {
+ prt_printf(err, "field too small (%zu < %u)", vstruct_bytes(f), 88);
+ return -BCH_ERR_invalid_sb_ext;
+ }
+
+ return 0;
+}
+
+static void bch2_sb_ext_to_text(struct printbuf *out, struct bch_sb *sb,
+ struct bch_sb_field *f)
+{
+ struct bch_sb_field_ext *e = field_to_type(f, ext);
+
+ prt_printf(out, "Recovery passes required:");
+ prt_tab(out);
+ prt_bitflags(out, bch2_recovery_passes,
+ bch2_recovery_passes_from_stable(le64_to_cpu(e->recovery_passes_required[0])));
+ prt_newline(out);
+
+ unsigned long *errors_silent = kmalloc(sizeof(e->errors_silent), GFP_KERNEL);
+ if (errors_silent) {
+ le_bitvector_to_cpu(errors_silent, (void *) e->errors_silent, sizeof(e->errors_silent) * 8);
+
+ prt_printf(out, "Errors to silently fix:");
+ prt_tab(out);
+ prt_bitflags_vector(out, bch2_sb_error_strs, errors_silent, sizeof(e->errors_silent) * 8);
+ prt_newline(out);
+
+ kfree(errors_silent);
+ }
+}
+
+static const struct bch_sb_field_ops bch_sb_field_ops_ext = {
+ .validate = bch2_sb_ext_validate,
+ .to_text = bch2_sb_ext_to_text,
+};
+
static const struct bch_sb_field_ops *bch2_sb_field_ops[] = {
#define x(f, nr) \
[BCH_SB_FIELD_##f] = &bch_sb_field_ops_##f,
#define bch2_sb_field_resize(_sb, _name, _u64s) \
field_to_type(bch2_sb_field_resize_id(_sb, BCH_SB_FIELD_##_name, _u64s), _name)
+struct bch_sb_field *bch2_sb_field_get_minsize_id(struct bch_sb_handle *,
+ enum bch_sb_field_type, unsigned);
+#define bch2_sb_field_get_minsize(_sb, _name, _u64s) \
+ field_to_type(bch2_sb_field_get_minsize_id(_sb, BCH_SB_FIELD_##_name, _u64s), _name)
+
+#define bch2_sb_field_nr_entries(_f) \
+ (_f ? ((bch2_sb_field_bytes(&_f->field) - sizeof(*_f)) / \
+ sizeof(_f->entries[0])) \
+ : 0)
+
void bch2_sb_field_delete(struct bch_sb_handle *, enum bch_sb_field_type);
extern const char * const bch2_sb_fields[];
__bch2_check_set_feature(c, feat);
}
-void bch2_sb_maybe_downgrade(struct bch_fs *);
+bool bch2_check_version_downgrade(struct bch_fs *);
void bch2_sb_upgrade(struct bch_fs *, unsigned);
void bch2_sb_field_to_text(struct printbuf *, struct bch_sb *,
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Kent Overstreet <kent.overstreet@gmail.com>");
MODULE_DESCRIPTION("bcachefs filesystem");
+MODULE_SOFTDEP("pre: crc32c");
+MODULE_SOFTDEP("pre: crc64");
+MODULE_SOFTDEP("pre: sha256");
+MODULE_SOFTDEP("pre: chacha20");
+MODULE_SOFTDEP("pre: poly1305");
+MODULE_SOFTDEP("pre: xxhash");
#define KTYPE(type) \
static const struct attribute_group type ## _group = { \
bch2_fs_copygc_init(c);
bch2_fs_btree_key_cache_init_early(&c->btree_key_cache);
+ bch2_fs_btree_iter_init_early(c);
bch2_fs_btree_interior_update_init_early(c);
bch2_fs_allocator_background_init(c);
bch2_fs_allocator_foreground_init(c);
struct bch_sb_handle {
struct bch_sb *sb;
+ struct bdev_handle *bdev_handle;
struct block_device *bdev;
char *sb_name;
struct bio *bio;
#define prt_units_s64(...) bch2_prt_units_s64(__VA_ARGS__)
#define prt_string_option(...) bch2_prt_string_option(__VA_ARGS__)
#define prt_bitflags(...) bch2_prt_bitflags(__VA_ARGS__)
+#define prt_bitflags_vector(...) bch2_prt_bitflags_vector(__VA_ARGS__)
void bch2_pr_time_units(struct printbuf *, u64);
void bch2_prt_datetime(struct printbuf *, time64_t);
struct btree_iter inode_iter = { NULL };
int ret;
- ret = bch2_inode_peek(trans, &inode_iter, inode_u, inum, BTREE_ITER_INTENT);
+ ret = bch2_subvol_is_ro_trans(trans, inum.subvol) ?:
+ bch2_inode_peek(trans, &inode_iter, inode_u, inum, BTREE_ITER_INTENT);
if (ret)
return ret;
* will not race with any other ebs.
*/
if (page->mapping)
- lockdep_assert_held(&page->mapping->private_lock);
+ lockdep_assert_held(&page->mapping->i_private_lock);
if (fs_info->nodesize >= PAGE_SIZE) {
if (!PagePrivate(page))
* Take private lock to ensure the subpage won't be detached
* in the meantime.
*/
- spin_lock(&page->mapping->private_lock);
+ spin_lock(&page->mapping->i_private_lock);
if (!PagePrivate(page)) {
- spin_unlock(&page->mapping->private_lock);
+ spin_unlock(&page->mapping->i_private_lock);
break;
}
spin_lock_irqsave(&subpage->lock, flags);
if (!test_bit(bit_start + fs_info->subpage_info->dirty_offset,
subpage->bitmaps)) {
spin_unlock_irqrestore(&subpage->lock, flags);
- spin_unlock(&page->mapping->private_lock);
+ spin_unlock(&page->mapping->i_private_lock);
bit_start++;
continue;
}
*/
eb = find_extent_buffer_nolock(fs_info, start);
spin_unlock_irqrestore(&subpage->lock, flags);
- spin_unlock(&page->mapping->private_lock);
+ spin_unlock(&page->mapping->i_private_lock);
/*
* The eb has already reached 0 refs thus find_extent_buffer()
if (btrfs_sb(page->mapping->host->i_sb)->nodesize < PAGE_SIZE)
return submit_eb_subpage(page, wbc);
- spin_lock(&mapping->private_lock);
+ spin_lock(&mapping->i_private_lock);
if (!PagePrivate(page)) {
- spin_unlock(&mapping->private_lock);
+ spin_unlock(&mapping->i_private_lock);
return 0;
}
* crashing the machine for something we can survive anyway.
*/
if (WARN_ON(!eb)) {
- spin_unlock(&mapping->private_lock);
+ spin_unlock(&mapping->i_private_lock);
return 0;
}
if (eb == ctx->eb) {
- spin_unlock(&mapping->private_lock);
+ spin_unlock(&mapping->i_private_lock);
return 0;
}
ret = atomic_inc_not_zero(&eb->refs);
- spin_unlock(&mapping->private_lock);
+ spin_unlock(&mapping->i_private_lock);
if (!ret)
return 0;
{
struct btrfs_subpage *subpage;
- lockdep_assert_held(&page->mapping->private_lock);
+ lockdep_assert_held(&page->mapping->i_private_lock);
if (PagePrivate(page)) {
subpage = (struct btrfs_subpage *)page->private;
/*
* For mapped eb, we're going to change the page private, which should
- * be done under the private_lock.
+ * be done under the i_private_lock.
*/
if (mapped)
- spin_lock(&page->mapping->private_lock);
+ spin_lock(&page->mapping->i_private_lock);
if (!PagePrivate(page)) {
if (mapped)
- spin_unlock(&page->mapping->private_lock);
+ spin_unlock(&page->mapping->i_private_lock);
return;
}
detach_page_private(page);
}
if (mapped)
- spin_unlock(&page->mapping->private_lock);
+ spin_unlock(&page->mapping->i_private_lock);
return;
}
if (!page_range_has_eb(fs_info, page))
btrfs_detach_subpage(fs_info, page);
- spin_unlock(&page->mapping->private_lock);
+ spin_unlock(&page->mapping->i_private_lock);
}
/* Release all pages attached to the extent buffer */
/*
* Preallocate page->private for subpage case, so that we won't
- * allocate memory with private_lock nor page lock hold.
+ * allocate memory with i_private_lock nor page lock hold.
*
* The memory will be freed by attach_extent_buffer_page() or freed
* manually if we exit earlier.
goto free_eb;
}
- spin_lock(&mapping->private_lock);
+ spin_lock(&mapping->i_private_lock);
exists = grab_extent_buffer(fs_info, p);
if (exists) {
- spin_unlock(&mapping->private_lock);
+ spin_unlock(&mapping->i_private_lock);
unlock_page(p);
put_page(p);
mark_extent_buffer_accessed(exists, p);
* Thus needs no special handling in error path.
*/
btrfs_page_inc_eb_refs(fs_info, p);
- spin_unlock(&mapping->private_lock);
+ spin_unlock(&mapping->i_private_lock);
WARN_ON(btrfs_page_test_dirty(fs_info, p, eb->start, eb->len));
eb->pages[i] = p;
* Finally to check if we have cleared page private, as if we have
* released all ebs in the page, the page private should be cleared now.
*/
- spin_lock(&page->mapping->private_lock);
+ spin_lock(&page->mapping->i_private_lock);
if (!PagePrivate(page))
ret = 1;
else
ret = 0;
- spin_unlock(&page->mapping->private_lock);
+ spin_unlock(&page->mapping->i_private_lock);
return ret;
}
* We need to make sure nobody is changing page->private, as we rely on
* page->private as the pointer to extent buffer.
*/
- spin_lock(&page->mapping->private_lock);
+ spin_lock(&page->mapping->i_private_lock);
if (!PagePrivate(page)) {
- spin_unlock(&page->mapping->private_lock);
+ spin_unlock(&page->mapping->i_private_lock);
return 1;
}
spin_lock(&eb->refs_lock);
if (atomic_read(&eb->refs) != 1 || extent_buffer_under_io(eb)) {
spin_unlock(&eb->refs_lock);
- spin_unlock(&page->mapping->private_lock);
+ spin_unlock(&page->mapping->i_private_lock);
return 0;
}
- spin_unlock(&page->mapping->private_lock);
+ spin_unlock(&page->mapping->i_private_lock);
/*
* If tree ref isn't set then we know the ref on this eb is a real ref,
if (ret < 0)
goto out_acct;
- file_start_write(file);
-
if (iov_iter_count(&iter) == 0) {
ret = 0;
- goto out_end_write;
+ goto out_iov;
}
pos = args.offset;
ret = rw_verify_area(WRITE, file, &pos, args.len);
if (ret < 0)
- goto out_end_write;
+ goto out_iov;
init_sync_kiocb(&kiocb, file);
ret = kiocb_set_rw_flags(&kiocb, 0);
if (ret)
- goto out_end_write;
+ goto out_iov;
kiocb.ki_pos = pos;
+ file_start_write(file);
+
ret = btrfs_do_write_iter(&kiocb, &iter, &args);
if (ret > 0)
fsnotify_modify(file);
-out_end_write:
file_end_write(file);
+out_iov:
kfree(iov);
out_acct:
if (ret > 0)
return;
ASSERT(PagePrivate(page) && page->mapping);
- lockdep_assert_held(&page->mapping->private_lock);
+ lockdep_assert_held(&page->mapping->i_private_lock);
subpage = (struct btrfs_subpage *)page->private;
atomic_inc(&subpage->eb_refs);
return;
ASSERT(PagePrivate(page) && page->mapping);
- lockdep_assert_held(&page->mapping->private_lock);
+ lockdep_assert_held(&page->mapping->i_private_lock);
subpage = (struct btrfs_subpage *)page->private;
ASSERT(atomic_read(&subpage->eb_refs));
return ERR_PTR(error);
}
+ /* No support for restricting writes to btrfs devices yet... */
+ mode &= ~BLK_OPEN_RESTRICT_WRITES;
/*
* Setup a dummy root and fs_info for test/set super. This is because
* we don't actually fill this stuff out until open_ctree, but we need
* Various filesystems appear to want __find_get_block to be non-blocking.
* But it's the page lock which protects the buffers. To get around this,
* we get exclusion from try_to_free_buffers with the blockdev mapping's
- * private_lock.
+ * i_private_lock.
*
- * Hack idea: for the blockdev mapping, private_lock contention
+ * Hack idea: for the blockdev mapping, i_private_lock contention
* may be quite high. This code could TryLock the page, and if that
- * succeeds, there is no need to take private_lock.
+ * succeeds, there is no need to take i_private_lock.
*/
static struct buffer_head *
__find_get_block_slow(struct block_device *bdev, sector_t block)
if (IS_ERR(folio))
goto out;
- spin_lock(&bd_mapping->private_lock);
+ spin_lock(&bd_mapping->i_private_lock);
head = folio_buffers(folio);
if (!head)
goto out_unlock;
1 << bd_inode->i_blkbits);
}
out_unlock:
- spin_unlock(&bd_mapping->private_lock);
+ spin_unlock(&bd_mapping->i_private_lock);
folio_put(folio);
out:
return ret;
*
* The functions mark_buffer_inode_dirty(), fsync_inode_buffers(),
* inode_has_buffers() and invalidate_inode_buffers() are provided for the
- * management of a list of dependent buffers at ->i_mapping->private_list.
+ * management of a list of dependent buffers at ->i_mapping->i_private_list.
*
* Locking is a little subtle: try_to_free_buffers() will remove buffers
* from their controlling inode's queue when they are being freed. But
* try_to_free_buffers() will be operating against the *blockdev* mapping
* at the time, not against the S_ISREG file which depends on those buffers.
- * So the locking for private_list is via the private_lock in the address_space
+ * So the locking for i_private_list is via the i_private_lock in the address_space
* which backs the buffers. Which is different from the address_space
* against which the buffers are listed. So for a particular address_space,
- * mapping->private_lock does *not* protect mapping->private_list! In fact,
- * mapping->private_list will always be protected by the backing blockdev's
- * ->private_lock.
+ * mapping->i_private_lock does *not* protect mapping->i_private_list! In fact,
+ * mapping->i_private_list will always be protected by the backing blockdev's
+ * ->i_private_lock.
*
* Which introduces a requirement: all buffers on an address_space's
- * ->private_list must be from the same address_space: the blockdev's.
+ * ->i_private_list must be from the same address_space: the blockdev's.
*
- * address_spaces which do not place buffers at ->private_list via these
- * utility functions are free to use private_lock and private_list for
- * whatever they want. The only requirement is that list_empty(private_list)
+ * address_spaces which do not place buffers at ->i_private_list via these
+ * utility functions are free to use i_private_lock and i_private_list for
+ * whatever they want. The only requirement is that list_empty(i_private_list)
* be true at clear_inode() time.
*
* FIXME: clear_inode should not call invalidate_inode_buffers(). The
*/
/*
- * The buffer's backing address_space's private_lock must be held
+ * The buffer's backing address_space's i_private_lock must be held
*/
static void __remove_assoc_queue(struct buffer_head *bh)
{
int inode_has_buffers(struct inode *inode)
{
- return !list_empty(&inode->i_data.private_list);
+ return !list_empty(&inode->i_data.i_private_list);
}
/*
* sync_mapping_buffers - write out & wait upon a mapping's "associated" buffers
* @mapping: the mapping which wants those buffers written
*
- * Starts I/O against the buffers at mapping->private_list, and waits upon
+ * Starts I/O against the buffers at mapping->i_private_list, and waits upon
* that I/O.
*
* Basically, this is a convenience function for fsync().
*/
int sync_mapping_buffers(struct address_space *mapping)
{
- struct address_space *buffer_mapping = mapping->private_data;
+ struct address_space *buffer_mapping = mapping->i_private_data;
- if (buffer_mapping == NULL || list_empty(&mapping->private_list))
+ if (buffer_mapping == NULL || list_empty(&mapping->i_private_list))
return 0;
- return fsync_buffers_list(&buffer_mapping->private_lock,
- &mapping->private_list);
+ return fsync_buffers_list(&buffer_mapping->i_private_lock,
+ &mapping->i_private_list);
}
EXPORT_SYMBOL(sync_mapping_buffers);
struct address_space *buffer_mapping = bh->b_folio->mapping;
mark_buffer_dirty(bh);
- if (!mapping->private_data) {
- mapping->private_data = buffer_mapping;
+ if (!mapping->i_private_data) {
+ mapping->i_private_data = buffer_mapping;
} else {
- BUG_ON(mapping->private_data != buffer_mapping);
+ BUG_ON(mapping->i_private_data != buffer_mapping);
}
if (!bh->b_assoc_map) {
- spin_lock(&buffer_mapping->private_lock);
+ spin_lock(&buffer_mapping->i_private_lock);
list_move_tail(&bh->b_assoc_buffers,
- &mapping->private_list);
+ &mapping->i_private_list);
bh->b_assoc_map = mapping;
- spin_unlock(&buffer_mapping->private_lock);
+ spin_unlock(&buffer_mapping->i_private_lock);
}
}
EXPORT_SYMBOL(mark_buffer_dirty_inode);
* bit, see a bunch of clean buffers and we'd end up with dirty buffers/clean
* page on the dirty page list.
*
- * We use private_lock to lock against try_to_free_buffers while using the
+ * We use i_private_lock to lock against try_to_free_buffers while using the
* page's buffer list. Also use this to protect against clean buffers being
* added to the page after it was set dirty.
*
struct buffer_head *head;
bool newly_dirty;
- spin_lock(&mapping->private_lock);
+ spin_lock(&mapping->i_private_lock);
head = folio_buffers(folio);
if (head) {
struct buffer_head *bh = head;
*/
folio_memcg_lock(folio);
newly_dirty = !folio_test_set_dirty(folio);
- spin_unlock(&mapping->private_lock);
+ spin_unlock(&mapping->i_private_lock);
if (newly_dirty)
__folio_mark_dirty(folio, mapping, 1);
smp_mb();
if (buffer_dirty(bh)) {
list_add(&bh->b_assoc_buffers,
- &mapping->private_list);
+ &mapping->i_private_list);
bh->b_assoc_map = mapping;
}
spin_unlock(lock);
* probably unmounting the fs, but that doesn't mean we have already
* done a sync(). Just drop the buffers from the inode list.
*
- * NOTE: we take the inode's blockdev's mapping's private_lock. Which
+ * NOTE: we take the inode's blockdev's mapping's i_private_lock. Which
* assumes that all the buffers are against the blockdev. Not true
* for reiserfs.
*/
{
if (inode_has_buffers(inode)) {
struct address_space *mapping = &inode->i_data;
- struct list_head *list = &mapping->private_list;
- struct address_space *buffer_mapping = mapping->private_data;
+ struct list_head *list = &mapping->i_private_list;
+ struct address_space *buffer_mapping = mapping->i_private_data;
- spin_lock(&buffer_mapping->private_lock);
+ spin_lock(&buffer_mapping->i_private_lock);
while (!list_empty(list))
__remove_assoc_queue(BH_ENTRY(list->next));
- spin_unlock(&buffer_mapping->private_lock);
+ spin_unlock(&buffer_mapping->i_private_lock);
}
}
EXPORT_SYMBOL(invalidate_inode_buffers);
if (inode_has_buffers(inode)) {
struct address_space *mapping = &inode->i_data;
- struct list_head *list = &mapping->private_list;
- struct address_space *buffer_mapping = mapping->private_data;
+ struct list_head *list = &mapping->i_private_list;
+ struct address_space *buffer_mapping = mapping->i_private_data;
- spin_lock(&buffer_mapping->private_lock);
+ spin_lock(&buffer_mapping->i_private_lock);
while (!list_empty(list)) {
struct buffer_head *bh = BH_ENTRY(list->next);
if (buffer_dirty(bh)) {
}
__remove_assoc_queue(bh);
}
- spin_unlock(&buffer_mapping->private_lock);
+ spin_unlock(&buffer_mapping->i_private_lock);
}
return ret;
}
* lock to be atomic wrt __find_get_block(), which does not
* run under the folio lock.
*/
- spin_lock(&inode->i_mapping->private_lock);
+ spin_lock(&inode->i_mapping->i_private_lock);
link_dev_buffers(folio, bh);
end_block = folio_init_buffers(folio, bdev,
(sector_t)index << sizebits, size);
- spin_unlock(&inode->i_mapping->private_lock);
+ spin_unlock(&inode->i_mapping->i_private_lock);
done:
ret = (block < end_block) ? 1 : -ENXIO;
failed:
* and then attach the address_space's inode to its superblock's dirty
* inode list.
*
- * mark_buffer_dirty() is atomic. It takes bh->b_folio->mapping->private_lock,
+ * mark_buffer_dirty() is atomic. It takes bh->b_folio->mapping->i_private_lock,
* i_pages lock and mapping->host->i_lock.
*/
void mark_buffer_dirty(struct buffer_head *bh)
if (bh->b_assoc_map) {
struct address_space *buffer_mapping = bh->b_folio->mapping;
- spin_lock(&buffer_mapping->private_lock);
+ spin_lock(&buffer_mapping->i_private_lock);
list_del_init(&bh->b_assoc_buffers);
bh->b_assoc_map = NULL;
- spin_unlock(&buffer_mapping->private_lock);
+ spin_unlock(&buffer_mapping->i_private_lock);
}
__brelse(bh);
}
/*
* We attach and possibly dirty the buffers atomically wrt
- * block_dirty_folio() via private_lock. try_to_free_buffers
+ * block_dirty_folio() via i_private_lock. try_to_free_buffers
* is already excluded via the folio lock.
*/
struct buffer_head *create_empty_buffers(struct folio *folio,
} while (bh);
tail->b_this_page = head;
- spin_lock(&folio->mapping->private_lock);
+ spin_lock(&folio->mapping->i_private_lock);
if (folio_test_uptodate(folio) || folio_test_dirty(folio)) {
bh = head;
do {
} while (bh != head);
}
folio_attach_private(folio, head);
- spin_unlock(&folio->mapping->private_lock);
+ spin_unlock(&folio->mapping->i_private_lock);
return head;
}
if (!folio_buffers(folio))
continue;
/*
- * We use folio lock instead of bd_mapping->private_lock
+ * We use folio lock instead of bd_mapping->i_private_lock
* to pin buffers here since we can afford to sleep and
* it scales better than a global spinlock lock.
*/
* are unused, and releases them if so.
*
* Exclusion against try_to_free_buffers may be obtained by either
- * locking the folio or by holding its mapping's private_lock.
+ * locking the folio or by holding its mapping's i_private_lock.
*
* If the folio is dirty but all the buffers are clean then we need to
* be sure to mark the folio clean as well. This is because the folio
* The same applies to regular filesystem folios: if all the buffers are
* clean then we set the folio clean and proceed. To do that, we require
* total exclusion from block_dirty_folio(). That is obtained with
- * private_lock.
+ * i_private_lock.
*
* try_to_free_buffers() is non-blocking.
*/
goto out;
}
- spin_lock(&mapping->private_lock);
+ spin_lock(&mapping->i_private_lock);
ret = drop_buffers(folio, &buffers_to_free);
/*
* the folio's buffers clean. We discover that here and clean
* the folio also.
*
- * private_lock must be held over this entire operation in order
+ * i_private_lock must be held over this entire operation in order
* to synchronise against block_dirty_folio and prevent the
* dirty bit from being lost.
*/
if (ret)
folio_cancel_dirty(folio);
- spin_unlock(&mapping->private_lock);
+ spin_unlock(&mapping->i_private_lock);
out:
if (buffers_to_free) {
struct buffer_head *bh = buffers_to_free;
{ "tag", cachefiles_daemon_tag },
#ifdef CONFIG_CACHEFILES_ONDEMAND
{ "copen", cachefiles_ondemand_copen },
+ { "restore", cachefiles_ondemand_restore },
#endif
{ "", NULL }
};
struct poll_table_struct *poll)
{
struct cachefiles_cache *cache = file->private_data;
+ XA_STATE(xas, &cache->reqs, 0);
+ struct cachefiles_req *req;
__poll_t mask;
poll_wait(file, &cache->daemon_pollwq, poll);
mask = 0;
if (cachefiles_in_ondemand_mode(cache)) {
- if (!xa_empty(&cache->reqs))
- mask |= EPOLLIN;
+ if (!xa_empty(&cache->reqs)) {
+ rcu_read_lock();
+ xas_for_each_marked(&xas, req, ULONG_MAX, CACHEFILES_REQ_NEW) {
+ if (!cachefiles_ondemand_is_reopening_read(req)) {
+ mask |= EPOLLIN;
+ break;
+ }
+ }
+ rcu_read_unlock();
+ }
} else {
if (test_bit(CACHEFILES_STATE_CHANGED, &cache->flags))
mask |= EPOLLIN;
if (!object)
return NULL;
+ if (cachefiles_ondemand_init_obj_info(object, volume)) {
+ kmem_cache_free(cachefiles_object_jar, object);
+ return NULL;
+ }
+
refcount_set(&object->ref, 1);
spin_lock_init(&object->lock);
ASSERTCMP(object->file, ==, NULL);
kfree(object->d_name);
-
+ cachefiles_ondemand_deinit_obj_info(object);
cache = object->volume->cache->cache;
fscache_put_cookie(object->cookie, fscache_cookie_put_object);
object->cookie = NULL;
struct dentry *fanout[256]; /* Fanout subdirs */
};
+enum cachefiles_object_state {
+ CACHEFILES_ONDEMAND_OBJSTATE_CLOSE, /* Anonymous fd closed by daemon or initial state */
+ CACHEFILES_ONDEMAND_OBJSTATE_OPEN, /* Anonymous fd associated with object is available */
+ CACHEFILES_ONDEMAND_OBJSTATE_REOPENING, /* Object that was closed and is being reopened. */
+};
+
+struct cachefiles_ondemand_info {
+ struct work_struct ondemand_work;
+ int ondemand_id;
+ enum cachefiles_object_state state;
+ struct cachefiles_object *object;
+};
+
/*
* Backing file state.
*/
unsigned long flags;
#define CACHEFILES_OBJECT_USING_TMPFILE 0 /* Have an unlinked tmpfile */
#ifdef CONFIG_CACHEFILES_ONDEMAND
- int ondemand_id;
+ struct cachefiles_ondemand_info *ondemand;
#endif
};
extern int cachefiles_ondemand_copen(struct cachefiles_cache *cache,
char *args);
+extern int cachefiles_ondemand_restore(struct cachefiles_cache *cache,
+ char *args);
+
extern int cachefiles_ondemand_init_object(struct cachefiles_object *object);
extern void cachefiles_ondemand_clean_object(struct cachefiles_object *object);
extern int cachefiles_ondemand_read(struct cachefiles_object *object,
loff_t pos, size_t len);
+extern int cachefiles_ondemand_init_obj_info(struct cachefiles_object *obj,
+ struct cachefiles_volume *volume);
+extern void cachefiles_ondemand_deinit_obj_info(struct cachefiles_object *obj);
+
+#define CACHEFILES_OBJECT_STATE_FUNCS(_state, _STATE) \
+static inline bool \
+cachefiles_ondemand_object_is_##_state(const struct cachefiles_object *object) \
+{ \
+ return object->ondemand->state == CACHEFILES_ONDEMAND_OBJSTATE_##_STATE; \
+} \
+ \
+static inline void \
+cachefiles_ondemand_set_object_##_state(struct cachefiles_object *object) \
+{ \
+ object->ondemand->state = CACHEFILES_ONDEMAND_OBJSTATE_##_STATE; \
+}
+
+CACHEFILES_OBJECT_STATE_FUNCS(open, OPEN);
+CACHEFILES_OBJECT_STATE_FUNCS(close, CLOSE);
+CACHEFILES_OBJECT_STATE_FUNCS(reopening, REOPENING);
+
+static inline bool cachefiles_ondemand_is_reopening_read(struct cachefiles_req *req)
+{
+ return cachefiles_ondemand_object_is_reopening(req->object) &&
+ req->msg.opcode == CACHEFILES_OP_READ;
+}
+
#else
static inline ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
char __user *_buffer, size_t buflen)
{
return -EOPNOTSUPP;
}
+
+static inline int cachefiles_ondemand_init_obj_info(struct cachefiles_object *obj,
+ struct cachefiles_volume *volume)
+{
+ return 0;
+}
+static inline void cachefiles_ondemand_deinit_obj_info(struct cachefiles_object *obj)
+{
+}
+
+static inline bool cachefiles_ondemand_is_reopening_read(struct cachefiles_req *req)
+{
+ return false;
+}
#endif
/*
_enter("%ld", ret);
- kiocb_end_write(iocb);
+ if (ki->was_async)
+ kiocb_end_write(iocb);
if (ret < 0)
trace_cachefiles_io_error(object, inode, ret,
ki->iocb.ki_complete = cachefiles_write_complete;
atomic_long_add(ki->b_writing, &cache->b_writing);
- kiocb_start_write(&ki->iocb);
-
get_file(ki->iocb.ki_filp);
cachefiles_grab_object(object, cachefiles_obj_get_ioreq);
{
struct cachefiles_object *object = file->private_data;
struct cachefiles_cache *cache = object->volume->cache;
- int object_id = object->ondemand_id;
+ struct cachefiles_ondemand_info *info = object->ondemand;
+ int object_id = info->ondemand_id;
struct cachefiles_req *req;
XA_STATE(xas, &cache->reqs, 0);
xa_lock(&cache->reqs);
- object->ondemand_id = CACHEFILES_ONDEMAND_ID_CLOSED;
+ info->ondemand_id = CACHEFILES_ONDEMAND_ID_CLOSED;
+ cachefiles_ondemand_set_object_close(object);
- /*
- * Flush all pending READ requests since their completion depends on
- * anon_fd.
- */
- xas_for_each(&xas, req, ULONG_MAX) {
+ /* Only flush CACHEFILES_REQ_NEW marked req to avoid race with daemon_read */
+ xas_for_each_marked(&xas, req, ULONG_MAX, CACHEFILES_REQ_NEW) {
if (req->msg.object_id == object_id &&
- req->msg.opcode == CACHEFILES_OP_READ) {
- req->error = -EIO;
+ req->msg.opcode == CACHEFILES_OP_CLOSE) {
complete(&req->done);
xas_store(&xas, NULL);
}
set_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags);
trace_cachefiles_ondemand_copen(req->object, id, size);
+ cachefiles_ondemand_set_object_open(req->object);
+ wake_up_all(&cache->daemon_pollwq);
+
out:
complete(&req->done);
return ret;
}
+int cachefiles_ondemand_restore(struct cachefiles_cache *cache, char *args)
+{
+ struct cachefiles_req *req;
+
+ XA_STATE(xas, &cache->reqs, 0);
+
+ if (!test_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags))
+ return -EOPNOTSUPP;
+
+ /*
+ * Reset the requests to CACHEFILES_REQ_NEW state, so that the
+ * requests have been processed halfway before the crash of the
+ * user daemon could be reprocessed after the recovery.
+ */
+ xas_lock(&xas);
+ xas_for_each(&xas, req, ULONG_MAX)
+ xas_set_mark(&xas, CACHEFILES_REQ_NEW);
+ xas_unlock(&xas);
+
+ wake_up_all(&cache->daemon_pollwq);
+ return 0;
+}
+
static int cachefiles_ondemand_get_fd(struct cachefiles_req *req)
{
struct cachefiles_object *object;
load = (void *)req->msg.data;
load->fd = fd;
- req->msg.object_id = object_id;
- object->ondemand_id = object_id;
+ object->ondemand->ondemand_id = object_id;
cachefiles_get_unbind_pincount(cache);
trace_cachefiles_ondemand_open(object, &req->msg, load);
return ret;
}
+static void ondemand_object_worker(struct work_struct *work)
+{
+ struct cachefiles_ondemand_info *info =
+ container_of(work, struct cachefiles_ondemand_info, ondemand_work);
+
+ cachefiles_ondemand_init_object(info->object);
+}
+
+/*
+ * If there are any inflight or subsequent READ requests on the
+ * closed object, reopen it.
+ * Skip read requests whose related object is reopening.
+ */
+static struct cachefiles_req *cachefiles_ondemand_select_req(struct xa_state *xas,
+ unsigned long xa_max)
+{
+ struct cachefiles_req *req;
+ struct cachefiles_object *object;
+ struct cachefiles_ondemand_info *info;
+
+ xas_for_each_marked(xas, req, xa_max, CACHEFILES_REQ_NEW) {
+ if (req->msg.opcode != CACHEFILES_OP_READ)
+ return req;
+ object = req->object;
+ info = object->ondemand;
+ if (cachefiles_ondemand_object_is_close(object)) {
+ cachefiles_ondemand_set_object_reopening(object);
+ queue_work(fscache_wq, &info->ondemand_work);
+ continue;
+ }
+ if (cachefiles_ondemand_object_is_reopening(object))
+ continue;
+ return req;
+ }
+ return NULL;
+}
+
ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
char __user *_buffer, size_t buflen)
{
int ret = 0;
XA_STATE(xas, &cache->reqs, cache->req_id_next);
+ xa_lock(&cache->reqs);
/*
* Cyclically search for a request that has not ever been processed,
* to prevent requests from being processed repeatedly, and make
* request distribution fair.
*/
- xa_lock(&cache->reqs);
- req = xas_find_marked(&xas, UINT_MAX, CACHEFILES_REQ_NEW);
+ req = cachefiles_ondemand_select_req(&xas, ULONG_MAX);
if (!req && cache->req_id_next > 0) {
xas_set(&xas, 0);
- req = xas_find_marked(&xas, cache->req_id_next - 1, CACHEFILES_REQ_NEW);
+ req = cachefiles_ondemand_select_req(&xas, cache->req_id_next - 1);
}
if (!req) {
xa_unlock(&cache->reqs);
xa_unlock(&cache->reqs);
id = xas.xa_index;
- msg->msg_id = id;
if (msg->opcode == CACHEFILES_OP_OPEN) {
ret = cachefiles_ondemand_get_fd(req);
- if (ret)
+ if (ret) {
+ cachefiles_ondemand_set_object_close(req->object);
goto error;
+ }
}
+ msg->msg_id = id;
+ msg->object_id = req->object->ondemand->ondemand_id;
+
if (copy_to_user(_buffer, msg, n) != 0) {
ret = -EFAULT;
goto err_put_fd;
void *private)
{
struct cachefiles_cache *cache = object->volume->cache;
- struct cachefiles_req *req;
+ struct cachefiles_req *req = NULL;
XA_STATE(xas, &cache->reqs, 0);
int ret;
if (!test_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags))
return 0;
- if (test_bit(CACHEFILES_DEAD, &cache->flags))
- return -EIO;
+ if (test_bit(CACHEFILES_DEAD, &cache->flags)) {
+ ret = -EIO;
+ goto out;
+ }
req = kzalloc(sizeof(*req) + data_len, GFP_KERNEL);
- if (!req)
- return -ENOMEM;
+ if (!req) {
+ ret = -ENOMEM;
+ goto out;
+ }
req->object = object;
init_completion(&req->done);
/* coupled with the barrier in cachefiles_flush_reqs() */
smp_mb();
- if (opcode != CACHEFILES_OP_OPEN && object->ondemand_id <= 0) {
- WARN_ON_ONCE(object->ondemand_id == 0);
+ if (opcode == CACHEFILES_OP_CLOSE &&
+ !cachefiles_ondemand_object_is_open(object)) {
+ WARN_ON_ONCE(object->ondemand->ondemand_id == 0);
xas_unlock(&xas);
ret = -EIO;
goto out;
wake_up_all(&cache->daemon_pollwq);
wait_for_completion(&req->done);
ret = req->error;
+ kfree(req);
+ return ret;
out:
+ /* Reset the object to close state in error handling path.
+ * If error occurs after creating the anonymous fd,
+ * cachefiles_ondemand_fd_release() will set object to close.
+ */
+ if (opcode == CACHEFILES_OP_OPEN)
+ cachefiles_ondemand_set_object_close(object);
kfree(req);
return ret;
}
void *private)
{
struct cachefiles_object *object = req->object;
- int object_id = object->ondemand_id;
- /*
- * It's possible that object id is still 0 if the cookie looking up
- * phase failed before OPEN request has ever been sent. Also avoid
- * sending CLOSE request for CACHEFILES_ONDEMAND_ID_CLOSED, which means
- * anon_fd has already been closed.
- */
- if (object_id <= 0)
+ if (!cachefiles_ondemand_object_is_open(object))
return -ENOENT;
- req->msg.object_id = object_id;
trace_cachefiles_ondemand_close(object, &req->msg);
return 0;
}
struct cachefiles_object *object = req->object;
struct cachefiles_read *load = (void *)req->msg.data;
struct cachefiles_read_ctx *read_ctx = private;
- int object_id = object->ondemand_id;
-
- /* Stop enqueuing requests when daemon has closed anon_fd. */
- if (object_id <= 0) {
- WARN_ON_ONCE(object_id == 0);
- pr_info_once("READ: anonymous fd closed prematurely.\n");
- return -EIO;
- }
- req->msg.object_id = object_id;
load->off = read_ctx->off;
load->len = read_ctx->len;
trace_cachefiles_ondemand_read(object, &req->msg, load);
* creating a new tmpfile as the cache file. Reuse the previously
* allocated object ID if any.
*/
- if (object->ondemand_id > 0)
+ if (cachefiles_ondemand_object_is_open(object))
return 0;
volume_key_size = volume->key[0] + 1;
cachefiles_ondemand_init_close_req, NULL);
}
+int cachefiles_ondemand_init_obj_info(struct cachefiles_object *object,
+ struct cachefiles_volume *volume)
+{
+ if (!cachefiles_in_ondemand_mode(volume->cache))
+ return 0;
+
+ object->ondemand = kzalloc(sizeof(struct cachefiles_ondemand_info),
+ GFP_KERNEL);
+ if (!object->ondemand)
+ return -ENOMEM;
+
+ object->ondemand->object = object;
+ INIT_WORK(&object->ondemand->ondemand_work, ondemand_object_worker);
+ return 0;
+}
+
+void cachefiles_ondemand_deinit_obj_info(struct cachefiles_object *object)
+{
+ kfree(object->ondemand);
+ object->ondemand = NULL;
+}
+
int cachefiles_ondemand_read(struct cachefiles_object *object,
loff_t pos, size_t len)
{
#include <linux/falloc.h>
#include <linux/iversion.h>
#include <linux/ktime.h>
+#include <linux/splice.h>
#include "super.h"
#include "mds_client.h"
* {read,write}_iter, which will get caps again.
*/
put_rd_wr_caps(src_ci, src_got, dst_ci, dst_got);
- ret = do_splice_direct(src_file, &src_off, dst_file,
- &dst_off, src_objlen, flags);
+ ret = splice_file_range(src_file, &src_off, dst_file, &dst_off,
+ src_objlen);
/* Abort on short copies or on error */
if (ret < (long)src_objlen) {
doutc(cl, "Failed partial copy (%zd)\n", ret);
*/
if (len && (len < src_ci->i_layout.object_size)) {
doutc(cl, "Final partial copy of %zu bytes\n", len);
- bytes = do_splice_direct(src_file, &src_off, dst_file,
- &dst_off, len, flags);
+ bytes = splice_file_range(src_file, &src_off, dst_file,
+ &dst_off, len);
if (bytes > 0)
ret += bytes;
else
len, flags);
if (ret == -EOPNOTSUPP || ret == -EXDEV)
- ret = generic_copy_file_range(src_file, src_off, dst_file,
- dst_off, len, flags);
+ ret = splice_copy_file_range(src_file, src_off, dst_file,
+ dst_off, len);
return ret;
}
if (ret)
goto finish_write;
- file_start_write(host_file);
inode_lock(coda_inode);
ret = vfs_iter_write(cfi->cfi_container, to, &iocb->ki_pos, 0);
coda_inode->i_size = file_inode(host_file)->i_size;
coda_inode->i_blocks = (coda_inode->i_size + 511) >> 9;
inode_set_mtime_to_ts(coda_inode, inode_set_ctime_current(coda_inode));
inode_unlock(coda_inode);
- file_end_write(host_file);
finish_write:
venus_access_intent(coda_inode->i_sb, coda_i2f(coda_inode),
/* zero the edges if srcmap is a HOLE or IOMAP_UNWRITTEN */
bool zero_edge = srcmap->flags & IOMAP_F_SHARED ||
srcmap->type == IOMAP_UNWRITTEN;
- void *saddr = 0;
+ void *saddr = NULL;
int ret = 0;
if (!zero_edge) {
~DEBUGFS_FSDATA_IS_REAL_FOPS_BIT);
refcount_set(&fsd->active_users, 1);
init_completion(&fsd->active_users_drained);
+ INIT_LIST_HEAD(&fsd->cancellations);
+ mutex_init(&fsd->cancellations_mtx);
+
if (cmpxchg(&dentry->d_fsdata, d_fsd, fsd) != d_fsd) {
+ mutex_destroy(&fsd->cancellations_mtx);
kfree(fsd);
fsd = READ_ONCE(dentry->d_fsdata);
}
- INIT_LIST_HEAD(&fsd->cancellations);
- mutex_init(&fsd->cancellations_mtx);
}
/*
loff_t offset = iocb->ki_pos;
const loff_t end = offset + count;
struct dio *dio;
- struct dio_submit sdio = { 0, };
+ struct dio_submit sdio = { NULL, };
struct buffer_head map_bh = { 0, };
struct blk_plug plug;
unsigned long align = offset | iov_iter_alignment(iter);
int id;
};
-__u64 eventfd_signal_mask(struct eventfd_ctx *ctx, __u64 n, __poll_t mask)
+/**
+ * eventfd_signal_mask - Increment the event counter
+ * @ctx: [in] Pointer to the eventfd context.
+ * @mask: [in] poll mask
+ *
+ * This function is supposed to be called by the kernel in paths that do not
+ * allow sleeping. In this function we allow the counter to reach the ULLONG_MAX
+ * value, and we signal this as overflow condition by returning a EPOLLERR
+ * to poll(2).
+ */
+void eventfd_signal_mask(struct eventfd_ctx *ctx, __poll_t mask)
{
unsigned long flags;
* safe context.
*/
if (WARN_ON_ONCE(current->in_eventfd))
- return 0;
+ return;
spin_lock_irqsave(&ctx->wqh.lock, flags);
current->in_eventfd = 1;
- if (ULLONG_MAX - ctx->count < n)
- n = ULLONG_MAX - ctx->count;
- ctx->count += n;
+ if (ctx->count < ULLONG_MAX)
+ ctx->count++;
if (waitqueue_active(&ctx->wqh))
wake_up_locked_poll(&ctx->wqh, EPOLLIN | mask);
current->in_eventfd = 0;
spin_unlock_irqrestore(&ctx->wqh.lock, flags);
-
- return n;
-}
-
-/**
- * eventfd_signal - Adds @n to the eventfd counter.
- * @ctx: [in] Pointer to the eventfd context.
- * @n: [in] Value of the counter to be added to the eventfd internal counter.
- * The value cannot be negative.
- *
- * This function is supposed to be called by the kernel in paths that do not
- * allow sleeping. In this function we allow the counter to reach the ULLONG_MAX
- * value, and we signal this as overflow condition by returning a EPOLLERR
- * to poll(2).
- *
- * Returns the amount by which the counter was incremented. This will be less
- * than @n if the counter has overflowed.
- */
-__u64 eventfd_signal(struct eventfd_ctx *ctx, __u64 n)
-{
- return eventfd_signal_mask(ctx, n, 0);
}
-EXPORT_SYMBOL_GPL(eventfd_signal);
+EXPORT_SYMBOL_GPL(eventfd_signal_mask);
static void eventfd_free_ctx(struct eventfd_ctx *ctx)
{
if (ctx->id >= 0)
- ida_simple_remove(&eventfd_ida, ctx->id);
+ ida_free(&eventfd_ida, ctx->id);
kfree(ctx);
}
init_waitqueue_head(&ctx->wqh);
ctx->count = count;
ctx->flags = flags;
- ctx->id = ida_simple_get(&eventfd_ida, 0, 0, GFP_KERNEL);
+ ctx->id = ida_alloc(&eventfd_ida, GFP_KERNEL);
flags &= EFD_SHARED_FCNTL_FLAGS;
flags |= O_RDWR;
* We need to pick up the new inode size which generic_commit_write gave us
* `file' can be NULL - eg, when called from page_symlink().
*
- * ext4 never places buffers on inode->i_mapping->private_list. metadata
+ * ext4 never places buffers on inode->i_mapping->i_private_list. metadata
* buffers are managed internally.
*/
static int ext4_write_end(struct file *file,
}
/* Any metadata buffers to write? */
- if (!list_empty(&inode->i_mapping->private_list))
+ if (!list_empty(&inode->i_mapping->i_private_list))
return true;
return inode->i_state & I_DIRTY_DATASYNC;
}
switch (flags) {
case EXT4_GOING_FLAGS_DEFAULT:
- ret = freeze_bdev(sb->s_bdev);
+ ret = bdev_freeze(sb->s_bdev);
if (ret)
return ret;
set_bit(EXT4_FLAGS_SHUTDOWN, &sbi->s_ext4_flags);
- thaw_bdev(sb->s_bdev);
+ bdev_thaw(sb->s_bdev);
break;
case EXT4_GOING_FLAGS_LOGFLUSH:
set_bit(EXT4_FLAGS_SHUTDOWN, &sbi->s_ext4_flags);
struct ext4_super_block *es;
int errno;
- /* see get_tree_bdev why this is needed and safe */
- up_write(&sb->s_umount);
- bdev_handle = bdev_open_by_dev(j_dev, BLK_OPEN_READ | BLK_OPEN_WRITE,
- sb, &fs_holder_ops);
- down_write(&sb->s_umount);
+ bdev_handle = bdev_open_by_dev(j_dev,
+ BLK_OPEN_READ | BLK_OPEN_WRITE | BLK_OPEN_RESTRICT_WRITES,
+ sb, &fs_holder_ops);
if (IS_ERR(bdev_handle)) {
ext4_msg(sb, KERN_ERR,
"failed to open journal device unknown-block(%u,%u) %ld",
switch (in) {
case F2FS_GOING_DOWN_FULLSYNC:
- ret = freeze_bdev(sb->s_bdev);
+ ret = bdev_freeze(sb->s_bdev);
if (ret)
goto out;
f2fs_stop_checkpoint(sbi, false, STOP_CP_REASON_SHUTDOWN);
- thaw_bdev(sb->s_bdev);
+ bdev_thaw(sb->s_bdev);
break;
case F2FS_GOING_DOWN_METASYNC:
/* do checkpoint only */
EXPORT_SYMBOL(fd_install);
/**
- * pick_file - return file associatd with fd
+ * file_close_fd_locked - return file associated with fd
* @files: file struct to retrieve file from
* @fd: file descriptor to retrieve file for
*
+ * Doesn't take a separate reference count.
+ *
* Context: files_lock must be held.
*
* Returns: The file associated with @fd (NULL if @fd is not open)
*/
-static struct file *pick_file(struct files_struct *files, unsigned fd)
+struct file *file_close_fd_locked(struct files_struct *files, unsigned fd)
{
struct fdtable *fdt = files_fdtable(files);
struct file *file;
+ lockdep_assert_held(&files->file_lock);
+
if (fd >= fdt->max_fds)
return NULL;
struct file *file;
spin_lock(&files->file_lock);
- file = pick_file(files, fd);
+ file = file_close_fd_locked(files, fd);
spin_unlock(&files->file_lock);
if (!file)
return -EBADF;
max_fd = min(max_fd, n);
for (; fd <= max_fd; fd++) {
- file = pick_file(files, fd);
+ file = file_close_fd_locked(files, fd);
if (file) {
spin_unlock(&files->file_lock);
filp_close(file, files);
return 0;
}
-/*
- * See close_fd_get_file() below, this variant assumes current->files->file_lock
- * is held.
- */
-struct file *__close_fd_get_file(unsigned int fd)
-{
- return pick_file(current->files, fd);
-}
-
-/*
- * variant of close_fd that gets a ref on the file for later fput.
- * The caller must ensure that filp_close() called on the file.
+/**
+ * file_close_fd - return file associated with fd
+ * @fd: file descriptor to retrieve file for
+ *
+ * Doesn't take a separate reference count.
+ *
+ * Returns: The file associated with @fd (NULL if @fd is not open)
*/
-struct file *close_fd_get_file(unsigned int fd)
+struct file *file_close_fd(unsigned int fd)
{
struct files_struct *files = current->files;
struct file *file;
spin_lock(&files->file_lock);
- file = pick_file(files, fd);
+ file = file_close_fd_locked(files, fd);
spin_unlock(&files->file_lock);
return file;
struct file *file;
struct fdtable *fdt = rcu_dereference_raw(files->fdt);
struct file __rcu **fdentry;
+ unsigned long nospec_mask;
- if (unlikely(fd >= fdt->max_fds))
- return NULL;
-
- fdentry = fdt->fd + array_index_nospec(fd, fdt->max_fds);
+ /* Mask is a 0 for invalid fd's, ~0 for valid ones */
+ nospec_mask = array_index_mask_nospec(fd, fdt->max_fds);
/*
- * Ok, we have a file pointer. However, because we do
- * this all locklessly under RCU, we may be racing with
- * that file being closed.
- *
- * Such a race can take two forms:
- *
- * (a) the file ref already went down to zero and the
- * file hasn't been reused yet or the file count
- * isn't zero but the file has already been reused.
+ * fdentry points to the 'fd' offset, or fdt->fd[0].
+ * Loading from fdt->fd[0] is always safe, because the
+ * array always exists.
*/
- file = __get_file_rcu(fdentry);
+ fdentry = fdt->fd + (fd & nospec_mask);
+
+ /* Do the load, then mask any invalid result */
+ file = rcu_dereference_raw(*fdentry);
+ file = (void *)(nospec_mask & (unsigned long)file);
if (unlikely(!file))
return NULL;
- if (unlikely(IS_ERR(file)))
+ /*
+ * Ok, we have a file pointer that was valid at
+ * some point, but it might have become stale since.
+ *
+ * We need to confirm it by incrementing the refcount
+ * and then check the lookup again.
+ *
+ * atomic_long_inc_not_zero() gives us a full memory
+ * barrier. We only really need an 'acquire' one to
+ * protect the loads below, but we don't have that.
+ */
+ if (unlikely(!atomic_long_inc_not_zero(&file->f_count)))
continue;
/*
+ * Such a race can take two forms:
+ *
+ * (a) the file ref already went down to zero and the
+ * file hasn't been reused yet or the file count
+ * isn't zero but the file has already been reused.
+ *
* (b) the file table entry has changed under us.
* Note that we don't need to re-check the 'fdt->fd'
* pointer having changed, because it always goes
*
* If so, we need to put our ref and try again.
*/
- if (unlikely(rcu_dereference_raw(files->fdt) != fdt)) {
+ if (unlikely(file != rcu_dereference_raw(*fdentry)) ||
+ unlikely(rcu_dereference_raw(files->fdt) != fdt)) {
fput(file);
continue;
}
* atomic_read_acquire() pairs with atomic_dec_and_test() in
* put_files_struct().
*/
- if (atomic_read_acquire(&files->count) == 1) {
+ if (likely(atomic_read_acquire(&files->count) == 1)) {
file = files_lookup_fd_raw(files, fd);
if (!file || unlikely(file->f_mode & mask))
return 0;
return (unsigned long)file;
} else {
- file = __fget(fd, mask);
+ file = __fget_files(files, fd, mask);
if (!file)
return 0;
return FDPUT_FPUT | (unsigned long)file;
}
/**
- * __receive_fd() - Install received file into file descriptor table
+ * receive_fd() - Install received file into file descriptor table
* @file: struct file that was received from another process
* @ufd: __user pointer to write new fd number to
* @o_flags: the O_* flags to apply to the new fd entry
*
* Returns newly install fd or -ve on error.
*/
-int __receive_fd(struct file *file, int __user *ufd, unsigned int o_flags)
+int receive_fd(struct file *file, int __user *ufd, unsigned int o_flags)
{
int new_fd;
int error;
__receive_sock(file);
return new_fd;
}
+EXPORT_SYMBOL_GPL(receive_fd);
int receive_fd_replace(int new_fd, struct file *file, unsigned int o_flags)
{
return new_fd;
}
-int receive_fd(struct file *file, unsigned int o_flags)
-{
- return __receive_fd(file, NULL, o_flags);
-}
-EXPORT_SYMBOL_GPL(receive_fd);
-
static int ksys_dup3(unsigned int oldfd, unsigned int newfd, int flags)
{
int err = -EBADF;
}
}
-void release_empty_file(struct file *f)
-{
- WARN_ON_ONCE(f->f_mode & (FMODE_BACKING | FMODE_OPENED));
- if (atomic_long_dec_and_test(&f->f_count)) {
- security_file_free(f);
- put_cred(f->f_cred);
- if (likely(!(f->f_mode & FMODE_NOACCOUNT)))
- percpu_counter_dec(&nr_files);
- kmem_cache_free(filp_cachep, f);
- }
-}
-
/*
* Return the total number of open files in the system
*/
static void ____fput(struct callback_head *work)
{
- __fput(container_of(work, struct file, f_rcuhead));
+ __fput(container_of(work, struct file, f_task_work));
}
/*
if (atomic_long_dec_and_test(&file->f_count)) {
struct task_struct *task = current;
+ if (unlikely(!(file->f_mode & (FMODE_BACKING | FMODE_OPENED)))) {
+ file_free(file);
+ return;
+ }
if (likely(!in_interrupt() && !(task->flags & PF_KTHREAD))) {
- init_task_work(&file->f_rcuhead, ____fput);
- if (!task_work_add(task, &file->f_rcuhead, TWA_RESUME))
+ init_task_work(&file->f_task_work, ____fput);
+ if (!task_work_add(task, &file->f_task_work, TWA_RESUME))
return;
/*
* After this task has run exit_task_work(),
#include <linux/uio.h>
#include <linux/fs.h>
#include <linux/filelock.h>
+#include <linux/splice.h>
static int fuse_send_open(struct fuse_mount *fm, u64 nodeid,
unsigned int open_flags, int opcode,
len, flags);
if (ret == -EOPNOTSUPP || ret == -EXDEV)
- ret = generic_copy_file_range(src_file, src_off, dst_file,
- dst_off, len, flags);
+ ret = splice_copy_file_range(src_file, src_off, dst_file,
+ dst_off, len);
return ret;
}
mapping->host = s->s_bdev->bd_inode;
mapping->flags = 0;
mapping_set_gfp_mask(mapping, GFP_NOFS);
- mapping->private_data = NULL;
+ mapping->i_private_data = NULL;
mapping->writeback_index = 0;
}
mapping->host = sb->s_bdev->bd_inode;
mapping->flags = 0;
mapping_set_gfp_mask(mapping, GFP_NOFS);
- mapping->private_data = NULL;
+ mapping->i_private_data = NULL;
mapping->writeback_index = 0;
spin_lock_init(&sdp->sd_log_lock);
* @sector: block to read or write, for blocks of HFSPLUS_SECTOR_SIZE bytes
* @buf: buffer for I/O
* @data: output pointer for location of requested data
- * @op: direction of I/O
- * @op_flags: request op flags
+ * @opf: request op flags
*
* The unit of I/O is hfsplus_min_io_size(sb), which may be bigger than
* HFSPLUS_SECTOR_SIZE, and @buf must be sized accordingly. On reads
* that starts at the rounded-down address. As long as the data was
* read using hfsplus_submit_bio() and the same buffer is used things
* will work correctly.
+ *
+ * Returns: %0 on success else -errno code
*/
int hfsplus_submit_bio(struct super_block *sb, sector_t sector,
void *buf, void **data, blk_opf_t opf)
* at inode creation time. If this is a device special inode,
* i_mapping may not point to the original address space.
*/
- resv_map = (struct resv_map *)(&inode->i_data)->private_data;
+ resv_map = (struct resv_map *)(&inode->i_data)->i_private_data;
/* Only regular and link inodes have associated reserve maps */
if (resv_map)
resv_map_release(&resv_map->refs);
&hugetlbfs_i_mmap_rwsem_key);
inode->i_mapping->a_ops = &hugetlbfs_aops;
simple_inode_init_ts(inode);
- inode->i_mapping->private_data = resv_map;
+ inode->i_mapping->i_private_data = resv_map;
info->seals = F_SEAL_SEAL;
switch (mode & S_IFMT) {
default:
atomic_set(&mapping->nr_thps, 0);
#endif
mapping_set_gfp_mask(mapping, GFP_HIGHUSER_MOVABLE);
- mapping->private_data = NULL;
+ mapping->i_private_data = NULL;
mapping->writeback_index = 0;
init_rwsem(&mapping->invalidate_lock);
lockdep_set_class_and_name(&mapping->invalidate_lock,
{
xa_init_flags(&mapping->i_pages, XA_FLAGS_LOCK_IRQ | XA_FLAGS_ACCOUNT);
init_rwsem(&mapping->i_mmap_rwsem);
- INIT_LIST_HEAD(&mapping->private_list);
- spin_lock_init(&mapping->private_lock);
+ INIT_LIST_HEAD(&mapping->i_private_list);
+ spin_lock_init(&mapping->i_private_lock);
mapping->i_mmap = RB_ROOT_CACHED;
}
* nor even WARN_ON(!mapping_empty).
*/
xa_unlock_irq(&inode->i_data.i_pages);
- BUG_ON(!list_empty(&inode->i_data.private_list));
+ BUG_ON(!list_empty(&inode->i_data.i_private_list));
BUG_ON(!(inode->i_state & I_FREEING));
BUG_ON(inode->i_state & I_CLEAR);
BUG_ON(!list_empty(&inode->i_wb_list));
* earlier than or equal to either the ctime or mtime,
* or if at least a day has passed since the last atime update.
*/
-static int relatime_need_update(struct vfsmount *mnt, struct inode *inode,
+static bool relatime_need_update(struct vfsmount *mnt, struct inode *inode,
struct timespec64 now)
{
struct timespec64 atime, mtime, ctime;
if (!(mnt->mnt_flags & MNT_RELATIME))
- return 1;
+ return true;
/*
* Is mtime younger than or equal to atime? If yes, update atime:
*/
atime = inode_get_atime(inode);
mtime = inode_get_mtime(inode);
if (timespec64_compare(&mtime, &atime) >= 0)
- return 1;
+ return true;
/*
* Is ctime younger than or equal to atime? If yes, update atime:
*/
ctime = inode_get_ctime(inode);
if (timespec64_compare(&ctime, &atime) >= 0)
- return 1;
+ return true;
/*
* Is the previous atime value older than a day? If yes,
* update atime:
*/
if ((long)(now.tv_sec - atime.tv_sec) >= 24*60*60)
- return 1;
+ return true;
/*
* Good, we can skip the atime update:
*/
- return 0;
+ return false;
}
/**
* the vfsmount must be passed through @idmap. This function will then take
* care to map the inode according to @idmap before checking permissions.
* On non-idmapped mounts or if permission checking is to be performed on the
- * raw inode simply passs @nop_mnt_idmap.
+ * raw inode simply pass @nop_mnt_idmap.
*/
bool inode_owner_or_capable(struct mnt_idmap *idmap,
const struct inode *inode)
const char *type_page, unsigned long flags, void *data_page);
int path_umount(struct path *path, int flags);
+int show_path(struct seq_file *m, struct dentry *root);
+
/*
* fs_struct.c
*/
struct file *alloc_empty_file(int flags, const struct cred *cred);
struct file *alloc_empty_file_noaccount(int flags, const struct cred *cred);
struct file *alloc_empty_backing_file(int flags, const struct cred *cred);
-void release_empty_file(struct file *f);
static inline void file_put_write_access(struct file *file)
{
const char *, const struct open_flags *);
extern struct open_how build_open_how(int flags, umode_t mode);
extern int build_open_flags(const struct open_how *how, struct open_flags *op);
-extern struct file *__close_fd_get_file(unsigned int fd);
+struct file *file_close_fd_locked(struct files_struct *files, unsigned fd);
long do_sys_ftruncate(unsigned int fd, loff_t length, int small);
int chmod_common(const struct path *path, umode_t mode);
/*
* fs/splice.c:
*/
-long splice_file_to_pipe(struct file *in,
- struct pipe_inode_info *opipe,
- loff_t *offset,
- size_t len, unsigned int flags);
+ssize_t splice_file_to_pipe(struct file *in,
+ struct pipe_inode_info *opipe,
+ loff_t *offset,
+ size_t len, unsigned int flags);
/*
* fs/xattr.c:
#include "internal.h"
+/*
+ * Outside of this file vfs{g,u}id_t are always created from k{g,u}id_t,
+ * never from raw values. These are just internal helpers.
+ */
+#define VFSUIDT_INIT_RAW(val) (vfsuid_t){ val }
+#define VFSGIDT_INIT_RAW(val) (vfsgid_t){ val }
+
struct mnt_idmap {
- struct user_namespace *owner;
+ struct uid_gid_map uid_map;
+ struct uid_gid_map gid_map;
refcount_t count;
};
* mapped to {g,u}id 1, [...], {g,u}id 1000 to {g,u}id 1000, [...].
*/
struct mnt_idmap nop_mnt_idmap = {
- .owner = &init_user_ns,
.count = REFCOUNT_INIT(1),
};
EXPORT_SYMBOL_GPL(nop_mnt_idmap);
-/**
- * check_fsmapping - check whether an mount idmapping is allowed
- * @idmap: idmap of the relevent mount
- * @sb: super block of the filesystem
- *
- * Return: true if @idmap is allowed, false if not.
- */
-bool check_fsmapping(const struct mnt_idmap *idmap,
- const struct super_block *sb)
-{
- return idmap->owner != sb->s_user_ns;
-}
-
/**
* initial_idmapping - check whether this is the initial mapping
* @ns: idmapping to check
return ns == &init_user_ns;
}
-/**
- * no_idmapping - check whether we can skip remapping a kuid/gid
- * @mnt_userns: the mount's idmapping
- * @fs_userns: the filesystem's idmapping
- *
- * This function can be used to check whether a remapping between two
- * idmappings is required.
- * An idmapped mount is a mount that has an idmapping attached to it that
- * is different from the filsystem's idmapping and the initial idmapping.
- * If the initial mapping is used or the idmapping of the mount and the
- * filesystem are identical no remapping is required.
- *
- * Return: true if remapping can be skipped, false if not.
- */
-static inline bool no_idmapping(const struct user_namespace *mnt_userns,
- const struct user_namespace *fs_userns)
-{
- return initial_idmapping(mnt_userns) || mnt_userns == fs_userns;
-}
-
/**
* make_vfsuid - map a filesystem kuid according to an idmapping
* @idmap: the mount's idmapping
* Take a @kuid and remap it from @fs_userns into @idmap. Use this
* function when preparing a @kuid to be reported to userspace.
*
- * If no_idmapping() determines that this is not an idmapped mount we can
- * simply return @kuid unchanged.
+ * If initial_idmapping() determines that this is not an idmapped mount
+ * we can simply return @kuid unchanged.
* If initial_idmapping() tells us that the filesystem is not mounted with an
* idmapping we know the value of @kuid won't change when calling
* from_kuid() so we can simply retrieve the value via __kuid_val()
*/
vfsuid_t make_vfsuid(struct mnt_idmap *idmap,
- struct user_namespace *fs_userns,
- kuid_t kuid)
+ struct user_namespace *fs_userns,
+ kuid_t kuid)
{
uid_t uid;
- struct user_namespace *mnt_userns = idmap->owner;
- if (no_idmapping(mnt_userns, fs_userns))
+ if (idmap == &nop_mnt_idmap)
return VFSUIDT_INIT(kuid);
if (initial_idmapping(fs_userns))
uid = __kuid_val(kuid);
uid = from_kuid(fs_userns, kuid);
if (uid == (uid_t)-1)
return INVALID_VFSUID;
- return VFSUIDT_INIT(make_kuid(mnt_userns, uid));
+ return VFSUIDT_INIT_RAW(map_id_down(&idmap->uid_map, uid));
}
EXPORT_SYMBOL_GPL(make_vfsuid);
* Take a @kgid and remap it from @fs_userns into @idmap. Use this
* function when preparing a @kgid to be reported to userspace.
*
- * If no_idmapping() determines that this is not an idmapped mount we can
- * simply return @kgid unchanged.
+ * If initial_idmapping() determines that this is not an idmapped mount
+ * we can simply return @kgid unchanged.
* If initial_idmapping() tells us that the filesystem is not mounted with an
* idmapping we know the value of @kgid won't change when calling
* from_kgid() so we can simply retrieve the value via __kgid_val()
struct user_namespace *fs_userns, kgid_t kgid)
{
gid_t gid;
- struct user_namespace *mnt_userns = idmap->owner;
- if (no_idmapping(mnt_userns, fs_userns))
+ if (idmap == &nop_mnt_idmap)
return VFSGIDT_INIT(kgid);
if (initial_idmapping(fs_userns))
gid = __kgid_val(kgid);
gid = from_kgid(fs_userns, kgid);
if (gid == (gid_t)-1)
return INVALID_VFSGID;
- return VFSGIDT_INIT(make_kgid(mnt_userns, gid));
+ return VFSGIDT_INIT_RAW(map_id_down(&idmap->gid_map, gid));
}
EXPORT_SYMBOL_GPL(make_vfsgid);
struct user_namespace *fs_userns, vfsuid_t vfsuid)
{
uid_t uid;
- struct user_namespace *mnt_userns = idmap->owner;
- if (no_idmapping(mnt_userns, fs_userns))
+ if (idmap == &nop_mnt_idmap)
return AS_KUIDT(vfsuid);
- uid = from_kuid(mnt_userns, AS_KUIDT(vfsuid));
+ uid = map_id_up(&idmap->uid_map, __vfsuid_val(vfsuid));
if (uid == (uid_t)-1)
return INVALID_UID;
if (initial_idmapping(fs_userns))
struct user_namespace *fs_userns, vfsgid_t vfsgid)
{
gid_t gid;
- struct user_namespace *mnt_userns = idmap->owner;
- if (no_idmapping(mnt_userns, fs_userns))
+ if (idmap == &nop_mnt_idmap)
return AS_KGIDT(vfsgid);
- gid = from_kgid(mnt_userns, AS_KGIDT(vfsgid));
+ gid = map_id_up(&idmap->gid_map, __vfsgid_val(vfsgid));
if (gid == (gid_t)-1)
return INVALID_GID;
if (initial_idmapping(fs_userns))
#endif
EXPORT_SYMBOL_GPL(vfsgid_in_group_p);
+static int copy_mnt_idmap(struct uid_gid_map *map_from,
+ struct uid_gid_map *map_to)
+{
+ struct uid_gid_extent *forward, *reverse;
+ u32 nr_extents = READ_ONCE(map_from->nr_extents);
+ /* Pairs with smp_wmb() when writing the idmapping. */
+ smp_rmb();
+
+ /*
+ * Don't blindly copy @map_to into @map_from if nr_extents is
+ * smaller or equal to UID_GID_MAP_MAX_BASE_EXTENTS. Since we
+ * read @nr_extents someone could have written an idmapping and
+ * then we might end up with inconsistent data. So just don't do
+ * anything at all.
+ */
+ if (nr_extents == 0)
+ return 0;
+
+ /*
+ * Here we know that nr_extents is greater than zero which means
+ * a map has been written. Since idmappings can't be changed
+ * once they have been written we know that we can safely copy
+ * from @map_to into @map_from.
+ */
+
+ if (nr_extents <= UID_GID_MAP_MAX_BASE_EXTENTS) {
+ *map_to = *map_from;
+ return 0;
+ }
+
+ forward = kmemdup(map_from->forward,
+ nr_extents * sizeof(struct uid_gid_extent),
+ GFP_KERNEL_ACCOUNT);
+ if (!forward)
+ return -ENOMEM;
+
+ reverse = kmemdup(map_from->reverse,
+ nr_extents * sizeof(struct uid_gid_extent),
+ GFP_KERNEL_ACCOUNT);
+ if (!reverse) {
+ kfree(forward);
+ return -ENOMEM;
+ }
+
+ /*
+ * The idmapping isn't exposed anywhere so we don't need to care
+ * about ordering between extent pointers and @nr_extents
+ * initialization.
+ */
+ map_to->forward = forward;
+ map_to->reverse = reverse;
+ map_to->nr_extents = nr_extents;
+ return 0;
+}
+
+static void free_mnt_idmap(struct mnt_idmap *idmap)
+{
+ if (idmap->uid_map.nr_extents > UID_GID_MAP_MAX_BASE_EXTENTS) {
+ kfree(idmap->uid_map.forward);
+ kfree(idmap->uid_map.reverse);
+ }
+ if (idmap->gid_map.nr_extents > UID_GID_MAP_MAX_BASE_EXTENTS) {
+ kfree(idmap->gid_map.forward);
+ kfree(idmap->gid_map.reverse);
+ }
+ kfree(idmap);
+}
+
struct mnt_idmap *alloc_mnt_idmap(struct user_namespace *mnt_userns)
{
struct mnt_idmap *idmap;
+ int ret;
idmap = kzalloc(sizeof(struct mnt_idmap), GFP_KERNEL_ACCOUNT);
if (!idmap)
return ERR_PTR(-ENOMEM);
- idmap->owner = get_user_ns(mnt_userns);
refcount_set(&idmap->count, 1);
+ ret = copy_mnt_idmap(&mnt_userns->uid_map, &idmap->uid_map);
+ if (!ret)
+ ret = copy_mnt_idmap(&mnt_userns->gid_map, &idmap->gid_map);
+ if (ret) {
+ free_mnt_idmap(idmap);
+ idmap = ERR_PTR(ret);
+ }
return idmap;
}
*/
void mnt_idmap_put(struct mnt_idmap *idmap)
{
- if (idmap != &nop_mnt_idmap && refcount_dec_and_test(&idmap->count)) {
- put_user_ns(idmap->owner);
- kfree(idmap);
- }
+ if (idmap != &nop_mnt_idmap && refcount_dec_and_test(&idmap->count))
+ free_mnt_idmap(idmap);
}
EXPORT_SYMBOL_GPL(mnt_idmap_put);
struct mnt_namespace {
struct ns_common ns;
struct mount * root;
- /*
- * Traversal and modification of .list is protected by either
- * - taking namespace_sem for write, OR
- * - taking namespace_sem for read AND taking .ns_lock.
- */
- struct list_head list;
- spinlock_t ns_lock;
+ struct rb_root mounts; /* Protected by namespace_sem */
struct user_namespace *user_ns;
struct ucounts *ucounts;
u64 seq; /* Sequence number to prevent loops */
wait_queue_head_t poll;
u64 event;
- unsigned int mounts; /* # of mounts in the namespace */
+ unsigned int nr_mounts; /* # of mounts in the namespace */
unsigned int pending_mounts;
} __randomize_layout;
struct list_head mnt_child; /* and going through their mnt_child */
struct list_head mnt_instance; /* mount instance on sb->s_mounts */
const char *mnt_devname; /* Name of device e.g. /dev/dsk/hda1 */
- struct list_head mnt_list;
+ union {
+ struct rb_node mnt_node; /* Under ns->mounts */
+ struct list_head mnt_list;
+ };
struct list_head mnt_expire; /* link in fs-specific expiry list */
struct list_head mnt_share; /* circular list of shared mounts */
struct list_head mnt_slave_list;/* list of slave mounts */
struct fsnotify_mark_connector __rcu *mnt_fsnotify_marks;
__u32 mnt_fsnotify_mask;
#endif
- int mnt_id; /* mount identifier */
+ int mnt_id; /* mount identifier, reused */
+ u64 mnt_id_unique; /* mount ID unique until reboot */
int mnt_group_id; /* peer group identifier */
int mnt_expiry_mark; /* true if marked for expiry */
struct hlist_head mnt_pins;
struct mnt_namespace *ns;
struct path root;
int (*show)(struct seq_file *, struct vfsmount *);
- struct mount cursor;
};
extern const struct seq_operations mounts_op;
return ns->seq == 0;
}
+static inline void move_from_ns(struct mount *mnt, struct list_head *dt_list)
+{
+ WARN_ON(!(mnt->mnt.mnt_flags & MNT_ONRB));
+ mnt->mnt.mnt_flags &= ~MNT_ONRB;
+ rb_erase(&mnt->mnt_node, &mnt->mnt_ns->mounts);
+ list_add_tail(&mnt->mnt_list, dt_list);
+}
+
extern void mnt_cursor_del(struct mnt_namespace *ns, struct mount *cursor);
* the vfsmount must be passed through @idmap. This function will then take
* care to map the inode according to @idmap before checking permissions.
* On non-idmapped mounts or if permission checking is to be performed on the
- * raw inode simply passs @nop_mnt_idmap.
+ * raw inode simply pass @nop_mnt_idmap.
*/
static int check_acl(struct mnt_idmap *idmap,
struct inode *inode, int mask)
* the vfsmount must be passed through @idmap. This function will then take
* care to map the inode according to @idmap before checking permissions.
* On non-idmapped mounts or if permission checking is to be performed on the
- * raw inode simply passs @nop_mnt_idmap.
+ * raw inode simply pass @nop_mnt_idmap.
*/
static int acl_permission_check(struct mnt_idmap *idmap,
struct inode *inode, int mask)
* the vfsmount must be passed through @idmap. This function will then take
* care to map the inode according to @idmap before checking permissions.
* On non-idmapped mounts or if permission checking is to be performed on the
- * raw inode simply passs @nop_mnt_idmap.
+ * raw inode simply pass @nop_mnt_idmap.
*/
int generic_permission(struct mnt_idmap *idmap, struct inode *inode,
int mask)
return PTR_ERR(step_into(nd, WALK_NOFOLLOW, nd->path.dentry));
}
-/* Returns 0 and nd will be valid on success; Retuns error, otherwise. */
+/* Returns 0 and nd will be valid on success; Returns error, otherwise. */
static int path_lookupat(struct nameidata *nd, unsigned flags, struct path *path)
{
const char *s = path_init(nd, flags);
return retval;
}
-/* Returns 0 and nd will be valid on success; Retuns error, otherwise. */
+/* Returns 0 and nd will be valid on success; Returns error, otherwise. */
static int path_parentat(struct nameidata *nd, unsigned flags,
struct path *parent)
{
* the vfsmount must be passed through @idmap. This function will then take
* care to map the inode according to @idmap before checking permissions.
* On non-idmapped mounts or if permission checking is to be performed on the
- * raw inode simply passs @nop_mnt_idmap.
+ * raw inode simply pass @nop_mnt_idmap.
*/
int vfs_create(struct mnt_idmap *idmap, struct inode *dir,
struct dentry *dentry, umode_t mode, bool want_excl)
* the vfsmount must be passed through @idmap. This function will then take
* care to map the inode according to @idmap before checking permissions.
* On non-idmapped mounts or if permission checking is to be performed on the
- * raw inode simply passs @nop_mnt_idmap.
+ * raw inode simply pass @nop_mnt_idmap.
*/
static int vfs_tmpfile(struct mnt_idmap *idmap,
const struct path *parentpath,
WARN_ON(1);
error = -EINVAL;
}
- if (unlikely(file->f_mode & FMODE_OPENED))
- fput(file);
- else
- release_empty_file(file);
+ fput(file);
if (error == -EOPENSTALE) {
if (flags & LOOKUP_RCU)
error = -ECHILD;
* the vfsmount must be passed through @idmap. This function will then take
* care to map the inode according to @idmap before checking permissions.
* On non-idmapped mounts or if permission checking is to be performed on the
- * raw inode simply passs @nop_mnt_idmap.
+ * raw inode simply pass @nop_mnt_idmap.
*/
int vfs_mknod(struct mnt_idmap *idmap, struct inode *dir,
struct dentry *dentry, umode_t mode, dev_t dev)
* the vfsmount must be passed through @idmap. This function will then take
* care to map the inode according to @idmap before checking permissions.
* On non-idmapped mounts or if permission checking is to be performed on the
- * raw inode simply passs @nop_mnt_idmap.
+ * raw inode simply pass @nop_mnt_idmap.
*/
int vfs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
struct dentry *dentry, umode_t mode)
* the vfsmount must be passed through @idmap. This function will then take
* care to map the inode according to @idmap before checking permissions.
* On non-idmapped mounts or if permission checking is to be performed on the
- * raw inode simply passs @nop_mnt_idmap.
+ * raw inode simply pass @nop_mnt_idmap.
*/
int vfs_rmdir(struct mnt_idmap *idmap, struct inode *dir,
struct dentry *dentry)
* the vfsmount must be passed through @idmap. This function will then take
* care to map the inode according to @idmap before checking permissions.
* On non-idmapped mounts or if permission checking is to be performed on the
- * raw inode simply passs @nop_mnt_idmap.
+ * raw inode simply pass @nop_mnt_idmap.
*/
int vfs_unlink(struct mnt_idmap *idmap, struct inode *dir,
struct dentry *dentry, struct inode **delegated_inode)
* the vfsmount must be passed through @idmap. This function will then take
* care to map the inode according to @idmap before checking permissions.
* On non-idmapped mounts or if permission checking is to be performed on the
- * raw inode simply passs @nop_mnt_idmap.
+ * raw inode simply pass @nop_mnt_idmap.
*/
int vfs_symlink(struct mnt_idmap *idmap, struct inode *dir,
struct dentry *dentry, const char *oldname)
* the vfsmount must be passed through @idmap. This function will then take
* care to map the inode according to @idmap before checking permissions.
* On non-idmapped mounts or if permission checking is to be performed on the
- * raw inode simply passs @nop_mnt_idmap.
+ * raw inode simply pass @nop_mnt_idmap.
*/
int vfs_link(struct dentry *old_dentry, struct mnt_idmap *idmap,
struct inode *dir, struct dentry *new_dentry,
#include <linux/fs_context.h>
#include <linux/shmem_fs.h>
#include <linux/mnt_idmapping.h>
+#include <linux/nospec.h>
#include "pnode.h"
#include "internal.h"
static DEFINE_IDA(mnt_id_ida);
static DEFINE_IDA(mnt_group_ida);
+/* Don't allow confusion with old 32bit mount ID */
+static atomic64_t mnt_id_ctr = ATOMIC64_INIT(1ULL << 32);
+
static struct hlist_head *mount_hashtable __ro_after_init;
static struct hlist_head *mountpoint_hashtable __ro_after_init;
static struct kmem_cache *mnt_cache __ro_after_init;
if (res < 0)
return res;
mnt->mnt_id = res;
+ mnt->mnt_id_unique = atomic64_inc_return(&mnt_id_ctr);
return 0;
}
return m;
}
-static inline void lock_ns_list(struct mnt_namespace *ns)
-{
- spin_lock(&ns->ns_lock);
-}
-
-static inline void unlock_ns_list(struct mnt_namespace *ns)
-{
- spin_unlock(&ns->ns_lock);
-}
-
-static inline bool mnt_is_cursor(struct mount *mnt)
-{
- return mnt->mnt.mnt_flags & MNT_CURSOR;
-}
-
/*
* __is_local_mountpoint - Test to see if dentry is a mountpoint in the
* current mount namespace.
bool __is_local_mountpoint(struct dentry *dentry)
{
struct mnt_namespace *ns = current->nsproxy->mnt_ns;
- struct mount *mnt;
+ struct mount *mnt, *n;
bool is_covered = false;
down_read(&namespace_sem);
- lock_ns_list(ns);
- list_for_each_entry(mnt, &ns->list, mnt_list) {
- if (mnt_is_cursor(mnt))
- continue;
+ rbtree_postorder_for_each_entry_safe(mnt, n, &ns->mounts, mnt_node) {
is_covered = (mnt->mnt_mountpoint == dentry);
if (is_covered)
break;
}
- unlock_ns_list(ns);
up_read(&namespace_sem);
return is_covered;
mnt_add_count(old_parent, -1);
}
+static inline struct mount *node_to_mount(struct rb_node *node)
+{
+ return node ? rb_entry(node, struct mount, mnt_node) : NULL;
+}
+
+static void mnt_add_to_ns(struct mnt_namespace *ns, struct mount *mnt)
+{
+ struct rb_node **link = &ns->mounts.rb_node;
+ struct rb_node *parent = NULL;
+
+ WARN_ON(mnt->mnt.mnt_flags & MNT_ONRB);
+ mnt->mnt_ns = ns;
+ while (*link) {
+ parent = *link;
+ if (mnt->mnt_id_unique < node_to_mount(parent)->mnt_id_unique)
+ link = &parent->rb_left;
+ else
+ link = &parent->rb_right;
+ }
+ rb_link_node(&mnt->mnt_node, parent, link);
+ rb_insert_color(&mnt->mnt_node, &ns->mounts);
+ mnt->mnt.mnt_flags |= MNT_ONRB;
+}
+
/*
* vfsmount lock must be held for write
*/
BUG_ON(parent == mnt);
list_add_tail(&head, &mnt->mnt_list);
- list_for_each_entry(m, &head, mnt_list)
- m->mnt_ns = n;
+ while (!list_empty(&head)) {
+ m = list_first_entry(&head, typeof(*m), mnt_list);
+ list_del(&m->mnt_list);
- list_splice(&head, n->list.prev);
-
- n->mounts += n->pending_mounts;
+ mnt_add_to_ns(n, m);
+ }
+ n->nr_mounts += n->pending_mounts;
n->pending_mounts = 0;
__attach_mnt(mnt, parent);
}
mnt->mnt.mnt_flags = old->mnt.mnt_flags;
- mnt->mnt.mnt_flags &= ~(MNT_WRITE_HOLD|MNT_MARKED|MNT_INTERNAL);
+ mnt->mnt.mnt_flags &= ~(MNT_WRITE_HOLD|MNT_MARKED|MNT_INTERNAL|MNT_ONRB);
atomic_inc(&sb->s_active);
mnt->mnt.mnt_idmap = mnt_idmap_get(mnt_idmap(&old->mnt));
return &p->mnt;
}
-#ifdef CONFIG_PROC_FS
-static struct mount *mnt_list_next(struct mnt_namespace *ns,
- struct list_head *p)
+/*
+ * Returns the mount which either has the specified mnt_id, or has the next
+ * smallest id afer the specified one.
+ */
+static struct mount *mnt_find_id_at(struct mnt_namespace *ns, u64 mnt_id)
{
- struct mount *mnt, *ret = NULL;
+ struct rb_node *node = ns->mounts.rb_node;
+ struct mount *ret = NULL;
- lock_ns_list(ns);
- list_for_each_continue(p, &ns->list) {
- mnt = list_entry(p, typeof(*mnt), mnt_list);
- if (!mnt_is_cursor(mnt)) {
- ret = mnt;
- break;
+ while (node) {
+ struct mount *m = node_to_mount(node);
+
+ if (mnt_id <= m->mnt_id_unique) {
+ ret = node_to_mount(node);
+ if (mnt_id == m->mnt_id_unique)
+ break;
+ node = node->rb_left;
+ } else {
+ node = node->rb_right;
}
}
- unlock_ns_list(ns);
-
return ret;
}
+#ifdef CONFIG_PROC_FS
+
/* iterator; we want it to have access to namespace_sem, thus here... */
static void *m_start(struct seq_file *m, loff_t *pos)
{
struct proc_mounts *p = m->private;
- struct list_head *prev;
down_read(&namespace_sem);
- if (!*pos) {
- prev = &p->ns->list;
- } else {
- prev = &p->cursor.mnt_list;
- /* Read after we'd reached the end? */
- if (list_empty(prev))
- return NULL;
- }
-
- return mnt_list_next(p->ns, prev);
+ return mnt_find_id_at(p->ns, *pos);
}
static void *m_next(struct seq_file *m, void *v, loff_t *pos)
{
- struct proc_mounts *p = m->private;
- struct mount *mnt = v;
+ struct mount *next = NULL, *mnt = v;
+ struct rb_node *node = rb_next(&mnt->mnt_node);
++*pos;
- return mnt_list_next(p->ns, &mnt->mnt_list);
+ if (node) {
+ next = node_to_mount(node);
+ *pos = next->mnt_id_unique;
+ }
+ return next;
}
static void m_stop(struct seq_file *m, void *v)
{
- struct proc_mounts *p = m->private;
- struct mount *mnt = v;
-
- lock_ns_list(p->ns);
- if (mnt)
- list_move_tail(&p->cursor.mnt_list, &mnt->mnt_list);
- else
- list_del_init(&p->cursor.mnt_list);
- unlock_ns_list(p->ns);
up_read(&namespace_sem);
}
.show = m_show,
};
-void mnt_cursor_del(struct mnt_namespace *ns, struct mount *cursor)
-{
- down_read(&namespace_sem);
- lock_ns_list(ns);
- list_del(&cursor->mnt_list);
- unlock_ns_list(ns);
- up_read(&namespace_sem);
-}
#endif /* CONFIG_PROC_FS */
/**
/* Gather the mounts to umount */
for (p = mnt; p; p = next_mnt(p, mnt)) {
p->mnt.mnt_flags |= MNT_UMOUNT;
- list_move(&p->mnt_list, &tmp_list);
+ if (p->mnt.mnt_flags & MNT_ONRB)
+ move_from_ns(p, &tmp_list);
+ else
+ list_move(&p->mnt_list, &tmp_list);
}
/* Hide the mounts from mnt_mounts */
list_del_init(&p->mnt_list);
ns = p->mnt_ns;
if (ns) {
- ns->mounts--;
+ ns->nr_mounts--;
__touch_mnt_namespace(ns);
}
p->mnt_ns = NULL;
event++;
if (flags & MNT_DETACH) {
- if (!list_empty(&mnt->mnt_list))
+ if (mnt->mnt.mnt_flags & MNT_ONRB ||
+ !list_empty(&mnt->mnt_list))
umount_tree(mnt, UMOUNT_PROPAGATE);
retval = 0;
} else {
shrink_submounts(mnt);
retval = -EBUSY;
if (!propagate_mount_busy(mnt, 2)) {
- if (!list_empty(&mnt->mnt_list))
+ if (mnt->mnt.mnt_flags & MNT_ONRB ||
+ !list_empty(&mnt->mnt_list))
umount_tree(mnt, UMOUNT_PROPAGATE|UMOUNT_SYNC);
retval = 0;
}
unsigned int mounts = 0;
struct mount *p;
- if (ns->mounts >= max)
+ if (ns->nr_mounts >= max)
return -ENOSPC;
- max -= ns->mounts;
+ max -= ns->nr_mounts;
if (ns->pending_mounts >= max)
return -ENOSPC;
max -= ns->pending_mounts;
touch_mnt_namespace(source_mnt->mnt_ns);
} else {
if (source_mnt->mnt_ns) {
+ LIST_HEAD(head);
+
/* move from anon - the caller will destroy */
- list_del_init(&source_mnt->mnt_ns->list);
+ for (p = source_mnt; p; p = next_mnt(p, source_mnt))
+ move_from_ns(p, &head);
+ list_del_init(&head);
}
if (beneath)
mnt_set_mountpoint_beneath(source_mnt, top_mnt, smp);
lock_mount_hash();
for (p = mnt; p; p = next_mnt(p, mnt)) {
- p->mnt_ns = ns;
- ns->mounts++;
+ mnt_add_to_ns(ns, p);
+ ns->nr_mounts++;
}
ns->root = mnt;
- list_add_tail(&ns->list, &mnt->mnt_list);
mntget(&mnt->mnt);
unlock_mount_hash();
namespace_unlock();
* can_move_mount_beneath - check that we can mount beneath the top mount
* @from: mount to mount beneath
* @to: mount under which to mount
+ * @mp: mountpoint of @to
*
* - Make sure that @to->dentry is actually the root of a mount under
* which we can mount another mount.
if (!anon)
new_ns->seq = atomic64_add_return(1, &mnt_ns_seq);
refcount_set(&new_ns->ns.count, 1);
- INIT_LIST_HEAD(&new_ns->list);
+ new_ns->mounts = RB_ROOT;
init_waitqueue_head(&new_ns->poll);
- spin_lock_init(&new_ns->ns_lock);
new_ns->user_ns = get_user_ns(user_ns);
new_ns->ucounts = ucounts;
return new_ns;
unlock_mount_hash();
}
new_ns->root = new;
- list_add_tail(&new_ns->list, &new->mnt_list);
/*
* Second pass: switch the tsk->fs->* elements and mark new vfsmounts
p = old;
q = new;
while (p) {
- q->mnt_ns = new_ns;
- new_ns->mounts++;
+ mnt_add_to_ns(new_ns, q);
+ new_ns->nr_mounts++;
if (new_fs) {
if (&p->mnt == new_fs->root.mnt) {
new_fs->root.mnt = mntget(&q->mnt);
mntput(m);
return ERR_CAST(ns);
}
- mnt->mnt_ns = ns;
ns->root = mnt;
- ns->mounts++;
- list_add(&mnt->mnt_list, &ns->list);
+ ns->nr_mounts++;
+ mnt_add_to_ns(ns, mnt);
err = vfs_path_lookup(m->mnt_root, m,
name, LOOKUP_FOLLOW|LOOKUP_AUTOMOUNT, &path);
goto err_path;
}
mnt = real_mount(newmount.mnt);
- mnt->mnt_ns = ns;
ns->root = mnt;
- ns->mounts = 1;
- list_add(&mnt->mnt_list, &ns->list);
+ ns->nr_mounts = 1;
+ mnt_add_to_ns(ns, mnt);
mntget(newmount.mnt);
/* Attach to an apparent O_PATH fd with a note that we need to unmount
* Creating an idmapped mount with the filesystem wide idmapping
* doesn't make sense so block that. We don't allow mushy semantics.
*/
- if (!check_fsmapping(kattr->mnt_idmap, m->mnt_sb))
+ if (kattr->mnt_userns == m->mnt_sb->s_user_ns)
return -EINVAL;
/*
return err;
}
+int show_path(struct seq_file *m, struct dentry *root)
+{
+ if (root->d_sb->s_op->show_path)
+ return root->d_sb->s_op->show_path(m, root);
+
+ seq_dentry(m, root, " \t\n\\");
+ return 0;
+}
+
+static struct vfsmount *lookup_mnt_in_ns(u64 id, struct mnt_namespace *ns)
+{
+ struct mount *mnt = mnt_find_id_at(ns, id);
+
+ if (!mnt || mnt->mnt_id_unique != id)
+ return NULL;
+
+ return &mnt->mnt;
+}
+
+struct kstatmount {
+ struct statmount __user *buf;
+ size_t bufsize;
+ struct vfsmount *mnt;
+ u64 mask;
+ struct path root;
+ struct statmount sm;
+ struct seq_file seq;
+};
+
+static u64 mnt_to_attr_flags(struct vfsmount *mnt)
+{
+ unsigned int mnt_flags = READ_ONCE(mnt->mnt_flags);
+ u64 attr_flags = 0;
+
+ if (mnt_flags & MNT_READONLY)
+ attr_flags |= MOUNT_ATTR_RDONLY;
+ if (mnt_flags & MNT_NOSUID)
+ attr_flags |= MOUNT_ATTR_NOSUID;
+ if (mnt_flags & MNT_NODEV)
+ attr_flags |= MOUNT_ATTR_NODEV;
+ if (mnt_flags & MNT_NOEXEC)
+ attr_flags |= MOUNT_ATTR_NOEXEC;
+ if (mnt_flags & MNT_NODIRATIME)
+ attr_flags |= MOUNT_ATTR_NODIRATIME;
+ if (mnt_flags & MNT_NOSYMFOLLOW)
+ attr_flags |= MOUNT_ATTR_NOSYMFOLLOW;
+
+ if (mnt_flags & MNT_NOATIME)
+ attr_flags |= MOUNT_ATTR_NOATIME;
+ else if (mnt_flags & MNT_RELATIME)
+ attr_flags |= MOUNT_ATTR_RELATIME;
+ else
+ attr_flags |= MOUNT_ATTR_STRICTATIME;
+
+ if (is_idmapped_mnt(mnt))
+ attr_flags |= MOUNT_ATTR_IDMAP;
+
+ return attr_flags;
+}
+
+static u64 mnt_to_propagation_flags(struct mount *m)
+{
+ u64 propagation = 0;
+
+ if (IS_MNT_SHARED(m))
+ propagation |= MS_SHARED;
+ if (IS_MNT_SLAVE(m))
+ propagation |= MS_SLAVE;
+ if (IS_MNT_UNBINDABLE(m))
+ propagation |= MS_UNBINDABLE;
+ if (!propagation)
+ propagation |= MS_PRIVATE;
+
+ return propagation;
+}
+
+static void statmount_sb_basic(struct kstatmount *s)
+{
+ struct super_block *sb = s->mnt->mnt_sb;
+
+ s->sm.mask |= STATMOUNT_SB_BASIC;
+ s->sm.sb_dev_major = MAJOR(sb->s_dev);
+ s->sm.sb_dev_minor = MINOR(sb->s_dev);
+ s->sm.sb_magic = sb->s_magic;
+ s->sm.sb_flags = sb->s_flags & (SB_RDONLY|SB_SYNCHRONOUS|SB_DIRSYNC|SB_LAZYTIME);
+}
+
+static void statmount_mnt_basic(struct kstatmount *s)
+{
+ struct mount *m = real_mount(s->mnt);
+
+ s->sm.mask |= STATMOUNT_MNT_BASIC;
+ s->sm.mnt_id = m->mnt_id_unique;
+ s->sm.mnt_parent_id = m->mnt_parent->mnt_id_unique;
+ s->sm.mnt_id_old = m->mnt_id;
+ s->sm.mnt_parent_id_old = m->mnt_parent->mnt_id;
+ s->sm.mnt_attr = mnt_to_attr_flags(&m->mnt);
+ s->sm.mnt_propagation = mnt_to_propagation_flags(m);
+ s->sm.mnt_peer_group = IS_MNT_SHARED(m) ? m->mnt_group_id : 0;
+ s->sm.mnt_master = IS_MNT_SLAVE(m) ? m->mnt_master->mnt_group_id : 0;
+}
+
+static void statmount_propagate_from(struct kstatmount *s)
+{
+ struct mount *m = real_mount(s->mnt);
+
+ s->sm.mask |= STATMOUNT_PROPAGATE_FROM;
+ if (IS_MNT_SLAVE(m))
+ s->sm.propagate_from = get_dominating_id(m, ¤t->fs->root);
+}
+
+static int statmount_mnt_root(struct kstatmount *s, struct seq_file *seq)
+{
+ int ret;
+ size_t start = seq->count;
+
+ ret = show_path(seq, s->mnt->mnt_root);
+ if (ret)
+ return ret;
+
+ if (unlikely(seq_has_overflowed(seq)))
+ return -EAGAIN;
+
+ /*
+ * Unescape the result. It would be better if supplied string was not
+ * escaped in the first place, but that's a pretty invasive change.
+ */
+ seq->buf[seq->count] = '\0';
+ seq->count = start;
+ seq_commit(seq, string_unescape_inplace(seq->buf + start, UNESCAPE_OCTAL));
+ return 0;
+}
+
+static int statmount_mnt_point(struct kstatmount *s, struct seq_file *seq)
+{
+ struct vfsmount *mnt = s->mnt;
+ struct path mnt_path = { .dentry = mnt->mnt_root, .mnt = mnt };
+ int err;
+
+ err = seq_path_root(seq, &mnt_path, &s->root, "");
+ return err == SEQ_SKIP ? 0 : err;
+}
+
+static int statmount_fs_type(struct kstatmount *s, struct seq_file *seq)
+{
+ struct super_block *sb = s->mnt->mnt_sb;
+
+ seq_puts(seq, sb->s_type->name);
+ return 0;
+}
+
+static int statmount_string(struct kstatmount *s, u64 flag)
+{
+ int ret;
+ size_t kbufsize;
+ struct seq_file *seq = &s->seq;
+ struct statmount *sm = &s->sm;
+
+ switch (flag) {
+ case STATMOUNT_FS_TYPE:
+ sm->fs_type = seq->count;
+ ret = statmount_fs_type(s, seq);
+ break;
+ case STATMOUNT_MNT_ROOT:
+ sm->mnt_root = seq->count;
+ ret = statmount_mnt_root(s, seq);
+ break;
+ case STATMOUNT_MNT_POINT:
+ sm->mnt_point = seq->count;
+ ret = statmount_mnt_point(s, seq);
+ break;
+ default:
+ WARN_ON_ONCE(true);
+ return -EINVAL;
+ }
+
+ if (unlikely(check_add_overflow(sizeof(*sm), seq->count, &kbufsize)))
+ return -EOVERFLOW;
+ if (kbufsize >= s->bufsize)
+ return -EOVERFLOW;
+
+ /* signal a retry */
+ if (unlikely(seq_has_overflowed(seq)))
+ return -EAGAIN;
+
+ if (ret)
+ return ret;
+
+ seq->buf[seq->count++] = '\0';
+ sm->mask |= flag;
+ return 0;
+}
+
+static int copy_statmount_to_user(struct kstatmount *s)
+{
+ struct statmount *sm = &s->sm;
+ struct seq_file *seq = &s->seq;
+ char __user *str = ((char __user *)s->buf) + sizeof(*sm);
+ size_t copysize = min_t(size_t, s->bufsize, sizeof(*sm));
+
+ if (seq->count && copy_to_user(str, seq->buf, seq->count))
+ return -EFAULT;
+
+ /* Return the number of bytes copied to the buffer */
+ sm->size = copysize + seq->count;
+ if (copy_to_user(s->buf, sm, copysize))
+ return -EFAULT;
+
+ return 0;
+}
+
+static int do_statmount(struct kstatmount *s)
+{
+ struct mount *m = real_mount(s->mnt);
+ int err;
+
+ /*
+ * Don't trigger audit denials. We just want to determine what
+ * mounts to show users.
+ */
+ if (!is_path_reachable(m, m->mnt.mnt_root, &s->root) &&
+ !ns_capable_noaudit(&init_user_ns, CAP_SYS_ADMIN))
+ return -EPERM;
+
+ err = security_sb_statfs(s->mnt->mnt_root);
+ if (err)
+ return err;
+
+ if (s->mask & STATMOUNT_SB_BASIC)
+ statmount_sb_basic(s);
+
+ if (s->mask & STATMOUNT_MNT_BASIC)
+ statmount_mnt_basic(s);
+
+ if (s->mask & STATMOUNT_PROPAGATE_FROM)
+ statmount_propagate_from(s);
+
+ if (s->mask & STATMOUNT_FS_TYPE)
+ err = statmount_string(s, STATMOUNT_FS_TYPE);
+
+ if (!err && s->mask & STATMOUNT_MNT_ROOT)
+ err = statmount_string(s, STATMOUNT_MNT_ROOT);
+
+ if (!err && s->mask & STATMOUNT_MNT_POINT)
+ err = statmount_string(s, STATMOUNT_MNT_POINT);
+
+ if (err)
+ return err;
+
+ return 0;
+}
+
+static inline bool retry_statmount(const long ret, size_t *seq_size)
+{
+ if (likely(ret != -EAGAIN))
+ return false;
+ if (unlikely(check_mul_overflow(*seq_size, 2, seq_size)))
+ return false;
+ if (unlikely(*seq_size > MAX_RW_COUNT))
+ return false;
+ return true;
+}
+
+static int prepare_kstatmount(struct kstatmount *ks, struct mnt_id_req *kreq,
+ struct statmount __user *buf, size_t bufsize,
+ size_t seq_size)
+{
+ if (!access_ok(buf, bufsize))
+ return -EFAULT;
+
+ memset(ks, 0, sizeof(*ks));
+ ks->mask = kreq->param;
+ ks->buf = buf;
+ ks->bufsize = bufsize;
+ ks->seq.size = seq_size;
+ ks->seq.buf = kvmalloc(seq_size, GFP_KERNEL_ACCOUNT);
+ if (!ks->seq.buf)
+ return -ENOMEM;
+ return 0;
+}
+
+static int copy_mnt_id_req(const struct mnt_id_req __user *req,
+ struct mnt_id_req *kreq)
+{
+ int ret;
+ size_t usize;
+
+ BUILD_BUG_ON(sizeof(struct mnt_id_req) != MNT_ID_REQ_SIZE_VER0);
+
+ ret = get_user(usize, &req->size);
+ if (ret)
+ return -EFAULT;
+ if (unlikely(usize > PAGE_SIZE))
+ return -E2BIG;
+ if (unlikely(usize < MNT_ID_REQ_SIZE_VER0))
+ return -EINVAL;
+ memset(kreq, 0, sizeof(*kreq));
+ ret = copy_struct_from_user(kreq, sizeof(*kreq), req, usize);
+ if (ret)
+ return ret;
+ if (kreq->spare != 0)
+ return -EINVAL;
+ return 0;
+}
+
+SYSCALL_DEFINE4(statmount, const struct mnt_id_req __user *, req,
+ struct statmount __user *, buf, size_t, bufsize,
+ unsigned int, flags)
+{
+ struct vfsmount *mnt;
+ struct mnt_id_req kreq;
+ struct kstatmount ks;
+ /* We currently support retrieval of 3 strings. */
+ size_t seq_size = 3 * PATH_MAX;
+ int ret;
+
+ if (flags)
+ return -EINVAL;
+
+ ret = copy_mnt_id_req(req, &kreq);
+ if (ret)
+ return ret;
+
+retry:
+ ret = prepare_kstatmount(&ks, &kreq, buf, bufsize, seq_size);
+ if (ret)
+ return ret;
+
+ down_read(&namespace_sem);
+ mnt = lookup_mnt_in_ns(kreq.mnt_id, current->nsproxy->mnt_ns);
+ if (!mnt) {
+ up_read(&namespace_sem);
+ kvfree(ks.seq.buf);
+ return -ENOENT;
+ }
+
+ ks.mnt = mnt;
+ get_fs_root(current->fs, &ks.root);
+ ret = do_statmount(&ks);
+ path_put(&ks.root);
+ up_read(&namespace_sem);
+
+ if (!ret)
+ ret = copy_statmount_to_user(&ks);
+ kvfree(ks.seq.buf);
+ if (retry_statmount(ret, &seq_size))
+ goto retry;
+ return ret;
+}
+
+static struct mount *listmnt_next(struct mount *curr)
+{
+ return node_to_mount(rb_next(&curr->mnt_node));
+}
+
+static ssize_t do_listmount(struct mount *first, struct path *orig, u64 mnt_id,
+ u64 __user *buf, size_t bufsize,
+ const struct path *root)
+{
+ struct mount *r;
+ ssize_t ctr;
+ int err;
+
+ /*
+ * Don't trigger audit denials. We just want to determine what
+ * mounts to show users.
+ */
+ if (!is_path_reachable(real_mount(orig->mnt), orig->dentry, root) &&
+ !ns_capable_noaudit(&init_user_ns, CAP_SYS_ADMIN))
+ return -EPERM;
+
+ err = security_sb_statfs(orig->dentry);
+ if (err)
+ return err;
+
+ for (ctr = 0, r = first; r && ctr < bufsize; r = listmnt_next(r)) {
+ if (r->mnt_id_unique == mnt_id)
+ continue;
+ if (!is_path_reachable(r, r->mnt.mnt_root, orig))
+ continue;
+ ctr = array_index_nospec(ctr, bufsize);
+ if (put_user(r->mnt_id_unique, buf + ctr))
+ return -EFAULT;
+ if (check_add_overflow(ctr, 1, &ctr))
+ return -ERANGE;
+ }
+ return ctr;
+}
+
+SYSCALL_DEFINE4(listmount, const struct mnt_id_req __user *, req,
+ u64 __user *, buf, size_t, bufsize, unsigned int, flags)
+{
+ struct mnt_namespace *ns = current->nsproxy->mnt_ns;
+ struct mnt_id_req kreq;
+ struct mount *first;
+ struct path root, orig;
+ u64 mnt_id, last_mnt_id;
+ ssize_t ret;
+
+ if (flags)
+ return -EINVAL;
+
+ ret = copy_mnt_id_req(req, &kreq);
+ if (ret)
+ return ret;
+ mnt_id = kreq.mnt_id;
+ last_mnt_id = kreq.param;
+
+ down_read(&namespace_sem);
+ get_fs_root(current->fs, &root);
+ if (mnt_id == LSMT_ROOT) {
+ orig = root;
+ } else {
+ ret = -ENOENT;
+ orig.mnt = lookup_mnt_in_ns(mnt_id, ns);
+ if (!orig.mnt)
+ goto err;
+ orig.dentry = orig.mnt->mnt_root;
+ }
+ if (!last_mnt_id)
+ first = node_to_mount(rb_first(&ns->mounts));
+ else
+ first = mnt_find_id_at(ns, last_mnt_id + 1);
+
+ ret = do_listmount(first, &orig, mnt_id, buf, bufsize, &root);
+err:
+ path_put(&root);
+ up_read(&namespace_sem);
+ return ret;
+}
+
+
static void __init init_mount_tree(void)
{
struct vfsmount *mnt;
if (IS_ERR(ns))
panic("Can't allocate initial namespace");
m = real_mount(mnt);
- m->mnt_ns = ns;
ns->root = m;
- ns->mounts = 1;
- list_add(&m->mnt_list, &ns->list);
+ ns->nr_mounts = 1;
+ mnt_add_to_ns(ns, m);
init_task.nsproxy->mnt_ns = ns;
get_mnt_ns(ns);
int *new_mnt_flags)
{
int new_flags = *new_mnt_flags;
- struct mount *mnt;
+ struct mount *mnt, *n;
bool visible = false;
down_read(&namespace_sem);
- lock_ns_list(ns);
- list_for_each_entry(mnt, &ns->list, mnt_list) {
+ rbtree_postorder_for_each_entry_safe(mnt, n, &ns->mounts, mnt_node) {
struct mount *child;
int mnt_flags;
- if (mnt_is_cursor(mnt))
- continue;
-
if (mnt->mnt.mnt_sb->s_type != sb->s_type)
continue;
next: ;
}
found:
- unlock_ns_list(ns);
up_read(&namespace_sem);
return visible;
}
#include <linux/mount.h>
#include <linux/nfs_fs.h>
#include <linux/nfs_ssc.h>
+#include <linux/splice.h>
#include "delegation.h"
#include "internal.h"
#include "iostat.h"
ret = __nfs4_copy_file_range(file_in, pos_in, file_out, pos_out, count,
flags);
if (ret == -EOPNOTSUPP || ret == -EXDEV)
- ret = generic_copy_file_range(file_in, pos_in, file_out,
- pos_out, count, flags);
+ ret = splice_copy_file_range(file_in, pos_in, file_out,
+ pos_out, count);
return ret;
}
if (!folio_test_private(folio))
return NULL;
- spin_lock(&mapping->private_lock);
+ spin_lock(&mapping->i_private_lock);
req = nfs_folio_private_request(folio);
if (req) {
WARN_ON_ONCE(req->wb_head != req);
kref_get(&req->wb_kref);
}
- spin_unlock(&mapping->private_lock);
+ spin_unlock(&mapping->i_private_lock);
return req;
}
* Swap-space should not get truncated. Hence no need to plug the race
* with invalidate/truncate.
*/
- spin_lock(&mapping->private_lock);
+ spin_lock(&mapping->i_private_lock);
if (likely(!folio_test_swapcache(folio))) {
set_bit(PG_MAPPED, &req->wb_flags);
folio_set_private(folio);
folio->private = req;
}
- spin_unlock(&mapping->private_lock);
+ spin_unlock(&mapping->i_private_lock);
atomic_long_inc(&nfsi->nrequests);
/* this a head request for a page group - mark it as having an
* extra reference so sub groups can follow suit.
struct folio *folio = nfs_page_to_folio(req->wb_head);
struct address_space *mapping = folio_file_mapping(folio);
- spin_lock(&mapping->private_lock);
+ spin_lock(&mapping->i_private_lock);
if (likely(folio && !folio_test_swapcache(folio))) {
folio->private = NULL;
folio_clear_private(folio);
clear_bit(PG_MAPPED, &req->wb_head->wb_flags);
}
- spin_unlock(&mapping->private_lock);
+ spin_unlock(&mapping->i_private_lock);
}
if (test_and_clear_bit(PG_INODE_REF, &req->wb_flags)) {
static void encode_bitmap4(struct xdr_stream *xdr, const __u32 *bitmap,
size_t len)
{
- xdr_stream_encode_uint32_array(xdr, bitmap, len);
-}
-
-static int decode_cb_fattr4(struct xdr_stream *xdr, uint32_t *bitmap,
- struct nfs4_cb_fattr *fattr)
-{
- fattr->ncf_cb_change = 0;
- fattr->ncf_cb_fsize = 0;
- if (bitmap[0] & FATTR4_WORD0_CHANGE)
- if (xdr_stream_decode_u64(xdr, &fattr->ncf_cb_change) < 0)
- return -NFSERR_BAD_XDR;
- if (bitmap[0] & FATTR4_WORD0_SIZE)
- if (xdr_stream_decode_u64(xdr, &fattr->ncf_cb_fsize) < 0)
- return -NFSERR_BAD_XDR;
- return 0;
+ WARN_ON_ONCE(xdr_stream_encode_uint32_array(xdr, bitmap, len) < 0);
}
/*
hdr->nops++;
}
-/*
- * CB_GETATTR4args
- * struct CB_GETATTR4args {
- * nfs_fh4 fh;
- * bitmap4 attr_request;
- * };
- *
- * The size and change attributes are the only one
- * guaranteed to be serviced by the client.
- */
-static void
-encode_cb_getattr4args(struct xdr_stream *xdr, struct nfs4_cb_compound_hdr *hdr,
- struct nfs4_cb_fattr *fattr)
-{
- struct nfs4_delegation *dp =
- container_of(fattr, struct nfs4_delegation, dl_cb_fattr);
- struct knfsd_fh *fh = &dp->dl_stid.sc_file->fi_fhandle;
-
- encode_nfs_cb_opnum4(xdr, OP_CB_GETATTR);
- encode_nfs_fh4(xdr, fh);
- encode_bitmap4(xdr, fattr->ncf_cb_bmap, ARRAY_SIZE(fattr->ncf_cb_bmap));
- hdr->nops++;
-}
-
/*
* CB_SEQUENCE4args
*
xdr_reserve_space(xdr, 0);
}
-/*
- * 20.1. Operation 3: CB_GETATTR - Get Attributes
- */
-static void nfs4_xdr_enc_cb_getattr(struct rpc_rqst *req,
- struct xdr_stream *xdr, const void *data)
-{
- const struct nfsd4_callback *cb = data;
- struct nfs4_cb_fattr *ncf =
- container_of(cb, struct nfs4_cb_fattr, ncf_getattr);
- struct nfs4_cb_compound_hdr hdr = {
- .ident = cb->cb_clp->cl_cb_ident,
- .minorversion = cb->cb_clp->cl_minorversion,
- };
-
- encode_cb_compound4args(xdr, &hdr);
- encode_cb_sequence4args(xdr, cb, &hdr);
- encode_cb_getattr4args(xdr, &hdr, ncf);
- encode_cb_nops(&hdr);
-}
-
/*
* 20.2. Operation 4: CB_RECALL - Recall a Delegation
*/
return 0;
}
-/*
- * 20.1. Operation 3: CB_GETATTR - Get Attributes
- */
-static int nfs4_xdr_dec_cb_getattr(struct rpc_rqst *rqstp,
- struct xdr_stream *xdr,
- void *data)
-{
- struct nfsd4_callback *cb = data;
- struct nfs4_cb_compound_hdr hdr;
- int status;
- u32 bitmap[3] = {0};
- u32 attrlen;
- struct nfs4_cb_fattr *ncf =
- container_of(cb, struct nfs4_cb_fattr, ncf_getattr);
-
- status = decode_cb_compound4res(xdr, &hdr);
- if (unlikely(status))
- return status;
-
- status = decode_cb_sequence4res(xdr, cb);
- if (unlikely(status || cb->cb_seq_status))
- return status;
-
- status = decode_cb_op_status(xdr, OP_CB_GETATTR, &cb->cb_status);
- if (status)
- return status;
- if (xdr_stream_decode_uint32_array(xdr, bitmap, 3) < 0)
- return -NFSERR_BAD_XDR;
- if (xdr_stream_decode_u32(xdr, &attrlen) < 0)
- return -NFSERR_BAD_XDR;
- if (attrlen > (sizeof(ncf->ncf_cb_change) + sizeof(ncf->ncf_cb_fsize)))
- return -NFSERR_BAD_XDR;
- status = decode_cb_fattr4(xdr, bitmap, ncf);
- return status;
-}
-
/*
* 20.2. Operation 4: CB_RECALL - Recall a Delegation
*/
PROC(CB_NOTIFY_LOCK, COMPOUND, cb_notify_lock, cb_notify_lock),
PROC(CB_OFFLOAD, COMPOUND, cb_offload, cb_offload),
PROC(CB_RECALL_ANY, COMPOUND, cb_recall_any, cb_recall_any),
- PROC(CB_GETATTR, COMPOUND, cb_getattr, cb_getattr),
};
static unsigned int nfs4_cb_counts[ARRAY_SIZE(nfs4_cb_procedures)];
static const struct nfsd4_callback_ops nfsd4_cb_recall_ops;
static const struct nfsd4_callback_ops nfsd4_cb_notify_lock_ops;
-static const struct nfsd4_callback_ops nfsd4_cb_getattr_ops;
static struct workqueue_struct *laundry_wq;
dp->dl_recalled = false;
nfsd4_init_cb(&dp->dl_recall, dp->dl_stid.sc_client,
&nfsd4_cb_recall_ops, NFSPROC4_CLNT_CB_RECALL);
- nfsd4_init_cb(&dp->dl_cb_fattr.ncf_getattr, dp->dl_stid.sc_client,
- &nfsd4_cb_getattr_ops, NFSPROC4_CLNT_CB_GETATTR);
- dp->dl_cb_fattr.ncf_file_modified = false;
- dp->dl_cb_fattr.ncf_cb_bmap[0] = FATTR4_WORD0_CHANGE | FATTR4_WORD0_SIZE;
get_nfs4_file(fp);
dp->dl_stid.sc_file = fp;
return dp;
spin_unlock(&nn->client_lock);
}
-static int
-nfsd4_cb_getattr_done(struct nfsd4_callback *cb, struct rpc_task *task)
-{
- struct nfs4_cb_fattr *ncf =
- container_of(cb, struct nfs4_cb_fattr, ncf_getattr);
-
- ncf->ncf_cb_status = task->tk_status;
- switch (task->tk_status) {
- case -NFS4ERR_DELAY:
- rpc_delay(task, 2 * HZ);
- return 0;
- default:
- return 1;
- }
-}
-
-static void
-nfsd4_cb_getattr_release(struct nfsd4_callback *cb)
-{
- struct nfs4_cb_fattr *ncf =
- container_of(cb, struct nfs4_cb_fattr, ncf_getattr);
- struct nfs4_delegation *dp =
- container_of(ncf, struct nfs4_delegation, dl_cb_fattr);
-
- nfs4_put_stid(&dp->dl_stid);
- clear_bit(CB_GETATTR_BUSY, &ncf->ncf_cb_flags);
- wake_up_bit(&ncf->ncf_cb_flags, CB_GETATTR_BUSY);
-}
-
static const struct nfsd4_callback_ops nfsd4_cb_recall_any_ops = {
.done = nfsd4_cb_recall_any_done,
.release = nfsd4_cb_recall_any_release,
};
-static const struct nfsd4_callback_ops nfsd4_cb_getattr_ops = {
- .done = nfsd4_cb_getattr_done,
- .release = nfsd4_cb_getattr_release,
-};
-
-void nfs4_cb_getattr(struct nfs4_cb_fattr *ncf)
-{
- struct nfs4_delegation *dp =
- container_of(ncf, struct nfs4_delegation, dl_cb_fattr);
-
- if (test_and_set_bit(CB_GETATTR_BUSY, &ncf->ncf_cb_flags))
- return;
- refcount_inc(&dp->dl_stid.sc_count);
- nfsd4_run_cb(&ncf->ncf_getattr);
-}
-
static struct nfs4_client *create_client(struct xdr_netobj name,
struct svc_rqst *rqstp, nfs4_verifier *verf)
{
struct svc_fh *parent = NULL;
int cb_up;
int status = 0;
- struct kstat stat;
- struct path path;
cb_up = nfsd4_cb_channel_good(oo->oo_owner.so_client);
open->op_recall = false;
if (open->op_share_access & NFS4_SHARE_ACCESS_WRITE) {
open->op_delegate_type = NFS4_OPEN_DELEGATE_WRITE;
trace_nfsd_deleg_write(&dp->dl_stid.sc_stateid);
- path.mnt = currentfh->fh_export->ex_path.mnt;
- path.dentry = currentfh->fh_dentry;
- if (vfs_getattr(&path, &stat,
- (STATX_SIZE | STATX_CTIME | STATX_CHANGE_COOKIE),
- AT_STATX_SYNC_AS_STAT)) {
- nfs4_put_stid(&dp->dl_stid);
- destroy_delegation(dp);
- goto out_no_deleg;
- }
- dp->dl_cb_fattr.ncf_cur_fsize = stat.size;
- dp->dl_cb_fattr.ncf_initial_cinfo =
- nfsd4_change_attribute(&stat, d_inode(currentfh->fh_dentry));
} else {
open->op_delegate_type = NFS4_OPEN_DELEGATE_READ;
trace_nfsd_deleg_read(&dp->dl_stid.sc_stateid);
* nfsd4_deleg_getattr_conflict - Recall if GETATTR causes conflict
* @rqstp: RPC transaction context
* @inode: file to be checked for a conflict
- * @modified: return true if file was modified
- * @size: new size of file if modified is true
*
* This function is called when there is a conflict between a write
* delegation and a change/size GETATTR from another client. The server
* delegation before replying to the GETATTR. See RFC 8881 section
* 18.7.4.
*
+ * The current implementation does not support CB_GETATTR yet. However
+ * this can avoid recalling the delegation could be added in follow up
+ * work.
+ *
* Returns 0 if there is no conflict; otherwise an nfs_stat
* code is returned.
*/
__be32
-nfsd4_deleg_getattr_conflict(struct svc_rqst *rqstp, struct inode *inode,
- bool *modified, u64 *size)
+nfsd4_deleg_getattr_conflict(struct svc_rqst *rqstp, struct inode *inode)
{
+ __be32 status;
struct file_lock_context *ctx;
- struct nfs4_delegation *dp;
- struct nfs4_cb_fattr *ncf;
struct file_lock *fl;
- struct iattr attrs;
- __be32 status;
-
- might_sleep();
+ struct nfs4_delegation *dp;
- *modified = false;
ctx = locks_inode_context(inode);
if (!ctx)
return 0;
break_lease:
spin_unlock(&ctx->flc_lock);
nfsd_stats_wdeleg_getattr_inc();
-
- dp = fl->fl_owner;
- ncf = &dp->dl_cb_fattr;
- nfs4_cb_getattr(&dp->dl_cb_fattr);
- wait_on_bit(&ncf->ncf_cb_flags, CB_GETATTR_BUSY, TASK_INTERRUPTIBLE);
- if (ncf->ncf_cb_status) {
- status = nfserrno(nfsd_open_break_lease(inode, NFSD_MAY_READ));
- if (status != nfserr_jukebox ||
- !nfsd_wait_for_delegreturn(rqstp, inode))
- return status;
- }
- if (!ncf->ncf_file_modified &&
- (ncf->ncf_initial_cinfo != ncf->ncf_cb_change ||
- ncf->ncf_cur_fsize != ncf->ncf_cb_fsize))
- ncf->ncf_file_modified = true;
- if (ncf->ncf_file_modified) {
- /*
- * The server would not update the file's metadata
- * with the client's modified size.
- */
- attrs.ia_mtime = attrs.ia_ctime = current_time(inode);
- attrs.ia_valid = ATTR_MTIME | ATTR_CTIME;
- setattr_copy(&nop_mnt_idmap, inode, &attrs);
- mark_inode_dirty(inode);
- ncf->ncf_cur_fsize = ncf->ncf_cb_fsize;
- *size = ncf->ncf_cur_fsize;
- *modified = true;
- }
+ status = nfserrno(nfsd_open_break_lease(inode, NFSD_MAY_READ));
+ if (status != nfserr_jukebox ||
+ !nfsd_wait_for_delegreturn(rqstp, inode))
+ return status;
return 0;
}
break;
u32 attrmask[3];
unsigned long mask[2];
} u;
- bool file_modified;
unsigned long bit;
- u64 size = 0;
WARN_ON_ONCE(bmval[1] & NFSD_WRITEONLY_ATTRS_WORD1);
WARN_ON_ONCE(!nfsd_attrs_supported(minorversion, bmval));
}
args.size = 0;
if (u.attrmask[0] & (FATTR4_WORD0_CHANGE | FATTR4_WORD0_SIZE)) {
- status = nfsd4_deleg_getattr_conflict(rqstp, d_inode(dentry),
- &file_modified, &size);
+ status = nfsd4_deleg_getattr_conflict(rqstp, d_inode(dentry));
if (status)
goto out;
}
AT_STATX_SYNC_AS_STAT);
if (err)
goto out_nfserr;
- args.size = file_modified ? size : args.stat.size;
+ args.size = args.stat.size;
if (!(args.stat.result_mask & STATX_BTIME))
/* underlying FS does not offer btime so we can't share it */
char *mesg = buf;
int fd, err;
struct nfsd_net *nn = net_generic(net, nfsd_net_id);
+ struct svc_serv *serv;
err = get_int(&mesg, &fd);
if (err != 0 || fd < 0)
if (err != 0)
return err;
- err = svc_addsock(nn->nfsd_serv, net, fd, buf, SIMPLE_TRANSACTION_LIMIT, cred);
+ serv = nn->nfsd_serv;
+ err = svc_addsock(serv, net, fd, buf, SIMPLE_TRANSACTION_LIMIT, cred);
- if (err >= 0 &&
- !nn->nfsd_serv->sv_nrthreads && !xchg(&nn->keep_active, 1))
- svc_get(nn->nfsd_serv);
+ if (err < 0 && !serv->sv_nrthreads && !nn->keep_active)
+ nfsd_last_thread(net);
+ else if (err >= 0 && !serv->sv_nrthreads && !xchg(&nn->keep_active, 1))
+ svc_get(serv);
- nfsd_put(net);
+ svc_put(serv);
return err;
}
struct svc_xprt *xprt;
int port, err;
struct nfsd_net *nn = net_generic(net, nfsd_net_id);
+ struct svc_serv *serv;
if (sscanf(buf, "%15s %5u", transport, &port) != 2)
return -EINVAL;
if (err != 0)
return err;
- err = svc_xprt_create(nn->nfsd_serv, transport, net,
+ serv = nn->nfsd_serv;
+ err = svc_xprt_create(serv, transport, net,
PF_INET, port, SVC_SOCK_ANONYMOUS, cred);
if (err < 0)
goto out_err;
- err = svc_xprt_create(nn->nfsd_serv, transport, net,
+ err = svc_xprt_create(serv, transport, net,
PF_INET6, port, SVC_SOCK_ANONYMOUS, cred);
if (err < 0 && err != -EAFNOSUPPORT)
goto out_close;
- if (!nn->nfsd_serv->sv_nrthreads && !xchg(&nn->keep_active, 1))
- svc_get(nn->nfsd_serv);
+ if (!serv->sv_nrthreads && !xchg(&nn->keep_active, 1))
+ svc_get(serv);
- nfsd_put(net);
+ svc_put(serv);
return 0;
out_close:
- xprt = svc_find_xprt(nn->nfsd_serv, transport, net, PF_INET, port);
+ xprt = svc_find_xprt(serv, transport, net, PF_INET, port);
if (xprt != NULL) {
svc_xprt_close(xprt);
svc_xprt_put(xprt);
}
out_err:
- nfsd_put(net);
+ if (!serv->sv_nrthreads && !nn->keep_active)
+ nfsd_last_thread(net);
+
+ svc_put(serv);
return err;
}
int ret = -ENODEV;
mutex_lock(&nfsd_mutex);
- if (nn->nfsd_serv) {
- svc_get(nn->nfsd_serv);
+ if (nn->nfsd_serv)
ret = 0;
- }
- mutex_unlock(&nfsd_mutex);
+ else
+ mutex_unlock(&nfsd_mutex);
return ret;
}
*/
int nfsd_nl_rpc_status_get_done(struct netlink_callback *cb)
{
- mutex_lock(&nfsd_mutex);
- nfsd_put(sock_net(cb->skb->sk));
mutex_unlock(&nfsd_mutex);
return 0;
int nfsd_pool_stats_release(struct inode *, struct file *);
void nfsd_shutdown_threads(struct net *net);
-static inline void nfsd_put(struct net *net)
-{
- struct nfsd_net *nn = net_generic(net, nfsd_net_id);
-
- svc_put(nn->nfsd_serv);
-}
-
bool i_am_nfsd(void);
struct nfsdfs_client {
int nfsd_minorversion(struct nfsd_net *nn, u32 minorversion, enum vers_op change);
void nfsd_reset_versions(struct nfsd_net *nn);
int nfsd_create_serv(struct net *net);
+void nfsd_last_thread(struct net *net);
extern int nfsd_max_blksize;
/* Only used under nfsd_mutex, so this atomic may be overkill: */
static atomic_t nfsd_notifier_refcount = ATOMIC_INIT(0);
-static void nfsd_last_thread(struct net *net)
+void nfsd_last_thread(struct net *net)
{
struct nfsd_net *nn = net_generic(net, nfsd_net_id);
struct svc_serv *serv = nn->nfsd_serv;
time64_t cpntf_time; /* last time stateid used */
};
-struct nfs4_cb_fattr {
- struct nfsd4_callback ncf_getattr;
- u32 ncf_cb_status;
- u32 ncf_cb_bmap[1];
-
- /* from CB_GETATTR reply */
- u64 ncf_cb_change;
- u64 ncf_cb_fsize;
-
- unsigned long ncf_cb_flags;
- bool ncf_file_modified;
- u64 ncf_initial_cinfo;
- u64 ncf_cur_fsize;
-};
-
-/* bits for ncf_cb_flags */
-#define CB_GETATTR_BUSY 0
-
/*
* Represents a delegation stateid. The nfs4_client holds references to these
* and they are put when it is being destroyed or when the delegation is
int dl_retries;
struct nfsd4_callback dl_recall;
bool dl_recalled;
-
- /* for CB_GETATTR */
- struct nfs4_cb_fattr dl_cb_fattr;
};
#define cb_to_delegation(cb) \
NFSPROC4_CLNT_CB_SEQUENCE,
NFSPROC4_CLNT_CB_NOTIFY_LOCK,
NFSPROC4_CLNT_CB_RECALL_ANY,
- NFSPROC4_CLNT_CB_GETATTR,
};
/* Returns true iff a is later than b: */
}
extern __be32 nfsd4_deleg_getattr_conflict(struct svc_rqst *rqstp,
- struct inode *inode, bool *file_modified, u64 *size);
-extern void nfs4_cb_getattr(struct nfs4_cb_fattr *ncf);
+ struct inode *inode);
#endif /* NFSD4_STATE_H */
ssize_t host_err;
trace_nfsd_read_splice(rqstp, fhp, offset, *count);
- host_err = splice_direct_to_actor(file, &sd, nfsd_direct_splice_actor);
+ host_err = rw_verify_area(READ, file, &offset, *count);
+ if (!host_err)
+ host_err = splice_direct_to_actor(file, &sd,
+ nfsd_direct_splice_actor);
return nfsd_finish_read(rqstp, fhp, file, offset, count, eof, host_err);
}
since = READ_ONCE(file->f_wb_err);
if (verf)
nfsd_copy_write_verifier(verf, nn);
- file_start_write(file);
host_err = vfs_iter_write(file, &iter, &pos, flags);
- file_end_write(file);
if (host_err < 0) {
commit_reset_write_verifier(nn, rqstp, host_err);
goto out_nfserr;
#define NFS4_dec_cb_recall_any_sz (cb_compound_dec_hdr_sz + \
cb_sequence_dec_sz + \
op_dec_sz)
-
-/*
- * 1: CB_GETATTR opcode (32-bit)
- * N: file_handle
- * 1: number of entry in attribute array (32-bit)
- * 1: entry 0 in attribute array (32-bit)
- */
-#define NFS4_enc_cb_getattr_sz (cb_compound_enc_hdr_sz + \
- cb_sequence_enc_sz + \
- 1 + enc_nfs4_fh_sz + 1 + 1)
-/*
- * 4: fattr_bitmap_maxsz
- * 1: attribute array len
- * 2: change attr (64-bit)
- * 2: size (64-bit)
- */
-#define NFS4_dec_cb_getattr_sz (cb_compound_dec_hdr_sz + \
- cb_sequence_dec_sz + 4 + 1 + 2 + 2 + op_dec_sz)
/*
* The page may not be locked, eg if called from try_to_unmap_one()
*/
- spin_lock(&mapping->private_lock);
+ spin_lock(&mapping->i_private_lock);
head = folio_buffers(folio);
if (head) {
struct buffer_head *bh = head;
} else if (ret) {
nr_dirty = 1 << (folio_shift(folio) - inode->i_blkbits);
}
- spin_unlock(&mapping->private_lock);
+ spin_unlock(&mapping->i_private_lock);
if (nr_dirty)
nilfs_set_file_dirty(inode, nr_dirty);
return ERR_CAST(s);
if (!s->s_root) {
- /*
- * We drop s_umount here because we need to open the bdev and
- * bdev->open_mutex ranks above s_umount (blkdev_put() ->
- * __invalidate_device()). It is safe because we have active sb
- * reference and SB_BORN is not set yet.
- */
- up_write(&s->s_umount);
err = setup_bdev_super(s, flags, NULL);
- down_write(&s->s_umount);
if (!err)
err = nilfs_fill_super(s, data,
flags & SB_SILENT ? 1 : 0);
*
* If the page does not have buffers, we create them and set them uptodate.
* The page may not be locked which is why we need to handle the buffers under
- * the mapping->private_lock. Once the buffers are marked dirty we no longer
+ * the mapping->i_private_lock. Once the buffers are marked dirty we no longer
* need the lock since try_to_free_buffers() does not free dirty buffers.
*/
void mark_ntfs_record_dirty(struct page *page, const unsigned int ofs) {
BUG_ON(!PageUptodate(page));
end = ofs + ni->itype.index.block_size;
bh_size = VFS_I(ni)->i_sb->s_blocksize;
- spin_lock(&mapping->private_lock);
+ spin_lock(&mapping->i_private_lock);
if (unlikely(!page_has_buffers(page))) {
- spin_unlock(&mapping->private_lock);
+ spin_unlock(&mapping->i_private_lock);
bh = head = alloc_page_buffers(page, bh_size, true);
- spin_lock(&mapping->private_lock);
+ spin_lock(&mapping->i_private_lock);
if (likely(!page_has_buffers(page))) {
struct buffer_head *tail;
break;
set_buffer_dirty(bh);
} while ((bh = bh->b_this_page) != head);
- spin_unlock(&mapping->private_lock);
+ spin_unlock(&mapping->i_private_lock);
filemap_dirty_folio(mapping, page_folio(page));
if (unlikely(buffers_to_free)) {
do {
/**
* ntfs_dir_fsync - sync a directory to disk
* @filp: directory to be synced
- * @dentry: dentry describing the directory to sync
+ * @start: offset in bytes of the beginning of data range to sync
+ * @end: offset in bytes of the end of data range (inclusive)
* @datasync: if non-zero only flush user data and not metadata
*
* Data integrity sync of a directory to disk. Used for fsync, fdatasync, and
if (ret)
return ret;
+ ret = fsnotify_file_area_perm(file, MAY_WRITE, &offset, len);
+ if (ret)
+ return ret;
+
if (S_ISFIFO(inode->i_mode))
return -ESPIPE;
* 'get_current_cred()' function), that will clear the
* non_rcu field, because now that other user may be
* expecting RCU freeing. But normal thread-synchronous
- * cred accesses will keep things non-RCY.
+ * cred accesses will keep things non-racy to avoid RCU
+ * freeing.
*/
override_cred->non_rcu = 1;
}
EXPORT_SYMBOL_GPL(kernel_file_open);
-/**
- * backing_file_open - open a backing file for kernel internal use
- * @user_path: path that the user reuqested to open
- * @flags: open flags
- * @real_path: path of the backing file
- * @cred: credentials for open
- *
- * Open a backing file for a stackable filesystem (e.g., overlayfs).
- * @user_path may be on the stackable filesystem and @real_path on the
- * underlying filesystem. In this case, we want to be able to return the
- * @user_path of the stackable filesystem. This is done by embedding the
- * returned file into a container structure that also stores the stacked
- * file's path, which can be retrieved using backing_file_user_path().
- */
-struct file *backing_file_open(const struct path *user_path, int flags,
- const struct path *real_path,
- const struct cred *cred)
-{
- struct file *f;
- int error;
-
- f = alloc_empty_backing_file(flags, cred);
- if (IS_ERR(f))
- return f;
-
- path_get(user_path);
- *backing_file_user_path(f) = *user_path;
- f->f_path = *real_path;
- error = do_dentry_open(f, d_inode(real_path->dentry), NULL);
- if (error) {
- fput(f);
- f = ERR_PTR(error);
- }
-
- return f;
-}
-EXPORT_SYMBOL_GPL(backing_file_open);
-
#define WILL_CREATE(flags) (flags & (O_CREAT | __O_TMPFILE))
#define O_PATH_FLAGS (O_DIRECTORY | O_NOFOLLOW | O_PATH | O_CLOEXEC)
int retval;
struct file *file;
- file = close_fd_get_file(fd);
+ file = file_close_fd(fd);
if (!file)
return -EBADF;
# SPDX-License-Identifier: GPL-2.0-only
config OVERLAY_FS
tristate "Overlay filesystem support"
+ select FS_STACK
select EXPORTFS
help
An overlay filesystem combines two filesystems - an 'upper' filesystem
return ovl_real_fileattr_set(new, &newfa);
}
+static int ovl_verify_area(loff_t pos, loff_t pos2, loff_t len, loff_t totlen)
+{
+ loff_t tmp;
+
+ if (WARN_ON_ONCE(pos != pos2))
+ return -EIO;
+ if (WARN_ON_ONCE(pos < 0 || len < 0 || totlen < 0))
+ return -EIO;
+ if (WARN_ON_ONCE(check_add_overflow(pos, len, &tmp)))
+ return -EIO;
+ return 0;
+}
+
static int ovl_copy_up_file(struct ovl_fs *ofs, struct dentry *dentry,
struct file *new_file, loff_t len)
{
int error = 0;
ovl_path_lowerdata(dentry, &datapath);
- if (WARN_ON(datapath.dentry == NULL))
+ if (WARN_ON_ONCE(datapath.dentry == NULL) ||
+ WARN_ON_ONCE(len < 0))
return -EIO;
old_file = ovl_path_open(&datapath, O_LARGEFILE | O_RDONLY);
if (IS_ERR(old_file))
return PTR_ERR(old_file);
+ error = rw_verify_area(READ, old_file, &old_pos, len);
+ if (!error)
+ error = rw_verify_area(WRITE, new_file, &new_pos, len);
+ if (error)
+ goto out_fput;
+
/* Try to use clone_file_range to clone up within the same fs */
ovl_start_write(dentry);
cloned = do_clone_file_range(old_file, 0, new_file, 0, len, 0);
while (len) {
size_t this_len = OVL_COPY_UP_CHUNK_SIZE;
- long bytes;
+ ssize_t bytes;
if (len < this_len)
this_len = len;
}
}
- ovl_start_write(dentry);
+ error = ovl_verify_area(old_pos, new_pos, this_len, len);
+ if (error)
+ break;
+
bytes = do_splice_direct(old_file, &old_pos,
new_file, &new_pos,
this_len, SPLICE_F_MOVE);
- ovl_end_write(dentry);
if (bytes <= 0) {
error = bytes;
break;
path.dentry = temp;
err = ovl_copy_up_data(c, &path);
/*
- * We cannot hold lock_rename() throughout this helper, because or
+ * We cannot hold lock_rename() throughout this helper, because of
* lock ordering with sb_writers, which shouldn't be held when calling
* ovl_copy_up_data(), so lock workdir and destdir and make sure that
* temp wasn't moved before copy up completion or cleanup.
- * If temp was moved, abort without the cleanup.
*/
ovl_start_write(c->dentry);
if (lock_rename(c->workdir, c->destdir) != NULL ||
temp->d_parent != c->workdir) {
+ /* temp or workdir moved underneath us? abort without cleanup */
+ dput(temp);
err = -EIO;
goto unlock;
} else if (err) {
#include <linux/xattr.h>
#include <linux/uio.h>
#include <linux/uaccess.h>
-#include <linux/splice.h>
#include <linux/security.h>
-#include <linux/mm.h>
#include <linux/fs.h>
+#include <linux/backing-file.h>
#include "overlayfs.h"
-#include "../internal.h" /* for sb_init_dio_done_wq */
-
-struct ovl_aio_req {
- struct kiocb iocb;
- refcount_t ref;
- struct kiocb *orig_iocb;
- /* used for aio completion */
- struct work_struct work;
- long res;
-};
-
-static struct kmem_cache *ovl_aio_request_cachep;
-
static char ovl_whatisit(struct inode *inode, struct inode *realinode)
{
if (realinode != ovl_inode_upper(inode))
touch_atime(&file->f_path);
}
-#define OVL_IOCB_MASK \
- (IOCB_NOWAIT | IOCB_HIPRI | IOCB_DSYNC | IOCB_SYNC | IOCB_APPEND)
-
-static rwf_t iocb_to_rw_flags(int flags)
-{
- return (__force rwf_t)(flags & OVL_IOCB_MASK);
-}
-
-static inline void ovl_aio_put(struct ovl_aio_req *aio_req)
-{
- if (refcount_dec_and_test(&aio_req->ref)) {
- fput(aio_req->iocb.ki_filp);
- kmem_cache_free(ovl_aio_request_cachep, aio_req);
- }
-}
-
-static void ovl_aio_cleanup_handler(struct ovl_aio_req *aio_req)
-{
- struct kiocb *iocb = &aio_req->iocb;
- struct kiocb *orig_iocb = aio_req->orig_iocb;
-
- if (iocb->ki_flags & IOCB_WRITE) {
- kiocb_end_write(iocb);
- ovl_file_modified(orig_iocb->ki_filp);
- }
-
- orig_iocb->ki_pos = iocb->ki_pos;
- ovl_aio_put(aio_req);
-}
-
-static void ovl_aio_rw_complete(struct kiocb *iocb, long res)
-{
- struct ovl_aio_req *aio_req = container_of(iocb,
- struct ovl_aio_req, iocb);
- struct kiocb *orig_iocb = aio_req->orig_iocb;
-
- ovl_aio_cleanup_handler(aio_req);
- orig_iocb->ki_complete(orig_iocb, res);
-}
-
-static void ovl_aio_complete_work(struct work_struct *work)
-{
- struct ovl_aio_req *aio_req = container_of(work,
- struct ovl_aio_req, work);
-
- ovl_aio_rw_complete(&aio_req->iocb, aio_req->res);
-}
-
-static void ovl_aio_queue_completion(struct kiocb *iocb, long res)
-{
- struct ovl_aio_req *aio_req = container_of(iocb,
- struct ovl_aio_req, iocb);
- struct kiocb *orig_iocb = aio_req->orig_iocb;
-
- /*
- * Punt to a work queue to serialize updates of mtime/size.
- */
- aio_req->res = res;
- INIT_WORK(&aio_req->work, ovl_aio_complete_work);
- queue_work(file_inode(orig_iocb->ki_filp)->i_sb->s_dio_done_wq,
- &aio_req->work);
-}
-
-static int ovl_init_aio_done_wq(struct super_block *sb)
-{
- if (sb->s_dio_done_wq)
- return 0;
-
- return sb_init_dio_done_wq(sb);
-}
-
static ssize_t ovl_read_iter(struct kiocb *iocb, struct iov_iter *iter)
{
struct file *file = iocb->ki_filp;
struct fd real;
- const struct cred *old_cred;
ssize_t ret;
+ struct backing_file_ctx ctx = {
+ .cred = ovl_creds(file_inode(file)->i_sb),
+ .user_file = file,
+ .accessed = ovl_file_accessed,
+ };
if (!iov_iter_count(iter))
return 0;
if (ret)
return ret;
- ret = -EINVAL;
- if (iocb->ki_flags & IOCB_DIRECT &&
- !(real.file->f_mode & FMODE_CAN_ODIRECT))
- goto out_fdput;
-
- old_cred = ovl_override_creds(file_inode(file)->i_sb);
- if (is_sync_kiocb(iocb)) {
- rwf_t rwf = iocb_to_rw_flags(iocb->ki_flags);
-
- ret = vfs_iter_read(real.file, iter, &iocb->ki_pos, rwf);
- } else {
- struct ovl_aio_req *aio_req;
-
- ret = -ENOMEM;
- aio_req = kmem_cache_zalloc(ovl_aio_request_cachep, GFP_KERNEL);
- if (!aio_req)
- goto out;
-
- aio_req->orig_iocb = iocb;
- kiocb_clone(&aio_req->iocb, iocb, get_file(real.file));
- aio_req->iocb.ki_complete = ovl_aio_rw_complete;
- refcount_set(&aio_req->ref, 2);
- ret = vfs_iocb_iter_read(real.file, &aio_req->iocb, iter);
- ovl_aio_put(aio_req);
- if (ret != -EIOCBQUEUED)
- ovl_aio_cleanup_handler(aio_req);
- }
-out:
- revert_creds(old_cred);
- ovl_file_accessed(file);
-out_fdput:
+ ret = backing_file_read_iter(real.file, iter, iocb, iocb->ki_flags,
+ &ctx);
fdput(real);
return ret;
struct file *file = iocb->ki_filp;
struct inode *inode = file_inode(file);
struct fd real;
- const struct cred *old_cred;
ssize_t ret;
int ifl = iocb->ki_flags;
+ struct backing_file_ctx ctx = {
+ .cred = ovl_creds(inode->i_sb),
+ .user_file = file,
+ .end_write = ovl_file_modified,
+ };
if (!iov_iter_count(iter))
return 0;
inode_lock(inode);
/* Update mode */
ovl_copyattr(inode);
- ret = file_remove_privs(file);
- if (ret)
- goto out_unlock;
ret = ovl_real_fdget(file, &real);
if (ret)
goto out_unlock;
- ret = -EINVAL;
- if (iocb->ki_flags & IOCB_DIRECT &&
- !(real.file->f_mode & FMODE_CAN_ODIRECT))
- goto out_fdput;
-
if (!ovl_should_sync(OVL_FS(inode->i_sb)))
ifl &= ~(IOCB_DSYNC | IOCB_SYNC);
* this property in case it is set by the issuer.
*/
ifl &= ~IOCB_DIO_CALLER_COMP;
-
- old_cred = ovl_override_creds(file_inode(file)->i_sb);
- if (is_sync_kiocb(iocb)) {
- rwf_t rwf = iocb_to_rw_flags(ifl);
-
- file_start_write(real.file);
- ret = vfs_iter_write(real.file, iter, &iocb->ki_pos, rwf);
- file_end_write(real.file);
- /* Update size */
- ovl_file_modified(file);
- } else {
- struct ovl_aio_req *aio_req;
-
- ret = ovl_init_aio_done_wq(inode->i_sb);
- if (ret)
- goto out;
-
- ret = -ENOMEM;
- aio_req = kmem_cache_zalloc(ovl_aio_request_cachep, GFP_KERNEL);
- if (!aio_req)
- goto out;
-
- aio_req->orig_iocb = iocb;
- kiocb_clone(&aio_req->iocb, iocb, get_file(real.file));
- aio_req->iocb.ki_flags = ifl;
- aio_req->iocb.ki_complete = ovl_aio_queue_completion;
- refcount_set(&aio_req->ref, 2);
- kiocb_start_write(&aio_req->iocb);
- ret = vfs_iocb_iter_write(real.file, &aio_req->iocb, iter);
- ovl_aio_put(aio_req);
- if (ret != -EIOCBQUEUED)
- ovl_aio_cleanup_handler(aio_req);
- }
-out:
- revert_creds(old_cred);
-out_fdput:
+ ret = backing_file_write_iter(real.file, iter, iocb, ifl, &ctx);
fdput(real);
out_unlock:
struct pipe_inode_info *pipe, size_t len,
unsigned int flags)
{
- const struct cred *old_cred;
struct fd real;
ssize_t ret;
+ struct backing_file_ctx ctx = {
+ .cred = ovl_creds(file_inode(in)->i_sb),
+ .user_file = in,
+ .accessed = ovl_file_accessed,
+ };
ret = ovl_real_fdget(in, &real);
if (ret)
return ret;
- old_cred = ovl_override_creds(file_inode(in)->i_sb);
- ret = vfs_splice_read(real.file, ppos, pipe, len, flags);
- revert_creds(old_cred);
- ovl_file_accessed(in);
-
+ ret = backing_file_splice_read(real.file, ppos, pipe, len, flags, &ctx);
fdput(real);
+
return ret;
}
loff_t *ppos, size_t len, unsigned int flags)
{
struct fd real;
- const struct cred *old_cred;
struct inode *inode = file_inode(out);
ssize_t ret;
+ struct backing_file_ctx ctx = {
+ .cred = ovl_creds(inode->i_sb),
+ .user_file = out,
+ .end_write = ovl_file_modified,
+ };
inode_lock(inode);
/* Update mode */
ovl_copyattr(inode);
- ret = file_remove_privs(out);
- if (ret)
- goto out_unlock;
ret = ovl_real_fdget(out, &real);
if (ret)
goto out_unlock;
- old_cred = ovl_override_creds(inode->i_sb);
- file_start_write(real.file);
-
- ret = iter_file_splice_write(pipe, real.file, ppos, len, flags);
-
- file_end_write(real.file);
- /* Update size */
- ovl_file_modified(out);
- revert_creds(old_cred);
+ ret = backing_file_splice_write(pipe, real.file, ppos, len, flags, &ctx);
fdput(real);
out_unlock:
static int ovl_mmap(struct file *file, struct vm_area_struct *vma)
{
struct file *realfile = file->private_data;
- const struct cred *old_cred;
- int ret;
-
- if (!realfile->f_op->mmap)
- return -ENODEV;
+ struct backing_file_ctx ctx = {
+ .cred = ovl_creds(file_inode(file)->i_sb),
+ .user_file = file,
+ .accessed = ovl_file_accessed,
+ };
- if (WARN_ON(file != vma->vm_file))
- return -EIO;
-
- vma_set_file(vma, realfile);
-
- old_cred = ovl_override_creds(file_inode(file)->i_sb);
- ret = call_mmap(vma->vm_file, vma);
- revert_creds(old_cred);
- ovl_file_accessed(file);
-
- return ret;
+ return backing_file_mmap(realfile, vma, &ctx);
}
static long ovl_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
.copy_file_range = ovl_copy_file_range,
.remap_file_range = ovl_remap_file_range,
};
-
-int __init ovl_aio_request_cache_init(void)
-{
- ovl_aio_request_cachep = kmem_cache_create("ovl_aio_req",
- sizeof(struct ovl_aio_req),
- 0, SLAB_HWCACHE_ALIGN, NULL);
- if (!ovl_aio_request_cachep)
- return -ENOMEM;
-
- return 0;
-}
-
-void ovl_aio_request_cache_destroy(void)
-{
- kmem_cache_destroy(ovl_aio_request_cachep);
-}
void ovl_drop_write(struct dentry *dentry);
struct dentry *ovl_workdir(struct dentry *dentry);
const struct cred *ovl_override_creds(struct super_block *sb);
+
+static inline const struct cred *ovl_creds(struct super_block *sb)
+{
+ return OVL_FS(sb)->creator_cred;
+}
+
int ovl_can_decode_fh(struct super_block *sb);
struct dentry *ovl_indexdir(struct super_block *sb);
bool ovl_index_all(struct super_block *sb);
/* file.c */
extern const struct file_operations ovl_file_operations;
-int __init ovl_aio_request_cache_init(void);
-void ovl_aio_request_cache_destroy(void);
int ovl_real_fileattr_get(const struct path *realpath, struct fileattr *fa);
int ovl_real_fileattr_set(const struct path *realpath, struct fileattr *fa);
int ovl_fileattr_get(struct dentry *dentry, struct fileattr *fa);
if (ovl_inode_cachep == NULL)
return -ENOMEM;
- err = ovl_aio_request_cache_init();
- if (!err) {
- err = register_filesystem(&ovl_fs_type);
- if (!err)
- return 0;
+ err = register_filesystem(&ovl_fs_type);
+ if (!err)
+ return 0;
- ovl_aio_request_cache_destroy();
- }
kmem_cache_destroy(ovl_inode_cachep);
return err;
*/
rcu_barrier();
kmem_cache_destroy(ovl_inode_cachep);
- ovl_aio_request_cache_destroy();
}
module_init(ovl_init);
bool was_empty = false;
bool wake_next_writer = false;
+ /*
+ * Reject writing to watch queue pipes before the point where we lock
+ * the pipe.
+ * Otherwise, lockdep would be unhappy if the caller already has another
+ * pipe locked.
+ * If we had to support locking a normal pipe and a notification pipe at
+ * the same time, we could set up lockdep annotations for that, but
+ * since we don't actually need that, it's simpler to just bail here.
+ */
+ if (pipe_has_watch_queue(pipe))
+ return -EXDEV;
+
/* Null write succeeds. */
if (unlikely(total_len == 0))
return 0;
goto out;
}
- if (pipe_has_watch_queue(pipe)) {
- ret = -EXDEV;
- goto out;
- }
-
/*
* If it wasn't empty we try to merge new data into
* the last buffer.
pipe->tail = tail;
pipe->head = head;
+ if (!pipe_has_watch_queue(pipe)) {
+ pipe->max_usage = nr_slots;
+ pipe->nr_accounted = nr_slots;
+ }
+
spin_unlock_irq(&pipe->rd_wait.lock);
/* This might have made more room for writers */
if (ret < 0)
goto out_revert_acct;
- pipe->max_usage = nr_slots;
- pipe->nr_accounted = nr_slots;
return pipe->max_usage * PAGE_SIZE;
out_revert_acct:
mnt->mnt.mnt_flags |= MNT_UMOUNT;
list_del_init(&mnt->mnt_child);
list_del_init(&mnt->mnt_umounting);
- list_move_tail(&mnt->mnt_list, to_umount);
+ move_from_ns(mnt, to_umount);
}
/*
* the vfsmount must be passed through @idmap. This function will then
* take care to map the inode according to @idmap before checking
* permissions. On non-idmapped mounts or if permission checking is to be
- * performed on the raw inode simply passs @nop_mnt_idmap.
+ * performed on the raw inode simply pass @nop_mnt_idmap.
*/
int
posix_acl_chmod(struct mnt_idmap *idmap, struct dentry *dentry,
* the vfsmount must be passed through @idmap. This function will then
* take care to map the inode according to @idmap before checking
* permissions. On non-idmapped mounts or if permission checking is to be
- * performed on the raw inode simply passs @nop_mnt_idmap.
+ * performed on the raw inode simply pass @nop_mnt_idmap.
*
* Called from set_acl inode operations.
*/
const char *name = NULL;
if (file) {
- struct inode *inode = file_inode(vma->vm_file);
+ const struct inode *inode = file_user_inode(vma->vm_file);
+
dev = inode->i_sb->s_dev;
ino = inode->i_ino;
pgoff = ((loff_t)vma->vm_pgoff) << PAGE_SHIFT;
seq_printf(m, "%i %i %u:%u ", r->mnt_id, r->mnt_parent->mnt_id,
MAJOR(sb->s_dev), MINOR(sb->s_dev));
- if (sb->s_op->show_path) {
- err = sb->s_op->show_path(m, mnt->mnt_root);
- if (err)
- goto out;
- } else {
- seq_dentry(m, mnt->mnt_root, " \t\n\\");
- }
+ err = show_path(m, mnt->mnt_root);
+ if (err)
+ goto out;
seq_putc(m, ' ');
/* mountpoints outside of chroot jail will give SEQ_SKIP on this */
p->ns = ns;
p->root = root;
p->show = show;
- INIT_LIST_HEAD(&p->cursor.mnt_list);
- p->cursor.mnt.mnt_flags = MNT_CURSOR;
return 0;
struct seq_file *m = file->private_data;
struct proc_mounts *p = m->private;
path_put(&p->root);
- mnt_cursor_del(p->ns, &p->cursor);
put_mnt_ns(p->ns);
return seq_release_private(inode, file);
}
int rw_verify_area(int read_write, struct file *file, const loff_t *ppos, size_t count)
{
+ int mask = read_write == READ ? MAY_READ : MAY_WRITE;
+ int ret;
+
if (unlikely((ssize_t) count < 0))
return -EINVAL;
}
}
- return security_file_permission(file,
- read_write == READ ? MAY_READ : MAY_WRITE);
+ ret = security_file_permission(file, mask);
+ if (ret)
+ return ret;
+
+ return fsnotify_file_area_perm(file, mask, ppos, count);
}
EXPORT_SYMBOL(rw_verify_area);
return ret;
}
-static ssize_t do_iter_read(struct file *file, struct iov_iter *iter,
- loff_t *pos, rwf_t flags)
+ssize_t vfs_iocb_iter_read(struct file *file, struct kiocb *iocb,
+ struct iov_iter *iter)
{
size_t tot_len;
ssize_t ret = 0;
+ if (!file->f_op->read_iter)
+ return -EINVAL;
if (!(file->f_mode & FMODE_READ))
return -EBADF;
if (!(file->f_mode & FMODE_CAN_READ))
tot_len = iov_iter_count(iter);
if (!tot_len)
goto out;
- ret = rw_verify_area(READ, file, pos, tot_len);
+ ret = rw_verify_area(READ, file, &iocb->ki_pos, tot_len);
if (ret < 0)
return ret;
- if (file->f_op->read_iter)
- ret = do_iter_readv_writev(file, iter, pos, READ, flags);
- else
- ret = do_loop_readv_writev(file, iter, pos, READ, flags);
+ ret = call_read_iter(file, iocb, iter);
out:
if (ret >= 0)
fsnotify_access(file);
return ret;
}
+EXPORT_SYMBOL(vfs_iocb_iter_read);
-ssize_t vfs_iocb_iter_read(struct file *file, struct kiocb *iocb,
- struct iov_iter *iter)
+ssize_t vfs_iter_read(struct file *file, struct iov_iter *iter, loff_t *ppos,
+ rwf_t flags)
{
size_t tot_len;
ssize_t ret = 0;
tot_len = iov_iter_count(iter);
if (!tot_len)
goto out;
- ret = rw_verify_area(READ, file, &iocb->ki_pos, tot_len);
+ ret = rw_verify_area(READ, file, ppos, tot_len);
if (ret < 0)
return ret;
- ret = call_read_iter(file, iocb, iter);
+ ret = do_iter_readv_writev(file, iter, ppos, READ, flags);
out:
if (ret >= 0)
fsnotify_access(file);
return ret;
}
-EXPORT_SYMBOL(vfs_iocb_iter_read);
-
-ssize_t vfs_iter_read(struct file *file, struct iov_iter *iter, loff_t *ppos,
- rwf_t flags)
-{
- if (!file->f_op->read_iter)
- return -EINVAL;
- return do_iter_read(file, iter, ppos, flags);
-}
EXPORT_SYMBOL(vfs_iter_read);
-static ssize_t do_iter_write(struct file *file, struct iov_iter *iter,
- loff_t *pos, rwf_t flags)
+/*
+ * Caller is responsible for calling kiocb_end_write() on completion
+ * if async iocb was queued.
+ */
+ssize_t vfs_iocb_iter_write(struct file *file, struct kiocb *iocb,
+ struct iov_iter *iter)
{
size_t tot_len;
ssize_t ret = 0;
+ if (!file->f_op->write_iter)
+ return -EINVAL;
if (!(file->f_mode & FMODE_WRITE))
return -EBADF;
if (!(file->f_mode & FMODE_CAN_WRITE))
tot_len = iov_iter_count(iter);
if (!tot_len)
return 0;
- ret = rw_verify_area(WRITE, file, pos, tot_len);
+ ret = rw_verify_area(WRITE, file, &iocb->ki_pos, tot_len);
if (ret < 0)
return ret;
- if (file->f_op->write_iter)
- ret = do_iter_readv_writev(file, iter, pos, WRITE, flags);
- else
- ret = do_loop_readv_writev(file, iter, pos, WRITE, flags);
+ kiocb_start_write(iocb);
+ ret = call_write_iter(file, iocb, iter);
+ if (ret != -EIOCBQUEUED)
+ kiocb_end_write(iocb);
if (ret > 0)
fsnotify_modify(file);
+
return ret;
}
+EXPORT_SYMBOL(vfs_iocb_iter_write);
-ssize_t vfs_iocb_iter_write(struct file *file, struct kiocb *iocb,
- struct iov_iter *iter)
+ssize_t vfs_iter_write(struct file *file, struct iov_iter *iter, loff_t *ppos,
+ rwf_t flags)
{
size_t tot_len;
- ssize_t ret = 0;
+ ssize_t ret;
- if (!file->f_op->write_iter)
- return -EINVAL;
if (!(file->f_mode & FMODE_WRITE))
return -EBADF;
if (!(file->f_mode & FMODE_CAN_WRITE))
return -EINVAL;
+ if (!file->f_op->write_iter)
+ return -EINVAL;
tot_len = iov_iter_count(iter);
if (!tot_len)
return 0;
- ret = rw_verify_area(WRITE, file, &iocb->ki_pos, tot_len);
+
+ ret = rw_verify_area(WRITE, file, ppos, tot_len);
if (ret < 0)
return ret;
- ret = call_write_iter(file, iocb, iter);
+ file_start_write(file);
+ ret = do_iter_readv_writev(file, iter, ppos, WRITE, flags);
if (ret > 0)
fsnotify_modify(file);
+ file_end_write(file);
return ret;
}
-EXPORT_SYMBOL(vfs_iocb_iter_write);
-
-ssize_t vfs_iter_write(struct file *file, struct iov_iter *iter, loff_t *ppos,
- rwf_t flags)
-{
- if (!file->f_op->write_iter)
- return -EINVAL;
- return do_iter_write(file, iter, ppos, flags);
-}
EXPORT_SYMBOL(vfs_iter_write);
static ssize_t vfs_readv(struct file *file, const struct iovec __user *vec,
- unsigned long vlen, loff_t *pos, rwf_t flags)
+ unsigned long vlen, loff_t *pos, rwf_t flags)
{
struct iovec iovstack[UIO_FASTIOV];
struct iovec *iov = iovstack;
struct iov_iter iter;
- ssize_t ret;
+ size_t tot_len;
+ ssize_t ret = 0;
- ret = import_iovec(ITER_DEST, vec, vlen, ARRAY_SIZE(iovstack), &iov, &iter);
- if (ret >= 0) {
- ret = do_iter_read(file, &iter, pos, flags);
- kfree(iov);
- }
+ if (!(file->f_mode & FMODE_READ))
+ return -EBADF;
+ if (!(file->f_mode & FMODE_CAN_READ))
+ return -EINVAL;
+
+ ret = import_iovec(ITER_DEST, vec, vlen, ARRAY_SIZE(iovstack), &iov,
+ &iter);
+ if (ret < 0)
+ return ret;
+
+ tot_len = iov_iter_count(&iter);
+ if (!tot_len)
+ goto out;
+ ret = rw_verify_area(READ, file, pos, tot_len);
+ if (ret < 0)
+ goto out;
+
+ if (file->f_op->read_iter)
+ ret = do_iter_readv_writev(file, &iter, pos, READ, flags);
+ else
+ ret = do_loop_readv_writev(file, &iter, pos, READ, flags);
+out:
+ if (ret >= 0)
+ fsnotify_access(file);
+ kfree(iov);
return ret;
}
static ssize_t vfs_writev(struct file *file, const struct iovec __user *vec,
- unsigned long vlen, loff_t *pos, rwf_t flags)
+ unsigned long vlen, loff_t *pos, rwf_t flags)
{
struct iovec iovstack[UIO_FASTIOV];
struct iovec *iov = iovstack;
struct iov_iter iter;
- ssize_t ret;
+ size_t tot_len;
+ ssize_t ret = 0;
- ret = import_iovec(ITER_SOURCE, vec, vlen, ARRAY_SIZE(iovstack), &iov, &iter);
- if (ret >= 0) {
- file_start_write(file);
- ret = do_iter_write(file, &iter, pos, flags);
- file_end_write(file);
- kfree(iov);
- }
+ if (!(file->f_mode & FMODE_WRITE))
+ return -EBADF;
+ if (!(file->f_mode & FMODE_CAN_WRITE))
+ return -EINVAL;
+
+ ret = import_iovec(ITER_SOURCE, vec, vlen, ARRAY_SIZE(iovstack), &iov,
+ &iter);
+ if (ret < 0)
+ return ret;
+
+ tot_len = iov_iter_count(&iter);
+ if (!tot_len)
+ goto out;
+
+ ret = rw_verify_area(WRITE, file, pos, tot_len);
+ if (ret < 0)
+ goto out;
+
+ file_start_write(file);
+ if (file->f_op->write_iter)
+ ret = do_iter_readv_writev(file, &iter, pos, WRITE, flags);
+ else
+ ret = do_loop_readv_writev(file, &iter, pos, WRITE, flags);
+ if (ret > 0)
+ fsnotify_modify(file);
+ file_end_write(file);
+out:
+ kfree(iov);
return ret;
}
#endif /* CONFIG_COMPAT */
static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
- size_t count, loff_t max)
+ size_t count, loff_t max)
{
struct fd in, out;
struct inode *in_inode, *out_inode;
retval = rw_verify_area(WRITE, out.file, &out_pos, count);
if (retval < 0)
goto fput_out;
- file_start_write(out.file);
retval = do_splice_direct(in.file, &pos, out.file, &out_pos,
count, fl);
- file_end_write(out.file);
} else {
if (out.file->f_flags & O_NONBLOCK)
fl |= SPLICE_F_NONBLOCK;
}
#endif
-/**
- * generic_copy_file_range - copy data between two files
- * @file_in: file structure to read from
- * @pos_in: file offset to read from
- * @file_out: file structure to write data to
- * @pos_out: file offset to write data to
- * @len: amount of data to copy
- * @flags: copy flags
- *
- * This is a generic filesystem helper to copy data from one file to another.
- * It has no constraints on the source or destination file owners - the files
- * can belong to different superblocks and different filesystem types. Short
- * copies are allowed.
- *
- * This should be called from the @file_out filesystem, as per the
- * ->copy_file_range() method.
- *
- * Returns the number of bytes copied or a negative error indicating the
- * failure.
- */
-
-ssize_t generic_copy_file_range(struct file *file_in, loff_t pos_in,
- struct file *file_out, loff_t pos_out,
- size_t len, unsigned int flags)
-{
- lockdep_assert(sb_write_started(file_inode(file_out)->i_sb));
-
- return do_splice_direct(file_in, &pos_in, file_out, &pos_out,
- len > MAX_RW_COUNT ? MAX_RW_COUNT : len, 0);
-}
-EXPORT_SYMBOL(generic_copy_file_range);
-
/*
* Performs necessary checks before doing a file copy
*
{
ssize_t ret;
bool splice = flags & COPY_FILE_SPLICE;
+ bool samesb = file_inode(file_in)->i_sb == file_inode(file_out)->i_sb;
if (flags & ~COPY_FILE_SPLICE)
return -EINVAL;
ret = file_out->f_op->copy_file_range(file_in, pos_in,
file_out, pos_out,
len, flags);
- goto done;
- }
-
- if (!splice && file_in->f_op->remap_file_range &&
- file_inode(file_in)->i_sb == file_inode(file_out)->i_sb) {
+ } else if (!splice && file_in->f_op->remap_file_range && samesb) {
ret = file_in->f_op->remap_file_range(file_in, pos_in,
file_out, pos_out,
min_t(loff_t, MAX_RW_COUNT, len),
REMAP_FILE_CAN_SHORTEN);
- if (ret > 0)
- goto done;
+ /* fallback to splice */
+ if (ret <= 0)
+ splice = true;
+ } else if (samesb) {
+ /* Fallback to splice for same sb copy for backward compat */
+ splice = true;
}
+ file_end_write(file_out);
+
+ if (!splice)
+ goto done;
+
/*
* We can get here for same sb copy of filesystems that do not implement
* ->copy_file_range() in case filesystem does not support clone or in
* and which filesystems do not, that will allow userspace tools to
* make consistent desicions w.r.t using copy_file_range().
*
- * We also get here if caller (e.g. nfsd) requested COPY_FILE_SPLICE.
+ * We also get here if caller (e.g. nfsd) requested COPY_FILE_SPLICE
+ * for server-side-copy between any two sb.
+ *
+ * In any case, we call do_splice_direct() and not splice_file_range(),
+ * without file_start_write() held, to avoid possible deadlocks related
+ * to splicing from input file, while file_start_write() is held on
+ * the output file on a different sb.
*/
- ret = generic_copy_file_range(file_in, pos_in, file_out, pos_out, len,
- flags);
-
+ ret = do_splice_direct(file_in, &pos_in, file_out, &pos_out,
+ min_t(size_t, len, MAX_RW_COUNT), 0);
done:
if (ret > 0) {
fsnotify_access(file_in);
inc_syscr(current);
inc_syscw(current);
- file_end_write(file_out);
-
return ret;
}
EXPORT_SYMBOL(vfs_copy_file_range);
if (res)
goto out;
+ res = fsnotify_file_perm(file, MAY_READ);
+ if (res)
+ goto out;
+
res = down_read_killable(&inode->i_rwsem);
if (res)
goto out;
INITIALIZE_PATH(path);
int item_len = 0;
int tb_init = 0;
- struct cpu_key cpu_key;
+ struct cpu_key cpu_key = {};
int retval;
int quota_cut_bytes = 0;
static int remap_verify_area(struct file *file, loff_t pos, loff_t len,
bool write)
{
+ int mask = write ? MAY_WRITE : MAY_READ;
loff_t tmp;
+ int ret;
if (unlikely(pos < 0 || len < 0))
return -EINVAL;
if (unlikely(check_add_overflow(pos, len, &tmp)))
return -EINVAL;
- return security_file_permission(file, write ? MAY_WRITE : MAY_READ);
+ ret = security_file_permission(file, mask);
+ if (ret)
+ return ret;
+
+ return fsnotify_file_area_perm(file, mask, &pos, len);
}
/*
if (!file_in->f_op->remap_file_range)
return -EOPNOTSUPP;
- ret = remap_verify_area(file_in, pos_in, len, false);
- if (ret)
- return ret;
-
- ret = remap_verify_area(file_out, pos_out, len, true);
- if (ret)
- return ret;
-
ret = file_in->f_op->remap_file_range(file_in, pos_in,
file_out, pos_out, len, remap_flags);
if (ret < 0)
{
loff_t ret;
+ ret = remap_verify_area(file_in, pos_in, len, false);
+ if (ret)
+ return ret;
+
+ ret = remap_verify_area(file_out, pos_out, len, true);
+ if (ret)
+ return ret;
+
file_start_write(file_out);
ret = do_clone_file_range(file_in, pos_in, file_out, pos_out, len,
remap_flags);
EXPORT_SYMBOL(vfs_clone_file_range);
/* Check whether we are allowed to dedupe the destination file */
-static bool allow_file_dedupe(struct file *file)
+static bool may_dedupe_file(struct file *file)
{
struct mnt_idmap *idmap = file_mnt_idmap(file);
struct inode *inode = file_inode(file);
WARN_ON_ONCE(remap_flags & ~(REMAP_FILE_DEDUP |
REMAP_FILE_CAN_SHORTEN));
- ret = mnt_want_write_file(dst_file);
- if (ret)
- return ret;
-
/*
* This is redundant if called from vfs_dedupe_file_range(), but other
* callers need it and it's not performance sesitive...
*/
ret = remap_verify_area(src_file, src_pos, len, false);
if (ret)
- goto out_drop_write;
+ return ret;
ret = remap_verify_area(dst_file, dst_pos, len, true);
if (ret)
- goto out_drop_write;
+ return ret;
+
+ /*
+ * This needs to be called after remap_verify_area() because of
+ * sb_start_write() and before may_dedupe_file() because the mount's
+ * MAY_WRITE need to be checked with mnt_get_write_access_file() held.
+ */
+ ret = mnt_want_write_file(dst_file);
+ if (ret)
+ return ret;
ret = -EPERM;
- if (!allow_file_dedupe(dst_file))
+ if (!may_dedupe_file(dst_file))
goto out_drop_write;
ret = -EXDEV;
#ifdef CONFIG_CIFS_DEBUG2
struct smb_hdr *smb = buf;
- cifs_dbg(VFS, "Cmd: %d Err: 0x%x Flags: 0x%x Flgs2: 0x%x Mid: %d Pid: %d\n",
- smb->Command, smb->Status.CifsError,
- smb->Flags, smb->Flags2, smb->Mid, smb->Pid);
- cifs_dbg(VFS, "smb buf %p len %u\n", smb,
- server->ops->calc_smb_size(smb));
+ cifs_dbg(VFS, "Cmd: %d Err: 0x%x Flags: 0x%x Flgs2: 0x%x Mid: %d Pid: %d Wct: %d\n",
+ smb->Command, smb->Status.CifsError, smb->Flags,
+ smb->Flags2, smb->Mid, smb->Pid, smb->WordCount);
+ if (!server->ops->check_message(buf, server->total_read, server)) {
+ cifs_dbg(VFS, "smb buf %p len %u\n", smb,
+ server->ops->calc_smb_size(smb));
+ }
#endif /* CONFIG_CIFS_DEBUG2 */
}
#include <linux/freezer.h>
#include <linux/namei.h>
#include <linux/random.h>
+#include <linux/splice.h>
#include <linux/uuid.h>
#include <linux/xattr.h>
#include <uapi/linux/magic.h>
free_xid(xid);
if (rc == -EOPNOTSUPP || rc == -EXDEV)
- rc = generic_copy_file_range(src_file, off, dst_file,
- destoff, len, flags);
+ rc = splice_copy_file_range(src_file, off, dst_file,
+ destoff, len);
return rc;
}
struct mid_q_entry **, char **, int *);
enum securityEnum (*select_sectype)(struct TCP_Server_Info *,
enum securityEnum);
- int (*next_header)(char *);
+ int (*next_header)(struct TCP_Server_Info *server, char *buf,
+ unsigned int *noff);
/* ioctl passthrough for query_info */
int (*ioctl_query_info)(const unsigned int xid,
struct cifs_tcon *tcon,
struct cifs_server_iface *iface = container_of(ref,
struct cifs_server_iface,
refcount);
- list_del_init(&iface->iface_head);
kfree(iface);
}
/* If server is a channel, select the primary channel */
pserver = SERVER_IS_CHAN(server) ? server->primary_server : server;
+ /*
+ * if the server has been marked for termination, there is a
+ * chance that the remaining channels all need reconnect. To be
+ * on the safer side, mark the session and trees for reconnect
+ * for this scenario. This might cause a few redundant session
+ * setup and tree connect requests, but it is better than not doing
+ * a tree connect when needed, and all following requests failing
+ */
+ if (server->terminate) {
+ mark_smb_session = true;
+ server = pserver;
+ }
spin_lock(&cifs_tcp_ses_lock);
list_for_each_entry_safe(ses, nses, &pserver->smb_ses_list, smb_ses_list) {
- /*
- * if channel has been marked for termination, nothing to do
- * for the channel. in fact, we cannot find the channel for the
- * server. So safe to exit here
- */
- if (server->terminate)
- break;
-
/* check if iface is still active */
- if (!cifs_chan_is_iface_active(ses, server))
+ spin_lock(&ses->chan_lock);
+ if (!cifs_chan_is_iface_active(ses, server)) {
+ spin_unlock(&ses->chan_lock);
cifs_chan_update_iface(ses, server);
+ spin_lock(&ses->chan_lock);
+ }
- spin_lock(&ses->chan_lock);
if (!mark_smb_session && cifs_chan_needs_reconnect(ses, server)) {
spin_unlock(&ses->chan_lock);
continue;
server->total_read += length;
if (server->ops->next_header) {
- next_offset = server->ops->next_header(buf);
+ if (server->ops->next_header(server, buf, &next_offset)) {
+ cifs_dbg(VFS, "%s: malformed response (next_offset=%u)\n",
+ __func__, next_offset);
+ cifs_reconnect(server, true);
+ continue;
+ }
if (next_offset)
server->pdu_size = next_offset;
}
/* we do not want atime to be less than mtime, it broke some apps */
atime = inode_set_atime_to_ts(inode, current_time(inode));
mtime = inode_get_mtime(inode);
- if (timespec64_compare(&atime, &mtime))
+ if (timespec64_compare(&atime, &mtime) < 0)
inode_set_atime_to_ts(inode, inode_get_mtime(inode));
if (PAGE_SIZE > rc)
cifs_dbg(VFS, "Length less than smb header size\n");
}
return -EIO;
+ } else if (total_read < sizeof(*smb) + 2 * smb->WordCount) {
+ cifs_dbg(VFS, "%s: can't read BCC due to invalid WordCount(%u)\n",
+ __func__, smb->WordCount);
+ return -EIO;
}
/* otherwise, there is enough to get to the BCC */
cifs_dbg(FYI, "unable to find a suitable iface\n");
}
- if (!chan_index && !iface) {
+ if (!iface) {
cifs_dbg(FYI, "unable to get the interface matching: %pIS\n",
&ss);
spin_unlock(&ses->iface_lock);
}
/* now drop the ref to the current iface */
- if (old_iface && iface) {
+ if (old_iface) {
cifs_dbg(FYI, "replacing iface: %pIS with %pIS\n",
&old_iface->sockaddr,
&iface->sockaddr);
kref_put(&old_iface->refcount, release_iface);
} else if (old_iface) {
- cifs_dbg(FYI, "releasing ref to iface: %pIS\n",
+ /* if a new candidate is not found, keep things as is */
+ cifs_dbg(FYI, "could not replace iface: %pIS\n",
&old_iface->sockaddr);
-
- old_iface->num_channels--;
- if (old_iface->weight_fulfilled)
- old_iface->weight_fulfilled--;
-
- kref_put(&old_iface->refcount, release_iface);
} else if (!chan_index) {
/* special case: update interface for primary channel */
- cifs_dbg(FYI, "referencing primary channel iface: %pIS\n",
- &iface->sockaddr);
- iface->num_channels++;
- iface->weight_fulfilled++;
- } else {
- WARN_ON(!iface);
- cifs_dbg(FYI, "adding new iface: %pIS\n", &iface->sockaddr);
+ if (iface) {
+ cifs_dbg(FYI, "referencing primary channel iface: %pIS\n",
+ &iface->sockaddr);
+ iface->num_channels++;
+ iface->weight_fulfilled++;
+ }
}
spin_unlock(&ses->iface_lock);
- spin_lock(&ses->chan_lock);
- chan_index = cifs_ses_get_chan_index(ses, server);
- if (chan_index == CIFS_INVAL_CHAN_INDEX) {
+ if (iface) {
+ spin_lock(&ses->chan_lock);
+ chan_index = cifs_ses_get_chan_index(ses, server);
+ if (chan_index == CIFS_INVAL_CHAN_INDEX) {
+ spin_unlock(&ses->chan_lock);
+ return 0;
+ }
+
+ ses->chans[chan_index].iface = iface;
spin_unlock(&ses->chan_lock);
- return 0;
}
- ses->chans[chan_index].iface = iface;
-
- /* No iface is found. if secondary chan, drop connection */
- if (!iface && SERVER_IS_CHAN(server))
- ses->chans[chan_index].server = NULL;
-
- spin_unlock(&ses->chan_lock);
-
- if (!iface && SERVER_IS_CHAN(server))
- cifs_put_tcp_session(server, false);
-
return rc;
}
}
mid = le64_to_cpu(shdr->MessageId);
+ if (check_smb2_hdr(shdr, mid))
+ return 1;
+
+ if (shdr->StructureSize != SMB2_HEADER_STRUCTURE_SIZE) {
+ cifs_dbg(VFS, "Invalid structure size %u\n",
+ le16_to_cpu(shdr->StructureSize));
+ return 1;
+ }
+
+ command = le16_to_cpu(shdr->Command);
+ if (command >= NUMBER_OF_SMB2_COMMANDS) {
+ cifs_dbg(VFS, "Invalid SMB2 command %d\n", command);
+ return 1;
+ }
+
if (len < pdu_size) {
if ((len >= hdr_size)
&& (shdr->Status != 0)) {
return 1;
}
- if (check_smb2_hdr(shdr, mid))
- return 1;
-
- if (shdr->StructureSize != SMB2_HEADER_STRUCTURE_SIZE) {
- cifs_dbg(VFS, "Invalid structure size %u\n",
- le16_to_cpu(shdr->StructureSize));
- return 1;
- }
-
- command = le16_to_cpu(shdr->Command);
- if (command >= NUMBER_OF_SMB2_COMMANDS) {
- cifs_dbg(VFS, "Invalid SMB2 command %d\n", command);
- return 1;
- }
-
if (smb2_rsp_struct_sizes[command] != pdu->StructureSize2) {
if (command != SMB2_OPLOCK_BREAK_HE && (shdr->Status == 0 ||
pdu->StructureSize2 != SMB2_ERROR_STRUCTURE_SIZE2_LE)) {
cifs_server_dbg(VFS, "Cmd: %d Err: 0x%x Flags: 0x%x Mid: %llu Pid: %d\n",
shdr->Command, shdr->Status, shdr->Flags, shdr->MessageId,
shdr->Id.SyncId.ProcessId);
- cifs_server_dbg(VFS, "smb buf %p len %u\n", buf,
- server->ops->calc_smb_size(buf));
+ if (!server->ops->check_message(buf, server->total_read, server)) {
+ cifs_server_dbg(VFS, "smb buf %p len %u\n", buf,
+ server->ops->calc_smb_size(buf));
+ }
#endif
}
}
/*
- * Go through iface_list and do kref_put to remove
- * any unused ifaces. ifaces in use will be removed
- * when the last user calls a kref_put on it
+ * Go through iface_list and mark them as inactive
*/
list_for_each_entry_safe(iface, niface, &ses->iface_list,
- iface_head) {
+ iface_head)
iface->is_active = 0;
- kref_put(&iface->refcount, release_iface);
- ses->iface_count--;
- }
+
spin_unlock(&ses->iface_lock);
/*
iface_head) {
ret = iface_cmp(iface, &tmp_iface);
if (!ret) {
- /* just get a ref so that it doesn't get picked/freed */
iface->is_active = 1;
- kref_get(&iface->refcount);
- ses->iface_count++;
spin_unlock(&ses->iface_lock);
goto next_iface;
} else if (ret < 0) {
}
out:
+ /*
+ * Go through the list again and put the inactive entries
+ */
+ spin_lock(&ses->iface_lock);
+ list_for_each_entry_safe(iface, niface, &ses->iface_list,
+ iface_head) {
+ if (!iface->is_active) {
+ list_del(&iface->iface_head);
+ kref_put(&iface->refcount, release_iface);
+ ses->iface_count--;
+ }
+ }
+ spin_unlock(&ses->iface_lock);
+
return rc;
}
goto out;
/* check if iface is still active */
+ spin_lock(&ses->chan_lock);
pserver = ses->chans[0].server;
- if (pserver && !cifs_chan_is_iface_active(ses, pserver))
+ if (pserver && !cifs_chan_is_iface_active(ses, pserver)) {
+ spin_unlock(&ses->chan_lock);
cifs_chan_update_iface(ses, pserver);
+ spin_lock(&ses->chan_lock);
+ }
+ spin_unlock(&ses->chan_lock);
out:
kfree(out_buf);
NULL, 0, false);
}
-static int
-smb2_next_header(char *buf)
+static int smb2_next_header(struct TCP_Server_Info *server, char *buf,
+ unsigned int *noff)
{
struct smb2_hdr *hdr = (struct smb2_hdr *)buf;
struct smb2_transform_hdr *t_hdr = (struct smb2_transform_hdr *)buf;
- if (hdr->ProtocolId == SMB2_TRANSFORM_PROTO_NUM)
- return sizeof(struct smb2_transform_hdr) +
- le32_to_cpu(t_hdr->OriginalMessageSize);
-
- return le32_to_cpu(hdr->NextCommand);
+ if (hdr->ProtocolId == SMB2_TRANSFORM_PROTO_NUM) {
+ *noff = le32_to_cpu(t_hdr->OriginalMessageSize);
+ if (unlikely(check_add_overflow(*noff, sizeof(*t_hdr), noff)))
+ return -EINVAL;
+ } else {
+ *noff = le32_to_cpu(hdr->NextCommand);
+ }
+ if (unlikely(*noff && *noff < MID_HEADER_SIZE(server)))
+ return -EINVAL;
+ return 0;
}
int cifs_sfu_make_node(unsigned int xid, struct inode *inode,
}
if (smb2_command != SMB2_INTERNAL_CMD)
- if (mod_delayed_work(cifsiod_wq, &server->reconnect, 0))
- cifs_put_tcp_session(server, false);
+ mod_delayed_work(cifsiod_wq, &server->reconnect, 0);
atomic_inc(&tconInfoReconnectCount);
out:
void **request_buf, unsigned int *total_len)
{
/* BB eventually switch this to SMB2 specific small buf size */
- if (smb2_command == SMB2_SET_INFO)
+ switch (smb2_command) {
+ case SMB2_SET_INFO:
+ case SMB2_QUERY_INFO:
*request_buf = cifs_buf_get();
- else
+ break;
+ default:
*request_buf = cifs_small_buf_get();
+ break;
+ }
if (*request_buf == NULL) {
/* BB should we add a retry in here if not a writepage? */
return -ENOMEM;
struct smb2_query_info_req *req;
struct kvec *iov = rqst->rq_iov;
unsigned int total_len;
+ size_t len;
int rc;
+ if (unlikely(check_add_overflow(input_len, sizeof(*req), &len) ||
+ len > CIFSMaxBufSize))
+ return -EINVAL;
+
rc = smb2_plain_req_init(SMB2_QUERY_INFO, tcon, server,
(void **) &req, &total_len);
if (rc)
iov[0].iov_base = (char *)req;
/* 1 for Buffer */
- iov[0].iov_len = total_len - 1 + input_len;
+ iov[0].iov_len = len;
return 0;
}
SMB2_query_info_free(struct smb_rqst *rqst)
{
if (rqst && rqst->rq_iov)
- cifs_small_buf_release(rqst->rq_iov[0].iov_base); /* request */
+ cifs_buf_release(rqst->rq_iov[0].iov_base); /* request */
}
static int
return 0;
}
+static inline void free_qfs_info_req(struct kvec *iov)
+{
+ cifs_buf_release(iov->iov_base);
+}
+
int
SMB311_posix_qfs_info(const unsigned int xid, struct cifs_tcon *tcon,
u64 persistent_fid, u64 volatile_fid, struct kstatfs *fsdata)
rc = cifs_send_recv(xid, ses, server,
&rqst, &resp_buftype, flags, &rsp_iov);
- cifs_small_buf_release(iov.iov_base);
+ free_qfs_info_req(&iov);
if (rc) {
cifs_stats_fail_inc(tcon, SMB2_QUERY_INFO_HE);
goto posix_qfsinf_exit;
rc = cifs_send_recv(xid, ses, server,
&rqst, &resp_buftype, flags, &rsp_iov);
- cifs_small_buf_release(iov.iov_base);
+ free_qfs_info_req(&iov);
if (rc) {
cifs_stats_fail_inc(tcon, SMB2_QUERY_INFO_HE);
goto qfsinf_exit;
rc = cifs_send_recv(xid, ses, server,
&rqst, &resp_buftype, flags, &rsp_iov);
- cifs_small_buf_release(iov.iov_base);
+ free_qfs_info_req(&iov);
if (rc) {
cifs_stats_fail_inc(tcon, SMB2_QUERY_INFO_HE);
goto qfsattr_exit;
break;
case SMB2_CREATE:
{
+ unsigned short int name_off =
+ le16_to_cpu(((struct smb2_create_req *)hdr)->NameOffset);
+ unsigned short int name_len =
+ le16_to_cpu(((struct smb2_create_req *)hdr)->NameLength);
+
if (((struct smb2_create_req *)hdr)->CreateContextsLength) {
*off = le32_to_cpu(((struct smb2_create_req *)
hdr)->CreateContextsOffset);
*len = le32_to_cpu(((struct smb2_create_req *)
hdr)->CreateContextsLength);
- break;
+ if (!name_len)
+ break;
+
+ if (name_off + name_len < (u64)*off + *len)
+ break;
}
- *off = le16_to_cpu(((struct smb2_create_req *)hdr)->NameOffset);
- *len = le16_to_cpu(((struct smb2_create_req *)hdr)->NameLength);
+ *off = name_off;
+ *len = name_len;
break;
}
case SMB2_QUERY_INFO:
unsigned int tail = pipe->tail;
unsigned int head = pipe->head;
unsigned int mask = pipe->ring_size - 1;
- int ret = 0, page_nr = 0;
+ ssize_t ret = 0;
+ int page_nr = 0;
if (!spd_pages)
return 0;
.u.file = out,
};
int nbufs = pipe->max_usage;
- struct bio_vec *array = kcalloc(nbufs, sizeof(struct bio_vec),
- GFP_KERNEL);
+ struct bio_vec *array;
ssize_t ret;
+ if (!out->f_op->write_iter)
+ return -EINVAL;
+
+ array = kcalloc(nbufs, sizeof(struct bio_vec), GFP_KERNEL);
if (unlikely(!array))
return -ENOMEM;
splice_from_pipe_begin(&sd);
while (sd.total_len) {
+ struct kiocb kiocb;
struct iov_iter from;
unsigned int head, tail, mask;
size_t left;
}
iov_iter_bvec(&from, ITER_SOURCE, array, n, sd.total_len - left);
- ret = vfs_iter_write(out, &from, &sd.pos, 0);
+ init_sync_kiocb(&kiocb, out);
+ kiocb.ki_pos = sd.pos;
+ ret = call_write_iter(out, &kiocb, &from);
+ sd.pos = kiocb.ki_pos;
if (ret <= 0)
break;
/*
* Attempt to initiate a splice from pipe to file.
*/
-static long do_splice_from(struct pipe_inode_info *pipe, struct file *out,
- loff_t *ppos, size_t len, unsigned int flags)
+static ssize_t do_splice_from(struct pipe_inode_info *pipe, struct file *out,
+ loff_t *ppos, size_t len, unsigned int flags)
{
if (unlikely(!out->f_op->splice_write))
return warn_unsupported(out, "write");
sd->splice_eof(sd);
}
-/**
- * vfs_splice_read - Read data from a file and splice it into a pipe
- * @in: File to splice from
- * @ppos: Input file offset
- * @pipe: Pipe to splice to
- * @len: Number of bytes to splice
- * @flags: Splice modifier flags (SPLICE_F_*)
- *
- * Splice the requested amount of data from the input file to the pipe. This
- * is synchronous as the caller must hold the pipe lock across the entire
- * operation.
- *
- * If successful, it returns the amount of data spliced, 0 if it hit the EOF or
- * a hole and a negative error code otherwise.
+/*
+ * Callers already called rw_verify_area() on the entire range.
+ * No need to call it for sub ranges.
*/
-long vfs_splice_read(struct file *in, loff_t *ppos,
- struct pipe_inode_info *pipe, size_t len,
- unsigned int flags)
+static ssize_t do_splice_read(struct file *in, loff_t *ppos,
+ struct pipe_inode_info *pipe, size_t len,
+ unsigned int flags)
{
unsigned int p_space;
- int ret;
if (unlikely(!(in->f_mode & FMODE_READ)))
return -EBADF;
p_space = pipe->max_usage - pipe_occupancy(pipe->head, pipe->tail);
len = min_t(size_t, len, p_space << PAGE_SHIFT);
- ret = rw_verify_area(READ, in, ppos, len);
- if (unlikely(ret < 0))
- return ret;
-
if (unlikely(len > MAX_RW_COUNT))
len = MAX_RW_COUNT;
return copy_splice_read(in, ppos, pipe, len, flags);
return in->f_op->splice_read(in, ppos, pipe, len, flags);
}
+
+/**
+ * vfs_splice_read - Read data from a file and splice it into a pipe
+ * @in: File to splice from
+ * @ppos: Input file offset
+ * @pipe: Pipe to splice to
+ * @len: Number of bytes to splice
+ * @flags: Splice modifier flags (SPLICE_F_*)
+ *
+ * Splice the requested amount of data from the input file to the pipe. This
+ * is synchronous as the caller must hold the pipe lock across the entire
+ * operation.
+ *
+ * If successful, it returns the amount of data spliced, 0 if it hit the EOF or
+ * a hole and a negative error code otherwise.
+ */
+ssize_t vfs_splice_read(struct file *in, loff_t *ppos,
+ struct pipe_inode_info *pipe, size_t len,
+ unsigned int flags)
+{
+ ssize_t ret;
+
+ ret = rw_verify_area(READ, in, ppos, len);
+ if (unlikely(ret < 0))
+ return ret;
+
+ return do_splice_read(in, ppos, pipe, len, flags);
+}
EXPORT_SYMBOL_GPL(vfs_splice_read);
/**
splice_direct_actor *actor)
{
struct pipe_inode_info *pipe;
- long ret, bytes;
+ ssize_t ret, bytes;
size_t len;
int i, flags, more;
size_t read_len;
loff_t pos = sd->pos, prev_pos = pos;
- ret = vfs_splice_read(in, &pos, pipe, len, flags);
+ ret = do_splice_read(in, &pos, pipe, len, flags);
if (unlikely(ret <= 0))
goto read_failure;
struct splice_desc *sd)
{
struct file *file = sd->u.file;
+ long ret;
+
+ file_start_write(file);
+ ret = do_splice_from(pipe, file, sd->opos, sd->total_len, sd->flags);
+ file_end_write(file);
+ return ret;
+}
+
+static int splice_file_range_actor(struct pipe_inode_info *pipe,
+ struct splice_desc *sd)
+{
+ struct file *file = sd->u.file;
- return do_splice_from(pipe, file, sd->opos, sd->total_len,
- sd->flags);
+ return do_splice_from(pipe, file, sd->opos, sd->total_len, sd->flags);
}
static void direct_file_splice_eof(struct splice_desc *sd)
file->f_op->splice_eof(file);
}
-/**
- * do_splice_direct - splices data directly between two files
- * @in: file to splice from
- * @ppos: input file offset
- * @out: file to splice to
- * @opos: output file offset
- * @len: number of bytes to splice
- * @flags: splice modifier flags
- *
- * Description:
- * For use by do_sendfile(). splice can easily emulate sendfile, but
- * doing it in the application would incur an extra system call
- * (splice in + splice out, as compared to just sendfile()). So this helper
- * can splice directly through a process-private pipe.
- *
- */
-long do_splice_direct(struct file *in, loff_t *ppos, struct file *out,
- loff_t *opos, size_t len, unsigned int flags)
+static ssize_t do_splice_direct_actor(struct file *in, loff_t *ppos,
+ struct file *out, loff_t *opos,
+ size_t len, unsigned int flags,
+ splice_direct_actor *actor)
{
struct splice_desc sd = {
.len = len,
.splice_eof = direct_file_splice_eof,
.opos = opos,
};
- long ret;
+ ssize_t ret;
if (unlikely(!(out->f_mode & FMODE_WRITE)))
return -EBADF;
if (unlikely(out->f_flags & O_APPEND))
return -EINVAL;
- ret = rw_verify_area(WRITE, out, opos, len);
- if (unlikely(ret < 0))
- return ret;
-
- ret = splice_direct_to_actor(in, &sd, direct_splice_actor);
+ ret = splice_direct_to_actor(in, &sd, actor);
if (ret > 0)
*ppos = sd.pos;
return ret;
}
+/**
+ * do_splice_direct - splices data directly between two files
+ * @in: file to splice from
+ * @ppos: input file offset
+ * @out: file to splice to
+ * @opos: output file offset
+ * @len: number of bytes to splice
+ * @flags: splice modifier flags
+ *
+ * Description:
+ * For use by do_sendfile(). splice can easily emulate sendfile, but
+ * doing it in the application would incur an extra system call
+ * (splice in + splice out, as compared to just sendfile()). So this helper
+ * can splice directly through a process-private pipe.
+ *
+ * Callers already called rw_verify_area() on the entire range.
+ */
+ssize_t do_splice_direct(struct file *in, loff_t *ppos, struct file *out,
+ loff_t *opos, size_t len, unsigned int flags)
+{
+ return do_splice_direct_actor(in, ppos, out, opos, len, flags,
+ direct_splice_actor);
+}
EXPORT_SYMBOL(do_splice_direct);
+/**
+ * splice_file_range - splices data between two files for copy_file_range()
+ * @in: file to splice from
+ * @ppos: input file offset
+ * @out: file to splice to
+ * @opos: output file offset
+ * @len: number of bytes to splice
+ *
+ * Description:
+ * For use by ->copy_file_range() methods.
+ * Like do_splice_direct(), but vfs_copy_file_range() already holds
+ * start_file_write() on @out file.
+ *
+ * Callers already called rw_verify_area() on the entire range.
+ */
+ssize_t splice_file_range(struct file *in, loff_t *ppos, struct file *out,
+ loff_t *opos, size_t len)
+{
+ lockdep_assert(file_write_started(out));
+
+ return do_splice_direct_actor(in, ppos, out, opos,
+ min_t(size_t, len, MAX_RW_COUNT),
+ 0, splice_file_range_actor);
+}
+EXPORT_SYMBOL(splice_file_range);
+
static int wait_for_space(struct pipe_inode_info *pipe, unsigned flags)
{
for (;;) {
struct pipe_inode_info *opipe,
size_t len, unsigned int flags);
-long splice_file_to_pipe(struct file *in,
- struct pipe_inode_info *opipe,
- loff_t *offset,
- size_t len, unsigned int flags)
+ssize_t splice_file_to_pipe(struct file *in,
+ struct pipe_inode_info *opipe,
+ loff_t *offset,
+ size_t len, unsigned int flags)
{
- long ret;
+ ssize_t ret;
pipe_lock(opipe);
ret = wait_for_space(opipe, flags);
if (!ret)
- ret = vfs_splice_read(in, offset, opipe, len, flags);
+ ret = do_splice_read(in, offset, opipe, len, flags);
pipe_unlock(opipe);
if (ret > 0)
wakeup_pipe_readers(opipe);
/*
* Determine where to splice to/from.
*/
-long do_splice(struct file *in, loff_t *off_in, struct file *out,
- loff_t *off_out, size_t len, unsigned int flags)
+ssize_t do_splice(struct file *in, loff_t *off_in, struct file *out,
+ loff_t *off_out, size_t len, unsigned int flags)
{
struct pipe_inode_info *ipipe;
struct pipe_inode_info *opipe;
loff_t offset;
- long ret;
+ ssize_t ret;
if (unlikely(!(in->f_mode & FMODE_READ) ||
!(out->f_mode & FMODE_WRITE)))
offset = in->f_pos;
}
+ ret = rw_verify_area(READ, in, &offset, len);
+ if (unlikely(ret < 0))
+ return ret;
+
if (out->f_flags & O_NONBLOCK)
flags |= SPLICE_F_NONBLOCK;
return ret;
}
-static long __do_splice(struct file *in, loff_t __user *off_in,
- struct file *out, loff_t __user *off_out,
- size_t len, unsigned int flags)
+static ssize_t __do_splice(struct file *in, loff_t __user *off_in,
+ struct file *out, loff_t __user *off_out,
+ size_t len, unsigned int flags)
{
struct pipe_inode_info *ipipe;
struct pipe_inode_info *opipe;
loff_t offset, *__off_in = NULL, *__off_out = NULL;
- long ret;
+ ssize_t ret;
ipipe = get_pipe_info(in, true);
opipe = get_pipe_info(out, true);
return ret;
}
-static int iter_to_pipe(struct iov_iter *from,
- struct pipe_inode_info *pipe,
- unsigned flags)
+static ssize_t iter_to_pipe(struct iov_iter *from,
+ struct pipe_inode_info *pipe,
+ unsigned int flags)
{
struct pipe_buffer buf = {
.ops = &user_page_pipe_buf_ops,
.flags = flags
};
size_t total = 0;
- int ret = 0;
+ ssize_t ret = 0;
while (iov_iter_count(from)) {
struct page *pages[16];
* For lack of a better implementation, implement vmsplice() to userspace
* as a simple copy of the pipes pages to the user iov.
*/
-static long vmsplice_to_user(struct file *file, struct iov_iter *iter,
- unsigned int flags)
+static ssize_t vmsplice_to_user(struct file *file, struct iov_iter *iter,
+ unsigned int flags)
{
struct pipe_inode_info *pipe = get_pipe_info(file, true);
struct splice_desc sd = {
.flags = flags,
.u.data = iter
};
- long ret = 0;
+ ssize_t ret = 0;
if (!pipe)
return -EBADF;
* as splice-from-memory, where the regular splice is splice-from-file (or
* to file). In both cases the output is a pipe, naturally.
*/
-static long vmsplice_to_pipe(struct file *file, struct iov_iter *iter,
- unsigned int flags)
+static ssize_t vmsplice_to_pipe(struct file *file, struct iov_iter *iter,
+ unsigned int flags)
{
struct pipe_inode_info *pipe;
- long ret = 0;
+ ssize_t ret = 0;
unsigned buf_flag = 0;
if (flags & SPLICE_F_GIFT)
size_t, len, unsigned int, flags)
{
struct fd in, out;
- long error;
+ ssize_t error;
if (unlikely(!len))
return 0;
out = fdget(fd_out);
if (out.file) {
error = __do_splice(in.file, off_in, out.file, off_out,
- len, flags);
+ len, flags);
fdput(out);
}
fdput(in);
/*
* Link contents of ipipe to opipe.
*/
-static int link_pipe(struct pipe_inode_info *ipipe,
- struct pipe_inode_info *opipe,
- size_t len, unsigned int flags)
+static ssize_t link_pipe(struct pipe_inode_info *ipipe,
+ struct pipe_inode_info *opipe,
+ size_t len, unsigned int flags)
{
struct pipe_buffer *ibuf, *obuf;
unsigned int i_head, o_head;
unsigned int i_tail, o_tail;
unsigned int i_mask, o_mask;
- int ret = 0;
+ ssize_t ret = 0;
/*
* Potential ABBA deadlock, work around it by ordering lock
* The 'flags' used are the SPLICE_F_* variants, currently the only
* applicable one is SPLICE_F_NONBLOCK.
*/
-long do_tee(struct file *in, struct file *out, size_t len, unsigned int flags)
+ssize_t do_tee(struct file *in, struct file *out, size_t len,
+ unsigned int flags)
{
struct pipe_inode_info *ipipe = get_pipe_info(in, true);
struct pipe_inode_info *opipe = get_pipe_info(out, true);
- int ret = -EINVAL;
+ ssize_t ret = -EINVAL;
if (unlikely(!(in->f_mode & FMODE_READ) ||
!(out->f_mode & FMODE_WRITE)))
SYSCALL_DEFINE4(tee, int, fdin, int, fdout, size_t, len, unsigned int, flags)
{
struct fd in, out;
- int error;
+ ssize_t error;
if (unlikely(flags & ~SPLICE_F_ALL))
return -EINVAL;
* the vfsmount must be passed through @idmap. This function will then
* take care to map the inode according to @idmap before filling in the
* uid and gid filds. On non-idmapped mounts or if permission checking is to be
- * performed on the raw inode simply passs @nop_mnt_idmap.
+ * performed on the raw inode simply pass @nop_mnt_idmap.
*/
void generic_fillattr(struct mnt_idmap *idmap, u32 request_mask,
struct inode *inode, struct kstat *stat)
error = vfs_getattr(&path, stat, request_mask, flags);
- stat->mnt_id = real_mount(path.mnt)->mnt_id;
- stat->result_mask |= STATX_MNT_ID;
+ if (request_mask & STATX_MNT_ID_UNIQUE) {
+ stat->mnt_id = real_mount(path.mnt)->mnt_id_unique;
+ stat->result_mask |= STATX_MNT_ID_UNIQUE;
+ } else {
+ stat->mnt_id = real_mount(path.mnt)->mnt_id;
+ stat->result_mask |= STATX_MNT_ID;
+ }
if (path.mnt->mnt_root == path.dentry)
stat->attributes |= STATX_ATTR_MOUNT_ROOT;
super_unlock(sb, false);
}
-static inline bool wait_born(struct super_block *sb)
+static bool super_flags(const struct super_block *sb, unsigned int flags)
{
- unsigned int flags;
-
/*
* Pairs with smp_store_release() in super_wake() and ensures
- * that we see SB_BORN or SB_DYING after we're woken.
+ * that we see @flags after we're woken.
*/
- flags = smp_load_acquire(&sb->s_flags);
- return flags & (SB_BORN | SB_DYING);
+ return smp_load_acquire(&sb->s_flags) & flags;
}
/**
*
* The caller must have acquired a temporary reference on @sb->s_count.
*
- * Return: This returns true if SB_BORN was set, false if SB_DYING was
- * set. The function acquires s_umount and returns with it held.
+ * Return: The function returns true if SB_BORN was set and with
+ * s_umount held. The function returns false if SB_DYING was
+ * set and without s_umount held.
*/
static __must_check bool super_lock(struct super_block *sb, bool excl)
{
-
lockdep_assert_not_held(&sb->s_umount);
-relock:
+ /* wait until the superblock is ready or dying */
+ wait_var_event(&sb->s_flags, super_flags(sb, SB_BORN | SB_DYING));
+
+ /* Don't pointlessly acquire s_umount. */
+ if (super_flags(sb, SB_DYING))
+ return false;
+
__super_lock(sb, excl);
/*
* @sb->s_root is NULL and @sb->s_active is 0. No one needs to
* grab a reference to this. Tell them so.
*/
- if (sb->s_flags & SB_DYING)
+ if (sb->s_flags & SB_DYING) {
+ super_unlock(sb, excl);
return false;
+ }
- /* Has called ->get_tree() successfully. */
- if (sb->s_flags & SB_BORN)
- return true;
-
- super_unlock(sb, excl);
-
- /* wait until the superblock is ready or dying */
- wait_var_event(&sb->s_flags, wait_born(sb));
-
- /*
- * Neither SB_BORN nor SB_DYING are ever unset so we never loop.
- * Just reacquire @sb->s_umount for the caller.
- */
- goto relock;
+ WARN_ON_ONCE(!(sb->s_flags & SB_BORN));
+ return true;
}
-/* wait and acquire read-side of @sb->s_umount */
+/* wait and try to acquire read-side of @sb->s_umount */
static inline bool super_lock_shared(struct super_block *sb)
{
return super_lock(sb, false);
}
-/* wait and acquire write-side of @sb->s_umount */
+/* wait and try to acquire write-side of @sb->s_umount */
static inline bool super_lock_excl(struct super_block *sb)
{
return super_lock(sb, true);
static struct super_block *alloc_super(struct file_system_type *type, int flags,
struct user_namespace *user_ns)
{
- struct super_block *s = kzalloc(sizeof(struct super_block), GFP_USER);
+ struct super_block *s = kzalloc(sizeof(struct super_block), GFP_KERNEL);
static const struct super_operations default_op;
int i;
EXPORT_SYMBOL(deactivate_super);
/**
- * grab_super - acquire an active reference
- * @s: reference we are trying to make active
- *
- * Tries to acquire an active reference. grab_super() is used when we
- * had just found a superblock in super_blocks or fs_type->fs_supers
- * and want to turn it into a full-blown active reference. grab_super()
- * is called with sb_lock held and drops it. Returns 1 in case of
- * success, 0 if we had failed (superblock contents was already dead or
- * dying when grab_super() had been called). Note that this is only
- * called for superblocks not in rundown mode (== ones still on ->fs_supers
- * of their type), so increment of ->s_count is OK here.
- */
-static int grab_super(struct super_block *s) __releases(sb_lock)
-{
- bool born;
-
- s->s_count++;
- spin_unlock(&sb_lock);
- born = super_lock_excl(s);
- if (born && atomic_inc_not_zero(&s->s_active)) {
- put_super(s);
- return 1;
- }
- super_unlock_excl(s);
- put_super(s);
- return 0;
-}
-
-static inline bool wait_dead(struct super_block *sb)
-{
- unsigned int flags;
-
- /*
- * Pairs with memory barrier in super_wake() and ensures
- * that we see SB_DEAD after we're woken.
- */
- flags = smp_load_acquire(&sb->s_flags);
- return flags & SB_DEAD;
-}
-
-/**
- * grab_super_dead - acquire an active reference to a superblock
+ * grab_super - acquire an active reference to a superblock
* @sb: superblock to acquire
*
* Acquire a temporary reference on a superblock and try to trade it for
* Return: This returns true if an active reference could be acquired,
* false if not.
*/
-static bool grab_super_dead(struct super_block *sb)
+static bool grab_super(struct super_block *sb)
{
+ bool locked;
sb->s_count++;
- if (grab_super(sb)) {
- put_super(sb);
- lockdep_assert_held(&sb->s_umount);
- return true;
+ spin_unlock(&sb_lock);
+ locked = super_lock_excl(sb);
+ if (locked) {
+ if (atomic_inc_not_zero(&sb->s_active)) {
+ put_super(sb);
+ return true;
+ }
+ super_unlock_excl(sb);
}
- wait_var_event(&sb->s_flags, wait_dead(sb));
- lockdep_assert_not_held(&sb->s_umount);
+ wait_var_event(&sb->s_flags, super_flags(sb, SB_DEAD));
put_super(sb);
return false;
}
warnfc(fc, "reusing existing filesystem in another namespace not allowed");
return ERR_PTR(-EBUSY);
}
- if (!grab_super_dead(old))
+ if (!grab_super(old))
goto retry;
destroy_unused_super(s);
return old;
destroy_unused_super(s);
return ERR_PTR(-EBUSY);
}
- if (!grab_super_dead(old))
+ if (!grab_super(old))
goto retry;
destroy_unused_super(s);
return old;
spin_lock(&sb_lock);
list_for_each_entry(sb, &super_blocks, s_list) {
- /* Pairs with memory marrier in super_wake(). */
- if (smp_load_acquire(&sb->s_flags) & SB_DYING)
+ if (super_flags(sb, SB_DYING))
continue;
sb->s_count++;
spin_unlock(&sb_lock);
spin_lock(&sb_lock);
list_for_each_entry(sb, &super_blocks, s_list) {
- bool born;
+ bool locked;
sb->s_count++;
spin_unlock(&sb_lock);
- born = super_lock_shared(sb);
- if (born && sb->s_root)
- f(sb, arg);
- super_unlock_shared(sb);
+ locked = super_lock_shared(sb);
+ if (locked) {
+ if (sb->s_root)
+ f(sb, arg);
+ super_unlock_shared(sb);
+ }
spin_lock(&sb_lock);
if (p)
spin_lock(&sb_lock);
hlist_for_each_entry(sb, &type->fs_supers, s_instances) {
- bool born;
+ bool locked;
sb->s_count++;
spin_unlock(&sb_lock);
- born = super_lock_shared(sb);
- if (born && sb->s_root)
- f(sb, arg);
- super_unlock_shared(sb);
+ locked = super_lock_shared(sb);
+ if (locked) {
+ if (sb->s_root)
+ f(sb, arg);
+ super_unlock_shared(sb);
+ }
spin_lock(&sb_lock);
if (p)
EXPORT_SYMBOL(iterate_supers_type);
-/**
- * get_active_super - get an active reference to the superblock of a device
- * @bdev: device to get the superblock for
- *
- * Scans the superblock list and finds the superblock of the file system
- * mounted on the device given. Returns the superblock with an active
- * reference or %NULL if none was found.
- */
-struct super_block *get_active_super(struct block_device *bdev)
-{
- struct super_block *sb;
-
- if (!bdev)
- return NULL;
-
- spin_lock(&sb_lock);
- list_for_each_entry(sb, &super_blocks, s_list) {
- if (sb->s_bdev == bdev) {
- if (!grab_super(sb))
- return NULL;
- super_unlock_excl(sb);
- return sb;
- }
- }
- spin_unlock(&sb_lock);
- return NULL;
-}
-
struct super_block *user_get_super(dev_t dev, bool excl)
{
struct super_block *sb;
spin_lock(&sb_lock);
list_for_each_entry(sb, &super_blocks, s_list) {
if (sb->s_dev == dev) {
- bool born;
+ bool locked;
sb->s_count++;
spin_unlock(&sb_lock);
/* still alive? */
- born = super_lock(sb, excl);
- if (born && sb->s_root)
- return sb;
- super_unlock(sb, excl);
+ locked = super_lock(sb, excl);
+ if (locked) {
+ if (sb->s_root)
+ return sb;
+ super_unlock(sb, excl);
+ }
/* nope, got unmounted */
spin_lock(&sb_lock);
__put_super(sb);
static void do_emergency_remount_callback(struct super_block *sb)
{
- bool born = super_lock_excl(sb);
+ bool locked = super_lock_excl(sb);
- if (born && sb->s_root && sb->s_bdev && !sb_rdonly(sb)) {
+ if (locked && sb->s_root && sb->s_bdev && !sb_rdonly(sb)) {
struct fs_context *fc;
fc = fs_context_for_reconfigure(sb->s_root,
put_fs_context(fc);
}
}
- super_unlock_excl(sb);
+ if (locked)
+ super_unlock_excl(sb);
}
static void do_emergency_remount(struct work_struct *work)
static void do_thaw_all_callback(struct super_block *sb)
{
- bool born = super_lock_excl(sb);
+ bool locked = super_lock_excl(sb);
- if (born && sb->s_root) {
+ if (locked && sb->s_root) {
if (IS_ENABLED(CONFIG_BLOCK))
- while (sb->s_bdev && !thaw_bdev(sb->s_bdev))
+ while (sb->s_bdev && !bdev_thaw(sb->s_bdev))
pr_warn("Emergency Thaw on %pg\n", sb->s_bdev);
thaw_super_locked(sb, FREEZE_HOLDER_USERSPACE);
- } else {
- super_unlock_excl(sb);
+ return;
}
+ if (locked)
+ super_unlock_excl(sb);
}
static void do_thaw_all(struct work_struct *work)
*
* The function must be called with bdev->bd_holder_lock and releases it.
*/
-static struct super_block *bdev_super_lock_shared(struct block_device *bdev)
+static struct super_block *bdev_super_lock(struct block_device *bdev, bool excl)
__releases(&bdev->bd_holder_lock)
{
struct super_block *sb = bdev->bd_holder;
- bool born;
+ bool locked;
lockdep_assert_held(&bdev->bd_holder_lock);
lockdep_assert_not_held(&sb->s_umount);
spin_lock(&sb_lock);
sb->s_count++;
spin_unlock(&sb_lock);
+
mutex_unlock(&bdev->bd_holder_lock);
- born = super_lock_shared(sb);
- if (!born || !sb->s_root || !(sb->s_flags & SB_ACTIVE)) {
- super_unlock_shared(sb);
- put_super(sb);
- return NULL;
- }
+ locked = super_lock(sb, excl);
+
/*
- * The superblock is active and we hold s_umount, we can drop our
- * temporary reference now.
- */
+ * If the superblock wasn't already SB_DYING then we hold
+ * s_umount and can safely drop our temporary reference.
+ */
put_super(sb);
+
+ if (!locked)
+ return NULL;
+
+ if (!sb->s_root || !(sb->s_flags & SB_ACTIVE)) {
+ super_unlock(sb, excl);
+ return NULL;
+ }
+
return sb;
}
{
struct super_block *sb;
- sb = bdev_super_lock_shared(bdev);
+ sb = bdev_super_lock(bdev, false);
if (!sb)
return;
{
struct super_block *sb;
- sb = bdev_super_lock_shared(bdev);
+ sb = bdev_super_lock(bdev, false);
if (!sb)
return;
+
sync_filesystem(sb);
super_unlock_shared(sb);
}
+static struct super_block *get_bdev_super(struct block_device *bdev)
+{
+ bool active = false;
+ struct super_block *sb;
+
+ sb = bdev_super_lock(bdev, true);
+ if (sb) {
+ active = atomic_inc_not_zero(&sb->s_active);
+ super_unlock_excl(sb);
+ }
+ if (!active)
+ return NULL;
+ return sb;
+}
+
+/**
+ * fs_bdev_freeze - freeze owning filesystem of block device
+ * @bdev: block device
+ *
+ * Freeze the filesystem that owns this block device if it is still
+ * active.
+ *
+ * A filesystem that owns multiple block devices may be frozen from each
+ * block device and won't be unfrozen until all block devices are
+ * unfrozen. Each block device can only freeze the filesystem once as we
+ * nest freezes for block devices in the block layer.
+ *
+ * Return: If the freeze was successful zero is returned. If the freeze
+ * failed a negative error code is returned.
+ */
+static int fs_bdev_freeze(struct block_device *bdev)
+{
+ struct super_block *sb;
+ int error = 0;
+
+ lockdep_assert_held(&bdev->bd_fsfreeze_mutex);
+
+ sb = get_bdev_super(bdev);
+ if (!sb)
+ return -EINVAL;
+
+ if (sb->s_op->freeze_super)
+ error = sb->s_op->freeze_super(sb,
+ FREEZE_MAY_NEST | FREEZE_HOLDER_USERSPACE);
+ else
+ error = freeze_super(sb,
+ FREEZE_MAY_NEST | FREEZE_HOLDER_USERSPACE);
+ if (!error)
+ error = sync_blockdev(bdev);
+ deactivate_super(sb);
+ return error;
+}
+
+/**
+ * fs_bdev_thaw - thaw owning filesystem of block device
+ * @bdev: block device
+ *
+ * Thaw the filesystem that owns this block device.
+ *
+ * A filesystem that owns multiple block devices may be frozen from each
+ * block device and won't be unfrozen until all block devices are
+ * unfrozen. Each block device can only freeze the filesystem once as we
+ * nest freezes for block devices in the block layer.
+ *
+ * Return: If the thaw was successful zero is returned. If the thaw
+ * failed a negative error code is returned. If this function
+ * returns zero it doesn't mean that the filesystem is unfrozen
+ * as it may have been frozen multiple times (kernel may hold a
+ * freeze or might be frozen from other block devices).
+ */
+static int fs_bdev_thaw(struct block_device *bdev)
+{
+ struct super_block *sb;
+ int error;
+
+ lockdep_assert_held(&bdev->bd_fsfreeze_mutex);
+
+ sb = get_bdev_super(bdev);
+ if (WARN_ON_ONCE(!sb))
+ return -EINVAL;
+
+ if (sb->s_op->thaw_super)
+ error = sb->s_op->thaw_super(sb,
+ FREEZE_MAY_NEST | FREEZE_HOLDER_USERSPACE);
+ else
+ error = thaw_super(sb,
+ FREEZE_MAY_NEST | FREEZE_HOLDER_USERSPACE);
+ deactivate_super(sb);
+ return error;
+}
+
const struct blk_holder_ops fs_holder_ops = {
.mark_dead = fs_bdev_mark_dead,
.sync = fs_bdev_sync,
+ .freeze = fs_bdev_freeze,
+ .thaw = fs_bdev_thaw,
};
EXPORT_SYMBOL_GPL(fs_holder_ops);
}
/*
- * Until SB_BORN flag is set, there can be no active superblock
- * references and thus no filesystem freezing. get_active_super() will
- * just loop waiting for SB_BORN so even freeze_bdev() cannot proceed.
- *
- * It is enough to check bdev was not frozen before we set s_bdev.
+ * It is enough to check bdev was not frozen before we set
+ * s_bdev as freezing will wait until SB_BORN is set.
*/
- mutex_lock(&bdev->bd_fsfreeze_mutex);
- if (bdev->bd_fsfreeze_count > 0) {
- mutex_unlock(&bdev->bd_fsfreeze_mutex);
+ if (atomic_read(&bdev->bd_fsfreeze_count) > 0) {
if (fc)
warnf(fc, "%pg: Can't mount, blockdev is frozen", bdev);
bdev_release(bdev_handle);
if (bdev_stable_writes(bdev))
sb->s_iflags |= SB_I_STABLE_WRITES;
spin_unlock(&sb_lock);
- mutex_unlock(&bdev->bd_fsfreeze_mutex);
snprintf(sb->s_id, sizeof(sb->s_id), "%pg", bdev);
shrinker_debugfs_rename(sb->s_shrink, "sb-%s:%s", sb->s_type->name,
return -EBUSY;
}
} else {
- /*
- * We drop s_umount here because we need to open the bdev and
- * bdev->open_mutex ranks above s_umount (blkdev_put() ->
- * bdev_mark_dead()). It is safe because we have active sb
- * reference and SB_BORN is not set yet.
- */
- super_unlock_excl(s);
error = setup_bdev_super(s, fc->sb_flags, fc);
- __super_lock_excl(s);
if (!error)
error = fill_super(s, fc);
if (error) {
return ERR_PTR(-EBUSY);
}
} else {
- /*
- * We drop s_umount here because we need to open the bdev and
- * bdev->open_mutex ranks above s_umount (blkdev_put() ->
- * bdev_mark_dead()). It is safe because we have active sb
- * reference and SB_BORN is not set yet.
- */
- super_unlock_excl(s);
error = setup_bdev_super(s, flags, NULL);
- __super_lock_excl(s);
if (!error)
error = fill_super(s, data, flags & SB_SILENT ? 1 : 0);
if (error) {
return ret;
}
+#define FREEZE_HOLDERS (FREEZE_HOLDER_KERNEL | FREEZE_HOLDER_USERSPACE)
+#define FREEZE_FLAGS (FREEZE_HOLDERS | FREEZE_MAY_NEST)
+
+static inline int freeze_inc(struct super_block *sb, enum freeze_holder who)
+{
+ WARN_ON_ONCE((who & ~FREEZE_FLAGS));
+ WARN_ON_ONCE(hweight32(who & FREEZE_HOLDERS) > 1);
+
+ if (who & FREEZE_HOLDER_KERNEL)
+ ++sb->s_writers.freeze_kcount;
+ if (who & FREEZE_HOLDER_USERSPACE)
+ ++sb->s_writers.freeze_ucount;
+ return sb->s_writers.freeze_kcount + sb->s_writers.freeze_ucount;
+}
+
+static inline int freeze_dec(struct super_block *sb, enum freeze_holder who)
+{
+ WARN_ON_ONCE((who & ~FREEZE_FLAGS));
+ WARN_ON_ONCE(hweight32(who & FREEZE_HOLDERS) > 1);
+
+ if ((who & FREEZE_HOLDER_KERNEL) && sb->s_writers.freeze_kcount)
+ --sb->s_writers.freeze_kcount;
+ if ((who & FREEZE_HOLDER_USERSPACE) && sb->s_writers.freeze_ucount)
+ --sb->s_writers.freeze_ucount;
+ return sb->s_writers.freeze_kcount + sb->s_writers.freeze_ucount;
+}
+
+static inline bool may_freeze(struct super_block *sb, enum freeze_holder who)
+{
+ WARN_ON_ONCE((who & ~FREEZE_FLAGS));
+ WARN_ON_ONCE(hweight32(who & FREEZE_HOLDERS) > 1);
+
+ if (who & FREEZE_HOLDER_KERNEL)
+ return (who & FREEZE_MAY_NEST) ||
+ sb->s_writers.freeze_kcount == 0;
+ if (who & FREEZE_HOLDER_USERSPACE)
+ return (who & FREEZE_MAY_NEST) ||
+ sb->s_writers.freeze_ucount == 0;
+ return false;
+}
+
/**
* freeze_super - lock the filesystem and force it into a consistent state
* @sb: the super to lock
* @who should be:
* * %FREEZE_HOLDER_USERSPACE if userspace wants to freeze the fs;
* * %FREEZE_HOLDER_KERNEL if the kernel wants to freeze the fs.
+ * * %FREEZE_MAY_NEST whether nesting freeze and thaw requests is allowed.
*
* The @who argument distinguishes between the kernel and userspace trying to
* freeze the filesystem. Although there cannot be multiple kernel freezes or
* userspace can both hold a filesystem frozen. The filesystem remains frozen
* until there are no kernel or userspace freezes in effect.
*
+ * A filesystem may hold multiple devices and thus a filesystems may be
+ * frozen through the block layer via multiple block devices. In this
+ * case the request is marked as being allowed to nest by passing
+ * FREEZE_MAY_NEST. The filesystem remains frozen until all block
+ * devices are unfrozen. If multiple freezes are attempted without
+ * FREEZE_MAY_NEST -EBUSY will be returned.
+ *
* During this function, sb->s_writers.frozen goes through these values:
*
* SB_UNFROZEN: File system is normal, all writes progress as usual.
* mostly auxiliary for filesystems to verify they do not modify frozen fs.
*
* sb->s_writers.frozen is protected by sb->s_umount.
+ *
+ * Return: If the freeze was successful zero is returned. If the freeze
+ * failed a negative error code is returned.
*/
int freeze_super(struct super_block *sb, enum freeze_holder who)
{
int ret;
+ if (!super_lock_excl(sb)) {
+ WARN_ON_ONCE("Dying superblock while freezing!");
+ return -EINVAL;
+ }
atomic_inc(&sb->s_active);
- if (!super_lock_excl(sb))
- WARN(1, "Dying superblock while freezing!");
retry:
if (sb->s_writers.frozen == SB_FREEZE_COMPLETE) {
- if (sb->s_writers.freeze_holders & who) {
- deactivate_locked_super(sb);
- return -EBUSY;
- }
-
- WARN_ON(sb->s_writers.freeze_holders == 0);
-
- /*
- * Someone else already holds this type of freeze; share the
- * freeze and assign the active ref to the freeze.
- */
- sb->s_writers.freeze_holders |= who;
- super_unlock_excl(sb);
- return 0;
+ if (may_freeze(sb, who))
+ ret = !!WARN_ON_ONCE(freeze_inc(sb, who) == 1);
+ else
+ ret = -EBUSY;
+ /* All freezers share a single active reference. */
+ deactivate_locked_super(sb);
+ return ret;
}
if (sb->s_writers.frozen != SB_UNFROZEN) {
goto retry;
}
- if (!(sb->s_flags & SB_BORN)) {
- super_unlock_excl(sb);
- return 0; /* sic - it's "nothing to do" */
- }
-
if (sb_rdonly(sb)) {
/* Nothing to do really... */
- sb->s_writers.freeze_holders |= who;
+ WARN_ON_ONCE(freeze_inc(sb, who) > 1);
sb->s_writers.frozen = SB_FREEZE_COMPLETE;
wake_up_var(&sb->s_writers.frozen);
super_unlock_excl(sb);
/* Release s_umount to preserve sb_start_write -> s_umount ordering */
super_unlock_excl(sb);
sb_wait_write(sb, SB_FREEZE_WRITE);
- if (!super_lock_excl(sb))
- WARN(1, "Dying superblock while freezing!");
+ __super_lock_excl(sb);
/* Now we go and block page faults... */
sb->s_writers.frozen = SB_FREEZE_PAGEFAULT;
* For debugging purposes so that fs can warn if it sees write activity
* when frozen is set to SB_FREEZE_COMPLETE, and for thaw_super().
*/
- sb->s_writers.freeze_holders |= who;
+ WARN_ON_ONCE(freeze_inc(sb, who) > 1);
sb->s_writers.frozen = SB_FREEZE_COMPLETE;
wake_up_var(&sb->s_writers.frozen);
lockdep_sb_freeze_release(sb);
*/
static int thaw_super_locked(struct super_block *sb, enum freeze_holder who)
{
- int error;
+ int error = -EINVAL;
- if (sb->s_writers.frozen == SB_FREEZE_COMPLETE) {
- if (!(sb->s_writers.freeze_holders & who)) {
- super_unlock_excl(sb);
- return -EINVAL;
- }
+ if (sb->s_writers.frozen != SB_FREEZE_COMPLETE)
+ goto out_unlock;
- /*
- * Freeze is shared with someone else. Release our hold and
- * drop the active ref that freeze_super assigned to the
- * freezer.
- */
- if (sb->s_writers.freeze_holders & ~who) {
- sb->s_writers.freeze_holders &= ~who;
- deactivate_locked_super(sb);
- return 0;
- }
- } else {
- super_unlock_excl(sb);
- return -EINVAL;
- }
+ /*
+ * All freezers share a single active reference.
+ * So just unlock in case there are any left.
+ */
+ if (freeze_dec(sb, who))
+ goto out_unlock;
if (sb_rdonly(sb)) {
- sb->s_writers.freeze_holders &= ~who;
sb->s_writers.frozen = SB_UNFROZEN;
wake_up_var(&sb->s_writers.frozen);
- goto out;
+ goto out_deactivate;
}
lockdep_sb_freeze_acquire(sb);
if (sb->s_op->unfreeze_fs) {
error = sb->s_op->unfreeze_fs(sb);
if (error) {
- printk(KERN_ERR "VFS:Filesystem thaw failed\n");
+ pr_err("VFS: Filesystem thaw failed\n");
+ freeze_inc(sb, who);
lockdep_sb_freeze_release(sb);
- super_unlock_excl(sb);
- return error;
+ goto out_unlock;
}
}
- sb->s_writers.freeze_holders &= ~who;
sb->s_writers.frozen = SB_UNFROZEN;
wake_up_var(&sb->s_writers.frozen);
sb_freeze_unlock(sb, SB_FREEZE_FS);
-out:
+out_deactivate:
deactivate_locked_super(sb);
return 0;
+
+out_unlock:
+ super_unlock_excl(sb);
+ return error;
}
/**
* @who should be:
* * %FREEZE_HOLDER_USERSPACE if userspace wants to thaw the fs;
* * %FREEZE_HOLDER_KERNEL if the kernel wants to thaw the fs.
+ * * %FREEZE_MAY_NEST whether nesting freeze and thaw requests is allowed
+ *
+ * A filesystem may hold multiple devices and thus a filesystems may
+ * have been frozen through the block layer via multiple block devices.
+ * The filesystem remains frozen until all block devices are unfrozen.
*/
int thaw_super(struct super_block *sb, enum freeze_holder who)
{
- if (!super_lock_excl(sb))
- WARN(1, "Dying superblock while thawing!");
+ if (!super_lock_excl(sb)) {
+ WARN_ON_ONCE("Dying superblock while thawing!");
+ return -EINVAL;
+ }
return thaw_super_locked(sb, who);
}
EXPORT_SYMBOL(thaw_super);
* determined by the parent directory.
*/
if (dentry->d_inode->i_mode & S_IFDIR) {
- update_attr(&ei->attr, iattr);
+ /*
+ * The events directory dentry is never freed, unless its
+ * part of an instance that is deleted. It's attr is the
+ * default for its child files and directories.
+ * Do not update it. It's not used for its own mode or ownership
+ */
+ if (!ei->is_events)
+ update_attr(&ei->attr, iattr);
} else {
name = dentry->d_name.name;
.release = eventfs_release,
};
-static void update_inode_attr(struct inode *inode, struct eventfs_attr *attr, umode_t mode)
+/* Return the evenfs_inode of the "events" directory */
+static struct eventfs_inode *eventfs_find_events(struct dentry *dentry)
{
- if (!attr) {
- inode->i_mode = mode;
+ struct eventfs_inode *ei;
+
+ mutex_lock(&eventfs_mutex);
+ do {
+ /* The parent always has an ei, except for events itself */
+ ei = dentry->d_parent->d_fsdata;
+
+ /*
+ * If the ei is being freed, the ownership of the children
+ * doesn't matter.
+ */
+ if (ei->is_freed) {
+ ei = NULL;
+ break;
+ }
+
+ dentry = ei->dentry;
+ } while (!ei->is_events);
+ mutex_unlock(&eventfs_mutex);
+
+ return ei;
+}
+
+static void update_inode_attr(struct dentry *dentry, struct inode *inode,
+ struct eventfs_attr *attr, umode_t mode)
+{
+ struct eventfs_inode *events_ei = eventfs_find_events(dentry);
+
+ if (!events_ei)
+ return;
+
+ inode->i_mode = mode;
+ inode->i_uid = events_ei->attr.uid;
+ inode->i_gid = events_ei->attr.gid;
+
+ if (!attr)
return;
- }
if (attr->mode & EVENTFS_SAVE_MODE)
inode->i_mode = attr->mode & EVENTFS_MODE_MASK;
- else
- inode->i_mode = mode;
if (attr->mode & EVENTFS_SAVE_UID)
inode->i_uid = attr->uid;
inode->i_gid = attr->gid;
}
+static void update_gid(struct eventfs_inode *ei, kgid_t gid, int level)
+{
+ struct eventfs_inode *ei_child;
+
+ /* at most we have events/system/event */
+ if (WARN_ON_ONCE(level > 3))
+ return;
+
+ ei->attr.gid = gid;
+
+ if (ei->entry_attrs) {
+ for (int i = 0; i < ei->nr_entries; i++) {
+ ei->entry_attrs[i].gid = gid;
+ }
+ }
+
+ /*
+ * Only eventfs_inode with dentries are updated, make sure
+ * all eventfs_inodes are updated. If one of the children
+ * do not have a dentry, this function must traverse it.
+ */
+ list_for_each_entry_srcu(ei_child, &ei->children, list,
+ srcu_read_lock_held(&eventfs_srcu)) {
+ if (!ei_child->dentry)
+ update_gid(ei_child, gid, level + 1);
+ }
+}
+
+void eventfs_update_gid(struct dentry *dentry, kgid_t gid)
+{
+ struct eventfs_inode *ei = dentry->d_fsdata;
+ int idx;
+
+ idx = srcu_read_lock(&eventfs_srcu);
+ update_gid(ei, gid, 0);
+ srcu_read_unlock(&eventfs_srcu, idx);
+}
+
/**
* create_file - create a file in the tracefs filesystem
* @name: the name of the file to create.
return eventfs_failed_creating(dentry);
/* If the user updated the directory's attributes, use them */
- update_inode_attr(inode, attr, mode);
+ update_inode_attr(dentry, inode, attr, mode);
inode->i_op = &eventfs_file_inode_operations;
inode->i_fop = fop;
return eventfs_failed_creating(dentry);
/* If the user updated the directory's attributes, use them */
- update_inode_attr(inode, &ei->attr, S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO);
+ update_inode_attr(dentry, inode, &ei->attr,
+ S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO);
inode->i_op = &eventfs_root_dir_inode_operations;
inode->i_fop = &eventfs_file_operations;
struct eventfs_inode *ei;
struct tracefs_inode *ti;
struct inode *inode;
+ kuid_t uid;
+ kgid_t gid;
if (security_locked_down(LOCKDOWN_TRACEFS))
return NULL;
ei->dentry = dentry;
ei->entries = entries;
ei->nr_entries = size;
+ ei->is_events = 1;
ei->data = data;
ei->name = kstrdup_const(name, GFP_KERNEL);
if (!ei->name)
goto fail;
+ /* Save the ownership of this directory */
+ uid = d_inode(dentry->d_parent)->i_uid;
+ gid = d_inode(dentry->d_parent)->i_gid;
+
+ /* This is used as the default ownership of the files and directories */
+ ei->attr.uid = uid;
+ ei->attr.gid = gid;
+
INIT_LIST_HEAD(&ei->children);
INIT_LIST_HEAD(&ei->list);
ti->private = ei;
inode->i_mode = S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO;
+ inode->i_uid = uid;
+ inode->i_gid = gid;
inode->i_op = &eventfs_root_dir_inode_operations;
inode->i_fop = &eventfs_file_operations;
next = this_parent->d_subdirs.next;
resume:
while (next != &this_parent->d_subdirs) {
+ struct tracefs_inode *ti;
struct list_head *tmp = next;
struct dentry *dentry = list_entry(tmp, struct dentry, d_child);
next = tmp->next;
+ /* Note, getdents() can add a cursor dentry with no inode */
+ if (!dentry->d_inode)
+ continue;
+
spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
change_gid(dentry, gid);
+ /* If this is the events directory, update that too */
+ ti = get_tracefs(dentry->d_inode);
+ if (ti && (ti->flags & TRACEFS_EVENT_INODE))
+ eventfs_update_gid(dentry, gid);
+
if (!list_empty(&dentry->d_subdirs)) {
spin_unlock(&this_parent->d_lock);
spin_release(&dentry->d_lock.dep_map, _RET_IP_);
struct rcu_head rcu;
};
unsigned int is_freed:1;
- unsigned int nr_entries:31;
+ unsigned int is_events:1;
+ unsigned int nr_entries:30;
};
static inline struct tracefs_inode *get_tracefs(const struct inode *inode)
struct dentry *eventfs_start_creating(const char *name, struct dentry *parent);
struct dentry *eventfs_failed_creating(struct dentry *dentry);
struct dentry *eventfs_end_creating(struct dentry *dentry);
+void eventfs_update_gid(struct dentry *dentry, kgid_t gid);
void eventfs_set_ei_status_free(struct tracefs_inode *ti, struct dentry *dentry);
#endif /* _TRACEFS_INTERNAL_H */
{
switch (inflags) {
case XFS_FSOP_GOING_FLAGS_DEFAULT: {
- if (!freeze_bdev(mp->m_super->s_bdev)) {
+ if (!bdev_freeze(mp->m_super->s_bdev)) {
xfs_force_shutdown(mp, SHUTDOWN_FORCE_UMOUNT);
- thaw_bdev(mp->m_super->s_bdev);
+ bdev_thaw(mp->m_super->s_bdev);
}
break;
}
{
int error = 0;
- *handlep = bdev_open_by_path(name, BLK_OPEN_READ | BLK_OPEN_WRITE,
- mp->m_super, &fs_holder_ops);
+ *handlep = bdev_open_by_path(name,
+ BLK_OPEN_READ | BLK_OPEN_WRITE | BLK_OPEN_RESTRICT_WRITES,
+ mp->m_super, &fs_holder_ops);
if (IS_ERR(*handlep)) {
error = PTR_ERR(*handlep);
*handlep = NULL;
struct bdev_handle *logdev_handle = NULL, *rtdev_handle = NULL;
int error;
- /*
- * blkdev_put() can't be called under s_umount, see the comment
- * in get_tree_bdev() for more details
- */
- up_write(&sb->s_umount);
-
/*
* Open real time and log devices - order is important.
*/
if (mp->m_logname) {
error = xfs_blkdev_get(mp, mp->m_logname, &logdev_handle);
if (error)
- goto out_relock;
+ return error;
}
if (mp->m_rtname) {
bdev_release(logdev_handle);
}
- error = 0;
-out_relock:
- down_write(&sb->s_umount);
- return error;
+ return 0;
out_free_rtdev_targ:
if (mp->m_rtdev_targp)
out_close_logdev:
if (logdev_handle)
bdev_release(logdev_handle);
- goto out_relock;
+ return error;
}
/*
xfs_mount_free(
struct xfs_mount *mp)
{
- /*
- * Free the buftargs here because blkdev_put needs to be called outside
- * of sb->s_umount, which is held around the call to ->put_super.
- */
if (mp->m_logdev_targp && mp->m_logdev_targp != mp->m_ddev_targp)
xfs_free_buftarg(mp->m_logdev_targp);
if (mp->m_rtdev_targp)
void __init numa_set_distance(int from, int to, int distance);
void __init numa_free_distance(void);
void __init early_map_cpu_to_node(unsigned int cpu, int nid);
+int __init early_cpu_to_node(int cpu);
void numa_store_cpu_info(unsigned int cpu);
void numa_add_cpu(unsigned int cpu);
void numa_remove_cpu(unsigned int cpu);
static inline void numa_remove_cpu(unsigned int cpu) { }
static inline void arch_numa_init(void) { }
static inline void early_map_cpu_to_node(unsigned int cpu, int nid) { }
+static inline int early_cpu_to_node(int cpu) { return 0; }
#endif /* CONFIG_NUMA */
static inline void __put_unaligned_be24(const u32 val, u8 *p)
{
- *p++ = val >> 16;
- *p++ = val >> 8;
- *p++ = val;
+ *p++ = (val >> 16) & 0xff;
+ *p++ = (val >> 8) & 0xff;
+ *p++ = val & 0xff;
}
static inline void put_unaligned_be24(const u32 val, void *p)
static inline void __put_unaligned_le24(const u32 val, u8 *p)
{
- *p++ = val;
- *p++ = val >> 8;
- *p++ = val >> 16;
+ *p++ = val & 0xff;
+ *p++ = (val >> 8) & 0xff;
+ *p++ = (val >> 16) & 0xff;
}
static inline void put_unaligned_le24(const u32 val, void *p)
static inline void __put_unaligned_be48(const u64 val, u8 *p)
{
- *p++ = val >> 40;
- *p++ = val >> 32;
- *p++ = val >> 24;
- *p++ = val >> 16;
- *p++ = val >> 8;
- *p++ = val;
+ *p++ = (val >> 40) & 0xff;
+ *p++ = (val >> 32) & 0xff;
+ *p++ = (val >> 24) & 0xff;
+ *p++ = (val >> 16) & 0xff;
+ *p++ = (val >> 8) & 0xff;
+ *p++ = val & 0xff;
}
static inline void put_unaligned_be48(const u64 val, void *p)
--- /dev/null
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Common helpers for stackable filesystems and backing files.
+ *
+ * Copyright (C) 2023 CTERA Networks.
+ */
+
+#ifndef _LINUX_BACKING_FILE_H
+#define _LINUX_BACKING_FILE_H
+
+#include <linux/file.h>
+#include <linux/uio.h>
+#include <linux/fs.h>
+
+struct backing_file_ctx {
+ const struct cred *cred;
+ struct file *user_file;
+ void (*accessed)(struct file *);
+ void (*end_write)(struct file *);
+};
+
+struct file *backing_file_open(const struct path *user_path, int flags,
+ const struct path *real_path,
+ const struct cred *cred);
+ssize_t backing_file_read_iter(struct file *file, struct iov_iter *iter,
+ struct kiocb *iocb, int flags,
+ struct backing_file_ctx *ctx);
+ssize_t backing_file_write_iter(struct file *file, struct iov_iter *iter,
+ struct kiocb *iocb, int flags,
+ struct backing_file_ctx *ctx);
+ssize_t backing_file_splice_read(struct file *in, loff_t *ppos,
+ struct pipe_inode_info *pipe, size_t len,
+ unsigned int flags,
+ struct backing_file_ctx *ctx);
+ssize_t backing_file_splice_write(struct pipe_inode_info *pipe,
+ struct file *out, loff_t *ppos, size_t len,
+ unsigned int flags,
+ struct backing_file_ctx *ctx);
+int backing_file_mmap(struct file *file, struct vm_area_struct *vma,
+ struct backing_file_ctx *ctx);
+
+#endif /* _LINUX_BACKING_FILE_H */
void * bd_holder;
const struct blk_holder_ops *bd_holder_ops;
struct mutex bd_holder_lock;
- /* The counter of freeze processes */
- int bd_fsfreeze_count;
int bd_holders;
struct kobject *bd_holder_dir;
- /* Mutex for freeze */
- struct mutex bd_fsfreeze_mutex;
- struct super_block *bd_fsfreeze_sb;
+ atomic_t bd_fsfreeze_count; /* number of freeze requests */
+ struct mutex bd_fsfreeze_mutex; /* serialize freeze/thaw */
struct partition_meta_info *bd_meta_info;
#ifdef CONFIG_FAIL_MAKE_REQUEST
bool bd_make_it_fail;
#endif
bool bd_ro_warned;
+ int bd_writers;
/*
* keep this out-of-line as it's both big and not needed in the fast
* path
#define BLK_OPEN_NDELAY ((__force blk_mode_t)(1 << 3))
/* open for "writes" only for ioctls (specialy hack for floppy.c) */
#define BLK_OPEN_WRITE_IOCTL ((__force blk_mode_t)(1 << 4))
+/* open is exclusive wrt all other BLK_OPEN_WRITE opens to the device */
+#define BLK_OPEN_RESTRICT_WRITES ((__force blk_mode_t)(1 << 5))
struct gendisk {
/*
#define QUEUE_FLAG_ADD_RANDOM 10 /* Contributes to random pool */
#define QUEUE_FLAG_SYNCHRONOUS 11 /* always completes in submit context */
#define QUEUE_FLAG_SAME_FORCE 12 /* force complete on same CPU */
-#define QUEUE_FLAG_HW_WC 18 /* Write back caching supported */
+#define QUEUE_FLAG_HW_WC 13 /* Write back caching supported */
#define QUEUE_FLAG_INIT_DONE 14 /* queue is initialized */
#define QUEUE_FLAG_STABLE_WRITES 15 /* don't modify blks until WB is done */
#define QUEUE_FLAG_POLL 16 /* IO polling enabled if set */
* Sync the file system mounted on the block device.
*/
void (*sync)(struct block_device *bdev);
+
+ /*
+ * Freeze the file system mounted on the block device.
+ */
+ int (*freeze)(struct block_device *bdev);
+
+ /*
+ * Thaw the file system mounted on the block device.
+ */
+ int (*thaw)(struct block_device *bdev);
};
+/*
+ * For filesystems using @fs_holder_ops, the @holder argument passed to
+ * helpers used to open and claim block devices via
+ * bd_prepare_to_claim() must point to a superblock.
+ */
extern const struct blk_holder_ops fs_holder_ops;
/*
* as stored in sb->s_flags.
*/
#define sb_open_mode(flags) \
- (BLK_OPEN_READ | (((flags) & SB_RDONLY) ? 0 : BLK_OPEN_WRITE))
+ (BLK_OPEN_READ | BLK_OPEN_RESTRICT_WRITES | \
+ (((flags) & SB_RDONLY) ? 0 : BLK_OPEN_WRITE))
struct bdev_handle {
struct block_device *bdev;
blk_mode_t mode;
};
-struct block_device *blkdev_get_by_dev(dev_t dev, blk_mode_t mode, void *holder,
- const struct blk_holder_ops *hops);
-struct block_device *blkdev_get_by_path(const char *path, blk_mode_t mode,
- void *holder, const struct blk_holder_ops *hops);
struct bdev_handle *bdev_open_by_dev(dev_t dev, blk_mode_t mode, void *holder,
const struct blk_holder_ops *hops);
struct bdev_handle *bdev_open_by_path(const char *path, blk_mode_t mode,
int bd_prepare_to_claim(struct block_device *bdev, void *holder,
const struct blk_holder_ops *hops);
void bd_abort_claiming(struct block_device *bdev, void *holder);
-void blkdev_put(struct block_device *bdev, void *holder);
void bdev_release(struct bdev_handle *handle);
/* just for blk-cgroup, don't use elsewhere */
}
#endif /* CONFIG_BLOCK */
-int freeze_bdev(struct block_device *bdev);
-int thaw_bdev(struct block_device *bdev);
+int bdev_freeze(struct block_device *bdev);
+int bdev_thaw(struct block_device *bdev);
struct io_comp_batch {
struct request *req_list;
#ifdef CONFIG_NET
BPF_LINK_TYPE(BPF_LINK_TYPE_NETNS, netns)
BPF_LINK_TYPE(BPF_LINK_TYPE_XDP, xdp)
+BPF_LINK_TYPE(BPF_LINK_TYPE_NETFILTER, netfilter)
+BPF_LINK_TYPE(BPF_LINK_TYPE_TCX, tcx)
+BPF_LINK_TYPE(BPF_LINK_TYPE_NETKIT, netkit)
#endif
#ifdef CONFIG_PERF_EVENTS
BPF_LINK_TYPE(BPF_LINK_TYPE_PERF_EVENT, perf)
#endif
BPF_LINK_TYPE(BPF_LINK_TYPE_KPROBE_MULTI, kprobe_multi)
BPF_LINK_TYPE(BPF_LINK_TYPE_STRUCT_OPS, struct_ops)
+BPF_LINK_TYPE(BPF_LINK_TYPE_UPROBE_MULTI, uprobe_multi)
mutex_unlock(&dev->mutex);
}
+DEFINE_GUARD(device, struct device *, device_lock(_T), device_unlock(_T))
+
static inline void device_lock_assert(struct device *dev)
{
lockdep_assert_held(&dev->mutex);
* @MEM_NVDIMM: Non-volatile RAM
* @MEM_WIO2: Wide I/O 2.
* @MEM_HBM2: High bandwidth Memory Gen 2.
+ * @MEM_HBM3: High bandwidth Memory Gen 3.
*/
enum mem_type {
MEM_EMPTY = 0,
MEM_NVDIMM,
MEM_WIO2,
MEM_HBM2,
+ MEM_HBM3,
};
#define MEM_FLAG_EMPTY BIT(MEM_EMPTY)
#define MEM_FLAG_NVDIMM BIT(MEM_NVDIMM)
#define MEM_FLAG_WIO2 BIT(MEM_WIO2)
#define MEM_FLAG_HBM2 BIT(MEM_HBM2)
+#define MEM_FLAG_HBM3 BIT(MEM_HBM3)
/**
* enum edac_type - Error Detection and Correction capabilities and mode
struct file *eventfd_fget(int fd);
struct eventfd_ctx *eventfd_ctx_fdget(int fd);
struct eventfd_ctx *eventfd_ctx_fileget(struct file *file);
-__u64 eventfd_signal(struct eventfd_ctx *ctx, __u64 n);
-__u64 eventfd_signal_mask(struct eventfd_ctx *ctx, __u64 n, __poll_t mask);
+void eventfd_signal_mask(struct eventfd_ctx *ctx, __poll_t mask);
int eventfd_ctx_remove_wait_queue(struct eventfd_ctx *ctx, wait_queue_entry_t *wait,
__u64 *cnt);
void eventfd_ctx_do_read(struct eventfd_ctx *ctx, __u64 *cnt);
return ERR_PTR(-ENOSYS);
}
-static inline int eventfd_signal(struct eventfd_ctx *ctx, __u64 n)
+static inline void eventfd_signal_mask(struct eventfd_ctx *ctx, __poll_t mask)
{
- return -ENOSYS;
-}
-
-static inline int eventfd_signal_mask(struct eventfd_ctx *ctx, __u64 n,
- unsigned mask)
-{
- return -ENOSYS;
}
static inline void eventfd_ctx_put(struct eventfd_ctx *ctx)
#endif
+static inline void eventfd_signal(struct eventfd_ctx *ctx)
+{
+ eventfd_signal_mask(ctx, 0);
+}
+
#endif /* _LINUX_EVENTFD_H */
* and eliminates the need for absolute relocations that require runtime
* processing on relocatable kernels.
*/
+#define __KSYM_ALIGN ".balign 4"
#define __KSYM_REF(sym) ".long " #sym "- ."
#elif defined(CONFIG_64BIT)
+#define __KSYM_ALIGN ".balign 8"
#define __KSYM_REF(sym) ".quad " #sym
#else
+#define __KSYM_ALIGN ".balign 4"
#define __KSYM_REF(sym) ".long " #sym
#endif
" .asciz \"" ns "\"" "\n" \
" .previous" "\n" \
" .section \"___ksymtab" sec "+" #name "\", \"a\"" "\n" \
- " .balign 4" "\n" \
+ __KSYM_ALIGN "\n" \
"__ksymtab_" #name ":" "\n" \
__KSYM_REF(sym) "\n" \
__KSYM_REF(__kstrtab_ ##name) "\n" \
#define SYMBOL_CRC(sym, crc, sec) \
asm(".section \"___kcrctab" sec "+" #sym "\",\"a\"" "\n" \
+ ".balign 4" "\n" \
"__crc_" #sym ":" "\n" \
".long " #crc "\n" \
".previous" "\n")
static inline struct file *files_lookup_fd_raw(struct files_struct *files, unsigned int fd)
{
struct fdtable *fdt = rcu_dereference_raw(files->fdt);
-
- if (fd < fdt->max_fds) {
- fd = array_index_nospec(fd, fdt->max_fds);
- return rcu_dereference_raw(fdt->fd[fd]);
- }
- return NULL;
+ unsigned long mask = array_index_mask_nospec(fd, fdt->max_fds);
+ struct file *needs_masking;
+
+ /*
+ * 'mask' is zero for an out-of-bounds fd, all ones for ok.
+ * 'fd&mask' is 'fd' for ok, or 0 for out of bounds.
+ *
+ * Accessing fdt->fd[0] is ok, but needs masking of the result.
+ */
+ needs_masking = rcu_dereference_raw(fdt->fd[fd&mask]);
+ return (struct file *)(mask & (unsigned long)needs_masking);
}
static inline struct file *files_lookup_fd_locked(struct files_struct *files, unsigned int fd)
extern int close_fd(unsigned int fd);
extern int __close_range(unsigned int fd, unsigned int max_fd, unsigned int flags);
-extern struct file *close_fd_get_file(unsigned int fd);
+extern struct file *file_close_fd(unsigned int fd);
extern int unshare_fd(unsigned long unshare_flags, unsigned int max_fds,
struct files_struct **new_fdp);
extern void fd_install(unsigned int fd, struct file *file);
-extern int __receive_fd(struct file *file, int __user *ufd,
- unsigned int o_flags);
+int receive_fd(struct file *file, int __user *ufd, unsigned int o_flags);
-extern int receive_fd(struct file *file, unsigned int o_flags);
-
-static inline int receive_fd_user(struct file *file, int __user *ufd,
- unsigned int o_flags)
-{
- if (ufd == NULL)
- return -EFAULT;
- return __receive_fd(file, ufd, o_flags);
-}
int receive_fd_replace(int new_fd, struct file *file, unsigned int o_flags);
extern void flush_delayed_fput(void);
* @a_ops: Methods.
* @flags: Error bits and flags (AS_*).
* @wb_err: The most recent error which has occurred.
- * @private_lock: For use by the owner of the address_space.
- * @private_list: For use by the owner of the address_space.
- * @private_data: For use by the owner of the address_space.
+ * @i_private_lock: For use by the owner of the address_space.
+ * @i_private_list: For use by the owner of the address_space.
+ * @i_private_data: For use by the owner of the address_space.
*/
struct address_space {
struct inode *host;
unsigned long flags;
struct rw_semaphore i_mmap_rwsem;
errseq_t wb_err;
- spinlock_t private_lock;
- struct list_head private_list;
- void *private_data;
+ spinlock_t i_private_lock;
+ struct list_head i_private_list;
+ void * i_private_data;
} __attribute__((aligned(sizeof(long)))) __randomize_layout;
/*
* On most architectures that alignment is already the case; but
*/
struct file {
union {
+ /* fput() uses task work when closing and freeing file (default). */
+ struct callback_head f_task_work;
+ /* fput() must use workqueue (most kernel threads). */
struct llist_node f_llist;
- struct rcu_head f_rcuhead;
unsigned int f_iocb_flags;
};
struct sb_writers {
unsigned short frozen; /* Is sb frozen? */
- unsigned short freeze_holders; /* Who froze fs? */
+ int freeze_kcount; /* How many kernel freeze requests? */
+ int freeze_ucount; /* How many userspace freeze requests? */
struct percpu_rw_semaphore rw_sem[SB_FREEZE_LEVELS];
};
#define __sb_writers_release(sb, lev) \
percpu_rwsem_release(&(sb)->s_writers.rw_sem[(lev)-1], 1, _THIS_IP_)
+/**
+ * __sb_write_started - check if sb freeze level is held
+ * @sb: the super we write to
+ * @level: the freeze level
+ *
+ * * > 0 - sb freeze level is held
+ * * 0 - sb freeze level is not held
+ * * < 0 - !CONFIG_LOCKDEP/LOCK_STATE_UNKNOWN
+ */
+static inline int __sb_write_started(const struct super_block *sb, int level)
+{
+ return lockdep_is_held_type(sb->s_writers.rw_sem + level - 1, 1);
+}
+
+/**
+ * sb_write_started - check if SB_FREEZE_WRITE is held
+ * @sb: the super we write to
+ *
+ * May be false positive with !CONFIG_LOCKDEP/LOCK_STATE_UNKNOWN.
+ */
static inline bool sb_write_started(const struct super_block *sb)
{
- return lockdep_is_held_type(sb->s_writers.rw_sem + SB_FREEZE_WRITE - 1, 1);
+ return __sb_write_started(sb, SB_FREEZE_WRITE);
+}
+
+/**
+ * sb_write_not_started - check if SB_FREEZE_WRITE is not held
+ * @sb: the super we write to
+ *
+ * May be false positive with !CONFIG_LOCKDEP/LOCK_STATE_UNKNOWN.
+ */
+static inline bool sb_write_not_started(const struct super_block *sb)
+{
+ return __sb_write_started(sb, SB_FREEZE_WRITE) <= 0;
+}
+
+/**
+ * file_write_started - check if SB_FREEZE_WRITE is held
+ * @file: the file we write to
+ *
+ * May be false positive with !CONFIG_LOCKDEP/LOCK_STATE_UNKNOWN.
+ * May be false positive with !S_ISREG, because file_start_write() has
+ * no effect on !S_ISREG.
+ */
+static inline bool file_write_started(const struct file *file)
+{
+ if (!S_ISREG(file_inode(file)->i_mode))
+ return true;
+ return sb_write_started(file_inode(file)->i_sb);
+}
+
+/**
+ * file_write_not_started - check if SB_FREEZE_WRITE is not held
+ * @file: the file we write to
+ *
+ * May be false positive with !CONFIG_LOCKDEP/LOCK_STATE_UNKNOWN.
+ * May be false positive with !S_ISREG, because file_start_write() has
+ * no effect on !S_ISREG.
+ */
+static inline bool file_write_not_started(const struct file *file)
+{
+ if (!S_ISREG(file_inode(file)->i_mode))
+ return true;
+ return sb_write_not_started(file_inode(file)->i_sb);
}
/**
extern ssize_t vfs_write(struct file *, const char __user *, size_t, loff_t *);
extern ssize_t vfs_copy_file_range(struct file *, loff_t , struct file *,
loff_t, size_t, unsigned int);
-extern ssize_t generic_copy_file_range(struct file *file_in, loff_t pos_in,
- struct file *file_out, loff_t pos_out,
- size_t len, unsigned int flags);
int __generic_remap_file_range_prep(struct file *file_in, loff_t pos_in,
struct file *file_out, loff_t pos_out,
loff_t *len, unsigned int remap_flags,
struct file *dst_file, loff_t dst_pos,
loff_t len, unsigned int remap_flags);
+/**
+ * enum freeze_holder - holder of the freeze
+ * @FREEZE_HOLDER_KERNEL: kernel wants to freeze or thaw filesystem
+ * @FREEZE_HOLDER_USERSPACE: userspace wants to freeze or thaw filesystem
+ * @FREEZE_MAY_NEST: whether nesting freeze and thaw requests is allowed
+ *
+ * Indicate who the owner of the freeze or thaw request is and whether
+ * the freeze needs to be exclusive or can nest.
+ * Without @FREEZE_MAY_NEST, multiple freeze and thaw requests from the
+ * same holder aren't allowed. It is however allowed to hold a single
+ * @FREEZE_HOLDER_USERSPACE and a single @FREEZE_HOLDER_KERNEL freeze at
+ * the same time. This is relied upon by some filesystems during online
+ * repair or similar.
+ */
enum freeze_holder {
FREEZE_HOLDER_KERNEL = (1U << 0),
FREEZE_HOLDER_USERSPACE = (1U << 1),
+ FREEZE_MAY_NEST = (1U << 2),
};
struct super_operations {
const struct cred *creds);
struct file *dentry_create(const struct path *path, int flags, umode_t mode,
const struct cred *cred);
-struct file *backing_file_open(const struct path *user_path, int flags,
- const struct path *real_path,
- const struct cred *cred);
struct path *backing_file_user_path(struct file *f);
/*
- * file_user_path - get the path to display for memory mapped file
- *
* When mmapping a file on a stackable filesystem (e.g., overlayfs), the file
* stored in ->vm_file is a backing file whose f_inode is on the underlying
- * filesystem. When the mapped file path is displayed to user (e.g. via
- * /proc/<pid>/maps), this helper should be used to get the path to display
- * to the user, which is the path of the fd that user has requested to map.
+ * filesystem. When the mapped file path and inode number are displayed to
+ * user (e.g. via /proc/<pid>/maps), these helpers should be used to get the
+ * path and inode number to display to the user, which is the path of the fd
+ * that user has requested to map and the inode number that would be returned
+ * by fstat() on that same fd.
*/
+/* Get the path to display in /proc/<pid>/maps */
static inline const struct path *file_user_path(struct file *f)
{
if (unlikely(f->f_mode & FMODE_BACKING))
return backing_file_user_path(f);
return &f->f_path;
}
+/* Get the inode whose inode number to display in /proc/<pid>/maps */
+static inline const struct inode *file_user_inode(struct file *f)
+{
+ if (unlikely(f->f_mode & FMODE_BACKING))
+ return d_inode(backing_file_user_path(f)->dentry);
+ return file_inode(f);
+}
static inline struct file *file_clone_open(struct file *file)
{
size_t len, unsigned int flags);
extern ssize_t iter_file_splice_write(struct pipe_inode_info *,
struct file *, loff_t *, size_t, unsigned int);
-extern long do_splice_direct(struct file *in, loff_t *ppos, struct file *out,
- loff_t *opos, size_t len, unsigned int flags);
extern void
extern struct file_system_type *get_filesystem(struct file_system_type *fs);
extern void put_filesystem(struct file_system_type *fs);
extern struct file_system_type *get_fs_type(const char *name);
-extern struct super_block *get_active_super(struct block_device *bdev);
extern void drop_super(struct super_block *sb);
extern void drop_super_exclusive(struct super_block *sb);
extern void iterate_supers(void (*)(struct super_block *, void *), void *);
return fsnotify_parent(path->dentry, mask, path, FSNOTIFY_EVENT_PATH);
}
-/* Simple call site for access decisions */
-static inline int fsnotify_perm(struct file *file, int mask)
+/*
+ * fsnotify_file_area_perm - permission hook before access to file range
+ */
+static inline int fsnotify_file_area_perm(struct file *file, int perm_mask,
+ const loff_t *ppos, size_t count)
{
- int ret;
- __u32 fsnotify_mask = 0;
+ __u32 fsnotify_mask = FS_ACCESS_PERM;
+
+ /*
+ * filesystem may be modified in the context of permission events
+ * (e.g. by HSM filling a file on access), so sb freeze protection
+ * must not be held.
+ */
+ lockdep_assert_once(file_write_not_started(file));
- if (!(mask & (MAY_READ | MAY_OPEN)))
+ if (!(perm_mask & MAY_READ))
return 0;
- if (mask & MAY_OPEN) {
- fsnotify_mask = FS_OPEN_PERM;
+ return fsnotify_file(file, fsnotify_mask);
+}
+
+/*
+ * fsnotify_file_perm - permission hook before file access
+ */
+static inline int fsnotify_file_perm(struct file *file, int perm_mask)
+{
+ return fsnotify_file_area_perm(file, perm_mask, NULL, 0);
+}
- if (file->f_flags & __FMODE_EXEC) {
- ret = fsnotify_file(file, FS_OPEN_EXEC_PERM);
+/*
+ * fsnotify_open_perm - permission hook before file open
+ */
+static inline int fsnotify_open_perm(struct file *file)
+{
+ int ret;
- if (ret)
- return ret;
- }
- } else if (mask & MAY_READ) {
- fsnotify_mask = FS_ACCESS_PERM;
+ if (file->f_flags & __FMODE_EXEC) {
+ ret = fsnotify_file(file, FS_OPEN_EXEC_PERM);
+ if (ret)
+ return ret;
}
- return fsnotify_file(file, fsnotify_mask);
+ return fsnotify_file(file, FS_OPEN_PERM);
}
/*
#define HID_USAGE_SENSOR_ALS 0x200041
#define HID_USAGE_SENSOR_DATA_LIGHT 0x2004d0
#define HID_USAGE_SENSOR_LIGHT_ILLUM 0x2004d1
-#define HID_USAGE_SENSOR_LIGHT_COLOR_TEMPERATURE 0x2004d2
-#define HID_USAGE_SENSOR_LIGHT_CHROMATICITY 0x2004d3
-#define HID_USAGE_SENSOR_LIGHT_CHROMATICITY_X 0x2004d4
-#define HID_USAGE_SENSOR_LIGHT_CHROMATICITY_Y 0x2004d5
/* PROX (200011) */
#define HID_USAGE_SENSOR_PROX 0x200011
action != WLAN_PUB_ACTION_LOC_TRACK_NOTI &&
action != WLAN_PUB_ACTION_FTM_REQUEST &&
action != WLAN_PUB_ACTION_FTM_RESPONSE &&
- action != WLAN_PUB_ACTION_FILS_DISCOVERY;
+ action != WLAN_PUB_ACTION_FILS_DISCOVERY &&
+ action != WLAN_PUB_ACTION_VENDOR_SPECIFIC;
}
/**
unsigned int flags;
#define KEY_TYPE_NET_DOMAIN 0x00000001 /* Keys of this type have a net namespace domain */
+#define KEY_TYPE_INSTANT_REAP 0x00000002 /* Keys of this type don't have a delay after expiring */
/* vet a description */
int (*vet_description)(const char *description);
return from_vfsgid(idmap, fs_userns, VFSGIDT_INIT(current_fsgid()));
}
-bool check_fsmapping(const struct mnt_idmap *idmap,
- const struct super_block *sb);
-
#endif /* _LINUX_MNT_IDMAPPING_H */
#define MNT_ATIME_MASK (MNT_NOATIME | MNT_NODIRATIME | MNT_RELATIME )
#define MNT_INTERNAL_FLAGS (MNT_SHARED | MNT_WRITE_HOLD | MNT_INTERNAL | \
- MNT_DOOMED | MNT_SYNC_UMOUNT | MNT_MARKED | \
- MNT_CURSOR)
+ MNT_DOOMED | MNT_SYNC_UMOUNT | MNT_MARKED | MNT_ONRB)
#define MNT_INTERNAL 0x4000
#define MNT_SYNC_UMOUNT 0x2000000
#define MNT_MARKED 0x4000000
#define MNT_UMOUNT 0x8000000
-#define MNT_CURSOR 0x10000000
+#define MNT_ONRB 0x10000000
struct vfsmount {
struct dentry *mnt_root; /* root of the mounted tree */
void (*remove)(struct nubus_board *board);
};
-extern struct bus_type nubus_bus_type;
-
/* Generic NuBus interface functions, modelled after the PCI interface */
#ifdef CONFIG_PROC_FS
extern bool nubus_populate_procfs;
* An MCS like lock especially tailored for optimistic spinning for sleeping
* lock implementations (mutex, rwsem, etc).
*/
-struct optimistic_spin_node {
- struct optimistic_spin_node *next, *prev;
- int locked; /* 1 if lock acquired */
- int cpu; /* encoded CPU # + 1 value */
-};
struct optimistic_spin_queue {
/*
int pci_write_config_byte(const struct pci_dev *dev, int where, u8 val);
int pci_write_config_word(const struct pci_dev *dev, int where, u16 val);
int pci_write_config_dword(const struct pci_dev *dev, int where, u32 val);
+void pci_clear_and_set_config_dword(const struct pci_dev *dev, int pos,
+ u32 clear, u32 set);
int pcie_capability_read_word(struct pci_dev *dev, int pos, u16 *val);
int pcie_capability_read_dword(struct pci_dev *dev, int pos, u32 *val);
#define PCI_VENDOR_ID_TEKRAM 0x1de1
#define PCI_DEVICE_ID_TEKRAM_DC290 0xdc29
+#define PCI_VENDOR_ID_ALIBABA 0x1ded
+
#define PCI_VENDOR_ID_TEHUTI 0x1fc9
#define PCI_DEVICE_ID_TEHUTI_3009 0x3009
#define PCI_DEVICE_ID_TEHUTI_3010 0x3010
*/
DECLARE_BITMAP(used_mask, ARMPMU_MAX_HWEVENTS);
- /*
- * Hardware lock to serialize accesses to PMU registers. Needed for the
- * read/modify/write sequences.
- */
- raw_spinlock_t pmu_lock;
-
/*
* When using percpu IRQs, we need a percpu dev_id. Place it here as we
* already have to allocate this struct per cpu.
#define ARMV8_SPE_PDEV_NAME "arm,spe-v1"
#define ARMV8_TRBE_PDEV_NAME "arm,trbe"
+/* Why does everything I do descend into this? */
+#define __GEN_PMU_FORMAT_ATTR(cfg, lo, hi) \
+ (lo) == (hi) ? #cfg ":" #lo "\n" : #cfg ":" #lo "-" #hi
+
+#define _GEN_PMU_FORMAT_ATTR(cfg, lo, hi) \
+ __GEN_PMU_FORMAT_ATTR(cfg, lo, hi)
+
+#define GEN_PMU_FORMAT_ATTR(name) \
+ PMU_FORMAT_ATTR(name, \
+ _GEN_PMU_FORMAT_ATTR(ATTR_CFG_FLD_##name##_CFG, \
+ ATTR_CFG_FLD_##name##_LO, \
+ ATTR_CFG_FLD_##name##_HI))
+
+#define _ATTR_CFG_GET_FLD(attr, cfg, lo, hi) \
+ ((((attr)->cfg) >> lo) & GENMASK_ULL(hi - lo, 0))
+
+#define ATTR_CFG_GET_FLD(attr, name) \
+ _ATTR_CFG_GET_FLD(attr, \
+ ATTR_CFG_FLD_##name##_CFG, \
+ ATTR_CFG_FLD_##name##_LO, \
+ ATTR_CFG_FLD_##name##_HI)
+
#endif /* __ARM_PMU_H__ */
#define ARMV8_PMU_PMCR_DP (1 << 5) /* Disable CCNT if non-invasive debug*/
#define ARMV8_PMU_PMCR_LC (1 << 6) /* Overflow on 64 bit cycle counter */
#define ARMV8_PMU_PMCR_LP (1 << 7) /* Long event counter enable */
-#define ARMV8_PMU_PMCR_N_SHIFT 11 /* Number of counters supported */
-#define ARMV8_PMU_PMCR_N_MASK 0x1f
-#define ARMV8_PMU_PMCR_MASK 0xff /* Mask for writable bits */
+#define ARMV8_PMU_PMCR_N GENMASK(15, 11) /* Number of counters supported */
+/* Mask for writable bits */
+#define ARMV8_PMU_PMCR_MASK (ARMV8_PMU_PMCR_E | ARMV8_PMU_PMCR_P | \
+ ARMV8_PMU_PMCR_C | ARMV8_PMU_PMCR_D | \
+ ARMV8_PMU_PMCR_X | ARMV8_PMU_PMCR_DP | \
+ ARMV8_PMU_PMCR_LC | ARMV8_PMU_PMCR_LP)
/*
* PMOVSR: counters overflow flag status reg
*/
-#define ARMV8_PMU_OVSR_MASK 0xffffffff /* Mask for writable bits */
-#define ARMV8_PMU_OVERFLOWED_MASK ARMV8_PMU_OVSR_MASK
+#define ARMV8_PMU_OVSR_P GENMASK(30, 0)
+#define ARMV8_PMU_OVSR_C BIT(31)
+/* Mask for writable bits is both P and C fields */
+#define ARMV8_PMU_OVERFLOWED_MASK (ARMV8_PMU_OVSR_P | ARMV8_PMU_OVSR_C)
/*
* PMXEVTYPER: Event selection reg
*/
-#define ARMV8_PMU_EVTYPE_MASK 0xc800ffff /* Mask for writable bits */
-#define ARMV8_PMU_EVTYPE_EVENT 0xffff /* Mask for EVENT bits */
+#define ARMV8_PMU_EVTYPE_EVENT GENMASK(15, 0) /* Mask for EVENT bits */
+#define ARMV8_PMU_EVTYPE_TH GENMASK_ULL(43, 32) /* arm64 only */
+#define ARMV8_PMU_EVTYPE_TC GENMASK_ULL(63, 61) /* arm64 only */
/*
* Event filters for PMUv3
/*
* PMUSERENR: user enable reg
*/
-#define ARMV8_PMU_USERENR_MASK 0xf /* Mask for writable bits */
#define ARMV8_PMU_USERENR_EN (1 << 0) /* PMU regs can be accessed at EL0 */
#define ARMV8_PMU_USERENR_SW (1 << 1) /* PMSWINC can be written at EL0 */
#define ARMV8_PMU_USERENR_CR (1 << 2) /* Cycle counter can be read at EL0 */
#define ARMV8_PMU_USERENR_ER (1 << 3) /* Event counter can be read at EL0 */
+/* Mask for writable bits */
+#define ARMV8_PMU_USERENR_MASK (ARMV8_PMU_USERENR_EN | ARMV8_PMU_USERENR_SW | \
+ ARMV8_PMU_USERENR_CR | ARMV8_PMU_USERENR_ER)
/* PMMIR_EL1.SLOTS mask */
-#define ARMV8_PMU_SLOTS_MASK 0xff
-
-#define ARMV8_PMU_BUS_SLOTS_SHIFT 8
-#define ARMV8_PMU_BUS_SLOTS_MASK 0xff
-#define ARMV8_PMU_BUS_WIDTH_SHIFT 16
-#define ARMV8_PMU_BUS_WIDTH_MASK 0xf
+#define ARMV8_PMU_SLOTS GENMASK(7, 0)
+#define ARMV8_PMU_BUS_SLOTS GENMASK(15, 8)
+#define ARMV8_PMU_BUS_WIDTH GENMASK(19, 16)
+#define ARMV8_PMU_THWIDTH GENMASK(23, 20)
/*
* This code is really good
* - Bits [31:24] are reserved for defining generic
* PHY driver behavior.
* @irq: IRQ number of the PHY's interrupt (-1 if none)
- * @phy_timer: The timer for handling the state machine
* @phylink: Pointer to phylink instance for this PHY
* @sfp_bus_attached: Flag indicating whether the SFP bus has been attached
* @sfp_bus: SFP bus attached to this PHY's fiber port
typedef int (splice_direct_actor)(struct pipe_inode_info *,
struct splice_desc *);
-extern ssize_t splice_from_pipe(struct pipe_inode_info *, struct file *,
- loff_t *, size_t, unsigned int,
- splice_actor *);
-extern ssize_t __splice_from_pipe(struct pipe_inode_info *,
- struct splice_desc *, splice_actor *);
-extern ssize_t splice_to_pipe(struct pipe_inode_info *,
- struct splice_pipe_desc *);
-extern ssize_t add_to_pipe(struct pipe_inode_info *,
- struct pipe_buffer *);
-long vfs_splice_read(struct file *in, loff_t *ppos,
- struct pipe_inode_info *pipe, size_t len,
- unsigned int flags);
-extern ssize_t splice_direct_to_actor(struct file *, struct splice_desc *,
- splice_direct_actor *);
-extern long do_splice(struct file *in, loff_t *off_in,
- struct file *out, loff_t *off_out,
- size_t len, unsigned int flags);
+ssize_t splice_from_pipe(struct pipe_inode_info *pipe, struct file *out,
+ loff_t *ppos, size_t len, unsigned int flags,
+ splice_actor *actor);
+ssize_t __splice_from_pipe(struct pipe_inode_info *pipe,
+ struct splice_desc *sd, splice_actor *actor);
+ssize_t splice_to_pipe(struct pipe_inode_info *pipe,
+ struct splice_pipe_desc *spd);
+ssize_t add_to_pipe(struct pipe_inode_info *pipe, struct pipe_buffer *buf);
+ssize_t vfs_splice_read(struct file *in, loff_t *ppos,
+ struct pipe_inode_info *pipe, size_t len,
+ unsigned int flags);
+ssize_t splice_direct_to_actor(struct file *file, struct splice_desc *sd,
+ splice_direct_actor *actor);
+ssize_t do_splice(struct file *in, loff_t *off_in, struct file *out,
+ loff_t *off_out, size_t len, unsigned int flags);
+ssize_t do_splice_direct(struct file *in, loff_t *ppos, struct file *out,
+ loff_t *opos, size_t len, unsigned int flags);
+ssize_t splice_file_range(struct file *in, loff_t *ppos, struct file *out,
+ loff_t *opos, size_t len);
-extern long do_tee(struct file *in, struct file *out, size_t len,
- unsigned int flags);
-extern ssize_t splice_to_socket(struct pipe_inode_info *pipe, struct file *out,
- loff_t *ppos, size_t len, unsigned int flags);
+static inline long splice_copy_file_range(struct file *in, loff_t pos_in,
+ struct file *out, loff_t pos_out,
+ size_t len)
+{
+ return splice_file_range(in, &pos_in, out, &pos_out, len);
+}
+
+ssize_t do_tee(struct file *in, struct file *out, size_t len,
+ unsigned int flags);
+ssize_t splice_to_socket(struct pipe_inode_info *pipe, struct file *out,
+ loff_t *ppos, size_t len, unsigned int flags);
/*
* for dynamic pipe sizing
enum landlock_rule_type;
struct cachestat_range;
struct cachestat;
+struct statmount;
+struct mnt_id_req;
#include <linux/types.h>
#include <linux/aio_abi.h>
asmlinkage long sys_fstatfs(unsigned int fd, struct statfs __user *buf);
asmlinkage long sys_fstatfs64(unsigned int fd, size_t sz,
struct statfs64 __user *buf);
+asmlinkage long sys_statmount(const struct mnt_id_req __user *req,
+ struct statmount __user *buf, size_t bufsize,
+ unsigned int flags);
+asmlinkage long sys_listmount(const struct mnt_id_req __user *req,
+ u64 __user *buf, size_t bufsize,
+ unsigned int flags);
asmlinkage long sys_truncate(const char __user *path, long length);
asmlinkage long sys_ftruncate(unsigned int fd, unsigned long length);
#if BITS_PER_LONG == 32
struct user_namespace;
extern struct user_namespace init_user_ns;
+struct uid_gid_map;
typedef struct {
uid_t val;
return from_kgid(ns, gid) != (gid_t) -1;
}
+u32 map_id_down(struct uid_gid_map *map, u32 id);
+u32 map_id_up(struct uid_gid_map *map, u32 id);
+
#else
static inline kuid_t make_kuid(struct user_namespace *from, uid_t uid)
return gid_valid(gid);
}
+static inline u32 map_id_down(struct uid_gid_map *map, u32 id)
+{
+ return id;
+}
+
+static inline u32 map_id_up(struct uid_gid_map *map, u32 id)
+{
+ return id;
+}
#endif /* CONFIG_USER_NS */
#endif /* _LINUX_UIDGID_H */
ssize_t __import_iovec(int type, const struct iovec __user *uvec,
unsigned nr_segs, unsigned fast_segs, struct iovec **iovp,
struct iov_iter *i, bool compat);
-int import_single_range(int type, void __user *buf, size_t len,
- struct iovec *iov, struct iov_iter *i);
int import_ubuf(int type, void __user *buf, size_t len, struct iov_iter *i);
static inline void iov_iter_ubuf(struct iov_iter *i, unsigned int direction,
/* writeback.h requires fs.h; it, too, is not included from here. */
static inline void wait_on_inode(struct inode *inode)
{
- might_sleep();
wait_on_bit(&inode->i_state, __I_NEW, TASK_UNINTERRUPTIBLE);
}
struct smp_csrk {
bdaddr_t bdaddr;
u8 bdaddr_type;
+ u8 link_type;
u8 type;
u8 val[16];
};
struct rcu_head rcu;
bdaddr_t bdaddr;
u8 bdaddr_type;
+ u8 link_type;
u8 authenticated;
u8 type;
u8 enc_size;
bdaddr_t rpa;
bdaddr_t bdaddr;
u8 addr_type;
+ u8 link_type;
u8 val[16];
};
struct list_head list;
struct rcu_head rcu;
bdaddr_t bdaddr;
+ u8 bdaddr_type;
+ u8 link_type;
u8 type;
u8 val[HCI_LINK_KEY_SIZE];
u8 pin_len;
continue;
/* Match CIG ID if set */
- if (cig != BT_ISO_QOS_CIG_UNSET && cig != c->iso_qos.ucast.cig)
+ if (cig != c->iso_qos.ucast.cig)
continue;
/* Match CIS ID if set */
- if (id != BT_ISO_QOS_CIS_UNSET && id != c->iso_qos.ucast.cis)
+ if (id != c->iso_qos.ucast.cis)
continue;
/* Match destination address if set */
refcount_t fib6_ref;
unsigned long expires;
-
- struct hlist_node gc_link;
-
struct dst_metrics *fib6_metrics;
#define fib6_pmtu fib6_metrics->metrics[RTAX_MTU-1]
return rt->fib6_src.plen > 0;
}
+static inline void fib6_clean_expires(struct fib6_info *f6i)
+{
+ f6i->fib6_flags &= ~RTF_EXPIRES;
+ f6i->expires = 0;
+}
+
+static inline void fib6_set_expires(struct fib6_info *f6i,
+ unsigned long expires)
+{
+ f6i->expires = expires;
+ f6i->fib6_flags |= RTF_EXPIRES;
+}
+
static inline bool fib6_check_expired(const struct fib6_info *f6i)
{
if (f6i->fib6_flags & RTF_EXPIRES)
return false;
}
-static inline bool fib6_has_expires(const struct fib6_info *f6i)
-{
- return f6i->fib6_flags & RTF_EXPIRES;
-}
-
/* Function to safely get fn->fn_sernum for passed in rt
* and store result in passed in cookie.
* Return true if we can get cookie safely
struct inet_peer_base tb6_peers;
unsigned int flags;
unsigned int fib_seq;
- struct hlist_head tb6_gc_hlist; /* GC candidates */
#define RT6_TABLE_HAS_DFLT_ROUTER BIT(0)
};
int fib6_init(void);
-/* fib6_info must be locked by the caller, and fib6_info->fib6_table can be
- * NULL.
- */
-static inline void fib6_set_expires_locked(struct fib6_info *f6i,
- unsigned long expires)
-{
- struct fib6_table *tb6;
-
- tb6 = f6i->fib6_table;
- f6i->expires = expires;
- if (tb6 && !fib6_has_expires(f6i))
- hlist_add_head(&f6i->gc_link, &tb6->tb6_gc_hlist);
- f6i->fib6_flags |= RTF_EXPIRES;
-}
-
-/* fib6_info must be locked by the caller, and fib6_info->fib6_table can be
- * NULL. If fib6_table is NULL, the fib6_info will no be inserted into the
- * list of GC candidates until it is inserted into a table.
- */
-static inline void fib6_set_expires(struct fib6_info *f6i,
- unsigned long expires)
-{
- spin_lock_bh(&f6i->fib6_table->tb6_lock);
- fib6_set_expires_locked(f6i, expires);
- spin_unlock_bh(&f6i->fib6_table->tb6_lock);
-}
-
-static inline void fib6_clean_expires_locked(struct fib6_info *f6i)
-{
- if (fib6_has_expires(f6i))
- hlist_del_init(&f6i->gc_link);
- f6i->fib6_flags &= ~RTF_EXPIRES;
- f6i->expires = 0;
-}
-
-static inline void fib6_clean_expires(struct fib6_info *f6i)
-{
- spin_lock_bh(&f6i->fib6_table->tb6_lock);
- fib6_clean_expires_locked(f6i);
- spin_unlock_bh(&f6i->fib6_table->tb6_lock);
-}
-
struct ipv6_route_iter {
struct seq_net_private p;
struct fib6_walker w;
return -1;
len = iph_totlen(pkt->skb, iph);
- thoff = iph->ihl * 4;
+ thoff = skb_network_offset(pkt->skb) + (iph->ihl * 4);
if (pkt->skb->len < len)
return -1;
else if (len < thoff)
#include <linux/limits.h>
#include <linux/net.h>
#include <linux/cred.h>
+#include <linux/file.h>
#include <linux/security.h>
#include <linux/pid.h>
#include <linux/nsproxy.h>
scm_destroy_cred(scm);
}
+static inline int scm_recv_one_fd(struct file *f, int __user *ufd,
+ unsigned int flags)
+{
+ if (!ufd)
+ return -EFAULT;
+ return receive_fd(f, ufd, flags);
+}
+
#endif /* __LINUX_NET_SCM_H */
return sk->sk_type == SOCK_STREAM && sk->sk_protocol == IPPROTO_TCP;
}
+static inline bool sk_is_stream_unix(const struct sock *sk)
+{
+ return sk->sk_family == AF_UNIX && sk->sk_type == SOCK_STREAM;
+}
+
/**
* sk_eat_skb - Release a skb if it is no longer needed
* @sk: socket to eat this skb from
const struct sock *addr_sk);
#ifdef CONFIG_TCP_MD5SIG
-#include <linux/jump_label.h>
-extern struct static_key_false_deferred tcp_md5_needed;
struct tcp_md5sig_key *__tcp_md5_do_lookup(const struct sock *sk, int l3index,
const union tcp_md5_addr *addr,
int family, bool any_l3index);
struct rcu_head rcu;
};
+#ifdef CONFIG_TCP_MD5SIG
+#include <linux/jump_label.h>
+extern struct static_key_false_deferred tcp_md5_needed;
+#define static_branch_tcp_md5() static_branch_unlikely(&tcp_md5_needed.key)
+#else
+#define static_branch_tcp_md5() false
+#endif
+#ifdef CONFIG_TCP_AO
+/* TCP-AO structures and functions */
+#include <linux/jump_label.h>
+extern struct static_key_false_deferred tcp_ao_needed;
+#define static_branch_tcp_ao() static_branch_unlikely(&tcp_ao_needed.key)
+#else
+#define static_branch_tcp_ao() false
+#endif
+
+static inline bool tcp_hash_should_produce_warnings(void)
+{
+ return static_branch_tcp_md5() || static_branch_tcp_ao();
+}
+
#define tcp_hash_fail(msg, family, skb, fmt, ...) \
do { \
const struct tcphdr *th = tcp_hdr(skb); \
char hdr_flags[6]; \
char *f = hdr_flags; \
\
+ if (!tcp_hash_should_produce_warnings()) \
+ break; \
if (th->fin) \
*f++ = 'F'; \
if (th->syn) \
#ifdef CONFIG_TCP_AO
/* TCP-AO structures and functions */
-#include <linux/jump_label.h>
-extern struct static_key_false_deferred tcp_ao_needed;
-
struct tcp4_ao_context {
__be32 saddr;
__be32 daddr;
__field( void *, clnt )
__field( __u8, type )
__field( __u16, tag )
- __array( unsigned char, line, P9_PROTO_DUMP_SZ )
+ __dynamic_array(unsigned char, line,
+ min_t(size_t, pdu->capacity, P9_PROTO_DUMP_SZ))
),
TP_fast_assign(
__entry->clnt = clnt;
__entry->type = pdu->id;
__entry->tag = pdu->tag;
- memcpy(__entry->line, pdu->sdata, P9_PROTO_DUMP_SZ);
+ memcpy(__get_dynamic_array(line), pdu->sdata,
+ __get_dynamic_array_len(line));
),
- TP_printk("clnt %lu %s(tag = %d)\n%.3x: %16ph\n%.3x: %16ph\n",
+ TP_printk("clnt %lu %s(tag = %d)\n%*ph\n",
(unsigned long)__entry->clnt, show_9p_op(__entry->type),
- __entry->tag, 0, __entry->line, 16, __entry->line + 16)
+ __entry->tag, __get_dynamic_array_len(line),
+ __get_dynamic_array(line))
);
#define __NR_futex_requeue 456
__SYSCALL(__NR_futex_requeue, sys_futex_requeue)
+#define __NR_statmount 457
+__SYSCALL(__NR_statmount, sys_statmount)
+
+#define __NR_listmount 458
+__SYSCALL(__NR_listmount, sys_listmount)
+
#undef __NR_syscalls
-#define __NR_syscalls 457
+#define __NR_syscalls 459
/*
* 32 bit systems traditionally used different
/* List of all mount_attr versions. */
#define MOUNT_ATTR_SIZE_VER0 32 /* sizeof first published struct */
+
+/*
+ * Structure for getting mount/superblock/filesystem info with statmount(2).
+ *
+ * The interface is similar to statx(2): individual fields or groups can be
+ * selected with the @mask argument of statmount(). Kernel will set the @mask
+ * field according to the supported fields.
+ *
+ * If string fields are selected, then the caller needs to pass a buffer that
+ * has space after the fixed part of the structure. Nul terminated strings are
+ * copied there and offsets relative to @str are stored in the relevant fields.
+ * If the buffer is too small, then EOVERFLOW is returned. The actually used
+ * size is returned in @size.
+ */
+struct statmount {
+ __u32 size; /* Total size, including strings */
+ __u32 __spare1;
+ __u64 mask; /* What results were written */
+ __u32 sb_dev_major; /* Device ID */
+ __u32 sb_dev_minor;
+ __u64 sb_magic; /* ..._SUPER_MAGIC */
+ __u32 sb_flags; /* SB_{RDONLY,SYNCHRONOUS,DIRSYNC,LAZYTIME} */
+ __u32 fs_type; /* [str] Filesystem type */
+ __u64 mnt_id; /* Unique ID of mount */
+ __u64 mnt_parent_id; /* Unique ID of parent (for root == mnt_id) */
+ __u32 mnt_id_old; /* Reused IDs used in proc/.../mountinfo */
+ __u32 mnt_parent_id_old;
+ __u64 mnt_attr; /* MOUNT_ATTR_... */
+ __u64 mnt_propagation; /* MS_{SHARED,SLAVE,PRIVATE,UNBINDABLE} */
+ __u64 mnt_peer_group; /* ID of shared peer group */
+ __u64 mnt_master; /* Mount receives propagation from this ID */
+ __u64 propagate_from; /* Propagation from in current namespace */
+ __u32 mnt_root; /* [str] Root of mount relative to root of fs */
+ __u32 mnt_point; /* [str] Mountpoint relative to current root */
+ __u64 __spare2[50];
+ char str[]; /* Variable size part containing strings */
+};
+
+/*
+ * Structure for passing mount ID and miscellaneous parameters to statmount(2)
+ * and listmount(2).
+ *
+ * For statmount(2) @param represents the request mask.
+ * For listmount(2) @param represents the last listed mount id (or zero).
+ */
+struct mnt_id_req {
+ __u32 size;
+ __u32 spare;
+ __u64 mnt_id;
+ __u64 param;
+};
+
+/* List of all mnt_id_req versions. */
+#define MNT_ID_REQ_SIZE_VER0 24 /* sizeof first published struct */
+
+/*
+ * @mask bits for statmount(2)
+ */
+#define STATMOUNT_SB_BASIC 0x00000001U /* Want/got sb_... */
+#define STATMOUNT_MNT_BASIC 0x00000002U /* Want/got mnt_... */
+#define STATMOUNT_PROPAGATE_FROM 0x00000004U /* Want/got propagate_from */
+#define STATMOUNT_MNT_ROOT 0x00000008U /* Want/got mnt_root */
+#define STATMOUNT_MNT_POINT 0x00000010U /* Want/got mnt_point */
+#define STATMOUNT_FS_TYPE 0x00000020U /* Want/got fs_type */
+
+/*
+ * Special @mnt_id values that can be passed to listmount
+ */
+#define LSMT_ROOT 0xffffffffffffffff /* root mount */
+
#endif /* _UAPI_LINUX_MOUNT_H */
#define STATX_BTIME 0x00000800U /* Want/got stx_btime */
#define STATX_MNT_ID 0x00001000U /* Got stx_mnt_id */
#define STATX_DIOALIGN 0x00002000U /* Want/got direct I/O alignment info */
+#define STATX_MNT_ID_UNIQUE 0x00004000U /* Want/got extended stx_mount_id */
#define STATX__RESERVED 0x80000000U /* Reserved for future struct statx expansion */
int ops = atomic_xchg(&ev_fd->ops, 0);
if (ops & BIT(IO_EVENTFD_OP_SIGNAL_BIT))
- eventfd_signal_mask(ev_fd->cq_ev_fd, 1, EPOLL_URING_WAKE);
+ eventfd_signal_mask(ev_fd->cq_ev_fd, EPOLL_URING_WAKE);
/* IO_EVENTFD_OP_FREE_BIT may not be set here depending on callback
* ordering in a race but if references are 0 we know we have to free
goto out;
if (likely(eventfd_signal_allowed())) {
- eventfd_signal_mask(ev_fd->cq_ev_fd, 1, EPOLL_URING_WAKE);
+ eventfd_signal_mask(ev_fd->cq_ev_fd, EPOLL_URING_WAKE);
} else {
atomic_inc(&ev_fd->refs);
if (!atomic_fetch_or(BIT(IO_EVENTFD_OP_SIGNAL_BIT), &ev_fd->ops))
return -EAGAIN;
}
- file = __close_fd_get_file(close->fd);
+ file = file_close_fd_locked(files, close->fd);
spin_unlock(&files->file_lock);
if (!file)
goto err;
struct file *out = sp->file_out;
unsigned int flags = sp->flags & ~SPLICE_F_FD_IN_FIXED;
struct file *in;
- long ret = 0;
+ ssize_t ret = 0;
WARN_ON_ONCE(issue_flags & IO_URING_F_NONBLOCK);
unsigned int flags = sp->flags & ~SPLICE_F_FD_IN_FIXED;
loff_t *poff_in, *poff_out;
struct file *in;
- long ret = 0;
+ ssize_t ret = 0;
WARN_ON_ONCE(issue_flags & IO_URING_F_NONBLOCK);
config KEXEC_FILE
bool "Enable kexec file based system call"
depends on ARCH_SUPPORTS_KEXEC_FILE
+ select CRYPTO
+ select CRYPTO_SHA256
select KEXEC_CORE
help
This is new version of kexec system call. This system call is
* called from interrupt context and we have preemption disabled while
* spinning.
*/
+
+struct optimistic_spin_node {
+ struct optimistic_spin_node *next, *prev;
+ int locked; /* 1 if lock acquired */
+ int cpu; /* encoded CPU # + 1 value */
+};
+
static DEFINE_PER_CPU_SHARED_ALIGNED(struct optimistic_spin_node, osq_node);
/*
/*
* Get a stable @node->next pointer, either for unlock() or unqueue() purposes.
* Can return NULL in case we were the last queued and we updated @lock instead.
+ *
+ * If osq_lock() is being cancelled there must be a previous node
+ * and 'old_cpu' is its CPU #.
+ * For osq_unlock() there is never a previous node and old_cpu is
+ * set to OSQ_UNLOCKED_VAL.
*/
static inline struct optimistic_spin_node *
osq_wait_next(struct optimistic_spin_queue *lock,
struct optimistic_spin_node *node,
- struct optimistic_spin_node *prev)
+ int old_cpu)
{
- struct optimistic_spin_node *next = NULL;
int curr = encode_cpu(smp_processor_id());
- int old;
-
- /*
- * If there is a prev node in queue, then the 'old' value will be
- * the prev node's CPU #, else it's set to OSQ_UNLOCKED_VAL since if
- * we're currently last in queue, then the queue will then become empty.
- */
- old = prev ? prev->cpu : OSQ_UNLOCKED_VAL;
for (;;) {
if (atomic_read(&lock->tail) == curr &&
- atomic_cmpxchg_acquire(&lock->tail, curr, old) == curr) {
+ atomic_cmpxchg_acquire(&lock->tail, curr, old_cpu) == curr) {
/*
* We were the last queued, we moved @lock back. @prev
* will now observe @lock and will complete its
* unlock()/unqueue().
*/
- break;
+ return NULL;
}
/*
* wait for a new @node->next from its Step-C.
*/
if (node->next) {
+ struct optimistic_spin_node *next;
+
next = xchg(&node->next, NULL);
if (next)
- break;
+ return next;
}
cpu_relax();
}
-
- return next;
}
bool osq_lock(struct optimistic_spin_queue *lock)
* back to @prev.
*/
- next = osq_wait_next(lock, node, prev);
+ next = osq_wait_next(lock, node, prev->cpu);
if (!next)
return false;
return;
}
- next = osq_wait_next(lock, node, NULL);
+ next = osq_wait_next(lock, node, OSQ_UNLOCKED_VAL);
if (next)
WRITE_ONCE(next->locked, 1);
}
if (IS_ERR(file))
return PTR_ERR(file);
- ret = receive_fd(file, O_CLOEXEC);
+ ret = receive_fd(file, NULL, O_CLOEXEC);
fput(file);
return ret;
*/
list_del_init(&addfd->list);
if (!addfd->setfd)
- fd = receive_fd(addfd->file, addfd->flags);
+ fd = receive_fd(addfd->file, NULL, addfd->flags);
else
fd = receive_fd_replace(addfd->fd, addfd->file, addfd->flags);
addfd->ret = fd;
COND_SYSCALL_COMPAT(recvmmsg_time32);
COND_SYSCALL_COMPAT(recvmmsg_time64);
+/* Posix timer syscalls may be configured out */
+COND_SYSCALL(timer_create);
+COND_SYSCALL(timer_gettime);
+COND_SYSCALL(timer_getoverrun);
+COND_SYSCALL(timer_settime);
+COND_SYSCALL(timer_delete);
+COND_SYSCALL(clock_adjtime);
+COND_SYSCALL(getitimer);
+COND_SYSCALL(setitimer);
+COND_SYSCALL(alarm);
+COND_SYSCALL_COMPAT(timer_create);
+COND_SYSCALL_COMPAT(getitimer);
+COND_SYSCALL_COMPAT(setitimer);
+
/*
* Architecture specific syscalls: see further below
*/
#include <linux/time_namespace.h>
#include <linux/compat.h>
-#ifdef CONFIG_ARCH_HAS_SYSCALL_WRAPPER
-/* Architectures may override SYS_NI and COMPAT_SYS_NI */
-#include <asm/syscall_wrapper.h>
-#endif
-
-asmlinkage long sys_ni_posix_timers(void)
-{
- pr_err_once("process %d (%s) attempted a POSIX timer syscall "
- "while CONFIG_POSIX_TIMERS is not set\n",
- current->pid, current->comm);
- return -ENOSYS;
-}
-
-#ifndef SYS_NI
-#define SYS_NI(name) SYSCALL_ALIAS(sys_##name, sys_ni_posix_timers)
-#endif
-
-#ifndef COMPAT_SYS_NI
-#define COMPAT_SYS_NI(name) SYSCALL_ALIAS(compat_sys_##name, sys_ni_posix_timers)
-#endif
-
-SYS_NI(timer_create);
-SYS_NI(timer_gettime);
-SYS_NI(timer_getoverrun);
-SYS_NI(timer_settime);
-SYS_NI(timer_delete);
-SYS_NI(clock_adjtime);
-SYS_NI(getitimer);
-SYS_NI(setitimer);
-SYS_NI(clock_adjtime32);
-#ifdef __ARCH_WANT_SYS_ALARM
-SYS_NI(alarm);
-#endif
-
/*
* We preserve minimal support for CLOCK_REALTIME and CLOCK_MONOTONIC
* as it is easy to remain compatible with little code. CLOCK_BOOTTIME
which_clock);
}
-#ifdef CONFIG_COMPAT
-COMPAT_SYS_NI(timer_create);
-#endif
-
-#if defined(CONFIG_COMPAT) || defined(CONFIG_ALPHA)
-COMPAT_SYS_NI(getitimer);
-COMPAT_SYS_NI(setitimer);
-#endif
-
#ifdef CONFIG_COMPAT_32BIT_TIME
-SYS_NI(timer_settime32);
-SYS_NI(timer_gettime32);
SYSCALL_DEFINE2(clock_settime32, const clockid_t, which_clock,
struct old_timespec32 __user *, tp)
hash->count++;
}
-static int add_hash_entry(struct ftrace_hash *hash, unsigned long ip)
+static struct ftrace_func_entry *
+add_hash_entry(struct ftrace_hash *hash, unsigned long ip)
{
struct ftrace_func_entry *entry;
entry = kmalloc(sizeof(*entry), GFP_KERNEL);
if (!entry)
- return -ENOMEM;
+ return NULL;
entry->ip = ip;
__add_hash_entry(hash, entry);
- return 0;
+ return entry;
}
static void
struct ftrace_func_entry *entry;
struct ftrace_hash *new_hash;
int size;
- int ret;
int i;
new_hash = alloc_ftrace_hash(size_bits);
size = 1 << hash->size_bits;
for (i = 0; i < size; i++) {
hlist_for_each_entry(entry, &hash->buckets[i], hlist) {
- ret = add_hash_entry(new_hash, entry->ip);
- if (ret < 0)
+ if (add_hash_entry(new_hash, entry->ip) == NULL)
goto free_hash;
}
}
#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
/* Protected by rcu_tasks for reading, and direct_mutex for writing */
-static struct ftrace_hash *direct_functions = EMPTY_HASH;
+static struct ftrace_hash __rcu *direct_functions = EMPTY_HASH;
static DEFINE_MUTEX(direct_mutex);
int ftrace_direct_func_count;
return entry->direct;
}
-static struct ftrace_func_entry*
-ftrace_add_rec_direct(unsigned long ip, unsigned long addr,
- struct ftrace_hash **free_hash)
-{
- struct ftrace_func_entry *entry;
-
- if (ftrace_hash_empty(direct_functions) ||
- direct_functions->count > 2 * (1 << direct_functions->size_bits)) {
- struct ftrace_hash *new_hash;
- int size = ftrace_hash_empty(direct_functions) ? 0 :
- direct_functions->count + 1;
-
- if (size < 32)
- size = 32;
-
- new_hash = dup_hash(direct_functions, size);
- if (!new_hash)
- return NULL;
-
- *free_hash = direct_functions;
- direct_functions = new_hash;
- }
-
- entry = kmalloc(sizeof(*entry), GFP_KERNEL);
- if (!entry)
- return NULL;
-
- entry->ip = ip;
- entry->direct = addr;
- __add_hash_entry(direct_functions, entry);
- return entry;
-}
-
static void call_direct_funcs(unsigned long ip, unsigned long pip,
struct ftrace_ops *ops, struct ftrace_regs *fregs)
{
/* Do nothing if it exists */
if (entry)
return 0;
-
- ret = add_hash_entry(hash, rec->ip);
+ if (add_hash_entry(hash, rec->ip) == NULL)
+ ret = -ENOMEM;
}
return ret;
}
return 0;
}
- return add_hash_entry(hash, ip);
+ entry = add_hash_entry(hash, ip);
+ return entry ? 0 : -ENOMEM;
}
static int
*/
int register_ftrace_direct(struct ftrace_ops *ops, unsigned long addr)
{
- struct ftrace_hash *hash, *free_hash = NULL;
+ struct ftrace_hash *hash, *new_hash = NULL, *free_hash = NULL;
struct ftrace_func_entry *entry, *new;
int err = -EBUSY, size, i;
}
}
- /* ... and insert them to direct_functions hash. */
err = -ENOMEM;
+
+ /* Make a copy hash to place the new and the old entries in */
+ size = hash->count + direct_functions->count;
+ if (size > 32)
+ size = 32;
+ new_hash = alloc_ftrace_hash(fls(size));
+ if (!new_hash)
+ goto out_unlock;
+
+ /* Now copy over the existing direct entries */
+ size = 1 << direct_functions->size_bits;
+ for (i = 0; i < size; i++) {
+ hlist_for_each_entry(entry, &direct_functions->buckets[i], hlist) {
+ new = add_hash_entry(new_hash, entry->ip);
+ if (!new)
+ goto out_unlock;
+ new->direct = entry->direct;
+ }
+ }
+
+ /* ... and add the new entries */
+ size = 1 << hash->size_bits;
for (i = 0; i < size; i++) {
hlist_for_each_entry(entry, &hash->buckets[i], hlist) {
- new = ftrace_add_rec_direct(entry->ip, addr, &free_hash);
+ new = add_hash_entry(new_hash, entry->ip);
if (!new)
- goto out_remove;
+ goto out_unlock;
+ /* Update both the copy and the hash entry */
+ new->direct = addr;
entry->direct = addr;
}
}
+ free_hash = direct_functions;
+ rcu_assign_pointer(direct_functions, new_hash);
+ new_hash = NULL;
+
ops->func = call_direct_funcs;
ops->flags = MULTI_FLAGS;
ops->trampoline = FTRACE_REGS_ADDR;
err = register_ftrace_function_nolock(ops);
- out_remove:
- if (err)
- remove_direct_functions_hash(hash, addr);
-
out_unlock:
mutex_unlock(&direct_mutex);
- if (free_hash) {
+ if (free_hash && free_hash != EMPTY_HASH) {
synchronize_rcu_tasks();
free_ftrace_hash(free_hash);
}
+
+ if (new_hash)
+ free_ftrace_hash(new_hash);
+
return err;
}
EXPORT_SYMBOL_GPL(register_ftrace_direct);
if (entry)
continue;
- if (add_hash_entry(hash, rec->ip) < 0)
+ if (add_hash_entry(hash, rec->ip) == NULL)
goto out;
} else {
if (entry) {
return local_try_cmpxchg(l, &expect, set);
}
-static bool rb_time_cmpxchg(rb_time_t *t, u64 expect, u64 set)
-{
- unsigned long cnt, top, bottom, msb;
- unsigned long cnt2, top2, bottom2, msb2;
- u64 val;
-
- /* Any interruptions in this function should cause a failure */
- cnt = local_read(&t->cnt);
-
- /* The cmpxchg always fails if it interrupted an update */
- if (!__rb_time_read(t, &val, &cnt2))
- return false;
-
- if (val != expect)
- return false;
-
- if ((cnt & 3) != cnt2)
- return false;
-
- cnt2 = cnt + 1;
-
- rb_time_split(val, &top, &bottom, &msb);
- msb = rb_time_val_cnt(msb, cnt);
- top = rb_time_val_cnt(top, cnt);
- bottom = rb_time_val_cnt(bottom, cnt);
-
- rb_time_split(set, &top2, &bottom2, &msb2);
- msb2 = rb_time_val_cnt(msb2, cnt);
- top2 = rb_time_val_cnt(top2, cnt2);
- bottom2 = rb_time_val_cnt(bottom2, cnt2);
-
- if (!rb_time_read_cmpxchg(&t->cnt, cnt, cnt2))
- return false;
- if (!rb_time_read_cmpxchg(&t->msb, msb, msb2))
- return false;
- if (!rb_time_read_cmpxchg(&t->top, top, top2))
- return false;
- if (!rb_time_read_cmpxchg(&t->bottom, bottom, bottom2))
- return false;
- return true;
-}
-
#else /* 64 bits */
/* local64_t always succeeds */
{
local64_set(&t->time, val);
}
-
-static bool rb_time_cmpxchg(rb_time_t *t, u64 expect, u64 set)
-{
- return local64_try_cmpxchg(&t->time, &expect, set);
-}
#endif
/*
if (!nr_pages || !full)
return true;
- dirty = ring_buffer_nr_dirty_pages(buffer, cpu);
+ /*
+ * Add one as dirty will never equal nr_pages, as the sub-buffer
+ * that the writer is on is not counted as dirty.
+ * This is needed if "buffer_percent" is set to 100.
+ */
+ dirty = ring_buffer_nr_dirty_pages(buffer, cpu) + 1;
- return (dirty * 100) > (full * nr_pages);
+ return (dirty * 100) >= (full * nr_pages);
}
/*
/* make sure the waiters see the new index */
smp_wmb();
- rb_wake_up_waiters(&rbwork->work);
+ /* This can be called in any context */
+ irq_work_queue(&rbwork->work);
}
/**
} else {
u64 ts;
/* SLOW PATH - Interrupted between A and C */
- a_ok = rb_time_read(&cpu_buffer->write_stamp, &info->after);
- /* Was interrupted before here, write_stamp must be valid */
+
+ /* Save the old before_stamp */
+ a_ok = rb_time_read(&cpu_buffer->before_stamp, &info->before);
RB_WARN_ON(cpu_buffer, !a_ok);
+
+ /*
+ * Read a new timestamp and update the before_stamp to make
+ * the next event after this one force using an absolute
+ * timestamp. This is in case an interrupt were to come in
+ * between E and F.
+ */
ts = rb_time_stamp(cpu_buffer->buffer);
+ rb_time_set(&cpu_buffer->before_stamp, ts);
+
+ barrier();
+ /*E*/ a_ok = rb_time_read(&cpu_buffer->write_stamp, &info->after);
+ /* Was interrupted before here, write_stamp must be valid */
+ RB_WARN_ON(cpu_buffer, !a_ok);
barrier();
- /*E*/ if (write == (local_read(&tail_page->write) & RB_WRITE_MASK) &&
- info->after < ts &&
- rb_time_cmpxchg(&cpu_buffer->write_stamp,
- info->after, ts)) {
- /* Nothing came after this event between C and E */
+ /*F*/ if (write == (local_read(&tail_page->write) & RB_WRITE_MASK) &&
+ info->after == info->before && info->after < ts) {
+ /*
+ * Nothing came after this event between C and F, it is
+ * safe to use info->after for the delta as it
+ * matched info->before and is still valid.
+ */
info->delta = ts - info->after;
} else {
/*
- * Interrupted between C and E:
+ * Interrupted between C and F:
* Lost the previous events time stamp. Just set the
* delta to zero, and this will be the same time as
* the event this event interrupted. And the events that
ret = test_trace_synth_event();
WARN_ON(ret);
+
+ /* Disable when done */
+ trace_array_set_clr_event(gen_synth_test->tr,
+ "synthetic",
+ "gen_synth_test", false);
+ trace_array_set_clr_event(empty_synth_test->tr,
+ "synthetic",
+ "empty_synth_test", false);
+ trace_array_set_clr_event(create_synth_test->tr,
+ "synthetic",
+ "create_synth_test", false);
out:
return ret;
}
__update_max_tr(tr, tsk, cpu);
arch_spin_unlock(&tr->max_lock);
+
+ /* Any waiters on the old snapshot buffer need to wake up */
+ ring_buffer_wake_waiters(tr->array_buffer.buffer, RING_BUFFER_ALL_CPUS);
}
/**
static int wait_on_pipe(struct trace_iterator *iter, int full)
{
+ int ret;
+
/* Iterators are static, they should be filled or empty */
if (trace_buffer_iter(iter, iter->cpu_file))
return 0;
- return ring_buffer_wait(iter->array_buffer->buffer, iter->cpu_file,
- full);
+ ret = ring_buffer_wait(iter->array_buffer->buffer, iter->cpu_file, full);
+
+#ifdef CONFIG_TRACER_MAX_TRACE
+ /*
+ * Make sure this is still the snapshot buffer, as if a snapshot were
+ * to happen, this would now be the main buffer.
+ */
+ if (iter->snapshot)
+ iter->array_buffer = &iter->tr->max_buffer;
+#endif
+ return ret;
}
#ifdef CONFIG_FTRACE_STARTUP_TEST
wait_index = READ_ONCE(iter->wait_index);
- ret = wait_on_pipe(iter, iter->tr->buffer_percent);
+ ret = wait_on_pipe(iter, iter->snapshot ? 0 : iter->tr->buffer_percent);
if (ret)
goto out;
* @cmd: A pointer to the dynevent_cmd struct representing the new event
* @name: The name of the synthetic event
* @mod: The module creating the event, NULL if not created from a module
- * @args: Variable number of arg (pairs), one pair for each field
+ * @...: Variable number of arg (pairs), one pair for each field
*
* NOTE: Users normally won't want to call this function directly, but
* rather use the synth_event_gen_cmd_start() wrapper, which
* synth_event_trace - Trace a synthetic event
* @file: The trace_event_file representing the synthetic event
* @n_vals: The number of values in vals
- * @args: Variable number of args containing the event values
+ * @...: Variable number of args containing the event values
*
* Trace a synthetic event using the values passed in the variable
* argument list.
static ssize_t user_events_write(struct file *file, const char __user *ubuf,
size_t count, loff_t *ppos)
{
- struct iovec iov;
struct iov_iter i;
if (unlikely(*ppos != 0))
return -EFAULT;
- if (unlikely(import_single_range(ITER_SOURCE, (char __user *)ubuf,
- count, &iov, &i)))
+ if (unlikely(import_ubuf(ITER_SOURCE, (char __user *)ubuf, count, &i)))
return -EFAULT;
return user_events_write_core(file, &i);
}
EXPORT_SYMBOL(__put_user_ns);
-/**
+/*
* struct idmap_key - holds the information necessary to find an idmapping in a
* sorted idmap array. It is passed to cmp_map_id() as first argument.
*/
u32 count; /* == 0 unless used with map_id_range_down() */
};
-/**
+/*
* cmp_map_id - Function to be passed to bsearch() to find the requested
* idmapping. Expects struct idmap_key to be passed via @k.
*/
return 1;
}
-/**
+/*
* map_id_range_down_max - Find idmap via binary search in ordered idmap array.
* Can only be called if number of mappings exceeds UID_GID_MAP_MAX_BASE_EXTENTS.
*/
sizeof(struct uid_gid_extent), cmp_map_id);
}
-/**
+/*
* map_id_range_down_base - Find idmap via binary search in static extent array.
* Can only be called if number of mappings is equal or less than
* UID_GID_MAP_MAX_BASE_EXTENTS.
return id;
}
-static u32 map_id_down(struct uid_gid_map *map, u32 id)
+u32 map_id_down(struct uid_gid_map *map, u32 id)
{
return map_id_range_down(map, id, 1);
}
-/**
+/*
* map_id_up_base - Find idmap via binary search in static extent array.
* Can only be called if number of mappings is equal or less than
* UID_GID_MAP_MAX_BASE_EXTENTS.
return NULL;
}
-/**
+/*
* map_id_up_max - Find idmap via binary search in ordered idmap array.
* Can only be called if number of mappings exceeds UID_GID_MAP_MAX_BASE_EXTENTS.
*/
sizeof(struct uid_gid_extent), cmp_map_id);
}
-static u32 map_id_up(struct uid_gid_map *map, u32 id)
+u32 map_id_up(struct uid_gid_map *map, u32 id)
{
struct uid_gid_extent *extent;
unsigned extents = map->nr_extents;
return false;
}
-/**
+/*
* insert_extent - Safely insert a new idmap extent into struct uid_gid_map.
* Takes care to allocate a 4K block of memory if the number of mappings exceeds
* UID_GID_MAP_MAX_BASE_EXTENTS.
return 0;
}
-/**
+/*
* sort_idmaps - Sorts an array of idmap entries.
* Can only be called if number of mappings exceeds UID_GID_MAP_MAX_BASE_EXTENTS.
*/
goto error;
ret = -ENOMEM;
- pages = kcalloc(sizeof(struct page *), nr_pages, GFP_KERNEL);
+ pages = kcalloc(nr_pages, sizeof(struct page *), GFP_KERNEL);
if (!pages)
goto error;
goto delete;
xas_store(&xas, xa_mk_value(v));
} else {
- if (!test_bit(bit, bitmap->bitmap))
+ if (!bitmap || !test_bit(bit, bitmap->bitmap))
goto err;
__clear_bit(bit, bitmap->bitmap);
xas_set_mark(&xas, XA_FREE_MARK);
}
EXPORT_SYMBOL(import_iovec);
-int import_single_range(int rw, void __user *buf, size_t len,
- struct iovec *iov, struct iov_iter *i)
-{
- if (len > MAX_RW_COUNT)
- len = MAX_RW_COUNT;
- if (unlikely(!access_ok(buf, len)))
- return -EFAULT;
-
- iov_iter_ubuf(i, rw, buf, len);
- return 0;
-}
-EXPORT_SYMBOL(import_single_range);
-
int import_ubuf(int rw, void __user *buf, size_t len, struct iov_iter *i)
{
if (len > MAX_RW_COUNT)
mas_wr_end_piv(&wr_mas);
node_size = mas_wr_new_end(&wr_mas);
+
+ /* Slot store, does not require additional nodes */
+ if (node_size == wr_mas.node_end) {
+ /* reuse node */
+ if (!mt_in_rcu(mas->tree))
+ return 0;
+ /* shifting boundary */
+ if (wr_mas.offset_end - mas->offset == 1)
+ return 0;
+ }
+
if (node_size >= mt_slots[wr_mas.type]) {
/* Split, worst case for now. */
request = 1 + mas_mt_height(mas) * 2;
IDA_BUG_ON(ida, !ida_is_empty(ida));
}
+/*
+ * Check various situations where we attempt to free an ID we don't own.
+ */
+static void ida_check_bad_free(struct ida *ida)
+{
+ unsigned long i;
+
+ printk("vvv Ignore \"not allocated\" warnings\n");
+ /* IDA is empty; all of these will fail */
+ ida_free(ida, 0);
+ for (i = 0; i < 31; i++)
+ ida_free(ida, 1 << i);
+
+ /* IDA contains a single value entry */
+ IDA_BUG_ON(ida, ida_alloc_min(ida, 3, GFP_KERNEL) != 3);
+ ida_free(ida, 0);
+ for (i = 0; i < 31; i++)
+ ida_free(ida, 1 << i);
+
+ /* IDA contains a single bitmap */
+ IDA_BUG_ON(ida, ida_alloc_min(ida, 1023, GFP_KERNEL) != 1023);
+ ida_free(ida, 0);
+ for (i = 0; i < 31; i++)
+ ida_free(ida, 1 << i);
+
+ /* IDA contains a tree */
+ IDA_BUG_ON(ida, ida_alloc_min(ida, (1 << 20) - 1, GFP_KERNEL) != (1 << 20) - 1);
+ ida_free(ida, 0);
+ for (i = 0; i < 31; i++)
+ ida_free(ida, 1 << i);
+ printk("^^^ \"not allocated\" warnings over\n");
+
+ ida_free(ida, 3);
+ ida_free(ida, 1023);
+ ida_free(ida, (1 << 20) - 1);
+
+ IDA_BUG_ON(ida, !ida_is_empty(ida));
+}
+
static DEFINE_IDA(ida);
static int ida_checks(void)
ida_check_leaf(&ida, 1024 * 64);
ida_check_max(&ida);
ida_check_conv(&ida);
+ ida_check_bad_free(&ida);
printk("IDA: %u of %u tests passed\n", tests_passed, tests_run);
return (tests_run != tests_passed) ? 0 : -EINVAL;
/* Loop starting from the root node to the current node. */
for (depth = fwnode_count_parents(fwnode); depth >= 0; depth--) {
- struct fwnode_handle *__fwnode =
- fwnode_get_nth_parent(fwnode, depth);
+ /*
+ * Only get a reference for other nodes (i.e. parent nodes).
+ * fwnode refcount may be 0 here.
+ */
+ struct fwnode_handle *__fwnode = depth ?
+ fwnode_get_nth_parent(fwnode, depth) : fwnode;
buf = string(buf, end, fwnode_get_name_prefix(__fwnode),
default_str_spec);
buf = string(buf, end, fwnode_get_name(__fwnode),
default_str_spec);
- fwnode_handle_put(__fwnode);
+ if (depth)
+ fwnode_handle_put(__fwnode);
}
return buf;
goto put_folios;
end_offset = min_t(loff_t, isize, iocb->ki_pos + iter->count);
+ /*
+ * Pairs with a barrier in
+ * block_write_end()->mark_buffer_dirty() or other page
+ * dirtying routines like iomap_write_end() to ensure
+ * changes to page contents are visible before we see
+ * increased inode size.
+ */
+ smp_rmb();
+
/*
* Once we start copying data, we don't want to be touching any
* cachelines that might be contended:
spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
if (!list_empty(&folio->_deferred_list)) {
ds_queue->split_queue_len--;
- list_del(&folio->_deferred_list);
+ list_del_init(&folio->_deferred_list);
}
spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
}
* The VERY common case is inode->mapping == &inode->i_data but,
* this may not be true for device special inodes.
*/
- return (struct resv_map *)(&inode->i_data)->private_data;
+ return (struct resv_map *)(&inode->i_data)->i_private_data;
}
static struct resv_map *vma_resv_map(struct vm_area_struct *vma)
{
char *ptr;
size_t size = 128 - KASAN_GRANULE_SIZE;
+ size_t memset_size = 2;
KASAN_TEST_NEEDS_CHECKED_MEMINTRINSICS(test);
ptr = kmalloc(size, GFP_KERNEL);
KUNIT_ASSERT_NOT_ERR_OR_NULL(test, ptr);
+ OPTIMIZER_HIDE_VAR(ptr);
OPTIMIZER_HIDE_VAR(size);
- KUNIT_EXPECT_KASAN_FAIL(test, memset(ptr + size - 1, 0, 2));
+ OPTIMIZER_HIDE_VAR(memset_size);
+ KUNIT_EXPECT_KASAN_FAIL(test, memset(ptr + size - 1, 0, memset_size));
kfree(ptr);
}
{
char *ptr;
size_t size = 128 - KASAN_GRANULE_SIZE;
+ size_t memset_size = 4;
KASAN_TEST_NEEDS_CHECKED_MEMINTRINSICS(test);
ptr = kmalloc(size, GFP_KERNEL);
KUNIT_ASSERT_NOT_ERR_OR_NULL(test, ptr);
+ OPTIMIZER_HIDE_VAR(ptr);
OPTIMIZER_HIDE_VAR(size);
- KUNIT_EXPECT_KASAN_FAIL(test, memset(ptr + size - 3, 0, 4));
+ OPTIMIZER_HIDE_VAR(memset_size);
+ KUNIT_EXPECT_KASAN_FAIL(test, memset(ptr + size - 3, 0, memset_size));
kfree(ptr);
}
{
char *ptr;
size_t size = 128 - KASAN_GRANULE_SIZE;
+ size_t memset_size = 8;
KASAN_TEST_NEEDS_CHECKED_MEMINTRINSICS(test);
ptr = kmalloc(size, GFP_KERNEL);
KUNIT_ASSERT_NOT_ERR_OR_NULL(test, ptr);
+ OPTIMIZER_HIDE_VAR(ptr);
OPTIMIZER_HIDE_VAR(size);
- KUNIT_EXPECT_KASAN_FAIL(test, memset(ptr + size - 7, 0, 8));
+ OPTIMIZER_HIDE_VAR(memset_size);
+ KUNIT_EXPECT_KASAN_FAIL(test, memset(ptr + size - 7, 0, memset_size));
kfree(ptr);
}
{
char *ptr;
size_t size = 128 - KASAN_GRANULE_SIZE;
+ size_t memset_size = 16;
KASAN_TEST_NEEDS_CHECKED_MEMINTRINSICS(test);
ptr = kmalloc(size, GFP_KERNEL);
KUNIT_ASSERT_NOT_ERR_OR_NULL(test, ptr);
+ OPTIMIZER_HIDE_VAR(ptr);
OPTIMIZER_HIDE_VAR(size);
- KUNIT_EXPECT_KASAN_FAIL(test, memset(ptr + size - 15, 0, 16));
+ OPTIMIZER_HIDE_VAR(memset_size);
+ KUNIT_EXPECT_KASAN_FAIL(test, memset(ptr + size - 15, 0, memset_size));
kfree(ptr);
}
* only one element of the array here.
*/
for (; i >= 0 && unlikely(t->entries[i].threshold > usage); i--)
- eventfd_signal(t->entries[i].eventfd, 1);
+ eventfd_signal(t->entries[i].eventfd);
/* i = current_threshold + 1 */
i++;
* only one element of the array here.
*/
for (; i < t->size && unlikely(t->entries[i].threshold <= usage); i++)
- eventfd_signal(t->entries[i].eventfd, 1);
+ eventfd_signal(t->entries[i].eventfd);
/* Update current_threshold */
t->current_threshold = i - 1;
spin_lock(&memcg_oom_lock);
list_for_each_entry(ev, &memcg->oom_notify, list)
- eventfd_signal(ev->eventfd, 1);
+ eventfd_signal(ev->eventfd);
spin_unlock(&memcg_oom_lock);
return 0;
/* already in OOM ? */
if (memcg->under_oom)
- eventfd_signal(eventfd, 1);
+ eventfd_signal(eventfd);
spin_unlock(&memcg_oom_lock);
return 0;
event->unregister_event(memcg, event->eventfd);
/* Notify userspace the event is going away. */
- eventfd_signal(event->eventfd, 1);
+ eventfd_signal(event->eventfd);
eventfd_ctx_put(event->eventfd);
kfree(event);
/* Transfer the charge and the css ref */
commit_charge(new, memcg);
+ /*
+ * If the old folio is a large folio and is in the split queue, it needs
+ * to be removed from the split queue now, in case getting an incorrect
+ * split queue in destroy_large_folio() after the memcg of the old folio
+ * is cleared.
+ *
+ * In addition, the old folio is about to be freed after migration, so
+ * removing from the split queue a bit earlier seems reasonable.
+ */
+ if (folio_test_large(old) && folio_test_large_rmappable(old))
+ folio_undo_large_rmappable(old);
old->memcg_data = 0;
}
/*
* Collect processes when the error hit an anonymous page.
*/
-static void collect_procs_anon(struct page *page, struct list_head *to_kill,
- int force_early)
+static void collect_procs_anon(struct folio *folio, struct page *page,
+ struct list_head *to_kill, int force_early)
{
- struct folio *folio = page_folio(page);
struct vm_area_struct *vma;
struct task_struct *tsk;
struct anon_vma *av;
/*
* Collect processes when the error hit a file mapped page.
*/
-static void collect_procs_file(struct page *page, struct list_head *to_kill,
- int force_early)
+static void collect_procs_file(struct folio *folio, struct page *page,
+ struct list_head *to_kill, int force_early)
{
struct vm_area_struct *vma;
struct task_struct *tsk;
- struct address_space *mapping = page->mapping;
+ struct address_space *mapping = folio->mapping;
pgoff_t pgoff;
i_mmap_lock_read(mapping);
/*
* Collect the processes who have the corrupted page mapped to kill.
*/
-static void collect_procs(struct page *page, struct list_head *tokill,
- int force_early)
+static void collect_procs(struct folio *folio, struct page *page,
+ struct list_head *tokill, int force_early)
{
- if (!page->mapping)
+ if (!folio->mapping)
return;
if (unlikely(PageKsm(page)))
collect_procs_ksm(page, tokill, force_early);
else if (PageAnon(page))
- collect_procs_anon(page, tokill, force_early);
+ collect_procs_anon(folio, page, tokill, force_early);
else
- collect_procs_file(page, tokill, force_early);
+ collect_procs_file(folio, page, tokill, force_early);
}
struct hwpoison_walk {
* This check implies we don't kill processes if their pages
* are in the swap cache early. Those are always late kills.
*/
- if (!page_mapped(hpage))
+ if (!page_mapped(p))
return true;
if (PageSwapCache(p)) {
* mapped in dirty form. This has to be done before try_to_unmap,
* because ttu takes the rmap data structures down.
*/
- collect_procs(hpage, &tokill, flags & MF_ACTION_REQUIRED);
+ collect_procs(folio, p, &tokill, flags & MF_ACTION_REQUIRED);
if (PageHuge(hpage) && !PageAnon(hpage)) {
/*
try_to_unmap(folio, ttu);
}
- unmap_success = !page_mapped(hpage);
+ unmap_success = !page_mapped(p);
if (!unmap_success)
pr_err("%#lx: failed to unmap page (mapcount=%d)\n",
- pfn, page_mapcount(hpage));
+ pfn, page_mapcount(p));
/*
* try_to_unmap() might put mlocked page in lru cache, so call
* mapping being torn down is communicated in siginfo, see
* kill_proc()
*/
- loff_t start = (index << PAGE_SHIFT) & ~(size - 1);
+ loff_t start = ((loff_t)index << PAGE_SHIFT) & ~(size - 1);
unmap_mapping_range(mapping, start, size, 0);
}
* SIGBUS (i.e. MF_MUST_KILL)
*/
flags |= MF_ACTION_REQUIRED | MF_MUST_KILL;
- collect_procs(&folio->page, &to_kill, true);
+ collect_procs(folio, &folio->page, &to_kill, true);
unmap_and_kill(&to_kill, pfn, folio->mapping, folio->index, flags);
unlock:
void unmap_mapping_range(struct address_space *mapping,
loff_t const holebegin, loff_t const holelen, int even_cows)
{
- pgoff_t hba = holebegin >> PAGE_SHIFT;
- pgoff_t hlen = (holelen + PAGE_SIZE - 1) >> PAGE_SHIFT;
+ pgoff_t hba = (pgoff_t)(holebegin) >> PAGE_SHIFT;
+ pgoff_t hlen = ((pgoff_t)(holelen) + PAGE_SIZE - 1) >> PAGE_SHIFT;
/* Check for overflow. */
if (sizeof(holelen) > sizeof(hlen)) {
int dirty;
int expected_count = folio_expected_refs(mapping, folio) + extra_count;
long nr = folio_nr_pages(folio);
+ long entries, i;
if (!mapping) {
/* Anonymous page without mapping */
folio_set_swapcache(newfolio);
newfolio->private = folio_get_private(folio);
}
+ entries = nr;
} else {
VM_BUG_ON_FOLIO(folio_test_swapcache(folio), folio);
+ entries = 1;
}
/* Move dirty while page refs frozen and newpage not yet exposed */
folio_set_dirty(newfolio);
}
- xas_store(&xas, newfolio);
+ /* Swap cache still stores N entries instead of a high-order entry */
+ for (i = 0; i < entries; i++) {
+ xas_store(&xas, newfolio);
+ xas_next(&xas);
+ }
/*
* Drop cache reference from old page by unfreezing
recheck_buffers:
busy = false;
- spin_lock(&mapping->private_lock);
+ spin_lock(&mapping->i_private_lock);
bh = head;
do {
if (atomic_read(&bh->b_count)) {
rc = -EAGAIN;
goto unlock_buffers;
}
- spin_unlock(&mapping->private_lock);
+ spin_unlock(&mapping->i_private_lock);
invalidate_bh_lrus();
invalidated = true;
goto recheck_buffers;
rc = MIGRATEPAGE_SUCCESS;
unlock_buffers:
if (check_refs)
- spin_unlock(&mapping->private_lock);
+ spin_unlock(&mapping->i_private_lock);
bh = head;
do {
unlock_buffer(bh);
*/
pgoff = 0;
get_area = shmem_get_unmapped_area;
+ } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
+ /* Ensures that larger anonymous mappings are THP aligned. */
+ get_area = thp_get_unmapped_area;
}
addr = get_area(file, addr, len, pgoff, flags);
if (min_ratio > 100 * BDI_RATIO_SCALE)
return -EINVAL;
- min_ratio *= BDI_RATIO_SCALE;
spin_lock_bh(&bdi_lock);
if (min_ratio > bdi->max_ratio) {
ret = -EINVAL;
} else {
bdi->max_ratio = max_ratio;
- bdi->max_prop_frac = (FPROP_FRAC_BASE * max_ratio) / 100;
+ bdi->max_prop_frac = (FPROP_FRAC_BASE * max_ratio) /
+ (100 * BDI_RATIO_SCALE);
}
spin_unlock_bh(&bdi_lock);
if (new_nr_max <= old->map_nr_max)
continue;
- new = kvmalloc_node(sizeof(*new) + new_size, GFP_KERNEL, nid);
+ new = kvzalloc_node(sizeof(*new) + new_size, GFP_KERNEL, nid);
if (!new)
return -ENOMEM;
continue;
if (level < ev->level)
continue;
- eventfd_signal(ev->efd, 1);
+ eventfd_signal(ev->efd);
ret = true;
}
mutex_unlock(&vmpr->events_lock);
int young = 0;
pte_t *pte = pvmw->pte;
unsigned long addr = pvmw->address;
+ struct vm_area_struct *vma = pvmw->vma;
struct folio *folio = pfn_folio(pvmw->pfn);
bool can_swap = !folio_is_file_lru(folio);
struct mem_cgroup *memcg = folio_memcg(folio);
if (spin_is_contended(pvmw->ptl))
return;
+ /* exclude special VMAs containing anon pages from COW */
+ if (vma->vm_flags & VM_SPECIAL)
+ return;
+
/* avoid taking the LRU lock under the PTL when possible */
walk = current->reclaim_state ? current->reclaim_state->mm_walk : NULL;
- start = max(addr & PMD_MASK, pvmw->vma->vm_start);
- end = min(addr | ~PMD_MASK, pvmw->vma->vm_end - 1) + 1;
+ start = max(addr & PMD_MASK, vma->vm_start);
+ end = min(addr | ~PMD_MASK, vma->vm_end - 1) + 1;
if (end - start > MIN_LRU_BATCH * PAGE_SIZE) {
if (addr - start < MIN_LRU_BATCH * PAGE_SIZE / 2)
unsigned long pfn;
pte_t ptent = ptep_get(pte + i);
- pfn = get_pte_pfn(ptent, pvmw->vma, addr);
+ pfn = get_pte_pfn(ptent, vma, addr);
if (pfn == -1)
continue;
if (!folio)
continue;
- if (!ptep_test_and_clear_young(pvmw->vma, addr, pte + i))
+ if (!ptep_test_and_clear_young(vma, addr, pte + i))
VM_WARN_ON_ONCE(true);
young++;
return 0;
list_for_each_entry(vid_info, &vlan_info->vid_list, list) {
+ if (!vlan_hw_filter_capable(by_dev, vid_info->proto))
+ continue;
err = vlan_vid_add(dev, vid_info->proto, vid_info->vid);
if (err)
goto unwind;
list_for_each_entry_continue_reverse(vid_info,
&vlan_info->vid_list,
list) {
+ if (!vlan_hw_filter_capable(by_dev, vid_info->proto))
+ continue;
vlan_vid_del(dev, vid_info->proto, vid_info->vid);
}
if (!vlan_info)
return;
- list_for_each_entry(vid_info, &vlan_info->vid_list, list)
+ list_for_each_entry(vid_info, &vlan_info->vid_list, list) {
+ if (!vlan_hw_filter_capable(by_dev, vid_info->proto))
+ continue;
vlan_vid_del(dev, vid_info->proto, vid_info->vid);
+ }
}
EXPORT_SYMBOL(vlan_vids_del_by_dev);
uint16_t *nwname = va_arg(ap, uint16_t *);
char ***wnames = va_arg(ap, char ***);
+ *wnames = NULL;
+
errcode = p9pdu_readf(pdu, proto_version,
"w", nwname);
if (!errcode) {
GFP_NOFS);
if (!*wnames)
errcode = -ENOMEM;
+ else
+ (*wnames)[0] = NULL;
}
if (!errcode) {
proto_version,
"s",
&(*wnames)[i]);
- if (errcode)
+ if (errcode) {
+ (*wnames)[i] = NULL;
break;
+ }
}
}
if (*wnames) {
int i;
- for (i = 0; i < *nwname; i++)
+ for (i = 0; i < *nwname; i++) {
+ if (!(*wnames)[i])
+ break;
kfree((*wnames)[i]);
+ }
+ kfree(*wnames);
+ *wnames = NULL;
}
- kfree(*wnames);
- *wnames = NULL;
}
}
break;
if (flags & MSG_OOB)
return -EOPNOTSUPP;
+ lock_sock(sk);
+
skb = skb_recv_datagram(sk, flags, &err);
if (!skb) {
if (sk->sk_shutdown & RCV_SHUTDOWN)
- return 0;
+ err = 0;
+ release_sock(sk);
return err;
}
skb_free_datagram(sk, skb);
+ release_sock(sk);
+
if (flags & MSG_TRUNC)
copied = skblen;
{
struct hci_rp_read_class_of_dev *rp = data;
+ if (WARN_ON(!hdev))
+ return HCI_ERROR_UNSPECIFIED;
+
bt_dev_dbg(hdev, "status 0x%2.2x", rp->status);
if (rp->status)
} else {
conn->enc_key_size = rp->key_size;
status = 0;
+
+ if (conn->enc_key_size < hdev->min_enc_key_size) {
+ /* As slave role, the conn->state has been set to
+ * BT_CONNECTED and l2cap conn req might not be received
+ * yet, at this moment the l2cap layer almost does
+ * nothing with the non-zero status.
+ * So we also clear encrypt related bits, and then the
+ * handler of l2cap conn req will get the right secure
+ * state at a later time.
+ */
+ status = HCI_ERROR_AUTH_FAILURE;
+ clear_bit(HCI_CONN_ENCRYPT, &conn->flags);
+ clear_bit(HCI_CONN_AES_CCM, &conn->flags);
+ }
}
- hci_encrypt_cfm(conn, 0);
+ hci_encrypt_cfm(conn, status);
done:
hci_dev_unlock(hdev);
if (!rp->status)
conn->auth_payload_timeout = get_unaligned_le16(sent + 2);
- hci_encrypt_cfm(conn, 0);
-
unlock:
hci_dev_unlock(hdev);
return;
}
- set_bit(HCI_INQUIRY, &hdev->flags);
+ if (hci_sent_cmd_data(hdev, HCI_OP_INQUIRY))
+ set_bit(HCI_INQUIRY, &hdev->flags);
}
static void hci_cs_create_conn(struct hci_dev *hdev, __u8 status)
cp.handle = cpu_to_le16(conn->handle);
cp.timeout = cpu_to_le16(hdev->auth_payload_timeout);
if (hci_send_cmd(conn->hdev, HCI_OP_WRITE_AUTH_PAYLOAD_TO,
- sizeof(cp), &cp)) {
+ sizeof(cp), &cp))
bt_dev_err(hdev, "write auth payload timeout failed");
- goto notify;
- }
-
- goto unlock;
}
notify:
kfree_skb(skb);
}
+static inline void l2cap_sig_send_rej(struct l2cap_conn *conn, u16 ident)
+{
+ struct l2cap_cmd_rej_unk rej;
+
+ rej.reason = cpu_to_le16(L2CAP_REJ_NOT_UNDERSTOOD);
+ l2cap_send_cmd(conn, ident, L2CAP_COMMAND_REJ, sizeof(rej), &rej);
+}
+
static inline void l2cap_sig_channel(struct l2cap_conn *conn,
struct sk_buff *skb)
{
if (len > skb->len || !cmd->ident) {
BT_DBG("corrupted command");
+ l2cap_sig_send_rej(conn, cmd->ident);
break;
}
err = l2cap_bredr_sig_cmd(conn, cmd, len, skb->data);
if (err) {
- struct l2cap_cmd_rej_unk rej;
-
BT_ERR("Wrong link type (%d)", err);
-
- rej.reason = cpu_to_le16(L2CAP_REJ_NOT_UNDERSTOOD);
- l2cap_send_cmd(conn, cmd->ident, L2CAP_COMMAND_REJ,
- sizeof(rej), &rej);
+ l2cap_sig_send_rej(conn, cmd->ident);
}
skb_pull(skb, len);
}
+ if (skb->len > 0) {
+ BT_DBG("corrupted command");
+ l2cap_sig_send_rej(conn, 0);
+ }
+
drop:
kfree_skb(skb);
}
for (i = 0; i < key_count; i++) {
struct mgmt_link_key_info *key = &cp->keys[i];
- if (key->addr.type != BDADDR_BREDR || key->type > 0x08)
+ /* Considering SMP over BREDR/LE, there is no need to check addr_type */
+ if (key->type > 0x08)
return mgmt_cmd_status(sk, hdev->id,
MGMT_OP_LOAD_LINK_KEYS,
MGMT_STATUS_INVALID_PARAMS);
for (i = 0; i < irk_count; i++) {
struct mgmt_irk_info *irk = &cp->irks[i];
+ u8 addr_type = le_addr_type(irk->addr.type);
if (hci_is_blocked_key(hdev,
HCI_BLOCKED_KEY_TYPE_IRK,
continue;
}
+ /* When using SMP over BR/EDR, the addr type should be set to BREDR */
+ if (irk->addr.type == BDADDR_BREDR)
+ addr_type = BDADDR_BREDR;
+
hci_add_irk(hdev, &irk->addr.bdaddr,
- le_addr_type(irk->addr.type), irk->val,
+ addr_type, irk->val,
BDADDR_ANY);
}
for (i = 0; i < key_count; i++) {
struct mgmt_ltk_info *key = &cp->keys[i];
u8 type, authenticated;
+ u8 addr_type = le_addr_type(key->addr.type);
if (hci_is_blocked_key(hdev,
HCI_BLOCKED_KEY_TYPE_LTK,
continue;
}
+ /* When using SMP over BR/EDR, the addr type should be set to BREDR */
+ if (key->addr.type == BDADDR_BREDR)
+ addr_type = BDADDR_BREDR;
+
hci_add_ltk(hdev, &key->addr.bdaddr,
- le_addr_type(key->addr.type), type, authenticated,
+ addr_type, type, authenticated,
key->val, key->enc_size, key->ediv, key->rand);
}
ev.store_hint = persistent;
bacpy(&ev.key.addr.bdaddr, &key->bdaddr);
- ev.key.addr.type = BDADDR_BREDR;
+ ev.key.addr.type = link_to_bdaddr(key->link_type, key->bdaddr_type);
ev.key.type = key->type;
memcpy(ev.key.val, key->val, HCI_LINK_KEY_SIZE);
ev.key.pin_len = key->pin_len;
ev.store_hint = persistent;
bacpy(&ev.key.addr.bdaddr, &key->bdaddr);
- ev.key.addr.type = link_to_bdaddr(LE_LINK, key->bdaddr_type);
+ ev.key.addr.type = link_to_bdaddr(key->link_type, key->bdaddr_type);
ev.key.type = mgmt_ltk_type(key);
ev.key.enc_size = key->enc_size;
ev.key.ediv = key->ediv;
bacpy(&ev.rpa, &irk->rpa);
bacpy(&ev.irk.addr.bdaddr, &irk->bdaddr);
- ev.irk.addr.type = link_to_bdaddr(LE_LINK, irk->addr_type);
+ ev.irk.addr.type = link_to_bdaddr(irk->link_type, irk->addr_type);
memcpy(ev.irk.val, irk->val, sizeof(irk->val));
mgmt_event(MGMT_EV_NEW_IRK, hdev, &ev, sizeof(ev), NULL);
ev.store_hint = persistent;
bacpy(&ev.key.addr.bdaddr, &csrk->bdaddr);
- ev.key.addr.type = link_to_bdaddr(LE_LINK, csrk->bdaddr_type);
+ ev.key.addr.type = link_to_bdaddr(csrk->link_type, csrk->bdaddr_type);
ev.key.type = csrk->type;
memcpy(ev.key.val, csrk->val, sizeof(csrk->val));
}
if (smp->remote_irk) {
+ smp->remote_irk->link_type = hcon->type;
mgmt_new_irk(hdev, smp->remote_irk, persistent);
/* Now that user space can be considered to know the
}
if (smp->csrk) {
+ smp->csrk->link_type = hcon->type;
smp->csrk->bdaddr_type = hcon->dst_type;
bacpy(&smp->csrk->bdaddr, &hcon->dst);
mgmt_new_csrk(hdev, smp->csrk, persistent);
}
if (smp->responder_csrk) {
+ smp->responder_csrk->link_type = hcon->type;
smp->responder_csrk->bdaddr_type = hcon->dst_type;
bacpy(&smp->responder_csrk->bdaddr, &hcon->dst);
mgmt_new_csrk(hdev, smp->responder_csrk, persistent);
}
if (smp->ltk) {
+ smp->ltk->link_type = hcon->type;
smp->ltk->bdaddr_type = hcon->dst_type;
bacpy(&smp->ltk->bdaddr, &hcon->dst);
mgmt_new_ltk(hdev, smp->ltk, persistent);
}
if (smp->responder_ltk) {
+ smp->responder_ltk->link_type = hcon->type;
smp->responder_ltk->bdaddr_type = hcon->dst_type;
bacpy(&smp->responder_ltk->bdaddr, &hcon->dst);
mgmt_new_ltk(hdev, smp->responder_ltk, persistent);
key = hci_add_link_key(hdev, smp->conn->hcon, &hcon->dst,
smp->link_key, type, 0, &persistent);
if (key) {
+ key->link_type = hcon->type;
+ key->bdaddr_type = hcon->dst_type;
mgmt_new_link_key(hdev, key, persistent);
/* Don't keep debug keys around if the relevant
int err = 0, i;
for (i = 0; i < fdmax; i++) {
- err = receive_fd_user(scm->fp->fp[i], cmsg_data + i, o_flags);
+ err = scm_recv_one_fd(scm->fp->fp[i], cmsg_data + i, o_flags);
if (err < 0)
break;
}
if (gso_segs > READ_ONCE(dev->gso_max_segs))
return features & ~NETIF_F_GSO_MASK;
+ if (unlikely(skb->len >= READ_ONCE(dev->gso_max_size)))
+ return features & ~NETIF_F_GSO_MASK;
+
if (!skb_shinfo(skb)->gso_type) {
skb_warn_bad_offload(skb);
return features & ~NETIF_F_GSO_MASK;
}
for (i = 0; i < fdmax; i++) {
- err = receive_fd_user(scm->fp->fp[i], cmsg_data + i, o_flags);
+ err = scm_recv_one_fd(scm->fp->fp[i], cmsg_data + i, o_flags);
if (err < 0)
break;
}
static void skb_extensions_init(void)
{
BUILD_BUG_ON(SKB_EXT_NUM >= 8);
+#if !IS_ENABLED(CONFIG_KCOV_INSTRUMENT_ALL)
BUILD_BUG_ON(skb_ext_total_length() > 255);
+#endif
skbuff_ext_cache = kmem_cache_create("skbuff_ext_cache",
SKB_EXT_ALIGN_VALUE * skb_ext_total_length(),
break;
case SO_TIMESTAMPING_OLD:
+ case SO_TIMESTAMPING_NEW:
lv = sizeof(v.timestamping);
- v.timestamping.flags = READ_ONCE(sk->sk_tsflags);
- v.timestamping.bind_phc = READ_ONCE(sk->sk_bind_phc);
+ /* For the later-added case SO_TIMESTAMPING_NEW: Be strict about only
+ * returning the flags when they were set through the same option.
+ * Don't change the beviour for the old case SO_TIMESTAMPING_OLD.
+ */
+ if (optname == SO_TIMESTAMPING_OLD || sock_flag(sk, SOCK_TSTAMP_NEW)) {
+ v.timestamping.flags = READ_ONCE(sk->sk_tsflags);
+ v.timestamping.bind_phc = READ_ONCE(sk->sk_bind_phc);
+ }
break;
case SO_RCVTIMEO_OLD:
sockc->mark = *(u32 *)CMSG_DATA(cmsg);
break;
case SO_TIMESTAMPING_OLD:
+ case SO_TIMESTAMPING_NEW:
if (cmsg->cmsg_len != CMSG_LEN(sizeof(u32)))
return -EINVAL;
{
if (sk_is_tcp(sk))
return (1 << sk->sk_state) & (TCPF_ESTABLISHED | TCPF_LISTEN);
+ if (sk_is_stream_unix(sk))
+ return (1 << sk->sk_state) & TCPF_ESTABLISHED;
return true;
}
remove_wait_queue(sk_sleep(sk), &wait);
sk->sk_write_pending--;
} while (!done);
- return 0;
+ return done < 0 ? done : 0;
}
EXPORT_SYMBOL(sk_stream_wait_connect);
static int
dns_resolver_preparse(struct key_preparsed_payload *prep)
{
- const struct dns_payload_header *bin;
struct user_key_payload *upayload;
unsigned long derrno;
int ret;
return -EINVAL;
if (data[0] == 0) {
+ const struct dns_server_list_v1_header *v1;
+
/* It may be a server list. */
- if (datalen <= sizeof(*bin))
+ if (datalen <= sizeof(*v1))
return -EINVAL;
- bin = (const struct dns_payload_header *)data;
- kenter("[%u,%u],%u", bin->content, bin->version, datalen);
- if (bin->content != DNS_PAYLOAD_IS_SERVER_LIST) {
+ v1 = (const struct dns_server_list_v1_header *)data;
+ kenter("[%u,%u],%u", v1->hdr.content, v1->hdr.version, datalen);
+ if (v1->hdr.content != DNS_PAYLOAD_IS_SERVER_LIST) {
pr_warn_ratelimited(
"dns_resolver: Unsupported content type (%u)\n",
- bin->content);
+ v1->hdr.content);
return -EINVAL;
}
- if (bin->version != 1) {
+ if (v1->hdr.version != 1) {
pr_warn_ratelimited(
"dns_resolver: Unsupported server list version (%u)\n",
- bin->version);
+ v1->hdr.version);
return -EINVAL;
}
+ if ((v1->status != DNS_LOOKUP_GOOD &&
+ v1->status != DNS_LOOKUP_GOOD_WITH_BAD)) {
+ if (prep->expiry == TIME64_MAX)
+ prep->expiry = ktime_get_real_seconds() + 1;
+ }
+
result_len = datalen;
goto store_result;
}
struct key_type key_type_dns_resolver = {
.name = "dns_resolver",
- .flags = KEY_TYPE_NET_DOMAIN,
+ .flags = KEY_TYPE_NET_DOMAIN | KEY_TYPE_INSTANT_REAP,
.preparse = dns_resolver_preparse,
.free_preparse = dns_resolver_free_preparse,
.instantiate = generic_key_instantiate,
if (unlikely(!pskb_may_pull(skb, total_pull)))
return NULL;
+ ifehdr = (struct ifeheadr *)(skb->data + skb->dev->hard_header_len);
skb_set_mac_header(skb, total_pull);
__skb_pull(skb, total_pull);
*metalen = ifehdrln - IFE_METAHDRLEN;
{
unsigned long copy_address = (unsigned long)zc->copybuf_address;
struct msghdr msg = {};
- struct iovec iov;
int err;
zc->length = 0;
if (copy_address != zc->copybuf_address)
return -EINVAL;
- err = import_single_range(ITER_DEST, (void __user *)copy_address,
- inq, &iov, &msg.msg_iter);
+ err = import_ubuf(ITER_DEST, (void __user *)copy_address, inq,
+ &msg.msg_iter);
if (err)
return err;
{
unsigned long copy_address = (unsigned long)zc->copybuf_address;
struct msghdr msg = {};
- struct iovec iov;
int err;
if (copy_address != zc->copybuf_address)
return -EINVAL;
- err = import_single_range(ITER_DEST, (void __user *)copy_address,
- copylen, &iov, &msg.msg_iter);
+ err = import_ubuf(ITER_DEST, (void __user *)copy_address, copylen,
+ &msg.msg_iter);
if (err)
return err;
err = skb_copy_datagram_msg(skb, *offset, &msg, copylen);
if (strcmp(cpool[i].alg, alg))
continue;
- if (kref_read(&cpool[i].kref) > 0)
- kref_get(&cpool[i].kref);
- else
+ /* pairs with tcp_sigpool_release() */
+ if (!kref_get_unless_zero(&cpool[i].kref))
kref_init(&cpool[i].kref);
ret = i;
goto out;
write_unlock_bh(&idev->lock);
- /* From RFC 4941:
- *
- * A temporary address is created only if this calculated Preferred
- * Lifetime is greater than REGEN_ADVANCE time units. In
- * particular, an implementation must not create a temporary address
- * with a zero Preferred Lifetime.
- *
- * Clamp the preferred lifetime to a minimum of regen_advance, unless
- * that would exceed valid_lft.
- *
+ /* A temporary address is created only if this calculated Preferred
+ * Lifetime is greater than REGEN_ADVANCE time units. In particular,
+ * an implementation must not create a temporary address with a zero
+ * Preferred Lifetime.
* Use age calculation as in addrconf_verify to avoid unnecessary
* temporary addresses being generated.
*/
age = (now - tmp_tstamp + ADDRCONF_TIMER_FUZZ_MINUS) / HZ;
- if (cfg.preferred_lft <= regen_advance + age)
- cfg.preferred_lft = regen_advance + age + 1;
- if (cfg.preferred_lft > cfg.valid_lft) {
+ if (cfg.preferred_lft <= regen_advance + age) {
in6_ifa_put(ifp);
in6_dev_put(idev);
ret = -1;
INIT_LIST_HEAD(&f6i->fib6_siblings);
refcount_set(&f6i->fib6_ref, 1);
- INIT_HLIST_NODE(&f6i->gc_link);
-
return f6i;
}
net->ipv6.fib6_null_entry);
table->tb6_root.fn_flags = RTN_ROOT | RTN_TL_ROOT | RTN_RTINFO;
inet_peer_base_init(&table->tb6_peers);
- INIT_HLIST_HEAD(&table->tb6_gc_hlist);
}
return table;
lockdep_is_held(&table->tb6_lock));
}
}
-
- fib6_clean_expires_locked(rt);
}
/*
if (!(iter->fib6_flags & RTF_EXPIRES))
return -EEXIST;
if (!(rt->fib6_flags & RTF_EXPIRES))
- fib6_clean_expires_locked(iter);
+ fib6_clean_expires(iter);
else
- fib6_set_expires_locked(iter,
- rt->expires);
+ fib6_set_expires(iter, rt->expires);
if (rt->fib6_pmtu)
fib6_metric_set(iter, RTAX_MTU,
if (rt->nh)
list_add(&rt->nh_list, &rt->nh->f6i_list);
__fib6_update_sernum_upto_root(rt, fib6_new_sernum(info->nl_net));
-
- if (fib6_has_expires(rt))
- hlist_add_head(&rt->gc_link, &table->tb6_gc_hlist);
-
fib6_start_gc(info->nl_net, rt);
}
* Garbage collection
*/
-static int fib6_age(struct fib6_info *rt, struct fib6_gc_args *gc_args)
+static int fib6_age(struct fib6_info *rt, void *arg)
{
+ struct fib6_gc_args *gc_args = arg;
unsigned long now = jiffies;
/*
* Routes are expired even if they are in use.
*/
- if (fib6_has_expires(rt) && rt->expires) {
+ if (rt->fib6_flags & RTF_EXPIRES && rt->expires) {
if (time_after(now, rt->expires)) {
RT6_TRACE("expiring %p\n", rt);
return -1;
return 0;
}
-static void fib6_gc_table(struct net *net,
- struct fib6_table *tb6,
- struct fib6_gc_args *gc_args)
-{
- struct fib6_info *rt;
- struct hlist_node *n;
- struct nl_info info = {
- .nl_net = net,
- .skip_notify = false,
- };
-
- hlist_for_each_entry_safe(rt, n, &tb6->tb6_gc_hlist, gc_link)
- if (fib6_age(rt, gc_args) == -1)
- fib6_del(rt, &info);
-}
-
-static void fib6_gc_all(struct net *net, struct fib6_gc_args *gc_args)
-{
- struct fib6_table *table;
- struct hlist_head *head;
- unsigned int h;
-
- rcu_read_lock();
- for (h = 0; h < FIB6_TABLE_HASHSZ; h++) {
- head = &net->ipv6.fib_table_hash[h];
- hlist_for_each_entry_rcu(table, head, tb6_hlist) {
- spin_lock_bh(&table->tb6_lock);
- fib6_gc_table(net, table, gc_args);
- spin_unlock_bh(&table->tb6_lock);
- }
- }
- rcu_read_unlock();
-}
-
void fib6_run_gc(unsigned long expires, struct net *net, bool force)
{
struct fib6_gc_args gc_args;
net->ipv6.sysctl.ip6_rt_gc_interval;
gc_args.more = 0;
- fib6_gc_all(net, &gc_args);
+ fib6_clean_all(net, fib6_age, &gc_args);
now = jiffies;
net->ipv6.ip6_rt_last_gc = now;
rt->dst_nocount = true;
if (cfg->fc_flags & RTF_EXPIRES)
- fib6_set_expires_locked(rt, jiffies +
- clock_t_to_jiffies(cfg->fc_expires));
+ fib6_set_expires(rt, jiffies +
+ clock_t_to_jiffies(cfg->fc_expires));
else
- fib6_clean_expires_locked(rt);
+ fib6_clean_expires(rt);
if (cfg->fc_protocol == RTPROT_UNSPEC)
cfg->fc_protocol = RTPROT_BOOT;
lockdep_is_held(&local->hw.wiphy->mtx));
/*
- * If there are no changes, then accept a link that doesn't exist,
+ * If there are no changes, then accept a link that exist,
* unless it's a new link.
*/
- if (params->link_id < 0 && !new_link &&
+ if (params->link_id >= 0 && !new_link &&
!params->link_mac && !params->txpwr_set &&
!params->supported_rates_len &&
!params->ht_capa && !params->vht_capa &&
{
ieee80211_debugfs_remove_netdev(sdata);
ieee80211_debugfs_add_netdev(sdata, mld_vif);
- drv_vif_add_debugfs(sdata->local, sdata);
- if (!mld_vif)
- ieee80211_link_debugfs_drv_add(&sdata->deflink);
+
+ if (sdata->flags & IEEE80211_SDATA_IN_DRIVER) {
+ drv_vif_add_debugfs(sdata->local, sdata);
+ if (!mld_vif)
+ ieee80211_link_debugfs_drv_add(&sdata->deflink);
+ }
}
void ieee80211_link_debugfs_add(struct ieee80211_link_data *link)
// SPDX-License-Identifier: GPL-2.0-only
/*
* Copyright 2015 Intel Deutschland GmbH
- * Copyright (C) 2022 Intel Corporation
+ * Copyright (C) 2022-2023 Intel Corporation
*/
#include <net/mac80211.h>
#include "ieee80211_i.h"
if (ret)
return ret;
- sdata->flags |= IEEE80211_SDATA_IN_DRIVER;
+ if (!(sdata->flags & IEEE80211_SDATA_IN_DRIVER)) {
+ sdata->flags |= IEEE80211_SDATA_IN_DRIVER;
- if (!local->in_reconfig) {
drv_vif_add_debugfs(local, sdata);
/* initially vif is not MLD */
ieee80211_link_debugfs_drv_add(&sdata->deflink);
if (!check_sdata_in_driver(sdata))
return;
+ sdata->flags &= ~IEEE80211_SDATA_IN_DRIVER;
+
+ /* Remove driver debugfs entries */
+ ieee80211_debugfs_recreate_netdev(sdata, sdata->vif.valid_links);
+
trace_drv_remove_interface(local, sdata);
local->ops->remove_interface(&local->hw, &sdata->vif);
- sdata->flags &= ~IEEE80211_SDATA_IN_DRIVER;
trace_drv_return_void(local);
}
if (ret)
return ret;
- if (!local->in_reconfig) {
+ if (!local->in_reconfig && !local->resuming) {
for_each_set_bit(link_id, &links_to_add,
IEEE80211_MLD_MAX_NUM_LINKS) {
link = rcu_access_pointer(sdata->link[link_id]);
if (ret)
return ret;
+ /* during reconfig don't add it to debugfs again */
+ if (local->in_reconfig || local->resuming)
+ return 0;
+
for_each_set_bit(link_id, &links_to_add, IEEE80211_MLD_MAX_NUM_LINKS) {
link_sta = rcu_dereference_protected(info->link[link_id],
lockdep_is_held(&local->hw.wiphy->mtx));
case WLAN_SP_MESH_PEERING_OPEN:
if (!matches_local)
event = OPN_RJCT;
- if (!mesh_plink_free_count(sdata) ||
- (sta->mesh->plid && sta->mesh->plid != plid))
+ else if (!mesh_plink_free_count(sdata) ||
+ (sta->mesh->plid && sta->mesh->plid != plid))
event = OPN_IGNR;
else
event = OPN_ACPT;
case WLAN_SP_MESH_PEERING_CONFIRM:
if (!matches_local)
event = CNF_RJCT;
- if (!mesh_plink_free_count(sdata) ||
- sta->mesh->llid != llid ||
- (sta->mesh->plid && sta->mesh->plid != plid))
+ else if (!mesh_plink_free_count(sdata) ||
+ sta->mesh->llid != llid ||
+ (sta->mesh->plid && sta->mesh->plid != plid))
event = CNF_IGNR;
else
event = CNF_ACPT;
return;
}
elems = ieee802_11_parse_elems(baseaddr, len - baselen, true, NULL);
- mesh_process_plink_frame(sdata, mgmt, elems, rx_status);
- kfree(elems);
+ if (elems) {
+ mesh_process_plink_frame(sdata, mgmt, elems, rx_status);
+ kfree(elems);
+ }
}
{
const struct ieee80211_multi_link_elem *ml;
const struct element *sub;
- size_t ml_len;
+ ssize_t ml_len;
unsigned long removed_links = 0;
u16 link_removal_timeout[IEEE80211_MLD_MAX_NUM_LINKS] = {};
u8 link_id;
elems->scratch + elems->scratch_len -
elems->scratch_pos,
WLAN_EID_FRAGMENT);
+ if (ml_len < 0)
+ return;
elems->ml_reconf = (const void *)elems->scratch_pos;
elems->ml_reconf_len = ml_len;
kunit_test_suite(mptcp_crypto_suite);
MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("KUnit tests for MPTCP Crypto");
if (__test_and_clear_bit(MPTCP_CLEAN_UNA, &msk->cb_flags))
__mptcp_clean_una_wakeup(sk);
if (unlikely(msk->cb_flags)) {
- /* be sure to set the current sk state before taking actions
+ /* be sure to sync the msk state before taking actions
* depending on sk_state (MPTCP_ERROR_REPORT)
* On sk release avoid actions depending on the first subflow
*/
- if (__test_and_clear_bit(MPTCP_CONNECTED, &msk->cb_flags) && msk->first)
- __mptcp_set_connected(sk);
+ if (__test_and_clear_bit(MPTCP_SYNC_STATE, &msk->cb_flags) && msk->first)
+ __mptcp_sync_state(sk, msk->pending_state);
if (__test_and_clear_bit(MPTCP_ERROR_REPORT, &msk->cb_flags))
__mptcp_error_report(sk);
if (__test_and_clear_bit(MPTCP_SYNC_SNDBUF, &msk->cb_flags))
#define MPTCP_ERROR_REPORT 3
#define MPTCP_RETRANSMIT 4
#define MPTCP_FLUSH_JOIN_LIST 5
-#define MPTCP_CONNECTED 6
+#define MPTCP_SYNC_STATE 6
#define MPTCP_SYNC_SNDBUF 7
struct mptcp_skb_cb {
bool use_64bit_ack; /* Set when we received a 64-bit DSN */
bool csum_enabled;
bool allow_infinite_fallback;
+ u8 pending_state; /* A subflow asked to set this sk_state,
+ * protected by the msk data lock
+ */
u8 mpc_endpoint_id;
u8 recvmsg_inq:1,
cork:1,
struct mptcp_options_received *mp_opt);
void mptcp_finish_connect(struct sock *sk);
-void __mptcp_set_connected(struct sock *sk);
+void __mptcp_sync_state(struct sock *sk, int state);
void mptcp_reset_tout_timer(struct mptcp_sock *msk, unsigned long fail_tout);
static inline void mptcp_stop_tout_timer(struct sock *sk)
{
struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk);
- return sk->sk_state == TCP_ESTABLISHED &&
+ return (1 << sk->sk_state) & (TCPF_ESTABLISHED | TCPF_FIN_WAIT1) &&
is_active_ssk(subflow) &&
!subflow->conn_finished;
}
return inet_sk(sk)->inet_dport != inet_sk((struct sock *)msk)->inet_dport;
}
-void __mptcp_set_connected(struct sock *sk)
+void __mptcp_sync_state(struct sock *sk, int state)
{
- __mptcp_propagate_sndbuf(sk, mptcp_sk(sk)->first);
+ struct mptcp_sock *msk = mptcp_sk(sk);
+
+ __mptcp_propagate_sndbuf(sk, msk->first);
if (sk->sk_state == TCP_SYN_SENT) {
- inet_sk_state_store(sk, TCP_ESTABLISHED);
+ inet_sk_state_store(sk, state);
sk->sk_state_change(sk);
}
}
-static void mptcp_set_connected(struct sock *sk)
+static void mptcp_propagate_state(struct sock *sk, struct sock *ssk)
{
+ struct mptcp_sock *msk = mptcp_sk(sk);
+
mptcp_data_lock(sk);
- if (!sock_owned_by_user(sk))
- __mptcp_set_connected(sk);
- else
- __set_bit(MPTCP_CONNECTED, &mptcp_sk(sk)->cb_flags);
+ if (!sock_owned_by_user(sk)) {
+ __mptcp_sync_state(sk, ssk->sk_state);
+ } else {
+ msk->pending_state = ssk->sk_state;
+ __set_bit(MPTCP_SYNC_STATE, &msk->cb_flags);
+ }
mptcp_data_unlock(sk);
}
subflow_set_remote_key(msk, subflow, &mp_opt);
MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_MPCAPABLEACTIVEACK);
mptcp_finish_connect(sk);
- mptcp_set_connected(parent);
+ mptcp_propagate_state(parent, sk);
} else if (subflow->request_join) {
u8 hmac[SHA256_DIGEST_SIZE];
} else if (mptcp_check_fallback(sk)) {
fallback:
mptcp_rcv_space_init(msk, sk);
- mptcp_set_connected(parent);
+ mptcp_propagate_state(parent, sk);
}
return;
mptcp_rcv_space_init(msk, sk);
pr_fallback(msk);
subflow->conn_finished = 1;
- mptcp_set_connected(parent);
+ mptcp_propagate_state(parent, sk);
}
/* as recvmsg() does not acquire the subflow socket for ssk selection
tcp_release_cb(ssk);
}
+static int tcp_abort_override(struct sock *ssk, int err)
+{
+ /* closing a listener subflow requires a great deal of care.
+ * keep it simple and just prevent such operation
+ */
+ if (inet_sk_state_load(ssk) == TCP_LISTEN)
+ return -EINVAL;
+
+ return tcp_abort(ssk, err);
+}
+
static struct tcp_ulp_ops subflow_ulp_ops __read_mostly = {
.name = "mptcp",
.owner = THIS_MODULE,
tcp_prot_override = tcp_prot;
tcp_prot_override.release_cb = tcp_release_cb_override;
+ tcp_prot_override.diag_destroy = tcp_abort_override;
#if IS_ENABLED(CONFIG_MPTCP_IPV6)
/* In struct mptcp_subflow_request_sock, we assume the TCP request sock
tcpv6_prot_override = tcpv6_prot;
tcpv6_prot_override.release_cb = tcp_release_cb_override;
+ tcpv6_prot_override.diag_destroy = tcp_abort_override;
#endif
mptcp_diag_subflow_init(&subflow_ulp_ops);
kunit_test_suite(mptcp_token_suite);
MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("KUnit tests for MPTCP Token");
}
err = nf_nat_packet(ct, ctinfo, hooknum, skb);
+out:
if (err == NF_ACCEPT)
*action |= BIT(maniptype);
-out:
+
return err;
}
list_for_each_entry_safe(set, next, set_update_list, pending_update) {
list_del_init(&set->pending_update);
- if (!set->ops->commit)
+ if (!set->ops->commit || set->dead)
continue;
set->ops->commit(set);
else {
if (!(pkt->flags & NFT_PKTINFO_L4PROTO))
return false;
- ptr = skb_network_header(skb) + nft_thoff(pkt);
+ ptr = skb->data + nft_thoff(pkt);
}
ptr += priv->offset;
case NFT_GOTO:
err = nf_tables_bind_chain(ctx, chain);
if (err < 0)
- return err;
+ goto err1;
break;
default:
break;
static struct nfc_llcp_local *nfc_llcp_local_get(struct nfc_llcp_local *local)
{
+ /* Since using nfc_llcp_local may result in usage of nfc_dev, whenever
+ * we hold a reference to local, we also need to hold a reference to
+ * the device to avoid UAF.
+ */
+ if (!nfc_get_device(local->dev->idx))
+ return NULL;
+
kref_get(&local->ref);
return local;
int nfc_llcp_local_put(struct nfc_llcp_local *local)
{
+ struct nfc_dev *dev;
+ int ret;
+
if (local == NULL)
return 0;
- return kref_put(&local->ref, local_release);
+ dev = local->dev;
+
+ ret = kref_put(&local->ref, local_release);
+ nfc_put_device(dev);
+
+ return ret;
}
static struct nfc_llcp_sock *nfc_llcp_sock_get(struct nfc_llcp_local *local,
}
new_sock = nfc_llcp_sock(new_sk);
- new_sock->dev = local->dev;
+
new_sock->local = nfc_llcp_local_get(local);
+ if (!new_sock->local) {
+ reason = LLCP_DM_REJ;
+ sock_put(&new_sock->sk);
+ release_sock(&sock->sk);
+ sock_put(&sock->sk);
+ goto fail;
+ }
+
+ new_sock->dev = local->dev;
new_sock->rw = sock->rw;
new_sock->miux = sock->miux;
new_sock->nfc_protocol = sock->nfc_protocol;
if (local == NULL)
return -ENOMEM;
- local->dev = ndev;
+ /* As we are going to initialize local's refcount, we need to get the
+ * nfc_dev to avoid UAF, otherwise there is no point in continuing.
+ * See nfc_llcp_local_get().
+ */
+ local->dev = nfc_get_device(ndev->idx);
+ if (!local->dev) {
+ kfree(local);
+ return -ENODEV;
+ }
+
INIT_LIST_HEAD(&local->list);
kref_init(&local->ref);
mutex_init(&local->sdp_lock);
}
if (sk->sk_type == SOCK_DGRAM) {
+ if (sk->sk_state != LLCP_BOUND) {
+ release_sock(sk);
+ return -ENOTCONN;
+ }
+
DECLARE_SOCKADDR(struct sockaddr_nfc_llcp *, addr,
msg->msg_name);
if (!node)
return -ENOENT;
- return server_del(node, port, true);
+ server_del(node, port, true);
+
+ return 0;
}
static int ctrl_cmd_new_lookup(struct sockaddr_qrtr *from,
return -EINVAL;
}
+ ret = gpiod_direction_output(rfkill->reset_gpio, true);
+ if (ret)
+ return ret;
+
+ ret = gpiod_direction_output(rfkill->shutdown_gpio, true);
+ if (ret)
+ return ret;
+
rfkill->rfkill_dev = rfkill_alloc(rfkill->name, &pdev->dev,
rfkill->type, &rfkill_gpio_ops,
rfkill);
*/
static void rose_kill_by_device(struct net_device *dev)
{
- struct sock *s;
+ struct sock *sk, *array[16];
+ struct rose_sock *rose;
+ bool rescan;
+ int i, cnt;
+start:
+ rescan = false;
+ cnt = 0;
spin_lock_bh(&rose_list_lock);
- sk_for_each(s, &rose_list) {
- struct rose_sock *rose = rose_sk(s);
+ sk_for_each(sk, &rose_list) {
+ rose = rose_sk(sk);
+ if (rose->device == dev) {
+ if (cnt == ARRAY_SIZE(array)) {
+ rescan = true;
+ break;
+ }
+ sock_hold(sk);
+ array[cnt++] = sk;
+ }
+ }
+ spin_unlock_bh(&rose_list_lock);
+ for (i = 0; i < cnt; i++) {
+ sk = array[cnt];
+ rose = rose_sk(sk);
+ lock_sock(sk);
+ spin_lock_bh(&rose_list_lock);
if (rose->device == dev) {
- rose_disconnect(s, ENETUNREACH, ROSE_OUT_OF_ORDER, 0);
+ rose_disconnect(sk, ENETUNREACH, ROSE_OUT_OF_ORDER, 0);
if (rose->neighbour)
rose->neighbour->use--;
netdev_put(rose->device, &rose->dev_tracker);
rose->device = NULL;
}
+ spin_unlock_bh(&rose_list_lock);
+ release_sock(sk);
+ sock_put(sk);
+ cond_resched();
}
- spin_unlock_bh(&rose_list_lock);
+ if (rescan)
+ goto start;
}
/*
break;
}
+ spin_lock_bh(&rose_list_lock);
netdev_put(rose->device, &rose->dev_tracker);
+ rose->device = NULL;
+ spin_unlock_bh(&rose_list_lock);
sock->sk = NULL;
release_sock(sk);
sock_put(sk);
static void em_text_destroy(struct tcf_ematch *m)
{
- if (EM_TEXT_PRIV(m) && EM_TEXT_PRIV(m)->config)
+ if (EM_TEXT_PRIV(m) && EM_TEXT_PRIV(m)->config) {
textsearch_destroy(EM_TEXT_PRIV(m)->config);
+ kfree(EM_TEXT_PRIV(m));
+ }
}
static int em_text_dump(struct sk_buff *skb, struct tcf_ematch *m)
.lnk[0].link_id = link->link_id,
};
- memcpy(linfo.lnk[0].ibname,
- smc->conn.lgr->lnk[0].smcibdev->ibdev->name,
+ memcpy(linfo.lnk[0].ibname, link->smcibdev->ibdev->name,
sizeof(link->smcibdev->ibdev->name));
smc_gid_be16_convert(linfo.lnk[0].gid, link->gid);
smc_gid_be16_convert(linfo.lnk[0].peer_gid, link->peer_gid);
{
struct sockaddr_storage *save_addr = (struct sockaddr_storage *)msg->msg_name;
struct sockaddr_storage address;
+ int save_len = msg->msg_namelen;
int ret;
if (msg->msg_name) {
ret = __sock_sendmsg(sock, msg);
msg->msg_name = save_addr;
+ msg->msg_namelen = save_len;
return ret;
}
struct sockaddr_storage address;
int err;
struct msghdr msg;
- struct iovec iov;
int fput_needed;
- err = import_single_range(ITER_SOURCE, buff, len, &iov, &msg.msg_iter);
+ err = import_ubuf(ITER_SOURCE, buff, len, &msg.msg_iter);
if (unlikely(err))
return err;
sock = sockfd_lookup_light(fd, &err, &fput_needed);
.msg_name = addr ? (struct sockaddr *)&address : NULL,
};
struct socket *sock;
- struct iovec iov;
int err, err2;
int fput_needed;
- err = import_single_range(ITER_DEST, ubuf, size, &iov, &msg.msg_iter);
+ err = import_ubuf(ITER_DEST, ubuf, size, &msg.msg_iter);
if (unlikely(err))
return err;
sock = sockfd_lookup_light(fd, &err, &fput_needed);
}
for (filled = 0; filled < pages; filled = ret) {
- ret = alloc_pages_bulk_array_node(GFP_KERNEL,
- rqstp->rq_pool->sp_id,
- pages, rqstp->rq_pages);
+ ret = alloc_pages_bulk_array(GFP_KERNEL, pages,
+ rqstp->rq_pages);
if (ret > filled)
/* Made progress, don't sleep yet */
continue;
--- /dev/null
+/* Chen-Yu Tsai's regdb certificate */
+0x30, 0x82, 0x02, 0xa7, 0x30, 0x82, 0x01, 0x8f,
+0x02, 0x14, 0x61, 0xc0, 0x38, 0x65, 0x1a, 0xab,
+0xdc, 0xf9, 0x4b, 0xd0, 0xac, 0x7f, 0xf0, 0x6c,
+0x72, 0x48, 0xdb, 0x18, 0xc6, 0x00, 0x30, 0x0d,
+0x06, 0x09, 0x2a, 0x86, 0x48, 0x86, 0xf7, 0x0d,
+0x01, 0x01, 0x0b, 0x05, 0x00, 0x30, 0x0f, 0x31,
+0x0d, 0x30, 0x0b, 0x06, 0x03, 0x55, 0x04, 0x03,
+0x0c, 0x04, 0x77, 0x65, 0x6e, 0x73, 0x30, 0x20,
+0x17, 0x0d, 0x32, 0x33, 0x31, 0x32, 0x30, 0x31,
+0x30, 0x37, 0x34, 0x31, 0x31, 0x34, 0x5a, 0x18,
+0x0f, 0x32, 0x31, 0x32, 0x33, 0x31, 0x31, 0x30,
+0x37, 0x30, 0x37, 0x34, 0x31, 0x31, 0x34, 0x5a,
+0x30, 0x0f, 0x31, 0x0d, 0x30, 0x0b, 0x06, 0x03,
+0x55, 0x04, 0x03, 0x0c, 0x04, 0x77, 0x65, 0x6e,
+0x73, 0x30, 0x82, 0x01, 0x22, 0x30, 0x0d, 0x06,
+0x09, 0x2a, 0x86, 0x48, 0x86, 0xf7, 0x0d, 0x01,
+0x01, 0x01, 0x05, 0x00, 0x03, 0x82, 0x01, 0x0f,
+0x00, 0x30, 0x82, 0x01, 0x0a, 0x02, 0x82, 0x01,
+0x01, 0x00, 0xa9, 0x7a, 0x2c, 0x78, 0x4d, 0xa7,
+0x19, 0x2d, 0x32, 0x52, 0xa0, 0x2e, 0x6c, 0xef,
+0x88, 0x7f, 0x15, 0xc5, 0xb6, 0x69, 0x54, 0x16,
+0x43, 0x14, 0x79, 0x53, 0xb7, 0xae, 0x88, 0xfe,
+0xc0, 0xb7, 0x5d, 0x47, 0x8e, 0x1a, 0xe1, 0xef,
+0xb3, 0x90, 0x86, 0xda, 0xd3, 0x64, 0x81, 0x1f,
+0xce, 0x5d, 0x9e, 0x4b, 0x6e, 0x58, 0x02, 0x3e,
+0xb2, 0x6f, 0x5e, 0x42, 0x47, 0x41, 0xf4, 0x2c,
+0xb8, 0xa8, 0xd4, 0xaa, 0xc0, 0x0e, 0xe6, 0x48,
+0xf0, 0xa8, 0xce, 0xcb, 0x08, 0xae, 0x37, 0xaf,
+0xf6, 0x40, 0x39, 0xcb, 0x55, 0x6f, 0x5b, 0x4f,
+0x85, 0x34, 0xe6, 0x69, 0x10, 0x50, 0x72, 0x5e,
+0x4e, 0x9d, 0x4c, 0xba, 0x38, 0x36, 0x0d, 0xce,
+0x73, 0x38, 0xd7, 0x27, 0x02, 0x2a, 0x79, 0x03,
+0xe1, 0xac, 0xcf, 0xb0, 0x27, 0x85, 0x86, 0x93,
+0x17, 0xab, 0xec, 0x42, 0x77, 0x37, 0x65, 0x8a,
+0x44, 0xcb, 0xd6, 0x42, 0x93, 0x92, 0x13, 0xe3,
+0x39, 0x45, 0xc5, 0x6e, 0x00, 0x4a, 0x7f, 0xcb,
+0x42, 0x17, 0x2b, 0x25, 0x8c, 0xb8, 0x17, 0x3b,
+0x15, 0x36, 0x59, 0xde, 0x42, 0xce, 0x21, 0xe6,
+0xb6, 0xc7, 0x6e, 0x5e, 0x26, 0x1f, 0xf7, 0x8a,
+0x57, 0x9e, 0xa5, 0x96, 0x72, 0xb7, 0x02, 0x32,
+0xeb, 0x07, 0x2b, 0x73, 0xe2, 0x4f, 0x66, 0x58,
+0x9a, 0xeb, 0x0f, 0x07, 0xb6, 0xab, 0x50, 0x8b,
+0xc3, 0x8f, 0x17, 0xfa, 0x0a, 0x99, 0xc2, 0x16,
+0x25, 0xbf, 0x2d, 0x6b, 0x1a, 0xaa, 0xe6, 0x3e,
+0x5f, 0xeb, 0x6d, 0x9b, 0x5d, 0x4d, 0x42, 0x83,
+0x2d, 0x39, 0xb8, 0xc9, 0xac, 0xdb, 0x3a, 0x91,
+0x50, 0xdf, 0xbb, 0xb1, 0x76, 0x6d, 0x15, 0x73,
+0xfd, 0xc6, 0xe6, 0x6b, 0x71, 0x9e, 0x67, 0x36,
+0x22, 0x83, 0x79, 0xb1, 0xd6, 0xb8, 0x84, 0x52,
+0xaf, 0x96, 0x5b, 0xc3, 0x63, 0x02, 0x4e, 0x78,
+0x70, 0x57, 0x02, 0x03, 0x01, 0x00, 0x01, 0x30,
+0x0d, 0x06, 0x09, 0x2a, 0x86, 0x48, 0x86, 0xf7,
+0x0d, 0x01, 0x01, 0x0b, 0x05, 0x00, 0x03, 0x82,
+0x01, 0x01, 0x00, 0x24, 0x28, 0xee, 0x22, 0x74,
+0x7f, 0x7c, 0xfa, 0x6c, 0x1f, 0xb3, 0x18, 0xd1,
+0xc2, 0x3d, 0x7d, 0x29, 0x42, 0x88, 0xad, 0x82,
+0xa5, 0xb1, 0x8a, 0x05, 0xd0, 0xec, 0x5c, 0x91,
+0x20, 0xf6, 0x82, 0xfd, 0xd5, 0x67, 0x60, 0x5f,
+0x31, 0xf5, 0xbd, 0x88, 0x91, 0x70, 0xbd, 0xb8,
+0xb9, 0x8c, 0x88, 0xfe, 0x53, 0xc9, 0x54, 0x9b,
+0x43, 0xc4, 0x7a, 0x43, 0x74, 0x6b, 0xdd, 0xb0,
+0xb1, 0x3b, 0x33, 0x45, 0x46, 0x78, 0xa3, 0x1c,
+0xef, 0x54, 0x68, 0xf7, 0x85, 0x9c, 0xe4, 0x51,
+0x6f, 0x06, 0xaf, 0x81, 0xdb, 0x2a, 0x7b, 0x7b,
+0x6f, 0xa8, 0x9c, 0x67, 0xd8, 0xcb, 0xc9, 0x91,
+0x40, 0x00, 0xae, 0xd9, 0xa1, 0x9f, 0xdd, 0xa6,
+0x43, 0x0e, 0x28, 0x7b, 0xaa, 0x1b, 0xe9, 0x84,
+0xdb, 0x76, 0x64, 0x42, 0x70, 0xc9, 0xc0, 0xeb,
+0xae, 0x84, 0x11, 0x16, 0x68, 0x4e, 0x84, 0x9e,
+0x7e, 0x92, 0x36, 0xee, 0x1c, 0x3b, 0x08, 0x63,
+0xeb, 0x79, 0x84, 0x15, 0x08, 0x9d, 0xaf, 0xc8,
+0x9a, 0xc7, 0x34, 0xd3, 0x94, 0x4b, 0xd1, 0x28,
+0x97, 0xbe, 0xd1, 0x45, 0x75, 0xdc, 0x35, 0x62,
+0xac, 0x1d, 0x1f, 0xb7, 0xb7, 0x15, 0x87, 0xc8,
+0x98, 0xc0, 0x24, 0x31, 0x56, 0x8d, 0xed, 0xdb,
+0x06, 0xc6, 0x46, 0xbf, 0x4b, 0x6d, 0xa6, 0xd5,
+0xab, 0xcc, 0x60, 0xfc, 0xe5, 0x37, 0xb6, 0x53,
+0x7d, 0x58, 0x95, 0xa9, 0x56, 0xc7, 0xf7, 0xee,
+0xc3, 0xa0, 0x76, 0xf7, 0x65, 0x4d, 0x53, 0xfa,
+0xff, 0x5f, 0x76, 0x33, 0x5a, 0x08, 0xfa, 0x86,
+0x92, 0x5a, 0x13, 0xfa, 0x1a, 0xfc, 0xf2, 0x1b,
+0x8c, 0x7f, 0x42, 0x6d, 0xb7, 0x7e, 0xb7, 0xb4,
+0xf0, 0xc7, 0x83, 0xbb, 0xa2, 0x81, 0x03, 0x2d,
+0xd4, 0x2a, 0x63, 0x3f, 0xf7, 0x31, 0x2e, 0x40,
+0x33, 0x5c, 0x46, 0xbc, 0x9b, 0xc1, 0x05, 0xa5,
+0x45, 0x4e, 0xc3,
if (is_msi(mdev_state)) {
if (mdev_state->msi_evtfd)
- eventfd_signal(mdev_state->msi_evtfd, 1);
+ eventfd_signal(mdev_state->msi_evtfd);
} else if (is_intx(mdev_state)) {
if (mdev_state->intx_evtfd && !mdev_state->intx_mask) {
- eventfd_signal(mdev_state->intx_evtfd, 1);
+ eventfd_signal(mdev_state->intx_evtfd);
mdev_state->intx_mask = true;
}
}
# Some architectures create .build-id symlinks
ifneq ($(filter arm sparc x86, $(SRCARCH)),)
-link := $(install-dir)/.build-id/$$(shell $(READELF) -n $$(src) | sed -n 's@^.*Build ID: \(..\)\(.*\)@\1/\2@p')
+link := $(install-dir)/.build-id/$$(shell $(READELF) -n $$(src) | sed -n 's@^.*Build ID: \(..\)\(.*\)@\1/\2@p').debug
__default: $$(link)
$$(link): $$(dest) FORCE
args = parser.parse_args()
return (args.log_level,
- os.path.abspath(args.directory),
+ os.path.realpath(args.directory),
args.output,
args.ar,
args.paths if len(args.paths) > 0 else [args.directory])
# by Make, so this code replaces the escaped version with '#'.
prefix = command_prefix.replace('\#', '#').replace('$(pound)', '#')
- # Use os.path.abspath() to normalize the path resolving '.' and '..' .
- abs_path = os.path.abspath(os.path.join(root_directory, file_path))
+ # Return the canonical path, eliminating any symbolic links encountered in the path.
+ abs_path = os.path.realpath(os.path.join(root_directory, file_path))
if not os.path.exists(abs_path):
raise ValueError('File %s not found' % abs_path)
return {
use Cwd;
use File::Find;
use File::Spec::Functions;
+use open qw(:std :encoding(UTF-8));
my $cur_path = fastgetcwd() . '/';
my $lk_path = "./";
my $text = do { local($/) ; <$f> };
close($f);
- my @poss_addr = $text =~ m$[A-Za-zÀ-ÿ\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
+ my @poss_addr = $text =~ m$[\p{L}\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
push(@file_emails, clean_file_emails(@poss_addr));
}
}
return 0;
}
+sub escape_name {
+ my ($name) = @_;
+
+ if ($name =~ /[^\w \-]/ai) { ##has "must quote" chars
+ $name =~ s/(?<!\\)"/\\"/g; ##escape quotes
+ $name = "\"$name\"";
+ }
+
+ return $name;
+}
+
sub parse_email {
my ($formatted_email) = @_;
$name =~ s/^\s+|\s+$//g;
$name =~ s/^\"|\"$//g;
+ $name = escape_name($name);
$address =~ s/^\s+|\s+$//g;
- if ($name =~ /[^\w \-]/i) { ##has "must quote" chars
- $name =~ s/(?<!\\)"/\\"/g; ##escape quotes
- $name = "\"$name\"";
- }
-
return ($name, $address);
}
$name =~ s/^\s+|\s+$//g;
$name =~ s/^\"|\"$//g;
+ $name = escape_name($name);
$address =~ s/^\s+|\s+$//g;
- if ($name =~ /[^\w \-]/i) { ##has "must quote" chars
- $name =~ s/(?<!\\)"/\\"/g; ##escape quotes
- $name = "\"$name\"";
- }
-
if ($usename) {
if ("$name" eq "") {
$formatted_email = "$address";
foreach my $email (@file_emails) {
$email =~ s/[\(\<\{]{0,1}([A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+)[\)\>\}]{0,1}/\<$1\>/g;
my ($name, $address) = parse_email($email);
- if ($name eq '"[,\.]"') {
- $name = "";
- }
- my @nw = split(/[^A-Za-zÀ-ÿ\'\,\.\+-]/, $name);
+ # Strip quotes for easier processing, format_email will add them back
+ $name =~ s/^"(.*)"$/$1/;
+
+ # Split into name-like parts and remove stray punctuation particles
+ my @nw = split(/[^\p{L}\'\,\.\+-]/, $name);
+ @nw = grep(!/^[\'\,\.\+-]$/, @nw);
+
+ # Make a best effort to extract the name, and only the name, by taking
+ # only the last two names, or in the case of obvious initials, the last
+ # three names.
if (@nw > 2) {
my $first = $nw[@nw - 3];
my $middle = $nw[@nw - 2];
my $last = $nw[@nw - 1];
- if (((length($first) == 1 && $first =~ m/[A-Za-z]/) ||
+ if (((length($first) == 1 && $first =~ m/\p{L}/) ||
(length($first) == 2 && substr($first, -1) eq ".")) ||
(length($middle) == 1 ||
(length($middle) == 2 && substr($middle, -1) eq "."))) {
} else {
$name = "$middle $last";
}
+ } else {
+ $name = "@nw";
}
if (substr($name, -1) =~ /[,\.]/) {
$name = substr($name, 0, length($name) - 1);
- } elsif (substr($name, -2) =~ /[,\.]"/) {
- $name = substr($name, 0, length($name) - 2) . '"';
}
if (substr($name, 0, 1) =~ /[,\.]/) {
$name = substr($name, 1, length($name) - 1);
- } elsif (substr($name, 0, 2) =~ /"[,\.]/) {
- $name = '"' . substr($name, 2, length($name) - 2);
}
my $fmt_email = format_email($name, $address, $email_usename);
static struct aa_sfs_entry aa_sfs_entry_mount[] = {
AA_SFS_FILE_STRING("mask", "mount umount pivot_root"),
+ AA_SFS_FILE_STRING("move_mount", "detached"),
{ }
};
error = -ENOMEM;
if (!to_buffer || !from_buffer)
goto out;
+
+ if (!our_mnt(from_path->mnt))
+ /* moving a mount detached from the namespace */
+ from_path = NULL;
error = fn_for_each_confined(label, profile,
match_mnt(subj_cred, profile, to_path, to_buffer,
from_path, from_buffer,
}
}
+/*
+ * Set the expiration time on a key.
+ */
+void key_set_expiry(struct key *key, time64_t expiry)
+{
+ key->expiry = expiry;
+ if (expiry != TIME64_MAX) {
+ if (!(key->type->flags & KEY_TYPE_INSTANT_REAP))
+ expiry += key_gc_delay;
+ key_schedule_gc(expiry);
+ }
+}
+
/*
* Schedule a dead links collection run.
*/
static u8 gc_state; /* Internal persistent state */
#define KEY_GC_REAP_AGAIN 0x01 /* - Need another cycle */
#define KEY_GC_REAPING_LINKS 0x02 /* - We need to reap links */
-#define KEY_GC_SET_TIMER 0x04 /* - We need to restart the timer */
#define KEY_GC_REAPING_DEAD_1 0x10 /* - We need to mark dead keys */
#define KEY_GC_REAPING_DEAD_2 0x20 /* - We need to reap dead key links */
#define KEY_GC_REAPING_DEAD_3 0x40 /* - We need to reap dead keys */
struct rb_node *cursor;
struct key *key;
- time64_t new_timer, limit;
+ time64_t new_timer, limit, expiry;
kenter("[%lx,%x]", key_gc_flags, gc_state);
limit = ktime_get_real_seconds();
- if (limit > key_gc_delay)
- limit -= key_gc_delay;
- else
- limit = key_gc_delay;
/* Work out what we're going to be doing in this pass */
gc_state &= KEY_GC_REAPING_DEAD_1 | KEY_GC_REAPING_DEAD_2;
gc_state <<= 1;
if (test_and_clear_bit(KEY_GC_KEY_EXPIRED, &key_gc_flags))
- gc_state |= KEY_GC_REAPING_LINKS | KEY_GC_SET_TIMER;
+ gc_state |= KEY_GC_REAPING_LINKS;
if (test_and_clear_bit(KEY_GC_REAP_KEYTYPE, &key_gc_flags))
gc_state |= KEY_GC_REAPING_DEAD_1;
}
}
- if (gc_state & KEY_GC_SET_TIMER) {
- if (key->expiry > limit && key->expiry < new_timer) {
+ expiry = key->expiry;
+ if (expiry != TIME64_MAX) {
+ if (!(key->type->flags & KEY_TYPE_INSTANT_REAP))
+ expiry += key_gc_delay;
+ if (expiry > limit && expiry < new_timer) {
kdebug("will expire %x in %lld",
key_serial(key), key->expiry - limit);
new_timer = key->expiry;
*/
kdebug("pass complete");
- if (gc_state & KEY_GC_SET_TIMER && new_timer != (time64_t)TIME64_MAX) {
+ if (new_timer != TIME64_MAX) {
new_timer += key_gc_delay;
key_schedule_gc(new_timer);
}
extern void keyring_gc(struct key *keyring, time64_t limit);
extern void keyring_restriction_gc(struct key *keyring,
struct key_type *dead_type);
+void key_set_expiry(struct key *key, time64_t expiry);
extern void key_schedule_gc(time64_t gc_at);
extern void key_schedule_gc_links(void);
extern void key_gc_keytype(struct key_type *ktype);
*/
static inline bool key_is_dead(const struct key *key, time64_t limit)
{
+ time64_t expiry = key->expiry;
+
+ if (expiry != TIME64_MAX) {
+ if (!(key->type->flags & KEY_TYPE_INSTANT_REAP))
+ expiry += key_gc_delay;
+ if (expiry <= limit)
+ return true;
+ }
+
return
key->flags & ((1 << KEY_FLAG_DEAD) |
(1 << KEY_FLAG_INVALIDATED)) ||
- (key->expiry > 0 && key->expiry <= limit) ||
key->domain_tag->removed;
}
key->uid = uid;
key->gid = gid;
key->perm = perm;
+ key->expiry = TIME64_MAX;
key->restrict_link = restrict_link;
key->last_used_at = ktime_get_real_seconds();
if (authkey)
key_invalidate(authkey);
- if (prep->expiry != TIME64_MAX) {
- key->expiry = prep->expiry;
- key_schedule_gc(prep->expiry + key_gc_delay);
- }
+ key_set_expiry(key, prep->expiry);
}
}
atomic_inc(&key->user->nikeys);
mark_key_instantiated(key, -error);
notify_key(key, NOTIFY_KEY_INSTANTIATED, -error);
- key->expiry = ktime_get_real_seconds() + timeout;
- key_schedule_gc(key->expiry + key_gc_delay);
+ key_set_expiry(key, ktime_get_real_seconds() + timeout);
if (test_and_clear_bit(KEY_FLAG_USER_CONSTRUCT, &key->flags))
awaken = 1;
void key_set_timeout(struct key *key, unsigned timeout)
{
- time64_t expiry = 0;
+ time64_t expiry = TIME64_MAX;
/* make the changes with the locks held to prevent races */
down_write(&key->sem);
if (timeout > 0)
expiry = ktime_get_real_seconds() + timeout;
-
- key->expiry = expiry;
- key_schedule_gc(key->expiry + key_gc_delay);
+ key_set_expiry(key, expiry);
up_write(&key->sem);
}
key_serial_t ringid)
{
if (_payload && plen) {
- struct iovec iov;
struct iov_iter from;
int ret;
- ret = import_single_range(ITER_SOURCE, (void __user *)_payload, plen,
- &iov, &from);
+ ret = import_ubuf(ITER_SOURCE, (void __user *)_payload, plen,
+ &from);
if (unlikely(ret))
return ret;
/* come up with a suitable timeout value */
expiry = READ_ONCE(key->expiry);
- if (expiry == 0) {
+ if (expiry == TIME64_MAX) {
memcpy(xbuf, "perm", 5);
} else if (now >= expiry) {
memcpy(xbuf, "expd", 5);
*/
int security_file_permission(struct file *file, int mask)
{
- int ret;
-
- ret = call_int_hook(file_permission, 0, file, mask);
- if (ret)
- return ret;
-
- return fsnotify_perm(file, mask);
+ return call_int_hook(file_permission, 0, file, mask);
}
/**
if (ret)
return ret;
- return fsnotify_perm(file, MAY_OPEN);
+ return fsnotify_open_perm(file);
}
/**
if (cs35l41_safe_reset(cs35l41->regmap, cs35l41->hw_cfg.bst_type))
gpiod_set_value_cansleep(cs35l41->reset_gpio, 0);
gpiod_put(cs35l41->reset_gpio);
+ gpiod_put(cs35l41->cs_gpio);
acpi_dev_put(cs35l41->dacpi);
kfree(cs35l41->acpi_subsystem_id);
if (cs35l41_safe_reset(cs35l41->regmap, cs35l41->hw_cfg.bst_type))
gpiod_set_value_cansleep(cs35l41->reset_gpio, 0);
gpiod_put(cs35l41->reset_gpio);
+ gpiod_put(cs35l41->cs_gpio);
kfree(cs35l41->acpi_subsystem_id);
}
EXPORT_SYMBOL_NS_GPL(cs35l41_hda_remove, SND_HDA_SCODEC_CS35L41);
} __packed;
enum cs35l41_hda_spk_pos {
- CS35l41_LEFT,
- CS35l41_RIGHT,
+ CS35L41_LEFT,
+ CS35L41_RIGHT,
};
enum cs35l41_hda_gpio_function {
struct device *dev;
struct regmap *regmap;
struct gpio_desc *reset_gpio;
+ struct gpio_desc *cs_gpio;
struct cs35l41_hw_cfg hw_cfg;
struct hda_codec *codec;
//
// Author: Stefan Binding <sbinding@opensource.cirrus.com>
+#include <linux/acpi.h>
#include <linux/gpio/consumer.h>
#include <linux/string.h>
#include "cs35l41_hda_property.h"
+#include <linux/spi/spi.h>
+
+#define MAX_AMPS 4
+
+struct cs35l41_config {
+ const char *ssid;
+ enum {
+ SPI,
+ I2C
+ } bus;
+ int num_amps;
+ enum {
+ INTERNAL,
+ EXTERNAL
+ } boost_type;
+ u8 channel[MAX_AMPS];
+ int reset_gpio_index; /* -1 if no reset gpio */
+ int spkid_gpio_index; /* -1 if no spkid gpio */
+ int cs_gpio_index; /* -1 if no cs gpio, or cs-gpios already exists, max num amps == 2 */
+ int boost_ind_nanohenry; /* Required if boost_type == Internal */
+ int boost_peak_milliamp; /* Required if boost_type == Internal */
+ int boost_cap_microfarad; /* Required if boost_type == Internal */
+};
+
+static const struct cs35l41_config cs35l41_config_table[] = {
+/*
+ * Device 103C89C6 does have _DSD, however it is setup to use the wrong boost type.
+ * We can override the _DSD to correct the boost type here.
+ * Since this laptop has valid ACPI, we do not need to handle cs-gpios, since that already exists
+ * in the ACPI. The Reset GPIO is also valid, so we can use the Reset defined in _DSD.
+ */
+ { "103C89C6", SPI, 2, INTERNAL, { CS35L41_RIGHT, CS35L41_LEFT, 0, 0 }, -1, -1, -1, 1000, 4500, 24 },
+ { "104312AF", SPI, 2, INTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 1, 2, 0, 1000, 4500, 24 },
+ { "10431433", I2C, 2, INTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 0, 1, -1, 1000, 4500, 24 },
+ { "10431463", I2C, 2, INTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 0, 1, -1, 1000, 4500, 24 },
+ { "10431473", SPI, 2, INTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 1, -1, 0, 1000, 4500, 24 },
+ { "10431483", SPI, 2, INTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 1, -1, 0, 1000, 4500, 24 },
+ { "10431493", SPI, 2, INTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 1, 2, 0, 1000, 4500, 24 },
+ { "104314D3", SPI, 2, INTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 1, 2, 0, 1000, 4500, 24 },
+ { "104314E3", I2C, 2, INTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 0, 1, -1, 1000, 4500, 24 },
+ { "10431503", I2C, 2, INTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 0, 1, -1, 1000, 4500, 24 },
+ { "10431533", I2C, 2, INTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 0, 1, -1, 1000, 4500, 24 },
+ { "10431573", SPI, 2, INTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 1, 2, 0, 1000, 4500, 24 },
+ { "10431663", SPI, 2, INTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 1, -1, 0, 1000, 4500, 24 },
+ { "104316D3", SPI, 2, EXTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 1, 2, 0, 0, 0, 0 },
+ { "104316F3", SPI, 2, EXTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 1, 2, 0, 0, 0, 0 },
+ { "104317F3", I2C, 2, INTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 0, 1, -1, 1000, 4500, 24 },
+ { "10431863", SPI, 2, INTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 1, 2, 0, 1000, 4500, 24 },
+ { "104318D3", I2C, 2, EXTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 0, 1, -1, 0, 0, 0 },
+ { "10431C9F", SPI, 2, INTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 1, 2, 0, 1000, 4500, 24 },
+ { "10431CAF", SPI, 2, INTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 1, 2, 0, 1000, 4500, 24 },
+ { "10431CCF", SPI, 2, INTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 1, 2, 0, 1000, 4500, 24 },
+ { "10431CDF", SPI, 2, INTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 1, 2, 0, 1000, 4500, 24 },
+ { "10431CEF", SPI, 2, INTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 1, 2, 0, 1000, 4500, 24 },
+ { "10431D1F", I2C, 2, INTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 0, 1, -1, 1000, 4500, 24 },
+ { "10431DA2", SPI, 2, EXTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 1, 2, 0, 0, 0, 0 },
+ { "10431E02", SPI, 2, EXTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 1, 2, 0, 0, 0, 0 },
+ { "10431EE2", I2C, 2, EXTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 0, -1, -1, 0, 0, 0 },
+ { "10431F12", I2C, 2, INTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 0, 1, -1, 1000, 4500, 24 },
+ { "10431F1F", SPI, 2, EXTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 1, -1, 0, 0, 0, 0 },
+ { "10431F62", SPI, 2, EXTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 1, 2, 0, 0, 0, 0 },
+ {}
+};
+
+static int cs35l41_add_gpios(struct cs35l41_hda *cs35l41, struct device *physdev, int reset_gpio,
+ int spkid_gpio, int cs_gpio_index, int num_amps)
+{
+ struct acpi_gpio_mapping *gpio_mapping = NULL;
+ struct acpi_gpio_params *reset_gpio_params = NULL;
+ struct acpi_gpio_params *spkid_gpio_params = NULL;
+ struct acpi_gpio_params *cs_gpio_params = NULL;
+ unsigned int num_entries = 0;
+ unsigned int reset_index, spkid_index, csgpio_index;
+ int i;
+
+ /*
+ * GPIO Mapping only needs to be done once, since it would be available for subsequent amps
+ */
+ if (cs35l41->dacpi->driver_gpios)
+ return 0;
+
+ if (reset_gpio >= 0) {
+ reset_index = num_entries;
+ num_entries++;
+ }
+
+ if (spkid_gpio >= 0) {
+ spkid_index = num_entries;
+ num_entries++;
+ }
+
+ if ((cs_gpio_index >= 0) && (num_amps == 2)) {
+ csgpio_index = num_entries;
+ num_entries++;
+ }
+
+ if (!num_entries)
+ return 0;
+
+ /* must include termination entry */
+ num_entries++;
+
+ gpio_mapping = devm_kcalloc(physdev, num_entries, sizeof(struct acpi_gpio_mapping),
+ GFP_KERNEL);
+
+ if (!gpio_mapping)
+ goto err;
+
+ if (reset_gpio >= 0) {
+ gpio_mapping[reset_index].name = "reset-gpios";
+ reset_gpio_params = devm_kcalloc(physdev, num_amps, sizeof(struct acpi_gpio_params),
+ GFP_KERNEL);
+ if (!reset_gpio_params)
+ goto err;
+
+ for (i = 0; i < num_amps; i++)
+ reset_gpio_params[i].crs_entry_index = reset_gpio;
+
+ gpio_mapping[reset_index].data = reset_gpio_params;
+ gpio_mapping[reset_index].size = num_amps;
+ }
+
+ if (spkid_gpio >= 0) {
+ gpio_mapping[spkid_index].name = "spk-id-gpios";
+ spkid_gpio_params = devm_kcalloc(physdev, num_amps, sizeof(struct acpi_gpio_params),
+ GFP_KERNEL);
+ if (!spkid_gpio_params)
+ goto err;
+
+ for (i = 0; i < num_amps; i++)
+ spkid_gpio_params[i].crs_entry_index = spkid_gpio;
+
+ gpio_mapping[spkid_index].data = spkid_gpio_params;
+ gpio_mapping[spkid_index].size = num_amps;
+ }
+
+ if ((cs_gpio_index >= 0) && (num_amps == 2)) {
+ gpio_mapping[csgpio_index].name = "cs-gpios";
+ /* only one GPIO CS is supported without using _DSD, obtained using index 0 */
+ cs_gpio_params = devm_kzalloc(physdev, sizeof(struct acpi_gpio_params), GFP_KERNEL);
+ if (!cs_gpio_params)
+ goto err;
+
+ cs_gpio_params->crs_entry_index = cs_gpio_index;
+
+ gpio_mapping[csgpio_index].data = cs_gpio_params;
+ gpio_mapping[csgpio_index].size = 1;
+ }
+
+ return devm_acpi_dev_add_driver_gpios(physdev, gpio_mapping);
+err:
+ devm_kfree(physdev, gpio_mapping);
+ devm_kfree(physdev, reset_gpio_params);
+ devm_kfree(physdev, spkid_gpio_params);
+ devm_kfree(physdev, cs_gpio_params);
+ return -ENOMEM;
+}
+
+static int generic_dsd_config(struct cs35l41_hda *cs35l41, struct device *physdev, int id,
+ const char *hid)
+{
+ struct cs35l41_hw_cfg *hw_cfg = &cs35l41->hw_cfg;
+ const struct cs35l41_config *cfg;
+ struct gpio_desc *cs_gpiod;
+ struct spi_device *spi;
+ bool dsd_found;
+ int ret;
+
+ for (cfg = cs35l41_config_table; cfg->ssid; cfg++) {
+ if (!strcasecmp(cfg->ssid, cs35l41->acpi_subsystem_id))
+ break;
+ }
+
+ if (!cfg->ssid)
+ return -ENOENT;
+
+ if (!cs35l41->dacpi || cs35l41->dacpi != ACPI_COMPANION(physdev)) {
+ dev_err(cs35l41->dev, "ACPI Device does not match, cannot override _DSD.\n");
+ return -ENODEV;
+ }
+
+ dev_info(cs35l41->dev, "Adding DSD properties for %s\n", cs35l41->acpi_subsystem_id);
+
+ dsd_found = acpi_dev_has_props(cs35l41->dacpi);
+
+ if (!dsd_found) {
+ ret = cs35l41_add_gpios(cs35l41, physdev, cfg->reset_gpio_index,
+ cfg->spkid_gpio_index, cfg->cs_gpio_index,
+ cfg->num_amps);
+ if (ret) {
+ dev_err(cs35l41->dev, "Error adding GPIO mapping: %d\n", ret);
+ return ret;
+ }
+ } else if (cfg->reset_gpio_index >= 0 || cfg->spkid_gpio_index >= 0) {
+ dev_warn(cs35l41->dev, "Cannot add Reset/Speaker ID/SPI CS GPIO Mapping, "
+ "_DSD already exists.\n");
+ }
+
+ if (cfg->bus == SPI) {
+ cs35l41->index = id;
+
+ /*
+ * Manually set the Chip Select for the second amp <cs_gpio_index> in the node.
+ * This is only supported for systems with 2 amps, since we cannot expand the
+ * default number of chip selects without using cs-gpios
+ * The CS GPIO must be set high prior to communicating with the first amp (which
+ * uses a native chip select), to ensure the second amp does not clash with the
+ * first.
+ */
+ if (IS_ENABLED(CONFIG_SPI) && cfg->cs_gpio_index >= 0) {
+ spi = to_spi_device(cs35l41->dev);
+
+ if (cfg->num_amps != 2) {
+ dev_warn(cs35l41->dev,
+ "Cannot update SPI CS, Number of Amps (%d) != 2\n",
+ cfg->num_amps);
+ } else if (dsd_found) {
+ dev_warn(cs35l41->dev,
+ "Cannot update SPI CS, _DSD already exists.\n");
+ } else {
+ /*
+ * This is obtained using driver_gpios, since only one GPIO for CS
+ * exists, this can be obtained using index 0.
+ */
+ cs_gpiod = gpiod_get_index(physdev, "cs", 0, GPIOD_OUT_LOW);
+ if (IS_ERR(cs_gpiod)) {
+ dev_err(cs35l41->dev,
+ "Unable to get Chip Select GPIO descriptor\n");
+ return PTR_ERR(cs_gpiod);
+ }
+ if (id == 1) {
+ spi_set_csgpiod(spi, 0, cs_gpiod);
+ cs35l41->cs_gpio = cs_gpiod;
+ } else {
+ gpiod_set_value_cansleep(cs_gpiod, true);
+ gpiod_put(cs_gpiod);
+ }
+ spi_setup(spi);
+ }
+ }
+ } else {
+ if (cfg->num_amps > 2)
+ /*
+ * i2c addresses for 3/4 amps are used in order: 0x40, 0x41, 0x42, 0x43,
+ * subtracting 0x40 would give zero-based index
+ */
+ cs35l41->index = id - 0x40;
+ else
+ /* i2c addr 0x40 for first amp (always), 0x41/0x42 for 2nd amp */
+ cs35l41->index = id == 0x40 ? 0 : 1;
+ }
+
+ if (cfg->num_amps == 3)
+ /* 3 amps means a center channel, so no duplicate channels */
+ cs35l41->channel_index = 0;
+ else
+ /*
+ * if 4 amps, there are duplicate channels, so they need different indexes
+ * if 2 amps, no duplicate channels, channel_index would be 0
+ */
+ cs35l41->channel_index = cs35l41->index / 2;
+
+ cs35l41->reset_gpio = fwnode_gpiod_get_index(acpi_fwnode_handle(cs35l41->dacpi), "reset",
+ cs35l41->index, GPIOD_OUT_LOW,
+ "cs35l41-reset");
+ cs35l41->speaker_id = cs35l41_get_speaker_id(physdev, cs35l41->index, cfg->num_amps, -1);
+
+ hw_cfg->spk_pos = cfg->channel[cs35l41->index];
+
+ if (cfg->boost_type == INTERNAL) {
+ hw_cfg->bst_type = CS35L41_INT_BOOST;
+ hw_cfg->bst_ind = cfg->boost_ind_nanohenry;
+ hw_cfg->bst_ipk = cfg->boost_peak_milliamp;
+ hw_cfg->bst_cap = cfg->boost_cap_microfarad;
+ hw_cfg->gpio1.func = CS35L41_NOT_USED;
+ hw_cfg->gpio1.valid = true;
+ } else {
+ hw_cfg->bst_type = CS35L41_EXT_BOOST;
+ hw_cfg->bst_ind = -1;
+ hw_cfg->bst_ipk = -1;
+ hw_cfg->bst_cap = -1;
+ hw_cfg->gpio1.func = CS35l41_VSPK_SWITCH;
+ hw_cfg->gpio1.valid = true;
+ }
+
+ hw_cfg->gpio2.func = CS35L41_INTERRUPT;
+ hw_cfg->gpio2.valid = true;
+ hw_cfg->valid = true;
+
+ return 0;
+}
/*
* Device CLSA010(0/1) doesn't have _DSD so a gpiod_get by the label reset won't work.
return 0;
}
-/*
- * Device 103C89C6 does have _DSD, however it is setup to use the wrong boost type.
- * We can override the _DSD to correct the boost type here.
- * Since this laptop has valid ACPI, we do not need to handle cs-gpios, since that already exists
- * in the ACPI.
- */
-static int hp_vision_acpi_fix(struct cs35l41_hda *cs35l41, struct device *physdev, int id,
- const char *hid)
-{
- struct cs35l41_hw_cfg *hw_cfg = &cs35l41->hw_cfg;
-
- dev_info(cs35l41->dev, "Adding DSD properties for %s\n", cs35l41->acpi_subsystem_id);
-
- cs35l41->index = id;
- cs35l41->channel_index = 0;
-
- /*
- * This system has _DSD, it just contains an error, so we can still get the reset using
- * the "reset" label.
- */
- cs35l41->reset_gpio = fwnode_gpiod_get_index(acpi_fwnode_handle(cs35l41->dacpi), "reset",
- cs35l41->index, GPIOD_OUT_LOW,
- "cs35l41-reset");
- cs35l41->speaker_id = -ENOENT;
- hw_cfg->spk_pos = cs35l41->index ? 0 : 1; // right:left
- hw_cfg->gpio1.func = CS35L41_NOT_USED;
- hw_cfg->gpio1.valid = true;
- hw_cfg->gpio2.func = CS35L41_INTERRUPT;
- hw_cfg->gpio2.valid = true;
- hw_cfg->bst_type = CS35L41_INT_BOOST;
- hw_cfg->bst_ind = 1000;
- hw_cfg->bst_ipk = 4500;
- hw_cfg->bst_cap = 24;
- hw_cfg->valid = true;
-
- return 0;
-}
-
struct cs35l41_prop_model {
const char *hid;
const char *ssid;
static const struct cs35l41_prop_model cs35l41_prop_model_table[] = {
{ "CLSA0100", NULL, lenovo_legion_no_acpi },
{ "CLSA0101", NULL, lenovo_legion_no_acpi },
- { "CSC3551", "103C89C6", hp_vision_acpi_fix },
+ { "CSC3551", "103C89C6", generic_dsd_config },
+ { "CSC3551", "104312AF", generic_dsd_config },
+ { "CSC3551", "10431433", generic_dsd_config },
+ { "CSC3551", "10431463", generic_dsd_config },
+ { "CSC3551", "10431473", generic_dsd_config },
+ { "CSC3551", "10431483", generic_dsd_config },
+ { "CSC3551", "10431493", generic_dsd_config },
+ { "CSC3551", "104314D3", generic_dsd_config },
+ { "CSC3551", "104314E3", generic_dsd_config },
+ { "CSC3551", "10431503", generic_dsd_config },
+ { "CSC3551", "10431533", generic_dsd_config },
+ { "CSC3551", "10431573", generic_dsd_config },
+ { "CSC3551", "10431663", generic_dsd_config },
+ { "CSC3551", "104316D3", generic_dsd_config },
+ { "CSC3551", "104316F3", generic_dsd_config },
+ { "CSC3551", "104317F3", generic_dsd_config },
+ { "CSC3551", "10431863", generic_dsd_config },
+ { "CSC3551", "104318D3", generic_dsd_config },
+ { "CSC3551", "10431C9F", generic_dsd_config },
+ { "CSC3551", "10431CAF", generic_dsd_config },
+ { "CSC3551", "10431CCF", generic_dsd_config },
+ { "CSC3551", "10431CDF", generic_dsd_config },
+ { "CSC3551", "10431CEF", generic_dsd_config },
+ { "CSC3551", "10431D1F", generic_dsd_config },
+ { "CSC3551", "10431DA2", generic_dsd_config },
+ { "CSC3551", "10431E02", generic_dsd_config },
+ { "CSC3551", "10431EE2", generic_dsd_config },
+ { "CSC3551", "10431F12", generic_dsd_config },
+ { "CSC3551", "10431F1F", generic_dsd_config },
+ { "CSC3551", "10431F62", generic_dsd_config },
{}
};
if (!strcmp(model->hid, hid) &&
(!model->ssid ||
(cs35l41->acpi_subsystem_id &&
- !strcmp(model->ssid, cs35l41->acpi_subsystem_id))))
+ !strcasecmp(model->ssid, cs35l41->acpi_subsystem_id))))
return model->add_prop(cs35l41, physdev, id, hid);
}
SND_PCI_QUIRK(0x103c, 0x84da, "HP OMEN dc0019-ur", ALC295_FIXUP_HP_OMEN),
SND_PCI_QUIRK(0x103c, 0x84e7, "HP Pavilion 15", ALC269_FIXUP_HP_MUTE_LED_MIC3),
SND_PCI_QUIRK(0x103c, 0x8519, "HP Spectre x360 15-df0xxx", ALC285_FIXUP_HP_SPECTRE_X360),
+ SND_PCI_QUIRK(0x103c, 0x8537, "HP ProBook 440 G6", ALC236_FIXUP_HP_MUTE_LED_MICMUTE_VREF),
SND_PCI_QUIRK(0x103c, 0x860f, "HP ZBook 15 G6", ALC285_FIXUP_HP_GPIO_AMP_INIT),
SND_PCI_QUIRK(0x103c, 0x861f, "HP Elite Dragonfly G1", ALC285_FIXUP_HP_GPIO_AMP_INIT),
SND_PCI_QUIRK(0x103c, 0x869d, "HP", ALC236_FIXUP_HP_MUTE_LED),
SND_PCI_QUIRK(0x103c, 0x89c6, "Zbook Fury 17 G9", ALC245_FIXUP_CS35L41_SPI_2_HP_GPIO_LED),
SND_PCI_QUIRK(0x103c, 0x89ca, "HP", ALC236_FIXUP_HP_MUTE_LED_MICMUTE_VREF),
SND_PCI_QUIRK(0x103c, 0x89d3, "HP EliteBook 645 G9 (MB 89D2)", ALC236_FIXUP_HP_MUTE_LED_MICMUTE_VREF),
+ SND_PCI_QUIRK(0x103c, 0x8a0f, "HP Pavilion 14-ec1xxx", ALC287_FIXUP_HP_GPIO_LED),
SND_PCI_QUIRK(0x103c, 0x8a20, "HP Laptop 15s-fq5xxx", ALC236_FIXUP_HP_MUTE_LED_COEFBIT2),
SND_PCI_QUIRK(0x103c, 0x8a25, "HP Victus 16-d1xxx (MB 8A25)", ALC245_FIXUP_HP_MUTE_LED_COEFBIT),
SND_PCI_QUIRK(0x103c, 0x8a78, "HP Dev One", ALC285_FIXUP_HP_LIMIT_INT_MIC_BOOST),
SND_PCI_QUIRK(0x103c, 0x8c70, "HP EliteBook 835 G11", ALC287_FIXUP_CS35L41_I2C_2_HP_GPIO_LED),
SND_PCI_QUIRK(0x103c, 0x8c71, "HP EliteBook 845 G11", ALC287_FIXUP_CS35L41_I2C_2_HP_GPIO_LED),
SND_PCI_QUIRK(0x103c, 0x8c72, "HP EliteBook 865 G11", ALC287_FIXUP_CS35L41_I2C_2_HP_GPIO_LED),
+ SND_PCI_QUIRK(0x103c, 0x8c96, "HP", ALC236_FIXUP_HP_MUTE_LED_MICMUTE_VREF),
SND_PCI_QUIRK(0x103c, 0x8ca4, "HP ZBook Fury", ALC245_FIXUP_CS35L41_SPI_2_HP_GPIO_LED),
SND_PCI_QUIRK(0x103c, 0x8ca7, "HP ZBook Fury", ALC245_FIXUP_CS35L41_SPI_2_HP_GPIO_LED),
SND_PCI_QUIRK(0x103c, 0x8cf5, "HP ZBook Studio 16", ALC245_FIXUP_CS35L41_SPI_4_HP_GPIO_LED),
SND_PCI_QUIRK(0x1043, 0x1313, "Asus K42JZ", ALC269VB_FIXUP_ASUS_MIC_NO_PRESENCE),
SND_PCI_QUIRK(0x1043, 0x13b0, "ASUS Z550SA", ALC256_FIXUP_ASUS_MIC),
SND_PCI_QUIRK(0x1043, 0x1427, "Asus Zenbook UX31E", ALC269VB_FIXUP_ASUS_ZENBOOK),
- SND_PCI_QUIRK(0x1043, 0x1433, "ASUS GX650P", ALC285_FIXUP_ASUS_I2C_HEADSET_MIC),
- SND_PCI_QUIRK(0x1043, 0x1463, "Asus GA402X", ALC285_FIXUP_ASUS_I2C_HEADSET_MIC),
- SND_PCI_QUIRK(0x1043, 0x1473, "ASUS GU604V", ALC285_FIXUP_ASUS_HEADSET_MIC),
- SND_PCI_QUIRK(0x1043, 0x1483, "ASUS GU603V", ALC285_FIXUP_ASUS_HEADSET_MIC),
- SND_PCI_QUIRK(0x1043, 0x1493, "ASUS GV601V", ALC285_FIXUP_ASUS_HEADSET_MIC),
+ SND_PCI_QUIRK(0x1043, 0x1433, "ASUS GX650PY/PZ/PV/PU/PYV/PZV/PIV/PVV", ALC285_FIXUP_ASUS_I2C_HEADSET_MIC),
+ SND_PCI_QUIRK(0x1043, 0x1463, "Asus GA402X/GA402N", ALC285_FIXUP_ASUS_I2C_HEADSET_MIC),
+ SND_PCI_QUIRK(0x1043, 0x1473, "ASUS GU604VI/VC/VE/VG/VJ/VQ/VU/VV/VY/VZ", ALC285_FIXUP_ASUS_HEADSET_MIC),
+ SND_PCI_QUIRK(0x1043, 0x1483, "ASUS GU603VQ/VU/VV/VJ/VI", ALC285_FIXUP_ASUS_HEADSET_MIC),
+ SND_PCI_QUIRK(0x1043, 0x1493, "ASUS GV601VV/VU/VJ/VQ/VI", ALC285_FIXUP_ASUS_HEADSET_MIC),
+ SND_PCI_QUIRK(0x1043, 0x14d3, "ASUS G614JY/JZ/JG", ALC245_FIXUP_CS35L41_SPI_2),
+ SND_PCI_QUIRK(0x1043, 0x14e3, "ASUS G513PI/PU/PV", ALC287_FIXUP_CS35L41_I2C_2),
+ SND_PCI_QUIRK(0x1043, 0x1503, "ASUS G733PY/PZ/PZV/PYV", ALC287_FIXUP_CS35L41_I2C_2),
SND_PCI_QUIRK(0x1043, 0x1517, "Asus Zenbook UX31A", ALC269VB_FIXUP_ASUS_ZENBOOK_UX31A),
- SND_PCI_QUIRK(0x1043, 0x1573, "ASUS GZ301V", ALC285_FIXUP_ASUS_HEADSET_MIC),
+ SND_PCI_QUIRK(0x1043, 0x1533, "ASUS GV302XA/XJ/XQ/XU/XV/XI", ALC287_FIXUP_CS35L41_I2C_2),
+ SND_PCI_QUIRK(0x1043, 0x1573, "ASUS GZ301VV/VQ/VU/VJ/VA/VC/VE/VVC/VQC/VUC/VJC/VEC/VCC", ALC285_FIXUP_ASUS_HEADSET_MIC),
SND_PCI_QUIRK(0x1043, 0x1662, "ASUS GV301QH", ALC294_FIXUP_ASUS_DUAL_SPK),
- SND_PCI_QUIRK(0x1043, 0x1663, "ASUS GU603ZV", ALC285_FIXUP_ASUS_HEADSET_MIC),
+ SND_PCI_QUIRK(0x1043, 0x1663, "ASUS GU603ZI/ZJ/ZQ/ZU/ZV", ALC285_FIXUP_ASUS_HEADSET_MIC),
SND_PCI_QUIRK(0x1043, 0x1683, "ASUS UM3402YAR", ALC287_FIXUP_CS35L41_I2C_2),
SND_PCI_QUIRK(0x1043, 0x16b2, "ASUS GU603", ALC289_FIXUP_ASUS_GA401),
+ SND_PCI_QUIRK(0x1043, 0x16d3, "ASUS UX5304VA", ALC245_FIXUP_CS35L41_SPI_2),
SND_PCI_QUIRK(0x1043, 0x16e3, "ASUS UX50", ALC269_FIXUP_STEREO_DMIC),
+ SND_PCI_QUIRK(0x1043, 0x16f3, "ASUS UX7602VI/BZ", ALC245_FIXUP_CS35L41_SPI_2),
SND_PCI_QUIRK(0x1043, 0x1740, "ASUS UX430UA", ALC295_FIXUP_ASUS_DACS),
SND_PCI_QUIRK(0x1043, 0x17d1, "ASUS UX431FL", ALC294_FIXUP_ASUS_DUAL_SPK),
- SND_PCI_QUIRK(0x1043, 0x17f3, "ROG Ally RC71L_RC71L", ALC294_FIXUP_ASUS_ALLY),
+ SND_PCI_QUIRK(0x1043, 0x17f3, "ROG Ally NR2301L/X", ALC294_FIXUP_ASUS_ALLY),
+ SND_PCI_QUIRK(0x1043, 0x1863, "ASUS UX6404VI/VV", ALC245_FIXUP_CS35L41_SPI_2),
SND_PCI_QUIRK(0x1043, 0x1881, "ASUS Zephyrus S/M", ALC294_FIXUP_ASUS_GX502_PINS),
SND_PCI_QUIRK(0x1043, 0x18b1, "Asus MJ401TA", ALC256_FIXUP_ASUS_HEADSET_MIC),
SND_PCI_QUIRK(0x1043, 0x18d3, "ASUS UM3504DA", ALC294_FIXUP_CS35L41_I2C_2),
SND_PCI_QUIRK(0x1043, 0x1c43, "ASUS UX8406MA", ALC245_FIXUP_CS35L41_SPI_2),
SND_PCI_QUIRK(0x1043, 0x1c62, "ASUS GU603", ALC289_FIXUP_ASUS_GA401),
SND_PCI_QUIRK(0x1043, 0x1c92, "ASUS ROG Strix G15", ALC285_FIXUP_ASUS_G533Z_PINS),
- SND_PCI_QUIRK(0x1043, 0x1c9f, "ASUS G614JI", ALC285_FIXUP_ASUS_HEADSET_MIC),
- SND_PCI_QUIRK(0x1043, 0x1caf, "ASUS G634JYR/JZR", ALC285_FIXUP_ASUS_SPI_REAR_SPEAKERS),
+ SND_PCI_QUIRK(0x1043, 0x1c9f, "ASUS G614JU/JV/JI", ALC285_FIXUP_ASUS_HEADSET_MIC),
+ SND_PCI_QUIRK(0x1043, 0x1caf, "ASUS G634JY/JZ/JI/JG", ALC285_FIXUP_ASUS_SPI_REAR_SPEAKERS),
SND_PCI_QUIRK(0x1043, 0x1ccd, "ASUS X555UB", ALC256_FIXUP_ASUS_MIC),
- SND_PCI_QUIRK(0x1043, 0x1d1f, "ASUS ROG Strix G17 2023 (G713PV)", ALC287_FIXUP_CS35L41_I2C_2),
+ SND_PCI_QUIRK(0x1043, 0x1ccf, "ASUS G814JU/JV/JI", ALC245_FIXUP_CS35L41_SPI_2),
+ SND_PCI_QUIRK(0x1043, 0x1cdf, "ASUS G814JY/JZ/JG", ALC245_FIXUP_CS35L41_SPI_2),
+ SND_PCI_QUIRK(0x1043, 0x1cef, "ASUS G834JY/JZ/JI/JG", ALC285_FIXUP_ASUS_HEADSET_MIC),
+ SND_PCI_QUIRK(0x1043, 0x1d1f, "ASUS G713PI/PU/PV/PVN", ALC287_FIXUP_CS35L41_I2C_2),
SND_PCI_QUIRK(0x1043, 0x1d42, "ASUS Zephyrus G14 2022", ALC289_FIXUP_ASUS_GA401),
SND_PCI_QUIRK(0x1043, 0x1d4e, "ASUS TM420", ALC256_FIXUP_ASUS_HPE),
+ SND_PCI_QUIRK(0x1043, 0x1da2, "ASUS UP6502ZA/ZD", ALC245_FIXUP_CS35L41_SPI_2),
SND_PCI_QUIRK(0x1043, 0x1e02, "ASUS UX3402ZA", ALC245_FIXUP_CS35L41_SPI_2),
SND_PCI_QUIRK(0x1043, 0x16a3, "ASUS UX3402VA", ALC245_FIXUP_CS35L41_SPI_2),
SND_PCI_QUIRK(0x1043, 0x1f62, "ASUS UX7602ZM", ALC245_FIXUP_CS35L41_SPI_2),
SND_PCI_QUIRK(0x1043, 0x1e11, "ASUS Zephyrus G15", ALC289_FIXUP_ASUS_GA502),
- SND_PCI_QUIRK(0x1043, 0x1e12, "ASUS UM3402", ALC287_FIXUP_CS35L41_I2C_2),
+ SND_PCI_QUIRK(0x1043, 0x1e12, "ASUS UM6702RA/RC", ALC287_FIXUP_CS35L41_I2C_2),
SND_PCI_QUIRK(0x1043, 0x1e51, "ASUS Zephyrus M15", ALC294_FIXUP_ASUS_GU502_PINS),
SND_PCI_QUIRK(0x1043, 0x1e5e, "ASUS ROG Strix G513", ALC294_FIXUP_ASUS_G513_PINS),
SND_PCI_QUIRK(0x1043, 0x1e8e, "ASUS Zephyrus G15", ALC289_FIXUP_ASUS_GA401),
+ SND_PCI_QUIRK(0x1043, 0x1ee2, "ASUS UM3402", ALC287_FIXUP_CS35L41_I2C_2),
SND_PCI_QUIRK(0x1043, 0x1c52, "ASUS Zephyrus G15 2022", ALC289_FIXUP_ASUS_GA401),
SND_PCI_QUIRK(0x1043, 0x1f11, "ASUS Zephyrus G14", ALC289_FIXUP_ASUS_GA401),
SND_PCI_QUIRK(0x1043, 0x1f12, "ASUS UM5302", ALC287_FIXUP_CS35L41_I2C_2),
+ SND_PCI_QUIRK(0x1043, 0x1f1f, "ASUS H7604JI/JV/J3D", ALC245_FIXUP_CS35L41_SPI_2),
+ SND_PCI_QUIRK(0x1043, 0x1f62, "ASUS UX7602ZM", ALC245_FIXUP_CS35L41_SPI_2),
SND_PCI_QUIRK(0x1043, 0x1f92, "ASUS ROG Flow X16", ALC289_FIXUP_ASUS_GA401),
SND_PCI_QUIRK(0x1043, 0x3030, "ASUS ZN270IE", ALC256_FIXUP_ASUS_AIO_GPIO2),
SND_PCI_QUIRK(0x1043, 0x3a20, "ASUS G614JZR", ALC245_FIXUP_CS35L41_SPI_2),
CALIB_MAX
};
+struct tas2781_hda {
+ struct device *dev;
+ struct tasdevice_priv *priv;
+ struct snd_kcontrol *dsp_prog_ctl;
+ struct snd_kcontrol *dsp_conf_ctl;
+ struct snd_kcontrol *prof_ctl;
+ struct snd_kcontrol *snd_ctls[3];
+};
+
static int tas2781_get_i2c_res(struct acpi_resource *ares, void *data)
{
struct tasdevice_priv *tas_priv = data;
static void tas2781_hda_playback_hook(struct device *dev, int action)
{
- struct tasdevice_priv *tas_priv = dev_get_drvdata(dev);
+ struct tas2781_hda *tas_hda = dev_get_drvdata(dev);
- dev_dbg(tas_priv->dev, "%s: action = %d\n", __func__, action);
+ dev_dbg(tas_hda->dev, "%s: action = %d\n", __func__, action);
switch (action) {
case HDA_GEN_PCM_ACT_OPEN:
pm_runtime_get_sync(dev);
- mutex_lock(&tas_priv->codec_lock);
- tasdevice_tuning_switch(tas_priv, 0);
- mutex_unlock(&tas_priv->codec_lock);
+ mutex_lock(&tas_hda->priv->codec_lock);
+ tasdevice_tuning_switch(tas_hda->priv, 0);
+ mutex_unlock(&tas_hda->priv->codec_lock);
break;
case HDA_GEN_PCM_ACT_CLOSE:
- mutex_lock(&tas_priv->codec_lock);
- tasdevice_tuning_switch(tas_priv, 1);
- mutex_unlock(&tas_priv->codec_lock);
+ mutex_lock(&tas_hda->priv->codec_lock);
+ tasdevice_tuning_switch(tas_hda->priv, 1);
+ mutex_unlock(&tas_hda->priv->codec_lock);
pm_runtime_mark_last_busy(dev);
pm_runtime_put_autosuspend(dev);
break;
default:
- dev_dbg(tas_priv->dev, "Playback action not supported: %d\n",
+ dev_dbg(tas_hda->dev, "Playback action not supported: %d\n",
action);
break;
}
}
}
-/* Update the calibrate data, including speaker impedance, f0, etc, into algo.
+/* Update the calibration data, including speaker impedance, f0, etc, into algo.
* Calibrate data is done by manufacturer in the factory. These data are used
- * by Algo for calucating the speaker temperature, speaker membrance excursion
+ * by Algo for calculating the speaker temperature, speaker membrane excursion
* and f0 in real time during playback.
*/
static int tas2781_save_calibration(struct tasdevice_priv *tas_priv)
return 0;
}
+static void tas2781_hda_remove_controls(struct tas2781_hda *tas_hda)
+{
+ struct hda_codec *codec = tas_hda->priv->codec;
+
+ if (tas_hda->dsp_prog_ctl)
+ snd_ctl_remove(codec->card, tas_hda->dsp_prog_ctl);
+
+ if (tas_hda->dsp_conf_ctl)
+ snd_ctl_remove(codec->card, tas_hda->dsp_conf_ctl);
+
+ for (int i = ARRAY_SIZE(tas_hda->snd_ctls) - 1; i >= 0; i--)
+ if (tas_hda->snd_ctls[i])
+ snd_ctl_remove(codec->card, tas_hda->snd_ctls[i]);
+
+ if (tas_hda->prof_ctl)
+ snd_ctl_remove(codec->card, tas_hda->prof_ctl);
+}
+
static void tasdev_fw_ready(const struct firmware *fmw, void *context)
{
struct tasdevice_priv *tas_priv = context;
+ struct tas2781_hda *tas_hda = dev_get_drvdata(tas_priv->dev);
struct hda_codec *codec = tas_priv->codec;
int i, ret;
if (ret)
goto out;
- ret = snd_ctl_add(codec->card,
- snd_ctl_new1(&tas2781_prof_ctrl, tas_priv));
+ tas_hda->prof_ctl = snd_ctl_new1(&tas2781_prof_ctrl, tas_priv);
+ ret = snd_ctl_add(codec->card, tas_hda->prof_ctl);
if (ret) {
dev_err(tas_priv->dev,
"Failed to add KControl %s = %d\n",
}
for (i = 0; i < ARRAY_SIZE(tas2781_snd_controls); i++) {
- ret = snd_ctl_add(codec->card,
- snd_ctl_new1(&tas2781_snd_controls[i], tas_priv));
+ tas_hda->snd_ctls[i] = snd_ctl_new1(&tas2781_snd_controls[i],
+ tas_priv);
+ ret = snd_ctl_add(codec->card, tas_hda->snd_ctls[i]);
if (ret) {
dev_err(tas_priv->dev,
"Failed to add KControl %s = %d\n",
goto out;
}
- ret = snd_ctl_add(codec->card,
- snd_ctl_new1(&tas2781_dsp_prog_ctrl, tas_priv));
+ tas_hda->dsp_prog_ctl = snd_ctl_new1(&tas2781_dsp_prog_ctrl,
+ tas_priv);
+ ret = snd_ctl_add(codec->card, tas_hda->dsp_prog_ctl);
if (ret) {
dev_err(tas_priv->dev,
"Failed to add KControl %s = %d\n",
goto out;
}
- ret = snd_ctl_add(codec->card,
- snd_ctl_new1(&tas2781_dsp_conf_ctrl, tas_priv));
+ tas_hda->dsp_conf_ctl = snd_ctl_new1(&tas2781_dsp_conf_ctrl,
+ tas_priv);
+ ret = snd_ctl_add(codec->card, tas_hda->dsp_conf_ctl);
if (ret) {
dev_err(tas_priv->dev,
"Failed to add KControl %s = %d\n",
tas_priv->fw_state = TASDEVICE_DSP_FW_ALL_OK;
tasdevice_prmg_load(tas_priv, 0);
+ if (tas_priv->fmw->nr_programs > 0)
+ tas_priv->cur_prog = 0;
+ if (tas_priv->fmw->nr_configurations > 0)
+ tas_priv->cur_conf = 0;
/* If calibrated data occurs error, dsp will still works with default
* calibrated data inside algo.
tas2781_save_calibration(tas_priv);
out:
- mutex_unlock(&tas_priv->codec_lock);
+ mutex_unlock(&tas_hda->priv->codec_lock);
if (fmw)
release_firmware(fmw);
- pm_runtime_mark_last_busy(tas_priv->dev);
- pm_runtime_put_autosuspend(tas_priv->dev);
+ pm_runtime_mark_last_busy(tas_hda->dev);
+ pm_runtime_put_autosuspend(tas_hda->dev);
}
static int tas2781_hda_bind(struct device *dev, struct device *master,
void *master_data)
{
- struct tasdevice_priv *tas_priv = dev_get_drvdata(dev);
+ struct tas2781_hda *tas_hda = dev_get_drvdata(dev);
struct hda_component *comps = master_data;
struct hda_codec *codec;
unsigned int subid;
int ret;
- if (!comps || tas_priv->index < 0 ||
- tas_priv->index >= HDA_MAX_COMPONENTS)
+ if (!comps || tas_hda->priv->index < 0 ||
+ tas_hda->priv->index >= HDA_MAX_COMPONENTS)
return -EINVAL;
- comps = &comps[tas_priv->index];
+ comps = &comps[tas_hda->priv->index];
if (comps->dev)
return -EBUSY;
switch (subid) {
case 0x17aa:
- tas_priv->catlog_id = LENOVO;
+ tas_hda->priv->catlog_id = LENOVO;
break;
default:
- tas_priv->catlog_id = OTHERS;
+ tas_hda->priv->catlog_id = OTHERS;
break;
}
strscpy(comps->name, dev_name(dev), sizeof(comps->name));
- ret = tascodec_init(tas_priv, codec, tasdev_fw_ready);
+ ret = tascodec_init(tas_hda->priv, codec, tasdev_fw_ready);
if (!ret)
comps->playback_hook = tas2781_hda_playback_hook;
static void tas2781_hda_unbind(struct device *dev,
struct device *master, void *master_data)
{
- struct tasdevice_priv *tas_priv = dev_get_drvdata(dev);
+ struct tas2781_hda *tas_hda = dev_get_drvdata(dev);
struct hda_component *comps = master_data;
- comps = &comps[tas_priv->index];
+ comps = &comps[tas_hda->priv->index];
if (comps->dev == dev) {
comps->dev = NULL;
comps->playback_hook = NULL;
}
- tasdevice_config_info_remove(tas_priv);
- tasdevice_dsp_remove(tas_priv);
+ tas2781_hda_remove_controls(tas_hda);
- tas_priv->fw_state = TASDEVICE_DSP_FW_PENDING;
+ tasdevice_config_info_remove(tas_hda->priv);
+ tasdevice_dsp_remove(tas_hda->priv);
+
+ tas_hda->priv->fw_state = TASDEVICE_DSP_FW_PENDING;
}
static const struct component_ops tas2781_hda_comp_ops = {
static void tas2781_hda_remove(struct device *dev)
{
- struct tasdevice_priv *tas_priv = dev_get_drvdata(dev);
+ struct tas2781_hda *tas_hda = dev_get_drvdata(dev);
- pm_runtime_get_sync(tas_priv->dev);
- pm_runtime_disable(tas_priv->dev);
+ pm_runtime_get_sync(tas_hda->dev);
+ pm_runtime_disable(tas_hda->dev);
- component_del(tas_priv->dev, &tas2781_hda_comp_ops);
+ component_del(tas_hda->dev, &tas2781_hda_comp_ops);
- pm_runtime_put_noidle(tas_priv->dev);
+ pm_runtime_put_noidle(tas_hda->dev);
- tasdevice_remove(tas_priv);
+ tasdevice_remove(tas_hda->priv);
}
static int tas2781_hda_i2c_probe(struct i2c_client *clt)
{
- struct tasdevice_priv *tas_priv;
+ struct tas2781_hda *tas_hda;
const char *device_name;
int ret;
else
return -ENODEV;
- tas_priv = tasdevice_kzalloc(clt);
- if (!tas_priv)
+ tas_hda = devm_kzalloc(&clt->dev, sizeof(*tas_hda), GFP_KERNEL);
+ if (!tas_hda)
return -ENOMEM;
- tas_priv->irq_info.irq = clt->irq;
- ret = tas2781_read_acpi(tas_priv, device_name);
+ dev_set_drvdata(&clt->dev, tas_hda);
+ tas_hda->dev = &clt->dev;
+
+ tas_hda->priv = tasdevice_kzalloc(clt);
+ if (!tas_hda->priv)
+ return -ENOMEM;
+
+ tas_hda->priv->irq_info.irq = clt->irq;
+ ret = tas2781_read_acpi(tas_hda->priv, device_name);
if (ret)
- return dev_err_probe(tas_priv->dev, ret,
+ return dev_err_probe(tas_hda->dev, ret,
"Platform not supported\n");
- ret = tasdevice_init(tas_priv);
+ ret = tasdevice_init(tas_hda->priv);
if (ret)
goto err;
- pm_runtime_set_autosuspend_delay(tas_priv->dev, 3000);
- pm_runtime_use_autosuspend(tas_priv->dev);
- pm_runtime_mark_last_busy(tas_priv->dev);
- pm_runtime_set_active(tas_priv->dev);
- pm_runtime_get_noresume(tas_priv->dev);
- pm_runtime_enable(tas_priv->dev);
+ pm_runtime_set_autosuspend_delay(tas_hda->dev, 3000);
+ pm_runtime_use_autosuspend(tas_hda->dev);
+ pm_runtime_mark_last_busy(tas_hda->dev);
+ pm_runtime_set_active(tas_hda->dev);
+ pm_runtime_get_noresume(tas_hda->dev);
+ pm_runtime_enable(tas_hda->dev);
- pm_runtime_put_autosuspend(tas_priv->dev);
+ pm_runtime_put_autosuspend(tas_hda->dev);
- tas2781_reset(tas_priv);
+ tas2781_reset(tas_hda->priv);
- ret = component_add(tas_priv->dev, &tas2781_hda_comp_ops);
+ ret = component_add(tas_hda->dev, &tas2781_hda_comp_ops);
if (ret) {
- dev_err(tas_priv->dev, "Register component failed: %d\n", ret);
- pm_runtime_disable(tas_priv->dev);
+ dev_err(tas_hda->dev, "Register component failed: %d\n", ret);
+ pm_runtime_disable(tas_hda->dev);
}
err:
static int tas2781_runtime_suspend(struct device *dev)
{
- struct tasdevice_priv *tas_priv = dev_get_drvdata(dev);
+ struct tas2781_hda *tas_hda = dev_get_drvdata(dev);
int i;
- dev_dbg(tas_priv->dev, "Runtime Suspend\n");
+ dev_dbg(tas_hda->dev, "Runtime Suspend\n");
- mutex_lock(&tas_priv->codec_lock);
+ mutex_lock(&tas_hda->priv->codec_lock);
- if (tas_priv->playback_started) {
- tasdevice_tuning_switch(tas_priv, 1);
- tas_priv->playback_started = false;
+ if (tas_hda->priv->playback_started) {
+ tasdevice_tuning_switch(tas_hda->priv, 1);
+ tas_hda->priv->playback_started = false;
}
- for (i = 0; i < tas_priv->ndev; i++) {
- tas_priv->tasdevice[i].cur_book = -1;
- tas_priv->tasdevice[i].cur_prog = -1;
- tas_priv->tasdevice[i].cur_conf = -1;
+ for (i = 0; i < tas_hda->priv->ndev; i++) {
+ tas_hda->priv->tasdevice[i].cur_book = -1;
+ tas_hda->priv->tasdevice[i].cur_prog = -1;
+ tas_hda->priv->tasdevice[i].cur_conf = -1;
}
- regcache_cache_only(tas_priv->regmap, true);
- regcache_mark_dirty(tas_priv->regmap);
-
- mutex_unlock(&tas_priv->codec_lock);
+ mutex_unlock(&tas_hda->priv->codec_lock);
return 0;
}
static int tas2781_runtime_resume(struct device *dev)
{
- struct tasdevice_priv *tas_priv = dev_get_drvdata(dev);
+ struct tas2781_hda *tas_hda = dev_get_drvdata(dev);
unsigned long calib_data_sz =
- tas_priv->ndev * TASDEVICE_SPEAKER_CALIBRATION_SIZE;
- int ret;
-
- dev_dbg(tas_priv->dev, "Runtime Resume\n");
+ tas_hda->priv->ndev * TASDEVICE_SPEAKER_CALIBRATION_SIZE;
- mutex_lock(&tas_priv->codec_lock);
+ dev_dbg(tas_hda->dev, "Runtime Resume\n");
- regcache_cache_only(tas_priv->regmap, false);
- ret = regcache_sync(tas_priv->regmap);
- if (ret) {
- dev_err(tas_priv->dev,
- "Failed to restore register cache: %d\n", ret);
- goto out;
- }
+ mutex_lock(&tas_hda->priv->codec_lock);
- tasdevice_prmg_load(tas_priv, tas_priv->cur_prog);
+ tasdevice_prmg_load(tas_hda->priv, tas_hda->priv->cur_prog);
/* If calibrated data occurs error, dsp will still works with default
* calibrated data inside algo.
*/
- if (tas_priv->cali_data.total_sz > calib_data_sz)
- tas2781_apply_calib(tas_priv);
+ if (tas_hda->priv->cali_data.total_sz > calib_data_sz)
+ tas2781_apply_calib(tas_hda->priv);
-out:
- mutex_unlock(&tas_priv->codec_lock);
+ mutex_unlock(&tas_hda->priv->codec_lock);
- return ret;
+ return 0;
}
static int tas2781_system_suspend(struct device *dev)
{
- struct tasdevice_priv *tas_priv = dev_get_drvdata(dev);
+ struct tas2781_hda *tas_hda = dev_get_drvdata(dev);
int ret;
- dev_dbg(tas_priv->dev, "System Suspend\n");
+ dev_dbg(tas_hda->priv->dev, "System Suspend\n");
ret = pm_runtime_force_suspend(dev);
if (ret)
return ret;
/* Shutdown chip before system suspend */
- regcache_cache_only(tas_priv->regmap, false);
- tasdevice_tuning_switch(tas_priv, 1);
- regcache_cache_only(tas_priv->regmap, true);
- regcache_mark_dirty(tas_priv->regmap);
+ tasdevice_tuning_switch(tas_hda->priv, 1);
/*
* Reset GPIO may be shared, so cannot reset here.
static int tas2781_system_resume(struct device *dev)
{
- struct tasdevice_priv *tas_priv = dev_get_drvdata(dev);
+ struct tas2781_hda *tas_hda = dev_get_drvdata(dev);
unsigned long calib_data_sz =
- tas_priv->ndev * TASDEVICE_SPEAKER_CALIBRATION_SIZE;
+ tas_hda->priv->ndev * TASDEVICE_SPEAKER_CALIBRATION_SIZE;
int i, ret;
- dev_dbg(tas_priv->dev, "System Resume\n");
+ dev_info(tas_hda->priv->dev, "System Resume\n");
ret = pm_runtime_force_resume(dev);
if (ret)
return ret;
- mutex_lock(&tas_priv->codec_lock);
+ mutex_lock(&tas_hda->priv->codec_lock);
- for (i = 0; i < tas_priv->ndev; i++) {
- tas_priv->tasdevice[i].cur_book = -1;
- tas_priv->tasdevice[i].cur_prog = -1;
- tas_priv->tasdevice[i].cur_conf = -1;
+ for (i = 0; i < tas_hda->priv->ndev; i++) {
+ tas_hda->priv->tasdevice[i].cur_book = -1;
+ tas_hda->priv->tasdevice[i].cur_prog = -1;
+ tas_hda->priv->tasdevice[i].cur_conf = -1;
}
- tas2781_reset(tas_priv);
- tasdevice_prmg_load(tas_priv, tas_priv->cur_prog);
+ tas2781_reset(tas_hda->priv);
+ tasdevice_prmg_load(tas_hda->priv, tas_hda->priv->cur_prog);
/* If calibrated data occurs error, dsp will still work with default
* calibrated data inside algo.
*/
- if (tas_priv->cali_data.total_sz > calib_data_sz)
- tas2781_apply_calib(tas_priv);
- mutex_unlock(&tas_priv->codec_lock);
+ if (tas_hda->priv->cali_data.total_sz > calib_data_sz)
+ tas2781_apply_calib(tas_hda->priv);
+ mutex_unlock(&tas_hda->priv->codec_lock);
return 0;
}
.driver = {
.name = "cs35l45",
.of_match_table = cs35l45_of_match,
- .pm = &cs35l45_pm_ops,
+ .pm = pm_ptr(&cs35l45_pm_ops),
},
.id_table = cs35l45_id_i2c,
.probe = cs35l45_i2c_probe,
.driver = {
.name = "cs35l45",
.of_match_table = cs35l45_of_match,
- .pm = &cs35l45_pm_ops,
+ .pm = pm_ptr(&cs35l45_pm_ops),
},
.id_table = cs35l45_id_spi,
.probe = cs35l45_spi_probe,
cs35l45_setup_hibernate(cs35l45);
+ regmap_set_bits(cs35l45->regmap, CS35L45_IRQ1_MASK_2, CS35L45_DSP_VIRT2_MBOX_MASK);
+
// Don't wait for ACK since bus activity would wake the device
regmap_write(cs35l45->regmap, CS35L45_DSP_VIRT1_MBOX_1, CSPL_MBOX_CMD_HIBERNATE);
CSPL_MBOX_CMD_OUT_OF_HIBERNATE);
if (!ret) {
dev_dbg(cs35l45->dev, "Wake success at cycle: %d\n", j);
+ regmap_clear_bits(cs35l45->regmap, CS35L45_IRQ1_MASK_2,
+ CS35L45_DSP_VIRT2_MBOX_MASK);
return 0;
}
usleep_range(100, 200);
return -ETIMEDOUT;
}
-static int __maybe_unused cs35l45_runtime_suspend(struct device *dev)
+static int cs35l45_runtime_suspend(struct device *dev)
{
struct cs35l45_private *cs35l45 = dev_get_drvdata(dev);
return 0;
}
-static int __maybe_unused cs35l45_runtime_resume(struct device *dev)
+static int cs35l45_runtime_resume(struct device *dev)
{
struct cs35l45_private *cs35l45 = dev_get_drvdata(dev);
int ret;
return ret;
}
+static int cs35l45_sys_suspend(struct device *dev)
+{
+ struct cs35l45_private *cs35l45 = dev_get_drvdata(dev);
+
+ dev_dbg(cs35l45->dev, "System suspend, disabling IRQ\n");
+ disable_irq(cs35l45->irq);
+
+ return 0;
+}
+
+static int cs35l45_sys_suspend_noirq(struct device *dev)
+{
+ struct cs35l45_private *cs35l45 = dev_get_drvdata(dev);
+
+ dev_dbg(cs35l45->dev, "Late system suspend, reenabling IRQ\n");
+ enable_irq(cs35l45->irq);
+
+ return 0;
+}
+
+static int cs35l45_sys_resume_noirq(struct device *dev)
+{
+ struct cs35l45_private *cs35l45 = dev_get_drvdata(dev);
+
+ dev_dbg(cs35l45->dev, "Early system resume, disabling IRQ\n");
+ disable_irq(cs35l45->irq);
+
+ return 0;
+}
+
+static int cs35l45_sys_resume(struct device *dev)
+{
+ struct cs35l45_private *cs35l45 = dev_get_drvdata(dev);
+
+ dev_dbg(cs35l45->dev, "System resume, reenabling IRQ\n");
+ enable_irq(cs35l45->irq);
+
+ return 0;
+}
+
static int cs35l45_apply_property_config(struct cs35l45_private *cs35l45)
{
struct device_node *node = cs35l45->dev->of_node;
}
EXPORT_SYMBOL_NS_GPL(cs35l45_remove, SND_SOC_CS35L45);
-const struct dev_pm_ops cs35l45_pm_ops = {
- SET_RUNTIME_PM_OPS(cs35l45_runtime_suspend, cs35l45_runtime_resume, NULL)
+EXPORT_GPL_DEV_PM_OPS(cs35l45_pm_ops) = {
+ RUNTIME_PM_OPS(cs35l45_runtime_suspend, cs35l45_runtime_resume, NULL)
+
+ SYSTEM_SLEEP_PM_OPS(cs35l45_sys_suspend, cs35l45_sys_resume)
+ NOIRQ_SYSTEM_SLEEP_PM_OPS(cs35l45_sys_suspend_noirq, cs35l45_sys_resume_noirq)
};
-EXPORT_SYMBOL_NS_GPL(cs35l45_pm_ops, SND_SOC_CS35L45);
MODULE_DESCRIPTION("ASoC CS35L45 driver");
MODULE_AUTHOR("James Schulman, Cirrus Logic Inc, <james.schulman@cirrus.com>");
return ret;
}
-static void cs42l43_start_hs_bias(struct cs42l43_codec *priv, bool force_high)
+static void cs42l43_start_hs_bias(struct cs42l43_codec *priv, bool type_detect)
{
struct cs42l43 *cs42l43 = priv->core;
unsigned int val = 0x3 << CS42L43_HSBIAS_MODE_SHIFT;
regmap_update_bits(cs42l43->regmap, CS42L43_HS2,
CS42L43_HS_CLAMP_DISABLE_MASK, CS42L43_HS_CLAMP_DISABLE_MASK);
- if (!force_high && priv->bias_low)
- val = 0x2 << CS42L43_HSBIAS_MODE_SHIFT;
-
- if (priv->bias_sense_ua) {
- regmap_update_bits(cs42l43->regmap,
- CS42L43_HS_BIAS_SENSE_AND_CLAMP_AUTOCONTROL,
- CS42L43_HSBIAS_SENSE_EN_MASK |
- CS42L43_AUTO_HSBIAS_CLAMP_EN_MASK,
- CS42L43_HSBIAS_SENSE_EN_MASK |
- CS42L43_AUTO_HSBIAS_CLAMP_EN_MASK);
+ if (!type_detect) {
+ if (priv->bias_low)
+ val = 0x2 << CS42L43_HSBIAS_MODE_SHIFT;
+
+ if (priv->bias_sense_ua)
+ regmap_update_bits(cs42l43->regmap,
+ CS42L43_HS_BIAS_SENSE_AND_CLAMP_AUTOCONTROL,
+ CS42L43_HSBIAS_SENSE_EN_MASK |
+ CS42L43_AUTO_HSBIAS_CLAMP_EN_MASK,
+ CS42L43_HSBIAS_SENSE_EN_MASK |
+ CS42L43_AUTO_HSBIAS_CLAMP_EN_MASK);
}
regmap_update_bits(cs42l43->regmap, CS42L43_MIC_DETECT_CONTROL_1,
static void hdmi_codec_jack_report(struct hdmi_codec_priv *hcp,
unsigned int jack_status)
{
- if (hcp->jack && jack_status != hcp->jack_status) {
- snd_soc_jack_report(hcp->jack, jack_status, SND_JACK_LINEOUT);
+ if (jack_status != hcp->jack_status) {
+ if (hcp->jack)
+ snd_soc_jack_report(hcp->jack, jack_status, SND_JACK_LINEOUT);
hcp->jack_status = jack_status;
}
}
if (hcp->hcd.ops->hook_plugged_cb) {
hcp->jack = jack;
+
+ /*
+ * Report the initial jack status which may have been provided
+ * by the parent hdmi driver while the hpd hook was registered.
+ */
+ snd_soc_jack_report(jack, hcp->jack_status, SND_JACK_LINEOUT);
+
return 0;
}
static const struct regmap_config tasdevice_regmap = {
.reg_bits = 8,
.val_bits = 8,
- .cache_type = REGCACHE_RBTREE,
+ .cache_type = REGCACHE_NONE,
.ranges = tasdevice_ranges,
.num_ranges = ARRAY_SIZE(tasdevice_ranges),
.max_register = 256 * 128,
tas_priv->tasdevice[i].cur_conf = -1;
}
- dev_set_drvdata(tas_priv->dev, tas_priv);
-
mutex_init(&tas_priv->codec_lock);
out:
goto out;
}
- conf = &(tas_fmw->configs[cfg_no]);
for (i = 0, prog_status = 0; i < tas_priv->ndev; i++) {
if (cfg_info[rca_conf_no]->active_dev & (1 << i)) {
- if (tas_priv->tasdevice[i].cur_prog != prm_no
- || tas_priv->force_fwload_status) {
+ if (prm_no >= 0
+ && (tas_priv->tasdevice[i].cur_prog != prm_no
+ || tas_priv->force_fwload_status)) {
tas_priv->tasdevice[i].cur_conf = -1;
tas_priv->tasdevice[i].is_loading = true;
prog_status++;
}
for (i = 0, status = 0; i < tas_priv->ndev; i++) {
- if (tas_priv->tasdevice[i].cur_conf != cfg_no
+ if (cfg_no >= 0
+ && tas_priv->tasdevice[i].cur_conf != cfg_no
&& (cfg_info[rca_conf_no]->active_dev & (1 << i))
&& (tas_priv->tasdevice[i].is_loaderr == false)) {
status++;
}
if (status) {
+ conf = &(tas_fmw->configs[cfg_no]);
status = 0;
tasdevice_load_data(tas_priv, &(conf->dev_data));
for (i = 0; i < tas_priv->ndev; i++) {
}
for (i = 0, prog_status = 0; i < tas_priv->ndev; i++) {
- if (tas_priv->tasdevice[i].cur_prog != prm_no) {
+ if (prm_no >= 0 && tas_priv->tasdevice[i].cur_prog != prm_no) {
tas_priv->tasdevice[i].cur_conf = -1;
tas_priv->tasdevice[i].is_loading = true;
prog_status++;
}
for (i = 0, prog_status = 0; i < tas_priv->ndev; i++) {
- if (tas_priv->tasdevice[i].cur_prog != prm_no) {
+ if (prm_no >= 0 && tas_priv->tasdevice[i].cur_prog != prm_no) {
tas_priv->tasdevice[i].cur_conf = -1;
tas_priv->tasdevice[i].is_loading = true;
prog_status++;
if (!tas_priv)
return -ENOMEM;
+ dev_set_drvdata(&i2c->dev, tas_priv);
+
if (ACPI_HANDLE(&i2c->dev)) {
acpi_id = acpi_match_device(i2c->dev.driver->acpi_match_table,
&i2c->dev);
ret = devm_snd_soc_register_component(&pdev->dev, &fsl_component,
&fsl_rpmsg_dai, 1);
if (ret)
- return ret;
+ goto err_pm_disable;
rpmsg->card_pdev = platform_device_register_data(&pdev->dev,
"imx-audio-rpmsg",
if (IS_ERR(rpmsg->card_pdev)) {
dev_err(&pdev->dev, "failed to register rpmsg card\n");
ret = PTR_ERR(rpmsg->card_pdev);
- return ret;
+ goto err_pm_disable;
}
return 0;
+
+err_pm_disable:
+ pm_runtime_disable(&pdev->dev);
+ return ret;
}
static void fsl_rpmsg_remove(struct platform_device *pdev)
{
struct fsl_rpmsg *rpmsg = platform_get_drvdata(pdev);
+ pm_runtime_disable(&pdev->dev);
+
if (rpmsg->card_pdev)
platform_device_unregister(rpmsg->card_pdev);
}
bool tx = substream->stream == SNDRV_PCM_STREAM_PLAYBACK;
unsigned int ofs = sai->soc_data->reg_offset;
+ /* Clear xMR to avoid channel swap with mclk_with_tere enabled case */
+ regmap_write(sai->regmap, FSL_SAI_xMR(tx), 0);
+
regmap_update_bits(sai->regmap, FSL_SAI_xCR3(tx, ofs),
FSL_SAI_CR3_TRCE_MASK, 0);
#define BYT_RT5640_HSMIC2_ON_IN1 BIT(27)
#define BYT_RT5640_JD_HP_ELITEP_1000G2 BIT(28)
#define BYT_RT5640_USE_AMCR0F28 BIT(29)
+#define BYT_RT5640_SWAPPED_SPEAKERS BIT(30)
#define BYTCR_INPUT_DEFAULTS \
(BYT_RT5640_IN3_MAP | \
dev_info(dev, "quirk MONO_SPEAKER enabled\n");
if (byt_rt5640_quirk & BYT_RT5640_NO_SPEAKERS)
dev_info(dev, "quirk NO_SPEAKERS enabled\n");
+ if (byt_rt5640_quirk & BYT_RT5640_SWAPPED_SPEAKERS)
+ dev_info(dev, "quirk SWAPPED_SPEAKERS enabled\n");
if (byt_rt5640_quirk & BYT_RT5640_LINEOUT)
dev_info(dev, "quirk LINEOUT enabled\n");
if (byt_rt5640_quirk & BYT_RT5640_LINEOUT_AS_HP2)
BYT_RT5640_SSP0_AIF1 |
BYT_RT5640_MCLK_EN),
},
+ {
+ /* Medion Lifetab S10346 */
+ .matches = {
+ DMI_MATCH(DMI_BOARD_VENDOR, "AMI Corporation"),
+ DMI_MATCH(DMI_BOARD_NAME, "Aptio CRB"),
+ /* Above strings are much too generic, also match on BIOS date */
+ DMI_MATCH(DMI_BIOS_DATE, "10/22/2015"),
+ },
+ .driver_data = (void *)(BYTCR_INPUT_DEFAULTS |
+ BYT_RT5640_SWAPPED_SPEAKERS |
+ BYT_RT5640_SSP0_AIF1 |
+ BYT_RT5640_MCLK_EN),
+ },
{ /* Mele PCG03 Mini PC */
.matches = {
DMI_EXACT_MATCH(DMI_BOARD_VENDOR, "Mini PC"),
const char *platform_name;
struct acpi_device *adev;
struct device *codec_dev;
+ const char *cfg_spk;
bool sof_parent;
int ret_val = 0;
int dai_index = 0;
- int i, cfg_spk;
- int aif;
+ int i, aif;
is_bytcr = false;
priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
}
if (byt_rt5640_quirk & BYT_RT5640_NO_SPEAKERS) {
- cfg_spk = 0;
+ cfg_spk = "0";
spk_type = "none";
} else if (byt_rt5640_quirk & BYT_RT5640_MONO_SPEAKER) {
- cfg_spk = 1;
+ cfg_spk = "1";
spk_type = "mono";
+ } else if (byt_rt5640_quirk & BYT_RT5640_SWAPPED_SPEAKERS) {
+ cfg_spk = "swapped";
+ spk_type = "swapped";
} else {
- cfg_spk = 2;
+ cfg_spk = "2";
spk_type = "stereo";
}
headset2_string = " cfg-hs2:in1";
snprintf(byt_rt5640_components, sizeof(byt_rt5640_components),
- "cfg-spk:%d cfg-mic:%s aif:%d%s%s", cfg_spk,
+ "cfg-spk:%s cfg-mic:%s aif:%d%s%s", cfg_spk,
map_name[BYT_RT5640_MAP(byt_rt5640_quirk)], aif,
lineout_string, headset2_string);
byt_rt5640_card.components = byt_rt5640_components;
.adr = 0x00013701FA355601ull,
.num_endpoints = 1,
.endpoints = &spk_r_endpoint,
- .name_prefix = "cs35l56-8"
+ .name_prefix = "AMP8"
},
{
.adr = 0x00013601FA355601ull,
.num_endpoints = 1,
.endpoints = &spk_3_endpoint,
- .name_prefix = "cs35l56-7"
+ .name_prefix = "AMP7"
}
};
.adr = 0x00023301FA355601ull,
.num_endpoints = 1,
.endpoints = &spk_l_endpoint,
- .name_prefix = "cs35l56-1"
+ .name_prefix = "AMP1"
},
{
.adr = 0x00023201FA355601ull,
.num_endpoints = 1,
.endpoints = &spk_2_endpoint,
- .name_prefix = "cs35l56-2"
+ .name_prefix = "AMP2"
}
};
SND_SOC_DAPM_PRE_PMU | SND_SOC_DAPM_POST_PMD),
SND_SOC_DAPM_SUPPLY_S("AUD_PAD_TOP", SUPPLY_SEQ_ADDA_AUD_PAD_TOP,
- 0, 0, 0,
+ AFE_AUD_PAD_TOP, RG_RX_FIFO_ON_SFT, 0,
mtk_adda_pad_top_event,
SND_SOC_DAPM_PRE_PMU),
SND_SOC_DAPM_SUPPLY_S("ADDA_MTKAIF_CFG", SUPPLY_SEQ_ADDA_MTKAIF_CFG,
struct soc_enum *e = (struct soc_enum *)kcontrol->private_value;
unsigned int mux, reg;
+ if (ucontrol->value.enumerated.item[0] >= e->items)
+ return -EINVAL;
+
mux = snd_soc_enum_item_to_val(e, ucontrol->value.enumerated.item[0]);
regmap_field_read(priv->field_dat_sel, ®);
snd_soc_dapm_mux_update_power(dapm, kcontrol, mux, e, NULL);
- return 0;
+ return 1;
}
static SOC_ENUM_SINGLE_DECL(g12a_toacodec_mux_enum, TOACODEC_CTRL0,
struct soc_enum *e = (struct soc_enum *)kcontrol->private_value;
unsigned int mux, changed;
+ if (ucontrol->value.enumerated.item[0] >= e->items)
+ return -EINVAL;
+
mux = snd_soc_enum_item_to_val(e, ucontrol->value.enumerated.item[0]);
changed = snd_soc_component_test_bits(component, e->reg,
CTRL0_I2S_DAT_SEL,
struct soc_enum *e = (struct soc_enum *)kcontrol->private_value;
unsigned int mux, changed;
+ if (ucontrol->value.enumerated.item[0] >= e->items)
+ return -EINVAL;
+
mux = snd_soc_enum_item_to_val(e, ucontrol->value.enumerated.item[0]);
changed = snd_soc_component_test_bits(component, TOHDMITX_CTRL0,
CTRL0_SPDIF_SEL,
snd_soc_dapm_mux_update_power(dapm, kcontrol, mux, e, NULL);
- return 0;
+ return 1;
}
static SOC_ENUM_SINGLE_DECL(g12a_tohdmitx_spdif_mux_enum, TOHDMITX_CTRL0,
static int hda_codec_load_module(struct hda_codec *codec)
{
- int ret = request_codec_module(codec);
+ int ret;
+
+ ret = snd_hdac_device_register(&codec->core);
+ if (ret) {
+ dev_err(&codec->core.dev, "failed to register hdac device\n");
+ put_device(&codec->core.dev);
+ return ret;
+ }
+ ret = request_codec_module(codec);
if (ret <= 0) {
codec->probe_id = HDA_CODEC_ID_GENERIC;
ret = request_codec_module(codec);
static struct hda_codec *hda_codec_device_init(struct hdac_bus *bus, int addr, int type)
{
struct hda_codec *codec;
- int ret;
codec = snd_hda_codec_device_init(to_hda_bus(bus), addr, "ehdaudio%dD%d", bus->idx, addr);
if (IS_ERR(codec)) {
codec->core.type = type;
- ret = snd_hdac_device_register(&codec->core);
- if (ret) {
- dev_err(bus->dev, "failed to register hdac device\n");
- put_device(&codec->core.dev);
- return ERR_PTR(ret);
- }
-
return codec;
}
static struct snd_sof_of_mach sof_mt8186_machs[] = {
{
- .compatible = "google,steelix",
- .sof_tplg_filename = "sof-mt8186-google-steelix.tplg"
- }, {
.compatible = "mediatek,mt8186",
.sof_tplg_filename = "sof-mt8186.tplg",
},
__le16 num_meters;
__le32 magic;
} __packed req;
- u32 resp[SCARLETT2_MAX_METERS];
+ __le32 resp[SCARLETT2_MAX_METERS];
int i, err;
req.pad = 0;
/* copy, convert to u16 */
for (i = 0; i < num_meters; i++)
- levels[i] = resp[i];
+ levels[i] = le32_to_cpu(resp[i]);
return 0;
}
static int snd_usb_motu_m_series_boot_quirk(struct usb_device *dev)
{
- msleep(2000);
+ msleep(4000);
return 0;
}
unsigned int id)
{
switch (id) {
- case USB_ID(0x07fd, 0x0008): /* MOTU M Series */
+ case USB_ID(0x07fd, 0x0008): /* MOTU M Series, 1st hardware version */
return snd_usb_motu_m_series_boot_quirk(dev);
}
#define X86_FEATURE_IBRS ( 7*32+25) /* Indirect Branch Restricted Speculation */
#define X86_FEATURE_IBPB ( 7*32+26) /* Indirect Branch Prediction Barrier */
#define X86_FEATURE_STIBP ( 7*32+27) /* Single Thread Indirect Branch Predictors */
-#define X86_FEATURE_ZEN (7*32+28) /* "" CPU based on Zen microarchitecture */
+#define X86_FEATURE_ZEN ( 7*32+28) /* "" Generic flag for all Zen and newer */
#define X86_FEATURE_L1TF_PTEINV ( 7*32+29) /* "" L1TF workaround PTE inversion */
#define X86_FEATURE_IBRS_ENHANCED ( 7*32+30) /* Enhanced IBRS */
#define X86_FEATURE_MSR_IA32_FEAT_CTL ( 7*32+31) /* "" MSR IA32_FEAT_CTL configured */
#define ARMV8_PMU_PMCR_DP (1 << 5) /* Disable CCNT if non-invasive debug*/
#define ARMV8_PMU_PMCR_LC (1 << 6) /* Overflow on 64 bit cycle counter */
#define ARMV8_PMU_PMCR_LP (1 << 7) /* Long event counter enable */
-#define ARMV8_PMU_PMCR_N_SHIFT 11 /* Number of counters supported */
-#define ARMV8_PMU_PMCR_N_MASK 0x1f
-#define ARMV8_PMU_PMCR_MASK 0xff /* Mask for writable bits */
+#define ARMV8_PMU_PMCR_N GENMASK(15, 11) /* Number of counters supported */
+/* Mask for writable bits */
+#define ARMV8_PMU_PMCR_MASK (ARMV8_PMU_PMCR_E | ARMV8_PMU_PMCR_P | \
+ ARMV8_PMU_PMCR_C | ARMV8_PMU_PMCR_D | \
+ ARMV8_PMU_PMCR_X | ARMV8_PMU_PMCR_DP | \
+ ARMV8_PMU_PMCR_LC | ARMV8_PMU_PMCR_LP)
/*
* PMOVSR: counters overflow flag status reg
*/
-#define ARMV8_PMU_OVSR_MASK 0xffffffff /* Mask for writable bits */
-#define ARMV8_PMU_OVERFLOWED_MASK ARMV8_PMU_OVSR_MASK
+#define ARMV8_PMU_OVSR_P GENMASK(30, 0)
+#define ARMV8_PMU_OVSR_C BIT(31)
+/* Mask for writable bits is both P and C fields */
+#define ARMV8_PMU_OVERFLOWED_MASK (ARMV8_PMU_OVSR_P | ARMV8_PMU_OVSR_C)
/*
* PMXEVTYPER: Event selection reg
*/
-#define ARMV8_PMU_EVTYPE_MASK 0xc800ffff /* Mask for writable bits */
-#define ARMV8_PMU_EVTYPE_EVENT 0xffff /* Mask for EVENT bits */
+#define ARMV8_PMU_EVTYPE_EVENT GENMASK(15, 0) /* Mask for EVENT bits */
+#define ARMV8_PMU_EVTYPE_TH GENMASK(43, 32)
+#define ARMV8_PMU_EVTYPE_TC GENMASK(63, 61)
/*
* Event filters for PMUv3
*/
-#define ARMV8_PMU_EXCLUDE_EL1 (1U << 31)
-#define ARMV8_PMU_EXCLUDE_EL0 (1U << 30)
-#define ARMV8_PMU_INCLUDE_EL2 (1U << 27)
+#define ARMV8_PMU_EXCLUDE_EL1 (1U << 31)
+#define ARMV8_PMU_EXCLUDE_EL0 (1U << 30)
+#define ARMV8_PMU_EXCLUDE_NS_EL1 (1U << 29)
+#define ARMV8_PMU_EXCLUDE_NS_EL0 (1U << 28)
+#define ARMV8_PMU_INCLUDE_EL2 (1U << 27)
+#define ARMV8_PMU_EXCLUDE_EL3 (1U << 26)
/*
* PMUSERENR: user enable reg
*/
-#define ARMV8_PMU_USERENR_MASK 0xf /* Mask for writable bits */
#define ARMV8_PMU_USERENR_EN (1 << 0) /* PMU regs can be accessed at EL0 */
#define ARMV8_PMU_USERENR_SW (1 << 1) /* PMSWINC can be written at EL0 */
#define ARMV8_PMU_USERENR_CR (1 << 2) /* Cycle counter can be read at EL0 */
#define ARMV8_PMU_USERENR_ER (1 << 3) /* Event counter can be read at EL0 */
+/* Mask for writable bits */
+#define ARMV8_PMU_USERENR_MASK (ARMV8_PMU_USERENR_EN | ARMV8_PMU_USERENR_SW | \
+ ARMV8_PMU_USERENR_CR | ARMV8_PMU_USERENR_ER)
/* PMMIR_EL1.SLOTS mask */
-#define ARMV8_PMU_SLOTS_MASK 0xff
-
-#define ARMV8_PMU_BUS_SLOTS_SHIFT 8
-#define ARMV8_PMU_BUS_SLOTS_MASK 0xff
-#define ARMV8_PMU_BUS_WIDTH_SHIFT 16
-#define ARMV8_PMU_BUS_WIDTH_MASK 0xf
+#define ARMV8_PMU_SLOTS GENMASK(7, 0)
+#define ARMV8_PMU_BUS_SLOTS GENMASK(15, 8)
+#define ARMV8_PMU_BUS_WIDTH GENMASK(19, 16)
+#define ARMV8_PMU_THWIDTH GENMASK(23, 20)
/*
* This code is really good
MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) != 0);
allocated = mas_allocated(&mas);
height = mas_mt_height(&mas);
- MT_BUG_ON(mt, allocated != 1);
+ MT_BUG_ON(mt, allocated != 0);
mas_store_prealloc(&mas, ptr);
MT_BUG_ON(mt, mas_allocated(&mas) != 0);
TARGETS += filesystems/binderfs
TARGETS += filesystems/epoll
TARGETS += filesystems/fat
+TARGETS += filesystems/overlayfs
+TARGETS += filesystems/statmount
TARGETS += firmware
TARGETS += fpu
TARGETS += ftrace
err = snd_ctl_elem_info(card_data->handle,
ctl_data->info);
if (err < 0) {
- ksft_print_msg("%s getting info for %d\n",
+ ksft_print_msg("%s getting info for %s\n",
snd_strerror(err),
ctl_data->name);
}
putnum(++tests_run); \
putstr(" " #name "\n");
+#define skip_test(name) \
+ tests_skipped++; \
+ putstr("ok "); \
+ putnum(++tests_run); \
+ putstr(" # SKIP " #name "\n");
+
int main(int argc, char **argv)
{
int ret, i;
} else {
putstr("# SME support not present\n");
- for (i = 0; i < EXPECTED_TESTS; i++) {
- putstr("ok ");
- putnum(i);
- putstr(" skipped, TPIDR2 not supported\n");
- }
-
- tests_skipped += EXPECTED_TESTS;
+ skip_test(default_value);
+ skip_test(write_read);
+ skip_test(write_sleep_read);
+ skip_test(write_fork_read);
+ skip_test(write_clone_read);
}
print_summary();
mov x11, x1 // actual data
mov x12, x2 // data size
+#ifdef SSVE
+ mrs x13, S3_3_C4_C2_2
+#endif
+
puts "Mismatch: PID="
mov x0, x20
bl putdec
bl dumphex
puts "]\n"
+#ifdef SSVE
+ puts "\tSVCR: "
+ mov x0, x13
+ bl putdecn
+#endif
+
mov x8, #__NR_getpid
svc #0
// fpsimd.c acitivty log dump hack
},
};
+static bool vec_type_supported(struct vec_data *data)
+{
+ return getauxval(data->hwcap_type) & data->hwcap;
+}
+
static int stdio_read_integer(FILE *f, const char *what, int *val)
{
int n = 0;
return;
}
- for (i = 0; i < ARRAY_SIZE(vec_data); i++)
+ for (i = 0; i < ARRAY_SIZE(vec_data); i++) {
+ if (!vec_type_supported(&vec_data[i]))
+ continue;
orig_vls[i] = vec_data[i].rdvl();
+ }
for (vq = SVE_VQ_MIN; vq <= SVE_VQ_MAX; vq++) {
vl = sve_vl_from_vq(vq);
if (&vec_data[i] == data)
continue;
- if (!(getauxval(vec_data[i].hwcap_type) & vec_data[i].hwcap))
+ if (!vec_type_supported(&vec_data[i]))
continue;
if (vec_data[i].rdvl() != orig_vls[i]) {
struct vec_data *data = &vec_data[i];
unsigned long supported;
- supported = getauxval(data->hwcap_type) & data->hwcap;
+ supported = vec_type_supported(data);
if (!supported)
all_supported = false;
// mov w8, #__NR_exit
// svc #0
// end hack
+
+ mrs x13, S3_3_C4_C2_2
+
smstop
mov x10, x0 // expected data
mov x11, x1 // actual data
mov x1, x12
bl dumphex
puts "]\n"
+ puts "\tSVCR: "
+ mov x0, x13
+ bl putdecn
mov x8, #__NR_getpid
svc #0
// mov w8, #__NR_exit
// svc #0
// end hack
+
+ mrs x13, S3_3_C4_C2_2
smstop
mov x10, x0 // expected data
mov x11, x1 // actual data
mov x1, x12
bl dumphex
puts "]\n"
+ puts "\tSVCR: "
+ mov x0, x13
+ bl putdecn
mov x8, #__NR_getpid
svc #0
test_sockmap_pass_prog__destroy(pass);
}
+static void test_sockmap_unconnected_unix(void)
+{
+ int err, map, stream = 0, dgram = 0, zero = 0;
+ struct test_sockmap_pass_prog *skel;
+
+ skel = test_sockmap_pass_prog__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "open_and_load"))
+ return;
+
+ map = bpf_map__fd(skel->maps.sock_map_rx);
+
+ stream = xsocket(AF_UNIX, SOCK_STREAM, 0);
+ if (stream < 0)
+ return;
+
+ dgram = xsocket(AF_UNIX, SOCK_DGRAM, 0);
+ if (dgram < 0) {
+ close(stream);
+ return;
+ }
+
+ err = bpf_map_update_elem(map, &zero, &stream, BPF_ANY);
+ ASSERT_ERR(err, "bpf_map_update_elem(stream)");
+
+ err = bpf_map_update_elem(map, &zero, &dgram, BPF_ANY);
+ ASSERT_OK(err, "bpf_map_update_elem(dgram)");
+
+ close(stream);
+ close(dgram);
+}
+
void test_sockmap_basic(void)
{
if (test__start_subtest("sockmap create_update_free"))
test_sockmap_skb_verdict_fionread(false);
if (test__start_subtest("sockmap skb_verdict msg_f_peek"))
test_sockmap_skb_verdict_peek();
+
+ if (test__start_subtest("sockmap unconnected af_unix"))
+ test_sockmap_unconnected_unix();
}
ip netns exec client ip link add dev bond0 down type bond mode 1 \
miimon 100 all_slaves_active 1
-ip netns exec client ip link set dev eth0 down master bond0
+ip netns exec client ip link set dev eth0 master bond0
ip netns exec client ip link set dev bond0 up
ip netns exec client ip addr add ${client_ip4}/24 dev bond0
ip netns exec client ping -c 5 $server_ip4 >/dev/null
-ip netns exec client ip link set dev eth0 down nomaster
+ip netns exec client ip link set dev eth0 nomaster
ip netns exec client ip link set dev bond0 down
ip netns exec client ip link set dev bond0 type bond mode 0 \
arp_interval 1000 arp_ip_target "+${server_ip4}"
-ip netns exec client ip link set dev eth0 down master bond0
+ip netns exec client ip link set dev eth0 master bond0
ip netns exec client ip link set dev bond0 up
ip netns exec client ping -c 5 $server_ip4 >/dev/null
--- /dev/null
+# SPDX-License-Identifier: GPL-2.0-only
+dev_in_maps
--- /dev/null
+# SPDX-License-Identifier: GPL-2.0
+
+TEST_GEN_PROGS := dev_in_maps
+
+CFLAGS := -Wall -Werror
+
+include ../../lib.mk
--- /dev/null
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+
+#include <inttypes.h>
+#include <unistd.h>
+#include <stdio.h>
+
+#include <linux/unistd.h>
+#include <linux/types.h>
+#include <linux/mount.h>
+#include <sys/syscall.h>
+#include <sys/stat.h>
+#include <sys/mount.h>
+#include <sys/mman.h>
+#include <sched.h>
+#include <fcntl.h>
+
+#include "../../kselftest.h"
+#include "log.h"
+
+static int sys_fsopen(const char *fsname, unsigned int flags)
+{
+ return syscall(__NR_fsopen, fsname, flags);
+}
+
+static int sys_fsconfig(int fd, unsigned int cmd, const char *key, const char *value, int aux)
+{
+ return syscall(__NR_fsconfig, fd, cmd, key, value, aux);
+}
+
+static int sys_fsmount(int fd, unsigned int flags, unsigned int attr_flags)
+{
+ return syscall(__NR_fsmount, fd, flags, attr_flags);
+}
+
+static int sys_move_mount(int from_dfd, const char *from_pathname,
+ int to_dfd, const char *to_pathname,
+ unsigned int flags)
+{
+ return syscall(__NR_move_mount, from_dfd, from_pathname, to_dfd, to_pathname, flags);
+}
+
+static long get_file_dev_and_inode(void *addr, struct statx *stx)
+{
+ char buf[4096];
+ FILE *mapf;
+
+ mapf = fopen("/proc/self/maps", "r");
+ if (mapf == NULL)
+ return pr_perror("fopen(/proc/self/maps)");
+
+ while (fgets(buf, sizeof(buf), mapf)) {
+ unsigned long start, end;
+ uint32_t maj, min;
+ __u64 ino;
+
+ if (sscanf(buf, "%lx-%lx %*s %*s %x:%x %llu",
+ &start, &end, &maj, &min, &ino) != 5)
+ return pr_perror("unable to parse: %s", buf);
+ if (start == (unsigned long)addr) {
+ stx->stx_dev_major = maj;
+ stx->stx_dev_minor = min;
+ stx->stx_ino = ino;
+ return 0;
+ }
+ }
+
+ return pr_err("unable to find the mapping");
+}
+
+static int ovl_mount(void)
+{
+ int tmpfs, fsfd, ovl;
+
+ fsfd = sys_fsopen("tmpfs", 0);
+ if (fsfd == -1)
+ return pr_perror("fsopen(tmpfs)");
+
+ if (sys_fsconfig(fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0) == -1)
+ return pr_perror("FSCONFIG_CMD_CREATE");
+
+ tmpfs = sys_fsmount(fsfd, 0, 0);
+ if (tmpfs == -1)
+ return pr_perror("fsmount");
+
+ close(fsfd);
+
+ /* overlayfs can't be constructed on top of a detached mount. */
+ if (sys_move_mount(tmpfs, "", AT_FDCWD, "/tmp", MOVE_MOUNT_F_EMPTY_PATH))
+ return pr_perror("move_mount");
+ close(tmpfs);
+
+ if (mkdir("/tmp/w", 0755) == -1 ||
+ mkdir("/tmp/u", 0755) == -1 ||
+ mkdir("/tmp/l", 0755) == -1)
+ return pr_perror("mkdir");
+
+ fsfd = sys_fsopen("overlay", 0);
+ if (fsfd == -1)
+ return pr_perror("fsopen(overlay)");
+ if (sys_fsconfig(fsfd, FSCONFIG_SET_STRING, "source", "test", 0) == -1 ||
+ sys_fsconfig(fsfd, FSCONFIG_SET_STRING, "lowerdir", "/tmp/l", 0) == -1 ||
+ sys_fsconfig(fsfd, FSCONFIG_SET_STRING, "upperdir", "/tmp/u", 0) == -1 ||
+ sys_fsconfig(fsfd, FSCONFIG_SET_STRING, "workdir", "/tmp/w", 0) == -1)
+ return pr_perror("fsconfig");
+ if (sys_fsconfig(fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0) == -1)
+ return pr_perror("fsconfig");
+ ovl = sys_fsmount(fsfd, 0, 0);
+ if (ovl == -1)
+ return pr_perror("fsmount");
+
+ return ovl;
+}
+
+/*
+ * Check that the file device and inode shown in /proc/pid/maps match values
+ * returned by stat(2).
+ */
+static int test(void)
+{
+ struct statx stx, mstx;
+ int ovl, fd;
+ void *addr;
+
+ ovl = ovl_mount();
+ if (ovl == -1)
+ return -1;
+
+ fd = openat(ovl, "test", O_RDWR | O_CREAT, 0644);
+ if (fd == -1)
+ return pr_perror("openat");
+
+ addr = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_FILE | MAP_SHARED, fd, 0);
+ if (addr == MAP_FAILED)
+ return pr_perror("mmap");
+
+ if (get_file_dev_and_inode(addr, &mstx))
+ return -1;
+ if (statx(fd, "", AT_EMPTY_PATH | AT_STATX_SYNC_AS_STAT, STATX_INO, &stx))
+ return pr_perror("statx");
+
+ if (stx.stx_dev_major != mstx.stx_dev_major ||
+ stx.stx_dev_minor != mstx.stx_dev_minor ||
+ stx.stx_ino != mstx.stx_ino)
+ return pr_fail("unmatched dev:ino %x:%x:%llx (expected %x:%x:%llx)\n",
+ mstx.stx_dev_major, mstx.stx_dev_minor, mstx.stx_ino,
+ stx.stx_dev_major, stx.stx_dev_minor, stx.stx_ino);
+
+ ksft_test_result_pass("devices are matched\n");
+ return 0;
+}
+
+int main(int argc, char **argv)
+{
+ int fsfd;
+
+ fsfd = sys_fsopen("overlay", 0);
+ if (fsfd == -1) {
+ ksft_test_result_skip("unable to create overlay mount\n");
+ return 1;
+ }
+ close(fsfd);
+
+ /* Create a new mount namespace to not care about cleaning test mounts. */
+ if (unshare(CLONE_NEWNS) == -1) {
+ ksft_test_result_skip("unable to create a new mount namespace\n");
+ return 1;
+ }
+
+ if (mount(NULL, "/", NULL, MS_SLAVE | MS_REC, NULL) == -1) {
+ pr_perror("mount");
+ return 1;
+ }
+
+ ksft_set_plan(1);
+
+ if (test())
+ return 1;
+
+ ksft_exit_pass();
+ return 0;
+}
--- /dev/null
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __SELFTEST_TIMENS_LOG_H__
+#define __SELFTEST_TIMENS_LOG_H__
+
+#define pr_msg(fmt, lvl, ...) \
+ ksft_print_msg("[%s] (%s:%d)\t" fmt "\n", \
+ lvl, __FILE__, __LINE__, ##__VA_ARGS__)
+
+#define pr_p(func, fmt, ...) func(fmt ": %m", ##__VA_ARGS__)
+
+#define pr_err(fmt, ...) \
+ ({ \
+ ksft_test_result_error(fmt "\n", ##__VA_ARGS__); \
+ -1; \
+ })
+
+#define pr_fail(fmt, ...) \
+ ({ \
+ ksft_test_result_fail(fmt, ##__VA_ARGS__); \
+ -1; \
+ })
+
+#define pr_perror(fmt, ...) pr_p(pr_err, fmt, ##__VA_ARGS__)
+
+#endif
--- /dev/null
+# SPDX-License-Identifier: GPL-2.0-only
+/*_test
--- /dev/null
+# SPDX-License-Identifier: GPL-2.0-or-later
+
+CFLAGS += -Wall -O2 -g $(KHDR_INCLUDES)
+TEST_GEN_PROGS := statmount_test
+
+include ../../lib.mk
--- /dev/null
+// SPDX-License-Identifier: GPL-2.0-or-later
+
+#define _GNU_SOURCE
+
+#include <assert.h>
+#include <stdint.h>
+#include <sched.h>
+#include <fcntl.h>
+#include <sys/param.h>
+#include <sys/mount.h>
+#include <sys/stat.h>
+#include <sys/statfs.h>
+#include <linux/mount.h>
+#include <linux/stat.h>
+#include <asm/unistd.h>
+
+#include "../../kselftest.h"
+
+static const char *const known_fs[] = {
+ "9p", "adfs", "affs", "afs", "aio", "anon_inodefs", "apparmorfs",
+ "autofs", "bcachefs", "bdev", "befs", "bfs", "binder", "binfmt_misc",
+ "bpf", "btrfs", "btrfs_test_fs", "ceph", "cgroup", "cgroup2", "cifs",
+ "coda", "configfs", "cpuset", "cramfs", "cxl", "dax", "debugfs",
+ "devpts", "devtmpfs", "dmabuf", "drm", "ecryptfs", "efivarfs", "efs",
+ "erofs", "exfat", "ext2", "ext3", "ext4", "f2fs", "functionfs",
+ "fuse", "fuseblk", "fusectl", "gadgetfs", "gfs2", "gfs2meta", "hfs",
+ "hfsplus", "hostfs", "hpfs", "hugetlbfs", "ibmasmfs", "iomem",
+ "ipathfs", "iso9660", "jffs2", "jfs", "minix", "mqueue", "msdos",
+ "nfs", "nfs4", "nfsd", "nilfs2", "nsfs", "ntfs", "ntfs3", "ocfs2",
+ "ocfs2_dlmfs", "ocxlflash", "omfs", "openpromfs", "overlay", "pipefs",
+ "proc", "pstore", "pvfs2", "qnx4", "qnx6", "ramfs", "reiserfs",
+ "resctrl", "romfs", "rootfs", "rpc_pipefs", "s390_hypfs", "secretmem",
+ "securityfs", "selinuxfs", "smackfs", "smb3", "sockfs", "spufs",
+ "squashfs", "sysfs", "sysv", "tmpfs", "tracefs", "ubifs", "udf",
+ "ufs", "v7", "vboxsf", "vfat", "virtiofs", "vxfs", "xenfs", "xfs",
+ "zonefs", NULL };
+
+static int statmount(uint64_t mnt_id, uint64_t mask, struct statmount *buf,
+ size_t bufsize, unsigned int flags)
+{
+ struct mnt_id_req req = {
+ .size = MNT_ID_REQ_SIZE_VER0,
+ .mnt_id = mnt_id,
+ .param = mask,
+ };
+
+ return syscall(__NR_statmount, &req, buf, bufsize, flags);
+}
+
+static struct statmount *statmount_alloc(uint64_t mnt_id, uint64_t mask, unsigned int flags)
+{
+ size_t bufsize = 1 << 15;
+ struct statmount *buf = NULL, *tmp = alloca(bufsize);
+ int tofree = 0;
+ int ret;
+
+ for (;;) {
+ ret = statmount(mnt_id, mask, tmp, bufsize, flags);
+ if (ret != -1)
+ break;
+ if (tofree)
+ free(tmp);
+ if (errno != EOVERFLOW)
+ return NULL;
+ bufsize <<= 1;
+ tofree = 1;
+ tmp = malloc(bufsize);
+ if (!tmp)
+ return NULL;
+ }
+ buf = malloc(tmp->size);
+ if (buf)
+ memcpy(buf, tmp, tmp->size);
+ if (tofree)
+ free(tmp);
+
+ return buf;
+}
+
+static void write_file(const char *path, const char *val)
+{
+ int fd = open(path, O_WRONLY);
+ size_t len = strlen(val);
+ int ret;
+
+ if (fd == -1)
+ ksft_exit_fail_msg("opening %s for write: %s\n", path, strerror(errno));
+
+ ret = write(fd, val, len);
+ if (ret == -1)
+ ksft_exit_fail_msg("writing to %s: %s\n", path, strerror(errno));
+ if (ret != len)
+ ksft_exit_fail_msg("short write to %s\n", path);
+
+ ret = close(fd);
+ if (ret == -1)
+ ksft_exit_fail_msg("closing %s\n", path);
+}
+
+static uint64_t get_mnt_id(const char *name, const char *path, uint64_t mask)
+{
+ struct statx sx;
+ int ret;
+
+ ret = statx(AT_FDCWD, path, 0, mask, &sx);
+ if (ret == -1)
+ ksft_exit_fail_msg("retrieving %s mount ID for %s: %s\n",
+ mask & STATX_MNT_ID_UNIQUE ? "unique" : "old",
+ name, strerror(errno));
+ if (!(sx.stx_mask & mask))
+ ksft_exit_fail_msg("no %s mount ID available for %s\n",
+ mask & STATX_MNT_ID_UNIQUE ? "unique" : "old",
+ name);
+
+ return sx.stx_mnt_id;
+}
+
+
+static char root_mntpoint[] = "/tmp/statmount_test_root.XXXXXX";
+static int orig_root;
+static uint64_t root_id, parent_id;
+static uint32_t old_root_id, old_parent_id;
+
+
+static void cleanup_namespace(void)
+{
+ fchdir(orig_root);
+ chroot(".");
+ umount2(root_mntpoint, MNT_DETACH);
+ rmdir(root_mntpoint);
+}
+
+static void setup_namespace(void)
+{
+ int ret;
+ char buf[32];
+ uid_t uid = getuid();
+ gid_t gid = getgid();
+
+ ret = unshare(CLONE_NEWNS|CLONE_NEWUSER);
+ if (ret == -1)
+ ksft_exit_fail_msg("unsharing mountns and userns: %s\n",
+ strerror(errno));
+
+ sprintf(buf, "0 %d 1", uid);
+ write_file("/proc/self/uid_map", buf);
+ write_file("/proc/self/setgroups", "deny");
+ sprintf(buf, "0 %d 1", gid);
+ write_file("/proc/self/gid_map", buf);
+
+ ret = mount("", "/", NULL, MS_REC|MS_PRIVATE, NULL);
+ if (ret == -1)
+ ksft_exit_fail_msg("making mount tree private: %s\n",
+ strerror(errno));
+
+ if (!mkdtemp(root_mntpoint))
+ ksft_exit_fail_msg("creating temporary directory %s: %s\n",
+ root_mntpoint, strerror(errno));
+
+ old_parent_id = get_mnt_id("parent", root_mntpoint, STATX_MNT_ID);
+ parent_id = get_mnt_id("parent", root_mntpoint, STATX_MNT_ID_UNIQUE);
+
+ orig_root = open("/", O_PATH);
+ if (orig_root == -1)
+ ksft_exit_fail_msg("opening root directory: %s",
+ strerror(errno));
+
+ atexit(cleanup_namespace);
+
+ ret = mount(root_mntpoint, root_mntpoint, NULL, MS_BIND, NULL);
+ if (ret == -1)
+ ksft_exit_fail_msg("mounting temp root %s: %s\n",
+ root_mntpoint, strerror(errno));
+
+ ret = chroot(root_mntpoint);
+ if (ret == -1)
+ ksft_exit_fail_msg("chroot to temp root %s: %s\n",
+ root_mntpoint, strerror(errno));
+
+ ret = chdir("/");
+ if (ret == -1)
+ ksft_exit_fail_msg("chdir to root: %s\n", strerror(errno));
+
+ old_root_id = get_mnt_id("root", "/", STATX_MNT_ID);
+ root_id = get_mnt_id("root", "/", STATX_MNT_ID_UNIQUE);
+}
+
+static int setup_mount_tree(int log2_num)
+{
+ int ret, i;
+
+ ret = mount("", "/", NULL, MS_REC|MS_SHARED, NULL);
+ if (ret == -1) {
+ ksft_test_result_fail("making mount tree shared: %s\n",
+ strerror(errno));
+ return -1;
+ }
+
+ for (i = 0; i < log2_num; i++) {
+ ret = mount("/", "/", NULL, MS_BIND, NULL);
+ if (ret == -1) {
+ ksft_test_result_fail("mounting submount %s: %s\n",
+ root_mntpoint, strerror(errno));
+ return -1;
+ }
+ }
+ return 0;
+}
+
+static ssize_t listmount(uint64_t mnt_id, uint64_t last_mnt_id,
+ uint64_t list[], size_t num, unsigned int flags)
+{
+ struct mnt_id_req req = {
+ .size = MNT_ID_REQ_SIZE_VER0,
+ .mnt_id = mnt_id,
+ .param = last_mnt_id,
+ };
+
+ return syscall(__NR_listmount, &req, list, num, flags);
+}
+
+static void test_listmount_empty_root(void)
+{
+ ssize_t res;
+ const unsigned int size = 32;
+ uint64_t list[size];
+
+ res = listmount(LSMT_ROOT, 0, list, size, 0);
+ if (res == -1) {
+ ksft_test_result_fail("listmount: %s\n", strerror(errno));
+ return;
+ }
+ if (res != 1) {
+ ksft_test_result_fail("listmount result is %zi != 1\n", res);
+ return;
+ }
+
+ if (list[0] != root_id) {
+ ksft_test_result_fail("listmount ID doesn't match 0x%llx != 0x%llx\n",
+ (unsigned long long) list[0],
+ (unsigned long long) root_id);
+ return;
+ }
+
+ ksft_test_result_pass("listmount empty root\n");
+}
+
+static void test_statmount_zero_mask(void)
+{
+ struct statmount sm;
+ int ret;
+
+ ret = statmount(root_id, 0, &sm, sizeof(sm), 0);
+ if (ret == -1) {
+ ksft_test_result_fail("statmount zero mask: %s\n",
+ strerror(errno));
+ return;
+ }
+ if (sm.size != sizeof(sm)) {
+ ksft_test_result_fail("unexpected size: %u != %u\n",
+ sm.size, (uint32_t) sizeof(sm));
+ return;
+ }
+ if (sm.mask != 0) {
+ ksft_test_result_fail("unexpected mask: 0x%llx != 0x0\n",
+ (unsigned long long) sm.mask);
+ return;
+ }
+
+ ksft_test_result_pass("statmount zero mask\n");
+}
+
+static void test_statmount_mnt_basic(void)
+{
+ struct statmount sm;
+ int ret;
+ uint64_t mask = STATMOUNT_MNT_BASIC;
+
+ ret = statmount(root_id, mask, &sm, sizeof(sm), 0);
+ if (ret == -1) {
+ ksft_test_result_fail("statmount mnt basic: %s\n",
+ strerror(errno));
+ return;
+ }
+ if (sm.size != sizeof(sm)) {
+ ksft_test_result_fail("unexpected size: %u != %u\n",
+ sm.size, (uint32_t) sizeof(sm));
+ return;
+ }
+ if (sm.mask != mask) {
+ ksft_test_result_skip("statmount mnt basic unavailable\n");
+ return;
+ }
+
+ if (sm.mnt_id != root_id) {
+ ksft_test_result_fail("unexpected root ID: 0x%llx != 0x%llx\n",
+ (unsigned long long) sm.mnt_id,
+ (unsigned long long) root_id);
+ return;
+ }
+
+ if (sm.mnt_id_old != old_root_id) {
+ ksft_test_result_fail("unexpected old root ID: %u != %u\n",
+ sm.mnt_id_old, old_root_id);
+ return;
+ }
+
+ if (sm.mnt_parent_id != parent_id) {
+ ksft_test_result_fail("unexpected parent ID: 0x%llx != 0x%llx\n",
+ (unsigned long long) sm.mnt_parent_id,
+ (unsigned long long) parent_id);
+ return;
+ }
+
+ if (sm.mnt_parent_id_old != old_parent_id) {
+ ksft_test_result_fail("unexpected old parent ID: %u != %u\n",
+ sm.mnt_parent_id_old, old_parent_id);
+ return;
+ }
+
+ if (sm.mnt_propagation != MS_PRIVATE) {
+ ksft_test_result_fail("unexpected propagation: 0x%llx\n",
+ (unsigned long long) sm.mnt_propagation);
+ return;
+ }
+
+ ksft_test_result_pass("statmount mnt basic\n");
+}
+
+
+static void test_statmount_sb_basic(void)
+{
+ struct statmount sm;
+ int ret;
+ uint64_t mask = STATMOUNT_SB_BASIC;
+ struct statx sx;
+ struct statfs sf;
+
+ ret = statmount(root_id, mask, &sm, sizeof(sm), 0);
+ if (ret == -1) {
+ ksft_test_result_fail("statmount sb basic: %s\n",
+ strerror(errno));
+ return;
+ }
+ if (sm.size != sizeof(sm)) {
+ ksft_test_result_fail("unexpected size: %u != %u\n",
+ sm.size, (uint32_t) sizeof(sm));
+ return;
+ }
+ if (sm.mask != mask) {
+ ksft_test_result_skip("statmount sb basic unavailable\n");
+ return;
+ }
+
+ ret = statx(AT_FDCWD, "/", 0, 0, &sx);
+ if (ret == -1) {
+ ksft_test_result_fail("stat root failed: %s\n",
+ strerror(errno));
+ return;
+ }
+
+ if (sm.sb_dev_major != sx.stx_dev_major ||
+ sm.sb_dev_minor != sx.stx_dev_minor) {
+ ksft_test_result_fail("unexpected sb dev %u:%u != %u:%u\n",
+ sm.sb_dev_major, sm.sb_dev_minor,
+ sx.stx_dev_major, sx.stx_dev_minor);
+ return;
+ }
+
+ ret = statfs("/", &sf);
+ if (ret == -1) {
+ ksft_test_result_fail("statfs root failed: %s\n",
+ strerror(errno));
+ return;
+ }
+
+ if (sm.sb_magic != sf.f_type) {
+ ksft_test_result_fail("unexpected sb magic: 0x%llx != 0x%lx\n",
+ (unsigned long long) sm.sb_magic,
+ sf.f_type);
+ return;
+ }
+
+ ksft_test_result_pass("statmount sb basic\n");
+}
+
+static void test_statmount_mnt_point(void)
+{
+ struct statmount *sm;
+
+ sm = statmount_alloc(root_id, STATMOUNT_MNT_POINT, 0);
+ if (!sm) {
+ ksft_test_result_fail("statmount mount point: %s\n",
+ strerror(errno));
+ return;
+ }
+
+ if (strcmp(sm->str + sm->mnt_point, "/") != 0) {
+ ksft_test_result_fail("unexpected mount point: '%s' != '/'\n",
+ sm->str + sm->mnt_point);
+ goto out;
+ }
+ ksft_test_result_pass("statmount mount point\n");
+out:
+ free(sm);
+}
+
+static void test_statmount_mnt_root(void)
+{
+ struct statmount *sm;
+ const char *mnt_root, *last_dir, *last_root;
+
+ last_dir = strrchr(root_mntpoint, '/');
+ assert(last_dir);
+ last_dir++;
+
+ sm = statmount_alloc(root_id, STATMOUNT_MNT_ROOT, 0);
+ if (!sm) {
+ ksft_test_result_fail("statmount mount root: %s\n",
+ strerror(errno));
+ return;
+ }
+ mnt_root = sm->str + sm->mnt_root;
+ last_root = strrchr(mnt_root, '/');
+ if (last_root)
+ last_root++;
+ else
+ last_root = mnt_root;
+
+ if (strcmp(last_dir, last_root) != 0) {
+ ksft_test_result_fail("unexpected mount root last component: '%s' != '%s'\n",
+ last_root, last_dir);
+ goto out;
+ }
+ ksft_test_result_pass("statmount mount root\n");
+out:
+ free(sm);
+}
+
+static void test_statmount_fs_type(void)
+{
+ struct statmount *sm;
+ const char *fs_type;
+ const char *const *s;
+
+ sm = statmount_alloc(root_id, STATMOUNT_FS_TYPE, 0);
+ if (!sm) {
+ ksft_test_result_fail("statmount fs type: %s\n",
+ strerror(errno));
+ return;
+ }
+ fs_type = sm->str + sm->fs_type;
+ for (s = known_fs; s != NULL; s++) {
+ if (strcmp(fs_type, *s) == 0)
+ break;
+ }
+ if (!s)
+ ksft_print_msg("unknown filesystem type: %s\n", fs_type);
+
+ ksft_test_result_pass("statmount fs type\n");
+ free(sm);
+}
+
+static void test_statmount_string(uint64_t mask, size_t off, const char *name)
+{
+ struct statmount *sm;
+ size_t len, shortsize, exactsize;
+ uint32_t start, i;
+ int ret;
+
+ sm = statmount_alloc(root_id, mask, 0);
+ if (!sm) {
+ ksft_test_result_fail("statmount %s: %s\n", name,
+ strerror(errno));
+ goto out;
+ }
+ if (sm->size < sizeof(*sm)) {
+ ksft_test_result_fail("unexpected size: %u < %u\n",
+ sm->size, (uint32_t) sizeof(*sm));
+ goto out;
+ }
+ if (sm->mask != mask) {
+ ksft_test_result_skip("statmount %s unavailable\n", name);
+ goto out;
+ }
+ len = sm->size - sizeof(*sm);
+ start = ((uint32_t *) sm)[off];
+
+ for (i = start;; i++) {
+ if (i >= len) {
+ ksft_test_result_fail("string out of bounds\n");
+ goto out;
+ }
+ if (!sm->str[i])
+ break;
+ }
+ exactsize = sm->size;
+ shortsize = sizeof(*sm) + i;
+
+ ret = statmount(root_id, mask, sm, exactsize, 0);
+ if (ret == -1) {
+ ksft_test_result_fail("statmount exact size: %s\n",
+ strerror(errno));
+ goto out;
+ }
+ errno = 0;
+ ret = statmount(root_id, mask, sm, shortsize, 0);
+ if (ret != -1 || errno != EOVERFLOW) {
+ ksft_test_result_fail("should have failed with EOVERFLOW: %s\n",
+ strerror(errno));
+ goto out;
+ }
+
+ ksft_test_result_pass("statmount string %s\n", name);
+out:
+ free(sm);
+}
+
+static void test_listmount_tree(void)
+{
+ ssize_t res;
+ const unsigned int log2_num = 4;
+ const unsigned int step = 3;
+ const unsigned int size = (1 << log2_num) + step + 1;
+ size_t num, expect = 1 << log2_num;
+ uint64_t list[size];
+ uint64_t list2[size];
+ size_t i;
+
+
+ res = setup_mount_tree(log2_num);
+ if (res == -1)
+ return;
+
+ num = res = listmount(LSMT_ROOT, 0, list, size, 0);
+ if (res == -1) {
+ ksft_test_result_fail("listmount: %s\n", strerror(errno));
+ return;
+ }
+ if (num != expect) {
+ ksft_test_result_fail("listmount result is %zi != %zi\n",
+ res, expect);
+ return;
+ }
+
+ for (i = 0; i < size - step;) {
+ res = listmount(LSMT_ROOT, i ? list2[i - 1] : 0, list2 + i, step, 0);
+ if (res == -1)
+ ksft_test_result_fail("short listmount: %s\n",
+ strerror(errno));
+ i += res;
+ if (res < step)
+ break;
+ }
+ if (i != num) {
+ ksft_test_result_fail("different number of entries: %zu != %zu\n",
+ i, num);
+ return;
+ }
+ for (i = 0; i < num; i++) {
+ if (list2[i] != list[i]) {
+ ksft_test_result_fail("different value for entry %zu: 0x%llx != 0x%llx\n",
+ i,
+ (unsigned long long) list2[i],
+ (unsigned long long) list[i]);
+ }
+ }
+
+ ksft_test_result_pass("listmount tree\n");
+}
+
+#define str_off(memb) (offsetof(struct statmount, memb) / sizeof(uint32_t))
+
+int main(void)
+{
+ int ret;
+ uint64_t all_mask = STATMOUNT_SB_BASIC | STATMOUNT_MNT_BASIC |
+ STATMOUNT_PROPAGATE_FROM | STATMOUNT_MNT_ROOT |
+ STATMOUNT_MNT_POINT | STATMOUNT_FS_TYPE;
+
+ ksft_print_header();
+
+ ret = statmount(0, 0, NULL, 0, 0);
+ assert(ret == -1);
+ if (errno == ENOSYS)
+ ksft_exit_skip("statmount() syscall not supported\n");
+
+ setup_namespace();
+
+ ksft_set_plan(14);
+ test_listmount_empty_root();
+ test_statmount_zero_mask();
+ test_statmount_mnt_basic();
+ test_statmount_sb_basic();
+ test_statmount_mnt_root();
+ test_statmount_mnt_point();
+ test_statmount_fs_type();
+ test_statmount_string(STATMOUNT_MNT_ROOT, str_off(mnt_root), "mount root");
+ test_statmount_string(STATMOUNT_MNT_POINT, str_off(mnt_point), "mount point");
+ test_statmount_string(STATMOUNT_FS_TYPE, str_off(fs_type), "fs type");
+ test_statmount_string(all_mask, str_off(mnt_root), "mount root & all");
+ test_statmount_string(all_mask, str_off(mnt_point), "mount point & all");
+ test_statmount_string(all_mask, str_off(fs_type), "fs type & all");
+
+ test_listmount_tree();
+
+
+ if (ksft_get_fail_cnt() + ksft_get_error_cnt() > 0)
+ ksft_exit_fail();
+ else
+ ksft_exit_pass();
+}
ARCH_DIR := $(ARCH)
endif
-ifeq ($(ARCH),arm64)
-tools_dir := $(top_srcdir)/tools
-arm64_tools_dir := $(tools_dir)/arch/arm64/tools/
-GEN_HDRS := $(top_srcdir)/tools/arch/arm64/include/generated/
-CFLAGS += -I$(GEN_HDRS)
-
-$(GEN_HDRS): $(wildcard $(arm64_tools_dir)/*)
- $(MAKE) -C $(arm64_tools_dir) O=$(tools_dir)
-endif
-
LIBKVM += lib/assert.c
LIBKVM += lib/elf.c
LIBKVM += lib/guest_modes.c
ifeq ($(ARCH),s390)
CFLAGS += -march=z10
endif
+ifeq ($(ARCH),arm64)
+tools_dir := $(top_srcdir)/tools
+arm64_tools_dir := $(tools_dir)/arch/arm64/tools/
+
+ifneq ($(abs_objdir),)
+arm64_hdr_outdir := $(abs_objdir)/tools/
+else
+arm64_hdr_outdir := $(tools_dir)/
+endif
+
+GEN_HDRS := $(arm64_hdr_outdir)arch/arm64/include/generated/
+CFLAGS += -I$(GEN_HDRS)
+
+$(GEN_HDRS): $(wildcard $(arm64_tools_dir)/*)
+ $(MAKE) -C $(arm64_tools_dir) OUTPUT=$(arm64_hdr_outdir)
+endif
no-pie-option := $(call try-run, echo 'int main(void) { return 0; }' | \
$(CC) -Werror $(CFLAGS) -no-pie -x c - -o "$$TMP", -no-pie)
static uint64_t get_pmcr_n(uint64_t pmcr)
{
- return (pmcr >> ARMV8_PMU_PMCR_N_SHIFT) & ARMV8_PMU_PMCR_N_MASK;
+ return FIELD_GET(ARMV8_PMU_PMCR_N, pmcr);
}
static void set_pmcr_n(uint64_t *pmcr, uint64_t pmcr_n)
{
- *pmcr = *pmcr & ~(ARMV8_PMU_PMCR_N_MASK << ARMV8_PMU_PMCR_N_SHIFT);
- *pmcr |= (pmcr_n << ARMV8_PMU_PMCR_N_SHIFT);
+ u64p_replace_bits((__u64 *) pmcr, pmcr_n, ARMV8_PMU_PMCR_N);
}
static uint64_t get_counters_mask(uint64_t n)
for_each_sublist(c, s) {
if (!strcmp(s->name, "base"))
continue;
- strcat(c->name + len, s->name);
- len += strlen(s->name) + 1;
- c->name[len - 1] = '+';
+ if (len)
+ c->name[len++] = '+';
+ strcpy(c->name + len, s->name);
+ len += strlen(s->name);
}
- c->name[len - 1] = '\0';
+ c->name[len] = '\0';
return c->name;
}
reg_size = "KVM_REG_SIZE_U128";
break;
default:
- printf("\tKVM_REG_RISCV | (%lld << KVM_REG_SIZE_SHIFT) | 0x%llx /* UNKNOWN */,",
- (id & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT, id & REG_MASK);
+ printf("\tKVM_REG_RISCV | (%lld << KVM_REG_SIZE_SHIFT) | 0x%llx /* UNKNOWN */,\n",
+ (id & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT, id & ~REG_MASK);
+ return;
}
switch (id & KVM_REG_RISCV_TYPE_MASK) {
reg_size, sbi_ext_id_to_str(prefix, id));
break;
default:
- printf("\tKVM_REG_RISCV | %s | 0x%llx /* UNKNOWN */,",
- reg_size, id & REG_MASK);
+ printf("\tKVM_REG_RISCV | %s | 0x%llx /* UNKNOWN */,\n",
+ reg_size, id & ~REG_MASK);
+ return;
}
}
char *mem;
len = mlock_limit_cur;
+ if (len % page_size != 0)
+ len = (len/page_size) * page_size;
+
mem = mmap(NULL, len, prot, mode, fd, 0);
if (mem == MAP_FAILED) {
fail("unable to mmap secret memory\n");
TEST_PROGS += test_vxlan_nolocalbypass.sh
TEST_PROGS += test_bridge_backup_port.sh
TEST_PROGS += fdb_flush.sh
+TEST_PROGS += vlan_hw_filter.sh
TEST_FILES := settings
fi
if reset "mpc backup" &&
- continue_if mptcp_lib_kallsyms_doesnt_have "mptcp_subflow_send_ack$"; then
+ continue_if mptcp_lib_kallsyms_doesnt_have "T mptcp_subflow_send_ack$"; then
pm_nl_add_endpoint $ns2 10.0.1.2 flags subflow,backup
speed=slow \
run_tests $ns1 $ns2 10.0.1.1
fi
if reset "mpc backup both sides" &&
- continue_if mptcp_lib_kallsyms_doesnt_have "mptcp_subflow_send_ack$"; then
+ continue_if mptcp_lib_kallsyms_doesnt_have "T mptcp_subflow_send_ack$"; then
pm_nl_add_endpoint $ns1 10.0.1.1 flags subflow,backup
pm_nl_add_endpoint $ns2 10.0.1.2 flags subflow,backup
speed=slow \
fi
if reset "mpc switch to backup" &&
- continue_if mptcp_lib_kallsyms_doesnt_have "mptcp_subflow_send_ack$"; then
+ continue_if mptcp_lib_kallsyms_doesnt_have "T mptcp_subflow_send_ack$"; then
pm_nl_add_endpoint $ns2 10.0.1.2 flags subflow
sflags=backup speed=slow \
run_tests $ns1 $ns2 10.0.1.1
fi
if reset "mpc switch to backup both sides" &&
- continue_if mptcp_lib_kallsyms_doesnt_have "mptcp_subflow_send_ack$"; then
+ continue_if mptcp_lib_kallsyms_doesnt_have "T mptcp_subflow_send_ack$"; then
pm_nl_add_endpoint $ns1 10.0.1.1 flags subflow
pm_nl_add_endpoint $ns2 10.0.1.2 flags subflow
sflags=backup speed=slow \
done
sleep 5
- run_cmd_grep "10.23.11." ip addr show dev "$devdummy"
+ run_cmd_grep_fail "10.23.11." ip addr show dev "$devdummy"
if [ $? -eq 0 ]; then
check_err 1
end_test "FAIL: preferred_lft addresses remaining"
--- /dev/null
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+
+readonly NETNS="ns-$(mktemp -u XXXXXX)"
+
+ret=0
+
+cleanup() {
+ ip netns del $NETNS
+}
+
+trap cleanup EXIT
+
+fail() {
+ echo "ERROR: ${1:-unexpected return code} (ret: $_)" >&2
+ ret=1
+}
+
+ip netns add ${NETNS}
+ip netns exec ${NETNS} ip link add bond0 type bond mode 0
+ip netns exec ${NETNS} ip link add bond_slave_1 type veth peer veth2
+ip netns exec ${NETNS} ip link set bond_slave_1 master bond0
+ip netns exec ${NETNS} ethtool -K bond0 rx-vlan-filter off
+ip netns exec ${NETNS} ip link add link bond_slave_1 name bond_slave_1.0 type vlan id 0
+ip netns exec ${NETNS} ip link add link bond0 name bond0.0 type vlan id 0
+ip netns exec ${NETNS} ip link set bond_slave_1 nomaster
+ip netns exec ${NETNS} ip link del veth2 || fail "Please check vlan HW filter function"
+
+exit $ret
vphn \
math \
papr_attributes \
+ papr_vpd \
+ papr_sysparm \
ptrace \
security \
mce
--- /dev/null
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright 2023, Michael Ellerman, IBM Corporation.
+ */
+
+#ifndef _SELFTESTS_POWERPC_FPU_H
+#define _SELFTESTS_POWERPC_FPU_H
+
+static inline void randomise_darray(double *darray, int num)
+{
+ long val;
+
+ for (int i = 0; i < num; i++) {
+ val = random();
+ if (val & 1)
+ val *= -1;
+
+ if (val & 2)
+ darray[i] = 1.0 / val;
+ else
+ darray[i] = val * val;
+ }
+}
+
+#endif /* _SELFTESTS_POWERPC_FPU_H */
li r3,0 # Success!!!
1: blr
+
+// int check_all_fprs(double darray[32])
+FUNC_START(check_all_fprs)
+ PUSH_BASIC_STACK(8)
+ mr r4, r3 // r4 = darray
+ li r3, 1 // prepare for failure
+
+ stfd f31, STACK_FRAME_LOCAL(0, 0)(sp) // backup f31
+
+ // Check regs f0-f30, using f31 as scratch
+ .set i, 0
+ .rept 31
+ lfd f31, (8 * i)(r4) // load expected value
+ fcmpu cr0, i, f31 // compare
+ bne cr0, 1f // bail if mismatch
+ .set i, i + 1
+ .endr
+
+ lfd f31, STACK_FRAME_LOCAL(0, 0)(sp) // reload f31
+ stfd f30, STACK_FRAME_LOCAL(0, 0)(sp) // backup f30
+
+ lfd f30, (8 * 31)(r4) // load expected value of f31
+ fcmpu cr0, f30, f31 // compare
+ bne cr0, 1f // bail if mismatch
+
+ lfd f30, STACK_FRAME_LOCAL(0, 0)(sp) // reload f30
+
+ // Success
+ li r3, 0
+
+1: POP_BASIC_STACK(8)
+ blr
+FUNC_END(check_all_fprs)
+
FUNC_START(test_fpu)
# r3 holds pointer to where to put the result of fork
# r4 holds pointer to the pid
std r3,STACK_FRAME_PARAM(0)(sp) # Address of darray
std r4,STACK_FRAME_PARAM(1)(sp) # Address of pid
- bl load_fpu
- nop
+ // Load FPRs with expected values
+ OP_REGS lfd, 8, 0, 31, r3
+
li r0,__NR_fork
sc
std r3,0(r9)
ld r3,STACK_FRAME_PARAM(0)(sp)
- bl check_fpu
+ bl check_all_fprs
nop
POP_FPU(256)
std r4,STACK_FRAME_PARAM(1)(sp) # int *threads_starting
std r5,STACK_FRAME_PARAM(2)(sp) # int *running
- bl load_fpu
- nop
+ // Load FPRs with expected values
+ OP_REGS lfd, 8, 0, 31, r3
sync
# Atomic DEC
bne- 1b
2: ld r3,STACK_FRAME_PARAM(0)(sp)
- bl check_fpu
- nop
+ bl check_all_fprs
cmpdi r3,0
bne 3f
ld r4,STACK_FRAME_PARAM(2)(sp)
// SPDX-License-Identifier: GPL-2.0-or-later
/*
* Copyright 2015, Cyril Bur, IBM Corp.
+ * Copyright 2023, Michael Ellerman, IBM Corp.
*
* This test attempts to see if the FPU registers change across preemption.
- * Two things should be noted here a) The check_fpu function in asm only checks
- * the non volatile registers as it is reused from the syscall test b) There is
- * no way to be sure preemption happened so this test just uses many threads
- * and a long wait. As such, a successful test doesn't mean much but a failure
- * is bad.
+ * There is no way to be sure preemption happened so this test just uses many
+ * threads and a long wait. As such, a successful test doesn't mean much but
+ * a failure is bad.
*/
#include <stdio.h>
#include <pthread.h>
#include "utils.h"
+#include "fpu.h"
/* Time to wait for workers to get preempted (seconds) */
-#define PREEMPT_TIME 20
+#define PREEMPT_TIME 60
/*
* Factor by which to multiply number of online CPUs for total number of
* worker threads
#define THREAD_FACTOR 8
-__thread double darray[] = {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0,
- 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0,
- 2.1};
+__thread double darray[32];
int threads_starting;
int running;
-extern void preempt_fpu(double *darray, int *threads_starting, int *running);
+extern int preempt_fpu(double *darray, int *threads_starting, int *running);
void *preempt_fpu_c(void *p)
{
- int i;
- srand(pthread_self());
- for (i = 0; i < 21; i++)
- darray[i] = rand();
+ long rc;
- /* Test failed if it ever returns */
- preempt_fpu(darray, &threads_starting, &running);
+ srand(pthread_self());
+ randomise_darray(darray, ARRAY_SIZE(darray));
+ rc = preempt_fpu(darray, &threads_starting, &running);
- return p;
+ return (void *)rc;
}
int test_preempt_fpu(void)
#include <stdlib.h>
#include "utils.h"
+#include "fpu.h"
extern int test_fpu(double *darray, pid_t *pid);
-double darray[] = {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0,
- 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0,
- 2.1};
+double darray[32];
int syscall_fpu(void)
{
int i;
int ret;
int child_ret;
+
+ randomise_darray(darray, ARRAY_SIZE(darray));
+
for (i = 0; i < 1000; i++) {
/* test_fpu will fork() */
ret = test_fpu(darray, &fork_pid);
int threads_starting;
int running;
-extern void preempt_vmx(vector int *varray, int *threads_starting, int *running);
+extern int preempt_vmx(vector int *varray, int *threads_starting, int *running);
void *preempt_vmx_c(void *p)
{
int i, j;
+ long rc;
+
srand(pthread_self());
for (i = 0; i < 12; i++)
for (j = 0; j < 4; j++)
varray[i][j] = rand();
- /* Test fails if it ever returns */
- preempt_vmx(varray, &threads_starting, &running);
- return p;
+ rc = preempt_vmx(varray, &threads_starting, &running);
+
+ return (void *)rc;
}
int test_preempt_vmx(void)
--- /dev/null
+/papr_sysparm
--- /dev/null
+# SPDX-License-Identifier: GPL-2.0
+noarg:
+ $(MAKE) -C ../
+
+TEST_GEN_PROGS := papr_sysparm
+
+top_srcdir = ../../../../..
+include ../../lib.mk
+
+$(TEST_GEN_PROGS): ../harness.c ../utils.c
+
+$(OUTPUT)/papr_sysparm: CFLAGS += $(KHDR_INCLUDES)
--- /dev/null
+// SPDX-License-Identifier: GPL-2.0-only
+#include <errno.h>
+#include <fcntl.h>
+#include <stdlib.h>
+#include <sys/ioctl.h>
+#include <unistd.h>
+#include <asm/papr-sysparm.h>
+
+#include "utils.h"
+
+#define DEVPATH "/dev/papr-sysparm"
+
+static int open_close(void)
+{
+ const int devfd = open(DEVPATH, O_RDONLY);
+
+ SKIP_IF_MSG(devfd < 0 && errno == ENOENT,
+ DEVPATH " not present");
+
+ FAIL_IF(devfd < 0);
+ FAIL_IF(close(devfd) != 0);
+
+ return 0;
+}
+
+static int get_splpar(void)
+{
+ struct papr_sysparm_io_block sp = {
+ .parameter = 20, // SPLPAR characteristics
+ };
+ const int devfd = open(DEVPATH, O_RDONLY);
+
+ SKIP_IF_MSG(devfd < 0 && errno == ENOENT,
+ DEVPATH " not present");
+
+ FAIL_IF(devfd < 0);
+ FAIL_IF(ioctl(devfd, PAPR_SYSPARM_IOC_GET, &sp) != 0);
+ FAIL_IF(sp.length == 0);
+ FAIL_IF(sp.length > sizeof(sp.data));
+ FAIL_IF(close(devfd) != 0);
+
+ return 0;
+}
+
+static int get_bad_parameter(void)
+{
+ struct papr_sysparm_io_block sp = {
+ .parameter = UINT32_MAX, // there are only ~60 specified parameters
+ };
+ const int devfd = open(DEVPATH, O_RDONLY);
+
+ SKIP_IF_MSG(devfd < 0 && errno == ENOENT,
+ DEVPATH " not present");
+
+ FAIL_IF(devfd < 0);
+
+ // Ensure expected error
+ FAIL_IF(ioctl(devfd, PAPR_SYSPARM_IOC_GET, &sp) != -1);
+ FAIL_IF(errno != EOPNOTSUPP);
+
+ // Ensure the buffer is unchanged
+ FAIL_IF(sp.length != 0);
+ for (size_t i = 0; i < ARRAY_SIZE(sp.data); ++i)
+ FAIL_IF(sp.data[i] != 0);
+
+ FAIL_IF(close(devfd) != 0);
+
+ return 0;
+}
+
+static int check_efault_common(unsigned long cmd)
+{
+ const int devfd = open(DEVPATH, O_RDWR);
+
+ SKIP_IF_MSG(devfd < 0 && errno == ENOENT,
+ DEVPATH " not present");
+
+ FAIL_IF(devfd < 0);
+
+ // Ensure expected error
+ FAIL_IF(ioctl(devfd, cmd, NULL) != -1);
+ FAIL_IF(errno != EFAULT);
+
+ FAIL_IF(close(devfd) != 0);
+
+ return 0;
+}
+
+static int check_efault_get(void)
+{
+ return check_efault_common(PAPR_SYSPARM_IOC_GET);
+}
+
+static int check_efault_set(void)
+{
+ return check_efault_common(PAPR_SYSPARM_IOC_SET);
+}
+
+static int set_hmc0(void)
+{
+ struct papr_sysparm_io_block sp = {
+ .parameter = 0, // HMC0, not a settable parameter
+ };
+ const int devfd = open(DEVPATH, O_RDWR);
+
+ SKIP_IF_MSG(devfd < 0 && errno == ENOENT,
+ DEVPATH " not present");
+
+ FAIL_IF(devfd < 0);
+
+ // Ensure expected error
+ FAIL_IF(ioctl(devfd, PAPR_SYSPARM_IOC_SET, &sp) != -1);
+ SKIP_IF_MSG(errno == EOPNOTSUPP, "operation not supported");
+ FAIL_IF(errno != EPERM);
+
+ FAIL_IF(close(devfd) != 0);
+
+ return 0;
+}
+
+static int set_with_ro_fd(void)
+{
+ struct papr_sysparm_io_block sp = {
+ .parameter = 0, // HMC0, not a settable parameter.
+ };
+ const int devfd = open(DEVPATH, O_RDONLY);
+
+ SKIP_IF_MSG(devfd < 0 && errno == ENOENT,
+ DEVPATH " not present");
+
+ FAIL_IF(devfd < 0);
+
+ // Ensure expected error
+ FAIL_IF(ioctl(devfd, PAPR_SYSPARM_IOC_SET, &sp) != -1);
+ SKIP_IF_MSG(errno == EOPNOTSUPP, "operation not supported");
+
+ // HMC0 isn't a settable parameter and we would normally
+ // expect to get EPERM on attempts to modify it. However, when
+ // the file is open read-only, we expect the driver to prevent
+ // the attempt with a distinct error.
+ FAIL_IF(errno != EBADF);
+
+ FAIL_IF(close(devfd) != 0);
+
+ return 0;
+}
+
+struct sysparm_test {
+ int (*function)(void);
+ const char *description;
+};
+
+static const struct sysparm_test sysparm_tests[] = {
+ {
+ .function = open_close,
+ .description = "open and close " DEVPATH " without issuing commands",
+ },
+ {
+ .function = get_splpar,
+ .description = "retrieve SPLPAR characteristics",
+ },
+ {
+ .function = get_bad_parameter,
+ .description = "verify EOPNOTSUPP for known-bad parameter",
+ },
+ {
+ .function = check_efault_get,
+ .description = "PAPR_SYSPARM_IOC_GET returns EFAULT on bad address",
+ },
+ {
+ .function = check_efault_set,
+ .description = "PAPR_SYSPARM_IOC_SET returns EFAULT on bad address",
+ },
+ {
+ .function = set_hmc0,
+ .description = "ensure EPERM on attempt to update HMC0",
+ },
+ {
+ .function = set_with_ro_fd,
+ .description = "PAPR_IOC_SYSPARM_SET returns EACCES on read-only fd",
+ },
+};
+
+int main(void)
+{
+ size_t fails = 0;
+
+ for (size_t i = 0; i < ARRAY_SIZE(sysparm_tests); ++i) {
+ const struct sysparm_test *t = &sysparm_tests[i];
+
+ if (test_harness(t->function, t->description))
+ ++fails;
+ }
+
+ return fails == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
+}
--- /dev/null
+# SPDX-License-Identifier: GPL-2.0
+noarg:
+ $(MAKE) -C ../
+
+TEST_GEN_PROGS := papr_vpd
+
+top_srcdir = ../../../../..
+include ../../lib.mk
+
+$(TEST_GEN_PROGS): ../harness.c ../utils.c
+
+$(OUTPUT)/papr_vpd: CFLAGS += $(KHDR_INCLUDES)
--- /dev/null
+// SPDX-License-Identifier: GPL-2.0-only
+#define _GNU_SOURCE
+#include <errno.h>
+#include <fcntl.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/ioctl.h>
+#include <unistd.h>
+
+#include <asm/papr-vpd.h>
+
+#include "utils.h"
+
+#define DEVPATH "/dev/papr-vpd"
+
+static int dev_papr_vpd_open_close(void)
+{
+ const int devfd = open(DEVPATH, O_RDONLY);
+
+ SKIP_IF_MSG(devfd < 0 && errno == ENOENT,
+ DEVPATH " not present");
+
+ FAIL_IF(devfd < 0);
+ FAIL_IF(close(devfd) != 0);
+
+ return 0;
+}
+
+static int dev_papr_vpd_get_handle_all(void)
+{
+ const int devfd = open(DEVPATH, O_RDONLY);
+ struct papr_location_code lc = { .str = "", };
+ off_t size;
+ int fd;
+
+ SKIP_IF_MSG(devfd < 0 && errno == ENOENT,
+ DEVPATH " not present");
+
+ FAIL_IF(devfd < 0);
+
+ errno = 0;
+ fd = ioctl(devfd, PAPR_VPD_IOC_CREATE_HANDLE, &lc);
+ FAIL_IF(errno != 0);
+ FAIL_IF(fd < 0);
+
+ FAIL_IF(close(devfd) != 0);
+
+ size = lseek(fd, 0, SEEK_END);
+ FAIL_IF(size <= 0);
+
+ void *buf = malloc((size_t)size);
+ FAIL_IF(!buf);
+
+ ssize_t consumed = pread(fd, buf, size, 0);
+ FAIL_IF(consumed != size);
+
+ /* Ensure EOF */
+ FAIL_IF(read(fd, buf, size) != 0);
+ FAIL_IF(close(fd));
+
+ /* Verify that the buffer looks like VPD */
+ static const char needle[] = "System VPD";
+ FAIL_IF(!memmem(buf, size, needle, strlen(needle)));
+
+ return 0;
+}
+
+static int dev_papr_vpd_get_handle_byte_at_a_time(void)
+{
+ const int devfd = open(DEVPATH, O_RDONLY);
+ struct papr_location_code lc = { .str = "", };
+ int fd;
+
+ SKIP_IF_MSG(devfd < 0 && errno == ENOENT,
+ DEVPATH " not present");
+
+ FAIL_IF(devfd < 0);
+
+ errno = 0;
+ fd = ioctl(devfd, PAPR_VPD_IOC_CREATE_HANDLE, &lc);
+ FAIL_IF(errno != 0);
+ FAIL_IF(fd < 0);
+
+ FAIL_IF(close(devfd) != 0);
+
+ size_t consumed = 0;
+ while (1) {
+ ssize_t res;
+ char c;
+
+ errno = 0;
+ res = read(fd, &c, sizeof(c));
+ FAIL_IF(res > sizeof(c));
+ FAIL_IF(res < 0);
+ FAIL_IF(errno != 0);
+ consumed += res;
+ if (res == 0)
+ break;
+ }
+
+ FAIL_IF(consumed != lseek(fd, 0, SEEK_END));
+
+ FAIL_IF(close(fd));
+
+ return 0;
+}
+
+
+static int dev_papr_vpd_unterm_loc_code(void)
+{
+ const int devfd = open(DEVPATH, O_RDONLY);
+ struct papr_location_code lc = {};
+ int fd;
+
+ SKIP_IF_MSG(devfd < 0 && errno == ENOENT,
+ DEVPATH " not present");
+
+ FAIL_IF(devfd < 0);
+
+ /*
+ * Place a non-null byte in every element of loc_code; the
+ * driver should reject this input.
+ */
+ memset(lc.str, 'x', ARRAY_SIZE(lc.str));
+
+ errno = 0;
+ fd = ioctl(devfd, PAPR_VPD_IOC_CREATE_HANDLE, &lc);
+ FAIL_IF(fd != -1);
+ FAIL_IF(errno != EINVAL);
+
+ FAIL_IF(close(devfd) != 0);
+ return 0;
+}
+
+static int dev_papr_vpd_null_handle(void)
+{
+ const int devfd = open(DEVPATH, O_RDONLY);
+ int rc;
+
+ SKIP_IF_MSG(devfd < 0 && errno == ENOENT,
+ DEVPATH " not present");
+
+ FAIL_IF(devfd < 0);
+
+ errno = 0;
+ rc = ioctl(devfd, PAPR_VPD_IOC_CREATE_HANDLE, NULL);
+ FAIL_IF(rc != -1);
+ FAIL_IF(errno != EFAULT);
+
+ FAIL_IF(close(devfd) != 0);
+ return 0;
+}
+
+static int papr_vpd_close_handle_without_reading(void)
+{
+ const int devfd = open(DEVPATH, O_RDONLY);
+ struct papr_location_code lc;
+ int fd;
+
+ SKIP_IF_MSG(devfd < 0 && errno == ENOENT,
+ DEVPATH " not present");
+
+ FAIL_IF(devfd < 0);
+
+ errno = 0;
+ fd = ioctl(devfd, PAPR_VPD_IOC_CREATE_HANDLE, &lc);
+ FAIL_IF(errno != 0);
+ FAIL_IF(fd < 0);
+
+ /* close the handle without reading it */
+ FAIL_IF(close(fd) != 0);
+
+ FAIL_IF(close(devfd) != 0);
+ return 0;
+}
+
+static int papr_vpd_reread(void)
+{
+ const int devfd = open(DEVPATH, O_RDONLY);
+ struct papr_location_code lc = { .str = "", };
+ int fd;
+
+ SKIP_IF_MSG(devfd < 0 && errno == ENOENT,
+ DEVPATH " not present");
+
+ FAIL_IF(devfd < 0);
+
+ errno = 0;
+ fd = ioctl(devfd, PAPR_VPD_IOC_CREATE_HANDLE, &lc);
+ FAIL_IF(errno != 0);
+ FAIL_IF(fd < 0);
+
+ FAIL_IF(close(devfd) != 0);
+
+ const off_t size = lseek(fd, 0, SEEK_END);
+ FAIL_IF(size <= 0);
+
+ char *bufs[2];
+
+ for (size_t i = 0; i < ARRAY_SIZE(bufs); ++i) {
+ bufs[i] = malloc(size);
+ FAIL_IF(!bufs[i]);
+ ssize_t consumed = pread(fd, bufs[i], size, 0);
+ FAIL_IF(consumed != size);
+ }
+
+ FAIL_IF(memcmp(bufs[0], bufs[1], size));
+
+ FAIL_IF(close(fd) != 0);
+
+ return 0;
+}
+
+static int get_system_loc_code(struct papr_location_code *lc)
+{
+ static const char system_id_path[] = "/sys/firmware/devicetree/base/system-id";
+ static const char model_path[] = "/sys/firmware/devicetree/base/model";
+ char *system_id;
+ char *model;
+ int err = -1;
+
+ if (read_file_alloc(model_path, &model, NULL))
+ return err;
+
+ if (read_file_alloc(system_id_path, &system_id, NULL))
+ goto free_model;
+
+ char *mtm;
+ int sscanf_ret = sscanf(model, "IBM,%ms", &mtm);
+ if (sscanf_ret != 1)
+ goto free_system_id;
+
+ char *plant_and_seq;
+ if (sscanf(system_id, "IBM,%*c%*c%ms", &plant_and_seq) != 1)
+ goto free_mtm;
+ /*
+ * Replace - with . to build location code.
+ */
+ char *sep = strchr(mtm, '-');
+ if (!sep)
+ goto free_mtm;
+ else
+ *sep = '.';
+
+ snprintf(lc->str, sizeof(lc->str),
+ "U%s.%s", mtm, plant_and_seq);
+ err = 0;
+
+ free(plant_and_seq);
+free_mtm:
+ free(mtm);
+free_system_id:
+ free(system_id);
+free_model:
+ free(model);
+ return err;
+}
+
+static int papr_vpd_system_loc_code(void)
+{
+ struct papr_location_code lc;
+ const int devfd = open(DEVPATH, O_RDONLY);
+ off_t size;
+ int fd;
+
+ SKIP_IF_MSG(get_system_loc_code(&lc),
+ "Cannot determine system location code");
+ SKIP_IF_MSG(devfd < 0 && errno == ENOENT,
+ DEVPATH " not present");
+
+ FAIL_IF(devfd < 0);
+
+ errno = 0;
+ fd = ioctl(devfd, PAPR_VPD_IOC_CREATE_HANDLE, &lc);
+ FAIL_IF(errno != 0);
+ FAIL_IF(fd < 0);
+
+ FAIL_IF(close(devfd) != 0);
+
+ size = lseek(fd, 0, SEEK_END);
+ FAIL_IF(size <= 0);
+
+ void *buf = malloc((size_t)size);
+ FAIL_IF(!buf);
+
+ ssize_t consumed = pread(fd, buf, size, 0);
+ FAIL_IF(consumed != size);
+
+ /* Ensure EOF */
+ FAIL_IF(read(fd, buf, size) != 0);
+ FAIL_IF(close(fd));
+
+ /* Verify that the buffer looks like VPD */
+ static const char needle[] = "System VPD";
+ FAIL_IF(!memmem(buf, size, needle, strlen(needle)));
+
+ return 0;
+}
+
+struct vpd_test {
+ int (*function)(void);
+ const char *description;
+};
+
+static const struct vpd_test vpd_tests[] = {
+ {
+ .function = dev_papr_vpd_open_close,
+ .description = "open/close " DEVPATH,
+ },
+ {
+ .function = dev_papr_vpd_unterm_loc_code,
+ .description = "ensure EINVAL on unterminated location code",
+ },
+ {
+ .function = dev_papr_vpd_null_handle,
+ .description = "ensure EFAULT on bad handle addr",
+ },
+ {
+ .function = dev_papr_vpd_get_handle_all,
+ .description = "get handle for all VPD"
+ },
+ {
+ .function = papr_vpd_close_handle_without_reading,
+ .description = "close handle without consuming VPD"
+ },
+ {
+ .function = dev_papr_vpd_get_handle_byte_at_a_time,
+ .description = "read all VPD one byte at a time"
+ },
+ {
+ .function = papr_vpd_reread,
+ .description = "ensure re-read yields same results"
+ },
+ {
+ .function = papr_vpd_system_loc_code,
+ .description = "get handle for system VPD"
+ },
+};
+
+int main(void)
+{
+ size_t fails = 0;
+
+ for (size_t i = 0; i < ARRAY_SIZE(vpd_tests); ++i) {
+ const struct vpd_test *t = &vpd_tests[i];
+
+ if (test_harness(t->function, t->description))
+ ++fails;
+ }
+
+ return fails == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
+}
list_for_each_entry_srcu(irqfd, &resampler->list, resampler_link,
srcu_read_lock_held(&resampler->kvm->irq_srcu))
- eventfd_signal(irqfd->resamplefd, 1);
+ eventfd_signal(irqfd->resamplefd);
}
/*
if (!ioeventfd_in_range(p, addr, len, val))
return -EOPNOTSUPP;
- eventfd_signal(p->eventfd, 1);
+ eventfd_signal(p->eventfd);
return 0;
}
return r < 0 ? r : 0;
}
-/* Caller must hold slots_lock. */
int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
int len, struct kvm_io_device *dev)
{
struct kvm_io_bus *new_bus, *bus;
struct kvm_io_range range;
+ lockdep_assert_held(&kvm->slots_lock);
+
bus = kvm_get_bus(kvm, bus_idx);
if (!bus)
return -ENOMEM;