Description: Set timeout to issue discard commands during umount.
Default: 5 secs
+What: /sys/fs/f2fs/<disk>/pending_discard
+Date: November 2021
+Contact: "Jaegeuk Kim" <jaegeuk@kernel.org>
+Description: Shows the number of pending discard commands in the queue.
+
What: /sys/fs/f2fs/<disk>/max_victim_search
Date: January 2014
Contact: "Jaegeuk Kim" <jaegeuk.kim@samsung.com>
f2fs will allocate 1..<max_fragment_chunk> blocks in a chunk and make a hole
in the length of 1..<max_fragment_hole> by turns. This value can be set
between 1..512 and the default value is 4.
+
+What: /sys/fs/f2fs/<disk>/gc_urgent_high_remaining
+Date: December 2021
+Contact: "Daeho Jeong" <daehojeong@google.com>
+Description: You can set the trial count limit for GC urgent high mode with this value.
+ If GC thread gets to the limit, the mode will turn back to GC normal mode.
+ By default, the value is zero, which means there is no limit like before.
b) completion of synchronous block I/O initiated by the task
c) swapping in pages
d) memory reclaim
+e) thrashing page cache
+f) direct compact
and makes these statistics available to userspace through
the taskstats interface.
statistics. The delay accounting functionality populates specific fields of
this structure. See
- include/linux/taskstats.h
+ include/uapi/linux/taskstats.h
for a description of the fields pertaining to delay accounting.
It will generally be in the form of counters returning the cumulative
-delay seen for cpu, sync block I/O, swapin, memory reclaim etc.
+delay seen for cpu, sync block I/O, swapin, memory reclaim, thrash page
+cache, direct compact etc.
Taking the difference of two successive readings of a given
counter (say cpu_delay_total) for a task will give the delay
General format of the getdelays command::
- getdelays [-t tgid] [-p pid] [-c cmd...]
-
+ getdelays [-dilv] [-t tgid] [-p pid]
Get delays, since system boot, for pid 10::
- # ./getdelays -p 10
+ # ./getdelays -d -p 10
(output similar to next case)
Get sum of delays, since system boot, for all pids with tgid 5::
- # ./getdelays -t 5
-
-
- CPU count real total virtual total delay total
- 7876 92005750 100000000 24001500
- IO count delay total
- 0 0
- SWAP count delay total
- 0 0
- RECLAIM count delay total
- 0 0
+ # ./getdelays -d -t 5
+ print delayacct stats ON
+ TGID 5
-Get delays seen in executing a given simple command::
- # ./getdelays -c ls /
+ CPU count real total virtual total delay total delay average
+ 8 7000000 6872122 3382277 0.423ms
+ IO count delay total delay average
+ 0 0 0ms
+ SWAP count delay total delay average
+ 0 0 0ms
+ RECLAIM count delay total delay average
+ 0 0 0ms
+ THRASHING count delay total delay average
+ 0 0 0ms
+ COMPACT count delay total delay average
+ 0 0 0ms
- bin data1 data3 data5 dev home media opt root srv sys usr
- boot data2 data4 data6 etc lib mnt proc sbin subdomain tmp var
+Get IO accounting for pid 1, it works only with -p::
+ # ./getdelays -i -p 1
+ printing IO accounting
+ linuxrc: read=65536, write=0, cancelled_write=0
- CPU count real total virtual total delay total
- 6 4000250 4000000 0
- IO count delay total
- 0 0
- SWAP count delay total
- 0 0
- RECLAIM count delay total
- 0 0
+The above command can be used with -v to get more debug information.
title: Analogix ANX7814 SlimPort (Full-HD Transmitter)
maintainers:
- - Enric Balletbo i Serra <enric.balletbo@collabora.com>
+ - Andrzej Hajda <andrzej.hajda@intel.com>
+ - Neil Armstrong <narmstrong@baylibre.com>
+ - Robert Foss <robert.foss@linaro.org>
properties:
compatible:
maintainers:
- Nicolas Boichat <drinkcat@chromium.org>
- - Enric Balletbo i Serra <enric.balletbo@collabora.com>
description: |
ChromeOS EC ANX7688 is a display bridge that converts HDMI 2.0 to
maintainers:
- Nicolas Boichat <drinkcat@chromium.org>
- - Enric Balletbo i Serra <enric.balletbo@collabora.com>
description: |
The PS8640 is a low power MIPI-to-eDP video format converter supporting
title: Asia Better Technology 3.0" (320x480 pixels) 24-bit IPS LCD panel
-description: |
- The panel must obey the rules for a SPI slave device as specified in
- spi/spi-controller.yaml
-
maintainers:
- Paul Cercueil <paul@crapouillou.net>
allOf:
- $ref: panel-common.yaml#
+ - $ref: /schemas/spi/spi-peripheral-props.yaml#
properties:
compatible:
960 TFT source driver pins and 240 TFT gate driver pins, VCOM, VCOML and
VCOMH outputs.
- The panel must obey the rules for a SPI slave device as specified in
- spi/spi-controller.yaml
-
allOf:
- $ref: panel-common.yaml#
+ - $ref: /schemas/spi/spi-peripheral-props.yaml#
properties:
compatible:
title: Innolux EJ030NA 3.0" (320x480 pixels) 24-bit TFT LCD panel
-description: |
- The panel must obey the rules for a SPI slave device as specified in
- spi/spi-controller.yaml
-
maintainers:
- Paul Cercueil <paul@crapouillou.net>
allOf:
- $ref: panel-common.yaml#
+ - $ref: /schemas/spi/spi-peripheral-props.yaml#
properties:
compatible:
title: King Display KD035G6-54NT 3.5" (320x240 pixels) 24-bit TFT LCD panel
-description: |
- The panel must obey the rules for a SPI slave device as specified in
- spi/spi-controller.yaml
-
maintainers:
- Paul Cercueil <paul@crapouillou.net>
allOf:
- $ref: panel-common.yaml#
+ - $ref: /schemas/spi/spi-peripheral-props.yaml#
properties:
compatible:
title: LG.Philips LB035Q02 Panel
-description: |
- The panel must obey the rules for a SPI slave device as specified in
- spi/spi-controller.yaml
-
maintainers:
- Tomi Valkeinen <tomi.valkeinen@ti.com>
allOf:
- $ref: panel-common.yaml#
+ - $ref: /schemas/spi/spi-peripheral-props.yaml#
properties:
compatible:
title: Samsung LD9040 AMOLED LCD parallel RGB panel with SPI control bus
-description: |
- The panel must obey the rules for a SPI slave device as specified in
- spi/spi-controller.yaml
-
maintainers:
- Andrzej Hajda <a.hajda@samsung.com>
allOf:
- $ref: panel-common.yaml#
+ - $ref: /schemas/spi/spi-peripheral-props.yaml#
properties:
compatible:
lcd@0 {
compatible = "samsung,ld9040";
- #address-cells = <1>;
- #size-cells = <0>;
reg = <0>;
vdd3-supply = <&ldo7_reg>;
allOf:
- $ref: panel-common.yaml#
- $ref: /schemas/leds/backlight/common.yaml#
+ - $ref: /schemas/spi/spi-peripheral-props.yaml#
properties:
compatible:
title: Sitronix ST7789V RGB panel with SPI control bus
-description: |
- The panel must obey the rules for a SPI slave device as specified in
- spi/spi-controller.yaml
-
maintainers:
- Maxime Ripard <mripard@kernel.org>
allOf:
- $ref: panel-common.yaml#
+ - $ref: /schemas/spi/spi-peripheral-props.yaml#
properties:
compatible:
title: Sony ACX565AKM SDI Panel
-description: |
- The panel must obey the rules for a SPI slave device as specified in
- spi/spi-controller.yaml
-
maintainers:
- Tomi Valkeinen <tomi.valkeinen@ti.com>
allOf:
- $ref: panel-common.yaml#
+ - $ref: /schemas/spi/spi-peripheral-props.yaml#
properties:
compatible:
title: Toppoly TD Panels
-description: |
- The panel must obey the rules for a SPI slave device as specified in
- spi/spi-controller.yaml
-
maintainers:
- Marek Belisko <marek@goldelico.com>
- H. Nikolaus Schaller <hns@goldelico.com>
allOf:
- $ref: panel-common.yaml#
+ - $ref: /schemas/spi/spi-peripheral-props.yaml#
properties:
compatible:
clock-names:
const: hclk
- pinctrl-0:
- maxItems: 2
-
- pinctrl-names:
- const: default
- description:
- Switch the iomux for the HPD/I2C pins to HDMI function.
-
power-domains:
maxItems: 1
maintainers:
- Benson Leung <bleung@chromium.org>
- - Enric Balletbo i Serra <enric.balletbo@collabora.com>
description: |
On ChromeOS systems with USB Type C ports, the ChromeOS Embedded Controller is
maintainers:
- Doug Anderson <dianders@chromium.org>
- Benson Leung <bleung@chromium.org>
- - Enric Balletbo i Serra <enric.balletbo@collabora.com>
description: |
On some ChromeOS board designs we've got a connection to the EC
maintainers:
- Stephen Boyd <swboyd@chromium.org>
- Benson Leung <bleung@chromium.org>
- - Enric Balletbo i Serra <enric.balletbo@collabora.com>
description: |
Google's ChromeOS EC sometimes has the ability to detect user proximity.
maintainers:
- Simon Glass <sjg@chromium.org>
- Benson Leung <bleung@chromium.org>
- - Enric Balletbo i Serra <enric.balletbo@collabora.com>
description: |
Google's ChromeOS EC Keyboard is a simple matrix keyboard
which can be disabled to suppress events from the button.
type: boolean
- pinctrl-0:
- maxItems: 1
-
- pinctrl-names:
- maxItems: 1
-
required:
- linux,code
data-lanes:
description:
Note that 'fsl,imx7-mipi-csi2' only supports up to 2 data lines.
+ minItems: 1
items:
- minItems: 1
- maxItems: 4
- items:
- - const: 1
- - const: 2
- - const: 3
- - const: 4
+ - const: 1
+ - const: 2
+ - const: 3
+ - const: 4
required:
- data-lanes
properties:
data-lanes:
+ minItems: 1
items:
- minItems: 1
- maxItems: 4
- items:
- - const: 1
- - const: 2
- - const: 3
- - const: 4
+ - const: 1
+ - const: 2
+ - const: 3
+ - const: 4
required:
- data-lanes
interrupt-controller;
#interrupt-cells = <2>;
- interrupts = <&host_irq1>;
- interrupt-parent = <&gic>;
+ interrupts = <4 1 0>;
gpio-controller;
#gpio-cells = <2>;
maintainers:
- Benson Leung <bleung@chromium.org>
- - Enric Balletbo i Serra <enric.balletbo@collabora.com>
- Guenter Roeck <groeck@chromium.org>
description:
clock-names = "mclk", "apb_pclk";
};
+ - |
+ #include <dt-bindings/interrupt-controller/irq.h>
+
mmc@80126000 {
compatible = "arm,pl18x", "arm,primecell";
reg = <0x80126000 0x1000>;
vqmmc-supply = <&vmmci>;
};
+ - |
mmc@101f6000 {
compatible = "arm,pl18x", "arm,primecell";
reg = <0x101f6000 0x1000>;
clocks = <&sdiclk>, <&pclksdi>;
clock-names = "mclk", "apb_pclk";
- interrupt-parent = <&vica>;
interrupts = <22>;
max-frequency = <400000>;
bus-width = <4>;
vmmc-supply = <&vmmc_regulator>;
};
+ - |
mmc@52007000 {
compatible = "arm,pl18x", "arm,primecell";
arm,primecell-periphid = <0x10153180>;
M_CAN user manual for details.
$ref: /schemas/types.yaml#/definitions/int32-array
items:
- items:
- - description: The 'offset' is an address offset of the Message RAM where
- the following elements start from. This is usually set to 0x0 if
- you're using a private Message RAM.
- default: 0
- - description: 11-bit Filter 0-128 elements / 0-128 words
- minimum: 0
- maximum: 128
- - description: 29-bit Filter 0-64 elements / 0-128 words
- minimum: 0
- maximum: 64
- - description: Rx FIFO 0 0-64 elements / 0-1152 words
- minimum: 0
- maximum: 64
- - description: Rx FIFO 1 0-64 elements / 0-1152 words
- minimum: 0
- maximum: 64
- - description: Rx Buffers 0-64 elements / 0-1152 words
- minimum: 0
- maximum: 64
- - description: Tx Event FIFO 0-32 elements / 0-64 words
- minimum: 0
- maximum: 32
- - description: Tx Buffers 0-32 elements / 0-576 words
- minimum: 0
- maximum: 32
- maxItems: 1
+ - description: The 'offset' is an address offset of the Message RAM where
+ the following elements start from. This is usually set to 0x0 if
+ you're using a private Message RAM.
+ default: 0
+ - description: 11-bit Filter 0-128 elements / 0-128 words
+ minimum: 0
+ maximum: 128
+ - description: 29-bit Filter 0-64 elements / 0-128 words
+ minimum: 0
+ maximum: 64
+ - description: Rx FIFO 0 0-64 elements / 0-1152 words
+ minimum: 0
+ maximum: 64
+ - description: Rx FIFO 1 0-64 elements / 0-1152 words
+ minimum: 0
+ maximum: 64
+ - description: Rx Buffers 0-64 elements / 0-1152 words
+ minimum: 0
+ maximum: 64
+ - description: Tx Event FIFO 0-32 elements / 0-64 words
+ minimum: 0
+ maximum: 32
+ - description: Tx Buffers 0-32 elements / 0-576 words
+ minimum: 0
+ maximum: 32
power-domains:
description:
description:
Specifies the MAC address that was assigned to the network device.
$ref: /schemas/types.yaml#/definitions/uint8-array
- items:
- - minItems: 6
- maxItems: 6
+ minItems: 6
+ maxItems: 6
mac-address:
description:
to the device by the boot program is different from the
local-mac-address property.
$ref: /schemas/types.yaml#/definitions/uint8-array
- items:
- - minItems: 6
- maxItems: 6
+ minItems: 6
+ maxItems: 6
max-frame-size:
$ref: /schemas/types.yaml#/definitions/uint32
type: array
then:
deprecated: true
- minItems: 1
- maxItems: 1
items:
- items:
- - minimum: 0
- maximum: 31
- description:
- Emulated PHY ID, choose any but unique to the all
- specified fixed-links
-
- - enum: [0, 1]
- description:
- Duplex configuration. 0 for half duplex or 1 for
- full duplex
-
- - enum: [10, 100, 1000, 2500, 10000]
- description:
- Link speed in Mbits/sec.
-
- - enum: [0, 1]
- description:
- Pause configuration. 0 for no pause, 1 for pause
-
- - enum: [0, 1]
- description:
- Asymmetric pause configuration. 0 for no asymmetric
- pause, 1 for asymmetric pause
+ - minimum: 0
+ maximum: 31
+ description:
+ Emulated PHY ID, choose any but unique to the all
+ specified fixed-links
+
+ - enum: [0, 1]
+ description:
+ Duplex configuration. 0 for half duplex or 1 for
+ full duplex
+
+ - enum: [10, 100, 1000, 2500, 10000]
+ description:
+ Link speed in Mbits/sec.
+
+ - enum: [0, 1]
+ description:
+ Pause configuration. 0 for no pause, 1 for pause
+
+ - enum: [0, 1]
+ description:
+ Asymmetric pause configuration. 0 for no asymmetric
+ pause, 1 for asymmetric pause
- if:
The settings and programming routines for internal/external
MDIO are different. Must be included for internal MDIO.
+- fsl,erratum-a009885
+ Usage: optional
+ Value type: <boolean>
+ Definition: Indicates the presence of the A009885
+ erratum describing that the contents of MDIO_DATA may
+ become corrupt unless it is read within 16 MDC cycles
+ of MDIO_CFG[BSY] being cleared, when performing an
+ MDIO read operation.
+
- fsl,erratum-a011043
Usage: optional
Value type: <boolean>
- compatible: For the OX820 SoC, it should be :
- "oxsemi,ox820-dwmac" to select glue
- "snps,dwmac-3.512" to select IP version.
+ For the OX810SE SoC, it should be :
+ - "oxsemi,ox810se-dwmac" to select glue
+ - "snps,dwmac-3.512" to select IP version.
- clocks: Should contain phandles to the following clocks
- clock-names: Should contain the following:
Offset and size in bytes within the storage device.
bits:
- maxItems: 1
+ $ref: /schemas/types.yaml#/definitions/uint32-array
items:
- items:
- - minimum: 0
- maximum: 7
- description:
- Offset in bit within the address range specified by reg.
- - minimum: 1
- description:
- Size in bit within the address range specified by reg.
+ - minimum: 0
+ maximum: 7
+ description:
+ Offset in bit within the address range specified by reg.
+ - minimum: 1
+ description:
+ Size in bit within the address range specified by reg.
required:
- reg
appropriate of the LOCHNAGARx_PIN_NUM_GPIOS define, see [3].
maxItems: 1
- pinctrl-0:
- description:
- A phandle to the default pinctrl state.
-
- pinctrl-names:
- description:
- A pinctrl state named "default" must be defined.
- const: default
-
pin-settings:
type: object
patternProperties:
Documentation/devicetree/bindings/pinctrl/pinctrl-bindings.txt
properties:
- pinctrl-0:
- description:
- A phandle to the node containing the subnodes containing default
- configurations.
-
- pinctrl-names:
- description:
- A pinctrl state named "default" must be defined.
- const: default
-
pin-settings:
description:
One subnode is required to contain the default settings. It
priority:
$ref: /schemas/types.yaml#/definitions/uint32
description: |
- A priority ranging from 0 to 255 (default 128) according to the following guidelines:
+ A priority ranging from 0 to 255 (default 129) according to the following guidelines:
0: Restart handler of last resort, with limited restart capabilities.
128: Default restart handler; use if no other restart handler is expected to be available,
255: Highest priority restart handler, will preempt all other restart handlers.
minimum: 0
maximum: 255
- default: 128
+ default: 129
active-delay:
$ref: /schemas/types.yaml#/definitions/uint32
maintainers:
- Thierry Reding <thierry.reding@gmail.com>
+select: false
+
properties:
$nodename:
pattern: "^pwm(@.*|-[0-9a-f])*$"
properties:
compatible:
enum:
+ - epson,rx8804
- epson,rx8900
- microcrystal,rv8803
- qcom,pmk8350-rtc
reg:
- maxItems: 1
+ minItems: 1
+ maxItems: 2
+
+ reg-names:
+ minItems: 1
+ items:
+ - const: rtc
+ - const: alarm
interrupts:
maxItems: 1
st,syscfg = <&pwrcfg 0x00 0x100>;
};
+ - |
#include <dt-bindings/interrupt-controller/arm-gic.h>
#include <dt-bindings/clock/stm32mp1-clks.h>
rtc@5c004000 {
--- /dev/null
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+# Copyright (C) Sunplus Co., Ltd. 2021
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/rtc/sunplus,sp7021-rtc.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Sunplus SP7021 Real Time Clock controller
+
+maintainers:
+ - Vincent Shih <vincent.sunplus@gmail.com>
+
+properties:
+ compatible:
+ const: sunplus,sp7021-rtc
+
+ reg:
+ maxItems: 1
+
+ reg-names:
+ items:
+ - const: rtc
+
+ clocks:
+ maxItems: 1
+
+ resets:
+ maxItems: 1
+
+ interrupts:
+ maxItems: 1
+
+required:
+ - compatible
+ - reg
+ - reg-names
+ - clocks
+ - resets
+ - interrupts
+
+additionalProperties: false
+
+examples:
+ - |
+ #include <dt-bindings/interrupt-controller/irq.h>
+
+ rtc: serial@9c003a00 {
+ compatible = "sunplus,sp7021-rtc";
+ reg = <0x9c003a00 0x80>;
+ reg-names = "rtc";
+ clocks = <&clkc 0x12>;
+ resets = <&rstc 0x02>;
+ interrupt-parent = <&intc>;
+ interrupts = <163 IRQ_TYPE_EDGE_RISING>;
+ };
+...
Internal DMA register base address of the audio
subsystem (used in secondary sound source).
- pinctrl-0:
- description: Should specify pin control groups used for this controller.
-
- pinctrl-names:
- const: default
-
power-domains:
maxItems: 1
- enum:
# SMBus/I2C Digital Temperature Sensor in 6-Pin SOT with SMBus Alert and Over Temperature Pin
- ad,ad7414
- # ADM9240: Complete System Hardware Monitor for uProcessor-Based Systems
+ # ADM9240: Complete System Hardware Monitor for uProcessor-Based Systems
- ad,adm9240
# AD5110 - Nonvolatile Digital Potentiometer
- adi,ad5110
- adi,adp5589
# AMS iAQ-Core VOC Sensor
- ams,iaq-core
- # i2c serial eeprom (24cxx)
+ # i2c serial eeprom (24cxx)
- at,24c08
# i2c trusted platform module (TPM)
- atmel,at97sc3204t
- skyworks,sky81452
# Socionext SynQuacer TPM MMIO module
- socionext,synquacer-tpm-mmio
- # i2c serial eeprom (24cxx)
- - sparkfun,qwiic-joystick
# SparkFun Qwiic Joystick (COM-15168) with i2c interface
+ - sparkfun,qwiic-joystick
+ # i2c serial eeprom (24cxx)
- st,24c256
# Ambient Light Sensor with SMBUS/Two Wire Serial Interface
- taos,tsl2550
# Keep list in alphabetical order.
"^70mai,.*":
description: 70mai Co., Ltd.
+ "^8dev,.*":
+ description: 8devices, UAB
"^abb,.*":
description: ABB
"^abilis,.*":
description: Freescale Semiconductor
"^fujitsu,.*":
description: Fujitsu Ltd.
+ "^fxtec,.*":
+ description: FX Technology Ltd.
"^gardena,.*":
description: GARDENA GmbH
"^gateworks,.*":
description: HannStar Display Co.
"^holtek,.*":
description: Holtek Semiconductor, Inc.
+ "^huawei,.*":
+ description: Huawei Technologies Co., Ltd.
"^hugsun,.*":
description: Shenzhen Hugsun Technology Co. Ltd.
"^hwacom,.*":
description: THine Electronics, Inc.
"^thingyjp,.*":
description: thingy.jp
+ "^thundercomm,.*":
+ description: Thundercomm Technology Co., Ltd.
"^ti,.*":
description: Texas Instruments
"^tianma,.*":
description: Wiligear, Ltd.
"^winbond,.*":
description: Winbond Electronics corp.
+ "^wingtech,.*":
+ description: Wingtech Technology Co., Ltd.
"^winlink,.*":
description: WinLink Co., Ltd
"^winstar,.*":
Firewire char device data structures
====================================
-.. include:: /ABI/stable/firewire-cdev
+.. include:: ../ABI/stable/firewire-cdev
:literal:
.. kernel-doc:: include/uapi/linux/firewire-cdev.h
Firewire device probing and sysfs interfaces
============================================
-.. include:: /ABI/stable/sysfs-bus-firewire
+.. include:: ../ABI/stable/sysfs-bus-firewire
:literal:
.. kernel-doc:: drivers/firewire/core-device.c
The basic mount syntax is::
- # mount -t ceph monip[:port][,monip2[:port]...]:/[subdir] mnt
+ # mount -t ceph user@fsid.fs_name=/[subdir] mnt -o mon_addr=monip1[:port][/monip2[:port]]
You only need to specify a single monitor, as the client will get the
full list when it connects. (However, if the monitor you specify
off if the monitor is using the default. So if the monitor is at
1.2.3.4::
- # mount -t ceph 1.2.3.4:/ /mnt/ceph
+ # mount -t ceph cephuser@07fe3187-00d9-42a3-814b-72a4d5e7d5be.cephfs=/ /mnt/ceph -o mon_addr=1.2.3.4
is sufficient. If /sbin/mount.ceph is installed, a hostname can be
-used instead of an IP address.
+used instead of an IP address and the cluster FSID can be left out
+(as the mount helper will fill it in by reading the ceph configuration
+file)::
+ # mount -t ceph cephuser@cephfs=/ /mnt/ceph -o mon_addr=mon-addr
+Multiple monitor addresses can be passed by separating each address with a slash (`/`)::
+
+ # mount -t ceph cephuser@cephfs=/ /mnt/ceph -o mon_addr=192.168.1.100/192.168.1.101
+
+When using the mount helper, monitor address can be read from ceph
+configuration file if available. Note that, the cluster FSID (passed as part
+of the device string) is validated by checking it with the FSID reported by
+the monitor.
Mount Options
=============
+ mon_addr=ip_address[:port][/ip_address[:port]]
+ Monitor address to the cluster. This is used to bootstrap the
+ connection to the cluster. Once connection is established, the
+ monitor addresses in the monitor map are followed.
+
+ fsid=cluster-id
+ FSID of the cluster (from `ceph fsid` command).
+
ip=A.B.C.D[:N]
Specify the IP and/or port the client should bind to locally.
There is normally not much reason to do this. If the IP is not
FAULT_WRITE_IO 0x000004000
FAULT_SLAB_ALLOC 0x000008000
FAULT_DQUOT_INIT 0x000010000
+ FAULT_LOCK_OP 0x000020000
=================== ===========
mode=%s Control block allocation mode which supports "adaptive"
and "lfs". In "lfs" mode, there should be no random
y y y Y/m/n
n m n N/m
m m m M/n
- y m n M/n
+ y m m M/n
y n * N
=== === ============= ==============
Program Minimal version Command to check the version
====================== =============== ========================================
GNU C 5.1 gcc --version
-Clang/LLVM (optional) 10.0.1 clang --version
+Clang/LLVM (optional) 11.0.0 clang --version
GNU make 3.81 make --version
binutils 2.23 ld -v
flex 2.5.35 flex --version
| Kernel-space virtual memory, shared between all processes:
____________________________________________________________|___________________________________________________________
| | | |
- ffffffc000000000 | -256 GB | ffffffc7ffffffff | 32 GB | kasan
- ffffffcefee00000 | -196 GB | ffffffcefeffffff | 2 MB | fixmap
- ffffffceff000000 | -196 GB | ffffffceffffffff | 16 MB | PCI io
- ffffffcf00000000 | -196 GB | ffffffcfffffffff | 4 GB | vmemmap
- ffffffd000000000 | -192 GB | ffffffdfffffffff | 64 GB | vmalloc/ioremap space
- ffffffe000000000 | -128 GB | ffffffff7fffffff | 124 GB | direct mapping of all physical memory
+ ffffffc6fee00000 | -228 GB | ffffffc6feffffff | 2 MB | fixmap
+ ffffffc6ff000000 | -228 GB | ffffffc6ffffffff | 16 MB | PCI io
+ ffffffc700000000 | -228 GB | ffffffc7ffffffff | 4 GB | vmemmap
+ ffffffc800000000 | -224 GB | ffffffd7ffffffff | 64 GB | vmalloc/ioremap space
+ ffffffd800000000 | -160 GB | fffffff6ffffffff | 124 GB | direct mapping of all physical memory
+ fffffff700000000 | -36 GB | fffffffeffffffff | 32 GB | kasan
__________________|____________|__________________|_________|____________________________________________________________
|
|
+--------------------------+ +---------+--------------------+
At the lowest level (in x86), the AMD Secure Processor (ASP) driver uses the
-CPU to PSP mailbox regsister to submit commands to the PSP. The format of the
+CPU to PSP mailbox register to submit commands to the PSP. The format of the
command buffer is opaque to the ASP driver. It's role is to submit commands to
the secure processor and return results to AMD-TEE driver. The interface
between AMD-TEE driver and AMD Secure Processor driver can be found in [6].
The GlobalPlatform TEE Client API [5] can be used by the user space (client) to
talk to AMD's TEE. AMD's TEE provides a secure environment for loading, opening
-a session, invoking commands and clossing session with TA.
+a session, invoking commands and closing session with TA.
References
==========
Instances
---------
-In the tracefs tracing directory is a directory called "instances".
+In the tracefs tracing directory, there is a directory called "instances".
This directory can have new directories created inside of it using
mkdir, and removing directories with rmdir. The directory created
with mkdir in this directory will already contain files and other
The Stats Data block contains an array of 64-bit values in the same order
as the descriptors in Descriptors block.
-4.42 KVM_GET_XSAVE2
-------------------
+4.134 KVM_GET_XSAVE2
+--------------------
:Capability: KVM_CAP_XSAVE2
:Architectures: x86
limit the attack surface on KVM's MSR emulation code.
8.28 KVM_CAP_ENFORCE_PV_FEATURE_CPUID
------------------------------
+-------------------------------------
Architectures: x86
+++ /dev/null
-.. _cleancache:
-
-==========
-Cleancache
-==========
-
-Motivation
-==========
-
-Cleancache is a new optional feature provided by the VFS layer that
-potentially dramatically increases page cache effectiveness for
-many workloads in many environments at a negligible cost.
-
-Cleancache can be thought of as a page-granularity victim cache for clean
-pages that the kernel's pageframe replacement algorithm (PFRA) would like
-to keep around, but can't since there isn't enough memory. So when the
-PFRA "evicts" a page, it first attempts to use cleancache code to
-put the data contained in that page into "transcendent memory", memory
-that is not directly accessible or addressable by the kernel and is
-of unknown and possibly time-varying size.
-
-Later, when a cleancache-enabled filesystem wishes to access a page
-in a file on disk, it first checks cleancache to see if it already
-contains it; if it does, the page of data is copied into the kernel
-and a disk access is avoided.
-
-Transcendent memory "drivers" for cleancache are currently implemented
-in Xen (using hypervisor memory) and zcache (using in-kernel compressed
-memory) and other implementations are in development.
-
-:ref:`FAQs <faq>` are included below.
-
-Implementation Overview
-=======================
-
-A cleancache "backend" that provides transcendent memory registers itself
-to the kernel's cleancache "frontend" by calling cleancache_register_ops,
-passing a pointer to a cleancache_ops structure with funcs set appropriately.
-The functions provided must conform to certain semantics as follows:
-
-Most important, cleancache is "ephemeral". Pages which are copied into
-cleancache have an indefinite lifetime which is completely unknowable
-by the kernel and so may or may not still be in cleancache at any later time.
-Thus, as its name implies, cleancache is not suitable for dirty pages.
-Cleancache has complete discretion over what pages to preserve and what
-pages to discard and when.
-
-Mounting a cleancache-enabled filesystem should call "init_fs" to obtain a
-pool id which, if positive, must be saved in the filesystem's superblock;
-a negative return value indicates failure. A "put_page" will copy a
-(presumably about-to-be-evicted) page into cleancache and associate it with
-the pool id, a file key, and a page index into the file. (The combination
-of a pool id, a file key, and an index is sometimes called a "handle".)
-A "get_page" will copy the page, if found, from cleancache into kernel memory.
-An "invalidate_page" will ensure the page no longer is present in cleancache;
-an "invalidate_inode" will invalidate all pages associated with the specified
-file; and, when a filesystem is unmounted, an "invalidate_fs" will invalidate
-all pages in all files specified by the given pool id and also surrender
-the pool id.
-
-An "init_shared_fs", like init_fs, obtains a pool id but tells cleancache
-to treat the pool as shared using a 128-bit UUID as a key. On systems
-that may run multiple kernels (such as hard partitioned or virtualized
-systems) that may share a clustered filesystem, and where cleancache
-may be shared among those kernels, calls to init_shared_fs that specify the
-same UUID will receive the same pool id, thus allowing the pages to
-be shared. Note that any security requirements must be imposed outside
-of the kernel (e.g. by "tools" that control cleancache). Or a
-cleancache implementation can simply disable shared_init by always
-returning a negative value.
-
-If a get_page is successful on a non-shared pool, the page is invalidated
-(thus making cleancache an "exclusive" cache). On a shared pool, the page
-is NOT invalidated on a successful get_page so that it remains accessible to
-other sharers. The kernel is responsible for ensuring coherency between
-cleancache (shared or not), the page cache, and the filesystem, using
-cleancache invalidate operations as required.
-
-Note that cleancache must enforce put-put-get coherency and get-get
-coherency. For the former, if two puts are made to the same handle but
-with different data, say AAA by the first put and BBB by the second, a
-subsequent get can never return the stale data (AAA). For get-get coherency,
-if a get for a given handle fails, subsequent gets for that handle will
-never succeed unless preceded by a successful put with that handle.
-
-Last, cleancache provides no SMP serialization guarantees; if two
-different Linux threads are simultaneously putting and invalidating a page
-with the same handle, the results are indeterminate. Callers must
-lock the page to ensure serial behavior.
-
-Cleancache Performance Metrics
-==============================
-
-If properly configured, monitoring of cleancache is done via debugfs in
-the `/sys/kernel/debug/cleancache` directory. The effectiveness of cleancache
-can be measured (across all filesystems) with:
-
-``succ_gets``
- number of gets that were successful
-
-``failed_gets``
- number of gets that failed
-
-``puts``
- number of puts attempted (all "succeed")
-
-``invalidates``
- number of invalidates attempted
-
-A backend implementation may provide additional metrics.
-
-.. _faq:
-
-FAQ
-===
-
-* Where's the value? (Andrew Morton)
-
-Cleancache provides a significant performance benefit to many workloads
-in many environments with negligible overhead by improving the
-effectiveness of the pagecache. Clean pagecache pages are
-saved in transcendent memory (RAM that is otherwise not directly
-addressable to the kernel); fetching those pages later avoids "refaults"
-and thus disk reads.
-
-Cleancache (and its sister code "frontswap") provide interfaces for
-this transcendent memory (aka "tmem"), which conceptually lies between
-fast kernel-directly-addressable RAM and slower DMA/asynchronous devices.
-Disallowing direct kernel or userland reads/writes to tmem
-is ideal when data is transformed to a different form and size (such
-as with compression) or secretly moved (as might be useful for write-
-balancing for some RAM-like devices). Evicted page-cache pages (and
-swap pages) are a great use for this kind of slower-than-RAM-but-much-
-faster-than-disk transcendent memory, and the cleancache (and frontswap)
-"page-object-oriented" specification provides a nice way to read and
-write -- and indirectly "name" -- the pages.
-
-In the virtual case, the whole point of virtualization is to statistically
-multiplex physical resources across the varying demands of multiple
-virtual machines. This is really hard to do with RAM and efforts to
-do it well with no kernel change have essentially failed (except in some
-well-publicized special-case workloads). Cleancache -- and frontswap --
-with a fairly small impact on the kernel, provide a huge amount
-of flexibility for more dynamic, flexible RAM multiplexing.
-Specifically, the Xen Transcendent Memory backend allows otherwise
-"fallow" hypervisor-owned RAM to not only be "time-shared" between multiple
-virtual machines, but the pages can be compressed and deduplicated to
-optimize RAM utilization. And when guest OS's are induced to surrender
-underutilized RAM (e.g. with "self-ballooning"), page cache pages
-are the first to go, and cleancache allows those pages to be
-saved and reclaimed if overall host system memory conditions allow.
-
-And the identical interface used for cleancache can be used in
-physical systems as well. The zcache driver acts as a memory-hungry
-device that stores pages of data in a compressed state. And
-the proposed "RAMster" driver shares RAM across multiple physical
-systems.
-
-* Why does cleancache have its sticky fingers so deep inside the
- filesystems and VFS? (Andrew Morton and Christoph Hellwig)
-
-The core hooks for cleancache in VFS are in most cases a single line
-and the minimum set are placed precisely where needed to maintain
-coherency (via cleancache_invalidate operations) between cleancache,
-the page cache, and disk. All hooks compile into nothingness if
-cleancache is config'ed off and turn into a function-pointer-
-compare-to-NULL if config'ed on but no backend claims the ops
-functions, or to a compare-struct-element-to-negative if a
-backend claims the ops functions but a filesystem doesn't enable
-cleancache.
-
-Some filesystems are built entirely on top of VFS and the hooks
-in VFS are sufficient, so don't require an "init_fs" hook; the
-initial implementation of cleancache didn't provide this hook.
-But for some filesystems (such as btrfs), the VFS hooks are
-incomplete and one or more hooks in fs-specific code are required.
-And for some other filesystems, such as tmpfs, cleancache may
-be counterproductive. So it seemed prudent to require a filesystem
-to "opt in" to use cleancache, which requires adding a hook in
-each filesystem. Not all filesystems are supported by cleancache
-only because they haven't been tested. The existing set should
-be sufficient to validate the concept, the opt-in approach means
-that untested filesystems are not affected, and the hooks in the
-existing filesystems should make it very easy to add more
-filesystems in the future.
-
-The total impact of the hooks to existing fs and mm files is only
-about 40 lines added (not counting comments and blank lines).
-
-* Why not make cleancache asynchronous and batched so it can more
- easily interface with real devices with DMA instead of copying each
- individual page? (Minchan Kim)
-
-The one-page-at-a-time copy semantics simplifies the implementation
-on both the frontend and backend and also allows the backend to
-do fancy things on-the-fly like page compression and
-page deduplication. And since the data is "gone" (copied into/out
-of the pageframe) before the cleancache get/put call returns,
-a great deal of race conditions and potential coherency issues
-are avoided. While the interface seems odd for a "real device"
-or for real kernel-addressable RAM, it makes perfect sense for
-transcendent memory.
-
-* Why is non-shared cleancache "exclusive"? And where is the
- page "invalidated" after a "get"? (Minchan Kim)
-
-The main reason is to free up space in transcendent memory and
-to avoid unnecessary cleancache_invalidate calls. If you want inclusive,
-the page can be "put" immediately following the "get". If
-put-after-get for inclusive becomes common, the interface could
-be easily extended to add a "get_no_invalidate" call.
-
-The invalidate is done by the cleancache backend implementation.
-
-* What's the performance impact?
-
-Performance analysis has been presented at OLS'09 and LCA'10.
-Briefly, performance gains can be significant on most workloads,
-especially when memory pressure is high (e.g. when RAM is
-overcommitted in a virtual workload); and because the hooks are
-invoked primarily in place of or in addition to a disk read/write,
-overhead is negligible even in worst case workloads. Basically
-cleancache replaces I/O with memory-copy-CPU-overhead; on older
-single-core systems with slow memory-copy speeds, cleancache
-has little value, but in newer multicore machines, especially
-consolidated/virtualized machines, it has great value.
-
-* How do I add cleancache support for filesystem X? (Boaz Harrash)
-
-Filesystems that are well-behaved and conform to certain
-restrictions can utilize cleancache simply by making a call to
-cleancache_init_fs at mount time. Unusual, misbehaving, or
-poorly layered filesystems must either add additional hooks
-and/or undergo extensive additional testing... or should just
-not enable the optional cleancache.
-
-Some points for a filesystem to consider:
-
- - The FS should be block-device-based (e.g. a ram-based FS such
- as tmpfs should not enable cleancache)
- - To ensure coherency/correctness, the FS must ensure that all
- file removal or truncation operations either go through VFS or
- add hooks to do the equivalent cleancache "invalidate" operations
- - To ensure coherency/correctness, either inode numbers must
- be unique across the lifetime of the on-disk file OR the
- FS must provide an "encode_fh" function.
- - The FS must call the VFS superblock alloc and deactivate routines
- or add hooks to do the equivalent cleancache calls done there.
- - To maximize performance, all pages fetched from the FS should
- go through the do_mpag_readpage routine or the FS should add
- hooks to do the equivalent (cf. btrfs)
- - Currently, the FS blocksize must be the same as PAGESIZE. This
- is not an architectural restriction, but no backends currently
- support anything different.
- - A clustered FS should invoke the "shared_init_fs" cleancache
- hook to get best performance for some backends.
-
-* Why not use the KVA of the inode as the key? (Christoph Hellwig)
-
-If cleancache would use the inode virtual address instead of
-inode/filehandle, the pool id could be eliminated. But, this
-won't work because cleancache retains pagecache data pages
-persistently even when the inode has been pruned from the
-inode unused list, and only invalidates the data page if the file
-gets removed/truncated. So if cleancache used the inode kva,
-there would be potential coherency issues if/when the inode
-kva is reused for a different file. Alternately, if cleancache
-invalidated the pages when the inode kva was freed, much of the value
-of cleancache would be lost because the cache of pages in cleanache
-is potentially much larger than the kernel pagecache and is most
-useful if the pages survive inode cache removal.
-
-* Why is a global variable required?
-
-The cleancache_enabled flag is checked in all of the frequently-used
-cleancache hooks. The alternative is a function call to check a static
-variable. Since cleancache is enabled dynamically at runtime, systems
-that don't enable cleancache would suffer thousands (possibly
-tens-of-thousands) of unnecessary function calls per second. So the
-global variable allows cleancache to be enabled by default at compile
-time, but have insignificant performance impact when cleancache remains
-disabled at runtime.
-
-* Does cleanache work with KVM?
-
-The memory model of KVM is sufficiently different that a cleancache
-backend may have less value for KVM. This remains to be tested,
-especially in an overcommitted system.
-
-* Does cleancache work in userspace? It sounds useful for
- memory hungry caches like web browsers. (Jamie Lokier)
-
-No plans yet, though we agree it sounds useful, at least for
-apps that bypass the page cache (e.g. O_DIRECT).
-
-Last updated: Dan Magenheimer, April 13 2011
In some environments, dramatic performance savings may be obtained because
swapped pages are saved in RAM (or a RAM-like device) instead of a swap disk.
-(Note, frontswap -- and :ref:`cleancache` (merged at 3.0) -- are the "frontends"
-and the only necessary changes to the core kernel for transcendent memory;
-all other supporting code -- the "backends" -- is implemented as drivers.
-See the LWN.net article `Transcendent memory in a nutshell`_
-for a detailed overview of frontswap and related kernel parts)
-
.. _Transcendent memory in a nutshell: https://lwn.net/Articles/454795/
Frontswap is so named because it can be thought of as the opposite of
If a store returns failure, transcendent memory has rejected the data, and the
page can be written to swap as usual.
-If a backend chooses, frontswap can be configured as a "writethrough
-cache" by calling frontswap_writethrough(). In this mode, the reduction
-in swap device writes is lost (and also a non-trivial performance advantage)
-in order to allow the backend to arbitrarily "reclaim" space used to
-store frontswap pages to more completely manage its memory usage.
-
Note that if a page is stored and the page already exists in transcendent memory
(a "duplicate" store), either the store succeeds and the data is overwritten,
or the store fails AND the page is invalidated. This ensures stale data may
and size (such as with compression) or secretly moved (as might be
useful for write-balancing for some RAM-like devices). Swap pages (and
evicted page-cache pages) are a great use for this kind of slower-than-RAM-
-but-much-faster-than-disk "pseudo-RAM device" and the frontswap (and
-cleancache) interface to transcendent memory provides a nice way to read
-and write -- and indirectly "name" -- the pages.
+but-much-faster-than-disk "pseudo-RAM device".
-Frontswap -- and cleancache -- with a fairly small impact on the kernel,
+Frontswap with a fairly small impact on the kernel,
provides a huge amount of flexibility for more dynamic, flexible RAM
utilization in various system configurations:
swap subsystem then writes the new data to the read swap device,
this is the correct course of action to ensure coherency.
-* What is frontswap_shrink for?
-
-When the (non-frontswap) swap subsystem swaps out a page to a real
-swap device, that page is only taking up low-value pre-allocated disk
-space. But if frontswap has placed a page in transcendent memory, that
-page may be taking up valuable real estate. The frontswap_shrink
-routine allows code outside of the swap subsystem to force pages out
-of the memory managed by frontswap and back into kernel-addressable memory.
-For example, in RAMster, a "suction driver" thread will attempt
-to "repatriate" pages sent to a remote machine back to the local machine;
-this is driven using the frontswap_shrink mechanism when memory pressure
-subsides.
-
* Why does the frontswap patch create the new include file swapfile.h?
The frontswap code depends on some swap-subsystem-internal data
active_mm
arch_pgtable_helpers
balance
- cleancache
damon/index
free_page_reporting
frontswap
R: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
R: Rasmus Villemoes <linux@rasmusvillemoes.dk>
S: Maintained
-F: include/asm-generic/bitops/find.h
F: include/linux/bitmap.h
+F: include/linux/find.h
F: lib/bitmap.c
F: lib/find_bit.c
F: lib/find_bit_benchmark.c
F: lib/test_bitmap.c
-F: tools/include/asm-generic/bitops/find.h
F: tools/include/linux/bitmap.h
+F: tools/include/linux/find.h
F: tools/lib/bitmap.c
F: tools/lib/find_bit.c
S: Maintained
F: Documentation/admin-guide/module-signing.rst
F: certs/
-F: scripts/extract-cert.c
F: scripts/sign-file.c
CFAG12864B LCD DRIVER
F: include/linux/cfi.h
F: kernel/cfi.c
-CLEANCACHE API
-M: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-L: linux-kernel@vger.kernel.org
-S: Maintained
-F: include/linux/cleancache.h
-F: mm/cleancache.c
-
CLK API
M: Russell King <linux@armlinux.org.uk>
L: linux-clk@vger.kernel.org
M: Vasily Gorbik <gor@linux.ibm.com>
M: Christian Borntraeger <borntraeger@linux.ibm.com>
R: Alexander Gordeev <agordeev@linux.ibm.com>
+R: Sven Schnelle <svens@linux.ibm.com>
L: linux-s390@vger.kernel.org
S: Supported
W: http://www.ibm.com/developerworks/linux/linux390/
S: Maintained
F: drivers/net/ethernet/dlink/sundance.c
+SUNPLUS RTC DRIVER
+M: Vincent Shih <vincent.sunplus@gmail.com>
+L: linux-rtc@vger.kernel.org
+S: Maintained
+F: Documentation/devicetree/bindings/rtc/sunplus,sp7021-rtc.yaml
+F: drivers/rtc/rtc-sunplus.c
+
SUPERH
M: Yoshinori Sato <ysato@users.sourceforge.jp>
M: Rich Felker <dalias@libc.org>
KBUILD_CFLAGS += $(stackp-flags-y)
KBUILD_CFLAGS-$(CONFIG_WERROR) += -Werror
-KBUILD_CFLAGS += $(KBUILD_CFLAGS-y) $(CONFIG_CC_IMPLICIT_FALLTHROUGH:"%"=%)
+KBUILD_CFLAGS += $(KBUILD_CFLAGS-y) $(CONFIG_CC_IMPLICIT_FALLTHROUGH)
ifdef CONFIG_CC_IS_CLANG
KBUILD_CPPFLAGS += -Qunused-arguments
$(Q)$(MAKE) $(hdr-inst)=include/uapi
$(Q)$(MAKE) $(hdr-inst)=arch/$(SRCARCH)/include/uapi
-# Deprecated. It is no-op now.
-PHONY += headers_check
-headers_check:
- @echo >&2 "=================== WARNING ==================="
- @echo >&2 "Since Linux 5.5, 'make headers_check' is no-op,"
- @echo >&2 "and will be removed after Linux 5.15 release."
- @echo >&2 "Please remove headers_check from your scripts."
- @echo >&2 "==============================================="
-
ifdef CONFIG_HEADERS_INSTALL
prepare: headers
endif
debian snap tar-install \
.config .config.old .version \
Module.symvers \
- certs/signing_key.pem certs/signing_key.x509 \
+ certs/signing_key.pem \
certs/x509.genkey \
vmlinux-gdb.py \
*.spec
# now expand this into a simple variable to reduce the cost of shell evaluations
prepare: CC_VERSION_TEXT := $(CC_VERSION_TEXT)
prepare:
- @if [ "$(CC_VERSION_TEXT)" != $(CONFIG_CC_VERSION_TEXT) ]; then \
+ @if [ "$(CC_VERSION_TEXT)" != "$(CONFIG_CC_VERSION_TEXT)" ]; then \
echo >&2 "warning: the compiler differs from the one used to build the kernel"; \
- echo >&2 " The kernel was built by: "$(CONFIG_CC_VERSION_TEXT); \
+ echo >&2 " The kernel was built by: $(CONFIG_CC_VERSION_TEXT)"; \
echo >&2 " You are using: $(CC_VERSION_TEXT)"; \
fi
config HAS_LTO_CLANG
def_bool y
- # Clang >= 11: https://github.com/ClangBuiltLinux/linux/issues/510
- depends on CC_IS_CLANG && CLANG_VERSION >= 110000 && LD_IS_LLD && AS_IS_LLVM
+ depends on CC_IS_CLANG && LD_IS_LLD && AS_IS_LLVM
depends on $(success,$(NM) --help | head -n 1 | grep -qi llvm)
depends on $(success,$(AR) --help | head -n 1 | grep -qi llvm)
depends on ARCH_SUPPORTS_LTO_CLANG
depends on !PAGE_SIZE_64KB
depends on !PARISC_PAGE_SIZE_64KB
depends on !PPC_64K_PAGES
+ depends on PAGE_SIZE_LESS_THAN_256KB
+
+config PAGE_SIZE_LESS_THAN_256KB
+ def_bool y
depends on !PPC_256K_PAGES
depends on !PAGE_SIZE_256KB
#endif /* __KERNEL__ */
-#include <asm-generic/bitops/find.h>
-
#ifdef __KERNEL__
/*
static int
alpha_rtc_read_time(struct device *dev, struct rtc_time *tm)
{
- mc146818_get_time(tm);
+ int ret = mc146818_get_time(tm);
+
+ if (ret < 0) {
+ dev_err_ratelimited(dev, "unable to read current time\n");
+ return ret;
+ }
/* Adjust for non-default epochs. It's easier to depend on the
generic __get_rtc_time and adjust the epoch here than create
static int srm_env_proc_open(struct inode *inode, struct file *file)
{
- return single_open(file, srm_env_proc_show, PDE_DATA(inode));
+ return single_open(file, srm_env_proc_show, pde_data(inode));
}
static ssize_t srm_env_proc_write(struct file *file, const char __user *buffer,
size_t count, loff_t *pos)
{
int res;
- unsigned long id = (unsigned long)PDE_DATA(file_inode(file));
+ unsigned long id = (unsigned long)pde_data(file_inode(file));
char *buf = (char *) __get_free_page(GFP_USER);
unsigned long ret1, ret2;
select COMMON_CLK
select DMA_DIRECT_REMAP
select GENERIC_ATOMIC64 if !ISA_ARCV2 || !(ARC_HAS_LL64 && ARC_HAS_LLSC)
- select GENERIC_FIND_FIRST_BIT
# for now, we don't need GENERIC_IRQ_PROBE, CONFIG_GENERIC_IRQ_CHIP
select GENERIC_IRQ_SHOW
select GENERIC_PCI_IOMAP
tune-mcpu-def-$(CONFIG_ISA_ARCOMPACT) := -mcpu=arc700
tune-mcpu-def-$(CONFIG_ISA_ARCV2) := -mcpu=hs38
-ifeq ($(CONFIG_ARC_TUNE_MCPU),"")
+ifeq ($(CONFIG_ARC_TUNE_MCPU),)
cflags-y += $(tune-mcpu-def-y)
else
-tune-mcpu := $(shell echo $(CONFIG_ARC_TUNE_MCPU))
+tune-mcpu := $(CONFIG_ARC_TUNE_MCPU)
ifneq ($(call cc-option,$(tune-mcpu)),)
cflags-y += $(tune-mcpu)
else
# Built-in dtb
builtindtb-y := nsim_700
-ifneq ($(CONFIG_ARC_BUILTIN_DTB_NAME),"")
- builtindtb-y := $(patsubst "%",%,$(CONFIG_ARC_BUILTIN_DTB_NAME))
+ifneq ($(CONFIG_ARC_BUILTIN_DTB_NAME),)
+ builtindtb-y := $(CONFIG_ARC_BUILTIN_DTB_NAME)
endif
obj-y += $(builtindtb-y).dtb.o
#include <asm-generic/bitops/atomic.h>
#include <asm-generic/bitops/non-atomic.h>
-#include <asm-generic/bitops/find.h>
#include <asm-generic/bitops/le.h>
#include <asm-generic/bitops/ext2-atomic-setbit.h>
config UNWINDER_ARM
bool "ARM EABI stack unwinder"
depends on AEABI && !FUNCTION_GRAPH_TRACER
- # https://github.com/ClangBuiltLinux/linux/issues/732
- depends on !LD_IS_LLD || LLD_VERSION >= 110000
select ARM_UNWIND
help
This option enables stack unwinding support in the kernel
CPPFLAGS_vmlinux.lds += -DMALLOC_SIZE="$(MALLOC_SIZE)"
compress-$(CONFIG_KERNEL_GZIP) = gzip
-compress-$(CONFIG_KERNEL_LZO) = lzo
-compress-$(CONFIG_KERNEL_LZMA) = lzma
-compress-$(CONFIG_KERNEL_XZ) = xzkern
-compress-$(CONFIG_KERNEL_LZ4) = lz4
+compress-$(CONFIG_KERNEL_LZO) = lzo_with_size
+compress-$(CONFIG_KERNEL_LZMA) = lzma_with_size
+compress-$(CONFIG_KERNEL_XZ) = xzkern_with_size
+compress-$(CONFIG_KERNEL_LZ4) = lz4_with_size
libfdt_objs := fdt_rw.o fdt_ro.o fdt_wip.o fdt.o
CONFIG_PREEMPT_VOLUNTARY=y
CONFIG_AEABI=y
CONFIG_KSM=y
-CONFIG_CLEANCACHE=y
CONFIG_CMA=y
CONFIG_SECCOMP=y
CONFIG_KEXEC=y
CONFIG_SMP=y
CONFIG_PREEMPT=y
CONFIG_HIGHMEM=y
-CONFIG_CLEANCACHE=y
CONFIG_ARM_APPENDED_DTB=y
CONFIG_ARM_ATAG_DTB_COMPAT=y
CONFIG_CPU_IDLE=y
#endif
-#include <asm-generic/bitops/find.h>
#include <asm-generic/bitops/le.h>
/*
static ssize_t atags_read(struct file *file, char __user *buf,
size_t count, loff_t *ppos)
{
- struct buffer *b = PDE_DATA(file_inode(file));
+ struct buffer *b = pde_data(file_inode(file));
return simple_read_from_buffer(buf, count, ppos, b->data, b->size);
}
__setup("noalign", noalign_setup);
/*
- * This needs to be done after sysctl_init, otherwise sys/ will be
+ * This needs to be done after sysctl_init_bases(), otherwise sys/ will be
* overwritten. Actually, this shouldn't be in sys/ at all since
* it isn't a sysctl, and it doesn't contain sysctl information.
* We now locate it in /proc/cpu/alignment instead.
select GENERIC_CPU_AUTOPROBE
select GENERIC_CPU_VULNERABILITIES
select GENERIC_EARLY_IOREMAP
- select GENERIC_FIND_FIRST_BIT
select GENERIC_IDLE_POLL_SETUP
select GENERIC_IRQ_IPI
select GENERIC_IRQ_PROBE
select GENERIC_ARCH_NUMA
select ACPI_NUMA if ACPI
select OF_NUMA
+ select HAVE_SETUP_PER_CPU_AREA
+ select NEED_PER_CPU_EMBED_FIRST_CHUNK
+ select NEED_PER_CPU_PAGE_FIRST_CHUNK
+ select USE_PERCPU_NUMA_NODE_ID
help
Enable NUMA (Non-Uniform Memory Access) support.
Specify the maximum number of NUMA Nodes available on the target
system. Increases memory reserved to accommodate various tables.
-config USE_PERCPU_NUMA_NODE_ID
- def_bool y
- depends on NUMA
-
-config HAVE_SETUP_PER_CPU_AREA
- def_bool y
- depends on NUMA
-
-config NEED_PER_CPU_EMBED_FIRST_CHUNK
- def_bool y
- depends on NUMA
-
-config NEED_PER_CPU_PAGE_FIRST_CHUNK
- def_bool y
- depends on NUMA
-
source "kernel/Kconfig.hz"
config ARCH_SPARSEMEM_ENABLE
" mov %" #w "[tmp], %" #w "[old]\n" \
" cas" #mb #sfx "\t%" #w "[tmp], %" #w "[new], %[v]\n" \
" mov %" #w "[ret], %" #w "[tmp]" \
- : [ret] "+r" (x0), [v] "+Q" (*(unsigned long *)ptr), \
+ : [ret] "+r" (x0), [v] "+Q" (*(u##sz *)ptr), \
[tmp] "=&r" (tmp) \
: [old] "r" (x1), [new] "r" (x2) \
: cl); \
#include <asm-generic/bitops/ffz.h>
#include <asm-generic/bitops/fls64.h>
-#include <asm-generic/bitops/find.h>
#include <asm-generic/bitops/sched.h>
#include <asm-generic/bitops/hweight.h>
" cbnz %" #w "[tmp], 1f\n" \
" wfe\n" \
"1:" \
- : [tmp] "=&r" (tmp), [v] "+Q" (*(unsigned long *)ptr) \
+ : [tmp] "=&r" (tmp), [v] "+Q" (*(u##sz *)ptr) \
: [val] "r" (val)); \
}
}
EXPORT_SYMBOL(pfn_is_map_memory);
-static phys_addr_t memory_limit = PHYS_ADDR_MAX;
+static phys_addr_t memory_limit __ro_after_init = PHYS_ADDR_MAX;
/*
* Limit the memory size that was specified via FDT.
#include <asm-generic/bitops/ffz.h>
#include <asm-generic/bitops/fls64.h>
-#include <asm-generic/bitops/find.h>
#ifndef _LINUX_BITOPS_H
#error only <linux/bitops.h> can be included directly
suffix-$(CONFIG_KERNEL_GZIP) := gzip
suffix-$(CONFIG_KERNEL_LZO) := lzo
+compress-$(CONFIG_KERNEL_GZIP) := gzip
+compress-$(CONFIG_KERNEL_LZO) := lzo_with_size
$(obj)/vmlinux.bin.$(suffix-y): $(obj)/vmlinux.bin FORCE
- $(call if_changed,$(suffix-y))
+ $(call if_changed,$(compress-y))
LDFLAGS_piggy.o := -r --format binary --oformat elf32-h8300-linux -T
OBJCOPYFLAGS := -O binary
# SPDX-License-Identifier: GPL-2.0
-ifneq '$(CONFIG_H8300_BUILTIN_DTB)' '""'
-BUILTIN_DTB := $(patsubst "%",%,$(CONFIG_H8300_BUILTIN_DTB)).dtb.o
-endif
-
-obj-y += $(BUILTIN_DTB)
+obj-y += $(addsuffix .dtb.o, $(CONFIG_H8300_BUILTIN_DTB))
dtb-$(CONFIG_H8300H_SIM) := h8300h_sim.dtb
dtb-$(CONFIG_H8S_SIM) := h8s_sim.dtb
return result;
}
-#include <asm-generic/bitops/find.h>
#include <asm-generic/bitops/sched.h>
#include <asm-generic/bitops/hweight.h>
#include <asm-generic/bitops/lock.h>
}
#include <asm-generic/bitops/lock.h>
-#include <asm-generic/bitops/find.h>
#include <asm-generic/bitops/fls64.h>
#include <asm-generic/bitops/sched.h>
select HAVE_FTRACE_MCOUNT_RECORD
select HAVE_DYNAMIC_FTRACE if (!ITANIUM)
select HAVE_FUNCTION_TRACER
+ select HAVE_SETUP_PER_CPU_AREA
select TTY
select HAVE_ARCH_TRACEHOOK
select HAVE_VIRT_CPU_ACCOUNTING
bool
default y
-config HAVE_SETUP_PER_CPU_AREA
- def_bool y
-
config DMI
bool
default y
bool "NUMA support"
depends on !FLATMEM
select SMP
+ select USE_PERCPU_NUMA_NODE_ID
help
Say Y to compile the kernel to support NUMA (Non-Uniform Memory
Access). This option is for configuring high-end multiprocessor
def_bool y
depends on NUMA
-config USE_PERCPU_NUMA_NODE_ID
- def_bool y
- depends on NUMA
-
config HAVE_MEMORYLESS_NODES
def_bool NUMA
#endif /* __KERNEL__ */
-#include <asm-generic/bitops/find.h>
-
#ifdef __KERNEL__
#include <asm-generic/bitops/le.h>
static ssize_t
salinfo_event_read(struct file *file, char __user *buffer, size_t count, loff_t *ppos)
{
- struct salinfo_data *data = PDE_DATA(file_inode(file));
+ struct salinfo_data *data = pde_data(file_inode(file));
char cmd[32];
size_t size;
int i, n, cpu = -1;
static int
salinfo_log_open(struct inode *inode, struct file *file)
{
- struct salinfo_data *data = PDE_DATA(inode);
+ struct salinfo_data *data = pde_data(inode);
if (!capable(CAP_SYS_ADMIN))
return -EPERM;
static int
salinfo_log_release(struct inode *inode, struct file *file)
{
- struct salinfo_data *data = PDE_DATA(inode);
+ struct salinfo_data *data = pde_data(inode);
if (data->state == STATE_NO_DATA) {
vfree(data->log_buffer);
static ssize_t
salinfo_log_read(struct file *file, char __user *buffer, size_t count, loff_t *ppos)
{
- struct salinfo_data *data = PDE_DATA(file_inode(file));
+ struct salinfo_data *data = pde_data(file_inode(file));
u8 *buf;
u64 bufsize;
static ssize_t
salinfo_log_write(struct file *file, const char __user *buffer, size_t count, loff_t *ppos)
{
- struct salinfo_data *data = PDE_DATA(file_inode(file));
+ struct salinfo_data *data = pde_data(file_inode(file));
char cmd[32];
size_t size;
u32 offset;
CONFIG_BINFMT_AOUT=m
CONFIG_BINFMT_MISC=m
# CONFIG_COMPACTION is not set
-CONFIG_CLEANCACHE=y
CONFIG_ZPOOL=m
CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_BINFMT_AOUT=m
CONFIG_BINFMT_MISC=m
# CONFIG_COMPACTION is not set
-CONFIG_CLEANCACHE=y
CONFIG_ZPOOL=m
CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_BINFMT_AOUT=m
CONFIG_BINFMT_MISC=m
# CONFIG_COMPACTION is not set
-CONFIG_CLEANCACHE=y
CONFIG_ZPOOL=m
CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_BINFMT_AOUT=m
CONFIG_BINFMT_MISC=m
# CONFIG_COMPACTION is not set
-CONFIG_CLEANCACHE=y
CONFIG_ZPOOL=m
CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_BINFMT_AOUT=m
CONFIG_BINFMT_MISC=m
# CONFIG_COMPACTION is not set
-CONFIG_CLEANCACHE=y
CONFIG_ZPOOL=m
CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_BINFMT_AOUT=m
CONFIG_BINFMT_MISC=m
# CONFIG_COMPACTION is not set
-CONFIG_CLEANCACHE=y
CONFIG_ZPOOL=m
CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_BINFMT_AOUT=m
CONFIG_BINFMT_MISC=m
# CONFIG_COMPACTION is not set
-CONFIG_CLEANCACHE=y
CONFIG_ZPOOL=m
CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_BINFMT_AOUT=m
CONFIG_BINFMT_MISC=m
# CONFIG_COMPACTION is not set
-CONFIG_CLEANCACHE=y
CONFIG_ZPOOL=m
CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_BINFMT_AOUT=m
CONFIG_BINFMT_MISC=m
# CONFIG_COMPACTION is not set
-CONFIG_CLEANCACHE=y
CONFIG_ZPOOL=m
CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_BINFMT_AOUT=m
CONFIG_BINFMT_MISC=m
# CONFIG_COMPACTION is not set
-CONFIG_CLEANCACHE=y
CONFIG_ZPOOL=m
CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_BINFMT_AOUT=m
CONFIG_BINFMT_MISC=m
# CONFIG_COMPACTION is not set
-CONFIG_CLEANCACHE=y
CONFIG_ZPOOL=m
CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_BINFMT_AOUT=m
CONFIG_BINFMT_MISC=m
# CONFIG_COMPACTION is not set
-CONFIG_CLEANCACHE=y
CONFIG_ZPOOL=m
CONFIG_NET=y
CONFIG_PACKET=y
#include <asm-generic/bitops/le.h>
#endif /* __KERNEL__ */
-#include <asm-generic/bitops/find.h>
-
#endif /* _M68K_BITOPS_H */
# What CPU version are we building for, and crack it open
# as major.minor.rev
-CPU_VER := $(shell echo $(CONFIG_XILINX_MICROBLAZE0_HW_VER))
-CPU_MAJOR := $(shell echo $(CPU_VER) | cut -d '.' -f 1)
-CPU_MINOR := $(shell echo $(CPU_VER) | cut -d '.' -f 2)
-CPU_REV := $(shell echo $(CPU_VER) | cut -d '.' -f 3)
+CPU_VER := $(CONFIG_XILINX_MICROBLAZE0_HW_VER)
+CPU_MAJOR := $(word 1, $(subst ., , $(CPU_VER)))
+CPU_MINOR := $(word 2, $(subst ., , $(CPU_VER)))
+CPU_REV := $(word 3, $(subst ., , $(CPU_VER)))
export CPU_VER CPU_MAJOR CPU_MINOR CPU_REV
select GENERIC_ATOMIC64 if !64BIT
select GENERIC_CMOS_UPDATE
select GENERIC_CPU_AUTOPROBE
- select GENERIC_FIND_FIRST_BIT
select GENERIC_GETTIMEOFDAY
select GENERIC_IOMAP
select GENERIC_IRQ_PROBE
bool "NUMA Support"
depends on SYS_SUPPORTS_NUMA
select SMP
+ select HAVE_SETUP_PER_CPU_AREA
+ select NEED_PER_CPU_EMBED_FIRST_CHUNK
help
Say Y to compile the kernel to support NUMA (Non-Uniform Memory
Access). This option improves performance on systems with more
config SYS_SUPPORTS_NUMA
bool
-config HAVE_SETUP_PER_CPU_AREA
- def_bool y
- depends on NUMA
-
-config NEED_PER_CPU_EMBED_FIRST_CHUNK
- def_bool y
- depends on NUMA
-
config RELOCATABLE
bool "Relocatable kernel"
depends on SYS_SUPPORTS_RELOCATABLE
$(call if_changed,objcopy)
tool_$(CONFIG_KERNEL_GZIP) = gzip
-tool_$(CONFIG_KERNEL_BZIP2) = bzip2
-tool_$(CONFIG_KERNEL_LZ4) = lz4
-tool_$(CONFIG_KERNEL_LZMA) = lzma
-tool_$(CONFIG_KERNEL_LZO) = lzo
-tool_$(CONFIG_KERNEL_XZ) = xzkern
-tool_$(CONFIG_KERNEL_ZSTD) = zstd22
+tool_$(CONFIG_KERNEL_BZIP2) = bzip2_with_size
+tool_$(CONFIG_KERNEL_LZ4) = lz4_with_size
+tool_$(CONFIG_KERNEL_LZMA) = lzma_with_size
+tool_$(CONFIG_KERNEL_LZO) = lzo_with_size
+tool_$(CONFIG_KERNEL_XZ) = xzkern_with_size
+tool_$(CONFIG_KERNEL_ZSTD) = zstd22_with_size
targets += vmlinux.bin.z
}
#include <asm-generic/bitops/ffz.h>
-#include <asm-generic/bitops/find.h>
#ifdef __KERNEL__
return node_distance(cpu_to_node(from), cpu_to_node(to));
}
-static void * __init pcpu_fc_alloc(unsigned int cpu, size_t size,
- size_t align)
+static int __init pcpu_cpu_to_node(int cpu)
{
- return memblock_alloc_try_nid(size, align, __pa(MAX_DMA_ADDRESS),
- MEMBLOCK_ALLOC_ACCESSIBLE,
- cpu_to_node(cpu));
-}
-
-static void __init pcpu_fc_free(void *ptr, size_t size)
-{
- memblock_free(ptr, size);
+ return cpu_to_node(cpu);
}
void __init setup_per_cpu_areas(void)
rc = pcpu_embed_first_chunk(PERCPU_MODULE_RESERVE,
PERCPU_DYNAMIC_RESERVE, PAGE_SIZE,
pcpu_cpu_distance,
- pcpu_fc_alloc, pcpu_fc_free);
+ pcpu_cpu_to_node);
if (rc < 0)
panic("Failed to initialize percpu areas.");
core-$(CONFIG_FPU) += arch/nds32/math-emu/
libs-y += arch/nds32/lib/
-ifneq '$(CONFIG_NDS32_BUILTIN_DTB)' '""'
-BUILTIN_DTB := y
-else
-BUILTIN_DTB := n
-endif
-
ifdef CONFIG_CPU_LITTLE_ENDIAN
KBUILD_CFLAGS += $(call cc-option, -EL)
KBUILD_AFLAGS += $(call cc-option, -EL)
# SPDX-License-Identifier: GPL-2.0-only
-ifneq '$(CONFIG_NDS32_BUILTIN_DTB)' '""'
-BUILTIN_DTB := $(patsubst "%",%,$(CONFIG_NDS32_BUILTIN_DTB)).dtb.o
-else
-BUILTIN_DTB :=
-endif
-obj-$(CONFIG_OF) += $(BUILTIN_DTB)
+obj-$(CONFIG_OF) += $(addsuffix .dtb.o, $(CONFIG_NDS32_BUILTIN_DTB))
# SPDX-License-Identifier: GPL-2.0
-obj-y := $(patsubst "%.dts",%.dtb.o,$(CONFIG_NIOS2_DTB_SOURCE))
+obj-y := $(patsubst %.dts,%.dtb.o,$(CONFIG_NIOS2_DTB_SOURCE))
dtstree := $(srctree)/$(src)
dtb-$(CONFIG_OF_ALL_DTBS) := $(patsubst $(dtstree)/%.dts,%.dtb, $(wildcard $(dtstree)/*.dts))
# SPDX-License-Identifier: GPL-2.0
-ifneq '$(CONFIG_OPENRISC_BUILTIN_DTB)' '""'
-BUILTIN_DTB := $(patsubst "%",%,$(CONFIG_OPENRISC_BUILTIN_DTB)).dtb.o
-else
-BUILTIN_DTB :=
-endif
-obj-y += $(BUILTIN_DTB)
+obj-y += $(addsuffix .dtb.o, $(CONFIG_OPENRISC_BUILTIN_DTB))
#DTC_FLAGS ?= -p 1024
#include <asm/bitops/fls.h>
#include <asm/bitops/__fls.h>
#include <asm-generic/bitops/fls64.h>
-#include <asm-generic/bitops/find.h>
#ifndef _LINUX_BITOPS_H
#error only <linux/bitops.h> can be included directly
$(obj)/vmlinux.bin: vmlinux FORCE
$(call if_changed,objcopy)
-vmlinux.bin.all-y := $(obj)/vmlinux.bin
-
suffix-$(CONFIG_KERNEL_GZIP) := gz
suffix-$(CONFIG_KERNEL_BZIP2) := bz2
suffix-$(CONFIG_KERNEL_LZ4) := lz4
suffix-$(CONFIG_KERNEL_LZO) := lzo
suffix-$(CONFIG_KERNEL_XZ) := xz
-$(obj)/vmlinux.bin.gz: $(vmlinux.bin.all-y) FORCE
+$(obj)/vmlinux.bin.gz: $(obj)/vmlinux.bin FORCE
$(call if_changed,gzip)
-$(obj)/vmlinux.bin.bz2: $(vmlinux.bin.all-y) FORCE
- $(call if_changed,bzip2)
-$(obj)/vmlinux.bin.lz4: $(vmlinux.bin.all-y) FORCE
- $(call if_changed,lz4)
-$(obj)/vmlinux.bin.lzma: $(vmlinux.bin.all-y) FORCE
- $(call if_changed,lzma)
-$(obj)/vmlinux.bin.lzo: $(vmlinux.bin.all-y) FORCE
- $(call if_changed,lzo)
-$(obj)/vmlinux.bin.xz: $(vmlinux.bin.all-y) FORCE
- $(call if_changed,xzkern)
+$(obj)/vmlinux.bin.bz2: $(obj)/vmlinux.bin FORCE
+ $(call if_changed,bzip2_with_size)
+$(obj)/vmlinux.bin.lz4: $(obj)/vmlinux.bin FORCE
+ $(call if_changed,lz4_with_size)
+$(obj)/vmlinux.bin.lzma: $(obj)/vmlinux.bin FORCE
+ $(call if_changed,lzma_with_size)
+$(obj)/vmlinux.bin.lzo: $(obj)/vmlinux.bin FORCE
+ $(call if_changed,lzo_with_size)
+$(obj)/vmlinux.bin.xz: $(obj)/vmlinux.bin FORCE
+ $(call if_changed,xzkern_with_size)
LDFLAGS_piggy.o := -r --format binary --oformat $(LD_BFD) -T
$(obj)/piggy.o: $(obj)/vmlinux.scr $(obj)/vmlinux.bin.$(suffix-y) FORCE
#include <asm-generic/bitops/hweight.h>
#include <asm-generic/bitops/lock.h>
#include <asm-generic/bitops/sched.h>
-#include <asm-generic/bitops/find.h>
#include <asm-generic/bitops/le.h>
#include <asm-generic/bitops/ext2-atomic-setbit.h>
extern int running_on_qemu;
+extern void __noreturn toc_intr(struct pt_regs *regs);
extern void toc_handler(void);
extern unsigned int toc_handler_size;
extern unsigned int toc_handler_csum;
void __init setup_cmdline(char **cmdline_p)
{
extern unsigned int boot_args[];
+ char *p;
/* Collect stuff passed in from the boot loader */
/* called from hpux boot loader */
boot_command_line[0] = '\0';
} else {
- strlcpy(boot_command_line, (char *)__va(boot_args[1]),
+ strscpy(boot_command_line, (char *)__va(boot_args[1]),
COMMAND_LINE_SIZE);
+ /* autodetect console type (if not done by palo yet) */
+ p = boot_command_line;
+ if (!str_has_prefix(p, "console=") && !strstr(p, " console=")) {
+ strlcat(p, " console=", COMMAND_LINE_SIZE);
+ if (PAGE0->mem_cons.cl_class == CL_DUPLEX)
+ strlcat(p, "ttyS0", COMMAND_LINE_SIZE);
+ else
+ strlcat(p, "tty0", COMMAND_LINE_SIZE);
+ }
+
#ifdef CONFIG_BLK_DEV_INITRD
if (boot_args[2] != 0) /* did palo pass us a ramdisk? */
{
#endif
}
- strcpy(command_line, boot_command_line);
+ strscpy(command_line, boot_command_line, COMMAND_LINE_SIZE);
*cmdline_p = command_line;
}
#include <asm/pdc.h>
#include <asm/pdc_chassis.h>
#include <asm/ldcw.h>
+#include <asm/processor.h>
static unsigned int __aligned(16) toc_lock = 1;
-DEFINE_PER_CPU_PAGE_ALIGNED(char [16384], toc_stack);
+DEFINE_PER_CPU_PAGE_ALIGNED(char [16384], toc_stack) __visible;
static void toc20_to_pt_regs(struct pt_regs *regs, struct pdc_toc_pim_20 *toc)
{
default 9 if PPC_16K_PAGES # 9 = 23 (8MB) - 14 (16K)
default 11 # 11 = 23 (8MB) - 12 (4K)
-config HAVE_SETUP_PER_CPU_AREA
- def_bool PPC64
-
-config NEED_PER_CPU_EMBED_FIRST_CHUNK
- def_bool y if PPC64
-
-config NEED_PER_CPU_PAGE_FIRST_CHUNK
- def_bool y if PPC64
-
config NR_IRQS
int "Number of virtual interrupt numbers"
range 32 1048576
select HAVE_REGS_AND_STACK_ACCESS_API
select HAVE_RELIABLE_STACKTRACE
select HAVE_RSEQ
+ select HAVE_SETUP_PER_CPU_AREA if PPC64
select HAVE_SOFTIRQ_ON_OWN_STACK
select HAVE_STACKPROTECTOR if PPC32 && $(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r2)
select HAVE_STACKPROTECTOR if PPC64 && $(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r13)
select MMU_GATHER_RCU_TABLE_FREE
select MODULES_USE_ELF_RELA
select NEED_DMA_MAP_STATE if PPC64 || NOT_COHERENT_CACHE
+ select NEED_PER_CPU_EMBED_FIRST_CHUNK if PPC64
+ select NEED_PER_CPU_PAGE_FIRST_CHUNK if PPC64
select NEED_SG_DMA_LENGTH
select OF
select OF_DMA_DEFAULT_COHERENT if !NOT_COHERENT_CACHE
bool "NUMA Memory Allocation and Scheduler Support"
depends on PPC64 && SMP
default y if PPC_PSERIES || PPC_POWERNV
+ select USE_PERCPU_NUMA_NODE_ID
help
Enable NUMA (Non-Uniform Memory Access) support.
default "4"
depends on NUMA
-config USE_PERCPU_NUMA_NODE_ID
- def_bool y
- depends on NUMA
-
config HAVE_MEMORYLESS_NODES
def_bool y
depends on NUMA
endif
# Allow extra targets to be added to the defconfig
-image-y += $(subst ",,$(CONFIG_EXTRA_TARGETS))
+image-y += $(CONFIG_EXTRA_TARGETS)
initrd- := $(patsubst zImage%, zImage.initrd%, $(image-))
initrd-y := $(patsubst zImage%, zImage.initrd%, \
#size-cells = <0>;
compatible = "fsl,fman-memac-mdio", "fsl,fman-xmdio";
reg = <0xfc000 0x1000>;
+ fsl,erratum-a009885;
};
xmdio0: mdio@fd000 {
#size-cells = <0>;
compatible = "fsl,fman-memac-mdio", "fsl,fman-xmdio";
reg = <0xfd000 0x1000>;
+ fsl,erratum-a009885;
};
};
interrupts = <14>;
};
+ srnprot@d800060 {
+ compatible = "nintendo,hollywood-srnprot";
+ reg = <0x0d800060 0x4>;
+ };
+
GPIO: gpio@d8000c0 {
#gpio-cells = <2>;
compatible = "nintendo,hollywood-gpio";
CONFIG_SND_SEQUENCER_OSS=y
# CONFIG_USB_SUPPORT is not set
CONFIG_RTC_CLASS=y
-CONFIG_RTC_DRV_GENERIC=y
+CONFIG_RTC_DRV_GAMECUBE=y
CONFIG_EXT2_FS=y
CONFIG_EXT4_FS=y
CONFIG_ISO9660_FS=y
CONFIG_LEDS_TRIGGER_HEARTBEAT=y
CONFIG_LEDS_TRIGGER_PANIC=y
CONFIG_RTC_CLASS=y
-CONFIG_RTC_DRV_GENERIC=y
+CONFIG_RTC_DRV_GAMECUBE=y
CONFIG_NVMEM_NINTENDO_OTP=y
CONFIG_EXT2_FS=y
CONFIG_EXT4_FS=y
#include <asm-generic/bitops/hweight.h>
#endif
-#include <asm-generic/bitops/find.h>
-
/* wrappers that deal with KASAN instrumentation */
#include <asm-generic/bitops/instrumented-atomic.h>
#include <asm-generic/bitops/instrumented-lock.h>
loff_t *ppos)
{
return simple_read_from_buffer(buf, nbytes, ppos,
- PDE_DATA(file_inode(file)), PAGE_SIZE);
+ pde_data(file_inode(file)), PAGE_SIZE);
}
static int page_map_mmap( struct file *file, struct vm_area_struct *vma )
return -EINVAL;
remap_pfn_range(vma, vma->vm_start,
- __pa(PDE_DATA(file_inode(file))) >> PAGE_SHIFT,
+ __pa(pde_data(file_inode(file))) >> PAGE_SHIFT,
PAGE_SIZE, vma->vm_page_prot);
return 0;
}
}
#ifdef CONFIG_SMP
-/**
- * pcpu_alloc_bootmem - NUMA friendly alloc_bootmem wrapper for percpu
- * @cpu: cpu to allocate for
- * @size: size allocation in bytes
- * @align: alignment
- *
- * Allocate @size bytes aligned at @align for cpu @cpu. This wrapper
- * does the right thing for NUMA regardless of the current
- * configuration.
- *
- * RETURNS:
- * Pointer to the allocated area on success, NULL on failure.
- */
-static void * __init pcpu_alloc_bootmem(unsigned int cpu, size_t size,
- size_t align)
-{
- const unsigned long goal = __pa(MAX_DMA_ADDRESS);
-#ifdef CONFIG_NUMA
- int node = early_cpu_to_node(cpu);
- void *ptr;
-
- if (!node_online(node) || !NODE_DATA(node)) {
- ptr = memblock_alloc_from(size, align, goal);
- pr_info("cpu %d has no node %d or node-local memory\n",
- cpu, node);
- pr_debug("per cpu data for cpu%d %lu bytes at %016lx\n",
- cpu, size, __pa(ptr));
- } else {
- ptr = memblock_alloc_try_nid(size, align, goal,
- MEMBLOCK_ALLOC_ACCESSIBLE, node);
- pr_debug("per cpu data for cpu%d %lu bytes on node%d at "
- "%016lx\n", cpu, size, node, __pa(ptr));
- }
- return ptr;
-#else
- return memblock_alloc_from(size, align, goal);
-#endif
-}
-
-static void __init pcpu_free_bootmem(void *ptr, size_t size)
-{
- memblock_free(ptr, size);
-}
-
static int pcpu_cpu_distance(unsigned int from, unsigned int to)
{
if (early_cpu_to_node(from) == early_cpu_to_node(to))
return REMOTE_DISTANCE;
}
-unsigned long __per_cpu_offset[NR_CPUS] __read_mostly;
-EXPORT_SYMBOL(__per_cpu_offset);
-
-static void __init pcpu_populate_pte(unsigned long addr)
+static __init int pcpu_cpu_to_node(int cpu)
{
- pgd_t *pgd = pgd_offset_k(addr);
- p4d_t *p4d;
- pud_t *pud;
- pmd_t *pmd;
-
- p4d = p4d_offset(pgd, addr);
- if (p4d_none(*p4d)) {
- pud_t *new;
-
- new = memblock_alloc(PUD_TABLE_SIZE, PUD_TABLE_SIZE);
- if (!new)
- goto err_alloc;
- p4d_populate(&init_mm, p4d, new);
- }
-
- pud = pud_offset(p4d, addr);
- if (pud_none(*pud)) {
- pmd_t *new;
-
- new = memblock_alloc(PMD_TABLE_SIZE, PMD_TABLE_SIZE);
- if (!new)
- goto err_alloc;
- pud_populate(&init_mm, pud, new);
- }
-
- pmd = pmd_offset(pud, addr);
- if (!pmd_present(*pmd)) {
- pte_t *new;
-
- new = memblock_alloc(PTE_TABLE_SIZE, PTE_TABLE_SIZE);
- if (!new)
- goto err_alloc;
- pmd_populate_kernel(&init_mm, pmd, new);
- }
-
- return;
-
-err_alloc:
- panic("%s: Failed to allocate %lu bytes align=%lx from=%lx\n",
- __func__, PAGE_SIZE, PAGE_SIZE, PAGE_SIZE);
+ return early_cpu_to_node(cpu);
}
+unsigned long __per_cpu_offset[NR_CPUS] __read_mostly;
+EXPORT_SYMBOL(__per_cpu_offset);
void __init setup_per_cpu_areas(void)
{
if (pcpu_chosen_fc != PCPU_FC_PAGE) {
rc = pcpu_embed_first_chunk(0, dyn_size, atom_size, pcpu_cpu_distance,
- pcpu_alloc_bootmem, pcpu_free_bootmem);
+ pcpu_cpu_to_node);
if (rc)
pr_warn("PERCPU: %s allocator failed (%d), "
"falling back to page size\n",
}
if (rc < 0)
- rc = pcpu_page_first_chunk(0, pcpu_alloc_bootmem, pcpu_free_bootmem,
- pcpu_populate_pte);
+ rc = pcpu_page_first_chunk(0, pcpu_cpu_to_node);
if (rc < 0)
panic("cannot initialize percpu area (err=%d)", rc);
int bit;
retry:
- bit = find_next_bit(flags_free, MAX_FLAGS, 0);
+ bit = find_first_bit(flags_free, MAX_FLAGS);
if (bit >= MAX_FLAGS)
return -ENOSPC;
if (!test_and_clear_bit(bit, flags_free))
int bit;
retry:
- bit = find_next_bit(fun_free, MAX_FLAGS, 0);
+ bit = find_first_bit(fun_free, MAX_FLAGS);
if (bit >= MAX_FLAGS)
return -ENOSPC;
if (!test_and_clear_bit(bit, fun_free))
def_bool y
select ARCH_CLOCKSOURCE_INIT
select ARCH_ENABLE_HUGEPAGE_MIGRATION if HUGETLB_PAGE && MIGRATION
+ select ARCH_ENABLE_SPLIT_PMD_PTLOCK if PGTABLE_LEVELS > 2
select ARCH_HAS_BINFMT_FLAT
select ARCH_HAS_DEBUG_VM_PGTABLE
select ARCH_HAS_DEBUG_VIRTUAL if MMU
select HAVE_ARCH_SECCOMP_FILTER
select HAVE_ARCH_TRACEHOOK
select HAVE_ARCH_TRANSPARENT_HUGEPAGE if 64BIT && MMU
+ select ARCH_ENABLE_THP_MIGRATION if TRANSPARENT_HUGEPAGE
select HAVE_ARCH_THREAD_STRUCT_WHITELIST
select HAVE_ARCH_VMAP_STACK if MMU && 64BIT
select HAVE_ASM_MODVERSIONS
Select if you want MMU-based virtualised addressing space
support by paged memory management. If unsure, say 'Y'.
-config VA_BITS
- int
- default 32 if 32BIT
- default 39 if 64BIT
-
-config PA_BITS
- int
- default 34 if 32BIT
- default 56 if 64BIT
-
config PAGE_OFFSET
hex
- default 0xC0000000 if 32BIT && MAXPHYSMEM_1GB
+ default 0xC0000000 if 32BIT
default 0x80000000 if 64BIT && !MMU
- default 0xffffffff80000000 if 64BIT && MAXPHYSMEM_2GB
- default 0xffffffe000000000 if 64BIT && MAXPHYSMEM_128GB
+ default 0xffffaf8000000000 if 64BIT
config KASAN_SHADOW_OFFSET
hex
depends on KASAN_GENERIC
- default 0xdfffffc800000000 if 64BIT
+ default 0xdfffffff00000000 if 64BIT
default 0xffffffff if 32BIT
config ARCH_FLATMEM_ENABLE
config PGTABLE_LEVELS
int
- default 3 if 64BIT
+ default 4 if 64BIT
default 2
config LOCKDEP_SUPPORT
bool
select HAVE_MOD_ARCH_SPECIFIC
-choice
- prompt "Maximum Physical Memory"
- default MAXPHYSMEM_1GB if 32BIT
- default MAXPHYSMEM_2GB if 64BIT && CMODEL_MEDLOW
- default MAXPHYSMEM_128GB if 64BIT && CMODEL_MEDANY
-
- config MAXPHYSMEM_1GB
- depends on 32BIT
- bool "1GiB"
- config MAXPHYSMEM_2GB
- depends on 64BIT && CMODEL_MEDLOW
- bool "2GiB"
- config MAXPHYSMEM_128GB
- depends on 64BIT && CMODEL_MEDANY
- bool "128GiB"
-endchoice
-
-
config SMP
bool "Symmetric Multi-Processing"
help
select GENERIC_ARCH_NUMA
select OF_NUMA
select ARCH_SUPPORTS_NUMA_BALANCING
+ select USE_PERCPU_NUMA_NODE_ID
+ select NEED_PER_CPU_EMBED_FIRST_CHUNK
help
Enable NUMA (Non-Uniform Memory Access) support.
Specify the maximum number of NUMA Nodes available on the target
system. Increases memory reserved to accommodate various tables.
-config USE_PERCPU_NUMA_NODE_ID
- def_bool y
- depends on NUMA
-
-config NEED_PER_CPU_EMBED_FIRST_CHUNK
- def_bool y
- depends on NUMA
-
config RISCV_ISA_C
bool "Emit compressed instructions when building Linux"
default y
config RISCV_SBI_V01
bool "SBI v0.1 support"
- default y
depends on RISCV_SBI
help
This config allows kernel to use SBI v0.1 APIs. This will be
deprecated in future once legacy M-mode software are no longer in use.
+config RISCV_BOOT_SPINWAIT
+ bool "Spinwait booting method"
+ depends on SMP
+ default y
+ help
+ This enables support for booting Linux via spinwait method. In the
+ spinwait method, all cores randomly jump to Linux. One of the cores
+ gets chosen via lottery and all other keep spinning on a percpu
+ variable. This method cannot support CPU hotplug and sparse hartid
+ scheme. It should be only enabled for M-mode Linux or platforms relying
+ on older firmware without SBI HSM extension. All other platforms should
+ rely on ordered booting via SBI HSM extension which gets chosen
+ dynamically at runtime if the firmware supports it.
+
config KEXEC
bool "Kexec system call"
select KEXEC_CORE
# SPDX-License-Identifier: GPL-2.0
-ifneq ($(CONFIG_SOC_CANAAN_K210_DTB_SOURCE),"")
-dtb-y += $(strip $(shell echo $(CONFIG_SOC_CANAAN_K210_DTB_SOURCE))).dtb
+dtb-$(CONFIG_SOC_CANAAN_K210_DTB_BUILTIN) += $(addsuffix .dtb, $(CONFIG_SOC_CANAAN_K210_DTB_SOURCE))
obj-$(CONFIG_SOC_CANAAN_K210_DTB_BUILTIN) += $(addsuffix .o, $(dtb-y))
-endif
clint0: timer@2000000 {
compatible = "canaan,k210-clint", "sifive,clint0";
reg = <0x2000000 0xC000>;
- interrupts-extended = <&cpu0_intc 3 &cpu0_intc 7
- &cpu1_intc 3 &cpu1_intc 7>;
+ interrupts-extended = <&cpu0_intc 3>, <&cpu0_intc 7>,
+ <&cpu1_intc 3>, <&cpu1_intc 7>;
};
plic0: interrupt-controller@c000000 {
compatible = "canaan,k210-plic", "sifive,plic-1.0.0";
reg = <0xC000000 0x4000000>;
interrupt-controller;
- interrupts-extended = <&cpu0_intc 11 &cpu1_intc 11>;
+ interrupts-extended = <&cpu0_intc 11>, <&cpu1_intc 11>;
riscv,ndev = <65>;
};
compatible = "canaan,k210-gpiohs", "sifive,gpio0";
reg = <0x38001000 0x1000>;
interrupt-controller;
- interrupts = <34 35 36 37 38 39 40 41
- 42 43 44 45 46 47 48 49
- 50 51 52 53 54 55 56 57
- 58 59 60 61 62 63 64 65>;
+ interrupts = <34>, <35>, <36>, <37>, <38>, <39>, <40>,
+ <41>, <42>, <43>, <44>, <45>, <46>, <47>,
+ <48>, <49>, <50>, <51>, <52>, <53>, <54>,
+ <55>, <56>, <57>, <58>, <59>, <60>, <61>,
+ <62>, <63>, <64>, <65>;
gpio-controller;
ngpios = <32>;
};
dmac0: dma-controller@50000000 {
compatible = "snps,axi-dma-1.01a";
reg = <0x50000000 0x1000>;
- interrupts = <27 28 29 30 31 32>;
+ interrupts = <27>, <28>, <29>, <30>, <31>, <32>;
#dma-cells = <1>;
clocks = <&sysclk K210_CLK_DMA>, <&sysclk K210_CLK_DMA>;
clock-names = "core-clk", "cfgr-clk";
timer0: timer@502d0000 {
compatible = "snps,dw-apb-timer";
reg = <0x502D0000 0x100>;
- interrupts = <14 15>;
+ interrupts = <14>, <15>;
clocks = <&sysclk K210_CLK_TIMER0>,
<&sysclk K210_CLK_APB0>;
clock-names = "timer", "pclk";
timer1: timer@502e0000 {
compatible = "snps,dw-apb-timer";
reg = <0x502E0000 0x100>;
- interrupts = <16 17>;
+ interrupts = <16>, <17>;
clocks = <&sysclk K210_CLK_TIMER1>,
<&sysclk K210_CLK_APB0>;
clock-names = "timer", "pclk";
timer2: timer@502f0000 {
compatible = "snps,dw-apb-timer";
reg = <0x502F0000 0x100>;
- interrupts = <18 19>;
+ interrupts = <18>, <19>;
clocks = <&sysclk K210_CLK_TIMER2>,
<&sysclk K210_CLK_APB0>;
clock-names = "timer", "pclk";
};
&spi3 {
- spi-flash@0 {
+ flash@0 {
compatible = "jedec,spi-nor";
reg = <0>;
spi-max-frequency = <50000000>;
};
&spi3 {
- spi-flash@0 {
+ flash@0 {
compatible = "jedec,spi-nor";
reg = <0>;
spi-max-frequency = <50000000>;
};
&spi3 {
- spi-flash@0 {
+ flash@0 {
compatible = "jedec,spi-nor";
reg = <0>;
spi-max-frequency = <50000000>;
};
&spi3 {
- spi-flash@0 {
+ flash@0 {
compatible = "jedec,spi-nor";
reg = <0>;
spi-max-frequency = <50000000>;
};
};
+&refclk {
+ clock-frequency = <600000000>;
+};
+
&serial0 {
status = "okay";
};
model = "Microchip PolarFire SoC";
compatible = "microchip,mpfs";
- chosen {
- };
-
cpus {
#address-cells = <1>;
#size-cells = <0>;
};
};
+ refclk: msspllclk {
+ compatible = "fixed-clock";
+ #clock-cells = <0>;
+ };
+
soc {
#address-cells = <2>;
#size-cells = <2>;
cache-size = <2097152>;
cache-unified;
interrupt-parent = <&plic>;
- interrupts = <1 2 3>;
+ interrupts = <1>, <2>, <3>;
reg = <0x0 0x2010000 0x0 0x1000>;
};
clint@2000000 {
compatible = "sifive,fu540-c000-clint", "sifive,clint0";
reg = <0x0 0x2000000 0x0 0xC000>;
- interrupts-extended = <&cpu0_intc 3 &cpu0_intc 7
- &cpu1_intc 3 &cpu1_intc 7
- &cpu2_intc 3 &cpu2_intc 7
- &cpu3_intc 3 &cpu3_intc 7
- &cpu4_intc 3 &cpu4_intc 7>;
+ interrupts-extended = <&cpu0_intc 3>, <&cpu0_intc 7>,
+ <&cpu1_intc 3>, <&cpu1_intc 7>,
+ <&cpu2_intc 3>, <&cpu2_intc 7>,
+ <&cpu3_intc 3>, <&cpu3_intc 7>,
+ <&cpu4_intc 3>, <&cpu4_intc 7>;
};
plic: interrupt-controller@c000000 {
- #interrupt-cells = <1>;
compatible = "sifive,fu540-c000-plic", "sifive,plic-1.0.0";
reg = <0x0 0xc000000 0x0 0x4000000>;
- riscv,ndev = <186>;
+ #address-cells = <0>;
+ #interrupt-cells = <1>;
interrupt-controller;
- interrupts-extended = <&cpu0_intc 11
- &cpu1_intc 11 &cpu1_intc 9
- &cpu2_intc 11 &cpu2_intc 9
- &cpu3_intc 11 &cpu3_intc 9
- &cpu4_intc 11 &cpu4_intc 9>;
+ interrupts-extended = <&cpu0_intc 11>,
+ <&cpu1_intc 11>, <&cpu1_intc 9>,
+ <&cpu2_intc 11>, <&cpu2_intc 9>,
+ <&cpu3_intc 11>, <&cpu3_intc 9>,
+ <&cpu4_intc 11>, <&cpu4_intc 9>;
+ riscv,ndev = <186>;
};
dma@3000000 {
compatible = "sifive,fu540-c000-pdma";
reg = <0x0 0x3000000 0x0 0x8000>;
interrupt-parent = <&plic>;
- interrupts = <23 24 25 26 27 28 29 30>;
+ interrupts = <23>, <24>, <25>, <26>, <27>, <28>, <29>,
+ <30>;
#dma-cells = <1>;
};
- refclk: refclk {
- compatible = "fixed-clock";
- #clock-cells = <0>;
- clock-frequency = <600000000>;
- clock-output-names = "msspllclk";
- };
-
clkcfg: clkcfg@20002000 {
compatible = "microchip,mpfs-clkcfg";
reg = <0x0 0x20002000 0x0 0x1000>;
- reg-names = "mss_sysreg";
clocks = <&refclk>;
#clock-cells = <1>;
- clock-output-names = "cpu", "axi", "ahb", "envm", /* 0-3 */
- "mac0", "mac1", "mmc", "timer", /* 4-7 */
- "mmuart0", "mmuart1", "mmuart2", "mmuart3", /* 8-11 */
- "mmuart4", "spi0", "spi1", "i2c0", /* 12-15 */
- "i2c1", "can0", "can1", "usb", /* 16-19 */
- "rsvd", "rtc", "qspi", "gpio0", /* 20-23 */
- "gpio1", "gpio2", "ddrc", "fic0", /* 24-27 */
- "fic1", "fic2", "fic3", "athena", "cfm"; /* 28-32 */
};
serial0: serial@20000000 {
compatible = "microchip,mpfs-sd4hc", "cdns,sd4hc";
reg = <0x0 0x20008000 0x0 0x1000>;
interrupt-parent = <&plic>;
- interrupts = <88 89>;
+ interrupts = <88>, <89>;
clocks = <&clkcfg 6>;
max-frequency = <200000000>;
status = "disabled";
compatible = "cdns,macb";
reg = <0x0 0x20110000 0x0 0x2000>;
interrupt-parent = <&plic>;
- interrupts = <64 65 66 67>;
+ interrupts = <64>, <65>, <66>, <67>;
local-mac-address = [00 00 00 00 00 00];
clocks = <&clkcfg 4>, <&clkcfg 2>;
clock-names = "pclk", "hclk";
compatible = "cdns,macb";
reg = <0x0 0x20112000 0x0 0x2000>;
interrupt-parent = <&plic>;
- interrupts = <70 71 72 73>;
+ interrupts = <70>, <71>, <72>, <73>;
local-mac-address = [00 00 00 00 00 00];
clocks = <&clkcfg 5>, <&clkcfg 2>;
status = "disabled";
soc {
#address-cells = <2>;
#size-cells = <2>;
- compatible = "sifive,fu540-c000", "sifive,fu540", "simple-bus";
+ compatible = "simple-bus";
ranges;
plic0: interrupt-controller@c000000 {
- #interrupt-cells = <1>;
compatible = "sifive,fu540-c000-plic", "sifive,plic-1.0.0";
reg = <0x0 0xc000000 0x0 0x4000000>;
- riscv,ndev = <53>;
+ #address-cells = <0>;
+ #interrupt-cells = <1>;
interrupt-controller;
- interrupts-extended = <
- &cpu0_intc 0xffffffff
- &cpu1_intc 0xffffffff &cpu1_intc 9
- &cpu2_intc 0xffffffff &cpu2_intc 9
- &cpu3_intc 0xffffffff &cpu3_intc 9
- &cpu4_intc 0xffffffff &cpu4_intc 9>;
+ interrupts-extended =
+ <&cpu0_intc 0xffffffff>,
+ <&cpu1_intc 0xffffffff>, <&cpu1_intc 9>,
+ <&cpu2_intc 0xffffffff>, <&cpu2_intc 9>,
+ <&cpu3_intc 0xffffffff>, <&cpu3_intc 9>,
+ <&cpu4_intc 0xffffffff>, <&cpu4_intc 9>;
+ riscv,ndev = <53>;
};
prci: clock-controller@10000000 {
compatible = "sifive,fu540-c000-prci";
compatible = "sifive,fu540-c000-pdma";
reg = <0x0 0x3000000 0x0 0x8000>;
interrupt-parent = <&plic0>;
- interrupts = <23 24 25 26 27 28 29 30>;
+ interrupts = <23>, <24>, <25>, <26>, <27>, <28>, <29>,
+ <30>;
#dma-cells = <1>;
};
uart1: serial@10011000 {
};
qspi0: spi@10040000 {
compatible = "sifive,fu540-c000-spi", "sifive,spi0";
- reg = <0x0 0x10040000 0x0 0x1000
- 0x0 0x20000000 0x0 0x10000000>;
+ reg = <0x0 0x10040000 0x0 0x1000>,
+ <0x0 0x20000000 0x0 0x10000000>;
interrupt-parent = <&plic0>;
interrupts = <51>;
clocks = <&prci PRCI_CLK_TLCLK>;
};
qspi1: spi@10041000 {
compatible = "sifive,fu540-c000-spi", "sifive,spi0";
- reg = <0x0 0x10041000 0x0 0x1000
- 0x0 0x30000000 0x0 0x10000000>;
+ reg = <0x0 0x10041000 0x0 0x1000>,
+ <0x0 0x30000000 0x0 0x10000000>;
interrupt-parent = <&plic0>;
interrupts = <52>;
clocks = <&prci PRCI_CLK_TLCLK>;
compatible = "sifive,fu540-c000-gem";
interrupt-parent = <&plic0>;
interrupts = <53>;
- reg = <0x0 0x10090000 0x0 0x2000
- 0x0 0x100a0000 0x0 0x1000>;
+ reg = <0x0 0x10090000 0x0 0x2000>,
+ <0x0 0x100a0000 0x0 0x1000>;
local-mac-address = [00 00 00 00 00 00];
clock-names = "pclk", "hclk";
clocks = <&prci PRCI_CLK_GEMGXLPLL>,
compatible = "sifive,fu540-c000-pwm", "sifive,pwm0";
reg = <0x0 0x10020000 0x0 0x1000>;
interrupt-parent = <&plic0>;
- interrupts = <42 43 44 45>;
+ interrupts = <42>, <43>, <44>, <45>;
clocks = <&prci PRCI_CLK_TLCLK>;
#pwm-cells = <3>;
status = "disabled";
compatible = "sifive,fu540-c000-pwm", "sifive,pwm0";
reg = <0x0 0x10021000 0x0 0x1000>;
interrupt-parent = <&plic0>;
- interrupts = <46 47 48 49>;
+ interrupts = <46>, <47>, <48>, <49>;
clocks = <&prci PRCI_CLK_TLCLK>;
#pwm-cells = <3>;
status = "disabled";
cache-size = <2097152>;
cache-unified;
interrupt-parent = <&plic0>;
- interrupts = <1 2 3>;
+ interrupts = <1>, <2>, <3>;
reg = <0x0 0x2010000 0x0 0x1000>;
};
gpio: gpio@10060000 {
reg = <0x0 0xc000000 0x0 0x4000000>;
riscv,ndev = <69>;
interrupt-controller;
- interrupts-extended = <
- &cpu0_intc 0xffffffff
- &cpu1_intc 0xffffffff &cpu1_intc 9
- &cpu2_intc 0xffffffff &cpu2_intc 9
- &cpu3_intc 0xffffffff &cpu3_intc 9
- &cpu4_intc 0xffffffff &cpu4_intc 9>;
+ interrupts-extended =
+ <&cpu0_intc 0xffffffff>,
+ <&cpu1_intc 0xffffffff>, <&cpu1_intc 9>,
+ <&cpu2_intc 0xffffffff>, <&cpu2_intc 9>,
+ <&cpu3_intc 0xffffffff>, <&cpu3_intc 9>,
+ <&cpu4_intc 0xffffffff>, <&cpu4_intc 9>;
};
prci: clock-controller@10000000 {
compatible = "sifive,fu740-c000-prci";
cache-size = <2097152>;
cache-unified;
interrupt-parent = <&plic0>;
- interrupts = <19 21 22 20>;
+ interrupts = <19>, <21>, <22>, <20>;
reg = <0x0 0x2010000 0x0 0x1000>;
};
gpio: gpio@10060000 {
clock-frequency = <RTCCLK_FREQ>;
clock-output-names = "rtcclk";
};
+
+ gpio-poweroff {
+ compatible = "gpio-poweroff";
+ gpios = <&gpio 2 GPIO_ACTIVE_LOW>;
+ };
};
&uart0 {
CONFIG_POSIX_MQUEUE=y
CONFIG_NO_HZ_IDLE=y
CONFIG_HIGH_RES_TIMERS=y
+CONFIG_BPF_SYSCALL=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_CGROUPS=y
CONFIG_CHECKPOINT_RESTORE=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_EXPERT=y
-CONFIG_BPF_SYSCALL=y
+# CONFIG_SYSFS_SYSCALL is not set
+CONFIG_SOC_MICROCHIP_POLARFIRE=y
CONFIG_SOC_SIFIVE=y
CONFIG_SOC_VIRT=y
-CONFIG_SOC_MICROCHIP_POLARFIRE=y
CONFIG_SMP=y
CONFIG_HOTPLUG_CPU=y
CONFIG_VIRTUALIZATION=y
CONFIG_HW_RANDOM_VIRTIO=y
CONFIG_SPI=y
CONFIG_SPI_SIFIVE=y
+# CONFIG_PTP_1588_CLOCK is not set
CONFIG_GPIOLIB=y
CONFIG_GPIO_SIFIVE=y
-# CONFIG_PTP_1588_CLOCK is not set
-CONFIG_POWER_RESET=y
CONFIG_DRM=m
CONFIG_DRM_RADEON=m
CONFIG_DRM_NOUVEAU=m
CONFIG_DRM_VIRTIO_GPU=m
+CONFIG_FB=y
CONFIG_FRAMEBUFFER_CONSOLE=y
CONFIG_USB=y
CONFIG_USB_XHCI_HCD=y
CONFIG_USB_OHCI_HCD_PLATFORM=y
CONFIG_USB_STORAGE=y
CONFIG_USB_UAS=y
+CONFIG_MMC=y
CONFIG_MMC_SDHCI=y
CONFIG_MMC_SDHCI_PLTFM=y
CONFIG_MMC_SDHCI_CADENCE=y
-CONFIG_MMC=y
CONFIG_MMC_SPI=y
CONFIG_RTC_CLASS=y
CONFIG_VIRTIO_PCI=y
# CONFIG_FTRACE is not set
# CONFIG_RUNTIME_TESTING_MENU is not set
CONFIG_MEMTEST=y
-# CONFIG_SYSFS_SYSCALL is not set
-CONFIG_EFI=y
CONFIG_SLOB=y
# CONFIG_MMU is not set
CONFIG_SOC_CANAAN=y
-CONFIG_SOC_CANAAN_K210_DTB_SOURCE="k210_generic"
-CONFIG_MAXPHYSMEM_2GB=y
CONFIG_SMP=y
CONFIG_NR_CPUS=2
CONFIG_CMDLINE="earlycon console=ttySIF0"
CONFIG_LEDS_USER=y
# CONFIG_VIRTIO_MENU is not set
# CONFIG_VHOST_MENU is not set
-# CONFIG_SURFACE_PLATFORMS is not set
# CONFIG_FILE_LOCKING is not set
# CONFIG_DNOTIFY is not set
# CONFIG_INOTIFY_USER is not set
CONFIG_SLOB=y
# CONFIG_MMU is not set
CONFIG_SOC_CANAAN=y
-CONFIG_SOC_CANAAN_K210_DTB_SOURCE="k210_generic"
-CONFIG_MAXPHYSMEM_2GB=y
CONFIG_SMP=y
CONFIG_NR_CPUS=2
CONFIG_CMDLINE="earlycon console=ttySIF0 rootdelay=2 root=/dev/mmcblk0p1 ro"
# CONFIG_SECCOMP is not set
# CONFIG_STACKPROTECTOR is not set
# CONFIG_GCC_PLUGINS is not set
-# CONFIG_BLK_DEV_BSG is not set
# CONFIG_MQ_IOSCHED_DEADLINE is not set
# CONFIG_MQ_IOSCHED_KYBER is not set
CONFIG_BINFMT_FLAT=y
CONFIG_LEDS_USER=y
# CONFIG_VIRTIO_MENU is not set
# CONFIG_VHOST_MENU is not set
-# CONFIG_SURFACE_PLATFORMS is not set
CONFIG_EXT2_FS=y
# CONFIG_FILE_LOCKING is not set
# CONFIG_DNOTIFY is not set
# CONFIG_VM_EVENT_COUNTERS is not set
# CONFIG_COMPAT_BRK is not set
CONFIG_SLOB=y
-# CONFIG_SLAB_MERGE_DEFAULT is not set
# CONFIG_MMU is not set
CONFIG_SOC_VIRT=y
-CONFIG_MAXPHYSMEM_2GB=y
CONFIG_SMP=y
CONFIG_CMDLINE="root=/dev/vda rw earlycon=uart8250,mmio,0x10000000,115200n8 console=ttyS0"
CONFIG_CMDLINE_FORCE=y
CONFIG_JUMP_LABEL=y
-# CONFIG_BLK_DEV_BSG is not set
CONFIG_PARTITION_ADVANCED=y
# CONFIG_MSDOS_PARTITION is not set
# CONFIG_EFI_PARTITION is not set
CONFIG_POSIX_MQUEUE=y
CONFIG_NO_HZ_IDLE=y
CONFIG_HIGH_RES_TIMERS=y
+CONFIG_BPF_SYSCALL=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_CGROUPS=y
CONFIG_CHECKPOINT_RESTORE=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_EXPERT=y
-CONFIG_BPF_SYSCALL=y
+# CONFIG_SYSFS_SYSCALL is not set
CONFIG_SOC_SIFIVE=y
CONFIG_SOC_VIRT=y
CONFIG_ARCH_RV32I=y
CONFIG_SPI=y
CONFIG_SPI_SIFIVE=y
# CONFIG_PTP_1588_CLOCK is not set
-CONFIG_POWER_RESET=y
CONFIG_DRM=y
CONFIG_DRM_RADEON=y
CONFIG_DRM_VIRTIO_GPU=y
+CONFIG_FB=y
CONFIG_FRAMEBUFFER_CONSOLE=y
CONFIG_USB=y
CONFIG_USB_XHCI_HCD=y
# CONFIG_FTRACE is not set
# CONFIG_RUNTIME_TESTING_MENU is not set
CONFIG_MEMTEST=y
-# CONFIG_SYSFS_SYSCALL is not set
} cpu_mfr_info;
static void (*vendor_patch_func)(struct alt_entry *begin, struct alt_entry *end,
- unsigned long archid, unsigned long impid);
+ unsigned long archid,
+ unsigned long impid) __initdata;
static inline void __init riscv_fill_cpu_mfr_info(void)
{
# SPDX-License-Identifier: GPL-2.0
generic-y += early_ioremap.h
-generic-y += extable.h
generic-y += flat.h
generic-y += kvm_para.h
generic-y += user.h
--- /dev/null
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __ASM_ASM_EXTABLE_H
+#define __ASM_ASM_EXTABLE_H
+
+#define EX_TYPE_NONE 0
+#define EX_TYPE_FIXUP 1
+#define EX_TYPE_BPF 2
+#define EX_TYPE_UACCESS_ERR_ZERO 3
+
+#ifdef __ASSEMBLY__
+
+#define __ASM_EXTABLE_RAW(insn, fixup, type, data) \
+ .pushsection __ex_table, "a"; \
+ .balign 4; \
+ .long ((insn) - .); \
+ .long ((fixup) - .); \
+ .short (type); \
+ .short (data); \
+ .popsection;
+
+ .macro _asm_extable, insn, fixup
+ __ASM_EXTABLE_RAW(\insn, \fixup, EX_TYPE_FIXUP, 0)
+ .endm
+
+#else /* __ASSEMBLY__ */
+
+#include <linux/bits.h>
+#include <linux/stringify.h>
+#include <asm/gpr-num.h>
+
+#define __ASM_EXTABLE_RAW(insn, fixup, type, data) \
+ ".pushsection __ex_table, \"a\"\n" \
+ ".balign 4\n" \
+ ".long ((" insn ") - .)\n" \
+ ".long ((" fixup ") - .)\n" \
+ ".short (" type ")\n" \
+ ".short (" data ")\n" \
+ ".popsection\n"
+
+#define _ASM_EXTABLE(insn, fixup) \
+ __ASM_EXTABLE_RAW(#insn, #fixup, __stringify(EX_TYPE_FIXUP), "0")
+
+#define EX_DATA_REG_ERR_SHIFT 0
+#define EX_DATA_REG_ERR GENMASK(4, 0)
+#define EX_DATA_REG_ZERO_SHIFT 5
+#define EX_DATA_REG_ZERO GENMASK(9, 5)
+
+#define EX_DATA_REG(reg, gpr) \
+ "((.L__gpr_num_" #gpr ") << " __stringify(EX_DATA_REG_##reg##_SHIFT) ")"
+
+#define _ASM_EXTABLE_UACCESS_ERR_ZERO(insn, fixup, err, zero) \
+ __DEFINE_ASM_GPR_NUMS \
+ __ASM_EXTABLE_RAW(#insn, #fixup, \
+ __stringify(EX_TYPE_UACCESS_ERR_ZERO), \
+ "(" \
+ EX_DATA_REG(ERR, err) " | " \
+ EX_DATA_REG(ZERO, zero) \
+ ")")
+
+#define _ASM_EXTABLE_UACCESS_ERR(insn, fixup, err) \
+ _ASM_EXTABLE_UACCESS_ERR_ZERO(insn, fixup, err, zero)
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* __ASM_ASM_EXTABLE_H */
#include <asm-generic/bitops/fls.h>
#include <asm-generic/bitops/__fls.h>
#include <asm-generic/bitops/fls64.h>
-#include <asm-generic/bitops/find.h>
#include <asm-generic/bitops/sched.h>
#include <asm-generic/bitops/ffs.h>
extern const struct cpu_operations *cpu_ops[NR_CPUS];
void __init cpu_set_ops(int cpu);
-void cpu_update_secondary_bootdata(unsigned int cpuid,
- struct task_struct *tidle);
#endif /* ifndef __ASM_CPU_OPS_H */
--- /dev/null
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2021 by Rivos Inc.
+ */
+#ifndef __ASM_CPU_OPS_SBI_H
+#define __ASM_CPU_OPS_SBI_H
+
+#ifndef __ASSEMBLY__
+#include <linux/init.h>
+#include <linux/sched.h>
+#include <linux/threads.h>
+
+/**
+ * struct sbi_hart_boot_data - Hart specific boot used during booting and
+ * cpu hotplug.
+ * @task_ptr: A pointer to the hart specific tp
+ * @stack_ptr: A pointer to the hart specific sp
+ */
+struct sbi_hart_boot_data {
+ void *task_ptr;
+ void *stack_ptr;
+};
+#endif
+
+#endif /* ifndef __ASM_CPU_OPS_SBI_H */
#ifndef CONFIG_64BIT
#define SATP_PPN _AC(0x003FFFFF, UL)
#define SATP_MODE_32 _AC(0x80000000, UL)
-#define SATP_MODE SATP_MODE_32
#define SATP_ASID_BITS 9
#define SATP_ASID_SHIFT 22
#define SATP_ASID_MASK _AC(0x1FF, UL)
#else
#define SATP_PPN _AC(0x00000FFFFFFFFFFF, UL)
#define SATP_MODE_39 _AC(0x8000000000000000, UL)
-#define SATP_MODE SATP_MODE_39
+#define SATP_MODE_48 _AC(0x9000000000000000, UL)
#define SATP_ASID_BITS 16
#define SATP_ASID_SHIFT 44
#define SATP_ASID_MASK _AC(0xFFFF, UL)
--- /dev/null
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_RISCV_EXTABLE_H
+#define _ASM_RISCV_EXTABLE_H
+
+/*
+ * The exception table consists of pairs of relative offsets: the first
+ * is the relative offset to an instruction that is allowed to fault,
+ * and the second is the relative offset at which the program should
+ * continue. No registers are modified, so it is entirely up to the
+ * continuation code to figure out what to do.
+ *
+ * All the routines below use bits of fixup code that are out of line
+ * with the main instruction path. This means when everything is well,
+ * we don't even have to jump over them. Further, they do not intrude
+ * on our cache or tlb entries.
+ */
+
+struct exception_table_entry {
+ int insn, fixup;
+ short type, data;
+};
+
+#define ARCH_HAS_RELATIVE_EXTABLE
+
+#define swap_ex_entry_fixup(a, b, tmp, delta) \
+do { \
+ (a)->fixup = (b)->fixup + (delta); \
+ (b)->fixup = (tmp).fixup - (delta); \
+ (a)->type = (b)->type; \
+ (b)->type = (tmp).type; \
+ (a)->data = (b)->data; \
+ (b)->data = (tmp).data; \
+} while (0)
+
+bool fixup_exception(struct pt_regs *regs);
+
+#if defined(CONFIG_BPF_JIT) && defined(CONFIG_ARCH_RV64I)
+bool ex_handler_bpf(const struct exception_table_entry *ex, struct pt_regs *regs);
+#else
+static inline bool
+ex_handler_bpf(const struct exception_table_entry *ex,
+ struct pt_regs *regs)
+{
+ return false;
+}
+#endif
+
+#endif
FIX_HOLE,
FIX_PTE,
FIX_PMD,
+ FIX_PUD,
FIX_TEXT_POKE1,
FIX_TEXT_POKE0,
FIX_EARLYCON_MEM_BASE,
#include <linux/uaccess.h>
#include <linux/errno.h>
#include <asm/asm.h>
+#include <asm/asm-extable.h>
/* We don't even really need the extable code, but for now keep it simple */
#ifndef CONFIG_MMU
#define __futex_atomic_op(insn, ret, oldval, uaddr, oparg) \
{ \
- uintptr_t tmp; \
__enable_user_access(); \
__asm__ __volatile__ ( \
"1: " insn " \n" \
"2: \n" \
- " .section .fixup,\"ax\" \n" \
- " .balign 4 \n" \
- "3: li %[r],%[e] \n" \
- " jump 2b,%[t] \n" \
- " .previous \n" \
- " .section __ex_table,\"a\" \n" \
- " .balign " RISCV_SZPTR " \n" \
- " " RISCV_PTR " 1b, 3b \n" \
- " .previous \n" \
+ _ASM_EXTABLE_UACCESS_ERR(1b, 2b, %[r]) \
: [r] "+r" (ret), [ov] "=&r" (oldval), \
- [u] "+m" (*uaddr), [t] "=&r" (tmp) \
- : [op] "Jr" (oparg), [e] "i" (-EFAULT) \
+ [u] "+m" (*uaddr) \
+ : [op] "Jr" (oparg) \
: "memory"); \
__disable_user_access(); \
}
"2: sc.w.aqrl %[t],%z[nv],%[u] \n"
" bnez %[t],1b \n"
"3: \n"
- " .section .fixup,\"ax\" \n"
- " .balign 4 \n"
- "4: li %[r],%[e] \n"
- " jump 3b,%[t] \n"
- " .previous \n"
- " .section __ex_table,\"a\" \n"
- " .balign " RISCV_SZPTR " \n"
- " " RISCV_PTR " 1b, 4b \n"
- " " RISCV_PTR " 2b, 4b \n"
- " .previous \n"
+ _ASM_EXTABLE_UACCESS_ERR(1b, 3b, %[r]) \
+ _ASM_EXTABLE_UACCESS_ERR(2b, 3b, %[r]) \
: [r] "+r" (ret), [v] "=&r" (val), [u] "+m" (*uaddr), [t] "=&r" (tmp)
- : [ov] "Jr" (oldval), [nv] "Jr" (newval), [e] "i" (-EFAULT)
+ : [ov] "Jr" (oldval), [nv] "Jr" (newval)
: "memory");
__disable_user_access();
--- /dev/null
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __ASM_GPR_NUM_H
+#define __ASM_GPR_NUM_H
+
+#ifdef __ASSEMBLY__
+ .equ .L__gpr_num_zero, 0
+ .equ .L__gpr_num_ra, 1
+ .equ .L__gpr_num_sp, 2
+ .equ .L__gpr_num_gp, 3
+ .equ .L__gpr_num_tp, 4
+ .equ .L__gpr_num_t0, 5
+ .equ .L__gpr_num_t1, 6
+ .equ .L__gpr_num_t2, 7
+ .equ .L__gpr_num_s0, 8
+ .equ .L__gpr_num_s1, 9
+ .equ .L__gpr_num_a0, 10
+ .equ .L__gpr_num_a1, 11
+ .equ .L__gpr_num_a2, 12
+ .equ .L__gpr_num_a3, 13
+ .equ .L__gpr_num_a4, 14
+ .equ .L__gpr_num_a5, 15
+ .equ .L__gpr_num_a6, 16
+ .equ .L__gpr_num_a7, 17
+ .equ .L__gpr_num_s2, 18
+ .equ .L__gpr_num_s3, 19
+ .equ .L__gpr_num_s4, 20
+ .equ .L__gpr_num_s5, 21
+ .equ .L__gpr_num_s6, 22
+ .equ .L__gpr_num_s7, 23
+ .equ .L__gpr_num_s8, 24
+ .equ .L__gpr_num_s9, 25
+ .equ .L__gpr_num_s10, 26
+ .equ .L__gpr_num_s11, 27
+ .equ .L__gpr_num_t3, 28
+ .equ .L__gpr_num_t4, 29
+ .equ .L__gpr_num_t5, 30
+ .equ .L__gpr_num_t6, 31
+
+#else /* __ASSEMBLY__ */
+
+#define __DEFINE_ASM_GPR_NUMS \
+" .equ .L__gpr_num_zero, 0\n" \
+" .equ .L__gpr_num_ra, 1\n" \
+" .equ .L__gpr_num_sp, 2\n" \
+" .equ .L__gpr_num_gp, 3\n" \
+" .equ .L__gpr_num_tp, 4\n" \
+" .equ .L__gpr_num_t0, 5\n" \
+" .equ .L__gpr_num_t1, 6\n" \
+" .equ .L__gpr_num_t2, 7\n" \
+" .equ .L__gpr_num_s0, 8\n" \
+" .equ .L__gpr_num_s1, 9\n" \
+" .equ .L__gpr_num_a0, 10\n" \
+" .equ .L__gpr_num_a1, 11\n" \
+" .equ .L__gpr_num_a2, 12\n" \
+" .equ .L__gpr_num_a3, 13\n" \
+" .equ .L__gpr_num_a4, 14\n" \
+" .equ .L__gpr_num_a5, 15\n" \
+" .equ .L__gpr_num_a6, 16\n" \
+" .equ .L__gpr_num_a7, 17\n" \
+" .equ .L__gpr_num_s2, 18\n" \
+" .equ .L__gpr_num_s3, 19\n" \
+" .equ .L__gpr_num_s4, 20\n" \
+" .equ .L__gpr_num_s5, 21\n" \
+" .equ .L__gpr_num_s6, 22\n" \
+" .equ .L__gpr_num_s7, 23\n" \
+" .equ .L__gpr_num_s8, 24\n" \
+" .equ .L__gpr_num_s9, 25\n" \
+" .equ .L__gpr_num_s10, 26\n" \
+" .equ .L__gpr_num_s11, 27\n" \
+" .equ .L__gpr_num_t3, 28\n" \
+" .equ .L__gpr_num_t4, 29\n" \
+" .equ .L__gpr_num_t5, 30\n" \
+" .equ .L__gpr_num_t6, 31\n"
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* __ASM_GPR_NUM_H */
*/
#define KASAN_SHADOW_SCALE_SHIFT 3
-#define KASAN_SHADOW_SIZE (UL(1) << ((CONFIG_VA_BITS - 1) - KASAN_SHADOW_SCALE_SHIFT))
-#define KASAN_SHADOW_START KERN_VIRT_START
-#define KASAN_SHADOW_END (KASAN_SHADOW_START + KASAN_SHADOW_SIZE)
+#define KASAN_SHADOW_SIZE (UL(1) << ((VA_BITS - 1) - KASAN_SHADOW_SCALE_SHIFT))
+/*
+ * Depending on the size of the virtual address space, the region may not be
+ * aligned on PGDIR_SIZE, so force its alignment to ease its population.
+ */
+#define KASAN_SHADOW_START ((KASAN_SHADOW_END - KASAN_SHADOW_SIZE) & PGDIR_MASK)
+#define KASAN_SHADOW_END MODULES_LOWEST_VADDR
#define KASAN_SHADOW_OFFSET _AC(CONFIG_KASAN_SHADOW_OFFSET, UL)
void kasan_init(void);
asmlinkage void kasan_early_init(void);
+void kasan_swapper_init(void);
#endif
#endif
* When not using MMU this corresponds to the first free page in
* physical memory (aligned on a page boundary).
*/
+#ifdef CONFIG_64BIT
+#ifdef CONFIG_MMU
+#define PAGE_OFFSET kernel_map.page_offset
+#else
#define PAGE_OFFSET _AC(CONFIG_PAGE_OFFSET, UL)
-
-#define KERN_VIRT_SIZE (-PAGE_OFFSET)
+#endif
+/*
+ * By default, CONFIG_PAGE_OFFSET value corresponds to SV48 address space so
+ * define the PAGE_OFFSET value for SV39.
+ */
+#define PAGE_OFFSET_L3 _AC(0xffffffd800000000, UL)
+#else
+#define PAGE_OFFSET _AC(CONFIG_PAGE_OFFSET, UL)
+#endif /* CONFIG_64BIT */
#ifndef __ASSEMBLY__
#endif /* CONFIG_MMU */
struct kernel_mapping {
+ unsigned long page_offset;
unsigned long virt_addr;
uintptr_t phys_addr;
uintptr_t size;
#include <asm/tlb.h>
#ifdef CONFIG_MMU
+#define __HAVE_ARCH_PUD_ALLOC_ONE
+#define __HAVE_ARCH_PUD_FREE
#include <asm-generic/pgalloc.h>
static inline void pmd_populate_kernel(struct mm_struct *mm,
set_pud(pud, __pud((pfn << _PAGE_PFN_SHIFT) | _PAGE_TABLE));
}
+
+static inline void p4d_populate(struct mm_struct *mm, p4d_t *p4d, pud_t *pud)
+{
+ if (pgtable_l4_enabled) {
+ unsigned long pfn = virt_to_pfn(pud);
+
+ set_p4d(p4d, __p4d((pfn << _PAGE_PFN_SHIFT) | _PAGE_TABLE));
+ }
+}
+
+static inline void p4d_populate_safe(struct mm_struct *mm, p4d_t *p4d,
+ pud_t *pud)
+{
+ if (pgtable_l4_enabled) {
+ unsigned long pfn = virt_to_pfn(pud);
+
+ set_p4d_safe(p4d,
+ __p4d((pfn << _PAGE_PFN_SHIFT) | _PAGE_TABLE));
+ }
+}
+
+#define pud_alloc_one pud_alloc_one
+static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
+{
+ if (pgtable_l4_enabled)
+ return __pud_alloc_one(mm, addr);
+
+ return NULL;
+}
+
+#define pud_free pud_free
+static inline void pud_free(struct mm_struct *mm, pud_t *pud)
+{
+ if (pgtable_l4_enabled)
+ __pud_free(mm, pud);
+}
+
+#define __pud_free_tlb(tlb, pud, addr) pud_free((tlb)->mm, pud)
#endif /* __PAGETABLE_PMD_FOLDED */
static inline pgd_t *pgd_alloc(struct mm_struct *mm)
#include <linux/const.h>
-#define PGDIR_SHIFT 30
+extern bool pgtable_l4_enabled;
+
+#define PGDIR_SHIFT_L3 30
+#define PGDIR_SHIFT_L4 39
+#define PGDIR_SIZE_L3 (_AC(1, UL) << PGDIR_SHIFT_L3)
+
+#define PGDIR_SHIFT (pgtable_l4_enabled ? PGDIR_SHIFT_L4 : PGDIR_SHIFT_L3)
/* Size of region mapped by a page global directory */
#define PGDIR_SIZE (_AC(1, UL) << PGDIR_SHIFT)
#define PGDIR_MASK (~(PGDIR_SIZE - 1))
+/* pud is folded into pgd in case of 3-level page table */
+#define PUD_SHIFT 30
+#define PUD_SIZE (_AC(1, UL) << PUD_SHIFT)
+#define PUD_MASK (~(PUD_SIZE - 1))
+
#define PMD_SHIFT 21
/* Size of region mapped by a page middle directory */
#define PMD_SIZE (_AC(1, UL) << PMD_SHIFT)
#define PMD_MASK (~(PMD_SIZE - 1))
+/* Page Upper Directory entry */
+typedef struct {
+ unsigned long pud;
+} pud_t;
+
+#define pud_val(x) ((x).pud)
+#define __pud(x) ((pud_t) { (x) })
+#define PTRS_PER_PUD (PAGE_SIZE / sizeof(pud_t))
+
/* Page Middle Directory entry */
typedef struct {
unsigned long pmd;
set_pud(pudp, __pud(0));
}
+static inline pud_t pfn_pud(unsigned long pfn, pgprot_t prot)
+{
+ return __pud((pfn << _PAGE_PFN_SHIFT) | pgprot_val(prot));
+}
+
+static inline unsigned long _pud_pfn(pud_t pud)
+{
+ return pud_val(pud) >> _PAGE_PFN_SHIFT;
+}
+
static inline pmd_t *pud_pgtable(pud_t pud)
{
return (pmd_t *)pfn_to_virt(pud_val(pud) >> _PAGE_PFN_SHIFT);
return pfn_to_page(pud_val(pud) >> _PAGE_PFN_SHIFT);
}
+#define mm_pud_folded mm_pud_folded
+static inline bool mm_pud_folded(struct mm_struct *mm)
+{
+ if (pgtable_l4_enabled)
+ return false;
+
+ return true;
+}
+
+#define pmd_index(addr) (((addr) >> PMD_SHIFT) & (PTRS_PER_PMD - 1))
+
static inline pmd_t pfn_pmd(unsigned long pfn, pgprot_t prot)
{
return __pmd((pfn << _PAGE_PFN_SHIFT) | pgprot_val(prot));
#define pmd_ERROR(e) \
pr_err("%s:%d: bad pmd %016lx.\n", __FILE__, __LINE__, pmd_val(e))
+#define pud_ERROR(e) \
+ pr_err("%s:%d: bad pud %016lx.\n", __FILE__, __LINE__, pud_val(e))
+
+static inline void set_p4d(p4d_t *p4dp, p4d_t p4d)
+{
+ if (pgtable_l4_enabled)
+ *p4dp = p4d;
+ else
+ set_pud((pud_t *)p4dp, (pud_t){ p4d_val(p4d) });
+}
+
+static inline int p4d_none(p4d_t p4d)
+{
+ if (pgtable_l4_enabled)
+ return (p4d_val(p4d) == 0);
+
+ return 0;
+}
+
+static inline int p4d_present(p4d_t p4d)
+{
+ if (pgtable_l4_enabled)
+ return (p4d_val(p4d) & _PAGE_PRESENT);
+
+ return 1;
+}
+
+static inline int p4d_bad(p4d_t p4d)
+{
+ if (pgtable_l4_enabled)
+ return !p4d_present(p4d);
+
+ return 0;
+}
+
+static inline void p4d_clear(p4d_t *p4d)
+{
+ if (pgtable_l4_enabled)
+ set_p4d(p4d, __p4d(0));
+}
+
+static inline pud_t *p4d_pgtable(p4d_t p4d)
+{
+ if (pgtable_l4_enabled)
+ return (pud_t *)pfn_to_virt(p4d_val(p4d) >> _PAGE_PFN_SHIFT);
+
+ return (pud_t *)pud_pgtable((pud_t) { p4d_val(p4d) });
+}
+
+static inline struct page *p4d_page(p4d_t p4d)
+{
+ return pfn_to_page(p4d_val(p4d) >> _PAGE_PFN_SHIFT);
+}
+
+#define pud_index(addr) (((addr) >> PUD_SHIFT) & (PTRS_PER_PUD - 1))
+
+#define pud_offset pud_offset
+static inline pud_t *pud_offset(p4d_t *p4d, unsigned long address)
+{
+ if (pgtable_l4_enabled)
+ return p4d_pgtable(*p4d) + pud_index(address);
+
+ return (pud_t *)p4d;
+}
+
#endif /* _ASM_RISCV_PGTABLE_64_H */
* _PAGE_PROT_NONE is set on not-present pages (and ignored by the hardware) to
* distinguish them from swapped out pages
*/
-#define _PAGE_PROT_NONE _PAGE_READ
+#define _PAGE_PROT_NONE _PAGE_GLOBAL
#define _PAGE_PFN_SHIFT 10
#define KERNEL_LINK_ADDR PAGE_OFFSET
#endif
+/* Number of entries in the page global directory */
+#define PTRS_PER_PGD (PAGE_SIZE / sizeof(pgd_t))
+/* Number of entries in the page table */
+#define PTRS_PER_PTE (PAGE_SIZE / sizeof(pte_t))
+
+/*
+ * Half of the kernel address space (half of the entries of the page global
+ * directory) is for the direct mapping.
+ */
+#define KERN_VIRT_SIZE ((PTRS_PER_PGD / 2 * PGDIR_SIZE) / 2)
+
#define VMALLOC_SIZE (KERN_VIRT_SIZE >> 1)
-#define VMALLOC_END (PAGE_OFFSET - 1)
+#define VMALLOC_END PAGE_OFFSET
#define VMALLOC_START (PAGE_OFFSET - VMALLOC_SIZE)
#define BPF_JIT_REGION_SIZE (SZ_128M)
/* Modules always live before the kernel */
#ifdef CONFIG_64BIT
-#define MODULES_VADDR (PFN_ALIGN((unsigned long)&_end) - SZ_2G)
-#define MODULES_END (PFN_ALIGN((unsigned long)&_start))
+/* This is used to define the end of the KASAN shadow region */
+#define MODULES_LOWEST_VADDR (KERNEL_LINK_ADDR - SZ_2G)
+#define MODULES_VADDR (PFN_ALIGN((unsigned long)&_end) - SZ_2G)
+#define MODULES_END (PFN_ALIGN((unsigned long)&_start))
#endif
/*
* struct pages to map half the virtual address space. Then
* position vmemmap directly below the VMALLOC region.
*/
+#ifdef CONFIG_64BIT
+#define VA_BITS (pgtable_l4_enabled ? 48 : 39)
+#else
+#define VA_BITS 32
+#endif
+
#define VMEMMAP_SHIFT \
- (CONFIG_VA_BITS - PAGE_SHIFT - 1 + STRUCT_PAGE_MAX_SHIFT)
+ (VA_BITS - PAGE_SHIFT - 1 + STRUCT_PAGE_MAX_SHIFT)
#define VMEMMAP_SIZE BIT(VMEMMAP_SHIFT)
-#define VMEMMAP_END (VMALLOC_START - 1)
+#define VMEMMAP_END VMALLOC_START
#define VMEMMAP_START (VMALLOC_START - VMEMMAP_SIZE)
/*
#ifndef __ASSEMBLY__
-/* Page Upper Directory not used in RISC-V */
-#include <asm-generic/pgtable-nopud.h>
+#include <asm-generic/pgtable-nop4d.h>
#include <asm/page.h>
#include <asm/tlbflush.h>
#include <linux/mm_types.h>
#define XIP_FIXUP(addr) (addr)
#endif /* CONFIG_XIP_KERNEL */
-#ifdef CONFIG_MMU
-/* Number of entries in the page global directory */
-#define PTRS_PER_PGD (PAGE_SIZE / sizeof(pgd_t))
-/* Number of entries in the page table */
-#define PTRS_PER_PTE (PAGE_SIZE / sizeof(pte_t))
+struct pt_alloc_ops {
+ pte_t *(*get_pte_virt)(phys_addr_t pa);
+ phys_addr_t (*alloc_pte)(uintptr_t va);
+#ifndef __PAGETABLE_PMD_FOLDED
+ pmd_t *(*get_pmd_virt)(phys_addr_t pa);
+ phys_addr_t (*alloc_pmd)(uintptr_t va);
+ pud_t *(*get_pud_virt)(phys_addr_t pa);
+ phys_addr_t (*alloc_pud)(uintptr_t va);
+#endif
+};
+
+extern struct pt_alloc_ops pt_ops __initdata;
+#ifdef CONFIG_MMU
/* Number of PGD entries that a user-mode program can use */
#define USER_PTRS_PER_PGD (TASK_SIZE / PGDIR_SIZE)
/* Page protection bits */
#define _PAGE_BASE (_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_USER)
-#define PAGE_NONE __pgprot(_PAGE_PROT_NONE)
+#define PAGE_NONE __pgprot(_PAGE_PROT_NONE | _PAGE_READ)
#define PAGE_READ __pgprot(_PAGE_BASE | _PAGE_READ)
#define PAGE_WRITE __pgprot(_PAGE_BASE | _PAGE_READ | _PAGE_WRITE)
#define PAGE_EXEC __pgprot(_PAGE_BASE | _PAGE_EXEC)
*
* Format of swap PTE:
* bit 0: _PAGE_PRESENT (zero)
- * bit 1: _PAGE_PROT_NONE (zero)
- * bits 2 to 6: swap type
- * bits 7 to XLEN-1: swap offset
+ * bit 1 to 3: _PAGE_LEAF (zero)
+ * bit 5: _PAGE_PROT_NONE (zero)
+ * bits 6 to 10: swap type
+ * bits 10 to XLEN-1: swap offset
*/
-#define __SWP_TYPE_SHIFT 2
+#define __SWP_TYPE_SHIFT 6
#define __SWP_TYPE_BITS 5
#define __SWP_TYPE_MASK ((1UL << __SWP_TYPE_BITS) - 1)
#define __SWP_OFFSET_SHIFT (__SWP_TYPE_BITS + __SWP_TYPE_SHIFT)
#define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val(pte) })
#define __swp_entry_to_pte(x) ((pte_t) { (x).val })
+#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
+#define __pmd_to_swp_entry(pmd) ((swp_entry_t) { pmd_val(pmd) })
+#define __swp_entry_to_pmd(swp) __pmd((swp).val)
+#endif /* CONFIG_ARCH_ENABLE_THP_MIGRATION */
+
/*
* In the RV64 Linux scheme, we give the user half of the virtual-address space
* and give the kernel the other (upper) half.
*/
#ifdef CONFIG_64BIT
-#define KERN_VIRT_START (-(BIT(CONFIG_VA_BITS)) + TASK_SIZE)
+#define KERN_VIRT_START (-(BIT(VA_BITS)) + TASK_SIZE)
#else
#define KERN_VIRT_START FIXADDR_START
#endif
/*
* Task size is 0x4000000000 for RV64 or 0x9fc00000 for RV32.
* Note that PGDIR_SIZE must evenly divide TASK_SIZE.
+ * Task size is:
+ * - 0x9fc00000 (~2.5GB) for RV32.
+ * - 0x4000000000 ( 256GB) for RV64 using SV39 mmu
+ * - 0x800000000000 ( 128TB) for RV64 using SV48 mmu
+ *
+ * Note that PGDIR_SIZE must evenly divide TASK_SIZE since "RISC-V
+ * Instruction Set Manual Volume II: Privileged Architecture" states that
+ * "load and store effective addresses, which are 64bits, must have bits
+ * 63–48 all equal to bit 47, or else a page-fault exception will occur."
*/
#ifdef CONFIG_64BIT
-#define TASK_SIZE (PGDIR_SIZE * PTRS_PER_PGD / 2)
+#define TASK_SIZE (PGDIR_SIZE * PTRS_PER_PGD / 2)
+#define TASK_SIZE_MIN (PGDIR_SIZE_L3 * PTRS_PER_PGD / 2)
#else
-#define TASK_SIZE FIXADDR_START
+#define TASK_SIZE FIXADDR_START
+#define TASK_SIZE_MIN TASK_SIZE
#endif
#else /* CONFIG_MMU */
#define dtb_early_va _dtb_early_va
#define dtb_early_pa _dtb_early_pa
#endif /* CONFIG_XIP_KERNEL */
+extern u64 satp_mode;
+extern bool pgtable_l4_enabled;
void paging_init(void);
void misc_mem_init(void);
#define _ASM_RISCV_SBI_H
#include <linux/types.h>
+#include <linux/cpumask.h>
#ifdef CONFIG_RISCV_SBI
enum sbi_ext_id {
SBI_EXT_IPI = 0x735049,
SBI_EXT_RFENCE = 0x52464E43,
SBI_EXT_HSM = 0x48534D,
+ SBI_EXT_SRST = 0x53525354,
/* Experimentals extensions must lie within this range */
SBI_EXT_EXPERIMENTAL_START = 0x08000000,
SBI_HSM_HART_STATUS_STOP_PENDING,
};
+enum sbi_ext_srst_fid {
+ SBI_EXT_SRST_RESET = 0,
+};
+
+enum sbi_srst_reset_type {
+ SBI_SRST_RESET_TYPE_SHUTDOWN = 0,
+ SBI_SRST_RESET_TYPE_COLD_REBOOT,
+ SBI_SRST_RESET_TYPE_WARM_REBOOT,
+};
+
+enum sbi_srst_reset_reason {
+ SBI_SRST_RESET_REASON_NONE = 0,
+ SBI_SRST_RESET_REASON_SYS_FAILURE,
+};
+
#define SBI_SPEC_VERSION_DEFAULT 0x1
#define SBI_SPEC_VERSION_MAJOR_SHIFT 24
#define SBI_SPEC_VERSION_MAJOR_MASK 0x7f
void sbi_set_timer(uint64_t stime_value);
void sbi_shutdown(void);
void sbi_clear_ipi(void);
-int sbi_send_ipi(const unsigned long *hart_mask);
-int sbi_remote_fence_i(const unsigned long *hart_mask);
-int sbi_remote_sfence_vma(const unsigned long *hart_mask,
+int sbi_send_ipi(const struct cpumask *cpu_mask);
+int sbi_remote_fence_i(const struct cpumask *cpu_mask);
+int sbi_remote_sfence_vma(const struct cpumask *cpu_mask,
unsigned long start,
unsigned long size);
-int sbi_remote_sfence_vma_asid(const unsigned long *hart_mask,
+int sbi_remote_sfence_vma_asid(const struct cpumask *cpu_mask,
unsigned long start,
unsigned long size,
unsigned long asid);
-int sbi_remote_hfence_gvma(const unsigned long *hart_mask,
+int sbi_remote_hfence_gvma(const struct cpumask *cpu_mask,
unsigned long start,
unsigned long size);
-int sbi_remote_hfence_gvma_vmid(const unsigned long *hart_mask,
+int sbi_remote_hfence_gvma_vmid(const struct cpumask *cpu_mask,
unsigned long start,
unsigned long size,
unsigned long vmid);
-int sbi_remote_hfence_vvma(const unsigned long *hart_mask,
+int sbi_remote_hfence_vvma(const struct cpumask *cpu_mask,
unsigned long start,
unsigned long size);
-int sbi_remote_hfence_vvma_asid(const unsigned long *hart_mask,
+int sbi_remote_hfence_vvma_asid(const struct cpumask *cpu_mask,
unsigned long start,
unsigned long size,
unsigned long asid);
return sbi_spec_version & SBI_SPEC_VERSION_MINOR_MASK;
}
+/* Make SBI version */
+static inline unsigned long sbi_mk_version(unsigned long major,
+ unsigned long minor)
+{
+ return ((major & SBI_SPEC_VERSION_MAJOR_MASK) <<
+ SBI_SPEC_VERSION_MAJOR_SHIFT) | minor;
+}
+
int sbi_err_map_linux_errno(int err);
#else /* CONFIG_RISCV_SBI */
-static inline int sbi_remote_fence_i(const unsigned long *hart_mask) { return -1; }
+static inline int sbi_remote_fence_i(const struct cpumask *cpu_mask) { return -1; }
static inline void sbi_init(void) {}
#endif /* CONFIG_RISCV_SBI */
#endif /* _ASM_RISCV_SBI_H */
void arch_send_call_function_single_ipi(int cpu);
int riscv_hartid_to_cpuid(int hartid);
-void riscv_cpuid_to_hartid_mask(const struct cpumask *in, struct cpumask *out);
/* Set custom IPI operations */
void riscv_set_ipi_ops(const struct riscv_ipi_ops *ops);
#if defined CONFIG_HOTPLUG_CPU
int __cpu_disable(void);
void __cpu_die(unsigned int cpu);
-void cpu_stop(void);
-#else
#endif /* CONFIG_HOTPLUG_CPU */
#else
return boot_cpu_hartid;
}
-static inline void riscv_cpuid_to_hartid_mask(const struct cpumask *in,
- struct cpumask *out)
-{
- cpumask_clear(out);
- cpumask_set_cpu(boot_cpu_hartid, out);
-}
-
static inline void riscv_set_ipi_ops(const struct riscv_ipi_ops *ops)
{
}
#define _ASM_RISCV_SPARSEMEM_H
#ifdef CONFIG_SPARSEMEM
-#define MAX_PHYSMEM_BITS CONFIG_PA_BITS
+#ifdef CONFIG_64BIT
+#define MAX_PHYSMEM_BITS 56
+#else
+#define MAX_PHYSMEM_BITS 34
+#endif /* CONFIG_64BIT */
#define SECTION_SIZE_BITS 27
#endif /* CONFIG_SPARSEMEM */
#ifndef _ASM_RISCV_UACCESS_H
#define _ASM_RISCV_UACCESS_H
+#include <asm/asm-extable.h>
#include <asm/pgtable.h> /* for TASK_SIZE */
/*
#define __get_user_asm(insn, x, ptr, err) \
do { \
- uintptr_t __tmp; \
__typeof__(x) __x; \
__asm__ __volatile__ ( \
"1:\n" \
- " " insn " %1, %3\n" \
+ " " insn " %1, %2\n" \
"2:\n" \
- " .section .fixup,\"ax\"\n" \
- " .balign 4\n" \
- "3:\n" \
- " li %0, %4\n" \
- " li %1, 0\n" \
- " jump 2b, %2\n" \
- " .previous\n" \
- " .section __ex_table,\"a\"\n" \
- " .balign " RISCV_SZPTR "\n" \
- " " RISCV_PTR " 1b, 3b\n" \
- " .previous" \
- : "+r" (err), "=&r" (__x), "=r" (__tmp) \
- : "m" (*(ptr)), "i" (-EFAULT)); \
+ _ASM_EXTABLE_UACCESS_ERR_ZERO(1b, 2b, %0, %1) \
+ : "+r" (err), "=&r" (__x) \
+ : "m" (*(ptr))); \
(x) = __x; \
} while (0)
do { \
u32 __user *__ptr = (u32 __user *)(ptr); \
u32 __lo, __hi; \
- uintptr_t __tmp; \
__asm__ __volatile__ ( \
"1:\n" \
- " lw %1, %4\n" \
+ " lw %1, %3\n" \
"2:\n" \
- " lw %2, %5\n" \
+ " lw %2, %4\n" \
"3:\n" \
- " .section .fixup,\"ax\"\n" \
- " .balign 4\n" \
- "4:\n" \
- " li %0, %6\n" \
- " li %1, 0\n" \
- " li %2, 0\n" \
- " jump 3b, %3\n" \
- " .previous\n" \
- " .section __ex_table,\"a\"\n" \
- " .balign " RISCV_SZPTR "\n" \
- " " RISCV_PTR " 1b, 4b\n" \
- " " RISCV_PTR " 2b, 4b\n" \
- " .previous" \
- : "+r" (err), "=&r" (__lo), "=r" (__hi), \
- "=r" (__tmp) \
- : "m" (__ptr[__LSW]), "m" (__ptr[__MSW]), \
- "i" (-EFAULT)); \
+ _ASM_EXTABLE_UACCESS_ERR_ZERO(1b, 3b, %0, %1) \
+ _ASM_EXTABLE_UACCESS_ERR_ZERO(2b, 3b, %0, %1) \
+ : "+r" (err), "=&r" (__lo), "=r" (__hi) \
+ : "m" (__ptr[__LSW]), "m" (__ptr[__MSW])); \
+ if (err) \
+ __hi = 0; \
(x) = (__typeof__(x))((__typeof__((x)-(x)))( \
(((u64)__hi << 32) | __lo))); \
} while (0)
#define __put_user_asm(insn, x, ptr, err) \
do { \
- uintptr_t __tmp; \
__typeof__(*(ptr)) __x = x; \
__asm__ __volatile__ ( \
"1:\n" \
- " " insn " %z3, %2\n" \
+ " " insn " %z2, %1\n" \
"2:\n" \
- " .section .fixup,\"ax\"\n" \
- " .balign 4\n" \
- "3:\n" \
- " li %0, %4\n" \
- " jump 2b, %1\n" \
- " .previous\n" \
- " .section __ex_table,\"a\"\n" \
- " .balign " RISCV_SZPTR "\n" \
- " " RISCV_PTR " 1b, 3b\n" \
- " .previous" \
- : "+r" (err), "=r" (__tmp), "=m" (*(ptr)) \
- : "rJ" (__x), "i" (-EFAULT)); \
+ _ASM_EXTABLE_UACCESS_ERR(1b, 2b, %0) \
+ : "+r" (err), "=m" (*(ptr)) \
+ : "rJ" (__x)); \
} while (0)
#ifdef CONFIG_64BIT
do { \
u32 __user *__ptr = (u32 __user *)(ptr); \
u64 __x = (__typeof__((x)-(x)))(x); \
- uintptr_t __tmp; \
__asm__ __volatile__ ( \
"1:\n" \
- " sw %z4, %2\n" \
+ " sw %z3, %1\n" \
"2:\n" \
- " sw %z5, %3\n" \
+ " sw %z4, %2\n" \
"3:\n" \
- " .section .fixup,\"ax\"\n" \
- " .balign 4\n" \
- "4:\n" \
- " li %0, %6\n" \
- " jump 3b, %1\n" \
- " .previous\n" \
- " .section __ex_table,\"a\"\n" \
- " .balign " RISCV_SZPTR "\n" \
- " " RISCV_PTR " 1b, 4b\n" \
- " " RISCV_PTR " 2b, 4b\n" \
- " .previous" \
- : "+r" (err), "=r" (__tmp), \
+ _ASM_EXTABLE_UACCESS_ERR(1b, 3b, %0) \
+ _ASM_EXTABLE_UACCESS_ERR(2b, 3b, %0) \
+ : "+r" (err), \
"=m" (__ptr[__LSW]), \
"=m" (__ptr[__MSW]) \
- : "rJ" (__x), "rJ" (__x >> 32), "i" (-EFAULT)); \
+ : "rJ" (__x), "rJ" (__x >> 32)); \
} while (0)
#endif /* CONFIG_64BIT */
__clear_user(to, n) : n;
}
-/*
- * Atomic compare-and-exchange, but with a fixup for userspace faults. Faults
- * will set "err" to -EFAULT, while successful accesses return the previous
- * value.
- */
-#define __cmpxchg_user(ptr, old, new, err, size, lrb, scb) \
-({ \
- __typeof__(ptr) __ptr = (ptr); \
- __typeof__(*(ptr)) __old = (old); \
- __typeof__(*(ptr)) __new = (new); \
- __typeof__(*(ptr)) __ret; \
- __typeof__(err) __err = 0; \
- register unsigned int __rc; \
- __enable_user_access(); \
- switch (size) { \
- case 4: \
- __asm__ __volatile__ ( \
- "0:\n" \
- " lr.w" #scb " %[ret], %[ptr]\n" \
- " bne %[ret], %z[old], 1f\n" \
- " sc.w" #lrb " %[rc], %z[new], %[ptr]\n" \
- " bnez %[rc], 0b\n" \
- "1:\n" \
- ".section .fixup,\"ax\"\n" \
- ".balign 4\n" \
- "2:\n" \
- " li %[err], %[efault]\n" \
- " jump 1b, %[rc]\n" \
- ".previous\n" \
- ".section __ex_table,\"a\"\n" \
- ".balign " RISCV_SZPTR "\n" \
- " " RISCV_PTR " 1b, 2b\n" \
- ".previous\n" \
- : [ret] "=&r" (__ret), \
- [rc] "=&r" (__rc), \
- [ptr] "+A" (*__ptr), \
- [err] "=&r" (__err) \
- : [old] "rJ" (__old), \
- [new] "rJ" (__new), \
- [efault] "i" (-EFAULT)); \
- break; \
- case 8: \
- __asm__ __volatile__ ( \
- "0:\n" \
- " lr.d" #scb " %[ret], %[ptr]\n" \
- " bne %[ret], %z[old], 1f\n" \
- " sc.d" #lrb " %[rc], %z[new], %[ptr]\n" \
- " bnez %[rc], 0b\n" \
- "1:\n" \
- ".section .fixup,\"ax\"\n" \
- ".balign 4\n" \
- "2:\n" \
- " li %[err], %[efault]\n" \
- " jump 1b, %[rc]\n" \
- ".previous\n" \
- ".section __ex_table,\"a\"\n" \
- ".balign " RISCV_SZPTR "\n" \
- " " RISCV_PTR " 1b, 2b\n" \
- ".previous\n" \
- : [ret] "=&r" (__ret), \
- [rc] "=&r" (__rc), \
- [ptr] "+A" (*__ptr), \
- [err] "=&r" (__err) \
- : [old] "rJ" (__old), \
- [new] "rJ" (__new), \
- [efault] "i" (-EFAULT)); \
- break; \
- default: \
- BUILD_BUG(); \
- } \
- __disable_user_access(); \
- (err) = __err; \
- __ret; \
-})
-
#define HAVE_GET_KERNEL_NOFAULT
#define __get_kernel_nofault(dst, src, type, err_label) \
obj-$(CONFIG_SMP) += smpboot.o
obj-$(CONFIG_SMP) += smp.o
obj-$(CONFIG_SMP) += cpu_ops.o
-obj-$(CONFIG_SMP) += cpu_ops_spinwait.o
+
+obj-$(CONFIG_RISCV_BOOT_SPINWAIT) += cpu_ops_spinwait.o
obj-$(CONFIG_MODULES) += module.o
obj-$(CONFIG_MODULE_SECTIONS) += module-sections.o
#include <asm/kvm_host.h>
#include <asm/thread_info.h>
#include <asm/ptrace.h>
+#include <asm/cpu_ops_sbi.h>
void asm_offsets(void);
DEFINE(PT_SIZE_ON_STACK, ALIGN(sizeof(struct pt_regs), STACK_ALIGN));
OFFSET(KERNEL_MAP_VIRT_ADDR, kernel_mapping, virt_addr);
+ OFFSET(SBI_HART_BOOT_TASK_PTR_OFFSET, sbi_hart_boot_data, task_ptr);
+ OFFSET(SBI_HART_BOOT_STACK_PTR_OFFSET, sbi_hart_boot_data, stack_ptr);
}
#include <asm/cpu_ops.h>
#include <asm/sbi.h>
-void cpu_stop(void);
-void arch_cpu_idle_dead(void)
-{
- cpu_stop();
-}
-
bool cpu_has_hotplug(unsigned int cpu)
{
if (cpu_ops[cpu]->cpu_stop)
/*
* Called from the idle thread for the CPU which has been shutdown.
*/
-void cpu_stop(void)
+void arch_cpu_idle_dead(void)
{
idle_task_exit();
#include <linux/seq_file.h>
#include <linux/of.h>
#include <asm/smp.h>
+#include <asm/pgtable.h>
/*
* Returns the hart ID of the given device tree node, or -ENODEV if the node
seq_puts(f, "\n");
}
-static void print_mmu(struct seq_file *f, const char *mmu_type)
+static void print_mmu(struct seq_file *f)
{
+ char sv_type[16];
+
#if defined(CONFIG_32BIT)
- if (strcmp(mmu_type, "riscv,sv32") != 0)
- return;
+ strncpy(sv_type, "sv32", 5);
#elif defined(CONFIG_64BIT)
- if (strcmp(mmu_type, "riscv,sv39") != 0 &&
- strcmp(mmu_type, "riscv,sv48") != 0)
- return;
+ if (pgtable_l4_enabled)
+ strncpy(sv_type, "sv48", 5);
+ else
+ strncpy(sv_type, "sv39", 5);
#endif
-
- seq_printf(f, "mmu\t\t: %s\n", mmu_type+6);
+ seq_printf(f, "mmu\t\t: %s\n", sv_type);
}
static void *c_start(struct seq_file *m, loff_t *pos)
{
unsigned long cpu_id = (unsigned long)v - 1;
struct device_node *node = of_get_cpu_node(cpu_id, NULL);
- const char *compat, *isa, *mmu;
+ const char *compat, *isa;
seq_printf(m, "processor\t: %lu\n", cpu_id);
seq_printf(m, "hart\t\t: %lu\n", cpuid_to_hartid_map(cpu_id));
if (!of_property_read_string(node, "riscv,isa", &isa))
print_isa(m, isa);
- if (!of_property_read_string(node, "mmu-type", &mmu))
- print_mmu(m, mmu);
+ print_mmu(m);
if (!of_property_read_string(node, "compatible", &compat)
&& strcmp(compat, "riscv"))
seq_printf(m, "uarch\t\t: %s\n", compat);
#include <linux/of.h>
#include <linux/string.h>
#include <linux/sched.h>
-#include <linux/sched/task_stack.h>
#include <asm/cpu_ops.h>
#include <asm/sbi.h>
#include <asm/smp.h>
const struct cpu_operations *cpu_ops[NR_CPUS] __ro_after_init;
-void *__cpu_up_stack_pointer[NR_CPUS] __section(".data");
-void *__cpu_up_task_pointer[NR_CPUS] __section(".data");
-
extern const struct cpu_operations cpu_ops_sbi;
+#ifdef CONFIG_RISCV_BOOT_SPINWAIT
extern const struct cpu_operations cpu_ops_spinwait;
-
-void cpu_update_secondary_bootdata(unsigned int cpuid,
- struct task_struct *tidle)
-{
- int hartid = cpuid_to_hartid_map(cpuid);
-
- /* Make sure tidle is updated */
- smp_mb();
- WRITE_ONCE(__cpu_up_stack_pointer[hartid],
- task_stack_page(tidle) + THREAD_SIZE);
- WRITE_ONCE(__cpu_up_task_pointer[hartid], tidle);
-}
+#else
+const struct cpu_operations cpu_ops_spinwait = {
+ .name = "",
+ .cpu_prepare = NULL,
+ .cpu_start = NULL,
+};
+#endif
void __init cpu_set_ops(int cpuid)
{
#if IS_ENABLED(CONFIG_RISCV_SBI)
if (sbi_probe_extension(SBI_EXT_HSM) > 0) {
if (!cpuid)
- pr_info("SBI v0.2 HSM extension detected\n");
+ pr_info("SBI HSM extension detected\n");
cpu_ops[cpuid] = &cpu_ops_sbi;
} else
#endif
#include <linux/init.h>
#include <linux/mm.h>
+#include <linux/sched/task_stack.h>
#include <asm/cpu_ops.h>
+#include <asm/cpu_ops_sbi.h>
#include <asm/sbi.h>
#include <asm/smp.h>
extern char secondary_start_sbi[];
const struct cpu_operations cpu_ops_sbi;
+/*
+ * Ordered booting via HSM brings one cpu at a time. However, cpu hotplug can
+ * be invoked from multiple threads in parallel. Define a per cpu data
+ * to handle that.
+ */
+DEFINE_PER_CPU(struct sbi_hart_boot_data, boot_data);
+
static int sbi_hsm_hart_start(unsigned long hartid, unsigned long saddr,
unsigned long priv)
{
static int sbi_cpu_start(unsigned int cpuid, struct task_struct *tidle)
{
- int rc;
unsigned long boot_addr = __pa_symbol(secondary_start_sbi);
int hartid = cpuid_to_hartid_map(cpuid);
-
- cpu_update_secondary_bootdata(cpuid, tidle);
- rc = sbi_hsm_hart_start(hartid, boot_addr, 0);
-
- return rc;
+ unsigned long hsm_data;
+ struct sbi_hart_boot_data *bdata = &per_cpu(boot_data, cpuid);
+
+ /* Make sure tidle is updated */
+ smp_mb();
+ bdata->task_ptr = tidle;
+ bdata->stack_ptr = task_stack_page(tidle) + THREAD_SIZE;
+ /* Make sure boot data is updated */
+ smp_mb();
+ hsm_data = __pa(bdata);
+ return sbi_hsm_hart_start(hartid, boot_addr, hsm_data);
}
static int sbi_cpu_prepare(unsigned int cpuid)
#include <linux/errno.h>
#include <linux/of.h>
#include <linux/string.h>
+#include <linux/sched/task_stack.h>
#include <asm/cpu_ops.h>
#include <asm/sbi.h>
#include <asm/smp.h>
const struct cpu_operations cpu_ops_spinwait;
+void *__cpu_spinwait_stack_pointer[NR_CPUS] __section(".data");
+void *__cpu_spinwait_task_pointer[NR_CPUS] __section(".data");
+
+static void cpu_update_secondary_bootdata(unsigned int cpuid,
+ struct task_struct *tidle)
+{
+ int hartid = cpuid_to_hartid_map(cpuid);
+
+ /*
+ * The hartid must be less than NR_CPUS to avoid out-of-bound access
+ * errors for __cpu_spinwait_stack/task_pointer. That is not always possible
+ * for platforms with discontiguous hartid numbering scheme. That's why
+ * spinwait booting is not the recommended approach for any platforms
+ * booting Linux in S-mode and can be disabled in the future.
+ */
+ if (hartid == INVALID_HARTID || hartid >= NR_CPUS)
+ return;
+
+ /* Make sure tidle is updated */
+ smp_mb();
+ WRITE_ONCE(__cpu_spinwait_stack_pointer[hartid],
+ task_stack_page(tidle) + THREAD_SIZE);
+ WRITE_ONCE(__cpu_spinwait_task_pointer[hartid], tidle);
+}
static int spinwait_cpu_prepare(unsigned int cpuid)
{
* selects the first cpu to boot the kernel and causes the remainder
* of the cpus to spin in a loop waiting for their stack pointer to be
* setup by that main cpu. Writing to bootdata
- * (i.e __cpu_up_stack_pointer) signals to the spinning cpus that they
+ * (i.e __cpu_spinwait_stack_pointer) signals to the spinning cpus that they
* can continue the boot process.
*/
cpu_update_secondary_bootdata(cpuid, tidle);
#include <asm/page.h>
#include <asm/pgtable.h>
#include <asm/csr.h>
+#include <asm/cpu_ops_sbi.h>
#include <asm/hwcap.h>
#include <asm/image.h>
#include "efi-header.S"
/* Compute satp for kernel page tables, but don't load it yet */
srl a2, a0, PAGE_SHIFT
- li a1, SATP_MODE
+ la a1, satp_mode
+ REG_L a1, 0(a1)
or a2, a2, a1
/*
/*
* Switch to kernel page tables. A full fence is necessary in order to
* avoid using the trampoline translations, which are only correct for
- * the first superpage. Fetching the fence is guarnteed to work
+ * the first superpage. Fetching the fence is guaranteed to work
* because that first superpage is translated the same way.
*/
csrw CSR_SATP, a2
la a3, .Lsecondary_park
csrw CSR_TVEC, a3
- slli a3, a0, LGREG
- la a4, __cpu_up_stack_pointer
- XIP_FIXUP_OFFSET a4
- la a5, __cpu_up_task_pointer
- XIP_FIXUP_OFFSET a5
- add a4, a3, a4
- add a5, a3, a5
- REG_L sp, (a4)
- REG_L tp, (a5)
-
- .global secondary_start_common
-secondary_start_common:
+ /* a0 contains the hartid & a1 contains boot data */
+ li a2, SBI_HART_BOOT_TASK_PTR_OFFSET
+ XIP_FIXUP_OFFSET a2
+ add a2, a2, a1
+ REG_L tp, (a2)
+ li a3, SBI_HART_BOOT_STACK_PTR_OFFSET
+ XIP_FIXUP_OFFSET a3
+ add a3, a3, a1
+ REG_L sp, (a3)
+
+.Lsecondary_start_common:
#ifdef CONFIG_MMU
/* Enable virtual memory and relocate to virtual address */
li t0, SR_FS
csrc CSR_STATUS, t0
-#ifdef CONFIG_SMP
+#ifdef CONFIG_RISCV_BOOT_SPINWAIT
li t0, CONFIG_NR_CPUS
blt a0, t0, .Lgood_cores
tail .Lsecondary_park
.Lgood_cores:
-#endif
+ /* The lottery system is only required for spinwait booting method */
#ifndef CONFIG_XIP_KERNEL
/* Pick one hart to run the main boot sequence */
la a3, hart_lottery
/* first time here if hart_lottery in RAM is not set */
beq t0, t1, .Lsecondary_start
+#endif /* CONFIG_XIP */
+#endif /* CONFIG_RISCV_BOOT_SPINWAIT */
+
+#ifdef CONFIG_XIP_KERNEL
la sp, _end + THREAD_SIZE
XIP_FIXUP_OFFSET sp
mv s0, a0
call soc_early_init
tail start_kernel
+#if CONFIG_RISCV_BOOT_SPINWAIT
.Lsecondary_start:
-#ifdef CONFIG_SMP
/* Set trap vector to spin forever to help debug */
la a3, .Lsecondary_park
csrw CSR_TVEC, a3
slli a3, a0, LGREG
- la a1, __cpu_up_stack_pointer
+ la a1, __cpu_spinwait_stack_pointer
XIP_FIXUP_OFFSET a1
- la a2, __cpu_up_task_pointer
+ la a2, __cpu_spinwait_task_pointer
XIP_FIXUP_OFFSET a2
add a1, a3, a1
add a2, a3, a2
beqz tp, .Lwait_for_cpu_up
fence
- tail secondary_start_common
-#endif
+ tail .Lsecondary_start_common
+#endif /* CONFIG_RISCV_BOOT_SPINWAIT */
END(_start_kernel)
ret
END(reset_regs)
#endif /* CONFIG_RISCV_M_MODE */
-
-__PAGE_ALIGNED_BSS
- /* Empty zero page */
- .balign PAGE_SIZE
asmlinkage void __init __copy_data(void);
#endif
-extern void *__cpu_up_stack_pointer[];
-extern void *__cpu_up_task_pointer[];
+#ifdef CONFIG_RISCV_BOOT_SPINWAIT
+extern void *__cpu_spinwait_stack_pointer[];
+extern void *__cpu_spinwait_task_pointer[];
+#endif
#endif /* __ASM_HEAD_H */
* s0: (const) Phys address to jump to
* s1: (const) Phys address of the FDT image
* s2: (const) The hartid of the current hart
- * s3: (const) kernel_map.va_pa_offset, used when switching MMU off
*/
mv s0, a1
mv s1, a2
mv s2, a3
- mv s3, a4
/* Disable / cleanup interrupts */
csrw CSR_SIE, zero
csrw CSR_SIP, zero
- /* Switch to physical addressing */
- la s4, 1f
- sub s4, s4, s3
- csrw CSR_STVEC, s4
- csrw CSR_SATP, zero
-
-.align 2
-1:
/* Pass the arguments to the next kernel / Cleanup*/
mv a0, s2
mv a1, s1
csrw CSR_SCAUSE, zero
csrw CSR_SSCRATCH, zero
- jalr zero, a2, 0
+ /*
+ * Switch to physical addressing
+ * This will also trigger a jump to CSR_STVEC
+ * which in this case is the address of the new
+ * kernel.
+ */
+ csrw CSR_STVEC, a2
+ csrw CSR_SATP, zero
+
SYM_CODE_END(riscv_kexec_norelocate)
.section ".rodata"
struct kimage_arch *internal = &image->arch;
unsigned long jump_addr = (unsigned long) image->start;
unsigned long first_ind_entry = (unsigned long) &image->head;
- unsigned long this_hart_id = raw_smp_processor_id();
+ unsigned long this_cpu_id = smp_processor_id();
+ unsigned long this_hart_id = cpuid_to_hartid_map(this_cpu_id);
unsigned long fdt_addr = internal->fdt_addr;
void *control_code_buffer = page_address(image->control_code_page);
riscv_kexec_method kexec_method = NULL;
unsigned int pos, unsigned int count,
const void *kbuf, const void __user *ubuf)
{
- int ret;
struct pt_regs *regs;
regs = task_pt_regs(target);
- ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, regs, 0, -1);
- return ret;
+ return user_regset_copyin(&pos, &count, &kbuf, &ubuf, regs, 0, -1);
}
#ifdef CONFIG_FPU
#include <linux/init.h>
#include <linux/pm.h>
+#include <linux/reboot.h>
#include <asm/sbi.h>
#include <asm/smp.h>
EXPORT_SYMBOL(sbi_spec_version);
static void (*__sbi_set_timer)(uint64_t stime) __ro_after_init;
-static int (*__sbi_send_ipi)(const unsigned long *hart_mask) __ro_after_init;
-static int (*__sbi_rfence)(int fid, const unsigned long *hart_mask,
+static int (*__sbi_send_ipi)(const struct cpumask *cpu_mask) __ro_after_init;
+static int (*__sbi_rfence)(int fid, const struct cpumask *cpu_mask,
unsigned long start, unsigned long size,
unsigned long arg4, unsigned long arg5) __ro_after_init;
EXPORT_SYMBOL(sbi_err_map_linux_errno);
#ifdef CONFIG_RISCV_SBI_V01
+static unsigned long __sbi_v01_cpumask_to_hartmask(const struct cpumask *cpu_mask)
+{
+ unsigned long cpuid, hartid;
+ unsigned long hmask = 0;
+
+ /*
+ * There is no maximum hartid concept in RISC-V and NR_CPUS must not be
+ * associated with hartid. As SBI v0.1 is only kept for backward compatibility
+ * and will be removed in the future, there is no point in supporting hartid
+ * greater than BITS_PER_LONG (32 for RV32 and 64 for RV64). Ideally, SBI v0.2
+ * should be used for platforms with hartid greater than BITS_PER_LONG.
+ */
+ for_each_cpu(cpuid, cpu_mask) {
+ hartid = cpuid_to_hartid_map(cpuid);
+ if (hartid >= BITS_PER_LONG) {
+ pr_warn("Unable to send any request to hartid > BITS_PER_LONG for SBI v0.1\n");
+ break;
+ }
+ hmask |= 1 << hartid;
+ }
+
+ return hmask;
+}
+
/**
* sbi_console_putchar() - Writes given character to the console device.
* @ch: The data to be written to the console.
#endif
}
-static int __sbi_send_ipi_v01(const unsigned long *hart_mask)
+static int __sbi_send_ipi_v01(const struct cpumask *cpu_mask)
{
- sbi_ecall(SBI_EXT_0_1_SEND_IPI, 0, (unsigned long)hart_mask,
+ unsigned long hart_mask;
+
+ if (!cpu_mask)
+ cpu_mask = cpu_online_mask;
+ hart_mask = __sbi_v01_cpumask_to_hartmask(cpu_mask);
+
+ sbi_ecall(SBI_EXT_0_1_SEND_IPI, 0, (unsigned long)(&hart_mask),
0, 0, 0, 0, 0);
return 0;
}
-static int __sbi_rfence_v01(int fid, const unsigned long *hart_mask,
+static int __sbi_rfence_v01(int fid, const struct cpumask *cpu_mask,
unsigned long start, unsigned long size,
unsigned long arg4, unsigned long arg5)
{
int result = 0;
+ unsigned long hart_mask;
+
+ if (!cpu_mask)
+ cpu_mask = cpu_online_mask;
+ hart_mask = __sbi_v01_cpumask_to_hartmask(cpu_mask);
/* v0.2 function IDs are equivalent to v0.1 extension IDs */
switch (fid) {
case SBI_EXT_RFENCE_REMOTE_FENCE_I:
sbi_ecall(SBI_EXT_0_1_REMOTE_FENCE_I, 0,
- (unsigned long)hart_mask, 0, 0, 0, 0, 0);
+ (unsigned long)&hart_mask, 0, 0, 0, 0, 0);
break;
case SBI_EXT_RFENCE_REMOTE_SFENCE_VMA:
sbi_ecall(SBI_EXT_0_1_REMOTE_SFENCE_VMA, 0,
- (unsigned long)hart_mask, start, size,
+ (unsigned long)&hart_mask, start, size,
0, 0, 0);
break;
case SBI_EXT_RFENCE_REMOTE_SFENCE_VMA_ASID:
sbi_ecall(SBI_EXT_0_1_REMOTE_SFENCE_VMA_ASID, 0,
- (unsigned long)hart_mask, start, size,
+ (unsigned long)&hart_mask, start, size,
arg4, 0, 0);
break;
default:
sbi_major_version(), sbi_minor_version());
}
-static int __sbi_send_ipi_v01(const unsigned long *hart_mask)
+static int __sbi_send_ipi_v01(const struct cpumask *cpu_mask)
{
pr_warn("IPI extension is not available in SBI v%lu.%lu\n",
sbi_major_version(), sbi_minor_version());
return 0;
}
-static int __sbi_rfence_v01(int fid, const unsigned long *hart_mask,
+static int __sbi_rfence_v01(int fid, const struct cpumask *cpu_mask,
unsigned long start, unsigned long size,
unsigned long arg4, unsigned long arg5)
{
#endif
}
-static int __sbi_send_ipi_v02(const unsigned long *hart_mask)
+static int __sbi_send_ipi_v02(const struct cpumask *cpu_mask)
{
- unsigned long hartid, hmask_val, hbase;
- struct cpumask tmask;
+ unsigned long hartid, cpuid, hmask = 0, hbase = 0;
struct sbiret ret = {0};
int result;
- if (!hart_mask || !(*hart_mask)) {
- riscv_cpuid_to_hartid_mask(cpu_online_mask, &tmask);
- hart_mask = cpumask_bits(&tmask);
- }
+ if (!cpu_mask)
+ cpu_mask = cpu_online_mask;
- hmask_val = 0;
- hbase = 0;
- for_each_set_bit(hartid, hart_mask, NR_CPUS) {
- if (hmask_val && ((hbase + BITS_PER_LONG) <= hartid)) {
+ for_each_cpu(cpuid, cpu_mask) {
+ hartid = cpuid_to_hartid_map(cpuid);
+ if (hmask && ((hbase + BITS_PER_LONG) <= hartid)) {
ret = sbi_ecall(SBI_EXT_IPI, SBI_EXT_IPI_SEND_IPI,
- hmask_val, hbase, 0, 0, 0, 0);
+ hmask, hbase, 0, 0, 0, 0);
if (ret.error)
goto ecall_failed;
- hmask_val = 0;
+ hmask = 0;
hbase = 0;
}
- if (!hmask_val)
+ if (!hmask)
hbase = hartid;
- hmask_val |= 1UL << (hartid - hbase);
+ hmask |= 1UL << (hartid - hbase);
}
- if (hmask_val) {
+ if (hmask) {
ret = sbi_ecall(SBI_EXT_IPI, SBI_EXT_IPI_SEND_IPI,
- hmask_val, hbase, 0, 0, 0, 0);
+ hmask, hbase, 0, 0, 0, 0);
if (ret.error)
goto ecall_failed;
}
ecall_failed:
result = sbi_err_map_linux_errno(ret.error);
pr_err("%s: hbase = [%lu] hmask = [0x%lx] failed (error [%d])\n",
- __func__, hbase, hmask_val, result);
+ __func__, hbase, hmask, result);
return result;
}
-static int __sbi_rfence_v02_call(unsigned long fid, unsigned long hmask_val,
+static int __sbi_rfence_v02_call(unsigned long fid, unsigned long hmask,
unsigned long hbase, unsigned long start,
unsigned long size, unsigned long arg4,
unsigned long arg5)
switch (fid) {
case SBI_EXT_RFENCE_REMOTE_FENCE_I:
- ret = sbi_ecall(ext, fid, hmask_val, hbase, 0, 0, 0, 0);
+ ret = sbi_ecall(ext, fid, hmask, hbase, 0, 0, 0, 0);
break;
case SBI_EXT_RFENCE_REMOTE_SFENCE_VMA:
- ret = sbi_ecall(ext, fid, hmask_val, hbase, start,
+ ret = sbi_ecall(ext, fid, hmask, hbase, start,
size, 0, 0);
break;
case SBI_EXT_RFENCE_REMOTE_SFENCE_VMA_ASID:
- ret = sbi_ecall(ext, fid, hmask_val, hbase, start,
+ ret = sbi_ecall(ext, fid, hmask, hbase, start,
size, arg4, 0);
break;
case SBI_EXT_RFENCE_REMOTE_HFENCE_GVMA:
- ret = sbi_ecall(ext, fid, hmask_val, hbase, start,
+ ret = sbi_ecall(ext, fid, hmask, hbase, start,
size, 0, 0);
break;
case SBI_EXT_RFENCE_REMOTE_HFENCE_GVMA_VMID:
- ret = sbi_ecall(ext, fid, hmask_val, hbase, start,
+ ret = sbi_ecall(ext, fid, hmask, hbase, start,
size, arg4, 0);
break;
case SBI_EXT_RFENCE_REMOTE_HFENCE_VVMA:
- ret = sbi_ecall(ext, fid, hmask_val, hbase, start,
+ ret = sbi_ecall(ext, fid, hmask, hbase, start,
size, 0, 0);
break;
case SBI_EXT_RFENCE_REMOTE_HFENCE_VVMA_ASID:
- ret = sbi_ecall(ext, fid, hmask_val, hbase, start,
+ ret = sbi_ecall(ext, fid, hmask, hbase, start,
size, arg4, 0);
break;
default:
if (ret.error) {
result = sbi_err_map_linux_errno(ret.error);
pr_err("%s: hbase = [%lu] hmask = [0x%lx] failed (error [%d])\n",
- __func__, hbase, hmask_val, result);
+ __func__, hbase, hmask, result);
}
return result;
}
-static int __sbi_rfence_v02(int fid, const unsigned long *hart_mask,
+static int __sbi_rfence_v02(int fid, const struct cpumask *cpu_mask,
unsigned long start, unsigned long size,
unsigned long arg4, unsigned long arg5)
{
- unsigned long hmask_val, hartid, hbase;
- struct cpumask tmask;
+ unsigned long hartid, cpuid, hmask = 0, hbase = 0;
int result;
- if (!hart_mask || !(*hart_mask)) {
- riscv_cpuid_to_hartid_mask(cpu_online_mask, &tmask);
- hart_mask = cpumask_bits(&tmask);
- }
+ if (!cpu_mask)
+ cpu_mask = cpu_online_mask;
- hmask_val = 0;
- hbase = 0;
- for_each_set_bit(hartid, hart_mask, NR_CPUS) {
- if (hmask_val && ((hbase + BITS_PER_LONG) <= hartid)) {
- result = __sbi_rfence_v02_call(fid, hmask_val, hbase,
+ for_each_cpu(cpuid, cpu_mask) {
+ hartid = cpuid_to_hartid_map(cpuid);
+ if (hmask && ((hbase + BITS_PER_LONG) <= hartid)) {
+ result = __sbi_rfence_v02_call(fid, hmask, hbase,
start, size, arg4, arg5);
if (result)
return result;
- hmask_val = 0;
+ hmask = 0;
hbase = 0;
}
- if (!hmask_val)
+ if (!hmask)
hbase = hartid;
- hmask_val |= 1UL << (hartid - hbase);
+ hmask |= 1UL << (hartid - hbase);
}
- if (hmask_val) {
- result = __sbi_rfence_v02_call(fid, hmask_val, hbase,
+ if (hmask) {
+ result = __sbi_rfence_v02_call(fid, hmask, hbase,
start, size, arg4, arg5);
if (result)
return result;
/**
* sbi_send_ipi() - Send an IPI to any hart.
- * @hart_mask: A cpu mask containing all the target harts.
+ * @cpu_mask: A cpu mask containing all the target harts.
*
* Return: 0 on success, appropriate linux error code otherwise.
*/
-int sbi_send_ipi(const unsigned long *hart_mask)
+int sbi_send_ipi(const struct cpumask *cpu_mask)
{
- return __sbi_send_ipi(hart_mask);
+ return __sbi_send_ipi(cpu_mask);
}
EXPORT_SYMBOL(sbi_send_ipi);
/**
* sbi_remote_fence_i() - Execute FENCE.I instruction on given remote harts.
- * @hart_mask: A cpu mask containing all the target harts.
+ * @cpu_mask: A cpu mask containing all the target harts.
*
* Return: 0 on success, appropriate linux error code otherwise.
*/
-int sbi_remote_fence_i(const unsigned long *hart_mask)
+int sbi_remote_fence_i(const struct cpumask *cpu_mask)
{
return __sbi_rfence(SBI_EXT_RFENCE_REMOTE_FENCE_I,
- hart_mask, 0, 0, 0, 0);
+ cpu_mask, 0, 0, 0, 0);
}
EXPORT_SYMBOL(sbi_remote_fence_i);
/**
* sbi_remote_sfence_vma() - Execute SFENCE.VMA instructions on given remote
* harts for the specified virtual address range.
- * @hart_mask: A cpu mask containing all the target harts.
+ * @cpu_mask: A cpu mask containing all the target harts.
* @start: Start of the virtual address
* @size: Total size of the virtual address range.
*
* Return: 0 on success, appropriate linux error code otherwise.
*/
-int sbi_remote_sfence_vma(const unsigned long *hart_mask,
+int sbi_remote_sfence_vma(const struct cpumask *cpu_mask,
unsigned long start,
unsigned long size)
{
return __sbi_rfence(SBI_EXT_RFENCE_REMOTE_SFENCE_VMA,
- hart_mask, start, size, 0, 0);
+ cpu_mask, start, size, 0, 0);
}
EXPORT_SYMBOL(sbi_remote_sfence_vma);
* sbi_remote_sfence_vma_asid() - Execute SFENCE.VMA instructions on given
* remote harts for a virtual address range belonging to a specific ASID.
*
- * @hart_mask: A cpu mask containing all the target harts.
+ * @cpu_mask: A cpu mask containing all the target harts.
* @start: Start of the virtual address
* @size: Total size of the virtual address range.
* @asid: The value of address space identifier (ASID).
*
* Return: 0 on success, appropriate linux error code otherwise.
*/
-int sbi_remote_sfence_vma_asid(const unsigned long *hart_mask,
+int sbi_remote_sfence_vma_asid(const struct cpumask *cpu_mask,
unsigned long start,
unsigned long size,
unsigned long asid)
{
return __sbi_rfence(SBI_EXT_RFENCE_REMOTE_SFENCE_VMA_ASID,
- hart_mask, start, size, asid, 0);
+ cpu_mask, start, size, asid, 0);
}
EXPORT_SYMBOL(sbi_remote_sfence_vma_asid);
/**
* sbi_remote_hfence_gvma() - Execute HFENCE.GVMA instructions on given remote
* harts for the specified guest physical address range.
- * @hart_mask: A cpu mask containing all the target harts.
+ * @cpu_mask: A cpu mask containing all the target harts.
* @start: Start of the guest physical address
* @size: Total size of the guest physical address range.
*
* Return: None
*/
-int sbi_remote_hfence_gvma(const unsigned long *hart_mask,
+int sbi_remote_hfence_gvma(const struct cpumask *cpu_mask,
unsigned long start,
unsigned long size)
{
return __sbi_rfence(SBI_EXT_RFENCE_REMOTE_HFENCE_GVMA,
- hart_mask, start, size, 0, 0);
+ cpu_mask, start, size, 0, 0);
}
EXPORT_SYMBOL_GPL(sbi_remote_hfence_gvma);
* sbi_remote_hfence_gvma_vmid() - Execute HFENCE.GVMA instructions on given
* remote harts for a guest physical address range belonging to a specific VMID.
*
- * @hart_mask: A cpu mask containing all the target harts.
+ * @cpu_mask: A cpu mask containing all the target harts.
* @start: Start of the guest physical address
* @size: Total size of the guest physical address range.
* @vmid: The value of guest ID (VMID).
*
* Return: 0 if success, Error otherwise.
*/
-int sbi_remote_hfence_gvma_vmid(const unsigned long *hart_mask,
+int sbi_remote_hfence_gvma_vmid(const struct cpumask *cpu_mask,
unsigned long start,
unsigned long size,
unsigned long vmid)
{
return __sbi_rfence(SBI_EXT_RFENCE_REMOTE_HFENCE_GVMA_VMID,
- hart_mask, start, size, vmid, 0);
+ cpu_mask, start, size, vmid, 0);
}
EXPORT_SYMBOL(sbi_remote_hfence_gvma_vmid);
/**
* sbi_remote_hfence_vvma() - Execute HFENCE.VVMA instructions on given remote
* harts for the current guest virtual address range.
- * @hart_mask: A cpu mask containing all the target harts.
+ * @cpu_mask: A cpu mask containing all the target harts.
* @start: Start of the current guest virtual address
* @size: Total size of the current guest virtual address range.
*
* Return: None
*/
-int sbi_remote_hfence_vvma(const unsigned long *hart_mask,
+int sbi_remote_hfence_vvma(const struct cpumask *cpu_mask,
unsigned long start,
unsigned long size)
{
return __sbi_rfence(SBI_EXT_RFENCE_REMOTE_HFENCE_VVMA,
- hart_mask, start, size, 0, 0);
+ cpu_mask, start, size, 0, 0);
}
EXPORT_SYMBOL(sbi_remote_hfence_vvma);
* remote harts for current guest virtual address range belonging to a specific
* ASID.
*
- * @hart_mask: A cpu mask containing all the target harts.
+ * @cpu_mask: A cpu mask containing all the target harts.
* @start: Start of the current guest virtual address
* @size: Total size of the current guest virtual address range.
* @asid: The value of address space identifier (ASID).
*
* Return: None
*/
-int sbi_remote_hfence_vvma_asid(const unsigned long *hart_mask,
+int sbi_remote_hfence_vvma_asid(const struct cpumask *cpu_mask,
unsigned long start,
unsigned long size,
unsigned long asid)
{
return __sbi_rfence(SBI_EXT_RFENCE_REMOTE_HFENCE_VVMA_ASID,
- hart_mask, start, size, asid, 0);
+ cpu_mask, start, size, asid, 0);
}
EXPORT_SYMBOL(sbi_remote_hfence_vvma_asid);
+static void sbi_srst_reset(unsigned long type, unsigned long reason)
+{
+ sbi_ecall(SBI_EXT_SRST, SBI_EXT_SRST_RESET, type, reason,
+ 0, 0, 0, 0);
+ pr_warn("%s: type=0x%lx reason=0x%lx failed\n",
+ __func__, type, reason);
+}
+
+static int sbi_srst_reboot(struct notifier_block *this,
+ unsigned long mode, void *cmd)
+{
+ sbi_srst_reset((mode == REBOOT_WARM || mode == REBOOT_SOFT) ?
+ SBI_SRST_RESET_TYPE_WARM_REBOOT :
+ SBI_SRST_RESET_TYPE_COLD_REBOOT,
+ SBI_SRST_RESET_REASON_NONE);
+ return NOTIFY_DONE;
+}
+
+static struct notifier_block sbi_srst_reboot_nb;
+
+static void sbi_srst_power_off(void)
+{
+ sbi_srst_reset(SBI_SRST_RESET_TYPE_SHUTDOWN,
+ SBI_SRST_RESET_REASON_NONE);
+}
+
/**
* sbi_probe_extension() - Check if an SBI extension ID is supported or not.
* @extid: The extension ID to be probed.
static void sbi_send_cpumask_ipi(const struct cpumask *target)
{
- struct cpumask hartid_mask;
-
- riscv_cpuid_to_hartid_mask(target, &hartid_mask);
-
- sbi_send_ipi(cpumask_bits(&hartid_mask));
+ sbi_send_ipi(target);
}
static const struct riscv_ipi_ops sbi_ipi_ops = {
} else {
__sbi_rfence = __sbi_rfence_v01;
}
+ if ((sbi_spec_version >= sbi_mk_version(0, 3)) &&
+ (sbi_probe_extension(SBI_EXT_SRST) > 0)) {
+ pr_info("SBI SRST extension detected\n");
+ pm_power_off = sbi_srst_power_off;
+ sbi_srst_reboot_nb.notifier_call = sbi_srst_reboot;
+ sbi_srst_reboot_nb.priority = 192;
+ register_restart_handler(&sbi_srst_reboot_nb);
+ }
} else {
__sbi_set_timer = __sbi_set_timer_v01;
__sbi_send_ipi = __sbi_send_ipi_v01;
return -ENOENT;
}
-void riscv_cpuid_to_hartid_mask(const struct cpumask *in, struct cpumask *out)
-{
- int cpu;
-
- cpumask_clear(out);
- for_each_cpu(cpu, in)
- cpumask_set_cpu(cpuid_to_hartid_map(cpu), out);
-}
-EXPORT_SYMBOL_GPL(riscv_cpuid_to_hartid_mask);
-
bool arch_match_cpu_phys_id(int cpu, u64 phys_id)
{
return phys_id == cpuid_to_hartid_map(cpu);
if (cpuid >= NR_CPUS) {
pr_warn("Invalid cpuid [%d] for hartid [%d]\n",
cpuid, hart);
- break;
+ continue;
}
cpuid_to_hartid_map(cpuid) = hart;
ENTRY_TEXT
IRQENTRY_TEXT
SOFTIRQENTRY_TEXT
- *(.fixup)
_etext = .;
}
RO_DATA(L1_CACHE_BYTES)
* Copyright (C) 2017 SiFive
*/
-#define RO_EXCEPTION_TABLE_ALIGN 16
+#define RO_EXCEPTION_TABLE_ALIGN 4
#ifdef CONFIG_XIP_KERNEL
#include "vmlinux-xip.lds.S"
ENTRY_TEXT
IRQENTRY_TEXT
SOFTIRQENTRY_TEXT
- *(.fixup)
_etext = .;
}
static void stage2_remote_tlb_flush(struct kvm *kvm, u32 level, gpa_t addr)
{
- struct cpumask hmask;
unsigned long size = PAGE_SIZE;
struct kvm_vmid *vmid = &kvm->arch.vmid;
* where the Guest/VM is running.
*/
preempt_disable();
- riscv_cpuid_to_hartid_mask(cpu_online_mask, &hmask);
- sbi_remote_hfence_gvma_vmid(cpumask_bits(&hmask), addr, size,
+ sbi_remote_hfence_gvma_vmid(cpu_online_mask, addr, size,
READ_ONCE(vmid->vmid));
preempt_enable();
}
{
int ret = 0;
unsigned long i;
- struct cpumask cm, hm;
+ struct cpumask cm;
struct kvm_vcpu *tmp;
struct kvm_cpu_context *cp = &vcpu->arch.guest_context;
unsigned long hmask = cp->a0;
unsigned long funcid = cp->a6;
cpumask_clear(&cm);
- cpumask_clear(&hm);
kvm_for_each_vcpu(i, tmp, vcpu->kvm) {
if (hbase != -1UL) {
if (tmp->vcpu_id < hbase)
cpumask_set_cpu(tmp->cpu, &cm);
}
- riscv_cpuid_to_hartid_mask(&cm, &hm);
-
switch (funcid) {
case SBI_EXT_RFENCE_REMOTE_FENCE_I:
- ret = sbi_remote_fence_i(cpumask_bits(&hm));
+ ret = sbi_remote_fence_i(&cm);
break;
case SBI_EXT_RFENCE_REMOTE_SFENCE_VMA:
- ret = sbi_remote_hfence_vvma(cpumask_bits(&hm), cp->a2, cp->a3);
+ ret = sbi_remote_hfence_vvma(&cm, cp->a2, cp->a3);
break;
case SBI_EXT_RFENCE_REMOTE_SFENCE_VMA_ASID:
- ret = sbi_remote_hfence_vvma_asid(cpumask_bits(&hm), cp->a2,
+ ret = sbi_remote_hfence_vvma_asid(&cm, cp->a2,
cp->a3, cp->a4);
break;
case SBI_EXT_RFENCE_REMOTE_HFENCE_GVMA:
int i, ret = 0;
u64 next_cycle;
struct kvm_vcpu *rvcpu;
- struct cpumask cm, hm;
+ struct cpumask cm;
struct kvm *kvm = vcpu->kvm;
struct kvm_cpu_context *cp = &vcpu->arch.guest_context;
continue;
cpumask_set_cpu(rvcpu->cpu, &cm);
}
- riscv_cpuid_to_hartid_mask(&cm, &hm);
if (cp->a7 == SBI_EXT_0_1_REMOTE_FENCE_I)
- ret = sbi_remote_fence_i(cpumask_bits(&hm));
+ ret = sbi_remote_fence_i(&cm);
else if (cp->a7 == SBI_EXT_0_1_REMOTE_SFENCE_VMA)
- ret = sbi_remote_hfence_vvma(cpumask_bits(&hm),
- cp->a1, cp->a2);
+ ret = sbi_remote_hfence_vvma(&cm, cp->a1, cp->a2);
else
- ret = sbi_remote_hfence_vvma_asid(cpumask_bits(&hm),
- cp->a1, cp->a2, cp->a3);
+ ret = sbi_remote_hfence_vvma_asid(&cm, cp->a1, cp->a2, cp->a3);
break;
default:
ret = -EINVAL;
{
unsigned long i;
struct kvm_vcpu *v;
- struct cpumask hmask;
struct kvm_vmid *vmid = &vcpu->kvm->arch.vmid;
if (!kvm_riscv_stage2_vmid_ver_changed(vmid))
* running, we force VM exits on all host CPUs using IPI and
* flush all Guest TLBs.
*/
- riscv_cpuid_to_hartid_mask(cpu_online_mask, &hmask);
- sbi_remote_hfence_gvma(cpumask_bits(&hmask), 0, 0);
+ sbi_remote_hfence_gvma(cpu_online_mask, 0, 0);
}
vmid->vmid = vmid_next;
#include <linux/linkage.h>
#include <asm-generic/export.h>
#include <asm/asm.h>
+#include <asm/asm-extable.h>
#include <asm/csr.h>
.macro fixup op reg addr lbl
100:
\op \reg, \addr
- .section __ex_table,"a"
- .balign RISCV_SZPTR
- RISCV_PTR 100b, \lbl
- .previous
+ _asm_extable 100b, \lbl
.endm
ENTRY(__asm_copy_to_user)
csrc CSR_STATUS, t6
li a0, 0
ret
+
+ /* Exception fixup code */
+10:
+ /* Disable access to user memory */
+ csrs CSR_STATUS, t6
+ mv a0, t5
+ ret
ENDPROC(__asm_copy_to_user)
ENDPROC(__asm_copy_from_user)
EXPORT_SYMBOL(__asm_copy_to_user)
addi a0, a0, 1
bltu a0, a3, 5b
j 3b
-ENDPROC(__clear_user)
-EXPORT_SYMBOL(__clear_user)
- .section .fixup,"ax"
- .balign 4
- /* Fixup code for __copy_user(10) and __clear_user(11) */
-10:
- /* Disable access to user memory */
- csrs CSR_STATUS, t6
- mv a0, t5
- ret
+ /* Exception fixup code */
11:
+ /* Disable access to user memory */
csrs CSR_STATUS, t6
mv a0, a1
ret
- .previous
+ENDPROC(__clear_user)
+EXPORT_SYMBOL(__clear_user)
*/
smp_mb();
} else if (IS_ENABLED(CONFIG_RISCV_SBI)) {
- cpumask_t hartid_mask;
-
- riscv_cpuid_to_hartid_mask(&others, &hartid_mask);
- sbi_remote_fence_i(cpumask_bits(&hartid_mask));
+ sbi_remote_fence_i(&others);
} else {
on_each_cpu_mask(&others, ipi_remote_fence_i, NULL, 1);
}
switch_mm_fast:
csr_write(CSR_SATP, virt_to_pfn(mm->pgd) |
((cntx & asid_mask) << SATP_ASID_SHIFT) |
- SATP_MODE);
+ satp_mode);
if (need_flush_tlb)
local_flush_tlb_all();
static void set_mm_noasid(struct mm_struct *mm)
{
/* Switch the page table and blindly nuke entire local TLB */
- csr_write(CSR_SATP, virt_to_pfn(mm->pgd) | SATP_MODE);
+ csr_write(CSR_SATP, virt_to_pfn(mm->pgd) | satp_mode);
local_flush_tlb_all();
}
*/
+#include <linux/bitfield.h>
#include <linux/extable.h>
#include <linux/module.h>
#include <linux/uaccess.h>
+#include <asm/asm-extable.h>
+#include <asm/ptrace.h>
-#if defined(CONFIG_BPF_JIT) && defined(CONFIG_ARCH_RV64I)
-int rv_bpf_fixup_exception(const struct exception_table_entry *ex, struct pt_regs *regs);
-#endif
+static inline unsigned long
+get_ex_fixup(const struct exception_table_entry *ex)
+{
+ return ((unsigned long)&ex->fixup + ex->fixup);
+}
+
+static bool ex_handler_fixup(const struct exception_table_entry *ex,
+ struct pt_regs *regs)
+{
+ regs->epc = get_ex_fixup(ex);
+ return true;
+}
+
+static inline void regs_set_gpr(struct pt_regs *regs, unsigned int offset,
+ unsigned long val)
+{
+ if (unlikely(offset > MAX_REG_OFFSET))
+ return;
+
+ if (!offset)
+ *(unsigned long *)((unsigned long)regs + offset) = val;
+}
+
+static bool ex_handler_uaccess_err_zero(const struct exception_table_entry *ex,
+ struct pt_regs *regs)
+{
+ int reg_err = FIELD_GET(EX_DATA_REG_ERR, ex->data);
+ int reg_zero = FIELD_GET(EX_DATA_REG_ZERO, ex->data);
+
+ regs_set_gpr(regs, reg_err, -EFAULT);
+ regs_set_gpr(regs, reg_zero, 0);
+
+ regs->epc = get_ex_fixup(ex);
+ return true;
+}
-int fixup_exception(struct pt_regs *regs)
+bool fixup_exception(struct pt_regs *regs)
{
- const struct exception_table_entry *fixup;
+ const struct exception_table_entry *ex;
- fixup = search_exception_tables(regs->epc);
- if (!fixup)
- return 0;
+ ex = search_exception_tables(regs->epc);
+ if (!ex)
+ return false;
-#if defined(CONFIG_BPF_JIT) && defined(CONFIG_ARCH_RV64I)
- if (regs->epc >= BPF_JIT_REGION_START && regs->epc < BPF_JIT_REGION_END)
- return rv_bpf_fixup_exception(fixup, regs);
-#endif
+ switch (ex->type) {
+ case EX_TYPE_FIXUP:
+ return ex_handler_fixup(ex, regs);
+ case EX_TYPE_BPF:
+ return ex_handler_bpf(ex, regs);
+ case EX_TYPE_UACCESS_ERR_ZERO:
+ return ex_handler_uaccess_err_zero(ex, regs);
+ }
- regs->epc = fixup->fixup;
- return 1;
+ BUG();
}
* only copy the information from the master page table,
* nothing more.
*/
- if (unlikely((addr >= VMALLOC_START) && (addr <= VMALLOC_END))) {
+ if (unlikely((addr >= VMALLOC_START) && (addr < VMALLOC_END))) {
vmalloc_fault(regs, code, addr);
return;
}
#define kernel_map (*(struct kernel_mapping *)XIP_FIXUP(&kernel_map))
#endif
+#ifdef CONFIG_64BIT
+u64 satp_mode = !IS_ENABLED(CONFIG_XIP_KERNEL) ? SATP_MODE_48 : SATP_MODE_39;
+#else
+u64 satp_mode = SATP_MODE_32;
+#endif
+EXPORT_SYMBOL(satp_mode);
+
+bool pgtable_l4_enabled = IS_ENABLED(CONFIG_64BIT) && !IS_ENABLED(CONFIG_XIP_KERNEL);
+EXPORT_SYMBOL(pgtable_l4_enabled);
+
phys_addr_t phys_ram_base __ro_after_init;
EXPORT_SYMBOL(phys_ram_base);
-#ifdef CONFIG_XIP_KERNEL
-extern char _xiprom[], _exiprom[], __data_loc;
-#endif
-
unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)]
__page_aligned_bss;
EXPORT_SYMBOL(empty_zero_page);
void *_dtb_early_va __initdata;
uintptr_t _dtb_early_pa __initdata;
-struct pt_alloc_ops {
- pte_t *(*get_pte_virt)(phys_addr_t pa);
- phys_addr_t (*alloc_pte)(uintptr_t va);
-#ifndef __PAGETABLE_PMD_FOLDED
- pmd_t *(*get_pmd_virt)(phys_addr_t pa);
- phys_addr_t (*alloc_pmd)(uintptr_t va);
-#endif
-};
-
static phys_addr_t dma32_phys_limit __initdata;
static void __init zone_sizes_init(void)
(unsigned long)VMALLOC_END);
print_mlm("lowmem", (unsigned long)PAGE_OFFSET,
(unsigned long)high_memory);
-#ifdef CONFIG_64BIT
- print_mlm("kernel", (unsigned long)KERNEL_LINK_ADDR,
- (unsigned long)ADDRESS_SPACE_END);
+ if (IS_ENABLED(CONFIG_64BIT)) {
+#ifdef CONFIG_KASAN
+ print_mlm("kasan", KASAN_SHADOW_START, KASAN_SHADOW_END);
#endif
+
+ print_mlm("kernel", (unsigned long)KERNEL_LINK_ADDR,
+ (unsigned long)ADDRESS_SPACE_END);
+ }
}
#else
static void print_vm_layout(void) { }
print_vm_layout();
}
-/*
- * The default maximal physical memory size is -PAGE_OFFSET for 32-bit kernel,
- * whereas for 64-bit kernel, the end of the virtual address space is occupied
- * by the modules/BPF/kernel mappings which reduces the available size of the
- * linear mapping.
- * Limit the memory size via mem.
- */
-#ifdef CONFIG_64BIT
-static phys_addr_t memory_limit = -PAGE_OFFSET - SZ_4G;
-#else
-static phys_addr_t memory_limit = -PAGE_OFFSET;
-#endif
+/* Limit the memory size via mem. */
+static phys_addr_t memory_limit;
static int __init early_mem(char *p)
{
static void __init setup_bootmem(void)
{
phys_addr_t vmlinux_end = __pa_symbol(&_end);
- phys_addr_t vmlinux_start = __pa_symbol(&_start);
- phys_addr_t __maybe_unused max_mapped_addr;
- phys_addr_t phys_ram_end;
+ phys_addr_t max_mapped_addr;
+ phys_addr_t phys_ram_end, vmlinux_start;
-#ifdef CONFIG_XIP_KERNEL
- vmlinux_start = __pa_symbol(&_sdata);
-#endif
+ if (IS_ENABLED(CONFIG_XIP_KERNEL))
+ vmlinux_start = __pa_symbol(&_sdata);
+ else
+ vmlinux_start = __pa_symbol(&_start);
memblock_enforce_memory_limit(memory_limit);
- /*
- * Reserve from the start of the kernel to the end of the kernel
- */
-#if defined(CONFIG_64BIT) && defined(CONFIG_STRICT_KERNEL_RWX)
/*
* Make sure we align the reservation on PMD_SIZE since we will
* map the kernel in the linear mapping as read-only: we do not want
* any allocation to happen between _end and the next pmd aligned page.
*/
- vmlinux_end = (vmlinux_end + PMD_SIZE - 1) & PMD_MASK;
-#endif
+ if (IS_ENABLED(CONFIG_64BIT) && IS_ENABLED(CONFIG_STRICT_KERNEL_RWX))
+ vmlinux_end = (vmlinux_end + PMD_SIZE - 1) & PMD_MASK;
+ /*
+ * Reserve from the start of the kernel to the end of the kernel
+ */
memblock_reserve(vmlinux_start, vmlinux_end - vmlinux_start);
-
phys_ram_end = memblock_end_of_DRAM();
-#ifndef CONFIG_64BIT
-#ifndef CONFIG_XIP_KERNEL
- phys_ram_base = memblock_start_of_DRAM();
-#endif
+ if (!IS_ENABLED(CONFIG_XIP_KERNEL))
+ phys_ram_base = memblock_start_of_DRAM();
/*
* memblock allocator is not aware of the fact that last 4K bytes of
* the addressable memory can not be mapped because of IS_ERR_VALUE
* address space is occupied by the kernel mapping then this check must
* be done as soon as the kernel mapping base address is determined.
*/
- max_mapped_addr = __pa(~(ulong)0);
- if (max_mapped_addr == (phys_ram_end - 1))
- memblock_set_current_limit(max_mapped_addr - 4096);
-#endif
+ if (!IS_ENABLED(CONFIG_64BIT)) {
+ max_mapped_addr = __pa(~(ulong)0);
+ if (max_mapped_addr == (phys_ram_end - 1))
+ memblock_set_current_limit(max_mapped_addr - 4096);
+ }
min_low_pfn = PFN_UP(phys_ram_base);
max_low_pfn = max_pfn = PFN_DOWN(phys_ram_end);
}
#ifdef CONFIG_MMU
-static struct pt_alloc_ops _pt_ops __initdata;
-
-#ifdef CONFIG_XIP_KERNEL
-#define pt_ops (*(struct pt_alloc_ops *)XIP_FIXUP(&_pt_ops))
-#else
-#define pt_ops _pt_ops
-#endif
+struct pt_alloc_ops pt_ops __initdata;
unsigned long riscv_pfn_base __ro_after_init;
EXPORT_SYMBOL(riscv_pfn_base);
static pte_t fixmap_pte[PTRS_PER_PTE] __page_aligned_bss;
pgd_t early_pg_dir[PTRS_PER_PGD] __initdata __aligned(PAGE_SIZE);
+static pud_t __maybe_unused early_dtb_pud[PTRS_PER_PUD] __initdata __aligned(PAGE_SIZE);
static pmd_t __maybe_unused early_dtb_pmd[PTRS_PER_PMD] __initdata __aligned(PAGE_SIZE);
#ifdef CONFIG_XIP_KERNEL
+#define pt_ops (*(struct pt_alloc_ops *)XIP_FIXUP(&pt_ops))
#define trampoline_pg_dir ((pgd_t *)XIP_FIXUP(trampoline_pg_dir))
#define fixmap_pte ((pte_t *)XIP_FIXUP(fixmap_pte))
#define early_pg_dir ((pgd_t *)XIP_FIXUP(early_pg_dir))
#define early_pmd ((pmd_t *)XIP_FIXUP(early_pmd))
#endif /* CONFIG_XIP_KERNEL */
+static pud_t trampoline_pud[PTRS_PER_PUD] __page_aligned_bss;
+static pud_t fixmap_pud[PTRS_PER_PUD] __page_aligned_bss;
+static pud_t early_pud[PTRS_PER_PUD] __initdata __aligned(PAGE_SIZE);
+
+#ifdef CONFIG_XIP_KERNEL
+#define trampoline_pud ((pud_t *)XIP_FIXUP(trampoline_pud))
+#define fixmap_pud ((pud_t *)XIP_FIXUP(fixmap_pud))
+#define early_pud ((pud_t *)XIP_FIXUP(early_pud))
+#endif /* CONFIG_XIP_KERNEL */
+
static pmd_t *__init get_pmd_virt_early(phys_addr_t pa)
{
/* Before MMU is enabled */
static phys_addr_t __init alloc_pmd_early(uintptr_t va)
{
- BUG_ON((va - kernel_map.virt_addr) >> PGDIR_SHIFT);
+ BUG_ON((va - kernel_map.virt_addr) >> PUD_SHIFT);
return (uintptr_t)early_pmd;
}
unsigned long vaddr;
vaddr = __get_free_page(GFP_KERNEL);
- BUG_ON(!vaddr);
+ BUG_ON(!vaddr || !pgtable_pmd_page_ctor(virt_to_page(vaddr)));
+
return __pa(vaddr);
}
create_pte_mapping(ptep, va, pa, sz, prot);
}
-#define pgd_next_t pmd_t
-#define alloc_pgd_next(__va) pt_ops.alloc_pmd(__va)
-#define get_pgd_next_virt(__pa) pt_ops.get_pmd_virt(__pa)
+static pud_t *__init get_pud_virt_early(phys_addr_t pa)
+{
+ return (pud_t *)((uintptr_t)pa);
+}
+
+static pud_t *__init get_pud_virt_fixmap(phys_addr_t pa)
+{
+ clear_fixmap(FIX_PUD);
+ return (pud_t *)set_fixmap_offset(FIX_PUD, pa);
+}
+
+static pud_t *__init get_pud_virt_late(phys_addr_t pa)
+{
+ return (pud_t *)__va(pa);
+}
+
+static phys_addr_t __init alloc_pud_early(uintptr_t va)
+{
+ /* Only one PUD is available for early mapping */
+ BUG_ON((va - kernel_map.virt_addr) >> PGDIR_SHIFT);
+
+ return (uintptr_t)early_pud;
+}
+
+static phys_addr_t __init alloc_pud_fixmap(uintptr_t va)
+{
+ return memblock_phys_alloc(PAGE_SIZE, PAGE_SIZE);
+}
+
+static phys_addr_t alloc_pud_late(uintptr_t va)
+{
+ unsigned long vaddr;
+
+ vaddr = __get_free_page(GFP_KERNEL);
+ BUG_ON(!vaddr);
+ return __pa(vaddr);
+}
+
+static void __init create_pud_mapping(pud_t *pudp,
+ uintptr_t va, phys_addr_t pa,
+ phys_addr_t sz, pgprot_t prot)
+{
+ pmd_t *nextp;
+ phys_addr_t next_phys;
+ uintptr_t pud_index = pud_index(va);
+
+ if (sz == PUD_SIZE) {
+ if (pud_val(pudp[pud_index]) == 0)
+ pudp[pud_index] = pfn_pud(PFN_DOWN(pa), prot);
+ return;
+ }
+
+ if (pud_val(pudp[pud_index]) == 0) {
+ next_phys = pt_ops.alloc_pmd(va);
+ pudp[pud_index] = pfn_pud(PFN_DOWN(next_phys), PAGE_TABLE);
+ nextp = pt_ops.get_pmd_virt(next_phys);
+ memset(nextp, 0, PAGE_SIZE);
+ } else {
+ next_phys = PFN_PHYS(_pud_pfn(pudp[pud_index]));
+ nextp = pt_ops.get_pmd_virt(next_phys);
+ }
+
+ create_pmd_mapping(nextp, va, pa, sz, prot);
+}
+
+#define pgd_next_t pud_t
+#define alloc_pgd_next(__va) (pgtable_l4_enabled ? \
+ pt_ops.alloc_pud(__va) : pt_ops.alloc_pmd(__va))
+#define get_pgd_next_virt(__pa) (pgtable_l4_enabled ? \
+ pt_ops.get_pud_virt(__pa) : (pgd_next_t *)pt_ops.get_pmd_virt(__pa))
#define create_pgd_next_mapping(__nextp, __va, __pa, __sz, __prot) \
- create_pmd_mapping(__nextp, __va, __pa, __sz, __prot)
-#define fixmap_pgd_next fixmap_pmd
+ (pgtable_l4_enabled ? \
+ create_pud_mapping(__nextp, __va, __pa, __sz, __prot) : \
+ create_pmd_mapping((pmd_t *)__nextp, __va, __pa, __sz, __prot))
+#define fixmap_pgd_next (pgtable_l4_enabled ? \
+ (uintptr_t)fixmap_pud : (uintptr_t)fixmap_pmd)
+#define trampoline_pgd_next (pgtable_l4_enabled ? \
+ (uintptr_t)trampoline_pud : (uintptr_t)trampoline_pmd)
+#define early_dtb_pgd_next (pgtable_l4_enabled ? \
+ (uintptr_t)early_dtb_pud : (uintptr_t)early_dtb_pmd)
#else
#define pgd_next_t pte_t
#define alloc_pgd_next(__va) pt_ops.alloc_pte(__va)
#define get_pgd_next_virt(__pa) pt_ops.get_pte_virt(__pa)
#define create_pgd_next_mapping(__nextp, __va, __pa, __sz, __prot) \
create_pte_mapping(__nextp, __va, __pa, __sz, __prot)
-#define fixmap_pgd_next fixmap_pte
+#define fixmap_pgd_next ((uintptr_t)fixmap_pte)
+#define early_dtb_pgd_next ((uintptr_t)early_dtb_pmd)
+#define create_pud_mapping(__pmdp, __va, __pa, __sz, __prot)
#define create_pmd_mapping(__pmdp, __va, __pa, __sz, __prot)
-#endif
+#endif /* __PAGETABLE_PMD_FOLDED */
void __init create_pgd_mapping(pgd_t *pgdp,
uintptr_t va, phys_addr_t pa,
}
#ifdef CONFIG_XIP_KERNEL
+extern char _xiprom[], _exiprom[], __data_loc;
+
/* called from head.S with MMU off */
asmlinkage void __init __copy_data(void)
{
}
#endif /* CONFIG_STRICT_KERNEL_RWX */
+#ifdef CONFIG_64BIT
+static void __init disable_pgtable_l4(void)
+{
+ pgtable_l4_enabled = false;
+ kernel_map.page_offset = PAGE_OFFSET_L3;
+ satp_mode = SATP_MODE_39;
+}
+
+/*
+ * There is a simple way to determine if 4-level is supported by the
+ * underlying hardware: establish 1:1 mapping in 4-level page table mode
+ * then read SATP to see if the configuration was taken into account
+ * meaning sv48 is supported.
+ */
+static __init void set_satp_mode(void)
+{
+ u64 identity_satp, hw_satp;
+ uintptr_t set_satp_mode_pmd;
+
+ set_satp_mode_pmd = ((unsigned long)set_satp_mode) & PMD_MASK;
+ create_pgd_mapping(early_pg_dir,
+ set_satp_mode_pmd, (uintptr_t)early_pud,
+ PGDIR_SIZE, PAGE_TABLE);
+ create_pud_mapping(early_pud,
+ set_satp_mode_pmd, (uintptr_t)early_pmd,
+ PUD_SIZE, PAGE_TABLE);
+ /* Handle the case where set_satp_mode straddles 2 PMDs */
+ create_pmd_mapping(early_pmd,
+ set_satp_mode_pmd, set_satp_mode_pmd,
+ PMD_SIZE, PAGE_KERNEL_EXEC);
+ create_pmd_mapping(early_pmd,
+ set_satp_mode_pmd + PMD_SIZE,
+ set_satp_mode_pmd + PMD_SIZE,
+ PMD_SIZE, PAGE_KERNEL_EXEC);
+
+ identity_satp = PFN_DOWN((uintptr_t)&early_pg_dir) | satp_mode;
+
+ local_flush_tlb_all();
+ csr_write(CSR_SATP, identity_satp);
+ hw_satp = csr_swap(CSR_SATP, 0ULL);
+ local_flush_tlb_all();
+
+ if (hw_satp != identity_satp)
+ disable_pgtable_l4();
+
+ memset(early_pg_dir, 0, PAGE_SIZE);
+ memset(early_pud, 0, PAGE_SIZE);
+ memset(early_pmd, 0, PAGE_SIZE);
+}
+#endif
+
/*
* setup_vm() is called from head.S with MMU-off.
*
uintptr_t pa = dtb_pa & ~(PMD_SIZE - 1);
create_pgd_mapping(early_pg_dir, DTB_EARLY_BASE_VA,
- IS_ENABLED(CONFIG_64BIT) ? (uintptr_t)early_dtb_pmd : pa,
+ IS_ENABLED(CONFIG_64BIT) ? early_dtb_pgd_next : pa,
PGDIR_SIZE,
IS_ENABLED(CONFIG_64BIT) ? PAGE_TABLE : PAGE_KERNEL);
+ if (pgtable_l4_enabled) {
+ create_pud_mapping(early_dtb_pud, DTB_EARLY_BASE_VA,
+ (uintptr_t)early_dtb_pmd, PUD_SIZE, PAGE_TABLE);
+ }
+
if (IS_ENABLED(CONFIG_64BIT)) {
create_pmd_mapping(early_dtb_pmd, DTB_EARLY_BASE_VA,
pa, PMD_SIZE, PAGE_KERNEL);
dtb_early_pa = dtb_pa;
}
+/*
+ * MMU is not enabled, the page tables are allocated directly using
+ * early_pmd/pud/p4d and the address returned is the physical one.
+ */
+void __init pt_ops_set_early(void)
+{
+ pt_ops.alloc_pte = alloc_pte_early;
+ pt_ops.get_pte_virt = get_pte_virt_early;
+#ifndef __PAGETABLE_PMD_FOLDED
+ pt_ops.alloc_pmd = alloc_pmd_early;
+ pt_ops.get_pmd_virt = get_pmd_virt_early;
+ pt_ops.alloc_pud = alloc_pud_early;
+ pt_ops.get_pud_virt = get_pud_virt_early;
+#endif
+}
+
+/*
+ * MMU is enabled but page table setup is not complete yet.
+ * fixmap page table alloc functions must be used as a means to temporarily
+ * map the allocated physical pages since the linear mapping does not exist yet.
+ *
+ * Note that this is called with MMU disabled, hence kernel_mapping_pa_to_va,
+ * but it will be used as described above.
+ */
+void __init pt_ops_set_fixmap(void)
+{
+ pt_ops.alloc_pte = kernel_mapping_pa_to_va((uintptr_t)alloc_pte_fixmap);
+ pt_ops.get_pte_virt = kernel_mapping_pa_to_va((uintptr_t)get_pte_virt_fixmap);
+#ifndef __PAGETABLE_PMD_FOLDED
+ pt_ops.alloc_pmd = kernel_mapping_pa_to_va((uintptr_t)alloc_pmd_fixmap);
+ pt_ops.get_pmd_virt = kernel_mapping_pa_to_va((uintptr_t)get_pmd_virt_fixmap);
+ pt_ops.alloc_pud = kernel_mapping_pa_to_va((uintptr_t)alloc_pud_fixmap);
+ pt_ops.get_pud_virt = kernel_mapping_pa_to_va((uintptr_t)get_pud_virt_fixmap);
+#endif
+}
+
+/*
+ * MMU is enabled and page table setup is complete, so from now, we can use
+ * generic page allocation functions to setup page table.
+ */
+void __init pt_ops_set_late(void)
+{
+ pt_ops.alloc_pte = alloc_pte_late;
+ pt_ops.get_pte_virt = get_pte_virt_late;
+#ifndef __PAGETABLE_PMD_FOLDED
+ pt_ops.alloc_pmd = alloc_pmd_late;
+ pt_ops.get_pmd_virt = get_pmd_virt_late;
+ pt_ops.alloc_pud = alloc_pud_late;
+ pt_ops.get_pud_virt = get_pud_virt_late;
+#endif
+}
+
asmlinkage void __init setup_vm(uintptr_t dtb_pa)
{
pmd_t __maybe_unused fix_bmap_spmd, fix_bmap_epmd;
kernel_map.virt_addr = KERNEL_LINK_ADDR;
+ kernel_map.page_offset = _AC(CONFIG_PAGE_OFFSET, UL);
#ifdef CONFIG_XIP_KERNEL
kernel_map.xiprom = (uintptr_t)CONFIG_XIP_PHYS_ADDR;
kernel_map.phys_addr = (uintptr_t)(&_start);
kernel_map.size = (uintptr_t)(&_end) - kernel_map.phys_addr;
#endif
+
+#if defined(CONFIG_64BIT) && !defined(CONFIG_XIP_KERNEL)
+ set_satp_mode();
+#endif
+
kernel_map.va_pa_offset = PAGE_OFFSET - kernel_map.phys_addr;
kernel_map.va_kernel_pa_offset = kernel_map.virt_addr - kernel_map.phys_addr;
riscv_pfn_base = PFN_DOWN(kernel_map.phys_addr);
+ /*
+ * The default maximal physical memory size is KERN_VIRT_SIZE for 32-bit
+ * kernel, whereas for 64-bit kernel, the end of the virtual address
+ * space is occupied by the modules/BPF/kernel mappings which reduces
+ * the available size of the linear mapping.
+ */
+ memory_limit = KERN_VIRT_SIZE - (IS_ENABLED(CONFIG_64BIT) ? SZ_4G : 0);
+
/* Sanity check alignment and size */
BUG_ON((PAGE_OFFSET % PGDIR_SIZE) != 0);
BUG_ON((kernel_map.phys_addr % PMD_SIZE) != 0);
BUG_ON((kernel_map.virt_addr + kernel_map.size) > ADDRESS_SPACE_END - SZ_4K);
#endif
- pt_ops.alloc_pte = alloc_pte_early;
- pt_ops.get_pte_virt = get_pte_virt_early;
-#ifndef __PAGETABLE_PMD_FOLDED
- pt_ops.alloc_pmd = alloc_pmd_early;
- pt_ops.get_pmd_virt = get_pmd_virt_early;
-#endif
+ pt_ops_set_early();
+
/* Setup early PGD for fixmap */
create_pgd_mapping(early_pg_dir, FIXADDR_START,
- (uintptr_t)fixmap_pgd_next, PGDIR_SIZE, PAGE_TABLE);
+ fixmap_pgd_next, PGDIR_SIZE, PAGE_TABLE);
#ifndef __PAGETABLE_PMD_FOLDED
- /* Setup fixmap PMD */
+ /* Setup fixmap PUD and PMD */
+ if (pgtable_l4_enabled)
+ create_pud_mapping(fixmap_pud, FIXADDR_START,
+ (uintptr_t)fixmap_pmd, PUD_SIZE, PAGE_TABLE);
create_pmd_mapping(fixmap_pmd, FIXADDR_START,
(uintptr_t)fixmap_pte, PMD_SIZE, PAGE_TABLE);
/* Setup trampoline PGD and PMD */
create_pgd_mapping(trampoline_pg_dir, kernel_map.virt_addr,
- (uintptr_t)trampoline_pmd, PGDIR_SIZE, PAGE_TABLE);
+ trampoline_pgd_next, PGDIR_SIZE, PAGE_TABLE);
+ if (pgtable_l4_enabled)
+ create_pud_mapping(trampoline_pud, kernel_map.virt_addr,
+ (uintptr_t)trampoline_pmd, PUD_SIZE, PAGE_TABLE);
#ifdef CONFIG_XIP_KERNEL
create_pmd_mapping(trampoline_pmd, kernel_map.virt_addr,
kernel_map.xiprom, PMD_SIZE, PAGE_KERNEL_EXEC);
* Bootime fixmap only can handle PMD_SIZE mapping. Thus, boot-ioremap
* range can not span multiple pmds.
*/
- BUILD_BUG_ON((__fix_to_virt(FIX_BTMAP_BEGIN) >> PMD_SHIFT)
+ BUG_ON((__fix_to_virt(FIX_BTMAP_BEGIN) >> PMD_SHIFT)
!= (__fix_to_virt(FIX_BTMAP_END) >> PMD_SHIFT));
#ifndef __PAGETABLE_PMD_FOLDED
pr_warn("FIX_BTMAP_BEGIN: %d\n", FIX_BTMAP_BEGIN);
}
#endif
+
+ pt_ops_set_fixmap();
}
static void __init setup_vm_final(void)
phys_addr_t pa, start, end;
u64 i;
- /**
- * MMU is enabled at this point. But page table setup is not complete yet.
- * fixmap page table alloc functions should be used at this point
- */
- pt_ops.alloc_pte = alloc_pte_fixmap;
- pt_ops.get_pte_virt = get_pte_virt_fixmap;
-#ifndef __PAGETABLE_PMD_FOLDED
- pt_ops.alloc_pmd = alloc_pmd_fixmap;
- pt_ops.get_pmd_virt = get_pmd_virt_fixmap;
-#endif
/* Setup swapper PGD for fixmap */
create_pgd_mapping(swapper_pg_dir, FIXADDR_START,
__pa_symbol(fixmap_pgd_next),
}
}
-#ifdef CONFIG_64BIT
/* Map the kernel */
- create_kernel_page_table(swapper_pg_dir, false);
+ if (IS_ENABLED(CONFIG_64BIT))
+ create_kernel_page_table(swapper_pg_dir, false);
+
+#ifdef CONFIG_KASAN
+ kasan_swapper_init();
#endif
/* Clear fixmap PTE and PMD mappings */
clear_fixmap(FIX_PTE);
clear_fixmap(FIX_PMD);
+ clear_fixmap(FIX_PUD);
/* Move to swapper page table */
- csr_write(CSR_SATP, PFN_DOWN(__pa_symbol(swapper_pg_dir)) | SATP_MODE);
+ csr_write(CSR_SATP, PFN_DOWN(__pa_symbol(swapper_pg_dir)) | satp_mode);
local_flush_tlb_all();
- /* generic page allocation functions must be used to setup page table */
- pt_ops.alloc_pte = alloc_pte_late;
- pt_ops.get_pte_virt = get_pte_virt_late;
-#ifndef __PAGETABLE_PMD_FOLDED
- pt_ops.alloc_pmd = alloc_pmd_late;
- pt_ops.get_pmd_virt = get_pmd_virt_late;
-#endif
+ pt_ops_set_late();
}
#else
asmlinkage void __init setup_vm(uintptr_t dtb_pa)
* since it doesn't make much sense and we have limited memory
* resources.
*/
-#ifdef CONFIG_CRASH_DUMP
if (is_kdump_kernel()) {
pr_info("crashkernel: ignoring reservation request\n");
return;
}
-#endif
ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
&crash_size, &crash_base);
/*
* Current riscv boot protocol requires 2MB alignment for
* RV64 and 4MB alignment for RV32 (hugepage size)
+ *
+ * Try to alloc from 32bit addressible physical memory so that
+ * swiotlb can work on the crash kernel.
*/
crash_base = memblock_phys_alloc_range(crash_size, PMD_SIZE,
- search_start, search_end);
+ search_start,
+ min(search_end, (unsigned long) SZ_4G));
if (crash_base == 0) {
- pr_warn("crashkernel: couldn't allocate %lldKB\n",
- crash_size >> 10);
- return;
+ /* Try again without restricting region to 32bit addressible memory */
+ crash_base = memblock_phys_alloc_range(crash_size, PMD_SIZE,
+ search_start, search_end);
+ if (crash_base == 0) {
+ pr_warn("crashkernel: couldn't allocate %lldKB\n",
+ crash_size >> 10);
+ return;
+ }
}
pr_info("crashkernel: reserved 0x%016llx - 0x%016llx (%lld MB)\n",
#include <asm/fixmap.h>
#include <asm/pgalloc.h>
-extern pgd_t early_pg_dir[PTRS_PER_PGD];
-asmlinkage void __init kasan_early_init(void)
-{
- uintptr_t i;
- pgd_t *pgd = early_pg_dir + pgd_index(KASAN_SHADOW_START);
-
- BUILD_BUG_ON(KASAN_SHADOW_OFFSET !=
- KASAN_SHADOW_END - (1UL << (64 - KASAN_SHADOW_SCALE_SHIFT)));
-
- for (i = 0; i < PTRS_PER_PTE; ++i)
- set_pte(kasan_early_shadow_pte + i,
- mk_pte(virt_to_page(kasan_early_shadow_page),
- PAGE_KERNEL));
-
- for (i = 0; i < PTRS_PER_PMD; ++i)
- set_pmd(kasan_early_shadow_pmd + i,
- pfn_pmd(PFN_DOWN
- (__pa((uintptr_t) kasan_early_shadow_pte)),
- __pgprot(_PAGE_TABLE)));
-
- for (i = KASAN_SHADOW_START; i < KASAN_SHADOW_END;
- i += PGDIR_SIZE, ++pgd)
- set_pgd(pgd,
- pfn_pgd(PFN_DOWN
- (__pa(((uintptr_t) kasan_early_shadow_pmd))),
- __pgprot(_PAGE_TABLE)));
-
- /* init for swapper_pg_dir */
- pgd = pgd_offset_k(KASAN_SHADOW_START);
-
- for (i = KASAN_SHADOW_START; i < KASAN_SHADOW_END;
- i += PGDIR_SIZE, ++pgd)
- set_pgd(pgd,
- pfn_pgd(PFN_DOWN
- (__pa(((uintptr_t) kasan_early_shadow_pmd))),
- __pgprot(_PAGE_TABLE)));
+/*
+ * Kasan shadow region must lie at a fixed address across sv39, sv48 and sv57
+ * which is right before the kernel.
+ *
+ * For sv39, the region is aligned on PGDIR_SIZE so we only need to populate
+ * the page global directory with kasan_early_shadow_pmd.
+ *
+ * For sv48 and sv57, the region is not aligned on PGDIR_SIZE so the mapping
+ * must be divided as follows:
+ * - the first PGD entry, although incomplete, is populated with
+ * kasan_early_shadow_pud/p4d
+ * - the PGD entries in the middle are populated with kasan_early_shadow_pud/p4d
+ * - the last PGD entry is shared with the kernel mapping so populated at the
+ * lower levels pud/p4d
+ *
+ * In addition, when shallow populating a kasan region (for example vmalloc),
+ * this region may also not be aligned on PGDIR size, so we must go down to the
+ * pud level too.
+ */
- local_flush_tlb_all();
-}
+extern pgd_t early_pg_dir[PTRS_PER_PGD];
static void __init kasan_populate_pte(pmd_t *pmd, unsigned long vaddr, unsigned long end)
{
set_pmd(pmd, pfn_pmd(PFN_DOWN(__pa(base_pte)), PAGE_TABLE));
}
-static void __init kasan_populate_pmd(pgd_t *pgd, unsigned long vaddr, unsigned long end)
+static void __init kasan_populate_pmd(pud_t *pud, unsigned long vaddr, unsigned long end)
{
phys_addr_t phys_addr;
pmd_t *pmdp, *base_pmd;
unsigned long next;
- base_pmd = (pmd_t *)pgd_page_vaddr(*pgd);
- if (base_pmd == lm_alias(kasan_early_shadow_pmd))
+ if (pud_none(*pud)) {
base_pmd = memblock_alloc(PTRS_PER_PMD * sizeof(pmd_t), PAGE_SIZE);
+ } else {
+ base_pmd = (pmd_t *)pud_pgtable(*pud);
+ if (base_pmd == lm_alias(kasan_early_shadow_pmd))
+ base_pmd = memblock_alloc(PTRS_PER_PMD * sizeof(pmd_t), PAGE_SIZE);
+ }
pmdp = base_pmd + pmd_index(vaddr);
* it entirely, memblock could allocate a page at a physical address
* where KASAN is not populated yet and then we'd get a page fault.
*/
- set_pgd(pgd, pfn_pgd(PFN_DOWN(__pa(base_pmd)), PAGE_TABLE));
+ set_pud(pud, pfn_pud(PFN_DOWN(__pa(base_pmd)), PAGE_TABLE));
+}
+
+static void __init kasan_populate_pud(pgd_t *pgd,
+ unsigned long vaddr, unsigned long end,
+ bool early)
+{
+ phys_addr_t phys_addr;
+ pud_t *pudp, *base_pud;
+ unsigned long next;
+
+ if (early) {
+ /*
+ * We can't use pgd_page_vaddr here as it would return a linear
+ * mapping address but it is not mapped yet, but when populating
+ * early_pg_dir, we need the physical address and when populating
+ * swapper_pg_dir, we need the kernel virtual address so use
+ * pt_ops facility.
+ */
+ base_pud = pt_ops.get_pud_virt(pfn_to_phys(_pgd_pfn(*pgd)));
+ } else {
+ base_pud = (pud_t *)pgd_page_vaddr(*pgd);
+ if (base_pud == lm_alias(kasan_early_shadow_pud))
+ base_pud = memblock_alloc(PTRS_PER_PUD * sizeof(pud_t), PAGE_SIZE);
+ }
+
+ pudp = base_pud + pud_index(vaddr);
+
+ do {
+ next = pud_addr_end(vaddr, end);
+
+ if (pud_none(*pudp) && IS_ALIGNED(vaddr, PUD_SIZE) && (next - vaddr) >= PUD_SIZE) {
+ if (early) {
+ phys_addr = __pa(((uintptr_t)kasan_early_shadow_pmd));
+ set_pud(pudp, pfn_pud(PFN_DOWN(phys_addr), PAGE_TABLE));
+ continue;
+ } else {
+ phys_addr = memblock_phys_alloc(PUD_SIZE, PUD_SIZE);
+ if (phys_addr) {
+ set_pud(pudp, pfn_pud(PFN_DOWN(phys_addr), PAGE_KERNEL));
+ continue;
+ }
+ }
+ }
+
+ kasan_populate_pmd(pudp, vaddr, next);
+ } while (pudp++, vaddr = next, vaddr != end);
+
+ /*
+ * Wait for the whole PGD to be populated before setting the PGD in
+ * the page table, otherwise, if we did set the PGD before populating
+ * it entirely, memblock could allocate a page at a physical address
+ * where KASAN is not populated yet and then we'd get a page fault.
+ */
+ if (!early)
+ set_pgd(pgd, pfn_pgd(PFN_DOWN(__pa(base_pud)), PAGE_TABLE));
}
-static void __init kasan_populate_pgd(unsigned long vaddr, unsigned long end)
+#define kasan_early_shadow_pgd_next (pgtable_l4_enabled ? \
+ (uintptr_t)kasan_early_shadow_pud : \
+ (uintptr_t)kasan_early_shadow_pmd)
+#define kasan_populate_pgd_next(pgdp, vaddr, next, early) \
+ (pgtable_l4_enabled ? \
+ kasan_populate_pud(pgdp, vaddr, next, early) : \
+ kasan_populate_pmd((pud_t *)pgdp, vaddr, next))
+
+static void __init kasan_populate_pgd(pgd_t *pgdp,
+ unsigned long vaddr, unsigned long end,
+ bool early)
{
phys_addr_t phys_addr;
- pgd_t *pgdp = pgd_offset_k(vaddr);
unsigned long next;
do {
next = pgd_addr_end(vaddr, end);
- /*
- * pgdp can't be none since kasan_early_init initialized all KASAN
- * shadow region with kasan_early_shadow_pmd: if this is stillthe case,
- * that means we can try to allocate a hugepage as a replacement.
- */
- if (pgd_page_vaddr(*pgdp) == (unsigned long)lm_alias(kasan_early_shadow_pmd) &&
- IS_ALIGNED(vaddr, PGDIR_SIZE) && (next - vaddr) >= PGDIR_SIZE) {
- phys_addr = memblock_phys_alloc(PGDIR_SIZE, PGDIR_SIZE);
- if (phys_addr) {
- set_pgd(pgdp, pfn_pgd(PFN_DOWN(phys_addr), PAGE_KERNEL));
+ if (IS_ALIGNED(vaddr, PGDIR_SIZE) && (next - vaddr) >= PGDIR_SIZE) {
+ if (early) {
+ phys_addr = __pa((uintptr_t)kasan_early_shadow_pgd_next);
+ set_pgd(pgdp, pfn_pgd(PFN_DOWN(phys_addr), PAGE_TABLE));
continue;
+ } else if (pgd_page_vaddr(*pgdp) ==
+ (unsigned long)lm_alias(kasan_early_shadow_pgd_next)) {
+ /*
+ * pgdp can't be none since kasan_early_init
+ * initialized all KASAN shadow region with
+ * kasan_early_shadow_pud: if this is still the
+ * case, that means we can try to allocate a
+ * hugepage as a replacement.
+ */
+ phys_addr = memblock_phys_alloc(PGDIR_SIZE, PGDIR_SIZE);
+ if (phys_addr) {
+ set_pgd(pgdp, pfn_pgd(PFN_DOWN(phys_addr), PAGE_KERNEL));
+ continue;
+ }
}
}
- kasan_populate_pmd(pgdp, vaddr, next);
+ kasan_populate_pgd_next(pgdp, vaddr, next, early);
} while (pgdp++, vaddr = next, vaddr != end);
}
+asmlinkage void __init kasan_early_init(void)
+{
+ uintptr_t i;
+
+ BUILD_BUG_ON(KASAN_SHADOW_OFFSET !=
+ KASAN_SHADOW_END - (1UL << (64 - KASAN_SHADOW_SCALE_SHIFT)));
+
+ for (i = 0; i < PTRS_PER_PTE; ++i)
+ set_pte(kasan_early_shadow_pte + i,
+ mk_pte(virt_to_page(kasan_early_shadow_page),
+ PAGE_KERNEL));
+
+ for (i = 0; i < PTRS_PER_PMD; ++i)
+ set_pmd(kasan_early_shadow_pmd + i,
+ pfn_pmd(PFN_DOWN
+ (__pa((uintptr_t)kasan_early_shadow_pte)),
+ PAGE_TABLE));
+
+ if (pgtable_l4_enabled) {
+ for (i = 0; i < PTRS_PER_PUD; ++i)
+ set_pud(kasan_early_shadow_pud + i,
+ pfn_pud(PFN_DOWN
+ (__pa(((uintptr_t)kasan_early_shadow_pmd))),
+ PAGE_TABLE));
+ }
+
+ kasan_populate_pgd(early_pg_dir + pgd_index(KASAN_SHADOW_START),
+ KASAN_SHADOW_START, KASAN_SHADOW_END, true);
+
+ local_flush_tlb_all();
+}
+
+void __init kasan_swapper_init(void)
+{
+ kasan_populate_pgd(pgd_offset_k(KASAN_SHADOW_START),
+ KASAN_SHADOW_START, KASAN_SHADOW_END, true);
+
+ local_flush_tlb_all();
+}
+
static void __init kasan_populate(void *start, void *end)
{
unsigned long vaddr = (unsigned long)start & PAGE_MASK;
unsigned long vend = PAGE_ALIGN((unsigned long)end);
- kasan_populate_pgd(vaddr, vend);
+ kasan_populate_pgd(pgd_offset_k(vaddr), vaddr, vend, false);
local_flush_tlb_all();
memset(start, KASAN_SHADOW_INIT, end - start);
}
+static void __init kasan_shallow_populate_pud(pgd_t *pgdp,
+ unsigned long vaddr, unsigned long end,
+ bool kasan_populate)
+{
+ unsigned long next;
+ pud_t *pudp, *base_pud;
+ pmd_t *base_pmd;
+ bool is_kasan_pmd;
+
+ base_pud = (pud_t *)pgd_page_vaddr(*pgdp);
+ pudp = base_pud + pud_index(vaddr);
+
+ if (kasan_populate)
+ memcpy(base_pud, (void *)kasan_early_shadow_pgd_next,
+ sizeof(pud_t) * PTRS_PER_PUD);
+
+ do {
+ next = pud_addr_end(vaddr, end);
+ is_kasan_pmd = (pud_pgtable(*pudp) == lm_alias(kasan_early_shadow_pmd));
+
+ if (is_kasan_pmd) {
+ base_pmd = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
+ set_pud(pudp, pfn_pud(PFN_DOWN(__pa(base_pmd)), PAGE_TABLE));
+ }
+ } while (pudp++, vaddr = next, vaddr != end);
+}
+
static void __init kasan_shallow_populate_pgd(unsigned long vaddr, unsigned long end)
{
unsigned long next;
void *p;
pgd_t *pgd_k = pgd_offset_k(vaddr);
+ bool is_kasan_pgd_next;
do {
next = pgd_addr_end(vaddr, end);
- if (pgd_page_vaddr(*pgd_k) == (unsigned long)lm_alias(kasan_early_shadow_pmd)) {
+ is_kasan_pgd_next = (pgd_page_vaddr(*pgd_k) ==
+ (unsigned long)lm_alias(kasan_early_shadow_pgd_next));
+
+ if (is_kasan_pgd_next) {
p = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
set_pgd(pgd_k, pfn_pgd(PFN_DOWN(__pa(p)), PAGE_TABLE));
}
+
+ if (IS_ALIGNED(vaddr, PGDIR_SIZE) && (next - vaddr) >= PGDIR_SIZE)
+ continue;
+
+ kasan_shallow_populate_pud(pgd_k, vaddr, next, is_kasan_pgd_next);
} while (pgd_k++, vaddr = next, vaddr != end);
}
unsigned long size, unsigned long stride)
{
struct cpumask *cmask = mm_cpumask(mm);
- struct cpumask hmask;
unsigned int cpuid;
bool broadcast;
unsigned long asid = atomic_long_read(&mm->context.id);
if (broadcast) {
- riscv_cpuid_to_hartid_mask(cmask, &hmask);
- sbi_remote_sfence_vma_asid(cpumask_bits(&hmask),
- start, size, asid);
+ sbi_remote_sfence_vma_asid(cmask, start, size, asid);
} else if (size <= stride) {
local_flush_tlb_page_asid(start, asid);
} else {
}
} else {
if (broadcast) {
- riscv_cpuid_to_hartid_mask(cmask, &hmask);
- sbi_remote_sfence_vma(cpumask_bits(&hmask),
- start, size);
+ sbi_remote_sfence_vma(cmask, start, size);
} else if (size <= stride) {
local_flush_tlb_page(start);
} else {
#define BPF_FIXUP_OFFSET_MASK GENMASK(26, 0)
#define BPF_FIXUP_REG_MASK GENMASK(31, 27)
-int rv_bpf_fixup_exception(const struct exception_table_entry *ex,
- struct pt_regs *regs);
-int rv_bpf_fixup_exception(const struct exception_table_entry *ex,
- struct pt_regs *regs)
+bool ex_handler_bpf(const struct exception_table_entry *ex,
+ struct pt_regs *regs)
{
off_t offset = FIELD_GET(BPF_FIXUP_OFFSET_MASK, ex->fixup);
int regs_offset = FIELD_GET(BPF_FIXUP_REG_MASK, ex->fixup);
*(unsigned long *)((void *)regs + pt_regmap[regs_offset]) = 0;
regs->epc = (unsigned long)&ex->fixup - offset;
- return 1;
+ return true;
}
/* For accesses to BTF pointers, add an entry to the exception table */
offset = pc - (long)&ex->insn;
if (WARN_ON_ONCE(offset >= 0 || offset < INT_MIN))
return -ERANGE;
- ex->insn = pc;
+ ex->insn = offset;
/*
* Since the extable follows the program, the fixup offset is always
ex->fixup = FIELD_PREP(BPF_FIXUP_OFFSET_MASK, offset) |
FIELD_PREP(BPF_FIXUP_REG_MASK, dst_reg);
+ ex->type = EX_TYPE_BPF;
ctx->nexentries++;
return 0;
select GENERIC_CPU_AUTOPROBE
select GENERIC_CPU_VULNERABILITIES
select GENERIC_ENTRY
- select GENERIC_FIND_FIRST_BIT
select GENERIC_GETTIMEOFDAY
select GENERIC_PTDUMP
select GENERIC_SMP_IDLE_THREAD
$(obj)/vmlinux.bin: vmlinux FORCE
$(call if_changed,objcopy)
-vmlinux.bin.all-y := $(obj)/vmlinux.bin
-
suffix-$(CONFIG_KERNEL_GZIP) := .gz
suffix-$(CONFIG_KERNEL_BZIP2) := .bz2
suffix-$(CONFIG_KERNEL_LZ4) := .lz4
suffix-$(CONFIG_KERNEL_XZ) := .xz
suffix-$(CONFIG_KERNEL_ZSTD) := .zst
-$(obj)/vmlinux.bin.gz: $(vmlinux.bin.all-y) FORCE
+$(obj)/vmlinux.bin.gz: $(obj)/vmlinux.bin FORCE
$(call if_changed,gzip)
-$(obj)/vmlinux.bin.bz2: $(vmlinux.bin.all-y) FORCE
- $(call if_changed,bzip2)
-$(obj)/vmlinux.bin.lz4: $(vmlinux.bin.all-y) FORCE
- $(call if_changed,lz4)
-$(obj)/vmlinux.bin.lzma: $(vmlinux.bin.all-y) FORCE
- $(call if_changed,lzma)
-$(obj)/vmlinux.bin.lzo: $(vmlinux.bin.all-y) FORCE
- $(call if_changed,lzo)
-$(obj)/vmlinux.bin.xz: $(vmlinux.bin.all-y) FORCE
- $(call if_changed,xzkern)
-$(obj)/vmlinux.bin.zst: $(vmlinux.bin.all-y) FORCE
- $(call if_changed,zstd22)
+$(obj)/vmlinux.bin.bz2: $(obj)/vmlinux.bin FORCE
+ $(call if_changed,bzip2_with_size)
+$(obj)/vmlinux.bin.lz4: $(obj)/vmlinux.bin FORCE
+ $(call if_changed,lz4_with_size)
+$(obj)/vmlinux.bin.lzma: $(obj)/vmlinux.bin FORCE
+ $(call if_changed,lzma_with_size)
+$(obj)/vmlinux.bin.lzo: $(obj)/vmlinux.bin FORCE
+ $(call if_changed,lzo_with_size)
+$(obj)/vmlinux.bin.xz: $(obj)/vmlinux.bin FORCE
+ $(call if_changed,xzkern_with_size)
+$(obj)/vmlinux.bin.zst: $(obj)/vmlinux.bin FORCE
+ $(call if_changed,zstd22_with_size)
OBJCOPYFLAGS_piggy.o := -I binary -O elf64-s390 -B s390:64-bit --rename-section .data=.vmlinux.bin.compressed
$(obj)/piggy.o: $(obj)/vmlinux.bin$(suffix-y) FORCE
CONFIG_MEMORY_HOTREMOVE=y
CONFIG_KSM=y
CONFIG_TRANSPARENT_HUGEPAGE=y
-CONFIG_CLEANCACHE=y
CONFIG_FRONTSWAP=y
CONFIG_CMA_DEBUG=y
CONFIG_CMA_DEBUGFS=y
CONFIG_MEMORY_HOTREMOVE=y
CONFIG_KSM=y
CONFIG_TRANSPARENT_HUGEPAGE=y
-CONFIG_CLEANCACHE=y
CONFIG_FRONTSWAP=y
CONFIG_CMA_SYSFS=y
CONFIG_CMA_AREAS=7
#endif /* CONFIG_HAVE_MARCH_Z9_109_FEATURES */
#include <asm-generic/bitops/ffz.h>
-#include <asm-generic/bitops/find.h>
#include <asm-generic/bitops/hweight.h>
#include <asm-generic/bitops/sched.h>
#include <asm-generic/bitops/le.h>
unsigned int AS:2; /* 29-30 PSW address-space control */
unsigned int I:1; /* 31 entry valid or invalid */
unsigned int CL:2; /* 32-33 Configuration Level */
- unsigned int:14;
+ unsigned int H:1; /* 34 Host Indicator */
+ unsigned int LS:1; /* 35 Limited Sampling */
+ unsigned int:12;
unsigned int prim_asn:16; /* primary ASN */
unsigned long long ia; /* Instruction Address */
unsigned long long gpp; /* Guest Program Parameter */
#ifdef CONFIG_HAVE_MARCH_Z10_FEATURES
-#define __put_get_user_asm(to, from, size, insn) \
-({ \
- int __rc; \
- \
- asm volatile( \
- insn " 0,%[spec]\n" \
- "0: mvcos %[_to],%[_from],%[_size]\n" \
- "1: xr %[rc],%[rc]\n" \
- "2:\n" \
- ".pushsection .fixup, \"ax\"\n" \
- "3: lhi %[rc],%[retval]\n" \
- " jg 2b\n" \
- ".popsection\n" \
- EX_TABLE(0b,3b) EX_TABLE(1b,3b) \
- : [rc] "=&d" (__rc), [_to] "+Q" (*(to)) \
- : [_size] "d" (size), [_from] "Q" (*(from)), \
- [retval] "K" (-EFAULT), [spec] "K" (0x81UL) \
- : "cc", "0"); \
- __rc; \
+union oac {
+ unsigned int val;
+ struct {
+ struct {
+ unsigned short key : 4;
+ unsigned short : 4;
+ unsigned short as : 2;
+ unsigned short : 4;
+ unsigned short k : 1;
+ unsigned short a : 1;
+ } oac1;
+ struct {
+ unsigned short key : 4;
+ unsigned short : 4;
+ unsigned short as : 2;
+ unsigned short : 4;
+ unsigned short k : 1;
+ unsigned short a : 1;
+ } oac2;
+ };
+};
+
+#define __put_get_user_asm(to, from, size, oac_spec) \
+({ \
+ int __rc; \
+ \
+ asm volatile( \
+ " lr 0,%[spec]\n" \
+ "0: mvcos %[_to],%[_from],%[_size]\n" \
+ "1: xr %[rc],%[rc]\n" \
+ "2:\n" \
+ ".pushsection .fixup, \"ax\"\n" \
+ "3: lhi %[rc],%[retval]\n" \
+ " jg 2b\n" \
+ ".popsection\n" \
+ EX_TABLE(0b,3b) EX_TABLE(1b,3b) \
+ : [rc] "=&d" (__rc), [_to] "+Q" (*(to)) \
+ : [_size] "d" (size), [_from] "Q" (*(from)), \
+ [retval] "K" (-EFAULT), [spec] "d" (oac_spec.val) \
+ : "cc", "0"); \
+ __rc; \
})
+#define __put_user_asm(to, from, size) \
+ __put_get_user_asm(to, from, size, ((union oac) { \
+ .oac1.as = PSW_BITS_AS_SECONDARY, \
+ .oac1.a = 1 \
+ }))
+
+#define __get_user_asm(to, from, size) \
+ __put_get_user_asm(to, from, size, ((union oac) { \
+ .oac2.as = PSW_BITS_AS_SECONDARY, \
+ .oac2.a = 1 \
+ })) \
+
static __always_inline int __put_user_fn(void *x, void __user *ptr, unsigned long size)
{
int rc;
switch (size) {
case 1:
- rc = __put_get_user_asm((unsigned char __user *)ptr,
- (unsigned char *)x,
- size, "llilh");
+ rc = __put_user_asm((unsigned char __user *)ptr,
+ (unsigned char *)x,
+ size);
break;
case 2:
- rc = __put_get_user_asm((unsigned short __user *)ptr,
- (unsigned short *)x,
- size, "llilh");
+ rc = __put_user_asm((unsigned short __user *)ptr,
+ (unsigned short *)x,
+ size);
break;
case 4:
- rc = __put_get_user_asm((unsigned int __user *)ptr,
- (unsigned int *)x,
- size, "llilh");
+ rc = __put_user_asm((unsigned int __user *)ptr,
+ (unsigned int *)x,
+ size);
break;
case 8:
- rc = __put_get_user_asm((unsigned long __user *)ptr,
- (unsigned long *)x,
- size, "llilh");
+ rc = __put_user_asm((unsigned long __user *)ptr,
+ (unsigned long *)x,
+ size);
break;
default:
__put_user_bad();
switch (size) {
case 1:
- rc = __put_get_user_asm((unsigned char *)x,
- (unsigned char __user *)ptr,
- size, "lghi");
+ rc = __get_user_asm((unsigned char *)x,
+ (unsigned char __user *)ptr,
+ size);
break;
case 2:
- rc = __put_get_user_asm((unsigned short *)x,
- (unsigned short __user *)ptr,
- size, "lghi");
+ rc = __get_user_asm((unsigned short *)x,
+ (unsigned short __user *)ptr,
+ size);
break;
case 4:
- rc = __put_get_user_asm((unsigned int *)x,
- (unsigned int __user *)ptr,
- size, "lghi");
+ rc = __get_user_asm((unsigned int *)x,
+ (unsigned int __user *)ptr,
+ size);
break;
case 8:
- rc = __put_get_user_asm((unsigned long *)x,
- (unsigned long __user *)ptr,
- size, "lghi");
+ rc = __get_user_asm((unsigned long *)x,
+ (unsigned long __user *)ptr,
+ size);
break;
default:
__get_user_bad();
case CPUMF_CTR_SET_CRYPTO:
if (info->csvn >= 1 && info->csvn <= 5)
ctrset_size = 16;
- else if (info->csvn == 6)
+ else if (info->csvn == 6 || info->csvn == 7)
ctrset_size = 20;
break;
case CPUMF_CTR_SET_EXT:
ctrset_size = 48;
else if (info->csvn >= 3 && info->csvn <= 5)
ctrset_size = 128;
- else if (info->csvn == 6)
+ else if (info->csvn == 6 || info->csvn == 7)
ctrset_size = 160;
break;
case CPUMF_CTR_SET_MT_DIAG:
NULL,
};
-static struct attribute *cpumcf_svn_6_pmu_event_attr[] __initdata = {
+static struct attribute *cpumcf_svn_67_pmu_event_attr[] __initdata = {
CPUMF_EVENT_PTR(cf_svn_12345, PRNG_FUNCTIONS),
CPUMF_EVENT_PTR(cf_svn_12345, PRNG_CYCLES),
CPUMF_EVENT_PTR(cf_svn_12345, PRNG_BLOCKED_FUNCTIONS),
case 1 ... 5:
csvn = cpumcf_svn_12345_pmu_event_attr;
break;
- case 6:
- csvn = cpumcf_svn_6_pmu_event_attr;
+ case 6 ... 7:
+ csvn = cpumcf_svn_67_pmu_event_attr;
break;
default:
csvn = none;
sample = (struct hws_basic_entry *) *sdbt;
while ((unsigned long *) sample < (unsigned long *) te) {
/* Check for an empty sample */
- if (!sample->def)
+ if (!sample->def || sample->LS)
break;
/* Update perf event period */
ofs = find_next_bit(kvm_second_dirty_bitmap(ms), ms->npages, ofs);
while (ofs >= ms->npages && (mnode = rb_next(mnode))) {
ms = container_of(mnode, struct kvm_memory_slot, gfn_node[slots->node_idx]);
- ofs = find_next_bit(kvm_second_dirty_bitmap(ms), ms->npages, 0);
+ ofs = find_first_bit(kvm_second_dirty_bitmap(ms), ms->npages);
}
return ms->base_gfn + ofs;
}
unsigned long size)
{
unsigned long tmp1, tmp2;
+ union oac spec = {
+ .oac2.as = PSW_BITS_AS_SECONDARY,
+ .oac2.a = 1,
+ };
tmp1 = -4096UL;
asm volatile(
- " lghi 0,%[spec]\n"
+ " lr 0,%[spec]\n"
"0: .insn ss,0xc80000000000,0(%0,%2),0(%1),0\n"
"6: jz 4f\n"
"1: algr %0,%3\n"
"5:\n"
EX_TABLE(0b,2b) EX_TABLE(3b,5b) EX_TABLE(6b,2b) EX_TABLE(7b,5b)
: "+a" (size), "+a" (ptr), "+a" (x), "+a" (tmp1), "=a" (tmp2)
- : [spec] "K" (0x81UL)
+ : [spec] "d" (spec.val)
: "cc", "memory", "0");
return size;
}
unsigned long size)
{
unsigned long tmp1, tmp2;
+ union oac spec = {
+ .oac1.as = PSW_BITS_AS_SECONDARY,
+ .oac1.a = 1,
+ };
tmp1 = -4096UL;
asm volatile(
- " llilh 0,%[spec]\n"
+ " lr 0,%[spec]\n"
"0: .insn ss,0xc80000000000,0(%0,%1),0(%2),0\n"
"6: jz 4f\n"
"1: algr %0,%3\n"
"5:\n"
EX_TABLE(0b,2b) EX_TABLE(3b,5b) EX_TABLE(6b,2b) EX_TABLE(7b,5b)
: "+a" (size), "+a" (ptr), "+a" (x), "+a" (tmp1), "=a" (tmp2)
- : [spec] "K" (0x81UL)
+ : [spec] "d" (spec.val)
: "cc", "memory", "0");
return size;
}
static inline unsigned long clear_user_mvcos(void __user *to, unsigned long size)
{
unsigned long tmp1, tmp2;
+ union oac spec = {
+ .oac1.as = PSW_BITS_AS_SECONDARY,
+ .oac1.a = 1,
+ };
tmp1 = -4096UL;
asm volatile(
- " llilh 0,%[spec]\n"
+ " lr 0,%[spec]\n"
"0: .insn ss,0xc80000000000,0(%0,%1),0(%4),0\n"
" jz 4f\n"
"1: algr %0,%2\n"
"5:\n"
EX_TABLE(0b,2b) EX_TABLE(3b,5b)
: "+a" (size), "+a" (to), "+a" (tmp1), "=a" (tmp2)
- : "a" (empty_zero_page), [spec] "K" (0x81UL)
+ : "a" (empty_zero_page), [spec] "d" (spec.val)
: "cc", "memory", "0");
return size;
}
CONFIG_ENTRY_OFFSET ?= 0x00001000
CONFIG_PHYSICAL_START ?= $(CONFIG_MEMORY_START)
-suffix-y := bin
-suffix-$(CONFIG_KERNEL_GZIP) := gz
-suffix-$(CONFIG_KERNEL_BZIP2) := bz2
-suffix-$(CONFIG_KERNEL_LZMA) := lzma
-suffix-$(CONFIG_KERNEL_XZ) := xz
-suffix-$(CONFIG_KERNEL_LZO) := lzo
+suffix_y := bin
+suffix_$(CONFIG_KERNEL_GZIP) := gz
+suffix_$(CONFIG_KERNEL_BZIP2) := bz2
+suffix_$(CONFIG_KERNEL_LZMA) := lzma
+suffix_$(CONFIG_KERNEL_XZ) := xz
+suffix_$(CONFIG_KERNEL_LZO) := lzo
targets := zImage vmlinux.srec romImage uImage uImage.srec uImage.gz \
uImage.bz2 uImage.lzma uImage.xz uImage.lzo uImage.bin \
$(obj)/uImage.srec: $(obj)/uImage FORCE
$(call if_changed,objcopy)
-$(obj)/uImage: $(obj)/uImage.$(suffix-y)
+$(obj)/uImage: $(obj)/uImage.$(suffix_y)
@ln -sf $(notdir $<) $@
@echo ' Image $@ is ready'
export CONFIG_PAGE_OFFSET CONFIG_MEMORY_START CONFIG_BOOT_LINK_OFFSET \
CONFIG_PHYSICAL_START CONFIG_ZERO_PAGE_OFFSET CONFIG_ENTRY_OFFSET \
- KERNEL_MEMORY suffix-y
+ KERNEL_MEMORY suffix_y
$(obj)/vmlinux.bin: vmlinux FORCE
$(call if_changed,objcopy)
-vmlinux.bin.all-y := $(obj)/vmlinux.bin
-
-$(obj)/vmlinux.bin.gz: $(vmlinux.bin.all-y) FORCE
+$(obj)/vmlinux.bin.gz: $(obj)/vmlinux.bin FORCE
$(call if_changed,gzip)
-$(obj)/vmlinux.bin.bz2: $(vmlinux.bin.all-y) FORCE
- $(call if_changed,bzip2)
-$(obj)/vmlinux.bin.lzma: $(vmlinux.bin.all-y) FORCE
- $(call if_changed,lzma)
-$(obj)/vmlinux.bin.xz: $(vmlinux.bin.all-y) FORCE
- $(call if_changed,xzkern)
-$(obj)/vmlinux.bin.lzo: $(vmlinux.bin.all-y) FORCE
- $(call if_changed,lzo)
+$(obj)/vmlinux.bin.bz2: $(obj)/vmlinux.bin FORCE
+ $(call if_changed,bzip2_with_size)
+$(obj)/vmlinux.bin.lzma: $(obj)/vmlinux.bin FORCE
+ $(call if_changed,lzma_with_size)
+$(obj)/vmlinux.bin.xz: $(obj)/vmlinux.bin FORCE
+ $(call if_changed,xzkern_with_size)
+$(obj)/vmlinux.bin.lzo: $(obj)/vmlinux.bin FORCE
+ $(call if_changed,lzo_with_size)
OBJCOPYFLAGS += -R .empty_zero_page
LDFLAGS_piggy.o := -r --format binary --oformat $(ld-bfd) -T
-$(obj)/piggy.o: $(obj)/vmlinux.scr $(obj)/vmlinux.bin.$(suffix-y) FORCE
+$(obj)/piggy.o: $(obj)/vmlinux.scr $(obj)/vmlinux.bin.$(suffix_y) FORCE
$(call if_changed,ld)
# SPDX-License-Identifier: GPL-2.0-only
-ifneq ($(CONFIG_BUILTIN_DTB_SOURCE),"")
-obj-$(CONFIG_USE_BUILTIN_DTB) += $(patsubst "%",%,$(CONFIG_BUILTIN_DTB_SOURCE)).dtb.o
-endif
+obj-$(CONFIG_USE_BUILTIN_DTB) += $(addsuffix .dtb.o, $(CONFIG_BUILTIN_DTB_SOURCE))
#include <asm-generic/bitops/fls64.h>
#include <asm-generic/bitops/le.h>
-#include <asm-generic/bitops/find.h>
#endif /* __ASM_SH_BITOPS_H */
static ssize_t alignment_proc_write(struct file *file,
const char __user *buffer, size_t count, loff_t *pos)
{
- int *data = PDE_DATA(file_inode(file));
+ int *data = pde_data(file_inode(file));
char mode;
if (count > 0) {
};
/*
- * This needs to be done after sysctl_init, otherwise sys/ will be
+ * This needs to be done after sysctl_init_bases(), otherwise sys/ will be
* overwritten. Actually, this shouldn't be in sys/ at all since
* it isn't a sysctl, and it doesn't contain sysctl information.
* We now locate it in /proc/cpu/alignment instead.
select PCI_DOMAINS if PCI
select ARCH_HAS_GIGANTIC_PAGE
select HAVE_SOFTIRQ_ON_OWN_STACK
+ select HAVE_SETUP_PER_CPU_AREA
+ select NEED_PER_CPU_EMBED_FIRST_CHUNK
+ select NEED_PER_CPU_PAGE_FIRST_CHUNK
config ARCH_PROC_KCORE_TEXT
def_bool y
bool
default y
-config HAVE_SETUP_PER_CPU_AREA
- def_bool y if SPARC64
-
-config NEED_PER_CPU_EMBED_FIRST_CHUNK
- def_bool y if SPARC64
-
-config NEED_PER_CPU_PAGE_FIRST_CHUNK
- def_bool y if SPARC64
-
config MMU
bool
default y
#include <asm-generic/bitops/fls64.h>
#include <asm-generic/bitops/hweight.h>
#include <asm-generic/bitops/lock.h>
-#include <asm-generic/bitops/find.h>
#include <asm-generic/bitops/le.h>
#include <asm-generic/bitops/ext2-atomic.h>
#include <asm-generic/bitops/lock.h>
#endif /* __KERNEL__ */
-#include <asm-generic/bitops/find.h>
-
#ifdef __KERNEL__
#include <asm-generic/bitops/le.h>
};
#endif
-static struct proc_dir_entry *led;
-
#define LED_VERSION "0.1"
static int __init led_init(void)
{
timer_setup(&led_blink_timer, led_blink, 0);
- led = proc_create("led", 0, NULL, &led_proc_ops);
- if (!led)
+#ifdef CONFIG_PROC_FS
+ if (!proc_create("led", 0, NULL, &led_proc_ops))
return -ENOMEM;
-
+#endif
printk(KERN_INFO
"led: version %s, Lars Kotthoff <metalhead@metalhead.ws>\n",
LED_VERSION);
smp_call_function(stop_this_cpu, NULL, 0);
}
-/**
- * pcpu_alloc_bootmem - NUMA friendly alloc_bootmem wrapper for percpu
- * @cpu: cpu to allocate for
- * @size: size allocation in bytes
- * @align: alignment
- *
- * Allocate @size bytes aligned at @align for cpu @cpu. This wrapper
- * does the right thing for NUMA regardless of the current
- * configuration.
- *
- * RETURNS:
- * Pointer to the allocated area on success, NULL on failure.
- */
-static void * __init pcpu_alloc_bootmem(unsigned int cpu, size_t size,
- size_t align)
-{
- const unsigned long goal = __pa(MAX_DMA_ADDRESS);
-#ifdef CONFIG_NUMA
- int node = cpu_to_node(cpu);
- void *ptr;
-
- if (!node_online(node) || !NODE_DATA(node)) {
- ptr = memblock_alloc_from(size, align, goal);
- pr_info("cpu %d has no node %d or node-local memory\n",
- cpu, node);
- pr_debug("per cpu data for cpu%d %lu bytes at %016lx\n",
- cpu, size, __pa(ptr));
- } else {
- ptr = memblock_alloc_try_nid(size, align, goal,
- MEMBLOCK_ALLOC_ACCESSIBLE, node);
- pr_debug("per cpu data for cpu%d %lu bytes on node%d at "
- "%016lx\n", cpu, size, node, __pa(ptr));
- }
- return ptr;
-#else
- return memblock_alloc_from(size, align, goal);
-#endif
-}
-
-static void __init pcpu_free_bootmem(void *ptr, size_t size)
-{
- memblock_free(ptr, size);
-}
-
static int __init pcpu_cpu_distance(unsigned int from, unsigned int to)
{
if (cpu_to_node(from) == cpu_to_node(to))
return REMOTE_DISTANCE;
}
-static void __init pcpu_populate_pte(unsigned long addr)
+static int __init pcpu_cpu_to_node(int cpu)
{
- pgd_t *pgd = pgd_offset_k(addr);
- p4d_t *p4d;
- pud_t *pud;
- pmd_t *pmd;
-
- if (pgd_none(*pgd)) {
- pud_t *new;
-
- new = memblock_alloc_from(PAGE_SIZE, PAGE_SIZE, PAGE_SIZE);
- if (!new)
- goto err_alloc;
- pgd_populate(&init_mm, pgd, new);
- }
-
- p4d = p4d_offset(pgd, addr);
- if (p4d_none(*p4d)) {
- pud_t *new;
-
- new = memblock_alloc_from(PAGE_SIZE, PAGE_SIZE, PAGE_SIZE);
- if (!new)
- goto err_alloc;
- p4d_populate(&init_mm, p4d, new);
- }
-
- pud = pud_offset(p4d, addr);
- if (pud_none(*pud)) {
- pmd_t *new;
-
- new = memblock_alloc_from(PAGE_SIZE, PAGE_SIZE, PAGE_SIZE);
- if (!new)
- goto err_alloc;
- pud_populate(&init_mm, pud, new);
- }
-
- pmd = pmd_offset(pud, addr);
- if (!pmd_present(*pmd)) {
- pte_t *new;
-
- new = memblock_alloc_from(PAGE_SIZE, PAGE_SIZE, PAGE_SIZE);
- if (!new)
- goto err_alloc;
- pmd_populate_kernel(&init_mm, pmd, new);
- }
-
- return;
-
-err_alloc:
- panic("%s: Failed to allocate %lu bytes align=%lx from=%lx\n",
- __func__, PAGE_SIZE, PAGE_SIZE, PAGE_SIZE);
+ return cpu_to_node(cpu);
}
void __init setup_per_cpu_areas(void)
rc = pcpu_embed_first_chunk(PERCPU_MODULE_RESERVE,
PERCPU_DYNAMIC_RESERVE, 4 << 20,
pcpu_cpu_distance,
- pcpu_alloc_bootmem,
- pcpu_free_bootmem);
+ pcpu_cpu_to_node);
if (rc)
pr_warn("PERCPU: %s allocator failed (%d), "
"falling back to page size\n",
}
if (rc < 0)
rc = pcpu_page_first_chunk(PERCPU_MODULE_RESERVE,
- pcpu_alloc_bootmem,
- pcpu_free_bootmem,
- pcpu_populate_pte);
+ pcpu_cpu_to_node);
if (rc < 0)
panic("cannot initialize percpu area (err=%d)", rc);
select ARCH_HAS_FILTER_PGPROT
select ARCH_HAS_FORTIFY_SOURCE
select ARCH_HAS_GCOV_PROFILE_ALL
- select ARCH_HAS_KCOV if X86_64 && STACK_VALIDATION
+ select ARCH_HAS_KCOV if X86_64
select ARCH_HAS_MEM_ENCRYPT
select ARCH_HAS_MEMBARRIER_SYNC_CORE
select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
select GENERIC_CPU_VULNERABILITIES
select GENERIC_EARLY_IOREMAP
select GENERIC_ENTRY
- select GENERIC_FIND_FIRST_BIT
select GENERIC_IOMAP
select GENERIC_IRQ_EFFECTIVE_AFF_MASK if SMP
select GENERIC_IRQ_MATRIX_ALLOCATOR if X86_LOCAL_APIC
select HAVE_REGS_AND_STACK_ACCESS_API
select HAVE_RELIABLE_STACKTRACE if X86_64 && (UNWINDER_FRAME_POINTER || UNWINDER_ORC) && STACK_VALIDATION
select HAVE_FUNCTION_ARG_ACCESS_API
+ select HAVE_SETUP_PER_CPU_AREA
select HAVE_SOFTIRQ_ON_OWN_STACK
select HAVE_STACKPROTECTOR if CC_HAS_SANE_STACKPROTECTOR
select HAVE_STACK_VALIDATION if X86_64
select HAVE_GENERIC_VDSO
select HOTPLUG_SMT if SMP
select IRQ_FORCED_THREADING
+ select NEED_PER_CPU_EMBED_FIRST_CHUNK
+ select NEED_PER_CPU_PAGE_FIRST_CHUNK
select NEED_SG_DMA_LENGTH
select PCI_DOMAINS if PCI
select PCI_LOCKLESS_CONFIG if PCI
config ARCH_HAS_FILTER_PGPROT
def_bool y
-config HAVE_SETUP_PER_CPU_AREA
- def_bool y
-
-config NEED_PER_CPU_EMBED_FIRST_CHUNK
- def_bool y
-
-config NEED_PER_CPU_PAGE_FIRST_CHUNK
- def_bool y
-
config ARCH_HIBERNATION_POSSIBLE
def_bool y
depends on SMP
depends on X86_64 || (X86_32 && HIGHMEM64G && X86_BIGSMP)
default y if X86_BIGSMP
+ select USE_PERCPU_NUMA_NODE_ID
help
Enable NUMA (Non-Uniform Memory Access) support.
config ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
def_bool y
-config USE_PERCPU_NUMA_NODE_ID
- def_bool y
- depends on NUMA
-
menu "Power management and ACPI options"
config ARCH_HIBERNATION_HEADER
$(obj)/vmlinux.bin.gz: $(vmlinux.bin.all-y) FORCE
$(call if_changed,gzip)
$(obj)/vmlinux.bin.bz2: $(vmlinux.bin.all-y) FORCE
- $(call if_changed,bzip2)
+ $(call if_changed,bzip2_with_size)
$(obj)/vmlinux.bin.lzma: $(vmlinux.bin.all-y) FORCE
- $(call if_changed,lzma)
+ $(call if_changed,lzma_with_size)
$(obj)/vmlinux.bin.xz: $(vmlinux.bin.all-y) FORCE
- $(call if_changed,xzkern)
+ $(call if_changed,xzkern_with_size)
$(obj)/vmlinux.bin.lzo: $(vmlinux.bin.all-y) FORCE
- $(call if_changed,lzo)
+ $(call if_changed,lzo_with_size)
$(obj)/vmlinux.bin.lz4: $(vmlinux.bin.all-y) FORCE
- $(call if_changed,lz4)
+ $(call if_changed,lz4_with_size)
$(obj)/vmlinux.bin.zst: $(vmlinux.bin.all-y) FORCE
- $(call if_changed,zstd22)
+ $(call if_changed,zstd22_with_size)
suffix-$(CONFIG_KERNEL_GZIP) := gz
suffix-$(CONFIG_KERNEL_BZIP2) := bz2
#include <asm-generic/bitops/fls64.h>
#endif
-#include <asm-generic/bitops/find.h>
-
#include <asm-generic/bitops/sched.h>
#include <asm/arch_hweight.h>
KVM_X86_OP_NULL(tlb_remote_flush_with_range)
KVM_X86_OP(tlb_flush_gva)
KVM_X86_OP(tlb_flush_guest)
+KVM_X86_OP(vcpu_pre_run)
KVM_X86_OP(run)
KVM_X86_OP_NULL(handle_exit)
KVM_X86_OP_NULL(skip_emulated_instruction)
KVM_X86_OP_NULL(request_immediate_exit)
KVM_X86_OP(sched_in)
KVM_X86_OP_NULL(update_cpu_dirty_logging)
-KVM_X86_OP_NULL(pre_block)
-KVM_X86_OP_NULL(post_block)
KVM_X86_OP_NULL(vcpu_blocking)
KVM_X86_OP_NULL(vcpu_unblocking)
KVM_X86_OP_NULL(update_pi_irte)
*/
void (*tlb_flush_guest)(struct kvm_vcpu *vcpu);
+ int (*vcpu_pre_run)(struct kvm_vcpu *vcpu);
enum exit_fastpath_completion (*run)(struct kvm_vcpu *vcpu);
int (*handle_exit)(struct kvm_vcpu *vcpu,
enum exit_fastpath_completion exit_fastpath);
const struct kvm_pmu_ops *pmu_ops;
const struct kvm_x86_nested_ops *nested_ops;
- /*
- * Architecture specific hooks for vCPU blocking due to
- * HLT instruction.
- * Returns for .pre_block():
- * - 0 means continue to block the vCPU.
- * - 1 means we cannot block the vCPU since some event
- * happens during this period, such as, 'ON' bit in
- * posted-interrupts descriptor is set.
- */
- int (*pre_block)(struct kvm_vcpu *vcpu);
- void (*post_block)(struct kvm_vcpu *vcpu);
-
void (*vcpu_blocking)(struct kvm_vcpu *vcpu);
void (*vcpu_unblocking)(struct kvm_vcpu *vcpu);
void __init lapic_assign_system_vectors(void)
{
- unsigned int i, vector = 0;
+ unsigned int i, vector;
- for_each_set_bit_from(vector, system_vectors, NR_VECTORS)
+ for_each_set_bit(vector, system_vectors, NR_VECTORS)
irq_matrix_assign_system(vector_matrix, vector, false);
if (nr_legacy_irqs() > 1)
.stolen_size = gen9_stolen_size,
};
+/* Intel integrated GPUs for which we need to reserve "stolen memory" */
static const struct pci_device_id intel_early_ids[] __initconst = {
INTEL_I830_IDS(&i830_early_ops),
INTEL_I845G_IDS(&i845_early_ops),
u16 device;
int i;
+ /*
+ * Reserve "stolen memory" for an integrated GPU. If we've already
+ * found one, there's nothing to do for other (discrete) GPUs.
+ */
+ if (resource_size(&intel_graphics_stolen_res))
+ return;
+
device = read_pci_config_16(num, slot, func, PCI_DEVICE_ID);
for (i = 0; i < ARRAY_SIZE(intel_early_ids); i++) {
{ PCI_VENDOR_ID_INTEL, 0x3406, PCI_CLASS_BRIDGE_HOST,
PCI_BASE_CLASS_BRIDGE, 0, intel_remapping_check },
{ PCI_VENDOR_ID_INTEL, PCI_ANY_ID, PCI_CLASS_DISPLAY_VGA, PCI_ANY_ID,
- QFLAG_APPLY_ONCE, intel_graphics_quirks },
+ 0, intel_graphics_quirks },
/*
* HPET on the current version of the Baytrail platform has accuracy
* problems: it will halt in deep idle state - so we disable it.
hpet_rtc_timer_reinit();
memset(&curr_time, 0, sizeof(struct rtc_time));
- if (hpet_rtc_flags & (RTC_UIE | RTC_AIE))
- mc146818_get_time(&curr_time);
+ if (hpet_rtc_flags & (RTC_UIE | RTC_AIE)) {
+ if (unlikely(mc146818_get_time(&curr_time) < 0)) {
+ pr_err_ratelimited("unable to read current time from RTC\n");
+ return IRQ_HANDLED;
+ }
+ }
if (hpet_rtc_flags & RTC_UIE &&
curr_time.tm_sec != hpet_prev_update_sec) {
}
#endif
-/**
- * pcpu_alloc_bootmem - NUMA friendly alloc_bootmem wrapper for percpu
- * @cpu: cpu to allocate for
- * @size: size allocation in bytes
- * @align: alignment
- *
- * Allocate @size bytes aligned at @align for cpu @cpu. This wrapper
- * does the right thing for NUMA regardless of the current
- * configuration.
- *
- * RETURNS:
- * Pointer to the allocated area on success, NULL on failure.
- */
-static void * __init pcpu_alloc_bootmem(unsigned int cpu, unsigned long size,
- unsigned long align)
-{
- const unsigned long goal = __pa(MAX_DMA_ADDRESS);
-#ifdef CONFIG_NUMA
- int node = early_cpu_to_node(cpu);
- void *ptr;
-
- if (!node_online(node) || !NODE_DATA(node)) {
- ptr = memblock_alloc_from(size, align, goal);
- pr_info("cpu %d has no node %d or node-local memory\n",
- cpu, node);
- pr_debug("per cpu data for cpu%d %lu bytes at %016lx\n",
- cpu, size, __pa(ptr));
- } else {
- ptr = memblock_alloc_try_nid(size, align, goal,
- MEMBLOCK_ALLOC_ACCESSIBLE,
- node);
-
- pr_debug("per cpu data for cpu%d %lu bytes on node%d at %016lx\n",
- cpu, size, node, __pa(ptr));
- }
- return ptr;
-#else
- return memblock_alloc_from(size, align, goal);
-#endif
-}
-
-/*
- * Helpers for first chunk memory allocation
- */
-static void * __init pcpu_fc_alloc(unsigned int cpu, size_t size, size_t align)
-{
- return pcpu_alloc_bootmem(cpu, size, align);
-}
-
-static void __init pcpu_fc_free(void *ptr, size_t size)
-{
- memblock_free(ptr, size);
-}
-
static int __init pcpu_cpu_distance(unsigned int from, unsigned int to)
{
#ifdef CONFIG_NUMA
#endif
}
-static void __init pcpup_populate_pte(unsigned long addr)
+static int __init pcpu_cpu_to_node(int cpu)
+{
+ return early_cpu_to_node(cpu);
+}
+
+void __init pcpu_populate_pte(unsigned long addr)
{
populate_extra_pte(addr);
}
rc = pcpu_embed_first_chunk(PERCPU_FIRST_CHUNK_RESERVE,
dyn_size, atom_size,
pcpu_cpu_distance,
- pcpu_fc_alloc, pcpu_fc_free);
+ pcpu_cpu_to_node);
if (rc < 0)
pr_warn("%s allocator failed (%d), falling back to page size\n",
pcpu_fc_names[pcpu_chosen_fc], rc);
}
if (rc < 0)
rc = pcpu_page_first_chunk(PERCPU_FIRST_CHUNK_RESERVE,
- pcpu_fc_alloc, pcpu_fc_free,
- pcpup_populate_pte);
+ pcpu_cpu_to_node);
if (rc < 0)
panic("cannot initialize percpu area (err=%d)", rc);
return fpu_enable_guest_xfd_features(&vcpu->arch.guest_fpu, xfeatures);
}
+/* Check whether the supplied CPUID data is equal to what is already set for the vCPU. */
+static int kvm_cpuid_check_equal(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
+ int nent)
+{
+ struct kvm_cpuid_entry2 *orig;
+ int i;
+
+ if (nent != vcpu->arch.cpuid_nent)
+ return -EINVAL;
+
+ for (i = 0; i < nent; i++) {
+ orig = &vcpu->arch.cpuid_entries[i];
+ if (e2[i].function != orig->function ||
+ e2[i].index != orig->index ||
+ e2[i].eax != orig->eax || e2[i].ebx != orig->ebx ||
+ e2[i].ecx != orig->ecx || e2[i].edx != orig->edx)
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
static void kvm_update_kvm_cpuid_base(struct kvm_vcpu *vcpu)
{
u32 function;
}
}
-static struct kvm_cpuid_entry2 *kvm_find_kvm_cpuid_features(struct kvm_vcpu *vcpu)
+static struct kvm_cpuid_entry2 *__kvm_find_kvm_cpuid_features(struct kvm_vcpu *vcpu,
+ struct kvm_cpuid_entry2 *entries, int nent)
{
u32 base = vcpu->arch.kvm_cpuid_base;
if (!base)
return NULL;
- return kvm_find_cpuid_entry(vcpu, base | KVM_CPUID_FEATURES, 0);
+ return cpuid_entry2_find(entries, nent, base | KVM_CPUID_FEATURES, 0);
+}
+
+static struct kvm_cpuid_entry2 *kvm_find_kvm_cpuid_features(struct kvm_vcpu *vcpu)
+{
+ return __kvm_find_kvm_cpuid_features(vcpu, vcpu->arch.cpuid_entries,
+ vcpu->arch.cpuid_nent);
}
void kvm_update_pv_runtime(struct kvm_vcpu *vcpu)
vcpu->arch.pv_cpuid.features = best->eax;
}
-void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
+static void __kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *entries,
+ int nent)
{
struct kvm_cpuid_entry2 *best;
- best = kvm_find_cpuid_entry(vcpu, 1, 0);
+ best = cpuid_entry2_find(entries, nent, 1, 0);
if (best) {
/* Update OSXSAVE bit */
if (boot_cpu_has(X86_FEATURE_XSAVE))
vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE);
}
- best = kvm_find_cpuid_entry(vcpu, 7, 0);
+ best = cpuid_entry2_find(entries, nent, 7, 0);
if (best && boot_cpu_has(X86_FEATURE_PKU) && best->function == 0x7)
cpuid_entry_change(best, X86_FEATURE_OSPKE,
kvm_read_cr4_bits(vcpu, X86_CR4_PKE));
- best = kvm_find_cpuid_entry(vcpu, 0xD, 0);
+ best = cpuid_entry2_find(entries, nent, 0xD, 0);
if (best)
best->ebx = xstate_required_size(vcpu->arch.xcr0, false);
- best = kvm_find_cpuid_entry(vcpu, 0xD, 1);
+ best = cpuid_entry2_find(entries, nent, 0xD, 1);
if (best && (cpuid_entry_has(best, X86_FEATURE_XSAVES) ||
cpuid_entry_has(best, X86_FEATURE_XSAVEC)))
best->ebx = xstate_required_size(vcpu->arch.xcr0, true);
- best = kvm_find_kvm_cpuid_features(vcpu);
+ best = __kvm_find_kvm_cpuid_features(vcpu, entries, nent);
if (kvm_hlt_in_guest(vcpu->kvm) && best &&
(best->eax & (1 << KVM_FEATURE_PV_UNHALT)))
best->eax &= ~(1 << KVM_FEATURE_PV_UNHALT);
if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT)) {
- best = kvm_find_cpuid_entry(vcpu, 0x1, 0);
+ best = cpuid_entry2_find(entries, nent, 0x1, 0);
if (best)
cpuid_entry_change(best, X86_FEATURE_MWAIT,
vcpu->arch.ia32_misc_enable_msr &
MSR_IA32_MISC_ENABLE_MWAIT);
}
}
+
+void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
+{
+ __kvm_update_cpuid_runtime(vcpu, vcpu->arch.cpuid_entries, vcpu->arch.cpuid_nent);
+}
EXPORT_SYMBOL_GPL(kvm_update_cpuid_runtime);
static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
{
int r;
+ __kvm_update_cpuid_runtime(vcpu, e2, nent);
+
+ /*
+ * KVM does not correctly handle changing guest CPUID after KVM_RUN, as
+ * MAXPHYADDR, GBPAGES support, AMD reserved bit behavior, etc.. aren't
+ * tracked in kvm_mmu_page_role. As a result, KVM may miss guest page
+ * faults due to reusing SPs/SPTEs. In practice no sane VMM mucks with
+ * the core vCPU model on the fly. It would've been better to forbid any
+ * KVM_SET_CPUID{,2} calls after KVM_RUN altogether but unfortunately
+ * some VMMs (e.g. QEMU) reuse vCPU fds for CPU hotplug/unplug and do
+ * KVM_SET_CPUID{,2} again. To support this legacy behavior, check
+ * whether the supplied CPUID data is equal to what's already set.
+ */
+ if (vcpu->arch.last_vmentry_cpu != -1)
+ return kvm_cpuid_check_equal(vcpu, e2, nent);
+
r = kvm_check_cpuid(vcpu, e2, nent);
if (r)
return r;
vcpu->arch.cpuid_nent = nent;
kvm_update_kvm_cpuid_base(vcpu);
- kvm_update_cpuid_runtime(vcpu);
kvm_vcpu_after_set_cpuid(vcpu);
return 0;
perf_get_x86_pmu_capability(&cap);
/*
- * Only support guest architectural pmu on a host
- * with architectural pmu.
+ * The guest architecture pmu is only supported if the architecture
+ * pmu exists on the host and the module parameters allow it.
*/
- if (!cap.version)
+ if (!cap.version || !enable_pmu)
memset(&cap, 0, sizeof(cap));
eax.split.version_id = min(cap.version, 2);
--array->nent;
continue;
}
+
+ if (!kvm_cpu_cap_has(X86_FEATURE_XFD))
+ entry->ecx &= ~BIT_ULL(2);
entry->edx = 0;
}
break;
{
restart_apic_timer(vcpu->arch.apic);
}
-EXPORT_SYMBOL_GPL(kvm_lapic_switch_to_hv_timer);
void kvm_lapic_switch_to_sw_timer(struct kvm_vcpu *vcpu)
{
start_sw_timer(apic);
preempt_enable();
}
-EXPORT_SYMBOL_GPL(kvm_lapic_switch_to_sw_timer);
void kvm_lapic_restart_hv_timer(struct kvm_vcpu *vcpu)
{
continue;
flush = slot_handle_level_range(kvm, memslot, kvm_zap_rmapp,
+
PG_LEVEL_4K, KVM_MAX_HUGEPAGE_LEVEL,
start, end - 1, true, flush);
}
}
/*
- * We can flush all the TLBs out of the mmu lock without TLB
- * corruption since we just change the spte from writable to
- * readonly so that we only need to care the case of changing
- * spte from present to present (changing the spte from present
- * to nonpresent will flush all the TLBs immediately), in other
- * words, the only case we care is mmu_spte_update() where we
- * have checked Host-writable | MMU-writable instead of
- * PT_WRITABLE_MASK, that means it does not depend on PT_WRITABLE_MASK
- * anymore.
+ * Flush TLBs if any SPTEs had to be write-protected to ensure that
+ * guest writes are reflected in the dirty bitmap before the memslot
+ * update completes, i.e. before enabling dirty logging is visible to
+ * userspace.
+ *
+ * Perform the TLB flush outside the mmu_lock to reduce the amount of
+ * time the lock is held. However, this does mean that another CPU can
+ * now grab mmu_lock and encounter a write-protected SPTE while CPUs
+ * still have a writable mapping for the associated GFN in their TLB.
+ *
+ * This is safe but requires KVM to be careful when making decisions
+ * based on the write-protection status of an SPTE. Specifically, KVM
+ * also write-protects SPTEs to monitor changes to guest page tables
+ * during shadow paging, and must guarantee no CPUs can write to those
+ * page before the lock is dropped. As mentioned in the previous
+ * paragraph, a write-protected SPTE is no guarantee that CPU cannot
+ * perform writes. So to determine if a TLB flush is truly required, KVM
+ * will clear a separate software-only bit (MMU-writable) and skip the
+ * flush if-and-only-if this bit was already clear.
+ *
+ * See DEFAULT_SPTE_MMU_WRITEABLE for more details.
*/
if (flush)
kvm_arch_flush_remote_tlbs_memslot(kvm, memslot);
new_spte &= ~PT_WRITABLE_MASK;
new_spte &= ~shadow_host_writable_mask;
+ new_spte &= ~shadow_mmu_writable_mask;
new_spte = mark_spte_for_access_track(new_spte);
(((address) >> PT64_LEVEL_SHIFT(level)) & ((1 << PT64_LEVEL_BITS) - 1))
#define SHADOW_PT_INDEX(addr, level) PT64_INDEX(addr, level)
-/* Bits 9 and 10 are ignored by all non-EPT PTEs. */
-#define DEFAULT_SPTE_HOST_WRITEABLE BIT_ULL(9)
-#define DEFAULT_SPTE_MMU_WRITEABLE BIT_ULL(10)
-
/*
* The mask/shift to use for saving the original R/X bits when marking the PTE
* as not-present for access tracking purposes. We do not save the W bit as the
SHADOW_ACC_TRACK_SAVED_BITS_SHIFT)
static_assert(!(SPTE_TDP_AD_MASK & SHADOW_ACC_TRACK_SAVED_MASK));
+/*
+ * *_SPTE_HOST_WRITEABLE (aka Host-writable) indicates whether the host permits
+ * writes to the guest page mapped by the SPTE. This bit is cleared on SPTEs
+ * that map guest pages in read-only memslots and read-only VMAs.
+ *
+ * Invariants:
+ * - If Host-writable is clear, PT_WRITABLE_MASK must be clear.
+ *
+ *
+ * *_SPTE_MMU_WRITEABLE (aka MMU-writable) indicates whether the shadow MMU
+ * allows writes to the guest page mapped by the SPTE. This bit is cleared when
+ * the guest page mapped by the SPTE contains a page table that is being
+ * monitored for shadow paging. In this case the SPTE can only be made writable
+ * by unsyncing the shadow page under the mmu_lock.
+ *
+ * Invariants:
+ * - If MMU-writable is clear, PT_WRITABLE_MASK must be clear.
+ * - If MMU-writable is set, Host-writable must be set.
+ *
+ * If MMU-writable is set, PT_WRITABLE_MASK is normally set but can be cleared
+ * to track writes for dirty logging. For such SPTEs, KVM will locklessly set
+ * PT_WRITABLE_MASK upon the next write from the guest and record the write in
+ * the dirty log (see fast_page_fault()).
+ */
+
+/* Bits 9 and 10 are ignored by all non-EPT PTEs. */
+#define DEFAULT_SPTE_HOST_WRITEABLE BIT_ULL(9)
+#define DEFAULT_SPTE_MMU_WRITEABLE BIT_ULL(10)
+
/*
* Low ignored bits are at a premium for EPT, use high ignored bits, taking care
* to not overlap the A/D type mask or the saved access bits of access-tracked
static inline bool spte_can_locklessly_be_made_writable(u64 spte)
{
- return (spte & shadow_host_writable_mask) &&
- (spte & shadow_mmu_writable_mask);
+ if (spte & shadow_mmu_writable_mask) {
+ WARN_ON_ONCE(!(spte & shadow_host_writable_mask));
+ return true;
+ }
+
+ WARN_ON_ONCE(spte & PT_WRITABLE_MASK);
+ return false;
}
static inline u64 get_mmio_spte_generation(u64 spte)
!is_last_spte(iter.old_spte, iter.level))
continue;
- if (!is_writable_pte(iter.old_spte))
- break;
-
new_spte = iter.old_spte &
~(PT_WRITABLE_MASK | shadow_mmu_writable_mask);
+ if (new_spte == iter.old_spte)
+ break;
+
tdp_mmu_set_spte(kvm, &iter, new_spte);
spte_set = true;
}
#include <linux/types.h>
#include <linux/kvm_host.h>
#include <linux/perf_event.h>
+#include <linux/bsearch.h>
+#include <linux/sort.h>
#include <asm/perf_event.h>
#include "x86.h"
#include "cpuid.h"
.config = config,
};
+ if (type == PERF_TYPE_HARDWARE && config >= PERF_COUNT_HW_MAX)
+ return;
+
attr.sample_period = get_sample_period(pmc, pmc->counter);
if (in_tx)
return true;
}
+static int cmp_u64(const void *a, const void *b)
+{
+ return *(__u64 *)a - *(__u64 *)b;
+}
+
void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel)
{
unsigned config, type = PERF_TYPE_RAW;
struct kvm *kvm = pmc->vcpu->kvm;
struct kvm_pmu_event_filter *filter;
- int i;
bool allow_event = true;
if (eventsel & ARCH_PERFMON_EVENTSEL_PIN_CONTROL)
filter = srcu_dereference(kvm->arch.pmu_event_filter, &kvm->srcu);
if (filter) {
- for (i = 0; i < filter->nevents; i++)
- if (filter->events[i] ==
- (eventsel & AMD64_RAW_EVENT_MASK_NB))
- break;
- if (filter->action == KVM_PMU_EVENT_ALLOW &&
- i == filter->nevents)
- allow_event = false;
- if (filter->action == KVM_PMU_EVENT_DENY &&
- i < filter->nevents)
- allow_event = false;
+ __u64 key = eventsel & AMD64_RAW_EVENT_MASK_NB;
+
+ if (bsearch(&key, filter->events, filter->nevents,
+ sizeof(__u64), cmp_u64))
+ allow_event = filter->action == KVM_PMU_EVENT_ALLOW;
+ else
+ allow_event = filter->action == KVM_PMU_EVENT_DENY;
}
if (!allow_event)
return;
/* Ensure nevents can't be changed between the user copies. */
*filter = tmp;
+ /*
+ * Sort the in-kernel list so that we can search it with bsearch.
+ */
+ sort(&filter->events, filter->nevents, sizeof(__u64), cmp_u64, NULL);
+
mutex_lock(&kvm->lock);
filter = rcu_replace_pointer(kvm->arch.pmu_event_filter, filter,
mutex_is_locked(&kvm->lock));
struct kvm_vcpu *vcpu;
unsigned long i;
+ /*
+ * Wake any target vCPUs that are blocking, i.e. waiting for a wake
+ * event. There's no need to signal doorbells, as hardware has handled
+ * vCPUs that were in guest at the time of the IPI, and vCPUs that have
+ * since entered the guest will have processed pending IRQs at VMRUN.
+ */
kvm_for_each_vcpu(i, vcpu, kvm) {
- bool m = kvm_apic_match_dest(vcpu, source,
- icrl & APIC_SHORT_MASK,
- GET_APIC_DEST_FIELD(icrh),
- icrl & APIC_DEST_MASK);
-
- if (m && !avic_vcpu_is_running(vcpu))
+ if (kvm_apic_match_dest(vcpu, source, icrl & APIC_SHORT_MASK,
+ GET_APIC_DEST_FIELD(icrh),
+ icrl & APIC_DEST_MASK))
kvm_vcpu_wake_up(vcpu);
}
}
return -1;
kvm_lapic_set_irr(vec, vcpu->arch.apic);
+
+ /*
+ * Pairs with the smp_mb_*() after setting vcpu->guest_mode in
+ * vcpu_enter_guest() to ensure the write to the vIRR is ordered before
+ * the read of guest_mode, which guarantees that either VMRUN will see
+ * and process the new vIRR entry, or that the below code will signal
+ * the doorbell if the vCPU is already running in the guest.
+ */
smp_mb__after_atomic();
- if (avic_vcpu_is_running(vcpu)) {
+ /*
+ * Signal the doorbell to tell hardware to inject the IRQ if the vCPU
+ * is in the guest. If the vCPU is not in the guest, hardware will
+ * automatically process AVIC interrupts at VMRUN.
+ */
+ if (vcpu->mode == IN_GUEST_MODE) {
int cpu = READ_ONCE(vcpu->cpu);
/*
if (cpu != get_cpu())
wrmsrl(SVM_AVIC_DOORBELL, kvm_cpu_get_apicid(cpu));
put_cpu();
- } else
+ } else {
+ /*
+ * Wake the vCPU if it was blocking. KVM will then detect the
+ * pending IRQ when checking if the vCPU has a wake event.
+ */
kvm_vcpu_wake_up(vcpu);
+ }
return 0;
}
int h_physical_id = kvm_cpu_get_apicid(cpu);
struct vcpu_svm *svm = to_svm(vcpu);
+ lockdep_assert_preemption_disabled();
+
/*
* Since the host physical APIC id is 8 bits,
* we can support host APIC ID upto 255.
if (WARN_ON(h_physical_id > AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK))
return;
+ /*
+ * No need to update anything if the vCPU is blocking, i.e. if the vCPU
+ * is being scheduled in after being preempted. The CPU entries in the
+ * Physical APIC table and IRTE are consumed iff IsRun{ning} is '1'.
+ * If the vCPU was migrated, its new CPU value will be stuffed when the
+ * vCPU unblocks.
+ */
+ if (kvm_vcpu_is_blocking(vcpu))
+ return;
+
entry = READ_ONCE(*(svm->avic_physical_id_cache));
WARN_ON(entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK);
entry &= ~AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK;
entry |= (h_physical_id & AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK);
-
- entry &= ~AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK;
- if (svm->avic_is_running)
- entry |= AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK;
+ entry |= AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK;
WRITE_ONCE(*(svm->avic_physical_id_cache), entry);
- avic_update_iommu_vcpu_affinity(vcpu, h_physical_id,
- svm->avic_is_running);
+ avic_update_iommu_vcpu_affinity(vcpu, h_physical_id, true);
}
void avic_vcpu_put(struct kvm_vcpu *vcpu)
u64 entry;
struct vcpu_svm *svm = to_svm(vcpu);
+ lockdep_assert_preemption_disabled();
+
entry = READ_ONCE(*(svm->avic_physical_id_cache));
- if (entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK)
- avic_update_iommu_vcpu_affinity(vcpu, -1, 0);
+
+ /* Nothing to do if IsRunning == '0' due to vCPU blocking. */
+ if (!(entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK))
+ return;
+
+ avic_update_iommu_vcpu_affinity(vcpu, -1, 0);
entry &= ~AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK;
WRITE_ONCE(*(svm->avic_physical_id_cache), entry);
}
-/*
- * This function is called during VCPU halt/unhalt.
- */
-static void avic_set_running(struct kvm_vcpu *vcpu, bool is_run)
+void avic_vcpu_blocking(struct kvm_vcpu *vcpu)
{
- struct vcpu_svm *svm = to_svm(vcpu);
- int cpu = get_cpu();
-
- WARN_ON(cpu != vcpu->cpu);
- svm->avic_is_running = is_run;
+ if (!kvm_vcpu_apicv_active(vcpu))
+ return;
- if (kvm_vcpu_apicv_active(vcpu)) {
- if (is_run)
- avic_vcpu_load(vcpu, cpu);
- else
- avic_vcpu_put(vcpu);
- }
- put_cpu();
+ preempt_disable();
+
+ /*
+ * Unload the AVIC when the vCPU is about to block, _before_
+ * the vCPU actually blocks.
+ *
+ * Any IRQs that arrive before IsRunning=0 will not cause an
+ * incomplete IPI vmexit on the source, therefore vIRR will also
+ * be checked by kvm_vcpu_check_block() before blocking. The
+ * memory barrier implicit in set_current_state orders writing
+ * IsRunning=0 before reading the vIRR. The processor needs a
+ * matching memory barrier on interrupt delivery between writing
+ * IRR and reading IsRunning; the lack of this barrier might be
+ * the cause of errata #1235).
+ */
+ avic_vcpu_put(vcpu);
+
+ preempt_enable();
}
-void svm_vcpu_blocking(struct kvm_vcpu *vcpu)
+void avic_vcpu_unblocking(struct kvm_vcpu *vcpu)
{
- avic_set_running(vcpu, false);
-}
+ int cpu;
-void svm_vcpu_unblocking(struct kvm_vcpu *vcpu)
-{
- if (kvm_check_request(KVM_REQ_APICV_UPDATE, vcpu))
- kvm_vcpu_update_apicv(vcpu);
- avic_set_running(vcpu, true);
+ if (!kvm_vcpu_apicv_active(vcpu))
+ return;
+
+ cpu = get_cpu();
+ WARN_ON(cpu != vcpu->cpu);
+
+ avic_vcpu_load(vcpu, cpu);
+
+ put_cpu();
}
{
struct kvm_vcpu *vcpu = pmu_to_vcpu(pmu);
- if (!pmu)
+ if (!enable_pmu)
return NULL;
switch (msr) {
static int lbrv = true;
module_param(lbrv, int, 0444);
-/* enable/disable PMU virtualization */
-bool pmu = true;
-module_param(pmu, bool, 0444);
-
static int tsc_scaling = true;
module_param(tsc_scaling, int, 0444);
}
}
-/*
- * The default MMIO mask is a single bit (excluding the present bit),
- * which could conflict with the memory encryption bit. Check for
- * memory encryption support and override the default MMIO mask if
- * memory encryption is enabled.
- */
-static __init void svm_adjust_mmio_mask(void)
-{
- unsigned int enc_bit, mask_bit;
- u64 msr, mask;
-
- /* If there is no memory encryption support, use existing mask */
- if (cpuid_eax(0x80000000) < 0x8000001f)
- return;
-
- /* If memory encryption is not enabled, use existing mask */
- rdmsrl(MSR_AMD64_SYSCFG, msr);
- if (!(msr & MSR_AMD64_SYSCFG_MEM_ENCRYPT))
- return;
-
- enc_bit = cpuid_ebx(0x8000001f) & 0x3f;
- mask_bit = boot_cpu_data.x86_phys_bits;
-
- /* Increment the mask bit if it is the same as the encryption bit */
- if (enc_bit == mask_bit)
- mask_bit++;
-
- /*
- * If the mask bit location is below 52, then some bits above the
- * physical addressing limit will always be reserved, so use the
- * rsvd_bits() function to generate the mask. This mask, along with
- * the present bit, will be used to generate a page fault with
- * PFER.RSV = 1.
- *
- * If the mask bit location is 52 (or above), then clear the mask.
- */
- mask = (mask_bit < 52) ? rsvd_bits(mask_bit, 51) | PT_PRESENT_MASK : 0;
-
- kvm_mmu_set_mmio_spte_mask(mask, mask, PT_WRITABLE_MASK | PT_USER_MASK);
-}
-
static void svm_hardware_teardown(void)
{
int cpu;
iopm_base = 0;
}
-static __init void svm_set_cpu_caps(void)
-{
- kvm_set_cpu_caps();
-
- supported_xss = 0;
-
- /* CPUID 0x80000001 and 0x8000000A (SVM features) */
- if (nested) {
- kvm_cpu_cap_set(X86_FEATURE_SVM);
-
- if (nrips)
- kvm_cpu_cap_set(X86_FEATURE_NRIPS);
-
- if (npt_enabled)
- kvm_cpu_cap_set(X86_FEATURE_NPT);
-
- if (tsc_scaling)
- kvm_cpu_cap_set(X86_FEATURE_TSCRATEMSR);
-
- /* Nested VM can receive #VMEXIT instead of triggering #GP */
- kvm_cpu_cap_set(X86_FEATURE_SVME_ADDR_CHK);
- }
-
- /* CPUID 0x80000008 */
- if (boot_cpu_has(X86_FEATURE_LS_CFG_SSBD) ||
- boot_cpu_has(X86_FEATURE_AMD_SSBD))
- kvm_cpu_cap_set(X86_FEATURE_VIRT_SSBD);
-
- /* AMD PMU PERFCTR_CORE CPUID */
- if (pmu && boot_cpu_has(X86_FEATURE_PERFCTR_CORE))
- kvm_cpu_cap_set(X86_FEATURE_PERFCTR_CORE);
-
- /* CPUID 0x8000001F (SME/SEV features) */
- sev_set_cpu_caps();
-}
-
-static __init int svm_hardware_setup(void)
-{
- int cpu;
- struct page *iopm_pages;
- void *iopm_va;
- int r;
- unsigned int order = get_order(IOPM_SIZE);
-
- /*
- * NX is required for shadow paging and for NPT if the NX huge pages
- * mitigation is enabled.
- */
- if (!boot_cpu_has(X86_FEATURE_NX)) {
- pr_err_ratelimited("NX (Execute Disable) not supported\n");
- return -EOPNOTSUPP;
- }
- kvm_enable_efer_bits(EFER_NX);
-
- iopm_pages = alloc_pages(GFP_KERNEL, order);
-
- if (!iopm_pages)
- return -ENOMEM;
-
- iopm_va = page_address(iopm_pages);
- memset(iopm_va, 0xff, PAGE_SIZE * (1 << order));
- iopm_base = page_to_pfn(iopm_pages) << PAGE_SHIFT;
-
- init_msrpm_offsets();
-
- supported_xcr0 &= ~(XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR);
-
- if (boot_cpu_has(X86_FEATURE_FXSR_OPT))
- kvm_enable_efer_bits(EFER_FFXSR);
-
- if (tsc_scaling) {
- if (!boot_cpu_has(X86_FEATURE_TSCRATEMSR)) {
- tsc_scaling = false;
- } else {
- pr_info("TSC scaling supported\n");
- kvm_has_tsc_control = true;
- kvm_max_tsc_scaling_ratio = TSC_RATIO_MAX;
- kvm_tsc_scaling_ratio_frac_bits = 32;
- }
- }
-
- tsc_aux_uret_slot = kvm_add_user_return_msr(MSR_TSC_AUX);
-
- /* Check for pause filtering support */
- if (!boot_cpu_has(X86_FEATURE_PAUSEFILTER)) {
- pause_filter_count = 0;
- pause_filter_thresh = 0;
- } else if (!boot_cpu_has(X86_FEATURE_PFTHRESHOLD)) {
- pause_filter_thresh = 0;
- }
-
- if (nested) {
- printk(KERN_INFO "kvm: Nested Virtualization enabled\n");
- kvm_enable_efer_bits(EFER_SVME | EFER_LMSLE);
- }
-
- /*
- * KVM's MMU doesn't support using 2-level paging for itself, and thus
- * NPT isn't supported if the host is using 2-level paging since host
- * CR4 is unchanged on VMRUN.
- */
- if (!IS_ENABLED(CONFIG_X86_64) && !IS_ENABLED(CONFIG_X86_PAE))
- npt_enabled = false;
-
- if (!boot_cpu_has(X86_FEATURE_NPT))
- npt_enabled = false;
-
- /* Force VM NPT level equal to the host's paging level */
- kvm_configure_mmu(npt_enabled, get_npt_level(),
- get_npt_level(), PG_LEVEL_1G);
- pr_info("kvm: Nested Paging %sabled\n", npt_enabled ? "en" : "dis");
-
- /* Note, SEV setup consumes npt_enabled. */
- sev_hardware_setup();
-
- svm_hv_hardware_setup();
-
- svm_adjust_mmio_mask();
-
- for_each_possible_cpu(cpu) {
- r = svm_cpu_init(cpu);
- if (r)
- goto err;
- }
-
- if (nrips) {
- if (!boot_cpu_has(X86_FEATURE_NRIPS))
- nrips = false;
- }
-
- enable_apicv = avic = avic && npt_enabled && boot_cpu_has(X86_FEATURE_AVIC);
-
- if (enable_apicv) {
- pr_info("AVIC enabled\n");
-
- amd_iommu_register_ga_log_notifier(&avic_ga_log_notifier);
- }
-
- if (vls) {
- if (!npt_enabled ||
- !boot_cpu_has(X86_FEATURE_V_VMSAVE_VMLOAD) ||
- !IS_ENABLED(CONFIG_X86_64)) {
- vls = false;
- } else {
- pr_info("Virtual VMLOAD VMSAVE supported\n");
- }
- }
-
- if (boot_cpu_has(X86_FEATURE_SVME_ADDR_CHK))
- svm_gp_erratum_intercept = false;
-
- if (vgif) {
- if (!boot_cpu_has(X86_FEATURE_VGIF))
- vgif = false;
- else
- pr_info("Virtual GIF supported\n");
- }
-
- if (lbrv) {
- if (!boot_cpu_has(X86_FEATURE_LBRV))
- lbrv = false;
- else
- pr_info("LBR virtualization supported\n");
- }
-
- if (!pmu)
- pr_info("PMU virtualization is disabled\n");
-
- svm_set_cpu_caps();
-
- /*
- * It seems that on AMD processors PTE's accessed bit is
- * being set by the CPU hardware before the NPF vmexit.
- * This is not expected behaviour and our tests fail because
- * of it.
- * A workaround here is to disable support for
- * GUEST_MAXPHYADDR < HOST_MAXPHYADDR if NPT is enabled.
- * In this case userspace can know if there is support using
- * KVM_CAP_SMALLER_MAXPHYADDR extension and decide how to handle
- * it
- * If future AMD CPU models change the behaviour described above,
- * this variable can be changed accordingly
- */
- allow_smaller_maxphyaddr = !npt_enabled;
-
- return 0;
-
-err:
- svm_hardware_teardown();
- return r;
-}
-
static void init_seg(struct vmcb_seg *seg)
{
seg->selector = 0;
if (err)
goto error_free_vmsa_page;
- /* We initialize this flag to true to make sure that the is_running
- * bit would be set the first time the vcpu is loaded.
- */
- if (irqchip_in_kernel(vcpu->kvm) && kvm_apicv_activated(vcpu->kvm))
- svm->avic_is_running = true;
-
svm->msrpm = svm_vcpu_alloc_msrpm();
if (!svm->msrpm) {
err = -ENOMEM;
svm_complete_interrupts(vcpu);
}
+static int svm_vcpu_pre_run(struct kvm_vcpu *vcpu)
+{
+ return 1;
+}
+
static fastpath_t svm_exit_handlers_fastpath(struct kvm_vcpu *vcpu)
{
if (to_svm(vcpu)->vmcb->control.exit_code == SVM_EXIT_MSR &&
.prepare_guest_switch = svm_prepare_guest_switch,
.vcpu_load = svm_vcpu_load,
.vcpu_put = svm_vcpu_put,
- .vcpu_blocking = svm_vcpu_blocking,
- .vcpu_unblocking = svm_vcpu_unblocking,
+ .vcpu_blocking = avic_vcpu_blocking,
+ .vcpu_unblocking = avic_vcpu_unblocking,
.update_exception_bitmap = svm_update_exception_bitmap,
.get_msr_feature = svm_get_msr_feature,
.tlb_flush_gva = svm_flush_tlb_gva,
.tlb_flush_guest = svm_flush_tlb,
+ .vcpu_pre_run = svm_vcpu_pre_run,
.run = svm_vcpu_run,
.handle_exit = handle_exit,
.skip_emulated_instruction = skip_emulated_instruction,
.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
};
+/*
+ * The default MMIO mask is a single bit (excluding the present bit),
+ * which could conflict with the memory encryption bit. Check for
+ * memory encryption support and override the default MMIO mask if
+ * memory encryption is enabled.
+ */
+static __init void svm_adjust_mmio_mask(void)
+{
+ unsigned int enc_bit, mask_bit;
+ u64 msr, mask;
+
+ /* If there is no memory encryption support, use existing mask */
+ if (cpuid_eax(0x80000000) < 0x8000001f)
+ return;
+
+ /* If memory encryption is not enabled, use existing mask */
+ rdmsrl(MSR_AMD64_SYSCFG, msr);
+ if (!(msr & MSR_AMD64_SYSCFG_MEM_ENCRYPT))
+ return;
+
+ enc_bit = cpuid_ebx(0x8000001f) & 0x3f;
+ mask_bit = boot_cpu_data.x86_phys_bits;
+
+ /* Increment the mask bit if it is the same as the encryption bit */
+ if (enc_bit == mask_bit)
+ mask_bit++;
+
+ /*
+ * If the mask bit location is below 52, then some bits above the
+ * physical addressing limit will always be reserved, so use the
+ * rsvd_bits() function to generate the mask. This mask, along with
+ * the present bit, will be used to generate a page fault with
+ * PFER.RSV = 1.
+ *
+ * If the mask bit location is 52 (or above), then clear the mask.
+ */
+ mask = (mask_bit < 52) ? rsvd_bits(mask_bit, 51) | PT_PRESENT_MASK : 0;
+
+ kvm_mmu_set_mmio_spte_mask(mask, mask, PT_WRITABLE_MASK | PT_USER_MASK);
+}
+
+static __init void svm_set_cpu_caps(void)
+{
+ kvm_set_cpu_caps();
+
+ supported_xss = 0;
+
+ /* CPUID 0x80000001 and 0x8000000A (SVM features) */
+ if (nested) {
+ kvm_cpu_cap_set(X86_FEATURE_SVM);
+
+ if (nrips)
+ kvm_cpu_cap_set(X86_FEATURE_NRIPS);
+
+ if (npt_enabled)
+ kvm_cpu_cap_set(X86_FEATURE_NPT);
+
+ if (tsc_scaling)
+ kvm_cpu_cap_set(X86_FEATURE_TSCRATEMSR);
+
+ /* Nested VM can receive #VMEXIT instead of triggering #GP */
+ kvm_cpu_cap_set(X86_FEATURE_SVME_ADDR_CHK);
+ }
+
+ /* CPUID 0x80000008 */
+ if (boot_cpu_has(X86_FEATURE_LS_CFG_SSBD) ||
+ boot_cpu_has(X86_FEATURE_AMD_SSBD))
+ kvm_cpu_cap_set(X86_FEATURE_VIRT_SSBD);
+
+ /* AMD PMU PERFCTR_CORE CPUID */
+ if (enable_pmu && boot_cpu_has(X86_FEATURE_PERFCTR_CORE))
+ kvm_cpu_cap_set(X86_FEATURE_PERFCTR_CORE);
+
+ /* CPUID 0x8000001F (SME/SEV features) */
+ sev_set_cpu_caps();
+}
+
+static __init int svm_hardware_setup(void)
+{
+ int cpu;
+ struct page *iopm_pages;
+ void *iopm_va;
+ int r;
+ unsigned int order = get_order(IOPM_SIZE);
+
+ /*
+ * NX is required for shadow paging and for NPT if the NX huge pages
+ * mitigation is enabled.
+ */
+ if (!boot_cpu_has(X86_FEATURE_NX)) {
+ pr_err_ratelimited("NX (Execute Disable) not supported\n");
+ return -EOPNOTSUPP;
+ }
+ kvm_enable_efer_bits(EFER_NX);
+
+ iopm_pages = alloc_pages(GFP_KERNEL, order);
+
+ if (!iopm_pages)
+ return -ENOMEM;
+
+ iopm_va = page_address(iopm_pages);
+ memset(iopm_va, 0xff, PAGE_SIZE * (1 << order));
+ iopm_base = page_to_pfn(iopm_pages) << PAGE_SHIFT;
+
+ init_msrpm_offsets();
+
+ supported_xcr0 &= ~(XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR);
+
+ if (boot_cpu_has(X86_FEATURE_FXSR_OPT))
+ kvm_enable_efer_bits(EFER_FFXSR);
+
+ if (tsc_scaling) {
+ if (!boot_cpu_has(X86_FEATURE_TSCRATEMSR)) {
+ tsc_scaling = false;
+ } else {
+ pr_info("TSC scaling supported\n");
+ kvm_has_tsc_control = true;
+ kvm_max_tsc_scaling_ratio = TSC_RATIO_MAX;
+ kvm_tsc_scaling_ratio_frac_bits = 32;
+ }
+ }
+
+ tsc_aux_uret_slot = kvm_add_user_return_msr(MSR_TSC_AUX);
+
+ /* Check for pause filtering support */
+ if (!boot_cpu_has(X86_FEATURE_PAUSEFILTER)) {
+ pause_filter_count = 0;
+ pause_filter_thresh = 0;
+ } else if (!boot_cpu_has(X86_FEATURE_PFTHRESHOLD)) {
+ pause_filter_thresh = 0;
+ }
+
+ if (nested) {
+ printk(KERN_INFO "kvm: Nested Virtualization enabled\n");
+ kvm_enable_efer_bits(EFER_SVME | EFER_LMSLE);
+ }
+
+ /*
+ * KVM's MMU doesn't support using 2-level paging for itself, and thus
+ * NPT isn't supported if the host is using 2-level paging since host
+ * CR4 is unchanged on VMRUN.
+ */
+ if (!IS_ENABLED(CONFIG_X86_64) && !IS_ENABLED(CONFIG_X86_PAE))
+ npt_enabled = false;
+
+ if (!boot_cpu_has(X86_FEATURE_NPT))
+ npt_enabled = false;
+
+ /* Force VM NPT level equal to the host's paging level */
+ kvm_configure_mmu(npt_enabled, get_npt_level(),
+ get_npt_level(), PG_LEVEL_1G);
+ pr_info("kvm: Nested Paging %sabled\n", npt_enabled ? "en" : "dis");
+
+ /* Note, SEV setup consumes npt_enabled. */
+ sev_hardware_setup();
+
+ svm_hv_hardware_setup();
+
+ svm_adjust_mmio_mask();
+
+ for_each_possible_cpu(cpu) {
+ r = svm_cpu_init(cpu);
+ if (r)
+ goto err;
+ }
+
+ if (nrips) {
+ if (!boot_cpu_has(X86_FEATURE_NRIPS))
+ nrips = false;
+ }
+
+ enable_apicv = avic = avic && npt_enabled && boot_cpu_has(X86_FEATURE_AVIC);
+
+ if (enable_apicv) {
+ pr_info("AVIC enabled\n");
+
+ amd_iommu_register_ga_log_notifier(&avic_ga_log_notifier);
+ } else {
+ svm_x86_ops.vcpu_blocking = NULL;
+ svm_x86_ops.vcpu_unblocking = NULL;
+ }
+
+ if (vls) {
+ if (!npt_enabled ||
+ !boot_cpu_has(X86_FEATURE_V_VMSAVE_VMLOAD) ||
+ !IS_ENABLED(CONFIG_X86_64)) {
+ vls = false;
+ } else {
+ pr_info("Virtual VMLOAD VMSAVE supported\n");
+ }
+ }
+
+ if (boot_cpu_has(X86_FEATURE_SVME_ADDR_CHK))
+ svm_gp_erratum_intercept = false;
+
+ if (vgif) {
+ if (!boot_cpu_has(X86_FEATURE_VGIF))
+ vgif = false;
+ else
+ pr_info("Virtual GIF supported\n");
+ }
+
+ if (lbrv) {
+ if (!boot_cpu_has(X86_FEATURE_LBRV))
+ lbrv = false;
+ else
+ pr_info("LBR virtualization supported\n");
+ }
+
+ if (!enable_pmu)
+ pr_info("PMU virtualization is disabled\n");
+
+ svm_set_cpu_caps();
+
+ /*
+ * It seems that on AMD processors PTE's accessed bit is
+ * being set by the CPU hardware before the NPF vmexit.
+ * This is not expected behaviour and our tests fail because
+ * of it.
+ * A workaround here is to disable support for
+ * GUEST_MAXPHYADDR < HOST_MAXPHYADDR if NPT is enabled.
+ * In this case userspace can know if there is support using
+ * KVM_CAP_SMALLER_MAXPHYADDR extension and decide how to handle
+ * it
+ * If future AMD CPU models change the behaviour described above,
+ * this variable can be changed accordingly
+ */
+ allow_smaller_maxphyaddr = !npt_enabled;
+
+ return 0;
+
+err:
+ svm_hardware_teardown();
+ return r;
+}
+
+
static struct kvm_x86_init_ops svm_init_ops __initdata = {
.cpu_has_kvm_support = has_svm,
.disabled_by_bios = is_disabled,
extern u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly;
extern bool npt_enabled;
extern bool intercept_smi;
-extern bool pmu;
/*
* Clean bits in VMCB.
u32 dfr_reg;
struct page *avic_backing_page;
u64 *avic_physical_id_cache;
- bool avic_is_running;
/*
* Per-vcpu list of struct amd_svm_iommu_ir:
#define VMCB_AVIC_APIC_BAR_MASK 0xFFFFFFFFFF000ULL
-static inline bool avic_vcpu_is_running(struct kvm_vcpu *vcpu)
-{
- struct vcpu_svm *svm = to_svm(vcpu);
- u64 *entry = svm->avic_physical_id_cache;
-
- if (!entry)
- return false;
-
- return (READ_ONCE(*entry) & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK);
-}
-
int avic_ga_log_notifier(u32 ga_tag);
void avic_vm_destroy(struct kvm *kvm);
int avic_vm_init(struct kvm *kvm);
bool svm_dy_apicv_has_pending_interrupt(struct kvm_vcpu *vcpu);
int svm_update_pi_irte(struct kvm *kvm, unsigned int host_irq,
uint32_t guest_irq, bool set);
-void svm_vcpu_blocking(struct kvm_vcpu *vcpu);
-void svm_vcpu_unblocking(struct kvm_vcpu *vcpu);
+void avic_vcpu_blocking(struct kvm_vcpu *vcpu);
+void avic_vcpu_unblocking(struct kvm_vcpu *vcpu);
/* sev.c */
#include <asm/vmx.h>
#include "lapic.h"
+#include "x86.h"
extern bool __read_mostly enable_vpid;
extern bool __read_mostly flexpriority_enabled;
{
u64 perf_cap = 0;
+ if (!enable_pmu)
+ return perf_cap;
+
if (boot_cpu_has(X86_FEATURE_PDCM))
rdmsrl(MSR_IA32_PERF_CAPABILITIES, perf_cap);
#define MSR_PMC_FULL_WIDTH_BIT (MSR_IA32_PMC0 - MSR_IA32_PERFCTR0)
static struct kvm_event_hw_type_mapping intel_arch_events[] = {
- /* Index must match CPUID 0x0A.EBX bit vector */
[0] = { 0x3c, 0x00, PERF_COUNT_HW_CPU_CYCLES },
[1] = { 0xc0, 0x00, PERF_COUNT_HW_INSTRUCTIONS },
[2] = { 0x3c, 0x01, PERF_COUNT_HW_BUS_CYCLES },
[4] = { 0x2e, 0x41, PERF_COUNT_HW_CACHE_MISSES },
[5] = { 0xc4, 0x00, PERF_COUNT_HW_BRANCH_INSTRUCTIONS },
[6] = { 0xc5, 0x00, PERF_COUNT_HW_BRANCH_MISSES },
+ /* The above index must match CPUID 0x0A.EBX bit vector */
[7] = { 0x00, 0x03, PERF_COUNT_HW_REF_CPU_CYCLES },
};
u8 unit_mask = (pmc->eventsel & ARCH_PERFMON_EVENTSEL_UMASK) >> 8;
int i;
- for (i = 0; i < ARRAY_SIZE(intel_arch_events); i++)
- if (intel_arch_events[i].eventsel == event_select &&
- intel_arch_events[i].unit_mask == unit_mask &&
- (pmc_is_fixed(pmc) || pmu->available_event_types & (1 << i)))
- break;
+ for (i = 0; i < ARRAY_SIZE(intel_arch_events); i++) {
+ if (intel_arch_events[i].eventsel != event_select ||
+ intel_arch_events[i].unit_mask != unit_mask)
+ continue;
+
+ /* disable event that reported as not present by cpuid */
+ if ((i < 7) && !(pmu->available_event_types & (1 << i)))
+ return PERF_COUNT_HW_MAX + 1;
+
+ break;
+ }
if (i == ARRAY_SIZE(intel_arch_events))
return PERF_COUNT_HW_MAX;
pmu->reserved_bits = 0xffffffff00200000ull;
entry = kvm_find_cpuid_entry(vcpu, 0xa, 0);
- if (!entry)
+ if (!entry || !enable_pmu)
return;
eax.full = entry->eax;
edx.full = entry->edx;
* wake the target vCPUs. vCPUs are removed from the list and the notification
* vector is reset when the vCPU is scheduled in.
*/
-static DEFINE_PER_CPU(struct list_head, blocked_vcpu_on_cpu);
+static DEFINE_PER_CPU(struct list_head, wakeup_vcpus_on_cpu);
/*
* Protect the per-CPU list with a per-CPU spinlock to handle task migration.
* When a blocking vCPU is awakened _and_ migrated to a different pCPU, the
* CPU. IRQs must be disabled when taking this lock, otherwise deadlock will
* occur if a wakeup IRQ arrives and attempts to acquire the lock.
*/
-static DEFINE_PER_CPU(spinlock_t, blocked_vcpu_on_cpu_lock);
+static DEFINE_PER_CPU(raw_spinlock_t, wakeup_vcpus_on_cpu_lock);
static inline struct pi_desc *vcpu_to_pi_desc(struct kvm_vcpu *vcpu)
{
void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu)
{
struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
+ struct vcpu_vmx *vmx = to_vmx(vcpu);
struct pi_desc old, new;
+ unsigned long flags;
unsigned int dest;
/*
if (!enable_apicv || !lapic_in_kernel(vcpu))
return;
- /* Nothing to do if PI.SN and PI.NDST both have the desired value. */
- if (!pi_test_sn(pi_desc) && vcpu->cpu == cpu)
+ /*
+ * If the vCPU wasn't on the wakeup list and wasn't migrated, then the
+ * full update can be skipped as neither the vector nor the destination
+ * needs to be changed.
+ */
+ if (pi_desc->nv != POSTED_INTR_WAKEUP_VECTOR && vcpu->cpu == cpu) {
+ /*
+ * Clear SN if it was set due to being preempted. Again, do
+ * this even if there is no assigned device for simplicity.
+ */
+ if (pi_test_and_clear_sn(pi_desc))
+ goto after_clear_sn;
return;
+ }
+
+ local_irq_save(flags);
/*
- * If the 'nv' field is POSTED_INTR_WAKEUP_VECTOR, do not change
- * PI.NDST: pi_post_block is the one expected to change PID.NDST and the
- * wakeup handler expects the vCPU to be on the blocked_vcpu_list that
- * matches PI.NDST. Otherwise, a vcpu may not be able to be woken up
- * correctly.
+ * If the vCPU was waiting for wakeup, remove the vCPU from the wakeup
+ * list of the _previous_ pCPU, which will not be the same as the
+ * current pCPU if the task was migrated.
*/
- if (pi_desc->nv == POSTED_INTR_WAKEUP_VECTOR || vcpu->cpu == cpu) {
- pi_clear_sn(pi_desc);
- goto after_clear_sn;
+ if (pi_desc->nv == POSTED_INTR_WAKEUP_VECTOR) {
+ raw_spin_lock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
+ list_del(&vmx->pi_wakeup_list);
+ raw_spin_unlock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
}
- /* The full case. Set the new destination and clear SN. */
dest = cpu_physical_id(cpu);
if (!x2apic_mode)
dest = (dest << 8) & 0xFF00;
do {
old.control = new.control = READ_ONCE(pi_desc->control);
+ /*
+ * Clear SN (as above) and refresh the destination APIC ID to
+ * handle task migration (@cpu != vcpu->cpu).
+ */
new.ndst = dest;
new.sn = 0;
+
+ /*
+ * Restore the notification vector; in the blocking case, the
+ * descriptor was modified on "put" to use the wakeup vector.
+ */
+ new.nv = POSTED_INTR_VECTOR;
} while (pi_try_set_control(pi_desc, old.control, new.control));
+ local_irq_restore(flags);
+
after_clear_sn:
/*
irq_remapping_cap(IRQ_POSTING_CAP);
}
-void vmx_vcpu_pi_put(struct kvm_vcpu *vcpu)
-{
- struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
-
- if (!vmx_can_use_vtd_pi(vcpu->kvm))
- return;
-
- /* Set SN when the vCPU is preempted */
- if (vcpu->preempted)
- pi_set_sn(pi_desc);
-}
-
-static void __pi_post_block(struct kvm_vcpu *vcpu)
-{
- struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
- struct pi_desc old, new;
- unsigned int dest;
-
- /*
- * Remove the vCPU from the wakeup list of the _previous_ pCPU, which
- * will not be the same as the current pCPU if the task was migrated.
- */
- spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->pre_pcpu));
- list_del(&vcpu->blocked_vcpu_list);
- spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->pre_pcpu));
-
- dest = cpu_physical_id(vcpu->cpu);
- if (!x2apic_mode)
- dest = (dest << 8) & 0xFF00;
-
- WARN(pi_desc->nv != POSTED_INTR_WAKEUP_VECTOR,
- "Wakeup handler not enabled while the vCPU was blocking");
-
- do {
- old.control = new.control = READ_ONCE(pi_desc->control);
-
- new.ndst = dest;
-
- /* set 'NV' to 'notification vector' */
- new.nv = POSTED_INTR_VECTOR;
- } while (pi_try_set_control(pi_desc, old.control, new.control));
-
- vcpu->pre_pcpu = -1;
-}
-
/*
- * This routine does the following things for vCPU which is going
- * to be blocked if VT-d PI is enabled.
- * - Store the vCPU to the wakeup list, so when interrupts happen
- * we can find the right vCPU to wake up.
- * - Change the Posted-interrupt descriptor as below:
- * 'NV' <-- POSTED_INTR_WAKEUP_VECTOR
- * - If 'ON' is set during this process, which means at least one
- * interrupt is posted for this vCPU, we cannot block it, in
- * this case, return 1, otherwise, return 0.
- *
+ * Put the vCPU on this pCPU's list of vCPUs that needs to be awakened and set
+ * WAKEUP as the notification vector in the PI descriptor.
*/
-int pi_pre_block(struct kvm_vcpu *vcpu)
+static void pi_enable_wakeup_handler(struct kvm_vcpu *vcpu)
{
- struct pi_desc old, new;
struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
+ struct vcpu_vmx *vmx = to_vmx(vcpu);
+ struct pi_desc old, new;
unsigned long flags;
- if (!vmx_can_use_vtd_pi(vcpu->kvm) ||
- vmx_interrupt_blocked(vcpu))
- return 0;
-
local_irq_save(flags);
- vcpu->pre_pcpu = vcpu->cpu;
- spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->cpu));
- list_add_tail(&vcpu->blocked_vcpu_list,
- &per_cpu(blocked_vcpu_on_cpu, vcpu->cpu));
- spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->cpu));
+ raw_spin_lock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
+ list_add_tail(&vmx->pi_wakeup_list,
+ &per_cpu(wakeup_vcpus_on_cpu, vcpu->cpu));
+ raw_spin_unlock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
- WARN(pi_desc->sn == 1,
- "Posted Interrupt Suppress Notification set before blocking");
+ WARN(pi_desc->sn, "PI descriptor SN field set before blocking");
do {
old.control = new.control = READ_ONCE(pi_desc->control);
new.nv = POSTED_INTR_WAKEUP_VECTOR;
} while (pi_try_set_control(pi_desc, old.control, new.control));
- /* We should not block the vCPU if an interrupt is posted for it. */
- if (pi_test_on(pi_desc))
- __pi_post_block(vcpu);
+ /*
+ * Send a wakeup IPI to this CPU if an interrupt may have been posted
+ * before the notification vector was updated, in which case the IRQ
+ * will arrive on the non-wakeup vector. An IPI is needed as calling
+ * try_to_wake_up() from ->sched_out() isn't allowed (IRQs are not
+ * enabled until it is safe to call try_to_wake_up() on the task being
+ * scheduled out).
+ */
+ if (pi_test_on(&new))
+ apic->send_IPI_self(POSTED_INTR_WAKEUP_VECTOR);
local_irq_restore(flags);
- return (vcpu->pre_pcpu == -1);
}
-void pi_post_block(struct kvm_vcpu *vcpu)
+void vmx_vcpu_pi_put(struct kvm_vcpu *vcpu)
{
- unsigned long flags;
+ struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
- if (vcpu->pre_pcpu == -1)
+ if (!vmx_can_use_vtd_pi(vcpu->kvm))
return;
- local_irq_save(flags);
- __pi_post_block(vcpu);
- local_irq_restore(flags);
+ if (kvm_vcpu_is_blocking(vcpu) && !vmx_interrupt_blocked(vcpu))
+ pi_enable_wakeup_handler(vcpu);
+
+ /*
+ * Set SN when the vCPU is preempted. Note, the vCPU can both be seen
+ * as blocking and preempted, e.g. if it's preempted between setting
+ * its wait state and manually scheduling out.
+ */
+ if (vcpu->preempted)
+ pi_set_sn(pi_desc);
}
/*
*/
void pi_wakeup_handler(void)
{
- struct kvm_vcpu *vcpu;
int cpu = smp_processor_id();
+ struct vcpu_vmx *vmx;
- spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
- list_for_each_entry(vcpu, &per_cpu(blocked_vcpu_on_cpu, cpu),
- blocked_vcpu_list) {
- struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
+ raw_spin_lock(&per_cpu(wakeup_vcpus_on_cpu_lock, cpu));
+ list_for_each_entry(vmx, &per_cpu(wakeup_vcpus_on_cpu, cpu),
+ pi_wakeup_list) {
- if (pi_test_on(pi_desc))
- kvm_vcpu_kick(vcpu);
+ if (pi_test_on(&vmx->pi_desc))
+ kvm_vcpu_wake_up(&vmx->vcpu);
}
- spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
+ raw_spin_unlock(&per_cpu(wakeup_vcpus_on_cpu_lock, cpu));
}
void __init pi_init_cpu(int cpu)
{
- INIT_LIST_HEAD(&per_cpu(blocked_vcpu_on_cpu, cpu));
- spin_lock_init(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
+ INIT_LIST_HEAD(&per_cpu(wakeup_vcpus_on_cpu, cpu));
+ raw_spin_lock_init(&per_cpu(wakeup_vcpus_on_cpu_lock, cpu));
}
bool pi_has_pending_interrupt(struct kvm_vcpu *vcpu)
* Bail out of the block loop if the VM has an assigned
* device, but the blocking vCPU didn't reconfigure the
* PI.NV to the wakeup vector, i.e. the assigned device
- * came along after the initial check in pi_pre_block().
+ * came along after the initial check in vmx_vcpu_pi_put().
*/
void vmx_pi_start_assignment(struct kvm *kvm)
{
(unsigned long *)&pi_desc->control);
}
+static inline bool pi_test_and_clear_sn(struct pi_desc *pi_desc)
+{
+ return test_and_clear_bit(POSTED_INTR_SN,
+ (unsigned long *)&pi_desc->control);
+}
+
static inline bool pi_test_and_set_pir(int vector, struct pi_desc *pi_desc)
{
return test_and_set_bit(vector, (unsigned long *)pi_desc->pir);
void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu);
void vmx_vcpu_pi_put(struct kvm_vcpu *vcpu);
-int pi_pre_block(struct kvm_vcpu *vcpu);
-void pi_post_block(struct kvm_vcpu *vcpu);
void pi_wakeup_handler(void);
void __init pi_init_cpu(int cpu);
bool pi_has_pending_interrupt(struct kvm_vcpu *vcpu);
pt_update_intercept_for_msr(vcpu);
}
-static inline bool kvm_vcpu_trigger_posted_interrupt(struct kvm_vcpu *vcpu,
- bool nested)
+static inline void kvm_vcpu_trigger_posted_interrupt(struct kvm_vcpu *vcpu,
+ int pi_vec)
{
#ifdef CONFIG_SMP
- int pi_vec = nested ? POSTED_INTR_NESTED_VECTOR : POSTED_INTR_VECTOR;
-
if (vcpu->mode == IN_GUEST_MODE) {
/*
* The vector of interrupt to be delivered to vcpu had
*/
apic->send_IPI_mask(get_cpu_mask(vcpu->cpu), pi_vec);
- return true;
+ return;
}
#endif
- return false;
+ /*
+ * The vCPU isn't in the guest; wake the vCPU in case it is blocking,
+ * otherwise do nothing as KVM will grab the highest priority pending
+ * IRQ via ->sync_pir_to_irr() in vcpu_enter_guest().
+ */
+ kvm_vcpu_wake_up(vcpu);
}
static int vmx_deliver_nested_posted_interrupt(struct kvm_vcpu *vcpu,
smp_mb__after_atomic();
/* the PIR and ON have been set by L1. */
- if (!kvm_vcpu_trigger_posted_interrupt(vcpu, true))
- kvm_vcpu_kick(vcpu);
+ kvm_vcpu_trigger_posted_interrupt(vcpu, POSTED_INTR_NESTED_VECTOR);
return 0;
}
return -1;
* guaranteed to see PID.ON=1 and sync the PIR to IRR if triggering a
* posted interrupt "fails" because vcpu->mode != IN_GUEST_MODE.
*/
- if (!kvm_vcpu_trigger_posted_interrupt(vcpu, false))
- kvm_vcpu_kick(vcpu);
-
+ kvm_vcpu_trigger_posted_interrupt(vcpu, POSTED_INTR_VECTOR);
return 0;
}
return 1;
}
+static bool vmx_emulation_required_with_pending_exception(struct kvm_vcpu *vcpu)
+{
+ struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+ return vmx->emulation_required && !vmx->rmode.vm86_active &&
+ vcpu->arch.exception.pending;
+}
+
static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
if (!kvm_emulate_instruction(vcpu, 0))
return 0;
- if (vmx->emulation_required && !vmx->rmode.vm86_active &&
- vcpu->arch.exception.pending) {
+ if (vmx_emulation_required_with_pending_exception(vcpu)) {
kvm_prepare_emulation_failure_exit(vcpu);
return 0;
}
return 1;
}
+static int vmx_vcpu_pre_run(struct kvm_vcpu *vcpu)
+{
+ if (vmx_emulation_required_with_pending_exception(vcpu)) {
+ kvm_prepare_emulation_failure_exit(vcpu);
+ return 0;
+ }
+
+ return 1;
+}
+
static void grow_ple_window(struct kvm_vcpu *vcpu)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
BUILD_BUG_ON(offsetof(struct vcpu_vmx, vcpu) != 0);
vmx = to_vmx(vcpu);
+ INIT_LIST_HEAD(&vmx->pi_wakeup_list);
+
err = -ENOMEM;
vmx->vpid = allocate_vpid();
secondary_exec_controls_clearbit(vmx, SECONDARY_EXEC_ENABLE_PML);
}
-static int vmx_pre_block(struct kvm_vcpu *vcpu)
-{
- if (pi_pre_block(vcpu))
- return 1;
-
- if (kvm_lapic_hv_timer_in_use(vcpu))
- kvm_lapic_switch_to_sw_timer(vcpu);
-
- return 0;
-}
-
-static void vmx_post_block(struct kvm_vcpu *vcpu)
-{
- if (kvm_x86_ops.set_hv_timer)
- kvm_lapic_switch_to_hv_timer(vcpu);
-
- pi_post_block(vcpu);
-}
-
static void vmx_setup_mce(struct kvm_vcpu *vcpu)
{
if (vcpu->arch.mcg_cap & MCG_LMCE_P)
.tlb_flush_gva = vmx_flush_tlb_gva,
.tlb_flush_guest = vmx_flush_tlb_guest,
+ .vcpu_pre_run = vmx_vcpu_pre_run,
.run = vmx_vcpu_run,
.handle_exit = vmx_handle_exit,
.skip_emulated_instruction = vmx_skip_emulated_instruction,
.cpu_dirty_log_size = PML_ENTITY_NUM,
.update_cpu_dirty_logging = vmx_update_cpu_dirty_logging,
- .pre_block = vmx_pre_block,
- .post_block = vmx_post_block,
-
.pmu_ops = &intel_pmu_ops,
.nested_ops = &vmx_nested_ops,
/* Posted interrupt descriptor */
struct pi_desc pi_desc;
+ /* Used if this vCPU is waiting for PI notification wakeup. */
+ struct list_head pi_wakeup_list;
+
/* Support for a guest hypervisor (nested VMX) */
struct nested_vmx nested;
int __read_mostly pi_inject_timer = -1;
module_param(pi_inject_timer, bint, S_IRUGO | S_IWUSR);
+/* Enable/disable PMU virtualization */
+bool __read_mostly enable_pmu = true;
+EXPORT_SYMBOL_GPL(enable_pmu);
+module_param(enable_pmu, bool, 0444);
+
/*
* Restoring the host value for MSRs that are only consumed when running in
* usermode, e.g. SYSCALL MSRs and TSC_AUX, can be deferred until the CPU
struct kvm_cpuid __user *cpuid_arg = argp;
struct kvm_cpuid cpuid;
- /*
- * KVM does not correctly handle changing guest CPUID after KVM_RUN, as
- * MAXPHYADDR, GBPAGES support, AMD reserved bit behavior, etc.. aren't
- * tracked in kvm_mmu_page_role. As a result, KVM may miss guest page
- * faults due to reusing SPs/SPTEs. In practice no sane VMM mucks with
- * the core vCPU model on the fly, so fail.
- */
- r = -EINVAL;
- if (vcpu->arch.last_vmentry_cpu != -1)
- goto out;
-
r = -EFAULT;
if (copy_from_user(&cpuid, cpuid_arg, sizeof(cpuid)))
goto out;
struct kvm_cpuid2 __user *cpuid_arg = argp;
struct kvm_cpuid2 cpuid;
- /*
- * KVM_SET_CPUID{,2} after KVM_RUN is forbidded, see the comment in
- * KVM_SET_CPUID case above.
- */
- r = -EINVAL;
- if (vcpu->arch.last_vmentry_cpu != -1)
- goto out;
-
r = -EFAULT;
if (copy_from_user(&cpuid, cpuid_arg, sizeof(cpuid)))
goto out;
smp_mb__after_srcu_read_unlock();
/*
- * This handles the case where a posted interrupt was
- * notified with kvm_vcpu_kick. Assigned devices can
- * use the POSTED_INTR_VECTOR even if APICv is disabled,
- * so do it even if APICv is disabled on this vCPU.
+ * Process pending posted interrupts to handle the case where the
+ * notification IRQ arrived in the host, or was never sent (because the
+ * target vCPU wasn't running). Do this regardless of the vCPU's APICv
+ * status, KVM doesn't update assigned devices when APICv is inhibited,
+ * i.e. they can post interrupts even if APICv is temporarily disabled.
*/
if (kvm_lapic_enabled(vcpu))
static_call_cond(kvm_x86_sync_pir_to_irr)(vcpu);
static inline int vcpu_block(struct kvm *kvm, struct kvm_vcpu *vcpu)
{
- if (!kvm_arch_vcpu_runnable(vcpu) &&
- (!kvm_x86_ops.pre_block || static_call(kvm_x86_pre_block)(vcpu) == 0)) {
+ bool hv_timer;
+
+ if (!kvm_arch_vcpu_runnable(vcpu)) {
+ /*
+ * Switch to the software timer before halt-polling/blocking as
+ * the guest's timer may be a break event for the vCPU, and the
+ * hypervisor timer runs only when the CPU is in guest mode.
+ * Switch before halt-polling so that KVM recognizes an expired
+ * timer before blocking.
+ */
+ hv_timer = kvm_lapic_hv_timer_in_use(vcpu);
+ if (hv_timer)
+ kvm_lapic_switch_to_sw_timer(vcpu);
+
srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
if (vcpu->arch.mp_state == KVM_MP_STATE_HALTED)
kvm_vcpu_halt(vcpu);
kvm_vcpu_block(vcpu);
vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
- if (kvm_x86_ops.post_block)
- static_call(kvm_x86_post_block)(vcpu);
+ if (hv_timer)
+ kvm_lapic_switch_to_hv_timer(vcpu);
if (!kvm_check_request(KVM_REQ_UNHALT, vcpu))
return 1;
r = -EINTR;
goto out;
}
+ /*
+ * It should be impossible for the hypervisor timer to be in
+ * use before KVM has ever run the vCPU.
+ */
+ WARN_ON_ONCE(kvm_lapic_hv_timer_in_use(vcpu));
kvm_vcpu_block(vcpu);
if (kvm_apic_accept_events(vcpu) < 0) {
r = 0;
} else
WARN_ON(vcpu->arch.pio.count || vcpu->mmio_needed);
- if (kvm_run->immediate_exit)
+ if (kvm_run->immediate_exit) {
r = -EINTR;
- else
- r = vcpu_run(vcpu);
+ goto out;
+ }
+
+ r = static_call(kvm_x86_vcpu_pre_run)(vcpu);
+ if (r <= 0)
+ goto out;
+
+ r = vcpu_run(vcpu);
out:
kvm_put_guest_fpu(vcpu);
extern u64 supported_xcr0;
extern u64 host_xss;
extern u64 supported_xss;
+extern bool enable_pmu;
static inline bool kvm_mpx_supported(void)
{
config UML_X86
def_bool y
- select GENERIC_FIND_FIRST_BIT
config 64BIT
bool "64-bit kernel" if "$(SUBARCH)" = "x86"
# Core configuration.
# (Use VAR=<xtensa_config> to use another default compiler.)
-variant-y := $(patsubst "%",%,$(CONFIG_XTENSA_VARIANT_NAME))
+variant-y := $(CONFIG_XTENSA_VARIANT_NAME)
VARIANT = $(variant-y)
#
#
-BUILTIN_DTB_SOURCE := $(patsubst "%",%,$(CONFIG_BUILTIN_DTB_SOURCE)).dtb.o
-ifneq ($(CONFIG_BUILTIN_DTB_SOURCE),"")
-obj-$(CONFIG_OF) += $(BUILTIN_DTB_SOURCE)
-endif
+obj-$(CONFIG_OF) += $(addsuffix .dtb.o, $(CONFIG_BUILTIN_DTB_SOURCE))
# for CONFIG_OF_ALL_DTBS test
dtstree := $(srctree)/$(src)
#undef BIT_OP
#undef TEST_AND_BIT_OP
-#include <asm-generic/bitops/find.h>
#include <asm-generic/bitops/le.h>
#include <asm-generic/bitops/ext2-atomic-setbit.h>
static ssize_t proc_read_simdisk(struct file *file, char __user *buf,
size_t size, loff_t *ppos)
{
- struct simdisk *dev = PDE_DATA(file_inode(file));
+ struct simdisk *dev = pde_data(file_inode(file));
const char *s = dev->filename;
if (s) {
ssize_t n = simple_read_from_buffer(buf, size, ppos,
size_t count, loff_t *ppos)
{
char *tmp = memdup_user_nul(buf, count);
- struct simdisk *dev = PDE_DATA(file_inode(file));
+ struct simdisk *dev = pde_data(file_inode(file));
int err;
if (IS_ERR(tmp))
#include <linux/pseudo_fs.h>
#include <linux/uio.h>
#include <linux/namei.h>
-#include <linux/cleancache.h>
#include <linux/part_stat.h>
#include <linux/uaccess.h>
#include "../fs/internal.h"
lru_add_drain_all(); /* make sure all lru add caches are flushed */
invalidate_mapping_pages(mapping, 0, -1);
}
- /* 99% of the time, we don't need to flush the cleancache on the bdev.
- * But, for the strange corners, lets be cautious
- */
- cleancache_invalidate_inode(mapping);
}
EXPORT_SYMBOL(invalidate_bdev);
offset = new_size - done;
else
offset = 0;
- zero_user(bv.bv_page, offset, bv.bv_len - offset);
+ zero_user(bv.bv_page, bv.bv_offset + offset,
+ bv.bv_len - offset);
truncated = true;
}
done += bv.bv_len;
#include "blk-mq-sched.h"
#include "blk-mq-tag.h"
+/*
+ * Recalculate wakeup batch when tag is shared by hctx.
+ */
+static void blk_mq_update_wake_batch(struct blk_mq_tags *tags,
+ unsigned int users)
+{
+ if (!users)
+ return;
+
+ sbitmap_queue_recalculate_wake_batch(&tags->bitmap_tags,
+ users);
+ sbitmap_queue_recalculate_wake_batch(&tags->breserved_tags,
+ users);
+}
+
/*
* If a previously inactive queue goes active, bump the active user count.
* We need to do this before try to allocate driver tag, then even if fail
*/
bool __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
{
+ unsigned int users;
+
if (blk_mq_is_shared_tags(hctx->flags)) {
struct request_queue *q = hctx->queue;
- if (!test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags) &&
- !test_and_set_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags))
- atomic_inc(&hctx->tags->active_queues);
+ if (test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags) ||
+ test_and_set_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags)) {
+ return true;
+ }
} else {
- if (!test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state) &&
- !test_and_set_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
- atomic_inc(&hctx->tags->active_queues);
+ if (test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state) ||
+ test_and_set_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state)) {
+ return true;
+ }
}
+ users = atomic_inc_return(&hctx->tags->active_queues);
+
+ blk_mq_update_wake_batch(hctx->tags, users);
+
return true;
}
void __blk_mq_tag_idle(struct blk_mq_hw_ctx *hctx)
{
struct blk_mq_tags *tags = hctx->tags;
+ unsigned int users;
if (blk_mq_is_shared_tags(hctx->flags)) {
struct request_queue *q = hctx->queue;
return;
}
- atomic_dec(&tags->active_queues);
+ users = atomic_dec_return(&tags->active_queues);
+
+ blk_mq_update_wake_batch(tags, users);
blk_mq_tag_wakeup_all(tags, false);
}
bio = bio_clone_fast(bio_src, gfp_mask, bs);
if (!bio)
goto free_and_out;
+ bio->bi_bdev = rq->q->disk->part0;
if (bio_ctr && bio_ctr(bio, bio_src, data))
goto free_and_out;
static inline bool blk_mq_last_cpu_in_hctx(unsigned int cpu,
struct blk_mq_hw_ctx *hctx)
{
- if (cpumask_next_and(-1, hctx->cpumask, cpu_online_mask) != cpu)
+ if (cpumask_first_and(hctx->cpumask, cpu_online_mask) != cpu)
return false;
if (cpumask_next_and(cpu, hctx->cpumask, cpu_online_mask) < nr_cpu_ids)
return false;
bioset_exit(&q->bio_split);
+ if (blk_queue_has_srcu(q))
+ cleanup_srcu_struct(q->srcu);
+
ida_simple_remove(&blk_queue_ida, q->id);
call_rcu(&q->rcu_head, blk_free_queue_rcu);
}
kobject_uevent(&q->elevator->kobj, KOBJ_ADD);
mutex_unlock(&q->sysfs_lock);
- ret = 0;
unlock:
mutex_unlock(&q->sysfs_dir_lock);
SHOW_JIFFIES(deadline_prio_aging_expire_show, dd->prio_aging_expire);
SHOW_INT(deadline_writes_starved_show, dd->writes_starved);
SHOW_INT(deadline_front_merges_show, dd->front_merges);
-SHOW_INT(deadline_async_depth_show, dd->front_merges);
+SHOW_INT(deadline_async_depth_show, dd->async_depth);
SHOW_INT(deadline_fifo_batch_show, dd->fifo_batch);
#undef SHOW_INT
#undef SHOW_JIFFIES
STORE_JIFFIES(deadline_prio_aging_expire_store, &dd->prio_aging_expire, 0, INT_MAX);
STORE_INT(deadline_writes_starved_store, &dd->writes_starved, INT_MIN, INT_MAX);
STORE_INT(deadline_front_merges_store, &dd->front_merges, 0, 1);
-STORE_INT(deadline_async_depth_store, &dd->front_merges, 1, INT_MAX);
+STORE_INT(deadline_async_depth_store, &dd->async_depth, 1, INT_MAX);
STORE_INT(deadline_fifo_batch_store, &dd->fifo_batch, 0, INT_MAX);
#undef STORE_FUNCTION
#undef STORE_INT
# SPDX-License-Identifier: GPL-2.0-only
+/extract-cert
/x509_certificate_list
/x509_revocation_list
choice
prompt "Type of module signing key to be generated"
- default MODULE_SIG_KEY_TYPE_RSA
+ depends on MODULE_SIG || (IMA_APPRAISE_MODSIG && MODULES)
help
The type of module signing key type to generate. This option
does not apply if a #PKCS11 URI is used.
config MODULE_SIG_KEY_TYPE_RSA
bool "RSA"
- depends on MODULE_SIG || (IMA_APPRAISE_MODSIG && MODULES)
help
Use an RSA key for module signing.
config MODULE_SIG_KEY_TYPE_ECDSA
bool "ECDSA"
select CRYPTO_ECDSA
- depends on MODULE_SIG || (IMA_APPRAISE_MODSIG && MODULES)
help
Use an elliptic curve key (NIST P384) for module signing. Consider
using a strong hash like sha256 or sha384 for hashing modules.
obj-$(CONFIG_SYSTEM_TRUSTED_KEYRING) += system_keyring.o system_certificates.o common.o
obj-$(CONFIG_SYSTEM_BLACKLIST_KEYRING) += blacklist.o common.o
obj-$(CONFIG_SYSTEM_REVOCATION_LIST) += revocation_certificates.o
-ifneq ($(CONFIG_SYSTEM_BLACKLIST_HASH_LIST),"")
+ifneq ($(CONFIG_SYSTEM_BLACKLIST_HASH_LIST),)
obj-$(CONFIG_SYSTEM_BLACKLIST_KEYRING) += blacklist_hashes.o
else
obj-$(CONFIG_SYSTEM_BLACKLIST_KEYRING) += blacklist_nohashes.o
endif
-ifeq ($(CONFIG_SYSTEM_TRUSTED_KEYRING),y)
+quiet_cmd_extract_certs = CERT $@
+ cmd_extract_certs = $(obj)/extract-cert $(2) $@
-$(eval $(call config_filename,SYSTEM_TRUSTED_KEYS))
-
-# GCC doesn't include .incbin files in -MD generated dependencies (PR#66871)
$(obj)/system_certificates.o: $(obj)/x509_certificate_list
-# Cope with signing_key.x509 existing in $(srctree) not $(objtree)
-AFLAGS_system_certificates.o := -I$(srctree)
-
-quiet_cmd_extract_certs = EXTRACT_CERTS $(patsubst "%",%,$(2))
- cmd_extract_certs = scripts/extract-cert $(2) $@
+$(obj)/x509_certificate_list: $(CONFIG_SYSTEM_TRUSTED_KEYS) $(obj)/extract-cert FORCE
+ $(call if_changed,extract_certs,$(if $(CONFIG_SYSTEM_TRUSTED_KEYS),$<,""))
targets += x509_certificate_list
-$(obj)/x509_certificate_list: scripts/extract-cert $(SYSTEM_TRUSTED_KEYS_SRCPREFIX)$(SYSTEM_TRUSTED_KEYS_FILENAME) FORCE
- $(call if_changed,extract_certs,$(SYSTEM_TRUSTED_KEYS_SRCPREFIX)$(CONFIG_SYSTEM_TRUSTED_KEYS))
-endif # CONFIG_SYSTEM_TRUSTED_KEYRING
-
-clean-files := x509_certificate_list .x509.list x509_revocation_list
ifeq ($(CONFIG_MODULE_SIG),y)
SIGN_KEY = y
# fail and that the kernel may be used afterwards.
#
###############################################################################
-ifndef CONFIG_MODULE_SIG_HASH
-$(error Could not determine digest type to use from kernel config)
-endif
-
-redirect_openssl = 2>&1
-quiet_redirect_openssl = 2>&1
-silent_redirect_openssl = 2>/dev/null
-openssl_available = $(shell openssl help 2>/dev/null && echo yes)
# We do it this way rather than having a boolean option for enabling an
# external private key, because 'make randconfig' might enable such a
# boolean option and we unfortunately can't make it depend on !RANDCONFIG.
-ifeq ($(CONFIG_MODULE_SIG_KEY),"certs/signing_key.pem")
+ifeq ($(CONFIG_MODULE_SIG_KEY),certs/signing_key.pem)
-ifeq ($(openssl_available),yes)
-X509TEXT=$(shell openssl x509 -in "certs/signing_key.pem" -text 2>/dev/null)
-endif
+keytype-$(CONFIG_MODULE_SIG_KEY_TYPE_ECDSA) := -newkey ec -pkeyopt ec_paramgen_curve:secp384r1
-# Support user changing key type
-ifdef CONFIG_MODULE_SIG_KEY_TYPE_ECDSA
-keytype_openssl = -newkey ec -pkeyopt ec_paramgen_curve:secp384r1
-ifeq ($(openssl_available),yes)
-$(if $(findstring id-ecPublicKey,$(X509TEXT)),,$(shell rm -f "certs/signing_key.pem"))
-endif
-endif # CONFIG_MODULE_SIG_KEY_TYPE_ECDSA
+quiet_cmd_gen_key = GENKEY $@
+ cmd_gen_key = openssl req -new -nodes -utf8 -$(CONFIG_MODULE_SIG_HASH) -days 36500 \
+ -batch -x509 -config $< \
+ -outform PEM -out $@ -keyout $@ $(keytype-y) 2>&1
-ifdef CONFIG_MODULE_SIG_KEY_TYPE_RSA
-ifeq ($(openssl_available),yes)
-$(if $(findstring rsaEncryption,$(X509TEXT)),,$(shell rm -f "certs/signing_key.pem"))
-endif
-endif # CONFIG_MODULE_SIG_KEY_TYPE_RSA
-
-$(obj)/signing_key.pem: $(obj)/x509.genkey
- @$(kecho) "###"
- @$(kecho) "### Now generating an X.509 key pair to be used for signing modules."
- @$(kecho) "###"
- @$(kecho) "### If this takes a long time, you might wish to run rngd in the"
- @$(kecho) "### background to keep the supply of entropy topped up. It"
- @$(kecho) "### needs to be run as root, and uses a hardware random"
- @$(kecho) "### number generator if one is available."
- @$(kecho) "###"
- $(Q)openssl req -new -nodes -utf8 -$(CONFIG_MODULE_SIG_HASH) -days 36500 \
- -batch -x509 -config $(obj)/x509.genkey \
- -outform PEM -out $(obj)/signing_key.pem \
- -keyout $(obj)/signing_key.pem \
- $(keytype_openssl) \
- $($(quiet)redirect_openssl)
- @$(kecho) "###"
- @$(kecho) "### Key pair generated."
- @$(kecho) "###"
+$(obj)/signing_key.pem: $(obj)/x509.genkey FORCE
+ $(call if_changed,gen_key)
+
+targets += signing_key.pem
+quiet_cmd_copy_x509_config = COPY $@
+ cmd_copy_x509_config = cat $(srctree)/$(src)/default_x509.genkey > $@
+
+# You can provide your own config file. If not present, copy the default one.
$(obj)/x509.genkey:
- @$(kecho) Generating X.509 key generation config
- @echo >$@ "[ req ]"
- @echo >>$@ "default_bits = 4096"
- @echo >>$@ "distinguished_name = req_distinguished_name"
- @echo >>$@ "prompt = no"
- @echo >>$@ "string_mask = utf8only"
- @echo >>$@ "x509_extensions = myexts"
- @echo >>$@
- @echo >>$@ "[ req_distinguished_name ]"
- @echo >>$@ "#O = Unspecified company"
- @echo >>$@ "CN = Build time autogenerated kernel key"
- @echo >>$@ "#emailAddress = unspecified.user@unspecified.company"
- @echo >>$@
- @echo >>$@ "[ myexts ]"
- @echo >>$@ "basicConstraints=critical,CA:FALSE"
- @echo >>$@ "keyUsage=digitalSignature"
- @echo >>$@ "subjectKeyIdentifier=hash"
- @echo >>$@ "authorityKeyIdentifier=keyid"
-endif # CONFIG_MODULE_SIG_KEY
+ $(call cmd,copy_x509_config)
-$(eval $(call config_filename,MODULE_SIG_KEY))
+endif # CONFIG_MODULE_SIG_KEY
# If CONFIG_MODULE_SIG_KEY isn't a PKCS#11 URI, depend on it
-ifeq ($(patsubst pkcs11:%,%,$(firstword $(MODULE_SIG_KEY_FILENAME))),$(firstword $(MODULE_SIG_KEY_FILENAME)))
-X509_DEP := $(MODULE_SIG_KEY_SRCPREFIX)$(MODULE_SIG_KEY_FILENAME)
+ifneq ($(filter-out pkcs11:%, $(CONFIG_MODULE_SIG_KEY)),)
+X509_DEP := $(CONFIG_MODULE_SIG_KEY)
endif
-# GCC PR#66871 again.
$(obj)/system_certificates.o: $(obj)/signing_key.x509
-targets += signing_key.x509
-$(obj)/signing_key.x509: scripts/extract-cert $(X509_DEP) FORCE
- $(call if_changed,extract_certs,$(MODULE_SIG_KEY_SRCPREFIX)$(CONFIG_MODULE_SIG_KEY))
+$(obj)/signing_key.x509: $(X509_DEP) $(obj)/extract-cert FORCE
+ $(call if_changed,extract_certs,$(if $(CONFIG_MODULE_SIG_KEY),$(if $(X509_DEP),$<,$(CONFIG_MODULE_SIG_KEY)),""))
endif # CONFIG_MODULE_SIG
-ifeq ($(CONFIG_SYSTEM_REVOCATION_LIST),y)
-
-$(eval $(call config_filename,SYSTEM_REVOCATION_KEYS))
+targets += signing_key.x509
$(obj)/revocation_certificates.o: $(obj)/x509_revocation_list
-quiet_cmd_extract_certs = EXTRACT_CERTS $(patsubst "%",%,$(2))
- cmd_extract_certs = scripts/extract-cert $(2) $@
+$(obj)/x509_revocation_list: $(CONFIG_SYSTEM_REVOCATION_KEYS) $(obj)/extract-cert FORCE
+ $(call if_changed,extract_certs,$(if $(CONFIG_SYSTEM_REVOCATION_KEYS),$<,""))
targets += x509_revocation_list
-$(obj)/x509_revocation_list: scripts/extract-cert $(SYSTEM_REVOCATION_KEYS_SRCPREFIX)$(SYSTEM_REVOCATION_KEYS_FILENAME) FORCE
- $(call if_changed,extract_certs,$(SYSTEM_REVOCATION_KEYS_SRCPREFIX)$(CONFIG_SYSTEM_REVOCATION_KEYS))
-endif
+
+hostprogs := extract-cert
+
+HOSTCFLAGS_extract-cert.o = $(shell pkg-config --cflags libcrypto 2> /dev/null)
+HOSTLDLIBS_extract-cert = $(shell pkg-config --libs libcrypto 2> /dev/null || echo -lcrypto)
--- /dev/null
+[ req ]
+default_bits = 4096
+distinguished_name = req_distinguished_name
+prompt = no
+string_mask = utf8only
+x509_extensions = myexts
+
+[ req_distinguished_name ]
+#O = Unspecified company
+CN = Build time autogenerated kernel key
+#emailAddress = unspecified.user@unspecified.company
+
+[ myexts ]
+basicConstraints=critical,CA:FALSE
+keyUsage=digitalSignature
+subjectKeyIdentifier=hash
+authorityKeyIdentifier=keyid
--- /dev/null
+/* Extract X.509 certificate in DER form from PKCS#11 or PEM.
+ *
+ * Copyright © 2014-2015 Red Hat, Inc. All Rights Reserved.
+ * Copyright © 2015 Intel Corporation.
+ *
+ * Authors: David Howells <dhowells@redhat.com>
+ * David Woodhouse <dwmw2@infradead.org>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public License
+ * as published by the Free Software Foundation; either version 2.1
+ * of the licence, or (at your option) any later version.
+ */
+#define _GNU_SOURCE
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <string.h>
+#include <err.h>
+#include <openssl/bio.h>
+#include <openssl/pem.h>
+#include <openssl/err.h>
+#include <openssl/engine.h>
+
+#define PKEY_ID_PKCS7 2
+
+static __attribute__((noreturn))
+void format(void)
+{
+ fprintf(stderr,
+ "Usage: extract-cert <source> <dest>\n");
+ exit(2);
+}
+
+static void display_openssl_errors(int l)
+{
+ const char *file;
+ char buf[120];
+ int e, line;
+
+ if (ERR_peek_error() == 0)
+ return;
+ fprintf(stderr, "At main.c:%d:\n", l);
+
+ while ((e = ERR_get_error_line(&file, &line))) {
+ ERR_error_string(e, buf);
+ fprintf(stderr, "- SSL %s: %s:%d\n", buf, file, line);
+ }
+}
+
+static void drain_openssl_errors(void)
+{
+ const char *file;
+ int line;
+
+ if (ERR_peek_error() == 0)
+ return;
+ while (ERR_get_error_line(&file, &line)) {}
+}
+
+#define ERR(cond, fmt, ...) \
+ do { \
+ bool __cond = (cond); \
+ display_openssl_errors(__LINE__); \
+ if (__cond) { \
+ err(1, fmt, ## __VA_ARGS__); \
+ } \
+ } while(0)
+
+static const char *key_pass;
+static BIO *wb;
+static char *cert_dst;
+static int kbuild_verbose;
+
+static void write_cert(X509 *x509)
+{
+ char buf[200];
+
+ if (!wb) {
+ wb = BIO_new_file(cert_dst, "wb");
+ ERR(!wb, "%s", cert_dst);
+ }
+ X509_NAME_oneline(X509_get_subject_name(x509), buf, sizeof(buf));
+ ERR(!i2d_X509_bio(wb, x509), "%s", cert_dst);
+ if (kbuild_verbose)
+ fprintf(stderr, "Extracted cert: %s\n", buf);
+}
+
+int main(int argc, char **argv)
+{
+ char *cert_src;
+
+ OpenSSL_add_all_algorithms();
+ ERR_load_crypto_strings();
+ ERR_clear_error();
+
+ kbuild_verbose = atoi(getenv("KBUILD_VERBOSE")?:"0");
+
+ key_pass = getenv("KBUILD_SIGN_PIN");
+
+ if (argc != 3)
+ format();
+
+ cert_src = argv[1];
+ cert_dst = argv[2];
+
+ if (!cert_src[0]) {
+ /* Invoked with no input; create empty file */
+ FILE *f = fopen(cert_dst, "wb");
+ ERR(!f, "%s", cert_dst);
+ fclose(f);
+ exit(0);
+ } else if (!strncmp(cert_src, "pkcs11:", 7)) {
+ ENGINE *e;
+ struct {
+ const char *cert_id;
+ X509 *cert;
+ } parms;
+
+ parms.cert_id = cert_src;
+ parms.cert = NULL;
+
+ ENGINE_load_builtin_engines();
+ drain_openssl_errors();
+ e = ENGINE_by_id("pkcs11");
+ ERR(!e, "Load PKCS#11 ENGINE");
+ if (ENGINE_init(e))
+ drain_openssl_errors();
+ else
+ ERR(1, "ENGINE_init");
+ if (key_pass)
+ ERR(!ENGINE_ctrl_cmd_string(e, "PIN", key_pass, 0), "Set PKCS#11 PIN");
+ ENGINE_ctrl_cmd(e, "LOAD_CERT_CTRL", 0, &parms, NULL, 1);
+ ERR(!parms.cert, "Get X.509 from PKCS#11");
+ write_cert(parms.cert);
+ } else {
+ BIO *b;
+ X509 *x509;
+
+ b = BIO_new_file(cert_src, "rb");
+ ERR(!b, "%s", cert_src);
+
+ while (1) {
+ x509 = PEM_read_bio_X509(b, NULL, NULL, NULL);
+ if (wb && !x509) {
+ unsigned long err = ERR_peek_last_error();
+ if (ERR_GET_LIB(err) == ERR_LIB_PEM &&
+ ERR_GET_REASON(err) == PEM_R_NO_START_LINE) {
+ ERR_clear_error();
+ break;
+ }
+ }
+ ERR(!x509, "%s", cert_src);
+ write_cert(x509);
+ }
+ }
+
+ BIO_free(wb);
+
+ return 0;
+}
source "certs/Kconfig"
endif # if CRYPTO
-
-source "lib/crypto/Kconfig"
# ACPI Boot-Time Table Parsing
#
ifeq ($(CONFIG_ACPI_CUSTOM_DSDT),y)
-tables.o: $(src)/../../include/$(subst $\",,$(CONFIG_ACPI_CUSTOM_DSDT_FILE)) ;
+tables.o: $(src)/../../include/$(CONFIG_ACPI_CUSTOM_DSDT_FILE) ;
endif
static int cpc_read(int cpu, struct cpc_register_resource *reg_res, u64 *val)
{
- int ret_val = 0;
void __iomem *vaddr = NULL;
int pcc_ss_id = per_cpu(cpu_pcc_subspace_idx, cpu);
struct cpc_reg *reg = ®_res->cpc_entry.reg;
if (reg_res->type == ACPI_TYPE_INTEGER) {
*val = reg_res->cpc_entry.int_value;
- return ret_val;
+ return 0;
}
*val = 0;
if (reg->space_id == ACPI_ADR_SPACE_SYSTEM_IO) {
u32 width = 8 << (reg->access_width - 1);
+ u32 val_u32;
acpi_status status;
status = acpi_os_read_port((acpi_io_address)reg->address,
- (u32 *)val, width);
+ &val_u32, width);
if (ACPI_FAILURE(status)) {
pr_debug("Error: Failed to read SystemIO port %llx\n",
reg->address);
return -EFAULT;
}
+ *val = val_u32;
return 0;
} else if (reg->space_id == ACPI_ADR_SPACE_PLATFORM_COMM && pcc_ss_id >= 0)
vaddr = GET_PCC_VADDR(reg->address, pcc_ss_id);
default:
pr_debug("Error: Cannot read %u bit width from PCC for ss: %d\n",
reg->bit_width, pcc_ss_id);
- ret_val = -EFAULT;
+ return -EFAULT;
}
- return ret_val;
+ return 0;
}
static int cpc_write(int cpu, struct cpc_register_resource *reg_res, u64 val)
static const struct acpi_device_id pch_fivr_device_ids[] = {
{"INTC1045", 0},
{"INTC1049", 0},
+ {"INTC10A3", 0},
{"", 0},
};
MODULE_DEVICE_TABLE(acpi, pch_fivr_device_ids);
{"INTC1050", 0},
{"INTC1060", 0},
{"INTC1061", 0},
+ {"INTC10A4", 0},
+ {"INTC10A5", 0},
{"", 0},
};
MODULE_DEVICE_TABLE(acpi, int3407_device_ids);
{"INTC1050"},
{"INTC1060"},
{"INTC1061"},
+ {"INTC10A0"},
+ {"INTC10A1"},
+ {"INTC10A2"},
+ {"INTC10A3"},
+ {"INTC10A4"},
+ {"INTC10A5"},
{""},
};
{"INT3404", }, /* Fan */ \
{"INTC1044", }, /* Fan for Tiger Lake generation */ \
{"INTC1048", }, /* Fan for Alder Lake generation */ \
+ {"INTC10A2", }, /* Fan for Raptor Lake generation */ \
{"PNP0C0B", } /* Generic ACPI fan */
acpi_system_wakeup_device_open_fs(struct inode *inode, struct file *file)
{
return single_open(file, acpi_system_wakeup_device_seq_show,
- PDE_DATA(inode));
+ pde_data(inode));
}
static const struct proc_ops acpi_system_wakeup_device_proc_ops = {
union cvmx_mio_boot_dma_intx dma_int;
u8 status;
- trace_ata_bmdma_stop(qc, &qc->tf, qc->tag);
+ trace_ata_bmdma_stop(ap, &qc->tf, qc->tag);
if (ap->hsm_task_state != HSM_ST_LAST)
return 0;
static u16 get_desc (IADEV *dev, struct ia_vcc *iavcc) {
u_short desc_num, i;
- struct sk_buff *skb;
struct ia_vcc *iavcc_r = NULL;
unsigned long delta;
static unsigned long timer = 0;
else
dev->ffL.tcq_rd -= 2;
*(u_short *)(dev->seg_ram + dev->ffL.tcq_rd) = i+1;
- if (!(skb = dev->desc_tbl[i].txskb) ||
- !(iavcc_r = dev->desc_tbl[i].iavcc))
+ if (!dev->desc_tbl[i].txskb || !(iavcc_r = dev->desc_tbl[i].iavcc))
printk("Fatal err, desc table vcc or skb is NULL\n");
else
iavcc_r->vc_desc_cnt--;
#include <linux/of.h>
#include <asm/sections.h>
-#include <asm/pgalloc.h>
struct pglist_data *node_data[MAX_NUMNODES] __read_mostly;
EXPORT_SYMBOL(node_data);
return node_distance(early_cpu_to_node(from), early_cpu_to_node(to));
}
-static void * __init pcpu_fc_alloc(unsigned int cpu, size_t size,
- size_t align)
-{
- int nid = early_cpu_to_node(cpu);
-
- return memblock_alloc_try_nid(size, align,
- __pa(MAX_DMA_ADDRESS), MEMBLOCK_ALLOC_ACCESSIBLE, nid);
-}
-
-static void __init pcpu_fc_free(void *ptr, size_t size)
-{
- memblock_free(ptr, size);
-}
-
-#ifdef CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK
-static void __init pcpu_populate_pte(unsigned long addr)
-{
- pgd_t *pgd = pgd_offset_k(addr);
- p4d_t *p4d;
- pud_t *pud;
- pmd_t *pmd;
-
- p4d = p4d_offset(pgd, addr);
- if (p4d_none(*p4d)) {
- pud_t *new;
-
- new = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
- if (!new)
- goto err_alloc;
- p4d_populate(&init_mm, p4d, new);
- }
-
- pud = pud_offset(p4d, addr);
- if (pud_none(*pud)) {
- pmd_t *new;
-
- new = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
- if (!new)
- goto err_alloc;
- pud_populate(&init_mm, pud, new);
- }
-
- pmd = pmd_offset(pud, addr);
- if (!pmd_present(*pmd)) {
- pte_t *new;
-
- new = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
- if (!new)
- goto err_alloc;
- pmd_populate_kernel(&init_mm, pmd, new);
- }
-
- return;
-
-err_alloc:
- panic("%s: Failed to allocate %lu bytes align=%lx from=%lx\n",
- __func__, PAGE_SIZE, PAGE_SIZE, PAGE_SIZE);
-}
-#endif
-
void __init setup_per_cpu_areas(void)
{
unsigned long delta;
rc = pcpu_embed_first_chunk(PERCPU_MODULE_RESERVE,
PERCPU_DYNAMIC_RESERVE, PAGE_SIZE,
pcpu_cpu_distance,
- pcpu_fc_alloc, pcpu_fc_free);
+ early_cpu_to_node);
#ifdef CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK
if (rc < 0)
pr_warn("PERCPU: %s allocator failed (%d), falling back to page size\n",
#ifdef CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK
if (rc < 0)
- rc = pcpu_page_first_chunk(PERCPU_MODULE_RESERVE,
- pcpu_fc_alloc,
- pcpu_fc_free,
- pcpu_populate_pte);
+ rc = pcpu_page_first_chunk(PERCPU_MODULE_RESERVE, early_cpu_to_node);
#endif
if (rc < 0)
panic("Failed to initialize percpu areas (err=%d).", rc);
# Create $(fwdir) from $(CONFIG_EXTRA_FIRMWARE_DIR) -- if it doesn't have a
# leading /, it's relative to $(srctree).
-fwdir := $(subst $(quote),,$(CONFIG_EXTRA_FIRMWARE_DIR))
+fwdir := $(CONFIG_EXTRA_FIRMWARE_DIR)
fwdir := $(addprefix $(srctree)/,$(filter-out /%,$(fwdir)))$(filter /%,$(fwdir))
-firmware := $(addsuffix .gen.o, $(subst $(quote),,$(CONFIG_EXTRA_FIRMWARE)))
+firmware := $(addsuffix .gen.o, $(CONFIG_EXTRA_FIRMWARE))
obj-y += $(firmware)
FWNAME = $(patsubst $(obj)/%.gen.S,%,$@)
int register_sysfs_loader(void)
{
- return class_register(&firmware_class);
+ int ret = class_register(&firmware_class);
+
+ if (ret != 0)
+ return ret;
+ return register_firmware_config_sysctl();
}
void unregister_sysfs_loader(void)
{
+ unregister_firmware_config_sysctl();
class_unregister(&firmware_class);
}
int register_sysfs_loader(void);
void unregister_sysfs_loader(void);
+#ifdef CONFIG_SYSCTL
+extern int register_firmware_config_sysctl(void);
+extern void unregister_firmware_config_sysctl(void);
+#else
+static inline int register_firmware_config_sysctl(void)
+{
+ return 0;
+}
+static inline void unregister_firmware_config_sysctl(void) { }
+#endif /* CONFIG_SYSCTL */
+
#else /* CONFIG_FW_LOADER_USER_HELPER */
static inline int firmware_fallback_sysfs(struct firmware *fw, const char *name,
struct device *device,
#include <linux/kconfig.h>
#include <linux/list.h>
#include <linux/slab.h>
+#include <linux/export.h>
#include <linux/security.h>
#include <linux/highmem.h>
#include <linux/umh.h>
EXPORT_SYMBOL_NS_GPL(fw_fallback_config, FIRMWARE_LOADER_PRIVATE);
#ifdef CONFIG_SYSCTL
-struct ctl_table firmware_config_table[] = {
+static struct ctl_table firmware_config_table[] = {
{
.procname = "force_sysfs_fallback",
.data = &fw_fallback_config.force_sysfs_fallback,
},
{ }
};
-#endif
+
+static struct ctl_table_header *firmware_config_sysct_table_header;
+int register_firmware_config_sysctl(void)
+{
+ firmware_config_sysct_table_header =
+ register_sysctl("kernel/firmware_config",
+ firmware_config_table);
+ if (!firmware_config_sysct_table_header)
+ return -ENOMEM;
+ return 0;
+}
+EXPORT_SYMBOL_NS_GPL(register_firmware_config_sysctl, FIRMWARE_LOADER_PRIVATE);
+
+void unregister_firmware_config_sysctl(void)
+{
+ unregister_sysctl_table(firmware_config_sysct_table_header);
+ firmware_config_sysct_table_header = NULL;
+}
+EXPORT_SYMBOL_NS_GPL(unregister_firmware_config_sysctl, FIRMWARE_LOADER_PRIVATE);
+
+#endif /* CONFIG_SYSCTL */
struct rtc_time time;
unsigned int val;
- mc146818_get_time(&time);
+ if (mc146818_get_time(&time) < 0) {
+ pr_err("Unable to read current time from RTC\n");
+ return 0;
+ }
+
pr_info("RTC time: %ptRt, date: %ptRd\n", &time, &time);
val = time.tm_year; /* 100 years */
if (val > 100)
register ulong n;
n = jiffies & 0xffff;
- return n |= (++d->lasttag & 0x7fff) << 16;
+ return n | (++d->lasttag & 0x7fff) << 16;
}
static u32
* (should share code eventually).
*/
static LIST_HEAD(brd_devices);
-static DEFINE_MUTEX(brd_devices_mutex);
static struct dentry *brd_debugfs_dir;
static int brd_alloc(int i)
char buf[DISK_NAME_LEN];
int err = -ENOMEM;
- mutex_lock(&brd_devices_mutex);
- list_for_each_entry(brd, &brd_devices, brd_list) {
- if (brd->brd_number == i) {
- mutex_unlock(&brd_devices_mutex);
+ list_for_each_entry(brd, &brd_devices, brd_list)
+ if (brd->brd_number == i)
return -EEXIST;
- }
- }
brd = kzalloc(sizeof(*brd), GFP_KERNEL);
- if (!brd) {
- mutex_unlock(&brd_devices_mutex);
+ if (!brd)
return -ENOMEM;
- }
brd->brd_number = i;
list_add_tail(&brd->brd_list, &brd_devices);
- mutex_unlock(&brd_devices_mutex);
spin_lock_init(&brd->brd_lock);
INIT_RADIX_TREE(&brd->brd_pages, GFP_ATOMIC);
out_cleanup_disk:
blk_cleanup_disk(disk);
out_free_dev:
- mutex_lock(&brd_devices_mutex);
list_del(&brd->brd_list);
- mutex_unlock(&brd_devices_mutex);
kfree(brd);
return err;
}
brd_alloc(MINOR(dev) / max_part);
}
-static void brd_del_one(struct brd_device *brd)
+static void brd_cleanup(void)
{
- del_gendisk(brd->brd_disk);
- blk_cleanup_disk(brd->brd_disk);
- brd_free_pages(brd);
- mutex_lock(&brd_devices_mutex);
- list_del(&brd->brd_list);
- mutex_unlock(&brd_devices_mutex);
- kfree(brd);
+ struct brd_device *brd, *next;
+
+ debugfs_remove_recursive(brd_debugfs_dir);
+
+ list_for_each_entry_safe(brd, next, &brd_devices, brd_list) {
+ del_gendisk(brd->brd_disk);
+ blk_cleanup_disk(brd->brd_disk);
+ brd_free_pages(brd);
+ list_del(&brd->brd_list);
+ kfree(brd);
+ }
}
static inline void brd_check_and_reset_par(void)
static int __init brd_init(void)
{
- struct brd_device *brd, *next;
int err, i;
+ brd_check_and_reset_par();
+
+ brd_debugfs_dir = debugfs_create_dir("ramdisk_pages", NULL);
+
+ for (i = 0; i < rd_nr; i++) {
+ err = brd_alloc(i);
+ if (err)
+ goto out_free;
+ }
+
/*
* brd module now has a feature to instantiate underlying device
* structure on-demand, provided that there is an access dev node.
* dynamically.
*/
- if (__register_blkdev(RAMDISK_MAJOR, "ramdisk", brd_probe))
- return -EIO;
-
- brd_check_and_reset_par();
-
- brd_debugfs_dir = debugfs_create_dir("ramdisk_pages", NULL);
-
- for (i = 0; i < rd_nr; i++) {
- err = brd_alloc(i);
- if (err)
- goto out_free;
+ if (__register_blkdev(RAMDISK_MAJOR, "ramdisk", brd_probe)) {
+ err = -EIO;
+ goto out_free;
}
pr_info("brd: module loaded\n");
return 0;
out_free:
- unregister_blkdev(RAMDISK_MAJOR, "ramdisk");
- debugfs_remove_recursive(brd_debugfs_dir);
-
- list_for_each_entry_safe(brd, next, &brd_devices, brd_list)
- brd_del_one(brd);
+ brd_cleanup();
pr_info("brd: module NOT loaded !!!\n");
return err;
static void __exit brd_exit(void)
{
- struct brd_device *brd, *next;
unregister_blkdev(RAMDISK_MAJOR, "ramdisk");
- debugfs_remove_recursive(brd_debugfs_dir);
-
- list_for_each_entry_safe(brd, next, &brd_devices, brd_list)
- brd_del_one(brd);
+ brd_cleanup();
pr_info("brd: module unloaded\n");
}
static void loop_queue_work(struct loop_device *lo, struct loop_cmd *cmd)
{
- struct rb_node **node = &(lo->worker_tree.rb_node), *parent = NULL;
+ struct rb_node **node, *parent = NULL;
struct loop_worker *cur_worker, *worker = NULL;
struct work_struct *work;
struct list_head *cmd_list;
* These are the characters that produce nonzero for
* isspace() in the "C" and "POSIX" locales.
*/
- const char *spaces = " \f\n\r\t\v";
+ static const char spaces[] = " \f\n\r\t\v";
*buf += strspn(*buf, spaces); /* Find start of token */
pctx.opts->exclusive = RBD_EXCLUSIVE_DEFAULT;
pctx.opts->trim = RBD_TRIM_DEFAULT;
- ret = ceph_parse_mon_ips(mon_addrs, mon_addrs_size, pctx.copts, NULL);
+ ret = ceph_parse_mon_ips(mon_addrs, mon_addrs_size, pctx.copts, NULL,
+ ',');
if (ret)
goto out_err;
return per_cpu_ptr(sess->cpu_queues, bit);
} else if (cpu != 0) {
/* Search from 0 to cpu */
- bit = find_next_bit(sess->cpu_queues_bm, cpu, 0);
+ bit = find_first_bit(sess->cpu_queues_bm, cpu);
if (bit < cpu)
return per_cpu_ptr(sess->cpu_queues, bit);
}
},
{ }
};
-
-static struct ctl_table cdrom_cdrom_table[] = {
- {
- .procname = "cdrom",
- .maxlen = 0,
- .mode = 0555,
- .child = cdrom_table,
- },
- { }
-};
-
-/* Make sure that /proc/sys/dev is there */
-static struct ctl_table cdrom_root_table[] = {
- {
- .procname = "dev",
- .maxlen = 0,
- .mode = 0555,
- .child = cdrom_cdrom_table,
- },
- { }
-};
static struct ctl_table_header *cdrom_sysctl_header;
static void cdrom_sysctl_register(void)
if (!atomic_add_unless(&initialized, 1, 1))
return;
- cdrom_sysctl_header = register_sysctl_table(cdrom_root_table);
+ cdrom_sysctl_header = register_sysctl("dev/cdrom", cdrom_table);
/* set the defaults */
cdrom_sysctl_settings.autoclose = autoclose;
{}
};
-static struct ctl_table hpet_root[] = {
- {
- .procname = "hpet",
- .maxlen = 0,
- .mode = 0555,
- .child = hpet_table,
- },
- {}
-};
-
-static struct ctl_table dev_root[] = {
- {
- .procname = "dev",
- .maxlen = 0,
- .mode = 0555,
- .child = hpet_root,
- },
- {}
-};
-
static struct ctl_table_header *sysctl_header;
/*
if (result < 0)
return -ENODEV;
- sysctl_header = register_sysctl_table(dev_root);
+ sysctl_header = register_sysctl("dev/hpet", hpet_table);
result = acpi_bus_register_driver(&hpet_acpi_driver);
if (result < 0) {
* ===============================
*
* There are four exported interfaces; two for use within the kernel,
- * and two or use from userspace.
+ * and two for use from userspace.
*
* Exported interfaces ---- userspace output
* -----------------------------------------
*
* The primary kernel interface is
*
- * void get_random_bytes(void *buf, int nbytes);
+ * void get_random_bytes(void *buf, int nbytes);
*
* This interface will return the requested number of random bytes,
* and place it in the requested buffer. This is equivalent to a
*
* For less critical applications, there are the functions:
*
- * u32 get_random_u32()
- * u64 get_random_u64()
- * unsigned int get_random_int()
- * unsigned long get_random_long()
+ * u32 get_random_u32()
+ * u64 get_random_u64()
+ * unsigned int get_random_int()
+ * unsigned long get_random_long()
*
* These are produced by a cryptographic RNG seeded from get_random_bytes,
* and so do not deplete the entropy pool as much. These are recommended
* from the devices are:
*
* void add_device_randomness(const void *buf, unsigned int size);
- * void add_input_randomness(unsigned int type, unsigned int code,
+ * void add_input_randomness(unsigned int type, unsigned int code,
* unsigned int value);
* void add_interrupt_randomness(int irq);
- * void add_disk_randomness(struct gendisk *disk);
+ * void add_disk_randomness(struct gendisk *disk);
* void add_hwgenerator_randomness(const char *buffer, size_t count,
* size_t entropy);
* void add_bootloader_randomness(const void *buf, unsigned int size);
* /dev/random and /dev/urandom created already, they can be created
* by using the commands:
*
- * mknod /dev/random c 1 8
- * mknod /dev/urandom c 1 9
+ * mknod /dev/random c 1 8
+ * mknod /dev/urandom c 1 9
*
* Acknowledgements:
* =================
#include <linux/spinlock.h>
#include <linux/kthread.h>
#include <linux/percpu.h>
-#include <linux/fips.h>
#include <linux/ptrace.h>
#include <linux/workqueue.h>
#include <linux/irq.h>
/* #define ADD_INTERRUPT_BENCH */
-/*
- * Configuration information
- */
-#define INPUT_POOL_SHIFT 12
-#define INPUT_POOL_WORDS (1 << (INPUT_POOL_SHIFT-5))
-#define OUTPUT_POOL_SHIFT 10
-#define OUTPUT_POOL_WORDS (1 << (OUTPUT_POOL_SHIFT-5))
-#define EXTRACT_SIZE (BLAKE2S_HASH_SIZE / 2)
-
-/*
- * To allow fractional bits to be tracked, the entropy_count field is
- * denominated in units of 1/8th bits.
- *
- * 2*(ENTROPY_SHIFT + poolbitshift) must <= 31, or the multiply in
- * credit_entropy_bits() needs to be 64 bits wide.
- */
-#define ENTROPY_SHIFT 3
-#define ENTROPY_BITS(r) ((r)->entropy_count >> ENTROPY_SHIFT)
-
/*
* If the entropy count falls under this number of bits, then we
* should wake up processes which are selecting or polling on write
* access to /dev/random.
*/
-static int random_write_wakeup_bits = 28 * OUTPUT_POOL_WORDS;
+static int random_write_wakeup_bits = 28 * (1 << 5);
/*
* Originally, we used a primitive polynomial of degree .poolwords
* polynomial which improves the resulting TGFSR polynomial to be
* irreducible, which we have made here.
*/
-static const struct poolinfo {
- int poolbitshift, poolwords, poolbytes, poolfracbits;
-#define S(x) ilog2(x)+5, (x), (x)*4, (x) << (ENTROPY_SHIFT+5)
- int tap1, tap2, tap3, tap4, tap5;
-} poolinfo_table[] = {
- /* was: x^128 + x^103 + x^76 + x^51 +x^25 + x + 1 */
+enum poolinfo {
+ POOL_WORDS = 128,
+ POOL_WORDMASK = POOL_WORDS - 1,
+ POOL_BYTES = POOL_WORDS * sizeof(u32),
+ POOL_BITS = POOL_BYTES * 8,
+ POOL_BITSHIFT = ilog2(POOL_BITS),
+
+ /* To allow fractional bits to be tracked, the entropy_count field is
+ * denominated in units of 1/8th bits. */
+ POOL_ENTROPY_SHIFT = 3,
+#define POOL_ENTROPY_BITS() (input_pool.entropy_count >> POOL_ENTROPY_SHIFT)
+ POOL_FRACBITS = POOL_BITS << POOL_ENTROPY_SHIFT,
+
/* x^128 + x^104 + x^76 + x^51 +x^25 + x + 1 */
- { S(128), 104, 76, 51, 25, 1 },
+ POOL_TAP1 = 104,
+ POOL_TAP2 = 76,
+ POOL_TAP3 = 51,
+ POOL_TAP4 = 25,
+ POOL_TAP5 = 1,
+
+ EXTRACT_SIZE = BLAKE2S_HASH_SIZE / 2
};
/*
static LIST_HEAD(random_ready_list);
struct crng_state {
- __u32 state[16];
- unsigned long init_time;
- spinlock_t lock;
+ u32 state[16];
+ unsigned long init_time;
+ spinlock_t lock;
};
static struct crng_state primary_crng = {
#define crng_ready() (likely(crng_init > 1))
static int crng_init_cnt = 0;
static unsigned long crng_global_init_time = 0;
-#define CRNG_INIT_CNT_THRESH (2*CHACHA_KEY_SIZE)
-static void _extract_crng(struct crng_state *crng, __u8 out[CHACHA_BLOCK_SIZE]);
+#define CRNG_INIT_CNT_THRESH (2 * CHACHA_KEY_SIZE)
+static void _extract_crng(struct crng_state *crng, u8 out[CHACHA_BLOCK_SIZE]);
static void _crng_backtrack_protect(struct crng_state *crng,
- __u8 tmp[CHACHA_BLOCK_SIZE], int used);
+ u8 tmp[CHACHA_BLOCK_SIZE], int used);
static void process_random_ready_list(void);
static void _get_random_bytes(void *buf, int nbytes);
*
**********************************************************************/
-struct entropy_store;
-struct entropy_store {
- /* read-only data: */
- const struct poolinfo *poolinfo;
- __u32 *pool;
- const char *name;
+static u32 input_pool_data[POOL_WORDS] __latent_entropy;
- /* read-write data: */
+static struct {
spinlock_t lock;
- unsigned short add_ptr;
- unsigned short input_rotate;
+ u16 add_ptr;
+ u16 input_rotate;
int entropy_count;
- unsigned int last_data_init:1;
- __u8 last_data[EXTRACT_SIZE];
+} input_pool = {
+ .lock = __SPIN_LOCK_UNLOCKED(input_pool.lock),
};
-static ssize_t extract_entropy(struct entropy_store *r, void *buf,
- size_t nbytes, int min, int rsvd);
-static ssize_t _extract_entropy(struct entropy_store *r, void *buf,
- size_t nbytes, int fips);
+static ssize_t extract_entropy(void *buf, size_t nbytes, int min);
+static ssize_t _extract_entropy(void *buf, size_t nbytes);
-static void crng_reseed(struct crng_state *crng, struct entropy_store *r);
-static __u32 input_pool_data[INPUT_POOL_WORDS] __latent_entropy;
+static void crng_reseed(struct crng_state *crng, bool use_input_pool);
-static struct entropy_store input_pool = {
- .poolinfo = &poolinfo_table[0],
- .name = "input",
- .lock = __SPIN_LOCK_UNLOCKED(input_pool.lock),
- .pool = input_pool_data
-};
-
-static __u32 const twist_table[8] = {
+static const u32 twist_table[8] = {
0x00000000, 0x3b6e20c8, 0x76dc4190, 0x4db26158,
0xedb88320, 0xd6d6a3e8, 0x9b64c2b0, 0xa00ae278 };
* it's cheap to do so and helps slightly in the expected case where
* the entropy is concentrated in the low-order bits.
*/
-static void _mix_pool_bytes(struct entropy_store *r, const void *in,
- int nbytes)
+static void _mix_pool_bytes(const void *in, int nbytes)
{
- unsigned long i, tap1, tap2, tap3, tap4, tap5;
+ unsigned long i;
int input_rotate;
- int wordmask = r->poolinfo->poolwords - 1;
- const unsigned char *bytes = in;
- __u32 w;
-
- tap1 = r->poolinfo->tap1;
- tap2 = r->poolinfo->tap2;
- tap3 = r->poolinfo->tap3;
- tap4 = r->poolinfo->tap4;
- tap5 = r->poolinfo->tap5;
+ const u8 *bytes = in;
+ u32 w;
- input_rotate = r->input_rotate;
- i = r->add_ptr;
+ input_rotate = input_pool.input_rotate;
+ i = input_pool.add_ptr;
/* mix one byte at a time to simplify size handling and churn faster */
while (nbytes--) {
w = rol32(*bytes++, input_rotate);
- i = (i - 1) & wordmask;
+ i = (i - 1) & POOL_WORDMASK;
/* XOR in the various taps */
- w ^= r->pool[i];
- w ^= r->pool[(i + tap1) & wordmask];
- w ^= r->pool[(i + tap2) & wordmask];
- w ^= r->pool[(i + tap3) & wordmask];
- w ^= r->pool[(i + tap4) & wordmask];
- w ^= r->pool[(i + tap5) & wordmask];
+ w ^= input_pool_data[i];
+ w ^= input_pool_data[(i + POOL_TAP1) & POOL_WORDMASK];
+ w ^= input_pool_data[(i + POOL_TAP2) & POOL_WORDMASK];
+ w ^= input_pool_data[(i + POOL_TAP3) & POOL_WORDMASK];
+ w ^= input_pool_data[(i + POOL_TAP4) & POOL_WORDMASK];
+ w ^= input_pool_data[(i + POOL_TAP5) & POOL_WORDMASK];
/* Mix the result back in with a twist */
- r->pool[i] = (w >> 3) ^ twist_table[w & 7];
+ input_pool_data[i] = (w >> 3) ^ twist_table[w & 7];
/*
* Normally, we add 7 bits of rotation to the pool.
input_rotate = (input_rotate + (i ? 7 : 14)) & 31;
}
- r->input_rotate = input_rotate;
- r->add_ptr = i;
+ input_pool.input_rotate = input_rotate;
+ input_pool.add_ptr = i;
}
-static void __mix_pool_bytes(struct entropy_store *r, const void *in,
- int nbytes)
+static void __mix_pool_bytes(const void *in, int nbytes)
{
- trace_mix_pool_bytes_nolock(r->name, nbytes, _RET_IP_);
- _mix_pool_bytes(r, in, nbytes);
+ trace_mix_pool_bytes_nolock(nbytes, _RET_IP_);
+ _mix_pool_bytes(in, nbytes);
}
-static void mix_pool_bytes(struct entropy_store *r, const void *in,
- int nbytes)
+static void mix_pool_bytes(const void *in, int nbytes)
{
unsigned long flags;
- trace_mix_pool_bytes(r->name, nbytes, _RET_IP_);
- spin_lock_irqsave(&r->lock, flags);
- _mix_pool_bytes(r, in, nbytes);
- spin_unlock_irqrestore(&r->lock, flags);
+ trace_mix_pool_bytes(nbytes, _RET_IP_);
+ spin_lock_irqsave(&input_pool.lock, flags);
+ _mix_pool_bytes(in, nbytes);
+ spin_unlock_irqrestore(&input_pool.lock, flags);
}
struct fast_pool {
- __u32 pool[4];
- unsigned long last;
- unsigned short reg_idx;
- unsigned char count;
+ u32 pool[4];
+ unsigned long last;
+ u16 reg_idx;
+ u8 count;
};
/*
*/
static void fast_mix(struct fast_pool *f)
{
- __u32 a = f->pool[0], b = f->pool[1];
- __u32 c = f->pool[2], d = f->pool[3];
+ u32 a = f->pool[0], b = f->pool[1];
+ u32 c = f->pool[2], d = f->pool[3];
a += b; c += d;
b = rol32(b, 6); d = rol32(d, 27);
* Use credit_entropy_bits_safe() if the value comes from userspace
* or otherwise should be checked for extreme values.
*/
-static void credit_entropy_bits(struct entropy_store *r, int nbits)
+static void credit_entropy_bits(int nbits)
{
- int entropy_count, orig;
- const int pool_size = r->poolinfo->poolfracbits;
- int nfrac = nbits << ENTROPY_SHIFT;
+ int entropy_count, entropy_bits, orig;
+ int nfrac = nbits << POOL_ENTROPY_SHIFT;
+
+ /* Ensure that the multiplication can avoid being 64 bits wide. */
+ BUILD_BUG_ON(2 * (POOL_ENTROPY_SHIFT + POOL_BITSHIFT) > 31);
if (!nbits)
return;
retry:
- entropy_count = orig = READ_ONCE(r->entropy_count);
+ entropy_count = orig = READ_ONCE(input_pool.entropy_count);
if (nfrac < 0) {
/* Debit */
entropy_count += nfrac;
* turns no matter how large nbits is.
*/
int pnfrac = nfrac;
- const int s = r->poolinfo->poolbitshift + ENTROPY_SHIFT + 2;
+ const int s = POOL_BITSHIFT + POOL_ENTROPY_SHIFT + 2;
/* The +2 corresponds to the /4 in the denominator */
do {
- unsigned int anfrac = min(pnfrac, pool_size/2);
+ unsigned int anfrac = min(pnfrac, POOL_FRACBITS / 2);
unsigned int add =
- ((pool_size - entropy_count)*anfrac*3) >> s;
+ ((POOL_FRACBITS - entropy_count) * anfrac * 3) >> s;
entropy_count += add;
pnfrac -= anfrac;
- } while (unlikely(entropy_count < pool_size-2 && pnfrac));
+ } while (unlikely(entropy_count < POOL_FRACBITS - 2 && pnfrac));
}
if (WARN_ON(entropy_count < 0)) {
- pr_warn("negative entropy/overflow: pool %s count %d\n",
- r->name, entropy_count);
+ pr_warn("negative entropy/overflow: count %d\n", entropy_count);
entropy_count = 0;
- } else if (entropy_count > pool_size)
- entropy_count = pool_size;
- if (cmpxchg(&r->entropy_count, orig, entropy_count) != orig)
+ } else if (entropy_count > POOL_FRACBITS)
+ entropy_count = POOL_FRACBITS;
+ if (cmpxchg(&input_pool.entropy_count, orig, entropy_count) != orig)
goto retry;
- trace_credit_entropy_bits(r->name, nbits,
- entropy_count >> ENTROPY_SHIFT, _RET_IP_);
+ trace_credit_entropy_bits(nbits, entropy_count >> POOL_ENTROPY_SHIFT, _RET_IP_);
- if (r == &input_pool) {
- int entropy_bits = entropy_count >> ENTROPY_SHIFT;
-
- if (crng_init < 2 && entropy_bits >= 128)
- crng_reseed(&primary_crng, r);
- }
+ entropy_bits = entropy_count >> POOL_ENTROPY_SHIFT;
+ if (crng_init < 2 && entropy_bits >= 128)
+ crng_reseed(&primary_crng, true);
}
-static int credit_entropy_bits_safe(struct entropy_store *r, int nbits)
+static int credit_entropy_bits_safe(int nbits)
{
- const int nbits_max = r->poolinfo->poolwords * 32;
-
if (nbits < 0)
return -EINVAL;
/* Cap the value to avoid overflows */
- nbits = min(nbits, nbits_max);
+ nbits = min(nbits, POOL_BITS);
- credit_entropy_bits(r, nbits);
+ credit_entropy_bits(nbits);
return 0;
}
*
*********************************************************************/
-#define CRNG_RESEED_INTERVAL (300*HZ)
+#define CRNG_RESEED_INTERVAL (300 * HZ)
static DECLARE_WAIT_QUEUE_HEAD(crng_init_wait);
static bool crng_init_try_arch(struct crng_state *crng)
{
- int i;
- bool arch_init = true;
- unsigned long rv;
+ int i;
+ bool arch_init = true;
+ unsigned long rv;
for (i = 4; i < 16; i++) {
if (!arch_get_random_seed_long(&rv) &&
static bool __init crng_init_try_arch_early(struct crng_state *crng)
{
- int i;
- bool arch_init = true;
- unsigned long rv;
+ int i;
+ bool arch_init = true;
+ unsigned long rv;
for (i = 4; i < 16; i++) {
if (!arch_get_random_seed_long_early(&rv) &&
static void crng_initialize_secondary(struct crng_state *crng)
{
chacha_init_consts(crng->state);
- _get_random_bytes(&crng->state[4], sizeof(__u32) * 12);
+ _get_random_bytes(&crng->state[4], sizeof(u32) * 12);
crng_init_try_arch(crng);
crng->init_time = jiffies - CRNG_RESEED_INTERVAL - 1;
}
static void __init crng_initialize_primary(struct crng_state *crng)
{
- _extract_entropy(&input_pool, &crng->state[4], sizeof(__u32) * 12, 0);
+ _extract_entropy(&crng->state[4], sizeof(u32) * 12);
if (crng_init_try_arch_early(crng) && trust_cpu && crng_init < 2) {
invalidate_batched_entropy();
numa_crng_init();
struct crng_state *crng;
struct crng_state **pool;
- pool = kcalloc(nr_node_ids, sizeof(*pool), GFP_KERNEL|__GFP_NOFAIL);
+ pool = kcalloc(nr_node_ids, sizeof(*pool), GFP_KERNEL | __GFP_NOFAIL);
for_each_online_node(i) {
crng = kmalloc_node(sizeof(struct crng_state),
GFP_KERNEL | __GFP_NOFAIL, i);
* path. So we can't afford to dilly-dally. Returns the number of
* bytes processed from cp.
*/
-static size_t crng_fast_load(const char *cp, size_t len)
+static size_t crng_fast_load(const u8 *cp, size_t len)
{
unsigned long flags;
- char *p;
+ u8 *p;
size_t ret = 0;
if (!spin_trylock_irqsave(&primary_crng.lock, flags))
spin_unlock_irqrestore(&primary_crng.lock, flags);
return 0;
}
- p = (unsigned char *) &primary_crng.state[4];
+ p = (u8 *)&primary_crng.state[4];
while (len > 0 && crng_init_cnt < CRNG_INIT_CNT_THRESH) {
p[crng_init_cnt % CHACHA_KEY_SIZE] ^= *cp;
cp++; crng_init_cnt++; len--; ret++;
* like a fixed DMI table (for example), which might very well be
* unique to the machine, but is otherwise unvarying.
*/
-static int crng_slow_load(const char *cp, size_t len)
+static int crng_slow_load(const u8 *cp, size_t len)
{
- unsigned long flags;
- static unsigned char lfsr = 1;
- unsigned char tmp;
- unsigned i, max = CHACHA_KEY_SIZE;
- const char * src_buf = cp;
- char * dest_buf = (char *) &primary_crng.state[4];
+ unsigned long flags;
+ static u8 lfsr = 1;
+ u8 tmp;
+ unsigned int i, max = CHACHA_KEY_SIZE;
+ const u8 *src_buf = cp;
+ u8 *dest_buf = (u8 *)&primary_crng.state[4];
if (!spin_trylock_irqsave(&primary_crng.lock, flags))
return 0;
if (len > max)
max = len;
- for (i = 0; i < max ; i++) {
+ for (i = 0; i < max; i++) {
tmp = lfsr;
lfsr >>= 1;
if (tmp & 1)
return 1;
}
-static void crng_reseed(struct crng_state *crng, struct entropy_store *r)
+static void crng_reseed(struct crng_state *crng, bool use_input_pool)
{
- unsigned long flags;
- int i, num;
+ unsigned long flags;
+ int i, num;
union {
- __u8 block[CHACHA_BLOCK_SIZE];
- __u32 key[8];
+ u8 block[CHACHA_BLOCK_SIZE];
+ u32 key[8];
} buf;
- if (r) {
- num = extract_entropy(r, &buf, 32, 16, 0);
+ if (use_input_pool) {
+ num = extract_entropy(&buf, 32, 16);
if (num == 0)
return;
} else {
}
spin_lock_irqsave(&crng->lock, flags);
for (i = 0; i < 8; i++) {
- unsigned long rv;
+ unsigned long rv;
if (!arch_get_random_seed_long(&rv) &&
!arch_get_random_long(&rv))
rv = random_get_entropy();
- crng->state[i+4] ^= buf.key[i] ^ rv;
+ crng->state[i + 4] ^= buf.key[i] ^ rv;
}
memzero_explicit(&buf, sizeof(buf));
WRITE_ONCE(crng->init_time, jiffies);
crng_finalize_init(crng);
}
-static void _extract_crng(struct crng_state *crng,
- __u8 out[CHACHA_BLOCK_SIZE])
+static void _extract_crng(struct crng_state *crng, u8 out[CHACHA_BLOCK_SIZE])
{
unsigned long flags, init_time;
init_time = READ_ONCE(crng->init_time);
if (time_after(READ_ONCE(crng_global_init_time), init_time) ||
time_after(jiffies, init_time + CRNG_RESEED_INTERVAL))
- crng_reseed(crng, crng == &primary_crng ?
- &input_pool : NULL);
+ crng_reseed(crng, crng == &primary_crng);
}
spin_lock_irqsave(&crng->lock, flags);
chacha20_block(&crng->state[0], out);
spin_unlock_irqrestore(&crng->lock, flags);
}
-static void extract_crng(__u8 out[CHACHA_BLOCK_SIZE])
+static void extract_crng(u8 out[CHACHA_BLOCK_SIZE])
{
_extract_crng(select_crng(), out);
}
* enough) to mutate the CRNG key to provide backtracking protection.
*/
static void _crng_backtrack_protect(struct crng_state *crng,
- __u8 tmp[CHACHA_BLOCK_SIZE], int used)
+ u8 tmp[CHACHA_BLOCK_SIZE], int used)
{
- unsigned long flags;
- __u32 *s, *d;
- int i;
+ unsigned long flags;
+ u32 *s, *d;
+ int i;
- used = round_up(used, sizeof(__u32));
+ used = round_up(used, sizeof(u32));
if (used + CHACHA_KEY_SIZE > CHACHA_BLOCK_SIZE) {
extract_crng(tmp);
used = 0;
}
spin_lock_irqsave(&crng->lock, flags);
- s = (__u32 *) &tmp[used];
+ s = (u32 *)&tmp[used];
d = &crng->state[4];
- for (i=0; i < 8; i++)
+ for (i = 0; i < 8; i++)
*d++ ^= *s++;
spin_unlock_irqrestore(&crng->lock, flags);
}
-static void crng_backtrack_protect(__u8 tmp[CHACHA_BLOCK_SIZE], int used)
+static void crng_backtrack_protect(u8 tmp[CHACHA_BLOCK_SIZE], int used)
{
_crng_backtrack_protect(select_crng(), tmp, used);
}
static ssize_t extract_crng_user(void __user *buf, size_t nbytes)
{
ssize_t ret = 0, i = CHACHA_BLOCK_SIZE;
- __u8 tmp[CHACHA_BLOCK_SIZE] __aligned(4);
+ u8 tmp[CHACHA_BLOCK_SIZE] __aligned(4);
int large_request = (nbytes > 256);
while (nbytes) {
return ret;
}
-
/*********************************************************************
*
* Entropy input management
trace_add_device_randomness(size, _RET_IP_);
spin_lock_irqsave(&input_pool.lock, flags);
- _mix_pool_bytes(&input_pool, buf, size);
- _mix_pool_bytes(&input_pool, &time, sizeof(time));
+ _mix_pool_bytes(buf, size);
+ _mix_pool_bytes(&time, sizeof(time));
spin_unlock_irqrestore(&input_pool.lock, flags);
}
EXPORT_SYMBOL(add_device_randomness);
*/
static void add_timer_randomness(struct timer_rand_state *state, unsigned num)
{
- struct entropy_store *r;
struct {
long jiffies;
- unsigned cycles;
- unsigned num;
+ unsigned int cycles;
+ unsigned int num;
} sample;
long delta, delta2, delta3;
sample.jiffies = jiffies;
sample.cycles = random_get_entropy();
sample.num = num;
- r = &input_pool;
- mix_pool_bytes(r, &sample, sizeof(sample));
+ mix_pool_bytes(&sample, sizeof(sample));
/*
* Calculate number of bits of randomness we probably added.
* Round down by 1 bit on general principles,
* and limit entropy estimate to 12 bits.
*/
- credit_entropy_bits(r, min_t(int, fls(delta>>1), 11));
+ credit_entropy_bits(min_t(int, fls(delta >> 1), 11));
}
void add_input_randomness(unsigned int type, unsigned int code,
- unsigned int value)
+ unsigned int value)
{
static unsigned char last_value;
last_value = value;
add_timer_randomness(&input_timer_state,
(type << 4) ^ code ^ (code >> 4) ^ value);
- trace_add_input_randomness(ENTROPY_BITS(&input_pool));
+ trace_add_input_randomness(POOL_ENTROPY_BITS());
}
EXPORT_SYMBOL_GPL(add_input_randomness);
#ifdef ADD_INTERRUPT_BENCH
static unsigned long avg_cycles, avg_deviation;
-#define AVG_SHIFT 8 /* Exponential average factor k=1/256 */
-#define FIXED_1_2 (1 << (AVG_SHIFT-1))
+#define AVG_SHIFT 8 /* Exponential average factor k=1/256 */
+#define FIXED_1_2 (1 << (AVG_SHIFT - 1))
static void add_interrupt_bench(cycles_t start)
{
- long delta = random_get_entropy() - start;
+ long delta = random_get_entropy() - start;
- /* Use a weighted moving average */
- delta = delta - ((avg_cycles + FIXED_1_2) >> AVG_SHIFT);
- avg_cycles += delta;
- /* And average deviation */
- delta = abs(delta) - ((avg_deviation + FIXED_1_2) >> AVG_SHIFT);
- avg_deviation += delta;
+ /* Use a weighted moving average */
+ delta = delta - ((avg_cycles + FIXED_1_2) >> AVG_SHIFT);
+ avg_cycles += delta;
+ /* And average deviation */
+ delta = abs(delta) - ((avg_deviation + FIXED_1_2) >> AVG_SHIFT);
+ avg_deviation += delta;
}
#else
#define add_interrupt_bench(x)
#endif
-static __u32 get_reg(struct fast_pool *f, struct pt_regs *regs)
+static u32 get_reg(struct fast_pool *f, struct pt_regs *regs)
{
- __u32 *ptr = (__u32 *) regs;
+ u32 *ptr = (u32 *)regs;
unsigned int idx;
if (regs == NULL)
return 0;
idx = READ_ONCE(f->reg_idx);
- if (idx >= sizeof(struct pt_regs) / sizeof(__u32))
+ if (idx >= sizeof(struct pt_regs) / sizeof(u32))
idx = 0;
ptr += idx++;
WRITE_ONCE(f->reg_idx, idx);
void add_interrupt_randomness(int irq)
{
- struct entropy_store *r;
- struct fast_pool *fast_pool = this_cpu_ptr(&irq_randomness);
- struct pt_regs *regs = get_irq_regs();
- unsigned long now = jiffies;
- cycles_t cycles = random_get_entropy();
- __u32 c_high, j_high;
- __u64 ip;
+ struct fast_pool *fast_pool = this_cpu_ptr(&irq_randomness);
+ struct pt_regs *regs = get_irq_regs();
+ unsigned long now = jiffies;
+ cycles_t cycles = random_get_entropy();
+ u32 c_high, j_high;
+ u64 ip;
if (cycles == 0)
cycles = get_reg(fast_pool, regs);
fast_pool->pool[1] ^= now ^ c_high;
ip = regs ? instruction_pointer(regs) : _RET_IP_;
fast_pool->pool[2] ^= ip;
- fast_pool->pool[3] ^= (sizeof(ip) > 4) ? ip >> 32 :
- get_reg(fast_pool, regs);
+ fast_pool->pool[3] ^=
+ (sizeof(ip) > 4) ? ip >> 32 : get_reg(fast_pool, regs);
fast_mix(fast_pool);
add_interrupt_bench(cycles);
if (unlikely(crng_init == 0)) {
if ((fast_pool->count >= 64) &&
- crng_fast_load((char *) fast_pool->pool,
- sizeof(fast_pool->pool)) > 0) {
+ crng_fast_load((u8 *)fast_pool->pool, sizeof(fast_pool->pool)) > 0) {
fast_pool->count = 0;
fast_pool->last = now;
}
return;
}
- if ((fast_pool->count < 64) &&
- !time_after(now, fast_pool->last + HZ))
+ if ((fast_pool->count < 64) && !time_after(now, fast_pool->last + HZ))
return;
- r = &input_pool;
- if (!spin_trylock(&r->lock))
+ if (!spin_trylock(&input_pool.lock))
return;
fast_pool->last = now;
- __mix_pool_bytes(r, &fast_pool->pool, sizeof(fast_pool->pool));
- spin_unlock(&r->lock);
+ __mix_pool_bytes(&fast_pool->pool, sizeof(fast_pool->pool));
+ spin_unlock(&input_pool.lock);
fast_pool->count = 0;
/* award one bit for the contents of the fast pool */
- credit_entropy_bits(r, 1);
+ credit_entropy_bits(1);
}
EXPORT_SYMBOL_GPL(add_interrupt_randomness);
return;
/* first major is 1, so we get >= 0x200 here */
add_timer_randomness(disk->random, 0x100 + disk_devt(disk));
- trace_add_disk_randomness(disk_devt(disk), ENTROPY_BITS(&input_pool));
+ trace_add_disk_randomness(disk_devt(disk), POOL_ENTROPY_BITS());
}
EXPORT_SYMBOL_GPL(add_disk_randomness);
#endif
* This function decides how many bytes to actually take from the
* given pool, and also debits the entropy count accordingly.
*/
-static size_t account(struct entropy_store *r, size_t nbytes, int min,
- int reserved)
+static size_t account(size_t nbytes, int min)
{
- int entropy_count, orig, have_bytes;
+ int entropy_count, orig;
size_t ibytes, nfrac;
- BUG_ON(r->entropy_count > r->poolinfo->poolfracbits);
+ BUG_ON(input_pool.entropy_count > POOL_FRACBITS);
/* Can we pull enough? */
retry:
- entropy_count = orig = READ_ONCE(r->entropy_count);
- ibytes = nbytes;
- /* never pull more than available */
- have_bytes = entropy_count >> (ENTROPY_SHIFT + 3);
-
- if ((have_bytes -= reserved) < 0)
- have_bytes = 0;
- ibytes = min_t(size_t, ibytes, have_bytes);
- if (ibytes < min)
- ibytes = 0;
-
+ entropy_count = orig = READ_ONCE(input_pool.entropy_count);
if (WARN_ON(entropy_count < 0)) {
- pr_warn("negative entropy count: pool %s count %d\n",
- r->name, entropy_count);
+ pr_warn("negative entropy count: count %d\n", entropy_count);
entropy_count = 0;
}
- nfrac = ibytes << (ENTROPY_SHIFT + 3);
- if ((size_t) entropy_count > nfrac)
+
+ /* never pull more than available */
+ ibytes = min_t(size_t, nbytes, entropy_count >> (POOL_ENTROPY_SHIFT + 3));
+ if (ibytes < min)
+ ibytes = 0;
+ nfrac = ibytes << (POOL_ENTROPY_SHIFT + 3);
+ if ((size_t)entropy_count > nfrac)
entropy_count -= nfrac;
else
entropy_count = 0;
- if (cmpxchg(&r->entropy_count, orig, entropy_count) != orig)
+ if (cmpxchg(&input_pool.entropy_count, orig, entropy_count) != orig)
goto retry;
- trace_debit_entropy(r->name, 8 * ibytes);
- if (ibytes && ENTROPY_BITS(r) < random_write_wakeup_bits) {
+ trace_debit_entropy(8 * ibytes);
+ if (ibytes && POOL_ENTROPY_BITS() < random_write_wakeup_bits) {
wake_up_interruptible(&random_write_wait);
kill_fasync(&fasync, SIGIO, POLL_OUT);
}
*
* Note: we assume that .poolwords is a multiple of 16 words.
*/
-static void extract_buf(struct entropy_store *r, __u8 *out)
+static void extract_buf(u8 *out)
{
struct blake2s_state state __aligned(__alignof__(unsigned long));
u8 hash[BLAKE2S_HASH_SIZE];
}
/* Generate a hash across the pool */
- spin_lock_irqsave(&r->lock, flags);
- blake2s_update(&state, (const u8 *)r->pool,
- r->poolinfo->poolwords * sizeof(*r->pool));
+ spin_lock_irqsave(&input_pool.lock, flags);
+ blake2s_update(&state, (const u8 *)input_pool_data, POOL_BYTES);
blake2s_final(&state, hash); /* final zeros out state */
/*
* brute-forcing the feedback as hard as brute-forcing the
* hash.
*/
- __mix_pool_bytes(r, hash, sizeof(hash));
- spin_unlock_irqrestore(&r->lock, flags);
+ __mix_pool_bytes(hash, sizeof(hash));
+ spin_unlock_irqrestore(&input_pool.lock, flags);
/* Note that EXTRACT_SIZE is half of hash size here, because above
* we've dumped the full length back into mixer. By reducing the
memzero_explicit(hash, sizeof(hash));
}
-static ssize_t _extract_entropy(struct entropy_store *r, void *buf,
- size_t nbytes, int fips)
+static ssize_t _extract_entropy(void *buf, size_t nbytes)
{
ssize_t ret = 0, i;
- __u8 tmp[EXTRACT_SIZE];
- unsigned long flags;
+ u8 tmp[EXTRACT_SIZE];
while (nbytes) {
- extract_buf(r, tmp);
-
- if (fips) {
- spin_lock_irqsave(&r->lock, flags);
- if (!memcmp(tmp, r->last_data, EXTRACT_SIZE))
- panic("Hardware RNG duplicated output!\n");
- memcpy(r->last_data, tmp, EXTRACT_SIZE);
- spin_unlock_irqrestore(&r->lock, flags);
- }
+ extract_buf(tmp);
i = min_t(int, nbytes, EXTRACT_SIZE);
memcpy(buf, tmp, i);
nbytes -= i;
* returns it in a buffer.
*
* The min parameter specifies the minimum amount we can pull before
- * failing to avoid races that defeat catastrophic reseeding while the
- * reserved parameter indicates how much entropy we must leave in the
- * pool after each pull to avoid starving other readers.
+ * failing to avoid races that defeat catastrophic reseeding.
*/
-static ssize_t extract_entropy(struct entropy_store *r, void *buf,
- size_t nbytes, int min, int reserved)
+static ssize_t extract_entropy(void *buf, size_t nbytes, int min)
{
- __u8 tmp[EXTRACT_SIZE];
- unsigned long flags;
-
- /* if last_data isn't primed, we need EXTRACT_SIZE extra bytes */
- if (fips_enabled) {
- spin_lock_irqsave(&r->lock, flags);
- if (!r->last_data_init) {
- r->last_data_init = 1;
- spin_unlock_irqrestore(&r->lock, flags);
- trace_extract_entropy(r->name, EXTRACT_SIZE,
- ENTROPY_BITS(r), _RET_IP_);
- extract_buf(r, tmp);
- spin_lock_irqsave(&r->lock, flags);
- memcpy(r->last_data, tmp, EXTRACT_SIZE);
- }
- spin_unlock_irqrestore(&r->lock, flags);
- }
-
- trace_extract_entropy(r->name, nbytes, ENTROPY_BITS(r), _RET_IP_);
- nbytes = account(r, nbytes, min, reserved);
-
- return _extract_entropy(r, buf, nbytes, fips_enabled);
+ trace_extract_entropy(nbytes, POOL_ENTROPY_BITS(), _RET_IP_);
+ nbytes = account(nbytes, min);
+ return _extract_entropy(buf, nbytes);
}
#define warn_unseeded_randomness(previous) \
- _warn_unseeded_randomness(__func__, (void *) _RET_IP_, (previous))
+ _warn_unseeded_randomness(__func__, (void *)_RET_IP_, (previous))
-static void _warn_unseeded_randomness(const char *func_name, void *caller,
- void **previous)
+static void _warn_unseeded_randomness(const char *func_name, void *caller, void **previous)
{
#ifdef CONFIG_WARN_ALL_UNSEEDED_RANDOM
const bool print_once = false;
static bool print_once __read_mostly;
#endif
- if (print_once ||
- crng_ready() ||
+ if (print_once || crng_ready() ||
(previous && (caller == READ_ONCE(*previous))))
return;
WRITE_ONCE(*previous, caller);
print_once = true;
#endif
if (__ratelimit(&unseeded_warning))
- printk_deferred(KERN_NOTICE "random: %s called from %pS "
- "with crng_init=%d\n", func_name, caller,
- crng_init);
+ printk_deferred(KERN_NOTICE "random: %s called from %pS with crng_init=%d\n",
+ func_name, caller, crng_init);
}
/*
*/
static void _get_random_bytes(void *buf, int nbytes)
{
- __u8 tmp[CHACHA_BLOCK_SIZE] __aligned(4);
+ u8 tmp[CHACHA_BLOCK_SIZE] __aligned(4);
trace_get_random_bytes(nbytes, _RET_IP_);
}
EXPORT_SYMBOL(get_random_bytes);
-
/*
* Each time the timer fires, we expect that we got an unpredictable
* jump in the cycle counter. Even if the timer is running on another
*/
static void entropy_timer(struct timer_list *t)
{
- credit_entropy_bits(&input_pool, 1);
+ credit_entropy_bits(1);
}
/*
timer_setup_on_stack(&stack.timer, entropy_timer, 0);
while (!crng_ready()) {
if (!timer_pending(&stack.timer))
- mod_timer(&stack.timer, jiffies+1);
- mix_pool_bytes(&input_pool, &stack.now, sizeof(stack.now));
+ mod_timer(&stack.timer, jiffies + 1);
+ mix_pool_bytes(&stack.now, sizeof(stack.now));
schedule();
stack.now = random_get_entropy();
}
del_timer_sync(&stack.timer);
destroy_timer_on_stack(&stack.timer);
- mix_pool_bytes(&input_pool, &stack.now, sizeof(stack.now));
+ mix_pool_bytes(&stack.now, sizeof(stack.now));
}
/*
int __must_check get_random_bytes_arch(void *buf, int nbytes)
{
int left = nbytes;
- char *p = buf;
+ u8 *p = buf;
trace_get_random_bytes_arch(left, _RET_IP_);
while (left) {
/*
* init_std_data - initialize pool with system data
*
- * @r: pool to initialize
- *
* This function clears the pool's entropy count and mixes some system
* data into the pool to prepare it for use. The pool is not cleared
* as that can only decrease the entropy in the pool.
*/
-static void __init init_std_data(struct entropy_store *r)
+static void __init init_std_data(void)
{
int i;
ktime_t now = ktime_get_real();
unsigned long rv;
- mix_pool_bytes(r, &now, sizeof(now));
- for (i = r->poolinfo->poolbytes; i > 0; i -= sizeof(rv)) {
+ mix_pool_bytes(&now, sizeof(now));
+ for (i = POOL_BYTES; i > 0; i -= sizeof(rv)) {
if (!arch_get_random_seed_long(&rv) &&
!arch_get_random_long(&rv))
rv = random_get_entropy();
- mix_pool_bytes(r, &rv, sizeof(rv));
+ mix_pool_bytes(&rv, sizeof(rv));
}
- mix_pool_bytes(r, utsname(), sizeof(*(utsname())));
+ mix_pool_bytes(utsname(), sizeof(*(utsname())));
}
/*
*/
int __init rand_initialize(void)
{
- init_std_data(&input_pool);
+ init_std_data();
if (crng_need_final_init)
crng_finalize_init(&primary_crng);
crng_initialize_primary(&primary_crng);
}
#endif
-static ssize_t
-urandom_read_nowarn(struct file *file, char __user *buf, size_t nbytes,
- loff_t *ppos)
+static ssize_t urandom_read_nowarn(struct file *file, char __user *buf,
+ size_t nbytes, loff_t *ppos)
{
int ret;
- nbytes = min_t(size_t, nbytes, INT_MAX >> (ENTROPY_SHIFT + 3));
+ nbytes = min_t(size_t, nbytes, INT_MAX >> (POOL_ENTROPY_SHIFT + 3));
ret = extract_crng_user(buf, nbytes);
- trace_urandom_read(8 * nbytes, 0, ENTROPY_BITS(&input_pool));
+ trace_urandom_read(8 * nbytes, 0, POOL_ENTROPY_BITS());
return ret;
}
-static ssize_t
-urandom_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos)
+static ssize_t urandom_read(struct file *file, char __user *buf, size_t nbytes,
+ loff_t *ppos)
{
static int maxwarn = 10;
return urandom_read_nowarn(file, buf, nbytes, ppos);
}
-static ssize_t
-random_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos)
+static ssize_t random_read(struct file *file, char __user *buf, size_t nbytes,
+ loff_t *ppos)
{
int ret;
return urandom_read_nowarn(file, buf, nbytes, ppos);
}
-static __poll_t
-random_poll(struct file *file, poll_table * wait)
+static __poll_t random_poll(struct file *file, poll_table *wait)
{
__poll_t mask;
mask = 0;
if (crng_ready())
mask |= EPOLLIN | EPOLLRDNORM;
- if (ENTROPY_BITS(&input_pool) < random_write_wakeup_bits)
+ if (POOL_ENTROPY_BITS() < random_write_wakeup_bits)
mask |= EPOLLOUT | EPOLLWRNORM;
return mask;
}
-static int
-write_pool(struct entropy_store *r, const char __user *buffer, size_t count)
+static int write_pool(const char __user *buffer, size_t count)
{
size_t bytes;
- __u32 t, buf[16];
+ u32 t, buf[16];
const char __user *p = buffer;
while (count > 0) {
if (copy_from_user(&buf, p, bytes))
return -EFAULT;
- for (b = bytes ; b > 0 ; b -= sizeof(__u32), i++) {
+ for (b = bytes; b > 0; b -= sizeof(u32), i++) {
if (!arch_get_random_int(&t))
break;
buf[i] ^= t;
count -= bytes;
p += bytes;
- mix_pool_bytes(r, buf, bytes);
+ mix_pool_bytes(buf, bytes);
cond_resched();
}
{
size_t ret;
- ret = write_pool(&input_pool, buffer, count);
+ ret = write_pool(buffer, count);
if (ret)
return ret;
switch (cmd) {
case RNDGETENTCNT:
/* inherently racy, no point locking */
- ent_count = ENTROPY_BITS(&input_pool);
+ ent_count = POOL_ENTROPY_BITS();
if (put_user(ent_count, p))
return -EFAULT;
return 0;
return -EPERM;
if (get_user(ent_count, p))
return -EFAULT;
- return credit_entropy_bits_safe(&input_pool, ent_count);
+ return credit_entropy_bits_safe(ent_count);
case RNDADDENTROPY:
if (!capable(CAP_SYS_ADMIN))
return -EPERM;
return -EINVAL;
if (get_user(size, p++))
return -EFAULT;
- retval = write_pool(&input_pool, (const char __user *)p,
- size);
+ retval = write_pool((const char __user *)p, size);
if (retval < 0)
return retval;
- return credit_entropy_bits_safe(&input_pool, ent_count);
+ return credit_entropy_bits_safe(ent_count);
case RNDZAPENTCNT:
case RNDCLEARPOOL:
/*
return -EPERM;
if (crng_init < 2)
return -ENODATA;
- crng_reseed(&primary_crng, &input_pool);
+ crng_reseed(&primary_crng, true);
WRITE_ONCE(crng_global_init_time, jiffies - 1);
return 0;
default:
}
const struct file_operations random_fops = {
- .read = random_read,
+ .read = random_read,
.write = random_write,
- .poll = random_poll,
+ .poll = random_poll,
.unlocked_ioctl = random_ioctl,
.compat_ioctl = compat_ptr_ioctl,
.fasync = random_fasync,
};
const struct file_operations urandom_fops = {
- .read = urandom_read,
+ .read = urandom_read,
.write = random_write,
.unlocked_ioctl = random_ioctl,
.compat_ioctl = compat_ptr_ioctl,
.llseek = noop_llseek,
};
-SYSCALL_DEFINE3(getrandom, char __user *, buf, size_t, count,
- unsigned int, flags)
+SYSCALL_DEFINE3(getrandom, char __user *, buf, size_t, count, unsigned int,
+ flags)
{
int ret;
- if (flags & ~(GRND_NONBLOCK|GRND_RANDOM|GRND_INSECURE))
+ if (flags & ~(GRND_NONBLOCK | GRND_RANDOM | GRND_INSECURE))
return -EINVAL;
/*
* Requesting insecure and blocking randomness at the same time makes
* no sense.
*/
- if ((flags & (GRND_INSECURE|GRND_RANDOM)) == (GRND_INSECURE|GRND_RANDOM))
+ if ((flags & (GRND_INSECURE | GRND_RANDOM)) == (GRND_INSECURE | GRND_RANDOM))
return -EINVAL;
if (count > INT_MAX)
#include <linux/sysctl.h>
static int min_write_thresh;
-static int max_write_thresh = INPUT_POOL_WORDS * 32;
+static int max_write_thresh = POOL_BITS;
static int random_min_urandom_seed = 60;
static char sysctl_bootid[16];
* returned as an ASCII string in the standard UUID format; if via the
* sysctl system call, as 16 bytes of binary data.
*/
-static int proc_do_uuid(struct ctl_table *table, int write,
- void *buffer, size_t *lenp, loff_t *ppos)
+static int proc_do_uuid(struct ctl_table *table, int write, void *buffer,
+ size_t *lenp, loff_t *ppos)
{
struct ctl_table fake_table;
unsigned char buf[64], tmp_uuid[16], *uuid;
/*
* Return entropy available scaled to integral bits
*/
-static int proc_do_entropy(struct ctl_table *table, int write,
- void *buffer, size_t *lenp, loff_t *ppos)
+static int proc_do_entropy(struct ctl_table *table, int write, void *buffer,
+ size_t *lenp, loff_t *ppos)
{
struct ctl_table fake_table;
int entropy_count;
- entropy_count = *(int *)table->data >> ENTROPY_SHIFT;
+ entropy_count = *(int *)table->data >> POOL_ENTROPY_SHIFT;
fake_table.data = &entropy_count;
fake_table.maxlen = sizeof(entropy_count);
return proc_dointvec(&fake_table, write, buffer, lenp, ppos);
}
-static int sysctl_poolsize = INPUT_POOL_WORDS * 32;
-extern struct ctl_table random_table[];
-struct ctl_table random_table[] = {
+static int sysctl_poolsize = POOL_BITS;
+static struct ctl_table random_table[] = {
{
.procname = "poolsize",
.data = &sysctl_poolsize,
#endif
{ }
};
-#endif /* CONFIG_SYSCTL */
+
+/*
+ * rand_initialize() is called before sysctl_init(),
+ * so we cannot call register_sysctl_init() in rand_initialize()
+ */
+static int __init random_sysctls_init(void)
+{
+ register_sysctl_init("kernel/random", random_table);
+ return 0;
+}
+device_initcall(random_sysctls_init);
+#endif /* CONFIG_SYSCTL */
struct batched_entropy {
union {
* point prior.
*/
static DEFINE_PER_CPU(struct batched_entropy, batched_entropy_u64) = {
- .batch_lock = __SPIN_LOCK_UNLOCKED(batched_entropy_u64.lock),
+ .batch_lock = __SPIN_LOCK_UNLOCKED(batched_entropy_u64.lock),
};
u64 get_random_u64(void)
EXPORT_SYMBOL(get_random_u64);
static DEFINE_PER_CPU(struct batched_entropy, batched_entropy_u32) = {
- .batch_lock = __SPIN_LOCK_UNLOCKED(batched_entropy_u32.lock),
+ .batch_lock = __SPIN_LOCK_UNLOCKED(batched_entropy_u32.lock),
};
u32 get_random_u32(void)
{
int cpu;
unsigned long flags;
- for_each_possible_cpu (cpu) {
+ for_each_possible_cpu(cpu) {
struct batched_entropy *batched_entropy;
batched_entropy = per_cpu_ptr(&batched_entropy_u32, cpu);
* Return: A page aligned address within [start, start + range). On error,
* @start is returned.
*/
-unsigned long
-randomize_page(unsigned long start, unsigned long range)
+unsigned long randomize_page(unsigned long start, unsigned long range)
{
if (!PAGE_ALIGNED(start)) {
range -= PAGE_ALIGN(start) - start;
void add_hwgenerator_randomness(const char *buffer, size_t count,
size_t entropy)
{
- struct entropy_store *poolp = &input_pool;
-
if (unlikely(crng_init == 0)) {
size_t ret = crng_fast_load(buffer, count);
- mix_pool_bytes(poolp, buffer, ret);
+ mix_pool_bytes(buffer, ret);
count -= ret;
buffer += ret;
if (!count || crng_init == 0)
*/
wait_event_interruptible(random_write_wait,
!system_wq || kthread_should_stop() ||
- ENTROPY_BITS(&input_pool) <= random_write_wakeup_bits);
- mix_pool_bytes(poolp, buffer, count);
- credit_entropy_bits(poolp, entropy);
+ POOL_ENTROPY_BITS() <= random_write_wakeup_bits);
+ mix_pool_bytes(buffer, count);
+ credit_entropy_bits(entropy);
}
EXPORT_SYMBOL_GPL(add_hwgenerator_randomness);
clk_prepare(data->clk[i].hw.clk);
}
- err = of_clk_add_hw_provider(client->dev.of_node, of_clk_si5341_get,
+ err = devm_of_clk_add_hw_provider(&client->dev, of_clk_si5341_get,
data);
if (err) {
dev_err(&client->dev, "unable to add clk provider\n");
-// SPDX-License-Identifier: GPL-1.0
+// SPDX-License-Identifier: GPL-2.0
/*
* Copyright (c) 2021 MediaTek Inc.
* Author: Sam Shih <sam.shih@mediatek.com>
-// SPDX-License-Identifier: GPL-1.0
+// SPDX-License-Identifier: GPL-2.0
/*
* Copyright (c) 2021 MediaTek Inc.
* Author: Sam Shih <sam.shih@mediatek.com>
-// SPDX-License-Identifier: GPL-1.0
+// SPDX-License-Identifier: GPL-2.0
/*
* Copyright (c) 2021 MediaTek Inc.
* Author: Sam Shih <sam.shih@mediatek.com>
{
struct clk_init_data init;
struct visconti_pll *pll;
- struct clk *pll_clk;
struct clk_hw *pll_hw_clk;
size_t len;
int ret;
pll_hw_clk = &pll->hw;
ret = clk_hw_register(NULL, &pll->hw);
if (ret) {
- pr_err("failed to register pll clock %s : %ld\n", name, PTR_ERR(pll_clk));
+ pr_err("failed to register pll clock %s : %d\n", name, ret);
kfree(pll);
pll_hw_clk = ERR_PTR(ret);
}
#ifdef CONFIG_ARM64
# define EFI_RT_VIRTUAL_LIMIT DEFAULT_MAP_WINDOW_64
+#elif defined(CONFIG_RISCV)
+# define EFI_RT_VIRTUAL_LIMIT TASK_SIZE_MIN
#else
# define EFI_RT_VIRTUAL_LIMIT TASK_SIZE
#endif
struct device *dev = &pdev->dev;
struct gpio_irq_chip *girq;
struct idt_gpio_ctrl *ctrl;
- unsigned int parent_irq;
+ int parent_irq;
int ngpios;
int ret;
return PTR_ERR(ctrl->pic);
parent_irq = platform_get_irq(pdev, 0);
- if (!parent_irq)
- return -EINVAL;
+ if (parent_irq < 0)
+ return parent_irq;
girq = &ctrl->gc.irq;
girq->chip = &idt_gpio_irqchip;
unsigned offset, int value);
struct irq_domain *irq;
- unsigned int irqn;
+ int irqn;
};
/*
}
mpc8xxx_gc->irqn = platform_get_irq(pdev, 0);
- if (!mpc8xxx_gc->irqn)
- return 0;
+ if (mpc8xxx_gc->irqn < 0)
+ return mpc8xxx_gc->irqn;
mpc8xxx_gc->irq = irq_domain_create_linear(fwnode,
MPC8XXX_GPIO_PINS,
}
if (amdgpu_sriov_vf(adev))
- amdgpu_virt_exchange_data(adev);
+ amdgpu_virt_init_data_exchange(adev);
r = amdgpu_ib_pool_init(adev);
if (r) {
if (amdgpu_gpu_recovery == -1) {
switch (adev->asic_type) {
- case CHIP_BONAIRE:
- case CHIP_HAWAII:
- case CHIP_TOPAZ:
- case CHIP_TONGA:
- case CHIP_FIJI:
- case CHIP_POLARIS10:
- case CHIP_POLARIS11:
- case CHIP_POLARIS12:
- case CHIP_VEGAM:
- case CHIP_VEGA20:
- case CHIP_VEGA10:
- case CHIP_VEGA12:
- case CHIP_RAVEN:
- case CHIP_ARCTURUS:
- case CHIP_RENOIR:
- case CHIP_NAVI10:
- case CHIP_NAVI14:
- case CHIP_NAVI12:
- case CHIP_SIENNA_CICHLID:
- case CHIP_NAVY_FLOUNDER:
- case CHIP_DIMGREY_CAVEFISH:
- case CHIP_BEIGE_GOBY:
- case CHIP_VANGOGH:
- case CHIP_ALDEBARAN:
- break;
- default:
+#ifdef CONFIG_DRM_AMDGPU_SI
+ case CHIP_VERDE:
+ case CHIP_TAHITI:
+ case CHIP_PITCAIRN:
+ case CHIP_OLAND:
+ case CHIP_HAINAN:
+#endif
+#ifdef CONFIG_DRM_AMDGPU_CIK
+ case CHIP_KAVERI:
+ case CHIP_KABINI:
+ case CHIP_MULLINS:
+#endif
+ case CHIP_CARRIZO:
+ case CHIP_STONEY:
+ case CHIP_CYAN_SKILLFISH:
goto disabled;
+ default:
+ break;
}
}
return (le32_to_cpu(bhdr->binary_signature) == BINARY_SIGNATURE);
}
+static void amdgpu_discovery_harvest_config_quirk(struct amdgpu_device *adev)
+{
+ /*
+ * So far, apply this quirk only on those Navy Flounder boards which
+ * have a bad harvest table of VCN config.
+ */
+ if ((adev->ip_versions[UVD_HWIP][1] == IP_VERSION(3, 0, 1)) &&
+ (adev->ip_versions[GC_HWIP][0] == IP_VERSION(10, 3, 2))) {
+ switch (adev->pdev->revision) {
+ case 0xC1:
+ case 0xC2:
+ case 0xC3:
+ case 0xC5:
+ case 0xC7:
+ case 0xCF:
+ case 0xDF:
+ adev->vcn.harvest_config |= AMDGPU_VCN_HARVEST_VCN1;
+ break;
+ default:
+ break;
+ }
+ }
+}
+
static int amdgpu_discovery_init(struct amdgpu_device *adev)
{
struct table_info *info;
break;
}
}
- /* some IP discovery tables on Navy Flounder don't have this set correctly */
- if ((adev->ip_versions[UVD_HWIP][1] == IP_VERSION(3, 0, 1)) &&
- (adev->ip_versions[GC_HWIP][0] == IP_VERSION(10, 3, 2)) &&
- (adev->pdev->revision != 0xFF))
- adev->vcn.harvest_config |= AMDGPU_VCN_HARVEST_VCN1;
+
+ amdgpu_discovery_harvest_config_quirk(adev);
+
if (vcn_harvest_count == adev->vcn.num_vcn_inst) {
adev->harvest_ip_mask |= AMD_HARVEST_IP_VCN_MASK;
adev->harvest_ip_mask |= AMD_HARVEST_IP_JPEG_MASK;
return -ENODEV;
}
- if (flags == 0) {
- DRM_INFO("Unsupported asic. Remove me when IP discovery init is in place.\n");
- return -ENODEV;
- }
-
if (amdgpu_virtual_display ||
amdgpu_device_asic_has_dc_support(flags & AMD_ASIC_MASK))
supports_atomic = true;
adev->virt.fw_reserve.p_vf2pf = NULL;
adev->virt.vf2pf_update_interval_ms = 0;
- if (adev->bios != NULL) {
- adev->virt.vf2pf_update_interval_ms = 2000;
+ if (adev->mman.fw_vram_usage_va != NULL) {
+ /* go through this logic in ip_init and reset to init workqueue*/
+ amdgpu_virt_exchange_data(adev);
+ INIT_DELAYED_WORK(&adev->virt.vf2pf_work, amdgpu_virt_update_vf2pf_work_item);
+ schedule_delayed_work(&(adev->virt.vf2pf_work), msecs_to_jiffies(adev->virt.vf2pf_update_interval_ms));
+ } else if (adev->bios != NULL) {
+ /* got through this logic in early init stage to get necessary flags, e.g. rlcg_acc related*/
adev->virt.fw_reserve.p_pf2vf =
(struct amd_sriov_msg_pf2vf_info_header *)
(adev->bios + (AMD_SRIOV_MSG_PF2VF_OFFSET_KB << 10));
amdgpu_virt_read_pf2vf_data(adev);
}
-
- if (adev->virt.vf2pf_update_interval_ms != 0) {
- INIT_DELAYED_WORK(&adev->virt.vf2pf_work, amdgpu_virt_update_vf2pf_work_item);
- schedule_delayed_work(&(adev->virt.vf2pf_work), msecs_to_jiffies(adev->virt.vf2pf_update_interval_ms));
- }
}
if (adev->virt.ras_init_done)
amdgpu_virt_add_bad_page(adev, bp_block_offset, bp_block_size);
}
- } else if (adev->bios != NULL) {
- adev->virt.fw_reserve.p_pf2vf =
- (struct amd_sriov_msg_pf2vf_info_header *)
- (adev->bios + (AMD_SRIOV_MSG_PF2VF_OFFSET_KB << 10));
-
- amdgpu_virt_read_pf2vf_data(adev);
}
}
{
int r;
+ /* APUs don't have full asic reset */
+ if (adev->flags & AMD_IS_APU)
+ return 0;
+
if (cik_asic_reset_method(adev) == AMD_RESET_METHOD_BACO) {
dev_info(adev->dev, "BACO reset\n");
r = amdgpu_dpm_baco_reset(adev);
{
int r;
+ /* APUs don't have full asic reset */
+ if (adev->flags & AMD_IS_APU)
+ return 0;
+
if (vi_asic_reset_method(adev) == AMD_RESET_METHOD_BACO) {
dev_info(adev->dev, "BACO reset\n");
r = amdgpu_dpm_baco_reset(adev);
#include "clk/clk_11_0_0_offset.h"
#include "clk/clk_11_0_0_sh_mask.h"
-#include "irq/dcn20/irq_service_dcn20.h"
#undef FN
#define FN(reg_name, field_name) \
bool force_reset = false;
bool p_state_change_support;
int total_plane_count;
- int irq_src;
- uint32_t hpd_state;
if (dc->work_arounds.skip_clock_update)
return;
if (dc->res_pool->pp_smu)
pp_smu = &dc->res_pool->pp_smu->nv_funcs;
- for (irq_src = DC_IRQ_SOURCE_HPD1; irq_src <= DC_IRQ_SOURCE_HPD6; irq_src++) {
- hpd_state = dc_get_hpd_state_dcn20(dc->res_pool->irqs, irq_src);
- if (hpd_state)
- break;
- }
-
- if (display_count == 0 && !hpd_state)
+ if (display_count == 0)
enter_display_off = true;
if (enter_display_off == safe_to_lower) {
#include "clk/clk_10_0_2_sh_mask.h"
#include "renoir_ip_offset.h"
-#include "irq/dcn21/irq_service_dcn21.h"
/* Constants */
struct dc_clocks *new_clocks = &context->bw_ctx.bw.dcn.clk;
struct dc *dc = clk_mgr_base->ctx->dc;
int display_count;
- int irq_src;
bool update_dppclk = false;
bool update_dispclk = false;
bool dpp_clock_lowered = false;
- uint32_t hpd_state;
struct dmcu *dmcu = clk_mgr_base->ctx->dc->res_pool->dmcu;
display_count = rn_get_active_display_cnt_wa(dc, context);
- for (irq_src = DC_IRQ_SOURCE_HPD1; irq_src <= DC_IRQ_SOURCE_HPD5; irq_src++) {
- hpd_state = dc_get_hpd_state_dcn21(dc->res_pool->irqs, irq_src);
- if (hpd_state)
- break;
- }
-
/* if we can go lower, go lower */
- if (display_count == 0 && !hpd_state) {
+ if (display_count == 0) {
rn_vbios_smu_set_dcn_low_power_state(clk_mgr, DCN_PWR_STATE_LOW_POWER);
/* update power state */
clk_mgr_base->clks.pwr_state = DCN_PWR_STATE_LOW_POWER;
}
}
-uint32_t dc_get_hpd_state_dcn20(struct irq_service *irq_service, enum dc_irq_source source)
-{
- const struct irq_source_info *info;
- uint32_t addr;
- uint32_t value;
- uint32_t current_status;
-
- info = find_irq_source_info(irq_service, source);
- if (!info)
- return 0;
-
- addr = info->status_reg;
- if (!addr)
- return 0;
-
- value = dm_read_reg(irq_service->ctx, addr);
- current_status =
- get_reg_field_value(
- value,
- HPD0_DC_HPD_INT_STATUS,
- DC_HPD_SENSE);
-
- return current_status;
-}
-
static bool hpd_ack(
struct irq_service *irq_service,
const struct irq_source_info *info)
struct irq_service *dal_irq_service_dcn20_create(
struct irq_service_init_data *init_data);
-uint32_t dc_get_hpd_state_dcn20(struct irq_service *irq_service, enum dc_irq_source source);
-
#endif
return DC_IRQ_SOURCE_INVALID;
}
-uint32_t dc_get_hpd_state_dcn21(struct irq_service *irq_service, enum dc_irq_source source)
-{
- const struct irq_source_info *info;
- uint32_t addr;
- uint32_t value;
- uint32_t current_status;
-
- info = find_irq_source_info(irq_service, source);
- if (!info)
- return 0;
-
- addr = info->status_reg;
- if (!addr)
- return 0;
-
- value = dm_read_reg(irq_service->ctx, addr);
- current_status =
- get_reg_field_value(
- value,
- HPD0_DC_HPD_INT_STATUS,
- DC_HPD_SENSE);
-
- return current_status;
-}
-
static bool hpd_ack(
struct irq_service *irq_service,
const struct irq_source_info *info)
struct irq_service *dal_irq_service_dcn21_create(
struct irq_service_init_data *init_data);
-uint32_t dc_get_hpd_state_dcn21(struct irq_service *irq_service, enum dc_irq_source source);
-
#endif
*irq_service = NULL;
}
-const struct irq_source_info *find_irq_source_info(
+static const struct irq_source_info *find_irq_source_info(
struct irq_service *irq_service,
enum dc_irq_source source)
{
const struct irq_service_funcs *funcs;
};
-const struct irq_source_info *find_irq_source_info(
- struct irq_service *irq_service,
- enum dc_irq_source source);
-
void dal_irq_service_construct(
struct irq_service *irq_service,
struct irq_service_init_data *init_data);
mutex_init(&mgr->probe_lock);
#if IS_ENABLED(CONFIG_DRM_DEBUG_DP_MST_TOPOLOGY_REFS)
mutex_init(&mgr->topology_ref_history_lock);
+ stack_depot_init();
#endif
INIT_LIST_HEAD(&mgr->tx_msg_downq);
INIT_LIST_HEAD(&mgr->destroy_port_list);
add_hole(&mm->head_node);
mm->scan_active = 0;
+
+#ifdef CONFIG_DRM_DEBUG_MM
+ stack_depot_init();
+#endif
}
EXPORT_SYMBOL(drm_mm_init);
kfree(buf);
}
+
+static void __drm_stack_depot_init(void)
+{
+ stack_depot_init();
+}
#else /* CONFIG_DRM_DEBUG_MODESET_LOCK */
static depot_stack_handle_t __drm_stack_depot_save(void)
{
static void __drm_stack_depot_print(depot_stack_handle_t stack_depot)
{
}
+static void __drm_stack_depot_init(void)
+{
+}
#endif /* CONFIG_DRM_DEBUG_MODESET_LOCK */
/**
{
ww_mutex_init(&lock->mutex, &crtc_ww_class);
INIT_LIST_HEAD(&lock->head);
+ __drm_stack_depot_init();
}
EXPORT_SYMBOL(drm_modeset_lock_init);
void etnaviv_gpu_recover_hang(struct etnaviv_gpu *gpu)
{
- unsigned int i = 0;
+ unsigned int i;
dev_err(gpu->dev, "recover hung GPU!\n");
/* complete all events, the GPU won't do it after the reset */
spin_lock(&gpu->event_spinlock);
- for_each_set_bit_from(i, gpu->event_bitmap, ETNA_NR_EVENTS)
+ for_each_set_bit(i, gpu->event_bitmap, ETNA_NR_EVENTS)
complete(&gpu->event_free);
bitmap_zero(gpu->event_bitmap, ETNA_NR_EVENTS);
spin_unlock(&gpu->event_spinlock);
intel_de_rmw(dev_priv, DKL_TX_DPCNTL2(tc_port),
DKL_TX_DP20BITMODE, 0);
+
+ if (IS_ALDERLAKE_P(dev_priv)) {
+ u32 val;
+
+ if (intel_crtc_has_type(crtc_state, INTEL_OUTPUT_HDMI)) {
+ if (ln == 0) {
+ val = DKL_TX_DPCNTL2_CFG_LOADGENSELECT_TX1(0);
+ val |= DKL_TX_DPCNTL2_CFG_LOADGENSELECT_TX2(2);
+ } else {
+ val = DKL_TX_DPCNTL2_CFG_LOADGENSELECT_TX1(3);
+ val |= DKL_TX_DPCNTL2_CFG_LOADGENSELECT_TX2(3);
+ }
+ } else {
+ val = DKL_TX_DPCNTL2_CFG_LOADGENSELECT_TX1(0);
+ val |= DKL_TX_DPCNTL2_CFG_LOADGENSELECT_TX2(0);
+ }
+
+ intel_de_rmw(dev_priv, DKL_TX_DPCNTL2(tc_port),
+ DKL_TX_DPCNTL2_CFG_LOADGENSELECT_TX1_MASK |
+ DKL_TX_DPCNTL2_CFG_LOADGENSELECT_TX2_MASK,
+ val);
+ }
}
}
static const union intel_ddi_buf_trans_entry _ehl_combo_phy_trans_dp[] = {
/* NT mV Trans mV db */
{ .icl = { 0xA, 0x33, 0x3F, 0x00, 0x00 } }, /* 350 350 0.0 */
- { .icl = { 0xA, 0x47, 0x36, 0x00, 0x09 } }, /* 350 500 3.1 */
- { .icl = { 0xC, 0x64, 0x34, 0x00, 0x0B } }, /* 350 700 6.0 */
- { .icl = { 0x6, 0x7F, 0x30, 0x00, 0x0F } }, /* 350 900 8.2 */
+ { .icl = { 0xA, 0x47, 0x38, 0x00, 0x07 } }, /* 350 500 3.1 */
+ { .icl = { 0xC, 0x64, 0x33, 0x00, 0x0C } }, /* 350 700 6.0 */
+ { .icl = { 0x6, 0x7F, 0x2F, 0x00, 0x10 } }, /* 350 900 8.2 */
{ .icl = { 0xA, 0x46, 0x3F, 0x00, 0x00 } }, /* 500 500 0.0 */
- { .icl = { 0xC, 0x64, 0x38, 0x00, 0x07 } }, /* 500 700 2.9 */
+ { .icl = { 0xC, 0x64, 0x37, 0x00, 0x08 } }, /* 500 700 2.9 */
{ .icl = { 0x6, 0x7F, 0x32, 0x00, 0x0D } }, /* 500 900 5.1 */
{ .icl = { 0xC, 0x61, 0x3F, 0x00, 0x00 } }, /* 650 700 0.6 */
- { .icl = { 0x6, 0x7F, 0x38, 0x00, 0x07 } }, /* 600 900 3.5 */
+ { .icl = { 0x6, 0x7F, 0x37, 0x00, 0x08 } }, /* 600 900 3.5 */
{ .icl = { 0x6, 0x7F, 0x3F, 0x00, 0x00 } }, /* 900 900 0.0 */
};
{}
};
-static struct ctl_table i915_root[] = {
- {
- .procname = "i915",
- .maxlen = 0,
- .mode = 0555,
- .child = oa_table,
- },
- {}
-};
-
-static struct ctl_table dev_root[] = {
- {
- .procname = "dev",
- .maxlen = 0,
- .mode = 0555,
- .child = i915_root,
- },
- {}
-};
-
static void oa_init_supported_formats(struct i915_perf *perf)
{
struct drm_i915_private *i915 = perf->i915;
int i915_perf_sysctl_register(void)
{
- sysctl_header = register_sysctl_table(dev_root);
+ sysctl_header = register_sysctl("dev/i915", oa_table);
return 0;
}
_DKL_PHY2_BASE) + \
_DKL_TX_DPCNTL1)
-#define _DKL_TX_DPCNTL2 0x2C8
-#define DKL_TX_DP20BITMODE (1 << 2)
+#define _DKL_TX_DPCNTL2 0x2C8
+#define DKL_TX_DP20BITMODE REG_BIT(2)
+#define DKL_TX_DPCNTL2_CFG_LOADGENSELECT_TX1_MASK REG_GENMASK(4, 3)
+#define DKL_TX_DPCNTL2_CFG_LOADGENSELECT_TX1(val) REG_FIELD_PREP(DKL_TX_DPCNTL2_CFG_LOADGENSELECT_TX1_MASK, (val))
+#define DKL_TX_DPCNTL2_CFG_LOADGENSELECT_TX2_MASK REG_GENMASK(6, 5)
+#define DKL_TX_DPCNTL2_CFG_LOADGENSELECT_TX2(val) REG_FIELD_PREP(DKL_TX_DPCNTL2_CFG_LOADGENSELECT_TX2_MASK, (val))
#define DKL_TX_DPCNTL2(tc_port) _MMIO(_PORT(tc_port, \
_DKL_PHY1_BASE, \
_DKL_PHY2_BASE) + \
static void init_intel_runtime_pm_wakeref(struct intel_runtime_pm *rpm)
{
spin_lock_init(&rpm->debug.lock);
+
+ if (rpm->available)
+ stack_depot_init();
}
static noinline depot_stack_handle_t
fpriv = kzalloc(sizeof(*fpriv), GFP_KERNEL);
if (unlikely(!fpriv)) {
r = -ENOMEM;
- goto out_suspend;
+ goto err_suspend;
}
if (rdev->accel_working) {
vm = &fpriv->vm;
r = radeon_vm_init(rdev, vm);
if (r)
- goto out_fpriv;
+ goto err_fpriv;
r = radeon_bo_reserve(rdev->ring_tmp_bo.bo, false);
if (r)
- goto out_vm_fini;
+ goto err_vm_fini;
/* map the ib pool buffer read only into
* virtual address space */
rdev->ring_tmp_bo.bo);
if (!vm->ib_bo_va) {
r = -ENOMEM;
- goto out_vm_fini;
+ goto err_vm_fini;
}
r = radeon_vm_bo_set_addr(rdev, vm->ib_bo_va,
RADEON_VM_PAGE_READABLE |
RADEON_VM_PAGE_SNOOPED);
if (r)
- goto out_vm_fini;
+ goto err_vm_fini;
}
file_priv->driver_priv = fpriv;
}
- if (!r)
- goto out_suspend;
+ pm_runtime_mark_last_busy(dev->dev);
+ pm_runtime_put_autosuspend(dev->dev);
+ return 0;
-out_vm_fini:
+err_vm_fini:
radeon_vm_fini(rdev, vm);
-out_fpriv:
+err_fpriv:
kfree(fpriv);
-out_suspend:
+
+err_suspend:
pm_runtime_mark_last_busy(dev->dev);
pm_runtime_put_autosuspend(dev->dev);
return r;
#define USB_DEVICE_ID_HP_X2 0x074d
#define USB_DEVICE_ID_HP_X2_10_COVER 0x0755
#define I2C_DEVICE_ID_HP_ENVY_X360_15 0x2d05
+#define I2C_DEVICE_ID_HP_ENVY_X360_15T_DR100 0x29CF
#define I2C_DEVICE_ID_HP_SPECTRE_X360_15 0x2817
#define USB_DEVICE_ID_ASUS_UX550VE_TOUCHSCREEN 0x2544
#define USB_DEVICE_ID_ASUS_UX550_TOUCHSCREEN 0x2706
HID_BATTERY_QUIRK_IGNORE },
{ HID_I2C_DEVICE(USB_VENDOR_ID_ELAN, I2C_DEVICE_ID_HP_ENVY_X360_15),
HID_BATTERY_QUIRK_IGNORE },
+ { HID_I2C_DEVICE(USB_VENDOR_ID_ELAN, I2C_DEVICE_ID_HP_ENVY_X360_15T_DR100),
+ HID_BATTERY_QUIRK_IGNORE },
{ HID_I2C_DEVICE(USB_VENDOR_ID_ELAN, I2C_DEVICE_ID_HP_SPECTRE_X360_15),
HID_BATTERY_QUIRK_IGNORE },
{ HID_I2C_DEVICE(USB_VENDOR_ID_ELAN, I2C_DEVICE_ID_SURFACE_GO_TOUCHSCREEN),
* Author: Sean O'Brien <seobrien@chromium.org>
*/
+#include <linux/device.h>
#include <linux/hid.h>
+#include <linux/kernel.h>
#include <linux/module.h>
+#include <linux/sysfs.h>
#define MIN_FN_ROW_KEY 1
#define MAX_FN_ROW_KEY 24
#define HID_VD_FN_ROW_PHYSMAP 0x00000001
#define HID_USAGE_FN_ROW_PHYSMAP (HID_UP_GOOGLEVENDOR | HID_VD_FN_ROW_PHYSMAP)
-static struct hid_driver hid_vivaldi;
-
struct vivaldi_data {
u32 function_row_physmap[MAX_FN_ROW_KEY - MIN_FN_ROW_KEY + 1];
int max_function_row_key;
return size;
}
-DEVICE_ATTR_RO(function_row_physmap);
+static DEVICE_ATTR_RO(function_row_physmap);
static struct attribute *sysfs_attrs[] = {
&dev_attr_function_row_physmap.attr,
NULL
struct hid_usage *usage)
{
struct vivaldi_data *drvdata = hid_get_drvdata(hdev);
+ struct hid_report *report = field->report;
int fn_key;
int ret;
u32 report_len;
- u8 *buf;
+ u8 *report_data, *buf;
if (field->logical != HID_USAGE_FN_ROW_PHYSMAP ||
(usage->hid & HID_USAGE_PAGE) != HID_UP_ORDINAL)
if (fn_key > drvdata->max_function_row_key)
drvdata->max_function_row_key = fn_key;
- buf = hid_alloc_report_buf(field->report, GFP_KERNEL);
- if (!buf)
+ report_data = buf = hid_alloc_report_buf(report, GFP_KERNEL);
+ if (!report_data)
return;
- report_len = hid_report_len(field->report);
- ret = hid_hw_raw_request(hdev, field->report->id, buf,
+ report_len = hid_report_len(report);
+ if (!report->id) {
+ /*
+ * hid_hw_raw_request() will stuff report ID (which will be 0)
+ * into the first byte of the buffer even for unnumbered
+ * reports, so we need to account for this to avoid getting
+ * -EOVERFLOW in return.
+ * Note that hid_alloc_report_buf() adds 7 bytes to the size
+ * so we can safely say that we have space for an extra byte.
+ */
+ report_len++;
+ }
+
+ ret = hid_hw_raw_request(hdev, report->id, report_data,
report_len, HID_FEATURE_REPORT,
HID_REQ_GET_REPORT);
if (ret < 0) {
goto out;
}
- ret = hid_report_raw_event(hdev, HID_FEATURE_REPORT, buf,
+ if (!report->id) {
+ /*
+ * Undo the damage from hid_hw_raw_request() for unnumbered
+ * reports.
+ */
+ report_data++;
+ report_len--;
+ }
+
+ ret = hid_report_raw_event(hdev, HID_FEATURE_REPORT, report_data,
report_len, 0);
if (ret) {
dev_warn(&hdev->dev, "failed to report feature %d\n",
struct uhid_device {
struct mutex devlock;
+
+ /* This flag tracks whether the HID device is usable for commands from
+ * userspace. The flag is already set before hid_add_device(), which
+ * runs in workqueue context, to allow hid_add_device() to communicate
+ * with userspace.
+ * However, if hid_add_device() fails, the flag is cleared without
+ * holding devlock.
+ * We guarantee that if @running changes from true to false while you're
+ * holding @devlock, it's still fine to access @hid.
+ */
bool running;
__u8 *rd_data;
uint rd_size;
+ /* When this is NULL, userspace may use UHID_CREATE/UHID_CREATE2. */
struct hid_device *hid;
struct uhid_event input_buf;
if (ret) {
hid_err(uhid->hid, "Cannot register HID device: error %d\n", ret);
- hid_destroy_device(uhid->hid);
- uhid->hid = NULL;
- uhid->running = false;
+ /* We used to call hid_destroy_device() here, but that's really
+ * messy to get right because we have to coordinate with
+ * concurrent writes from userspace that might be in the middle
+ * of using uhid->hid.
+ * Just leave uhid->hid as-is for now, and clean it up when
+ * userspace tries to close or reinitialize the uhid instance.
+ *
+ * However, we do have to clear the ->running flag and do a
+ * wakeup to make sure userspace knows that the device is gone.
+ */
+ WRITE_ONCE(uhid->running, false);
+ wake_up_interruptible(&uhid->report_wait);
}
}
spin_unlock_irqrestore(&uhid->qlock, flags);
ret = wait_event_interruptible_timeout(uhid->report_wait,
- !uhid->report_running || !uhid->running,
+ !uhid->report_running || !READ_ONCE(uhid->running),
5 * HZ);
- if (!ret || !uhid->running || uhid->report_running)
+ if (!ret || !READ_ONCE(uhid->running) || uhid->report_running)
ret = -EIO;
else if (ret < 0)
ret = -ERESTARTSYS;
struct uhid_event *ev;
int ret;
- if (!uhid->running)
+ if (!READ_ONCE(uhid->running))
return -EIO;
ev = kzalloc(sizeof(*ev), GFP_KERNEL);
struct uhid_event *ev;
int ret;
- if (!uhid->running || count > UHID_DATA_MAX)
+ if (!READ_ONCE(uhid->running) || count > UHID_DATA_MAX)
return -EIO;
ev = kzalloc(sizeof(*ev), GFP_KERNEL);
void *rd_data;
int ret;
- if (uhid->running)
+ if (uhid->hid)
return -EALREADY;
rd_size = ev->u.create2.rd_size;
static int uhid_dev_destroy(struct uhid_device *uhid)
{
- if (!uhid->running)
+ if (!uhid->hid)
return -EINVAL;
- uhid->running = false;
+ WRITE_ONCE(uhid->running, false);
wake_up_interruptible(&uhid->report_wait);
cancel_work_sync(&uhid->worker);
hid_destroy_device(uhid->hid);
+ uhid->hid = NULL;
kfree(uhid->rd_data);
return 0;
static int uhid_dev_input(struct uhid_device *uhid, struct uhid_event *ev)
{
- if (!uhid->running)
+ if (!READ_ONCE(uhid->running))
return -EINVAL;
hid_input_report(uhid->hid, HID_INPUT_REPORT, ev->u.input.data,
static int uhid_dev_input2(struct uhid_device *uhid, struct uhid_event *ev)
{
- if (!uhid->running)
+ if (!READ_ONCE(uhid->running))
return -EINVAL;
hid_input_report(uhid->hid, HID_INPUT_REPORT, ev->u.input2.data,
static int uhid_dev_get_report_reply(struct uhid_device *uhid,
struct uhid_event *ev)
{
- if (!uhid->running)
+ if (!READ_ONCE(uhid->running))
return -EINVAL;
uhid_report_wake_up(uhid, ev->u.get_report_reply.id, ev);
static int uhid_dev_set_report_reply(struct uhid_device *uhid,
struct uhid_event *ev)
{
- if (!uhid->running)
+ if (!READ_ONCE(uhid->running))
return -EINVAL;
uhid_report_wake_up(uhid, ev->u.set_report_reply.id, ev);
}
}
+static bool wacom_wac_slot_is_active(struct input_dev *dev, int key)
+{
+ struct input_mt *mt = dev->mt;
+ struct input_mt_slot *s;
+
+ if (!mt)
+ return false;
+
+ for (s = mt->slots; s != mt->slots + mt->num_slots; s++) {
+ if (s->key == key &&
+ input_mt_get_value(s, ABS_MT_TRACKING_ID) >= 0) {
+ return true;
+ }
+ }
+
+ return false;
+}
+
static void wacom_wac_finger_event(struct hid_device *hdev,
struct hid_field *field, struct hid_usage *usage, __s32 value)
{
}
if (usage->usage_index + 1 == field->report_count) {
- if (equivalent_usage == wacom_wac->hid_data.last_slot_field &&
- wacom_wac->hid_data.confidence)
- wacom_wac_finger_slot(wacom_wac, wacom_wac->touch_input);
+ if (equivalent_usage == wacom_wac->hid_data.last_slot_field) {
+ bool touch_removed = wacom_wac_slot_is_active(wacom_wac->touch_input,
+ wacom_wac->hid_data.id) && !wacom_wac->hid_data.tipswitch;
+
+ if (wacom_wac->hid_data.confidence || touch_removed) {
+ wacom_wac_finger_slot(wacom_wac, wacom_wac->touch_input);
+ }
+ }
}
}
hid_data->confidence = true;
+ hid_data->cc_report = 0;
+ hid_data->cc_index = -1;
+ hid_data->cc_value_index = -1;
+
for (i = 0; i < report->maxfield; i++) {
struct hid_field *field = report->field[i];
int j;
hid_data->cc_index >= 0) {
struct hid_field *field = report->field[hid_data->cc_index];
int value = field->value[hid_data->cc_value_index];
- if (value)
+ if (value) {
hid_data->num_expected = value;
+ hid_data->num_received = 0;
+ }
}
else {
hid_data->num_expected = wacom_wac->features.touch_max;
+ hid_data->num_received = 0;
}
}
input_sync(input);
wacom_wac->hid_data.num_received = 0;
+ wacom_wac->hid_data.num_expected = 0;
/* keep touch state for pen event */
wacom_wac->shared->touch_down = wacom_wac_finger_count_touches(wacom_wac);
static long i8k_ioctl(struct file *fp, unsigned int cmd, unsigned long arg)
{
- struct dell_smm_data *data = PDE_DATA(file_inode(fp));
+ struct dell_smm_data *data = pde_data(file_inode(fp));
int __user *argp = (int __user *)arg;
int speed, err;
int val = 0;
static int i8k_open_fs(struct inode *inode, struct file *file)
{
- return single_open(file, i8k_proc_show, PDE_DATA(inode));
+ return single_open(file, i8k_proc_show, pde_data(inode));
}
static const struct proc_ops i8k_proc_ops = {
gpio_status = reg;
- gpio_nr = 0;
- for_each_set_bit_from(gpio_nr, mask, LTC2992_GPIO_NR) {
+ for_each_set_bit(gpio_nr, mask, LTC2992_GPIO_NR) {
if (test_bit(LTC2992_GPIO_BIT(gpio_nr), &gpio_status))
set_bit(gpio_nr, bits);
}
.relax = stm32_hwspinlock_relax,
};
+static void stm32_hwspinlock_disable_clk(void *data)
+{
+ struct platform_device *pdev = data;
+ struct stm32_hwspinlock *hw = platform_get_drvdata(pdev);
+ struct device *dev = &pdev->dev;
+
+ pm_runtime_get_sync(dev);
+ pm_runtime_disable(dev);
+ pm_runtime_set_suspended(dev);
+ pm_runtime_put_noidle(dev);
+
+ clk_disable_unprepare(hw->clk);
+}
+
static int stm32_hwspinlock_probe(struct platform_device *pdev)
{
+ struct device *dev = &pdev->dev;
struct stm32_hwspinlock *hw;
void __iomem *io_base;
size_t array_size;
return PTR_ERR(io_base);
array_size = STM32_MUTEX_NUM_LOCKS * sizeof(struct hwspinlock);
- hw = devm_kzalloc(&pdev->dev, sizeof(*hw) + array_size, GFP_KERNEL);
+ hw = devm_kzalloc(dev, sizeof(*hw) + array_size, GFP_KERNEL);
if (!hw)
return -ENOMEM;
- hw->clk = devm_clk_get(&pdev->dev, "hsem");
+ hw->clk = devm_clk_get(dev, "hsem");
if (IS_ERR(hw->clk))
return PTR_ERR(hw->clk);
- for (i = 0; i < STM32_MUTEX_NUM_LOCKS; i++)
- hw->bank.lock[i].priv = io_base + i * sizeof(u32);
+ ret = clk_prepare_enable(hw->clk);
+ if (ret) {
+ dev_err(dev, "Failed to prepare_enable clock\n");
+ return ret;
+ }
platform_set_drvdata(pdev, hw);
- pm_runtime_enable(&pdev->dev);
- ret = hwspin_lock_register(&hw->bank, &pdev->dev, &stm32_hwspinlock_ops,
- 0, STM32_MUTEX_NUM_LOCKS);
+ pm_runtime_get_noresume(dev);
+ pm_runtime_set_active(dev);
+ pm_runtime_enable(dev);
+ pm_runtime_put(dev);
- if (ret)
- pm_runtime_disable(&pdev->dev);
+ ret = devm_add_action_or_reset(dev, stm32_hwspinlock_disable_clk, pdev);
+ if (ret) {
+ dev_err(dev, "Failed to register action\n");
+ return ret;
+ }
- return ret;
-}
+ for (i = 0; i < STM32_MUTEX_NUM_LOCKS; i++)
+ hw->bank.lock[i].priv = io_base + i * sizeof(u32);
-static int stm32_hwspinlock_remove(struct platform_device *pdev)
-{
- struct stm32_hwspinlock *hw = platform_get_drvdata(pdev);
- int ret;
+ ret = devm_hwspin_lock_register(dev, &hw->bank, &stm32_hwspinlock_ops,
+ 0, STM32_MUTEX_NUM_LOCKS);
- ret = hwspin_lock_unregister(&hw->bank);
if (ret)
- dev_err(&pdev->dev, "%s failed: %d\n", __func__, ret);
-
- pm_runtime_disable(&pdev->dev);
+ dev_err(dev, "Failed to register hwspinlock\n");
- return 0;
+ return ret;
}
static int __maybe_unused stm32_hwspinlock_runtime_suspend(struct device *dev)
static struct platform_driver stm32_hwspinlock_driver = {
.probe = stm32_hwspinlock_probe,
- .remove = stm32_hwspinlock_remove,
.driver = {
.name = "stm32_hwspinlock",
.of_match_table = stm32_hwpinlock_ids,
{
unsigned int free_cfg_slot;
- free_cfg_slot = find_next_zero_bit(&st->cfg_slots_status, AD7124_MAX_CONFIGS, 0);
+ free_cfg_slot = find_first_zero_bit(&st->cfg_slots_status, AD7124_MAX_CONFIGS);
if (free_cfg_slot == AD7124_MAX_CONFIGS)
return -1;
*/
static void irdma_get_used_rsrc(struct irdma_device *iwdev)
{
- iwdev->rf->used_pds = find_next_zero_bit(iwdev->rf->allocated_pds,
- iwdev->rf->max_pd, 0);
- iwdev->rf->used_qps = find_next_zero_bit(iwdev->rf->allocated_qps,
- iwdev->rf->max_qp, 0);
- iwdev->rf->used_cqs = find_next_zero_bit(iwdev->rf->allocated_cqs,
- iwdev->rf->max_cq, 0);
- iwdev->rf->used_mrs = find_next_zero_bit(iwdev->rf->allocated_mrs,
- iwdev->rf->max_mr, 0);
+ iwdev->rf->used_pds = find_first_zero_bit(iwdev->rf->allocated_pds,
+ iwdev->rf->max_pd);
+ iwdev->rf->used_qps = find_first_zero_bit(iwdev->rf->allocated_qps,
+ iwdev->rf->max_qp);
+ iwdev->rf->used_cqs = find_first_zero_bit(iwdev->rf->allocated_cqs,
+ iwdev->rf->max_cq);
+ iwdev->rf->used_mrs = find_first_zero_bit(iwdev->rf->allocated_mrs,
+ iwdev->rf->max_mr);
}
void irdma_ctrl_deinit_hw(struct irdma_pci_f *rf)
pid_t pid;
pid_t subpid[QLOGIC_IB_MAX_SUBCTXT];
/* same size as task_struct .comm[], command that opened context */
- char comm[16];
+ char comm[TASK_COMM_LEN];
/* pkeys set by this use of this ctxt */
u16 pkeys[4];
/* so file ops can get at unit */
rcd->tid_pg_list = ptmp;
rcd->pid = current->pid;
init_waitqueue_head(&dd->rcd[ctxt]->wait);
- strlcpy(rcd->comm, current->comm, sizeof(rcd->comm));
+ get_task_comm(rcd->comm, current);
ctxt_fp(fp) = rcd;
qib_stats.sps_ctxts++;
dd->freectxts--;
* the port number must be in the Dynamic Ports range
* (0xc000 - 0xffff).
*/
- qp->src_port = RXE_ROCE_V2_SPORT +
- (hash_32_generic(qp_num(qp), 14) & 0x3fff);
+ qp->src_port = RXE_ROCE_V2_SPORT + (hash_32(qp_num(qp), 14) & 0x3fff);
qp->sq.max_wr = init->cap.max_send_wr;
/* These caps are limited by rxe_qp_chk_cap() done by the caller */
{ }
};
-/* dir in /proc/sys/dev */
-static struct ctl_table mac_hid_dir[] = {
- {
- .procname = "mac_hid",
- .maxlen = 0,
- .mode = 0555,
- .child = mac_hid_files,
- },
- { }
-};
-
-/* /proc/sys/dev itself, in case that is not there yet */
-static struct ctl_table mac_hid_root_dir[] = {
- {
- .procname = "dev",
- .maxlen = 0,
- .mode = 0555,
- .child = mac_hid_dir,
- },
- { }
-};
-
static struct ctl_table_header *mac_hid_sysctl_header;
static int __init mac_hid_init(void)
{
- mac_hid_sysctl_header = register_sysctl_table(mac_hid_root_dir);
+ mac_hid_sysctl_header = register_sysctl("dev/mac_hid", mac_hid_files);
if (!mac_hid_sysctl_header)
return -ENOMEM;
/* Part 1: Find a free minor number */
mutex_lock(&cec_devnode_lock);
- minor = find_next_zero_bit(cec_devnode_nums, CEC_NUM_DEVICES, 0);
+ minor = find_first_zero_bit(cec_devnode_nums, CEC_NUM_DEVICES);
if (minor == CEC_NUM_DEVICES) {
mutex_unlock(&cec_devnode_lock);
pr_err("could not get a free minor\n");
/* Part 1: Find a free minor number */
mutex_lock(&media_devnode_lock);
- minor = find_next_zero_bit(media_devnode_nums, MEDIA_NUM_DEVICES, 0);
+ minor = find_first_zero_bit(media_devnode_nums, MEDIA_NUM_DEVICES);
if (minor == MEDIA_NUM_DEVICES) {
mutex_unlock(&media_devnode_lock);
pr_err("could not get a free minor\n");
if (!hdr.ExtPageLength)
goto out;
- buffer = pci_alloc_consistent(ioc->pcidev, hdr.ExtPageLength * 4,
- &dma_handle);
+ buffer = dma_alloc_coherent(&ioc->pcidev->dev, hdr.ExtPageLength * 4,
+ &dma_handle, GFP_KERNEL);
if (!buffer)
goto out;
rc = 1;
out_free_consistent:
- pci_free_consistent(ioc->pcidev, hdr.ExtPageLength * 4,
- buffer, dma_handle);
+ dma_free_coherent(&ioc->pcidev->dev, hdr.ExtPageLength * 4, buffer,
+ dma_handle);
out:
return rc;
}
const uint64_t required_mask = dma_get_required_mask
(&pdev->dev);
if (required_mask > DMA_BIT_MASK(32)
- && !pci_set_dma_mask(pdev, DMA_BIT_MASK(64))
- && !pci_set_consistent_dma_mask(pdev,
- DMA_BIT_MASK(64))) {
+ && !dma_set_mask(&pdev->dev, DMA_BIT_MASK(64))
+ && !dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(64))) {
ioc->dma_mask = DMA_BIT_MASK(64);
dinitprintk(ioc, printk(MYIOC_s_INFO_FMT
": 64 BIT PCI BUS DMA ADDRESSING SUPPORTED\n",
ioc->name));
- } else if (!pci_set_dma_mask(pdev, DMA_BIT_MASK(32))
- && !pci_set_consistent_dma_mask(pdev,
- DMA_BIT_MASK(32))) {
+ } else if (!dma_set_mask(&pdev->dev, DMA_BIT_MASK(32))
+ && !dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(32))) {
ioc->dma_mask = DMA_BIT_MASK(32);
dinitprintk(ioc, printk(MYIOC_s_INFO_FMT
": 32 BIT PCI BUS DMA ADDRESSING SUPPORTED\n",
goto out_pci_release_region;
}
} else {
- if (!pci_set_dma_mask(pdev, DMA_BIT_MASK(32))
- && !pci_set_consistent_dma_mask(pdev,
- DMA_BIT_MASK(32))) {
+ if (!dma_set_mask(&pdev->dev, DMA_BIT_MASK(32))
+ && !dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(32))) {
ioc->dma_mask = DMA_BIT_MASK(32);
dinitprintk(ioc, printk(MYIOC_s_INFO_FMT
": 32 BIT PCI BUS DMA ADDRESSING SUPPORTED\n",
if (ioc->spi_data.pIocPg4 != NULL) {
sz = ioc->spi_data.IocPg4Sz;
- pci_free_consistent(ioc->pcidev, sz,
- ioc->spi_data.pIocPg4,
- ioc->spi_data.IocPg4_dma);
+ dma_free_coherent(&ioc->pcidev->dev, sz,
+ ioc->spi_data.pIocPg4,
+ ioc->spi_data.IocPg4_dma);
ioc->spi_data.pIocPg4 = NULL;
ioc->alloc_total -= sz;
}
rc = 0;
goto out;
}
- ioc->cached_fw = pci_alloc_consistent(ioc->pcidev, size, &ioc->cached_fw_dma);
+ ioc->cached_fw = dma_alloc_coherent(&ioc->pcidev->dev, size,
+ &ioc->cached_fw_dma, GFP_ATOMIC);
if (!ioc->cached_fw) {
printk(MYIOC_s_ERR_FMT "Unable to allocate memory for the cached firmware image!\n",
ioc->name);
sz = ioc->facts.FWImageSize;
dinitprintk(ioc, printk(MYIOC_s_DEBUG_FMT "free_fw_memory: FW Image @ %p[%p], sz=%d[%x] bytes\n",
ioc->name, ioc->cached_fw, (void *)(ulong)ioc->cached_fw_dma, sz, sz));
- pci_free_consistent(ioc->pcidev, sz, ioc->cached_fw, ioc->cached_fw_dma);
+ dma_free_coherent(&ioc->pcidev->dev, sz, ioc->cached_fw,
+ ioc->cached_fw_dma);
ioc->alloc_total -= sz;
ioc->cached_fw = NULL;
}
*/
if (ioc->pcidev->device == MPI_MANUFACTPAGE_DEVID_SAS1078 &&
ioc->dma_mask > DMA_BIT_MASK(35)) {
- if (!pci_set_dma_mask(ioc->pcidev, DMA_BIT_MASK(32))
- && !pci_set_consistent_dma_mask(ioc->pcidev,
- DMA_BIT_MASK(32))) {
+ if (!dma_set_mask(&ioc->pcidev->dev, DMA_BIT_MASK(32))
+ && !dma_set_coherent_mask(&ioc->pcidev->dev, DMA_BIT_MASK(32))) {
dma_mask = DMA_BIT_MASK(35);
d36memprintk(ioc, printk(MYIOC_s_DEBUG_FMT
"setting 35 bit addressing for "
ioc->name));
} else {
/*Reseting DMA mask to 64 bit*/
- pci_set_dma_mask(ioc->pcidev,
- DMA_BIT_MASK(64));
- pci_set_consistent_dma_mask(ioc->pcidev,
- DMA_BIT_MASK(64));
+ dma_set_mask(&ioc->pcidev->dev,
+ DMA_BIT_MASK(64));
+ dma_set_coherent_mask(&ioc->pcidev->dev,
+ DMA_BIT_MASK(64));
printk(MYIOC_s_ERR_FMT
"failed setting 35 bit addressing for "
alloc_dma += ioc->reply_sz;
}
- if (dma_mask == DMA_BIT_MASK(35) && !pci_set_dma_mask(ioc->pcidev,
- ioc->dma_mask) && !pci_set_consistent_dma_mask(ioc->pcidev,
+ if (dma_mask == DMA_BIT_MASK(35) && !dma_set_mask(&ioc->pcidev->dev,
+ ioc->dma_mask) && !dma_set_coherent_mask(&ioc->pcidev->dev,
ioc->dma_mask))
d36memprintk(ioc, printk(MYIOC_s_DEBUG_FMT
"restoring 64 bit addressing\n", ioc->name));
ioc->sense_buf_pool = NULL;
}
- if (dma_mask == DMA_BIT_MASK(35) && !pci_set_dma_mask(ioc->pcidev,
- DMA_BIT_MASK(64)) && !pci_set_consistent_dma_mask(ioc->pcidev,
+ if (dma_mask == DMA_BIT_MASK(35) && !dma_set_mask(&ioc->pcidev->dev,
+ DMA_BIT_MASK(64)) && !dma_set_coherent_mask(&ioc->pcidev->dev,
DMA_BIT_MASK(64)))
d36memprintk(ioc, printk(MYIOC_s_DEBUG_FMT
"restoring 64 bit addressing\n", ioc->name));
if (hdr.PageLength > 0) {
data_sz = hdr.PageLength * 4;
- ppage0_alloc = pci_alloc_consistent(ioc->pcidev, data_sz, &page0_dma);
+ ppage0_alloc = dma_alloc_coherent(&ioc->pcidev->dev, data_sz,
+ &page0_dma, GFP_KERNEL);
rc = -ENOMEM;
if (ppage0_alloc) {
memset((u8 *)ppage0_alloc, 0, data_sz);
}
- pci_free_consistent(ioc->pcidev, data_sz, (u8 *) ppage0_alloc, page0_dma);
+ dma_free_coherent(&ioc->pcidev->dev, data_sz,
+ (u8 *)ppage0_alloc, page0_dma);
/* FIXME!
* Normalize endianness of structure data,
data_sz = hdr.PageLength * 4;
rc = -ENOMEM;
- ppage1_alloc = pci_alloc_consistent(ioc->pcidev, data_sz, &page1_dma);
+ ppage1_alloc = dma_alloc_coherent(&ioc->pcidev->dev, data_sz,
+ &page1_dma, GFP_KERNEL);
if (ppage1_alloc) {
memset((u8 *)ppage1_alloc, 0, data_sz);
cfg.physAddr = page1_dma;
memcpy(&ioc->lan_cnfg_page1, ppage1_alloc, copy_sz);
}
- pci_free_consistent(ioc->pcidev, data_sz, (u8 *) ppage1_alloc, page1_dma);
+ dma_free_coherent(&ioc->pcidev->dev, data_sz,
+ (u8 *)ppage1_alloc, page1_dma);
/* FIXME!
* Normalize endianness of structure data,
/* Read the config page */
data_sz = hdr.PageLength * 4;
rc = -ENOMEM;
- ppage_alloc = pci_alloc_consistent(ioc->pcidev, data_sz, &page_dma);
+ ppage_alloc = dma_alloc_coherent(&ioc->pcidev->dev, data_sz,
+ &page_dma, GFP_KERNEL);
if (ppage_alloc) {
memset((u8 *)ppage_alloc, 0, data_sz);
cfg.physAddr = page_dma;
if ((rc = mpt_config(ioc, &cfg)) == 0)
ioc->biosVersion = le32_to_cpu(ppage_alloc->BiosVersion);
- pci_free_consistent(ioc->pcidev, data_sz, (u8 *) ppage_alloc, page_dma);
+ dma_free_coherent(&ioc->pcidev->dev, data_sz,
+ (u8 *)ppage_alloc, page_dma);
}
return rc;
return -EFAULT;
if (header.PageLength > 0) {
- pbuf = pci_alloc_consistent(ioc->pcidev, header.PageLength * 4, &buf_dma);
+ pbuf = dma_alloc_coherent(&ioc->pcidev->dev,
+ header.PageLength * 4, &buf_dma,
+ GFP_KERNEL);
if (pbuf) {
cfg.action = MPI_CONFIG_ACTION_PAGE_READ_CURRENT;
cfg.physAddr = buf_dma;
}
}
if (pbuf) {
- pci_free_consistent(ioc->pcidev, header.PageLength * 4, pbuf, buf_dma);
+ dma_free_coherent(&ioc->pcidev->dev,
+ header.PageLength * 4, pbuf,
+ buf_dma);
}
}
}
if (header.PageLength > 0) {
/* Allocate memory and read SCSI Port Page 2
*/
- pbuf = pci_alloc_consistent(ioc->pcidev, header.PageLength * 4, &buf_dma);
+ pbuf = dma_alloc_coherent(&ioc->pcidev->dev,
+ header.PageLength * 4, &buf_dma,
+ GFP_KERNEL);
if (pbuf) {
cfg.action = MPI_CONFIG_ACTION_PAGE_READ_NVRAM;
cfg.physAddr = buf_dma;
}
}
- pci_free_consistent(ioc->pcidev, header.PageLength * 4, pbuf, buf_dma);
+ dma_free_coherent(&ioc->pcidev->dev,
+ header.PageLength * 4, pbuf,
+ buf_dma);
}
}
if (!hdr.PageLength)
goto out;
- buffer = pci_alloc_consistent(ioc->pcidev, hdr.PageLength * 4,
- &dma_handle);
+ buffer = dma_alloc_coherent(&ioc->pcidev->dev, hdr.PageLength * 4,
+ &dma_handle, GFP_KERNEL);
if (!buffer)
goto out;
out:
if (buffer)
- pci_free_consistent(ioc->pcidev, hdr.PageLength * 4, buffer,
- dma_handle);
+ dma_free_coherent(&ioc->pcidev->dev, hdr.PageLength * 4,
+ buffer, dma_handle);
}
/**
goto out;
}
- buffer = pci_alloc_consistent(ioc->pcidev, hdr.PageLength * 4,
- &dma_handle);
+ buffer = dma_alloc_coherent(&ioc->pcidev->dev, hdr.PageLength * 4,
+ &dma_handle, GFP_KERNEL);
if (!buffer) {
rc = -ENOMEM;
out:
if (buffer)
- pci_free_consistent(ioc->pcidev, hdr.PageLength * 4, buffer,
- dma_handle);
+ dma_free_coherent(&ioc->pcidev->dev, hdr.PageLength * 4,
+ buffer, dma_handle);
return rc;
}
goto out;
}
- buffer = pci_alloc_consistent(ioc->pcidev, hdr.PageLength * 4,
- &dma_handle);
+ buffer = dma_alloc_coherent(&ioc->pcidev->dev, hdr.PageLength * 4,
+ &dma_handle, GFP_KERNEL);
if (!buffer) {
rc = 0;
out:
if (buffer)
- pci_free_consistent(ioc->pcidev, hdr.PageLength * 4, buffer,
- dma_handle);
+ dma_free_coherent(&ioc->pcidev->dev, hdr.PageLength * 4,
+ buffer, dma_handle);
return rc;
}
goto out;
}
- buffer = pci_alloc_consistent(ioc->pcidev, hdr.PageLength * 4,
- &dma_handle);
+ buffer = dma_alloc_coherent(&ioc->pcidev->dev, hdr.PageLength * 4,
+ &dma_handle, GFP_KERNEL);
if (!buffer) {
rc = -ENOMEM;
out:
if (buffer)
- pci_free_consistent(ioc->pcidev, hdr.PageLength * 4, buffer,
- dma_handle);
+ dma_free_coherent(&ioc->pcidev->dev, hdr.PageLength * 4,
+ buffer, dma_handle);
return rc;
}
return -EFAULT;
iocpage2sz = header.PageLength * 4;
- pIoc2 = pci_alloc_consistent(ioc->pcidev, iocpage2sz, &ioc2_dma);
+ pIoc2 = dma_alloc_coherent(&ioc->pcidev->dev, iocpage2sz, &ioc2_dma,
+ GFP_KERNEL);
if (!pIoc2)
return -ENOMEM;
pIoc2->RaidVolume[i].VolumeID);
out:
- pci_free_consistent(ioc->pcidev, iocpage2sz, pIoc2, ioc2_dma);
+ dma_free_coherent(&ioc->pcidev->dev, iocpage2sz, pIoc2, ioc2_dma);
return rc;
}
/* Read Header good, alloc memory
*/
iocpage3sz = header.PageLength * 4;
- pIoc3 = pci_alloc_consistent(ioc->pcidev, iocpage3sz, &ioc3_dma);
+ pIoc3 = dma_alloc_coherent(&ioc->pcidev->dev, iocpage3sz, &ioc3_dma,
+ GFP_KERNEL);
if (!pIoc3)
return 0;
}
}
- pci_free_consistent(ioc->pcidev, iocpage3sz, pIoc3, ioc3_dma);
+ dma_free_coherent(&ioc->pcidev->dev, iocpage3sz, pIoc3, ioc3_dma);
return 0;
}
if ( (pIoc4 = ioc->spi_data.pIocPg4) == NULL ) {
iocpage4sz = (header.PageLength + 4) * 4; /* Allow 4 additional SEP's */
- pIoc4 = pci_alloc_consistent(ioc->pcidev, iocpage4sz, &ioc4_dma);
+ pIoc4 = dma_alloc_coherent(&ioc->pcidev->dev, iocpage4sz,
+ &ioc4_dma, GFP_KERNEL);
if (!pIoc4)
return;
ioc->alloc_total += iocpage4sz;
ioc->spi_data.IocPg4_dma = ioc4_dma;
ioc->spi_data.IocPg4Sz = iocpage4sz;
} else {
- pci_free_consistent(ioc->pcidev, iocpage4sz, pIoc4, ioc4_dma);
+ dma_free_coherent(&ioc->pcidev->dev, iocpage4sz, pIoc4,
+ ioc4_dma);
ioc->spi_data.pIocPg4 = NULL;
ioc->alloc_total -= iocpage4sz;
}
/* Read Header good, alloc memory
*/
iocpage1sz = header.PageLength * 4;
- pIoc1 = pci_alloc_consistent(ioc->pcidev, iocpage1sz, &ioc1_dma);
+ pIoc1 = dma_alloc_coherent(&ioc->pcidev->dev, iocpage1sz, &ioc1_dma,
+ GFP_KERNEL);
if (!pIoc1)
return;
}
}
- pci_free_consistent(ioc->pcidev, iocpage1sz, pIoc1, ioc1_dma);
+ dma_free_coherent(&ioc->pcidev->dev, iocpage1sz, pIoc1, ioc1_dma);
return;
}
goto out;
cfg.action = MPI_CONFIG_ACTION_PAGE_READ_CURRENT;
- pbuf = pci_alloc_consistent(ioc->pcidev, hdr.PageLength * 4, &buf_dma);
+ pbuf = dma_alloc_coherent(&ioc->pcidev->dev, hdr.PageLength * 4,
+ &buf_dma, GFP_KERNEL);
if (!pbuf)
goto out;
out:
if (pbuf)
- pci_free_consistent(ioc->pcidev, hdr.PageLength * 4, pbuf, buf_dma);
+ dma_free_coherent(&ioc->pcidev->dev, hdr.PageLength * 4, pbuf,
+ buf_dma);
}
/*=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=*/
* copying the data in this array into the correct place in the
* request and chain buffers.
*/
- sglbuf = pci_alloc_consistent(ioc->pcidev, MAX_SGL_BYTES, sglbuf_dma);
+ sglbuf = dma_alloc_coherent(&ioc->pcidev->dev, MAX_SGL_BYTES,
+ sglbuf_dma, GFP_KERNEL);
if (sglbuf == NULL)
goto free_and_fail;
if (sgdir & 0x04000000)
- dir = PCI_DMA_TODEVICE;
+ dir = DMA_TO_DEVICE;
else
- dir = PCI_DMA_FROMDEVICE;
+ dir = DMA_FROM_DEVICE;
/* At start:
* sgl = sglbuf = point to beginning of sg buffer
while (bytes_allocd < bytes) {
this_alloc = min(alloc_sz, bytes-bytes_allocd);
buflist[buflist_ent].len = this_alloc;
- buflist[buflist_ent].kptr = pci_alloc_consistent(ioc->pcidev,
- this_alloc,
- &pa);
+ buflist[buflist_ent].kptr = dma_alloc_coherent(&ioc->pcidev->dev,
+ this_alloc,
+ &pa, GFP_KERNEL);
if (buflist[buflist_ent].kptr == NULL) {
alloc_sz = alloc_sz / 2;
if (alloc_sz == 0) {
bytes_allocd += this_alloc;
sgl->FlagsLength = (0x10000000|sgdir|this_alloc);
- dma_addr = pci_map_single(ioc->pcidev,
- buflist[buflist_ent].kptr, this_alloc, dir);
+ dma_addr = dma_map_single(&ioc->pcidev->dev,
+ buflist[buflist_ent].kptr,
+ this_alloc, dir);
sgl->Address = dma_addr;
fragcnt++;
kptr = buflist[i].kptr;
len = buflist[i].len;
- pci_free_consistent(ioc->pcidev, len, kptr, dma_addr);
+ dma_free_coherent(&ioc->pcidev->dev, len, kptr,
+ dma_addr);
}
- pci_free_consistent(ioc->pcidev, MAX_SGL_BYTES, sglbuf, *sglbuf_dma);
+ dma_free_coherent(&ioc->pcidev->dev, MAX_SGL_BYTES, sglbuf,
+ *sglbuf_dma);
}
kfree(buflist);
return NULL;
int n = 0;
if (sg->FlagsLength & 0x04000000)
- dir = PCI_DMA_TODEVICE;
+ dir = DMA_TO_DEVICE;
else
- dir = PCI_DMA_FROMDEVICE;
+ dir = DMA_FROM_DEVICE;
nib = (sg->FlagsLength & 0xF0000000) >> 28;
while (! (nib & 0x4)) { /* eob */
dma_addr = sg->Address;
kptr = bl->kptr;
len = bl->len;
- pci_unmap_single(ioc->pcidev, dma_addr, len, dir);
- pci_free_consistent(ioc->pcidev, len, kptr, dma_addr);
+ dma_unmap_single(&ioc->pcidev->dev, dma_addr, len,
+ dir);
+ dma_free_coherent(&ioc->pcidev->dev, len, kptr,
+ dma_addr);
n++;
}
sg++;
dma_addr = sg->Address;
kptr = bl->kptr;
len = bl->len;
- pci_unmap_single(ioc->pcidev, dma_addr, len, dir);
- pci_free_consistent(ioc->pcidev, len, kptr, dma_addr);
+ dma_unmap_single(&ioc->pcidev->dev, dma_addr, len, dir);
+ dma_free_coherent(&ioc->pcidev->dev, len, kptr, dma_addr);
n++;
}
- pci_free_consistent(ioc->pcidev, MAX_SGL_BYTES, sgl, sgl_dma);
+ dma_free_coherent(&ioc->pcidev->dev, MAX_SGL_BYTES, sgl, sgl_dma);
kfree(buflist);
dctlprintk(ioc, printk(MYIOC_s_DEBUG_FMT "-SG: Free'd 1 SGL buf + %d kbufs!\n",
ioc->name, n));
}
flagsLength |= karg.dataOutSize;
bufOut.len = karg.dataOutSize;
- bufOut.kptr = pci_alloc_consistent(
- ioc->pcidev, bufOut.len, &dma_addr_out);
+ bufOut.kptr = dma_alloc_coherent(&ioc->pcidev->dev,
+ bufOut.len,
+ &dma_addr_out, GFP_KERNEL);
if (bufOut.kptr == NULL) {
rc = -ENOMEM;
flagsLength |= karg.dataInSize;
bufIn.len = karg.dataInSize;
- bufIn.kptr = pci_alloc_consistent(ioc->pcidev,
- bufIn.len, &dma_addr_in);
+ bufIn.kptr = dma_alloc_coherent(&ioc->pcidev->dev,
+ bufIn.len,
+ &dma_addr_in, GFP_KERNEL);
if (bufIn.kptr == NULL) {
rc = -ENOMEM;
/* Free the allocated memory.
*/
if (bufOut.kptr != NULL) {
- pci_free_consistent(ioc->pcidev,
- bufOut.len, (void *) bufOut.kptr, dma_addr_out);
+ dma_free_coherent(&ioc->pcidev->dev, bufOut.len,
+ (void *)bufOut.kptr, dma_addr_out);
}
if (bufIn.kptr != NULL) {
- pci_free_consistent(ioc->pcidev,
- bufIn.len, (void *) bufIn.kptr, dma_addr_in);
+ dma_free_coherent(&ioc->pcidev->dev, bufIn.len,
+ (void *)bufIn.kptr, dma_addr_in);
}
/* mf is null if command issued successfully
/* Issue the second config page request */
cfg.action = MPI_CONFIG_ACTION_PAGE_READ_CURRENT;
- pbuf = pci_alloc_consistent(ioc->pcidev, hdr.PageLength * 4, &buf_dma);
+ pbuf = dma_alloc_coherent(&ioc->pcidev->dev,
+ hdr.PageLength * 4,
+ &buf_dma, GFP_KERNEL);
if (pbuf) {
cfg.physAddr = buf_dma;
if (mpt_config(ioc, &cfg) == 0) {
pdata->BoardTracerNumber, 24);
}
}
- pci_free_consistent(ioc->pcidev, hdr.PageLength * 4, pbuf, buf_dma);
+ dma_free_coherent(&ioc->pcidev->dev,
+ hdr.PageLength * 4, pbuf,
+ buf_dma);
pbuf = NULL;
}
}
else
IstwiRWRequest->DeviceAddr = 0xB0;
- pbuf = pci_alloc_consistent(ioc->pcidev, 4, &buf_dma);
+ pbuf = dma_alloc_coherent(&ioc->pcidev->dev, 4, &buf_dma, GFP_KERNEL);
if (!pbuf)
goto out;
ioc->add_sge((char *)&IstwiRWRequest->SGL,
SET_MGMT_MSG_CONTEXT(ioc->ioctl_cmds.msg_context, 0);
if (pbuf)
- pci_free_consistent(ioc->pcidev, 4, pbuf, buf_dma);
+ dma_free_coherent(&ioc->pcidev->dev, 4, pbuf, buf_dma);
/* Copy the data from kernel memory to user memory
*/
/* Get the data transfer speeds
*/
data_sz = ioc->spi_data.sdp0length * 4;
- pg0_alloc = pci_alloc_consistent(ioc->pcidev, data_sz, &page_dma);
+ pg0_alloc = dma_alloc_coherent(&ioc->pcidev->dev, data_sz, &page_dma,
+ GFP_KERNEL);
if (pg0_alloc) {
hdr.PageVersion = ioc->spi_data.sdp0version;
hdr.PageLength = data_sz;
karg.negotiated_speed = HP_DEV_SPEED_ASYNC;
}
- pci_free_consistent(ioc->pcidev, data_sz, (u8 *) pg0_alloc, page_dma);
+ dma_free_coherent(&ioc->pcidev->dev, data_sz, (u8 *)pg0_alloc,
+ page_dma);
}
/* Set defaults
/* Issue the second config page request */
cfg.action = MPI_CONFIG_ACTION_PAGE_READ_CURRENT;
data_sz = (int) cfg.cfghdr.hdr->PageLength * 4;
- pg3_alloc = pci_alloc_consistent(ioc->pcidev, data_sz, &page_dma);
+ pg3_alloc = dma_alloc_coherent(&ioc->pcidev->dev, data_sz,
+ &page_dma, GFP_KERNEL);
if (pg3_alloc) {
cfg.physAddr = page_dma;
cfg.pageAddr = (karg.hdr.channel << 8) | karg.hdr.id;
karg.phase_errors = (u32) le16_to_cpu(pg3_alloc->PhaseErrorCount);
karg.parity_errors = (u32) le16_to_cpu(pg3_alloc->ParityErrorCount);
}
- pci_free_consistent(ioc->pcidev, data_sz, (u8 *) pg3_alloc, page_dma);
+ dma_free_coherent(&ioc->pcidev->dev, data_sz,
+ (u8 *)pg3_alloc, page_dma);
}
}
hd = shost_priv(ioc->sh);
if (priv->RcvCtl[i].skb != NULL) {
/**/ dlprintk((KERN_INFO MYNAM "/lan_close: bucket %05x "
/**/ "is still out\n", i));
- pci_unmap_single(mpt_dev->pcidev, priv->RcvCtl[i].dma,
- priv->RcvCtl[i].len,
- PCI_DMA_FROMDEVICE);
+ dma_unmap_single(&mpt_dev->pcidev->dev,
+ priv->RcvCtl[i].dma,
+ priv->RcvCtl[i].len, DMA_FROM_DEVICE);
dev_kfree_skb(priv->RcvCtl[i].skb);
}
}
for (i = 0; i < priv->tx_max_out; i++) {
if (priv->SendCtl[i].skb != NULL) {
- pci_unmap_single(mpt_dev->pcidev, priv->SendCtl[i].dma,
- priv->SendCtl[i].len,
- PCI_DMA_TODEVICE);
+ dma_unmap_single(&mpt_dev->pcidev->dev,
+ priv->SendCtl[i].dma,
+ priv->SendCtl[i].len, DMA_TO_DEVICE);
dev_kfree_skb(priv->SendCtl[i].skb);
}
}
__func__, sent));
priv->SendCtl[ctx].skb = NULL;
- pci_unmap_single(mpt_dev->pcidev, priv->SendCtl[ctx].dma,
- priv->SendCtl[ctx].len, PCI_DMA_TODEVICE);
+ dma_unmap_single(&mpt_dev->pcidev->dev, priv->SendCtl[ctx].dma,
+ priv->SendCtl[ctx].len, DMA_TO_DEVICE);
dev_kfree_skb_irq(sent);
spin_lock_irqsave(&priv->txfidx_lock, flags);
__func__, sent));
priv->SendCtl[ctx].skb = NULL;
- pci_unmap_single(mpt_dev->pcidev, priv->SendCtl[ctx].dma,
- priv->SendCtl[ctx].len, PCI_DMA_TODEVICE);
+ dma_unmap_single(&mpt_dev->pcidev->dev,
+ priv->SendCtl[ctx].dma,
+ priv->SendCtl[ctx].len, DMA_TO_DEVICE);
dev_kfree_skb_irq(sent);
priv->mpt_txfidx[++priv->mpt_txfidx_tail] = ctx;
skb_reset_mac_header(skb);
skb_pull(skb, 12);
- dma = pci_map_single(mpt_dev->pcidev, skb->data, skb->len,
- PCI_DMA_TODEVICE);
+ dma = dma_map_single(&mpt_dev->pcidev->dev, skb->data, skb->len,
+ DMA_TO_DEVICE);
priv->SendCtl[ctx].skb = skb;
priv->SendCtl[ctx].dma = dma;
return -ENOMEM;
}
- pci_dma_sync_single_for_cpu(mpt_dev->pcidev, priv->RcvCtl[ctx].dma,
- priv->RcvCtl[ctx].len, PCI_DMA_FROMDEVICE);
+ dma_sync_single_for_cpu(&mpt_dev->pcidev->dev,
+ priv->RcvCtl[ctx].dma,
+ priv->RcvCtl[ctx].len,
+ DMA_FROM_DEVICE);
skb_copy_from_linear_data(old_skb, skb_put(skb, len), len);
- pci_dma_sync_single_for_device(mpt_dev->pcidev, priv->RcvCtl[ctx].dma,
- priv->RcvCtl[ctx].len, PCI_DMA_FROMDEVICE);
+ dma_sync_single_for_device(&mpt_dev->pcidev->dev,
+ priv->RcvCtl[ctx].dma,
+ priv->RcvCtl[ctx].len,
+ DMA_FROM_DEVICE);
goto out;
}
priv->RcvCtl[ctx].skb = NULL;
- pci_unmap_single(mpt_dev->pcidev, priv->RcvCtl[ctx].dma,
- priv->RcvCtl[ctx].len, PCI_DMA_FROMDEVICE);
+ dma_unmap_single(&mpt_dev->pcidev->dev, priv->RcvCtl[ctx].dma,
+ priv->RcvCtl[ctx].len, DMA_FROM_DEVICE);
out:
spin_lock_irqsave(&priv->rxfidx_lock, flags);
// dlprintk((KERN_INFO MYNAM "@rpr[2] TC + 3\n"));
priv->RcvCtl[ctx].skb = NULL;
- pci_unmap_single(mpt_dev->pcidev, priv->RcvCtl[ctx].dma,
- priv->RcvCtl[ctx].len, PCI_DMA_FROMDEVICE);
+ dma_unmap_single(&mpt_dev->pcidev->dev, priv->RcvCtl[ctx].dma,
+ priv->RcvCtl[ctx].len, DMA_FROM_DEVICE);
dev_kfree_skb_any(skb);
priv->mpt_rxfidx[++priv->mpt_rxfidx_tail] = ctx;
// IOC_AND_NETDEV_NAMES_s_s(dev),
// i, l));
- pci_dma_sync_single_for_cpu(mpt_dev->pcidev,
- priv->RcvCtl[ctx].dma,
- priv->RcvCtl[ctx].len,
- PCI_DMA_FROMDEVICE);
+ dma_sync_single_for_cpu(&mpt_dev->pcidev->dev,
+ priv->RcvCtl[ctx].dma,
+ priv->RcvCtl[ctx].len,
+ DMA_FROM_DEVICE);
skb_copy_from_linear_data(old_skb, skb_put(skb, l), l);
- pci_dma_sync_single_for_device(mpt_dev->pcidev,
- priv->RcvCtl[ctx].dma,
- priv->RcvCtl[ctx].len,
- PCI_DMA_FROMDEVICE);
+ dma_sync_single_for_device(&mpt_dev->pcidev->dev,
+ priv->RcvCtl[ctx].dma,
+ priv->RcvCtl[ctx].len,
+ DMA_FROM_DEVICE);
priv->mpt_rxfidx[++priv->mpt_rxfidx_tail] = ctx;
szrem -= l;
return -ENOMEM;
}
- pci_dma_sync_single_for_cpu(mpt_dev->pcidev,
- priv->RcvCtl[ctx].dma,
- priv->RcvCtl[ctx].len,
- PCI_DMA_FROMDEVICE);
+ dma_sync_single_for_cpu(&mpt_dev->pcidev->dev,
+ priv->RcvCtl[ctx].dma,
+ priv->RcvCtl[ctx].len,
+ DMA_FROM_DEVICE);
skb_copy_from_linear_data(old_skb, skb_put(skb, len), len);
- pci_dma_sync_single_for_device(mpt_dev->pcidev,
- priv->RcvCtl[ctx].dma,
- priv->RcvCtl[ctx].len,
- PCI_DMA_FROMDEVICE);
+ dma_sync_single_for_device(&mpt_dev->pcidev->dev,
+ priv->RcvCtl[ctx].dma,
+ priv->RcvCtl[ctx].len,
+ DMA_FROM_DEVICE);
spin_lock_irqsave(&priv->rxfidx_lock, flags);
priv->mpt_rxfidx[++priv->mpt_rxfidx_tail] = ctx;
priv->RcvCtl[ctx].skb = NULL;
- pci_unmap_single(mpt_dev->pcidev, priv->RcvCtl[ctx].dma,
- priv->RcvCtl[ctx].len, PCI_DMA_FROMDEVICE);
+ dma_unmap_single(&mpt_dev->pcidev->dev, priv->RcvCtl[ctx].dma,
+ priv->RcvCtl[ctx].len, DMA_FROM_DEVICE);
priv->RcvCtl[ctx].dma = 0;
priv->mpt_rxfidx[++priv->mpt_rxfidx_tail] = ctx;
skb = priv->RcvCtl[ctx].skb;
if (skb && (priv->RcvCtl[ctx].len != len)) {
- pci_unmap_single(mpt_dev->pcidev,
+ dma_unmap_single(&mpt_dev->pcidev->dev,
priv->RcvCtl[ctx].dma,
priv->RcvCtl[ctx].len,
- PCI_DMA_FROMDEVICE);
+ DMA_FROM_DEVICE);
dev_kfree_skb(priv->RcvCtl[ctx].skb);
skb = priv->RcvCtl[ctx].skb = NULL;
}
break;
}
- dma = pci_map_single(mpt_dev->pcidev, skb->data,
- len, PCI_DMA_FROMDEVICE);
+ dma = dma_map_single(&mpt_dev->pcidev->dev,
+ skb->data, len,
+ DMA_FROM_DEVICE);
priv->RcvCtl[ctx].skb = skb;
priv->RcvCtl[ctx].dma = dma;
if (!hdr.PageLength)
goto out;
- buffer = pci_alloc_consistent(ioc->pcidev, hdr.PageLength * 4,
- &dma_handle);
+ buffer = dma_alloc_coherent(&ioc->pcidev->dev, hdr.PageLength * 4,
+ &dma_handle, GFP_KERNEL);
if (!buffer)
goto out;
out:
if (buffer)
- pci_free_consistent(ioc->pcidev, hdr.PageLength * 4, buffer,
- dma_handle);
+ dma_free_coherent(&ioc->pcidev->dev, hdr.PageLength * 4,
+ buffer, dma_handle);
}
/**
goto out;
}
- buffer = pci_alloc_consistent(ioc->pcidev, hdr.ExtPageLength * 4,
- &dma_handle);
+ buffer = dma_alloc_coherent(&ioc->pcidev->dev, hdr.ExtPageLength * 4,
+ &dma_handle, GFP_KERNEL);
if (!buffer) {
error = -ENOMEM;
goto out;
enclosure->sep_channel = buffer->SEPBus;
out_free_consistent:
- pci_free_consistent(ioc->pcidev, hdr.ExtPageLength * 4,
- buffer, dma_handle);
+ dma_free_coherent(&ioc->pcidev->dev, hdr.ExtPageLength * 4, buffer,
+ dma_handle);
out:
return error;
}
if (!hdr.ExtPageLength)
return -ENXIO;
- buffer = pci_alloc_consistent(ioc->pcidev, hdr.ExtPageLength * 4,
- &dma_handle);
+ buffer = dma_alloc_coherent(&ioc->pcidev->dev, hdr.ExtPageLength * 4,
+ &dma_handle, GFP_KERNEL);
if (!buffer)
return -ENOMEM;
le32_to_cpu(buffer->PhyResetProblemCount);
out_free_consistent:
- pci_free_consistent(ioc->pcidev, hdr.ExtPageLength * 4,
- buffer, dma_handle);
+ dma_free_coherent(&ioc->pcidev->dev, hdr.ExtPageLength * 4, buffer,
+ dma_handle);
return error;
}
<< MPI_SGE_FLAGS_SHIFT;
if (!dma_map_sg(&ioc->pcidev->dev, job->request_payload.sg_list,
- 1, PCI_DMA_BIDIRECTIONAL))
+ 1, DMA_BIDIRECTIONAL))
goto put_mf;
flagsLength |= (sg_dma_len(job->request_payload.sg_list) - 4);
flagsLength = flagsLength << MPI_SGE_FLAGS_SHIFT;
if (!dma_map_sg(&ioc->pcidev->dev, job->reply_payload.sg_list,
- 1, PCI_DMA_BIDIRECTIONAL))
+ 1, DMA_BIDIRECTIONAL))
goto unmap_out;
flagsLength |= sg_dma_len(job->reply_payload.sg_list) + 4;
ioc->add_sge(psge, flagsLength,
unmap_in:
dma_unmap_sg(&ioc->pcidev->dev, job->reply_payload.sg_list, 1,
- PCI_DMA_BIDIRECTIONAL);
+ DMA_BIDIRECTIONAL);
unmap_out:
dma_unmap_sg(&ioc->pcidev->dev, job->request_payload.sg_list, 1,
- PCI_DMA_BIDIRECTIONAL);
+ DMA_BIDIRECTIONAL);
put_mf:
if (mf)
mpt_free_msg_frame(ioc, mf);
goto out;
}
- buffer = pci_alloc_consistent(ioc->pcidev, hdr.ExtPageLength * 4,
- &dma_handle);
+ buffer = dma_alloc_coherent(&ioc->pcidev->dev, hdr.ExtPageLength * 4,
+ &dma_handle, GFP_KERNEL);
if (!buffer) {
error = -ENOMEM;
goto out;
}
out_free_consistent:
- pci_free_consistent(ioc->pcidev, hdr.ExtPageLength * 4,
- buffer, dma_handle);
+ dma_free_coherent(&ioc->pcidev->dev, hdr.ExtPageLength * 4, buffer,
+ dma_handle);
out:
return error;
}
goto out;
}
- buffer = pci_alloc_consistent(ioc->pcidev, hdr.ExtPageLength * 4,
- &dma_handle);
+ buffer = dma_alloc_coherent(&ioc->pcidev->dev, hdr.ExtPageLength * 4,
+ &dma_handle, GFP_KERNEL);
if (!buffer) {
error = -ENOMEM;
goto out;
device_missing_delay & MPI_SAS_IOUNIT1_REPORT_MISSING_TIMEOUT_MASK;
out_free_consistent:
- pci_free_consistent(ioc->pcidev, hdr.ExtPageLength * 4,
- buffer, dma_handle);
+ dma_free_coherent(&ioc->pcidev->dev, hdr.ExtPageLength * 4, buffer,
+ dma_handle);
out:
return error;
}
goto out;
}
- buffer = pci_alloc_consistent(ioc->pcidev, hdr.ExtPageLength * 4,
- &dma_handle);
+ buffer = dma_alloc_coherent(&ioc->pcidev->dev, hdr.ExtPageLength * 4,
+ &dma_handle, GFP_KERNEL);
if (!buffer) {
error = -ENOMEM;
goto out;
phy_info->attached.handle = le16_to_cpu(buffer->AttachedDevHandle);
out_free_consistent:
- pci_free_consistent(ioc->pcidev, hdr.ExtPageLength * 4,
- buffer, dma_handle);
+ dma_free_coherent(&ioc->pcidev->dev, hdr.ExtPageLength * 4, buffer,
+ dma_handle);
out:
return error;
}
goto out;
}
- buffer = pci_alloc_consistent(ioc->pcidev, hdr.ExtPageLength * 4,
- &dma_handle);
+ buffer = dma_alloc_coherent(&ioc->pcidev->dev, hdr.ExtPageLength * 4,
+ &dma_handle, GFP_KERNEL);
if (!buffer) {
error = -ENOMEM;
goto out;
device_info->flags = le16_to_cpu(buffer->Flags);
out_free_consistent:
- pci_free_consistent(ioc->pcidev, hdr.ExtPageLength * 4,
- buffer, dma_handle);
+ dma_free_coherent(&ioc->pcidev->dev, hdr.ExtPageLength * 4, buffer,
+ dma_handle);
out:
return error;
}
goto out;
}
- buffer = pci_alloc_consistent(ioc->pcidev, hdr.ExtPageLength * 4,
- &dma_handle);
+ buffer = dma_alloc_coherent(&ioc->pcidev->dev, hdr.ExtPageLength * 4,
+ &dma_handle, GFP_KERNEL);
if (!buffer) {
error = -ENOMEM;
goto out;
}
out_free_consistent:
- pci_free_consistent(ioc->pcidev, hdr.ExtPageLength * 4,
- buffer, dma_handle);
+ dma_free_coherent(&ioc->pcidev->dev, hdr.ExtPageLength * 4, buffer,
+ dma_handle);
out:
return error;
}
goto out;
}
- buffer = pci_alloc_consistent(ioc->pcidev, hdr.ExtPageLength * 4,
- &dma_handle);
+ buffer = dma_alloc_coherent(&ioc->pcidev->dev, hdr.ExtPageLength * 4,
+ &dma_handle, GFP_KERNEL);
if (!buffer) {
error = -ENOMEM;
goto out;
phy_info->attached.handle = le16_to_cpu(buffer->AttachedDevHandle);
out_free_consistent:
- pci_free_consistent(ioc->pcidev, hdr.ExtPageLength * 4,
- buffer, dma_handle);
+ dma_free_coherent(&ioc->pcidev->dev, hdr.ExtPageLength * 4, buffer,
+ dma_handle);
out:
return error;
}
sz = sizeof(struct rep_manu_request) + sizeof(struct rep_manu_reply);
- data_out = pci_alloc_consistent(ioc->pcidev, sz, &data_out_dma);
+ data_out = dma_alloc_coherent(&ioc->pcidev->dev, sz, &data_out_dma,
+ GFP_KERNEL);
if (!data_out) {
printk(KERN_ERR "Memory allocation failure at %s:%d/%s()!\n",
__FILE__, __LINE__, __func__);
}
out_free:
if (data_out_dma)
- pci_free_consistent(ioc->pcidev, sz, data_out, data_out_dma);
+ dma_free_coherent(&ioc->pcidev->dev, sz, data_out,
+ data_out_dma);
put_mf:
if (mf)
mpt_free_msg_frame(ioc, mf);
if (!hdr.PageLength)
goto out;
- buffer = pci_alloc_consistent(ioc->pcidev, hdr.PageLength * 4,
- &dma_handle);
+ buffer = dma_alloc_coherent(&ioc->pcidev->dev, hdr.PageLength * 4,
+ &dma_handle, GFP_KERNEL);
if (!buffer)
goto out;
out:
if (buffer)
- pci_free_consistent(ioc->pcidev, hdr.PageLength * 4, buffer,
- dma_handle);
+ dma_free_coherent(&ioc->pcidev->dev, hdr.PageLength * 4,
+ buffer, dma_handle);
}
/*
* Work queue thread to handle SAS hotplug events
* is at least SH_MOBILE_SDHI_MIN_TAP_ROW probes long then use the
* center index as the tap, otherwise bail out.
*/
- bitmap_for_each_set_region(bitmap, rs, re, 0, taps_size) {
+ for_each_set_bitrange(rs, re, bitmap, taps_size) {
if (re - rs > tap_cnt) {
tap_end = re;
tap_start = rs;
skb->l4_hash)
return skb->hash;
- return __bond_xmit_hash(bond, skb, skb->head, skb->protocol,
- skb->mac_header, skb->network_header,
+ return __bond_xmit_hash(bond, skb, skb->data, skb->protocol,
+ skb_mac_offset(skb), skb_network_offset(skb),
skb_headlen(skb));
}
struct bonding *bond = netdev_priv(bond_dev);
struct slave *slave = NULL;
struct list_head *iter;
+ bool xmit_suc = false;
+ bool skb_used = false;
bond_for_each_slave_rcu(bond, slave, iter) {
- if (bond_is_last_slave(bond, slave))
- break;
- if (bond_slave_is_up(slave) && slave->link == BOND_LINK_UP) {
- struct sk_buff *skb2 = skb_clone(skb, GFP_ATOMIC);
+ struct sk_buff *skb2;
+
+ if (!(bond_slave_is_up(slave) && slave->link == BOND_LINK_UP))
+ continue;
+ if (bond_is_last_slave(bond, slave)) {
+ skb2 = skb;
+ skb_used = true;
+ } else {
+ skb2 = skb_clone(skb, GFP_ATOMIC);
if (!skb2) {
net_err_ratelimited("%s: Error: %s: skb_clone() failed\n",
bond_dev->name, __func__);
continue;
}
- bond_dev_queue_xmit(bond, skb2, slave->dev);
}
+
+ if (bond_dev_queue_xmit(bond, skb2, slave->dev) == NETDEV_TX_OK)
+ xmit_suc = true;
}
- if (slave && bond_slave_is_up(slave) && slave->link == BOND_LINK_UP)
- return bond_dev_queue_xmit(bond, skb, slave->dev);
- return bond_tx_drop(bond_dev, skb);
+ if (!skb_used)
+ dev_kfree_skb_any(skb);
+
+ if (xmit_suc)
+ return NETDEV_TX_OK;
+
+ atomic_long_inc(&bond_dev->tx_dropped);
+ return NET_XMIT_DROP;
}
/*------------------------- Device initialization ---------------------------*/
static void *bond_info_seq_start(struct seq_file *seq, loff_t *pos)
__acquires(RCU)
{
- struct bonding *bond = PDE_DATA(file_inode(seq->file));
+ struct bonding *bond = pde_data(file_inode(seq->file));
struct list_head *iter;
struct slave *slave;
loff_t off = 0;
static void *bond_info_seq_next(struct seq_file *seq, void *v, loff_t *pos)
{
- struct bonding *bond = PDE_DATA(file_inode(seq->file));
+ struct bonding *bond = pde_data(file_inode(seq->file));
struct list_head *iter;
struct slave *slave;
bool found = false;
static void bond_info_show_master(struct seq_file *seq)
{
- struct bonding *bond = PDE_DATA(file_inode(seq->file));
+ struct bonding *bond = pde_data(file_inode(seq->file));
const struct bond_opt_value *optval;
struct slave *curr, *primary;
int i;
static void bond_info_show_slave(struct seq_file *seq,
const struct slave *slave)
{
- struct bonding *bond = PDE_DATA(file_inode(seq->file));
+ struct bonding *bond = pde_data(file_inode(seq->file));
seq_printf(seq, "\nSlave Interface: %s\n", slave->dev->name);
seq_printf(seq, "MII Status: %s\n", bond_slave_link_status(slave->link));
/* set EMAC SPEED, depend on PHY */
reg_val = readl(db->membase + EMAC_MAC_SUPP_REG);
- reg_val &= ~(0x1 << 8);
+ reg_val &= ~EMAC_MAC_SUPP_100M;
if (db->speed == SPEED_100)
- reg_val |= 1 << 8;
+ reg_val |= EMAC_MAC_SUPP_100M;
writel(reg_val, db->membase + EMAC_MAC_SUPP_REG);
}
/* re enable interrupt */
reg_val = readl(db->membase + EMAC_INT_CTL_REG);
- reg_val |= (0x01 << 8);
+ reg_val |= EMAC_INT_CTL_RX_EN;
writel(reg_val, db->membase + EMAC_INT_CTL_REG);
db->emacrx_completed_flag = 1;
/* initial EMAC */
/* flush RX FIFO */
reg_val = readl(db->membase + EMAC_RX_CTL_REG);
- reg_val |= 0x8;
+ reg_val |= EMAC_RX_CTL_FLUSH_FIFO;
writel(reg_val, db->membase + EMAC_RX_CTL_REG);
udelay(1);
/* set MII clock */
reg_val = readl(db->membase + EMAC_MAC_MCFG_REG);
- reg_val &= (~(0xf << 2));
- reg_val |= (0xD << 2);
+ reg_val &= ~EMAC_MAC_MCFG_MII_CLKD_MASK;
+ reg_val |= EMAC_MAC_MCFG_MII_CLKD_72;
writel(reg_val, db->membase + EMAC_MAC_MCFG_REG);
/* clear RX counter */
/* enable RX/TX0/RX Hlevel interrup */
reg_val = readl(db->membase + EMAC_INT_CTL_REG);
- reg_val |= (0xf << 0) | (0x01 << 8);
+ reg_val |= (EMAC_INT_CTL_TX_EN | EMAC_INT_CTL_TX_ABRT_EN | EMAC_INT_CTL_RX_EN);
writel(reg_val, db->membase + EMAC_INT_CTL_REG);
spin_unlock_irqrestore(&db->lock, flags);
if (!rxcount) {
db->emacrx_completed_flag = 1;
reg_val = readl(db->membase + EMAC_INT_CTL_REG);
- reg_val |= (0xf << 0) | (0x01 << 8);
+ reg_val |= (EMAC_INT_CTL_TX_EN |
+ EMAC_INT_CTL_TX_ABRT_EN |
+ EMAC_INT_CTL_RX_EN);
writel(reg_val, db->membase + EMAC_INT_CTL_REG);
/* had one stuck? */
writel(reg_val | EMAC_CTL_RX_EN,
db->membase + EMAC_CTL_REG);
reg_val = readl(db->membase + EMAC_INT_CTL_REG);
- reg_val |= (0xf << 0) | (0x01 << 8);
+ reg_val |= (EMAC_INT_CTL_TX_EN |
+ EMAC_INT_CTL_TX_ABRT_EN |
+ EMAC_INT_CTL_RX_EN);
writel(reg_val, db->membase + EMAC_INT_CTL_REG);
db->emacrx_completed_flag = 1;
}
/* Transmit Interrupt check */
- if (int_status & (0x01 | 0x02))
+ if (int_status & EMAC_INT_STA_TX_COMPLETE)
emac_tx_done(dev, db, int_status);
- if (int_status & (0x04 | 0x08))
+ if (int_status & EMAC_INT_STA_TX_ABRT)
netdev_info(dev, " ab : %x\n", int_status);
/* Re-enable interrupt mask */
if (db->emacrx_completed_flag == 1) {
reg_val = readl(db->membase + EMAC_INT_CTL_REG);
- reg_val |= (0xf << 0) | (0x01 << 8);
+ reg_val |= (EMAC_INT_CTL_TX_EN | EMAC_INT_CTL_TX_ABRT_EN | EMAC_INT_CTL_RX_EN);
writel(reg_val, db->membase + EMAC_INT_CTL_REG);
} else {
reg_val = readl(db->membase + EMAC_INT_CTL_REG);
- reg_val |= (0xf << 0);
+ reg_val |= (EMAC_INT_CTL_TX_EN | EMAC_INT_CTL_TX_ABRT_EN);
writel(reg_val, db->membase + EMAC_INT_CTL_REG);
}
clk_disable_unprepare(db->clk);
out_dispose_mapping:
irq_dispose_mapping(ndev->irq);
+ dma_release_channel(db->rx_chan);
out_iounmap:
iounmap(db->membase);
out:
#define EMAC_RX_CTL_REG (0x3c)
#define EMAC_RX_CTL_AUTO_DRQ_EN (1 << 1)
#define EMAC_RX_CTL_DMA_EN (1 << 2)
+#define EMAC_RX_CTL_FLUSH_FIFO (1 << 3)
#define EMAC_RX_CTL_PASS_ALL_EN (1 << 4)
#define EMAC_RX_CTL_PASS_CTL_EN (1 << 5)
#define EMAC_RX_CTL_PASS_CRC_ERR_EN (1 << 6)
#define EMAC_RX_IO_DATA_STATUS_OK (1 << 7)
#define EMAC_RX_FBC_REG (0x50)
#define EMAC_INT_CTL_REG (0x54)
+#define EMAC_INT_CTL_RX_EN (1 << 8)
+#define EMAC_INT_CTL_TX0_EN (1)
+#define EMAC_INT_CTL_TX1_EN (1 << 1)
+#define EMAC_INT_CTL_TX_EN (EMAC_INT_CTL_TX0_EN | EMAC_INT_CTL_TX1_EN)
+#define EMAC_INT_CTL_TX0_ABRT_EN (0x1 << 2)
+#define EMAC_INT_CTL_TX1_ABRT_EN (0x1 << 3)
+#define EMAC_INT_CTL_TX_ABRT_EN (EMAC_INT_CTL_TX0_ABRT_EN | EMAC_INT_CTL_TX1_ABRT_EN)
#define EMAC_INT_STA_REG (0x58)
+#define EMAC_INT_STA_TX0_COMPLETE (0x1)
+#define EMAC_INT_STA_TX1_COMPLETE (0x1 << 1)
+#define EMAC_INT_STA_TX_COMPLETE (EMAC_INT_STA_TX0_COMPLETE | EMAC_INT_STA_TX1_COMPLETE)
+#define EMAC_INT_STA_TX0_ABRT (0x1 << 2)
+#define EMAC_INT_STA_TX1_ABRT (0x1 << 3)
+#define EMAC_INT_STA_TX_ABRT (EMAC_INT_STA_TX0_ABRT | EMAC_INT_STA_TX1_ABRT)
+#define EMAC_INT_STA_RX_COMPLETE (0x1 << 8)
#define EMAC_MAC_CTL0_REG (0x5c)
#define EMAC_MAC_CTL0_RX_FLOW_CTL_EN (1 << 2)
#define EMAC_MAC_CTL0_TX_FLOW_CTL_EN (1 << 3)
#define EMAC_MAC_CLRT_RM (0x0f)
#define EMAC_MAC_MAXF_REG (0x70)
#define EMAC_MAC_SUPP_REG (0x74)
+#define EMAC_MAC_SUPP_100M (0x1 << 8)
#define EMAC_MAC_TEST_REG (0x78)
#define EMAC_MAC_MCFG_REG (0x7c)
+#define EMAC_MAC_MCFG_MII_CLKD_MASK (0xff << 2)
+#define EMAC_MAC_MCFG_MII_CLKD_72 (0x0d << 2)
#define EMAC_MAC_A0_REG (0x98)
#define EMAC_MAC_A1_REG (0x9c)
#define EMAC_MAC_A2_REG (0xa0)
struct bmac_data *bp;
const unsigned char *prop_addr;
unsigned char addr[6];
+ u8 macaddr[6];
struct net_device *dev;
int is_bmac_plus = ((int)match->data) != 0;
rev = addr[0] == 0 && addr[1] == 0xA0;
for (j = 0; j < 6; ++j)
- dev->dev_addr[j] = rev ? bitrev8(addr[j]): addr[j];
+ macaddr[j] = rev ? bitrev8(addr[j]): addr[j];
+
+ eth_hw_addr_set(dev, macaddr);
/* Enable chip without interrupts for now */
bmac_enable_and_reset_chip(dev);
static void mace_tx_timeout(struct timer_list *t);
static inline void dbdma_reset(volatile struct dbdma_regs __iomem *dma);
static inline void mace_clean_rings(struct mace_data *mp);
-static void __mace_set_address(struct net_device *dev, void *addr);
+static void __mace_set_address(struct net_device *dev, const void *addr);
/*
* If we can't get a skbuff when we need it, we use this area for DMA.
struct net_device *dev;
struct mace_data *mp;
const unsigned char *addr;
+ u8 macaddr[ETH_ALEN];
int j, rev, rc = -EBUSY;
if (macio_resource_count(mdev) != 3 || macio_irq_count(mdev) != 3) {
rev = addr[0] == 0 && addr[1] == 0xA0;
for (j = 0; j < 6; ++j) {
- dev->dev_addr[j] = rev ? bitrev8(addr[j]): addr[j];
+ macaddr[j] = rev ? bitrev8(addr[j]): addr[j];
}
+ eth_hw_addr_set(dev, macaddr);
mp->chipid = (in_8(&mp->mace->chipid_hi) << 8) |
in_8(&mp->mace->chipid_lo);
out_8(&mb->plscc, PORTSEL_GPSI + ENPLSIO);
}
-static void __mace_set_address(struct net_device *dev, void *addr)
+static void __mace_set_address(struct net_device *dev, const void *addr)
{
struct mace_data *mp = netdev_priv(dev);
volatile struct mace __iomem *mb = mp->mace;
- unsigned char *p = addr;
+ const unsigned char *p = addr;
+ u8 macaddr[ETH_ALEN];
int i;
/* load up the hardware address */
;
}
for (i = 0; i < 6; ++i)
- out_8(&mb->padr, dev->dev_addr[i] = p[i]);
+ out_8(&mb->padr, macaddr[i] = p[i]);
+
+ eth_hw_addr_set(dev, macaddr);
+
if (mp->chipid != BROKEN_ADDRCHG_REV)
out_8(&mb->iac, 0);
}
/* Request the WOL interrupt and advertise suspend if available */
priv->wol_irq_disabled = true;
- err = devm_request_irq(&pdev->dev, priv->wol_irq, bcmgenet_wol_isr, 0,
- dev->name, priv);
- if (!err)
- device_set_wakeup_capable(&pdev->dev, 1);
+ if (priv->wol_irq > 0) {
+ err = devm_request_irq(&pdev->dev, priv->wol_irq,
+ bcmgenet_wol_isr, 0, dev->name, priv);
+ if (!err)
+ device_set_wakeup_capable(&pdev->dev, 1);
+ }
/* Set the needed headroom to account for any possible
* features enabling/disabling at runtime
#include <linux/tcp.h>
#include <linux/ipv6.h>
+#include <net/inet_ecn.h>
#include <net/route.h>
#include <net/ip6_route.h>
rt = ip_route_output_ports(&init_net, &fl4, NULL, peer_ip, local_ip,
peer_port, local_port, IPPROTO_TCP,
- tos, 0);
+ tos & ~INET_ECN_MASK, 0);
if (IS_ERR(rt))
return NULL;
n = dst_neigh_lookup(&rt->dst, &peer_ip);
struct mdio_fsl_priv {
struct tgec_mdio_controller __iomem *mdio_base;
bool is_little_endian;
+ bool has_a009885;
bool has_a011043;
};
{
struct mdio_fsl_priv *priv = (struct mdio_fsl_priv *)bus->priv;
struct tgec_mdio_controller __iomem *regs = priv->mdio_base;
+ unsigned long flags;
uint16_t dev_addr;
uint32_t mdio_stat;
uint32_t mdio_ctl;
- uint16_t value;
int ret;
bool endian = priv->is_little_endian;
return ret;
}
+ if (priv->has_a009885)
+ /* Once the operation completes, i.e. MDIO_STAT_BSY clears, we
+ * must read back the data register within 16 MDC cycles.
+ */
+ local_irq_save(flags);
+
/* Initiate the read */
xgmac_write32(mdio_ctl | MDIO_CTL_READ, ®s->mdio_ctl, endian);
ret = xgmac_wait_until_done(&bus->dev, regs, endian);
if (ret)
- return ret;
+ goto irq_restore;
/* Return all Fs if nothing was there */
if ((xgmac_read32(®s->mdio_stat, endian) & MDIO_STAT_RD_ER) &&
dev_dbg(&bus->dev,
"Error while reading PHY%d reg at %d.%hhu\n",
phy_id, dev_addr, regnum);
- return 0xffff;
+ ret = 0xffff;
+ } else {
+ ret = xgmac_read32(®s->mdio_data, endian) & 0xffff;
+ dev_dbg(&bus->dev, "read %04x\n", ret);
}
- value = xgmac_read32(®s->mdio_data, endian) & 0xffff;
- dev_dbg(&bus->dev, "read %04x\n", value);
+irq_restore:
+ if (priv->has_a009885)
+ local_irq_restore(flags);
- return value;
+ return ret;
}
static int xgmac_mdio_probe(struct platform_device *pdev)
priv->is_little_endian = device_property_read_bool(&pdev->dev,
"little-endian");
+ priv->has_a009885 = device_property_read_bool(&pdev->dev,
+ "fsl,erratum-a009885");
priv->has_a011043 = device_property_read_bool(&pdev->dev,
"fsl,erratum-a011043");
static int xgmac_mdio_remove(struct platform_device *pdev)
{
struct mii_bus *bus = platform_get_drvdata(pdev);
+ struct mdio_fsl_priv *priv = bus->priv;
mdiobus_unregister(bus);
- iounmap(bus->priv);
+ iounmap(priv->mdio_base);
mdiobus_free(bus);
return 0;
netdevice->dev_addr[5] = readb(eth_addr + 0x06);
iounmap(eth_addr);
- if (!netdevice->irq) {
+ if (netdevice->irq < 0) {
printk(KERN_ERR "%s: IRQ not found for i82596 at 0x%lx\n",
__FILE__, netdevice->base_addr);
+ retval = netdevice->irq;
goto probe_failed;
}
struct list_head rif_entry_list;
struct notifier_block inetaddr_nb;
struct notifier_block inetaddr_valid_nb;
- bool aborted;
};
struct prestera_rxtx_params {
int prestera_hw_rif_create(struct prestera_switch *sw,
struct prestera_iface *iif, u8 *mac, u16 *rif_id)
{
- struct prestera_msg_rif_req req;
struct prestera_msg_rif_resp resp;
+ struct prestera_msg_rif_req req;
int err;
memcpy(req.mac, mac, ETH_ALEN);
int prestera_hw_vr_create(struct prestera_switch *sw, u16 *vr_id)
{
- int err;
struct prestera_msg_vr_resp resp;
struct prestera_msg_vr_req req;
+ int err;
err = prestera_cmd_ret(sw, PRESTERA_CMD_TYPE_ROUTER_VR_CREATE,
&req.cmd, sizeof(req), &resp.ret, sizeof(resp));
prestera_event_handlers_unregister(sw);
prestera_rxtx_switch_fini(sw);
prestera_switchdev_fini(sw);
+ prestera_router_fini(sw);
prestera_netdev_event_handler_unregister(sw);
prestera_hw_switch_fini(sw);
}
struct netlink_ext_ack *extack)
{
struct prestera_port *port = netdev_priv(port_dev);
- int err;
- struct prestera_rif_entry *re;
struct prestera_rif_entry_key re_key = {};
+ struct prestera_rif_entry *re;
u32 kern_tb_id;
+ int err;
err = prestera_is_valid_mac_addr(port, port_dev->dev_addr);
if (err) {
switch (event) {
case NETDEV_UP:
if (re) {
- NL_SET_ERR_MSG_MOD(extack, "rif_entry already exist");
+ NL_SET_ERR_MSG_MOD(extack, "RIF already exist");
return -EEXIST;
}
re = prestera_rif_entry_create(port->sw, &re_key,
prestera_fix_tb_id(kern_tb_id),
port_dev->dev_addr);
if (!re) {
- NL_SET_ERR_MSG_MOD(extack, "Can't create rif_entry");
+ NL_SET_ERR_MSG_MOD(extack, "Can't create RIF");
return -EINVAL;
}
dev_hold(port_dev);
break;
case NETDEV_DOWN:
if (!re) {
- NL_SET_ERR_MSG_MOD(extack, "rif_entry not exist");
+ NL_SET_ERR_MSG_MOD(extack, "Can't find RIF");
return -EEXIST;
}
prestera_rif_entry_destroy(port->sw, re);
unsigned long event,
struct netlink_ext_ack *extack)
{
- if (prestera_netdev_check(dev) && !netif_is_bridge_port(dev) &&
- !netif_is_lag_port(dev) && !netif_is_ovs_port(dev))
- return __prestera_inetaddr_port_event(dev, event, extack);
+ if (!prestera_netdev_check(dev) || netif_is_bridge_port(dev) ||
+ netif_is_lag_port(dev) || netif_is_ovs_port(dev))
+ return 0;
- return 0;
+ return __prestera_inetaddr_port_event(dev, event, extack);
}
static int __prestera_inetaddr_cb(struct notifier_block *nb,
goto out;
if (ipv4_is_multicast(ivi->ivi_addr)) {
+ NL_SET_ERR_MSG_MOD(ivi->extack,
+ "Multicast addr on RIF is not supported");
err = -EINVAL;
goto out;
}
err_register_inetaddr_notifier:
unregister_inetaddr_validator_notifier(&router->inetaddr_valid_nb);
err_register_inetaddr_validator_notifier:
- /* prestera_router_hw_fini */
+ prestera_router_hw_fini(sw);
err_router_lib_init:
kfree(sw->router);
return err;
{
unregister_inetaddr_notifier(&sw->router->inetaddr_nb);
unregister_inetaddr_validator_notifier(&sw->router->inetaddr_valid_nb);
- /* router_hw_fini */
+ prestera_router_hw_fini(sw);
kfree(sw->router);
sw->router = NULL;
}
return 0;
}
+void prestera_router_hw_fini(struct prestera_switch *sw)
+{
+ WARN_ON(!list_empty(&sw->router->vr_list));
+ WARN_ON(!list_empty(&sw->router->rif_entry_list));
+}
+
static struct prestera_vr *__prestera_vr_find(struct prestera_switch *sw,
u32 tb_id)
{
struct netlink_ext_ack *extack)
{
struct prestera_vr *vr;
- u16 hw_vr_id;
int err;
- err = prestera_hw_vr_create(sw, &hw_vr_id);
- if (err)
- return ERR_PTR(-ENOMEM);
-
vr = kzalloc(sizeof(*vr), GFP_KERNEL);
if (!vr) {
err = -ENOMEM;
}
vr->tb_id = tb_id;
- vr->hw_vr_id = hw_vr_id;
+
+ err = prestera_hw_vr_create(sw, &vr->hw_vr_id);
+ if (err)
+ goto err_hw_create;
list_add(&vr->router_node, &sw->router->vr_list);
return vr;
-err_alloc_vr:
- prestera_hw_vr_delete(sw, hw_vr_id);
+err_hw_create:
kfree(vr);
+err_alloc_vr:
return ERR_PTR(err);
}
static void __prestera_vr_destroy(struct prestera_switch *sw,
struct prestera_vr *vr)
{
- prestera_hw_vr_delete(sw, vr->hw_vr_id);
list_del(&vr->router_node);
+ prestera_hw_vr_delete(sw, vr->hw_vr_id);
kfree(vr);
}
struct prestera_vr *vr;
vr = __prestera_vr_find(sw, tb_id);
- if (!vr)
+ if (vr) {
+ refcount_inc(&vr->refcount);
+ } else {
vr = __prestera_vr_create(sw, tb_id, extack);
- if (IS_ERR(vr))
- return ERR_CAST(vr);
+ if (IS_ERR(vr))
+ return ERR_CAST(vr);
+
+ refcount_set(&vr->refcount, 1);
+ }
return vr;
}
static void prestera_vr_put(struct prestera_switch *sw, struct prestera_vr *vr)
{
- if (!vr->ref_cnt)
+ if (refcount_dec_and_test(&vr->refcount))
__prestera_vr_destroy(sw, vr);
}
out->iface.vlan_id = in->iface.vlan_id;
break;
default:
- pr_err("Unsupported iface type");
+ WARN(1, "Unsupported iface type");
return -EINVAL;
}
iface.vr_id = e->vr->hw_vr_id;
prestera_hw_rif_delete(sw, e->hw_id, &iface);
- e->vr->ref_cnt--;
prestera_vr_put(sw, e->vr);
kfree(e);
}
if (IS_ERR(e->vr))
goto err_vr_get;
- e->vr->ref_cnt++;
memcpy(&e->addr, addr, sizeof(e->addr));
/* HW */
return e;
err_hw_create:
- e->vr->ref_cnt--;
prestera_vr_put(sw, e->vr);
err_vr_get:
err_key_copy:
struct prestera_vr {
struct list_head router_node;
- unsigned int ref_cnt;
+ refcount_t refcount;
u32 tb_id; /* key (kernel fib table id) */
u16 hw_vr_id; /* virtual router ID */
u8 __pad[2];
struct prestera_rif_entry_key *k,
u32 tb_id, const unsigned char *addr);
int prestera_router_hw_init(struct prestera_switch *sw);
+void prestera_router_hw_fini(struct prestera_switch *sw);
#endif /* _PRESTERA_ROUTER_HW_H_ */
phylink_config);
struct mtk_eth *eth = mac->hw;
u32 mcr_cur, mcr_new, sid, i;
- int val, ge_mode, err;
+ int val, ge_mode, err = 0;
/* MT76x8 has no hardware settings between for the MAC */
if (!MTK_HAS_CAPS(eth->soc->caps, MTK_SOC_MT7628) &&
/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
/* Copyright (c) 2018 Mellanox Technologies. */
+#include <net/inet_ecn.h>
#include <net/vxlan.h>
#include <net/gre.h>
#include <net/geneve.h>
int err;
/* add the IP fields */
- attr.fl.fl4.flowi4_tos = tun_key->tos;
+ attr.fl.fl4.flowi4_tos = tun_key->tos & ~INET_ECN_MASK;
attr.fl.fl4.daddr = tun_key->u.ipv4.dst;
attr.fl.fl4.saddr = tun_key->u.ipv4.src;
attr.ttl = tun_key->ttl;
int err;
/* add the IP fields */
- attr.fl.fl4.flowi4_tos = tun_key->tos;
+ attr.fl.fl4.flowi4_tos = tun_key->tos & ~INET_ECN_MASK;
attr.fl.fl4.daddr = tun_key->u.ipv4.dst;
attr.fl.fl4.saddr = tun_key->u.ipv4.src;
attr.ttl = tun_key->ttl;
ocelot_write_rix(ocelot, 0, ANA_POL_FLOWC, port);
- ocelot_fields_write(ocelot, port, SYS_PAUSE_CFG_PAUSE_ENA, tx_pause);
+ /* Don't attempt to send PAUSE frames on the NPI port, it's broken */
+ if (port != ocelot->npi)
+ ocelot_fields_write(ocelot, port, SYS_PAUSE_CFG_PAUSE_ENA,
+ tx_pause);
/* Undo the effects of ocelot_phylink_mac_link_down:
* enable MAC module
return -EOPNOTSUPP;
}
- if (filter->block_id == VCAP_IS1 &&
- !is_zero_ether_addr(match.mask->dst)) {
- NL_SET_ERR_MSG_MOD(extack,
- "Key type S1_NORMAL cannot match on destination MAC");
- return -EOPNOTSUPP;
- }
-
/* The hw support mac matches only for MAC_ETYPE key,
* therefore if other matches(port, tcp flags, etc) are added
* then just bail out
return -EOPNOTSUPP;
flow_rule_match_eth_addrs(rule, &match);
+
+ if (filter->block_id == VCAP_IS1 &&
+ !is_zero_ether_addr(match.mask->dst)) {
+ NL_SET_ERR_MSG_MOD(extack,
+ "Key type S1_NORMAL cannot match on destination MAC");
+ return -EOPNOTSUPP;
+ }
+
filter->key_type = OCELOT_VCAP_KEY_ETYPE;
ether_addr_copy(filter->key.etype.dmac.value,
match.key->dst);
struct netlink_ext_ack *extack = f->common.extack;
struct ocelot_vcap_filter *filter;
int chain = f->common.chain_index;
- int ret;
+ int block_id, ret;
if (chain && !ocelot_find_vcap_filter_that_points_at(ocelot, chain)) {
NL_SET_ERR_MSG_MOD(extack, "No default GOTO action points to this chain");
return -EOPNOTSUPP;
}
+ block_id = ocelot_chain_to_block(chain, ingress);
+ if (block_id < 0) {
+ NL_SET_ERR_MSG_MOD(extack, "Cannot offload to this chain");
+ return -EOPNOTSUPP;
+ }
+
+ filter = ocelot_vcap_block_find_filter_by_id(&ocelot->block[block_id],
+ f->cookie, true);
+ if (filter) {
+ /* Filter already exists on other ports */
+ if (!ingress) {
+ NL_SET_ERR_MSG_MOD(extack, "VCAP ES0 does not support shared filters");
+ return -EOPNOTSUPP;
+ }
+
+ filter->ingress_port_mask |= BIT(port);
+
+ return ocelot_vcap_filter_replace(ocelot, filter);
+ }
+
+ /* Filter didn't exist, create it now */
filter = ocelot_vcap_filter_create(ocelot, port, ingress, f);
if (!filter)
return -ENOMEM;
if (filter->type == OCELOT_VCAP_FILTER_DUMMY)
return ocelot_vcap_dummy_filter_del(ocelot, filter);
+ if (ingress) {
+ filter->ingress_port_mask &= ~BIT(port);
+ if (filter->ingress_port_mask)
+ return ocelot_vcap_filter_replace(ocelot, filter);
+ }
+
return ocelot_vcap_filter_del(ocelot, filter);
}
EXPORT_SYMBOL_GPL(ocelot_cls_flower_destroy);
ocelot_port_bridge_join(ocelot, port, bridge);
err = switchdev_bridge_port_offload(brport_dev, dev, priv,
- &ocelot_netdevice_nb,
+ &ocelot_switchdev_nb,
&ocelot_switchdev_blocking_nb,
false, extack);
if (err)
err_switchdev_sync:
switchdev_bridge_port_unoffload(brport_dev, priv,
- &ocelot_netdevice_nb,
+ &ocelot_switchdev_nb,
&ocelot_switchdev_blocking_nb);
err_switchdev_offload:
ocelot_port_bridge_leave(ocelot, port, bridge);
struct ocelot_port_private *priv = netdev_priv(dev);
switchdev_bridge_port_unoffload(brport_dev, priv,
- &ocelot_netdevice_nb,
+ &ocelot_switchdev_nb,
&ocelot_switchdev_blocking_nb);
}
#include <linux/io.h>
#include <linux/module.h>
#include <linux/of.h>
+#include <linux/of_device.h>
#include <linux/platform_device.h>
#include <linux/regmap.h>
#include <linux/mfd/syscon.h>
#define DWMAC_RX_VARDELAY(d) ((d) << DWMAC_RX_VARDELAY_SHIFT)
#define DWMAC_RXN_VARDELAY(d) ((d) << DWMAC_RXN_VARDELAY_SHIFT)
+struct oxnas_dwmac;
+
+struct oxnas_dwmac_data {
+ int (*setup)(struct oxnas_dwmac *dwmac);
+};
+
struct oxnas_dwmac {
struct device *dev;
struct clk *clk;
struct regmap *regmap;
+ const struct oxnas_dwmac_data *data;
};
-static int oxnas_dwmac_init(struct platform_device *pdev, void *priv)
+static int oxnas_dwmac_setup_ox810se(struct oxnas_dwmac *dwmac)
{
- struct oxnas_dwmac *dwmac = priv;
unsigned int value;
int ret;
- /* Reset HW here before changing the glue configuration */
- ret = device_reset(dwmac->dev);
- if (ret)
+ ret = regmap_read(dwmac->regmap, OXNAS_DWMAC_CTRL_REGOFFSET, &value);
+ if (ret < 0)
return ret;
- ret = clk_prepare_enable(dwmac->clk);
- if (ret)
- return ret;
+ /* Enable GMII_GTXCLK to follow GMII_REFCLK, required for gigabit PHY */
+ value |= BIT(DWMAC_CKEN_GTX) |
+ /* Use simple mux for 25/125 Mhz clock switching */
+ BIT(DWMAC_SIMPLE_MUX);
+
+ regmap_write(dwmac->regmap, OXNAS_DWMAC_CTRL_REGOFFSET, value);
+
+ return 0;
+}
+
+static int oxnas_dwmac_setup_ox820(struct oxnas_dwmac *dwmac)
+{
+ unsigned int value;
+ int ret;
ret = regmap_read(dwmac->regmap, OXNAS_DWMAC_CTRL_REGOFFSET, &value);
- if (ret < 0) {
- clk_disable_unprepare(dwmac->clk);
+ if (ret < 0)
return ret;
- }
/* Enable GMII_GTXCLK to follow GMII_REFCLK, required for gigabit PHY */
value |= BIT(DWMAC_CKEN_GTX) |
/* Use simple mux for 25/125 Mhz clock switching */
- BIT(DWMAC_SIMPLE_MUX) |
- /* set auto switch tx clock source */
- BIT(DWMAC_AUTO_TX_SOURCE) |
- /* enable tx & rx vardelay */
- BIT(DWMAC_CKEN_TX_OUT) |
- BIT(DWMAC_CKEN_TXN_OUT) |
- BIT(DWMAC_CKEN_TX_IN) |
- BIT(DWMAC_CKEN_RX_OUT) |
- BIT(DWMAC_CKEN_RXN_OUT) |
- BIT(DWMAC_CKEN_RX_IN);
+ BIT(DWMAC_SIMPLE_MUX) |
+ /* set auto switch tx clock source */
+ BIT(DWMAC_AUTO_TX_SOURCE) |
+ /* enable tx & rx vardelay */
+ BIT(DWMAC_CKEN_TX_OUT) |
+ BIT(DWMAC_CKEN_TXN_OUT) |
+ BIT(DWMAC_CKEN_TX_IN) |
+ BIT(DWMAC_CKEN_RX_OUT) |
+ BIT(DWMAC_CKEN_RXN_OUT) |
+ BIT(DWMAC_CKEN_RX_IN);
regmap_write(dwmac->regmap, OXNAS_DWMAC_CTRL_REGOFFSET, value);
/* set tx & rx vardelay */
return 0;
}
+static int oxnas_dwmac_init(struct platform_device *pdev, void *priv)
+{
+ struct oxnas_dwmac *dwmac = priv;
+ int ret;
+
+ /* Reset HW here before changing the glue configuration */
+ ret = device_reset(dwmac->dev);
+ if (ret)
+ return ret;
+
+ ret = clk_prepare_enable(dwmac->clk);
+ if (ret)
+ return ret;
+
+ ret = dwmac->data->setup(dwmac);
+ if (ret)
+ clk_disable_unprepare(dwmac->clk);
+
+ return ret;
+}
+
static void oxnas_dwmac_exit(struct platform_device *pdev, void *priv)
{
struct oxnas_dwmac *dwmac = priv;
goto err_remove_config_dt;
}
+ dwmac->data = (const struct oxnas_dwmac_data *)of_device_get_match_data(&pdev->dev);
+ if (!dwmac->data) {
+ ret = -EINVAL;
+ goto err_remove_config_dt;
+ }
+
dwmac->dev = &pdev->dev;
plat_dat->bsp_priv = dwmac;
plat_dat->init = oxnas_dwmac_init;
return ret;
}
+static const struct oxnas_dwmac_data ox810se_dwmac_data = {
+ .setup = oxnas_dwmac_setup_ox810se,
+};
+
+static const struct oxnas_dwmac_data ox820_dwmac_data = {
+ .setup = oxnas_dwmac_setup_ox820,
+};
+
static const struct of_device_id oxnas_dwmac_match[] = {
- { .compatible = "oxsemi,ox820-dwmac" },
+ {
+ .compatible = "oxsemi,ox810se-dwmac",
+ .data = &ox810se_dwmac_data,
+ },
+ {
+ .compatible = "oxsemi,ox820-dwmac",
+ .data = &ox820_dwmac_data,
+ },
{ }
};
MODULE_DEVICE_TABLE(of, oxnas_dwmac_match);
pm_runtime_get_noresume(device);
pm_runtime_set_active(device);
- pm_runtime_enable(device);
+ if (!pm_runtime_enabled(device))
+ pm_runtime_enable(device);
if (priv->hw->pcs != STMMAC_PCS_TBI &&
priv->hw->pcs != STMMAC_PCS_RTBI) {
struct cpsw_common *cpsw = ndev_to_cpsw(xmeta->ndev);
int pkt_size = cpsw->rx_packet_max;
int ret = 0, port, ch = xmeta->ch;
- int headroom = CPSW_HEADROOM;
+ int headroom = CPSW_HEADROOM_NA;
struct net_device *ndev = xmeta->ndev;
struct cpsw_priv *priv;
struct page_pool *pool;
}
if (priv->xdp_prog) {
- int headroom = CPSW_HEADROOM, size = len;
+ int size = len;
xdp_init_buff(&xdp, PAGE_SIZE, &priv->xdp_rxq[ch]);
if (status & CPDMA_RX_VLAN_ENCAP) {
xmeta->ndev = ndev;
xmeta->ch = ch;
- dma = page_pool_get_dma_addr(new_page) + CPSW_HEADROOM;
+ dma = page_pool_get_dma_addr(new_page) + CPSW_HEADROOM_NA;
ret = cpdma_chan_submit_mapped(cpsw->rxv[ch].ch, new_page, dma,
pkt_size, 0);
if (ret < 0) {
{
struct page *new_page, *page = token;
void *pa = page_address(page);
- int headroom = CPSW_HEADROOM;
+ int headroom = CPSW_HEADROOM_NA;
struct cpsw_meta_xdp *xmeta;
struct cpsw_common *cpsw;
struct net_device *ndev;
}
if (priv->xdp_prog) {
- int headroom = CPSW_HEADROOM, size = len;
+ int size = len;
xdp_init_buff(&xdp, PAGE_SIZE, &priv->xdp_rxq[ch]);
if (status & CPDMA_RX_VLAN_ENCAP) {
xmeta->ndev = ndev;
xmeta->ch = ch;
- dma = page_pool_get_dma_addr(new_page) + CPSW_HEADROOM;
+ dma = page_pool_get_dma_addr(new_page) + CPSW_HEADROOM_NA;
ret = cpdma_chan_submit_mapped(cpsw->rxv[ch].ch, new_page, dma,
pkt_size, 0);
if (ret < 0) {
xmeta->ndev = priv->ndev;
xmeta->ch = ch;
- dma = page_pool_get_dma_addr(page) + CPSW_HEADROOM;
+ dma = page_pool_get_dma_addr(page) + CPSW_HEADROOM_NA;
ret = cpdma_chan_idle_submit_mapped(cpsw->rxv[ch].ch,
page, dma,
cpsw->rx_packet_max,
config NET_VENDOR_VERTEXCOM
bool "Vertexcom devices"
- default n
+ default y
help
If you have a network (Ethernet) card belonging to this class, say Y.
#include "xilinx_axienet.h"
/* Descriptors defines for Tx and Rx DMA */
-#define TX_BD_NUM_DEFAULT 64
+#define TX_BD_NUM_DEFAULT 128
#define RX_BD_NUM_DEFAULT 1024
+#define TX_BD_NUM_MIN (MAX_SKB_FRAGS + 1)
#define TX_BD_NUM_MAX 4096
#define RX_BD_NUM_MAX 4096
static int __axienet_device_reset(struct axienet_local *lp)
{
- u32 timeout;
+ u32 value;
+ int ret;
/* Reset Axi DMA. This would reset Axi Ethernet core as well. The reset
* process of Axi DMA takes a while to complete as all pending
* they both reset the entire DMA core, so only one needs to be used.
*/
axienet_dma_out32(lp, XAXIDMA_TX_CR_OFFSET, XAXIDMA_CR_RESET_MASK);
- timeout = DELAY_OF_ONE_MILLISEC;
- while (axienet_dma_in32(lp, XAXIDMA_TX_CR_OFFSET) &
- XAXIDMA_CR_RESET_MASK) {
- udelay(1);
- if (--timeout == 0) {
- netdev_err(lp->ndev, "%s: DMA reset timeout!\n",
- __func__);
- return -ETIMEDOUT;
- }
+ ret = read_poll_timeout(axienet_dma_in32, value,
+ !(value & XAXIDMA_CR_RESET_MASK),
+ DELAY_OF_ONE_MILLISEC, 50000, false, lp,
+ XAXIDMA_TX_CR_OFFSET);
+ if (ret) {
+ dev_err(lp->dev, "%s: DMA reset timeout!\n", __func__);
+ return ret;
+ }
+
+ /* Wait for PhyRstCmplt bit to be set, indicating the PHY reset has finished */
+ ret = read_poll_timeout(axienet_ior, value,
+ value & XAE_INT_PHYRSTCMPLT_MASK,
+ DELAY_OF_ONE_MILLISEC, 50000, false, lp,
+ XAE_IS_OFFSET);
+ if (ret) {
+ dev_err(lp->dev, "%s: timeout waiting for PhyRstCmplt\n", __func__);
+ return ret;
}
return 0;
if (nr_bds == -1 && !(status & XAXIDMA_BD_STS_COMPLETE_MASK))
break;
+ /* Ensure we see complete descriptor update */
+ dma_rmb();
phys = desc_get_phys_addr(lp, cur_p);
dma_unmap_single(ndev->dev.parent, phys,
(cur_p->cntrl & XAXIDMA_BD_CTRL_LENGTH_MASK),
if (cur_p->skb && (status & XAXIDMA_BD_STS_COMPLETE_MASK))
dev_consume_skb_irq(cur_p->skb);
- cur_p->cntrl = 0;
cur_p->app0 = 0;
cur_p->app1 = 0;
cur_p->app2 = 0;
cur_p->app4 = 0;
- cur_p->status = 0;
cur_p->skb = NULL;
+ /* ensure our transmit path and device don't prematurely see status cleared */
+ wmb();
+ cur_p->cntrl = 0;
+ cur_p->status = 0;
if (sizep)
*sizep += status & XAXIDMA_BD_STS_ACTUAL_LEN_MASK;
return i;
}
+/**
+ * axienet_check_tx_bd_space - Checks if a BD/group of BDs are currently busy
+ * @lp: Pointer to the axienet_local structure
+ * @num_frag: The number of BDs to check for
+ *
+ * Return: 0, on success
+ * NETDEV_TX_BUSY, if any of the descriptors are not free
+ *
+ * This function is invoked before BDs are allocated and transmission starts.
+ * This function returns 0 if a BD or group of BDs can be allocated for
+ * transmission. If the BD or any of the BDs are not free the function
+ * returns a busy status. This is invoked from axienet_start_xmit.
+ */
+static inline int axienet_check_tx_bd_space(struct axienet_local *lp,
+ int num_frag)
+{
+ struct axidma_bd *cur_p;
+
+ /* Ensure we see all descriptor updates from device or TX IRQ path */
+ rmb();
+ cur_p = &lp->tx_bd_v[(lp->tx_bd_tail + num_frag) % lp->tx_bd_num];
+ if (cur_p->cntrl)
+ return NETDEV_TX_BUSY;
+ return 0;
+}
+
/**
* axienet_start_xmit_done - Invoked once a transmit is completed by the
* Axi DMA Tx channel.
/* Matches barrier in axienet_start_xmit */
smp_mb();
- netif_wake_queue(ndev);
-}
-
-/**
- * axienet_check_tx_bd_space - Checks if a BD/group of BDs are currently busy
- * @lp: Pointer to the axienet_local structure
- * @num_frag: The number of BDs to check for
- *
- * Return: 0, on success
- * NETDEV_TX_BUSY, if any of the descriptors are not free
- *
- * This function is invoked before BDs are allocated and transmission starts.
- * This function returns 0 if a BD or group of BDs can be allocated for
- * transmission. If the BD or any of the BDs are not free the function
- * returns a busy status. This is invoked from axienet_start_xmit.
- */
-static inline int axienet_check_tx_bd_space(struct axienet_local *lp,
- int num_frag)
-{
- struct axidma_bd *cur_p;
- cur_p = &lp->tx_bd_v[(lp->tx_bd_tail + num_frag) % lp->tx_bd_num];
- if (cur_p->status & XAXIDMA_BD_STS_ALL_MASK)
- return NETDEV_TX_BUSY;
- return 0;
+ if (!axienet_check_tx_bd_space(lp, MAX_SKB_FRAGS + 1))
+ netif_wake_queue(ndev);
}
/**
num_frag = skb_shinfo(skb)->nr_frags;
cur_p = &lp->tx_bd_v[lp->tx_bd_tail];
- if (axienet_check_tx_bd_space(lp, num_frag)) {
- if (netif_queue_stopped(ndev))
- return NETDEV_TX_BUSY;
-
+ if (axienet_check_tx_bd_space(lp, num_frag + 1)) {
+ /* Should not happen as last start_xmit call should have
+ * checked for sufficient space and queue should only be
+ * woken when sufficient space is available.
+ */
netif_stop_queue(ndev);
-
- /* Matches barrier in axienet_start_xmit_done */
- smp_mb();
-
- /* Space might have just been freed - check again */
- if (axienet_check_tx_bd_space(lp, num_frag))
- return NETDEV_TX_BUSY;
-
- netif_wake_queue(ndev);
+ if (net_ratelimit())
+ netdev_warn(ndev, "TX ring unexpectedly full\n");
+ return NETDEV_TX_BUSY;
}
if (skb->ip_summed == CHECKSUM_PARTIAL) {
if (++lp->tx_bd_tail >= lp->tx_bd_num)
lp->tx_bd_tail = 0;
+ /* Stop queue if next transmit may not have space */
+ if (axienet_check_tx_bd_space(lp, MAX_SKB_FRAGS + 1)) {
+ netif_stop_queue(ndev);
+
+ /* Matches barrier in axienet_start_xmit_done */
+ smp_mb();
+
+ /* Space might have just been freed - check again */
+ if (!axienet_check_tx_bd_space(lp, MAX_SKB_FRAGS + 1))
+ netif_wake_queue(ndev);
+ }
+
return NETDEV_TX_OK;
}
tail_p = lp->rx_bd_p + sizeof(*lp->rx_bd_v) * lp->rx_bd_ci;
+ /* Ensure we see complete descriptor update */
+ dma_rmb();
phys = desc_get_phys_addr(lp, cur_p);
dma_unmap_single(ndev->dev.parent, phys, lp->max_frm_size,
DMA_FROM_DEVICE);
if (ering->rx_pending > RX_BD_NUM_MAX ||
ering->rx_mini_pending ||
ering->rx_jumbo_pending ||
- ering->rx_pending > TX_BD_NUM_MAX)
+ ering->tx_pending < TX_BD_NUM_MIN ||
+ ering->tx_pending > TX_BD_NUM_MAX)
return -EINVAL;
if (netif_running(ndev))
lp->coalesce_count_rx = XAXIDMA_DFT_RX_THRESHOLD;
lp->coalesce_count_tx = XAXIDMA_DFT_TX_THRESHOLD;
+ /* Reset core now that clocks are enabled, prior to accessing MDIO */
+ ret = __axienet_device_reset(lp);
+ if (ret)
+ goto cleanup_clk;
+
lp->phy_node = of_parse_phandle(pdev->dev.of_node, "phy-handle", 0);
if (lp->phy_node) {
ret = axienet_mdio_setup(lp);
{
struct gsi *gsi;
u32 backlog;
+ int delta;
- if (!endpoint->replenish_enabled) {
+ if (!test_bit(IPA_REPLENISH_ENABLED, endpoint->replenish_flags)) {
if (add_one)
atomic_inc(&endpoint->replenish_saved);
return;
}
+ /* If already active, just update the backlog */
+ if (test_and_set_bit(IPA_REPLENISH_ACTIVE, endpoint->replenish_flags)) {
+ if (add_one)
+ atomic_inc(&endpoint->replenish_backlog);
+ return;
+ }
+
while (atomic_dec_not_zero(&endpoint->replenish_backlog))
if (ipa_endpoint_replenish_one(endpoint))
goto try_again_later;
+
+ clear_bit(IPA_REPLENISH_ACTIVE, endpoint->replenish_flags);
+
if (add_one)
atomic_inc(&endpoint->replenish_backlog);
return;
try_again_later:
- /* The last one didn't succeed, so fix the backlog */
- backlog = atomic_inc_return(&endpoint->replenish_backlog);
+ clear_bit(IPA_REPLENISH_ACTIVE, endpoint->replenish_flags);
- if (add_one)
- atomic_inc(&endpoint->replenish_backlog);
+ /* The last one didn't succeed, so fix the backlog */
+ delta = add_one ? 2 : 1;
+ backlog = atomic_add_return(delta, &endpoint->replenish_backlog);
/* Whenever a receive buffer transaction completes we'll try to
* replenish again. It's unlikely, but if we fail to supply even
u32 max_backlog;
u32 saved;
- endpoint->replenish_enabled = true;
+ set_bit(IPA_REPLENISH_ENABLED, endpoint->replenish_flags);
while ((saved = atomic_xchg(&endpoint->replenish_saved, 0)))
atomic_add(saved, &endpoint->replenish_backlog);
{
u32 backlog;
- endpoint->replenish_enabled = false;
+ clear_bit(IPA_REPLENISH_ENABLED, endpoint->replenish_flags);
while ((backlog = atomic_xchg(&endpoint->replenish_backlog, 0)))
atomic_add(backlog, &endpoint->replenish_saved);
}
/* RX transactions require a single TRE, so the maximum
* backlog is the same as the maximum outstanding TREs.
*/
- endpoint->replenish_enabled = false;
+ clear_bit(IPA_REPLENISH_ENABLED, endpoint->replenish_flags);
+ clear_bit(IPA_REPLENISH_ACTIVE, endpoint->replenish_flags);
atomic_set(&endpoint->replenish_saved,
gsi_channel_tre_max(gsi, endpoint->channel_id));
atomic_set(&endpoint->replenish_backlog, 0);
#define IPA_ENDPOINT_MAX 32 /* Max supported by driver */
+/**
+ * enum ipa_replenish_flag: RX buffer replenish flags
+ *
+ * @IPA_REPLENISH_ENABLED: Whether receive buffer replenishing is enabled
+ * @IPA_REPLENISH_ACTIVE: Whether replenishing is underway
+ * @IPA_REPLENISH_COUNT: Number of defined replenish flags
+ */
+enum ipa_replenish_flag {
+ IPA_REPLENISH_ENABLED,
+ IPA_REPLENISH_ACTIVE,
+ IPA_REPLENISH_COUNT, /* Number of flags (must be last) */
+};
+
/**
* struct ipa_endpoint - IPA endpoint information
* @ipa: IPA pointer
* @trans_tre_max: Maximum number of TRE descriptors per transaction
* @evt_ring_id: GSI event ring used by the endpoint
* @netdev: Network device pointer, if endpoint uses one
- * @replenish_enabled: Whether receive buffer replenishing is enabled
+ * @replenish_flags: Replenishing state flags
* @replenish_ready: Number of replenish transactions without doorbell
* @replenish_saved: Replenish requests held while disabled
* @replenish_backlog: Number of buffers needed to fill hardware queue
struct net_device *netdev;
/* Receive buffer replenishing for RX endpoints */
- bool replenish_enabled;
+ DECLARE_BITMAP(replenish_flags, IPA_REPLENISH_COUNT);
u32 replenish_ready;
atomic_t replenish_saved;
atomic_t replenish_backlog;
const u8 *mac;
int ret, irq_enabled;
unsigned int i;
- const unsigned int offsets[] = {
+ static const unsigned int offsets[] = {
AT803X_LOC_MAC_ADDR_32_47_OFFSET,
AT803X_LOC_MAC_ADDR_16_31_OFFSET,
AT803X_LOC_MAC_ADDR_0_15_OFFSET,
#define MII_88E1510_GEN_CTRL_REG_1_MODE_RGMII_SGMII 0x4
#define MII_88E1510_GEN_CTRL_REG_1_RESET 0x8000 /* Soft reset */
+#define MII_88E1510_MSCR_2 0x15
+
#define MII_VCT5_TX_RX_MDI0_COUPLING 0x10
#define MII_VCT5_TX_RX_MDI1_COUPLING 0x11
#define MII_VCT5_TX_RX_MDI2_COUPLING 0x12
data[i] = marvell_get_stat(phydev, i);
}
+static int m88e1510_loopback(struct phy_device *phydev, bool enable)
+{
+ int err;
+
+ if (enable) {
+ u16 bmcr_ctl = 0, mscr2_ctl = 0;
+
+ if (phydev->speed == SPEED_1000)
+ bmcr_ctl = BMCR_SPEED1000;
+ else if (phydev->speed == SPEED_100)
+ bmcr_ctl = BMCR_SPEED100;
+
+ if (phydev->duplex == DUPLEX_FULL)
+ bmcr_ctl |= BMCR_FULLDPLX;
+
+ err = phy_write(phydev, MII_BMCR, bmcr_ctl);
+ if (err < 0)
+ return err;
+
+ if (phydev->speed == SPEED_1000)
+ mscr2_ctl = BMCR_SPEED1000;
+ else if (phydev->speed == SPEED_100)
+ mscr2_ctl = BMCR_SPEED100;
+
+ err = phy_modify_paged(phydev, MII_MARVELL_MSCR_PAGE,
+ MII_88E1510_MSCR_2, BMCR_SPEED1000 |
+ BMCR_SPEED100, mscr2_ctl);
+ if (err < 0)
+ return err;
+
+ /* Need soft reset to have speed configuration takes effect */
+ err = genphy_soft_reset(phydev);
+ if (err < 0)
+ return err;
+
+ /* FIXME: Based on trial and error test, it seem 1G need to have
+ * delay between soft reset and loopback enablement.
+ */
+ if (phydev->speed == SPEED_1000)
+ msleep(1000);
+
+ return phy_modify(phydev, MII_BMCR, BMCR_LOOPBACK,
+ BMCR_LOOPBACK);
+ } else {
+ err = phy_modify(phydev, MII_BMCR, BMCR_LOOPBACK, 0);
+ if (err < 0)
+ return err;
+
+ return phy_config_aneg(phydev);
+ }
+}
+
static int marvell_vct5_wait_complete(struct phy_device *phydev)
{
int i;
.get_sset_count = marvell_get_sset_count,
.get_strings = marvell_get_strings,
.get_stats = marvell_get_stats,
- .set_loopback = genphy_loopback,
+ .set_loopback = m88e1510_loopback,
.get_tunable = m88e1011_get_tunable,
.set_tunable = m88e1011_set_tunable,
.cable_test_start = marvell_vct7_cable_test_start,
.config_init = kszphy_config_init,
.config_intr = kszphy_config_intr,
.handle_interrupt = kszphy_handle_interrupt,
- .suspend = genphy_suspend,
- .resume = genphy_resume,
+ .suspend = kszphy_suspend,
+ .resume = kszphy_resume,
}, {
.phy_id = PHY_ID_KSZ8021,
.phy_id_mask = 0x00ffffff,
.get_sset_count = kszphy_get_sset_count,
.get_strings = kszphy_get_strings,
.get_stats = kszphy_get_stats,
- .suspend = genphy_suspend,
- .resume = genphy_resume,
+ .suspend = kszphy_suspend,
+ .resume = kszphy_resume,
}, {
.phy_id = PHY_ID_KSZ8031,
.phy_id_mask = 0x00ffffff,
.get_sset_count = kszphy_get_sset_count,
.get_strings = kszphy_get_strings,
.get_stats = kszphy_get_stats,
- .suspend = genphy_suspend,
- .resume = genphy_resume,
+ .suspend = kszphy_suspend,
+ .resume = kszphy_resume,
}, {
.phy_id = PHY_ID_KSZ8041,
.phy_id_mask = MICREL_PHY_ID_MASK,
.get_sset_count = kszphy_get_sset_count,
.get_strings = kszphy_get_strings,
.get_stats = kszphy_get_stats,
- .suspend = genphy_suspend,
- .resume = genphy_resume,
+ .suspend = kszphy_suspend,
+ .resume = kszphy_resume,
}, {
.name = "Micrel KSZ8051",
/* PHY_BASIC_FEATURES */
.get_strings = kszphy_get_strings,
.get_stats = kszphy_get_stats,
.match_phy_device = ksz8051_match_phy_device,
- .suspend = genphy_suspend,
- .resume = genphy_resume,
+ .suspend = kszphy_suspend,
+ .resume = kszphy_resume,
}, {
.phy_id = PHY_ID_KSZ8001,
.name = "Micrel KSZ8001 or KS8721",
.get_sset_count = kszphy_get_sset_count,
.get_strings = kszphy_get_strings,
.get_stats = kszphy_get_stats,
- .suspend = genphy_suspend,
- .resume = genphy_resume,
+ .suspend = kszphy_suspend,
+ .resume = kszphy_resume,
}, {
.phy_id = PHY_ID_KSZ8081,
.name = "Micrel KSZ8081 or KSZ8091",
.config_init = ksz8061_config_init,
.config_intr = kszphy_config_intr,
.handle_interrupt = kszphy_handle_interrupt,
- .suspend = genphy_suspend,
- .resume = genphy_resume,
+ .suspend = kszphy_suspend,
+ .resume = kszphy_resume,
}, {
.phy_id = PHY_ID_KSZ9021,
.phy_id_mask = 0x000ffffe,
.get_sset_count = kszphy_get_sset_count,
.get_strings = kszphy_get_strings,
.get_stats = kszphy_get_stats,
- .suspend = genphy_suspend,
- .resume = genphy_resume,
+ .suspend = kszphy_suspend,
+ .resume = kszphy_resume,
.read_mmd = genphy_read_mmd_unsupported,
.write_mmd = genphy_write_mmd_unsupported,
}, {
.get_sset_count = kszphy_get_sset_count,
.get_strings = kszphy_get_strings,
.get_stats = kszphy_get_stats,
- .suspend = genphy_suspend,
+ .suspend = kszphy_suspend,
.resume = kszphy_resume,
}, {
.phy_id = PHY_ID_LAN8814,
.get_sset_count = kszphy_get_sset_count,
.get_strings = kszphy_get_strings,
.get_stats = kszphy_get_stats,
- .suspend = genphy_suspend,
+ .suspend = kszphy_suspend,
.resume = kszphy_resume,
}, {
.phy_id = PHY_ID_KSZ8873MLL,
static int sfp_module_parse_power(struct sfp *sfp)
{
u32 power_mW = 1000;
+ bool supports_a2;
if (sfp->id.ext.options & cpu_to_be16(SFP_OPTIONS_POWER_DECL))
power_mW = 1500;
if (sfp->id.ext.options & cpu_to_be16(SFP_OPTIONS_HIGH_POWER_LEVEL))
power_mW = 2000;
+ supports_a2 = sfp->id.ext.sff8472_compliance !=
+ SFP_SFF8472_COMPLIANCE_NONE ||
+ sfp->id.ext.diagmon & SFP_DIAGMON_DDM;
+
if (power_mW > sfp->max_power_mW) {
/* Module power specification exceeds the allowed maximum. */
- if (sfp->id.ext.sff8472_compliance ==
- SFP_SFF8472_COMPLIANCE_NONE &&
- !(sfp->id.ext.diagmon & SFP_DIAGMON_DDM)) {
+ if (!supports_a2) {
/* The module appears not to implement bus address
* 0xa2, so assume that the module powers up in the
* indicated mode.
}
}
+ if (power_mW <= 1000) {
+ /* Modules below 1W do not require a power change sequence */
+ sfp->module_power_mW = power_mW;
+ return 0;
+ }
+
+ if (!supports_a2) {
+ /* The module power level is below the host maximum and the
+ * module appears not to implement bus address 0xa2, so assume
+ * that the module powers up in the indicated mode.
+ */
+ return 0;
+ }
+
/* If the module requires a higher power mode, but also requires
* an address change sequence, warn the user that the module may
* not be functional.
*/
- if (sfp->id.ext.diagmon & SFP_DIAGMON_ADDRMODE && power_mW > 1000) {
+ if (sfp->id.ext.diagmon & SFP_DIAGMON_ADDRMODE) {
dev_warn(sfp->dev,
"Address Change Sequence not supported but module requires %u.%uW, module may not be functional\n",
power_mW / 1000, (power_mW / 100) % 10);
{QMI_FIXED_INTF(0x19d2, 0x1426, 2)}, /* ZTE MF91 */
{QMI_FIXED_INTF(0x19d2, 0x1428, 2)}, /* Telewell TW-LTE 4G v2 */
{QMI_FIXED_INTF(0x19d2, 0x1432, 3)}, /* ZTE ME3620 */
+ {QMI_FIXED_INTF(0x19d2, 0x1485, 5)}, /* ZTE MF286D */
{QMI_FIXED_INTF(0x19d2, 0x2002, 4)}, /* ZTE (Vodafone) K3765-Z */
{QMI_FIXED_INTF(0x2001, 0x7e16, 3)}, /* D-Link DWM-221 */
{QMI_FIXED_INTF(0x2001, 0x7e19, 4)}, /* D-Link DWM-221 B1 */
{QMI_FIXED_INTF(0x413c, 0x81e0, 0)}, /* Dell Wireless 5821e with eSIM support*/
{QMI_FIXED_INTF(0x03f0, 0x4e1d, 8)}, /* HP lt4111 LTE/EV-DO/HSPA+ Gobi 4G Module */
{QMI_FIXED_INTF(0x03f0, 0x9d1d, 1)}, /* HP lt4120 Snapdragon X5 LTE */
+ {QMI_QUIRK_SET_DTR(0x22de, 0x9051, 2)}, /* Hucom Wireless HM-211S/K */
{QMI_FIXED_INTF(0x22de, 0x9061, 3)}, /* WeTelecom WPD-600N */
{QMI_QUIRK_SET_DTR(0x1e0e, 0x9001, 5)}, /* SIMCom 7100E, 7230E, 7600E ++ */
{QMI_QUIRK_SET_DTR(0x2c7c, 0x0121, 4)}, /* Quectel EC21 Mini PCIe */
.bind = smsc95xx_bind,
.unbind = smsc95xx_unbind,
.link_reset = smsc95xx_link_reset,
- .reset = smsc95xx_start_phy,
+ .reset = smsc95xx_reset,
+ .check_connect = smsc95xx_start_phy,
.stop = smsc95xx_stop,
.rx_fixup = smsc95xx_rx_fixup,
.tx_fixup = smsc95xx_tx_fixup,
stragglers = num_cpu >= vi->curr_queue_pairs ?
num_cpu % vi->curr_queue_pairs :
0;
- cpu = cpumask_next(-1, cpu_online_mask);
+ cpu = cpumask_first(cpu_online_mask);
for (i = 0; i < vi->curr_queue_pairs; i++) {
group_size = stride + (i < stragglers ? 1 : 0);
static_identity->static_public, private_key);
}
+static void hmac(u8 *out, const u8 *in, const u8 *key, const size_t inlen, const size_t keylen)
+{
+ struct blake2s_state state;
+ u8 x_key[BLAKE2S_BLOCK_SIZE] __aligned(__alignof__(u32)) = { 0 };
+ u8 i_hash[BLAKE2S_HASH_SIZE] __aligned(__alignof__(u32));
+ int i;
+
+ if (keylen > BLAKE2S_BLOCK_SIZE) {
+ blake2s_init(&state, BLAKE2S_HASH_SIZE);
+ blake2s_update(&state, key, keylen);
+ blake2s_final(&state, x_key);
+ } else
+ memcpy(x_key, key, keylen);
+
+ for (i = 0; i < BLAKE2S_BLOCK_SIZE; ++i)
+ x_key[i] ^= 0x36;
+
+ blake2s_init(&state, BLAKE2S_HASH_SIZE);
+ blake2s_update(&state, x_key, BLAKE2S_BLOCK_SIZE);
+ blake2s_update(&state, in, inlen);
+ blake2s_final(&state, i_hash);
+
+ for (i = 0; i < BLAKE2S_BLOCK_SIZE; ++i)
+ x_key[i] ^= 0x5c ^ 0x36;
+
+ blake2s_init(&state, BLAKE2S_HASH_SIZE);
+ blake2s_update(&state, x_key, BLAKE2S_BLOCK_SIZE);
+ blake2s_update(&state, i_hash, BLAKE2S_HASH_SIZE);
+ blake2s_final(&state, i_hash);
+
+ memcpy(out, i_hash, BLAKE2S_HASH_SIZE);
+ memzero_explicit(x_key, BLAKE2S_BLOCK_SIZE);
+ memzero_explicit(i_hash, BLAKE2S_HASH_SIZE);
+}
+
/* This is Hugo Krawczyk's HKDF:
* - https://eprint.iacr.org/2010/264.pdf
* - https://tools.ietf.org/html/rfc5869
((third_len || third_dst) && (!second_len || !second_dst))));
/* Extract entropy from data into secret */
- blake2s256_hmac(secret, data, chaining_key, data_len, NOISE_HASH_LEN);
+ hmac(secret, data, chaining_key, data_len, NOISE_HASH_LEN);
if (!first_dst || !first_len)
goto out;
/* Expand first key: key = secret, data = 0x1 */
output[0] = 1;
- blake2s256_hmac(output, output, secret, 1, BLAKE2S_HASH_SIZE);
+ hmac(output, output, secret, 1, BLAKE2S_HASH_SIZE);
memcpy(first_dst, output, first_len);
if (!second_dst || !second_len)
/* Expand second key: key = secret, data = first-key || 0x2 */
output[BLAKE2S_HASH_SIZE] = 2;
- blake2s256_hmac(output, output, secret, BLAKE2S_HASH_SIZE + 1,
- BLAKE2S_HASH_SIZE);
+ hmac(output, output, secret, BLAKE2S_HASH_SIZE + 1, BLAKE2S_HASH_SIZE);
memcpy(second_dst, output, second_len);
if (!third_dst || !third_len)
/* Expand third key: key = secret, data = second-key || 0x3 */
output[BLAKE2S_HASH_SIZE] = 3;
- blake2s256_hmac(output, output, secret, BLAKE2S_HASH_SIZE + 1,
- BLAKE2S_HASH_SIZE);
+ hmac(output, output, secret, BLAKE2S_HASH_SIZE + 1, BLAKE2S_HASH_SIZE);
memcpy(third_dst, output, third_len);
out:
*/
#include <asm/unaligned.h>
+
+#include <linux/math.h>
#include <linux/string.h>
#include <linux/bug.h>
static int proc_status_open(struct inode *inode, struct file *file)
{
struct proc_data *data;
- struct net_device *dev = PDE_DATA(inode);
+ struct net_device *dev = pde_data(inode);
struct airo_info *apriv = dev->ml_priv;
CapabilityRid cap_rid;
StatusRid status_rid;
u16 rid)
{
struct proc_data *data;
- struct net_device *dev = PDE_DATA(inode);
+ struct net_device *dev = pde_data(inode);
struct airo_info *apriv = dev->ml_priv;
StatsRid stats;
int i, j;
static void proc_config_on_close(struct inode *inode, struct file *file)
{
struct proc_data *data = file->private_data;
- struct net_device *dev = PDE_DATA(inode);
+ struct net_device *dev = pde_data(inode);
struct airo_info *ai = dev->ml_priv;
char *line;
static int proc_config_open(struct inode *inode, struct file *file)
{
struct proc_data *data;
- struct net_device *dev = PDE_DATA(inode);
+ struct net_device *dev = pde_data(inode);
struct airo_info *ai = dev->ml_priv;
int i;
__le16 mode;
static void proc_SSID_on_close(struct inode *inode, struct file *file)
{
struct proc_data *data = file->private_data;
- struct net_device *dev = PDE_DATA(inode);
+ struct net_device *dev = pde_data(inode);
struct airo_info *ai = dev->ml_priv;
SsidRid SSID_rid;
int i;
static void proc_APList_on_close(struct inode *inode, struct file *file)
{
struct proc_data *data = file->private_data;
- struct net_device *dev = PDE_DATA(inode);
+ struct net_device *dev = pde_data(inode);
struct airo_info *ai = dev->ml_priv;
APListRid *APList_rid = &ai->APList;
int i;
static void proc_wepkey_on_close(struct inode *inode, struct file *file)
{
struct proc_data *data;
- struct net_device *dev = PDE_DATA(inode);
+ struct net_device *dev = pde_data(inode);
struct airo_info *ai = dev->ml_priv;
int i, rc;
char key[16];
static int proc_wepkey_open(struct inode *inode, struct file *file)
{
struct proc_data *data;
- struct net_device *dev = PDE_DATA(inode);
+ struct net_device *dev = pde_data(inode);
struct airo_info *ai = dev->ml_priv;
char *ptr;
WepKeyRid wkr;
static int proc_SSID_open(struct inode *inode, struct file *file)
{
struct proc_data *data;
- struct net_device *dev = PDE_DATA(inode);
+ struct net_device *dev = pde_data(inode);
struct airo_info *ai = dev->ml_priv;
int i;
char *ptr;
static int proc_APList_open(struct inode *inode, struct file *file)
{
struct proc_data *data;
- struct net_device *dev = PDE_DATA(inode);
+ struct net_device *dev = pde_data(inode);
struct airo_info *ai = dev->ml_priv;
int i;
char *ptr;
static int proc_BSSList_open(struct inode *inode, struct file *file)
{
struct proc_data *data;
- struct net_device *dev = PDE_DATA(inode);
+ struct net_device *dev = pde_data(inode);
struct airo_info *ai = dev->ml_priv;
char *ptr;
BSSListRid BSSList_rid;
#if !defined(PRISM2_NO_PROCFS_DEBUG) && defined(CONFIG_PROC_FS)
static int ap_debug_proc_show(struct seq_file *m, void *v)
{
- struct ap_data *ap = PDE_DATA(file_inode(m->file));
+ struct ap_data *ap = pde_data(file_inode(m->file));
seq_printf(m, "BridgedUnicastFrames=%u\n", ap->bridged_unicast);
seq_printf(m, "BridgedMulticastFrames=%u\n", ap->bridged_multicast);
static int ap_control_proc_show(struct seq_file *m, void *v)
{
- struct ap_data *ap = PDE_DATA(file_inode(m->file));
+ struct ap_data *ap = pde_data(file_inode(m->file));
char *policy_txt;
struct mac_entry *entry;
static void *ap_control_proc_start(struct seq_file *m, loff_t *_pos)
{
- struct ap_data *ap = PDE_DATA(file_inode(m->file));
+ struct ap_data *ap = pde_data(file_inode(m->file));
spin_lock_bh(&ap->mac_restrictions.lock);
return seq_list_start_head(&ap->mac_restrictions.mac_list, *_pos);
}
static void *ap_control_proc_next(struct seq_file *m, void *v, loff_t *_pos)
{
- struct ap_data *ap = PDE_DATA(file_inode(m->file));
+ struct ap_data *ap = pde_data(file_inode(m->file));
return seq_list_next(v, &ap->mac_restrictions.mac_list, _pos);
}
static void ap_control_proc_stop(struct seq_file *m, void *v)
{
- struct ap_data *ap = PDE_DATA(file_inode(m->file));
+ struct ap_data *ap = pde_data(file_inode(m->file));
spin_unlock_bh(&ap->mac_restrictions.lock);
}
static void *prism2_ap_proc_start(struct seq_file *m, loff_t *_pos)
{
- struct ap_data *ap = PDE_DATA(file_inode(m->file));
+ struct ap_data *ap = pde_data(file_inode(m->file));
spin_lock_bh(&ap->sta_table_lock);
return seq_list_start_head(&ap->sta_list, *_pos);
}
static void *prism2_ap_proc_next(struct seq_file *m, void *v, loff_t *_pos)
{
- struct ap_data *ap = PDE_DATA(file_inode(m->file));
+ struct ap_data *ap = pde_data(file_inode(m->file));
return seq_list_next(v, &ap->sta_list, _pos);
}
static void prism2_ap_proc_stop(struct seq_file *m, void *v)
{
- struct ap_data *ap = PDE_DATA(file_inode(m->file));
+ struct ap_data *ap = pde_data(file_inode(m->file));
spin_unlock_bh(&ap->sta_table_lock);
}
sizeof(struct prism2_download_aux_dump));
if (ret == 0) {
struct seq_file *m = file->private_data;
- m->private = PDE_DATA(inode);
+ m->private = pde_data(inode);
}
return ret;
}
static void *prism2_wds_proc_start(struct seq_file *m, loff_t *_pos)
{
- local_info_t *local = PDE_DATA(file_inode(m->file));
+ local_info_t *local = pde_data(file_inode(m->file));
read_lock_bh(&local->iface_lock);
return seq_list_start(&local->hostap_interfaces, *_pos);
}
static void *prism2_wds_proc_next(struct seq_file *m, void *v, loff_t *_pos)
{
- local_info_t *local = PDE_DATA(file_inode(m->file));
+ local_info_t *local = pde_data(file_inode(m->file));
return seq_list_next(v, &local->hostap_interfaces, _pos);
}
static void prism2_wds_proc_stop(struct seq_file *m, void *v)
{
- local_info_t *local = PDE_DATA(file_inode(m->file));
+ local_info_t *local = pde_data(file_inode(m->file));
read_unlock_bh(&local->iface_lock);
}
static int prism2_bss_list_proc_show(struct seq_file *m, void *v)
{
- local_info_t *local = PDE_DATA(file_inode(m->file));
+ local_info_t *local = pde_data(file_inode(m->file));
struct list_head *ptr = v;
struct hostap_bss_info *bss;
static void *prism2_bss_list_proc_start(struct seq_file *m, loff_t *_pos)
__acquires(&local->lock)
{
- local_info_t *local = PDE_DATA(file_inode(m->file));
+ local_info_t *local = pde_data(file_inode(m->file));
spin_lock_bh(&local->lock);
return seq_list_start_head(&local->bss_list, *_pos);
}
static void *prism2_bss_list_proc_next(struct seq_file *m, void *v, loff_t *_pos)
{
- local_info_t *local = PDE_DATA(file_inode(m->file));
+ local_info_t *local = pde_data(file_inode(m->file));
return seq_list_next(v, &local->bss_list, _pos);
}
static void prism2_bss_list_proc_stop(struct seq_file *m, void *v)
__releases(&local->lock)
{
- local_info_t *local = PDE_DATA(file_inode(m->file));
+ local_info_t *local = pde_data(file_inode(m->file));
spin_unlock_bh(&local->lock);
}
static ssize_t prism2_pda_proc_read(struct file *file, char __user *buf,
size_t count, loff_t *_pos)
{
- local_info_t *local = PDE_DATA(file_inode(file));
+ local_info_t *local = pde_data(file_inode(file));
size_t off;
if (local->pda == NULL || *_pos >= PRISM2_PDA_SIZE)
#ifndef PRISM2_NO_STATION_MODES
static int prism2_scan_results_proc_show(struct seq_file *m, void *v)
{
- local_info_t *local = PDE_DATA(file_inode(m->file));
+ local_info_t *local = pde_data(file_inode(m->file));
unsigned long entry;
int i, len;
struct hfa384x_hostscan_result *scanres;
static void *prism2_scan_results_proc_start(struct seq_file *m, loff_t *_pos)
{
- local_info_t *local = PDE_DATA(file_inode(m->file));
+ local_info_t *local = pde_data(file_inode(m->file));
spin_lock_bh(&local->lock);
/* We have a header (pos 0) + N results to show (pos 1...N) */
static void *prism2_scan_results_proc_next(struct seq_file *m, void *v, loff_t *_pos)
{
- local_info_t *local = PDE_DATA(file_inode(m->file));
+ local_info_t *local = pde_data(file_inode(m->file));
++*_pos;
if (*_pos > local->last_scan_results_count)
static void prism2_scan_results_proc_stop(struct seq_file *m, void *v)
{
- local_info_t *local = PDE_DATA(file_inode(m->file));
+ local_info_t *local = pde_data(file_inode(m->file));
spin_unlock_bh(&local->lock);
}
nr = nr * 10 + c;
p++;
} while (--len);
- *(int *)PDE_DATA(file_inode(file)) = nr;
+ *(int *)pde_data(file_inode(file)) = nr;
return count;
}
int err;
while (!mhi_queue_is_full(mdev, DMA_FROM_DEVICE)) {
- struct sk_buff *skb = alloc_skb(MHI_DEFAULT_MRU, GFP_KERNEL);
+ struct sk_buff *skb = alloc_skb(mbim->mru, GFP_KERNEL);
if (unlikely(!skb))
break;
err = mhi_queue_skb(mdev, DMA_FROM_DEVICE, skb,
- MHI_DEFAULT_MRU, MHI_EOT);
+ mbim->mru, MHI_EOT);
if (unlikely(err)) {
kfree_skb(skb);
break;
static void pn544_hci_i2c_platform_init(struct pn544_i2c_phy *phy)
{
int polarity, retry, ret;
- char rset_cmd[] = { 0x05, 0xF9, 0x04, 0x00, 0xC3, 0xE5 };
+ static const char rset_cmd[] = { 0x05, 0xF9, 0x04, 0x00, 0xC3, 0xE5 };
int count = sizeof(rset_cmd);
nfc_info(&phy->i2c_dev->dev, "Detecting nfc_en polarity\n");
return -ENOMEM;
transaction->aid_len = skb->data[1];
+
+ /* Checking if the length of the AID is valid */
+ if (transaction->aid_len > sizeof(transaction->aid))
+ return -EINVAL;
+
memcpy(transaction->aid, &skb->data[2],
transaction->aid_len);
return -EPROTO;
transaction->params_len = skb->data[transaction->aid_len + 3];
+
+ /* Total size is allocated (skb->len - 2) minus fixed array members */
+ if (transaction->params_len > ((skb->len - 2) - sizeof(struct nfc_evt_transaction)))
+ return -EINVAL;
+
memcpy(transaction->params, skb->data +
transaction->aid_len + 4, transaction->params_len);
static struct nubus_proc_pde_data *
nubus_proc_alloc_pde_data(unsigned char *ptr, unsigned int size)
{
- struct nubus_proc_pde_data *pde_data;
+ struct nubus_proc_pde_data *pded;
- pde_data = kmalloc(sizeof(*pde_data), GFP_KERNEL);
- if (!pde_data)
+ pded = kmalloc(sizeof(*pded), GFP_KERNEL);
+ if (!pded)
return NULL;
- pde_data->res_ptr = ptr;
- pde_data->res_size = size;
- return pde_data;
+ pded->res_ptr = ptr;
+ pded->res_size = size;
+ return pded;
}
static int nubus_proc_rsrc_show(struct seq_file *m, void *v)
{
struct inode *inode = m->private;
- struct nubus_proc_pde_data *pde_data;
+ struct nubus_proc_pde_data *pded;
- pde_data = PDE_DATA(inode);
- if (!pde_data)
+ pded = pde_data(inode);
+ if (!pded)
return 0;
- if (pde_data->res_size > m->size)
+ if (pded->res_size > m->size)
return -EFBIG;
- if (pde_data->res_size) {
+ if (pded->res_size) {
int lanes = (int)proc_get_parent_data(inode);
struct nubus_dirent ent;
return 0;
ent.mask = lanes;
- ent.base = pde_data->res_ptr;
+ ent.base = pded->res_ptr;
ent.data = 0;
- nubus_seq_write_rsrc_mem(m, &ent, pde_data->res_size);
+ nubus_seq_write_rsrc_mem(m, &ent, pded->res_size);
} else {
- unsigned int data = (unsigned int)pde_data->res_ptr;
+ unsigned int data = (unsigned int)pded->res_ptr;
seq_putc(m, data >> 16);
seq_putc(m, data >> 8);
unsigned int size)
{
char name[9];
- struct nubus_proc_pde_data *pde_data;
+ struct nubus_proc_pde_data *pded;
if (!procdir)
return;
snprintf(name, sizeof(name), "%x", ent->type);
if (size)
- pde_data = nubus_proc_alloc_pde_data(nubus_dirptr(ent), size);
+ pded = nubus_proc_alloc_pde_data(nubus_dirptr(ent), size);
else
- pde_data = NULL;
+ pded = NULL;
proc_create_single_data(name, S_IFREG | 0444, procdir,
- nubus_proc_rsrc_show, pde_data);
+ nubus_proc_rsrc_show, pded);
}
void nubus_proc_add_rsrc(struct proc_dir_entry *procdir,
return count;
}
-static int __of_parse_phandle_with_args(const struct device_node *np,
- const char *list_name,
- const char *cells_name,
- int cell_count, int index,
- struct of_phandle_args *out_args)
+int __of_parse_phandle_with_args(const struct device_node *np,
+ const char *list_name,
+ const char *cells_name,
+ int cell_count, int index,
+ struct of_phandle_args *out_args)
{
struct of_phandle_iterator it;
int rc, cur_index = 0;
+ if (index < 0)
+ return -EINVAL;
+
/* Loop over the phandles until all the requested entry is found */
of_for_each_phandle(&it, rc, np, list_name, cells_name, cell_count) {
/*
of_node_put(it.node);
return rc;
}
-
-/**
- * of_parse_phandle - Resolve a phandle property to a device_node pointer
- * @np: Pointer to device node holding phandle property
- * @phandle_name: Name of property holding a phandle value
- * @index: For properties holding a table of phandles, this is the index into
- * the table
- *
- * Return: The device_node pointer with refcount incremented. Use
- * of_node_put() on it when done.
- */
-struct device_node *of_parse_phandle(const struct device_node *np,
- const char *phandle_name, int index)
-{
- struct of_phandle_args args;
-
- if (index < 0)
- return NULL;
-
- if (__of_parse_phandle_with_args(np, phandle_name, NULL, 0,
- index, &args))
- return NULL;
-
- return args.np;
-}
-EXPORT_SYMBOL(of_parse_phandle);
-
-/**
- * of_parse_phandle_with_args() - Find a node pointed by phandle in a list
- * @np: pointer to a device tree node containing a list
- * @list_name: property name that contains a list
- * @cells_name: property name that specifies phandles' arguments count
- * @index: index of a phandle to parse out
- * @out_args: optional pointer to output arguments structure (will be filled)
- *
- * This function is useful to parse lists of phandles and their arguments.
- * Returns 0 on success and fills out_args, on error returns appropriate
- * errno value.
- *
- * Caller is responsible to call of_node_put() on the returned out_args->np
- * pointer.
- *
- * Example::
- *
- * phandle1: node1 {
- * #list-cells = <2>;
- * };
- *
- * phandle2: node2 {
- * #list-cells = <1>;
- * };
- *
- * node3 {
- * list = <&phandle1 1 2 &phandle2 3>;
- * };
- *
- * To get a device_node of the ``node2`` node you may call this:
- * of_parse_phandle_with_args(node3, "list", "#list-cells", 1, &args);
- */
-int of_parse_phandle_with_args(const struct device_node *np, const char *list_name,
- const char *cells_name, int index,
- struct of_phandle_args *out_args)
-{
- int cell_count = -1;
-
- if (index < 0)
- return -EINVAL;
-
- /* If cells_name is NULL we assume a cell count of 0 */
- if (!cells_name)
- cell_count = 0;
-
- return __of_parse_phandle_with_args(np, list_name, cells_name,
- cell_count, index, out_args);
-}
-EXPORT_SYMBOL(of_parse_phandle_with_args);
+EXPORT_SYMBOL(__of_parse_phandle_with_args);
/**
* of_parse_phandle_with_args_map() - Find a node pointed by phandle in a list and remap it
}
EXPORT_SYMBOL(of_parse_phandle_with_args_map);
-/**
- * of_parse_phandle_with_fixed_args() - Find a node pointed by phandle in a list
- * @np: pointer to a device tree node containing a list
- * @list_name: property name that contains a list
- * @cell_count: number of argument cells following the phandle
- * @index: index of a phandle to parse out
- * @out_args: optional pointer to output arguments structure (will be filled)
- *
- * This function is useful to parse lists of phandles and their arguments.
- * Returns 0 on success and fills out_args, on error returns appropriate
- * errno value.
- *
- * Caller is responsible to call of_node_put() on the returned out_args->np
- * pointer.
- *
- * Example::
- *
- * phandle1: node1 {
- * };
- *
- * phandle2: node2 {
- * };
- *
- * node3 {
- * list = <&phandle1 0 2 &phandle2 2 3>;
- * };
- *
- * To get a device_node of the ``node2`` node you may call this:
- * of_parse_phandle_with_fixed_args(node3, "list", 2, 1, &args);
- */
-int of_parse_phandle_with_fixed_args(const struct device_node *np,
- const char *list_name, int cell_count,
- int index, struct of_phandle_args *out_args)
-{
- if (index < 0)
- return -EINVAL;
- return __of_parse_phandle_with_args(np, list_name, NULL, cell_count,
- index, out_args);
-}
-EXPORT_SYMBOL(of_parse_phandle_with_fixed_args);
-
/**
* of_count_phandle_with_args() - Find the number of phandles references in a property
* @np: pointer to a device tree node containing a list
const struct of_device_id *of_match_device(const struct of_device_id *matches,
const struct device *dev)
{
- if ((!matches) || (!dev->of_node))
+ if (!matches || !dev->of_node || dev->of_node_reused)
return NULL;
return of_match_node(matches, dev->of_node);
}
static int led_proc_open(struct inode *inode, struct file *file)
{
- return single_open(file, led_proc_show, PDE_DATA(inode));
+ return single_open(file, led_proc_show, pde_data(inode));
}
static ssize_t led_proc_write(struct file *file, const char __user *buf,
size_t count, loff_t *pos)
{
- void *data = PDE_DATA(file_inode(file));
+ void *data = pde_data(file_inode(file));
char *cur, lbuf[32];
int d;
entry->kobj.kset = paths_kset;
err = kobject_init_and_add(&entry->kobj, &ktype_pdcspath, NULL,
"%s", entry->name);
- if (err)
+ if (err) {
+ kobject_put(&entry->kobj);
return err;
+ }
/* kobject is now registered */
write_lock(&entry->rw_lock);
if (!val)
return 0;
- pos = find_next_bit(&val, MAX_MSI_IRQS_PER_CTRL, 0);
+ pos = find_first_bit(&val, MAX_MSI_IRQS_PER_CTRL);
while (pos != MAX_MSI_IRQS_PER_CTRL) {
generic_handle_domain_irq(pp->irq_domain,
(index * MAX_MSI_IRQS_PER_CTRL) + pos);
static loff_t proc_bus_pci_lseek(struct file *file, loff_t off, int whence)
{
- struct pci_dev *dev = PDE_DATA(file_inode(file));
+ struct pci_dev *dev = pde_data(file_inode(file));
return fixed_size_llseek(file, off, whence, dev->cfg_size);
}
static ssize_t proc_bus_pci_read(struct file *file, char __user *buf,
size_t nbytes, loff_t *ppos)
{
- struct pci_dev *dev = PDE_DATA(file_inode(file));
+ struct pci_dev *dev = pde_data(file_inode(file));
unsigned int pos = *ppos;
unsigned int cnt, size;
size_t nbytes, loff_t *ppos)
{
struct inode *ino = file_inode(file);
- struct pci_dev *dev = PDE_DATA(ino);
+ struct pci_dev *dev = pde_data(ino);
int pos = *ppos;
int size = dev->cfg_size;
int cnt, ret;
static long proc_bus_pci_ioctl(struct file *file, unsigned int cmd,
unsigned long arg)
{
- struct pci_dev *dev = PDE_DATA(file_inode(file));
+ struct pci_dev *dev = pde_data(file_inode(file));
#ifdef HAVE_PCI_MMAP
struct pci_filp_private *fpriv = file->private_data;
#endif /* HAVE_PCI_MMAP */
#ifdef HAVE_PCI_MMAP
static int proc_bus_pci_mmap(struct file *file, struct vm_area_struct *vma)
{
- struct pci_dev *dev = PDE_DATA(file_inode(file));
+ struct pci_dev *dev = pde_data(file_inode(file));
struct pci_filp_private *fpriv = file->private_data;
int i, ret, write_combine = 0, res_bit = IORESOURCE_MEM;
static int dispatch_proc_open(struct inode *inode, struct file *file)
{
- return single_open(file, dispatch_proc_show, PDE_DATA(inode));
+ return single_open(file, dispatch_proc_show, pde_data(inode));
}
static ssize_t dispatch_proc_write(struct file *file,
const char __user *userbuf,
size_t count, loff_t *pos)
{
- struct ibm_struct *ibm = PDE_DATA(file_inode(file));
+ struct ibm_struct *ibm = pde_data(file_inode(file));
char *kernbuf;
int ret;
static int lcd_proc_open(struct inode *inode, struct file *file)
{
- return single_open(file, lcd_proc_show, PDE_DATA(inode));
+ return single_open(file, lcd_proc_show, pde_data(inode));
}
static int set_lcd_brightness(struct toshiba_acpi_dev *dev, int value)
static ssize_t lcd_proc_write(struct file *file, const char __user *buf,
size_t count, loff_t *pos)
{
- struct toshiba_acpi_dev *dev = PDE_DATA(file_inode(file));
+ struct toshiba_acpi_dev *dev = pde_data(file_inode(file));
char cmd[42];
size_t len;
int levels;
static int video_proc_open(struct inode *inode, struct file *file)
{
- return single_open(file, video_proc_show, PDE_DATA(inode));
+ return single_open(file, video_proc_show, pde_data(inode));
}
static ssize_t video_proc_write(struct file *file, const char __user *buf,
size_t count, loff_t *pos)
{
- struct toshiba_acpi_dev *dev = PDE_DATA(file_inode(file));
+ struct toshiba_acpi_dev *dev = pde_data(file_inode(file));
char *buffer;
char *cmd;
int lcd_out = -1, crt_out = -1, tv_out = -1;
static int fan_proc_open(struct inode *inode, struct file *file)
{
- return single_open(file, fan_proc_show, PDE_DATA(inode));
+ return single_open(file, fan_proc_show, pde_data(inode));
}
static ssize_t fan_proc_write(struct file *file, const char __user *buf,
size_t count, loff_t *pos)
{
- struct toshiba_acpi_dev *dev = PDE_DATA(file_inode(file));
+ struct toshiba_acpi_dev *dev = pde_data(file_inode(file));
char cmd[42];
size_t len;
int value;
static int keys_proc_open(struct inode *inode, struct file *file)
{
- return single_open(file, keys_proc_show, PDE_DATA(inode));
+ return single_open(file, keys_proc_show, pde_data(inode));
}
static ssize_t keys_proc_write(struct file *file, const char __user *buf,
size_t count, loff_t *pos)
{
- struct toshiba_acpi_dev *dev = PDE_DATA(file_inode(file));
+ struct toshiba_acpi_dev *dev = pde_data(file_inode(file));
char cmd[42];
size_t len;
int value;
static ssize_t isapnp_proc_bus_read(struct file *file, char __user * buf,
size_t nbytes, loff_t * ppos)
{
- struct pnp_dev *dev = PDE_DATA(file_inode(file));
+ struct pnp_dev *dev = pde_data(file_inode(file));
int pos = *ppos;
int cnt, size = 256;
static int pnpbios_proc_open(struct inode *inode, struct file *file)
{
- return single_open(file, pnpbios_proc_show, PDE_DATA(inode));
+ return single_open(file, pnpbios_proc_show, pde_data(inode));
}
static ssize_t pnpbios_proc_write(struct file *file, const char __user *buf,
size_t count, loff_t *pos)
{
- void *data = PDE_DATA(file_inode(file));
+ void *data = pde_data(file_inode(file));
struct pnp_bios_node *node;
int boot = (long)data >> 8;
u8 nodenum = (long)data;
}
}
+static int pwm_apply_legacy(struct pwm_chip *chip, struct pwm_device *pwm,
+ const struct pwm_state *state)
+{
+ int err;
+ struct pwm_state initial_state = pwm->state;
+
+ if (state->polarity != pwm->state.polarity) {
+ if (!chip->ops->set_polarity)
+ return -EINVAL;
+
+ /*
+ * Changing the polarity of a running PWM is only allowed when
+ * the PWM driver implements ->apply().
+ */
+ if (pwm->state.enabled) {
+ chip->ops->disable(chip, pwm);
+
+ /*
+ * Update pwm->state already here in case
+ * .set_polarity() or another callback depend on that.
+ */
+ pwm->state.enabled = false;
+ }
+
+ err = chip->ops->set_polarity(chip, pwm, state->polarity);
+ if (err)
+ goto rollback;
+
+ pwm->state.polarity = state->polarity;
+ }
+
+ if (!state->enabled) {
+ if (pwm->state.enabled)
+ chip->ops->disable(chip, pwm);
+
+ return 0;
+ }
+
+ /*
+ * We cannot skip calling ->config even if state->period ==
+ * pwm->state.period && state->duty_cycle == pwm->state.duty_cycle
+ * because we might have exited early in the last call to
+ * pwm_apply_state because of !state->enabled and so the two values in
+ * pwm->state might not be configured in hardware.
+ */
+ err = chip->ops->config(pwm->chip, pwm,
+ state->duty_cycle,
+ state->period);
+ if (err)
+ goto rollback;
+
+ pwm->state.period = state->period;
+ pwm->state.duty_cycle = state->duty_cycle;
+
+ if (!pwm->state.enabled) {
+ err = chip->ops->enable(chip, pwm);
+ if (err)
+ goto rollback;
+ }
+
+ return 0;
+
+rollback:
+ pwm->state = initial_state;
+ return err;
+}
+
/**
* pwm_apply_state() - atomically apply a new state to a PWM device
* @pwm: PWM device
state->usage_power == pwm->state.usage_power)
return 0;
- if (chip->ops->apply) {
+ if (chip->ops->apply)
err = chip->ops->apply(chip, pwm, state);
- if (err)
- return err;
-
- trace_pwm_apply(pwm, state);
-
- pwm->state = *state;
-
- /*
- * only do this after pwm->state was applied as some
- * implementations of .get_state depend on this
- */
- pwm_apply_state_debug(pwm, state);
- } else {
- /*
- * FIXME: restore the initial state in case of error.
- */
- if (state->polarity != pwm->state.polarity) {
- if (!chip->ops->set_polarity)
- return -EINVAL;
+ else
+ err = pwm_apply_legacy(chip, pwm, state);
+ if (err)
+ return err;
- /*
- * Changing the polarity of a running PWM is
- * only allowed when the PWM driver implements
- * ->apply().
- */
- if (pwm->state.enabled) {
- chip->ops->disable(chip, pwm);
- pwm->state.enabled = false;
- }
+ trace_pwm_apply(pwm, state);
- err = chip->ops->set_polarity(chip, pwm,
- state->polarity);
- if (err)
- return err;
+ pwm->state = *state;
- pwm->state.polarity = state->polarity;
- }
-
- if (state->period != pwm->state.period ||
- state->duty_cycle != pwm->state.duty_cycle) {
- err = chip->ops->config(pwm->chip, pwm,
- state->duty_cycle,
- state->period);
- if (err)
- return err;
-
- pwm->state.duty_cycle = state->duty_cycle;
- pwm->state.period = state->period;
- }
-
- if (state->enabled != pwm->state.enabled) {
- if (state->enabled) {
- err = chip->ops->enable(chip, pwm);
- if (err)
- return err;
- } else {
- chip->ops->disable(chip, pwm);
- }
-
- pwm->state.enabled = state->enabled;
- }
- }
+ /*
+ * only do this after pwm->state was applied as some
+ * implementations of .get_state depend on this
+ */
+ pwm_apply_state_debug(pwm, state);
return 0;
}
duty = DIV_ROUND_UP(timebase * duty_ns, period_ns);
- ret = pm_runtime_get_sync(chip->dev);
- if (ret < 0) {
- pm_runtime_put_autosuspend(chip->dev);
+ ret = pm_runtime_resume_and_get(chip->dev);
+ if (ret < 0)
return ret;
- }
val = img_pwm_readl(pwm_chip, PWM_CTRL_CFG);
val &= ~(PWM_CTRL_CFG_DIV_MASK << PWM_CTRL_CFG_DIV_SHIFT(pwm->hwpwm));
pm_runtime_put_autosuspend(chip->dev);
}
+static int img_pwm_apply(struct pwm_chip *chip, struct pwm_device *pwm,
+ const struct pwm_state *state)
+{
+ int err;
+
+ if (state->polarity != PWM_POLARITY_NORMAL)
+ return -EINVAL;
+
+ if (!state->enabled) {
+ if (pwm->state.enabled)
+ img_pwm_disable(chip, pwm);
+
+ return 0;
+ }
+
+ err = img_pwm_config(pwm->chip, pwm, state->duty_cycle, state->period);
+ if (err)
+ return err;
+
+ if (!pwm->state.enabled)
+ err = img_pwm_enable(chip, pwm);
+
+ return err;
+}
+
static const struct pwm_ops img_pwm_ops = {
- .config = img_pwm_config,
- .enable = img_pwm_enable,
- .disable = img_pwm_disable,
+ .apply = img_pwm_apply,
.owner = THIS_MODULE,
};
}
static int twl_pwm_config(struct pwm_chip *chip, struct pwm_device *pwm,
- int duty_ns, int period_ns)
+ u64 duty_ns, u64 period_ns)
{
- int duty_cycle = DIV_ROUND_UP(duty_ns * TWL_PWM_MAX, period_ns) + 1;
+ int duty_cycle = DIV64_U64_ROUND_UP(duty_ns * TWL_PWM_MAX, period_ns) + 1;
u8 pwm_config[2] = { 1, 0 };
int base, ret;
mutex_unlock(&twl->mutex);
}
+static int twl4030_pwm_apply(struct pwm_chip *chip, struct pwm_device *pwm,
+ const struct pwm_state *state)
+{
+ int err;
+
+ if (state->polarity != PWM_POLARITY_NORMAL)
+ return -EINVAL;
+
+ if (!state->enabled) {
+ if (pwm->state.enabled)
+ twl4030_pwm_disable(chip, pwm);
+
+ return 0;
+ }
+
+ err = twl_pwm_config(pwm->chip, pwm, state->duty_cycle, state->period);
+ if (err)
+ return err;
+
+ if (!pwm->state.enabled)
+ err = twl4030_pwm_enable(chip, pwm);
+
+ return err;
+}
+
+static int twl6030_pwm_apply(struct pwm_chip *chip, struct pwm_device *pwm,
+ const struct pwm_state *state)
+{
+ int err;
+
+ if (state->polarity != PWM_POLARITY_NORMAL)
+ return -EINVAL;
+
+ if (!state->enabled) {
+ if (pwm->state.enabled)
+ twl6030_pwm_disable(chip, pwm);
+
+ return 0;
+ }
+
+ err = twl_pwm_config(pwm->chip, pwm, state->duty_cycle, state->period);
+ if (err)
+ return err;
+
+ if (!pwm->state.enabled)
+ err = twl6030_pwm_enable(chip, pwm);
+
+ return err;
+}
+
static const struct pwm_ops twl4030_pwm_ops = {
- .config = twl_pwm_config,
- .enable = twl4030_pwm_enable,
- .disable = twl4030_pwm_disable,
+ .apply = twl4030_pwm_apply,
.request = twl4030_pwm_request,
.free = twl4030_pwm_free,
.owner = THIS_MODULE,
};
static const struct pwm_ops twl6030_pwm_ops = {
- .config = twl_pwm_config,
- .enable = twl6030_pwm_enable,
- .disable = twl6030_pwm_disable,
+ .apply = twl6030_pwm_apply,
.owner = THIS_MODULE,
};
}
static int vt8500_pwm_config(struct pwm_chip *chip, struct pwm_device *pwm,
- int duty_ns, int period_ns)
+ u64 duty_ns, u64 period_ns)
{
struct vt8500_chip *vt8500 = to_vt8500_chip(chip);
unsigned long long c;
}
c = (unsigned long long)pv * duty_ns;
- do_div(c, period_ns);
- dc = c;
+
+ dc = div64_u64(c, period_ns);
writel(prescale, vt8500->base + REG_SCALAR(pwm->hwpwm));
vt8500_pwm_busy_wait(vt8500, pwm->hwpwm, STATUS_SCALAR_UPDATE);
return 0;
}
+static int vt8500_pwm_apply(struct pwm_chip *chip, struct pwm_device *pwm,
+ const struct pwm_state *state)
+{
+ int err;
+ bool enabled = pwm->state.enabled;
+
+ if (state->polarity != pwm->state.polarity) {
+ /*
+ * Changing the polarity of a running PWM is only allowed when
+ * the PWM driver implements ->apply().
+ */
+ if (enabled) {
+ vt8500_pwm_disable(chip, pwm);
+
+ enabled = false;
+ }
+
+ err = vt8500_pwm_set_polarity(chip, pwm, state->polarity);
+ if (err)
+ return err;
+ }
+
+ if (!state->enabled) {
+ if (enabled)
+ vt8500_pwm_disable(chip, pwm);
+
+ return 0;
+ }
+
+ /*
+ * We cannot skip calling ->config even if state->period ==
+ * pwm->state.period && state->duty_cycle == pwm->state.duty_cycle
+ * because we might have exited early in the last call to
+ * pwm_apply_state because of !state->enabled and so the two values in
+ * pwm->state might not be configured in hardware.
+ */
+ err = vt8500_pwm_config(pwm->chip, pwm, state->duty_cycle, state->period);
+ if (err)
+ return err;
+
+ if (!enabled)
+ err = vt8500_pwm_enable(chip, pwm);
+
+ return err;
+}
+
static const struct pwm_ops vt8500_pwm_ops = {
- .enable = vt8500_pwm_enable,
- .disable = vt8500_pwm_disable,
- .config = vt8500_pwm_config,
- .set_polarity = vt8500_pwm_set_polarity,
+ .apply = vt8500_pwm_apply,
.owner = THIS_MODULE,
};
This driver can also be built as a module. If so, the module
will be called rtc-v3020.
+config RTC_DRV_GAMECUBE
+ tristate "Nintendo GameCube, Wii and Wii U RTC"
+ depends on GAMECUBE || WII || COMPILE_TEST
+ select REGMAP
+ help
+ If you say yes here you will get support for the RTC subsystem
+ of the Nintendo GameCube, Wii and Wii U.
+
+ This driver can also be built as a module. If so, the module
+ will be called "rtc-gamecube".
+
config RTC_DRV_WM831X
tristate "Wolfson Microelectronics WM831x RTC"
depends on MFD_WM831X
To compile this driver as a module, choose M here: the
module will be called rtc-sh.
+config RTC_DRV_SUNPLUS
+ tristate "Sunplus SP7021 RTC"
+ depends on SOC_SP7021
+ help
+ Say 'yes' to get support for the real-time clock present in
+ Sunplus SP7021 - a SoC for industrial applications. It provides
+ RTC status check, timer/alarm functionalities, user data
+ reservation with the battery over 2.5V, RTC power status check
+ and battery charge.
+
+ This driver can also be built as a module. If so, the module
+ will be called rtc-sunplus.
+
config RTC_DRV_VR41XX
tristate "NEC VR41XX"
depends on CPU_VR41XX || COMPILE_TEST
obj-$(CONFIG_RTC_DRV_MV) += rtc-mv.o
obj-$(CONFIG_RTC_DRV_MXC) += rtc-mxc.o
obj-$(CONFIG_RTC_DRV_MXC_V2) += rtc-mxc_v2.o
+obj-$(CONFIG_RTC_DRV_GAMECUBE) += rtc-gamecube.o
obj-$(CONFIG_RTC_DRV_NTXEC) += rtc-ntxec.o
obj-$(CONFIG_RTC_DRV_OMAP) += rtc-omap.o
obj-$(CONFIG_RTC_DRV_OPAL) += rtc-opal.o
obj-$(CONFIG_RTC_DRV_STMP) += rtc-stmp3xxx.o
obj-$(CONFIG_RTC_DRV_SUN4V) += rtc-sun4v.o
obj-$(CONFIG_RTC_DRV_SUN6I) += rtc-sun6i.o
+obj-$(CONFIG_RTC_DRV_SUNPLUS) += rtc-sunplus.o
obj-$(CONFIG_RTC_DRV_SUNXI) += rtc-sunxi.o
obj-$(CONFIG_RTC_DRV_TEGRA) += rtc-tegra.o
obj-$(CONFIG_RTC_DRV_TEST) += rtc-test.o
}
switch(param.param) {
- long offset;
case RTC_PARAM_FEATURES:
if (param.index != 0)
err = -EINVAL;
param.uvalue = rtc->features[0];
break;
- case RTC_PARAM_CORRECTION:
+ case RTC_PARAM_CORRECTION: {
+ long offset;
mutex_unlock(&rtc->ops_lock);
if (param.index != 0)
return -EINVAL;
if (err == 0)
param.svalue = offset;
break;
-
+ }
default:
if (rtc->ops->param_get)
err = rtc->ops->param_get(rtc->dev.parent, ¶m);
static int cmos_read_time(struct device *dev, struct rtc_time *t)
{
+ int ret;
+
/*
* If pm_trace abused the RTC for storage, set the timespec to 0,
* which tells the caller that this RTC value is unusable.
if (!pm_trace_rtc_valid())
return -EIO;
- mc146818_get_time(t);
+ ret = mc146818_get_time(t);
+ if (ret < 0) {
+ dev_err_ratelimited(dev, "unable to read current time\n");
+ return ret;
+ }
+
return 0;
}
return mc146818_set_time(t);
}
+struct cmos_read_alarm_callback_param {
+ struct cmos_rtc *cmos;
+ struct rtc_time *time;
+ unsigned char rtc_control;
+};
+
+static void cmos_read_alarm_callback(unsigned char __always_unused seconds,
+ void *param_in)
+{
+ struct cmos_read_alarm_callback_param *p =
+ (struct cmos_read_alarm_callback_param *)param_in;
+ struct rtc_time *time = p->time;
+
+ time->tm_sec = CMOS_READ(RTC_SECONDS_ALARM);
+ time->tm_min = CMOS_READ(RTC_MINUTES_ALARM);
+ time->tm_hour = CMOS_READ(RTC_HOURS_ALARM);
+
+ if (p->cmos->day_alrm) {
+ /* ignore upper bits on readback per ACPI spec */
+ time->tm_mday = CMOS_READ(p->cmos->day_alrm) & 0x3f;
+ if (!time->tm_mday)
+ time->tm_mday = -1;
+
+ if (p->cmos->mon_alrm) {
+ time->tm_mon = CMOS_READ(p->cmos->mon_alrm);
+ if (!time->tm_mon)
+ time->tm_mon = -1;
+ }
+ }
+
+ p->rtc_control = CMOS_READ(RTC_CONTROL);
+}
+
static int cmos_read_alarm(struct device *dev, struct rtc_wkalrm *t)
{
struct cmos_rtc *cmos = dev_get_drvdata(dev);
- unsigned char rtc_control;
+ struct cmos_read_alarm_callback_param p = {
+ .cmos = cmos,
+ .time = &t->time,
+ };
/* This not only a rtc_op, but also called directly */
if (!is_valid_irq(cmos->irq))
* the future.
*/
- spin_lock_irq(&rtc_lock);
- t->time.tm_sec = CMOS_READ(RTC_SECONDS_ALARM);
- t->time.tm_min = CMOS_READ(RTC_MINUTES_ALARM);
- t->time.tm_hour = CMOS_READ(RTC_HOURS_ALARM);
-
- if (cmos->day_alrm) {
- /* ignore upper bits on readback per ACPI spec */
- t->time.tm_mday = CMOS_READ(cmos->day_alrm) & 0x3f;
- if (!t->time.tm_mday)
- t->time.tm_mday = -1;
-
- if (cmos->mon_alrm) {
- t->time.tm_mon = CMOS_READ(cmos->mon_alrm);
- if (!t->time.tm_mon)
- t->time.tm_mon = -1;
- }
- }
-
- rtc_control = CMOS_READ(RTC_CONTROL);
- spin_unlock_irq(&rtc_lock);
+ /* Some Intel chipsets disconnect the alarm registers when the clock
+ * update is in progress - during this time reads return bogus values
+ * and writes may fail silently. See for example "7th Generation Intel®
+ * Processor Family I/O for U/Y Platforms [...] Datasheet", section
+ * 27.7.1
+ *
+ * Use the mc146818_avoid_UIP() function to avoid this.
+ */
+ if (!mc146818_avoid_UIP(cmos_read_alarm_callback, &p))
+ return -EIO;
- if (!(rtc_control & RTC_DM_BINARY) || RTC_ALWAYS_BCD) {
+ if (!(p.rtc_control & RTC_DM_BINARY) || RTC_ALWAYS_BCD) {
if (((unsigned)t->time.tm_sec) < 0x60)
t->time.tm_sec = bcd2bin(t->time.tm_sec);
else
}
}
- t->enabled = !!(rtc_control & RTC_AIE);
+ t->enabled = !!(p.rtc_control & RTC_AIE);
t->pending = 0;
return 0;
return 0;
}
+struct cmos_set_alarm_callback_param {
+ struct cmos_rtc *cmos;
+ unsigned char mon, mday, hrs, min, sec;
+ struct rtc_wkalrm *t;
+};
+
+/* Note: this function may be executed by mc146818_avoid_UIP() more then
+ * once
+ */
+static void cmos_set_alarm_callback(unsigned char __always_unused seconds,
+ void *param_in)
+{
+ struct cmos_set_alarm_callback_param *p =
+ (struct cmos_set_alarm_callback_param *)param_in;
+
+ /* next rtc irq must not be from previous alarm setting */
+ cmos_irq_disable(p->cmos, RTC_AIE);
+
+ /* update alarm */
+ CMOS_WRITE(p->hrs, RTC_HOURS_ALARM);
+ CMOS_WRITE(p->min, RTC_MINUTES_ALARM);
+ CMOS_WRITE(p->sec, RTC_SECONDS_ALARM);
+
+ /* the system may support an "enhanced" alarm */
+ if (p->cmos->day_alrm) {
+ CMOS_WRITE(p->mday, p->cmos->day_alrm);
+ if (p->cmos->mon_alrm)
+ CMOS_WRITE(p->mon, p->cmos->mon_alrm);
+ }
+
+ if (use_hpet_alarm()) {
+ /*
+ * FIXME the HPET alarm glue currently ignores day_alrm
+ * and mon_alrm ...
+ */
+ hpet_set_alarm_time(p->t->time.tm_hour, p->t->time.tm_min,
+ p->t->time.tm_sec);
+ }
+
+ if (p->t->enabled)
+ cmos_irq_enable(p->cmos, RTC_AIE);
+}
+
static int cmos_set_alarm(struct device *dev, struct rtc_wkalrm *t)
{
struct cmos_rtc *cmos = dev_get_drvdata(dev);
- unsigned char mon, mday, hrs, min, sec, rtc_control;
+ struct cmos_set_alarm_callback_param p = {
+ .cmos = cmos,
+ .t = t
+ };
+ unsigned char rtc_control;
int ret;
/* This not only a rtc_op, but also called directly */
if (ret < 0)
return ret;
- mon = t->time.tm_mon + 1;
- mday = t->time.tm_mday;
- hrs = t->time.tm_hour;
- min = t->time.tm_min;
- sec = t->time.tm_sec;
+ p.mon = t->time.tm_mon + 1;
+ p.mday = t->time.tm_mday;
+ p.hrs = t->time.tm_hour;
+ p.min = t->time.tm_min;
+ p.sec = t->time.tm_sec;
+ spin_lock_irq(&rtc_lock);
rtc_control = CMOS_READ(RTC_CONTROL);
+ spin_unlock_irq(&rtc_lock);
+
if (!(rtc_control & RTC_DM_BINARY) || RTC_ALWAYS_BCD) {
/* Writing 0xff means "don't care" or "match all". */
- mon = (mon <= 12) ? bin2bcd(mon) : 0xff;
- mday = (mday >= 1 && mday <= 31) ? bin2bcd(mday) : 0xff;
- hrs = (hrs < 24) ? bin2bcd(hrs) : 0xff;
- min = (min < 60) ? bin2bcd(min) : 0xff;
- sec = (sec < 60) ? bin2bcd(sec) : 0xff;
- }
-
- spin_lock_irq(&rtc_lock);
-
- /* next rtc irq must not be from previous alarm setting */
- cmos_irq_disable(cmos, RTC_AIE);
-
- /* update alarm */
- CMOS_WRITE(hrs, RTC_HOURS_ALARM);
- CMOS_WRITE(min, RTC_MINUTES_ALARM);
- CMOS_WRITE(sec, RTC_SECONDS_ALARM);
-
- /* the system may support an "enhanced" alarm */
- if (cmos->day_alrm) {
- CMOS_WRITE(mday, cmos->day_alrm);
- if (cmos->mon_alrm)
- CMOS_WRITE(mon, cmos->mon_alrm);
+ p.mon = (p.mon <= 12) ? bin2bcd(p.mon) : 0xff;
+ p.mday = (p.mday >= 1 && p.mday <= 31) ? bin2bcd(p.mday) : 0xff;
+ p.hrs = (p.hrs < 24) ? bin2bcd(p.hrs) : 0xff;
+ p.min = (p.min < 60) ? bin2bcd(p.min) : 0xff;
+ p.sec = (p.sec < 60) ? bin2bcd(p.sec) : 0xff;
}
- if (use_hpet_alarm()) {
- /*
- * FIXME the HPET alarm glue currently ignores day_alrm
- * and mon_alrm ...
- */
- hpet_set_alarm_time(t->time.tm_hour, t->time.tm_min,
- t->time.tm_sec);
- }
-
- if (t->enabled)
- cmos_irq_enable(cmos, RTC_AIE);
-
- spin_unlock_irq(&rtc_lock);
+ /*
+ * Some Intel chipsets disconnect the alarm registers when the clock
+ * update is in progress - during this time writes fail silently.
+ *
+ * Use mc146818_avoid_UIP() to avoid this.
+ */
+ if (!mc146818_avoid_UIP(cmos_set_alarm_callback, &p))
+ return -EIO;
cmos->alarm_expires = rtc_tm_to_time64(&t->time);
rename_region(ports, dev_name(&cmos_rtc.rtc->dev));
- spin_lock_irq(&rtc_lock);
-
- /* Ensure that the RTC is accessible. Bit 6 must be 0! */
- if ((CMOS_READ(RTC_VALID) & 0x40) != 0) {
- spin_unlock_irq(&rtc_lock);
- dev_warn(dev, "not accessible\n");
+ if (!mc146818_does_rtc_work()) {
+ dev_warn(dev, "broken or not accessible\n");
retval = -ENXIO;
goto cleanup1;
}
+ spin_lock_irq(&rtc_lock);
+
if (!(flags & CMOS_RTC_FLAGS_NOFREQ)) {
/* force periodic irq to CMOS reset default of 1024Hz;
*
da9063_data_to_tm(data, &rtc->alarm_time, rtc);
rtc->rtc_sync = false;
- /*
- * TODO: some models have alarms on a minute boundary but still support
- * real hardware interrupts. Add this once the core supports it.
- */
- if (config->rtc_data_start != RTC_SEC)
- rtc->rtc_dev->uie_unsupported = 1;
+ if (config->rtc_data_start != RTC_SEC) {
+ set_bit(RTC_FEATURE_ALARM_RES_MINUTE, rtc->rtc_dev->features);
+ /*
+ * TODO: some models have alarms on a minute boundary but still
+ * support real hardware interrupts.
+ */
+ clear_bit(RTC_FEATURE_UPDATE_INTERRUPT, rtc->rtc_dev->features);
+ }
irq_alarm = platform_get_irq_byname(pdev, "ALARM");
if (irq_alarm < 0)
dev_err(&pdev->dev, "Failed to request ALARM IRQ %d: %d\n",
irq_alarm, ret);
+ device_init_wakeup(&pdev->dev, true);
+
return devm_rtc_register_device(rtc->rtc_dev);
}
}
}
- res = platform_get_resource(pdev, IORESOURCE_IRQ, 0);
- if (!res)
- return -ENODEV;
-
- rtc->rtc_irq = res->start;
+ rtc->rtc_irq = platform_get_irq(pdev, 0);
+ if (rtc->rtc_irq < 0)
+ return rtc->rtc_irq;
res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
if (!res)
--- /dev/null
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Nintendo GameCube, Wii and Wii U RTC driver
+ *
+ * This driver is for the MX23L4005, more specifically its real-time clock and
+ * SRAM storage. The value returned by the RTC counter must be added with the
+ * offset stored in a bias register in SRAM (on the GameCube and Wii) or in
+ * /config/rtc.xml (on the Wii U). The latter being very impractical to access
+ * from Linux, this driver assumes the bootloader has read it and stored it in
+ * SRAM like for the other two consoles.
+ *
+ * This device sits on a bus named EXI (which is similar to SPI), channel 0,
+ * device 1. This driver assumes no other user of the EXI bus, which is
+ * currently the case but would have to be reworked to add support for other
+ * GameCube hardware exposed on this bus.
+ *
+ * References:
+ * - https://wiiubrew.org/wiki/Hardware/RTC
+ * - https://wiibrew.org/wiki/MX23L4005
+ *
+ * Copyright (C) 2018 rw-r-r-0644
+ * Copyright (C) 2021 Emmanuel Gil Peyrot <linkmauve@linkmauve.fr>
+ *
+ * Based on rtc-gcn.c
+ * Copyright (C) 2004-2009 The GameCube Linux Team
+ * Copyright (C) 2005,2008,2009 Albert Herranz
+ * Based on gamecube_time.c from Torben Nielsen.
+ */
+
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/of_address.h>
+#include <linux/platform_device.h>
+#include <linux/regmap.h>
+#include <linux/rtc.h>
+#include <linux/time.h>
+
+/* EXI registers */
+#define EXICSR 0
+#define EXICR 12
+#define EXIDATA 16
+
+/* EXI register values */
+#define EXICSR_DEV 0x380
+ #define EXICSR_DEV1 0x100
+#define EXICSR_CLK 0x070
+ #define EXICSR_CLK_1MHZ 0x000
+ #define EXICSR_CLK_2MHZ 0x010
+ #define EXICSR_CLK_4MHZ 0x020
+ #define EXICSR_CLK_8MHZ 0x030
+ #define EXICSR_CLK_16MHZ 0x040
+ #define EXICSR_CLK_32MHZ 0x050
+#define EXICSR_INT 0x008
+ #define EXICSR_INTSET 0x008
+
+#define EXICR_TSTART 0x001
+#define EXICR_TRSMODE 0x002
+ #define EXICR_TRSMODE_IMM 0x000
+#define EXICR_TRSTYPE 0x00C
+ #define EXICR_TRSTYPE_R 0x000
+ #define EXICR_TRSTYPE_W 0x004
+#define EXICR_TLEN 0x030
+ #define EXICR_TLEN32 0x030
+
+/* EXI registers values to access the RTC */
+#define RTC_EXICSR (EXICSR_DEV1 | EXICSR_CLK_8MHZ | EXICSR_INTSET)
+#define RTC_EXICR_W (EXICR_TSTART | EXICR_TRSMODE_IMM | EXICR_TRSTYPE_W | EXICR_TLEN32)
+#define RTC_EXICR_R (EXICR_TSTART | EXICR_TRSMODE_IMM | EXICR_TRSTYPE_R | EXICR_TLEN32)
+#define RTC_EXIDATA_W 0x80000000
+
+/* RTC registers */
+#define RTC_COUNTER 0x200000
+#define RTC_SRAM 0x200001
+#define RTC_SRAM_BIAS 0x200004
+#define RTC_SNAPSHOT 0x204000
+#define RTC_ONTMR 0x210000
+#define RTC_OFFTMR 0x210001
+#define RTC_TEST0 0x210004
+#define RTC_TEST1 0x210005
+#define RTC_TEST2 0x210006
+#define RTC_TEST3 0x210007
+#define RTC_CONTROL0 0x21000c
+#define RTC_CONTROL1 0x21000d
+
+/* RTC flags */
+#define RTC_CONTROL0_UNSTABLE_POWER 0x00000800
+#define RTC_CONTROL0_LOW_BATTERY 0x00000200
+
+struct priv {
+ struct regmap *regmap;
+ void __iomem *iob;
+ u32 rtc_bias;
+};
+
+static int exi_read(void *context, u32 reg, u32 *data)
+{
+ struct priv *d = (struct priv *)context;
+ void __iomem *iob = d->iob;
+
+ /* The spin loops here loop about 15~16 times each, so there is no need
+ * to use a more expensive sleep method.
+ */
+
+ /* Write register offset */
+ iowrite32be(RTC_EXICSR, iob + EXICSR);
+ iowrite32be(reg << 8, iob + EXIDATA);
+ iowrite32be(RTC_EXICR_W, iob + EXICR);
+ while (!(ioread32be(iob + EXICSR) & EXICSR_INTSET))
+ cpu_relax();
+
+ /* Read data */
+ iowrite32be(RTC_EXICSR, iob + EXICSR);
+ iowrite32be(RTC_EXICR_R, iob + EXICR);
+ while (!(ioread32be(iob + EXICSR) & EXICSR_INTSET))
+ cpu_relax();
+ *data = ioread32be(iob + EXIDATA);
+
+ /* Clear channel parameters */
+ iowrite32be(0, iob + EXICSR);
+
+ return 0;
+}
+
+static int exi_write(void *context, u32 reg, u32 data)
+{
+ struct priv *d = (struct priv *)context;
+ void __iomem *iob = d->iob;
+
+ /* The spin loops here loop about 15~16 times each, so there is no need
+ * to use a more expensive sleep method.
+ */
+
+ /* Write register offset */
+ iowrite32be(RTC_EXICSR, iob + EXICSR);
+ iowrite32be(RTC_EXIDATA_W | (reg << 8), iob + EXIDATA);
+ iowrite32be(RTC_EXICR_W, iob + EXICR);
+ while (!(ioread32be(iob + EXICSR) & EXICSR_INTSET))
+ cpu_relax();
+
+ /* Write data */
+ iowrite32be(RTC_EXICSR, iob + EXICSR);
+ iowrite32be(data, iob + EXIDATA);
+ iowrite32be(RTC_EXICR_W, iob + EXICR);
+ while (!(ioread32be(iob + EXICSR) & EXICSR_INTSET))
+ cpu_relax();
+
+ /* Clear channel parameters */
+ iowrite32be(0, iob + EXICSR);
+
+ return 0;
+}
+
+static const struct regmap_bus exi_bus = {
+ /* TODO: is that true? Not that it matters here, but still. */
+ .fast_io = true,
+ .reg_read = exi_read,
+ .reg_write = exi_write,
+};
+
+static int gamecube_rtc_read_time(struct device *dev, struct rtc_time *t)
+{
+ struct priv *d = dev_get_drvdata(dev);
+ int ret;
+ u32 counter;
+ time64_t timestamp;
+
+ ret = regmap_read(d->regmap, RTC_COUNTER, &counter);
+ if (ret)
+ return ret;
+
+ /* Add the counter and the bias to obtain the timestamp */
+ timestamp = (time64_t)d->rtc_bias + counter;
+ rtc_time64_to_tm(timestamp, t);
+
+ return 0;
+}
+
+static int gamecube_rtc_set_time(struct device *dev, struct rtc_time *t)
+{
+ struct priv *d = dev_get_drvdata(dev);
+ time64_t timestamp;
+
+ /* Subtract the timestamp and the bias to obtain the counter value */
+ timestamp = rtc_tm_to_time64(t);
+ return regmap_write(d->regmap, RTC_COUNTER, timestamp - d->rtc_bias);
+}
+
+static int gamecube_rtc_ioctl(struct device *dev, unsigned int cmd, unsigned long arg)
+{
+ struct priv *d = dev_get_drvdata(dev);
+ int value;
+ int control0;
+ int ret;
+
+ switch (cmd) {
+ case RTC_VL_READ:
+ ret = regmap_read(d->regmap, RTC_CONTROL0, &control0);
+ if (ret)
+ return ret;
+
+ value = 0;
+ if (control0 & RTC_CONTROL0_UNSTABLE_POWER)
+ value |= RTC_VL_DATA_INVALID;
+ if (control0 & RTC_CONTROL0_LOW_BATTERY)
+ value |= RTC_VL_BACKUP_LOW;
+ return put_user(value, (unsigned int __user *)arg);
+
+ default:
+ return -ENOIOCTLCMD;
+ }
+}
+
+static const struct rtc_class_ops gamecube_rtc_ops = {
+ .read_time = gamecube_rtc_read_time,
+ .set_time = gamecube_rtc_set_time,
+ .ioctl = gamecube_rtc_ioctl,
+};
+
+static int gamecube_rtc_read_offset_from_sram(struct priv *d)
+{
+ struct device_node *np;
+ int ret;
+ struct resource res;
+ void __iomem *hw_srnprot;
+ u32 old;
+
+ np = of_find_compatible_node(NULL, NULL, "nintendo,latte-srnprot");
+ if (!np)
+ np = of_find_compatible_node(NULL, NULL,
+ "nintendo,hollywood-srnprot");
+ if (!np) {
+ pr_info("HW_SRNPROT not found, assuming a GameCube\n");
+ return regmap_read(d->regmap, RTC_SRAM_BIAS, &d->rtc_bias);
+ }
+
+ ret = of_address_to_resource(np, 0, &res);
+ if (ret) {
+ pr_err("no io memory range found\n");
+ return -1;
+ }
+
+ hw_srnprot = ioremap(res.start, resource_size(&res));
+ old = ioread32be(hw_srnprot);
+
+ /* TODO: figure out why we use this magic constant. I obtained it by
+ * reading the leftover value after boot, after IOSU already ran.
+ *
+ * On my Wii U, setting this register to 1 prevents the console from
+ * rebooting properly, so wiiubrew.org must be missing something.
+ *
+ * See https://wiiubrew.org/wiki/Hardware/Latte_registers
+ */
+ if (old != 0x7bf)
+ iowrite32be(0x7bf, hw_srnprot);
+
+ /* Get the offset from RTC SRAM.
+ *
+ * Its default location on the GameCube and on the Wii is in the SRAM,
+ * while on the Wii U the bootloader needs to fill it with the contents
+ * of /config/rtc.xml on the SLC (the eMMC). We don’t do that from
+ * Linux since it requires implementing a proprietary filesystem and do
+ * file decryption, instead we require the bootloader to fill the same
+ * SRAM address as on previous consoles.
+ */
+ ret = regmap_read(d->regmap, RTC_SRAM_BIAS, &d->rtc_bias);
+ if (ret) {
+ pr_err("failed to get the RTC bias\n");
+ return -1;
+ }
+
+ /* Reset SRAM access to how it was before, our job here is done. */
+ if (old != 0x7bf)
+ iowrite32be(old, hw_srnprot);
+ iounmap(hw_srnprot);
+
+ return 0;
+}
+
+static const struct regmap_range rtc_rd_ranges[] = {
+ regmap_reg_range(0x200000, 0x200010),
+ regmap_reg_range(0x204000, 0x204000),
+ regmap_reg_range(0x210000, 0x210001),
+ regmap_reg_range(0x210004, 0x210007),
+ regmap_reg_range(0x21000c, 0x21000d),
+};
+
+static const struct regmap_access_table rtc_rd_regs = {
+ .yes_ranges = rtc_rd_ranges,
+ .n_yes_ranges = ARRAY_SIZE(rtc_rd_ranges),
+};
+
+static const struct regmap_range rtc_wr_ranges[] = {
+ regmap_reg_range(0x200000, 0x200010),
+ regmap_reg_range(0x204000, 0x204000),
+ regmap_reg_range(0x210000, 0x210001),
+ regmap_reg_range(0x21000d, 0x21000d),
+};
+
+static const struct regmap_access_table rtc_wr_regs = {
+ .yes_ranges = rtc_wr_ranges,
+ .n_yes_ranges = ARRAY_SIZE(rtc_wr_ranges),
+};
+
+static const struct regmap_config gamecube_rtc_regmap_config = {
+ .reg_bits = 24,
+ .val_bits = 32,
+ .rd_table = &rtc_rd_regs,
+ .wr_table = &rtc_wr_regs,
+ .max_register = 0x21000d,
+ .name = "gamecube-rtc",
+};
+
+static int gamecube_rtc_probe(struct platform_device *pdev)
+{
+ struct device *dev = &pdev->dev;
+ struct rtc_device *rtc;
+ struct priv *d;
+ int ret;
+
+ d = devm_kzalloc(dev, sizeof(struct priv), GFP_KERNEL);
+ if (!d)
+ return -ENOMEM;
+
+ d->iob = devm_platform_ioremap_resource(pdev, 0);
+ if (IS_ERR(d->iob))
+ return PTR_ERR(d->iob);
+
+ d->regmap = devm_regmap_init(dev, &exi_bus, d,
+ &gamecube_rtc_regmap_config);
+ if (IS_ERR(d->regmap))
+ return PTR_ERR(d->regmap);
+
+ ret = gamecube_rtc_read_offset_from_sram(d);
+ if (ret)
+ return ret;
+ dev_dbg(dev, "SRAM bias: 0x%x", d->rtc_bias);
+
+ dev_set_drvdata(dev, d);
+
+ rtc = devm_rtc_allocate_device(dev);
+ if (IS_ERR(rtc))
+ return PTR_ERR(rtc);
+
+ /* We can represent further than that, but it depends on the stored
+ * bias and we can’t modify it persistently on all supported consoles,
+ * so here we pretend to be limited to 2106.
+ */
+ rtc->range_min = 0;
+ rtc->range_max = U32_MAX;
+ rtc->ops = &gamecube_rtc_ops;
+
+ devm_rtc_register_device(rtc);
+
+ return 0;
+}
+
+static const struct of_device_id gamecube_rtc_of_match[] = {
+ {.compatible = "nintendo,latte-exi" },
+ {.compatible = "nintendo,hollywood-exi" },
+ {.compatible = "nintendo,flipper-exi" },
+ { }
+};
+MODULE_DEVICE_TABLE(of, gamecube_rtc_of_match);
+
+static struct platform_driver gamecube_rtc_driver = {
+ .probe = gamecube_rtc_probe,
+ .driver = {
+ .name = "rtc-gamecube",
+ .of_match_table = gamecube_rtc_of_match,
+ },
+};
+module_platform_driver(gamecube_rtc_driver);
+
+MODULE_AUTHOR("Emmanuel Gil Peyrot <linkmauve@linkmauve.fr>");
+MODULE_DESCRIPTION("Nintendo GameCube, Wii and Wii U RTC driver");
+MODULE_LICENSE("GPL");
#include <linux/acpi.h>
#endif
-unsigned int mc146818_get_time(struct rtc_time *time)
+/*
+ * Execute a function while the UIP (Update-in-progress) bit of the RTC is
+ * unset.
+ *
+ * Warning: callback may be executed more then once.
+ */
+bool mc146818_avoid_UIP(void (*callback)(unsigned char seconds, void *param),
+ void *param)
{
- unsigned char ctrl;
+ int i;
unsigned long flags;
- unsigned char century = 0;
- bool retry;
+ unsigned char seconds;
-#ifdef CONFIG_MACH_DECSTATION
- unsigned int real_year;
-#endif
+ for (i = 0; i < 10; i++) {
+ spin_lock_irqsave(&rtc_lock, flags);
-again:
- spin_lock_irqsave(&rtc_lock, flags);
- /* Ensure that the RTC is accessible. Bit 6 must be 0! */
- if (WARN_ON_ONCE((CMOS_READ(RTC_VALID) & 0x40) != 0)) {
- spin_unlock_irqrestore(&rtc_lock, flags);
- memset(time, 0xff, sizeof(*time));
- return 0;
- }
+ /*
+ * Check whether there is an update in progress during which the
+ * readout is unspecified. The maximum update time is ~2ms. Poll
+ * every msec for completion.
+ *
+ * Store the second value before checking UIP so a long lasting
+ * NMI which happens to hit after the UIP check cannot make
+ * an update cycle invisible.
+ */
+ seconds = CMOS_READ(RTC_SECONDS);
- /*
- * Check whether there is an update in progress during which the
- * readout is unspecified. The maximum update time is ~2ms. Poll
- * every msec for completion.
- *
- * Store the second value before checking UIP so a long lasting NMI
- * which happens to hit after the UIP check cannot make an update
- * cycle invisible.
- */
- time->tm_sec = CMOS_READ(RTC_SECONDS);
+ if (CMOS_READ(RTC_FREQ_SELECT) & RTC_UIP) {
+ spin_unlock_irqrestore(&rtc_lock, flags);
+ mdelay(1);
+ continue;
+ }
- if (CMOS_READ(RTC_FREQ_SELECT) & RTC_UIP) {
- spin_unlock_irqrestore(&rtc_lock, flags);
- mdelay(1);
- goto again;
- }
+ /* Revalidate the above readout */
+ if (seconds != CMOS_READ(RTC_SECONDS)) {
+ spin_unlock_irqrestore(&rtc_lock, flags);
+ continue;
+ }
- /* Revalidate the above readout */
- if (time->tm_sec != CMOS_READ(RTC_SECONDS)) {
+ if (callback)
+ callback(seconds, param);
+
+ /*
+ * Check for the UIP bit again. If it is set now then
+ * the above values may contain garbage.
+ */
+ if (CMOS_READ(RTC_FREQ_SELECT) & RTC_UIP) {
+ spin_unlock_irqrestore(&rtc_lock, flags);
+ mdelay(1);
+ continue;
+ }
+
+ /*
+ * A NMI might have interrupted the above sequence so check
+ * whether the seconds value has changed which indicates that
+ * the NMI took longer than the UIP bit was set. Unlikely, but
+ * possible and there is also virt...
+ */
+ if (seconds != CMOS_READ(RTC_SECONDS)) {
+ spin_unlock_irqrestore(&rtc_lock, flags);
+ continue;
+ }
spin_unlock_irqrestore(&rtc_lock, flags);
- goto again;
+
+ return true;
}
+ return false;
+}
+EXPORT_SYMBOL_GPL(mc146818_avoid_UIP);
+
+/*
+ * If the UIP (Update-in-progress) bit of the RTC is set for more then
+ * 10ms, the RTC is apparently broken or not present.
+ */
+bool mc146818_does_rtc_work(void)
+{
+ return mc146818_avoid_UIP(NULL, NULL);
+}
+EXPORT_SYMBOL_GPL(mc146818_does_rtc_work);
+
+struct mc146818_get_time_callback_param {
+ struct rtc_time *time;
+ unsigned char ctrl;
+#ifdef CONFIG_ACPI
+ unsigned char century;
+#endif
+#ifdef CONFIG_MACH_DECSTATION
+ unsigned int real_year;
+#endif
+};
+
+static void mc146818_get_time_callback(unsigned char seconds, void *param_in)
+{
+ struct mc146818_get_time_callback_param *p = param_in;
/*
* Only the values that we read from the RTC are set. We leave
* RTC has RTC_DAY_OF_WEEK, we ignore it, as it is only updated
* by the RTC when initially set to a non-zero value.
*/
- time->tm_min = CMOS_READ(RTC_MINUTES);
- time->tm_hour = CMOS_READ(RTC_HOURS);
- time->tm_mday = CMOS_READ(RTC_DAY_OF_MONTH);
- time->tm_mon = CMOS_READ(RTC_MONTH);
- time->tm_year = CMOS_READ(RTC_YEAR);
+ p->time->tm_sec = seconds;
+ p->time->tm_min = CMOS_READ(RTC_MINUTES);
+ p->time->tm_hour = CMOS_READ(RTC_HOURS);
+ p->time->tm_mday = CMOS_READ(RTC_DAY_OF_MONTH);
+ p->time->tm_mon = CMOS_READ(RTC_MONTH);
+ p->time->tm_year = CMOS_READ(RTC_YEAR);
#ifdef CONFIG_MACH_DECSTATION
- real_year = CMOS_READ(RTC_DEC_YEAR);
+ p->real_year = CMOS_READ(RTC_DEC_YEAR);
#endif
#ifdef CONFIG_ACPI
if (acpi_gbl_FADT.header.revision >= FADT2_REVISION_ID &&
- acpi_gbl_FADT.century)
- century = CMOS_READ(acpi_gbl_FADT.century);
+ acpi_gbl_FADT.century) {
+ p->century = CMOS_READ(acpi_gbl_FADT.century);
+ } else {
+ p->century = 0;
+ }
#endif
- ctrl = CMOS_READ(RTC_CONTROL);
- /*
- * Check for the UIP bit again. If it is set now then
- * the above values may contain garbage.
- */
- retry = CMOS_READ(RTC_FREQ_SELECT) & RTC_UIP;
- /*
- * A NMI might have interrupted the above sequence so check whether
- * the seconds value has changed which indicates that the NMI took
- * longer than the UIP bit was set. Unlikely, but possible and
- * there is also virt...
- */
- retry |= time->tm_sec != CMOS_READ(RTC_SECONDS);
- spin_unlock_irqrestore(&rtc_lock, flags);
+ p->ctrl = CMOS_READ(RTC_CONTROL);
+}
- if (retry)
- goto again;
+int mc146818_get_time(struct rtc_time *time)
+{
+ struct mc146818_get_time_callback_param p = {
+ .time = time
+ };
+
+ if (!mc146818_avoid_UIP(mc146818_get_time_callback, &p)) {
+ memset(time, 0, sizeof(*time));
+ return -EIO;
+ }
- if (!(ctrl & RTC_DM_BINARY) || RTC_ALWAYS_BCD)
+ if (!(p.ctrl & RTC_DM_BINARY) || RTC_ALWAYS_BCD)
{
time->tm_sec = bcd2bin(time->tm_sec);
time->tm_min = bcd2bin(time->tm_min);
time->tm_mday = bcd2bin(time->tm_mday);
time->tm_mon = bcd2bin(time->tm_mon);
time->tm_year = bcd2bin(time->tm_year);
- century = bcd2bin(century);
+#ifdef CONFIG_ACPI
+ p.century = bcd2bin(p.century);
+#endif
}
#ifdef CONFIG_MACH_DECSTATION
- time->tm_year += real_year - 72;
+ time->tm_year += p.real_year - 72;
#endif
- if (century > 20)
- time->tm_year += (century - 19) * 100;
+#ifdef CONFIG_ACPI
+ if (p.century > 19)
+ time->tm_year += (p.century - 19) * 100;
+#endif
/*
* Account for differences between how the RTC uses the values
time->tm_mon--;
- return RTC_24H;
+ return 0;
}
EXPORT_SYMBOL_GPL(mc146818_get_time);
/*
* Enable timestamp function and store timestamp of first trigger
- * event until TSF1 and TFS2 interrupt flags are cleared.
+ * event until TSF1 and TSF2 interrupt flags are cleared.
*/
ret = regmap_update_bits(pcf2127->regmap, PCF2127_REG_TS_CTRL,
PCF2127_BIT_TS_CTRL_TSOFF |
}
#endif
-static const struct pcf85063_config pcf85063tp_config = {
- .regmap = {
- .reg_bits = 8,
- .val_bits = 8,
- .max_register = 0x0a,
+enum pcf85063_type {
+ PCF85063,
+ PCF85063TP,
+ PCF85063A,
+ RV8263,
+ PCF85063_LAST_ID
+};
+
+static struct pcf85063_config pcf85063_cfg[] = {
+ [PCF85063] = {
+ .regmap = {
+ .reg_bits = 8,
+ .val_bits = 8,
+ .max_register = 0x0a,
+ },
+ },
+ [PCF85063TP] = {
+ .regmap = {
+ .reg_bits = 8,
+ .val_bits = 8,
+ .max_register = 0x0a,
+ },
+ },
+ [PCF85063A] = {
+ .regmap = {
+ .reg_bits = 8,
+ .val_bits = 8,
+ .max_register = 0x11,
+ },
+ .has_alarms = 1,
+ },
+ [RV8263] = {
+ .regmap = {
+ .reg_bits = 8,
+ .val_bits = 8,
+ .max_register = 0x11,
+ },
+ .has_alarms = 1,
+ .force_cap_7000 = 1,
},
};
+static const struct i2c_device_id pcf85063_ids[];
+
static int pcf85063_probe(struct i2c_client *client)
{
struct pcf85063 *pcf85063;
unsigned int tmp;
int err;
- const struct pcf85063_config *config = &pcf85063tp_config;
- const void *data = of_device_get_match_data(&client->dev);
+ const struct pcf85063_config *config;
struct nvmem_config nvmem_cfg = {
.name = "pcf85063_nvram",
.reg_read = pcf85063_nvmem_read,
if (!pcf85063)
return -ENOMEM;
- if (data)
- config = data;
+ if (client->dev.of_node) {
+ config = of_device_get_match_data(&client->dev);
+ if (!config)
+ return -ENODEV;
+ } else {
+ enum pcf85063_type type =
+ i2c_match_id(pcf85063_ids, client)->driver_data;
+ if (type >= PCF85063_LAST_ID)
+ return -ENODEV;
+ config = &pcf85063_cfg[type];
+ }
pcf85063->regmap = devm_regmap_init_i2c(client, &config->regmap);
if (IS_ERR(pcf85063->regmap))
return devm_rtc_register_device(pcf85063->rtc);
}
-#ifdef CONFIG_OF
-static const struct pcf85063_config pcf85063a_config = {
- .regmap = {
- .reg_bits = 8,
- .val_bits = 8,
- .max_register = 0x11,
- },
- .has_alarms = 1,
-};
-
-static const struct pcf85063_config rv8263_config = {
- .regmap = {
- .reg_bits = 8,
- .val_bits = 8,
- .max_register = 0x11,
- },
- .has_alarms = 1,
- .force_cap_7000 = 1,
+static const struct i2c_device_id pcf85063_ids[] = {
+ { "pcf85063", PCF85063 },
+ { "pcf85063tp", PCF85063TP },
+ { "pcf85063a", PCF85063A },
+ { "rv8263", RV8263 },
+ {}
};
+MODULE_DEVICE_TABLE(i2c, pcf85063_ids);
+#ifdef CONFIG_OF
static const struct of_device_id pcf85063_of_match[] = {
- { .compatible = "nxp,pcf85063", .data = &pcf85063tp_config },
- { .compatible = "nxp,pcf85063tp", .data = &pcf85063tp_config },
- { .compatible = "nxp,pcf85063a", .data = &pcf85063a_config },
- { .compatible = "microcrystal,rv8263", .data = &rv8263_config },
+ { .compatible = "nxp,pcf85063", .data = &pcf85063_cfg[PCF85063] },
+ { .compatible = "nxp,pcf85063tp", .data = &pcf85063_cfg[PCF85063TP] },
+ { .compatible = "nxp,pcf85063a", .data = &pcf85063_cfg[PCF85063A] },
+ { .compatible = "microcrystal,rv8263", .data = &pcf85063_cfg[RV8263] },
{}
};
MODULE_DEVICE_TABLE(of, pcf85063_of_match);
.of_match_table = of_match_ptr(pcf85063_of_match),
},
.probe_new = pcf85063_probe,
+ .id_table = pcf85063_ids,
};
module_i2c_driver(pcf85063_driver);
if (sa1100_rtc->irq_alarm < 0)
return -ENXIO;
+ sa1100_rtc->rtc = devm_rtc_allocate_device(&pdev->dev);
+ if (IS_ERR(sa1100_rtc->rtc))
+ return PTR_ERR(sa1100_rtc->rtc);
+
pxa_rtc->base = devm_ioremap(dev, pxa_rtc->ress->start,
resource_size(pxa_rtc->ress));
if (!pxa_rtc->base) {
#define RS5C372_REG_MONTH 5
#define RS5C372_REG_YEAR 6
#define RS5C372_REG_TRIM 7
-# define RS5C372_TRIM_XSL 0x80
+# define RS5C372_TRIM_XSL 0x80 /* only if RS5C372[a|b] */
# define RS5C372_TRIM_MASK 0x7F
+# define R2221TL_TRIM_DEV (1 << 7) /* only if R2221TL */
+# define RS5C372_TRIM_DECR (1 << 6)
#define RS5C_REG_ALARM_A_MIN 8 /* or ALARM_W */
#define RS5C_REG_ALARM_A_HOURS 9
struct rs5c372 *rs5c372 = i2c_get_clientdata(client);
u8 tmp = rs5c372->regs[RS5C372_REG_TRIM];
- if (osc)
- *osc = (tmp & RS5C372_TRIM_XSL) ? 32000 : 32768;
+ if (osc) {
+ if (rs5c372->type == rtc_rs5c372a || rs5c372->type == rtc_rs5c372b)
+ *osc = (tmp & RS5C372_TRIM_XSL) ? 32000 : 32768;
+ else
+ *osc = 32768;
+ }
if (trim) {
dev_dbg(&client->dev, "%s: raw trim=%x\n", __func__, tmp);
#define rs5c372_rtc_proc NULL
#endif
+#ifdef CONFIG_RTC_INTF_DEV
+static int rs5c372_ioctl(struct device *dev, unsigned int cmd, unsigned long arg)
+{
+ struct rs5c372 *rs5c = i2c_get_clientdata(to_i2c_client(dev));
+ unsigned char ctrl2;
+ int addr;
+ unsigned int flags;
+
+ dev_dbg(dev, "%s: cmd=%x\n", __func__, cmd);
+
+ addr = RS5C_ADDR(RS5C_REG_CTRL2);
+ ctrl2 = i2c_smbus_read_byte_data(rs5c->client, addr);
+
+ switch (cmd) {
+ case RTC_VL_READ:
+ flags = 0;
+
+ switch (rs5c->type) {
+ case rtc_r2025sd:
+ case rtc_r2221tl:
+ if ((rs5c->type == rtc_r2025sd && !(ctrl2 & R2x2x_CTRL2_XSTP)) ||
+ (rs5c->type == rtc_r2221tl && (ctrl2 & R2x2x_CTRL2_XSTP))) {
+ flags |= RTC_VL_DATA_INVALID;
+ }
+ if (ctrl2 & R2x2x_CTRL2_VDET)
+ flags |= RTC_VL_BACKUP_LOW;
+ break;
+ default:
+ if (ctrl2 & RS5C_CTRL2_XSTP)
+ flags |= RTC_VL_DATA_INVALID;
+ break;
+ }
+
+ return put_user(flags, (unsigned int __user *)arg);
+ case RTC_VL_CLR:
+ /* clear VDET bit */
+ if (rs5c->type == rtc_r2025sd || rs5c->type == rtc_r2221tl) {
+ ctrl2 &= ~R2x2x_CTRL2_VDET;
+ if (i2c_smbus_write_byte_data(rs5c->client, addr, ctrl2) < 0) {
+ dev_dbg(&rs5c->client->dev, "%s: write error in line %i\n",
+ __func__, __LINE__);
+ return -EIO;
+ }
+ }
+ return 0;
+ default:
+ return -ENOIOCTLCMD;
+ }
+ return 0;
+}
+#else
+#define rs5c372_ioctl NULL
+#endif
+
+static int rs5c372_read_offset(struct device *dev, long *offset)
+{
+ struct rs5c372 *rs5c = i2c_get_clientdata(to_i2c_client(dev));
+ u8 val = rs5c->regs[RS5C372_REG_TRIM];
+ long ppb_per_step = 0;
+ bool decr = val & RS5C372_TRIM_DECR;
+
+ switch (rs5c->type) {
+ case rtc_r2221tl:
+ ppb_per_step = val & R2221TL_TRIM_DEV ? 1017 : 3051;
+ break;
+ case rtc_rs5c372a:
+ case rtc_rs5c372b:
+ ppb_per_step = val & RS5C372_TRIM_XSL ? 3125 : 3051;
+ break;
+ default:
+ ppb_per_step = 3051;
+ break;
+ }
+
+ /* Only bits[0:5] repsents the time counts */
+ val &= 0x3F;
+
+ /* If bits[1:5] are all 0, it means no increment or decrement */
+ if (!(val & 0x3E)) {
+ *offset = 0;
+ } else {
+ if (decr)
+ *offset = -(((~val) & 0x3F) + 1) * ppb_per_step;
+ else
+ *offset = (val - 1) * ppb_per_step;
+ }
+
+ return 0;
+}
+
+static int rs5c372_set_offset(struct device *dev, long offset)
+{
+ struct rs5c372 *rs5c = i2c_get_clientdata(to_i2c_client(dev));
+ int addr = RS5C_ADDR(RS5C372_REG_TRIM);
+ u8 val = 0;
+ u8 tmp = 0;
+ long ppb_per_step = 3051;
+ long steps = LONG_MIN;
+
+ switch (rs5c->type) {
+ case rtc_rs5c372a:
+ case rtc_rs5c372b:
+ tmp = rs5c->regs[RS5C372_REG_TRIM];
+ if (tmp & RS5C372_TRIM_XSL) {
+ ppb_per_step = 3125;
+ val |= RS5C372_TRIM_XSL;
+ }
+ break;
+ case rtc_r2221tl:
+ /*
+ * Check if it is possible to use high resolution mode (DEV=1).
+ * In this mode, the minimum resolution is 2 / (32768 * 20 * 3),
+ * which is about 1017 ppb.
+ */
+ steps = DIV_ROUND_CLOSEST(offset, 1017);
+ if (steps >= -0x3E && steps <= 0x3E) {
+ ppb_per_step = 1017;
+ val |= R2221TL_TRIM_DEV;
+ } else {
+ /*
+ * offset is out of the range of high resolution mode.
+ * Try to use low resolution mode (DEV=0). In this mode,
+ * the minimum resolution is 2 / (32768 * 20), which is
+ * about 3051 ppb.
+ */
+ steps = LONG_MIN;
+ }
+ break;
+ default:
+ break;
+ }
+
+ if (steps == LONG_MIN) {
+ steps = DIV_ROUND_CLOSEST(offset, ppb_per_step);
+ if (steps > 0x3E || steps < -0x3E)
+ return -ERANGE;
+ }
+
+ if (steps > 0) {
+ val |= steps + 1;
+ } else {
+ val |= RS5C372_TRIM_DECR;
+ val |= (~(-steps - 1)) & 0x3F;
+ }
+
+ if (!steps || !(val & 0x3E)) {
+ /*
+ * if offset is too small, set oscillation adjustment register
+ * or time trimming register with its default value whic means
+ * no increment or decrement. But for rs5c372[a|b], the XSL bit
+ * should be kept unchanged.
+ */
+ if (rs5c->type == rtc_rs5c372a || rs5c->type == rtc_rs5c372b)
+ val &= RS5C372_TRIM_XSL;
+ else
+ val = 0;
+ }
+
+ dev_dbg(&rs5c->client->dev, "write 0x%x for offset %ld\n", val, offset);
+
+ if (i2c_smbus_write_byte_data(rs5c->client, addr, val) < 0) {
+ dev_err(&rs5c->client->dev, "failed to write 0x%x to reg %d\n", val, addr);
+ return -EIO;
+ }
+
+ rs5c->regs[RS5C372_REG_TRIM] = val;
+
+ return 0;
+}
+
static const struct rtc_class_ops rs5c372_rtc_ops = {
.proc = rs5c372_rtc_proc,
.read_time = rs5c372_rtc_read_time,
.read_alarm = rs5c_read_alarm,
.set_alarm = rs5c_set_alarm,
.alarm_irq_enable = rs5c_rtc_alarm_irq_enable,
+ .ioctl = rs5c372_ioctl,
+ .read_offset = rs5c372_read_offset,
+ .set_offset = rs5c372_set_offset,
};
#if IS_ENABLED(CONFIG_RTC_INTF_SYSFS)
enum rv8803_type {
rv_8803,
+ rx_8804,
rx_8900
};
static const struct i2c_device_id rv8803_id[] = {
{ "rv8803", rv_8803 },
+ { "rv8804", rx_8804 },
{ "rx8803", rv_8803 },
{ "rx8900", rx_8900 },
{ }
.compatible = "epson,rx8803",
.data = (void *)rv_8803
},
+ {
+ .compatible = "epson,rx8804",
+ .data = (void *)rx_8804
+ },
{
.compatible = "epson,rx8900",
.data = (void *)rx_8900
--- /dev/null
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * The RTC driver for Sunplus SP7021
+ *
+ * Copyright (C) 2019 Sunplus Technology Inc., All rights reseerved.
+ */
+
+#include <linux/bitfield.h>
+#include <linux/clk.h>
+#include <linux/err.h>
+#include <linux/io.h>
+#include <linux/ktime.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/platform_device.h>
+#include <linux/reset.h>
+#include <linux/rtc.h>
+
+#define RTC_REG_NAME "rtc"
+
+#define RTC_CTRL 0x40
+#define TIMER_FREEZE_MASK_BIT BIT(5 + 16)
+#define TIMER_FREEZE BIT(5)
+#define DIS_SYS_RST_RTC_MASK_BIT BIT(4 + 16)
+#define DIS_SYS_RST_RTC BIT(4)
+#define RTC32K_MODE_RESET_MASK_BIT BIT(3 + 16)
+#define RTC32K_MODE_RESET BIT(3)
+#define ALARM_EN_OVERDUE_MASK_BIT BIT(2 + 16)
+#define ALARM_EN_OVERDUE BIT(2)
+#define ALARM_EN_PMC_MASK_BIT BIT(1 + 16)
+#define ALARM_EN_PMC BIT(1)
+#define ALARM_EN_MASK_BIT BIT(0 + 16)
+#define ALARM_EN BIT(0)
+#define RTC_TIMER_OUT 0x44
+#define RTC_DIVIDER 0x48
+#define RTC_TIMER_SET 0x4c
+#define RTC_ALARM_SET 0x50
+#define RTC_USER_DATA 0x54
+#define RTC_RESET_RECORD 0x58
+#define RTC_BATT_CHARGE_CTRL 0x5c
+#define BAT_CHARGE_RSEL_MASK_BIT GENMASK(3 + 16, 2 + 16)
+#define BAT_CHARGE_RSEL_MASK GENMASK(3, 2)
+#define BAT_CHARGE_RSEL_2K_OHM FIELD_PREP(BAT_CHARGE_RSEL_MASK, 0)
+#define BAT_CHARGE_RSEL_250_OHM FIELD_PREP(BAT_CHARGE_RSEL_MASK, 1)
+#define BAT_CHARGE_RSEL_50_OHM FIELD_PREP(BAT_CHARGE_RSEL_MASK, 2)
+#define BAT_CHARGE_RSEL_0_OHM FIELD_PREP(BAT_CHARGE_RSEL_MASK, 3)
+#define BAT_CHARGE_DSEL_MASK_BIT BIT(1 + 16)
+#define BAT_CHARGE_DSEL_MASK GENMASK(1, 1)
+#define BAT_CHARGE_DSEL_ON FIELD_PREP(BAT_CHARGE_DSEL_MASK, 0)
+#define BAT_CHARGE_DSEL_OFF FIELD_PREP(BAT_CHARGE_DSEL_MASK, 1)
+#define BAT_CHARGE_EN_MASK_BIT BIT(0 + 16)
+#define BAT_CHARGE_EN BIT(0)
+#define RTC_TRIM_CTRL 0x60
+
+struct sunplus_rtc {
+ struct rtc_device *rtc;
+ struct resource *res;
+ struct clk *rtcclk;
+ struct reset_control *rstc;
+ void __iomem *reg_base;
+ int irq;
+};
+
+static void sp_get_seconds(struct device *dev, unsigned long *secs)
+{
+ struct sunplus_rtc *sp_rtc = dev_get_drvdata(dev);
+
+ *secs = (unsigned long)readl(sp_rtc->reg_base + RTC_TIMER_OUT);
+}
+
+static void sp_set_seconds(struct device *dev, unsigned long secs)
+{
+ struct sunplus_rtc *sp_rtc = dev_get_drvdata(dev);
+
+ writel((u32)secs, sp_rtc->reg_base + RTC_TIMER_SET);
+}
+
+static int sp_rtc_read_time(struct device *dev, struct rtc_time *tm)
+{
+ unsigned long secs;
+
+ sp_get_seconds(dev, &secs);
+ rtc_time64_to_tm(secs, tm);
+
+ return 0;
+}
+
+static int sp_rtc_set_time(struct device *dev, struct rtc_time *tm)
+{
+ unsigned long secs;
+
+ secs = rtc_tm_to_time64(tm);
+ dev_dbg(dev, "%s, secs = %lu\n", __func__, secs);
+ sp_set_seconds(dev, secs);
+
+ return 0;
+}
+
+static int sp_rtc_set_alarm(struct device *dev, struct rtc_wkalrm *alrm)
+{
+ struct sunplus_rtc *sp_rtc = dev_get_drvdata(dev);
+ unsigned long alarm_time;
+
+ alarm_time = rtc_tm_to_time64(&alrm->time);
+ dev_dbg(dev, "%s, alarm_time: %u\n", __func__, (u32)(alarm_time));
+ writel((u32)alarm_time, sp_rtc->reg_base + RTC_ALARM_SET);
+
+ return 0;
+}
+
+static int sp_rtc_read_alarm(struct device *dev, struct rtc_wkalrm *alrm)
+{
+ struct sunplus_rtc *sp_rtc = dev_get_drvdata(dev);
+ unsigned int alarm_time;
+
+ alarm_time = readl(sp_rtc->reg_base + RTC_ALARM_SET);
+ dev_dbg(dev, "%s, alarm_time: %u\n", __func__, alarm_time);
+
+ if (alarm_time == 0)
+ alrm->enabled = 0;
+ else
+ alrm->enabled = 1;
+
+ rtc_time64_to_tm((unsigned long)(alarm_time), &alrm->time);
+
+ return 0;
+}
+
+static int sp_rtc_alarm_irq_enable(struct device *dev, unsigned int enabled)
+{
+ struct sunplus_rtc *sp_rtc = dev_get_drvdata(dev);
+
+ if (enabled)
+ writel((TIMER_FREEZE_MASK_BIT | DIS_SYS_RST_RTC_MASK_BIT |
+ RTC32K_MODE_RESET_MASK_BIT | ALARM_EN_OVERDUE_MASK_BIT |
+ ALARM_EN_PMC_MASK_BIT | ALARM_EN_MASK_BIT) |
+ (DIS_SYS_RST_RTC | ALARM_EN_OVERDUE | ALARM_EN_PMC | ALARM_EN),
+ sp_rtc->reg_base + RTC_CTRL);
+ else
+ writel((ALARM_EN_OVERDUE_MASK_BIT | ALARM_EN_PMC_MASK_BIT | ALARM_EN_MASK_BIT) |
+ 0x0, sp_rtc->reg_base + RTC_CTRL);
+
+ return 0;
+}
+
+static const struct rtc_class_ops sp_rtc_ops = {
+ .read_time = sp_rtc_read_time,
+ .set_time = sp_rtc_set_time,
+ .set_alarm = sp_rtc_set_alarm,
+ .read_alarm = sp_rtc_read_alarm,
+ .alarm_irq_enable = sp_rtc_alarm_irq_enable,
+};
+
+static irqreturn_t sp_rtc_irq_handler(int irq, void *dev_id)
+{
+ struct platform_device *plat_dev = dev_id;
+ struct sunplus_rtc *sp_rtc = dev_get_drvdata(&plat_dev->dev);
+
+ rtc_update_irq(sp_rtc->rtc, 1, RTC_IRQF | RTC_AF);
+ dev_dbg(&plat_dev->dev, "[RTC] ALARM INT\n");
+
+ return IRQ_HANDLED;
+}
+
+/*
+ * -------------------------------------------------------------------------------------
+ * bat_charge_rsel bat_charge_dsel bat_charge_en Remarks
+ * x x 0 Disable
+ * 0 0 1 0.86mA (2K Ohm with diode)
+ * 1 0 1 1.81mA (250 Ohm with diode)
+ * 2 0 1 2.07mA (50 Ohm with diode)
+ * 3 0 1 16.0mA (0 Ohm with diode)
+ * 0 1 1 1.36mA (2K Ohm without diode)
+ * 1 1 1 3.99mA (250 Ohm without diode)
+ * 2 1 1 4.41mA (50 Ohm without diode)
+ * 3 1 1 16.0mA (0 Ohm without diode)
+ * -------------------------------------------------------------------------------------
+ */
+static void sp_rtc_set_trickle_charger(struct device dev)
+{
+ struct sunplus_rtc *sp_rtc = dev_get_drvdata(&dev);
+ u32 ohms, rsel;
+ u32 chargeable;
+
+ if (of_property_read_u32(dev.of_node, "trickle-resistor-ohms", &ohms) ||
+ of_property_read_u32(dev.of_node, "aux-voltage-chargeable", &chargeable)) {
+ dev_warn(&dev, "battery charger disabled\n");
+ return;
+ }
+
+ switch (ohms) {
+ case 2000:
+ rsel = BAT_CHARGE_RSEL_2K_OHM;
+ break;
+ case 250:
+ rsel = BAT_CHARGE_RSEL_250_OHM;
+ break;
+ case 50:
+ rsel = BAT_CHARGE_RSEL_50_OHM;
+ break;
+ case 0:
+ rsel = BAT_CHARGE_RSEL_0_OHM;
+ break;
+ default:
+ dev_err(&dev, "invalid charger resistor value (%d)\n", ohms);
+ return;
+ }
+
+ writel(BAT_CHARGE_RSEL_MASK_BIT | rsel, sp_rtc->reg_base + RTC_BATT_CHARGE_CTRL);
+
+ switch (chargeable) {
+ case 0:
+ writel(BAT_CHARGE_DSEL_MASK_BIT | BAT_CHARGE_DSEL_OFF,
+ sp_rtc->reg_base + RTC_BATT_CHARGE_CTRL);
+ break;
+ case 1:
+ writel(BAT_CHARGE_DSEL_MASK_BIT | BAT_CHARGE_DSEL_ON,
+ sp_rtc->reg_base + RTC_BATT_CHARGE_CTRL);
+ break;
+ default:
+ dev_err(&dev, "invalid aux-voltage-chargeable value (%d)\n", chargeable);
+ return;
+ }
+
+ writel(BAT_CHARGE_EN_MASK_BIT | BAT_CHARGE_EN, sp_rtc->reg_base + RTC_BATT_CHARGE_CTRL);
+}
+
+static int sp_rtc_probe(struct platform_device *plat_dev)
+{
+ struct sunplus_rtc *sp_rtc;
+ int ret;
+
+ sp_rtc = devm_kzalloc(&plat_dev->dev, sizeof(*sp_rtc), GFP_KERNEL);
+ if (!sp_rtc)
+ return -ENOMEM;
+
+ sp_rtc->res = platform_get_resource_byname(plat_dev, IORESOURCE_MEM, RTC_REG_NAME);
+ sp_rtc->reg_base = devm_ioremap_resource(&plat_dev->dev, sp_rtc->res);
+ if (IS_ERR(sp_rtc->reg_base))
+ return dev_err_probe(&plat_dev->dev, PTR_ERR(sp_rtc->reg_base),
+ "%s devm_ioremap_resource fail\n", RTC_REG_NAME);
+ dev_dbg(&plat_dev->dev, "res = 0x%x, reg_base = 0x%lx\n",
+ sp_rtc->res->start, (unsigned long)sp_rtc->reg_base);
+
+ sp_rtc->irq = platform_get_irq(plat_dev, 0);
+ if (sp_rtc->irq < 0)
+ return dev_err_probe(&plat_dev->dev, sp_rtc->irq, "platform_get_irq failed\n");
+
+ ret = devm_request_irq(&plat_dev->dev, sp_rtc->irq, sp_rtc_irq_handler,
+ IRQF_TRIGGER_RISING, "rtc irq", plat_dev);
+ if (ret)
+ return dev_err_probe(&plat_dev->dev, ret, "devm_request_irq failed:\n");
+
+ sp_rtc->rtcclk = devm_clk_get(&plat_dev->dev, NULL);
+ if (IS_ERR(sp_rtc->rtcclk))
+ return dev_err_probe(&plat_dev->dev, PTR_ERR(sp_rtc->rtcclk),
+ "devm_clk_get fail\n");
+
+ sp_rtc->rstc = devm_reset_control_get_exclusive(&plat_dev->dev, NULL);
+ if (IS_ERR(sp_rtc->rstc))
+ return dev_err_probe(&plat_dev->dev, PTR_ERR(sp_rtc->rstc),
+ "failed to retrieve reset controller\n");
+
+ ret = clk_prepare_enable(sp_rtc->rtcclk);
+ if (ret)
+ goto free_clk;
+
+ ret = reset_control_deassert(sp_rtc->rstc);
+ if (ret)
+ goto free_reset_assert;
+
+ device_init_wakeup(&plat_dev->dev, 1);
+ dev_set_drvdata(&plat_dev->dev, sp_rtc);
+
+ sp_rtc->rtc = devm_rtc_allocate_device(&plat_dev->dev);
+ if (IS_ERR(sp_rtc->rtc)) {
+ ret = PTR_ERR(sp_rtc->rtc);
+ goto free_reset_assert;
+ }
+
+ sp_rtc->rtc->range_max = U32_MAX;
+ sp_rtc->rtc->range_min = 0;
+ sp_rtc->rtc->ops = &sp_rtc_ops;
+
+ ret = devm_rtc_register_device(sp_rtc->rtc);
+ if (ret)
+ goto free_reset_assert;
+
+ /* Setup trickle charger */
+ if (plat_dev->dev.of_node)
+ sp_rtc_set_trickle_charger(plat_dev->dev);
+
+ /* Keep RTC from system reset */
+ writel(DIS_SYS_RST_RTC_MASK_BIT | DIS_SYS_RST_RTC, sp_rtc->reg_base + RTC_CTRL);
+
+ return 0;
+
+free_reset_assert:
+ reset_control_assert(sp_rtc->rstc);
+free_clk:
+ clk_disable_unprepare(sp_rtc->rtcclk);
+
+ return ret;
+}
+
+static int sp_rtc_remove(struct platform_device *plat_dev)
+{
+ struct sunplus_rtc *sp_rtc = dev_get_drvdata(&plat_dev->dev);
+
+ device_init_wakeup(&plat_dev->dev, 0);
+ reset_control_assert(sp_rtc->rstc);
+ clk_disable_unprepare(sp_rtc->rtcclk);
+
+ return 0;
+}
+
+#ifdef CONFIG_PM_SLEEP
+static int sp_rtc_suspend(struct device *dev)
+{
+ struct sunplus_rtc *sp_rtc = dev_get_drvdata(dev);
+
+ if (device_may_wakeup(dev))
+ enable_irq_wake(sp_rtc->irq);
+
+ return 0;
+}
+
+static int sp_rtc_resume(struct device *dev)
+{
+ struct sunplus_rtc *sp_rtc = dev_get_drvdata(dev);
+
+ if (device_may_wakeup(dev))
+ disable_irq_wake(sp_rtc->irq);
+
+ return 0;
+}
+#endif
+
+static const struct of_device_id sp_rtc_of_match[] = {
+ { .compatible = "sunplus,sp7021-rtc" },
+ { /* sentinel */ }
+};
+MODULE_DEVICE_TABLE(of, sp_rtc_of_match);
+
+static SIMPLE_DEV_PM_OPS(sp_rtc_pm_ops, sp_rtc_suspend, sp_rtc_resume);
+
+static struct platform_driver sp_rtc_driver = {
+ .probe = sp_rtc_probe,
+ .remove = sp_rtc_remove,
+ .driver = {
+ .name = "sp7021-rtc",
+ .of_match_table = sp_rtc_of_match,
+ .pm = &sp_rtc_pm_ops,
+ },
+};
+module_platform_driver(sp_rtc_driver);
+
+MODULE_AUTHOR("Vincent Shih <vincent.sunplus@gmail.com>");
+MODULE_DESCRIPTION("Sunplus RTC driver");
+MODULE_LICENSE("GPL v2");
+
" 0=PIC(default), 1=MSI, 2=MSI-X)");
module_param(startup_timeout, int, S_IRUGO|S_IWUSR);
MODULE_PARM_DESC(startup_timeout, "The duration of time in seconds to wait for"
- " adapter to have it's kernel up and\n"
+ " adapter to have its kernel up and\n"
"running. This is typically adjusted for large systems that do not"
" have a BIOS.");
module_param(aif_timeout, int, S_IRUGO|S_IWUSR);
static int
ahd_linux_abort(struct scsi_cmnd *cmd)
{
- int error;
-
- error = ahd_linux_queue_abort_cmd(cmd);
-
- return error;
+ return ahd_linux_queue_abort_cmd(cmd);
}
/*
pci_set_drvdata(pdev, efct);
- if (dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64)) != 0) {
- dev_warn(&pdev->dev, "trying DMA_BIT_MASK(32)\n");
- if (dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32)) != 0) {
- dev_err(&pdev->dev, "setting DMA_BIT_MASK failed\n");
- rc = -1;
- goto dma_mask_out;
- }
+ rc = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
+ if (rc) {
+ dev_err(&pdev->dev, "setting DMA_BIT_MASK failed\n");
+ goto dma_mask_out;
}
num_interrupts = efct_device_interrupts_required(efct);
struct device *dev = hisi_hba->dev;
int s = sizeof(struct host_to_dev_fis);
int rc = TMF_RESP_FUNC_FAILED;
- struct asd_sas_phy *sas_phy;
struct ata_link *link;
u8 fis[20] = {0};
- u32 state;
int i;
- state = hisi_hba->hw->get_phys_state(hisi_hba);
for (i = 0; i < hisi_hba->n_phy; i++) {
- if (!(state & BIT(sas_phy->id)))
- continue;
if (!(sas_port->phy_mask & BIT(i)))
continue;
* the driver starts at 0 each time.
*/
spin_lock_irq(&phba->hbalock);
- xri = find_next_zero_bit(phba->sli4_hba.xri_bmask,
- phba->sli4_hba.max_cfg_param.max_xri, 0);
+ xri = find_first_zero_bit(phba->sli4_hba.xri_bmask,
+ phba->sli4_hba.max_cfg_param.max_xri);
if (xri >= phba->sli4_hba.max_cfg_param.max_xri) {
spin_unlock_irq(&phba->hbalock);
return NO_XRI;
max_rpi = phba->sli4_hba.max_cfg_param.max_rpi;
rpi_limit = phba->sli4_hba.next_rpi;
- rpi = find_next_zero_bit(phba->sli4_hba.rpi_bmask, rpi_limit, 0);
+ rpi = find_first_zero_bit(phba->sli4_hba.rpi_bmask, rpi_limit);
if (rpi >= rpi_limit)
rpi = LPFC_RPI_ALLOC_ERROR;
else {
* have been tested so that we can detect when we should
* change the priority level.
*/
- next_fcf_index = find_next_bit(phba->fcf.fcf_rr_bmask,
- LPFC_SLI4_FCF_TBL_INDX_MAX, 0);
+ next_fcf_index = find_first_bit(phba->fcf.fcf_rr_bmask,
+ LPFC_SLI4_FCF_TBL_INDX_MAX);
}
{
dma_addr_t prod_info_dma_handle;
mega_inquiry3 *inquiry3;
- u8 raw_mbox[sizeof(struct mbox_out)];
- mbox_t *mbox;
+ struct mbox_out mbox;
+ u8 *raw_mbox = (u8 *)&mbox;
int retval;
/* Initialize adapter inquiry mailbox */
- mbox = (mbox_t *)raw_mbox;
-
memset((void *)adapter->mega_buffer, 0, MEGA_BUFFER_SIZE);
- memset(&mbox->m_out, 0, sizeof(raw_mbox));
+ memset(&mbox, 0, sizeof(mbox));
/*
* Try to issue Inquiry3 command
* if not succeeded, then issue MEGA_MBOXCMD_ADAPTERINQ command and
* update enquiry3 structure
*/
- mbox->m_out.xferaddr = (u32)adapter->buf_dma_handle;
+ mbox.xferaddr = (u32)adapter->buf_dma_handle;
inquiry3 = (mega_inquiry3 *)adapter->mega_buffer;
inq = &ext_inq->raid_inq;
- mbox->m_out.xferaddr = (u32)dma_handle;
+ mbox.xferaddr = (u32)dma_handle;
/*issue old 0x04 command to adapter */
- mbox->m_out.cmd = MEGA_MBOXCMD_ADPEXTINQ;
+ mbox.cmd = MEGA_MBOXCMD_ADPEXTINQ;
issue_scb_block(adapter, raw_mbox);
sizeof(mega_product_info),
DMA_FROM_DEVICE);
- mbox->m_out.xferaddr = prod_info_dma_handle;
+ mbox.xferaddr = prod_info_dma_handle;
raw_mbox[0] = FC_NEW_CONFIG; /* i.e. mbox->cmd=0xA1 */
raw_mbox[2] = NC_SUBOP_PRODUCT_INFO; /* i.e. 0x0E */
static int
mega_is_bios_enabled(adapter_t *adapter)
{
- unsigned char raw_mbox[sizeof(struct mbox_out)];
- mbox_t *mbox;
-
- mbox = (mbox_t *)raw_mbox;
+ struct mbox_out mbox;
+ unsigned char *raw_mbox = (u8 *)&mbox;
- memset(&mbox->m_out, 0, sizeof(raw_mbox));
+ memset(&mbox, 0, sizeof(mbox));
memset((void *)adapter->mega_buffer, 0, MEGA_BUFFER_SIZE);
- mbox->m_out.xferaddr = (u32)adapter->buf_dma_handle;
+ mbox.xferaddr = (u32)adapter->buf_dma_handle;
raw_mbox[0] = IS_BIOS_ENABLED;
raw_mbox[2] = GET_BIOS;
static void
mega_enum_raid_scsi(adapter_t *adapter)
{
- unsigned char raw_mbox[sizeof(struct mbox_out)];
- mbox_t *mbox;
+ struct mbox_out mbox;
+ unsigned char *raw_mbox = (u8 *)&mbox;
int i;
- mbox = (mbox_t *)raw_mbox;
-
- memset(&mbox->m_out, 0, sizeof(raw_mbox));
+ memset(&mbox, 0, sizeof(mbox));
/*
* issue command to find out what channels are raid/scsi
memset((void *)adapter->mega_buffer, 0, MEGA_BUFFER_SIZE);
- mbox->m_out.xferaddr = (u32)adapter->buf_dma_handle;
+ mbox.xferaddr = (u32)adapter->buf_dma_handle;
/*
* Non-ROMB firmware fail this command, so all channels
mega_get_boot_drv(adapter_t *adapter)
{
struct private_bios_data *prv_bios_data;
- unsigned char raw_mbox[sizeof(struct mbox_out)];
- mbox_t *mbox;
+ struct mbox_out mbox;
+ unsigned char *raw_mbox = (u8 *)&mbox;
u16 cksum = 0;
u8 *cksum_p;
u8 boot_pdrv;
int i;
- mbox = (mbox_t *)raw_mbox;
-
- memset(&mbox->m_out, 0, sizeof(raw_mbox));
+ memset(&mbox, 0, sizeof(mbox));
raw_mbox[0] = BIOS_PVT_DATA;
raw_mbox[2] = GET_BIOS_PVT_DATA;
memset((void *)adapter->mega_buffer, 0, MEGA_BUFFER_SIZE);
- mbox->m_out.xferaddr = (u32)adapter->buf_dma_handle;
+ mbox.xferaddr = (u32)adapter->buf_dma_handle;
adapter->boot_ldrv_enabled = 0;
adapter->boot_ldrv = 0;
static int
mega_support_random_del(adapter_t *adapter)
{
- unsigned char raw_mbox[sizeof(struct mbox_out)];
- mbox_t *mbox;
+ struct mbox_out mbox;
+ unsigned char *raw_mbox = (u8 *)&mbox;
int rval;
- mbox = (mbox_t *)raw_mbox;
-
- memset(&mbox->m_out, 0, sizeof(raw_mbox));
+ memset(&mbox, 0, sizeof(mbox));
/*
* issue command
static int
mega_support_ext_cdb(adapter_t *adapter)
{
- unsigned char raw_mbox[sizeof(struct mbox_out)];
- mbox_t *mbox;
+ struct mbox_out mbox;
+ unsigned char *raw_mbox = (u8 *)&mbox;
int rval;
- mbox = (mbox_t *)raw_mbox;
-
- memset(&mbox->m_out, 0, sizeof(raw_mbox));
+ memset(&mbox, 0, sizeof(mbox));
/*
* issue command to find out if controller supports extended CDBs.
*/
static void
mega_get_max_sgl(adapter_t *adapter)
{
- unsigned char raw_mbox[sizeof(struct mbox_out)];
- mbox_t *mbox;
+ struct mbox_out mbox;
+ unsigned char *raw_mbox = (u8 *)&mbox;
- mbox = (mbox_t *)raw_mbox;
-
- memset(mbox, 0, sizeof(raw_mbox));
+ memset(&mbox, 0, sizeof(mbox));
memset((void *)adapter->mega_buffer, 0, MEGA_BUFFER_SIZE);
- mbox->m_out.xferaddr = (u32)adapter->buf_dma_handle;
+ mbox.xferaddr = (u32)adapter->buf_dma_handle;
raw_mbox[0] = MAIN_MISC_OPCODE;
raw_mbox[2] = GET_MAX_SG_SUPPORT;
}
else {
adapter->sglen = *((char *)adapter->mega_buffer);
-
+
/*
* Make sure this is not more than the resources we are
* planning to allocate
static int
mega_support_cluster(adapter_t *adapter)
{
- unsigned char raw_mbox[sizeof(struct mbox_out)];
- mbox_t *mbox;
-
- mbox = (mbox_t *)raw_mbox;
+ struct mbox_out mbox;
+ unsigned char *raw_mbox = (u8 *)&mbox;
- memset(mbox, 0, sizeof(raw_mbox));
+ memset(&mbox, 0, sizeof(mbox));
memset((void *)adapter->mega_buffer, 0, MEGA_BUFFER_SIZE);
- mbox->m_out.xferaddr = (u32)adapter->buf_dma_handle;
+ mbox.xferaddr = (u32)adapter->buf_dma_handle;
/*
* Try to get the initiator id. This command will succeed iff the
},
{ MPI3MR_RESET_FROM_SYSFS, "sysfs invocation" },
{ MPI3MR_RESET_FROM_SYSFS_TIMEOUT, "sysfs TM timeout" },
- { MPI3MR_RESET_FROM_FIRMWARE, "firmware asynchronus reset" },
+ { MPI3MR_RESET_FROM_FIRMWARE, "firmware asynchronous reset" },
};
/**
ioc_state = mpi3mr_get_iocstate(mrioc);
if (ioc_state == MRIOC_STATE_READY) {
ioc_info(mrioc,
- "successfully transistioned to %s state\n",
+ "successfully transitioned to %s state\n",
mpi3mr_iocstate_name(ioc_state));
return 0;
}
* mpi3mr_check_rh_fault_ioc - check reset history and fault
* controller
* @mrioc: Adapter instance reference
- * @reason_code, reason code for the fault.
+ * @reason_code: reason code for the fault.
*
* This routine will save snapdump and fault the controller with
* the given reason code if it is not already in the fault or
/**
* mpi3mr_init_ioc - Initialize the controller
* @mrioc: Adapter instance reference
- * @init_type: Flag to indicate is the init_type
*
* This the controller initialization routine, executed either
* after soft reset or from pci probe callback.
if (mrioc->shost->nr_hw_queues > mrioc->num_op_reply_q) {
ioc_err(mrioc,
- "cannot create minimum number of operatioanl queues expected:%d created:%d\n",
+ "cannot create minimum number of operational queues expected:%d created:%d\n",
mrioc->shost->nr_hw_queues, mrioc->num_op_reply_q);
goto out_failed_noretry;
}
/**
* mpi3mr_cleanup_ioc - Cleanup controller
* @mrioc: Adapter instance reference
-
+ *
* controller cleanup handler, Message unit reset or soft reset
* and shutdown notification is issued to the controller.
*
#define MPT3SAS_DRIVER_NAME "mpt3sas"
#define MPT3SAS_AUTHOR "Avago Technologies <MPT-FusionLinux.pdl@avagotech.com>"
#define MPT3SAS_DESCRIPTION "LSI MPT Fusion SAS 3.0 Device Driver"
-#define MPT3SAS_DRIVER_VERSION "39.100.00.00"
-#define MPT3SAS_MAJOR_VERSION 39
+#define MPT3SAS_DRIVER_VERSION "40.100.00.00"
+#define MPT3SAS_MAJOR_VERSION 40
#define MPT3SAS_MINOR_VERSION 100
#define MPT3SAS_BUILD_VERSION 0
#define MPT3SAS_RELEASE_VERSION 00
{
struct Scsi_Host *shost = class_to_shost(cdev);
struct MPT3SAS_ADAPTER *ioc = shost_priv(shost);
+ struct SL_WH_MASTER_TRIGGER_T *master_tg;
unsigned long flags;
ssize_t rc;
+ bool set = 1;
- spin_lock_irqsave(&ioc->diag_trigger_lock, flags);
rc = min(sizeof(struct SL_WH_MASTER_TRIGGER_T), count);
+
+ if (ioc->supports_trigger_pages) {
+ master_tg = kzalloc(sizeof(struct SL_WH_MASTER_TRIGGER_T),
+ GFP_KERNEL);
+ if (!master_tg)
+ return -ENOMEM;
+
+ memcpy(master_tg, buf, rc);
+ if (!master_tg->MasterData)
+ set = 0;
+ if (mpt3sas_config_update_driver_trigger_pg1(ioc, master_tg,
+ set)) {
+ kfree(master_tg);
+ return -EFAULT;
+ }
+ kfree(master_tg);
+ }
+
+ spin_lock_irqsave(&ioc->diag_trigger_lock, flags);
memset(&ioc->diag_trigger_master, 0,
sizeof(struct SL_WH_MASTER_TRIGGER_T));
memcpy(&ioc->diag_trigger_master, buf, rc);
{
struct Scsi_Host *shost = class_to_shost(cdev);
struct MPT3SAS_ADAPTER *ioc = shost_priv(shost);
+ struct SL_WH_EVENT_TRIGGERS_T *event_tg;
unsigned long flags;
ssize_t sz;
+ bool set = 1;
- spin_lock_irqsave(&ioc->diag_trigger_lock, flags);
sz = min(sizeof(struct SL_WH_EVENT_TRIGGERS_T), count);
+ if (ioc->supports_trigger_pages) {
+ event_tg = kzalloc(sizeof(struct SL_WH_EVENT_TRIGGERS_T),
+ GFP_KERNEL);
+ if (!event_tg)
+ return -ENOMEM;
+
+ memcpy(event_tg, buf, sz);
+ if (!event_tg->ValidEntries)
+ set = 0;
+ if (mpt3sas_config_update_driver_trigger_pg2(ioc, event_tg,
+ set)) {
+ kfree(event_tg);
+ return -EFAULT;
+ }
+ kfree(event_tg);
+ }
+
+ spin_lock_irqsave(&ioc->diag_trigger_lock, flags);
+
memset(&ioc->diag_trigger_event, 0,
sizeof(struct SL_WH_EVENT_TRIGGERS_T));
memcpy(&ioc->diag_trigger_event, buf, sz);
{
struct Scsi_Host *shost = class_to_shost(cdev);
struct MPT3SAS_ADAPTER *ioc = shost_priv(shost);
+ struct SL_WH_SCSI_TRIGGERS_T *scsi_tg;
unsigned long flags;
ssize_t sz;
+ bool set = 1;
+
+ sz = min(sizeof(struct SL_WH_SCSI_TRIGGERS_T), count);
+ if (ioc->supports_trigger_pages) {
+ scsi_tg = kzalloc(sizeof(struct SL_WH_SCSI_TRIGGERS_T),
+ GFP_KERNEL);
+ if (!scsi_tg)
+ return -ENOMEM;
+
+ memcpy(scsi_tg, buf, sz);
+ if (!scsi_tg->ValidEntries)
+ set = 0;
+ if (mpt3sas_config_update_driver_trigger_pg3(ioc, scsi_tg,
+ set)) {
+ kfree(scsi_tg);
+ return -EFAULT;
+ }
+ kfree(scsi_tg);
+ }
spin_lock_irqsave(&ioc->diag_trigger_lock, flags);
- sz = min(sizeof(ioc->diag_trigger_scsi), count);
+
memset(&ioc->diag_trigger_scsi, 0, sizeof(ioc->diag_trigger_scsi));
memcpy(&ioc->diag_trigger_scsi, buf, sz);
if (ioc->diag_trigger_scsi.ValidEntries > NUM_VALID_ENTRIES)
{
struct Scsi_Host *shost = class_to_shost(cdev);
struct MPT3SAS_ADAPTER *ioc = shost_priv(shost);
+ struct SL_WH_MPI_TRIGGERS_T *mpi_tg;
unsigned long flags;
ssize_t sz;
+ bool set = 1;
- spin_lock_irqsave(&ioc->diag_trigger_lock, flags);
sz = min(sizeof(struct SL_WH_MPI_TRIGGERS_T), count);
+ if (ioc->supports_trigger_pages) {
+ mpi_tg = kzalloc(sizeof(struct SL_WH_MPI_TRIGGERS_T),
+ GFP_KERNEL);
+ if (!mpi_tg)
+ return -ENOMEM;
+
+ memcpy(mpi_tg, buf, sz);
+ if (!mpi_tg->ValidEntries)
+ set = 0;
+ if (mpt3sas_config_update_driver_trigger_pg4(ioc, mpi_tg,
+ set)) {
+ kfree(mpi_tg);
+ return -EFAULT;
+ }
+ kfree(mpi_tg);
+ }
+
+ spin_lock_irqsave(&ioc->diag_trigger_lock, flags);
memset(&ioc->diag_trigger_mpi, 0,
sizeof(ioc->diag_trigger_mpi));
memcpy(&ioc->diag_trigger_mpi, buf, sz);
data->MmioAddress = (unsigned long)
ioremap(p_dev->resource[2]->start,
resource_size(p_dev->resource[2]));
+ if (!data->MmioAddress)
+ goto next_entry;
+
data->MmioLength = resource_size(p_dev->resource[2]);
}
/* If we got this far, we're cool! */
struct pm8001_device *pm8001_dev;
struct pm8001_tmf_task tmf_task;
int rc = TMF_RESP_FUNC_FAILED, ret;
- u32 phy_id;
+ u32 phy_id, port_id;
struct sas_task_slow slow_task;
if (unlikely(!task || !task->lldd_task || !task->dev))
DECLARE_COMPLETION_ONSTACK(completion_reset);
DECLARE_COMPLETION_ONSTACK(completion);
struct pm8001_phy *phy = pm8001_ha->phy + phy_id;
+ port_id = phy->port->port_id;
/* 1. Set Device state as Recovery */
pm8001_dev->setds_completion = &completion;
PORT_RESET_TMO);
if (phy->port_reset_status == PORT_RESET_TMO) {
pm8001_dev_gone_notify(dev);
+ PM8001_CHIP_DISP->hw_event_ack_req(
+ pm8001_ha, 0,
+ 0x07, /*HW_EVENT_PHY_DOWN ack*/
+ port_id, phy_id, 0, 0);
goto out;
}
}
u32 state);
int (*sas_re_init_req)(struct pm8001_hba_info *pm8001_ha);
int (*fatal_errors)(struct pm8001_hba_info *pm8001_ha);
+ void (*hw_event_ack_req)(struct pm8001_hba_info *pm8001_ha,
+ u32 Qnum, u32 SEA, u32 port_id, u32 phyId, u32 param0,
+ u32 param1);
};
struct pm8001_chip_info {
break;
case HW_EVENT_PORT_RESET_TIMER_TMO:
pm8001_dbg(pm8001_ha, MSG, "HW_EVENT_PORT_RESET_TIMER_TMO\n");
- pm80xx_hw_event_ack_req(pm8001_ha, 0, HW_EVENT_PHY_DOWN,
- port_id, phy_id, 0, 0);
+ if (!pm8001_ha->phy[phy_id].reset_completion) {
+ pm80xx_hw_event_ack_req(pm8001_ha, 0, HW_EVENT_PHY_DOWN,
+ port_id, phy_id, 0, 0);
+ }
sas_phy_disconnected(sas_phy);
phy->phy_attached = 0;
sas_notify_port_event(sas_phy, PORTE_LINK_RESET_ERR,
.fw_flash_update_req = pm8001_chip_fw_flash_update_req,
.set_dev_state_req = pm8001_chip_set_dev_state_req,
.fatal_errors = pm80xx_fatal_errors,
+ .hw_event_ack_req = pm80xx_hw_event_ack_req,
};
*/
term_params = dma_alloc_coherent(&qedf->pdev->dev, QEDF_TERM_BUFF_SIZE,
&term_params_dma, GFP_KERNEL);
+ if (!term_params)
+ return;
QEDF_INFO(&(qedf->dbg_ctx), QEDF_LOG_CONN, "Uploading connection "
"port_id=%06x.\n", fcport->rdata->ids.port_id);
* @sdev: SCSI device to be queried
* @pf: Page format bit (1 == standard, 0 == vendor specific)
* @sp: Save page bit (0 == don't save, 1 == save)
- * @modepage: mode page being requested
* @buffer: request buffer (may not be smaller than eight bytes)
* @len: length of request buffer.
* @timeout: command timeout
* status on error
*
*/
-int
-scsi_mode_select(struct scsi_device *sdev, int pf, int sp, int modepage,
- unsigned char *buffer, int len, int timeout, int retries,
- struct scsi_mode_data *data, struct scsi_sense_hdr *sshdr)
+int scsi_mode_select(struct scsi_device *sdev, int pf, int sp,
+ unsigned char *buffer, int len, int timeout, int retries,
+ struct scsi_mode_data *data, struct scsi_sense_hdr *sshdr)
{
unsigned char cmd[10];
unsigned char *real_buffer;
static ssize_t proc_scsi_host_write(struct file *file, const char __user *buf,
size_t count, loff_t *ppos)
{
- struct Scsi_Host *shost = PDE_DATA(file_inode(file));
+ struct Scsi_Host *shost = pde_data(file_inode(file));
ssize_t ret = -ENOMEM;
char *page;
static int proc_scsi_host_open(struct inode *inode, struct file *file)
{
- return single_open_size(file, proc_scsi_show, PDE_DATA(inode),
+ return single_open_size(file, proc_scsi_show, pde_data(inode),
4 * PAGE_SIZE);
}
*/
data.device_specific = 0;
- if (scsi_mode_select(sdp, 1, sp, 8, buffer_data, len, SD_TIMEOUT,
+ if (scsi_mode_select(sdp, 1, sp, buffer_data, len, SD_TIMEOUT,
sdkp->max_retries, &data, &sshdr)) {
if (scsi_sense_valid(&sshdr))
sd_print_sense_hdr(sdkp, &sshdr);
#define SG_DEFAULT_TIMEOUT mult_frac(SG_DEFAULT_TIMEOUT_USER, HZ, USER_HZ)
-int sg_big_buff = SG_DEF_RESERVED_SIZE;
+static int sg_big_buff = SG_DEF_RESERVED_SIZE;
/* N.B. This variable is readable and writeable via
/proc/scsi/sg/def_reserved_size . Each time sg_open() is called a buffer
of this size (or less if there is not enough memory) will be reserved
MODULE_PARM_DESC(def_reserved_size, "size of buffer reserved for each fd");
MODULE_PARM_DESC(allow_dio, "allow direct I/O (default: 0 (disallow))");
+#ifdef CONFIG_SYSCTL
+#include <linux/sysctl.h>
+
+static struct ctl_table sg_sysctls[] = {
+ {
+ .procname = "sg-big-buff",
+ .data = &sg_big_buff,
+ .maxlen = sizeof(int),
+ .mode = 0444,
+ .proc_handler = proc_dointvec,
+ },
+ {}
+};
+
+static struct ctl_table_header *hdr;
+static void register_sg_sysctls(void)
+{
+ if (!hdr)
+ hdr = register_sysctl("kernel", sg_sysctls);
+}
+
+static void unregister_sg_sysctls(void)
+{
+ if (hdr)
+ unregister_sysctl_table(hdr);
+}
+#else
+#define register_sg_sysctls() do { } while (0)
+#define unregister_sg_sysctls() do { } while (0)
+#endif /* CONFIG_SYSCTL */
+
static int __init
init_sg(void)
{
return 0;
}
class_destroy(sg_sysfs_class);
+ register_sg_sysctls();
err_out:
unregister_chrdev_region(MKDEV(SCSI_GENERIC_MAJOR, 0), SG_MAX_DEVS);
return rc;
static void __exit
exit_sg(void)
{
+ unregister_sg_sysctls();
#ifdef CONFIG_SCSI_PROC_FS
remove_proc_subtree("scsi/sg", NULL);
#endif /* CONFIG_SCSI_PROC_FS */
struct ufs_mtk_host *host = ufshcd_get_variant(hba);
host->reg_va09 = regulator_get(hba->dev, "va09");
- if (!host->reg_va09)
+ if (IS_ERR(host->reg_va09))
dev_info(hba->dev, "failed to get va09");
else
host->caps |= UFS_MTK_CAP_VA09_PWR_CTRL;
peer_pa_tactivate_us = peer_pa_tactivate *
gran_to_us_table[peer_granularity - 1];
- if (pa_tactivate_us > peer_pa_tactivate_us) {
+ if (pa_tactivate_us >= peer_pa_tactivate_us) {
u32 new_peer_pa_tactivate;
new_peer_pa_tactivate = pa_tactivate_us /
depends on RISCV && SOC_CANAAN && OF
default SOC_CANAAN
select PM
- select SYSCON
select MFD_SYSCON
help
Canaan Kendryte K210 SoC system controller driver.
}
spin_lock(&bman_lock);
- cpu = cpumask_next_zero(-1, &portal_cpus);
+ cpu = cpumask_first_zero(&portal_cpus);
if (cpu >= nr_cpu_ids) {
__bman_portals_probed = 1;
/* unassigned portal, skip init */
pcfg->pools = qm_get_pools_sdqcr();
spin_lock(&qman_lock);
- cpu = cpumask_next_zero(-1, &portal_cpus);
+ cpu = cpumask_first_zero(&portal_cpus);
if (cpu >= nr_cpu_ids) {
__qman_portals_probed = 1;
/* unassigned portal, skip init */
goto out;
if (flags & K3_RINGACC_RING_USE_PROXY) {
- proxy_id = find_next_zero_bit(ringacc->proxy_inuse,
- ringacc->num_proxies, 0);
+ proxy_id = find_first_zero_bit(ringacc->proxy_inuse,
+ ringacc->num_proxies);
if (proxy_id == ringacc->num_proxies)
goto error;
}
{"INT3400", 0},
{"INTC1040", 0},
{"INTC1041", 0},
+ {"INTC10A0", 0},
{}
};
{"INT3403", 0},
{"INTC1043", 0},
{"INTC1046", 0},
+ {"INTC10A1", 0},
{"", 0},
};
MODULE_DEVICE_TABLE(acpi, int3403_device_ids);
#define PCI_DEVICE_ID_INTEL_HSB_THERMAL 0x0A03
#define PCI_DEVICE_ID_INTEL_ICL_THERMAL 0x8a03
#define PCI_DEVICE_ID_INTEL_JSL_THERMAL 0x4E03
+#define PCI_DEVICE_ID_INTEL_RPL_THERMAL 0xA71D
#define PCI_DEVICE_ID_INTEL_SKL_THERMAL 0x1903
#define PCI_DEVICE_ID_INTEL_TGL_THERMAL 0x9A03
static const struct pci_device_id proc_thermal_pci_ids[] = {
{ PCI_DEVICE_DATA(INTEL, ADL_THERMAL, PROC_THERMAL_FEATURE_RAPL | PROC_THERMAL_FEATURE_FIVR | PROC_THERMAL_FEATURE_DVFS | PROC_THERMAL_FEATURE_MBOX) },
+ { PCI_DEVICE_DATA(INTEL, RPL_THERMAL, PROC_THERMAL_FEATURE_RAPL | PROC_THERMAL_FEATURE_FIVR | PROC_THERMAL_FEATURE_DVFS | PROC_THERMAL_FEATURE_MBOX) },
{ },
};
more = n - (size - tail);
if (eol == N_TTY_BUF_SIZE && more) {
/* scan wrapped without finding set bit */
- eol = find_next_bit(ldata->read_flags, more, 0);
+ eol = find_first_bit(ldata->read_flags, more);
found = eol != more;
} else
found = eol != size;
static ssize_t rndis_proc_write(struct file *file, const char __user *buffer,
size_t count, loff_t *ppos)
{
- rndis_params *p = PDE_DATA(file_inode(file));
+ rndis_params *p = pde_data(file_inode(file));
u32 speed = 0;
int i, fl_speed = 0;
static int rndis_proc_open(struct inode *inode, struct file *file)
{
- return single_open(file, rndis_proc_show, PDE_DATA(inode));
+ return single_open(file, rndis_proc_show, pde_data(inode));
}
static const struct proc_ops rndis_proc_ops = {
if ((pos & 3) && size > 2) {
u16 val;
+ __le16 lval;
ret = pci_user_read_config_word(pdev, pos, &val);
if (ret)
return ret;
- val = cpu_to_le16(val);
- if (copy_to_user(buf + count - size, &val, 2))
+ lval = cpu_to_le16(val);
+ if (copy_to_user(buf + count - size, &lval, 2))
return -EFAULT;
pos += 2;
while (size > 3) {
u32 val;
+ __le32 lval;
ret = pci_user_read_config_dword(pdev, pos, &val);
if (ret)
return ret;
- val = cpu_to_le32(val);
- if (copy_to_user(buf + count - size, &val, 4))
+ lval = cpu_to_le32(val);
+ if (copy_to_user(buf + count - size, &lval, 4))
return -EFAULT;
pos += 4;
while (size >= 2) {
u16 val;
+ __le16 lval;
ret = pci_user_read_config_word(pdev, pos, &val);
if (ret)
return ret;
- val = cpu_to_le16(val);
- if (copy_to_user(buf + count - size, &val, 2))
+ lval = cpu_to_le16(val);
+ if (copy_to_user(buf + count - size, &lval, 2))
return -EFAULT;
pos += 2;
static void vfio_dma_bitmap_free(struct vfio_dma *dma)
{
- kfree(dma->bitmap);
+ kvfree(dma->bitmap);
dma->bitmap = NULL;
}
spin_lock_bh(&vm->ioreq_clients_lock);
client = vm->default_client;
if (client) {
- vcpu = find_next_bit(client->ioreqs_map,
- ACRN_IO_REQUEST_MAX, 0);
+ vcpu = find_first_bit(client->ioreqs_map, ACRN_IO_REQUEST_MAX);
while (vcpu < ACRN_IO_REQUEST_MAX) {
acrn_ioreq_complete_request(client, vcpu, NULL);
vcpu = find_next_bit(client->ioreqs_map,
static ssize_t
proc_bus_zorro_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos)
{
- struct zorro_dev *z = PDE_DATA(file_inode(file));
+ struct zorro_dev *z = pde_data(file_inode(file));
struct ConfigDev cd;
loff_t pos = *ppos;
# Rewritten to use lists instead of if-statements.
#
+obj-$(CONFIG_SYSCTL) += sysctls.o
+
obj-y := open.o read_write.o file_table.o super.o \
char_dev.o stat.o exec.o pipe.o namei.o fcntl.o \
ioctl.o readdir.o select.o dcache.o inode.o \
{
struct super_block *sb = inode->i_sb;
struct object_info obj;
- int ret;
obj.indaddr = ADFS_I(inode)->indaddr;
obj.name_len = 0;
obj.attr = ADFS_I(inode)->attr;
obj.size = inode->i_size;
- ret = adfs_dir_update(sb, &obj, wbc->sync_mode == WB_SYNC_ALL);
- return ret;
+ return adfs_dir_update(sb, &obj, wbc->sync_mode == WB_SYNC_ALL);
}
static void *afs_proc_cell_volumes_start(struct seq_file *m, loff_t *_pos)
__acquires(cell->proc_lock)
{
- struct afs_cell *cell = PDE_DATA(file_inode(m->file));
+ struct afs_cell *cell = pde_data(file_inode(m->file));
rcu_read_lock();
return seq_hlist_start_head_rcu(&cell->proc_volumes, *_pos);
static void *afs_proc_cell_volumes_next(struct seq_file *m, void *v,
loff_t *_pos)
{
- struct afs_cell *cell = PDE_DATA(file_inode(m->file));
+ struct afs_cell *cell = pde_data(file_inode(m->file));
return seq_hlist_next_rcu(v, &cell->proc_volumes, _pos);
}
{
struct afs_vl_seq_net_private *priv = m->private;
struct afs_vlserver_list *vllist;
- struct afs_cell *cell = PDE_DATA(file_inode(m->file));
+ struct afs_cell *cell = pde_data(file_inode(m->file));
loff_t pos = *_pos;
rcu_read_lock();
/*------ sysctl variables----*/
static DEFINE_SPINLOCK(aio_nr_lock);
-unsigned long aio_nr; /* current system wide number of aio requests */
-unsigned long aio_max_nr = 0x10000; /* system wide maximum number of aio requests */
+static unsigned long aio_nr; /* current system wide number of aio requests */
+static unsigned long aio_max_nr = 0x10000; /* system wide maximum number of aio requests */
/*----end sysctl variables---*/
+#ifdef CONFIG_SYSCTL
+static struct ctl_table aio_sysctls[] = {
+ {
+ .procname = "aio-nr",
+ .data = &aio_nr,
+ .maxlen = sizeof(aio_nr),
+ .mode = 0444,
+ .proc_handler = proc_doulongvec_minmax,
+ },
+ {
+ .procname = "aio-max-nr",
+ .data = &aio_max_nr,
+ .maxlen = sizeof(aio_max_nr),
+ .mode = 0644,
+ .proc_handler = proc_doulongvec_minmax,
+ },
+ {}
+};
+
+static void __init aio_sysctl_init(void)
+{
+ register_sysctl_init("fs", aio_sysctls);
+}
+#else
+#define aio_sysctl_init() do { } while (0)
+#endif
static struct kmem_cache *kiocb_cachep;
static struct kmem_cache *kioctx_cachep;
kiocb_cachep = KMEM_CACHE(aio_kiocb, SLAB_HWCACHE_ALIGN|SLAB_PANIC);
kioctx_cachep = KMEM_CACHE(kioctx,SLAB_HWCACHE_ALIGN|SLAB_PANIC);
+ aio_sysctl_init();
return 0;
}
__initcall(aio_setup);
* independently randomized mmap region (0 load_bias
* without MAP_FIXED nor MAP_FIXED_NOREPLACE).
*/
- if (interpreter) {
+ alignment = maximum_alignment(elf_phdata, elf_ex->e_phnum);
+ if (alignment > ELF_MIN_ALIGN) {
load_bias = ELF_ET_DYN_BASE;
if (current->flags & PF_RANDOMIZE)
load_bias += arch_mmap_rnd();
- alignment = maximum_alignment(elf_phdata, elf_ex->e_phnum);
if (alignment)
load_bias &= ~(alignment - 1);
elf_flags |= MAP_FIXED_NOREPLACE;
SET_UID(psinfo->pr_uid, from_kuid_munged(cred->user_ns, cred->uid));
SET_GID(psinfo->pr_gid, from_kgid_munged(cred->user_ns, cred->gid));
rcu_read_unlock();
- strncpy(psinfo->pr_fname, p->comm, sizeof(psinfo->pr_fname));
+ get_task_comm(psinfo->pr_fname, p);
return 0;
}
int err = register_filesystem(&bm_fs_type);
if (!err)
insert_binfmt(&misc_format);
- return err;
+ if (!register_sysctl_mount_point("fs/binfmt_misc")) {
+ pr_warn("Failed to create fs/binfmt_misc sysctl mount point");
+ return -ENOMEM;
+ }
+ return 0;
}
static void __exit exit_misc_binfmt(void)
select RAID6_PQ
select XOR_BLOCKS
select SRCU
- depends on !PPC_256K_PAGES # powerpc
- depends on !PAGE_SIZE_256KB # hexagon
+ depends on PAGE_SIZE_LESS_THAN_256KB
help
Btrfs is a general purpose copy-on-write filesystem with extents,
#include <linux/writeback.h>
#include <linux/pagevec.h>
#include <linux/prefetch.h>
-#include <linux/cleancache.h>
#include <linux/fsverity.h>
#include "misc.h"
#include "extent_io.h"
goto out;
}
- if (!PageUptodate(page)) {
- if (cleancache_get_page(page) == 0) {
- BUG_ON(blocksize != PAGE_SIZE);
- unlock_extent(tree, start, end);
- unlock_page(page);
- goto out;
- }
- }
-
if (page->index == last_byte >> PAGE_SHIFT) {
size_t zero_offset = offset_in_page(last_byte);
#include <linux/miscdevice.h>
#include <linux/magic.h>
#include <linux/slab.h>
-#include <linux/cleancache.h>
#include <linux/ratelimit.h>
#include <linux/crc32c.h>
#include <linux/btrfs.h>
goto fail_close;
}
- cleancache_init_fs(sb);
sb->s_flags |= SB_ACTIVE;
return 0;
goto error_unsupported;
}
- /* check parameters */
+ /* Check features of the backing filesystem:
+ * - Directories must support looking up and directory creation
+ * - We create tmpfiles to handle invalidation
+ * - We use xattrs to store metadata
+ * - We need to be able to query the amount of space available
+ * - We want to be able to sync the filesystem when stopping the cache
+ * - We use DIO to/from pages, so the blocksize mustn't be too big.
+ */
ret = -EOPNOTSUPP;
if (d_is_negative(root) ||
!d_backing_inode(root)->i_op->lookup ||
!d_backing_inode(root)->i_op->mkdir ||
+ !d_backing_inode(root)->i_op->tmpfile ||
!(d_backing_inode(root)->i_opflags & IOP_XATTR) ||
!root->d_sb->s_op->statfs ||
!root->d_sb->s_op->sync_fs ||
goto error_unsupported;
cache->bsize = stats.f_bsize;
- cache->bshift = 0;
- if (stats.f_bsize < PAGE_SIZE)
- cache->bshift = PAGE_SHIFT - ilog2(stats.f_bsize);
+ cache->bshift = ilog2(stats.f_bsize);
_debug("blksize %u (shift %u)",
cache->bsize, cache->bshift);
(unsigned long long) cache->fcull,
(unsigned long long) cache->fstop);
- stats.f_blocks >>= cache->bshift;
do_div(stats.f_blocks, 100);
cache->bstop = stats.f_blocks * cache->bstop_percent;
cache->bcull = stats.f_blocks * cache->bcull_percent;
return ret;
}
- b_avail = stats.f_bavail >> cache->bshift;
+ b_avail = stats.f_bavail;
b_writing = atomic_long_read(&cache->b_writing);
if (b_avail > b_writing)
b_avail -= b_writing;
return -EBUSY;
}
+ /* Make sure we have copies of the tag string */
+ if (!cache->tag) {
+ /*
+ * The tag string is released by the fops->release()
+ * function, so we don't release it on error here
+ */
+ cache->tag = kstrdup("CacheFiles", GFP_KERNEL);
+ if (!cache->tag)
+ return -ENOMEM;
+ }
+
return cachefiles_add_cache(cache);
}
unsigned bcull_percent; /* when to start culling (% blocks) */
unsigned bstop_percent; /* when to stop allocating (% blocks) */
unsigned bsize; /* cache's block size */
- unsigned bshift; /* min(ilog2(PAGE_SIZE / bsize), 0) */
+ unsigned bshift; /* ilog2(bsize) */
uint64_t frun; /* when to stop culling */
uint64_t fcull; /* when to start culling */
uint64_t fstop; /* when to stop allocating */
ki->term_func = term_func;
ki->term_func_priv = term_func_priv;
ki->was_async = true;
- ki->b_writing = (len + (1 << cache->bshift)) >> cache->bshift;
+ ki->b_writing = (len + (1 << cache->bshift) - 1) >> cache->bshift;
if (ki->term_func)
ki->iocb.ki_complete = cachefiles_write_complete;
trace_cachefiles_mark_active(object, inode);
can_use = true;
} else {
- pr_notice("cachefiles: Inode already in use: %pd\n", dentry);
+ trace_cachefiles_mark_failed(object, inode);
+ pr_notice("cachefiles: Inode already in use: %pd (B=%lx)\n",
+ dentry, inode->i_ino);
}
return can_use;
subdir = lookup_one_len(dirname, dir, strlen(dirname));
else
subdir = ERR_PTR(ret);
+ trace_cachefiles_lookup(NULL, dir, subdir);
if (IS_ERR(subdir)) {
trace_cachefiles_vfs_error(NULL, d_backing_inode(dir),
PTR_ERR(subdir),
cachefiles_trace_mkdir_error);
goto mkdir_error;
}
+ trace_cachefiles_mkdir(dir, subdir);
if (unlikely(d_unhashed(subdir))) {
cachefiles_put_directory(subdir);
};
int ret;
- trace_cachefiles_unlink(object, dentry, why);
+ trace_cachefiles_unlink(object, d_inode(dentry)->i_ino, why);
ret = security_path_unlink(&path, dentry);
if (ret < 0) {
cachefiles_io_error(cache, "Unlink security error");
.new_dir = d_inode(cache->graveyard),
.new_dentry = grave,
};
- trace_cachefiles_rename(object, rep, grave, why);
+ trace_cachefiles_rename(object, d_inode(rep)->i_ino, why);
ret = cachefiles_inject_read_error();
if (ret == 0)
ret = vfs_rename(&rd);
object->d_name_len);
else
dentry = ERR_PTR(ret);
- trace_cachefiles_lookup(object, dentry);
+ trace_cachefiles_lookup(object, fan, dentry);
if (IS_ERR(dentry)) {
if (dentry == ERR_PTR(-ENOENT))
goto new_file;
dout("%s: result %d\n", __func__, err);
}
-static void ceph_init_rreq(struct netfs_read_request *rreq, struct file *file)
-{
-}
-
static void ceph_readahead_cleanup(struct address_space *mapping, void *priv)
{
struct inode *inode = mapping->host;
}
static const struct netfs_read_request_ops ceph_netfs_read_ops = {
- .init_rreq = ceph_init_rreq,
.is_cache_enabled = ceph_is_cache_enabled,
.begin_cache_operation = ceph_begin_cache_operation,
.issue_op = ceph_netfs_issue_op,
if ((newcaps & CEPH_CAP_LINK_SHARED) &&
(extra_info->issued & CEPH_CAP_LINK_EXCL) == 0) {
set_nlink(inode, le32_to_cpu(grant->nlink));
- if (inode->i_nlink == 0 &&
- (newcaps & (CEPH_CAP_LINK_SHARED | CEPH_CAP_LINK_EXCL)))
+ if (inode->i_nlink == 0)
deleted_inode = true;
}
int fmode, bool isdir)
{
struct ceph_inode_info *ci = ceph_inode(inode);
+ struct ceph_mount_options *opt =
+ ceph_inode_to_client(&ci->vfs_inode)->mount_options;
struct ceph_file_info *fi;
dout("%s %p %p 0%o (%s)\n", __func__, inode, file,
if (!fi)
return -ENOMEM;
+ if (opt->flags & CEPH_MOUNT_OPT_NOPAGECACHE)
+ fi->flags |= CEPH_F_SYNC;
+
file->private_data = fi;
}
struct ceph_inode_info *ci = ceph_inode(inode);
bool direct_lock = iocb->ki_flags & IOCB_DIRECT;
ssize_t ret;
- int want, got = 0;
+ int want = 0, got = 0;
int retry_op = 0, read = 0;
again:
else
ceph_start_io_read(inode);
+ if (!(fi->flags & CEPH_F_SYNC) && !direct_lock)
+ want |= CEPH_CAP_FILE_CACHE;
if (fi->fmode & CEPH_FILE_MODE_LAZY)
- want = CEPH_CAP_FILE_CACHE | CEPH_CAP_FILE_LAZYIO;
- else
- want = CEPH_CAP_FILE_CACHE;
+ want |= CEPH_CAP_FILE_LAZYIO;
+
ret = ceph_get_caps(filp, CEPH_CAP_FILE_RD, want, -1, &got);
if (ret < 0) {
- if (iocb->ki_flags & IOCB_DIRECT)
+ if (direct_lock)
ceph_end_io_direct(inode);
else
ceph_end_io_read(inode);
struct ceph_osd_client *osdc = &fsc->client->osdc;
struct ceph_cap_flush *prealloc_cf;
ssize_t count, written = 0;
- int err, want, got;
+ int err, want = 0, got;
bool direct_lock = false;
u32 map_flags;
u64 pool_flags;
dout("aio_write %p %llx.%llx %llu~%zd getting caps. i_size %llu\n",
inode, ceph_vinop(inode), pos, count, i_size_read(inode));
+ if (!(fi->flags & CEPH_F_SYNC) && !direct_lock)
+ want |= CEPH_CAP_FILE_BUFFER;
if (fi->fmode & CEPH_FILE_MODE_LAZY)
- want = CEPH_CAP_FILE_BUFFER | CEPH_CAP_FILE_LAZYIO;
- else
- want = CEPH_CAP_FILE_BUFFER;
+ want |= CEPH_CAP_FILE_LAZYIO;
got = 0;
err = ceph_get_caps(file, CEPH_CAP_FILE_WR, want, pos + count, &got);
if (err < 0)
msg->hdr.version = cpu_to_le16(1);
msg->hdr.compat_version = cpu_to_le16(1);
msg->hdr.front_len = cpu_to_le32(msg->front.iov_len);
- dout("client%llu send metrics to mds%d\n",
- ceph_client_gid(mdsc->fsc->client), s->s_mds);
ceph_con_send(&s->s_con, msg);
return true;
/* if root is the real CephFS root, we don't have quota realms */
if (root && ceph_ino(root) == CEPH_INO_ROOT)
return false;
+ /* MDS stray dirs have no quota realms */
+ if (ceph_vino_is_reserved(ceph_inode(inode)->i_vino))
+ return false;
/* otherwise, we can't know for sure */
return true;
}
if (ci->i_max_bytes) {
total = ci->i_max_bytes >> CEPH_BLOCK_SHIFT;
used = ci->i_rbytes >> CEPH_BLOCK_SHIFT;
+ /* For quota size less than 4MB, use 4KB block size */
+ if (!total) {
+ total = ci->i_max_bytes >> CEPH_4K_BLOCK_SHIFT;
+ used = ci->i_rbytes >> CEPH_4K_BLOCK_SHIFT;
+ buf->f_frsize = 1 << CEPH_4K_BLOCK_SHIFT;
+ }
/* It is possible for a quota to be exceeded.
* Report 'zero' in that case
*/
free = total > used ? total - used : 0;
+ /* For quota size less than 4KB, report the
+ * total=used=4KB,free=0 when quota is full
+ * and total=free=4KB, used=0 otherwise */
+ if (!total) {
+ total = 1;
+ free = ci->i_max_bytes > ci->i_rbytes ? 1 : 0;
+ buf->f_frsize = 1 << CEPH_4K_BLOCK_SHIFT;
+ }
}
spin_unlock(&ci->i_ceph_lock);
if (total) {
#include <linux/ceph/auth.h>
#include <linux/ceph/debugfs.h>
+#include <uapi/linux/magic.h>
+
static DEFINE_SPINLOCK(ceph_fsc_lock);
static LIST_HEAD(ceph_fsc_list);
Opt_mds_namespace,
Opt_recover_session,
Opt_source,
+ Opt_mon_addr,
/* string args above */
Opt_dirstat,
Opt_rbytes,
Opt_quotadf,
Opt_copyfrom,
Opt_wsync,
+ Opt_pagecache,
};
enum ceph_recover_session_mode {
fsparam_u32 ("rsize", Opt_rsize),
fsparam_string ("snapdirname", Opt_snapdirname),
fsparam_string ("source", Opt_source),
+ fsparam_string ("mon_addr", Opt_mon_addr),
fsparam_u32 ("wsize", Opt_wsize),
fsparam_flag_no ("wsync", Opt_wsync),
+ fsparam_flag_no ("pagecache", Opt_pagecache),
{}
};
}
/*
- * Parse the source parameter. Distinguish the server list from the path.
+ * Check if the mds namespace in ceph_mount_options matches
+ * the passed in namespace string. First time match (when
+ * ->mds_namespace is NULL) is treated specially, since
+ * ->mds_namespace needs to be initialized by the caller.
+ */
+static int namespace_equals(struct ceph_mount_options *fsopt,
+ const char *namespace, size_t len)
+{
+ return !(fsopt->mds_namespace &&
+ (strlen(fsopt->mds_namespace) != len ||
+ strncmp(fsopt->mds_namespace, namespace, len)));
+}
+
+static int ceph_parse_old_source(const char *dev_name, const char *dev_name_end,
+ struct fs_context *fc)
+{
+ int r;
+ struct ceph_parse_opts_ctx *pctx = fc->fs_private;
+ struct ceph_mount_options *fsopt = pctx->opts;
+
+ if (*dev_name_end != ':')
+ return invalfc(fc, "separator ':' missing in source");
+
+ r = ceph_parse_mon_ips(dev_name, dev_name_end - dev_name,
+ pctx->copts, fc->log.log, ',');
+ if (r)
+ return r;
+
+ fsopt->new_dev_syntax = false;
+ return 0;
+}
+
+static int ceph_parse_new_source(const char *dev_name, const char *dev_name_end,
+ struct fs_context *fc)
+{
+ size_t len;
+ struct ceph_fsid fsid;
+ struct ceph_parse_opts_ctx *pctx = fc->fs_private;
+ struct ceph_mount_options *fsopt = pctx->opts;
+ char *fsid_start, *fs_name_start;
+
+ if (*dev_name_end != '=') {
+ dout("separator '=' missing in source");
+ return -EINVAL;
+ }
+
+ fsid_start = strchr(dev_name, '@');
+ if (!fsid_start)
+ return invalfc(fc, "missing cluster fsid");
+ ++fsid_start; /* start of cluster fsid */
+
+ fs_name_start = strchr(fsid_start, '.');
+ if (!fs_name_start)
+ return invalfc(fc, "missing file system name");
+
+ if (ceph_parse_fsid(fsid_start, &fsid))
+ return invalfc(fc, "Invalid FSID");
+
+ ++fs_name_start; /* start of file system name */
+ len = dev_name_end - fs_name_start;
+
+ if (!namespace_equals(fsopt, fs_name_start, len))
+ return invalfc(fc, "Mismatching mds_namespace");
+ kfree(fsopt->mds_namespace);
+ fsopt->mds_namespace = kstrndup(fs_name_start, len, GFP_KERNEL);
+ if (!fsopt->mds_namespace)
+ return -ENOMEM;
+ dout("file system (mds namespace) '%s'\n", fsopt->mds_namespace);
+
+ fsopt->new_dev_syntax = true;
+ return 0;
+}
+
+/*
+ * Parse the source parameter for new device format. Distinguish the device
+ * spec from the path. Try parsing new device format and fallback to old
+ * format if needed.
+ *
+ * New device syntax will looks like:
+ * <device_spec>=/<path>
+ * where
+ * <device_spec> is name@fsid.fsname
+ * <path> is optional, but if present must begin with '/'
+ * (monitor addresses are passed via mount option)
*
- * The source will look like:
+ * Old device syntax is:
* <server_spec>[,<server_spec>...]:[<path>]
* where
* <server_spec> is <ip>[:<port>]
dev_name_end = dev_name + strlen(dev_name);
}
- dev_name_end--; /* back up to ':' separator */
- if (dev_name_end < dev_name || *dev_name_end != ':')
- return invalfc(fc, "No path or : separator in source");
+ dev_name_end--; /* back up to separator */
+ if (dev_name_end < dev_name)
+ return invalfc(fc, "Path missing in source");
dout("device name '%.*s'\n", (int)(dev_name_end - dev_name), dev_name);
if (fsopt->server_path)
dout("server path '%s'\n", fsopt->server_path);
- ret = ceph_parse_mon_ips(param->string, dev_name_end - dev_name,
- pctx->copts, fc->log.log);
- if (ret)
- return ret;
+ dout("trying new device syntax");
+ ret = ceph_parse_new_source(dev_name, dev_name_end, fc);
+ if (ret) {
+ if (ret != -EINVAL)
+ return ret;
+ dout("trying old device syntax");
+ ret = ceph_parse_old_source(dev_name, dev_name_end, fc);
+ if (ret)
+ return ret;
+ }
fc->source = param->string;
param->string = NULL;
return 0;
}
+static int ceph_parse_mon_addr(struct fs_parameter *param,
+ struct fs_context *fc)
+{
+ struct ceph_parse_opts_ctx *pctx = fc->fs_private;
+ struct ceph_mount_options *fsopt = pctx->opts;
+
+ kfree(fsopt->mon_addr);
+ fsopt->mon_addr = param->string;
+ param->string = NULL;
+
+ return ceph_parse_mon_ips(fsopt->mon_addr, strlen(fsopt->mon_addr),
+ pctx->copts, fc->log.log, '/');
+}
+
static int ceph_parse_mount_param(struct fs_context *fc,
struct fs_parameter *param)
{
param->string = NULL;
break;
case Opt_mds_namespace:
+ if (!namespace_equals(fsopt, param->string, strlen(param->string)))
+ return invalfc(fc, "Mismatching mds_namespace");
kfree(fsopt->mds_namespace);
fsopt->mds_namespace = param->string;
param->string = NULL;
if (fc->source)
return invalfc(fc, "Multiple sources specified");
return ceph_parse_source(param, fc);
+ case Opt_mon_addr:
+ return ceph_parse_mon_addr(param, fc);
case Opt_wsize:
if (result.uint_32 < PAGE_SIZE ||
result.uint_32 > CEPH_MAX_WRITE_SIZE)
else
fsopt->flags |= CEPH_MOUNT_OPT_ASYNC_DIROPS;
break;
+ case Opt_pagecache:
+ if (result.negated)
+ fsopt->flags |= CEPH_MOUNT_OPT_NOPAGECACHE;
+ else
+ fsopt->flags &= ~CEPH_MOUNT_OPT_NOPAGECACHE;
+ break;
default:
BUG();
}
kfree(args->mds_namespace);
kfree(args->server_path);
kfree(args->fscache_uniq);
+ kfree(args->mon_addr);
kfree(args);
}
if (ret)
return ret;
+ ret = strcmp_null(fsopt1->mon_addr, fsopt2->mon_addr);
+ if (ret)
+ return ret;
+
return ceph_compare_options(new_opt, fsc->client);
}
if ((fsopt->flags & CEPH_MOUNT_OPT_NOCOPYFROM) == 0)
seq_puts(m, ",copyfrom");
- if (fsopt->mds_namespace)
+ /* dump mds_namespace when old device syntax is in use */
+ if (fsopt->mds_namespace && !fsopt->new_dev_syntax)
seq_show_option(m, "mds_namespace", fsopt->mds_namespace);
+ if (fsopt->mon_addr)
+ seq_printf(m, ",mon_addr=%s", fsopt->mon_addr);
+
if (fsopt->flags & CEPH_MOUNT_OPT_CLEANRECOVER)
seq_show_option(m, "recover_session", "clean");
if (!(fsopt->flags & CEPH_MOUNT_OPT_ASYNC_DIROPS))
seq_puts(m, ",wsync");
+ if (fsopt->flags & CEPH_MOUNT_OPT_NOPAGECACHE)
+ seq_puts(m, ",nopagecache");
+
if (fsopt->wsize != CEPH_MAX_WRITE_SIZE)
seq_printf(m, ",wsize=%u", fsopt->wsize);
if (fsopt->rsize != CEPH_MAX_READ_SIZE)
static int ceph_get_tree(struct fs_context *fc)
{
struct ceph_parse_opts_ctx *pctx = fc->fs_private;
+ struct ceph_mount_options *fsopt = pctx->opts;
struct super_block *sb;
struct ceph_fs_client *fsc;
struct dentry *res;
if (!fc->source)
return invalfc(fc, "No source");
+ if (fsopt->new_dev_syntax && !fsopt->mon_addr)
+ return invalfc(fc, "No monitor address");
/* create client (which we may/may not use) */
fsc = create_fs_client(pctx->opts, pctx->copts);
else
ceph_clear_mount_opt(fsc, ASYNC_DIROPS);
+ if (strcmp_null(fsc->mount_options->mon_addr, fsopt->mon_addr)) {
+ kfree(fsc->mount_options->mon_addr);
+ fsc->mount_options->mon_addr = fsopt->mon_addr;
+ fsopt->mon_addr = NULL;
+ pr_notice("ceph: monitor addresses recorded, but not used for reconnection");
+ }
+
sync_filesystem(fc->root->d_sb);
return 0;
}
module_param_cb(disable_send_metrics, ¶m_ops_metrics, &disable_send_metrics, 0644);
MODULE_PARM_DESC(disable_send_metrics, "Enable sending perf metrics to ceph cluster (default: on)");
+/* for both v1 and v2 syntax */
+static bool mount_support = true;
+static const struct kernel_param_ops param_ops_mount_syntax = {
+ .get = param_get_bool,
+};
+module_param_cb(mount_syntax_v1, ¶m_ops_mount_syntax, &mount_support, 0444);
+module_param_cb(mount_syntax_v2, ¶m_ops_mount_syntax, &mount_support, 0444);
+
module_init(init_ceph);
module_exit(exit_ceph);
#include <linux/fscache.h>
#endif
-/* f_type in struct statfs */
-#define CEPH_SUPER_MAGIC 0x00c36400
-
/* large granularity for statfs utilization stats to facilitate
* large volume sizes on 32-bit machines. */
#define CEPH_BLOCK_SHIFT 22 /* 4 MB */
#define CEPH_BLOCK (1 << CEPH_BLOCK_SHIFT)
+#define CEPH_4K_BLOCK_SHIFT 12 /* 4 KB */
#define CEPH_MOUNT_OPT_CLEANRECOVER (1<<1) /* auto reonnect (clean mode) after blocklisted */
#define CEPH_MOUNT_OPT_DIRSTAT (1<<4) /* `cat dirname` for stats */
#define CEPH_MOUNT_OPT_NOQUOTADF (1<<13) /* no root dir quota in statfs */
#define CEPH_MOUNT_OPT_NOCOPYFROM (1<<14) /* don't use RADOS 'copy-from' op */
#define CEPH_MOUNT_OPT_ASYNC_DIROPS (1<<15) /* allow async directory ops */
+#define CEPH_MOUNT_OPT_NOPAGECACHE (1<<16) /* bypass pagecache altogether */
#define CEPH_MOUNT_OPT_DEFAULT \
(CEPH_MOUNT_OPT_DCACHE | \
unsigned int max_readdir; /* max readdir result (entries) */
unsigned int max_readdir_bytes; /* max readdir result (bytes) */
+ bool new_dev_syntax;
+
/*
* everything above this point can be memcmp'd; everything below
* is handled in compare_mount_options()
char *mds_namespace; /* default NULL */
char *server_path; /* default NULL (means "/") */
char *fscache_uniq; /* default NULL */
+ char *mon_addr;
};
struct ceph_fs_client {
*
* These come from src/mds/mdstypes.h in the ceph sources.
*/
-#define CEPH_MAX_MDS 0x100
-#define CEPH_NUM_STRAY 10
+#define CEPH_MAX_MDS 0x100
+#define CEPH_NUM_STRAY 10
#define CEPH_MDS_INO_MDSDIR_OFFSET (1 * CEPH_MAX_MDS)
+#define CEPH_MDS_INO_LOG_OFFSET (2 * CEPH_MAX_MDS)
#define CEPH_INO_SYSTEM_BASE ((6*CEPH_MAX_MDS) + (CEPH_MAX_MDS * CEPH_NUM_STRAY))
static inline bool ceph_vino_is_reserved(const struct ceph_vino vino)
{
- if (vino.ino < CEPH_INO_SYSTEM_BASE &&
- vino.ino >= CEPH_MDS_INO_MDSDIR_OFFSET) {
- WARN_RATELIMIT(1, "Attempt to access reserved inode number 0x%llx", vino.ino);
- return true;
- }
- return false;
+ if (vino.ino >= CEPH_INO_SYSTEM_BASE ||
+ vino.ino < CEPH_MDS_INO_MDSDIR_OFFSET)
+ return false;
+
+ /* Don't warn on mdsdirs */
+ WARN_RATELIMIT(vino.ino >= CEPH_MDS_INO_LOG_OFFSET,
+ "Attempt to access reserved inode number 0x%llx",
+ vino.ino);
+ return true;
}
static inline struct inode *ceph_find_inode(struct super_block *sb,
config CIFS_FSCACHE
bool "Provide CIFS client caching support"
- depends on CIFS=m && FSCACHE_OLD_API || CIFS=y && FSCACHE_OLD_API=y
+ depends on CIFS=m && FSCACHE || CIFS=y && FSCACHE=y
help
Makes CIFS FS-Cache capable. Say Y here if you want your CIFS data
to be cached locally on disk through the general filesystem cache
cifs-$(CONFIG_CIFS_SWN_UPCALL) += netlink.o cifs_swn.o
-cifs-$(CONFIG_CIFS_FSCACHE) += fscache.o cache.o
+cifs-$(CONFIG_CIFS_FSCACHE) += fscache.o
cifs-$(CONFIG_CIFS_SMB_DIRECT) += smbdirect.o
+++ /dev/null
-// SPDX-License-Identifier: LGPL-2.1
-/*
- * CIFS filesystem cache index structure definitions
- *
- * Copyright (c) 2010 Novell, Inc.
- * Authors(s): Suresh Jayaraman (sjayaraman@suse.de>
- *
- */
-#include "fscache.h"
-#include "cifs_debug.h"
-
-/*
- * CIFS filesystem definition for FS-Cache
- */
-struct fscache_netfs cifs_fscache_netfs = {
- .name = "cifs",
- .version = 0,
-};
-
-/*
- * Register CIFS for caching with FS-Cache
- */
-int cifs_fscache_register(void)
-{
- return fscache_register_netfs(&cifs_fscache_netfs);
-}
-
-/*
- * Unregister CIFS for caching
- */
-void cifs_fscache_unregister(void)
-{
- fscache_unregister_netfs(&cifs_fscache_netfs);
-}
-
-/*
- * Server object for FS-Cache
- */
-const struct fscache_cookie_def cifs_fscache_server_index_def = {
- .name = "CIFS.server",
- .type = FSCACHE_COOKIE_TYPE_INDEX,
-};
-
-static enum
-fscache_checkaux cifs_fscache_super_check_aux(void *cookie_netfs_data,
- const void *data,
- uint16_t datalen,
- loff_t object_size)
-{
- struct cifs_fscache_super_auxdata auxdata;
- const struct cifs_tcon *tcon = cookie_netfs_data;
-
- if (datalen != sizeof(auxdata))
- return FSCACHE_CHECKAUX_OBSOLETE;
-
- memset(&auxdata, 0, sizeof(auxdata));
- auxdata.resource_id = tcon->resource_id;
- auxdata.vol_create_time = tcon->vol_create_time;
- auxdata.vol_serial_number = tcon->vol_serial_number;
-
- if (memcmp(data, &auxdata, datalen) != 0)
- return FSCACHE_CHECKAUX_OBSOLETE;
-
- return FSCACHE_CHECKAUX_OKAY;
-}
-
-/*
- * Superblock object for FS-Cache
- */
-const struct fscache_cookie_def cifs_fscache_super_index_def = {
- .name = "CIFS.super",
- .type = FSCACHE_COOKIE_TYPE_INDEX,
- .check_aux = cifs_fscache_super_check_aux,
-};
-
-static enum
-fscache_checkaux cifs_fscache_inode_check_aux(void *cookie_netfs_data,
- const void *data,
- uint16_t datalen,
- loff_t object_size)
-{
- struct cifs_fscache_inode_auxdata auxdata;
- struct cifsInodeInfo *cifsi = cookie_netfs_data;
-
- if (datalen != sizeof(auxdata))
- return FSCACHE_CHECKAUX_OBSOLETE;
-
- memset(&auxdata, 0, sizeof(auxdata));
- auxdata.eof = cifsi->server_eof;
- auxdata.last_write_time_sec = cifsi->vfs_inode.i_mtime.tv_sec;
- auxdata.last_change_time_sec = cifsi->vfs_inode.i_ctime.tv_sec;
- auxdata.last_write_time_nsec = cifsi->vfs_inode.i_mtime.tv_nsec;
- auxdata.last_change_time_nsec = cifsi->vfs_inode.i_ctime.tv_nsec;
-
- if (memcmp(data, &auxdata, datalen) != 0)
- return FSCACHE_CHECKAUX_OBSOLETE;
-
- return FSCACHE_CHECKAUX_OKAY;
-}
-
-const struct fscache_cookie_def cifs_fscache_inode_object_def = {
- .name = "CIFS.uniqueid",
- .type = FSCACHE_COOKIE_TYPE_DATAFILE,
- .check_aux = cifs_fscache_inode_check_aux,
-};
switch (state) {
case CIFS_SWN_RESOURCE_STATE_UNAVAILABLE:
cifs_dbg(FYI, "%s: resource name '%s' become unavailable\n", __func__, name);
- cifs_ses_mark_for_reconnect(swnreg->tcon->ses);
+ cifs_reconnect(swnreg->tcon->ses->server, true);
break;
case CIFS_SWN_RESOURCE_STATE_AVAILABLE:
cifs_dbg(FYI, "%s: resource name '%s' become available\n", __func__, name);
- cifs_ses_mark_for_reconnect(swnreg->tcon->ses);
+ cifs_reconnect(swnreg->tcon->ses->server, true);
break;
case CIFS_SWN_RESOURCE_STATE_UNKNOWN:
cifs_dbg(FYI, "%s: resource name '%s' changed to unknown state\n", __func__, name);
goto unlock;
}
- spin_lock(&cifs_tcp_ses_lock);
- if (tcon->ses->server->tcpStatus != CifsExiting)
- tcon->ses->server->tcpStatus = CifsNeedReconnect;
- spin_unlock(&cifs_tcp_ses_lock);
+ cifs_reconnect(tcon->ses->server, false);
unlock:
mutex_unlock(&tcon->ses->server->srv_mutex);
cifs_evict_inode(struct inode *inode)
{
truncate_inode_pages_final(&inode->i_data);
+ if (inode->i_state & I_PINNING_FSCACHE_WB)
+ cifs_fscache_unuse_inode_cookie(inode, true);
+ cifs_fscache_release_inode_cookie(inode);
clear_inode(inode);
}
}
#endif
+static int cifs_write_inode(struct inode *inode, struct writeback_control *wbc)
+{
+ fscache_unpin_writeback(wbc, cifs_inode_cookie(inode));
+ return 0;
+}
+
static int cifs_drop_inode(struct inode *inode)
{
struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb);
static const struct super_operations cifs_super_ops = {
.statfs = cifs_statfs,
.alloc_inode = cifs_alloc_inode,
+ .write_inode = cifs_write_inode,
.free_inode = cifs_free_inode,
.drop_inode = cifs_drop_inode,
.evict_inode = cifs_evict_inode,
goto out_destroy_cifsoplockd_wq;
}
- rc = cifs_fscache_register();
- if (rc)
- goto out_destroy_deferredclose_wq;
-
rc = cifs_init_inodecache();
if (rc)
- goto out_unreg_fscache;
+ goto out_destroy_deferredclose_wq;
rc = cifs_init_mids();
if (rc)
cifs_destroy_mids();
out_destroy_inodecache:
cifs_destroy_inodecache();
-out_unreg_fscache:
- cifs_fscache_unregister();
out_destroy_deferredclose_wq:
destroy_workqueue(deferredclose_wq);
out_destroy_cifsoplockd_wq:
cifs_destroy_request_bufs();
cifs_destroy_mids();
cifs_destroy_inodecache();
- cifs_fscache_unregister();
destroy_workqueue(deferredclose_wq);
destroy_workqueue(cifsoplockd_wq);
destroy_workqueue(decrypt_wq);
extern const struct export_operations cifs_export_ops;
#endif /* CONFIG_CIFS_NFSD_EXPORT */
-#define CIFS_VERSION "2.34"
+#define SMB3_PRODUCT_BUILD 35
+#define CIFS_VERSION "2.35"
#endif /* _CIFSFS_H */
CifsInSessSetup,
CifsNeedTcon,
CifsInTcon,
+ CifsNeedFilesInvalidate,
CifsInFilesInvalidate
};
unsigned int total_read; /* total amount of data read in this pass */
atomic_t in_send; /* requests trying to send */
atomic_t num_waiters; /* blocked waiting to get in sendrecv */
-#ifdef CONFIG_CIFS_FSCACHE
- struct fscache_cookie *fscache; /* client index cache cookie */
-#endif
#ifdef CONFIG_CIFS_STATS2
atomic_t num_cmds[NUMBER_OF_SMB2_COMMANDS]; /* total requests by cmd */
atomic_t smb2slowcmd[NUMBER_OF_SMB2_COMMANDS]; /* count resps > 1 sec */
*/
struct cifs_ses {
struct list_head smb_ses_list;
+ struct list_head rlist; /* reconnect list */
struct list_head tcon_list;
struct cifs_tcon *tcon_ipc;
struct mutex session_mutex;
__u32 max_bytes_copy;
#ifdef CONFIG_CIFS_FSCACHE
u64 resource_id; /* server resource id */
- struct fscache_cookie *fscache; /* cookie for share */
+ struct fscache_volume *fscache; /* cookie for share */
#endif
struct list_head pending_opens; /* list of incomplete opens */
struct cached_fid crfid; /* Cached root fid */
struct smb_hdr *in_buf ,
struct smb_hdr *out_buf,
int *bytes_returned);
+void
+cifs_mark_tcp_ses_conns_for_reconnect(struct TCP_Server_Info *server,
+ bool mark_smb_session);
extern int cifs_reconnect(struct TCP_Server_Info *server,
bool mark_smb_session);
extern int checkSMB(char *buf, unsigned int len, struct TCP_Server_Info *srvr);
int match_target_ip(struct TCP_Server_Info *server,
const char *share, size_t share_len,
bool *result);
+
+int cifs_dfs_query_info_nonascii_quirk(const unsigned int xid,
+ struct cifs_tcon *tcon,
+ struct cifs_sb_info *cifs_sb,
+ const char *dfs_link_path);
#endif
static inline int cifs_create_options(struct cifs_sb_info *cifs_sb, int options)
* @server needs to be previously set to CifsNeedReconnect.
*
*/
-static void
+void
cifs_mark_tcp_ses_conns_for_reconnect(struct TCP_Server_Info *server,
bool mark_smb_session)
{
server->maxBuf = 0;
server->max_read = 0;
- cifs_dbg(FYI, "Mark tcp session as need reconnect\n");
- trace_smb3_reconnect(server->CurrentMid, server->conn_id, server->hostname);
/*
* before reconnecting the tcp session, mark the smb session (uid) and the tid bad so they
* are not used until reconnected.
*/
- cifs_dbg(FYI, "%s: marking sessions and tcons for reconnect\n", __func__);
+ cifs_dbg(FYI, "%s: marking necessary sessions and tcons for reconnect\n", __func__);
/* If server is a channel, select the primary channel */
pserver = CIFS_SERVER_IS_CHAN(server) ? server->primary_server : server;
+
spin_lock(&cifs_tcp_ses_lock);
list_for_each_entry(ses, &pserver->smb_ses_list, smb_ses_list) {
spin_lock(&ses->chan_lock);
if (!mark_smb_session && cifs_chan_needs_reconnect(ses, server))
goto next_session;
- cifs_chan_set_need_reconnect(ses, server);
+ if (mark_smb_session)
+ CIFS_SET_ALL_CHANS_NEED_RECONNECT(ses);
+ else
+ cifs_chan_set_need_reconnect(ses, server);
/* If all channels need reconnect, then tcon needs reconnect */
if (!mark_smb_session && !CIFS_ALL_CHANS_NEED_RECONNECT(ses))
}
spin_unlock(&cifs_tcp_ses_lock);
- /*
- * before reconnecting the tcp session, mark the smb session (uid)
- * and the tid bad so they are not used until reconnected
- */
- cifs_dbg(FYI, "%s: marking sessions and tcons for reconnect and tearing down socket\n",
- __func__);
/* do not want to be sending data on a socket we are freeing */
+ cifs_dbg(FYI, "%s: tearing down socket\n", __func__);
mutex_lock(&server->srv_mutex);
if (server->ssocket) {
cifs_dbg(FYI, "State: 0x%x Flags: 0x%lx\n", server->ssocket->state,
wake_up(&server->response_q);
return false;
}
+
+ cifs_dbg(FYI, "Mark tcp session as need reconnect\n");
+ trace_smb3_reconnect(server->CurrentMid, server->conn_id,
+ server->hostname);
server->tcpStatus = CifsNeedReconnect;
+
spin_unlock(&cifs_tcp_ses_lock);
return true;
}
spin_unlock(&cifs_tcp_ses_lock);
cifs_swn_reset_server_dstaddr(server);
mutex_unlock(&server->srv_mutex);
+ mod_delayed_work(cifsiod_wq, &server->reconnect, 0);
}
} while (server->tcpStatus == CifsNeedReconnect);
+ spin_lock(&cifs_tcp_ses_lock);
if (server->tcpStatus == CifsNeedNegotiate)
mod_delayed_work(cifsiod_wq, &server->echo, 0);
+ spin_unlock(&cifs_tcp_ses_lock);
wake_up(&server->response_q);
return rc;
spin_unlock(&cifs_tcp_ses_lock);
cifs_swn_reset_server_dstaddr(server);
mutex_unlock(&server->srv_mutex);
+ mod_delayed_work(cifsiod_wq, &server->reconnect, 0);
} while (server->tcpStatus == CifsNeedReconnect);
if (target_hint)
if (server->tcpStatus == CifsNeedReconnect) {
spin_unlock(&cifs_tcp_ses_lock);
- cifs_reconnect(server, false);
return -ECONNABORTED;
}
spin_unlock(&cifs_tcp_ses_lock);
cifs_crypto_secmech_release(server);
- /* fscache server cookies are based on primary channel only */
- if (!CIFS_SERVER_IS_CHAN(server))
- cifs_fscache_release_client_cookie(server);
-
kfree(server->session_key.response);
server->session_key.response = NULL;
server->session_key.len = 0;
list_add(&tcp_ses->tcp_ses_list, &cifs_tcp_ses_list);
spin_unlock(&cifs_tcp_ses_lock);
- /* fscache server cookies are based on primary channel only */
- if (!CIFS_SERVER_IS_CHAN(tcp_ses))
- cifs_fscache_get_client_cookie(tcp_ses);
-#ifdef CONFIG_CIFS_FSCACHE
- else
- tcp_ses->fscache = tcp_ses->primary_server->fscache;
-#endif /* CONFIG_CIFS_FSCACHE */
-
/* queue echo request delayed work */
queue_delayed_work(cifsiod_wq, &tcp_ses->echo, tcp_ses->echo_interval);
spin_lock(&ses->chan_lock);
chan_count = ses->chan_count;
- spin_unlock(&ses->chan_lock);
/* close any extra channels */
if (chan_count > 1) {
ses->chans[i].server = NULL;
}
}
+ spin_unlock(&ses->chan_lock);
sesInfoFree(ses);
cifs_put_tcp_session(server, 0);
mutex_unlock(&ses->session_mutex);
/* each channel uses a different signing key */
+ spin_lock(&ses->chan_lock);
memcpy(ses->chans[0].signkey, ses->smb3signingkey,
sizeof(ses->smb3signingkey));
+ spin_unlock(&ses->chan_lock);
if (rc)
goto get_ses_fail;
* Inside cifs_fscache_get_super_cookie it checks
* that we do not get super cookie twice.
*/
- cifs_fscache_get_super_cookie(tcon);
+ if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_FSCACHE)
+ cifs_fscache_get_super_cookie(tcon);
out:
mnt_ctx->server = server;
rc = server->ops->is_path_accessible(xid, tcon, cifs_sb,
full_path);
+#ifdef CONFIG_CIFS_DFS_UPCALL
+ if (rc == -ENOENT && is_tcon_dfs(tcon))
+ rc = cifs_dfs_query_info_nonascii_quirk(xid, tcon, cifs_sb,
+ full_path);
+#endif
if (rc != 0 && rc != -EREMOTE) {
kfree(full_path);
return rc;
if (rc == 0) {
bool is_unicode;
- spin_lock(&cifs_tcp_ses_lock);
- tcon->tidStatus = CifsGood;
- spin_unlock(&cifs_tcp_ses_lock);
- tcon->need_reconnect = false;
tcon->tid = smb_buffer_response->Tid;
bcc_ptr = pByteArea(smb_buffer_response);
bytes_left = get_bcc(smb_buffer_response);
else
rc = -EHOSTDOWN;
spin_unlock(&cifs_tcp_ses_lock);
+ } else {
+ spin_lock(&cifs_tcp_ses_lock);
+ if (server->tcpStatus == CifsInNegotiate)
+ server->tcpStatus = CifsNeedNegotiate;
+ spin_unlock(&cifs_tcp_ses_lock);
}
return rc;
spin_unlock(&cifs_tcp_ses_lock);
return 0;
}
- ses->status = CifsInSessSetup;
+ server->tcpStatus = CifsInSessSetup;
spin_unlock(&cifs_tcp_ses_lock);
spin_lock(&ses->chan_lock);
if (server->ops->sess_setup)
rc = server->ops->sess_setup(xid, ses, server, nls_info);
- if (rc)
+ if (rc) {
cifs_server_dbg(VFS, "Send error in SessSetup = %d\n", rc);
+ spin_lock(&cifs_tcp_ses_lock);
+ if (server->tcpStatus == CifsInSessSetup)
+ server->tcpStatus = CifsNeedSessSetup;
+ spin_unlock(&cifs_tcp_ses_lock);
+ } else {
+ spin_lock(&cifs_tcp_ses_lock);
+ if (server->tcpStatus == CifsInSessSetup)
+ server->tcpStatus = CifsGood;
+ /* Even if one channel is active, session is in good state */
+ ses->status = CifsGood;
+ spin_unlock(&cifs_tcp_ses_lock);
+
+ spin_lock(&ses->chan_lock);
+ cifs_chan_clear_need_reconnect(ses, server);
+ spin_unlock(&ses->chan_lock);
+ }
return rc;
}
struct dfs_cache_tgt_iterator *tit;
bool target_match;
- /* only send once per connect */
- spin_lock(&cifs_tcp_ses_lock);
- if (tcon->ses->status != CifsGood ||
- (tcon->tidStatus != CifsNew &&
- tcon->tidStatus != CifsNeedTcon)) {
- spin_unlock(&cifs_tcp_ses_lock);
- return 0;
- }
- tcon->tidStatus = CifsInTcon;
- spin_unlock(&cifs_tcp_ses_lock);
-
extract_unc_hostname(server->hostname, &tcp_host, &tcp_host_len);
tit = dfs_cache_get_tgt_iterator(tl);
*/
if (rc && server->current_fullpath != server->origin_fullpath) {
server->current_fullpath = server->origin_fullpath;
- cifs_ses_mark_for_reconnect(tcon->ses);
+ cifs_reconnect(tcon->ses->server, true);
}
dfs_cache_free_tgts(tl);
char *tree;
struct dfs_info3_param ref = {0};
+ /* only send once per connect */
+ spin_lock(&cifs_tcp_ses_lock);
+ if (tcon->ses->status != CifsGood ||
+ (tcon->tidStatus != CifsNew &&
+ tcon->tidStatus != CifsNeedTcon)) {
+ spin_unlock(&cifs_tcp_ses_lock);
+ return 0;
+ }
+ tcon->tidStatus = CifsInTcon;
+ spin_unlock(&cifs_tcp_ses_lock);
+
tree = kzalloc(MAX_TREE_SIZE, GFP_KERNEL);
- if (!tree)
- return -ENOMEM;
+ if (!tree) {
+ rc = -ENOMEM;
+ goto out;
+ }
if (tcon->ipc) {
scnprintf(tree, MAX_TREE_SIZE, "\\\\%s\\IPC$", server->hostname);
kfree(tree);
cifs_put_tcp_super(sb);
+ if (rc) {
+ spin_lock(&cifs_tcp_ses_lock);
+ if (tcon->tidStatus == CifsInTcon)
+ tcon->tidStatus = CifsNeedTcon;
+ spin_unlock(&cifs_tcp_ses_lock);
+ } else {
+ spin_lock(&cifs_tcp_ses_lock);
+ if (tcon->tidStatus == CifsInTcon)
+ tcon->tidStatus = CifsGood;
+ spin_unlock(&cifs_tcp_ses_lock);
+ tcon->need_reconnect = false;
+ }
+
return rc;
}
#else
int cifs_tree_connect(const unsigned int xid, struct cifs_tcon *tcon, const struct nls_table *nlsc)
{
+ int rc;
const struct smb_version_operations *ops = tcon->ses->server->ops;
/* only send once per connect */
tcon->tidStatus = CifsInTcon;
spin_unlock(&cifs_tcp_ses_lock);
- return ops->tree_connect(xid, tcon->ses, tcon->treeName, tcon, nlsc);
+ rc = ops->tree_connect(xid, tcon->ses, tcon->treeName, tcon, nlsc);
+ if (rc) {
+ spin_lock(&cifs_tcp_ses_lock);
+ if (tcon->tidStatus == CifsInTcon)
+ tcon->tidStatus = CifsNeedTcon;
+ spin_unlock(&cifs_tcp_ses_lock);
+ } else {
+ spin_lock(&cifs_tcp_ses_lock);
+ if (tcon->tidStatus == CifsInTcon)
+ tcon->tidStatus = CifsGood;
+ spin_unlock(&cifs_tcp_ses_lock);
+ tcon->need_reconnect = false;
+ }
+
+ return rc;
}
#endif
}
cifs_dbg(FYI, "%s: no cached or matched targets. mark dfs share for reconnect.\n", __func__);
- cifs_ses_mark_for_reconnect(tcon->ses);
+ cifs_reconnect(tcon->ses->server, true);
}
/* Refresh dfs referral of tcon and mark it for reconnect if needed */
#include "cifs_unicode.h"
#include "fs_context.h"
#include "cifs_ioctl.h"
+#include "fscache.h"
static void
renew_parental_timestamps(struct dentry *direntry)
server->ops->close(xid, tcon, &fid);
cifs_del_pending_open(&open);
rc = -ENOMEM;
+ goto out;
}
+ fscache_use_cookie(cifs_inode_cookie(file_inode(file)),
+ file->f_mode & FMODE_WRITE);
+
out:
cifs_put_tlink(tlink);
out_free_xid:
struct cifsLockInfo *li, *tmp;
struct super_block *sb = inode->i_sb;
- cifs_fscache_release_inode_cookie(inode);
-
/*
* Delete any outstanding lock records. We'll lose them when the file
* is closed anyway.
spin_lock(&CIFS_I(inode)->deferred_lock);
cifs_del_deferred_close(cfile);
spin_unlock(&CIFS_I(inode)->deferred_lock);
- goto out;
+ goto use_cache;
} else {
_cifsFileInfo_put(cfile, true, false);
}
goto out;
}
- cifs_fscache_set_inode_cookie(inode, file);
-
if ((oplock & CIFS_CREATE_ACTION) && !posix_open_ok && tcon->unix_ext) {
/*
* Time to set mode which we can not set earlier due to
cfile->pid);
}
+use_cache:
+ fscache_use_cookie(cifs_inode_cookie(file_inode(file)),
+ file->f_mode & FMODE_WRITE);
+ if (file->f_flags & O_DIRECT &&
+ (!((file->f_flags & O_ACCMODE) != O_RDONLY) ||
+ file->f_flags & O_APPEND))
+ cifs_invalidate_cache(file_inode(file),
+ FSCACHE_INVAL_DIO_WRITE);
+
out:
free_dentry_path(page);
free_xid(xid);
struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb);
struct cifs_deferred_close *dclose;
+ cifs_fscache_unuse_inode_cookie(inode, file->f_mode & FMODE_WRITE);
+
if (file->private_data != NULL) {
cfile = file->private_data;
file->private_data = NULL;
dclose) {
if (test_and_clear_bit(CIFS_INO_MODIFIED_ATTR, &cinode->flags)) {
inode->i_ctime = inode->i_mtime = current_time(inode);
- cifs_fscache_update_inode_cookie(inode);
}
spin_lock(&cinode->deferred_lock);
cifs_add_deferred_close(cfile, dclose);
cifs_page_mkwrite(struct vm_fault *vmf)
{
struct page *page = vmf->page;
- struct file *file = vmf->vma->vm_file;
- struct inode *inode = file_inode(file);
- cifs_fscache_wait_on_page_write(inode, page);
+#ifdef CONFIG_CIFS_FSCACHE
+ if (PageFsCache(page) &&
+ wait_on_page_fscache_killable(page) < 0)
+ return VM_FAULT_RETRY;
+#endif
lock_page(page);
return VM_FAULT_LOCKED;
if (rdata->result == 0 ||
(rdata->result == -EAGAIN && got_bytes))
cifs_readpage_to_fscache(rdata->mapping->host, page);
- else
- cifs_fscache_uncache_page(rdata->mapping->host, page);
got_bytes -= min_t(unsigned int, PAGE_SIZE, got_bytes);
kref_put(&rdata->refcount, cifs_readdata_release);
}
- /* Any pages that have been shown to fscache but didn't get added to
- * the pagecache must be uncached before they get returned to the
- * allocator.
- */
- cifs_fscache_readpages_cancel(mapping->host, page_list);
free_xid(xid);
return rc;
}
{
if (PagePrivate(page))
return 0;
-
- return cifs_fscache_release_page(page, gfp);
+ if (PageFsCache(page)) {
+ if (current_is_kswapd() || !(gfp & __GFP_FS))
+ return false;
+ wait_on_page_fscache(page);
+ }
+ fscache_note_page_release(cifs_inode_cookie(page->mapping->host));
+ return true;
}
static void cifs_invalidate_page(struct page *page, unsigned int offset,
unsigned int length)
{
- struct cifsInodeInfo *cifsi = CIFS_I(page->mapping->host);
-
- if (offset == 0 && length == PAGE_SIZE)
- cifs_fscache_invalidate_page(page, &cifsi->vfs_inode);
+ wait_on_page_fscache(page);
}
static int cifs_launder_page(struct page *page)
if (clear_page_dirty_for_io(page))
rc = cifs_writepage_locked(page, &wbc);
- cifs_fscache_invalidate_page(page, page->mapping->host);
+ wait_on_page_fscache(page);
return rc;
}
/* do we need to unpin (or unlock) the file */
}
+/*
+ * Mark a page as having been made dirty and thus needing writeback. We also
+ * need to pin the cache object to write back to.
+ */
+#ifdef CONFIG_CIFS_FSCACHE
+static int cifs_set_page_dirty(struct page *page)
+{
+ return fscache_set_page_dirty(page, cifs_inode_cookie(page->mapping->host));
+}
+#else
+#define cifs_set_page_dirty __set_page_dirty_nobuffers
+#endif
+
const struct address_space_operations cifs_addr_ops = {
.readpage = cifs_readpage,
.readpages = cifs_readpages,
.writepages = cifs_writepages,
.write_begin = cifs_write_begin,
.write_end = cifs_write_end,
- .set_page_dirty = __set_page_dirty_nobuffers,
+ .set_page_dirty = cifs_set_page_dirty,
.releasepage = cifs_release_page,
.direct_IO = cifs_direct_io,
.invalidatepage = cifs_invalidate_page,
.writepages = cifs_writepages,
.write_begin = cifs_write_begin,
.write_end = cifs_write_end,
- .set_page_dirty = __set_page_dirty_nobuffers,
+ .set_page_dirty = cifs_set_page_dirty,
.releasepage = cifs_release_page,
.invalidatepage = cifs_invalidate_page,
.launder_page = cifs_launder_page,
#include "rfc1002pdu.h"
#include "fs_context.h"
+static DEFINE_MUTEX(cifs_mount_mutex);
+
static const match_table_t cifs_smb_version_tokens = {
{ Smb_1, SMB1_VERSION_STRING },
{ Smb_20, SMB20_VERSION_STRING},
static int smb3_get_tree(struct fs_context *fc)
{
int err = smb3_fs_context_validate(fc);
+ int ret;
if (err)
return err;
- return smb3_get_tree_common(fc);
+ mutex_lock(&cifs_mount_mutex);
+ ret = smb3_get_tree_common(fc);
+ mutex_unlock(&cifs_mount_mutex);
+ return ret;
}
static void smb3_fs_context_free(struct fs_context *fc)
#include "cifs_fs_sb.h"
#include "cifsproto.h"
-/*
- * Key layout of CIFS server cache index object
- */
-struct cifs_server_key {
- __u64 conn_id;
-} __packed;
-
-/*
- * Get a cookie for a server object keyed by {IPaddress,port,family} tuple
- */
-void cifs_fscache_get_client_cookie(struct TCP_Server_Info *server)
-{
- struct cifs_server_key key;
-
- /*
- * Check if cookie was already initialized so don't reinitialize it.
- * In the future, as we integrate with newer fscache features,
- * we may want to instead add a check if cookie has changed
- */
- if (server->fscache)
- return;
-
- memset(&key, 0, sizeof(key));
- key.conn_id = server->conn_id;
-
- server->fscache =
- fscache_acquire_cookie(cifs_fscache_netfs.primary_index,
- &cifs_fscache_server_index_def,
- &key, sizeof(key),
- NULL, 0,
- server, 0, true);
- cifs_dbg(FYI, "%s: (0x%p/0x%p)\n",
- __func__, server, server->fscache);
-}
-
-void cifs_fscache_release_client_cookie(struct TCP_Server_Info *server)
+static void cifs_fscache_fill_volume_coherency(
+ struct cifs_tcon *tcon,
+ struct cifs_fscache_volume_coherency_data *cd)
{
- cifs_dbg(FYI, "%s: (0x%p/0x%p)\n",
- __func__, server, server->fscache);
- fscache_relinquish_cookie(server->fscache, NULL, false);
- server->fscache = NULL;
+ memset(cd, 0, sizeof(*cd));
+ cd->resource_id = cpu_to_le64(tcon->resource_id);
+ cd->vol_create_time = tcon->vol_create_time;
+ cd->vol_serial_number = cpu_to_le32(tcon->vol_serial_number);
}
-void cifs_fscache_get_super_cookie(struct cifs_tcon *tcon)
+int cifs_fscache_get_super_cookie(struct cifs_tcon *tcon)
{
+ struct cifs_fscache_volume_coherency_data cd;
struct TCP_Server_Info *server = tcon->ses->server;
+ struct fscache_volume *vcookie;
+ const struct sockaddr *sa = (struct sockaddr *)&server->dstaddr;
+ size_t slen, i;
char *sharename;
- struct cifs_fscache_super_auxdata auxdata;
+ char *key;
+ int ret = -ENOMEM;
- /*
- * Check if cookie was already initialized so don't reinitialize it.
- * In the future, as we integrate with newer fscache features,
- * we may want to instead add a check if cookie has changed
- */
- if (tcon->fscache)
- return;
+ tcon->fscache = NULL;
+ switch (sa->sa_family) {
+ case AF_INET:
+ case AF_INET6:
+ break;
+ default:
+ cifs_dbg(VFS, "Unknown network family '%d'\n", sa->sa_family);
+ return -EINVAL;
+ }
+
+ memset(&key, 0, sizeof(key));
sharename = extract_sharename(tcon->treeName);
if (IS_ERR(sharename)) {
cifs_dbg(FYI, "%s: couldn't extract sharename\n", __func__);
- tcon->fscache = NULL;
- return;
+ return -EINVAL;
}
- memset(&auxdata, 0, sizeof(auxdata));
- auxdata.resource_id = tcon->resource_id;
- auxdata.vol_create_time = tcon->vol_create_time;
- auxdata.vol_serial_number = tcon->vol_serial_number;
+ slen = strlen(sharename);
+ for (i = 0; i < slen; i++)
+ if (sharename[i] == '/')
+ sharename[i] = ';';
+
+ key = kasprintf(GFP_KERNEL, "cifs,%pISpc,%s", sa, sharename);
+ if (!key)
+ goto out;
+
+ cifs_fscache_fill_volume_coherency(tcon, &cd);
+ vcookie = fscache_acquire_volume(key,
+ NULL, /* preferred_cache */
+ &cd, sizeof(cd));
+ cifs_dbg(FYI, "%s: (%s/0x%p)\n", __func__, key, vcookie);
+ if (IS_ERR(vcookie)) {
+ if (vcookie != ERR_PTR(-EBUSY)) {
+ ret = PTR_ERR(vcookie);
+ goto out_2;
+ }
+ pr_err("Cache volume key already in use (%s)\n", key);
+ vcookie = NULL;
+ }
- tcon->fscache =
- fscache_acquire_cookie(server->fscache,
- &cifs_fscache_super_index_def,
- sharename, strlen(sharename),
- &auxdata, sizeof(auxdata),
- tcon, 0, true);
+ tcon->fscache = vcookie;
+ ret = 0;
+out_2:
+ kfree(key);
+out:
kfree(sharename);
- cifs_dbg(FYI, "%s: (0x%p/0x%p)\n",
- __func__, server->fscache, tcon->fscache);
+ return ret;
}
void cifs_fscache_release_super_cookie(struct cifs_tcon *tcon)
{
- struct cifs_fscache_super_auxdata auxdata;
-
- memset(&auxdata, 0, sizeof(auxdata));
- auxdata.resource_id = tcon->resource_id;
- auxdata.vol_create_time = tcon->vol_create_time;
- auxdata.vol_serial_number = tcon->vol_serial_number;
+ struct cifs_fscache_volume_coherency_data cd;
cifs_dbg(FYI, "%s: (0x%p)\n", __func__, tcon->fscache);
- fscache_relinquish_cookie(tcon->fscache, &auxdata, false);
- tcon->fscache = NULL;
-}
-
-static void cifs_fscache_acquire_inode_cookie(struct cifsInodeInfo *cifsi,
- struct cifs_tcon *tcon)
-{
- struct cifs_fscache_inode_auxdata auxdata;
- memset(&auxdata, 0, sizeof(auxdata));
- auxdata.eof = cifsi->server_eof;
- auxdata.last_write_time_sec = cifsi->vfs_inode.i_mtime.tv_sec;
- auxdata.last_change_time_sec = cifsi->vfs_inode.i_ctime.tv_sec;
- auxdata.last_write_time_nsec = cifsi->vfs_inode.i_mtime.tv_nsec;
- auxdata.last_change_time_nsec = cifsi->vfs_inode.i_ctime.tv_nsec;
-
- cifsi->fscache =
- fscache_acquire_cookie(tcon->fscache,
- &cifs_fscache_inode_object_def,
- &cifsi->uniqueid, sizeof(cifsi->uniqueid),
- &auxdata, sizeof(auxdata),
- cifsi, cifsi->vfs_inode.i_size, true);
+ cifs_fscache_fill_volume_coherency(tcon, &cd);
+ fscache_relinquish_volume(tcon->fscache, &cd, false);
+ tcon->fscache = NULL;
}
-static void cifs_fscache_enable_inode_cookie(struct inode *inode)
+void cifs_fscache_get_inode_cookie(struct inode *inode)
{
+ struct cifs_fscache_inode_coherency_data cd;
struct cifsInodeInfo *cifsi = CIFS_I(inode);
struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb);
struct cifs_tcon *tcon = cifs_sb_master_tcon(cifs_sb);
- if (cifsi->fscache)
- return;
-
- if (!(cifs_sb->mnt_cifs_flags & CIFS_MOUNT_FSCACHE))
- return;
-
- cifs_fscache_acquire_inode_cookie(cifsi, tcon);
+ cifs_fscache_fill_coherency(&cifsi->vfs_inode, &cd);
- cifs_dbg(FYI, "%s: got FH cookie (0x%p/0x%p)\n",
- __func__, tcon->fscache, cifsi->fscache);
+ cifsi->fscache =
+ fscache_acquire_cookie(tcon->fscache, 0,
+ &cifsi->uniqueid, sizeof(cifsi->uniqueid),
+ &cd, sizeof(cd),
+ i_size_read(&cifsi->vfs_inode));
}
-void cifs_fscache_release_inode_cookie(struct inode *inode)
+void cifs_fscache_unuse_inode_cookie(struct inode *inode, bool update)
{
- struct cifs_fscache_inode_auxdata auxdata;
- struct cifsInodeInfo *cifsi = CIFS_I(inode);
-
- if (cifsi->fscache) {
- memset(&auxdata, 0, sizeof(auxdata));
- auxdata.eof = cifsi->server_eof;
- auxdata.last_write_time_sec = cifsi->vfs_inode.i_mtime.tv_sec;
- auxdata.last_change_time_sec = cifsi->vfs_inode.i_ctime.tv_sec;
- auxdata.last_write_time_nsec = cifsi->vfs_inode.i_mtime.tv_nsec;
- auxdata.last_change_time_nsec = cifsi->vfs_inode.i_ctime.tv_nsec;
+ if (update) {
+ struct cifs_fscache_inode_coherency_data cd;
+ loff_t i_size = i_size_read(inode);
- cifs_dbg(FYI, "%s: (0x%p)\n", __func__, cifsi->fscache);
- /* fscache_relinquish_cookie does not seem to update auxdata */
- fscache_update_cookie(cifsi->fscache, &auxdata);
- fscache_relinquish_cookie(cifsi->fscache, &auxdata, false);
- cifsi->fscache = NULL;
+ cifs_fscache_fill_coherency(inode, &cd);
+ fscache_unuse_cookie(cifs_inode_cookie(inode), &cd, &i_size);
+ } else {
+ fscache_unuse_cookie(cifs_inode_cookie(inode), NULL, NULL);
}
}
-void cifs_fscache_update_inode_cookie(struct inode *inode)
+void cifs_fscache_release_inode_cookie(struct inode *inode)
{
- struct cifs_fscache_inode_auxdata auxdata;
struct cifsInodeInfo *cifsi = CIFS_I(inode);
if (cifsi->fscache) {
- memset(&auxdata, 0, sizeof(auxdata));
- auxdata.eof = cifsi->server_eof;
- auxdata.last_write_time_sec = cifsi->vfs_inode.i_mtime.tv_sec;
- auxdata.last_change_time_sec = cifsi->vfs_inode.i_ctime.tv_sec;
- auxdata.last_write_time_nsec = cifsi->vfs_inode.i_mtime.tv_nsec;
- auxdata.last_change_time_nsec = cifsi->vfs_inode.i_ctime.tv_nsec;
-
cifs_dbg(FYI, "%s: (0x%p)\n", __func__, cifsi->fscache);
- fscache_update_cookie(cifsi->fscache, &auxdata);
- }
-}
-
-void cifs_fscache_set_inode_cookie(struct inode *inode, struct file *filp)
-{
- cifs_fscache_enable_inode_cookie(inode);
-}
-
-void cifs_fscache_reset_inode_cookie(struct inode *inode)
-{
- struct cifsInodeInfo *cifsi = CIFS_I(inode);
- struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb);
- struct cifs_tcon *tcon = cifs_sb_master_tcon(cifs_sb);
- struct fscache_cookie *old = cifsi->fscache;
-
- if (cifsi->fscache) {
- /* retire the current fscache cache and get a new one */
- fscache_relinquish_cookie(cifsi->fscache, NULL, true);
-
- cifs_fscache_acquire_inode_cookie(cifsi, tcon);
- cifs_dbg(FYI, "%s: new cookie 0x%p oldcookie 0x%p\n",
- __func__, cifsi->fscache, old);
+ fscache_relinquish_cookie(cifsi->fscache, false);
+ cifsi->fscache = NULL;
}
}
-int cifs_fscache_release_page(struct page *page, gfp_t gfp)
-{
- if (PageFsCache(page)) {
- struct inode *inode = page->mapping->host;
- struct cifsInodeInfo *cifsi = CIFS_I(inode);
-
- cifs_dbg(FYI, "%s: (0x%p/0x%p)\n",
- __func__, page, cifsi->fscache);
- if (!fscache_maybe_release_page(cifsi->fscache, page, gfp))
- return 0;
- }
-
- return 1;
-}
-
-static void cifs_readpage_from_fscache_complete(struct page *page, void *ctx,
- int error)
-{
- cifs_dbg(FYI, "%s: (0x%p/%d)\n", __func__, page, error);
- if (!error)
- SetPageUptodate(page);
- unlock_page(page);
-}
-
/*
* Retrieve a page from FS-Cache
*/
int __cifs_readpage_from_fscache(struct inode *inode, struct page *page)
{
- int ret;
-
cifs_dbg(FYI, "%s: (fsc:%p, p:%p, i:0x%p\n",
__func__, CIFS_I(inode)->fscache, page, inode);
- ret = fscache_read_or_alloc_page(CIFS_I(inode)->fscache, page,
- cifs_readpage_from_fscache_complete,
- NULL,
- GFP_KERNEL);
- switch (ret) {
-
- case 0: /* page found in fscache, read submitted */
- cifs_dbg(FYI, "%s: submitted\n", __func__);
- return ret;
- case -ENOBUFS: /* page won't be cached */
- case -ENODATA: /* page not in cache */
- cifs_dbg(FYI, "%s: %d\n", __func__, ret);
- return 1;
-
- default:
- cifs_dbg(VFS, "unknown error ret = %d\n", ret);
- }
- return ret;
+ return -ENOBUFS; // Needs conversion to using netfslib
}
/*
struct list_head *pages,
unsigned *nr_pages)
{
- int ret;
-
cifs_dbg(FYI, "%s: (0x%p/%u/0x%p)\n",
__func__, CIFS_I(inode)->fscache, *nr_pages, inode);
- ret = fscache_read_or_alloc_pages(CIFS_I(inode)->fscache, mapping,
- pages, nr_pages,
- cifs_readpage_from_fscache_complete,
- NULL,
- mapping_gfp_mask(mapping));
- switch (ret) {
- case 0: /* read submitted to the cache for all pages */
- cifs_dbg(FYI, "%s: submitted\n", __func__);
- return ret;
-
- case -ENOBUFS: /* some pages are not cached and can't be */
- case -ENODATA: /* some pages are not cached */
- cifs_dbg(FYI, "%s: no page\n", __func__);
- return 1;
-
- default:
- cifs_dbg(FYI, "unknown error ret = %d\n", ret);
- }
-
- return ret;
+ return -ENOBUFS; // Needs conversion to using netfslib
}
void __cifs_readpage_to_fscache(struct inode *inode, struct page *page)
{
struct cifsInodeInfo *cifsi = CIFS_I(inode);
- int ret;
WARN_ON(!cifsi->fscache);
cifs_dbg(FYI, "%s: (fsc: %p, p: %p, i: %p)\n",
__func__, cifsi->fscache, page, inode);
- ret = fscache_write_page(cifsi->fscache, page,
- cifsi->vfs_inode.i_size, GFP_KERNEL);
- if (ret != 0)
- fscache_uncache_page(cifsi->fscache, page);
-}
-
-void __cifs_fscache_readpages_cancel(struct inode *inode, struct list_head *pages)
-{
- cifs_dbg(FYI, "%s: (fsc: %p, i: %p)\n",
- __func__, CIFS_I(inode)->fscache, inode);
- fscache_readpages_cancel(CIFS_I(inode)->fscache, pages);
-}
-
-void __cifs_fscache_invalidate_page(struct page *page, struct inode *inode)
-{
- struct cifsInodeInfo *cifsi = CIFS_I(inode);
- struct fscache_cookie *cookie = cifsi->fscache;
-
- cifs_dbg(FYI, "%s: (0x%p/0x%p)\n", __func__, page, cookie);
- fscache_wait_on_page_write(cookie, page);
- fscache_uncache_page(cookie, page);
-}
-
-void __cifs_fscache_wait_on_page_write(struct inode *inode, struct page *page)
-{
- struct cifsInodeInfo *cifsi = CIFS_I(inode);
- struct fscache_cookie *cookie = cifsi->fscache;
-
- cifs_dbg(FYI, "%s: (0x%p/0x%p)\n", __func__, page, cookie);
- fscache_wait_on_page_write(cookie, page);
-}
-
-void __cifs_fscache_uncache_page(struct inode *inode, struct page *page)
-{
- struct cifsInodeInfo *cifsi = CIFS_I(inode);
- struct fscache_cookie *cookie = cifsi->fscache;
- cifs_dbg(FYI, "%s: (0x%p/0x%p)\n", __func__, page, cookie);
- fscache_uncache_page(cookie, page);
+ // Needs conversion to using netfslib
}
#include "cifsglob.h"
-#ifdef CONFIG_CIFS_FSCACHE
-
/*
- * Auxiliary data attached to CIFS superblock within the cache
+ * Coherency data attached to CIFS volume within the cache
*/
-struct cifs_fscache_super_auxdata {
- u64 resource_id; /* unique server resource id */
+struct cifs_fscache_volume_coherency_data {
+ __le64 resource_id; /* unique server resource id */
__le64 vol_create_time;
- u32 vol_serial_number;
+ __le32 vol_serial_number;
} __packed;
/*
- * Auxiliary data attached to CIFS inode within the cache
+ * Coherency data attached to CIFS inode within the cache.
*/
-struct cifs_fscache_inode_auxdata {
- u64 last_write_time_sec;
- u64 last_change_time_sec;
- u32 last_write_time_nsec;
- u32 last_change_time_nsec;
- u64 eof;
+struct cifs_fscache_inode_coherency_data {
+ __le64 last_write_time_sec;
+ __le64 last_change_time_sec;
+ __le32 last_write_time_nsec;
+ __le32 last_change_time_nsec;
};
-/*
- * cache.c
- */
-extern struct fscache_netfs cifs_fscache_netfs;
-extern const struct fscache_cookie_def cifs_fscache_server_index_def;
-extern const struct fscache_cookie_def cifs_fscache_super_index_def;
-extern const struct fscache_cookie_def cifs_fscache_inode_object_def;
-
-extern int cifs_fscache_register(void);
-extern void cifs_fscache_unregister(void);
+#ifdef CONFIG_CIFS_FSCACHE
/*
* fscache.c
*/
-extern void cifs_fscache_get_client_cookie(struct TCP_Server_Info *);
-extern void cifs_fscache_release_client_cookie(struct TCP_Server_Info *);
-extern void cifs_fscache_get_super_cookie(struct cifs_tcon *);
+extern int cifs_fscache_get_super_cookie(struct cifs_tcon *);
extern void cifs_fscache_release_super_cookie(struct cifs_tcon *);
+extern void cifs_fscache_get_inode_cookie(struct inode *inode);
extern void cifs_fscache_release_inode_cookie(struct inode *);
-extern void cifs_fscache_update_inode_cookie(struct inode *inode);
-extern void cifs_fscache_set_inode_cookie(struct inode *, struct file *);
-extern void cifs_fscache_reset_inode_cookie(struct inode *);
+extern void cifs_fscache_unuse_inode_cookie(struct inode *inode, bool update);
+
+static inline
+void cifs_fscache_fill_coherency(struct inode *inode,
+ struct cifs_fscache_inode_coherency_data *cd)
+{
+ struct cifsInodeInfo *cifsi = CIFS_I(inode);
+
+ memset(cd, 0, sizeof(*cd));
+ cd->last_write_time_sec = cpu_to_le64(cifsi->vfs_inode.i_mtime.tv_sec);
+ cd->last_write_time_nsec = cpu_to_le32(cifsi->vfs_inode.i_mtime.tv_nsec);
+ cd->last_change_time_sec = cpu_to_le64(cifsi->vfs_inode.i_ctime.tv_sec);
+ cd->last_change_time_nsec = cpu_to_le32(cifsi->vfs_inode.i_ctime.tv_nsec);
+}
+
-extern void __cifs_fscache_invalidate_page(struct page *, struct inode *);
-extern void __cifs_fscache_wait_on_page_write(struct inode *inode, struct page *page);
-extern void __cifs_fscache_uncache_page(struct inode *inode, struct page *page);
extern int cifs_fscache_release_page(struct page *page, gfp_t gfp);
extern int __cifs_readpage_from_fscache(struct inode *, struct page *);
extern int __cifs_readpages_from_fscache(struct inode *,
struct address_space *,
struct list_head *,
unsigned *);
-extern void __cifs_fscache_readpages_cancel(struct inode *, struct list_head *);
-
extern void __cifs_readpage_to_fscache(struct inode *, struct page *);
-static inline void cifs_fscache_invalidate_page(struct page *page,
- struct inode *inode)
+static inline struct fscache_cookie *cifs_inode_cookie(struct inode *inode)
{
- if (PageFsCache(page))
- __cifs_fscache_invalidate_page(page, inode);
+ return CIFS_I(inode)->fscache;
}
-static inline void cifs_fscache_wait_on_page_write(struct inode *inode,
- struct page *page)
+static inline void cifs_invalidate_cache(struct inode *inode, unsigned int flags)
{
- if (PageFsCache(page))
- __cifs_fscache_wait_on_page_write(inode, page);
-}
+ struct cifs_fscache_inode_coherency_data cd;
-static inline void cifs_fscache_uncache_page(struct inode *inode,
- struct page *page)
-{
- if (PageFsCache(page))
- __cifs_fscache_uncache_page(inode, page);
+ cifs_fscache_fill_coherency(inode, &cd);
+ fscache_invalidate(cifs_inode_cookie(inode), &cd,
+ i_size_read(inode), flags);
}
static inline int cifs_readpage_from_fscache(struct inode *inode,
__cifs_readpage_to_fscache(inode, page);
}
-static inline void cifs_fscache_readpages_cancel(struct inode *inode,
- struct list_head *pages)
+#else /* CONFIG_CIFS_FSCACHE */
+static inline
+void cifs_fscache_fill_coherency(struct inode *inode,
+ struct cifs_fscache_inode_coherency_data *cd)
{
- if (CIFS_I(inode)->fscache)
- return __cifs_fscache_readpages_cancel(inode, pages);
}
-#else /* CONFIG_CIFS_FSCACHE */
-static inline int cifs_fscache_register(void) { return 0; }
-static inline void cifs_fscache_unregister(void) {}
-
-static inline void
-cifs_fscache_get_client_cookie(struct TCP_Server_Info *server) {}
-static inline void
-cifs_fscache_release_client_cookie(struct TCP_Server_Info *server) {}
-static inline void cifs_fscache_get_super_cookie(struct cifs_tcon *tcon) {}
-static inline void
-cifs_fscache_release_super_cookie(struct cifs_tcon *tcon) {}
+static inline int cifs_fscache_get_super_cookie(struct cifs_tcon *tcon) { return 0; }
+static inline void cifs_fscache_release_super_cookie(struct cifs_tcon *tcon) {}
+static inline void cifs_fscache_get_inode_cookie(struct inode *inode) {}
static inline void cifs_fscache_release_inode_cookie(struct inode *inode) {}
-static inline void cifs_fscache_update_inode_cookie(struct inode *inode) {}
-static inline void cifs_fscache_set_inode_cookie(struct inode *inode,
- struct file *filp) {}
-static inline void cifs_fscache_reset_inode_cookie(struct inode *inode) {}
-static inline int cifs_fscache_release_page(struct page *page, gfp_t gfp)
-{
- return 1; /* May release page */
-}
-
-static inline void cifs_fscache_invalidate_page(struct page *page,
- struct inode *inode) {}
-static inline void cifs_fscache_wait_on_page_write(struct inode *inode,
- struct page *page) {}
-static inline void cifs_fscache_uncache_page(struct inode *inode,
- struct page *page) {}
+static inline void cifs_fscache_unuse_inode_cookie(struct inode *inode, bool update) {}
+static inline struct fscache_cookie *cifs_inode_cookie(struct inode *inode) { return NULL; }
+static inline void cifs_invalidate_cache(struct inode *inode, unsigned int flags) {}
static inline int
cifs_readpage_from_fscache(struct inode *inode, struct page *page)
static inline void cifs_readpage_to_fscache(struct inode *inode,
struct page *page) {}
-static inline void cifs_fscache_readpages_cancel(struct inode *inode,
- struct list_head *pages)
-{
-}
-
#endif /* CONFIG_CIFS_FSCACHE */
#endif /* _CIFS_FSCACHE_H */
rc = server->ops->query_path_info(xid, tcon, cifs_sb,
full_path, tmp_data,
&adjust_tz, &is_reparse_point);
+#ifdef CONFIG_CIFS_DFS_UPCALL
+ if (rc == -ENOENT && is_tcon_dfs(tcon))
+ rc = cifs_dfs_query_info_nonascii_quirk(xid, tcon,
+ cifs_sb,
+ full_path);
+#endif
data = tmp_data;
}
inode->i_flags |= S_NOATIME | S_NOCMTIME;
if (inode->i_state & I_NEW) {
inode->i_ino = hash;
-#ifdef CONFIG_CIFS_FSCACHE
- /* initialize per-inode cache cookie pointer */
- CIFS_I(inode)->fscache = NULL;
-#endif
+ cifs_fscache_get_inode_cookie(inode);
unlock_new_inode(inode);
}
}
iget_failed(inode);
inode = ERR_PTR(rc);
}
+
out:
kfree(path);
free_xid(xid);
int
cifs_invalidate_mapping(struct inode *inode)
{
+ struct cifs_fscache_inode_coherency_data cd;
+ struct cifsInodeInfo *cifsi = CIFS_I(inode);
int rc = 0;
if (inode->i_mapping && inode->i_mapping->nrpages != 0) {
__func__, inode);
}
- cifs_fscache_reset_inode_cookie(inode);
+ cifs_fscache_fill_coherency(&cifsi->vfs_inode, &cd);
+ fscache_invalidate(cifs_inode_cookie(inode), &cd, i_size_read(inode), 0);
return rc;
}
goto out;
if ((attrs->ia_valid & ATTR_SIZE) &&
- attrs->ia_size != i_size_read(inode))
+ attrs->ia_size != i_size_read(inode)) {
truncate_setsize(inode, attrs->ia_size);
+ fscache_resize_cookie(cifs_inode_cookie(inode), attrs->ia_size);
+ }
setattr_copy(&init_user_ns, inode, attrs);
mark_inode_dirty(inode);
goto cifs_setattr_exit;
if ((attrs->ia_valid & ATTR_SIZE) &&
- attrs->ia_size != i_size_read(inode))
+ attrs->ia_size != i_size_read(inode)) {
truncate_setsize(inode, attrs->ia_size);
+ fscache_resize_cookie(cifs_inode_cookie(inode), attrs->ia_size);
+ }
setattr_copy(&init_user_ns, inode, attrs);
mark_inode_dirty(inode);
cifs_sb->mnt_cifs_flags |= CIFS_MOUNT_USE_PREFIX_PATH;
return 0;
}
+
+/** cifs_dfs_query_info_nonascii_quirk
+ * Handle weird Windows SMB server behaviour. It responds with
+ * STATUS_OBJECT_NAME_INVALID code to SMB2 QUERY_INFO request
+ * for "\<server>\<dfsname>\<linkpath>" DFS reference,
+ * where <dfsname> contains non-ASCII unicode symbols.
+ *
+ * Check such DFS reference and emulate -ENOENT if it is actual.
+ */
+int cifs_dfs_query_info_nonascii_quirk(const unsigned int xid,
+ struct cifs_tcon *tcon,
+ struct cifs_sb_info *cifs_sb,
+ const char *linkpath)
+{
+ char *treename, *dfspath, sep;
+ int treenamelen, linkpathlen, rc;
+
+ treename = tcon->treeName;
+ /* MS-DFSC: All paths in REQ_GET_DFS_REFERRAL and RESP_GET_DFS_REFERRAL
+ * messages MUST be encoded with exactly one leading backslash, not two
+ * leading backslashes.
+ */
+ sep = CIFS_DIR_SEP(cifs_sb);
+ if (treename[0] == sep && treename[1] == sep)
+ treename++;
+ linkpathlen = strlen(linkpath);
+ treenamelen = strnlen(treename, MAX_TREE_SIZE + 1);
+ dfspath = kzalloc(treenamelen + linkpathlen + 1, GFP_KERNEL);
+ if (!dfspath)
+ return -ENOMEM;
+ if (treenamelen)
+ memcpy(dfspath, treename, treenamelen);
+ memcpy(dfspath + treenamelen, linkpath, linkpathlen);
+ rc = dfs_cache_find(xid, tcon->ses, cifs_sb->local_nls,
+ cifs_remap(cifs_sb), dfspath, NULL, NULL);
+ if (rc == 0) {
+ cifs_dbg(FYI, "DFS ref '%s' is found, emulate -EREMOTE\n",
+ dfspath);
+ rc = -EREMOTE;
+ } else if (rc == -EEXIST) {
+ cifs_dbg(FYI, "DFS ref '%s' is not found, emulate -ENOENT\n",
+ dfspath);
+ rc = -ENOENT;
+ } else {
+ cifs_dbg(FYI, "%s: dfs_cache_find returned %d\n", __func__, rc);
+ }
+ kfree(dfspath);
+ return rc;
+}
#endif
if (class == ERRSRV && code == ERRbaduid) {
cifs_dbg(FYI, "Server returned 0x%x, reconnecting session...\n",
code);
- spin_lock(&cifs_tcp_ses_lock);
- if (mid->server->tcpStatus != CifsExiting)
- mid->server->tcpStatus = CifsNeedReconnect;
- spin_unlock(&cifs_tcp_ses_lock);
+ cifs_reconnect(mid->server, false);
}
}
#define NTLMSSP_REQUEST_NON_NT_KEY 0x400000
#define NTLMSSP_NEGOTIATE_TARGET_INFO 0x800000
/* #define reserved4 0x1000000 */
-#define NTLMSSP_NEGOTIATE_VERSION 0x2000000 /* we do not set */
+#define NTLMSSP_NEGOTIATE_VERSION 0x2000000 /* we only set for SMB2+ */
/* #define reserved3 0x4000000 */
/* #define reserved2 0x8000000 */
/* #define reserved1 0x10000000 */
/* followed by WorkstationString */
} __attribute__((packed)) NEGOTIATE_MESSAGE, *PNEGOTIATE_MESSAGE;
+#define NTLMSSP_REVISION_W2K3 0x0F
+
+/* See MS-NLMP section 2.2.2.10 */
+struct ntlmssp_version {
+ __u8 ProductMajorVersion;
+ __u8 ProductMinorVersion;
+ __le16 ProductBuild; /* we send the cifs.ko module version here */
+ __u8 Reserved[3];
+ __u8 NTLMRevisionCurrent; /* currently 0x0F */
+} __packed;
+
+/* see MS-NLMP section 2.2.1.1 */
+struct negotiate_message {
+ __u8 Signature[sizeof(NTLMSSP_SIGNATURE)];
+ __le32 MessageType; /* NtLmNegotiate = 1 */
+ __le32 NegotiateFlags;
+ SECURITY_BUFFER DomainName; /* RFC 1001 style and ASCII */
+ SECURITY_BUFFER WorkstationName; /* RFC 1001 and ASCII */
+ struct ntlmssp_version Version;
+ /* SECURITY_BUFFER */
+ char DomainString[0];
+ /* followed by WorkstationString */
+} __packed;
+
typedef struct _CHALLENGE_MESSAGE {
__u8 Signature[sizeof(NTLMSSP_SIGNATURE)];
__le32 MessageType; /* NtLmChallenge = 2 */
struct cifs_ses *ses,
struct TCP_Server_Info *server,
const struct nls_table *nls_cp);
+int build_ntlmssp_smb3_negotiate_blob(unsigned char **pbuffer, u16 *buflen,
+ struct cifs_ses *ses,
+ struct TCP_Server_Info *server,
+ const struct nls_table *nls_cp);
int build_ntlmssp_auth_blob(unsigned char **pbuffer, u16 *buflen,
struct cifs_ses *ses,
struct TCP_Server_Info *server,
#include "nterr.h"
#include <linux/utsname.h>
#include <linux/slab.h>
+#include <linux/version.h>
+#include "cifsfs.h"
#include "cifs_spnego.h"
#include "smb2proto.h"
#include "fs_context.h"
return false;
}
+/* channel helper functions. assumed that chan_lock is held by caller. */
+
unsigned int
cifs_ses_get_chan_index(struct cifs_ses *ses,
struct TCP_Server_Info *server)
left = ses->chan_max - ses->chan_count;
if (left <= 0) {
+ spin_unlock(&ses->chan_lock);
cifs_dbg(FYI,
"ses already at max_channels (%zu), nothing to open\n",
ses->chan_max);
- spin_unlock(&ses->chan_lock);
return 0;
}
return rc;
}
-/* Mark all session channels for reconnect */
-void cifs_ses_mark_for_reconnect(struct cifs_ses *ses)
-{
- int i;
-
- for (i = 0; i < ses->chan_count; i++) {
- spin_lock(&cifs_tcp_ses_lock);
- if (ses->chans[i].server->tcpStatus != CifsExiting)
- ses->chans[i].server->tcpStatus = CifsNeedReconnect;
- spin_unlock(&cifs_tcp_ses_lock);
- }
-}
-
static __u32 cifs_ssetup_hdr(struct cifs_ses *ses,
struct TCP_Server_Info *server,
SESSION_SETUP_ANDX *pSMB)
return rc;
}
+/*
+ * Build ntlmssp blob with additional fields, such as version,
+ * supported by modern servers. For safety limit to SMB3 or later
+ * See notes in MS-NLMP Section 2.2.2.1 e.g.
+ */
+int build_ntlmssp_smb3_negotiate_blob(unsigned char **pbuffer,
+ u16 *buflen,
+ struct cifs_ses *ses,
+ struct TCP_Server_Info *server,
+ const struct nls_table *nls_cp)
+{
+ int rc = 0;
+ struct negotiate_message *sec_blob;
+ __u32 flags;
+ unsigned char *tmp;
+ int len;
+
+ len = size_of_ntlmssp_blob(ses, sizeof(struct negotiate_message));
+ *pbuffer = kmalloc(len, GFP_KERNEL);
+ if (!*pbuffer) {
+ rc = -ENOMEM;
+ cifs_dbg(VFS, "Error %d during NTLMSSP allocation\n", rc);
+ *buflen = 0;
+ goto setup_ntlm_smb3_neg_ret;
+ }
+ sec_blob = (struct negotiate_message *)*pbuffer;
+
+ memset(*pbuffer, 0, sizeof(struct negotiate_message));
+ memcpy(sec_blob->Signature, NTLMSSP_SIGNATURE, 8);
+ sec_blob->MessageType = NtLmNegotiate;
+
+ /* BB is NTLMV2 session security format easier to use here? */
+ flags = NTLMSSP_NEGOTIATE_56 | NTLMSSP_REQUEST_TARGET |
+ NTLMSSP_NEGOTIATE_128 | NTLMSSP_NEGOTIATE_UNICODE |
+ NTLMSSP_NEGOTIATE_NTLM | NTLMSSP_NEGOTIATE_EXTENDED_SEC |
+ NTLMSSP_NEGOTIATE_ALWAYS_SIGN | NTLMSSP_NEGOTIATE_SEAL |
+ NTLMSSP_NEGOTIATE_SIGN | NTLMSSP_NEGOTIATE_VERSION;
+ if (!server->session_estab || ses->ntlmssp->sesskey_per_smbsess)
+ flags |= NTLMSSP_NEGOTIATE_KEY_XCH;
+
+ sec_blob->Version.ProductMajorVersion = LINUX_VERSION_MAJOR;
+ sec_blob->Version.ProductMinorVersion = LINUX_VERSION_PATCHLEVEL;
+ sec_blob->Version.ProductBuild = cpu_to_le16(SMB3_PRODUCT_BUILD);
+ sec_blob->Version.NTLMRevisionCurrent = NTLMSSP_REVISION_W2K3;
+
+ tmp = *pbuffer + sizeof(struct negotiate_message);
+ ses->ntlmssp->client_flags = flags;
+ sec_blob->NegotiateFlags = cpu_to_le32(flags);
+
+ /* these fields should be null in negotiate phase MS-NLMP 3.1.5.1.1 */
+ cifs_security_buffer_from_str(&sec_blob->DomainName,
+ NULL,
+ CIFS_MAX_DOMAINNAME_LEN,
+ *pbuffer, &tmp,
+ nls_cp);
+
+ cifs_security_buffer_from_str(&sec_blob->WorkstationName,
+ NULL,
+ CIFS_MAX_WORKSTATION_LEN,
+ *pbuffer, &tmp,
+ nls_cp);
+
+ *buflen = tmp - *pbuffer;
+setup_ntlm_smb3_neg_ret:
+ return rc;
+}
+
+
int build_ntlmssp_auth_blob(unsigned char **pbuffer,
u16 *buflen,
struct cifs_ses *ses,
mutex_unlock(&server->srv_mutex);
cifs_dbg(FYI, "CIFS session established successfully\n");
- spin_lock(&ses->chan_lock);
- cifs_chan_clear_need_reconnect(ses, server);
- spin_unlock(&ses->chan_lock);
-
- /* Even if one channel is active, session is in good state */
- spin_lock(&cifs_tcp_ses_lock);
- server->tcpStatus = CifsGood;
- ses->status = CifsGood;
- spin_unlock(&cifs_tcp_ses_lock);
-
return 0;
}
&blob_len, ses, server,
sess_data->nls_cp);
if (rc)
- goto out;
+ goto out_free_ntlmsspblob;
sess_data->iov[1].iov_len = blob_len;
sess_data->iov[1].iov_base = ntlmsspblob;
rc = _sess_auth_rawntlmssp_assemble_req(sess_data);
if (rc)
- goto out;
+ goto out_free_ntlmsspblob;
rc = sess_sendreceive(sess_data);
rc = 0;
if (rc)
- goto out;
+ goto out_free_ntlmsspblob;
cifs_dbg(FYI, "rawntlmssp session setup challenge phase\n");
if (smb_buf->WordCount != 4) {
rc = -EIO;
cifs_dbg(VFS, "bad word count %d\n", smb_buf->WordCount);
- goto out;
+ goto out_free_ntlmsspblob;
}
ses->Suid = smb_buf->Uid; /* UID left in wire format (le) */
cifs_dbg(VFS, "bad security blob length %d\n",
blob_len);
rc = -EINVAL;
- goto out;
+ goto out_free_ntlmsspblob;
}
rc = decode_ntlmssp_challenge(bcc_ptr, blob_len, ses);
+
+out_free_ntlmsspblob:
+ kfree(ntlmsspblob);
out:
sess_free_buffer(sess_data);
out:
sess_free_buffer(sess_data);
- if (!rc)
+ if (!rc)
rc = sess_establish_session(sess_data);
/* Cleanup */
spin_unlock(&ses->chan_lock);
return 0;
}
+ spin_unlock(&ses->chan_lock);
cifs_dbg(FYI, "sess reconnect mask: 0x%lx, tcon reconnect: %d",
tcon->ses->chans_need_reconnect,
tcon->need_reconnect);
- spin_unlock(&ses->chan_lock);
nls_codepage = load_nls_default();
rc = -EHOSTDOWN;
goto failed;
}
- }
-
- if (rc || !tcon->need_reconnect) {
+ } else {
mutex_unlock(&ses->session_mutex);
goto out;
}
+ mutex_unlock(&ses->session_mutex);
skip_sess_setup:
+ mutex_lock(&ses->session_mutex);
+ if (!tcon->need_reconnect) {
+ mutex_unlock(&ses->session_mutex);
+ goto out;
+ }
cifs_mark_open_files_invalid(tcon);
if (tcon->use_persistent)
tcon->need_reopen_files = true;
mutex_unlock(&server->srv_mutex);
cifs_dbg(FYI, "SMB2/3 session established successfully\n");
-
- spin_lock(&ses->chan_lock);
- cifs_chan_clear_need_reconnect(ses, server);
- spin_unlock(&ses->chan_lock);
-
- /* Even if one channel is active, session is in good state */
- spin_lock(&cifs_tcp_ses_lock);
- server->tcpStatus = CifsGood;
- ses->status = CifsGood;
- spin_unlock(&cifs_tcp_ses_lock);
-
return rc;
}
if (rc)
goto out_err;
- rc = build_ntlmssp_negotiate_blob(&ntlmssp_blob,
+ rc = build_ntlmssp_smb3_negotiate_blob(&ntlmssp_blob,
&blob_length, ses, server,
sess_data->nls_cp);
if (rc)
tcon->share_flags = le32_to_cpu(rsp->ShareFlags);
tcon->capabilities = rsp->Capabilities; /* we keep caps little endian */
tcon->maximal_access = le32_to_cpu(rsp->MaximalAccess);
- spin_lock(&cifs_tcp_ses_lock);
- tcon->tidStatus = CifsGood;
- spin_unlock(&cifs_tcp_ses_lock);
- tcon->need_reconnect = false;
tcon->tid = le32_to_cpu(rsp->hdr.Id.SyncId.TreeId);
strlcpy(tcon->treeName, tree, sizeof(tcon->treeName));
cp = load_nls_default();
cifs_strtoUTF16(*out_path, treename, treename_len, cp);
- UniStrcat(*out_path, sep);
- UniStrcat(*out_path, path);
+
+ /* Do not append the separator if the path is empty */
+ if (path[0] != cpu_to_le16(0x0000)) {
+ UniStrcat(*out_path, sep);
+ UniStrcat(*out_path, path);
+ }
+
unload_nls(cp);
return 0;
{
struct TCP_Server_Info *server = container_of(work,
struct TCP_Server_Info, reconnect.work);
- struct cifs_ses *ses;
+ struct TCP_Server_Info *pserver;
+ struct cifs_ses *ses, *ses2;
struct cifs_tcon *tcon, *tcon2;
- struct list_head tmp_list;
- int tcon_exist = false;
+ struct list_head tmp_list, tmp_ses_list;
+ bool tcon_exist = false, ses_exist = false;
+ bool tcon_selected = false;
int rc;
- int resched = false;
+ bool resched = false;
+ /* If server is a channel, select the primary channel */
+ pserver = CIFS_SERVER_IS_CHAN(server) ? server->primary_server : server;
/* Prevent simultaneous reconnects that can corrupt tcon->rlist list */
- mutex_lock(&server->reconnect_mutex);
+ mutex_lock(&pserver->reconnect_mutex);
INIT_LIST_HEAD(&tmp_list);
- cifs_dbg(FYI, "Need negotiate, reconnecting tcons\n");
+ INIT_LIST_HEAD(&tmp_ses_list);
+ cifs_dbg(FYI, "Reconnecting tcons and channels\n");
spin_lock(&cifs_tcp_ses_lock);
- list_for_each_entry(ses, &server->smb_ses_list, smb_ses_list) {
+ list_for_each_entry(ses, &pserver->smb_ses_list, smb_ses_list) {
+
+ tcon_selected = false;
+
list_for_each_entry(tcon, &ses->tcon_list, tcon_list) {
if (tcon->need_reconnect || tcon->need_reopen_files) {
tcon->tc_count++;
list_add_tail(&tcon->rlist, &tmp_list);
- tcon_exist = true;
+ tcon_selected = tcon_exist = true;
}
}
/*
*/
if (ses->tcon_ipc && ses->tcon_ipc->need_reconnect) {
list_add_tail(&ses->tcon_ipc->rlist, &tmp_list);
- tcon_exist = true;
+ tcon_selected = tcon_exist = true;
ses->ses_count++;
}
+ /*
+ * handle the case where channel needs to reconnect
+ * binding session, but tcon is healthy (some other channel
+ * is active)
+ */
+ spin_lock(&ses->chan_lock);
+ if (!tcon_selected && cifs_chan_needs_reconnect(ses, server)) {
+ list_add_tail(&ses->rlist, &tmp_ses_list);
+ ses_exist = true;
+ ses->ses_count++;
+ }
+ spin_unlock(&ses->chan_lock);
}
/*
* Get the reference to server struct to be sure that the last call of
* cifs_put_tcon() in the loop below won't release the server pointer.
*/
- if (tcon_exist)
+ if (tcon_exist || ses_exist)
server->srv_count++;
spin_unlock(&cifs_tcp_ses_lock);
cifs_put_tcon(tcon);
}
- cifs_dbg(FYI, "Reconnecting tcons finished\n");
+ if (!ses_exist)
+ goto done;
+
+ /* allocate a dummy tcon struct used for reconnect */
+ tcon = kzalloc(sizeof(struct cifs_tcon), GFP_KERNEL);
+ if (!tcon) {
+ resched = true;
+ list_del_init(&ses->rlist);
+ cifs_put_smb_ses(ses);
+ goto done;
+ }
+
+ tcon->tidStatus = CifsGood;
+ tcon->retry = false;
+ tcon->need_reconnect = false;
+
+ /* now reconnect sessions for necessary channels */
+ list_for_each_entry_safe(ses, ses2, &tmp_ses_list, rlist) {
+ tcon->ses = ses;
+ rc = smb2_reconnect(SMB2_INTERNAL_CMD, tcon, server);
+ if (rc)
+ resched = true;
+ list_del_init(&ses->rlist);
+ cifs_put_smb_ses(ses);
+ }
+ kfree(tcon);
+
+done:
+ cifs_dbg(FYI, "Reconnecting tcons and channels finished\n");
if (resched)
queue_delayed_work(cifsiod_wq, &server->reconnect, 2 * HZ);
- mutex_unlock(&server->reconnect_mutex);
+ mutex_unlock(&pserver->reconnect_mutex);
/* now we can safely release srv struct */
- if (tcon_exist)
+ if (tcon_exist || ses_exist)
cifs_put_tcp_session(server, 1);
}
goto out;
found:
+ spin_lock(&ses->chan_lock);
if (cifs_chan_needs_reconnect(ses, server) &&
!CIFS_ALL_CHANS_NEED_RECONNECT(ses)) {
/*
* session key
*/
memcpy(key, ses->smb3signingkey, SMB3_SIGN_KEY_SIZE);
+ spin_unlock(&ses->chan_lock);
goto out;
}
chan = ses->chans + i;
if (chan->server == server) {
memcpy(key, chan->signkey, SMB3_SIGN_KEY_SIZE);
+ spin_unlock(&ses->chan_lock);
goto out;
}
}
+ spin_unlock(&ses->chan_lock);
cifs_dbg(VFS,
"%s: Could not find channel signing key for session 0x%llx\n",
return rc;
/* safe to access primary channel, since it will never go away */
+ spin_lock(&ses->chan_lock);
memcpy(ses->chans[0].signkey, ses->smb3signingkey,
SMB3_SIGN_KEY_SIZE);
+ spin_unlock(&ses->chan_lock);
rc = generate_key(ses, ptriplet->encryption.label,
ptriplet->encryption.context,
* socket so the server throws away the partial SMB
*/
spin_lock(&cifs_tcp_ses_lock);
- server->tcpStatus = CifsNeedReconnect;
+ if (server->tcpStatus != CifsExiting)
+ server->tcpStatus = CifsNeedReconnect;
spin_unlock(&cifs_tcp_ses_lock);
trace_smb3_partial_send_reconnect(server->CurrentMid,
server->conn_id, server->hostname);
struct mid_q_entry **ppmidQ)
{
spin_lock(&cifs_tcp_ses_lock);
- if (ses->server->tcpStatus == CifsExiting) {
- spin_unlock(&cifs_tcp_ses_lock);
- return -ENOENT;
- }
-
- if (ses->server->tcpStatus == CifsNeedReconnect) {
- spin_unlock(&cifs_tcp_ses_lock);
- cifs_dbg(FYI, "tcp session dead - return to caller to retry\n");
- return -EAGAIN;
- }
-
if (ses->status == CifsNew) {
if ((in_buf->Command != SMB_COM_SESSION_SETUP_ANDX) &&
(in_buf->Command != SMB_COM_NEGOTIATE)) {
/* round robin */
index = (uint)atomic_inc_return(&ses->chan_seq);
+
+ spin_lock(&ses->chan_lock);
index %= ses->chan_count;
+ spin_unlock(&ses->chan_lock);
return ses->chans[index].server;
}
#include <linux/fs.h>
#include <linux/path.h>
#include <linux/timekeeping.h>
+#include <linux/sysctl.h>
#include <linux/uaccess.h>
#include <asm/mmu_context.h>
#include <trace/events/sched.h>
-int core_uses_pid;
-unsigned int core_pipe_limit;
-char core_pattern[CORENAME_MAX_SIZE] = "core";
+static int core_uses_pid;
+static unsigned int core_pipe_limit;
+static char core_pattern[CORENAME_MAX_SIZE] = "core";
static int core_name_size = CORENAME_MAX_SIZE;
struct core_name {
int used, size;
};
-/* The maximal length of core_pattern is also specified in sysctl.c */
-
static int expand_corename(struct core_name *cn, int size)
{
char *corename = krealloc(cn->corename, size, GFP_KERNEL);
}
EXPORT_SYMBOL(dump_align);
+#ifdef CONFIG_SYSCTL
+
+void validate_coredump_safety(void)
+{
+ if (suid_dumpable == SUID_DUMP_ROOT &&
+ core_pattern[0] != '/' && core_pattern[0] != '|') {
+ pr_warn(
+"Unsafe core_pattern used with fs.suid_dumpable=2.\n"
+"Pipe handler or fully qualified core dump path required.\n"
+"Set kernel.core_pattern before fs.suid_dumpable.\n"
+ );
+ }
+}
+
+static int proc_dostring_coredump(struct ctl_table *table, int write,
+ void *buffer, size_t *lenp, loff_t *ppos)
+{
+ int error = proc_dostring(table, write, buffer, lenp, ppos);
+
+ if (!error)
+ validate_coredump_safety();
+ return error;
+}
+
+static struct ctl_table coredump_sysctls[] = {
+ {
+ .procname = "core_uses_pid",
+ .data = &core_uses_pid,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec,
+ },
+ {
+ .procname = "core_pattern",
+ .data = core_pattern,
+ .maxlen = CORENAME_MAX_SIZE,
+ .mode = 0644,
+ .proc_handler = proc_dostring_coredump,
+ },
+ {
+ .procname = "core_pipe_limit",
+ .data = &core_pipe_limit,
+ .maxlen = sizeof(unsigned int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec,
+ },
+ { }
+};
+
+static int __init init_fs_coredump_sysctls(void)
+{
+ register_sysctl_init("kernel", coredump_sysctls);
+ return 0;
+}
+fs_initcall(init_fs_coredump_sysctls);
+#endif /* CONFIG_SYSCTL */
+
/*
* The purpose of always_dump_vma() is to make sure that special kernel mappings
* that are useful for post-mortem analysis are included in every core dump.
return in_lookup_hashtable + hash_32(hash, IN_LOOKUP_SHIFT);
}
-
-/* Statistics gathering. */
-struct dentry_stat_t dentry_stat = {
- .age_limit = 45,
+struct dentry_stat_t {
+ long nr_dentry;
+ long nr_unused;
+ long age_limit; /* age in seconds */
+ long want_pages; /* pages requested by system */
+ long nr_negative; /* # of unused negative dentries */
+ long dummy; /* Reserved for future use */
};
static DEFINE_PER_CPU(long, nr_dentry);
static DEFINE_PER_CPU(long, nr_dentry_negative);
#if defined(CONFIG_SYSCTL) && defined(CONFIG_PROC_FS)
+/* Statistics gathering. */
+static struct dentry_stat_t dentry_stat = {
+ .age_limit = 45,
+};
/*
* Here we resort to our own counters instead of using generic per-cpu counters
return sum < 0 ? 0 : sum;
}
-int proc_nr_dentry(struct ctl_table *table, int write, void *buffer,
- size_t *lenp, loff_t *ppos)
+static int proc_nr_dentry(struct ctl_table *table, int write, void *buffer,
+ size_t *lenp, loff_t *ppos)
{
dentry_stat.nr_dentry = get_nr_dentry();
dentry_stat.nr_unused = get_nr_dentry_unused();
dentry_stat.nr_negative = get_nr_dentry_negative();
return proc_doulongvec_minmax(table, write, buffer, lenp, ppos);
}
+
+static struct ctl_table fs_dcache_sysctls[] = {
+ {
+ .procname = "dentry-state",
+ .data = &dentry_stat,
+ .maxlen = 6*sizeof(long),
+ .mode = 0444,
+ .proc_handler = proc_nr_dentry,
+ },
+ { }
+};
+
+static int __init init_fs_dcache_sysctls(void)
+{
+ register_sysctl_init("fs", fs_dcache_sysctls);
+ return 0;
+}
+fs_initcall(init_fs_dcache_sysctls);
#endif
/*
static long long_zero;
static long long_max = LONG_MAX;
-struct ctl_table epoll_table[] = {
+static struct ctl_table epoll_table[] = {
{
.procname = "max_user_watches",
.data = &max_user_watches,
},
{ }
};
+
+static void __init epoll_sysctls_init(void)
+{
+ register_sysctl("fs/epoll", epoll_table);
+}
+#else
+#define epoll_sysctls_init() do { } while (0)
#endif /* CONFIG_SYSCTL */
static const struct file_operations eventpoll_fops;
/* Allocates slab cache used to allocate "struct eppoll_entry" */
pwq_cache = kmem_cache_create("eventpoll_pwq",
sizeof(struct eppoll_entry), 0, SLAB_PANIC|SLAB_ACCOUNT, NULL);
+ epoll_sysctls_init();
ephead_cache = kmem_cache_create("ep_head",
sizeof(struct epitems_head), 0, SLAB_PANIC|SLAB_ACCOUNT, NULL);
#include <linux/vmalloc.h>
#include <linux/io_uring.h>
#include <linux/syscall_user_dispatch.h>
+#include <linux/coredump.h>
#include <linux/uaccess.h>
#include <asm/mmu_context.h>
char *__get_task_comm(char *buf, size_t buf_size, struct task_struct *tsk)
{
task_lock(tsk);
- strncpy(buf, tsk->comm, buf_size);
+ /* Always NUL terminated and zero-padded */
+ strscpy_pad(buf, tsk->comm, buf_size);
task_unlock(tsk);
return buf;
}
{
task_lock(tsk);
trace_task_rename(tsk, buf);
- strlcpy(tsk->comm, buf, sizeof(tsk->comm));
+ strscpy_pad(tsk->comm, buf, sizeof(tsk->comm));
task_unlock(tsk);
perf_event_comm(tsk, exec);
}
argv, envp, flags);
}
#endif
+
+#ifdef CONFIG_SYSCTL
+
+static int proc_dointvec_minmax_coredump(struct ctl_table *table, int write,
+ void *buffer, size_t *lenp, loff_t *ppos)
+{
+ int error = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
+
+ if (!error)
+ validate_coredump_safety();
+ return error;
+}
+
+static struct ctl_table fs_exec_sysctls[] = {
+ {
+ .procname = "suid_dumpable",
+ .data = &suid_dumpable,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax_coredump,
+ .extra1 = SYSCTL_ZERO,
+ .extra2 = SYSCTL_TWO,
+ },
+ { }
+};
+
+static int __init init_fs_exec_sysctls(void)
+{
+ register_sysctl_init("fs", fs_exec_sysctls);
+ return 0;
+}
+
+fs_initcall(init_fs_exec_sysctls);
+#endif /* CONFIG_SYSCTL */
static void *ext4_mb_seq_groups_start(struct seq_file *seq, loff_t *pos)
{
- struct super_block *sb = PDE_DATA(file_inode(seq->file));
+ struct super_block *sb = pde_data(file_inode(seq->file));
ext4_group_t group;
if (*pos < 0 || *pos >= ext4_get_groups_count(sb))
static void *ext4_mb_seq_groups_next(struct seq_file *seq, void *v, loff_t *pos)
{
- struct super_block *sb = PDE_DATA(file_inode(seq->file));
+ struct super_block *sb = pde_data(file_inode(seq->file));
ext4_group_t group;
++*pos;
static int ext4_mb_seq_groups_show(struct seq_file *seq, void *v)
{
- struct super_block *sb = PDE_DATA(file_inode(seq->file));
+ struct super_block *sb = pde_data(file_inode(seq->file));
ext4_group_t group = (ext4_group_t) ((unsigned long) v);
int i;
int err, buddy_loaded = 0;
static void *ext4_mb_seq_structs_summary_start(struct seq_file *seq, loff_t *pos)
__acquires(&EXT4_SB(sb)->s_mb_rb_lock)
{
- struct super_block *sb = PDE_DATA(file_inode(seq->file));
+ struct super_block *sb = pde_data(file_inode(seq->file));
unsigned long position;
read_lock(&EXT4_SB(sb)->s_mb_rb_lock);
static void *ext4_mb_seq_structs_summary_next(struct seq_file *seq, void *v, loff_t *pos)
{
- struct super_block *sb = PDE_DATA(file_inode(seq->file));
+ struct super_block *sb = pde_data(file_inode(seq->file));
unsigned long position;
++*pos;
static int ext4_mb_seq_structs_summary_show(struct seq_file *seq, void *v)
{
- struct super_block *sb = PDE_DATA(file_inode(seq->file));
+ struct super_block *sb = pde_data(file_inode(seq->file));
struct ext4_sb_info *sbi = EXT4_SB(sb);
unsigned long position = ((unsigned long) v);
struct ext4_group_info *grp;
static void ext4_mb_seq_structs_summary_stop(struct seq_file *seq, void *v)
__releases(&EXT4_SB(sb)->s_mb_rb_lock)
{
- struct super_block *sb = PDE_DATA(file_inode(seq->file));
+ struct super_block *sb = pde_data(file_inode(seq->file));
read_unlock(&EXT4_SB(sb)->s_mb_rb_lock);
}
#include <linux/writeback.h>
#include <linux/backing-dev.h>
#include <linux/pagevec.h>
-#include <linux/cleancache.h>
#include "ext4.h"
} else if (fully_mapped) {
SetPageMappedToDisk(page);
}
- if (fully_mapped && blocks_per_page == 1 &&
- !PageUptodate(page) && cleancache_get_page(page) == 0) {
- SetPageUptodate(page);
- goto confused;
- }
/*
* This page will go to BIO. Do we need to send this
#include <linux/log2.h>
#include <linux/crc16.h>
#include <linux/dax.h>
-#include <linux/cleancache.h>
#include <linux/uaccess.h>
#include <linux/iversion.h>
#include <linux/unicode.h>
EXT4_BLOCKS_PER_GROUP(sb),
EXT4_INODES_PER_GROUP(sb),
sbi->s_mount_opt, sbi->s_mount_opt2);
-
- cleancache_init_fs(sb);
return err;
}
select CRYPTO_CRC32
select F2FS_FS_XATTR if FS_ENCRYPTION
select FS_ENCRYPTION_ALGS if FS_ENCRYPTION
+ select FS_IOMAP
select LZ4_COMPRESS if F2FS_FS_LZ4
select LZ4_DECOMPRESS if F2FS_FS_LZ4
select LZ4HC_COMPRESS if F2FS_FS_LZ4HC
/* truncate all the data during iput */
iput(inode);
- err = f2fs_get_node_info(sbi, ino, &ni);
+ err = f2fs_get_node_info(sbi, ino, &ni, false);
if (err)
goto err_out;
unsigned long flags;
if (cpc->reason & CP_UMOUNT) {
- if (le32_to_cpu(ckpt->cp_pack_total_block_count) >
- sbi->blocks_per_seg - NM_I(sbi)->nat_bits_blocks) {
+ if (le32_to_cpu(ckpt->cp_pack_total_block_count) +
+ NM_I(sbi)->nat_bits_blocks > sbi->blocks_per_seg) {
clear_ckpt_flags(sbi, CP_NAT_BITS_FLAG);
f2fs_notice(sbi, "Disable nat_bits due to no space");
} else if (!is_set_ckpt_flags(sbi, CP_NAT_BITS_FLAG) &&
cc->rpages = NULL;
cc->nr_rpages = 0;
cc->nr_cpages = 0;
+ cc->valid_nr_cpages = 0;
if (!reuse)
cc->cluster_idx = NULL_CLUSTER;
}
const struct f2fs_compress_ops *cops =
f2fs_cops[fi->i_compress_algorithm];
unsigned int max_len, new_nr_cpages;
- struct page **new_cpages;
u32 chksum = 0;
int i, ret;
max_len = COMPRESS_HEADER_SIZE + cc->clen;
cc->nr_cpages = DIV_ROUND_UP(max_len, PAGE_SIZE);
+ cc->valid_nr_cpages = cc->nr_cpages;
cc->cpages = page_array_alloc(cc->inode, cc->nr_cpages);
if (!cc->cpages) {
new_nr_cpages = DIV_ROUND_UP(cc->clen + COMPRESS_HEADER_SIZE, PAGE_SIZE);
- /* Now we're going to cut unnecessary tail pages */
- new_cpages = page_array_alloc(cc->inode, new_nr_cpages);
- if (!new_cpages) {
- ret = -ENOMEM;
- goto out_vunmap_cbuf;
- }
-
/* zero out any unused part of the last page */
memset(&cc->cbuf->cdata[cc->clen], 0,
(new_nr_cpages * PAGE_SIZE) -
vm_unmap_ram(cc->rbuf, cc->cluster_size);
for (i = 0; i < cc->nr_cpages; i++) {
- if (i < new_nr_cpages) {
- new_cpages[i] = cc->cpages[i];
+ if (i < new_nr_cpages)
continue;
- }
f2fs_compress_free_page(cc->cpages[i]);
cc->cpages[i] = NULL;
}
if (cops->destroy_compress_ctx)
cops->destroy_compress_ctx(cc);
- page_array_free(cc->inode, cc->cpages, cc->nr_cpages);
- cc->cpages = new_cpages;
- cc->nr_cpages = new_nr_cpages;
+ cc->valid_nr_cpages = new_nr_cpages;
trace_f2fs_compress_pages_end(cc->inode, cc->cluster_idx,
cc->clen, ret);
psize = (loff_t)(cc->rpages[last_index]->index + 1) << PAGE_SHIFT;
- err = f2fs_get_node_info(fio.sbi, dn.nid, &ni);
+ err = f2fs_get_node_info(fio.sbi, dn.nid, &ni, false);
if (err)
goto out_put_dnode;
cic->magic = F2FS_COMPRESSED_PAGE_MAGIC;
cic->inode = inode;
- atomic_set(&cic->pending_pages, cc->nr_cpages);
+ atomic_set(&cic->pending_pages, cc->valid_nr_cpages);
cic->rpages = page_array_alloc(cc->inode, cc->cluster_size);
if (!cic->rpages)
goto out_put_cic;
cic->nr_rpages = cc->cluster_size;
- for (i = 0; i < cc->nr_cpages; i++) {
+ for (i = 0; i < cc->valid_nr_cpages; i++) {
f2fs_set_compressed_page(cc->cpages[i], inode,
cc->rpages[i + 1]->index, cic);
fio.compressed_page = cc->cpages[i];
if (fio.compr_blocks && __is_valid_data_blkaddr(blkaddr))
fio.compr_blocks++;
- if (i > cc->nr_cpages) {
+ if (i > cc->valid_nr_cpages) {
if (__is_valid_data_blkaddr(blkaddr)) {
f2fs_invalidate_blocks(sbi, blkaddr);
f2fs_update_data_blkaddr(&dn, NEW_ADDR);
if (fio.compr_blocks)
f2fs_i_compr_blocks_update(inode, fio.compr_blocks - 1, false);
- f2fs_i_compr_blocks_update(inode, cc->nr_cpages, true);
- add_compr_block_stat(inode, cc->nr_cpages);
+ f2fs_i_compr_blocks_update(inode, cc->valid_nr_cpages, true);
+ add_compr_block_stat(inode, cc->valid_nr_cpages);
set_inode_flag(cc->inode, FI_APPEND_WRITE);
if (cc->cluster_idx == 0)
else
f2fs_unlock_op(sbi);
out_free:
- for (i = 0; i < cc->nr_cpages; i++) {
- if (!cc->cpages[i])
- continue;
+ for (i = 0; i < cc->valid_nr_cpages; i++) {
f2fs_compress_free_page(cc->cpages[i]);
cc->cpages[i] = NULL;
}
enum iostat_type io_type)
{
struct address_space *mapping = cc->inode->i_mapping;
- int _submitted, compr_blocks, ret;
- int i = -1, err = 0;
+ int _submitted, compr_blocks, ret, i;
compr_blocks = f2fs_compressed_blocks(cc);
- if (compr_blocks < 0) {
- err = compr_blocks;
- goto out_err;
+
+ for (i = 0; i < cc->cluster_size; i++) {
+ if (!cc->rpages[i])
+ continue;
+
+ redirty_page_for_writepage(wbc, cc->rpages[i]);
+ unlock_page(cc->rpages[i]);
}
+ if (compr_blocks < 0)
+ return compr_blocks;
+
for (i = 0; i < cc->cluster_size; i++) {
if (!cc->rpages[i])
continue;
retry_write:
+ lock_page(cc->rpages[i]);
+
if (cc->rpages[i]->mapping != mapping) {
+continue_unlock:
unlock_page(cc->rpages[i]);
continue;
}
- BUG_ON(!PageLocked(cc->rpages[i]));
+ if (!PageDirty(cc->rpages[i]))
+ goto continue_unlock;
+
+ if (!clear_page_dirty_for_io(cc->rpages[i]))
+ goto continue_unlock;
ret = f2fs_write_single_data_page(cc->rpages[i], &_submitted,
NULL, NULL, wbc, io_type,
* avoid deadlock caused by cluster update race
* from foreground operation.
*/
- if (IS_NOQUOTA(cc->inode)) {
- err = 0;
- goto out_err;
- }
+ if (IS_NOQUOTA(cc->inode))
+ return 0;
ret = 0;
cond_resched();
congestion_wait(BLK_RW_ASYNC,
DEFAULT_IO_TIMEOUT);
- lock_page(cc->rpages[i]);
-
- if (!PageDirty(cc->rpages[i])) {
- unlock_page(cc->rpages[i]);
- continue;
- }
-
- clear_page_dirty_for_io(cc->rpages[i]);
goto retry_write;
}
- err = ret;
- goto out_err;
+ return ret;
}
*submitted += _submitted;
f2fs_balance_fs(F2FS_M_SB(mapping), true);
return 0;
-out_err:
- for (++i; i < cc->cluster_size; i++) {
- if (!cc->rpages[i])
- continue;
- redirty_page_for_writepage(wbc, cc->rpages[i]);
- unlock_page(cc->rpages[i]);
- }
- return err;
}
int f2fs_write_multi_pages(struct compress_ctx *cc,
#include <linux/swap.h>
#include <linux/prefetch.h>
#include <linux/uio.h>
-#include <linux/cleancache.h>
#include <linux/sched/signal.h>
#include <linux/fiemap.h>
+#include <linux/iomap.h>
#include "f2fs.h"
#include "node.h"
if (unlikely(is_inode_flag_set(dn->inode, FI_NO_ALLOC)))
return -EPERM;
- err = f2fs_get_node_info(sbi, dn->nid, &ni);
+ err = f2fs_get_node_info(sbi, dn->nid, &ni, false);
if (err)
return err;
f2fs_invalidate_compress_page(sbi, old_blkaddr);
}
f2fs_update_data_blkaddr(dn, dn->data_blkaddr);
-
- /*
- * i_size will be updated by direct_IO. Otherwise, we'll get stale
- * data from unwritten block via dio_read.
- */
return 0;
}
-int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from)
-{
- struct inode *inode = file_inode(iocb->ki_filp);
- struct f2fs_map_blocks map;
- int flag;
- int err = 0;
- bool direct_io = iocb->ki_flags & IOCB_DIRECT;
-
- map.m_lblk = F2FS_BLK_ALIGN(iocb->ki_pos);
- map.m_len = F2FS_BYTES_TO_BLK(iocb->ki_pos + iov_iter_count(from));
- if (map.m_len > map.m_lblk)
- map.m_len -= map.m_lblk;
- else
- map.m_len = 0;
-
- map.m_next_pgofs = NULL;
- map.m_next_extent = NULL;
- map.m_seg_type = NO_CHECK_TYPE;
- map.m_may_create = true;
-
- if (direct_io) {
- map.m_seg_type = f2fs_rw_hint_to_seg_type(iocb->ki_hint);
- flag = f2fs_force_buffered_io(inode, iocb, from) ?
- F2FS_GET_BLOCK_PRE_AIO :
- F2FS_GET_BLOCK_PRE_DIO;
- goto map_blocks;
- }
- if (iocb->ki_pos + iov_iter_count(from) > MAX_INLINE_DATA(inode)) {
- err = f2fs_convert_inline_inode(inode);
- if (err)
- return err;
- }
- if (f2fs_has_inline_data(inode))
- return err;
-
- flag = F2FS_GET_BLOCK_PRE_AIO;
-
-map_blocks:
- err = f2fs_map_blocks(inode, &map, 1, flag);
- if (map.m_len > 0 && err == -ENOSPC) {
- if (!direct_io)
- set_inode_flag(inode, FI_NO_PREALLOC);
- err = 0;
- }
- return err;
-}
-
void f2fs_do_map_lock(struct f2fs_sb_info *sbi, int flag, bool lock)
{
if (flag == F2FS_GET_BLOCK_PRE_AIO) {
flag != F2FS_GET_BLOCK_DIO);
err = __allocate_data_block(&dn,
map->m_seg_type);
- if (!err)
+ if (!err) {
+ if (flag == F2FS_GET_BLOCK_PRE_DIO)
+ file_need_truncate(inode);
set_inode_flag(inode, FI_APPEND_WRITE);
+ }
}
if (err)
goto sync_out;
return (blks << inode->i_blkbits);
}
-static int __get_data_block(struct inode *inode, sector_t iblock,
- struct buffer_head *bh, int create, int flag,
- pgoff_t *next_pgofs, int seg_type, bool may_write)
-{
- struct f2fs_map_blocks map;
- int err;
-
- map.m_lblk = iblock;
- map.m_len = bytes_to_blks(inode, bh->b_size);
- map.m_next_pgofs = next_pgofs;
- map.m_next_extent = NULL;
- map.m_seg_type = seg_type;
- map.m_may_create = may_write;
-
- err = f2fs_map_blocks(inode, &map, create, flag);
- if (!err) {
- map_bh(bh, inode->i_sb, map.m_pblk);
- bh->b_state = (bh->b_state & ~F2FS_MAP_FLAGS) | map.m_flags;
- bh->b_size = blks_to_bytes(inode, map.m_len);
-
- if (map.m_multidev_dio)
- bh->b_bdev = map.m_bdev;
- }
- return err;
-}
-
-static int get_data_block_dio_write(struct inode *inode, sector_t iblock,
- struct buffer_head *bh_result, int create)
-{
- return __get_data_block(inode, iblock, bh_result, create,
- F2FS_GET_BLOCK_DIO, NULL,
- f2fs_rw_hint_to_seg_type(inode->i_write_hint),
- true);
-}
-
-static int get_data_block_dio(struct inode *inode, sector_t iblock,
- struct buffer_head *bh_result, int create)
-{
- return __get_data_block(inode, iblock, bh_result, create,
- F2FS_GET_BLOCK_DIO, NULL,
- f2fs_rw_hint_to_seg_type(inode->i_write_hint),
- false);
-}
-
static int f2fs_xattr_fiemap(struct inode *inode,
struct fiemap_extent_info *fieinfo)
{
if (!page)
return -ENOMEM;
- err = f2fs_get_node_info(sbi, inode->i_ino, &ni);
+ err = f2fs_get_node_info(sbi, inode->i_ino, &ni, false);
if (err) {
f2fs_put_page(page, 1);
return err;
if (!page)
return -ENOMEM;
- err = f2fs_get_node_info(sbi, xnid, &ni);
+ err = f2fs_get_node_info(sbi, xnid, &ni, false);
if (err) {
f2fs_put_page(page, 1);
return err;
block_nr = map->m_pblk + block_in_file - map->m_lblk;
SetPageMappedToDisk(page);
- if (!PageUptodate(page) && (!PageSwapCache(page) &&
- !cleancache_get_page(page))) {
- SetPageUptodate(page);
- goto confused;
- }
-
if (!f2fs_is_valid_blkaddr(F2FS_I_SB(inode), block_nr,
DATA_GENERIC_ENHANCE_READ)) {
ret = -EFSCORRUPTED;
ClearPageError(page);
*last_block_in_bio = block_nr;
goto out;
-confused:
- if (bio) {
- __submit_bio(F2FS_I_SB(inode), bio, DATA);
- bio = NULL;
- }
- unlock_page(page);
out:
*bio_ret = bio;
return ret;
{
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+ /* The below cases were checked when setting it. */
+ if (f2fs_is_pinned_file(inode))
+ return false;
+ if (fio && is_sbi_flag_set(sbi, SBI_NEED_FSCK))
+ return true;
if (f2fs_lfs_mode(sbi))
return true;
if (S_ISDIR(inode->i_mode))
return true;
if (f2fs_is_atomic_file(inode))
return true;
- if (is_sbi_flag_set(sbi, SBI_NEED_FSCK))
- return true;
/* swap file is migrating in aligned write mode */
if (is_inode_flag_set(inode, FI_ALIGNED_WRITE))
fio->need_lock = LOCK_REQ;
}
- err = f2fs_get_node_info(fio->sbi, dn.nid, &ni);
+ err = f2fs_get_node_info(fio->sbi, dn.nid, &ni, false);
if (err)
goto out_writepage;
.rpages = NULL,
.nr_rpages = 0,
.cpages = NULL,
+ .valid_nr_cpages = 0,
.rbuf = NULL,
.cbuf = NULL,
.rlen = PAGE_SIZE * F2FS_I(inode)->i_cluster_size,
FS_CP_DATA_IO : FS_DATA_IO);
}
-static void f2fs_write_failed(struct inode *inode, loff_t to)
+void f2fs_write_failed(struct inode *inode, loff_t to)
{
loff_t i_size = i_size_read(inode);
int flag;
/*
- * we already allocated all the blocks, so we don't need to get
- * the block addresses when there is no need to fill the page.
+ * If a whole page is being written and we already preallocated all the
+ * blocks, then there is no need to get a block address now.
*/
- if (!f2fs_has_inline_data(inode) && len == PAGE_SIZE &&
- !is_inode_flag_set(inode, FI_NO_PREALLOC) &&
- !f2fs_verity_in_progress(inode))
+ if (len == PAGE_SIZE && is_inode_flag_set(inode, FI_PREALLOCATED_ALL))
return 0;
/* f2fs_lock_op avoids race between write CP and convert_inline_page */
return copied;
}
-static int check_direct_IO(struct inode *inode, struct iov_iter *iter,
- loff_t offset)
-{
- unsigned i_blkbits = READ_ONCE(inode->i_blkbits);
- unsigned blkbits = i_blkbits;
- unsigned blocksize_mask = (1 << blkbits) - 1;
- unsigned long align = offset | iov_iter_alignment(iter);
- struct block_device *bdev = inode->i_sb->s_bdev;
-
- if (iov_iter_rw(iter) == READ && offset >= i_size_read(inode))
- return 1;
-
- if (align & blocksize_mask) {
- if (bdev)
- blkbits = blksize_bits(bdev_logical_block_size(bdev));
- blocksize_mask = (1 << blkbits) - 1;
- if (align & blocksize_mask)
- return -EINVAL;
- return 1;
- }
- return 0;
-}
-
-static void f2fs_dio_end_io(struct bio *bio)
-{
- struct f2fs_private_dio *dio = bio->bi_private;
-
- dec_page_count(F2FS_I_SB(dio->inode),
- dio->write ? F2FS_DIO_WRITE : F2FS_DIO_READ);
-
- bio->bi_private = dio->orig_private;
- bio->bi_end_io = dio->orig_end_io;
-
- kfree(dio);
-
- bio_endio(bio);
-}
-
-static void f2fs_dio_submit_bio(struct bio *bio, struct inode *inode,
- loff_t file_offset)
-{
- struct f2fs_private_dio *dio;
- bool write = (bio_op(bio) == REQ_OP_WRITE);
-
- dio = f2fs_kzalloc(F2FS_I_SB(inode),
- sizeof(struct f2fs_private_dio), GFP_NOFS);
- if (!dio)
- goto out;
-
- dio->inode = inode;
- dio->orig_end_io = bio->bi_end_io;
- dio->orig_private = bio->bi_private;
- dio->write = write;
-
- bio->bi_end_io = f2fs_dio_end_io;
- bio->bi_private = dio;
-
- inc_page_count(F2FS_I_SB(inode),
- write ? F2FS_DIO_WRITE : F2FS_DIO_READ);
-
- submit_bio(bio);
- return;
-out:
- bio->bi_status = BLK_STS_IOERR;
- bio_endio(bio);
-}
-
-static ssize_t f2fs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
-{
- struct address_space *mapping = iocb->ki_filp->f_mapping;
- struct inode *inode = mapping->host;
- struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
- struct f2fs_inode_info *fi = F2FS_I(inode);
- size_t count = iov_iter_count(iter);
- loff_t offset = iocb->ki_pos;
- int rw = iov_iter_rw(iter);
- int err;
- enum rw_hint hint = iocb->ki_hint;
- int whint_mode = F2FS_OPTION(sbi).whint_mode;
- bool do_opu;
-
- err = check_direct_IO(inode, iter, offset);
- if (err)
- return err < 0 ? err : 0;
-
- if (f2fs_force_buffered_io(inode, iocb, iter))
- return 0;
-
- do_opu = rw == WRITE && f2fs_lfs_mode(sbi);
-
- trace_f2fs_direct_IO_enter(inode, offset, count, rw);
-
- if (rw == WRITE && whint_mode == WHINT_MODE_OFF)
- iocb->ki_hint = WRITE_LIFE_NOT_SET;
-
- if (iocb->ki_flags & IOCB_NOWAIT) {
- if (!down_read_trylock(&fi->i_gc_rwsem[rw])) {
- iocb->ki_hint = hint;
- err = -EAGAIN;
- goto out;
- }
- if (do_opu && !down_read_trylock(&fi->i_gc_rwsem[READ])) {
- up_read(&fi->i_gc_rwsem[rw]);
- iocb->ki_hint = hint;
- err = -EAGAIN;
- goto out;
- }
- } else {
- down_read(&fi->i_gc_rwsem[rw]);
- if (do_opu)
- down_read(&fi->i_gc_rwsem[READ]);
- }
-
- err = __blockdev_direct_IO(iocb, inode, inode->i_sb->s_bdev,
- iter, rw == WRITE ? get_data_block_dio_write :
- get_data_block_dio, NULL, f2fs_dio_submit_bio,
- rw == WRITE ? DIO_LOCKING | DIO_SKIP_HOLES :
- DIO_SKIP_HOLES);
-
- if (do_opu)
- up_read(&fi->i_gc_rwsem[READ]);
-
- up_read(&fi->i_gc_rwsem[rw]);
-
- if (rw == WRITE) {
- if (whint_mode == WHINT_MODE_OFF)
- iocb->ki_hint = hint;
- if (err > 0) {
- f2fs_update_iostat(F2FS_I_SB(inode), APP_DIRECT_IO,
- err);
- if (!do_opu)
- set_inode_flag(inode, FI_UPDATE_WRITE);
- } else if (err == -EIOCBQUEUED) {
- f2fs_update_iostat(F2FS_I_SB(inode), APP_DIRECT_IO,
- count - iov_iter_count(iter));
- } else if (err < 0) {
- f2fs_write_failed(inode, offset + count);
- }
- } else {
- if (err > 0)
- f2fs_update_iostat(sbi, APP_DIRECT_READ_IO, err);
- else if (err == -EIOCBQUEUED)
- f2fs_update_iostat(F2FS_I_SB(inode), APP_DIRECT_READ_IO,
- count - iov_iter_count(iter));
- }
-
-out:
- trace_f2fs_direct_IO_exit(inode, offset, count, rw, err);
-
- return err;
-}
-
void f2fs_invalidate_page(struct page *page, unsigned int offset,
unsigned int length)
{
clear_page_private_gcing(page);
- if (test_opt(sbi, COMPRESS_CACHE)) {
- if (f2fs_compressed_file(inode))
- f2fs_invalidate_compress_pages(sbi, inode->i_ino);
- if (inode->i_ino == F2FS_COMPRESS_INO(sbi))
- clear_page_private_data(page);
- }
+ if (test_opt(sbi, COMPRESS_CACHE) &&
+ inode->i_ino == F2FS_COMPRESS_INO(sbi))
+ clear_page_private_data(page);
if (page_private_atomic(page))
return f2fs_drop_inmem_page(inode, page);
return 0;
if (test_opt(F2FS_P_SB(page), COMPRESS_CACHE)) {
- struct f2fs_sb_info *sbi = F2FS_P_SB(page);
struct inode *inode = page->mapping->host;
- if (f2fs_compressed_file(inode))
- f2fs_invalidate_compress_pages(sbi, inode->i_ino);
- if (inode->i_ino == F2FS_COMPRESS_INO(sbi))
+ if (inode->i_ino == F2FS_COMPRESS_INO(F2FS_I_SB(inode)))
clear_page_private_data(page);
}
.set_page_dirty = f2fs_set_data_page_dirty,
.invalidatepage = f2fs_invalidate_page,
.releasepage = f2fs_release_page,
- .direct_IO = f2fs_direct_IO,
+ .direct_IO = noop_direct_IO,
.bmap = f2fs_bmap,
.swap_activate = f2fs_swap_activate,
.swap_deactivate = f2fs_swap_deactivate,
{
kmem_cache_destroy(bio_entry_slab);
}
+
+static int f2fs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
+ unsigned int flags, struct iomap *iomap,
+ struct iomap *srcmap)
+{
+ struct f2fs_map_blocks map = {};
+ pgoff_t next_pgofs = 0;
+ int err;
+
+ map.m_lblk = bytes_to_blks(inode, offset);
+ map.m_len = bytes_to_blks(inode, offset + length - 1) - map.m_lblk + 1;
+ map.m_next_pgofs = &next_pgofs;
+ map.m_seg_type = f2fs_rw_hint_to_seg_type(inode->i_write_hint);
+ if (flags & IOMAP_WRITE)
+ map.m_may_create = true;
+
+ err = f2fs_map_blocks(inode, &map, flags & IOMAP_WRITE,
+ F2FS_GET_BLOCK_DIO);
+ if (err)
+ return err;
+
+ iomap->offset = blks_to_bytes(inode, map.m_lblk);
+
+ if (map.m_flags & (F2FS_MAP_MAPPED | F2FS_MAP_UNWRITTEN)) {
+ iomap->length = blks_to_bytes(inode, map.m_len);
+ if (map.m_flags & F2FS_MAP_MAPPED) {
+ iomap->type = IOMAP_MAPPED;
+ iomap->flags |= IOMAP_F_MERGED;
+ } else {
+ iomap->type = IOMAP_UNWRITTEN;
+ }
+ if (WARN_ON_ONCE(!__is_valid_data_blkaddr(map.m_pblk)))
+ return -EINVAL;
+
+ iomap->bdev = map.m_bdev;
+ iomap->addr = blks_to_bytes(inode, map.m_pblk);
+ } else {
+ iomap->length = blks_to_bytes(inode, next_pgofs) -
+ iomap->offset;
+ iomap->type = IOMAP_HOLE;
+ iomap->addr = IOMAP_NULL_ADDR;
+ }
+
+ if (map.m_flags & F2FS_MAP_NEW)
+ iomap->flags |= IOMAP_F_NEW;
+ if ((inode->i_state & I_DIRTY_DATASYNC) ||
+ offset + length > i_size_read(inode))
+ iomap->flags |= IOMAP_F_DIRTY;
+
+ return 0;
+}
+
+const struct iomap_ops f2fs_iomap_ops = {
+ .iomap_begin = f2fs_iomap_begin,
+};
FAULT_WRITE_IO,
FAULT_SLAB_ALLOC,
FAULT_DQUOT_INIT,
+ FAULT_LOCK_OP,
FAULT_MAX,
};
#define FADVISE_KEEP_SIZE_BIT 0x10
#define FADVISE_HOT_BIT 0x20
#define FADVISE_VERITY_BIT 0x40
+#define FADVISE_TRUNC_BIT 0x80
#define FADVISE_MODIFIABLE_BITS (FADVISE_COLD_BIT | FADVISE_HOT_BIT)
#define file_is_verity(inode) is_file(inode, FADVISE_VERITY_BIT)
#define file_set_verity(inode) set_file(inode, FADVISE_VERITY_BIT)
+#define file_should_truncate(inode) is_file(inode, FADVISE_TRUNC_BIT)
+#define file_need_truncate(inode) set_file(inode, FADVISE_TRUNC_BIT)
+#define file_dont_truncate(inode) clear_file(inode, FADVISE_TRUNC_BIT)
+
#define DEF_DIR_LEVEL 0
enum {
FI_INLINE_DOTS, /* indicate inline dot dentries */
FI_DO_DEFRAG, /* indicate defragment is running */
FI_DIRTY_FILE, /* indicate regular/symlink has dirty pages */
- FI_NO_PREALLOC, /* indicate skipped preallocated blocks */
+ FI_PREALLOCATED_ALL, /* all blocks for write were preallocated */
FI_HOT_DATA, /* indicate file is hot */
FI_EXTRA_ATTR, /* indicate file has extra attribute */
FI_PROJ_INHERIT, /* indicate file inherits projectid */
unsigned int segment_count; /* total # of segments */
unsigned int main_segments; /* # of segments in main area */
unsigned int reserved_segments; /* # of reserved segments */
+ unsigned int additional_reserved_segments;/* reserved segs for IO align feature */
unsigned int ovp_segments; /* # of overprovision segments */
/* a threshold to reclaim prefree segments */
unsigned int nr_rpages; /* total page number in rpages */
struct page **cpages; /* pages store compressed data in cluster */
unsigned int nr_cpages; /* total page number in cpages */
+ unsigned int valid_nr_cpages; /* valid page number in cpages */
void *rbuf; /* virtual mapped address on rpages */
struct compress_data *cbuf; /* virtual mapped address on cpages */
size_t rlen; /* valid data length in rbuf */
unsigned int cur_victim_sec; /* current victim section num */
unsigned int gc_mode; /* current GC state */
unsigned int next_victim_seg[2]; /* next segment in victim section */
+ spinlock_t gc_urgent_high_lock;
+ bool gc_urgent_high_limited; /* indicates having limited trial count */
+ unsigned int gc_urgent_high_remaining; /* remaining trial count for GC_URGENT_HIGH */
/* for skip statistic */
unsigned int atomic_files; /* # of opened atomic file */
#endif
};
-struct f2fs_private_dio {
- struct inode *inode;
- void *orig_private;
- bio_end_io_t *orig_end_io;
- bool write;
-};
-
#ifdef CONFIG_F2FS_FAULT_INJECTION
#define f2fs_show_injection_info(sbi, type) \
printk_ratelimited("%sF2FS-fs (%s) : inject %s in %s of %pS\n", \
static inline int f2fs_trylock_op(struct f2fs_sb_info *sbi)
{
+ if (time_to_inject(sbi, FAULT_LOCK_OP)) {
+ f2fs_show_injection_info(sbi, FAULT_LOCK_OP);
+ return 0;
+ }
return down_read_trylock(&sbi->cp_rwsem);
}
if (!__allow_reserved_blocks(sbi, inode, true))
avail_user_block_count -= F2FS_OPTION(sbi).root_reserved_blocks;
+
+ if (F2FS_IO_ALIGNED(sbi))
+ avail_user_block_count -= sbi->blocks_per_seg *
+ SM_I(sbi)->additional_reserved_segments;
+
if (unlikely(is_sbi_flag_set(sbi, SBI_CP_DISABLED))) {
if (avail_user_block_count > sbi->unusable_block_count)
avail_user_block_count -= sbi->unusable_block_count;
if (!__allow_reserved_blocks(sbi, inode, false))
valid_block_count += F2FS_OPTION(sbi).root_reserved_blocks;
+
+ if (F2FS_IO_ALIGNED(sbi))
+ valid_block_count += sbi->blocks_per_seg *
+ SM_I(sbi)->additional_reserved_segments;
+
user_block_count = sbi->user_block_count;
if (unlikely(is_sbi_flag_set(sbi, SBI_CP_DISABLED)))
user_block_count -= sbi->unusable_block_count;
static inline void set_file(struct inode *inode, int type)
{
+ if (is_file(inode, type))
+ return;
F2FS_I(inode)->i_advise |= type;
f2fs_mark_inode_dirty_sync(inode, true);
}
static inline void clear_file(struct inode *inode, int type)
{
+ if (!is_file(inode, type))
+ return;
F2FS_I(inode)->i_advise &= ~type;
f2fs_mark_inode_dirty_sync(inode, true);
}
bool f2fs_is_checkpointed_node(struct f2fs_sb_info *sbi, nid_t nid);
bool f2fs_need_inode_block_update(struct f2fs_sb_info *sbi, nid_t ino);
int f2fs_get_node_info(struct f2fs_sb_info *sbi, nid_t nid,
- struct node_info *ni);
+ struct node_info *ni, bool checkpoint_context);
pgoff_t f2fs_get_next_page_offset(struct dnode_of_data *dn, pgoff_t pgofs);
int f2fs_get_dnode_of_data(struct dnode_of_data *dn, pgoff_t index, int mode);
int f2fs_truncate_inode_blocks(struct inode *inode, pgoff_t from);
int f2fs_reserve_new_blocks(struct dnode_of_data *dn, blkcnt_t count);
int f2fs_reserve_new_block(struct dnode_of_data *dn);
int f2fs_get_block(struct dnode_of_data *dn, pgoff_t index);
-int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from);
int f2fs_reserve_block(struct dnode_of_data *dn, pgoff_t index);
struct page *f2fs_get_read_data_page(struct inode *inode, pgoff_t index,
int op_flags, bool for_write);
struct writeback_control *wbc,
enum iostat_type io_type,
int compr_blocks, bool allow_balance);
+void f2fs_write_failed(struct inode *inode, loff_t to);
void f2fs_invalidate_page(struct page *page, unsigned int offset,
unsigned int length);
int f2fs_release_page(struct page *page, gfp_t wait);
void f2fs_destroy_post_read_processing(void);
int f2fs_init_post_read_wq(struct f2fs_sb_info *sbi);
void f2fs_destroy_post_read_wq(struct f2fs_sb_info *sbi);
+extern const struct iomap_ops f2fs_iomap_ops;
/*
* gc.c
#include <linux/sched/signal.h>
#include <linux/fileattr.h>
#include <linux/fadvise.h>
+#include <linux/iomap.h>
#include "f2fs.h"
#include "node.h"
if (ret)
return ret;
- ret = f2fs_get_node_info(sbi, dn.nid, &ni);
+ ret = f2fs_get_node_info(sbi, dn.nid, &ni, false);
if (ret) {
f2fs_put_dnode(&dn);
return ret;
map.m_seg_type = CURSEG_COLD_DATA_PINNED;
err = f2fs_map_blocks(inode, &map, 1, F2FS_GET_BLOCK_PRE_DIO);
+ file_dont_truncate(inode);
up_write(&sbi->pin_sem);
(mode & (FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_INSERT_RANGE)))
return -EOPNOTSUPP;
- if (f2fs_compressed_file(inode) &&
+ /*
+ * Pinned file should not support partial trucation since the block
+ * can be used by applications.
+ */
+ if ((f2fs_compressed_file(inode) || f2fs_is_pinned_file(inode)) &&
(mode & (FALLOC_FL_PUNCH_HOLE | FALLOC_FL_COLLAPSE_RANGE |
FALLOC_FL_ZERO_RANGE | FALLOC_FL_INSERT_RANGE)))
return -EOPNOTSUPP;
inode_lock(inode);
- if (f2fs_should_update_outplace(inode, NULL)) {
- ret = -EINVAL;
- goto out;
- }
-
if (!pin) {
clear_inode_flag(inode, FI_PIN_FILE);
f2fs_i_gc_failures_write(inode, 0);
goto done;
}
+ if (f2fs_should_update_outplace(inode, NULL)) {
+ ret = -EINVAL;
+ goto out;
+ }
+
if (f2fs_pin_file_control(inode, false)) {
ret = -EAGAIN;
goto out;
return __f2fs_ioctl(filp, cmd, arg);
}
-static ssize_t f2fs_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
+/*
+ * Return %true if the given read or write request should use direct I/O, or
+ * %false if it should use buffered I/O.
+ */
+static bool f2fs_should_use_dio(struct inode *inode, struct kiocb *iocb,
+ struct iov_iter *iter)
+{
+ unsigned int align;
+
+ if (!(iocb->ki_flags & IOCB_DIRECT))
+ return false;
+
+ if (f2fs_force_buffered_io(inode, iocb, iter))
+ return false;
+
+ /*
+ * Direct I/O not aligned to the disk's logical_block_size will be
+ * attempted, but will fail with -EINVAL.
+ *
+ * f2fs additionally requires that direct I/O be aligned to the
+ * filesystem block size, which is often a stricter requirement.
+ * However, f2fs traditionally falls back to buffered I/O on requests
+ * that are logical_block_size-aligned but not fs-block aligned.
+ *
+ * The below logic implements this behavior.
+ */
+ align = iocb->ki_pos | iov_iter_alignment(iter);
+ if (!IS_ALIGNED(align, i_blocksize(inode)) &&
+ IS_ALIGNED(align, bdev_logical_block_size(inode->i_sb->s_bdev)))
+ return false;
+
+ return true;
+}
+
+static int f2fs_dio_read_end_io(struct kiocb *iocb, ssize_t size, int error,
+ unsigned int flags)
+{
+ struct f2fs_sb_info *sbi = F2FS_I_SB(file_inode(iocb->ki_filp));
+
+ dec_page_count(sbi, F2FS_DIO_READ);
+ if (error)
+ return error;
+ f2fs_update_iostat(sbi, APP_DIRECT_READ_IO, size);
+ return 0;
+}
+
+static const struct iomap_dio_ops f2fs_iomap_dio_read_ops = {
+ .end_io = f2fs_dio_read_end_io,
+};
+
+static ssize_t f2fs_dio_read_iter(struct kiocb *iocb, struct iov_iter *to)
{
struct file *file = iocb->ki_filp;
struct inode *inode = file_inode(file);
- int ret;
+ struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+ struct f2fs_inode_info *fi = F2FS_I(inode);
+ const loff_t pos = iocb->ki_pos;
+ const size_t count = iov_iter_count(to);
+ struct iomap_dio *dio;
+ ssize_t ret;
+
+ if (count == 0)
+ return 0; /* skip atime update */
+
+ trace_f2fs_direct_IO_enter(inode, iocb, count, READ);
+
+ if (iocb->ki_flags & IOCB_NOWAIT) {
+ if (!down_read_trylock(&fi->i_gc_rwsem[READ])) {
+ ret = -EAGAIN;
+ goto out;
+ }
+ } else {
+ down_read(&fi->i_gc_rwsem[READ]);
+ }
+
+ /*
+ * We have to use __iomap_dio_rw() and iomap_dio_complete() instead of
+ * the higher-level function iomap_dio_rw() in order to ensure that the
+ * F2FS_DIO_READ counter will be decremented correctly in all cases.
+ */
+ inc_page_count(sbi, F2FS_DIO_READ);
+ dio = __iomap_dio_rw(iocb, to, &f2fs_iomap_ops,
+ &f2fs_iomap_dio_read_ops, 0, 0);
+ if (IS_ERR_OR_NULL(dio)) {
+ ret = PTR_ERR_OR_ZERO(dio);
+ if (ret != -EIOCBQUEUED)
+ dec_page_count(sbi, F2FS_DIO_READ);
+ } else {
+ ret = iomap_dio_complete(dio);
+ }
+
+ up_read(&fi->i_gc_rwsem[READ]);
+
+ file_accessed(file);
+out:
+ trace_f2fs_direct_IO_exit(inode, pos, count, READ, ret);
+ return ret;
+}
+
+static ssize_t f2fs_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
+{
+ struct inode *inode = file_inode(iocb->ki_filp);
+ ssize_t ret;
if (!f2fs_is_compress_backend_ready(inode))
return -EOPNOTSUPP;
- ret = generic_file_read_iter(iocb, iter);
+ if (f2fs_should_use_dio(inode, iocb, to))
+ return f2fs_dio_read_iter(iocb, to);
+ ret = filemap_read(iocb, to, 0);
if (ret > 0)
- f2fs_update_iostat(F2FS_I_SB(inode), APP_READ_IO, ret);
+ f2fs_update_iostat(F2FS_I_SB(inode), APP_BUFFERED_READ_IO, ret);
+ return ret;
+}
+
+static ssize_t f2fs_write_checks(struct kiocb *iocb, struct iov_iter *from)
+{
+ struct file *file = iocb->ki_filp;
+ struct inode *inode = file_inode(file);
+ ssize_t count;
+ int err;
+
+ if (IS_IMMUTABLE(inode))
+ return -EPERM;
+
+ if (is_inode_flag_set(inode, FI_COMPRESS_RELEASED))
+ return -EPERM;
+
+ count = generic_write_checks(iocb, from);
+ if (count <= 0)
+ return count;
+
+ err = file_modified(file);
+ if (err)
+ return err;
+ return count;
+}
+
+/*
+ * Preallocate blocks for a write request, if it is possible and helpful to do
+ * so. Returns a positive number if blocks may have been preallocated, 0 if no
+ * blocks were preallocated, or a negative errno value if something went
+ * seriously wrong. Also sets FI_PREALLOCATED_ALL on the inode if *all* the
+ * requested blocks (not just some of them) have been allocated.
+ */
+static int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *iter,
+ bool dio)
+{
+ struct inode *inode = file_inode(iocb->ki_filp);
+ struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+ const loff_t pos = iocb->ki_pos;
+ const size_t count = iov_iter_count(iter);
+ struct f2fs_map_blocks map = {};
+ int flag;
+ int ret;
+
+ /* If it will be an out-of-place direct write, don't bother. */
+ if (dio && f2fs_lfs_mode(sbi))
+ return 0;
+ /*
+ * Don't preallocate holes aligned to DIO_SKIP_HOLES which turns into
+ * buffered IO, if DIO meets any holes.
+ */
+ if (dio && i_size_read(inode) &&
+ (F2FS_BYTES_TO_BLK(pos) < F2FS_BLK_ALIGN(i_size_read(inode))))
+ return 0;
+
+ /* No-wait I/O can't allocate blocks. */
+ if (iocb->ki_flags & IOCB_NOWAIT)
+ return 0;
+
+ /* If it will be a short write, don't bother. */
+ if (fault_in_iov_iter_readable(iter, count))
+ return 0;
+
+ if (f2fs_has_inline_data(inode)) {
+ /* If the data will fit inline, don't bother. */
+ if (pos + count <= MAX_INLINE_DATA(inode))
+ return 0;
+ ret = f2fs_convert_inline_inode(inode);
+ if (ret)
+ return ret;
+ }
+
+ /* Do not preallocate blocks that will be written partially in 4KB. */
+ map.m_lblk = F2FS_BLK_ALIGN(pos);
+ map.m_len = F2FS_BYTES_TO_BLK(pos + count);
+ if (map.m_len > map.m_lblk)
+ map.m_len -= map.m_lblk;
+ else
+ map.m_len = 0;
+ map.m_may_create = true;
+ if (dio) {
+ map.m_seg_type = f2fs_rw_hint_to_seg_type(inode->i_write_hint);
+ flag = F2FS_GET_BLOCK_PRE_DIO;
+ } else {
+ map.m_seg_type = NO_CHECK_TYPE;
+ flag = F2FS_GET_BLOCK_PRE_AIO;
+ }
+
+ ret = f2fs_map_blocks(inode, &map, 1, flag);
+ /* -ENOSPC|-EDQUOT are fine to report the number of allocated blocks. */
+ if (ret < 0 && !((ret == -ENOSPC || ret == -EDQUOT) && map.m_len > 0))
+ return ret;
+ if (ret == 0)
+ set_inode_flag(inode, FI_PREALLOCATED_ALL);
+ return map.m_len;
+}
+
+static ssize_t f2fs_buffered_write_iter(struct kiocb *iocb,
+ struct iov_iter *from)
+{
+ struct file *file = iocb->ki_filp;
+ struct inode *inode = file_inode(file);
+ ssize_t ret;
+ if (iocb->ki_flags & IOCB_NOWAIT)
+ return -EOPNOTSUPP;
+
+ current->backing_dev_info = inode_to_bdi(inode);
+ ret = generic_perform_write(file, from, iocb->ki_pos);
+ current->backing_dev_info = NULL;
+
+ if (ret > 0) {
+ iocb->ki_pos += ret;
+ f2fs_update_iostat(F2FS_I_SB(inode), APP_BUFFERED_IO, ret);
+ }
return ret;
}
-static ssize_t f2fs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
+static int f2fs_dio_write_end_io(struct kiocb *iocb, ssize_t size, int error,
+ unsigned int flags)
+{
+ struct f2fs_sb_info *sbi = F2FS_I_SB(file_inode(iocb->ki_filp));
+
+ dec_page_count(sbi, F2FS_DIO_WRITE);
+ if (error)
+ return error;
+ f2fs_update_iostat(sbi, APP_DIRECT_IO, size);
+ return 0;
+}
+
+static const struct iomap_dio_ops f2fs_iomap_dio_write_ops = {
+ .end_io = f2fs_dio_write_end_io,
+};
+
+static ssize_t f2fs_dio_write_iter(struct kiocb *iocb, struct iov_iter *from,
+ bool *may_need_sync)
{
struct file *file = iocb->ki_filp;
struct inode *inode = file_inode(file);
+ struct f2fs_inode_info *fi = F2FS_I(inode);
+ struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+ const bool do_opu = f2fs_lfs_mode(sbi);
+ const int whint_mode = F2FS_OPTION(sbi).whint_mode;
+ const loff_t pos = iocb->ki_pos;
+ const ssize_t count = iov_iter_count(from);
+ const enum rw_hint hint = iocb->ki_hint;
+ unsigned int dio_flags;
+ struct iomap_dio *dio;
+ ssize_t ret;
+
+ trace_f2fs_direct_IO_enter(inode, iocb, count, WRITE);
+
+ if (iocb->ki_flags & IOCB_NOWAIT) {
+ /* f2fs_convert_inline_inode() and block allocation can block */
+ if (f2fs_has_inline_data(inode) ||
+ !f2fs_overwrite_io(inode, pos, count)) {
+ ret = -EAGAIN;
+ goto out;
+ }
+
+ if (!down_read_trylock(&fi->i_gc_rwsem[WRITE])) {
+ ret = -EAGAIN;
+ goto out;
+ }
+ if (do_opu && !down_read_trylock(&fi->i_gc_rwsem[READ])) {
+ up_read(&fi->i_gc_rwsem[WRITE]);
+ ret = -EAGAIN;
+ goto out;
+ }
+ } else {
+ ret = f2fs_convert_inline_inode(inode);
+ if (ret)
+ goto out;
+
+ down_read(&fi->i_gc_rwsem[WRITE]);
+ if (do_opu)
+ down_read(&fi->i_gc_rwsem[READ]);
+ }
+ if (whint_mode == WHINT_MODE_OFF)
+ iocb->ki_hint = WRITE_LIFE_NOT_SET;
+
+ /*
+ * We have to use __iomap_dio_rw() and iomap_dio_complete() instead of
+ * the higher-level function iomap_dio_rw() in order to ensure that the
+ * F2FS_DIO_WRITE counter will be decremented correctly in all cases.
+ */
+ inc_page_count(sbi, F2FS_DIO_WRITE);
+ dio_flags = 0;
+ if (pos + count > inode->i_size)
+ dio_flags |= IOMAP_DIO_FORCE_WAIT;
+ dio = __iomap_dio_rw(iocb, from, &f2fs_iomap_ops,
+ &f2fs_iomap_dio_write_ops, dio_flags, 0);
+ if (IS_ERR_OR_NULL(dio)) {
+ ret = PTR_ERR_OR_ZERO(dio);
+ if (ret == -ENOTBLK)
+ ret = 0;
+ if (ret != -EIOCBQUEUED)
+ dec_page_count(sbi, F2FS_DIO_WRITE);
+ } else {
+ ret = iomap_dio_complete(dio);
+ }
+
+ if (whint_mode == WHINT_MODE_OFF)
+ iocb->ki_hint = hint;
+ if (do_opu)
+ up_read(&fi->i_gc_rwsem[READ]);
+ up_read(&fi->i_gc_rwsem[WRITE]);
+
+ if (ret < 0)
+ goto out;
+ if (pos + ret > inode->i_size)
+ f2fs_i_size_write(inode, pos + ret);
+ if (!do_opu)
+ set_inode_flag(inode, FI_UPDATE_WRITE);
+
+ if (iov_iter_count(from)) {
+ ssize_t ret2;
+ loff_t bufio_start_pos = iocb->ki_pos;
+
+ /*
+ * The direct write was partial, so we need to fall back to a
+ * buffered write for the remainder.
+ */
+
+ ret2 = f2fs_buffered_write_iter(iocb, from);
+ if (iov_iter_count(from))
+ f2fs_write_failed(inode, iocb->ki_pos);
+ if (ret2 < 0)
+ goto out;
+
+ /*
+ * Ensure that the pagecache pages are written to disk and
+ * invalidated to preserve the expected O_DIRECT semantics.
+ */
+ if (ret2 > 0) {
+ loff_t bufio_end_pos = bufio_start_pos + ret2 - 1;
+
+ ret += ret2;
+
+ ret2 = filemap_write_and_wait_range(file->f_mapping,
+ bufio_start_pos,
+ bufio_end_pos);
+ if (ret2 < 0)
+ goto out;
+ invalidate_mapping_pages(file->f_mapping,
+ bufio_start_pos >> PAGE_SHIFT,
+ bufio_end_pos >> PAGE_SHIFT);
+ }
+ } else {
+ /* iomap_dio_rw() already handled the generic_write_sync(). */
+ *may_need_sync = false;
+ }
+out:
+ trace_f2fs_direct_IO_exit(inode, pos, count, WRITE, ret);
+ return ret;
+}
+
+static ssize_t f2fs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
+{
+ struct inode *inode = file_inode(iocb->ki_filp);
+ const loff_t orig_pos = iocb->ki_pos;
+ const size_t orig_count = iov_iter_count(from);
+ loff_t target_size;
+ bool dio;
+ bool may_need_sync = true;
+ int preallocated;
ssize_t ret;
if (unlikely(f2fs_cp_error(F2FS_I_SB(inode)))) {
inode_lock(inode);
}
- if (unlikely(IS_IMMUTABLE(inode))) {
- ret = -EPERM;
- goto unlock;
- }
-
- if (is_inode_flag_set(inode, FI_COMPRESS_RELEASED)) {
- ret = -EPERM;
- goto unlock;
- }
-
- ret = generic_write_checks(iocb, from);
- if (ret > 0) {
- bool preallocated = false;
- size_t target_size = 0;
- int err;
-
- if (fault_in_iov_iter_readable(from, iov_iter_count(from)))
- set_inode_flag(inode, FI_NO_PREALLOC);
-
- if ((iocb->ki_flags & IOCB_NOWAIT)) {
- if (!f2fs_overwrite_io(inode, iocb->ki_pos,
- iov_iter_count(from)) ||
- f2fs_has_inline_data(inode) ||
- f2fs_force_buffered_io(inode, iocb, from)) {
- clear_inode_flag(inode, FI_NO_PREALLOC);
- inode_unlock(inode);
- ret = -EAGAIN;
- goto out;
- }
- goto write;
- }
-
- if (is_inode_flag_set(inode, FI_NO_PREALLOC))
- goto write;
-
- if (iocb->ki_flags & IOCB_DIRECT) {
- /*
- * Convert inline data for Direct I/O before entering
- * f2fs_direct_IO().
- */
- err = f2fs_convert_inline_inode(inode);
- if (err)
- goto out_err;
- /*
- * If force_buffere_io() is true, we have to allocate
- * blocks all the time, since f2fs_direct_IO will fall
- * back to buffered IO.
- */
- if (!f2fs_force_buffered_io(inode, iocb, from) &&
- f2fs_lfs_mode(F2FS_I_SB(inode)))
- goto write;
- }
- preallocated = true;
- target_size = iocb->ki_pos + iov_iter_count(from);
+ ret = f2fs_write_checks(iocb, from);
+ if (ret <= 0)
+ goto out_unlock;
- err = f2fs_preallocate_blocks(iocb, from);
- if (err) {
-out_err:
- clear_inode_flag(inode, FI_NO_PREALLOC);
- inode_unlock(inode);
- ret = err;
- goto out;
- }
-write:
- ret = __generic_file_write_iter(iocb, from);
- clear_inode_flag(inode, FI_NO_PREALLOC);
+ /* Determine whether we will do a direct write or a buffered write. */
+ dio = f2fs_should_use_dio(inode, iocb, from);
- /* if we couldn't write data, we should deallocate blocks. */
- if (preallocated && i_size_read(inode) < target_size) {
- down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
- filemap_invalidate_lock(inode->i_mapping);
- f2fs_truncate(inode);
- filemap_invalidate_unlock(inode->i_mapping);
- up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
- }
+ /* Possibly preallocate the blocks for the write. */
+ target_size = iocb->ki_pos + iov_iter_count(from);
+ preallocated = f2fs_preallocate_blocks(iocb, from, dio);
+ if (preallocated < 0)
+ ret = preallocated;
+ else
+ /* Do the actual write. */
+ ret = dio ?
+ f2fs_dio_write_iter(iocb, from, &may_need_sync):
+ f2fs_buffered_write_iter(iocb, from);
- if (ret > 0)
- f2fs_update_iostat(F2FS_I_SB(inode), APP_WRITE_IO, ret);
+ /* Don't leave any preallocated blocks around past i_size. */
+ if (preallocated && i_size_read(inode) < target_size) {
+ down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
+ filemap_invalidate_lock(inode->i_mapping);
+ if (!f2fs_truncate(inode))
+ file_dont_truncate(inode);
+ filemap_invalidate_unlock(inode->i_mapping);
+ up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
+ } else {
+ file_dont_truncate(inode);
}
-unlock:
+
+ clear_inode_flag(inode, FI_PREALLOCATED_ALL);
+out_unlock:
inode_unlock(inode);
out:
- trace_f2fs_file_write_iter(inode, iocb->ki_pos,
- iov_iter_count(from), ret);
- if (ret > 0)
+ trace_f2fs_file_write_iter(inode, orig_pos, orig_count, ret);
+ if (ret > 0 && may_need_sync)
ret = generic_write_sync(iocb, ret);
return ret;
}
static int f2fs_file_fadvise(struct file *filp, loff_t offset, loff_t len,
int advice)
{
- struct inode *inode;
struct address_space *mapping;
struct backing_dev_info *bdi;
+ struct inode *inode = file_inode(filp);
+ int err;
if (advice == POSIX_FADV_SEQUENTIAL) {
- inode = file_inode(filp);
if (S_ISFIFO(inode->i_mode))
return -ESPIPE;
return 0;
}
- return generic_fadvise(filp, offset, len, advice);
+ err = generic_fadvise(filp, offset, len, advice);
+ if (!err && advice == POSIX_FADV_DONTNEED &&
+ test_opt(F2FS_I_SB(inode), COMPRESS_CACHE) &&
+ f2fs_compressed_file(inode))
+ f2fs_invalidate_compress_pages(F2FS_I_SB(inode), inode->i_ino);
+
+ return err;
}
#ifdef CONFIG_COMPAT
* So, I'd like to wait some time to collect dirty segments.
*/
if (sbi->gc_mode == GC_URGENT_HIGH) {
+ spin_lock(&sbi->gc_urgent_high_lock);
+ if (sbi->gc_urgent_high_limited) {
+ if (!sbi->gc_urgent_high_remaining) {
+ sbi->gc_urgent_high_limited = false;
+ spin_unlock(&sbi->gc_urgent_high_lock);
+ sbi->gc_mode = GC_NORMAL;
+ continue;
+ }
+ sbi->gc_urgent_high_remaining--;
+ }
+ spin_unlock(&sbi->gc_urgent_high_lock);
+
wait_ms = gc_th->urgent_sleep_time;
down_write(&sbi->gc_lock);
goto do_gc;
continue;
}
- if (f2fs_get_node_info(sbi, nid, &ni)) {
+ if (f2fs_get_node_info(sbi, nid, &ni, false)) {
f2fs_put_page(node_page, 1);
continue;
}
if (IS_ERR(node_page))
return false;
- if (f2fs_get_node_info(sbi, nid, dni)) {
+ if (f2fs_get_node_info(sbi, nid, dni, false)) {
f2fs_put_page(node_page, 1);
return false;
}
set_sbi_flag(sbi, SBI_NEED_FSCK);
}
+ if (f2fs_check_nid_range(sbi, dni->ino))
+ return false;
+
*nofs = ofs_of_node(node_page);
source_blkaddr = data_blkaddr(NULL, node_page, ofs_in_node);
f2fs_put_page(node_page, 1);
if (!test_and_set_bit(segno, SIT_I(sbi)->invalid_segmap)) {
f2fs_err(sbi, "mismatched blkaddr %u (source_blkaddr %u) in seg %u",
blkaddr, source_blkaddr, segno);
- f2fs_bug_on(sbi, 1);
+ set_sbi_flag(sbi, SBI_NEED_FSCK);
}
}
#endif
f2fs_wait_on_block_writeback(inode, dn.data_blkaddr);
- err = f2fs_get_node_info(fio.sbi, dn.nid, &ni);
+ err = f2fs_get_node_info(fio.sbi, dn.nid, &ni, false);
if (err)
goto put_out;
if (phase == 3) {
inode = f2fs_iget(sb, dni.ino);
- if (IS_ERR(inode) || is_bad_inode(inode))
+ if (IS_ERR(inode) || is_bad_inode(inode) ||
+ special_file(inode->i_mode))
continue;
if (!down_write_trylock(
if (err)
return err;
- err = f2fs_get_node_info(fio.sbi, dn->nid, &ni);
+ err = f2fs_get_node_info(fio.sbi, dn->nid, &ni, false);
if (err) {
f2fs_truncate_data_blocks_range(dn, 1);
f2fs_put_dnode(dn);
ilen = start + len;
ilen -= start;
- err = f2fs_get_node_info(F2FS_I_SB(inode), inode->i_ino, &ni);
+ err = f2fs_get_node_info(F2FS_I_SB(inode), inode->i_ino, &ni, false);
if (err)
goto out;
} else if (ino == F2FS_COMPRESS_INO(sbi)) {
#ifdef CONFIG_F2FS_FS_COMPRESSION
inode->i_mapping->a_ops = &f2fs_compress_aops;
+ /*
+ * generic_error_remove_page only truncates pages of regular
+ * inode
+ */
+ inode->i_mode |= S_IFREG;
#endif
mapping_set_gfp_mask(inode->i_mapping,
GFP_NOFS | __GFP_HIGHMEM | __GFP_MOVABLE);
goto bad_inode;
}
f2fs_set_inode_flags(inode);
+
+ if (file_should_truncate(inode)) {
+ ret = f2fs_truncate(inode);
+ if (ret)
+ goto bad_inode;
+ file_dont_truncate(inode);
+ }
+
unlock_new_inode(inode);
trace_f2fs_iget(inode);
return inode;
trace_f2fs_evict_inode(inode);
truncate_inode_pages_final(&inode->i_data);
- if (test_opt(sbi, COMPRESS_CACHE) && f2fs_compressed_file(inode))
+ if ((inode->i_nlink || is_bad_inode(inode)) &&
+ test_opt(sbi, COMPRESS_CACHE) && f2fs_compressed_file(inode))
f2fs_invalidate_compress_pages(sbi, inode->i_ino);
if (inode->i_ino == F2FS_NODE_INO(sbi) ||
* so we can prevent losing this orphan when encoutering checkpoint
* and following suddenly power-off.
*/
- err = f2fs_get_node_info(sbi, inode->i_ino, &ni);
+ err = f2fs_get_node_info(sbi, inode->i_ino, &ni, false);
if (err) {
set_sbi_flag(sbi, SBI_NEED_FSCK);
f2fs_warn(sbi, "May loss orphan inode, run fsck to fix.");
struct f2fs_iostat_latency iostat_lat[MAX_IO_TYPE][NR_PAGE_TYPE];
struct iostat_lat_info *io_lat = sbi->iostat_io_lat;
- spin_lock_irq(&sbi->iostat_lat_lock);
+ spin_lock_bh(&sbi->iostat_lat_lock);
for (idx = 0; idx < MAX_IO_TYPE; idx++) {
for (io = 0; io < NR_PAGE_TYPE; io++) {
cnt = io_lat->bio_cnt[idx][io];
io_lat->bio_cnt[idx][io] = 0;
}
}
- spin_unlock_irq(&sbi->iostat_lat_lock);
+ spin_unlock_bh(&sbi->iostat_lat_lock);
trace_f2fs_iostat_latency(sbi, iostat_lat);
}
return;
/* Need double check under the lock */
- spin_lock(&sbi->iostat_lock);
+ spin_lock_bh(&sbi->iostat_lock);
if (time_is_after_jiffies(sbi->iostat_next_period)) {
- spin_unlock(&sbi->iostat_lock);
+ spin_unlock_bh(&sbi->iostat_lock);
return;
}
sbi->iostat_next_period = jiffies +
sbi->prev_rw_iostat[i];
sbi->prev_rw_iostat[i] = sbi->rw_iostat[i];
}
- spin_unlock(&sbi->iostat_lock);
+ spin_unlock_bh(&sbi->iostat_lock);
trace_f2fs_iostat(sbi, iostat_diff);
struct iostat_lat_info *io_lat = sbi->iostat_io_lat;
int i;
- spin_lock(&sbi->iostat_lock);
+ spin_lock_bh(&sbi->iostat_lock);
for (i = 0; i < NR_IO_TYPE; i++) {
sbi->rw_iostat[i] = 0;
sbi->prev_rw_iostat[i] = 0;
}
- spin_unlock(&sbi->iostat_lock);
+ spin_unlock_bh(&sbi->iostat_lock);
- spin_lock_irq(&sbi->iostat_lat_lock);
+ spin_lock_bh(&sbi->iostat_lat_lock);
memset(io_lat, 0, sizeof(struct iostat_lat_info));
- spin_unlock_irq(&sbi->iostat_lat_lock);
+ spin_unlock_bh(&sbi->iostat_lat_lock);
}
void f2fs_update_iostat(struct f2fs_sb_info *sbi,
if (!sbi->iostat_enable)
return;
- spin_lock(&sbi->iostat_lock);
+ spin_lock_bh(&sbi->iostat_lock);
sbi->rw_iostat[type] += io_bytes;
- if (type == APP_WRITE_IO || type == APP_DIRECT_IO)
- sbi->rw_iostat[APP_BUFFERED_IO] =
- sbi->rw_iostat[APP_WRITE_IO] -
- sbi->rw_iostat[APP_DIRECT_IO];
+ if (type == APP_BUFFERED_IO || type == APP_DIRECT_IO)
+ sbi->rw_iostat[APP_WRITE_IO] += io_bytes;
- if (type == APP_READ_IO || type == APP_DIRECT_READ_IO)
- sbi->rw_iostat[APP_BUFFERED_READ_IO] =
- sbi->rw_iostat[APP_READ_IO] -
- sbi->rw_iostat[APP_DIRECT_READ_IO];
- spin_unlock(&sbi->iostat_lock);
+ if (type == APP_BUFFERED_READ_IO || type == APP_DIRECT_READ_IO)
+ sbi->rw_iostat[APP_READ_IO] += io_bytes;
+
+ spin_unlock_bh(&sbi->iostat_lock);
f2fs_record_iostat(sbi);
}
{
unsigned long ts_diff;
unsigned int iotype = iostat_ctx->type;
- unsigned long flags;
struct f2fs_sb_info *sbi = iostat_ctx->sbi;
struct iostat_lat_info *io_lat = sbi->iostat_io_lat;
int idx;
idx = WRITE_ASYNC_IO;
}
- spin_lock_irqsave(&sbi->iostat_lat_lock, flags);
+ spin_lock_bh(&sbi->iostat_lat_lock);
io_lat->sum_lat[idx][iotype] += ts_diff;
io_lat->bio_cnt[idx][iotype]++;
if (ts_diff > io_lat->peak_lat[idx][iotype])
io_lat->peak_lat[idx][iotype] = ts_diff;
- spin_unlock_irqrestore(&sbi->iostat_lat_lock, flags);
+ spin_unlock_bh(&sbi->iostat_lat_lock);
}
void iostat_update_and_unbind_ctx(struct bio *bio, int rw)
struct f2fs_nm_info *nm_i = NM_I(sbi);
struct nat_entry *new, *e;
+ /* Let's mitigate lock contention of nat_tree_lock during checkpoint */
+ if (rwsem_is_locked(&sbi->cp_global_sem))
+ return;
+
new = __alloc_nat_entry(sbi, nid, false);
if (!new)
return;
}
int f2fs_get_node_info(struct f2fs_sb_info *sbi, nid_t nid,
- struct node_info *ni)
+ struct node_info *ni, bool checkpoint_context)
{
struct f2fs_nm_info *nm_i = NM_I(sbi);
struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_HOT_DATA);
* nat_tree_lock. Therefore, we should retry, if we failed to grab here
* while not bothering checkpoint.
*/
- if (!rwsem_is_locked(&sbi->cp_global_sem)) {
+ if (!rwsem_is_locked(&sbi->cp_global_sem) || checkpoint_context) {
down_read(&curseg->journal_rwsem);
- } else if (!down_read_trylock(&curseg->journal_rwsem)) {
+ } else if (rwsem_is_contended(&nm_i->nat_tree_lock) ||
+ !down_read_trylock(&curseg->journal_rwsem)) {
up_read(&nm_i->nat_tree_lock);
goto retry;
}
int err;
pgoff_t index;
- err = f2fs_get_node_info(sbi, dn->nid, &ni);
+ err = f2fs_get_node_info(sbi, dn->nid, &ni, false);
if (err)
return err;
goto fail;
#ifdef CONFIG_F2FS_CHECK_FS
- err = f2fs_get_node_info(sbi, dn->nid, &new_ni);
+ err = f2fs_get_node_info(sbi, dn->nid, &new_ni, false);
if (err) {
dec_valid_node_count(sbi, dn->inode, !ofs);
goto fail;
return LOCKED_PAGE;
}
- err = f2fs_get_node_info(sbi, page->index, &ni);
+ err = f2fs_get_node_info(sbi, page->index, &ni, false);
if (err)
return err;
nid = nid_of_node(page);
f2fs_bug_on(sbi, page->index != nid);
- if (f2fs_get_node_info(sbi, nid, &ni))
+ if (f2fs_get_node_info(sbi, nid, &ni, !do_balance))
goto redirty_out;
if (wbc->for_reclaim) {
goto recover_xnid;
/* 1: invalidate the previous xattr nid */
- err = f2fs_get_node_info(sbi, prev_xnid, &ni);
+ err = f2fs_get_node_info(sbi, prev_xnid, &ni, false);
if (err)
return err;
struct page *ipage;
int err;
- err = f2fs_get_node_info(sbi, ino, &old_ni);
+ err = f2fs_get_node_info(sbi, ino, &old_ni, false);
if (err)
return err;
f2fs_wait_on_page_writeback(dn.node_page, NODE, true, true);
- err = f2fs_get_node_info(sbi, dn.nid, &ni);
+ err = f2fs_get_node_info(sbi, dn.nid, &ni, false);
if (err)
goto err;
goto next;
}
- err = f2fs_get_node_info(sbi, dn.nid, &ni);
+ err = f2fs_get_node_info(sbi, dn.nid, &ni, false);
if (err) {
f2fs_put_dnode(&dn);
return err;
secno = find_next_zero_bit(free_i->free_secmap, MAIN_SECS(sbi), hint);
if (secno >= MAIN_SECS(sbi)) {
if (dir == ALLOC_RIGHT) {
- secno = find_next_zero_bit(free_i->free_secmap,
- MAIN_SECS(sbi), 0);
+ secno = find_first_zero_bit(free_i->free_secmap,
+ MAIN_SECS(sbi));
f2fs_bug_on(sbi, secno >= MAIN_SECS(sbi));
} else {
go_left = 1;
left_start--;
continue;
}
- left_start = find_next_zero_bit(free_i->free_secmap,
- MAIN_SECS(sbi), 0);
+ left_start = find_first_zero_bit(free_i->free_secmap,
+ MAIN_SECS(sbi));
f2fs_bug_on(sbi, left_start >= MAIN_SECS(sbi));
break;
}
static inline unsigned int reserved_segments(struct f2fs_sb_info *sbi)
{
- return SM_I(sbi)->reserved_segments;
+ return SM_I(sbi)->reserved_segments +
+ SM_I(sbi)->additional_reserved_segments;
}
static inline unsigned int free_sections(struct f2fs_sb_info *sbi)
[FAULT_WRITE_IO] = "write IO error",
[FAULT_SLAB_ALLOC] = "slab alloc",
[FAULT_DQUOT_INIT] = "dquot initialize",
+ [FAULT_LOCK_OP] = "lock_op",
};
void f2fs_build_fault_attr(struct f2fs_sb_info *sbi, unsigned int rate,
F2FS_OPTION(sbi).s_resgid));
}
+static inline int adjust_reserved_segment(struct f2fs_sb_info *sbi)
+{
+ unsigned int sec_blks = sbi->blocks_per_seg * sbi->segs_per_sec;
+ unsigned int avg_vblocks;
+ unsigned int wanted_reserved_segments;
+ block_t avail_user_block_count;
+
+ if (!F2FS_IO_ALIGNED(sbi))
+ return 0;
+
+ /* average valid block count in section in worst case */
+ avg_vblocks = sec_blks / F2FS_IO_SIZE(sbi);
+
+ /*
+ * we need enough free space when migrating one section in worst case
+ */
+ wanted_reserved_segments = (F2FS_IO_SIZE(sbi) / avg_vblocks) *
+ reserved_segments(sbi);
+ wanted_reserved_segments -= reserved_segments(sbi);
+
+ avail_user_block_count = sbi->user_block_count -
+ sbi->current_reserved_blocks -
+ F2FS_OPTION(sbi).root_reserved_blocks;
+
+ if (wanted_reserved_segments * sbi->blocks_per_seg >
+ avail_user_block_count) {
+ f2fs_err(sbi, "IO align feature can't grab additional reserved segment: %u, available segments: %u",
+ wanted_reserved_segments,
+ avail_user_block_count >> sbi->log_blocks_per_seg);
+ return -ENOSPC;
+ }
+
+ SM_I(sbi)->additional_reserved_segments = wanted_reserved_segments;
+
+ f2fs_info(sbi, "IO align feature needs additional reserved segment: %u",
+ wanted_reserved_segments);
+
+ return 0;
+}
+
static inline void adjust_unusable_cap_perc(struct f2fs_sb_info *sbi)
{
if (!F2FS_OPTION(sbi).unusable_cap_perc)
sbi->seq_file_ra_mul = MIN_RA_MUL;
sbi->max_fragment_chunk = DEF_FRAGMENT_SIZE;
sbi->max_fragment_hole = DEF_FRAGMENT_SIZE;
+ spin_lock_init(&sbi->gc_urgent_high_lock);
sbi->dir_level = DEF_DIR_LEVEL;
sbi->interval_time[CP_TIME] = DEF_CP_INTERVAL;
goto free_nm;
}
+ err = adjust_reserved_segment(sbi);
+ if (err)
+ goto free_nm;
+
/* For write statistics */
sbi->sectors_written_start = f2fs_get_sectors_written(sbi);
return sprintf(buf, "%lx\n", sbi->s_flag);
}
+static ssize_t pending_discard_show(struct f2fs_attr *a,
+ struct f2fs_sb_info *sbi, char *buf)
+{
+ if (!SM_I(sbi)->dcc_info)
+ return -EINVAL;
+ return sprintf(buf, "%llu\n", (unsigned long long)atomic_read(
+ &SM_I(sbi)->dcc_info->discard_cmd_cnt));
+}
+
static ssize_t features_show(struct f2fs_attr *a,
struct f2fs_sb_info *sbi, char *buf)
{
if (a->struct_type == RESERVED_BLOCKS) {
spin_lock(&sbi->stat_lock);
if (t > (unsigned long)(sbi->user_block_count -
- F2FS_OPTION(sbi).root_reserved_blocks)) {
+ F2FS_OPTION(sbi).root_reserved_blocks -
+ sbi->blocks_per_seg *
+ SM_I(sbi)->additional_reserved_segments)) {
spin_unlock(&sbi->stat_lock);
return -EINVAL;
}
return count;
}
+ if (!strcmp(a->attr.name, "gc_urgent_high_remaining")) {
+ spin_lock(&sbi->gc_urgent_high_lock);
+ sbi->gc_urgent_high_limited = t != 0;
+ sbi->gc_urgent_high_remaining = t;
+ spin_unlock(&sbi->gc_urgent_high_lock);
+
+ return count;
+ }
+
#ifdef CONFIG_F2FS_IOSTAT
if (!strcmp(a->attr.name, "iostat_enable")) {
sbi->iostat_enable = !!t;
#endif
F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, data_io_flag, data_io_flag);
F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, node_io_flag, node_io_flag);
+F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, gc_urgent_high_remaining, gc_urgent_high_remaining);
F2FS_RW_ATTR(CPRC_INFO, ckpt_req_control, ckpt_thread_ioprio, ckpt_thread_ioprio);
F2FS_GENERAL_RO_ATTR(dirty_segments);
F2FS_GENERAL_RO_ATTR(free_segments);
F2FS_GENERAL_RO_ATTR(encoding);
F2FS_GENERAL_RO_ATTR(mounted_time_sec);
F2FS_GENERAL_RO_ATTR(main_blkaddr);
+F2FS_GENERAL_RO_ATTR(pending_discard);
#ifdef CONFIG_F2FS_STAT_FS
F2FS_STAT_ATTR(STAT_INFO, f2fs_stat_info, cp_foreground_calls, cp_count);
F2FS_STAT_ATTR(STAT_INFO, f2fs_stat_info, cp_background_calls, bg_cp_count);
ATTR_LIST(main_blkaddr),
ATTR_LIST(max_small_discards),
ATTR_LIST(discard_granularity),
+ ATTR_LIST(pending_discard),
ATTR_LIST(batched_trim_sections),
ATTR_LIST(ipu_policy),
ATTR_LIST(min_ipu_util),
#endif
ATTR_LIST(data_io_flag),
ATTR_LIST(node_io_flag),
+ ATTR_LIST(gc_urgent_high_remaining),
ATTR_LIST(ckpt_thread_ioprio),
ATTR_LIST(dirty_segments),
ATTR_LIST(free_segments),
}
static struct f2fs_xattr_entry *__find_xattr(void *base_addr,
- void *last_base_addr, int index,
- size_t len, const char *name)
+ void *last_base_addr, void **last_addr,
+ int index, size_t len, const char *name)
{
struct f2fs_xattr_entry *entry;
list_for_each_xattr(entry, base_addr) {
if ((void *)(entry) + sizeof(__u32) > last_base_addr ||
- (void *)XATTR_NEXT_ENTRY(entry) > last_base_addr)
+ (void *)XATTR_NEXT_ENTRY(entry) > last_base_addr) {
+ if (last_addr)
+ *last_addr = entry;
return NULL;
+ }
if (entry->e_name_index != index)
continue;
unsigned int inline_size = inline_xattr_size(inode);
void *max_addr = base_addr + inline_size;
- list_for_each_xattr(entry, base_addr) {
- if ((void *)entry + sizeof(__u32) > max_addr ||
- (void *)XATTR_NEXT_ENTRY(entry) > max_addr) {
- *last_addr = entry;
- return NULL;
- }
- if (entry->e_name_index != index)
- continue;
- if (entry->e_name_len != len)
- continue;
- if (!memcmp(entry->e_name, name, len))
- break;
- }
+ entry = __find_xattr(base_addr, max_addr, last_addr, index, len, name);
+ if (!entry)
+ return NULL;
/* inline xattr header or entry across max inline xattr size */
if (IS_XATTR_LAST_ENTRY(entry) &&
else
cur_addr = txattr_addr;
- *xe = __find_xattr(cur_addr, last_txattr_addr, index, len, name);
+ *xe = __find_xattr(cur_addr, last_txattr_addr, NULL, index, len, name);
if (!*xe) {
f2fs_err(F2FS_I_SB(inode), "inode (%lu) has corrupted xattr",
inode->i_ino);
last_base_addr = (void *)base_addr + XATTR_SIZE(inode);
/* find entry with wanted name. */
- here = __find_xattr(base_addr, last_base_addr, index, len, name);
+ here = __find_xattr(base_addr, last_base_addr, NULL, index, len, name);
if (!here) {
f2fs_err(F2FS_I_SB(inode), "inode (%lu) has corrupted xattr",
inode->i_ino);
}
last = here;
- while (!IS_XATTR_LAST_ENTRY(last))
+ while (!IS_XATTR_LAST_ENTRY(last)) {
+ if ((void *)(last) + sizeof(__u32) > last_base_addr ||
+ (void *)XATTR_NEXT_ENTRY(last) > last_base_addr) {
+ f2fs_err(F2FS_I_SB(inode), "inode (%lu) has invalid last xattr entry, entry_size: %zu",
+ inode->i_ino, ENTRY_SIZE(last));
+ set_sbi_flag(F2FS_I_SB(inode), SBI_NEED_FSCK);
+ error = -EFSCORRUPTED;
+ goto exit;
+ }
last = XATTR_NEXT_ENTRY(last);
+ }
newsize = XATTR_ALIGN(sizeof(struct f2fs_xattr_entry) + len + size);
static int fat_file_release(struct inode *inode, struct file *filp)
{
if ((filp->f_mode & FMODE_WRITE) &&
- MSDOS_SB(inode->i_sb)->options.flush) {
+ MSDOS_SB(inode->i_sb)->options.flush) {
fat_flush_inodes(inode->i_sb, inode, NULL);
- congestion_wait(BLK_RW_ASYNC, HZ/10);
+ set_current_state(TASK_UNINTERRUPTIBLE);
+ io_schedule_timeout(HZ/10);
}
return 0;
}
#include "internal.h"
/* sysctl tunables... */
-struct files_stat_struct files_stat = {
+static struct files_stat_struct files_stat = {
.max_files = NR_FILE
};
}
EXPORT_SYMBOL_GPL(get_max_files);
+#if defined(CONFIG_SYSCTL) && defined(CONFIG_PROC_FS)
+
/*
* Handle nr_files sysctl
*/
-#if defined(CONFIG_SYSCTL) && defined(CONFIG_PROC_FS)
-int proc_nr_files(struct ctl_table *table, int write,
- void *buffer, size_t *lenp, loff_t *ppos)
+static int proc_nr_files(struct ctl_table *table, int write, void *buffer,
+ size_t *lenp, loff_t *ppos)
{
files_stat.nr_files = get_nr_files();
return proc_doulongvec_minmax(table, write, buffer, lenp, ppos);
}
-#else
-int proc_nr_files(struct ctl_table *table, int write,
- void *buffer, size_t *lenp, loff_t *ppos)
+
+static struct ctl_table fs_stat_sysctls[] = {
+ {
+ .procname = "file-nr",
+ .data = &files_stat,
+ .maxlen = sizeof(files_stat),
+ .mode = 0444,
+ .proc_handler = proc_nr_files,
+ },
+ {
+ .procname = "file-max",
+ .data = &files_stat.max_files,
+ .maxlen = sizeof(files_stat.max_files),
+ .mode = 0644,
+ .proc_handler = proc_doulongvec_minmax,
+ .extra1 = SYSCTL_LONG_ZERO,
+ .extra2 = SYSCTL_LONG_MAX,
+ },
+ {
+ .procname = "nr_open",
+ .data = &sysctl_nr_open,
+ .maxlen = sizeof(unsigned int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = &sysctl_nr_open_min,
+ .extra2 = &sysctl_nr_open_max,
+ },
+ { }
+};
+
+static int __init init_fs_stat_sysctls(void)
{
- return -ENOSYS;
+ register_sysctl_init("fs", fs_stat_sysctls);
+ return 0;
}
+fs_initcall(init_fs_stat_sysctls);
#endif
static struct file *__alloc_file(int flags, const struct cred *cred)
unsigned int collidee_debug_id)
{
wait_var_event_timeout(&candidate->flags,
- fscache_is_acquire_pending(candidate), 20 * HZ);
+ !fscache_is_acquire_pending(candidate), 20 * HZ);
if (!fscache_is_acquire_pending(candidate)) {
pr_notice("Potential volume collision new=%08x old=%08x",
candidate->debug_id, collidee_debug_id);
fscache_stat(&fscache_n_volumes_collision);
- wait_var_event(&candidate->flags, fscache_is_acquire_pending(candidate));
+ wait_var_event(&candidate->flags, !fscache_is_acquire_pending(candidate));
}
}
__be32 access_date;
__be32 backup_date;
struct hfsplus_perm permissions;
- struct DInfo user_info;
- struct DXInfo finder_info;
+ struct_group_attr(info, __packed,
+ struct DInfo user_info;
+ struct DXInfo finder_info;
+ );
__be32 text_encoding;
__be32 subfolders; /* Subfolder count in HFSX. Reserved in HFS+. */
} __packed;
__be32 access_date;
__be32 backup_date;
struct hfsplus_perm permissions;
- struct FInfo user_info;
- struct FXInfo finder_info;
+ struct_group_attr(info, __packed,
+ struct FInfo user_info;
+ struct FXInfo finder_info;
+ );
__be32 text_encoding;
u32 reserved2;
sizeof(hfsplus_cat_entry));
if (be16_to_cpu(entry.type) == HFSPLUS_FOLDER) {
if (size == folder_finderinfo_len) {
- memcpy(&entry.folder.user_info, value,
+ memcpy(&entry.folder.info, value,
folder_finderinfo_len);
hfs_bnode_write(cat_fd.bnode, &entry,
cat_fd.entryoffset,
}
} else if (be16_to_cpu(entry.type) == HFSPLUS_FILE) {
if (size == file_finderinfo_len) {
- memcpy(&entry.file.user_info, value,
+ memcpy(&entry.file.info, value,
file_finderinfo_len);
hfs_bnode_write(cat_fd.bnode, &entry,
cat_fd.entryoffset,
};
EXPORT_SYMBOL(empty_aops);
-/*
- * Statistics gathering..
- */
-struct inodes_stat_t inodes_stat;
-
static DEFINE_PER_CPU(unsigned long, nr_inodes);
static DEFINE_PER_CPU(unsigned long, nr_unused);
* Handle nr_inode sysctl
*/
#ifdef CONFIG_SYSCTL
-int proc_nr_inodes(struct ctl_table *table, int write,
- void *buffer, size_t *lenp, loff_t *ppos)
+/*
+ * Statistics gathering..
+ */
+static struct inodes_stat_t inodes_stat;
+
+static int proc_nr_inodes(struct ctl_table *table, int write, void *buffer,
+ size_t *lenp, loff_t *ppos)
{
inodes_stat.nr_inodes = get_nr_inodes();
inodes_stat.nr_unused = get_nr_inodes_unused();
return proc_doulongvec_minmax(table, write, buffer, lenp, ppos);
}
+
+static struct ctl_table inodes_sysctls[] = {
+ {
+ .procname = "inode-nr",
+ .data = &inodes_stat,
+ .maxlen = 2*sizeof(long),
+ .mode = 0444,
+ .proc_handler = proc_nr_inodes,
+ },
+ {
+ .procname = "inode-state",
+ .data = &inodes_stat,
+ .maxlen = 7*sizeof(long),
+ .mode = 0444,
+ .proc_handler = proc_nr_inodes,
+ },
+ { }
+};
+
+static int __init init_fs_inode_sysctls(void)
+{
+ register_sysctl_init("fs", inodes_sysctls);
+ return 0;
+}
+early_initcall(init_fs_inode_sysctls);
#endif
static int no_open(struct inode *inode, struct file *file)
struct io_wqe *wqe;
struct io_wq_work *cur_work;
- spinlock_t lock;
+ struct io_wq_work *next_work;
+ raw_spinlock_t lock;
struct completion ref_done;
* Worker will start processing some work. Move it to the busy list, if
* it's currently on the freelist
*/
-static void __io_worker_busy(struct io_wqe *wqe, struct io_worker *worker,
- struct io_wq_work *work)
+static void __io_worker_busy(struct io_wqe *wqe, struct io_worker *worker)
__must_hold(wqe->lock)
{
if (worker->flags & IO_WORKER_F_FREE) {
cond_resched();
}
- spin_lock(&worker->lock);
+ raw_spin_lock(&worker->lock);
worker->cur_work = work;
- spin_unlock(&worker->lock);
+ worker->next_work = NULL;
+ raw_spin_unlock(&worker->lock);
}
static void io_wqe_enqueue(struct io_wqe *wqe, struct io_wq_work *work);
do {
struct io_wq_work *work;
-get_next:
+
/*
* If we got some work, mark us as busy. If we didn't, but
* the list isn't empty, it means we stalled on hashed work.
* clear the stalled flag.
*/
work = io_get_next_work(acct, worker);
- if (work)
- __io_worker_busy(wqe, worker, work);
-
+ if (work) {
+ __io_worker_busy(wqe, worker);
+
+ /*
+ * Make sure cancelation can find this, even before
+ * it becomes the active work. That avoids a window
+ * where the work has been removed from our general
+ * work list, but isn't yet discoverable as the
+ * current work item for this worker.
+ */
+ raw_spin_lock(&worker->lock);
+ worker->next_work = work;
+ raw_spin_unlock(&worker->lock);
+ }
raw_spin_unlock(&wqe->lock);
if (!work)
break;
spin_unlock_irq(&wq->hash->wait.lock);
if (wq_has_sleeper(&wq->hash->wait))
wake_up(&wq->hash->wait);
- raw_spin_lock(&wqe->lock);
- /* skip unnecessary unlock-lock wqe->lock */
- if (!work)
- goto get_next;
- raw_spin_unlock(&wqe->lock);
}
} while (work);
refcount_set(&worker->ref, 1);
worker->wqe = wqe;
- spin_lock_init(&worker->lock);
+ raw_spin_lock_init(&worker->lock);
init_completion(&worker->ref_done);
if (index == IO_WQ_ACCT_BOUND)
work->flags |= (IO_WQ_WORK_HASHED | (bit << IO_WQ_HASH_SHIFT));
}
+static bool __io_wq_worker_cancel(struct io_worker *worker,
+ struct io_cb_cancel_data *match,
+ struct io_wq_work *work)
+{
+ if (work && match->fn(work, match->data)) {
+ work->flags |= IO_WQ_WORK_CANCEL;
+ set_notify_signal(worker->task);
+ return true;
+ }
+
+ return false;
+}
+
static bool io_wq_worker_cancel(struct io_worker *worker, void *data)
{
struct io_cb_cancel_data *match = data;
* Hold the lock to avoid ->cur_work going out of scope, caller
* may dereference the passed in work.
*/
- spin_lock(&worker->lock);
- if (worker->cur_work &&
- match->fn(worker->cur_work, match->data)) {
- set_notify_signal(worker->task);
+ raw_spin_lock(&worker->lock);
+ if (__io_wq_worker_cancel(worker, match, worker->cur_work) ||
+ __io_wq_worker_cancel(worker, match, worker->next_work))
match->nr_running++;
- }
- spin_unlock(&worker->lock);
+ raw_spin_unlock(&worker->lock);
return match->nr_running && !match->cancel_all;
}
{
int i;
retry:
- raw_spin_lock(&wqe->lock);
for (i = 0; i < IO_WQ_ACCT_NR; i++) {
struct io_wqe_acct *acct = io_get_acct(wqe, i == 0);
if (io_acct_cancel_pending_work(wqe, acct, match)) {
+ raw_spin_lock(&wqe->lock);
if (match->cancel_all)
goto retry;
- return;
+ break;
}
}
- raw_spin_unlock(&wqe->lock);
}
static void io_wqe_cancel_running_work(struct io_wqe *wqe,
* First check pending list, if we're lucky we can just remove it
* from there. CANCEL_OK means that the work is returned as-new,
* no completion will be posted for it.
- */
- for_each_node(node) {
- struct io_wqe *wqe = wq->wqes[node];
-
- io_wqe_cancel_pending_work(wqe, &match);
- if (match.nr_pending && !match.cancel_all)
- return IO_WQ_CANCEL_OK;
- }
-
- /*
- * Now check if a free (going busy) or busy worker has the work
+ *
+ * Then check if a free (going busy) or busy worker has the work
* currently running. If we find it there, we'll return CANCEL_RUNNING
* as an indication that we attempt to signal cancellation. The
* completion will run normally in this case.
+ *
+ * Do both of these while holding the wqe->lock, to ensure that
+ * we'll find a work item regardless of state.
*/
for_each_node(node) {
struct io_wqe *wqe = wq->wqes[node];
+ raw_spin_lock(&wqe->lock);
+ io_wqe_cancel_pending_work(wqe, &match);
+ if (match.nr_pending && !match.cancel_all) {
+ raw_spin_unlock(&wqe->lock);
+ return IO_WQ_CANCEL_OK;
+ }
+
io_wqe_cancel_running_work(wqe, &match);
+ raw_spin_unlock(&wqe->lock);
if (match.nr_running && !match.cancel_all)
return IO_WQ_CANCEL_RUNNING;
}
.fn = io_wq_work_match_all,
.cancel_all = true,
};
+ raw_spin_lock(&wqe->lock);
io_wqe_cancel_pending_work(wqe, &match);
+ raw_spin_unlock(&wqe->lock);
free_cpumask_var(wqe->cpu_mask);
kfree(wqe);
}
return atomic_dec_and_test(&req->refs);
}
-static inline void req_ref_put(struct io_kiocb *req)
-{
- WARN_ON_ONCE(!(req->flags & REQ_F_REFCOUNT));
- WARN_ON_ONCE(req_ref_put_and_test(req));
-}
-
static inline void req_ref_get(struct io_kiocb *req)
{
WARN_ON_ONCE(!(req->flags & REQ_F_REFCOUNT));
static inline void io_poll_remove_entry(struct io_poll_iocb *poll)
{
- struct wait_queue_head *head = poll->head;
+ struct wait_queue_head *head = smp_load_acquire(&poll->head);
- spin_lock_irq(&head->lock);
- list_del_init(&poll->wait.entry);
- poll->head = NULL;
- spin_unlock_irq(&head->lock);
+ if (head) {
+ spin_lock_irq(&head->lock);
+ list_del_init(&poll->wait.entry);
+ poll->head = NULL;
+ spin_unlock_irq(&head->lock);
+ }
}
static void io_poll_remove_entries(struct io_kiocb *req)
struct io_poll_iocb *poll = io_poll_get_single(req);
struct io_poll_iocb *poll_double = io_poll_get_double(req);
- if (poll->head)
- io_poll_remove_entry(poll);
- if (poll_double && poll_double->head)
+ /*
+ * While we hold the waitqueue lock and the waitqueue is nonempty,
+ * wake_up_pollfree() will wait for us. However, taking the waitqueue
+ * lock in the first place can race with the waitqueue being freed.
+ *
+ * We solve this as eventpoll does: by taking advantage of the fact that
+ * all users of wake_up_pollfree() will RCU-delay the actual free. If
+ * we enter rcu_read_lock() and see that the pointer to the queue is
+ * non-NULL, we can then lock it without the memory being freed out from
+ * under us.
+ *
+ * Keep holding rcu_read_lock() as long as we hold the queue lock, in
+ * case the caller deletes the entry from the queue, leaving it empty.
+ * In that case, only RCU prevents the queue memory from being freed.
+ */
+ rcu_read_lock();
+ io_poll_remove_entry(poll);
+ if (poll_double)
io_poll_remove_entry(poll_double);
+ rcu_read_unlock();
}
/*
wait);
__poll_t mask = key_to_poll(key);
+ if (unlikely(mask & POLLFREE)) {
+ io_poll_mark_cancelled(req);
+ /* we have to kick tw in case it's not already */
+ io_poll_execute(req, 0);
+
+ /*
+ * If the waitqueue is being freed early but someone is already
+ * holds ownership over it, we have to tear down the request as
+ * best we can. That means immediately removing the request from
+ * its waitqueue and preventing all further accesses to the
+ * waitqueue via the request.
+ */
+ list_del_init(&poll->wait.entry);
+
+ /*
+ * Careful: this *must* be the last step, since as soon
+ * as req->head is NULL'ed out, the request can be
+ * completed and freed, since aio_poll_complete_work()
+ * will no longer need to take the waitqueue lock.
+ */
+ smp_store_release(&poll->head, NULL);
+ return 1;
+ }
+
/* for instances that support it check for an event match first */
if (mask && !(mask & poll->events))
return 0;
WARN_ON_ONCE(!io_wq_current_is_worker() && req->task != current);
ret = io_async_cancel_one(req->task->io_uring, sqe_addr, ctx);
- if (ret != -ENOENT)
- return ret;
+ /*
+ * Fall-through even for -EALREADY, as we may have poll armed
+ * that need unarming.
+ */
+ if (!ret)
+ return 0;
spin_lock(&ctx->completion_lock);
+ ret = io_poll_cancel(ctx, sqe_addr, false);
+ if (ret != -ENOENT)
+ goto out;
+
spin_lock_irq(&ctx->timeout_lock);
ret = io_timeout_cancel(ctx, sqe_addr);
spin_unlock_irq(&ctx->timeout_lock);
- if (ret != -ENOENT)
- goto out;
- ret = io_poll_cancel(ctx, sqe_addr, false);
out:
spin_unlock(&ctx->completion_lock);
return ret;
static int jbd2_seq_info_open(struct inode *inode, struct file *file)
{
- journal_t *journal = PDE_DATA(inode);
+ journal_t *journal = pde_data(inode);
struct jbd2_stats_proc_session *s;
int rc, size;
#include "ksmbd_spnego_negtokeninit.asn1.h"
#include "ksmbd_spnego_negtokentarg.asn1.h"
-#define SPNEGO_OID_LEN 7
#define NTLMSSP_OID_LEN 10
-#define KRB5_OID_LEN 7
-#define KRB5U2U_OID_LEN 8
-#define MSKRB5_OID_LEN 7
-static unsigned long SPNEGO_OID[7] = { 1, 3, 6, 1, 5, 5, 2 };
-static unsigned long NTLMSSP_OID[10] = { 1, 3, 6, 1, 4, 1, 311, 2, 2, 10 };
-static unsigned long KRB5_OID[7] = { 1, 2, 840, 113554, 1, 2, 2 };
-static unsigned long KRB5U2U_OID[8] = { 1, 2, 840, 113554, 1, 2, 2, 3 };
-static unsigned long MSKRB5_OID[7] = { 1, 2, 840, 48018, 1, 2, 2 };
static char NTLMSSP_OID_STR[NTLMSSP_OID_LEN] = { 0x2b, 0x06, 0x01, 0x04, 0x01,
0x82, 0x37, 0x02, 0x02, 0x0a };
-static bool
-asn1_subid_decode(const unsigned char **begin, const unsigned char *end,
- unsigned long *subid)
-{
- const unsigned char *ptr = *begin;
- unsigned char ch;
-
- *subid = 0;
-
- do {
- if (ptr >= end)
- return false;
-
- ch = *ptr++;
- *subid <<= 7;
- *subid |= ch & 0x7F;
- } while ((ch & 0x80) == 0x80);
-
- *begin = ptr;
- return true;
-}
-
-static bool asn1_oid_decode(const unsigned char *value, size_t vlen,
- unsigned long **oid, size_t *oidlen)
-{
- const unsigned char *iptr = value, *end = value + vlen;
- unsigned long *optr;
- unsigned long subid;
-
- vlen += 1;
- if (vlen < 2 || vlen > UINT_MAX / sizeof(unsigned long))
- goto fail_nullify;
-
- *oid = kmalloc(vlen * sizeof(unsigned long), GFP_KERNEL);
- if (!*oid)
- return false;
-
- optr = *oid;
-
- if (!asn1_subid_decode(&iptr, end, &subid))
- goto fail;
-
- if (subid < 40) {
- optr[0] = 0;
- optr[1] = subid;
- } else if (subid < 80) {
- optr[0] = 1;
- optr[1] = subid - 40;
- } else {
- optr[0] = 2;
- optr[1] = subid - 80;
- }
-
- *oidlen = 2;
- optr += 2;
-
- while (iptr < end) {
- if (++(*oidlen) > vlen)
- goto fail;
-
- if (!asn1_subid_decode(&iptr, end, optr++))
- goto fail;
- }
- return true;
-
-fail:
- kfree(*oid);
-fail_nullify:
- *oid = NULL;
- return false;
-}
-
-static bool oid_eq(unsigned long *oid1, unsigned int oid1len,
- unsigned long *oid2, unsigned int oid2len)
-{
- if (oid1len != oid2len)
- return false;
-
- return memcmp(oid1, oid2, oid1len) == 0;
-}
-
int
ksmbd_decode_negTokenInit(unsigned char *security_blob, int length,
struct ksmbd_conn *conn)
int ksmbd_gssapi_this_mech(void *context, size_t hdrlen, unsigned char tag,
const void *value, size_t vlen)
{
- unsigned long *oid;
- size_t oidlen;
- int err = 0;
-
- if (!asn1_oid_decode(value, vlen, &oid, &oidlen)) {
- err = -EBADMSG;
- goto out;
- }
+ enum OID oid;
- if (!oid_eq(oid, oidlen, SPNEGO_OID, SPNEGO_OID_LEN))
- err = -EBADMSG;
- kfree(oid);
-out:
- if (err) {
+ oid = look_up_OID(value, vlen);
+ if (oid != OID_spnego) {
char buf[50];
sprint_oid(value, vlen, buf, sizeof(buf));
ksmbd_debug(AUTH, "Unexpected OID: %s\n", buf);
+ return -EBADMSG;
}
- return err;
+
+ return 0;
}
int ksmbd_neg_token_init_mech_type(void *context, size_t hdrlen,
size_t vlen)
{
struct ksmbd_conn *conn = context;
- unsigned long *oid;
- size_t oidlen;
+ enum OID oid;
int mech_type;
- char buf[50];
- if (!asn1_oid_decode(value, vlen, &oid, &oidlen))
- goto fail;
-
- if (oid_eq(oid, oidlen, NTLMSSP_OID, NTLMSSP_OID_LEN))
+ oid = look_up_OID(value, vlen);
+ if (oid == OID_ntlmssp) {
mech_type = KSMBD_AUTH_NTLMSSP;
- else if (oid_eq(oid, oidlen, MSKRB5_OID, MSKRB5_OID_LEN))
+ } else if (oid == OID_mskrb5) {
mech_type = KSMBD_AUTH_MSKRB5;
- else if (oid_eq(oid, oidlen, KRB5_OID, KRB5_OID_LEN))
+ } else if (oid == OID_krb5) {
mech_type = KSMBD_AUTH_KRB5;
- else if (oid_eq(oid, oidlen, KRB5U2U_OID, KRB5U2U_OID_LEN))
+ } else if (oid == OID_krb5u2u) {
mech_type = KSMBD_AUTH_KRB5U2U;
- else
- goto fail;
+ } else {
+ char buf[50];
+
+ sprint_oid(value, vlen, buf, sizeof(buf));
+ ksmbd_debug(AUTH, "Unexpected OID: %s\n", buf);
+ return -EBADMSG;
+ }
conn->auth_mechs |= mech_type;
if (conn->preferred_auth_mech == 0)
conn->preferred_auth_mech = mech_type;
- kfree(oid);
return 0;
-
-fail:
- kfree(oid);
- sprint_oid(value, vlen, buf, sizeof(buf));
- ksmbd_debug(AUTH, "Unexpected OID: %s\n", buf);
- return -EBADMSG;
}
int ksmbd_neg_token_init_mech_token(void *context, size_t hdrlen,
* Return: 0 on success, error number on error
*/
int ksmbd_auth_ntlmv2(struct ksmbd_session *sess, struct ntlmv2_resp *ntlmv2,
- int blen, char *domain_name)
+ int blen, char *domain_name, char *cryptkey)
{
char ntlmv2_hash[CIFS_ENCPWD_SIZE];
char ntlmv2_rsp[CIFS_HMAC_MD5_HASH_SIZE];
goto out;
}
- memcpy(construct, sess->ntlmssp.cryptkey, CIFS_CRYPTO_KEY_SIZE);
+ memcpy(construct, cryptkey, CIFS_CRYPTO_KEY_SIZE);
memcpy(construct + CIFS_CRYPTO_KEY_SIZE, &ntlmv2->blob_signature, blen);
rc = crypto_shash_update(CRYPTO_HMACMD5(ctx), construct, len);
* Return: 0 on success, error number on error
*/
int ksmbd_decode_ntlmssp_auth_blob(struct authenticate_message *authblob,
- int blob_len, struct ksmbd_session *sess)
+ int blob_len, struct ksmbd_conn *conn,
+ struct ksmbd_session *sess)
{
char *domain_name;
unsigned int nt_off, dn_off;
/* TODO : use domain name that imported from configuration file */
domain_name = smb_strndup_from_utf16((const char *)authblob + dn_off,
- dn_len, true, sess->conn->local_nls);
+ dn_len, true, conn->local_nls);
if (IS_ERR(domain_name))
return PTR_ERR(domain_name);
domain_name);
ret = ksmbd_auth_ntlmv2(sess, (struct ntlmv2_resp *)((char *)authblob + nt_off),
nt_len - CIFS_ENCPWD_SIZE,
- domain_name);
+ domain_name, conn->ntlmssp.cryptkey);
kfree(domain_name);
return ret;
}
*
*/
int ksmbd_decode_ntlmssp_neg_blob(struct negotiate_message *negblob,
- int blob_len, struct ksmbd_session *sess)
+ int blob_len, struct ksmbd_conn *conn)
{
if (blob_len < sizeof(struct negotiate_message)) {
ksmbd_debug(AUTH, "negotiate blob len %d too small\n",
return -EINVAL;
}
- sess->ntlmssp.client_flags = le32_to_cpu(negblob->NegotiateFlags);
+ conn->ntlmssp.client_flags = le32_to_cpu(negblob->NegotiateFlags);
return 0;
}
*/
unsigned int
ksmbd_build_ntlmssp_challenge_blob(struct challenge_message *chgblob,
- struct ksmbd_session *sess)
+ struct ksmbd_conn *conn)
{
struct target_info *tinfo;
wchar_t *name;
__u8 *target_name;
unsigned int flags, blob_off, blob_len, type, target_info_len = 0;
int len, uni_len, conv_len;
- int cflags = sess->ntlmssp.client_flags;
+ int cflags = conn->ntlmssp.client_flags;
memcpy(chgblob->Signature, NTLMSSP_SIGNATURE, 8);
chgblob->MessageType = NtLmChallenge;
if (cflags & NTLMSSP_REQUEST_TARGET)
flags |= NTLMSSP_REQUEST_TARGET;
- if (sess->conn->use_spnego &&
+ if (conn->use_spnego &&
(cflags & NTLMSSP_NEGOTIATE_EXTENDED_SEC))
flags |= NTLMSSP_NEGOTIATE_EXTENDED_SEC;
return -ENOMEM;
conv_len = smb_strtoUTF16((__le16 *)name, ksmbd_netbios_name(), len,
- sess->conn->local_nls);
+ conn->local_nls);
if (conv_len < 0 || conv_len > len) {
kfree(name);
return -EINVAL;
chgblob->TargetName.BufferOffset = cpu_to_le32(blob_off);
/* Initialize random conn challenge */
- get_random_bytes(sess->ntlmssp.cryptkey, sizeof(__u64));
- memcpy(chgblob->Challenge, sess->ntlmssp.cryptkey,
+ get_random_bytes(conn->ntlmssp.cryptkey, sizeof(__u64));
+ memcpy(chgblob->Challenge, conn->ntlmssp.cryptkey,
CIFS_CRYPTO_KEY_SIZE);
/* Add Target Information to security buffer */
int ksmbd_crypt_message(struct ksmbd_conn *conn, struct kvec *iov,
unsigned int nvec, int enc);
void ksmbd_copy_gss_neg_header(void *buf);
-int ksmbd_auth_ntlm(struct ksmbd_session *sess, char *pw_buf);
int ksmbd_auth_ntlmv2(struct ksmbd_session *sess, struct ntlmv2_resp *ntlmv2,
- int blen, char *domain_name);
+ int blen, char *domain_name, char *cryptkey);
int ksmbd_decode_ntlmssp_auth_blob(struct authenticate_message *authblob,
- int blob_len, struct ksmbd_session *sess);
+ int blob_len, struct ksmbd_conn *conn,
+ struct ksmbd_session *sess);
int ksmbd_decode_ntlmssp_neg_blob(struct negotiate_message *negblob,
- int blob_len, struct ksmbd_session *sess);
+ int blob_len, struct ksmbd_conn *conn);
unsigned int
ksmbd_build_ntlmssp_challenge_blob(struct challenge_message *chgblob,
- struct ksmbd_session *sess);
+ struct ksmbd_conn *conn);
int ksmbd_krb5_authenticate(struct ksmbd_session *sess, char *in_blob,
int in_len, char *out_blob, int *out_len);
int ksmbd_sign_smb2_pdu(struct ksmbd_conn *conn, char *key, struct kvec *iov,
atomic_set(&conn->req_running, 0);
atomic_set(&conn->r_count, 0);
conn->total_credits = 1;
+ conn->outstanding_credits = 1;
init_waitqueue_head(&conn->req_running_q);
INIT_LIST_HEAD(&conn->conns_list);
static void stop_sessions(void)
{
struct ksmbd_conn *conn;
+ struct ksmbd_transport *t;
again:
read_lock(&conn_list_lock);
list_for_each_entry(conn, &conn_list, conns_list) {
struct task_struct *task;
- task = conn->transport->handler;
+ t = conn->transport;
+ task = t->handler;
if (task)
ksmbd_debug(CONN, "Stop session handler %s/%d\n",
task->comm, task_pid_nr(task));
conn->status = KSMBD_SESS_EXITING;
+ if (t->ops->shutdown) {
+ read_unlock(&conn_list_lock);
+ t->ops->shutdown(t);
+ read_lock(&conn_list_lock);
+ }
}
read_unlock(&conn_list_lock);
atomic_t req_running;
/* References which are made for this Server object*/
atomic_t r_count;
- unsigned short total_credits;
- unsigned short max_credits;
+ unsigned int total_credits;
+ unsigned int outstanding_credits;
spinlock_t credits_lock;
wait_queue_head_t req_running_q;
/* Lock to protect requests list*/
int connection_type;
struct ksmbd_stats stats;
char ClientGUID[SMB2_CLIENT_GUID_SIZE];
- union {
- /* pending trans request table */
- struct trans_state *recent_trans;
- /* Used by ntlmssp */
- char *ntlmssp_cryptkey;
- };
+ struct ntlmssp_auth ntlmssp;
spinlock_t llist_lock;
struct list_head lock_list;
struct ksmbd_transport_ops {
int (*prepare)(struct ksmbd_transport *t);
void (*disconnect)(struct ksmbd_transport *t);
+ void (*shutdown)(struct ksmbd_transport *t);
int (*read)(struct ksmbd_transport *t, char *buf, unsigned int size);
int (*writev)(struct ksmbd_transport *t, struct kvec *iovs, int niov,
int size, bool need_invalidate_rkey,
* we set the SPARSE_FILES bit (0x40).
*/
__u32 sub_auth[3]; /* Subauth value for Security ID */
+ __u32 smb2_max_credits; /* MAX credits */
+ __u32 reserved[128]; /* Reserved room */
__u32 ifc_list_sz; /* interfaces list size */
__s8 ____payload[];
};
* IPC request to shutdown ksmbd server.
*/
struct ksmbd_shutdown_request {
- __s32 reserved;
+ __s32 reserved[16];
};
/*
struct ksmbd_login_request {
__u32 handle;
__s8 account[KSMBD_REQ_MAX_ACCOUNT_NAME_SZ]; /* user account name */
+ __u32 reserved[16]; /* Reserved room */
};
/*
__u16 status;
__u16 hash_sz; /* hash size */
__s8 hash[KSMBD_REQ_MAX_HASH_SZ]; /* password hash */
+ __u32 reserved[16]; /* Reserved room */
};
/*
struct ksmbd_share_config_request {
__u32 handle;
__s8 share_name[KSMBD_REQ_MAX_SHARE_NAME]; /* share name */
+ __u32 reserved[16]; /* Reserved room */
};
/*
__u16 force_directory_mode;
__u16 force_uid;
__u16 force_gid;
+ __u32 reserved[128]; /* Reserved room */
__u32 veto_list_sz;
__s8 ____payload[];
};
__s8 account[KSMBD_REQ_MAX_ACCOUNT_NAME_SZ];
__s8 share[KSMBD_REQ_MAX_SHARE_NAME];
__s8 peer_addr[64];
+ __u32 reserved[16]; /* Reserved room */
};
/*
__u32 handle;
__u16 status;
__u16 connection_flags;
+ __u32 reserved[16]; /* Reserved room */
};
/*
struct ksmbd_tree_disconnect_request {
__u64 session_id; /* session id */
__u64 connect_id; /* tree connection id */
+ __u32 reserved[16]; /* Reserved room */
};
/*
struct ksmbd_logout_request {
__s8 account[KSMBD_REQ_MAX_ACCOUNT_NAME_SZ]; /* user account name */
__u32 account_flags;
+ __u32 reserved[16]; /* Reserved room */
};
/*
return 1;
return 0;
}
+
+bool ksmbd_compare_user(struct ksmbd_user *u1, struct ksmbd_user *u2)
+{
+ if (strcmp(u1->name, u2->name))
+ return false;
+ if (memcmp(u1->passkey, u2->passkey, u1->passkey_sz))
+ return false;
+
+ return true;
+}
struct ksmbd_user *ksmbd_alloc_user(struct ksmbd_login_response *resp);
void ksmbd_free_user(struct ksmbd_user *user);
int ksmbd_anonymous_user(struct ksmbd_user *user);
+bool ksmbd_compare_user(struct ksmbd_user *u1, struct ksmbd_user *u2);
#endif /* __USER_CONFIG_MANAGEMENT_H__ */
int state;
__u8 *Preauth_HashValue;
- struct ntlmssp_auth ntlmssp;
char sess_key[CIFS_KEY_SIZE];
struct hlist_node hlist;
unsigned int req_len = 0, expect_resp_len = 0, calc_credit_num, max_len;
unsigned short credit_charge = le16_to_cpu(hdr->CreditCharge);
void *__hdr = hdr;
- int ret;
+ int ret = 0;
switch (hdr->Command) {
case SMB2_QUERY_INFO:
ksmbd_debug(SMB, "Insufficient credit charge, given: %d, needed: %d\n",
credit_charge, calc_credit_num);
return 1;
- } else if (credit_charge > conn->max_credits) {
+ } else if (credit_charge > conn->vals->max_credits) {
ksmbd_debug(SMB, "Too large credit charge: %d\n", credit_charge);
return 1;
}
spin_lock(&conn->credits_lock);
- if (credit_charge <= conn->total_credits) {
- conn->total_credits -= credit_charge;
- ret = 0;
- } else {
+ if (credit_charge > conn->total_credits) {
ksmbd_debug(SMB, "Insufficient credits granted, given: %u, granted: %u\n",
credit_charge, conn->total_credits);
ret = 1;
}
+
+ if ((u64)conn->outstanding_credits + credit_charge > conn->vals->max_credits) {
+ ksmbd_debug(SMB, "Limits exceeding the maximum allowable outstanding requests, given : %u, pending : %u\n",
+ credit_charge, conn->outstanding_credits);
+ ret = 1;
+ } else
+ conn->outstanding_credits += credit_charge;
+
spin_unlock(&conn->credits_lock);
+
return ret;
}
.max_read_size = SMB21_DEFAULT_IOSIZE,
.max_write_size = SMB21_DEFAULT_IOSIZE,
.max_trans_size = SMB21_DEFAULT_IOSIZE,
+ .max_credits = SMB2_MAX_CREDITS,
.large_lock_type = 0,
.exclusive_lock_type = SMB2_LOCKFLAG_EXCLUSIVE,
.shared_lock_type = SMB2_LOCKFLAG_SHARED,
.max_read_size = SMB3_DEFAULT_IOSIZE,
.max_write_size = SMB3_DEFAULT_IOSIZE,
.max_trans_size = SMB3_DEFAULT_TRANS_SIZE,
+ .max_credits = SMB2_MAX_CREDITS,
.large_lock_type = 0,
.exclusive_lock_type = SMB2_LOCKFLAG_EXCLUSIVE,
.shared_lock_type = SMB2_LOCKFLAG_SHARED,
.max_read_size = SMB3_DEFAULT_IOSIZE,
.max_write_size = SMB3_DEFAULT_IOSIZE,
.max_trans_size = SMB3_DEFAULT_TRANS_SIZE,
+ .max_credits = SMB2_MAX_CREDITS,
.large_lock_type = 0,
.exclusive_lock_type = SMB2_LOCKFLAG_EXCLUSIVE,
.shared_lock_type = SMB2_LOCKFLAG_SHARED,
.max_read_size = SMB3_DEFAULT_IOSIZE,
.max_write_size = SMB3_DEFAULT_IOSIZE,
.max_trans_size = SMB3_DEFAULT_TRANS_SIZE,
+ .max_credits = SMB2_MAX_CREDITS,
.large_lock_type = 0,
.exclusive_lock_type = SMB2_LOCKFLAG_EXCLUSIVE,
.shared_lock_type = SMB2_LOCKFLAG_SHARED,
conn->ops = &smb2_0_server_ops;
conn->cmds = smb2_0_server_cmds;
conn->max_cmds = ARRAY_SIZE(smb2_0_server_cmds);
- conn->max_credits = SMB2_MAX_CREDITS;
conn->signing_algorithm = SIGNING_ALG_HMAC_SHA256_LE;
if (server_conf.flags & KSMBD_GLOBAL_FLAG_SMB2_LEASES)
conn->ops = &smb3_0_server_ops;
conn->cmds = smb2_0_server_cmds;
conn->max_cmds = ARRAY_SIZE(smb2_0_server_cmds);
- conn->max_credits = SMB2_MAX_CREDITS;
conn->signing_algorithm = SIGNING_ALG_AES_CMAC_LE;
if (server_conf.flags & KSMBD_GLOBAL_FLAG_SMB2_LEASES)
conn->ops = &smb3_0_server_ops;
conn->cmds = smb2_0_server_cmds;
conn->max_cmds = ARRAY_SIZE(smb2_0_server_cmds);
- conn->max_credits = SMB2_MAX_CREDITS;
conn->signing_algorithm = SIGNING_ALG_AES_CMAC_LE;
if (server_conf.flags & KSMBD_GLOBAL_FLAG_SMB2_LEASES)
conn->ops = &smb3_11_server_ops;
conn->cmds = smb2_0_server_cmds;
conn->max_cmds = ARRAY_SIZE(smb2_0_server_cmds);
- conn->max_credits = SMB2_MAX_CREDITS;
conn->signing_algorithm = SIGNING_ALG_AES_CMAC_LE;
if (server_conf.flags & KSMBD_GLOBAL_FLAG_SMB2_LEASES)
smb302_server_values.max_trans_size = sz;
smb311_server_values.max_trans_size = sz;
}
+
+void init_smb2_max_credits(unsigned int sz)
+{
+ smb21_server_values.max_credits = sz;
+ smb30_server_values.max_credits = sz;
+ smb302_server_values.max_credits = sz;
+ smb311_server_values.max_credits = sz;
+}
struct smb2_hdr *req_hdr = ksmbd_req_buf_next(work);
struct smb2_hdr *hdr = ksmbd_resp_buf_next(work);
struct ksmbd_conn *conn = work->conn;
- unsigned short credits_requested;
+ unsigned short credits_requested, aux_max;
unsigned short credit_charge, credits_granted = 0;
- unsigned short aux_max, aux_credits;
if (work->send_no_response)
return 0;
hdr->CreditCharge = req_hdr->CreditCharge;
- if (conn->total_credits > conn->max_credits) {
+ if (conn->total_credits > conn->vals->max_credits) {
hdr->CreditRequest = 0;
pr_err("Total credits overflow: %d\n", conn->total_credits);
return -EINVAL;
credit_charge = max_t(unsigned short,
le16_to_cpu(req_hdr->CreditCharge), 1);
+ if (credit_charge > conn->total_credits) {
+ ksmbd_debug(SMB, "Insufficient credits granted, given: %u, granted: %u\n",
+ credit_charge, conn->total_credits);
+ return -EINVAL;
+ }
+
+ conn->total_credits -= credit_charge;
+ conn->outstanding_credits -= credit_charge;
credits_requested = max_t(unsigned short,
le16_to_cpu(req_hdr->CreditRequest), 1);
* TODO: Need to adjuct CreditRequest value according to
* current cpu load
*/
- aux_credits = credits_requested - 1;
if (hdr->Command == SMB2_NEGOTIATE)
- aux_max = 0;
+ aux_max = 1;
else
- aux_max = conn->max_credits - credit_charge;
- aux_credits = min_t(unsigned short, aux_credits, aux_max);
- credits_granted = credit_charge + aux_credits;
+ aux_max = conn->vals->max_credits - credit_charge;
+ credits_granted = min_t(unsigned short, credits_requested, aux_max);
- if (conn->max_credits - conn->total_credits < credits_granted)
- credits_granted = conn->max_credits -
+ if (conn->vals->max_credits - conn->total_credits < credits_granted)
+ credits_granted = conn->vals->max_credits -
conn->total_credits;
conn->total_credits += credits_granted;
/**
* smb2_get_name() - get filename string from on the wire smb format
- * @share: ksmbd_share_config pointer
* @src: source buffer
* @maxlen: maxlen of source string
- * @nls_table: nls_table pointer
+ * @local_nls: nls_table pointer
*
* Return: matching converted filename on success, otherwise error ptr
*/
static char *
-smb2_get_name(struct ksmbd_share_config *share, const char *src,
- const int maxlen, struct nls_table *local_nls)
+smb2_get_name(const char *src, const int maxlen, struct nls_table *local_nls)
{
char *name;
int sz, rc;
ksmbd_debug(SMB, "negotiate phase\n");
- rc = ksmbd_decode_ntlmssp_neg_blob(negblob, negblob_len, work->sess);
+ rc = ksmbd_decode_ntlmssp_neg_blob(negblob, negblob_len, work->conn);
if (rc)
return rc;
memset(chgblob, 0, sizeof(struct challenge_message));
if (!work->conn->use_spnego) {
- sz = ksmbd_build_ntlmssp_challenge_blob(chgblob, work->sess);
+ sz = ksmbd_build_ntlmssp_challenge_blob(chgblob, work->conn);
if (sz < 0)
return -ENOMEM;
return -ENOMEM;
chgblob = (struct challenge_message *)neg_blob;
- sz = ksmbd_build_ntlmssp_challenge_blob(chgblob, work->sess);
+ sz = ksmbd_build_ntlmssp_challenge_blob(chgblob, work->conn);
if (sz < 0) {
rc = -ENOMEM;
goto out;
ksmbd_free_user(user);
return 0;
}
- ksmbd_free_user(sess->user);
- }
- sess->user = user;
- if (user_guest(sess->user)) {
- if (conn->sign) {
- ksmbd_debug(SMB, "Guest login not allowed when signing enabled\n");
+ if (!ksmbd_compare_user(sess->user, user)) {
+ ksmbd_free_user(user);
return -EPERM;
}
+ ksmbd_free_user(user);
+ } else {
+ sess->user = user;
+ }
+ if (user_guest(sess->user)) {
rsp->SessionFlags = SMB2_SESSION_FLAG_IS_GUEST_LE;
} else {
struct authenticate_message *authblob;
authblob = user_authblob(conn, req);
sz = le16_to_cpu(req->SecurityBufferLength);
- rc = ksmbd_decode_ntlmssp_auth_blob(authblob, sz, sess);
+ rc = ksmbd_decode_ntlmssp_auth_blob(authblob, sz, conn, sess);
if (rc) {
set_user_flag(sess->user, KSMBD_USER_FLAG_BAD_PASSWORD);
ksmbd_debug(SMB, "authentication failed\n");
return -EPERM;
}
+ }
- /*
- * If session state is SMB2_SESSION_VALID, We can assume
- * that it is reauthentication. And the user/password
- * has been verified, so return it here.
- */
- if (sess->state == SMB2_SESSION_VALID) {
- if (conn->binding)
- goto binding_session;
- return 0;
- }
+ /*
+ * If session state is SMB2_SESSION_VALID, We can assume
+ * that it is reauthentication. And the user/password
+ * has been verified, so return it here.
+ */
+ if (sess->state == SMB2_SESSION_VALID) {
+ if (conn->binding)
+ goto binding_session;
+ return 0;
+ }
- if ((conn->sign || server_conf.enforced_signing) ||
- (req->SecurityMode & SMB2_NEGOTIATE_SIGNING_REQUIRED))
- sess->sign = true;
+ if ((rsp->SessionFlags != SMB2_SESSION_FLAG_IS_GUEST_LE &&
+ (conn->sign || server_conf.enforced_signing)) ||
+ (req->SecurityMode & SMB2_NEGOTIATE_SIGNING_REQUIRED))
+ sess->sign = true;
- if (smb3_encryption_negotiated(conn) &&
- !(req->Flags & SMB2_SESSION_REQ_FLAG_BINDING)) {
- rc = conn->ops->generate_encryptionkey(sess);
- if (rc) {
- ksmbd_debug(SMB,
- "SMB3 encryption key generation failed\n");
- return -EINVAL;
- }
- sess->enc = true;
- rsp->SessionFlags = SMB2_SESSION_FLAG_ENCRYPT_DATA_LE;
- /*
- * signing is disable if encryption is enable
- * on this session
- */
- sess->sign = false;
+ if (smb3_encryption_negotiated(conn) &&
+ !(req->Flags & SMB2_SESSION_REQ_FLAG_BINDING)) {
+ rc = conn->ops->generate_encryptionkey(sess);
+ if (rc) {
+ ksmbd_debug(SMB,
+ "SMB3 encryption key generation failed\n");
+ return -EINVAL;
}
+ sess->enc = true;
+ rsp->SessionFlags = SMB2_SESSION_FLAG_ENCRYPT_DATA_LE;
+ /*
+ * signing is disable if encryption is enable
+ * on this session
+ */
+ sess->sign = false;
}
binding_session:
ksmbd_debug(SMB, "request\n");
- /* Got a valid session, set connection state */
- WARN_ON(sess->conn != conn);
-
/* setting CifsExiting here may race with start_tcp_sess */
ksmbd_conn_set_need_reconnect(work);
ksmbd_close_session_fds(work);
goto err_out1;
}
- name = smb2_get_name(share,
- req->Buffer,
+ name = smb2_get_name(req->Buffer,
le16_to_cpu(req->NameLength),
work->conn->local_nls);
if (IS_ERR(name)) {
* @conn: connection instance
* @info_level: smb information level
* @d_info: structure included variables for query dir
- * @user_ns: user namespace
* @ksmbd_kstat: ksmbd wrapper of dirent stat information
*
* if directory has many entries, find first can't read it fully.
* buffer_check_err() - helper function to check buffer errors
* @reqOutputBufferLength: max buffer length expected in command response
* @rsp: query info response buffer contains output buffer length
+ * @rsp_org: base response buffer pointer in case of chained response
* @infoclass_size: query info class response buffer size
*
* Return: 0 on success, otherwise error
goto out;
}
- new_name = smb2_get_name(share,
- file_info->FileName,
+ new_name = smb2_get_name(file_info->FileName,
le32_to_cpu(file_info->FileNameLength),
local_nls);
if (IS_ERR(new_name)) {
if (!pathname)
return -ENOMEM;
- link_name = smb2_get_name(share,
- file_info->FileName,
+ link_name = smb2_get_name(file_info->FileName,
le32_to_cpu(file_info->FileNameLength),
local_nls);
if (IS_ERR(link_name) || S_ISDIR(file_inode(filp)->i_mode)) {
* smb2_set_info_file() - handler for smb2 set info command
* @work: smb work containing set info command buffer
* @fp: ksmbd_file pointer
- * @info_class: smb2 set info class
+ * @req: request buffer pointer
* @share: ksmbd_share_config pointer
*
* Return: 0 on success, otherwise error
return err;
}
-static ssize_t smb2_read_rdma_channel(struct ksmbd_work *work,
- struct smb2_read_req *req, void *data_buf,
- size_t length)
+static int smb2_set_remote_key_for_rdma(struct ksmbd_work *work,
+ struct smb2_buffer_desc_v1 *desc,
+ __le32 Channel,
+ __le16 ChannelInfoOffset,
+ __le16 ChannelInfoLength)
{
- struct smb2_buffer_desc_v1 *desc =
- (struct smb2_buffer_desc_v1 *)&req->Buffer[0];
- int err;
-
if (work->conn->dialect == SMB30_PROT_ID &&
- req->Channel != SMB2_CHANNEL_RDMA_V1)
+ Channel != SMB2_CHANNEL_RDMA_V1)
return -EINVAL;
- if (req->ReadChannelInfoOffset == 0 ||
- le16_to_cpu(req->ReadChannelInfoLength) < sizeof(*desc))
+ if (ChannelInfoOffset == 0 ||
+ le16_to_cpu(ChannelInfoLength) < sizeof(*desc))
return -EINVAL;
work->need_invalidate_rkey =
- (req->Channel == SMB2_CHANNEL_RDMA_V1_INVALIDATE);
+ (Channel == SMB2_CHANNEL_RDMA_V1_INVALIDATE);
work->remote_key = le32_to_cpu(desc->token);
+ return 0;
+}
+
+static ssize_t smb2_read_rdma_channel(struct ksmbd_work *work,
+ struct smb2_read_req *req, void *data_buf,
+ size_t length)
+{
+ struct smb2_buffer_desc_v1 *desc =
+ (struct smb2_buffer_desc_v1 *)&req->Buffer[0];
+ int err;
err = ksmbd_conn_rdma_write(work->conn, data_buf, length,
le32_to_cpu(desc->token),
struct ksmbd_conn *conn = work->conn;
struct smb2_read_req *req;
struct smb2_read_rsp *rsp;
- struct ksmbd_file *fp;
+ struct ksmbd_file *fp = NULL;
loff_t offset;
size_t length, mincount;
ssize_t nbytes = 0, remain_bytes = 0;
return smb2_read_pipe(work);
}
+ if (req->Channel == SMB2_CHANNEL_RDMA_V1_INVALIDATE ||
+ req->Channel == SMB2_CHANNEL_RDMA_V1) {
+ err = smb2_set_remote_key_for_rdma(work,
+ (struct smb2_buffer_desc_v1 *)
+ &req->Buffer[0],
+ req->Channel,
+ req->ReadChannelInfoOffset,
+ req->ReadChannelInfoLength);
+ if (err)
+ goto out;
+ }
+
fp = ksmbd_lookup_fd_slow(work, le64_to_cpu(req->VolatileFileId),
le64_to_cpu(req->PersistentFileId));
if (!fp) {
desc = (struct smb2_buffer_desc_v1 *)&req->Buffer[0];
- if (work->conn->dialect == SMB30_PROT_ID &&
- req->Channel != SMB2_CHANNEL_RDMA_V1)
- return -EINVAL;
-
- if (req->Length != 0 || req->DataOffset != 0)
- return -EINVAL;
-
- if (req->WriteChannelInfoOffset == 0 ||
- le16_to_cpu(req->WriteChannelInfoLength) < sizeof(*desc))
- return -EINVAL;
-
- work->need_invalidate_rkey =
- (req->Channel == SMB2_CHANNEL_RDMA_V1_INVALIDATE);
- work->remote_key = le32_to_cpu(desc->token);
-
data_buf = kvmalloc(length, GFP_KERNEL | __GFP_ZERO);
if (!data_buf)
return -ENOMEM;
return smb2_write_pipe(work);
}
+ if (req->Channel == SMB2_CHANNEL_RDMA_V1 ||
+ req->Channel == SMB2_CHANNEL_RDMA_V1_INVALIDATE) {
+ if (req->Length != 0 || req->DataOffset != 0)
+ return -EINVAL;
+ err = smb2_set_remote_key_for_rdma(work,
+ (struct smb2_buffer_desc_v1 *)
+ &req->Buffer[0],
+ req->Channel,
+ req->WriteChannelInfoOffset,
+ req->WriteChannelInfoLength);
+ if (err)
+ goto out;
+ }
+
if (!test_tree_conn_flag(work->tcon, KSMBD_TREE_CONN_FLAG_WRITABLE)) {
ksmbd_debug(SMB, "User does not have write permission\n");
err = -EACCES;
struct sockaddr_storage_rsp *sockaddr_storage;
unsigned int flags;
unsigned long long speed;
- struct sockaddr_in6 *csin6 = (struct sockaddr_in6 *)&conn->peer_addr;
rtnl_lock();
for_each_netdev(&init_net, netdev) {
- if (out_buf_len <
- nbytes + sizeof(struct network_interface_info_ioctl_rsp)) {
- rtnl_unlock();
- return -ENOSPC;
- }
+ bool ipv4_set = false;
if (netdev->type == ARPHRD_LOOPBACK)
continue;
flags = dev_get_flags(netdev);
if (!(flags & IFF_RUNNING))
continue;
+ipv6_retry:
+ if (out_buf_len <
+ nbytes + sizeof(struct network_interface_info_ioctl_rsp)) {
+ rtnl_unlock();
+ return -ENOSPC;
+ }
nii_rsp = (struct network_interface_info_ioctl_rsp *)
&rsp->Buffer[nbytes];
nii_rsp->IfIndex = cpu_to_le32(netdev->ifindex);
nii_rsp->Capability = 0;
+ if (netdev->real_num_tx_queues > 1)
+ nii_rsp->Capability |= cpu_to_le32(RSS_CAPABLE);
if (ksmbd_rdma_capable_netdev(netdev))
nii_rsp->Capability |= cpu_to_le32(RDMA_CAPABLE);
nii_rsp->SockAddr_Storage;
memset(sockaddr_storage, 0, 128);
- if (conn->peer_addr.ss_family == PF_INET ||
- ipv6_addr_v4mapped(&csin6->sin6_addr)) {
+ if (!ipv4_set) {
struct in_device *idev;
sockaddr_storage->Family = cpu_to_le16(INTERNETWORK);
continue;
sockaddr_storage->addr4.IPv4address =
idev_ipv4_address(idev);
+ nbytes += sizeof(struct network_interface_info_ioctl_rsp);
+ ipv4_set = true;
+ goto ipv6_retry;
} else {
struct inet6_dev *idev6;
struct inet6_ifaddr *ifa;
break;
}
sockaddr_storage->addr6.ScopeId = 0;
+ nbytes += sizeof(struct network_interface_info_ioctl_rsp);
}
-
- nbytes += sizeof(struct network_interface_info_ioctl_rsp);
}
rtnl_unlock();
void init_smb2_max_read_size(unsigned int sz);
void init_smb2_max_write_size(unsigned int sz);
void init_smb2_max_trans_size(unsigned int sz);
+void init_smb2_max_credits(unsigned int sz);
bool is_smb2_neg_cmd(struct ksmbd_work *work);
bool is_smb2_rsp(struct ksmbd_work *work);
__u32 max_read_size;
__u32 max_write_size;
__u32 max_trans_size;
+ __u32 max_credits;
__u32 large_lock_type;
__u32 exclusive_lock_type;
__u32 shared_lock_type;
init_smb2_max_write_size(req->smb2_max_write);
if (req->smb2_max_trans)
init_smb2_max_trans_size(req->smb2_max_trans);
+ if (req->smb2_max_credits)
+ init_smb2_max_credits(req->smb2_max_credits);
ret = ksmbd_set_netbios_name(req->netbios_name);
ret |= ksmbd_set_server_string(req->server_string);
#include "smbstatus.h"
#include "transport_rdma.h"
-#define SMB_DIRECT_PORT 5445
+#define SMB_DIRECT_PORT_IWARP 5445
+#define SMB_DIRECT_PORT_INFINIBAND 445
#define SMB_DIRECT_VERSION_LE cpu_to_le16(0x0100)
* as defined in [MS-SMBD] 3.1.1.1
* Those may change after a SMB_DIRECT negotiation
*/
+
+/* Set 445 port to SMB Direct port by default */
+static int smb_direct_port = SMB_DIRECT_PORT_INFINIBAND;
+
/* The local peer's maximum number of credits to grant to the peer */
static int smb_direct_receive_credit_max = 255;
/* The maximum single-message size which can be received */
static int smb_direct_max_receive_size = 8192;
-static int smb_direct_max_read_write_size = 1024 * 1024;
+static int smb_direct_max_read_write_size = 1048512;
static int smb_direct_max_outstanding_rw_ops = 8;
+static LIST_HEAD(smb_direct_device_list);
+static DEFINE_RWLOCK(smb_direct_device_lock);
+
+struct smb_direct_device {
+ struct ib_device *ib_dev;
+ struct list_head list;
+};
+
static struct smb_direct_listener {
struct rdma_cm_id *cm_id;
} smb_direct_listener;
if (t->qp) {
ib_drain_qp(t->qp);
+ ib_mr_pool_destroy(t->qp, &t->qp->rdma_mrs);
ib_destroy_qp(t->qp);
}
}
t->negotiation_requested = true;
t->full_packet_received = true;
+ enqueue_reassembly(t, recvmsg, 0);
wake_up_interruptible(&t->wait_status);
break;
case SMB_DIRECT_MSG_DATA_TRANSFER: {
free_transport(st);
}
+static void smb_direct_shutdown(struct ksmbd_transport *t)
+{
+ struct smb_direct_transport *st = smb_trans_direct_transfort(t);
+
+ ksmbd_debug(RDMA, "smb-direct shutdown cm_id=%p\n", st->cm_id);
+
+ smb_direct_disconnect_rdma_work(&st->disconnect_work);
+}
+
static int smb_direct_cm_handler(struct rdma_cm_id *cm_id,
struct rdma_cm_event *event)
{
pr_err("error at rdma_accept: %d\n", ret);
return ret;
}
-
- wait_event_interruptible(t->wait_status,
- t->status != SMB_DIRECT_CS_NEW);
- if (t->status != SMB_DIRECT_CS_CONNECTED)
- return -ENOTCONN;
return 0;
}
-static int smb_direct_negotiate(struct smb_direct_transport *t)
+static int smb_direct_prepare_negotiation(struct smb_direct_transport *t)
{
int ret;
struct smb_direct_recvmsg *recvmsg;
- struct smb_direct_negotiate_req *req;
recvmsg = get_free_recvmsg(t);
if (!recvmsg)
ret = smb_direct_post_recv(t, recvmsg);
if (ret) {
pr_err("Can't post recv: %d\n", ret);
- goto out;
+ goto out_err;
}
t->negotiation_requested = false;
ret = smb_direct_accept_client(t);
if (ret) {
pr_err("Can't accept client\n");
- goto out;
+ goto out_err;
}
smb_direct_post_recv_credits(&t->post_recv_credits_work.work);
-
- ksmbd_debug(RDMA, "Waiting for SMB_DIRECT negotiate request\n");
- ret = wait_event_interruptible_timeout(t->wait_status,
- t->negotiation_requested ||
- t->status == SMB_DIRECT_CS_DISCONNECTED,
- SMB_DIRECT_NEGOTIATE_TIMEOUT * HZ);
- if (ret <= 0 || t->status == SMB_DIRECT_CS_DISCONNECTED) {
- ret = ret < 0 ? ret : -ETIMEDOUT;
- goto out;
- }
-
- ret = smb_direct_check_recvmsg(recvmsg);
- if (ret == -ECONNABORTED)
- goto out;
-
- req = (struct smb_direct_negotiate_req *)recvmsg->packet;
- t->max_recv_size = min_t(int, t->max_recv_size,
- le32_to_cpu(req->preferred_send_size));
- t->max_send_size = min_t(int, t->max_send_size,
- le32_to_cpu(req->max_receive_size));
- t->max_fragmented_send_size =
- le32_to_cpu(req->max_fragmented_size);
-
- ret = smb_direct_send_negotiate_response(t, ret);
-out:
- if (recvmsg)
- put_recvmsg(t, recvmsg);
+ return 0;
+out_err:
+ put_recvmsg(t, recvmsg);
return ret;
}
cap->max_send_sge = SMB_DIRECT_MAX_SEND_SGES;
cap->max_recv_sge = SMB_DIRECT_MAX_RECV_SGES;
cap->max_inline_data = 0;
- cap->max_rdma_ctxs = 0;
+ cap->max_rdma_ctxs =
+ rdma_rw_mr_factor(device, t->cm_id->port_num, max_pages) *
+ smb_direct_max_outstanding_rw_ops;
return 0;
}
{
int ret;
struct ib_qp_init_attr qp_attr;
+ int pages_per_rw;
t->pd = ib_alloc_pd(t->cm_id->device, 0);
if (IS_ERR(t->pd)) {
t->qp = t->cm_id->qp;
t->cm_id->event_handler = smb_direct_cm_handler;
+ pages_per_rw = DIV_ROUND_UP(t->max_rdma_rw_size, PAGE_SIZE) + 1;
+ if (pages_per_rw > t->cm_id->device->attrs.max_sgl_rd) {
+ int pages_per_mr, mr_count;
+
+ pages_per_mr = min_t(int, pages_per_rw,
+ t->cm_id->device->attrs.max_fast_reg_page_list_len);
+ mr_count = DIV_ROUND_UP(pages_per_rw, pages_per_mr) *
+ atomic_read(&t->rw_avail_ops);
+ ret = ib_mr_pool_init(t->qp, &t->qp->rdma_mrs, mr_count,
+ IB_MR_TYPE_MEM_REG, pages_per_mr, 0);
+ if (ret) {
+ pr_err("failed to init mr pool count %d pages %d\n",
+ mr_count, pages_per_mr);
+ goto err;
+ }
+ }
+
return 0;
err:
if (t->qp) {
static int smb_direct_prepare(struct ksmbd_transport *t)
{
struct smb_direct_transport *st = smb_trans_direct_transfort(t);
+ struct smb_direct_recvmsg *recvmsg;
+ struct smb_direct_negotiate_req *req;
+ int ret;
+
+ ksmbd_debug(RDMA, "Waiting for SMB_DIRECT negotiate request\n");
+ ret = wait_event_interruptible_timeout(st->wait_status,
+ st->negotiation_requested ||
+ st->status == SMB_DIRECT_CS_DISCONNECTED,
+ SMB_DIRECT_NEGOTIATE_TIMEOUT * HZ);
+ if (ret <= 0 || st->status == SMB_DIRECT_CS_DISCONNECTED)
+ return ret < 0 ? ret : -ETIMEDOUT;
+
+ recvmsg = get_first_reassembly(st);
+ if (!recvmsg)
+ return -ECONNABORTED;
+
+ ret = smb_direct_check_recvmsg(recvmsg);
+ if (ret == -ECONNABORTED)
+ goto out;
+
+ req = (struct smb_direct_negotiate_req *)recvmsg->packet;
+ st->max_recv_size = min_t(int, st->max_recv_size,
+ le32_to_cpu(req->preferred_send_size));
+ st->max_send_size = min_t(int, st->max_send_size,
+ le32_to_cpu(req->max_receive_size));
+ st->max_fragmented_send_size =
+ le32_to_cpu(req->max_fragmented_size);
+ st->max_fragmented_recv_size =
+ (st->recv_credit_max * st->max_recv_size) / 2;
+
+ ret = smb_direct_send_negotiate_response(st, ret);
+out:
+ spin_lock_irq(&st->reassembly_queue_lock);
+ st->reassembly_queue_length--;
+ list_del(&recvmsg->list);
+ spin_unlock_irq(&st->reassembly_queue_lock);
+ put_recvmsg(st, recvmsg);
+
+ return ret;
+}
+
+static int smb_direct_connect(struct smb_direct_transport *st)
+{
int ret;
struct ib_qp_cap qp_cap;
return ret;
}
- ret = smb_direct_negotiate(st);
+ ret = smb_direct_prepare_negotiation(st);
if (ret) {
pr_err("Can't negotiate: %d\n", ret);
return ret;
}
-
- st->status = SMB_DIRECT_CS_CONNECTED;
return 0;
}
static int smb_direct_handle_connect_request(struct rdma_cm_id *new_cm_id)
{
struct smb_direct_transport *t;
+ int ret;
if (!rdma_frwr_is_supported(&new_cm_id->device->attrs)) {
ksmbd_debug(RDMA,
if (!t)
return -ENOMEM;
+ ret = smb_direct_connect(t);
+ if (ret)
+ goto out_err;
+
KSMBD_TRANS(t)->handler = kthread_run(ksmbd_conn_handler_loop,
KSMBD_TRANS(t)->conn, "ksmbd:r%u",
- SMB_DIRECT_PORT);
+ smb_direct_port);
if (IS_ERR(KSMBD_TRANS(t)->handler)) {
- int ret = PTR_ERR(KSMBD_TRANS(t)->handler);
-
+ ret = PTR_ERR(KSMBD_TRANS(t)->handler);
pr_err("Can't start thread\n");
- free_transport(t);
- return ret;
+ goto out_err;
}
return 0;
+out_err:
+ free_transport(t);
+ return ret;
}
static int smb_direct_listen_handler(struct rdma_cm_id *cm_id,
return ret;
}
+static int smb_direct_ib_client_add(struct ib_device *ib_dev)
+{
+ struct smb_direct_device *smb_dev;
+
+ /* Set 5445 port if device type is iWARP(No IB) */
+ if (ib_dev->node_type != RDMA_NODE_IB_CA)
+ smb_direct_port = SMB_DIRECT_PORT_IWARP;
+
+ if (!ib_dev->ops.get_netdev ||
+ !rdma_frwr_is_supported(&ib_dev->attrs))
+ return 0;
+
+ smb_dev = kzalloc(sizeof(*smb_dev), GFP_KERNEL);
+ if (!smb_dev)
+ return -ENOMEM;
+ smb_dev->ib_dev = ib_dev;
+
+ write_lock(&smb_direct_device_lock);
+ list_add(&smb_dev->list, &smb_direct_device_list);
+ write_unlock(&smb_direct_device_lock);
+
+ ksmbd_debug(RDMA, "ib device added: name %s\n", ib_dev->name);
+ return 0;
+}
+
+static void smb_direct_ib_client_remove(struct ib_device *ib_dev,
+ void *client_data)
+{
+ struct smb_direct_device *smb_dev, *tmp;
+
+ write_lock(&smb_direct_device_lock);
+ list_for_each_entry_safe(smb_dev, tmp, &smb_direct_device_list, list) {
+ if (smb_dev->ib_dev == ib_dev) {
+ list_del(&smb_dev->list);
+ kfree(smb_dev);
+ break;
+ }
+ }
+ write_unlock(&smb_direct_device_lock);
+}
+
+static struct ib_client smb_direct_ib_client = {
+ .name = "ksmbd_smb_direct_ib",
+ .add = smb_direct_ib_client_add,
+ .remove = smb_direct_ib_client_remove,
+};
+
int ksmbd_rdma_init(void)
{
int ret;
smb_direct_listener.cm_id = NULL;
+ ret = ib_register_client(&smb_direct_ib_client);
+ if (ret) {
+ pr_err("failed to ib_register_client\n");
+ return ret;
+ }
+
/* When a client is running out of send credits, the credits are
* granted by the server's sending a packet using this queue.
* This avoids the situation that a clients cannot send packets
if (!smb_direct_wq)
return -ENOMEM;
- ret = smb_direct_listen(SMB_DIRECT_PORT);
+ ret = smb_direct_listen(smb_direct_port);
if (ret) {
destroy_workqueue(smb_direct_wq);
smb_direct_wq = NULL;
return 0;
}
-int ksmbd_rdma_destroy(void)
+void ksmbd_rdma_destroy(void)
{
- if (smb_direct_listener.cm_id)
- rdma_destroy_id(smb_direct_listener.cm_id);
+ if (!smb_direct_listener.cm_id)
+ return;
+
+ ib_unregister_client(&smb_direct_ib_client);
+ rdma_destroy_id(smb_direct_listener.cm_id);
+
smb_direct_listener.cm_id = NULL;
if (smb_direct_wq) {
destroy_workqueue(smb_direct_wq);
smb_direct_wq = NULL;
}
- return 0;
}
bool ksmbd_rdma_capable_netdev(struct net_device *netdev)
{
- struct ib_device *ibdev;
+ struct smb_direct_device *smb_dev;
+ int i;
bool rdma_capable = false;
- ibdev = ib_device_get_by_netdev(netdev, RDMA_DRIVER_UNKNOWN);
- if (ibdev) {
- if (rdma_frwr_is_supported(&ibdev->attrs))
- rdma_capable = true;
- ib_device_put(ibdev);
+ read_lock(&smb_direct_device_lock);
+ list_for_each_entry(smb_dev, &smb_direct_device_list, list) {
+ for (i = 0; i < smb_dev->ib_dev->phys_port_cnt; i++) {
+ struct net_device *ndev;
+
+ ndev = smb_dev->ib_dev->ops.get_netdev(smb_dev->ib_dev,
+ i + 1);
+ if (!ndev)
+ continue;
+
+ if (ndev == netdev) {
+ dev_put(ndev);
+ rdma_capable = true;
+ goto out;
+ }
+ dev_put(ndev);
+ }
+ }
+out:
+ read_unlock(&smb_direct_device_lock);
+
+ if (rdma_capable == false) {
+ struct ib_device *ibdev;
+
+ ibdev = ib_device_get_by_netdev(netdev, RDMA_DRIVER_UNKNOWN);
+ if (ibdev) {
+ if (rdma_frwr_is_supported(&ibdev->attrs))
+ rdma_capable = true;
+ ib_device_put(ibdev);
+ }
}
+
return rdma_capable;
}
static struct ksmbd_transport_ops ksmbd_smb_direct_transport_ops = {
.prepare = smb_direct_prepare,
.disconnect = smb_direct_disconnect,
+ .shutdown = smb_direct_shutdown,
.writev = smb_direct_writev,
.read = smb_direct_read,
.rdma_read = smb_direct_rdma_read,
#ifndef __KSMBD_TRANSPORT_RDMA_H__
#define __KSMBD_TRANSPORT_RDMA_H__
-#define SMB_DIRECT_PORT 5445
-
/* SMB DIRECT negotiation request packet [MS-SMBD] 2.2.1 */
struct smb_direct_negotiate_req {
__le16 min_version;
#ifdef CONFIG_SMB_SERVER_SMBDIRECT
int ksmbd_rdma_init(void);
-int ksmbd_rdma_destroy(void);
+void ksmbd_rdma_destroy(void);
bool ksmbd_rdma_capable_netdev(struct net_device *netdev);
#else
static inline int ksmbd_rdma_init(void) { return 0; }
&ksmbd_socket);
if (ret) {
pr_err("Can't create socket for ipv4: %d\n", ret);
- goto out_error;
+ goto out_clear;
}
sin.sin_family = PF_INET;
out_error:
tcp_destroy_socket(ksmbd_socket);
+out_clear:
iface->ksmbd_socket = NULL;
return ret;
}
int durable_timeout;
- /* for SMB1 */
- int pid;
-
- /* conflict lock fail count for SMB1 */
- unsigned int cflock_cnt;
- /* last lock failure start offset for SMB1 */
- unsigned long long llock_fstart;
-
- int dirent_offset;
-
/* if ls is happening on directory, below is valid*/
struct ksmbd_readdir_data readdir_data;
int dot_dotdot[2];
#include <linux/pid_namespace.h>
#include <linux/hashtable.h>
#include <linux/percpu.h>
+#include <linux/sysctl.h>
#define CREATE_TRACE_POINTS
#include <trace/events/filelock.h>
return fl->fl_type;
}
-int leases_enable = 1;
-int lease_break_time = 45;
+static int leases_enable = 1;
+static int lease_break_time = 45;
+
+#ifdef CONFIG_SYSCTL
+static struct ctl_table locks_sysctls[] = {
+ {
+ .procname = "leases-enable",
+ .data = &leases_enable,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec,
+ },
+#ifdef CONFIG_MMU
+ {
+ .procname = "lease-break-time",
+ .data = &lease_break_time,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec,
+ },
+#endif /* CONFIG_MMU */
+ {}
+};
+
+static int __init init_fs_locks_sysctls(void)
+{
+ register_sysctl_init("fs", locks_sysctls);
+ return 0;
+}
+early_initcall(init_fs_locks_sysctls);
+#endif /* CONFIG_SYSCTL */
/*
* The global file_lock_list is only used for displaying /proc/locks, so we
#include <linux/writeback.h>
#include <linux/backing-dev.h>
#include <linux/pagevec.h>
-#include <linux/cleancache.h>
#include "internal.h"
/*
SetPageMappedToDisk(page);
}
- if (fully_mapped && blocks_per_page == 1 && !PageUptodate(page) &&
- cleancache_get_page(page) == 0) {
- SetPageUptodate(page);
- goto confused;
- }
-
/*
* This page will go to BIO. Do we need to send this BIO off first?
*/
path_put(&last->link);
}
-int sysctl_protected_symlinks __read_mostly = 0;
-int sysctl_protected_hardlinks __read_mostly = 0;
-int sysctl_protected_fifos __read_mostly;
-int sysctl_protected_regular __read_mostly;
+static int sysctl_protected_symlinks __read_mostly;
+static int sysctl_protected_hardlinks __read_mostly;
+static int sysctl_protected_fifos __read_mostly;
+static int sysctl_protected_regular __read_mostly;
+
+#ifdef CONFIG_SYSCTL
+static struct ctl_table namei_sysctls[] = {
+ {
+ .procname = "protected_symlinks",
+ .data = &sysctl_protected_symlinks,
+ .maxlen = sizeof(int),
+ .mode = 0600,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = SYSCTL_ZERO,
+ .extra2 = SYSCTL_ONE,
+ },
+ {
+ .procname = "protected_hardlinks",
+ .data = &sysctl_protected_hardlinks,
+ .maxlen = sizeof(int),
+ .mode = 0600,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = SYSCTL_ZERO,
+ .extra2 = SYSCTL_ONE,
+ },
+ {
+ .procname = "protected_fifos",
+ .data = &sysctl_protected_fifos,
+ .maxlen = sizeof(int),
+ .mode = 0600,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = SYSCTL_ZERO,
+ .extra2 = SYSCTL_TWO,
+ },
+ {
+ .procname = "protected_regular",
+ .data = &sysctl_protected_regular,
+ .maxlen = sizeof(int),
+ .mode = 0600,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = SYSCTL_ZERO,
+ .extra2 = SYSCTL_TWO,
+ },
+ { }
+};
+
+static int __init init_fs_namei_sysctls(void)
+{
+ register_sysctl_init("fs", namei_sysctls);
+ return 0;
+}
+fs_initcall(init_fs_namei_sysctls);
+
+#endif /* CONFIG_SYSCTL */
/**
* may_follow_link - Check symlink following for unsafe situations
#include "internal.h"
/* Maximum number of mounts in a mount namespace */
-unsigned int sysctl_mount_max __read_mostly = 100000;
+static unsigned int sysctl_mount_max __read_mostly = 100000;
static unsigned int m_hash_mask __read_mostly;
static unsigned int m_hash_shift __read_mostly;
.install = mntns_install,
.owner = mntns_owner,
};
+
+#ifdef CONFIG_SYSCTL
+static struct ctl_table fs_namespace_sysctls[] = {
+ {
+ .procname = "mount-max",
+ .data = &sysctl_mount_max,
+ .maxlen = sizeof(unsigned int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = SYSCTL_ONE,
+ },
+ { }
+};
+
+static int __init init_fs_namespace_sysctls(void)
+{
+ register_sysctl_init("fs", fs_namespace_sysctls);
+ return 0;
+}
+fs_initcall(init_fs_namespace_sysctls);
+
+#endif /* CONFIG_SYSCTL */
INIT_WORK(&rreq->work, netfs_rreq_work);
refcount_set(&rreq->usage, 1);
__set_bit(NETFS_RREQ_IN_PROGRESS, &rreq->flags);
- ops->init_rreq(rreq, file);
+ if (ops->init_rreq)
+ ops->init_rreq(rreq, file);
netfs_stat(&netfs_n_rh_rreq);
}
*/
static void nilfs_copy_page(struct page *dst, struct page *src, int copy_dirty)
{
- struct buffer_head *dbh, *dbufs, *sbh, *sbufs;
+ struct buffer_head *dbh, *dbufs, *sbh;
unsigned long mask = NILFS_BUFFER_INHERENT_BITS;
BUG_ON(PageWriteback(dst));
- sbh = sbufs = page_buffers(src);
+ sbh = page_buffers(src);
if (!page_has_buffers(dst))
create_empty_buffers(dst, sbh->b_size, 0);
#include <linux/fdtable.h>
#include <linux/fsnotify_backend.h>
-int dir_notify_enable __read_mostly = 1;
+static int dir_notify_enable __read_mostly = 1;
+#ifdef CONFIG_SYSCTL
+static struct ctl_table dnotify_sysctls[] = {
+ {
+ .procname = "dir-notify-enable",
+ .data = &dir_notify_enable,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec,
+ },
+ {}
+};
+static void __init dnotify_sysctl_init(void)
+{
+ register_sysctl_init("fs", dnotify_sysctls);
+}
+#else
+#define dnotify_sysctl_init() do { } while (0)
+#endif
static struct kmem_cache *dnotify_struct_cache __read_mostly;
static struct kmem_cache *dnotify_mark_cache __read_mostly;
dnotify_group = fsnotify_alloc_group(&dnotify_fsnotify_ops);
if (IS_ERR(dnotify_group))
panic("unable to allocate fsnotify group for dnotify\n");
+ dnotify_sysctl_init();
return 0;
}
static long ft_zero = 0;
static long ft_int_max = INT_MAX;
-struct ctl_table fanotify_table[] = {
+static struct ctl_table fanotify_table[] = {
{
.procname = "max_user_groups",
.data = &init_user_ns.ucount_max[UCOUNT_FANOTIFY_GROUPS],
},
{ }
};
+
+static void __init fanotify_sysctls_init(void)
+{
+ register_sysctl("fs/fanotify", fanotify_table);
+}
+#else
+#define fanotify_sysctls_init() do { } while (0)
#endif /* CONFIG_SYSCTL */
/*
init_user_ns.ucount_max[UCOUNT_FANOTIFY_GROUPS] =
FANOTIFY_DEFAULT_MAX_GROUPS;
init_user_ns.ucount_max[UCOUNT_FANOTIFY_MARKS] = max_marks;
+ fanotify_sysctls_init();
return 0;
}
static long it_zero = 0;
static long it_int_max = INT_MAX;
-struct ctl_table inotify_table[] = {
+static struct ctl_table inotify_table[] = {
{
.procname = "max_user_instances",
.data = &init_user_ns.ucount_max[UCOUNT_INOTIFY_INSTANCES],
},
{ }
};
+
+static void __init inotify_sysctls_init(void)
+{
+ register_sysctl("fs/inotify", inotify_table);
+}
+
+#else
+#define inotify_sysctls_init() do { } while (0)
#endif /* CONFIG_SYSCTL */
static inline __u32 inotify_arg_to_mask(struct inode *inode, u32 arg)
inotify_max_queued_events = 16384;
init_user_ns.ucount_max[UCOUNT_INOTIFY_INSTANCES] = 128;
init_user_ns.ucount_max[UCOUNT_INOTIFY_WATCHES] = watches_max;
+ inotify_sysctls_init();
return 0;
}
#include <linux/blkdev.h>
#include <linux/buffer_head.h>
-#include <linux/cleancache.h>
#include <linux/fs.h>
#include <linux/highmem.h>
#include <linux/kernel.h>
o2hb_fill_node_map(live_node_bitmap, sizeof(live_node_bitmap));
/* lowest node as master node to make negotiate decision. */
- master_node = find_next_bit(live_node_bitmap, O2NM_MAX_NODES, 0);
+ master_node = find_first_bit(live_node_bitmap, O2NM_MAX_NODES);
if (master_node == o2nm_this_node()) {
if (!test_bit(master_node, reg->hr_nego_node_bitmap)) {
int status, ret = 0, i;
char *p;
- if (find_next_bit(node_map, O2NM_MAX_NODES, 0) >= O2NM_MAX_NODES)
+ if (find_first_bit(node_map, O2NM_MAX_NODES) >= O2NM_MAX_NODES)
goto bail;
qr = kzalloc(sizeof(struct dlm_query_region), GFP_KERNEL);
struct o2nm_node *node;
int ret = 0, status, count, i;
- if (find_next_bit(node_map, O2NM_MAX_NODES, 0) >= O2NM_MAX_NODES)
+ if (find_first_bit(node_map, O2NM_MAX_NODES) >= O2NM_MAX_NODES)
goto bail;
qn = kzalloc(sizeof(struct dlm_query_nodeinfo), GFP_KERNEL);
* to see if there are any nodes that still need to be
* considered. these will not appear in the mle nodemap
* but they might own this lockres. wait on them. */
- bit = find_next_bit(dlm->recovery_map, O2NM_MAX_NODES, 0);
+ bit = find_first_bit(dlm->recovery_map, O2NM_MAX_NODES);
if (bit < O2NM_MAX_NODES) {
mlog(0, "%s: res %.*s, At least one node (%d) "
"to recover before lock mastery can begin\n",
dlm_wait_for_recovery(dlm);
spin_lock(&dlm->spinlock);
- bit = find_next_bit(dlm->recovery_map, O2NM_MAX_NODES, 0);
+ bit = find_first_bit(dlm->recovery_map, O2NM_MAX_NODES);
if (bit < O2NM_MAX_NODES) {
mlog(0, "%s: res %.*s, At least one node (%d) "
"to recover before lock mastery can begin\n",
sleep = 1;
/* have all nodes responded? */
if (voting_done && !*blocked) {
- bit = find_next_bit(mle->maybe_map, O2NM_MAX_NODES, 0);
+ bit = find_first_bit(mle->maybe_map, O2NM_MAX_NODES);
if (dlm->node_num <= bit) {
/* my node number is lowest.
* now tell other nodes that I am
} else {
mlog(ML_ERROR, "node down! %d\n", node);
if (blocked) {
- int lowest = find_next_bit(mle->maybe_map,
- O2NM_MAX_NODES, 0);
+ int lowest = find_first_bit(mle->maybe_map,
+ O2NM_MAX_NODES);
/* act like it was never there */
clear_bit(node, mle->maybe_map);
"MLE for it! (%.*s)\n", assert->node_idx,
namelen, name);
} else {
- int bit = find_next_bit (mle->maybe_map, O2NM_MAX_NODES, 0);
+ int bit = find_first_bit(mle->maybe_map, O2NM_MAX_NODES);
if (bit >= O2NM_MAX_NODES) {
/* not necessarily an error, though less likely.
* could be master just re-asserting. */
}
if (!nonlocal) {
- node_ref = find_next_bit(res->refmap, O2NM_MAX_NODES, 0);
+ node_ref = find_first_bit(res->refmap, O2NM_MAX_NODES);
if (node_ref >= O2NM_MAX_NODES)
return 0;
}
BUG_ON(mle->type != DLM_MLE_BLOCK);
spin_lock(&mle->spinlock);
- bit = find_next_bit(mle->maybe_map, O2NM_MAX_NODES, 0);
+ bit = find_first_bit(mle->maybe_map, O2NM_MAX_NODES);
if (bit != dead_node) {
mlog(0, "mle found, but dead node %u would not have been "
"master\n", dead_node);
spin_lock(&dlm->master_lock);
BUG_ON(dlm->dlm_state != DLM_CTXT_LEAVING);
- BUG_ON((find_next_bit(dlm->domain_map, O2NM_MAX_NODES, 0) < O2NM_MAX_NODES));
+ BUG_ON((find_first_bit(dlm->domain_map, O2NM_MAX_NODES) < O2NM_MAX_NODES));
for (i = 0; i < DLM_HASH_BUCKETS; i++) {
bucket = dlm_master_hash(dlm, i);
if (dlm->reco.dead_node == O2NM_INVALID_NODE_NUM) {
int bit;
- bit = find_next_bit (dlm->recovery_map, O2NM_MAX_NODES, 0);
+ bit = find_first_bit(dlm->recovery_map, O2NM_MAX_NODES);
if (bit >= O2NM_MAX_NODES || bit < 0)
dlm_set_reco_dead_node(dlm, O2NM_INVALID_NODE_NUM);
else
return 0;
/* Another node has this resource with this node as the master */
- bit = find_next_bit(res->refmap, O2NM_MAX_NODES, 0);
+ bit = find_first_bit(res->refmap, O2NM_MAX_NODES);
if (bit < O2NM_MAX_NODES)
return 0;
{ }
};
-static struct ctl_table ocfs2_kern_table[] = {
- {
- .procname = "ocfs2",
- .data = NULL,
- .maxlen = 0,
- .mode = 0555,
- .child = ocfs2_mod_table
- },
- { }
-};
-
-static struct ctl_table ocfs2_root_table[] = {
- {
- .procname = "fs",
- .data = NULL,
- .maxlen = 0,
- .mode = 0555,
- .child = ocfs2_kern_table
- },
- { }
-};
-
static struct ctl_table_header *ocfs2_table_header;
-
/*
* Initialization
*/
{
strcpy(cluster_stack_name, OCFS2_STACK_PLUGIN_O2CB);
- ocfs2_table_header = register_sysctl_table(ocfs2_root_table);
+ ocfs2_table_header = register_sysctl("fs/ocfs2", ocfs2_mod_table);
if (!ocfs2_table_header) {
printk(KERN_ERR
"ocfs2 stack glue: unable to register sysctl\n");
#include <linux/mount.h>
#include <linux/seq_file.h>
#include <linux/quotaops.h>
-#include <linux/cleancache.h>
#include <linux/signal.h>
#define CREATE_TRACE_POINTS
mlog_errno(status);
goto bail;
}
- cleancache_init_shared_fs(sb);
osb->ocfs2_wq = alloc_ordered_workqueue("ocfs2_wq", WQ_MEM_RECLAIM);
if (!osb->ocfs2_wq) {
#include <linux/fcntl.h>
#include <linux/memcontrol.h>
#include <linux/watch_queue.h>
+#include <linux/sysctl.h>
#include <linux/uaccess.h>
#include <asm/ioctls.h>
* The max size that a non-root user is allowed to grow the pipe. Can
* be set by root in /proc/sys/fs/pipe-max-size
*/
-unsigned int pipe_max_size = 1048576;
+static unsigned int pipe_max_size = 1048576;
/* Maximum allocatable pages per user. Hard limit is unset by default, soft
* matches default values.
*/
-unsigned long pipe_user_pages_hard;
-unsigned long pipe_user_pages_soft = PIPE_DEF_BUFFERS * INR_OPEN_CUR;
+static unsigned long pipe_user_pages_hard;
+static unsigned long pipe_user_pages_soft = PIPE_DEF_BUFFERS * INR_OPEN_CUR;
/*
* We use head and tail indices that aren't masked off, except at the point of
.kill_sb = kill_anon_super,
};
+#ifdef CONFIG_SYSCTL
+static int do_proc_dopipe_max_size_conv(unsigned long *lvalp,
+ unsigned int *valp,
+ int write, void *data)
+{
+ if (write) {
+ unsigned int val;
+
+ val = round_pipe_size(*lvalp);
+ if (val == 0)
+ return -EINVAL;
+
+ *valp = val;
+ } else {
+ unsigned int val = *valp;
+ *lvalp = (unsigned long) val;
+ }
+
+ return 0;
+}
+
+static int proc_dopipe_max_size(struct ctl_table *table, int write,
+ void *buffer, size_t *lenp, loff_t *ppos)
+{
+ return do_proc_douintvec(table, write, buffer, lenp, ppos,
+ do_proc_dopipe_max_size_conv, NULL);
+}
+
+static struct ctl_table fs_pipe_sysctls[] = {
+ {
+ .procname = "pipe-max-size",
+ .data = &pipe_max_size,
+ .maxlen = sizeof(pipe_max_size),
+ .mode = 0644,
+ .proc_handler = proc_dopipe_max_size,
+ },
+ {
+ .procname = "pipe-user-pages-hard",
+ .data = &pipe_user_pages_hard,
+ .maxlen = sizeof(pipe_user_pages_hard),
+ .mode = 0644,
+ .proc_handler = proc_doulongvec_minmax,
+ },
+ {
+ .procname = "pipe-user-pages-soft",
+ .data = &pipe_user_pages_soft,
+ .maxlen = sizeof(pipe_user_pages_soft),
+ .mode = 0644,
+ .proc_handler = proc_doulongvec_minmax,
+ },
+ { }
+};
+#endif
+
static int __init init_pipe_fs(void)
{
int err = register_filesystem(&pipe_fs_type);
unregister_filesystem(&pipe_fs_type);
}
}
+#ifdef CONFIG_SYSCTL
+ register_sysctl_init("fs", fs_pipe_sysctls);
+#endif
return err;
}
#include <linux/string_helpers.h>
#include <linux/user_namespace.h>
#include <linux/fs_struct.h>
+#include <linux/kthread.h>
#include <asm/processor.h>
#include "internal.h"
if (p->flags & PF_WQ_WORKER)
wq_worker_comm(tcomm, sizeof(tcomm), p);
+ else if (p->flags & PF_KTHREAD)
+ get_kthread_comm(tcomm, sizeof(tcomm), p);
else
__get_task_comm(tcomm, sizeof(tcomm), p);
/************************************************************************/
/* permission checks */
-static int proc_fd_access_allowed(struct inode *inode)
+static bool proc_fd_access_allowed(struct inode *inode)
{
struct task_struct *task;
- int allowed = 0;
+ bool allowed = false;
/* Allow access to a task's file descriptors if it is us or we
* may use ptrace attach to the process and find out that
* information.
}
EXPORT_SYMBOL(proc_remove);
-void *PDE_DATA(const struct inode *inode)
-{
- return __PDE_DATA(inode);
-}
-EXPORT_SYMBOL(PDE_DATA);
-
/*
* Pull a user buffer into memory and pass it to the file's write handler if
* one is supplied. The ->write() method is permitted to modify the
return NULL;
}
+ inode->i_private = de->data;
inode->i_ino = de->low_ino;
inode->i_mtime = inode->i_atime = inode->i_ctime = current_time(inode);
PROC_I(inode)->pde = de;
return PROC_I(inode)->pde;
}
-static inline void *__PDE_DATA(const struct inode *inode)
-{
- return PDE(inode)->data;
-}
-
static inline struct pid *proc_pid(const struct inode *inode)
{
return PROC_I(inode)->pid;
* @parent: The parent directory in which to create.
* @ops: The seq_file ops with which to read the file.
* @write: The write method with which to 'modify' the file.
- * @data: Data for retrieval by PDE_DATA().
+ * @data: Data for retrieval by pde_data().
*
* Create a network namespaced proc file in the @parent directory with the
* specified @name and @mode that allows reading of a file that displays a
* modified by the @write function. @write should return 0 on success.
*
* The @data value is accessible from the @show and @write functions by calling
- * PDE_DATA() on the file inode. The network namespace must be accessed by
+ * pde_data() on the file inode. The network namespace must be accessed by
* calling seq_file_net() on the seq_file struct.
*/
struct proc_dir_entry *proc_create_net_data_write(const char *name, umode_t mode,
* @parent: The parent directory in which to create.
* @show: The seqfile show method with which to read the file.
* @write: The write method with which to 'modify' the file.
- * @data: Data for retrieval by PDE_DATA().
+ * @data: Data for retrieval by pde_data().
*
* Create a network-namespaced proc file in the @parent directory with the
* specified @name and @mode that allows reading of a file that displays a
* modified by the @write function. @write should return 0 on success.
*
* The @data value is accessible from the @show and @write functions by calling
- * PDE_DATA() on the file inode. The network namespace must be accessed by
+ * pde_data() on the file inode. The network namespace must be accessed by
* calling seq_file_single_net() on the seq_file struct.
*/
struct proc_dir_entry *proc_create_net_single_write(const char *name, umode_t mode,
#include <linux/module.h>
#include <linux/bpf-cgroup.h>
#include <linux/mount.h>
+#include <linux/kmemleak.h>
#include "internal.h"
static const struct dentry_operations proc_sys_dentry_operations;
static const struct inode_operations proc_sys_dir_operations;
/* shared constants to be used in various sysctls */
-const int sysctl_vals[] = { 0, 1, INT_MAX };
+const int sysctl_vals[] = { -1, 0, 1, 2, 4, 100, 200, 1000, 3000, INT_MAX, 65535 };
EXPORT_SYMBOL(sysctl_vals);
+const unsigned long sysctl_long_vals[] = { 0, 1, LONG_MAX };
+EXPORT_SYMBOL_GPL(sysctl_long_vals);
+
/* Support for permanently empty directories */
struct ctl_table sysctl_mount_point[] = {
{ }
};
+/**
+ * register_sysctl_mount_point() - registers a sysctl mount point
+ * @path: path for the mount point
+ *
+ * Used to create a permanently empty directory to serve as mount point.
+ * There are some subtle but important permission checks this allows in the
+ * case of unprivileged mounts.
+ */
+struct ctl_table_header *register_sysctl_mount_point(const char *path)
+{
+ return register_sysctl(path, sysctl_mount_point);
+}
+EXPORT_SYMBOL(register_sysctl_mount_point);
+
static bool is_empty_dir(struct ctl_table_header *head)
{
return head->ctl_table[0].child == sysctl_mount_point;
else {
pr_err("sysctl duplicate entry: ");
sysctl_print_dir(head->parent);
- pr_cont("/%s\n", entry->procname);
+ pr_cont("%s\n", entry->procname);
return -EEXIST;
}
}
if (IS_ERR(subdir)) {
pr_err("sysctl could not get directory: ");
sysctl_print_dir(dir);
- pr_cont("/%*.*s %ld\n",
- namelen, namelen, name, PTR_ERR(subdir));
+ pr_cont("%*.*s %ld\n", namelen, namelen, name,
+ PTR_ERR(subdir));
}
drop_sysctl_table(&dir->header);
if (new)
struct ctl_dir *dir;
int ret;
- ret = 0;
spin_lock(&sysctl_lock);
root = (*pentry)->data;
set = lookup_header_set(root);
}
EXPORT_SYMBOL(register_sysctl);
+/**
+ * __register_sysctl_init() - register sysctl table to path
+ * @path: path name for sysctl base
+ * @table: This is the sysctl table that needs to be registered to the path
+ * @table_name: The name of sysctl table, only used for log printing when
+ * registration fails
+ *
+ * The sysctl interface is used by userspace to query or modify at runtime
+ * a predefined value set on a variable. These variables however have default
+ * values pre-set. Code which depends on these variables will always work even
+ * if register_sysctl() fails. If register_sysctl() fails you'd just loose the
+ * ability to query or modify the sysctls dynamically at run time. Chances of
+ * register_sysctl() failing on init are extremely low, and so for both reasons
+ * this function does not return any error as it is used by initialization code.
+ *
+ * Context: Can only be called after your respective sysctl base path has been
+ * registered. So for instance, most base directories are registered early on
+ * init before init levels are processed through proc_sys_init() and
+ * sysctl_init_bases().
+ */
+void __init __register_sysctl_init(const char *path, struct ctl_table *table,
+ const char *table_name)
+{
+ struct ctl_table_header *hdr = register_sysctl(path, table);
+
+ if (unlikely(!hdr)) {
+ pr_err("failed when register_sysctl %s to %s\n", table_name, path);
+ return;
+ }
+ kmemleak_not_leak(hdr);
+}
+
static char *append_path(const char *path, char *pos, const char *name)
{
int namelen;
}
EXPORT_SYMBOL(register_sysctl_table);
+int __register_sysctl_base(struct ctl_table *base_table)
+{
+ struct ctl_table_header *hdr;
+
+ hdr = register_sysctl_table(base_table);
+ kmemleak_not_leak(hdr);
+ return 0;
+}
+
static void put_links(struct ctl_table_header *header)
{
struct ctl_table_set *root_set = &sysctl_table_root.default_set;
else {
pr_err("sysctl link missing during unregister: ");
sysctl_print_dir(parent);
- pr_cont("/%s\n", name);
+ pr_cont("%s\n", name);
}
}
}
proc_sys_root->proc_dir_ops = &proc_sys_dir_file_operations;
proc_sys_root->nlink = 0;
- return sysctl_init();
+ return sysctl_init_bases();
}
struct sysctl_alias {
static DECLARE_RWSEM(vmcore_cb_rwsem);
/* List of registered vmcore callbacks. */
static LIST_HEAD(vmcore_cb_list);
-/* Whether we had a surprise unregistration of a callback. */
-static bool vmcore_cb_unstable;
/* Whether the vmcore has been opened once. */
static bool vmcore_opened;
* very unusual (e.g., forced driver removal), but we cannot stop
* unregistering.
*/
- if (vmcore_opened) {
+ if (vmcore_opened)
pr_warn_once("Unexpected vmcore callback unregistration\n");
- vmcore_cb_unstable = true;
- }
up_write(&vmcore_cb_rwsem);
}
EXPORT_SYMBOL_GPL(unregister_vmcore_cb);
bool ret = true;
lockdep_assert_held_read(&vmcore_cb_rwsem);
- if (unlikely(vmcore_cb_unstable))
- return false;
list_for_each_entry(cb, &vmcore_cb_list, next) {
if (unlikely(!cb->pfn_is_ram))
* looping over all pages without a reason.
*/
down_read(&vmcore_cb_rwsem);
- if (!list_empty(&vmcore_cb_list) || vmcore_cb_unstable)
+ if (!list_empty(&vmcore_cb_list))
ret = remap_oldmem_pfn_checked(vma, from, pfn, size, prot);
else
ret = remap_oldmem_pfn_range(vma, from, pfn, size, prot);
*/
/* Flags */
-#define SMB2_ACCEPT_TRANSFORM_LEVEL_SECURITY 0x00000001
+#define SMB2_ACCEPT_TRANSPORT_LEVEL_SECURITY 0x00000001
struct smb2_transport_capabilities_context {
__le16 ContextType; /* 6 */
#define FSCTL_SET_SHORT_NAME_BEHAVIOR 0x000901B4 /* BB add struct */
#define FSCTL_GET_INTEGRITY_INFORMATION 0x0009027C
#define FSCTL_GET_REFS_VOLUME_DATA 0x000902D8 /* See MS-FSCC 2.3.24 */
+#define FSCTL_SET_INTEGRITY_INFORMATION_EXT 0x00090380
#define FSCTL_GET_RETRIEVAL_POINTERS_AND_REFCOUNT 0x000903d3
#define FSCTL_GET_RETRIEVAL_POINTER_COUNT 0x0009042b
+#define FSCTL_REFS_STREAM_SNAPSHOT_MANAGEMENT 0x00090440
#define FSCTL_QUERY_ALLOCATED_RANGES 0x000940CF
#define FSCTL_SET_DEFECT_MANAGEMENT 0x00098134 /* BB add struct */
#define FSCTL_FILE_LEVEL_TRIM 0x00098208 /* BB add struct */
#include <linux/mutex.h>
#include <linux/backing-dev.h>
#include <linux/rculist_bl.h>
-#include <linux/cleancache.h>
#include <linux/fscrypt.h>
#include <linux/fsnotify.h>
#include <linux/lockdep.h>
s->s_time_gran = 1000000000;
s->s_time_min = TIME64_MIN;
s->s_time_max = TIME64_MAX;
- s->cleancache_poolid = CLEANCACHE_NO_POOL;
s->s_shrink.seeks = DEFAULT_SEEKS;
s->s_shrink.scan_objects = super_cache_scan;
{
struct file_system_type *fs = s->s_type;
if (atomic_dec_and_test(&s->s_active)) {
- cleancache_invalidate_fs(s);
unregister_shrinker(&s->s_shrink);
fs->kill_sb(s);
--- /dev/null
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * /proc/sys/fs shared sysctls
+ *
+ * These sysctls are shared between different filesystems.
+ */
+#include <linux/init.h>
+#include <linux/sysctl.h>
+
+static struct ctl_table fs_shared_sysctls[] = {
+ {
+ .procname = "overflowuid",
+ .data = &fs_overflowuid,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = SYSCTL_ZERO,
+ .extra2 = SYSCTL_MAXOLDUID,
+ },
+ {
+ .procname = "overflowgid",
+ .data = &fs_overflowgid,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = SYSCTL_ZERO,
+ .extra2 = SYSCTL_MAXOLDUID,
+ },
+ { }
+};
+
+DECLARE_SYSCTL_BASE(fs, fs_shared_sysctls);
+
+static int __init init_fs_sysctls(void)
+{
+ return register_sysctl_base(fs);
+}
+
+early_initcall(init_fs_sysctls);
#define XFS_FMR_OWN_COW FMR_OWNER('X', 7) /* cow staging */
#define XFS_FMR_OWN_DEFECTIVE FMR_OWNER('X', 8) /* bad blocks */
-/*
- * Structure for XFS_IOC_FSSETDM.
- * For use by backup and restore programs to set the XFS on-disk inode
- * fields di_dmevmask and di_dmstate. These must be set to exactly and
- * only values previously obtained via xfs_bulkstat! (Specifically the
- * struct xfs_bstat fields bs_dmevmask and bs_dmstate.)
- */
-#ifndef HAVE_FSDMIDATA
-struct fsdmidata {
- __u32 fsd_dmevmask; /* corresponds to di_dmevmask */
- __u16 fsd_padding;
- __u16 fsd_dmstate; /* corresponds to di_dmstate */
-};
-#endif
-
/*
* File segment locking set data type for 64 bit access.
* Also used for all the RESV/FREE interfaces.
/*
* Compound structures for passing args through Handle Request interfaces
- * xfs_fssetdm_by_handle, xfs_attrlist_by_handle, xfs_attrmulti_by_handle
- * - ioctls: XFS_IOC_FSSETDM_BY_HANDLE, XFS_IOC_ATTRLIST_BY_HANDLE, and
- * XFS_IOC_ATTRMULTI_BY_HANDLE
+ * xfs_attrlist_by_handle, xfs_attrmulti_by_handle
+ * - ioctls: XFS_IOC_ATTRLIST_BY_HANDLE, and XFS_IOC_ATTRMULTI_BY_HANDLE
*/
-typedef struct xfs_fsop_setdm_handlereq {
- struct xfs_fsop_handlereq hreq; /* handle information */
- struct fsdmidata __user *data; /* DMAPI data */
-} xfs_fsop_setdm_handlereq_t;
-
/*
* Flags passed in xfs_attr_multiop.am_flags for the attr ioctl interface.
*
* For 'documentation' purposed more than anything else,
* the "cmd #" field reflects the IRIX fcntl number.
*/
-#define XFS_IOC_ALLOCSP _IOW ('X', 10, struct xfs_flock64)
-#define XFS_IOC_FREESP _IOW ('X', 11, struct xfs_flock64)
+/* XFS_IOC_ALLOCSP ------- deprecated 10 */
+/* XFS_IOC_FREESP -------- deprecated 11 */
#define XFS_IOC_DIOINFO _IOR ('X', 30, struct dioattr)
#define XFS_IOC_FSGETXATTR FS_IOC_FSGETXATTR
#define XFS_IOC_FSSETXATTR FS_IOC_FSSETXATTR
-#define XFS_IOC_ALLOCSP64 _IOW ('X', 36, struct xfs_flock64)
-#define XFS_IOC_FREESP64 _IOW ('X', 37, struct xfs_flock64)
+/* XFS_IOC_ALLOCSP64 ----- deprecated 36 */
+/* XFS_IOC_FREESP64 ------ deprecated 37 */
#define XFS_IOC_GETBMAP _IOWR('X', 38, struct getbmap)
-#define XFS_IOC_FSSETDM _IOW ('X', 39, struct fsdmidata)
+/* XFS_IOC_FSSETDM ------- deprecated 39 */
#define XFS_IOC_RESVSP _IOW ('X', 40, struct xfs_flock64)
#define XFS_IOC_UNRESVSP _IOW ('X', 41, struct xfs_flock64)
#define XFS_IOC_RESVSP64 _IOW ('X', 42, struct xfs_flock64)
#define XFS_IOC_FREEZE _IOWR('X', 119, int) /* aka FIFREEZE */
#define XFS_IOC_THAW _IOWR('X', 120, int) /* aka FITHAW */
-#define XFS_IOC_FSSETDM_BY_HANDLE _IOW ('X', 121, struct xfs_fsop_setdm_handlereq)
+/* XFS_IOC_FSSETDM_BY_HANDLE -- deprecated 121 */
#define XFS_IOC_ATTRLIST_BY_HANDLE _IOW ('X', 122, struct xfs_fsop_attrlist_handlereq)
#define XFS_IOC_ATTRMULTI_BY_HANDLE _IOW ('X', 123, struct xfs_fsop_attrmulti_handlereq)
#define XFS_IOC_FSGEOMETRY_V4 _IOR ('X', 124, struct xfs_fsop_geom_v4)
xfs_alloc_file_space(
struct xfs_inode *ip,
xfs_off_t offset,
- xfs_off_t len,
- int alloc_type)
+ xfs_off_t len)
{
xfs_mount_t *mp = ip->i_mount;
xfs_off_t count;
goto error;
error = xfs_bmapi_write(tp, ip, startoffset_fsb,
- allocatesize_fsb, alloc_type, 0, imapp,
- &nimaps);
+ allocatesize_fsb, XFS_BMAPI_PREALLOC, 0, imapp,
+ &nimaps);
if (error)
goto error;
/* preallocation and hole punch interface */
int xfs_alloc_file_space(struct xfs_inode *ip, xfs_off_t offset,
- xfs_off_t len, int alloc_type);
+ xfs_off_t len);
int xfs_free_file_space(struct xfs_inode *ip, xfs_off_t offset,
xfs_off_t len);
int xfs_collapse_file_space(struct xfs_inode *, xfs_off_t offset,
}
if (!xfs_is_always_cow_inode(ip)) {
- error = xfs_alloc_file_space(ip, offset, len,
- XFS_BMAPI_PREALLOC);
+ error = xfs_alloc_file_space(ip, offset, len);
if (error)
goto out_unlock;
}
}
/*
- * Force all currently queued inode inactivation work to run immediately, and
- * wait for the work to finish. Two pass - queue all the work first pass, wait
- * for it in a second pass.
+ * Force all currently queued inode inactivation work to run immediately and
+ * wait for the work to finish.
*/
void
xfs_inodegc_flush(
struct xfs_mount *mp)
{
- struct xfs_inodegc *gc;
- int cpu;
-
if (!xfs_is_inodegc_enabled(mp))
return;
trace_xfs_inodegc_flush(mp, __return_address);
xfs_inodegc_queue_all(mp);
-
- for_each_online_cpu(cpu) {
- gc = per_cpu_ptr(mp->m_inodegc, cpu);
- flush_work(&gc->work);
- }
+ flush_workqueue(mp->m_inodegc_wq);
}
/*
xfs_inodegc_stop(
struct xfs_mount *mp)
{
- struct xfs_inodegc *gc;
- int cpu;
-
if (!xfs_clear_inodegc_enabled(mp))
return;
xfs_inodegc_queue_all(mp);
+ drain_workqueue(mp->m_inodegc_wq);
- for_each_online_cpu(cpu) {
- gc = per_cpu_ptr(mp->m_inodegc, cpu);
- cancel_work_sync(&gc->work);
- }
trace_xfs_inodegc_stop(mp, __return_address);
}
return error;
}
-int
-xfs_ioc_space(
- struct file *filp,
- xfs_flock64_t *bf)
-{
- struct inode *inode = file_inode(filp);
- struct xfs_inode *ip = XFS_I(inode);
- struct iattr iattr;
- enum xfs_prealloc_flags flags = XFS_PREALLOC_CLEAR;
- uint iolock = XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL;
- int error;
-
- if (inode->i_flags & (S_IMMUTABLE|S_APPEND))
- return -EPERM;
-
- if (!(filp->f_mode & FMODE_WRITE))
- return -EBADF;
-
- if (!S_ISREG(inode->i_mode))
- return -EINVAL;
-
- if (xfs_is_always_cow_inode(ip))
- return -EOPNOTSUPP;
-
- if (filp->f_flags & O_DSYNC)
- flags |= XFS_PREALLOC_SYNC;
- if (filp->f_mode & FMODE_NOCMTIME)
- flags |= XFS_PREALLOC_INVISIBLE;
-
- error = mnt_want_write_file(filp);
- if (error)
- return error;
-
- xfs_ilock(ip, iolock);
- error = xfs_break_layouts(inode, &iolock, BREAK_UNMAP);
- if (error)
- goto out_unlock;
- inode_dio_wait(inode);
-
- switch (bf->l_whence) {
- case 0: /*SEEK_SET*/
- break;
- case 1: /*SEEK_CUR*/
- bf->l_start += filp->f_pos;
- break;
- case 2: /*SEEK_END*/
- bf->l_start += XFS_ISIZE(ip);
- break;
- default:
- error = -EINVAL;
- goto out_unlock;
- }
-
- if (bf->l_start < 0 || bf->l_start > inode->i_sb->s_maxbytes) {
- error = -EINVAL;
- goto out_unlock;
- }
-
- if (bf->l_start > XFS_ISIZE(ip)) {
- error = xfs_alloc_file_space(ip, XFS_ISIZE(ip),
- bf->l_start - XFS_ISIZE(ip),
- XFS_BMAPI_PREALLOC);
- if (error)
- goto out_unlock;
- }
-
- iattr.ia_valid = ATTR_SIZE;
- iattr.ia_size = bf->l_start;
- error = xfs_vn_setattr_size(file_mnt_user_ns(filp), file_dentry(filp),
- &iattr);
- if (error)
- goto out_unlock;
-
- error = xfs_update_prealloc_flags(ip, flags);
-
-out_unlock:
- xfs_iunlock(ip, iolock);
- mnt_drop_write_file(filp);
- return error;
-}
-
/* Return 0 on success or positive error */
int
xfs_fsbulkstat_one_fmt(
return 0;
}
+/*
+ * These long-unused ioctls were removed from the official ioctl API in 5.17,
+ * but retain these definitions so that we can log warnings about them.
+ */
+#define XFS_IOC_ALLOCSP _IOW ('X', 10, struct xfs_flock64)
+#define XFS_IOC_FREESP _IOW ('X', 11, struct xfs_flock64)
+#define XFS_IOC_ALLOCSP64 _IOW ('X', 36, struct xfs_flock64)
+#define XFS_IOC_FREESP64 _IOW ('X', 37, struct xfs_flock64)
+
/*
* Note: some of the ioctl's return positive numbers as a
* byte count indicating success, such as readlink_by_handle.
case XFS_IOC_ALLOCSP:
case XFS_IOC_FREESP:
case XFS_IOC_ALLOCSP64:
- case XFS_IOC_FREESP64: {
- xfs_flock64_t bf;
-
- if (copy_from_user(&bf, arg, sizeof(bf)))
- return -EFAULT;
- return xfs_ioc_space(filp, &bf);
- }
+ case XFS_IOC_FREESP64:
+ xfs_warn_once(mp,
+ "%s should use fallocate; XFS_IOC_{ALLOC,FREE}SP ioctl unsupported",
+ current->comm);
+ return -ENOTTY;
case XFS_IOC_DIOINFO: {
struct xfs_buftarg *target = xfs_inode_buftarg(ip);
struct dioattr da;
struct xfs_ibulk;
struct xfs_inogrp;
-
-extern int
-xfs_ioc_space(
- struct file *filp,
- xfs_flock64_t *bf);
-
int
xfs_ioc_swapext(
xfs_swapext_t *sxp);
_IOC(_IOC_DIR(cmd), _IOC_TYPE(cmd), _IOC_NR(cmd), sizeof(type))
#ifdef BROKEN_X86_ALIGNMENT
-STATIC int
-xfs_compat_flock64_copyin(
- xfs_flock64_t *bf,
- compat_xfs_flock64_t __user *arg32)
-{
- if (get_user(bf->l_type, &arg32->l_type) ||
- get_user(bf->l_whence, &arg32->l_whence) ||
- get_user(bf->l_start, &arg32->l_start) ||
- get_user(bf->l_len, &arg32->l_len) ||
- get_user(bf->l_sysid, &arg32->l_sysid) ||
- get_user(bf->l_pid, &arg32->l_pid) ||
- copy_from_user(bf->l_pad, &arg32->l_pad, 4*sizeof(u32)))
- return -EFAULT;
- return 0;
-}
-
STATIC int
xfs_compat_ioc_fsgeometry_v1(
struct xfs_mount *mp,
switch (cmd) {
#if defined(BROKEN_X86_ALIGNMENT)
- case XFS_IOC_ALLOCSP_32:
- case XFS_IOC_FREESP_32:
- case XFS_IOC_ALLOCSP64_32:
- case XFS_IOC_FREESP64_32: {
- struct xfs_flock64 bf;
-
- if (xfs_compat_flock64_copyin(&bf, arg))
- return -EFAULT;
- cmd = _NATIVE_IOC(cmd, struct xfs_flock64);
- return xfs_ioc_space(filp, &bf);
- }
case XFS_IOC_FSGEOMETRY_V1_32:
return xfs_compat_ioc_fsgeometry_v1(ip->i_mount, arg);
case XFS_IOC_FSGROWFSDATA_32: {
_IOW('X', 123, struct compat_xfs_fsop_attrmulti_handlereq)
#ifdef BROKEN_X86_ALIGNMENT
-/* on ia32 l_start is on a 32-bit boundary */
-typedef struct compat_xfs_flock64 {
- __s16 l_type;
- __s16 l_whence;
- __s64 l_start __attribute__((packed));
- /* len == 0 means until end of file */
- __s64 l_len __attribute__((packed));
- __s32 l_sysid;
- __u32 l_pid;
- __s32 l_pad[4]; /* reserve area */
-} compat_xfs_flock64_t;
-
-#define XFS_IOC_ALLOCSP_32 _IOW('X', 10, struct compat_xfs_flock64)
-#define XFS_IOC_FREESP_32 _IOW('X', 11, struct compat_xfs_flock64)
-#define XFS_IOC_ALLOCSP64_32 _IOW('X', 36, struct compat_xfs_flock64)
-#define XFS_IOC_FREESP64_32 _IOW('X', 37, struct compat_xfs_flock64)
-#define XFS_IOC_RESVSP_32 _IOW('X', 40, struct compat_xfs_flock64)
-#define XFS_IOC_UNRESVSP_32 _IOW('X', 41, struct compat_xfs_flock64)
-#define XFS_IOC_RESVSP64_32 _IOW('X', 42, struct compat_xfs_flock64)
-#define XFS_IOC_UNRESVSP64_32 _IOW('X', 43, struct compat_xfs_flock64)
-#define XFS_IOC_ZERO_RANGE_32 _IOW('X', 57, struct compat_xfs_flock64)
-
typedef struct compat_xfs_fsop_geom_v1 {
__u32 blocksize; /* filesystem (data) block size */
__u32 rtextsize; /* realtime extent size */
* write-combining memory accesses before this macro with those after it.
*/
#ifndef io_stop_wc
-#define io_stop_wc do { } while (0)
+#define io_stop_wc() do { } while (0)
#endif
#endif /* !__ASSEMBLY__ */
#include <asm-generic/bitops/fls.h>
#include <asm-generic/bitops/__fls.h>
#include <asm-generic/bitops/fls64.h>
-#include <asm-generic/bitops/find.h>
#ifndef _LINUX_BITOPS_H
#error only <linux/bitops.h> can be included directly
+++ /dev/null
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _ASM_GENERIC_BITOPS_FIND_H_
-#define _ASM_GENERIC_BITOPS_FIND_H_
-
-extern unsigned long _find_next_bit(const unsigned long *addr1,
- const unsigned long *addr2, unsigned long nbits,
- unsigned long start, unsigned long invert, unsigned long le);
-extern unsigned long _find_first_bit(const unsigned long *addr, unsigned long size);
-extern unsigned long _find_first_zero_bit(const unsigned long *addr, unsigned long size);
-extern unsigned long _find_last_bit(const unsigned long *addr, unsigned long size);
-
-#ifndef find_next_bit
-/**
- * find_next_bit - find the next set bit in a memory region
- * @addr: The address to base the search on
- * @offset: The bitnumber to start searching at
- * @size: The bitmap size in bits
- *
- * Returns the bit number for the next set bit
- * If no bits are set, returns @size.
- */
-static inline
-unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
- unsigned long offset)
-{
- if (small_const_nbits(size)) {
- unsigned long val;
-
- if (unlikely(offset >= size))
- return size;
-
- val = *addr & GENMASK(size - 1, offset);
- return val ? __ffs(val) : size;
- }
-
- return _find_next_bit(addr, NULL, size, offset, 0UL, 0);
-}
-#endif
-
-#ifndef find_next_and_bit
-/**
- * find_next_and_bit - find the next set bit in both memory regions
- * @addr1: The first address to base the search on
- * @addr2: The second address to base the search on
- * @offset: The bitnumber to start searching at
- * @size: The bitmap size in bits
- *
- * Returns the bit number for the next set bit
- * If no bits are set, returns @size.
- */
-static inline
-unsigned long find_next_and_bit(const unsigned long *addr1,
- const unsigned long *addr2, unsigned long size,
- unsigned long offset)
-{
- if (small_const_nbits(size)) {
- unsigned long val;
-
- if (unlikely(offset >= size))
- return size;
-
- val = *addr1 & *addr2 & GENMASK(size - 1, offset);
- return val ? __ffs(val) : size;
- }
-
- return _find_next_bit(addr1, addr2, size, offset, 0UL, 0);
-}
-#endif
-
-#ifndef find_next_zero_bit
-/**
- * find_next_zero_bit - find the next cleared bit in a memory region
- * @addr: The address to base the search on
- * @offset: The bitnumber to start searching at
- * @size: The bitmap size in bits
- *
- * Returns the bit number of the next zero bit
- * If no bits are zero, returns @size.
- */
-static inline
-unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size,
- unsigned long offset)
-{
- if (small_const_nbits(size)) {
- unsigned long val;
-
- if (unlikely(offset >= size))
- return size;
-
- val = *addr | ~GENMASK(size - 1, offset);
- return val == ~0UL ? size : ffz(val);
- }
-
- return _find_next_bit(addr, NULL, size, offset, ~0UL, 0);
-}
-#endif
-
-#ifdef CONFIG_GENERIC_FIND_FIRST_BIT
-
-/**
- * find_first_bit - find the first set bit in a memory region
- * @addr: The address to start the search at
- * @size: The maximum number of bits to search
- *
- * Returns the bit number of the first set bit.
- * If no bits are set, returns @size.
- */
-static inline
-unsigned long find_first_bit(const unsigned long *addr, unsigned long size)
-{
- if (small_const_nbits(size)) {
- unsigned long val = *addr & GENMASK(size - 1, 0);
-
- return val ? __ffs(val) : size;
- }
-
- return _find_first_bit(addr, size);
-}
-
-/**
- * find_first_zero_bit - find the first cleared bit in a memory region
- * @addr: The address to start the search at
- * @size: The maximum number of bits to search
- *
- * Returns the bit number of the first cleared bit.
- * If no bits are zero, returns @size.
- */
-static inline
-unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size)
-{
- if (small_const_nbits(size)) {
- unsigned long val = *addr | ~GENMASK(size - 1, 0);
-
- return val == ~0UL ? size : ffz(val);
- }
-
- return _find_first_zero_bit(addr, size);
-}
-#else /* CONFIG_GENERIC_FIND_FIRST_BIT */
-
-#ifndef find_first_bit
-#define find_first_bit(addr, size) find_next_bit((addr), (size), 0)
-#endif
-#ifndef find_first_zero_bit
-#define find_first_zero_bit(addr, size) find_next_zero_bit((addr), (size), 0)
-#endif
-
-#endif /* CONFIG_GENERIC_FIND_FIRST_BIT */
-
-#ifndef find_last_bit
-/**
- * find_last_bit - find the last set bit in a memory region
- * @addr: The address to start the search at
- * @size: The number of bits to search
- *
- * Returns the bit number of the last set bit, or size.
- */
-static inline
-unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
-{
- if (small_const_nbits(size)) {
- unsigned long val = *addr & GENMASK(size - 1, 0);
-
- return val ? __fls(val) : size;
- }
-
- return _find_last_bit(addr, size);
-}
-#endif
-
-/**
- * find_next_clump8 - find next 8-bit clump with set bits in a memory region
- * @clump: location to store copy of found clump
- * @addr: address to base the search on
- * @size: bitmap size in number of bits
- * @offset: bit offset at which to start searching
- *
- * Returns the bit offset for the next set clump; the found clump value is
- * copied to the location pointed by @clump. If no bits are set, returns @size.
- */
-extern unsigned long find_next_clump8(unsigned long *clump,
- const unsigned long *addr,
- unsigned long size, unsigned long offset);
-
-#define find_first_clump8(clump, bits, size) \
- find_next_clump8((clump), (bits), (size), 0)
-
-#endif /*_ASM_GENERIC_BITOPS_FIND_H_ */
#ifndef _ASM_GENERIC_BITOPS_LE_H_
#define _ASM_GENERIC_BITOPS_LE_H_
-#include <asm-generic/bitops/find.h>
#include <asm/types.h>
#include <asm/byteorder.h>
-#include <linux/swab.h>
#if defined(__LITTLE_ENDIAN)
#define BITOP_LE_SWIZZLE 0
-static inline unsigned long find_next_zero_bit_le(const void *addr,
- unsigned long size, unsigned long offset)
-{
- return find_next_zero_bit(addr, size, offset);
-}
-
-static inline unsigned long find_next_bit_le(const void *addr,
- unsigned long size, unsigned long offset)
-{
- return find_next_bit(addr, size, offset);
-}
-
-static inline unsigned long find_first_zero_bit_le(const void *addr,
- unsigned long size)
-{
- return find_first_zero_bit(addr, size);
-}
-
#elif defined(__BIG_ENDIAN)
#define BITOP_LE_SWIZZLE ((BITS_PER_LONG-1) & ~0x7)
-#ifndef find_next_zero_bit_le
-static inline
-unsigned long find_next_zero_bit_le(const void *addr, unsigned
- long size, unsigned long offset)
-{
- if (small_const_nbits(size)) {
- unsigned long val = *(const unsigned long *)addr;
-
- if (unlikely(offset >= size))
- return size;
-
- val = swab(val) | ~GENMASK(size - 1, offset);
- return val == ~0UL ? size : ffz(val);
- }
-
- return _find_next_bit(addr, NULL, size, offset, ~0UL, 1);
-}
-#endif
-
-#ifndef find_next_bit_le
-static inline
-unsigned long find_next_bit_le(const void *addr, unsigned
- long size, unsigned long offset)
-{
- if (small_const_nbits(size)) {
- unsigned long val = *(const unsigned long *)addr;
-
- if (unlikely(offset >= size))
- return size;
-
- val = swab(val) & GENMASK(size - 1, offset);
- return val ? __ffs(val) : size;
- }
-
- return _find_next_bit(addr, NULL, size, offset, 0UL, 1);
-}
#endif
-#ifndef find_first_zero_bit_le
-#define find_first_zero_bit_le(addr, size) \
- find_next_zero_bit_le((addr), (size), 0)
-#endif
-
-#else
-#error "Please fix <asm/byteorder.h>"
-#endif
static inline int test_bit_le(int nr, const void *addr)
{
#if CONFIG_PGTABLE_LEVELS > 3
+static inline pud_t *__pud_alloc_one(struct mm_struct *mm, unsigned long addr)
+{
+ gfp_t gfp = GFP_PGTABLE_USER;
+
+ if (mm == &init_mm)
+ gfp = GFP_PGTABLE_KERNEL;
+ return (pud_t *)get_zeroed_page(gfp);
+}
+
#ifndef __HAVE_ARCH_PUD_ALLOC_ONE
/**
* pud_alloc_one - allocate a page for PUD-level page table
*/
static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
{
- gfp_t gfp = GFP_PGTABLE_USER;
-
- if (mm == &init_mm)
- gfp = GFP_PGTABLE_KERNEL;
- return (pud_t *)get_zeroed_page(gfp);
+ return __pud_alloc_one(mm, addr);
}
#endif
-static inline void pud_free(struct mm_struct *mm, pud_t *pud)
+static inline void __pud_free(struct mm_struct *mm, pud_t *pud)
{
BUG_ON((unsigned long)pud & (PAGE_SIZE-1));
free_page((unsigned long)pud);
}
+#ifndef __HAVE_ARCH_PUD_FREE
+static inline void pud_free(struct mm_struct *mm, pud_t *pud)
+{
+ __pud_free(mm, pud);
+}
+#endif
+
#endif /* CONFIG_PGTABLE_LEVELS > 3 */
#ifndef __HAVE_ARCH_PGD_FREE
blake2s_final(&state, out);
}
-void blake2s256_hmac(u8 *out, const u8 *in, const u8 *key, const size_t inlen,
- const size_t keylen);
-
#endif /* _CRYPTO_BLAKE2S_H */
#define _KUNIT_ASSERT_H
#include <linux/err.h>
-#include <linux/kernel.h>
+#include <linux/printk.h>
struct kunit;
struct string_stream;
kiocb_cancel_fn *cancel) { }
#endif /* CONFIG_AIO */
-/* for sysctl: */
-extern unsigned long aio_nr;
-extern unsigned long aio_max_nr;
-
#endif /* __LINUX__AIO_H */
#include <linux/align.h>
#include <linux/bitops.h>
+#include <linux/find.h>
#include <linux/limits.h>
#include <linux/string.h>
#include <linux/types.h>
* bitmap_clear(dst, pos, nbits) Clear specified bit area
* bitmap_find_next_zero_area(buf, len, pos, n, mask) Find bit free area
* bitmap_find_next_zero_area_off(buf, len, pos, n, mask, mask_off) as above
- * bitmap_next_clear_region(map, &start, &end, nbits) Find next clear region
- * bitmap_next_set_region(map, &start, &end, nbits) Find next set region
- * bitmap_for_each_clear_region(map, rs, re, start, end)
- * Iterate over all clear regions
- * bitmap_for_each_set_region(map, rs, re, start, end)
- * Iterate over all set regions
* bitmap_shift_right(dst, src, n, nbits) *dst = *src >> n
* bitmap_shift_left(dst, src, n, nbits) *dst = *src << n
* bitmap_cut(dst, src, first, n, nbits) Cut n bits from first, copy rest
__bitmap_replace(dst, old, new, mask, nbits);
}
-static inline void bitmap_next_clear_region(unsigned long *bitmap,
- unsigned int *rs, unsigned int *re,
- unsigned int end)
-{
- *rs = find_next_zero_bit(bitmap, end, *rs);
- *re = find_next_bit(bitmap, end, *rs + 1);
-}
-
static inline void bitmap_next_set_region(unsigned long *bitmap,
unsigned int *rs, unsigned int *re,
unsigned int end)
*re = find_next_zero_bit(bitmap, end, *rs + 1);
}
-/*
- * Bitmap region iterators. Iterates over the bitmap between [@start, @end).
- * @rs and @re should be integer variables and will be set to start and end
- * index of the current clear or set region.
- */
-#define bitmap_for_each_clear_region(bitmap, rs, re, start, end) \
- for ((rs) = (start), \
- bitmap_next_clear_region((bitmap), &(rs), &(re), (end)); \
- (rs) < (re); \
- (rs) = (re) + 1, \
- bitmap_next_clear_region((bitmap), &(rs), &(re), (end)))
-
-#define bitmap_for_each_set_region(bitmap, rs, re, start, end) \
- for ((rs) = (start), \
- bitmap_next_set_region((bitmap), &(rs), &(re), (end)); \
- (rs) < (re); \
- (rs) = (re) + 1, \
- bitmap_next_set_region((bitmap), &(rs), &(re), (end)))
-
/**
* BITMAP_FROM_U64() - Represent u64 value in the format suitable for bitmap.
* @n: u64 value
*/
#include <asm/bitops.h>
-#define for_each_set_bit(bit, addr, size) \
- for ((bit) = find_first_bit((addr), (size)); \
- (bit) < (size); \
- (bit) = find_next_bit((addr), (size), (bit) + 1))
-
-/* same as for_each_set_bit() but use bit as value to start with */
-#define for_each_set_bit_from(bit, addr, size) \
- for ((bit) = find_next_bit((addr), (size), (bit)); \
- (bit) < (size); \
- (bit) = find_next_bit((addr), (size), (bit) + 1))
-
-#define for_each_clear_bit(bit, addr, size) \
- for ((bit) = find_first_zero_bit((addr), (size)); \
- (bit) < (size); \
- (bit) = find_next_zero_bit((addr), (size), (bit) + 1))
-
-/* same as for_each_clear_bit() but use bit as value to start with */
-#define for_each_clear_bit_from(bit, addr, size) \
- for ((bit) = find_next_zero_bit((addr), (size), (bit)); \
- (bit) < (size); \
- (bit) = find_next_zero_bit((addr), (size), (bit) + 1))
-
-/**
- * for_each_set_clump8 - iterate over bitmap for each 8-bit clump with set bits
- * @start: bit offset to start search and to store the current iteration offset
- * @clump: location to store copy of current 8-bit clump
- * @bits: bitmap address to base the search on
- * @size: bitmap size in number of bits
- */
-#define for_each_set_clump8(start, clump, bits, size) \
- for ((start) = find_first_clump8(&(clump), (bits), (size)); \
- (start) < (size); \
- (start) = find_next_clump8(&(clump), (bits), (size), (start) + 8))
-
static inline int get_bitmask_order(unsigned int count)
{
int order;
*/
MEM_RDONLY = BIT(1 + BPF_BASE_TYPE_BITS),
- __BPF_TYPE_LAST_FLAG = MEM_RDONLY,
+ /* MEM was "allocated" from a different helper, and cannot be mixed
+ * with regular non-MEM_ALLOC'ed MEM types.
+ */
+ MEM_ALLOC = BIT(2 + BPF_BASE_TYPE_BITS),
+
+ __BPF_TYPE_LAST_FLAG = MEM_ALLOC,
};
/* Max number of base types. */
RET_PTR_TO_SOCKET_OR_NULL = PTR_MAYBE_NULL | RET_PTR_TO_SOCKET,
RET_PTR_TO_TCP_SOCK_OR_NULL = PTR_MAYBE_NULL | RET_PTR_TO_TCP_SOCK,
RET_PTR_TO_SOCK_COMMON_OR_NULL = PTR_MAYBE_NULL | RET_PTR_TO_SOCK_COMMON,
- RET_PTR_TO_ALLOC_MEM_OR_NULL = PTR_MAYBE_NULL | RET_PTR_TO_ALLOC_MEM,
+ RET_PTR_TO_ALLOC_MEM_OR_NULL = PTR_MAYBE_NULL | MEM_ALLOC | RET_PTR_TO_ALLOC_MEM,
RET_PTR_TO_BTF_ID_OR_NULL = PTR_MAYBE_NULL | RET_PTR_TO_BTF_ID,
/* This must be the last entry. Its purpose is to ensure the enum is
void
bpf_prog_offload_remove_insns(struct bpf_verifier_env *env, u32 off, u32 cnt);
-int check_ctx_reg(struct bpf_verifier_env *env,
- const struct bpf_reg_state *reg, int regno);
+int check_ptr_off_reg(struct bpf_verifier_env *env,
+ const struct bpf_reg_state *reg, int regno);
int check_mem_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
u32 regno, u32 mem_size);
extern const char *ceph_msg_type_name(int type);
extern int ceph_check_fsid(struct ceph_client *client, struct ceph_fsid *fsid);
+extern int ceph_parse_fsid(const char *str, struct ceph_fsid *fsid);
struct fs_parameter;
struct fc_log;
struct ceph_options *ceph_alloc_options(void);
int ceph_parse_mon_ips(const char *buf, size_t len, struct ceph_options *opt,
- struct fc_log *l);
+ struct fc_log *l, char delim);
int ceph_parse_param(struct fs_parameter *param, struct ceph_options *opt,
struct fc_log *l);
int ceph_print_client_options(struct seq_file *m, struct ceph_client *client,
extern int ceph_parse_ips(const char *c, const char *end,
struct ceph_entity_addr *addr,
- int max_count, int *count);
+ int max_count, int *count, char delim);
extern int ceph_msgr_init(void);
extern void ceph_msgr_exit(void);
+++ /dev/null
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _LINUX_CLEANCACHE_H
-#define _LINUX_CLEANCACHE_H
-
-#include <linux/fs.h>
-#include <linux/exportfs.h>
-#include <linux/mm.h>
-
-#define CLEANCACHE_NO_POOL -1
-#define CLEANCACHE_NO_BACKEND -2
-#define CLEANCACHE_NO_BACKEND_SHARED -3
-
-#define CLEANCACHE_KEY_MAX 6
-
-/*
- * cleancache requires every file with a page in cleancache to have a
- * unique key unless/until the file is removed/truncated. For some
- * filesystems, the inode number is unique, but for "modern" filesystems
- * an exportable filehandle is required (see exportfs.h)
- */
-struct cleancache_filekey {
- union {
- ino_t ino;
- __u32 fh[CLEANCACHE_KEY_MAX];
- u32 key[CLEANCACHE_KEY_MAX];
- } u;
-};
-
-struct cleancache_ops {
- int (*init_fs)(size_t);
- int (*init_shared_fs)(uuid_t *uuid, size_t);
- int (*get_page)(int, struct cleancache_filekey,
- pgoff_t, struct page *);
- void (*put_page)(int, struct cleancache_filekey,
- pgoff_t, struct page *);
- void (*invalidate_page)(int, struct cleancache_filekey, pgoff_t);
- void (*invalidate_inode)(int, struct cleancache_filekey);
- void (*invalidate_fs)(int);
-};
-
-extern int cleancache_register_ops(const struct cleancache_ops *ops);
-extern void __cleancache_init_fs(struct super_block *);
-extern void __cleancache_init_shared_fs(struct super_block *);
-extern int __cleancache_get_page(struct page *);
-extern void __cleancache_put_page(struct page *);
-extern void __cleancache_invalidate_page(struct address_space *, struct page *);
-extern void __cleancache_invalidate_inode(struct address_space *);
-extern void __cleancache_invalidate_fs(struct super_block *);
-
-#ifdef CONFIG_CLEANCACHE
-#define cleancache_enabled (1)
-static inline bool cleancache_fs_enabled_mapping(struct address_space *mapping)
-{
- return mapping->host->i_sb->cleancache_poolid >= 0;
-}
-static inline bool cleancache_fs_enabled(struct page *page)
-{
- return cleancache_fs_enabled_mapping(page->mapping);
-}
-#else
-#define cleancache_enabled (0)
-#define cleancache_fs_enabled(_page) (0)
-#define cleancache_fs_enabled_mapping(_page) (0)
-#endif
-
-/*
- * The shim layer provided by these inline functions allows the compiler
- * to reduce all cleancache hooks to nothingness if CONFIG_CLEANCACHE
- * is disabled, to a single global variable check if CONFIG_CLEANCACHE
- * is enabled but no cleancache "backend" has dynamically enabled it,
- * and, for the most frequent cleancache ops, to a single global variable
- * check plus a superblock element comparison if CONFIG_CLEANCACHE is enabled
- * and a cleancache backend has dynamically enabled cleancache, but the
- * filesystem referenced by that cleancache op has not enabled cleancache.
- * As a result, CONFIG_CLEANCACHE can be enabled by default with essentially
- * no measurable performance impact.
- */
-
-static inline void cleancache_init_fs(struct super_block *sb)
-{
- if (cleancache_enabled)
- __cleancache_init_fs(sb);
-}
-
-static inline void cleancache_init_shared_fs(struct super_block *sb)
-{
- if (cleancache_enabled)
- __cleancache_init_shared_fs(sb);
-}
-
-static inline int cleancache_get_page(struct page *page)
-{
- if (cleancache_enabled && cleancache_fs_enabled(page))
- return __cleancache_get_page(page);
- return -1;
-}
-
-static inline void cleancache_put_page(struct page *page)
-{
- if (cleancache_enabled && cleancache_fs_enabled(page))
- __cleancache_put_page(page);
-}
-
-static inline void cleancache_invalidate_page(struct address_space *mapping,
- struct page *page)
-{
- /* careful... page->mapping is NULL sometimes when this is called */
- if (cleancache_enabled && cleancache_fs_enabled_mapping(mapping))
- __cleancache_invalidate_page(mapping, page);
-}
-
-static inline void cleancache_invalidate_inode(struct address_space *mapping)
-{
- if (cleancache_enabled && cleancache_fs_enabled_mapping(mapping))
- __cleancache_invalidate_inode(mapping);
-}
-
-static inline void cleancache_invalidate_fs(struct super_block *sb)
-{
- if (cleancache_enabled)
- __cleancache_invalidate_fs(sb);
-}
-
-#endif /* _LINUX_CLEANCACHE_H */
unsigned long dump_size;
};
-extern int core_uses_pid;
-extern char core_pattern[];
-extern unsigned int core_pipe_limit;
-
/*
* These are the only things you should do on a core-file: use only these
* functions to write out all the necessary info.
static inline void do_coredump(const kernel_siginfo_t *siginfo) {}
#endif
+#if defined(CONFIG_COREDUMP) && defined(CONFIG_SYSCTL)
+extern void validate_coredump_safety(void);
+#else
+static inline void validate_coredump_safety(void) {}
+#endif
+
#endif /* _LINUX_COREDUMP_H */
return 0;
}
+static inline unsigned int cpumask_first_zero(const struct cpumask *srcp)
+{
+ return 0;
+}
+
+static inline unsigned int cpumask_first_and(const struct cpumask *srcp1,
+ const struct cpumask *srcp2)
+{
+ return 0;
+}
+
static inline unsigned int cpumask_last(const struct cpumask *srcp)
{
return 0;
static inline int cpumask_any_and_distribute(const struct cpumask *src1p,
const struct cpumask *src2p) {
- return cpumask_next_and(-1, src1p, src2p);
+ return cpumask_first_and(src1p, src2p);
}
static inline int cpumask_any_distribute(const struct cpumask *srcp)
return find_first_bit(cpumask_bits(srcp), nr_cpumask_bits);
}
+/**
+ * cpumask_first_zero - get the first unset cpu in a cpumask
+ * @srcp: the cpumask pointer
+ *
+ * Returns >= nr_cpu_ids if all cpus are set.
+ */
+static inline unsigned int cpumask_first_zero(const struct cpumask *srcp)
+{
+ return find_first_zero_bit(cpumask_bits(srcp), nr_cpumask_bits);
+}
+
+/**
+ * cpumask_first_and - return the first cpu from *srcp1 & *srcp2
+ * @src1p: the first input
+ * @src2p: the second input
+ *
+ * Returns >= nr_cpu_ids if no cpus set in both. See also cpumask_next_and().
+ */
+static inline
+unsigned int cpumask_first_and(const struct cpumask *srcp1, const struct cpumask *srcp2)
+{
+ return find_first_and_bit(cpumask_bits(srcp1), cpumask_bits(srcp2), nr_cpumask_bits);
+}
+
/**
* cpumask_last - get the last CPU in a cpumask
* @srcp: - the cpumask pointer
*/
#define cpumask_any(srcp) cpumask_first(srcp)
-/**
- * cpumask_first_and - return the first cpu from *srcp1 & *srcp2
- * @src1p: the first input
- * @src2p: the second input
- *
- * Returns >= nr_cpu_ids if no cpus set in both. See also cpumask_next_and().
- */
-#define cpumask_first_and(src1p, src2p) cpumask_next_and(-1, (src1p), (src2p))
-
/**
* cpumask_any_and - pick a "random" cpu from *mask1 & *mask2
* @mask1: the first input cpumask
extern const struct qstr slash_name;
extern const struct qstr dotdot_name;
-struct dentry_stat_t {
- long nr_dentry;
- long nr_unused;
- long age_limit; /* age in seconds */
- long want_pages; /* pages requested by system */
- long nr_negative; /* # of unused negative dentries */
- long dummy; /* Reserved for future use */
-};
-extern struct dentry_stat_t dentry_stat;
-
/*
* Try to keep struct dentry aligned on 64 byte cachelines (this will
* give reasonable cacheline footprint with larger lines without the
#include <uapi/linux/taskstats.h>
-/*
- * Per-task flags relevant to delay accounting
- * maintained privately to avoid exhausting similar flags in sched.h:PF_*
- * Used to set current->delays->flags
- */
-#define DELAYACCT_PF_SWAPIN 0x00000001 /* I am doing a swapin */
-#define DELAYACCT_PF_BLKIO 0x00000002 /* I am waiting on IO */
-
#ifdef CONFIG_TASK_DELAY_ACCT
struct task_delay_info {
raw_spinlock_t lock;
- unsigned int flags; /* Private per-task flags */
/* For each stat XXX, add following, aligned appropriately
*
* associated with the operation is added to XXX_delay.
* XXX_delay contains the accumulated delay time in nanoseconds.
*/
- u64 blkio_start; /* Shared by blkio, swapin */
+ u64 blkio_start;
u64 blkio_delay; /* wait for sync block io completion */
- u64 swapin_delay; /* wait for swapin block io completion */
+ u64 swapin_start;
+ u64 swapin_delay; /* wait for swapin */
u32 blkio_count; /* total count of the number of sync block */
/* io operations performed */
- u32 swapin_count; /* total count of the number of swapin block */
- /* io operations performed */
+ u32 swapin_count; /* total count of swapin */
u64 freepages_start;
u64 freepages_delay; /* wait for memory reclaim */
u64 thrashing_start;
u64 thrashing_delay; /* wait for thrashing page */
+ u64 compact_start;
+ u64 compact_delay; /* wait for memory compact */
+
u32 freepages_count; /* total count of memory reclaim */
u32 thrashing_count; /* total count of thrash waits */
+ u32 compact_count; /* total count of memory compact */
};
#endif
extern void __delayacct_freepages_end(void);
extern void __delayacct_thrashing_start(void);
extern void __delayacct_thrashing_end(void);
-
-static inline int delayacct_is_task_waiting_on_io(struct task_struct *p)
-{
- if (p->delays)
- return (p->delays->flags & DELAYACCT_PF_BLKIO);
- else
- return 0;
-}
-
-static inline void delayacct_set_flag(struct task_struct *p, int flag)
-{
- if (p->delays)
- p->delays->flags |= flag;
-}
-
-static inline void delayacct_clear_flag(struct task_struct *p, int flag)
-{
- if (p->delays)
- p->delays->flags &= ~flag;
-}
+extern void __delayacct_swapin_start(void);
+extern void __delayacct_swapin_end(void);
+extern void __delayacct_compact_start(void);
+extern void __delayacct_compact_end(void);
static inline void delayacct_tsk_init(struct task_struct *tsk)
{
if (!static_branch_unlikely(&delayacct_key))
return;
- delayacct_set_flag(current, DELAYACCT_PF_BLKIO);
if (current->delays)
__delayacct_blkio_start();
}
if (p->delays)
__delayacct_blkio_end(p);
- delayacct_clear_flag(p, DELAYACCT_PF_BLKIO);
}
static inline __u64 delayacct_blkio_ticks(struct task_struct *tsk)
static inline void delayacct_freepages_start(void)
{
+ if (!static_branch_unlikely(&delayacct_key))
+ return;
+
if (current->delays)
__delayacct_freepages_start();
}
static inline void delayacct_freepages_end(void)
{
+ if (!static_branch_unlikely(&delayacct_key))
+ return;
+
if (current->delays)
__delayacct_freepages_end();
}
static inline void delayacct_thrashing_start(void)
{
+ if (!static_branch_unlikely(&delayacct_key))
+ return;
+
if (current->delays)
__delayacct_thrashing_start();
}
static inline void delayacct_thrashing_end(void)
{
+ if (!static_branch_unlikely(&delayacct_key))
+ return;
+
if (current->delays)
__delayacct_thrashing_end();
}
+static inline void delayacct_swapin_start(void)
+{
+ if (!static_branch_unlikely(&delayacct_key))
+ return;
+
+ if (current->delays)
+ __delayacct_swapin_start();
+}
+
+static inline void delayacct_swapin_end(void)
+{
+ if (!static_branch_unlikely(&delayacct_key))
+ return;
+
+ if (current->delays)
+ __delayacct_swapin_end();
+}
+
+static inline void delayacct_compact_start(void)
+{
+ if (!static_branch_unlikely(&delayacct_key))
+ return;
+
+ if (current->delays)
+ __delayacct_compact_start();
+}
+
+static inline void delayacct_compact_end(void)
+{
+ if (!static_branch_unlikely(&delayacct_key))
+ return;
+
+ if (current->delays)
+ __delayacct_compact_end();
+}
+
#else
-static inline void delayacct_set_flag(struct task_struct *p, int flag)
-{}
-static inline void delayacct_clear_flag(struct task_struct *p, int flag)
-{}
static inline void delayacct_init(void)
{}
static inline void delayacct_tsk_init(struct task_struct *tsk)
{}
static inline void delayacct_thrashing_end(void)
{}
+static inline void delayacct_swapin_start(void)
+{}
+static inline void delayacct_swapin_end(void)
+{}
+static inline void delayacct_compact_start(void)
+{}
+static inline void delayacct_compact_end(void)
+{}
#endif /* CONFIG_TASK_DELAY_ACCT */
FS_CREATE | FS_RENAME |\
FS_MOVED_FROM | FS_MOVED_TO)
-extern int dir_notify_enable;
extern void dnotify_flush(struct file *, fl_owner_t);
extern int fcntl_dirnotify(int, struct file *, unsigned long);
__compat_uid_t pr_uid;
__compat_gid_t pr_gid;
compat_pid_t pr_pid, pr_ppid, pr_pgrp, pr_sid;
+ /*
+ * The hard-coded 16 is derived from TASK_COMM_LEN, but it can't be
+ * changed as it is exposed to userspace. We'd better make it hard-coded
+ * here.
+ */
char pr_fname[16];
char pr_psargs[ELF_PRARGSZ];
};
__kernel_gid_t pr_gid;
pid_t pr_pid, pr_ppid, pr_pgrp, pr_sid;
/* Lots missing */
+ /*
+ * The hard-coded 16 is derived from TASK_COMM_LEN, but it can't be
+ * changed as it is exposed to userspace. We'd better make it hard-coded
+ * here.
+ */
char pr_fname[16]; /* filename of executable */
char pr_psargs[ELF_PRARGSZ]; /* initial part of arg list */
};
#include <linux/sysctl.h>
#include <uapi/linux/fanotify.h>
-extern struct ctl_table fanotify_table[]; /* for sysctl */
-
#define FAN_GROUP_FLAG(group, flag) \
((group)->fanotify_data.flags & (flag))
--- /dev/null
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __LINUX_FIND_H_
+#define __LINUX_FIND_H_
+
+#ifndef __LINUX_BITMAP_H
+#error only <linux/bitmap.h> can be included directly
+#endif
+
+#include <linux/bitops.h>
+
+extern unsigned long _find_next_bit(const unsigned long *addr1,
+ const unsigned long *addr2, unsigned long nbits,
+ unsigned long start, unsigned long invert, unsigned long le);
+extern unsigned long _find_first_bit(const unsigned long *addr, unsigned long size);
+extern unsigned long _find_first_and_bit(const unsigned long *addr1,
+ const unsigned long *addr2, unsigned long size);
+extern unsigned long _find_first_zero_bit(const unsigned long *addr, unsigned long size);
+extern unsigned long _find_last_bit(const unsigned long *addr, unsigned long size);
+
+#ifndef find_next_bit
+/**
+ * find_next_bit - find the next set bit in a memory region
+ * @addr: The address to base the search on
+ * @offset: The bitnumber to start searching at
+ * @size: The bitmap size in bits
+ *
+ * Returns the bit number for the next set bit
+ * If no bits are set, returns @size.
+ */
+static inline
+unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
+ unsigned long offset)
+{
+ if (small_const_nbits(size)) {
+ unsigned long val;
+
+ if (unlikely(offset >= size))
+ return size;
+
+ val = *addr & GENMASK(size - 1, offset);
+ return val ? __ffs(val) : size;
+ }
+
+ return _find_next_bit(addr, NULL, size, offset, 0UL, 0);
+}
+#endif
+
+#ifndef find_next_and_bit
+/**
+ * find_next_and_bit - find the next set bit in both memory regions
+ * @addr1: The first address to base the search on
+ * @addr2: The second address to base the search on
+ * @offset: The bitnumber to start searching at
+ * @size: The bitmap size in bits
+ *
+ * Returns the bit number for the next set bit
+ * If no bits are set, returns @size.
+ */
+static inline
+unsigned long find_next_and_bit(const unsigned long *addr1,
+ const unsigned long *addr2, unsigned long size,
+ unsigned long offset)
+{
+ if (small_const_nbits(size)) {
+ unsigned long val;
+
+ if (unlikely(offset >= size))
+ return size;
+
+ val = *addr1 & *addr2 & GENMASK(size - 1, offset);
+ return val ? __ffs(val) : size;
+ }
+
+ return _find_next_bit(addr1, addr2, size, offset, 0UL, 0);
+}
+#endif
+
+#ifndef find_next_zero_bit
+/**
+ * find_next_zero_bit - find the next cleared bit in a memory region
+ * @addr: The address to base the search on
+ * @offset: The bitnumber to start searching at
+ * @size: The bitmap size in bits
+ *
+ * Returns the bit number of the next zero bit
+ * If no bits are zero, returns @size.
+ */
+static inline
+unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size,
+ unsigned long offset)
+{
+ if (small_const_nbits(size)) {
+ unsigned long val;
+
+ if (unlikely(offset >= size))
+ return size;
+
+ val = *addr | ~GENMASK(size - 1, offset);
+ return val == ~0UL ? size : ffz(val);
+ }
+
+ return _find_next_bit(addr, NULL, size, offset, ~0UL, 0);
+}
+#endif
+
+#ifndef find_first_bit
+/**
+ * find_first_bit - find the first set bit in a memory region
+ * @addr: The address to start the search at
+ * @size: The maximum number of bits to search
+ *
+ * Returns the bit number of the first set bit.
+ * If no bits are set, returns @size.
+ */
+static inline
+unsigned long find_first_bit(const unsigned long *addr, unsigned long size)
+{
+ if (small_const_nbits(size)) {
+ unsigned long val = *addr & GENMASK(size - 1, 0);
+
+ return val ? __ffs(val) : size;
+ }
+
+ return _find_first_bit(addr, size);
+}
+#endif
+
+#ifndef find_first_and_bit
+/**
+ * find_first_and_bit - find the first set bit in both memory regions
+ * @addr1: The first address to base the search on
+ * @addr2: The second address to base the search on
+ * @size: The bitmap size in bits
+ *
+ * Returns the bit number for the next set bit
+ * If no bits are set, returns @size.
+ */
+static inline
+unsigned long find_first_and_bit(const unsigned long *addr1,
+ const unsigned long *addr2,
+ unsigned long size)
+{
+ if (small_const_nbits(size)) {
+ unsigned long val = *addr1 & *addr2 & GENMASK(size - 1, 0);
+
+ return val ? __ffs(val) : size;
+ }
+
+ return _find_first_and_bit(addr1, addr2, size);
+}
+#endif
+
+#ifndef find_first_zero_bit
+/**
+ * find_first_zero_bit - find the first cleared bit in a memory region
+ * @addr: The address to start the search at
+ * @size: The maximum number of bits to search
+ *
+ * Returns the bit number of the first cleared bit.
+ * If no bits are zero, returns @size.
+ */
+static inline
+unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size)
+{
+ if (small_const_nbits(size)) {
+ unsigned long val = *addr | ~GENMASK(size - 1, 0);
+
+ return val == ~0UL ? size : ffz(val);
+ }
+
+ return _find_first_zero_bit(addr, size);
+}
+#endif
+
+#ifndef find_last_bit
+/**
+ * find_last_bit - find the last set bit in a memory region
+ * @addr: The address to start the search at
+ * @size: The number of bits to search
+ *
+ * Returns the bit number of the last set bit, or size.
+ */
+static inline
+unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
+{
+ if (small_const_nbits(size)) {
+ unsigned long val = *addr & GENMASK(size - 1, 0);
+
+ return val ? __fls(val) : size;
+ }
+
+ return _find_last_bit(addr, size);
+}
+#endif
+
+/**
+ * find_next_clump8 - find next 8-bit clump with set bits in a memory region
+ * @clump: location to store copy of found clump
+ * @addr: address to base the search on
+ * @size: bitmap size in number of bits
+ * @offset: bit offset at which to start searching
+ *
+ * Returns the bit offset for the next set clump; the found clump value is
+ * copied to the location pointed by @clump. If no bits are set, returns @size.
+ */
+extern unsigned long find_next_clump8(unsigned long *clump,
+ const unsigned long *addr,
+ unsigned long size, unsigned long offset);
+
+#define find_first_clump8(clump, bits, size) \
+ find_next_clump8((clump), (bits), (size), 0)
+
+#if defined(__LITTLE_ENDIAN)
+
+static inline unsigned long find_next_zero_bit_le(const void *addr,
+ unsigned long size, unsigned long offset)
+{
+ return find_next_zero_bit(addr, size, offset);
+}
+
+static inline unsigned long find_next_bit_le(const void *addr,
+ unsigned long size, unsigned long offset)
+{
+ return find_next_bit(addr, size, offset);
+}
+
+static inline unsigned long find_first_zero_bit_le(const void *addr,
+ unsigned long size)
+{
+ return find_first_zero_bit(addr, size);
+}
+
+#elif defined(__BIG_ENDIAN)
+
+#ifndef find_next_zero_bit_le
+static inline
+unsigned long find_next_zero_bit_le(const void *addr, unsigned
+ long size, unsigned long offset)
+{
+ if (small_const_nbits(size)) {
+ unsigned long val = *(const unsigned long *)addr;
+
+ if (unlikely(offset >= size))
+ return size;
+
+ val = swab(val) | ~GENMASK(size - 1, offset);
+ return val == ~0UL ? size : ffz(val);
+ }
+
+ return _find_next_bit(addr, NULL, size, offset, ~0UL, 1);
+}
+#endif
+
+#ifndef find_next_bit_le
+static inline
+unsigned long find_next_bit_le(const void *addr, unsigned
+ long size, unsigned long offset)
+{
+ if (small_const_nbits(size)) {
+ unsigned long val = *(const unsigned long *)addr;
+
+ if (unlikely(offset >= size))
+ return size;
+
+ val = swab(val) & GENMASK(size - 1, offset);
+ return val ? __ffs(val) : size;
+ }
+
+ return _find_next_bit(addr, NULL, size, offset, 0UL, 1);
+}
+#endif
+
+#ifndef find_first_zero_bit_le
+#define find_first_zero_bit_le(addr, size) \
+ find_next_zero_bit_le((addr), (size), 0)
+#endif
+
+#else
+#error "Please fix <asm/byteorder.h>"
+#endif
+
+#define for_each_set_bit(bit, addr, size) \
+ for ((bit) = find_next_bit((addr), (size), 0); \
+ (bit) < (size); \
+ (bit) = find_next_bit((addr), (size), (bit) + 1))
+
+/* same as for_each_set_bit() but use bit as value to start with */
+#define for_each_set_bit_from(bit, addr, size) \
+ for ((bit) = find_next_bit((addr), (size), (bit)); \
+ (bit) < (size); \
+ (bit) = find_next_bit((addr), (size), (bit) + 1))
+
+#define for_each_clear_bit(bit, addr, size) \
+ for ((bit) = find_next_zero_bit((addr), (size), 0); \
+ (bit) < (size); \
+ (bit) = find_next_zero_bit((addr), (size), (bit) + 1))
+
+/* same as for_each_clear_bit() but use bit as value to start with */
+#define for_each_clear_bit_from(bit, addr, size) \
+ for ((bit) = find_next_zero_bit((addr), (size), (bit)); \
+ (bit) < (size); \
+ (bit) = find_next_zero_bit((addr), (size), (bit) + 1))
+
+/**
+ * for_each_set_bitrange - iterate over all set bit ranges [b; e)
+ * @b: bit offset of start of current bitrange (first set bit)
+ * @e: bit offset of end of current bitrange (first unset bit)
+ * @addr: bitmap address to base the search on
+ * @size: bitmap size in number of bits
+ */
+#define for_each_set_bitrange(b, e, addr, size) \
+ for ((b) = find_next_bit((addr), (size), 0), \
+ (e) = find_next_zero_bit((addr), (size), (b) + 1); \
+ (b) < (size); \
+ (b) = find_next_bit((addr), (size), (e) + 1), \
+ (e) = find_next_zero_bit((addr), (size), (b) + 1))
+
+/**
+ * for_each_set_bitrange_from - iterate over all set bit ranges [b; e)
+ * @b: bit offset of start of current bitrange (first set bit); must be initialized
+ * @e: bit offset of end of current bitrange (first unset bit)
+ * @addr: bitmap address to base the search on
+ * @size: bitmap size in number of bits
+ */
+#define for_each_set_bitrange_from(b, e, addr, size) \
+ for ((b) = find_next_bit((addr), (size), (b)), \
+ (e) = find_next_zero_bit((addr), (size), (b) + 1); \
+ (b) < (size); \
+ (b) = find_next_bit((addr), (size), (e) + 1), \
+ (e) = find_next_zero_bit((addr), (size), (b) + 1))
+
+/**
+ * for_each_clear_bitrange - iterate over all unset bit ranges [b; e)
+ * @b: bit offset of start of current bitrange (first unset bit)
+ * @e: bit offset of end of current bitrange (first set bit)
+ * @addr: bitmap address to base the search on
+ * @size: bitmap size in number of bits
+ */
+#define for_each_clear_bitrange(b, e, addr, size) \
+ for ((b) = find_next_zero_bit((addr), (size), 0), \
+ (e) = find_next_bit((addr), (size), (b) + 1); \
+ (b) < (size); \
+ (b) = find_next_zero_bit((addr), (size), (e) + 1), \
+ (e) = find_next_bit((addr), (size), (b) + 1))
+
+/**
+ * for_each_clear_bitrange_from - iterate over all unset bit ranges [b; e)
+ * @b: bit offset of start of current bitrange (first set bit); must be initialized
+ * @e: bit offset of end of current bitrange (first unset bit)
+ * @addr: bitmap address to base the search on
+ * @size: bitmap size in number of bits
+ */
+#define for_each_clear_bitrange_from(b, e, addr, size) \
+ for ((b) = find_next_zero_bit((addr), (size), (b)), \
+ (e) = find_next_bit((addr), (size), (b) + 1); \
+ (b) < (size); \
+ (b) = find_next_zero_bit((addr), (size), (e) + 1), \
+ (e) = find_next_bit((addr), (size), (b) + 1))
+
+/**
+ * for_each_set_clump8 - iterate over bitmap for each 8-bit clump with set bits
+ * @start: bit offset to start search and to store the current iteration offset
+ * @clump: location to store copy of current 8-bit clump
+ * @bits: bitmap address to base the search on
+ * @size: bitmap size in number of bits
+ */
+#define for_each_set_clump8(start, clump, bits, size) \
+ for ((start) = find_first_clump8(&(clump), (bits), (size)); \
+ (start) < (size); \
+ (start) = find_next_clump8(&(clump), (bits), (size), (start) + 8))
+
+#endif /*__LINUX_FIND_H_ */
#include <linux/bitops.h>
#include <linux/jump_label.h>
-/*
- * Return code to denote that requested number of
- * frontswap pages are unused(moved to page cache).
- * Used in shmem_unuse and try_to_unuse.
- */
-#define FRONTSWAP_PAGES_UNUSED 2
-
struct frontswap_ops {
void (*init)(unsigned); /* this swap type was just swapon'ed */
int (*store)(unsigned, pgoff_t, struct page *); /* store a page */
int (*load)(unsigned, pgoff_t, struct page *); /* load a page */
void (*invalidate_page)(unsigned, pgoff_t); /* page no longer needed */
void (*invalidate_area)(unsigned); /* swap type just swapoff'ed */
- struct frontswap_ops *next; /* private pointer to next ops */
};
-extern void frontswap_register_ops(struct frontswap_ops *ops);
-extern void frontswap_shrink(unsigned long);
-extern unsigned long frontswap_curr_pages(void);
-extern void frontswap_writethrough(bool);
-#define FRONTSWAP_HAS_EXCLUSIVE_GETS
-extern void frontswap_tmem_exclusive_gets(bool);
+int frontswap_register_ops(const struct frontswap_ops *ops);
-extern bool __frontswap_test(struct swap_info_struct *, pgoff_t);
-extern void __frontswap_init(unsigned type, unsigned long *map);
+extern void frontswap_init(unsigned type, unsigned long *map);
extern int __frontswap_store(struct page *page);
extern int __frontswap_load(struct page *page);
extern void __frontswap_invalidate_page(unsigned, pgoff_t);
return static_branch_unlikely(&frontswap_enabled_key);
}
-static inline bool frontswap_test(struct swap_info_struct *sis, pgoff_t offset)
-{
- return __frontswap_test(sis, offset);
-}
-
static inline void frontswap_map_set(struct swap_info_struct *p,
unsigned long *map)
{
return false;
}
-static inline bool frontswap_test(struct swap_info_struct *sis, pgoff_t offset)
-{
- return false;
-}
-
static inline void frontswap_map_set(struct swap_info_struct *p,
unsigned long *map)
{
__frontswap_invalidate_area(type);
}
-static inline void frontswap_init(unsigned type, unsigned long *map)
-{
-#ifdef CONFIG_FRONTSWAP
- __frontswap_init(type, map);
-#endif
-}
-
#endif /* _LINUX_FRONTSWAP_H */
extern void __init files_init(void);
extern void __init files_maxfiles_init(void);
-extern struct files_stat_struct files_stat;
extern unsigned long get_max_files(void);
extern unsigned int sysctl_nr_open;
-extern struct inodes_stat_t inodes_stat;
-extern int leases_enable, lease_break_time;
-extern int sysctl_protected_symlinks;
-extern int sysctl_protected_hardlinks;
-extern int sysctl_protected_fifos;
-extern int sysctl_protected_regular;
typedef __kernel_rwf_t rwf_t;
const struct dentry_operations *s_d_op; /* default d_op for dentries */
- /*
- * Saved pool identifier for cleancache (-1 means none)
- */
- int cleancache_poolid;
-
struct shrinker s_shrink; /* per-sb shrinker handle */
/* Number of inodes with nlink == 0 but still referenced */
size_t len, loff_t *ppos);
struct ctl_table;
-int proc_nr_files(struct ctl_table *table, int write,
- void *buffer, size_t *lenp, loff_t *ppos);
-int proc_nr_dentry(struct ctl_table *table, int write,
- void *buffer, size_t *lenp, loff_t *ppos);
-int proc_nr_inodes(struct ctl_table *table, int write,
- void *buffer, size_t *lenp, loff_t *ppos);
int __init list_bdev_fs_names(char *buf, size_t size);
#define __FMODE_EXEC ((__force int) FMODE_EXEC)
static inline
void fscache_note_page_release(struct fscache_cookie *cookie)
{
+ /* If we've written data to the cache (HAVE_DATA) and there wasn't any
+ * data in the cache when we started (NO_DATA_TO_READ), it may no
+ * longer be true that we can skip reading from the cache - so clear
+ * the flag that causes reads to be skipped.
+ */
if (cookie &&
test_bit(FSCACHE_COOKIE_HAVE_DATA, &cookie->flags) &&
test_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags))
return val * GOLDEN_RATIO_32;
}
-#ifndef HAVE_ARCH_HASH_32
-#define hash_32 hash_32_generic
-#endif
-static inline u32 hash_32_generic(u32 val, unsigned int bits)
+static inline u32 hash_32(u32 val, unsigned int bits)
{
/* High bits are more random, so use them. */
return __hash_32(val) >> (32 - bits);
#ifndef _LINUX_INOTIFY_H
#define _LINUX_INOTIFY_H
-#include <linux/sysctl.h>
#include <uapi/linux/inotify.h>
-extern struct ctl_table inotify_table[]; /* for sysctl */
-
#define ALL_INOTIFY_BITS (IN_ACCESS | IN_MODIFY | IN_ATTRIB | IN_CLOSE_WRITE | \
IN_CLOSE_NOWRITE | IN_OPEN | IN_MOVED_FROM | \
IN_MOVED_TO | IN_CREATE | IN_DELETE | \
/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * NOTE:
+ *
+ * This header has combined a lot of unrelated to each other stuff.
+ * The process of splitting its content is in progress while keeping
+ * backward compatibility. That's why it's highly recommended NOT to
+ * include this header inside another header file, especially under
+ * generic or architectural include/ directory.
+ */
#ifndef _LINUX_KERNEL_H
#define _LINUX_KERNEL_H
DEFINE_INSN_CACHE_OPS(optinsn);
-#ifdef CONFIG_SYSCTL
-extern int sysctl_kprobes_optimization;
-extern int proc_kprobes_optimization_handler(struct ctl_table *table,
- int write, void *buffer,
- size_t *length, loff_t *ppos);
-#endif /* CONFIG_SYSCTL */
extern void wait_for_kprobe_optimizer(void);
#else /* !CONFIG_OPTPROBES */
static inline void wait_for_kprobe_optimizer(void) { }
unsigned int cpu,
const char *namefmt);
+void get_kthread_comm(char *buf, size_t buf_size, struct task_struct *tsk);
bool set_kthread_struct(struct task_struct *p);
void kthread_set_per_cpu(struct task_struct *k, int cpu);
u64 requests;
unsigned long guest_debug;
- int pre_pcpu;
- struct list_head blocked_vcpu_list;
-
struct mutex mutex;
struct kvm_run *run;
* @list: the entry to test
* @head: the head of the list
*/
-static inline int list_is_first(const struct list_head *list,
- const struct list_head *head)
+static inline int list_is_first(const struct list_head *list, const struct list_head *head)
{
return list->prev == head;
}
* @list: the entry to test
* @head: the head of the list
*/
-static inline int list_is_last(const struct list_head *list,
- const struct list_head *head)
+static inline int list_is_last(const struct list_head *list, const struct list_head *head)
{
return list->next == head;
}
+/**
+ * list_is_head - tests whether @list is the list @head
+ * @list: the entry to test
+ * @head: the head of the list
+ */
+static inline int list_is_head(const struct list_head *list, const struct list_head *head)
+{
+ return list == head;
+}
+
/**
* list_empty - tests whether a list is empty
* @head: the list to test.
static inline int list_empty_careful(const struct list_head *head)
{
struct list_head *next = smp_load_acquire(&head->next);
- return (next == head) && (next == head->prev);
+ return list_is_head(next, head) && (next == head->prev);
}
/**
{
if (list_empty(head))
return;
- if (list_is_singular(head) &&
- (head->next != entry && head != entry))
+ if (list_is_singular(head) && !list_is_head(entry, head) && (entry != head->next))
return;
- if (entry == head)
+ if (list_is_head(entry, head))
INIT_LIST_HEAD(list);
else
__list_cut_position(list, head, entry);
* @head: the head for your list.
*/
#define list_for_each(pos, head) \
- for (pos = (head)->next; pos != (head); pos = pos->next)
+ for (pos = (head)->next; !list_is_head(pos, (head)); pos = pos->next)
/**
* list_for_each_continue - continue iteration over a list
* Continue to iterate over a list, continuing after the current position.
*/
#define list_for_each_continue(pos, head) \
- for (pos = pos->next; pos != (head); pos = pos->next)
+ for (pos = pos->next; !list_is_head(pos, (head)); pos = pos->next)
/**
* list_for_each_prev - iterate over a list backwards
* @head: the head for your list.
*/
#define list_for_each_prev(pos, head) \
- for (pos = (head)->prev; pos != (head); pos = pos->prev)
+ for (pos = (head)->prev; !list_is_head(pos, (head)); pos = pos->prev)
/**
* list_for_each_safe - iterate over a list safe against removal of list entry
* @head: the head for your list.
*/
#define list_for_each_safe(pos, n, head) \
- for (pos = (head)->next, n = pos->next; pos != (head); \
- pos = n, n = pos->next)
+ for (pos = (head)->next, n = pos->next; \
+ !list_is_head(pos, (head)); \
+ pos = n, n = pos->next)
/**
* list_for_each_prev_safe - iterate over a list backwards safe against removal of list entry
*/
#define list_for_each_prev_safe(pos, n, head) \
for (pos = (head)->prev, n = pos->prev; \
- pos != (head); \
+ !list_is_head(pos, (head)); \
pos = n, n = pos->prev)
/**
#define RTC_IO_EXTENT_USED RTC_IO_EXTENT
#endif /* ARCH_RTC_LOCATION */
-unsigned int mc146818_get_time(struct rtc_time *time);
+bool mc146818_does_rtc_work(void);
+int mc146818_get_time(struct rtc_time *time);
int mc146818_set_time(struct rtc_time *time);
+bool mc146818_avoid_UIP(void (*callback)(unsigned char seconds, void *param),
+ void *param);
+
#endif /* _MC146818RTC_H */
struct page *newpage, struct page *page);
extern int migrate_page_move_mapping(struct address_space *mapping,
struct page *newpage, struct page *page, int extra_count);
+void migration_entry_wait_on_locked(swp_entry_t entry, pte_t *ptep,
+ spinlock_t *ptl);
void folio_migrate_flags(struct folio *newfolio, struct folio *folio);
void folio_migrate_copy(struct folio *newfolio, struct folio *folio);
int folio_migrate_mapping(struct address_space *mapping,
__put_page(&folio->page);
}
+/**
+ * folio_put_refs - Reduce the reference count on a folio.
+ * @folio: The folio.
+ * @refs: The amount to subtract from the folio's reference count.
+ *
+ * If the folio's reference count reaches zero, the memory will be
+ * released back to the page allocator and may be used by another
+ * allocation immediately. Do not access the memory or the struct folio
+ * after calling folio_put_refs() unless you can be sure that these weren't
+ * the last references.
+ *
+ * Context: May be called in process or interrupt context, but not in NMI
+ * context. May be called while holding a spinlock.
+ */
+static inline void folio_put_refs(struct folio *folio, int refs)
+{
+ if (folio_ref_sub_and_test(folio, refs))
+ __put_page(&folio->page);
+}
+
static inline void put_page(struct page *page)
{
struct folio *folio = page_folio(page);
extern void mark_mounts_for_expiry(struct list_head *mounts);
extern dev_t name_to_dev_t(const char *name);
-
-extern unsigned int sysctl_mount_max;
-
extern bool path_is_mountpoint(const struct path *path);
extern void kern_unmount_array(struct vfsmount *mnt[], unsigned int num);
const struct of_device_id *matches, const struct device_node *node);
extern int of_modalias_node(struct device_node *node, char *modalias, int len);
extern void of_print_phandle_args(const char *msg, const struct of_phandle_args *args);
-extern struct device_node *of_parse_phandle(const struct device_node *np,
- const char *phandle_name,
- int index);
-extern int of_parse_phandle_with_args(const struct device_node *np,
- const char *list_name, const char *cells_name, int index,
- struct of_phandle_args *out_args);
+extern int __of_parse_phandle_with_args(const struct device_node *np,
+ const char *list_name, const char *cells_name, int cell_count,
+ int index, struct of_phandle_args *out_args);
extern int of_parse_phandle_with_args_map(const struct device_node *np,
const char *list_name, const char *stem_name, int index,
struct of_phandle_args *out_args);
-extern int of_parse_phandle_with_fixed_args(const struct device_node *np,
- const char *list_name, int cells_count, int index,
- struct of_phandle_args *out_args);
extern int of_count_phandle_with_args(const struct device_node *np,
const char *list_name, const char *cells_name);
#define of_match_ptr(_ptr) (_ptr)
-/**
- * of_property_read_u8_array - Find and read an array of u8 from a property.
- *
- * @np: device node from which the property value is to be read.
- * @propname: name of the property to be searched.
- * @out_values: pointer to return value, modified only if return value is 0.
- * @sz: number of array elements to read
- *
- * Search for a property in a device node and read 8-bit value(s) from
- * it.
- *
- * dts entry of array should be like:
- * ``property = /bits/ 8 <0x50 0x60 0x70>;``
- *
- * Return: 0 on success, -EINVAL if the property does not exist,
- * -ENODATA if property does not have a value, and -EOVERFLOW if the
- * property data isn't large enough.
- *
- * The out_values is modified only if a valid u8 value can be decoded.
- */
-static inline int of_property_read_u8_array(const struct device_node *np,
- const char *propname,
- u8 *out_values, size_t sz)
-{
- int ret = of_property_read_variable_u8_array(np, propname, out_values,
- sz, 0);
- if (ret >= 0)
- return 0;
- else
- return ret;
-}
-
-/**
- * of_property_read_u16_array - Find and read an array of u16 from a property.
- *
- * @np: device node from which the property value is to be read.
- * @propname: name of the property to be searched.
- * @out_values: pointer to return value, modified only if return value is 0.
- * @sz: number of array elements to read
- *
- * Search for a property in a device node and read 16-bit value(s) from
- * it.
- *
- * dts entry of array should be like:
- * ``property = /bits/ 16 <0x5000 0x6000 0x7000>;``
- *
- * Return: 0 on success, -EINVAL if the property does not exist,
- * -ENODATA if property does not have a value, and -EOVERFLOW if the
- * property data isn't large enough.
- *
- * The out_values is modified only if a valid u16 value can be decoded.
- */
-static inline int of_property_read_u16_array(const struct device_node *np,
- const char *propname,
- u16 *out_values, size_t sz)
-{
- int ret = of_property_read_variable_u16_array(np, propname, out_values,
- sz, 0);
- if (ret >= 0)
- return 0;
- else
- return ret;
-}
-
-/**
- * of_property_read_u32_array - Find and read an array of 32 bit integers
- * from a property.
- *
- * @np: device node from which the property value is to be read.
- * @propname: name of the property to be searched.
- * @out_values: pointer to return value, modified only if return value is 0.
- * @sz: number of array elements to read
- *
- * Search for a property in a device node and read 32-bit value(s) from
- * it.
- *
- * Return: 0 on success, -EINVAL if the property does not exist,
- * -ENODATA if property does not have a value, and -EOVERFLOW if the
- * property data isn't large enough.
- *
- * The out_values is modified only if a valid u32 value can be decoded.
- */
-static inline int of_property_read_u32_array(const struct device_node *np,
- const char *propname,
- u32 *out_values, size_t sz)
-{
- int ret = of_property_read_variable_u32_array(np, propname, out_values,
- sz, 0);
- if (ret >= 0)
- return 0;
- else
- return ret;
-}
-
-/**
- * of_property_read_u64_array - Find and read an array of 64 bit integers
- * from a property.
- *
- * @np: device node from which the property value is to be read.
- * @propname: name of the property to be searched.
- * @out_values: pointer to return value, modified only if return value is 0.
- * @sz: number of array elements to read
- *
- * Search for a property in a device node and read 64-bit value(s) from
- * it.
- *
- * Return: 0 on success, -EINVAL if the property does not exist,
- * -ENODATA if property does not have a value, and -EOVERFLOW if the
- * property data isn't large enough.
- *
- * The out_values is modified only if a valid u64 value can be decoded.
- */
-static inline int of_property_read_u64_array(const struct device_node *np,
- const char *propname,
- u64 *out_values, size_t sz)
-{
- int ret = of_property_read_variable_u64_array(np, propname, out_values,
- sz, 0);
- if (ret >= 0)
- return 0;
- else
- return ret;
-}
-
/*
* struct property *prop;
* const __be32 *p;
return -ENOSYS;
}
-static inline int of_property_read_u8_array(const struct device_node *np,
- const char *propname, u8 *out_values, size_t sz)
-{
- return -ENOSYS;
-}
-
-static inline int of_property_read_u16_array(const struct device_node *np,
- const char *propname, u16 *out_values, size_t sz)
-{
- return -ENOSYS;
-}
-
-static inline int of_property_read_u32_array(const struct device_node *np,
- const char *propname,
- u32 *out_values, size_t sz)
-{
- return -ENOSYS;
-}
-
-static inline int of_property_read_u64_array(const struct device_node *np,
- const char *propname,
- u64 *out_values, size_t sz)
-{
- return -ENOSYS;
-}
-
static inline int of_property_read_u32_index(const struct device_node *np,
const char *propname, u32 index, u32 *out_value)
{
return -ENOSYS;
}
-static inline struct device_node *of_parse_phandle(const struct device_node *np,
- const char *phandle_name,
- int index)
-{
- return NULL;
-}
-
-static inline int of_parse_phandle_with_args(const struct device_node *np,
- const char *list_name,
- const char *cells_name,
- int index,
- struct of_phandle_args *out_args)
+static inline int __of_parse_phandle_with_args(const struct device_node *np,
+ const char *list_name,
+ const char *cells_name,
+ int cell_count,
+ int index,
+ struct of_phandle_args *out_args)
{
return -ENOSYS;
}
return -ENOSYS;
}
-static inline int of_parse_phandle_with_fixed_args(const struct device_node *np,
- const char *list_name, int cells_count, int index,
- struct of_phandle_args *out_args)
-{
- return -ENOSYS;
-}
-
static inline int of_count_phandle_with_args(const struct device_node *np,
const char *list_name,
const char *cells_name)
return np && match && type && !strcmp(match, type);
}
+/**
+ * of_parse_phandle - Resolve a phandle property to a device_node pointer
+ * @np: Pointer to device node holding phandle property
+ * @phandle_name: Name of property holding a phandle value
+ * @index: For properties holding a table of phandles, this is the index into
+ * the table
+ *
+ * Return: The device_node pointer with refcount incremented. Use
+ * of_node_put() on it when done.
+ */
+static inline struct device_node *of_parse_phandle(const struct device_node *np,
+ const char *phandle_name,
+ int index)
+{
+ struct of_phandle_args args;
+
+ if (__of_parse_phandle_with_args(np, phandle_name, NULL, 0,
+ index, &args))
+ return NULL;
+
+ return args.np;
+}
+
+/**
+ * of_parse_phandle_with_args() - Find a node pointed by phandle in a list
+ * @np: pointer to a device tree node containing a list
+ * @list_name: property name that contains a list
+ * @cells_name: property name that specifies phandles' arguments count
+ * @index: index of a phandle to parse out
+ * @out_args: optional pointer to output arguments structure (will be filled)
+ *
+ * This function is useful to parse lists of phandles and their arguments.
+ * Returns 0 on success and fills out_args, on error returns appropriate
+ * errno value.
+ *
+ * Caller is responsible to call of_node_put() on the returned out_args->np
+ * pointer.
+ *
+ * Example::
+ *
+ * phandle1: node1 {
+ * #list-cells = <2>;
+ * };
+ *
+ * phandle2: node2 {
+ * #list-cells = <1>;
+ * };
+ *
+ * node3 {
+ * list = <&phandle1 1 2 &phandle2 3>;
+ * };
+ *
+ * To get a device_node of the ``node2`` node you may call this:
+ * of_parse_phandle_with_args(node3, "list", "#list-cells", 1, &args);
+ */
+static inline int of_parse_phandle_with_args(const struct device_node *np,
+ const char *list_name,
+ const char *cells_name,
+ int index,
+ struct of_phandle_args *out_args)
+{
+ int cell_count = -1;
+
+ /* If cells_name is NULL we assume a cell count of 0 */
+ if (!cells_name)
+ cell_count = 0;
+
+ return __of_parse_phandle_with_args(np, list_name, cells_name,
+ cell_count, index, out_args);
+}
+
+/**
+ * of_parse_phandle_with_fixed_args() - Find a node pointed by phandle in a list
+ * @np: pointer to a device tree node containing a list
+ * @list_name: property name that contains a list
+ * @cell_count: number of argument cells following the phandle
+ * @index: index of a phandle to parse out
+ * @out_args: optional pointer to output arguments structure (will be filled)
+ *
+ * This function is useful to parse lists of phandles and their arguments.
+ * Returns 0 on success and fills out_args, on error returns appropriate
+ * errno value.
+ *
+ * Caller is responsible to call of_node_put() on the returned out_args->np
+ * pointer.
+ *
+ * Example::
+ *
+ * phandle1: node1 {
+ * };
+ *
+ * phandle2: node2 {
+ * };
+ *
+ * node3 {
+ * list = <&phandle1 0 2 &phandle2 2 3>;
+ * };
+ *
+ * To get a device_node of the ``node2`` node you may call this:
+ * of_parse_phandle_with_fixed_args(node3, "list", 2, 1, &args);
+ */
+static inline int of_parse_phandle_with_fixed_args(const struct device_node *np,
+ const char *list_name,
+ int cell_count,
+ int index,
+ struct of_phandle_args *out_args)
+{
+ return __of_parse_phandle_with_args(np, list_name, NULL, cell_count,
+ index, out_args);
+}
+
/**
* of_property_count_u8_elems - Count the number of u8 elements in a property
*
return prop ? true : false;
}
+/**
+ * of_property_read_u8_array - Find and read an array of u8 from a property.
+ *
+ * @np: device node from which the property value is to be read.
+ * @propname: name of the property to be searched.
+ * @out_values: pointer to return value, modified only if return value is 0.
+ * @sz: number of array elements to read
+ *
+ * Search for a property in a device node and read 8-bit value(s) from
+ * it.
+ *
+ * dts entry of array should be like:
+ * ``property = /bits/ 8 <0x50 0x60 0x70>;``
+ *
+ * Return: 0 on success, -EINVAL if the property does not exist,
+ * -ENODATA if property does not have a value, and -EOVERFLOW if the
+ * property data isn't large enough.
+ *
+ * The out_values is modified only if a valid u8 value can be decoded.
+ */
+static inline int of_property_read_u8_array(const struct device_node *np,
+ const char *propname,
+ u8 *out_values, size_t sz)
+{
+ int ret = of_property_read_variable_u8_array(np, propname, out_values,
+ sz, 0);
+ if (ret >= 0)
+ return 0;
+ else
+ return ret;
+}
+
+/**
+ * of_property_read_u16_array - Find and read an array of u16 from a property.
+ *
+ * @np: device node from which the property value is to be read.
+ * @propname: name of the property to be searched.
+ * @out_values: pointer to return value, modified only if return value is 0.
+ * @sz: number of array elements to read
+ *
+ * Search for a property in a device node and read 16-bit value(s) from
+ * it.
+ *
+ * dts entry of array should be like:
+ * ``property = /bits/ 16 <0x5000 0x6000 0x7000>;``
+ *
+ * Return: 0 on success, -EINVAL if the property does not exist,
+ * -ENODATA if property does not have a value, and -EOVERFLOW if the
+ * property data isn't large enough.
+ *
+ * The out_values is modified only if a valid u16 value can be decoded.
+ */
+static inline int of_property_read_u16_array(const struct device_node *np,
+ const char *propname,
+ u16 *out_values, size_t sz)
+{
+ int ret = of_property_read_variable_u16_array(np, propname, out_values,
+ sz, 0);
+ if (ret >= 0)
+ return 0;
+ else
+ return ret;
+}
+
+/**
+ * of_property_read_u32_array - Find and read an array of 32 bit integers
+ * from a property.
+ *
+ * @np: device node from which the property value is to be read.
+ * @propname: name of the property to be searched.
+ * @out_values: pointer to return value, modified only if return value is 0.
+ * @sz: number of array elements to read
+ *
+ * Search for a property in a device node and read 32-bit value(s) from
+ * it.
+ *
+ * Return: 0 on success, -EINVAL if the property does not exist,
+ * -ENODATA if property does not have a value, and -EOVERFLOW if the
+ * property data isn't large enough.
+ *
+ * The out_values is modified only if a valid u32 value can be decoded.
+ */
+static inline int of_property_read_u32_array(const struct device_node *np,
+ const char *propname,
+ u32 *out_values, size_t sz)
+{
+ int ret = of_property_read_variable_u32_array(np, propname, out_values,
+ sz, 0);
+ if (ret >= 0)
+ return 0;
+ else
+ return ret;
+}
+
+/**
+ * of_property_read_u64_array - Find and read an array of 64 bit integers
+ * from a property.
+ *
+ * @np: device node from which the property value is to be read.
+ * @propname: name of the property to be searched.
+ * @out_values: pointer to return value, modified only if return value is 0.
+ * @sz: number of array elements to read
+ *
+ * Search for a property in a device node and read 64-bit value(s) from
+ * it.
+ *
+ * Return: 0 on success, -EINVAL if the property does not exist,
+ * -ENODATA if property does not have a value, and -EOVERFLOW if the
+ * property data isn't large enough.
+ *
+ * The out_values is modified only if a valid u64 value can be decoded.
+ */
+static inline int of_property_read_u64_array(const struct device_node *np,
+ const char *propname,
+ u64 *out_values, size_t sz)
+{
+ int ret = of_property_read_variable_u64_array(np, propname, out_values,
+ sz, 0);
+ if (ret >= 0)
+ return 0;
+ else
+ return ret;
+}
+
static inline int of_property_read_u8(const struct device_node *np,
const char *propname,
u8 *out_value)
static inline void folio_batch_init(struct folio_batch *fbatch)
{
fbatch->nr = 0;
+ fbatch->percpu_pvec_drained = false;
}
static inline unsigned int folio_batch_count(struct folio_batch *fbatch)
extern enum pcpu_fc pcpu_chosen_fc;
-typedef void * (*pcpu_fc_alloc_fn_t)(unsigned int cpu, size_t size,
- size_t align);
-typedef void (*pcpu_fc_free_fn_t)(void *ptr, size_t size);
-typedef void (*pcpu_fc_populate_pte_fn_t)(unsigned long addr);
+typedef int (pcpu_fc_cpu_to_node_fn_t)(int cpu);
typedef int (pcpu_fc_cpu_distance_fn_t)(unsigned int from, unsigned int to);
extern struct pcpu_alloc_info * __init pcpu_alloc_alloc_info(int nr_groups,
extern int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size,
size_t atom_size,
pcpu_fc_cpu_distance_fn_t cpu_distance_fn,
- pcpu_fc_alloc_fn_t alloc_fn,
- pcpu_fc_free_fn_t free_fn);
+ pcpu_fc_cpu_to_node_fn_t cpu_to_nd_fn);
#endif
#ifdef CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK
+void __init pcpu_populate_pte(unsigned long addr);
extern int __init pcpu_page_first_chunk(size_t reserved_size,
- pcpu_fc_alloc_fn_t alloc_fn,
- pcpu_fc_free_fn_t free_fn,
- pcpu_fc_populate_pte_fn_t populate_pte_fn);
+ pcpu_fc_cpu_to_node_fn_t cpu_to_nd_fn);
#endif
extern void __percpu *__alloc_reserved_percpu(size_t size, size_t align) __alloc_size(1);
void pipe_unlock(struct pipe_inode_info *);
void pipe_double_lock(struct pipe_inode_info *, struct pipe_inode_info *);
-extern unsigned int pipe_max_size;
-extern unsigned long pipe_user_pages_hard;
-extern unsigned long pipe_user_pages_soft;
-
/* Wait for a pipe to be readable/writable while dropping the pipe lock */
void pipe_wait_readable(struct pipe_inode_info *);
void pipe_wait_writable(struct pipe_inode_info *);
#include <linux/wait.h>
#include <linux/string.h>
#include <linux/fs.h>
-#include <linux/sysctl.h>
#include <linux/uaccess.h>
#include <uapi/linux/poll.h>
#include <uapi/linux/eventpoll.h>
-extern struct ctl_table epoll_table[]; /* for sysctl */
/* ~832 bytes of stack space used max in sys_select/sys_poll before allocating
additional memory. */
#ifdef __clang__
extern int printk_delay_msec;
extern int dmesg_restrict;
-extern int
-devkmsg_sysctl_set_loglvl(struct ctl_table *table, int write, void *buf,
- size_t *lenp, loff_t *ppos);
-
extern void wake_up_klogd(void);
char *log_buf_addr_get(void);
struct proc_dir_entry *proc_create(const char *name, umode_t mode, struct proc_dir_entry *parent, const struct proc_ops *proc_ops);
extern void proc_set_size(struct proc_dir_entry *, loff_t);
extern void proc_set_user(struct proc_dir_entry *, kuid_t, kgid_t);
-extern void *PDE_DATA(const struct inode *);
+
+/*
+ * Obtain the private data passed by user through proc_create_data() or
+ * related.
+ */
+static inline void *pde_data(const struct inode *inode)
+{
+ return inode->i_private;
+}
+
extern void *proc_get_parent_data(const struct inode *);
extern void proc_remove(struct proc_dir_entry *);
extern void remove_proc_entry(const char *, struct proc_dir_entry *);
#define proc_create_seq(name, mode, parent, ops) ({NULL;})
#define proc_create_single(name, mode, parent, show) ({NULL;})
#define proc_create_single_data(name, mode, parent, show, data) ({NULL;})
-#define proc_create(name, mode, parent, proc_ops) ({NULL;})
-#define proc_create_data(name, mode, parent, proc_ops, data) ({NULL;})
+
+static inline struct proc_dir_entry *
+proc_create(const char *name, umode_t mode, struct proc_dir_entry *parent,
+ const struct proc_ops *proc_ops)
+{ return NULL; }
+
+static inline struct proc_dir_entry *
+proc_create_data(const char *name, umode_t mode, struct proc_dir_entry *parent,
+ const struct proc_ops *proc_ops, void *data)
+{ return NULL; }
static inline void proc_set_size(struct proc_dir_entry *de, loff_t size) {}
static inline void proc_set_user(struct proc_dir_entry *de, kuid_t uid, kgid_t gid) {}
-static inline void *PDE_DATA(const struct inode *inode) {BUG(); return NULL;}
+static inline void *pde_data(const struct inode *inode) {BUG(); return NULL;}
static inline void *proc_get_parent_data(const struct inode *inode) { BUG(); return NULL; }
static inline void proc_remove(struct proc_dir_entry *de) {}
#include <linux/refcount.h>
#include <linux/types.h>
#include <linux/spinlock.h>
+#include <linux/stackdepot.h>
struct ref_tracker;
spin_lock_init(&dir->lock);
dir->quarantine_avail = quarantine_count;
refcount_set(&dir->untracked, 1);
+ stack_depot_init();
}
void ref_tracker_dir_exit(struct ref_tracker_dir *dir);
#define write_lock(lock) _raw_write_lock(lock)
#define read_lock(lock) _raw_read_lock(lock)
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+#define write_lock_nested(lock, subclass) _raw_write_lock_nested(lock, subclass)
+#else
+#define write_lock_nested(lock, subclass) _raw_write_lock(lock)
+#endif
+
#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
#define read_lock_irqsave(lock, flags) \
void __lockfunc _raw_read_lock(rwlock_t *lock) __acquires(lock);
void __lockfunc _raw_write_lock(rwlock_t *lock) __acquires(lock);
+void __lockfunc _raw_write_lock_nested(rwlock_t *lock, int subclass) __acquires(lock);
void __lockfunc _raw_read_lock_bh(rwlock_t *lock) __acquires(lock);
void __lockfunc _raw_write_lock_bh(rwlock_t *lock) __acquires(lock);
void __lockfunc _raw_read_lock_irq(rwlock_t *lock) __acquires(lock);
LOCK_CONTENDED(lock, do_raw_write_trylock, do_raw_write_lock);
}
+static inline void __raw_write_lock_nested(rwlock_t *lock, int subclass)
+{
+ preempt_disable();
+ rwlock_acquire(&lock->dep_map, subclass, 0, _RET_IP_);
+ LOCK_CONTENDED(lock, do_raw_write_trylock, do_raw_write_lock);
+}
+
#endif /* !CONFIG_GENERIC_LOCKBREAK || CONFIG_DEBUG_LOCK_ALLOC */
static inline void __raw_write_unlock(rwlock_t *lock)
extern int rt_read_trylock(rwlock_t *rwlock);
extern void rt_read_unlock(rwlock_t *rwlock);
extern void rt_write_lock(rwlock_t *rwlock);
+extern void rt_write_lock_nested(rwlock_t *rwlock, int subclass);
extern int rt_write_trylock(rwlock_t *rwlock);
extern void rt_write_unlock(rwlock_t *rwlock);
rt_write_lock(rwlock);
}
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+static __always_inline void write_lock_nested(rwlock_t *rwlock, int subclass)
+{
+ rt_write_lock_nested(rwlock, subclass);
+}
+#else
+#define write_lock_nested(lock, subclass) rt_write_lock(((void)(subclass), (lock)))
+#endif
+
static __always_inline void write_lock_bh(rwlock_t *rwlock)
{
local_bh_disable();
sbitmap_free(&sbq->sb);
}
+/**
+ * sbitmap_queue_recalculate_wake_batch() - Recalculate wake batch
+ * @sbq: Bitmap queue to recalculate wake batch.
+ * @users: Number of shares.
+ *
+ * Like sbitmap_queue_update_wake_batch(), this will calculate wake batch
+ * by depth. This interface is for HCTX shared tags or queue shared tags.
+ */
+void sbitmap_queue_recalculate_wake_batch(struct sbitmap_queue *sbq,
+ unsigned int users);
+
/**
* sbitmap_queue_resize() - Resize a &struct sbitmap_queue.
* @sbq: Bitmap queue to resize.
#define get_current_state() READ_ONCE(current->__state)
-/* Task command name length: */
-#define TASK_COMM_LEN 16
+/*
+ * Define the task command name length as enum, then it can be visible to
+ * BPF programs.
+ */
+enum {
+ TASK_COMM_LEN = 16,
+};
extern void scheduler_tick(void);
struct ctl_table;
#ifdef CONFIG_DETECT_HUNG_TASK
-
-#ifdef CONFIG_SMP
-extern unsigned int sysctl_hung_task_all_cpu_backtrace;
-#else
-#define sysctl_hung_task_all_cpu_backtrace 0
-#endif /* CONFIG_SMP */
-
-extern int sysctl_hung_task_check_count;
-extern unsigned int sysctl_hung_task_panic;
+/* used for hung_task and block/ */
extern unsigned long sysctl_hung_task_timeout_secs;
-extern unsigned long sysctl_hung_task_check_interval_secs;
-extern int sysctl_hung_task_warnings;
-int proc_dohung_task_timeout_secs(struct ctl_table *table, int write,
- void *buffer, size_t *lenp, loff_t *ppos);
#else
/* Avoid need for ifdefs elsewhere in the code */
enum { sysctl_hung_task_timeout_secs = 0 };
#define DEFINE_PROC_SHOW_ATTRIBUTE(__name) \
static int __name ## _open(struct inode *inode, struct file *file) \
{ \
- return single_open(file, __name ## _show, PDE_DATA(inode)); \
+ return single_open(file, __name ## _show, pde_data(inode)); \
} \
\
static const struct proc_ops __name ## _proc_ops = { \
extern struct page *shmem_read_mapping_page_gfp(struct address_space *mapping,
pgoff_t index, gfp_t gfp_mask);
extern void shmem_truncate_range(struct inode *inode, loff_t start, loff_t end);
-extern int shmem_unuse(unsigned int type, bool frontswap,
- unsigned long *fs_pages_to_unuse);
+int shmem_unuse(unsigned int type);
extern bool shmem_is_huge(struct vm_area_struct *vma,
struct inode *inode, pgoff_t index);
if (size <= 16 * 1024 * 1024) return 24;
if (size <= 32 * 1024 * 1024) return 25;
- if ((IS_ENABLED(CONFIG_CC_IS_GCC) || CONFIG_CLANG_VERSION >= 110000)
- && !IS_ENABLED(CONFIG_PROFILE_ALL_BRANCHES) && size_is_constant)
+ if (!IS_ENABLED(CONFIG_PROFILE_ALL_BRANCHES) && size_is_constant)
BUILD_BUG_ON_MSG(1, "unexpected size in kmalloc_index()");
else
BUG();
#define _raw_spin_lock_nested(lock, subclass) __LOCK(lock)
#define _raw_read_lock(lock) __LOCK(lock)
#define _raw_write_lock(lock) __LOCK(lock)
+#define _raw_write_lock_nested(lock, subclass) __LOCK(lock)
#define _raw_spin_lock_bh(lock) __LOCK_BH(lock)
#define _raw_read_lock_bh(lock) __LOCK_BH(lock)
#define _raw_write_lock_bh(lock) __LOCK_BH(lock)
unsigned int nr_entries,
gfp_t gfp_flags, bool can_alloc);
+/*
+ * Every user of stack depot has to call this during its own init when it's
+ * decided that it will be calling stack_depot_save() later.
+ *
+ * The alternative is to select STACKDEPOT_ALWAYS_INIT to have stack depot
+ * enabled as part of mm_init(), for subsystems where it's known at compile time
+ * that stack depot will be used.
+ */
+int stack_depot_init(void);
+
+#ifdef CONFIG_STACKDEPOT_ALWAYS_INIT
+static inline int stack_depot_early_init(void) { return stack_depot_init(); }
+#else
+static inline int stack_depot_early_init(void) { return 0; }
+#endif
+
depot_stack_handle_t stack_depot_save(unsigned long *entries,
unsigned int nr_entries, gfp_t gfp_flags);
void stack_depot_print(depot_stack_handle_t stack);
-#ifdef CONFIG_STACKDEPOT
-int stack_depot_init(void);
-#else
-static inline int stack_depot_init(void)
-{
- return 0;
-}
-#endif /* CONFIG_STACKDEPOT */
-
#endif
# endif
}
-#ifdef CONFIG_STACKLEAK_RUNTIME_DISABLE
-int stack_erasing_sysctl(struct ctl_table *table, int write,
- void *buffer, size_t *lenp, loff_t *ppos);
-#endif
-
#else /* !CONFIG_GCC_PLUGIN_STACKLEAK */
static inline void stackleak_task_init(struct task_struct *t) { }
#endif
* these were static in swapfile.c but frontswap.c needs them and we don't
* want to expose them to the dozens of source files that include swap.h
*/
-extern spinlock_t swap_lock;
-extern struct plist_head swap_active_head;
extern struct swap_info_struct *swap_info[];
-extern int try_to_unuse(unsigned int, bool, unsigned long);
extern unsigned long generic_max_swapfile_size(void);
extern unsigned long max_swapfile_size(void);
struct ctl_dir;
/* Keep the same order as in fs/proc/proc_sysctl.c */
-#define SYSCTL_ZERO ((void *)&sysctl_vals[0])
-#define SYSCTL_ONE ((void *)&sysctl_vals[1])
-#define SYSCTL_INT_MAX ((void *)&sysctl_vals[2])
+#define SYSCTL_NEG_ONE ((void *)&sysctl_vals[0])
+#define SYSCTL_ZERO ((void *)&sysctl_vals[1])
+#define SYSCTL_ONE ((void *)&sysctl_vals[2])
+#define SYSCTL_TWO ((void *)&sysctl_vals[3])
+#define SYSCTL_FOUR ((void *)&sysctl_vals[4])
+#define SYSCTL_ONE_HUNDRED ((void *)&sysctl_vals[5])
+#define SYSCTL_TWO_HUNDRED ((void *)&sysctl_vals[6])
+#define SYSCTL_ONE_THOUSAND ((void *)&sysctl_vals[7])
+#define SYSCTL_THREE_THOUSAND ((void *)&sysctl_vals[8])
+#define SYSCTL_INT_MAX ((void *)&sysctl_vals[9])
+
+/* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */
+#define SYSCTL_MAXOLDUID ((void *)&sysctl_vals[10])
extern const int sysctl_vals[];
+#define SYSCTL_LONG_ZERO ((void *)&sysctl_long_vals[0])
+#define SYSCTL_LONG_ONE ((void *)&sysctl_long_vals[1])
+#define SYSCTL_LONG_MAX ((void *)&sysctl_long_vals[2])
+
+extern const unsigned long sysctl_long_vals[];
+
typedef int proc_handler(struct ctl_table *ctl, int write, void *buffer,
size_t *lenp, loff_t *ppos);
#ifdef CONFIG_SYSCTL
+#define DECLARE_SYSCTL_BASE(_name, _table) \
+static struct ctl_table _name##_base_table[] = { \
+ { \
+ .procname = #_name, \
+ .mode = 0555, \
+ .child = _table, \
+ }, \
+ { }, \
+}
+
+extern int __register_sysctl_base(struct ctl_table *base_table);
+
+#define register_sysctl_base(_name) __register_sysctl_base(_name##_base_table)
+
void proc_sys_poll_notify(struct ctl_table_poll *poll);
extern void setup_sysctl_set(struct ctl_table_set *p,
void unregister_sysctl_table(struct ctl_table_header * table);
-extern int sysctl_init(void);
+extern int sysctl_init_bases(void);
+extern void __register_sysctl_init(const char *path, struct ctl_table *table,
+ const char *table_name);
+#define register_sysctl_init(path, table) __register_sysctl_init(path, table, #table)
+extern struct ctl_table_header *register_sysctl_mount_point(const char *path);
+
void do_sysctl_args(void);
+int do_proc_douintvec(struct ctl_table *table, int write,
+ void *buffer, size_t *lenp, loff_t *ppos,
+ int (*conv)(unsigned long *lvalp,
+ unsigned int *valp,
+ int write, void *data),
+ void *data);
extern int pwrsw_enabled;
extern int unaligned_enabled;
extern int no_unaligned_warning;
extern struct ctl_table sysctl_mount_point[];
-extern struct ctl_table random_table[];
-extern struct ctl_table firmware_config_table[];
-extern struct ctl_table epoll_table[];
#else /* CONFIG_SYSCTL */
+
+#define DECLARE_SYSCTL_BASE(_name, _table)
+
+static inline int __register_sysctl_base(struct ctl_table *base_table)
+{
+ return 0;
+}
+
+#define register_sysctl_base(table) __register_sysctl_base(table)
+
static inline struct ctl_table_header *register_sysctl_table(struct ctl_table * table)
{
return NULL;
}
+static inline struct sysctl_header *register_sysctl_mount_point(const char *path)
+{
+ return NULL;
+}
+
static inline struct ctl_table_header *register_sysctl_paths(
const struct ctl_path *path, struct ctl_table *table)
{
#ifndef _LINUX_UNALIGNED_PACKED_STRUCT_H
#define _LINUX_UNALIGNED_PACKED_STRUCT_H
-#include <linux/kernel.h>
+#include <linux/types.h>
struct __una_u16 { u16 x; } __packed;
struct __una_u32 { u32 x; } __packed;
static inline void fqdir_pre_exit(struct fqdir *fqdir)
{
- fqdir->high_thresh = 0; /* prevent creation of new frags */
- fqdir->dead = true;
+ /* Prevent creation of new frags.
+ * Pairs with READ_ONCE() in inet_frag_find().
+ */
+ WRITE_ONCE(fqdir->high_thresh, 0);
+
+ /* Pairs with READ_ONCE() in inet_frag_kill(), ip_expire()
+ * and ip6frag_expire_frag_queue().
+ */
+ WRITE_ONCE(fqdir->dead, true);
}
void fqdir_exit(struct fqdir *fqdir);
struct sk_buff *head;
rcu_read_lock();
- if (fq->q.fqdir->dead)
+ /* Paired with the WRITE_ONCE() in fqdir_pre_exit(). */
+ if (READ_ONCE(fq->q.fqdir->dead))
goto out_rcu_unlock;
spin_lock(&fq->q.lock);
#ifdef CONFIG_NET_CLS_ACT
exts->type = 0;
exts->nr_actions = 0;
+ /* Note: we do not own yet a reference on net.
+ * This reference might be taken later from tcf_exts_get_net().
+ */
exts->net = net;
- netns_tracker_alloc(net, &exts->ns_tracker, GFP_KERNEL);
exts->actions = kcalloc(TCA_ACT_MAX_PRIO, sizeof(struct tc_action *),
GFP_KERNEL);
if (!exts->actions)
u64 rate_bytes_ps; /* bytes per second */
u32 mult;
u16 overhead;
+ u16 mpu;
u8 linklayer;
u8 shift;
};
{
len += r->overhead;
+ if (len < r->mpu)
+ len = r->mpu;
+
if (unlikely(r->linklayer == TC_LINKLAYER_ATM))
return ((u64)(DIV_ROUND_UP(len,48)*53) * r->mult) >> r->shift;
res->rate = min_t(u64, r->rate_bytes_ps, ~0U);
res->overhead = r->overhead;
+ res->mpu = r->mpu;
res->linklayer = (r->linklayer & TC_LINKLAYER_MASK);
}
int retries, struct scsi_mode_data *data,
struct scsi_sense_hdr *);
extern int scsi_mode_select(struct scsi_device *sdev, int pf, int sp,
- int modepage, unsigned char *buffer, int len,
- int timeout, int retries,
- struct scsi_mode_data *data,
+ unsigned char *buffer, int len, int timeout,
+ int retries, struct scsi_mode_data *data,
struct scsi_sense_hdr *);
extern int scsi_test_unit_ready(struct scsi_device *sdev, int timeout,
int retries, struct scsi_sense_hdr *sshdr);
* For utility and test programs see: http://sg.danny.cz/sg/sg3_utils.html
*/
-#ifdef __KERNEL__
-extern int sg_big_buff; /* for sysctl */
-#endif
-
typedef struct sg_iovec /* same structure as used by readv() Linux system */
{ /* call. It defines one scatter-gather element. */
TRACE_EVENT(cachefiles_lookup,
TP_PROTO(struct cachefiles_object *obj,
+ struct dentry *dir,
struct dentry *de),
- TP_ARGS(obj, de),
+ TP_ARGS(obj, dir, de),
TP_STRUCT__entry(
__field(unsigned int, obj )
__field(short, error )
+ __field(unsigned long, dino )
__field(unsigned long, ino )
),
TP_fast_assign(
- __entry->obj = obj->debug_id;
+ __entry->obj = obj ? obj->debug_id : 0;
+ __entry->dino = d_backing_inode(dir)->i_ino;
__entry->ino = (!IS_ERR(de) && d_backing_inode(de) ?
d_backing_inode(de)->i_ino : 0);
__entry->error = IS_ERR(de) ? PTR_ERR(de) : 0;
),
- TP_printk("o=%08x i=%lx e=%d",
- __entry->obj, __entry->ino, __entry->error)
+ TP_printk("o=%08x dB=%lx B=%lx e=%d",
+ __entry->obj, __entry->dino, __entry->ino, __entry->error)
+ );
+
+TRACE_EVENT(cachefiles_mkdir,
+ TP_PROTO(struct dentry *dir, struct dentry *subdir),
+
+ TP_ARGS(dir, subdir),
+
+ TP_STRUCT__entry(
+ __field(unsigned int, dir )
+ __field(unsigned int, subdir )
+ ),
+
+ TP_fast_assign(
+ __entry->dir = d_backing_inode(dir)->i_ino;
+ __entry->subdir = d_backing_inode(subdir)->i_ino;
+ ),
+
+ TP_printk("dB=%x sB=%x",
+ __entry->dir,
+ __entry->subdir)
);
TRACE_EVENT(cachefiles_tmpfile,
__entry->backer = backer->i_ino;
),
- TP_printk("o=%08x b=%08x",
+ TP_printk("o=%08x B=%x",
__entry->obj,
__entry->backer)
);
__entry->backer = backer->i_ino;
),
- TP_printk("o=%08x b=%08x",
+ TP_printk("o=%08x B=%x",
__entry->obj,
__entry->backer)
);
TRACE_EVENT(cachefiles_unlink,
TP_PROTO(struct cachefiles_object *obj,
- struct dentry *de,
+ ino_t ino,
enum fscache_why_object_killed why),
- TP_ARGS(obj, de, why),
+ TP_ARGS(obj, ino, why),
/* Note that obj may be NULL */
TP_STRUCT__entry(
__field(unsigned int, obj )
- __field(struct dentry *, de )
+ __field(unsigned int, ino )
__field(enum fscache_why_object_killed, why )
),
TP_fast_assign(
__entry->obj = obj ? obj->debug_id : UINT_MAX;
- __entry->de = de;
+ __entry->ino = ino;
__entry->why = why;
),
- TP_printk("o=%08x d=%p w=%s",
- __entry->obj, __entry->de,
+ TP_printk("o=%08x B=%x w=%s",
+ __entry->obj, __entry->ino,
__print_symbolic(__entry->why, cachefiles_obj_kill_traces))
);
TRACE_EVENT(cachefiles_rename,
TP_PROTO(struct cachefiles_object *obj,
- struct dentry *de,
- struct dentry *to,
+ ino_t ino,
enum fscache_why_object_killed why),
- TP_ARGS(obj, de, to, why),
+ TP_ARGS(obj, ino, why),
/* Note that obj may be NULL */
TP_STRUCT__entry(
__field(unsigned int, obj )
- __field(struct dentry *, de )
- __field(struct dentry *, to )
+ __field(unsigned int, ino )
__field(enum fscache_why_object_killed, why )
),
TP_fast_assign(
__entry->obj = obj ? obj->debug_id : UINT_MAX;
- __entry->de = de;
- __entry->to = to;
+ __entry->ino = ino;
__entry->why = why;
),
- TP_printk("o=%08x d=%p t=%p w=%s",
- __entry->obj, __entry->de, __entry->to,
+ TP_printk("o=%08x B=%x w=%s",
+ __entry->obj, __entry->ino,
__print_symbolic(__entry->why, cachefiles_obj_kill_traces))
);
__entry->ino = ino;
),
- TP_printk("o=%08x %s i=%llx c=%u",
+ TP_printk("o=%08x %s B=%llx c=%u",
__entry->obj,
__print_symbolic(__entry->why, cachefiles_coherency_traces),
__entry->ino,
__entry->ino = ino;
),
- TP_printk("V=%08x %s i=%llx",
+ TP_printk("V=%08x %s B=%llx",
__entry->vol,
__print_symbolic(__entry->why, cachefiles_coherency_traces),
__entry->ino)
__entry->cache_inode = cache_inode;
),
- TP_printk("R=%08x[%u] %s %s f=%02x s=%llx %zx ni=%x b=%x",
+ TP_printk("R=%08x[%u] %s %s f=%02x s=%llx %zx ni=%x B=%x",
__entry->rreq, __entry->index,
__print_symbolic(__entry->source, netfs_sreq_sources),
__print_symbolic(__entry->why, cachefiles_prepare_read_traces),
__entry->len = len;
),
- TP_printk("o=%08x b=%08x s=%llx l=%zx",
+ TP_printk("o=%08x B=%x s=%llx l=%zx",
__entry->obj,
__entry->backer,
__entry->start,
__entry->len = len;
),
- TP_printk("o=%08x b=%08x s=%llx l=%zx",
+ TP_printk("o=%08x B=%x s=%llx l=%zx",
__entry->obj,
__entry->backer,
__entry->start,
__entry->why = why;
),
- TP_printk("o=%08x b=%08x %s l=%llx->%llx",
+ TP_printk("o=%08x B=%x %s l=%llx->%llx",
__entry->obj,
__entry->backer,
__print_symbolic(__entry->why, cachefiles_trunc_traces),
__entry->inode = inode->i_ino;
),
- TP_printk("o=%08x i=%lx",
+ TP_printk("o=%08x B=%lx",
+ __entry->obj, __entry->inode)
+ );
+
+TRACE_EVENT(cachefiles_mark_failed,
+ TP_PROTO(struct cachefiles_object *obj,
+ struct inode *inode),
+
+ TP_ARGS(obj, inode),
+
+ /* Note that obj may be NULL */
+ TP_STRUCT__entry(
+ __field(unsigned int, obj )
+ __field(ino_t, inode )
+ ),
+
+ TP_fast_assign(
+ __entry->obj = obj ? obj->debug_id : 0;
+ __entry->inode = inode->i_ino;
+ ),
+
+ TP_printk("o=%08x B=%lx",
__entry->obj, __entry->inode)
);
__entry->inode = inode->i_ino;
),
- TP_printk("o=%08x i=%lx",
+ TP_printk("o=%08x B=%lx",
__entry->obj, __entry->inode)
);
__entry->where = where;
),
- TP_printk("o=%08x b=%08x %s e=%d",
+ TP_printk("o=%08x B=%x %s e=%d",
__entry->obj,
__entry->backer,
__print_symbolic(__entry->where, cachefiles_error_traces),
__entry->where = where;
),
- TP_printk("o=%08x b=%08x %s e=%d",
+ TP_printk("o=%08x B=%x %s e=%d",
__entry->obj,
__entry->backer,
__print_symbolic(__entry->where, cachefiles_error_traces),
enum error_detector {
ERROR_DETECTOR_KFENCE,
- ERROR_DETECTOR_KASAN
+ ERROR_DETECTOR_KASAN,
+ ERROR_DETECTOR_WARN,
};
#endif /* __ERROR_REPORT_DECLARE_TRACE_ENUMS_ONCE_ONLY */
-#define error_detector_list \
+#define error_detector_list \
EM(ERROR_DETECTOR_KFENCE, "kfence") \
- EMe(ERROR_DETECTOR_KASAN, "kasan")
+ EM(ERROR_DETECTOR_KASAN, "kasan") \
+ EMe(ERROR_DETECTOR_WARN, "warning")
/* Always end the list with an EMe. */
#undef EM
TRACE_EVENT(f2fs_file_write_iter,
- TP_PROTO(struct inode *inode, unsigned long offset,
- unsigned long length, int ret),
+ TP_PROTO(struct inode *inode, loff_t offset, size_t length,
+ ssize_t ret),
TP_ARGS(inode, offset, length, ret),
TP_STRUCT__entry(
__field(dev_t, dev)
__field(ino_t, ino)
- __field(unsigned long, offset)
- __field(unsigned long, length)
- __field(int, ret)
+ __field(loff_t, offset)
+ __field(size_t, length)
+ __field(ssize_t, ret)
),
TP_fast_assign(
),
TP_printk("dev = (%d,%d), ino = %lu, "
- "offset = %lu, length = %lu, written(err) = %d",
+ "offset = %lld, length = %zu, written(err) = %zd",
show_dev_ino(__entry),
__entry->offset,
__entry->length,
TRACE_EVENT(f2fs_direct_IO_enter,
- TP_PROTO(struct inode *inode, loff_t offset, unsigned long len, int rw),
+ TP_PROTO(struct inode *inode, struct kiocb *iocb, long len, int rw),
- TP_ARGS(inode, offset, len, rw),
+ TP_ARGS(inode, iocb, len, rw),
TP_STRUCT__entry(
__field(dev_t, dev)
__field(ino_t, ino)
- __field(loff_t, pos)
+ __field(struct kiocb *, iocb)
__field(unsigned long, len)
__field(int, rw)
),
TP_fast_assign(
__entry->dev = inode->i_sb->s_dev;
__entry->ino = inode->i_ino;
- __entry->pos = offset;
+ __entry->iocb = iocb;
__entry->len = len;
__entry->rw = rw;
),
- TP_printk("dev = (%d,%d), ino = %lu pos = %lld len = %lu rw = %d",
+ TP_printk("dev = (%d,%d), ino = %lu pos = %lld len = %lu ki_flags = %x ki_hint = %x ki_ioprio = %x rw = %d",
show_dev_ino(__entry),
- __entry->pos,
+ __entry->iocb->ki_pos,
__entry->len,
+ __entry->iocb->ki_flags,
+ __entry->iocb->ki_hint,
+ __entry->iocb->ki_ioprio,
__entry->rw)
);
);
DECLARE_EVENT_CLASS(random__mix_pool_bytes,
- TP_PROTO(const char *pool_name, int bytes, unsigned long IP),
+ TP_PROTO(int bytes, unsigned long IP),
- TP_ARGS(pool_name, bytes, IP),
+ TP_ARGS(bytes, IP),
TP_STRUCT__entry(
- __field( const char *, pool_name )
__field( int, bytes )
__field(unsigned long, IP )
),
TP_fast_assign(
- __entry->pool_name = pool_name;
__entry->bytes = bytes;
__entry->IP = IP;
),
- TP_printk("%s pool: bytes %d caller %pS",
- __entry->pool_name, __entry->bytes, (void *)__entry->IP)
+ TP_printk("input pool: bytes %d caller %pS",
+ __entry->bytes, (void *)__entry->IP)
);
DEFINE_EVENT(random__mix_pool_bytes, mix_pool_bytes,
- TP_PROTO(const char *pool_name, int bytes, unsigned long IP),
+ TP_PROTO(int bytes, unsigned long IP),
- TP_ARGS(pool_name, bytes, IP)
+ TP_ARGS(bytes, IP)
);
DEFINE_EVENT(random__mix_pool_bytes, mix_pool_bytes_nolock,
- TP_PROTO(const char *pool_name, int bytes, unsigned long IP),
+ TP_PROTO(int bytes, unsigned long IP),
- TP_ARGS(pool_name, bytes, IP)
+ TP_ARGS(bytes, IP)
);
TRACE_EVENT(credit_entropy_bits,
- TP_PROTO(const char *pool_name, int bits, int entropy_count,
- unsigned long IP),
+ TP_PROTO(int bits, int entropy_count, unsigned long IP),
- TP_ARGS(pool_name, bits, entropy_count, IP),
+ TP_ARGS(bits, entropy_count, IP),
TP_STRUCT__entry(
- __field( const char *, pool_name )
__field( int, bits )
__field( int, entropy_count )
__field(unsigned long, IP )
),
TP_fast_assign(
- __entry->pool_name = pool_name;
__entry->bits = bits;
__entry->entropy_count = entropy_count;
__entry->IP = IP;
),
- TP_printk("%s pool: bits %d entropy_count %d caller %pS",
- __entry->pool_name, __entry->bits,
- __entry->entropy_count, (void *)__entry->IP)
+ TP_printk("input pool: bits %d entropy_count %d caller %pS",
+ __entry->bits, __entry->entropy_count, (void *)__entry->IP)
);
TRACE_EVENT(debit_entropy,
- TP_PROTO(const char *pool_name, int debit_bits),
+ TP_PROTO(int debit_bits),
- TP_ARGS(pool_name, debit_bits),
+ TP_ARGS( debit_bits),
TP_STRUCT__entry(
- __field( const char *, pool_name )
__field( int, debit_bits )
),
TP_fast_assign(
- __entry->pool_name = pool_name;
__entry->debit_bits = debit_bits;
),
- TP_printk("%s: debit_bits %d", __entry->pool_name,
- __entry->debit_bits)
+ TP_printk("input pool: debit_bits %d", __entry->debit_bits)
);
TRACE_EVENT(add_input_randomness,
);
DECLARE_EVENT_CLASS(random__extract_entropy,
- TP_PROTO(const char *pool_name, int nbytes, int entropy_count,
- unsigned long IP),
+ TP_PROTO(int nbytes, int entropy_count, unsigned long IP),
- TP_ARGS(pool_name, nbytes, entropy_count, IP),
+ TP_ARGS(nbytes, entropy_count, IP),
TP_STRUCT__entry(
- __field( const char *, pool_name )
__field( int, nbytes )
__field( int, entropy_count )
__field(unsigned long, IP )
),
TP_fast_assign(
- __entry->pool_name = pool_name;
__entry->nbytes = nbytes;
__entry->entropy_count = entropy_count;
__entry->IP = IP;
),
- TP_printk("%s pool: nbytes %d entropy_count %d caller %pS",
- __entry->pool_name, __entry->nbytes, __entry->entropy_count,
- (void *)__entry->IP)
+ TP_printk("input pool: nbytes %d entropy_count %d caller %pS",
+ __entry->nbytes, __entry->entropy_count, (void *)__entry->IP)
);
DEFINE_EVENT(random__extract_entropy, extract_entropy,
- TP_PROTO(const char *pool_name, int nbytes, int entropy_count,
- unsigned long IP),
+ TP_PROTO(int nbytes, int entropy_count, unsigned long IP),
- TP_ARGS(pool_name, nbytes, entropy_count, IP)
+ TP_ARGS(nbytes, entropy_count, IP)
);
TRACE_EVENT(urandom_read,
#define AFFS_SUPER_MAGIC 0xadff
#define AFS_SUPER_MAGIC 0x5346414F
#define AUTOFS_SUPER_MAGIC 0x0187
+#define CEPH_SUPER_MAGIC 0x00c36400
#define CODA_SUPER_MAGIC 0x73757245
#define CRAMFS_MAGIC 0x28cd3d45 /* some random number */
#define CRAMFS_MAGIC_WEND 0x453dcd28 /* magic number with the wrong endianess */
* the GPL version of OSS-4.x and build against that version
* of the header.
*
- * We redefine the extern keyword so that make headers_check
+ * We redefine the extern keyword so that usr/include/headers_check.pl
* does not complain about SEQ_USE_EXTBUF.
*/
#define SEQ_DECLAREBUF() SEQ_USE_EXTBUF()
*/
-#define TASKSTATS_VERSION 10
+#define TASKSTATS_VERSION 11
#define TS_COMM_LEN 32 /* should be >= TASK_COMM_LEN
* in linux/sched.h */
/* v10: 64-bit btime to avoid overflow */
__u64 ac_btime64; /* 64-bit begin time */
+
+ /* Delay waiting for memory compact */
+ __u64 compact_count;
+ __u64 compact_delay_total;
};
/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/* DO NOT USE in new code! This is solely for MEI due to legacy reasons */
/*
* UUID/GUID definition
*
* Copyright (C) 2010, Intel Corp.
* Huang Ying <ying.huang@intel.com>
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License version
- * 2 as published by the Free Software Foundation;
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
*/
#ifndef _UAPI_LINUX_UUID_H_
config LD_ORPHAN_WARN
def_bool y
depends on ARCH_WANT_LD_ORPHAN_WARN
- depends on !LD_IS_LLD || LLD_VERSION >= 110000
depends on $(ld-option,--orphan-handling=warn)
config SYSCTL
cmd_compile.h = \
$(CONFIG_SHELL) $(srctree)/scripts/mkcompile_h $@ \
"$(UTS_MACHINE)" "$(CONFIG_SMP)" "$(CONFIG_PREEMPT_BUILD)" \
- "$(CONFIG_PREEMPT_RT)" $(CONFIG_CC_VERSION_TEXT) "$(LD)"
+ "$(CONFIG_PREEMPT_RT)" "$(CONFIG_CC_VERSION_TEXT)" "$(LD)"
include/generated/compile.h: FORCE
$(call cmd,compile.h)
init_mem_debugging_and_hardening();
kfence_alloc_pool();
report_meminit();
- stack_depot_init();
+ stack_depot_early_init();
mem_init();
mem_init_print_info();
- /* page_owner must be initialized after buddy is ready */
- page_ext_init_flatmem_late();
kmem_cache_init();
+ /*
+ * page_owner must be initialized after buddy is ready, and also after
+ * slab is ready so that stack_depot_init() works properly
+ */
+ page_ext_init_flatmem_late();
kmemleak_init();
pgtable_init();
debug_objects_mem_init();
if (!iter)
return -ENOMEM;
- iter->iface = PDE_DATA(inode);
+ iter->iface = pde_data(inode);
iter->ns = get_ipc_ns(current->nsproxy->ipc_ns);
iter->pid_ns = get_pid_ns(task_active_pid_ns(current));
i, btf_type_str(t));
return -EINVAL;
}
- if (check_ctx_reg(env, reg, regno))
+ if (check_ptr_off_reg(env, reg, regno))
return -EINVAL;
} else if (is_kfunc && (reg->type == PTR_TO_BTF_ID || reg2btf_ids[reg->type])) {
const struct btf_type *reg_ref_t;
int opt;
opt = fs_parse(fc, bpf_fs_parameters, param, &result);
- if (opt < 0)
+ if (opt < 0) {
/* We might like to report bad mount options here, but
* traditionally we've ignored all mount options, so we'd
* better continue to ignore non-existing options for bpf.
*/
- return opt == -ENOPARAM ? 0 : opt;
+ if (opt == -ENOPARAM) {
+ opt = vfs_parse_fs_param_source(fc, param);
+ if (opt != -ENOPARAM)
+ return opt;
+
+ return 0;
+ }
+
+ if (opt < 0)
+ return opt;
+ }
switch (opt) {
case OPT_MODE:
if (type & MEM_RDONLY)
strncpy(prefix, "rdonly_", 16);
+ if (type & MEM_ALLOC)
+ strncpy(prefix, "alloc_", 16);
snprintf(env->type_str_buf, TYPE_STR_BUF_LEN, "%s%s%s",
prefix, str[base_type(type)], postfix);
static void mark_stack_slot_scratched(struct bpf_verifier_env *env, u32 spi)
{
- env->scratched_stack_slots |= 1UL << spi;
+ env->scratched_stack_slots |= 1ULL << spi;
}
static bool reg_scratched(const struct bpf_verifier_env *env, u32 regno)
static void mark_verifier_state_clean(struct bpf_verifier_env *env)
{
env->scratched_regs = 0U;
- env->scratched_stack_slots = 0UL;
+ env->scratched_stack_slots = 0ULL;
}
/* Used for printing the entire verifier state. */
static void mark_verifier_state_scratched(struct bpf_verifier_env *env)
{
env->scratched_regs = ~0U;
- env->scratched_stack_slots = ~0UL;
+ env->scratched_stack_slots = ~0ULL;
}
/* The reg state of a pointer or a bounded scalar was saved when
}
#endif
-int check_ctx_reg(struct bpf_verifier_env *env,
- const struct bpf_reg_state *reg, int regno)
+static int __check_ptr_off_reg(struct bpf_verifier_env *env,
+ const struct bpf_reg_state *reg, int regno,
+ bool fixed_off_ok)
{
- /* Access to ctx or passing it to a helper is only allowed in
- * its original, unmodified form.
+ /* Access to this pointer-typed register or passing it to a helper
+ * is only allowed in its original, unmodified form.
*/
- if (reg->off) {
- verbose(env, "dereference of modified ctx ptr R%d off=%d disallowed\n",
- regno, reg->off);
+ if (!fixed_off_ok && reg->off) {
+ verbose(env, "dereference of modified %s ptr R%d off=%d disallowed\n",
+ reg_type_str(env, reg->type), regno, reg->off);
return -EACCES;
}
char tn_buf[48];
tnum_strn(tn_buf, sizeof(tn_buf), reg->var_off);
- verbose(env, "variable ctx access var_off=%s disallowed\n", tn_buf);
+ verbose(env, "variable %s access var_off=%s disallowed\n",
+ reg_type_str(env, reg->type), tn_buf);
return -EACCES;
}
return 0;
}
+int check_ptr_off_reg(struct bpf_verifier_env *env,
+ const struct bpf_reg_state *reg, int regno)
+{
+ return __check_ptr_off_reg(env, reg, regno, false);
+}
+
static int __check_buffer_access(struct bpf_verifier_env *env,
const char *buf_info,
const struct bpf_reg_state *reg,
return -EACCES;
}
- err = check_ctx_reg(env, reg, regno);
+ err = check_ptr_off_reg(env, reg, regno);
if (err < 0)
return err;
PTR_TO_MAP_KEY,
PTR_TO_MAP_VALUE,
PTR_TO_MEM,
+ PTR_TO_MEM | MEM_ALLOC,
PTR_TO_BUF,
},
};
static const struct bpf_reg_types fullsock_types = { .types = { PTR_TO_SOCKET } };
static const struct bpf_reg_types scalar_types = { .types = { SCALAR_VALUE } };
static const struct bpf_reg_types context_types = { .types = { PTR_TO_CTX } };
-static const struct bpf_reg_types alloc_mem_types = { .types = { PTR_TO_MEM } };
+static const struct bpf_reg_types alloc_mem_types = { .types = { PTR_TO_MEM | MEM_ALLOC } };
static const struct bpf_reg_types const_map_ptr_types = { .types = { CONST_PTR_TO_MAP } };
static const struct bpf_reg_types btf_ptr_types = { .types = { PTR_TO_BTF_ID } };
static const struct bpf_reg_types spin_lock_types = { .types = { PTR_TO_MAP_VALUE } };
kernel_type_name(btf_vmlinux, *arg_btf_id));
return -EACCES;
}
-
- if (!tnum_is_const(reg->var_off) || reg->var_off.value) {
- verbose(env, "R%d is a pointer to in-kernel struct with non-zero offset\n",
- regno);
- return -EACCES;
- }
}
return 0;
if (err)
return err;
- if (type == PTR_TO_CTX) {
- err = check_ctx_reg(env, reg, regno);
+ switch ((u32)type) {
+ case SCALAR_VALUE:
+ /* Pointer types where reg offset is explicitly allowed: */
+ case PTR_TO_PACKET:
+ case PTR_TO_PACKET_META:
+ case PTR_TO_MAP_KEY:
+ case PTR_TO_MAP_VALUE:
+ case PTR_TO_MEM:
+ case PTR_TO_MEM | MEM_RDONLY:
+ case PTR_TO_MEM | MEM_ALLOC:
+ case PTR_TO_BUF:
+ case PTR_TO_BUF | MEM_RDONLY:
+ case PTR_TO_STACK:
+ /* Some of the argument types nevertheless require a
+ * zero register offset.
+ */
+ if (arg_type == ARG_PTR_TO_ALLOC_MEM)
+ goto force_off_check;
+ break;
+ /* All the rest must be rejected: */
+ default:
+force_off_check:
+ err = __check_ptr_off_reg(env, reg, regno,
+ type == PTR_TO_BTF_ID);
if (err < 0)
return err;
+ break;
}
skip_type_check:
return 0;
}
- if (insn->src_reg == BPF_PSEUDO_BTF_ID) {
- mark_reg_known_zero(env, regs, insn->dst_reg);
+ /* All special src_reg cases are listed below. From this point onwards
+ * we either succeed and assign a corresponding dst_reg->type after
+ * zeroing the offset, or fail and reject the program.
+ */
+ mark_reg_known_zero(env, regs, insn->dst_reg);
+ if (insn->src_reg == BPF_PSEUDO_BTF_ID) {
dst_reg->type = aux->btf_var.reg_type;
switch (base_type(dst_reg->type)) {
case PTR_TO_MEM:
}
map = env->used_maps[aux->map_index];
- mark_reg_known_zero(env, regs, insn->dst_reg);
dst_reg->map_ptr = map;
if (insn->src_reg == BPF_PSEUDO_MAP_VALUE ||
return err;
}
- err = check_ctx_reg(env, ®s[ctx_reg], ctx_reg);
+ err = check_ptr_off_reg(env, ®s[ctx_reg], ctx_reg);
if (err < 0)
return err;
--- /dev/null
+# The config is based on running daily CI for enterprise Linux distros to
+# seek regressions on linux-next builds on different bare-metal and virtual
+# platforms. It can be used for example,
+#
+# $ make ARCH=arm64 defconfig debug.config
+#
+# Keep alphabetically sorted inside each section.
+#
+# printk and dmesg options
+#
+CONFIG_DEBUG_BUGVERBOSE=y
+CONFIG_DYNAMIC_DEBUG=y
+CONFIG_PRINTK_CALLER=y
+CONFIG_PRINTK_TIME=y
+CONFIG_SYMBOLIC_ERRNAME=y
+#
+# Compile-time checks and compiler options
+#
+CONFIG_DEBUG_INFO=y
+CONFIG_DEBUG_SECTION_MISMATCH=y
+CONFIG_FRAME_WARN=2048
+CONFIG_SECTION_MISMATCH_WARN_ONLY=y
+#
+# Generic Kernel Debugging Instruments
+#
+# CONFIG_UBSAN_ALIGNMENT is not set
+# CONFIG_UBSAN_DIV_ZERO is not set
+# CONFIG_UBSAN_TRAP is not set
+# CONFIG_WARN_ALL_UNSEEDED_RANDOM is not set
+CONFIG_DEBUG_FS=y
+CONFIG_DEBUG_FS_ALLOW_ALL=y
+CONFIG_DEBUG_IRQFLAGS=y
+CONFIG_UBSAN=y
+CONFIG_UBSAN_BOOL=y
+CONFIG_UBSAN_BOUNDS=y
+CONFIG_UBSAN_ENUM=y
+CONFIG_UBSAN_SHIFT=y
+CONFIG_UBSAN_UNREACHABLE=y
+#
+# Memory Debugging
+#
+# CONFIG_DEBUG_PAGEALLOC is not set
+# CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF is not set
+# CONFIG_DEBUG_RODATA_TEST is not set
+# CONFIG_DEBUG_WX is not set
+# CONFIG_KFENCE is not set
+# CONFIG_PAGE_POISONING is not set
+# CONFIG_SLUB_STATS is not set
+CONFIG_PAGE_EXTENSION=y
+CONFIG_PAGE_OWNER=y
+CONFIG_DEBUG_KMEMLEAK=y
+CONFIG_DEBUG_KMEMLEAK_AUTO_SCAN=y
+CONFIG_DEBUG_OBJECTS=y
+CONFIG_DEBUG_OBJECTS_ENABLE_DEFAULT=1
+CONFIG_DEBUG_OBJECTS_FREE=y
+CONFIG_DEBUG_OBJECTS_PERCPU_COUNTER=y
+CONFIG_DEBUG_OBJECTS_RCU_HEAD=y
+CONFIG_DEBUG_OBJECTS_TIMERS=y
+CONFIG_DEBUG_OBJECTS_WORK=y
+CONFIG_DEBUG_PER_CPU_MAPS=y
+CONFIG_DEBUG_STACK_USAGE=y
+CONFIG_DEBUG_VIRTUAL=y
+CONFIG_DEBUG_VM=y
+CONFIG_DEBUG_VM_PGFLAGS=y
+CONFIG_DEBUG_VM_RB=y
+CONFIG_DEBUG_VM_VMACACHE=y
+CONFIG_GENERIC_PTDUMP=y
+CONFIG_KASAN=y
+CONFIG_KASAN_GENERIC=y
+CONFIG_KASAN_INLINE=y
+CONFIG_KASAN_VMALLOC=y
+CONFIG_PTDUMP_DEBUGFS=y
+CONFIG_SCHED_STACK_END_CHECK=y
+CONFIG_SLUB_DEBUG_ON=y
+#
+# Debug Oops, Lockups and Hangs
+#
+# CONFIG_BOOTPARAM_HUNG_TASK_PANIC is not set
+# CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC is not set
+CONFIG_DEBUG_ATOMIC_SLEEP=y
+CONFIG_DETECT_HUNG_TASK=y
+CONFIG_PANIC_ON_OOPS=y
+CONFIG_PANIC_TIMEOUT=0
+CONFIG_SOFTLOCKUP_DETECTOR=y
+#
+# Lock Debugging (spinlocks, mutexes, etc...)
+#
+# CONFIG_PROVE_RAW_LOCK_NESTING is not set
+CONFIG_PROVE_LOCKING=y
+#
+# Debug kernel data structures
+#
+CONFIG_BUG_ON_DATA_CORRUPTION=y
+#
+# RCU Debugging
+#
+CONFIG_PROVE_RCU=y
+CONFIG_PROVE_RCU_LIST=y
+#
+# Tracers
+#
+CONFIG_BRANCH_PROFILE_NONE=y
+CONFIG_DYNAMIC_FTRACE=y
+CONFIG_FTRACE=y
+CONFIG_FUNCTION_TRACER=y
*/
void __delayacct_blkio_end(struct task_struct *p)
{
- struct task_delay_info *delays = p->delays;
- u64 *total;
- u32 *count;
-
- if (p->delays->flags & DELAYACCT_PF_SWAPIN) {
- total = &delays->swapin_delay;
- count = &delays->swapin_count;
- } else {
- total = &delays->blkio_delay;
- count = &delays->blkio_count;
- }
-
- delayacct_end(&delays->lock, &delays->blkio_start, total, count);
+ delayacct_end(&p->delays->lock,
+ &p->delays->blkio_start,
+ &p->delays->blkio_delay,
+ &p->delays->blkio_count);
}
int delayacct_add_tsk(struct taskstats *d, struct task_struct *tsk)
d->freepages_delay_total = (tmp < d->freepages_delay_total) ? 0 : tmp;
tmp = d->thrashing_delay_total + tsk->delays->thrashing_delay;
d->thrashing_delay_total = (tmp < d->thrashing_delay_total) ? 0 : tmp;
+ tmp = d->compact_delay_total + tsk->delays->compact_delay;
+ d->compact_delay_total = (tmp < d->compact_delay_total) ? 0 : tmp;
d->blkio_count += tsk->delays->blkio_count;
d->swapin_count += tsk->delays->swapin_count;
d->freepages_count += tsk->delays->freepages_count;
d->thrashing_count += tsk->delays->thrashing_count;
+ d->compact_count += tsk->delays->compact_count;
raw_spin_unlock_irqrestore(&tsk->delays->lock, flags);
return 0;
unsigned long flags;
raw_spin_lock_irqsave(&tsk->delays->lock, flags);
- ret = nsec_to_clock_t(tsk->delays->blkio_delay +
- tsk->delays->swapin_delay);
+ ret = nsec_to_clock_t(tsk->delays->blkio_delay);
raw_spin_unlock_irqrestore(&tsk->delays->lock, flags);
return ret;
}
¤t->delays->thrashing_delay,
¤t->delays->thrashing_count);
}
+
+void __delayacct_swapin_start(void)
+{
+ current->delays->swapin_start = local_clock();
+}
+
+void __delayacct_swapin_end(void)
+{
+ delayacct_end(¤t->delays->lock,
+ ¤t->delays->swapin_start,
+ ¤t->delays->swapin_delay,
+ ¤t->delays->swapin_count);
+}
+
+void __delayacct_compact_start(void)
+{
+ current->delays->compact_start = local_clock();
+}
+
+void __delayacct_compact_end(void)
+{
+ delayacct_end(¤t->delays->lock,
+ ¤t->delays->compact_start,
+ ¤t->delays->compact_delay,
+ ¤t->delays->compact_count);
+}
config GCOV_KERNEL
bool "Enable gcov-based kernel profiling"
depends on DEBUG_FS
- depends on !CC_IS_CLANG || CLANG_VERSION >= 110000
depends on !ARCH_WANTS_NO_INSTR || CC_HAS_NO_PROFILE_FN_ATTR
select CONSTRUCTORS
default n
* Should we dump all CPUs backtraces in a hung task event?
* Defaults to 0, can be changed via sysctl.
*/
-unsigned int __read_mostly sysctl_hung_task_all_cpu_backtrace;
+static unsigned int __read_mostly sysctl_hung_task_all_cpu_backtrace;
+#else
+#define sysctl_hung_task_all_cpu_backtrace 0
#endif /* CONFIG_SMP */
/*
MAX_SCHEDULE_TIMEOUT;
}
+#ifdef CONFIG_SYSCTL
/*
* Process updating of timeout sysctl
*/
-int proc_dohung_task_timeout_secs(struct ctl_table *table, int write,
- void *buffer, size_t *lenp, loff_t *ppos)
+static int proc_dohung_task_timeout_secs(struct ctl_table *table, int write,
+ void __user *buffer,
+ size_t *lenp, loff_t *ppos)
{
int ret;
return ret;
}
+/*
+ * This is needed for proc_doulongvec_minmax of sysctl_hung_task_timeout_secs
+ * and hung_task_check_interval_secs
+ */
+static const unsigned long hung_task_timeout_max = (LONG_MAX / HZ);
+static struct ctl_table hung_task_sysctls[] = {
+#ifdef CONFIG_SMP
+ {
+ .procname = "hung_task_all_cpu_backtrace",
+ .data = &sysctl_hung_task_all_cpu_backtrace,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = SYSCTL_ZERO,
+ .extra2 = SYSCTL_ONE,
+ },
+#endif /* CONFIG_SMP */
+ {
+ .procname = "hung_task_panic",
+ .data = &sysctl_hung_task_panic,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = SYSCTL_ZERO,
+ .extra2 = SYSCTL_ONE,
+ },
+ {
+ .procname = "hung_task_check_count",
+ .data = &sysctl_hung_task_check_count,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = SYSCTL_ZERO,
+ },
+ {
+ .procname = "hung_task_timeout_secs",
+ .data = &sysctl_hung_task_timeout_secs,
+ .maxlen = sizeof(unsigned long),
+ .mode = 0644,
+ .proc_handler = proc_dohung_task_timeout_secs,
+ .extra2 = (void *)&hung_task_timeout_max,
+ },
+ {
+ .procname = "hung_task_check_interval_secs",
+ .data = &sysctl_hung_task_check_interval_secs,
+ .maxlen = sizeof(unsigned long),
+ .mode = 0644,
+ .proc_handler = proc_dohung_task_timeout_secs,
+ .extra2 = (void *)&hung_task_timeout_max,
+ },
+ {
+ .procname = "hung_task_warnings",
+ .data = &sysctl_hung_task_warnings,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = SYSCTL_NEG_ONE,
+ },
+ {}
+};
+
+static void __init hung_task_sysctl_init(void)
+{
+ register_sysctl_init("kernel", hung_task_sysctls);
+}
+#else
+#define hung_task_sysctl_init() do { } while (0)
+#endif /* CONFIG_SYSCTL */
+
+
static atomic_t reset_hung_task = ATOMIC_INIT(0);
void reset_hung_task_detector(void)
pm_notifier(hungtask_pm_notify, 0);
watchdog_task = kthread_run(watchdog, NULL, "khungtaskd");
+ hung_task_sysctl_init();
return 0;
}
static ssize_t write_irq_affinity(int type, struct file *file,
const char __user *buffer, size_t count, loff_t *pos)
{
- unsigned int irq = (int)(long)PDE_DATA(file_inode(file));
+ unsigned int irq = (int)(long)pde_data(file_inode(file));
cpumask_var_t new_value;
int err;
static int irq_affinity_proc_open(struct inode *inode, struct file *file)
{
- return single_open(file, irq_affinity_proc_show, PDE_DATA(inode));
+ return single_open(file, irq_affinity_proc_show, pde_data(inode));
}
static int irq_affinity_list_proc_open(struct inode *inode, struct file *file)
{
- return single_open(file, irq_affinity_list_proc_show, PDE_DATA(inode));
+ return single_open(file, irq_affinity_list_proc_show, pde_data(inode));
}
static const struct proc_ops irq_affinity_proc_ops = {
static int default_affinity_open(struct inode *inode, struct file *file)
{
- return single_open(file, default_affinity_show, PDE_DATA(inode));
+ return single_open(file, default_affinity_show, pde_data(inode));
}
static const struct proc_ops default_affinity_proc_ops = {
#define KPROBE_HASH_BITS 6
#define KPROBE_TABLE_SIZE (1 << KPROBE_HASH_BITS)
+#if !defined(CONFIG_OPTPROBES) || !defined(CONFIG_SYSCTL)
+#define kprobe_sysctls_init() do { } while (0)
+#endif
static int kprobes_initialized;
/* kprobe_table can be accessed by
}
static DEFINE_MUTEX(kprobe_sysctl_mutex);
-int sysctl_kprobes_optimization;
-int proc_kprobes_optimization_handler(struct ctl_table *table, int write,
- void *buffer, size_t *length,
- loff_t *ppos)
+static int sysctl_kprobes_optimization;
+static int proc_kprobes_optimization_handler(struct ctl_table *table,
+ int write, void *buffer,
+ size_t *length, loff_t *ppos)
{
int ret;
return ret;
}
+
+static struct ctl_table kprobe_sysctls[] = {
+ {
+ .procname = "kprobes-optimization",
+ .data = &sysctl_kprobes_optimization,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_kprobes_optimization_handler,
+ .extra1 = SYSCTL_ZERO,
+ .extra2 = SYSCTL_ONE,
+ },
+ {}
+};
+
+static void __init kprobe_sysctls_init(void)
+{
+ register_sysctl_init("debug", kprobe_sysctls);
+}
#endif /* CONFIG_SYSCTL */
/* Put a breakpoint for a probe. */
err = register_module_notifier(&kprobe_module_nb);
kprobes_initialized = (err == 0);
+ kprobe_sysctls_init();
return err;
}
early_initcall(init_kprobes);
#ifdef CONFIG_BLK_CGROUP
struct cgroup_subsys_state *blkcg_css;
#endif
+ /* To store the full name if task comm is truncated. */
+ char *full_name;
};
enum KTHREAD_BITS {
return kthread;
}
+void get_kthread_comm(char *buf, size_t buf_size, struct task_struct *tsk)
+{
+ struct kthread *kthread = to_kthread(tsk);
+
+ if (!kthread || !kthread->full_name) {
+ __get_task_comm(buf, buf_size, tsk);
+ return;
+ }
+
+ strscpy_pad(buf, kthread->full_name, buf_size);
+}
+
bool set_kthread_struct(struct task_struct *p)
{
struct kthread *kthread;
* Can be NULL if kmalloc() in set_kthread_struct() failed.
*/
kthread = to_kthread(k);
+ if (!kthread)
+ return;
+
#ifdef CONFIG_BLK_CGROUP
- WARN_ON_ONCE(kthread && kthread->blkcg_css);
+ WARN_ON_ONCE(kthread->blkcg_css);
#endif
k->worker_private = NULL;
+ kfree(kthread->full_name);
kfree(kthread);
}
task = create->result;
if (!IS_ERR(task)) {
char name[TASK_COMM_LEN];
+ va_list aq;
+ int len;
/*
* task is already visible to other tasks, so updating
* COMM must be protected.
*/
- vsnprintf(name, sizeof(name), namefmt, args);
+ va_copy(aq, args);
+ len = vsnprintf(name, sizeof(name), namefmt, aq);
+ va_end(aq);
+ if (len >= TASK_COMM_LEN) {
+ struct kthread *kthread = to_kthread(task);
+
+ /* leave it truncated when out of memory. */
+ kthread->full_name = kvasprintf(GFP_KERNEL, namefmt, args);
+ }
set_task_comm(task, name);
}
kfree(create);
__raw_write_lock(lock);
}
EXPORT_SYMBOL(_raw_write_lock);
+
+#ifndef CONFIG_DEBUG_LOCK_ALLOC
+#define __raw_write_lock_nested(lock, subclass) __raw_write_lock(((void)(subclass), (lock)))
+#endif
+
+void __lockfunc _raw_write_lock_nested(rwlock_t *lock, int subclass)
+{
+ __raw_write_lock_nested(lock, subclass);
+}
+EXPORT_SYMBOL(_raw_write_lock_nested);
#endif
#ifndef CONFIG_INLINE_WRITE_LOCK_IRQSAVE
}
EXPORT_SYMBOL(rt_write_lock);
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+void __sched rt_write_lock_nested(rwlock_t *rwlock, int subclass)
+{
+ rtlock_might_resched();
+ rwlock_acquire(&rwlock->dep_map, subclass, 0, _RET_IP_);
+ rwbase_write_lock(&rwlock->rwbase, TASK_RTLOCK_WAIT);
+ rcu_read_lock();
+ migrate_disable();
+}
+EXPORT_SYMBOL(rt_write_lock_nested);
+#endif
+
void __sched rt_read_unlock(rwlock_t *rwlock)
{
rwlock_release(&rwlock->dep_map, _RET_IP_);
#include <linux/bug.h>
#include <linux/ratelimit.h>
#include <linux/debugfs.h>
+#include <trace/events/error_report.h>
#include <asm/sections.h>
#define PANIC_TIMER_STEP 100
trigger_all_cpu_backtrace();
}
-/*
- * 64-bit random ID for oopses:
- */
-static u64 oops_id;
-
-static int init_oops_id(void)
-{
- if (!oops_id)
- get_random_bytes(&oops_id, sizeof(oops_id));
- else
- oops_id++;
-
- return 0;
-}
-late_initcall(init_oops_id);
-
static void print_oops_end_marker(void)
{
- init_oops_id();
- pr_warn("---[ end trace %016llx ]---\n", (unsigned long long)oops_id);
+ pr_warn("---[ end trace %016llx ]---\n", 0ULL);
}
/*
print_irqtrace_events(current);
print_oops_end_marker();
+ trace_error_report_end(ERROR_DETECTOR_WARN, (unsigned long)caller);
/* Just a warning, don't kill lockdep. */
add_taint(taint, LOCKDEP_STILL_OK);
obj-y = printk.o
obj-$(CONFIG_PRINTK) += printk_safe.o
obj-$(CONFIG_A11Y_BRAILLE_CONSOLE) += braille.o
-obj-$(CONFIG_PRINTK) += printk_ringbuffer.o
obj-$(CONFIG_PRINTK_INDEX) += index.o
+
+obj-$(CONFIG_PRINTK) += printk_support.o
+printk_support-y := printk_ringbuffer.o
+printk_support-$(CONFIG_SYSCTL) += sysctl.o
*/
#include <linux/percpu.h>
+#if defined(CONFIG_PRINTK) && defined(CONFIG_SYSCTL)
+void __init printk_sysctl_init(void);
+int devkmsg_sysctl_set_loglvl(struct ctl_table *table, int write,
+ void *buffer, size_t *lenp, loff_t *ppos);
+#else
+#define printk_sysctl_init() do { } while (0)
+#endif
+
#ifdef CONFIG_PRINTK
/* Flags for a single printk record. */
__setup("printk.devkmsg=", control_devkmsg);
char devkmsg_log_str[DEVKMSG_STR_MAX_SIZE] = "ratelimit";
-
+#if defined(CONFIG_PRINTK) && defined(CONFIG_SYSCTL)
int devkmsg_sysctl_set_loglvl(struct ctl_table *table, int write,
void *buffer, size_t *lenp, loff_t *ppos)
{
return 0;
}
+#endif /* CONFIG_PRINTK && CONFIG_SYSCTL */
/* Number of registered extended console drivers. */
static int nr_ext_console_drivers;
ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, "printk:online",
console_cpu_notify, NULL);
WARN_ON(ret < 0);
+ printk_sysctl_init();
return 0;
}
late_initcall(printk_late_init);
--- /dev/null
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * sysctl.c: General linux system control interface
+ */
+
+#include <linux/sysctl.h>
+#include <linux/printk.h>
+#include <linux/capability.h>
+#include <linux/ratelimit.h>
+#include "internal.h"
+
+static const int ten_thousand = 10000;
+
+static int proc_dointvec_minmax_sysadmin(struct ctl_table *table, int write,
+ void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+ if (write && !capable(CAP_SYS_ADMIN))
+ return -EPERM;
+
+ return proc_dointvec_minmax(table, write, buffer, lenp, ppos);
+}
+
+static struct ctl_table printk_sysctls[] = {
+ {
+ .procname = "printk",
+ .data = &console_loglevel,
+ .maxlen = 4*sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec,
+ },
+ {
+ .procname = "printk_ratelimit",
+ .data = &printk_ratelimit_state.interval,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_jiffies,
+ },
+ {
+ .procname = "printk_ratelimit_burst",
+ .data = &printk_ratelimit_state.burst,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec,
+ },
+ {
+ .procname = "printk_delay",
+ .data = &printk_delay_msec,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = SYSCTL_ZERO,
+ .extra2 = (void *)&ten_thousand,
+ },
+ {
+ .procname = "printk_devkmsg",
+ .data = devkmsg_log_str,
+ .maxlen = DEVKMSG_STR_MAX_SIZE,
+ .mode = 0644,
+ .proc_handler = devkmsg_sysctl_set_loglvl,
+ },
+ {
+ .procname = "dmesg_restrict",
+ .data = &dmesg_restrict,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax_sysadmin,
+ .extra1 = SYSCTL_ZERO,
+ .extra2 = SYSCTL_ONE,
+ },
+ {
+ .procname = "kptr_restrict",
+ .data = &kptr_restrict,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax_sysadmin,
+ .extra1 = SYSCTL_ZERO,
+ .extra2 = SYSCTL_TWO,
+ },
+ {}
+};
+
+void __init printk_sysctl_init(void)
+{
+ register_sysctl_init("kernel", printk_sysctls);
+}
static void *r_start(struct seq_file *m, loff_t *pos)
__acquires(resource_lock)
{
- struct resource *p = PDE_DATA(file_inode(m->file));
+ struct resource *p = pde_data(file_inode(m->file));
loff_t l = 0;
read_lock(&resource_lock);
for (p = p->child; p && l < *pos; p = r_next(m, p, &l))
static int r_show(struct seq_file *m, void *v)
{
- struct resource *root = PDE_DATA(file_inode(m->file));
+ struct resource *root = pde_data(file_inode(m->file));
struct resource *r = v, *p;
unsigned long long start, end;
int width = root->end < 0x10000 ? 4 : 8;
#ifdef CONFIG_STACKLEAK_RUNTIME_DISABLE
#include <linux/jump_label.h>
#include <linux/sysctl.h>
+#include <linux/init.h>
static DEFINE_STATIC_KEY_FALSE(stack_erasing_bypass);
-int stack_erasing_sysctl(struct ctl_table *table, int write,
- void *buffer, size_t *lenp, loff_t *ppos)
+#ifdef CONFIG_SYSCTL
+static int stack_erasing_sysctl(struct ctl_table *table, int write,
+ void __user *buffer, size_t *lenp, loff_t *ppos)
{
int ret = 0;
int state = !static_branch_unlikely(&stack_erasing_bypass);
state ? "enabled" : "disabled");
return ret;
}
+static struct ctl_table stackleak_sysctls[] = {
+ {
+ .procname = "stack_erasing",
+ .data = NULL,
+ .maxlen = sizeof(int),
+ .mode = 0600,
+ .proc_handler = stack_erasing_sysctl,
+ .extra1 = SYSCTL_ZERO,
+ .extra2 = SYSCTL_ONE,
+ },
+ {}
+};
+
+static int __init stackleak_sysctls_init(void)
+{
+ register_sysctl_init("kernel", stackleak_sysctls);
+ return 0;
+}
+late_initcall(stackleak_sysctls_init);
+#endif /* CONFIG_SYSCTL */
#define skip_erasing() static_branch_unlikely(&stack_erasing_bypass)
#else
niceval = MAX_NICE;
rcu_read_lock();
- read_lock(&tasklist_lock);
switch (which) {
case PRIO_PROCESS:
if (who)
pgrp = find_vpid(who);
else
pgrp = task_pgrp(current);
+ read_lock(&tasklist_lock);
do_each_pid_thread(pgrp, PIDTYPE_PGID, p) {
error = set_one_prio(p, niceval, error);
} while_each_pid_thread(pgrp, PIDTYPE_PGID, p);
+ read_unlock(&tasklist_lock);
break;
case PRIO_USER:
uid = make_kuid(cred->user_ns, who);
if (!user)
goto out_unlock; /* No processes for this user */
}
- do_each_thread(g, p) {
+ for_each_process_thread(g, p) {
if (uid_eq(task_uid(p), uid) && task_pid_vnr(p))
error = set_one_prio(p, niceval, error);
- } while_each_thread(g, p);
+ }
if (!uid_eq(uid, cred->uid))
free_uid(user); /* For find_user() */
break;
}
out_unlock:
- read_unlock(&tasklist_lock);
rcu_read_unlock();
out:
return error;
return -EINVAL;
rcu_read_lock();
- read_lock(&tasklist_lock);
switch (which) {
case PRIO_PROCESS:
if (who)
pgrp = find_vpid(who);
else
pgrp = task_pgrp(current);
+ read_lock(&tasklist_lock);
do_each_pid_thread(pgrp, PIDTYPE_PGID, p) {
niceval = nice_to_rlimit(task_nice(p));
if (niceval > retval)
retval = niceval;
} while_each_pid_thread(pgrp, PIDTYPE_PGID, p);
+ read_unlock(&tasklist_lock);
break;
case PRIO_USER:
uid = make_kuid(cred->user_ns, who);
if (!user)
goto out_unlock; /* No processes for this user */
}
- do_each_thread(g, p) {
+ for_each_process_thread(g, p) {
if (uid_eq(task_uid(p), uid) && task_pid_vnr(p)) {
niceval = nice_to_rlimit(task_nice(p));
if (niceval > retval)
retval = niceval;
}
- } while_each_thread(g, p);
+ }
if (!uid_eq(uid, cred->uid))
free_uid(user); /* for find_user() */
break;
}
out_unlock:
- read_unlock(&tasklist_lock);
rcu_read_unlock();
return retval;
*/
#include <linux/module.h>
-#include <linux/aio.h>
#include <linux/mm.h>
#include <linux/swap.h>
#include <linux/slab.h>
#include <linux/times.h>
#include <linux/limits.h>
#include <linux/dcache.h>
-#include <linux/dnotify.h>
#include <linux/syscalls.h>
#include <linux/vmstat.h>
#include <linux/nfs_fs.h>
#include <linux/reboot.h>
#include <linux/ftrace.h>
#include <linux/perf_event.h>
-#include <linux/kprobes.h>
-#include <linux/pipe_fs_i.h>
#include <linux/oom.h>
#include <linux/kmod.h>
#include <linux/capability.h>
#include <linux/binfmts.h>
#include <linux/sched/sysctl.h>
-#include <linux/sched/coredump.h>
#include <linux/kexec.h>
#include <linux/bpf.h>
#include <linux/mount.h>
#include <linux/userfaultfd_k.h>
-#include <linux/coredump.h>
#include <linux/latencytop.h>
#include <linux/pid.h>
#include <linux/delayacct.h>
#if defined(CONFIG_PROVE_LOCKING) || defined(CONFIG_LOCK_STAT)
#include <linux/lockdep.h>
#endif
-#ifdef CONFIG_CHR_DEV_SG
-#include <scsi/sg.h>
-#endif
-#ifdef CONFIG_STACKLEAK_RUNTIME_DISABLE
-#include <linux/stackleak.h>
-#endif
-#ifdef CONFIG_LOCKUP_DETECTOR
-#include <linux/nmi.h>
-#endif
#if defined(CONFIG_SYSCTL)
/* Constants used for minimum and maximum */
-#ifdef CONFIG_LOCKUP_DETECTOR
-static int sixty = 60;
-#endif
-
-static int __maybe_unused neg_one = -1;
-static int __maybe_unused two = 2;
-static int __maybe_unused four = 4;
-static unsigned long zero_ul;
-static unsigned long one_ul = 1;
-static unsigned long long_max = LONG_MAX;
-static int one_hundred = 100;
-static int two_hundred = 200;
-static int one_thousand = 1000;
-static int three_thousand = 3000;
-#ifdef CONFIG_PRINTK
-static int ten_thousand = 10000;
-#endif
+
#ifdef CONFIG_PERF_EVENTS
-static int six_hundred_forty_kb = 640 * 1024;
+static const int six_hundred_forty_kb = 640 * 1024;
#endif
/* this is needed for the proc_doulongvec_minmax of vm_dirty_bytes */
-static unsigned long dirty_bytes_min = 2 * PAGE_SIZE;
-
-/* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */
-static int maxolduid = 65535;
-static int minolduid;
+static const unsigned long dirty_bytes_min = 2 * PAGE_SIZE;
-static int ngroups_max = NGROUPS_MAX;
+static const int ngroups_max = NGROUPS_MAX;
static const int cap_last_cap = CAP_LAST_CAP;
-/*
- * This is needed for proc_doulongvec_minmax of sysctl_hung_task_timeout_secs
- * and hung_task_check_interval_secs
- */
-#ifdef CONFIG_DETECT_HUNG_TASK
-static unsigned long hung_task_timeout_max = (LONG_MAX/HZ);
-#endif
-
-#ifdef CONFIG_INOTIFY_USER
-#include <linux/inotify.h>
-#endif
-#ifdef CONFIG_FANOTIFY
-#include <linux/fanotify.h>
-#endif
-
#ifdef CONFIG_PROC_SYSCTL
/**
#endif
#ifdef CONFIG_COMPACTION
-static int min_extfrag_threshold;
-static int max_extfrag_threshold = 1000;
+/* min_extfrag_threshold is SYSCTL_ZERO */;
+static const int max_extfrag_threshold = 1000;
#endif
#endif /* CONFIG_SYSCTL */
return do_proc_douintvec_r(i, buffer, lenp, ppos, conv, data);
}
-static int do_proc_douintvec(struct ctl_table *table, int write,
- void *buffer, size_t *lenp, loff_t *ppos,
- int (*conv)(unsigned long *lvalp,
- unsigned int *valp,
- int write, void *data),
- void *data)
+int do_proc_douintvec(struct ctl_table *table, int write,
+ void *buffer, size_t *lenp, loff_t *ppos,
+ int (*conv)(unsigned long *lvalp,
+ unsigned int *valp,
+ int write, void *data),
+ void *data)
{
return __do_proc_douintvec(table->data, table, write,
buffer, lenp, ppos, conv, data);
return err;
}
-#ifdef CONFIG_PRINTK
-static int proc_dointvec_minmax_sysadmin(struct ctl_table *table, int write,
- void *buffer, size_t *lenp, loff_t *ppos)
-{
- if (write && !capable(CAP_SYS_ADMIN))
- return -EPERM;
-
- return proc_dointvec_minmax(table, write, buffer, lenp, ppos);
-}
-#endif
-
/**
* struct do_proc_dointvec_minmax_conv_param - proc_dointvec_minmax() range checking structure
* @min: pointer to minimum allowable value
}
EXPORT_SYMBOL_GPL(proc_dou8vec_minmax);
-static int do_proc_dopipe_max_size_conv(unsigned long *lvalp,
- unsigned int *valp,
- int write, void *data)
-{
- if (write) {
- unsigned int val;
-
- val = round_pipe_size(*lvalp);
- if (val == 0)
- return -EINVAL;
-
- *valp = val;
- } else {
- unsigned int val = *valp;
- *lvalp = (unsigned long) val;
- }
-
- return 0;
-}
-
-static int proc_dopipe_max_size(struct ctl_table *table, int write,
- void *buffer, size_t *lenp, loff_t *ppos)
-{
- return do_proc_douintvec(table, write, buffer, lenp, ppos,
- do_proc_dopipe_max_size_conv, NULL);
-}
-
-static void validate_coredump_safety(void)
-{
-#ifdef CONFIG_COREDUMP
- if (suid_dumpable == SUID_DUMP_ROOT &&
- core_pattern[0] != '/' && core_pattern[0] != '|') {
- printk(KERN_WARNING
-"Unsafe core_pattern used with fs.suid_dumpable=2.\n"
-"Pipe handler or fully qualified core dump path required.\n"
-"Set kernel.core_pattern before fs.suid_dumpable.\n"
- );
- }
-#endif
-}
-
-static int proc_dointvec_minmax_coredump(struct ctl_table *table, int write,
- void *buffer, size_t *lenp, loff_t *ppos)
-{
- int error = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
- if (!error)
- validate_coredump_safety();
- return error;
-}
-
-#ifdef CONFIG_COREDUMP
-static int proc_dostring_coredump(struct ctl_table *table, int write,
- void *buffer, size_t *lenp, loff_t *ppos)
-{
- int error = proc_dostring(table, write, buffer, lenp, ppos);
- if (!error)
- validate_coredump_safety();
- return error;
-}
-#endif
-
#ifdef CONFIG_MAGIC_SYSRQ
static int sysrq_sysctl_handler(struct ctl_table *table, int write,
void *buffer, size_t *lenp, loff_t *ppos)
err = proc_get_long(&p, &left, &val, &neg,
proc_wspace_sep,
sizeof(proc_wspace_sep), NULL);
- if (err)
+ if (err || neg) {
+ err = -EINVAL;
break;
- if (neg)
- continue;
+ }
+
val = convmul * val / convdiv;
if ((min && val < *min) || (max && val > *max)) {
err = -EINVAL;
.mode = 0644,
.proc_handler = proc_dointvec,
},
-#ifdef CONFIG_COREDUMP
- {
- .procname = "core_uses_pid",
- .data = &core_uses_pid,
- .maxlen = sizeof(int),
- .mode = 0644,
- .proc_handler = proc_dointvec,
- },
- {
- .procname = "core_pattern",
- .data = core_pattern,
- .maxlen = CORENAME_MAX_SIZE,
- .mode = 0644,
- .proc_handler = proc_dostring_coredump,
- },
- {
- .procname = "core_pipe_limit",
- .data = &core_pipe_limit,
- .maxlen = sizeof(unsigned int),
- .mode = 0644,
- .proc_handler = proc_dointvec,
- },
-#endif
#ifdef CONFIG_PROC_SYSCTL
{
.procname = "tainted",
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec_minmax,
- .extra1 = &neg_one,
+ .extra1 = SYSCTL_NEG_ONE,
.extra2 = SYSCTL_ONE,
},
#endif
.proc_handler = proc_dostring,
},
#endif
-#ifdef CONFIG_CHR_DEV_SG
- {
- .procname = "sg-big-buff",
- .data = &sg_big_buff,
- .maxlen = sizeof (int),
- .mode = 0444,
- .proc_handler = proc_dointvec,
- },
-#endif
#ifdef CONFIG_BSD_PROCESS_ACCT
{
.procname = "acct",
.mode = 0644,
.proc_handler = sysctl_max_threads,
},
- {
- .procname = "random",
- .mode = 0555,
- .child = random_table,
- },
{
.procname = "usermodehelper",
.mode = 0555,
.child = usermodehelper_table,
},
-#ifdef CONFIG_FW_LOADER_USER_HELPER
- {
- .procname = "firmware_config",
- .mode = 0555,
- .child = firmware_config_table,
- },
-#endif
{
.procname = "overflowuid",
.data = &overflowuid,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec_minmax,
- .extra1 = &minolduid,
- .extra2 = &maxolduid,
+ .extra1 = SYSCTL_ZERO,
+ .extra2 = SYSCTL_MAXOLDUID,
},
{
.procname = "overflowgid",
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec_minmax,
- .extra1 = &minolduid,
- .extra2 = &maxolduid,
+ .extra1 = SYSCTL_ZERO,
+ .extra2 = SYSCTL_MAXOLDUID,
},
#ifdef CONFIG_S390
{
.mode = 0644,
.proc_handler = proc_doulongvec_minmax,
},
-#if defined CONFIG_PRINTK
- {
- .procname = "printk",
- .data = &console_loglevel,
- .maxlen = 4*sizeof(int),
- .mode = 0644,
- .proc_handler = proc_dointvec,
- },
- {
- .procname = "printk_ratelimit",
- .data = &printk_ratelimit_state.interval,
- .maxlen = sizeof(int),
- .mode = 0644,
- .proc_handler = proc_dointvec_jiffies,
- },
- {
- .procname = "printk_ratelimit_burst",
- .data = &printk_ratelimit_state.burst,
- .maxlen = sizeof(int),
- .mode = 0644,
- .proc_handler = proc_dointvec,
- },
- {
- .procname = "printk_delay",
- .data = &printk_delay_msec,
- .maxlen = sizeof(int),
- .mode = 0644,
- .proc_handler = proc_dointvec_minmax,
- .extra1 = SYSCTL_ZERO,
- .extra2 = &ten_thousand,
- },
- {
- .procname = "printk_devkmsg",
- .data = devkmsg_log_str,
- .maxlen = DEVKMSG_STR_MAX_SIZE,
- .mode = 0644,
- .proc_handler = devkmsg_sysctl_set_loglvl,
- },
- {
- .procname = "dmesg_restrict",
- .data = &dmesg_restrict,
- .maxlen = sizeof(int),
- .mode = 0644,
- .proc_handler = proc_dointvec_minmax_sysadmin,
- .extra1 = SYSCTL_ZERO,
- .extra2 = SYSCTL_ONE,
- },
- {
- .procname = "kptr_restrict",
- .data = &kptr_restrict,
- .maxlen = sizeof(int),
- .mode = 0644,
- .proc_handler = proc_dointvec_minmax_sysadmin,
- .extra1 = SYSCTL_ZERO,
- .extra2 = &two,
- },
-#endif
{
.procname = "ngroups_max",
- .data = &ngroups_max,
+ .data = (void *)&ngroups_max,
.maxlen = sizeof (int),
.mode = 0444,
.proc_handler = proc_dointvec,
.mode = 0444,
.proc_handler = proc_dointvec,
},
-#if defined(CONFIG_LOCKUP_DETECTOR)
- {
- .procname = "watchdog",
- .data = &watchdog_user_enabled,
- .maxlen = sizeof(int),
- .mode = 0644,
- .proc_handler = proc_watchdog,
- .extra1 = SYSCTL_ZERO,
- .extra2 = SYSCTL_ONE,
- },
- {
- .procname = "watchdog_thresh",
- .data = &watchdog_thresh,
- .maxlen = sizeof(int),
- .mode = 0644,
- .proc_handler = proc_watchdog_thresh,
- .extra1 = SYSCTL_ZERO,
- .extra2 = &sixty,
- },
- {
- .procname = "nmi_watchdog",
- .data = &nmi_watchdog_user_enabled,
- .maxlen = sizeof(int),
- .mode = NMI_WATCHDOG_SYSCTL_PERM,
- .proc_handler = proc_nmi_watchdog,
- .extra1 = SYSCTL_ZERO,
- .extra2 = SYSCTL_ONE,
- },
- {
- .procname = "watchdog_cpumask",
- .data = &watchdog_cpumask_bits,
- .maxlen = NR_CPUS,
- .mode = 0644,
- .proc_handler = proc_watchdog_cpumask,
- },
-#ifdef CONFIG_SOFTLOCKUP_DETECTOR
- {
- .procname = "soft_watchdog",
- .data = &soft_watchdog_user_enabled,
- .maxlen = sizeof(int),
- .mode = 0644,
- .proc_handler = proc_soft_watchdog,
- .extra1 = SYSCTL_ZERO,
- .extra2 = SYSCTL_ONE,
- },
- {
- .procname = "softlockup_panic",
- .data = &softlockup_panic,
- .maxlen = sizeof(int),
- .mode = 0644,
- .proc_handler = proc_dointvec_minmax,
- .extra1 = SYSCTL_ZERO,
- .extra2 = SYSCTL_ONE,
- },
-#ifdef CONFIG_SMP
- {
- .procname = "softlockup_all_cpu_backtrace",
- .data = &sysctl_softlockup_all_cpu_backtrace,
- .maxlen = sizeof(int),
- .mode = 0644,
- .proc_handler = proc_dointvec_minmax,
- .extra1 = SYSCTL_ZERO,
- .extra2 = SYSCTL_ONE,
- },
-#endif /* CONFIG_SMP */
-#endif
-#ifdef CONFIG_HARDLOCKUP_DETECTOR
- {
- .procname = "hardlockup_panic",
- .data = &hardlockup_panic,
- .maxlen = sizeof(int),
- .mode = 0644,
- .proc_handler = proc_dointvec_minmax,
- .extra1 = SYSCTL_ZERO,
- .extra2 = SYSCTL_ONE,
- },
-#ifdef CONFIG_SMP
- {
- .procname = "hardlockup_all_cpu_backtrace",
- .data = &sysctl_hardlockup_all_cpu_backtrace,
- .maxlen = sizeof(int),
- .mode = 0644,
- .proc_handler = proc_dointvec_minmax,
- .extra1 = SYSCTL_ZERO,
- .extra2 = SYSCTL_ONE,
- },
-#endif /* CONFIG_SMP */
-#endif
-#endif
-
#if defined(CONFIG_X86_LOCAL_APIC) && defined(CONFIG_X86)
{
.procname = "unknown_nmi_panic",
.proc_handler = proc_dointvec,
},
#endif
-#ifdef CONFIG_DETECT_HUNG_TASK
-#ifdef CONFIG_SMP
- {
- .procname = "hung_task_all_cpu_backtrace",
- .data = &sysctl_hung_task_all_cpu_backtrace,
- .maxlen = sizeof(int),
- .mode = 0644,
- .proc_handler = proc_dointvec_minmax,
- .extra1 = SYSCTL_ZERO,
- .extra2 = SYSCTL_ONE,
- },
-#endif /* CONFIG_SMP */
- {
- .procname = "hung_task_panic",
- .data = &sysctl_hung_task_panic,
- .maxlen = sizeof(int),
- .mode = 0644,
- .proc_handler = proc_dointvec_minmax,
- .extra1 = SYSCTL_ZERO,
- .extra2 = SYSCTL_ONE,
- },
- {
- .procname = "hung_task_check_count",
- .data = &sysctl_hung_task_check_count,
- .maxlen = sizeof(int),
- .mode = 0644,
- .proc_handler = proc_dointvec_minmax,
- .extra1 = SYSCTL_ZERO,
- },
- {
- .procname = "hung_task_timeout_secs",
- .data = &sysctl_hung_task_timeout_secs,
- .maxlen = sizeof(unsigned long),
- .mode = 0644,
- .proc_handler = proc_dohung_task_timeout_secs,
- .extra2 = &hung_task_timeout_max,
- },
- {
- .procname = "hung_task_check_interval_secs",
- .data = &sysctl_hung_task_check_interval_secs,
- .maxlen = sizeof(unsigned long),
- .mode = 0644,
- .proc_handler = proc_dohung_task_timeout_secs,
- .extra2 = &hung_task_timeout_max,
- },
- {
- .procname = "hung_task_warnings",
- .data = &sysctl_hung_task_warnings,
- .maxlen = sizeof(int),
- .mode = 0644,
- .proc_handler = proc_dointvec_minmax,
- .extra1 = &neg_one,
- },
-#endif
#ifdef CONFIG_RT_MUTEXES
{
.procname = "max_lock_depth",
.mode = 0644,
.proc_handler = perf_cpu_time_max_percent_handler,
.extra1 = SYSCTL_ZERO,
- .extra2 = &one_hundred,
+ .extra2 = SYSCTL_ONE_HUNDRED,
},
{
.procname = "perf_event_max_stack",
.mode = 0644,
.proc_handler = perf_event_max_stack_handler,
.extra1 = SYSCTL_ZERO,
- .extra2 = &six_hundred_forty_kb,
+ .extra2 = (void *)&six_hundred_forty_kb,
},
{
.procname = "perf_event_max_contexts_per_stack",
.mode = 0644,
.proc_handler = perf_event_max_stack_handler,
.extra1 = SYSCTL_ZERO,
- .extra2 = &one_thousand,
+ .extra2 = SYSCTL_ONE_THOUSAND,
},
#endif
{
.mode = 0644,
.proc_handler = bpf_unpriv_handler,
.extra1 = SYSCTL_ZERO,
- .extra2 = &two,
+ .extra2 = SYSCTL_TWO,
},
{
.procname = "bpf_stats_enabled",
.extra1 = SYSCTL_ONE,
.extra2 = SYSCTL_INT_MAX,
},
-#endif
-#ifdef CONFIG_STACKLEAK_RUNTIME_DISABLE
- {
- .procname = "stack_erasing",
- .data = NULL,
- .maxlen = sizeof(int),
- .mode = 0600,
- .proc_handler = stack_erasing_sysctl,
- .extra1 = SYSCTL_ZERO,
- .extra2 = SYSCTL_ONE,
- },
#endif
{ }
};
.mode = 0644,
.proc_handler = overcommit_policy_handler,
.extra1 = SYSCTL_ZERO,
- .extra2 = &two,
+ .extra2 = SYSCTL_TWO,
},
{
.procname = "panic_on_oom",
.mode = 0644,
.proc_handler = proc_dointvec_minmax,
.extra1 = SYSCTL_ZERO,
- .extra2 = &two,
+ .extra2 = SYSCTL_TWO,
},
{
.procname = "oom_kill_allocating_task",
.mode = 0644,
.proc_handler = dirty_background_ratio_handler,
.extra1 = SYSCTL_ZERO,
- .extra2 = &one_hundred,
+ .extra2 = SYSCTL_ONE_HUNDRED,
},
{
.procname = "dirty_background_bytes",
.maxlen = sizeof(dirty_background_bytes),
.mode = 0644,
.proc_handler = dirty_background_bytes_handler,
- .extra1 = &one_ul,
+ .extra1 = SYSCTL_LONG_ONE,
},
{
.procname = "dirty_ratio",
.mode = 0644,
.proc_handler = dirty_ratio_handler,
.extra1 = SYSCTL_ZERO,
- .extra2 = &one_hundred,
+ .extra2 = SYSCTL_ONE_HUNDRED,
},
{
.procname = "dirty_bytes",
.maxlen = sizeof(vm_dirty_bytes),
.mode = 0644,
.proc_handler = dirty_bytes_handler,
- .extra1 = &dirty_bytes_min,
+ .extra1 = (void *)&dirty_bytes_min,
},
{
.procname = "dirty_writeback_centisecs",
.mode = 0644,
.proc_handler = proc_dointvec_minmax,
.extra1 = SYSCTL_ZERO,
- .extra2 = &two_hundred,
+ .extra2 = SYSCTL_TWO_HUNDRED,
},
#ifdef CONFIG_HUGETLB_PAGE
{
.mode = 0200,
.proc_handler = drop_caches_sysctl_handler,
.extra1 = SYSCTL_ONE,
- .extra2 = &four,
+ .extra2 = SYSCTL_FOUR,
},
#ifdef CONFIG_COMPACTION
{
.mode = 0644,
.proc_handler = compaction_proactiveness_sysctl_handler,
.extra1 = SYSCTL_ZERO,
- .extra2 = &one_hundred,
+ .extra2 = SYSCTL_ONE_HUNDRED,
},
{
.procname = "extfrag_threshold",
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec_minmax,
- .extra1 = &min_extfrag_threshold,
- .extra2 = &max_extfrag_threshold,
+ .extra1 = SYSCTL_ZERO,
+ .extra2 = (void *)&max_extfrag_threshold,
},
{
.procname = "compact_unevictable_allowed",
.mode = 0644,
.proc_handler = watermark_scale_factor_sysctl_handler,
.extra1 = SYSCTL_ONE,
- .extra2 = &three_thousand,
+ .extra2 = SYSCTL_THREE_THOUSAND,
},
{
.procname = "percpu_pagelist_high_fraction",
.mode = 0644,
.proc_handler = sysctl_min_unmapped_ratio_sysctl_handler,
.extra1 = SYSCTL_ZERO,
- .extra2 = &one_hundred,
+ .extra2 = SYSCTL_ONE_HUNDRED,
},
{
.procname = "min_slab_ratio",
.mode = 0644,
.proc_handler = sysctl_min_slab_ratio_sysctl_handler,
.extra1 = SYSCTL_ZERO,
- .extra2 = &one_hundred,
+ .extra2 = SYSCTL_ONE_HUNDRED,
},
#endif
#ifdef CONFIG_SMP
{ }
};
-static struct ctl_table fs_table[] = {
- {
- .procname = "inode-nr",
- .data = &inodes_stat,
- .maxlen = 2*sizeof(long),
- .mode = 0444,
- .proc_handler = proc_nr_inodes,
- },
- {
- .procname = "inode-state",
- .data = &inodes_stat,
- .maxlen = 7*sizeof(long),
- .mode = 0444,
- .proc_handler = proc_nr_inodes,
- },
- {
- .procname = "file-nr",
- .data = &files_stat,
- .maxlen = sizeof(files_stat),
- .mode = 0444,
- .proc_handler = proc_nr_files,
- },
- {
- .procname = "file-max",
- .data = &files_stat.max_files,
- .maxlen = sizeof(files_stat.max_files),
- .mode = 0644,
- .proc_handler = proc_doulongvec_minmax,
- .extra1 = &zero_ul,
- .extra2 = &long_max,
- },
- {
- .procname = "nr_open",
- .data = &sysctl_nr_open,
- .maxlen = sizeof(unsigned int),
- .mode = 0644,
- .proc_handler = proc_dointvec_minmax,
- .extra1 = &sysctl_nr_open_min,
- .extra2 = &sysctl_nr_open_max,
- },
- {
- .procname = "dentry-state",
- .data = &dentry_stat,
- .maxlen = 6*sizeof(long),
- .mode = 0444,
- .proc_handler = proc_nr_dentry,
- },
- {
- .procname = "overflowuid",
- .data = &fs_overflowuid,
- .maxlen = sizeof(int),
- .mode = 0644,
- .proc_handler = proc_dointvec_minmax,
- .extra1 = &minolduid,
- .extra2 = &maxolduid,
- },
- {
- .procname = "overflowgid",
- .data = &fs_overflowgid,
- .maxlen = sizeof(int),
- .mode = 0644,
- .proc_handler = proc_dointvec_minmax,
- .extra1 = &minolduid,
- .extra2 = &maxolduid,
- },
-#ifdef CONFIG_FILE_LOCKING
- {
- .procname = "leases-enable",
- .data = &leases_enable,
- .maxlen = sizeof(int),
- .mode = 0644,
- .proc_handler = proc_dointvec,
- },
-#endif
-#ifdef CONFIG_DNOTIFY
- {
- .procname = "dir-notify-enable",
- .data = &dir_notify_enable,
- .maxlen = sizeof(int),
- .mode = 0644,
- .proc_handler = proc_dointvec,
- },
-#endif
-#ifdef CONFIG_MMU
-#ifdef CONFIG_FILE_LOCKING
- {
- .procname = "lease-break-time",
- .data = &lease_break_time,
- .maxlen = sizeof(int),
- .mode = 0644,
- .proc_handler = proc_dointvec,
- },
-#endif
-#ifdef CONFIG_AIO
- {
- .procname = "aio-nr",
- .data = &aio_nr,
- .maxlen = sizeof(aio_nr),
- .mode = 0444,
- .proc_handler = proc_doulongvec_minmax,
- },
- {
- .procname = "aio-max-nr",
- .data = &aio_max_nr,
- .maxlen = sizeof(aio_max_nr),
- .mode = 0644,
- .proc_handler = proc_doulongvec_minmax,
- },
-#endif /* CONFIG_AIO */
-#ifdef CONFIG_INOTIFY_USER
- {
- .procname = "inotify",
- .mode = 0555,
- .child = inotify_table,
- },
-#endif
-#ifdef CONFIG_FANOTIFY
- {
- .procname = "fanotify",
- .mode = 0555,
- .child = fanotify_table,
- },
-#endif
-#ifdef CONFIG_EPOLL
- {
- .procname = "epoll",
- .mode = 0555,
- .child = epoll_table,
- },
-#endif
-#endif
- {
- .procname = "protected_symlinks",
- .data = &sysctl_protected_symlinks,
- .maxlen = sizeof(int),
- .mode = 0600,
- .proc_handler = proc_dointvec_minmax,
- .extra1 = SYSCTL_ZERO,
- .extra2 = SYSCTL_ONE,
- },
- {
- .procname = "protected_hardlinks",
- .data = &sysctl_protected_hardlinks,
- .maxlen = sizeof(int),
- .mode = 0600,
- .proc_handler = proc_dointvec_minmax,
- .extra1 = SYSCTL_ZERO,
- .extra2 = SYSCTL_ONE,
- },
- {
- .procname = "protected_fifos",
- .data = &sysctl_protected_fifos,
- .maxlen = sizeof(int),
- .mode = 0600,
- .proc_handler = proc_dointvec_minmax,
- .extra1 = SYSCTL_ZERO,
- .extra2 = &two,
- },
- {
- .procname = "protected_regular",
- .data = &sysctl_protected_regular,
- .maxlen = sizeof(int),
- .mode = 0600,
- .proc_handler = proc_dointvec_minmax,
- .extra1 = SYSCTL_ZERO,
- .extra2 = &two,
- },
- {
- .procname = "suid_dumpable",
- .data = &suid_dumpable,
- .maxlen = sizeof(int),
- .mode = 0644,
- .proc_handler = proc_dointvec_minmax_coredump,
- .extra1 = SYSCTL_ZERO,
- .extra2 = &two,
- },
-#if defined(CONFIG_BINFMT_MISC) || defined(CONFIG_BINFMT_MISC_MODULE)
- {
- .procname = "binfmt_misc",
- .mode = 0555,
- .child = sysctl_mount_point,
- },
-#endif
- {
- .procname = "pipe-max-size",
- .data = &pipe_max_size,
- .maxlen = sizeof(pipe_max_size),
- .mode = 0644,
- .proc_handler = proc_dopipe_max_size,
- },
- {
- .procname = "pipe-user-pages-hard",
- .data = &pipe_user_pages_hard,
- .maxlen = sizeof(pipe_user_pages_hard),
- .mode = 0644,
- .proc_handler = proc_doulongvec_minmax,
- },
- {
- .procname = "pipe-user-pages-soft",
- .data = &pipe_user_pages_soft,
- .maxlen = sizeof(pipe_user_pages_soft),
- .mode = 0644,
- .proc_handler = proc_doulongvec_minmax,
- },
- {
- .procname = "mount-max",
- .data = &sysctl_mount_max,
- .maxlen = sizeof(unsigned int),
- .mode = 0644,
- .proc_handler = proc_dointvec_minmax,
- .extra1 = SYSCTL_ONE,
- },
- { }
-};
-
static struct ctl_table debug_table[] = {
#ifdef CONFIG_SYSCTL_EXCEPTION_TRACE
{
.mode = 0644,
.proc_handler = proc_dointvec
},
-#endif
-#if defined(CONFIG_OPTPROBES)
- {
- .procname = "kprobes-optimization",
- .data = &sysctl_kprobes_optimization,
- .maxlen = sizeof(int),
- .mode = 0644,
- .proc_handler = proc_kprobes_optimization_handler,
- .extra1 = SYSCTL_ZERO,
- .extra2 = SYSCTL_ONE,
- },
#endif
{ }
};
{ }
};
-static struct ctl_table sysctl_base_table[] = {
- {
- .procname = "kernel",
- .mode = 0555,
- .child = kern_table,
- },
- {
- .procname = "vm",
- .mode = 0555,
- .child = vm_table,
- },
- {
- .procname = "fs",
- .mode = 0555,
- .child = fs_table,
- },
- {
- .procname = "debug",
- .mode = 0555,
- .child = debug_table,
- },
- {
- .procname = "dev",
- .mode = 0555,
- .child = dev_table,
- },
- { }
-};
+DECLARE_SYSCTL_BASE(kernel, kern_table);
+DECLARE_SYSCTL_BASE(vm, vm_table);
+DECLARE_SYSCTL_BASE(debug, debug_table);
+DECLARE_SYSCTL_BASE(dev, dev_table);
-int __init sysctl_init(void)
+int __init sysctl_init_bases(void)
{
- struct ctl_table_header *hdr;
+ register_sysctl_base(kernel);
+ register_sysctl_base(vm);
+ register_sysctl_base(debug);
+ register_sysctl_base(dev);
- hdr = register_sysctl_table(sysctl_base_table);
- kmemleak_not_leak(hdr);
return 0;
}
#endif /* CONFIG_SYSCTL */
return;
/* Make sure to select at least one CPU other than the current CPU. */
- cpu = cpumask_next(-1, cpu_online_mask);
+ cpu = cpumask_first(cpu_online_mask);
if (cpu == smp_processor_id())
cpu = cpumask_next(cpu, cpu_online_mask);
if (WARN_ON_ONCE(cpu >= nr_cpu_ids))
cpu = prandom_u32() % nr_cpu_ids;
cpu = cpumask_next(cpu - 1, cpu_online_mask);
if (cpu >= nr_cpu_ids)
- cpu = cpumask_next(-1, cpu_online_mask);
+ cpu = cpumask_first(cpu_online_mask);
if (!WARN_ON_ONCE(cpu >= nr_cpu_ids))
cpumask_set_cpu(cpu, &cpus_chosen);
}
help
C version of recordmcount available?
+config BUILDTIME_MCOUNT_SORT
+ bool
+ default y
+ depends on BUILDTIME_TABLE_SORT && !S390
+ help
+ Sort the mcount_loc section at build time.
+
config TRACER_MAX_TRACE
bool
config FTRACE_SORT_STARTUP_TEST
bool "Verify compile time sorting of ftrace functions"
depends on DYNAMIC_FTRACE
- depends on BUILDTIME_TABLE_SORT
+ depends on BUILDTIME_MCOUNT_SORT
help
Sorting of the mcount_loc sections that is used to find the
where the ftrace knows where to patch functions for tracing
/*
* Sorting mcount in vmlinux at build time depend on
- * CONFIG_BUILDTIME_TABLE_SORT, while mcount loc in
+ * CONFIG_BUILDTIME_MCOUNT_SORT, while mcount loc in
* modules can not be sorted at build time.
*/
- if (!IS_ENABLED(CONFIG_BUILDTIME_TABLE_SORT) || mod) {
+ if (!IS_ENABLED(CONFIG_BUILDTIME_MCOUNT_SORT) || mod) {
sort(start, count, sizeof(*start),
ftrace_cmp_ips, NULL);
} else {
mutex_unlock(&watchdog_mutex);
return err;
}
+
+static const int sixty = 60;
+
+static struct ctl_table watchdog_sysctls[] = {
+ {
+ .procname = "watchdog",
+ .data = &watchdog_user_enabled,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_watchdog,
+ .extra1 = SYSCTL_ZERO,
+ .extra2 = SYSCTL_ONE,
+ },
+ {
+ .procname = "watchdog_thresh",
+ .data = &watchdog_thresh,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_watchdog_thresh,
+ .extra1 = SYSCTL_ZERO,
+ .extra2 = (void *)&sixty,
+ },
+ {
+ .procname = "nmi_watchdog",
+ .data = &nmi_watchdog_user_enabled,
+ .maxlen = sizeof(int),
+ .mode = NMI_WATCHDOG_SYSCTL_PERM,
+ .proc_handler = proc_nmi_watchdog,
+ .extra1 = SYSCTL_ZERO,
+ .extra2 = SYSCTL_ONE,
+ },
+ {
+ .procname = "watchdog_cpumask",
+ .data = &watchdog_cpumask_bits,
+ .maxlen = NR_CPUS,
+ .mode = 0644,
+ .proc_handler = proc_watchdog_cpumask,
+ },
+#ifdef CONFIG_SOFTLOCKUP_DETECTOR
+ {
+ .procname = "soft_watchdog",
+ .data = &soft_watchdog_user_enabled,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_soft_watchdog,
+ .extra1 = SYSCTL_ZERO,
+ .extra2 = SYSCTL_ONE,
+ },
+ {
+ .procname = "softlockup_panic",
+ .data = &softlockup_panic,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = SYSCTL_ZERO,
+ .extra2 = SYSCTL_ONE,
+ },
+#ifdef CONFIG_SMP
+ {
+ .procname = "softlockup_all_cpu_backtrace",
+ .data = &sysctl_softlockup_all_cpu_backtrace,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = SYSCTL_ZERO,
+ .extra2 = SYSCTL_ONE,
+ },
+#endif /* CONFIG_SMP */
+#endif
+#ifdef CONFIG_HARDLOCKUP_DETECTOR
+ {
+ .procname = "hardlockup_panic",
+ .data = &hardlockup_panic,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = SYSCTL_ZERO,
+ .extra2 = SYSCTL_ONE,
+ },
+#ifdef CONFIG_SMP
+ {
+ .procname = "hardlockup_all_cpu_backtrace",
+ .data = &sysctl_hardlockup_all_cpu_backtrace,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = SYSCTL_ZERO,
+ .extra2 = SYSCTL_ONE,
+ },
+#endif /* CONFIG_SMP */
+#endif
+ {}
+};
+
+static void __init watchdog_sysctl_init(void)
+{
+ register_sysctl_init("kernel", watchdog_sysctls);
+}
+#else
+#define watchdog_sysctl_init() do { } while (0)
#endif /* CONFIG_SYSCTL */
void __init lockup_detector_init(void)
if (!watchdog_nmi_probe())
nmi_watchdog_available = true;
lockup_detector_setup();
+ watchdog_sysctl_init();
}
config GENERIC_NET_UTILS
bool
-config GENERIC_FIND_FIRST_BIT
- bool
-
source "lib/math/Kconfig"
config NO_GENERIC_PCI_IOPORT_MAP
mmio accesses when the IO memory address is not a registered
emulated region.
+source "lib/crypto/Kconfig"
+
config CRC_CCITT
tristate "CRC-CCITT functions"
help
bool
select STACKTRACE
+config STACKDEPOT_ALWAYS_INIT
+ bool
+ select STACKDEPOT
+
config STACK_HASH_ORDER
int "stack depot hash size (12 => 4KB, 20 => 1024KB)"
range 12 20
bool "Code coverage for fuzzing"
depends on ARCH_HAS_KCOV
depends on CC_HAS_SANCOV_TRACE_PC || GCC_PLUGINS
+ depends on !ARCH_WANTS_NO_INSTR || STACK_VALIDATION || \
+ GCC_VERSION >= 120000 || CLANG_VERSION >= 130000
select DEBUG_FS
select GCC_PLUGIN_SANCOV if !CC_HAS_SANCOV_TRACE_PC
help
If unsure, say N.
-config TEST_HASH
- tristate "Perform selftest on hash functions"
+config TEST_SIPHASH
+ tristate "Perform selftest on siphash functions"
help
- Enable this option to test the kernel's integer (<linux/hash.h>),
- string (<linux/stringhash.h>), and siphash (<linux/siphash.h>)
- hash functions on boot (or module load).
+ Enable this option to test the kernel's siphash (<linux/siphash.h>) hash
+ functions on boot (or module load).
This is intended to help people writing architecture-specific
optimized versions. If unsure, say N.
If unsure, say N.
+config HASH_KUNIT_TEST
+ tristate "KUnit Test for integer hash functions" if !KUNIT_ALL_TESTS
+ depends on KUNIT
+ default KUNIT_ALL_TESTS
+ help
+ Enable this option to test the kernel's string (<linux/stringhash.h>), and
+ integer (<linux/hash.h>) hash functions on boot.
+
+ KUnit tests run during boot and output the results to the debug log
+ in TAP format (https://testanything.org/). Only useful for kernel devs
+ running the KUnit test harness, and not intended for inclusion into a
+ production build.
+
+ For more information on KUnit and unit tests in general please refer
+ to the KUnit documentation in Documentation/dev-tools/kunit/.
+
+ This is intended to help people writing architecture-specific
+ optimized versions. If unsure, say N.
+
config RESOURCE_KUNIT_TEST
tristate "KUnit test for resource API"
depends on KUNIT
depends on m
depends on NETDEVICES && NET_CORE && INET # for TUN
depends on BLOCK
+ depends on PAGE_SIZE_LESS_THAN_256KB # for BTRFS
select TEST_LKM
select XFS_FS
select TUN
CC_HAS_WORKING_NOSANITIZE_ADDRESS) || \
HAVE_ARCH_KASAN_HW_TAGS
depends on (SLUB && SYSFS) || (SLAB && !DEBUG_SLAB)
- select STACKDEPOT
+ select STACKDEPOT_ALWAYS_INIT
help
Enables KASAN (KernelAddressSANitizer) - runtime memory debugger,
designed to find out-of-bounds accesses and use-after-free bugs.
This option enables -fsanitize=unreachable which checks for control
flow reaching an expected-to-be-unreachable position.
-config UBSAN_OBJECT_SIZE
- bool "Perform checking for accesses beyond the end of objects"
- default UBSAN
- # gcc hugely expands stack usage with -fsanitize=object-size
- # https://lore.kernel.org/lkml/CAHk-=wjPasyJrDuwDnpHJS2TuQfExwe=px-SzLeN8GFMAQJPmQ@mail.gmail.com/
- depends on !CC_IS_GCC
- depends on $(cc-option,-fsanitize=object-size)
- help
- This option enables -fsanitize=object-size which checks for accesses
- beyond the end of objects where the optimizer can determine both the
- object being operated on and its size, usually seen with bad downcasts,
- or access to struct members from NULL pointers.
-
config UBSAN_BOOL
bool "Perform checking for non-boolean values used as boolean"
default UBSAN
obj-$(CONFIG_TEST_BITOPS) += test_bitops.o
CFLAGS_test_bitops.o += -Werror
obj-$(CONFIG_TEST_SYSCTL) += test_sysctl.o
-obj-$(CONFIG_TEST_HASH) += test_hash.o test_siphash.o
+obj-$(CONFIG_TEST_SIPHASH) += test_siphash.o
+obj-$(CONFIG_HASH_KUNIT_TEST) += test_hash.o
obj-$(CONFIG_TEST_IDA) += test_ida.o
obj-$(CONFIG_KASAN_KUNIT_TEST) += test_kasan.o
CFLAGS_test_kasan.o += -fno-builtin
# SPDX-License-Identifier: GPL-2.0
+menu "Crypto library routines"
+
config CRYPTO_LIB_AES
tristate
config CRYPTO_LIB_CHACHA_GENERIC
tristate
- select CRYPTO_ALGAPI
+ select XOR_BLOCKS
help
This symbol can be depended upon by arch implementations of the
ChaCha library interface that require the generic code as a
of CRYPTO_LIB_CHACHA.
config CRYPTO_LIB_CHACHA
- tristate
+ tristate "ChaCha library interface"
+ depends on CRYPTO
depends on CRYPTO_ARCH_HAVE_LIB_CHACHA || !CRYPTO_ARCH_HAVE_LIB_CHACHA
select CRYPTO_LIB_CHACHA_GENERIC if CRYPTO_ARCH_HAVE_LIB_CHACHA=n
help
of CRYPTO_LIB_CURVE25519.
config CRYPTO_LIB_CURVE25519
- tristate
+ tristate "Curve25519 scalar multiplication library"
depends on CRYPTO_ARCH_HAVE_LIB_CURVE25519 || !CRYPTO_ARCH_HAVE_LIB_CURVE25519
select CRYPTO_LIB_CURVE25519_GENERIC if CRYPTO_ARCH_HAVE_LIB_CURVE25519=n
help
of CRYPTO_LIB_POLY1305.
config CRYPTO_LIB_POLY1305
- tristate
+ tristate "Poly1305 library interface"
depends on CRYPTO_ARCH_HAVE_LIB_POLY1305 || !CRYPTO_ARCH_HAVE_LIB_POLY1305
select CRYPTO_LIB_POLY1305_GENERIC if CRYPTO_ARCH_HAVE_LIB_POLY1305=n
help
is available and enabled.
config CRYPTO_LIB_CHACHA20POLY1305
- tristate
+ tristate "ChaCha20-Poly1305 AEAD support (8-byte nonce library version)"
depends on CRYPTO_ARCH_HAVE_LIB_CHACHA || !CRYPTO_ARCH_HAVE_LIB_CHACHA
depends on CRYPTO_ARCH_HAVE_LIB_POLY1305 || !CRYPTO_ARCH_HAVE_LIB_POLY1305
+ depends on CRYPTO
select CRYPTO_LIB_CHACHA
select CRYPTO_LIB_POLY1305
+ select CRYPTO_ALGAPI
config CRYPTO_LIB_SHA256
tristate
config CRYPTO_LIB_SM4
tristate
+
+endmenu
* #include <stdio.h>
*
* #include <openssl/evp.h>
- * #include <openssl/hmac.h>
*
* #define BLAKE2S_TESTVEC_COUNT 256
*
* }
* printf("};\n\n");
*
- * printf("static const u8 blake2s_hmac_testvecs[][BLAKE2S_HASH_SIZE] __initconst = {\n");
- *
- * HMAC(EVP_blake2s256(), key, sizeof(key), buf, sizeof(buf), hash, NULL);
- * print_vec(hash, BLAKE2S_OUTBYTES);
- *
- * HMAC(EVP_blake2s256(), buf, sizeof(buf), key, sizeof(key), hash, NULL);
- * print_vec(hash, BLAKE2S_OUTBYTES);
- *
- * printf("};\n");
- *
* return 0;
*}
*/
0xd6, 0x98, 0x6b, 0x07, 0x10, 0x65, 0x52, 0x65, },
};
-static const u8 blake2s_hmac_testvecs[][BLAKE2S_HASH_SIZE] __initconst = {
- { 0xce, 0xe1, 0x57, 0x69, 0x82, 0xdc, 0xbf, 0x43, 0xad, 0x56, 0x4c, 0x70,
- 0xed, 0x68, 0x16, 0x96, 0xcf, 0xa4, 0x73, 0xe8, 0xe8, 0xfc, 0x32, 0x79,
- 0x08, 0x0a, 0x75, 0x82, 0xda, 0x3f, 0x05, 0x11, },
- { 0x77, 0x2f, 0x0c, 0x71, 0x41, 0xf4, 0x4b, 0x2b, 0xb3, 0xc6, 0xb6, 0xf9,
- 0x60, 0xde, 0xe4, 0x52, 0x38, 0x66, 0xe8, 0xbf, 0x9b, 0x96, 0xc4, 0x9f,
- 0x60, 0xd9, 0x24, 0x37, 0x99, 0xd6, 0xec, 0x31, },
-};
-
bool __init blake2s_selftest(void)
{
u8 key[BLAKE2S_KEY_SIZE];
}
}
- if (success) {
- blake2s256_hmac(hash, buf, key, sizeof(buf), sizeof(key));
- success &= !memcmp(hash, blake2s_hmac_testvecs[0], BLAKE2S_HASH_SIZE);
-
- blake2s256_hmac(hash, key, buf, sizeof(key), sizeof(buf));
- success &= !memcmp(hash, blake2s_hmac_testvecs[1], BLAKE2S_HASH_SIZE);
-
- if (!success)
- pr_err("blake2s256_hmac self-test: FAIL\n");
- }
-
return success;
}
}
EXPORT_SYMBOL(blake2s_final);
-void blake2s256_hmac(u8 *out, const u8 *in, const u8 *key, const size_t inlen,
- const size_t keylen)
-{
- struct blake2s_state state;
- u8 x_key[BLAKE2S_BLOCK_SIZE] __aligned(__alignof__(u32)) = { 0 };
- u8 i_hash[BLAKE2S_HASH_SIZE] __aligned(__alignof__(u32));
- int i;
-
- if (keylen > BLAKE2S_BLOCK_SIZE) {
- blake2s_init(&state, BLAKE2S_HASH_SIZE);
- blake2s_update(&state, key, keylen);
- blake2s_final(&state, x_key);
- } else
- memcpy(x_key, key, keylen);
-
- for (i = 0; i < BLAKE2S_BLOCK_SIZE; ++i)
- x_key[i] ^= 0x36;
-
- blake2s_init(&state, BLAKE2S_HASH_SIZE);
- blake2s_update(&state, x_key, BLAKE2S_BLOCK_SIZE);
- blake2s_update(&state, in, inlen);
- blake2s_final(&state, i_hash);
-
- for (i = 0; i < BLAKE2S_BLOCK_SIZE; ++i)
- x_key[i] ^= 0x5c ^ 0x36;
-
- blake2s_init(&state, BLAKE2S_HASH_SIZE);
- blake2s_update(&state, x_key, BLAKE2S_BLOCK_SIZE);
- blake2s_update(&state, i_hash, BLAKE2S_HASH_SIZE);
- blake2s_final(&state, i_hash);
-
- memcpy(out, i_hash, BLAKE2S_HASH_SIZE);
- memzero_explicit(x_key, BLAKE2S_BLOCK_SIZE);
- memzero_explicit(i_hash, BLAKE2S_HASH_SIZE);
-}
-EXPORT_SYMBOL(blake2s256_hmac);
-
static int __init blake2s_mod_init(void)
{
if (!IS_ENABLED(CONFIG_CRYPTO_MANAGER_DISABLE_TESTS) &&
EXPORT_SYMBOL(_find_first_bit);
#endif
+#ifndef find_first_and_bit
+/*
+ * Find the first set bit in two memory regions.
+ */
+unsigned long _find_first_and_bit(const unsigned long *addr1,
+ const unsigned long *addr2,
+ unsigned long size)
+{
+ unsigned long idx, val;
+
+ for (idx = 0; idx * BITS_PER_LONG < size; idx++) {
+ val = addr1[idx] & addr2[idx];
+ if (val)
+ return min(idx * BITS_PER_LONG + __ffs(val), size);
+ }
+
+ return size;
+}
+EXPORT_SYMBOL(_find_first_and_bit);
+#endif
+
#ifndef find_first_zero_bit
/*
* Find the first cleared bit in a memory region.
return 0;
}
+static int __init test_find_first_and_bit(void *bitmap, const void *bitmap2, unsigned long len)
+{
+ static DECLARE_BITMAP(cp, BITMAP_LEN) __initdata;
+ unsigned long i, cnt;
+ ktime_t time;
+
+ bitmap_copy(cp, bitmap, BITMAP_LEN);
+
+ time = ktime_get();
+ for (cnt = i = 0; i < len; cnt++) {
+ i = find_first_and_bit(cp, bitmap2, len);
+ __clear_bit(i, cp);
+ }
+ time = ktime_get() - time;
+ pr_err("find_first_and_bit: %18llu ns, %6ld iterations\n", time, cnt);
+
+ return 0;
+}
+
static int __init test_find_next_bit(const void *bitmap, unsigned long len)
{
unsigned long i, cnt;
* traverse only part of bitmap to avoid soft lockup.
*/
test_find_first_bit(bitmap, BITMAP_LEN / 10);
+ test_find_first_and_bit(bitmap, bitmap2, BITMAP_LEN / 2);
test_find_next_and_bit(bitmap, bitmap2, BITMAP_LEN);
pr_err("\nStart testing find_bit() with sparse bitmap\n");
test_find_next_zero_bit(bitmap, BITMAP_LEN);
test_find_last_bit(bitmap, BITMAP_LEN);
test_find_first_bit(bitmap, BITMAP_LEN);
+ test_find_first_and_bit(bitmap, bitmap2, BITMAP_LEN);
test_find_next_and_bit(bitmap, bitmap2, BITMAP_LEN);
/*
list_del(&chunk->next_chunk);
end_bit = chunk_size(chunk) >> order;
- bit = find_next_bit(chunk->bits, end_bit, 0);
+ bit = find_first_bit(chunk->bits, end_bit);
BUG_ON(bit < end_bit);
vfree(chunk);
#include "kstrtox.h"
+noinline
const char *_parse_integer_fixup_radix(const char *s, unsigned int *base)
{
if (*base == 0) {
*
* Don't you dare use this function.
*/
+noinline
unsigned int _parse_integer_limit(const char *s, unsigned int base, unsigned long long *p,
size_t max_chars)
{
return rv;
}
+noinline
unsigned int _parse_integer(const char *s, unsigned int base, unsigned long long *p)
{
return _parse_integer_limit(s, base, p, INT_MAX);
* Returns 0 on success, -ERANGE on overflow and -EINVAL on parsing error.
* Preferred over simple_strtoull(). Return code must be checked.
*/
+noinline
int kstrtoull(const char *s, unsigned int base, unsigned long long *res)
{
if (s[0] == '+')
* Returns 0 on success, -ERANGE on overflow and -EINVAL on parsing error.
* Preferred over simple_strtoll(). Return code must be checked.
*/
+noinline
int kstrtoll(const char *s, unsigned int base, long long *res)
{
unsigned long long tmp;
* Returns 0 on success, -ERANGE on overflow and -EINVAL on parsing error.
* Preferred over simple_strtoul(). Return code must be checked.
*/
+noinline
int kstrtouint(const char *s, unsigned int base, unsigned int *res)
{
unsigned long long tmp;
* Returns 0 on success, -ERANGE on overflow and -EINVAL on parsing error.
* Preferred over simple_strtol(). Return code must be checked.
*/
+noinline
int kstrtoint(const char *s, unsigned int base, int *res)
{
long long tmp;
}
EXPORT_SYMBOL(kstrtoint);
+noinline
int kstrtou16(const char *s, unsigned int base, u16 *res)
{
unsigned long long tmp;
}
EXPORT_SYMBOL(kstrtou16);
+noinline
int kstrtos16(const char *s, unsigned int base, s16 *res)
{
long long tmp;
}
EXPORT_SYMBOL(kstrtos16);
+noinline
int kstrtou8(const char *s, unsigned int base, u8 *res)
{
unsigned long long tmp;
}
EXPORT_SYMBOL(kstrtou8);
+noinline
int kstrtos8(const char *s, unsigned int base, s8 *res)
{
long long tmp;
* [oO][NnFf] for "on" and "off". Otherwise it will return -EINVAL. Value
* pointed to by res is updated upon finding a match.
*/
+noinline
int kstrtobool(const char *s, bool *res)
{
if (!s)
"list_del corruption, %px->prev is LIST_POISON2 (%px)\n",
entry, LIST_POISON2) ||
CHECK_DATA_CORRUPTION(prev->next != entry,
- "list_del corruption. prev->next should be %px, but was %px\n",
- entry, prev->next) ||
+ "list_del corruption. prev->next should be %px, but was %px. (prev=%px)\n",
+ entry, prev->next, prev) ||
CHECK_DATA_CORRUPTION(next->prev != entry,
- "list_del corruption. next->prev should be %px, but was %px\n",
- entry, next->prev))
+ "list_del corruption. next->prev should be %px, but was %px. (next=%px)\n",
+ entry, next->prev, next))
return false;
return true;
*/
#include <asm/unaligned.h>
+
+#include <linux/bitops.h>
#include <linux/string.h> /* memset, memcpy */
#define FORCE_INLINE __always_inline
unsigned long entries[REF_TRACKER_STACK_ENTRIES];
struct ref_tracker *tracker;
unsigned int nr_entries;
+ gfp_t gfp_mask = gfp;
unsigned long flags;
- *trackerp = tracker = kzalloc(sizeof(*tracker), gfp | __GFP_NOFAIL);
+ if (gfp & __GFP_DIRECT_RECLAIM)
+ gfp_mask |= __GFP_NOFAIL;
+ *trackerp = tracker = kzalloc(sizeof(*tracker), gfp_mask);
if (unlikely(!tracker)) {
pr_err_once("memory allocation failure, unreliable refcount tracker.\n");
refcount_inc(&dir->untracked);
}
EXPORT_SYMBOL_GPL(sbitmap_queue_init_node);
-static void sbitmap_queue_update_wake_batch(struct sbitmap_queue *sbq,
- unsigned int depth)
+static inline void __sbitmap_queue_update_wake_batch(struct sbitmap_queue *sbq,
+ unsigned int wake_batch)
{
- unsigned int wake_batch = sbq_calc_wake_batch(sbq, depth);
int i;
if (sbq->wake_batch != wake_batch) {
}
}
+static void sbitmap_queue_update_wake_batch(struct sbitmap_queue *sbq,
+ unsigned int depth)
+{
+ unsigned int wake_batch;
+
+ wake_batch = sbq_calc_wake_batch(sbq, depth);
+ __sbitmap_queue_update_wake_batch(sbq, wake_batch);
+}
+
+void sbitmap_queue_recalculate_wake_batch(struct sbitmap_queue *sbq,
+ unsigned int users)
+{
+ unsigned int wake_batch;
+
+ wake_batch = clamp_val((sbq->sb.depth + users - 1) /
+ users, 4, SBQ_WAKE_BATCH);
+ __sbitmap_queue_update_wake_batch(sbq, wake_batch);
+}
+EXPORT_SYMBOL_GPL(sbitmap_queue_recalculate_wake_batch);
+
void sbitmap_queue_resize(struct sbitmap_queue *sbq, unsigned int depth)
{
sbitmap_queue_update_wake_batch(sbq, depth);
#include <linux/kernel.h>
#include <linux/export.h>
#include <linux/bitops.h>
+#include <linux/string.h>
#include <crypto/sha1.h>
#include <asm/unaligned.h>
#define SHA_ROUND(t, input, fn, constant, A, B, C, D, E) do { \
__u32 TEMP = input(t); setW(t, TEMP); \
E += TEMP + rol32(A,5) + (fn) + (constant); \
- B = ror32(B, 2); } while (0)
+ B = ror32(B, 2); \
+ TEMP = E; E = D; D = C; C = B; B = A; A = TEMP; } while (0)
#define T_0_15(t, A, B, C, D, E) SHA_ROUND(t, SHA_SRC, (((C^D)&B)^D) , 0x5a827999, A, B, C, D, E )
#define T_16_19(t, A, B, C, D, E) SHA_ROUND(t, SHA_MIX, (((C^D)&B)^D) , 0x5a827999, A, B, C, D, E )
void sha1_transform(__u32 *digest, const char *data, __u32 *array)
{
__u32 A, B, C, D, E;
+ unsigned int i = 0;
A = digest[0];
B = digest[1];
E = digest[4];
/* Round 1 - iterations 0-16 take their input from 'data' */
- T_0_15( 0, A, B, C, D, E);
- T_0_15( 1, E, A, B, C, D);
- T_0_15( 2, D, E, A, B, C);
- T_0_15( 3, C, D, E, A, B);
- T_0_15( 4, B, C, D, E, A);
- T_0_15( 5, A, B, C, D, E);
- T_0_15( 6, E, A, B, C, D);
- T_0_15( 7, D, E, A, B, C);
- T_0_15( 8, C, D, E, A, B);
- T_0_15( 9, B, C, D, E, A);
- T_0_15(10, A, B, C, D, E);
- T_0_15(11, E, A, B, C, D);
- T_0_15(12, D, E, A, B, C);
- T_0_15(13, C, D, E, A, B);
- T_0_15(14, B, C, D, E, A);
- T_0_15(15, A, B, C, D, E);
+ for (; i < 16; ++i)
+ T_0_15(i, A, B, C, D, E);
/* Round 1 - tail. Input from 512-bit mixing array */
- T_16_19(16, E, A, B, C, D);
- T_16_19(17, D, E, A, B, C);
- T_16_19(18, C, D, E, A, B);
- T_16_19(19, B, C, D, E, A);
+ for (; i < 20; ++i)
+ T_16_19(i, A, B, C, D, E);
/* Round 2 */
- T_20_39(20, A, B, C, D, E);
- T_20_39(21, E, A, B, C, D);
- T_20_39(22, D, E, A, B, C);
- T_20_39(23, C, D, E, A, B);
- T_20_39(24, B, C, D, E, A);
- T_20_39(25, A, B, C, D, E);
- T_20_39(26, E, A, B, C, D);
- T_20_39(27, D, E, A, B, C);
- T_20_39(28, C, D, E, A, B);
- T_20_39(29, B, C, D, E, A);
- T_20_39(30, A, B, C, D, E);
- T_20_39(31, E, A, B, C, D);
- T_20_39(32, D, E, A, B, C);
- T_20_39(33, C, D, E, A, B);
- T_20_39(34, B, C, D, E, A);
- T_20_39(35, A, B, C, D, E);
- T_20_39(36, E, A, B, C, D);
- T_20_39(37, D, E, A, B, C);
- T_20_39(38, C, D, E, A, B);
- T_20_39(39, B, C, D, E, A);
+ for (; i < 40; ++i)
+ T_20_39(i, A, B, C, D, E);
/* Round 3 */
- T_40_59(40, A, B, C, D, E);
- T_40_59(41, E, A, B, C, D);
- T_40_59(42, D, E, A, B, C);
- T_40_59(43, C, D, E, A, B);
- T_40_59(44, B, C, D, E, A);
- T_40_59(45, A, B, C, D, E);
- T_40_59(46, E, A, B, C, D);
- T_40_59(47, D, E, A, B, C);
- T_40_59(48, C, D, E, A, B);
- T_40_59(49, B, C, D, E, A);
- T_40_59(50, A, B, C, D, E);
- T_40_59(51, E, A, B, C, D);
- T_40_59(52, D, E, A, B, C);
- T_40_59(53, C, D, E, A, B);
- T_40_59(54, B, C, D, E, A);
- T_40_59(55, A, B, C, D, E);
- T_40_59(56, E, A, B, C, D);
- T_40_59(57, D, E, A, B, C);
- T_40_59(58, C, D, E, A, B);
- T_40_59(59, B, C, D, E, A);
+ for (; i < 60; ++i)
+ T_40_59(i, A, B, C, D, E);
/* Round 4 */
- T_60_79(60, A, B, C, D, E);
- T_60_79(61, E, A, B, C, D);
- T_60_79(62, D, E, A, B, C);
- T_60_79(63, C, D, E, A, B);
- T_60_79(64, B, C, D, E, A);
- T_60_79(65, A, B, C, D, E);
- T_60_79(66, E, A, B, C, D);
- T_60_79(67, D, E, A, B, C);
- T_60_79(68, C, D, E, A, B);
- T_60_79(69, B, C, D, E, A);
- T_60_79(70, A, B, C, D, E);
- T_60_79(71, E, A, B, C, D);
- T_60_79(72, D, E, A, B, C);
- T_60_79(73, C, D, E, A, B);
- T_60_79(74, B, C, D, E, A);
- T_60_79(75, A, B, C, D, E);
- T_60_79(76, E, A, B, C, D);
- T_60_79(77, D, E, A, B, C);
- T_60_79(78, C, D, E, A, B);
- T_60_79(79, B, C, D, E, A);
+ for (; i < 80; ++i)
+ T_60_79(i, A, B, C, D, E);
digest[0] += A;
digest[1] += B;
#include <linux/jhash.h>
#include <linux/kernel.h>
#include <linux/mm.h>
+#include <linux/mutex.h>
#include <linux/percpu.h>
#include <linux/printk.h>
#include <linux/slab.h>
}
early_param("stack_depot_disable", is_stack_depot_disabled);
-int __init stack_depot_init(void)
+/*
+ * __ref because of memblock_alloc(), which will not be actually called after
+ * the __init code is gone, because at that point slab_is_available() is true
+ */
+__ref int stack_depot_init(void)
{
- if (!stack_depot_disable) {
+ static DEFINE_MUTEX(stack_depot_init_mutex);
+
+ mutex_lock(&stack_depot_init_mutex);
+ if (!stack_depot_disable && !stack_table) {
size_t size = (STACK_HASH_SIZE * sizeof(struct stack_record *));
int i;
- stack_table = memblock_alloc(size, size);
- for (i = 0; i < STACK_HASH_SIZE; i++)
- stack_table[i] = NULL;
+ if (slab_is_available()) {
+ pr_info("Stack Depot allocating hash table with kvmalloc\n");
+ stack_table = kvmalloc(size, GFP_KERNEL);
+ } else {
+ pr_info("Stack Depot allocating hash table with memblock_alloc\n");
+ stack_table = memblock_alloc(size, SMP_CACHE_BYTES);
+ }
+ if (stack_table) {
+ for (i = 0; i < STACK_HASH_SIZE; i++)
+ stack_table[i] = NULL;
+ } else {
+ pr_err("Stack Depot hash table allocation failed, disabling\n");
+ stack_depot_disable = true;
+ mutex_unlock(&stack_depot_init_mutex);
+ return -ENOMEM;
+ }
}
+ mutex_unlock(&stack_depot_init_mutex);
return 0;
}
+EXPORT_SYMBOL_GPL(stack_depot_init);
/* Calculate hash for a stack */
static inline u32 hash_stack(unsigned long *entries, unsigned int size)
* (allocates using GFP flags of @alloc_flags). If @can_alloc is %false, avoids
* any allocations and will fail if no space is left to store the stack trace.
*
+ * If the stack trace in @entries is from an interrupt, only the portion up to
+ * interrupt entry is saved.
+ *
* Context: Any context, but setting @can_alloc to %false is required if
* alloc_pages() cannot be used from the current context. Currently
* this is the case from contexts where neither %GFP_ATOMIC nor
unsigned long flags;
u32 hash;
+ /*
+ * If this stack trace is from an interrupt, including anything before
+ * interrupt entry usually leads to unbounded stackdepot growth.
+ *
+ * Because use of filter_irq_stacks() is a requirement to ensure
+ * stackdepot can efficiently deduplicate interrupt stacks, always
+ * filter_irq_stacks() to simplify all callers' use of stackdepot.
+ */
+ nr_entries = filter_irq_stacks(entries, nr_entries);
+
if (unlikely(nr_entries == 0) || stack_depot_disable)
goto fast_exit;
}
}
+static void __init test_bitmap_printlist(void)
+{
+ unsigned long *bmap = kmalloc(PAGE_SIZE, GFP_KERNEL);
+ char *buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
+ char expected[256];
+ int ret, slen;
+ ktime_t time;
+
+ if (!buf || !bmap)
+ goto out;
+
+ memset(bmap, -1, PAGE_SIZE);
+ slen = snprintf(expected, 256, "0-%ld", PAGE_SIZE * 8 - 1);
+ if (slen < 0)
+ goto out;
+
+ time = ktime_get();
+ ret = bitmap_print_to_pagebuf(true, buf, bmap, PAGE_SIZE * 8);
+ time = ktime_get() - time;
+
+ if (ret != slen + 1) {
+ pr_err("bitmap_print_to_pagebuf: result is %d, expected %d\n", ret, slen);
+ goto out;
+ }
+
+ if (strncmp(buf, expected, slen)) {
+ pr_err("bitmap_print_to_pagebuf: result is %s, expected %s\n", buf, expected);
+ goto out;
+ }
+
+ pr_err("bitmap_print_to_pagebuf: input is '%s', Time: %llu\n", buf, time);
+out:
+ kfree(buf);
+ kfree(bmap);
+}
+
static const unsigned long parse_test[] __initconst = {
BITMAP_FROM_U64(0),
BITMAP_FROM_U64(1),
test_bitmap_arr32();
test_bitmap_parse();
test_bitmap_parselist();
+ test_bitmap_printlist();
test_mem_optimisations();
test_for_each_set_clump8();
test_bitmap_cut();
* and hash_64().
*/
-#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt "\n"
-
#include <linux/compiler.h>
#include <linux/types.h>
#include <linux/module.h>
#include <linux/hash.h>
#include <linux/stringhash.h>
-#include <linux/printk.h>
+#include <kunit/test.h>
/* 32-bit XORSHIFT generator. Seed must not be zero. */
-static u32 __init __attribute_const__
+static u32 __attribute_const__
xorshift(u32 seed)
{
seed ^= seed << 13;
}
/* Given a non-zero x, returns a non-zero byte. */
-static u8 __init __attribute_const__
+static u8 __attribute_const__
mod255(u32 x)
{
x = (x & 0xffff) + (x >> 16); /* 1 <= x <= 0x1fffe */
}
/* Fill the buffer with non-zero bytes. */
-static void __init
-fill_buf(char *buf, size_t len, u32 seed)
+static void fill_buf(char *buf, size_t len, u32 seed)
{
size_t i;
}
}
+/* Holds most testing variables for the int test. */
+struct test_hash_params {
+ /* Pointer to integer to be hashed. */
+ unsigned long long *h64;
+ /* Low 32-bits of integer to be hashed. */
+ u32 h0;
+ /* Arch-specific hash result. */
+ u32 h1;
+ /* Generic hash result. */
+ u32 h2;
+ /* ORed hashes of given size (in bits). */
+ u32 (*hash_or)[33];
+};
+
+#ifdef HAVE_ARCH__HASH_32
+static void
+test_int__hash_32(struct kunit *test, struct test_hash_params *params)
+{
+ params->hash_or[1][0] |= params->h2 = __hash_32_generic(params->h0);
+#if HAVE_ARCH__HASH_32 == 1
+ KUNIT_EXPECT_EQ_MSG(test, params->h1, params->h2,
+ "__hash_32(%#x) = %#x != __hash_32_generic() = %#x",
+ params->h0, params->h1, params->h2);
+#endif
+}
+#endif
+
+#ifdef HAVE_ARCH_HASH_64
+static void
+test_int_hash_64(struct kunit *test, struct test_hash_params *params, u32 const *m, int *k)
+{
+ params->h2 = hash_64_generic(*params->h64, *k);
+#if HAVE_ARCH_HASH_64 == 1
+ KUNIT_EXPECT_EQ_MSG(test, params->h1, params->h2,
+ "hash_64(%#llx, %d) = %#x != hash_64_generic() = %#x",
+ *params->h64, *k, params->h1, params->h2);
+#else
+ KUNIT_EXPECT_LE_MSG(test, params->h1, params->h2,
+ "hash_64_generic(%#llx, %d) = %#x > %#x",
+ *params->h64, *k, params->h1, *m);
+#endif
+}
+#endif
+
/*
* Test the various integer hash functions. h64 (or its low-order bits)
* is the integer to hash. hash_or accumulates the OR of the hash values,
* inline, the code being tested is actually in the module, and you can
* recompile and re-test the module without rebooting.
*/
-static bool __init
-test_int_hash(unsigned long long h64, u32 hash_or[2][33])
+static void
+test_int_hash(struct kunit *test, unsigned long long h64, u32 hash_or[2][33])
{
int k;
- u32 h0 = (u32)h64, h1, h2;
+ struct test_hash_params params = { &h64, (u32)h64, 0, 0, hash_or };
/* Test __hash32 */
- hash_or[0][0] |= h1 = __hash_32(h0);
+ hash_or[0][0] |= params.h1 = __hash_32(params.h0);
#ifdef HAVE_ARCH__HASH_32
- hash_or[1][0] |= h2 = __hash_32_generic(h0);
-#if HAVE_ARCH__HASH_32 == 1
- if (h1 != h2) {
- pr_err("__hash_32(%#x) = %#x != __hash_32_generic() = %#x",
- h0, h1, h2);
- return false;
- }
-#endif
+ test_int__hash_32(test, ¶ms);
#endif
/* Test k = 1..32 bits */
u32 const m = ((u32)2 << (k-1)) - 1; /* Low k bits set */
/* Test hash_32 */
- hash_or[0][k] |= h1 = hash_32(h0, k);
- if (h1 > m) {
- pr_err("hash_32(%#x, %d) = %#x > %#x", h0, k, h1, m);
- return false;
- }
-#ifdef HAVE_ARCH_HASH_32
- h2 = hash_32_generic(h0, k);
-#if HAVE_ARCH_HASH_32 == 1
- if (h1 != h2) {
- pr_err("hash_32(%#x, %d) = %#x != hash_32_generic() "
- " = %#x", h0, k, h1, h2);
- return false;
- }
-#else
- if (h2 > m) {
- pr_err("hash_32_generic(%#x, %d) = %#x > %#x",
- h0, k, h1, m);
- return false;
- }
-#endif
-#endif
+ hash_or[0][k] |= params.h1 = hash_32(params.h0, k);
+ KUNIT_EXPECT_LE_MSG(test, params.h1, m,
+ "hash_32(%#x, %d) = %#x > %#x",
+ params.h0, k, params.h1, m);
+
/* Test hash_64 */
- hash_or[1][k] |= h1 = hash_64(h64, k);
- if (h1 > m) {
- pr_err("hash_64(%#llx, %d) = %#x > %#x", h64, k, h1, m);
- return false;
- }
+ hash_or[1][k] |= params.h1 = hash_64(h64, k);
+ KUNIT_EXPECT_LE_MSG(test, params.h1, m,
+ "hash_64(%#llx, %d) = %#x > %#x",
+ h64, k, params.h1, m);
#ifdef HAVE_ARCH_HASH_64
- h2 = hash_64_generic(h64, k);
-#if HAVE_ARCH_HASH_64 == 1
- if (h1 != h2) {
- pr_err("hash_64(%#llx, %d) = %#x != hash_64_generic() "
- "= %#x", h64, k, h1, h2);
- return false;
- }
-#else
- if (h2 > m) {
- pr_err("hash_64_generic(%#llx, %d) = %#x > %#x",
- h64, k, h1, m);
- return false;
- }
-#endif
+ test_int_hash_64(test, ¶ms, &m, &k);
#endif
}
-
- (void)h2; /* Suppress unused variable warning */
- return true;
}
#define SIZE 256 /* Run time is cubic in SIZE */
-static int __init
-test_hash_init(void)
+static void test_string_or(struct kunit *test)
{
char buf[SIZE+1];
- u32 string_or = 0, hash_or[2][33] = { { 0, } };
- unsigned tests = 0;
+ u32 string_or = 0;
+ int i, j;
+
+ fill_buf(buf, SIZE, 1);
+
+ /* Test every possible non-empty substring in the buffer. */
+ for (j = SIZE; j > 0; --j) {
+ buf[j] = '\0';
+
+ for (i = 0; i <= j; i++) {
+ u32 h0 = full_name_hash(buf+i, buf+i, j-i);
+
+ string_or |= h0;
+ } /* i */
+ } /* j */
+
+ /* The OR of all the hash values should cover all the bits */
+ KUNIT_EXPECT_EQ_MSG(test, string_or, -1u,
+ "OR of all string hash results = %#x != %#x",
+ string_or, -1u);
+}
+
+static void test_hash_or(struct kunit *test)
+{
+ char buf[SIZE+1];
+ u32 hash_or[2][33] = { { 0, } };
unsigned long long h64 = 0;
int i, j;
u32 h0 = full_name_hash(buf+i, buf+i, j-i);
/* Check that hashlen_string gets the length right */
- if (hashlen_len(hashlen) != j-i) {
- pr_err("hashlen_string(%d..%d) returned length"
- " %u, expected %d",
- i, j, hashlen_len(hashlen), j-i);
- return -EINVAL;
- }
+ KUNIT_EXPECT_EQ_MSG(test, hashlen_len(hashlen), j-i,
+ "hashlen_string(%d..%d) returned length %u, expected %d",
+ i, j, hashlen_len(hashlen), j-i);
/* Check that the hashes match */
- if (hashlen_hash(hashlen) != h0) {
- pr_err("hashlen_string(%d..%d) = %08x != "
- "full_name_hash() = %08x",
- i, j, hashlen_hash(hashlen), h0);
- return -EINVAL;
- }
+ KUNIT_EXPECT_EQ_MSG(test, hashlen_hash(hashlen), h0,
+ "hashlen_string(%d..%d) = %08x != full_name_hash() = %08x",
+ i, j, hashlen_hash(hashlen), h0);
- string_or |= h0;
h64 = h64 << 32 | h0; /* For use with hash_64 */
- if (!test_int_hash(h64, hash_or))
- return -EINVAL;
- tests++;
+ test_int_hash(test, h64, hash_or);
} /* i */
} /* j */
- /* The OR of all the hash values should cover all the bits */
- if (~string_or) {
- pr_err("OR of all string hash results = %#x != %#x",
- string_or, -1u);
- return -EINVAL;
- }
- if (~hash_or[0][0]) {
- pr_err("OR of all __hash_32 results = %#x != %#x",
- hash_or[0][0], -1u);
- return -EINVAL;
- }
+ KUNIT_EXPECT_EQ_MSG(test, hash_or[0][0], -1u,
+ "OR of all __hash_32 results = %#x != %#x",
+ hash_or[0][0], -1u);
#ifdef HAVE_ARCH__HASH_32
#if HAVE_ARCH__HASH_32 != 1 /* Test is pointless if results match */
- if (~hash_or[1][0]) {
- pr_err("OR of all __hash_32_generic results = %#x != %#x",
- hash_or[1][0], -1u);
- return -EINVAL;
- }
+ KUNIT_EXPECT_EQ_MSG(test, hash_or[1][0], -1u,
+ "OR of all __hash_32_generic results = %#x != %#x",
+ hash_or[1][0], -1u);
#endif
#endif
for (i = 1; i <= 32; i++) {
u32 const m = ((u32)2 << (i-1)) - 1; /* Low i bits set */
- if (hash_or[0][i] != m) {
- pr_err("OR of all hash_32(%d) results = %#x "
- "(%#x expected)", i, hash_or[0][i], m);
- return -EINVAL;
- }
- if (hash_or[1][i] != m) {
- pr_err("OR of all hash_64(%d) results = %#x "
- "(%#x expected)", i, hash_or[1][i], m);
- return -EINVAL;
- }
+ KUNIT_EXPECT_EQ_MSG(test, hash_or[0][i], m,
+ "OR of all hash_32(%d) results = %#x (%#x expected)",
+ i, hash_or[0][i], m);
+ KUNIT_EXPECT_EQ_MSG(test, hash_or[1][i], m,
+ "OR of all hash_64(%d) results = %#x (%#x expected)",
+ i, hash_or[1][i], m);
}
+}
- /* Issue notices about skipped tests. */
-#ifdef HAVE_ARCH__HASH_32
-#if HAVE_ARCH__HASH_32 != 1
- pr_info("__hash_32() is arch-specific; not compared to generic.");
-#endif
-#else
- pr_info("__hash_32() has no arch implementation to test.");
-#endif
-#ifdef HAVE_ARCH_HASH_32
-#if HAVE_ARCH_HASH_32 != 1
- pr_info("hash_32() is arch-specific; not compared to generic.");
-#endif
-#else
- pr_info("hash_32() has no arch implementation to test.");
-#endif
-#ifdef HAVE_ARCH_HASH_64
-#if HAVE_ARCH_HASH_64 != 1
- pr_info("hash_64() is arch-specific; not compared to generic.");
-#endif
-#else
- pr_info("hash_64() has no arch implementation to test.");
-#endif
-
- pr_notice("%u tests passed.", tests);
+static struct kunit_case hash_test_cases[] __refdata = {
+ KUNIT_CASE(test_string_or),
+ KUNIT_CASE(test_hash_or),
+ {}
+};
- return 0;
-}
+static struct kunit_suite hash_test_suite = {
+ .name = "hash",
+ .test_cases = hash_test_cases,
+};
-static void __exit test_hash_exit(void)
-{
-}
-module_init(test_hash_init); /* Does everything */
-module_exit(test_hash_exit); /* Does nothing */
+kunit_test_suite(hash_test_suite);
MODULE_LICENSE("GPL");
if (num)
kmem_cache_free_bulk(c, num, objects);
}
+ kmem_cache_destroy(c);
*total_failures += fail;
return 1;
}
{ }
};
-static struct ctl_table test_sysctl_table[] = {
- {
- .procname = "test_sysctl",
- .maxlen = 0,
- .mode = 0555,
- .child = test_table,
- },
- { }
-};
-
-static struct ctl_table test_sysctl_root_table[] = {
- {
- .procname = "debug",
- .maxlen = 0,
- .mode = 0555,
- .child = test_sysctl_table,
- },
- { }
-};
-
static struct ctl_table_header *test_sysctl_header;
static int __init test_sysctl_init(void)
test_data.bitmap_0001 = kzalloc(SYSCTL_TEST_BITMAP_SIZE/8, GFP_KERNEL);
if (!test_data.bitmap_0001)
return -ENOMEM;
- test_sysctl_header = register_sysctl_table(test_sysctl_root_table);
+ test_sysctl_header = register_sysctl("debug/test_sysctl", test_table);
if (!test_sysctl_header) {
kfree(test_data.bitmap_0001);
return -ENOMEM;
eval2 = eval;
}
-static void test_ubsan_null_ptr_deref(void)
-{
- volatile int *ptr = NULL;
- int val;
-
- UBSAN_TEST(CONFIG_UBSAN_OBJECT_SIZE);
- val = *ptr;
-}
-
static void test_ubsan_misaligned_access(void)
{
volatile char arr[5] __aligned(4) = {1, 2, 3, 4, 5};
*ptr = val;
}
-static void test_ubsan_object_size_mismatch(void)
-{
- /* "((aligned(8)))" helps this not into be misaligned for ptr-access. */
- volatile int val __aligned(8) = 4;
- volatile long long *ptr, val2;
-
- UBSAN_TEST(CONFIG_UBSAN_OBJECT_SIZE);
- ptr = (long long *)&val;
- val2 = *ptr;
-}
-
static const test_ubsan_fp test_ubsan_array[] = {
test_ubsan_shift_out_of_bounds,
test_ubsan_out_of_bounds,
test_ubsan_load_invalid_value,
test_ubsan_misaligned_access,
- test_ubsan_object_size_mismatch,
};
/* Excluded because they Oops the module. */
static const test_ubsan_fp skip_ubsan_array[] = {
test_ubsan_divrem_overflow,
- test_ubsan_null_ptr_deref,
};
static int __init test_ubsan_init(void)
struct printf_spec spec, const char *fmt)
{
int nr_bits = max_t(int, spec.field_width, 0);
- /* current bit is 'cur', most recently seen range is [rbot, rtop] */
- int cur, rbot, rtop;
bool first = true;
+ int rbot, rtop;
if (check_pointer(&buf, end, bitmap, spec))
return buf;
- rbot = cur = find_first_bit(bitmap, nr_bits);
- while (cur < nr_bits) {
- rtop = cur;
- cur = find_next_bit(bitmap, nr_bits, cur + 1);
- if (cur < nr_bits && cur <= rtop + 1)
- continue;
-
+ for_each_set_bitrange(rbot, rtop, bitmap, nr_bits) {
if (!first) {
if (buf < end)
*buf = ',';
first = false;
buf = number(buf, end, rbot, default_dec_spec);
- if (rbot < rtop) {
- if (buf < end)
- *buf = '-';
- buf++;
-
- buf = number(buf, end, rtop, default_dec_spec);
- }
+ if (rtop == rbot + 1)
+ continue;
- rbot = cur;
+ if (buf < end)
+ *buf = '-';
+ buf = number(++buf, end, rtop - 1, default_dec_spec);
}
return buf;
}
bool
default y
-config CLEANCACHE
- bool "Enable cleancache driver to cache clean pages if tmem is present"
- help
- Cleancache can be thought of as a page-granularity victim cache
- for clean pages that the kernel's pageframe replacement algorithm
- (PFRA) would like to keep around, but can't since there isn't enough
- memory. So when the PFRA "evicts" a page, it first attempts to use
- cleancache code to put the data contained in that page into
- "transcendent memory", memory that is not directly accessible or
- addressable by the kernel and is of unknown and possibly
- time-varying size. And when a cleancache-enabled
- filesystem wishes to access a page in a file on disk, it first
- checks cleancache to see if it already contains it; if it does,
- the page is copied into the kernel and a disk access is avoided.
- When a transcendent memory driver is available (such as zcache or
- Xen transcendent memory), a significant I/O reduction
- may be achieved. When none is available, all cleancache calls
- are reduced to a single pointer-compare-against-NULL resulting
- in a negligible performance hit.
-
- If unsure, say Y to enable cleancache
+config NEED_PER_CPU_EMBED_FIRST_CHUNK
+ bool
-config FRONTSWAP
- bool "Enable frontswap to cache swap pages if tmem is present"
- depends on SWAP
- help
- Frontswap is so named because it can be thought of as the opposite
- of a "backing" store for a swap device. The data is stored into
- "transcendent memory", memory that is not directly accessible or
- addressable by the kernel and is of unknown and possibly
- time-varying size. When space in transcendent memory is available,
- a significant swap I/O reduction may be achieved. When none is
- available, all frontswap calls are reduced to a single pointer-
- compare-against-NULL resulting in a negligible performance hit
- and swap data is stored as normal on the matching swap device.
+config NEED_PER_CPU_PAGE_FIRST_CHUNK
+ bool
+
+config USE_PERCPU_NUMA_NODE_ID
+ bool
+
+config HAVE_SETUP_PER_CPU_AREA
+ bool
- If unsure, say Y to enable frontswap.
+config FRONTSWAP
+ bool
config CMA
bool "Contiguous Memory Allocator"
config ZSWAP
bool "Compressed cache for swap pages (EXPERIMENTAL)"
- depends on FRONTSWAP && CRYPTO=y
+ depends on SWAP && CRYPTO=y
+ select FRONTSWAP
select ZPOOL
help
A lightweight compressed cache for swap pages. It takes
obj-$(CONFIG_DEBUG_RODATA_TEST) += rodata_test.o
obj-$(CONFIG_DEBUG_VM_PGTABLE) += debug_vm_pgtable.o
obj-$(CONFIG_PAGE_OWNER) += page_owner.o
-obj-$(CONFIG_CLEANCACHE) += cleancache.o
obj-$(CONFIG_MEMORY_ISOLATION) += page_isolation.o
obj-$(CONFIG_ZPOOL) += zpool.o
obj-$(CONFIG_ZBUD) += zbud.o
+++ /dev/null
-// SPDX-License-Identifier: GPL-2.0-only
-/*
- * Cleancache frontend
- *
- * This code provides the generic "frontend" layer to call a matching
- * "backend" driver implementation of cleancache. See
- * Documentation/vm/cleancache.rst for more information.
- *
- * Copyright (C) 2009-2010 Oracle Corp. All rights reserved.
- * Author: Dan Magenheimer
- */
-
-#include <linux/module.h>
-#include <linux/fs.h>
-#include <linux/exportfs.h>
-#include <linux/mm.h>
-#include <linux/debugfs.h>
-#include <linux/cleancache.h>
-
-/*
- * cleancache_ops is set by cleancache_register_ops to contain the pointers
- * to the cleancache "backend" implementation functions.
- */
-static const struct cleancache_ops *cleancache_ops __read_mostly;
-
-/*
- * Counters available via /sys/kernel/debug/cleancache (if debugfs is
- * properly configured. These are for information only so are not protected
- * against increment races.
- */
-static u64 cleancache_succ_gets;
-static u64 cleancache_failed_gets;
-static u64 cleancache_puts;
-static u64 cleancache_invalidates;
-
-static void cleancache_register_ops_sb(struct super_block *sb, void *unused)
-{
- switch (sb->cleancache_poolid) {
- case CLEANCACHE_NO_BACKEND:
- __cleancache_init_fs(sb);
- break;
- case CLEANCACHE_NO_BACKEND_SHARED:
- __cleancache_init_shared_fs(sb);
- break;
- }
-}
-
-/*
- * Register operations for cleancache. Returns 0 on success.
- */
-int cleancache_register_ops(const struct cleancache_ops *ops)
-{
- if (cmpxchg(&cleancache_ops, NULL, ops))
- return -EBUSY;
-
- /*
- * A cleancache backend can be built as a module and hence loaded after
- * a cleancache enabled filesystem has called cleancache_init_fs. To
- * handle such a scenario, here we call ->init_fs or ->init_shared_fs
- * for each active super block. To differentiate between local and
- * shared filesystems, we temporarily initialize sb->cleancache_poolid
- * to CLEANCACHE_NO_BACKEND or CLEANCACHE_NO_BACKEND_SHARED
- * respectively in case there is no backend registered at the time
- * cleancache_init_fs or cleancache_init_shared_fs is called.
- *
- * Since filesystems can be mounted concurrently with cleancache
- * backend registration, we have to be careful to guarantee that all
- * cleancache enabled filesystems that has been mounted by the time
- * cleancache_register_ops is called has got and all mounted later will
- * get cleancache_poolid. This is assured by the following statements
- * tied together:
- *
- * a) iterate_supers skips only those super blocks that has started
- * ->kill_sb
- *
- * b) if iterate_supers encounters a super block that has not finished
- * ->mount yet, it waits until it is finished
- *
- * c) cleancache_init_fs is called from ->mount and
- * cleancache_invalidate_fs is called from ->kill_sb
- *
- * d) we call iterate_supers after cleancache_ops has been set
- *
- * From a) it follows that if iterate_supers skips a super block, then
- * either the super block is already dead, in which case we do not need
- * to bother initializing cleancache for it, or it was mounted after we
- * initiated iterate_supers. In the latter case, it must have seen
- * cleancache_ops set according to d) and initialized cleancache from
- * ->mount by itself according to c). This proves that we call
- * ->init_fs at least once for each active super block.
- *
- * From b) and c) it follows that if iterate_supers encounters a super
- * block that has already started ->init_fs, it will wait until ->mount
- * and hence ->init_fs has finished, then check cleancache_poolid, see
- * that it has already been set and therefore do nothing. This proves
- * that we call ->init_fs no more than once for each super block.
- *
- * Combined together, the last two paragraphs prove the function
- * correctness.
- *
- * Note that various cleancache callbacks may proceed before this
- * function is called or even concurrently with it, but since
- * CLEANCACHE_NO_BACKEND is negative, they will all result in a noop
- * until the corresponding ->init_fs has been actually called and
- * cleancache_ops has been set.
- */
- iterate_supers(cleancache_register_ops_sb, NULL);
- return 0;
-}
-EXPORT_SYMBOL(cleancache_register_ops);
-
-/* Called by a cleancache-enabled filesystem at time of mount */
-void __cleancache_init_fs(struct super_block *sb)
-{
- int pool_id = CLEANCACHE_NO_BACKEND;
-
- if (cleancache_ops) {
- pool_id = cleancache_ops->init_fs(PAGE_SIZE);
- if (pool_id < 0)
- pool_id = CLEANCACHE_NO_POOL;
- }
- sb->cleancache_poolid = pool_id;
-}
-EXPORT_SYMBOL(__cleancache_init_fs);
-
-/* Called by a cleancache-enabled clustered filesystem at time of mount */
-void __cleancache_init_shared_fs(struct super_block *sb)
-{
- int pool_id = CLEANCACHE_NO_BACKEND_SHARED;
-
- if (cleancache_ops) {
- pool_id = cleancache_ops->init_shared_fs(&sb->s_uuid, PAGE_SIZE);
- if (pool_id < 0)
- pool_id = CLEANCACHE_NO_POOL;
- }
- sb->cleancache_poolid = pool_id;
-}
-EXPORT_SYMBOL(__cleancache_init_shared_fs);
-
-/*
- * If the filesystem uses exportable filehandles, use the filehandle as
- * the key, else use the inode number.
- */
-static int cleancache_get_key(struct inode *inode,
- struct cleancache_filekey *key)
-{
- int (*fhfn)(struct inode *, __u32 *fh, int *, struct inode *);
- int len = 0, maxlen = CLEANCACHE_KEY_MAX;
- struct super_block *sb = inode->i_sb;
-
- key->u.ino = inode->i_ino;
- if (sb->s_export_op != NULL) {
- fhfn = sb->s_export_op->encode_fh;
- if (fhfn) {
- len = (*fhfn)(inode, &key->u.fh[0], &maxlen, NULL);
- if (len <= FILEID_ROOT || len == FILEID_INVALID)
- return -1;
- if (maxlen > CLEANCACHE_KEY_MAX)
- return -1;
- }
- }
- return 0;
-}
-
-/*
- * "Get" data from cleancache associated with the poolid/inode/index
- * that were specified when the data was put to cleanache and, if
- * successful, use it to fill the specified page with data and return 0.
- * The pageframe is unchanged and returns -1 if the get fails.
- * Page must be locked by caller.
- *
- * The function has two checks before any action is taken - whether
- * a backend is registered and whether the sb->cleancache_poolid
- * is correct.
- */
-int __cleancache_get_page(struct page *page)
-{
- int ret = -1;
- int pool_id;
- struct cleancache_filekey key = { .u.key = { 0 } };
-
- if (!cleancache_ops) {
- cleancache_failed_gets++;
- goto out;
- }
-
- VM_BUG_ON_PAGE(!PageLocked(page), page);
- pool_id = page->mapping->host->i_sb->cleancache_poolid;
- if (pool_id < 0)
- goto out;
-
- if (cleancache_get_key(page->mapping->host, &key) < 0)
- goto out;
-
- ret = cleancache_ops->get_page(pool_id, key, page->index, page);
- if (ret == 0)
- cleancache_succ_gets++;
- else
- cleancache_failed_gets++;
-out:
- return ret;
-}
-EXPORT_SYMBOL(__cleancache_get_page);
-
-/*
- * "Put" data from a page to cleancache and associate it with the
- * (previously-obtained per-filesystem) poolid and the page's,
- * inode and page index. Page must be locked. Note that a put_page
- * always "succeeds", though a subsequent get_page may succeed or fail.
- *
- * The function has two checks before any action is taken - whether
- * a backend is registered and whether the sb->cleancache_poolid
- * is correct.
- */
-void __cleancache_put_page(struct page *page)
-{
- int pool_id;
- struct cleancache_filekey key = { .u.key = { 0 } };
-
- if (!cleancache_ops) {
- cleancache_puts++;
- return;
- }
-
- VM_BUG_ON_PAGE(!PageLocked(page), page);
- pool_id = page->mapping->host->i_sb->cleancache_poolid;
- if (pool_id >= 0 &&
- cleancache_get_key(page->mapping->host, &key) >= 0) {
- cleancache_ops->put_page(pool_id, key, page->index, page);
- cleancache_puts++;
- }
-}
-EXPORT_SYMBOL(__cleancache_put_page);
-
-/*
- * Invalidate any data from cleancache associated with the poolid and the
- * page's inode and page index so that a subsequent "get" will fail.
- *
- * The function has two checks before any action is taken - whether
- * a backend is registered and whether the sb->cleancache_poolid
- * is correct.
- */
-void __cleancache_invalidate_page(struct address_space *mapping,
- struct page *page)
-{
- /* careful... page->mapping is NULL sometimes when this is called */
- int pool_id = mapping->host->i_sb->cleancache_poolid;
- struct cleancache_filekey key = { .u.key = { 0 } };
-
- if (!cleancache_ops)
- return;
-
- if (pool_id >= 0) {
- VM_BUG_ON_PAGE(!PageLocked(page), page);
- if (cleancache_get_key(mapping->host, &key) >= 0) {
- cleancache_ops->invalidate_page(pool_id,
- key, page->index);
- cleancache_invalidates++;
- }
- }
-}
-EXPORT_SYMBOL(__cleancache_invalidate_page);
-
-/*
- * Invalidate all data from cleancache associated with the poolid and the
- * mappings's inode so that all subsequent gets to this poolid/inode
- * will fail.
- *
- * The function has two checks before any action is taken - whether
- * a backend is registered and whether the sb->cleancache_poolid
- * is correct.
- */
-void __cleancache_invalidate_inode(struct address_space *mapping)
-{
- int pool_id = mapping->host->i_sb->cleancache_poolid;
- struct cleancache_filekey key = { .u.key = { 0 } };
-
- if (!cleancache_ops)
- return;
-
- if (pool_id >= 0 && cleancache_get_key(mapping->host, &key) >= 0)
- cleancache_ops->invalidate_inode(pool_id, key);
-}
-EXPORT_SYMBOL(__cleancache_invalidate_inode);
-
-/*
- * Called by any cleancache-enabled filesystem at time of unmount;
- * note that pool_id is surrendered and may be returned by a subsequent
- * cleancache_init_fs or cleancache_init_shared_fs.
- */
-void __cleancache_invalidate_fs(struct super_block *sb)
-{
- int pool_id;
-
- pool_id = sb->cleancache_poolid;
- sb->cleancache_poolid = CLEANCACHE_NO_POOL;
-
- if (cleancache_ops && pool_id >= 0)
- cleancache_ops->invalidate_fs(pool_id);
-}
-EXPORT_SYMBOL(__cleancache_invalidate_fs);
-
-static int __init init_cleancache(void)
-{
-#ifdef CONFIG_DEBUG_FS
- struct dentry *root = debugfs_create_dir("cleancache", NULL);
-
- debugfs_create_u64("succ_gets", 0444, root, &cleancache_succ_gets);
- debugfs_create_u64("failed_gets", 0444, root, &cleancache_failed_gets);
- debugfs_create_u64("puts", 0444, root, &cleancache_puts);
- debugfs_create_u64("invalidates", 0444, root, &cleancache_invalidates);
-#endif
- return 0;
-}
-module_init(init_cleancache)
#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/swap.h>
+#include <linux/swapops.h>
#include <linux/mman.h>
#include <linux/pagemap.h>
#include <linux/file.h>
#include <linux/cpuset.h>
#include <linux/hugetlb.h>
#include <linux/memcontrol.h>
-#include <linux/cleancache.h>
#include <linux/shmem_fs.h>
#include <linux/rmap.h>
#include <linux/delayacct.h>
#include <linux/psi.h>
#include <linux/ramfs.h>
#include <linux/page_idle.h>
+#include <linux/migrate.h>
#include <asm/pgalloc.h>
#include <asm/tlbflush.h>
#include "internal.h"
{
long nr;
- /*
- * if we're uptodate, flush out into the cleancache, otherwise
- * invalidate any existing cleancache entries. We can't leave
- * stale data around in the cleancache once our page is gone
- */
- if (folio_test_uptodate(folio) && folio_test_mappedtodisk(folio))
- cleancache_put_page(&folio->page);
- else
- cleancache_invalidate_page(mapping, &folio->page);
-
VM_BUG_ON_FOLIO(folio_mapped(folio), folio);
if (!IS_ENABLED(CONFIG_DEBUG_VM) && unlikely(folio_mapped(folio))) {
int mapcount;
void filemap_free_folio(struct address_space *mapping, struct folio *folio)
{
void (*freepage)(struct page *);
+ int refs = 1;
freepage = mapping->a_ops->freepage;
if (freepage)
freepage(&folio->page);
- if (folio_test_large(folio) && !folio_test_hugetlb(folio)) {
- folio_ref_sub(folio, folio_nr_pages(folio));
- VM_BUG_ON_FOLIO(folio_ref_count(folio) <= 0, folio);
- } else {
- folio_put(folio);
- }
+ if (folio_test_large(folio) && !folio_test_hugetlb(folio))
+ refs = folio_nr_pages(folio);
+ folio_put_refs(folio, refs);
}
/**
return wait->flags & WQ_FLAG_WOKEN ? 0 : -EINTR;
}
+#ifdef CONFIG_MIGRATION
+/**
+ * migration_entry_wait_on_locked - Wait for a migration entry to be removed
+ * @entry: migration swap entry.
+ * @ptep: mapped pte pointer. Will return with the ptep unmapped. Only required
+ * for pte entries, pass NULL for pmd entries.
+ * @ptl: already locked ptl. This function will drop the lock.
+ *
+ * Wait for a migration entry referencing the given page to be removed. This is
+ * equivalent to put_and_wait_on_page_locked(page, TASK_UNINTERRUPTIBLE) except
+ * this can be called without taking a reference on the page. Instead this
+ * should be called while holding the ptl for the migration entry referencing
+ * the page.
+ *
+ * Returns after unmapping and unlocking the pte/ptl with pte_unmap_unlock().
+ *
+ * This follows the same logic as folio_wait_bit_common() so see the comments
+ * there.
+ */
+void migration_entry_wait_on_locked(swp_entry_t entry, pte_t *ptep,
+ spinlock_t *ptl)
+{
+ struct wait_page_queue wait_page;
+ wait_queue_entry_t *wait = &wait_page.wait;
+ bool thrashing = false;
+ bool delayacct = false;
+ unsigned long pflags;
+ wait_queue_head_t *q;
+ struct folio *folio = page_folio(pfn_swap_entry_to_page(entry));
+
+ q = folio_waitqueue(folio);
+ if (!folio_test_uptodate(folio) && folio_test_workingset(folio)) {
+ if (!folio_test_swapbacked(folio)) {
+ delayacct_thrashing_start();
+ delayacct = true;
+ }
+ psi_memstall_enter(&pflags);
+ thrashing = true;
+ }
+
+ init_wait(wait);
+ wait->func = wake_page_function;
+ wait_page.folio = folio;
+ wait_page.bit_nr = PG_locked;
+ wait->flags = 0;
+
+ spin_lock_irq(&q->lock);
+ folio_set_waiters(folio);
+ if (!folio_trylock_flag(folio, PG_locked, wait))
+ __add_wait_queue_entry_tail(q, wait);
+ spin_unlock_irq(&q->lock);
+
+ /*
+ * If a migration entry exists for the page the migration path must hold
+ * a valid reference to the page, and it must take the ptl to remove the
+ * migration entry. So the page is valid until the ptl is dropped.
+ */
+ if (ptep)
+ pte_unmap_unlock(ptep, ptl);
+ else
+ spin_unlock(ptl);
+
+ for (;;) {
+ unsigned int flags;
+
+ set_current_state(TASK_UNINTERRUPTIBLE);
+
+ /* Loop until we've been woken or interrupted */
+ flags = smp_load_acquire(&wait->flags);
+ if (!(flags & WQ_FLAG_WOKEN)) {
+ if (signal_pending_state(TASK_UNINTERRUPTIBLE, current))
+ break;
+
+ io_schedule();
+ continue;
+ }
+ break;
+ }
+
+ finish_wait(q, wait);
+
+ if (thrashing) {
+ if (delayacct)
+ delayacct_thrashing_end();
+ psi_memstall_leave(&pflags);
+ }
+}
+#endif
+
void folio_wait_bit(struct folio *folio, int bit_nr)
{
folio_wait_bit_common(folio, bit_nr, TASK_UNINTERRUPTIBLE, SHARED);
* may be registered, but implementations can never deregister. This
* is a simple singly-linked list of all registered implementations.
*/
-static struct frontswap_ops *frontswap_ops __read_mostly;
-
-#define for_each_frontswap_ops(ops) \
- for ((ops) = frontswap_ops; (ops); (ops) = (ops)->next)
-
-/*
- * If enabled, frontswap_store will return failure even on success. As
- * a result, the swap subsystem will always write the page to swap, in
- * effect converting frontswap into a writethrough cache. In this mode,
- * there is no direct reduction in swap writes, but a frontswap backend
- * can unilaterally "reclaim" any pages in use with no data loss, thus
- * providing increases control over maximum memory usage due to frontswap.
- */
-static bool frontswap_writethrough_enabled __read_mostly;
-
-/*
- * If enabled, the underlying tmem implementation is capable of doing
- * exclusive gets, so frontswap_load, on a successful tmem_get must
- * mark the page as no longer in frontswap AND mark it dirty.
- */
-static bool frontswap_tmem_exclusive_gets_enabled __read_mostly;
+static const struct frontswap_ops *frontswap_ops __read_mostly;
#ifdef CONFIG_DEBUG_FS
/*
/*
* Register operations for frontswap
*/
-void frontswap_register_ops(struct frontswap_ops *ops)
+int frontswap_register_ops(const struct frontswap_ops *ops)
{
- DECLARE_BITMAP(a, MAX_SWAPFILES);
- DECLARE_BITMAP(b, MAX_SWAPFILES);
- struct swap_info_struct *si;
- unsigned int i;
-
- bitmap_zero(a, MAX_SWAPFILES);
- bitmap_zero(b, MAX_SWAPFILES);
-
- spin_lock(&swap_lock);
- plist_for_each_entry(si, &swap_active_head, list) {
- if (!WARN_ON(!si->frontswap_map))
- __set_bit(si->type, a);
- }
- spin_unlock(&swap_lock);
-
- /* the new ops needs to know the currently active swap devices */
- for_each_set_bit(i, a, MAX_SWAPFILES)
- ops->init(i);
-
- /*
- * Setting frontswap_ops must happen after the ops->init() calls
- * above; cmpxchg implies smp_mb() which will ensure the init is
- * complete at this point.
- */
- do {
- ops->next = frontswap_ops;
- } while (cmpxchg(&frontswap_ops, ops->next, ops) != ops->next);
+ if (frontswap_ops)
+ return -EINVAL;
+ frontswap_ops = ops;
static_branch_inc(&frontswap_enabled_key);
-
- spin_lock(&swap_lock);
- plist_for_each_entry(si, &swap_active_head, list) {
- if (si->frontswap_map)
- __set_bit(si->type, b);
- }
- spin_unlock(&swap_lock);
-
- /*
- * On the very unlikely chance that a swap device was added or
- * removed between setting the "a" list bits and the ops init
- * calls, we re-check and do init or invalidate for any changed
- * bits.
- */
- if (unlikely(!bitmap_equal(a, b, MAX_SWAPFILES))) {
- for (i = 0; i < MAX_SWAPFILES; i++) {
- if (!test_bit(i, a) && test_bit(i, b))
- ops->init(i);
- else if (test_bit(i, a) && !test_bit(i, b))
- ops->invalidate_area(i);
- }
- }
-}
-EXPORT_SYMBOL(frontswap_register_ops);
-
-/*
- * Enable/disable frontswap writethrough (see above).
- */
-void frontswap_writethrough(bool enable)
-{
- frontswap_writethrough_enabled = enable;
-}
-EXPORT_SYMBOL(frontswap_writethrough);
-
-/*
- * Enable/disable frontswap exclusive gets (see above).
- */
-void frontswap_tmem_exclusive_gets(bool enable)
-{
- frontswap_tmem_exclusive_gets_enabled = enable;
+ return 0;
}
-EXPORT_SYMBOL(frontswap_tmem_exclusive_gets);
/*
* Called when a swap device is swapon'd.
*/
-void __frontswap_init(unsigned type, unsigned long *map)
+void frontswap_init(unsigned type, unsigned long *map)
{
struct swap_info_struct *sis = swap_info[type];
- struct frontswap_ops *ops;
VM_BUG_ON(sis == NULL);
* p->frontswap set to something valid to work properly.
*/
frontswap_map_set(sis, map);
-
- for_each_frontswap_ops(ops)
- ops->init(type);
+ frontswap_ops->init(type);
}
-EXPORT_SYMBOL(__frontswap_init);
-bool __frontswap_test(struct swap_info_struct *sis,
+static bool __frontswap_test(struct swap_info_struct *sis,
pgoff_t offset)
{
if (sis->frontswap_map)
return test_bit(offset, sis->frontswap_map);
return false;
}
-EXPORT_SYMBOL(__frontswap_test);
static inline void __frontswap_set(struct swap_info_struct *sis,
pgoff_t offset)
int type = swp_type(entry);
struct swap_info_struct *sis = swap_info[type];
pgoff_t offset = swp_offset(entry);
- struct frontswap_ops *ops;
VM_BUG_ON(!frontswap_ops);
VM_BUG_ON(!PageLocked(page));
*/
if (__frontswap_test(sis, offset)) {
__frontswap_clear(sis, offset);
- for_each_frontswap_ops(ops)
- ops->invalidate_page(type, offset);
+ frontswap_ops->invalidate_page(type, offset);
}
- /* Try to store in each implementation, until one succeeds. */
- for_each_frontswap_ops(ops) {
- ret = ops->store(type, offset, page);
- if (!ret) /* successful store */
- break;
- }
+ ret = frontswap_ops->store(type, offset, page);
if (ret == 0) {
__frontswap_set(sis, offset);
inc_frontswap_succ_stores();
} else {
inc_frontswap_failed_stores();
}
- if (frontswap_writethrough_enabled)
- /* report failure so swap also writes to swap device */
- ret = -1;
+
return ret;
}
-EXPORT_SYMBOL(__frontswap_store);
/*
* "Get" data from frontswap associated with swaptype and offset that were
int type = swp_type(entry);
struct swap_info_struct *sis = swap_info[type];
pgoff_t offset = swp_offset(entry);
- struct frontswap_ops *ops;
VM_BUG_ON(!frontswap_ops);
VM_BUG_ON(!PageLocked(page));
return -1;
/* Try loading from each implementation, until one succeeds. */
- for_each_frontswap_ops(ops) {
- ret = ops->load(type, offset, page);
- if (!ret) /* successful load */
- break;
- }
- if (ret == 0) {
+ ret = frontswap_ops->load(type, offset, page);
+ if (ret == 0)
inc_frontswap_loads();
- if (frontswap_tmem_exclusive_gets_enabled) {
- SetPageDirty(page);
- __frontswap_clear(sis, offset);
- }
- }
return ret;
}
-EXPORT_SYMBOL(__frontswap_load);
/*
* Invalidate any data from frontswap associated with the specified swaptype
void __frontswap_invalidate_page(unsigned type, pgoff_t offset)
{
struct swap_info_struct *sis = swap_info[type];
- struct frontswap_ops *ops;
VM_BUG_ON(!frontswap_ops);
VM_BUG_ON(sis == NULL);
if (!__frontswap_test(sis, offset))
return;
- for_each_frontswap_ops(ops)
- ops->invalidate_page(type, offset);
+ frontswap_ops->invalidate_page(type, offset);
__frontswap_clear(sis, offset);
inc_frontswap_invalidates();
}
-EXPORT_SYMBOL(__frontswap_invalidate_page);
/*
* Invalidate all data from frontswap associated with all offsets for the
void __frontswap_invalidate_area(unsigned type)
{
struct swap_info_struct *sis = swap_info[type];
- struct frontswap_ops *ops;
VM_BUG_ON(!frontswap_ops);
VM_BUG_ON(sis == NULL);
if (sis->frontswap_map == NULL)
return;
- for_each_frontswap_ops(ops)
- ops->invalidate_area(type);
+ frontswap_ops->invalidate_area(type);
atomic_set(&sis->frontswap_pages, 0);
bitmap_zero(sis->frontswap_map, sis->max);
}
-EXPORT_SYMBOL(__frontswap_invalidate_area);
-
-static unsigned long __frontswap_curr_pages(void)
-{
- unsigned long totalpages = 0;
- struct swap_info_struct *si = NULL;
-
- assert_spin_locked(&swap_lock);
- plist_for_each_entry(si, &swap_active_head, list)
- totalpages += atomic_read(&si->frontswap_pages);
- return totalpages;
-}
-
-static int __frontswap_unuse_pages(unsigned long total, unsigned long *unused,
- int *swapid)
-{
- int ret = -EINVAL;
- struct swap_info_struct *si = NULL;
- int si_frontswap_pages;
- unsigned long total_pages_to_unuse = total;
- unsigned long pages = 0, pages_to_unuse = 0;
-
- assert_spin_locked(&swap_lock);
- plist_for_each_entry(si, &swap_active_head, list) {
- si_frontswap_pages = atomic_read(&si->frontswap_pages);
- if (total_pages_to_unuse < si_frontswap_pages) {
- pages = pages_to_unuse = total_pages_to_unuse;
- } else {
- pages = si_frontswap_pages;
- pages_to_unuse = 0; /* unuse all */
- }
- /* ensure there is enough RAM to fetch pages from frontswap */
- if (security_vm_enough_memory_mm(current->mm, pages)) {
- ret = -ENOMEM;
- continue;
- }
- vm_unacct_memory(pages);
- *unused = pages_to_unuse;
- *swapid = si->type;
- ret = 0;
- break;
- }
-
- return ret;
-}
-
-/*
- * Used to check if it's necessary and feasible to unuse pages.
- * Return 1 when nothing to do, 0 when need to shrink pages,
- * error code when there is an error.
- */
-static int __frontswap_shrink(unsigned long target_pages,
- unsigned long *pages_to_unuse,
- int *type)
-{
- unsigned long total_pages = 0, total_pages_to_unuse;
-
- assert_spin_locked(&swap_lock);
-
- total_pages = __frontswap_curr_pages();
- if (total_pages <= target_pages) {
- /* Nothing to do */
- *pages_to_unuse = 0;
- return 1;
- }
- total_pages_to_unuse = total_pages - target_pages;
- return __frontswap_unuse_pages(total_pages_to_unuse, pages_to_unuse, type);
-}
-
-/*
- * Frontswap, like a true swap device, may unnecessarily retain pages
- * under certain circumstances; "shrink" frontswap is essentially a
- * "partial swapoff" and works by calling try_to_unuse to attempt to
- * unuse enough frontswap pages to attempt to -- subject to memory
- * constraints -- reduce the number of pages in frontswap to the
- * number given in the parameter target_pages.
- */
-void frontswap_shrink(unsigned long target_pages)
-{
- unsigned long pages_to_unuse = 0;
- int type, ret;
-
- /*
- * we don't want to hold swap_lock while doing a very
- * lengthy try_to_unuse, but swap_list may change
- * so restart scan from swap_active_head each time
- */
- spin_lock(&swap_lock);
- ret = __frontswap_shrink(target_pages, &pages_to_unuse, &type);
- spin_unlock(&swap_lock);
- if (ret == 0)
- try_to_unuse(type, true, pages_to_unuse);
- return;
-}
-EXPORT_SYMBOL(frontswap_shrink);
-
-/*
- * Count and return the number of frontswap pages across all
- * swap devices. This is exported so that backend drivers can
- * determine current usage without reading debugfs.
- */
-unsigned long frontswap_curr_pages(void)
-{
- unsigned long totalpages = 0;
-
- spin_lock(&swap_lock);
- totalpages = __frontswap_curr_pages();
- spin_unlock(&swap_lock);
-
- return totalpages;
-}
-EXPORT_SYMBOL(frontswap_curr_pages);
static int __init init_frontswap(void)
{
unsigned int nr_entries;
nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 0);
- nr_entries = filter_irq_stacks(entries, nr_entries);
return __stack_depot_save(entries, nr_entries, flags, can_alloc);
}
if (unlikely(!si))
goto out;
- delayacct_set_flag(current, DELAYACCT_PF_SWAPIN);
page = lookup_swap_cache(entry, vma, vmf->address);
swapcache = page;
vmf->address, &vmf->ptl);
if (likely(pte_same(*vmf->pte, vmf->orig_pte)))
ret = VM_FAULT_OOM;
- delayacct_clear_flag(current, DELAYACCT_PF_SWAPIN);
goto unlock;
}
* owner processes (which may be unknown at hwpoison time)
*/
ret = VM_FAULT_HWPOISON;
- delayacct_clear_flag(current, DELAYACCT_PF_SWAPIN);
goto out_release;
}
locked = lock_page_or_retry(page, vma->vm_mm, vmf->flags);
- delayacct_clear_flag(current, DELAYACCT_PF_SWAPIN);
if (!locked) {
ret |= VM_FAULT_RETRY;
goto out_release;
{
pte_t pte;
swp_entry_t entry;
- struct folio *folio;
spin_lock(ptl);
pte = *ptep;
if (!is_migration_entry(entry))
goto out;
- folio = page_folio(pfn_swap_entry_to_page(entry));
-
- /*
- * Once page cache replacement of page migration started, page_count
- * is zero; but we must not call folio_put_wait_locked() without
- * a ref. Use folio_try_get(), and just fault again if it fails.
- */
- if (!folio_try_get(folio))
- goto out;
- pte_unmap_unlock(ptep, ptl);
- folio_put_wait_locked(folio, TASK_UNINTERRUPTIBLE);
+ migration_entry_wait_on_locked(entry, ptep, ptl);
return;
out:
pte_unmap_unlock(ptep, ptl);
void pmd_migration_entry_wait(struct mm_struct *mm, pmd_t *pmd)
{
spinlock_t *ptl;
- struct folio *folio;
ptl = pmd_lock(mm, pmd);
if (!is_pmd_migration_entry(*pmd))
goto unlock;
- folio = page_folio(pfn_swap_entry_to_page(pmd_to_swp_entry(*pmd)));
- if (!folio_try_get(folio))
- goto unlock;
- spin_unlock(ptl);
- folio_put_wait_locked(folio, TASK_UNINTERRUPTIBLE);
+ migration_entry_wait_on_locked(pmd_to_swp_entry(*pmd), NULL, ptl);
return;
unlock:
spin_unlock(ptl);
return false;
/* Page from ZONE_DEVICE have one extra reference */
- if (is_zone_device_page(page)) {
- /*
- * Private page can never be pin as they have no valid pte and
- * GUP will fail for those. Yet if there is a pending migration
- * a thread might try to wait on the pte migration entry and
- * will bump the page reference count. Sadly there is no way to
- * differentiate a regular pin from migration wait. Hence to
- * avoid 2 racing thread trying to migrate back to CPU to enter
- * infinite loop (one stopping migration because the other is
- * waiting on pte migration entry). We always return true here.
- *
- * FIXME proper solution is to rework migration_entry_wait() so
- * it does not need to take a reference on page.
- */
- return is_device_private_page(page);
- }
+ if (is_zone_device_page(page))
+ extra++;
/* For file back page */
if (page_mapping(page))
#include <linux/padata.h>
#include <linux/khugepaged.h>
#include <linux/buffer_head.h>
+#include <linux/delayacct.h>
#include <asm/sections.h>
#include <asm/tlbflush.h>
#include <asm/div64.h>
return NULL;
psi_memstall_enter(&pflags);
+ delayacct_compact_start();
noreclaim_flag = memalloc_noreclaim_save();
*compact_result = try_to_compact_pages(gfp_mask, order, alloc_flags, ac,
memalloc_noreclaim_restore(noreclaim_flag);
psi_memstall_leave(&pflags);
+ delayacct_compact_end();
if (*compact_result == COMPACT_SKIPPED)
return NULL;
#include <linux/psi.h>
#include <linux/uio.h>
#include <linux/sched/task.h>
+#include <linux/delayacct.h>
void end_swap_bio_write(struct bio *bio)
{
* significant part of overall IO time.
*/
psi_memstall_enter(&pflags);
+ delayacct_swapin_start();
if (frontswap_load(page) == 0) {
SetPageUptodate(page);
out:
psi_memstall_leave(&pflags);
+ delayacct_swapin_end();
return ret;
}
if (!page_owner_enabled)
return;
+ stack_depot_init();
+
register_dummy_stack();
register_failure_stack();
register_early_stack();
{
struct pcpu_block_md *block = chunk->md_blocks + index;
unsigned long *alloc_map = pcpu_index_alloc_map(chunk, index);
- unsigned int rs, re, start; /* region start, region end */
+ unsigned int start, end; /* region start, region end */
/* promote scan_hint to contig_hint */
if (block->scan_hint) {
block->right_free = 0;
/* iterate over free areas and update the contig hints */
- bitmap_for_each_clear_region(alloc_map, rs, re, start,
- PCPU_BITMAP_BLOCK_BITS)
- pcpu_block_update(block, rs, re);
+ for_each_clear_bitrange_from(start, end, alloc_map, PCPU_BITMAP_BLOCK_BITS)
+ pcpu_block_update(block, start, end);
}
/**
static bool pcpu_is_populated(struct pcpu_chunk *chunk, int bit_off, int bits,
int *next_off)
{
- unsigned int page_start, page_end, rs, re;
+ unsigned int start, end;
- page_start = PFN_DOWN(bit_off * PCPU_MIN_ALLOC_SIZE);
- page_end = PFN_UP((bit_off + bits) * PCPU_MIN_ALLOC_SIZE);
+ start = PFN_DOWN(bit_off * PCPU_MIN_ALLOC_SIZE);
+ end = PFN_UP((bit_off + bits) * PCPU_MIN_ALLOC_SIZE);
- rs = page_start;
- bitmap_next_clear_region(chunk->populated, &rs, &re, page_end);
- if (rs >= page_end)
+ start = find_next_zero_bit(chunk->populated, end, start);
+ if (start >= end)
return true;
- *next_off = re * PAGE_SIZE / PCPU_MIN_ALLOC_SIZE;
+ end = find_next_bit(chunk->populated, end, start + 1);
+
+ *next_off = end * PAGE_SIZE / PCPU_MIN_ALLOC_SIZE;
return false;
}
/* populate if not all pages are already there */
if (!is_atomic) {
- unsigned int page_start, page_end, rs, re;
+ unsigned int page_end, rs, re;
- page_start = PFN_DOWN(off);
+ rs = PFN_DOWN(off);
page_end = PFN_UP(off + size);
- bitmap_for_each_clear_region(chunk->populated, rs, re,
- page_start, page_end) {
+ for_each_clear_bitrange_from(rs, re, chunk->populated, page_end) {
WARN_ON(chunk->immutable);
ret = pcpu_populate_chunk(chunk, rs, re, pcpu_gfp);
list_for_each_entry_safe(chunk, next, &to_free, list) {
unsigned int rs, re;
- bitmap_for_each_set_region(chunk->populated, rs, re, 0,
- chunk->nr_pages) {
+ for_each_set_bitrange(rs, re, chunk->populated, chunk->nr_pages) {
pcpu_depopulate_chunk(chunk, rs, re);
spin_lock_irq(&pcpu_lock);
pcpu_chunk_depopulated(chunk, rs, re);
continue;
/* @chunk can't go away while pcpu_alloc_mutex is held */
- bitmap_for_each_clear_region(chunk->populated, rs, re, 0,
- chunk->nr_pages) {
+ for_each_clear_bitrange(rs, re, chunk->populated, chunk->nr_pages) {
int nr = min_t(int, re - rs, nr_to_pop);
spin_unlock_irq(&pcpu_lock);
return ai;
}
+
+static void * __init pcpu_fc_alloc(unsigned int cpu, size_t size, size_t align,
+ pcpu_fc_cpu_to_node_fn_t cpu_to_nd_fn)
+{
+ const unsigned long goal = __pa(MAX_DMA_ADDRESS);
+#ifdef CONFIG_NUMA
+ int node = NUMA_NO_NODE;
+ void *ptr;
+
+ if (cpu_to_nd_fn)
+ node = cpu_to_nd_fn(cpu);
+
+ if (node == NUMA_NO_NODE || !node_online(node) || !NODE_DATA(node)) {
+ ptr = memblock_alloc_from(size, align, goal);
+ pr_info("cpu %d has no node %d or node-local memory\n",
+ cpu, node);
+ pr_debug("per cpu data for cpu%d %zu bytes at 0x%llx\n",
+ cpu, size, (u64)__pa(ptr));
+ } else {
+ ptr = memblock_alloc_try_nid(size, align, goal,
+ MEMBLOCK_ALLOC_ACCESSIBLE,
+ node);
+
+ pr_debug("per cpu data for cpu%d %zu bytes on node%d at 0x%llx\n",
+ cpu, size, node, (u64)__pa(ptr));
+ }
+ return ptr;
+#else
+ return memblock_alloc_from(size, align, goal);
+#endif
+}
+
+static void __init pcpu_fc_free(void *ptr, size_t size)
+{
+ memblock_free(ptr, size);
+}
#endif /* BUILD_EMBED_FIRST_CHUNK || BUILD_PAGE_FIRST_CHUNK */
#if defined(BUILD_EMBED_FIRST_CHUNK)
* @dyn_size: minimum free size for dynamic allocation in bytes
* @atom_size: allocation atom size
* @cpu_distance_fn: callback to determine distance between cpus, optional
- * @alloc_fn: function to allocate percpu page
- * @free_fn: function to free percpu page
+ * @cpu_to_nd_fn: callback to convert cpu to it's node, optional
*
* This is a helper to ease setting up embedded first percpu chunk and
* can be called where pcpu_setup_first_chunk() is expected.
*
* If this function is used to setup the first chunk, it is allocated
- * by calling @alloc_fn and used as-is without being mapped into
+ * by calling pcpu_fc_alloc and used as-is without being mapped into
* vmalloc area. Allocations are always whole multiples of @atom_size
* aligned to @atom_size.
*
* @dyn_size specifies the minimum dynamic area size.
*
* If the needed size is smaller than the minimum or specified unit
- * size, the leftover is returned using @free_fn.
+ * size, the leftover is returned using pcpu_fc_free.
*
* RETURNS:
* 0 on success, -errno on failure.
int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size,
size_t atom_size,
pcpu_fc_cpu_distance_fn_t cpu_distance_fn,
- pcpu_fc_alloc_fn_t alloc_fn,
- pcpu_fc_free_fn_t free_fn)
+ pcpu_fc_cpu_to_node_fn_t cpu_to_nd_fn)
{
void *base = (void *)ULONG_MAX;
void **areas = NULL;
BUG_ON(cpu == NR_CPUS);
/* allocate space for the whole group */
- ptr = alloc_fn(cpu, gi->nr_units * ai->unit_size, atom_size);
+ ptr = pcpu_fc_alloc(cpu, gi->nr_units * ai->unit_size, atom_size, cpu_to_nd_fn);
if (!ptr) {
rc = -ENOMEM;
goto out_free_areas;
for (i = 0; i < gi->nr_units; i++, ptr += ai->unit_size) {
if (gi->cpu_map[i] == NR_CPUS) {
/* unused unit, free whole */
- free_fn(ptr, ai->unit_size);
+ pcpu_fc_free(ptr, ai->unit_size);
continue;
}
/* copy and return the unused part */
memcpy(ptr, __per_cpu_load, ai->static_size);
- free_fn(ptr + size_sum, ai->unit_size - size_sum);
+ pcpu_fc_free(ptr + size_sum, ai->unit_size - size_sum);
}
}
out_free_areas:
for (group = 0; group < ai->nr_groups; group++)
if (areas[group])
- free_fn(areas[group],
+ pcpu_fc_free(areas[group],
ai->groups[group].nr_units * ai->unit_size);
out_free:
pcpu_free_alloc_info(ai);
#endif /* BUILD_EMBED_FIRST_CHUNK */
#ifdef BUILD_PAGE_FIRST_CHUNK
+#include <asm/pgalloc.h>
+
+#ifndef P4D_TABLE_SIZE
+#define P4D_TABLE_SIZE PAGE_SIZE
+#endif
+
+#ifndef PUD_TABLE_SIZE
+#define PUD_TABLE_SIZE PAGE_SIZE
+#endif
+
+#ifndef PMD_TABLE_SIZE
+#define PMD_TABLE_SIZE PAGE_SIZE
+#endif
+
+#ifndef PTE_TABLE_SIZE
+#define PTE_TABLE_SIZE PAGE_SIZE
+#endif
+void __init __weak pcpu_populate_pte(unsigned long addr)
+{
+ pgd_t *pgd = pgd_offset_k(addr);
+ p4d_t *p4d;
+ pud_t *pud;
+ pmd_t *pmd;
+
+ if (pgd_none(*pgd)) {
+ p4d_t *new;
+
+ new = memblock_alloc(P4D_TABLE_SIZE, P4D_TABLE_SIZE);
+ if (!new)
+ goto err_alloc;
+ pgd_populate(&init_mm, pgd, new);
+ }
+
+ p4d = p4d_offset(pgd, addr);
+ if (p4d_none(*p4d)) {
+ pud_t *new;
+
+ new = memblock_alloc(PUD_TABLE_SIZE, PUD_TABLE_SIZE);
+ if (!new)
+ goto err_alloc;
+ p4d_populate(&init_mm, p4d, new);
+ }
+
+ pud = pud_offset(p4d, addr);
+ if (pud_none(*pud)) {
+ pmd_t *new;
+
+ new = memblock_alloc(PMD_TABLE_SIZE, PMD_TABLE_SIZE);
+ if (!new)
+ goto err_alloc;
+ pud_populate(&init_mm, pud, new);
+ }
+
+ pmd = pmd_offset(pud, addr);
+ if (!pmd_present(*pmd)) {
+ pte_t *new;
+
+ new = memblock_alloc(PTE_TABLE_SIZE, PTE_TABLE_SIZE);
+ if (!new)
+ goto err_alloc;
+ pmd_populate_kernel(&init_mm, pmd, new);
+ }
+
+ return;
+
+err_alloc:
+ panic("%s: Failed to allocate memory\n", __func__);
+}
+
/**
* pcpu_page_first_chunk - map the first chunk using PAGE_SIZE pages
* @reserved_size: the size of reserved percpu area in bytes
- * @alloc_fn: function to allocate percpu page, always called with PAGE_SIZE
- * @free_fn: function to free percpu page, always called with PAGE_SIZE
- * @populate_pte_fn: function to populate pte
+ * @cpu_to_nd_fn: callback to convert cpu to it's node, optional
*
* This is a helper to ease setting up page-remapped first percpu
* chunk and can be called where pcpu_setup_first_chunk() is expected.
* RETURNS:
* 0 on success, -errno on failure.
*/
-int __init pcpu_page_first_chunk(size_t reserved_size,
- pcpu_fc_alloc_fn_t alloc_fn,
- pcpu_fc_free_fn_t free_fn,
- pcpu_fc_populate_pte_fn_t populate_pte_fn)
+int __init pcpu_page_first_chunk(size_t reserved_size, pcpu_fc_cpu_to_node_fn_t cpu_to_nd_fn)
{
static struct vm_struct vm;
struct pcpu_alloc_info *ai;
for (i = 0; i < unit_pages; i++) {
void *ptr;
- ptr = alloc_fn(cpu, PAGE_SIZE, PAGE_SIZE);
+ ptr = pcpu_fc_alloc(cpu, PAGE_SIZE, PAGE_SIZE, cpu_to_nd_fn);
if (!ptr) {
pr_warn("failed to allocate %s page for cpu%u\n",
psize_str, cpu);
(unsigned long)vm.addr + unit * ai->unit_size;
for (i = 0; i < unit_pages; i++)
- populate_pte_fn(unit_addr + (i << PAGE_SHIFT));
+ pcpu_populate_pte(unit_addr + (i << PAGE_SHIFT));
/* pte already populated, the following shouldn't fail */
rc = __pcpu_map_pages(unit_addr, &pages[unit * unit_pages],
enomem:
while (--j >= 0)
- free_fn(page_address(pages[j]), PAGE_SIZE);
+ pcpu_fc_free(page_address(pages[j]), PAGE_SIZE);
rc = -ENOMEM;
out_free_ar:
memblock_free(pages, pages_size);
unsigned long __per_cpu_offset[NR_CPUS] __read_mostly;
EXPORT_SYMBOL(__per_cpu_offset);
-static void * __init pcpu_dfl_fc_alloc(unsigned int cpu, size_t size,
- size_t align)
-{
- return memblock_alloc_from(size, align, __pa(MAX_DMA_ADDRESS));
-}
-
-static void __init pcpu_dfl_fc_free(void *ptr, size_t size)
-{
- memblock_free(ptr, size);
-}
-
void __init setup_per_cpu_areas(void)
{
unsigned long delta;
* Always reserve area for module percpu variables. That's
* what the legacy allocator did.
*/
- rc = pcpu_embed_first_chunk(PERCPU_MODULE_RESERVE,
- PERCPU_DYNAMIC_RESERVE, PAGE_SIZE, NULL,
- pcpu_dfl_fc_alloc, pcpu_dfl_fc_free);
+ rc = pcpu_embed_first_chunk(PERCPU_MODULE_RESERVE, PERCPU_DYNAMIC_RESERVE,
+ PAGE_SIZE, NULL, NULL);
if (rc < 0)
panic("Failed to initialize percpu areas.");
#include <linux/uio.h>
#include <linux/khugepaged.h>
#include <linux/hugetlb.h>
-#include <linux/frontswap.h>
#include <linux/fs_parser.h>
#include <linux/swapfile.h>
static int shmem_find_swap_entries(struct address_space *mapping,
pgoff_t start, unsigned int nr_entries,
struct page **entries, pgoff_t *indices,
- unsigned int type, bool frontswap)
+ unsigned int type)
{
XA_STATE(xas, &mapping->i_pages, start);
struct page *page;
entry = radix_to_swp_entry(page);
if (swp_type(entry) != type)
continue;
- if (frontswap &&
- !frontswap_test(swap_info[type], swp_offset(entry)))
- continue;
indices[ret] = xas.xa_index;
entries[ret] = page;
/*
* If swap found in inode, free it and move page from swapcache to filecache.
*/
-static int shmem_unuse_inode(struct inode *inode, unsigned int type,
- bool frontswap, unsigned long *fs_pages_to_unuse)
+static int shmem_unuse_inode(struct inode *inode, unsigned int type)
{
struct address_space *mapping = inode->i_mapping;
pgoff_t start = 0;
struct pagevec pvec;
pgoff_t indices[PAGEVEC_SIZE];
- bool frontswap_partial = (frontswap && *fs_pages_to_unuse > 0);
int ret = 0;
pagevec_init(&pvec);
do {
unsigned int nr_entries = PAGEVEC_SIZE;
- if (frontswap_partial && *fs_pages_to_unuse < PAGEVEC_SIZE)
- nr_entries = *fs_pages_to_unuse;
-
pvec.nr = shmem_find_swap_entries(mapping, start, nr_entries,
- pvec.pages, indices,
- type, frontswap);
+ pvec.pages, indices, type);
if (pvec.nr == 0) {
ret = 0;
break;
if (ret < 0)
break;
- if (frontswap_partial) {
- *fs_pages_to_unuse -= ret;
- if (*fs_pages_to_unuse == 0) {
- ret = FRONTSWAP_PAGES_UNUSED;
- break;
- }
- }
-
start = indices[pvec.nr - 1];
} while (true);
* device 'type' back into memory, so the swap device can be
* unused.
*/
-int shmem_unuse(unsigned int type, bool frontswap,
- unsigned long *fs_pages_to_unuse)
+int shmem_unuse(unsigned int type)
{
struct shmem_inode_info *info, *next;
int error = 0;
atomic_inc(&info->stop_eviction);
mutex_unlock(&shmem_swaplist_mutex);
- error = shmem_unuse_inode(&info->vfs_inode, type, frontswap,
- fs_pages_to_unuse);
+ error = shmem_unuse_inode(&info->vfs_inode, type);
cond_resched();
mutex_lock(&shmem_swaplist_mutex);
return 0;
}
-int shmem_unuse(unsigned int type, bool frontswap,
- unsigned long *fs_pages_to_unuse)
+int shmem_unuse(unsigned int type)
{
return 0;
}
unsigned char);
static void free_swap_count_continuations(struct swap_info_struct *);
-DEFINE_SPINLOCK(swap_lock);
+static DEFINE_SPINLOCK(swap_lock);
static unsigned int nr_swapfiles;
atomic_long_t nr_swap_pages;
/*
* all active swap_info_structs
* protected with swap_lock, and ordered by priority.
*/
-PLIST_HEAD(swap_active_head);
+static PLIST_HEAD(swap_active_head);
/*
* all available (active, not full) swap_info_structs
static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
unsigned long addr, unsigned long end,
- unsigned int type, bool frontswap,
- unsigned long *fs_pages_to_unuse)
+ unsigned int type)
{
struct page *page;
swp_entry_t entry;
continue;
offset = swp_offset(entry);
- if (frontswap && !frontswap_test(si, offset))
- continue;
-
pte_unmap(pte);
swap_map = &si->swap_map[offset];
page = lookup_swap_cache(entry, vma, addr);
try_to_free_swap(page);
unlock_page(page);
put_page(page);
-
- if (*fs_pages_to_unuse && !--(*fs_pages_to_unuse)) {
- ret = FRONTSWAP_PAGES_UNUSED;
- goto out;
- }
try_next:
pte = pte_offset_map(pmd, addr);
} while (pte++, addr += PAGE_SIZE, addr != end);
static inline int unuse_pmd_range(struct vm_area_struct *vma, pud_t *pud,
unsigned long addr, unsigned long end,
- unsigned int type, bool frontswap,
- unsigned long *fs_pages_to_unuse)
+ unsigned int type)
{
pmd_t *pmd;
unsigned long next;
next = pmd_addr_end(addr, end);
if (pmd_none_or_trans_huge_or_clear_bad(pmd))
continue;
- ret = unuse_pte_range(vma, pmd, addr, next, type,
- frontswap, fs_pages_to_unuse);
+ ret = unuse_pte_range(vma, pmd, addr, next, type);
if (ret)
return ret;
} while (pmd++, addr = next, addr != end);
static inline int unuse_pud_range(struct vm_area_struct *vma, p4d_t *p4d,
unsigned long addr, unsigned long end,
- unsigned int type, bool frontswap,
- unsigned long *fs_pages_to_unuse)
+ unsigned int type)
{
pud_t *pud;
unsigned long next;
next = pud_addr_end(addr, end);
if (pud_none_or_clear_bad(pud))
continue;
- ret = unuse_pmd_range(vma, pud, addr, next, type,
- frontswap, fs_pages_to_unuse);
+ ret = unuse_pmd_range(vma, pud, addr, next, type);
if (ret)
return ret;
} while (pud++, addr = next, addr != end);
static inline int unuse_p4d_range(struct vm_area_struct *vma, pgd_t *pgd,
unsigned long addr, unsigned long end,
- unsigned int type, bool frontswap,
- unsigned long *fs_pages_to_unuse)
+ unsigned int type)
{
p4d_t *p4d;
unsigned long next;
next = p4d_addr_end(addr, end);
if (p4d_none_or_clear_bad(p4d))
continue;
- ret = unuse_pud_range(vma, p4d, addr, next, type,
- frontswap, fs_pages_to_unuse);
+ ret = unuse_pud_range(vma, p4d, addr, next, type);
if (ret)
return ret;
} while (p4d++, addr = next, addr != end);
return 0;
}
-static int unuse_vma(struct vm_area_struct *vma, unsigned int type,
- bool frontswap, unsigned long *fs_pages_to_unuse)
+static int unuse_vma(struct vm_area_struct *vma, unsigned int type)
{
pgd_t *pgd;
unsigned long addr, end, next;
next = pgd_addr_end(addr, end);
if (pgd_none_or_clear_bad(pgd))
continue;
- ret = unuse_p4d_range(vma, pgd, addr, next, type,
- frontswap, fs_pages_to_unuse);
+ ret = unuse_p4d_range(vma, pgd, addr, next, type);
if (ret)
return ret;
} while (pgd++, addr = next, addr != end);
return 0;
}
-static int unuse_mm(struct mm_struct *mm, unsigned int type,
- bool frontswap, unsigned long *fs_pages_to_unuse)
+static int unuse_mm(struct mm_struct *mm, unsigned int type)
{
struct vm_area_struct *vma;
int ret = 0;
mmap_read_lock(mm);
for (vma = mm->mmap; vma; vma = vma->vm_next) {
if (vma->anon_vma) {
- ret = unuse_vma(vma, type, frontswap,
- fs_pages_to_unuse);
+ ret = unuse_vma(vma, type);
if (ret)
break;
}
* if there are no inuse entries after prev till end of the map.
*/
static unsigned int find_next_to_unuse(struct swap_info_struct *si,
- unsigned int prev, bool frontswap)
+ unsigned int prev)
{
unsigned int i;
unsigned char count;
for (i = prev + 1; i < si->max; i++) {
count = READ_ONCE(si->swap_map[i]);
if (count && swap_count(count) != SWAP_MAP_BAD)
- if (!frontswap || frontswap_test(si, i))
- break;
+ break;
if ((i % LATENCY_LIMIT) == 0)
cond_resched();
}
return i;
}
-/*
- * If the boolean frontswap is true, only unuse pages_to_unuse pages;
- * pages_to_unuse==0 means all pages; ignored if frontswap is false
- */
-int try_to_unuse(unsigned int type, bool frontswap,
- unsigned long pages_to_unuse)
+static int try_to_unuse(unsigned int type)
{
struct mm_struct *prev_mm;
struct mm_struct *mm;
if (!READ_ONCE(si->inuse_pages))
return 0;
- if (!frontswap)
- pages_to_unuse = 0;
-
retry:
- retval = shmem_unuse(type, frontswap, &pages_to_unuse);
+ retval = shmem_unuse(type);
if (retval)
- goto out;
+ return retval;
prev_mm = &init_mm;
mmget(prev_mm);
spin_unlock(&mmlist_lock);
mmput(prev_mm);
prev_mm = mm;
- retval = unuse_mm(mm, type, frontswap, &pages_to_unuse);
-
+ retval = unuse_mm(mm, type);
if (retval) {
mmput(prev_mm);
- goto out;
+ return retval;
}
/*
i = 0;
while (READ_ONCE(si->inuse_pages) &&
!signal_pending(current) &&
- (i = find_next_to_unuse(si, i, frontswap)) != 0) {
+ (i = find_next_to_unuse(si, i)) != 0) {
entry = swp_entry(type, i);
page = find_get_page(swap_address_space(entry), i);
try_to_free_swap(page);
unlock_page(page);
put_page(page);
-
- /*
- * For frontswap, we just need to unuse pages_to_unuse, if
- * it was specified. Need not check frontswap again here as
- * we already zeroed out pages_to_unuse if not frontswap.
- */
- if (pages_to_unuse && --pages_to_unuse == 0)
- goto out;
}
/*
if (READ_ONCE(si->inuse_pages)) {
if (!signal_pending(current))
goto retry;
- retval = -EINTR;
+ return -EINTR;
}
-out:
- return (retval == FRONTSWAP_PAGES_UNUSED) ? 0 : retval;
+
+ return 0;
}
/*
struct swap_cluster_info *cluster_info,
unsigned long *frontswap_map)
{
- frontswap_init(p->type, frontswap_map);
+ if (IS_ENABLED(CONFIG_FRONTSWAP))
+ frontswap_init(p->type, frontswap_map);
spin_lock(&swap_lock);
spin_lock(&p->lock);
setup_swap_info(p, prio, swap_map, cluster_info);
disable_swap_slots_cache_lock();
set_current_oom_origin();
- err = try_to_unuse(p->type, false, 0); /* force unuse all pages */
+ err = try_to_unuse(p->type);
clear_current_oom_origin();
if (err) {
#include <linux/buffer_head.h> /* grr. try_to_release_page,
do_invalidatepage */
#include <linux/shmem_fs.h>
-#include <linux/cleancache.h>
#include <linux/rmap.h>
#include "internal.h"
*/
folio_zero_range(folio, offset, length);
- cleancache_invalidate_page(folio->mapping, &folio->page);
if (folio_has_private(folio))
do_invalidatepage(&folio->page, offset, length);
if (!folio_test_large(folio))
bool same_folio;
if (mapping_empty(mapping))
- goto out;
+ return;
/*
* 'start' and 'end' always covers the range of pages to be fully
folio_batch_release(&fbatch);
index++;
}
-
-out:
- cleancache_invalidate_inode(mapping);
}
EXPORT_SYMBOL(truncate_inode_pages_range);
xa_unlock_irq(&mapping->i_pages);
}
- /*
- * Cleancache needs notification even if there are no pages or shadow
- * entries.
- */
truncate_inode_pages(mapping, 0);
}
EXPORT_SYMBOL(truncate_inode_pages_final);
int did_range_unmap = 0;
if (mapping_empty(mapping))
- goto out;
+ return 0;
folio_batch_init(&fbatch);
index = start;
if (dax_mapping(mapping)) {
unmap_mapping_pages(mapping, start, end - start + 1, false);
}
-out:
- cleancache_invalidate_inode(mapping);
return ret;
}
EXPORT_SYMBOL_GPL(invalidate_inode_pages2_range);
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+/*
+ * lock ordering:
+ * page_lock
+ * pool->migrate_lock
+ * class->lock
+ * zspage->lock
+ */
+
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/sched.h>
#include <linux/wait.h>
#include <linux/pagemap.h>
#include <linux/fs.h>
+#include <linux/local_lock.h>
#define ZSPAGE_MAGIC 0x58
#define _PFN_BITS (MAX_POSSIBLE_PHYSMEM_BITS - PAGE_SHIFT)
-/*
- * Memory for allocating for handle keeps object position by
- * encoding <page, obj_idx> and the encoded value has a room
- * in least bit(ie, look at obj_to_location).
- * We use the bit to synchronize between object access by
- * user and migration.
- */
-#define HANDLE_PIN_BIT 0
-
/*
* Head in allocated object should have OBJ_ALLOCATED_TAG
* to identify the object was allocated or not.
#define OBJ_INDEX_BITS (BITS_PER_LONG - _PFN_BITS - OBJ_TAG_BITS)
#define OBJ_INDEX_MASK ((_AC(1, UL) << OBJ_INDEX_BITS) - 1)
+#define HUGE_BITS 1
#define FULLNESS_BITS 2
#define CLASS_BITS 8
#define ISOLATED_BITS 3
NR_ZS_FULLNESS,
};
-enum zs_stat_type {
+enum class_stat_type {
CLASS_EMPTY,
CLASS_ALMOST_EMPTY,
CLASS_ALMOST_FULL,
struct zs_size_stat stats;
};
-/* huge object: pages_per_zspage == 1 && maxobj_per_zspage == 1 */
-static void SetPageHugeObject(struct page *page)
-{
- SetPageOwnerPriv1(page);
-}
-
-static void ClearPageHugeObject(struct page *page)
-{
- ClearPageOwnerPriv1(page);
-}
-
-static int PageHugeObject(struct page *page)
-{
- return PageOwnerPriv1(page);
-}
-
/*
* Placed within free objects to form a singly linked list.
* For every zspage, zspage->freeobj gives head of this list.
#ifdef CONFIG_COMPACTION
struct inode *inode;
struct work_struct free_work;
- /* A wait queue for when migration races with async_free_zspage() */
- struct wait_queue_head migration_wait;
- atomic_long_t isolated_pages;
- bool destroying;
#endif
+ /* protect page/zspage migration */
+ rwlock_t migrate_lock;
};
struct zspage {
struct {
+ unsigned int huge:HUGE_BITS;
unsigned int fullness:FULLNESS_BITS;
unsigned int class:CLASS_BITS + 1;
unsigned int isolated:ISOLATED_BITS;
};
struct mapping_area {
+ local_lock_t lock;
char *vm_buf; /* copy buffer for objects that span pages */
char *vm_addr; /* address of kmap_atomic()'ed pages */
enum zs_mapmode vm_mm; /* mapping mode */
};
+/* huge object: pages_per_zspage == 1 && maxobj_per_zspage == 1 */
+static void SetZsHugePage(struct zspage *zspage)
+{
+ zspage->huge = 1;
+}
+
+static bool ZsHugePage(struct zspage *zspage)
+{
+ return zspage->huge;
+}
+
#ifdef CONFIG_COMPACTION
static int zs_register_migration(struct zs_pool *pool);
static void zs_unregister_migration(struct zs_pool *pool);
static void migrate_lock_init(struct zspage *zspage);
static void migrate_read_lock(struct zspage *zspage);
static void migrate_read_unlock(struct zspage *zspage);
+static void migrate_write_lock(struct zspage *zspage);
+static void migrate_write_lock_nested(struct zspage *zspage);
+static void migrate_write_unlock(struct zspage *zspage);
static void kick_deferred_free(struct zs_pool *pool);
static void init_deferred_free(struct zs_pool *pool);
static void SetZsPageMovable(struct zs_pool *pool, struct zspage *zspage);
static void migrate_lock_init(struct zspage *zspage) {}
static void migrate_read_lock(struct zspage *zspage) {}
static void migrate_read_unlock(struct zspage *zspage) {}
+static void migrate_write_lock(struct zspage *zspage) {}
+static void migrate_write_lock_nested(struct zspage *zspage) {}
+static void migrate_write_unlock(struct zspage *zspage) {}
static void kick_deferred_free(struct zs_pool *pool) {}
static void init_deferred_free(struct zs_pool *pool) {}
static void SetZsPageMovable(struct zs_pool *pool, struct zspage *zspage) {}
kmem_cache_free(pool->zspage_cachep, zspage);
}
+/* class->lock(which owns the handle) synchronizes races */
static void record_obj(unsigned long handle, unsigned long obj)
{
- /*
- * lsb of @obj represents handle lock while other bits
- * represent object value the handle is pointing so
- * updating shouldn't do store tearing.
- */
- WRITE_ONCE(*(unsigned long *)handle, obj);
+ *(unsigned long *)handle = obj;
}
/* zpool driver */
#endif /* CONFIG_ZPOOL */
/* per-cpu VM mapping areas for zspage accesses that cross page boundaries */
-static DEFINE_PER_CPU(struct mapping_area, zs_map_area);
-
-static bool is_zspage_isolated(struct zspage *zspage)
-{
- return zspage->isolated;
-}
+static DEFINE_PER_CPU(struct mapping_area, zs_map_area) = {
+ .lock = INIT_LOCAL_LOCK(lock),
+};
static __maybe_unused int is_first_page(struct page *page)
{
*class_idx = zspage->class;
}
+static struct size_class *zspage_class(struct zs_pool *pool,
+ struct zspage *zspage)
+{
+ return pool->size_class[zspage->class];
+}
+
static void set_zspage_mapping(struct zspage *zspage,
unsigned int class_idx,
enum fullness_group fullness)
return min_t(int, ZS_SIZE_CLASSES - 1, idx);
}
-/* type can be of enum type zs_stat_type or fullness_group */
-static inline void zs_stat_inc(struct size_class *class,
+/* type can be of enum type class_stat_type or fullness_group */
+static inline void class_stat_inc(struct size_class *class,
int type, unsigned long cnt)
{
class->stats.objs[type] += cnt;
}
-/* type can be of enum type zs_stat_type or fullness_group */
-static inline void zs_stat_dec(struct size_class *class,
+/* type can be of enum type class_stat_type or fullness_group */
+static inline void class_stat_dec(struct size_class *class,
int type, unsigned long cnt)
{
class->stats.objs[type] -= cnt;
}
-/* type can be of enum type zs_stat_type or fullness_group */
+/* type can be of enum type class_stat_type or fullness_group */
static inline unsigned long zs_stat_get(struct size_class *class,
int type)
{
{
struct zspage *head;
- zs_stat_inc(class, fullness, 1);
+ class_stat_inc(class, fullness, 1);
head = list_first_entry_or_null(&class->fullness_list[fullness],
struct zspage, list);
/*
enum fullness_group fullness)
{
VM_BUG_ON(list_empty(&class->fullness_list[fullness]));
- VM_BUG_ON(is_zspage_isolated(zspage));
list_del_init(&zspage->list);
- zs_stat_dec(class, fullness, 1);
+ class_stat_dec(class, fullness, 1);
}
/*
if (newfg == currfg)
goto out;
- if (!is_zspage_isolated(zspage)) {
- remove_zspage(class, zspage, currfg);
- insert_zspage(class, zspage, newfg);
- }
-
+ remove_zspage(class, zspage, currfg);
+ insert_zspage(class, zspage, newfg);
set_zspage_mapping(zspage, class_idx, newfg);
-
out:
return newfg;
}
static struct page *get_next_page(struct page *page)
{
- if (unlikely(PageHugeObject(page)))
+ struct zspage *zspage = get_zspage(page);
+
+ if (unlikely(ZsHugePage(zspage)))
return NULL;
return (struct page *)page->index;
*obj_idx = (obj & OBJ_INDEX_MASK);
}
+static void obj_to_page(unsigned long obj, struct page **page)
+{
+ obj >>= OBJ_TAG_BITS;
+ *page = pfn_to_page(obj >> OBJ_INDEX_BITS);
+}
+
/**
* location_to_obj - get obj value encoded from (<page>, <obj_idx>)
* @page: page object resides in zspage
return *(unsigned long *)handle;
}
-static unsigned long obj_to_head(struct page *page, void *obj)
+static bool obj_allocated(struct page *page, void *obj, unsigned long *phandle)
{
- if (unlikely(PageHugeObject(page))) {
+ unsigned long handle;
+ struct zspage *zspage = get_zspage(page);
+
+ if (unlikely(ZsHugePage(zspage))) {
VM_BUG_ON_PAGE(!is_first_page(page), page);
- return page->index;
+ handle = page->index;
} else
- return *(unsigned long *)obj;
-}
-
-static inline int testpin_tag(unsigned long handle)
-{
- return bit_spin_is_locked(HANDLE_PIN_BIT, (unsigned long *)handle);
-}
-
-static inline int trypin_tag(unsigned long handle)
-{
- return bit_spin_trylock(HANDLE_PIN_BIT, (unsigned long *)handle);
-}
+ handle = *(unsigned long *)obj;
-static void pin_tag(unsigned long handle) __acquires(bitlock)
-{
- bit_spin_lock(HANDLE_PIN_BIT, (unsigned long *)handle);
-}
+ if (!(handle & OBJ_ALLOCATED_TAG))
+ return false;
-static void unpin_tag(unsigned long handle) __releases(bitlock)
-{
- bit_spin_unlock(HANDLE_PIN_BIT, (unsigned long *)handle);
+ *phandle = handle & ~OBJ_ALLOCATED_TAG;
+ return true;
}
static void reset_page(struct page *page)
ClearPagePrivate(page);
set_page_private(page, 0);
page_mapcount_reset(page);
- ClearPageHugeObject(page);
page->index = 0;
}
cache_free_zspage(pool, zspage);
- zs_stat_dec(class, OBJ_ALLOCATED, class->objs_per_zspage);
+ class_stat_dec(class, OBJ_ALLOCATED, class->objs_per_zspage);
atomic_long_sub(class->pages_per_zspage,
&pool->pages_allocated);
}
VM_BUG_ON(get_zspage_inuse(zspage));
VM_BUG_ON(list_empty(&zspage->list));
+ /*
+ * Since zs_free couldn't be sleepable, this function cannot call
+ * lock_page. The page locks trylock_zspage got will be released
+ * by __free_zspage.
+ */
if (!trylock_zspage(zspage)) {
kick_deferred_free(pool);
return;
SetPagePrivate(page);
if (unlikely(class->objs_per_zspage == 1 &&
class->pages_per_zspage == 1))
- SetPageHugeObject(page);
+ SetZsHugePage(zspage);
} else {
prev_page->index = (unsigned long)page;
}
unsigned long obj, off;
unsigned int obj_idx;
- unsigned int class_idx;
- enum fullness_group fg;
struct size_class *class;
struct mapping_area *area;
struct page *pages[2];
*/
BUG_ON(in_interrupt());
- /* From now on, migration cannot move the object */
- pin_tag(handle);
-
+ /* It guarantees it can get zspage from handle safely */
+ read_lock(&pool->migrate_lock);
obj = handle_to_obj(handle);
obj_to_location(obj, &page, &obj_idx);
zspage = get_zspage(page);
- /* migration cannot move any subpage in this zspage */
+ /*
+ * migration cannot move any zpages in this zspage. Here, class->lock
+ * is too heavy since callers would take some time until they calls
+ * zs_unmap_object API so delegate the locking from class to zspage
+ * which is smaller granularity.
+ */
migrate_read_lock(zspage);
+ read_unlock(&pool->migrate_lock);
- get_zspage_mapping(zspage, &class_idx, &fg);
- class = pool->size_class[class_idx];
+ class = zspage_class(pool, zspage);
off = (class->size * obj_idx) & ~PAGE_MASK;
- area = &get_cpu_var(zs_map_area);
+ local_lock(&zs_map_area.lock);
+ area = this_cpu_ptr(&zs_map_area);
area->vm_mm = mm;
if (off + class->size <= PAGE_SIZE) {
/* this object is contained entirely within a page */
ret = __zs_map_object(area, pages, off, class->size);
out:
- if (likely(!PageHugeObject(page)))
+ if (likely(!ZsHugePage(zspage)))
ret += ZS_HANDLE_SIZE;
return ret;
unsigned long obj, off;
unsigned int obj_idx;
- unsigned int class_idx;
- enum fullness_group fg;
struct size_class *class;
struct mapping_area *area;
obj = handle_to_obj(handle);
obj_to_location(obj, &page, &obj_idx);
zspage = get_zspage(page);
- get_zspage_mapping(zspage, &class_idx, &fg);
- class = pool->size_class[class_idx];
+ class = zspage_class(pool, zspage);
off = (class->size * obj_idx) & ~PAGE_MASK;
area = this_cpu_ptr(&zs_map_area);
__zs_unmap_object(area, pages, off, class->size);
}
- put_cpu_var(zs_map_area);
+ local_unlock(&zs_map_area.lock);
migrate_read_unlock(zspage);
- unpin_tag(handle);
}
EXPORT_SYMBOL_GPL(zs_unmap_object);
}
EXPORT_SYMBOL_GPL(zs_huge_class_size);
-static unsigned long obj_malloc(struct size_class *class,
+static unsigned long obj_malloc(struct zs_pool *pool,
struct zspage *zspage, unsigned long handle)
{
int i, nr_page, offset;
unsigned long obj;
struct link_free *link;
+ struct size_class *class;
struct page *m_page;
unsigned long m_offset;
void *vaddr;
+ class = pool->size_class[zspage->class];
handle |= OBJ_ALLOCATED_TAG;
obj = get_freeobj(zspage);
vaddr = kmap_atomic(m_page);
link = (struct link_free *)vaddr + m_offset / sizeof(*link);
set_freeobj(zspage, link->next >> OBJ_TAG_BITS);
- if (likely(!PageHugeObject(m_page)))
+ if (likely(!ZsHugePage(zspage)))
/* record handle in the header of allocated chunk */
link->handle = handle;
else
kunmap_atomic(vaddr);
mod_zspage_inuse(zspage, 1);
- zs_stat_inc(class, OBJ_USED, 1);
obj = location_to_obj(m_page, obj);
size += ZS_HANDLE_SIZE;
class = pool->size_class[get_size_class_index(size)];
+ /* class->lock effectively protects the zpage migration */
spin_lock(&class->lock);
zspage = find_get_zspage(class);
if (likely(zspage)) {
- obj = obj_malloc(class, zspage, handle);
+ obj = obj_malloc(pool, zspage, handle);
/* Now move the zspage to another fullness group, if required */
fix_fullness_group(class, zspage);
record_obj(handle, obj);
+ class_stat_inc(class, OBJ_USED, 1);
spin_unlock(&class->lock);
return handle;
}
spin_lock(&class->lock);
- obj = obj_malloc(class, zspage, handle);
+ obj = obj_malloc(pool, zspage, handle);
newfg = get_fullness_group(class, zspage);
insert_zspage(class, zspage, newfg);
set_zspage_mapping(zspage, class->index, newfg);
record_obj(handle, obj);
atomic_long_add(class->pages_per_zspage,
&pool->pages_allocated);
- zs_stat_inc(class, OBJ_ALLOCATED, class->objs_per_zspage);
+ class_stat_inc(class, OBJ_ALLOCATED, class->objs_per_zspage);
+ class_stat_inc(class, OBJ_USED, 1);
/* We completely set up zspage so mark them as movable */
SetZsPageMovable(pool, zspage);
}
EXPORT_SYMBOL_GPL(zs_malloc);
-static void obj_free(struct size_class *class, unsigned long obj)
+static void obj_free(int class_size, unsigned long obj)
{
struct link_free *link;
struct zspage *zspage;
void *vaddr;
obj_to_location(obj, &f_page, &f_objidx);
- f_offset = (class->size * f_objidx) & ~PAGE_MASK;
+ f_offset = (class_size * f_objidx) & ~PAGE_MASK;
zspage = get_zspage(f_page);
vaddr = kmap_atomic(f_page);
/* Insert this object in containing zspage's freelist */
link = (struct link_free *)(vaddr + f_offset);
- link->next = get_freeobj(zspage) << OBJ_TAG_BITS;
+ if (likely(!ZsHugePage(zspage)))
+ link->next = get_freeobj(zspage) << OBJ_TAG_BITS;
+ else
+ f_page->index = 0;
kunmap_atomic(vaddr);
set_freeobj(zspage, f_objidx);
mod_zspage_inuse(zspage, -1);
- zs_stat_dec(class, OBJ_USED, 1);
}
void zs_free(struct zs_pool *pool, unsigned long handle)
struct zspage *zspage;
struct page *f_page;
unsigned long obj;
- unsigned int f_objidx;
- int class_idx;
struct size_class *class;
enum fullness_group fullness;
- bool isolated;
if (unlikely(!handle))
return;
- pin_tag(handle);
+ /*
+ * The pool->migrate_lock protects the race with zpage's migration
+ * so it's safe to get the page from handle.
+ */
+ read_lock(&pool->migrate_lock);
obj = handle_to_obj(handle);
- obj_to_location(obj, &f_page, &f_objidx);
+ obj_to_page(obj, &f_page);
zspage = get_zspage(f_page);
-
- migrate_read_lock(zspage);
-
- get_zspage_mapping(zspage, &class_idx, &fullness);
- class = pool->size_class[class_idx];
-
+ class = zspage_class(pool, zspage);
spin_lock(&class->lock);
- obj_free(class, obj);
+ read_unlock(&pool->migrate_lock);
+
+ obj_free(class->size, obj);
+ class_stat_dec(class, OBJ_USED, 1);
fullness = fix_fullness_group(class, zspage);
- if (fullness != ZS_EMPTY) {
- migrate_read_unlock(zspage);
+ if (fullness != ZS_EMPTY)
goto out;
- }
- isolated = is_zspage_isolated(zspage);
- migrate_read_unlock(zspage);
- /* If zspage is isolated, zs_page_putback will free the zspage */
- if (likely(!isolated))
- free_zspage(pool, class, zspage);
+ free_zspage(pool, class, zspage);
out:
-
spin_unlock(&class->lock);
- unpin_tag(handle);
cache_free_handle(pool, handle);
}
EXPORT_SYMBOL_GPL(zs_free);
static unsigned long find_alloced_obj(struct size_class *class,
struct page *page, int *obj_idx)
{
- unsigned long head;
int offset = 0;
int index = *obj_idx;
unsigned long handle = 0;
offset += class->size * index;
while (offset < PAGE_SIZE) {
- head = obj_to_head(page, addr + offset);
- if (head & OBJ_ALLOCATED_TAG) {
- handle = head & ~OBJ_ALLOCATED_TAG;
- if (trypin_tag(handle))
- break;
- handle = 0;
- }
+ if (obj_allocated(page, addr + offset, &handle))
+ break;
offset += class->size;
index++;
/* Stop if there is no more space */
if (zspage_full(class, get_zspage(d_page))) {
- unpin_tag(handle);
ret = -ENOMEM;
break;
}
used_obj = handle_to_obj(handle);
- free_obj = obj_malloc(class, get_zspage(d_page), handle);
+ free_obj = obj_malloc(pool, get_zspage(d_page), handle);
zs_object_copy(class, free_obj, used_obj);
obj_idx++;
- /*
- * record_obj updates handle's value to free_obj and it will
- * invalidate lock bit(ie, HANDLE_PIN_BIT) of handle, which
- * breaks synchronization using pin_tag(e,g, zs_free) so
- * let's keep the lock bit.
- */
- free_obj |= BIT(HANDLE_PIN_BIT);
record_obj(handle, free_obj);
- unpin_tag(handle);
- obj_free(class, used_obj);
+ obj_free(class->size, used_obj);
}
/* Remember last position in this iteration */
zspage = list_first_entry_or_null(&class->fullness_list[fg[i]],
struct zspage, list);
if (zspage) {
- VM_BUG_ON(is_zspage_isolated(zspage));
remove_zspage(class, zspage, fg[i]);
return zspage;
}
{
enum fullness_group fullness;
- VM_BUG_ON(is_zspage_isolated(zspage));
-
fullness = get_fullness_group(class, zspage);
insert_zspage(class, zspage, fullness);
set_zspage_mapping(zspage, class->index, fullness);
write_lock(&zspage->lock);
}
+static void migrate_write_lock_nested(struct zspage *zspage)
+{
+ write_lock_nested(&zspage->lock, SINGLE_DEPTH_NESTING);
+}
+
static void migrate_write_unlock(struct zspage *zspage)
{
write_unlock(&zspage->lock);
static void dec_zspage_isolation(struct zspage *zspage)
{
+ VM_BUG_ON(zspage->isolated == 0);
zspage->isolated--;
}
-static void putback_zspage_deferred(struct zs_pool *pool,
- struct size_class *class,
- struct zspage *zspage)
-{
- enum fullness_group fg;
-
- fg = putback_zspage(class, zspage);
- if (fg == ZS_EMPTY)
- schedule_work(&pool->free_work);
-
-}
-
-static inline void zs_pool_dec_isolated(struct zs_pool *pool)
-{
- VM_BUG_ON(atomic_long_read(&pool->isolated_pages) <= 0);
- atomic_long_dec(&pool->isolated_pages);
- /*
- * Checking pool->destroying must happen after atomic_long_dec()
- * for pool->isolated_pages above. Paired with the smp_mb() in
- * zs_unregister_migration().
- */
- smp_mb__after_atomic();
- if (atomic_long_read(&pool->isolated_pages) == 0 && pool->destroying)
- wake_up_all(&pool->migration_wait);
-}
-
static void replace_sub_page(struct size_class *class, struct zspage *zspage,
struct page *newpage, struct page *oldpage)
{
create_page_chain(class, zspage, pages);
set_first_obj_offset(newpage, get_first_obj_offset(oldpage));
- if (unlikely(PageHugeObject(oldpage)))
+ if (unlikely(ZsHugePage(zspage)))
newpage->index = oldpage->index;
__SetPageMovable(newpage, page_mapping(oldpage));
}
static bool zs_page_isolate(struct page *page, isolate_mode_t mode)
{
- struct zs_pool *pool;
- struct size_class *class;
- int class_idx;
- enum fullness_group fullness;
struct zspage *zspage;
- struct address_space *mapping;
/*
* Page is locked so zspage couldn't be destroyed. For detail, look at
VM_BUG_ON_PAGE(PageIsolated(page), page);
zspage = get_zspage(page);
-
- /*
- * Without class lock, fullness could be stale while class_idx is okay
- * because class_idx is constant unless page is freed so we should get
- * fullness again under class lock.
- */
- get_zspage_mapping(zspage, &class_idx, &fullness);
- mapping = page_mapping(page);
- pool = mapping->private_data;
- class = pool->size_class[class_idx];
-
- spin_lock(&class->lock);
- if (get_zspage_inuse(zspage) == 0) {
- spin_unlock(&class->lock);
- return false;
- }
-
- /* zspage is isolated for object migration */
- if (list_empty(&zspage->list) && !is_zspage_isolated(zspage)) {
- spin_unlock(&class->lock);
- return false;
- }
-
- /*
- * If this is first time isolation for the zspage, isolate zspage from
- * size_class to prevent further object allocation from the zspage.
- */
- if (!list_empty(&zspage->list) && !is_zspage_isolated(zspage)) {
- get_zspage_mapping(zspage, &class_idx, &fullness);
- atomic_long_inc(&pool->isolated_pages);
- remove_zspage(class, zspage, fullness);
- }
-
+ migrate_write_lock(zspage);
inc_zspage_isolation(zspage);
- spin_unlock(&class->lock);
+ migrate_write_unlock(zspage);
return true;
}
{
struct zs_pool *pool;
struct size_class *class;
- int class_idx;
- enum fullness_group fullness;
struct zspage *zspage;
struct page *dummy;
void *s_addr, *d_addr, *addr;
- int offset, pos;
- unsigned long handle, head;
+ int offset;
+ unsigned long handle;
unsigned long old_obj, new_obj;
unsigned int obj_idx;
- int ret = -EAGAIN;
/*
* We cannot support the _NO_COPY case here, because copy needs to
VM_BUG_ON_PAGE(!PageMovable(page), page);
VM_BUG_ON_PAGE(!PageIsolated(page), page);
- zspage = get_zspage(page);
-
- /* Concurrent compactor cannot migrate any subpage in zspage */
- migrate_write_lock(zspage);
- get_zspage_mapping(zspage, &class_idx, &fullness);
pool = mapping->private_data;
- class = pool->size_class[class_idx];
- offset = get_first_obj_offset(page);
+ /*
+ * The pool migrate_lock protects the race between zpage migration
+ * and zs_free.
+ */
+ write_lock(&pool->migrate_lock);
+ zspage = get_zspage(page);
+ class = zspage_class(pool, zspage);
+
+ /*
+ * the class lock protects zpage alloc/free in the zspage.
+ */
spin_lock(&class->lock);
- if (!get_zspage_inuse(zspage)) {
- /*
- * Set "offset" to end of the page so that every loops
- * skips unnecessary object scanning.
- */
- offset = PAGE_SIZE;
- }
+ /* the migrate_write_lock protects zpage access via zs_map_object */
+ migrate_write_lock(zspage);
- pos = offset;
+ offset = get_first_obj_offset(page);
s_addr = kmap_atomic(page);
- while (pos < PAGE_SIZE) {
- head = obj_to_head(page, s_addr + pos);
- if (head & OBJ_ALLOCATED_TAG) {
- handle = head & ~OBJ_ALLOCATED_TAG;
- if (!trypin_tag(handle))
- goto unpin_objects;
- }
- pos += class->size;
- }
/*
* Here, any user cannot access all objects in the zspage so let's move.
memcpy(d_addr, s_addr, PAGE_SIZE);
kunmap_atomic(d_addr);
- for (addr = s_addr + offset; addr < s_addr + pos;
+ for (addr = s_addr + offset; addr < s_addr + PAGE_SIZE;
addr += class->size) {
- head = obj_to_head(page, addr);
- if (head & OBJ_ALLOCATED_TAG) {
- handle = head & ~OBJ_ALLOCATED_TAG;
- BUG_ON(!testpin_tag(handle));
+ if (obj_allocated(page, addr, &handle)) {
old_obj = handle_to_obj(handle);
obj_to_location(old_obj, &dummy, &obj_idx);
new_obj = (unsigned long)location_to_obj(newpage,
obj_idx);
- new_obj |= BIT(HANDLE_PIN_BIT);
record_obj(handle, new_obj);
}
}
+ kunmap_atomic(s_addr);
replace_sub_page(class, zspage, newpage, page);
- get_page(newpage);
-
- dec_zspage_isolation(zspage);
-
/*
- * Page migration is done so let's putback isolated zspage to
- * the list if @page is final isolated subpage in the zspage.
+ * Since we complete the data copy and set up new zspage structure,
+ * it's okay to release migration_lock.
*/
- if (!is_zspage_isolated(zspage)) {
- /*
- * We cannot race with zs_destroy_pool() here because we wait
- * for isolation to hit zero before we start destroying.
- * Also, we ensure that everyone can see pool->destroying before
- * we start waiting.
- */
- putback_zspage_deferred(pool, class, zspage);
- zs_pool_dec_isolated(pool);
- }
+ write_unlock(&pool->migrate_lock);
+ spin_unlock(&class->lock);
+ dec_zspage_isolation(zspage);
+ migrate_write_unlock(zspage);
+ get_page(newpage);
if (page_zone(newpage) != page_zone(page)) {
dec_zone_page_state(page, NR_ZSPAGES);
inc_zone_page_state(newpage, NR_ZSPAGES);
reset_page(page);
put_page(page);
- page = newpage;
-
- ret = MIGRATEPAGE_SUCCESS;
-unpin_objects:
- for (addr = s_addr + offset; addr < s_addr + pos;
- addr += class->size) {
- head = obj_to_head(page, addr);
- if (head & OBJ_ALLOCATED_TAG) {
- handle = head & ~OBJ_ALLOCATED_TAG;
- BUG_ON(!testpin_tag(handle));
- unpin_tag(handle);
- }
- }
- kunmap_atomic(s_addr);
- spin_unlock(&class->lock);
- migrate_write_unlock(zspage);
- return ret;
+ return MIGRATEPAGE_SUCCESS;
}
static void zs_page_putback(struct page *page)
{
- struct zs_pool *pool;
- struct size_class *class;
- int class_idx;
- enum fullness_group fg;
- struct address_space *mapping;
struct zspage *zspage;
VM_BUG_ON_PAGE(!PageMovable(page), page);
VM_BUG_ON_PAGE(!PageIsolated(page), page);
zspage = get_zspage(page);
- get_zspage_mapping(zspage, &class_idx, &fg);
- mapping = page_mapping(page);
- pool = mapping->private_data;
- class = pool->size_class[class_idx];
-
- spin_lock(&class->lock);
+ migrate_write_lock(zspage);
dec_zspage_isolation(zspage);
- if (!is_zspage_isolated(zspage)) {
- /*
- * Due to page_lock, we cannot free zspage immediately
- * so let's defer.
- */
- putback_zspage_deferred(pool, class, zspage);
- zs_pool_dec_isolated(pool);
- }
- spin_unlock(&class->lock);
+ migrate_write_unlock(zspage);
}
static const struct address_space_operations zsmalloc_aops = {
return 0;
}
-static bool pool_isolated_are_drained(struct zs_pool *pool)
-{
- return atomic_long_read(&pool->isolated_pages) == 0;
-}
-
-/* Function for resolving migration */
-static void wait_for_isolated_drain(struct zs_pool *pool)
-{
-
- /*
- * We're in the process of destroying the pool, so there are no
- * active allocations. zs_page_isolate() fails for completely free
- * zspages, so we need only wait for the zs_pool's isolated
- * count to hit zero.
- */
- wait_event(pool->migration_wait,
- pool_isolated_are_drained(pool));
-}
-
static void zs_unregister_migration(struct zs_pool *pool)
{
- pool->destroying = true;
- /*
- * We need a memory barrier here to ensure global visibility of
- * pool->destroying. Thus pool->isolated pages will either be 0 in which
- * case we don't care, or it will be > 0 and pool->destroying will
- * ensure that we wake up once isolation hits 0.
- */
- smp_mb();
- wait_for_isolated_drain(pool); /* This can block */
flush_work(&pool->free_work);
iput(pool->inode);
}
spin_unlock(&class->lock);
}
-
list_for_each_entry_safe(zspage, tmp, &free_pages, list) {
list_del(&zspage->list);
lock_zspage(zspage);
struct zspage *dst_zspage = NULL;
unsigned long pages_freed = 0;
+ /* protect the race between zpage migration and zs_free */
+ write_lock(&pool->migrate_lock);
+ /* protect zpage allocation/free */
spin_lock(&class->lock);
while ((src_zspage = isolate_zspage(class, true))) {
+ /* protect someone accessing the zspage(i.e., zs_map_object) */
+ migrate_write_lock(src_zspage);
if (!zs_can_compact(class))
break;
cc.s_page = get_first_page(src_zspage);
while ((dst_zspage = isolate_zspage(class, false))) {
+ migrate_write_lock_nested(dst_zspage);
+
cc.d_page = get_first_page(dst_zspage);
/*
* If there is no more space in dst_page, resched
break;
putback_zspage(class, dst_zspage);
+ migrate_write_unlock(dst_zspage);
+ dst_zspage = NULL;
+ if (rwlock_is_contended(&pool->migrate_lock))
+ break;
}
/* Stop if we couldn't find slot */
break;
putback_zspage(class, dst_zspage);
+ migrate_write_unlock(dst_zspage);
+
if (putback_zspage(class, src_zspage) == ZS_EMPTY) {
+ migrate_write_unlock(src_zspage);
free_zspage(pool, class, src_zspage);
pages_freed += class->pages_per_zspage;
- }
+ } else
+ migrate_write_unlock(src_zspage);
spin_unlock(&class->lock);
+ write_unlock(&pool->migrate_lock);
cond_resched();
+ write_lock(&pool->migrate_lock);
spin_lock(&class->lock);
}
- if (src_zspage)
+ if (src_zspage) {
putback_zspage(class, src_zspage);
+ migrate_write_unlock(src_zspage);
+ }
spin_unlock(&class->lock);
+ write_unlock(&pool->migrate_lock);
return pages_freed;
}
return NULL;
init_deferred_free(pool);
+ rwlock_init(&pool->migrate_lock);
pool->name = kstrdup(name, GFP_KERNEL);
if (!pool->name)
goto err;
-#ifdef CONFIG_COMPACTION
- init_waitqueue_head(&pool->migration_wait);
-#endif
-
if (create_cache(pool))
goto err;
zswap_trees[type] = tree;
}
-static struct frontswap_ops zswap_frontswap_ops = {
+static const struct frontswap_ops zswap_frontswap_ops = {
.store = zswap_frontswap_store,
.load = zswap_frontswap_load,
.invalidate_page = zswap_frontswap_invalidate_page,
if (!shrink_wq)
goto fallback_fail;
- frontswap_register_ops(&zswap_frontswap_ops);
+ ret = frontswap_register_ops(&zswap_frontswap_ops);
+ if (ret)
+ goto destroy_wq;
if (zswap_debugfs_init())
pr_warn("debugfs initialization failed\n");
return 0;
+destroy_wq:
+ destroy_workqueue(shrink_wq);
fallback_fail:
if (pool)
zswap_pool_destroy(pool);
static inline void *vcc_walk(struct seq_file *seq, loff_t l)
{
struct vcc_state *state = seq->private;
- int family = (uintptr_t)(PDE_DATA(file_inode(seq->file)));
+ int family = (uintptr_t)(pde_data(file_inode(seq->file)));
return __vcc_walk(&state->sk, family, &state->bucket, l) ?
state : NULL;
page = get_zeroed_page(GFP_KERNEL);
if (!page)
return -ENOMEM;
- dev = PDE_DATA(file_inode(file));
+ dev = pde_data(file_inode(file));
if (!dev->ops->proc_read)
length = -EINVAL;
else {
static void *bt_seq_start(struct seq_file *seq, loff_t *pos)
__acquires(seq->private->l->lock)
{
- struct bt_sock_list *l = PDE_DATA(file_inode(seq->file));
+ struct bt_sock_list *l = pde_data(file_inode(seq->file));
read_lock(&l->lock);
return seq_hlist_start_head(&l->head, *pos);
static void *bt_seq_next(struct seq_file *seq, void *v, loff_t *pos)
{
- struct bt_sock_list *l = PDE_DATA(file_inode(seq->file));
+ struct bt_sock_list *l = pde_data(file_inode(seq->file));
return seq_hlist_next(v, &l->head, pos);
}
static void bt_seq_stop(struct seq_file *seq, void *v)
__releases(seq->private->l->lock)
{
- struct bt_sock_list *l = PDE_DATA(file_inode(seq->file));
+ struct bt_sock_list *l = pde_data(file_inode(seq->file));
read_unlock(&l->lock);
}
static int bt_seq_show(struct seq_file *seq, void *v)
{
- struct bt_sock_list *l = PDE_DATA(file_inode(seq->file));
+ struct bt_sock_list *l = pde_data(file_inode(seq->file));
if (v == SEQ_START_TOKEN) {
seq_puts(seq, "sk RefCnt Rmem Wmem User Inode Parent");
err = dev_set_allmulti(dev, 1);
if (err) {
br_multicast_del_port(p);
+ dev_put_track(dev, &p->dev_tracker);
kfree(p); /* kobject not yet init'd, manually free */
goto err1;
}
sysfs_remove_link(br->ifobj, p->dev->name);
err2:
br_multicast_del_port(p);
+ dev_put_track(dev, &p->dev_tracker);
kobject_put(&p->kobj);
dev_set_allmulti(dev, -1);
err1:
- dev_put(dev);
return err;
}
{
char ifname[IFNAMSIZ];
struct net *net = m->private;
- struct sock *sk = (struct sock *)PDE_DATA(m->file->f_inode);
+ struct sock *sk = (struct sock *)pde_data(m->file->f_inode);
struct bcm_sock *bo = bcm_sk(sk);
struct bcm_op *op;
static int can_rcvlist_proc_show(struct seq_file *m, void *v)
{
/* double cast to prevent GCC warning */
- int idx = (int)(long)PDE_DATA(m->file->f_inode);
+ int idx = (int)(long)pde_data(m->file->f_inode);
struct net_device *dev;
struct can_dev_rcv_lists *dev_rcv_lists;
struct net *net = m->private;
}
EXPORT_SYMBOL(ceph_compare_options);
-static int parse_fsid(const char *str, struct ceph_fsid *fsid)
+int ceph_parse_fsid(const char *str, struct ceph_fsid *fsid)
{
int i = 0;
char tmp[3];
int err = -EINVAL;
int d;
- dout("parse_fsid '%s'\n", str);
+ dout("%s '%s'\n", __func__, str);
tmp[2] = 0;
while (*str && i < 16) {
if (ispunct(*str)) {
if (i == 16)
err = 0;
- dout("parse_fsid ret %d got fsid %pU\n", err, fsid);
+ dout("%s ret %d got fsid %pU\n", __func__, err, fsid);
return err;
}
+EXPORT_SYMBOL(ceph_parse_fsid);
/*
* ceph options
}
int ceph_parse_mon_ips(const char *buf, size_t len, struct ceph_options *opt,
- struct fc_log *l)
+ struct fc_log *l, char delim)
{
struct p_log log = {.prefix = "libceph", .log = l};
int ret;
- /* ip1[:port1][,ip2[:port2]...] */
+ /* ip1[:port1][<delim>ip2[:port2]...] */
ret = ceph_parse_ips(buf, buf + len, opt->mon_addr, CEPH_MAX_MON,
- &opt->num_mon);
+ &opt->num_mon, delim);
if (ret) {
error_plog(&log, "Failed to parse monitor IPs: %d", ret);
return ret;
case Opt_ip:
err = ceph_parse_ips(param->string,
param->string + param->size,
- &opt->my_addr,
- 1, NULL);
+ &opt->my_addr, 1, NULL, ',');
if (err) {
error_plog(&log, "Failed to parse ip: %d", err);
return err;
break;
case Opt_fsid:
- err = parse_fsid(param->string, &opt->fsid);
+ err = ceph_parse_fsid(param->string, &opt->fsid);
if (err) {
error_plog(&log, "Failed to parse fsid: %d", err);
return err;
*/
int ceph_parse_ips(const char *c, const char *end,
struct ceph_entity_addr *addr,
- int max_count, int *count)
+ int max_count, int *count, char delim)
{
int i, ret = -EINVAL;
const char *p = c;
dout("parse_ips on '%.*s'\n", (int)(end-c), c);
for (i = 0; i < max_count; i++) {
+ char cur_delim = delim;
const char *ipend;
int port;
- char delim = ',';
if (*p == '[') {
- delim = ']';
+ cur_delim = ']';
p++;
}
- ret = ceph_parse_server_name(p, end - p, &addr[i], delim, &ipend);
+ ret = ceph_parse_server_name(p, end - p, &addr[i], cur_delim,
+ &ipend);
if (ret)
goto bad;
ret = -EINVAL;
p = ipend;
- if (delim == ']') {
+ if (cur_delim == ']') {
if (*p != ']') {
dout("missing matching ']'\n");
goto bad;
addr[i].type = CEPH_ENTITY_ADDR_TYPE_LEGACY;
addr[i].nonce = 0;
- dout("parse_ips got %s\n", ceph_pr_addr(&addr[i]));
+ dout("%s got %s\n", __func__, ceph_pr_addr(&addr[i]));
if (p == end)
break;
- if (*p != ',')
+ if (*p != delim)
goto bad;
p++;
}
goto out_unlock;
}
old_prog = link->prog;
+ if (old_prog->type != new_prog->type ||
+ old_prog->expected_attach_type != new_prog->expected_attach_type) {
+ err = -EINVAL;
+ goto out_unlock;
+ }
+
if (old_prog == new_prog) {
/* no-op, don't disturb drivers */
bpf_prog_put(new_prog);
static void *neigh_stat_seq_start(struct seq_file *seq, loff_t *pos)
{
- struct neigh_table *tbl = PDE_DATA(file_inode(seq->file));
+ struct neigh_table *tbl = pde_data(file_inode(seq->file));
int cpu;
if (*pos == 0)
static void *neigh_stat_seq_next(struct seq_file *seq, void *v, loff_t *pos)
{
- struct neigh_table *tbl = PDE_DATA(file_inode(seq->file));
+ struct neigh_table *tbl = pde_data(file_inode(seq->file));
int cpu;
for (cpu = *pos; cpu < nr_cpu_ids; ++cpu) {
static int neigh_stat_seq_show(struct seq_file *seq, void *v)
{
- struct neigh_table *tbl = PDE_DATA(file_inode(seq->file));
+ struct neigh_table *tbl = pde_data(file_inode(seq->file));
struct neigh_statistics *st = v;
if (v == SEQ_START_TOKEN) {
{
struct net *net;
if (ops->exit) {
- list_for_each_entry(net, net_exit_list, exit_list)
+ list_for_each_entry(net, net_exit_list, exit_list) {
ops->exit(net);
+ cond_resched();
+ }
}
if (ops->exit_batch)
ops->exit_batch(net_exit_list);
{
struct platform_device *pdev = of_find_device_by_node(np);
struct nvmem_cell *cell;
- const void *buf;
+ const void *mac;
size_t len;
int ret;
if (IS_ERR(cell))
return PTR_ERR(cell);
- buf = nvmem_cell_read(cell, &len);
+ mac = nvmem_cell_read(cell, &len);
nvmem_cell_put(cell);
- if (IS_ERR(buf))
- return PTR_ERR(buf);
-
- ret = 0;
- if (len == ETH_ALEN) {
- if (is_valid_ether_addr(buf))
- memcpy(addr, buf, ETH_ALEN);
- else
- ret = -EINVAL;
- } else if (len == 3 * ETH_ALEN - 1) {
- u8 mac[ETH_ALEN];
-
- if (mac_pton(buf, mac))
- memcpy(addr, mac, ETH_ALEN);
- else
- ret = -EINVAL;
- } else {
- ret = -EINVAL;
+ if (IS_ERR(mac))
+ return PTR_ERR(mac);
+
+ if (len != ETH_ALEN || !is_valid_ether_addr(mac)) {
+ kfree(mac);
+ return -EINVAL;
}
- kfree(buf);
+ memcpy(addr, mac, ETH_ALEN);
+ kfree(mac);
- return ret;
+ return 0;
}
/**
static int pgctrl_open(struct inode *inode, struct file *file)
{
- return single_open(file, pgctrl_show, PDE_DATA(inode));
+ return single_open(file, pgctrl_show, pde_data(inode));
}
static const struct proc_ops pktgen_proc_ops = {
static int pktgen_if_open(struct inode *inode, struct file *file)
{
- return single_open(file, pktgen_if_show, PDE_DATA(inode));
+ return single_open(file, pktgen_if_show, pde_data(inode));
}
static const struct proc_ops pktgen_if_proc_ops = {
static int pktgen_thread_open(struct inode *inode, struct file *file)
{
- return single_open(file, pktgen_thread_show, PDE_DATA(inode));
+ return single_open(file, pktgen_thread_show, pde_data(inode));
}
static const struct proc_ops pktgen_thread_proc_ops = {
}
num = ethtool_get_phc_vclocks(dev, &vclock_index);
+ dev_put(dev);
+
for (i = 0; i < num; i++) {
if (*(vclock_index + i) == phc_index) {
match = true;
{
bool use_call_rcu = sock_flag(sk, SOCK_RCU_FREE);
+ WARN_ON_ONCE(!llist_empty(&sk->defer_list));
+ sk_defer_free_flush(sk);
+
if (rcu_access_pointer(sk->sk_reuseport_cb)) {
reuseport_detach_sock(sk);
use_call_rcu = true;
#include <linux/init.h>
#include <linux/slab.h>
#include <linux/netlink.h>
+#include <linux/hash.h>
#include <net/arp.h>
#include <net/ip.h>
static struct hlist_head *fib_info_hash;
static struct hlist_head *fib_info_laddrhash;
static unsigned int fib_info_hash_size;
+static unsigned int fib_info_hash_bits;
static unsigned int fib_info_cnt;
#define DEVINDEX_HASHBITS 8
pr_warn("Freeing alive fib_info %p\n", fi);
return;
}
- fib_info_cnt--;
call_rcu(&fi->rcu, free_fib_info_rcu);
}
spin_lock_bh(&fib_info_lock);
if (fi && refcount_dec_and_test(&fi->fib_treeref)) {
hlist_del(&fi->fib_hash);
+
+ /* Paired with READ_ONCE() in fib_create_info(). */
+ WRITE_ONCE(fib_info_cnt, fib_info_cnt - 1);
+
if (fi->fib_prefsrc)
hlist_del(&fi->fib_lhash);
if (fi->nh) {
static inline unsigned int fib_devindex_hashfn(unsigned int val)
{
- unsigned int mask = DEVINDEX_HASHSIZE - 1;
+ return hash_32(val, DEVINDEX_HASHBITS);
+}
+
+static struct hlist_head *
+fib_info_devhash_bucket(const struct net_device *dev)
+{
+ u32 val = net_hash_mix(dev_net(dev)) ^ dev->ifindex;
- return (val ^
- (val >> DEVINDEX_HASHBITS) ^
- (val >> (DEVINDEX_HASHBITS * 2))) & mask;
+ return &fib_info_devhash[fib_devindex_hashfn(val)];
}
static unsigned int fib_info_hashfn_1(int init_val, u8 protocol, u8 scope,
{
struct hlist_head *head;
struct fib_nh *nh;
- unsigned int hash;
spin_lock(&fib_info_lock);
- hash = fib_devindex_hashfn(dev->ifindex);
- head = &fib_info_devhash[hash];
+ head = fib_info_devhash_bucket(dev);
+
hlist_for_each_entry(nh, head, nh_hash) {
if (nh->fib_nh_dev == dev &&
nh->fib_nh_gw4 == gw &&
return err;
}
-static inline unsigned int fib_laddr_hashfn(__be32 val)
+static struct hlist_head *
+fib_info_laddrhash_bucket(const struct net *net, __be32 val)
{
- unsigned int mask = (fib_info_hash_size - 1);
+ u32 slot = hash_32(net_hash_mix(net) ^ (__force u32)val,
+ fib_info_hash_bits);
- return ((__force u32)val ^
- ((__force u32)val >> 7) ^
- ((__force u32)val >> 14)) & mask;
+ return &fib_info_laddrhash[slot];
}
static struct hlist_head *fib_info_hash_alloc(int bytes)
old_info_hash = fib_info_hash;
old_laddrhash = fib_info_laddrhash;
fib_info_hash_size = new_size;
+ fib_info_hash_bits = ilog2(new_size);
for (i = 0; i < old_size; i++) {
struct hlist_head *head = &fib_info_hash[i];
}
fib_info_hash = new_info_hash;
+ fib_info_laddrhash = new_laddrhash;
for (i = 0; i < old_size; i++) {
- struct hlist_head *lhead = &fib_info_laddrhash[i];
+ struct hlist_head *lhead = &old_laddrhash[i];
struct hlist_node *n;
struct fib_info *fi;
hlist_for_each_entry_safe(fi, n, lhead, fib_lhash) {
struct hlist_head *ldest;
- unsigned int new_hash;
- new_hash = fib_laddr_hashfn(fi->fib_prefsrc);
- ldest = &new_laddrhash[new_hash];
+ ldest = fib_info_laddrhash_bucket(fi->fib_net,
+ fi->fib_prefsrc);
hlist_add_head(&fi->fib_lhash, ldest);
}
}
- fib_info_laddrhash = new_laddrhash;
spin_unlock_bh(&fib_info_lock);
#endif
err = -ENOBUFS;
- if (fib_info_cnt >= fib_info_hash_size) {
+
+ /* Paired with WRITE_ONCE() in fib_release_info() */
+ if (READ_ONCE(fib_info_cnt) >= fib_info_hash_size) {
unsigned int new_size = fib_info_hash_size << 1;
struct hlist_head *new_info_hash;
struct hlist_head *new_laddrhash;
return ERR_PTR(err);
}
- fib_info_cnt++;
fi->fib_net = net;
fi->fib_protocol = cfg->fc_protocol;
fi->fib_scope = cfg->fc_scope;
refcount_set(&fi->fib_treeref, 1);
refcount_set(&fi->fib_clntref, 1);
spin_lock_bh(&fib_info_lock);
+ fib_info_cnt++;
hlist_add_head(&fi->fib_hash,
&fib_info_hash[fib_info_hashfn(fi)]);
if (fi->fib_prefsrc) {
struct hlist_head *head;
- head = &fib_info_laddrhash[fib_laddr_hashfn(fi->fib_prefsrc)];
+ head = fib_info_laddrhash_bucket(net, fi->fib_prefsrc);
hlist_add_head(&fi->fib_lhash, head);
}
if (fi->nh) {
} else {
change_nexthops(fi) {
struct hlist_head *head;
- unsigned int hash;
if (!nexthop_nh->fib_nh_dev)
continue;
- hash = fib_devindex_hashfn(nexthop_nh->fib_nh_dev->ifindex);
- head = &fib_info_devhash[hash];
+ head = fib_info_devhash_bucket(nexthop_nh->fib_nh_dev);
hlist_add_head(&nexthop_nh->nh_hash, head);
} endfor_nexthops(fi)
}
*/
int fib_sync_down_addr(struct net_device *dev, __be32 local)
{
- int ret = 0;
- unsigned int hash = fib_laddr_hashfn(local);
- struct hlist_head *head = &fib_info_laddrhash[hash];
int tb_id = l3mdev_fib_table(dev) ? : RT_TABLE_MAIN;
struct net *net = dev_net(dev);
+ struct hlist_head *head;
struct fib_info *fi;
+ int ret = 0;
if (!fib_info_laddrhash || local == 0)
return 0;
+ head = fib_info_laddrhash_bucket(net, local);
hlist_for_each_entry(fi, head, fib_lhash) {
if (!net_eq(fi->fib_net, net) ||
fi->fib_tb_id != tb_id)
void fib_sync_mtu(struct net_device *dev, u32 orig_mtu)
{
- unsigned int hash = fib_devindex_hashfn(dev->ifindex);
- struct hlist_head *head = &fib_info_devhash[hash];
+ struct hlist_head *head = fib_info_devhash_bucket(dev);
struct fib_nh *nh;
hlist_for_each_entry(nh, head, nh_hash) {
*/
int fib_sync_down_dev(struct net_device *dev, unsigned long event, bool force)
{
- int ret = 0;
- int scope = RT_SCOPE_NOWHERE;
+ struct hlist_head *head = fib_info_devhash_bucket(dev);
struct fib_info *prev_fi = NULL;
- unsigned int hash = fib_devindex_hashfn(dev->ifindex);
- struct hlist_head *head = &fib_info_devhash[hash];
+ int scope = RT_SCOPE_NOWHERE;
struct fib_nh *nh;
+ int ret = 0;
if (force)
scope = -1;
int fib_sync_up(struct net_device *dev, unsigned char nh_flags)
{
struct fib_info *prev_fi;
- unsigned int hash;
struct hlist_head *head;
struct fib_nh *nh;
int ret;
}
prev_fi = NULL;
- hash = fib_devindex_hashfn(dev->ifindex);
- head = &fib_info_devhash[hash];
+ head = fib_info_devhash_bucket(dev);
ret = 0;
hlist_for_each_entry(nh, head, nh_hash) {
/* The RCU read lock provides a memory barrier
* guaranteeing that if fqdir->dead is false then
* the hash table destruction will not start until
- * after we unlock. Paired with inet_frags_exit_net().
+ * after we unlock. Paired with fqdir_pre_exit().
*/
- if (!fqdir->dead) {
+ if (!READ_ONCE(fqdir->dead)) {
rhashtable_remove_fast(&fqdir->rhashtable, &fq->node,
fqdir->f->rhash_params);
refcount_dec(&fq->refcnt);
/* TODO : call from rcu_read_lock() and no longer use refcount_inc_not_zero() */
struct inet_frag_queue *inet_frag_find(struct fqdir *fqdir, void *key)
{
+ /* This pairs with WRITE_ONCE() in fqdir_pre_exit(). */
+ long high_thresh = READ_ONCE(fqdir->high_thresh);
struct inet_frag_queue *fq = NULL, *prev;
- if (!fqdir->high_thresh || frag_mem_limit(fqdir) > fqdir->high_thresh)
+ if (!high_thresh || frag_mem_limit(fqdir) > high_thresh)
return NULL;
rcu_read_lock();
rcu_read_lock();
- if (qp->q.fqdir->dead)
+ /* Paired with WRITE_ONCE() in fqdir_pre_exit(). */
+ if (READ_ONCE(qp->q.fqdir->dead))
goto out_rcu_unlock;
spin_lock(&qp->q.lock);
key = &info->key;
ip_tunnel_init_flow(&fl4, IPPROTO_GRE, key->u.ipv4.dst, key->u.ipv4.src,
- tunnel_id_to_key32(key->tun_id), key->tos, 0,
- skb->mark, skb_get_hash(skb));
+ tunnel_id_to_key32(key->tun_id),
+ key->tos & ~INET_ECN_MASK, 0, skb->mark,
+ skb_get_hash(skb));
rt = ip_route_output_key(dev_net(dev), &fl4);
if (IS_ERR(rt))
return PTR_ERR(rt);
if (!ret) {
struct seq_file *sf = file->private_data;
- struct clusterip_config *c = PDE_DATA(inode);
+ struct clusterip_config *c = pde_data(inode);
sf->private = c;
static int clusterip_proc_release(struct inode *inode, struct file *file)
{
- struct clusterip_config *c = PDE_DATA(inode);
+ struct clusterip_config *c = pde_data(inode);
int ret;
ret = seq_release(inode, file);
static ssize_t clusterip_proc_write(struct file *file, const char __user *input,
size_t size, loff_t *ofs)
{
- struct clusterip_config *c = PDE_DATA(file_inode(file));
+ struct clusterip_config *c = pde_data(file_inode(file));
#define PROC_WRITELEN 10
char buffer[PROC_WRITELEN+1];
unsigned long nodenum;
static struct sock *raw_get_first(struct seq_file *seq)
{
struct sock *sk;
- struct raw_hashinfo *h = PDE_DATA(file_inode(seq->file));
+ struct raw_hashinfo *h = pde_data(file_inode(seq->file));
struct raw_iter_state *state = raw_seq_private(seq);
for (state->bucket = 0; state->bucket < RAW_HTABLE_SIZE;
static struct sock *raw_get_next(struct seq_file *seq, struct sock *sk)
{
- struct raw_hashinfo *h = PDE_DATA(file_inode(seq->file));
+ struct raw_hashinfo *h = pde_data(file_inode(seq->file));
struct raw_iter_state *state = raw_seq_private(seq);
do {
void *raw_seq_start(struct seq_file *seq, loff_t *pos)
__acquires(&h->lock)
{
- struct raw_hashinfo *h = PDE_DATA(file_inode(seq->file));
+ struct raw_hashinfo *h = pde_data(file_inode(seq->file));
read_lock(&h->lock);
return *pos ? raw_get_idx(seq, *pos - 1) : SEQ_START_TOKEN;
void raw_seq_stop(struct seq_file *seq, void *v)
__releases(&h->lock)
{
- struct raw_hashinfo *h = PDE_DATA(file_inode(seq->file));
+ struct raw_hashinfo *h = pde_data(file_inode(seq->file));
read_unlock(&h->lock);
}
#endif
/* Iterated from proc fs */
- afinfo = PDE_DATA(file_inode(seq->file));
+ afinfo = pde_data(file_inode(seq->file));
return afinfo->family;
}
if (state->bpf_seq_afinfo)
afinfo = state->bpf_seq_afinfo;
else
- afinfo = PDE_DATA(file_inode(seq->file));
+ afinfo = pde_data(file_inode(seq->file));
for (state->bucket = start; state->bucket <= afinfo->udp_table->mask;
++state->bucket) {
if (state->bpf_seq_afinfo)
afinfo = state->bpf_seq_afinfo;
else
- afinfo = PDE_DATA(file_inode(seq->file));
+ afinfo = pde_data(file_inode(seq->file));
do {
sk = sk_next(sk);
if (state->bpf_seq_afinfo)
afinfo = state->bpf_seq_afinfo;
else
- afinfo = PDE_DATA(file_inode(seq->file));
+ afinfo = pde_data(file_inode(seq->file));
if (state->bucket <= afinfo->udp_table->mask)
spin_unlock_bh(&afinfo->udp_table->hash[state->bucket].lock);
dst_cache_set_ip4(&tunnel->dst_cache, &rt->dst, fl4.saddr);
}
- if (rt->rt_type != RTN_UNICAST) {
+ if (rt->rt_type != RTN_UNICAST && rt->rt_type != RTN_LOCAL) {
ip_rt_put(rt);
dev->stats.tx_carrier_errors++;
goto tx_error_icmp;
struct mctp_test_route **rtp,
struct socket **sockp)
{
- struct sockaddr_mctp addr;
+ struct sockaddr_mctp addr = {0};
struct mctp_test_route *rt;
struct mctp_test_dev *dev;
struct socket *sock;
bitmap = &ncf->bitmap;
spin_lock_irqsave(&nc->lock, flags);
- index = find_next_bit(bitmap, ncf->n_vids, 0);
+ index = find_first_bit(bitmap, ncf->n_vids);
if (index >= ncf->n_vids) {
spin_unlock_irqrestore(&nc->lock, flags);
return -1;
return -1;
}
- index = find_next_zero_bit(bitmap, ncf->n_vids, 0);
+ index = find_first_zero_bit(bitmap, ncf->n_vids);
if (index < 0 || index >= ncf->n_vids) {
netdev_err(ndp->ndev.dev,
"Channel %u already has all VLAN filters set\n",
struct nft_connlimit *priv_src = nft_expr_priv(src);
priv_dst->list = kmalloc(sizeof(*priv_dst->list), GFP_ATOMIC);
- if (priv_dst->list)
+ if (!priv_dst->list)
return -ENOMEM;
nf_conncount_list_init(priv_dst->list);
struct nft_last_priv *priv_dst = nft_expr_priv(dst);
priv_dst->last = kzalloc(sizeof(*priv_dst->last), GFP_ATOMIC);
- if (priv_dst->last)
+ if (!priv_dst->last)
return -ENOMEM;
return 0;
priv_dst->invert = priv_src->invert;
priv_dst->limit = kmalloc(sizeof(*priv_dst->limit), GFP_ATOMIC);
- if (priv_dst->limit)
+ if (!priv_dst->limit)
return -ENOMEM;
spin_lock_init(&priv_dst->limit->lock);
struct nft_quota *priv_dst = nft_expr_priv(dst);
priv_dst->consumed = kmalloc(sizeof(*priv_dst->consumed), GFP_ATOMIC);
- if (priv_dst->consumed)
+ if (!priv_dst->consumed)
return -ENOMEM;
atomic64_set(priv_dst->consumed, 0);
#ifdef CONFIG_PROC_FS
static void *xt_table_seq_start(struct seq_file *seq, loff_t *pos)
{
- u8 af = (unsigned long)PDE_DATA(file_inode(seq->file));
+ u8 af = (unsigned long)pde_data(file_inode(seq->file));
struct net *net = seq_file_net(seq);
struct xt_pernet *xt_net;
static void *xt_table_seq_next(struct seq_file *seq, void *v, loff_t *pos)
{
- u8 af = (unsigned long)PDE_DATA(file_inode(seq->file));
+ u8 af = (unsigned long)pde_data(file_inode(seq->file));
struct net *net = seq_file_net(seq);
struct xt_pernet *xt_net;
static void xt_table_seq_stop(struct seq_file *seq, void *v)
{
- u_int8_t af = (unsigned long)PDE_DATA(file_inode(seq->file));
+ u_int8_t af = (unsigned long)pde_data(file_inode(seq->file));
mutex_unlock(&xt[af].mutex);
}
[MTTG_TRAV_NFP_UNSPEC] = MTTG_TRAV_NFP_SPEC,
[MTTG_TRAV_NFP_SPEC] = MTTG_TRAV_DONE,
};
- uint8_t nfproto = (unsigned long)PDE_DATA(file_inode(seq->file));
+ uint8_t nfproto = (unsigned long)pde_data(file_inode(seq->file));
struct nf_mttg_trav *trav = seq->private;
if (ppos != NULL)
static void xt_mttg_seq_stop(struct seq_file *seq, void *v)
{
- uint8_t nfproto = (unsigned long)PDE_DATA(file_inode(seq->file));
+ uint8_t nfproto = (unsigned long)pde_data(file_inode(seq->file));
struct nf_mttg_trav *trav = seq->private;
switch (trav->class) {
static void *dl_seq_start(struct seq_file *s, loff_t *pos)
__acquires(htable->lock)
{
- struct xt_hashlimit_htable *htable = PDE_DATA(file_inode(s->file));
+ struct xt_hashlimit_htable *htable = pde_data(file_inode(s->file));
unsigned int *bucket;
spin_lock_bh(&htable->lock);
static void *dl_seq_next(struct seq_file *s, void *v, loff_t *pos)
{
- struct xt_hashlimit_htable *htable = PDE_DATA(file_inode(s->file));
+ struct xt_hashlimit_htable *htable = pde_data(file_inode(s->file));
unsigned int *bucket = v;
*pos = ++(*bucket);
static void dl_seq_stop(struct seq_file *s, void *v)
__releases(htable->lock)
{
- struct xt_hashlimit_htable *htable = PDE_DATA(file_inode(s->file));
+ struct xt_hashlimit_htable *htable = pde_data(file_inode(s->file));
unsigned int *bucket = v;
if (!IS_ERR(bucket))
static int dl_seq_real_show_v2(struct dsthash_ent *ent, u_int8_t family,
struct seq_file *s)
{
- struct xt_hashlimit_htable *ht = PDE_DATA(file_inode(s->file));
+ struct xt_hashlimit_htable *ht = pde_data(file_inode(s->file));
spin_lock(&ent->lock);
/* recalculate to show accurate numbers */
static int dl_seq_real_show_v1(struct dsthash_ent *ent, u_int8_t family,
struct seq_file *s)
{
- struct xt_hashlimit_htable *ht = PDE_DATA(file_inode(s->file));
+ struct xt_hashlimit_htable *ht = pde_data(file_inode(s->file));
spin_lock(&ent->lock);
/* recalculate to show accurate numbers */
static int dl_seq_real_show(struct dsthash_ent *ent, u_int8_t family,
struct seq_file *s)
{
- struct xt_hashlimit_htable *ht = PDE_DATA(file_inode(s->file));
+ struct xt_hashlimit_htable *ht = pde_data(file_inode(s->file));
spin_lock(&ent->lock);
/* recalculate to show accurate numbers */
static int dl_seq_show_v2(struct seq_file *s, void *v)
{
- struct xt_hashlimit_htable *htable = PDE_DATA(file_inode(s->file));
+ struct xt_hashlimit_htable *htable = pde_data(file_inode(s->file));
unsigned int *bucket = (unsigned int *)v;
struct dsthash_ent *ent;
static int dl_seq_show_v1(struct seq_file *s, void *v)
{
- struct xt_hashlimit_htable *htable = PDE_DATA(file_inode(s->file));
+ struct xt_hashlimit_htable *htable = pde_data(file_inode(s->file));
unsigned int *bucket = v;
struct dsthash_ent *ent;
static int dl_seq_show(struct seq_file *s, void *v)
{
- struct xt_hashlimit_htable *htable = PDE_DATA(file_inode(s->file));
+ struct xt_hashlimit_htable *htable = pde_data(file_inode(s->file));
unsigned int *bucket = v;
struct dsthash_ent *ent;
if (st == NULL)
return -ENOMEM;
- st->table = PDE_DATA(inode);
+ st->table = pde_data(inode);
return 0;
}
recent_mt_proc_write(struct file *file, const char __user *input,
size_t size, loff_t *loff)
{
- struct recent_table *t = PDE_DATA(file_inode(file));
+ struct recent_table *t = pde_data(file_inode(file));
struct recent_entry *e;
char buf[sizeof("+b335:1d35:1e55:dead:c0de:1715:5afe:c0de")];
const char *c = buf;
lock_sock(sk);
+ if (!llcp_sock->local) {
+ release_sock(sk);
+ return -ENODEV;
+ }
+
if (sk->sk_type == SOCK_DGRAM) {
DECLARE_SOCKADDR(struct sockaddr_nfc_llcp *, addr,
msg->msg_name);
qdisc_offload_graft_root(dev, new, old, extack);
- if (new && new->ops->attach)
+ if (new && new->ops->attach && !ingress)
goto skip;
for (i = 0; i < num_q; i++) {
{
memset(r, 0, sizeof(*r));
r->overhead = conf->overhead;
+ r->mpu = conf->mpu;
r->rate_bytes_ps = max_t(u64, conf->rate, rate64);
r->linklayer = (conf->linklayer & TC_LINKLAYER_MASK);
psched_ratecfg_precompute__(r->rate_bytes_ps, &r->mult, &r->shift);
{
struct smc_connection *conn = &smc->conn;
struct smc_link_group *lgr = conn->lgr;
+ bool lgr_valid = false;
+
+ if (smc_conn_lgr_valid(conn))
+ lgr_valid = true;
smc_conn_free(conn);
- if (local_first)
+ if (local_first && lgr_valid)
smc_lgr_cleanup_early(lgr);
}
*/
u64 peer_token; /* SMC-D token of peer */
u8 killed : 1; /* abnormal termination */
+ u8 freed : 1; /* normal termiation */
u8 out_of_sync : 1; /* out of sync with peer */
};
{
int rc;
- if (!conn->lgr || (conn->lgr->is_smcd && conn->lgr->peer_shutdown))
+ if (!smc_conn_lgr_valid(conn) ||
+ (conn->lgr->is_smcd && conn->lgr->peer_shutdown))
return -EPIPE;
if (conn->lgr->is_smcd) {
dclc.os_type = version == SMC_V1 ? 0 : SMC_CLC_OS_LINUX;
dclc.hdr.typev2 = (peer_diag_info == SMC_CLC_DECL_SYNCERR) ?
SMC_FIRST_CONTACT_MASK : 0;
- if ((!smc->conn.lgr || !smc->conn.lgr->is_smcd) &&
+ if ((!smc_conn_lgr_valid(&smc->conn) || !smc->conn.lgr->is_smcd) &&
smc_ib_is_valid_local_systemid())
memcpy(dclc.id_for_peer, local_systemid,
sizeof(local_systemid));
{
struct smc_link_group *lgr = conn->lgr;
- if (!lgr)
+ if (!smc_conn_lgr_valid(conn))
return;
write_lock_bh(&lgr->conns_lock);
if (conn->alert_token_local) {
__smc_lgr_unregister_conn(conn);
}
write_unlock_bh(&lgr->conns_lock);
- conn->lgr = NULL;
}
int smc_nl_get_sys_info(struct sk_buff *skb, struct netlink_callback *cb)
}
get_device(&lnk->smcibdev->ibdev->dev);
atomic_inc(&lnk->smcibdev->lnk_cnt);
+ refcount_set(&lnk->refcnt, 1); /* link refcnt is set to 1 */
+ lnk->clearing = 0;
lnk->path_mtu = lnk->smcibdev->pattr[lnk->ibport - 1].active_mtu;
lnk->link_id = smcr_next_link_id(lgr);
lnk->lgr = lgr;
+ smc_lgr_hold(lgr); /* lgr_put in smcr_link_clear() */
lnk->link_idx = link_idx;
smc_ibdev_cnt_inc(lnk);
smcr_copy_dev_info_to_link(lnk);
lnk->state = SMC_LNK_UNUSED;
if (!atomic_dec_return(&smcibdev->lnk_cnt))
wake_up(&smcibdev->lnks_deleted);
+ smc_lgr_put(lgr); /* lgr_hold above */
return rc;
}
lgr->terminating = 0;
lgr->freeing = 0;
lgr->vlan_id = ini->vlan_id;
+ refcount_set(&lgr->refcnt, 1); /* set lgr refcnt to 1 */
mutex_init(&lgr->sndbufs_lock);
mutex_init(&lgr->rmbs_lock);
rwlock_init(&lgr->conns_lock);
struct smc_link *to_lnk)
{
atomic_dec(&conn->lnk->conn_cnt);
+ /* link_hold in smc_conn_create() */
+ smcr_link_put(conn->lnk);
conn->lnk = to_lnk;
atomic_inc(&conn->lnk->conn_cnt);
+ /* link_put in smc_conn_free() */
+ smcr_link_hold(conn->lnk);
}
struct smc_link *smc_switch_conns(struct smc_link_group *lgr,
{
struct smc_link_group *lgr = conn->lgr;
- if (!lgr)
+ if (!lgr || conn->freed)
+ /* Connection has never been registered in a
+ * link group, or has already been freed.
+ */
return;
+
+ conn->freed = 1;
+ if (!smc_conn_lgr_valid(conn))
+ /* Connection has already unregistered from
+ * link group.
+ */
+ goto lgr_put;
+
if (lgr->is_smcd) {
if (!list_empty(&lgr->list))
smc_ism_unset_conn(conn);
if (!lgr->conns_num)
smc_lgr_schedule_free_work(lgr);
+lgr_put:
+ if (!lgr->is_smcd)
+ smcr_link_put(conn->lnk); /* link_hold in smc_conn_create() */
+ smc_lgr_put(lgr); /* lgr_hold in smc_conn_create() */
}
/* unregister a link from a buf_desc */
}
}
-/* must be called under lgr->llc_conf_mutex lock */
-void smcr_link_clear(struct smc_link *lnk, bool log)
+static void __smcr_link_clear(struct smc_link *lnk)
{
+ struct smc_link_group *lgr = lnk->lgr;
struct smc_ib_device *smcibdev;
- if (!lnk->lgr || lnk->state == SMC_LNK_UNUSED)
+ smc_wr_free_link_mem(lnk);
+ smc_ibdev_cnt_dec(lnk);
+ put_device(&lnk->smcibdev->ibdev->dev);
+ smcibdev = lnk->smcibdev;
+ memset(lnk, 0, sizeof(struct smc_link));
+ lnk->state = SMC_LNK_UNUSED;
+ if (!atomic_dec_return(&smcibdev->lnk_cnt))
+ wake_up(&smcibdev->lnks_deleted);
+ smc_lgr_put(lgr); /* lgr_hold in smcr_link_init() */
+}
+
+/* must be called under lgr->llc_conf_mutex lock */
+void smcr_link_clear(struct smc_link *lnk, bool log)
+{
+ if (!lnk->lgr || lnk->clearing ||
+ lnk->state == SMC_LNK_UNUSED)
return;
+ lnk->clearing = 1;
lnk->peer_qpn = 0;
smc_llc_link_clear(lnk, log);
smcr_buf_unmap_lgr(lnk);
smc_wr_free_link(lnk);
smc_ib_destroy_queue_pair(lnk);
smc_ib_dealloc_protection_domain(lnk);
- smc_wr_free_link_mem(lnk);
- smc_ibdev_cnt_dec(lnk);
- put_device(&lnk->smcibdev->ibdev->dev);
- smcibdev = lnk->smcibdev;
- memset(lnk, 0, sizeof(struct smc_link));
- lnk->state = SMC_LNK_UNUSED;
- if (!atomic_dec_return(&smcibdev->lnk_cnt))
- wake_up(&smcibdev->lnks_deleted);
+ smcr_link_put(lnk); /* theoretically last link_put */
+}
+
+void smcr_link_hold(struct smc_link *lnk)
+{
+ refcount_inc(&lnk->refcnt);
+}
+
+void smcr_link_put(struct smc_link *lnk)
+{
+ if (refcount_dec_and_test(&lnk->refcnt))
+ __smcr_link_clear(lnk);
}
static void smcr_buf_free(struct smc_link_group *lgr, bool is_rmb,
__smc_lgr_free_bufs(lgr, true);
}
+/* won't be freed until no one accesses to lgr anymore */
+static void __smc_lgr_free(struct smc_link_group *lgr)
+{
+ smc_lgr_free_bufs(lgr);
+ if (lgr->is_smcd) {
+ if (!atomic_dec_return(&lgr->smcd->lgr_cnt))
+ wake_up(&lgr->smcd->lgrs_deleted);
+ } else {
+ smc_wr_free_lgr_mem(lgr);
+ if (!atomic_dec_return(&lgr_cnt))
+ wake_up(&lgrs_deleted);
+ }
+ kfree(lgr);
+}
+
/* remove a link group */
static void smc_lgr_free(struct smc_link_group *lgr)
{
smc_llc_lgr_clear(lgr);
}
- smc_lgr_free_bufs(lgr);
destroy_workqueue(lgr->tx_wq);
if (lgr->is_smcd) {
smc_ism_put_vlan(lgr->smcd, lgr->vlan_id);
put_device(&lgr->smcd->dev);
- if (!atomic_dec_return(&lgr->smcd->lgr_cnt))
- wake_up(&lgr->smcd->lgrs_deleted);
- } else {
- smc_wr_free_lgr_mem(lgr);
- if (!atomic_dec_return(&lgr_cnt))
- wake_up(&lgrs_deleted);
}
- kfree(lgr);
+ smc_lgr_put(lgr); /* theoretically last lgr_put */
+}
+
+void smc_lgr_hold(struct smc_link_group *lgr)
+{
+ refcount_inc(&lgr->refcnt);
+}
+
+void smc_lgr_put(struct smc_link_group *lgr)
+{
+ if (refcount_dec_and_test(&lgr->refcnt))
+ __smc_lgr_free(lgr);
}
static void smc_sk_wake_ups(struct smc_sock *smc)
/* Called when an SMCR device is removed or the smc module is unloaded.
* If smcibdev is given, all SMCR link groups using this device are terminated.
* If smcibdev is NULL, all SMCR link groups are terminated.
- *
- * We must wait here for QPs been destroyed before we destroy the CQs,
- * or we won't received any CQEs and cdc_pend_tx_wr cannot reach 0 thus
- * smc_sock cannot be released.
*/
void smc_smcr_terminate_all(struct smc_ib_device *smcibdev)
{
struct smc_link_group *lgr, *lg;
LIST_HEAD(lgr_free_list);
- LIST_HEAD(lgr_linkdown_list);
int i;
spin_lock_bh(&smc_lgr_list.lock);
list_for_each_entry_safe(lgr, lg, &smc_lgr_list.list, list) {
for (i = 0; i < SMC_LINKS_PER_LGR_MAX; i++) {
if (lgr->lnk[i].smcibdev == smcibdev)
- list_move_tail(&lgr->list, &lgr_linkdown_list);
+ smcr_link_down_cond_sched(&lgr->lnk[i]);
}
}
}
__smc_lgr_terminate(lgr, false);
}
- list_for_each_entry_safe(lgr, lg, &lgr_linkdown_list, list) {
- for (i = 0; i < SMC_LINKS_PER_LGR_MAX; i++) {
- if (lgr->lnk[i].smcibdev == smcibdev) {
- mutex_lock(&lgr->llc_conf_mutex);
- smcr_link_down_cond(&lgr->lnk[i]);
- mutex_unlock(&lgr->llc_conf_mutex);
- }
- }
- }
-
if (smcibdev) {
if (atomic_read(&smcibdev->lnk_cnt))
wait_event(smcibdev->lnks_deleted,
goto out;
}
}
+ smc_lgr_hold(conn->lgr); /* lgr_put in smc_conn_free() */
+ if (!conn->lgr->is_smcd)
+ smcr_link_hold(conn->lnk); /* link_put in smc_conn_free() */
+ conn->freed = 0;
conn->local_tx_ctrl.common.type = SMC_CDC_MSG_TYPE;
conn->local_tx_ctrl.len = SMC_WR_TX_SIZE;
conn->urg_state = SMC_URG_READ;
void smc_sndbuf_sync_sg_for_cpu(struct smc_connection *conn)
{
- if (!conn->lgr || conn->lgr->is_smcd || !smc_link_active(conn->lnk))
+ if (!smc_conn_lgr_valid(conn) || conn->lgr->is_smcd ||
+ !smc_link_active(conn->lnk))
return;
smc_ib_sync_sg_for_cpu(conn->lnk, conn->sndbuf_desc, DMA_TO_DEVICE);
}
void smc_sndbuf_sync_sg_for_device(struct smc_connection *conn)
{
- if (!conn->lgr || conn->lgr->is_smcd || !smc_link_active(conn->lnk))
+ if (!smc_conn_lgr_valid(conn) || conn->lgr->is_smcd ||
+ !smc_link_active(conn->lnk))
return;
smc_ib_sync_sg_for_device(conn->lnk, conn->sndbuf_desc, DMA_TO_DEVICE);
}
{
int i;
- if (!conn->lgr || conn->lgr->is_smcd)
+ if (!smc_conn_lgr_valid(conn) || conn->lgr->is_smcd)
return;
for (i = 0; i < SMC_LINKS_PER_LGR_MAX; i++) {
if (!smc_link_active(&conn->lgr->lnk[i]))
{
int i;
- if (!conn->lgr || conn->lgr->is_smcd)
+ if (!smc_conn_lgr_valid(conn) || conn->lgr->is_smcd)
return;
for (i = 0; i < SMC_LINKS_PER_LGR_MAX; i++) {
if (!smc_link_active(&conn->lgr->lnk[i]))
u8 peer_link_uid[SMC_LGR_ID_SIZE]; /* peer uid */
u8 link_idx; /* index in lgr link array */
u8 link_is_asym; /* is link asymmetric? */
+ u8 clearing : 1; /* link is being cleared */
+ refcount_t refcnt; /* link reference count */
struct smc_link_group *lgr; /* parent link group */
struct work_struct link_down_wrk; /* wrk to bring link down */
char ibname[IB_DEVICE_NAME_MAX]; /* ib device name */
u8 terminating : 1;/* lgr is terminating */
u8 freeing : 1; /* lgr is being freed */
+ refcount_t refcnt; /* lgr reference count */
bool is_smcd; /* SMC-R or SMC-D */
u8 smc_version;
u8 negotiated_eid[SMC_MAX_EID_LEN];
return res;
}
+static inline bool smc_conn_lgr_valid(struct smc_connection *conn)
+{
+ return conn->lgr && conn->alert_token_local;
+}
+
/*
* Returns true if the specified link is usable.
*
void smc_lgr_cleanup_early(struct smc_link_group *lgr);
void smc_lgr_terminate_sched(struct smc_link_group *lgr);
+void smc_lgr_hold(struct smc_link_group *lgr);
+void smc_lgr_put(struct smc_link_group *lgr);
void smcr_port_add(struct smc_ib_device *smcibdev, u8 ibport);
void smcr_port_err(struct smc_ib_device *smcibdev, u8 ibport);
void smc_smcd_terminate(struct smcd_dev *dev, u64 peer_gid,
int smcr_link_init(struct smc_link_group *lgr, struct smc_link *lnk,
u8 link_idx, struct smc_init_info *ini);
void smcr_link_clear(struct smc_link *lnk, bool log);
+void smcr_link_hold(struct smc_link *lnk);
+void smcr_link_put(struct smc_link *lnk);
void smc_switch_link_and_count(struct smc_connection *conn,
struct smc_link *to_lnk);
int smcr_buf_map_lgr(struct smc_link *lnk);
r->diag_state = sk->sk_state;
if (smc->use_fallback)
r->diag_mode = SMC_DIAG_MODE_FALLBACK_TCP;
- else if (smc->conn.lgr && smc->conn.lgr->is_smcd)
+ else if (smc_conn_lgr_valid(&smc->conn) && smc->conn.lgr->is_smcd)
r->diag_mode = SMC_DIAG_MODE_SMCD;
else
r->diag_mode = SMC_DIAG_MODE_SMCR;
goto errout;
}
- if (smc->conn.lgr && !smc->conn.lgr->is_smcd &&
+ if (smc_conn_lgr_valid(&smc->conn) && !smc->conn.lgr->is_smcd &&
(req->diag_ext & (1 << (SMC_DIAG_LGRINFO - 1))) &&
!list_empty(&smc->conn.lgr->list)) {
struct smc_link *link = smc->conn.lnk;
if (nla_put(skb, SMC_DIAG_LGRINFO, sizeof(linfo), &linfo) < 0)
goto errout;
}
- if (smc->conn.lgr && smc->conn.lgr->is_smcd &&
+ if (smc_conn_lgr_valid(&smc->conn) && smc->conn.lgr->is_smcd &&
(req->diag_ext & (1 << (SMC_DIAG_DMBINFO - 1))) &&
!list_empty(&smc->conn.lgr->list)) {
struct smc_connection *conn = &smc->conn;
memcpy(new_pe->pnet_name, pnet_name, SMC_MAX_PNETID_LEN);
strncpy(new_pe->eth_name, eth_name, IFNAMSIZ);
new_pe->ndev = ndev;
- netdev_tracker_alloc(ndev, &new_pe->dev_tracker, GFP_KERNEL);
+ if (ndev)
+ netdev_tracker_alloc(ndev, &new_pe->dev_tracker, GFP_KERNEL);
rc = -EEXIST;
new_netdev = true;
write_lock(&pnettable->lock);
int smc_wr_tx_send_wait(struct smc_link *link, struct smc_wr_tx_pend_priv *priv,
unsigned long timeout);
void smc_wr_tx_cq_handler(struct ib_cq *ib_cq, void *cq_context);
-void smc_wr_tx_dismiss_slots(struct smc_link *lnk, u8 wr_rx_hdr_type,
- smc_wr_tx_filter filter,
- smc_wr_tx_dismisser dismisser,
- unsigned long data);
void smc_wr_tx_wait_no_pending_sends(struct smc_link *link);
int smc_wr_rx_register_handler(struct smc_wr_rx_handler *handler);
static ssize_t write_gssp(struct file *file, const char __user *buf,
size_t count, loff_t *ppos)
{
- struct net *net = PDE_DATA(file_inode(file));
+ struct net *net = pde_data(file_inode(file));
char tbuf[20];
unsigned long i;
int res;
static ssize_t read_gssp(struct file *file, char __user *buf,
size_t count, loff_t *ppos)
{
- struct net *net = PDE_DATA(file_inode(file));
+ struct net *net = pde_data(file_inode(file));
struct sunrpc_net *sn = net_generic(net, sunrpc_net_id);
unsigned long p = *ppos;
char tbuf[10];
static ssize_t cache_read_procfs(struct file *filp, char __user *buf,
size_t count, loff_t *ppos)
{
- struct cache_detail *cd = PDE_DATA(file_inode(filp));
+ struct cache_detail *cd = pde_data(file_inode(filp));
return cache_read(filp, buf, count, ppos, cd);
}
static ssize_t cache_write_procfs(struct file *filp, const char __user *buf,
size_t count, loff_t *ppos)
{
- struct cache_detail *cd = PDE_DATA(file_inode(filp));
+ struct cache_detail *cd = pde_data(file_inode(filp));
return cache_write(filp, buf, count, ppos, cd);
}
static __poll_t cache_poll_procfs(struct file *filp, poll_table *wait)
{
- struct cache_detail *cd = PDE_DATA(file_inode(filp));
+ struct cache_detail *cd = pde_data(file_inode(filp));
return cache_poll(filp, wait, cd);
}
unsigned int cmd, unsigned long arg)
{
struct inode *inode = file_inode(filp);
- struct cache_detail *cd = PDE_DATA(inode);
+ struct cache_detail *cd = pde_data(inode);
return cache_ioctl(inode, filp, cmd, arg, cd);
}
static int cache_open_procfs(struct inode *inode, struct file *filp)
{
- struct cache_detail *cd = PDE_DATA(inode);
+ struct cache_detail *cd = pde_data(inode);
return cache_open(inode, filp, cd);
}
static int cache_release_procfs(struct inode *inode, struct file *filp)
{
- struct cache_detail *cd = PDE_DATA(inode);
+ struct cache_detail *cd = pde_data(inode);
return cache_release(inode, filp, cd);
}
static int content_open_procfs(struct inode *inode, struct file *filp)
{
- struct cache_detail *cd = PDE_DATA(inode);
+ struct cache_detail *cd = pde_data(inode);
return content_open(inode, filp, cd);
}
static int content_release_procfs(struct inode *inode, struct file *filp)
{
- struct cache_detail *cd = PDE_DATA(inode);
+ struct cache_detail *cd = pde_data(inode);
return content_release(inode, filp, cd);
}
static int open_flush_procfs(struct inode *inode, struct file *filp)
{
- struct cache_detail *cd = PDE_DATA(inode);
+ struct cache_detail *cd = pde_data(inode);
return open_flush(inode, filp, cd);
}
static int release_flush_procfs(struct inode *inode, struct file *filp)
{
- struct cache_detail *cd = PDE_DATA(inode);
+ struct cache_detail *cd = pde_data(inode);
return release_flush(inode, filp, cd);
}
static ssize_t read_flush_procfs(struct file *filp, char __user *buf,
size_t count, loff_t *ppos)
{
- struct cache_detail *cd = PDE_DATA(file_inode(filp));
+ struct cache_detail *cd = pde_data(file_inode(filp));
return read_flush(filp, buf, count, ppos, cd);
}
const char __user *buf,
size_t count, loff_t *ppos)
{
- struct cache_detail *cd = PDE_DATA(file_inode(filp));
+ struct cache_detail *cd = pde_data(file_inode(filp));
return write_flush(filp, buf, count, ppos, cd);
}
static int rpc_proc_open(struct inode *inode, struct file *file)
{
- return single_open(file, rpc_proc_show, PDE_DATA(inode));
+ return single_open(file, rpc_proc_show, pde_data(inode));
}
static const struct proc_ops rpc_proc_ops = {
splice_read_end:
release_sock(sk);
+ sk_defer_free_flush(sk);
return copied ? : err;
}
{
/* If number of inflight sockets is insane,
* force a garbage collect right now.
+ * Paired with the WRITE_ONCE() in unix_inflight(),
+ * unix_notinflight() and gc_in_progress().
*/
- if (unix_tot_inflight > UNIX_INFLIGHT_TRIGGER_GC && !gc_in_progress)
+ if (READ_ONCE(unix_tot_inflight) > UNIX_INFLIGHT_TRIGGER_GC &&
+ !READ_ONCE(gc_in_progress))
unix_gc();
wait_event(unix_gc_wait, gc_in_progress == false);
}
if (gc_in_progress)
goto out;
- gc_in_progress = true;
+ /* Paired with READ_ONCE() in wait_for_unix_gc(). */
+ WRITE_ONCE(gc_in_progress, true);
+
/* First, select candidates for garbage collection. Only
* in-flight sockets are considered, and from those only ones
* which don't have any external reference.
/* All candidates should have been detached by now. */
BUG_ON(!list_empty(&gc_candidates));
- gc_in_progress = false;
+
+ /* Paired with READ_ONCE() in wait_for_unix_gc(). */
+ WRITE_ONCE(gc_in_progress, false);
+
wake_up(&unix_gc_wait);
out:
} else {
BUG_ON(list_empty(&u->link));
}
- unix_tot_inflight++;
+ /* Paired with READ_ONCE() in wait_for_unix_gc() */
+ WRITE_ONCE(unix_tot_inflight, unix_tot_inflight + 1);
}
user->unix_inflight++;
spin_unlock(&unix_gc_lock);
if (atomic_long_dec_and_test(&u->inflight))
list_del_init(&u->link);
- unix_tot_inflight--;
+ /* Paired with READ_ONCE() in wait_for_unix_gc() */
+ WRITE_ONCE(unix_tot_inflight, unix_tot_inflight - 1);
}
user->unix_inflight--;
spin_unlock(&unix_gc_lock);
echo 'unsigned int shipped_regdb_certs_len = sizeof(shipped_regdb_certs);'; \
) > $@
-$(obj)/extra-certs.c: $(CONFIG_CFG80211_EXTRA_REGDB_KEYDIR:"%"=%) \
- $(wildcard $(CONFIG_CFG80211_EXTRA_REGDB_KEYDIR:"%"=%)/*.x509)
+$(obj)/extra-certs.c: $(CONFIG_CFG80211_EXTRA_REGDB_KEYDI) \
+ $(wildcard $(CONFIG_CFG80211_EXTRA_REGDB_KEYDIR)/*.x509)
@$(kecho) " GEN $@"
$(Q)(set -e; \
allf=""; \
#include <linux/if_tunnel.h>
#include <net/dst.h>
#include <net/flow.h>
+#include <net/inet_ecn.h>
#include <net/xfrm.h>
#include <net/ip.h>
#include <net/gre.h>
fl4->flowi4_proto = iph->protocol;
fl4->daddr = reverse ? iph->saddr : iph->daddr;
fl4->saddr = reverse ? iph->daddr : iph->saddr;
- fl4->flowi4_tos = iph->tos;
+ fl4->flowi4_tos = iph->tos & ~INET_ECN_MASK;
if (!ip_is_fragment(iph)) {
switch (iph->protocol) {
/* taken from /sys/kernel/debug/tracing/events/sched/sched_switch/format */
struct sched_switch_args {
unsigned long long pad;
- char prev_comm[16];
+ char prev_comm[TASK_COMM_LEN];
int prev_pid;
int prev_prio;
long long prev_state;
- char next_comm[16];
+ char next_comm[TASK_COMM_LEN];
int next_pid;
int next_prio;
};
*/
#include <linux/version.h>
#include <linux/ptrace.h>
+#include <linux/sched.h>
#include <uapi/linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
{
struct signal_struct *signal;
struct task_struct *tsk;
- char oldcomm[16] = {};
- char newcomm[16] = {};
+ char oldcomm[TASK_COMM_LEN] = {};
+ char newcomm[TASK_COMM_LEN] = {};
u16 oom_score_adj;
u32 pid;
tsk = (void *)PT_REGS_PARM1(ctx);
pid = _(tsk->pid);
- bpf_probe_read_kernel(oldcomm, sizeof(oldcomm), &tsk->comm);
- bpf_probe_read_kernel(newcomm, sizeof(newcomm),
- (void *)PT_REGS_PARM2(ctx));
+ bpf_probe_read_kernel_str(oldcomm, sizeof(oldcomm), &tsk->comm);
+ bpf_probe_read_kernel_str(newcomm, sizeof(newcomm),
+ (void *)PT_REGS_PARM2(ctx));
signal = _(tsk->signal);
oom_score_adj = _(signal->oom_score_adj);
return 0;
* modify it under the terms of version 2 of the GNU General Public
* License as published by the Free Software Foundation.
*/
+#include <linux/sched.h>
#include <uapi/linux/bpf.h>
#include <bpf/bpf_helpers.h>
struct task_rename {
__u64 pad;
__u32 pid;
- char oldcomm[16];
- char newcomm[16];
+ char oldcomm[TASK_COMM_LEN];
+ char newcomm[TASK_COMM_LEN];
__u16 oom_score_adj;
};
SEC("tracepoint/task/task_rename")
# SPDX-License-Identifier: GPL-2.0-only
/asn1_compiler
/bin2c
-/extract-cert
/insert-sys-cert
/kallsyms
/module.lds
echo-why = $(call escsq, $(strip $(why)))
endif
-###############################################################################
-#
-# When a Kconfig string contains a filename, it is suitable for
-# passing to shell commands. It is surrounded by double-quotes, and
-# any double-quotes or backslashes within it are escaped by
-# backslashes.
-#
-# This is no use for dependencies or $(wildcard). We need to strip the
-# surrounding quotes and the escaping from quotes and backslashes, and
-# we *do* need to escape any spaces in the string. So, for example:
-#
-# Usage: $(eval $(call config_filename,FOO))
-#
-# Defines FOO_FILENAME based on the contents of the CONFIG_FOO option,
-# transformed as described above to be suitable for use within the
-# makefile.
-#
-# Also, if the filename is a relative filename and exists in the source
-# tree but not the build tree, define FOO_SRCPREFIX as $(srctree)/ to
-# be prefixed to *both* command invocation and dependencies.
-#
-# Note: We also print the filenames in the quiet_cmd_foo text, and
-# perhaps ought to have a version specially escaped for that purpose.
-# But it's only cosmetic, and $(patsubst "%",%,$(CONFIG_FOO)) is good
-# enough. It'll strip the quotes in the common case where there's no
-# space and it's a simple filename, and it'll retain the quotes when
-# there's a space. There are some esoteric cases in which it'll print
-# the wrong thing, but we don't really care. The actual dependencies
-# and commands *do* get it right, with various combinations of single
-# and double quotes, backslashes and spaces in the filenames.
-#
-###############################################################################
-#
-define config_filename
-ifneq ($$(CONFIG_$(1)),"")
-$(1)_FILENAME := $$(subst \\,\,$$(subst \$$(quote),$$(quote),$$(subst $$(space_escape),\$$(space),$$(patsubst "%",%,$$(subst $$(space),$$(space_escape),$$(CONFIG_$(1)))))))
-ifneq ($$(patsubst /%,%,$$(firstword $$($(1)_FILENAME))),$$(firstword $$($(1)_FILENAME)))
-else
-ifeq ($$(wildcard $$($(1)_FILENAME)),)
-ifneq ($$(wildcard $$(srctree)/$$($(1)_FILENAME)),)
-$(1)_SRCPREFIX := $(srctree)/
-endif
-endif
-endif
-endif
-endef
-#
###############################################################################
# delete partially updated (i.e. corrupted) files on error
# scripts contains sources for various helper programs used throughout
# the kernel for the build process.
-CRYPTO_LIBS = $(shell pkg-config --libs libcrypto 2> /dev/null || echo -lcrypto)
-CRYPTO_CFLAGS = $(shell pkg-config --cflags libcrypto 2> /dev/null)
-
hostprogs-always-$(CONFIG_BUILD_BIN2C) += bin2c
hostprogs-always-$(CONFIG_KALLSYMS) += kallsyms
hostprogs-always-$(BUILD_C_RECORDMCOUNT) += recordmcount
hostprogs-always-$(CONFIG_BUILDTIME_TABLE_SORT) += sorttable
hostprogs-always-$(CONFIG_ASN1) += asn1_compiler
hostprogs-always-$(CONFIG_MODULE_SIG_FORMAT) += sign-file
-hostprogs-always-$(CONFIG_SYSTEM_TRUSTED_KEYRING) += extract-cert
hostprogs-always-$(CONFIG_SYSTEM_EXTRA_CERTIFICATE) += insert-sys-cert
-hostprogs-always-$(CONFIG_SYSTEM_REVOCATION_LIST) += extract-cert
HOSTCFLAGS_sorttable.o = -I$(srctree)/tools/include
HOSTLDLIBS_sorttable = -lpthread
HOSTCFLAGS_asn1_compiler.o = -I$(srctree)/include
-HOSTCFLAGS_sign-file.o = $(CRYPTO_CFLAGS)
-HOSTLDLIBS_sign-file = $(CRYPTO_LIBS)
-HOSTCFLAGS_extract-cert.o = $(CRYPTO_CFLAGS)
-HOSTLDLIBS_extract-cert = $(CRYPTO_LIBS)
+HOSTCFLAGS_sign-file.o = $(shell pkg-config --cflags libcrypto 2> /dev/null)
+HOSTLDLIBS_sign-file = $(shell pkg-config --libs libcrypto 2> /dev/null || echo -lcrypto)
ifdef CONFIG_UNWINDER_ORC
ifeq ($(ARCH),x86_64)
} \
)
+quiet_cmd_file_size = GEN $@
+ cmd_file_size = $(size_append) > $@
+
quiet_cmd_bzip2 = BZIP2 $@
- cmd_bzip2 = { cat $(real-prereqs) | $(KBZIP2) -9; $(size_append); } > $@
+ cmd_bzip2 = cat $(real-prereqs) | $(KBZIP2) -9 > $@
+
+quiet_cmd_bzip2_with_size = BZIP2 $@
+ cmd_bzip2_with_size = { cat $(real-prereqs) | $(KBZIP2) -9; $(size_append); } > $@
# Lzma
# ---------------------------------------------------------------------------
quiet_cmd_lzma = LZMA $@
- cmd_lzma = { cat $(real-prereqs) | $(LZMA) -9; $(size_append); } > $@
+ cmd_lzma = cat $(real-prereqs) | $(LZMA) -9 > $@
+
+quiet_cmd_lzma_with_size = LZMA $@
+ cmd_lzma_with_size = { cat $(real-prereqs) | $(LZMA) -9; $(size_append); } > $@
quiet_cmd_lzo = LZO $@
- cmd_lzo = { cat $(real-prereqs) | $(KLZOP) -9; $(size_append); } > $@
+ cmd_lzo = cat $(real-prereqs) | $(KLZOP) -9 > $@
+
+quiet_cmd_lzo_with_size = LZO $@
+ cmd_lzo_with_size = { cat $(real-prereqs) | $(KLZOP) -9; $(size_append); } > $@
quiet_cmd_lz4 = LZ4 $@
- cmd_lz4 = { cat $(real-prereqs) | $(LZ4) -l -c1 stdin stdout; \
+ cmd_lz4 = cat $(real-prereqs) | $(LZ4) -l -c1 stdin stdout > $@
+
+quiet_cmd_lz4_with_size = LZ4 $@
+ cmd_lz4_with_size = { cat $(real-prereqs) | $(LZ4) -l -c1 stdin stdout; \
$(size_append); } > $@
# U-Boot mkimage
# big dictionary would increase the memory usage too much in the multi-call
# decompression mode. A BCJ filter isn't used either.
quiet_cmd_xzkern = XZKERN $@
- cmd_xzkern = { cat $(real-prereqs) | sh $(srctree)/scripts/xz_wrap.sh; \
+ cmd_xzkern = cat $(real-prereqs) | sh $(srctree)/scripts/xz_wrap.sh > $@
+
+quiet_cmd_xzkern_with_size = XZKERN $@
+ cmd_xzkern_with_size = { cat $(real-prereqs) | sh $(srctree)/scripts/xz_wrap.sh; \
$(size_append); } > $@
quiet_cmd_xzmisc = XZMISC $@
# be used because it would require zstd to allocate a 128 MB buffer.
quiet_cmd_zstd = ZSTD $@
- cmd_zstd = { cat $(real-prereqs) | $(ZSTD) -19; $(size_append); } > $@
+ cmd_zstd = cat $(real-prereqs) | $(ZSTD) -19 > $@
quiet_cmd_zstd22 = ZSTD22 $@
- cmd_zstd22 = { cat $(real-prereqs) | $(ZSTD) -22 --ultra; $(size_append); } > $@
+ cmd_zstd22 = cat $(real-prereqs) | $(ZSTD) -22 --ultra > $@
+
+quiet_cmd_zstd22_with_size = ZSTD22 $@
+ cmd_zstd22_with_size = { cat $(real-prereqs) | $(ZSTD) -22 --ultra; $(size_append); } > $@
# ASM offsets
# ---------------------------------------------------------------------------
# Don't stop modules_install even if we can't sign external modules.
#
ifeq ($(CONFIG_MODULE_SIG_ALL),y)
+sig-key := $(if $(wildcard $(CONFIG_MODULE_SIG_KEY)),,$(srctree)/)$(CONFIG_MODULE_SIG_KEY)
quiet_cmd_sign = SIGN $@
-$(eval $(call config_filename,MODULE_SIG_KEY))
- cmd_sign = scripts/sign-file $(CONFIG_MODULE_SIG_HASH) $(MODULE_SIG_KEY_SRCPREFIX)$(CONFIG_MODULE_SIG_KEY) certs/signing_key.x509 $@ \
+ cmd_sign = scripts/sign-file $(CONFIG_MODULE_SIG_HASH) $(sig-key) certs/signing_key.x509 $@ \
$(if $(KBUILD_EXTMOD),|| true)
else
quiet_cmd_sign :=
ubsan-cflags-$(CONFIG_UBSAN_SHIFT) += -fsanitize=shift
ubsan-cflags-$(CONFIG_UBSAN_DIV_ZERO) += -fsanitize=integer-divide-by-zero
ubsan-cflags-$(CONFIG_UBSAN_UNREACHABLE) += -fsanitize=unreachable
-ubsan-cflags-$(CONFIG_UBSAN_OBJECT_SIZE) += -fsanitize=object-size
ubsan-cflags-$(CONFIG_UBSAN_BOOL) += -fsanitize=bool
ubsan-cflags-$(CONFIG_UBSAN_ENUM) += -fsanitize=enum
ubsan-cflags-$(CONFIG_UBSAN_TRAP) += -fsanitize-undefined-trap-on-error
length($line) > 75 &&
!($line =~ /^\s*[a-zA-Z0-9_\/\.]+\s+\|\s+\d+/ ||
# file delta changes
- $line =~ /^\s*(?:[\w\.\-]+\/)++[\w\.\-]+:/ ||
+ $line =~ /^\s*(?:[\w\.\-\+]*\/)++[\w\.\-\+]+:/ ||
# filename then :
$line =~ /^\s*(?:Fixes:|Link:|$signature_tags)/i ||
# A Fixes: or Link: line or signature tag line
# Kconfig supports named choices), so use a word boundary
# (\b) rather than a whitespace character (\s)
$line =~ /^\+\s*(?:config|menuconfig|choice)\b/) {
- my $length = 0;
- my $cnt = $realcnt;
- my $ln = $linenr + 1;
- my $f;
- my $is_start = 0;
- my $is_end = 0;
- for (; $cnt > 0 && defined $lines[$ln - 1]; $ln++) {
- $f = $lines[$ln - 1];
- $cnt-- if ($lines[$ln - 1] !~ /^-/);
- $is_end = $lines[$ln - 1] =~ /^\+/;
+ my $ln = $linenr;
+ my $needs_help = 0;
+ my $has_help = 0;
+ my $help_length = 0;
+ while (defined $lines[$ln]) {
+ my $f = $lines[$ln++];
next if ($f =~ /^-/);
- last if (!$file && $f =~ /^\@\@/);
+ last if ($f !~ /^[\+ ]/); # !patch context
- if ($lines[$ln - 1] =~ /^\+\s*(?:bool|tristate|prompt)\s*["']/) {
- $is_start = 1;
- } elsif ($lines[$ln - 1] =~ /^\+\s*(?:---)?help(?:---)?$/) {
- $length = -1;
+ if ($f =~ /^\+\s*(?:bool|tristate|prompt)\s*["']/) {
+ $needs_help = 1;
+ next;
+ }
+ if ($f =~ /^\+\s*help\s*$/) {
+ $has_help = 1;
+ next;
}
- $f =~ s/^.//;
- $f =~ s/#.*//;
- $f =~ s/^\s+//;
- next if ($f =~ /^$/);
+ $f =~ s/^.//; # strip patch context [+ ]
+ $f =~ s/#.*//; # strip # directives
+ $f =~ s/^\s+//; # strip leading blanks
+ next if ($f =~ /^$/); # skip blank lines
+ # At the end of this Kconfig block:
# This only checks context lines in the patch
# and so hopefully shouldn't trigger false
# positives, even though some of these are
# common words in help texts
- if ($f =~ /^\s*(?:config|menuconfig|choice|endchoice|
- if|endif|menu|endmenu|source)\b/x) {
- $is_end = 1;
+ if ($f =~ /^(?:config|menuconfig|choice|endchoice|
+ if|endif|menu|endmenu|source)\b/x) {
last;
}
- $length++;
+ $help_length++ if ($has_help);
}
- if ($is_start && $is_end && $length < $min_conf_desc_length) {
+ if ($needs_help &&
+ $help_length < $min_conf_desc_length) {
+ my $stat_real = get_stat_real($linenr, $ln - 1);
WARN("CONFIG_DESCRIPTION",
- "please write a paragraph that describes the config symbol fully\n" . $herecurr);
+ "please write a help paragraph that fully describes the config symbol\n" . "$here\n$stat_real\n");
}
- #print "is_start<$is_start> is_end<$is_end> length<$length>\n";
}
# check MAINTAINERS entries
drm_connector_funcs
drm_encoder_funcs
drm_encoder_helper_funcs
+dvb_frontend_ops
+dvb_tuner_ops
ethtool_ops
extent_io_ops
+fb_ops
file_lock_operations
file_operations
hv_ops
+hwmon_ops
+ib_device_ops
ide_dma_ops
ide_port_ops
+ieee80211_ops
+iio_buffer_setup_ops
inode_operations
intel_dvo_dev_ops
irq_domain_ops
item_operations
iwl_cfg
iwl_ops
+kernel_param_ops
kgdb_arch
kgdb_io
kset_uevent_ops
machine_desc
microcode_ops
mlxsw_reg_info
+mtd_ooblayout_ops
mtrr_ops
+nand_controller_ops
neigh_ops
net_device_ops
+nft_expr_ops
nlmsvc_binding
nvkm_device_chip
of_device_id
pci_raw_ops
phy_ops
+pinconf_ops
pinctrl_ops
pinmux_ops
pipe_buf_operations
platform_hibernation_ops
platform_suspend_ops
+proc_ops
proto_ops
+pwm_ops
regmap_access_table
regulator_ops
+reset_control_ops
rpc_pipe_ops
rtc_class_ops
sd_desc
+sdhci_ops
seq_operations
sirfsoc_padmux
snd_ac97_build_ops
usb_mon_operations
v4l2_ctrl_ops
v4l2_ioctl_ops
+v4l2_subdev_core_ops
+v4l2_subdev_internal_ops
+v4l2_subdev_ops
+v4l2_subdev_pad_ops
+v4l2_subdev_video_ops
+vb2_ops
vm_operations_struct
wacom_features
+watchdog_ops
wd_ops
or '/include/' to be processed.
If DTx_1 and DTx_2 are in different architectures, then this script
- may not work since \${ARCH} is part of the include path. Two possible
- workarounds:
-
- `basename $0` \\
- <(ARCH=arch_of_dtx_1 `basename $0` DTx_1) \\
- <(ARCH=arch_of_dtx_2 `basename $0` DTx_2)
+ may not work since \${ARCH} is part of the include path. The following
+ workaround can be used:
`basename $0` ARCH=arch_of_dtx_1 DTx_1 >tmp_dtx_1.dts
`basename $0` ARCH=arch_of_dtx_2 DTx_2 >tmp_dtx_2.dts
+++ /dev/null
-/* Extract X.509 certificate in DER form from PKCS#11 or PEM.
- *
- * Copyright © 2014-2015 Red Hat, Inc. All Rights Reserved.
- * Copyright © 2015 Intel Corporation.
- *
- * Authors: David Howells <dhowells@redhat.com>
- * David Woodhouse <dwmw2@infradead.org>
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU Lesser General Public License
- * as published by the Free Software Foundation; either version 2.1
- * of the licence, or (at your option) any later version.
- */
-#define _GNU_SOURCE
-#include <stdio.h>
-#include <stdlib.h>
-#include <stdint.h>
-#include <stdbool.h>
-#include <string.h>
-#include <err.h>
-#include <openssl/bio.h>
-#include <openssl/pem.h>
-#include <openssl/err.h>
-#include <openssl/engine.h>
-
-#define PKEY_ID_PKCS7 2
-
-static __attribute__((noreturn))
-void format(void)
-{
- fprintf(stderr,
- "Usage: scripts/extract-cert <source> <dest>\n");
- exit(2);
-}
-
-static void display_openssl_errors(int l)
-{
- const char *file;
- char buf[120];
- int e, line;
-
- if (ERR_peek_error() == 0)
- return;
- fprintf(stderr, "At main.c:%d:\n", l);
-
- while ((e = ERR_get_error_line(&file, &line))) {
- ERR_error_string(e, buf);
- fprintf(stderr, "- SSL %s: %s:%d\n", buf, file, line);
- }
-}
-
-static void drain_openssl_errors(void)
-{
- const char *file;
- int line;
-
- if (ERR_peek_error() == 0)
- return;
- while (ERR_get_error_line(&file, &line)) {}
-}
-
-#define ERR(cond, fmt, ...) \
- do { \
- bool __cond = (cond); \
- display_openssl_errors(__LINE__); \
- if (__cond) { \
- err(1, fmt, ## __VA_ARGS__); \
- } \
- } while(0)
-
-static const char *key_pass;
-static BIO *wb;
-static char *cert_dst;
-static int kbuild_verbose;
-
-static void write_cert(X509 *x509)
-{
- char buf[200];
-
- if (!wb) {
- wb = BIO_new_file(cert_dst, "wb");
- ERR(!wb, "%s", cert_dst);
- }
- X509_NAME_oneline(X509_get_subject_name(x509), buf, sizeof(buf));
- ERR(!i2d_X509_bio(wb, x509), "%s", cert_dst);
- if (kbuild_verbose)
- fprintf(stderr, "Extracted cert: %s\n", buf);
-}
-
-int main(int argc, char **argv)
-{
- char *cert_src;
-
- OpenSSL_add_all_algorithms();
- ERR_load_crypto_strings();
- ERR_clear_error();
-
- kbuild_verbose = atoi(getenv("KBUILD_VERBOSE")?:"0");
-
- key_pass = getenv("KBUILD_SIGN_PIN");
-
- if (argc != 3)
- format();
-
- cert_src = argv[1];
- cert_dst = argv[2];
-
- if (!cert_src[0]) {
- /* Invoked with no input; create empty file */
- FILE *f = fopen(cert_dst, "wb");
- ERR(!f, "%s", cert_dst);
- fclose(f);
- exit(0);
- } else if (!strncmp(cert_src, "pkcs11:", 7)) {
- ENGINE *e;
- struct {
- const char *cert_id;
- X509 *cert;
- } parms;
-
- parms.cert_id = cert_src;
- parms.cert = NULL;
-
- ENGINE_load_builtin_engines();
- drain_openssl_errors();
- e = ENGINE_by_id("pkcs11");
- ERR(!e, "Load PKCS#11 ENGINE");
- if (ENGINE_init(e))
- drain_openssl_errors();
- else
- ERR(1, "ENGINE_init");
- if (key_pass)
- ERR(!ENGINE_ctrl_cmd_string(e, "PIN", key_pass, 0), "Set PKCS#11 PIN");
- ENGINE_ctrl_cmd(e, "LOAD_CERT_CTRL", 0, &parms, NULL, 1);
- ERR(!parms.cert, "Get X.509 from PKCS#11");
- write_cert(parms.cert);
- } else {
- BIO *b;
- X509 *x509;
-
- b = BIO_new_file(cert_src, "rb");
- ERR(!b, "%s", cert_src);
-
- while (1) {
- x509 = PEM_read_bio_X509(b, NULL, NULL, NULL);
- if (wb && !x509) {
- unsigned long err = ERR_peek_last_error();
- if (ERR_GET_LIB(err) == ERR_LIB_PEM &&
- ERR_GET_REASON(err) == PEM_R_NO_START_LINE) {
- ERR_clear_error();
- break;
- }
- }
- ERR(!x509, "%s", cert_src);
- write_cert(x509);
- }
- }
-
- BIO_free(wb);
-
- return 0;
-}
;;
esac
-# We need access to CONFIG_ symbols
-. include/config/auto.conf
-
needed_symbols=
# Special case for modversions (see modpost.c)
-if [ -n "$CONFIG_MODVERSIONS" ]; then
+if grep -q "^CONFIG_MODVERSIONS=y$" include/config/auto.conf; then
needed_symbols="$needed_symbols module_layout"
fi
-ksym_wl=
-if [ -n "$CONFIG_UNUSED_KSYMS_WHITELIST" ]; then
- # Use 'eval' to expand the whitelist path and check if it is relative
- eval ksym_wl="$CONFIG_UNUSED_KSYMS_WHITELIST"
+ksym_wl=$(sed -n 's/^CONFIG_UNUSED_KSYMS_WHITELIST=\(.*\)$/\1/p' include/config/auto.conf)
+if [ -n "$ksym_wl" ]; then
[ "${ksym_wl}" != "${ksym_wl#/}" ] || ksym_wl="$abs_srctree/$ksym_wl"
if [ ! -f "$ksym_wl" ] || [ ! -r "$ksym_wl" ]; then
echo "ERROR: '$ksym_wl' whitelist file not found" >&2
%VCS_cmds = %VCS_cmds_hg;
return 2 if eval $VCS_cmds{"available"};
%VCS_cmds = ();
- if (!$printed_novcs) {
+ if (!$printed_novcs && $email_git) {
warn("$P: No supported VCS found. Add --nogit to options?\n");
warn("Using a git repository produces better results.\n");
warn("Try Linus Torvalds' latest git repository using:\n");
+++ /dev/null
-#!/usr/bin/env perl
-# SPDX-License-Identifier: GPL-2.0
-#
-# headers_check.pl execute a number of trivial consistency checks
-#
-# Usage: headers_check.pl dir arch [files...]
-# dir: dir to look for included files
-# arch: architecture
-# files: list of files to check
-#
-# The script reads the supplied files line by line and:
-#
-# 1) for each include statement it checks if the
-# included file actually exists.
-# Only include files located in asm* and linux* are checked.
-# The rest are assumed to be system include files.
-#
-# 2) It is checked that prototypes does not use "extern"
-#
-# 3) Check for leaked CONFIG_ symbols
-
-use warnings;
-use strict;
-use File::Basename;
-
-my ($dir, $arch, @files) = @ARGV;
-
-my $ret = 0;
-my $line;
-my $lineno = 0;
-my $filename;
-
-foreach my $file (@files) {
- $filename = $file;
-
- open(my $fh, '<', $filename)
- or die "$filename: $!\n";
- $lineno = 0;
- while ($line = <$fh>) {
- $lineno++;
- &check_include();
- &check_asm_types();
- &check_sizetypes();
- &check_declarations();
- # Dropped for now. Too much noise &check_config();
- }
- close $fh;
-}
-exit $ret;
-
-sub check_include
-{
- if ($line =~ m/^\s*#\s*include\s+<((asm|linux).*)>/) {
- my $inc = $1;
- my $found;
- $found = stat($dir . "/" . $inc);
- if (!$found) {
- $inc =~ s#asm/#asm-$arch/#;
- $found = stat($dir . "/" . $inc);
- }
- if (!$found) {
- printf STDERR "$filename:$lineno: included file '$inc' is not exported\n";
- $ret = 1;
- }
- }
-}
-
-sub check_declarations
-{
- # soundcard.h is what it is
- if ($line =~ m/^void seqbuf_dump\(void\);/) {
- return;
- }
- # drm headers are being C++ friendly
- if ($line =~ m/^extern "C"/) {
- return;
- }
- if ($line =~ m/^(\s*extern|unsigned|char|short|int|long|void)\b/) {
- printf STDERR "$filename:$lineno: " .
- "userspace cannot reference function or " .
- "variable defined in the kernel\n";
- }
-}
-
-sub check_config
-{
- if ($line =~ m/[^a-zA-Z0-9_]+CONFIG_([a-zA-Z0-9_]+)[^a-zA-Z0-9_]/) {
- printf STDERR "$filename:$lineno: leaks CONFIG_$1 to userspace where it is not valid\n";
- }
-}
-
-my $linux_asm_types;
-sub check_asm_types
-{
- if ($filename =~ /types.h|int-l64.h|int-ll64.h/o) {
- return;
- }
- if ($lineno == 1) {
- $linux_asm_types = 0;
- } elsif ($linux_asm_types >= 1) {
- return;
- }
- if ($line =~ m/^\s*#\s*include\s+<asm\/types.h>/) {
- $linux_asm_types = 1;
- printf STDERR "$filename:$lineno: " .
- "include of <linux/types.h> is preferred over <asm/types.h>\n"
- # Warn until headers are all fixed
- #$ret = 1;
- }
-}
-
-my $linux_types;
-my %import_stack = ();
-sub check_include_typesh
-{
- my $path = $_[0];
- my $import_path;
-
- my $fh;
- my @file_paths = ($path, $dir . "/" . $path, dirname($filename) . "/" . $path);
- for my $possible ( @file_paths ) {
- if (not $import_stack{$possible} and open($fh, '<', $possible)) {
- $import_path = $possible;
- $import_stack{$import_path} = 1;
- last;
- }
- }
- if (eof $fh) {
- return;
- }
-
- my $line;
- while ($line = <$fh>) {
- if ($line =~ m/^\s*#\s*include\s+<linux\/types.h>/) {
- $linux_types = 1;
- last;
- }
- if (my $included = ($line =~ /^\s*#\s*include\s+[<"](\S+)[>"]/)[0]) {
- check_include_typesh($included);
- }
- }
- close $fh;
- delete $import_stack{$import_path};
-}
-
-sub check_sizetypes
-{
- if ($filename =~ /types.h|int-l64.h|int-ll64.h/o) {
- return;
- }
- if ($lineno == 1) {
- $linux_types = 0;
- } elsif ($linux_types >= 1) {
- return;
- }
- if ($line =~ m/^\s*#\s*include\s+<linux\/types.h>/) {
- $linux_types = 1;
- return;
- }
- if (my $included = ($line =~ /^\s*#\s*include\s+[<"](\S+)[>"]/)[0]) {
- check_include_typesh($included);
- }
- if ($line =~ m/__[us](8|16|32|64)\b/) {
- printf STDERR "$filename:$lineno: " .
- "found __[us]{8,16,32,64} type " .
- "without #include <linux/types.h>\n";
- $linux_types = 2;
- # Warn until headers are all fixed
- #$ret = 1;
- }
-}
# deprecated for external use
simple-targets := oldconfig allnoconfig allyesconfig allmodconfig \
alldefconfig randconfig listnewconfig olddefconfig syncconfig \
- helpnewconfig yes2modconfig mod2yesconfig
+ helpnewconfig yes2modconfig mod2yesconfig mod2noconfig
PHONY += $(simple-targets)
@echo ' randconfig - New config with random answer to all options'
@echo ' yes2modconfig - Change answers from yes to mod if possible'
@echo ' mod2yesconfig - Change answers from mod to yes if possible'
+ @echo ' mod2noconfig - Change answers from mod to no if possible'
@echo ' listnewconfig - List new options'
@echo ' helpnewconfig - List new options and help text'
@echo ' olddefconfig - Same as oldconfig but sets new symbols to their'
olddefconfig,
yes2modconfig,
mod2yesconfig,
+ mod2noconfig,
};
static enum input_mode input_mode = oldaskconfig;
static int input_mode_opt;
def_default,
def_yes,
def_mod,
- def_y2m,
- def_m2y,
def_no,
def_random
};
return has_changed;
}
-static void conf_rewrite_mod_or_yes(enum conf_def_mode mode)
+static void conf_rewrite_tristates(tristate old_val, tristate new_val)
{
struct symbol *sym;
int i;
- tristate old_val = (mode == def_y2m) ? yes : mod;
- tristate new_val = (mode == def_y2m) ? mod : yes;
for_all_symbols(i, sym) {
if (sym_get_type(sym) == S_TRISTATE &&
{"olddefconfig", no_argument, &input_mode_opt, olddefconfig},
{"yes2modconfig", no_argument, &input_mode_opt, yes2modconfig},
{"mod2yesconfig", no_argument, &input_mode_opt, mod2yesconfig},
+ {"mod2noconfig", no_argument, &input_mode_opt, mod2noconfig},
{NULL, 0, NULL, 0}
};
printf(" --randconfig New config with random answer to all options\n");
printf(" --yes2modconfig Change answers from yes to mod if possible\n");
printf(" --mod2yesconfig Change answers from mod to yes if possible\n");
+ printf(" --mod2noconfig Change answers from mod to no if possible\n");
printf(" (If none of the above is given, --oldaskconfig is the default)\n");
}
case olddefconfig:
case yes2modconfig:
case mod2yesconfig:
+ case mod2noconfig:
conf_read(NULL);
break;
case allnoconfig:
case savedefconfig:
break;
case yes2modconfig:
- conf_rewrite_mod_or_yes(def_y2m);
+ conf_rewrite_tristates(yes, mod);
break;
case mod2yesconfig:
- conf_rewrite_mod_or_yes(def_m2y);
+ conf_rewrite_tristates(mod, yes);
+ break;
+ case mod2noconfig:
+ conf_rewrite_tristates(mod, no);
break;
case oldaskconfig:
rootEntry = &rootmenu;
p, sym->name);
return 1;
case S_STRING:
- if (*p++ != '"')
- break;
- for (p2 = p; (p2 = strpbrk(p2, "\"\\")); p2++) {
- if (*p2 == '"') {
- *p2 = 0;
+ /* No escaping for S_DEF_AUTO (include/config/auto.conf) */
+ if (def != S_DEF_AUTO) {
+ if (*p++ != '"')
break;
+ for (p2 = p; (p2 = strpbrk(p2, "\"\\")); p2++) {
+ if (*p2 == '"') {
+ *p2 = 0;
+ break;
+ }
+ memmove(p2, p2 + 1, strlen(p2));
}
- memmove(p2, p2 + 1, strlen(p2));
- }
- if (!p2) {
- if (def != S_DEF_AUTO)
+ if (!p2) {
conf_warning("invalid string found");
- return 1;
+ return 1;
+ }
}
/* fall through */
case S_INT:
static void print_symbol_for_autoconf(FILE *fp, struct symbol *sym)
{
- __print_symbol(fp, sym, OUTPUT_N_NONE, true);
+ __print_symbol(fp, sym, OUTPUT_N_NONE, false);
}
void print_symbol_for_listconfig(struct symbol *sym)
$source =~ s/\$\($env\)/$ENV{$env}/;
}
- open(my $kinfile, '<', $source) || die "Can't open $kconfig";
+ open(my $kinfile, '<', $source) || die "Can't open $source";
while (<$kinfile>) {
chomp;
KBUILD_LDFLAGS="$2"
LDFLAGS_vmlinux="$3"
+is_enabled() {
+ grep -q "^$1=y" include/config/auto.conf
+}
+
# Nice output in kbuild format
# Will be supressed by "make -s"
info()
${KBUILD_VMLINUX_LIBS} \
--end-group"
- if [ -n "${CONFIG_LTO_CLANG}" ]; then
+ if is_enabled CONFIG_LTO_CLANG; then
gen_initcalls
lds="-T .tmp_initcalls.lds"
- if [ -n "${CONFIG_MODVERSIONS}" ]; then
+ if is_enabled CONFIG_MODVERSIONS; then
gen_symversions
lds="${lds} -T .tmp_symversions.lds"
fi
local objtoolcmd;
local objtoolopt;
- if [ "${CONFIG_LTO_CLANG} ${CONFIG_STACK_VALIDATION}" = "y y" ]; then
+ if is_enabled CONFIG_LTO_CLANG && is_enabled CONFIG_STACK_VALIDATION; then
# Don't perform vmlinux validation unless explicitly requested,
# but run objtool on vmlinux.o now that we have an object file.
- if [ -n "${CONFIG_UNWINDER_ORC}" ]; then
+ if is_enabled CONFIG_UNWINDER_ORC; then
objtoolcmd="orc generate"
fi
objtoolopt="${objtoolopt} --duplicate"
- if [ -n "${CONFIG_FTRACE_MCOUNT_USE_OBJTOOL}" ]; then
+ if is_enabled CONFIG_FTRACE_MCOUNT_USE_OBJTOOL; then
objtoolopt="${objtoolopt} --mcount"
fi
fi
- if [ -n "${CONFIG_VMLINUX_VALIDATION}" ]; then
+ if is_enabled CONFIG_VMLINUX_VALIDATION; then
objtoolopt="${objtoolopt} --noinstr"
fi
objtoolcmd="check"
fi
objtoolopt="${objtoolopt} --vmlinux"
- if [ -z "${CONFIG_FRAME_POINTER}" ]; then
+ if ! is_enabled CONFIG_FRAME_POINTER; then
objtoolopt="${objtoolopt} --no-fp"
fi
- if [ -n "${CONFIG_GCOV_KERNEL}" ] || [ -n "${CONFIG_LTO_CLANG}" ]; then
+ if is_enabled CONFIG_GCOV_KERNEL || is_enabled CONFIG_LTO_CLANG; then
objtoolopt="${objtoolopt} --no-unreachable"
fi
- if [ -n "${CONFIG_RETPOLINE}" ]; then
+ if is_enabled CONFIG_RETPOLINE; then
objtoolopt="${objtoolopt} --retpoline"
fi
- if [ -n "${CONFIG_X86_SMAP}" ]; then
+ if is_enabled CONFIG_X86_SMAP; then
objtoolopt="${objtoolopt} --uaccess"
fi
- if [ -n "${CONFIG_SLS}" ]; then
+ if is_enabled CONFIG_SLS; then
objtoolopt="${objtoolopt} --sls"
fi
info OBJTOOL ${1}
# skip output file argument
shift
- if [ -n "${CONFIG_LTO_CLANG}" ]; then
+ if is_enabled CONFIG_LTO_CLANG; then
# Use vmlinux.o instead of performing the slow LTO link again.
objs=vmlinux.o
libs=
ldflags="${ldflags} ${wl}--strip-debug"
fi
- if [ -n "${CONFIG_VMLINUX_MAP}" ]; then
+ if is_enabled CONFIG_VMLINUX_MAP; then
ldflags="${ldflags} ${wl}-Map=${output}.map"
fi
{
local kallsymopt;
- if [ -n "${CONFIG_KALLSYMS_ALL}" ]; then
+ if is_enabled CONFIG_KALLSYMS_ALL; then
kallsymopt="${kallsymopt} --all-symbols"
fi
- if [ -n "${CONFIG_KALLSYMS_ABSOLUTE_PERCPU}" ]; then
+ if is_enabled CONFIG_KALLSYMS_ABSOLUTE_PERCPU; then
kallsymopt="${kallsymopt} --absolute-percpu"
fi
- if [ -n "${CONFIG_KALLSYMS_BASE_RELATIVE}" ]; then
+ if is_enabled CONFIG_KALLSYMS_BASE_RELATIVE; then
kallsymopt="${kallsymopt} --base-relative"
fi
exit 0
fi
-# We need access to CONFIG_ symbols
-. include/config/auto.conf
-
# Update version
info GEN .version
if [ -r .version ]; then
tr ' ' '\n' | uniq | sed -e 's:^:kernel/:' -e 's/$/.ko/' > modules.builtin
btf_vmlinux_bin_o=""
-if [ -n "${CONFIG_DEBUG_INFO_BTF}" ]; then
+if is_enabled CONFIG_DEBUG_INFO_BTF; then
btf_vmlinux_bin_o=.btf.vmlinux.bin.o
if ! gen_btf .tmp_vmlinux.btf $btf_vmlinux_bin_o ; then
echo >&2 "Failed to generate BTF for vmlinux"
kallsymso=""
kallsymso_prev=""
kallsyms_vmlinux=""
-if [ -n "${CONFIG_KALLSYMS}" ]; then
+if is_enabled CONFIG_KALLSYMS; then
# kallsyms support
# Generate section listing all symbols and add it into vmlinux
vmlinux_link vmlinux "${kallsymso}" ${btf_vmlinux_bin_o}
# fill in BTF IDs
-if [ -n "${CONFIG_DEBUG_INFO_BTF}" -a -n "${CONFIG_BPF}" ]; then
+if is_enabled CONFIG_DEBUG_INFO_BTF && is_enabled CONFIG_BPF; then
info BTFIDS vmlinux
${RESOLVE_BTFIDS} vmlinux
fi
info SYSMAP System.map
mksysmap vmlinux System.map
-if [ -n "${CONFIG_BUILDTIME_TABLE_SORT}" ]; then
+if is_enabled CONFIG_BUILDTIME_TABLE_SORT; then
info SORTTAB vmlinux
if ! sorttable vmlinux; then
echo >&2 Failed to sort kernel tables
fi
# step a (see comment above)
-if [ -n "${CONFIG_KALLSYMS}" ]; then
+if is_enabled CONFIG_KALLSYMS; then
mksysmap ${kallsyms_vmlinux} .tmp_System.map
if ! cmp -s System.map .tmp_System.map; then
if [ "$SRCARCH" = s390 ]; then
echo 13.0.0
else
- echo 10.0.1
+ echo 11.0.0
fi
;;
*)
return 0;
}
+#ifndef EM_RISCV
+#define EM_RISCV 243
+#endif
+
+#ifndef R_RISCV_SUB32
+#define R_RISCV_SUB32 39
+#endif
+
static void section_rela(const char *modname, struct elf_info *elf,
Elf_Shdr *sechdr)
{
r_sym = ELF_R_SYM(r.r_info);
#endif
r.r_addend = TO_NATIVE(rela->r_addend);
+ switch (elf->hdr->e_machine) {
+ case EM_RISCV:
+ if (!strcmp("__ex_table", fromsec) &&
+ ELF_R_TYPE(r.r_info) == R_RISCV_SUB32)
+ continue;
+ break;
+ }
sym = elf->symtab_start + r_sym;
/* Skip special sections */
if (is_shndx_special(sym->st_shndx))
rm -f arch/parisc/boot/compressed/${f}
done
fi
+
+rm -f scripts/extract-cert
exit
fi
-if test -e include/config/auto.conf; then
- . include/config/auto.conf
-else
+if ! test -e include/config/auto.conf; then
echo "Error: kernelrelease not valid - run 'make prepare' to update it" >&2
exit 1
fi
fi
# CONFIG_LOCALVERSION and LOCALVERSION (if set)
-res="${res}${CONFIG_LOCALVERSION}${LOCALVERSION}"
+config_localversion=$(sed -n 's/^CONFIG_LOCALVERSION=\(.*\)$/\1/p' include/config/auto.conf)
+res="${res}${config_localversion}${LOCALVERSION}"
# scm version string if not at a tagged commit
-if test "$CONFIG_LOCALVERSION_AUTO" = "y"; then
+if grep -q "^CONFIG_LOCALVERSION_AUTO=y$" include/config/auto.conf; then
# full scm version string
res="$res$(scm_version)"
elif [ "${LOCALVERSION+set}" != "set" ]; then
}
}
-static void arm64_sort_relative_table(char *extab_image, int image_size)
+static void sort_relative_table_with_data(char *extab_image, int image_size)
{
int i = 0;
}
}
-static void x86_sort_relative_table(char *extab_image, int image_size)
-{
- int i = 0;
-
- while (i < image_size) {
- uint32_t *loc = (uint32_t *)(extab_image + i);
-
- w(r(loc) + i, loc);
- w(r(loc + 1) + i + 4, loc + 1);
- /* Don't touch the fixup type */
-
- i += sizeof(uint32_t) * 3;
- }
-
- qsort(extab_image, image_size / 12, 12, compare_relative_table);
-
- i = 0;
- while (i < image_size) {
- uint32_t *loc = (uint32_t *)(extab_image + i);
-
- w(r(loc) - i, loc);
- w(r(loc + 1) - (i + 4), loc + 1);
- /* Don't touch the fixup type */
-
- i += sizeof(uint32_t) * 3;
- }
-}
-
static void s390_sort_relative_table(char *extab_image, int image_size)
{
int i;
switch (r2(&ehdr->e_machine)) {
case EM_386:
+ case EM_AARCH64:
+ case EM_RISCV:
case EM_X86_64:
- custom_sort = x86_sort_relative_table;
+ custom_sort = sort_relative_table_with_data;
break;
case EM_S390:
custom_sort = s390_sort_relative_table;
break;
- case EM_AARCH64:
- custom_sort = arm64_sort_relative_table;
- break;
case EM_PARISC:
case EM_PPC:
case EM_PPC64:
case EM_ARM:
case EM_MICROBLAZE:
case EM_MIPS:
- case EM_RISCV:
case EM_XTENSA:
break;
default:
return 0;
}
#ifdef MCOUNT_SORT_ENABLED
+pthread_t mcount_sort_thread;
+
struct elf_mcount_loc {
Elf_Ehdr *ehdr;
Elf_Shdr *init_data_sec;
unsigned int shnum;
unsigned int shstrndx;
#ifdef MCOUNT_SORT_ENABLED
- struct elf_mcount_loc mstruct;
+ struct elf_mcount_loc mstruct = {0};
uint_t _start_mcount_loc = 0;
uint_t _stop_mcount_loc = 0;
- pthread_t mcount_sort_thread;
#endif
#if defined(SORTTABLE_64) && defined(UNWINDER_ORC_ENABLED)
unsigned int orc_ip_size = 0;
static int snd_info_entry_open(struct inode *inode, struct file *file)
{
- struct snd_info_entry *entry = PDE_DATA(inode);
+ struct snd_info_entry *entry = pde_data(inode);
struct snd_info_private_data *data;
int mode, err;
static int snd_info_text_entry_open(struct inode *inode, struct file *file)
{
- struct snd_info_entry *entry = PDE_DATA(inode);
+ struct snd_info_entry *entry = pde_data(inode);
struct snd_info_private_data *data;
int err;
*/
int snd_power_ref_and_wait(struct snd_card *card)
{
- wait_queue_entry_t wait;
- int result = 0;
-
snd_power_ref(card);
- /* fastpath */
if (snd_power_get_state(card) == SNDRV_CTL_POWER_D0)
return 0;
- init_waitqueue_entry(&wait, current);
- add_wait_queue(&card->power_sleep, &wait);
- while (1) {
- if (card->shutdown) {
- result = -ENODEV;
- break;
- }
- if (snd_power_get_state(card) == SNDRV_CTL_POWER_D0)
- break;
- snd_power_unref(card);
- set_current_state(TASK_UNINTERRUPTIBLE);
- schedule_timeout(30 * HZ);
- snd_power_ref(card);
- }
- remove_wait_queue(&card->power_sleep, &wait);
- return result;
+ wait_event_cmd(card->power_sleep,
+ card->shutdown ||
+ snd_power_get_state(card) == SNDRV_CTL_POWER_D0,
+ snd_power_unref(card), snd_power_ref(card));
+ return card->shutdown ? -ENODEV : 0;
}
EXPORT_SYMBOL_GPL(snd_power_ref_and_wait);
{
const struct snd_pci_quirk *q;
- for (q = list; q->subvendor; q++) {
+ for (q = list; q->subvendor || q->subdevice; q++) {
if (q->subvendor != vendor)
continue;
if (!q->subdevice ||
// SPDX-License-Identifier: GPL-2.0
//
-// cs35l41.c -- CS35l41 ALSA HDA audio driver
+// CS35l41 ALSA HDA audio driver
//
// Copyright 2021 Cirrus Logic, Inc.
//
#include "cs35l41_hda.h"
static const struct reg_sequence cs35l41_hda_config[] = {
- { CS35L41_PLL_CLK_CTRL, 0x00000430 }, //3200000Hz, BCLK Input, PLL_REFCLK_EN = 1
- { CS35L41_GLOBAL_CLK_CTRL, 0x00000003 }, //GLOBAL_FS = 48 kHz
- { CS35L41_SP_ENABLES, 0x00010000 }, //ASP_RX1_EN = 1
- { CS35L41_SP_RATE_CTRL, 0x00000021 }, //ASP_BCLK_FREQ = 3.072 MHz
- { CS35L41_SP_FORMAT, 0x20200200 }, //24 bits, I2S, BCLK Slave, FSYNC Slave
- { CS35L41_DAC_PCM1_SRC, 0x00000008 }, //DACPCM1_SRC = ASPRX1
- { CS35L41_AMP_DIG_VOL_CTRL, 0x00000000 }, //AMP_VOL_PCM 0.0 dB
- { CS35L41_AMP_GAIN_CTRL, 0x00000084 }, //AMP_GAIN_PCM 4.5 dB
- { CS35L41_PWR_CTRL2, 0x00000001 }, //AMP_EN = 1
+ { CS35L41_PLL_CLK_CTRL, 0x00000430 }, // 3200000Hz, BCLK Input, PLL_REFCLK_EN = 1
+ { CS35L41_GLOBAL_CLK_CTRL, 0x00000003 }, // GLOBAL_FS = 48 kHz
+ { CS35L41_SP_ENABLES, 0x00010000 }, // ASP_RX1_EN = 1
+ { CS35L41_SP_RATE_CTRL, 0x00000021 }, // ASP_BCLK_FREQ = 3.072 MHz
+ { CS35L41_SP_FORMAT, 0x20200200 }, // 24 bits, I2S, BCLK Slave, FSYNC Slave
+ { CS35L41_DAC_PCM1_SRC, 0x00000008 }, // DACPCM1_SRC = ASPRX1
+ { CS35L41_AMP_DIG_VOL_CTRL, 0x00000000 }, // AMP_VOL_PCM 0.0 dB
+ { CS35L41_AMP_GAIN_CTRL, 0x00000084 }, // AMP_GAIN_PCM 4.5 dB
+ { CS35L41_PWR_CTRL2, 0x00000001 }, // AMP_EN = 1
};
static const struct reg_sequence cs35l41_hda_start_bst[] = {
- { CS35L41_PWR_CTRL2, 0x00000021 }, //BST_EN = 10, AMP_EN = 1
+ { CS35L41_PWR_CTRL2, 0x00000021 }, // BST_EN = 10, AMP_EN = 1
{ CS35L41_PWR_CTRL1, 0x00000001, 3000}, // set GLOBAL_EN = 1
};
{ 0x00000040, 0x00000055 },
{ 0x00000040, 0x000000AA },
{ 0x00007438, 0x00585941 },
- { 0x00002014, 0x00000000, 3000}, //set GLOBAL_EN = 0
+ { 0x00002014, 0x00000000, 3000}, // set GLOBAL_EN = 0
{ 0x0000742C, 0x00000009 },
{ 0x00007438, 0x00580941 },
{ 0x00011008, 0x00000001 },
{ 0x0000742C, 0x0000000F },
{ 0x0000742C, 0x00000079 },
{ 0x00007438, 0x00585941 },
- { CS35L41_PWR_CTRL1, 0x00000001, 2000 }, //GLOBAL_EN = 1
+ { CS35L41_PWR_CTRL1, 0x00000001, 2000 }, // GLOBAL_EN = 1
{ 0x0000742C, 0x000000F9 },
{ 0x00007438, 0x00580941 },
{ 0x00000040, 0x000000CC },
{ 0x00000040, 0x00000055 },
{ 0x00000040, 0x000000AA },
{ 0x00007438, 0x00585941 },
- { CS35L41_AMP_DIG_VOL_CTRL, 0x0000A678 }, //AMP_VOL_PCM Mute
- { CS35L41_PWR_CTRL2, 0x00000000 }, //AMP_EN = 0
+ { CS35L41_AMP_DIG_VOL_CTRL, 0x0000A678 }, // AMP_VOL_PCM Mute
+ { CS35L41_PWR_CTRL2, 0x00000000 }, // AMP_EN = 0
{ CS35L41_PWR_CTRL1, 0x00000000 },
{ 0x0000742C, 0x00000009, 2000 },
{ 0x00007438, 0x00580941 },
if (reg_seq->close)
ret = regmap_multi_reg_write(reg, reg_seq->close, reg_seq->num_close);
break;
+ default:
+ ret = -EINVAL;
+ break;
}
if (ret)
dev_warn(cs35l41->dev, "Failed to apply multi reg write: %d\n", ret);
-
}
static int cs35l41_hda_channel_map(struct device *dev, unsigned int tx_num, unsigned int *tx_slot,
struct cs35l41_hda *cs35l41 = dev_get_drvdata(dev);
struct hda_component *comps = master_data;
- if (comps && cs35l41->index >= 0 && cs35l41->index < HDA_MAX_COMPONENTS)
- comps = &comps[cs35l41->index];
- else
+ if (!comps || cs35l41->index < 0 || cs35l41->index >= HDA_MAX_COMPONENTS)
return -EINVAL;
- if (!comps->dev) {
- comps->dev = dev;
- strscpy(comps->name, dev_name(dev), sizeof(comps->name));
- comps->playback_hook = cs35l41_hda_playback_hook;
- comps->set_channel_map = cs35l41_hda_channel_map;
- return 0;
- }
+ comps = &comps[cs35l41->index];
+ if (comps->dev)
+ return -EBUSY;
+
+ comps->dev = dev;
+ strscpy(comps->name, dev_name(dev), sizeof(comps->name));
+ comps->playback_hook = cs35l41_hda_playback_hook;
+ comps->set_channel_map = cs35l41_hda_channel_map;
- return -EBUSY;
+ return 0;
}
static void cs35l41_hda_unbind(struct device *dev, struct device *master, void *master_data)
internal_boost = true;
switch (hw_cfg->gpio1_func) {
+ case CS35L41_NOT_USED:
+ break;
case CS35l41_VSPK_SWITCH:
regmap_update_bits(cs35l41->regmap, CS35L41_GPIO_PAD_CONTROL,
CS35L41_GPIO1_CTRL_MASK, 1 << CS35L41_GPIO1_CTRL_SHIFT);
regmap_update_bits(cs35l41->regmap, CS35L41_GPIO_PAD_CONTROL,
CS35L41_GPIO1_CTRL_MASK, 2 << CS35L41_GPIO1_CTRL_SHIFT);
break;
+ default:
+ dev_err(cs35l41->dev, "Invalid function %d for GPIO1\n", hw_cfg->gpio1_func);
+ return -EINVAL;
}
switch (hw_cfg->gpio2_func) {
+ case CS35L41_NOT_USED:
+ break;
case CS35L41_INTERRUPT:
regmap_update_bits(cs35l41->regmap, CS35L41_GPIO_PAD_CONTROL,
CS35L41_GPIO2_CTRL_MASK, 2 << CS35L41_GPIO2_CTRL_SHIFT);
break;
+ default:
+ dev_err(cs35l41->dev, "Invalid function %d for GPIO2\n", hw_cfg->gpio2_func);
+ return -EINVAL;
}
if (internal_boost) {
cs35l41->reg_seq = &cs35l41_hda_reg_seq_ext_bst;
}
- ret = cs35l41_hda_channel_map(cs35l41->dev, 0, NULL, 1, (unsigned int *)&hw_cfg->spk_pos);
- if (ret)
- return ret;
-
- return 0;
+ return cs35l41_hda_channel_map(cs35l41->dev, 0, NULL, 1, (unsigned int *)&hw_cfg->spk_pos);
}
static struct cs35l41_hda_hw_config *cs35l41_hda_read_acpi(struct cs35l41_hda *cs35l41,
struct cs35l41_hda_hw_config *hw_cfg;
u32 values[HDA_MAX_COMPONENTS];
struct acpi_device *adev;
- struct device *acpi_dev;
+ struct device *physdev;
char *property;
size_t nval;
int i, ret;
return ERR_PTR(-ENODEV);
}
- acpi_dev = get_device(acpi_get_first_physical_node(adev));
+ physdev = get_device(acpi_get_first_physical_node(adev));
acpi_dev_put(adev);
property = "cirrus,dev-index";
- ret = device_property_count_u32(acpi_dev, property);
+ ret = device_property_count_u32(physdev, property);
if (ret <= 0)
goto no_acpi_dsd;
}
nval = ret;
- ret = device_property_read_u32_array(acpi_dev, property, values, nval);
+ ret = device_property_read_u32_array(physdev, property, values, nval);
if (ret)
goto err;
goto err;
}
- /* No devm_ version as CLSA0100, in no_acpi_dsd case, can't use devm version */
+ /* To use the same release code for all laptop variants we can't use devm_ version of
+ * gpiod_get here, as CLSA010* don't have a fully functional bios with an _DSD node
+ */
cs35l41->reset_gpio = fwnode_gpiod_get_index(&adev->fwnode, "reset", cs35l41->index,
GPIOD_OUT_LOW, "cs35l41-reset");
}
property = "cirrus,speaker-position";
- ret = device_property_read_u32_array(acpi_dev, property, values, nval);
+ ret = device_property_read_u32_array(physdev, property, values, nval);
if (ret)
goto err_free;
hw_cfg->spk_pos = values[cs35l41->index];
property = "cirrus,gpio1-func";
- ret = device_property_read_u32_array(acpi_dev, property, values, nval);
+ ret = device_property_read_u32_array(physdev, property, values, nval);
if (ret)
goto err_free;
hw_cfg->gpio1_func = values[cs35l41->index];
property = "cirrus,gpio2-func";
- ret = device_property_read_u32_array(acpi_dev, property, values, nval);
+ ret = device_property_read_u32_array(physdev, property, values, nval);
if (ret)
goto err_free;
hw_cfg->gpio2_func = values[cs35l41->index];
property = "cirrus,boost-peak-milliamp";
- ret = device_property_read_u32_array(acpi_dev, property, values, nval);
+ ret = device_property_read_u32_array(physdev, property, values, nval);
if (ret == 0)
hw_cfg->bst_ipk = values[cs35l41->index];
property = "cirrus,boost-ind-nanohenry";
- ret = device_property_read_u32_array(acpi_dev, property, values, nval);
+ ret = device_property_read_u32_array(physdev, property, values, nval);
if (ret == 0)
hw_cfg->bst_ind = values[cs35l41->index];
property = "cirrus,boost-cap-microfarad";
- ret = device_property_read_u32_array(acpi_dev, property, values, nval);
+ ret = device_property_read_u32_array(physdev, property, values, nval);
if (ret == 0)
hw_cfg->bst_cap = values[cs35l41->index];
- put_device(acpi_dev);
+ put_device(physdev);
return hw_cfg;
err_free:
kfree(hw_cfg);
err:
- put_device(acpi_dev);
+ put_device(physdev);
dev_err(cs35l41->dev, "Failed property %s: %d\n", property, ret);
return ERR_PTR(ret);
/*
* Device CLSA0100 doesn't have _DSD so a gpiod_get by the label reset won't work.
* And devices created by i2c-multi-instantiate don't have their device struct pointing to
- * the correct fwnode, so acpi_dev must be used here
+ * the correct fwnode, so acpi_dev must be used here.
* And devm functions expect that the device requesting the resource has the correct
- * fwnode
+ * fwnode.
*/
if (strncmp(hid, "CLSA0100", 8) != 0)
return ERR_PTR(-EINVAL);
/* check I2C address to assign the index */
cs35l41->index = id == 0x40 ? 0 : 1;
- cs35l41->reset_gpio = gpiod_get_index(acpi_dev, NULL, 0, GPIOD_OUT_HIGH);
+ cs35l41->reset_gpio = gpiod_get_index(physdev, NULL, 0, GPIOD_OUT_HIGH);
cs35l41->vspk_always_on = true;
- put_device(acpi_dev);
+ put_device(physdev);
return NULL;
}
if (ret == -EBUSY) {
dev_info(cs35l41->dev, "Reset line busy, assuming shared reset\n");
} else {
- if (ret != -EPROBE_DEFER)
- dev_err(cs35l41->dev, "Failed to get reset GPIO: %d\n", ret);
+ dev_err_probe(cs35l41->dev, ret, "Failed to get reset GPIO: %d\n", ret);
goto err;
}
}
ret = regmap_read(cs35l41->regmap, CS35L41_IRQ1_STATUS3, &int_sts);
if (ret || (int_sts & CS35L41_OTP_BOOT_ERR)) {
- dev_err(cs35l41->dev, "OTP Boot error\n");
+ dev_err(cs35l41->dev, "OTP Boot status %x error: %d\n",
+ int_sts & CS35L41_OTP_BOOT_ERR, ret);
ret = -EIO;
goto err;
}
goto err;
}
+ ret = cs35l41_test_key_unlock(cs35l41->dev, cs35l41->regmap);
+ if (ret)
+ goto err;
+
ret = cs35l41_register_errata_patch(cs35l41->dev, cs35l41->regmap, reg_revid);
if (ret)
goto err;
goto err;
}
+ ret = cs35l41_test_key_lock(cs35l41->dev, cs35l41->regmap);
+ if (ret)
+ goto err;
+
ret = cs35l41_hda_apply_properties(cs35l41, acpi_hw_cfg);
if (ret)
goto err;
acpi_hw_cfg = NULL;
if (cs35l41->reg_seq->probe) {
- ret = regmap_register_patch(cs35l41->regmap, cs35l41->reg_seq->probe,
- cs35l41->reg_seq->num_probe);
+ ret = regmap_multi_reg_write(cs35l41->regmap, cs35l41->reg_seq->probe,
+ cs35l41->reg_seq->num_probe);
if (ret) {
dev_err(cs35l41->dev, "Fail to apply probe reg patch: %d\n", ret);
goto err;
return ret;
}
-EXPORT_SYMBOL_GPL(cs35l41_hda_probe);
+EXPORT_SYMBOL_NS_GPL(cs35l41_hda_probe, SND_HDA_SCODEC_CS35L41);
-int cs35l41_hda_remove(struct device *dev)
+void cs35l41_hda_remove(struct device *dev)
{
struct cs35l41_hda *cs35l41 = dev_get_drvdata(dev);
if (!cs35l41->vspk_always_on)
gpiod_set_value_cansleep(cs35l41->reset_gpio, 0);
gpiod_put(cs35l41->reset_gpio);
-
- return 0;
}
-EXPORT_SYMBOL_GPL(cs35l41_hda_remove);
-
+EXPORT_SYMBOL_NS_GPL(cs35l41_hda_remove, SND_HDA_SCODEC_CS35L41);
MODULE_DESCRIPTION("CS35L41 HDA Driver");
MODULE_AUTHOR("Lucas Tanure, Cirrus Logic Inc, <tanureal@opensource.cirrus.com>");
/* SPDX-License-Identifier: GPL-2.0
*
- * cs35l41_hda.h -- CS35L41 ALSA HDA audio driver
+ * CS35L41 ALSA HDA audio driver
*
* Copyright 2021 Cirrus Logic, Inc.
*
int cs35l41_hda_probe(struct device *dev, const char *device_name, int id, int irq,
struct regmap *regmap);
-int cs35l41_hda_remove(struct device *dev);
+void cs35l41_hda_remove(struct device *dev);
#endif /*__CS35L41_HDA_H__*/
static int cs35l41_hda_i2c_remove(struct i2c_client *clt)
{
- return cs35l41_hda_remove(&clt->dev);
+ cs35l41_hda_remove(&clt->dev);
+
+ return 0;
}
static const struct i2c_device_id cs35l41_hda_i2c_id[] = {
.probe = cs35l41_hda_i2c_probe,
.remove = cs35l41_hda_i2c_remove,
};
-
module_i2c_driver(cs35l41_i2c_driver);
MODULE_DESCRIPTION("HDA CS35L41 driver");
+MODULE_IMPORT_NS(SND_HDA_SCODEC_CS35L41);
MODULE_AUTHOR("Lucas Tanure <tanureal@opensource.cirrus.com>");
MODULE_LICENSE("GPL");
static int cs35l41_hda_spi_remove(struct spi_device *spi)
{
- return cs35l41_hda_remove(&spi->dev);
+ cs35l41_hda_remove(&spi->dev);
+
+ return 0;
}
static const struct spi_device_id cs35l41_hda_spi_id[] = {
.probe = cs35l41_hda_spi_probe,
.remove = cs35l41_hda_spi_remove,
};
-
module_spi_driver(cs35l41_spi_driver);
MODULE_DESCRIPTION("HDA CS35L41 driver");
+MODULE_IMPORT_NS(SND_HDA_SCODEC_CS35L41);
MODULE_AUTHOR("Lucas Tanure <tanureal@opensource.cirrus.com>");
MODULE_LICENSE("GPL");
SND_PCI_QUIRK(0x1028, 0x0ADC, "Warlock", CS8409_WARLOCK),
SND_PCI_QUIRK(0x1028, 0x0AF4, "Warlock", CS8409_WARLOCK),
SND_PCI_QUIRK(0x1028, 0x0AF5, "Warlock", CS8409_WARLOCK),
+ SND_PCI_QUIRK(0x1028, 0x0BB5, "Warlock N3 15 TGL-U Nuvoton EC", CS8409_WARLOCK),
+ SND_PCI_QUIRK(0x1028, 0x0BB6, "Warlock V3 15 TGL-U Nuvoton EC", CS8409_WARLOCK),
SND_PCI_QUIRK(0x1028, 0x0A77, "Cyborg", CS8409_CYBORG),
SND_PCI_QUIRK(0x1028, 0x0A78, "Cyborg", CS8409_CYBORG),
SND_PCI_QUIRK(0x1028, 0x0A79, "Cyborg", CS8409_CYBORG),
ALC285_FIXUP_LEGION_Y9000X_AUTOMUTE,
ALC287_FIXUP_LEGION_16ACHG6,
ALC287_FIXUP_CS35L41_I2C_2,
+ ALC285_FIXUP_HP_SPEAKERS_MICMUTE_LED,
};
static const struct hda_fixup alc269_fixups[] = {
.type = HDA_FIXUP_FUNC,
.v.func = cs35l41_fixup_i2c_two,
},
+ [ALC285_FIXUP_HP_SPEAKERS_MICMUTE_LED] = {
+ .type = HDA_FIXUP_VERBS,
+ .v.verbs = (const struct hda_verb[]) {
+ { 0x20, AC_VERB_SET_COEF_INDEX, 0x19 },
+ { 0x20, AC_VERB_SET_PROC_COEF, 0x8e11 },
+ { }
+ },
+ .chained = true,
+ .chain_id = ALC285_FIXUP_HP_MUTE_LED,
+ },
};
static const struct snd_pci_quirk alc269_fixup_tbl[] = {
SND_PCI_QUIRK(0x103c, 0x8870, "HP ZBook Fury 15.6 Inch G8 Mobile Workstation PC", ALC285_FIXUP_HP_GPIO_AMP_INIT),
SND_PCI_QUIRK(0x103c, 0x8873, "HP ZBook Studio 15.6 Inch G8 Mobile Workstation PC", ALC285_FIXUP_HP_GPIO_AMP_INIT),
SND_PCI_QUIRK(0x103c, 0x888d, "HP ZBook Power 15.6 inch G8 Mobile Workstation PC", ALC236_FIXUP_HP_GPIO_LED),
+ SND_PCI_QUIRK(0x103c, 0x8895, "HP EliteBook 855 G8 Notebook PC", ALC285_FIXUP_HP_SPEAKERS_MICMUTE_LED),
SND_PCI_QUIRK(0x103c, 0x8896, "HP EliteBook 855 G8 Notebook PC", ALC285_FIXUP_HP_MUTE_LED),
SND_PCI_QUIRK(0x103c, 0x8898, "HP EliteBook 845 G8 Notebook PC", ALC285_FIXUP_HP_LIMIT_INT_MIC_BOOST),
SND_PCI_QUIRK(0x103c, 0x88d0, "HP Pavilion 15-eh1xxx (mainboard 88D0)", ALC287_FIXUP_HP_GPIO_LED),
{}
};
+/* MSI MPG X570S Carbon Max Wifi with ALC4080 */
+static const struct usbmix_name_map msi_mpg_x570s_carbon_max_wifi_alc4080_map[] = {
+ { 29, "Speaker Playback" },
+ { 30, "Front Headphone Playback" },
+ { 32, "IEC958 Playback" },
+ {}
+};
+
/*
* Control map entries
*/
.map = trx40_mobo_map,
.connector_map = trx40_mobo_connector_map,
},
+ { /* MSI MPG X570S Carbon Max Wifi */
+ .id = USB_ID(0x0db0, 0x419c),
+ .map = msi_mpg_x570s_carbon_max_wifi_alc4080_map,
+ },
{ /* MSI TRX40 */
.id = USB_ID(0x0db0, 0x543d),
.map = trx40_mobo_map,
"RECLAIM %12s%15s%15s\n"
" %15llu%15llu%15llums\n"
"THRASHING%12s%15s%15s\n"
+ " %15llu%15llu%15llums\n"
+ "COMPACT %12s%15s%15s\n"
" %15llu%15llu%15llums\n",
"count", "real total", "virtual total",
"delay total", "delay average",
"count", "delay total", "delay average",
(unsigned long long)t->thrashing_count,
(unsigned long long)t->thrashing_delay_total,
- average_ms(t->thrashing_delay_total, t->thrashing_count));
+ average_ms(t->thrashing_delay_total, t->thrashing_count),
+ "count", "delay total", "delay average",
+ (unsigned long long)t->compact_count,
+ (unsigned long long)t->compact_delay_total,
+ average_ms(t->compact_delay_total, t->compact_count));
}
static void task_context_switch_counts(struct taskstats *t)
e.pid = task->tgid;
e.id = get_obj_id(file->private_data, obj_type);
- bpf_probe_read_kernel(&e.comm, sizeof(e.comm),
- task->group_leader->comm);
+ bpf_probe_read_kernel_str(&e.comm, sizeof(e.comm),
+ task->group_leader->comm);
bpf_seq_write(ctx->meta->seq, &e, sizeof(e));
return 0;
#include <asm-generic/bitops/fls.h>
#include <asm-generic/bitops/__fls.h>
#include <asm-generic/bitops/fls64.h>
-#include <asm-generic/bitops/find.h>
#ifndef _TOOLS_LINUX_BITOPS_H_
#error only <linux/bitops.h> can be included directly
+++ /dev/null
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _TOOLS_LINUX_ASM_GENERIC_BITOPS_FIND_H_
-#define _TOOLS_LINUX_ASM_GENERIC_BITOPS_FIND_H_
-
-extern unsigned long _find_next_bit(const unsigned long *addr1,
- const unsigned long *addr2, unsigned long nbits,
- unsigned long start, unsigned long invert, unsigned long le);
-extern unsigned long _find_first_bit(const unsigned long *addr, unsigned long size);
-extern unsigned long _find_first_zero_bit(const unsigned long *addr, unsigned long size);
-extern unsigned long _find_last_bit(const unsigned long *addr, unsigned long size);
-
-#ifndef find_next_bit
-/**
- * find_next_bit - find the next set bit in a memory region
- * @addr: The address to base the search on
- * @offset: The bitnumber to start searching at
- * @size: The bitmap size in bits
- *
- * Returns the bit number for the next set bit
- * If no bits are set, returns @size.
- */
-static inline
-unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
- unsigned long offset)
-{
- if (small_const_nbits(size)) {
- unsigned long val;
-
- if (unlikely(offset >= size))
- return size;
-
- val = *addr & GENMASK(size - 1, offset);
- return val ? __ffs(val) : size;
- }
-
- return _find_next_bit(addr, NULL, size, offset, 0UL, 0);
-}
-#endif
-
-#ifndef find_next_and_bit
-/**
- * find_next_and_bit - find the next set bit in both memory regions
- * @addr1: The first address to base the search on
- * @addr2: The second address to base the search on
- * @offset: The bitnumber to start searching at
- * @size: The bitmap size in bits
- *
- * Returns the bit number for the next set bit
- * If no bits are set, returns @size.
- */
-static inline
-unsigned long find_next_and_bit(const unsigned long *addr1,
- const unsigned long *addr2, unsigned long size,
- unsigned long offset)
-{
- if (small_const_nbits(size)) {
- unsigned long val;
-
- if (unlikely(offset >= size))
- return size;
-
- val = *addr1 & *addr2 & GENMASK(size - 1, offset);
- return val ? __ffs(val) : size;
- }
-
- return _find_next_bit(addr1, addr2, size, offset, 0UL, 0);
-}
-#endif
-
-#ifndef find_next_zero_bit
-/**
- * find_next_zero_bit - find the next cleared bit in a memory region
- * @addr: The address to base the search on
- * @offset: The bitnumber to start searching at
- * @size: The bitmap size in bits
- *
- * Returns the bit number of the next zero bit
- * If no bits are zero, returns @size.
- */
-static inline
-unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size,
- unsigned long offset)
-{
- if (small_const_nbits(size)) {
- unsigned long val;
-
- if (unlikely(offset >= size))
- return size;
-
- val = *addr | ~GENMASK(size - 1, offset);
- return val == ~0UL ? size : ffz(val);
- }
-
- return _find_next_bit(addr, NULL, size, offset, ~0UL, 0);
-}
-#endif
-
-#ifndef find_first_bit
-
-/**
- * find_first_bit - find the first set bit in a memory region
- * @addr: The address to start the search at
- * @size: The maximum number of bits to search
- *
- * Returns the bit number of the first set bit.
- * If no bits are set, returns @size.
- */
-static inline
-unsigned long find_first_bit(const unsigned long *addr, unsigned long size)
-{
- if (small_const_nbits(size)) {
- unsigned long val = *addr & GENMASK(size - 1, 0);
-
- return val ? __ffs(val) : size;
- }
-
- return _find_first_bit(addr, size);
-}
-
-#endif /* find_first_bit */
-
-#ifndef find_first_zero_bit
-
-/**
- * find_first_zero_bit - find the first cleared bit in a memory region
- * @addr: The address to start the search at
- * @size: The maximum number of bits to search
- *
- * Returns the bit number of the first cleared bit.
- * If no bits are zero, returns @size.
- */
-static inline
-unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size)
-{
- if (small_const_nbits(size)) {
- unsigned long val = *addr | ~GENMASK(size - 1, 0);
-
- return val == ~0UL ? size : ffz(val);
- }
-
- return _find_first_zero_bit(addr, size);
-}
-#endif
-
-#endif /*_TOOLS_LINUX_ASM_GENERIC_BITOPS_FIND_H_ */
/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _PERF_BITOPS_H
-#define _PERF_BITOPS_H
+#ifndef _TOOLS_LINUX_BITMAP_H
+#define _TOOLS_LINUX_BITMAP_H
#include <string.h>
#include <linux/bitops.h>
+#include <linux/find.h>
#include <stdlib.h>
#include <linux/kernel.h>
return __bitmap_intersects(src1, src2, nbits);
}
-#endif /* _PERF_BITOPS_H */
+#endif /* _TOOLS_LINUX_BITMAP_H */
--- /dev/null
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _TOOLS_LINUX_FIND_H_
+#define _TOOLS_LINUX_FIND_H_
+
+#ifndef _TOOLS_LINUX_BITMAP_H
+#error tools: only <linux/bitmap.h> can be included directly
+#endif
+
+#include <linux/bitops.h>
+
+extern unsigned long _find_next_bit(const unsigned long *addr1,
+ const unsigned long *addr2, unsigned long nbits,
+ unsigned long start, unsigned long invert, unsigned long le);
+extern unsigned long _find_first_bit(const unsigned long *addr, unsigned long size);
+extern unsigned long _find_first_and_bit(const unsigned long *addr1,
+ const unsigned long *addr2, unsigned long size);
+extern unsigned long _find_first_zero_bit(const unsigned long *addr, unsigned long size);
+extern unsigned long _find_last_bit(const unsigned long *addr, unsigned long size);
+
+#ifndef find_next_bit
+/**
+ * find_next_bit - find the next set bit in a memory region
+ * @addr: The address to base the search on
+ * @offset: The bitnumber to start searching at
+ * @size: The bitmap size in bits
+ *
+ * Returns the bit number for the next set bit
+ * If no bits are set, returns @size.
+ */
+static inline
+unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
+ unsigned long offset)
+{
+ if (small_const_nbits(size)) {
+ unsigned long val;
+
+ if (unlikely(offset >= size))
+ return size;
+
+ val = *addr & GENMASK(size - 1, offset);
+ return val ? __ffs(val) : size;
+ }
+
+ return _find_next_bit(addr, NULL, size, offset, 0UL, 0);
+}
+#endif
+
+#ifndef find_next_and_bit
+/**
+ * find_next_and_bit - find the next set bit in both memory regions
+ * @addr1: The first address to base the search on
+ * @addr2: The second address to base the search on
+ * @offset: The bitnumber to start searching at
+ * @size: The bitmap size in bits
+ *
+ * Returns the bit number for the next set bit
+ * If no bits are set, returns @size.
+ */
+static inline
+unsigned long find_next_and_bit(const unsigned long *addr1,
+ const unsigned long *addr2, unsigned long size,
+ unsigned long offset)
+{
+ if (small_const_nbits(size)) {
+ unsigned long val;
+
+ if (unlikely(offset >= size))
+ return size;
+
+ val = *addr1 & *addr2 & GENMASK(size - 1, offset);
+ return val ? __ffs(val) : size;
+ }
+
+ return _find_next_bit(addr1, addr2, size, offset, 0UL, 0);
+}
+#endif
+
+#ifndef find_next_zero_bit
+/**
+ * find_next_zero_bit - find the next cleared bit in a memory region
+ * @addr: The address to base the search on
+ * @offset: The bitnumber to start searching at
+ * @size: The bitmap size in bits
+ *
+ * Returns the bit number of the next zero bit
+ * If no bits are zero, returns @size.
+ */
+static inline
+unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size,
+ unsigned long offset)
+{
+ if (small_const_nbits(size)) {
+ unsigned long val;
+
+ if (unlikely(offset >= size))
+ return size;
+
+ val = *addr | ~GENMASK(size - 1, offset);
+ return val == ~0UL ? size : ffz(val);
+ }
+
+ return _find_next_bit(addr, NULL, size, offset, ~0UL, 0);
+}
+#endif
+
+#ifndef find_first_bit
+/**
+ * find_first_bit - find the first set bit in a memory region
+ * @addr: The address to start the search at
+ * @size: The maximum number of bits to search
+ *
+ * Returns the bit number of the first set bit.
+ * If no bits are set, returns @size.
+ */
+static inline
+unsigned long find_first_bit(const unsigned long *addr, unsigned long size)
+{
+ if (small_const_nbits(size)) {
+ unsigned long val = *addr & GENMASK(size - 1, 0);
+
+ return val ? __ffs(val) : size;
+ }
+
+ return _find_first_bit(addr, size);
+}
+#endif
+
+#ifndef find_first_and_bit
+/**
+ * find_first_and_bit - find the first set bit in both memory regions
+ * @addr1: The first address to base the search on
+ * @addr2: The second address to base the search on
+ * @size: The bitmap size in bits
+ *
+ * Returns the bit number for the next set bit
+ * If no bits are set, returns @size.
+ */
+static inline
+unsigned long find_first_and_bit(const unsigned long *addr1,
+ const unsigned long *addr2,
+ unsigned long size)
+{
+ if (small_const_nbits(size)) {
+ unsigned long val = *addr1 & *addr2 & GENMASK(size - 1, 0);
+
+ return val ? __ffs(val) : size;
+ }
+
+ return _find_first_and_bit(addr1, addr2, size);
+}
+#endif
+
+#ifndef find_first_zero_bit
+/**
+ * find_first_zero_bit - find the first cleared bit in a memory region
+ * @addr: The address to start the search at
+ * @size: The maximum number of bits to search
+ *
+ * Returns the bit number of the first cleared bit.
+ * If no bits are zero, returns @size.
+ */
+static inline
+unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size)
+{
+ if (small_const_nbits(size)) {
+ unsigned long val = *addr | ~GENMASK(size - 1, 0);
+
+ return val == ~0UL ? size : ffz(val);
+ }
+
+ return _find_first_zero_bit(addr, size);
+}
+#endif
+
+#ifndef find_last_bit
+/**
+ * find_last_bit - find the last set bit in a memory region
+ * @addr: The address to start the search at
+ * @size: The number of bits to search
+ *
+ * Returns the bit number of the last set bit, or size.
+ */
+static inline
+unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
+{
+ if (small_const_nbits(size)) {
+ unsigned long val = *addr & GENMASK(size - 1, 0);
+
+ return val ? __fls(val) : size;
+ }
+
+ return _find_last_bit(addr, size);
+}
+#endif
+
+/**
+ * find_next_clump8 - find next 8-bit clump with set bits in a memory region
+ * @clump: location to store copy of found clump
+ * @addr: address to base the search on
+ * @size: bitmap size in number of bits
+ * @offset: bit offset at which to start searching
+ *
+ * Returns the bit offset for the next set clump; the found clump value is
+ * copied to the location pointed by @clump. If no bits are set, returns @size.
+ */
+extern unsigned long find_next_clump8(unsigned long *clump,
+ const unsigned long *addr,
+ unsigned long size, unsigned long offset);
+
+#define find_first_clump8(clump, bits, size) \
+ find_next_clump8((clump), (bits), (size), 0)
+
+
+#endif /*__LINUX_FIND_H_ */
return val * GOLDEN_RATIO_32;
}
-#ifndef HAVE_ARCH_HASH_32
-#define hash_32 hash_32_generic
-#endif
-static inline u32 hash_32_generic(u32 val, unsigned int bits)
+static inline u32 hash_32(u32 val, unsigned int bits)
{
/* High bits are more random, so use them. */
return __hash_32(val) >> (32 - bits);
#define KVM_CAP_EXIT_ON_EMULATION_FAILURE 204
#define KVM_CAP_ARM_MTE 205
#define KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM 206
-#define KVM_CAP_XSAVE2 207
+#define KVM_CAP_VM_GPA_BITS 207
+#define KVM_CAP_XSAVE2 208
#ifdef KVM_CAP_IRQ_ROUTING
__u32 sint;
};
+struct kvm_irq_routing_xen_evtchn {
+ __u32 port;
+ __u32 vcpu;
+ __u32 priority;
+};
+
+#define KVM_IRQ_ROUTING_XEN_EVTCHN_PRIO_2LEVEL ((__u32)(-1))
+
/* gsi routing entry types */
#define KVM_IRQ_ROUTING_IRQCHIP 1
#define KVM_IRQ_ROUTING_MSI 2
#define KVM_IRQ_ROUTING_S390_ADAPTER 3
#define KVM_IRQ_ROUTING_HV_SINT 4
+#define KVM_IRQ_ROUTING_XEN_EVTCHN 5
struct kvm_irq_routing_entry {
__u32 gsi;
struct kvm_irq_routing_msi msi;
struct kvm_irq_routing_s390_adapter adapter;
struct kvm_irq_routing_hv_sint hv_sint;
+ struct kvm_irq_routing_xen_evtchn xen_evtchn;
__u32 pad[8];
} u;
};
#define KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL (1 << 1)
#define KVM_XEN_HVM_CONFIG_SHARED_INFO (1 << 2)
#define KVM_XEN_HVM_CONFIG_RUNSTATE (1 << 3)
+#define KVM_XEN_HVM_CONFIG_EVTCHN_2LEVEL (1 << 4)
struct kvm_xen_hvm_config {
__u32 flags;
/* Available with KVM_CAP_XSAVE */
#define KVM_GET_XSAVE _IOR(KVMIO, 0xa4, struct kvm_xsave)
#define KVM_SET_XSAVE _IOW(KVMIO, 0xa5, struct kvm_xsave)
-/* Available with KVM_CAP_XSAVE2 */
-#define KVM_GET_XSAVE2 _IOR(KVMIO, 0xcf, struct kvm_xsave)
/* Available with KVM_CAP_XCRS */
#define KVM_GET_XCRS _IOR(KVMIO, 0xa6, struct kvm_xcrs)
#define KVM_SET_XCRS _IOW(KVMIO, 0xa7, struct kvm_xcrs)
#define KVM_S390_NORMAL_RESET _IO(KVMIO, 0xc3)
#define KVM_S390_CLEAR_RESET _IO(KVMIO, 0xc4)
+/* Available with KVM_CAP_XSAVE2 */
+#define KVM_GET_XSAVE2 _IOR(KVMIO, 0xcf, struct kvm_xsave)
+
struct kvm_s390_pv_sec_parm {
__u64 origin;
__u64 length;
}
#endif
+#ifndef find_first_and_bit
+/*
+ * Find the first set bit in two memory regions.
+ */
+unsigned long _find_first_and_bit(const unsigned long *addr1,
+ const unsigned long *addr2,
+ unsigned long size)
+{
+ unsigned long idx, val;
+
+ for (idx = 0; idx * BITS_PER_LONG < size; idx++) {
+ val = addr1[idx] & addr2[idx];
+ if (val)
+ return min(idx * BITS_PER_LONG + __ffs(val), size);
+ }
+
+ return size;
+}
+#endif
+
#ifndef find_first_zero_bit
/*
* Find the first cleared bit in a memory region.
#include "test_d_path.skel.h"
#include "test_d_path_check_rdonly_mem.skel.h"
+#include "test_d_path_check_types.skel.h"
static int duration;
test_d_path_check_rdonly_mem__destroy(skel);
}
+static void test_d_path_check_types(void)
+{
+ struct test_d_path_check_types *skel;
+
+ skel = test_d_path_check_types__open_and_load();
+ ASSERT_ERR_PTR(skel, "unexpected_load_passing_wrong_type");
+
+ test_d_path_check_types__destroy(skel);
+}
+
void test_d_path(void)
{
if (test__start_subtest("basic"))
if (test__start_subtest("check_rdonly_mem"))
test_d_path_check_rdonly_mem();
+
+ if (test__start_subtest("check_alloc_mem"))
+ test_d_path_check_types();
}
void serial_test_xdp_link(void)
{
- __u32 duration = 0, id1, id2, id0 = 0, prog_fd1, prog_fd2, err;
DECLARE_LIBBPF_OPTS(bpf_xdp_set_link_opts, opts, .old_fd = -1);
struct test_xdp_link *skel1 = NULL, *skel2 = NULL;
+ __u32 id1, id2, id0 = 0, prog_fd1, prog_fd2;
struct bpf_link_info link_info;
struct bpf_prog_info prog_info;
struct bpf_link *link;
+ int err;
__u32 link_info_len = sizeof(link_info);
__u32 prog_info_len = sizeof(prog_info);
skel1 = test_xdp_link__open_and_load();
- if (CHECK(!skel1, "skel_load", "skeleton open and load failed\n"))
+ if (!ASSERT_OK_PTR(skel1, "skel_load"))
goto cleanup;
prog_fd1 = bpf_program__fd(skel1->progs.xdp_handler);
skel2 = test_xdp_link__open_and_load();
- if (CHECK(!skel2, "skel_load", "skeleton open and load failed\n"))
+ if (!ASSERT_OK_PTR(skel2, "skel_load"))
goto cleanup;
prog_fd2 = bpf_program__fd(skel2->progs.xdp_handler);
memset(&prog_info, 0, sizeof(prog_info));
err = bpf_obj_get_info_by_fd(prog_fd1, &prog_info, &prog_info_len);
- if (CHECK(err, "fd_info1", "failed %d\n", -errno))
+ if (!ASSERT_OK(err, "fd_info1"))
goto cleanup;
id1 = prog_info.id;
memset(&prog_info, 0, sizeof(prog_info));
err = bpf_obj_get_info_by_fd(prog_fd2, &prog_info, &prog_info_len);
- if (CHECK(err, "fd_info2", "failed %d\n", -errno))
+ if (!ASSERT_OK(err, "fd_info2"))
goto cleanup;
id2 = prog_info.id;
/* set initial prog attachment */
err = bpf_set_link_xdp_fd_opts(IFINDEX_LO, prog_fd1, XDP_FLAGS_REPLACE, &opts);
- if (CHECK(err, "fd_attach", "initial prog attach failed: %d\n", err))
+ if (!ASSERT_OK(err, "fd_attach"))
goto cleanup;
/* validate prog ID */
err = bpf_get_link_xdp_id(IFINDEX_LO, &id0, 0);
- CHECK(err || id0 != id1, "id1_check",
- "loaded prog id %u != id1 %u, err %d", id0, id1, err);
+ if (!ASSERT_OK(err, "id1_check_err") || !ASSERT_EQ(id0, id1, "id1_check_val"))
+ goto cleanup;
/* BPF link is not allowed to replace prog attachment */
link = bpf_program__attach_xdp(skel1->progs.xdp_handler, IFINDEX_LO);
/* detach BPF program */
opts.old_fd = prog_fd1;
err = bpf_set_link_xdp_fd_opts(IFINDEX_LO, -1, XDP_FLAGS_REPLACE, &opts);
- if (CHECK(err, "prog_detach", "failed %d\n", err))
+ if (!ASSERT_OK(err, "prog_detach"))
goto cleanup;
/* now BPF link should attach successfully */
/* validate prog ID */
err = bpf_get_link_xdp_id(IFINDEX_LO, &id0, 0);
- if (CHECK(err || id0 != id1, "id1_check",
- "loaded prog id %u != id1 %u, err %d", id0, id1, err))
+ if (!ASSERT_OK(err, "id1_check_err") || !ASSERT_EQ(id0, id1, "id1_check_val"))
goto cleanup;
/* BPF prog attach is not allowed to replace BPF link */
opts.old_fd = prog_fd1;
err = bpf_set_link_xdp_fd_opts(IFINDEX_LO, prog_fd2, XDP_FLAGS_REPLACE, &opts);
- if (CHECK(!err, "prog_attach_fail", "unexpected success\n"))
+ if (!ASSERT_ERR(err, "prog_attach_fail"))
goto cleanup;
/* Can't force-update when BPF link is active */
err = bpf_set_link_xdp_fd(IFINDEX_LO, prog_fd2, 0);
- if (CHECK(!err, "prog_update_fail", "unexpected success\n"))
+ if (!ASSERT_ERR(err, "prog_update_fail"))
goto cleanup;
/* Can't force-detach when BPF link is active */
err = bpf_set_link_xdp_fd(IFINDEX_LO, -1, 0);
- if (CHECK(!err, "prog_detach_fail", "unexpected success\n"))
+ if (!ASSERT_ERR(err, "prog_detach_fail"))
goto cleanup;
/* BPF link is not allowed to replace another BPF link */
skel2->links.xdp_handler = link;
err = bpf_get_link_xdp_id(IFINDEX_LO, &id0, 0);
- if (CHECK(err || id0 != id2, "id2_check",
- "loaded prog id %u != id2 %u, err %d", id0, id1, err))
+ if (!ASSERT_OK(err, "id2_check_err") || !ASSERT_EQ(id0, id2, "id2_check_val"))
goto cleanup;
/* updating program under active BPF link works as expected */
err = bpf_link__update_program(link, skel1->progs.xdp_handler);
- if (CHECK(err, "link_upd", "failed: %d\n", err))
+ if (!ASSERT_OK(err, "link_upd"))
goto cleanup;
memset(&link_info, 0, sizeof(link_info));
err = bpf_obj_get_info_by_fd(bpf_link__fd(link), &link_info, &link_info_len);
- if (CHECK(err, "link_info", "failed: %d\n", err))
+ if (!ASSERT_OK(err, "link_info"))
goto cleanup;
- CHECK(link_info.type != BPF_LINK_TYPE_XDP, "link_type",
- "got %u != exp %u\n", link_info.type, BPF_LINK_TYPE_XDP);
- CHECK(link_info.prog_id != id1, "link_prog_id",
- "got %u != exp %u\n", link_info.prog_id, id1);
- CHECK(link_info.xdp.ifindex != IFINDEX_LO, "link_ifindex",
- "got %u != exp %u\n", link_info.xdp.ifindex, IFINDEX_LO);
+ ASSERT_EQ(link_info.type, BPF_LINK_TYPE_XDP, "link_type");
+ ASSERT_EQ(link_info.prog_id, id1, "link_prog_id");
+ ASSERT_EQ(link_info.xdp.ifindex, IFINDEX_LO, "link_ifindex");
+
+ /* updating program under active BPF link with different type fails */
+ err = bpf_link__update_program(link, skel1->progs.tc_handler);
+ if (!ASSERT_ERR(err, "link_upd_invalid"))
+ goto cleanup;
err = bpf_link__detach(link);
- if (CHECK(err, "link_detach", "failed %d\n", err))
+ if (!ASSERT_OK(err, "link_detach"))
goto cleanup;
memset(&link_info, 0, sizeof(link_info));
err = bpf_obj_get_info_by_fd(bpf_link__fd(link), &link_info, &link_info_len);
- if (CHECK(err, "link_info", "failed: %d\n", err))
- goto cleanup;
- CHECK(link_info.prog_id != id1, "link_prog_id",
- "got %u != exp %u\n", link_info.prog_id, id1);
+
+ ASSERT_OK(err, "link_info");
+ ASSERT_EQ(link_info.prog_id, id1, "link_prog_id");
/* ifindex should be zeroed out */
- CHECK(link_info.xdp.ifindex != 0, "link_ifindex",
- "got %u != exp %u\n", link_info.xdp.ifindex, 0);
+ ASSERT_EQ(link_info.xdp.ifindex, 0, "link_ifindex");
cleanup:
test_xdp_link__destroy(skel1);
--- /dev/null
+// SPDX-License-Identifier: GPL-2.0
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+extern const int bpf_prog_active __ksym;
+
+struct {
+ __uint(type, BPF_MAP_TYPE_RINGBUF);
+ __uint(max_entries, 1 << 12);
+} ringbuf SEC(".maps");
+
+SEC("fentry/security_inode_getattr")
+int BPF_PROG(d_path_check_rdonly_mem, struct path *path, struct kstat *stat,
+ __u32 request_mask, unsigned int query_flags)
+{
+ void *active;
+ u32 cpu;
+
+ cpu = bpf_get_smp_processor_id();
+ active = (void *)bpf_per_cpu_ptr(&bpf_prog_active, cpu);
+ if (active) {
+ /* FAIL here! 'active' points to 'regular' memory. It
+ * cannot be submitted to ring buffer.
+ */
+ bpf_ringbuf_submit(active, 0);
+ }
+ return 0;
+}
+
+char _license[] SEC("license") = "GPL";
// SPDX-License-Identifier: GPL-2.0
// Copyright (c) 2018 Facebook
-#include <linux/bpf.h>
+#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#ifndef PERF_MAX_STACK_DEPTH
/* taken from /sys/kernel/debug/tracing/events/sched/sched_switch/format */
struct sched_switch_args {
unsigned long long pad;
- char prev_comm[16];
+ char prev_comm[TASK_COMM_LEN];
int prev_pid;
int prev_prio;
long long prev_state;
- char next_comm[16];
+ char next_comm[TASK_COMM_LEN];
int next_pid;
int next_prio;
};
// SPDX-License-Identifier: GPL-2.0
// Copyright (c) 2017 Facebook
-#include <linux/bpf.h>
+#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
/* taken from /sys/kernel/debug/tracing/events/sched/sched_switch/format */
struct sched_switch_args {
unsigned long long pad;
- char prev_comm[16];
+ char prev_comm[TASK_COMM_LEN];
int prev_pid;
int prev_prio;
long long prev_state;
- char next_comm[16];
+ char next_comm[TASK_COMM_LEN];
int next_pid;
int next_prio;
};
{
return 0;
}
+
+SEC("tc")
+int tc_handler(struct __sk_buff *skb)
+{
+ return 0;
+}
--- /dev/null
+{
+ "ringbuf: invalid reservation offset 1",
+ .insns = {
+ /* reserve 8 byte ringbuf memory */
+ BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_MOV64_IMM(BPF_REG_2, 8),
+ BPF_MOV64_IMM(BPF_REG_3, 0),
+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_ringbuf_reserve),
+ /* store a pointer to the reserved memory in R6 */
+ BPF_MOV64_REG(BPF_REG_6, BPF_REG_0),
+ /* check whether the reservation was successful */
+ BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 7),
+ /* spill R6(mem) into the stack */
+ BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_6, -8),
+ /* fill it back in R7 */
+ BPF_LDX_MEM(BPF_DW, BPF_REG_7, BPF_REG_10, -8),
+ /* should be able to access *(R7) = 0 */
+ BPF_ST_MEM(BPF_DW, BPF_REG_7, 0, 0),
+ /* submit the reserved ringbuf memory */
+ BPF_MOV64_REG(BPF_REG_1, BPF_REG_7),
+ /* add invalid offset to reserved ringbuf memory */
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 0xcafe),
+ BPF_MOV64_IMM(BPF_REG_2, 0),
+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_ringbuf_submit),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map_ringbuf = { 1 },
+ .result = REJECT,
+ .errstr = "dereference of modified alloc_mem ptr R1",
+},
+{
+ "ringbuf: invalid reservation offset 2",
+ .insns = {
+ /* reserve 8 byte ringbuf memory */
+ BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_MOV64_IMM(BPF_REG_2, 8),
+ BPF_MOV64_IMM(BPF_REG_3, 0),
+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_ringbuf_reserve),
+ /* store a pointer to the reserved memory in R6 */
+ BPF_MOV64_REG(BPF_REG_6, BPF_REG_0),
+ /* check whether the reservation was successful */
+ BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 7),
+ /* spill R6(mem) into the stack */
+ BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_6, -8),
+ /* fill it back in R7 */
+ BPF_LDX_MEM(BPF_DW, BPF_REG_7, BPF_REG_10, -8),
+ /* add invalid offset to reserved ringbuf memory */
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, 0xcafe),
+ /* should be able to access *(R7) = 0 */
+ BPF_ST_MEM(BPF_DW, BPF_REG_7, 0, 0),
+ /* submit the reserved ringbuf memory */
+ BPF_MOV64_REG(BPF_REG_1, BPF_REG_7),
+ BPF_MOV64_IMM(BPF_REG_2, 0),
+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_ringbuf_submit),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map_ringbuf = { 1 },
+ .result = REJECT,
+ .errstr = "R7 min value is outside of the allowed memory range",
+},
+{
+ "ringbuf: check passing rb mem to helpers",
+ .insns = {
+ BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
+ /* reserve 8 byte ringbuf memory */
+ BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_MOV64_IMM(BPF_REG_2, 8),
+ BPF_MOV64_IMM(BPF_REG_3, 0),
+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_ringbuf_reserve),
+ BPF_MOV64_REG(BPF_REG_7, BPF_REG_0),
+ /* check whether the reservation was successful */
+ BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+ BPF_EXIT_INSN(),
+ /* pass allocated ring buffer memory to fib lookup */
+ BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_0),
+ BPF_MOV64_IMM(BPF_REG_3, 8),
+ BPF_MOV64_IMM(BPF_REG_4, 0),
+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_fib_lookup),
+ /* submit the ringbuf memory */
+ BPF_MOV64_REG(BPF_REG_1, BPF_REG_7),
+ BPF_MOV64_IMM(BPF_REG_2, 0),
+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_ringbuf_submit),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map_ringbuf = { 2 },
+ .prog_type = BPF_PROG_TYPE_XDP,
+ .result = ACCEPT,
+},
},
.fixup_map_ringbuf = { 1 },
.result = REJECT,
- .errstr = "R0 pointer arithmetic on mem_or_null prohibited",
+ .errstr = "R0 pointer arithmetic on alloc_mem_or_null prohibited",
},
{
"check corrupted spill/fill",
/s390x/memop
/s390x/resets
/s390x/sync_regs_test
+/x86_64/amx_test
+/x86_64/cpuid_test
/x86_64/cr4_cpuid_sync_test
/x86_64/debug_regs
/x86_64/evmcs_test
/x86_64/emulator_error_test
-/x86_64/get_cpuid_test
/x86_64/get_msr_index_features
/x86_64/kvm_clock_test
/x86_64/kvm_pv_test
/x86_64/mmio_warning_test
/x86_64/mmu_role_test
/x86_64/platform_info_test
+/x86_64/pmu_event_filter_test
/x86_64/set_boot_cpu_id
/x86_64/set_sregs_test
/x86_64/sev_migrate_tests
/x86_64/vmx_apic_access_test
/x86_64/vmx_close_while_nested_test
/x86_64/vmx_dirty_log_test
+/x86_64/vmx_exception_with_invalid_guest_state
/x86_64/vmx_invalid_nested_guest_state
/x86_64/vmx_preemption_timer_test
/x86_64/vmx_set_nested_state_test
LIBKVM_s390x = lib/s390x/processor.c lib/s390x/ucall.c lib/s390x/diag318_test_handler.c
LIBKVM_riscv = lib/riscv/processor.c lib/riscv/ucall.c
-TEST_GEN_PROGS_x86_64 = x86_64/cr4_cpuid_sync_test
+TEST_GEN_PROGS_x86_64 = x86_64/cpuid_test
+TEST_GEN_PROGS_x86_64 += x86_64/cr4_cpuid_sync_test
TEST_GEN_PROGS_x86_64 += x86_64/get_msr_index_features
TEST_GEN_PROGS_x86_64 += x86_64/evmcs_test
TEST_GEN_PROGS_x86_64 += x86_64/emulator_error_test
-TEST_GEN_PROGS_x86_64 += x86_64/get_cpuid_test
TEST_GEN_PROGS_x86_64 += x86_64/hyperv_clock
TEST_GEN_PROGS_x86_64 += x86_64/hyperv_cpuid
TEST_GEN_PROGS_x86_64 += x86_64/hyperv_features
TEST_GEN_PROGS_x86_64 += x86_64/mmio_warning_test
TEST_GEN_PROGS_x86_64 += x86_64/mmu_role_test
TEST_GEN_PROGS_x86_64 += x86_64/platform_info_test
+TEST_GEN_PROGS_x86_64 += x86_64/pmu_event_filter_test
TEST_GEN_PROGS_x86_64 += x86_64/set_boot_cpu_id
TEST_GEN_PROGS_x86_64 += x86_64/set_sregs_test
TEST_GEN_PROGS_x86_64 += x86_64/smm_test
TEST_GEN_PROGS_x86_64 += x86_64/vmx_apic_access_test
TEST_GEN_PROGS_x86_64 += x86_64/vmx_close_while_nested_test
TEST_GEN_PROGS_x86_64 += x86_64/vmx_dirty_log_test
+TEST_GEN_PROGS_x86_64 += x86_64/vmx_exception_with_invalid_guest_state
TEST_GEN_PROGS_x86_64 += x86_64/vmx_invalid_nested_guest_state
TEST_GEN_PROGS_x86_64 += x86_64/vmx_set_nested_state_test
TEST_GEN_PROGS_x86_64 += x86_64/vmx_tsc_adjust_test
}
bool is_intel_cpu(void);
+bool is_amd_cpu(void);
+
+static inline unsigned int x86_family(unsigned int eax)
+{
+ unsigned int x86;
+
+ x86 = (eax >> 8) & 0xf;
+
+ if (x86 == 0xf)
+ x86 += (eax >> 20) & 0xff;
+
+ return x86;
+}
+
+static inline unsigned int x86_model(unsigned int eax)
+{
+ return ((eax >> 12) & 0xf0) | ((eax >> 4) & 0x0f);
+}
struct kvm_x86_state *vcpu_save_state(struct kvm_vm *vm, uint32_t vcpuid);
void vcpu_load_state(struct kvm_vm *vm, uint32_t vcpuid,
struct kvm_cpuid2 *kvm_get_supported_cpuid(void);
struct kvm_cpuid2 *vcpu_get_cpuid(struct kvm_vm *vm, uint32_t vcpuid);
+int __vcpu_set_cpuid(struct kvm_vm *vm, uint32_t vcpuid,
+ struct kvm_cpuid2 *cpuid);
void vcpu_set_cpuid(struct kvm_vm *vm, uint32_t vcpuid,
struct kvm_cpuid2 *cpuid);
void vm_set_page_table_entry(struct kvm_vm *vm, int vcpuid, uint64_t vaddr,
uint64_t pte);
+/*
+ * get_cpuid() - find matching CPUID entry and return pointer to it.
+ */
+struct kvm_cpuid_entry2 *get_cpuid(struct kvm_cpuid2 *cpuid, uint32_t function,
+ uint32_t index);
/*
* set_cpuid() - overwrites a matching cpuid entry with the provided value.
* matches based on ent->function && ent->index. returns true
struct kvm_vm *vm;
int i;
+#ifdef __x86_64__
/*
* Permission needs to be requested before KVM_SET_CPUID2.
*/
vm_xsave_req_perm();
+#endif
/* Force slot0 memory size not small than DEFAULT_GUEST_PHY_PAGES */
if (slot0_mem_pages < DEFAULT_GUEST_PHY_PAGES)
void kvm_vm_clear_dirty_log(struct kvm_vm *vm, int slot, void *log,
uint64_t first_page, uint32_t num_pages)
{
- struct kvm_clear_dirty_log args = { .dirty_bitmap = log, .slot = slot,
- .first_page = first_page,
- .num_pages = num_pages };
+ struct kvm_clear_dirty_log args = {
+ .dirty_bitmap = log, .slot = slot,
+ .first_page = first_page,
+ .num_pages = num_pages
+ };
int ret;
ret = ioctl(vm->fd, KVM_CLEAR_DIRTY_LOG, &args);
return entry;
}
+
+int __vcpu_set_cpuid(struct kvm_vm *vm, uint32_t vcpuid,
+ struct kvm_cpuid2 *cpuid)
+{
+ struct vcpu *vcpu = vcpu_find(vm, vcpuid);
+
+ TEST_ASSERT(vcpu != NULL, "vcpu not found, vcpuid: %u", vcpuid);
+
+ return ioctl(vcpu->fd, KVM_SET_CPUID2, cpuid);
+}
+
/*
* VM VCPU CPUID Set
*
void vcpu_set_cpuid(struct kvm_vm *vm,
uint32_t vcpuid, struct kvm_cpuid2 *cpuid)
{
- struct vcpu *vcpu = vcpu_find(vm, vcpuid);
int rc;
- TEST_ASSERT(vcpu != NULL, "vcpu not found, vcpuid: %u", vcpuid);
-
- rc = ioctl(vcpu->fd, KVM_SET_CPUID2, cpuid);
+ rc = __vcpu_set_cpuid(vm, vcpuid, cpuid);
TEST_ASSERT(rc == 0, "KVM_SET_CPUID2 failed, rc: %i errno: %i",
rc, errno);
list = malloc(sizeof(*list) + nmsrs * sizeof(list->indices[0]));
list->nmsrs = nmsrs;
r = ioctl(vm->kvm_fd, KVM_GET_MSR_INDEX_LIST, list);
- TEST_ASSERT(r == 0, "Unexpected result from KVM_GET_MSR_INDEX_LIST, r: %i",
- r);
+ TEST_ASSERT(r == 0, "Unexpected result from KVM_GET_MSR_INDEX_LIST, r: %i",
+ r);
state = malloc(sizeof(*state) + nmsrs * sizeof(state->msrs.entries[0]));
r = ioctl(vcpu->fd, KVM_GET_VCPU_EVENTS, &state->events);
- TEST_ASSERT(r == 0, "Unexpected result from KVM_GET_VCPU_EVENTS, r: %i",
- r);
+ TEST_ASSERT(r == 0, "Unexpected result from KVM_GET_VCPU_EVENTS, r: %i",
+ r);
r = ioctl(vcpu->fd, KVM_GET_MP_STATE, &state->mp_state);
- TEST_ASSERT(r == 0, "Unexpected result from KVM_GET_MP_STATE, r: %i",
- r);
+ TEST_ASSERT(r == 0, "Unexpected result from KVM_GET_MP_STATE, r: %i",
+ r);
r = ioctl(vcpu->fd, KVM_GET_REGS, &state->regs);
- TEST_ASSERT(r == 0, "Unexpected result from KVM_GET_REGS, r: %i",
- r);
+ TEST_ASSERT(r == 0, "Unexpected result from KVM_GET_REGS, r: %i",
+ r);
r = vcpu_save_xsave_state(vm, vcpu, state);
- TEST_ASSERT(r == 0, "Unexpected result from KVM_GET_XSAVE, r: %i",
- r);
+ TEST_ASSERT(r == 0, "Unexpected result from KVM_GET_XSAVE, r: %i",
+ r);
if (kvm_check_cap(KVM_CAP_XCRS)) {
r = ioctl(vcpu->fd, KVM_GET_XCRS, &state->xcrs);
}
r = ioctl(vcpu->fd, KVM_GET_SREGS, &state->sregs);
- TEST_ASSERT(r == 0, "Unexpected result from KVM_GET_SREGS, r: %i",
- r);
+ TEST_ASSERT(r == 0, "Unexpected result from KVM_GET_SREGS, r: %i",
+ r);
if (nested_size) {
state->nested.size = sizeof(state->nested_);
r = ioctl(vcpu->fd, KVM_GET_NESTED_STATE, &state->nested);
TEST_ASSERT(r == 0, "Unexpected result from KVM_GET_NESTED_STATE, r: %i",
- r);
+ r);
TEST_ASSERT(state->nested.size <= nested_size,
- "Nested state size too big, %i (KVM_CHECK_CAP gave %i)",
- state->nested.size, nested_size);
+ "Nested state size too big, %i (KVM_CHECK_CAP gave %i)",
+ state->nested.size, nested_size);
} else
state->nested.size = 0;
for (i = 0; i < nmsrs; i++)
state->msrs.entries[i].index = list->indices[i];
r = ioctl(vcpu->fd, KVM_GET_MSRS, &state->msrs);
- TEST_ASSERT(r == nmsrs, "Unexpected result from KVM_GET_MSRS, r: %i (failed MSR was 0x%x)",
- r, r == nmsrs ? -1 : list->indices[r]);
+ TEST_ASSERT(r == nmsrs, "Unexpected result from KVM_GET_MSRS, r: %i (failed MSR was 0x%x)",
+ r, r == nmsrs ? -1 : list->indices[r]);
r = ioctl(vcpu->fd, KVM_GET_DEBUGREGS, &state->debugregs);
- TEST_ASSERT(r == 0, "Unexpected result from KVM_GET_DEBUGREGS, r: %i",
- r);
+ TEST_ASSERT(r == 0, "Unexpected result from KVM_GET_DEBUGREGS, r: %i",
+ r);
free(list);
return state;
r = ioctl(vcpu->fd, KVM_SET_SREGS, &state->sregs);
TEST_ASSERT(r == 0, "Unexpected result from KVM_SET_SREGS, r: %i",
- r);
+ r);
r = ioctl(vcpu->fd, KVM_SET_MSRS, &state->msrs);
TEST_ASSERT(r == state->msrs.nmsrs,
r = ioctl(vcpu->fd, KVM_SET_XSAVE, state->xsave);
TEST_ASSERT(r == 0, "Unexpected result from KVM_SET_XSAVE, r: %i",
- r);
+ r);
r = ioctl(vcpu->fd, KVM_SET_VCPU_EVENTS, &state->events);
- TEST_ASSERT(r == 0, "Unexpected result from KVM_SET_VCPU_EVENTS, r: %i",
- r);
+ TEST_ASSERT(r == 0, "Unexpected result from KVM_SET_VCPU_EVENTS, r: %i",
+ r);
r = ioctl(vcpu->fd, KVM_SET_MP_STATE, &state->mp_state);
- TEST_ASSERT(r == 0, "Unexpected result from KVM_SET_MP_STATE, r: %i",
- r);
+ TEST_ASSERT(r == 0, "Unexpected result from KVM_SET_MP_STATE, r: %i",
+ r);
r = ioctl(vcpu->fd, KVM_SET_DEBUGREGS, &state->debugregs);
- TEST_ASSERT(r == 0, "Unexpected result from KVM_SET_DEBUGREGS, r: %i",
- r);
+ TEST_ASSERT(r == 0, "Unexpected result from KVM_SET_DEBUGREGS, r: %i",
+ r);
r = ioctl(vcpu->fd, KVM_SET_REGS, &state->regs);
- TEST_ASSERT(r == 0, "Unexpected result from KVM_SET_REGS, r: %i",
- r);
+ TEST_ASSERT(r == 0, "Unexpected result from KVM_SET_REGS, r: %i",
+ r);
if (state->nested.size) {
r = ioctl(vcpu->fd, KVM_SET_NESTED_STATE, &state->nested);
TEST_ASSERT(r == 0, "Unexpected result from KVM_SET_NESTED_STATE, r: %i",
- r);
+ r);
}
}
free(state);
}
-bool is_intel_cpu(void)
+static bool cpu_vendor_string_is(const char *vendor)
{
+ const uint32_t *chunk = (const uint32_t *)vendor;
int eax, ebx, ecx, edx;
- const uint32_t *chunk;
const int leaf = 0;
__asm__ __volatile__(
"=c"(ecx), "=d"(edx)
: /* input */ "0"(leaf), "2"(0));
- chunk = (const uint32_t *)("GenuineIntel");
return (ebx == chunk[0] && edx == chunk[1] && ecx == chunk[2]);
}
+bool is_intel_cpu(void)
+{
+ return cpu_vendor_string_is("GenuineIntel");
+}
+
+/*
+ * Exclude early K5 samples with a vendor string of "AMDisbetter!"
+ */
+bool is_amd_cpu(void)
+{
+ return cpu_vendor_string_is("AuthenticAMD");
+}
+
uint32_t kvm_get_cpuid_max_basic(void)
{
return kvm_get_supported_cpuid_entry(0)->eax;
}
}
+struct kvm_cpuid_entry2 *get_cpuid(struct kvm_cpuid2 *cpuid, uint32_t function,
+ uint32_t index)
+{
+ int i;
+
+ for (i = 0; i < cpuid->nent; i++) {
+ struct kvm_cpuid_entry2 *cur = &cpuid->entries[i];
+
+ if (cur->function == function && cur->index == index)
+ return cur;
+ }
+
+ TEST_FAIL("CPUID function 0x%x index 0x%x not found ", function, index);
+
+ return NULL;
+}
+
bool set_cpuid(struct kvm_cpuid2 *cpuid,
struct kvm_cpuid_entry2 *ent)
{
return cpuid;
}
-#define X86EMUL_CPUID_VENDOR_AuthenticAMD_ebx 0x68747541
-#define X86EMUL_CPUID_VENDOR_AuthenticAMD_ecx 0x444d4163
-#define X86EMUL_CPUID_VENDOR_AuthenticAMD_edx 0x69746e65
-
-static inline unsigned x86_family(unsigned int eax)
-{
- unsigned int x86;
-
- x86 = (eax >> 8) & 0xf;
-
- if (x86 == 0xf)
- x86 += (eax >> 20) & 0xff;
-
- return x86;
-}
-
unsigned long vm_compute_max_gfn(struct kvm_vm *vm)
{
const unsigned long num_ht_pages = 12 << (30 - vm->page_shift); /* 12 GiB */
max_gfn = (1ULL << (vm->pa_bits - vm->page_shift)) - 1;
/* Avoid reserved HyperTransport region on AMD processors. */
- eax = ecx = 0;
- cpuid(&eax, &ebx, &ecx, &edx);
- if (ebx != X86EMUL_CPUID_VENDOR_AuthenticAMD_ebx ||
- ecx != X86EMUL_CPUID_VENDOR_AuthenticAMD_ecx ||
- edx != X86EMUL_CPUID_VENDOR_AuthenticAMD_edx)
+ if (!is_amd_cpu())
return max_gfn;
/* On parts with <40 physical address bits, the area is fully hidden */
/* Before family 17h, the HyperTransport area is just below 1T. */
ht_gfn = (1 << 28) - num_ht_pages;
eax = 1;
+ ecx = 0;
cpuid(&eax, &ebx, &ecx, &edx);
if (x86_family(eax) < 0x17)
goto done;
--- /dev/null
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2021, Red Hat Inc.
+ *
+ * Generic tests for KVM CPUID set/get ioctls
+ */
+#include <asm/kvm_para.h>
+#include <linux/kvm_para.h>
+#include <stdint.h>
+
+#include "test_util.h"
+#include "kvm_util.h"
+#include "processor.h"
+
+#define VCPU_ID 0
+
+/* CPUIDs known to differ */
+struct {
+ u32 function;
+ u32 index;
+} mangled_cpuids[] = {
+ /*
+ * These entries depend on the vCPU's XCR0 register and IA32_XSS MSR,
+ * which are not controlled for by this test.
+ */
+ {.function = 0xd, .index = 0},
+ {.function = 0xd, .index = 1},
+};
+
+static void test_guest_cpuids(struct kvm_cpuid2 *guest_cpuid)
+{
+ int i;
+ u32 eax, ebx, ecx, edx;
+
+ for (i = 0; i < guest_cpuid->nent; i++) {
+ eax = guest_cpuid->entries[i].function;
+ ecx = guest_cpuid->entries[i].index;
+
+ cpuid(&eax, &ebx, &ecx, &edx);
+
+ GUEST_ASSERT(eax == guest_cpuid->entries[i].eax &&
+ ebx == guest_cpuid->entries[i].ebx &&
+ ecx == guest_cpuid->entries[i].ecx &&
+ edx == guest_cpuid->entries[i].edx);
+ }
+
+}
+
+static void test_cpuid_40000000(struct kvm_cpuid2 *guest_cpuid)
+{
+ u32 eax = 0x40000000, ebx, ecx = 0, edx;
+
+ cpuid(&eax, &ebx, &ecx, &edx);
+
+ GUEST_ASSERT(eax == 0x40000001);
+}
+
+static void guest_main(struct kvm_cpuid2 *guest_cpuid)
+{
+ GUEST_SYNC(1);
+
+ test_guest_cpuids(guest_cpuid);
+
+ GUEST_SYNC(2);
+
+ test_cpuid_40000000(guest_cpuid);
+
+ GUEST_DONE();
+}
+
+static bool is_cpuid_mangled(struct kvm_cpuid_entry2 *entrie)
+{
+ int i;
+
+ for (i = 0; i < sizeof(mangled_cpuids); i++) {
+ if (mangled_cpuids[i].function == entrie->function &&
+ mangled_cpuids[i].index == entrie->index)
+ return true;
+ }
+
+ return false;
+}
+
+static void check_cpuid(struct kvm_cpuid2 *cpuid, struct kvm_cpuid_entry2 *entrie)
+{
+ int i;
+
+ for (i = 0; i < cpuid->nent; i++) {
+ if (cpuid->entries[i].function == entrie->function &&
+ cpuid->entries[i].index == entrie->index) {
+ if (is_cpuid_mangled(entrie))
+ return;
+
+ TEST_ASSERT(cpuid->entries[i].eax == entrie->eax &&
+ cpuid->entries[i].ebx == entrie->ebx &&
+ cpuid->entries[i].ecx == entrie->ecx &&
+ cpuid->entries[i].edx == entrie->edx,
+ "CPUID 0x%x.%x differ: 0x%x:0x%x:0x%x:0x%x vs 0x%x:0x%x:0x%x:0x%x",
+ entrie->function, entrie->index,
+ cpuid->entries[i].eax, cpuid->entries[i].ebx,
+ cpuid->entries[i].ecx, cpuid->entries[i].edx,
+ entrie->eax, entrie->ebx, entrie->ecx, entrie->edx);
+ return;
+ }
+ }
+
+ TEST_ASSERT(false, "CPUID 0x%x.%x not found", entrie->function, entrie->index);
+}
+
+static void compare_cpuids(struct kvm_cpuid2 *cpuid1, struct kvm_cpuid2 *cpuid2)
+{
+ int i;
+
+ for (i = 0; i < cpuid1->nent; i++)
+ check_cpuid(cpuid2, &cpuid1->entries[i]);
+
+ for (i = 0; i < cpuid2->nent; i++)
+ check_cpuid(cpuid1, &cpuid2->entries[i]);
+}
+
+static void run_vcpu(struct kvm_vm *vm, uint32_t vcpuid, int stage)
+{
+ struct ucall uc;
+
+ _vcpu_run(vm, vcpuid);
+
+ switch (get_ucall(vm, vcpuid, &uc)) {
+ case UCALL_SYNC:
+ TEST_ASSERT(!strcmp((const char *)uc.args[0], "hello") &&
+ uc.args[1] == stage + 1,
+ "Stage %d: Unexpected register values vmexit, got %lx",
+ stage + 1, (ulong)uc.args[1]);
+ return;
+ case UCALL_DONE:
+ return;
+ case UCALL_ABORT:
+ TEST_ASSERT(false, "%s at %s:%ld\n\tvalues: %#lx, %#lx", (const char *)uc.args[0],
+ __FILE__, uc.args[1], uc.args[2], uc.args[3]);
+ default:
+ TEST_ASSERT(false, "Unexpected exit: %s",
+ exit_reason_str(vcpu_state(vm, vcpuid)->exit_reason));
+ }
+}
+
+struct kvm_cpuid2 *vcpu_alloc_cpuid(struct kvm_vm *vm, vm_vaddr_t *p_gva, struct kvm_cpuid2 *cpuid)
+{
+ int size = sizeof(*cpuid) + cpuid->nent * sizeof(cpuid->entries[0]);
+ vm_vaddr_t gva = vm_vaddr_alloc(vm, size, KVM_UTIL_MIN_VADDR);
+ struct kvm_cpuid2 *guest_cpuids = addr_gva2hva(vm, gva);
+
+ memcpy(guest_cpuids, cpuid, size);
+
+ *p_gva = gva;
+ return guest_cpuids;
+}
+
+static void set_cpuid_after_run(struct kvm_vm *vm, struct kvm_cpuid2 *cpuid)
+{
+ struct kvm_cpuid_entry2 *ent;
+ int rc;
+ u32 eax, ebx, x;
+
+ /* Setting unmodified CPUID is allowed */
+ rc = __vcpu_set_cpuid(vm, VCPU_ID, cpuid);
+ TEST_ASSERT(!rc, "Setting unmodified CPUID after KVM_RUN failed: %d", rc);
+
+ /* Changing CPU features is forbidden */
+ ent = get_cpuid(cpuid, 0x7, 0);
+ ebx = ent->ebx;
+ ent->ebx--;
+ rc = __vcpu_set_cpuid(vm, VCPU_ID, cpuid);
+ TEST_ASSERT(rc, "Changing CPU features should fail");
+ ent->ebx = ebx;
+
+ /* Changing MAXPHYADDR is forbidden */
+ ent = get_cpuid(cpuid, 0x80000008, 0);
+ eax = ent->eax;
+ x = eax & 0xff;
+ ent->eax = (eax & ~0xffu) | (x - 1);
+ rc = __vcpu_set_cpuid(vm, VCPU_ID, cpuid);
+ TEST_ASSERT(rc, "Changing MAXPHYADDR should fail");
+ ent->eax = eax;
+}
+
+int main(void)
+{
+ struct kvm_cpuid2 *supp_cpuid, *cpuid2;
+ vm_vaddr_t cpuid_gva;
+ struct kvm_vm *vm;
+ int stage;
+
+ vm = vm_create_default(VCPU_ID, 0, guest_main);
+
+ supp_cpuid = kvm_get_supported_cpuid();
+ cpuid2 = vcpu_get_cpuid(vm, VCPU_ID);
+
+ compare_cpuids(supp_cpuid, cpuid2);
+
+ vcpu_alloc_cpuid(vm, &cpuid_gva, cpuid2);
+
+ vcpu_args_set(vm, VCPU_ID, 1, cpuid_gva);
+
+ for (stage = 0; stage < 3; stage++)
+ run_vcpu(vm, VCPU_ID, stage);
+
+ set_cpuid_after_run(vm, cpuid2);
+
+ kvm_vm_free(vm);
+}
+++ /dev/null
-// SPDX-License-Identifier: GPL-2.0-only
-/*
- * Copyright (C) 2021, Red Hat Inc.
- *
- * Generic tests for KVM CPUID set/get ioctls
- */
-#include <asm/kvm_para.h>
-#include <linux/kvm_para.h>
-#include <stdint.h>
-
-#include "test_util.h"
-#include "kvm_util.h"
-#include "processor.h"
-
-#define VCPU_ID 0
-
-/* CPUIDs known to differ */
-struct {
- u32 function;
- u32 index;
-} mangled_cpuids[] = {
- /*
- * These entries depend on the vCPU's XCR0 register and IA32_XSS MSR,
- * which are not controlled for by this test.
- */
- {.function = 0xd, .index = 0},
- {.function = 0xd, .index = 1},
-};
-
-static void test_guest_cpuids(struct kvm_cpuid2 *guest_cpuid)
-{
- int i;
- u32 eax, ebx, ecx, edx;
-
- for (i = 0; i < guest_cpuid->nent; i++) {
- eax = guest_cpuid->entries[i].function;
- ecx = guest_cpuid->entries[i].index;
-
- cpuid(&eax, &ebx, &ecx, &edx);
-
- GUEST_ASSERT(eax == guest_cpuid->entries[i].eax &&
- ebx == guest_cpuid->entries[i].ebx &&
- ecx == guest_cpuid->entries[i].ecx &&
- edx == guest_cpuid->entries[i].edx);
- }
-
-}
-
-static void test_cpuid_40000000(struct kvm_cpuid2 *guest_cpuid)
-{
- u32 eax = 0x40000000, ebx, ecx = 0, edx;
-
- cpuid(&eax, &ebx, &ecx, &edx);
-
- GUEST_ASSERT(eax == 0x40000001);
-}
-
-static void guest_main(struct kvm_cpuid2 *guest_cpuid)
-{
- GUEST_SYNC(1);
-
- test_guest_cpuids(guest_cpuid);
-
- GUEST_SYNC(2);
-
- test_cpuid_40000000(guest_cpuid);
-
- GUEST_DONE();
-}
-
-static bool is_cpuid_mangled(struct kvm_cpuid_entry2 *entrie)
-{
- int i;
-
- for (i = 0; i < sizeof(mangled_cpuids); i++) {
- if (mangled_cpuids[i].function == entrie->function &&
- mangled_cpuids[i].index == entrie->index)
- return true;
- }
-
- return false;
-}
-
-static void check_cpuid(struct kvm_cpuid2 *cpuid, struct kvm_cpuid_entry2 *entrie)
-{
- int i;
-
- for (i = 0; i < cpuid->nent; i++) {
- if (cpuid->entries[i].function == entrie->function &&
- cpuid->entries[i].index == entrie->index) {
- if (is_cpuid_mangled(entrie))
- return;
-
- TEST_ASSERT(cpuid->entries[i].eax == entrie->eax &&
- cpuid->entries[i].ebx == entrie->ebx &&
- cpuid->entries[i].ecx == entrie->ecx &&
- cpuid->entries[i].edx == entrie->edx,
- "CPUID 0x%x.%x differ: 0x%x:0x%x:0x%x:0x%x vs 0x%x:0x%x:0x%x:0x%x",
- entrie->function, entrie->index,
- cpuid->entries[i].eax, cpuid->entries[i].ebx,
- cpuid->entries[i].ecx, cpuid->entries[i].edx,
- entrie->eax, entrie->ebx, entrie->ecx, entrie->edx);
- return;
- }
- }
-
- TEST_ASSERT(false, "CPUID 0x%x.%x not found", entrie->function, entrie->index);
-}
-
-static void compare_cpuids(struct kvm_cpuid2 *cpuid1, struct kvm_cpuid2 *cpuid2)
-{
- int i;
-
- for (i = 0; i < cpuid1->nent; i++)
- check_cpuid(cpuid2, &cpuid1->entries[i]);
-
- for (i = 0; i < cpuid2->nent; i++)
- check_cpuid(cpuid1, &cpuid2->entries[i]);
-}
-
-static void run_vcpu(struct kvm_vm *vm, uint32_t vcpuid, int stage)
-{
- struct ucall uc;
-
- _vcpu_run(vm, vcpuid);
-
- switch (get_ucall(vm, vcpuid, &uc)) {
- case UCALL_SYNC:
- TEST_ASSERT(!strcmp((const char *)uc.args[0], "hello") &&
- uc.args[1] == stage + 1,
- "Stage %d: Unexpected register values vmexit, got %lx",
- stage + 1, (ulong)uc.args[1]);
- return;
- case UCALL_DONE:
- return;
- case UCALL_ABORT:
- TEST_ASSERT(false, "%s at %s:%ld\n\tvalues: %#lx, %#lx", (const char *)uc.args[0],
- __FILE__, uc.args[1], uc.args[2], uc.args[3]);
- default:
- TEST_ASSERT(false, "Unexpected exit: %s",
- exit_reason_str(vcpu_state(vm, vcpuid)->exit_reason));
- }
-}
-
-struct kvm_cpuid2 *vcpu_alloc_cpuid(struct kvm_vm *vm, vm_vaddr_t *p_gva, struct kvm_cpuid2 *cpuid)
-{
- int size = sizeof(*cpuid) + cpuid->nent * sizeof(cpuid->entries[0]);
- vm_vaddr_t gva = vm_vaddr_alloc(vm, size, KVM_UTIL_MIN_VADDR);
- struct kvm_cpuid2 *guest_cpuids = addr_gva2hva(vm, gva);
-
- memcpy(guest_cpuids, cpuid, size);
-
- *p_gva = gva;
- return guest_cpuids;
-}
-
-int main(void)
-{
- struct kvm_cpuid2 *supp_cpuid, *cpuid2;
- vm_vaddr_t cpuid_gva;
- struct kvm_vm *vm;
- int stage;
-
- vm = vm_create_default(VCPU_ID, 0, guest_main);
-
- supp_cpuid = kvm_get_supported_cpuid();
- cpuid2 = vcpu_get_cpuid(vm, VCPU_ID);
-
- compare_cpuids(supp_cpuid, cpuid2);
-
- vcpu_alloc_cpuid(vm, &cpuid_gva, cpuid2);
-
- vcpu_args_set(vm, VCPU_ID, 1, cpuid_gva);
-
- for (stage = 0; stage < 3; stage++)
- run_vcpu(vm, VCPU_ID, stage);
-
- kvm_vm_free(vm);
-}
--- /dev/null
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Test for x86 KVM_SET_PMU_EVENT_FILTER.
+ *
+ * Copyright (C) 2022, Google LLC.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ *
+ * Verifies the expected behavior of allow lists and deny lists for
+ * virtual PMU events.
+ */
+
+#define _GNU_SOURCE /* for program_invocation_short_name */
+#include "test_util.h"
+#include "kvm_util.h"
+#include "processor.h"
+
+/*
+ * In lieu of copying perf_event.h into tools...
+ */
+#define ARCH_PERFMON_EVENTSEL_OS (1ULL << 17)
+#define ARCH_PERFMON_EVENTSEL_ENABLE (1ULL << 22)
+
+union cpuid10_eax {
+ struct {
+ unsigned int version_id:8;
+ unsigned int num_counters:8;
+ unsigned int bit_width:8;
+ unsigned int mask_length:8;
+ } split;
+ unsigned int full;
+};
+
+union cpuid10_ebx {
+ struct {
+ unsigned int no_unhalted_core_cycles:1;
+ unsigned int no_instructions_retired:1;
+ unsigned int no_unhalted_reference_cycles:1;
+ unsigned int no_llc_reference:1;
+ unsigned int no_llc_misses:1;
+ unsigned int no_branch_instruction_retired:1;
+ unsigned int no_branch_misses_retired:1;
+ } split;
+ unsigned int full;
+};
+
+/* End of stuff taken from perf_event.h. */
+
+/* Oddly, this isn't in perf_event.h. */
+#define ARCH_PERFMON_BRANCHES_RETIRED 5
+
+#define VCPU_ID 0
+#define NUM_BRANCHES 42
+
+/*
+ * This is how the event selector and unit mask are stored in an AMD
+ * core performance event-select register. Intel's format is similar,
+ * but the event selector is only 8 bits.
+ */
+#define EVENT(select, umask) ((select & 0xf00UL) << 24 | (select & 0xff) | \
+ (umask & 0xff) << 8)
+
+/*
+ * "Branch instructions retired", from the Intel SDM, volume 3,
+ * "Pre-defined Architectural Performance Events."
+ */
+
+#define INTEL_BR_RETIRED EVENT(0xc4, 0)
+
+/*
+ * "Retired branch instructions", from Processor Programming Reference
+ * (PPR) for AMD Family 17h Model 01h, Revision B1 Processors,
+ * Preliminary Processor Programming Reference (PPR) for AMD Family
+ * 17h Model 31h, Revision B0 Processors, and Preliminary Processor
+ * Programming Reference (PPR) for AMD Family 19h Model 01h, Revision
+ * B1 Processors Volume 1 of 2.
+ */
+
+#define AMD_ZEN_BR_RETIRED EVENT(0xc2, 0)
+
+/*
+ * This event list comprises Intel's eight architectural events plus
+ * AMD's "retired branch instructions" for Zen[123] (and possibly
+ * other AMD CPUs).
+ */
+static const uint64_t event_list[] = {
+ EVENT(0x3c, 0),
+ EVENT(0xc0, 0),
+ EVENT(0x3c, 1),
+ EVENT(0x2e, 0x4f),
+ EVENT(0x2e, 0x41),
+ EVENT(0xc4, 0),
+ EVENT(0xc5, 0),
+ EVENT(0xa4, 1),
+ AMD_ZEN_BR_RETIRED,
+};
+
+/*
+ * If we encounter a #GP during the guest PMU sanity check, then the guest
+ * PMU is not functional. Inform the hypervisor via GUEST_SYNC(0).
+ */
+static void guest_gp_handler(struct ex_regs *regs)
+{
+ GUEST_SYNC(0);
+}
+
+/*
+ * Check that we can write a new value to the given MSR and read it back.
+ * The caller should provide a non-empty set of bits that are safe to flip.
+ *
+ * Return on success. GUEST_SYNC(0) on error.
+ */
+static void check_msr(uint32_t msr, uint64_t bits_to_flip)
+{
+ uint64_t v = rdmsr(msr) ^ bits_to_flip;
+
+ wrmsr(msr, v);
+ if (rdmsr(msr) != v)
+ GUEST_SYNC(0);
+
+ v ^= bits_to_flip;
+ wrmsr(msr, v);
+ if (rdmsr(msr) != v)
+ GUEST_SYNC(0);
+}
+
+static void intel_guest_code(void)
+{
+ check_msr(MSR_CORE_PERF_GLOBAL_CTRL, 1);
+ check_msr(MSR_P6_EVNTSEL0, 0xffff);
+ check_msr(MSR_IA32_PMC0, 0xffff);
+ GUEST_SYNC(1);
+
+ for (;;) {
+ uint64_t br0, br1;
+
+ wrmsr(MSR_CORE_PERF_GLOBAL_CTRL, 0);
+ wrmsr(MSR_P6_EVNTSEL0, ARCH_PERFMON_EVENTSEL_ENABLE |
+ ARCH_PERFMON_EVENTSEL_OS | INTEL_BR_RETIRED);
+ wrmsr(MSR_CORE_PERF_GLOBAL_CTRL, 1);
+ br0 = rdmsr(MSR_IA32_PMC0);
+ __asm__ __volatile__("loop ." : "+c"((int){NUM_BRANCHES}));
+ br1 = rdmsr(MSR_IA32_PMC0);
+ GUEST_SYNC(br1 - br0);
+ }
+}
+
+/*
+ * To avoid needing a check for CPUID.80000001:ECX.PerfCtrExtCore[bit 23],
+ * this code uses the always-available, legacy K7 PMU MSRs, which alias to
+ * the first four of the six extended core PMU MSRs.
+ */
+static void amd_guest_code(void)
+{
+ check_msr(MSR_K7_EVNTSEL0, 0xffff);
+ check_msr(MSR_K7_PERFCTR0, 0xffff);
+ GUEST_SYNC(1);
+
+ for (;;) {
+ uint64_t br0, br1;
+
+ wrmsr(MSR_K7_EVNTSEL0, 0);
+ wrmsr(MSR_K7_EVNTSEL0, ARCH_PERFMON_EVENTSEL_ENABLE |
+ ARCH_PERFMON_EVENTSEL_OS | AMD_ZEN_BR_RETIRED);
+ br0 = rdmsr(MSR_K7_PERFCTR0);
+ __asm__ __volatile__("loop ." : "+c"((int){NUM_BRANCHES}));
+ br1 = rdmsr(MSR_K7_PERFCTR0);
+ GUEST_SYNC(br1 - br0);
+ }
+}
+
+/*
+ * Run the VM to the next GUEST_SYNC(value), and return the value passed
+ * to the sync. Any other exit from the guest is fatal.
+ */
+static uint64_t run_vm_to_sync(struct kvm_vm *vm)
+{
+ struct kvm_run *run = vcpu_state(vm, VCPU_ID);
+ struct ucall uc;
+
+ vcpu_run(vm, VCPU_ID);
+ TEST_ASSERT(run->exit_reason == KVM_EXIT_IO,
+ "Exit_reason other than KVM_EXIT_IO: %u (%s)\n",
+ run->exit_reason,
+ exit_reason_str(run->exit_reason));
+ get_ucall(vm, VCPU_ID, &uc);
+ TEST_ASSERT(uc.cmd == UCALL_SYNC,
+ "Received ucall other than UCALL_SYNC: %lu", uc.cmd);
+ return uc.args[1];
+}
+
+/*
+ * In a nested environment or if the vPMU is disabled, the guest PMU
+ * might not work as architected (accessing the PMU MSRs may raise
+ * #GP, or writes could simply be discarded). In those situations,
+ * there is no point in running these tests. The guest code will perform
+ * a sanity check and then GUEST_SYNC(success). In the case of failure,
+ * the behavior of the guest on resumption is undefined.
+ */
+static bool sanity_check_pmu(struct kvm_vm *vm)
+{
+ bool success;
+
+ vm_install_exception_handler(vm, GP_VECTOR, guest_gp_handler);
+ success = run_vm_to_sync(vm);
+ vm_install_exception_handler(vm, GP_VECTOR, NULL);
+
+ return success;
+}
+
+static struct kvm_pmu_event_filter *make_pmu_event_filter(uint32_t nevents)
+{
+ struct kvm_pmu_event_filter *f;
+ int size = sizeof(*f) + nevents * sizeof(f->events[0]);
+
+ f = malloc(size);
+ TEST_ASSERT(f, "Out of memory");
+ memset(f, 0, size);
+ f->nevents = nevents;
+ return f;
+}
+
+static struct kvm_pmu_event_filter *event_filter(uint32_t action)
+{
+ struct kvm_pmu_event_filter *f;
+ int i;
+
+ f = make_pmu_event_filter(ARRAY_SIZE(event_list));
+ f->action = action;
+ for (i = 0; i < ARRAY_SIZE(event_list); i++)
+ f->events[i] = event_list[i];
+
+ return f;
+}
+
+/*
+ * Remove the first occurrence of 'event' (if any) from the filter's
+ * event list.
+ */
+static struct kvm_pmu_event_filter *remove_event(struct kvm_pmu_event_filter *f,
+ uint64_t event)
+{
+ bool found = false;
+ int i;
+
+ for (i = 0; i < f->nevents; i++) {
+ if (found)
+ f->events[i - 1] = f->events[i];
+ else
+ found = f->events[i] == event;
+ }
+ if (found)
+ f->nevents--;
+ return f;
+}
+
+static void test_without_filter(struct kvm_vm *vm)
+{
+ uint64_t count = run_vm_to_sync(vm);
+
+ if (count != NUM_BRANCHES)
+ pr_info("%s: Branch instructions retired = %lu (expected %u)\n",
+ __func__, count, NUM_BRANCHES);
+ TEST_ASSERT(count, "Allowed PMU event is not counting");
+}
+
+static uint64_t test_with_filter(struct kvm_vm *vm,
+ struct kvm_pmu_event_filter *f)
+{
+ vm_ioctl(vm, KVM_SET_PMU_EVENT_FILTER, (void *)f);
+ return run_vm_to_sync(vm);
+}
+
+static void test_member_deny_list(struct kvm_vm *vm)
+{
+ struct kvm_pmu_event_filter *f = event_filter(KVM_PMU_EVENT_DENY);
+ uint64_t count = test_with_filter(vm, f);
+
+ free(f);
+ if (count)
+ pr_info("%s: Branch instructions retired = %lu (expected 0)\n",
+ __func__, count);
+ TEST_ASSERT(!count, "Disallowed PMU Event is counting");
+}
+
+static void test_member_allow_list(struct kvm_vm *vm)
+{
+ struct kvm_pmu_event_filter *f = event_filter(KVM_PMU_EVENT_ALLOW);
+ uint64_t count = test_with_filter(vm, f);
+
+ free(f);
+ if (count != NUM_BRANCHES)
+ pr_info("%s: Branch instructions retired = %lu (expected %u)\n",
+ __func__, count, NUM_BRANCHES);
+ TEST_ASSERT(count, "Allowed PMU event is not counting");
+}
+
+static void test_not_member_deny_list(struct kvm_vm *vm)
+{
+ struct kvm_pmu_event_filter *f = event_filter(KVM_PMU_EVENT_DENY);
+ uint64_t count;
+
+ remove_event(f, INTEL_BR_RETIRED);
+ remove_event(f, AMD_ZEN_BR_RETIRED);
+ count = test_with_filter(vm, f);
+ free(f);
+ if (count != NUM_BRANCHES)
+ pr_info("%s: Branch instructions retired = %lu (expected %u)\n",
+ __func__, count, NUM_BRANCHES);
+ TEST_ASSERT(count, "Allowed PMU event is not counting");
+}
+
+static void test_not_member_allow_list(struct kvm_vm *vm)
+{
+ struct kvm_pmu_event_filter *f = event_filter(KVM_PMU_EVENT_ALLOW);
+ uint64_t count;
+
+ remove_event(f, INTEL_BR_RETIRED);
+ remove_event(f, AMD_ZEN_BR_RETIRED);
+ count = test_with_filter(vm, f);
+ free(f);
+ if (count)
+ pr_info("%s: Branch instructions retired = %lu (expected 0)\n",
+ __func__, count);
+ TEST_ASSERT(!count, "Disallowed PMU Event is counting");
+}
+
+/*
+ * Check for a non-zero PMU version, at least one general-purpose
+ * counter per logical processor, an EBX bit vector of length greater
+ * than 5, and EBX[5] clear.
+ */
+static bool check_intel_pmu_leaf(struct kvm_cpuid_entry2 *entry)
+{
+ union cpuid10_eax eax = { .full = entry->eax };
+ union cpuid10_ebx ebx = { .full = entry->ebx };
+
+ return eax.split.version_id && eax.split.num_counters > 0 &&
+ eax.split.mask_length > ARCH_PERFMON_BRANCHES_RETIRED &&
+ !ebx.split.no_branch_instruction_retired;
+}
+
+/*
+ * Note that CPUID leaf 0xa is Intel-specific. This leaf should be
+ * clear on AMD hardware.
+ */
+static bool use_intel_pmu(void)
+{
+ struct kvm_cpuid_entry2 *entry;
+
+ entry = kvm_get_supported_cpuid_index(0xa, 0);
+ return is_intel_cpu() && entry && check_intel_pmu_leaf(entry);
+}
+
+static bool is_zen1(uint32_t eax)
+{
+ return x86_family(eax) == 0x17 && x86_model(eax) <= 0x0f;
+}
+
+static bool is_zen2(uint32_t eax)
+{
+ return x86_family(eax) == 0x17 &&
+ x86_model(eax) >= 0x30 && x86_model(eax) <= 0x3f;
+}
+
+static bool is_zen3(uint32_t eax)
+{
+ return x86_family(eax) == 0x19 && x86_model(eax) <= 0x0f;
+}
+
+/*
+ * Determining AMD support for a PMU event requires consulting the AMD
+ * PPR for the CPU or reference material derived therefrom. The AMD
+ * test code herein has been verified to work on Zen1, Zen2, and Zen3.
+ *
+ * Feel free to add more AMD CPUs that are documented to support event
+ * select 0xc2 umask 0 as "retired branch instructions."
+ */
+static bool use_amd_pmu(void)
+{
+ struct kvm_cpuid_entry2 *entry;
+
+ entry = kvm_get_supported_cpuid_index(1, 0);
+ return is_amd_cpu() && entry &&
+ (is_zen1(entry->eax) ||
+ is_zen2(entry->eax) ||
+ is_zen3(entry->eax));
+}
+
+int main(int argc, char *argv[])
+{
+ void (*guest_code)(void) = NULL;
+ struct kvm_vm *vm;
+ int r;
+
+ /* Tell stdout not to buffer its content */
+ setbuf(stdout, NULL);
+
+ r = kvm_check_cap(KVM_CAP_PMU_EVENT_FILTER);
+ if (!r) {
+ print_skip("KVM_CAP_PMU_EVENT_FILTER not supported");
+ exit(KSFT_SKIP);
+ }
+
+ if (use_intel_pmu())
+ guest_code = intel_guest_code;
+ else if (use_amd_pmu())
+ guest_code = amd_guest_code;
+
+ if (!guest_code) {
+ print_skip("Don't know how to test this guest PMU");
+ exit(KSFT_SKIP);
+ }
+
+ vm = vm_create_default(VCPU_ID, 0, guest_code);
+
+ vm_init_descriptor_tables(vm);
+ vcpu_init_descriptor_tables(vm, VCPU_ID);
+
+ if (!sanity_check_pmu(vm)) {
+ print_skip("Guest PMU is not functional");
+ exit(KSFT_SKIP);
+ }
+
+ test_without_filter(vm);
+ test_member_deny_list(vm);
+ test_member_allow_list(vm);
+ test_not_member_deny_list(vm);
+ test_not_member_allow_list(vm);
+
+ kvm_vm_free(vm);
+
+ return 0;
+}
switch (get_ucall(vm, vcpuid, &uc)) {
case UCALL_SYNC:
TEST_ASSERT(!strcmp((const char *)uc.args[0], "hello") &&
- uc.args[1] == stage + 1, "Stage %d: Unexpected register values vmexit, got %lx",
- stage + 1, (ulong)uc.args[1]);
+ uc.args[1] == stage + 1, "Stage %d: Unexpected register values vmexit, got %lx",
+ stage + 1, (ulong)uc.args[1]);
return;
case UCALL_DONE:
return;
static void l2_guest_code(void)
{
/* Exit to L0 */
- asm volatile("inb %%dx, %%al"
- : : [port] "d" (PORT_L0_EXIT) : "rax");
+ asm volatile("inb %%dx, %%al"
+ : : [port] "d" (PORT_L0_EXIT) : "rax");
}
static void l1_guest_code(struct vmx_pages *vmx_pages)
--- /dev/null
+// SPDX-License-Identifier: GPL-2.0-only
+#include "test_util.h"
+#include "kvm_util.h"
+#include "processor.h"
+
+#include <signal.h>
+#include <string.h>
+#include <sys/ioctl.h>
+#include <sys/time.h>
+
+#include "kselftest.h"
+
+#define VCPU_ID 0
+
+static struct kvm_vm *vm;
+
+static void guest_ud_handler(struct ex_regs *regs)
+{
+ /* Loop on the ud2 until guest state is made invalid. */
+}
+
+static void guest_code(void)
+{
+ asm volatile("ud2");
+}
+
+static void __run_vcpu_with_invalid_state(void)
+{
+ struct kvm_run *run = vcpu_state(vm, VCPU_ID);
+
+ vcpu_run(vm, VCPU_ID);
+
+ TEST_ASSERT(run->exit_reason == KVM_EXIT_INTERNAL_ERROR,
+ "Expected KVM_EXIT_INTERNAL_ERROR, got %d (%s)\n",
+ run->exit_reason, exit_reason_str(run->exit_reason));
+ TEST_ASSERT(run->emulation_failure.suberror == KVM_INTERNAL_ERROR_EMULATION,
+ "Expected emulation failure, got %d\n",
+ run->emulation_failure.suberror);
+}
+
+static void run_vcpu_with_invalid_state(void)
+{
+ /*
+ * Always run twice to verify KVM handles the case where _KVM_ queues
+ * an exception with invalid state and then exits to userspace, i.e.
+ * that KVM doesn't explode if userspace ignores the initial error.
+ */
+ __run_vcpu_with_invalid_state();
+ __run_vcpu_with_invalid_state();
+}
+
+static void set_timer(void)
+{
+ struct itimerval timer;
+
+ timer.it_value.tv_sec = 0;
+ timer.it_value.tv_usec = 200;
+ timer.it_interval = timer.it_value;
+ ASSERT_EQ(setitimer(ITIMER_REAL, &timer, NULL), 0);
+}
+
+static void set_or_clear_invalid_guest_state(bool set)
+{
+ static struct kvm_sregs sregs;
+
+ if (!sregs.cr0)
+ vcpu_sregs_get(vm, VCPU_ID, &sregs);
+ sregs.tr.unusable = !!set;
+ vcpu_sregs_set(vm, VCPU_ID, &sregs);
+}
+
+static void set_invalid_guest_state(void)
+{
+ set_or_clear_invalid_guest_state(true);
+}
+
+static void clear_invalid_guest_state(void)
+{
+ set_or_clear_invalid_guest_state(false);
+}
+
+static void sigalrm_handler(int sig)
+{
+ struct kvm_vcpu_events events;
+
+ TEST_ASSERT(sig == SIGALRM, "Unexpected signal = %d", sig);
+
+ vcpu_events_get(vm, VCPU_ID, &events);
+
+ /*
+ * If an exception is pending, attempt KVM_RUN with invalid guest,
+ * otherwise rearm the timer and keep doing so until the timer fires
+ * between KVM queueing an exception and re-entering the guest.
+ */
+ if (events.exception.pending) {
+ set_invalid_guest_state();
+ run_vcpu_with_invalid_state();
+ } else {
+ set_timer();
+ }
+}
+
+int main(int argc, char *argv[])
+{
+ if (!is_intel_cpu() || vm_is_unrestricted_guest(NULL)) {
+ print_skip("Must be run with kvm_intel.unrestricted_guest=0");
+ exit(KSFT_SKIP);
+ }
+
+ vm = vm_create_default(VCPU_ID, 0, (void *)guest_code);
+
+ vm_init_descriptor_tables(vm);
+ vcpu_init_descriptor_tables(vm, VCPU_ID);
+
+ vm_install_exception_handler(vm, UD_VECTOR, guest_ud_handler);
+
+ /*
+ * Stuff invalid guest state for L2 by making TR unusuable. The next
+ * KVM_RUN should induce a TRIPLE_FAULT in L2 as KVM doesn't support
+ * emulating invalid guest state for L2.
+ */
+ set_invalid_guest_state();
+ run_vcpu_with_invalid_state();
+
+ /*
+ * Verify KVM also handles the case where userspace gains control while
+ * an exception is pending and stuffs invalid state. Run with valid
+ * guest state and a timer firing every 200us, and attempt to enter the
+ * guest with invalid state when the handler interrupts KVM with an
+ * exception pending.
+ */
+ clear_invalid_guest_state();
+ TEST_ASSERT(signal(SIGALRM, sigalrm_handler) != SIG_ERR,
+ "Failed to register SIGALRM handler, errno = %d (%s)",
+ errno, strerror(errno));
+
+ set_timer();
+ run_vcpu_with_invalid_state();
+}
#define MIN_STEAL_TIME 50000
struct pvclock_vcpu_time_info {
- u32 version;
- u32 pad0;
- u64 tsc_timestamp;
- u64 system_time;
- u32 tsc_to_system_mul;
- s8 tsc_shift;
- u8 flags;
- u8 pad[2];
+ u32 version;
+ u32 pad0;
+ u64 tsc_timestamp;
+ u64 system_time;
+ u32 tsc_to_system_mul;
+ s8 tsc_shift;
+ u8 flags;
+ u8 pad[2];
} __attribute__((__packed__)); /* 32 bytes */
struct pvclock_wall_clock {
- u32 version;
- u32 sec;
- u32 nsec;
+ u32 version;
+ u32 sec;
+ u32 nsec;
} __attribute__((__packed__));
struct vcpu_runstate_info {
};
struct vcpu_info {
- uint8_t evtchn_upcall_pending;
- uint8_t evtchn_upcall_mask;
- unsigned long evtchn_pending_sel;
- struct arch_vcpu_info arch;
- struct pvclock_vcpu_time_info time;
+ uint8_t evtchn_upcall_pending;
+ uint8_t evtchn_upcall_mask;
+ unsigned long evtchn_pending_sel;
+ struct arch_vcpu_info arch;
+ struct pvclock_vcpu_time_info time;
}; /* 64 bytes (x86) */
struct shared_info {
vm_ts.tv_sec = wc->sec;
vm_ts.tv_nsec = wc->nsec;
- TEST_ASSERT(wc->version && !(wc->version & 1),
+ TEST_ASSERT(wc->version && !(wc->version & 1),
"Bad wallclock version %x", wc->version);
TEST_ASSERT(cmp_timespec(&min_ts, &vm_ts) <= 0, "VM time too old");
TEST_ASSERT(cmp_timespec(&max_ts, &vm_ts) >= 0, "VM time too new");
-p Pause on fail
-P Pause after each test
-v Be verbose
+
+Tests:
+ $TESTS_IPV4 $TESTS_IPV6 $TESTS_OTHER
EOF
}
-timeout=300
+timeout=1500
# kbuild file for usr/ - including initramfs image
#
-# cmd_bzip2, cmd_lzma, cmd_lzo, cmd_lz4 from scripts/Makefile.lib appends the
-# size at the end of the compressed file, which unfortunately does not work
-# with unpack_to_rootfs(). Make size_append no-op.
-override size_append := :
-
compress-y := shipped
compress-$(CONFIG_INITRAMFS_COMPRESSION_GZIP) := gzip
compress-$(CONFIG_INITRAMFS_COMPRESSION_BZIP2) := bzip2
$(obj)/initramfs_data.o: $(obj)/initramfs_inc_data
-ramfs-input := $(strip $(shell echo $(CONFIG_INITRAMFS_SOURCE)))
+ramfs-input := $(CONFIG_INITRAMFS_SOURCE)
cpio-data :=
# If CONFIG_INITRAMFS_SOURCE is empty, generate a small initramfs with the
no-header-test += linux/ivtv.h
no-header-test += linux/kexec.h
no-header-test += linux/matroxfb.h
-no-header-test += linux/nfc.h
no-header-test += linux/omap3isp.h
no-header-test += linux/omapfb.h
no-header-test += linux/patchkey.h
cmd_hdrtest = \
$(CC) $(c_flags) -S -o /dev/null -x c /dev/null \
$(if $(filter-out $(no-header-test), $*.h), -include $< -include $<); \
- $(PERL) $(srctree)/scripts/headers_check.pl $(obj) $(SRCARCH) $<; \
+ $(PERL) $(srctree)/$(src)/headers_check.pl $(obj) $(SRCARCH) $<; \
touch $@
$(obj)/%.hdrtest: $(obj)/%.h FORCE
$(call if_changed_dep,hdrtest)
-clean-files += $(filter-out Makefile, $(notdir $(wildcard $(obj)/*)))
+# Since GNU Make 4.3, $(patsubst $(obj)/%/,%,$(wildcard $(obj)/*/)) works.
+# To support older Make versions, use a somewhat tedious way.
+clean-files += $(filter-out Makefile headers_check.pl, $(notdir $(wildcard $(obj)/*)))
--- /dev/null
+#!/usr/bin/env perl
+# SPDX-License-Identifier: GPL-2.0
+#
+# headers_check.pl execute a number of trivial consistency checks
+#
+# Usage: headers_check.pl dir arch [files...]
+# dir: dir to look for included files
+# arch: architecture
+# files: list of files to check
+#
+# The script reads the supplied files line by line and:
+#
+# 1) for each include statement it checks if the
+# included file actually exists.
+# Only include files located in asm* and linux* are checked.
+# The rest are assumed to be system include files.
+#
+# 2) It is checked that prototypes does not use "extern"
+#
+# 3) Check for leaked CONFIG_ symbols
+
+use warnings;
+use strict;
+use File::Basename;
+
+my ($dir, $arch, @files) = @ARGV;
+
+my $ret = 0;
+my $line;
+my $lineno = 0;
+my $filename;
+
+foreach my $file (@files) {
+ $filename = $file;
+
+ open(my $fh, '<', $filename)
+ or die "$filename: $!\n";
+ $lineno = 0;
+ while ($line = <$fh>) {
+ $lineno++;
+ &check_include();
+ &check_asm_types();
+ &check_sizetypes();
+ &check_declarations();
+ # Dropped for now. Too much noise &check_config();
+ }
+ close $fh;
+}
+exit $ret;
+
+sub check_include
+{
+ if ($line =~ m/^\s*#\s*include\s+<((asm|linux).*)>/) {
+ my $inc = $1;
+ my $found;
+ $found = stat($dir . "/" . $inc);
+ if (!$found) {
+ $inc =~ s#asm/#asm-$arch/#;
+ $found = stat($dir . "/" . $inc);
+ }
+ if (!$found) {
+ printf STDERR "$filename:$lineno: included file '$inc' is not exported\n";
+ $ret = 1;
+ }
+ }
+}
+
+sub check_declarations
+{
+ # soundcard.h is what it is
+ if ($line =~ m/^void seqbuf_dump\(void\);/) {
+ return;
+ }
+ # drm headers are being C++ friendly
+ if ($line =~ m/^extern "C"/) {
+ return;
+ }
+ if ($line =~ m/^(\s*extern|unsigned|char|short|int|long|void)\b/) {
+ printf STDERR "$filename:$lineno: " .
+ "userspace cannot reference function or " .
+ "variable defined in the kernel\n";
+ }
+}
+
+sub check_config
+{
+ if ($line =~ m/[^a-zA-Z0-9_]+CONFIG_([a-zA-Z0-9_]+)[^a-zA-Z0-9_]/) {
+ printf STDERR "$filename:$lineno: leaks CONFIG_$1 to userspace where it is not valid\n";
+ }
+}
+
+my $linux_asm_types;
+sub check_asm_types
+{
+ if ($filename =~ /types.h|int-l64.h|int-ll64.h/o) {
+ return;
+ }
+ if ($lineno == 1) {
+ $linux_asm_types = 0;
+ } elsif ($linux_asm_types >= 1) {
+ return;
+ }
+ if ($line =~ m/^\s*#\s*include\s+<asm\/types.h>/) {
+ $linux_asm_types = 1;
+ printf STDERR "$filename:$lineno: " .
+ "include of <linux/types.h> is preferred over <asm/types.h>\n"
+ # Warn until headers are all fixed
+ #$ret = 1;
+ }
+}
+
+my $linux_types;
+my %import_stack = ();
+sub check_include_typesh
+{
+ my $path = $_[0];
+ my $import_path;
+
+ my $fh;
+ my @file_paths = ($path, $dir . "/" . $path, dirname($filename) . "/" . $path);
+ for my $possible ( @file_paths ) {
+ if (not $import_stack{$possible} and open($fh, '<', $possible)) {
+ $import_path = $possible;
+ $import_stack{$import_path} = 1;
+ last;
+ }
+ }
+ if (eof $fh) {
+ return;
+ }
+
+ my $line;
+ while ($line = <$fh>) {
+ if ($line =~ m/^\s*#\s*include\s+<linux\/types.h>/) {
+ $linux_types = 1;
+ last;
+ }
+ if (my $included = ($line =~ /^\s*#\s*include\s+[<"](\S+)[>"]/)[0]) {
+ check_include_typesh($included);
+ }
+ }
+ close $fh;
+ delete $import_stack{$import_path};
+}
+
+sub check_sizetypes
+{
+ if ($filename =~ /types.h|int-l64.h|int-ll64.h/o) {
+ return;
+ }
+ if ($lineno == 1) {
+ $linux_types = 0;
+ } elsif ($linux_types >= 1) {
+ return;
+ }
+ if ($line =~ m/^\s*#\s*include\s+<linux\/types.h>/) {
+ $linux_types = 1;
+ return;
+ }
+ if (my $included = ($line =~ /^\s*#\s*include\s+[<"](\S+)[>"]/)[0]) {
+ check_include_typesh($included);
+ }
+ if ($line =~ m/__[us](8|16|32|64)\b/) {
+ printf STDERR "$filename:$lineno: " .
+ "found __[us]{8,16,32,64} type " .
+ "without #include <linux/types.h>\n";
+ $linux_types = 2;
+ # Warn until headers are all fixed
+ #$ret = 1;
+ }
+}
#endif
kvm_async_pf_vcpu_init(vcpu);
- vcpu->pre_pcpu = -1;
- INIT_LIST_HEAD(&vcpu->blocked_vcpu_list);
-
kvm_vcpu_set_in_spin_loop(vcpu, false);
kvm_vcpu_set_dy_eligible(vcpu, false);
vcpu->preempted = false;
{
struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
+#ifdef CONFIG_HAVE_KVM_DIRTY_RING
if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm))
return;
+#endif
if (memslot && kvm_slot_dirty_track_enabled(memslot)) {
unsigned long rel_gfn = gfn - memslot->base_gfn;