.. SPDX-License-Identifier: GPL-2.0

A ``devlink-port`` is a port that exists on the device. It is a logically
separate ingress/egress point of the device. A devlink port can be any one
of many flavours. A devlink port flavour, along with the port attributes,
describes what a port represents.

A device driver that intends to publish a devlink port sets the
devlink port attributes and registers the devlink port.

Devlink port flavours are described below.


.. list-table:: List of devlink port flavours

   * - ``DEVLINK_PORT_FLAVOUR_PHYSICAL``
     - Any kind of physical port. This can be an eswitch physical port or any
       other physical port on the device.
   * - ``DEVLINK_PORT_FLAVOUR_DSA``
     - This indicates a DSA interconnect port.
   * - ``DEVLINK_PORT_FLAVOUR_CPU``
     - This indicates a CPU port applicable only to DSA.
   * - ``DEVLINK_PORT_FLAVOUR_PCI_PF``
     - This indicates an eswitch port representing a port of a PCI
       physical function (PF).
   * - ``DEVLINK_PORT_FLAVOUR_PCI_VF``
     - This indicates an eswitch port representing a port of a PCI
       virtual function (VF).
   * - ``DEVLINK_PORT_FLAVOUR_PCI_SF``
     - This indicates an eswitch port representing a port of a PCI
       subfunction (SF).
   * - ``DEVLINK_PORT_FLAVOUR_VIRTUAL``
     - This indicates a virtual port for the PCI virtual function.


A devlink port can have a different type based on the link layer, as
described below.

.. list-table:: List of devlink port types

   * - ``DEVLINK_PORT_TYPE_ETH``
     - The driver should set this port type when the link layer of the port
       is Ethernet.
   * - ``DEVLINK_PORT_TYPE_IB``
     - The driver should set this port type when the link layer of the port
       is InfiniBand.
   * - ``DEVLINK_PORT_TYPE_AUTO``
     - This type is indicated by the user when the driver should detect the
       port type automatically.


PCI controllers
===============

In most cases a PCI device has only one controller. A controller consists of
potentially multiple physical functions, virtual functions and subfunctions.
A function consists of one or more ports. Each such port is represented by a
devlink eswitch port.

A PCI device connected to multiple CPUs or multiple PCI root complexes, or a
SmartNIC, however, may have multiple controllers. For a device with multiple
controllers, each controller is distinguished by a unique controller number.
An eswitch on the PCI device supports ports of multiple controllers.

An example view of a system with two controllers::

                 ---------------------------------------------------------
                 |           --------- ---------         ------- ------- |
    -----------  |           | vf(s) | | sf(s) |         |vf(s)| |sf(s)| |
    | server  |  | -------   ----/---- ---/----- ------- ---/--- ---/--- |
    | pci rc  |=== | pf0 |______/________/       | pf1 |___/_______/     |
    | connect |  | -------                       -------                 |
    -----------  |     | controller_num=1 (no eswitch)                   |
                 ------|--------------------------------------------------
                       |
                 ---------------------------------------------------------
                 | devlink eswitch ports and reps                        |
                 | ----------------------------------------------------- |
                 | |ctrl-0 | ctrl-0 | ctrl-0 | ctrl-0 | ctrl-0 |ctrl-0 | |
                 | |pf0    | pf0vfN | pf0sfN | pf1    | pf1vfN |pf1sfN | |
                 | ----------------------------------------------------- |
                 | |ctrl-1 | ctrl-1 | ctrl-1 | ctrl-1 | ctrl-1 |ctrl-1 | |
                 | |pf0    | pf0vfN | pf0sfN | pf1    | pf1vfN |pf1sfN | |
                 | ----------------------------------------------------- |
    -----------  |           --------- ---------         ------- ------- |
    | smartNIC|  |           | vf(s) | | sf(s) |         |vf(s)| |sf(s)| |
    | pci rc  |==| -------   ----/---- ---/----- ------- ---/--- ---/--- |
    | connect |  | | pf0 |______/________/       | pf1 |___/_______/     |
    -----------  | -------                       -------                 |
                 |  local controller_num=0 (eswitch)                     |
                 ---------------------------------------------------------

In the above example, the external controller (identified by controller
number = 1) doesn't have an eswitch. The local controller (identified by
controller number = 0) has the eswitch. The devlink instance on the local
controller has eswitch devlink ports for both controllers.

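
As a rough illustration of why the controller number matters, the following
user-space C sketch builds port labels in the style used by the diagram above.
The names ``enum fn_kind``, ``struct eswitch_port_id`` and
``eswitch_port_label()`` are hypothetical and are not part of the devlink API;
the label format only mirrors the diagram.

.. code-block:: c

    #include <stddef.h>
    #include <stdio.h>

    /* Hypothetical identifiers for illustration; not devlink kernel API. */
    enum fn_kind { FN_PF, FN_VF, FN_SF };

    struct eswitch_port_id {
        unsigned int controller;  /* 0 is the local controller in the example */
        unsigned int pf;          /* PF number within that controller */
        enum fn_kind kind;
        unsigned int index;       /* VF or SF number; ignored for FN_PF */
    };

    /* Write a diagram-style label such as "ctrl-1 pf0vf3" into buf. */
    static int eswitch_port_label(const struct eswitch_port_id *id,
                                  char *buf, size_t len)
    {
        switch (id->kind) {
        case FN_VF:
            return snprintf(buf, len, "ctrl-%u pf%uvf%u",
                            id->controller, id->pf, id->index);
        case FN_SF:
            return snprintf(buf, len, "ctrl-%u pf%usf%u",
                            id->controller, id->pf, id->index);
        default:
            return snprintf(buf, len, "ctrl-%u pf%u",
                            id->controller, id->pf);
        }
    }

Two ports with the same PF and VF numbers then differ only in the controller
part of the label, for example ``ctrl-0 pf0vf3`` and ``ctrl-1 pf0vf3``.
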

Function configuration
======================

A user can configure the function attribute before enumerating the PCI
function. Usually this means that the user should configure the function
attribute before a bus-specific device for the function is created. However,
when SRIOV is enabled, virtual function devices are created on the PCI bus.
Hence, the function attribute should be configured before binding the virtual
function device to the driver. For subfunctions, this means the user should
configure the port function attribute before activating the port function.

A user may set the hardware address of the function using the
``devlink port function set hw_addr`` command. For an Ethernet port function
this means a MAC address.


Subfunction
===========

A subfunction is a lightweight function that has a parent PCI function on
which it is deployed. A subfunction is created and deployed in units of 1.
Unlike SRIOV VFs, a subfunction doesn't require its own PCI virtual function.
A subfunction communicates with the hardware through the parent PCI function.

To use a subfunction, a three-step setup sequence is followed:

(1) create - create a subfunction;
(2) configure - configure subfunction attributes;
(3) deploy - deploy the subfunction;

Subfunction management is done using the devlink port user interface.
The user performs setup on the subfunction management device.


Create
------

A subfunction is created using the devlink port interface. A user adds the
subfunction by adding a devlink port of subfunction flavour. The devlink
kernel code calls down to the subfunction management driver (devlink ops) and
asks it to create a subfunction devlink port. The driver then instantiates the
subfunction port and any associated objects such as health reporters and the
representor netdevice.

Configure
---------

A subfunction devlink port is created but it is not active yet. That means the
entities are created on the devlink side and the e-switch port representor is
created, but the subfunction device itself is not created. A user might use
the e-switch port representor to apply settings, put it into a bridge, add TC
rules, etc. A user might as well configure the hardware address (such as the
MAC address) of the subfunction while the subfunction is inactive.

Deploy
------

Once a subfunction is configured, the user must activate it to use it. Upon
activation, the subfunction management driver asks the subfunction management
device to instantiate the subfunction device on a particular PCI function.
A subfunction device is created on the :ref:`Documentation/driver-api/auxiliary_bus.rst <auxiliary_bus>`.
At this point a matching subfunction driver binds to the subfunction's auxiliary device.

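
The create, configure and deploy sequence described above can be modelled as a
small state machine. The following user-space C sketch is illustrative only:
``enum sf_state``, ``struct sf_port`` and the ``sf_*()`` helpers are
hypothetical names, not devlink kernel or netlink APIs, and the sketch assumes
that the hardware address may be changed only while the port is inactive.

.. code-block:: c

    #include <stdbool.h>
    #include <string.h>

    /* Hypothetical model of the subfunction lifecycle; not kernel API. */
    enum sf_state {
        SF_STATE_ABSENT,    /* no devlink port exists yet */
        SF_STATE_INACTIVE,  /* port and representor exist, no SF device */
        SF_STATE_ACTIVE,    /* subfunction device has been instantiated */
    };

    struct sf_port {
        enum sf_state state;
        unsigned char hw_addr[6];
    };

    /* Step 1, create: the new port starts out inactive. */
    static bool sf_create(struct sf_port *sf)
    {
        if (sf->state != SF_STATE_ABSENT)
            return false;
        memset(sf->hw_addr, 0, sizeof(sf->hw_addr));
        sf->state = SF_STATE_INACTIVE;
        return true;
    }

    /* Step 2, configure: accepted only while the port is inactive
     * (an assumption of this sketch). */
    static bool sf_set_hw_addr(struct sf_port *sf, const unsigned char addr[6])
    {
        if (sf->state != SF_STATE_INACTIVE)
            return false;
        memcpy(sf->hw_addr, addr, sizeof(sf->hw_addr));
        return true;
    }

    /* Step 3, deploy: activation is what brings the SF device to life. */
    static bool sf_activate(struct sf_port *sf)
    {
        if (sf->state != SF_STATE_INACTIVE)
            return false;
        sf->state = SF_STATE_ACTIVE;
        return true;
    }

In this model, calling ``sf_activate()`` before ``sf_create()`` fails, which
reflects the ordering of the three steps.
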

Rate object management
======================

Devlink provides an API to manage the TX rates of a single devlink port or a
group. This is done through rate objects, which can be one of two types:

``leaf``
  Represents a single devlink port; created/destroyed by the driver. Since a
  leaf has a 1-to-1 mapping to its devlink port, in user space it is referred
  to as ``pci/<bus_addr>/<port_index>``;

``node``
  Represents a group of rate objects (leafs and/or nodes); created/deleted on
  request from user space; initially empty (no rate objects added). In
  user space it is referred to as ``pci/<bus_addr>/<node_name>``, where
  ``node_name`` can be any identifier except a decimal number, to avoid
  collisions with leafs.

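
The naming rule for nodes can be expressed as a simple check. This is a
minimal user-space sketch, not the kernel's implementation:
``rate_node_name_is_valid()`` is a hypothetical helper, and rejecting an empty
name is an assumption of the sketch.

.. code-block:: c

    #include <ctype.h>
    #include <stdbool.h>
    #include <stddef.h>

    /*
     * Hypothetical helper: a node name is acceptable when it is not a plain
     * decimal number, because a plain number would be indistinguishable
     * from a leaf's port index.
     */
    static bool rate_node_name_is_valid(const char *name)
    {
        const char *p;

        if (name == NULL || *name == '\0')
            return false;   /* assumption: empty names are rejected */

        for (p = name; *p != '\0'; p++) {
            if (!isdigit((unsigned char)*p))
                return true;    /* at least one non-digit character */
        }
        return false;           /* digits only: collides with leaf naming */
    }

Under this check a name such as ``group1`` is accepted, while ``32768`` is
rejected.
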

The API allows configuring the following rate object parameters:

``tx_share``
  Minimum TX rate value shared among all other rate objects, or among the
  rate objects that are part of the parent group, if the object is part of a
  group.

``tx_max``
  Maximum TX rate value.

``parent``
  Parent node name. Parent node rate limits are considered as additional
  limits to all node children limits. ``tx_max`` is an upper limit for
  children. ``tx_share`` is a total bandwidth distributed among children.

Driver implementations are allowed to support both or either rate object types
and setting methods of their parameters.

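
To illustrate how a parent's ``tx_max`` acts as an additional upper limit for
its children, the following user-space C sketch walks up the parent chain and
returns the tightest limit. ``struct rate_obj`` and
``rate_effective_tx_max()`` are hypothetical names, and treating a value of 0
as "no limit" is an assumption of the sketch; actual enforcement is up to the
driver and the device.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical model of a rate object (leaf or node); not kernel API. */
    struct rate_obj {
        const char *name;
        uint64_t tx_max;                /* 0 means no limit (assumption) */
        const struct rate_obj *parent;  /* NULL when not in a group */
    };

    /*
     * Return the smallest non-zero tx_max found on the object itself and on
     * all of its ancestors, or 0 when no limit is set anywhere in the chain.
     */
    static uint64_t rate_effective_tx_max(const struct rate_obj *obj)
    {
        uint64_t limit = 0;

        for (; obj != NULL; obj = obj->parent) {
            if (obj->tx_max != 0 && (limit == 0 || obj->tx_max < limit))
                limit = obj->tx_max;
        }
        return limit;
    }

With a node limited to 1000 units, a child leaf configured with 400 keeps its
own limit of 400, while a child configured with 5000 is effectively limited
to 1000.
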

Terms and Definitions
=====================

.. list-table:: Terms and Definitions

   * - ``PCI device``
     - A physical PCI device having one or more PCI buses consists of one or
       more PCI controllers.
   * - ``PCI controller``
     - A controller consists of potentially multiple physical functions,
       virtual functions and subfunctions.
   * - ``Port function``
     - An object to manage the function of a port.
   * - ``Subfunction``
     - A lightweight function that has a parent PCI function on which it is
       deployed.
   * - ``Subfunction device``
     - A bus device of the subfunction, usually on an auxiliary bus.
   * - ``Subfunction driver``
     - A device driver for the subfunction auxiliary device.
   * - ``Subfunction management device``
     - A PCI physical function that supports subfunction management.
   * - ``Subfunction management driver``
     - A device driver for a PCI physical function that supports
       subfunction management using the devlink port interface.
   * - ``Subfunction host driver``
     - A device driver for a PCI physical function that hosts subfunction
       devices. In most cases it is the same as the subfunction management
       driver. When a subfunction is used on an external controller, the
       subfunction management and host drivers are different.