Merge tag 's390-5.20-1' of git://git./linux/kernel/git/s390/linux Pull s390 updates from Alexander Gordeev: - Rework copy_oldmem_page() callback to take an iov_iter. This includes a few prerequisite updates and fixes to the oldmem reading code. - Rework cpufeature implementation to allow for various CPU feature indications, which is not only limited to hardware capabilities, but also allows CPU facilities. - Use the cpufeature rework to autoload Ultravisor module when CPU facility 158 is available. - Add ELF note type for encrypted CPU state of a protected virtual CPU. The zgetdump tool from s390-tools package will decrypt the CPU state using a Customer Communication Key and overwrite respective notes to make the data accessible for crash and other debugging tools. - Use vzalloc() instead of vmalloc() + memset() in ChaCha20 crypto test. - Fix incorrect recovery of kretprobe modified return address in stacktrace. - Switch the NMI handler to use generic irqentry_nmi_enter() and irqentry_nmi_exit() helper functions. - Rework the cryptographic Adjunct Processors (AP) pass-through design to support dynamic changes to the AP matrix of a running guest as well as to implement more of the AP architecture. - Minor boot code cleanups. - Grammar and typo fixes to hmcdrv and tape drivers. * tag 's390-5.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (46 commits) Revert "s390/smp: enforce lowcore protection on CPU restart" Revert "s390/smp: rework absolute lowcore access" Revert "s390/smp,ptdump: add absolute lowcore markers" s390/unwind: fix fgraph return address recovery s390/nmi: use irqentry_nmi_enter()/irqentry_nmi_exit() s390: add ELF note type for encrypted CPU state of a PV VCPU s390/smp,ptdump: add absolute lowcore markers s390/smp: rework absolute lowcore access s390/setup: rearrange absolute lowcore initialization s390/boot: cleanup adjust_to_uv_max() function s390/smp: enforce lowcore protection on CPU restart s390/tape: fix comment typo s390/hmcdrv: fix Kconfig "its" grammar s390/docs: fix warnings for vfio_ap driver doc s390/docs: fix warnings for vfio_ap driver lock usage doc s390/crash: support multi-segment iterators s390/crash: use static swap buffer for copy_to_user_real() s390/crash: move copy_to_user_real() to crash_dump.c s390/zcore: fix race when reading from hardware system area s390/crash: fix incorrect number of bytes to copy to user space ...
vfio: Replace phys_pfn with pages for vfio_pin_pages() Most of the callers of vfio_pin_pages() want "struct page *" and the low-level mm code to pin pages returns a list of "struct page *" too. So there's no gain in converting "struct page *" to PFN in between. Replace the output parameter "phys_pfn" list with a "pages" list, to simplify callers. This also allows us to replace the vfio_iommu_type1 implementation with a more efficient one. And drop the pfn_valid check in the gvt code, as there is no need to do such a check at a page-backed struct page pointer. For now, also update vfio_iommu_type1 to fit this new parameter too. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Eric Farman <farman@linux.ibm.com> Tested-by: Terrence Xu <terrence.xu@intel.com> Tested-by: Eric Farman <farman@linux.ibm.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/20220723020256.30081-11-nicolinc@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
vfio/ap: Change saved_pfn to saved_iova The vfio_ap_ops code maintains both nib address and its PFN, which is redundant, merely because vfio_pin/unpin_pages API wanted pfn. Since vfio_pin/unpin_pages() now accept "iova", change "saved_pfn" to "saved_iova" and remove pfn in the vfio_ap_validate_nib(). Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Tested-by: Eric Farman <farman@linux.ibm.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/20220723020256.30081-7-nicolinc@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
vfio: Pass in starting IOVA to vfio_pin/unpin_pages API The vfio_pin/unpin_pages() so far accepted arrays of PFNs of user IOVA. Among all three callers, there was only one caller possibly passing in a non-contiguous PFN list, which is now ensured to have contiguous PFN inputs too. Pass in the starting address with "iova" alone to simplify things, so callers no longer need to maintain a PFN list or to pin/unpin one page at a time. This also allows VFIO to use more efficient implementations of pin/unpin_pages. For now, also update vfio_iommu_type1 to fit this new parameter too, while keeping its input intact (being user_iova) since we don't want to spend too much effort swapping its parameters and local variables at that level. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Acked-by: Eric Farman <farman@linux.ibm.com> Tested-by: Terrence Xu <terrence.xu@intel.com> Tested-by: Eric Farman <farman@linux.ibm.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/20220723020256.30081-6-nicolinc@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
vfio/ap: Pass in physical address of ind to ap_aqic() The ap_aqic() is called by vfio_ap_irq_enable() where it passes in a virt value that's casted from a physical address "h_nib". Inside the ap_aqic(), it does virt_to_phys() again. Since ap_aqic() needs a physical address, let's just pass in a pa of ind directly. So change the "ind" to "pa_ind". Reviewed-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Eric Farman <farman@linux.ibm.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/20220723020256.30081-4-nicolinc@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
vfio: Replace the DMA unmapping notifier with a callback Instead of having drivers register the notifier with explicit code just have them provide a dma_unmap callback op in their driver ops and rely on the core code to wire it up. Suggested-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/1-v4-681e038e30fd+78-vfio_unmap_notif_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
s390/vfio-ap: handle config changed and scan complete notification This patch implements two new AP driver callbacks: void (*on_config_changed)(struct ap_config_info *new_config_info, struct ap_config_info *old_config_info); void (*on_scan_complete)(struct ap_config_info *new_config_info, struct ap_config_info *old_config_info); The on_config_changed callback is invoked at the start of the AP bus scan function when it determines that the host AP configuration information has changed since the previous scan. The vfio_ap device driver registers a callback function for this callback that performs the following operations: 1. Unplugs the adapters, domains and control domains removed from the host's AP configuration from the guests to which they are assigned in a single operation. 2. Stores bitmaps identifying the adapters, domains and control domains added to the host's AP configuration with the structure representing the mediated device. When the vfio_ap device driver's probe callback is subsequently invoked, the probe function will recognize that the queue is being probed due to a change in the host's AP configuration and the plugging of the queue into the guest will be bypassed. The on_scan_complete callback is invoked after the ap bus scan is completed if the host AP configuration data has changed. The vfio_ap device driver registers a callback function for this callback that hot plugs each queue and control domain added to the AP configuration for each guest using them in a single hot plug operation. Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
s390/vfio-ap: sysfs attribute to display the guest's matrix The matrix of adapters and domains configured in a guest's APCB may differ from the matrix of adapters and domains assigned to the matrix mdev, so this patch introduces a sysfs attribute to display the matrix of adapters and domains that are or will be assigned to the APCB of a guest that is or will be using the matrix mdev. For a matrix mdev denoted by $uuid, the guest matrix can be displayed as follows: cat /sys/devices/vfio_ap/matrix/$uuid/guest_matrix Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
s390/vfio-ap: implement in-use callback for vfio_ap driver Let's implement the callback to indicate when an APQN is in use by the vfio_ap device driver. The callback is invoked whenever a change to the apmask or aqmask would result in one or more queue devices being removed from the driver. The vfio_ap device driver will indicate a resource is in use if the APQN of any of the queue devices to be removed are assigned to any of the matrix mdevs under the driver's control. There is potential for a deadlock condition between the matrix_dev->guests_lock used to lock the guest during assignment of adapters and domains and the ap_perms_mutex locked by the AP bus when changes are made to the sysfs apmask/aqmask attributes. The AP Perms lock controls access to the objects that store the adapter numbers (ap_perms) and domain numbers (aq_perms) for the sysfs /sys/bus/ap/apmask and /sys/bus/ap/aqmask attributes. These attributes identify which queues are reserved for the zcrypt default device drivers. Before allowing a bit to be removed from either mask, the AP bus must check with the vfio_ap device driver to verify that none of the queues are assigned to any of its mediated devices. The apmask/aqmask attributes can be written or read at any time from userspace, so care must be taken to prevent a deadlock with asynchronous operations that might be taking place in the vfio_ap device driver. For example, consider the following: 1. A system administrator assigns an adapter to a mediated device under the control of the vfio_ap device driver. The driver will need to first take the matrix_dev->guests_lock to potentially hot plug the adapter into the KVM guest. 2. At the same time, a system administrator sets a bit in the sysfs /sys/bus/ap/ap_mask attribute. To complete the operation, the AP bus must: a. Take the ap_perms_mutex lock to update the object storing the values for the /sys/bus/ap/ap_mask attribute. b. Call the vfio_ap device driver's in-use callback to verify that the queues now being reserved for the default zcrypt drivers are not assigned to a mediated device owned by the vfio_ap device driver. To do the verification, the in-use callback function takes the matrix_dev->guests_lock, but has to wait because it is already held by the operation in 1 above. 3. The vfio_ap device driver calls an AP bus function to verify that the new queues resulting from the assignment of the adapter in step 1 are not reserved for the default zcrypt device driver. This AP bus function tries to take the ap_perms_mutex lock but gets stuck waiting for the waiting for the lock due to step 2a above. Consequently, we have the following deadlock situation: matrix_dev->guests_lock locked (1) ap_perms_mutex lock locked (2a) Waiting for matrix_dev->gusts_lock (2b) which is currently held (1) Waiting for ap_perms_mutex lock (3) which is currently held (2a) To prevent this deadlock scenario, the function called in step 3 will no longer take the ap_perms_mutex lock and require the caller to take the lock. The lock will be the first taken by the adapter/domain assignment functions in the vfio_ap device driver to maintain the proper locking order. Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
s390/vfio-ap: reset queues after adapter/domain unassignment When an adapter or domain is unassigned from an mdev attached to a KVM guest, one or more of the guest's queues may get dynamically removed. Since the removed queues could get re-assigned to another mdev, they need to be reset. So, when an adapter or domain is unassigned from the mdev, the queues that are removed from the guest's AP configuration (APCB) will be reset. Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
s390/vfio-ap: hot plug/unplug of AP devices when probed/removed When an AP queue device is probed or removed, if the mediated device is attached to a KVM guest, the mediated device's adapter, domain and control domain bitmaps must be filtered to update the guest's APCB and if any changes are detected, the guest's APCB must then be hot plugged into the guest to reflect those changes to the guest. Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
s390/vfio-ap: allow hot plug/unplug of AP devices when assigned/unassigned Let's hot plug an adapter, domain or control domain into the guest when it is assigned to a matrix mdev that is attached to a KVM guest. Likewise, let's hot unplug an adapter, domain or control domain from the guest when it is unassigned from a matrix_mdev that is attached to a KVM guest. Whenever an assignment or unassignment of an adapter, domain or control domain is performed, the APQNs and control domains assigned to the matrix mdev will be filtered and assigned to the AP control block (APCB) that supplies the AP configuration to the guest so that no adapter, domain or control domain that is not in the host's AP configuration nor any APQN that does not reference a queue device bound to the vfio_ap device driver is assigned. After updating the APCB, if the mdev is in use by a KVM guest, it is hot plugged into the guest to dynamically provide access to the adapters, domains and control domains provided via the newly refreshed APCB. Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
s390/vfio-ap: prepare for dynamic update of guest's APCB on queue probe/remove The callback functions for probing and removing a queue device must take and release the locks required to perform a dynamic update of a guest's APCB in the proper order. The proper order for taking the locks is: matrix_dev->guests_lock => kvm->lock => matrix_dev->mdevs_lock The proper order for releasing the locks is: matrix_dev->mdevs_lock => kvm->lock => matrix_dev->guests_lock A new helper function is introduced to be used by the probe callback to acquire the required locks. Since the probe callback only has access to a queue device when it is called, the helper function will find the ap_matrix_mdev object to which the queue device's APQN is assigned and return it so the KVM guest to which the mdev is attached can be dynamically updated. Note that in order to find the ap_matrix_mdev (matrix_mdev) object, it is necessary to search the matrix_dev->mdev_list. This presents a locking order dilemma because the matrix_dev->mdevs_lock can't be taken to protect against changes to the list while searching for the matrix_mdev to which a queue device's APQN is assigned. This is due to the fact that the proper locking order requires that the matrix_dev->mdevs_lock be taken after both the matrix_mdev->kvm->lock and the matrix_dev->mdevs_lock. Consequently, the matrix_dev->guests_lock will be used to protect against removal of a matrix_mdev object from the list while a queue device is being probed. This necessitates changes to the mdev probe/remove callback functions to take the matrix_dev->guests_lock prior to removing a matrix_mdev object from the list. A new macro is also introduced to acquire the locks required to dynamically update the guest's APCB in the proper order when a queue device is removed. Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
s390/vfio-ap: prepare for dynamic update of guest's APCB on assign/unassign The functions backing the matrix mdev's sysfs attribute interfaces to assign/unassign adapters, domains and control domains must take and release the locks required to perform a dynamic update of a guest's APCB in the proper order. The proper order for taking the locks is: matrix_dev->guests_lock => kvm->lock => matrix_dev->mdevs_lock The proper order for releasing the locks is: matrix_dev->mdevs_lock => kvm->lock => matrix_dev->guests_lock Two new macros are introduced for this purpose: One to take the locks and the other to release the locks. These macros will be used by the assignment/unassignment functions to prepare for dynamic update of the KVM guest's APCB. Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
s390/vfio-ap: use proper locking order when setting/clearing KVM pointer The group notifier that handles the VFIO_GROUP_NOTIFY_SET_KVM event must use the required locks in proper locking order to dynamically update the guest's APCB. The proper locking order is: 1. matrix_dev->guests_lock: required to use the KVM pointer to update a KVM guest's APCB. 2. matrix_mdev->kvm->lock: required to update a KVM guest's APCB. 3. matrix_dev->mdevs_lock: required to store or access the data stored in a struct ap_matrix_mdev instance. Two macros are introduced to acquire and release the locks in the proper order. These macros are now used by the group notifier functions. Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
s390/vfio-ap: rename matrix_dev->lock mutex to matrix_dev->mdevs_lock The matrix_dev->lock mutex is being renamed to matrix_dev->mdevs_lock to better reflect its purpose, which is to control access to the state of the mediated devices under the control of the vfio_ap device driver. Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
s390/vfio-ap: allow assignment of unavailable AP queues to mdev device The current implementation does not allow assignment of an AP adapter or domain to an mdev device if each APQN resulting from the assignment does not reference an AP queue device that is bound to the vfio_ap device driver. This patch allows assignment of AP resources to the matrix mdev as long as the APQNs resulting from the assignment: 1. Are not reserved by the AP BUS for use by the zcrypt device drivers. 2. Are not assigned to another matrix mdev. The rationale behind this is that the AP architecture does not preclude assignment of APQNs to an AP configuration profile that are not available to the system. Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
s390/vfio-ap: refresh guest's APCB by filtering AP resources assigned to mdev Refresh the guest's APCB by filtering the APQNs and control domain numbers assigned to the matrix mdev. Filtering of APQNs: ----------------- APQNs that do not reference an AP queue device bound to the vfio_ap device driver must be filtered from the APQNs assigned to the matrix mdev before they can be assigned to the guest's APCB. Given that the APQNs are configured in the guest's APCB as a matrix of APIDs (adapters) and APQIs (domains), it is not possible to filter an individual APQN. For example, suppose the matrix of APQNs is structured as follows: APIDs 3 4 5 0 (3,0) (4,0) (5,0) APQIs 1 (3,1) (4,1) (5,1) 2 (3,2) (4,2) (5,2) Now suppose APQN (4,1) does not reference a queue device bound to the vfio_ap device driver. If we filter APID 4, the APQNs (4,0), (4,1) and (4,2) will be removed. Similarly, if we filter domain 1, APQNs (3,1), (4,1) and (5,1) will be removed. To resolve this dilemma, the choice was made to filter the APID - in this case 4 - from the guest's APCB. The reason for this design decision is because the APID references an AP adapter which is a real hardware device that can be physically installed, removed, enabled or disabled; whereas, a domain is a partition within the adapter. It therefore better reflects reality to remove the APID from the guest's APCB. Filtering of control domains: ---------------------------- Any control domains that are not assigned to the host's AP configuration will be filtered from those assigned to the matrix mdev before assigning them to the guest's APCB. Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
s390/vfio-ap: introduce shadow APCB The APCB is a field within the CRYCB that provides the AP configuration to a KVM guest. Let's introduce a shadow copy of the KVM guest's APCB and maintain it for the lifespan of the guest. The shadow APCB serves the following purposes: 1. The shadow APCB can be maintained even when the mediated device is not currently in use by a KVM guest. Since the mediated device's AP configuration is filtered to ensure that no AP queues are passed through to the KVM guest that are not bound to the vfio_ap device driver or available to the host, the mediated device's AP configuration may differ from the guest's. Having a shadow of a guest's APCB allows us to provide a sysfs interface to view the guest's APCB even if the mediated device is not currently passed through to a KVM guest. This can aid in problem determination when the guest is unexpectedly missing AP resources. 2. If filtering was done in-place for the real APCB, the guest could pick up a transient state. Doing the filtering on a shadow and transferring the AP configuration to the real APCB after the guest is started or when AP resources are assigned to or unassigned from the mediated device, or when the host configuration changes, the guest's AP configuration will never be in a transient state. Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
s390/vfio-ap: manage link between queue struct and matrix mdev Let's create links between each queue device bound to the vfio_ap device driver and the matrix mdev to which the queue's APQN is assigned. The idea is to facilitate efficient retrieval of the objects representing the queue devices and matrix mdevs as well as to verify that a queue assigned to a matrix mdev is bound to the driver. The links will be created as follows: * When the queue device is probed, if its APQN is assigned to a matrix mdev, the structures representing the queue device and the matrix mdev will be linked. * When an adapter or domain is assigned to a matrix mdev, for each new APQN assigned that references a queue device bound to the vfio_ap device driver, the structures representing the queue device and the matrix mdev will be linked. The links will be removed as follows: * When the queue device is removed, if its APQN is assigned to a matrix mdev, the link from the structure representing the matrix mdev to the structure representing the queue will be removed. Since the storage allocated for the vfio_ap_queue will be freed, there is no need to remove the link to the matrix_mdev to which the queue's APQN is assigned. * When an adapter or domain is unassigned from a matrix mdev, for each APQN unassigned that references a queue device bound to the vfio_ap device driver, the structures representing the queue device and the matrix mdev will be unlinked. * When an mdev is removed, the link from any queues assigned to the mdev to the mdev will be removed. Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>