accel/habanalabs: modify pci health check
author Ofir Bitton <obitton@habana.ai>
Mon, 12 Feb 2024 12:35:24 +0000 (14:35 +0200)
committer Oded Gabbay <ogabbay@kernel.org>
Mon, 26 Feb 2024 07:47:32 +0000 (09:47 +0200)
Today we read the PCI VENDOR-ID to verify that the PCI link is
healthy. However, the VENDOR-ID value might be stored on the host,
in which case reading it might not actually access the PCI bus.
To make the PCI health check reliable, read the DEVICE-ID instead
and compare it against the device ID recorded for this device.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
drivers/accel/habanalabs/common/device.c

index 3b9e8a2..8f92445 100644
@@ -1035,14 +1035,14 @@ static void device_early_fini(struct hl_device *hdev)
 
 static bool is_pci_link_healthy(struct hl_device *hdev)
 {
-       u16 vendor_id;
+       u16 device_id;
 
        if (!hdev->pdev)
                return false;
 
-       pci_read_config_word(hdev->pdev, PCI_VENDOR_ID, &vendor_id);
+       pci_read_config_word(hdev->pdev, PCI_DEVICE_ID, &device_id);
 
-       return (vendor_id == PCI_VENDOR_ID_HABANALABS);
+       return (device_id == hdev->pdev->device);
 }
 
 static int hl_device_eq_heartbeat_check(struct hl_device *hdev)
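
The reasoning in the commit message can be illustrated with a small userspace sketch. It is not driver code: struct mock_pci_dev, mock_read_config_word(), check_by_vendor_id(), check_by_device_id() and the vendor_cached_on_host flag are hypothetical names invented for this illustration, and the device ID used in the usage note is only an example value. The sketch assumes that a config read on a dead link returns all ones and that a host-side copy of the VENDOR-ID can answer a read without touching the bus, as the commit message suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Standard config-space offsets of the two ID registers. */
#define PCI_VENDOR_ID            0x00
#define PCI_DEVICE_ID            0x02
#define PCI_VENDOR_ID_HABANALABS 0x1da3

/* Hypothetical stand-in for struct pci_dev, for illustration only. */
struct mock_pci_dev {
	uint16_t vendor;             /* ID recorded at enumeration time */
	uint16_t device;             /* ID recorded at enumeration time */
	bool link_up;                /* simulated PCI link state */
	bool vendor_cached_on_host;  /* assumption: vendor reads skip the bus */
};

/* Hypothetical mock of pci_read_config_word(). */
void mock_read_config_word(const struct mock_pci_dev *pdev, int where,
			   uint16_t *val)
{
	assert(pdev != NULL && val != NULL);

	/* A host-side copy answers without a bus access. */
	if (where == PCI_VENDOR_ID && pdev->vendor_cached_on_host) {
		*val = pdev->vendor;
		return;
	}

	/* Assumption: a read over a dead link returns all ones. */
	if (!pdev->link_up) {
		*val = 0xFFFF;
		return;
	}

	*val = (where == PCI_VENDOR_ID) ? pdev->vendor : pdev->device;
}

/* Old check: compares the VENDOR-ID, which may never reach the bus. */
bool check_by_vendor_id(const struct mock_pci_dev *pdev)
{
	uint16_t vendor_id;

	if (!pdev)
		return false;

	mock_read_config_word(pdev, PCI_VENDOR_ID, &vendor_id);
	return vendor_id == PCI_VENDOR_ID_HABANALABS;
}

/* New check: compares the DEVICE-ID with the value recorded earlier. */
bool check_by_device_id(const struct mock_pci_dev *pdev)
{
	uint16_t device_id;

	if (!pdev)
		return false;

	mock_read_config_word(pdev, PCI_DEVICE_ID, &device_id);
	return device_id == pdev->device;
}
```

In this model, a device initialized with vendor 0x1da3, an example device ID such as 0x1020, link_up = false and vendor_cached_on_host = true still passes check_by_vendor_id() but fails check_by_device_id(). That is the false positive the patch is meant to avoid.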