git.monstr.eu Git - linux-2.6-microblaze.git/log

scsi: smartpqi: Add TEST UNIT READY check for SANITIZE operation

Send a TEST UNIT READY to HBA disks and do not present them to the OS if
0x02/0x04/0x1b (SANITIZE IN PROGRESS) is returned.

During boot-up, some OSes appear to hang when there are one or more disks
undergoing a sanitize operation.

According to SCSI SBC4 specification section 4.11.2 "Commands allowed
during SANITIZE", some SCSI commands are permitted, but read/write
operations are not.

When the OS attempts to read the disk partition table a CHECK CONDITION ASC
0x04 ASCQ 0x1b is returned which causes the OS to retry the read until
SANITIZE has completed. This can take hours.

According to document HPE Smart Storage Administrator User Guide, during
the sanitize erase operation, the drive is unusable. I.e. the expected
behavior for SANITIZE is the that disk remains offline even after SANITIZE
has completed. The customer is expected to re-enable the disk using the
management utility.

Link: https://lore.kernel.org/r/20210928235442.201875-6-don.brace@microchip.com
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com>
Acked-by: John Donnelly <john.p.donnelly@oracle.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: smartpqi: Update LUN reset handler

Enhance check for commands queued to the controller.  Add new function
pqi_nonempty_inbound_queue_count() that will wait for all I/O queued for
submission to controller across all queue groups to drain.  Add helper
functions to obtain queue command counts for each queue group.  These
queues should drain quickly as they are already staged to be submitted down
to the controller's IB queue.

Enhance check for outstanding command completion.  Update the count of
outstanding commands while waiting.  This value was not re-obtained and was
potentially causing infinite wait for all completions.

Link: https://lore.kernel.org/r/20210928235442.201875-5-don.brace@microchip.com
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com>
Acked-by: John Donnelly <john.p.donnelly@oracle.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: smartpqi: Capture controller reason codes

In some rare cases, the driver can halt the controller. Add a reason code
describing why the controller was halted. Store this reason code in a
controller register to aid in debugging the issue.

Link: https://lore.kernel.org/r/20210928235442.201875-4-don.brace@microchip.com
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com>
Acked-by: John Donnelly <john.p.donnelly@oracle.com>
Signed-off-by: Murthy Bhat <Murthy.Bhat@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: smartpqi: Add controller handshake during kdump

Correct kdump hangs when controller is locked up.

There are occasions when a controller reboot (controller soft reset) is
issued when a controller firmware crash dump is in progress.

This leads to incomplete controller firmware crash dump:

- When the controller crash dump is in progress, and a kdump is initiated,
   the driver issues inbound doorbell reset to bring back the controller in
   SIS mode.

- If the controller is in locked up state, the inbound doorbell reset does
   not work causing controller initialization failures. This results in the
   driver hanging waiting for SIS mode.

To avoid an incomplete controller crash dump, add in a controller crash
dump handshake:

- Controller will indicate start and end of the controller crash dump by
   setting some register bits.

- Driver will look these bits when a kdump is initiated.  If a controller
   crash dump is in progress, the driver will wait for the controller crash
   dump to complete before issuing the controller soft reset then complete
   driver initialization.

Link: https://lore.kernel.org/r/20210928235442.201875-3-don.brace@microchip.com
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com>
Acked-by: John Donnelly <john.p.donnelly@oracle.com>
Signed-off-by: Mahesh Rajashekhara <mahesh.rajashekhara@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: smartpqi: Update device removal management

Update device removal path to handle issues for:

- rmmod: Correct stack trace when removing devices.
- rmmod: Synchronize SCSI cache.
- Update handling for removing devices using sysfs.

Link: https://lore.kernel.org/r/20210928235442.201875-2-don.brace@microchip.com
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com>
Acked-by: John Donnelly <john.p.donnelly@oracle.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: mpi3mr: Clean up mpi3mr_print_ioc_info()

This function is more complicated than necessary.

If we change from scnprintf() to snprintf() that lets us remove the if
bytes_wrote < sizeof(protocol) checks. Also, we can use bytes_wrote ? ","
: "" to print the comma and remove the separate if statement and the
"is_string_nonempty" variable.

[mkp: a few formatting cleanups and s/wrote/written/]

Link: https://lore.kernel.org/r/20210916132605.GF25094@kili
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: pm80xx: Fix misleading log statement in pm8001_mpi_get_nvmd_resp()

pm8001_mpi_get_nvmd_resp() handles a GET_NVMD_DATA response, not a
SET_NVMD_DATA response, as the log statement implies.

Fixes: 1f889b58716a ("scsi: pm80xx: Fix pm8001_mpi_get_nvmd_resp() race condition")
Link: https://lore.kernel.org/r/20210929025847.646999-1-ipylypiv@google.com
Reviewed-by: Changyuan Lyu <changyuanl@google.com>
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Igor Pylypiv <ipylypiv@google.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: pm80xx: Replace open coded check with dev_is_expander()

This is a follow up cleanup to the commit 924a3541eab0 ("scsi: libsas:
aic94xx: hisi_sas: mvsas: pm8001: Use dev_is_expander()")

Link: https://lore.kernel.org/r/20210929025807.646589-1-ipylypiv@google.com
Reviewed-by: Vishakha Channapattan <vishakhavc@google.com>
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Igor Pylypiv <ipylypiv@google.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: target: tcmu: Use struct_size() helper in kmalloc()

Make use of the struct_size() helper instead of an open-coded version, in
order to avoid any potential type mistakes or integer overflows that, in
the worst scenario, could lead to heap overflows.

Link: https://github.com/KSPP/linux/issues/160
Link: https://lore.kernel.org/r/20210927224344.GA190701@embeddedor
Reviewed-by: Bodo Stroesser <bostroesser@gmail.com>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: target: usb: Replace enable attr with ops.enable

Remove tpg/enable attribute. Add fabric ops enable_tpg implementation
instead.

Link: https://lore.kernel.org/r/20210910084133.17956-8-d.bogdanov@yadro.com
Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com>
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: target: ibm_vscsi: Replace enable attr with ops.enable

Remove tpg/enable attribute. Add fabric ops enable_tpg implementation
instead.

Link: https://lore.kernel.org/r/20210910084133.17956-7-d.bogdanov@yadro.com
Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com>
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: target: srpt: Replace enable attr with ops.enable

Remove tpg/enable attribute. Add fabric ops enable_tpg implementation
instead.

Link: https://lore.kernel.org/r/20210910084133.17956-6-d.bogdanov@yadro.com
Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: target: sbp: Replace enable attr with ops.enable

Remove tpg/enable attribute. Add fabric ops enable_tpg implementation
instead.

Link: https://lore.kernel.org/r/20210910084133.17956-5-d.bogdanov@yadro.com
Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: target: qla2xxx: Replace enable attr with ops.enable

Remove tpg/enable attribute. Add fabric ops enable_tpg implementation
instead.

Link: https://lore.kernel.org/r/20210910084133.17956-4-d.bogdanov@yadro.com
Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com>
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: target: iscsi: Replace tpg enable attr with ops.enable

Remove tpg/enable attribute. Add fabric ops enable_tpg implementation
instead.

Link: https://lore.kernel.org/r/20210910084133.17956-3-d.bogdanov@yadro.com
Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com>
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: target: core: Add common tpg/enable attribute

Many fabric modules provide their own implementation of enable attribute in
tpg.

Provide a way to remove code duplication in the fabric modules and
automatically add "enable" attribute if a fabric module has an
implementation of fabric_enable_tpg().

Link: https://lore.kernel.org/r/20210910084133.17956-2-d.bogdanov@yadro.com
Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com>
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: megaraid_sas: Driver version update to 07.719.03.00-rc1

Link: https://lore.kernel.org/r/20210929124022.24605-4-sumit.saxena@broadcom.com
Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: megaraid_sas: Add helper functions for irq_context

Adding helper functions for ISR access and release to improve readability.

Link: https://lore.kernel.org/r/20210929124022.24605-3-sumit.saxena@broadcom.com
Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: megaraid_sas: Fix concurrent access to ISR between IRQ polling and real interrupt

IRQ polling thread calls ISR after enable_irq() to handle any missed I/O
completion. The atomic flag "in_used" was added to have the synchronization
between the IRQ polling thread and the interrupt context. There is a bug
around it leading to a race condition.

Below is the sequence:

- IRQ polling thread accesses ISR, fetches the reply descriptor.

- Real interrupt arrives and pre-empts polling thread (enable_irq() is
   already called).

- Interrupt context picks the same reply descriptor as fetched by polling
   thread, processes it, and exits.

- Polling thread resumes and processes the descriptor which is already
   processed by interrupt thread leads to kernel crash.

Setting the "in_used" flag before fetching the reply descriptor ensures
synchronized access to ISR.

Link: https://www.spinics.net/lists/linux-scsi/msg159440.html
Link: https://lore.kernel.org/r/20210929124022.24605-2-sumit.saxena@broadcom.com
Fixes: 9bedd36e9146 ("scsi: megaraid_sas: Handle missing interrupts while re-enabling IRQs")
Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: advansys: Fix kernel pointer leak

Pointers should be printed with %p or %px rather than cast to 'unsigned
long' and printed with %lx.

Change %lx to %p to print the hashed pointer.

Link: https://lore.kernel.org/r/20210929122538.1158235-1-qtxuning1999@sjtu.edu.cn
Signed-off-by: Guo Zhi <qtxuning1999@sjtu.edu.cn>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: target: core: Make logs less verbose

Change the log level of the following message to debug:

Unsupported SCSI Opcode 0xXX, sending CHECK_CONDITION.

This message is mostly helpful during debugging sessions in order to
understand errors on the initiator side. But most of the time it's just
useless and makes reading logs much harder.

It gets particularly annoying if there are many initiators that come and go
or if an initiator runs a program that does not care whether the command is
supported and just keeps sending it.

Link: https://lore.kernel.org/r/20210929114959.705852-1-k.shelekhin@yadro.com
Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Konstantin Shelekhin <k.shelekhin@yadro.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ufs: core: Do not exit ufshcd_err_handler() unless operational or dead

Callers of ufshcd_err_handler() expect it to return in an operational
state. However, the code does not check the state before exiting.

Add a check for the state and perform retries until either success or the
maximum number of retries is reached.

Link: https://lore.kernel.org/r/20211002154550.128511-3-adrian.hunter@intel.com
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ufs: core: Do not exit ufshcd_reset_and_restore() unless operational or dead

Callers of ufshcd_reset_and_restore() expect it to return in an operational
state. However, the code only checks direct errors and so the ufshcd_state
may not be UFSHCD_STATE_OPERATIONAL due to error interrupts.

Fix by also checking ufshcd_state, still allowing non-fatal errors which
are left for the error handler to deal with.

Link: https://lore.kernel.org/r/20211002154550.128511-2-adrian.hunter@intel.com
Reviewed-by: Avri altman <avri.altman@wdc.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ufs: core: Stop clearing UNIT ATTENTIONS

Commit aa53f580e67b ("scsi: ufs: Minor adjustments to error handling")
introduced a ufshcd_clear_ua_wluns() call in
ufshcd_err_handling_unprepare(). As explained in detail by Adrian Hunter,
this can trigger a deadlock. Avoid that deadlock by removing the code that
clears the unit attention. This is safe because the only software that
relies on clearing unit attentions is the Android Trusty software and
because support for handling unit attentions has been added in the Trusty
software.

See also https://lore.kernel.org/linux-scsi/20210930124224.114031-2-adrian.hunter@intel.com/

Note that "scsi: ufs: Retry START_STOP on UNIT_ATTENTION" is a prerequisite
for this commit.

Link: https://lore.kernel.org/r/20211001182015.1347587-3-jaegeuk@kernel.org
Fixes: aa53f580e67b ("scsi: ufs: Minor adjustments to error handling")
Cc: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ufs: core: Retry START_STOP on UNIT_ATTENTION

Commit 57d104c153d3 ("ufs: add UFS power management support") made the UFS
driver submit a REQUEST SENSE command before submitting a power management
command to a WLUN to clear the POWER ON unit attention. Instead of
submitting a REQUEST SENSE command before submitting a power management
command, retry the power management command until it succeeds.

This is the preparation to get rid of all UNIT ATTENTION code which should
be handled by users.

Link: https://lore.kernel.org/r/20211001182015.1347587-2-jaegeuk@kernel.org
Cc: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ufs: core: Remove return statement in void function

Return statement is not useful at the end of "void" function.

Link: https://lore.kernel.org/r/20210929200640.828611-4-huobean@gmail.com
Signed-off-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ufs: core: Fix ufshcd_probe_hba() prototype to match the definition

Since commit 568dd9959611 ("scsi: ufs: Rename the second ufshcd_probe_hba()
argument"), the second ufshcd_probe_hba() argument has been changed to
init_dev_params.

Link: https://lore.kernel.org/r/20210929200640.828611-3-huobean@gmail.com
Fixes: 568dd9959611 ("scsi: ufs: Rename the second ufshcd_probe_hba() argument")
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ufs: core: Fix NULL pointer dereference

Calling ufshcd_rpm_{get/put}_sync() prior to ufshcd_scsi_add_wlus() being
called will trigger a NULL pointer dereference. This is because
hba->sdev_ufs_device is initialized in ufshcd_scsi_add_wlus().

    Unable to handle kernel NULL pointer dereference at virtual address
    0000000000000348
    Mem abort info:
      ESR = 0x96000004
      EC = 0x25: DABT (current EL), IL = 32 bits
      SET = 0, FnV = 0
      EA = 0, S1PTW = 0
      FSC = 0x04: level 0 translation fault
    Data abort info:
      ISV = 0, ISS = 0x00000004
      CM = 0, WnR = 0
    [0000000000000348] user address but active_mm is swapper
    Internal error: Oops: 96000004 [#1] PREEMPT SMP
    Modules linked in:
    CPU: 0 PID: 91 Comm: kworker/u16:1 Not tainted 5.15.0-rc1-beanhuo-linaro-1423
    Hardware name: MicronRB (DT)
    Workqueue: events_unbound async_run_entry_fn
    pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
    pc : pm_runtime_drop_link+0x128/0x338
    lr : ufshpb_get_dev_info+0x8c/0x148
    sp : ffff800012573c10
    x29: ffff800012573c10 x28: 0000000000000000 x27: 0000000000000003
    x26: ffff000001d21298 x25: 000000005abcea60 x24: ffff800011d89000
    x23: 0000000000000001 x22: ffff000001d21880 x21: ffff000001ec9300
    x20: 0000000000000004 x19: 0000000000000198 x18: ffffffffffffffff
    x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000041400
    x14: 5eee00201100200a x13: 000000000000bb03 x12: 0000000000000000
    x11: 0000000000000100 x10: 0200000000000000 x9 : bb0000021a162c01
    x8 : 0302010021021003 x7 : 0000000000000000 x6 : ffff800012573af0
    x5 : 0000000000000001 x4 : 0000000000000001 x3 : 0000000000000200
    x2 : 0000000000000348 x1 : 0000000000000348 x0 : ffff80001095308c
    Call trace:
     pm_runtime_drop_link+0x128/0x338
     ufshpb_get_dev_info+0x8c/0x148
     ufshcd_probe_hba+0xda0/0x11b8
     ufshcd_async_scan+0x34/0x330
     async_run_entry_fn+0x38/0x180
     process_one_work+0x1f4/0x498
     worker_thread+0x48/0x480
     kthread+0x140/0x158
     ret_from_fork+0x10/0x20
    Code: 88027c01 35ffffa2 17fff6c4 f9800051 (885f7c40)
    ---[ end trace 2ba541335f595c95 ]

ufshpb_get_dev_info() is only called during asynchronous scanning and at
that time pm_runtime_get_sync() has been called:

    ...
    /* Hold auto suspend until async scan completes */
    pm_runtime_get_sync(dev);
    atomic_set(&hba->scsi_block_reqs_cnt, 0);
    ...
    ufshcd_async_scan()
        ufshcd_probe_hba(hba, true);
            ufshcd_device_params_init(hba);
                ufshpb_get_dev_info();
    ...
        pm_runtime_put_sync(hba->dev);

Remove ufshcd_rpm_{get/put}_sync() from ufshpb_get_dev_info() to fix this
problem.

Link: https://lore.kernel.org/r/20210929200640.828611-2-huobean@gmail.com
Fixes: 351b3a849ac7 ("scsi: ufs: ufshpb: Use proper power management API")
Signed-off-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: core: Fix spelling in a source code comment

The typo in this source code comment makes the comment confusing. Clear up
the confusion by fixing the typo.

Link: https://lore.kernel.org/r/20210929182318.2060489-1-bvanassche@acm.org
Fixes: bc85dc500f9d ("scsi: remove scsi_end_request")
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: sd: Fix sd_do_mode_sense() buffer length handling

For devices that explicitly asked for MODE SENSE(10) use, make sure that
scsi_mode_sense() is called with a buffer of at least 8 bytes so that the
sense header fits.

Link: https://lore.kernel.org/r/20210820070255.682775-4-damien.lemoal@wdc.com
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: core: Fix scsi_mode_select() buffer length handling

The MODE SELECT(6) command allows handling mode page buffers that are up to
255 bytes, including the 4 byte header needed in front of the page
buffer. For requests larger than this limit, automatically use the MODE
SELECT(10) command.

In both cases, since scsi_mode_select() adds the mode select page header,
checks on the buffer length value must include this header size to avoid
overflows of the command CDB allocation length field.

While at it, use put_unaligned_be16() for setting the header block
descriptor length and CDB allocation length when using MODE SELECT(10).

[mkp: fix MODE SENSE vs. MODE SELECT confusion]

Link: https://lore.kernel.org/r/20210820070255.682775-3-damien.lemoal@wdc.com
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: core: Fix scsi_mode_sense() buffer length handling

Several problems exist with scsi_mode_sense() buffer length handling:

1) The allocation length field of the MODE SENSE(10) command is 16-bits,
    occupying bytes 7 and 8 of the CDB. With this command, access to mode
    pages larger than 255 bytes is thus possible. However, the CDB
    allocation length field is set by assigning len to byte 8 only, thus
    truncating buffer length larger than 255.

2) If scsi_mode_sense() is called with len smaller than 8 with
    sdev->use_10_for_ms set, or smaller than 4 otherwise, the buffer length
    is increased to 8 and 4 respectively, and the buffer is zero filled
    with these increased values, thus corrupting the memory following the
    buffer.

Fix these 2 problems by using put_unaligned_be16() to set the allocation
length field of MODE SENSE(10) CDB and by returning an error when len is
too small.

Furthermore, if len is larger than 255B, always try MODE SENSE(10) first,
even if the device driver did not set sdev->use_10_for_ms. In case of
invalid opcode error for MODE SENSE(10), access to mode pages larger than
255 bytes are not retried using MODE SENSE(6). To avoid buffer length
overflows for the MODE_SENSE(10) case, check that len is smaller than 65535
bytes.

While at it, also fix the folowing:

* Use get_unaligned_be16() to retrieve the mode data length and block
   descriptor length fields of the mode sense reply header instead of using
   an open coded calculation.

* Fix the kdoc dbd argument explanation: the DBD bit stands for Disable
   Block Descriptor, which is the opposite of what the dbd argument
   description was.

Link: https://lore.kernel.org/r/20210820070255.682775-2-damien.lemoal@wdc.com
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: core: Delete scsi_{get,free}_host_dev()

Since commit 0653c358d2dc ("scsi: Drop gdth driver"), functions
scsi_{get,free}_host_dev() no longer have any in-tree users, so delete
them.

Link: https://lore.kernel.org/r/1631528047-30150-1-git-send-email-john.garry@huawei.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Nacked-by: Hannes Reinecke <hare@suse.de>

scsi: elx: efct: Switch from 'pci_' to 'dma_' API

The wrappers in include/linux/pci-dma-compat.h should go away.

The patch has been generated with the coccinelle script below.

It has been hand modified to use 'dma_set_mask_and_coherent()' instead of
'pci_set_dma_mask()/pci_set_consistent_dma_mask()' when applicable.
This is less verbose.

It has been compile tested.

@@
@@
-    PCI_DMA_BIDIRECTIONAL
+    DMA_BIDIRECTIONAL

@@
@@
-    PCI_DMA_TODEVICE
+    DMA_TO_DEVICE

@@
@@
-    PCI_DMA_FROMDEVICE
+    DMA_FROM_DEVICE

@@
@@
-    PCI_DMA_NONE
+    DMA_NONE

@@
expression e1, e2, e3;
@@
-    pci_alloc_consistent(e1, e2, e3)
+    dma_alloc_coherent(&e1->dev, e2, e3, GFP_)

@@
expression e1, e2, e3;
@@
-    pci_zalloc_consistent(e1, e2, e3)
+    dma_alloc_coherent(&e1->dev, e2, e3, GFP_)

@@
expression e1, e2, e3, e4;
@@
-    pci_free_consistent(e1, e2, e3, e4)
+    dma_free_coherent(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_map_single(e1, e2, e3, e4)
+    dma_map_single(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_unmap_single(e1, e2, e3, e4)
+    dma_unmap_single(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4, e5;
@@
-    pci_map_page(e1, e2, e3, e4, e5)
+    dma_map_page(&e1->dev, e2, e3, e4, e5)

@@
expression e1, e2, e3, e4;
@@
-    pci_unmap_page(e1, e2, e3, e4)
+    dma_unmap_page(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_map_sg(e1, e2, e3, e4)
+    dma_map_sg(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_unmap_sg(e1, e2, e3, e4)
+    dma_unmap_sg(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_dma_sync_single_for_cpu(e1, e2, e3, e4)
+    dma_sync_single_for_cpu(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_dma_sync_single_for_device(e1, e2, e3, e4)
+    dma_sync_single_for_device(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_dma_sync_sg_for_cpu(e1, e2, e3, e4)
+    dma_sync_sg_for_cpu(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_dma_sync_sg_for_device(e1, e2, e3, e4)
+    dma_sync_sg_for_device(&e1->dev, e2, e3, e4)

@@
expression e1, e2;
@@
-    pci_dma_mapping_error(e1, e2)
+    dma_mapping_error(&e1->dev, e2)

@@
expression e1, e2;
@@
-    pci_set_dma_mask(e1, e2)
+    dma_set_mask(&e1->dev, e2)

@@
expression e1, e2;
@@
-    pci_set_consistent_dma_mask(e1, e2)
+    dma_set_coherent_mask(&e1->dev, e2)

Link: https://lore.kernel.org/r/3899b1ed4abac581c30845d82f33ec6df8b38976.1629633207.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ufs: ufs-qcom: Enter and exit hibern8 during clock scaling

Qualcomm controller needs to be in hibern8 before scaling clocks. This
change puts the controller in hibern8 state before scaling and brings it
out after scaling of clocks.

Link: https://lore.kernel.org/r/212b7aaf6d834c4a8c682fdac4a59b84013ed573.1632818942.git.nguyenb@codeaurora.org
Reviewed-by: Alim Akhtar <alim.akhtar@samsung.com>
Signed-off-by: Asutosh Das <asutoshd@codeaurora.org>
Signed-off-by: Bao D. Nguyen <nguyenb@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ufs: core: Export hibern8 entry and exit functions

Qualcomm controllers need to be in hibern8 before scaling up or down the
clocks. Hence, export the hibern8 entry and exit functions.

Link: https://lore.kernel.org/r/a29bfdd0c8f1d1a3e5fb69e43ea277c97a7f0cb6.1632818942.git.nguyenb@codeaurora.org
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Alim Akhtar <alim.akhtar@samsung.com>
Signed-off-by: Asutosh Das <asutoshd@codeaurora.org>
Signed-off-by: Bao D. Nguyen <nguyenb@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: lpfc: Add support for optional PLDV handling

At adapter attachment or SLI port initialization, read the SLIPORT_STATUS
register to check for pldv_enable. If found, the driver will perform a PCIe
configuration space write when attaching to an SLI port instance that is an
LPe32000 series adapter.

Link: https://lore.kernel.org/r/20210927183518.22130-1-jsmart2021@gmail.com
Co-developed-by: Nigel Kirkland <nkirkland2304@gmail.com>
Signed-off-by: Nigel Kirkland <nkirkland2304@gmail.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ufs: core: SCSI_UFS_HWMON depends on HWMON=y

When building an allmodconfig kernel, the following build error shows up:

aarch64-linux-gnu-ld: drivers/scsi/ufs/ufs-hwmon.o: in function `ufs_hwmon_probe':
/kernel/next/drivers/scsi/ufs/ufs-hwmon.c:177: undefined reference to `hwmon_device_register_with_info'
/kernel/next/drivers/scsi/ufs/ufs-hwmon.c:177:(.text+0x510): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `hwmon_device_register_with_info'
aarch64-linux-gnu-ld: drivers/scsi/ufs/ufs-hwmon.o: in function `ufs_hwmon_remove':
/kernel/next/drivers/scsi/ufs/ufs-hwmon.c:195: undefined reference to `hwmon_device_unregister'
/kernel/next/drivers/scsi/ufs/ufs-hwmon.c:195:(.text+0x5c8): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `hwmon_device_unregister'
aarch64-linux-gnu-ld: drivers/scsi/ufs/ufs-hwmon.o: in function `ufs_hwmon_notify_event':
/kernel/next/drivers/scsi/ufs/ufs-hwmon.c:206: undefined reference to `hwmon_notify_event'
/kernel/next/drivers/scsi/ufs/ufs-hwmon.c:206:(.text+0x64c): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `hwmon_notify_event'
aarch64-linux-gnu-ld: /home/anders/src/kernel/next/drivers/scsi/ufs/ufs-hwmon.c:209: undefined reference to `hwmon_notify_event'
/kernel/next/drivers/scsi/ufs/ufs-hwmon.c:209:(.text+0x66c): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `hwmon_notify_event'

Since SCSI_UFS_HWMON can't be built as a module, SCSI_UFS_HWMON has to
depend on HWMON=y.

Link: https://lore.kernel.org/r/20210927084615.1938432-1-anders.roxell@linaro.org
Fixes: e88e2d32200a ("scsi: ufs: core: Probe for temperature notification support")
Also-reported-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Acked-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: lpfc: Return NULL rather than a plain 0 integer

Function lpfc_sli4_perform_vport_cvl() returns a pointer to struct
lpfc_nodelist so returning a plain 0 integer isn't good practice. Fix this
by returning a NULL instead.

Link: https://lore.kernel.org/r/20210925224113.183040-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: aic7xxx: Fix a function name in comments

Use dma_alloc_coherent() instead of pci_alloc_consistent().

Link: https://lore.kernel.org/r/20210925132931.95-1-caihuoqing@baidu.com
Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: lpfc: Fix a function name in comments

Use dma_map_sg() instead of pci_map_sg() in comments.

Link: https://lore.kernel.org/r/20210925125324.1760-3-caihuoqing@baidu.com
Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: advansys: Prefer struct_size() over open-coded arithmetic

As noted in the "Deprecated Interfaces, Language Features, Attributes, and
Conventions" documentation [1], size calculations (especially
multiplication) should not be performed in memory allocator (or similar)
function arguments due to the risk of them overflowing. This could lead to
values wrapping around and a smaller allocation being made than the caller
was expecting. Using those allocations could lead to linear overflows of
heap memory and other misbehaviors.

Use the struct_size() helper to do the arithmetic instead of the argument
"size + count * size" in the kzalloc() function.

This code was detected with the help of Coccinelle and audited and fixed
manually.

[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments

Link: https://lore.kernel.org/r/20210925114205.11377-1-len.baker@gmx.com
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Len Baker <len.baker@gmx.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ufs: exynos: Unify naming

Use "Samsung" and "Exynos", not the uppercase versions.

Link: https://lore.kernel.org/r/20210924132658.109814-2-krzysztof.kozlowski@canonical.com
Reviewed-by: Alim Akhtar <alim.akhtar@samsung.com>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: lpfc: Fix mailbox command failure during driver initialization

Contention for the mailbox interface may occur during driver initialization
(immediately after a function reset), between mailbox commands initiated
via ioctl (bsg) and those driver requested by the driver.

After setting SLI_ACTIVE flag for a port, there is a window in which the
driver will allow an ioctl to be initiated while the adapter is
initializing and issuing mailbox commands via polling. The polling logic
then gets confused.

Correct by having thread setting SLI_ACTIVE spot an active mailbox command
and allow it complete before proceeding.

Link: https://lore.kernel.org/r/20210921143008.64212-1-jsmart2021@gmail.com
Co-developed-by: Nigel Kirkland <nkirkland2304@gmail.com>
Signed-off-by: Nigel Kirkland <nkirkland2304@gmail.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: dc395: Fix error case unwinding

dc395x_init_one()->adapter_init() might fail. In this case, the acb is
already cleaned up by adapter_init(), no need to do that in
adapter_uninit(acb) again.

[    1.252251] dc395x: adapter init failed
[    1.254900] RIP: 0010:adapter_uninit+0x94/0x170 [dc395x]
[    1.260307] Call Trace:
[    1.260442]  dc395x_init_one.cold+0x72a/0x9bb [dc395x]

Link: https://lore.kernel.org/r/20210907040702.1846409-1-ztong0001@gmail.com
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Finn Thain <fthain@linux-m68k.org>
Signed-off-by: Tong Zhang <ztong0001@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ufs: core: Add temperature notification exception handling

The device may notify the host of an extreme temperature by using the
exception event mechanism. The exception can be raised when the device’s
Tcase temperature is either too high or too low.

It is essentially up to the platform to decide what further actions need to
be taken. leave a placeholder for a designated vop for that.

Link: https://lore.kernel.org/r/20210915060407.40-3-avri.altman@wdc.com
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Daejun Park <daejun7.park@samsung.com>
Signed-off-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ufs: core: Probe for temperature notification support

Probe the dExtendedUFSFeaturesSupport register for the device's temperature
notification support and, if supported, add a hardware monitor device.

Link: https://lore.kernel.org/r/20210915060407.40-2-avri.altman@wdc.com
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Daejun Park <daejun7.park@samsung.com>
Signed-off-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: efct: Decrease area under spinlock

Under the session level spinlock node->active_ios_lock in
efct_scsi_io_alloc() we are taking another spinlock for the port. This
leads to contention between sessions and even between I/Os in the same
session.

Reduce the locked region to active_ios list for which active_ios_lock is
intended. Spinlock CPU usage decreases from 18% down to 13%. IOPS are
increased from 220 kIOPS to 264 kIOPS for one LUN.

Link: https://lore.kernel.org/r/20210914105539.6942-4-d.bogdanov@yadro.com
Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com>
Reviewed-by: Ram Vegesna <ram.vegesna@broadcom.com>
Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: efct: Fix nport free

nport_free for an empty nport hangs the state machine waiting for mbox
completion if nport is not yet attached thinking that it is attaching right
now. Add a check for nport attaching state and complete nport free.

Link: https://lore.kernel.org/r/20210914105539.6942-3-d.bogdanov@yadro.com
Reviewed-by: Ram Vegesna <ram.vegesna@broadcom.com>
Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: efct: Add state in nport sm trace printout

Similar to other state machine traces and to make debug easier, add the
state name to nport sm trace printout.

Link: https://lore.kernel.org/r/20210914105539.6942-2-d.bogdanov@yadro.com
Reviewed-by: Ram Vegesna <ram.vegesna@broadcom.com>
Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: core: Remove include <scsi/scsi_host.h> from scsi_cmnd.h

There are no dependencies in <scsi/scsi_cmnd.h> on the <scsi/scsi_host.h>
header file. Hence remove the scsi_host.h include directive from
scsi_cmnd.h. This include directive was introduced in February 2021 by
commit af1830956dc3 ("scsi: core: Add mq_poll support to SCSI layer").

Link: https://lore.kernel.org/r/20210917212751.2676054-1-bvanassche@acm.org
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: target: Remove unused function arguments

The se_cmd is unused in these functions, just remove it.

Link: https://lore.kernel.org/r/20210913083045.3670648-1-fengli@smartx.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Li Feng <fengli@smartx.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ufs: ufs-mediatek: Change dbg select by check IP version

Mediatek UFS dbg select setting is changed in new IP version. Check the IP
version before setting dbg select.

Link: https://lore.kernel.org/r/1630918387-8333-1-git-send-email-peter.wang@mediatek.com
Signed-off-by: Peter Wang <peter.wang@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ufs: ufshpb: Use proper power management API

In ufshpb, pm_runtime_{get,put}_sync() are used to avoid unwanted runtime
suspend during query requests. Whereas commit b294ff3e3449 ("scsi: ufs:
core: Enable power management for wlun") modified the driver core to use
ufshcd_rpm_{get,put}_sync() APIs.

Switch to these APIs in HPB module as well.

Link: https://lore.kernel.org/r/20210902003534epcms2p1937a0f0eeb48a441cb69f5ef13ff8430@epcms2p1
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Daejun Park <daejun7.park@samsung.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ufs: ufs-qcom: Remove unneeded variable 'err'

'err' is never set in this functon. Remove the declaration and just return
0.

Link: https://lore.kernel.org/r/20210907044111.29632-1-cw9316.lee@samsung.com
Signed-off-by: ChanWoo Lee <cw9316.lee@samsung.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: documentation: Document Fibre Channel sysfs node for appid

Update documentation for sysfs node within /sys/class/fc/fc_udev_device/.

Link: https://lore.kernel.org/r/20210913015853.2086512-1-muneendra.kumar@broadcom.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Muneendra Kumar <muneendra.kumar@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: elx: libefc: Prefer kcalloc() over open coded arithmetic

As noted in the "Deprecated Interfaces, Language Features, Attributes, and
Conventions" documentation [1], size calculations (especially
multiplication) should not be performed in memory allocator (or similar)
function arguments due to the risk of them overflowing. This could lead to
values wrapping around and a smaller allocation being made than the caller
was expecting. Using those allocations could lead to linear overflows of
heap memory and other misbehaviors.

Use the purpose specific kcalloc() function instead of the argument count *
size in the kzalloc() function.

[1] https://www.kernel.org/doc/html/v5.14/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments

Link: https://lore.kernel.org/r/20210905062448.6587-1-len.baker@gmx.com
Reviewed-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Len Baker <len.baker@gmx.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: lpfc: Update lpfc version to 14.0.0.2

Update lpfc version to 14.0.0.2.

Link: https://lore.kernel.org/r/20210910233159.115896-15-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: lpfc: Improve PBDE checks during SGL processing

The PBDE feature, setting payload buffer address explicitly in the WQE so
it doesn't have to be fetched from the SGL, only makes sense when there is
a single buffer for the I/O. When there are multiple buffers it actually
hurts performance as the SGL subsequently has to be fetched.

Rework the SGL logic to only use PBDE when a single buffer.

Link: https://lore.kernel.org/r/20210910233159.115896-14-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: lpfc: Zero CGN stats only during initial driver load and stat reset

Currently congestion management framework results are cleared whenever the
framework settings changed (such as it being turned off then back on). This
unfortunately means prior stats, rolled up to higher time windows lose
meaning.

Change such that stats are not cleared. Thus they pause and resume with
prior values still being considered.

Link: https://lore.kernel.org/r/20210910233159.115896-13-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: lpfc: Fix I/O block after enabling managed congestion mode

If the congestion management framework dynamically enables, it may do so
while I/O is in flight. The updates of cmf info due to inflight I/O
completing may happen before values have been initialized.

Fix by ensure cmf_max_bytes_per_interval is initialized when checking
bandwidth utilization for SCSI layer blocking.

Link: https://lore.kernel.org/r/20210910233159.115896-12-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: lpfc: Adjust bytes received vales during cmf timer interval

The newly added congestion mgmt framework is seeing unexpected congestion
FPINs and signals. In analysis, time values given to the adapter are not
at hard time intervals. Thus the drift vs the transfer count seen is
affecting how the framework manages things.

Adjust counters to cover the drift.

Link: https://lore.kernel.org/r/20210910233159.115896-11-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: lpfc: Fix EEH support for NVMe I/O

Injecting errors on the PCI slot while the driver is handling NVMe I/O will
cause crashes and hangs.

There are several rather difficult scenarios occurring. The main issue is
that the adapter can report a PCI error before or simultaneously to the PCI
subsystem reporting the error. Both paths have different entry points and
currently there is no interlock between them. Thus multiple teardown paths
are competing and all heck breaks loose.

Complicating things is the NVMs path. To a large degree, I/O was able to be
shutdown for a full FC port on the SCSI stack. But on NVMe, there isn't a
similar call. At best, it works on a per-controller basis, but even at the
controller level, it's a controller "reset" call. All of which means I/O is
still flowing on different CPUs with reset paths expecting hw access
(mailbox commands) to execute properly.

The following modifications are made:

- A new flag is set in PCI error entrypoints so the driver can track being
   called by that path.

- An interlock is added in the SLI hw error path and the PCI error path
   such that only one of the paths proceeds with the teardown logic.

- RPI cleanup is patched such that RPIs are marked unregistered w/o mbx
   cmds in cases of hw error.

- If entering the SLI port re-init calls, a case where SLI error teardown
   was quick and beat the PCI calls now reporting error, check whether the
   SLI port is still live on the PCI bus.

- In the PCI reset code to bring the adapter back, recheck the IRQ
   settings. Different checks for SLI3 vs SLI4.

- In I/O completions, that may be called as part of the cleanup or
   underway just before the hw error, check the state of the adapter.  If
   in error, shortcut handling that would expect further adapter
   completions as the hw error won't be sending them.

- In routines waiting on I/O completions, which may have been in progress
   prior to the hw error, detect the device is being torn down and abort
   from their waits and just give up. This points to a larger issue in the
   driver on ref-counting for data structures, as it doesn't have
   ref-counting on q and port structures. We'll do this fix for now as it
   would be a major rework to be done differently.

- Fix the NVMe cleanup to simulate NVMe I/O completions if I/O is being
   failed back due to hw error.

- In I/O buf allocation, done at the start of new I/Os, check hw state and
   fail if hw error.

Link: https://lore.kernel.org/r/20210910233159.115896-10-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: lpfc: Fix FCP I/O flush functionality for TMF routines

A prior patch inadvertently caused lpfc_sli_sum_iocb() to exclude counting
of outstanding aborted I/Os and ABORT IOCBs.  Thus,
lpfc_reset_flush_io_context() called from any TMF routine does not properly
wait to flush all outstanding FCP IOCBs leading to a block layer crash on
an invalid scsi_cmnd->request pointer.

  kernel BUG at ../block/blk-core.c:1489!
  RIP: 0010:blk_requeue_request+0xaf/0xc0
  ...
  Call Trace:
  <IRQ>
  __scsi_queue_insert+0x90/0xe0 [scsi_mod]
  blk_done_softirq+0x7e/0x90
  __do_softirq+0xd2/0x280
  irq_exit+0xd5/0xe0
  do_IRQ+0x4c/0xd0
  common_interrupt+0x87/0x87
  </IRQ>

Fix by separating out the LPFC_IO_FCP, LPFC_IO_ON_TXCMPLQ,
LPFC_DRIVER_ABORTED, and CMD_ABORT_XRI_CN || CMD_CLOSE_XRI_CN checks into a
new lpfc_sli_validate_fcp_iocb_for_abort() routine when determining to
build an ABORT iocb.

Restore lpfc_reset_flush_io_context() functionality by including counting
of outstanding aborted IOCBs and ABORT IOCBs in lpfc_sli_sum_iocb().

Link: https://lore.kernel.org/r/20210910233159.115896-9-jsmart2021@gmail.com
Fixes: e1364711359f ("scsi: lpfc: Fix illegal memory access on Abort IOCBs")
Cc: <stable@vger.kernel.org> # v5.12+
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: lpfc: Fix NVMe I/O failover to non-optimized path

Currently, we hold off unregistering with NVMe transport layer until GID_FT
or ADISC completes upon receipt of RSCN. In the ADISC discovery routine,
for nodes not found in the GID_FT response, the nodes are unregistered from
the SCSI transport but not UNREG_RPI'd. Meaning outstanding WQEs continue
to be outstanding and were not failed back to the OS. If an NVMe device,
this mean there wasn't initial termination of the I/Os so they could be
issued on a different NVMe path.

Fix by unregistering the RPI so that I/O is cancelled.

Link: https://lore.kernel.org/r/20210910233159.115896-8-jsmart2021@gmail.com
Fixes: 0614568361b0 ("scsi: lpfc: Delay unregistering from transport until GIDFT or ADISC completes")
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: lpfc: Don't remove ndlp on PRLI errors in P2P mode

In pt-2-pt mode, the initiator does not log into the target after a PRLI
error. In pt-2-pt mode, the target responded to the PRLI by sending a
LOGO. The LOGO causes all ELS and I/Os to be aborted. This caused the PRLI
to fail. The PRLI completion path caused the discovery node to be dropped
to avoid being stick in an UNUSED (not logged in) state. As the node was
dropped there is no retry of the login and as it is pt-2-pt, there is no
RSCN to retrigger discovery. Thus the other end is not seen by the OS.

Fix by ensuring the discovery node is not dropped if connecting pt-2-pt.
This will cause PLOGI to be retried.

Link: https://lore.kernel.org/r/20210910233159.115896-7-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: lpfc: Fix rediscovery of tape device after LIP

On link up and node discovery, a remote port is registered with the SCSI
transport and the driver sets fc4_xpt_flags to track transport
registration.

A link down event causes the driver to deregister with the SCSI transport,
starting the devloss timer, and calls a local unreg routine to clear the
login state. Part of the login state is the fc4_xpt_flags. However, with
tape devices that support sequence level error recovery, which wants to
preserve the login, the local unreg routine is skipped, thus the flags
aren't cleared.

A subsequent link up, ADISC is performed and the lpfc_nlp_reg_node()
routine is called. As the fc4_xpt_flags is not clear, it's believed the
node is already registered with the transport. Unfortunately, the
registration was already terminated. Eventually the devloss tmo timer
expires and tears down the device.

Fix by ensuring the tape device, known by the ADISC flag, is always
unregistered if the link drops.

Link: https://lore.kernel.org/r/20210910233159.115896-6-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: lpfc: Fix hang on unload due to stuck fport node

A test scenario encountered an unload hang while an FLOGI ELS was in flight
when a link down condition occurred.  The driver fails unload as it never
releases the fport node.

For most nodes, when the link drops, devloss tmo is started and the timeout
will cause the final node release. For the Fport, as it has not yet
registered with the SCSI transport, there is no devloss timer to be
started, so there is no final release.  Additionally, the link down
sequence causes ABORTS to be issued for pending ELS's. The completions from
the ABORTS perform the release of node references.  However, as the adapter
is being reset to be unloaded, those completions will never occur.

Fix by the following:

- In the ELS cleanup, recognize when unloading and place the ELS's on a
   different list that immediately cleans up/completes the ELS's.  It's
   recognized that this condition primarily affects only the fport, with
   other ports having normal clean up logic that handles things.

- Resolve the devloss issue by, when cleaning up nodes on after link down,
   recognizing when the fabric node does not have a completed state (its
   state is UNUSED) and removing a reference so the node can delete after
   the ELS reference is released.

Link: https://lore.kernel.org/r/20210910233159.115896-5-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: lpfc: Fix premature rpi release for unsolicited TPLS and LS_RJT

A test scenario has a target issuing a TPLS after accepting the driver's
PRLI.  TPLS is not supported by the driver so it rejects the ELS.  However,
the reject was only happening on the primary N_Port.  If the TPLS was to a
NPIV vport, not only would it reject the ELS, but it would act on the TPLS,
starting devloss, then unregister from the SCSI transport and release the
node. When devloss expired, it would access the node again and cause a page
faul.

Fix by altering the NPIV code to recognize that a correctly registered node
can reject unsolicited ELS I/O and to not unregister with the SCSI
transport and tear the node down.  Add a check of the fc4_xpt_flags so that
only a zero value allows the unreg and teardown.

Link: https://lore.kernel.org/r/20210910233159.115896-4-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: lpfc: Don't release final kref on Fport node while ABTS outstanding

In a rarely executed path, FLOGI failure, there is a refcounting error. If
FLOGI completed with an error, typically a timeout, the initial completion
handler would remove the job reference. However, the job completion isn't
the actual end of the job/exchange as the timeout usually initiates an
ABTS, and upon that ABTS completion, a final completion is sent. The driver
removes the reference again in the final completion. Thus the imbalance.

In the buggy cases, if there was a link bounce while the delayed response
is outstanding, the fport node may be referenced again but there was no
additional reference as it is already present. The delayed completion then
occurs and removes the last reference freeing the node and causing issues
in the link up processed that is using the node.

Fix this scenario by removing the snippet that removed the reference in the
initial FLOGI completion. The bad snippet was poorly trying to identify the
FLOGI as OK to do so by realizing the node was not registered with either
SCSI or NVMe transport.

Link: https://lore.kernel.org/r/20210910233159.115896-3-jsmart2021@gmail.com
Fixes: 618e2ee146d4 ("scsi: lpfc: Fix FLOGI failure due to accessing a freed node")
Cc: <stable@vger.kernel.org> # v5.13+
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: lpfc: Fix list_add() corruption in lpfc_drain_txq()

When parsing the txq list in lpfc_drain_txq(), the driver attempts to pass
the requests to the adapter. If such an attempt fails, a local "fail_msg"
string is set and a log message output. The job is then added to a
completions list for cancellation.

Processing of any further jobs from the txq list continues, but since
"fail_msg" remains set, jobs are added to the completions list regardless
of whether a wqe was passed to the adapter. If successfully added to
txcmplq, jobs are added to both lists resulting in list corruption.

Fix by clearing the fail_msg string after adding a job to the completions
list. This stops the subsequent jobs from being added to the completions
list unless they had an appropriate failure.

Link: https://lore.kernel.org/r/20210910233159.115896-2-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: qla2xxx: Remove redundant initialization of pointer req

The pointer req is being initialized with a value that is never read, it is
being updated later on. The assignment is redundant and can be removed.

Link: https://lore.kernel.org/r/20210910114610.44752-1-colin.king@canonical.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Addresses-Coverity: ("Unused value")

scsi: qla2xxx: Update version to 10.02.07.100-k

Link: https://lore.kernel.org/r/20210908164622.19240-11-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: qla2xxx: Fix use after free in eh_abort path

In eh_abort path driver prematurely exits the call to upper layer. Check
whether command is aborted / completed by firmware before exiting the call.

9 [ffff8b1ebf803c00] page_fault at ffffffffb0389778
  [exception RIP: qla2x00_status_entry+0x48d]
  RIP: ffffffffc04fa62d  RSP: ffff8b1ebf803cb0  RFLAGS: 00010082
  RAX: 00000000ffffffff  RBX: 00000000000e0000  RCX: 0000000000000000
  RDX: 0000000000000000  RSI: 00000000000013d8  RDI: fffff3253db78440
  RBP: ffff8b1ebf803dd0   R8: ffff8b1ebcd9b0c0   R9: 0000000000000000
  R10: ffff8b1e38a30808  R11: 0000000000001000  R12: 00000000000003e9
  R13: 0000000000000000  R14: ffff8b1ebcd9d740  R15: 0000000000000028
  ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
10 [ffff8b1ebf803cb0] enqueue_entity at ffffffffafce708f
11 [ffff8b1ebf803d00] enqueue_task_fair at ffffffffafce7b88
12 [ffff8b1ebf803dd8] qla24xx_process_response_queue at ffffffffc04fc9a6
[qla2xxx]
13 [ffff8b1ebf803e78] qla24xx_msix_rsp_q at ffffffffc04ff01b [qla2xxx]
14 [ffff8b1ebf803eb0] __handle_irq_event_percpu at ffffffffafd50714

Link: https://lore.kernel.org/r/20210908164622.19240-10-njavali@marvell.com
Fixes: f45bca8c5052 ("scsi: qla2xxx: Fix double scsi_done for abort path")
Cc: stable@vger.kernel.org
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Co-developed-by: David Jeffery <djeffery@redhat.com>
Signed-off-by: David Jeffery <djeffery@redhat.com>
Co-developed-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: qla2xxx: Move heartbeat handling from DPC thread to workqueue

DPC thread gets restricted due to a no-op mailbox, which is a blocking call
and has a high execution frequency. To free up the DPC thread we move no-op
handling to the workqueue. Also, modified qla_do_heartbeat() to send no-op
MBC if we don’t have any active interrupts, but there are still I/Os
outstanding with firmware.

Link: https://lore.kernel.org/r/20210908164622.19240-9-njavali@marvell.com
Fixes: d94d8158e184 ("scsi: qla2xxx: Add heartbeat check")
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: qla2xxx: Call process_response_queue() in Tx path

Process responses in Tx path if any available for better performance.

Link: https://lore.kernel.org/r/20210908164622.19240-8-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Shreyas Deodhar <sdeodhar@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: qla2xxx: Fix kernel crash when accessing port_speed sysfs file

Kernel crashes when accessing port_speed sysfs file.  The issue happens on
a CNA when the local array was accessed beyond bounds. Fix this by changing
the lookup.

BUG: unable to handle kernel paging request at 0000000000004000
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 15 PID: 455213 Comm: sosreport Kdump: loaded Not tainted
4.18.0-305.7.1.el8_4.x86_64 #1
RIP: 0010:string_nocheck+0x12/0x70
Code: 00 00 4c 89 e2 be 20 00 00 00 48 89 ef e8 86 9a 00 00 4c 01
e3 eb 81 90 49 89 f2 48 89 ce 48 89 f8 48 c1 fe 30 66 85 f6 74 4f <44> 0f b6 0a
45 84 c9 74 46 83 ee 01 41 b8 01 00 00 00 48 8d 7c 37
RSP: 0018:ffffb5141c1afcf0 EFLAGS: 00010286
RAX: ffff8bf4009f8000 RBX: ffff8bf4009f9000 RCX: ffff0a00ffffff04
RDX: 0000000000004000 RSI: ffffffffffffffff RDI: ffff8bf4009f8000
RBP: 0000000000004000 R08: 0000000000000001 R09: ffffb5141c1afb84
R10: ffff8bf4009f9000 R11: ffffb5141c1afce6 R12: ffff0a00ffffff04
R13: ffffffffc08e21aa R14: 0000000000001000 R15: ffffffffc08e21aa
FS:  00007fc4ebfff700(0000) GS:ffff8c717f7c0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000004000 CR3: 000000edfdee6006 CR4: 00000000001706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
  string+0x40/0x50
  vsnprintf+0x33c/0x520
  scnprintf+0x4d/0x90
  qla2x00_port_speed_show+0xb5/0x100 [qla2xxx]
  dev_attr_show+0x1c/0x40
  sysfs_kf_seq_show+0x9b/0x100
  seq_read+0x153/0x410
  vfs_read+0x91/0x140
  ksys_read+0x4f/0xb0
  do_syscall_64+0x5b/0x1a0
  entry_SYSCALL_64_after_hwframe+0x65/0xca

Link: https://lore.kernel.org/r/20210908164622.19240-7-njavali@marvell.com
Fixes: 4910b524ac9e ("scsi: qla2xxx: Add support for setting port speed")
Cc: stable@vger.kernel.org
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: qla2xxx: edif: Use link event to wake up app

Authentication application may be running and in the past tried to probe
driver (app_start) but was unsuccessful. This could be due to the bsg layer
not being ready to service the request. On a successful link up, driver
will use the netlink Link Up event to notify the app to retry the app_start
call.

In another case, app does not poll for new NPIV host. This link up event
would notify app of the presence of a new SCSI host.

Link: https://lore.kernel.org/r/20210908164622.19240-6-njavali@marvell.com
Fixes: 4de067e5df12 ("scsi: qla2xxx: edif: Add N2N support for EDIF")
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: qla2xxx: Fix crash in NVMe abort path

System crash was seen when I/O was run against an NVMe target and aborts
were occurring.

Crash stack is:

    -- relevant crash stack --
    BUG: kernel NULL pointer dereference, address: 0000000000000010
    :
    #6 [ffffae1f8666bdd0] page_fault at ffffffffa740122e
       [exception RIP: qla_nvme_abort_work+339]
       RIP: ffffffffc0f592e3  RSP: ffffae1f8666be80  RFLAGS: 00010297
       RAX: 0000000000000000  RBX: ffff9b581fc8af80  RCX: ffffffffc0f83bd0
       RDX: 0000000000000001  RSI: ffff9b5839c6c7c8  RDI: 0000000008000000
       RBP: ffff9b6832f85000   R8: ffffffffc0f68160   R9: ffffffffc0f70652
       R10: ffffae1f862ffdc8  R11: 0000000000000300  R12: 000000000000010d
       R13: 0000000000000000  R14: ffff9b5839cea000  R15: 0ffff9b583fab170
       ORIG_RAX: ffffffffffffffff   CS: 0010  SS: 0018
    #7 [ffffae1f8666be98] process_one_work at ffffffffa6aba184
    #8 [ffffae1f8666bed8] worker_thread at ffffffffa6aba39d
    #9 [ffffae1f8666bf10] kthread at ffffffffa6ac06ed

The crash was due to a stale SRB structure access after it was aborted.
Fix the issue by removing stale access.

Link: https://lore.kernel.org/r/20210908164622.19240-5-njavali@marvell.com
Fixes: 2cabf10dbbe3 ("scsi: qla2xxx: Fix hang on NVMe command timeouts")
Cc: stable@vger.kernel.org
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: qla2xxx: Check for firmware capability before creating QPair

Add firmware capability check of multiQ specifically for ISP25XX before
creating qpair.

Link: https://lore.kernel.org/r/20210908164622.19240-4-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: qla2xxx: Display 16G only as supported speeds for 3830c card

This card is unique and doesn't support lower speeds, hence update the fdmi
field to display 16G only.

Link: https://lore.kernel.org/r/20210908164622.19240-3-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: qla2xxx: Add support for mailbox passthru

This interface will allow user space applications to send a mailbox command
to the firmware.

Link: https://lore.kernel.org/r/20210908164622.19240-2-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Bikash Hazarika <bhazarika@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: pm80xx: Fix memory leak during rmmod

Driver failed to release all memory allocated. This would lead to memory
leak during driver removal.

Properly free memory when the module is removed.

Link: https://lore.kernel.org/r/20210906170404.5682-5-Ajish.Koshy@microchip.com
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Ajish Koshy <Ajish.Koshy@microchip.com>
Signed-off-by: Viswas G <Viswas.G@microchip.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: pm80xx: Correct inbound and outbound queue logging

Correct inbound queue and outbound queue size in 'ib_log' and 'ob_log'
sysfs entries.

Link: https://lore.kernel.org/r/20210906170404.5682-4-Ajish.Koshy@microchip.com
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Viswas G <Viswas.G@microchip.com>
Signed-off-by: Ajish Koshy <Ajish.Koshy@microchip.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: pm80xx: Fix lockup in outbound queue management

Commit 1f02beff224e ("scsi: pm80xx: Remove global lock from outbound queue
processing") introduced a lock per outbound queue. Prior to that change the
driver was using a global lock for all outbound queues.

While processing the I/O responses and events the driver takes the outbound
queue spinlock and is supposed to release it in pm8001_ccb_task_free_done()
before calling command done(). Since the older code was using a global
lock, pm8001_ccb_task_free_done() was releasing the global spin lock. The
change that split the lock per outbound queue did not consider this and
pm8001_ccb_task_free_done() was still releasing the global lock.

Link: https://lore.kernel.org/r/20210906170404.5682-3-Ajish.Koshy@microchip.com
Fixes: 1f02beff224e ("scsi: pm80xx: Remove global lock from outbound queue processing")
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Ajish Koshy <Ajish.Koshy@microchip.com>
Signed-off-by: Viswas G <Viswas.G@microchip.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: pm80xx: Fix incorrect port value when registering a device

During phyup event, the firmware provides the phy_id and port_id and driver
is supposed to use these during device handle registration. Previously the
driver was using the port id value from libsas during device handle
registration. Since id can be different from the one assigned by firmware,
this can lead to wrong device registration and drives not showing up.

Use firmware assigned port id during device registration.

Link: https://lore.kernel.org/r/20210906170404.5682-2-Ajish.Koshy@microchip.com
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Ajish Koshy <Ajish.Koshy@microchip.com>
Signed-off-by: Viswas G <Viswas.G@microchip.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: libiscsi: Move ehwait initialization to iscsi_session_setup()

Commit ec29d0ac29be ("scsi: iscsi: Fix conn use after free during resets")
moved member ehwait from 'conn' to 'session', but left the initialization
of ehwait in iscsi_conn_setup().

Although a session can only have 1 conn currently, it is better to
initialize ehwait in iscsi_session_setup() in case we implement handling
multiple conns in the future.

Link: https://lore.kernel.org/r/20210911135159.20543-1-dinghui@sangfor.com.cn
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Ding Hui <dinghui@sangfor.com.cn>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: libsas: Co-locate exports with symbols

It is standard practice to co-locate export declarations with the symbol
which is being exported. Or at least in the same file - see
sas_phy_reset().

Modify libsas to follow this practice consistently.

Link: https://lore.kernel.org/r/1631530296-32358-1-git-send-email-john.garry@huawei.com
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: hisi_sas: Increase debugfs_dump_index after dump is completed

The hisi_hba debugfs_dump_index member should increased after a dump
insertion completed, and not before it has started, so fix the code to do
so.

Link: https://lore.kernel.org/r/1629799260-120116-6-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: hisi_sas: Replace del_timer() calls with del_timer_sync()

Some usage of del_timer() in the driver is potentially unsafe.

When running the sas_task->slow_task timer in
hisi_sas_exec_internal_tmf_task(), execution may be blocked in function
hisi_sas_task_exec(); so it is possible that the timer is running when the
callback to disable the timer is running. This could be dangerous, as we
immediately release resources which the timer callback uses after disabling
the timer. The same situation may be found at other sites, such as
_hisi_sas_internal_task_abort().

Change calls to del_timer() to del_timer_sync() as necessary, to ensure any
timer has finished when disabling.

Also remove calls to timer_pending() prior to del_timer() as it is not
necessary.

Link: https://lore.kernel.org/r/1629799260-120116-5-git-send-email-john.garry@huawei.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: hisi_sas: Rename HISI_SAS_{RESET -> RESETTING}_BIT

HISI_SAS_RESET_BIT means that the controller is being reset, and so the
name is a bit vague. Rename it to HISI_SAS_RESETTING_BIT.

Link: https://lore.kernel.org/r/1629799260-120116-4-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: hisi_sas: Stop printing queue count in v3 hardware probe

The number of hardware queues is available from sysfs. Remove the print in
the v3 hardware probe function.

Link: https://lore.kernel.org/r/1629799260-120116-3-git-send-email-john.garry@huawei.com
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: hisi_sas: Use managed PCI functions

Use managed PCI functions such as pcim_enable_device() and
pcim_iomap_regions() to simplify exception handling code.

Link: https://lore.kernel.org/r/1629799260-120116-2-git-send-email-john.garry@huawei.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

Linux 5.15-rc1

Merge tag 'perf-tools-for-v5.15-2021-09-11' of git://git./linux/kernel/git/acme/linux

Pull more perf tools updates from Arnaldo Carvalho de Melo:

- Add missing fields and remove some duplicate fields when printing a
   perf_event_attr.

- Fix hybrid config terms list corruption.

- Update kernel header copies, some resulted in new kernel features
   being automagically added to 'perf trace' syscall/tracepoint argument
   id->string translators.

- Add a file generated during the documentation build to .gitignore.

- Add an option to build without libbfd, as some distros, like Debian
   consider its ABI unstable.

- Add support to print a textual representation of IBS raw sample data
   in 'perf report'.

- Fix bpf 'perf test' sample mismatch reporting

- Fix passing arguments to stackcollapse report in a 'perf script'
   python script.

- Allow build-id with trailing zeros.

- Look for ImageBase in PE file to compute .text offset.

* tag 'perf-tools-for-v5.15-2021-09-11' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (25 commits)
  tools headers UAPI: Update tools's copy of drm.h headers
  tools headers UAPI: Sync drm/i915_drm.h with the kernel sources
  tools headers UAPI: Sync linux/fs.h with the kernel sources
  tools headers UAPI: Sync linux/in.h copy with the kernel sources
  perf tools: Add an option to build without libbfd
  perf tools: Allow build-id with trailing zeros
  perf tools: Fix hybrid config terms list corruption
  perf tools: Factor out copy_config_terms() and free_config_terms()
  perf tools: Fix perf_event_attr__fprintf() missing/dupl. fields
  perf tools: Ignore Documentation dependency file
  perf bpf: Provide a weak btf__load_from_kernel_by_id() for older libbpf versions
  tools include UAPI: Update linux/mount.h copy
  perf beauty: Cover more flags in the  move_mount syscall argument beautifier
  tools headers UAPI: Sync linux/prctl.h with the kernel sources
  tools include UAPI: Sync sound/asound.h copy with the kernel sources
  tools headers UAPI: Sync linux/kvm.h with the kernel sources
  tools headers UAPI: Sync x86's asm/kvm.h with the kernel sources
  perf report: Add support to print a textual representation of IBS raw sample data
  perf report: Add tools/arch/x86/include/asm/amd-ibs.h
  perf env: Add perf_env__cpuid, perf_env__{nr_}pmu_mappings
  ...

Merge tag 'compiler-attributes-for-linus-v5.15-rc1-v2' of git://github.com/ojeda/linux

Pull compiler attributes updates from Miguel Ojeda:

- Fix __has_attribute(__no_sanitize_coverage__) for GCC 4 (Marco Elver)

- Add Nick as Reviewer for compiler_attributes.h (Nick Desaulniers)

- Move __compiletime_{error|warning} (Nick Desaulniers)

* tag 'compiler-attributes-for-linus-v5.15-rc1-v2' of git://github.com/ojeda/linux:
  compiler_attributes.h: move __compiletime_{error|warning}
  MAINTAINERS: add Nick as Reviewer for compiler_attributes.h
  Compiler Attributes: fix __has_attribute(__no_sanitize_coverage__) for GCC 4

Merge tag 'auxdisplay-for-linus-v5.15-rc1' of git://github.com/ojeda/linux

Pull auxdisplay updates from Miguel Ojeda:
"An assortment of improvements for auxdisplay:

   - Replace symbolic permissions with octal permissions (Jinchao Wang)

   - ks0108: Switch to use module_parport_driver() (Andy Shevchenko)

   - charlcd: Drop unneeded initializers and switch to C99 style (Andy
     Shevchenko)

   - hd44780: Fix oops on module unloading (Lars Poeschel)

   - Add I2C gpio expander example (Ralf Schlatterbeck)"

* tag 'auxdisplay-for-linus-v5.15-rc1' of git://github.com/ojeda/linux:
  auxdisplay: Replace symbolic permissions with octal permissions
  auxdisplay: ks0108: Switch to use module_parport_driver()
  auxdisplay: charlcd: Drop unneeded initializers and switch to C99 style
  auxdisplay: hd44780: Fix oops on module unloading
  auxdisplay: Add I2C gpio expander example

Merge tag 'smp-urgent-2021-09-12' of git://git./linux/kernel/git/tip/tip

Pull CPU hotplug updates from Thomas Gleixner:
"Updates for the SMP and CPU hotplug:

   - Remove DEFINE_SMP_CALL_CACHE_FUNCTION() which is a left over of the
     original hotplug code and now causing trouble with the ARM64 cache
     topology setup due to the pointless SMP function call.

     It's not longer required as the hotplug callbacks are guaranteed to
     be invoked on the upcoming CPU.

   - Remove the deprecated and now unused CPU hotplug functions

   - Rewrite the CPU hotplug API documentation"

* tag 'smp-urgent-2021-09-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Documentation: core-api/cpuhotplug: Rewrite the API section
  cpu/hotplug: Remove deprecated CPU-hotplug functions.
  thermal: Replace deprecated CPU-hotplug functions.
  drivers: base: cacheinfo: Get rid of DEFINE_SMP_CALL_CACHE_FUNCTION()

Merge tag 'char-misc-5.15-rc1-lkdtm' of git://git./linux/kernel/git/gregkh/char-misc

Pull misc driver fix from Greg KH:
"Here is a single patch for 5.15-rc1, for the lkdtm misc driver.

  It resolves a build issue that many people were hitting with your
  current tree, and Kees and others felt would be good to get merged
  before -rc1 comes out, to prevent them from having to constantly hit
  it as many development trees restart on -rc1, not older -rc releases.

  It has NOT been in linux-next, but has passed 0-day testing and looks
  'obviously correct' when reviewing it locally :)"

* tag 'char-misc-5.15-rc1-lkdtm' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  lkdtm: Use init_uts_ns.name instead of macros

Merge tag 'for-linus-5.15-1' of git://github.com/cminyard/linux-ipmi

Pull IPMI updates from Corey Minyard:
"A couple of very minor fixes for style and rate limiting.

  Nothing big, but probably needs to go in"

* tag 'for-linus-5.15-1' of git://github.com/cminyard/linux-ipmi:
  char: ipmi: use DEVICE_ATTR helper macro
  ipmi: rate limit ipmi smi_event failure message