If peers are "simultaneously" told to disconnect from each other,
either explicitly, or implicitly by taking down the resource,
with bad timing, one side may see its disconnect "fail" with
a result of "state change failed by peer", and interpret this as
"please oudate yourself".
Try to catch this by checking for current connection status,
and possibly retry as local-only state change instead.
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
static enum drbd_state_rv conn_try_disconnect(struct drbd_connection *connection, bool force)
{
+ enum drbd_conns cstate;
enum drbd_state_rv rv;
+repeat:
rv = conn_request_state(connection, NS(conn, C_DISCONNECTING),
force ? CS_HARD : 0);
break;
case SS_CW_FAILED_BY_PEER:
+ spin_lock_irq(&connection->resource->req_lock);
+ cstate = connection->cstate;
+ spin_unlock_irq(&connection->resource->req_lock);
+ if (cstate <= C_WF_CONNECTION)
+ goto repeat;
/* The peer probably wants to see us outdated. */
rv = conn_request_state(connection, NS2(conn, C_DISCONNECTING,
disk, D_OUTDATED), 0);