【Failure report】2021.10.21: Interactive queue failure

A failure occurred, and service has been restored.

1. Overview

 The interactive queue was unavailable.

2. Period

 2021/10/21 (Thu), from approximately 13:05 to approximately 14:12

3. Details

    At around 13:05 on October 21, one of the two job management servers (jobcon1) became unresponsive. As a result, the dedicated interactive queue was temporarily unavailable. At 13:31 the same day, we power-cycled the server and confirmed that it started normally. After a hardware check, we started the job scheduler for the interactive queue at 14:12 and resumed service.

4. Affected jobs

   Jobs 1005333, 1005345, and 1005338 were running during this period; we confirmed that each of them completed normally.