[Fault report] /gs/hs0 - occurred on Jun. 25, 2018

2018.6.25

A fault occurred, and service has now temporarily recovered.

1. Summary

Part of /gs/hs0 was inaccessible. Access has temporarily recovered, but performance may be degraded.

2. Period

From 13:39 to 13:54, on Jun. 25

3. Details

Around 13:39, a panic occurred on ossa2, which manages OSTs of Lustre (/gs/hs0), and as a result /gs/hs0 could not be accessed. Around 13:54, service was taken over by ossa3. /gs/hs0 is accessible at present.
File I/O to the Lustre file system probably stalled temporarily during the period above.
An OST that is supposed to be managed by ossa0 is currently mounted on ossa1. For that reason, I/O bandwidth to /gs/hs0 may be reduced.
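
Users who want a rough idea of whether write bandwidth is currently reduced could time a small write themselves. The following is only a minimal sketch, not an official diagnostic: the function name measure_write_throughput and the directory passed to it are hypothetical, it assumes you have write permission there, and a single small write gives a noisy estimate.

```python
import os
import tempfile
import time


def measure_write_throughput(directory, size_mib=64, chunk_mib=4):
    """Write about size_mib MiB to a temporary file in `directory`.

    Returns the observed write rate in MiB/s. The directory is an
    assumption: pass any directory you can write to under /gs/hs0.
    """
    chunk = b"\0" * (chunk_mib * 1024 * 1024)
    chunks = max(1, size_mib // chunk_mib)
    # The temporary file is removed automatically when the block exits.
    with tempfile.NamedTemporaryFile(dir=directory) as f:
        start = time.perf_counter()
        for _ in range(chunks):
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data out to storage
        elapsed = time.perf_counter() - start
    return chunks * chunk_mib / elapsed
```

For example, print(measure_write_throughput("/gs/hs0/your/directory")) prints a single figure in MiB/s, where the path is a placeholder. Comparing figures taken now and after maintenance is more meaningful than any single value.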

This fault is thought to be of the same kind as the faults that occurred on Jun. 15, May 24, and Jun. 4.

The cause of the failure is known and can be corrected, but we are carefully considering when to apply the correction because it will require a large-scale service stop.

Maintenance to return the OST to its original server is scheduled for Jun. 27.