[Fault report] /gs/hs0 - occurred on Jun. 4, 2018

2018.6.4

A fault occurred and has now been temporarily recovered.

1. Summary

Part of /gs/hs0 was inaccessible. Access has been temporarily restored, but performance may be degraded.

2. Period

From 01:52 to 02:06, on Jun. 4

3. Details

Around 01:52, a panic occurred on ossa1, which manages OSTs of the Lustre file system (/gs/hs0), making part of /gs/hs0 inaccessible.
Around 01:52, its role was taken over by ossa0. /gs/hs0 is accessible at present.
File I/O to the Lustre file system probably stalled temporarily during the period above.

This fault is thought to be of the same kind as the fault that occurred on Jun. 4, but because it occurred on a different OSS pair, the system is now in a further degraded state.

The group disk recovery on Jun. 6 will resolve the degraded state, but it does not address the root cause, so the fault may recur. We had planned to fix it during the power outage in August, but the fix may be applied earlier, even if that requires stopping the service, so please watch for future announcements.