Recovery of group disk /gs/hs0 (6/6)

2018.5.28

Due to a failure in /gs/hs0 that occurred on May 25, part of the file system has been running in degraded operation. We will carry out the restoration work described below. Please note that the login nodes and load balancers will be maintained on June 6.

 

1. Date of implementation

 Wednesday, June 6, 2018 13:30-14:30 *The end time is approximate and may change.

2. Contents

 The OST normally managed by ossa3 (one of the OSSs that make up /gs/hs0) is currently under the control of ossa2 because of the failure. This work returns it to ossa3 (failback).

3. Impact

  /gs/hs0 will remain in service during the work. I/O will be delayed by about 30 minutes, but, as is the behavior of the Lustre file system, it will continue without timing out.

4. Future work

 The failure of ossa3 was caused by a known bug in Lustre that has been fixed in a newer version. Since updating the version requires a system shutdown, we plan to apply the update in conjunction with the campus power outage in August.

Glossary

  OST (Object Storage Target): In the Lustre file system, a collection of disks that actually stores the contents of files.

  OSS (Object Storage Server): In the Lustre file system, a server that actually sends and receives the contents of files to and from the compute nodes.