[Fault report] 2018.1.26: Nodes stopped due to water leakage in cooling rack

2018.2.27

 We apologize for the delay in issuing this notice.

1. Outline of failure
 A water leak occurred inside cooling rack #1 on Jan. 26.
 This caused 19 nodes to fail and affected all 72 jobs running in the rack.
 The leak alone would not have caused a node failure, but the leaked water hit a cooling fan and was splashed by it, so the compute nodes were also affected.
 Disassembly and visual inspection of cooling rack #1 identified the leakage points, and we have completed countermeasures against the leak.

2. Cause
 For an unknown reason, a gap formed between the gasket and the water cooling pipe, and water leaked through it.
 Since January 2018, it has been confirmed that the amount of water flowing in the water cooling pipe fluctuates with changes in the outside temperature, especially from midnight to dawn.
 We presume that these changes in the amount of water applied irregular force to the relevant part of the gasket and, as a result, damaged part of it.

3. Future action
 We will disassemble part of each cooling rack, apply full protective treatment, and conduct flow tests with a fluctuating flow rate on all racks.
 The schedule and the impact on users are currently being coordinated.


Jobs that may have been affected by the failure are as follows.

1310516, 1666865, 1667033, 1667051, 1667052, 1667592, 1667610, 1667645, 1667688, 1667750, 1667834, 1667838, 1667842, 1667853, 1668498, 1668503, 1668560, 1668572, 1668584, 1668617, 1668627, 1668638, 1668646, 1671602, 1674768, 1696549, 1696744, 1697857, 1698341, 1698631, 1702052, 1704947, 1706114, 1708423, 1708445, 1709685, 1709981, 1714738, 1716127, 1716133, 1716139, 1716328, 1716352, 1716557, 1716561, 1716574, 1716640, 1716655, 1716804, 1717149, 1717206, 1717224, 1717303, 1717482, 1717499, 1717509, 1717978, 1718184, 1718226, 1719009, 1719104, 1719111