2017.11.16
We will shut down the entire TSUBAME service for maintenance to fix the frequent Omni-Path network defects and related failures.
1. Time window
Dec. 4 (Mon) 9:00 - 18:00
※Services will be restarted one by one as soon as the work is done.
2. Impact
- Login node (will be down; users cannot log in)
- Job Scheduler (all compute nodes will be down)
- Storage Service (will also be down; /home, /gs/hs0, /gs/hs1 and /gs/hs2 will be inaccessible)
- TSUBAME portal
- License servers (lice0, remote and t3ldap1 will be down. Software distributed on campus that relies on them cannot be used.)
3. Available services
- TSUBAME hosting (Operating independently from TSUBAME service)
- Information education computer systems (iMacs in the computer room; under confirmation) -> An outage is expected from 12:15 to 13:15.
- TSUBAME Website (This page)
4. Purpose
- Upgrading the fabric software on all equipment connected to the Omni-Path network (from v10.4 to v10.6)
- Handling Omni-Path network failures
- Enabling the GPUDirect RDMA function
Additional scheduled updates (added 11/29)
Lustre: from lustre-2.7.21.3.ddn9.gbd2c642 to lustre 2.7.21.3-ddn11
NVIDIA Driver: 384.66
OpenMPI: to 2.1.2
5. Important points
- OpenMPI
Programs built with an older version of Open MPI must be recompiled with Open MPI 2.1.2 (module name: openmpi/2.1.2). Programs that are not recompiled may not work properly.
- CUDA (added 11/29)
Applications built with an older version of CUDA (module name: cuda or cuda/8.0.44) must be recompiled with CUDA 8.0.61 (module name: cuda/8.0.61).
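
As a rough aid for checking job scripts against the two points above, the following Python sketch flags module names that imply a recompile. The function needs_recompile and the REQUIRED table are hypothetical helpers written for illustration, not part of TSUBAME. The version numbers are taken from this notice. Treating an unversioned name such as cuda as old is an assumption based on the wording above, and the sketch only handles purely numeric versions.

```python
# Versions required after the maintenance, as stated in this notice.
REQUIRED = {"openmpi": (2, 1, 2), "cuda": (8, 0, 61)}


def needs_recompile(module_name):
    """Return True if a program built with this module should be rebuilt.

    module_name is a string such as "cuda/8.0.44" or a bare "cuda".
    """
    package, _, version = module_name.strip().partition("/")
    required = REQUIRED.get(package)
    if required is None:
        # Package is not mentioned in this notice.
        return False
    if not version:
        # Assumption: an unversioned module name refers to the older release.
        return True
    # Compare numeric components, e.g. (8, 0, 44) < (8, 0, 61).
    return tuple(int(part) for part in version.split(".")) < required


if __name__ == "__main__":
    for name in ["openmpi/2.1.1", "openmpi/2.1.2", "cuda", "cuda/8.0.44", "cuda/8.0.61"]:
        status = "recompile" if needs_recompile(name) else "ok"
        print(name, status)
```

Versions are compared as tuples of integers rather than as strings, so that a component such as 61 is ordered correctly against 44 or 100.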