Notice of full TSUBAME service outage for Omni-Path network update (operation date 12/4) (last modified 11/29)

2017.11.16

We will shut down the entire TSUBAME service for maintenance to fix frequent Omni-Path network defects and related failures.

1. Time window

Dec. 4 (Mon) 9:00 - 18:00
※Services will be restarted one by one as soon as the work is completed.

2. Impact

 - Login nodes (will be down; users cannot log in)
 - Job scheduler (all compute nodes will be down)
 - Storage service (will also be down; /home, /gs/hs0, /gs/hs1 and /gs/hs2 cannot be accessed)
 - TSUBAME portal
 - License servers (lice0, remote and t3ldap1 will be down; campus-licensed software that relies on them cannot be used)

3. Available services

 - TSUBAME hosting (operates independently of the TSUBAME service)
 - Information education computer system (iMacs in the computer room; previously under confirmation) -> Will be down from 12:15 to 13:15.
 - TSUBAME website (this page)

4. Purpose

 - Upgrading the fabric software on all equipment connected to the Omni-Path network (from v10.4 to v10.6)
 - Handling Omni-Path network failures
 - Enabling the GPUDirect RDMA function

Additional updates (added 11/29)
Lustre: lustre-2.7.21.3.ddn9.gbd2c642 -> lustre 2.7.21.3-ddn11
NVIDIA driver: to 384.66
Open MPI: to 2.1.2

5. Important points

  • Open MPI
    Programs built with an older version of Open MPI must be recompiled with Open MPI 2.1.2 (module name: openmpi/2.1.2). Programs that are not recompiled may not work properly.
  • CUDA (added 11/29)
    Applications built with an older version of CUDA (module name: cuda or cuda/8.0.44) must be recompiled with CUDA 8.0.61 (module name: cuda/8.0.61).