  • [Failure Report] 30 Oct. 2017: Omni-Path network failure

    2017.10.31

    A fault occurred in the Omni-Path network as described below; it has now been resolved.

    1. Period

    30 Oct. 2017 (Mon.) 19:42 - 21:53

    2. Impact

    Around 19:42 on 30 Oct. (Mon.), a failure occurred in the Omni-Path network. During this time, about 340 compute nodes could not access the storage systems (Lustre, NFS). Storage remained accessible as normal from login0 and login1.

    To address this, we are verifying version 10.6 of the Fabric Manager (the software that controls Omni-Path), which was released on 27 Oct. (Fri.).
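
    For reference only, the commands below are one way an administrator could confirm which Fabric Manager version is installed on a management node. They are not taken from the original announcement, and the package name opa-fm and service name opafm are assumptions based on Intel's standard Omni-Path packaging.

        # Query the installed Omni-Path Fabric Manager package version
        # (package name opa-fm is an assumption).
        rpm -q opa-fm

        # Check whether the Fabric Manager service is currently running
        # (service name opafm is an assumption).
        systemctl status opafm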

    3. Recovery




  • [Trouble] occurred on 2017.9.23: batch job scheduler - Job submission fails when the default group is used

    Sep. 25, 2017

    (Added on 2017/11/01) This problem has already been resolved. You no longer need to follow the workaround below.

    Trouble has occurred in the batch scheduling system since Sep. 23. The symptom and a provisional workaround are as follows.

    Symptom:
    If you belong to the tsubame-users group (the default group), running UGE commands (qsub, qstat, qdel) results in an error.
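
    (The provisional workaround itself is not reproduced above. As an illustration only, and not part of the original announcement, the standard shell commands below show how a user could check which group is currently in effect and retry a UGE command under a specific TSUBAME group; the group name tga-example is a placeholder.)

        # Show the current effective (primary) group; "tsubame-users"
        # indicates the affected default group.
        id -gn

        # List every group the account belongs to.
        groups

        # Start a subshell whose primary group is a specific TSUBAME
        # group (tga-example is a placeholder), then retry a UGE command.
        newgrp tga-example
        qstat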