Past batch queue information

2018.3.9 10:50   Computing resources are being occupied by a specific user's jobs, so we have asked that user to adjust their submission volume. Since we currently cannot enforce a system-level limit, please cooperate by adjusting your submissions so that the total number of nodes in use stays around 72.

2018.2.21 9:30 Errors may occur when submitting jobs.

2018.2.19 9:27 A failure occurred in the power supply system, and multiple nodes stopped at around 2/18 16:05. Service was restored at 2/19 0:35, but the cause is under investigation. We will post details in a notice later.

2018.2.9 18:00  The problem where resources could not be allocated correctly when q_core was specified with two or more nodes in parallel has been resolved today.

2018.2.9 18:00  Job scheduler commands such as qsub and qdel were temporarily unavailable from 17:00 to 17:45 today. We are investigating the root cause. Currently, the scheduler appears to be working normally.

2018.2.1 17:00  If two or more nodes are specified with q_core, resources are not allocated correctly. Please see the Announcements for details.

2018.1.26 10:10  At around 0:32 today, a water leak was detected in a water-cooled rack.
After confirming the leak at Rack 1 at 7:39 today, we performed an emergency shutdown of 72 compute nodes, the smallest unit.
Jobs running on these nodes were forcibly terminated. Details will be posted in the Announcements later.

2017.12.29 6:00 Some values on the job monitoring page are rendered as 0, but the job scheduler itself appears to be working normally. (This problem was fixed at 12.29 11:30)

2017.12.20 16:30  We found a problem where no GPU is assigned for the resource type s_gpu. We will stop starting new jobs that use s_gpu. The estimated recovery time has not yet been determined.

2017.12.1 18:00 Since around 15:45, jobs have intermittently failed to be submitted. Submission currently appears to be working, but the problem may recur over the weekend.

2017.11.1 14:30 Today's maintenance was completed at noon.

2017.10.31 10:00 The Omni-Path network failure has been resolved.

2017.10.24 10:00 Service stopped for TSUBAME3.0 Grand Challenge execution.

2017.9.23 19:55   By switching your current group to a TSUBAME group other than tsubame-users with the newgrp command, you can execute UGE commands such as qsub.

2017.9.23 19:30   It seems that a failure has occurred in the batch scheduler. It is not possible to submit new jobs, display job status, or delete jobs.

2017.9.12 12:00 The Omni-Path network recovered at around 10:50.

2017.9.12 9:30 An Omni-Path network problem has occurred. The Fabric Manager and compute nodes will be restarted between 9/12 9:30 and 12:00.

2017.9.11 20:30 An Omni-Path network problem has occurred. About 200 compute nodes cannot access storage (Lustre, NFS) normally.

2017.9.1 9:30 The batch queue has been restarted. You need to set your TSUBAME points again.

2017.8.31 15:30 Scheduled maintenance started.

2017.8.17 12:00 We are regulating batch queue submission volume so that resources are not monopolized. Details are available here.