Troubleshooting job execution on reserved nodes

This page summarizes what to check when a job cannot be submitted, or does not run, within a node reservation.
The commands below are examples in which the GSIC group runs jobs under AR ID 20190108, a reservation that lasts 2 days.

1. Forgot to specify the AR ID

Bad example
The following command does not use the reservation; the job is submitted as a normal job.

$ qsub -g GSIC hoge.sh

Good example
Always specify the -ar option when running a job in a reservation.

$ qsub -g GSIC -ar 20190108 hoge.sh
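One way to guard against this mistake is a small wrapper around qsub. The sketch below is only an illustration: the function name qsub_reserved is hypothetical and not part of the scheduler, and it simply refuses to submit when -ar is missing from the arguments.

```shell
#!/bin/bash
# Hypothetical wrapper (not a TSUBAME command): pass the arguments to qsub
# only when the -ar option is present, so a reserved job is never submitted
# as a normal job by accident.
qsub_reserved() {
    local arg
    for arg in "$@"; do
        if [ "$arg" = "-ar" ]; then
            qsub "$@"      # -ar was given: submit unchanged
            return         # return qsub's exit status
        fi
    done
    echo "refusing to submit: no -ar option given" >&2
    return 1
}
```

For example, qsub_reserved -g GSIC hoge.sh prints the error and submits nothing, while qsub_reserved -g GSIC -ar 20190108 hoge.sh is passed to qsub as is.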

 

2. h_rt is longer than the reserved time


If the time specified with the h_rt option is longer than the reserved time, the job will not run.
In addition, the reserved nodes can only be used until 5 minutes before the reservation end time, so specify an h_rt that is at least 5 minutes shorter than the reserved time.

Bad example
The job is not executed because h_rt takes up the full reserved time and leaves no 5-minute margin.

$ grep h_rt hoge.sh
#$ -l h_rt=48:00:00
$ qsub -g GSIC -ar 20190108 hoge.sh

Good example (h_rt is the reserved time minus 5 minutes)

$ grep h_rt hoge.sh
#$ -l h_rt=47:55:00
$ qsub -g GSIC -ar 20190108 hoge.sh
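The comparison between the two examples above can be checked before submission. The following is a minimal sketch: the function names hrt_to_seconds and check_h_rt are hypothetical, and it assumes that h_rt is written in H:MM:SS form on a "#$ -l" line of the job script, as in the examples on this page.

```shell
#!/bin/bash
# Hypothetical helpers (not TSUBAME commands).

# Convert H:MM:SS to seconds. 10# forces base 10 so that "08" is not read as octal.
hrt_to_seconds() {
    local h m s
    IFS=: read -r h m s <<< "$1"
    echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# check_h_rt SCRIPT RESERVED_SECONDS
# Succeeds only when the script's h_rt fits in the reserved time minus 5 minutes.
check_h_rt() {
    local script=$1 reserved_sec=$2
    local margin_sec=300   # 5-minute margin described in the text
    local hrt
    # Take the first h_rt value found on a "#$" directive line.
    hrt=$(sed -n 's/^#\$ .*h_rt=\([0-9:]*\).*/\1/p' "$script" | head -n 1)
    if [ -z "$hrt" ]; then
        echo "no h_rt line found in $script" >&2
        return 2
    fi
    if (( $(hrt_to_seconds "$hrt") > reserved_sec - margin_sec )); then
        echo "h_rt=$hrt is too long for this reservation" >&2
        return 1
    fi
    echo "h_rt=$hrt fits"
}
```

For the 2-day reservation used here, check_h_rt hoge.sh 172800 fails for h_rt=48:00:00 and succeeds for h_rt=47:55:00.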


When a job is submitted after the reservation start time, for example because a program terminated abnormally or because the job could not be submitted before the reservation started, the time that has already elapsed must also be taken into account.
For example, a job submitted 2 hours after the reservation start time would use the following setting, assuming one minute of internal processing time between running the qsub command and the allocation of compute nodes (48:00:00 - 2:00:00 - 0:05:00 - 0:01:00 = 45:54:00).

$ grep h_rt hoge.sh
#$ -l h_rt=45:54:00
$ qsub -g GSIC -ar 20190108 hoge.sh
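The arithmetic behind this value can be sketched as a small shell function. The name safe_h_rt and its argument order are hypothetical; all arguments are in minutes, and the 5-minute margin is the one described in this section.

```shell
#!/bin/bash
# Hypothetical helper (not a TSUBAME command).
# safe_h_rt RESERVED_MIN [ELAPSED_MIN] [OVERHEAD_MIN]
#   RESERVED_MIN  length of the reservation in minutes
#   ELAPSED_MIN   minutes already elapsed since the reservation start (default 0)
#   OVERHEAD_MIN  assumed internal processing time from qsub to node allocation (default 0)
safe_h_rt() {
    local reserved_min=$1
    local elapsed_min=${2:-0}
    local overhead_min=${3:-0}
    local margin_min=5   # nodes are usable only until 5 minutes before the end
    local remaining=$(( reserved_min - elapsed_min - overhead_min - margin_min ))
    if (( remaining <= 0 )); then
        echo "no usable time left in the reservation" >&2
        return 1
    fi
    # Print the result in H:MM:SS form for use as h_rt.
    printf '%d:%02d:00\n' $(( remaining / 60 )) $(( remaining % 60 ))
}

safe_h_rt $(( 48 * 60 ))          # prints 47:55:00
safe_h_rt $(( 48 * 60 )) 120 1    # prints 45:54:00
```

The second call reproduces the value used above: 2 days, minus 2 hours elapsed, minus 1 minute of processing time, minus the 5-minute margin.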

Related URLs

TSUBAME3.0 User’s Guide "5.3. Reserve compute nodes"

TSUBAME portal User's Guide "9. Reserving compute nodes"

About specification of batch job scheduler

Main differences between TSUBAME 2.5 and TSUBAME 3.0 ( node reservation )