Issues
Control ID | Confirmed | Updated | Detail |
---|---|---|---|
T3KI-20180817 | 2018/8 | | In Abaqus/Explicit, the following error may occur during parallel execution: Abaqus Error: Abaqus/Explicit Packager exited with an error - Please see the |
T3KI-20180731 | 2018/7 | | The job fails with the following error: xxx:yyy terminated with signal 11 at PC=0 SP=7fffffffa558. Backtrace: There appear to be several possible triggers for this error. As a workaround, please try one of the following: 1. export I_MPI_FABRICS=shm:tcp 2. If the error occurs with 28 processes per node (ppn), try ppn = 16. 3. If the error occurs with mpirun, try mpiexec.hydra instead. |
T3KI-20180629 | 2018/6 | | In large-scale runs, timeout errors occur randomly in MPI collective functions with both Intel MPI and OpenMPI. As a workaround, set the following variable before mpirun/mpiexec.hydra in the job script: export HFI_UNIT=0 |
T3KI-20180531 | 2018/5 | | It was confirmed that MPI_Allgather does not work properly (the communication result is not obtained correctly) when performed on GPU memory using OpenMPI 2.1.1 and 2.1.2 installed on TSUBAME 3.0. As a workaround, pass the following options: mpirun -mca coll_tuned_use_dynamic_rules 1 -mca coll_tuned_allgather_algorithm 2 Detailed information: Problem with collective communication for GPU memory with OpenMPI |
T3KI-20180420 | 2018/4 | | If OpenMPI sends and receives 2 bytes of data in large quantities, a segmentation fault may occur. As a workaround, set the following environment variable: export PSM2_MQ_RNDV_HFI_THRESH=128000 |
T3KI-20180301 | 2017/12 | 2018/4/5 | Points may be consumed excessively. Excess points will be returned automatically in sequence (no individual notification to users). (Apr 5, 2018) We regularly detect affected cases and return the points while working toward a fundamental solution. |
T3KI-20171222A | 2017/12/22 | | The login shell cannot be changed from csh to another shell with the chsh command. |
T3KI-10171207A | 2017/11 | 2018/4/5 | When a job submission fails, points that are not displayed on the portal are also temporarily held (a problem that consumes excess points). We have started returning them automatically in sequence (no individual notification to users). (Apr 5, 2018) We regularly detect affected cases and return the points while working toward a fundamental solution. |
T3KI-20171130A | 2017/11 | 2018/4/5 | On the portal, "Processing" is displayed even for a finished job, and the temporary hold on its points is not released (a problem that consumes excess points). (Apr 5, 2018) We regularly detect affected cases and return the points while working toward a fundamental solution. |
T3KI-20171031A | 2017/10 | 2018/4/5 | The TSUBAME point usage status may show a negative value in the usage history. (Apr 5, 2018) We regularly detect affected cases and return the points while working toward a fundamental solution. |
T3KI-20170926A | 2017/9/26 | 2017/12/21 | When fnode was specified, a problem occurred in the resource map (CPU core and GPU topology) on the batch system. For example, only 21 of the 28 physical cores can be used. We recognize this as a batch system malfunction and are working with the vendor on a fix. |
T3KI-20170914A | 2017/8/1 | | Because there is a problem with the operation of the reservation function, we are not releasing it for now. |
T3KI-20170913A | 2017/9/13 | | On Ubuntu 16.04, logging in to TSUBAME 3.0 with SSH key authentication may fail with the error "sign_and_send_pubkey: signing failed: agent refused operation." This can be resolved by registering the key in advance with the ssh-add command in the terminal. |
T3KI-20170829D | 2017/8/29 | | When starting Ansys Fluent with Cygwin/X, a segmentation fault may occur. Please try another combination such as PuTTY + Xming. |
T3KI-20170829C | 2017/8/25 | | COMSOL fails to start with an error message when connecting to TSUBAME 3.0 from XQuartz on macOS Sierra 10.12. This is very likely a malfunction caused by a compatibility problem with OpenGL on the Mac. Please connect with the following options: "$ ssh -YC login.t3.gsic.titech.ac.jp -l USER-ID". |
T3KI-20170829B | 2017/8/29 | | A TSUBAME account cannot be created unless a TITECH common mail address has been created. Please obtain a TITECH mail address first. We will also fix the portal so that it states this requirement. |
T3KI-20170824A | 2017/8/24 | | Modules may not be loaded on the second and subsequent nodes. We recognize that LD_LIBRARY_PATH is not passed to the second and subsequent nodes because of a problem in the job scheduler. Please add the -v option, which specifies environment variables, to the UGE job script and specify the libraries required for the calculation. |
T3KI-20170822A | 2017/8/18 | | The terminal freezes when using qrsh. This happens because flow control is enabled in the terminal settings, so a specific key operation (Ctrl + s) cannot be used when operating on the remote host by rsh. Execute the following command before running qrsh. |
T3KI-20170818B | 2017/8/18 | | Jobs exceeding 24 hours can be submitted even if not on reserved nodes. |
T3KI-20170818A | 2017/8/18 | 2017/9/14 | LAMMPS cannot be executed on multiple nodes. This is a bug in the job scheduler UGE. As a workaround, add the following line to the job script: #$ -v LD_LIBRARY_PATH=/apps/t3/sles12sp2/cuda/8.0/lib.375.66:/apps/t3/sles12sp2/cuda/8.0/lib64 |
T3KI-20170802B | 2017/8/1 | 2017/8/3 | Clicking the link in the group invitation mail may not work properly. Depending on the e-mail program, the "=" at the end of the link text may not be included in the link. In that case, copy the full text including the "=" and paste it into the browser. |
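The environment-variable and mpirun workarounds listed in the table above could be placed in a job script roughly as follows. This is only a sketch: the values are copied from the table entries, while `./a.out` is a hypothetical program name, and showing all workarounds in one script is an assumption made for illustration. In practice, enable only the lines that match the issue you are hitting.

```shell
#!/bin/bash
# T3KI-20170818A / T3KI-20170824A: pass LD_LIBRARY_PATH to all nodes via UGE
# (path copied from the T3KI-20170818A entry; adjust to the libraries you need).
#$ -v LD_LIBRARY_PATH=/apps/t3/sles12sp2/cuda/8.0/lib.375.66:/apps/t3/sles12sp2/cuda/8.0/lib64

# T3KI-20180731: job terminated with signal 11 (first of the three workarounds)
export I_MPI_FABRICS=shm:tcp

# T3KI-20180629: random timeouts in MPI collectives at large scale
export HFI_UNIT=0

# T3KI-20180420: segmentation fault with many 2-byte OpenMPI transfers
export PSM2_MQ_RNDV_HFI_THRESH=128000

# T3KI-20180531: MPI_Allgather on GPU memory with OpenMPI 2.1.1 / 2.1.2
ALLGATHER_OPTS="-mca coll_tuned_use_dynamic_rules 1 -mca coll_tuned_allgather_algorithm 2"

# ./a.out is a hypothetical binary; the command is only printed here, not run.
MPI_CMD="mpirun $ALLGATHER_OPTS ./a.out"
echo "$MPI_CMD"
```

The exports must appear before the mpirun or mpiexec.hydra line so that the launched processes inherit them.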
Resolved
Control ID | Confirmed | Updated | Detail |
---|---|---|---|
T3KI-20171221A | 2017/12/20 | 2018/4/5 | Since the job scheduler update on December 19, a GPU cannot be secured with s_gpu. Supply of resource type_s is stopped because the cause cannot be identified. (2018/1/11) Although we have reproduced the defect in verification environments, no concrete solution has been found and the resolution time is undecided. (2018/4/5) It was solved during the year-end maintenance. |
T3KI-20180201 | 2017/2/1 | 2017/2/13 | If q_core=2 or more is specified, 4 cores cannot be assigned properly on each node. q_core=1 works without problems. (2/13) Fixed. |
T3KI-20170925A | 2017/9/23 | 2017/11/1 | qsub, qstat, and qdel have been failing since 23 Sep. This is a bug in the batch scheduler. As a temporary workaround, specify your group with the newgrp command. Click here for details. (11/1) It was fixed in today's scheduler version upgrade. |
T3KI-20170829A | 2017/8/28 | 2017/9/14 | The job usage history shown on the portal may differ from that on the login node. For example, STATUS may be displayed as "処理中(r)[being processed]" even though the job has ended. The information on the login node is correct. The cumulative usage points will also be corrected. (xx Sep.) Fixed. |
T3KI-20170825A | 2017/8/23 | 2017/8/25 | In some cases the local scratch area is not created when multiple nodes are used. Resolved by fixing the batch scheduler UGE. |
T3KI-20170822B | 2017/8/1 | 2017/8/23 | When applying for TSUBAME 3.0 on the portal, some users cannot create an account and receive the error "An unexpected error has occured, please contact the system administrator". Changing the browser was not effective. (23 Aug.) Cause identified and fixed. |
T3KI-20170803A | 2017/8/3 | 2017/9/14 | The following error is displayed at job submission and qrsh execution. It is a temporary error; please re-execute. |
T3KI-20170802F | 2017/8/2 | 2017/9/14 | An error occurs when switching languages on the TSUBAME portal. (xx Sep.) It has been fixed. Please contact us in case of problems. |
T3KI-20170802D | 2017/8/1 | 2017/9/14 | Even when the password is changed on the portal, it does not show whether the change succeeded or failed. (xx xx) A dialog notifying the result of the change is now displayed. |
T3KI-20170802C | 2017/8/1 | 2017/9/14 | Cannot connect to the storage service (CIFS). (xx Sep.) Connections are now possible. |
T3KI-20170802A | 2017/8/1 | 2017/8/7 | New applications cannot be submitted from some browsers. (7 Aug.) We fixed the portal on 3 August 2017. If it still does not work, please use another browser as a workaround: Windows 10 + IE -> Firefox; macOS Sierra + Chrome 57 -> Safari. Also, please confirm that JavaScript is enabled. |
T3KI-20170803B | 2017/8/3 | 2017/8/4 | The group invitation mail expires in 30 minutes, which is too short. The expiration period will be changed to one week. (4 Aug.) The expiration period of the group invitation mail has been changed to one week. Please also refer to T3KI-20170802B. |
Not a bug, by design
Control ID | Confirmed | Updated | Detail |
---|---|---|---|
T3KI-20170802E | 2017/8/1 | 2017/9/14 | An error may occur in the inquiry form. It may be triggered when a character string that matches a system command such as "chmod" is detected. For the time being, please write such strings in double-byte characters. |
T3Ki-20170926A | 2017/8/1 | | An application from the holder of an access card with an 8-digit number beginning with A is not approved. Because anyone who wants such an access card can obtain one, you need to submit a document proving your identity. Please visit the account acquisition page. |