Read-only storage for Machine Learning

We provide databases for machine learning and similar workloads on a dedicated SSD server connected to TSUBAME.

(2023.6.22) This service, which had been suspended for operational reasons, resumed on April 11th. Please be aware that some aspects of its usage have changed.

Note: This is an experimental service and may change or be discontinued without prior notice.

Notes

  • This storage differs from the Lustre parallel filesystem (/gs/hs*/) in several ways, and its performance may be worse than Lustre under some conditions.
    • Pros: Data is stored on RAID-0 SSD (/gs/hs*/ is RAID-6 HDD).
    • Cons: The SSD, server, and network are not parallelized, so performance drops under contention.
  • Users cannot write to the storage. If you want a database to be hosted, please refer to the section at the bottom of this page.
  • Longer downtime should be expected if an SSD fails, since RAID-0 provides no redundancy.
  • This service is available only on f_node, and the job script must explicitly declare #$ -v USE_SS=1 at its beginning (since April 2023).
    • Otherwise, the directory will not be available.
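As a rough illustration of the last note, a job script might look like the sketch below. Only f_node and #$ -v USE_SS=1 come from this page. The remaining #$ lines (-cwd, -l h_rt) follow the usual TSUBAME job script conventions and are assumptions here, as are the helper name ss_available and the variable SS_ROOT. The mount point /gs/ss0 is inferred from the database paths listed on this page.

```shell
#!/bin/sh
#$ -v USE_SS=1
#$ -cwd
#$ -l f_node=1
#$ -l h_rt=0:10:00

# Assumed mount point, inferred from the database paths on this page.
SS_ROOT="${SS_ROOT:-/gs/ss0}"

# Hypothetical helper: succeeds if the given directory exists and is readable.
ss_available() {
    [ -d "$1" ] && [ -r "$1" ]
}

if ss_available "$SS_ROOT"; then
    echo "read-only storage is available at $SS_ROOT"
    ls "$SS_ROOT"
else
    # Typical causes: the job is not on f_node, or USE_SS=1 was not declared.
    echo "read-only storage not found at $SS_ROOT" >&2
fi
```

Checking for the directory before starting the real workload makes a missing USE_SS=1 declaration show up immediately rather than as a file-not-found error later in the job.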

Available databases

  • Alphafold2 database
    • /gs/ss0/alphafold/2.1.1/data/ (to be used as $ALPHAFOLD_DATA_DIR)
    • Set the ALPHAFOLD_DATA_DIR environment variable to this path after invoking module load alphafold.
    • The version number in the path (X.X; 2.1.1 in the example above) should be replaced with the appropriate version of Alphafold. Please also refer to the original value of ALPHAFOLD_DATA_DIR.
  • ILSVRC2012 dataset (also known as ImageNet): Academic use only
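The Alphafold2 steps above can be sketched as the following fragment, meant to sit inside a job script that already declares #$ -v USE_SS=1. The variable ALPHAFOLD_VERSION is a hypothetical name introduced for illustration, and sourcing /etc/profile.d/modules.sh is an assumption about how the module command is initialized in a batch job. The version 2.1.1 is the one shown in the path on this page and may not match the module you load.

```shell
# Hypothetical variable: version directory under /gs/ss0/alphafold/.
ALPHAFOLD_VERSION="${ALPHAFOLD_VERSION:-2.1.1}"

# Assumed location of the module initialization script; skipped if absent.
[ -r /etc/profile.d/modules.sh ] && . /etc/profile.d/modules.sh

# Load the module first, then note the value it sets before overriding it.
if command -v module >/dev/null 2>&1; then
    module load alphafold || echo "module load alphafold failed" >&2
    echo "original ALPHAFOLD_DATA_DIR: ${ALPHAFOLD_DATA_DIR:-<unset>}"
fi

# Point Alphafold at the copy on the read-only SSD storage.
export ALPHAFOLD_DATA_DIR="/gs/ss0/alphafold/${ALPHAFOLD_VERSION}/data/"
echo "using ALPHAFOLD_DATA_DIR: $ALPHAFOLD_DATA_DIR"
```

Printing the original value first makes it easy to compare the version in the default path with the one selected on the SSD storage.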

For license reasons, we restrict access to data marked "Academic use only" to users at Tokyo Tech. If an academic user outside Tokyo Tech wants to access the contents, please send an inquiry.

Request for serving new databases

If you would like a database to be served, please send us an inquiry.
Please note that we cannot accommodate every request, for various reasons.

  • The database must be public and widely used.
    • A database used by only one research group will likely be rejected.
    • If complicated terms of use apply (unlike, for example, CC-BY-SA), we cannot put the data in public storage.
  • The database size must be suitable for serving from the dedicated SSD.
    • A database smaller than 1GB easily fits into a home directory or group disk.
    • A database that does not fit on the SSD (15TB with RAID-0) cannot be served from this storage.
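Before sending a request, the size criteria above can be checked with a small script like the sketch below. The function name size_verdict and the reading of 1GB and 15TB as binary units are assumptions. The 15TB limit is the total SSD capacity, so a database close to that size is unlikely to fit alongside the existing ones.

```shell
# Thresholds from this page, expressed in KiB (binary units are an assumption).
MIN_KB=$((1024 * 1024))              # 1GB
MAX_KB=$((15 * 1024 * 1024 * 1024))  # 15TB

# Hypothetical helper: classify a database size given in KiB.
size_verdict() {
    if [ "$1" -lt "$MIN_KB" ]; then
        echo "too small: fits in home directory or group disk"
    elif [ "$1" -gt "$MAX_KB" ]; then
        echo "too large: does not fit on the SSD"
    else
        echo "size is within range"
    fi
}
```

A typical call would be size_verdict "$(du -sk /path/to/dataset | cut -f1)", where /path/to/dataset is a placeholder for your own copy of the data.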