Download ImageNet dataset? A true story
The ImageNet dataset is used very widely in research papers, but downloading it from the official site is now impossible.
ImageNet dataset types
When we speak about ImageNet we should specify which version we mean, because there are several. The most complete and most widely used one is ILSVRC2012 (ImageNet Large Scale Visual Recognition Challenge 2012).
Aim
In this article I explain how to download and prepare this dataset. I put everything you need into this repo.
Download
Start by cloning the repo inside the folder where you want to keep the dataset:
mkdir ImageNet-ILSVRC2012
cd ImageNet-ILSVRC2012
git clone https://gitlab.com/nicolalandro/download_and_prepare_imagenet_dataset.git
You can use the torrent files inside the repo folder to download the train and val tar files. You can also get these torrent files yourself from Academic Torrents.
If you do not want to use a graphical torrent client, you can do:
cd download_and_prepare_imagenet_dataset
transmission-remote -a "ILSVRC2012_img_train.tar-a306397ccf9c2ead27155983c254227c0fd938e2.torrent"
transmission-remote -a "ILSVRC2012_img_val.tar-5d6d0df7ed81efd49ca99ea4737e0ae5e3a5f2e5.torrent"
Untar
Before starting this step, put both tar files inside the ImageNet-ILSVRC2012 folder:
cp /<download>/<path>/ILSVRC2012_img_train.tar .
cp /<download>/<path>/ILSVRC2012_img_val.tar .
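Since the extraction takes a long time, it can be worth checking that the copied archives are intact first. The following is only a minimal sketch: the function name `verify_md5` is my own, it assumes the GNU `md5sum` tool is installed, and the expected checksum has to come from a source you trust (this article does not provide one).

```shell
# Hypothetical helper, not part of the repo.
# Usage: verify_md5 <file> <expected md5 hex string>
# The body is wrapped in ( ) so its variables stay local to the function.
verify_md5() (
    file="$1"
    expected="$2"
    # md5sum prints "<hash>  <file name>"; keep only the hash
    actual=$(md5sum "$file" | cut -d ' ' -f 1)
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (got $actual)" >&2
        return 1
    fi
)

# Example (the checksum placeholder must be replaced by a trusted value):
# verify_md5 ILSVRC2012_img_val.tar "<expected md5>"
```

The function returns a non-zero status on mismatch, so it can be chained with `&&` before the untar commands.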
Untarring the train archive is not a plain extraction, because it contains one sub-tar for each class. The validation archive, instead, does not contain any subfolders, so it needs some extra work. I wrote a script for each of these two tasks, so:
cp download_and_prepare_imagenet_dataset/*.sh .
./untar_train.sh
./untar_val.sh
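To give an idea of what these two steps involve, here is a minimal sketch. It is not the content of untar_train.sh or untar_val.sh. It assumes the train archive holds one .tar per class, as described above, and that you have a labels file that maps each validation image to its class id. The function names `extract_train` and `extract_val`, and the labels file format, are hypothetical.

```shell
# Sketch only, under the assumptions stated above.
# Function bodies are wrapped in ( ) so their variables stay local.

# Unpack the outer train archive, then unpack every per-class sub-tar
# into a folder named after it (e.g. n01440764.tar -> <out_dir>/n01440764/).
extract_train() (
    train_tar="$1"
    out_dir="$2"
    mkdir -p "$out_dir"
    tar -xf "$train_tar" -C "$out_dir"
    for class_tar in "$out_dir"/*.tar; do
        [ -e "$class_tar" ] || continue      # no sub-tars found
        class_dir="${class_tar%.tar}"
        mkdir -p "$class_dir"
        tar -xf "$class_tar" -C "$class_dir"
        rm -f "$class_tar"                   # free disk space once unpacked
    done
)

# Unpack the flat validation archive, then move every image into a
# folder named after its class.
# Labels file (hypothetical format): "<image file name> <class id>" per line.
extract_val() (
    val_tar="$1"
    out_dir="$2"
    labels_file="$3"
    mkdir -p "$out_dir"
    tar -xf "$val_tar" -C "$out_dir"
    while read -r image class_id; do
        [ -n "$image" ] || continue          # skip empty lines
        mkdir -p "$out_dir/$class_id"
        mv "$out_dir/$image" "$out_dir/$class_id/"
    done < "$labels_file"
)

# Example (val_labels.txt is a hypothetical file name):
# extract_train ILSVRC2012_img_train.tar train
# extract_val ILSVRC2012_img_val.tar val val_labels.txt
```

Deleting each sub-tar right after it is unpacked keeps the peak disk usage lower, which matters with archives of this size.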
Human readable folder names
Now we have the dataset in two folders, train and val, each containing one subfolder of images per class. However, the folder names are codes that are not human readable.
If you want to replace them with the real class names, you can run the following:
cp download_and_prepare_imagenet_dataset/*.py .
cp download_and_prepare_imagenet_dataset/*.json .
python3.6 id_to_class_train.py
python3.6 id_to_class_val.py
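The repo does this step with Python scripts and JSON files, which I do not reproduce here. Just to show the idea, here is a minimal shell sketch of the same kind of renaming. The function name `rename_class_dirs` and the plain-text mapping file are hypothetical, they are not what the repo uses.

```shell
# Sketch only: rename class folders from their code to a readable name.
# Mapping file (hypothetical format): "<class id> <readable name>" per line,
# for example: n01440764 tench
# The body is wrapped in ( ) so its variables stay local to the function.
rename_class_dirs() (
    root_dir="$1"
    mapping_file="$2"
    # class_name receives the rest of the line, so names may contain spaces
    while read -r class_id class_name; do
        [ -n "$class_id" ] && [ -n "$class_name" ] || continue
        [ -d "$root_dir/$class_id" ] || continue
        # Never merge two classes if they happen to map to the same name
        if [ -e "$root_dir/$class_name" ]; then
            echo "skip $class_id: '$class_name' already exists" >&2
            continue
        fi
        mv "$root_dir/$class_id" "$root_dir/$class_name"
    done < "$mapping_file"
)

# Example (id_to_name.txt is a hypothetical file name):
# rename_class_dirs train id_to_name.txt
# rename_class_dirs val id_to_name.txt
```

The check on an existing target folder is there because `mv` would otherwise move the second folder inside the first one.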
Long task? Use nohup
I advise you to use nohup with a different log file for each task, because these tasks take a long time. For example:
nohup <do> <the> <command 1> > command_1.out 2>&1 &
In this way all the output ends up in the log file and, thanks to the trailing &, the process runs in the background.
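As a concrete version of the pattern above, here is a small sketch of a wrapper that starts a command with nohup in the background and gives it its own log file. The function name `run_in_background` and the log file names are my own choice, not something from the repo.

```shell
# Hypothetical helper: run a command in the background, immune to hangups,
# with stdout and stderr collected in a dedicated log file.
# Usage: run_in_background <log file> <command> [args...]
run_in_background() {
    log_file="$1"
    shift
    nohup "$@" > "$log_file" 2>&1 &
    echo "started PID $! (log: $log_file)"
}

# Example:
# run_in_background untar_train.out ./untar_train.sh
# run_in_background untar_val.out ./untar_val.sh
# tail -f untar_train.out    # follow the progress, Ctrl+C stops only tail
```

Stopping `tail -f` does not stop the task, and the task keeps running even if you close the terminal.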
Conclusions
This is how I managed to download and prepare the ImageNet dataset.
I hope this article can be useful to someone. Happy training!