Download ImageNet dataset? A true story

Nicola Landro
2 min read · Aug 22, 2020

The ImageNet dataset is used very heavily in research papers, but downloading it from the official site is now impossible.

ImageNet dataset types

When we speak about ImageNet we should specify which one, because there are many versions of it. The most complete and most used ImageNet dataset is ILSVRC2012 (ImageNet Large Scale Visual Recognition Challenge 2012).

Aim

In this article I explain how to download and prepare this dataset. I put everything you need into this repo.

Download

Start by cloning the repo inside the folder where you want to keep the dataset:

mkdir ImageNet-ILSVRC2012
cd ImageNet-ILSVRC2012
git clone https://gitlab.com/nicolalandro/download_and_prepare_imagenet_dataset.git

You can use the torrent files inside the repo folder to download the train and val tar files. You can also get these torrent files yourself from Academic Torrents.

If you do not want to use a graphical torrent client, you can do:

cd download_and_prepare_imagenet_dataset
transmission-remote -a "ILSVRC2012_img_train.tar-a306397ccf9c2ead27155983c254227c0fd938e2.torrent"
transmission-remote -a "ILSVRC2012_img_val.tar-5d6d0df7ed81efd49ca99ea4737e0ae5e3a5f2e5.torrent"

Untar

Start this section with all the tar files inside the ImageNet-ILSVRC2012 folder.

cp /<download>/<path>/ILSVRC2012_img_train.tar .
cp /<download>/<path>/ILSVRC2012_img_val.tar .

Untarring the train archive takes a little more than a plain untar, because it contains a sub-tar for each class. The validation archive, on the other hand, contains no subfolders, so it needs some extra work. I wrote a script for each of these two tasks, so:

cp download_and_prepare_imagenet_dataset/*.sh .
./untar_train.sh
./untar_val.sh
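
To give an idea of what these two steps involve, here is a minimal Python sketch. It is not the code of the repo's scripts. It assumes that the train archive holds one inner tar per class, named after the class code, and that you have a dict (here the hypothetical `val_labels`) mapping each validation image file name to its class code. The function names are illustrative too.

```python
import os
import shutil
import tarfile


def untar_train(train_tar, out_dir):
    """Extract the outer tar, then unpack each per-class inner tar into its own folder."""
    os.makedirs(out_dir, exist_ok=True)
    with tarfile.open(train_tar) as outer:
        outer.extractall(out_dir)
    for name in sorted(os.listdir(out_dir)):
        if not name.endswith(".tar"):
            continue
        inner_path = os.path.join(out_dir, name)
        # Assumption: the inner tar is named "<class code>.tar"
        class_dir = os.path.join(out_dir, name[: -len(".tar")])
        os.makedirs(class_dir, exist_ok=True)
        with tarfile.open(inner_path) as inner:
            inner.extractall(class_dir)
        os.remove(inner_path)  # drop the inner tar once unpacked to save disk space


def sort_val(val_dir, val_labels):
    """Move each flat validation image into a subfolder named after its class code."""
    for file_name, class_id in val_labels.items():
        class_dir = os.path.join(val_dir, class_id)
        os.makedirs(class_dir, exist_ok=True)
        shutil.move(os.path.join(val_dir, file_name),
                    os.path.join(class_dir, file_name))
```

With this layout both train and val end up with one subfolder per class, which is the structure the next section starts from.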

Human readable folder names

Now we have the dataset with train and val folders, each containing one subfolder of images per class. But the folder names are codes that are not human readable.

If you want to rename them with the proper class names, you can run the following:

cp download_and_prepare_imagenet_dataset/*.py .
cp download_and_prepare_imagenet_dataset/*.json .
python3.6 id_to_class_train.py
python3.6 id_to_class_val.py
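
The idea behind the renaming can be sketched in a few lines of Python. This is only an illustration, not the repo's scripts: it assumes a JSON file that maps each folder code to a readable class name, and both `rename_class_folders` and that mapping format are hypothetical.

```python
import json
import os


def rename_class_folders(split_dir, mapping_path):
    """Rename each class-code folder in split_dir to a readable name.

    Returns the number of folders that were renamed.
    """
    with open(mapping_path) as f:
        # Assumed format: {"<folder code>": "<class name>", ...}
        id_to_name = json.load(f)
    renamed = 0
    for class_id, class_name in id_to_name.items():
        src = os.path.join(split_dir, class_id)
        # Keep the new folder name free of spaces and path separators
        safe_name = class_name.replace(" ", "_").replace("/", "_")
        dst = os.path.join(split_dir, safe_name)
        # Skip missing folders and never overwrite an existing one
        if os.path.isdir(src) and not os.path.exists(dst):
            os.rename(src, dst)
            renamed += 1
    return renamed
```

Calling it once on the train folder and once on the val folder with the same mapping keeps the class names aligned between the two splits.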

Long task? Use nohup

I advise you to use nohup with a different log file for each task, because each one takes a long time. For example:

nohup <do> <the> <command 1> > command_1.out 2>&1 &

This way all the logs go into the file, and the trailing & makes the process run in the background.

Conclusions

In this way I managed to download and prepare the ImageNet dataset.

I hope this article can be useful to someone. Happy training.

Nicola Landro

Linux user and Open Source fan. Deep learning PhD, full stack web developer, mobile developer, cloud engineer and musician.