Step-by-step guide to training Detectron2 detection models on GPU - Part 1

Pallawi
15 min read, Aug 10, 2021


Image Source: https://github.com/facebookresearch/detectron2

Introduction

This blog series is divided into four parts:

Part 1- The first part is about setting up the docker container for detectron2.

Part 2- Part two is about an open-source tool called labelme, used to label training images for detection.

Part 3- Part three is about creating a dataset as per detectron2 COCO dataset requirements to train a detection model.

Part 4- Training and evaluating the detectron2 detection model. The architecture of the detection model is Faster R-CNN (FRCNN, Faster Region-based Convolutional Neural Network) with a Feature Pyramid Network (FPN), and the backbone is ResNet-101. We will learn the steps to train a multiclass model.

What is Detectron2?

Detectron2 was created by the Facebook AI Research (FAIR) team. Its official GitHub repository is https://github.com/facebookresearch/detectron2.

It is a library of algorithms, written with a research perspective, that deliver state-of-the-art solutions to computer vision problems.

You can explore all the projects in the repository. A few of my favourites are 2D image-based detection, DensePose, Panoptic-DeepLab and Pointly-Supervised Instance Segmentation.

Detectron2 uses PyTorch as its framework. If you are new to training a PyTorch model, consider this an excellent opportunity to begin your PyTorch journey.

NOTE: If your other deep learning models are deployed using TensorFlow Serving and you wish to use detectron2 along with them, you may want to read this blog that helps to deploy models in PyTorch. I spent a huge amount of time figuring this out. The weights that we get after training the detectron2 detection PyTorch model are in the format “WeightsFile.pth”; converting/exporting a “.pth” file to any other format, and the deployment steps, are very well written here.

Why read this blog?

Since we will train the resnet101+FRCNN+FPN detection model, we need GPUs to train and evaluate it. To train on GPU we will use the detectron2 docker setup provided by the Facebook research team.

I have documented my experience of running this docker container on my remote Linux machine.

I have added real-time screenshots while creating a docker container on my remote machine. I have shared all the cases where the installation failed and how I resolved them and successfully trained the model.

This is an end-to-end blog series on training the Detectron2 model. The data creation, which is Part 3, is written by my husband Rohit Raj, who is a deep learning enthusiast. I have given the link to his blog. He has shared the code he wrote to convert the CSV file, which has the annotations and other metadata of the training data, to a COCO JSON file that goes as input to the detectron2 training code.

Working on a remote machine? This blog will help you to open JupyterLab, which will give you a graphical user interface (GUI) to view, edit, save, run and debug your code.

Learn about the best open-source labelling tool Labelme to create your own dataset.

Let us begin with setting up the docker container. If you are new to the field of docker containers, please read my step by step hands-on guide to the docker containers blog.

Successfully get the detectron2 docker image on your machine

The first step is to open a terminal on your machine. Please ensure that the machine has GPU support.

The remote machine I am working on has 4 NVIDIA A100 Tensor Core GPUs. It is a Linux machine, Ubuntu 18.04.2 LTS.

Step 1: Create a directory on your remote machine where you will clone the detectron2 git repository.

Create a new folder to hold the cloned detectron2 repository and the data for this project.

Use the command “mkdir detectron2_detection” to create a new folder.

mkdir detectron2_detection

Use the change directory command “cd detectron2_detection” to go inside the folder detectron2_detection.

cd detectron2_detection

The next step is to clone the detectron2 repository inside the folder detectron2_detection using the below git clone command. If you are new to git, I highly recommend reading the basics of GitHub; it will be helpful in the long run.

git clone https://github.com/facebookresearch/detectron2.git

Once you run the git clone command the detectron2 directory will get downloaded into the detectron2_detection folder. Your terminal screen would look similar to the below image.

Screenshot: Cloning the git repository

Let us explore what got cloned inside the detectron2_detection folder. Use the command “ls” or “tree -L 1” inside the detectron2_detection folder to list the downloaded folder. If “tree” is not available on your machine, install it first with apt-get.

ls
sudo apt-get install tree
tree -L 1
Screenshot: Installation of the tree package
Screenshot: The detectron2 directory that got cloned inside the detectron2_detection folder is detectron2
Screenshot: Folder structure of detectron2

You will find a folder named “detectron2”. This is the folder that got downloaded. You may want to explore every folder inside detectron2 but for now, let us focus on setting up the docker so that we can view any file on JupyterLab.

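Let us now go inside the docker folder, which is inside the detectron2 folder, using the command “cd”. The path below assumes that detectron2_detection was created at the root of the filesystem; adjust it if you created the folder elsewhere.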

cd /detectron2_detection/detectron2/docker/
Screenshot: Files inside the docker folder

Inside the docker folder, you will find four files. For now, let us open the file README.md to understand what command we need to run to create a GPU compatible docker image for detectron2. I am assuming that you have “vim” already installed on your machine. Vim helps to open, view, edit and save files on terminals. If you do not have vim please install it on your machine. If you use any other text editor to view files on the terminal please go ahead and open the file README.md. I have used the command “vim README.md” inside the docker folder to view the content of the file.

vim README.md

Before running any command mentioned in the README file, please ensure you have docker installed on your machine. If you do not have docker, please follow the instructions mentioned on this page to install it.

To ensure you have docker installed on your machine use the below command on the terminal.

docker --version

If the command runs successfully we can get started with the next steps.
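As a minimal sketch, the check above can be extended to the other tools this guide relies on. The helper name `require_tool` is my own and not part of docker or detectron2, and the list of tools (git, docker, nvidia-smi) is an assumption based on the commands used in this post.

```shell
#!/bin/sh
# Report whether a command is available on the PATH.
# require_tool is a hypothetical helper name chosen for this sketch.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Tools used in this guide; nvidia-smi is installed with the NVIDIA driver.
for tool in git docker nvidia-smi; do
    require_tool "$tool" || true
done
```

Any line that starts with "missing:" points to a tool that should be installed before continuing with the build.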

Screenshot: Running VIM command inside the docker folder to read the README.md file

Once you run the vim command, the file will open and you can read the set of instructions.

Screenshot: When we do a vim README.md, these are the steps and set of instructions we are expected to follow.

You can see that we are inside the docker folder, and now we need to run the next command, which is under the heading “#build” in the README.md. The command that builds the docker image for detectron2 is “docker build --build-arg USER_ID=$UID -t detectron2:v0 .” (the trailing dot is part of the command and tells docker to use the current folder as the build context). We need to make sure that we run this command from the path “/detectron2_detection/detectron2/docker/”. The “Dockerfile” present inside the docker folder is used to build the docker image for detectron2, and if you wish to view the set of commands written inside the “Dockerfile” you can open it with vim.

docker build --build-arg USER_ID=$UID -t detectron2:v0 .

The build command creates a docker image named detectron2 and gives it a tag “v0”. The file that creates this docker image is the “Dockerfile” inside the docker folder.

Getting to see the README.md file can be exciting, but please do not copy and paste the next set of instructions as soon as the build command has finished its work.

Please do not run the #Launch instruction from the README after the build is over. We will create the container with a different “docker run” command that adds port mapping and a mounted folder.

Screenshot: When we run the command “docker build --build-arg USER_ID=$UID -t detectron2:v0 .”, this is what our screen looks like. The command starts to download the base docker image and starts installing the required packages mentioned in the Dockerfile.

If you have docker installed on your machine then ideally there should not be any error while running the build command. The build process took 15 to 20 minutes on my machine. It is slow.

Once you have successfully run the build command, you can run the “docker image ls” command to verify that you have the detectron2 image on your machine.

docker image ls

You will find something similar to what we can see in the below screenshot image when we run the “docker image ls” command. We can see we have an image with the name detectron2 and tag v0, an image ID, time of creation and the size of the image. The size of the image is quite big. It is 17.4 GB.

Now we have the docker image for Detectron2 on our machine. Let us now create a docker container out of the docker image.

Create docker container for detectron2

To create the docker container we will use a “docker run” command, which is explained piece by piece below. You may first want to copy and paste the command into a text editor and check that everything matches your machine's requirements.

docker run

The “docker run” command is used to create a docker container out of a docker image.

-p 9976:9976

We add “-p 9976:9976” because we are working on a remote machine that does not have a GUI. The -p flag publishes a container port on the machine that runs docker: here port 9976 of the remote machine is mapped to port 9976 inside the container. Together with the SSH port forwarding that we set up later in this post, this mapping lets you open the project in the Chrome browser on your local machine and view, edit, create, save and run files using JupyterLab.

Even if we are working on a local machine that has GPU support, we need to add the port mapping, because JupyterLab will run inside the container.

--runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=1,2

Use the GPU numbering followed on your machine. For example, I have 4 GPUs, and in the argument NVIDIA_VISIBLE_DEVICES I have used the numbers 1 and 2 to state that I will be using two GPUs, GPU 1 and GPU 2, for any GPU operations. You can specify your own GPU numbers; running “nvidia-smi -L” lists the GPUs with their indices, which NVIDIA tools normally count from 0.

-it -v /detectron2_detection/:/code/

The “-v” option mounts a folder of the machine running docker into the container. After running the command, a folder with the name “code” will be created inside the docker container, and anything that is present inside the “detectron2_detection” folder will also be present inside the folder “code”.
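Putting the fragments together, the full command might look like the sketch below. It only prints the command so that you can review it before running it. The container name is an assumption taken from the “docker exec” command used later in this post, and “--shm-size=8gb” is an assumption borrowed from the launch example in detectron2's docker README; neither appears in the fragments above. The function and variable names are my own.

```shell
#!/bin/sh
# Print the assembled "docker run" command for review before running it.
IMAGE="detectron2:v0"                  # image name and tag from the build step
CONTAINER_NAME="detectron2_container"  # assumption: name used later with docker exec
HOST_DIR="/detectron2_detection/"      # folder on the machine that runs docker
GPUS="1,2"                             # GPU indices to expose; adjust to your machine

print_run_command() {
    # --shm-size=8gb is an assumption taken from detectron2's docker README.
    echo docker run \
        -p 9976:9976 \
        --runtime=nvidia -e "NVIDIA_VISIBLE_DEVICES=$GPUS" \
        --shm-size=8gb \
        -it -v "$HOST_DIR:/code/" \
        --name "$CONTAINER_NAME" \
        "$IMAGE"
}

print_run_command
```

After checking the printed line, paste it into your terminal to start the container. On newer Docker installations that use the NVIDIA Container Toolkit, the “--gpus” flag is the usual replacement for “--runtime=nvidia”.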

Screenshot: This is what your screen would look like when you run the command to start a container.

The container is now running on your machine.

Every container is an instance of the docker image. Every package that is installed in the docker image is present in the container.

You can see in the screenshot immediately above that the shell prompt of the container is “appuser@8bbcc9d793d4”: “appuser” is the user inside the container and “8bbcc9d793d4” is the container ID.

The above container ID (8bbcc9d793d4) is a random number generated on my machine. You may have a different container ID. So please do not get confused.

Now you have a Linux machine (Your container) that has everything already installed to run the detectron2 code.

Let us explore what is there inside the container. To list down all the files and folders please type the command “ls” on the terminal. Refer to the immediate below screenshot.

If we type “tree -L 1” in the current folder of the container, we will see that the detectron2 repository has been cloned here. It is the same repository that we cloned into the folder “detectron2_detection” outside the container.

But we will not work in this directory, “/home/appuser/detectron2_repo/”. We will navigate to the folder called “code”, which is the folder where we mounted our folder “detectron2_detection”. Remember that the detectron2_detection folder also has the cloned detectron2 folder.

Screenshot: We will not work in this directory

You can choose to run the training code from this folder, as it is part of the container. But if the container is later removed without committing all the data and code, you will lose the hard work and time that you invested in creating your dataset and writing code.

To solve this problem we mounted our local folder “detectron2_detection” to a folder in our container called “code”.

Let us navigate to the folder “code”.

Type the below command inside the folder “detectron2_repo”.

cd /

The above command will take you to the directory which has the folder “code”.

Screenshot: The folder “code” can be seen in this screenshot.
cd code/detectron2/

You can find the folder “code” which has the same folder detectron2 that is inside the folder “detectron2_detection”. Remember “detectron2_detection” is the folder we created and mounted to our container folder “code”.

Any data or code you write inside the “code” folder will get saved and will not get deleted if your container is stopped, exited or deleted.

So feel free to write any script inside the “code” folder or bring your precious data.

Screenshot: This is the repository location where we will be adding data or scripts to train the model.

Use the command “cd code/detectron2/” shown above to get inside the detectron2 folder inside your container.

Final check of the detectron2 container by running the demo code

Now is the time to check if the container is running fine.

For that, we need to navigate into the “demo” folder which is inside the current detectron2 folder.

Screenshot: You will find the folder demo here inside the folder “/code/detectron2/”

Inside the demo folder, we need to run the below command to download a test image to run a test model inside the container.

Use the below gist command to get the image.

Screenshot: Navigate to the demo folder and run the above command to download the test input.jpg image. In the screenshot, you can see how the input.jpg test image gets added to the demo folder file list.

Now please go back to the detectron2 folder, one level above the demo folder, and then run the command from the below gist to test the working of the model inside the container. It is recommended not to experiment with the command when running it for the first time.

cd ../
pwd
python3 demo/demo.py
Screenshot: once you run the demo.py, your screen will look similar to this screenshot.

You may not see any window popping up, because we are working on a remote machine that does not have a GUI. But if your machine has a GUI and a GPU, congratulations: you may see an OpenCV window with Mask R-CNN results on images.

In the above screenshot, you may not be able to see the results, but the code ran successfully without error, and that proves that the container is ready.

If you get an error while running demo.py, it is most likely an OpenCV error. Please comment out any cv2.imshow() and cv2.waitKey() calls in the code.

To view the code you can install the VIM text editor.

sudo apt-get update
sudo apt-get install vim

Use the above commands to install VIM, then open the file from inside the demo folder by typing the below command.

vim demo.py

To edit the code press the key “i”, which means insert, and start editing. Once you are done commenting out the cv2 lines, press the “ESC” key to come out of the editing mode, then type “:wq” and press Enter to save the changes and quit. Then re-run the code.
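The gist with the full demo arguments is not reproduced in this post. As a reference point, the sketch below prints a demo command modelled on detectron2's Getting Started guide, which passes a config file, an input image and pretrained weights. The “--output” option of demo.py saves the visualization to a file instead of opening an OpenCV window, which may avoid the edit described above on a machine without a display. The output filename “demo/result.jpg” and the function name are my own choices, and the wget line in the comments follows the detectron2 Colab tutorial; it is an assumption about what the missing gist did.

```shell
#!/bin/sh
# Print the demo command for review. Run it from /code/detectron2/ inside the container.
# A test image can be fetched first, as in the detectron2 Colab tutorial:
#   wget http://images.cocodataset.org/val2017/000000439715.jpg -q -O demo/input.jpg
CONFIG="configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"
WEIGHTS="detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl"

print_demo_command() {
    # demo/result.jpg is a hypothetical output path chosen for this sketch.
    echo python3 demo/demo.py \
        --config-file "$CONFIG" \
        --input demo/input.jpg \
        --output demo/result.jpg \
        --opts MODEL.WEIGHTS "$WEIGHTS"
}

print_demo_command
```

If the command finishes without error, the annotated image should be written to the output path, and you can open it later from JupyterLab.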

Steps to setup a GUI for the remote machine container for detectron2

Please run all the commands mentioned in the below block sequentially.

sudo apt-get install python3-pip
pip3 --version
sudo pip3 install jupyter -U && pip install jupyterlab
/usr/bin/python3 -m pip install --upgrade pip
sudo pip3 install nbconvert==5.4.1
sudo apt-get install pandoc
sudo apt-get install texlive-xetex
sudo pip3 install --upgrade --user nbconvert
sudo pip3 install tornado --upgrade
python3 -m pip install setuptools==59.5.0

To come out of the container while leaving it running, press “control+p” followed by “control+q” (on Mac OS this is the control key, not the command key).

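If you run into errors during installation, see the common installation issues in the detectron2 documentation: https://detectron2.readthedocs.io/tutorials/install.html#common-installation-issues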

After you have come out of the container upon successful installation you will be in your remote machine folders.

Like in the below screenshot you can see that we have the “detectron2_detection” folder.

Screenshot: This is how your screen should ideally look when you come out of the container.

Now please open a new tab (terminal) on your local machine.

Screenshot: This is what my new tab looks like. You may have something else, maybe your username.
ssh -L 9976:localhost:9976 your_username@10.168.16.157

Type the above command in the new terminal tab of your local machine. Please ensure you replace “your_username” and the IP address with the username and IP of your remote machine. The -L option forwards port 9976 of your local machine to port 9976 of the remote machine.
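To make the parts of the tunnel command explicit, here is a small sketch that builds it from its pieces. The function name `tunnel_command` and the placeholder username are hypothetical; the port and IP address are the example values from this post.

```shell
#!/bin/sh
# Build an SSH local port-forwarding command:
#   ssh -L <local_port>:<destination>:<destination_port> <user>@<host>
# "localhost" is resolved on the remote side, so it refers to the remote machine itself.
tunnel_command() {
    local_port=$1
    remote_port=$2
    user_at_host=$3
    echo "ssh -L ${local_port}:localhost:${remote_port} ${user_at_host}"
}

# your_username is a placeholder; 10.168.16.157 is the example IP used in this post.
tunnel_command 9976 9976 your_username@10.168.16.157
```

The two port numbers do not have to match. If port 9976 is already in use on your local machine, you can pick another local port and open that port in the browser instead.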

Once inside the remote machine through the new tab, you can type the below command to enter the detectron2 container.

docker exec -it detectron2_container bash

The above command will open the container screen and now you can navigate to the code folder and then to the detectron2 folder.

The only difference this time is that you have entered through ssh with port forwarding, so that we can use the ports to open the JupyterLab we installed.

Screenshot: Enter inside the container and navigate
Screenshot: Let us open this path on the chrome browser

To run the jupyterlab type the below command.

jupyter lab --ip=0.0.0.0 --port=9976 --allow-root
Screenshot: This is how your screen should ideally look after running the above command

Now please open the Chrome browser and type the below address.

localhost:9976
Screenshot: Type the address localhost:9976 and press enter
Screenshot: Jupyter Lab will get opened in the browser

Now please copy and paste the similar-looking highlighted token from the terminal in the password or token box in the browser and press the button Login.

Screenshot: Copy the token and paste it into the browser. You will have a different token number and it will be different every time you open a new jupyter lab session. So no need to save the token like the passwords.
Screenshot: After pasting the copied token in the token field.
Screenshot: You can now navigate through the folders which are outside your container on Jupyter Lab in google chrome on your local machine. You can add data and view them too.
docker ps --filter "status=exited"
docker start container_id

Use the above commands if your container exits at any time. The first command lists the exited containers and the second restarts a container by its ID.

Conclusion:

This is always the first step that every data science person has to take if she/he has a goal to train/test the model. Building the docker image, creating the container and running demo.py successfully give you the confidence to take the next steps.

Detectron2 is here to stay, grow and lead the artificial intelligence domain with its state-of-the-art solutions, so it is always a good idea to experiment and experience for ourselves the power of detectron2.

I am extremely happy to be finishing this blog. I hope this part and the following parts will help many of us.

If you liked the blog please give a clap. This would mean a lot to me. It would definitely encourage me to write more and write better.

This blog is also part of self-encouragement. You can also follow and subscribe to my page for free for AI-focused easy to understand blogs.

Let me know in the comments section if I can help you with anything related to artificial intelligence, computer vision or image processing solutions.

The remaining parts of this blog series will be published in the next 15 days.

References and amazing articles to read:

https://www.cyberciti.biz/tips/what-is-devshm-and-its-practical-usage.html


Pallawi

Computer Vision contributor. Lead Data Scientist @https://www.here.com/ Love Data Science.