Step-by-step guide to training Detectron2 detection models on GPU - Part 1
Introduction
This blog series is divided into four parts:
Part 1- Setting up the docker container for detectron2.
Part 2- Using labelme, an open-source tool, to label training images for detection.
Part 3- Creating the dataset: converting the CSV annotation file to the COCO JSON file that detectron2 takes as input (written by Rohit Raj).
Part 4- Training and evaluating the detectron2 detection model. The architecture is a Faster R-CNN (Faster Region-based Convolutional Neural Network) with a Feature Pyramid Network (FPN) and a ResNet-101 backbone. We will learn the steps to train a multiclass model.
What is detectron2?
Detectron2 was created by the Facebook research team. Its official GitHub repository is facebookresearch/detectron2.
It is a library of algorithms, written from a research perspective, that deliver state-of-the-art solutions to computer vision problems.
You can explore all the projects in the repository. A few of my favourites are 2D image-based detection, DensePose, Panoptic-DeepLab and Pointly-Supervised Instance Segmentation.
Detectron2 uses PyTorch as its framework. If you are new to training PyTorch models, consider this an excellent opportunity to begin your PyTorch journey.
NOTE: If your other deep learning models are deployed using TensorFlow Serving and you wish to use detectron2 along with them, you may want to read this blog on deploying PyTorch models. I spent a huge amount of time figuring this out. The weights we get after training the detectron2 detection model are saved as a “.pth” file (for example “WeightsFile.pth”); that blog explains well how to convert or export a “.pth” file to other formats and how to deploy it.
Why should you read this blog?
Since we will train the ResNet-101 + Faster R-CNN + FPN detection model, we need GPUs to train and evaluate it. To train on GPU we will use the detectron2 docker setup provided by the Facebook research team.
I have documented my experience of running this docker container on my remote Linux machine.
I have added real-time screenshots while creating a docker container on my remote machine. I have shared all the cases where the installation failed and how I resolved them and successfully trained the model.
This is an end-to-end blog series on training the Detectron2 model. The data creation part, Part 3, is written by my husband Rohit Raj, who is a deep learning enthusiast. I have given the link to his blog. He has shared the code he wrote to convert the CSV file, which holds the annotations and other metadata of the training data, to the COCO JSON file that goes as input to the detectron2 training code.
Working on a remote machine? This blog will help you open JupyterLab, which gives you a graphical user interface (GUI) to view, edit, save, run and debug your code.
Learn about Labelme, the best open-source labelling tool, to create your own dataset.
Let us begin with setting up the docker container. If you are new to docker containers, please read my step-by-step hands-on guide to docker containers.
Successfully get the detectron2 docker image on your machine
The first step is to open a terminal on your machine. Please ensure that the machine has GPU support.
The remote machine I am working on has 4 NVIDIA A100 Tensor Core GPUs. It is a Linux machine, Ubuntu 18.04.2 LTS.
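To confirm GPU support before going further, a quick check like the one below can help. It is only a sketch: it assumes the NVIDIA driver (and therefore the nvidia-smi tool) is installed, and the function name list_gpus is my own, not part of detectron2.

```shell
# List the GPUs visible on this machine, with the index each one is known by.
# list_gpus is a hypothetical helper name used only in this post.
list_gpus() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi --query-gpu=index,name,memory.total --format=csv
    else
        echo "nvidia-smi not found: install the NVIDIA driver first"
        return 1
    fi
}

list_gpus || true
```

The index column is the number you will later pass to NVIDIA_VISIBLE_DEVICES when creating the container.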
Step 1: Create a directory on your remote machine for this project. It will hold the cloned detectron2 git repository and the data.
Use the command “mkdir detectron2_detection” to create a new folder.
mkdir detectron2_detection
Use the change directory command “cd detectron2_detection” to go inside the folder detectron2_detection.
cd detectron2_detection
The next step is to clone the detectron2 repository inside the folder detectron2_detection using the below git clone command. If you are new to git, I highly recommend reading the basics of GitHub; it will be helpful in the long run.
git clone https://github.com/facebookresearch/detectron2.git
Once you run the git clone command, the detectron2 directory is downloaded into the detectron2_detection folder.
Let us explore what got cloned. Use the command “ls” or “tree -L 1” inside the detectron2_detection folder to list the downloaded folder. If “tree” is not available, install it first.
ls
sudo apt-get install tree
tree -L 1
You will find a folder named “detectron2”. This is the folder that got downloaded. You may want to explore every folder inside detectron2 but for now, let us focus on setting up the docker so that we can view any file on JupyterLab.
Let us now go inside the docker folder, which is inside the detectron2 folder. From inside detectron2_detection, use the command “cd”.
cd detectron2/docker/
Inside the docker folder, you will find four files. For now, let us open README.md to find the command that creates a GPU-compatible docker image for detectron2. I am assuming that you already have “vim” installed; it lets you open, view, edit and save files in the terminal. If you do not have vim, please install it, or open README.md with any other terminal text editor you prefer. I used the below command inside the docker folder to view the file.
vim README.md
Before running any command from the README, please ensure you have docker installed on your machine. If you do not, follow the instructions on this page to install it.
To check that docker is installed, run the below command in the terminal.
docker --version
If the command prints a version number, we can get started with the next steps.
Once you run the vim command, the file opens and you can read the set of instructions.
We are inside the docker folder, and the next command to run is the one under the heading “# Build” in README.md. It builds the docker image for detectron2. Run it only from the docker folder (in my case “/detectron2_detection/detectron2/docker/”), because the trailing “.” tells docker to use the current folder, which contains the “Dockerfile”, as the build context. If you wish to view the commands written inside the “Dockerfile”, you can open it with vim.
docker build --build-arg USER_ID=$UID -t detectron2:v0 .
The build command creates a docker image named detectron2 with the tag “v0”, using the “Dockerfile” inside the docker folder. The “--build-arg USER_ID=$UID” part passes your user ID to the build, so that the user created inside the image has the same ID as you.
Getting to see the README.md file can be exciting, but please do not copy and paste the “# Launch” instruction as soon as the build command has finished. We will create the container with a different command in the next section.
If docker is installed on your machine, the build command should ideally run without errors. It is slow: the build took 15 to 20 minutes on my machine.
Once the build has succeeded, run the below command to verify that the detectron2 image is on your machine.
docker image ls
The output lists an image named detectron2 with the tag v0, along with its image ID, creation time and size. The image is quite big: 17.4 GB on my machine.
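If you prefer a yes/no answer over reading the table, the check can be scripted. This is a minimal sketch that assumes the image name and tag used in this post (detectron2:v0); image_exists is a helper name I made up for illustration.

```shell
# Return success if the given image:tag exists locally.
# "docker image inspect" exits with a non-zero status when the image is missing.
image_exists() {
    docker image inspect "$1" >/dev/null 2>&1
}

if image_exists detectron2:v0; then
    echo "detectron2:v0 is ready"
else
    echo "detectron2:v0 not found; re-run the build step"
fi
```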
Now we have the docker image for Detectron2 on our machine. Let us now create a docker container out of the docker image.
Create docker container for detectron2
To create the docker container, please run the “docker run” command, whose parts are explained below. You may first want to paste the full command into a text editor and check that every option matches your machine's requirements.
docker run
The “docker run” command is used to create a docker container out of a docker image.
-p 9976:9976
Use “-p 9976:9976” because we are working on a remote machine that does not have a GUI. The -p option publishes a port: port 9976 of the remote machine (the docker host) is mapped to port 9976 inside the container. Together with the SSH tunnel we set up later, this lets you open JupyterLab in the Chrome browser on your local machine to view, edit, create, save and run files.
Even if you are working directly on a local machine with GPU support, you still need the port mapping to reach JupyterLab inside the container.
--runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=1,2
NVIDIA_VISIBLE_DEVICES takes the GPU indices used on your machine. Note that NVIDIA numbers GPUs from 0, so a machine with 4 GPUs has indices 0, 1, 2 and 3, as listed by nvidia-smi. With NVIDIA_VISIBLE_DEVICES=1,2 the container sees two GPUs, those with index 1 and index 2, for any GPU operations. You can specify your own GPU indices.
-it -v /detectron2_detection/:/code/
The -it option gives us an interactive terminal inside the container, and -v mounts a folder of the host machine onto a folder in the container. After running the command, a folder named “code” is created inside the docker container, and anything present inside the “detectron2_detection” folder is also present inside “code”.
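Putting the pieces above together, the full command can be assembled and printed for review before you run it. This is a sketch under stated assumptions: the image name detectron2:v0 comes from the build step, and the container name detectron2_container is taken from the docker exec command used later in this post; the helper build_run_command is my own. Adjust the port, GPU indices and host folder for your machine.

```shell
# Values taken from this post; adjust them for your machine.
IMAGE="detectron2:v0"                  # image built in the previous section
CONTAINER_NAME="detectron2_container"  # assumed name, reused later with docker exec
PORT=9976                              # host port and container port for JupyterLab
GPUS="1,2"                             # GPU indices as listed by nvidia-smi
HOST_DIR="/detectron2_detection/"      # host folder mounted at /code/ in the container

# Print the assembled docker run command on one line.
build_run_command() {
    printf 'docker run -p %s:%s --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=%s -it -v %s:/code/ --name %s %s\n' \
        "$PORT" "$PORT" "$GPUS" "$HOST_DIR" "$CONTAINER_NAME" "$IMAGE"
}

# Review the printed command, then paste it into the terminal to run it.
build_run_command
```

Printing first rather than running directly gives you the chance to check each option against your machine, as suggested above.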
The container is now running on your machine.
Every container is an instance of the docker image. Every package that is installed in the docker image is present in the container.
The shell prompt inside the container starts with “appuser@8bbcc9d793d4”: “appuser” is the user name and “8bbcc9d793d4” is the container ID.
The container ID (8bbcc9d793d4) was generated randomly on my machine. You will have a different container ID, so please do not get confused.
Now you have a Linux machine (your container) that has everything already installed to run the detectron2 code.
Let us explore what is inside the container. To list all the files and folders, type the command “ls” in the terminal.
If we type “tree -L 1” in the current folder of the container, we see a cloned detectron2 directory. It is identical to what we cloned into the folder “detectron2_detection” outside the container.
But we will not work in this directory, “/home/appuser/detectron2_repo/”. We will navigate to the folder called “code”, which is where we mounted our folder “detectron2_detection”. Remember that the detectron2_detection folder also contains the cloned detectron2 folder.
You could run the training code from “detectron2_repo”, as it is part of the container. But if the container is later removed without committing your data and code, you will lose the time and hard work you invested in creating your dataset and writing code.
To solve this problem, we mounted our local folder “detectron2_detection” to the folder “code” in our container.
Let us navigate to the folder “code”.
Type the below command inside the folder “detectron2_repo”.
cd /
The above command takes you to the root directory, which contains the folder “code”. Run the below command to get inside the detectron2 folder inside “code”.
cd code/detectron2/
The folder “code” holds the same detectron2 folder that is inside “detectron2_detection”, the folder we created and mounted to “code”.
Any data or code you write inside the “code” folder is saved on the host and is not deleted if your container is stopped, exited or deleted.
So feel free to write any script inside the “code” folder or bring your precious data.
Final check of the detectron2 container by running the demo code
Now is the time to check if the container is running fine.
For that, we need to navigate into the “demo” folder, which is inside the current detectron2 folder.
Inside the demo folder, run the command from the below gist to download a test image for the test model.
Now please go back to the detectron2 folder (the one inside “code”, mounted from detectron2_detection) and run the command from the below gist to test the model inside the container. I recommend not modifying the command when running it for the first time.
cd ../
pwd
python3 demo/demo.py
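For reference, here is a sketch of a complete demo invocation. It is not the gist from this post: the image URL, config file and weights path are the ones shown in the comments of the detectron2 Dockerfile and in the official getting-started guide, and run_demo is a helper name of my own. It downloads the image into the current folder rather than into demo/, and it writes the result to an outputs/ folder instead of opening a window, which suits a remote machine without a GUI.

```shell
# Run this from the detectron2 folder inside the container (/code/detectron2).
CONFIG="configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"
WEIGHTS="detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl"
IMAGE_URL="http://images.cocodataset.org/val2017/000000439715.jpg"

# Download one test image, then run Mask R-CNN on it and save the visualisation.
run_demo() {
    wget "$IMAGE_URL" -O input.jpg &&
        mkdir -p outputs &&
        python3 demo/demo.py --config-file "$CONFIG" \
            --input input.jpg --output outputs/ \
            --opts MODEL.WEIGHTS "$WEIGHTS"
}

# Only attempt the demo when we are inside a detectron2 checkout.
if [ -f demo/demo.py ]; then
    run_demo
else
    echo "demo/demo.py not found; cd into the detectron2 folder first"
fi
```

The outputs/ folder is created before the run because demo.py treats --output as a folder only when that folder already exists.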
You may not see any window pop up, because we are working on a remote machine that does not have a GUI. But if your machine has a GUI and a GPU, congratulations: you may see an OpenCV window with the Mask R-CNN results on the image.
Even without visible results, if the code runs to completion without error, that proves that the container is ready.
If you get an error while running demo.py, it is most likely an OpenCV display error. In that case, comment out the cv2.imshow() and cv2.waitKey() calls in the code.
To edit the code you can install the Vim text editor.
sudo apt-get update
sudo apt-get install vim
Use the above commands to install Vim, then open the file by typing the below command from the detectron2 folder.
vim demo/demo.py
To edit the code, press the key “i” (insert) and start editing. Once you are done commenting out the cv2 lines, press “ESC” to leave the editing mode, then type “:wq” and press Enter to save the changes and quit. If a cv2.waitKey() call is the condition of an if statement, comment out the body of that if as well, otherwise Python will report an indentation error. Then re-run the code.
Steps to setup a GUI for the remote machine container for detectron2
Please run all the commands mentioned in the below block sequentially.
sudo apt-get install python3-pip
pip3 --version
sudo pip3 install jupyter -U && pip install jupyterlab
/usr/bin/python3 -m pip install --upgrade pip
sudo pip3 install nbconvert==5.4.1
sudo apt-get install pandoc
sudo apt-get install texlive-xetex
sudo pip3 install --upgrade --user nbconvert
sudo pip3 install tornado --upgrade
python3 -m pip install setuptools==59.5.0
If an installation step fails, the common installation issues page of the detectron2 documentation may help:
https://detectron2.readthedocs.io/tutorials/install.html#common-installation-issues
To come out of the container without stopping it, press Ctrl+P followed by Ctrl+Q (use the Control key on macOS as well).
After you have come out of the container, you will be back in the folders of your remote machine, where you can see the “detectron2_detection” folder.
Now please open a new tab (terminal) on your local machine.
ssh -L 9976:localhost:9976 user_name@10.168.16.157
Type the above command in the new terminal tab of your local machine, replacing the user name and IP address with those of your remote machine. The -L option forwards port 9976 of your local machine to port 9976 of the remote machine.
Once inside the remote machine through the new tab, type the below command to enter the detectron2 container.
docker exec -it detectron2_container bash
The above command opens a shell in the container, and you can now navigate to the code folder and then to the detectron2 folder.
The only difference this time is that you entered through ssh with port forwarding, so that we can use the ports to open the JupyterLab we installed.
To run JupyterLab, type the below command.
jupyter lab --ip=0.0.0.0 --port=9976 --allow-root
Now please open the Chrome browser and enter the below address.
localhost:9976
Now copy the token printed in the terminal (it appears after “token=” in the JupyterLab URL), paste it into the password or token box in the browser and press the Log in button.
docker ps --filter "status=exited"
docker start container_id
Use the above commands if your container exits at any time: the first lists the exited containers and the second restarts the container with the given ID.
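If this happens often, the restart and re-entry can be wrapped in a small helper. This is only a sketch: reenter_container is a name I made up, and it assumes the container was given the name detectron2_container (docker start accepts either a name or an ID).

```shell
# Start a stopped container by name or ID, then open a bash shell inside it.
# reenter_container is a hypothetical helper, not a docker command.
reenter_container() {
    docker start "$1" && docker exec -it "$1" bash
}

# Typical use on the remote machine:
#   docker ps --filter "status=exited"
#   reenter_container detectron2_container
```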
Conclusion:
This is the first step that every data science person has to take if she/he has a goal to train/test the model. Building the docker image, creating the container and running demo.py successfully gives you the confidence to take the next steps.
Detectron2 is here to live, grow and lead the artificial intelligence domain with its state-of-the-art solutions, so it is always a good idea to experiment and experience the power of detectron2 for ourselves.
I am extremely happy to be finishing this blog. I hope this part and the following parts will help many of us.
If you liked the blog please give a clap. This would mean a lot to me. It would definitely encourage me to write more and write better.
This blog is also part of self-encouragement. You can also follow and subscribe to my page for free for AI-focused, easy-to-understand blogs.
Let me know in the comments section if I can help you with anything related to artificial intelligence, computer vision or image processing solutions.
The remaining parts of this blog series will be published in the next 15 days.