If you go to Google and search for “python docker”, you will be hit with something like 60,500,000 results, so why on earth add another one? In a nutshell, because most of the posts out there will tell you how to containerise your Python application, but very few will tell you how to do it properly.
Let’s start with the simplest of applications. Create a text file called main.py and copy and paste the code below.
print("Hello World !")
Let’s see if that works
OK, that works, so let’s go ahead and put it in a container. Googling around, you will find several blog posts explaining how to do it. Basically, we need to:
- Create a Dockerfile
- Install the relevant dependencies
- Copy our application (ideally into its own directory)
- Set the WORKDIR to the root directory where we copied our app in the previous step
- And finally execute our application
Most Dockerfile examples will look something like this:
FROM python:3.9.1
WORKDIR /app
COPY . /app
CMD ["python", "-u", "./main.py"]
Notice the “-u” option, which starts Python in unbuffered mode. This forces stdout and stderr to be unbuffered, so output shows up in the docker logs command as soon as it is written. If you do not see any output from the docker logs command, chances are you started your application without it, using:
CMD ["python", "./main.py"]
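If changing the CMD is not convenient, a similar effect can be had from inside the application by flushing explicitly. The snippet below is only a minimal sketch: log is a hypothetical helper that is not part of the original main.py, and it relies on nothing more than the flush argument of the built-in print.

```python
import sys


def log(message: str) -> None:
    """Print a message and flush stdout straight away.

    `log` is a hypothetical helper, not part of the original application.
    """
    print(message, flush=True)


if __name__ == "__main__":
    log("Hello World !")
    # stderr can be flushed explicitly in the same way
    print("something went wrong", file=sys.stderr, flush=True)
```

Setting the PYTHONUNBUFFERED environment variable to a non-empty value is equivalent to passing -u, so that is another option if you would rather not touch the code.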
Let’s create a Dockerfile based on the above, and see if that works using:
docker build . -t test && docker run test
Sending build context to Docker daemon 3.072kB
Step 1/5 : FROM python:3.9.1
3.9.1: Pulling from library/python
0ecb575e629c: Extracting [=============================> ] 49.28MB/50.4MB
7467d1831b69: Download complete
feab2c490a3c: Download complete
f15a0f46f8c3: Downloading [==========================> ] 47.91MB/51.83MB
937782447ff6: Downloading [===========> ] 45.91MB/192.3MB
e78b7aaaab2c: Downloading [==========================> ] 3.218MB/6.146MB
06c4d8634a1a: Waiting
42b6aa65d161: Waiting
f7fc0748308d: Waiting
The Docker daemon will parse the Dockerfile, then pull and extract the necessary images. When all is done, we should end up with something like this:
It worked … so we are done … or are we?
Let’s take a quick look at the image we created by running the command docker image ls:
docker image ls
REPOSITORY   TAG      IMAGE ID       CREATED         SIZE
test         latest   c9b04ff9bbcd   6 minutes ago   885MB
python       3.9.1    2a93c239d591   2 weeks ago     885MB
Oh wow, it worked, but the image we created is so large it could create its own gravity field! 885MB for a simple hello world app is just ridiculous, isn’t it? So where does all that extra fat come from? And more importantly, can we get rid of it?
Docker image options
Let’s head to the Docker registry and scroll down to the “Supported tags and respective Dockerfile links” section.
Here we see several options like 3.9.2-buster, 3.9.2-slim-buster, 3.9.2-alpine, but what exactly are these ?
Let me start by saying that the safest image is always the full version. However, with proper testing, there is nothing stopping us from creating a more lightweight image, which will be lighter on our infrastructure during deployments, much faster to build and less taxing on our storage.
Stretch, Buster and Jessie images
The tags stretch, buster and jessie are codenames for different Debian releases. At the time of writing, the current stable Debian release is version 10, codenamed “Buster.”
“Stretch” was the codename for all version 9 variations, and “Jessie” was the codename for all version 8 variations.
Future versions in development, but not yet stable, are “Bullseye” and “Bookworm.” You may start seeing these tags in the list of image versions on DockerHub.
You can safely choose one of the stable images if your code is compatible with that version of the Debian operating system. Using older versions is not recommended because of vulnerabilities that might be present.
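If you are ever unsure which release an image is actually built on, one way to check is to read /etc/os-release from inside the container. The sketch below assumes the file follows the usual KEY=value layout; parse_os_release is a hypothetical helper name, and the VERSION_CODENAME key is present on Debian but may be missing on other distributions.

```python
from pathlib import Path


def parse_os_release(text: str) -> dict:
    """Parse the KEY=value lines of an os-release file into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments and anything that is not KEY=value
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"')
    return info


if __name__ == "__main__":
    path = Path("/etc/os-release")
    if path.exists():
        info = parse_os_release(path.read_text())
        # VERSION_CODENAME is set on Debian; other distributions may omit it
        print(info.get("PRETTY_NAME", "unknown"), info.get("VERSION_CODENAME", ""))
```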
The slim images are a cut-down version of the above, and only install the minimum number of packages needed to run Python.
As the name suggests, Alpine images are based on Alpine Linux. As a general rule, these images will be as thin as they come, and if space is indeed a concern, they could be considered. There are a few things you need to be aware of though. Sometimes the images are so thin that they won’t even give you the most basic of debugging tools. They are also missing glibc in favour of the slimmer musl, which could interfere with some debugging tools. So as a rule of thumb, only use these images where space is a major concern, and on mature code which is unlikely to require any debugging.
Of course, you can always install any missing packages you need using apk add. However, if you are experiencing unexplained issues while building the container or at run time, try switching images to see if it helps.
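When chasing this kind of issue, it can help to confirm which C library the interpreter in your container was built against. The sketch below uses platform.libc_ver() from the standard library; describe_libc is a hypothetical helper, and treating an empty result as "possibly musl" is an assumption rather than a guarantee.

```python
import platform


def describe_libc() -> str:
    """Return a short description of the C library Python was linked against."""
    name, version = platform.libc_ver()
    if name:
        return f"{name} {version}".strip()
    # Assumption: an empty name means the library was not recognised as glibc,
    # which is what is typically reported on musl-based images such as Alpine.
    return "unknown (possibly musl)"


if __name__ == "__main__":
    print(describe_libc())
```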
Building a slimmer Docker image
So with this new knowledge in hand, let’s go ahead and rebuild a slimmer version of our application. We will change the first line of the Dockerfile to “FROM python:3.9.1-alpine3.13”:
FROM python:3.9.1-alpine3.13
WORKDIR /app
COPY . /app
CMD ["python", "-u", "./main.py"]
Then rebuild the image:
docker build . -t test && docker run test
The first thing you will notice is that the pull/build/extraction time is much shorter. Depending on the machine you are using and your internet connection, it should not take more than a few seconds, as opposed to the few minutes it took to build our image the first time.
The second thing you will notice is that the image size is now only around 45MB.
Now that’s more like it.
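To put a number on the saving, here is a quick back-of-the-envelope calculation using the two sizes quoted in this post. It is a rough sketch: parse_size and reduction_percent are hypothetical helpers, and the code assumes Docker reports sizes in decimal units (1MB = 1,000,000 bytes).

```python
# Decimal multipliers; assumption: Docker reports sizes in powers of 1000.
UNITS = {"B": 1, "kB": 10**3, "MB": 10**6, "GB": 10**9}


def parse_size(text: str) -> float:
    """Convert a size string such as '885MB' into a number of bytes."""
    # Check 'B' last, because every other unit also ends with it
    for unit in ("kB", "MB", "GB", "B"):
        if text.endswith(unit):
            return float(text[: -len(unit)]) * UNITS[unit]
    raise ValueError(f"unrecognised size: {text!r}")


def reduction_percent(before: str, after: str) -> float:
    """Percentage by which the image shrank going from `before` to `after`."""
    return 100 * (1 - parse_size(after) / parse_size(before))


if __name__ == "__main__":
    print(f"{reduction_percent('885MB', '45MB'):.1f}% smaller")
```

Going from 885MB to around 45MB works out to an image that is roughly 95% smaller, or close to 20 times lighter.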
A word about requirements.txt
The example we discussed in this write-up is the most basic of Python applications, and was only used to show how to dockerise a Python application efficiently. When you build more complex applications, chances are that you will also need to install some dependencies. As a rule of thumb, these are specified inside a text file called requirements.txt. A sample file is below.
appnope==0.1.0
backcall==0.1.0
chardet==3.0.4
Click==7.0
Flask==1.0.2
google==2.0.1
google-cloud==0.34.0
gtfs-realtime-bindings==0.0.5
html5lib==1.0.1
idna==2.7
ipykernel==5.0.0
ipython==7.1.1
ipython-genutils==0.2.0
jupyter-client==5.2.3
jupyter-console==5.2.0
jupyter-core==4.4.0
kiwisolver==1.0.1
MarkupSafe==1.0
matplotlib==2.2.2
mistune==0.8.3
PySocks==1.6.8
python-dateutil==2.7.3
traitlets==4.3.2
tweepy==3.6.0
xlrd==1.1.0
Note the ==x.y.z after each entry. This is the version of each requirement to be installed by pip. If the ==x.y.z part is omitted, pip will install the latest available version of the package during the docker build stage. This might introduce issues at run time due to some incompatibility the newer package might have with your application. So always make sure to generate the requirements.txt using:
pip freeze > requirements.txt
Also make sure to keep an eye out for vulnerabilities in the package versions used by your applications in production.
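If you inherit a requirements.txt and want to know whether everything in it is pinned, a few lines of Python will do. This is a simple sketch rather than a full parser: find_unpinned is a hypothetical helper, and it only looks for a literal == in each entry, so more exotic requirement syntax is not handled.

```python
from pathlib import Path


def find_unpinned(requirements_text: str) -> list:
    """Return the requirement entries that do not pin an exact version with '=='."""
    unpinned = []
    for line in requirements_text.splitlines():
        entry = line.split("#", 1)[0].strip()  # drop comments and whitespace
        # Skip blank lines and pip options such as "-r other.txt"
        if not entry or entry.startswith("-"):
            continue
        if "==" not in entry:
            unpinned.append(entry)
    return unpinned


if __name__ == "__main__":
    path = Path("requirements.txt")
    if path.exists():
        for entry in find_unpinned(path.read_text()):
            print(f"not pinned: {entry}")
```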
In order to install the requirements inside your Docker image, you will need to copy the requirements.txt into the image and add a RUN pip install instruction to the Dockerfile, as per below:
FROM python:3.9.1
WORKDIR /app
COPY requirements.txt /app/requirements.txt
RUN pip install -r requirements.txt
COPY . /app
CMD ["python", "-u", "./main.py"]
Always remember to include a requirements.txt file with all your applications, even if they are not dockerised. It will save someone, and perhaps yourself, hours of debugging one day!
Happy pythoning and dockering