I have a personal project that I have been working on for about a year now. It is a pretty basic project with a Django backend and a SvelteKit-based frontend. I deploy it to a VPS using Docker images, with Traefik as a reverse proxy.
Recently, as feature development slowed, I have been reworking my infrastructure. I added a Docker registry and a CI system. And, most relevant to this post, I noticed that the Docker image for my backend was huge! It was 819MB!! Worse yet, after some work to switch to PostgreSQL as my database, it ballooned to 1.06GB. This was slow to build, slow to push, slow to pull, and overall just excessive. In this post, I'll outline the steps I followed to shrink the image down to a more respectable 119MB.
Here is the Dockerfile I started from:

```dockerfile
FROM python:3.9.4-alpine3.13
ENV PYTHONUNBUFFERED=1
WORKDIR /app
RUN apk add --update --no-cache \
    postgresql-dev \
    gcc \
    python3-dev \
    musl-dev \
    libffi-dev \
    openssl-dev \
    cargo
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY bowling_scores/ ./
RUN python manage.py collectstatic --no-input
ENV DJANGO_SETTINGS_MODULE='bowling_scores.prod_settings'
EXPOSE 8000
CMD ["daphne", "-b", "0.0.0.0", "bowling_scores.asgi:application"]
```
And my requirements.txt looked something like:
```text
daphne==3.0.2
django-cors-headers==3.11.0
django-filter==2.4.0
django-graphql-auth==0.3.16
django==3.2.12
djangorestframework==3.13.1
graphene-django==2.15.0
psycopg2==2.9.3
whitenoise==5.3.0
```
Simplifying operating system dependencies
First, I switched to a meta package instead of listing a bunch of individual build dependencies: I added alpine-sdk and removed gcc, musl-dev, and openssl-dev. This keeps the Dockerfile shorter, at the cost of a somewhat larger image.
```dockerfile
RUN apk add --update --no-cache \
    alpine-sdk \
    postgresql-dev \
    python3-dev \
    libffi-dev \
    cargo
```
1.06GB -> 1.15GB
Switch to uvicorn
Next, I switched my ASGI application server from daphne to uvicorn. One reason my builds took so long was that several dependencies, including cryptography, had to be built from source. After some more research, I learned that daphne can do SSL termination, and that this and a couple of other features give it a large dependency footprint. From what I could tell from the Django documentation and a blog post or two, I should be able to switch easily between daphne, uvicorn, and hypercorn and get similar performance.
uvicorn has fewer features, but since I wasn't using any of the extra ones, that didn't seem like a big issue. The new command is:
CMD ["uvicorn", "--host", "0.0.0.0", "bowling_scores.asgi:application"]
This change gave the biggest improvement in build time, as well as a modest size reduction:
1.15GB -> 838MB
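A small detail worth spelling out: uvicorn binds to 127.0.0.1 by default, so the `--host 0.0.0.0` flag is what makes it reachable from outside the container (for example, from Traefik). Its default port is already 8000, matching the `EXPOSE` line. The variant below just states the port explicitly; it is an optional tweak, not something the setup above requires.

```dockerfile
# Listen on all interfaces so other containers (e.g. the reverse proxy) can connect.
# 8000 is uvicorn's default port; it is written out here only to mirror EXPOSE.
EXPOSE 8000
CMD ["uvicorn", "--host", "0.0.0.0", "--port", "8000", "bowling_scores.asgi:application"]
```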
Remove extra dependencies
Now that I no longer had to build so many Python libraries from source, I was able to remove a few more OS packages, including python3-dev and cargo:
```dockerfile
RUN apk add --update --no-cache \
    alpine-sdk \
    postgresql-dev
```
838MB -> 542MB
Using multistage builds
My Docker image now builds much faster and is a bit smaller, but we can do better. The next tool in the box is a multistage build. Basically, you have your Docker build produce artifacts in one stage, then start a fresh stage without the build dependencies and copy over just the artifacts.
Here is a minimal example:
```dockerfile
FROM python AS builder
# ... build steps ...

FROM python
COPY --from=builder /stuff /stuff
# ...
CMD ["python", "..."]
```
In my project, I decided to use pip wheel to build wheels for my dependencies, then copy the wheels over to the final stage and install them there. Some recommendations I read suggested installing dependencies into a virtual environment and copying that over instead. I suppose both would work.
In the build stage, the docker steps look like:
```dockerfile
WORKDIR /wheels
RUN apk add --update --no-cache \
    alpine-sdk \
    postgresql-dev
COPY requirements.txt .
RUN pip wheel -r requirements.txt
```
And in the final stage, the steps are:
```dockerfile
COPY --from=builder /wheels /wheels
RUN pip install \
    -r /wheels/requirements.txt \
    -f /wheels \
    && rm -rf /wheels
```
542MB -> 128MB
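For comparison, here is a rough, untested sketch of the virtual-environment alternative mentioned above (the approach the first article linked at the end of this post is about). The `/opt/venv` path is an arbitrary choice for this sketch. It assumes both stages use the same base image, so the compiled extensions and the venv's Python symlink stay valid after copying. The rest of the final stage (WORKDIR, copying the app, collectstatic, CMD) would stay the same as before.

```dockerfile
# Build stage: compile and install dependencies into a virtual environment
FROM python:3.9.4-alpine3.13 AS builder
RUN apk add --update --no-cache \
    alpine-sdk \
    postgresql-dev
# /opt/venv is an arbitrary location chosen for this sketch
RUN python -m venv /opt/venv
# Putting the venv first on PATH makes "pip" below the venv's pip
ENV PATH="/opt/venv/bin:$PATH"
COPY requirements.txt .
RUN pip install -r requirements.txt

# Final stage: same base image, no compilers or -dev packages
FROM python:3.9.4-alpine3.13
# psycopg2 still needs the libpq shared library at runtime
RUN apk add --update --no-cache libpq
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
```

The trade-off is mostly taste: the venv approach copies already-installed packages, while the wheel approach reruns pip install in the final stage against prebuilt wheels.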
Fixing the postgres library
After all these changes, I was ready to test my image, and I realized I was missing a runtime OS dependency for Postgres: libpq, which psycopg2 needs. I also decided to add the Postgres client, which lets the manage.py dbshell command work:
```dockerfile
RUN apk add --update --no-cache \
    libpq \
    postgresql-client
```
128MB -> 119MB
(I'm not really sure why adding a package made the size go down)
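Putting the snippets from this post together, the overall shape of the Dockerfile is roughly the following. This is a condensed sketch that only combines the pieces shown above, so details such as the exact ordering of steps may differ from the full version in the gist.

```dockerfile
# ---- Build stage: compile wheels for every dependency ----
FROM python:3.9.4-alpine3.13 AS builder
WORKDIR /wheels
RUN apk add --update --no-cache \
    alpine-sdk \
    postgresql-dev
COPY requirements.txt .
# pip wheel writes the wheels into the current directory (/wheels) by default
RUN pip wheel -r requirements.txt

# ---- Final stage: runtime dependencies only ----
FROM python:3.9.4-alpine3.13
ENV PYTHONUNBUFFERED=1
WORKDIR /app
# libpq is required by psycopg2 at runtime; postgresql-client enables manage.py dbshell
RUN apk add --update --no-cache \
    libpq \
    postgresql-client
COPY --from=builder /wheels /wheels
# -f/--find-links lets pip resolve the requirements from the local wheels.
# Note: rm only cleans up this layer; the COPY layer above still holds the wheels.
RUN pip install \
    -r /wheels/requirements.txt \
    -f /wheels \
    && rm -rf /wheels
COPY bowling_scores/ ./
RUN python manage.py collectstatic --no-input
ENV DJANGO_SETTINGS_MODULE='bowling_scores.prod_settings'
EXPOSE 8000
CMD ["uvicorn", "--host", "0.0.0.0", "bowling_scores.asgi:application"]
```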
I am not under any illusion that this is the best, most production-ready, perfect Dockerfile out there, and I would highly recommend doing your own research before using anything from this post. However, I hope that my thought process and maybe a couple of tricks along the way will help you shrink your own images.
On CI, I was able to reduce the time to build and push an image from about 6:30 to about 1:40 (minutes:seconds). I was also able to reduce the overall image size from over a gigabyte at its peak to 119MB.
If you'd like to see an example of the whole Dockerfile, you can do so in this gist.
- multistage with virtual environment: https://pythonspeed.com/articles/multi-stage-docker-python/
- multistage with wheels: https://www.merixstudio.com/blog/docker-multi-stage-builds-python-development/