This post explores Docker: how to build an image, how to minimize it and how to run it. As an example, we’ll dockerize a small demo application written for NodeJS/Express.
N.B.: This is an exploration of Docker: I’m definitely not a guru of any of the technologies discussed here.
What is Docker
Docker is a container management and runtime system. A container can be seen as a virtual machine stripped of everything, including the OS kernel and almost every system tool, except the libraries and executables necessary to run one application. The objective is to have the application and its runtime packaged so that it can run (almost) isolated from the host it runs on and from any other container. A Docker container can run unchanged on any host that has Docker installed, be it a development workstation, a pre-production server, a production server or the cloud. “Almost isolated” means that some connection points can be defined between containers and the host they run on, for example networks or storage volumes.
Here are some definitions; we’ll dig into each of them throughout this post:

- Docker: Docker itself is a daemon running on a host, plus a few command-line tools, the main one being `docker`, which interacts with the daemon. `docker` allows you to create images; start, stop and inspect containers; create storage volumes; create networks for the containers to communicate; etc.
- Images: An image is a package of all the elements necessary to run an application: executables, libraries, file system and some parameters. Images are templates from which containers are created. Images are created from a description contained in a file named `Dockerfile` by default.
- Containers: A container is an instance of an image, and one can run as many containers based on a single image as needed. Each container, however, has its own parameters, for example its file system, its network address, the CPU/RAM resources allocated to it, or its restart policy in case of failure.
- Volumes: By default, files created inside a container are not persisted once that container no longer exists. Docker has two options for containers to persist files on the host machine even after the container is erased: volumes and bind mounts. Whichever type is chosen, the data looks the same from within the container.
  - Volumes are objects created and managed by Docker.
  - Bind mounts are files or directories on the host that are mounted into the container.
- Networks: Most of the time a container needs to communicate with other containers or with the host it runs on. Docker’s networking subsystem is based on drivers that implement different functionalities, for example:
  - `bridge`: allows communication between containers running on the same host. This is actually the default driver.
  - `host`: allows communication between containers and the host they run on, by letting them use the host’s network directly.
  - `overlay`: allows communication between containers running on different hosts.

  There are other drivers, which provide more specific control of the network.
Now that we have some vocabulary, let’s have some fun and create an image.
Creating a docker image
Creating an image is done by executing `docker build ...` against an image description file, named `Dockerfile` by default. In this section we will create an image able to run a demo application.
The demo application
The demo application is a NodeJS/Express application with two endpoints:

- `/` returns the JSON object `{sid: "...", resp: "Hello world"}`. `sid` stands for the server ID, a UUID that is set when the Node application starts. This will help us identify which application instance responds, tell when a container has been restarted, etc.
- `/kill` replies with `{sid: "...", resp: "Server killed"}` and then kills the application server with a non-zero exit code to simulate a crash.
All the details needed to create a NodeJS/Express application can be found in a previous post on this topic. The application will be called `docktest`, and the source code of the `index.js` file is as follows:
```javascript
// Libraries and global variable section
var express = require('express')
var crypto = require('crypto')
var app = express()
var sid = crypto.randomUUID()

// Server start section. The server will be listening on port 3000
var port = 3000
var server = app.listen(port, function () {
  var host = server.address().address
  console.log(`Server sid='${sid}' listening at http://%s:%s`, host, port)
})

app.get('/', function (req, res) {
  res.json({"sid": sid, "resp": "Hello world"})
})

app.get('/kill', function (req, res) {
  res.json({"sid": sid, "resp": "Server killed"})
  process.exit(-100)
})
```
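A side note on the `/kill` endpoint: on Linux and macOS, a process exit status only keeps its low 8 bits, so `process.exit(-100)` is not reported as -100. The short script below uses only Node’s built-in `child_process` module to run a throwaway Node process that exits the same way, and prints the status the system actually sees. Under that assumption, the exit code Docker records for a crashed `docktest` container should be 156 rather than -100.

```javascript
const { spawnSync } = require('child_process');

// Run a throwaway Node process that exits exactly like the /kill handler does,
// then read the exit status reported by the operating system.
const result = spawnSync(process.execPath, ['-e', 'process.exit(-100)']);

// On POSIX systems only the low 8 bits survive: -100 & 0xff === 156.
console.log(result.status);
```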
If you have NodeJS installed locally, you can install Express with `npm install express`, start the application with `node index.js` and hit the endpoints with a browser or `wget`.
The .dockerignore file
To build the image we will copy some files from the development machine, typically source files like `index.js`, into the image. Sometimes, however, we don’t want to copy all the files. The `.dockerignore` file lists, one per line, the files and directories that we do not want to be copied. In our case, it has the following content:
The Dockerfile file
A `Dockerfile` is a text file that describes the steps to build the image. Let’s first look at its content and then comment on each line:
```dockerfile
# Build from the latest LTS version
FROM node:16

# Create app directory
WORKDIR /app

# Install the application's dependencies and the app itself
COPY . .
RUN npm install -y express --production

# Expose the port to the outside world
EXPOSE 3000

# Define the command to start the server
CMD ["node", "index.js"]
```
- `FROM node:16`: Docker images are usually built on top of other base images, for example OS images. The NodeJS team has built an image that contains all NodeJS tools on top of a Debian distribution. Actually, the NodeJS team has built dozens of images based on various OSes and architectures: see the official NodeJS page on Docker Hub. Here, our image will be based on the official Node v16.x image.
- `WORKDIR /app`: defines `/app` as the working directory in the container; subsequent instructions such as `COPY`, `RUN` and `CMD` work directly in this directory.
- `COPY . .`: copies everything from the current directory on the development host to the working directory in the image (here `/app`, as defined in `WORKDIR`), except the files and directories listed in `.dockerignore`.
- `RUN`: executes commands inside the image being built, for example to install software packages or to apply some configuration. `RUN npm install -y express --production` installs Express by running `npm` from within the container. The `--production` option makes `npm` install only the packages needed in production (development dependencies are skipped), thus decreasing the size of the image.
- `EXPOSE 3000`: documents that the application in the container listens on port `3000`. On its own, it does not make the port reachable from the host machine: that is done with the `-p` option of `docker run`, as we’ll see below.
- `CMD ["node", "index.js"]`: is the command executed when the container starts.
Build the image
We are now ready to build the Docker image:

```shell
docker build . -t mszmurlo/docktest:0.1
```

- `.` is the path where `Dockerfile` and the local files (source code, configurations, etc.) are found. It can also be a URL, for example a GitHub repository.
- `-t` introduces a tag, that is, a human-readable identification of the image. Tagging is not mandatory, but if no tag is provided the image will be referenced only by its ID, something like `eff629089685`. Not really handy! The tag string can be anything, but if the image is to be pushed to a repository like Docker Hub, it has to have the format `repository/image:version`. The different elements of the tag string are:
  - `repository` is the username on Docker Hub.
  - `image` is the name of the application we are dockerizing. It can be anything.
  - The `version` part is actually the real tag. By convention it follows the format `major.minor.patch`; if it is not provided, the keyword `latest` is used.
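To make the `repository/image:version` convention concrete, here is a small sketch of how such a string splits into its parts. `parseTag` is a hypothetical helper written for this post, not part of any Docker tooling, and it deliberately ignores the more complex forms a real image reference can take (a registry host with a port, a digest, etc.). Official images such as `node` have no repository part.

```javascript
// Split a simplified Docker Hub reference "repository/image:version" into parts.
// Registry hosts, ports and digests are intentionally not handled.
function parseTag(reference) {
  const slash = reference.indexOf('/');
  const repository = slash === -1 ? null : reference.slice(0, slash);
  const rest = slash === -1 ? reference : reference.slice(slash + 1);
  const colon = rest.lastIndexOf(':');
  const image = colon === -1 ? rest : rest.slice(0, colon);
  // No explicit version: Docker falls back to "latest".
  const version = colon === -1 ? 'latest' : rest.slice(colon + 1);
  return { repository, image, version };
}

console.log(parseTag('mszmurlo/docktest:0.1'));
// { repository: 'mszmurlo', image: 'docktest', version: '0.1' }
console.log(parseTag('node'));
// { repository: null, image: 'node', version: 'latest' }
```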
Building the image may take some time. If the base image is not already present on your system, it will be downloaded from Docker Hub. Moreover, the package installation step `RUN ...` (if any) may also introduce some delays.
Once the build process is finished, we can verify that our image is on the system by issuing `docker images` (or `docker image ls`):

```
$ docker images
mszmurlo/docktest 0.1 508c39a1924b 8 seconds ago 910MB
node 16 1e151315aa91 2 weeks ago 906MB
```
Just keep an eye on the size of the image: 910MB looks like a lot of space for an application that weighs a few kB… We’ll work on that later.
Run the container
Running a container is quite straightforward:
```shell
docker run -d --rm -p 3080:3000 mszmurlo/docktest:0.1
```

- `-d` detaches the container from the terminal and runs it in the background. If something goes wrong, though, there will be no output on the terminal; in that case, remove the `-d` option until everything is fixed.
- `--rm` removes the container after it exits. If you don’t take care, it’s quite easy to “saturate” the system with stopped containers: I recently recovered about 130GB just by deleting dead containers.
- `-p 3080:3000` maps port `3080` on the host to port `3000` in the container, the one that is `EXPOSE`-d by the image and that the application listens on.
This command returns the ID of the container so that we can further communicate with it (get logs, stop it, etc).
To see all the running containers, issue `docker ps`. To test the container, point a browser to `http://localhost:3080/`:

```
{
  sid: "e3d98dc8-3dc8-41ed-b7dc-a7e5ec9c325e",
  resp: "Hello world"
}
```
Some useful commands on running containers
Having a running container is very close to having a VM running an application: we may want to inspect what is going on inside the container, the resources it uses, etc.
- `docker ps`: lists all running containers.
- `docker container ls`: lists running containers. Without options, this is equivalent to `docker ps`. The `-a` option adds non-running containers as well.
- `docker stats`: streams the resources used by all running containers, much like `vmstat` does. The result is refreshed every few seconds unless the `--no-stream` flag is provided.
- `docker container logs <container ID>`: displays the logs of the container. `logs -f` follows the output, just as `tail -f` would.
- `docker container attach <container ID>`: attaches the host’s standard input, output and error streams to a running container. Warning: once connected to the standard streams, you won’t be able to detach with `Ctrl-C`, as everything you type is passed to the container.
- `docker exec -it <container ID> <cmd> [<opts>]`: executes the command `cmd` with options `opts` inside the container while it is running. The most useful case is probably to open a shell: `docker exec -it <container ID> /bin/bash` (or `/bin/sh` on Alpine-based images, which don’t ship `bash`). Once you have a shell, an interesting thing to try is `ps auxww`…
- `docker stop <container ID>`: stops the container gracefully by sending it a `SIGTERM` signal. If it shuts down within 10 seconds (the default value), fine; if not, a definitive `SIGKILL` is sent.
- `docker kill <container ID>`: by default, stops the container immediately by sending it a `SIGKILL` signal. However, just like the badly named shell `kill` command, it lets you specify the signal to send with the `--signal` option.

Stopping or killing all running containers can be done in one line with `docker stop $(docker ps -q)` or `docker kill $(docker ps -q)`.

Check the Docker command line documentation for all available `docker` commands.
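Since `docker stop` relies on `SIGTERM`, it is worth checking how the demo application reacts to it. With `CMD ["node", "index.js"]`, `node` runs as PID 1 in the container, and the Node.js Docker guidelines point out that a process running as PID 1 does not get the default signal handling: without an explicit handler, the application may simply ignore `SIGTERM` and only stop when the `SIGKILL` arrives 10 seconds later. Here is a minimal sketch of such a handler; `makeShutdownHandler` is a hypothetical name, and its `exit` parameter exists only so the handler can be tested without ending the process. Alternatively, `docker run --init` starts a small init process as PID 1 that forwards signals to the application.

```javascript
// Build a signal handler that stops accepting new connections and exits with
// status 0 once the server has closed. `exit` defaults to process.exit and is
// injectable only so the handler can be exercised in isolation.
function makeShutdownHandler(server, exit = process.exit) {
  return function onSignal(signal) {
    console.log(`Received ${signal}, closing server`);
    server.close(() => exit(0));
  };
}

// In index.js this would be registered right after app.listen(), e.g.:
//   process.on('SIGTERM', makeShutdownHandler(server))
module.exports = { makeShutdownHandler };
```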
Create small docker images
We have seen in the previous section that the size of the image is more than 900MB. This is huge!
Obviously, fat images take more time to be pulled from a repository over the network than slim ones, which delays application startup. They also use more storage for production images, which we usually pay for with a cloud provider. Images are fat because they include lots of software. This software is certainly useful during development, for debugging, but most probably unused in production. Moreover, having all this software available increases the attack surface for the bad guys. The objective is therefore to reduce the size of our application’s image.
Here are a few techniques to achieve this goal:
- Use a prepackaged image designed to be small. This is most probably the safest option if the packaging is done by the developers of the base application, here NodeJS.
- Use a small base OS image and install the needed software on it. The safest way to do this is to use the OS’s package manager, as the available packages are supposed to have been tested before being released.
- Use a small base OS image and install from source. This option is difficult for two main reasons: first, because you’ll need to include the compilation tools, libraries, etc., and second, because you will have to carry out all the tests yourself.
- Erase whatever is not necessary, typically installation log files, package files, etc.
Create a small image based on Alpine Linux
The first idea might be to strip a base image down to only what is needed. Unless you know exactly what you are doing, this is difficult and possibly dangerous, as you might simply break your OS. Fortunately, there are several distributions designed to be small. One of them is Alpine Linux.
After having a look at the official Node images on Docker Hub, we can find an official NodeJS version packaged on Alpine: `node:16-alpine`. Let’s give it a try.
Create a new image description file by copying `Dockerfile` (the `alpine-1` suffix is not part of Alpine versioning: as we will create several image description files, it’s just a way to keep track of the different versions):

```shell
cp Dockerfile Dockerfile.alpine-1
```

Then, in `Dockerfile.alpine-1`, change the `FROM` directive from

```dockerfile
FROM node:16
```

to

```dockerfile
FROM node:16-alpine
```
Build the image with:

```shell
docker build . -t mszmurlo/docktest:0.1.1 -f Dockerfile.alpine-1
```
`docker images` gives us a size of 115MB for our newly created image. Not so bad for just one change, but let’s make it even smaller.
Create a smaller image based on Alpine
Here we will start with a fresh Alpine image and install NodeJS and npm “manually” with Alpine’s package manager.
Here is the content of `Dockerfile.alpine-2`:
```dockerfile
# Build from basic Alpine Linux
FROM alpine

# Create app directory
WORKDIR /app

# Install Node and npm with Alpine's package manager
RUN apk add --update nodejs
RUN apk add --update npm

# Install the application's dependencies and the app itself
COPY . .
RUN npm install -y express --production

# Expose the port to the outside world
EXPOSE 3000

# Define the command to start the server
CMD ["node", "index.js"]
```
The main changes here are the `FROM` directive and the two `RUN` commands that install NodeJS and npm with `apk`.
We can now build the new image:

```shell
docker build . -t mszmurlo/docktest:0.1.2 -f Dockerfile.alpine-2
```
And the result is… Tadaaaa:

```
$ docker images
mszmurlo/docktest 0.1.2 12fb02695cfa 5 seconds ago 62.2MB
mszmurlo/docktest 0.1.1 4bd494d70f63 About an hour ago 115MB
mszmurlo/docktest 0.1 508c39a1924b 20 hours ago 910MB
```
About 62MB instead of 115MB or 910MB before. Can we do better?
Create an even smaller image
Let’s have a look at the structure of an image.
`docker image history <image ID>` prints all the commands that have been used to build the image. When run on `docktest:0.1.2`, I get:
```
84263b7d1d5e 3 days ago /bin/sh -c #(nop) CMD ["node" "index.js"] 0B
59072440a130 3 days ago /bin/sh -c #(nop) EXPOSE 3000 0B
7438d3758d47 3 days ago /bin/sh -c npm install -y express --producti… 2.55MB
5c0dcc2c783b 3 days ago /bin/sh -c #(nop) COPY dir:145d7f8d3ac9801b5… 35.6kB
38047790e994 3 days ago /bin/sh -c apk add --update npm 8.13MB
6600b323857f 3 days ago /bin/sh -c apk add --update nodejs 45.9MB
8a5ecf05b642 3 days ago /bin/sh -c #(nop) WORKDIR /app 0B
c059bfaa849c 3 months ago /bin/sh -c #(nop) CMD ["/bin/sh"] 0B
<missing> 3 months ago /bin/sh -c #(nop) ADD file:9233f6f2237d79659… 5.59MB
```
Each of these lines describes a layer. Layers are a kind of versioning system used by Docker, where each layer is built on top of the previous one. Starting from the bottom, the first two lines correspond to the alpine base image. On lines 4, 5 and 7 come the installation of node (46MB), of npm (8MB) and of express (2.5MB), the web framework. Line 6 corresponds to copying all the files in the source directory except what is listed in `.dockerignore`.
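A quick way to convince ourselves that an image is just the stack of its layers is to add up the sizes from the history above. The sketch below does exactly that with values copied by hand from the output; it assumes, as I understand it, that Docker prints sizes with decimal units (1kB = 1000B, 1MB = 1000kB).

```javascript
// Layer sizes as printed by `docker image history` for docktest:0.1.2,
// including the zero-size metadata layers (CMD, EXPOSE, WORKDIR).
const layerSizes = ['0B', '0B', '2.55MB', '35.6kB', '8.13MB', '45.9MB', '0B', '0B', '5.59MB'];

// Assumption: decimal units, as used by the Docker CLI output.
const UNITS = { B: 1, kB: 1e3, MB: 1e6, GB: 1e9 };

function toBytes(size) {
  const match = /^([\d.]+)(B|kB|MB|GB)$/.exec(size);
  if (!match) throw new Error(`unexpected size: ${size}`);
  return parseFloat(match[1]) * UNITS[match[2]];
}

const totalMB = layerSizes.map(toBytes).reduce((a, b) => a + b, 0) / 1e6;
console.log(`${totalMB.toFixed(1)}MB`); // 62.2MB
```

The total matches the 62.2MB reported by `docker images` for this image.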
npm is only needed to download and install the packages, not to run the application. Thus, it can be removed at the end of the installation process. The same holds for the `APKINDEX*` files, which are Alpine’s indexes of available packages. Add the following lines to the content of `Dockerfile.alpine-2` and save the result as `Dockerfile.alpine-3`:
```dockerfile
RUN apk del npm
RUN rm -f /var/cache/apk/APKINDEX*tar.gz
```

and build the image:

```shell
docker build . -t mszmurlo/docktest:0.1.3 -f Dockerfile.alpine-3
```
And the resulting image weighs… 62.3MB, meaning its size has increased! Actually, this is not as strange as it looks. Every time we run an `ADD`, `COPY` or `RUN` command, we add a layer and alter the image’s file system, because we add or remove some files. Each image layer can then be seen as the difference between the state of the file system before and after the command, just like commits in `git`. Removing a file in a new layer does not shrink the earlier layer that contains it: the file is only hidden from the final file system, so npm’s 8MB are still part of the image, and the two new layers add a little on top. Docker works that way to speed up the image download and build processes. However, once we are happy with the application packaged inside the image, we don’t need that information any more.
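The following toy model, written for this post and in no way Docker’s actual storage implementation, illustrates why deleting a file in a separate layer does not make the image smaller. Each layer lists the files it adds and the files it deletes; the image keeps the bytes stored by every layer, while the final file system only shows what has not been deleted. The file names and sizes (in MB) are hypothetical.

```javascript
// Toy model: returns the bytes stored across all layers and the size of what
// remains visible in the final file system.
function imageStats(layers) {
  const visible = new Map(); // path -> size in the final file system
  let storedSize = 0;        // sum of what every layer keeps
  for (const { add = {}, del = [] } of layers) {
    for (const [path, size] of Object.entries(add)) {
      if (del.includes(path)) continue; // created and removed in the same layer: never stored
      visible.set(path, size);
      storedSize += size;
    }
    for (const path of del) visible.delete(path); // only hides files from earlier layers
  }
  let visibleSize = 0;
  for (const size of visible.values()) visibleSize += size;
  return { storedSize, visibleSize };
}

const separateLayers = [
  { add: { app: 2 } },
  { add: { 'build-tool': 8 } },
  { del: ['build-tool'] }, // a RUN that only removes the tool
];
const sameLayer = [
  { add: { app: 2 } },
  { add: { 'build-tool': 8 }, del: ['build-tool'] }, // install, use and remove in one RUN
];

console.log(imageStats(separateLayers)); // { storedSize: 10, visibleSize: 2 }
console.log(imageStats(sameLayer));      // { storedSize: 2, visibleSize: 2 }
```

Removing the file in its own layer only hides it (10MB stored for 2MB visible), whereas adding and removing it within a single layer never stores it at all, which is what chaining commands in a single `RUN` makes possible.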
There are two ways to deal with this.
The “old” one is to reduce the number of layers, that is, the number of commands that are run on the file system. The trick is to write:
```dockerfile
RUN apk add --update nodejs \
    && apk add --update npm

COPY . .

RUN npm install -y express --production \
    && apk del npm \
    && rm -f /var/cache/apk/APKINDEX*tar.gz
```

rather than:

```dockerfile
RUN apk add --update nodejs
RUN apk add --update npm

COPY . .

RUN npm install -y express --production
RUN apk del npm
RUN rm -f /var/cache/apk/APKINDEX*tar.gz
```
Notice the `\` (backslash character) used for line continuation in the first snippet. Each `RUN` command is one layer. Simple. Make these changes in `Dockerfile.alpine-3`, save the result as `Dockerfile.alpine-4` and create the image `docktest:0.1.4`. Its size has become 59.9MB. Not as impressive as before, but better than nothing.
As this problem has been identified by many for a long time, the Docker team introduced a new syntax in `Dockerfile` to solve it. The basic idea is to build a first image, then copy some files from that image into the next one, and so on. This method is called a multi-stage build. I’m not going to get into it, as the Docker documentation on multi-stage builds is quite clear.
Alternatives to Alpine Linux
Alpine is great, but there are some challenges working with it. Alpine is built on top of the `musl` C library, an alternative to `glibc`, which is the reference C library in the Linux world. Running an application linked against a C library that is different from the one used during development might be risky. I can’t evaluate this risk, nor have I run into trouble myself, but whoever has run software in production knows that minimizing the differences between the development environment and the production environment is always a good idea.
So if you are a bit paranoid, you may want to choose an image adapted from a traditional distribution, such as the Ubuntu-based baseimage, which holds an Ubuntu within 8.5MB, or minideb, a minimalistic Debian-based image built specifically to be used as a base image for containers.
The different steps in the previous sections led us from an image that weighed more than 900MB to a tiny(-ier) one whose size was less than 60MB, a reduction by a factor of about 15. The techniques we have seen here are not specific to NodeJS-based applications running on Alpine Linux. Using the right base image, reducing the software to what is needed in production and minimizing the number of layers are universal principles.
Persisting the data of a container
Some applications are immutable, as they contain only immutable data, for example a static web site where all pages are pre-generated with a system like Jekyll or Hugo. Even our demo application from the previous section is immutable. While this situation is very desirable because it’s very safe, some applications need to persist (to write) data on a disk and retrieve that data later. This is of course the case with databases. But before getting our hands dirty, let’s introduce volumes.
Whenever a container writes some data to a file system (actually its internal file system), this data disappears as soon as the container is removed, for example when a container started with `--rm` crashes and a new one is started to replace it. Volumes come to the rescue. They are a way to connect a mount point in the container to a storage space on the host. This way, if a container is replaced, the data will still be available. Docker defines two types of volumes: named volumes and bind mounts.
Bind mounts
A bind mount mounts an existing directory of the host onto a directory in the container. For example:

```shell
docker run -it --rm -w /app -v "$(pwd)/tmp:/app" alpine
```

The `-v` option introduces a volume. The volume syntax is split into two parts separated by a colon (`:`) character:

- `$(pwd)/tmp` is the absolute path on the host machine. In the example above, we want to mount the `tmp` subdirectory of the current directory.
- `/app` is the mount point in the container, here the `/app` directory that we defined as the working directory with `-w`.
With the previous command, the content of the directory `$(pwd)/tmp` on the host will be available in the container under `/app`. To get convinced, try the following:

```shell
## we are on the host
$ mkdir tmp
$ cat > tmp/testfile <<EOF
this is the content of test file
EOF
$ docker run -it --rm -w /app -v "$(pwd)/tmp:/app" alpine

## Here we are in the container with a root shell
# cat /app/testfile
this is the content of test file
# ^D

## back to the host
$ rm -rf tmp
```
Bind mounts are very convenient if we want to pass some data to the container at startup, for example some configuration, or if we use the container in interactive mode for application development, with the container running a specific version of a compiler, an interpreter, etc.
Bind mounts have one drawback, however: the process in the container usually runs as `root`, so if it writes a file in a bind mount, the file will belong to root on the host as well, since root always has UID 0. Even if the owner of the process in the container is changed to a regular user, there is little chance that its UID matches the one of the user on the host. To be convinced of the issue, try the following:

```shell
## Here we are on the host machine
$ mkdir tmp
$ docker run -it --rm -w /app -v "$(pwd)/tmp:/app" alpine

## here we are in the container
# touch /app/toto
# ^D

## and back on the host
$ ls -l tmp
total 0
-rw-r--r-- 1 root root 0 mars 5 08:37 toto
```
We won’t get into how to handle this here: this post explains a solution very well, and this one gives a small correction.
Named volumes
A named volume is a dedicated storage space managed directly by Docker. We don’t have to know where it is located: the only thing we need to remember is its name.
Volumes can be created with the command `docker volume create <volume_name>` and mounted as follows:

```shell
docker volume create alp_storage
docker run -it --rm -w /app -v "alp_storage:/app" alpine
```
They can be listed with `docker volume ls` and deleted with `docker volume rm <volume_name>`. The `create` command accepts some options, such as the volume size, its owner, its type, etc., but we won’t get into that. See the `docker volume create` documentation for more information.
Notice that named volumes don’t need to be created in advance: if a volume does not exist before the container is started, it will be created on the fly.
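To see a volume at work, consider a tiny piece of state the application could keep on disk. The sketch below, with the hypothetical helper `incrementCounter`, stores a counter in a JSON file. The assumption is that the file lives under a mount point such as `/data` backed by a named volume (the volume name `counter_data` is made up for this example, e.g. `-v counter_data:/data`): the counter then keeps growing across successive `docker run --rm` invocations, whereas it would restart at 1 each time if the file lived in the container’s own file system.

```javascript
const fs = require('fs');
const path = require('path');

// Increment a counter stored as JSON in `file` and return the new value.
function incrementCounter(file) {
  let count = 0;
  try {
    count = JSON.parse(fs.readFileSync(file, 'utf8')).count;
  } catch (err) {
    if (err.code !== 'ENOENT') throw err; // first run: the file does not exist yet
  }
  count += 1;
  fs.mkdirSync(path.dirname(file), { recursive: true });
  fs.writeFileSync(file, JSON.stringify({ count }));
  return count;
}
```

In a container, calling `incrementCounter('/data/counter.json')` once at startup would give 1, 2, 3… on successive runs sharing the volume.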
Networking: connecting multiple containers
The power of Docker is not only that “what runs on my laptop runs unchanged in production” (yet this is a very important feature), but also that we can connect different containers running on the same host or on different hosts. To connect containers together, we need to define a network.
In this section we’ll start one container from the `docktest:0.1.4` image to play the role of a server, and a second container from the basic `alpine` image to play the role of the client. From the client, we’ll query the server with `wget`. But first, let’s see how to manipulate networks.
Manipulating networks
Just as for volumes, Docker has commands to manipulate networks. The commands below are subcommands of `docker network`:

- `ls`: lists the available networks.
- `create`: creates a new network. The syntax is `docker network create --driver <drv_type> <net_name>`, where `--driver` defines the type of network we want to create. The most commonly used ones are:
  - `bridge`, for a standalone network allowing containers running on the same host to communicate with each other.
  - `overlay`, which creates a network among containers running on several hosts, typically in a swarm, Docker’s native alternative to Kubernetes.

  Other drivers are available for defining networks for more specific uses, but we won’t get into these here.

  There is one additional network driver, the `host` driver. This kind of network allows a container to use the host’s network directly. Incidentally, there can only be one instance of the `host` network, and it exists by default; thus, such a network cannot be created.
- `rm`: deletes a network.
- `connect` connects a container to a network, and `disconnect` disconnects it from that network.
Standalone networking
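By default Docker defines a `bridge` network called `bridge` (it shows up in the output of `docker network ls`), but it is not a good idea to use it in production, and Docker advises working with user-defined networks instead: among other advantages, user-defined bridge networks let containers resolve each other’s names, which the default network does not.
Let’s first create `testnet`, our test network:

```shell
docker network create --driver bridge testnet
```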
and start the containers.

In terminal 1:

```shell
docker run --rm -it --name client --network testnet alpine
```

In terminal 2:

```shell
docker run --rm -d --name server --network testnet mszmurlo/docktest:0.1.4
docker exec -it server sh
```
In both terminals, get the IP address of the container with `ifconfig`, and try `wget -O - -q http://<server IP>:3000/` in the client (replace `<server IP>` with whatever address you’ve found for the server).
One nice thing is that containers connected to a user-defined network can resolve container names to their IP addresses, so the query above can be written `wget -O - -q http://server:3000/`. Much more user friendly…
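From the application’s point of view, this name resolution is plain DNS: Docker runs an embedded DNS server that answers with the container’s address. Since the `docktest` image ships Node, a lookup can be tried from inside the `server` container. The `resolveName` helper below is a hypothetical name; it uses Node’s `dns.promises.lookup`, which goes through the system resolver just as `wget` does.

```javascript
const dns = require('dns').promises;

// Resolve a host name through the system resolver (getaddrinfo). On a
// user-defined Docker network, container names such as "client" or "server"
// are answered by Docker's embedded DNS server.
async function resolveName(name) {
  const { address, family } = await dns.lookup(name);
  return { name, address, family };
}

module.exports = { resolveName };
```

A one-off equivalent, run from the host, could be `docker exec -it server node -e "require('dns').promises.lookup('client').then(console.log)"`.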
Finally, an already running container can be connected to an existing network. In terminal 3, start a container `client2` without specifying any network (it is then attached to the default `bridge` network): `docker run --rm -it --name client2 alpine`, and in yet another terminal, connect this container to the `testnet` network: `docker network connect testnet client2`. All three containers can now communicate with each other.
Host networking
The `host` network driver allows the container to use the host’s network directly. Typically, if you start a container running Nginx that listens on port `80`, it is port `80` of the host that will be used, without any mapping with `-p`. The syntax is as follows:

```shell
docker run -d --rm --network host --name docktest mszmurlo/docktest:0.1.4
```
Then try the following in a terminal on the host:

```shell
# Our container listens and replies to requests on port 3000
$ curl http://localhost:3000/
{"sid":"8f508937-0c29-4831-8b49-3896a36e665f","resp":"Hello world"}

# And it's really on the host itself:
$ netstat -t | grep 3000
tcp 0 0 localhost:51834 localhost:3000 TIME_WAIT

# Just to be really sure that this happens on the host and not in the container:
$ nc -l -p 3000
nc: Address already in use
```
Notice that the `host` network driver only works on Linux.
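The `nc -l -p 3000` trick above simply tries to bind the port. The same check can be written in Node with the built-in `net` module; `isPortInUse` is a hypothetical helper. Run on the host while the container started with `--network host` is up, `isPortInUse(3000)` should resolve to `true`.

```javascript
const net = require('net');

// Resolve to true if something already listens on `port`, by trying to bind it
// ourselves and checking for EADDRINUSE (the same idea as `nc -l -p <port>`).
function isPortInUse(port) {
  return new Promise((resolve, reject) => {
    const probe = net.createServer();
    probe.once('error', (err) => {
      if (err.code === 'EADDRINUSE') resolve(true);
      else reject(err); // e.g. EACCES for privileged ports
    });
    // The port was free: release it immediately.
    probe.once('listening', () => probe.close(() => resolve(false)));
    probe.listen(port);
  });
}
```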
This post has covered most of the basics of Docker: how to create an image and how to make it reasonably small, how to attach a volume to a container to persist some data, and how to make containers communicate with each other. Sure, you have not become a Docker ninja yet, but I hope it’s enough to be able to talk about it.
- Docker home page
- Docker documentation home page
- Docker command line documentation
- Multi-stage builds
- Alpine Linux
- Baseimage docker: A minimal Ubuntu base image modified for Docker-friendliness
- Docker Hub: Docker Hub is the world’s largest library and community for container images
- Issue with docker file permissions on mount binds
- Reducing Image Size
Appendix - Docker command line snippets
General commands
- `docker help`: displays the list of all the available commands.
- `docker help <command>`: displays the list of all the available subcommands of `<command>`.
- `docker help <command> <subcommand>`: displays the help for `<subcommand>`.
Image related commands
- `docker image ls` (or `docker images`): lists all images that have been downloaded or created on the host. The `-a` flag adds to the list all intermediate images that have been created during an image build process.
- `docker image rm <image-name>`: deletes the image named `<image-name>`. The name may be the real name of the image or its ID.
- `docker build . -t <name>:<tag>`: builds an image from the `Dockerfile` present in the working directory. The image will be named `<name>:<tag>`. An alternative build file can be provided with the `-f` option.
Container related commands
- `docker ps`: lists all running containers.
- `docker stats` (optionally with `--no-stream`): lists the resources used by all running containers. The result is streamed every few seconds unless the `--no-stream` flag is provided.
- `docker container ls`: lists running containers. The `-a` option adds non-running containers as well.
- `docker run -d --rm -p <port mapping> -v <volume mapping> <img>`: creates a container from the image `<img>` and runs it. If the image is not available locally, it will be downloaded from a repository, which by default is Docker Hub.
  - `-d` runs the container as a daemon (in the background).
  - `--rm` removes the container when it stops.
  - `-p` maps a port in the container to a port on the local machine. It is of the form `-p <port on the host>:<port in the container>`. For example, if the containerized application listens on port `3000` and we want to access it from the host on port `4000`, we will write `-p 4000:3000`.
  - `-v` maps a volume from the host onto a directory inside the container:
    - for a bind mount of the current directory on `/app`: `-v "$(pwd)":/app`
    - for a volume named `my_volume`: `-v my_volume:/app`

    Notice that the `--mount` option is now recommended to mount volumes on the container’s file system:
    - for a bind mount: `--mount type=bind,source="$(pwd)",target=/app`
    - for a volume named `my_volume`: `--mount source=my_volume,target=/app`
- `docker container logs <container ID>`: displays the logs of the container. `logs -f` follows the output, just as `tail -f` does.
- `docker exec -it <container ID> <cmd> [<opts>]`: executes the command `cmd` with options `opts` inside the container while it is running. The most useful case is probably to open a shell: `docker exec -it <container ID> /bin/bash`.
- `docker stop <container ID>`: stops the container gracefully by sending it a `SIGTERM` signal.
- `docker kill <container ID>`: stops the container immediately by sending it a `SIGKILL` signal.