This post is part of the Jetbatsa series and explores Docker: how to build an image and how to run it. As an example, we’ll dockerize a small demo application written for NodeJS/Express.

Jetbatsa stands for Just Enough To Be Able To Speak About. This is the code name for posts that are check lists or quick notes for myself while I explore some topic and that I started to share recently. I’m definitely not a guru of any of the technologies discussed here.

What is Docker

Docker is a container management and runtime system. A container can be seen as a virtual machine stripped of everything, including the OS kernel and almost every system tool, except the libraries and executables necessary to run one application. The objective is to have the application and its runtime packaged so that it can be run (almost) isolated from the host it runs on and from any other container. A Docker container can run unchanged on any host that has Docker installed, be it a development workstation, a pre-production server, a production server or the cloud. Almost isolated means that some points of contact can be defined between containers and the host they run on, for example networks or storage volumes.

Here are some definitions; we’ll dig into them throughout this post.

  • Docker:
    Docker by itself is a daemon running on a host plus a few command-line tools, the main one being docker, which interacts with the daemon. docker lets us create images, start, stop and inspect containers, create storage volumes, create networks for the containers to communicate, etc.

  • Images:
    An image is a package of all necessary elements to run an application: executables, libraries, file system and some parameters. Images are created from a description contained in a file named by default Dockerfile.

  • Containers:
    A container is an instance of an image, and one can run as many containers based on a single image as needed. Each container has, however, its own parameters, for example its file system, a network address, the CPU/RAM resources that are allocated to it or its restart policy in case of failure.

  • Volumes:
    By default all files created inside a container are not persisted when that container no longer exists. Docker has two options for containers to persist files on the host machine even after the container is erased: volumes, and bind mounts. No matter which type is chosen, the data looks the same from within the container.

    • Volumes are objects created and managed by Docker.

    • Bind mounts are files or directories on the host that are mounted into the container.

  • Networks:
    Most of the time a container needs to communicate with other containers or with the host it runs on. Docker’s networking subsystem is based on drivers that implement different functionalities, for example:

    • bridge: allows communication between containers running on the same host. This is actually the default.

    • host: lets a container use the network of the host it runs on directly

    • overlay: allows communication between containers running on different hosts

    • There are other drivers which provide more specific control of the network.

Now that we have some vocabulary, let’s have some fun and create an image.

Creating a docker image

Creating an image is done by executing docker build ... against an image description file named by default Dockerfile. In this section we will create an image able to execute a demo application.

The demo application

The demo application is a NodeJS/Express application with two endpoints:

  • / returns the JSON object {sid: "...", resp: "Hello world"} where sid stands for the server ID, a UUID that is set when the Node application starts. This will help us to identify the application that responds, to notice when a container has been restarted, etc.

  • /kill will reply with {sid: "...", resp: "Server killed"} and will then kill the application server with a non zero exit code to simulate a crash.

All the details to create a NodeJS/Express application can be found in Express Jetbatsa

The application will be called docktest and the source code of the index.js file is as follows:

// Libraries and global variable section
var express = require('express')
var crypto = require('crypto')
var app = express()
var sid = crypto.randomUUID()

// Server start section. The server will be listening on port 3000
var port = 3000
var server = app.listen(port, function () {
    var host = server.address().address
    console.log(`Server sid='${sid}' listening at http://%s:%s`, host, port)
})

// Root endpoint: identifies the server instance that answers
app.get('/', function (req, res) {
    res.json({"sid": sid, "resp": "Hello world"})
})

// Kill endpoint: reply first, then exit with a non-zero code to simulate a crash
app.get('/kill', function (req, res) {
    res.on('finish', function () { process.exit(1) })
    res.json({"sid": sid, "resp": "Server killed"})
})

If you have NodeJS installed locally, you can start the application with node index.js and hit the endpoints with a browser, wget or curl.

The .dockerignore file

To build the image we will copy some files from the development machine, typically source files like index.js, into the image. Sometimes however, we don’t want to copy all the files. The .dockerignore file lists, one per line, the files and directories that we do not want to be copied. In our case, it has the following content:
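As a rough sketch, assuming a typical Node project rather than the exact docktest setup, such a file could be created as follows. The three entries are assumptions: node_modules and npm-debug.log are common local artifacts of npm, and .git only matters if the project is under Git.

```shell
# Write a minimal .dockerignore in the current directory.
# The entries are illustrative assumptions, not the exact docktest file.
cat > .dockerignore <<'EOF'
node_modules
npm-debug.log
.git
EOF
```

Ignoring node_modules is the important one here: the dependencies are installed inside the image by npm, so copying the local ones would only make the build context bigger.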


The Dockerfile file

Dockerfile is a text file which describes the steps to build the image. Let’s first see its content and then comment each line:

# Build from the latest LTS version
FROM node:16

# Create app directory
WORKDIR /app

# Install applications dependencies and the app itself
COPY . .
RUN npm install -y express --production

# Expose the port to the outside world
EXPOSE 3000

# Define the command to start the server
CMD ["node", "index.js"]


  • FROM node:16 : Docker images are usually built on top of other base images, for example OS images. NodeJS’s team has built an image that contains all NodeJS tools on top of a Debian distribution. Actually NodeJS’s team has built dozens of images based on various OSes and architectures. See the official NodeJS page on Docker Hub. Here, our image will be based on the official Node v16.x image.

  • WORKDIR /app: defines that the working directory in the container will be /app. COPY, RUN and some other commands will work directly in this directory.

  • COPY . .: copies everything from the current directory on the development host to the working directory in the container (here /app as defined in WORKDIR) except files and directories that are listed in .dockerignore.

  • RUN: executes some commands inside the container, for example the installation of software packages or some configuration. RUN npm install -y express --production installs Express by running npm from within the container. The --production option makes npm install only the packages needed in production, thus decreasing the size of the image.

  • EXPOSE 3000: declares that the application in the container listens on port 3000. EXPOSE alone does not publish the port on the host machine; the mapping to a host port is done with the -p option of docker run.

  • CMD ["node", "index.js"]: is the command to execute to start the container.

Build the image

We are now ready to build the docker image:

docker build . -t mszmurlo/docktest:0.1


  • . is the path where to find Dockerfile and the local files (source code, configurations, etc.). It can also be a URL, for example a GitHub repository.

  • -t introduces a tag, that is a human readable identification of the image. Tagging is not mandatory but if not provided, the image will be referenced only by its ID, that is something like eff629089685. Not really handy! The tag string can be anything but if the image is to be pushed on a repository like Docker Hub, it has to have the format repository/image:version. The different elements of the tag string are:

    • repository is the username on Docker Hub.

    • image is the name of the application we are dockerizing. Can be anything.

    • The version part is actually the real tag. It should have the format major.minor.patch and if not provided, the keyword latest will be appended.
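To make the structure of the tag string concrete, here is a small sketch that splits it into its three parts with plain shell parameter expansion. It assumes the simple repository/image:version form described above and does not try to handle registry host names or digests.

```shell
# Split an image reference of the form repository/image:version.
tag="mszmurlo/docktest:0.1"

repository="${tag%%/*}"   # text before the first '/'
rest="${tag#*/}"          # text after the first '/'
image="${rest%%:*}"       # text before the ':'

# Without an explicit version, Docker falls back to "latest".
case "$rest" in
    *:*) version="${rest#*:}" ;;
    *)   version="latest" ;;
esac

echo "repository=$repository image=$image version=$version"
```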

Building the image may take some time. If the base image is not already present on your system, it will get downloaded from Docker Hub. Moreover, the package installation step RUN ... (if any) may also introduce some delays.

Once the build process is finished, we can verify we have our image on the system by issuing docker images (or docker image ls):

$ docker images
REPOSITORY          TAG        IMAGE ID       CREATED         SIZE
mszmurlo/docktest   0.1        508c39a1924b   8 seconds ago   910MB
node                16         1e151315aa91   2 weeks ago     906MB


Just keep an eye on the size of the image: 910MB looks like a lot of space for an application that weighs a few kB… We’ll work that out later.

Run the container

Running a container is quite straightforward:

docker run -d --rm -p 3080:3000 mszmurlo/docktest:0.1


  • -d detaches the container from the terminal and runs it as a daemon. If something goes wrong though, there will be no output on the terminal. In that case, remove the -d option until all is fixed.

  • --rm removes the container after exit. If you don’t take care, it’s quite easy to “saturate” the system with stopped containers. I recently recovered about 130GB just by deleting dead containers.

  • -p 3080:3000 maps port 3080 on the host to port 3000 in the container, the one that is EXPOSE-ed and that the application listens on.

This command returns the ID of the container so that we can further communicate with it (get logs, stop it, etc).

To see all the running containers, issue docker ps. To test the container, point your browser to http://localhost:3080/:

{
  sid: "e3d98dc8-3dc8-41ed-b7dc-a7e5ec9c325e",
  resp: "Hello world"
}
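Since the sid changes every time the application starts, comparing two values is an easy way to tell whether the container has been restarted in between. The sketch below extracts the sid from a response with sed. The response is hard-coded from the example above; fetching it with curl -s, as hinted in the comment, is an assumption about your setup.

```shell
# Sample response; in practice it could come from something like:
#   resp=$(curl -s http://localhost:3080/)
resp='{"sid":"e3d98dc8-3dc8-41ed-b7dc-a7e5ec9c325e","resp":"Hello world"}'

# Keep only the value of the "sid" field.
sid=$(printf '%s\n' "$resp" | sed -n 's/.*"sid":"\([^"]*\)".*/\1/p')

echo "$sid"
```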


Some useful commands on running containers

Having a running container is very close to having a VM running an application: we may want to inspect what is going on inside the container, the resources it uses, etc.

  • docker ps:
    List all running containers.

  • docker container ls:
    List running containers. Without options, this is equivalent to docker ps. The -a option adds non-running containers as well.

  • docker stats:
    Streams the resources used by all running containers, in the very same way as vmstat or top. The result is refreshed every few seconds unless the --no-stream flag is provided.

  • docker container logs <container ID>: displays the logs from the container. logs -f will follow the output, just as tail -f would.

  • docker container attach <container ID>: attaches the host’s standard input, output, and error streams to a running container. Warning: once connected to the standard streams you won’t be able to detach with Ctrl-C, as everything you type is passed to the container.

  • docker exec -it <container ID> <cmd> [<opts>]: executes the command cmd with options opts inside the container while the container is running. The most useful use case is probably to attach a shell: docker exec -it <container ID> /bin/bash. Once you have a shell, an interesting thing to try is ps auxww.

  • docker stop <container ID>:
    Gracefully stops the container by sending a SIGTERM signal to it. If it shuts down within 10 seconds (default value), fine; if not, a definitive SIGKILL is sent.

  • docker kill <container ID>:
    By default, immediately stops the container by sending a SIGKILL signal to it. However, this command acts just like the badly named shell kill command, meaning that you can specify the signal to be sent with the --signal option.

  • Stopping or killing all running containers can be done in one line with docker stop $(docker ps -q) or docker kill $(docker ps -q).
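To get a feel for what a graceful stop means, here is a sketch that needs no Docker at all: a background shell function plays the role of the containerized process and reacts to SIGTERM, the signal docker stop sends first. The function name worker is made up for the illustration.

```shell
# Stand-in for the main process of a container.
worker() {
    # Clean shutdown on SIGTERM, the first signal sent by `docker stop`.
    trap 'echo "SIGTERM received, shutting down"; exit 0' TERM
    while :; do sleep 1; done
}

worker &            # start the fake service in the background
pid=$!
sleep 1             # give it time to install its trap
kill -TERM "$pid"   # what `docker stop` does first
wait "$pid"
status=$?
echo "worker exited with status $status"
```

A process that ignores SIGTERM would keep running here, which is the case where docker stop ends up sending SIGKILL after the timeout.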

Check the documentation for all available docker commands.

Create small docker images

We have seen in the previous section that the size of the image is more than 900MB. This is huge!

Obviously, fat images take more time to be pulled from a repository over the network than slim ones and delay application startup. They also use more storage, which we usually pay for with a cloud provider. Images are fat because they include lots of software. This software is certainly useful during development for debugging but most probably unused in production. Moreover, having all this software available increases the attack surface for the bad guys. The objective is therefore to reduce the size of the image of our application.

Here are a few techniques to achieve this goal:

  1. use a prepackaged image designed to be small. This is most probably the safest option if the packaging is done by the developers of the base application, here NodeJS.

  2. use a small base OS image and install the needed software on it. The safest way to do this is to use the OS’s package manager, as available packages are supposed to have been tested before they were released.

  3. use a small base OS image and install from source. This option is difficult for two main reasons: firstly, because you’ll need to include the compilation tools, libraries, etc. and, secondly, because you will have to carry out all the tests yourself.

  4. erase whatever is not necessary, typically installation log files, package files, etc.

Create a small image based on Alpine Linux

The first idea might be to strip down a base image to only what is needed. Unless you know exactly what you are doing, this is difficult and possibly dangerous as you might simply break your OS. Fortunately, there are several distributions designed to be small. One of them is Alpine Linux.

After having a look at the official Node images on Docker Hub, we can find an official NodeJS version packaged on Alpine: node:16-alpine. Let’s give it a try

  1. Create a new image description file by copying Dockerfile into Dockerfile.alpine-1 (the -1 at the end of alpine-1 is not part of Alpine versioning: as we will create several image description files, it’s just a way to keep track of the different versions):

    cp Dockerfile Dockerfile.alpine-1
  2. Change the FROM directive from

    FROM node:16

    to

    FROM node:16-alpine
  3. Build the image with:

    docker build . -t mszmurlo/docktest:0.1.1 -f Dockerfile.alpine-1

docker images gives us a size of 115MB for our newly created image. Not so bad for just one change, but let’s make it even smaller.

Create a smaller image based on Alpine

Here we will start with a fresh Alpine image and install NodeJS and NPM applications “manually” with the package manager.

Here is the content for Dockerfile.alpine-2:

# Build from basic Alpine Linux
FROM alpine

# Create app directory
WORKDIR /app

# Install Node and npm with Alpine's package manager
RUN apk add --update nodejs
RUN apk add --update npm

# Install applications dependencies and the app itself
COPY . .
RUN npm install -y express --production

# Expose the port to the outside world
EXPOSE 3000

# Define the command to start the server
CMD ["node", "index.js"]

The main changes here are the FROM and the RUN lines.

We can now build the new image:

docker build . -t mszmurlo/docktest:0.1.2 -f Dockerfile.alpine-2

And the result is… Tadaaaa:

$ docker images
REPOSITORY          TAG        IMAGE ID       CREATED             SIZE
mszmurlo/docktest   0.1.2      12fb02695cfa   5 seconds ago       62.2MB
mszmurlo/docktest   0.1.1      4bd494d70f63   About an hour ago   115MB
mszmurlo/docktest   0.1        508c39a1924b   20 hours ago        910MB

About 62MB instead of 115MB or 910MB before. Can we do better?

Create an even smaller image

Let’s have a look at the structure of an image.

docker image history <image ID> prints all the commands that were used to actually build the image. When run on docktest:0.1.2, I get:

IMAGE        CREATED      CREATED BY                                      SIZE
84263b7d1d5e 3 days ago   /bin/sh -c #(nop)  CMD ["node" "index.js"]      0B        
59072440a130 3 days ago   /bin/sh -c #(nop)  EXPOSE 3000                  0B        
7438d3758d47 3 days ago   /bin/sh -c npm install -y express --producti…   2.55MB    
5c0dcc2c783b 3 days ago   /bin/sh -c #(nop) COPY dir:145d7f8d3ac9801b5…   35.6kB    
38047790e994 3 days ago   /bin/sh -c apk add --update npm                 8.13MB    
6600b323857f 3 days ago   /bin/sh -c apk add --update nodejs              45.9MB    
8a5ecf05b642 3 days ago   /bin/sh -c #(nop) WORKDIR /app                  0B        
c059bfaa849c 3 months ago /bin/sh -c #(nop)  CMD ["/bin/sh"]              0B        
<missing>    3 months ago /bin/sh -c #(nop) ADD file:9233f6f2237d79659…   5.59MB    

Each of these lines describes a layer. Layers are a kind of versioning system used by Docker, where each layer is built on top of the previous one. Starting from the bottom, the first line corresponds to the use of alpine. On lines 4, 5 and 7 come the installation of node (46MB), of npm (8MB) and of express (2.5MB), the HTTP server. Line 6 corresponds to copying all the files in the source directory except what is listed in .dockerignore.

NPM is only needed to download and to install the packages and not to run the application. Thus, it can be removed at the end of installation process. The same holds for the APKINDEX* files which are Alpine’s repositories of available packages. Add the following lines to a Dockerfile.alpine-3:

RUN apk del npm
RUN rm -f /var/cache/apk/APKINDEX*tar.gz

and build the image:

docker build . -t mszmurlo/docktest:0.1.3 -f Dockerfile.alpine-3

And the resulting image weighs… 62.3MB, meaning its size has increased! Actually, this is not as strange as it looks. Every time we run an ADD, COPY or RUN command, we add a layer that alters the container’s internal file system because we add or remove some files. Each image layer can then be seen as the difference between the state of the file system before and after the command, just like commits in git. A file deleted by a later command is therefore only masked in the new layer; it is still stored in the layer that created it. Docker works that way to speed up the image download and build processes. However, once we are happy with the application that is being packaged inside the image, we don’t need that information any more.

There are two ways to get rid of this overhead.

The “old” one is to reduce the number of layers, that is the number of commands that are run on the file system. The trick is to write:

RUN apk add --update nodejs\
    && apk add --update npm

COPY . .

RUN npm install -y express --production\
    && apk del npm\
    && rm -f /var/cache/apk/APKINDEX*tar.gz

rather than:

RUN apk add --update nodejs
RUN apk add --update npm

COPY . .

RUN npm install -y express --production
RUN apk del npm
RUN rm -f /var/cache/apk/APKINDEX*tar.gz

Notice the ‘\’ (backslash character) for line continuation in the first snippet.

One RUN command is one layer. Simple. Make these changes in Dockerfile.alpine-3 and save in Dockerfile.alpine-4 and create the image docktest:0.1.4. Its size has become 59.9MB. Not as impressive as before, but better than nothing.

As this is a problem that has been identified by many for a long time, the Docker team has introduced a new syntax to Dockerfile to solve it. The basic idea is to create a first image, then copy some files from that image into the next one, and so on. This method is called multi-stage build. I’m not going to get into this as the documentation is quite clear.
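Just to give the flavour of it, here is a minimal sketch of what a multi-stage description file for docktest might look like. It is not one of the files built in this post: the file name Dockerfile.multistage and the stage name build are made up for the illustration. The first stage has npm and installs the dependencies, the second one only has nodejs and receives the content of /app from the first.

```shell
# Write a hypothetical two-stage image description file.
cat > Dockerfile.multistage <<'EOF'
# Stage 1: has npm, used only to install the dependencies
FROM alpine AS build
WORKDIR /app
RUN apk add --update nodejs npm
COPY . .
RUN npm install -y express --production

# Stage 2: the final image, with nodejs but without npm
FROM alpine
WORKDIR /app
RUN apk add --no-cache nodejs
COPY --from=build /app /app
EXPOSE 3000
CMD ["node", "index.js"]
EOF
```

It would be built like the previous ones, by passing -f Dockerfile.multistage to docker build. Only the layers of the last stage end up in the final image, so npm never has to be deleted.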

Alternatives to Alpine Linux

Alpine is great, but there are some challenges in working with it. Alpine is built on top of the musl C library, an alternative to glibc, which is the reference C library in the Linux world. Running an application linked against a C library that is different from the one used during development might be risky. I can’t evaluate this risk, nor have I run into trouble myself, but whoever has worked in production knows that minimizing differences between the development environment and the production environment is always a good idea.

So in case you are a bit paranoid, you may want to choose an image adapted from a traditional distribution like Ubuntu’s baseimage which holds an Ubuntu within 8.5MB or Debian’s minideb which is a minimalistic Debian-based image built specifically to be used as a base image for containers.


The different steps in the previous sections led us from an image that weighed more than 900MB to a tiny(-ier) one whose size is less than 60MB, which represents a reduction by a factor of 15. The techniques we have seen here are not specific to NodeJS-based applications running on Alpine Linux. Using the right base image, reducing the software to what is needed in production and minimizing the number of layers are universal principles.

Persisting the data of a container

Some applications are immutable as they contain only immutable data, for example a static web site where all pages are pre-generated with a system like Jekyll or Hugo. Even our demo application from the previous section is immutable. While this situation is very desirable because it’s very safe, there are applications which need to persist (to write) data on a disk and retrieve that data later. This is of course the case with databases. But before getting our hands dirty, let’s introduce the volumes.


Whenever a container writes some data on a file system (actually its internal file system), this data disappears when the container is removed, for example after a crash of a container started with --rm. Volumes are here to the rescue. They are a way to connect a mount point in the container to a storage space on the host. This way, if a container is replaced by a new one, the data will still be available. Docker defines two types of volumes: the named volumes and the bind mounts.

Bind mounts

A bind mount mounts an existing directory on the host on a directory in the container. For example:

docker run -it --rm -w /app -v "$(pwd)/tmp:/app" alpine

The -v option introduces a volume. Its argument has the form abs_path_on_host:dest_path_in_container, two parts separated by a colon (‘:’) character:

  • abs_path_on_host is the absolute path on the host machine. In the example above it is $(pwd)/tmp: we want to mount the tmp directory of the current directory.

  • dest_path_in_container is the mount point in the container, here the /app directory we defined above with -w.

With the previous command, the content of the directory $(pwd)/tmp on the host will be available in the container under /app. To get convinced, try the following:

## we are on the host
$ mkdir tmp
$ cat > tmp/testfile <<EOF
this is the content of test file
EOF
$ docker run -it --rm -w /app -v "$(pwd)/tmp:/app" alpine

## Here we are in the container with a root shell
# cat /app/testfile 
this is the content of test file
# ^D 

## back to the host 
$ rm -rf tmp

Bind mounts are very convenient if we want to pass some data to the container during its startup, for example some configuration, or if we use the container in interactive mode for application development, with the container running a specific version of a compiler, an interpreter, etc.

Bind mounts have, however, one drawback: the process in the container usually runs as root, so if it writes a file in a bind mount, the file will belong to root on the host as well, as root always has a UID equal to zero. Even if the owner of the process in the container is changed to a regular user, there is little chance its UID matches the one of the user on the host. To be convinced of the issue, try the following:

## Here we are on the host machine
$ mkdir tmp
$ docker run -it --rm -w /app -v "$(pwd)/tmp:/app" alpine

## here we are in the container
# touch /app/toto
# ^D

## and back on the host
$ ls -l tmp
total 0
-rw-r--r-- 1 root root 0 mars   5 08:37 toto

We’ll not get into how to handle this here: this post explains very well a solution and this one gives a small correction.
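For simple cases, one common workaround (not necessarily the one described in the posts mentioned above) is to run the container process with the UID and GID of the host user, using the --user option of docker run. The sketch below wraps the experiment in a function; the name run_as_host_user is made up for the illustration.

```shell
# Same experiment as above, but the process in the container runs with
# the UID and GID of the current host user instead of root.
run_as_host_user() {
    mkdir -p tmp
    docker run --rm --user "$(id -u):$(id -g)" -w /app -v "$(pwd)/tmp:/app" \
        alpine touch /app/toto
    ls -l tmp
}
```

Calling run_as_host_user on a host with Docker should list toto as owned by the current user rather than by root. The user has no name inside the container, which is harmless for a one-off command but may confuse applications that look up their own user.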

Named volumes

A named volume is a dedicated storage space managed directly by Docker. We don’t have to know where it is located and the only thing we need to remember is its name.

Volumes can be created with the command docker volume create <volume_name> and mounted as follows:

docker volume create alp_storage
docker run -it --rm -w /app -v "alp_storage:/app" alpine

They can be listed with docker volume ls and deleted with docker volume rm <volume_name>. The create command accepts options such as the volume size, its owner, its type, etc. but we’ll not get into that. See the documentation for more information.

Notice that named volumes don’t need to be created in advance: if a volume does not exist before the container is started, it will be created on the fly.

Networking: connecting multiple containers

The power of Docker is not only that “what runs on my laptop, runs unchanged, in production” (yet this is a very important feature), but also that we can connect together different containers running on the same host or on different hosts. To connect containers together we need to define a network.

In this section we’ll start one container from the docktest:0.1.4 image to play the role of a server and a second container from the basic alpine image to play the role of the client. From the client we’ll query the server with wget. But first, let’s see how to manipulate the networks.

Manipulating networks

Just as for volumes, docker has commands to manipulate networks. The commands below are sub commands of docker network:

  • ls : lists available networks

  • create: creates a new network. The syntax is:

    docker network create --driver <drv_type> <net_name>

    --driver defines the type of network we want to create. The most commonly used ones are:

    • bridge is for a standalone network allowing containers running on the same host to communicate with each other.

    • overlay creates a network among containers running on several hosts, typically in a swarm, a Docker native Kubernetes alternative.

    • Other drivers, such as ipvlan and macvlan, are available for defining networks for more specific usages, but we’ll not get into these here.

    There is one additional network driver, the host driver. This kind of network allows a container to use directly the host’s network. Incidentally, there can only be one instance of host network and it exists by default. Thus, such a network cannot be created.

  • rm deletes a network

  • connect connects a container to a network and disconnect disconnects it from that network.

Standalone networking

By default Docker defines a bridge network called bridge (try docker network ls) but, as it’s the default, it is not a good idea to use it in production and it is advised to work with user-defined networks.

Let’s first create testnet, our test network

docker network create --driver bridge testnet

and start the containers:

In terminal 1:

docker run --rm -it --name client --network testnet alpine

In terminal 2:

docker run --rm -d --name server --network testnet mszmurlo/docktest:0.1.4
docker exec -it server sh

In both terminals, get the IP address of the container with ifconfig and try wget -O - -q http://<server IP>:3000/ in the client (replace <server IP> with whatever address you’ve found for the server).

One nice thing is that containers connected to a user defined network can resolve container names to their IP addresses, so that the query above can be written: wget -O - -q http://server:3000/. Much more user friendly…

Finally, an already running container can be connected to an existing network. In terminal 3 start a container client2 without the --network option: docker run --rm -it --name client2 alpine and in yet another terminal connect this container to the testnet network: docker network connect testnet client2. All three containers can now communicate with each other.
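The whole experiment can also be played without interactive terminals. The sketch below wraps it in a function; the names testnet_demo, demo-net and demo-server are made up so that they don’t clash with the network and containers created above, and the two-second pause is a guess at how long the server needs to start.

```shell
# Create a user-defined bridge network, start the server on it, query it
# by name from a throwaway client container, then clean everything up.
testnet_demo() {
    docker network create --driver bridge demo-net
    docker run --rm -d --name demo-server --network demo-net mszmurlo/docktest:0.1.4
    sleep 2   # assumed to be enough for the Node server to start
    docker run --rm --network demo-net alpine wget -O - -q http://demo-server:3000/
    docker stop demo-server
    docker network rm demo-net
}
```

Calling testnet_demo on a host where the docktest:0.1.4 image exists should print the JSON reply of the / endpoint.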

Host networking

The host network driver allows the container to use the host’s network directly. Typically, if you start a container running Nginx that will listen on port 80, it will be the port 80 of the host that will be used without any mapping with -p option.

The syntax is as follows:

docker run -d --rm --network host --name docktest mszmurlo/docktest:0.1.4

Then try the following in a terminal on the host:

# Our container listens and replies to requests on port 3000
$ curl http://localhost:3000/
{"sid":"8f508937-0c29-4831-8b49-3896a36e665f","resp":"Hello world"}

# And it's really on the host itself:
$ netstat -t | grep 3000
tcp        0      0 localhost:51834      localhost:3000       TIME_WAIT

# Just to be really sure that this happens on the host and not in the container:
$ nc -l -p 3000
nc: Address already in use

Notice that the host network driver only works on Linux.


This post has covered most of the basics of Docker: how to create an image and how to make it reasonably small, how to attach a volume to a container to persist some data or how to make containers communicate with each other. Sure, you have not become a Docker ninja yet, but I hope it’s enough to be able to speak about.


Appendix - Docker command line snippets

General commands

  • docker help:
    Displays the list of all the available commands

  • docker help <command>:
    Displays the list of all the available sub commands of the command <command>

  • docker help <command> <subcommand>:
    Displays the help for the subcommand <subcommand>

  • docker image ls (or docker images):
    List all images that have been downloaded or created on the host. The -a flag adds to the list all intermediate images that were created during an image build process.

  • docker image rm <image-name>:
    Deletes the image named <image-name>. The name may be the real name of the image or its ID.

  • docker build . -t <name>:<tag>:
    Builds an image from the Dockerfile present in the working directory. The image will be named <name>:<tag>. Alternative build file can be provided with the -f option.

  • docker ps:
    List all running containers.

  • docker stats # --no-stream:
    List the resources used by all running containers. The result is streamed every few seconds unless --no-stream flag is provided

  • docker container ls:
    List running containers. The -a option adds non-running containers as well.

  • docker run -d --rm -p <port mapping> -v <volume mapping> <img>:
    Creates a container from the image <img> and runs it. If the image is not available locally, it will be downloaded from a repository which by default is Docker Hub.
    • -d runs the container as a daemon
    • --rm remove the container when it stops
    • -p option maps a port in the container on a port on the local machine. It is of the form -p <port on the host>:<port in the container>. For example, if the containerized application listens on port 3000 and we want to access it from the host on port 4000, we will write -p 4000:3000.
    • -v option maps a volume from the host on a directory inside the container.
      • for a bind mount of the current directory on /app: -v "$(pwd)":/app
      • for a volume named my_volume: -v my_volume:/app

      Notice that the --mount option is now recommended to mount volumes on the container’s file system:

      • for a bind mount: --mount type=bind,source="$(pwd)",target=/app
      • for a volume named my_volume: --mount source=my_volume,target=/app
  • docker container logs <container ID>: displays the logs from the container. logs -f will follow the output, just as with tail -f.

  • docker exec -it <container ID> <cmd> [<opts>]: executes the command cmd with options opts inside the container while the container is running. The most useful use case is probably to attach a shell: docker exec -it <container ID> /bin/bash.

  • docker stop <container ID>:
    Stops gracefully the container by sending a SIGTERM signal to it.

  • docker kill <container ID>:
    Stops immediately the container by sending a SIGKILL signal to it.