In the first part of this blog series on Docker, you had the opportunity to follow a Docker basic tutorial and to have an overview of the lifecycle of a Docker container. In this post, we will have an overview on what a Docker Image and a Docker Container are made up.

Anatomy of an image

Union File Systems

Union file systems, or UnionFS, are file systems that operate by creating layers, making them very lightweight and fast. Docker uses union file systems to provide the building blocks for containers.
— Docker Official Documentation

We will see next how those UnionFS are exploited in Docker.

UnionFS and Docker Images

A Docker image is made of several layers. Each layer represents a filesystem difference with the layer below it. The stack of all the layers makes up the filesystem of an image. Each layer is read only and shared between the images (and the containers, as we will see in the next part).

As you may have guessed, all of this means that you need to be very careful when building new images to maximize the reuse of layers. We will discuss this in details in another blog post of this series.

Hands On!

Let us take a concrete example. Let us pull a maven image:

docker pull maven:3.3.9-jdk-8

Do not bother with the output of the command yet, we will get back to it later. Instead, let’s now use the history commands that displays all the layers of an image:

docker history maven:3.3.9-jdk-8

And the output:

IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT
b2ef69e414f5        3 weeks ago         /bin/sh -c #(nop)  CMD ["mvn"]                  0 B
<missing>           3 weeks ago         /bin/sh -c #(nop)  ENTRYPOINT ["/usr/local/bi   0 B
<missing>           3 weeks ago         /bin/sh -c #(nop)  VOLUME [/root/.m2]           0 B
<missing>           3 weeks ago         /bin/sh -c #(nop) COPY file:b3fc14e8337e0079a   327 B
<missing>           3 weeks ago         /bin/sh -c #(nop) COPY file:e3aa30a24fcf60a59   1.042 kB
<missing>           3 weeks ago         /bin/sh -c #(nop)  ENV MAVEN_CONFIG=/root/.m2   0 B
<missing>           3 weeks ago         /bin/sh -c #(nop)  ENV MAVEN_HOME=/usr/share/   0 B
<missing>           3 weeks ago         |2 MAVEN_VERSION=3.3.9 USER_HOME_DIR=/root /b   10.03 MB
<missing>           3 weeks ago         /bin/sh -c #(nop)  ARG USER_HOME_DIR=/root      0 B
<missing>           3 weeks ago         /bin/sh -c #(nop)  ARG MAVEN_VERSION=3.3.9      0 B
<missing>           3 weeks ago         /bin/sh -c /var/lib/dpkg/info/ca-certificates   418.2 kB
<missing>           3 weeks ago         /bin/sh -c set -x  && apt-get update  && apt-   351.5 MB
<missing>           3 weeks ago         /bin/sh -c #(nop)  ENV CA_CERTIFICATES_JAVA_V   0 B
<missing>           3 weeks ago         /bin/sh -c #(nop)  ENV JAVA_DEBIAN_VERSION=8u   0 B
<missing>           3 weeks ago         /bin/sh -c #(nop)  ENV JAVA_VERSION=8u111       0 B
<missing>           3 weeks ago         /bin/sh -c #(nop)  ENV JAVA_HOME=/usr/lib/jvm   0 B
<missing>           3 weeks ago         /bin/sh -c {   echo '#!/bin/sh';   echo 'set    87 B
<missing>           3 weeks ago         /bin/sh -c #(nop)  ENV LANG=C.UTF-8             0 B
<missing>           3 weeks ago         /bin/sh -c echo 'deb http://deb.debian.org/de   55 B
<missing>           3 weeks ago         /bin/sh -c apt-get update && apt-get install    1.287 MB
<missing>           3 weeks ago         /bin/sh -c apt-get update && apt-get install    122.6 MB
<missing>           3 weeks ago         /bin/sh -c apt-get update && apt-get install    44.3 MB
<missing>           3 weeks ago         /bin/sh -c #(nop)  CMD ["/bin/bash"]            0 B
<missing>           3 weeks ago         /bin/sh -c #(nop) ADD file:41ea5187c50116884c   123 MB

That is a lot of layers right? Well, let’s try to sort this out. First let’s review the Dockerfile used to build this image, available on github:

FROM openjdk:8-jdk

ARG MAVEN_VERSION=3.3.9
ARG USER_HOME_DIR="/root"

RUN mkdir -p /usr/share/maven /usr/share/maven/ref \
  && curl -fsSL http://apache.osuosl.org/maven/maven-3/$MAVEN_VERSION/binaries/apache-maven-$MAVEN_VERSION-bin.tar.gz \
    | tar -xzC /usr/share/maven --strip-components=1 \
  && ln -s /usr/share/maven/bin/mvn /usr/bin/mvn

ENV MAVEN_HOME /usr/share/maven
ENV MAVEN_CONFIG "$USER_HOME_DIR/.m2"

COPY mvn-entrypoint.sh /usr/local/bin/mvn-entrypoint.sh
COPY settings-docker.xml /usr/share/maven/ref/

VOLUME "$USER_HOME_DIR/.m2"

ENTRYPOINT ["/usr/local/bin/mvn-entrypoint.sh"]
CMD ["mvn"]

When you build an image, you obviously build the layers from bottom to top. So if we read the Dockerfile from bottom to top, we should find the list of commands that the history command output:

IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT
b2ef69e414f5        3 weeks ago         /bin/sh -c #(nop)  CMD ["mvn"]                  0 B
<missing>           3 weeks ago         /bin/sh -c #(nop)  ENTRYPOINT ["/usr/local/bi   0 B
<missing>           3 weeks ago         /bin/sh -c #(nop)  VOLUME [/root/.m2]           0 B
<missing>           3 weeks ago         /bin/sh -c #(nop) COPY file:b3fc14e8337e0079a   327 B
<missing>           3 weeks ago         /bin/sh -c #(nop) COPY file:e3aa30a24fcf60a59   1.042 kB
<missing>           3 weeks ago         /bin/sh -c #(nop)  ENV MAVEN_CONFIG=/root/.m2   0 B
<missing>           3 weeks ago         /bin/sh -c #(nop)  ENV MAVEN_HOME=/usr/share/   0 B
<missing>           3 weeks ago         |2 MAVEN_VERSION=3.3.9 USER_HOME_DIR=/root /b   10.03 MB
<missing>           3 weeks ago         /bin/sh -c #(nop)  ARG USER_HOME_DIR=/root      0 B
<missing>           3 weeks ago         /bin/sh -c #(nop)  ARG MAVEN_VERSION=3.3.9      0 B

It seems correct, up until the FROM openjdk:8-jdk command of the Dockerfile. So, what about those remaining layers?

IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT
<missing>           3 weeks ago         /bin/sh -c /var/lib/dpkg/info/ca-certificates   418.2 kB
<missing>           3 weeks ago         /bin/sh -c set -x  && apt-get update  && apt-   351.5 MB
<missing>           3 weeks ago         /bin/sh -c #(nop)  ENV CA_CERTIFICATES_JAVA_V   0 B
<missing>           3 weeks ago         /bin/sh -c #(nop)  ENV JAVA_DEBIAN_VERSION=8u   0 B
<missing>           3 weeks ago         /bin/sh -c #(nop)  ENV JAVA_VERSION=8u111       0 B
<missing>           3 weeks ago         /bin/sh -c #(nop)  ENV JAVA_HOME=/usr/lib/jvm   0 B
<missing>           3 weeks ago         /bin/sh -c {   echo '#!/bin/sh';   echo 'set    87 B
<missing>           3 weeks ago         /bin/sh -c #(nop)  ENV LANG=C.UTF-8             0 B
<missing>           3 weeks ago         /bin/sh -c echo 'deb http://deb.debian.org/de   55 B
<missing>           3 weeks ago         /bin/sh -c apt-get update && apt-get install    1.287 MB
<missing>           3 weeks ago         /bin/sh -c apt-get update && apt-get install    122.6 MB
<missing>           3 weeks ago         /bin/sh -c apt-get update && apt-get install    44.3 MB
<missing>           3 weeks ago         /bin/sh -c #(nop)  CMD ["/bin/bash"]            0 B
<missing>           3 weeks ago         /bin/sh -c #(nop) ADD file:41ea5187c50116884c   123 MB

Let’s have a look at the content of the Dockerfile used to build openjdk:8-jdk (I removed the comments to the readibility):

FROM buildpack-deps:jessie-scm
RUN apt-get update && apt-get install -y --no-install-recommends \
		bzip2 \
		unzip \
		xz-utils \
	&& rm -rf /var/lib/apt/lists/*

RUN echo 'deb http://deb.debian.org/debian jessie-backports main' > /etc/apt/sources.list.d/jessie-backports.list
ENV LANG C.UTF-8
RUN { \
		echo '#!/bin/sh'; \
		echo 'set -e'; \
		echo; \
		echo 'dirname "$(dirname "$(readlink -f "$(which javac || which java)")")"'; \
	} > /usr/local/bin/docker-java-home \
	&& chmod +x /usr/local/bin/docker-java-home

ENV JAVA_HOME /usr/lib/jvm/java-8-openjdk-amd64

ENV JAVA_VERSION 8u111
ENV JAVA_DEBIAN_VERSION 8u111-b14-2~bpo8+1
ENV CA_CERTIFICATES_JAVA_VERSION 20140324

RUN set -x \
	&& apt-get update \
	&& apt-get install -y \
		openjdk-8-jdk="$JAVA_DEBIAN_VERSION" \
		ca-certificates-java="$CA_CERTIFICATES_JAVA_VERSION" \
	&& rm -rf /var/lib/apt/lists/* \
	&& [ "$JAVA_HOME" = "$(docker-java-home)" ]

RUN /var/lib/dpkg/info/ca-certificates-java.postinst configure

If you read the Dockerfile from bottom to top, you’ll match every command with a layer. We could continue again with the Dockerfile of the next base image (buildpack-deps:jessie-scm) but I guess you get the point.

A Docker image is a stack of layers.
When you build on top of a base image, you are stacking layers on top of all the layers of the base image.

Base image

At this point, you might wonder which is the mother of all images. If you continue working up the base images from the previous example, you will get to the debian:jessie image:

FROM scratch
ADD rootfs.tar.xz /
CMD ["/bin/bash"]

The scratch image is a reserved minimal image used to signal Docker that you are building an image from scratch. The next command of your Dockerfile will be the first layer of your container. The scratch image has a special status in Docker. You cannot pull it, push it or tag an image with the name. Building a base image is clearly out of the scope of this blog post. If you wish to build a base image, you can have a look to the official documentation on base images.

Back to the output of docker pull

Remember, I told you that we would get back to the output of docker pull maven:3.3.9-jdk-8. My output looked like this:

3.3.9-jdk-8: Pulling from library/maven

386a066cd84a: Already exists
75ea84187083: Already exists
88b459c9f665: Already exists
690dbea0e4ca: Already exists
7e401cdd6f18: Already exists
a58186ddf9a0: Already exists
49999ed55bc4: Already exists
eb40561aad8f: Already exists
4ce0e24588f2: Pull complete
35430242cb99: Pull complete
d4af041dcf95: Pull complete
Digest: sha256:a7fd540bc273b7c4f1193fbcd46127ad3912fd095a251382f4e4312b9ac85e9d
Status: Downloaded newer image for maven:3.3.9-jdk-8

The output of the docker pull shows you all the image layers that the maven:3.3.9-jdk-8 image is made of. If you compare this output to the output of the docker history command, you will notice that we have 11 image layers here for 24 layers in the image. This is due to the fact that during the build process, docker is able to make optimizations by squashing layers into an image.

The previous statement is right since Docker 1.10.
Before version 1.10, there was a 1:1 ratio between image layers and layers.

Of course this does not change anything regarding what you download. You won’t have to download any image if you wish to download the openjdk:8-jdk image for instance:

openjdk:8-jdk: Pulling from library/openjdk

386a066cd84a: Already exists
75ea84187083: Already exists
88b459c9f665: Already exists
690dbea0e4ca: Already exists
7e401cdd6f18: Already exists
a58186ddf9a0: Already exists
49999ed55bc4: Already exists
eb40561aad8f: Already exists
Digest: sha256:dd0fc686a5584c0c7f3e50dd84ddc42fae400c27a21d8ca98dad190aff5e9d52
Status: Downloaded newer image for openjdk:8-jdk

Anatomy of a container

Now, how does the previous apply to containers? Well, it is pretty simple actually. When you create a container, Docker will add a new layer on top of the stack of layers that makes up the image. Contrary to the image layers, this layer is writable.

If you decide to create another container from the same image, a new writable layer will be created, the rest of the layer stack will be shared.

This diagram from the official documentation pretty much sums it up:

Layers and Containers

If you wish to go further in your understanding of containers, particurlarly on the topic of the writing strategies in containers, I suggest this page of the official documentation.

That is it for this post, you can jump to the next one Docker starter kit part 3: Commits, Volumes and Data Management

comments powered by Disqus