Lower your container image size and improve compliance

January 27, 2025

The size of a container image matters for more than just the amount of storage it consumes in the container registry. In this article, we will explore why optimizing container image size matters and how to achieve it effectively.

Oftentimes we have applications whose binaries are quite small but whose container images end up quite large, e.g. 80 KB vs. 470 MB. When scanners need to unpack the layers and dependencies of such an image, scanning quickly becomes slow and cumbersome. Optimizing container images helps speed up the scanning process while minimizing potential security risks: with fewer layers and reduced complexity, it is easier to spot vulnerabilities and ensure the integrity of your applications.
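
If you want to put numbers on your own images first, podman can report both the overall size and the number of layers a scanner has to unpack. A quick sketch, assuming podman is available locally and using a purely hypothetical image reference:

# Hypothetical image reference; adjust to your registry and tag
IMAGE=registry.example.com/myteam/myapp:latest
podman pull ${IMAGE}
# Overall size of the image as stored locally
podman images --format "{{.Repository}}:{{.Tag}}  {{.Size}}" ${IMAGE}
# Number of layers a scanner has to unpack
podman image inspect --format '{{len .RootFS.Layers}}' ${IMAGE}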

Stephen Kitt (Senior Principal Software Engineer at Red Hat) is to be credited with the idea presented here. In the following I will simply build on the concept that he has outlined. I would like to thank Stephen for sharing his valuable contribution.

The constraints

One of the main concerns when working with container images is managing dependencies and packages that may have known vulnerabilities. As an image owner, it’s crucial to ensure that all dependencies are up-to-date and secure. However, understanding vulnerabilities in completely foreign code can be challenging.

When it comes to compliance policies, single-binary images may not be the best choice due to their lack of transparency regarding vulnerabilities. Since no vulnerability score can be associated with such an image, compliance scanners often classify it as vulnerable.

For more details on this, please have a look at Learning container best practices.

Mitigating the situation

We will discuss a strategy for reducing container image size, allowing for faster scanning and improved security. You can potentially reduce your image size by a factor of up to 10!

One of the most effective methods for reducing container image size is using multi-stage builds. This technique allows you to build your application in multiple stages, with each stage focusing on a specific task. By copying only the necessary artifacts from one stage to the next, you can significantly reduce the size of your final image.
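
As a small illustration of the mechanics: every stage can be built and inspected on its own via the --target option, while a regular build only ships the content of the final stage. The stage names below are the ones used in the Containerfile later in this article; the file name and tags are hypothetical.

# Build and tag only the "builder" stage, e.g. to debug its content
podman build --target builder -t myapp:builder -f Containerfile .
# A regular build runs all stages, but the resulting image contains
# nothing except what the final stage copied in
podman build -t myapp:latest -f Containerfile .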

In the case of Python applications, the results may not always be as anticipated, especially when the image was built using S2I (Source-to-Image).

When the source build is tailored to a specific use case, it is indeed very difficult to remove unused libraries, update packages to their latest versions, or switch to alternative package managers that produce smaller artifacts.

Multi-stage builds with installroot

While s2i offers a streamlined approach to building containerized applications, it was not really designed for producing minimalistic images that keep the number of included dependencies to a minimum.

We will now explore an alternative approach for building images that makes use of the --installroot parameter of yum/dnf. In the initial stage of our multi-stage build, we create a base layer which has all the necessary packages to compile and build application dependencies.

FROM docker.io/redhat/ubi9 as base

RUN dnf -y install \
      --setopt=tsflags=nodocs \
      --setopt=install_weak_deps=0 \
      --releasever 9 \
      --nodocs \
        python3-devel \
        autoconf \
        automake \
        bzip2 \
        gcc-c++ \
        gd-devel \
        gdb \
        git \
        libcurl-devel \
        libpq-devel \
        libxml2-devel \
        libxslt-devel \
        lsof \
        make \
        mariadb-connector-c-devel \
        openssl-devel \
        patch \
        procps-ng \
        npm \
        redhat-rpm-config \
        sqlite-devel \
        unzip \
        wget \
        which \
        zlib-devel \
        python3-pip ; \
    yum -y clean all --enablerepo='*'

Compared to the original s2i image, we will create a “chrooted” based installation. In the second stage, we install the python dependencies into this chroot base and use the result as an artifact for the next step.

FROM base as builder
COPY app-src/requirements.txt /tmp/requirements.txt
# python3-requests, python3-dateutil and libpq are the application dependencies
RUN dnf -y \
        --setopt=install_weak_deps=0 \
        --nodocs \
        --releasever 9 \
        --installroot /output \
          install \
          glibc \
          glibc-minimal-langpack \
          libstdc++ \
          python3 \
          python3-requests \
          python3-dateutil \
          libpq ; \
      yum -y clean all --enablerepo='*'
RUN pip install --prefix=/usr --root /output -r /tmp/requirements.txt
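
To verify what actually ends up in the chroot, you can build just the builder stage and look into /output. A rough sketch with a hypothetical file name and tag; rpm's --root option reads the RPM database that dnf created inside the chroot:

podman build --target builder -t myapp:builder -f Containerfile .
# Size of the stripped-down root filesystem that will become the final image
podman run --rm myapp:builder du -sh /output
# Packages dnf installed into the chroot
podman run --rm myapp:builder rpm --root /output -qa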

We now have a stripped-down chroot environment for our containerized Python application. This also improves runtime security by ensuring that only the necessary files and directories are accessible.

In the third and last stage we further optimize the size of our containerized Python application by combining the stripped-down chroot environment and the application into a single “scratch container”, as follows.

FROM scratch 

COPY --from=builder /output / 
COPY app-src/app.py /opt/app/app.py

USER 1001
WORKDIR /opt/app
ENTRYPOINT [ "/usr/bin/gunicorn" ]
CMD [ "--bind", "0.0.0.0:8080", "app:app" ]
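
To try the result locally, the final image can be built and started as usual. A short usage sketch with hypothetical file name and tags; note that gunicorn has to be listed in app-src/requirements.txt, since the entrypoint is resolved from the pip-installed prefix:

podman build -t myapp:minimal -f Containerfile .
podman run --rm -p 8080:8080 myapp:minimal
# In a second terminal, the application should answer on port 8080
curl -s http://localhost:8080/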

While the python3-libs package does pull in the bash package, we nevertheless end up with a very tailored, minimalistic image that dramatically reduces the attack surface.
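
A quick before/after comparison is possible with podman images; the tags below are the hypothetical ones from the sketches above, so adapt them to your own naming:

# Compare the full builder stage with the scratch-based result
podman images --format "{{.Repository}}:{{.Tag}}  {{.Size}}" | grep myapp
# The scratch-based image also consists of far fewer layers
podman image inspect --format '{{len .RootFS.Layers}}' myapp:minimal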

Applying the concept to a multi-arch build system

Having seen the initial benefits of this approach, I adopted the updated Containerfile in the multi-architecture build system I presented in a previous blog post and discovered some drawbacks that still need to be addressed.

To quickly summarize the concept of that article: the multi-architecture build system utilizes virtual machines that execute the podman command through remote sessions on behalf of the pipeline running on OpenShift. This technique allows us to build for different CPU architectures without compromising the supportability of the OpenShift cluster.
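
For readers who have not seen that post: such a remote session is essentially a named podman connection over SSH. The following sketch shows how a connection like the buildsys one used below could be registered; user, host, and socket path are purely illustrative assumptions:

# Register a remote connection to an architecture-specific build VM
podman system connection add buildsys \
    ssh://builder@buildvm.example.com/run/user/1001/podman/podman.sock
# Subsequent commands can then address it with "-c buildsys"
podman -c buildsys info --format '{{.Host.Arch}}'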

A single, architecture-specific build using this Dockerfile approach has been successful, but I encountered a challenge when trying to extend it to multi-architecture builds. Specifically, I ran into issues when executing:

podman -c buildsys build --net=host --authfile=${DOCKER_CONFIG}/config.json \
         --platform=${PLATFORMS} --manifest ${MANIFEST} -f ${DOCKERFILE}

The above re-uses the previously created base and/or builder stages from the build cache, even when they were produced for a different target architecture. Building without caching would mitigate that issue at the cost of much longer build times, which isn't a viable solution.
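
For completeness, the non-caching variant would differ by only a single flag; a sketch using the same variables as above:

# Forces every stage to be rebuilt, avoiding re-use of cached layers
podman -c buildsys build --no-cache --net=host \
    --authfile=${DOCKER_CONFIG}/config.json \
    --platform=${PLATFORMS} --manifest ${MANIFEST} -f ${DOCKERFILE}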

For local builds, using variables allows us to adjust the stage names, enabling a more efficient build process.

FROM quay.example.com/ubi9/ubi as base-${TARGETARCH} # adjust for multi-arch builds

FROM base-${TARGETARCH} as builder-${TARGETARCH} # adjust for multi-arch builds

COPY --from=builder-${TARGETARCH} /output /

Whilst this approach is effective for local builds, it is less easily applicable to non-local builds that use Podman remote connections, which cannot resolve those variables. As a workaround, we can modify the Podman commands to use the original Dockerfile (without variables) instead.

for PLATFORM in linux/amd64 linux/arm64 ; do
  podman -c buildsys build --net=host \
      --authfile=${DOCKER_CONFIG}/config.json \
      --platform=${PLATFORM} \
      --manifest ${MANIFEST} \
      -f ${DOCKERFILE}
done
podman -c buildsys tag localhost/${MANIFEST} ${IMAGE}
podman -c buildsys manifest push \
      --authfile=${DOCKER_CONFIG}/config.json \
      --all \
      ${IMAGE}
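
Once the push has completed, it is worth confirming that the manifest list really contains one entry per architecture; a short check, re-using the manifest name from above:

# Lists the per-architecture images referenced by the manifest list
podman -c buildsys manifest inspect localhost/${MANIFEST}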

I encourage you to give this multi-stage method a try and compare your image’s vulnerability score six months from now against the score of an equivalent “standard” image to assess its effectiveness.

By separating the base stage from the build stage and "chrooting" the runtime dependencies on top of the base stage, we are able to carry a minimal output artifact, together with our application, into the final stage. This is the smartest way I am currently aware of to stay compliant and effective when building container images. If you have any additional ideas, please do not hesitate to share them!