On a warm summer day, I visited the Kubernetes Community Days Munich and enjoyed Adrian Reber’s talk about “Forensic container checkpointing and analysis”. Now I want to try that with OpenShift 4.13! This blog post will mainly cover how to enable and use checkpointing on OpenShift 4.13. You can learn all the details about forensic container checkpointing from two great blog posts by my colleague Adrian Reber:
- https://kubernetes.io/blog/2022/12/05/forensic-container-checkpointing-alpha/
- https://kubernetes.io/blog/2023/03/10/forensic-container-analysis/
On Tuesday, September 12th, at the hybrid ContainerDays conference in Hamburg, Adrian Reber will present his talk again at 9:45 CEST on Stage K1. Don’t miss the talk, on-site or virtually! I will add the recording to this blog post as soon as it is available.
Let’s start at the beginning.
What is “Forensic container checkpointing”? The important part here is “checkpointing”: In the realm of computing, checkpointing usually refers to a process where the state of a system or an application (in our case a container) is saved at a particular point in time. This allows for recovery or analysis at a later point in case of failures or for debugging purposes.
To achieve that, the technical foundation is Checkpoint/Restore In Userspace (CRIU), which is integrated into runc, crun, CRI-O and containerd – in other words, into most container runtimes. In OpenShift, we use CRI-O and runc by default, but you can also switch to crun if you like. On top of that, we have to enable the ContainerCheckpoint feature gate at the Kubelet level so that the Kubelet API can create a checkpoint of a container.
The downside is that checkpointing is an alpha-level feature at both the CRI-O and the Kubelet/Kubernetes level. Additionally, enabling this feature in OpenShift means your cluster loses support, because the feature is not yet supported by Red Hat. In case you are interested, a Feature Request (RFE) is available: https://issues.redhat.com/browse/RFE-3915
Enabling checkpointing
Let’s get our hands dirty and change the OpenShift Cluster configuration to enable checkpointing:
Step 1: pause the machine config pools
We first need to pause the machine config pools so that all changes are rolled out at once, avoiding unnecessary node reboots:
$ oc patch mcp/{master,worker} --type merge -p '{"spec":{"paused": true}}'
machineconfigpool.machineconfiguration.openshift.io/master patched
machineconfigpool.machineconfiguration.openshift.io/worker patched
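To verify that both pools are actually paused before you continue, you can print the paused field, for example:
$ oc get mcp master worker -o custom-columns=NAME:.metadata.name,PAUSED:.spec.paused
NAME     PAUSED
master   true
worker   true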
Step 2: enable checkpointing at the CRI-O level for all worker nodes
To do this, we roll out an additional CRI-O configuration via a MachineConfig; to create the MachineConfig objects we use a tool called Butane. If you want to learn more about that, I recommend reading the documentation: Creating machine configs with Butane.
$ curl -L -O https://raw.githubusercontent.com/openshift-examples/forensic-container-checkpointing-and-analysis/main/05-worker-enable-criu.bu
$ cat 05-worker-enable-criu.bu
variant: openshift
version: 4.13.0
metadata:
  name: 05-worker-enable-criu
  labels:
    machineconfiguration.openshift.io/role: worker
storage:
  files:
    - path: /etc/crio/crio.conf.d/05-enable-criu
      mode: 0644
      overwrite: true
      contents:
        inline: |
          [crio.runtime]
          enable_criu_support = true
$ butane 05-worker-enable-criu.bu -o 05-worker-enable-criu.yaml
$ cat 05-worker-enable-criu.yaml
# Generated by Butane; do not edit
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 05-worker-enable-criu
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
        - contents:
            compression: ""
            source: data:,%5Bcrio.runtime%5D%0Aenable_criu_support%20%3D%20true%0A
          mode: 420
          overwrite: true
          path: /etc/crio/crio.conf.d/05-enable-criu
$ oc apply -f 05-worker-enable-criu.yaml
machineconfig.machineconfiguration.openshift.io/05-worker-enable-criu created
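Because the pools are still paused, this change is not rolled out to the nodes yet, but you can already confirm that the MachineConfig object was created; the output should look similar to this:
$ oc get machineconfig 05-worker-enable-criu
NAME                    GENERATEDBYCONTROLLER   IGNITIONVERSION   AGE
05-worker-enable-criu                           3.2.0             1m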
Step 3: enable checkpointing at the Kubelet level
To do this, we have to enable the ContainerCheckpoint feature gate in an existing custom resource. If you want to learn more about that, I recommend reading the documentation: Enabling features using feature gates. We have two options to adjust the custom resource object: via `oc edit` or `oc patch`.
Option 1) oc edit
oc edit featuregate/cluster
Edit YAML and add or adjust:
spec:
  customNoUpgrade:
    enabled:
      - ContainerCheckpoint
  featureSet: CustomNoUpgrade
Option 2) oc patch
$ oc patch featuregate/cluster \
    --type='json' \
    --patch='[
      {"op": "add", "path": "/spec/featureSet", "value": "CustomNoUpgrade"},
      {"op": "add", "path": "/spec/customNoUpgrade", "value": {"enabled": ["ContainerCheckpoint"]}}
    ]'
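Either way, you can confirm the change in the FeatureGate object; the actual Kubelet configuration is only rolled out after the pools are unpaused in the next step:
$ oc get featuregate/cluster -o jsonpath='{.spec.featureSet}{"\n"}'
CustomNoUpgrade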
Step 4: unpause the Machine Config Pools and rollout all changes on the Nodes
Now we unpause the machine config pools we paused in the first step to enforce the rollout of the CRI-O and Kubelet configuration changes on the nodes.
$ oc patch mcp/{master,worker} --type merge -p '{"spec":{"paused": false}}'
machineconfigpool.machineconfiguration.openshift.io/master patched
machineconfigpool.machineconfiguration.openshift.io/worker patched
Wait until all machine config pools report Updated: true, Updating: false and Degraded: false with the command:
oc get mcp
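While the rollout is in progress, the output will look roughly like this (node counts, rendered config names and ages will differ in your cluster):
$ oc get mcp
NAME     CONFIG                   UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-<hash>   True      False      False      3              3                   3                     0                      5d
worker   rendered-worker-<hash>   False     True       False      3              2                   2                     0                      5d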
Demo
After a successful machine config rollout, let’s deploy our demo application, the counters app, together with a checkpoint analyser helper that we will use to analyze the checkpoint created from the counters app.
Deployment of the counters application
I created a git repository with all the deployment artifacts and application code: https://github.com/openshift-examples/forensic-container-checkpointing-and-analysis
Let’s create a new project and deploy the application:
oc new-project demo
oc apply -k https://github.com/openshift-examples/forensic-container-checkpointing-and-analysis/counters-app
Wait until the application is running:
$ oc get pods -l app=counters
NAME READY STATUS RESTARTS AGE
counters-857d7978fd-jnkck 1/1 Running 0 123m
Let’s fetch some information for later commands and tests:
# Get counter-app URL
export COUNTER_URL=$(oc get route/counters -o jsonpath="https://{.spec.host}")
# Get node where Pod is running
export NODE_NAME=$(oc get pods -l app=counters -o jsonpath="{.items[0].spec.nodeName}" )
# Get pod name
export POD_NAME=$(oc get pods -l app=counters -o jsonpath="{.items[0].metadata.name}" )
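A quick sanity check that all three variables are set (the values shown here are placeholders; yours will be specific to your cluster):
$ echo "$COUNTER_URL $NODE_NAME $POD_NAME"
https://counters-demo.apps.<cluster-domain> <worker-node-name> counters-857d7978fd-jnkck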
Deployment of our checkpoint analyser helper pods
We deploy the checkpoint analyser in the same demo project as the counters application:
oc apply -k https://github.com/openshift-examples/forensic-container-checkpointing-and-analysis/checkpoint-analyser
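The analyser pods make the node’s checkpoint directory available under /checkpoints and are spread across the worker nodes, so that later we can pick the pod running on the same node as the counters pod. You can check where the pods landed with:
$ oc get pods -l app.kubernetes.io/component=checkpoint-analyser -o wide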
Now we come to the checkpointing part
Again, in this blog post I’m focusing on the OpenShift part. Now we have everything up and running and are ready to follow Adrian Reber’s blog post: https://kubernetes.io/blog/2023/03/10/forensic-container-analysis/
Run a few queries against the counters app to write a file and to put something into memory:
$ curl ${COUNTER_URL}/create?test-file
counter: 0
$ curl ${COUNTER_URL}/secret?RANDOM_1432_KEY
counter: 1
$ curl ${COUNTER_URL}/
counter: 2
Let’s create the checkpoint through the OpenShift / Kubernetes API at the Kubelet:
$ export TOKEN=$(oc whoami -t )
$ curl -k -X POST --header "Authorization: Bearer $TOKEN" https://api.demo.openshift.pub:6443/api/v1/nodes/$NODE_NAME/proxy/checkpoint/demo/$POD_NAME/counter
{"items":["/var/lib/kubelet/checkpoints/checkpoint-counters-857d7978fd-jnkck_demo-counter-2023-08-24T11:24:18Z.tar"]}
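The API server URL above (https://api.demo.openshift.pub:6443) is specific to my cluster. If you don’t want to hard-code it, you can derive it from your current login, for example:
$ export API_SERVER=$(oc whoami --show-server)
$ curl -k -X POST --header "Authorization: Bearer $TOKEN" \
    "$API_SERVER/api/v1/nodes/$NODE_NAME/proxy/checkpoint/demo/$POD_NAME/counter"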
Now, finally, we have our checkpoint!
Keep in mind that a checkpoint contains everything, from memory to filesystem and process information. If you create a checkpoint of an application with sensitive information in memory, you can easily export and discover that sensitive information!
Let’s explore the checkpoint a bit. First, get the analyser pod running on the same node where we created the checkpoint:
$ export CHECKPOINT_POD_NAME=$(oc get pods -l app.kubernetes.io/component=checkpoint-analyser -o jsonpath="{.items[?(@.spec.nodeName=='${NODE_NAME}')].metadata.name}")
Log in to the checkpoint analyser pod:
$ oc rsh $CHECKPOINT_POD_NAME
Now we are inside the checkpoint analyser pod and can explore the checkpoint:
sh-5.2# cd /checkpoints/
sh-5.2# ls
checkpoint-counters-857d7978fd-jnkck_demo-counter-2023-08-24T11:24:18Z.tar
With the checkpointctl tool, you can display some basic information about the checkpoint:
sh-5.2# checkpointctl show checkpoint-counters-857d7978fd-jnkck_demo-counter-2023-08-24T11\:24\:18Z.tar
Displaying container checkpoint data from checkpoint-counters-857d7978fd-jnkck_demo-counter-2023-08-24T11:24:18Z.tar
+-----------+--------------------------------------------------------------------------------------------+--------------+---------+--------------------------------+--------+-------------+------------+-------------------+
| CONTAINER | IMAGE | ID | RUNTIME | CREATED | ENGINE | IP | CHKPT SIZE | ROOT FS DIFF SIZE |
+-----------+--------------------------------------------------------------------------------------------+--------------+---------+--------------------------------+--------+-------------+------------+-------------------+
| counter | quay.io/openshift-examples/forensic-container-checkpointing-and-analysis/counters-app:main | b7fe1c786b7d | runc | 2023-08-24T11:19:38.607090024Z | CRI-O | 10.130.2.99 | 8.7 MiB | 3.0 KiB |
+-----------+--------------------------------------------------------------------------------------------+--------------+---------+--------------------------------+--------+-------------+------------+-------------------+
Let’s unpack the checkpoint and take a look inside. The checkpoint archive contains the following files and directories:
- bind.mounts – this file contains information about bind mounts and is needed during restore to mount all external files and directories at the right location
- checkpoint/ – this directory contains the actual checkpoint as created by CRIU
- config.dump and spec.dump – these files contain metadata about the container which is needed during restore
- dump.log – this file contains the debug output of CRIU created during checkpointing
- stats-dump – this file contains the data which is used by checkpointctl to display dump statistics (--print-stats)
- rootfs-diff.tar – this file contains all changed files on the container’s file-system
sh-5.2# cd /tmp/
sh-5.2# mkdir checkpoint
sh-5.2# cd checkpoint/
sh-5.2# tar xf /checkpoints/checkpoint-counters-857d7978fd-jnkck_demo-counter-2023-08-24T11\:24\:18Z.tar
sh-5.2# ls
bind.mounts checkpoint config.dump dump.log io.kubernetes.cri-o.LogPath rootfs-diff.tar spec.dump stats-dump
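As a side note: rootfs-diff.tar should also contain the test-file we created through the counters app at the beginning of the demo. The exact paths depend on the application, but listing the archive is a quick way to check:
sh-5.2# tar tf rootfs-diff.tar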
The CRIU Image Tool (CRIT) is another tool to analyze the CRIU images in the checkpoint/ directory.
sh-5.2# crit show checkpoint/pstree.img | jq .entries[].pid
1
sh-5.2# crit show checkpoint/core-1.img | jq .entries[0].tc.comm
"Python3"
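Depending on the CRIT version shipped in the analyser image, you can also use its explore mode for a ps-like view of the checkpointed process tree:
sh-5.2# crit x checkpoint/ ps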
Here is an important example. As mentioned above, the whole memory is also stored on disk, including any sensitive information. Our application stored a “secret” key, RANDOM_1432_KEY, in memory, and we can easily find it:
sh-5.2# ls checkpoint/pages-*
checkpoint/pages-1.img
sh-5.2# grep -ao RANDOM_1432_KEY checkpoint/pages-*
RANDOM_1432_KEY
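The same approach works for other data that only ever lived in memory. For example, environment variables of the process (such as the KUBERNETES_SERVICE_HOST value injected into every pod) usually show up in the memory pages as well; whether a given pattern matches depends on how the application keeps its environment in memory:
sh-5.2# grep -ao 'KUBERNETES_SERVICE_HOST=[0-9.]*' checkpoint/pages-*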
In case you want to debug your application with gdb, you can convert the checkpoint to a coredump:
sh-5.2# cd checkpoint/
sh-5.2# pwd
/tmp/checkpoint/checkpoint
sh-5.2# coredump-python3
sh-5.2#
sh-5.2# echo info registers | gdb --core core.1 -q
BFD: warning: /tmp/checkpoint/checkpoint/core.1 has a segment extending past end of file
warning: malformed note - filename area is too big
[New LWP 1]
Missing separate debuginfo for the main executable file
Try: dnf --enablerepo='*debug*' install /usr/lib/debug/.build-id/3e/6eae34c82de9e112e48289c49532ee80ab3929
warning: Unexpected size of section `.reg-xstate/1' in core file.
Core was generated by `python3 counter.py'.
warning: Unexpected size of section `.reg-xstate/1' in core file.
#0 0x00007f563e142937 in ?? ()
(gdb) rax 0xfffffffffffffffc -4
rbx 0x1f4 500
rcx 0x7f563e142937 140008385423671
rdx 0x1f4 500
rsi 0x1 1
rdi 0x7f563de4c6b0 140008382318256
rbp 0x4345886f1693 0x4345886f1693
rsp 0x7ffd7fbf3a68 0x7ffd7fbf3a68
r8 0x0 0
r9 0x0 0
r10 0x4345518d0200 73965000000000
r11 0x246 582
r12 0x7f563e7741c0 140008391918016
r13 0x7f563df226c0 140008383194816
r14 0x7f563e72dbf8 140008391629816
r15 0x7f563dc8bfc0 140008380481472
rip 0x7f563e142937 0x7f563e142937
eflags 0x246 [ PF ZF IF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
(gdb) sh-5.2#
Another option to analyze the checkpoint is to copy it to your local machine:
$ oc cp $CHECKPOINT_POD_NAME:/checkpoints/checkpoint-counters-857d7978fd-jnkck_demo-counter-2023-08-24T11\:24\:18Z.tar checkpoint-counters.tar
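If you have checkpointctl installed locally, you can then inspect the copied archive on your machine in the same way:
$ checkpointctl show checkpoint-counters.tar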
Summary
Forensic analysis is just one of many use cases for container checkpointing. Consider the following scenarios, and there are likely many more:
- Long-Running Processes: Applications with prolonged processes or computations benefit from checkpointing. When a container needs to be temporarily paused or stopped, checkpointing allows it to resume from the interruption point. For instance, this is useful during node maintenance to apply operating system updates. Similarly, it allows pausing a long-running, lower-priority process to start a higher-priority one, and resuming the lower-priority process after the higher-priority task completes.
- Backup and Recovery: Creating backups of running containers is critical for swift recovery in the event of hardware failures or crashes. These checkpoints can restore container states and data on alternative infrastructure, ensuring business continuity.
- Pre-Warming and Caching: Another valuable application is pre-warming or caching an application’s startup. By initiating an application, creating a checkpoint, and then quickly starting from the checkpoint, the startup time can be significantly reduced. A proposal by Adrian Reber at the Open Container Initiative explores the idea of storing checkpoints for later startups and other use-cases. You can find the proposal here: OCI Proposal (still a work-in-progress).
While this concept is in its early stages, it’s exciting to witness the possibilities that lie ahead.