On a warm summer day, I visited Kubernetes Community Days Munich and enjoyed Adrian Reber’s talk about “Forensic container checkpointing and analysis”. Now I want to try that with OpenShift 4.13! This blog post mainly covers how to enable and use checkpointing on OpenShift 4.13. For all the details about forensic container checkpointing, have a look at two great blog posts by my colleague Adrian Reber:
- https://kubernetes.io/blog/2022/12/05/forensic-container-checkpointing-alpha/
- https://kubernetes.io/blog/2023/03/10/forensic-container-analysis/
On Tuesday, September 12th, at the hybrid conference ContainerDays in Hamburg, Adrian Reber will present his great talk again at 9:45 CEST on Stage K1. Don’t miss the talk, on-site or virtually! I will add the recording to this blog post as soon as it is available.
Let’s start at the beginning.
What is “Forensic container checkpointing”? The important part here is “checkpointing”: In the realm of computing, checkpointing usually refers to a process where the state of a system or an application (in our case a container) is saved at a particular point in time. This allows for recovery or analysis at a later point in case of failures or for debugging purposes.
To achieve that, two pieces are needed. First, the technical foundation: Checkpoint/Restore In Userspace (CRIU), which is integrated into runc, crun, CRI-O, and containerd; in other words, into most of the container runtimes. In OpenShift, we use CRI-O and runc by default, but if you like, you can also switch to crun. Second, we have to enable the ContainerCheckpoint feature gate at the Kubelet level so that the Kubelet API can create a checkpoint of a container.
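If you want to check which OCI runtime your nodes are actually using before you start, a quick look via oc debug works (a minimal check; replace <node-name> with one of your worker nodes):
$ oc debug node/<node-name> -- chroot /host runc --version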
The downside: checkpointing is an alpha-level feature at both the CRI-O and the Kubelet/Kubernetes level. Additionally, enabling this feature in OpenShift means you lose support for the cluster, because this feature is not yet supported by Red Hat. In case you are interested, a feature request (RFE) is available: https://issues.redhat.com/browse/RFE-3915
Enabling checkpointing
Let’s get our hands dirty and change the OpenShift Cluster configuration to enable checkpointing:
Step 1: pause the machine config pools
We first need to pause the machine config pools so we can roll out all changes at once and avoid unnecessary node reboots:
$ oc patch mcp/{master,worker} --type merge -p '{"spec":{"paused": true}}'
machineconfigpool.machineconfiguration.openshift.io/master patched
machineconfigpool.machineconfiguration.openshift.io/worker patched
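Before moving on, you can quickly verify that both pools are actually paused:
$ oc get mcp master worker -o custom-columns=NAME:.metadata.name,PAUSED:.spec.paused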
Step 2: enable checkpointing at the CRI-O level for all worker nodes
To do this, we roll out an additional CRI-O configuration via a MachineConfig; to create the MachineConfig objects, we use a tool called Butane. If you want to learn more about that, I recommend reading the documentation: Creating machine configs with Butane.
$ curl -L -O https://raw.githubusercontent.com/openshift-examples/forensic-container-checkpointing-and-analysis/main/05-worker-enable-criu.bu
$ cat 05-worker-enable-criu.bu
variant: openshift
version: 4.13.0
metadata:
  name: 05-worker-enable-criu
  labels:
    machineconfiguration.openshift.io/role: worker
storage:
  files:
    - path: /etc/crio/crio.conf.d/05-enable-criu
      mode: 0644
      overwrite: true
      contents:
        inline: |
          [crio.runtime]
          enable_criu_support = true
$ butane 05-worker-enable-criu.bu -o 05-worker-enable-criu.yaml
$ cat 05-worker-enable-criu.yaml
# Generated by Butane; do not edit
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 05-worker-enable-criu
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
        - contents:
            compression: ""
            source: data:,%5Bcrio.runtime%5D%0Aenable_criu_support%20%3D%20true%0A
          mode: 420
          overwrite: true
          path: /etc/crio/crio.conf.d/05-enable-criu
$ oc apply -f 05-worker-enable-criu.yaml
machineconfig.machineconfiguration.openshift.io/05-worker-enable-criu created
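The MachineConfig object itself is visible immediately; the drop-in file only appears on the nodes after the rollout in step 4. Once that has happened, you can verify it directly on a worker (replace <node-name> accordingly):
$ oc get mc 05-worker-enable-criu
$ oc debug node/<node-name> -- chroot /host cat /etc/crio/crio.conf.d/05-enable-criu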
Step 3: enable checkpointing at the Kubelet level
To do this, we have to enable the ContainerCheckpoint feature gate in an existing custom resource. If you want to learn more about that, I recommend reading the documentation: Enabling features using feature gates. We have two options to adjust the custom resource object: via `oc edit` or `oc patch`.
Option 1) oc edit
oc edit featuregate/cluster
Edit YAML and add or adjust:
spec:
  customNoUpgrade:
    enabled:
      - ContainerCheckpoint
  featureSet: CustomNoUpgrade
Option 2) oc patch
$ oc patch featuregate/cluster \
    --type='json' \
    --patch='[
      {"op": "add", "path": "/spec/featureSet", "value": "CustomNoUpgrade"},
      {"op": "add", "path": "/spec/customNoUpgrade", "value": {"enabled": ["ContainerCheckpoint"]}}
    ]'
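Whichever option you choose, you can verify the result afterwards; the spec should now contain the CustomNoUpgrade feature set with ContainerCheckpoint enabled:
$ oc get featuregate/cluster -o yaml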
Step 4: unpause the machine config pools and roll out all changes on the nodes
Now we unpause the machine config pools we paused in the first step to trigger the rollout of the CRI-O and Kubelet configuration changes on the nodes.
$ oc patch mcp/{master,worker} --type merge -p '{"spec":{"paused": false}}'
machineconfigpool.machineconfiguration.openshift.io/master patched
machineconfigpool.machineconfiguration.openshift.io/worker patched
Wait until all machine config pools report Updated: true, Updating: false, and Degraded: false with the following command:
oc get mcp
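Instead of polling oc get mcp manually, you can also let oc wait block until the rollout is done (a convenience sketch; pick a timeout that matches your cluster size):
$ oc wait mcp/master mcp/worker --for condition=Updated=True --timeout=30m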
Demo
After a successful machine config rollout, let’s deploy our demo application, called counters app, and a checkpoint analyser helper to analyze the checkpoints we create from the counters app.
Deployment of the counters application
I created a git repository with all the deployment artifacts and application code: https://github.com/openshift-examples/forensic-container-checkpointing-and-analysis
Let’s create a new project and deploy the application:
oc new-project demo
oc apply -k https://github.com/openshift-examples/forensic-container-checkpointing-and-analysis/counters-app
Wait until the application is running:
$ oc get pods -l app=counters
NAME READY STATUS RESTARTS AGE
counters-857d7978fd-jnkck 1/1 Running 0 123m
Let’s fetch some information for later commands and tests:
# Get counter-app URL
export COUNTER_URL=$(oc get route/counters -o jsonpath="https://{.spec.host}")
# Get node where Pod is running
export NODE_NAME=$(oc get pods -l app=counters -o jsonpath="{.items[0].spec.nodeName}" )
# Get pod name
export POD_NAME=$(oc get pods -l app=counters -o jsonpath="{.items[0].metadata.name}" )
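A quick sanity check that all three variables are set:
$ echo "${COUNTER_URL} ${NODE_NAME} ${POD_NAME}"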
Deployment of our checkpoint analyser helper pods
We deploy the checkpoint analyser in the same demo project as the counters application:
oc apply -k https://github.com/openshift-examples/forensic-container-checkpointing-and-analysis/checkpoint-analyser
Now we come to the checkpointing part
Again, in this blog post I’m focusing on the OpenShift part. Now we have everything up and running and can follow Adrian Reber’s blog post https://kubernetes.io/blog/2023/03/10/forensic-container-analysis/
Run some queries against the counters app to create a file and write something into memory:
$ curl ${COUNTER_URL}/create?test-file
counter: 0
$ curl ${COUNTER_URL}/secret?RANDOM_1432_KEY
counter: 1
$ curl ${COUNTER_URL}/
counter: 2
Let’s create the checkpoint through the OpenShift / Kubernetes API at the Kubelet:
$ export TOKEN=$(oc whoami -t )
$ curl -k -X POST --header "Authorization: Bearer $TOKEN" https://api.demo.openshift.pub:6443/api/v1/nodes/$NODE_NAME/proxy/checkpoint/demo/$POD_NAME/counter
{"items":["/var/lib/kubelet/checkpoints/checkpoint-counters-857d7978fd-jnkck_demo-counter-2023-08-24T11:24:18Z.tar"]}
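The API server URL (https://api.demo.openshift.pub:6443 in my cluster) is specific to my environment; instead of hard-coding it, you can derive it (a small variation of the same call):
$ export API_SERVER=$(oc whoami --show-server)
$ curl -k -X POST --header "Authorization: Bearer $TOKEN" ${API_SERVER}/api/v1/nodes/${NODE_NAME}/proxy/checkpoint/demo/${POD_NAME}/counter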
Now, finally, we have our checkpoint!
Keep in mind that a checkpoint contains everything from memory to filesystem and process information. If you create a checkpoint of an application that holds sensitive information in memory, you can easily extract and expose that sensitive information!
Let’s explore the checkpoint a bit.
First, get the checkpoint analyser pod running on the same node where we created the checkpoint:
$ export CHECKPOINT_POD_NAME=$(oc get pods -l app.kubernetes.io/component=checkpoint-analyser -o jsonpath="{.items[?(@.spec.nodeName=='${NODE_NAME}')].metadata.name}")
“Log in” to the checkpoint analyser pod:
$ oc rsh $CHECKPOINT_POD_NAME
Now we are inside the checkpoint analyser pod and can explore the checkpoint:
sh-5.2# cd /checkpoints/
sh-5.2# ls
checkpoint-counters-857d7978fd-jnkck_demo-counter-2023-08-24T11:24:18Z.tar
With the checkpointctl tool, you can show some information about the checkpoint:
sh-5.2# checkpointctl show checkpoint-counters-857d7978fd-jnkck_demo-counter-2023-08-24T11\:24\:18Z.tar
Displaying container checkpoint data from checkpoint-counters-857d7978fd-jnkck_demo-counter-2023-08-24T11:24:18Z.tar
+-----------+--------------------------------------------------------------------------------------------+--------------+---------+--------------------------------+--------+-------------+------------+-------------------+
| CONTAINER | IMAGE | ID | RUNTIME | CREATED | ENGINE | IP | CHKPT SIZE | ROOT FS DIFF SIZE |
+-----------+--------------------------------------------------------------------------------------------+--------------+---------+--------------------------------+--------+-------------+------------+-------------------+
| counter | quay.io/openshift-examples/forensic-container-checkpointing-and-analysis/counters-app:main | b7fe1c786b7d | runc | 2023-08-24T11:19:38.607090024Z | CRI-O | 10.130.2.99 | 8.7 MiB | 3.0 KiB |
+-----------+--------------------------------------------------------------------------------------------+--------------+---------+--------------------------------+--------+-------------+------------+-------------------+
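checkpointctl can also print CRIU’s dump statistics, taken from the stats-dump file described below, via the --print-stats flag:
sh-5.2# checkpointctl show checkpoint-counters-857d7978fd-jnkck_demo-counter-2023-08-24T11\:24\:18Z.tar --print-stats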
Let’s unpack the checkpoint and take a look inside. It contains the following files and directories:
- bind.mounts – this file contains information about bind mounts and is needed during restore to mount all external files and directories at the right location
- checkpoint/ – this directory contains the actual checkpoint as created by CRIU
- config.dump and spec.dump – these files contain metadata about the container which is needed during restore
- dump.log – this file contains the debug output of CRIU created during checkpointing
- stats-dump – this file contains the data which is used by checkpointctl to display dump statistics (--print-stats)
- rootfs-diff.tar – this file contains all changed files on the container’s file-system
sh-5.2# cd /tmp/
sh-5.2# mkdir checkpoint
sh-5.2# cd checkpoint/
sh-5.2# tar xf /checkpoints/checkpoint-counters-857d7978fd-jnkck_demo-counter-2023-08-24T11\:24\:18Z.tar
sh-5.2# ls
bind.mounts checkpoint config.dump dump.log io.kubernetes.cri-o.LogPath rootfs-diff.tar spec.dump stats-dump
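Before diving into the CRIU images, it’s worth listing the contents of rootfs-diff.tar: it should contain the test-file we created via the /create endpoint (the exact path inside the archive depends on the application):
sh-5.2# tar tf rootfs-diff.tar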
CRIU Image Tool (CRIT) is another tool to analyze the CRIU images in the checkpoint/ directory.
sh-5.2# crit show checkpoint/pstree.img | jq .entries[].pid
1
sh-5.2# crit show checkpoint/core-1.img | jq .entries[0].tc.comm
"Python3"
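CRIT can also explore the image directory as a whole, for example with a ps-like view (assuming your CRIT version supports the explore subcommand crit x):
sh-5.2# crit x checkpoint ps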
Here is an important example. As mentioned above, the whole memory is also stored on disk, with possibly sensitive information. Our application stored a “secret” key, RANDOM_1432_KEY, in memory, and we can easily find it:
sh-5.2# ls checkpoint/pages-*
checkpoint/pages-1.img
sh-5.2# grep -ao RANDOM_1432_KEY checkpoint/pages-*
RANDOM_1432_KEY
In case you want to debug your application with gdb, you can convert the checkpoint to a coredump:
sh-5.2# cd checkpoint/
sh-5.2# pwd
/tmp/checkpoint/checkpoint
sh-5.2# coredump-python3
sh-5.2#
sh-5.2# echo info registers | gdb --core core.1 -q
BFD: warning: /tmp/checkpoint/checkpoint/core.1 has a segment extending past end of file
warning: malformed note - filename area is too big
[New LWP 1]
Missing separate debuginfo for the main executable file
Try: dnf --enablerepo='*debug*' install /usr/lib/debug/.build-id/3e/6eae34c82de9e112e48289c49532ee80ab3929
warning: Unexpected size of section `.reg-xstate/1' in core file.
Core was generated by `python3 counter.py'.
warning: Unexpected size of section `.reg-xstate/1' in core file.
#0 0x00007f563e142937 in ?? ()
(gdb) rax 0xfffffffffffffffc -4
rbx 0x1f4 500
rcx 0x7f563e142937 140008385423671
rdx 0x1f4 500
rsi 0x1 1
rdi 0x7f563de4c6b0 140008382318256
rbp 0x4345886f1693 0x4345886f1693
rsp 0x7ffd7fbf3a68 0x7ffd7fbf3a68
r8 0x0 0
r9 0x0 0
r10 0x4345518d0200 73965000000000
r11 0x246 582
r12 0x7f563e7741c0 140008391918016
r13 0x7f563df226c0 140008383194816
r14 0x7f563e72dbf8 140008391629816
r15 0x7f563dc8bfc0 140008380481472
rip 0x7f563e142937 0x7f563e142937
eflags 0x246 [ PF ZF IF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
(gdb) sh-5.2#
Another option to analyze the checkpoint is to copy it to your local machine:
$ oc cp $CHECKPOINT_POD_NAME:/checkpoints/checkpoint-counters-857d7978fd-jnkck_demo-counter-2023-08-24T11\:24\:18Z.tar checkpoint-counters.tar
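And if you want to go one step further than analysis: Adrian’s first blog post also describes how to restore such a checkpoint by wrapping it into an OCI image. A sketch with buildah, assuming the tar was copied locally as checkpoint-counters.tar and using the original container name counter:
$ newcontainer=$(buildah from scratch)
$ buildah add $newcontainer checkpoint-counters.tar /
$ buildah config --annotation=io.kubernetes.cri-o.annotations.checkpoint.name=counter $newcontainer
$ buildah commit $newcontainer checkpoint-image:latest
$ buildah rm $newcontainer
The resulting image can then be pushed to a registry and restored by CRI-O when a pod references it.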
Summary
Forensic analysis is just one of many use cases for container checkpointing. Consider the following scenarios; there are likely many more:
- Long-Running Processes: Applications with prolonged processes or computations benefit from checkpointing. When a container needs to be paused or stopped temporarily, checkpointing allows it to resume from the interruption point. For instance, this is useful during node maintenance to apply operating system updates. Similarly, it enables starting a higher-priority process and resuming the lower-priority process after the higher-priority task completes.
- Backup and Recovery: Creating backups of running containers is critical for swift recovery in the event of hardware failures or crashes. These checkpoints can restore container states and data on alternative infrastructure, ensuring business continuity.
- Pre-Warming and Caching: Another valuable application is pre-warming or caching an application’s startup. By initiating an application, creating a checkpoint, and then quickly starting from the checkpoint, the startup time can be significantly reduced. A proposal by Adrian Reber at the Open Container Initiative explores the idea of storing checkpoints for later startups and other use-cases. You can find the proposal here: OCI Proposal (still a work-in-progress).
While this concept is in its early stages, it’s exciting to witness the possibilities that lie ahead.