As Kubernetes adoption grows, so does the need for tools to manage diverse and widespread installations. Red Hat’s answer to that challenge is Red Hat Advanced Cluster Management for Kubernetes (RHACM). Its range of features is divided into four main areas, which are also clearly reflected in its UI (note: the screenshots were taken with RHACM 2.2; RHACM 2.3 has since been released, so the UI may look slightly different):
In the top left corner, we see “End-to-end visibility”. In the bottom left corner, we see “Application lifecycle”. In the top right corner, we see “Cluster lifecycle” and in the bottom right corner we see “Governance, Risk, and Compliance”.
This article provides an introduction to the concepts behind the “Governance, Risk, and Compliance” section, often shortened to “GRC” and also known as the “Policy Engine”.
The architecture of the Policy Engine in RHACM
Quoting from https://github.com/open-cluster-management/governance-policy-framework :
“The Policy is the Custom Resource Definition (CRD), created for policy framework controllers to monitor. It acts as a vehicle to deliver policies to managed cluster and collect results to send to the hub cluster.”
So, here we find a couple of interesting aspects:
- A Policy is a custom resource defined by a CRD, which means it builds on standard Kubernetes capabilities and is therefore usable on any Kubernetes cluster, not only on OpenShift.
- It interacts with so-called “policy framework controllers”, meaning there can be different controllers.
- It interacts with managed clusters, meaning there can be more than one, and they can be selected individually.
- Results are sent back to the hub cluster, so there is one central place of control.
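The results sent back to the hub show up in the Policy’s status section on the hub cluster. A rough sketch of what that can look like (the cluster name “dev-cluster” and the binding/placement names are purely illustrative):
status:
  compliant: NonCompliant            # aggregated result over all selected clusters
  placement:
    - placementBinding: my-binding
      placementRule: my-placement
  status:
    - clustername: dev-cluster       # illustrative managed-cluster name
      clusternamespace: dev-cluster
      compliant: NonCompliant        # per-cluster result collected from that managed cluster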
The above-mentioned URL also has an architecture image of the whole policy engine setup:
Architecture of RHACM
Red Hat’s standard slide deck introducing RHACM contains a slightly simplified, and therefore different, image:
Simplified Architecture of RHACM
In the lower image, we can see that there are “Out-of-box Policy Controllers”, “Policy Controllers” and “Custom Policy Controllers”.
So RHACM comes with a set of ready-to-use controllers, the “Out-of-box Policy Controllers”.
According to the documentation, there are currently three such “Out-of-box Policy Controllers”: the configuration policy controller, the certificate policy controller, and the IAM policy controller.
More information on the specifics and the allowed parameters of these controllers can be found in the GitHub sections of the controllers:
- https://github.com/open-cluster-management/config-policy-controller
- https://github.com/open-cluster-management/cert-policy-controller
- https://github.com/open-cluster-management/iam-policy-controller
An important additional point: policy controllers could in principle also enforce their policies, and not only send back the status of adherence or compliance. Of the “Out-of-box Policy Controllers”, so far only the configuration policy controller supports the enforce feature.
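Whether a policy merely reports or actively remediates is controlled by the remediationAction field in the Policy spec; both example policies shown in the appendixes below use “inform”:
spec:
  remediationAction: inform   # "inform" only reports violations; "enforce" lets supporting controllers fix them
  disabled: false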
What makes a Policy?
Now that we have had a small look into the overall architecture, let’s dive into policies a bit deeper.
As mentioned above, a Policy is defined by a CRD and can therefore be represented in a YAML file. For a policy to be effective, it needs to consist of three parts: the Policy, the PlacementBinding, and the PlacementRule:
apiVersion: policy.open-cluster-management.io/v1
kind: Policy
---
apiVersion: policy.open-cluster-management.io/v1
kind: PlacementBinding
---
apiVersion: apps.open-cluster-management.io/v1
kind: PlacementRule
Here we simply see the top-level definitions of the three parts. What they can contain and how they interact will be shown later in this blog entry, as well as in upcoming blog entries.
As a quick start, let’s note that the PlacementBinding connects the Policy to the PlacementRule, and the PlacementRule defines where the Policy should be active.
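To make that wiring concrete, here is a minimal sketch of how the three parts reference each other by name; the names my-policy, my-binding, and my-placement are purely illustrative (compare the real policies in the appendixes):
apiVersion: policy.open-cluster-management.io/v1
kind: Policy
metadata:
  name: my-policy              # referenced by the PlacementBinding below
spec:
  remediationAction: inform
  disabled: false
  policy-templates: []         # the actual checks (ConfigurationPolicies etc.) would go here
---
apiVersion: policy.open-cluster-management.io/v1
kind: PlacementBinding
metadata:
  name: my-binding
placementRef:                  # points to the PlacementRule...
  name: my-placement
  kind: PlacementRule
  apiGroup: apps.open-cluster-management.io
subjects:                      # ...and binds it to one or more Policies
  - name: my-policy
    kind: Policy
    apiGroup: policy.open-cluster-management.io
---
apiVersion: apps.open-cluster-management.io/v1
kind: PlacementRule
metadata:
  name: my-placement
spec:
  clusterSelector:             # selects the managed clusters the Policy applies to
    matchExpressions:
      - {key: environment, operator: In, values: ["dev"]}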
With that, we have covered the basics of what a Policy in RHACM is, and how it works.
Some Examples
Okay, let’s look at some example policies to better understand what they do.
Like all Red Hat products, RHACM has a so-called “upstream” community version, available at https://open-cluster-management.io/; its source code can be found on GitHub at https://github.com/open-cluster-management-io. The “downstream” product also has some GitHub repositories; here we are specifically interested in the policies, which can be found at https://github.com/open-cluster-management/policy-collection
For this exercise, let’s pick two specific policies:
- An operator-installation policy
- A config check policy
In the first example, we will make use of the gatekeeper operator policy, which checks for the existence (and can also enforce the installation) of the gatekeeper operator. Gatekeeper itself is something we will not look into here; check out these articles if you would like to learn more, or have a look at its source code repository.
The second policy checks whether the SCCs (Security Context Constraints) adhere to predefined settings (for example, whether it is allowed to run a container with root privileges).
So, the policy for 1.) can be found here: https://github.com/open-cluster-management/policy-collection/blob/main/community/CM-Configuration-Management/policy-gatekeeper-operator.yaml
The policy for 2.) can be found here: https://github.com/open-cluster-management/policy-collection/blob/main/stable/SC-System-and-Communications-Protection/policy-scc.yaml
They can also be found below in the Appendixes for reference.
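If you prefer the command line over copy&paste, the raw YAML files can also be fetched directly; the raw URLs below are derived from the GitHub paths above:
# Fetch the two example policies as raw YAML from GitHub
curl -LO https://raw.githubusercontent.com/open-cluster-management/policy-collection/main/community/CM-Configuration-Management/policy-gatekeeper-operator.yaml
curl -LO https://raw.githubusercontent.com/open-cluster-management/policy-collection/main/stable/SC-System-and-Communications-Protection/policy-scc.yaml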
Example 1
Let’s start with a quick look at the gatekeeper policy:
It defines several objects to be installed, each with its own “objectDefinition” section within the Policy. They are:
- A “Namespace” called “openshift-gatekeeper-operator”
- A “CatalogSource” (defined via the ConfigurationPolicy “gatekeeper-operator-catalog-source”), named “gatekeeper-operator”, to be deployed in the aforementioned new Namespace “openshift-gatekeeper-operator” and making use of the image “quay.io/gatekeeper/gatekeeper-operator-bundle-index:latest”.
- An “OperatorGroup” (via the ConfigurationPolicy “gatekeeper-operator-group”), again to be deployed in the Namespace created in 1.
- A “Subscription” called “gatekeeper-operator-sub”, which instructs the Operator Lifecycle Manager (OLM) to install and manage the operator
- The “Gatekeeper” resource itself, making use of the image “docker.io/openpolicyagent/gatekeeper:v3.3.0”
All of this is described in the YAML of example 1.). We see that a single policy can check for, and enforce, the existence of multiple elements at the same time. We will not go into more detail here on what needs to go into the YAML and what all these objects do; that is left for a later blog entry. Here, we simply want to show how easy it is to make use of such policies.
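As a taste, here is the first of those “objectDefinition” sections, taken from the policy in Appendix A; it declares that the Namespace must exist (“musthave”):
- objectDefinition:
    apiVersion: policy.open-cluster-management.io/v1
    kind: ConfigurationPolicy
    metadata:
      name: gatekeeper-operator-ns
    spec:
      remediationAction: inform
      severity: high
      object-templates:
        - complianceType: musthave      # the object below must exist on the managed cluster
          objectDefinition:
            apiVersion: v1
            kind: Namespace
            metadata:
              name: openshift-gatekeeper-operator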
For demo purposes, I have two OCP clusters, one of which runs RHACM. We can see that when we go to the “Cluster lifecycle” section of RHACM:
List of managed Clusters
An important note here: One of the clusters was prepared with the label “environment=dev” (circled in red above).
No GRC policies have been created in this RHACM yet, so let’s do that with our first example.
To start, let’s go to the bottom left group on the RHACM start page, and we’ll see the UI for Governance and risk (also called: Policy engine):
Startpage of Governance and Risk in RHACM
We click on “Create Policy”:
Dialog for creation of a Policy
By default, we see the YAML code on the right side, which also makes it easy for us to import the above-mentioned first policy. Let’s go to the GitHub page, click on “Raw” for the policy YAML, and simply copy the YAML code from GitHub into the YAML section of RHACM. Note: before pasting into RHACM, clear the YAML section there. Typically you press <ctrl>-a <ctrl>-c in the GitHub window, and <ctrl>-a <ctrl>-v in the RHACM window. After you paste the policy into the YAML edit window in RHACM, you should see the following:
Edit and modification of a Policy
At the end of the policy code, in the “PlacementRule” section, we see that this policy should be applied to all clusters that carry the label “environment” with the value “dev”. Before we can press the “Create” button, we still need to select a namespace in which this policy shall live. This is for internal organization only; it does NOT affect the results of the policy engine itself. So here I simply select the “Default” namespace on the left side. I could also have created dedicated policy-engine namespaces in advance to group policies more efficiently. Also note that I do not yet enable the “Enforce if supported” option.
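The relevant part of the PlacementRule at the end of the policy YAML looks as follows (see Appendix A):
apiVersion: apps.open-cluster-management.io/v1
kind: PlacementRule
metadata:
  name: placement-policy-gatekeeper-operator
spec:
  clusterConditions:
    - status: "True"
      type: ManagedClusterConditionAvailable            # only clusters that are currently available
  clusterSelector:
    matchExpressions:
      - {key: environment, operator: In, values: ["dev"]}   # only clusters labeled environment=dev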
Before we create the policy, let’s again check the list of installed operators on the cluster itself in its OpenShift UI:
List of installed Operators in OpenShift
We see that this cluster is also a fresh one; no additional operators are installed.
So, let’s create the policy by clicking on the “Create” button in the top right corner of the “Create policy” dialog in RHACM.
We are forwarded to a screen that, after a couple of moments, looks like this:
Result of an instantiated running Policy
We see that RHACM has detected that the policy shall be applied to 1 cluster, and that the policy is NOT adhered to on this cluster; therefore we have one policy violation. We can click on the policy name to get a more detailed overview, where we select the “Status” tab:
Details of Policy failure
We see: The required operator elements are missing, which is why the policy failed.
If we go back to the policy overview (two images above), we see three vertically stacked dots at the end of the line where the policy is listed. If we click on those, we get a popup box in which we can select an action for the policy:
How to enforce a Policy
Let’s click on “Enforce” and confirm that action in the next popup box.
A couple of moments later the image changes to:
Successful enforcement of a Policy
And, when we again check the details of the policy, we see:
Details of Success of a Policy
And we can confirm in our cluster with the “environment=dev” label, that the operator has been installed:
List of installed Operators in OpenShift with new Operator
This concludes our first example. We learned how a policy can ensure or simply check for the installation of a specific operator and all the elements it needs to run.
Example 2
Just as with example 1, let’s first look at the policy itself.
Unlike our first example, this policy has only a single “objectDefinition” section. It is of the kind “ConfigurationPolicy” and refers to an “object-template” of the kind “SecurityContextConstraints”.
SecurityContextConstraints (SCCs) in OpenShift define what permissions running containers have. They consist of a number of attributes, for example “allowPrivilegedContainer” or “allowPrivilegeEscalation”.
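A shortened excerpt of such an SCC definition, based on the one in Appendix B (most fields trimmed for brevity):
apiVersion: security.openshift.io/v1
kind: SecurityContextConstraints
metadata:
  name: sample-restricted-scc
allowPrivilegedContainer: false   # containers may not run privileged
allowPrivilegeEscalation: true    # but privilege escalation inside the container is allowed
allowHostNetwork: false
allowHostPID: false
runAsUser:
  type: MustRunAsRange            # UIDs must come from the namespace's allocated range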
We can check for the default setup of these SCCs via:
[mpfuetzn@mpfuetzn oc4.6.25]$ ./oc get scc
NAME                              PRIV    CAPS         SELINUX     RUNASUSER          FSGROUP     SUPGROUP    PRIORITY     READONLYROOTFS   VOLUMES
anyuid                            false   <no value>   MustRunAs   RunAsAny           RunAsAny    RunAsAny    10           false            ["configMap","downwardAPI","emptyDir","persistentVolumeClaim","projected","secret"]
hostaccess                        false   <no value>   MustRunAs   MustRunAsRange     MustRunAs   RunAsAny    <no value>   false            ["configMap","downwardAPI","emptyDir","hostPath","persistentVolumeClaim","projected","secret"]
hostmount-anyuid                  false   <no value>   MustRunAs   RunAsAny           RunAsAny    RunAsAny    <no value>   false            ["configMap","downwardAPI","emptyDir","hostPath","nfs","persistentVolumeClaim","projected","secret"]
hostnetwork                       false   <no value>   MustRunAs   MustRunAsRange     MustRunAs   MustRunAs   <no value>   false            ["configMap","downwardAPI","emptyDir","persistentVolumeClaim","projected","secret"]
machine-api-termination-handler   false   <no value>   MustRunAs   RunAsAny           MustRunAs   MustRunAs   <no value>   false            ["downwardAPI","hostPath"]
node-exporter                     true    <no value>   RunAsAny    RunAsAny           RunAsAny    RunAsAny    <no value>   false            ["*"]
nonroot                           false   <no value>   MustRunAs   MustRunAsNonRoot   RunAsAny    RunAsAny    <no value>   false            ["configMap","downwardAPI","emptyDir","persistentVolumeClaim","projected","secret"]
privileged                        true    ["*"]        RunAsAny    RunAsAny           RunAsAny    RunAsAny    <no value>   false            ["*"]
restricted                        false   <no value>   MustRunAs   MustRunAsRange     MustRunAs   RunAsAny    <no value>   false            ["configMap","downwardAPI","emptyDir","persistentVolumeClaim","projected","secret"]
[mpfuetzn@mpfuetzn oc4.6.25]$
We see that there are a couple of predefined SCCs. We can look at, for example, the “restricted” one and its definition via:
[mpfuetzn@mpfuetzn oc4.6.25]$ ./oc describe scc restricted
Name:                                   restricted
Priority:                               <none>
Access:
  Users:                                <none>
  Groups:                               system:authenticated
Settings:
  Allow Privileged:                     false
  Allow Privilege Escalation:           true
  Default Add Capabilities:             <none>
  Required Drop Capabilities:           KILL,MKNOD,SETUID,SETGID
  Allowed Capabilities:                 <none>
  Allowed Seccomp Profiles:             <none>
  Allowed Volume Types:                 configMap,downwardAPI,emptyDir,persistentVolumeClaim,projected,secret
  Allowed Flexvolumes:                  <all>
  Allowed Unsafe Sysctls:               <none>
  Forbidden Sysctls:                    <none>
  Allow Host Network:                   false
  Allow Host Ports:                     false
  Allow Host PID:                       false
  Allow Host IPC:                       false
  Read Only Root Filesystem:            false
  Run As User Strategy: MustRunAsRange
    UID:                                <none>
    UID Range Min:                      <none>
    UID Range Max:                      <none>
  SELinux Context Strategy: MustRunAs
    User:                               <none>
    Role:                               <none>
    Type:                               <none>
    Level:                              <none>
  FSGroup Strategy: MustRunAs
    Ranges:                             <none>
  Supplemental Groups Strategy: RunAsAny
    Ranges:                             <none>
[mpfuetzn@mpfuetzn oc4.6.25]$
In the YAML of our example, we see that this policy will create, or check for the existence of, an SCC named “sample-restricted-scc”.
In this example, we will now use the policy to make sure that the “restricted” SCC will look like the definition in the policy.
So, let’s again create a new policy by copy&pasting the YAML code into the RHACM policy creation dialog.
To do so, we again need to select a namespace in which the policy shall live; I again select default (see my remarks on this selection above). But there is a second thing we need to do: in line 39 of the YAML we still see “sample-restricted-scc”, so let’s change that to “restricted”:
Edit and Modification of a Policy
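In YAML terms, the change boils down to one line in the object-template’s metadata (compare Appendix B):
objectDefinition:
  apiVersion: security.openshift.io/v1
  kind: SecurityContextConstraints
  metadata:
    name: restricted   # changed from "sample-restricted-scc" to target the existing SCC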
Note again that the last lines of this policy restrict it to clusters with the “environment=dev” label, so we are safe in this example.
Again, this policy fails, because it defines a slightly different setup than the current state we saw above. So let’s look at the results of the policy check (note: again, I did not enable “Enforce if supported” when creating the policy):
Result of failed Policy
Drilling down reveals:
Details of failed Policy
Let’s view the details:
More details of failed Policy
If we again set the policy to “Enforce”, as we did in the other example above, it will “correct” the error, which leads to the following output:
[mpfuetzn@mpfuetzn oc4.6.25]$ ./oc describe scc restricted
Name:                                   restricted
Priority:                               10
Access:
  Users:                                <none>
  Groups:                               system:authenticated
Settings:
  Allow Privileged:                     false
  Allow Privilege Escalation:           true
  Default Add Capabilities:             <none>
  Required Drop Capabilities:           KILL,MKNOD,SETUID,SETGID
  Allowed Capabilities:                 <none>
  Allowed Seccomp Profiles:             <none>
  Allowed Volume Types:                 configMap,downwardAPI,emptyDir,persistentVolumeClaim,projected,secret
  Allowed Flexvolumes:                  <all>
  Allowed Unsafe Sysctls:               <none>
  Forbidden Sysctls:                    <none>
  Allow Host Network:                   false
  Allow Host Ports:                     false
  Allow Host PID:                       false
  Allow Host IPC:                       false
  Read Only Root Filesystem:            false
  Run As User Strategy: MustRunAsRange
    UID:                                <none>
    UID Range Min:                      <none>
    UID Range Max:                      <none>
  SELinux Context Strategy: MustRunAs
    User:                               <none>
    Role:                               <none>
    Type:                               <none>
    Level:                              <none>
  FSGroup Strategy: MustRunAs
    Ranges:                             <none>
  Supplemental Groups Strategy: RunAsAny
    Ranges:                             <none>
[mpfuetzn@mpfuetzn oc4.6.25]$
If we look in more detail at where the original SCC and the newly enforced SCC differ, we see:
[mpfuetzn@mpfuetzn oc4.6.25]$ diff orig new
3c3
< Priority: <none>
---
> Priority: 10
[mpfuetzn@mpfuetzn oc4.6.25]$
It’s a small change, but still, it’s a difference… 🙂 And regardless, the policy ensures that the SCC stays as intended, and doesn’t get modified by accident.
This finishes our introduction to Policies in RHACM. More will follow in future blog entries, starting here: https://open011prod.wpengine.com/2021/10/11/rhacm-and-policies-more-details/
Appendix A: Example 1:
apiVersion: policy.open-cluster-management.io/v1
kind: Policy
metadata:
  name: policy-gatekeeper-operator
  annotations:
    policy.open-cluster-management.io/standards: NIST SP 800-53
    policy.open-cluster-management.io/categories: CM Configuration Management
    policy.open-cluster-management.io/controls: CM-2 Baseline Configuration
spec:
  remediationAction: inform
  disabled: false
  policy-templates:
    - objectDefinition:
        apiVersion: policy.open-cluster-management.io/v1
        kind: ConfigurationPolicy
        metadata:
          name: gatekeeper-operator-ns
        spec:
          remediationAction: inform
          severity: high
          object-templates:
            - complianceType: musthave
              objectDefinition:
                apiVersion: v1
                kind: Namespace
                metadata:
                  name: openshift-gatekeeper-operator
    - objectDefinition:
        apiVersion: policy.open-cluster-management.io/v1
        kind: ConfigurationPolicy
        metadata:
          name: gatekeeper-operator-catalog-source
        spec:
          remediationAction: inform
          severity: high
          object-templates:
            - complianceType: musthave
              objectDefinition:
                apiVersion: operators.coreos.com/v1alpha1
                kind: CatalogSource
                metadata:
                  name: gatekeeper-operator
                  namespace: openshift-gatekeeper-operator
                spec:
                  displayName: Gatekeeper Operator Upstream
                  publisher: github.com/font/gatekeeper-operator
                  sourceType: grpc
                  image: 'quay.io/gatekeeper/gatekeeper-operator-bundle-index:latest'
                  updateStrategy:
                    registryPoll:
                      interval: 45m
    - objectDefinition:
        apiVersion: policy.open-cluster-management.io/v1
        kind: ConfigurationPolicy
        metadata:
          name: gatekeeper-operator-group
        spec:
          remediationAction: inform
          severity: high
          object-templates:
            - complianceType: musthave
              objectDefinition:
                apiVersion: operators.coreos.com/v1
                kind: OperatorGroup
                metadata:
                  name: gatekeeper-operator
                  namespace: openshift-gatekeeper-operator
    - objectDefinition:
        apiVersion: policy.open-cluster-management.io/v1
        kind: ConfigurationPolicy
        metadata:
          name: gatekeeper-operator-subscription
        spec:
          remediationAction: inform
          severity: high
          object-templates:
            - complianceType: musthave
              objectDefinition:
                apiVersion: operators.coreos.com/v1alpha1
                kind: Subscription
                metadata:
                  name: gatekeeper-operator-sub
                  namespace: openshift-gatekeeper-operator
                spec:
                  channel: stable
                  name: gatekeeper-operator
                  source: gatekeeper-operator
                  sourceNamespace: openshift-gatekeeper-operator
    - objectDefinition:
        apiVersion: policy.open-cluster-management.io/v1
        kind: ConfigurationPolicy
        metadata:
          name: gatekeeper
        spec:
          remediationAction: inform
          severity: high
          object-templates:
            - complianceType: musthave
              objectDefinition:
                apiVersion: operator.gatekeeper.sh/v1alpha1
                kind: Gatekeeper
                metadata:
                  name: gatekeeper
                spec:
                  audit:
                    logLevel: INFO
                    replicas: 1
                  image:
                    image: 'docker.io/openpolicyagent/gatekeeper:v3.3.0'
                  validatingWebhook: Enabled
                  mutatingWebhook: Disabled
                  webhook:
                    emitAdmissionEvents: Enabled
                    logLevel: INFO
                    replicas: 2
---
apiVersion: policy.open-cluster-management.io/v1
kind: PlacementBinding
metadata:
  name: binding-policy-gatekeeper-operator
placementRef:
  name: placement-policy-gatekeeper-operator
  kind: PlacementRule
  apiGroup: apps.open-cluster-management.io
subjects:
  - name: policy-gatekeeper-operator
    kind: Policy
    apiGroup: policy.open-cluster-management.io
---
apiVersion: apps.open-cluster-management.io/v1
kind: PlacementRule
metadata:
  name: placement-policy-gatekeeper-operator
spec:
  clusterConditions:
    - status: "True"
      type: ManagedClusterConditionAvailable
  clusterSelector:
    matchExpressions:
      - {key: environment, operator: In, values: ["dev"]}
Appendix B: Example 2:
apiVersion: policy.open-cluster-management.io/v1
kind: Policy
metadata:
  name: policy-securitycontextconstraints
  annotations:
    policy.open-cluster-management.io/standards: NIST SP 800-53
    policy.open-cluster-management.io/categories: SC System and Communications Protection
    policy.open-cluster-management.io/controls: SC-4 Information In Shared Resources
spec:
  remediationAction: inform
  disabled: false
  policy-templates:
    - objectDefinition:
        apiVersion: policy.open-cluster-management.io/v1
        kind: ConfigurationPolicy
        metadata:
          name: policy-securitycontextconstraints-example
        spec:
          remediationAction: inform # the policy-template spec.remediationAction is overridden by the preceding parameter value for spec.remediationAction.
          severity: high
          namespaceSelector:
            exclude: ["kube-*"]
            include: ["default"]
          object-templates:
            - complianceType: musthave
              objectDefinition:
                apiVersion: security.openshift.io/v1
                kind: SecurityContextConstraints # restricted scc
                metadata:
                  annotations:
                    kubernetes.io/description: restricted denies access to all host features and requires pods to be run with a UID, and SELinux context that are allocated to the namespace. This is the most restrictive SCC and it is used by default for authenticated users.
                  name: sample-restricted-scc
                allowHostDirVolumePlugin: false
                allowHostIPC: false
                allowHostNetwork: false
                allowHostPID: false
                allowHostPorts: false
                allowPrivilegeEscalation: true
                allowPrivilegedContainer: false
                allowedCapabilities: []
                defaultAddCapabilities: []
                fsGroup:
                  type: MustRunAs
                groups:
                  - system:authenticated
                priority: 10
                readOnlyRootFilesystem: false
                requiredDropCapabilities:
                  - KILL
                  - MKNOD
                  - SETUID
                  - SETGID
                runAsUser:
                  type: MustRunAsRange
                seLinuxContext:
                  type: MustRunAs
                supplementalGroups:
                  type: RunAsAny
                users: []
                volumes:
                  - configMap
                  - downwardAPI
                  - emptyDir
                  - persistentVolumeClaim
                  - projected
                  - secret
---
apiVersion: policy.open-cluster-management.io/v1
kind: PlacementBinding
metadata:
  name: binding-policy-securitycontextconstraints
placementRef:
  name: placement-policy-securitycontextconstraints
  kind: PlacementRule
  apiGroup: apps.open-cluster-management.io
subjects:
  - name: policy-securitycontextconstraints
    kind: Policy
    apiGroup: policy.open-cluster-management.io
---
apiVersion: apps.open-cluster-management.io/v1
kind: PlacementRule
metadata:
  name: placement-policy-securitycontextconstraints
spec:
  clusterConditions:
    - status: "True"
      type: ManagedClusterConditionAvailable
  clusterSelector:
    matchExpressions:
      - {key: environment, operator: In, values: ["dev"]}