RHACM and Policies – An Introduction

August 23, 2021

As Kubernetes adoption grows, so does the need for tools to manage diverse and widespread installations. Red Hat’s answer to that challenge is Red Hat Advanced Cluster Management for Kubernetes (RHACM). Its range of features is divided into four main areas, which can also easily be seen in its UI (note: the screenshots are taken from RHACM 2.2; RHACM 2.3 is already available, so the UI may now look a bit different):

In the top left corner we see “End-to-end visibility”, in the bottom left “Application lifecycle”, in the top right “Cluster lifecycle”, and in the bottom right “Governance, Risk, and Compliance”.

This article provides an introduction to the concepts behind the “Governance, Risk and Compliance” section, sometimes shortened to “GRC” and also known as the “Policy Engine”.

The architecture of the Policy Engine in RHACM

Quoting from https://github.com/open-cluster-management/governance-policy-framework :

“The Policy is the Custom Resource Definition (CRD), created for policy framework controllers to monitor. It acts as a vehicle to deliver policies to managed cluster and collect results to send to the hub cluster.”

So, here we find a couple of interesting aspects:

  1. A Policy is a CRD, which means it makes use of standard Kubernetes capabilities and is therefore applicable to any Kubernetes distribution, not only OpenShift.
  2. It interacts with so-called “policy framework controllers”, meaning there can be different controllers.
  3. It interacts with managed clusters, meaning there can be more than one, and they can be selected individually.
  4. Results are sent back to the hub cluster, so that there is one central place of control.
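
Since a Policy is an ordinary custom resource, we can verify that the CRD is present on a hub cluster directly from the CLI (a quick check, assuming oc or kubectl access to the hub):

oc get crd policies.policy.open-cluster-management.io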

The above-mentioned URL also has an architecture image of the whole policy engine setup:

Red Hat’s standard slide deck for the introduction of RHACM has a slightly simplified and therefore different image:

What we can see in the lower image is that there are “Out-of-box Policy Controllers”, “Policy Controllers” and “Custom Policy Controllers”.

So RHACM comes with a set of ready-to-use policy controllers, the “Out-of-box Policy Controllers”.

According to the documentation, there are currently three such “Out-of-box Policy Controllers”: the configuration policy controller, the certificate policy controller, and the IAM policy controller.

More information on the specifics and the allowed parameters of these controllers can be found in the GitHub repositories of the controllers:

An additional important piece of information: Policy Controllers could in principle also enforce their policies, rather than only send back the status of adherence or compliance. Of the “Out-of-box Policy Controllers”, so far only the configuration policy controller supports the enforce feature.
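
Whether a policy only reports or actively remediates is controlled by the remediationAction field in its spec. A minimal fragment (the name is a placeholder; complete examples are in the appendixes below):

apiVersion: policy.open-cluster-management.io/v1
kind: Policy
metadata:
  name: policy-example        # placeholder name
spec:
  remediationAction: inform   # "inform" only reports violations; "enforce" remediates them
  disabled: false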

What makes a Policy?

Now that we have had a brief look at the overall architecture, let’s dive a bit deeper into policies.

As mentioned above, a Policy is a CRD and can therefore be represented in a YAML file. For the policy to be effective, it needs to consist of three parts: the Policy, the PlacementBinding, and the PlacementRule:

apiVersion: policy.open-cluster-management.io/v1
kind: Policy
# ...
---
apiVersion: policy.open-cluster-management.io/v1
kind: PlacementBinding
# ...
---
apiVersion: apps.open-cluster-management.io/v1
kind: PlacementRule
# ...

Here we simply see the top-level definitions of the three parts. What they can contain and how they interact will be shown later in this blog entry and also in upcoming blog entries.

As a quick start, let’s note that the PlacementBinding connects the Policy to the PlacementRule, and the PlacementRule defines where the Policy should be active.
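
In skeleton form, the wiring looks like this (the names are placeholders; the complete, working versions appear in the appendixes):

apiVersion: policy.open-cluster-management.io/v1
kind: PlacementBinding
metadata:
  name: binding-example
placementRef:                 # points at the PlacementRule...
  name: placement-example
  kind: PlacementRule
  apiGroup: apps.open-cluster-management.io
subjects:                     # ...and binds it to one or more Policies
- name: policy-example
  kind: Policy
  apiGroup: policy.open-cluster-management.io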

With that, we have covered the basics of what a Policy in RHACM is, and how it works.

Some Examples

Okay, let’s look at some example policies to better understand what they do.

Like all Red Hat products, RHACM has a so-called “upstream” community version, which is available under: https://open-cluster-management.io/. Its source code can be found on GitHub: https://github.com/open-cluster-management-io. The “downstream” product also has some GitHub repositories; here we are specifically interested in the policies, which can be found in: https://github.com/open-cluster-management/policy-collection

For this exercise, let’s pick two specific policies:

  1. An operator-installation policy
  2. A config check policy

In the first example, we will make use of the gatekeeper operator policy, which checks for the existence (and can also enforce the installation) of the gatekeeper operator. Gatekeeper itself is something we will not look into here; check out these articles if you would like to learn more, or have a look at its source code repository.

The second policy checks whether the SCCs (Security Context Constraints) adhere to predefined settings (for example, whether it is allowed to run a container with root privileges).

So, the policy for 1.) can be found here: https://github.com/open-cluster-management/policy-collection/blob/main/community/CM-Configuration-Management/policy-gatekeeper-operator.yaml 

The policy for 2.) can be found here: https://github.com/open-cluster-management/policy-collection/blob/main/stable/SC-System-and-Communications-Protection/policy-scc.yaml

They can also be found below in the Appendixes for reference.

Example 1

Let’s start with a quick look at the gatekeeper policy:

It defines some namespaces and operator resources to be installed. We can see these in the YAML, each with its own “objectDefinition” section inside the Policy. They are:

  1. A “Namespace” called “openshift-gatekeeper-operator”
  2. A “CatalogSource” named “gatekeeper-operator” (defined in a ConfigurationPolicy called “gatekeeper-operator-catalog-source”), to be deployed in the aforementioned new Namespace “openshift-gatekeeper-operator”, making use of the image “quay.io/gatekeeper/gatekeeper-operator-bundle-index:latest”
  3. An “OperatorGroup” (defined in a ConfigurationPolicy called “gatekeeper-operator-group”), again to be deployed in the Namespace created in 1.
  4. A “Subscription” named “gatekeeper-operator-sub”, which tells the Operator Lifecycle Manager (OLM) to install the operator and keep it updated
  5. The “Gatekeeper” resource itself, making use of the following image: “docker.io/openpolicyagent/gatekeeper:v3.3.0”

This is all described in the YAML of our example 1. We see that a single policy can check and enforce the existence of multiple elements at the same time. We will not go into more detail here on what needs to be put into the YAML and what these things all do; that is left for a later blog entry. For now, we simply want to show how easy it is to make use of such policies.

For demo purposes, I have two OCP clusters, one of which runs RHACM. We can see them when we go to the “Cluster lifecycle” section of RHACM:

An important note here: One of the clusters was prepared with the label “environment=dev” (circled in red above).
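
Such a label can also be set from the CLI on the hub cluster, along these lines (a sketch; the cluster name is whatever “oc get managedclusters” shows for your managed cluster):

oc label managedcluster <cluster-name> environment=dev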

No GRC policies have been created in this RHACM yet, so let’s do that with our first example.

To start, let’s go to the bottom left group on the RHACM start page, where we see the UI for Governance and risk (also called the Policy engine):

We click on “Create Policy”:

By default, we see the YAML code on the right side, which also makes it easy for us to import the above-mentioned first policy. Let’s go to the GitHub page, click on “Raw” for the policy YAML, and copy the YAML code from GitHub into the YAML section of RHACM. Note: before pasting into RHACM, clear the YAML section there. Typically you do a <ctrl>-a <ctrl>-c in the GitHub window, and a <ctrl>-a <ctrl>-v in the RHACM window. After you paste the policy into that YAML edit window in RHACM, you should see the following:

In the last line of the policy code, in the “PlacementRule” section, we see that this policy should be used on all clusters that have a label “environment” with a value of “dev”. Before we can press the “Create” button, we still need to select a namespace in which this policy shall be stored. This is for internal organization purposes only; it does NOT affect the results of the policy engine itself. So, here I simply select the “default” namespace on the left side. I could also have created some dedicated policy namespaces in advance to be able to group policies more efficiently. Also note that I will not yet select the “Enforce if supported” button.
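
The relevant part of the PlacementRule is the cluster selector (taken verbatim from the policy YAML; see Appendix A):

clusterSelector:
  matchExpressions:
    - {key: environment, operator: In, values: ["dev"]}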

Before we create the policy, let’s check the list of installed operators on the managed cluster itself in its OpenShift UI:

We see that this cluster is also a fresh one; no additional operators are installed.

So, let’s create the policy by clicking on the “Create” button in the top right corner of the “Create policy” dialog in RHACM.

We are forwarded to a screen, which after a couple of moments looks like this:

We see that RHACM detected that the policy shall be used on 1 cluster, and that the policy is NOT adhered to in this cluster; therefore, we have one policy violation. We can click on the policy name to get a more detailed overview, where we select the “Status” tab:

We see: The required operator elements are missing, which is why the policy failed.
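
The same compliance state is also visible from the CLI on the hub (illustrative output; the exact columns may vary by RHACM version):

oc get policies.policy.open-cluster-management.io -n default
# NAME                         REMEDIATION ACTION   COMPLIANCE STATE   AGE
# policy-gatekeeper-operator   inform               NonCompliant       2m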

If we go back to the policy overview two images above, we see three vertically stacked dots at the end of the line where the policy is listed. If we click on those, we get a popup box in which we can select an action for the policy:

Let’s click on “Enforce” and confirm that action in the next popup box.
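
This effectively flips the policy’s remediationAction from “inform” to “enforce”. The same could be done from the CLI, roughly like this (a sketch, assuming the policy lives in the default namespace):

oc patch policies.policy.open-cluster-management.io policy-gatekeeper-operator -n default \
  --type merge -p '{"spec":{"remediationAction":"enforce"}}'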

A couple of moments later the image changes to:

And, when we again check the details of the policy, we see:

And we can confirm in our cluster with the “environment=dev” label that the operator has been installed:
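
From the CLI of the managed cluster, the same check could look like this (illustrative output; the CSV name and version depend on what the catalog currently serves):

oc get csv -n openshift-gatekeeper-operator
# NAME                         DISPLAY               VERSION   PHASE
# gatekeeper-operator.v0.1.1   Gatekeeper Operator   0.1.1     Succeeded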

This concludes our first example. We learned how a policy can ensure or simply check for the installation of a specific operator and all the elements it needs to run.

Example 2

Just as with example 1, let’s first look at the policy itself.

Unlike our first example, this policy has only one single “objectDefinition” section. It is of the kind “ConfigurationPolicy” and refers to an “object-template” of the kind “SecurityContextConstraints”.

SecurityContextConstraints in OpenShift are used to define what permissions running containers will have. They consist of a couple of attributes, for example “allowPrivilegedContainer” or “allowPrivilegeEscalation”.
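
To give a flavor of what such a resource looks like, here is a minimal fragment (the field values are taken from the policy in Appendix B; the name is a placeholder):

apiVersion: security.openshift.io/v1
kind: SecurityContextConstraints
metadata:
  name: example-scc              # placeholder name
allowPrivilegedContainer: false  # no privileged containers
allowPrivilegeEscalation: true
runAsUser:
  type: MustRunAsRange           # containers must run within the namespace's UID range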

We can check for the default setup of these SCCs via:

[mpfuetzn@mpfuetzn oc4.6.25]$ ./oc get scc
NAME                              PRIV    CAPS         SELINUX     RUNASUSER          FSGROUP     SUPGROUP    PRIORITY     READONLYROOTFS   VOLUMES
anyuid                            false   <no value>   MustRunAs   RunAsAny           RunAsAny    RunAsAny    10           false            ["configMap","downwardAPI","emptyDir","persistentVolumeClaim","projected","secret"]
hostaccess                        false   <no value>   MustRunAs   MustRunAsRange     MustRunAs   RunAsAny    <no value>   false            ["configMap","downwardAPI","emptyDir","hostPath","persistentVolumeClaim","projected","secret"]
hostmount-anyuid                  false   <no value>   MustRunAs   RunAsAny           RunAsAny    RunAsAny    <no value>   false            ["configMap","downwardAPI","emptyDir","hostPath","nfs","persistentVolumeClaim","projected","secret"]
hostnetwork                       false   <no value>   MustRunAs   MustRunAsRange     MustRunAs   MustRunAs   <no value>   false            ["configMap","downwardAPI","emptyDir","persistentVolumeClaim","projected","secret"]
machine-api-termination-handler   false   <no value>   MustRunAs   RunAsAny           MustRunAs   MustRunAs   <no value>   false            ["downwardAPI","hostPath"]
node-exporter                     true    <no value>   RunAsAny    RunAsAny           RunAsAny    RunAsAny    <no value>   false            ["*"]
nonroot                           false   <no value>   MustRunAs   MustRunAsNonRoot   RunAsAny    RunAsAny    <no value>   false            ["configMap","downwardAPI","emptyDir","persistentVolumeClaim","projected","secret"]
privileged                        true    ["*"]        RunAsAny    RunAsAny           RunAsAny    RunAsAny    <no value>   false            ["*"]
restricted                        false   <no value>   MustRunAs   MustRunAsRange     MustRunAs   RunAsAny    <no value>   false            ["configMap","downwardAPI","emptyDir","persistentVolumeClaim","projected","secret"]
[mpfuetzn@mpfuetzn oc4.6.25]$

We see that there are a couple of predefined ones. We can look at, for example, the “restricted” one and its definition via:

[mpfuetzn@mpfuetzn oc4.6.25]$ ./oc describe scc restricted
Name:                        restricted
Priority:                    <none>
Access:                        
  Users:                    <none>
  Groups:                    system:authenticated
Settings:                    
  Allow Privileged:                false
  Allow Privilege Escalation:            true
  Default Add Capabilities:            <none>
  Required Drop Capabilities:            KILL,MKNOD,SETUID,SETGID
  Allowed Capabilities:                <none>
  Allowed Seccomp Profiles:            <none>
  Allowed Volume Types:                configMap,downwardAPI,emptyDir,persistentVolumeClaim,projected,secret
  Allowed Flexvolumes:                <all>
  Allowed Unsafe Sysctls:            <none>
  Forbidden Sysctls:                <none>
  Allow Host Network:                false
  Allow Host Ports:                false
  Allow Host PID:                false
  Allow Host IPC:                false
  Read Only Root Filesystem:            false
  Run As User Strategy: MustRunAsRange        
    UID:                    <none>
    UID Range Min:                <none>
    UID Range Max:                <none>
  SELinux Context Strategy: MustRunAs        
    User:                    <none>
    Role:                    <none>
    Type:                    <none>
    Level:                    <none>
  FSGroup Strategy: MustRunAs            
    Ranges:                    <none>
  Supplemental Groups Strategy: RunAsAny    
    Ranges:                    <none>
[mpfuetzn@mpfuetzn oc4.6.25]$

In the YAML of our example, we see that this policy checks for the existence of (or, when enforcing, creates) an SCC named “sample-restricted-scc”.

In this example, we will now use the policy to make sure that the “restricted” SCC will look like the definition in the policy.

So, let’s again create a new policy by copying and pasting the YAML code into our RHACM policy definition window.

To achieve this, we again need to define a namespace in which the policy shall run; I again select “default” (see my remarks above on this selection). But there is a second thing we need to do: in line 39 of the pasted YAML we still see “sample-restricted-scc”, so let’s change that to “restricted”:
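
The changed part of the object template then reads (only the name differs from the original policy in Appendix B):

kind: SecurityContextConstraints
metadata:
  name: restricted   # changed from "sample-restricted-scc"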

Note again that the last line of this policy restricts it to clusters with the “environment=dev” label, so we are safe in this example.

Again, this policy fails, because it defines a slightly different setup than the current state we saw above. So let’s look at the results of the policy check (note: I again did not set the “enforce” option when creating the policy):

Drilling down reveals:

Let’s view the details:

If we again set the policy to “Enforce” as we did above in the other example, it will “correct” the error, and that will lead to the following output for “oc describe scc restricted”:

[mpfuetzn@mpfuetzn oc4.6.25]$ ./oc describe scc restricted
Name:                        restricted
Priority:                    10
Access:                        
  Users:                    <none>
  Groups:                    system:authenticated
Settings:                    
  Allow Privileged:                false
  Allow Privilege Escalation:            true
  Default Add Capabilities:            <none>
  Required Drop Capabilities:            KILL,MKNOD,SETUID,SETGID
  Allowed Capabilities:                <none>
  Allowed Seccomp Profiles:            <none>
  Allowed Volume Types:                configMap,downwardAPI,emptyDir,persistentVolumeClaim,projected,secret
  Allowed Flexvolumes:                <all>
  Allowed Unsafe Sysctls:            <none>
  Forbidden Sysctls:                <none>
  Allow Host Network:                false
  Allow Host Ports:                false
  Allow Host PID:                false
  Allow Host IPC:                false
  Read Only Root Filesystem:            false
  Run As User Strategy: MustRunAsRange        
    UID:                    <none>
    UID Range Min:                <none>
    UID Range Max:                <none>
  SELinux Context Strategy: MustRunAs        
    User:                    <none>
    Role:                    <none>
    Type:                    <none>
    Level:                    <none>
  FSGroup Strategy: MustRunAs            
    Ranges:                    <none>
  Supplemental Groups Strategy: RunAsAny    
    Ranges:                    <none>
[mpfuetzn@mpfuetzn oc4.6.25]$

If we look in more detail at where the original SCC and the newly enforced SCC differ, we see:

[mpfuetzn@mpfuetzn oc4.6.25]$ diff orig new
3c3
< Priority:                    <none>
---
> Priority:                    10
[mpfuetzn@mpfuetzn oc4.6.25]$

It’s a small change, but still, it’s a difference… 🙂 And regardless, the policy ensures that the SCC stays as intended, and doesn’t get modified by accident.
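
A quick way to spot-check just that field from the CLI (the output shown is what we expect after enforcement):

oc get scc restricted -o jsonpath='{.priority}'
# 10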

This finishes our introduction to policies in RHACM. More to come in future blog entries, starting here: https://open011prod.wpengine.com/2021/10/11/rhacm-and-policies-more-details/

Appendix A: Example 1:

apiVersion: policy.open-cluster-management.io/v1
kind: Policy
metadata:
  name: policy-gatekeeper-operator
  annotations:
    policy.open-cluster-management.io/standards: NIST SP 800-53
    policy.open-cluster-management.io/categories: CM Configuration Management
    policy.open-cluster-management.io/controls: CM-2 Baseline Configuration
spec:
  remediationAction: inform
  disabled: false
  policy-templates:
  - objectDefinition:
      apiVersion: policy.open-cluster-management.io/v1
      kind: ConfigurationPolicy
      metadata:
        name: gatekeeper-operator-ns
      spec:
        remediationAction: inform
        severity: high
        object-templates:
          - complianceType: musthave
            objectDefinition:
              apiVersion: v1
              kind: Namespace
              metadata:
                name: openshift-gatekeeper-operator
  - objectDefinition:
      apiVersion: policy.open-cluster-management.io/v1
      kind: ConfigurationPolicy
      metadata:
        name: gatekeeper-operator-catalog-source
      spec:
        remediationAction: inform
        severity: high
        object-templates:
          - complianceType: musthave
            objectDefinition:
              apiVersion: operators.coreos.com/v1alpha1
              kind: CatalogSource
              metadata:
                name: gatekeeper-operator
                namespace: openshift-gatekeeper-operator
              spec:
                displayName: Gatekeeper Operator Upstream
                publisher: github.com/font/gatekeeper-operator
                sourceType: grpc
                image: 'quay.io/gatekeeper/gatekeeper-operator-bundle-index:latest'
                updateStrategy:
                  registryPoll:
                    interval: 45m
  - objectDefinition:
      apiVersion: policy.open-cluster-management.io/v1
      kind: ConfigurationPolicy
      metadata:
        name: gatekeeper-operator-group
      spec:
        remediationAction: inform
        severity: high
        object-templates:
          - complianceType: musthave
            objectDefinition:
              apiVersion: operators.coreos.com/v1
              kind: OperatorGroup
              metadata:
                name: gatekeeper-operator
                namespace: openshift-gatekeeper-operator
  - objectDefinition:
      apiVersion: policy.open-cluster-management.io/v1
      kind: ConfigurationPolicy
      metadata:
        name: gatekeeper-operator-subscription
      spec:
        remediationAction: inform
        severity: high
        object-templates:
          - complianceType: musthave
            objectDefinition:
              apiVersion: operators.coreos.com/v1alpha1
              kind: Subscription
              metadata:
                name: gatekeeper-operator-sub
                namespace: openshift-gatekeeper-operator
              spec:
                channel: stable
                name: gatekeeper-operator
                source: gatekeeper-operator
                sourceNamespace: openshift-gatekeeper-operator
  - objectDefinition:
      apiVersion: policy.open-cluster-management.io/v1
      kind: ConfigurationPolicy
      metadata:
        name: gatekeeper
      spec:
        remediationAction: inform
        severity: high
        object-templates:
          - complianceType: musthave
            objectDefinition:
              apiVersion: operator.gatekeeper.sh/v1alpha1
              kind: Gatekeeper
              metadata:
                name: gatekeeper
              spec:
                audit:
                  logLevel: INFO
                  replicas: 1
                image:
                  image: 'docker.io/openpolicyagent/gatekeeper:v3.3.0'
                validatingWebhook: Enabled
                mutatingWebhook: Disabled
                webhook:
                  emitAdmissionEvents: Enabled
                  logLevel: INFO
                  replicas: 2
---
apiVersion: policy.open-cluster-management.io/v1
kind: PlacementBinding
metadata:
  name: binding-policy-gatekeeper-operator
placementRef:
  name: placement-policy-gatekeeper-operator
  kind: PlacementRule
  apiGroup: apps.open-cluster-management.io
subjects:
- name: policy-gatekeeper-operator
  kind: Policy
  apiGroup: policy.open-cluster-management.io
---
apiVersion: apps.open-cluster-management.io/v1
kind: PlacementRule
metadata:
  name: placement-policy-gatekeeper-operator
spec:
  clusterConditions:
  - status: "True"
    type: ManagedClusterConditionAvailable
  clusterSelector:
    matchExpressions:
      - {key: environment, operator: In, values: ["dev"]}

Appendix B: Example 2:

apiVersion: policy.open-cluster-management.io/v1
kind: Policy
metadata:
  name: policy-securitycontextconstraints
  annotations:
    policy.open-cluster-management.io/standards: NIST SP 800-53
    policy.open-cluster-management.io/categories: SC System and Communications Protection
    policy.open-cluster-management.io/controls: SC-4 Information In Shared Resources
spec:
  remediationAction: inform
  disabled: false
  policy-templates:
    - objectDefinition:
        apiVersion: policy.open-cluster-management.io/v1
        kind: ConfigurationPolicy
        metadata:
          name: policy-securitycontextconstraints-example
        spec:
          remediationAction: inform # the policy-template spec.remediationAction is overridden by the preceding parameter value for spec.remediationAction.
          severity: high
          namespaceSelector:
            exclude: ["kube-*"]
            include: ["default"]
          object-templates:
            - complianceType: musthave
              objectDefinition:
                apiVersion: security.openshift.io/v1
                kind: SecurityContextConstraints # restricted scc
                metadata:
                  annotations:
                    kubernetes.io/description: restricted denies access to all host features and requires pods to be run with a UID, and SELinux context that are allocated to the namespace.  This is the most restrictive SCC and it is used by default for authenticated users.
                  name: sample-restricted-scc
                allowHostDirVolumePlugin: false
                allowHostIPC: false
                allowHostNetwork: false
                allowHostPID: false
                allowHostPorts: false
                allowPrivilegeEscalation: true
                allowPrivilegedContainer: false
                allowedCapabilities: []
                defaultAddCapabilities: []
                fsGroup:
                  type: MustRunAs
                groups:
                - system:authenticated
                priority: 10
                readOnlyRootFilesystem: false
                requiredDropCapabilities:
                - KILL
                - MKNOD
                - SETUID
                - SETGID
                runAsUser:
                  type: MustRunAsRange
                seLinuxContext:
                  type: MustRunAs
                supplementalGroups:
                  type: RunAsAny
                users: []
                volumes:
                - configMap
                - downwardAPI
                - emptyDir
                - persistentVolumeClaim
                - projected
                - secret
---
apiVersion: policy.open-cluster-management.io/v1
kind: PlacementBinding
metadata:
  name: binding-policy-securitycontextconstraints
placementRef:
  name: placement-policy-securitycontextconstraints
  kind: PlacementRule
  apiGroup: apps.open-cluster-management.io
subjects:
- name: policy-securitycontextconstraints
  kind: Policy
  apiGroup: policy.open-cluster-management.io
---
apiVersion: apps.open-cluster-management.io/v1
kind: PlacementRule
metadata:
  name: placement-policy-securitycontextconstraints
spec:
  clusterConditions:
  - status: "True"
    type: ManagedClusterConditionAvailable
  clusterSelector:
    matchExpressions:
      - {key: environment, operator: In, values: ["dev"]}