By Robert Baumgartner, Red Hat Austria, February 2023 (OpenShift 4.12)
In this blog, I will guide you through using service performance monitoring (SPM) with Jaeger.
It should work with any application that uses OpenTelemetry.
This document is based on OpenShift 4.12. See Distributed tracing release notes.
The OpenShift distributed tracing platform Operator is based on Jaeger 1.39.
The OpenShift distributed tracing data collection Operator is based on OpenTelemetry 0.63.1.
OpenTelemetry and Jaeger
OpenTelemetry is a collection of tools, APIs, and SDKs. Use it to instrument, generate, collect, and export telemetry data (metrics, logs, and traces) to help you analyze your software’s performance and behavior.
Jaeger is a tool to monitor and troubleshoot transactions in complex distributed systems.
The following diagram shows the flow between your application, OpenTelemetry, and Jaeger.
To keep the demo simple, I am using the AllInOne image from Jaeger. This installs the collector, query, and Jaeger UI in a single pod, using in-memory storage by default.
More details can be found in the Jaeger documentation.
Enabling Distributed Tracing
A cluster administrator has to enable the Distributed Tracing Platform and the Distributed Tracing Data Collection operators once.
As of OpenShift 4.12, this can be done easily by using the OperatorHub on the OpenShift console. See Installing the Red Hat OpenShift distributed tracing platform Operator.
In this demo, we do not install the OpenShift Elasticsearch Operator, because we use only in-memory tracing – no persistence.
Make sure you are logged in as cluster-admin!
After a short time, you can check that the operator pods were created and running and the CRDs were created:
$ oc get pod -n openshift-distributed-tracing|grep jaeger
jaeger-operator-bc65549bd-hch9v 1/1 Running 0 10d
$ oc get pod -n openshift-operators|grep opentelemetry
opentelemetry-operator-controller-manager-69f7f56598-nsr5h 2/2 Running 0 10d
$ oc get crd jaegers
NAME CREATED AT
jaegers.jaegertracing.io 2021-12-08T15:51:29Z
$ oc get crd opentelemetrycollectors
NAME CREATED AT
opentelemetrycollectors.opentelemetry.io 2021-12-15T07:57:38Z
Enabling Monitoring of Your Own Services in OpenShift 4.6+
Using the monitoring for user-defined projects in OpenShift saves us from deploying our own Prometheus instance.
In OpenShift 4.6+, monitoring for user-defined projects is GA. See Enabling monitoring for user-defined projects.
As of OpenShift 4.6, this is enabled by updating the cluster-monitoring-config ConfigMap in the openshift-monitoring project.
Make sure you are logged in as cluster-admin:
$ cat <<EOF | oc apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    enableUserWorkload: true
    thanosQuerier:
      enableRequestLogging: true
EOF
Check User Workload Monitoring
After a short time, you can check that the prometheus-user-workload pods were created and running:
$ oc get pod -n openshift-user-workload-monitoring
NAME READY STATUS RESTARTS AGE
prometheus-operator-7bcc9cc899-p8cbr 1/1 Running 0 10h
prometheus-user-workload-0 5/5 Running 1 10h
prometheus-user-workload-1 5/5 Running 1 10h
thanos-ruler-user-workload-0 3/3 Running 0 10h
thanos-ruler-user-workload-1 3/3 Running 0 10h
Create a New Project
Create a new project (for example jaeger-demo) and give a normal user (such as a developer) admin rights to the project:
$ oc new-project jaeger-demo
Now using project "jaeger-demo" on server "https://api.yourserver:6443".
You can add applications to this project with the 'new-app' command. For example, try:
oc new-app rails-postgresql-example
to build a new example application in Ruby. Or use kubectl to deploy a simple Kubernetes application:
kubectl create deployment hello-node --image=k8s.gcr.io/serve_hostname
$ oc policy add-role-to-user admin developer -n jaeger-demo
clusterrole.rbac.authorization.k8s.io/admin added: "developer"
Login as the Normal User
$ oc login -u developer
Authentication required for https://api.yourserver:6443 (openshift)
Username: developer
Password:
Login successful.
You have one project on this server: "jaeger-demo"
Using project "jaeger-demo".
Create Jaeger
Create a simple Jaeger instance with the name my-jaeger and add some environment variables to the deployment.
$ cat <<EOF | oc apply -f -
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: my-jaeger
spec:
  allInOne:
    # image: jaegertracing/all-in-one:1.31
    options:
      # log-level: debug
      # query:
      #   base-path: /jaeger
      prometheus:
        server-url: "https://thanos-querier.openshift-monitoring.svc:9092"
        tls.ca: "/var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt"
        # tls.key: "/var/run/secrets/kubernetes.io/serviceaccount/token"
        tls.enabled: "true"
    metricsStorage:
      type: prometheus
  strategy: allinone
EOF
jaeger.jaegertracing.io/my-jaeger created
When the Jaeger instance is up and running, you can check the service and route:
$ oc get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
my-jaeger-agent ClusterIP None <none> 5775/UDP,5778/TCP,6831/UDP,6832/UDP 3s
my-jaeger-collector ClusterIP 172.30.112.217 <none> 9411/TCP,14250/TCP,14267/TCP,14268/TCP,4317/TCP,4318/TCP 3s
my-jaeger-collector-headless ClusterIP None <none> 9411/TCP,14250/TCP,14267/TCP,14268/TCP,4317/TCP,4318/TCP 3s
my-jaeger-query ClusterIP 172.30.33.207 <none> 443/TCP,16685/TCP 3s
$ oc get route my-jaeger -o jsonpath='{.spec.host}'
my-jaeger-jaeger-demo.apps.rbaumgar.demo.net
Open a new browser window, go to the route URL, and log in with your OpenShift user (developer).
Create OpenTelemetry Collector
Create a ConfigMap and an OpenTelemetry Collector instance with the name my-otelcol.
The ConfigMap is needed because the Jaeger collector service requires encryption; its certificate is issued as a TLS web server certificate by the OpenShift service CA. More details can be found at Understanding service serving certificates.
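The creation of this ConfigMap is not shown in this post. As a minimal sketch, assuming the OpenShift service-CA injection annotation and the name my-otelcol-cabundle that is deleted again in the cleanup section, it could look like this:
$ cat <<EOF | oc apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-otelcol-cabundle
  annotations:
    # the service-ca operator injects the CA bundle into the key service-ca.crt
    service.beta.openshift.io/inject-cabundle: "true"
EOF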
$ cat <<EOF |oc apply -f -
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: my-otelcol
spec:
  config: |
    receivers:
      jaeger:
        protocols:
          thrift_http:
            endpoint: "0.0.0.0:14278"
      otlp:
        protocols:
          grpc:
          http:
      # Dummy receiver that's never used, because a pipeline is required to have one.
      otlp/spanmetrics:
        protocols:
          grpc:
            endpoint: "localhost:65535"
    exporters:
      prometheus:
        endpoint: "0.0.0.0:8889"
      jaeger:
        endpoint: my-jaeger-collector-headless.jaeger-demo.svc:14250
        tls:
          ca_file: "/var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt"
    processors:
      batch:
      spanmetrics:
        metrics_exporter: prometheus
    service:
      pipelines:
        traces:
          receivers: [otlp, jaeger]
          processors: [spanmetrics, batch]
          exporters: [jaeger]
        # The exporter name in this pipeline must match the spanmetrics.metrics_exporter name.
        # The receiver is just a dummy and never used; added to pass validation requiring at least one receiver in a pipeline.
        metrics/spanmetrics:
          receivers: [otlp/spanmetrics]
          exporters: [prometheus]
  mode: deployment
  ports:
  - name: metrics
    port: 8889
    protocol: TCP
    targetPort: 8889
  ingress: {}
  image: 'otel/opentelemetry-collector-contrib:latest'
EOF
opentelemetrycollector.opentelemetry.io/my-otelcol created
When the OpenTelemetryCollector instance is up and running, you can check the log:
$ oc logs deployment/my-otelcol-collector
2023-02-01T06:50:31.740Z info service/telemetry.go:110 Setting up own telemetry...
2023-02-01T06:50:31.740Z info service/telemetry.go:140 Serving Prometheus metrics {"address": ":8888", "level": "basic"}
2023-02-01T06:50:31.741Z info service/service.go:89 Starting otelcol... {"Version": "0.63.1", "NumCPU": 4}
2023-02-01T06:50:31.741Z info extensions/extensions.go:42 Starting extensions...
2023-02-01T06:50:31.741Z info pipelines/pipelines.go:74 Starting exporters...
2023-02-01T06:50:31.741Z info pipelines/pipelines.go:78 Exporter is starting... {"kind": "exporter", "data_type": "traces", "name": "jaeger"}
2023-02-01T06:50:31.743Z info [email protected]/exporter.go:185 State of the connection with the Jaeger Collector backend {"kind": "exporter", "data_type": "traces", "name": "jaeger", "state": "IDLE"}
2023-02-01T06:50:31.743Z info pipelines/pipelines.go:82 Exporter started. {"kind": "exporter", "data_type": "traces", "name": "jaeger"}
2023-02-01T06:50:31.743Z info pipelines/pipelines.go:86 Starting processors...
2023-02-01T06:50:31.743Z info pipelines/pipelines.go:98 Starting receivers...
2023-02-01T06:50:31.743Z info pipelines/pipelines.go:102 Receiver is starting... {"kind": "receiver", "name": "otlp", "pipeline": "traces"}
2023-02-01T06:50:31.743Z info otlpreceiver/otlp.go:71 Starting GRPC server {"kind": "receiver", "name": "otlp", "pipeline": "traces", "endpoint": "0.0.0.0:4317"}
2023-02-01T06:50:31.743Z info otlpreceiver/otlp.go:89 Starting HTTP server {"kind": "receiver", "name": "otlp", "pipeline": "traces", "endpoint": "0.0.0.0:4318"}
2023-02-01T06:50:31.743Z info pipelines/pipelines.go:106 Receiver started. {"kind": "receiver", "name": "otlp", "pipeline": "traces"}
2023-02-01T06:50:31.743Z info service/service.go:106 Everything is ready. Begin running and processing data.
2023-02-01T06:50:32.743Z info [email protected]/exporter.go:185 State of the connection with the Jaeger Collector backend {"kind": "exporter", "data_type": "traces", "name": "jaeger", "state": "READY"}
The most important line is the last one (“State of the connection…”), which shows that the collector is connected to the Jaeger instance. If this is not the case, you have to update the exporters.jaeger.endpoint value in the spec.config of your OpenTelemetry Collector instance. It should be <jaeger-name>-collector-headless.<namespace>.svc:14250 (in this demo: my-jaeger-collector-headless.jaeger-demo.svc:14250).
This can be done by:
$ oc edit opentelemetrycollector my-otelcol
You can check that the Prometheus exporter is available on port 8889.
$ oc get svc -l app.kubernetes.io/name=my-otelcol-collector
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
my-otelcol-collector ClusterIP 172.30.131.109 <none> 8889/TCP,14278/TCP,4317/TCP,4318/TCP,55681/TCP,65535/TCP 4d2h
my-otelcol-collector-headless ClusterIP None <none> 8889/TCP,14278/TCP,4317/TCP,4318/TCP,55681/TCP,65535/TCP 4d2h
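If you want to verify that the collector really exposes metrics there, you can port-forward to the service and inspect the endpoint (an optional check, not part of the original walkthrough; the span metrics only appear once traces are flowing):
$ oc port-forward svc/my-otelcol-collector 8889:8889 &
$ curl -s http://localhost:8889/metrics | head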
Setting up Metrics Collection
To use the metrics exposed by OpenTelemetry Collector, you need to configure OpenShift Monitoring to scrape metrics from the /metrics endpoint. You can do this using a ServiceMonitor, a custom resource definition (CRD) that specifies how a service should be monitored, or a PodMonitor, a CRD that specifies how a pod should be monitored. The former requires a Service object, while the latter does not, allowing Prometheus to directly scrape metrics from the metrics endpoint exposed by a pod.
$ cat <<EOF | oc apply -f -
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: otelcol-monitor
  name: otelcol-monitor
spec:
  endpoints:
  - interval: 30s
    path: /metrics
    port: metrics
    scheme: http
  selector:
    matchLabels:
      app.kubernetes.io/name: my-otelcol-collector
EOF
servicemonitor.monitoring.coreos.com/otelcol-monitor created
$ oc get servicemonitor
NAME AGE
otelcol-monitor 42s
⭐ The matchLabels must match the labels defined on the metrics Service of the OpenTelemetry Collector!
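If you prefer the PodMonitor approach mentioned above, a minimal sketch could look like the following; note that the selector labels and the port name here are assumptions and must match the labels and the named container port of your collector pods:
$ cat <<EOF | oc apply -f -
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: otelcol-pod-monitor
spec:
  podMetricsEndpoints:
  - interval: 30s
    path: /metrics
    port: metrics
  selector:
    matchLabels:
      app.kubernetes.io/name: my-otelcol-collector
EOF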
Sample Application
You can use any application that uses OpenTelemetry. You just have to make sure that your application sends its OpenTelemetry data to http://my-otelcol-collector:4317!
If you have no application, continue and deploy the sample application. Otherwise, skip to the next chapter.
Deploy a Sample Application
All modern application development frameworks (like Quarkus) support OpenTelemetry; see Quarkus – Using OpenTelemetry.
To keep this document simple, I am using an existing example. The application is based on GitHub – rbaumgar/otelcol-demo-app: Quarkus demo app to show OpenTelemetry with Jaeger.
Deploy the sample application otelcol-demo-app and expose it as a route:
$ cat <<EOF |oc apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: otelcol-demo-app
  name: otelcol-demo-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: otelcol-demo-app
  template:
    metadata:
      labels:
        app: otelcol-demo-app
    spec:
      containers:
      - name: otelcol-demo-app
        image: quay.io/rbaumgar/otelcol-demo-app-jvm
        env:
        - name: OTELCOL_SERVER
          value: 'http://my-otelcol-collector:4317'
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: otelcol-demo-app
  name: otelcol-demo-app
spec:
  ports:
  - port: 8080
    protocol: TCP
    targetPort: 8080
    name: web
  selector:
    app: otelcol-demo-app
  type: ClusterIP
---
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  labels:
    app: otelcol-demo-app
  name: otelcol-demo-app
spec:
  path: /
  to:
    kind: Service
    name: otelcol-demo-app
  port:
    targetPort: web
  tls:
    termination: edge
EOF
deployment.apps/otelcol-demo-app created
service/otelcol-demo-app created
route.route.openshift.io/otelcol-demo-app exposed
$ oc set env deployment/otelcol-demo-app \
SERVICE_NAME=https://`oc get route otelcol-demo-app -o jsonpath='{.spec.host}'`
deployment.apps/otelcol-demo-app updated
The environment variable OTELCOL_SERVER points the application to the right OpenTelemetry Collector (http://my-otelcol-collector:4317).
Test Sample Application
Check the route URL with /hello and see the hello message with the pod name. Do this multiple times.
$ export URL=$(oc get route otelcol-demo-app -o jsonpath='{.spec.host}')
$ curl $URL/hello
hello
$ curl $URL/sayHello/demo1
hello: demo1
$ curl $URL/sayRemote/demo2
hello: demo2 from http://otelcol-demo-app-jaeger-demo.apps.rbaumgar.demo.net/
...
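To generate enough traffic for meaningful latency metrics, you can run a small loop (a convenience I added for this demo, not part of the sample app):
$ for i in $(seq 1 50); do curl -s $URL/sayHello/demo$i > /dev/null; sleep 1; done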
Go to the Jaeger URL.
Reload by pressing F5.
Under Service, select my-service.
Click Find Traces.
⭐ The service name is specified in the application.properties (quarkus.application.name) of the demo app.
⭐ The URL of the collector is specified in the application.properties (quarkus.opentelemetry.tracer.exporter.otlp.endpoint=http://my-otelcol-collector:4317).
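For reference, a minimal application.properties for the demo app could therefore look like this (a sketch based on the two notes above; check the GitHub repository for the authoritative version):
quarkus.application.name=my-service
quarkus.opentelemetry.tracer.exporter.otlp.endpoint=http://my-otelcol-collector:4317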
Open one trace entry and expand it to get all the details.
Done!
If you want more details on how OpenTelemetry is used in Quarkus, go to the GitHub example at GitHub – rbaumgar/otelcol-demo-app: Quarkus demo app to show OpenTelemetry with Jaeger.
Accessing the Metrics of Your Service
Once you have enabled monitoring of your own services, deployed a service, and set up metrics collection for it, you can access the metrics of the service as a cluster administrator, as a developer, or as a user with view permissions for the project.
To access the metrics as a developer or a user with permissions, go to the OpenShift Container Platform web console, switch to the Developer perspective, and click Observe → Metrics.
⭐ Developers can only use the Developer Perspective. They can only query metrics from a single project.
Use the “Custom query” (PromQL) interface to run queries for your services.
Enter “latency” and press Enter; you will get a list of all available metric names.
The following metric names will be created:
latency (type: histogram): a histogram of span latencies. Under the hood, a Prometheus histogram creates a number of time series:
- latency_count: The total number of data points across all buckets in the histogram.
- latency_sum: The sum of all data point values.
- latency_bucket: A collection of n time series (where n is the number of latency buckets), one per latency bucket, identified by the le (less than or equal to) label. For each span, the latency_bucket counter with the lowest le such that le >= span latency is incremented.
Here is an example
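For instance, assuming the spanmetrics processor exposes the service name as the service_name label, the 95th-percentile latency of the demo service over the last five minutes could be queried like this (my illustration, not from the original post):
histogram_quantile(0.95, sum(rate(latency_bucket{service_name="my-service"}[5m])) by (le))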
You can also use the Thanos Querier to display the application metrics. The Thanos Querier enables aggregating and, optionally, deduplicating cluster and user workload metrics under a single, multi-tenant interface.
Thanos Querier can be reached at: https://thanos-querier-openshift-monitoring.apps.your.cluster/graph
If you are just interested in exposing application metrics to the dashboard, you can stop here.
Workaround
When you want to use the Thanos Querier of the OpenShift user workload monitoring, you have to make some additional changes to get access to the Prometheus data of your project:
- You have to present an authentication bearer token to get access.
- You need to add the target namespace as a “namespace=” query parameter. The bearer token has to have at least view rights on that namespace.
The new SPM feature of Jaeger does not provide parameters to set those requirements.
Two feature enhancement requests have been created:
spm additional query parameter required for prometheus
spm bearer token required for prometheus
So I decided to create an nginx reverse proxy that adds both.
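The nginx manifests (nginx/configmap.yaml, nginx/deployment.yaml, nginx/service.yaml) are referenced below. As a rough sketch of what the proxy configuration needs to do (my reconstruction of the essential parts, not the exact file), it listens on port 9092, appends the namespace as a query parameter, and adds the bearer token before forwarding requests to the Thanos Querier; the <BEARER> and <NAMESPACE> placeholders are replaced by the sed commands below:
server {
    listen 9092;
    location / {
        # append the project namespace and authenticate with the bearer token
        set $args "${args}&namespace=<NAMESPACE>";
        proxy_set_header Authorization "Bearer <BEARER>";
        proxy_pass https://thanos-querier.openshift-monitoring.svc.cluster.local:9092;
    }
}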
Deploy the nginx:
$ NAMESPACE=`oc project -q`
$ BEARER=`oc whoami -t`
$ sed "s/<BEARER>/$BEARER/g" nginx/configmap.yaml | sed "s/<NAMESPACE>/$NAMESPACE/g" | oc apply -f -
configmap/nginx created
$ oc apply -f nginx/deployment.yaml
deployment.apps/nginx created
$ oc apply -f nginx/service.yaml
service/nginx created
Check that the nginx pod is up and running:
$ oc get pod -l app=nginx
NAME READY STATUS RESTARTS AGE
nginx-76545df445-fj48l 1/1 Running 0 82m
Finally, we configure Jaeger to use the proxy as its Prometheus endpoint:
$ oc patch jaeger/my-jaeger --type='json' -p='[{"op": "replace","path": "/spec/allInOne/options/prometheus/server-url", "value": "http://nginx:9092"}]'
jaeger.jaegertracing.io/my-jaeger patched
Show Service Performance Monitoring (SPM)
If you have not yet created some test data, now is the right time.
Go to your Jaeger UI, select the Monitor tab, and select the right service (my-service).
Done!
The only thing you have to keep in mind is that this feature is currently under development!
But what’s coming is very interesting!
Remove this Demo
$ oc delete deployment,svc,cm nginx
$ oc delete servicemonitor otelcol-monitor
$ oc delete deployment,svc,route otelcol-demo-app
$ oc delete opentelemetrycollector my-otelcol
$ oc delete jaeger my-jaeger
$ oc delete cm my-otelcol-cabundle
$ oc delete project jaeger-demo