Service Mesh for Developers

February 6, 2023

The Fallacies of Distributed Computing

Many years ago we moved away from monolithic applications and started to develop our apps as networks of smaller services: the microservice architecture. This brought many advantages, and we finally had fun again developing large business applications. But we were also introduced to the Fallacies of Distributed Computing and had to find ways to make our network of microservices secure, resilient, and highly observable in this new environment.

From Microservice Pattern to a Service Mesh

Patterns such as the circuit breaker and retries (more on these later) were developed to compensate for the effects of network errors, timeouts, non-responsive services, and so on. These first came in the form of open source libraries like Netflix Hystrix and the Feign client, and soon became very convenient for us developers to use via the Spring Framework.

Each microservice application now carried a lot of additional libraries in its belly. This made the apps bloated, and central configuration changes for all apps at once were not easy to achieve. So a better solution was needed, and it was found in the Sidecar pattern. All communication logic (to enhance observability, security and resiliency) was moved to the sidecar, a small application that sits next to the actual application and intercepts the incoming and outgoing traffic. The Sidecar pattern offloads this functionality from application code to the Service Mesh.

As a bonus, the Sidecar also works for apps in other programming languages such as Python, Node.js and Go, for which the aforementioned Java libraries are not available.

And as we’ve bundled all the features for Observability, Security and Resiliency in a Sidecar, we can use a central place, the Control Plane, to manage and configure our sidecars.

There are many implementations of a Service Mesh. We use the OpenShift Service Mesh, based on the Open Source upstream projects Istio and Envoy. Envoy is a high-performance proxy written in C++ and Istio adds the Control Plane functionality.

The sample apps

To illustrate some of the features of a Service Mesh, I created 3 sample apps, called Service A, B and C.

  • Service A: Python app which calls Service B
  • Service B: TypeScript/Deno app which calls Service C
  • Service C: Java app with endpoints to simulate failures

You can find these on GitHub: https://github.com/nikolaus-lemberski/servicemesh-for-developers

To play with the apps, just run the podman-compose file (‘podman-compose up --build’ – also works with Docker Compose) and call service-a on localhost:3000 to see the call hierarchy. service-c (port 3002) has endpoints to activate (‘/crash’) and deactivate (‘/repair’) the error mode. You should see a response like this:

Call to service-a on localhost
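
If you want to play through the whole local round trip, it could look like this (ports as described above; what exactly the error responses look like depends on the apps):

podman-compose up --build       # or: docker-compose up --build
curl localhost:3000             # service-a -> service-b -> service-c
curl localhost:3002/crash       # put service-c into error mode
curl localhost:3000             # the call chain now reports the simulated failure
curl localhost:3002/repair      # back to normal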

Then let’s move forward to Kubernetes / OpenShift. If you don’t have access to an OpenShift cluster, just use OpenShift Local. You can also run the apps with other Kubernetes distributions, for example Minikube or Kind on your local machine. However, not all Service Mesh functionality may be available and Ingress may behave differently.

Using the Service Mesh

Service Mesh Installation

We’re using OpenShift Service Mesh based on Istio and Envoy. The examples work as well on Vanilla Kubernetes with upstream Istio. Just follow the installation guide for your Kubernetes distribution:

  • OpenShift Service Mesh installation via Operators: see Docs and install in order:
    • OpenShift Elasticsearch (Namespace: openshift-operators-redhat)
    • Red Hat OpenShift distributed tracing platform (Namespace: openshift-distributed-tracing)
    • Kiali (Namespace: openshift-operators)
    • Red Hat OpenShift Service Mesh (Namespace: openshift-operators)
  • OR: Istio and Envoy installation: see Docs

Hint: To follow this tutorial you don’t need to install the Istio CLI. You only need ‘oc’, the OpenShift CLI – if you’re using another Kubernetes distribution, everything should work with ‘kubectl’ as well.

Now create a namespace for Istio and a namespace for the apps:

oc new-project istio-system
oc new-project servicemesh-apps

Hint: If you’re using ‘kubectl’, create namespaces instead of projects.

To use Istio, we install the Control Plane and create a MemberRoll, telling Istio that we want it to work with the servicemesh-apps project:

oc create -f https://raw.githubusercontent.com/nikolaus-lemberski/servicemesh-for-developers/main/kubernetes/controlplane.yml
oc create -f https://raw.githubusercontent.com/nikolaus-lemberski/servicemesh-for-developers/main/kubernetes/memberroll.yml
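
For reference, the ServiceMeshMemberRoll we just applied looks roughly like this (a sketch – the exact memberroll.yml in the repo may differ):

apiVersion: maistra.io/v1
kind: ServiceMeshMemberRoll
metadata:
  name: default
  namespace: istio-system
spec:
  members:
  - servicemesh-apps    # the projects/namespaces that join the mesh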

Wait until all pods in the istio-system namespace are running (‘oc get pods -n istio-system -w’), then you’re ready to install our sample apps.

Sample apps deployment

oc project servicemesh-apps

oc create -f https://raw.githubusercontent.com/nikolaus-lemberski/servicemesh-for-developers/main/kubernetes/a-deploy.yml
oc create -f https://raw.githubusercontent.com/nikolaus-lemberski/servicemesh-for-developers/main/kubernetes/b-deploy.yml
oc create -f https://raw.githubusercontent.com/nikolaus-lemberski/servicemesh-for-developers/main/kubernetes/c-v1-deploy.yml
oc create -f https://raw.githubusercontent.com/nikolaus-lemberski/servicemesh-for-developers/main/kubernetes/c-v2-deploy.yml

Check the pods (‘oc get pod’): all pods should be running and you should see “2/2” in the READY column. Why 2? Each pod runs 2 containers – one for the app and one for the Envoy sidecar.

The pods with an app container and a sidecar container
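
Why is the sidecar injected at all? With OpenShift Service Mesh, the Deployment’s pod template carries an annotation that opts the pod into injection (upstream Istio can also use the istio-injection namespace label). A minimal sketch – names and image are placeholders, the manifests in the repo may differ:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: service-a
spec:
  selector:
    matchLabels:
      app: service-a
  template:
    metadata:
      labels:
        app: service-a
      annotations:
        sidecar.istio.io/inject: "true"   # ask the mesh to inject the Envoy sidecar
    spec:
      containers:
      - name: service-a
        image: <your-image>               # placeholder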

Gateway for Ingress

Now we create a Gateway and expose our service-a.
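
The Gateway binds to Istio’s default ingress gateway and accepts HTTP traffic; a VirtualService then routes the /service-a path to our app. A sketch of what the Gateway part of gateway.yml roughly looks like (the name is an assumption, the file in the repo may differ):

apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: servicemesh-gateway
spec:
  selector:
    istio: ingressgateway     # bind to Istio's default ingress gateway pods
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "*"                     # accept traffic for any host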

oc create -f https://raw.githubusercontent.com/nikolaus-lemberski/servicemesh-for-developers/main/kubernetes/gateway.yml
ROUTE=`oc get route istio-ingressgateway -n istio-system -o jsonpath='{.spec.host}'`

curl $ROUTE/service-a
Calling service-a on Kubernetes cluster

If the services respond correctly, continue. As you can see, 2 versions of Service C are deployed with traffic split 50/50 (round robin – the Kubernetes default).

We haven’t configured any Service Mesh rules yet, but as we have our sidecars in place we can already check our observability features, e.g. the Jaeger traces. You’ll find the Jaeger URL in your OpenShift Console: Networking -> Routes, select the istio-system namespace and click on the Jaeger link.

Canary Releases

Traffic shaping allows us to release new software versions as “Canary releases” to avoid the risk of a Big Bang / All at Once approach. This is the first use case we’ll have a look at.

What is a Canary Release?

With a Canary Release you deploy the new version of your app to production, but you keep the former version and send only a small set of users to the new version. If the new version performs well and as expected, you send more traffic to the new version. Once 100% of the traffic goes to the new version, you can scale down and remove the former version.

There are lots of options for how to shift the traffic, for example by user group, location and so on. Here we just use a simple approach and define the percentage of traffic for each version.
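
For illustration, routing a specific user group to the new version could look roughly like this, using the subsets we define in the next section (the x-user-group header and its value are assumptions, not part of the sample repo):

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: service-c
spec:
  hosts:
  - service-c
  http:
  - match:
    - headers:
        x-user-group:
          exact: beta          # requests from the "beta" user group...
    route:
    - destination:
        host: service-c
        subset: version-v2     # ...go to the new version
  - route:
    - destination:
        host: service-c
        subset: version-v1     # everyone else stays on v1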

Apply the Canary Release

We already have two versions of service-c deployed. At the moment the traffic goes 50/50, the default “round robin” behavior of service routing in Kubernetes.

With a Service Mesh, we can fine tune this behavior. First we inform the Service Mesh about our two versions, using a DestinationRule:

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: service-c
spec:
  host: service-c
  subsets:
  - name: version-v1
    labels:
      version: v1
  - name: version-v2
    labels:
      version: v2

Now apply the destination rule:

oc create -f https://raw.githubusercontent.com/nikolaus-lemberski/servicemesh-for-developers/main/kubernetes/destination-rules.yml

Let’s start to shift the traffic. Open 2 terminals.

Terminal 1

ROUTE=`oc get route istio-ingressgateway -n istio-system -o jsonpath='{.spec.host}'`
while true; do curl $ROUTE/service-a; sleep 0.5; done

Terminal 2

While applying steps 1-4, check Kiali and Jaeger. Here you get great observability without any libraries or coding (*). You can open Jaeger and Kiali from the OpenShift Console (Networking -> Routes).

1. 100% traffic goes to our “old” version 1

The Virtual Service, sending 100% of the traffic to version-v1:

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: service-c
spec:
  hosts:
  - service-c
  http:
  - route:
    - destination:
        host: service-c
        subset: version-v1
      weight: 100

Create the virtual service:

oc create -f https://raw.githubusercontent.com/nikolaus-lemberski/servicemesh-for-developers/main/kubernetes/canary/1-vs-v1.yml

2. We start the canary release by sending 10% of traffic to version 2

The Virtual Service with a traffic split of 90 / 10:

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: service-c
spec:
  hosts:
  - service-c
  http:
  - route:
    - destination:
        host: service-c
        subset: version-v1
      weight: 90
    - destination:
        host: service-c
        subset: version-v2
      weight: 10

Replace the virtual service:

oc replace -f https://raw.githubusercontent.com/nikolaus-lemberski/servicemesh-for-developers/main/kubernetes/canary/2-vs-v1_and_v2_90_10.yml

If you check Kiali (in your OpenShift Console go to Networking -> Routes, select the istio-system namespace and click on the Kiali route), you can see a nice chart with your services and the traffic distribution:

3. We are happy with version 2 and increase the traffic to 50%

The Virtual Service with a traffic split of 50 / 50:

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: service-c
spec:
  hosts:
  - service-c
  http:
  - route:
    - destination:
        host: service-c
        subset: version-v1
      weight: 50
    - destination:
        host: service-c
        subset: version-v2
      weight: 50

Replace the virtual service:

oc replace -f https://raw.githubusercontent.com/nikolaus-lemberski/servicemesh-for-developers/main/kubernetes/canary/3-vs-v1_and_v2_50_50.yml

4. Finally we send 100% of the traffic to version 2

The Virtual Service sending 100% of the traffic to our new version 2:

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: service-c
spec:
  hosts:
  - service-c
  http:
  - route:
    - destination:
        host: service-c
        subset: version-v2
      weight: 100

Replace the virtual service:

oc replace -f https://raw.githubusercontent.com/nikolaus-lemberski/servicemesh-for-developers/main/kubernetes/canary/4-vs-v2.yml

Please check again your Kiali chart and see how Istio adjusts the traffic distribution.

(*) The Envoy sidecar automatically injects tracing headers and sends traffic metadata to Kiali and Jaeger. For distributed tracing to work across services, you must propagate the tracing headers in your application code when calling other services. See Istio Header Propagation.
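
A minimal sketch of what header propagation can look like, here in Python since Service A is a Python app (this is not the exact code of the sample app; the helper name and the usage comment are illustrative):

# Headers Istio/Envoy uses for trace context (B3 propagation).
TRACE_HEADERS = [
    "x-request-id",
    "x-b3-traceid",
    "x-b3-spanid",
    "x-b3-parentspanid",
    "x-b3-sampled",
    "x-b3-flags",
    "x-ot-span-context",
]

def tracing_headers(incoming_headers):
    """Copy the tracing headers from the incoming request, if present."""
    return {h: incoming_headers[h] for h in TRACE_HEADERS if h in incoming_headers}

# Usage when calling the next service, e.g. with the requests library:
#   requests.get(url_of_service_b, headers=tracing_headers(request.headers))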

Circuit Breaker and Retry

Circuit Breaker and Retry are resiliency patterns. A circuit breaker blocks traffic to a slow or non-performing service, so the app can (hopefully) recover. This prevents cascading failures, a common scenario when, for example, thread pools fill up while all requests wait for an unresponsive service.

A circuit breaker reduces the number of errors that are propagated to the end user and prevents cascading failures. With retry policies we can eliminate almost all of the remaining errors: if an error occurs or the service call is too slow, the retry policy tries the call again; the retry is routed to another (healthy) app instance and the request can be processed successfully.

Apply the Circuit Breaker and Retry

Keep Terminal 1 with the curl loop running or open a new one.

Terminal 1

ROUTE=`oc get route istio-ingressgateway -n istio-system -o jsonpath='{.spec.host}'`
while true; do curl $ROUTE/service-a; sleep 0.5; done

Terminal 2

In Terminal 2, let’s reset the VirtualService from our former canary release, scale service-c-v1 down to zero replicas and scale service-c-v2 up to 2 replicas.

oc replace -f https://raw.githubusercontent.com/nikolaus-lemberski/servicemesh-for-developers/main/kubernetes/circuit-breaker/1-vs.yml
oc scale deploy/service-c-v1 --replicas 0
oc scale deploy/service-c-v2 --replicas 2

Terminal 3

Now connect to service-c and let it crash. In a separate terminal, run:

oc get pod
POD_NAME=....
oc port-forward pod/$POD_NAME 8080:8080

For POD_NAME choose one of the service-c pods. With the port-forwarding you can call the service-c app on localhost:8080.

Leave the port-forwarding in Terminal 3 open, go back to Terminal 2 and let one app instance of service-c crash:

curl localhost:8080/crash

Go to your Terminal 1 with the curl loop and see what happens with the service responses.

Service response errors after activating the “crash” mode of Service C

Now apply the Circuit Breaker (and check what happens), then the Retry policy. The Circuit Breaker is configured in the DestinationRule:

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: service-c
spec:
  host: service-c
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        http1MaxPendingRequests: 100  # limit queued HTTP requests per endpoint
    outlierDetection:
      consecutive5xxErrors: 1         # eject an instance after a single 5xx
      interval: 2s                    # how often instances are checked
      baseEjectionTime: 10s           # how long an ejected instance stays out
      maxEjectionPercent: 100         # all instances may be ejected if needed

Terminal 2

oc replace -f https://raw.githubusercontent.com/nikolaus-lemberski/servicemesh-for-developers/main/kubernetes/circuit-breaker/2-destination-rules.yml
The Circuit Breaker opens (see line 3 of the output) and checks again after ~10 seconds.

Better, but still some errors. The Circuit Breaker removes the crashed app instance from the pool of healthy endpoints, but after the configured baseEjectionTime it checks whether the crashed instance has become healthy again. So the number of errors propagated to the end user is reduced drastically, but some errors are still thrown. Let’s fix that.

Terminal 2

With the Circuit Breaker in place, we add a retry policy.

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: service-c
spec:
  hosts:
  - service-c
  http:
  - route:
    - destination:
        host: service-c
    retries:
      attempts: 3
      perTryTimeout: 1s
      retryOn: 5xx

Apply the retry configuration:

oc replace -f https://raw.githubusercontent.com/nikolaus-lemberski/servicemesh-for-developers/main/kubernetes/circuit-breaker/3-vs-retry.yml

Go to your Terminal 1 with the curl loop and see what happens with the service responses. All errors are gone.

Terminal 2

Finally, repair the crashed service:

curl localhost:8080/repair

After ~10 seconds the repaired pod gets traffic again (the Circuit Breaker goes from open to closed).

The crashed app instance is healthy again and gets traffic – Circuit Breaker is closed.

Congratulations, you made it!

Thank you very much for taking the time to follow my “Service Mesh for Developers” introduction. I hope you are as excited as I am about the possibilities a Service Mesh gives us.

In this article I picked only a small subset of the possibilities the Service Mesh offers. There’s much more to discover, like traffic mirroring to test new app versions, introducing errors for Chaos Engineering, and more. The power of Service Mesh makes a lot possible, and to quote Stan Lee: “With great power comes great responsibility”. So it’s better to start small and gain experience before you take every feature into production.
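
As a teaser, both traffic mirroring and fault injection are just a few lines in a VirtualService. A sketch, not part of the sample repo:

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: service-c
spec:
  hosts:
  - service-c
  http:
  - route:
    - destination:
        host: service-c
        subset: version-v1
    mirror:                    # traffic mirroring: copy live traffic to v2
      host: service-c
      subset: version-v2
    mirrorPercentage:
      value: 10                # mirror 10% of the requests
    fault:                     # chaos engineering: inject an artificial delay
      delay:
        percentage:
          value: 5             # for 5% of the requests
        fixedDelay: 3s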

Running a large Service Mesh with many apps is another challenge – you can learn more about that under the topic of Service Mesh Federation.