Istio Egress Filtering Deep Dive

Context

To control connectivity between pods in our Kubernetes cluster, we're using Istio as a service mesh. In practice, this means that every pod in the cluster has an Envoy proxy attached to it, capturing traffic in both directions: ingress and egress.

By default, every pod can reach any other pod within the mesh. However, we want to authorize traffic only to legitimate applications and forbid everything else.

It may sound simple: after all, an Istio resource exists for exactly that. But we faced some caveats during the deployment. This article deep dives into the egress filtering implementation and demonstrates what exactly is modified in the Envoy configuration when you manipulate Istio resources.

The Istio Control Plane

The brain of our service mesh is istiod, aka the Istio control plane.

This piece of software is responsible for:

  • gathering the configuration described in Istio resources (VirtualServices, ServiceEntries, DestinationRules...) as well as in Kubernetes itself (Services, Endpoints...)
  • building the corresponding Envoy configuration
  • pushing that configuration to every Envoy proxy in the mesh.

On this page I usually use the term “istio proxy” instead of “envoy proxy”, but they are strictly equivalent: Envoy is the actual name of the proxy software used by Istio.

Envoy terminology

The following is a simplified view of Envoy.

Excerpt from the official Envoy documentation:

Listener: A listener is a named network location (e.g., port, unix domain socket, etc.) that can be connected to by downstream clients. Envoy exposes one or more listeners that downstream hosts connect to.

Cluster: A cluster is a group of logically similar upstream hosts that Envoy connects to. Envoy discovers the members of a cluster via service discovery.

I use these terms in the diagrams below to specify which logical block the traffic enters, depending on its direction (ingress or egress).
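
To make these two notions more concrete, here is a minimal, hand-written Envoy configuration (purely illustrative, not something generated by Istio: the listener name, cluster name and addresses are made up) with a single listener forwarding TCP traffic to a single cluster:

static_resources:
  listeners:
  - name: example_listener
    # the named network location downstream clients connect to
    address:
      socket_address:
        address: 0.0.0.0
        port_value: 15001
    filter_chains:
    - filters:
      - name: envoy.filters.network.tcp_proxy
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
          stat_prefix: example
          # accepted connections are forwarded to this upstream cluster
          cluster: example_cluster
  clusters:
  - name: example_cluster
    # a group of logically similar upstream hosts, here discovered via DNS
    type: STRICT_DNS
    load_assignment:
      cluster_name: example_cluster
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: application-1.namespace.svc.cluster.local
                port_value: 8080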

Simple Istio mesh config

Before jumping into the details of egress filtering, let's have an overview of the Envoy configuration when your pods are deployed with an Istio proxy but without any specific configuration.

Here, we have an application-0 which exposes a service on HTTP port 8080 and has 3 dependencies: application-1 and application-2, located inside the mesh, and bigtable, which is located outside of the mesh.

This results in the following proxy config:

  • No specific inbound listener
  • Far more than 2 outbound clusters: every service in the mesh is added to the outbound configuration

This means that every topology change triggers a configuration update on every proxy in the mesh, even if your application is not concerned. Envoy then consumes more CPU and memory to process this complete mesh topology.
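
To give an idea of what this looks like, here is a simplified sketch of the clusters pushed to application-0's proxy in that situation (assuming, for the sake of the example, that every application listens on port 8080; the names follow Istio's outbound|<port>|<subset>|<host> convention, but this is not a literal config dump):

# Simplified view of the clusters known by application-0's proxy
clusters:
- outbound|8080||application-1.namespace.svc.cluster.local
- outbound|8080||application-2.namespace.svc.cluster.local
- outbound|8080||application-3.namespace.svc.cluster.local   # not even a dependency of application-0
# ... one outbound cluster per service in the whole mesh ...
- PassthroughCluster                                          # catch-all for destinations unknown to the mesh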

Adding mesh internal dependencies

To avoid this overhead and, more importantly for the subject of this article, to qualify our workload dependencies, we usually leverage the Sidecar resource, which allows us to fine-tune the ingress and egress traffic of our workload.

Let's start with only our mesh-internal dependencies: application-1 and application-2.

To do that, we define in the Sidecar resource:

  • 1 ingress port
  • 2 egress hosts

apiVersion: networking.istio.io/v1beta1
kind: Sidecar
metadata:
  name: "release-name"
spec:
  workloadSelector:
    labels:
      app: "release-name"
  outboundTrafficPolicy:
    mode: "ALLOW_ANY"
  ingress:
  - port:
      number:  8080
      protocol: TCP
      name: http
    defaultEndpoint: "127.0.0.1:8080"
  egress:
  - hosts:
    - namespace/application-1
    - namespace/application-2

This results in the following proxy config:

  • 1 inbound listener: this is mainly used to get correct metrics. However, from this point on, if you get the port number wrong, you will break your inbound traffic.
  • 2 outbound clusters: all other services in the mesh are now ignored in terms of outbound configuration, but they're still reachable (see the sketch below).
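
Under the same assumptions as the earlier sketch, the cluster list pushed to application-0's proxy now boils down to something like:

# Simplified view of the clusters known by application-0's proxy with the Sidecar in place
clusters:
- outbound|8080||application-1.namespace.svc.cluster.local
- outbound|8080||application-2.namespace.svc.cluster.local
- PassthroughCluster   # still present because outboundTrafficPolicy is ALLOW_ANY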

The first benefit is a lower footprint: the configuration is now reduced to only a couple of dependencies. We're almost ready to leverage the filtering, but right now we're still using the ALLOW_ANY policy, so nothing is really filtered out yet. Still, it may have an impact anyway.

Harmless, really?

Indeed, application-3 is not a dependency declared in the Sidecar resource, so the internal outbound cluster for application-3 doesn't exist anymore.

If application-0 must reach application-3 (a forgotten dependency, for example), the traffic is still allowed, as mentioned before, thanks to outboundTrafficPolicy being set to ALLOW_ANY.

However, it may break if application-3 has a specific cluster configuration (VirtualService, DestinationRule...). All traffic forwarded through the PassthroughCluster is handled without taking the destination's VirtualService and DestinationRule configuration into consideration. So take extra precautions to properly identify your dependencies before adding the Sidecar resource.
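
As an illustration (this DestinationRule is hypothetical, not part of the setup described in this article), suppose the owners of application-3 rely on the following resource so that client proxies originate Istio mutual TLS. Traffic leaving application-0 through the PassthroughCluster is forwarded as-is, without this client-side policy, which can be enough to break the connection:

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: application-3
  namespace: namespace
spec:
  host: application-3.namespace.svc.cluster.local
  trafficPolicy:
    tls:
      # client proxies that know this cluster originate Istio mutual TLS;
      # traffic going through the PassthroughCluster does not get this treatment
      mode: ISTIO_MUTUAL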

Regarding external services like bigtable, nothing changes: all forwarded traffic still goes through the PassthroughCluster. We must deal with this before actually enabling the filtering.

Adding mesh external dependencies

Now that we've defined all the internal dependencies, let's add the external ones. Open your Sidecar resource in your favorite editor and add bigtable:

apiVersion: networking.istio.io/v1beta1
kind: Sidecar
metadata:
  name: "release-name"
spec:
  workloadSelector:
    labels:
      app: "release-name"
  outboundTrafficPolicy:
    mode: "ALLOW_ANY"
  ingress:
  - port:
      number:  8080
      protocol: TCP
      name: http
    defaultEndpoint: "127.0.0.1:8080"
  egress:
  - hosts:
    - namespace/application-1
    - namespace/application-2
    - ./bigtable.googleapis.com

Then you create a ServiceEntry resource which specifies the external host:

apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: "release-name"
  namespace: istio-system
spec:
  exportTo:
  - '.'
  hosts:
  - bigtable.googleapis.com
  location: MESH_EXTERNAL
  ports:
  - name: https
    number: 443
    protocol: HTTPS
  resolution: DNS

This results in the following proxy config:

3 outbound clusters. The third one (corresponding to bigtable.googleapis.com) finally shows up as soon as you create the ServiceEntry resource.

Why not before? Because Istio relies on its service registry to build its cluster configuration. All Kubernetes services are populated in the Istio registry by default, but bigtable is not part of the mesh: that's why we create a ServiceEntry resource, which adds an entry to the Istio service registry.

Now you can activate the REGISTRY_ONLY option, which only allows traffic towards the clusters known to the proxy and forbids anything that used to go through the PassthroughCluster.
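
As a final sketch, assuming the same workload and the same ServiceEntry as above, the only change in the Sidecar resource is the outboundTrafficPolicy mode; any destination that does not match one of the proxy's clusters is now rejected instead of being passed through:

apiVersion: networking.istio.io/v1beta1
kind: Sidecar
metadata:
  name: "release-name"
spec:
  workloadSelector:
    labels:
      app: "release-name"
  outboundTrafficPolicy:
    # was ALLOW_ANY: traffic to unknown destinations is no longer passed through
    mode: "REGISTRY_ONLY"
  ingress:
  - port:
      number: 8080
      protocol: TCP
      name: http
    defaultEndpoint: "127.0.0.1:8080"
  egress:
  - hosts:
    - namespace/application-1
    - namespace/application-2
    - ./bigtable.googleapis.com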