How to deploy Cilium in ENI mode for Apigee Hybrid on AWS EKS

Overview

Large enterprises operating at scale have established a set of leading practices for running mission-critical workloads on Kubernetes clusters. These aren’t just good ideas; they’re essential for maintaining security, reliability, and cost-efficiency. Practices like using Kubernetes Namespaces for logical separation, implementing Role-Based Access Control (RBAC) for granular permissions, and defining Kubernetes Network Policies to control Kubernetes Pod communication are all foundational to a secure and well-managed cluster.

Another key enterprise requirement is a robust Container Network Interface (CNI). While many cloud providers offer a default CNI, these defaults often lack the advanced features needed for sophisticated network security and observability. This is where a CNI like Cilium comes in. By leveraging eBPF (extended Berkeley Packet Filter), Cilium provides superior performance, deep network visibility, and identity-based security policies that are far more dynamic and scalable than traditional IP-based firewalls. It allows enterprises to implement a Zero Trust architecture, where policies are based on a Kubernetes Pod’s identity (like its service account or labels) rather than its volatile IP address.
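
For illustration, below is a minimal sketch of such an identity-based policy; the demo namespace and the frontend/backend labels are hypothetical:

kubectl apply -f - <<'EOF'
# A CiliumNetworkPolicy that selects Kubernetes Pods by label (identity), not by IP:
# only Pods labeled app=frontend may reach Pods labeled app=backend.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: demo
spec:
  endpointSelector:
    matchLabels:
      app: backend
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend
EOF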

To achieve this, Cilium offers two primary networking modes: Overlay Mode and Direct Routing Mode (often called ENI mode on AWS). In Overlay Mode, Kubernetes Pods receive IP addresses from a virtual network that is decoupled from the underlying VPC network. This is highly efficient for IP address allocation but requires network address translation (NAT) for traffic leaving the cluster, which can obscure communication. Conversely, ENI Mode is a direct routing configuration where Kubernetes Pods are assigned IP addresses directly from the AWS VPC. This makes them first-class citizens of the VPC network, ensuring they are directly routable and reachable from other VPC services, including the AWS EKS control plane.
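
If you are unsure which mode an existing Cilium installation is using, you can inspect the agent configuration (this assumes the Cilium CLI is installed; key names vary slightly across Cilium versions):

# Show the active IPAM and routing settings (e.g., ipam: eni for ENI mode):
cilium config view | grep -E 'ipam|tunnel|routing'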

However, the choice of networking mode is critical when integrating specific applications like Apigee Hybrid. The conflict arises when Cilium is configured in its default Overlay Mode on AWS EKS. Because Kubernetes Pods in this mode have non-VPC-routable IP addresses, the EKS kube-apiserver (which resides outside the cluster’s node network) cannot directly communicate with essential components like admission webhooks. This communication failure is the direct cause of the Address is not allowed error encountered during the Apigee installation. The solution is to configure Cilium to use its ENI mode. By assigning Kubernetes Pods routable IP addresses from the VPC, this mode ensures seamless connectivity with the kube-apiserver.
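
A quick way to observe this on a running cluster is to compare Node IPs with Pod IPs; in Overlay Mode, most Pod IPs fall outside the VPC CIDR:

# Node IPs are drawn from the VPC; in Overlay Mode, Pod IPs come from a separate, non-VPC CIDR:
kubectl get nodes -o wide
kubectl get pods -A -o wide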

This article will walk you through the precise steps to configure an EKS cluster with Cilium in ENI mode, resolving the webhook issue and enabling a successful Apigee Hybrid deployment.

Customer Scenario

  • The cluster management team at an enterprise customer attempted to deploy the Apigee Hybrid Runtime Plane on their AWS EKS cluster for POC (Proof of Concept) purposes.
  • The AWS EKS cluster in question was configured with the Cilium CNI.
  • Assume the customer had performed the Apigee Hybrid install steps up to the point of deploying the Helm Charts here.
  • While deploying the apigee-datastore Helm Chart, they encountered the following error:

Error: release datastore failed, and has been uninstalled due to atomic being set: failed to create resource: Internal error occurred: failed calling webhook "mapigeedatastore.apigee.cloud.google.com": failed to call webhook: Post "https://apigee-webhook-service.apigee-poc.svc:443/mutate-apigee-cloud-google-com-v1alpha1-apigeedatastore?timeout=10s": Address is not allowed

  • The error suggested that some Kubernetes webhooks were not reachable by the kube-apiserver; you can list the webhook configurations involved as shown below.
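
To see which admission webhooks are involved, you can list the webhook configurations registered in the cluster (the apigee filter below assumes Apigee's naming convention):

kubectl get mutatingwebhookconfigurations,validatingwebhookconfigurations | grep -i apigee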

Root Cause Analysis - Potential Issue

  • It is a known issue that AWS EKS clusters configured with Cilium in overlay mode prevent the kube-apiserver from reaching admission webhooks.
    • This also causes Address is not allowed, failed to call webhook, and failed to verify certificate errors when attempting to deploy Kubernetes resources that rely on webhook validations. The Apigee-specific Helm Charts leverage Kubernetes webhooks to perform validations (such as certificate validation) and mutations, so because of this known issue, the provisioning of Apigee Hybrid Runtime Plane Kubernetes resources fails.
    • Cilium in overlay mode configures Kubernetes Pods with non-VPC-routable IPs. Although this allows running more Kubernetes Pods per Kubernetes Worker Node, it limits a Pod’s network connectivity in that traffic to resources outside the Kubernetes Cluster is masqueraded, as explained in Cilium’s official documentation; you can verify this on your cluster as shown below.
  • There is an open GitHub Issue related to this known issue.
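
To verify the masquerading behavior described above on your own cluster, one quick check is to inspect Cilium's ConfigMap (key names vary across Cilium versions):

kubectl -n kube-system get configmap cilium-config -o yaml | grep -i masquerade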

Workaround

  • Apigee Hybrid assumes a VPC-native networking/routing setup is in use. To overcome the unreachability of the Kubernetes webhooks from the kube-apiserver, we propose configuring AWS EKS with Cilium in ENI mode (VPC-native routing), so that Cilium operates in a VPC-native networking manner.
  • Configuring AWS EKS with Cilium in ENI mode is the most straightforward way to mitigate the issue; the steps follow in the next section.

Configuring AWS EKS with Cilium in ENI Mode

Treat the steps below as a reference and exercise due diligence, especially in the network/security areas. These steps should be performed at the beginning of Part 2, Step 1 - “Create a Cluster” of the Apigee installation guide described here.

Step 0: Install and configure eksctl

Step 1: Create and configure the AWS EKS cluster with the right add-ons

Create a file named eks-config.yaml, as shown below, and create the AWS EKS cluster using the following command:

eksctl create cluster -f eks-config.yaml

The installation will leave the coredns and kube-proxy Kubernetes Pods in a Pending state; this is expected at this point, because workloads depend on Cilium being up before they can become healthy.
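
You can confirm this expected intermediate state as follows; coredns and kube-proxy should remain Pending until Cilium is installed in Step 5:

kubectl -n kube-system get pods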

Configure the following:

  • iam.withOIDC, i.e., enable the IAM OIDC provider (required for IRSA).
  • The vpc-cni plugin, so as to have a clean installation of Cilium.
  • The eks-pod-identity-agent plugin, to enable EKS Pod Identity-based authentication to AWS.
  • The aws-ebs-csi-driver plugin, to enable working with PVCs and StorageClasses.

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: <cluster-name>
  region: <cluster-region>
  version: "1.31"
iam:
  withOIDC: true
addons:
  - name: vpc-cni
    resolveConflicts: overwrite
  - name: coredns
  - name: kube-proxy
  - name: eks-pod-identity-agent
  - name: aws-ebs-csi-driver
managedNodeGroups:
  - name: apigee-runtime
    instanceType: m5.xlarge
    desiredCapacity: 2
    minSize: 2
    maxSize: 5
    privateNetworking: true
    labels:
      apigee.com/node-pool: runtime
    taints:
      - key: node.cilium.io/agent-not-ready
        value: "true"
        effect: NoExecute
  - name: apigee-data
    instanceType: m5.xlarge
    desiredCapacity: 1
    minSize: 1
    maxSize: 1
    privateNetworking: true
    labels:
      apigee.com/node-pool: data
    taints:
      - key: node.cilium.io/agent-not-ready
        value: "true"
        effect: NoExecute
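
Once cluster creation completes, a quick sanity check (using the same placeholders as above) confirms the add-ons and worker nodes are in place:

# List the managed add-ons installed on the cluster:
aws eks list-addons --cluster-name <cluster-name> --region <cluster-region>
# Confirm the worker nodes have registered (they carry the agent-not-ready taint until Cilium is up):
kubectl get nodes -o wide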

Step 2: Create an IAM Policy

In the AWS IAM console, create an IAM policy named CiliumOperatorPolicy:

Version: 2012-10-17
Statement:
  - Effect: Allow
    Action:
      - ec2:CreateNetworkInterface
      - ec2:AttachNetworkInterface
      - ec2:DeleteNetworkInterface
      - ec2:DetachNetworkInterface
      - ec2:DescribeNetworkInterfaces
      - ec2:ModifyNetworkInterfaceAttribute
      - ec2:DescribeInstances
      - ec2:DescribeSubnets
      - ec2:DescribeSecurityGroups
      - ec2:DescribeInstanceTypes
      - ec2:DescribeVpcs
      - ec2:AssignPrivateIpAddresses
      - ec2:CreateTags
      - ec2:DescribeTags
    Resource: "*"
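
Since IAM expects the policy document in JSON, an equivalent CLI-based creation might look like the following sketch (the file name cilium-operator-policy.json is arbitrary):

cat > cilium-operator-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:CreateNetworkInterface",
        "ec2:AttachNetworkInterface",
        "ec2:DeleteNetworkInterface",
        "ec2:DetachNetworkInterface",
        "ec2:DescribeNetworkInterfaces",
        "ec2:ModifyNetworkInterfaceAttribute",
        "ec2:DescribeInstances",
        "ec2:DescribeSubnets",
        "ec2:DescribeSecurityGroups",
        "ec2:DescribeInstanceTypes",
        "ec2:DescribeVpcs",
        "ec2:AssignPrivateIpAddresses",
        "ec2:CreateTags",
        "ec2:DescribeTags"
      ],
      "Resource": "*"
    }
  ]
}
EOF
aws iam create-policy --policy-name CiliumOperatorPolicy --policy-document file://cilium-operator-policy.json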

Step 3: Create an IAM Role

In the AWS IAM console, create an IAM role named CiliumOperatorRole.

Reference the output (excluding https://) of the aws eks describe-cluster --name <cluster-name> --query "cluster.identity.oidc.issuer" --output text command for the OIDC provider details in the schema below.

The schema belongs to the configuration of CiliumOperatorRole’s Trust Relationships.

Version: 2012-10-17
Statement:
  - Effect: Allow
    Principal:
      Federated: arn:aws:iam::<ACCOUNT_ID>:oidc-provider/<OIDC_PROVIDER_ID>
    Action: sts:AssumeRoleWithWebIdentity
    Condition:
      StringEquals:
        <OIDC_PROVIDER_ID>:sub: system:serviceaccount:kube-system:cilium-operator
        <OIDC_PROVIDER_ID>:aud: sts.amazonaws.com
  - Effect: Allow
    Principal:
      Service: pods.eks.amazonaws.com
    Action:
      - sts:AssumeRole
      - sts:TagSession
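
If you prefer the CLI here as well, a sketch of the equivalent commands follows (cilium-operator-trust.json is assumed to be a JSON rendering of the trust relationship schema above):

# Look up the OIDC provider ID to substitute into the trust policy placeholders:
aws eks describe-cluster --name <cluster-name> --query "cluster.identity.oidc.issuer" --output text
# Create the role from the trust policy, then attach the policy from Step 2:
aws iam create-role --role-name CiliumOperatorRole --assume-role-policy-document file://cilium-operator-trust.json
aws iam attach-role-policy --role-name CiliumOperatorRole --policy-arn arn:aws:iam::<ACCOUNT_ID>:policy/CiliumOperatorPolicy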

Step 4: Create cilium-config.yaml

Create cilium-config.yaml to define the configuration for Cilium’s ENI mode-based installation.

cluster:
  name: <cluster-name>
aws:
  enabled: true
  region: <cluster-region>
ipam:
  mode: eni
eni:
  enabled: true
  iamRole: arn:aws:iam::<ACCOUNT_ID>:role/CiliumOperatorRole
operator:
  extraEnv:
    - name: AWS_REGION
      value: <cluster-region>
    - name: AWS_DEFAULT_REGION
      value: <cluster-region>

Step 5: Install Cilium

cilium install --version 1.17.4 --values cilium-config.yaml

Wait for Cilium to come up:

cilium status --wait
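
As a sanity check that ENI mode is active, Pod IPs should now be drawn from the same VPC subnets as the node ENIs, and the previously Pending coredns Pods should get scheduled:

# Pod IPs should now fall within the VPC CIDR, alongside the Node IPs:
kubectl get nodes -o wide
kubectl -n kube-system get pods -o wide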

Step 6: Proceed with download of Apigee Hybrid Helm Charts

At this point, you can proceed with downloading the Helm Charts locally and follow the rest of the steps. Note that when deploying the Helm Charts, you must ensure that the labels for the apigee-runtime and apigee-data Kubernetes Node Pools are correctly referenced via the nodeSelector configuration in the overrides.yaml file for Apigee Hybrid; a sketch follows below. You can then re-attempt installing the Helm Charts here, which previously failed, and proceed with the Apigee Hybrid installation steps to completion.
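
As an illustration, a nodeSelector stanza in overrides.yaml matching the node labels defined in Step 1 might look like the sketch below; verify the exact schema against the Apigee Hybrid documentation for your version:

nodeSelector:
  requiredForScheduling: true
  apigeeRuntime:
    key: "apigee.com/node-pool"
    value: "runtime"
  apigeeData:
    key: "apigee.com/node-pool"
    value: "data"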

Authors

  • Ayo Salawu, TSC Apigee, NorthAm
  • Anmol Sachdeva, Hybrid Cloud Architect, Google Cloud Consulting
