If you have recently completed an initial Apigee hybrid installation (perhaps as a proof of concept, pilot, or evaluation) and are now ready to establish a more permanent one, this article is for you. Apigee hybrid lets you extend Apigee’s API management capabilities to your data center or preferred cloud, and it supports various Kubernetes distributions. This article provides guidance and best practices to help you move from a basic setup to one that is more robust, scalable, and secure.
Supported Kubernetes Distributions
Apigee hybrid is designed to run on several Kubernetes distributions, offering flexibility in deployment. This includes support for:
- AWS EKS
- Azure AKS
- OpenShift
- Other supported Kubernetes platforms.
The advice and best practices outlined in this article are applicable regardless of your chosen Kubernetes environment, ensuring you can harden your Apigee hybrid installation effectively.
Production Readiness: Key Considerations
Moving to a production environment requires careful planning and configuration to ensure stability, performance, and security. Here are key areas to address:
Monitoring and Alerting
Effective monitoring and alerting are crucial for maintaining the health and performance of your Apigee hybrid deployment.
- Infrastructure & Kubernetes Workloads: Monitor and set up alerts for telemetry data related to your Kubernetes nodes and workloads.
- API Traffic: Monitor Apigee API traffic and configure alerts for anomalies in latency, error codes, or request rates.
- Automated Issue Surfacing (AIS): AIS can detect known, common, system-detectable issues within your cluster, creating ApigeeIssue instances with information and links to documentation for faster resolution. You can dismiss and ignore specific ApigeeIssue instances if required.
Disaster Recovery Strategy
A well-defined disaster recovery (DR) plan is essential to minimize downtime and data loss.
- Define Metrics: Establish Recovery Time Objective (RTO) and Recovery Point Objective (RPO) based on your business impact analysis.
- Understand Dependencies: Identify all systems your API platform depends on and their respective SLOs/SLAs.
- Test DR Plan: Regularly validate your recovery scenarios.
Key DR concerns include:
- Infrastructure: Kubernetes Cluster, CertManager, Network Automation.
- Configuration: Infrastructure Configuration, Proxy Configuration.
- Data: Cassandra, TLS Certificates, Encryption Keys, HashiCorp Vault.
Cassandra Backups
Regular Cassandra backups are critical to prevent data loss.
- Regular Backups: Configure Cassandra backups in your overrides file, choosing a schedule that aligns with your RPO. Backups can be taken through the Kubernetes Container Storage Interface (CSI), or delivered to object storage or to a specified server over SSH. Configure retention time on your chosen storage.
- Restore from Backup: Cassandra restores are typically performed in a new, separate Kubernetes cluster and namespace. Configure the Cassandra restore job in your overrides file and check its logs for errors.
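One way to sanity-check that a backup schedule aligns with your RPO is to compare the worst-case gap between restorable backups against the RPO. The following is a minimal sketch, assuming backups run at a fixed interval and capture the state at the moment they start; the function and parameter names are hypothetical and not part of Apigee.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta,
                         backup_duration: timedelta = timedelta(0)) -> timedelta:
    """Upper bound on lost data if a failure hits just before the next backup completes.

    The last usable backup then reflects the state from one full interval
    plus one backup duration ago.
    """
    return backup_interval + backup_duration


def meets_rpo(backup_interval: timedelta,
              rpo: timedelta,
              backup_duration: timedelta = timedelta(0)) -> bool:
    """Return True if the worst-case data loss stays within the RPO."""
    return worst_case_data_loss(backup_interval, backup_duration) <= rpo


if __name__ == "__main__":
    # A daily backup cannot satisfy a 4-hour RPO.
    print(meets_rpo(timedelta(hours=24), rpo=timedelta(hours=4)))
    # A 3-hour schedule with 30-minute backups can.
    print(meets_rpo(timedelta(hours=3), rpo=timedelta(hours=4),
                    backup_duration=timedelta(minutes=30)))
```

A check like this only covers the schedule; regular restore drills are still needed to confirm that the backups are actually usable.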
Security Hardening: Essential Practices
Securing your Apigee hybrid installation involves multiple layers of defense.
Private Kubernetes Clusters
Deploying your Apigee hybrid runtime plane in a private Kubernetes cluster enhances security by restricting access to external services and networks. The following measures are all optional, depending on your desired security posture:
- Private container registry: Host your container images in a private registry; more on this below.
- Authorized networks for control plane access: Limit access to the Kubernetes control plane.
- Private access to external APIs: Configure your network and DNS to reach necessary APIs over a private network.
- Restricting VPCs and subnets with firewall rules: Implement strict firewall rules to control network traffic.
- Private DNS zones: Customize DNS resolution for external APIs to use private endpoints.
Networking
Careful network configuration is paramount for a secure and efficient Apigee hybrid deployment.
- VPC and Subnets: Assuming a Kubernetes platform in the cloud, create a VPC and subnets that the cluster will use, including ranges for nodes, services, and pods. Configure VPCs and subnets to enable private access to necessary services. The same concepts exist outside cloud environments and can be applied on-premises (e.g., OpenShift).
- Firewall rules: Implement rules to deny egress traffic by default and explicitly allow necessary ingress (e.g., for health checks) and egress to private API endpoints and control plane nodes.
- Routing to external APIs: Configure routes to direct traffic to private API endpoints.
- DNS Zone for external APIs: Customize DNS resolution for API domains to point to private endpoints.
Container Registry
- Private Container Registry: Copy Apigee hybrid runtime container images and Helm charts to a private container registry accessible from your Kubernetes cluster. If the private registry requires credentials, override imagePullSecrets.
Access Management
Implement robust access controls based on the principle of least privilege.
- Role Based Access Control (RBAC): Manage access using your cloud provider’s IAM system, leveraging predefined and custom roles.
- Service Accounts: Use separate Service Accounts for each component and avoid storing key files outside the cluster. Leverage Workload Identity Federation where available, to avoid the need for Service Account keys.
Cassandra Security
- Cassandra authentication credentials: Define custom passwords in your overrides file for Cassandra authentication. Do not use default values and ensure the cassandra.auth stanza is removed before committing overrides to SCM. Consider using a Git Hook to verify the absence of credentials.
- Cassandra data encryption: Each type of data persisted in Cassandra is encrypted at rest with separate encryption keys. Define custom keys in overrides, use different keys for each environment, and do not use defaults. Ensure all *EncryptionKey lines are removed before committing overrides to SCM. Kubernetes Secrets will be created for these values during installation.
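The Git hook idea above could look roughly like the following Python script. It is a sketch, not an official Apigee tool: it does a simple line-based scan (no YAML parser) for any *EncryptionKey line and for an auth key nested anywhere under a top-level cassandra key. The function names and the way the script is wired into your repository are assumptions.

```python
import re
import sys
from typing import List, Tuple

# Matches keys such as "fooEncryptionKey:" at any indentation.
ENCRYPTION_KEY = re.compile(r"^\s*\w*EncryptionKey\s*:")
# Matches a plain "key:" line and captures its indentation and name.
KEY_LINE = re.compile(r"^(\s*)([A-Za-z0-9_.-]+)\s*:")


def find_secret_lines(text: str) -> List[Tuple[int, str]]:
    """Return (line number, reason) pairs for lines that should not be committed."""
    findings = []
    in_cassandra = False
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # ignore blank lines and comments
        if ENCRYPTION_KEY.match(line):
            findings.append((number, "encryption key"))
            continue
        match = KEY_LINE.match(line)
        if not match:
            continue
        indent, key = match.group(1), match.group(2)
        if indent == "":
            # A new top-level key starts; remember whether it is "cassandra".
            in_cassandra = key == "cassandra"
        elif in_cassandra and key == "auth":
            findings.append((number, "cassandra.auth stanza"))
    return findings


def main(paths: List[str]) -> int:
    """Scan each file; return 1 if anything suspicious is found, else 0."""
    failed = False
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            for number, reason in find_secret_lines(handle.read()):
                print(f"{path}:{number}: {reason} must not be committed")
                failed = True
    return 1 if failed else 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

A pre-commit hook could call this script with the staged overrides files as arguments and abort the commit on a non-zero exit code. Because the scan is purely textual, treat it as a safety net rather than a guarantee.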
Ingress and TLS
- Ingress TLS configuration: Enforce minimum and maximum TLS versions at the ingress. Define a Kubernetes Secret in the Apigee namespace containing the private key and certificate for the ingress gateway.
- Ingress mTLS configuration: For mutual TLS, set tlsMode to “MUTUAL” in your virtualhosts stanza and define two Kubernetes Secrets: one for the key and certificate, and another for a truststore containing CA certificates.
- DDoS Protection & WAF: Consider using a Web Application Firewall (WAF) and DDoS mitigation service (e.g., Cloud Armor, AWS Shield) in front of your ingress to protect against common attacks and volumetric DDoS.
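After configuring TLS versions at the ingress, it can be useful to verify the result from the client side. The sketch below uses Python's standard ssl module to attempt a handshake pinned to exactly one TLS version. The function names are hypothetical, and the host you probe is your own; nothing here is Apigee-specific.

```python
import socket
import ssl
from typing import Optional


def make_probe_context(version: ssl.TLSVersion,
                       cafile: Optional[str] = None) -> ssl.SSLContext:
    """Build a client context that will only negotiate exactly `version`."""
    context = ssl.create_default_context(cafile=cafile)
    context.minimum_version = version
    context.maximum_version = version
    return context


def accepts_version(host: str, port: int, version: ssl.TLSVersion,
                    cafile: Optional[str] = None,
                    timeout: float = 5.0) -> bool:
    """Return True if the server completes a handshake at `version`.

    Note: any connection or certificate validation failure also yields
    False, so pass `cafile` when the ingress uses a private CA.
    """
    context = make_probe_context(version, cafile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host) as tls:
                return tls.version() is not None
    except OSError:  # ssl.SSLError is a subclass of OSError
        return False
```

For example, calling `accepts_version("api.example.com", 443, ssl.TLSVersion.TLSv1_2)` (with a placeholder hostname) should return True only if the ingress allows TLS 1.2. Probing versions below your configured minimum should return False.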
Analytics
- Data Obfuscation: Obfuscate sensitive user data for analytics before it leaves the runtime plane by enabling the features.analytics.data.obfuscation.enabled property per environment. Apigee uses SHA512 to hash original values.
- Setting the hash salt: Set axHashSalt in your overrides file to a custom value. Use the same salt across multiple clusters for consistent hashing.
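To see why the same salt matters across clusters, consider this small illustration of salted SHA-512 hashing. It is only a sketch: how Apigee actually combines the salt with the value is not described here, so prepending the salt is an assumption, and the obfuscate function is hypothetical.

```python
import hashlib


def obfuscate(value: str, salt: str) -> str:
    """Hash a value with SHA-512 after prepending a salt.

    Assumption: the salt is simply prepended; the real scheme may differ.
    """
    return hashlib.sha512((salt + value).encode("utf-8")).hexdigest()


if __name__ == "__main__":
    email = "user@example.com"
    cluster_a = obfuscate(email, salt="shared-salt")
    cluster_b = obfuscate(email, salt="shared-salt")
    cluster_c = obfuscate(email, salt="different-salt")
    # Same salt: the hashed values match, so analytics can correlate them.
    print(cluster_a == cluster_b)
    # Different salt: the same user appears as two unrelated values.
    print(cluster_a == cluster_c)
```

With a shared salt, the same original value produces the same hash in every cluster, which keeps analytics data comparable without exposing the original value.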
Audit Logging
- Cloud Audit Logs: Utilize cloud provider audit logs to record operations that modify resource configurations and metadata. Enable data access audit logs explicitly if required for auditing purposes or debugging permission issues.
Environment
- Data Masking: Configure data masking at the environment level to obscure sensitive data, such as Personally Identifiable Information (PII), within the Debug tool. Promote mask configurations across environments as appropriate.
Configuration Management
- Overrides file strategy: Do not edit any values.yaml or other files within Helm chart directories directly. Use overrides.yaml instead. For large, multi-cluster deployments, split overrides files to size environments differently and consider the order of application. Use consistent naming conventions.
- Override file versioning: Use Source Code Management (SCM) for managing override file versions. Use external secrets management to template secrets. Store Helm chart versions together with the overrides file. Use override file versions for rollbacks and staged rollouts to higher environments.
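The order of application matters because, when several values files are passed to Helm, later files take precedence over earlier ones. The following sketch mimics that layering with a recursive dictionary merge. It is a simplified illustration rather than Helm's actual implementation, and the keys and values shown are examples only.

```python
from copy import deepcopy


def merge_overrides(base: dict, override: dict) -> dict:
    """Layer `override` on top of `base`.

    Nested mappings are merged key by key; scalars and lists in
    `override` replace the corresponding entries in `base`.
    """
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_overrides(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


if __name__ == "__main__":
    # Illustrative content of a shared file and an environment-specific file.
    common = {"cassandra": {"replicaCount": 3, "storage": {"capacity": "100Gi"}}}
    prod = {"cassandra": {"replicaCount": 6}}

    # Applying common first and prod second: prod wins where both define a key.
    print(merge_overrides(common, prod))
```

Keeping shared settings in one file and environment-specific sizing in another works well as long as every cluster applies the files in the same, documented order.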
Production Considerations Checklist
This checklist provides a summary of key items to review before going live:
- IP Address Management (IPAM) Planning Completed
- Running a supported Runtime Cluster version
- Runtime Cluster configuration in SCM (Secrets removed)
- All static Apigee resources in SCM
- Cassandra Ring with a multiple of three nodes (3n, where n >= 1)
- Cassandra Backups Enabled
- Plan for regular backup restore drills
- All RBAC permissions reviewed
- Following Apigee Sizing recommendations
- Keep track of upcoming releases
Please note: this is not an exhaustive list of steps for a production go-live.
Can you think of something to add? What are you doing to get your clusters production-ready? Did you learn about something new here? Add your comments below and let’s discuss!
Happy Apigee hybrid-ing!