OpenTelemetry and GKE

**Hi @ms4446**

Yesterday you told me we could do this, so I wanted to know how, since I’m using GKE. It may be simpler than going through the Ops Agent.

2. Cloud Operations for GKE:

  • If you are using GKE, you can leverage Cloud Operations for GKE, which integrates OpenTelemetry natively.
  • It automatically collects traces from your applications deployed on GKE and sends them to Google Cloud Trace without needing to configure the OpenTelemetry Collector manually.

Thank you

Hi @Navirash ,

To use Cloud Operations for GKE for OpenTelemetry tracing on Google Kubernetes Engine (GKE), follow these steps:

  1. Default Observability Features: By default, GKE clusters (both Standard and Autopilot) are configured to send system logs, audit logs, and application logs to Cloud Logging, and system metrics to Cloud Monitoring. They also use Google Cloud Managed Service for Prometheus to collect configured third-party and user-defined metrics and send them to Cloud Monitoring.

  2. Customize and Enhance Data Collection: You have control over which logs and metrics are sent from your GKE cluster to Cloud Logging and Cloud Monitoring. You can also decide whether to enable Google Cloud Managed Service for Prometheus. For GKE Autopilot clusters, the integration with Cloud Monitoring and Cloud Logging cannot be disabled.

  3. Additional Observability Metrics: You can enable additional observability metrics packages for more detailed monitoring. This includes control plane metrics for monitoring the health of Kubernetes components and kube state metrics for monitoring Kubernetes objects like deployments, nodes, and pods.

  4. Third-Party and User-Defined Metrics: To monitor third-party applications running on your clusters (like Postgres, MongoDB, Redis), use Prometheus exporters with Google Cloud Managed Service for Prometheus. You can also write custom exporters to monitor other signals of health and performance.

  5. Use Collected Data: Utilize the data collected for analyzing application health, debugging, troubleshooting, and testing. GKE provides built-in observability features like customizable dashboards, key cluster metrics, and the ability to create your own dashboards or import Grafana dashboards.

  6. Other Features: GKE integrates with other Google Cloud services for additional monitoring and management capabilities, such as security posture dashboards, insights and recommendations for cluster optimization, and network policy logging.

For detailed configuration instructions and more information, you can refer to the Google Cloud documentation on Observability for GKE.

Ok, thank you @ms4446. Is it possible to do all these steps with a config file like YAML?

Yes, it is possible to configure many aspects of observability in GKE using YAML configuration files. YAML files are commonly used in Kubernetes and GKE for defining, configuring, and managing resources.

For more detailed and specific configurations, you can visit the All GKE code samples page.
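For example, if you rely on Google Cloud Managed Service for Prometheus, scraping an application’s metrics can be declared with a PodMonitoring resource. Here is a minimal sketch; the names, labels, namespace, and port are placeholders for your own application:

apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: example-app-monitoring
  namespace: default
spec:
  selector:
    matchLabels:
      app: example-app          # must match your pods' labels
  endpoints:
    - port: metrics             # named container port exposing /metrics
      interval: 30s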

Thank you, but I don’t understand how to activate OpenTelemetry tracing and export traces to Google Cloud Trace. I didn’t find a specific example.

Do you have a specific example, please?

To activate tracing with OpenTelemetry and export traces to Google Cloud Trace in a GKE environment, you can use the following configuration:

Enable OpenTelemetry in Strimzi:

  1. Add the following configuration to your Strimzi deployment:

    tracing:
      type: opentelemetry
    
    
  2. Configure the OpenTelemetry Collector: a. Deploy an OpenTelemetry Collector in your GKE cluster. b. Apply the following configuration:

    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: "0.0.0.0:55680"   # legacy OTLP gRPC port; 4317 is the current default
    exporters:
      googlecloud:
        project: "YOUR_PROJECT_ID"
    service:
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [googlecloud]

    This configuration sets up the Collector to receive OTLP trace data and export it to Google Cloud Trace.

  3. Apply and Restart: a. Apply the OpenTelemetry Collector configuration. b. Restart the OpenTelemetry Collector to apply the changes.

  4. Permissions and Network Configuration: a. Ensure that the OpenTelemetry Collector has the necessary permissions to send data to Google Cloud Trace. b. Verify that your network configuration allows communication between Strimzi, the Collector, and Google Cloud Trace.

After completing these steps, your Strimzi deployment will emit OpenTelemetry traces, which the OpenTelemetry Collector will collect and export to Google Cloud Trace. You can then view and analyze these traces in the Google Cloud Console.
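For reference, here is a rough sketch of what step 1 can look like on a Strimzi KafkaConnect resource, based on Strimzi’s OpenTelemetry tracing support; the names and the Collector address are placeholders, and the OTLP endpoint must match wherever your Collector’s receiver listens (port 55680 in the example above):

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnect
metadata:
  name: my-connect-cluster               # placeholder name
spec:
  # ... bootstrapServers, replicas, and your other existing settings ...
  tracing:
    type: opentelemetry                   # tells Strimzi to emit OpenTelemetry traces
  template:
    connectContainer:
      env:
        - name: OTEL_SERVICE_NAME
          value: my-connect-cluster
        - name: OTEL_EXPORTER_OTLP_ENDPOINT
          value: "http://otel-collector.monitoring.svc.cluster.local:55680"   # the Collector's OTLP receiver

Strimzi exposes this tracing block on components such as Kafka Connect, Kafka Bridge, and MirrorMaker; check the Strimzi documentation for the exact resources and container templates that apply to your setup.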


For this approach, you do not need the Ops Agent. The OpenTelemetry Collector alone is sufficient for collecting and exporting traces from Strimzi to Google Cloud Trace.

For step 2 of the OpenTelemetry Collector configuration, you can start with the provided YAML configuration. However, be aware that additional adjustments may be necessary depending on your specific setup. For example, if your Strimzi deployment sends traces over a different protocol or port, you will need to modify the receivers section accordingly.

The connection between your Strimzi configuration and the OpenTelemetry Collector is established through the OTLP protocol. Ensure that Strimzi is configured to send OTLP trace data to the correct endpoint where the OpenTelemetry Collector is listening. This means matching the IP and port in the Strimzi configuration with the endpoint specified in the OpenTelemetry Collector’s receivers section.

Here’s a summary of the steps:

  1. Enable OpenTelemetry in your Strimzi deployment: Configure Strimzi to emit OpenTelemetry traces.

  2. Deploy and configure the OpenTelemetry Collector: Use the provided YAML as a base, but be prepared to make adjustments based on your environment’s specifics, such as network settings and trace volume.

  3. Ensure Proper Network Configuration and Permissions: Make sure the OpenTelemetry Collector has the necessary permissions to access Google Cloud Trace. Also, configure network policies and firewall rules within your GKE cluster to allow communication between Strimzi and the OpenTelemetry Collector.

  4. Monitor and Scale as Needed: Keep an eye on the resource usage and performance of the OpenTelemetry Collector, especially if dealing with high volumes of traces. Scale the Collector if necessary to handle the load.

After completing these steps, Strimzi will emit OpenTelemetry traces, which the OpenTelemetry Collector will then collect and export to Google Cloud Trace. You can view and analyze these traces in the Google Cloud Console.
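If you use Kubernetes NetworkPolicies in your cluster, step 3 could look roughly like the sketch below; the namespaces, labels, and port are assumptions to replace with your own:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-otlp-from-strimzi
  namespace: monitoring                    # namespace where the Collector runs
spec:
  podSelector:
    matchLabels:
      app: otel-collector                  # must match the Collector pods' labels
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kafka   # namespace where Strimzi runs
      ports:
        - protocol: TCP
          port: 55680                      # the Collector's OTLP gRPC port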

Thank you very much for your explanation @ms4446 :grinning_face_with_smiling_eyes:

Hi @ms4446
I implemented the OpenTelemetry Collector solution. I receive the traces from my Strimzi deployment in my OpenTelemetry Collector, but unfortunately I don’t see these traces in the Google Trace explorer.

When I look at the logs of my OpenTelemetry Collector, I see a permission problem, even though my service account has the cloudtrace.agent role.

Do you have any suggestions?

Hi @Navirash ,

If you’re encountering permission issues with your OpenTelemetry Collector despite the service account having the cloudtrace.agent role, here are some suggestions to troubleshoot and resolve the issue:

  1. Verify Service Account Permissions:

    • Double-check that the service account used by the OpenTelemetry Collector indeed has the cloudtrace.agent role. This role should allow the account to write trace data to Google Cloud Trace.
    • Ensure that the service account is correctly associated with the OpenTelemetry Collector. If the Collector is running in a Kubernetes environment, this typically involves setting up a Kubernetes secret with the service account key and mounting it in the Collector’s pod.
  2. Check for IAM Policy Propagation Delay:

    • Sometimes, there can be a delay in IAM policy changes taking effect. If you’ve recently added the cloudtrace.agent role to the service account, wait a few minutes and then retry.
  3. Review OpenTelemetry Collector Logs:

    • Examine the logs of the OpenTelemetry Collector more closely to identify any specific error messages related to the permission issue. This can provide clues about what might be going wrong.
  4. Validate Service Account Key:

    • Ensure that the service account key file used by the OpenTelemetry Collector is valid and has not expired. If necessary, create a new key file in the Google Cloud Console and update the Kubernetes secret accordingly.
  5. Network Configuration:

    • Although this seems like a permission issue, it’s also worth checking that there are no network configuration issues preventing the OpenTelemetry Collector from reaching Google Cloud Trace.
  6. Google Cloud Trace API Enabled:

    • Make sure that the Google Cloud Trace API is enabled in your Google Cloud project.
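For point 1, if your cluster uses Workload Identity rather than a mounted key file, the Collector’s Kubernetes service account is typically linked to a Google service account through an annotation. A rough sketch with placeholder names:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: otel-collector              # Kubernetes service account used by the Collector pods
  namespace: monitoring
  annotations:
    # Google service account that holds the roles/cloudtrace.agent role
    iam.gke.io/gcp-service-account: otel-collector@YOUR_PROJECT_ID.iam.gserviceaccount.com

The Google service account also needs an IAM binding (roles/iam.workloadIdentityUser) that allows this Kubernetes service account to impersonate it.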

Thanks for your help @ms4446. I just restarted the deployment and it works.
But now I have this error: failed to export to Google Cloud Trace: context deadline exceeded.

Do you know this error?

The error “failed to export to Google Cloud Trace: context deadline exceeded” typically indicates a timeout issue. This error occurs when the OpenTelemetry Collector is unable to send trace data to Google Cloud Trace within a specified time frame. Here are some steps to troubleshoot and resolve this issue:

  1. Network Latency or Connectivity Issues:

    • Check for any network latency or connectivity issues between the OpenTelemetry Collector and Google Cloud Trace. This could be due to network congestion, firewall rules, or other network-related configurations that might be blocking or slowing down the connection.
  2. Increase Timeout Settings:

    • If network latency is an issue, consider increasing the timeout settings in the OpenTelemetry Collector’s configuration. This gives more time for the Collector to send data to Google Cloud Trace before timing out.
  3. Review Collector Configuration:

    • Ensure that the OpenTelemetry Collector is correctly configured to communicate with Google Cloud Trace. This includes verifying endpoint URLs, authentication credentials, and other relevant settings.
  4. Check for High Volume of Traces:

    • If your system is generating a high volume of trace data, the Collector might be getting overwhelmed, leading to timeouts. In this case, consider scaling up the Collector (e.g., increasing resources like CPU and memory) or optimizing how traces are batched and sent to Google Cloud Trace.
  5. Monitor Collector Performance:

    • Monitor the performance metrics of the OpenTelemetry Collector to see if it’s experiencing resource constraints (like CPU or memory pressure) that could be causing the timeouts.
  6. Examine Logs for Additional Clues:

    • Check the OpenTelemetry Collector logs for any additional error messages or warnings that might provide more context about the timeout issue.
  7. Update Collector to Latest Version:

    • Ensure that you are using the latest version of the OpenTelemetry Collector, as updates often include performance improvements and bug fixes.

Thanks @ms4446. Do you know how to increase the timeout? My system generates a high volume of traces.

Thanks

To address timeout issues when exporting traces to Google Cloud Trace, you’ll need to modify the configuration of the googlecloud exporter in your OpenTelemetry Collector configuration. Follow these steps:

1. Locate the Exporter Configuration:

  • Identify the section in your OpenTelemetry Collector configuration file where the googlecloud exporter is defined.

2. Adjust the Timeout Setting:

  • Add or modify the timeout setting within the googlecloud exporter configuration. The timeout is specified as a duration, for example 30s.

Example:

exporters:
  googlecloud:
    project: "YOUR_PROJECT_ID"
    timeout: 30s

3. Apply the Configuration Changes:

  • Save the updated configuration file.

4. Restart the OpenTelemetry Collector:

  • Restart the OpenTelemetry Collector to apply the new configuration.

5. Monitor the Results:

  • Observe the OpenTelemetry Collector logs to check if the “context deadline exceeded” errors are resolved.

6. Consider Batch Processing:

  • Configure the batch processor in your OpenTelemetry Collector to handle high volumes of traces efficiently, and remember to list it under your traces pipeline’s processors (see the combined sketch at the end of this reply).

Example:

processors:
  batch:
    timeout: 10s
    send_batch_size: 1024

7. Review Network Performance:

  • Verify that network latency and bandwidth are not contributing to the timeouts.

Remember:

  • Monitor the performance and resource usage of the Collector, especially with high trace volumes.
  • Carefully balance timeout settings with Collector performance and resource utilization.
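Putting these pieces together, the relevant parts of the Collector configuration might look like the following sketch. The project ID and values are placeholders to tune for your own trace volume, and the receiver shown assumes you ingest traces over OTLP as in the earlier example:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: "0.0.0.0:55680"
processors:
  batch:
    timeout: 10s              # flush a batch at least every 10 seconds
    send_batch_size: 1024     # or as soon as 1024 spans are queued
exporters:
  googlecloud:
    project: "YOUR_PROJECT_ID"
    timeout: 30s              # allow more time per export call
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]     # the batch processor only takes effect if listed here
      exporters: [googlecloud]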

Hi @ms4446

Thank you, it works: my traces are exported to Google Trace correctly.

I have a quick question: I want to export the logs to Google Cloud Logging, but this requires a JSON format.
Are OpenTelemetry logs in JSON format by default?
If not, is there a way to convert them to JSON?

Hi @Navirash ,

The OpenTelemetry Collector dictates the format, and for Google Cloud Logging, we need JSON. Let’s fix that!

Configure a Logging Exporter

  1. Add a logging exporter to your OpenTelemetry Collector config, specifying json as the output format. This exporter will wrap your logs in JSON before sending them to Google Cloud Logging.

Example Configuration:

exporters:
  logging:
    loglevel: debug
    encoding: json

Here, encoding: json is the key!

Include the Exporter in Your Pipeline

  1. Tell your pipelines to use this new JSON-loving exporter. Here’s an example:
service:
  pipelines:
    logs:
      receivers: [your_log_receiver]
      processors: [your_processors]
      exporters: [logging]

Replace your_log_receiver and your_processors with your actual log collection and processing components.

Apply and Verify

  1. Apply the updated config to your OpenTelemetry Collector and restart it. Then, check the logs! They should now be formatted as JSON and happily chilling in Google Cloud Logging.

Hi @ms4446
Thanks for your answer. Can I put encoding: json in the googlecloud exporter?
If I understood correctly, if I add encoding: json to the googlecloud exporter, this will export the logs from my OpenTelemetry Collector (see attachment) to Google Cloud Logging.

When I try with logging, I have this error:

To resolve this error, I added this:

service:
  telemetry:
    logs:
      encoding: json

That works. So can I export the telemetry logs to Google Cloud?

As of the latest OpenTelemetry Collector versions, this setting is unnecessary. The exporter automatically handles formatting for Google Cloud Logging, which typically involves JSON. My previous information about specifying encoding: json was outdated and potentially misleading. I apologize for the confusion.

Here’s a revised overview of how to export your telemetry logs to Google Cloud:

1. Configure the googlecloud exporter:

exporters:
  googlecloud:
    project: "YOUR_PROJECT_ID"
    # other relevant configuration options...

This configuration focuses on the project ID and other essential settings, not explicit encoding.

2. (Optional) Use a dedicated logging exporter:

If you need more control over the JSON format or require advanced processing, consider a separate logging exporter like logging or fluentd. Configure it with your desired format and Google Cloud Logging details (project ID, log name, etc.).

3. Restart the OpenTelemetry Collector:

After any configuration changes, restarting the Collector ensures the new settings take effect.

4. Verify your logs in Google Cloud Logging:

Once everything is set up and restarted, your telemetry logs should be flowing to Google Cloud in JSON format. You can access and analyze them using the Google Cloud Console or other tools.

Hi @ms4446

Thanks for your answer. Where did you find the information that “The exporter automatically handles formatting for Google Cloud Logging”?

Can I add this to be sure that the logs are in JSON format?

service:
  telemetry:
    logs:
      encoding: json

And then I export like this:

service:
  pipeline:
    logs:
      receiver: [oltp]
      exporters: [googlecloud]

While the OpenTelemetry Collector documentation doesn’t explicitly mention “JSON encoding” for the googlecloud exporter, it does imply automatic format handling. This is evident in statements about the exporter “formatting and sending log entries to the Google Cloud Logging API.” This suggests adherence to the expected format, typically JSON, for Google Cloud Logging.

Standard Configuration Structure:

You’re absolutely right; the standard Collector configuration doesn’t include service: telemetry: logs: encoding: json. The Collector primarily focuses on receivers, processors, exporters, and pipelines. The encoding setting typically resides within a dedicated logging exporter, not under the service section.

Exporting Logs to Google Cloud Logging:

The correct approach is to configure the googlecloud exporter within your pipeline. This exporter handles both formatting and exporting of logs to Google Cloud Logging. Here’s a recommended configuration:

exporters:
  googlecloud:
    project: "YOUR_PROJECT_ID"
    # other configuration options...
service:
  pipelines:
    logs:
      receivers: [your_log_receiver]
      processors: [your_processors]
      exporters: [googlecloud]

This configuration assigns responsibility for log handling and exporting to the googlecloud exporter, eliminating the redundant and potentially misleading encoding: json setting.
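As a concrete illustration, here is a minimal sketch assuming your logs arrive over OTLP; the receiver, port, and project ID are placeholders to adapt to your environment:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: "0.0.0.0:4317"   # wherever your applications send OTLP logs
processors:
  batch: {}
exporters:
  googlecloud:
    project: "YOUR_PROJECT_ID"
service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [googlecloud]

The googlecloud exporter maps each log record to a Cloud Logging entry itself, so no explicit encoding setting should be needed.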