A guide to compression benchmarking and scaling for Google Cloud Managed Service for Kafka

This post offers a comprehensive analysis of compression and scaling within Google Cloud’s Managed Kafka service, aiming to guide you in optimizing your Kafka clusters for peak performance, cost-efficiency, and robust reliability.

In this post, we’ll explore the advantages of Google Managed Kafka, run performance benchmarks to evaluate its capabilities, and provide guidance on optimizing your Kafka deployments for throughput and latency so they can handle even the most demanding workloads.

For an introduction and overview of Apache Kafka, visit the Apache Software Foundation website. For an introduction to Bash, see here.

Benchmarking Kafka producers, consumers, and latencies

Benchmarking your Kafka deployment is crucial for understanding its performance characteristics and ensuring it can serve your application’s requirements. This involves a deep dive into metrics like throughput and latency, along with systematic experimentation to optimize your producer and consumer configurations. Note that benchmarking is done at a topic/application level and should be repeated for each one.

Optimizing for throughput and latency

The Apache Kafka bundle includes two utilities – kafka-producer-perf-test.sh and kafka-consumer-perf-test.sh – to assess producer and consumer performance, including throughput and latencies.

Note: while we use specific config values here to demonstrate tool usage, we recommend using configurations (e.g. message size, message rate) that mirror your own workloads.

kafka-producer-perf-test.sh

This tool simulates producer behavior by sending a specified number of messages to a topic while measuring throughput and latencies. Its key flags are listed below, followed by a minimal invocation.

  • --topic (required): the target Kafka topic.
  • --num-records (required): the total number of messages to send.
  • --record-size (required, unless --payload-file is supplied instead): the size of each message in bytes.
  • --throughput (required): a target throughput in messages per second (use -1 to disable throttling).
  • --producer-props: lets you set producer properties such as:
    • bootstrap.servers (required): comma-separated list of Kafka bootstrap server or broker addresses.
    • acks (optional): the level of acknowledgment required from brokers: 0 for no acknowledgment, 1 for the leader broker only, all for all in-sync replicas. The default is all.
    • batch.size (optional): the maximum size of a batch of messages in bytes. The default is 16384 (16KB).
    • linger.ms (optional): the maximum time to wait for a batch to fill before sending. The default is 0 ms.
    • compression.type (optional): the compression algorithm to use (none, gzip, snappy, lz4, zstd). The default is none.
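
Putting these flags together, a minimal single run might look like the following sketch. The bootstrap address and topic name are placeholders, and the property values are illustrative; the looped versions used for this benchmark appear next.

  # Minimal run: 100k messages of 1KB each, unthrottled
  ~/kafka_2.13-3.7.2/bin/kafka-producer-perf-test.sh \
    --topic my-topic \
    --num-records 100000 \
    --record-size 1024 \
    --throughput -1 \
    --producer-props bootstrap.servers=BOOTSTRAP_HOST:9092 \
      acks=all batch.size=16384 linger.ms=10 compression.type=lz4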

Sample code block #1: Kafka Producer performance test (maximize throughput)

  # Focused on maximizing throughput
  # Loop through compression types
  for i in "none" "gzip" "snappy" "lz4"; do
    # Loop through a realistic set of batch sizes (16KB, 32KB, 64KB, 128KB)
    for j in 16384 32768 65536 131072; do
      echo "compression:$i and batch:$j"
      ~/kafka_2.13-3.7.2/bin/kafka-producer-perf-test.sh \
        --topic "comp-test-1" \
        --num-records 1000000 \
        --payload-file ./truck_engine_sensors.json \
        --throughput -1 \
        --producer-props \
          bootstrap.servers="bootstrap.gmk-compression-test.us-central1.managedkafka.mbawa-sandbox.cloud.goog:9092" \
          key.serializer=org.apache.kafka.common.serialization.StringSerializer \
          value.serializer=org.apache.kafka.common.serialization.StringSerializer \
          acks=all \
          batch.size=$j \
          linger.ms=10 \
          compression.type="$i" \
        --producer.config client.properties \
        --print-metrics | grep "1000000 records sent\|compression-rate-avg"
    done
  done

Sample code block #2: Kafka Producer performance test (minimize latency)

  # Focused on minimizing latency
  # Loop through compression types
  for i in "none" "gzip" "snappy" "lz4"; do
    # Loop through a realistic set of batch sizes (16KB, 32KB, 64KB, 128KB)
    for j in 16384 32768 65536 131072; do
      echo "compression:$i and batch:$j"
      ~/kafka_2.13-3.7.2/bin/kafka-producer-perf-test.sh \
        --topic "comp-test-1" \
        --num-records 1000000 \
        --payload-file ./truck_engine_sensors.json \
        --throughput 50000 \
        --producer-props \
          bootstrap.servers="bootstrap.gmk-compression-test.us-central1.managedkafka.mbawa-sandbox.cloud.goog:9092" \
          key.serializer=org.apache.kafka.common.serialization.StringSerializer \
          value.serializer=org.apache.kafka.common.serialization.StringSerializer \
          acks=all \
          batch.size=$j \
          linger.ms=10 \
          compression.type="$i" \
        --producer.config client.properties \
        --print-metrics | grep "1000000 records sent\|compression-rate-avg"
    done
  done
Important considerations for producers

The most crucial properties are acks, batch.size, linger.ms, and compression.type, because they directly influence the producer’s throughput and latencies. You need to find the ideal values for your own workload requirements, but we suggest the following baseline configurations to start with (a sample client.properties sketch follows the list).

  • acks: acks=1 requires acknowledgment from the leader broker only. This gives the best performance unless you need acknowledgment from the leader and all in-sync replica followers (acks=all).
  • batch.size: 10000 bytes (~10KB) is a good baseline to start with. Increasing the batch size allows producers to send more messages in a single request, reducing overhead.
  • linger.ms: 10 ms is a good baseline; you can try values within a range of 0-50 ms. Increasing linger time further can result in increased latencies.
  • compression.type: we recommend using compression to further increase your throughput and reduce latencies.
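
The baseline values above can live in the client.properties file passed to the tool via --producer.config. Here is a minimal sketch with illustrative values only; your real file will also need the security and authentication settings for your cluster:

  # Write baseline producer tuning into client.properties (illustrative values)
  cat > client.properties <<'EOF'
  acks=1
  batch.size=10000
  linger.ms=10
  compression.type=lz4
  # security.protocol, sasl.* etc. would go here, per your cluster's auth setup
  EOF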
Important considerations for consumers

fetch-size is the most important property for optimizing consumer throughput. The right baseline depends largely on your consumption and throughput requirements: up to 1MB for small messages, and roughly 1-50MB for large messages. We recommend analyzing how fetch size impacts both throughput and the responsiveness of your application; a sample consumer run is sketched below. By carefully documenting these experiments and analyzing the resulting data, you can identify bottlenecks and fine-tune your configurations.
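
While this post focuses on producer benchmarks, the consumer side can be exercised with kafka-consumer-perf-test.sh. Below is a minimal sketch that sweeps fetch sizes against the cluster and topic used in the producer tests; flag names are from the Apache Kafka 3.7 distribution, and the values are illustrative:

  # Sweep consumer fetch sizes: 1MB, 10MB, 50MB
  for f in 1048576 10485760 52428800; do
    echo "fetch-size:$f"
    ~/kafka_2.13-3.7.2/bin/kafka-consumer-perf-test.sh \
      --bootstrap-server "bootstrap.gmk-compression-test.us-central1.managedkafka.mbawa-sandbox.cloud.goog:9092" \
      --consumer.config client.properties \
      --topic "comp-test-1" \
      --messages 1000000 \
      --fetch-size $f
  done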

Benchmark environment

  • We created a Managed Service for Apache Kafka cluster sized at 12 vCPUs / 48GB of memory; this sizing is informed by the “How to benchmark and scale your Google Cloud Managed Service for Kafka deployment” blog post. A sketch of creating a comparable cluster is shown below.
  • The Kafka producer is deployed on a GCE machine of type n2d-standard-8 (8 vCPUs, 32GB memory).
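
For reference, a cluster of this size could be created along the following lines. This is a sketch only: the gcloud managed-kafka flags may differ by release, and the project and subnet values are placeholders.

  # Create a 12 vCPU / 48GiB Managed Service for Apache Kafka cluster (sketch)
  gcloud managed-kafka clusters create gmk-compression-test \
    --location=us-central1 \
    --cpu=12 \
    --memory=48GiB \
    --subnets=projects/PROJECT_ID/regions/us-central1/subnetworks/SUBNET_NAME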

How to benchmark throughput and latencies

Benchmarking for Producer

When conducting tests to measure the throughput and latencies of Kafka producers, the key parameters are batch.size, the maximum size of a batch of messages, and linger.ms, the maximum time to wait for a batch to fill before sending. For the purposes of this benchmark, we keep acks at 1 (acknowledgment from the leader broker) to balance durability and performance. This helps us estimate the expected throughput and latencies for a producer. Note that the message payload, drawn from the same payload file, is held constant across all runs.

We also ran two separate experiments, reflected in the Target throughput column of the results table below (a sketch for parsing the tool’s summary output follows the table):

  • Minimize latency: we cap throughput at 50,000 messages/sec for better latency. Different compression algorithms have different compute/IO bottlenecks, so we get much better latency at lower throughput settings.
  • Maximize throughput: setting throughput to -1 instructs the producer to send messages as fast as possible, effectively testing its maximum throughput.
| Throughput (msg/sec) | Throughput (MB/sec) | Avg latency (ms) | batch.size | linger.ms | Compression | compression-rate-avg | Target throughput | Messages |
|---|---|---|---|---|---|---|---|---|
| 157109.19 | 9.1 | 1930.08 | 16KB | 10 | None | 1 | -1 | 1000000 |
| 216543.95 | 12.54 | 1269.91 | 32KB | 10 | None | 1 | -1 | 1000000 |
| 228990.15 | 13.26 | 1205.06 | 64KB | 10 | None | 1 | -1 | 1000000 |
| 235294.11 | 13.62 | 1179.1 | 128KB | 10 | None | 1 | -1 | 1000000 |
| 511247.44 | 27.22 | 18.69 | 16KB | 10 | GZIP | 0.126 | -1 | 1000000 |
| 513083.63 | 29.6 | 9.45 | 32KB | 10 | GZIP | 0.123 | -1 | 1000000 |
| 513083.63 | 29.71 | 13.28 | 64KB | 10 | GZIP | 0.122 | -1 | 1000000 |
| 508905.85 | 29.47 | 13.24 | 128KB | 10 | GZIP | 0.12 | -1 | 1000000 |
| 363636.36 | 21.05 | 646.11 | 16KB | 10 | SNAPPY | 0.236 | -1 | 1000000 |
| 460829.49 | 26.68 | 262.12 | 32KB | 10 | SNAPPY | 0.234 | -1 | 1000000 |
| 553097.34 | 32.02 | 211.24 | 64KB | 10 | SNAPPY | 0.233 | -1 | 1000000 |
| 556483.02 | 32.22 | 192.38 | 128KB | 10 | SNAPPY | 0.233 | -1 | 1000000 |
| 367782.27 | 21.29 | 652.4 | 16KB | 10 | LZ4 | 0.228 | -1 | 1000000 |
| 480538.2 | 27.82 | 331.18 | 32KB | 10 | LZ4 | 0.225 | -1 | 1000000 |
| 515463.91 | 29.85 | 187.31 | 64KB | 10 | LZ4 | 0.224 | -1 | 1000000 |
| 577367.2 | 33.43 | 202.25 | 128KB | 10 | LZ4 | 0.223 | -1 | 1000000 |
| 49995 | 2.89 | 12.03 | 16KB | 10 | None | 1 | 50000 | 1000000 |
| 49995 | 2.89 | 12.04 | 32KB | 10 | None | 1 | 50000 | 1000000 |
| 49992.5 | 2.89 | 13.11 | 64KB | 10 | None | 1 | 50000 | 1000000 |
| 49990 | 2.89 | 11.69 | 128KB | 10 | None | 1 | 50000 | 1000000 |
| 49980 | 2.89 | 9.51 | 16KB | 10 | GZIP | 0.136 | 50000 | 1000000 |
| 49995 | 2.89 | 10.56 | 32KB | 10 | GZIP | 0.136 | 50000 | 1000000 |
| 49992.5 | 2.89 | 9.84 | 64KB | 10 | GZIP | 0.136 | 50000 | 1000000 |
| 49997.5 | 2.89 | 9.5 | 128KB | 10 | GZIP | 0.136 | 50000 | 1000000 |
| 49997.5 | 2.89 | 11.3 | 16KB | 10 | SNAPPY | 0.243 | 50000 | 1000000 |
| 49992.5 | 2.89 | 10.75 | 32KB | 10 | SNAPPY | 0.243 | 50000 | 1000000 |
| 50000 | 2.89 | 10.39 | 64KB | 10 | SNAPPY | 0.243 | 50000 | 1000000 |
| 49995 | 2.89 | 9.55 | 128KB | 10 | SNAPPY | 0.243 | 50000 | 1000000 |
| 49995 | 2.89 | 11.25 | 16KB | 10 | LZ4 | 0.235 | 50000 | 1000000 |
| 49992.5 | 2.89 | 9.81 | 32KB | 10 | LZ4 | 0.234 | 50000 | 1000000 |
| 49995 | 2.89 | 9.23 | 64KB | 10 | LZ4 | 0.234 | 50000 | 1000000 |
| 49995 | 2.89 | 10.66 | 128KB | 10 | LZ4 | 0.234 | 50000 | 1000000 |
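
The throughput and latency columns above come from the summary line that kafka-producer-perf-test.sh prints at the end of each run. Below is a small sketch of reducing that output to the interesting fields; parse_summary is a hypothetical helper name, and the quoted line in the comment shows the tool’s standard summary format:

  # Pull messages/sec and avg latency out of summary lines like:
  #   "1000000 records sent, 511247.4 records/sec (29.60 MB/sec), 18.69 ms avg latency, ..."
  parse_summary() {
    awk -F', ' '/records sent/ {
      split($2, tput, " ")    # "511247.4 records/sec (29.60 MB/sec)"
      split($3, lat, " ")     # "18.69 ms avg latency"
      printf "msgs_per_sec=%s avg_latency_ms=%s\n", tput[1], lat[1]
    }'
  }
  # Usage: ~/kafka_2.13-3.7.2/bin/kafka-producer-perf-test.sh ... | parse_summary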

Compression-specific observations

  • No compression (baseline)
    • Throughput: up to 235,294.11 messages/sec (range: 157,109.19 to 235,294.11)
    • Avg latency (capped runs): 11.69 to 13.11 ms
    • Takeaway: not using compression results in significantly lower throughput, and far higher latency in the uncapped runs, than any of the compression options.
  • GZIP
    • Throughput: up to 513,083.63 messages/sec (range: 508,905.85 to 513,083.63)
    • Avg latency (capped runs): 9.5 to 10.56 ms
    • Compression rate: 0.12 to 0.136, indicating significant size reduction
    • Takeaway: GZIP achieves the best compression ratio and, in these runs, the lowest uncapped latencies, though its peak throughput trails SNAPPY and LZ4. The trade-off, not measured here, is typically higher CPU usage.
  • SNAPPY
    • Throughput: up to 556,483.02 messages/sec (range: 363,636.36 to 556,483.02)
    • Avg latency (capped runs): 9.55 to 11.3 ms
    • Compression rate: roughly 0.23 to 0.24
    • Takeaway: SNAPPY offers very high peak throughput, just behind LZ4, though with noticeably higher uncapped latency than GZIP. It’s a good choice where you need high throughput but want to minimize CPU overhead.
  • LZ4
    • Throughput: up to 577,367.20 messages/sec (range: 367,782.27 to 577,367.20)
    • Avg latency (capped runs): 9.23 to 11.25 ms
    • Compression rate: roughly 0.22 to 0.24
    • Takeaway: LZ4’s performance is very similar to SNAPPY’s: excellent throughput with a moderate latency increase relative to GZIP in the uncapped runs. Like SNAPPY, it’s a strong contender when you need a balance between performance and computational cost.

Relationship between variables

  • Batch size vs. throughput and latency
    • As expected, increasing the batch size generally leads to higher throughput (messages/sec and MB/sec). We see a significant jump moving from 16KB to 64KB, while further increasing the batch size to 128KB does not show a significant improvement. This suggests an optimal batch size exists beyond which further increases yield little gain.
    • Larger batches can add latency, because messages wait longer for a batch to fill before being sent (bounded by linger.ms). In the uncapped runs here, however, average latency actually fell as batch size grew: with a saturated producer, larger batches mean fewer requests and less queuing per message. In the capped runs, latency was roughly flat across batch sizes.
  • Compression impact on throughput and latency
    • Peak throughput increases with compression (LZ4 > SNAPPY > GZIP > no compression).
    • Latency varies: GZIP is conventionally the most CPU-intensive codec, while LZ4 and SNAPPY strike a balance, with LZ4 performing slightly better in these runs.
  • Compression rate vs. performance
    • GZIP achieves the best compression (~0.12, i.e. data leaves the producer at roughly 12% of its raw size) at the cost of CPU.
    • SNAPPY and LZ4 offer similar compression rates (~0.23) with higher peak throughput.

These findings highlight the importance of careful tuning when configuring Kafka producers. Finding the optimal balance between batch.size, linger.ms, and compression.type is crucial for achieving your throughput and latency goals.

Overall patterns & takeaways

  • Best overall performer:
    • LZ4 offers the highest peak throughput with low latency, making it ideal when speed is critical.
  • Compression trade-offs:
    • GZIP gives the greatest storage and network savings, typically at a higher CPU cost.
    • SNAPPY and LZ4 are better suited to real-time performance while still reducing size.
  • Batch size considerations:
    • Larger batches raise throughput up to roughly 64KB in these tests, with diminishing returns beyond that.
    • Compression helps mitigate the latency cost of bigger batches.
  • Optimal configuration:
    • If throughput and low latency are priorities → use LZ4 with smaller batches.
    • If storage savings matter more than speed → use GZIP with larger batches. (Compression can also be enforced at the topic level; see the sketch below.)
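
Compression can be set at the topic level as well as per producer: giving the topic an explicit compression.type makes brokers store data with that codec regardless of what individual producers send. A sketch using kafka-configs.sh against the test topic from earlier:

  # Force LZ4 at the topic level (brokers recompress incoming data as needed)
  ~/kafka_2.13-3.7.2/bin/kafka-configs.sh \
    --bootstrap-server "bootstrap.gmk-compression-test.us-central1.managedkafka.mbawa-sandbox.cloud.goog:9092" \
    --command-config client.properties \
    --entity-type topics --entity-name comp-test-1 \
    --alter --add-config compression.type=lz4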

[Chart] Compression rate vs. throughput for each compression type

[Chart] Compression rate vs. avg latency for each compression type

[Chart] Throughput (messages/sec) across batch sizes and compression types

Conclusion

In conclusion, optimizing your Google Managed Kafka deployment involves a thorough understanding of producer and consumer behavior, careful benchmarking, and strategic scaling. By actively monitoring resource utilization and adjusting your configurations based on your specific workload demands, you can ensure your Kafka cluster delivers the high throughput and low latency required for your real-time data streaming applications.

Interested in diving deeper? Explore the resources and documentation linked below.


Sorry, on a different topic: I would like to know what kind of discovery questions I should ask my customer to make sizing decisions?

Thanks for your query. One can start from Cluster Sizing and then run the benchmarks above for better price-performance.


Thanks @Shlok_Karpathak, appreciate your response. I shall go over the links you shared.