**Title:** Why do the Cassandra-to-Bigtable migration tools (Proxy Adapter, CDM, Dataflow Template) not support uuid, decimal, frozen<tuple>, and frozen<UDT> types — and is native support planned?

Background

We have been working on a Cassandra-to-Cloud-Bigtable migration project and have successfully used all three tools in Google’s migration toolchain:

  • Cassandra-Bigtable Proxy Adapter (port 9042 → 9043)
  • CDM BulkLoad (via spark-submit)
  • Dataflow Template (cassandra-to-bigtable)

All three tools work well for simple scalar types (text, int, boolean, timestamp, float) and for the basic collection types (set, list, map). We have migrated multiple keyspaces cleanly using these tools.


The Gap We Found

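However, when we tested against enterprise-grade schemas that include the following CQL types, every tool either failed outright or, in the Dataflow Template's case for uuid, timeuuid, and decimal, could only write the value as a string: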

| CQL Type           | Proxy Adapter | Dataflow Template | CDM via Proxy |
|--------------------|---------------|-------------------|---------------|
| uuid / timeuuid    | ❌ BLOCKED    | ⚠️ string only    | ❌ BLOCKED    |
| decimal            | ❌ BLOCKED    | ⚠️ string only    | ❌ BLOCKED    |
| frozen<tuple<...>> | ❌ BLOCKED    | ❌ BLOCKED        | ❌ BLOCKED    |
| frozen<UDT>        | ❌ BLOCKED    | ❌ BLOCKED        | ❌ BLOCKED    |

Example error from Proxy + CDM:

[FAIL] geo_location    frozen<tuple<double,double>>  → UNSUPPORTED
[FAIL] primary_address frozen<address_udt>           → UNSUPPORTED
Verdict: PROXY_BLOCKED

Example error from Dataflow Template:

Cannot map column 'geo_location' (type: frozen<tuple<double,double>>) to a Bigtable byte value.
Pipeline construction failed.
Unsupported column types found: [geo_location, primary_address]
Job state: FAILED

The Dataflow Template README explicitly lists tuple types as unsupported. The Proxy limitations documentation confirms frozen and UDT are unsupported.


Why This Matters for Enterprise Migrations

These are not edge-case types. In real-world enterprise Cassandra deployments:

  • uuid is one of the most common primary key types
  • decimal is the standard choice for monetary values in financial services schemas
  • frozen<UDT> is widely used for addresses, profiles, and payment details in CRM and e-commerce schemas
  • frozen<tuple> is a common way to store geo-coordinates in location-aware applications
  • timeuuid is a very common clustering key for time-series and IoT data

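Tables with frozen<tuple> or frozen<UDT> columns cannot be migrated at all with the standard toolchain, and tables with uuid, timeuuid, or decimal columns can only be migrated through the Dataflow Template, with those values stored as strings.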


What We Built as a Workaround

We solved this by building a custom encoding layer (we call it QLift Phase 3) that:

  1. Reads from Cassandra using the full Python driver (which natively decodes frozen/UDT/tuple)
  2. Auto-discovers the schema from system_schema.columns
  3. Routes each CQL type to the correct encoder (16-byte big-endian for uuid/timeuuid, JSON bytes for frozen types, string bytes for decimal)
  4. Writes directly to Bigtable via the Python client, bypassing the Proxy
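
To make step 3 concrete, here is a minimal sketch of the type routing, not our full implementation. The function names (encode_uuid, encode_decimal, encode_frozen, pick_encoder) are illustrative only, and the sketch assumes the driver hands back uuid.UUID, decimal.Decimal, plain tuples for tuple columns, and namedtuple-like objects for unregistered UDTs. It uses only the standard library, so the Cassandra read and the Bigtable write are left out.

```python
import json
import uuid
from decimal import Decimal
from typing import Any, Callable


def encode_uuid(value: uuid.UUID) -> bytes:
    # UUID.bytes is the 16-byte big-endian form of the value.
    return value.bytes


def encode_decimal(value: Decimal) -> bytes:
    # Store the decimal as its string form so no precision is lost.
    return str(value).encode("utf-8")


def _to_jsonable(value: Any) -> Any:
    # Assumption: UDT values arrive as namedtuple-like objects with _asdict().
    if hasattr(value, "_asdict"):
        return {k: _to_jsonable(v) for k, v in value._asdict().items()}
    if isinstance(value, (tuple, list)):
        return [_to_jsonable(v) for v in value]
    if isinstance(value, dict):
        return {str(k): _to_jsonable(v) for k, v in value.items()}
    if isinstance(value, (uuid.UUID, Decimal)):
        return str(value)
    return value


def encode_frozen(value: Any) -> bytes:
    # Compact, key-sorted JSON so the same value always yields the same bytes.
    text = json.dumps(_to_jsonable(value), sort_keys=True, separators=(",", ":"))
    return text.encode("utf-8")


def pick_encoder(cql_type: str) -> Callable[[Any], bytes]:
    # cql_type is the type string as read from system_schema.columns.
    t = cql_type.strip().lower()
    if t in ("uuid", "timeuuid"):
        return encode_uuid
    if t == "decimal":
        return encode_decimal
    if t.startswith("frozen<"):
        return encode_frozen
    raise ValueError(f"no encoder registered for CQL type {cql_type!r}")
```

For example, pick_encoder("frozen<tuple<double,double>>")((1.5, -2.25)) produces the JSON bytes for a two-element array. One caveat with the decimal path: str() on a Decimal can return scientific notation (for instance "1E+2"), so readers that expect fixed-point text may prefer format(value, "f") instead.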

We have successfully migrated tables including uuid PKs, timeuuid clustering keys, frozen<tuple<double,double,double>>, multiple frozen<UDT> types, decimal, inet, varint, tinyint, smallint, time, and date — all in a single automated pipeline.


My Questions for the Community

  1. Why were these types left unsupported in the Proxy Adapter and Dataflow Template? Is this a deliberate architectural decision (e.g., Bigtable’s schemaless byte model makes type-aware conversion out of scope for the Proxy), or is it simply a gap that hasn’t been prioritised yet?

  2. Is native support for frozen<tuple>, frozen<UDT>, uuid, and decimal planned in any of the three tools? Is there a public roadmap or feature request tracker where this has been raised?

  3. Is our approach of encoding these types as bytes (JSON for frozen types, 16-byte big-endian for uuid/timeuuid, string bytes for decimal) and writing directly to Bigtable considered a valid production pattern, or is there a recommended alternative we may have missed?

Any insights from Googlers or community members who have faced this in production migrations would be very helpful. Thank you.