Background
We have been working on a Cassandra-to-Cloud-Bigtable migration project and have successfully used all three tools in Google’s migration toolchain:
- Cassandra-Bigtable Proxy Adapter (port 9042 → 9043)
- CDM BulkLoad (via spark-submit)
- Dataflow Template (cassandra-to-bigtable)
All three tools work well for simple scalar and collection types: text, int, boolean, timestamp, float, set, list, map. We have migrated multiple keyspaces cleanly using these tools.
The Gap We Found
However, when we tested against enterprise-grade schemas that include the following CQL types, all three tools fail:
| CQL Type | Proxy Adapter | Dataflow Template | CDM via Proxy |
|---|---|---|---|
| uuid / timeuuid | Fails | Fails | Fails |
| decimal | Fails | Fails | Fails |
| frozen<tuple<...>> | Fails | Fails | Fails |
| frozen<UDT> | Fails | Fails | Fails |
Example error from Proxy + CDM:
[FAIL] geo_location frozen<tuple<double,double>> → UNSUPPORTED
[FAIL] primary_address frozen<address_udt> → UNSUPPORTED
Verdict: PROXY_BLOCKED
Example error from Dataflow Template:
Cannot map column 'geo_location' (type: frozen<tuple<double,double>>) to a Bigtable byte value.
Pipeline construction failed.
Unsupported column types found: [geo_location, primary_address]
Job state: FAILED
The Dataflow Template README explicitly lists tuple types as unsupported. The Proxy limitations documentation confirms frozen and UDT are unsupported.
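To spot affected tables before starting a migration run, a small pre-flight check can scan a table's column types for the problem types listed in this post. The following is a minimal sketch only: `is_blocked`, `blocked_columns`, and `BLOCKED_SCALARS` are hypothetical names, and the rule set is taken from the types reported as failing here, not from the source code of the Proxy, CDM, or the Dataflow Template.

```python
# Hypothetical pre-flight check. The blocked set mirrors the types this post
# reports as failing; it is an assumption, not any tool's documented rule set.
BLOCKED_SCALARS = {"uuid", "timeuuid", "decimal"}


def is_blocked(cql_type: str) -> bool:
    """Return True if a CQL type string is one of the types reported as failing."""
    t = cql_type.replace(" ", "").lower()
    if t in BLOCKED_SCALARS:
        return True
    # Any frozen<...> wrapper covers both frozen<tuple<...>> and frozen<UDT>.
    return t.startswith("frozen<")


def blocked_columns(columns):
    """columns: iterable of (column_name, cql_type) pairs, in schema order."""
    return [name for name, cql_type in columns if is_blocked(cql_type)]


if __name__ == "__main__":
    sample = [
        ("id", "uuid"),
        ("name", "text"),
        ("geo_location", "frozen<tuple<double,double>>"),
        ("primary_address", "frozen<address_udt>"),
    ]
    print(blocked_columns(sample))
```

The (column_name, cql_type) pairs are assumed to come from whatever schema source is convenient, such as a `DESCRIBE TABLE` dump or a query against the schema tables.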
Why This Matters for Enterprise Migrations
These are not edge-case types. In real-world enterprise Cassandra deployments:
- uuid is the most common primary key type
- decimal is standard in every financial services schema for monetary values
- frozen<UDT> is used for addresses, profiles, and payment details in almost every CRM or e-commerce schema
- frozen<tuple> is used for geo-coordinates in any location-aware application
- timeuuid is the default clustering key for time-series and IoT data
Tables using any of these types are completely blocked from migration using the standard toolchain.
What We Built as a Workaround
We solved this by building a custom encoding layer (we call it QLift Phase 3) that:
- Reads from Cassandra using the full Python driver (which natively decodes frozen/UDT/tuple)
- Auto-discovers the schema from system_schema.columns
- Routes each CQL type to the correct encoder (16-byte big-endian for uuid/timeuuid, JSON bytes for frozen types, string bytes for decimal)
- Writes directly to Bigtable via the Python client, bypassing the Proxy
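The type-routing step in the list above can be sketched roughly as follows. This is an illustrative sketch, not the actual QLift code: `base_type`, `to_jsonable`, and `encode_cell` are hypothetical names, and it assumes the Python driver returns `uuid.UUID` for uuid/timeuuid, `decimal.Decimal` for decimal, plain tuples for tuple types, and namedtuple-like objects for UDTs that have not been registered with a custom class. It uses only the standard library, so the Cassandra read and the Bigtable write are left out.

```python
import json
import uuid
from decimal import Decimal


def base_type(cql_type: str) -> str:
    """Outermost CQL type name, e.g. 'frozen<tuple<double,double>>' -> 'frozen'."""
    return cql_type.split("<", 1)[0].strip().lower()


def to_jsonable(value):
    """Recursively convert decoded values into JSON-serialisable ones."""
    # Assumption: UDT values arrive as namedtuple-like objects exposing _asdict().
    if isinstance(value, tuple) and hasattr(value, "_asdict"):
        return {k: to_jsonable(v) for k, v in value._asdict().items()}
    if isinstance(value, dict):
        return {str(k): to_jsonable(v) for k, v in value.items()}
    if isinstance(value, (list, tuple, set, frozenset)):
        return [to_jsonable(v) for v in value]
    if isinstance(value, (uuid.UUID, Decimal)):
        return str(value)
    return value


def encode_cell(cql_type: str, value) -> bytes:
    """Route one decoded value to a byte encoding based on its CQL type."""
    kind = base_type(cql_type)
    if kind in ("uuid", "timeuuid"):
        return value.bytes  # 16 bytes, big-endian field order
    if kind == "decimal":
        return str(value).encode("utf-8")  # string form keeps the scale
    if kind in ("frozen", "tuple"):
        compact = json.dumps(to_jsonable(value), separators=(",", ":"))
        return compact.encode("utf-8")
    raise ValueError(f"no encoder registered for CQL type {cql_type!r}")


if __name__ == "__main__":
    print(encode_cell("decimal", Decimal("19.90")))
    print(encode_cell("frozen<tuple<double,double>>", (51.5, -0.12)))
```

Raising on an unknown type rather than falling back to a default keeps an unexpected column from being written in a format that readers cannot decode later.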
We have successfully migrated tables including uuid PKs, timeuuid clustering keys, frozen<tuple<double,double,double>>, multiple frozen<UDT> types, decimal, inet, varint, tinyint, smallint, time, and date — all in a single automated pipeline.
My Questions for the Community
- Why were these types left unsupported in the Proxy Adapter and Dataflow Template? Is this a deliberate architectural decision (e.g., Bigtable’s schemaless byte model makes type-aware conversion out of scope for the Proxy), or is it simply a gap that hasn’t been prioritised yet?
- Is native support for frozen<tuple>, frozen<UDT>, uuid, and decimal planned in any of the three tools? Is there a public roadmap or feature request tracker where this has been raised?
- Is our approach of converting these types to JSON bytes and writing directly to Bigtable considered a valid production pattern, or is there a recommended alternative we may have missed?
Any insights from Googlers or community members who have faced this in production migrations would be very helpful. Thank you.