It isn’t clear how Google manages schema versioning or schema evolution. For example, it doesn’t seem possible to edit a schema. Does that mean that any additive change is always a new version? If so, how would the team recommend that we tag the version of the schema?
After a schema is associated with a topic, you cannot update the schema or remove its association with that topic. If a schema is deleted, publishing to associated topics will fail. Schemas in Pub/Sub aren’t designed to be updated or modified – rather, they are used to enforce a particular structure on all published messages to a topic. If your schema were to change, you would need to create a new topic and attach the updated schema to it.
Pub/Sub product manager here. @jredl we are trying to figure out how evolution should work. Any feedback about your hopes and requirements on this would be appreciated. An easy way to submit this privately is through the feedback link in the Console (question mark icon in the upper right corner).
We’d like to see it work similar to what Avro calls full transitive compatibility. So a change can be made to a schema, but only if that change is fully compatible with all data that has been sent before with previous schemas.
This would allow publishers to evolve their schema to a degree, whilst also allowing consumers to be confident building on the topic knowing the schema won’t introduce breaking changes.
It will (presumably) also be easier for you to implement, knowing that features like replaying data will continue to work, because no matter what schema is being used the data will be compliant.
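To make the request concrete, here is a minimal sketch of what a full transitive compatibility check could look like. It uses a deliberately simplified schema model (a field has a type name and may or may not have a default) and is not Avro's actual schema resolution, which also handles type promotion, unions and aliases. The names `Field`, `can_read` and `is_full_transitive` are made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Field:
    type: str
    has_default: bool = False


# Simplified schema model (an assumption): field name -> field definition.
Schema = Dict[str, Field]


def can_read(reader: Schema, writer: Schema) -> bool:
    """True if data written with `writer` can be decoded using `reader`."""
    for name, reader_field in reader.items():
        written = writer.get(name)
        if written is None:
            # The writer never sent this field, so the reader needs a default.
            if not reader_field.has_default:
                return False
        elif written.type != reader_field.type:
            return False
    # Fields only the writer knows about are simply ignored by the reader.
    return True


def is_fully_compatible(a: Schema, b: Schema) -> bool:
    """Backward and forward compatible: each schema can read the other's data."""
    return can_read(a, b) and can_read(b, a)


def is_full_transitive(candidate: Schema, history: List[Schema]) -> bool:
    """The candidate must be fully compatible with every earlier version."""
    return all(is_fully_compatible(candidate, old) for old in history)


v1 = {"id": Field("string")}
v2 = {"id": Field("string"), "email": Field("string", has_default=True)}
v3_bad = {"id": Field("string"), "age": Field("int")}  # new required field

print(is_full_transitive(v2, [v1]))          # adding a field with a default
print(is_full_transitive(v3_bad, [v1, v2]))  # adding a field without one
```

Under this model, adding a field with a default passes the check against all earlier versions, while adding a required field fails because the new schema could not decode older messages.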
For big changes to the schemas, I think it’s ok for us to have some migration path to a new topic, with new subscriptions, etc., that the consumers can then be migrated to, similar to a big change in an API and a new version. But having to go through that effort simply to add a new field is too prohibitive and is probably going to be a blocker for us making use of PubSub schemas, unless we can find some workaround (though we can’t think of any yet).
That would be a shame, as we want to start treating our data like our APIs, with schemas, versions, etc., so we do want our PubSub feeds to be associated with a schema. But right now PubSub schemas are just too prohibitive, so we’re having to look at alternatives/workarounds. We’re just not going to come up with a perfect schema on day 1.
@jredl That makes sense. Very helpful definition from Confluent there. I think full transitive is achievable. Any thoughts on the value of:
Schema validation server side
Does evolution mean that a new version is added to a schema attached to a topic, or that a new schema resource, guaranteed to be compatible with the previous one, is attached to the topic?
I was thinking along the same lines. Changes that are forward compatible according to the schema framework (currently Protobuf or Avro) should not necessitate new topics, but breaking changes absolutely should.
Today my worry is that if I were to use Pub/Sub schemas, I’d end up having to build a system around them that creates a new topic whenever a schema changes in a forward-compatible way and republishes the messages into a stable subscription for subscribers, because the schemas are simply evolving too rapidly as developers iterate on business/customer problems.
As it stands I would only consider using the Schema feature for very well-known Schemas that are probably orthogonal to business logic and unlikely to evolve. There are definitely a few of these, but I don’t believe they represent the majority case in our usage of Pub/Sub (3500+ topics).
Sorry, by server side validation, do you mean validation of the schemas by the applications, i.e. the publishers to the topic and the consumers from the subscription? That’s one of the workarounds we are looking at, but it’s tricky as they could be written in any language, and we would introduce a single point of failure on a registry (we currently use an Avro registry, although we may be able to use GCP Data Catalog, which would help alleviate that concern). It would also be a lot less strict than having the validation done by PubSub, and wouldn’t work with things like Dataflow SQL.
Does evolution mean that a new version is added to a schema attached to a topic, or that a new schema resource, guaranteed to be compatible with the previous one, is attached to the topic?
Yes. I think ideally there would be a version associated with a schema. Again, similar to how an Avro Registry might do it.
So maybe if I run:
gcloud pubsub schemas create my-schema
And there is already a schema called my-schema in that project, a new version of that schema is saved, and any PubSub topics associated with that schema start using the new version.
Running:
gcloud pubsub schemas describe my-schema
Could list the version history and useful metadata like when they were created, who by, etc.
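As a rough illustration of the behaviour I’m describing, here is a small in-memory sketch. It is not how `gcloud` or Pub/Sub works today; `SchemaRegistry`, `SchemaVersion`, `create`, `latest` and `describe` are hypothetical names that only model the proposal, and the schema definitions are placeholder strings.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List


@dataclass(frozen=True)
class SchemaVersion:
    version: int
    definition: str
    created_by: str
    created_at: datetime


@dataclass
class SchemaRegistry:
    """Hypothetical stand-in for the proposed versioned-schema behaviour."""
    _versions: Dict[str, List[SchemaVersion]] = field(default_factory=dict)

    def create(self, name: str, definition: str, created_by: str) -> SchemaVersion:
        # Creating a schema whose name already exists saves a new version
        # instead of failing.
        history = self._versions.setdefault(name, [])
        revision = SchemaVersion(
            version=len(history) + 1,
            definition=definition,
            created_by=created_by,
            created_at=datetime.now(timezone.utc),
        )
        history.append(revision)
        return revision

    def latest(self, name: str) -> SchemaVersion:
        # What a topic associated with `name` would start using.
        return self._versions[name][-1]

    def describe(self, name: str) -> List[SchemaVersion]:
        # Version history with metadata (when created, who by).
        return list(self._versions[name])


registry = SchemaRegistry()
registry.create("my-schema", '{"fields": ["id"]}', created_by="alice")
registry.create("my-schema", '{"fields": ["id", "email"]}', created_by="bob")

for revision in registry.describe("my-schema"):
    print(revision.version, revision.created_by, revision.created_at.isoformat())
```

A real implementation would presumably also run a compatibility check inside `create` before accepting the new version; that is left out here to keep the sketch focused on versioning.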
Do you feel this is something you are likely to work on, and if so do you have an idea when you might look at this (i.e. is it weeks, months, quarters away)?
We’re actively looking at applying schemas to our PubSub data and would love to get an idea if this is something we can expect to have in the future, or should we invest in alternatives (i.e. maybe not using PubSub schemas, maybe using a registry with PubSub, maybe looking more seriously at Kafka and its ecosystem, etc).
@andrewjones Apologies for the silence. Yes, your answer makes sense. We are actively working on the design for this. I hope to have this out in GA within 3-6 months, but at the moment cannot make a firm commitment.
A major usability question we are working through now is whether it is better to have schemas be immutable and have topics manage versioning, as an alternative to what you proposed. Meaning, if you wanted to evolve a version of mySchema_v0.1 associated with myTopic, you would create mySchema_v0.2 as a distinct object and associate it with the same topic. The topic would then have a history of the schemas it has supported and enforce backwards compatibility of mySchema_v0.2 with mySchema_v0.1. What would be the implication of this version of evolution for your designs?
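A minimal sketch of that alternative, assuming a simplified model in which a schema is just a named, immutable list of fields that either have a default or not (field types are ignored). `Topic.attach`, `reads_old_data` and `IncompatibleSchemaError` are hypothetical names for illustration, not an existing Pub/Sub API.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Schema:
    """An immutable schema object (hypothetical, simplified model)."""
    name: str
    # (field_name, has_default) pairs; field types are ignored in this sketch.
    fields: Tuple[Tuple[str, bool], ...]


def reads_old_data(new: Schema, old: Schema) -> bool:
    """Backward compatibility: `new` can decode data written with `old`."""
    old_names = {name for name, _ in old.fields}
    return all(has_default or name in old_names for name, has_default in new.fields)


class IncompatibleSchemaError(ValueError):
    """Raised when a schema would break data already on the topic."""


@dataclass
class Topic:
    name: str
    schema_history: List[Schema] = field(default_factory=list)

    def attach(self, schema: Schema) -> None:
        # The topic, not the schema, owns the version history and enforces
        # compatibility against every schema it has supported so far.
        for old in self.schema_history:
            if not reads_old_data(schema, old):
                raise IncompatibleSchemaError(
                    f"{schema.name} cannot read data written with {old.name}"
                )
        self.schema_history.append(schema)


topic = Topic("myTopic")
topic.attach(Schema("mySchema_v0.1", (("id", False),)))
topic.attach(Schema("mySchema_v0.2", (("id", False), ("email", True))))

try:
    # A new field without a default cannot be filled in for older messages.
    topic.attach(Schema("mySchema_v0.3", (("id", False), ("age", False))))
except IncompatibleSchemaError as err:
    print("rejected:", err)
```

In this shape each schema stays a distinct, immutable object, and the question of what counts as version N+1 moves to the topic's history.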
If you’d be open to discussing this live, please send me a note at kir@google.com
@jredl @dwalker-va Might you be open to discussing your views on this live? If so, send me a note at kir@google.com. I’d be grateful for a deeper discussion.
Thanks for the response @KirTitievsky. That’s disappointing. I realise you can’t give a firm date and I don’t know what other priorities pushed this back, but will this be a priority for you and your team next year? If not, and you think this will be pushed again, we will need to discuss moving to another solution, as this is a pretty large inconvenience for our teams.
Hi @KirTitievsky - I sent a message to the email you linked to last month but haven’t received a response yet. Happy to still chat about this as it’s still an issue for our customers and could be a blocker for pubsub to be a long term solution for us.
Just wondering if there have been any updates on this feature, and if there is a rough ETA you can share?
We’ve continued to build on Pub/Sub Schemas and it’s going well, but the lack of compatible updates to schemas does mean that every schema change requires quite a big migration, which is quite a lot of effort if (say) you’re just adding a new field. It has the potential to slow our teams down and/or make them reluctant to structure their data in the first place.