Correction 4/3/2024:
The message ID in Google Cloud Pub/Sub is indeed assigned by the service itself when a message is successfully published. This message ID is guaranteed to be unique within the topic for the lifetime of the topic.
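However, the message ID identifies a single successful publish, not the content of the message, so it is of limited use for end-user deduplication. Redeliveries of the same published message (which can happen under “at least once” delivery) carry the same message ID. But if the same message is published again (for example, because the publisher retried after an error or a timeout), Pub/Sub treats it as a new message and assigns it a new ID. The message ID therefore cannot be used to detect duplicates that were created on the publishing side.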
Pub/Sub also provides an optional ordering key (ordering_key) on messages, but it is not a deduplication mechanism. Its purpose is to ensure that messages sharing the same key are delivered in the order in which Pub/Sub received them, when message ordering is enabled on the subscription. A message that is published again with the same ordering key and data is still treated as a new message.
Please note: Google Pub/Sub does offer a feature known as “exactly once delivery” (you can read more about it here: https://cloud.google.com/pubsub/docs/exactly-once-delivery). When it is enabled on a subscription, a message is not redelivered after it has been successfully acknowledged, or while it is still outstanding (that is, before its acknowledgment deadline has expired). This makes acknowledgments and redelivery more reliable, but it is based on the message ID assigned by Pub/Sub, so it does not remove duplicates that were published more than once.
That said, maintaining your own unique identifier for each message could help with deduplication on the subscriber side, particularly for a legacy application that may not handle redelivery well. You could include this unique identifier in the message data or attributes, and then have the subscriber application check this identifier against a local cache or database to see if the message has been processed before.
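As a rough illustration of the idea above, here is a minimal Python sketch. It does not call the Pub/Sub client library: the Message class is a hypothetical stand-in for a received message, and the attribute name dedup_id is an assumption chosen for this example. The in-memory set only works for a single subscriber process; a real deployment would need a shared store.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Message:
    """Hypothetical stand-in for a received Pub/Sub message."""
    data: bytes
    attributes: dict = field(default_factory=dict)


def new_message(data: bytes) -> Message:
    # The publisher generates the ID once and must reuse it on every retry,
    # so that all copies of the same logical message share one dedup_id.
    return Message(data=data, attributes={"dedup_id": str(uuid.uuid4())})


class DedupingHandler:
    def __init__(self):
        self.seen = set()      # in-memory only; assumed single subscriber process
        self.processed = []    # payloads that were actually handled

    def handle(self, message: Message) -> bool:
        """Return True if the message was processed, False if it was a duplicate."""
        dedup_id = message.attributes["dedup_id"]
        if dedup_id in self.seen:
            return False
        self.processed.append(message.data)  # placeholder for the real work
        self.seen.add(dedup_id)              # record the ID only after success
        return True
```

In a real subscriber you would still acknowledge a message that was skipped as a duplicate, otherwise Pub/Sub keeps redelivering it.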
This approach, however, has its own complexities and potential pitfalls. For example, you would need to ensure that the cache or database used for checking identifiers is highly available and consistent, to prevent processing duplicate messages or missing messages in case of errors or failures. Also, you would need to consider how to manage the size and lifetime of the data stored in this cache or database, to prevent it from growing indefinitely.
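One possible way to keep such a store from growing indefinitely is to give each identifier a time-to-live and to cap the number of entries. The sketch below is an in-memory illustration only; the class name SeenIds and its parameters are made up for this example, and it does nothing about availability or consistency across several subscribers.

```python
import time
from collections import OrderedDict


class SeenIds:
    """Remembers IDs for ttl_seconds and holds at most max_size entries."""

    def __init__(self, ttl_seconds: float, max_size: int, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.max_size = max_size
        self.clock = clock            # injectable so the behavior can be tested
        self._expiry = OrderedDict()  # id -> expiry time, oldest insertion first

    def _evict(self, now: float) -> None:
        # Entries are inserted in time order with a fixed TTL, so the
        # oldest entry is always the first one to expire.
        while self._expiry:
            oldest_id, expires_at = next(iter(self._expiry.items()))
            if expires_at > now and len(self._expiry) <= self.max_size:
                break
            del self._expiry[oldest_id]

    def check_and_add(self, dedup_id: str) -> bool:
        """Return True if dedup_id is new (and record it), False if already seen."""
        now = self.clock()
        self._evict(now)
        if dedup_id in self._expiry:
            return False
        self._expiry[dedup_id] = now + self.ttl_seconds
        self._evict(now)  # enforce the size cap after the insert
        return True
```

The trade-off is that an identifier dropped because of the size cap or the TTL is forgotten, so a late duplicate of that message would be processed again. The TTL should therefore be at least as long as the window in which you expect duplicates to arrive.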
So, while it’s possible to manage your own unique identifiers for messages in Pub/Sub, it’s not a straightforward solution and requires careful consideration of the trade-offs and potential issues. The best solution would depend on the specific requirements and constraints of your application and system architecture.