I’ve got a project to upload millions of files to Google Cloud Storage.
Total objects: 100 million+, total size: 14 TB.
My original plan was to use a 40 TB Google Transfer Appliance for this task. This would let me treat it as a portable NFS server: load the files onto it, ship it to Google, and have Google load the data into GCS. My primary motivation was how simple it is to get data onto the appliance, and avoiding any impact on our network bandwidth from moving the data into the cloud.
Then I found the “known limitation” that Transfer Appliances only accept files >= 1 MB in size. Our files are mostly < 1 MiB, so this seems to rule the appliance out.
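For a rough sense of scale, here's the back-of-envelope math on the figures above. It's a minimal Python sketch, assuming decimal units (1 TB = 10^12 bytes) and exactly 100 million objects:

```python
# Rough average object size for the dataset described above.
# Assumption: decimal units (1 TB = 10**12 bytes) and exactly 100 million objects.
# The real count is "100 million+", so the true average is at most this value.
TOTAL_BYTES = 14 * 10**12
OBJECT_COUNT = 100 * 10**6


def average_object_size(total_bytes: int, count: int) -> float:
    """Return the mean object size in bytes."""
    return total_bytes / count


avg = average_object_size(TOTAL_BYTES, OBJECT_COUNT)
print(f"average object size: {avg / 1000:.0f} KB")
print(f"fraction of the 1 MB threshold: {avg / 10**6:.2f}")
```

That works out to an average of about 140 KB per object, roughly 0.14 of the 1 MB threshold, and since the real count is "100 million+", 140 KB is an upper bound on the average.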
I’m now pivoting to Google’s Storage Transfer Service to manage the uploads, transfer multiple files in parallel, and cap bandwidth so the transfer doesn’t disrupt daily operations on my network.
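To get a feel for whether a throttled transfer is practical, here's a rough time estimate. With files this small, the number of objects moved per second can limit throughput as much as raw bandwidth, so the sketch checks both limits. The 500 Mbit/s cap and the 1,000 objects/s rate are hypothetical placeholders, not measured values or documented service limits:

```python
# Back-of-envelope transfer time for 14 TB / 100M objects under two limits:
# a bandwidth cap and a per-object throughput ceiling.
# Both limit values below are HYPOTHETICAL placeholders, not service figures.


def hours_for_bandwidth(total_bytes: int, cap_bits_per_s: float) -> float:
    """Hours to move total_bytes if throughput is pinned at the bandwidth cap."""
    return total_bytes * 8 / cap_bits_per_s / 3600


def hours_for_object_rate(object_count: int, objects_per_s: float) -> float:
    """Hours to move object_count objects at a fixed objects-per-second rate."""
    return object_count / objects_per_s / 3600


TOTAL_BYTES = 14 * 10**12
OBJECT_COUNT = 100 * 10**6
CAP_BITS_PER_S = 500 * 10**6  # hypothetical 500 Mbit/s bandwidth cap
OBJECTS_PER_S = 1_000  # hypothetical aggregate rate across all workers

bw_hours = hours_for_bandwidth(TOTAL_BYTES, CAP_BITS_PER_S)
obj_hours = hours_for_object_rate(OBJECT_COUNT, OBJECTS_PER_S)

# Whichever limit is slower dominates the wall-clock time.
print(f"bandwidth-bound: {bw_hours:.1f} h, object-rate-bound: {obj_hours:.1f} h")
print(f"estimate: {max(bw_hours, obj_hours) / 24:.1f} days")
```

With these placeholder values the bandwidth cap dominates (about 62 hours, roughly 2.6 days). At a lower per-object rate, say 200 objects/s, the object-rate bound rises to about 139 hours and becomes the bottleneck instead.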
I’d love to hear from others about this. Am I right that a Transfer Appliance can’t handle files this small? Will I have success with Transfer Service, and does anyone have recommendations for a workload like mine?
Thanks