So! After countless retries, and with the help of @DarwinVinoth, I managed to download the whole content of my Google Takeout Bucket: 1.2 TB over a 40 Mbps connection, so about 70 hours of download time.
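As a sanity check on that figure, here is a minimal sketch of the arithmetic. The function name `estimate_hours` is made up for illustration, and it assumes decimal units and a link that stays fully used the whole time, so the result is a lower bound rather than a promise.

```shell
#!/usr/bin/env bash
# estimate_hours SIZE_GB SPEED_MBPS
# Hypothetical helper: prints the transfer time in hours.
# Assumes decimal units (1 GB = 8000 megabits) and a constant,
# fully used link, so real downloads will take somewhat longer.
estimate_hours() {
  local size_gb=$1   # data size in gigabytes
  local mbps=$2      # sustained download speed in megabits per second
  # size in megabits / speed = seconds; divide by 3600 for hours
  LC_ALL=C awk -v gb="$size_gb" -v mbps="$mbps" \
    'BEGIN { printf "%.1f\n", gb * 8000 / mbps / 3600 }'
}

estimate_hours 1200 40   # 1.2 TB over 40 Mbps, prints 66.7
```

The theoretical 66.7 hours is close to the 70 hours observed once retries and overhead are counted.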
Unfortunately, Google Cloud does not seem to be configured to keep access open that long, even with the different tricks provided by @DarwinVinoth. That’s a shame because I think a lot of people like me have a “low-speed internet connection”. Here in Quebec, 50 Mbps down / 10 Mbps up is considered high-speed internet when you are outside an urban centre.
Nevertheless, I will write exactly what I did in the hope that it serves somebody in the future.
Basically, the trick is to use “storage rsync” instead of “storage cp” to be able to resume downloads after the connection drops, and to run a cron job to keep the connection alive as long as possible (24 hours max in my experience).
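Because rsync skips what is already on disk, the resume part of the trick can be automated by simply re-running the same command when it fails. Below is a minimal sketch, not something I ran during the original download. `retry_until_success` is a hypothetical helper, and it assumes gcloud exits with a non-zero status when the transfer is interrupted.

```shell
#!/usr/bin/env bash
# retry_until_success MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper (not part of google-cloud-cli): re-runs COMMAND
# until it exits with status 0 or MAX_ATTEMPTS is reached.
retry_until_success() {
  local max_attempts=$1 delay=$2
  shift 2
  local attempt=1
  while true; do
    if "$@"; then
      return 0                      # command succeeded, stop retrying
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "Giving up after $attempt attempts" >&2
      return 1
    fi
    echo "Attempt $attempt failed, retrying in ${delay}s" >&2
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Example usage (bucket name elided exactly as in this post):
# retry_until_success 50 60 gcloud storage rsync --continue-on-error --recursive \
#   "gs://takeout-export-***-***-***-***-***" .
```

A retry loop like this only helps with dropped connections. Once the login itself has expired, every attempt will fail until you authenticate again by hand.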
I’m not sure all the commands below are necessary, so please comment to improve the following how-to.
—How to download large amount of data exported via Google Workspace Takeout via the Google Cloud Bucket—
- Start the Google Takeout export in Google Workspace Admin
- Get your Google Cloud Bucket link to download the data when the export is finished
- Install google-cloud-cli for Linux (snap package or otherwise; see Google Workspace Admin Help - Export all your organization’s data)
- Then we need to run some commands :
1. Open your terminal
2. Go to the folder where you want to dump the data
3. Authenticate with google-cloud-cli auth login command
gcloud auth login
This will give you access to the Google Cloud Bucket with google-cloud-cli
4. Authenticate with auth application-default login
gcloud auth application-default login
5. Start the data dump with verbose mode activated for feedback during the operation.
Note that gs://takeout-export-**** must be replaced by the provided google cloud bucket link.
Also note that the “.” at the end means the data is dumped into the current folder. It can be replaced by any other folder, such as /media/DATA.
Note the usage of “rsync” instead of “cp”, so that an interrupted download can resume.
gcloud storage rsync --continue-on-error --recursive --log-http --verbosity=debug "gs://takeout-export-***-***-***-***-***" .
6. In another terminal window, start the “cron job” (really a simple shell loop) to keep the connection alive
while true; do gcloud auth print-access-token > token.txt; sleep 3000; done  # Refresh every 50 minutes
7. Monitor the operation, especially the print-access-token job from step 6.
If you ever see
Reauthentication required.
You need to run the whole operation again from the start. Press CTRL+C to kill the running commands, then restart from the authentication steps (3 and 4). Usually, the download will resume from where it stopped.
8. Hopefully, after a long, long time (70 hours for me!), you will have downloaded the whole content of your Google Takeout Cloud Bucket!
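Step 7 depends on spotting the message by eye, which is easy to miss over several days. Here is a minimal sketch of a check you could run against a saved log instead. `needs_reauth` and the file name `rsync.log` are assumptions for illustration. The message text is the one quoted in step 7.

```shell
#!/usr/bin/env bash
# needs_reauth LOGFILE
# Hypothetical helper: exits 0 if LOGFILE contains the message from
# step 7, and non-zero otherwise.
needs_reauth() {
  grep -q "Reauthentication required" "$1"
}

# Example usage (rsync.log is an assumed file name):
# gcloud storage rsync --continue-on-error --recursive \
#   "gs://takeout-export-***-***-***-***-***" . 2>&1 | tee rsync.log
# if needs_reauth rsync.log; then
#   echo "Login expired: redo steps 3 and 4, then run rsync again"
# fi
```

With the debug verbosity from step 5 the log can get large, so you may prefer to drop those flags when saving the output to a file.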
Oh, by the way, Google Devs, if you are listening to this, why not provide a simple tool to download something as big as a Google Takeout Bucket? Or a simple interface to stay logged into that job and be able to download for as long as it takes, even a few days?
Maybe I’m getting old, but downloading large files over a few days was something I did back in the 2000s.
I wish you the best of luck!