Why can't the GPU session save a checkpoint before it reaches the quota limit? Many epochs of training are lost before the next session starts running, which wastes a lot of GPU time. I really hope you can improve this. Thank you!
Regarding your concern, I suggest creating a feature request.
Here is the direct link for creating a GPU feature request in the Public Issue Tracker.