I am trying to create a Batch job that mounts a GCS bucket.
I have configured logs to go to Cloud Logging.
This job moves directly from SCHEDULED to FAILED. There are no logs in Cloud Logging, and gcloud batch jobs describe reports no error messages or descriptions for taskEvents:
status:
  runDuration: 0s
  state: FAILED
  statusEvents:
  - description: Job state is set from QUEUED to SCHEDULED for [REDACTED]
    type: STATUS_CHANGED
  - description: Job state is set from SCHEDULED to FAILED for [REDACTED]
    eventTime: '2023-01-06T21:11:26.029319834Z'
    type: STATUS_CHANGED
  taskGroups:
    group0:
      counts:
        FAILED: '1'
      instances:
      - machineType: n2-standard-8
        provisioningModel: STANDARD
        taskPack: '1'
Are there any other ways to get additional debugging information for why this job failed?
^ The failure above appears to be caused by a misspelled bucket name.
After correcting it I now get task logs in Cloud Logging, but there is no useful output beyond "runnable 3 exited with error code 1". Runnable 3 in this case seems to be the script that attempts to mount the bucket with gcsfuse.
Currently I only get errors like the following (textPayload):
{
task action/STARTUP/0/0/group0 runnable 3 wait error: exit status 1
},
{
task action/STARTUP/0/0/group0 runnable 3 execution err: command failed with exitCode 1
},
{
task action/STARTUP/0/0/group0: failed runnables [3]
}
Any way I can get useful error messages as to why my bucket mounts are failing?
Ok I figured out the issue. I missed the bit in the documentation that states “The path must start with /mnt/disks/”.
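For anyone else who hits this, here is a minimal sketch of a pre-submission check for that rule. It assumes the job is described as JSON with the Batch fields taskGroups[].taskSpec.volumes[].mountPath and a gcs entry per bucket volume. The function name bad_mount_paths and the example paths are hypothetical, not part of any Google tool.

```python
import json

REQUIRED_PREFIX = "/mnt/disks/"  # prefix quoted from the documentation


def bad_mount_paths(job_config):
    """Return mountPath values of GCS volumes that lack the required prefix."""
    bad = []
    for group in job_config.get("taskGroups", []):
        volumes = group.get("taskSpec", {}).get("volumes", [])
        for volume in volumes:
            if "gcs" not in volume:
                continue  # this sketch only checks bucket mounts
            mount_path = volume.get("mountPath", "")
            if not mount_path.startswith(REQUIRED_PREFIX):
                bad.append(mount_path)
    return bad


# Hypothetical job config: the first volume breaks the rule, the second is fine.
example = json.loads("""
{
  "taskGroups": [
    {
      "taskSpec": {
        "volumes": [
          {"gcs": {"remotePath": "MY-BUCKET"}, "mountPath": "/data/MY-BUCKET"},
          {"gcs": {"remotePath": "MY-BUCKET"}, "mountPath": "/mnt/disks/MY-BUCKET"}
        ]
      }
    }
  ]
}
""")

print(bad_mount_paths(example))
```

Running something like this before gcloud batch jobs submit would catch the bad path locally instead of surfacing it as an opaque exit code 1.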
First, it would be helpful not to have this requirement. Is there any reason it cannot be removed?
Second, the agent script mounts the bucket using gcsfuse with the following command:
gcsfuse -o allow_other -file-mode=777 -dir-mode=777 -implicit-dirs MY-BUCKET /mnt/disks/MY-BUCKET > /dev/null 2>&1
It would definitely be helpful to remove the redirection of stdout and stderr to /dev/null. Errors would then show up in batch_agent_logs and help with troubleshooting.
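Until that changes, one workaround is to rerun the same command by hand on the VM and keep its output. The sketch below is only an illustration: run_visible is a hypothetical helper, GCSFUSE_CMD copies the arguments from the command above without the redirect, and the demo uses a stand-in command that fails on purpose so the snippet runs on a machine without gcsfuse.

```python
import subprocess
import sys

# Arguments copied from the agent's command above, minus "> /dev/null 2>&1".
GCSFUSE_CMD = [
    "gcsfuse", "-o", "allow_other", "-file-mode=777", "-dir-mode=777",
    "-implicit-dirs", "MY-BUCKET", "/mnt/disks/MY-BUCKET",
]


def run_visible(cmd):
    """Run cmd and return (exit_code, combined stdout and stderr)."""
    try:
        result = subprocess.run(
            cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
        )
    except FileNotFoundError as exc:
        # The binary is not installed; report that instead of raising.
        return 127, str(exc)
    return result.returncode, result.stdout


# Stand-in for a failing mount: writes to stderr and exits with code 1.
demo_cmd = [
    sys.executable, "-c",
    "import sys; print('simulated mount error', file=sys.stderr); sys.exit(1)",
]
code, output = run_visible(demo_cmd)
print(code, output.strip())
```

On the Batch VM itself, calling run_visible(GCSFUSE_CMD) should return whatever gcsfuse prints when the mount fails, which is the information the redirect currently hides.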
I encountered this issue today while using Google Batch. Apparently gcsfuse does not work from within the container if I run it as part of a Docker entrypoint script to mount bucket directories. Instead, I soft-linked the directories from /mnt/disks/MY_BUCKET. Thanks for your post, it helped me debug the issue.
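A rough sketch of the soft-link approach, assuming Batch has already mounted the bucket at /mnt/disks/MY_BUCKET and that this path is visible inside the container. The helper link_bucket_dir and the target paths are hypothetical, and the demo uses a temporary directory in place of the real mount so it runs anywhere.

```python
import os
import tempfile


def link_bucket_dir(mounted_dir, link_path):
    """Create a symlink at link_path pointing to mounted_dir; return the resolved target."""
    parent = os.path.dirname(link_path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    if os.path.islink(link_path):
        os.remove(link_path)  # replace a stale link left by a previous run
    os.symlink(mounted_dir, link_path)
    return os.path.realpath(link_path)


# Demo with a temporary directory standing in for /mnt/disks/MY_BUCKET.
with tempfile.TemporaryDirectory() as tmp:
    mounted = os.path.join(tmp, "mnt", "disks", "MY_BUCKET")
    os.makedirs(mounted)
    link = os.path.join(tmp, "app", "data")
    resolved = link_bucket_dir(mounted, link)
    print(os.path.islink(link), resolved == os.path.realpath(mounted))
```

In an entrypoint this would be called with the real mount path and whatever directory the application expects, in place of running gcsfuse inside the container.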