Dataproc v2.1.0 Ranger w/ GCS Plugin Breaks Startup Script

The startup script fails with the following error:

<13>Jan 28 01:10:32 startup-script[1367]: <13>Jan 28 01:10:32 activate-component-ranger[5106]: + chmod 400 /etc/security/keytab/gcs.service.keytab
<13>Jan 28 01:10:32 startup-script[1367]: <13>Jan 28 01:10:32 activate-component-ranger[5106]: + copy_ranger_gcs_plugin_auth_service_artifacts
<13>Jan 28 01:10:32 startup-script[1367]: <13>Jan 28 01:10:32 activate-component-ranger[5106]: + mkdir -p /usr/lib/dataproc-ranger-gcs-plugin/lib
<13>Jan 28 01:10:32 startup-script[1367]: <13>Jan 28 01:10:32 activate-component-ranger[5106]: + chmod 751 /usr/lib/dataproc-ranger-gcs-plugin/lib
<13>Jan 28 01:10:32 startup-script[1367]: <13>Jan 28 01:10:32 activate-component-ranger[5106]: + rm -rf '/usr/lib/dataproc-ranger-gcs-plugin/lib/*'
<13>Jan 28 01:10:32 startup-script[1367]: <13>Jan 28 01:10:32 activate-component-ranger[5106]: + cp '/usr/lib/dataproc-ranger-gcs-plugin/ranger-gcs-authorization-service-*.jar' /usr/lib/dataproc-ranger-gcs-plugin/lib/ranger-gcs-plugin-authorization-server.jar
<13>Jan 28 01:10:32 startup-script[1367]: <13>Jan 28 01:10:32 activate-component-ranger[5106]: cp: cannot stat '/usr/lib/dataproc-ranger-gcs-plugin/ranger-gcs-authorization-service-*.jar': No such file or directory
<13>Jan 28 01:10:32 startup-script[1367]: <13>Jan 28 01:10:32 activate-component-ranger[5106]: + exit_code=1

The trace shows the `cp` glob being passed through unexpanded, which suggests no `ranger-gcs-authorization-service-*.jar` exists on the 2.1 image at all. When I downgrade to image version 2.0, the cluster boots successfully.

@tim_bess

I've faced this issue with startup scripts before. Try putting the required packages and dependencies in a Cloud Storage bucket, then downloading them from the bucket using gsutil in your startup script.
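A minimal sketch of that approach (the bucket name, staging path, and target directory below are placeholders, not actual Dataproc artifacts):

```shell
#!/bin/bash
# Sketch: pull pre-staged dependencies from a GCS bucket at boot,
# instead of relying on artifacts bundled in the image.
# gs://my-bucket/deps and /opt/cluster-deps are placeholders.
set -euxo pipefail

STAGING="gs://my-bucket/deps"
TARGET="/opt/cluster-deps"

mkdir -p "${TARGET}"
# -m parallelizes the copy; -r recurses into the staging prefix.
gsutil -m cp -r "${STAGING}/*" "${TARGET}/"
chmod -R 755 "${TARGET}"
```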

@VishalBulbule I considered doing that, but I can’t seem to find those ranger gcs jars anywhere. Do you remember where you were able to download them from?

@tim_bess

I can't recall exactly, but I found some documentation — check whether it helps:

https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/ranger-plugin

Welp, I went into the 2.0 cluster, downloaded the jars, put them in a bucket, then added an initialization script that downloads them into the directory… But the initialization script only seems to run on the master node, and all the workers fail now.
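For reference, Dataproc initialization actions are documented to run on every node by default; a common pattern is to branch on the node's `dataproc-role` metadata when master and worker behavior should differ. A rough sketch (the bucket path is a placeholder; the plugin directory mirrors the one from the log above):

```shell
#!/bin/bash
set -euxo pipefail

# Initialization actions run on every node; the dataproc-role metadata
# value distinguishes master from worker if they need different handling.
ROLE=$(/usr/share/google/get_metadata_value attributes/dataproc-role)

PLUGIN_DIR="/usr/lib/dataproc-ranger-gcs-plugin"
mkdir -p "${PLUGIN_DIR}"
# gs://my-bucket is a placeholder for wherever the jars were staged.
gsutil -m cp "gs://my-bucket/ranger-gcs-plugin/*.jar" "${PLUGIN_DIR}/"

if [[ "${ROLE}" == "Master" ]]; then
  # Master-only setup would go here.
  :
fi
```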

@VishalBulbule FYI, for some reason or another I couldn’t get my initialization script to run on the workers. I did end up getting it working by:

  1. Standing up a dataproc 2.0 cluster.
  2. Uploading /usr/lib/dataproc-ranger-gcs-plugin to GCS.
  3. Writing a script to download it into the right directory.
  4. Using that script to build a custom dataproc base image.
  5. Creating a 2.1 cluster with said image.
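The steps above can be sketched roughly as follows. Cluster names, the bucket, the zone/region, and the image name are all placeholders; the image build assumes the `generate_custom_image.py` tool from Google's `GoogleCloudDataproc/custom-images` repository:

```shell
# 2. On the 2.0 master node: stage the plugin directory in GCS.
gsutil -m cp -r /usr/lib/dataproc-ranger-gcs-plugin \
    gs://my-bucket/ranger-gcs-plugin-2.0/

# 3. Customization script that restores the jars during the image build.
cat > customization.sh <<'EOF'
#!/bin/bash
set -euxo pipefail
mkdir -p /usr/lib/dataproc-ranger-gcs-plugin
gsutil -m cp -r "gs://my-bucket/ranger-gcs-plugin-2.0/*" \
    /usr/lib/dataproc-ranger-gcs-plugin/
EOF

# 4. Build a custom 2.1 base image with the script baked in.
python generate_custom_image.py \
    --image-name dataproc-2-1-ranger-gcs \
    --dataproc-version 2.1-debian11 \
    --customization-script customization.sh \
    --zone us-central1-a \
    --gcs-bucket gs://my-bucket

# 5. Create the 2.1 cluster from the custom image.
gcloud dataproc clusters create my-cluster \
    --region us-central1 \
    --image dataproc-2-1-ranger-gcs
```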

I still need to integrate it with the metastore and a separate analysis cluster to confirm it fully works, but the cluster at least boots and shows the plugin in the Ranger UI now.