Exposing an external data connection

Hi,

I’ve been experimenting with the Spark integration with BigQuery.

I created remote procedures and was investigating whether I could share these procedures.

I noticed that the procedures I created are bound to the external data connection to Spark in my own project, so I could not share my procedures for others to use.

Is there a way to configure the remote procedure so that it uses the caller's own external data connection to Spark instead?

Instead of having others copy my script and then create their own remote procedures, is there a way for others to simply call my remote procedures without having to set up the external data connection to Spark themselves?

Thank you.

Hi,

Can you provide your reproduction steps? Can you provide your sample code? Are you using Dataproc?

Thanks

Within my own organization, I created an external data connection to Apache Spark.

I created some PySpark scripts and placed them in Cloud Storage.

I then created a public dataset shared with allUsers, and created a remote procedure linking my PySpark script to my external data connection.
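For context, the setup above boils down to a CREATE PROCEDURE statement whose WITH CONNECTION clause is fixed at creation time, which is why callers always run against the creator's connection. Below is a sketch of that DDL, built as a string; all project, dataset, connection, bucket, and procedure names are hypothetical placeholders, not my actual resources.

```python
# Sketch of the DDL behind a BigQuery remote procedure for Spark.
# Every identifier here is a placeholder for illustration only.
project = "my-project"                                 # hypothetical project ID
dataset = "public_dataset"                             # hypothetical shared dataset
connection = f"{project}.us.my-spark-connection"       # hypothetical connection ID

ddl = f"""
CREATE OR REPLACE PROCEDURE `{project}.{dataset}.hello_spark`()
WITH CONNECTION `{connection}`
OPTIONS (engine='SPARK', main_file_uri='gs://my-bucket/hello.py')
LANGUAGE PYTHON
"""
print(ddl)
```

Because the connection reference is baked into the procedure definition rather than resolved per caller, sharing the dataset alone does not give outside callers a usable connection.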

My colleague, who is in a different organization, is able to call my public remote procedure. However, he encounters an error saying that he doesn't have access to my external data connection to Apache Spark.

I believe this is because, when constructing the remote procedure, you have to specify which connection to use, and external data connections don't seem to have an option to be made publicly available.

So I was wondering if there are alternative approaches or, better yet, established best practices for exposing a remote procedure for people outside the organization to call.

Since this is BigQuery with Apache Spark, it runs on Dataproc Serverless batches, with a custom container image to set up my Python script.

I posted a reply to your question, but I mistakenly put it in this thread's comment section instead.

Hi,

Thank you for the response. Since the issue you are facing comes from creating a public remote procedure, which certainly has code behind it, I believe the best place to get help is Stack Overflow, using the [google-cloud-dataproc] tag. This will allow more users to see the post and could help you find better solutions.

Hi,

Thank you for the advice on posting to Stack Overflow. I have done so, but there has been no response so far.

For the post, I created a simple “Hello World” script, so that no custom image was needed. This is a plain Apache Spark connection with the default image.

I hope it is possible to shed some light on this issue now that it has been simplified further.