Trigger Postgres stored procedure using PySpark

Hi Folks,

I have created a stored procedure in PostgreSQL named **"source".add_missing_columns**. I want to trigger it using PySpark or Python code.

Below is the code I'm trying to execute, but I'm getting an error:

```python
from pyspark.sql import SparkSession

# Create a SparkSession
spark = SparkSession.builder \
    .appName("Trigger PostgreSQL Stored Procedure") \
    .getOrCreate()

# Define PostgreSQL connection properties
url = "jdbc:postgresql://[HOST]:[PORT]/[DATABASE]"
properties = {
    "user": "[USERNAME]",
    "password": "[PASSWORD]",
    "driver": "org.postgresql.Driver"
}

# Define the name of the stored procedure
procedure_name = '"Stg".add_missing_columns'

# Define the call to the stored procedure with parameters
procedure_call = "CALL " + procedure_name + "('InLd', 'InStg', 'AnalytEDW')"

# Print the SQL query statement
print("SQL Query Statement:", procedure_call)

# Execute the stored procedure using a SQL query
spark.read.jdbc(url=url, table="(SELECT " + procedure_call + ")", properties=properties)

# Stop the SparkSession
spark.stop()
```

Output:

```
SQL Query Statement: CALL "StgEntr".add_missing_columns('InLd', 'InStg', 'AnalytEDW')
```

Error:

```
py4j.protocol.Py4JJavaError: An error occurred while calling o69.jdbc.
: org.postgresql.util.PSQLException: ERROR: syntax error at or near "."
```

Has anyone run into this? Any leads are much appreciated!

Thanks,

Vigneswar Jeyaraj

The error message you're encountering, `ERROR: syntax error at or near "."`, indicates that the procedure name and call are not being parsed the way you intend. Two things work against you here: mixed-case schema names like "Stg" must be double-quoted or Postgres folds them to lower case, and `spark.read.jdbc` wraps its `table` argument in a subquery (effectively `SELECT * FROM (...) alias`), so a `CALL` statement can never execute through that API. A `CALL` has to go over a plain JDBC (or Python driver) connection instead.
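To make the identifier quoting robust, you can quote each part of the schema-qualified name explicitly. A small sketch — the helper name is mine, not from the original post:

```python
def quote_qualified_name(schema: str, name: str) -> str:
    """Double-quote each identifier part so mixed-case names such as
    "Stg" survive Postgres's default folding to lower case."""
    def q(ident: str) -> str:
        # Escape any embedded double quotes per SQL rules, then wrap
        return '"' + ident.replace('"', '""') + '"'
    return f"{q(schema)}.{q(name)}"

procedure_call = (
    f"CALL {quote_qualified_name('Stg', 'add_missing_columns')}"
    "('InLd', 'InStg', 'AnalytEDW')"
)
print(procedure_call)
# CALL "Stg"."add_missing_columns"('InLd', 'InStg', 'AnalytEDW')
```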

Whichever route you take, be mindful of connection management: make sure connections are always closed to avoid resource leaks, for example with a `try`/`finally` block (or a pooling library). See the following code as an example:

```python
from pyspark.sql import SparkSession

# Initialize a Spark session
spark = SparkSession.builder \
    .appName("Trigger PostgreSQL Stored Procedure") \
    .getOrCreate()

# Define PostgreSQL connection properties
url = "jdbc:postgresql://[HOST]:[PORT]/[DATABASE]"
properties = {
    "user": "[USERNAME]",
    "password": "[PASSWORD]",
    "driver": "org.postgresql.Driver"
}

# Correctly quote the schema and procedure name if necessary
procedure_name = '"source".add_missing_columns'

# Construct the SQL command to call the stored procedure with parameters
procedure_call = f"CALL {procedure_name}('InLd', 'InStg', 'AnalytEDW')"

# Display the SQL command to be executed
print("SQL Query Statement:", procedure_call)

# Try executing the stored procedure using Spark SQL
try:
    spark.sql(procedure_call)
    print("Procedure executed successfully via Spark SQL")
except Exception as e:
    print("Failed via Spark SQL. Error:", e)

# If Spark SQL fails, attempt direct JDBC execution.
# Note: java.sql objects reached through py4j are not Python context
# managers, so close them explicitly rather than using `with`.
conn = None
try:
    conn = spark._jvm.java.sql.DriverManager.getConnection(
        url, properties["user"], properties["password"])
    stmt = conn.prepareCall(procedure_call)
    stmt.execute()  # Execute the procedure without fetching results
    stmt.close()
    print("Procedure executed successfully via direct JDBC")
except Exception as jdbc_error:
    print("Failed via direct JDBC. Error:", jdbc_error)
finally:
    if conn is not None:
        conn.close()

# Clean up by stopping the Spark session
spark.stop()
```
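If Spark isn't strictly required for this step, a plain Postgres driver such as psycopg2 avoids the JDBC detour entirely. A minimal sketch, assuming psycopg2 is installed and connection details arrive via hypothetical `PG_*` environment variables — adjust both to your setup:

```python
import os

# Statement with %s placeholders; the driver substitutes the
# arguments safely, so no manual quoting of the values is needed.
call_sql = 'CALL "source".add_missing_columns(%s, %s, %s)'
args = ("InLd", "InStg", "AnalytEDW")

# Only attempt the call when connection details are actually provided.
if os.environ.get("PG_HOST"):
    import psycopg2  # third-party driver: pip install psycopg2-binary
    conn = psycopg2.connect(
        host=os.environ["PG_HOST"],
        port=os.environ.get("PG_PORT", "5432"),
        dbname=os.environ["PG_DATABASE"],
        user=os.environ["PG_USER"],
        password=os.environ["PG_PASSWORD"],
    )
    try:
        with conn.cursor() as cur:
            cur.execute(call_sql, args)
        conn.commit()  # persist whatever the procedure changed
    finally:
        conn.close()
```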