I already had the server-ca.pem, client-cert.pem and client-key.pem files. But I was having trouble making jdbc calls with the three pem files.
So I followed the MysSQL documentation and created the truststore file from server-ca.pem file for server authentication, the below is the documentation:
MySQL :: MySQL Connector/J 8.1 Developer Guide :: 6.9.1 Setting up Server Authentication
Then I created keystore file from client-cert.pem and client-key.pem files as per the MySQL documentation:
MySQL :: MySQL Connector/J 8.1 Developer Guide :: 6.9.2 Setting up Client Authentication
Finally once the truststore and keystore files are ready we need copy the files to both driver and executors because we are working on distributed systems like EMR. So to copy files to both driver and executors, I had copied the truststore and keystore files to a S3 bucket and launched another EMR with a bootstrap script below to copy to a same path in both executors and driver. The bootstrap script is as below:
#!/bin/bash
aws s3 cp s3://test-bucket-us-east-2/truststore /home/hadoop
sudo chmod +x /home/hadoop/truststore
aws s3 cp s3://test-bucket-us-east-2/keystore /home/hadoop
sudo chmod +x /home/hadoop/keystore
Finally once the files are ready in the EMR I started a pyspark shell with the below command:
pyspark --jars mysql-connector-j-8.1.0.jar ## You can download the MySQL Connector/J jar file from this location https://downloads.mysql.com/archives/c-j/
I used this MySQL Connector/J - mysql-connector-j-8.1.0.jar
Finally in the pyspark shell I ran the below spark JDBC connection command to extract data from a MySQL table hosted on GCP:
*df = spark.read *
*.format(“jdbc”) *
*.option(“driver”,“com.mysql.cj.jdbc.Driver”) *
*.option(“url”, “jdbc:mysql://:/<db_name>”) *
*.option(“dbtable”, “table_name”) *
*.option(“user”, “user_id”) *
*.option(“password”, “pwd”) *
.option(“sslMode”, “VERIFY_CA”) \ # This needs to be changed basing on your requirement. This determines what type of SSL authentication happens, so this is important
*.option(“trustCertificateKeyStoreUrl”, “file:/path/to/truststore”) *
*.option(“trustCertificateKeyStorePassword”, “truststore_pwd”) *
*.option(“clientCertificateKeyStoreUrl”, “file:/path/to/keystore”) *
*.option(“clientCertificateKeyStorePassword”, “keystore_pwd”) *
.load()