I'm trying to build a Google Data Fusion pipeline to load data from MS SQL Server into BigQuery. The source (MS SQL Server 2016 Standard) is running on a GCP VM, and I can connect to the SQL instance using its public IP without any issue. I'm using the BigQuery connector as the sink.
As the JDBC driver I'm using Microsoft SQL Server JDBC Driver v6.0.jre7, which is available on the Data Fusion Hub.
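For reference, plain JDBC connectivity to the instance works outside Data Fusion too. A minimal check along these lines (the IP, database name, and login below are placeholders, not my actual configuration) connects without errors when the mssql-jdbc driver is on the classpath:

    import java.sql.Connection;
    import java.sql.DriverManager;

    public class SqlServerCheck {
        public static void main(String[] args) throws Exception {
            // Placeholders: substitute the VM's public IP, database name,
            // and the SQL login configured in the Data Fusion source plugin.
            String url = "jdbc:sqlserver://PUBLIC_IP:1433;databaseName=MyDb";
            try (Connection conn = DriverManager.getConnection(url, "sqluser", "********")) {
                // Print the server version to confirm the connection works.
                System.out.println(conn.getMetaData().getDatabaseProductVersion());
            }
        }
    }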
Here are the Data Fusion instance details:
- Edition: Basic
- Version: 6.8.1 (latest version)
- Private IP: Disabled
I gave the "Cloud Data Fusion API Service Agent" role to the Data Fusion service account.
When I run the pipeline, it fails with the error messages below. Can anyone please help me resolve this? What am I missing in the configuration?
04/07/2023 0:42:40 INFO Launch main class org.apache.spark.executor.YarnCoarseGrainedExecutorBackend.main([--driver-url, spark://CoarseGrainedScheduler@cdap-mypipe-98803685-d4b2-11ed-a81e-26b0641a83bd-w-1.us-central1-f.c.marine-fusion-270120.internal:35779, --executor-id, 1, --hostname, cdap-mypipe-98803685-d4b2-11ed-a81e-bd-w-0.us-central1-f.c.marine-fusion-2.internal, --cores, 1, --app-id, application_1680809963358_0002, --resourceProfileId, 0, --user-class-path, file:/hadoop/yarn/nm-local-dir/usercache/yarn/appcache/application_1680809963358_0002/container_1680809963358_0002_01_000002/__app__.jar])
04/07/2023 0:42:46 WARN Cannot load filesystem: java.util.ServiceConfigurationError: org.apache.hadoop.fs.FileSystem: Provider org.apache.hadoop.hdfs.web.HftpFileSystem not found
04/07/2023 0:42:46 WARN Cannot load filesystem: java.util.ServiceConfigurationError: org.apache.hadoop.fs.FileSystem: Provider org.apache.hadoop.hdfs.web.HsftpFileSystem not found
04/07/2023 0:42:48 ERROR Aborting task
04/07/2023 0:42:48 ERROR Task attempt_202304061942308162483986510619117_0003_r_000000_0 aborted.
04/07/2023 0:42:48 ERROR Exception in task 0.0 in stage 0.0 (TID 0)
04/07/2023 0:42:48 WARN Lost task 0.0 in stage 0.0 (TID 0) (cdap-mypipe-98803685-d4b2-11ed-a81e-26b0641a83bd-w-1.us-central1-f.c.marine-fusion-270120.internal executor 2): org.apache.spark.SparkException: Task failed while writing rows
    at org.apache.spark.internal.io.SparkHadoopWriter$.executeTask(SparkHadoopWriter.scala:162)
    at org.apache.spark.internal.io.SparkHadoopWriter$.$anonfun$write$1(SparkHadoopWriter.scala:88)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:131)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:505)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:508)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.avro.file.DataFileWriter$AppendWriteException: org.apache.avro.UnresolvedUnionException: Not in union ["long","null"]: 1
    at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:308)
    at io.cdap.plugin.gcp.bigquery.sink.AvroRecordWriter.write(AvroRecordWriter.java:90)
    at io.cdap.plugin.gcp.bigquery.sink.AvroRecordWriter.write(AvroRecordWriter.java:37)
    at io.cdap.plugin.gcp.bigquery.sink.BigQueryRecordWriter.write(BigQueryRecordWriter.java:58)
    at io.cdap.plugin.gcp.bigquery.sink.BigQueryRecordWriter.write(BigQueryRecordWriter.java:32)
    at io.cdap.cdap.etl.spark.io.TrackingRecordWriter.write(TrackingRecordWriter.java:41)
    at org.apache.spark.internal.io.HadoopMapReduceWriteConfigUtil.write(SparkHadoopWriter.scala:367)
    at org.apache.spark.internal.io.SparkHadoopWriter$.$anonfun$executeTask$1(SparkHadoopWriter.scala:137)
    at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1473)
    at org.apache.spark.internal.io.SparkHadoopWriter$.executeTask(SparkHadoopWriter.scala:134)
    ... 9 more
Caused by: org.apache.avro.UnresolvedUnionException: Not in union ["long","null"]: 1
    at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:740)
    at org.apache.avro.generic.GenericDatumWriter.resolveUnion(GenericDatumWriter.java:205)
    at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:123)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
    at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:159)
    at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:166)
    at org.apache.avro.specific.SpecificDatumWriter.writeField(SpecificDatumWriter.java:90)
    at org.apache.avro.reflect.ReflectDatumWriter.writeField(ReflectDatumWriter.java:191)
    at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:156)
    at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:118)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
    at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:159)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:62)
    at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:302)
    ... 18 more
04/07/2023 0:42:49 ERROR Aborting task
04/07/2023 0:42:49 ERROR Task attempt_202304061942308162483986510619117_0003_r_000000_1 aborted.
04/07/2023 0:42:49 ERROR Exception in task 0.1 in stage 0.0 (TID 1)
04/07/2023 0:42:51 WARN Cannot load filesystem: java.util.ServiceConfigurationError: org.apache.hadoop.fs.FileSystem: Provider org.apache.hadoop.hdfs.web.HftpFileSystem not found
04/07/2023 0:42:51 WARN Cannot load filesystem: java.util.ServiceConfigurationError: org.apache.hadoop.fs.FileSystem: Provider org.apache.hadoop.hdfs.web.HsftpFileSystem not found
04/07/2023 0:42:53 ERROR Aborting task
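The failing piece seems to be the last "Caused by": the BigQuery sink serializes records as Avro, and the value 1 is rejected against a nullable long field. To illustrate the behavior, here is a minimal standalone sketch (independent of Data Fusion; the record and field names are made up) that raises the same UnresolvedUnionException, since Avro does not widen an int value to long when resolving a ["long","null"] union:

    import java.io.ByteArrayOutputStream;
    import org.apache.avro.Schema;
    import org.apache.avro.SchemaBuilder;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.io.EncoderFactory;

    public class UnionMismatchRepro {
        public static void main(String[] args) throws Exception {
            // One nullable long field, matching the union in the error message.
            Schema schema = SchemaBuilder.record("Row").fields()
                    .name("id").type().unionOf().longType().and().nullType().endUnion().noDefault()
                    .endRecord();

            GenericRecord record = new GenericData.Record(schema);
            record.put("id", 1); // an Integer; putting 1L (a Long) would succeed

            // Serializing throws:
            // org.apache.avro.UnresolvedUnionException: Not in union ["long","null"]: 1
            new GenericDatumWriter<GenericRecord>(schema)
                    .write(record, EncoderFactory.get().binaryEncoder(new ByteArrayOutputStream(), null));
        }
    }

So it looks like the pipeline's output schema declares some column as long while the actual records carry int values, but I don't see where in my configuration that mismatch is introduced.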
