Hi,
While trying to connect to the Hive Metastore (Dataproc Metastore) using a thrift URL in my Spark configuration, I'm getting the Metastore exceptions included below.
Spark config:
```python
spark = (
    SparkSession.builder
    .appName("IcebergSparkSrvrlesss")
    .config("spark.sql.catalog.iceberg_catalog", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.iceberg_catalog.type", "hive")
    .config("spark.sql.catalog.spark_catalog.uri", "thrift://...:9083")
    .config("spark.sql.catalog.iceberg_catalog.warehouse", "gs://usmedp-devstg-icebergpoc/iceberg-catalog")
    .enableHiveSupport()
    .getOrCreate()
)

# Read the data and create the table
df_account_partition.writeTo(
    f"{iceberg_catalog}.{iceberg_warehouse}.icbg_account_tbl"
).tableProperty("format-version", "2").createOrReplace()
```
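In case it helps narrow things down, this is the kind of minimal probe I run from the batch environment to check that the thrift endpoint is even reachable (a sketch only; the hostname below is a placeholder for the elided metastore host, not my real endpoint):

```python
import socket

# Placeholder for the actual Dataproc Metastore host (elided in the config above).
HOST, PORT = "metastore-host.example", 9083

def thrift_port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds.

    This only proves network reachability of the thrift port; it does not
    validate the metastore schema, which is what the exception complains about.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print(thrift_port_open(HOST, PORT))
```

If this returns True but the job still fails with the exception below, the problem presumably isn't basic connectivity.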
Exception:
```
Query for candidates of org.apache.hadoop.hive.metastore.model.MVersionTable and subclasses resulted in no possible candidates
Required table missing : "VERSION" in Catalog "" Schema "". DataNucleus requires this table to perform its persistence operations. Either your MetaData is incorrect, or you need to enable "datanucleus.schema.autoCreateTables"
org.datanucleus.store.rdbms.exceptions.MissingTableException: Required table missing : "VERSION" in Catalog "" Schema "". DataNucleus requires this table to perform its persistence operations. Either your MetaData is incorrect, or you need to enable "datanucleus.schema.autoCreateTables"
	at org.datanucleus.store.rdbms.table.AbstractTable.exists(AbstractTable.java:606)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)

Traceback (most recent call last):
  File "/tmp/srvls-batch-461fc9f5-cf14-4b80-b879-a2f8f369d268/iceberg_hive.py", line 21, in <module>
    spark.sql("CREATE NAMESPACE IF NOT EXISTS iceberg_catalog.test;")
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 723, in sql
  File "/usr/lib/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1321, in __call__
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 111, in deco
  File "/usr/lib/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o83.sql.
: org.apache.iceberg.hive.RuntimeMetaException: Failed to connect to Hive Metastore
	at org.apache.spark.sql.connector.catalog.SupportsNamespaces.namespaceExists(SupportsNamespaces.java:97)
	at org.apache.spark.sql.execution.datasources.v2.CreateNamespaceExec.run(CreateNamespaceExec.scala:43)
	... 51 more
Caused by: java.lang.reflect.InvocationTargetException
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	... 63 more
Caused by: MetaException(message:Version information not found in metastore. )
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:83)
	... 68 more
Caused by: MetaException(message:Version information not found in metastore. )
```