Apache Spark not connecting to Hive metastore (Database not found)


Question


I have Java Spark code where I'm trying to connect to a Hive database, but it has only the default database and gives me a NoSuchDatabaseException. I tried the following to set the Hive metastore:

  1. Add the Spark conf in code with the Hive metastore URI.
  2. Add the Spark conf in spark-submit (a sketch follows below).
  3. Add hive-site.xml to the resources folder.
  4. Copy hive-site.xml into the Spark conf directory (/etc/spark2/conf/hive-site.xml).

Also, the Hive config file loaded at runtime is the same as /etc/hive/conf/hive-site.xml.
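
For reference on attempt 2: on Spark 2.x, Hive properties can be passed at submit time by prefixing them with spark.hadoop., which copies them into the Hadoop configuration that the metastore client reads. A minimal sketch, with a placeholder host and port:

    spark-submit --class sampleClass \
        --conf spark.hadoop.hive.metastore.uris=thrift://metastore-host:9083 \
        --files /etc/hive/conf/hive-site.xml load-1.0-SNAPSHOT-all.jar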

SparkConf sparkConf = new SparkConf();
sparkConf.setAppName("example");
JavaSparkContext sc = new JavaSparkContext(sparkConf);
final SparkSession spark = SparkSession
                .builder()
                .appName("Java Spark Hive Example")
                .config("hive.metastore.uris", "thrift://***:1234")
                .config("spark.sql.uris", "thrift://***:1234")
                .config("hive.metastore.warehouse.dir", "hdfs://***:1234/user/hive/warehouse/")
                .enableHiveSupport()
                .getOrCreate();
JavaRDD<sampleClass> rdd = sc.parallelize(sample);

Dataset<Row> df2 = spark.createDataFrame(rdd, sampleClass.class);

spark.sql("show databases").show();
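
Note the warning that appears in the submit log below, "Using an existing SparkContext; some configuration may not take effect." Because the JavaSparkContext above is constructed before the SparkSession builder runs, getOrCreate() reuses that existing context, and the .config(...) values set on the builder may be silently ignored. A minimal sketch of the session-first pattern, assuming Spark 2.x (the thrift host and port are placeholders):

import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SparkSession;

// Build the Hive-enabled session first so its configs apply to a fresh context.
SparkSession spark = SparkSession
        .builder()
        .appName("Java Spark Hive Example")
        .config("hive.metastore.uris", "thrift://metastore-host:9083") // placeholder URI
        .enableHiveSupport()
        .getOrCreate();

// Reuse the session's context instead of constructing a second one.
JavaSparkContext sc = JavaSparkContext.fromSparkContext(spark.sparkContext());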

The logs of the spark-submit run are as below.

    spark-submit --class sampleClass \
        --master local --deploy-mode client --executor-memory 1g \
        --name sparkTest --conf "spark.app.id=SampleLoad" \
        --files /etc/spark/conf/hive-site.xml load-1.0-SNAPSHOT-all.jar
20/03/16 12:33:19 INFO SparkContext: Running Spark version 2.3.0.2.6.5.0-292
20/03/16 12:33:19 INFO SparkContext: Submitted application: SampleLoad
20/03/16 12:33:19 INFO SecurityManager: Changing view acls to: root,User
20/03/16 12:33:19 INFO SecurityManager: Changing modify acls to: root,User
20/03/16 12:33:19 INFO SecurityManager: Changing view acls groups to:
20/03/16 12:33:19 INFO SecurityManager: Changing modify acls groups to:
20/03/16 12:33:19 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root, User); groups with view permissions: Set(); users  with modify permissions: Set(root, User); groups with modify permissions: Set()
20/03/16 12:33:19 INFO Utils: Successfully started service 'sparkDriver' on port 35746.
20/03/16 12:33:19 INFO SparkEnv: Registering MapOutputTracker
20/03/16 12:33:19 INFO SparkEnv: Registering BlockManagerMaster
20/03/16 12:33:19 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
20/03/16 12:33:19 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
20/03/16 12:33:19 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-b946b14f-a52d-4467-8028-503ed7ae93da
20/03/16 12:33:19 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
20/03/16 12:33:19 INFO SparkEnv: Registering OutputCommitCoordinator
20/03/16 12:33:19 INFO Utils: Successfully started service 'SparkUI' on port 4042.
20/03/16 12:33:19 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://sample:4042
20/03/16 12:33:19 INFO SparkContext: Added JAR file:/abc/xyz/load-1.0-SNAPSHOT-all.jar at spark://sample:35746/jars/load-1.0-SNAPSHOT-all.jar with timestamp 1584347599756
20/03/16 12:33:19 INFO SparkContext: Added file file:///etc/spark/conf/hive-site.xml at file:///etc/spark/conf/hive-site.xml with timestamp 1584347599776
20/03/16 12:33:19 INFO Utils: Copying /etc/spark/conf/hive-site.xml to /tmp/spark-914265c5-6115-4aca-8b85-2cd49a530fae/userFiles-aaca5153-ce38-489a-a020-c2477fddc66e/hive-site.xml
20/03/16 12:33:19 INFO Executor: Starting executor ID driver on host localhost
20/03/16 12:33:19 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 45179.
20/03/16 12:33:19 INFO NettyBlockTransferService: Server created on sample:45179
20/03/16 12:33:19 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
20/03/16 12:33:19 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, sample, 45179, None)
20/03/16 12:33:19 INFO BlockManagerMasterEndpoint: Registering block manager sample:45179 with 366.3 MB RAM, BlockManagerId(driver, lhdpegde2u.enbduat.com, 45179, None)
20/03/16 12:33:19 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, sample, 45179, None)
20/03/16 12:33:19 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, sample, 45179, None)
20/03/16 12:33:20 INFO EventLoggingListener: Logging events to hdfs:/spark2-history/local-1584347599812
20/03/16 12:33:20 WARN SparkContext: Using an existing SparkContext; some configuration may not take effect.
20/03/16 12:33:20 INFO SharedState: loading hive config file: file:/etc/spark2/2.6.5.0-292/0/hive-site.xml
20/03/16 12:33:21 INFO SharedState: spark.sql.warehouse.dir is not set, but hive.metastore.warehouse.dir is set. Setting spark.sql.warehouse.dir to the value of hive.metastore.warehouse.dir ('/apps/hive/warehouse').
20/03/16 12:33:21 INFO SharedState: Warehouse path is '/apps/hive/warehouse'.
20/03/16 12:33:21 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
20/03/16 12:33:22 INFO CodeGenerator: Code generated in 184.728545 ms
20/03/16 12:33:23 INFO CodeGenerator: Code generated in 10.538159 ms
20/03/16 12:33:23 INFO CodeGenerator: Code generated in 8.809847 ms
+-------+----------------+--------------------+
|   name|     description|         locationUri|
+-------+----------------+--------------------+
|default|default database|/apps/hive/warehouse|
+-------+----------------+--------------------+

20/03/16 12:33:23 INFO CodeGenerator: Code generated in 7.13541 ms
20/03/16 12:33:23 INFO CodeGenerator: Code generated in 5.771691 ms
+------------+
|databaseName|
+------------+
|     default|
+------------+

Exception in thread "main" org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database 'sample' not found;
        at org.apache.spark.sql.catalyst.catalog.SessionCatalog.org$apache$spark$sql$catalyst$catalog$SessionCatalog$$requireDbExists(SessionCatalog.scala:177)
        at org.apache.spark.sql.catalyst.catalog.SessionCatalog.setCurrentDatabase(SessionCatalog.scala:259)
        at org.apache.spark.sql.execution.command.SetDatabaseCommand.run(databases.scala:59)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
        at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:190)
        at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:190)
        at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3253)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
        at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3252)
        at org.apache.spark.sql.Dataset.<init>(Dataset.scala:190)
        at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:75)
        at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:638)
        at ProcessXML.main(ProcessXML.java:95)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:906)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
20/03/16 12:33:23 INFO SparkContext: Invoking stop() from shutdown hook
20/03/16 12:33:23 INFO SparkUI: Stopped Spark web UI at http://sample:4042
20/03/16 12:33:24 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
20/03/16 12:33:24 INFO MemoryStore: MemoryStore cleared
20/03/16 12:33:24 INFO BlockManager: BlockManager stopped
20/03/16 12:33:24 INFO BlockManagerMaster: BlockManagerMaster stopped
20/03/16 12:33:24 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
20/03/16 12:33:24 INFO SparkContext: Successfully stopped SparkContext
20/03/16 12:33:24 INFO ShutdownHookManager: Shutdown hook called
20/03/16 12:33:24 INFO ShutdownHookManager: Deleting directory /tmp/spark-37386c3b-855a-4e09-a372-e8d12a08eebc
20/03/16 12:33:24 INFO ShutdownHookManager: Deleting directory /tmp/spark-914265c5-6115-4aca-8b85-2cd49a530fae
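
The SharedState lines above show which Hive config actually won: file:/etc/spark2/2.6.5.0-292/0/hive-site.xml, with warehouse path /apps/hive/warehouse. A quick runtime check of what the session is really pointed at, assuming standard Spark 2.x APIs (spark is the SparkSession from the code above):

// Print the metastore URI and warehouse dir the running session actually uses.
System.out.println(spark.sparkContext().hadoopConfiguration().get("hive.metastore.uris"));
System.out.println(spark.conf().get("spark.sql.warehouse.dir"));
// If the intended metastore were reached, 'sample' should appear in this listing.
spark.catalog().listDatabases().show();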

Kindly let me know what/where I went wrong.

Thanks in advance,

Gowtham R

huangapple
  • Posted on 2020-03-16 16:52:08
  • Please keep the original link when reposting: https://java.coder-hub.com/60702865.html