
Kyuubi Installation and Configuration Notes

Date: 2023-11-14 00:37:06

Preface

A while ago I spent some time on Kyuubi, mainly on installation and configuration, adapting it to different Spark versions, and verifying Kyuubi server HA. I verified essentially all of the basic functionality, but have not used it in production since, so this post writes the notes down before I forget them.
The main goals were to adapt Kyuubi to Spark 2.4.5 and Spark 3.1.2 and, at the same time, to verify Hudi support.

Version notes

At the time of writing, the latest Kyuubi release is 1.4. Kyuubi 1.x does not support Spark 2 out of the box; 1.4 targets Spark 3.1 by default and supports Hudi by default. However, Hudi 0.9 does not work with Spark 3.1.2, so Hudi 0.10.1 is required.
Hudi 0.10.1 has been released and can be downloaded from Maven Central: https://repo1.maven.org/maven2/org/apache/hudi/hudi-spark3.1.2-bundle_2.12/0.10.1/hudi-spark3.1.2-bundle_2.12-0.10.1.jar
You can also build the bundle yourself.
To support Spark 2, use Kyuubi 0.7. Both Kyuubi versions support HA, but 0.7 does not support Hudi out of the box.

Building 0.7: switch the git checkout to branch-0.7.
Then edit the pom and add:

<profile>
    <id>spark-2.4.5</id>
    <properties>
        <spark.version>2.4.5</spark.version>
        <scalatest.version>3.0.3</scalatest.version>
    </properties>
</profile>

Then run the packaging command:

./build/dist --tgz  -P spark-2.4.5 

When packaging finishes, it produces kyuubi-0.7.0-SNAPSHOT-bin-spark-2.4.5.tar.gz.

Kyuubi 1.4

Download

Download apache-kyuubi-1.4.0-incubating-bin.tgz.

Extract

tar -zxvf apache-kyuubi-1.4.0-incubating-bin.tgz -C /opt/ 

The installation path here is /opt/apache-kyuubi-1.4.0-incubating-bin.

Verify Spark

First, make sure the installed Spark is version 3.1.2:

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.1.2
      /_/

Test the Hadoop/Spark environment:

/usr/hdp/3.1.0.0-78/spark3/bin/spark-submit \
  --master yarn \
  --class org.apache.spark.examples.SparkPi \
  /usr/hdp/3.1.0.0-78/spark3/examples/jars/spark-examples_2.12-3.1.2.jar \
  10

Output:

Pi is roughly 3.138211138211138 

Edit the Kyuubi configuration

cd /opt/apache-kyuubi-1.4.0-incubating-bin/conf/ 

kyuubi-env.sh

cp kyuubi-env.sh.template kyuubi-env.sh
vi kyuubi-env.sh

export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.181-7.b13.el7.x86_64
export SPARK_HOME=/usr/hdp/3.1.0.0-78/spark3
export HADOOP_CONF_DIR=/usr/hdp/3.1.0.0-78/hadoop/etc/hadoop
export KYUUBI_JAVA_OPTS="-Xmx10g -XX:+UnlockDiagnosticVMOptions -XX:ParGCCardsPerStrideChunk=4096 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -XX:+CMSClassUnloadingEnabled -XX:+CMSParallelRemarkEnabled -XX:+UseCondCardMark -XX:MaxDirectMemorySize=1024m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=./logs -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution -Xloggc:./logs/kyuubi-server-gc-%t.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=5M -XX:NewRatio=3 -XX:MetaspaceSize=512m"

kyuubi-defaults.conf

cp kyuubi-defaults.conf.template kyuubi-defaults.conf

vi kyuubi-defaults.conf

kyuubi.frontend.bind.host       indata-192-168-44-128.indata.com
kyuubi.frontend.bind.port       10009

#kerberos
kyuubi.authentication   KERBEROS
kyuubi.kinit.principal  hive/indata-192-168-44-128.indata.com@INDATA.COM
kyuubi.kinit.keytab     /etc/security/keytabs/hive.service.keytab

Configure environment variables

vi ~/.bashrc

export KYUUBI_HOME=/opt/kyuubi/apache-kyuubi-1.4.0-incubating-bin
export PATH=$KYUUBI_HOME/bin:$PATH

source ~/.bashrc

Starting and stopping Kyuubi

bin/kyuubi start

bin/kyuubi stop

Port conflict

The following is quoted from: 放弃Spark Thrift Server吧,你需要的是Apache Kyuubi! ("Drop Spark Thrift Server, what you need is Apache Kyuubi!")

cat logs/kyuubi-root-org.apache.kyuubi.server.KyuubiServer-indata-*.indata.com.out
22/03/21 09:56:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" java.net.BindException: Address already in use
        at sun.nio.ch.Net.bind0(Native Method)
        at sun.nio.ch.Net.bind(Net.java:433)
        at sun.nio.ch.Net.bind(Net.java:425)
        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:90)
        at org.apache.kyuubi.zookeeper.EmbeddedZookeeper.initialize(EmbeddedZookeeper.scala:53)
        at org.apache.kyuubi.server.KyuubiServer$.startServer(KyuubiServer.scala:48)
        at org.apache.kyuubi.server.KyuubiServer$.main(KyuubiServer.scala:121)
        at org.apache.kyuubi.server.KyuubiServer.main(KyuubiServer.scala)

KyuubiServer fails with an "Address already in use" error, but the log does not say which port is being contended. Looking at the source code:

  private val zkServer = new EmbeddedZookeeper()

  def startServer(conf: KyuubiConf): KyuubiServer = {
    if (!ServiceDiscovery.supportServiceDiscovery(conf)) {
      zkServer.initialize(conf)
      zkServer.start()
      conf.set(HA_ZK_QUORUM, zkServer.getConnectString)
      conf.set(HA_ZK_ACL_ENABLED, false)
    }

    val server = new KyuubiServer()
    server.initialize(conf)
    server.start()
    Utils.addShutdownHook(new Runnable {
      override def run(): Unit = server.stop()
    }, Utils.SERVER_SHUTDOWN_PRIORITY)
    server
  }

As the source shows, if service discovery is not configured, Kyuubi falls back to an embedded ZooKeeper by default. We have not enabled HA mode yet, so Kyuubi tries to start a local ZooKeeper, but our test environment already runs a ZooKeeper on that port, hence the bind failure. Given that, we might as well configure HA properly, which also makes the Kyuubi service more reliable.

(We were going to configure HA anyway.)

Configure Kyuubi HA

kyuubi.ha.enabled true
kyuubi.ha.zookeeper.quorum indata-192-168-44-128.indata.com,indata-192-168-44-129.indata.com,indata-192-168-44-130.indata.com
kyuubi.ha.zookeeper.client.port 2181
kyuubi.ha.zookeeper.session.timeout 600000

Connecting with beeline

Non-HA

First, test the connection in non-HA mode, i.e. directly against IP:Port.

Use the Spark 3 beeline here; the Hive beeline is not compatible.

/usr/hdp/3.1.0.0-78/spark3/bin/beeline
Beeline version 2.3.7 by Apache Hive
beeline> !connect jdbc:hive2://indata-192-168-44-130.indata.com:10009/;principal=hive/indata-192-168-44-130.indata.com@INDATA.COM;hive.server2.proxy.user=spark
Connecting to jdbc:hive2://indata-192-168-44-130.indata.com:10009/;principal=hive/indata-192-168-44-130.indata.com@INDATA.COM;hive.server2.proxy.user=spark
22/03/22 19:53:39 INFO Utils: Supplied authorities: indata-192-168-44-130.indata.com:10009
22/03/22 19:53:39 INFO Utils: Resolved authority: indata-192-168-44-130.indata.com:10009
22/03/22 19:53:39 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Connected to: Apache Kyuubi (Incubating) (version 1.4.0-incubating)
Driver: Hive JDBC (version 2.3.7)
Transaction isolation: TRANSACTION_REPEATABLE_READ


0: jdbc:hive2://indata-192-168-44-130.indata.> !connect jdbc:hive2://indata-192-168-44-130.indata.com:10009/;principal=hive/indata-192-168-44-130.indata.com@INDATA.COM;hive.server2.proxy.user=hive
Connecting to jdbc:hive2://indata-192-168-44-130.indata.com:10009/;principal=hive/indata-192-168-44-130.indata.com@INDATA.COM;hive.server2.proxy.user=hive
22/03/22 19:54:03 INFO Utils: Supplied authorities: indata-192-168-44-130.indata.com:10009
22/03/22 19:54:03 INFO Utils: Resolved authority: indata-192-168-44-130.indata.com:10009
Connected to: Apache Kyuubi (Incubating) (version 1.4.0-incubating)
Driver: Hive JDBC (version 2.3.7)
Transaction isolation: TRANSACTION_REPEATABLE_READ

This started two engine applications, one as the spark user and one as the hive user.

HA

The HA connection mode discovers the server address through ZooKeeper. In Kyuubi 1.4 the default zooKeeperNamespace is kyuubi.

/usr/hdp/3.1.0.0-78/spark3/bin/beeline
!connect jdbc:hive2://indata-192-168-44-128.indata.com,indata-192-168-44-129.indata.com,indata-192-168-44-130.indata.com/default;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi;hive.server2.proxy.user=spark

Connecting to jdbc:hive2://indata-192-168-44-128.indata.com,indata-192-168-44-129.indata.com,indata-192-168-44-130.indata.com/default;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi;hive.server2.proxy.user=spark
Enter username for jdbc:hive2://indata-192-168-44-128.indata.com,indata-192-168-44-129.indata.com,indata-192-168-44-130.indata.com/default:
Enter password for jdbc:hive2://indata-192-168-44-128.indata.com,indata-192-168-44-129.indata.com,indata-192-168-44-130.indata.com/default:
22/03/23 09:56:30 INFO Utils: Supplied authorities: indata-192-168-44-128.indata.com,indata-192-168-44-129.indata.com,indata-192-168-44-130.indata.com
22/03/23 09:56:30 INFO CuratorFrameworkImpl: Starting
22/03/23 09:56:30 INFO ZooKeeper: Initiating client connection, connectString=indata-192-168-44-128.indata.com,indata-192-168-44-129.indata.com,indata-192-168-44-130.indata.com sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@1224144a
22/03/23 09:56:30 INFO ClientCnxn: Opening socket connection to server indata-192-168-44-130.indata.com/192.168.44.130:2181. Will not attempt to authenticate using SASL (unknown error)
22/03/23 09:56:30 INFO ClientCnxn: Socket connection established, initiating session, client: /192.168.44.130:59044, server: indata-192-168-44-130.indata.com/192.168.44.130:2181
22/03/23 09:56:30 INFO ClientCnxn: Session establishment complete on server indata-192-168-44-130.indata.com/192.168.44.130:2181, sessionid = 0x37f2e7f62d60ba1, negotiated timeout = 60000
22/03/23 09:56:30 INFO ConnectionStateManager: State change: CONNECTED
22/03/23 09:56:30 INFO ZooKeeper: Session: 0x37f2e7f62d60ba1 closed
22/03/23 09:56:30 INFO ClientCnxn: EventThread shut down
22/03/23 09:56:30 INFO Utils: Resolved authority: indata-192-168-44-130.indata.com:10009
22/03/23 09:56:30 INFO HiveConnection: Connected to indata-192-168-44-130.indata.com:10009
Connected to: Apache Kyuubi (Incubating) (version 1.4.0-incubating)
Driver: Hive JDBC (version 2.3.7)
Transaction isolation: TRANSACTION_REPEATABLE_READ

The log shows that the JDBC connection string points at ZooKeeper; the client looks up the zooKeeperNamespace in ZooKeeper and resolves the real Kyuubi server address, indata-192-168-44-130.indata.com:10009.
In fact, once the HA parameters are configured, starting the Kyuubi server creates a znode /kyuubi in ZooKeeper. Let's look at what information is stored there:

/usr/hdp/3.1.0.0-78/zookeeper/bin/zkCli.sh -server indata-192-168-44-130.indata.com:2181
ls /kyuubi
[serviceUri=indata-192-168-44-130.indata.com:10009;version=1.4.0-incubating;sequence=0000000007]
get /kyuubi/serviceUri=indata-192-168-44-130.indata.com:10009;version=1.4.0-incubating;sequence=0000000007
hive.server2.thrift.sasl.qop=auth;hive.server2.thrift.bind.host=indata-192-168-44-130.indata.com;hive.server2.transport.mode=binary;hive.server2.authentication=KERBEROS;hive.server2.thrift.port=10009;hive.server2.authentication.kerberos.principal=hive/indata-192-168-44-130.indata.com@INDATA.COM
cZxid = 0x300029a08
ctime = Tue Mar 22 19:57:36 CST 2022
mZxid = 0x300029a08
mtime = Tue Mar 22 19:57:36 CST 2022
pZxid = 0x300029a08
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x37f2e7f62d60b64
dataLength = 295
numChildren = 0

As you can see, the znode stores not only the Kyuubi server host and port but also the Kerberos principal and related connection settings.
The default zooKeeperNamespace is kyuubi; it can be changed through the configuration:

vi conf/kyuubi-defaults.conf

kyuubi.ha.zookeeper.namespace=kyuubi_cluster001
kyuubi.session.engine.initialize.timeout=180000

After restarting the Kyuubi server and checking ZooKeeper again, the original /kyuubi node is empty and a new /kyuubi_cluster001 node has been created with the same kind of content as before.

Start another Kyuubi server on a second machine; only then do we get real HA. Looking at ZooKeeper again, the namespace now lists two Kyuubi servers:

ls /kyuubi_cluster001
[serviceUri=indata-192-168-44-130.indata.com:10009;version=1.4.0-incubating;sequence=0000000006, serviceUri=indata-192-168-44-129.indata.com:10009;version=1.4.0-incubating;sequence=0000000009]

To inspect the details, split the listing at the comma and get each child node separately; the full output is omitted here.

get /kyuubi_cluster001/serviceUri=indata-192-168-44-130.indata.com:10009;version=1.4.0-incubating;sequence=0000000006
get /kyuubi_cluster001/serviceUri=indata-192-168-44-129.indata.com:10009;version=1.4.0-incubating;sequence=0000000009

Log in with beeline a few more times and you will see that one of the two Kyuubi servers is chosen at random each time, which gives us HA. Note that the zooKeeperNamespace in the URL has been changed to kyuubi_cluster001. (A minimal JDBC sketch follows the log excerpt below.)

!connect jdbc:hive2://indata-192-168-44-128.indata.com,indata-192-168-44-129.indata.com,indata-192-168-44-130.indata.com/default;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi_cluster001;hive.server2.proxy.user=spark

22/03/23 11:51:43 INFO Utils: Resolved authority: indata-192-168-44-130.indata.com:10009
22/03/23 11:51:43 INFO HiveConnection: Connected to indata-192-168-44-130.indata.com:10009

22/03/23 13:53:19 INFO Utils: Resolved authority: indata-192-168-44-129.indata.com:10009
22/03/23 13:53:19 INFO HiveConnection: Connected to indata-192-168-44-129.indata.com:10009
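
To see the same load balancing from a program rather than from beeline, here is a minimal Java sketch (not from the original article): it opens the discovery URL a few times, assuming hive-jdbc 2.3.x on the classpath and a valid Kerberos ticket obtained with kinit for this cluster. Which Kyuubi server each attempt landed on shows up in the HiveConnection INFO log, just like the "Connected to ..." lines above.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Minimal sketch: repeatedly open the ZooKeeper-discovery URL used in this post.
// Assumptions: hive-jdbc 2.3.x on the classpath, a valid Kerberos ticket (kinit).
public class KyuubiHaProbe {
    private static final String URL =
        "jdbc:hive2://indata-192-168-44-128.indata.com,indata-192-168-44-129.indata.com,"
        + "indata-192-168-44-130.indata.com/default;serviceDiscoveryMode=zooKeeper;"
        + "zooKeeperNamespace=kyuubi_cluster001;hive.server2.proxy.user=spark";

    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        for (int i = 0; i < 3; i++) {
            // Each getConnection resolves a server through ZooKeeper; the chosen
            // host:port is printed by the HiveConnection INFO log.
            try (Connection conn = DriverManager.getConnection(URL);
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("select 1")) {
                rs.next();
                System.out.println("attempt " + i + ": got " + rs.getInt(1));
            }
        }
    }
}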

Hudi support

Make sure the Spark + Hudi environment is set up first; the Hudi bundle jar must match Spark 3.1.2, as explained at the beginning.

!connect jdbc:hive2://indata-192-168-44-128.indata.com,indata-192-168-44-129.indata.com,indata-192-168-44-130.indata.com/default;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi_cluster001;hive.server2.proxy.user=spark#spark.yarn.queue=default;spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension;spark.serializer=org.apache.spark.serializer.KryoSerializer;spark.executor.instances=1;spark.executor.memory=1g;kyuubi.engine.share.level=CONNECTION

Spark parameters are passed through the JDBC URL:

  • spark.yarn.queue: the YARN queue
  • spark.executor.instances: number of executors, default 2
  • spark.executor.memory: executor memory size
  • kyuubi.engine.share.level: Kyuubi engine share level, default USER
    Hudi support comes from two of these parameters: spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension and spark.serializer=org.apache.spark.serializer.KryoSerializer (see the sketch after this list).
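
To make the structure of that long URL easier to read, here is a small illustrative Java helper (the class and method names are invented for this post) that assembles the same "base#key=value;key=value" shape, where everything after "#" carries per-connection Spark/Kyuubi settings.

import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative helper (names invented for this post): rebuild the Hudi-enabled
// JDBC URL used above from a base discovery URL plus a map of settings.
public class KyuubiUrlBuilder {
    public static String build(String baseUrl, Map<String, String> confs) {
        if (confs.isEmpty()) {
            return baseUrl;
        }
        String fragment = confs.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining(";"));
        return baseUrl + "#" + fragment;   // part after '#' is per-connection conf
    }

    public static void main(String[] args) {
        String base = "jdbc:hive2://indata-192-168-44-128.indata.com,"
                + "indata-192-168-44-129.indata.com,indata-192-168-44-130.indata.com/default;"
                + "serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi_cluster001;"
                + "hive.server2.proxy.user=spark";

        Map<String, String> confs = new LinkedHashMap<>();
        confs.put("spark.yarn.queue", "default");
        confs.put("spark.sql.extensions", "org.apache.spark.sql.hudi.HoodieSparkSessionExtension");
        confs.put("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
        confs.put("spark.executor.instances", "1");
        confs.put("spark.executor.memory", "1g");
        confs.put("kyuubi.engine.share.level", "CONNECTION");

        System.out.println(build(base, confs));  // prints a URL in the same shape as above
    }
}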

After connecting to the Kyuubi server with the parameters above and letting it launch the Spark engine, verify the Hudi SQL functionality:

create table test_hudi_table (
  id int,
  name string,
  price double,
  ts long,
  dt string
) using hudi
 partitioned by (dt)
 options (
  primaryKey = 'id',
  preCombineField = 'ts',
  type = 'cow'
 );
 insert into test_hudi_table values (1,'hudi',10,100,'2021-05-05'),(2,'hudi',10,100,'2021-05-05');
 update test_hudi_table set price = 20.0 where id = 1;
 select * from test_hudi_table;

A quick check that these Hudi SQL statements run correctly is enough.

Kyuubi 0.7

This part is mainly about supporting Spark 2; my Spark 2 version is 2.4.5. Building Kyuubi 0.7 was covered earlier; the resulting package is named kyuubi-0.7.0-SNAPSHOT-bin-spark-2.4.5.tar.gz.

Configuration

tar -zxvf kyuubi-0.7.0-SNAPSHOT-bin-spark-2.4.5.tar.gz -C /opt/
cd /opt/kyuubi-0.7.0-SNAPSHOT-bin-spark-2.4.5/
vi bin/kyuubi-env.sh

export SPARK_HOME=/usr/hdp/3.1.0.0-78/spark2  ## Note: this line must go before the "# Find the spark-submit" comment in the script

Fixing: No such file or directory

The tarball was built on Windows, so the scripts have DOS line endings and do not run on Linux. Fix them as follows.

Open the .sh file in vim and enter:
:set ff          (press Enter; it shows fileformat=dos)
:set ff=unix     (reset the file format)
:wq              (save and quit)
Run the script again and the "No such file or directory" error is gone.

Do this for every script under the bin directory.

Start the Kyuubi server

bin/start-kyuubi.sh --master yarn --deploy-mode client --driver-memory 2g --conf spark.kyuubi.frontend.bind.port=10010 --conf spark.kyuubi.authentication=KERBEROS --conf spark.kyuubi.ha.enabled=true \
    --conf spark.kyuubi.ha.zk.quorum=indata-192-168-44-128.indata.com,indata-192-168-44-129.indata.com,indata-192-168-44-130.indata.com --conf spark.kyuubi.ha.zk.namespace=kyuubi_cluster002 --conf spark.kyuubi.ha.mode=load-balance \
	--conf spark.kyuubi.frontend.bind.host=indata-192-168-44-130.indata.com --conf spark.yarn.keytab=/etc/security/keytabs/hive.service.keytab --conf spark.yarn.principal=hive/indata-192-168-44-130.indata.com@INDATA.COM

Connecting with beeline

IP:Port

!connect jdbc:hive2://indata-192-168-44-130.indata.com:10010/;principal=hive/indata-192-168-44-130.indata.com@INDATA.COM;hive.server2.proxy.user=spark

Connecting to jdbc:hive2://indata-192-168-44-130.indata.com:10010/;principal=hive/indata-192-168-44-130.indata.com@INDATA.COM;hive.server2.proxy.user=spark
Enter username for jdbc:hive2://indata-192-168-44-130.indata.com:10010/;principal=hive/indata-192-168-44-130.indata.com@INDATA.COM;hive.server2.proxy.user=spark:
Enter password for jdbc:hive2://indata-192-168-44-130.indata.com:10010/;principal=hive/indata-192-168-44-130.indata.com@INDATA.COM;hive.server2.proxy.user=spark:
22/03/25 14:37:25 INFO Utils: Supplied authorities: indata-192-168-44-130.indata.com:10010
22/03/25 14:37:25 INFO Utils: Resolved authority: indata-192-168-44-130.indata.com:10010
22/03/25 14:37:25 INFO HiveConnection: Will try to open client transport with JDBC Uri: jdbc:hive2://indata-192-168-44-130.indata.com:10010/;principal=hive/indata-192-168-44-130.indata.com@INDATA.COM;hive.server2.proxy.user=spark
Connected to: Spark SQL (version 2.4.5)
Driver: Hive JDBC (version 1.2.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ

ZK

/usr/hdp/3.1.0.0-78/spark2/bin/beeline

!connect jdbc:hive2://indata-192-168-44-128.indata.com,indata-192-168-44-129.indata.com,indata-192-168-44-130.indata.com/default;principal=hive/indata-192-168-44-130.indata.com@INDATA.COM;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi_cluster002

22/03/23 17:01:00 INFO Utils: Resolved authority: null:0
22/03/23 17:01:00 INFO ClientCnxn: EventThread shut down for session: 0x37f2e7f62d60bdb
22/03/23 17:01:00 INFO HiveConnection: Will try to open client transport with JDBC Uri: jdbc:hive2://null:0/default;principal=hive/indata-192-168-44-130.indata.com@INDATA.COM;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi_cluster002

The connection does not succeed: the ZooKeeper address resolves to null:0. The ZooKeeper client shows that the content stored in ZooKeeper is correct, so the problem is how this beeline client parses it. I then connected using the Java program from my earlier post "Java 连接 Kerberos认证下的Spark Thrift Server/Hive Server总结" and it worked; I only had to change SPARK_JDBC_URL in the program to jdbc:hive2://indata-192-168-44-128.indata.com,indata-192-168-44-129.indata.com,indata-192-168-44-130.indata.com/default;principal=hive/indata-192-168-44-130.indata.com@INDATA.COM;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi_cluster002, provided the pom dependency versions match (a rough sketch follows).
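
For reference, here is a rough Java sketch of the kind of Kerberos-authenticated connection described above (the full version is in the linked post). The krb5.conf path, principal, and keytab are assumptions for this environment, and the hive-jdbc and hadoop-common versions on the classpath must match the cluster.

import java.security.PrivilegedExceptionAction;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

// Rough sketch: Kerberos login via UGI, then a JDBC connection through
// ZooKeeper discovery to the Kyuubi 0.7 server. Paths/principal are assumed.
public class KyuubiKerberosDemo {
    private static final String SPARK_JDBC_URL =
        "jdbc:hive2://indata-192-168-44-128.indata.com,indata-192-168-44-129.indata.com,"
        + "indata-192-168-44-130.indata.com/default;"
        + "principal=hive/indata-192-168-44-130.indata.com@INDATA.COM;"
        + "serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi_cluster002";

    public static void main(String[] args) throws Exception {
        System.setProperty("java.security.krb5.conf", "/etc/krb5.conf"); // assumed location

        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);
        // Log in from a keytab; principal/keytab below are assumptions for this cluster.
        UserGroupInformation ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(
                "hive/indata-192-168-44-130.indata.com@INDATA.COM",
                "/etc/security/keytabs/hive.service.keytab");

        ugi.doAs((PrivilegedExceptionAction<Void>) () -> {
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            try (Connection conn = DriverManager.getConnection(SPARK_JDBC_URL);
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("show databases")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
            return null;
        });
    }
}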

This further confirms that the server itself is fine. I then tried copying hive-jdbc-1.2.1.jar from my Maven repository into $SPARK_HOME/jars, backing up the original hive-jdbc-1.21.2.3.1.0.0-78.jar and removing it:

mv hive-jdbc-1.21.2.3.1.0.0-78.jar hive-jdbc-1.21.2.3.1.0.0-78.jar.bak
/usr/hdp/3.1.0.0-78/spark2/bin/beeline
!connect jdbc:hive2://indata-192-168-44-128.indata.com,indata-192-168-44-129.indata.com,indata-192-168-44-130.indata.com/default;principal=hive/indata-192-168-44-130.indata.com@INDATA.COM;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi_cluster002;hive.server2.proxy.user=hive

Utils: Resolved authority: indata-192-168-44-130.indata.com:10010
22/03/25 14:41:02 INFO HiveConnection: Will try to open client transport with JDBC Uri: jdbc:hive2://indata-192-168-44-130.indata.com:10010/default;principal=hive/indata-192-168-44-130.indata.com@INDATA.COM;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi_cluster002;hive.server2.proxy.user=hive
Connected to: Spark SQL (version 2.4.5)
Driver: Hive JDBC (version 1.2.1)

Now the content in ZooKeeper is resolved correctly and the connection succeeds.
Replacing hive-jdbc-1.2.1.jar happened to be enough here. If it is not, download the open-source Spark 2.4.5, make a copy of the existing Spark installation, replace all the jars in the copy with the open-source ones, and try the beeline command from the copied path.

Spark 2.4.5 download: http://archive.apache.org/dist/spark/spark-2.4.5/

Connection exception from the Java program

The connection string above works fine from the Java program, but as soon as hive.server2.proxy.user=spark is added, the following exception is thrown:

org.apache.hive.service.cli.HiveSQLException: Failed to validate proxy privilege of hive for spark
	at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:256)
	at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:247)
	at org.apache.hive.jdbc.HiveConnection.openSession(HiveConnection.java:586)
	at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:192)
	at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
	at java.sql.DriverManager.getConnection(DriverManager.java:664)
	at java.sql.DriverManager.getConnection(DriverManager.java:270)
	at com.dkl.blog.SparkThriftServerDemoWithKerberos_2.jdbcDemo(SparkThriftServerDemoWithKerberos_2.java:52)
	at com.dkl.blog.SparkThriftServerDemoWithKerberos_2.main(SparkThriftServerDemoWithKerberos_2.java:46)
Caused by: java.lang.RuntimeException: yaooqinn.kyuubi.KyuubiSQLException:Failed to validate proxy privilege of hive for spark
	at yaooqinn.kyuubi.auth.KyuubiAuthFactory$.verifyProxyAccess(KyuubiAuthFactory.scala:190)
	at yaooqinn.kyuubi.server.FrontendService.getProxyUser(FrontendService.scala:210)
	at yaooqinn.kyuubi.server.FrontendService.getUserName(FrontendService.scala:188)
	at yaooqinn.kyuubi.server.FrontendService.getSessionHandle(FrontendService.scala:229)
	at yaooqinn.kyuubi.server.FrontendService.OpenSession(FrontendService.scala:248)
	at org.apache.hive.service.cli.thrift.TCLIService$Processor$OpenSession.getResult(TCLIService.java:1253)
	at org.apache.hive.service.cli.thrift.TCLIService$Processor$OpenSession.getResult(TCLIService.java:1238)
	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
	at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:692)
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.security.authorize.AuthorizationException: Unauthorized connection for super-user: hive from IP 10.201.30.73

This happens because my local machine is not authorized as a proxy host.

Edit core-site.xml:

hadoop.proxyuser.hive.groups=*
hadoop.proxyuser.hive.hosts=*

Then restart the HDFS service and rerun the program.
Since Kyuubi here is started with the hive user's Kerberos credentials, it is the hive proxy-user settings that need changing; also, do not set these values to * in a production environment.

HA

Configure the same Kyuubi on another server and start it:

bin/start-kyuubi.sh --master yarn --deploy-mode client --driver-memory 2g --conf spark.kyuubi.frontend.bind.port=10010 --conf spark.kyuubi.authentication=KERBEROS --conf spark.kyuubi.ha.enabled=true \
    --conf spark.kyuubi.ha.zk.quorum=indata-192-168-44-128.indata.com,indata-192-168-44-129.indata.com,indata-192-168-44-130.indata.com --conf spark.kyuubi.ha.zk.namespace=kyuubi_cluster002 --conf spark.kyuubi.ha.mode=load-balance \
    --conf spark.kyuubi.frontend.bind.host=indata-192-168-44-129.indata.com --conf spark.yarn.keytab=/etc/security/keytabs/hive.service.keytab --conf spark.yarn.principal=hive/indata-192-168-44-129.indata.com@INDATA.COM

Connecting with beeline several more times, you will find that when one server's principal is used to reach the other server's Kyuubi instance, a Kerberos authentication error is thrown. Simply change the principal in the JDBC connection string to hive/_HOST@INDATA.COM and the client will randomly connect to either Kyuubi server successfully. Kyuubi 1.4 with Spark 3.1.2 does not have this problem.

!connect jdbc:hive2://indata-192-168-44-128.indata.com,indata-192-168-44-129.indata.com,indata-192-168-44-130.indata.com/default;principal=hive/_HOST@INDATA.COM;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi_cluster002;hive.server2.proxy.user=hive

Stop

bin/stop-kyuubi.sh

References

放弃Spark Thrift Server吧,你需要的是Apache Kyuubi!
Apache Kyuubi(Incubating) 核心功能调研

