
Using the Hive Connector in Presto/Trino (managed tables, external tables, partitioned tables)

Posted: 2023-02-14 19:30:00

目录

  • 1. Configure the Hive connector
  • 2. Create a schema through the Hive connector
  • 3. Create managed and external tables through the Hive connector
  • 4. Create partitioned tables through the Hive connector
  • 5. An external partitioned table created through the Hive connector cannot see existing partition data
  • 6. Insert data through the Hive connector
  • 7. Delete data from a partitioned table

The Hive connector does not use the Hive runtime itself. Instead, it fetches metadata from the Hive Metastore service and then reads and writes data directly on HDFS through an HDFS client. As a result, SQL queries cannot be pushed down to Hive.

1. Configure the Hive connector

Create the catalog configuration file on all nodes:

[root@trino1 catalog]# pwd
/root/trino-server-367/etc/catalog
[root@trino1 catalog]#
[root@trino1 catalog]# cat hive.properties
connector.name=hive
hive.metastore.uri=thrift://192.168.23.91:9083
# Whether writing to Hive external (non-managed) tables is allowed; the default is false
hive.non-managed-table-writes-enabled=true
[root@trino1 catalog]#

For an HA (high-availability) HDFS cluster, the Hadoop configuration files core-site.xml and hdfs-site.xml must be copied to all Trino nodes.

First create a Hadoop configuration directory on all Trino nodes:

[root@trino1 ~]# mkdir trino-server-367/etc/hadoop
[root@trino1 ~]#

Then copy the Hadoop configuration files to all Trino nodes:

[root@hive1 ~]# scp hadoop-3.3.1/etc/hadoop/core-site.xml root@192.168.23.81:/root/trino-server-367/etc/hadoop
[root@hive1 ~]# scp hadoop-3.3.1/etc/hadoop/core-site.xml root@192.168.23.82:/root/trino-server-367/etc/hadoop
[root@hive1 ~]# scp hadoop-3.3.1/etc/hadoop/core-site.xml root@192.168.23.83:/root/trino-server-367/etc/hadoop
[root@hive1 ~]# scp hadoop-3.3.1/etc/hadoop/hdfs-site.xml root@192.168.23.81:/root/trino-server-367/etc/hadoop
[root@hive1 ~]# scp hadoop-3.3.1/etc/hadoop/hdfs-site.xml root@192.168.23.82:/root/trino-server-367/etc/hadoop
[root@hive1 ~]# scp hadoop-3.3.1/etc/hadoop/hdfs-site.xml root@192.168.23.83:/root/trino-server-367/etc/hadoop
[root@hive1 ~]#

Add the Hadoop server address mappings to /etc/hosts on all Trino nodes:

192.168.23.91 hive1
192.168.23.92 hive2
192.168.23.93 hive3

Then add the following configuration to etc/catalog/hive.properties on all Trino nodes:

hive.config.resources=/root/trino-server-367/etc/hadoop/core-site.xml,/root/trino-server-367/etc/hadoop/hdfs-site.xml 

Restart Trino:

[root@trino1 catalog]# cd /root/trino-server-367
[root@trino1 trino-server-367]#
[root@trino1 trino-server-367]# bin/launcher stop
Not running
[root@trino1 trino-server-367]#
[root@trino1 trino-server-367]# bin/launcher start
Started as 46215
[root@trino1 trino-server-367]#

2. Create a schema through the Hive connector

trino> create schema hive.test_db with (location = 'hdfs://nnha/user/hive/warehouse/test_db.db');
CREATE SCHEMA
trino>

The location parameter is optional.
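If location is omitted, the schema is created under the Hive Metastore's default warehouse directory instead. A minimal sketch (the schema name other_db is just an example):

trino> create schema hive.other_db;
CREATE SCHEMA
trino>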

3. Create managed and external tables through the Hive connector

The data file format defaults to ORC; it can be changed with the hive.storage-format property in the catalog properties file.

The compression codec defaults to GZIP; it can be changed with the hive.compression-codec property in the catalog properties file.
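Both defaults can be overridden in etc/catalog/hive.properties; for example (the PARQUET and SNAPPY values below are illustrative choices, not the defaults):

hive.storage-format=PARQUET
hive.compression-codec=SNAPPY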

When creating an external table, the directory pointed to by external_location must already exist, as shown below:

trino> create table hive.test_db.test_external_tb(
    -> user_id bigint,
    -> user_name varchar,
    -> birthday date,
    -> country varchar
    -> ) with (external_location = 'hdfs://nnha/user/hive/warehouse/test_db.db/test_external_tb');
CREATE TABLE
trino>

If the external_location parameter is omitted, a managed (internal) table is created instead.
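For comparison, a managed-table sketch without external_location (the table name test_managed_tb is hypothetical). Note that dropping a managed table also removes its data on HDFS, while dropping an external table leaves the data in place:

trino> create table hive.test_db.test_managed_tb(
    -> user_id bigint,
    -> user_name varchar
    -> );
CREATE TABLE
trino>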

4. Create partitioned tables through the Hive connector

Partition columns must be the last columns in the table definition.

trino> create table hive.test_db.test_partition_tb(
    -> user_id bigint,
    -> user_name varchar,
    -> birthday date,
    -> country varchar
    -> ) with (partitioned_by = array['birthday', 'country']);
CREATE TABLE
trino>

Filtering on the partition columns reduces the amount of data scanned, as shown below:

trino> select distinct user_id from hive.test_db.test_partition_tb where birthday = date '2022-02-09' and country = 'china'; 

5. An external partitioned table created through the Hive connector cannot see existing partition data

The HDFS directory structure is as follows:

/user/hive/warehouse/test_db.db/external_partition_tb/birthday=2018-08-16/country=china/test2018.txt
/user/hive/warehouse/test_db.db/external_partition_tb/birthday=2019-08-16/country=japan/test2019.txt

The data in partition birthday=2018-08-16/country=china is:

1,zhang_san,2018-08-16,china 

The data in partition birthday=2019-08-16/country=japan is:

2,li_si,2019-08-16,japan 

Create the external partitioned table external_partition_tb in Trino:

trino> create table hive.test_db.external_partition_tb(
    -> user_id varchar,
    -> user_name varchar,
    -> birthday varchar,
    -> country varchar
    -> ) with (
    -> format = 'CSV',
    -> csv_escape = '\',
    -> csv_quote = '"',  
    -> csv_separator = ',',
    -> external_location = 'hdfs://nnha/user/hive/warehouse/test_db.db/external_partition_tb',
    -> partitioned_by = array['birthday', 'country']
    -> );
CREATE TABLE
trino> 
trino> select * from hive.test_db.external_partition_tb;
 user_id | user_name | birthday | country 
---------+-----------+----------+---------
(0 rows)

Query 20220209_113936_00038_jx84g, FINISHED, 1 node
Splits: 2 total, 2 done (100.00%)
0.30 [0 rows, 0B] [0 rows/s, 0B/s]

trino> 

When creating a CSV-format table with the Hive connector, every column must be of type varchar.
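Because every column is varchar, numeric and date values are typically converted at query time. A sketch using standard CASTs:

trino> select cast(user_id as bigint) as user_id,
    ->        user_name,
    ->        cast(birthday as date) as birthday,
    ->        country
    -> from hive.test_db.external_partition_tb;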

Querying external_partition_tb returns no rows. This is because the Hive Metastore (HMS) does not know about the existing partitions; we have to add them manually, or run the automatic partition-discovery procedure.

  1. Manually add a partition
trino> 
trino> use hive.test_db;
USE
trino:test_db> 
trino:test_db> call system.create_empty_partition(
            -> schema_name => 'test_db',
            -> table_name => 'external_partition_tb',
            -> partition_columns => array['birthday', 'country'],
            -> partition_values => array['2018-08-16', 'china']
            -> );
CALL
trino:test_db> 
trino:test_db> select * from hive.test_db.external_partition_tb;
 user_id | user_name |  birthday  | country 
---------+-----------+------------+---------
 1       | zhang_san | 2018-08-16 | china   
(1 row)

Query 20220209_114018_00044_jx84g, FINISHED, 1 node
Splits: 2 total, 2 done (100.00%)
0.54 [1 rows, 32B] [1 rows/s, 60B/s]

trino:test_db> 
  2. Automatically discover all partitions
trino:test_db> 
trino:test_db> call system.sync_partition_metadata(
            -> schema_name => 'test_db', 
            -> table_name => 'external_partition_tb', 
            -> mode => 'FULL', 
            -> case_sensitive => true
            -> );
CALL
trino:test_db> 
trino:test_db> select * from hive.test_db.external_partition_tb;
 user_id | user_name |  birthday  | country 
---------+-----------+------------+---------
 1       | zhang_san | 2018-08-16 | china   
 2       | li_si     | 2019-08-16 | japan   
(2 rows)

Query 20220209_114728_00048_jx84g, FINISHED, 1 node
Splits: 3 total, 3 done (100.00%)
0.50 [2 rows, 57B] [3 rows/s, 114B/s]

trino:test_db> 
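Besides FULL, which both adds missing partitions and drops stale ones, sync_partition_metadata also accepts the modes ADD and DROP. A sketch using ADD, which only registers partitions that exist on HDFS but are missing from the metastore:

trino:test_db> call system.sync_partition_metadata(
            -> schema_name => 'test_db',
            -> table_name => 'external_partition_tb',
            -> mode => 'ADD'
            -> );
CALL
trino:test_db>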

6. Insert data through the Hive connector

trino> 
trino> insert into hive.test_db.test_external_tb(user_id, user_name, birthday, country) values(1, 'zhang_san', date '2018-08-16', 'china'), (2, 'li_si', date '2019-08-16', 'japan');
INSERT: 2 rows

Query 20220209_095722_00002_jx84g, FINISHED, 2 nodes
Splits: 5 total, 5 done (100.00%)
17.84 [0 rows, 0B] [0 rows/s, 0B/s]

trino>
trino> insert into hive.test_db.test_partition_tb select * from hive.test_db.test_external_tb;
INSERT: 2 rows

Query 20220209_095933_00003_jx84g, FINISHED, 2 nodes
Splits: 5 total, 5 done (100.00%)
6.99 [2 rows, 589B] [0 rows/s, 84B/s]

trino> 
trino> 
trino> create table hive.test_db.create_table_insert ( 
    -> user_id,                           
    -> user_name,                        
    -> birthday,                            
    -> country                           
    -> ) as select * from hive.test_db.test_partition_tb;
CREATE TABLE: 2 rows

Query 20220209_103421_00011_jx84g, FINISHED, 2 nodes
Splits: 6 total, 6 done (100.00%)
2.42 [2 rows, 732B] [0 rows/s, 302B/s]

trino>
  • When inserting into a partitioned table through the Hive connector, partition directories that do not yet exist are created automatically
  • A table created with create table table_name(......) as select ...... copies the column structure but not the table properties; properties can be set explicitly on the new table
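The second point can be sketched as follows, supplying a table property explicitly on the new table (the table name and the PARQUET format here are illustrative):

trino> create table hive.test_db.create_table_insert_parquet
    -> with (format = 'PARQUET')
    -> as select * from hive.test_db.test_partition_tb;
CREATE TABLE: 2 rows

trino>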

7. Delete data from a partitioned table

For a non-transactional Hive table, the connector can only delete whole partitions at a time, so the WHERE clause must filter on partition columns, as both examples below do.

trino> 
trino> select * from hive.test_db.test_partition_tb;
 user_id | user_name |  birthday  | country 
---------+-----------+------------+---------
       1 | zhang_san | 2018-08-16 | china   
       2 | li_si     | 2019-08-16 | japan   
(2 rows)

Query 20220209_115202_00053_jx84g, FINISHED, 1 node
Splits: 3 total, 3 done (100.00%)
0.44 [2 rows, 732B] [4 rows/s, 1.62KB/s]

trino> 
trino> delete from hive.test_db.test_partition_tb where birthday = date '2019-08-16';
DELETE

Query 20220209_115234_00054_jx84g, FINISHED, 1 node
Splits: 1 total, 1 done (100.00%)
0.91 [0 rows, 0B] [0 rows/s, 0B/s]

trino> 
trino> select * from hive.test_db.test_partition_tb;
 user_id | user_name |  birthday  | country 
---------+-----------+------------+---------
       1 | zhang_san | 2018-08-16 | china   
(1 row)

Query 20220209_115259_00055_jx84g, FINISHED, 1 node
Splits: 2 total, 2 done (100.00%)
0.49 [1 rows, 374B] [2 rows/s, 759B/s]

trino> 
trino> select * from hive.test_db.external_partition_tb;
 user_id | user_name |  birthday  | country 
---------+-----------+------------+---------
 2       | li_si     | 2019-08-16 | japan   
 1       | zhang_san | 2018-08-16 | china   
(2 rows)

Query 20220209_115403_00056_jx84g, FINISHED, 1 node
Splits: 3 total, 3 done (100.00%)
0.39 [2 rows, 57B] [5 rows/s, 147B/s]

trino> 
trino> delete from hive.test_db.external_partition_tb where birthday = '2019-08-16';
DELETE

Query 20220209_115453_00059_jx84g, FINISHED, 1 node
Splits: 1 total, 1 done (100.00%)
0.42 [0 rows, 0B] [0 rows/s, 0B/s]

trino> 
trino> 
trino> select * from hive.test_db.external_partition_tb;
 user_id | user_name |  birthday  | country 
---------+-----------+------------+---------
 1       | zhang_san | 2018-08-16 | china   
(1 row)

Query 20220209_115649_00060_jx84g, FINISHED, 1 node
Splits: 2 total, 2 done (100.00%)
0.58 [1 rows, 32B] [1 rows/s, 55B/s]

trino> 