Hive:Hive 安装部署示例

1. 环境准备

  • 由于 Apache Hive 是一款基于 Hadoop 的数据仓库软件,通常部署运行在 Linux 系统之上。因此不管使用何种方式配置 Hive Metastore,必须要先保证服务器的基础环境正常,Hadoop 集群健康可用。
  • 服务器基础环境
    • 集群时间同步、防火墙关闭、主机Host映射、免密登录、JDK安装
  • Hadoop 集群健康可用
    • 启动 Hive 之前必须先启动 Hadoop 集群。
    • 特别要注意,需等待 HDFS 安全模式关闭之后再启动运行 Hive。
    • Hive 不是分布式安装运行的软件,其分布式的特性主要借由 Hadoop 完成,包括分布式存储、分布式计算。

2. 安装 MySQL

2.1. 删除 自带mariadb

[root@hadoop102 ~]# rpm -qa|grep mariadb
mariadb-libs-5.5.60-1.el7_5.x86_64
[root@hadoop102 ~]# rpm -e mariadb-libs-5.5.60-1.el7_5.x86_64 --nodeps
[root@hadoop102 ~]# rpm -qa|grep mariadb
[root@hadoop102 ~]# 

2.2. 安装软件

  • 上传文件安装包

  • 解压缩
      tar xvf mysql-5.7.29-1.el7.x86_64.rpm-bundle.tar
    
      [root@hadoop102 mysql]# tar xvf mysql-5.7.29-1.el7.x86_64.rpm-bundle.tar 
      mysql-community-embedded-devel-5.7.29-1.el7.x86_64.rpm
      mysql-community-test-5.7.29-1.el7.x86_64.rpm
      mysql-community-embedded-5.7.29-1.el7.x86_64.rpm
      mysql-community-embedded-compat-5.7.29-1.el7.x86_64.rpm
      mysql-community-libs-5.7.29-1.el7.x86_64.rpm
      mysql-community-client-5.7.29-1.el7.x86_64.rpm
      mysql-community-server-5.7.29-1.el7.x86_64.rpm
      mysql-community-devel-5.7.29-1.el7.x86_64.rpm
      mysql-community-libs-compat-5.7.29-1.el7.x86_64.rpm
      mysql-community-common-5.7.29-1.el7.x86_64.rpm
      [root@hadoop102 mysql]# ls
      mysql-5.7.29-1.el7.x86_64.rpm-bundle.tar                 mysql-community-embedded-devel-5.7.29-1.el7.x86_64.rpm
      mysql-community-client-5.7.29-1.el7.x86_64.rpm           mysql-community-libs-5.7.29-1.el7.x86_64.rpm
      mysql-community-common-5.7.29-1.el7.x86_64.rpm           mysql-community-libs-compat-5.7.29-1.el7.x86_64.rpm
      mysql-community-devel-5.7.29-1.el7.x86_64.rpm            mysql-community-server-5.7.29-1.el7.x86_64.rpm
      mysql-community-embedded-5.7.29-1.el7.x86_64.rpm         mysql-community-test-5.7.29-1.el7.x86_64.rpm
      mysql-community-embedded-compat-5.7.29-1.el7.x86_64.rpm
    
  • 安装依赖

      yum -y install libaio
    
      [root@hadoop102 mysql]# yum -y install libaio
      Loaded plugins: fastestmirror
      Determining fastest mirrors
      base                                                                                                                        | 3.6 kB  00:00:00     
      epel                                                                                                                        | 4.7 kB  00:00:00     
      extras                                                                                                                      | 2.9 kB  00:00:00     
      updates                                                                                                                     | 2.9 kB  00:00:00     
      (1/7): base/7/x86_64/group_gz                                                                                               | 153 kB  00:00:00     
      (2/7): epel/x86_64/group_gz                                                                                                 |  96 kB  00:00:00     
      (3/7): base/7/x86_64/primary_db                                                                                             | 6.1 MB  00:00:00     
      (4/7): epel/x86_64/updateinfo                                                                                               | 1.1 MB  00:00:00     
      (5/7): epel/x86_64/primary_db                                                                                               | 7.0 MB  00:00:00     
      (6/7): extras/7/x86_64/primary_db                                                                                           | 247 kB  00:00:00     
      (7/7): updates/7/x86_64/primary_db                                                                                          |  16 MB  00:00:00     
      Resolving Dependencies
      --> Running transaction check
      ---> Package libaio.x86_64 0:0.3.109-13.el7 will be installed
      --> Finished Dependency Resolution
        
      Dependencies Resolved
        
      ===================================================================================================================================================
       Package                          Arch                             Version                                    Repository                      Size
      ===================================================================================================================================================
      Installing:
       libaio                           x86_64                           0.3.109-13.el7                             base                            24 k
        
      Transaction Summary
      ===================================================================================================================================================
      Install  1 Package
        
      Total download size: 24 k
      Installed size: 38 k
      Downloading packages:
      libaio-0.3.109-13.el7.x86_64.rpm                                                                                            |  24 kB  00:00:00     
      Running transaction check
      Running transaction test
      Transaction test succeeded
      Running transaction
      ** Found 2 pre-existing rpmdb problem(s), 'yum check' output follows:
      2:postfix-2.10.1-6.el7.x86_64 has missing requires of libmysqlclient.so.18()(64bit)
      2:postfix-2.10.1-6.el7.x86_64 has missing requires of libmysqlclient.so.18(libmysqlclient_18)(64bit)
        Installing : libaio-0.3.109-13.el7.x86_64                                                                                                    1/1 
        Verifying  : libaio-0.3.109-13.el7.x86_64                                                                                                    1/1 
        
      Installed:
        libaio.x86_64 0:0.3.109-13.el7                                                                                                                   
        
      Complete!
    
  • 安装 mysql

      rpm -ivh mysql-community-common-5.7.29-1.el7.x86_64.rpm mysql-community-libs-5.7.29-1.el7.x86_64.rpm mysql-community-client-5.7.29-1.el7.x86_64.rpm mysql-community-server-5.7.29-1.el7.x86_64.rpm
    
      [root@hadoop102 mysql]# rpm -ivh mysql-community-common-5.7.29-1.el7.x86_64.rpm mysql-community-libs-5.7.29-1.el7.x86_64.rpm mysql-community-client-5.7.29-1.el7.x86_64.rpm mysql-community-server-5.7.29-1.el7.x86_64.rpm
      warning: mysql-community-common-5.7.29-1.el7.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
      Preparing...                          ################################# [100%]
      Updating / installing...
         1:mysql-community-common-5.7.29-1.e################################# [ 25%]
         2:mysql-community-libs-5.7.29-1.el7################################# [ 50%]
         3:mysql-community-client-5.7.29-1.e################################# [ 75%]
         4:mysql-community-server-5.7.29-1.e################################# [100%]
    

初始化配置 MySQL

  • 初始化启动
      mysqld --initialize
    
  • 更改所属组
      chown mysql:mysql /var/lib/mysql -R
    
  • 启动 mysql
      systemctl start mysqld.service
    
  • 查看生成的临时 root 密码

      cat  /var/log/mysqld.log
      ...
      [Note] A temporary password is generated for root@localhost: o+TU+KDOm004
    
  • 修改root密码 授权远程访问 设置开机自启动

    [root@node2 ~]# mysql -u root -p
    Enter password:     #这里输入在日志中生成的临时密码
    Welcome to the MySQL monitor.  Commands end with ; or \g.
    Your MySQL connection id is 3
    Server version: 5.7.29
      
    Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.
      
    Oracle is a registered trademark of Oracle Corporation and/or its
    affiliates. Other names may be trademarks of their respective
    owners.
      
    Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
      
    mysql> 
      
      
    #更新root密码  设置为hadoop
    mysql> alter user user() identified by "hadoop";
    Query OK, 0 rows affected (0.00 sec)
      
      
    #授权
    mysql> use mysql;
      
    mysql> GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY 'hadoop' WITH GRANT OPTION;
      
    mysql> FLUSH PRIVILEGES; 
      
    #mysql的启动和关闭 状态查看 (这几个命令必须记住)
    systemctl stop mysqld
    systemctl status mysqld
    systemctl start mysqld
      
    #建议设置为开机自启动服务
    [root@node2 ~]# systemctl enable  mysqld                             
    Created symlink from /etc/systemd/system/multi-user.target.wants/mysqld.service to /usr/lib/systemd/system/mysqld.service.
      
    #查看是否已经设置自启动成功
    [root@node2 ~]# systemctl list-unit-files | grep mysqld
    mysqld.service                                enabled 
    

3. Hive 安装部署

3.1. 部署方式

  • 单机部署
    • Hive 不是分布式软件,但是却又分布式的能力;
    • Hive 使用 HDFS 存储数据,而 HDFS 是分布式的;
    • Hive 计算数据使用 MR、Spark、Tez,而这些计算引擎是分布式的。

3.2. 上传解压

  • 上传文件 apache-hive-3.1.2-bin.tar.gz 到服务器。

3.3. 修改 guava 包

  • hive 自带的 guava 包版本太低,需要替换成 hadoop 中的 guava 包;
      cd hive/
      rm -rf lib/guava-19.0.jar
      cp /usr/local/hadoop-3.1.3/share/hadoop/common/lib/guava-27.0-jre.jar ./lib/
      cd lib/ & ls
    
  • 查看 guava 包已被更新至 hadoop 使用的更高版本 guava-27.0

3.4. 修改 hive 配置文件

  • 编辑 hive-env.sh

      vim hive-env.sh
    
      export HADOOP_HOME=/usr/local/hadoop-3.1.3
      export HIVE_CONF_DIR=/usr/local/hive/conf
      export HIVE_AUX_JARS_PATH=/usr/local/hive/lib
    
  • 编辑 hive-site.xml

      vim hive-site.xml
    
      <configuration>
      <!-- 存储元数据mysql相关配置 -->
      <property>
          <name>javax.jdo.option.ConnectionURL</name>
          <value>jdbc:mysql://hadoop102:3306/hive3?createDatabaseIfNotExist=true&amp;useSSL=false&amp;useUnicode=true&amp;characterEncoding=UTF-8</value>
      </property>
        
      <property>
          <name>javax.jdo.option.ConnectionDriverName</name>
          <value>com.mysql.jdbc.Driver</value>
      </property>
        
      <!-- 密码账号 -->
      <property>
          <name>javax.jdo.option.ConnectionUserName</name>
          <value>root</value>
      </property>
        
      <!-- mysql 密码 -->
      <property>
          <name>javax.jdo.option.ConnectionPassword</name>
          <value>hadoop</value>
      </property>
        
      <!-- H2S运行绑定host -->
      <property>
          <name>hive.server2.thrift.bind.host</name>
          <value>hadoop102</value>
      </property>
        
      <!-- 远程模式部署metastore metastore地址 -->
      <property>
          <name>hive.metastore.uris</name>
          <value>thrift://hadoop102:9083</value>
      </property>
        
      <!-- 关闭元数据存储授权  --> 
      <property>
          <name>hive.metastore.event.db.notification.api.auth</name>
          <value>false</value>
      </property>
      </configuration>
    

3.5. 修改 Hadoop 中 core-site.xml

  • 因为Hive需要把数据存储在HDFS上,并且通过MapReduce作为执行引擎处理数据;
  • 因此需要在Hadoop中添加相关配置属性,以满足Hive在Hadoop上运行。

  • core-site.xml

      vim core-site.xml
    
      <!--该参数表示可以通过httpfs接口hdfs的ip地址限制-->
      <property>
          <name>hadoop.proxyuser.root.hosts</name>
          <value>*</value>
      </property>
        
      <!--通过httpfs接口访问的用户获得的群组身份-->
      <property>
          <name>hadoop.proxyuser.root.groups</name>
          <value>*</value>
      </property>
        
    
  • 修改文件后,需要在 Hadoop 集群同步配置文件,重启 hadoop 生效。

3.6。 重启 hadoop 集群

  • 略。

3.5. 上传 mysql jar 包

  • 上传 jar 文件 mysql-connector-java-5.1.32.jar

4. 启动 Hive

4.1. 初始化 hive 元数据

  • schema 初始化命令

      cd /hive/bin
      ./schematool -initSchema -dbType mysql -verbos
    
  • 下面提示标识初始化成功

      ...
      Initialization script completed
      schemaTool completed
    

4.2. 启动 metastore 服务

  • 前台启动
    • 可以根据需求添加参数开启 debug 日志,获取详细日志信息,便于排错。
    • 进程会一直占据终端,ctrl + c 结束进程,服务关闭。 ``` /hive/bin/hive –service metastore

    /hive/bin/hive –service metastore –hiveconf hive.root.logger=DEBUG,console
    ```

  • 后台启动
      nohup ./hive/bin/hive --service metastore &
    

5. Hive 客户端使用

5.1. Hive自带客户端

  • Hive 发展至今,总共历经了两代客户端工具。
  • 第一代客户端 bin/hive(deprecated不推荐使用):
    • $HIVE_HOME/bin/hive, 是一个 shellUtil。
    • 主要功能:
      • 一是可用于以交互或批处理模式运行Hive查询;
      • 二是用于Hive相关服务的启动,比如metastore服务。
  • 第二代客户端 bin/beeline(recommended 推荐使用):
    • $HIVE_HOME/bin/beeline,是一个JDBC客户端,是官方强烈推荐使用的Hive命令行工具,和第一代客户端相比,性能加强安全性提高。

5.2. HiveServer2 服务

  • 远程模式下 beeline 通过 Thrift 连接到单独的 HiveServer2 服务上,这也是官方推荐在生产环境中使用的模式。
  • HiveServer2 支持多客户端的并发和身份认证,旨在为开放 API 客户端如 JDBC、ODBC 提供更好的支持。

    HiveServer2 通过 Metastore 服务读写元数据,在远程模式下,启动 HiveServer2 之前必须先首先启动 metastore服务。

  • 特别注意:
    • 远程模式下,Beeline 客户端只能通过 HiveServer2 服务访问 Hive,而 bin/hive 是通过Metastore服务访问的。
  • hive 客户端和服务的具体关系,如下图:

5.3. 在 hadoop103 使用 beeline 客户端 远程访问 HiveServer2

  • scp hive 到 hadoop103

      scp -r hive hadoop103:/usr/local
    
  • 启动 hiveserver2

      nohup /hive/bin/hive --service metastore &
      nohup /hive/bin/hive --service hiveserver2 &
    
  • 启动 beeline

      cd /hive
      ./beeline
    
  • 处理报错:Cannot find hadoop installation

    • 报错内容:Cannot find hadoop installation: $HADOOP_HOME or $HADOOP_PREFIX must be set or hadoop must be in the path
  • 解决办法:执行 source hive-env.sh

      cd /hive/conf
      source hive-env.sh
    
  • 再次启动 beeline

      cd /hive
      ./beeline
    
  • 得到如下提示

      [root@hadoop103 bin]# ./beeline
      SLF4J: Class path contains multiple SLF4J bindings.
      SLF4J: Found binding in [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
      SLF4J: Found binding in [jar:file:/usr/local/hadoop-3.1.3/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
      SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
      SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
      SLF4J: Class path contains multiple SLF4J bindings.
      SLF4J: Found binding in [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
      SLF4J: Found binding in [jar:file:/usr/local/apache-hive-3.1.2-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
      SLF4J: Found binding in [jar:file:/usr/local/hadoop-3.1.3/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
      SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
      SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
      Beeline version 3.1.2 by Apache Hive
      beeline> 
    
  • 连接 HS2

      beeline> ! connect jdbc:hive2://hadoop102:10000
      Connecting to jdbc:hive2://hadoop102:10000
      Enter username for jdbc:hive2://hadoop102:10000: root
      Enter password for jdbc:hive2://hadoop102:10000: 
      Connected to: Apache Hive (version 3.1.2)
      Driver: Hive JDBC (version 3.1.2)
      Transaction isolation: TRANSACTION_REPEATABLE_READ
      0: jdbc:hive2://hadoop102:10000> 
    
  • 执行 show tables;

      0: jdbc:hive2://hadoop102:10000> show tables;
      INFO  : Compiling command(queryId=root_20220630230711_bbf7a441-313a-4a33-b3dc-b4192c9b4fc1): show tables
      INFO  : Concurrency mode is disabled, not creating a lock manager
      INFO  : Semantic Analysis Completed (retrial = false)
      INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from deserializer)], properties:null)
      INFO  : Completed compiling command(queryId=root_20220630230711_bbf7a441-313a-4a33-b3dc-b4192c9b4fc1); Time taken: 1.139 seconds
      INFO  : Concurrency mode is disabled, not creating a lock manager
      INFO  : Executing command(queryId=root_20220630230711_bbf7a441-313a-4a33-b3dc-b4192c9b4fc1): show tables
      INFO  : Starting task [Stage-0:DDL] in serial mode
      INFO  : Completed executing command(queryId=root_20220630230711_bbf7a441-313a-4a33-b3dc-b4192c9b4fc1); Time taken: 0.09 seconds
      INFO  : OK
      INFO  : Concurrency mode is disabled, not creating a lock manager
      +-----------+
      | tab_name  |
      +-----------+
      +-----------+
      No rows selected (1.576 seconds)
      0: jdbc:hive2://hadoop102:10000> 
    

6. DataGrip 连接 Hive

  • 建立连接

  • 创建 database

      create database test;
    
  • 查看库

    • 文件位置:/user/hive/warehouse
    • 看到 test.db;