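Objective

Learn the basic Hive operations available through the Hive shell.

Principle

Hive can be operated in two ways: by issuing commands in the Hive shell from a terminal, or through the Java API. This lab covers the basic Hive shell operations; the next lab covers the Java API.

Procedure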

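Step 1. Learn the Hive data types.

Hive data types fall into four categories:

Column types

Literals

Null values

Complex types

Column types are the data types used for Hive table columns. They include integral types such as INT, the string types CHAR and VARCHAR, TIMESTAMP, DATE, DECIMAL, and union types.

Literals include floating-point literals and decimal literals.

NULL is a special value that represents a missing value.

Complex types include arrays, maps, and structs.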
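To make the four categories concrete, the query below is a minimal sketch that produces one value of each kind once you are inside the Hive CLI (Step 2). It assumes a Hive version that allows SELECT without a FROM clause (0.13 or later); the column aliases and sample values are illustrative only.

```sql
-- Literals, a NULL value, and the three complex types in a single row.
select
  42                                        as integer_literal,
  3.14                                      as fractional_literal,
  cast(null as string)                      as null_value,
  array('a', 'b')                           as array_value,    -- ordered list of same-typed elements
  map('k1', 1, 'k2', 2)                     as map_value,      -- key/value pairs
  named_struct('city', 'NY', 'zip', 10001)  as struct_value;   -- named fields
```

The functions array, map, and named_struct are Hive built-ins for constructing complex values without first creating a table.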

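Step 2. Run the hive command to enter the Hive CLI.

#cd /usr/local/hive
#bin/hive

We are now in the Hive CLI, which shows the following prompt:

hive>

Unless stated otherwise, all later operations are performed in the Hive CLI (that is, at the hive> prompt). If you exit by accident, start the CLI again.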

Step 3. Database operations.

1. Create a database if it does not already exist (the built-in default database is named default):

create database if not exists test;

2. List the databases in Hive:

show databases;

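3. If there are many databases, filter the names with a pattern. Hive accepts only a limited wildcard syntax here rather than full regular expressions, and the supported wildcards depend on the Hive version: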

show databases like 't.*';

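4. Create a database and specify where it is stored (by default, databases are stored under the directory given by hive.metastore.warehouse.dir):

create database test01 location '/data1';

5. Add a description when creating a database:

create database test02 comment 'this is a database named test02';

6. View the description of a database:

describe database test02;

7. Switch to a database:

use test01;

8. Drop a database if it exists:

drop database if exists test;

9. By default, Hive does not allow you to drop a database that contains tables.

Either drop the tables first and then the database, or add the keyword cascade to the command (the default is restrict):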

drop database if exists test01 cascade;

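When a database is dropped, its directory is deleted as well.

Item 7 made test01 the current database, and the later steps create tables without a database prefix and expect them in default (for example, the export in Step 5 reads /user/hive/warehouse/employees06). Switch back to the default database before continuing:

use default;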
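Databases can also carry key/value properties, which show up when the description is requested in extended form. The sketch below uses a hypothetical database name (demo_db) and hypothetical property keys so that it does not interfere with the databases used elsewhere in this lab.

```sql
-- demo_db and its properties are hypothetical names used only for this example.
create database if not exists demo_db
  comment 'scratch database for the lab'
  with dbproperties ('creator' = 'me', 'purpose' = 'demo');

-- EXTENDED additionally prints the database properties.
describe database extended demo_db;

-- Clean up. RESTRICT is the default, so this succeeds only while demo_db has no tables.
drop database if exists demo_db;
```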

Step 4. Table operations.

1. Create a table:

create table if not exists test02.employees (
  name string comment 'Employee name',
  salary float comment 'employee salary',
  subordinates array<string> comment 'Names of subordinates',
  deductions map<string, float> comment 'keys are deductions names, values are percentages',
  address struct<street:string, city:string, state:string, zip:int> comment 'Home address'
)
comment 'Description of the table'
tblproperties ('creator'='me', 'created_date'='2015-12-22');

If the current database is not the target database, prefix the table name with the database name, as with test02 here, the database created earlier.
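The complex columns of this table are read with their own access syntax. The query below is a small sketch of the three forms; the map key 'Federal Taxes' is a hypothetical example, and because the table holds no data yet the query returns no rows until some are loaded.

```sql
-- Access patterns for the complex columns of test02.employees.
select
  name,
  subordinates[0]             as first_subordinate,  -- array element by zero-based index
  deductions['Federal Taxes'] as federal_tax,        -- map value by key (hypothetical key)
  address.city                as city                -- struct field by dot notation
from test02.employees;
```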

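2. Copy the schema of an existing table without copying its data:

create table if not exists test02.employees2 like test02.employees;

3. List all tables in the current database:

show tables;

4. List all tables in a specified database:

show tables in default;

5. Filter table names with a pattern:

show tables like 'empl.*';

6. View the structure of a table:

describe test02.employees;

7. View detailed structure information for a table:

describe extended test02.employees;

8. View the information for a single column:

describe test02.employees salary;

9. Create an external table: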

create external table if not exists stocks(name STRING, age INT, address STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION '/data1/stocks';

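This table can read all comma-delimited data located under the /data1/stocks directory.

Because the table is external, Hive does not assume that it owns the data, so dropping the table does not delete the data.

10. Create a table by copying an existing one: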

create external table if not exists test02.employees3 like test02.employees;

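If the external keyword is omitted and the source table is external, the new table is also external.

If the external keyword is omitted and the source table is managed, the new table is also managed.

However, if the statement includes the external keyword and the source table is managed, the new table is external. Because this behavior may differ between Hive versions, check the tableType field in the output of describe extended to confirm the result.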
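One way to check whether a table ended up managed or external is describe formatted, which prints the same metadata as describe extended in a more readable layout. The sketch below inspects two tables created earlier in this step.

```sql
-- Look for the "Table Type:" row in the output:
-- MANAGED_TABLE for a managed table, EXTERNAL_TABLE for an external one.
describe formatted test02.employees;
describe formatted test02.employees2;
```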

11. Create a partitioned table:

create table employees06 (name string, age int, phone string) partitioned by (address string, city string) row format delimited fields terminated by '\t';

12. Add a partition:

alter table employees06 add if not exists partition(address='US',city='NY');

13. List all partitions:

show partitions employees06;

14. List the partitions that match a partition specification:

show partitions employees06 partition(address='US');

15. Create an external partitioned table:

create external table if not exists log_messages (hms int) partitioned by (year int, month int, day int) row format delimited fields terminated by '\t';

16. Drop a table:

drop table if exists employees;

17. Rename a table:

alter table log_messages rename to logmessages;

18. Add, modify, and drop partitions. The following statement adds two partitions at once:

alter table logmessages add if not exists partition (year = 2015, month = 12, day = 23) location '/logs/2015/12/23' partition (year = 2015, month = 12, day = 22) location '/logs/2015/12/22';

19. Drop a partition:

alter table logmessages drop if exists partition (year = 2015, month = 12, day = 22);
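Modifying a partition usually means pointing it at a different directory. The sketch below does this for the remaining partition of logmessages. The namenode host, port, and target path are hypothetical placeholders, and a full URI is used because some Hive versions reject a location without a scheme. The statement changes only the metadata; it does not move any existing files.

```sql
-- Point an existing partition at a different directory.
-- The namenode host, port, and target path below are hypothetical placeholders.
alter table logmessages partition (year = 2015, month = 12, day = 23)
set location 'hdfs://namenode:8020/logs_archive/2015/12/23';

-- The list of partitions itself is unchanged by the statement above.
show partitions logmessages;
```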

20. Add columns:

alter table logmessages add columns (app_name string comment 'application name');

21. Change a column by renaming a field and modifying its position, type, or comment:

alter table logmessages change column hms hours_minutes int comment 'the hours and minutes' after app_name;

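This renames the column hms to hours_minutes, adds a comment, and places it after the column app_name. To move the column to the first position instead, use the keyword first in place of after app_name.

22. Delete or replace columns:

alter table logmessages replace columns (hms int);

This removes all existing columns from the table and adds a new column hms. Because it is an alter statement, only the table metadata changes; the existing partitions remain.

23. Modify table properties: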

alter table logmessages set tblproperties('notes' = 'the process');
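The properties stored on a table can be read back with show tblproperties, either all at once or one key at a time. A short sketch, using the notes key set above:

```sql
-- List every property of the table.
show tblproperties logmessages;

-- Show only the value of a single property.
show tblproperties logmessages("notes");
```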

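Step 5. Import and export data.

First create the data to be imported under /home/user/data (user stands for the current user; replace it with your own user name here and below).

Open another terminal. Lines that start with # are shell commands.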

#cd ~
#sudo mkdir data
#cd data
#sudo touch employees06-data.txt
#sudo gedit employees06-data.txt

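Enter the following content in the file, using the Tab key as the field separator:

zhang	23	12345678912
zhao	45	12345678912

Save and close the file, then exit the terminal.

1. Back in the Hive CLI, load employees06-data.txt into the table employees06: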

load data local inpath '/home/user/data/employees06-data.txt' overwrite into table employees06 partition (address = 'US', city = 'CA');

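If the target table is not partitioned, omit the partition clause.

Keyword LOCAL: with LOCAL, the path refers to the local file system and the data is copied to the target location. Without LOCAL, the path refers to the distributed file system, and the file is moved rather than copied.

Keyword OVERWRITE: with this keyword, any data already in the target directory is deleted first. Without it, the new files are simply added to the target directory and the existing data is kept.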
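After the load it is worth checking that the rows landed in the expected partition and were split into columns correctly. The queries below are a simple sketch of such a check.

```sql
-- The partition address=US/city=CA should now appear in the list.
show partitions employees06;

-- Read back the rows from the partition that was just loaded.
select name, age, phone
from employees06
where address = 'US' and city = 'CA';
```

If age and phone come back as NULL while name contains the whole line, the file was most likely separated by spaces instead of tabs.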

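2. Insert data from one table into another with a query. First create the target table:

create table employees07 (name string, age int, phone string) partitioned by (address string, city string) row format delimited fields terminated by '\t';

There are three ways to insert:

3. Dynamic partition insert (this requires hive.exec.dynamic.partition.mode to be set to nonstrict in hive-site.xml; you do not need to run the statement below in this lab, so skip ahead to item 4):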

INSERT OVERWRITE TABLE employees07 PARTITION(address,city) SELECT t.name,t.age,t.phone,t.address,t.city FROM employees06 t;
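As an alternative to editing hive-site.xml, the same properties can be set for the current CLI session only. The following is a minimal sketch; the settings are lost when the session ends.

```sql
-- Session-level settings; they apply only to the current Hive CLI session.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- Print the current value to confirm the change took effect.
set hive.exec.dynamic.partition.mode;
```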

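4. Static partition insert (both partition values are fixed in the statement):

INSERT OVERWRITE TABLE employees07 PARTITION(address='US',city='NY') SELECT t.name,t.age,t.phone from employees06 t where t.address='US' and t.city='NY';

Note that the load data statement above placed the file in the partition city='CA', so this query, which filters on city='NY', selects no rows. To copy the loaded rows, use 'CA' in both the partition clause and the where clause.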
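In a mixed static and dynamic insert, the leading partition columns are given fixed values and the trailing ones are taken from the query. The sketch below fixes address and lets Hive derive city from the last selected column. It assumes dynamic partitioning is enabled (hive.exec.dynamic.partition=true); because one partition column is static, the strict mode should be sufficient.

```sql
-- address is static; city is dynamic and must be the last column in the select list.
insert overwrite table employees07 partition (address = 'US', city)
select t.name, t.age, t.phone, t.city
from employees06 t
where t.address = 'US';

-- One partition per distinct city value found in the source rows.
show partitions employees07;
```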

5. Insert while creating a table (CREATE TABLE AS SELECT):

CREATE TABLE employees08 AS SELECT * FROM employees06;

There are two ways to export data:

Copy the files directly from the table's storage location:

6. Open another terminal and run:

#cd /usr/local/hadoop
#bin/hdfs dfs -get /user/hive/warehouse/employees06 /home/user/outdata

7. Use INSERT ... DIRECTORY:

INSERT OVERWRITE LOCAL DIRECTORY '/home/user/outdata1' SELECT * FROM employees06;

The exported data can then be viewed in /home/user/outdata and /home/user/outdata1.
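By default the files written by INSERT ... DIRECTORY use Hive's internal field separator, which is hard to read in a text editor. The sketch below writes tab-separated output instead. It assumes Hive 0.11 or later, where the row format clause is accepted in this statement, and the directory /home/user/outdata2 is a hypothetical path chosen for this example.

```sql
-- Export as tab-separated text; /home/user/outdata2 is a hypothetical target directory.
-- OVERWRITE replaces whatever the directory already contains, so choose an empty or new path.
insert overwrite local directory '/home/user/outdata2'
row format delimited fields terminated by '\t'
select * from employees06;
```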

Lab 2: Basic Hive Operations
