Common Hive Commands

1. Fuzzy-search for tables by name:
   show tables like '*name*';

2. View table structure:
   desc formatted tablename;
   desc tablename;

3. List a table's partitions:
   show partitions tablename;

4. Query data by partition:
   select table_column from tablename where partitionname = '2016-02-25';

5. Drop partitions:
   alter table test drop partition (dt='2016-03-01');
   alter table test drop partition (dt='2016-01-17');

6. Kill a job (run from the system shell, not inside the hive shell):
   hadoop job -kill job_201403041453_58315

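7. Run Hive from the command line.
   Run a single query. MapReduce progress is shown on the terminal, the query result is printed at the end, and the hive process then exits without entering interactive mode:
   hive -e 'select table_column from tablename'
   With -S (silent mode) the MapReduce progress is suppressed and only the query result is printed. Silent mode is handy when a third-party program invokes hive and reads the result set from hive's standard output:
   hive -S -e 'select table_column from tablename'
   Execute a SQL file:
   hive -f hive_sql.sql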

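8. Check file sizes and delete files from inside the hive shell:
   hive> dfs -du /xx/xx;
   hive> dfs -rmr /xx/xx;

9. Using mapjoin. Typical use cases: (1) one of the joined tables is very small; (2) the join is a non-equi join.
   select /*+ mapjoin(A) */ f.a, f.b from A t join B f on (f.a = t.a and f.ftime = 20110802);

10. Enable fetch-task conversion so that simple queries run without MapReduce:
   set hive.fetch.task.conversion=more;

11. Rename a table:
   ALTER TABLE oldtablename RENAME TO newtablename;

12. Add a column, or change an existing one:
   alter table temp add columns (current_session_timelenth_count bigint comment 'total time spent on page');
   alter table temp change current_session_timelenth current_session_timelenth bigint comment 'current session duration';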
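Item 7 above mentions that a third-party program can call hive in silent mode and read the result set from standard output. A minimal sketch of that idea in Python follows. The function names are hypothetical, and it assumes that the hive executable is on the PATH and that the CLI prints one row per line with tab-separated columns (the usual default).

```python
import subprocess


def build_hive_command(query, silent=True):
    """Return the argument list for a one-shot, non-interactive hive query."""
    cmd = ["hive"]
    if silent:
        cmd.append("-S")  # silent mode: no MapReduce progress in the output
    cmd += ["-e", query]
    return cmd


def parse_rows(stdout_text):
    """Split CLI output into rows of column strings (assumes tab-separated columns)."""
    return [line.split("\t") for line in stdout_text.splitlines() if line]


def run_query(query):
    """Run the query through the hive CLI and return the parsed rows."""
    result = subprocess.run(
        build_hive_command(query),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if hive exits with an error
    )
    return parse_rows(result.stdout)
```

A caller would then write something like rows = run_query("select table_column from tablename"). All values come back as strings, so any type conversion is left to the caller.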

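------------------------------------------------------------------------------------------

Hive supports much of the existing functionality of SQL's Data Manipulation Language (DML), including: filtering table rows with a where condition; select expressions that use columns or subqueries; joining multiple tables with equi-joins; combining all rows of several tables or subqueries; aggregating over multiple grouping columns; storing query results in another table; and exporting table contents to a local directory or an HDFS directory.

The examples below are taken from an external reference.

1. Query only the first two rows:
   select * from student limit 2;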

2. Count the rows in a table:
   select count(*) from student;

3. Sum the id column of a table:
   select sum(id) from student;

4. Query a partitioned table:
   select * from beauties where nation = 'China';

5. Join multiple tables:
   select t.account, u.name, t.income, t.expenses, t.surplus from user_info u join (select account, sum(income) as income, sum(expenses) as expenses, sum(income - expenses) as surplus from trade_detail group by account) t on u.account = t.account;
   Aliases (order is a reserved word in Hive, so that column name is quoted with backticks):
   select count(distinct e.uid) from (select * from tablename where rank <= 3 and `order` = 1) e;
   The parenthesized subquery also returns a table; e is simply its temporary alias.
   Find the keywords searched by users who have searched for "奥巴马" (Obama):
   select m.uid, m.keyword from tablename m join (select distinct uid from tablename where keyword like '%奥巴马%') n on m.uid = n.uid;
   Find the keywords searched by those users, excluding keywords that contain "奥巴马" itself:
   select m.uid, m.keyword from sogou_20111230 m join (select distinct uid from sogou_20111230 where keyword like '%奥巴马%') n on m.uid = n.uid where m.keyword not like '%奥巴马%';
   UNION ALL merges the rows of two or more tables:
   select count(distinct e.uid) from (select * from tablename where rank < 11 union all select * from ext_sogou_20111230_limit3 where rank < 11) e;

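6. Deduplicate with group by (only the grouped column can be selected without an aggregate):
   select uid from mytable group by uid;
   group by is usually combined with aggregate functions: the result is grouped by one or more columns, and an aggregate is then computed for each group.
   select year(ts), avg(rank) from tablename where ts like '%2011' group by year(ts);

7. Count the distinct UIDs:
   select count(distinct(uid)) from mytable;
   (efficient), or:
   select count(*) from (select uid from mytable group by uid) a;

8. Rank keywords by frequency (the 50 most frequent):
   select keyword, count(*) as cnt from test group by keyword order by cnt desc limit 50;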

9. Protect a partition against being dropped:
   alter table tablename partition (day='0925') enable no_drop;
   Remove the drop protection:
   alter table tablename partition (day='20161207') disable no_drop;

10. Protect a partition against being queried:
   alter table tablename partition (day='20161207') enable offline;
   Remove the query protection:
   alter table tablename partition (day='20161207') disable offline;

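11. Type cast:
   select cast(rank as DOUBLE) from tablename limit 10;

12. Concatenation:
   select concat(uid, url) from tablename limit 10;

13. Find the position of the first occurrence of the string "str" in url after position 5:
   select locate("str", url, 5) from tablename limit 100;

14. From the string "str", extract the 5th group matched by the regular expression url:
   select regexp_extract("str", url, 5) from tablename limit 100;

15. Split the string uid on the regular expression "0" and return the pieces as an array of strings:
   select split(uid, "0") from tablename limit 100;

16. Take the substring of url of length 3 starting at position 0 (Hive counts positions from 1 and treats a start of 0 the same as 1):
   select substr(url, 0, 3) from tablename limit 3;

17. Convert all letters in the string url to upper case:
   select upper(url) from tablename limit 3;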

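18. Conditional queries with where ... and and where ... or (order is quoted as a reserved word):
   select * from tablename where rank <= 3 and `order` = 1 limit 3;
   select * from tablename where rank != 0 or `order` = 1 limit 3;

19. Filter strings with like:
   select * from tablename where url like '%http%' limit 10;
   rlike filters with Java regular expressions, where .* plays the same role as % does in like. It is an operator that Hive adds as an extension:
   select * from tablename where url rlike '.*http.*' limit 3;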

20. left semi join ("semi" means half: only rows of the left table that have a match in the right table are returned):
   select * from be where rank in (1, 2, 5);
   select * from tablename m left semi join ext_sogou_20111230_limit3 n on m.rank = n.rank;

21. Views. Hive supports only logical views; their purpose is to reduce query complexity. Create a view:
   create view sogou_view as select * from tablename where rank <= 3;

22. Indexes. A Hive index is implemented through a separately created table. Create an index:
   CREATE INDEX employees_index ON TABLE employees (name) AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' WITH DEFERRED REBUILD IDXPROPERTIES ('creator' = 'me', 'created_at' = 'some time') IN TABLE employees_index_table;

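23. Among users who searched for "百度" (Baidu) between 7 and 9 in the morning, find those who clicked directly on a Baidu URL.
   Reference answer:
   select distinct n.uid from (select * from sogou_view where keyword = '百度' and substr(ts,9,2) in ('07','08','09')) n where n.url like '%baidu.com%';
   Variants of the same filter for URLs on www.ganji.com (the hour range has to be combined with and, since an or between the two bounds matches every hour):
   select uid from sogou_view where (cast(substr(ts,9,2) as int) >= 7 and cast(substr(ts,9,2) as int) <= 9) and url like '%www.ganji.com%' or keyword like '%百度%';
   select uid from sogou_view where substr(ts,9,2) in ('07','08','09') and url like '%www.ganji.com%' and keyword like '%百度%';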

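24. Save Hive query results to a local file. This is the most common approach; the query result is written directly to the file named in the shell redirection (here /Downloads/result.txt):
   $ hive -e "select user, login_timestamp from user_login" > /Downloads/result.txt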
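The counting queries from items 7 and 8 of the list above can be tried on a tiny dataset without a cluster. The sketch below uses Python's built-in sqlite3 module purely as a stand-in for Hive, so it only illustrates the SQL logic, not Hive's execution or performance. The table contents are made-up sample rows.

```python
import sqlite3

# In-memory SQLite database standing in for a Hive table (assumption: sample data only).
conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (uid text, keyword text)")
conn.executemany(
    "insert into mytable values (?, ?)",
    [
        ("u1", "hive"), ("u1", "hadoop"),
        ("u2", "hive"), ("u2", "hadoop"),
        ("u3", "hive"), ("u3", "spark"),
    ],
)

# Item 7: two ways of counting distinct UIDs give the same number.
distinct_count = conn.execute(
    "select count(distinct uid) from mytable"
).fetchone()[0]
grouped_count = conn.execute(
    "select count(*) from (select uid from mytable group by uid) a"
).fetchone()[0]

# Item 8: keywords ranked by frequency, top 2 here instead of top 50.
top_keywords = conn.execute(
    "select keyword, count(*) as cnt from mytable "
    "group by keyword order by cnt desc limit 2"
).fetchall()

print(distinct_count, grouped_count, top_keywords)
```

Both counts come out as 3 for this sample, and the ranking lists hive (3 rows) ahead of hadoop (2 rows).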

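The hour filter in item 23 relies on substr(ts,9,2) picking the hour out of the timestamp. The sketch below mirrors that logic in Python, assuming ts is a string of the form yyyyMMddHHmmss (an assumption inferred from the substr positions, not stated in the list). It also shows why a range test written as "hour > 7 or hour < 9" cannot work as a filter: every hour satisfies at least one side.

```python
def hour_of(ts):
    """Return the two-character hour, the counterpart of Hive's substr(ts, 9, 2)."""
    # Hive positions are 1-based, so position 9 corresponds to Python index 8.
    return ts[8:10]


def in_morning_window(ts):
    """True when the hour is 07, 08 or 09, like substr(ts,9,2) in ('07','08','09')."""
    return hour_of(ts) in ("07", "08", "09")


def or_range_filter(ts):
    """A range test joined with 'or': true for every possible hour."""
    hour = int(hour_of(ts))
    return hour > 7 or hour < 9


if __name__ == "__main__":
    for sample in ("20111230081530", "20111230231530"):
        print(sample, hour_of(sample), in_morning_window(sample), or_range_filter(sample))
```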