小言_互联网的博客

Hive1.1.0升级至2.3.6 踩坑记录

762人阅读  评论(0)

1、union all 左右字段类型不匹配
Hive尝试跨Hive类型组执行隐式转换。隐式转换支持类型如下表:

例:

hive> select 1 as c2,    2 as c2
    > union all
    > select 1.0 as c1, "2" as c1;
FAILED: SemanticException Schema of both sides of union should match: 
Column _col1 is of type int on first table and type string on second table. 
Cannot tell the position of null AST.

需要cast(“2” as int)强制将"2"转换为int

2、date_add和date_sub函数返回类型不再是string而是date
这会导致在使用自己写的一些UDF时,抛类型不匹配异常。

3、union all之间不能有order by,order by只能出现在union all之后
虽然order by之后再union all也没什么意义,但是就是有这种SQL出现在线上环境。

3、表/字段名或别名与保留字冲突
以下保留字若需要作为表/字段名或别名,需要加上反引号``

ALL, ALTER, AND, ARRAY, AS, AUTHORIZATION, BETWEEN, BIGINT, BINARY, BOOLEAN, BOTH, BY, CASE, CAST, CHAR, COLUMN, CONF, CREATE, CROSS, CUBE, CURRENT, CURRENT_DATE, CURRENT_TIMESTAMP, CURSOR, DATABASE, DATE, DECIMAL, DELETE, DESCRIBE, DISTINCT, DOUBLE, DROP, ELSE, END, EXCHANGE, EXISTS, EXTENDED, EXTERNAL, FALSE, FETCH, FLOAT, FOLLOWING, FOR, FROM, FULL, FUNCTION, GRANT, GROUP, GROUPING, HAVING, IF, IMPORT, IN, INNER, INSERT, INT, INTERSECT, INTERVAL, INTO, IS, JOIN, LATERAL, LEFT, LESS, LIKE, LOCAL, MACRO, MAP, MORE, NONE, NOT, NULL, OF, ON, OR, ORDER, OUT, OUTER, OVER, PARTIALSCAN, PARTITION, PERCENT, PRECEDING, PRESERVE, PROCEDURE, RANGE, READS, REDUCE, REVOKE, RIGHT, ROLLUP, ROW, ROWS, SELECT, SET, SMALLINT, TABLE, TABLESAMPLE, THEN, TIMESTAMP, TO, TRANSFORM, TRIGGER, TRUE, TRUNCATE, UNBOUNDED, UNION, UNIQUEJOIN, UPDATE, USER, USING, UTC_TMESTAMP, VALUES, VARCHAR, WHEN, WHERE, WINDOW, WITH, COMMIT, ONLY, REGEXP, RLIKE, ROLLBACK, START, CACHE, CONSTRAINT, FOREIGN, PRIMARY, REFERENCES, DAYOFWEEK, EXTRACT, FLOOR, INTEGER, PRECISION, VIEWS

4、设置参数大小写敏感

5、不指定类型 null类型已不兼容

6、元数据兼容升级
对Hive1.1元数据库进行以下操作,然后mysqldump -t导出数据,再删库重新用Hive2.3 schemaTool初始化元数据,最后将数据导入即可。

alter table partitions drop foreign key PARTITIONS_FK3;
alter table partitions drop link_target_id;
alter table tbls drop foreign key TBLS_FK3;
alter table tbls drop link_target_id;
alter table tbls drop OWNER_TYPE;
update roles set role_name=lower(role_name);

alter table tbls add `IS_REWRITE_ENABLED` bit(1) NOT NULL default '';

alter table notification_log add `MESSAGE_FORMAT` varchar(16) DEFAULT NULL;

drop table auth_user;
drop table tmp_user_map_d;
drop table version;

CREATE TABLE `key_constraints` (
`CHILD_CD_ID` bigint(20) DEFAULT NULL,
`CHILD_INTEGER_IDX` int(11) DEFAULT NULL,
`CHILD_TBL_ID` bigint(20) DEFAULT NULL,
`PARENT_CD_ID` bigint(20) NOT NULL,
`PARENT_INTEGER_IDX` int(11) NOT NULL,
`PARENT_TBL_ID` bigint(20) NOT NULL,
`POSITION` bigint(20) NOT NULL,
`CONSTRAINT_NAME` varchar(400) NOT NULL,
`CONSTRAINT_TYPE` smallint(6) NOT NULL,
`UPDATE_RULE` smallint(6) DEFAULT NULL,
`DELETE_RULE` smallint(6) DEFAULT NULL,
`ENABLE_VALIDATE_RELY` smallint(6) NOT NULL,
PRIMARY KEY (`CONSTRAINT_NAME`,`POSITION`),
KEY `CONSTRAINTS_PARENT_TABLE_ID_INDEX` (`PARENT_TBL_ID`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

7、对于用户角色,Hive2.3会将所有新建角色全转为小写存入mysql,而在鉴权的时候,又是直接用=比较的。这样就会导致之前Hive1.1中有大写字符的roles在鉴权的时候会失败。
异常如下:

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Error granting roles for A to role B: null

所以需要将roles表的role_name全转为小写:

update roles set role_name=lower(role_name)


8、SQL中单引号内的双引号个数若为奇数,则需要转义
如:'date":"(.+)"}'出现在SQL中会报<EOF>的错误
需要改成'date\":\"(.+)\"}'

9、权限ALL不能用了,需要替换成select,insert,update,delete
如果开启了权限认证(hive.security.authorization.enabled=true)
则不能再使用ALL来代替select,insert,update,delete
需要修改Hive库:
检查以下几张表:
DB_PRIVS
PART_COL_PRIVS
PART_PRIVS
TBL_COL_PRIVS
TBL_PRIVS

需要写个脚本,重新生成权限,将ALL转化为SELECT,UPDATE,INSERT,DELETE四条权限。
然后直接删除所有的ALL权限。

delete from TBL_PRIVS where TBL_PRIV like 'ALL%';
delete from DB_PRIVS where DB_PRIV like 'ALL%';

除此之外,也要留意下面这个参数配置:

<property>
    <name>hive.security.authorization.createtable.owner.grants</name>
    <value>select,insert,update,delete</value>
    <description>
      The privileges automatically granted to the owner whenever a table gets created.
      An example like "select,drop" will grant select and drop privilege to the owner
      of the table. Note that the default gives the creator of a table no access to the
      table (but see HIVE-8067).
    </description>
  </property>

10、执行SQL出现报错,怀疑是Hive Bug
报错:

Task with the most failures(4):
-----
Task ID:
  task_1566481621886_4925755_m_000000

URL:
  http://TXIDC65-bigdata-resourcemanager1:8088/taskdetails.jsp?jobid=job_1566481621886_4925755&tipid=task_1566481621886_4925755_m_000000
-----
Diagnostic Messages for this Task:
Error: java.io.IOException: java.lang.reflect.InvocationTargetException
	at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
	at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:271)
	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.<init>(HadoopShimsSecure.java:217)
	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getRecordReader(HadoopShimsSecure.java:345)
	at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:695)
	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:169)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:438)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1924)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:257)
	... 11 more
Caused by: java.lang.NullPointerException
	at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
	at org.apache.hadoop.hive.ql.io.parquet.ProjectionPusher.pushProjectionsAndFilters(ProjectionPusher.java:118)
	at org.apache.hadoop.hive.ql.io.parquet.ProjectionPusher.pushProjectionsAndFilters(ProjectionPusher.java:189)
	at org.apache.hadoop.hive.ql.io.parquet.ParquetRecordReaderBase.getSplit(ParquetRecordReaderBase.java:75)
	at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:75)
	at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:60)
	at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:75)
	at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:99)
	... 16 more
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask

解决方法:
在org/apache/hadoop/hive/ql/plan/TableScanDesc.java
private List<String> neededNestedColumnPaths;
改为private List<String> neededNestedColumnPaths = new ArrayList<String>();

11、MetaException(message:Version information not found in metastore. )
hive-site.xml中将hive.metastore.schema.verification设置成false。

12、权限表tbl_privs和db_privs中是否存在TBL_PRIV或者DB_PRIV为CREATE的权限,如果存在则根据具体情况替换为ALL/SELECT等权限。

13、权限版本不同
Hive1.1.0用的是AuthorizerV1:

<property>
    <name>hive.security.authorization.manager</name>
    <value>org.apache.hadoop.hive.ql.security.authorization.DefaultHiveAuthorizationProvider</value>
</property>

Hive2.3.6用的是AuthorizerV2:

<property>
    <name>hive.security.authorization.manager</name>
    <value>org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactory</value>
</property>

这两者有相当大的差别,比如在一个库中建表或删表,V2中必须是该库的owner,而V1则只需要有相应的权限即可。

解决办法:
1、升级权限,不得不说V2管理权限更为严谨,更符合数仓规范。
2、修改Operation2Privilege.java,将相应的operation的权限设为你需要的。

14、User: root is not allowed to impersonate root

20/01/14 10:11:59 ERROR HiveConnection: Error opening session
org.apache.thrift.protocol.TProtocolException: Required field 'serverProtocolVersion' is unset! Struct:TOpenSessionResp(status:TStatus(statusCode:ERROR_STATUS, infoMessages:[*org.apache.hive.service.cli.HiveSQLException:Failed to open new session: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: root is not allowed to impersonate root:14:13, 

这个不算是迁移问题,可以参考这篇文章解决。

我们这边是设置了

  <property>
    <name>hadoop.proxyuser.hive.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hive.groups</name>
    <value>*</value>
  </property>

因此只需要使用su hive,用hive去启动metastore和hiveserver即可。

15、set hive.vectorized.execution.enabled=false
这个参数默认是false ,向量化执行。
若开启就会报错:(原因待查)

Error: java.lang.RuntimeException: Error in configuring object
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:456)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1924)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
	... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
	at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
	... 14 more
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)

16、双引号问题

from_unixtime(unix_timestamp(regexp_replace(regexp_replace(regexp_extract(get_json_object(content,'$.createTime'),'date":"(.+)"}',1),'T',' '),'Z',''))+28800,'yyyy-MM-dd HH:mm:ss') create_time

需要转义:

from_unixtime(unix_timestamp(regexp_replace(regexp_replace(regexp_extract(get_json_object(content,'$.createTime'),'date\":\"(.+)\"}',1),'T',' '),'Z',''))+28800,'yyyy-MM-dd HH:mm:ss') create_time

转载:https://blog.csdn.net/zyzzxycj/article/details/103611761
查看评论
* 以上用户言论只代表其个人观点,不代表本网站的观点或立场