1. Mismatched column types between the two sides of a UNION ALL
Hive tries to convert implicitly across its type groups, but only certain conversions are allowed (see the allowed implicit conversions table in the Hive documentation).
Example:
hive> select 1 as c1, 2 as c2
    > union all
    > select 1.0 as c1, "2" as c2;
FAILED: SemanticException Schema of both sides of union should match:
Column _col1 is of type int on first table and type string on second table.
Cannot tell the position of null AST.
Use cast("2" as int) to force "2" to int.
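A corrected version of the example above (an illustration; the string literal is cast so both branches agree on int for the second column):
hive> select 1 as c1, 2 as c2
    > union all
    > select 1.0 as c1, cast("2" as int) as c2;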
2. date_add and date_sub now return date instead of string
This can throw type-mismatch exceptions when they feed custom UDFs written against the old string return type.
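A minimal workaround sketch, assuming a custom UDF that takes a string parameter (my_udf and some_table are hypothetical names, not from the original post):
-- cast the date result back to string before passing it to a string-typed UDF
select my_udf(cast(date_add('2020-01-01', 1) as string))
from some_table;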
3. ORDER BY is not allowed inside a UNION ALL branch; it may only appear after the UNION ALL
Ordering a branch before the UNION ALL is pointless anyway, but this kind of SQL did show up in our production environment.
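A minimal illustration (t1, t2 and id are placeholder names):
-- rejected in Hive 2.3: ORDER BY inside a UNION ALL branch
-- select id from t1 order by id
-- union all
-- select id from t2;
-- accepted: ORDER BY applies to the whole UNION ALL result
select id from t1
union all
select id from t2
order by id;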
4. Table/column names or aliases that collide with reserved words
If any of the following reserved words is needed as a table/column name or alias, it must be wrapped in backticks (``):
ALL, ALTER, AND, ARRAY, AS, AUTHORIZATION, BETWEEN, BIGINT, BINARY, BOOLEAN, BOTH, BY, CASE, CAST, CHAR, COLUMN, CONF, CREATE, CROSS, CUBE, CURRENT, CURRENT_DATE, CURRENT_TIMESTAMP, CURSOR, DATABASE, DATE, DECIMAL, DELETE, DESCRIBE, DISTINCT, DOUBLE, DROP, ELSE, END, EXCHANGE, EXISTS, EXTENDED, EXTERNAL, FALSE, FETCH, FLOAT, FOLLOWING, FOR, FROM, FULL, FUNCTION, GRANT, GROUP, GROUPING, HAVING, IF, IMPORT, IN, INNER, INSERT, INT, INTERSECT, INTERVAL, INTO, IS, JOIN, LATERAL, LEFT, LESS, LIKE, LOCAL, MACRO, MAP, MORE, NONE, NOT, NULL, OF, ON, OR, ORDER, OUT, OUTER, OVER, PARTIALSCAN, PARTITION, PERCENT, PRECEDING, PRESERVE, PROCEDURE, RANGE, READS, REDUCE, REVOKE, RIGHT, ROLLUP, ROW, ROWS, SELECT, SET, SMALLINT, TABLE, TABLESAMPLE, THEN, TIMESTAMP, TO, TRANSFORM, TRIGGER, TRUE, TRUNCATE, UNBOUNDED, UNION, UNIQUEJOIN, UPDATE, USER, USING, UTC_TMESTAMP, VALUES, VARCHAR, WHEN, WHERE, WINDOW, WITH, COMMIT, ONLY, REGEXP, RLIKE, ROLLBACK, START, CACHE, CONSTRAINT, FOREIGN, PRIMARY, REFERENCES, DAYOFWEEK, EXTRACT, FLOOR, INTEGER, PRECISION, VIEWS
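For example (hypothetical table and column names), identifiers that collide with DATE, TIMESTAMP or USER have to be back-quoted:
select `date`, `timestamp`
from `user`;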
5. Parameter names set via SET are case-sensitive
6. A NULL with no explicit type is no longer compatible
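One way this shows up (an illustrative sketch, not from the original post; t1, t2 and name are placeholder names) is a bare NULL in a UNION ALL branch, which an explicit cast fixes:
select name from t1
union all
select cast(null as string) as name from t2;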
7. Metastore schema compatibility upgrade
Run the statements below against the Hive 1.1 metastore database (they align the old tables' column layout with the Hive 2.3 schema so the dumped rows can be imported as-is), export the data with mysqldump -t, drop the database, re-initialize it with the Hive 2.3 schematool, and finally import the dumped data.
-- drop foreign keys and columns that are not part of the Hive 2.3 schema
alter table partitions drop foreign key PARTITIONS_FK3;
alter table partitions drop link_target_id;
alter table tbls drop foreign key TBLS_FK3;
alter table tbls drop link_target_id;
alter table tbls drop OWNER_TYPE;
-- lowercase role names (see item 8 below)
update roles set role_name=lower(role_name);
-- add columns that the Hive 2.3 schema expects
alter table tbls add `IS_REWRITE_ENABLED` bit(1) NOT NULL DEFAULT 0;
alter table notification_log add `MESSAGE_FORMAT` varchar(16) DEFAULT NULL;
-- drop tables whose data should not be carried over (VERSION is recreated by schematool)
drop table auth_user;
drop table tmp_user_map_d;
drop table version;
CREATE TABLE `key_constraints` (
`CHILD_CD_ID` bigint(20) DEFAULT NULL,
`CHILD_INTEGER_IDX` int(11) DEFAULT NULL,
`CHILD_TBL_ID` bigint(20) DEFAULT NULL,
`PARENT_CD_ID` bigint(20) NOT NULL,
`PARENT_INTEGER_IDX` int(11) NOT NULL,
`PARENT_TBL_ID` bigint(20) NOT NULL,
`POSITION` bigint(20) NOT NULL,
`CONSTRAINT_NAME` varchar(400) NOT NULL,
`CONSTRAINT_TYPE` smallint(6) NOT NULL,
`UPDATE_RULE` smallint(6) DEFAULT NULL,
`DELETE_RULE` smallint(6) DEFAULT NULL,
`ENABLE_VALIDATE_RELY` smallint(6) NOT NULL,
PRIMARY KEY (`CONSTRAINT_NAME`,`POSITION`),
KEY `CONSTRAINTS_PARENT_TABLE_ID_INDEX` (`PARENT_TBL_ID`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
8. Hive 2.3 converts every newly created role name to lowercase before storing it in MySQL, but compares role names with a plain = during authorization
As a result, roles carried over from Hive 1.1 that contain uppercase characters fail authorization. The error looks like this:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Error granting roles for A to role B: null
So every role_name in the roles table needs to be converted to lowercase:
update roles set role_name=lower(role_name)
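Before running the update, you can check which roles are affected (a MySQL sketch against the metastore database):
select role_name from roles where binary role_name <> lower(role_name);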
9. Double quotes inside a single-quoted literal must be escaped when their count is odd
For example, the literal 'date":"(.+)"}' in a SQL statement triggers an <EOF> parse error and has to be rewritten as 'date\":\"(.+)\"}'.
10. The ALL privilege no longer works and must be replaced with select, insert, update, delete
With authorization enabled (hive.security.authorization.enabled=true), ALL can no longer be used as shorthand for select, insert, update, delete.
The Hive metastore database has to be fixed up. Check the following tables:
DB_PRIVS
PART_COL_PRIVS
PART_PRIVS
TBL_COL_PRIVS
TBL_PRIVS
You need a script that regenerates the privileges, expanding each ALL grant into the four separate privileges SELECT, UPDATE, INSERT, DELETE; a sketch follows.
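One way such a script could re-issue the grants is per object in HiveQL (a minimal sketch; db_name.table_name and some_user are placeholders, not from the original post):
-- re-grant the four explicit privileges that an old ALL grant covered
grant select, insert, update, delete on table db_name.table_name to user some_user;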
Then delete all remaining ALL rows directly:
delete from TBL_PRIVS where TBL_PRIV like 'ALL%';
delete from DB_PRIVS where DB_PRIV like 'ALL%';
Besides that, also pay attention to this configuration parameter:
<property>
<name>hive.security.authorization.createtable.owner.grants</name>
<value>select,insert,update,delete</value>
<description>
The privileges automatically granted to the owner whenever a table gets created.
An example like "select,drop" will grant select and drop privilege to the owner
of the table. Note that the default gives the creator of a table no access to the
table (but see HIVE-8067).
</description>
</property>
11. SQL execution fails with an error, suspected to be a Hive bug
The error:
Task with the most failures(4):
-----
Task ID:
task_1566481621886_4925755_m_000000
URL:
http://TXIDC65-bigdata-resourcemanager1:8088/taskdetails.jsp?jobid=job_1566481621886_4925755&tipid=task_1566481621886_4925755_m_000000
-----
Diagnostic Messages for this Task:
Error: java.io.IOException: java.lang.reflect.InvocationTargetException
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:271)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.<init>(HadoopShimsSecure.java:217)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getRecordReader(HadoopShimsSecure.java:345)
at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:695)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:169)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:438)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1924)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:257)
... 11 more
Caused by: java.lang.NullPointerException
at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
at org.apache.hadoop.hive.ql.io.parquet.ProjectionPusher.pushProjectionsAndFilters(ProjectionPusher.java:118)
at org.apache.hadoop.hive.ql.io.parquet.ProjectionPusher.pushProjectionsAndFilters(ProjectionPusher.java:189)
at org.apache.hadoop.hive.ql.io.parquet.ParquetRecordReaderBase.getSplit(ParquetRecordReaderBase.java:75)
at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:75)
at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:60)
at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:75)
at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:99)
... 16 more
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
Fix:
In org/apache/hadoop/hive/ql/plan/TableScanDesc.java, change
private List<String> neededNestedColumnPaths;
to
private List<String> neededNestedColumnPaths = new ArrayList<String>();
and rebuild. The NullPointerException comes from ProjectionPusher.pushProjectionsAndFilters calling addAll with this uninitialized (null) list, so initializing the field avoids it.
12. MetaException(message:Version information not found in metastore.)
Set hive.metastore.schema.verification to false in hive-site.xml.
13. CREATE rows in the privilege tables
Check whether tbl_privs and db_privs contain rows whose TBL_PRIV or DB_PRIV is CREATE; if so, replace them with ALL/SELECT or whatever privilege fits each case.
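A quick way to check, run against the metastore MySQL database (a sketch; adjust the table-name case to your schema):
select * from TBL_PRIVS where TBL_PRIV = 'CREATE';
select * from DB_PRIVS where DB_PRIV = 'CREATE';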
14. Different authorization versions
Hive 1.1.0 uses AuthorizerV1:
<property>
<name>hive.security.authorization.manager</name>
<value>org.apache.hadoop.hive.ql.security.authorization.DefaultHiveAuthorizationProvider</value>
</property>
Hive 2.3.6 uses AuthorizerV2:
<property>
<name>hive.security.authorization.manager</name>
<value>org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactory</value>
</property>
The two differ considerably. For example, under V2 you must be the owner of a database to create or drop tables in it, whereas V1 only requires the corresponding privilege.
Solutions:
1. Upgrade to the V2 privilege model; it has to be said that V2 manages privileges more rigorously and fits data-warehouse governance better.
2. Modify Operation2Privilege.java and set the privileges required by each operation to what you need.
15. User: root is not allowed to impersonate root
20/01/14 10:11:59 ERROR HiveConnection: Error opening session
org.apache.thrift.protocol.TProtocolException: Required field 'serverProtocolVersion' is unset! Struct:TOpenSessionResp(status:TStatus(statusCode:ERROR_STATUS, infoMessages:[*org.apache.hive.service.cli.HiveSQLException:Failed to open new session: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: root is not allowed to impersonate root:14:13,
This is not strictly a migration issue; the standard Hadoop proxy-user configuration fix applies. In our case we had already configured:
<property>
<name>hadoop.proxyuser.hive.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hive.groups</name>
<value>*</value>
</property>
so it is enough to su hive and start the metastore and HiveServer2 as the hive user.
16. set hive.vectorized.execution.enabled=false
This parameter enables vectorized execution and defaults to false.
If it is turned on, jobs fail with the following error (root cause still to be investigated):
Error: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:456)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1924)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
... 14 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
17. Double-quote problem
from_unixtime(unix_timestamp(regexp_replace(regexp_replace(regexp_extract(get_json_object(content,'$.createTime'),'date":"(.+)"}',1),'T',' '),'Z',''))+28800,'yyyy-MM-dd HH:mm:ss') create_time
It needs to be escaped:
from_unixtime(unix_timestamp(regexp_replace(regexp_replace(regexp_extract(get_json_object(content,'$.createTime'),'date\":\"(.+)\"}',1),'T',' '),'Z',''))+28800,'yyyy-MM-dd HH:mm:ss') create_time
Reposted from: https://blog.csdn.net/zyzzxycj/article/details/103611761