Hadoop Crashes
(1) If MapReduce jobs bring the system down, limit how many tasks YARN runs concurrently and how much memory each task may request. The parameter to adjust is yarn.scheduler.maximum-allocation-mb (the maximum physical memory a single task may request; the default is 8192 MB).
(2) If an excessive write load brings the NameNode down, increase Kafka's storage capacity and throttle the write rate from Kafka to HDFS. Use Kafka as a buffer during peak hours; once the peak passes, the data sync catches up on its own.
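Before raising the limit, the current value can be inspected with a quick sketch like the one below (the config path matches the install location used later in these notes; adjust it to your cluster):

    # Show the currently configured per-container memory ceiling
    # (assumes the config dir used elsewhere in these notes).
    grep -A 1 'yarn.scheduler.maximum-allocation-mb' \
        /usr/local/hadoop-3.1.3/etc/hadoop/yarn-site.xml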
Error: no YARN_RESOURCEMANAGER_USER defined
- Error message:
  ERROR: but there is no YARN_RESOURCEMANAGER_USER defined. Aborting operation.
- Solution:
    cd /usr/local/hadoop-3.1.3/sbin/

    vim start-dfs.sh
    # Add the following at the top of the file:
    HDFS_NAMENODE_USER=root
    HDFS_DATANODE_USER=root
    HDFS_SECONDARYNAMENODE_USER=root
    YARN_RESOURCEMANAGER_USER=root
    YARN_NODEMANAGER_USER=root

    vim stop-dfs.sh
    # Add the same five lines as in start-dfs.sh at the top of the file.

    vim start-yarn.sh
    # Add the following at the top of the file:
    YARN_RESOURCEMANAGER_USER=root
    HADOOP_SECURE_DN_USER=yarn
    YARN_NODEMANAGER_USER=root

    vim stop-yarn.sh
    # Add the same three lines as in start-yarn.sh at the top of the file.
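To confirm the edits took effect, one option (a sketch; paths follow those used above) is to restart the daemons and check the Java processes:

    # Restart HDFS and YARN, then verify the daemons came up.
    /usr/local/hadoop-3.1.3/sbin/start-dfs.sh
    /usr/local/hadoop-3.1.3/sbin/start-yarn.sh
    jps   # expect NameNode/DataNode/ResourceManager/NodeManager entries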
Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster
- Writing data into Hive fails with:
  FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
- The full client output is shown below:

    INFO : Compiling command(queryId=root_20220702214441_90f37f30-1f88-4147-bc45-832d0018fa04): insert into student_hdfs values (950122, '华少','男',30,'IS')
    INFO : Concurrency mode is disabled, not creating a lock manager
    INFO : Semantic Analysis Completed (retrial = false)
    INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:col1, type:int, comment:null), FieldSchema(name:col2, type:string, comment:null), FieldSchema(name:col3, type:string, comment:null), FieldSchema(name:col4, type:int, comment:null), FieldSchema(name:col5, type:string, comment:null)], properties:null)
    INFO : Completed compiling command(queryId=root_20220702214441_90f37f30-1f88-4147-bc45-832d0018fa04); Time taken: 0.633 seconds
    INFO : Concurrency mode is disabled, not creating a lock manager
    INFO : Executing command(queryId=root_20220702214441_90f37f30-1f88-4147-bc45-832d0018fa04): insert into student_hdfs values (950122, '华少','男',30,'IS')
    WARN : Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
    INFO : Query ID = root_20220702214441_90f37f30-1f88-4147-bc45-832d0018fa04
    INFO : Total jobs = 3
    INFO : Launching Job 1 out of 3
    INFO : Starting task [Stage-1:MAPRED] in serial mode
    INFO : Number of reduce tasks determined at compile time: 1
    INFO : In order to change the average load for a reducer (in bytes):
    INFO : set hive.exec.reducers.bytes.per.reducer=<number>
    INFO : In order to limit the maximum number of reducers:
    INFO : set hive.exec.reducers.max=<number>
    INFO : In order to set a constant number of reducers:
    INFO : set mapreduce.job.reduces=<number>
    INFO : number of splits:1
    INFO : Submitting tokens for job: job_1656766458051_0004
    INFO : Executing with tokens: []
    INFO : The url to track the job: http://hadoop103:8088/proxy/application_1656766458051_0004/
    INFO : Starting Job = job_1656766458051_0004, Tracking URL = http://hadoop103:8088/proxy/application_1656766458051_0004/
    INFO : Kill Command = /usr/local/hadoop-3.1.3/bin/mapred job -kill job_1656766458051_0004
    INFO : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
    INFO : 2022-07-02 21:45:05,409 Stage-1 map = 100%, reduce = 100%
    ERROR : Ended Job = job_1656766458051_0004 with errors
    ERROR : FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
    INFO : MapReduce Jobs Launched:
    INFO : Stage-Stage-1: Map: 1 Reduce: 1 HDFS Read: 0 HDFS Write: 0 FAIL
    INFO : Total MapReduce CPU Time Spent: 0 msec
    INFO : Completed executing command(queryId=root_20220702214441_90f37f30-1f88-4147-bc45-832d0018fa04); Time taken: 25.682 seconds
    INFO : Concurrency mode is disabled, not creating a lock manager
    Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask (state=08S01,code=2)
- Note: the error reported by the Hive client is not the actual failure; you have to look up the real error in the YARN web UI (or pull the application logs, as sketched just below).
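For reference, the same diagnostics can usually also be fetched from the command line, assuming YARN log aggregation is enabled (the application ID comes from the output above):

    # Dump the aggregated container logs for the failed application.
    yarn logs -applicationId application_1656766458051_0004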
- The actual error:
    Application application_1656759536632_0001 failed 2 times due to AM Container for appattempt_1656759536632_0001_000002 exited with exitCode: 1. Failing this attempt.
    Diagnostics: [2022-07-02 19:00:15.757] Exception from container-launch.
    Container id: container_1656759536632_0001_02_000001
    Exit code: 1
    [2022-07-02 19:00:15.799] Container exited with a non-zero exit code 1. Error file: prelaunch.err.
    Last 4096 bytes of prelaunch.err :
    Last 4096 bytes of stderr :
    Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster
    Please check whether your etc/hadoop/mapred-site.xml contains the below configuration:
    <property>
      <name>yarn.app.mapreduce.am.env</name>
      <value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
    </property>
    <property>
      <name>mapreduce.map.env</name>
      <value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
    </property>
    <property>
      <name>mapreduce.reduce.env</name>
      <value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
    </property>
    [2022-07-02 19:00:15.801] Container exited with a non-zero exit code 1; the same prelaunch.err/stderr excerpt as above is repeated here.
    For more detailed output, check the application tracking page: http://hadoop103:8088/cluster/app/application_1656759536632_0001 Then click on links to logs of each attempt. Failing the application.
- Attempted fixes that did NOT work:
  - Giving HADOOP_MAPRED_HOME an absolute path in mapred-site.xml:

        <property>
          <name>yarn.app.mapreduce.am.env</name>
          <value>HADOOP_MAPRED_HOME=/opt/module/hadoop-3.1.3</value>
        </property>
        <property>
          <name>mapreduce.map.env</name>
          <value>HADOOP_MAPRED_HOME=/opt/module/hadoop-3.1.3</value>
        </property>
        <property>
          <name>mapreduce.reduce.env</name>
          <value>HADOOP_MAPRED_HOME=/opt/module/hadoop-3.1.3</value>
        </property>
  - Configuring yarn.application.classpath. Run:

        hadoop classpath

    In the returned path, replace every `:` with `,` (a one-liner for this follows the XML below) and write the result into yarn-site.xml, like so:

        <property>
          <name>yarn.application.classpath</name>
          <value>/usr/local/hadoop-3.1.3/etc/hadoop,/usr/local/hadoop-3.1.3/share/hadoop/common/lib/*,/usr/local/hadoop-3.1.3/share/hadoop/common/*,/usr/local/hadoop-3.1.3/share/hadoop/hdfs,/usr/local/hadoop-3.1.3/share/hadoop/hdfs/lib/*,/usr/local/hadoop-3.1.3/share/hadoop/hdfs/*,/usr/local/hadoop-3.1.3/share/hadoop/mapreduce/lib/*,/usr/local/hadoop-3.1.3/share/hadoop/mapreduce/*,/usr/local/hadoop-3.1.3/share/hadoop/yarn,/usr/local/hadoop-3.1.3/share/hadoop/yarn/lib/*,/usr/local/hadoop-3.1.3/share/hadoop/yarn/*</value>
        </property>
        <property>
          <name>yarn.nodemanager.env-whitelist</name>
          <value>JAVA_HOME,HADOOP_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
        </property>
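The colon-to-comma conversion can be done in one line (a sketch using tr):

    # Turn the ':'-separated output of `hadoop classpath` into the
    # ','-separated form that yarn.application.classpath expects.
    hadoop classpath | tr ':' ','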
  - I tried all of the methods above; none of them worked.
Solution (what worked):
- Edit mapred-site.xml:

      <property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_CLASSPATH}</value>
      </property>
      <property>
        <name>mapreduce.map.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_CLASSPATH}</value>
      </property>
      <property>
        <name>mapreduce.reduce.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_CLASSPATH}</value>
      </property>
- Export HADOOP_MAPRED_HOME:

      export HADOOP_MAPRED_HOME=/usr/local/hadoop-3.1.3

  - Note: this step must not be skipped, otherwise the configuration above does not take effect. (The original notes do not say where the export lives; hadoop-env.sh or /etc/profile would be typical places.)
- Distribute mapred-site.xml to the other nodes in the cluster, then restart the cluster (a sketch follows).
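A minimal distribution loop (the worker hostnames are assumptions; only hadoop103 is known from the logs above):

    # Copy the updated config to each worker, then bounce YARN.
    for host in hadoop102 hadoop103; do
      scp /usr/local/hadoop-3.1.3/etc/hadoop/mapred-site.xml \
          root@$host:/usr/local/hadoop-3.1.3/etc/hadoop/
    done
    /usr/local/hadoop-3.1.3/sbin/stop-yarn.sh
    /usr/local/hadoop-3.1.3/sbin/start-yarn.sh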
The required MAP capability is more than the supported max container capability in the cluster.
- Error message:

    The required MAP capability is more than the supported max container capability in the cluster. Killing the Job. mapResourceRequest: <memory:5120, vCores:1> maxContainerCapability:<memory:4096, vCores:4>
    Job received Kill while in RUNNING state.
    REDUCE capability required is more than the supported max container capability in the cluster. Killing the Job. reduceResourceRequest: <memory:5120, vCores:1> maxContainerCapability:<memory:4096, vCores:4>
- Solution:
  - Two parameters need to be adjusted together:
    yarn.nodemanager.resource.memory-mb
    yarn.scheduler.maximum-allocation-mb
  - Note: if only one of them is set, the same error may still occur.
- Edit yarn-site.xml:

      <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>10000</value>
      </property>
      <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>10000</value>
      </property>
- Distribute yarn-site.xml to the other nodes in the cluster.
- Restart the cluster.
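An alternative not used in these notes: since the job asks for 5120 MB per task while the cluster ceiling is 4096 MB, the job's request can be lowered instead of raising the cluster limits. In a Hive session that would look roughly like:

    -- Run before the failing INSERT; shrinks the per-task container
    -- request below the 4096 MB maximum (values are illustrative).
    set mapreduce.map.memory.mb=4096;
    set mapreduce.reduce.memory.mb=4096;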
Error: JAVA_HOME is not set and could not be found.
- Error message:
Error: JAVA_HOME is not set and could not be found.
- Solution 1: set the system-wide environment variables
- Check the current environment settings:

      which java
      java -version
- If those commands fail, add the variables manually:

      vim /etc/profile
      # JAVA_HOME
      export JAVA_HOME=/usr/local/bin/jdk1.8
      export PATH=$PATH:$JAVA_HOME/bin
- Apply the changes:

      source /etc/profile
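If the actual JDK location is unclear, it can usually be resolved from the java binary itself (a sketch; works when java is already on the PATH):

    # Follow the symlink chain from the `java` on PATH to the real JDK;
    # JAVA_HOME is this path minus the trailing /bin/java.
    readlink -f "$(which java)"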
- Solution 2: set the variable in hadoop-env.sh:

      cd /usr/local/hadoop-3.1.3/etc/hadoop/
      vim hadoop-env.sh
      export JAVA_HOME=/usr/local/bin/jdk1.8
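A quick sanity check (my addition, not from the original notes): the hadoop launcher itself needs JAVA_HOME, so a clean run of `hadoop version` suggests the setting is being picked up:

    # Fails with the JAVA_HOME error above if the setting did not stick.
    /usr/local/hadoop-3.1.3/bin/hadoop version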
- Notes:
  - For a Hadoop cluster, the environment variable configuration described above must be done on the master, distributed to the rest of the cluster, and sourced so that it takes effect;
  - If the error persists after the steps above, double-check that the configuration was done on the master and that it has taken effect on every machine in the cluster.
bash: jps: command not found
- Error message:
bash: jps: command not found
- Solution:
  - Delete the old jps symlink and create a new one:

        rm /usr/local/bin/jps
        ln -s /usr/local/bin/jdk1.8/bin/jps /usr/local/bin/jps
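To verify (a sketch): the symlink should now point at the JDK's jps, and running it should list the local JVM processes:

    ls -l /usr/local/bin/jps   # should point at /usr/local/bin/jdk1.8/bin/jps
    jps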
/usr/libexec/grepconf.sh: line 5: grep: command not found
- Problem:
  - After configuring SPARK_HOME in /etc/profile and running source /etc/profile to apply it, this error appears;
  - After the error, ordinary shell commands stop working altogether.
- Investigation showed the configured path was wrong; it had to be corrected and the environment file reloaded.
- Solution:
  - Fix the SPARK_HOME path in the configuration file;
  - In the terminal, first restore a usable PATH:

        export PATH=/usr/bin:/usr/sbin:/bin:/sbin:/usr/X11R6/bin

  - Reload the environment file:

        source /etc/profile
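The usual culprit behind this symptom (my assumption; the notes only say the path was wrong) is a profile line that overwrites PATH instead of appending to it:

    # Broken: clobbers PATH, so grep/ls/etc. vanish after `source`.
    #   export PATH=$SPARK_HOME/bin
    # Correct: keep the existing PATH and append Spark's bin directory
    # (the install path below is hypothetical).
    export SPARK_HOME=/usr/local/spark
    export PATH=$PATH:$SPARK_HOME/bin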