Feidao's Blog

Hadoop 3.3.1 Cluster Setup


Overview. The cluster configuration and node information are as follows:

Each node has 8 cores and 16 GB of RAM, runs CentOS 7.3, and the installation user is ucmed (note: the ops team's security-baseline hardening can prevent Hadoop from starting).

192.168.3.184 master 

192.168.3.185 slave1

192.168.3.186 slave2

            master                slave1                         slave2
    HDFS    NameNode, DataNode    DataNode                       SecondaryNameNode, DataNode
    YARN    NodeManager           ResourceManager, NodeManager   NodeManager

1. Set the hostnames, configure hosts, and set up passwordless SSH


  
    sudo hostnamectl set-hostname master    # on 192.168.3.184
    sudo hostnamectl set-hostname slave1    # on 192.168.3.185
    sudo hostnamectl set-hostname slave2    # on 192.168.3.186

Configure the hosts file (on all three nodes)


  
    sudo vi /etc/hosts
    192.168.3.184 master
    192.168.3.185 slave1
    192.168.3.186 slave2
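The same three entries must end up in the hosts file on every node. A small idempotent sketch of the edit above; the target file is parameterized here as an illustrative assumption so it can be dry-run, and on a real node it would be run with HOSTS_FILE=/etc/hosts under sudo:

```shell
#!/bin/sh
# Idempotently add the cluster entries to a hosts file.
# HOSTS_FILE defaults to a local scratch file for dry-runs (an assumption for
# illustration); on a real node set HOSTS_FILE=/etc/hosts and run with sudo.
HOSTS_FILE="${HOSTS_FILE:-./hosts.cluster}"

add_host() {
    ip="$1"; name="$2"
    # skip the entry if the hostname is already mapped in the file
    grep -qw "$name" "$HOSTS_FILE" 2>/dev/null || echo "$ip $name" >> "$HOSTS_FILE"
}

add_host 192.168.3.184 master
add_host 192.168.3.185 slave1
add_host 192.168.3.186 slave2
```

Running it a second time adds nothing, so it is safe to rerun on a node that already has some of the entries.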

Configure passwordless SSH


  
    ssh-keygen -t rsa -b 2048 -v
    # copy ~/.ssh/id_rsa.pub to the target servers, renamed per source host
    cp ~/.ssh/id_rsa.pub /tmp/master.pub
    cp ~/.ssh/id_rsa.pub /tmp/slave1.pub
    cp ~/.ssh/id_rsa.pub /tmp/slave2.pub
    cat /tmp/master.pub >> ~/.ssh/authorized_keys
    cat /tmp/slave1.pub >> ~/.ssh/authorized_keys
    cat /tmp/slave2.pub >> ~/.ssh/authorized_keys
    chmod 700 ~/.ssh/
    chmod 600 ~/.ssh/authorized_keys
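Once every key is in every authorized_keys file, each node should reach every other node without a password prompt. A hedged verification sketch (the host names assume the /etc/hosts entries above; the RUN_SSH_CHECK guard is an illustrative convention, not part of the original guide):

```shell
#!/bin/sh
# Verify passwordless SSH from this node to every cluster node.
# BatchMode=yes makes ssh fail instead of prompting for a password.
HOSTS="master slave1 slave2"

check_ssh() {
    for h in $HOSTS; do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$h" true 2>/dev/null; then
            echo "$h: passwordless login OK"
        else
            echo "$h: passwordless login FAILED"
        fi
    done
}

# Guarded so the check only runs when requested on a real cluster node:
#   RUN_SSH_CHECK=1 sh check_ssh.sh
if [ "${RUN_SSH_CHECK:-0}" = "1" ]; then
    check_ssh
fi
```

Run it on each of the three nodes; any FAILED line means Hadoop's start scripts will also fail to reach that node.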

2. Disable the firewall and SELinux

Check firewall status: systemctl status firewalld

Disable and stop the firewall


  
    sudo systemctl disable firewalld
    sudo systemctl stop firewalld

Check SELinux status: sestatus

Disable SELinux


  
    # permanently disable (requires a reboot):
    sudo vi /etc/sysconfig/selinux
    # set the following in the selinux file:
    SELINUX=disabled

3. Install the Oracle JDK

Copy jdk-8u311-linux-x64.tar.gz to the server

Extract it


  
    sudo mkdir /usr/local/src/jdk
    sudo cp /tmp/jdk-8u311-linux-x64.tar.gz /usr/local/src/jdk
    cd /usr/local/src/jdk
    sudo tar -zxvf jdk-8u311-linux-x64.tar.gz

Add the environment variables:


  
    sudo vim /etc/profile
    export JAVA_HOME=/usr/local/src/jdk/jdk1.8.0_311
    export PATH=$PATH:$JAVA_HOME/bin
    # apply the changes immediately
    source /etc/profile
    java -version

4. Install Hadoop 3.3.1

https://dlcdn.apache.org/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz


  
    sudo mkdir /usr/local/src/hadoop
    sudo cp /tmp/hadoop-3.3.1.tar.gz /usr/local/src/hadoop/
    cd /usr/local/src/hadoop
    sudo tar -zxvf hadoop-3.3.1.tar.gz
    # chown after extraction so the unpacked tree belongs to ucmed, not root
    sudo chown -R ucmed:ucmed /usr/local/src/hadoop
    cd /usr/local/src/hadoop/hadoop-3.3.1
    sudo mkdir -p /opt/hdfs/data
    sudo mkdir -p /opt/hdfs/name
    sudo chown -R ucmed:ucmed /opt/hdfs

Configure the environment variables


  
    sudo vim /etc/profile
    export HADOOP_HOME=/usr/local/src/hadoop/hadoop-3.3.1
    export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin
    source /etc/profile
    hadoop version

5. Edit the configuration files


  
    cd $HADOOP_HOME/etc/hadoop
    vim core-site.xml

    <configuration>
        <!-- NameNode address -->
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://master:8020</value>
        </property>
        <!-- Hadoop data storage directory -->
        <property>
            <name>hadoop.tmp.dir</name>
            <value>/opt/hdfs/data</value>
        </property>
        <!-- static user for the HDFS web UI: ucmed -->
        <property>
            <name>hadoop.http.staticuser.user</name>
            <value>ucmed</value>
        </property>
    </configuration>


  
    vim hdfs-site.xml

    <configuration>
        <!-- NameNode web UI address -->
        <property>
            <name>dfs.namenode.http-address</name>
            <value>master:9870</value>
        </property>
        <!-- SecondaryNameNode web UI address -->
        <property>
            <name>dfs.namenode.secondary.http-address</name>
            <value>slave2:9868</value>
        </property>
    </configuration>


  
    vim yarn-site.xml

    <configuration>
        <!-- Site specific YARN configuration properties -->
        <!-- have MapReduce use the shuffle service -->
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
        <!-- ResourceManager address -->
        <property>
            <name>yarn.resourcemanager.hostname</name>
            <value>slave1</value>
        </property>
        <!-- environment variables inherited by containers -->
        <property>
            <name>yarn.nodemanager.env-whitelist</name>
            <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
        </property>
        <property>
            <name>yarn.nodemanager.pmem-check-enabled</name>
            <value>true</value>
            <description>Whether to check physical memory usage against the allocation; tasks that exceed it are killed. Default: true.</description>
        </property>
        <property>
            <name>yarn.nodemanager.vmem-check-enabled</name>
            <value>true</value>
            <description>Whether to check virtual memory usage against the allocation; tasks that exceed it are killed. Default: true. Can be set to false when you are certain memory does not leak.</description>
        </property>
        <property>
            <name>yarn.nodemanager.vmem-pmem-ratio</name>
            <value>8</value>
            <description>Maximum virtual memory a task may use per 1 MB of physical memory. Default: 2.1. Meaningless when the virtual-memory check above is disabled.</description>
        </property>
        <property>
            <name>yarn.nodemanager.resource.cpu-vcores</name>
            <value>8</value>
            <description>Total cores YARN may use on this node; usually set to the value of: cat /proc/cpuinfo | grep "processor" | wc -l. Default: 8.</description>
        </property>
        <property>
            <name>yarn.nodemanager.resource.memory-mb</name>
            <value>24576</value>
            <description>Total physical memory (MB) YARN may use on this node (the total requested from the OS). Default: 8192.</description>
        </property>
        <property>
            <name>yarn.scheduler.minimum-allocation-mb</name>
            <value>1024</value>
            <description>Minimum physical memory (MB) a single container can request from the scheduler; each container is typically allocated at least this value (e.g. capacity memory:3072, vCores:1). Default: 1024. If jobs report physical-memory overflow, raise this value.</description>
        </property>
        <property>
            <name>yarn.scheduler.maximum-allocation-mb</name>
            <value>8192</value>
            <description>Maximum memory a single container can request.</description>
        </property>
        <property>
            <description>Minimum vcores a single container can request.</description>
            <name>yarn.scheduler.minimum-allocation-vcores</name>
            <value>1</value>
        </property>
        <property>
            <name>yarn.timeline-service.enabled</name>
            <value>true</value>
        </property>
        <property>
            <name>yarn.timeline-service.http-cross-origin.enabled</name>
            <value>true</value>
        </property>
        <property>
            <name>yarn.webapp.api-service.enable</name>
            <value>true</value>
        </property>
        <property>
            <name>yarn.webapp.ui2.enable</name>
            <value>true</value>
        </property>
        <property>
            <name>yarn.application.classpath</name>
            <value>
                $HADOOP_HOME/etc/hadoop,
                $HADOOP_HOME/share/hadoop/common/*,
                $HADOOP_HOME/share/hadoop/common/lib/*,
                $HADOOP_HOME/share/hadoop/hdfs/*,
                $HADOOP_HOME/share/hadoop/hdfs/lib/*,
                $HADOOP_HOME/share/hadoop/mapreduce/*,
                $HADOOP_HOME/share/hadoop/mapreduce/lib/*,
                $HADOOP_HOME/share/hadoop/yarn/*,
                $HADOOP_HOME/share/hadoop/yarn/lib/*
            </value>
        </property>
    </configuration>
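One value above deserves a sanity check: yarn.nodemanager.resource.memory-mb is set to 24576 MB, while the nodes described at the top have 16 GB of RAM. YARN accepts this value without validating it against physical memory, so the over-commit works but risks the OS OOM killer under load. A sketch of the usual sizing arithmetic; the 20% OS reservation is a rule-of-thumb assumption, not a Hadoop default:

```shell
#!/bin/sh
# Rule-of-thumb sizing for yarn.nodemanager.resource.memory-mb:
# reserve ~20% of physical RAM for the OS and the DataNode/NodeManager
# daemons (the 20% figure is an assumption, not a Hadoop default).
node_ram_mb=16384                       # 16 GB nodes, as in this cluster
os_reserved_mb=$((node_ram_mb / 5))     # 20% reserved
yarn_mem_mb=$((node_ram_mb - os_reserved_mb))

# With yarn.scheduler.minimum-allocation-mb=1024, this bounds how many
# minimum-size containers the node can host concurrently.
min_alloc_mb=1024
max_containers=$((yarn_mem_mb / min_alloc_mb))

echo "yarn.nodemanager.resource.memory-mb=$yarn_mem_mb"
echo "max minimum-size containers: $max_containers"
```

For these nodes that yields roughly 13 GB for YARN and about 12 minimum-size containers per node.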

  
    vim mapred-site.xml

    <configuration>
        <!-- run MapReduce on YARN -->
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
    </configuration>

  
    vim workers
    master
    slave1
    slave2

  
    vim hadoop-env.sh
    export JAVA_HOME=/usr/local/src/jdk/jdk1.8.0_311

Add the following to sbin/start-dfs.sh and sbin/stop-dfs.sh


  
    HDFS_DATANODE_USER=ucmed
    HDFS_DATANODE_SECURE_USER=ucmed
    HDFS_NAMENODE_USER=ucmed
    HDFS_SECONDARYNAMENODE_USER=ucmed

Add the following to sbin/start-yarn.sh and sbin/stop-yarn.sh


  
    YARN_RESOURCEMANAGER_USER=ucmed
    HADOOP_SECURE_DN_USER=ucmed
    YARN_NODEMANAGER_USER=ucmed

Distribute the installation to slave1 and slave2, and configure the environment variables (Java, Hadoop) on those nodes as well


  
    scp -r /usr/local/src/hadoop/hadoop-3.3.1 slave1:/usr/local/src/hadoop/
    scp -r /usr/local/src/hadoop/hadoop-3.3.1 slave2:/usr/local/src/hadoop/
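With more slaves, the two scp commands above generalize to a loop. A sketch that prints the commands first so they can be reviewed before running (the build_cmds helper is an illustrative name, not from the original guide):

```shell
#!/bin/sh
# Build the scp commands that push the Hadoop tree to every slave.
# Paths match the ones used earlier in this guide.
SRC=/usr/local/src/hadoop/hadoop-3.3.1
SLAVES="slave1 slave2"

build_cmds() {
    for h in $SLAVES; do
        echo "scp -r $SRC $h:/usr/local/src/hadoop/"
    done
}

# Print for review; to actually copy, pipe through the shell: build_cmds | sh
build_cmds
```

Printing before executing makes it easy to confirm the destination paths when the slave list grows.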

Initialize the HDFS filesystem on the master node

hdfs namenode -format

Start HDFS on the master node


  
    cd /usr/local/src/hadoop/hadoop-3.3.1/sbin
    ./start-dfs.sh
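After start-dfs.sh, jps on each node should show the daemons from the role table (NameNode and DataNode on master, DataNode on slave1, SecondaryNameNode and DataNode on slave2). A small sketch of that check; has_daemon is plain string parsing, so it is exercised here against sample output (an illustrative assumption) and on a real node would be fed $(jps):

```shell
#!/bin/sh
# Check that expected HDFS daemons appear in jps output.
has_daemon() {
    # $1: jps output, $2: daemon name (e.g. NameNode)
    echo "$1" | grep -qw "$2"
}

# Sample output shaped like jps on the master node (PIDs are made up);
# real usage: out=$(jps); has_daemon "$out" NameNode || echo "NameNode missing"
sample="12345 NameNode
12401 DataNode
12777 Jps"

has_daemon "$sample" NameNode && echo "NameNode: ok"
has_daemon "$sample" DataNode && echo "DataNode: ok"
```

Run the equivalent check on slave1 and slave2 with their expected daemon names before moving on to YARN.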

Start YARN on the slave1 node (because slave1 is configured as the YARN master)

    cd /usr/local/src/hadoop/hadoop-3.3.1/sbin
    ./start-yarn.sh

Verify the services

View the HDFS NameNode web UI

http://192.168.3.184:9870/

View the YARN ResourceManager web UI

http://192.168.3.185:8088

6. Testing

cd $HADOOP_HOME/share/hadoop/mapreduce

Write ten 10 MB files to the HDFS cluster:


  
    hadoop jar ./hadoop-mapreduce-client-jobclient-3.3.1-tests.jar \
        TestDFSIO \
        -write \
        -nrFiles 10 \
        -size 10MB

Delete the test's temporary files:

    hadoop jar ./hadoop-mapreduce-client-jobclient-3.3.1-tests.jar \
        TestDFSIO \
        -clean

Test the efficiency of repeatedly running small jobs

Run a small job 20 times with 3 mappers and 3 reducers, generating 5 input lines sorted in descending order:


  
    hadoop jar hadoop-mapreduce-client-jobclient-3.3.1-tests.jar \
        mrbench \
        -numRuns 20 \
        -maps 3 \
        -reduces 3 \
        -inputLines 5 \
        -inputType descending

Test NameNode load

Create 100 files using 3 mappers and 3 reducers:


  
    hadoop jar hadoop-mapreduce-client-jobclient-3.3.1-tests.jar \
        nnbench \
        -operation create_write \
        -maps 3 \
        -reduces 3 \
        -numberOfFiles 100 \
        -replicationFactorPerFile 3 \
        -readFileAfterOpen true


Reposted from: https://blog.csdn.net/weixin_38751513/article/details/128674646