I. Introduction
Hadoop implements a distributed file system, the Hadoop Distributed File System (HDFS). HDFS is highly fault-tolerant and designed to run on low-cost hardware; it provides high-throughput access to application data and suits applications with very large data sets. HDFS relaxes certain POSIX requirements to allow streaming access to file system data. The two core components of the Hadoop framework are HDFS and MapReduce: HDFS provides storage for massive data, while MapReduce provides computation over it.
II. Deployment Environment Planning
1. Server address plan
No. | IP address | Hostname      | Role     | Username
1   | 10.0.0.67  | Master.Hadoop | NameNode | Hadoop/root
2   | 10.0.0.68  | Slave1.Hadoop | DataNode | Hadoop/root
3   | 10.0.0.69  | Slave2.Hadoop | DataNode | Hadoop/root
2. Deployment environment
[root@Master ~]# cat /etc/redhat-release
CentOS release 6.9 (Final)
[root@Master ~]# uname -r
2.6.32-696.el6.x86_64
[root@Master ~]# /etc/init.d/iptables status
iptables: Firewall is not running.
[root@Master ~]# getenforce
Disabled
3. Use the same /etc/hosts resolution on every node
10.0.0.67 Master.Hadoop
10.0.0.68 Slave1.Hadoop
10.0.0.69 Slave2.Hadoop
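A small sketch of making this hosts-file update idempotent, so re-running the setup does not pile up duplicate entries. The hosts.demo scratch file and the add_host helper are illustrative, not part of the original steps; on the real machines point HOSTS_FILE at /etc/hosts.

```shell
# Idempotently append each cluster entry: add it only if the exact line
# is not already present. Demonstrated on a scratch file (hosts.demo) so
# /etc/hosts is left untouched; set HOSTS_FILE=/etc/hosts on a real node.
HOSTS_FILE=hosts.demo
touch "$HOSTS_FILE"

add_host() {
  # -x: match the whole line, -F: treat the entry as a fixed string
  grep -qxF "$1" "$HOSTS_FILE" || echo "$1" >> "$HOSTS_FILE"
}

add_host "10.0.0.67 Master.Hadoop"
add_host "10.0.0.68 Slave1.Hadoop"
add_host "10.0.0.69 Slave2.Hadoop"
add_host "10.0.0.67 Master.Hadoop"   # duplicate call adds nothing

cat "$HOSTS_FILE"
```

Running the script twice leaves the file unchanged, which matters when the same provisioning script is replayed on all three nodes.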
III. Passwordless SSH Configuration
1. On the Master
[root@Master ~]# ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
Generating public/private dsa key pair.
Created directory '/root/.ssh'.
Your identification has been saved in /root/.ssh/id_dsa.
Your public key has been saved in /root/.ssh/id_dsa.pub.
The key fingerprint is:
d9:50:b7:b1:f9:aa:83:6e:34:b9:0a:10:61:b9:83:e8 root@Master.Hadoop
The key's randomart image is:
+--[ DSA 1024]----+
| o. . o |
| ... . . = |
|.... . + |
|o o. + . |
|. .. S.. . |
| E . + . |
| . . + . |
| . + .. |
| .+. .. |
+-----------------+
[root@Master ~]# cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
2. Distribute the public key to both Slaves
① Slave1
[root@Slave1 ~]# scp root@Master.Hadoop:~/.ssh/id_dsa.pub ~/.ssh/master_dsa.pub
The authenticity of host 'master.hadoop (10.0.0.67)' can't be established.
RSA key fingerprint is b4:24:ea:5f:aa:06:3b:7c:76:93:b9:11:4c:65:70:95.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'master.hadoop,10.0.0.67' (RSA) to the list of known hosts.
root@master.hadoop's password:
id_dsa.pub 100% 608 0.6KB/s 00:00
[root@Slave1 ~]# cat ~/.ssh/master_dsa.pub >> ~/.ssh/authorized_keys
② Slave2
[root@Slave2 ~]# scp root@Master.Hadoop:~/.ssh/id_dsa.pub ~/.ssh/master_dsa.pub
The authenticity of host 'master.hadoop (10.0.0.67)' can't be established.
RSA key fingerprint is b4:24:ea:5f:aa:06:3b:7c:76:93:b9:11:4c:65:70:95.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'master.hadoop,10.0.0.67' (RSA) to the list of known hosts.
root@master.hadoop's password:
id_dsa.pub 100% 608 0.6KB/s 00:00
[root@Slave2 ~]# cat ~/.ssh/master_dsa.pub >> ~/.ssh/authorized_keys
③ Test connections from the Master to the Slaves
[root@Master ~]# ssh Slave1.Hadoop
Last login: Tue Aug 7 10:30:53 2018 from 10.0.0.67
[root@Slave1 ~]# exit
logout
Connection to Slave1.Hadoop closed.
[root@Master ~]# ssh Slave2.Hadoop
Last login: Tue Aug 7 10:31:04 2018 from 10.0.0.67
IV. Hadoop Installation and Configuration
1. On the Master
① Install the Java environment
tar xf jdk-8u181-linux-x64.tar.gz -C /usr/local/
ln -s /usr/local/jdk1.8.0_181/ /usr/local/jdk
Configure the environment variables:
[root@Master ~]# tail -4 /etc/profile
export JAVA_HOME=/usr/local/jdk1.8.0_181
export JRE_HOME=/usr/local/jdk1.8.0_181/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$PATH
[root@Master ~]# source /etc/profile
[root@Master ~]# java -version
java version "1.8.0_181"
Java(TM) SE Runtime Environment (build 1.8.0_181-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)
2. Install and configure Hadoop
① Install
tar -xf hadoop-2.8.0.tar.gz -C /usr/
mv /usr/hadoop-2.8.0/ /usr/hadoop
### Configure the Hadoop environment variables (in /etc/profile) ###
export HADOOP_HOME=/usr/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
② Configure hadoop-env.sh and make it take effect
vim /usr/hadoop/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/local/jdk1.8.0_181
source /usr/hadoop/etc/hadoop/hadoop-env.sh
[root@Master usr]# hadoop version    # check the Hadoop version
Hadoop 2.8.0
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r 91f2b7a13d1e97be65db92ddabc627cc29ac0009
Compiled by jdu on 2017-03-17T04:12Z
Compiled with protoc 2.5.0
From source with checksum 60125541c2b3e266cbf3becc5bda666
This command was run using /usr/hadoop/share/hadoop/common/hadoop-common-2.8.0.jar
③ Create the directories Hadoop needs
mkdir /usr/hadoop/{tmp,hdfs}
mkdir -p /usr/hadoop/hdfs/{name,tmp,data}
④ Edit the core configuration file core-site.xml, which sets the address and port of the HDFS master (the NameNode)
vim /usr/hadoop/etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/hadoop/tmp</value>
    <final>true</final>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://10.0.0.67:9000</value>
    <final>true</final>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
</configuration>
⑤ Configure hdfs-site.xml
vim /usr/hadoop/etc/hadoop/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/usr/hadoop/hdfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/usr/hadoop/hdfs/data</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>master.hadoop:9001</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
⑥ Configure mapred-site.xml
vim /usr/hadoop/etc/hadoop/mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
⑦ Configure yarn-site.xml
vim /usr/hadoop/etc/hadoop/yarn-site.xml
<configuration>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>Master.Hadoop:18040</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>Master.Hadoop:18030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>Master.Hadoop:18088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>Master.Hadoop:18025</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>Master.Hadoop:18141</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
⑧ Configure the masters and slaves files
echo "10.0.0.67" >/usr/hadoop/etc/hadoop/masters
echo -e "10.0.0.68\n10.0.0.69" >/usr/hadoop/etc/hadoop/slaves
Check:
[root@Master hadoop]# cat /usr/hadoop/etc/hadoop/masters
10.0.0.67
[root@Master hadoop]# cat /usr/hadoop/etc/hadoop/slaves
10.0.0.68
10.0.0.69
3. Install and configure the Slaves
① Copy the JDK to the Slaves
scp -rp /usr/local/jdk1.8.0_181 root@Slave1.Hadoop:/usr/local/
scp -rp /usr/local/jdk1.8.0_181 root@Slave2.Hadoop:/usr/local/
② Copy the environment variables in /etc/profile
scp -rp /etc/profile root@Slave1.Hadoop:/etc/
scp -rp /etc/profile root@Slave2.Hadoop:/etc/
③ Copy /usr/hadoop
scp -rp /usr/hadoop root@Slave1.Hadoop:/usr/
scp -rp /usr/hadoop root@Slave2.Hadoop:/usr/
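The per-slave scp steps above can be collapsed into one loop over the slave list. This is a dry-run sketch that only prints the commands (the leading echo is deliberate), so it assumes nothing about the slaves being reachable; remove the echo to perform the actual copies, which requires the passwordless SSH configured earlier.

```shell
# Dry-run distribution loop: print one scp command per artifact per slave.
# Remove the leading "echo" on each line to really copy the files.
for slave in Slave1.Hadoop Slave2.Hadoop; do
  echo scp -rp /usr/local/jdk1.8.0_181 root@"$slave":/usr/local/
  echo scp -rp /etc/profile            root@"$slave":/etc/
  echo scp -rp /usr/hadoop             root@"$slave":/usr/
done | tee dist-cmds.txt
```

Adding a new slave then only means appending its hostname to the loop list (and to the slaves file), instead of duplicating three more scp lines.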
At this point the environment setup is complete.
V. Starting and Verifying the Hadoop Cluster
1. Start
① Format the HDFS file system
/usr/hadoop/bin/hadoop namenode -format
② Start all daemons on the cluster
sh /usr/hadoop/sbin/start-all.sh
Check the Hadoop processes:
[root@Master sbin]# ps -ef|grep hadoop
root 1523 1 3 16:37 ? 00:00:07 /usr/local/jdk1.8.0_181/bin/java -Dproc_secondarynamenode -Xmx1000m -Djava.library.path=/usr/hadoop/lib -Dhadoop.log.dir=/usr/hadoop/logs -Dhadoop.log.file=hadoop-root-secondarynamenode-Master.Hadoop.log -Dhadoop.home.dir=/usr/hadoop -Dhadoop.id.str=root -Dhadoop.root.logger=INFO,RFA -Djava.library.path=/usr/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode
root 1670 1 11 16:37 pts/0 00:00:19 /usr/local/jdk1.8.0_181/bin/java -Dproc_resourcemanager -Xmx1000m -Dhadoop.log.dir=/usr/hadoop/logs -Dyarn.log.dir=/usr/hadoop/logs -Dhadoop.log.file=yarn-root-resourcemanager-Master.Hadoop.log -Dyarn.log.file=yarn-root-resourcemanager-Master.Hadoop.log -Dyarn.home.dir= -Dyarn.id.str=root -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/usr/hadoop/lib/native -Dyarn.policy.file=hadoop-policy.xml -Dhadoop.log.dir=/usr/hadoop/logs -Dyarn.log.dir=/usr/hadoop/logs -Dhadoop.log.file=yarn-root-resourcemanager-Master.Hadoop.log -Dyarn.log.file=yarn-root-resourcemanager-Master.Hadoop.log -Dyarn.home.dir=/usr/hadoop -Dhadoop.home.dir=/usr/hadoop -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/usr/hadoop/lib/native -classpath /usr/hadoop/etc/hadoop:/usr/hadoop/etc/hadoop:/usr/hadoop/etc/hadoop:/usr/hadoop/share/hadoop/common/lib/*:/usr/hadoop/share/hadoop/common/*:/usr/hadoop/share/hadoop/hdfs:/usr/hadoop/share/hadoop/hdfs/lib/*:/usr/hadoop/share/hadoop/hdfs/*:/usr/hadoop/share/hadoop/yarn/lib/*:/usr/hadoop/share/hadoop/yarn/*:/usr/hadoop/share/hadoop/mapreduce/lib/*:/usr/hadoop/share/hadoop/mapreduce/*:/usr/hadoop/contrib/capacity-scheduler/*.jar:/usr/hadoop/contrib/capacity-scheduler/*.jar:/usr/hadoop/contrib/capacity-scheduler/*.jar:/usr/hadoop/contrib/capacity-scheduler/*.jar:/usr/hadoop/share/hadoop/yarn/*:/usr/hadoop/share/hadoop/yarn/lib/*:/usr/hadoop/etc/hadoop/rm-config/log4j.properties org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
root 1941 1235 0 16:40 pts/0 00:00:00 grep --color=auto hadoop
③ Stop all daemons on the cluster
sh /usr/hadoop/sbin/stop-all.sh
Check the Hadoop processes on Slave1.Hadoop and Slave2.Hadoop:
[root@Slave1 ~]# ps -ef|grep hadoop
root 1271 1 2 16:37 ? 00:00:12 /usr/local/jdk1.8.0_181/bin/java -Dproc_datanode -Xmx1000m -Djava.library.path=/usr/hadoop/lib -Dhadoop.log.dir=/usr/hadoop/logs -Dhadoop.log.file=hadoop-root-datanode-Slave1.Hadoop.log -Dhadoop.home.dir=/usr/hadoop -Dhadoop.id.str=root -Dhadoop.root.logger=INFO,RFA -Djava.library.path=/usr/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -server -Dhadoop.security.logger=ERROR,RFAS -Dhadoop.security.logger=ERROR,RFAS -Dhadoop.security.logger=ERROR,RFAS -Dhadoop.security.logger=INFO,RFAS org.apache.hadoop.hdfs.server.datanode.DataNode
root 1363 1 4 16:37 ? 00:00:19 /usr/local/jdk1.8.0_181/bin/java -Dproc_nodemanager -Xmx1000m -Dhadoop.log.dir=/usr/hadoop/logs -Dyarn.log.dir=/usr/hadoop/logs -Dhadoop.log.file=yarn-root-nodemanager-Slave1.Hadoop.log -Dyarn.log.file=yarn-root-nodemanager-Slave1.Hadoop.log -Dyarn.home.dir= -Dyarn.id.str=root -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/usr/hadoop/lib/native -Dyarn.policy.file=hadoop-policy.xml -server -Dhadoop.log.dir=/usr/hadoop/logs -Dyarn.log.dir=/usr/hadoop/logs -Dhadoop.log.file=yarn-root-nodemanager-Slave1.Hadoop.log -Dyarn.log.file=yarn-root-nodemanager-Slave1.Hadoop.log -Dyarn.home.dir=/usr/hadoop -Dhadoop.home.dir=/usr/hadoop -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/usr/hadoop/lib/native -classpath /usr/hadoop/etc/hadoop:/usr/hadoop/etc/hadoop:/usr/hadoop/etc/hadoop:/usr/hadoop/share/hadoop/common/lib/*:/usr/hadoop/share/hadoop/common/*:/usr/hadoop/share/hadoop/hdfs:/usr/hadoop/share/hadoop/hdfs/lib/*:/usr/hadoop/share/hadoop/hdfs/*:/usr/hadoop/share/hadoop/yarn/lib/*:/usr/hadoop/share/hadoop/yarn/*:/usr/hadoop/share/hadoop/mapreduce/lib/*:/usr/hadoop/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar:/contrib/capacity-scheduler/*.jar:/usr/hadoop/share/hadoop/yarn/*:/usr/hadoop/share/hadoop/yarn/lib/*:/usr/hadoop/etc/hadoop/nm-config/log4j.properties org.apache.hadoop.yarn.server.nodemanager.NodeManager
root 1499 1238 0 16:45 pts/0 00:00:00 grep --color=auto hadoop
2. Verify with the jps command
① Master
[root@Master ~]# jps
11329 NameNode
11521 SecondaryNameNode
12269 Jps
11677 ResourceManager
② Slave
[root@Slave1 ~]# jps
4320 Jps
4122 NodeManager
4012 DataNode
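The jps check can be turned into a pass/fail test instead of being read by eye. The listing below is a hard-coded sample standing in for real jps output; on an actual node use JPS_OUT="$(jps)" and adjust the expected daemon list per role (NameNode/SecondaryNameNode/ResourceManager on the Master, DataNode/NodeManager on the Slaves).

```shell
# Check a jps listing for the daemons a Slave node should be running.
# JPS_OUT is a canned sample here; on a live node use: JPS_OUT="$(jps)"
JPS_OUT="4320 Jps
4122 NodeManager
4012 DataNode"

for daemon in DataNode NodeManager; do
  if echo "$JPS_OUT" | grep -qw "$daemon"; then
    echo "$daemon: OK"
  else
    echo "$daemon: MISSING"
  fi
done | tee daemon-check.txt
```

A cron job running this on every node gives a cheap liveness check without opening the web UI.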
3. Check the cluster status from the Master
[root@Master ~]# hadoop dfsadmin -report
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
18/08/08 11:16:08 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Configured Capacity: 38020816896 (35.41 GB)
Present Capacity: 27476373504 (25.59 GB)
DFS Remaining: 27476316160 (25.59 GB)
DFS Used: 57344 (56 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
Pending deletion blocks: 0
-------------------------------------------------
Live datanodes (2):

Name: 10.0.0.68:50010 (Slave1.Hadoop)
Hostname: Slave1.Hadoop
Decommission Status : Normal
Configured Capacity: 19010408448 (17.70 GB)
DFS Used: 28672 (28 KB)
Non DFS Used: 4270084096 (3.98 GB)
DFS Remaining: 13767794688 (12.82 GB)
DFS Used%: 0.00%
DFS Remaining%: 72.42%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Wed Aug 08 11:16:09 CST 2018

Name: 10.0.0.69:50010 (Slave2.Hadoop)
Hostname: Slave2.Hadoop
Decommission Status : Normal
Configured Capacity: 19010408448 (17.70 GB)
DFS Used: 28672 (28 KB)
Non DFS Used: 4329357312 (4.03 GB)
DFS Remaining: 13708521472 (12.77 GB)
DFS Used%: 0.00%
DFS Remaining%: 72.11%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Wed Aug 08 11:16:08 CST 2018
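For scripting, the live-datanode count can be extracted from the report instead of eyeballed. The REPORT variable below is a trimmed sample of the output above; on the Master feed in the real output, for example REPORT="$(hdfs dfsadmin -report)" (the report itself recommends the hdfs command over the deprecated hadoop one).

```shell
# Extract the live-datanode count from a dfsadmin report and compare it
# with the expected cluster size. REPORT is a canned excerpt; on the
# Master use: REPORT="$(hdfs dfsadmin -report 2>/dev/null)"
REPORT="Configured Capacity: 38020816896 (35.41 GB)
Live datanodes (2):
Name: 10.0.0.68:50010 (Slave1.Hadoop)
Name: 10.0.0.69:50010 (Slave2.Hadoop)"
EXPECTED=2

# Pull the number out of the "Live datanodes (N):" line
LIVE=$(printf '%s\n' "$REPORT" | sed -n 's/^Live datanodes (\([0-9]*\)).*/\1/p')

if [ "$LIVE" = "$EXPECTED" ]; then
  echo "cluster healthy: $LIVE/$EXPECTED datanodes live" | tee report-check.txt
else
  echo "WARNING: only $LIVE of $EXPECTED datanodes live" | tee report-check.txt
fi
```

Wired into monitoring, this catches a DataNode dropping out long before users notice missing replicas.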
4. Check the cluster status via the web UI
http://10.0.0.67:50070 (the NameNode web UI; 50070 is the default port in Hadoop 2.x)