Linux Ops and Architecture: Building a Fully Distributed Hadoop Cluster
Published: 2019-06-13


I. Introduction

Hadoop implements a distributed file system, the Hadoop Distributed File System (HDFS).

HDFS is highly fault-tolerant and designed to be deployed on low-cost hardware. It provides high-throughput access to application data, making it well suited to applications with very large data sets. HDFS relaxes some POSIX requirements so that data in the file system can be accessed as a stream.
The core of the Hadoop framework is the pair HDFS and MapReduce: HDFS provides storage for massive amounts of data, and MapReduce provides the computation over it.

II. Deployment Environment Planning

1. Server address plan

No.  IP address   Hostname        Role       User
1    10.0.0.67    Master.Hadoop   NameNode   Hadoop/root
2    10.0.0.68    Slave1.Hadoop   DataNode   Hadoop/root
3    10.0.0.69    Slave2.Hadoop   DataNode   Hadoop/root

2. Deployment environment

[root@Master ~]# cat /etc/redhat-release 
CentOS release 6.9 (Final)
[root@Master ~]# uname -r
2.6.32-696.el6.x86_64
[root@Master ~]# /etc/init.d/iptables status
iptables: Firewall is not running.
[root@Master ~]# getenforce 
Disabled

3. Unified /etc/hosts resolution

10.0.0.67  Master.Hadoop
10.0.0.68  Slave1.Hadoop
10.0.0.69  Slave2.Hadoop
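The same three entries must be present on every node. A minimal way to push them out from the Master is sketched below; note that until the SSH keys in the next section are in place, each scp will prompt for the root password:

for h in 10.0.0.68 10.0.0.69; do
    scp /etc/hosts root@$h:/etc/    # push the unified hosts file to each slave
done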

III. Passwordless SSH Setup

1. On the Master

[root@Master ~]# ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
Generating public/private dsa key pair.
Created directory '/root/.ssh'.
Your identification has been saved in /root/.ssh/id_dsa.
Your public key has been saved in /root/.ssh/id_dsa.pub.
The key fingerprint is:
d9:50:b7:b1:f9:aa:83:6e:34:b9:0a:10:61:b9:83:e8 root@Master.Hadoop
The key's randomart image is:
+--[ DSA 1024]----+
|  o.      . o    |
| ...     . . =   |
|....    .   +    |
|o o.     +   .   |
|. ..    S..   .  |
| E .    +    .   |
|    .  . +  .    |
|     .  + ..     |
|      .+. ..     |
+-----------------+
[root@Master ~]# cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

2. Distribute the public key to both slaves

①Slave1

[root@Slave1 ~]# scp root@Master.Hadoop:~/.ssh/id_dsa.pub ~/.ssh/master_dsa.pub
The authenticity of host 'master.hadoop (10.0.0.67)' can't be established.
RSA key fingerprint is b4:24:ea:5f:aa:06:3b:7c:76:93:b9:11:4c:65:70:95.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'master.hadoop,10.0.0.67' (RSA) to the list of known hosts.
root@master.hadoop's password: 
id_dsa.pub                                                           100%  608     0.6KB/s   00:00    
[root@Slave1 ~]# cat ~/.ssh/master_dsa.pub >> ~/.ssh/authorized_keys

②Slave2

[root@Slave2 ~]# scp root@Master.Hadoop:~/.ssh/id_dsa.pub ~/.ssh/master_dsa.pub
The authenticity of host 'master.hadoop (10.0.0.67)' can't be established.
RSA key fingerprint is b4:24:ea:5f:aa:06:3b:7c:76:93:b9:11:4c:65:70:95.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'master.hadoop,10.0.0.67' (RSA) to the list of known hosts.
root@master.hadoop's password: 
id_dsa.pub                                                           100%  608     0.6KB/s   00:00    
[root@Slave2 ~]# cat ~/.ssh/master_dsa.pub >> ~/.ssh/authorized_keys

③ Test SSH from the Master to the slaves

[root@Master ~]# ssh Slave1.Hadoop
Last login: Tue Aug  7 10:30:53 2018 from 10.0.0.67
[root@Slave1 ~]# exit
logout
Connection to Slave1.Hadoop closed.
[root@Master ~]# ssh Slave2.Hadoop
Last login: Tue Aug  7 10:31:04 2018 from 10.0.0.67
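As an aside, the pull-and-append steps above can be collapsed into a single push from the Master using the stock ssh-copy-id tool; a sketch, assuming root password logins are still allowed on the slaves:

for h in Slave1.Hadoop Slave2.Hadoop; do
    ssh-copy-id -i ~/.ssh/id_dsa.pub root@$h    # appends the key to the slave's ~/.ssh/authorized_keys
done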

IV. Hadoop Installation and Configuration

1. On the Master

① Install the Java environment

tar xf jdk-8u181-linux-x64.tar.gz -C /usr/local/
ln -s /usr/local/jdk1.8.0_181/ /usr/local/jdk

Configure the environment variables

[root@Master ~]# tail -4 /etc/profile
export JAVA_HOME=/usr/local/jdk1.8.0_181
export JRE_HOME=/usr/local/jdk1.8.0_181/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$PATH
[root@Master ~]# source /etc/profile
[root@Master ~]# java -version
java version "1.8.0_181"
Java(TM) SE Runtime Environment (build 1.8.0_181-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)

2. Install and configure Hadoop

① Install

tar -xf hadoop-2.8.0.tar.gz -C /usr/
mv /usr/hadoop-2.8.0/ /usr/hadoop
### Hadoop environment variables (append to /etc/profile, then source it) ###
export HADOOP_HOME=/usr/hadoop
export PATH=$PATH:$HADOOP_HOME/bin

② Configure hadoop-env.sh and apply it

vim /usr/hadoop/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/local/jdk1.8.0_181
source /usr/hadoop/etc/hadoop/hadoop-env.sh
[root@Master usr]# hadoop version    # check the Hadoop version
Hadoop 2.8.0
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r 91f2b7a13d1e97be65db92ddabc627cc29ac0009
Compiled by jdu on 2017-03-17T04:12Z
Compiled with protoc 2.5.0
From source with checksum 60125541c2b3e266cbf3becc5bda666
This command was run using /usr/hadoop/share/hadoop/common/hadoop-common-2.8.0.jar

③ Create the directories Hadoop needs

mkdir /usr/hadoop/{tmp,hdfs}
mkdir -p /usr/hadoop/hdfs/{name,tmp,data}
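These directories are referenced by the configuration files below: /usr/hadoop/tmp backs hadoop.tmp.dir in core-site.xml, while hdfs/name and hdfs/data back dfs.name.dir and dfs.data.dir in hdfs-site.xml.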

④ Edit Hadoop's core configuration file, core-site.xml, which sets the address and port of the HDFS master (the NameNode)

vim /usr/hadoop/etc/hadoop/core-site.xml

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/hadoop/tmp</value>
        <final>true</final>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://10.0.0.67:9000</value>
        <final>true</final>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
</configuration>

⑤ Configure hdfs-site.xml

vim /usr/hadoop/etc/hadoop/hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.name.dir</name>
        <value>/usr/hadoop/hdfs/name</value>
    </property>
    <property>
        <name>dfs.data.dir</name>
        <value>/usr/hadoop/hdfs/data</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>master.hadoop:9001</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
</configuration>

⑥ Configure mapred-site.xml

vim /usr/hadoop/etc/hadoop/mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

⑦ Configure yarn-site.xml

vim /usr/hadoop/etc/hadoop/yarn-site.xml

<configuration>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>Master.Hadoop:18040</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>Master.Hadoop:18030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>Master.Hadoop:18088</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>Master.Hadoop:18025</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>Master.Hadoop:18141</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
</configuration>

⑧ Configure the masters and slaves files

echo "10.0.0.67" >/usr/hadoop/etc/hadoop/masters
echo -e "10.0.0.68\n10.0.0.69" >/usr/hadoop/etc/hadoop/slaves

Verify:

[root@Master hadoop]# cat /usr/hadoop/etc/hadoop/masters
10.0.0.67
[root@Master hadoop]# cat /usr/hadoop/etc/hadoop/slaves 
10.0.0.68
10.0.0.69
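In Hadoop 2.x the slaves file tells start-dfs.sh and start-yarn.sh on which hosts to launch DataNodes and NodeManagers; the masters file, despite its name, only tells start-dfs.sh where to run the SecondaryNameNode.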

3. Install and configure the slave servers

① Copy the JDK to the slaves

scp -rp /usr/local/jdk1.8.0_181 root@Slave1.Hadoop:/usr/local/
scp -rp /usr/local/jdk1.8.0_181 root@Slave2.Hadoop:/usr/local/

② Copy the environment variables in /etc/profile

scp -rp /etc/profile root@Slave1.Hadoop:/etc/
scp -rp /etc/profile root@Slave2.Hadoop:/etc/

③ Copy /usr/hadoop

scp -rp /usr/hadoop root@Slave1.Hadoop:/usr/
scp -rp /usr/hadoop root@Slave2.Hadoop:/usr/

At this point the environment setup is complete.
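Before starting anything, a quick sanity check can confirm that every node sees the same Java and Hadoop versions; a sketch run from the Master, assuming the layout above:

for h in Slave1.Hadoop Slave2.Hadoop; do
    echo "== $h =="
    # source /etc/profile so JAVA_HOME and HADOOP_HOME take effect in the non-login shell
    ssh root@$h 'source /etc/profile; java -version 2>&1 | head -1; hadoop version | head -1'
done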

V. Start and Verify the Hadoop Cluster

1. Start the cluster

① Format the HDFS file system

/usr/hadoop/bin/hadoop namenode -format
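Two cautions: the launcher lives under bin/ in Hadoop 2.x (hence the path above, not sbin/), and formatting wipes the NameNode metadata, so run it only once on a fresh cluster. The non-deprecated equivalent is:

/usr/hadoop/bin/hdfs namenode -format    # preferred over 'hadoop namenode -format' in 2.x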

② Start all cluster daemons

sh /usr/hadoop/sbin/start-all.sh
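start-all.sh is deprecated in Hadoop 2.x (it prints a warning) and simply chains the two scripts below; calling them directly is the cleaner form:

/usr/hadoop/sbin/start-dfs.sh     # NameNode, SecondaryNameNode, and the DataNodes listed in slaves
/usr/hadoop/sbin/start-yarn.sh    # ResourceManager and the NodeManagers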

Check the Hadoop processes

[root@Master sbin]# ps -ef|grep hadoop
root       1523      1  3 16:37 ?        00:00:07 /usr/local/jdk1.8.0_181/bin/java -Dproc_secondarynamenode -Xmx1000m -Djava.library.path=/usr/hadoop/lib -Dhadoop.log.dir=/usr/hadoop/logs -Dhadoop.log.file=hadoop-root-secondarynamenode-Master.Hadoop.log -Dhadoop.home.dir=/usr/hadoop -Dhadoop.id.str=root -Dhadoop.root.logger=INFO,RFA -Djava.library.path=/usr/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode
root       1670      1 11 16:37 pts/0    00:00:19 /usr/local/jdk1.8.0_181/bin/java -Dproc_resourcemanager -Xmx1000m -Dhadoop.log.dir=/usr/hadoop/logs -Dyarn.log.dir=/usr/hadoop/logs -Dhadoop.log.file=yarn-root-resourcemanager-Master.Hadoop.log -Dyarn.log.file=yarn-root-resourcemanager-Master.Hadoop.log -Dyarn.home.dir= -Dyarn.id.str=root -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/usr/hadoop/lib/native -Dyarn.policy.file=hadoop-policy.xml -Dhadoop.log.dir=/usr/hadoop/logs -Dyarn.log.dir=/usr/hadoop/logs -Dhadoop.log.file=yarn-root-resourcemanager-Master.Hadoop.log -Dyarn.log.file=yarn-root-resourcemanager-Master.Hadoop.log -Dyarn.home.dir=/usr/hadoop -Dhadoop.home.dir=/usr/hadoop -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/usr/hadoop/lib/native -classpath /usr/hadoop/etc/hadoop:/usr/hadoop/etc/hadoop:/usr/hadoop/etc/hadoop:/usr/hadoop/share/hadoop/common/lib/*:/usr/hadoop/share/hadoop/common/*:/usr/hadoop/share/hadoop/hdfs:/usr/hadoop/share/hadoop/hdfs/lib/*:/usr/hadoop/share/hadoop/hdfs/*:/usr/hadoop/share/hadoop/yarn/lib/*:/usr/hadoop/share/hadoop/yarn/*:/usr/hadoop/share/hadoop/mapreduce/lib/*:/usr/hadoop/share/hadoop/mapreduce/*:/usr/hadoop/contrib/capacity-scheduler/*.jar:/usr/hadoop/contrib/capacity-scheduler/*.jar:/usr/hadoop/contrib/capacity-scheduler/*.jar:/usr/hadoop/contrib/capacity-scheduler/*.jar:/usr/hadoop/share/hadoop/yarn/*:/usr/hadoop/share/hadoop/yarn/lib/*:/usr/hadoop/etc/hadoop/rm-config/log4j.properties org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
root       1941   1235  0 16:40 pts/0    00:00:00 grep --color=auto hadoop

③ Stop all cluster daemons

sh /usr/hadoop/sbin/stop-all.sh

Check the Hadoop processes on Slave1.Hadoop and Slave2.Hadoop

[root@Slave1 ~]# ps -ef|grep hadoop
root       1271      1  2 16:37 ?        00:00:12 /usr/local/jdk1.8.0_181/bin/java -Dproc_datanode -Xmx1000m -Djava.library.path=/usr/hadoop/lib -Dhadoop.log.dir=/usr/hadoop/logs -Dhadoop.log.file=hadoop-root-datanode-Slave1.Hadoop.log -Dhadoop.home.dir=/usr/hadoop -Dhadoop.id.str=root -Dhadoop.root.logger=INFO,RFA -Djava.library.path=/usr/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -server -Dhadoop.security.logger=ERROR,RFAS -Dhadoop.security.logger=ERROR,RFAS -Dhadoop.security.logger=ERROR,RFAS -Dhadoop.security.logger=INFO,RFAS org.apache.hadoop.hdfs.server.datanode.DataNode
root       1363      1  4 16:37 ?        00:00:19 /usr/local/jdk1.8.0_181/bin/java -Dproc_nodemanager -Xmx1000m -Dhadoop.log.dir=/usr/hadoop/logs -Dyarn.log.dir=/usr/hadoop/logs -Dhadoop.log.file=yarn-root-nodemanager-Slave1.Hadoop.log -Dyarn.log.file=yarn-root-nodemanager-Slave1.Hadoop.log -Dyarn.home.dir= -Dyarn.id.str=root -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/usr/hadoop/lib/native -Dyarn.policy.file=hadoop-policy.xml -server -Dhadoop.log.dir=/usr/hadoop/logs -Dyarn.log.dir=/usr/hadoop/logs -Dhadoop.log.file=yarn-root-nodemanager-Slave1.Hadoop.log -Dyarn.log.file=yarn-root-nodemanager-Slave1.Hadoop.log -Dyarn.home.dir=/usr/hadoop -Dhadoop.home.dir=/usr/hadoop -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/usr/hadoop/lib/native -classpath /usr/hadoop/etc/hadoop:/usr/hadoop/etc/hadoop:/usr/hadoop/etc/hadoop:/usr/hadoop/share/hadoop/common/lib/*:/usr/hadoop/share/hadoop/common/*:/usr/hadoop/share/hadoop/hdfs:/usr/hadoop/share/hadoop/hdfs/lib/*:/usr/hadoop/share/hadoop/hdfs/*:/usr/hadoop/share/hadoop/yarn/lib/*:/usr/hadoop/share/hadoop/yarn/*:/usr/hadoop/share/hadoop/mapreduce/lib/*:/usr/hadoop/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar:/contrib/capacity-scheduler/*.jar:/usr/hadoop/share/hadoop/yarn/*:/usr/hadoop/share/hadoop/yarn/lib/*:/usr/hadoop/etc/hadoop/nm-config/log4j.properties org.apache.hadoop.yarn.server.nodemanager.NodeManager
root       1499   1238  0 16:45 pts/0    00:00:00 grep --color=auto hadoop

2. Test with the jps command

①Master

[root@Master ~]# jps
11329 NameNode
11521 SecondaryNameNode
12269 Jps
11677 ResourceManager

②Slave

[root@Slave1 ~]# jps
4320 Jps
4122 NodeManager
4012 DataNode

3. Check cluster status from the Master

[root@Master ~]# hadoop dfsadmin -report
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
18/08/08 11:16:08 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Configured Capacity: 38020816896 (35.41 GB)
Present Capacity: 27476373504 (25.59 GB)
DFS Remaining: 27476316160 (25.59 GB)
DFS Used: 57344 (56 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
Pending deletion blocks: 0
-------------------------------------------------
Live datanodes (2):

Name: 10.0.0.68:50010 (Slave1.Hadoop)
Hostname: Slave1.Hadoop
Decommission Status : Normal
Configured Capacity: 19010408448 (17.70 GB)
DFS Used: 28672 (28 KB)
Non DFS Used: 4270084096 (3.98 GB)
DFS Remaining: 13767794688 (12.82 GB)
DFS Used%: 0.00%
DFS Remaining%: 72.42%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Wed Aug 08 11:16:09 CST 2018

Name: 10.0.0.69:50010 (Slave2.Hadoop)
Hostname: Slave2.Hadoop
Decommission Status : Normal
Configured Capacity: 19010408448 (17.70 GB)
DFS Used: 28672 (28 KB)
Non DFS Used: 4329357312 (4.03 GB)
DFS Remaining: 13708521472 (12.77 GB)
DFS Used%: 0.00%
DFS Remaining%: 72.11%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Wed Aug 08 11:16:08 CST 2018
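A simple end-to-end smoke test exercises both layers: write a file into HDFS, then run a small MapReduce job through YARN. A sketch, assuming the examples jar sits in its standard location inside the 2.8.0 tarball:

hadoop fs -mkdir -p /test                  # create a directory in HDFS
hadoop fs -put /etc/hosts /test/           # store a small file (HDFS, the storage layer)
hadoop fs -ls /test                        # confirm the file landed
# run the bundled pi estimator (MapReduce, the compute layer; 2 maps, 10 samples each)
hadoop jar /usr/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.0.jar pi 2 10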

4. Check cluster status via the web UI

http://10.0.0.67:50070
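Given the yarn-site.xml settings above, the YARN ResourceManager web UI should likewise be reachable at http://10.0.0.67:18088 (the yarn.resourcemanager.webapp.address configured earlier).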

 

Reposted from: https://www.cnblogs.com/yanxinjiang/p/9437964.html
