Summary:
Packages used for this pseudo-distributed big-data cluster setup: jdk-8u144-linux-x64 and hadoop-2.7.4
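Installing the virtual machine environment
Reposted: Installing the virtual machine environment
Installing the virtual machine (CentOS 7)
Reposted: Installing the virtual machine (CentOS 7)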
Setting up the Hadoop platform
Preparation
Configure the master node's hostname
Rename the master node to master:
```bash
sudo hostnamectl set-hostname master
reboot
```
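Configure a static IP address on each node and enable external network access
Note: DNS1 must be set to the gateway address, otherwise the node cannot reach the external network.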
master node
```bash
vi /etc/sysconfig/network-scripts/ifcfg-ens33
```
In the file, set BOOTPROTO=static and ONBOOT=yes, then add:
```
IPADDR=192.168.*.*
NETMASK=255.255.255.0
GATEWAY=192.168.*.2
DNS1=192.168.*.2
```
Then restart the network service: service network restart
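The same edit can also be scripted. The sketch below assumes the VMware NAT layout used above, where the gateway and DNS server are the .2 address of the node's /24 subnet. The helper names set_kv and configure_static_ip and the address 192.168.10.100 are hypothetical examples.

```bash
#!/usr/bin/env bash
# Sketch: set static-IP keys in an ifcfg file (hypothetical helper names).
# Assumes a /24 network whose gateway and DNS server is the .2 address.

# Replace KEY=... if the key already exists, otherwise append KEY=VALUE.
set_kv() {
  local file=$1 key=$2 value=$3
  if grep -q "^${key}=" "$file"; then
    sed -i "s|^${key}=.*|${key}=${value}|" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

# Write the static-IP settings for one node into the given ifcfg file.
configure_static_ip() {
  local file=$1 ip=$2
  local gw="${ip%.*}.2"   # e.g. 192.168.10.100 -> 192.168.10.2
  set_kv "$file" BOOTPROTO static
  set_kv "$file" ONBOOT yes
  set_kv "$file" IPADDR "$ip"
  set_kv "$file" NETMASK 255.255.255.0
  set_kv "$file" GATEWAY "$gw"
  set_kv "$file" DNS1 "$gw"   # DNS1 equals the gateway, as the note above requires
}
```

Usage, as root: configure_static_ip /etc/sysconfig/network-scripts/ifcfg-ens33 192.168.10.100, then service network restart. The helper replaces existing keys instead of blindly appending them, so the file never ends up with two conflicting BOOTPROTO lines.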
Configure the hosts mapping
master node
Add the following mapping entries to /etc/hosts:
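A hosts entry is just "IP hostname". The bash sketch below appends one only if the name is not already mapped, so the step can be repeated safely. The helper add_host_mapping and the IP 192.168.10.100 are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: append "IP HOSTNAME" to a hosts file unless HOSTNAME is already mapped.
# add_host_mapping is a hypothetical helper; the IP used below is only an example.
add_host_mapping() {
  local file=$1 ip=$2 name=$3
  # Look for the name in the hostname columns (field 2 onward), skipping comments.
  if awk -v n="$name" '
      /^[ \t]*#/ { next }
      { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
      END { exit !found }' "$file"; then
    echo "$name is already mapped in $file" >&2
    return 0
  fi
  printf '%s %s\n' "$ip" "$name" >> "$file"
}
```

For example, as root: add_host_mapping /etc/hosts 192.168.10.100 master. The XML files below refer to the host as master1, so that name also has to resolve to this node.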
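Disable the firewall
1. Check the firewall status: systemctl status firewalld.service
2. Stop firewalld:
```bash
systemctl stop firewalld.service
```
3. Prevent firewalld from starting at boot:
```bash
systemctl disable firewalld.service
```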
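The firewall steps can be wrapped in one function that also checks the result. Below is a bash sketch to run as root. The name disable_firewalld is hypothetical; systemctl is-active and is-enabled are standard systemctl subcommands.

```bash
#!/usr/bin/env bash
# Sketch: stop and disable firewalld, then confirm its state (run as root).
disable_firewalld() {
  systemctl stop firewalld.service || return 1
  systemctl disable firewalld.service || return 1
  # is-active exits 0 only while the unit is running
  if systemctl is-active --quiet firewalld.service; then
    echo "firewalld is still running" >&2
    return 1
  fi
  # is-enabled prints the boot-time state, e.g. "disabled"
  echo "firewalld: $(systemctl is-enabled firewalld.service 2>/dev/null)"
}
```

After running it, firewall-cmd --state should print "not running".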
Time synchronization
At the command line, install ntpdate: yum install -y ntpdate
Once the download finishes, run:
```bash
ntpdate -u ntp1.aliyun.com
```
Then run:
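A single ntpdate run corrects the clock only at that moment. One common approach, shown here as an assumption rather than as part of the original setup, is to repeat it from cron. The 30-minute interval, the /usr/sbin/ntpdate path (where the CentOS 7 package installs it) and the helper name ntp_cron_entry are all assumptions.

```bash
#!/usr/bin/env bash
# Sketch: build a crontab line that re-runs ntpdate periodically.
# ntp_cron_entry is hypothetical; the default 30-minute interval is arbitrary.
ntp_cron_entry() {
  local server=${1:-ntp1.aliyun.com} minutes=${2:-30}
  printf '*/%s * * * * /usr/sbin/ntpdate -u %s >/dev/null 2>&1\n' "$minutes" "$server"
}
```

As root, (crontab -l 2>/dev/null; ntp_cron_entry) | crontab - appends the line to root's existing crontab.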
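Create a user (as root)
```bash
useradd -m joker -s /bin/bash
passwd joker
# The Ubuntu-style "adduser joker sudo" is not needed on CentOS;
# sudo rights are granted through /etc/sudoers below.
# Make the file writable for root only; chmod 777 would make sudo refuse to run.
chmod u+w /etc/sudoers
vi /etc/sudoers
```
Add the line:
```
joker ALL=(ALL) ALL
```
Then restore the file's permissions and switch to the new user:
```bash
chmod 440 /etc/sudoers
su - joker
```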
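Another option avoids changing the permissions of /etc/sudoers at all: a drop-in file under /etc/sudoers.d, which the stock CentOS 7 sudoers file includes through its "#includedir /etc/sudoers.d" line. Below is a bash sketch. The helper write_sudoers_dropin is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: grant a user full sudo rights through a drop-in file instead of
# editing /etc/sudoers directly. write_sudoers_dropin is a hypothetical helper.
write_sudoers_dropin() {
  local dir=$1 user=$2
  local file="$dir/$user"   # sudo skips names containing '.' or ending in '~'
  printf '%s ALL=(ALL) ALL\n' "$user" > "$file"
  chmod 440 "$file"         # same mode the main sudoers file uses
  echo "$file"
}
```

As root: write_sudoers_dropin /etc/sudoers.d joker && visudo -cf /etc/sudoers.d/joker. The visudo check catches syntax errors before they can lock you out of sudo.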
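Configure passwordless SSH access
Generate a key pair
At the command line, run ssh-keygen -t rsa and keep pressing Enter until it finishes.
This creates two files in the current user's ~/.ssh/ directory on each node: id_rsa, the private key, and id_rsa.pub, the public key.
On the master node, run:
```bash
cd ~/.ssh/
cat id_rsa.pub >> authorized_keys
chmod 600 ./authorized_keys
```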
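Key generation and authorization can also be done non-interactively. Running ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa skips the passphrase prompt. The bash sketch below then adds the public key only once and sets the permissions that sshd's StrictModes check expects: 700 on ~/.ssh and 600 on authorized_keys. The helper authorize_key is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: authorize a public key for passwordless login, idempotently.
# authorize_key is a hypothetical helper.
authorize_key() {
  local ssh_dir=$1 pub_key=$2
  local auth="$ssh_dir/authorized_keys"
  mkdir -p "$ssh_dir"
  touch "$auth"
  # Append the key only if that exact line is not already present
  if ! grep -qxF "$(cat "$pub_key")" "$auth"; then
    cat "$pub_key" >> "$auth"
  fi
  # sshd ignores keys in group- or world-writable locations
  chmod 700 "$ssh_dir"
  chmod 600 "$auth"
}
```

Run authorize_key ~/.ssh ~/.ssh/id_rsa.pub. After that, ssh master hostname should print master without asking for a password. The first connection still asks you to confirm the host key.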
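Install the JDK
Perform this step on both the master and the slave machines.
Extract the JDK:
```bash
sudo tar -zxvf /opt/sorftware/jdk-8u144-linux-x64.tar.gz -C /opt/modules/
```
Configure the environment variables: edit the profile file (sudo vi /etc/profile) and append the following at the end:
```bash
export JAVA_HOME=/opt/modules/jdk1.8.0_144
export JRE_HOME=$JAVA_HOME/jre
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=$JAVA_HOME/lib:$JRE_HOME/lib
```
After saving, apply the edited file: source /etc/profile
Check that the installation succeeded: java -version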
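Appending to /etc/profile twice leaves duplicate exports. The bash sketch below adds the JDK block only once, guarded by a marker comment. The helper add_java_profile and the marker text are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: append the JDK variables to a profile file once, guarded by a marker
# comment so re-running the step does not duplicate them.
# add_java_profile and the marker text are hypothetical.
add_java_profile() {
  local profile=$1 java_home=$2
  local marker='# >>> JDK environment >>>'
  if grep -qxF "$marker" "$profile" 2>/dev/null; then
    return 0   # block already present
  fi
  # \$ keeps the variable references literal so they expand at login time
  cat >> "$profile" <<EOF
$marker
export JAVA_HOME=$java_home
export JRE_HOME=\$JAVA_HOME/jre
export PATH=\$JAVA_HOME/bin:\$PATH
export CLASSPATH=\$JAVA_HOME/lib:\$JRE_HOME/lib
EOF
}
```

As root: add_java_profile /etc/profile /opt/modules/jdk1.8.0_144, then source /etc/profile and java -version.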
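Install and configure Hadoop
Installation
Extract Hadoop:
```bash
sudo tar -zxvf /opt/sorftware/hadoop-2.7.4.tar.gz -C /opt/modules/
```
Configure the environment variables by appending the following to the end of /etc/profile:
```bash
export HADOOP_HOME=/opt/modules/hadoop-2.7.4
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```
After saving, apply the edited profile and give the joker user ownership of the Hadoop directory:
```bash
source /etc/profile
sudo chown -R joker /opt/modules/hadoop-2.7.4/
```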
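Before configuring Hadoop, it is worth confirming that the profile change took effect. The bash sketch below checks that HADOOP_HOME is set and that the hadoop found on PATH is the one under it. The helper check_hadoop_env is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: sanity-check the Hadoop environment variables.
# check_hadoop_env is a hypothetical helper.
check_hadoop_env() {
  if [ -z "${HADOOP_HOME:-}" ]; then
    echo "HADOOP_HOME is not set" >&2
    return 1
  fi
  if [ ! -x "$HADOOP_HOME/bin/hadoop" ]; then
    echo "$HADOOP_HOME/bin/hadoop is missing or not executable" >&2
    return 1
  fi
  # command -v shows which hadoop the shell would actually run
  if [ "$(command -v hadoop)" != "$HADOOP_HOME/bin/hadoop" ]; then
    echo "hadoop on PATH is not the one under HADOOP_HOME" >&2
    return 1
  fi
  echo "ok: $HADOOP_HOME"
}
```

After source /etc/profile, check_hadoop_env should print ok: /opt/modules/hadoop-2.7.4, and hadoop version should then report 2.7.4.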
Configuration
The files to configure are in /opt/modules/hadoop-2.7.4/etc/hadoop. The following files need to be modified:
hadoop-env.sh
yarn-env.sh
core-site.xml
hdfs-site.xml
mapred-site.xml
yarn-site.xml
slaves
Both hadoop-env.sh and yarn-env.sh need the JDK environment variable.
hadoop-env.sh
Add export JAVA_HOME=/opt/modules/jdk1.8.0_144 below the comment line "# The java implementation to use."
yarn-env.sh
Add export JAVA_HOME=/opt/modules/jdk1.8.0_144 below the comment line "# some Java parameters"
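Both edits can be scripted by inserting the export directly below the anchor comment. This way the variable is set before the rest of each script uses it. The bash sketch assumes the comment lines appear exactly as in the stock 2.7.4 files. The helper insert_java_home_after is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: insert "export JAVA_HOME=..." right below an anchor comment line.
# insert_java_home_after is a hypothetical helper.
insert_java_home_after() {
  local file=$1 anchor=$2 java_home=$3
  local line="export JAVA_HOME=$java_home"
  grep -qxF "$line" "$file" && return 0   # already present
  # Print every line; after the first line starting with the anchor, print ours.
  # awk exits non-zero when the anchor was never found.
  awk -v a="$anchor" -v l="$line" '
    { print }
    !done && index($0, a) == 1 { print l; done = 1 }
    END { exit !done }' "$file" > "$file.tmp" || {
      rm -f "$file.tmp"; echo "anchor not found in $file" >&2; return 1; }
  # Copy back with cat so the original file's owner and mode are kept
  cat "$file.tmp" > "$file" && rm -f "$file.tmp"
}
```

Usage, from /opt/modules/hadoop-2.7.4/etc/hadoop: insert_java_home_after hadoop-env.sh '# The java implementation to use.' /opt/modules/jdk1.8.0_144, and insert_java_home_after yarn-env.sh '# some Java parameters' /opt/modules/jdk1.8.0_144. If the anchor is missing, the function fails instead of putting the line somewhere else.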
core-site.xml
Create the directory used by hadoop.tmp.dir:
```bash
mkdir /opt/modules/hadoop-2.7.4/data
```
Then edit core-site.xml:
```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://…</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/modules/hadoop-2.7.4/data</value>
  </property>
</configuration>
```
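Once Hadoop is on the PATH, hdfs getconf -confKey hadoop.tmp.dir prints the value that Hadoop actually resolves. For a quick look before that point, the rough bash sketch below pulls a property's value out of a *-site.xml by text matching. The helper get_hadoop_prop is hypothetical. It assumes each value is plain text directly following its name, which holds for the files in this guide, and it is not a real XML parser.

```bash
#!/usr/bin/env bash
# Rough sketch: print the <value> of a named property in a Hadoop *-site.xml.
# get_hadoop_prop is hypothetical; it matches text, it does not parse XML.
get_hadoop_prop() {
  local file=$1 name=$2 re_name
  # Escape the dots in the property name so sed matches them literally
  re_name=$(printf '%s' "$name" | sed 's/\./\\./g')
  # Join all lines into one, then capture the text between <value> and </value>
  # that directly follows <name>NAME</name>; prints nothing if NAME is absent
  tr '\n' ' ' < "$file" |
    sed -n "s|.*<name>${re_name}</name>[[:space:]]*<value>\([^<]*\)</value>.*|\1|p"
}
```

For example, get_hadoop_prop /opt/modules/hadoop-2.7.4/etc/hadoop/core-site.xml hadoop.tmp.dir should print /opt/modules/hadoop-2.7.4/data.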
hdfs-site.xml
Create the NameNode and DataNode directories and give them to the joker user:
```bash
sudo mkdir -p /usr/dfs/name /usr/dfs/data
sudo chown -R joker /usr/dfs
```
```xml
<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>master1:9001</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/dfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.datanode.directoryscan.throttle.limit.ms.per.sec</name>
    <value>1000</value>
  </property>
  <property>
    <name>dfs.datanode.max.transfer.threads</name>
    <value>8192</value>
  </property>
</configuration>
```
mapred-site.xml
```bash
cp mapred-site.xml.template mapred-site.xml
```
```xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>master1:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master1:19888</value>
  </property>
</configuration>
```
yarn-site.xml
```xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master1:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master1:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master1:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>master1:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master1:8088</value>
  </property>
  <property>
    <name>yarn.log.server.url</name>
    <value>http://master1:19888/jobhistory/logs/</value>
  </property>
</configuration>
```
slaves
Change its contents to:
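Start
Format the NameNode: on the master node, go to /opt/modules/hadoop-2.7.4 and run:
```bash
# ./bin/hadoop namenode -format also works, but is deprecated in Hadoop 2.x
./bin/hdfs namenode -format
```
The message "successfully formatted" indicates that formatting succeeded.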
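Formatting should normally be done only once. Reformatting assigns a new clusterID, and DataNodes that still hold data from the old one will then refuse to start. The bash sketch below provides two checks: one confirms that the NameNode directory (/usr/dfs/name in hdfs-site.xml above) has not been formatted yet, and the other looks for the success message in a saved log. Both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: guard against re-formatting and check the format log.
# namenode_formatted and format_succeeded are hypothetical helpers.

# Succeeds if the NameNode metadata directory already contains current/VERSION,
# which the format step creates.
namenode_formatted() {
  [ -f "$1/current/VERSION" ]
}

# Succeeds if a saved "hdfs namenode -format" log reports success.
format_succeeded() {
  grep -q 'successfully formatted' "$1"
}
```

For example: namenode_formatted /usr/dfs/name || { ./bin/hdfs namenode -format 2>&1 | tee /tmp/nn-format.log; format_succeeded /tmp/nn-format.log && echo formatted; }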
Start Hadoop
On the master node, in the hadoop-2.7.4 directory, run:
```bash
./sbin/mr-jobhistory-daemon.sh start historyserver
```
The jps processes on the master node are:
NameNode
SecondaryNameNode
ResourceManager
Jps
DataNode
NodeManager
Note: JobHistoryServer (start command: mr-jobhistory-daemon.sh start historyserver)
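The jps check can be automated. The listing above assumes that HDFS and YARN were started beforehand, in Hadoop 2.7 typically with sbin/start-dfs.sh and sbin/start-yarn.sh, in addition to the history server. The bash sketch below prints every expected daemon that jps does not report. The function missing_daemons and the expected_daemons list are hypothetical; the list simply mirrors the processes above.

```bash
#!/usr/bin/env bash
# Sketch: compare running Java daemons (from jps) with the expected set.
# expected_daemons mirrors the list above; missing_daemons is hypothetical.
expected_daemons=(NameNode SecondaryNameNode DataNode ResourceManager NodeManager JobHistoryServer)

# Print every expected daemon that jps does not report; succeed if none are missing.
missing_daemons() {
  local running d missing=0
  running=$(jps | awk '{print $2}')   # jps prints "PID Name" per line
  for d in "${expected_daemons[@]}"; do
    if ! grep -qx "$d" <<< "$running"; then
      echo "$d"
      missing=1
    fi
  done
  return "$missing"
}
```

Run missing_daemons && echo "all daemons running". If it prints a name, check that daemon's log under /opt/modules/hadoop-2.7.4/logs first.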