Setting Up a Single-Node, Pseudo-Distributed Hadoop Cluster

Prerequisites:

A Linux server with root access (this guide uses Debian 10 as the example)

1. Update the system packages

apt update && apt upgrade

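2. Install the JDK and Hadoop

Download Hadoop and the JDK

Hadoop website: https://hadoop.apache.org/

Oracle website: https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html

Move each archive to its target directory and extract it (the commands below assume both archives were downloaded to your home directory and that you start there)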

mv jdk-8u231-linux-x64.tar.gz /usr/local/lib
cd /usr/local/lib
tar xzvf jdk-8u231-linux-x64.tar.gz
cd
mv hadoop-2.7.3.tar.gz /usr/local
cd /usr/local
tar xzvf hadoop-2.7.3.tar.gz

3. Configure the environment variables

Open /etc/profile

vi /etc/profile

Add the following lines

export JAVA_HOME=/usr/local/lib/jdk1.8.0_231
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_HOME=/usr/local/hadoop-2.7.3
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Reload the profile so the changes take effect in the current shell

source /etc/profile
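
Before moving on, it can help to confirm that both variables point at real installations. The following is a minimal sketch; check_home is a hypothetical helper name, not part of the JDK or Hadoop, and it only tests that the expected launcher exists under the bin directory.

```shell
#!/bin/sh
# check_home NAME DIR BIN (hypothetical helper):
# succeed only if DIR/bin/BIN exists and is executable.
check_home() {
    name=$1; dir=$2; bin=$3
    if [ -x "$dir/bin/$bin" ]; then
        echo "$name ok: $dir"
    else
        echo "$name problem: $dir/bin/$bin not found or not executable" >&2
        return 1
    fi
}
```

Once the profile has been reloaded, it could be called as check_home JAVA_HOME "$JAVA_HOME" java && check_home HADOOP_HOME "$HADOOP_HOME" hadoop. If both checks pass, java -version and hadoop version should also run from any directory.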

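5. Set up SSH keys

The Hadoop start scripts launch each daemon over SSH, so passwordless login to localhost is needed. Generate a key pair and copy the public key to the local machine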

ssh-keygen
ssh-copy-id localhost
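
ssh-keygen prompts for a file name and passphrase, and ssh-copy-id needs to log in with a password once, which fails if password login is disabled for the account. As a non-interactive alternative, the sketch below generates a passphrase-less key and authorizes it by appending to authorized_keys directly. setup_local_key is a hypothetical helper name, and the choice of an RSA key with an empty passphrase is an assumption suited to a test machine only.

```shell
#!/bin/sh
# setup_local_key DIR (hypothetical helper, not part of OpenSSH or Hadoop):
# create a passphrase-less RSA key in DIR and authorize it for local login.
setup_local_key() {
    ssh_dir=$1
    mkdir -p "$ssh_dir" || return 1
    chmod 700 "$ssh_dir" || return 1
    # Generate the key only if none exists yet, so reruns keep the old key.
    if [ ! -f "$ssh_dir/id_rsa" ]; then
        ssh-keygen -q -t rsa -N "" -f "$ssh_dir/id_rsa" || return 1
    fi
    # Authorize the public key once; skip if the exact line is already there.
    pub=$(cat "$ssh_dir/id_rsa.pub")
    touch "$ssh_dir/authorized_keys"
    grep -qxF "$pub" "$ssh_dir/authorized_keys" || echo "$pub" >> "$ssh_dir/authorized_keys"
    chmod 600 "$ssh_dir/authorized_keys"
}
```

A typical call would be setup_local_key "$HOME/.ssh", followed by ssh localhost true to confirm that no password is requested. The first connection still asks to accept the host key.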

6. Edit the configuration files

Change to the configuration directory

cd /usr/local/hadoop-2.7.3/etc/hadoop

Edit hadoop-env.sh and set JAVA_HOME

export JAVA_HOME=/usr/local/lib/jdk1.8.0_231

Edit yarn-env.sh and set JAVA_HOME

export JAVA_HOME=/usr/local/lib/jdk1.8.0_231

Edit core-site.xml and add the following

<configuration>
	<property>
		<name>fs.defaultFS</name>
		<value>hdfs://localhost:9000</value>
	</property>
	<property>
		<name>hadoop.tmp.dir</name>
		<value>/usr/local/hadoop-2.7.3/hadoopdata/tmp</value>
	</property>
</configuration>

Edit hdfs-site.xml and add the following

<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
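<!-- dfs.name.dir and dfs.data.dir are the deprecated names of the next two properties in Hadoop 2.x -->
<property>
 <name>dfs.namenode.name.dir</name>
 <value>/usr/local/hadoop-2.7.3/hadoopdata/name</value>
</property>
<property>
 <name>dfs.datanode.data.dir</name>
 <value>/usr/local/hadoop-2.7.3/hadoopdata/data</value>
</property>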
</configuration>

Edit yarn-site.xml and add the following

<configuration>
<!-- Site specific YARN configuration properties -->
<property>
 <name>yarn.nodemanager.aux-services</name>
 <value>mapreduce_shuffle</value>
</property>
<property>
 <name>yarn.resourcemanager.address</name>
 <value>localhost:18040</value>
</property>
<property>
 <name>yarn.resourcemanager.scheduler.address</name>
 <value>localhost:18030</value>
</property>
<property>
 <name>yarn.resourcemanager.resource-tracker.address</name>
 <value>localhost:18025</value>
</property>
<property>
 <name>yarn.resourcemanager.admin.address</name>
 <value>localhost:18141</value>
</property>
<property>
 <name>yarn.resourcemanager.webapp.address</name>
 <value>localhost:18088</value>
</property>
</configuration>

Copy the template to create mapred-site.xml

cp mapred-site.xml.template mapred-site.xml

Then edit mapred-site.xml and add the following

<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
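
With four XML files edited by hand, a typo in a property name or value is easy to miss. The sketch below reads a value back out of a site file so it can be compared with what was intended. get_prop is a hypothetical helper name, and the awk logic assumes each name and value element sits on a single line, as in the files above; it is not a general XML parser.

```shell
#!/bin/sh
# get_prop FILE KEY (hypothetical helper):
# print the <value> that follows <name>KEY</name> in a Hadoop site file.
# Assumes each <name> and <value> element sits on a single line.
get_prop() {
    file=$1; key=$2
    awk -v key="$key" '
        /<name>/  { n = $0; sub(/.*<name>/, "", n); sub(/<\/name>.*/, "", n) }
        /<value>/ { v = $0; sub(/.*<value>/, "", v); sub(/<\/value>.*/, "", v)
                    if (n == key) { print v; found = 1; exit } }
        END { exit (found ? 0 : 1) }
    ' "$file"
}
```

For example, get_prop core-site.xml fs.defaultFS should print the NameNode address entered above, and the function returns a non-zero status when the key is absent. Hadoop's own hdfs getconf -confKey fs.defaultFS reports the value the daemons will actually use.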

7. Format the file system (run this only once; reformatting erases the existing HDFS metadata)

hdfs namenode -format

8. Start the Hadoop cluster

cd /usr/local/hadoop-2.7.3/sbin
./start-dfs.sh
./start-yarn.sh

Check the cluster status

hdfs dfsadmin -report
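
The report only covers HDFS. To see whether all five daemons of a pseudo-distributed setup are up, the output of the JDK's jps tool can be compared against the expected names. The sketch below does that comparison; missing_daemons is a hypothetical helper name, and it reads jps-style lines ("PID Name") from standard input.

```shell
#!/bin/sh
# missing_daemons (hypothetical helper):
# read jps output on stdin and print each expected Hadoop daemon
# that does not appear in it. Prints nothing when all five are running.
missing_daemons() {
    running=$(awk '{ print $2 }')
    for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # -x matches whole lines, so NameNode does not match SecondaryNameNode.
        echo "$running" | grep -qx "$d" || echo "$d"
    done
}
```

It would be used as jps | missing_daemons. Any name it prints points to a daemon whose log under $HADOOP_HOME/logs is worth reading.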

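At this point the single-node, pseudo-distributed Hadoop cluster is installed. Run one of the bundled examples to verify that the installation works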

cd /usr/local/hadoop-2.7.3/share/hadoop/mapreduce
hadoop jar hadoop-mapreduce-examples-2.7.3.jar pi 10 10
