# Installing Hadoop 3.x on CentOS 7: A Complete Guide (Common Prerequisites for Standalone / Pseudo-Distributed / Fully Distributed)
## I. Environment Preparation (Required on All Nodes)
OS: CentOS 7
Software: JDK 8, Hadoop 3.3.6 (stable release)
### 1. Disable the Firewall & SELinux
```bash
# Disable the firewall
systemctl stop firewalld
systemctl disable firewalld
# Disable SELinux (setenforce takes effect immediately but is temporary;
# the sed edit makes it permanent after the next reboot)
setenforce 0
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
```
### 2. Install JDK 1.8
```bash
yum install -y java-1.8.0-openjdk-devel
```
Locate the Java installation:
```bash
readlink -f $(which java)
# Example output: /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.382.b05-1.el7_9.x86_64/jre/bin/java
# JAVA_HOME is the directory up to ...x86_64 (strip the trailing /jre/bin/java)
```
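If you prefer not to copy the path by hand, JAVA_HOME can also be derived from `javac` (a sketch; it assumes the `-devel` package installed above, which puts `javac` on the PATH):

```bash
# javac resolves directly under $JAVA_HOME/bin (no jre/ detour),
# so stripping two path components yields JAVA_HOME
dirname $(dirname $(readlink -f $(which javac)))
```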
### 3. Configure Hosts & the Hostname
```bash
# Set the hostname (example)
hostnamectl set-hostname hadoop-master
# Edit hosts
vim /etc/hosts
# Append
192.168.10.100 hadoop-master
192.168.10.101 hadoop-slave1
192.168.10.102 hadoop-slave2
```
### 4. Set Up Passwordless SSH
```bash
# Generate a key pair (press Enter at every prompt)
ssh-keygen -t rsa
# Passwordless login to the local machine
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
# Copy the key to each node (pseudo-distributed needs only the local machine)
ssh-copy-id hadoop-master
ssh-copy-id hadoop-slave1
ssh-copy-id hadoop-slave2
```
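Key-based login can then be verified with a quick loop (a sketch using the example hostnames above; each command should print the remote hostname without prompting for a password):

```bash
# BatchMode makes ssh fail immediately instead of falling back to a password prompt
for host in hadoop-master hadoop-slave1 hadoop-slave2; do
    ssh -o BatchMode=yes "$host" hostname
done
```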
---
## II. Download and Extract Hadoop 3
### 1. Download (Tsinghua mirror)
```bash
cd /opt
wget https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz
```
### 2. Extract
```bash
tar -zxvf hadoop-3.3.6.tar.gz
mv hadoop-3.3.6 hadoop
```
### 3. Configure Global Environment Variables
```bash
vim /etc/profile
# Append at the end
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.382.b05-1.el7_9.x86_64
export HADOOP_HOME=/opt/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```
Apply the changes:
```bash
source /etc/profile
# Verify
hadoop version
```
---
## III. Hadoop Core Configuration Files
Configuration directory: `/opt/hadoop/etc/hadoop`
### 1. hadoop-env.sh
The `*_USER` variables allow the daemons to run as root (acceptable for a test cluster, not recommended in production).
```bash
vim /opt/hadoop/etc/hadoop/hadoop-env.sh
# Add
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.382.b05-1.el7_9.x86_64
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
```
### 2. core-site.xml
```xml
<configuration>
<!-- Default file system URI -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop-master:9000</value>
</property>
<!-- Base temporary directory -->
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/hadoop/tmp</value>
</property>
<!-- Allow root to impersonate other users (proxyuser settings, used by tools
     such as Hive or HttpFS; these do not disable permission checks) -->
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
</property>
</configuration>
```
### 3. hdfs-site.xml
```xml
<configuration>
<!-- Replication factor (set to 1 for pseudo-distributed) -->
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<!-- Disable HDFS permission checks -->
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
</configuration>
```
### 4. mapred-site.xml
```xml
<configuration>
<!-- Run MapReduce on YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
</configuration>
```
### 5. yarn-site.xml
```xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- ResourceManager address -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop-master</value>
</property>
</configuration>
```
### 6. workers (replaces slaves in 3.x)
```bash
vim /opt/hadoop/etc/hadoop/workers
# List the worker nodes, one per line
hadoop-master
hadoop-slave1
hadoop-slave2
```
---
## IV. Distribute the Configuration (Fully Distributed Only)
```bash
scp -r /opt/hadoop root@hadoop-slave1:/opt/
scp -r /opt/hadoop root@hadoop-slave2:/opt/
# Sync environment variables (then run `source /etc/profile` on each slave)
scp /etc/profile root@hadoop-slave1:/etc/
scp /etc/profile root@hadoop-slave2:/etc/
```
---
## V. Format & Start the Cluster
### 1. Format the NameNode (run only once!)
```bash
hdfs namenode -format
```
### 2. Start HDFS
```bash
start-dfs.sh
```
### 3. Start YARN
```bash
start-yarn.sh
```
### 4. Check the Processes with `jps`
- master: NameNode, SecondaryNameNode, ResourceManager (plus DataNode and NodeManager, since hadoop-master is also listed in `workers`)
- slaves: DataNode, NodeManager
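The expected daemons on the master can be checked in one pass (a sketch; the daemon list assumes hadoop-master also appears in `workers` and therefore runs the worker daemons too):

```bash
# Report each expected daemon as OK or MISSING
for daemon in NameNode SecondaryNameNode ResourceManager DataNode NodeManager; do
    jps 2>/dev/null | grep -qw "$daemon" && echo "$daemon OK" || echo "$daemon MISSING"
done
```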
---
## VI. Access the Web UIs
- HDFS: `http://hadoop-master:9870`
- YARN: `http://hadoop-master:8088`

(If your workstation cannot resolve `hadoop-master`, add it to the local hosts file or use the master's IP address.)
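Whether the UIs are reachable can also be checked from the shell (a sketch; `-L` follows any redirect the UI issues, and both requests should report 200 once the daemons are up):

```bash
# Print the final HTTP status code for each web UI
curl -sL -o /dev/null -w "HDFS UI: %{http_code}\n" http://hadoop-master:9870
curl -sL -o /dev/null -w "YARN UI: %{http_code}\n" http://hadoop-master:8088
```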
---
## VII. Common Start/Stop Commands
```bash
# Stop everything (YARN first, then HDFS)
stop-yarn.sh
stop-dfs.sh
# Start everything (HDFS first, then YARN)
start-dfs.sh
start-yarn.sh
```
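A minimal end-to-end smoke test is to put a file into HDFS and run the bundled wordcount example (a sketch; the jar path matches the 3.3.6 layout installed above):

```bash
# Upload a sample file and run the example MapReduce job
hdfs dfs -mkdir -p /input
hdfs dfs -put $HADOOP_HOME/etc/hadoop/core-site.xml /input
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.6.jar \
    wordcount /input /output
# Inspect the first lines of the result
hdfs dfs -cat /output/part-r-00000 | head
```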
---
## Troubleshooting Common Errors
1. **JAVA_HOME not found**
Double-check the absolute JAVA_HOME path in `hadoop-env.sh`
2. **Connection refused to a worker node**
Check passwordless SSH and hosts resolution
3. **No DataNode after startup**
Usually a clusterID mismatch caused by reformatting; stop HDFS, delete the `tmp` directory on every node, then reformat the NameNode
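For error 3, the usual root cause is a clusterID mismatch between the NameNode and the DataNodes after a second format. A sketch of the reset (this destroys all HDFS data):

```bash
# Wipe the storage directories and re-format; run the rm on EVERY node
stop-dfs.sh
rm -rf /opt/hadoop/tmp
hdfs namenode -format
start-dfs.sh
```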



