# Hadoop Fully Distributed Cluster Setup [Ultra-Detailed, Pitfall-Free | Copy and Run]

Targets: **CentOS 7 / Rocky Linux 8+**
Architecture: `1 master + 2 workers`

- master: NameNode, ResourceManager
- node1: DataNode, NodeManager
- node2: DataNode, NodeManager

Conventions used throughout:

- Hostnames: master / node1 / node2
- User: root on every node
- Install directory: `/usr/local/hadoop`
- JDK: 1.8
- Firewall and SELinux disabled, passwordless SSH, static IPs

---

# Part 1. Base Configuration for All Three Machines (run on every node)

## 1. Set the hostnames

```bash
# run on master
hostnamectl set-hostname master

# run on node1
hostnamectl set-hostname node1

# run on node2
hostnamectl set-hostname node2
```

## 2. Configure hosts (run on all three)

```bash
cat >> /etc/hosts << EOF
192.168.10.100 master
192.168.10.101 node1
192.168.10.102 node2
EOF
```

> Replace these with the real IPs of your three machines.

## 3. Stop the firewall and disable it at boot

```bash
systemctl stop firewalld
systemctl disable firewalld
```

## 4. Disable SELinux

```bash
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
setenforce 0
```

## 5. Install dependencies

```bash
yum install -y wget net-tools vim lrzsz tar
```

---

# Part 2. Passwordless SSH (run on master only)

```bash
# generate the key pair; press Enter through every prompt
ssh-keygen -t rsa

# push the public key to all three machines
ssh-copy-id master
ssh-copy-id node1
ssh-copy-id node2
```

Test: if `ssh node1` logs in without a password, it works.

---

# Part 3. Install JDK 1.8 (all three machines)

## 1. Unpack and install

```bash
# upload jdk-8u341-linux-x64.tar.gz to /usr/local first
tar -zxvf /usr/local/jdk-8u341-linux-x64.tar.gz -C /usr/local/
mv /usr/local/jdk1.8.0_341 /usr/local/jdk
```

## 2. Environment variables

```bash
cat >> /etc/profile << EOF
# JDK
export JAVA_HOME=/usr/local/jdk
export JRE_HOME=\$JAVA_HOME/jre
export CLASSPATH=.:\$JAVA_HOME/lib:\$JRE_HOME/lib
export PATH=\$JAVA_HOME/bin:\$PATH
EOF

source /etc/profile
java -version
```

---

# Part 4. Install Hadoop (on master; distributed to the workers later)

## 1. Unpack

```bash
# upload hadoop-3.3.6.tar.gz to /usr/local first
tar -zxvf /usr/local/hadoop-3.3.6.tar.gz -C /usr/local/
mv /usr/local/hadoop-3.3.6 /usr/local/hadoop
```

## 2. Hadoop environment variables

```bash
cat >> /etc/profile << EOF
# HADOOP
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=\$HADOOP_HOME/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=\$HADOOP_HOME/lib/native
export PATH=\$HADOOP_HOME/bin:\$HADOOP_HOME/sbin:\$PATH
EOF

source /etc/profile
hadoop version
```

## 3. Edit hadoop-env.sh

```bash
vim /usr/local/hadoop/etc/hadoop/hadoop-env.sh
```

Append:

```bash
export JAVA_HOME=/usr/local/jdk
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
```

---

# Part 5. Core Config Files (replace the contents wholesale)

## 1. core-site.xml

```bash
vim /usr/local/hadoop/etc/hadoop/core-site.xml
```

```xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop/tmp</value>
    </property>
    <property>
        <name>fs.trash.interval</name>
        <value>1440</value>
    </property>
</configuration>
```

## 2. hdfs-site.xml

```bash
vim /usr/local/hadoop/etc/hadoop/hdfs-site.xml
```

```xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>node1:50090</value>
    </property>
</configuration>
```

This places the SecondaryNameNode on node1 (50090 is the old Hadoop 2 port; setting it explicitly works, the Hadoop 3 default would be 9868).

## 3. mapred-site.xml

```bash
vim /usr/local/hadoop/etc/hadoop/mapred-site.xml
```

```xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
    </property>
</configuration>
```

## 4. yarn-site.xml

```bash
vim /usr/local/hadoop/etc/hadoop/yarn-site.xml
```

```xml
<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME</value>
    </property>
</configuration>
```

The env-whitelist entry (taken from the official Hadoop 3.3 setup docs) passes `HADOOP_MAPRED_HOME` through to YARN containers; without it, MapReduce jobs typically fail with "Could not find or load main class ... MRAppMaster".

## 5. workers (the worker node list)

```bash
vim /usr/local/hadoop/etc/hadoop/workers
```

Clear the existing contents and write:

```
node1
node2
```
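Before moving on, it is worth syntax-checking the four XML files just edited; a stray character here surfaces later as a cryptic startup failure. A minimal sketch, assuming `xmllint` is available (on CentOS it ships in the libxml2 package):

```bash
# Syntax-check the edited config files. xmllint exits non-zero
# and points at the offending line if a file fails to parse.
cd /usr/local/hadoop/etc/hadoop
for f in core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml; do
  xmllint --noout "$f" && echo "$f: OK"
done
```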
---

# Part 6. Distribute Hadoop to node1 and node2

```bash
scp -r /usr/local/hadoop node1:/usr/local/
scp -r /usr/local/hadoop node2:/usr/local/

# distribute the environment variables
scp /etc/profile node1:/etc/
scp /etc/profile node2:/etc/
```

Then make the variables take effect on both workers:

```bash
source /etc/profile
```

---

# Part 7. Format & Start the Cluster (master only)

## 1. Create the temp directory

```bash
mkdir -p /usr/local/hadoop/tmp
```

## 2. Format the NameNode (**run this exactly once!**)

```bash
hdfs namenode -format
```

## 3. Start HDFS

```bash
start-dfs.sh
```

## 4. Start YARN

```bash
start-yarn.sh
```

---

# Part 8. Check the Cluster Processes

(A script that checks every node from master in one go appears at the end of this guide.)

## On master, `jps` should show:

```
NameNode
ResourceManager
```

## On node1:

```
DataNode
NodeManager
SecondaryNameNode
```

## On node2:

```
DataNode
NodeManager
```

node1 additionally runs the SecondaryNameNode because hdfs-site.xml placed it there.

---

# Part 9. Web UIs

- HDFS: http://master:9870
- YARN: http://master:8088

These hostnames resolve only if the machine running your browser also has the `master` hosts entry; otherwise use the master's IP.

---

# Part 10. Common Start/Stop Commands

```bash
# stop the whole cluster
stop-dfs.sh
stop-yarn.sh

# start the whole cluster
start-dfs.sh
start-yarn.sh
```

---

# Part 11. Key Pitfalls (must read)

1. IP addresses and `/etc/hosts` entries must match one-to-one on **every node**.
2. Run `hdfs namenode -format` only on the **first deployment**; reformatting generates a new cluster ID, and the existing DataNodes will refuse to rejoin.
3. Passwordless SSH must work, or the start scripts hang.
4. The firewall and SELinux must be off.
5. The JDK and Hadoop paths must be **identical on all three machines**.

Points 1 and 3 can be verified mechanically rather than by eye, as the sketches below show.
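A minimal sketch for checking pitfalls 1 and 3 from master, using only the hostnames this guide already set up. `BatchMode=yes` makes ssh fail instead of prompting for a password, so a broken key setup is caught immediately:

```bash
# For each node: show what the hostname resolves to, then attempt a
# non-interactive SSH login.
for host in master node1 node2; do
  echo "$host -> $(getent hosts "$host" | awk '{print $1}')"
  ssh -o BatchMode=yes "$host" true \
    && echo "$host: passwordless SSH OK" \
    || echo "$host: passwordless SSH FAILED"
done
```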

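The Part 8 process check can likewise be driven entirely from master over the same passwordless SSH. A small sketch; it calls jps by its full path because a non-interactive ssh session does not source /etc/profile (the path assumes the JDK location from Part 3):

```bash
# Print the Java process list of every node without logging in by hand.
for host in master node1 node2; do
  echo "===== $host ====="
  ssh "$host" "/usr/local/jdk/bin/jps | grep -v Jps"
done
```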

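To confirm both DataNodes actually registered with the NameNode (the usual casualty of pitfall 2), ask the NameNode for its report:

```bash
# "Live datanodes (2)" means both workers joined. Fewer usually means
# a cluster-ID mismatch from reformatting, or a hosts/firewall problem.
hdfs dfsadmin -report | grep "Live datanodes"
```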

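Finally, a small end-to-end job proves that HDFS and YARN actually cooperate, which jps alone cannot. A minimal smoke test using the examples jar bundled in the 3.3.6 tarball (adjust the version in the jar name if yours differs):

```bash
# Round-trip a file through HDFS.
hdfs dfs -mkdir -p /tmp/smoke
hdfs dfs -put -f /etc/hosts /tmp/smoke/
hdfs dfs -cat /tmp/smoke/hosts

# Run the bundled pi estimator on YARN: 2 map tasks, 10 samples each.
# Success prints "Estimated value of Pi is ..." at the end.
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.6.jar pi 2 10
```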