Installing and Configuring HDFS

Installation Steps

  1. Extract the archive
  2. Grant ownership to the user
$ cd ~/Downloads
$ rz
# upload the Hadoop archive previously downloaded on the Windows machine (rz requires the lrzsz package)
$ sudo tar -zxf ./hadoop-2.7.7.tar.gz -C /usr/local
$ cd /usr/local
# rename the directory to simplify the configuration steps below
$ sudo mv ./hadoop-2.7.7/ ./hadoop
$ sudo chown -R qinphy ./hadoop

Environment Configuration

$ vi ~/.bashrc

Configuration content:

export PATH=$PATH:/usr/local/hadoop/bin:/usr/local/hadoop/sbin
$ source ~/.bashrc
# verify the installation and environment configuration
$ cd ~
$ hadoop version
Hadoop 2.7.7
...
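If hadoop version instead complains that JAVA_HOME is not set, Hadoop also reads it from etc/hadoop/hadoop-env.sh. A minimal sketch, assuming an OpenJDK 8 installed under /usr/lib/jvm (adjust the path to wherever your JDK actually lives):

$ vi /usr/local/hadoop/etc/hadoop/hadoop-env.sh

# example path only; point this at your real JDK directory
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64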

Cluster Configuration

# the configuration files below are all under /usr/local/hadoop/etc/hadoop
$ cd /usr/local/hadoop/etc/hadoop
$ vi slaves

The slaves file specifies the slave nodes (this file is named workers in Hadoop 3.x), one hostname per line:

Slave1
Slave2
$ vi core-site.xml

core-site.xml holds the cluster-wide parameters; here it mainly configures the address the NameNode listens on for DataNode and client communication, plus Hadoop's temporary directory:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://Master:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop/tmp</value>
    </property>
</configuration>
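To confirm that Hadoop actually picks this file up, the effective NameNode address can be queried with the getconf tool; with the configuration above it should print the fs.defaultFS value:

$ hdfs getconf -confKey fs.defaultFS
hdfs://Master:9000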
$ vi hdfs-site.xml

hdfs-site.xml configuration. dfs.replication is the number of replicas kept for each block (not the number of slave nodes); it should not exceed the number of DataNodes, which is 2 in this cluster:

<configuration>
    <property>
        <name>dfs.namenode.http-address</name>
        <value>Master:50070</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>Slave1:50090</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/usr/local/hadoop/tmp/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/usr/local/hadoop/tmp/dfs/data</value>
    </property>
</configuration>
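Note that dfs.replication is only the default applied when a file is written; files already stored in HDFS keep their old replication factor. If needed, an existing file's replication can be changed explicitly (the path below is just an illustration, substitute a real HDFS path):

$ hdfs dfs -setrep -w 2 /path/to/file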
$ cp ./mapred-site.xml.template ./mapred-site.xml
$ rm -f ./mapred-site.xml.template
$ vi mapred-site.xml

mapred-site.xml configuration:

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>Master:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>Master:19888</value>
    </property>
</configuration>
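The two jobhistory addresses above only matter if the JobHistory Server is actually running; it is not started by start-all.sh. Once the cluster is up, it can be launched separately on Master with the script shipped in sbin:

$ mr-jobhistory-daemon.sh start historyserver
# stop it with: mr-jobhistory-daemon.sh stop historyserver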
$ vi yarn-site.xml

yarn-site.xml configuration:

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>Master:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>Master:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>Master:8031</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>Master:8100</value>
    </property>
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
</configuration>
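With yarn.log-aggregation-enable set to true, container logs are gathered into HDFS after each application finishes. Once the cluster is running and a job has completed, they can be fetched from any node; the application id below is a placeholder, take a real one from yarn application -list or the ResourceManager web UI:

$ yarn logs -applicationId <application_id>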

Distribute the configuration to the slave nodes

# copy to every node (run on Master)
$ cd /usr/local
$ sudo tar -zcf ~/hadoop.master.tar.gz ./hadoop
$ scp ~/hadoop.master.tar.gz qinphy@Slave1:/home/qinphy
$ scp ~/hadoop.master.tar.gz qinphy@Slave2:/home/qinphy
# extract on each slave node
$ sudo tar -zxf ~/hadoop.master.tar.gz -C /usr/local
# grant ownership
$ sudo chown -R qinphy /usr/local/hadoop
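If the Master copy has already been started once (so tmp/ and logs/ exist under /usr/local/hadoop), it is common to clear them before packing so the slaves do not inherit stale state; on a fresh install this step can be skipped:

# optional, run on Master before the tar command above when redistributing
$ sudo rm -rf /usr/local/hadoop/tmp /usr/local/hadoop/logs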

Initialization

# run once, on Master only
$ hdfs namenode -format

Starting and Stopping

$ start-all.sh
# equivalent to start-dfs.sh followed by start-yarn.sh

Checking the result:

# on every node
$ jps
# on the Master node
$ hdfs dfsadmin -report
# stop Hadoop
$ stop-all.sh

jps output:

# Master
$ jps
10433 Jps
10317 ResourceManager
9967 NameNode
# Slave1
$ jps
9796 NodeManager
9881 Jps
9722 SecondaryNameNode
9661 DataNode
# the remaining Slave nodes
$ jps
9634 DataNode
9778 Jps
9715 NodeManager

report output: the slave node information should be visible

$ hdfs dfsadmin -report
Configured Capacity: 36477861888 (33.97 GB)
Present Capacity: 30168911872 (28.10 GB)
DFS Remaining: 30168903680 (28.10 GB)
DFS Used: 8192 (8 KB)
DFS Used%: 0.00%
Replicated Blocks:
	Under replicated blocks: 0
	Blocks with corrupt replicas: 0
	Missing blocks: 0
	Missing blocks (with replication factor 1): 0
	Low redundancy blocks with highest priority to recover: 0
	Pending deletion blocks: 0
Erasure Coded Block Groups: 
	Low redundancy block groups: 0
	Block groups with corrupt internal blocks: 0
	Missing block groups: 0
	Low redundancy blocks with highest priority to recover: 0
	Pending deletion blocks: 0

-------------------------------------------------
Live datanodes (2):

Name: 192.168.1.121:9866 (Slave1)
Hostname: Slave1
Decommission Status : Normal
Configured Capacity: 18238930944 (16.99 GB)
DFS Used: 4096 (4 KB)
Non DFS Used: 3154427904 (2.94 GB)
DFS Remaining: 15084498944 (14.05 GB)
DFS Used%: 0.00%
DFS Remaining%: 82.70%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri May 01 08:42:26 CST 2020
Last Block Report: Fri May 01 08:41:45 CST 2020
Num of Blocks: 0


Name: 192.168.1.122:9866 (Slave2)
Hostname: Slave2
Decommission Status : Normal
Configured Capacity: 18238930944 (16.99 GB)
DFS Used: 4096 (4 KB)
Non DFS Used: 3154522112 (2.94 GB)
DFS Remaining: 15084404736 (14.05 GB)
DFS Used%: 0.00%
DFS Remaining%: 82.70%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri May 01 08:42:26 CST 2020
Last Block Report: Fri May 01 08:41:37 CST 2020
Num of Blocks: 0
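As a further smoke test that YARN and MapReduce are wired up, the NodeManagers can be listed and the example job shipped with Hadoop can be run; the jar path below assumes the 2.7.7 layout installed above:

# on Master
$ yarn node -list
$ hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar pi 2 10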

Using HDFS: Create, Read, Update, and Delete

I. Putting a file into HDFS

1. Use hdfs dfs -put file /

Explanation: file is the path of the local file, and / is the root path in HDFS.

For example, suppose I am currently in /usr/local/source and want to upload hadoop-2.7.3.tar.gz from this directory to HDFS. The file is roughly 200 MB, so it will be split into 2 blocks, because the default HDFS block size is 128 MB.

Run the command: hdfs dfs -put ./hadoop-2.7.3.tar.gz /

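The same block layout can also be inspected from the command line with fsck, which lists each block of the file and where its replicas live:

$ hdfs fsck /hadoop-2.7.3.tar.gz -files -blocks -locations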

II. Using the web UI

1. Visit http://hadoop1:50070 (hadoop1 is this example's NameNode host; on the cluster configured above the NameNode web UI is at the address set in dfs.namenode.http-address, i.e. http://Master:50070).


  2. Click the file just uploaded to view its block information. The web UI shows the file split into two blocks, Block 0 and Block 1.

Common HDFS Commands

hdfs dfs -mkdir /abc   # create a directory named /abc

hdfs dfs -ls /   # list the contents of the root directory

hdfs dfs -ls -R /   # recursively list the contents of nested directories

hdfs dfs -put /etc/hosts /abc/hosts   # upload the local /etc/hosts file into HDFS

hdfs dfs -appendToFile /etc/hosts /abc/hosts   # append content to an existing file

hdfs dfs -checksum /abc/hosts   # show the file's checksum

hdfs dfs -du -h /   # show the size of files/directories; -h prints human-readable sizes (with units)

hdfs dfs -get /abc/hosts ./hosts   # download a file from HDFS to the local Linux filesystem; ./hosts is where it is saved locally

hdfs dfs -cat /abc/hosts   # print the contents of a text file in HDFS (only useful for text files)

hdfs dfs -tail /abc/hosts   # print the last 1 KB of a file

hdfs dfs -mv /abc/hosts /abc/xyz   # rename a file or move it to another location

hdfs dfs -cp /abc/xyz /abc/hosts   # copy a file

hdfs dfs -find / -name xyz   # find files named xyz

hdfs dfs -rmdir /abc   # delete the directory /abc (fails if it still contains files)

hdfs dfs -rm /abc/hosts   # delete a file

hdfs dfs -rm -r /abc   # recursively delete a file/directory, even if the directory is not empty

hdfs dfs -df   # show the disk usage of the HDFS filesystem
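A minimal end-to-end session tying several of these commands together, assuming the cluster configured above is running and /abc does not exist yet:

$ hdfs dfs -mkdir /abc
$ hdfs dfs -put /etc/hosts /abc/hosts
$ hdfs dfs -ls /abc
$ hdfs dfs -cat /abc/hosts
$ hdfs dfs -get /abc/hosts ./hosts.copy
$ hdfs dfs -rm -r /abc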