Hadoop HDFS commands usage

posted on Nov 20th, 2016

Apache Hadoop

Hadoop is an Apache open-source framework written in Java that allows distributed processing of large datasets across clusters of computers using simple programming models.

Hadoop applications run in an environment that provides distributed storage and computation across clusters of computers. Hadoop is designed to scale from a single server to thousands of machines, each offering local computation and storage.

Prerequisites

1) A machine running the Ubuntu 14.04 LTS operating system.

2) Apache Hadoop 2.6.4 preinstalled (How to install Hadoop on Ubuntu 14.04)

Hadoop Distributed File System (HDFS)

HDFS holds very large amounts of data and makes that data easy to access. To store data at this scale, files are spread across multiple machines and kept in redundant copies, so the system can recover from data loss when a machine fails. HDFS also makes the data available to applications for parallel processing.

List Command

Lists the contents of the directory specified by path, showing the name, permissions, owner, size and modification date of each entry. -ls -R behaves like -ls, but recursively displays entries in all subdirectories of path. The older -lsr spelling does the same thing but is deprecated in Hadoop 2.x.

$ hdfs dfs -ls <path>
$ hdfs dfs -ls -R <path>
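As a small illustration of working with the recursive listing, the sketch below filters it down to directories only. The helper name hdfs_list_dirs is made up for this example, and it assumes that paths contain no spaces, because awk splits each line on whitespace and takes the last field as the path.

```shell
# Hypothetical helper (not part of Hadoop): print only the directories
# found under an HDFS path.
# Assumption: paths contain no spaces, since awk splits on whitespace.
hdfs_list_dirs() {
  # -ls -R walks the whole tree; directory entries have a permission
  # string that starts with "d", and the path is the last field.
  hdfs dfs -ls -R "$1" | awk '$1 ~ /^d/ { print $NF }'
}
```

Calling hdfs_list_dirs /user/hadoop would print one directory path per line.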

Disk Usage Command

Shows disk usage, in bytes, for all the files which match path; filenames are reported with the full HDFS protocol prefix. -du -s is like -du, but prints a single summary of the disk usage of all files and directories under path. The older -dus spelling does the same thing but is deprecated in Hadoop 2.x.

$ hdfs dfs -du <path>
$ hdfs dfs -du -s <path>
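Because -du reports sizes in bytes in its first column, its output can be sorted to find what is taking up space. The following is a minimal sketch; hdfs_largest is a hypothetical helper name, and the default of five entries is an arbitrary choice.

```shell
# Hypothetical helper (not part of Hadoop): show the N largest entries
# directly under an HDFS path, biggest first.
# $1 = HDFS path, $2 = how many entries to show (defaults to 5).
hdfs_largest() {
  # Sort numerically on the first column (size in bytes), descending.
  hdfs dfs -du "$1" | sort -k1,1nr | head -n "${2:-5}"
}
```

For example, hdfs_largest /user/hadoop 3 would print the three largest files or directories under /user/hadoop.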

Move Command

Moves the file or directory indicated by src to dest, within HDFS.

$ hdfs dfs -mv <src> <dest>

Copy Command

Copies the file or directory identified by src to dest, within HDFS.

$ hdfs dfs -cp <src> <dest>
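The copy and move commands combine naturally when a path should be renamed but a copy of the old data kept. The sketch below is one way to do that; the helper name and the .bak suffix are assumptions made for illustration only.

```shell
# Hypothetical helper (not part of Hadoop): keep a backup copy of an
# HDFS path, then rename the original.
# $1 = existing HDFS path, $2 = new HDFS path.
hdfs_backup_and_rename() {
  # "&&" makes sure the move only happens if the copy succeeded.
  hdfs dfs -cp "$1" "$1.bak" &&
  hdfs dfs -mv "$1" "$2"
}
```

After hdfs_backup_and_rename /data/a /data/b succeeds, the data lives at /data/b and a copy remains at /data/a.bak.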

Remove Command

-rm removes the file identified by path. -rm -r removes the file or directory identified by path and recursively deletes any child entries (i.e., files or subdirectories of path). The older -rmr spelling does the same thing but is deprecated in Hadoop 2.x.

$ hdfs dfs -rm <path>
$ hdfs dfs -rm -r <path>
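A recursive delete is easy to get wrong in a script, for instance when a variable that should hold the path turns out to be empty. The following sketch wraps the recursive remove in a small guard; hdfs_rm_tree is a hypothetical name, and the guard only covers the two most obvious mistakes.

```shell
# Hypothetical helper (not part of Hadoop): recursively delete an HDFS
# path, but refuse an empty argument or the root directory.
hdfs_rm_tree() {
  case "$1" in
    ""|"/")
      echo "refusing to delete '$1'" >&2
      return 1
      ;;
  esac
  hdfs dfs -rm -r "$1"
}
```

With this in place, hdfs_rm_tree "$OUTPUT_DIR" fails safely when OUTPUT_DIR is unset instead of passing a bad path to HDFS.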

Put Command

Copies the file or directory identified by localSrc on the local file system to dest within HDFS.

$ hdfs dfs -put <localSrc> <dest>

Cat Command

Displays the contents of filename on stdout.

$ hdfs dfs -cat <file-name>

Get Command

Copies the file or directory in HDFS identified by src to the local file system path identified by localDest.

$ hdfs dfs -get /user/output/ /home/hadoop_tp/
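Put, cat and get are often used together: upload a file, peek at it, and later fetch it back. The sketch below chains the three under some assumptions: the helper name is hypothetical, the HDFS directory is expected to exist already, and the preview is limited to five lines simply to keep the output short.

```shell
# Hypothetical helper (not part of Hadoop): upload a local file, preview
# it in HDFS, then download it again.
# $1 = local file, $2 = existing HDFS directory, $3 = local destination.
hdfs_round_trip() {
  hdfs dfs -put "$1" "$2/" &&
  hdfs dfs -cat "$2/$(basename "$1")" | head -n 5 &&
  hdfs dfs -get "$2/$(basename "$1")" "$3"
}
```

For example, hdfs_round_trip /home/hadoop_tp/notes.txt /user/output /tmp would upload notes.txt, print its first five lines, and download a copy into /tmp.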

Copy and Move From Local file system Command

-copyFromLocal is identical to the -put command above. -moveFromLocal works like -put, but deletes the local copy on success.

$ hdfs dfs -copyFromLocal <localSrc> <dest>
$ hdfs dfs -moveFromLocal <localSrc> <dest>
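The only practical difference between the two commands is what happens to the local file afterwards: -copyFromLocal leaves it in place, while -moveFromLocal deletes it once the upload succeeds. A script can make that choice explicit, as in this sketch (the helper name and its copy/move keywords are made up for the example):

```shell
# Hypothetical helper (not part of Hadoop): upload a local file and state
# explicitly whether the local copy should survive.
# $1 = "copy" (keep local file) or "move" (delete it after upload)
# $2 = local source, $3 = HDFS destination.
hdfs_ingest() {
  case "$1" in
    copy) hdfs dfs -copyFromLocal "$2" "$3" ;;
    move) hdfs dfs -moveFromLocal "$2" "$3" ;;
    *)
      echo "usage: hdfs_ingest copy|move <localSrc> <dest>" >&2
      return 2
      ;;
  esac
}
```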

Set Replication Command

Sets the target replication factor for files identified by path to rep. The actual replication factor moves toward the target over time; with -w the command waits until the target is reached.

$ hdfs dfs -setrep [-R] [-w] <rep> <path>
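To confirm a change, the replication factor of a file can be read back with -stat and the %r format. The sketch below sets a new factor, waits for it with -w, and prints the stored value; hdfs_set_replication is a hypothetical helper name, and the argument check is only a basic sanity test.

```shell
# Hypothetical helper (not part of Hadoop): set the replication factor of
# a file, wait for it to take effect, then print the stored value.
# $1 = replication factor, $2 = HDFS file path.
hdfs_set_replication() {
  case "$1" in
    ''|0|*[!0-9]*)
      echo "replication factor must be a positive integer" >&2
      return 2
      ;;
  esac
  # -w blocks until replication completes, which can take a while.
  hdfs dfs -setrep -w "$1" "$2" &&
  hdfs dfs -stat %r "$2"
}
```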

Tail Command

Shows the last 1KB of file on stdout.

$ hdfs dfs -tail [-f] <file-name>

Make Directory Command

Creates a directory named path in HDFS.

$ hdfs dfs -mkdir <path>

Test Command

Returns exit status 0 if the check passes, that is, if path exists (-e), has zero length (-z), or is a directory (-d). Returns 1 otherwise.

$ hdfs dfs -test -[ezd] <path>
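Since -test reports its result through the exit status (0 when the check passes), it fits directly into a shell if statement. The sketch below uses it to create a directory only when it is missing. The helper name is hypothetical, and -mkdir -p is used so that missing parent directories are created too.

```shell
# Hypothetical helper (not part of Hadoop): create an HDFS directory
# unless it already exists.
hdfs_ensure_dir() {
  # -test -d exits with status 0 when the path is an existing directory.
  if hdfs dfs -test -d "$1"; then
    echo "exists: $1"
  else
    hdfs dfs -mkdir -p "$1" && echo "created: $1"
  fi
}
```

Running hdfs_ensure_dir /user/output twice would report "created" the first time and "exists" the second.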

Change Command

chmod changes the file permissions associated with one or more objects identified by path. Performs changes recursively with -R. mode is a 3-digit octal mode, or {augo}+/-{rwxX}. Assumes a (all) if no scope is specified and does not apply a umask.

chown sets the owning user and/or group for files or directories identified by path. Sets owner recursively if -R is specified.

chgrp sets the owning group for files or directories identified by path. Sets group recursively if -R is specified.

$ hdfs dfs -chmod [-R] mode[,mode]... <path>
$ hdfs dfs -chown [-R] [owner][:[group]] <path>
$ hdfs dfs -chgrp [-R] group <path>
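As a rough example of the ownership and permission commands working together, the sketch below hands a directory tree to one user and group and then closes it to everyone else. The helper name is hypothetical and the mode 750 is just one reasonable choice: full access for the owner, read and traverse for the group, nothing for others. Changing the owner normally requires HDFS superuser rights.

```shell
# Hypothetical helper (not part of Hadoop): give a directory tree to one
# owner and group, and deny access to everyone else.
# $1 = HDFS path, $2 = owner, $3 = group.
hdfs_restrict_dir() {
  hdfs dfs -chown -R "$2:$3" "$1" &&
  # 750 = rwx for the owner, r-x for the group, no access for others.
  hdfs dfs -chmod -R 750 "$1"
}
```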

Help Command

Returns usage information for one of the commands listed above. You must omit the leading '-' character in cmd.

$ hdfs dfs -help <cmd-name>
