Hadoop HDFS Operations Using Java Example

posted on Nov 20th, 2016

Apache Hadoop

Hadoop is an Apache open source framework written in Java that allows distributed processing of large datasets across clusters of computers using simple programming models.

The Hadoop framework application works in an environment that provides distributed storage and computation across clusters of computers. Hadoop is designed to scale up from a single server to thousands of machines, each offering local computation and storage.

Prerequisites

1) A machine with the Ubuntu 14.04 LTS operating system installed.

2) Apache Hadoop 2.6.4 pre-installed (How to install Hadoop on Ubuntu 14.04)

HDFS Operations Java Example

This program shows how to create a directory, delete a directory, and copy files from the local file system to HDFS programmatically, without using the dfs shell commands.

Step 1 - Add the Hadoop jar files to your Java project's classpath. Add the following jars (a sample compile command follows the list).

/usr/local/hadoop/share/hadoop/common/*
/usr/local/hadoop/share/hadoop/common/lib/*
/usr/local/hadoop/share/hadoop/hdfs/*
/usr/local/hadoop/share/hadoop/hdfs/lib/*
/usr/local/hadoop/share/hadoop/mapreduce/*
/usr/local/hadoop/share/hadoop/mapreduce/lib/*
/usr/local/hadoop/share/hadoop/yarn/*
/usr/local/hadoop/share/hadoop/yarn/lib/*
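
If you prefer to compile from the command line instead of an IDE, the hadoop classpath command prints all of these jars for you. A minimal sketch, assuming HDFS.java is in the current directory:

$ javac -cp $(hadoop classpath) HDFS.java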

HDFS.java

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HDFS {
	//hdfs
	static Configuration config = null;
	static FileSystem dfs = null;
	//local
	static Configuration conf = null;
	static FileSystem localFileSystem = null;
	
	public HDFS() throws IOException {
		//hdfs conf - load the cluster configuration files explicitly
		config = new Configuration();
		config.addResource(new Path("/usr/local/hadoop/etc/hadoop/core-site.xml"));
		config.addResource(new Path("/usr/local/hadoop/etc/hadoop/hdfs-site.xml"));
		// Pin the scheme-to-implementation mappings. This avoids the
		// "No FileSystem for scheme: hdfs" error that can occur when the
		// project is packaged into a single jar and the META-INF service
		// entries from the Hadoop jars get overwritten.
		config.set("fs.hdfs.impl",
				org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
		config.set("fs.file.impl",
				org.apache.hadoop.fs.LocalFileSystem.class.getName());
		dfs = FileSystem.get(config);
		//local conf - getLocal() always returns the local file system,
		//even when core-site.xml is on the classpath
		conf = new Configuration();
		localFileSystem = FileSystem.getLocal(conf);
	}

	public static Configuration getConfiguration() throws IOException {
		//hdfs conf - same settings as the constructor; note that in Hadoop 2.x
		//the configuration files live under etc/hadoop, not the old conf/ directory
		config = new Configuration();
		config.addResource(new Path("/usr/local/hadoop/etc/hadoop/core-site.xml"));
		config.addResource(new Path("/usr/local/hadoop/etc/hadoop/hdfs-site.xml"));
		config.set("fs.hdfs.impl",
				org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
		config.set("fs.file.impl",
				org.apache.hadoop.fs.LocalFileSystem.class.getName());
		return config;
	}

	public static void main(String[] args) throws IOException {
		new HDFS();
		createDir("/user/hduser/abcd");
	//	deleteDir("/user/hduser/img");
	//	copyFromLocalToHdfs("/home/hduser/Desktop/video.mp4", "/user/hduser/video/", getConfiguration());
	}
	
	public static void createDir(String dirName) throws IOException {
		System.out.println(dfs.getWorkingDirectory());
		if (dfs.exists(new Path(dirName))) {
			System.out.println("directory already exists - " + dirName);
		} else {
			Path src = new Path(dirName);
			dfs.mkdirs(src);
			System.out.println("directory created - " + dirName);
		}
	}
	
	public static void deleteDir(String dirName) throws IOException {
		if (dfs.exists(new Path(dirName))) {
			//recursive delete: removes the directory and all of its contents
			dfs.delete(new Path(dirName), true);
			System.out.println("directory deleted - " + dirName);
		} else {
			System.out.println("directory doesn't exist - " + dirName);
		}
	}
	
	public static boolean copyFromLocalToHdfs(String localfile, String hdfsfile, Configuration configuration) throws IOException {
		//local
		Configuration conf = new Configuration();
		FileSystem localFileSystem = FileSystem.getLocal(conf);
		Path src = localFileSystem.makeQualified(new Path(localfile));
		//hdfs
		FileSystem dfs = FileSystem.get(configuration);
		//overwrite the destination if it already exists
		if (dfs.exists(new Path(hdfsfile))) {
			dfs.delete(new Path(hdfsfile), true);
		}
		if (localFileSystem.exists(src)) {
			Path dst = new Path(hdfsfile);
			dfs.copyFromLocalFile(src, dst);
			return true;
		} else {
			System.out.println("local file doesn't exist - " + localfile);
			return false;
		}
	}
}

Step 2 - Change to the /usr/local/hadoop/sbin directory.

$ cd /usr/local/hadoop/sbin

Step 3 - Start all Hadoop daemons.

$ ./start-all.sh
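
You can confirm the daemons actually came up with the JDK's jps tool, which lists running Java processes. On a pseudo-distributed setup you should see NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager:

$ jps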

Step 4 - Run your HDFS program by submitting the project's jar file to Hadoop. Creating the jar file is left to you; a minimal example follows.
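
Assuming HDFS.class is in the current directory (the class has no package, so it sits at the jar root), the JDK's jar tool can package it; hdfs.jar is just an illustrative name:

$ jar cf hdfs.jar HDFS.class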

$ hadoop jar /path/hdfs.jar HDFS
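
If everything is wired up correctly, the program prints the HDFS working directory followed by the confirmation message from createDir. The host and port depend on your fs.defaultFS setting, so the output will look something like:

hdfs://localhost:9000/user/hduser
directory created - /user/hduser/abcd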

Step 5 - Check whether the new directory abcd was created.

$ hdfs dfs -ls /user/hduser/
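
You can also verify programmatically. Below is a minimal sketch using FileSystem.listStatus; the ListDir class name is illustrative, and it reuses the same configuration approach as HDFS.java.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListDir {
	public static void main(String[] args) throws IOException {
		Configuration config = new Configuration();
		config.addResource(new Path("/usr/local/hadoop/etc/hadoop/core-site.xml"));
		config.addResource(new Path("/usr/local/hadoop/etc/hadoop/hdfs-site.xml"));
		FileSystem dfs = FileSystem.get(config);
		//print every entry under /user/hduser, like hdfs dfs -ls
		for (FileStatus status : dfs.listStatus(new Path("/user/hduser/"))) {
			System.out.println(status.getPath());
		}
	}
}

Package and run it the same way as HDFS.java.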

Step 6 - Don't forget to stop the Hadoop daemons when you are done (again from /usr/local/hadoop/sbin).

$ ./stop-all.sh
