Hadoop Developer


Hadoop is an open-source framework for storing and processing big data in a distributed environment across clusters of computers using simple programming models. It is designed to scale from a single server to thousands of machines, each offering local computation and storage.

Training Objectives of Hadoop Developer/Admin:
Hadoop training is designed to help you become a top Hadoop developer. During this course, our expert instructors will train you to:
• Master the concepts of HDFS and MapReduce framework
• Understand Hadoop 2.x Architecture
• Set up a Hadoop cluster and write complex MapReduce programs
• Learn data loading techniques using Sqoop and Flume
• Perform data analytics using Pig, Hive and YARN
• Implement HBase and MapReduce integration
• Learn how to work with RDDs in Spark
• Work on a real-life project on Big Data Analytics

Target Students / Prerequisites:
Students should have an IT background and be familiar with Java and Linux concepts.

Course Content

Hadoop Architecture
Introduction to
Parallel Computing vs. Distributed Computing
How to install Hadoop on your system
How to install Hadoop cluster on multiple
Hadoop Daemons introduction: NameNode, DataNode, JobTracker, TaskTracker
Exploring HDFS (Hadoop Distributed File System)
Exploring the HDFS Apache Web UI
NameNode architecture (EditLog, FsImage, location of replicas)
Secondary NameNode architecture
DataNode architecture
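
To make the "location of replicas" topic concrete, the following is a minimal Python sketch of how a file might be cut into blocks and how a block-to-DataNode map could be recorded. It is a simulation only and does not use any Hadoop API. The function names and the round-robin placement are assumptions made for illustration; real HDFS uses a rack-aware placement policy. The 128 MB block size and replication factor of 3 are the usual Hadoop 2.x defaults.

```python
def split_into_blocks(file_size, block_size=128 * 1024 * 1024):
    """Return how many blocks a file of file_size bytes occupies."""
    # Integer ceiling division; an empty file occupies zero blocks.
    return (file_size + block_size - 1) // block_size


def place_replicas(num_blocks, datanodes, replication=3):
    """Map each block id to the DataNodes holding its replicas.

    Hypothetical round-robin placement, for illustration only.
    Real HDFS placement is rack-aware.
    """
    if replication > len(datanodes):
        raise ValueError("not enough DataNodes for the replication factor")
    block_map = {}
    for block_id in range(num_blocks):
        block_map[block_id] = [
            datanodes[(block_id + r) % len(datanodes)]
            for r in range(replication)
        ]
    return block_map


if __name__ == "__main__":
    blocks = split_into_blocks(300 * 1024 * 1024)
    print(place_replicas(blocks, ["dn1", "dn2", "dn3", "dn4"]))
```

The dictionary returned by place_replicas plays the role of the block map that the NameNode keeps in memory, in a much simplified form.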

MapReduce Architecture
Exploring JobTracker/TaskTracker
How a client submits a Map-Reduce job
Exploring Mapper/Reducer/Combiner
Shuffle: Sort & Partition
Input/output formats
Job Scheduling (FIFO, Fair Scheduler, Capacity Scheduler)
Exploring the Apache MapReduce Web UI
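
The Mapper, Shuffle and Reducer stages listed above can be sketched in a few lines of plain Python. This is an in-memory word count that mimics the data flow only. It is not the Hadoop MapReduce API, and the function names are chosen for illustration.

```python
from collections import defaultdict


def mapper(line):
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key and return the groups in sorted key order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reducer(key, values):
    """Sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce over a list of text lines."""
    mapped = [pair for line in lines for pair in mapper(line)]
    return dict(reducer(key, values) for key, values in shuffle(mapped))


if __name__ == "__main__":
    print(word_count(["big data", "big cluster"]))
```

In a real cluster the mapped pairs are spread across machines, and the shuffle step moves each key's values over the network to the reducer responsible for that key.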

Hadoop Developer Tasks
Sorting in HDFS
Writing a MapReduce program
Reading and writing data using Java
Hadoop Eclipse integration
Mapper in detail
Reducer in detail
Using Combiners
Reducing Intermediate Data with Combiners
Writing Partitioners for Better Load Balancing
Searching in HDFS
Indexing in HDFS
Hands-On Exercise
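
As a warm-up for the combiner and partitioner topics above, here is a small pure-Python sketch. It is a simulation, not Hadoop code. The combiner pre-aggregates map output before it leaves the mapper, and the partitioner picks a reducer from a hash of the key. Hadoop's default HashPartitioner uses the key's Java hashCode; this sketch swaps in zlib.crc32 as a stable stand-in, and the function names are assumptions made for illustration.

```python
import zlib
from collections import Counter


def combine(pairs):
    """Pre-aggregate (word, count) pairs on the map side."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return sorted(totals.items())


def partition(key, num_reducers):
    """Choose a reducer index from a stable hash of the key.

    crc32 is a stand-in for illustration, not Hadoop's hash function.
    """
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def route(pairs, num_reducers):
    """Place every pair in the bucket of the reducer that owns its key."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)].append((key, value))
    return buckets


if __name__ == "__main__":
    map_output = [("big", 1), ("data", 1), ("big", 1), ("big", 1)]
    combined = combine(map_output)  # fewer pairs to send over the network
    print(combined)
    print(route(combined, 2))
```

Because the partition depends only on the key, all values for one key reach the same reducer. A custom partitioner changes this function to spread heavy keys more evenly.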

Hadoop Administrative Tasks
Routine Administrative Procedures
Understanding dfsadmin and mradmin
Block Scanner, Balancer
Health Check & Safe mode
DataNode commissioning/decommissioning
Monitoring and Debugging on a production cluster
NameNode Backup and Recovery
ACL (Access Control List)
Upgrading Hadoop

HBase Architecture
Introduction to HBase
HBase vs. RDBMS
Exploring HBase Master & Region Server
Column Families and Regions
Basic HBase shell commands

Hive Architecture
Introduction to Hive
HBase vs Hive
Installation of Hive
HQL (Hive query language)
Basic Hive commands

Pig Architecture
Introduction to Pig
Installation of Pig on your system
Basic Pig commands
Hands-On Exercise

Sqoop Architecture
Introduction to Sqoop
Installation of Sqoop on your system
Import/Export data between RDBMS and HDFS
Import/Export data between RDBMS and HBase
Import/Export data between RDBMS and Hive
Hands-On Exercise

Mini Project / POC (Proof of Concept)
Facebook-Hive POC
Usage of Hadoop/Hive at Facebook
Static & dynamic partitioning
UDF (User-Defined Functions)
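
The difference between static and dynamic partitioning can be previewed with a short Python sketch. Hive stores each partition in a directory named column=value under the table directory. With static partitioning the partition value is stated in the load statement, while with dynamic partitioning it is taken from each row. The code below is a simulation with illustrative names (page_views, country), not Hive itself.

```python
from collections import defaultdict


def partition_path(table, column, value):
    """Build a Hive-style partition directory name."""
    return f"{table}/{column}={value}"


def load_static(table, column, value, rows):
    """Static: the caller names the partition, and every row goes there."""
    return {partition_path(table, column, value): list(rows)}


def load_dynamic(table, column, rows):
    """Dynamic: the partition is derived from each row's column value."""
    partitions = defaultdict(list)
    for row in rows:
        row = dict(row)          # copy so the caller's data is untouched
        value = row.pop(column)  # the value lives in the path, not the row
        partitions[partition_path(table, column, value)].append(row)
    return dict(partitions)


if __name__ == "__main__":
    rows = [
        {"user": "a", "country": "US"},
        {"user": "b", "country": "IN"},
        {"user": "c", "country": "US"},
    ]
    print(load_dynamic("page_views", "country", rows))
```

Dynamic partitioning is convenient when the set of partition values is not known in advance, at the cost of possibly creating many small partitions.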