Hadoop Framework Course

Hadoop is an open-source framework for storing and processing big data in a distributed environment, across clusters of computers, using simple programming models. It is an open-source tool from the Apache Software Foundation (ASF): its source code is freely available, and the framework itself is written in Java. It is widely used for the distributed storage and processing of very large datasets.

Training Objectives of Hadoop:
To learn Hadoop you should have a good command of Java, because the Hadoop framework is written in Java. We recommend starting with the basics: once you understand them well, the more complex parts are much easier to learn. We provide quality online training, and we provide placement after the training is completed.

Target Students / Prerequisites:
Students must come from an IT background and be familiar with Java and Linux concepts.

Course Content:

Introduction: The Motivation for Hadoop
Problems with traditional large-scale systems
Requirements for a new approach

Hadoop Basic Concepts
An Overview of Hadoop
The Hadoop Distributed File System
Hands-On Exercise
How MapReduce Works
Hands-On Exercise
Anatomy of a Hadoop Cluster
Other Hadoop Ecosystem Components
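To make the MapReduce flow in this module concrete before working on a cluster, here is a minimal single-process sketch in plain Java (no Hadoop classes) of word count passing through the map, shuffle/sort and reduce phases. The class and method names are hypothetical, and the tokenization (lower-casing, splitting on whitespace) is an assumption made for the example.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-process simulation of the MapReduce flow for word count.
// Hypothetical names; no Hadoop classes are used.
class MapReduceSimulation {

    // Map phase: emit (word, 1) for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // Shuffle and sort: group intermediate values by key; TreeMap keeps keys sorted.
    static TreeMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return groups;
    }

    // Reduce phase: sum all counts emitted for one word.
    static int reduce(List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // Runs map over every line, then shuffle, then reduce once per key.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> intermediate = new ArrayList<>();
        for (String line : lines) {
            intermediate.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(intermediate).entrySet()) {
            result.put(group.getKey(), reduce(group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("the quick fox", "The lazy dog", "the end")));
        // prints {dog=1, end=1, fox=1, lazy=1, quick=1, the=3}
    }
}
```

On a real cluster the shuffle is carried out by the framework across machines; here a TreeMap stands in for it, which is why the keys come out sorted.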

Writing a MapReduce Program
Examining a Sample MapReduce Program
With several examples
Basic API Concepts
The Driver Code
The Mapper
The Reducer
Hadoop’s Streaming API
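Hadoop Streaming lets any executable that reads standard input and writes standard output act as a mapper or reducer. By default each line is split at the first tab into key and value, and a reducer's input arrives sorted by key, so the reducer itself has to detect where one key's values end. The sketch below is a hypothetical word-count reducer written in Java under those assumptions; it expects lines of the form word<TAB>count.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

// Hypothetical word-count reducer for Hadoop Streaming.
// Input lines look like "word<TAB>count" and arrive sorted by key, so all
// counts for one word are contiguous and can be summed in a single pass.
class StreamingSumReducer {

    static void reduceLines(Iterable<String> sortedLines, Consumer<String> emit) {
        String currentKey = null;
        long sum = 0;
        for (String line : sortedLines) {
            int tab = line.indexOf('\t');
            if (tab < 0) {
                continue; // skip lines without a key/value separator
            }
            String key = line.substring(0, tab);
            long value = Long.parseLong(line.substring(tab + 1).trim());
            if (currentKey != null && !currentKey.equals(key)) {
                emit.accept(currentKey + "\t" + sum); // key changed: flush previous group
                sum = 0;
            }
            currentKey = key;
            sum += value;
        }
        if (currentKey != null) {
            emit.accept(currentKey + "\t" + sum); // flush the final group
        }
    }

    public static void main(String[] args) {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in, StandardCharsets.UTF_8));
        // Lines are consumed lazily, so the reducer never holds all input in memory.
        reduceLines(in.lines()::iterator, System.out::println);
    }
}
```

The compiled class would be supplied to the streaming jar through its -reducer option (for example as a command that launches java); the jar's location depends on the Hadoop installation.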

Delving Deeper Into The Hadoop API
More About ToolRunner
Testing with MRUnit
Reducing Intermediate Data With Combiners
The configure and close methods for Map/Reduce Setup and Teardown
Writing Partitioners for Better Load Balancing
Hands-On Exercise
Directly Accessing HDFS
Using the Distributed Cache
Hands-On Exercise
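Partitioners and combiners both shape the intermediate data that flows between map and reduce. The plain-Java sketch below (hypothetical names, no Hadoop classes) shows the rule applied by Hadoop's default HashPartitioner, (hashCode & Integer.MAX_VALUE) % numReduceTasks, and a combiner-style local sum that shrinks one mapper's output before it crosses the network. It uses java.lang.String keys; Hadoop's Text keys hash differently, so real reducer assignments would not match these numbers.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration (no Hadoop classes) of partitioning and combining.
class PartitionAndCombine {

    // Same formula as Hadoop's default HashPartitioner: clear the sign bit so the
    // value is non-negative, then take it modulo the number of reduce tasks.
    // Applied here to String.hashCode(); Hadoop's Text keys hash differently.
    static int hashPartition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Combiner-style step: sum the (word, 1) pairs of ONE mapper locally, so each
    // distinct word is sent on once per mapper instead of once per occurrence.
    static Map<String, Integer> combine(List<String> mapperOutputKeys) {
        Map<String, Integer> partialCounts = new HashMap<>();
        for (String key : mapperOutputKeys) {
            partialCounts.merge(key, 1, Integer::sum);
        }
        return partialCounts;
    }

    public static void main(String[] args) {
        List<String> emitted = List.of("a", "b", "a", "a", "c", "b");
        Map<String, Integer> combined = combine(emitted);
        System.out.println(emitted.size() + " pairs before combining, " + combined.size() + " after");
        combined.forEach((word, count) ->
                System.out.println(word + "=" + count + " -> reducer " + hashPartition(word, 4)));
    }
}
```

A combiner is only safe when the reduce operation is associative and commutative, as summation is, because Hadoop may run it zero, one or several times on a mapper's output.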

Performing Several Hadoop Jobs
The configure and close Methods
Sequence Files
Record Reader
Record Writer
Role of Reporter
Output Collector
Processing video and audio files
Processing image files
Processing XML files
Directly Accessing HDFS
Using the Distributed Cache
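With the default TextInputFormat, the record reader hands each map call a key equal to the byte offset at which a line starts and a value equal to that line's text. The sketch below imitates that behaviour in plain Java on an in-memory string; it is not Hadoop's LineRecordReader, the class names are hypothetical, and it assumes UTF-8 text with \n line endings and no input splits.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Imitates the (key, value) pairs a line-oriented record reader gives a mapper:
// key = byte offset where the line starts, value = the line's text.
// Assumes UTF-8 text with '\n' line endings; real input splits are ignored.
class LineRecordSketch {

    static final class OffsetLine {
        final long offset;
        final String line;

        OffsetLine(long offset, String line) {
            this.offset = offset;
            this.line = line;
        }

        @Override
        public String toString() {
            return "(" + offset + ", \"" + line + "\")";
        }
    }

    static List<OffsetLine> read(String content) {
        List<OffsetLine> records = new ArrayList<>();
        long offset = 0;
        int start = 0;
        while (start < content.length()) {
            int end = content.indexOf('\n', start);
            if (end < 0) {
                end = content.length(); // last line without a trailing newline
            }
            String line = content.substring(start, end);
            records.add(new OffsetLine(offset, line));
            // Advance by the line's UTF-8 byte length plus one byte for '\n'.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
            start = end + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        System.out.println(read("hello\nworld\n"));
        // prints [(0, "hello"), (6, "world")]
    }
}
```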

Common MapReduce Algorithms
Sorting and Searching
Classification/Machine Learning
Term Frequency – Inverse Document Frequency
Word Co-Occurrence
Hands-On Exercise: Creating an Inverted Index
Identity Mapper
Identity Reducer
Exploring well-known problems with MapReduce applications
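The inverted-index exercise maps each word to the documents that contain it. Below is a minimal single-process sketch in plain Java, assuming the documents are supplied as an in-memory map from document ID to text and that words are split on non-word characters: the map step emits (word, docId) pairs, and the reduce step keeps the sorted, distinct document IDs for each word. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Single-process sketch of building an inverted index MapReduce-style.
// Plain Java; no Hadoop classes. Names are hypothetical.
class InvertedIndexSketch {

    // Map: emit a (word, docId) pair for every word in one document.
    static List<String[]> map(String docId, String text) {
        List<String[]> pairs = new ArrayList<>();
        for (String word : text.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new String[] {word, docId});
            }
        }
        return pairs;
    }

    // Shuffle + reduce: group pairs by word and keep each word's distinct,
    // sorted document IDs (the TreeSet drops repeats within a document).
    static Map<String, SortedSet<String>> build(Map<String, String> docs) {
        Map<String, SortedSet<String>> index = new TreeMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            for (String[] pair : map(doc.getKey(), doc.getValue())) {
                index.computeIfAbsent(pair[0], w -> new TreeSet<>()).add(pair[1]);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new TreeMap<>();
        docs.put("doc1", "Hadoop stores data.");
        docs.put("doc2", "Hadoop processes data in parallel.");
        build(docs).forEach((word, ids) -> System.out.println(word + "\t" + ids));
    }
}
```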

Using HBase
What is HBase?
Managing large data sets with HBase
Using HBase in Hadoop applications
Hands-on Exercise

Using Hive and Pig
Hive Basics
Pig Basics
Hands-On Exercise

Practical Development Tips and Techniques
Debugging MapReduce Code
Using LocalJobRunner Mode for Easier Debugging
Retrieving Job Information with Counters
Splittable File Formats
Determining the Optimal Number of Reducers
Map-Only MapReduce Jobs
Hands-On Exercise

Debugging MapReduce Programs
Testing with MRUnit
Classification/Machine Learning

Advanced MapReduce Programming
A Recap of the MapReduce Flow
The Secondary Sort
Customized InputFormats and OutputFormats
Pipelining Jobs With Oozie
Map-Side Joins
Reduce-Side Joins
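In the secondary sort pattern, the map output key is a composite of the natural key and the field the values should be ordered by. Sorting uses both fields, while partitioning and grouping use only the natural key, so each reduce call sees its values already in order. The plain-Java sketch below simulates one reducer's view; the CompositeKey class and the sensor/timestamp data are hypothetical, and on Hadoop the same roles would be filled by a WritableComparable key, a Partitioner and a grouping comparator.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Plain-Java simulation of the secondary-sort pattern (no Hadoop classes).
class SecondarySortSketch {

    // Composite map-output key: natural key plus the field values are ordered by.
    static final class CompositeKey {
        final String naturalKey; // e.g. a sensor ID (hypothetical)
        final int secondary;     // e.g. a timestamp (hypothetical)

        CompositeKey(String naturalKey, int secondary) {
            this.naturalKey = naturalKey;
            this.secondary = secondary;
        }
    }

    // Sort order: natural key first, then the secondary field.
    static final Comparator<CompositeKey> SORT_ORDER =
            Comparator.comparing((CompositeKey k) -> k.naturalKey)
                      .thenComparingInt(k -> k.secondary);

    // Partitioning uses ONLY the natural key, so every record for a given
    // natural key goes to the same reducer regardless of its secondary field.
    static int partition(CompositeKey key, int numReducers) {
        return (key.naturalKey.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // One reducer's view: keys are sorted by SORT_ORDER, and a new reduce group
    // starts only when the natural key changes (the grouping comparator's role).
    static List<String> reduceGroups(List<CompositeKey> keys) {
        List<CompositeKey> sorted = new ArrayList<>(keys);
        sorted.sort(SORT_ORDER);
        List<String> groups = new ArrayList<>();
        StringBuilder current = null;
        String currentKey = null;
        for (CompositeKey k : sorted) {
            if (!k.naturalKey.equals(currentKey)) {
                if (current != null) {
                    groups.add(current.toString());
                }
                currentKey = k.naturalKey;
                current = new StringBuilder(currentKey).append(':');
            }
            current.append(' ').append(k.secondary); // values arrive already ordered
        }
        if (current != null) {
            groups.add(current.toString());
        }
        return groups;
    }

    public static void main(String[] args) {
        List<CompositeKey> keys = List.of(
                new CompositeKey("s2", 5), new CompositeKey("s1", 9),
                new CompositeKey("s1", 3), new CompositeKey("s2", 1));
        System.out.println(reduceGroups(keys)); // prints [s1: 3 9, s2: 1 5]
    }
}
```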

Joining Data Sets in MapReduce
Map-Side Joins
The Secondary Sort
Reduce-Side Joins
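In a reduce-side join, mappers read both data sets, emit each record under its join key and tag it with its source; the reducer for a key then pairs the records from the two sources. The sketch below simulates this inner join in plain Java with hypothetical names, assuming both inputs are lists of well-formed key,value strings small enough to hold in memory for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of a reduce-side (repartition) inner join.
// Records are "key,value" strings (assumed well formed); tags "L"/"R" are hypothetical.
class ReduceSideJoinSketch {

    // Map + shuffle: tag each record with its source and group it under its join key.
    static void tagAndGroup(List<String> records, String tag, Map<String, List<String[]>> groups) {
        for (String record : records) {
            int comma = record.indexOf(',');
            String joinKey = record.substring(0, comma);
            String value = record.substring(comma + 1);
            groups.computeIfAbsent(joinKey, k -> new ArrayList<>()).add(new String[] {tag, value});
        }
    }

    // Reduce: for each join key, split values by tag and emit every left/right pairing.
    static List<String> join(List<String> left, List<String> right) {
        Map<String, List<String[]>> groups = new TreeMap<>();
        tagAndGroup(left, "L", groups);
        tagAndGroup(right, "R", groups);

        List<String> joined = new ArrayList<>();
        for (Map.Entry<String, List<String[]>> group : groups.entrySet()) {
            List<String> lefts = new ArrayList<>();
            List<String> rights = new ArrayList<>();
            for (String[] tagged : group.getValue()) {
                if (tagged[0].equals("L")) {
                    lefts.add(tagged[1]);
                } else {
                    rights.add(tagged[1]);
                }
            }
            // Keys present on only one side produce no output (inner join).
            for (String l : lefts) {
                for (String r : rights) {
                    joined.add(group.getKey() + "," + l + "," + r);
                }
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<String> customers = List.of("1,Alice", "2,Bob");
        List<String> orders = List.of("1,book", "1,pen", "3,lamp");
        System.out.println(join(customers, orders)); // prints [1,Alice,book, 1,Alice,pen]
    }
}
```

The pattern works for any pair of inputs, but every record crosses the network during the shuffle, which is the main cost compared with a map-side join.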

Monitoring and Debugging on a Production Cluster
Skipping Bad Records
Rerunning failed tasks with IsolationRunner

Tuning for Performance in MapReduce
Reducing network traffic with combiners
Reducing the amount of input data
Using Compression
Reusing the JVM
Running with speculative execution
Refactoring code and rewriting algorithms
Parameters affecting performance
Other Performance Aspects