Welcome to the HadoopExam Hadoop/BigData Professional Training Course.

Please follow the steps below to view the training content step by step.

Covered Syllabus: 

Module 1 :  Introduction to BigData, Hadoop (HDFS and MapReduce) : Available (Length 35 Minutes)

1. BigData Introduction

2. Hadoop Introduction

3. HDFS Introduction

4. MapReduce Introduction

Module 2 :  Deep Dive in HDFS : Available (Length 48 Minutes) + Useful for CCA175

1. HDFS Design

2. Fundamentals of HDFS (Blocks, NameNode, DataNode, Secondary NameNode)

3. Rack Awareness

4. Read/Write from HDFS

5. HDFS Federation  and High Availability (Hadoop 2.x.x)

6. Parallel Copying using DistCp

7. HDFS Command Line Interface

Module 2A : HDFS File Operation Lifecycle (Supplementary)   : Available (Length 45 Minutes)

1. File Read Cycle from HDFS

  - DistributedFileSystem

  - FSDataInputStream

2. Failure or Error Handling When File Reading Fails

3. File Write Cycle to HDFS

  - FSDataOutputStream

4. Failure or Error Handling When File Writing Fails

Module 3 : Understanding MapReduce : Available (Length 60 Minutes)

1. JobTracker and TaskTracker

2. Topology of a Hadoop Cluster

3. Example of MapReduce

- Map Function

- Reduce Function

4. Java Implementation of MapReduce

5. DataFlow of MapReduce

6. Use of Combiner

Module 4 : MapReduce Internals -1 (In Detail) : Available (Length 57 Minutes)

1. How MapReduce Works

2. Anatomy of MapReduce Job (MR-1)

3. Submission & Initialization of MapReduce Job (What Happens ?)

4. Assigning & Execution of Tasks

5. Monitoring & Progress of MapReduce Job

6. Completion of Job

7. Failure Handling of MapReduce Job

- Task Failure

- TaskTracker Failure

- JobTracker Failure

Module 5 : MapReduce-2 (YARN : Yet Another Resource Negotiator Hadoop 2.x.x ) : Available (Length 52 Minutes)

1. Limitation of Current Architecture (Classic)

2. What are the Requirements ?

3. YARN Architecture

4. JobSubmission and Job Initialization

5. Task Assignment and Task Execution

6. Progress and Monitoring of the Job

7. Failure Handling in YARN

- Task Failure

- Application Master Failure

- Node Manager Failure

- Resource Manager Failure

Module 6 : Advanced Topic for MapReduce (Performance and Optimization) : Available (Length 58 Minutes)

1. Job Scheduling

2. In Depth Shuffle and Sorting

3. Speculative Execution

4. Output Committers

5. JVM Reuse in MR1

6. Configuration and Performance Tuning

Module 7 : Advanced MapReduce Algorithm : Available (Length 87 Minutes) 

File Based Data Structure

- Sequence File

- MapFile

Default Sorting In MapReduce

- Data Filtering (Map-only jobs)

- Partial Sorting

Data Lookup Strategies

- In MapFiles

Sorting Algorithm

- Total Sort (Globally Sorted Data)

- InputSampler

- Secondary Sort

Module 8 : Advanced MapReduce Algorithm -2 : Available : Private (Length 67 Minutes) 

1. MapReduce Joining

- Reduce Side Join

- MapSide Join

- Semi Join

2. MapReduce Job Chaining

- MapReduce Sequence Chaining

- MapReduce Complex Chaining

Module 9 : Features of MapReduce : Available : Private (Length 61 Minutes)

Introduction to MapReduce Counters

    Types of Counters

    Task Counters

    Job Counters

    User Defined Counters

    Propagation of Counters

Side Data Distribution

    Using JobConfiguration

    Distributed Cache

    Steps to Read and Delete Cache File

Module 10: MapReduce DataTypes and Formats : Available : Private (Length 77 Minutes)

      1. Serialization In Hadoop

      2. Hadoop Writable and Comparable

      3. Hadoop RawComparator and Custom Writable

      4. MapReduce Types and Formats

      5. Understand Difference Between Block and InputSplit

      6. Role of RecordReader

      7. FileInputFormat

      8. CombineFileInputFormat and Processing a Whole File in a Single Mapper

      9. Each input File as a record

    10. Text/KeyValue/NLine InputFormat

    11. BinaryInput processing

    12. MultipleInputs Format

    13. DatabaseInput and Output

    14. Text/Binary/Multiple/Lazy OutputFormat MapReduce Types

Module 11 : Apache Pig : Available (Length 52 Minutes)

1. What is Pig ?

2. Introduction to Pig Data Flow Engine

3. Pig and MapReduce in Detail

4. When should Pig be Used ?

5. Pig and Hadoop Cluster

6. Pig Interpreter and MapReduce

7. Pig Relations and Data Types

8. PigLatin Example in Detail

9. Debugging and Generating Example in Apache Pig

Module 11A :   Hands On :  Apache Pig Coding : Available (Length 23 Minutes) 

1. Working with Grunt shell

2. Create word count application

3. Execute word count application

4. Accessing HDFS from grunt shell

Module 11B :   Hands On :  Apache Pig Complex Datatypes : Available (Length 14 Minutes) 

1. Understand Map, Tuple and Bag

2. Create Outer Bag and Inner Bag

3. Defining Pig Schema

Module 11C :   Hands On :  Apache Pig Data loading : Available (Length 14  Minutes) 

1. Understand Load statement

2. Loading csv file

3. Loading csv file with schema

4. Loading Tab separated file

5. Storing back data to HDFS.

Module 11D :  Hands On :   Apache Pig Statements : Available (Length 8 Minutes) 

1. ForEach statement

2. Example 1 : Data projecting and foreach statement

3. Example 2 : Projection using schema

4. Example 3 : Another way of selecting columns using two dots ..

Module 11E :   Hands On :  Apache Pig Complex Datatype practice : Available (Length 16 Minutes) 

1. Example 1 : Loading Complex Datatypes

2. Example 2 : Loading compressed files 

3. Example 3 : Store relation as compressed files

4. Example 4 : Nested FOREACH statements to solve the same problem

Module 12 : Fundamentals of Apache Hive Part-1 : Available (Length 60 Minutes) + Useful for CCA175

1. What is Hive ?

2. Architecture of Hive

3. Hive Services

4. Hive Clients

5. How Hive Differs from Traditional RDBMS

6. Introduction to HiveQL

7. Data Types and File Formats in Hive

8. File Encoding

9. Common problems while working with Hive

Module 13 : Apache Hive : Available (Length 73 Minutes ) + Useful for CCA175

1. HiveQL

2. Managed and External Tables

3. Understand Storage Formats

4. Querying Data

- Sorting and Aggregation

- MapReduce In Query

- Joins, SubQueries and Views

5. Writing User Defined Functions (UDFs)

6. Data types and schemas

7. HiveODBC

Module 14 : Understanding NGram Algorithm : Available (Length 14 Minutes) : Newly Replaced

Module 15 : Hands On : Step-by-Step Process of Creating and Configuring Eclipse for Writing MapReduce Code : Available (Length 29 Minutes) : Newly Replaced

Module 16 : Hands On : Analyzing the Result by Running NGram Application (UniGram, BiGram, TriGram etc.) : Available (Length 19 Minutes) : Newly Replaced

Module 17 : NOSQL Introduction and Implementation : Available (Length 56 Minutes) New

1. What is NoSQL ?

2. NoSQL Characteristics or Common Traits

3. Categories of NoSQL DataBases

    - Key-Value Database

    - Document DataBase

    - Column Family DataBase

    - Graph DataBase

4. Aggregate Orientation : Perfect fit for NoSQL

5. NOSQL Implementation

6. Key-Value Database Example and Use

7. Document DataBase  Example and Use

8. Column Family DataBase Example and Use

9. What is Polyglot persistence ?

Module 18 : HBase Introduction : Available (Part-1 Length 48 Minutes and Part-2 Length 37 Minutes) New

1. Fundamentals of HBase

2. Usage Scenario of HBase

3. Use of HBase in Search Engine

4. HBase DataModel

 - Table and Row

 - Column Family and Column Qualifier

 - Cell and its Versioning

 - Regions and Region Server

5. HBase Designing Tables

6. HBase Data Coordinates

7. Versions and HBase Operation

 - Get/Scan

 - Put

 - Delete

Module 19 : Hands On : Creating a MapReduce Application and Deploying It on a Hadoop Cluster : Available (Length 33 Minutes) : Newly Replaced

1. Creating MapReduce Program

2. Running MapReduce Job

3. Analyzing Resource Manager and looking for the logs

Module 20 :  Apache Cassandra  : Available (Length 63 Minutes) New

1. BigData and Apache Cassandra

2. Why Cassandra is so Popular

3. Cassandra as a Distributed DataBase

4. Cassandra and High Availability

5. Cassandra and Replication Mechanism

6. Cassandra's Elastic Scalability

7. Tunable Consistency

 - Strict Consistency

 - Causal Consistency

 - Weak Consistency

8. Brewer's CAP Theorem

9. Cassandra as a Schema Free DataBase

10. Where should we use Cassandra

11. Who is using Cassandra and why

Module 21:  Hands On MRUnit (MapReduce Testing Framework) : Available (Length 48 Minutes) New

1. Practice Basic MapReduce Without Installing Hadoop Framework

2. Mapper Testing

3. Reducer Testing

4. Counter Testing

5. Full MapReduce Job Testing

Module 22 :  Apache Sqoop (SQL To Hadoop) : Available (Length 66 Minutes) New + Useful for CCA175

1. Sqoop Tutorial

2. How does Sqoop Work

3. Sqoop JDBC Driver and Connectors

4. Sqoop Importing Data

5. Various Options to Import Data

 - Table Import

 - Binary Data Import

 - SpeedUp the Import

 - Filtering Import

 - Full DataBase Import

Module 23 :  Apache Flume  : Available (Length 28 Minutes) New

1. Data Acquisition : Apache Flume Introduction

2. Apache Flume Components

3. POSIX and HDFS File Write

4. Flume Events

5. Interceptors, Channel Selectors, Sink Processor

 

Module 24 :  Advanced Apache Flume  : Available (Length 48 Minutes) New

1. Sample Twitter Feed Configuration

2. Flume Channel

   - Memory Channel

   - File Channel

3. Sinks and Sink Processors

4. Sources

5. Channel Selectors

6. Interceptors

Module 25 : YARN Introduction (Length 52 Mins)  Available Hadoop 2.x. YARN Training

1. Why Think Beyond MapReduce

2. New Components of YARN

3. Revisit Hadoop 1.0

4. How YARN fits in Hadoop Framework

5. Hadoop MR1 Components Revisit

6. Need for Non-MapReduce

7. YARN Components Introduction

Module 26 : Fundamental Overview of YARN (Length 40 Mins) Available Hadoop 2.x. YARN Training

1. YARN Functional Component

2. YARN Architecture Overview

3. Claiming and Re-claiming Resources

4. Functional Properties of

- Resource Manager

- Node Manager

- Application Master

5. YARN Scheduling Component

6. Introduction to FIFO Scheduler

7. Introduction to Capacity Scheduler

Module 27 : Powerful Hadoop 2.0 Framework (Length : 27 Mins) Available Hadoop 2.x. YARN Training

1. HDFS 1.0 Versus Hadoop 2.0

2. Resource Manager - Subcomponent

3. Details About Fair Share Scheduler

4. Hierarchical Queues in Scheduler

5. Containers

6. Node Manager and Its Responsibility

7. Role of Application Master while submitting Jobs

Module 28 : Submitting the Application to YARN Hadoop Cluster (Length : 27 Mins) Available Hadoop 2.x. YARN Training

1. Submitting the Application to YARN Hadoop Cluster

2. Managing Application Dependencies

3. Writing a YARN Application : Bird's-Eye View

Module 29 : LocalResources of the Application Available Hadoop 2.x. YARN Training

1. Understanding of YARN Application/Jobs Dependencies

2. Types of LocalResource

3. Visibilities of Local Resources

4. Lifetime of Local Resources

5. Good and Bad Local Resources

6. Target Directories of Local Resources

Module 30 : Deep Dive in Capacity Scheduler (Length 39 Mins) Available Hadoop 2.x. YARN Training

1. Introduction and Enabling Capacity Scheduler

2. Setting Up Queues in the Capacity Scheduler

3. Access Control List Setup

4. Managing Cluster Capacity with Queues

5. Resource Distribution Workflow Example

Module 31 : Managing Capacity Scheduler (Length  39 Mins) Available Hadoop 2.x. YARN Training

1. Managing Capacity with Queues

2. Resource Distribution Example

3. Understanding User Limits

4. Application Reservation

5. Understanding the Preemption

Module 32 : Hadoop Security : Kerberos Authentication (Length  23 Mins) Available Hadoop Security Training

1. Kerberos Authentication

2. Important Entities of Kerberos Authentication

3. How the Kerberos Process Works

Module 33 : Apache Spark : Introduction to Apache Spark  (Length  48 Mins) Available 100 Times Faster Data Processing + Useful for CCA175

1. Introduction to Apache Spark

2. Features of Apache Spark

3. Apache Spark Stack

4. Introduction to RDDs

5. RDD Transformations

6. What is Good and Bad In MapReduce

7. Why Use Apache Spark

Module 34 : Cloudera QuickStart VM Step By Step Installation  (Length  19  Mins) Available + Steps in PDF+ Hands On Lab

1. It Includes Hadoop 2.0

2. YARN

3. Hive

4. Pig

5. Hue

6. Apache Spark

7. Workflow

Module 35 : Load data in HDFS using the HDFS commands (Length 35  Mins) Available + Steps in PDF + Hands On Lab + Useful for CCA175

Module 36  : Importing Data from RDBMS to HDFS (Length  21 Mins) Available + Steps in PDF+Hands On Lab + Useful for CCA175

1. Without Specifying Directory

2. With target Directory

3. With warehouse directory

Module 37 : Sqoop Import Module  (Length  41 Mins) Available + Steps in PDF +Hands On Lab + Useful for CCA175

1. Importing Subset of data from RDBMS

2. Changing the delimiter during Import

3. Encoding Null values

4. Importing Entire schema or all tables

Module 38 : Importing data to Hive Using Sqoop  (Length  41 Mins) Available +Steps in PDF + Hands On Lab + Useful for CCA175

Module 39 : Apache Avro Introduction  (Length  26 Mins) Available + PDF Download + Useful for CCA175

1. Why Avro files

2. Avro file Serialization and Deserialization

3. Adding fields

4. Deleting fields

Module 40 : Apache Avro Schema In Depth  (Length  12 Mins) Available + PDF Download + Useful for CCA175

1. Avro schema example

2. Avro embedded schema 

3. Avro schema primitive data types

4. Avro schema Complex data types

Record, Map, Array, Union, Enum, Fixed etc.

Module 41 : Apache Avro Schema Evolution  (Length  16 Mins) Available + PDF Download + Useful for CCA175

1. Understand Avro Schema Evolution

2. Reader Schema and Writer Schema

3. JSON schema Adding new fields

4. JSON schema removing a field

All 41 modules above are available and ready to watch.