Welcome to the HadoopExam Hadoop/BigData Professional Training Course.
Have you subscribed for updates? Please subscribe here.
Please follow the steps below to view the training content step by step.
Use SignIn to log in with your permitted email ID.
Use the Pedagogy Navigation to watch individual training modules.
You can download Training PDF Resources only after login.
As new modules are added or existing ones are updated, you can access them from here.
Covered Syllabus:
Module 1 : Introduction to BigData, Hadoop (HDFS and MapReduce) : Available (Length 35 Minutes)
1. BigData Introduction
2. Hadoop Introduction
3. HDFS Introduction
4. MapReduce Introduction
Module 2 : Deep Dive in HDFS : Available (Length 48 Minutes) + Useful for CCA175
1. HDFS Design
2. Fundamental of HDFS (Blocks, NameNode, DataNode, Secondary Name Node)
3. Rack Awareness
4. Read/Write from HDFS
5. HDFS Federation and High Availability (Hadoop 2.x.x)
6. Parallel Copying using DistCp
7. HDFS Command Line Interface
Module 2A : HDFS File Operation Lifecycle (Supplementary) : Available (Length 45 Minutes)
1. File Read Cycle from HDFS
- DistributedFileSystem
- FSDataInputStream
2. Failure or Error Handling When File Reading Fails
3. File Write Cycle to HDFS
- FSDataOutputStream
4. Failure or Error Handling while File write fails
Module 3 : Understanding MapReduce : Available (Length 60 Minutes)
1. JobTracker and TaskTracker
2. Topology of a Hadoop Cluster
3. Example of MapReduce
Map Function
Reduce Function
4. Java Implementation of MapReduce
5. DataFlow of MapReduce
6. Use of Combiner
Module 4 : MapReduce Internals -1 (In Detail) : Available (Length 57 Minutes)
1. How MapReduce Works
2. Anatomy of MapReduce Job (MR-1)
3. Submission & Initialization of MapReduce Job (What Happens?)
4. Assigning & Execution of Tasks
5. Monitoring & Progress of MapReduce Job
6. Completion of Job
7. Handling of MapReduce Job
- Task Failure
- TaskTracker Failure
- JobTracker Failure
Module 5 : MapReduce-2 (YARN : Yet Another Resource Negotiator Hadoop 2.x.x ) : Available (Length 52 Minutes)
1. Limitation of Current Architecture (Classic)
2. What are the Requirements?
3. YARN Architecture
4. JobSubmission and Job Initialization
5. Task Assignment and Task Execution
6. Progress and Monitoring of the Job
7. Failure Handling in YARN
- Task Failure
- Application Master Failure
- Node Manager Failure
- Resource Manager Failure
Module 6 : Advanced Topic for MapReduce (Performance and Optimization) : Available (Length 58 Minutes)
1. Job Scheduling
2. In Depth Shuffle and Sorting
3. Speculative Execution
4. Output Committers
5. JVM Reuse in MR1
6. Configuration and Performance Tuning
Module 7 : Advanced MapReduce Algorithm : Available (Length 87 Minutes)
File Based Data Structure
- Sequence File
- MapFile
Default Sorting In MapReduce
- Data Filtering (Map-only jobs)
- Partial Sorting
Data Lookup Strategies
- In MapFiles
Sorting Algorithm
- Total Sort (Globally Sorted Data)
- InputSampler
- Secondary Sort
Module 8 : Advanced MapReduce Algorithm -2 : Available : Private (Length 67 Minutes)
1. MapReduce Joining
- Reduce Side Join
- MapSide Join
- Semi Join
2. MapReduce Job Chaining
- MapReduce Sequence Chaining
- MapReduce Complex Chaining
Module 9 : Features of MapReduce : Available : Private (Length 61 Minutes)
Introduction to MapReduce Counters
Types of Counters
Task Counters
Job Counters
User Defined Counters
Propagation of Counters
Side Data Distribution
Using JobConfiguration
Distributed Cache
Steps to Read and Delete Cache File
Module 10: MapReduce DataTypes and Formats : Available : Private (Length 77 Minutes)
1. Serialization In Hadoop
2. Hadoop Writable and Comparable
3. Hadoop RawComparator and Custom Writable
4. MapReduce Types and Formats
5. Understand Difference Between Block and InputSplit
6. Role of RecordReader
7. FileInputFormat
8. CombineFileInputFormat and Processing a Whole File in a Single Mapper
9. Each input File as a record
10. Text/KeyValue/NLine InputFormat
11. BinaryInput processing
12. MultipleInputs Format
13. DatabaseInput and Output
14. Text/Binary/Multiple/Lazy OutputFormat MapReduce Types
Module 11 : Apache Pig : Available (Length 52 Minutes)
1. What is Pig ?
2. Introduction to Pig Data Flow Engine
3. Pig and MapReduce in Detail
4. When Should Pig be Used?
5. Pig and Hadoop Cluster
6. Pig Interpreter and MapReduce
7. Pig Relations and Data Types
8. PigLatin Example in Detail
9. Debugging and Generating Example in Apache Pig
Module 11A : Hands On : Apache Pig Coding : Available (Length 23 Minutes)
1. Working with Grunt shell
2. Create word count application
3. Execute word count application
4. Accessing HDFS from grunt shell
Module 11B : Hands On : Apache Pig Complex Datatypes : Available (Length 14 Minutes)
1. Understand Map, Tuple and Bag
2. Create Outer Bag and Inner Bag
3. Defining Pig Schema
Module 11C : Hands On : Apache Pig Data loading : Available (Length 14 Minutes)
1. Understand Load statement
2. Loading csv file
3. Loading csv file with schema
4. Loading Tab separated file
5. Storing back data to HDFS.
Module 11D : Hands On : Apache Pig Statements : Available (Length 8 Minutes)
1. ForEach statement
2. Example 1 : Data projecting and foreach statement
3. Example 2 : Projection using schema
4. Example 3 : Another way of selecting columns using two dots ..
Module 11E : Hands On : Apache Pig Complex Datatype practice : Available (Length 16 Minutes)
1. Example 1 : Loading Complex Datatypes
2. Example 2 : Loading compressed files
3. Example 3 : Store relation as compressed files
4. Example 4 : Nested FOREACH statements to solve the same problem.
Module 12 : Fundamental of Apache Hive Part-1 : Available (Length 60 Minutes) + Useful for CCA175
1. What is Hive ?
2. Architecture of Hive
3. Hive Services
4. Hive Clients
5. How Hive Differs from Traditional RDBMS
6. Introduction to HiveQL
7. Data Types and File Formats in Hive
8. File Encoding
9. Common problems while working with Hive
Module 13 : Apache Hive : Available (Length 73 Minutes ) + Useful for CCA175
1. HiveQL
2. Managed and External Tables
3. Understand Storage Formats
4. Querying Data
- Sorting and Aggregation
- MapReduce In Query
- Joins, SubQueries and Views
5. Writing User Defined Functions (UDFs)
6. Data Types and Schemas
7. HiveODBC
Module 14 : Understanding NGram algorithm Available (Length 14 Minutes) : Newly Replaced
Module 15 : Hands On : Step by Step Process creating and Configuring eclipse for writing MapReduce Code Available (Length 29 Minutes) : Newly Replaced
Module 16 : Hands On : Analyzing the Result by Running NGram application (UniGram, BiGram, TriGram etc.) Available (Length 19 Minutes) : Newly Replaced
Module 17 : NOSQL Introduction and Implementation : Available (Length 56 Minutes) New
1. What is NoSQL ?
2. NoSQL Characteristics or Common Traits
3. Categories of NoSQL DataBases
- Key-Value Database
- Document DataBase
- Column Family DataBase
- Graph DataBase
4. Aggregate Orientation : Perfect fit for NoSQL
5. NOSQL Implementation
6. Key-Value Database Example and Use
7. Document DataBase Example and Use
8. Column Family DataBase Example and Use
9. What is Polyglot persistence ?
Module 18 : HBase Introduction : Available (Part-1 Length 48 Minutes and Part-2 Length 37 Minutes) New
1. Fundamentals of HBase
2. Usage Scenario of HBase
3. Use of HBase in Search Engine
4. HBase DataModel
- Table and Row
- Column Family and Column Qualifier
- Cell and its Versioning
- Regions and Region Server
5. HBase Designing Tables
6. HBase Data Coordinates
7. Versions and HBase Operation
- Get/Scan
- Put
- Delete
Video URL : Watch Private Video Part-1 and Part-2
Module 19 : Hands On Creating MapReduce application and deploying on Hadoop Cluster. Available (Length 33 Minutes) : Newly Replaced
1. Creating MapReduce Program
2. Running MapReduce Job
3. Analyzing Resource Manager and looking for the logs
Module 20 : Apache Cassandra : Available (Length 63 Minutes) New
1. BigData and Apache Cassandra
2. Why Cassandra is so Popular
3. Cassandra as a Distributed DataBase
4. Cassandra and High Availability
5. Cassandra and Replication Mechanism
6. Cassandra's Elastic Scalability
7. Tunable Consistency
- Strict Consistency
- Causal Consistency
- Weak Consistency
8. Brewer's CAP Theorem
9. Cassandra as a Schema Free DataBase
10. Where should we use Cassandra
11. Who is Using Cassandra and Why
Module 21: Hands On MRUnit (MapReduce Testing Framework) : Available (Length 48 Minutes) New
1. Practice Basic MapReduce Without Installing Hadoop Framework
2. Mapper Testing
3. Reducer Testing
4. Counter Testing
5. Full MapReduce Job Testing
Module 22 : Apache Sqoop (SQL To Hadoop) : Available (Length 66 Minutes) New + Useful for CCA175
1. Sqoop Tutorial
2. How does Sqoop Work
3. Sqoop JDBCDriver and Connectors
4. Sqoop Importing Data
5. Various Options to Import Data
- Table Import
- Binary Data Import
- SpeedUp the Import
- Filtering Import
- Full DataBase Import Introduction to Sqoop
Module 23 : Apache Flume : Available (Length 28 Minutes) New
1. Data Acquisition : Apache Flume Introduction
2. Apache Flume Components
3. POSIX and HDFS File Write
4. Flume Events
5. Interceptors, Channel Selectors, Sink Processor
Module 24 : Advanced Apache Flume : Available (Length 48 Minutes) New
1. Sample Twitter Feed Configuration
2. Flume Channel
- Memory Channel
- File Channel
3. Sinks and Sink Processors
4. Sources
5. Channel Selectors
6. Interceptors
Module 25 : YARN Introduction (Length 52 Mins) Available Hadoop 2.x. YARN Training
1. Why to think Beyond MapReduce
2. New Components of YARN
3. Revisit Hadoop 1.0
4. How YARN fits in Hadoop Framework
5. Hadoop MR1 Components Revisit
6. Need for Non-MapReduce
7. YARN Components Introduction
Module 26 : Fundamental Overview of YARN (Length 40 Mins) Available Hadoop 2.x. YARN Training
1. YARN Functional Component
2. YARN Architecture Overview
3. Claiming and Re-claiming Resources
4. Functional Properties of
Resource Manager
Node Manager
Application Master
5. YARN Scheduling Component
6. Introduction to FIFO Scheduler
7. Introduction to Capacity Scheduler
Module 27 : Powerful Hadoop 2.0 Framework (Length : 27 Mins) Available Hadoop 2.x. YARN Training
1. HDFS 1.0 Versus Hadoop 2.0
2. Resource Manager - Subcomponent
3. Details About Fair Share Scheduler
4. Hierarchical Queues in Scheduler
5. Containers
6. Node Manager and Its Responsibility
7. Role of Application Master while submitting Jobs
Module 28 : Submitting the Application to YARN Hadoop Cluster (Length : 27 Mins) Available Hadoop 2.x. YARN Training
1. Submitting the Application to YARN Hadoop Cluster
2. Managing Application Dependencies
3. Writing a YARN Application : Birdseye View
Module 29 : LocalResources of the Application Available Hadoop 2.x. YARN Training
1. Understanding of YARN Application/Jobs Dependencies
2. Types of LocalResource
3. Visibilities of Local Resources
4. Lifetime of Local Resources
5. Good and Bad Local Resources
6. Target Directories of Local Resources
Module 30 : Deep Dive in Capacity Scheduler (Length 39 Mins) Available Hadoop 2.x. YARN Training
1. Introduction and Enabling Capacity Scheduler
2. Setting Up Queues in the CS
3. Access Control List Setup
4. Managing Cluster Capacity with Queues
5. Resource Distribution Workflow Example
Module 31 : Managing Capacity Scheduler (Length 39 Mins) Available Hadoop 2.x. YARN Training
1. Managing Capacity with Queues
2. Resource Distribution Example
3. Understanding User Limits
4. Application Reservation
5. Understanding the Preemption
Module 32 : Hadoop Security : Kerberos Authentication (Length 23 Mins) Available Hadoop Security Training
1. Kerberos Authentication
2. Important Entities of Kerberos Authorization
3. How Kerberos Process works
Module 33 : Apache Spark : Introduction to Apache Spark (Length 48 Mins) Available 100 Times Faster Data Processing + Useful for CCA175
1. Introduction to Apache Spark
2. Features of Apache Spark
3. Apache Spark Stack
4. Introduction to RDDs
5. RDD Transformations
6. What is Good and Bad In MapReduce
7. Why to use Apache Spark
Module 34 : Cloudera QuickStart VM Step By Step Installation (Length 19 Mins) Available + Steps in PDF+ Hands On Lab
1. It Includes Hadoop 2.0
2. YARN
3. Hive
4. Pig
5. Hue
6. Apache Spark
7. Workflow
Module 35 : Load data in HDFS using the HDFS commands (Length 35 Mins) Available + Steps in PDF + Hands On Lab + Useful for CCA175
Module 36 : Importing Data from RDBMS to HDFS (Length 21 Mins) Available + Steps in PDF+Hands On Lab + Useful for CCA175
1. Without Specifying Directory
2. With target Directory
3. With warehouse directory
Module 37 : Sqoop Import Module (Length 41 Mins) Available + Steps in PDF +Hands On Lab + Useful for CCA175
1. Importing Subset of data from RDBMS
2. Changing the delimiter during Import
3. Encoding Null values
4. Importing Entire schema or all tables
Module 38 : Importing data to Hive Using Sqoop (Length 41 Mins) Available + Steps in PDF + Hands On Lab + Useful for CCA175
Module 39 : Apache Avro Introduction (Length 26 Mins) Available + PDF Download + Useful for CCA175
1. Why Avro files
2. Avro file Serialization and Deserialization
3. Adding fields
4. Deleting fields
Module 40 : Apache Avro Schema In Depth (Length 12 Mins) Available + PDF Download + Useful for CCA175
1. Avro schema example
2. Avro embedded schema
3. Avro schema primitive data types
4. Avro schema Complex data types
Record, Map, Array, Union, Enum, Fixed etc.
Module 41 : Apache Avro Schema Evolution (Length 16 Mins) Available + PDF Download + Useful for CCA175
1. Understand Avro Schema Evolution
2. Reader Schema and Writer Schema
3. JSON schema Adding new fields
4. JSON schema removing a field
All 41 modules above are available and ready to Watch/Learn (to buy, go to the top of the page).