Hadoop Course Content

  • What is Big Data?
  • What is Hadoop?
  • The relation between Big Data and Hadoop.
  • What is the need for going ahead with Hadoop?
  • Scenarios to adopt Hadoop Technology in Real-Time Projects
  • Challenges with Big Data
    1. Storage
    2. Processing
  • How Hadoop is addressing Big Data Challenges
  • Comparison with Other Technologies
    1. RDBMS
    2. Data Warehouse
    3. Teradata
  • Different Components of the Hadoop Ecosystem
    1. Storage Components
    2. Processing Components
HDFS (Hadoop Distributed File System)
  • What is a Cluster Environment?
  • Cluster Vs Hadoop Cluster.
  • The significance of HDFS in Hadoop
  • Features of HDFS
  • Storage aspects of HDFS
    1. Block
    2. How to Configure block size
    3. Default Vs Configurable Block size
    4. Why is the HDFS Block size so large?
    5. Design Principles of Block Size
  • HDFS Architecture – 5 Daemons of Hadoop
    1. Name Node and its functionality
    2. Data Node and its functionality
    3. Job Tracker and its functionality
    4. Task Tracker and its functionality
    5. Secondary Name Node and its functionality.
  • Replication in Hadoop – Fail Over Mechanism
    1. Data Storage in Data Nodes
    2. Fail Over Mechanism in Hadoop – Replication
    3. Replication Configuration
    4. Custom Replication
    5. Design Constraints with Replication Factor
  • Accessing HDFS
    1. CLI (Command Line Interface) and HDFS Commands
    2. Java Based Approach
Map Reduce
  • Why is Map Reduce essential in Hadoop?
  • Processing Daemons of Hadoop
  • Job Tracker
    1. Roles of Job Tracker
    2. Drawbacks w.r.t. Job Tracker failure in Hadoop Cluster
    3. How to Configure Job Tracker in the Hadoop cluster
  • Task Tracker
    1. Roles of Task Tracker
    2. Drawbacks w.r.t. Task Tracker failure in Hadoop Cluster
Input Split
  • Input Split
  • Need of Input Split in Map Reduce
  • Input Split Size
  • Input Split Size Vs Block Size
  • Input Split Vs Mappers
Map Reduce Life Cycle
  • Communication Mechanism of Job Tracker & Task Tracker
  • Input Format Class
  • Record Reader Class
  • Success Case Scenarios
  • Failure Case Scenarios
  • Retry Mechanism in Map Reduce
Map Reduce Programming Model
  • Different phases of Map Reduce Algorithm
  • Different Data Types in Map Reduce
    1. Primitive Data Types Vs Map Reduce Data Types
  • How to write a basic Map Reduce Program
    1. Driver Code
    2. Mapper Code
    3. Reducer Code
  • Driver Code
    1. Importance of Driver Code in a Map-Reduce Program
    2. How to Identify the Driver Code in Map Reduce Program
    3. Different sections of Driver code
  • Mapper Code
    1. Importance of Mapper Phase in Map Reduce
    2. How to Write a Mapper Class?
    3. Methods in Mapper Class
  • Reducer Code
    1. Importance of Reduce phase in Map Reduce
    2. How to Write Reducer Class?
    3. Methods in Reducer Class

IDENTITY MAPPER & IDENTITY REDUCER

  • Input Formats in Map Reduce
    1. Text Input Format
    2. Key Value Text Input Format
    3. NLine Input Format
    4. DB Input Format
    5. Sequence File Input Format.
    6. How to Use a specific Input format in Map Reduce
  • Output Formats in Map Reduce
    1. Text Output Format
    2. Key Value Text Input Format
    3. NLine Input Format
    4. DB Output Format
    5. Sequence File Output Format
    6. How to Use the specific Output format in Map Reduce
  • Combiner in Map Reduce
    1. Is a Combiner mandatory in Map Reduce?
    2. How to Use the Combiner class in Map Reduce
    3. Performance tradeoffs w.r.to Combiner
  • Partitioner in Map Reduce
    1. Importance of Partitioner class in Map Reduce
    2. How to use the Partitioner class in Map Reduce
    3. Hash Partitioner Functionality
    4. How to write a custom Partitioner
  • Compression Techniques in Map Reduce
    1. Importance of Compression in Map Reduce
    2. What is CODEC
    3. Compression Types
    4. Gzip Codec
    5. Bzip2 Codec
    6. LZO Codec
    7. Snappy Codec
    8. Configurations w.r.to Compression Techniques
    9. How to customize Compression for a single job Vs all jobs
  • Joins in Map Reduce
    1. Map Side Join
    2. Reduce Side Join
    3. Distributed cache
  • How to debug Map Reduce jobs in Local and Pseudo-Distributed Mode
  • Data Locality in Map Reduce
Apache PIG
  • Introduction to Apache Pig
  • SQL Vs Apache Pig
  • Different data types in Pig
  • Modes of Execution in Pig
    1. Local Mode
    2. Map Reduce OR Distributed Mode
  • Execution Mechanism
    1. Grunt Shell
    2. Script
    3. Embedded
  • Transformations in Pig
  • How to develop the Complex Pig Script
  • Bags, Tuples, and fields in PIG
  • UDFs in Pig
    1. Need for using UDFs in PIG
    2. How to use UDFs
    3. REGISTER keyword in PIG
  • When to use Map Reduce & Apache PIG in Real-Time Projects
APACHE HIVE
  • Hive Introduction
  • Need of Apache HIVE in Hadoop
  • Hive Architecture
    1. Driver
    2. Compiler (Semantic Analyzer)
    3. Executor
  • Meta Store in Hive
    1. Importance of Hive Meta Store
    2. Embedded meta store configuration
    3. External meta store configuration
    4. Communication mechanisms with Metastore
  • Hive Integration with Hadoop
  • Hive Query Language (Hive QL)
  • Configuring Hive With MySQL Metastore
  • SQL Vs Hive QL
  • Data Slicing Mechanisms
    1. Partitions in Hive
    2. Buckets In Hive
    3. Partitioning Vs Bucketing
    4. Real-Time Use Cases
  • Collection Data Types in HIVE
    1. Array
    2. Struct
    3. Map
  • User Defined Functions (UDFs) in HIVE
    1. UDFs
    2. UDAFs
    3. UDTFs
    4. Need of UDFs in HIVE
  • Hive Serializer/De-serializer – SerDe
  • HIVE – HBase Integration
APACHE SQOOP
  • Introduction to Sqoop.
  • MySQL client and Server Installation
  • How to connect to Relational Database using Sqoop
  • Different Sqoop Commands
    1. Different Flavors of Imports
    2. Export
    3. Hive-Imports
APACHE HBase
  • HBase Introduction
  • HDFS Vs HBase
  • HBase Use cases
  • HBase basics
    1. Column Families
    2. Scans
  • HBase Architecture
  • Clients
    1. REST
    2. Thrift
    3. Java Based
    4. Avro
  • Map Reduce Integration
  • Map Reduce over HBase
  • HBase Admin
    1. Schema Definition
    2. Basic CRUD Operations
APACHE Flume
  • Flume Introduction
  • Flume Architecture
  • Flume Master, Flume Collector, and Flume Agent
  • Flume Configurations
  • Real-Time Use Case using Apache Flume
APACHE Oozie
  • Oozie Introduction
  • Oozie Architecture
  • Oozie Configuration Files
  • Oozie Job Submission
    1. Workflow.xml
    2. Coordinator.xml
    3. Job.coordinator.properties
YARN (Yet Another Resource Negotiator) – Next Gen. Map Reduce
  • What is YARN?
  • YARN Architecture
    1. Resource Manager
    2. Application Master
    3. Node Manager
  • When should we go ahead with YARN?
  • Classic Map Reduce Vs YARN Map Reduce
  • Different Configuration Files for YARN
MongoDB (As part of NoSQL Databases)
  • The need for NoSQL Database
  • Relational Vs Non-Relational Databases
  • Introduction to MongoDB
  • Installation of MongoDB
  • MongoDB Basic Operations
APACHE SPARK
  • Spark Architecture
  • Spark Processing with Use cases
  • Spark with SCALA
  • Spark With SQL
Hadoop Administration
  • Hadoop Single Node Cluster Set Up (Hands on Installation on Laptops)
  • Operating System Installation
  • JDK Installation
  • SSH Configuration
  • Dedicated Group & User Creation
  • Hadoop Installation
  • Different Configuration Files Setting
  • Name Node Format
  • Starting the Hadoop Daemons
  • PIG Installation (Hands on Installation on Laptops)
    1. Local Mode
    2. Clustered Mode
    3. Bashrc file configuration
  • SQOOP Installation (Hands on Installation on Laptops)
    1. Sqoop installation with MySQL Client
  • HIVE Installation (Hands on Installation on Laptops)
    1. Local Mode
    2. Clustered Mode
  • HBase Installation (Hands on Installation on Laptops)
    1. Local Mode
    2. Clustered Mode
Offers from Training:
  • Provided 2 POCs to work with Hadoop and its Components
  • Provided Soft Copies of All the Materials with Use Cases
  • Provided Certification Assistance
  • Provided Project Exposure and Discussion

Course Features

  • Language
    English
  • Lectures
    1
  • Certification
    Yes
  • Project
    1 Minor + 2 Major
  • Duration
    50 hrs
  • Max-Students
    20

© Copyright - 2021 | Cyberaegis. All Rights Reserved.