Spark and Scala Course Content

Introduction to Big Data and Hadoop
  • What is Big Data?
  • What is Hadoop?
  • Relation between Big Data and Hadoop.
  • Why is Hadoop needed?
  • Scenarios for adopting Hadoop technology in real-time projects
  • Challenges with Big Data
    1. Storage
    2. Processing
  • How Hadoop addresses Big Data challenges
  • Comparison with Other Technologies
    1. RDBMS
    2. Data Warehouse
    3. Teradata
  • Different Components with other Technologies
    1. Storage Components
    2. Processing Components
  • Importance of Hadoop Ecosystem Components
  • Other solutions of Big Data
    1. Introduction to NoSQL
  • Batch Vs Real-Time Vs Near-Real-Time (NRT) Processing
  • Examples of Batch Processing Systems
  • Examples of Real-Time Analytics Systems
  • Examples of Near-Real-Time Systems

SCALA (Scalable Language)

Introduction to Scala
  • Why Scala
  • Why is Scala a multi-paradigm language?
  • Scala Vs Java
  • Scala Vs Python
  • Interoperability between Scala and Java
  • Scala Data types
  • Scala packages
  • Scala REPL (Read Evaluate Print Loop)
Scala Basics
  • Variable Declarations
  • Variable Type Inference
  • Interactive Scala – Scala Shell
  • Writing Scala Scripts – Compiling the Scala Programs
  • Defining Functions in Scala
  • Type casting in Scala
  • Different IDEs for Scala
Scala Control Structures
  • If expressions
  • If – Else expressions
  • While Loops
  • Do-While Loops
  • For Loops
  • Different types of for loop
    1. For loop with Range
    2. For loop with Collection
    3. For loop with Filter
    4. For loop with Yield
  • Pattern Matching in Scala
  • Exception Handling in Scala
  • How to pass run-time arguments in Scala
Functional Programming in Scala
  • What is Functional Programming?
  • Difference between the Object-Oriented and Functional Programming Paradigms
  • Closures in Scala
  • Anonymous Functions in Scala
  • Currying Functions
  • Higher Order Functions
  • Collections in Scala
    1. Lists
    2. Sets
    3. Maps
  • Mutable & Immutable Collections
Object Oriented Programming in Scala (Traits & OOPS)
  • Traits Introduction
    1. When to use traits in Scala
    2. Creating traits – basic OOP
    3. Classes and Objects Basics
    4. Pattern Matching in Scala
    5. Exception Handling in Scala
    6. Constructors in Scala
  • Data Type Conversion
Scala Environment Set Up
  • Scala set up on Windows
    1. Java Set Up
    2. Scala Set Up
  • Scala set up on Linux
    1. Java Set Up
    2. Scala Set Up

SPARK

Introduction to Spark
  • Motivation for Spark
  • Spark Vs MapReduce Processing
  • Advantages of In-Memory Processing over Disk-Based Processing
  • Where to use Spark?
  • ROI Comparison of Hadoop Processing Vs Spark Processing
  • Why is Spark Processing faster than MapReduce?
Architecture of Spark
  • Comparison between Hadoop & Spark Architectures
  • Spark Master
  • Spark Driver
  • Spark Worker Node
  • Spark Cluster Managers
    1. Standalone
    2. YARN
    3. Apache Mesos
  • How to Start Spark Daemons
Spark Basics
  • Spark Shell Introduction – Standalone Mode
  • Creating Spark Context
  • Creating Spark Conf, Spark Shell
  • File Operations in Spark Shell
  • Caching in Spark
  • Real time Examples of Spark
Simple Build Tool (SBT)
  • IntelliJ IDEA IDE Introduction
  • Adding the Scala Plugin to IntelliJ
  • Installing SBT
  • Spark Project creation and building with SBT
  • Running a Spark Project With SBT
  • Verifying Spark Jobs in Spark Web UI
  • Spark-submit – How to deploy Spark Applications with spark-submit command
  • Running Spark project in Clustered Mode
Resilient Distributed Dataset (RDD)
  • What is an RDD and why is it important in Spark?
  • RDD Key Features
    1. Immutable
    2. Lazily Evaluated
    3. Partitioned
    4. Cacheable
  • How to create an RDD
  • Different types of RDDs
  • RDD Operations
    1. Transformations
    2. Actions
  • Different Transformations in RDD
  • Different Actions in RDD
  • Loading Data through RDD
  • Saving Data
  • Key-Value pair RDD
  • Loading and Saving Data – through different File Formats
    1. Text, csv, tsv, Object files
    2. As a Hadoop file
  • Key – Value Pair RDD Operations
  • Spark Storage Persistence Levels
  • Running Spark in a Clustered Mode
  • Deploying Application with spark-submit
  • Cluster Management
  • Accumulators
    1. Introduction to Accumulators
    2. Practical applicability of accumulators
    3. Real time examples on Accumulators
  • Broadcast Variables
    1. Introduction to Broadcast variables
    2. Practical applicability of Broadcast variables
    3. Real time examples on Broadcast variables
Spark Processing – with different Programming Languages
  • Scala
    1. Installing Scala
    2. How to use “spark-shell”
    3. Examples on Spark with Scala
  • Python
    1. Installing Python
    2. How to use “PySpark”
    3. Examples on Spark with Python
  • R
    1. Installing R
    2. How to use “SparkR”
    3. Examples on Spark with R Language
  • Spark SQL
    1. Introduction to Spark SQL
    2. The SQL Context
    3. Hive Vs Spark SQL
    4. Spark SQL support for Text, Parquet and JSON files
    5. Data Frames
    6. Data Sets
    7. Data Frames Vs Data Sets – Performance Optimization
    8. Real-time examples of Spark SQL
  • Different File Formats Support in Spark SQL
    1. Text – JSON – CSV - ORC – TSV - Parquet
  • Different Integration with Spark SQL
    1. Spark SQL integration with Hive
    2. Spark SQL integration with RDBMS
    3. Spark SQL integration with NoSQL (Cassandra)
  • Spark Streaming
    1. Introduction to Spark Streaming
    2. Architecture of Spark Streaming
    3. RDD Vs Discretized Streams (DStreams)
    4. DStream Operations
    5. Introduction to Spark Streaming Context (SSC)
    6. Transformations on DStreams
      1. Window Operations
      2. Transform Operations
  • Spark Streaming Vs Flume
  • Introduction to Kafka
  • Spark Streaming Integration with Kafka Overview
  • Real Time examples Of Spark Streaming & Kafka
Spark MLlib
  • Introduction to Machine Learning
  • Vector Class in MLlib
  • Spark MLlib Algorithms Introduction
  • Classification and Regression Algorithms
  • Naïve Bayes Classification Algorithm
  • Decision Trees Algorithm Overview
Apache Kafka
  • Introduction to Apache Kafka
  • Architecture of Kafka
  • Real time examples on Kafka usage in enterprise level applications
  • Installation of Apache Kafka
  • Fail Over Mechanism in Kafka
  • Practical Use Cases on Kafka

Course Features

  • Language
    English
  • Lectures
    1
  • Certification
    Yes
  • Project
    02
  • Duration
    40 hrs
  • Max-Students
    20

© Copyright - 2021 | Cyberaegis . All Rights Reserved.