
Scala and Spark Training



Scala and Spark Training – What is Scala?

Scala is a modern multi-paradigm programming language designed to express common programming patterns in a concise, elegant, and type-safe way. The name comes from “Scalable Language”. Scala is a hybrid language that smoothly integrates the features of object-oriented and functional programming, and it compiles to run on the Java Virtual Machine. Scala was created by Martin Odersky and released in 2003.
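
As a minimal sketch of what combining object-oriented and functional features can look like, the snippet below models a small class hierarchy and then processes it with higher-order functions and pattern matching. The names ShapeDemo, Shape, Circle, and Rect are hypothetical and chosen only for illustration.

```scala
// Hypothetical example: object-oriented modelling combined with functional processing.
object ShapeDemo {
  // Object-oriented side: a sealed trait with two case-class implementations.
  sealed trait Shape { def area: Double }
  final case class Circle(radius: Double) extends Shape {
    def area: Double = math.Pi * radius * radius
  }
  final case class Rect(width: Double, height: Double) extends Shape {
    def area: Double = width * height
  }

  // Functional side: a higher-order function (map) over an immutable list.
  def totalArea(shapes: List[Shape]): Double =
    shapes.map(_.area).sum

  // Pattern matching on the case classes.
  def describe(shape: Shape): String = shape match {
    case Circle(r)  => s"circle of radius $r"
    case Rect(w, h) => s"rectangle $w x $h"
  }

  def main(args: Array[String]): Unit = {
    val shapes = List(Rect(2.0, 3.0), Rect(1.0, 4.0))
    println(totalArea(shapes))            // sum of the two rectangle areas
    shapes.map(describe).foreach(println) // one description per shape
  }
}
```

Because Shape is sealed, the compiler can warn when a match in describe misses a case, which is one way the type-safety mentioned above shows up in everyday code.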

Why Scala?

The following reasons encourage learning Scala.

Many companies that depend on Java for business-critical applications are turning to Scala to boost development productivity, application scalability, and overall reliability.

Scala is a type-safe JVM language that combines object-oriented and functional programming features in a concise, logical, and powerful language.

Scala offers a “better Java” alternative by keeping its syntax close to Java's, which minimizes the learning difficulty.

Scala was created with the specific goal of being a better language, in contrast to the features of Java that are restrictive, overly tedious, or frustrating.

Scala is a cleaner, well-organized language that is ultimately easier to use and increases productivity.

What is Spark?

Spark is a cluster computing engine designed for fast computation. It builds on the ideas of Hadoop MapReduce and extends the MapReduce model so that it can be used efficiently for more types of computation, such as interactive queries and stream processing. Spark can use Hadoop in two ways: for storage and for processing. Because Spark has its own cluster management, it can use Hadoop for storage only.

Spark was started in 2009 by Matei Zaharia as a research project at UC Berkeley's AMPLab. It was open sourced in 2010 under a BSD license, donated to the Apache Software Foundation in 2013, and became a top-level Apache project in February 2014.

Why Spark?

Spark was introduced to speed up Hadoop-style computation.

The main feature of Spark is its in-memory cluster computing, which greatly increases the processing speed of an application.

Spark is designed to cover a wide range of workloads, such as batch applications, iterative algorithms, interactive queries, and streaming applications, which reduces the management burden of maintaining separate tools.

Apache Spark also has the following features.

  • Speed: Spark can run an application in a Hadoop cluster up to 100 times faster in memory, and 10 times faster on disk, than Hadoop MapReduce. It does this by reducing the number of read/write operations to disk and by keeping intermediate processing data in memory.
  • Supports multiple languages: Spark provides built-in APIs in Java, Scala, and Python for application development, and comes with 80 high-level operators for interactive querying.
  • Advanced analytics: Spark supports not only ‘map’ and ‘reduce’ operations but also SQL queries, streaming data, machine learning (ML), and graph algorithms.
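
The ‘map’ and ‘reduce’ style mentioned in the list above can be illustrated without a cluster. The sketch below is plain Scala running on ordinary in-memory collections, not Spark itself; it only mirrors the shape of a word count job. Spark's RDD API offers similarly named operators (flatMap, map, reduceByKey) that apply the same idea to partitioned data. The object name WordCountSketch and the sample lines are hypothetical.

```scala
// Plain Scala stand-in for a map/reduce word count (no Spark dependency is used here).
object WordCountSketch {
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.toLowerCase.split("\\s+")) // "map" phase: split each line into words
      .filter(_.nonEmpty)                    // drop empty tokens from blank lines
      .map(word => (word, 1))                // emit a (word, 1) pair per occurrence
      .groupBy(_._1)                         // group the pairs by word
      .map { case (word, pairs) =>           // "reduce" phase: sum the counts per word
        word -> pairs.map(_._2).sum
      }

  def main(args: Array[String]): Unit = {
    val lines = Seq("spark is fast", "scala is concise")
    // Print the counts in alphabetical order of the words.
    wordCount(lines).toSeq.sortBy(_._1).foreach(println)
  }
}
```

In a real Spark job the grouping and summing would typically be expressed as a single reduceByKey step over a distributed dataset; the collection version here is only meant to show the data flow.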


The following topics will be covered in our Scala and Spark Training:

Scala and Spark Training – Introduction to Scala

Scala and Spark Training – Overview of Scala

Installing Scala

Scala Basics

IDE for Scala

Scala Worksheet

Scala Programming

Variables & Methods


Reserved Words


Precedence Rules

Operator Associativity

Ways of Executing a Scala Program

Expressions and Loops

If Expression

For Expression

Usage of ‘yield’ keyword in For Expression

Exception handling with Try Expression

Match Expression

While Loops

Do-While Loops

Functions in Scala


Nested Methods

First class Function

Higher Order Methods

Function Literal

Partially Applied Function

Tail Recursion



Control Abstraction

Call-by-name Vs call-by-value

Repeated Parameter passing mechanism

Named Parameter mechanism

Default parameter mechanism

OOP in Scala

Classes & Objects

Defining a Constructor

Constructor Parameter Vs Class Parameter

Singleton Object

Companion Object

Abstract Class

Uniform Access Principle

Access Modifiers

Extending a Class

Namespace in Scala

Calling a superclass Constructor

Dynamic Binding in Scala

Final Member in Scala Class

Scala Class Hierarchy

Object Equality in Scala

Factory Design Pattern in Scala


Introduction to Traits

Inheritance in Traits

Mixing a Trait

Trait Vs Class

Ordered Trait

Example of Ordered Trait

Stackable Modification behaviour of Trait

Example of Stackable Modification

Rules of mixing of multiple traits

Scala Programming Packaging 


Different forms of Scala packages

Import statement

Different forms of import

Package Object

Implicit Imports

Case Class & Pattern Matching

Introduction to Case Class

Introduction to Pattern Matching

Example of Pattern Matching

Wildcard Pattern

Constant Pattern

Variable Pattern

Constructor Pattern

Sequence Pattern

Tuple Pattern

Type Pattern

Variable Binding

Pattern Guard

Sealed Class

Option Data Type

Usage of Option Data Type

Pattern Usage

Partial Function

Case Class and Partial Function

Usage of Pattern in For Expression

Scala Collection 

Immutable and Mutable collection

Constructing Array, Set, List, Tuple, and Map objects

Detailed Discussion of various methods in List class and List Object

List Construction

Basic Operations like head, tail, isEmpty on List

List Pattern

Example of using List Pattern

Categories of methods in List

First Order Methods in List

Higher Order Methods in List

Map vs flatMap

Filtering a List

Example of takeWhile, dropWhile, span, partition

Predicates over List

Folding Over List

foldLeft Vs foldRight

Scala and Spark Training – Introduction to Spark 

Introduction to Big Data

Big Data Problem

Scale-Up Vs Scale-Out Architecture

Characteristics of Scale-Out

Introduction to Hadoop, Map-Reduce and HDFS

Introducing Spark

Hortonworks Data Platform (HDP) using VirtualBox

Importing HDP VM image using VirtualBox on local machine

Configuring HDP

Overview of Ambari and its components

Overview of services configuration using Ambari

Overview of Apache Zeppelin

Creating, importing and executing notebooks in Apache Zeppelin

IDEs for Spark Applications

SBT and its overview



Resolving dependencies for Spark applications

Spark Basics 

Spark Shell

Overview of Spark architecture

Storage layers for Spark

Initializing a SparkContext and building applications

Submitting a Spark Application

Use of Spark History Server

Spark Components

Spark Driver Process

Spark Executor

SparkConf and SparkContext

SparkSession object

Overview of spark-submit command

Spark UI


Overview of RDD

RDD and Partitions

Ways of Creating RDD

RDD transformations and Actions

Lazy evaluation

RDD Lineage Graph (DAG)

Element wise transformations

Map Vs FlatMap Transformation

Set Transformation

RDD Actions

Overview of RDD persistence

Methods for persisting RDD

Persisting RDD with Storage option

Illustration of Caching on an RDD in DAG

Removal of Cached RDD

Pair RDDs

Overview of Key-Value Pair RDD

Ways of creating Pair RDDs

Transformations on Pair RDD

reduceByKey(), foldByKey(), mapValues(), flatMapValues(), keys() and values() transformations

Grouping, Joining, Sorting on Pair RDD

reduceByKey() Vs groupByKey()

Pair RDD Action

Launching Spark on cluster 

Configure and launch Spark Cluster on Google Cloud

Configure and launch Spark Cluster on Microsoft Azure

Logging and Debugging a Spark Application

Setting up a Windows environment for executing Spark Application using IDE

Steps of using slf4j logging mechanism in Spark Application

Attaching a debugger to Spark Application

Example of debugging a Spark application running inside a cluster

Spark Application Architecture 

Spark Application Distributed Architecture

Spark Application submission Mode

Overview of Cluster Manager

Example of using Standalone Cluster Manager

Driver and its responsibilities

Overview of Job, Stage and Tasks

Spark Job Hierarchy


Spark-submit command and various submission options

Yarn Cluster Manager

Yarn Architecture

Client and Cluster Deploy-mode

Advanced concepts in Spark



RDD partitioning

Re-partition RDD

Determining RDD partitioner

Spark SQL 

Introduction to SparkSQL

Creating SparkSession with Hive Support


Ways of Creating DataFrame

Registering a DataFrame as View

DataFrame Transformations API

DataFrame SQL statement

Aggregate Operations

DataFrame Action

Catalyst Optimizer

Catalog API

Limitations of DataFrame

Introduction to Dataset

Introduction to Encoder

Creating Dataset

Functional transformation on Dataset

Loading CSV, JSON, Parquet format file in SparkSQL

Loading and saving data from/in Hive, JDBC, HDFS, Cassandra

Introduction to User-Defined-Function (UDF)

Customizing a UDF

Usage of UDF in DataFrame Transformations API

Usage of UDF in Spark SQL statement

Introduction to Window Function

Steps of defining a window function

Illustration of Window function usage

Introduction to UDAF

Customizing a UDAF

Illustration of customized UDAF usage

Spark Streaming 

Introduction to data streaming

Spark Streaming framework

Spark Streaming and Micro batch

Introduction of DStreams

DStreams and RDD

Word Count example using Socket Text Stream

Streaming with Twitter feeds

Setting up a Twitter App

Resolving Twitter dependency in Spark Streaming Application

Steps of creating Uber Jar

Example of extracting hashtags from tweet data

Troubleshooting Twitter Streaming issue in Spark Application

Steps of creating Spark Streaming Application

Architecture of Spark Streaming

Stateless Transformations

Twitter Streaming examples using stateless transformation

Introduction to stateful Transformations

Window Transformations

Window Duration and Slide Duration

Window Operations

Naive and inverse window reduce operation


Tracking State of an event using updateStateByKey operation

Interacting directly with RDD using transform() operation

Example of HDFS file streaming

Example of Spark-Kafka interaction

Saving DStreams to external file system

Duration & Timings : USA

Duration – 30 Hours.

Training Type: Online Live Interactive Session.

Faculty: Experienced.

Weekday Session – Mon – Thu 8:30 PM to 10:30 PM (EST) – 4 Weeks. December 13, 2021.

Weekend Session – Sat & Sun 9:30 AM – 12:30 PM (EST) – 5 Weeks. January 8, 2022.


Your classes create a lot of interest. You clearly try to make us understand each topic. Each of your exercises helps us understand the topics. I sincerely appreciate your efforts and time for us.


Thank you very much for arranging a good trainer for Scala & Spark. I felt very good and satisfied with the training. The trainer has good knowledge of the subject. He explained the coding lectures with a good set of examples.



USA: +1 734 418 2465 | India: +91 40 4018 1306


© 2019 LEARNTEK. ALL RIGHTS RESERVED | Privacy Policy | Terms & Conditions
