Scala and Spark Training – What is Scala?
Scala is a modern multi-paradigm programming language designed to express common programming patterns in a concise, elegant, and type-safe way. The name comes from "Scalable Language". Scala is a hybrid language that smoothly integrates features of object-oriented and functional programming, and it compiles to run on the Java Virtual Machine. Scala was created by Martin Odersky; it was first released internally in 2003 and publicly in early 2004.
There are several reasons to learn Scala:
- Many companies that depend on Java for business-critical applications are turning to Scala to boost development productivity, application scalability and overall reliability.
- Scala is a type-safe JVM language that combines object-oriented and functional programming features in a concise, logical and powerful language.
- Scala offers a "better Java" alternative by keeping its syntax close to Java's, which minimizes the learning curve.
- Scala was created with the specific goal of being a better language, avoiding the features of Java that are restrictive, overly tedious, or frustrating.
- Scala is a cleaner, well-organized language that is easier to use and can increase productivity.
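As a rough illustration of how object-oriented and functional features combine in Scala, here is a minimal sketch. The names (`Shape`, `Circle`, `Rectangle`, `ShapesDemo`) are hypothetical and chosen only for this example; they are not part of the course material.

```scala
// Hypothetical example types: a small class hierarchy (object-oriented side).
sealed trait Shape
final case class Circle(radius: Double) extends Shape
final case class Rectangle(width: Double, height: Double) extends Shape

object ShapesDemo {
  // Pattern matching on case classes (functional side).
  def area(shape: Shape): Double = shape match {
    case Circle(r)       => math.Pi * r * r
    case Rectangle(w, h) => w * h
  }

  // `map` is a higher-order method: it takes the function `area` as an argument.
  def totalArea(shapes: List[Shape]): Double = shapes.map(area).sum

  def main(args: Array[String]): Unit = {
    val shapes = List(Rectangle(2.0, 3.0), Rectangle(1.0, 4.0))
    println(totalArea(shapes)) // prints 10.0
  }
}
```

Case classes, pattern matching and higher-order methods all appear in the topic list for this training.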
What is Spark?
Spark is a fast cluster computing technology designed for fast computation on Hadoop clusters. It builds on the Hadoop MapReduce model and extends it to efficiently support more types of computation, such as interactive queries and stream processing. Spark can use Hadoop in two ways: for storage and for processing. Because Spark has its own cluster management, it typically uses Hadoop for storage only.
Spark started in 2009 as a project at UC Berkeley's AMPLab, created by Matei Zaharia. It was open sourced in 2010 under a BSD license and donated to the Apache Software Foundation in 2013. Apache Spark has been a top-level Apache project since February 2014.
Spark was designed to speed up Hadoop-style computation.
The main feature of Spark is in-memory cluster computing, which increases application processing speed.
Spark is designed to cover a wide range of workloads, such as batch applications, iterative algorithms, interactive queries and streaming applications, which reduces the management burden of maintaining separate tools.
Apache Spark also has the following features:
- Speed: Spark can run an application in a Hadoop cluster up to 100 times faster in memory and up to 10 times faster on disk, by reducing the number of read/write operations to disk and by storing intermediate processing data in memory.
- Supports multiple languages: Spark comes with over 80 high-level operators for interactive querying and provides built-in APIs for application development in Java, Scala, or Python.
- Advanced analytics: Spark supports not only "map" and "reduce" operations but also SQL queries, streaming data, machine learning (ML), and graph algorithms.
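The "map" and "reduce" style mentioned above can be sketched on ordinary Scala collections, without a cluster. The example below is a local word count using only the Scala standard library; it is not Spark code, and the names `WordCountSketch` and `wordCount` are hypothetical. Spark's RDD API offers similarly named operators (for example `flatMap`, `map` and `reduceByKey`) that apply the same idea to distributed data.

```scala
object WordCountSketch {
  // Local word count on an in-memory sequence of lines (no Spark involved).
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.toLowerCase.split("\\s+")) // "map" phase: split lines into words
      .filter(_.nonEmpty)                   // drop empty tokens from blank lines
      .map(word => (word, 1))               // emit (word, 1) pairs
      .groupBy(_._1)                        // group the pairs by word
      .map { case (word, pairs) =>          // "reduce" phase: sum counts per word
        (word, pairs.map(_._2).sum)
      }

  def main(args: Array[String]): Unit = {
    val counts = wordCount(Seq("spark is fast", "scala is concise"))
    println(counts("is")) // prints 2
  }
}
```

On a cluster, the grouping and summing step is where data moves between machines, which is why the training covers partitioning of Pair RDDs.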
The following topics will be covered in our Scala and Spark Training:
Scala and Spark Training – Introduction to Scala
Overview of Scala
IDE for Scala
Variables & Methods
Exception handling with Try Expression
Functions in Scala
First-Class Functions
Higher-Order Methods
Partially Applied Functions
Traits & OOP in Scala
Classes & Objects
Scala Class Hierarchy
Packages and Imports
Case Class & Pattern Matching
Immutable and Mutable Collections
Scala and Spark Training – Introduction to Spark
Problems with Traditional Large-Scale Systems
What is Spark?
Configure HDP 2.4 (or 2.5) on local machine
Storage layers for Spark
Overview of Spark architecture
Initializing a SparkContext and building applications
IDEs for Spark Applications
SBT and its overview
Resolving dependencies for Spark applications
RDD Transformations and Actions
Element-wise transformations
Key-Value Pair RDD
Creating Pair RDDs
Transformations on Pair RDD
Grouping, Joining, Sorting on Pair RDD
Determining the partitioner of a Pair RDD
Operations that benefit from partitioning
Operations that affect partitioning
PageRank Example
Advanced concepts in Spark
Working on a per-partition basis
Launching Spark on a cluster
Configure and launch Spark Cluster on AWS
Configure and launch Spark Cluster on Microsoft Azure
Running Spark on Cluster
Spark Runtime Architecture
Components of Execution: Job, Stage and Task
Spark Web UI
Driver and Executor logs
Caching and Persistence
Duration & Timings:
Duration – 30 Hours.
Training Type: Online Live Interactive Session.
Weekend Session – Sat – Sun 9:30 AM – 12:30 PM EST – 5 Weeks. July 28, 2018.
Weekend Session – Sat – Sun 9:30 AM – 12:30 PM EST – 5 Weeks. September 8, 2018.