Machine Learning Using Spark

$500.00 $300.00


What is Machine Learning?

Machine learning is an application of artificial intelligence (AI) that gives systems the ability to learn and improve from experience without being explicitly programmed. It focuses on the development of computer programs that can access data and use it to learn for themselves.

The process of learning begins with observations or data, such as examples, direct experience, or instruction, in order to look for patterns in data and make better decisions in the future based on the examples that we provide. The primary aim is to allow the computers to learn automatically without human intervention or assistance and adjust actions accordingly.

Intro to Machine Learning Using Spark

MLlib is Spark’s machine learning (ML) library. Its goal is to make practical machine learning scalable and easy. At a high level, it provides tools such as:

ML Algorithms: common learning algorithms such as classification, regression, clustering, and collaborative filtering

Featurization: feature extraction, transformation, dimensionality reduction, and selection

Pipelines: tools for constructing, evaluating, and tuning ML Pipelines

Persistence: saving and loading algorithms, models, and Pipelines

Utilities: linear algebra, statistics, data handling, etc.


This course will be delivered using the Scala and Python APIs. The R language will also be used to explain statistical concepts. Visualization will be covered using the Bokeh and ggplot libraries.

Introduction to Apache Spark

Spark Programming model

RDD and DataFrame

Transformation and Action

Broadcast and Accumulator

Running HDP on local machine

Launching Spark Cluster

Basic Statistics 

Descriptive Statistics

• Mean, Mode, Median, Range, Variance, Standard Deviation, Quartiles, Percentiles


Sampling Methods

Sampling Errors

Probability Distributions
• Normal distribution, t-distribution, Chi-square distribution, F-distribution

Margin of Error, Confidence Interval, Significance Level, Degrees of Freedom

Hypothesis concept, Type I and Type II error

P-value, t-Test, Chi-square Test

Correlation Coefficient
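As a rough preview of the descriptive statistics listed in this module, the sketch below computes them with Python's standard `statistics` module. It is plain Python rather than Spark code, and the sample values are hypothetical, chosen only for illustration.

```python
import statistics

# Hypothetical sample data, used only for illustration.
data = [2, 4, 4, 5, 7, 9, 11]

mean = statistics.mean(data)                 # arithmetic average
median = statistics.median(data)             # middle value of the sorted data
mode = statistics.mode(data)                 # most frequent value
value_range = max(data) - min(data)          # distance between the extremes
variance = statistics.variance(data)         # sample variance (n - 1 denominator)
std_dev = statistics.stdev(data)             # sample standard deviation
quartiles = statistics.quantiles(data, n=4)  # three cut points: Q1, Q2, Q3

print(mean, median, mode, value_range, variance, std_dev, quartiles)
```

Note that `variance` and `stdev` are the sample versions; `pvariance` and `pstdev` give the population versions.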

Machine Learning Using Spark

Introduction to Spark MLlib

Data types: Vector, Labeled Point

Feature Extraction

Feature Transformation, Normalization

Feature Selectors

Locality Sensitive Hashing (LSH)

Regression Analysis with Spark

Types of Regression Models

Gradient Descent

Linear Regression, Generalized Linear Regression

MSE, RMSE, MAE, R-squared Coefficient

Transforming the target variable

Tuning Model Parameters
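The error metrics named in this module (MSE, RMSE, MAE, and R-squared) can be sketched in a few lines of plain Python. This is a minimal illustration of the standard formulas, not the Spark MLlib evaluator API; the function name `regression_metrics` and the sample values are hypothetical.

```python
import math

def regression_metrics(actual, predicted):
    """Return MSE, RMSE, MAE and R-squared for two equal-length lists."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]

    ss_res = sum(e * e for e in errors)      # residual sum of squares
    mse = ss_res / n                         # mean squared error
    rmse = math.sqrt(mse)                    # root mean squared error
    mae = sum(abs(e) for e in errors) / n    # mean absolute error

    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    r2 = 1 - ss_res / ss_tot                 # coefficient of determination

    return {"mse": mse, "rmse": rmse, "mae": mae, "r2": r2}

# Hypothetical targets and model predictions.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 9.0]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

RMSE and MAE are in the same units as the target, which is one reason they are often easier to interpret than MSE.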

Classification Model with Spark

Types of Classification Models
• Linear Models, Naive Bayes Model, Decision Tree

Logistic Regression

Linear Support Vector Machine

Random Forest

Gradient-Boosted Trees

Training Classification Models

Accuracy and prediction error

Precision and Recall

ROC curve and AUC

Cross validation
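To make the accuracy, precision, and recall topics above concrete, here is a small plain-Python sketch for a binary classifier. It is not the Spark MLlib evaluation API; the function name `classification_metrics` and the label lists are hypothetical.

```python
def classification_metrics(actual, predicted, positive=1):
    """Compute accuracy, precision and recall for binary labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)  # true positives
    fp = sum(1 for a, p in pairs if a != positive and p == positive)  # false positives
    fn = sum(1 for a, p in pairs if a == positive and p != positive)  # false negatives

    accuracy = sum(1 for a, p in pairs if a == p) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}

# Hypothetical true labels and predictions.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]
scores = classification_metrics(actual, predicted)
print(scores)
```

Precision answers "of the items predicted positive, how many were right?", while recall answers "of the truly positive items, how many were found?".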


Hierarchical clustering

K-means clustering

Dimensionality Reduction

Principal Component Analysis

Singular Value Decomposition

Clustering as dimensionality reduction

Training a dimensionality reduction model

Evaluating dimensionality reduction models

Recommendation Engine

Content based filtering

Collaborative filtering

Overview of MovieLens data

Training a recommendation model

Using the recommendation model

Performance Evaluation

Text Processing

Feature Hashing

TF-IDF model


Stop words

TF-IDF Weightings

Training a TF-IDF model

Usage of TF-IDF model

Evaluating TF-IDF models
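The stop-word and TF-IDF weighting topics in this module can be previewed with a small plain-Python sketch. It uses raw term counts for TF and the smoothed IDF form log((N + 1) / (DF + 1)), which is the form given in the Spark MLlib documentation; everything else here (the tiny stop-word set, the function names, the sample sentences) is a hypothetical illustration rather than MLlib code.

```python
import math
from collections import Counter

# Tiny hypothetical stop-word list; real lists are much longer.
STOP_WORDS = {"the", "a", "is", "on"}

def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

def tf_idf(documents):
    """Return one {term: weight} dict per document."""
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    # Smoothed inverse document frequency.
    idf = {t: math.log((n_docs + 1) / (df + 1)) for t, df in doc_freq.items()}

    # Term frequency (raw count) times IDF.
    return [
        {t: count * idf[t] for t, count in Counter(tokens).items()}
        for tokens in tokenized
    ]

docs = ["the cat sat on the mat", "the dog sat on the log", "a cat is a cat"]
weights = tf_idf(docs)
print(weights[0])
```

Terms that appear in fewer documents (such as "mat") receive a higher IDF than terms shared across documents (such as "cat" or "sat").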

Prerequisites:

A prior understanding of exploratory data analysis and data visualization, including basic statistical techniques for data analysis, will help immensely in learning machine learning concepts and applications. Some knowledge of R programming or of Python packages such as scikit-learn and NumPy will also be useful. However, the course covers basic statistical techniques before going deep into machine learning, so that everyone can get the most out of it.

Duration & Timings:

Duration – 30 Hours.

Training Type: Online Live Interactive Session.

Faculty: Experienced.

Weekend Session – Sat & Sun 9:30 AM to 12:30 PM (EST) – 5 Weeks. June 16, 2018.

Weekend Session – Sat & Sun 9:30 AM to 12:30 PM (EST) – 5 Weeks. August 18, 2018.




