Machine Learning Using Spark

Product Description

What is Machine Learning?

Machine learning is an application of artificial intelligence (AI) that gives systems the ability to learn and improve from experience automatically, without being explicitly programmed. It focuses on developing computer programs that can access data and use it to learn for themselves.
Learning begins with observations or data, such as examples, direct experience, or instruction. The program looks for patterns in that data so that it can make better decisions in the future based on the examples we provide. The primary aim is to allow computers to learn automatically, without human intervention or assistance, and to adjust their actions accordingly.

Intro to Spark MLlib

MLlib is Spark’s machine learning (ML) library. Its goal is to make practical machine learning scalable and easy. At a high level, it provides tools such as:

ML Algorithms: common learning algorithms such as classification, regression, clustering, and collaborative filtering

Featurization: feature extraction, transformation, dimensionality reduction, and selection

Pipelines: tools for constructing, evaluating, and tuning ML Pipelines

Persistence: saving and loading algorithms, models, and Pipelines

Utilities: linear algebra, statistics, data handling, etc.


This course will be delivered using the Scala and Python APIs. R will also be used to explain statistical concepts. Visualization will be covered using the Bokeh and ggplot libraries.

Introduction to Apache Spark

Spark Programming model

RDD and DataFrame

Transformation and Action

Broadcast and Accumulator

Running HDP on local machine

Launching Spark Cluster

Basic Statistics 

Flink Introduction


Distributed Execution

Descriptive Statistics
• Mean, Mode, Median, Range, Variance, Standard Deviation, Quartiles, Percentiles


Sampling Methods

Sampling Errors

Probability Distributions
• Normal distribution, t-distribution, Chi-square distribution, F-distribution

Margin of Error, Confidence Interval, Significance Level, Degrees of Freedom

Hypothesis concept, Type I and Type II error

P-value, t-Test, Chi-square Test

Correlation Coefficient
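
As a small illustration of the descriptive statistics named in this module, here is a sketch using only Python's standard `statistics` module (Python 3.8 or later for `quantiles`). The sample values are invented for the example, and the variance shown is the population variance, which is one of several conventions.

```python
import statistics

# Hypothetical sample, chosen so the results are easy to check by hand.
samples = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(samples)            # 40 / 8 = 5
median = statistics.median(samples)        # middle of 4 and 5 = 4.5
mode = statistics.mode(samples)            # 4 occurs most often
value_range = max(samples) - min(samples)  # 9 - 2 = 7

# Population variance and standard deviation (divide by n, not n - 1).
variance = statistics.pvariance(samples)   # 32 / 8 = 4
std_dev = statistics.pstdev(samples)       # square root of 4 = 2.0

# Three cut points that split the data into four equal-probability groups.
quartiles = statistics.quantiles(samples, n=4)

print(mean, median, mode, value_range, variance, std_dev, quartiles)
```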

Spark MLlib

Introduction to Spark MLlib

Data types: Vector, Labeled Point

Feature Extraction

Feature Transformation, Normalization

Feature Selectors

Locality Sensitive Hashing (LSH)

Regression Analysis with Spark

Types of Regression Models

Gradient Descent

Linear Regression, Generalized Linear Regression

MSE, RMSE, MAE, R-squared Coefficient

Transforming the target variable

Tuning Model Parameters
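
To make the evaluation metrics in this module concrete, the sketch below computes MSE, RMSE, MAE, and R-squared in plain Python. The helper name `regression_metrics` and the sample numbers are hypothetical; in Spark itself these metrics are typically obtained from `RegressionEvaluator` in `pyspark.ml.evaluation` rather than written by hand.

```python
import math


def regression_metrics(actual, predicted):
    """Return MSE, RMSE, MAE and R-squared for two equal-length sequences."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]

    mse = sum(e * e for e in errors) / n   # mean squared error
    rmse = math.sqrt(mse)                  # same units as the target
    mae = sum(abs(e) for e in errors) / n  # mean absolute error

    # R-squared: 1 minus residual sum of squares over total sum of squares.
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot

    return {"mse": mse, "rmse": rmse, "mae": mae, "r2": r2}


# Hypothetical targets and model outputs.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.0, 7.5, 9.0]

metrics = regression_metrics(actual, predicted)
print(metrics)  # mse 0.125, mae 0.25, r2 0.975
```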

Classification Model with Spark

Types of Classification Models
• Linear Models, Naive Bayes Model, Decision Tree

Logistic Regression

Linear Support Vector Machine

Random Forest

Gradient-Boosted Trees

Training Classification Models

Accuracy and prediction error

Precision and Recall

ROC curve and AUC

Cross validation


Hierarchical clustering

K-means clustering

Dimensionality Reduction

Principal Component Analysis

Singular Value Decomposition

Clustering as dimensionality reduction

Training a dimensionality reduction model

Evaluating dimensionality reduction models

Recommendation Engine

Content-based filtering

Collaborative filtering

Overview of MovieLens data

Training a recommendation model

Using the recommendation model

Performance Evaluation

Text Processing

Feature Hashing

TF-IDF model


Stop words

TF-IDF Weightings

Training a TF-IDF model

Usage of TF-IDF model

Evaluating TF-IDF models
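
The text-processing topics above (stop words, term frequency, TF-IDF weighting) can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the course's implementation: the stop-word list and documents are invented, and the smoothed formula idf = log((N + 1) / (df + 1)) is the one documented for Spark MLlib's IDF.

```python
import math
from collections import Counter

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"the", "a", "is", "on"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [word for word in text.lower().split() if word not in STOP_WORDS]


def tf_idf(documents):
    """Return one {term: weight} dict per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    # Smoothed inverse document frequency.
    idf = {term: math.log((n_docs + 1) / (df + 1)) for term, df in doc_freq.items()}

    # Weight = raw term count in the document times the term's IDF.
    return [
        {term: count * idf[term] for term, count in Counter(tokens).items()}
        for tokens in tokenized
    ]


docs = ["spark is fast", "spark runs on hadoop", "the cat sat"]
weights = tf_idf(docs)

# "fast" appears in one document and "spark" in two, so "fast" weighs more.
print(weights[0])
```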

Prerequisites :

Familiarity with Scala or Python is recommended. A basic understanding of Spark functionality will also make the background processing easier to follow.

Duration & Timings :

Duration – 30 Hours.

Course Fee : $400. Discount Offer

Training Type: Online Live Interactive Session.

Faculty: Experienced.

Weekend Session – Sat & Sun 9:30 AM to 12:30 PM (EST) – 5 Weeks. February 24, 2018.

Weekday Session – Mon – Thu 8:30 PM to 10:30 PM (EST) – 4 Weeks. March 26, 2018.


About Learntek

Learntek is a global online training provider for Big Data, Hadoop, Data Analytics, and other IT and Management courses. We are dedicated to designing, developing, and implementing training programs for students, corporate employees, and business professionals.

Our job is to make sure your training and learning experience is everything it should be – exciting, enjoyable, stimulating and successful.
Copyright @ 2017 Learntek. All Rights Reserved