What is Machine Learning?
Machine learning is an application of artificial intelligence (AI) that gives systems the ability to learn and improve from experience automatically, without being explicitly programmed. It focuses on developing computer programs that can access data and use it to learn for themselves.
The process of learning begins with observations or data, such as examples, direct experience, or instruction. The program looks for patterns in that data so it can make better decisions in the future, based on the examples we provide. The primary aim is to allow computers to learn automatically, without human intervention or assistance, and adjust their actions accordingly.
Intro to Machine Learning Using Spark
MLlib is Spark’s machine learning (ML) library. Its goal is to make practical machine learning scalable and easy. At a high level, it provides tools such as:
ML Algorithms: common learning algorithms such as classification, regression, clustering, and collaborative filtering
Featurization: feature extraction, transformation, dimensionality reduction, and selection
Pipelines: tools for constructing, evaluating, and tuning ML Pipelines
Persistence: saving and loading algorithms, models, and Pipelines
Utilities: linear algebra, statistics, data handling, etc.
This course will be delivered using the Scala and Python APIs. The R language will also be used to explain statistical concepts. Visualization will be covered using the Bokeh/ggplot libraries.
Introduction to Apache Spark
Spark Programming model
RDD and DataFrame
Transformation and Action
Broadcast and Accumulator
Running HDP on local machine
Launching Spark Cluster
Mean, Mode, Median, Range, Variance, Standard Deviation, Quartiles, Percentiles
Normal distribution, t-distribution, Chi-square distribution, F-distribution
Margin of Error, Confidence Interval, Significance Level, Degrees of Freedom
Hypothesis concept, Type I and Type II error
P-value, t-Test, Chi-square Test
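As a small illustration of the descriptive statistics listed above, here is a minimal sketch in plain Python using only the standard library `statistics` module. The sample values are hypothetical, and the population forms of variance and standard deviation are an assumption made for the example; this is not Spark code.

```python
import statistics

# Hypothetical sample, chosen only for illustration.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(sample)
median = statistics.median(sample)
mode = statistics.mode(sample)
value_range = max(sample) - min(sample)

# Population variance and standard deviation (divide by n, not n - 1).
variance = statistics.pvariance(sample)
std_dev = statistics.pstdev(sample)

# Three cut points that split the sorted data into four groups.
quartiles = statistics.quantiles(sample, n=4)

print("mean:", mean, "median:", median, "mode:", mode)
print("range:", value_range, "variance:", variance, "std dev:", std_dev)
print("quartiles:", quartiles)
```

Note that `statistics.variance` and `statistics.stdev` give the sample (n - 1) versions instead, which is usually what a confidence interval or t-test calls for.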
Regression Analysis with Spark
Types of Regression Models
Linear Regression, Generalized Linear Regression
MSE, RMSE, MAE, R-squared Coefficient
Transforming the target variable
Tuning Model Parameters
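To make the regression metrics named in this module concrete, the following sketch computes MSE, RMSE, MAE, and R-squared in plain Python. The function names and the sample values are hypothetical and exist only for this illustration; it shows the standard definitions, not the MLlib evaluator API.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error: average of squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: square root of MSE, in the target's units."""
    return math.sqrt(mse(y_true, y_pred))

def mae(y_true, y_pred):
    """Mean absolute error: average of absolute residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """R-squared: 1 minus residual sum of squares over total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical targets and predictions.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 10.0]

print("MSE:", mse(y_true, y_pred))
print("RMSE:", rmse(y_true, y_pred))
print("MAE:", mae(y_true, y_pred))
print("R-squared:", r_squared(y_true, y_pred))
```

RMSE penalizes the single large error (1.0) more heavily than MAE does, which is why the two values differ on the same predictions.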
Classification Model with Spark
Types of Classification Models
Linear Models, Naive Bayes Model, Decision Tree
Linear Support Vector Machine
Training Classification Models
Accuracy and prediction error
Precision and Recall
ROC curve and AUC
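The classification metrics in this module can be sketched from a confusion matrix. The example below is plain Python with hypothetical labels and a hypothetical helper name, `confusion_counts`; it assumes a binary problem with 1 as the positive class and is not the MLlib evaluator API.

```python
def confusion_counts(labels, predictions):
    """Return (tp, fp, tn, fn) for binary labels where 1 is the positive class."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(labels, predictions):
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        elif predicted == 0 and actual == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

# Hypothetical ground truth and model output.
labels      = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = [1, 0, 0, 1, 0, 1, 1, 0]

tp, fp, tn, fn = confusion_counts(labels, predictions)

accuracy = (tp + tn) / len(labels)   # share of all predictions that are correct
precision = tp / (tp + fp)           # share of predicted positives that are real
recall = tp / (tp + fn)              # share of real positives that were found

print("accuracy:", accuracy, "precision:", precision, "recall:", recall)
```

An ROC curve repeats this calculation at many decision thresholds and plots the true positive rate against the false positive rate; AUC is the area under that curve.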
Principal Component Analysis
Singular Value Decomposition
Clustering as dimensionality reduction
Training a dimensionality reduction model
Evaluating dimensionality reduction models
Content-based filtering
Collaborative filtering
Overview of MovieLens data
Training a recommendation model
Using the recommendation model
Training a TF-IDF model
Usage of TF-IDF model
Evaluating TF-IDF models
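As a rough picture of what a TF-IDF model computes, here is a minimal sketch in plain Python. The documents are hypothetical, the term frequency is a raw count, and the smoothed inverse document frequency log((N + 1) / (df + 1)) is an assumption chosen for this illustration; other weighting variants exist, and this is not MLlib code.

```python
import math
from collections import Counter

# Hypothetical, already tokenized documents.
documents = [
    ["spark", "is", "fast"],
    ["spark", "mllib"],
    ["python", "is", "popular"],
]

num_docs = len(documents)

# Document frequency: in how many documents each term appears.
doc_freq = Counter()
for tokens in documents:
    doc_freq.update(set(tokens))

# Smoothed inverse document frequency (assumed form, see the note above).
idf = {term: math.log((num_docs + 1) / (df + 1)) for term, df in doc_freq.items()}

# TF-IDF per document: raw term count multiplied by the term's IDF.
tfidf = []
for tokens in documents:
    term_freq = Counter(tokens)
    tfidf.append({term: count * idf[term] for term, count in term_freq.items()})

for weights in tfidf:
    print(weights)
```

Terms that occur in many documents, such as "spark" here, receive a lower weight than terms that occur in only one, such as "fast".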
Prior understanding of exploratory data analysis and data visualization will help immensely in learning machine learning concepts and applications. This includes basic statistical techniques for data analysis. Some knowledge of R programming or of Python packages such as scikit-learn and NumPy will be useful. However, we will cover basic statistical techniques as part of this course before going deep into machine learning, so that everyone can get the most out of the course.
Duration & Timings:
Duration – 30 Hours.
Training Type: Online Live Interactive Session.
Weekday Session – Mon to Thu, 8:30 PM to 10:30 PM (EST) – 4 Weeks. January 21, 2019.
Weekend Session – Sat & Sun, 9:30 AM to 12:30 PM (EST) – 5 Weeks. January 16, 2019.