Description
Big Data Hadoop Training – What is Hadoop?
Hadoop is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. It is an Apache project sponsored by the Apache Software Foundation. Hadoop makes it possible to run applications on systems with thousands of nodes and thousands of terabytes of storage capacity. Its distributed file system enables rapid data transfer among nodes and allows the system to keep operating without interruption when a node fails. This approach lowers the risk of catastrophic system failure, even if a significant number of nodes become inoperative.
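As a rough illustration of what processing a data set in independent pieces looks like, here is a minimal word-count sketch in plain Python that follows the map, shuffle and reduce steps covered later in the course. It runs on one machine and does not use Hadoop itself; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample chunks are made up for this example.

```python
from collections import defaultdict


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the shuffle step would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks):
    """Run map over every chunk, then shuffle and reduce the results."""
    pairs = []
    for chunk in chunks:  # on a cluster, each chunk could be mapped on a different node
        pairs.extend(map_phase(chunk))
    return reduce_phase(shuffle(pairs))


# Hypothetical input: three chunks standing in for pieces of a large file.
chunks = ["big data hadoop", "hadoop stores big data", "hadoop scales"]
counts = word_count(chunks)
print(counts)
```

Because each chunk is mapped independently, the map step is the part that a cluster can spread across many nodes.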
Why Hadoop?
- Large Volumes of Data: Ability to quickly store and process huge amounts of data of any variety (structured, unstructured and semi-structured). With data volumes and varieties constantly increasing, especially from social media and the Internet of Things (IoT), that’s a key consideration.
- Computing Power: Hadoop’s distributed computing model processes big data fast. The more computing nodes you use, the more processing power you have.
- Fault Tolerance: Data and application processing are protected against hardware failure. If a node goes down, jobs are automatically redirected to other nodes to make sure the distributed computing does not fail. Multiple copies of all data are stored automatically.
- Flexibility: Unlike a traditional relational database, you don’t have to process data before storing it. You can store as much data as you want and decide how to use it later. That includes unstructured data such as text, images and videos.
- Low Cost: The open-source framework is free and uses commodity hardware to store large quantities of data.
- Scalability: You can easily grow your system to handle more data simply by adding nodes. Little administration is required.
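The fault-tolerance point above rests on keeping multiple copies of each piece of data on different nodes. The sketch below is a simplified, single-machine model of that idea in plain Python, not HDFS code: it assumes a round-robin placement (real HDFS placement is rack-aware), and the names `place_blocks`, `readable_blocks`, the node names and the block IDs are all hypothetical.

```python
def place_blocks(block_ids, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    Simplifying assumption: real HDFS uses a rack-aware placement policy.
    """
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    placement = {}
    for i, block in enumerate(block_ids):
        placement[block] = [nodes[(i + r) % len(nodes)] for r in range(replication)]
    return placement


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one replica on a live node."""
    failed = set(failed_nodes)
    return [
        block
        for block, holders in placement.items()
        if any(node not in failed for node in holders)
    ]


# Hypothetical four-node cluster storing four blocks, three copies each.
nodes = ["node1", "node2", "node3", "node4"]
blocks = ["blk_0", "blk_1", "blk_2", "blk_3"]
placement = place_blocks(blocks, nodes, replication=3)

# With three replicas, any two node failures leave every block readable.
survivors = readable_blocks(placement, failed_nodes=["node1", "node2"])
print(survivors)
```

In this model a block becomes unreadable only when every node holding one of its replicas has failed, which is why a higher replication factor tolerates more simultaneous failures at the cost of more storage.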
The following topics will be covered in our Big Data and Hadoop Online Training:
Big Data Hadoop Training Topics.
Hadoop Introduction
Introduction to Data and System
Types of Data
Traditional way of dealing with large data and its problems
Types of Systems & Scaling
What is Big Data
Challenges in Big Data
Challenges in Traditional Application
New Requirements
What is Hadoop? Why Hadoop?
Brief history of Hadoop
Features of Hadoop
Hadoop and RDBMS
Hadoop Ecosystem overview
Hadoop Installation
Installation in detail
Creating Ubuntu image in VMware
Downloading Hadoop
Installing SSH
Configuring Hadoop, HDFS & MapReduce
Download, Installation & Configuration of Hive
Download, Installation & Configuration of Pig
Download, Installation & Configuration of Sqoop
Configuring Hadoop in Different Modes
Hadoop Distributed File System (HDFS)
File System – Concepts
Blocks
Replication Factor
Version File
Safe mode
Namespace IDs
Purpose of Name Node
Purpose of Data Node
Purpose of Secondary Name Node
Purpose of Job Tracker
Purpose of Task Tracker
HDFS Shell Commands – copy, delete, create directories etc.
Reading and Writing in HDFS
Difference between Unix commands and HDFS commands
Hadoop Admin Commands
Hands on exercise with Unix and HDFS commands
Read / Write in HDFS – Internal Process between Client, NameNode & DataNodes.
Accessing HDFS using Java API
Various Ways of Accessing HDFS
Understanding HDFS Java classes and methods
Admin:
- Commissioning / Decommissioning DataNode
- Balancer
- Replication Policy
- Network Distance / Topology Script
Map Reduce Programming
About MapReduce
Understanding block and input splits
MapReduce Data types
Understanding Writable
Data Flow in MapReduce Application
Understanding MapReduce problem on datasets
MapReduce and Functional Programming
Writing MapReduce Application
Understanding Mapper function
Understanding Reducer Function
Understanding Driver
Usage of Combiner
Understanding Partitioner
Usage of Distributed Cache
Passing the parameters to mapper and reducer
Analysing the Results
Log files
Input Formats and Output Formats
Counters, Skipping Bad and unwanted Records
Writing Joins in MapReduce with 2 input files. Join types.
Execute MapReduce Job – Insights.
Exercises on MapReduce.
Job Scheduling: Types of Schedulers.
Hive
Hive concepts
Schema on Read VS Schema on Write
Hive architecture
Install and configure Hive on a cluster
Metastore – Purpose & Types of Configurations
Different types of tables in Hive
Buckets
Partitions
Joins in Hive
Hive Query Language
Hive Data Types
Data Loading into Hive Tables
Hive Query Execution
Hive library functions
Hive UDF
Hive Limitations
Pig
Pig basics
Install and configure Pig on a cluster
Pig library functions
Pig vs Hive
Write sample Pig Latin scripts
Modes of running Pig
Running in Grunt shell
Running as Java program
Pig UDFs
HBase
HBase concepts
HBase architecture
Region server architecture
File storage architecture
HBase basics
Column access
Scans
HBase use cases
Install and configure HBase on a multi node cluster
Create database, Develop and run sample applications
Access data stored in HBase using Java API
Sqoop
Install and configure Sqoop on a cluster
Connecting to RDBMS
Installing MySQL
Import data from MySQL to Hive
Export data to MySQL
Internal mechanism of import/export
Oozie
Introduction to Oozie
Oozie architecture
XML file specifications
Specifying workflow
Control nodes
Oozie job coordinator
Flume
Introduction to Flume
Configuration and Setup
Flume Sink with example
Channel
Flume Source with example
Complex Flume architecture
ZooKeeper
Introduction to ZooKeeper
Challenges in distributed Applications
Coordination
ZooKeeper: Design Goals
Data Model and Hierarchical namespace
Client APIs
YARN
Hadoop 1.0 Limitations
MapReduce Limitations
History of Hadoop 2.0
HDFS 2: Architecture
HDFS 2: Quorum based storage
HDFS 2: High availability
HDFS 2: Federation
YARN Architecture
Classic vs YARN
YARN Apps
YARN multitenancy
YARN Capacity Scheduler
Prerequisites:
Knowledge of any programming language, databases and the Linux operating system. Core Java or Python knowledge is helpful.
Duration & Timings:
Duration – 30 Hours.
Training Type: Instructor Led Live Interactive Sessions.
Faculty: Certified & Experienced.
Weekday Session – Mon – Thu 8:30 PM to 10:30 PM (EST) – 4 Weeks. February 8, 2021.
USA: +1 734 418 2465 | India: +91 40 4018 1306