USA: +1 734 418 2465 | India: +91 40 4018 1306 | info@learntek.org


LEARNTEK

BIG DATA AND HADOOP TRAINING


Product Description

What is Hadoop?

Hadoop is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. It is part of the Apache project sponsored by the Apache Software Foundation. Hadoop makes it possible to run applications on systems with thousands of nodes and thousands of terabytes of storage capacity. Its distributed file system provides rapid data transfer rates among nodes and allows the system to continue operating uninterrupted if a node fails. This approach lowers the risk of catastrophic system failure, even when a significant number of nodes become inoperative.

Why Hadoop?

  • Large Volumes of Data: Ability to store and process huge volumes of structured, semi-structured and unstructured data quickly. With data volumes and varieties constantly increasing, especially from social media and the Internet of Things (IoT), that’s a key consideration.
  • Computing Power: Hadoop’s distributed computing model processes big data fast. The more computing nodes you use, the more processing power you have.
  • Fault Tolerance: Data and application processing are protected against hardware failure. If a node goes down, jobs are automatically redirected to other nodes so the distributed computation does not fail, and multiple copies of all data are stored automatically.
  • Flexibility: Unlike a traditional relational database, you don’t have to preprocess data before storing it. You can store as much data as you want and decide how to use it later. That includes unstructured data such as text, images and videos.
  • Low Cost: The open-source framework is free and uses commodity hardware to store large quantities of data.
  • Scalability: You can easily grow your system to handle more data simply by adding nodes. Little administration is required.
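The fault-tolerance and replication ideas above can be illustrated with a toy simulation (this is a conceptual sketch only; real HDFS block placement is rack-aware and considerably more involved):

```python
# Toy simulation of HDFS-style block replication (illustrative only;
# the real placement policy is rack-aware and more sophisticated).
import random

REPLICATION_FACTOR = 3  # the HDFS default

def place_blocks(blocks, nodes, rf=REPLICATION_FACTOR):
    """Assign each block to `rf` distinct nodes."""
    return {block: random.sample(nodes, rf) for block in blocks}

def readable_blocks(placement, live_nodes):
    """A block is still readable if at least one replica sits on a live node."""
    return [b for b, replicas in placement.items()
            if any(n in live_nodes for n in replicas)]

nodes = [f"node{i}" for i in range(10)]
placement = place_blocks([f"blk_{i}" for i in range(100)], nodes)

# Kill one node: with 3 distinct replicas per block, every block survives.
live = set(nodes) - {"node0"}
print(len(readable_blocks(placement, live)))  # 100
```

Because every block has three replicas on distinct nodes, losing any single node can never make data unreadable, which is exactly the fault-tolerance property described above.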

The following topics will be covered in our Big Data and Hadoop Online Training:

Big Data and Hadoop Training Topics

Hadoop Introduction

Introduction to Data and System

Types of Data

Traditional ways of dealing with large data and their problems

Types of Systems & Scaling

What is Big Data

Challenges in Big Data

Challenges in Traditional Application

New Requirements

What is Hadoop? Why Hadoop?

Brief history of Hadoop

Features of Hadoop

Hadoop and RDBMS

Overview of the Hadoop Ecosystem

Hadoop Installation

Installation in detail

Creating Ubuntu image in VMware

Downloading Hadoop

Installing SSH

Configuring Hadoop, HDFS & MapReduce

Download, Installation & Configuration Hive

Download, Installation & Configuration Pig

Download, Installation & Configuration Sqoop

Configuring Hadoop in Different Modes

Hadoop Distributed File System (HDFS)

File System – Concepts

Blocks

Replication Factor

Version File

Safe mode

Namespace IDs

Purpose of Name Node

Purpose of Data Node

Purpose of Secondary Name Node

Purpose of Job Tracker

Purpose of Task Tracker

HDFS Shell Commands – copy, delete, create directories, etc.

Reading and Writing in HDFS

Difference of Unix Commands and HDFS commands
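The parallel between everyday Unix commands and their HDFS shell equivalents can be summarized in a small lookup table. The `hdfs dfs` subcommands below are the real CLI names; the helper function itself is just an illustrative device, not part of any Hadoop API:

```python
# Illustrative mapping from familiar Unix commands to their HDFS shell
# equivalents. The `hdfs dfs` subcommand names are real; the helper
# function is only for demonstration.
UNIX_TO_HDFS = {
    "ls":    "hdfs dfs -ls",
    "cat":   "hdfs dfs -cat",
    "cp":    "hdfs dfs -cp",
    "mv":    "hdfs dfs -mv",
    "rm":    "hdfs dfs -rm",
    "mkdir": "hdfs dfs -mkdir",
}

def hdfs_equivalent(unix_cmd, *args):
    """Build the HDFS shell command equivalent to a Unix command."""
    return " ".join([UNIX_TO_HDFS[unix_cmd], *args])

print(hdfs_equivalent("mkdir", "/user/demo"))
# hdfs dfs -mkdir /user/demo
```

Note that HDFS also has commands with no direct Unix analogue, such as `-put` and `-get`, which move files between the local file system and HDFS.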

Hadoop Admin Commands

Hands on exercise with Unix and HDFS commands

Read / Write in HDFS – Internal Process between Client, NameNode & DataNodes.

Accessing HDFS using Java API

Various Ways of Accessing HDFS

Understanding HDFS Java classes and methods

Admin:

  1. Commissioning / Decommissioning DataNodes
  2. Balancer
  3. Replication Policy
  4. Network Distance / Topology Script

Map Reduce Programming

About MapReduce

Understanding blocks and input splits

MapReduce Data types

Understanding Writable

Data Flow in MapReduce Application

Understanding MapReduce problem on datasets

MapReduce and Functional Programming

Writing MapReduce Application

Understanding Mapper function

Understanding Reducer Function

Understanding Driver

Usage of Combiner

Understanding Partitioner

Usage of Distributed Cache

Passing the parameters to mapper and reducer

Analysing the Results

Log files

Input Formats and Output Formats

Counters, Skipping Bad and unwanted Records

Writing Joins in MapReduce with 2 Input Files. Join Types.

Execute MapReduce Job – Insights.

Exercises on MapReduce.

Job Scheduling: Type of Schedulers.
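The map → shuffle → reduce data flow covered in this module can be sketched in a few lines of plain Python. This mirrors the concept only; a real Hadoop job implements the Java Mapper/Reducer API and the framework performs the shuffle across the cluster:

```python
# Conceptual sketch of the MapReduce data flow in plain Python
# (not the Hadoop Java API): map -> shuffle/sort -> reduce.
from collections import defaultdict

def mapper(line):
    # Emit (word, 1) for every word, like a WordCount Mapper.
    for word in line.split():
        yield word.lower(), 1

def shuffle(pairs):
    # Group values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(key, values):
    # Sum the counts for one key, like a WordCount Reducer.
    return key, sum(values)

lines = ["big data and hadoop", "hadoop and hive"]
mapped = [pair for line in lines for pair in mapper(line)]
result = dict(reducer(k, v) for k, v in shuffle(mapped).items())
print(result["hadoop"], result["and"])  # 2 2
```

A Combiner, also covered above, would apply the same summing logic to each mapper's local output before the shuffle, reducing the data moved across the network.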

Hive

Hive concepts

Schema on Read VS Schema on Write

Hive architecture

Install and configure Hive on a cluster

Meta Store – Purpose & Type of Configurations

Different type of tables in Hive

Buckets

Partitions

Joins in Hive

Hive Query Language

Hive Data Types

Data Loading into Hive Tables

Hive Query Execution

Hive library functions

Hive UDF

Hive Limitations
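The "schema on read" idea contrasted with "schema on write" above can be shown with a hypothetical example: the raw file is stored untouched, and a schema is applied only at query time. The file contents and schema here are invented for illustration:

```python
# Illustration of "schema on read": raw text is stored as-is, and a
# schema is applied only when the data is read (hypothetical example).
RAW_ROWS = [
    "1,alice,34",
    "2,bob,29",
]

SCHEMA = [("id", int), ("name", str), ("age", int)]

def read_with_schema(rows, schema):
    """Parse delimited text into typed records at read time."""
    for row in rows:
        fields = row.split(",")
        yield {name: cast(value) for (name, cast), value in zip(schema, fields)}

records = list(read_with_schema(RAW_ROWS, SCHEMA))
print(records[0]["age"] + records[1]["age"])  # 63
```

A schema-on-write system (a traditional RDBMS) would instead validate and convert each row at load time and reject data that does not fit; Hive defers that work to the query, so the same raw file can be read with different schemas.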

Pig

Pig basics

Install and configure Pig on a cluster

Pig library functions

Pig vs Hive

Write sample Pig Latin scripts

Modes of running Pig

Running in Grunt shell

Running as Java program

Pig UDFs

HBase

HBase concepts

HBase architecture

Region server architecture

File storage architecture

HBase basics

Column access

Scans

HBase use cases

Install and configure HBase on a multi node cluster

Create database, Develop and run sample applications

Access data stored in HBase using Java API
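HBase's column-oriented layout, covered above, can be modeled as a nested map. This is a simplified mental model, not the HBase API: a table maps a row key to column families, each holding qualifier/value pairs (the table contents here are invented for illustration):

```python
# Simplified model of HBase's storage layout (illustrative, not the
# HBase API): table -> row key -> column family -> qualifier -> value.
table = {
    "row1": {"info": {"name": "alice", "city": "austin"},
             "stats": {"visits": "3"}},
    "row2": {"info": {"name": "bob"}},
}

def get(table, row, family, qualifier):
    """Point lookup by row key, column family and qualifier."""
    return table.get(row, {}).get(family, {}).get(qualifier)

def scan(table, family, qualifier):
    """Scan all rows in key order, returning (row_key, value) where present."""
    return [(row, cols[family][qualifier])
            for row, cols in sorted(table.items())
            if qualifier in cols.get(family, {})]

print(get(table, "row1", "info", "name"))  # alice
print(scan(table, "info", "name"))         # [('row1', 'alice'), ('row2', 'bob')]
```

Note how "row2" simply omits columns it does not use: sparse rows are cheap in this model, which is one reason HBase suits wide, sparsely populated tables.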

Sqoop

Install and configure Sqoop on a cluster

Connecting to RDBMS

Installing MySQL

Import data from MySQL to Hive

Export data to MySQL

Internal mechanism of import/export

Oozie

Introduction to Oozie

Oozie architecture

XML file specifications

Specifying Workflows

Control nodes

Oozie job coordinator

Flume

Introduction to Flume

Configuration and Setup

Flume Sink with example

Channel

Flume Source with example

Complex flume architecture

ZooKeeper

Introduction to ZooKeeper

Challenges in distributed Applications

Coordination

ZooKeeper : Design Goals

Data Model and Hierarchical namespace

Client APIs
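ZooKeeper's hierarchical namespace, described above, organizes znodes by slash-separated paths, each holding a small data payload. A minimal sketch of that data model (illustrative only; the real client API also supports watches, ephemeral and sequential nodes):

```python
# Sketch of ZooKeeper's hierarchical namespace (illustrative only):
# znodes are addressed by slash-separated paths and hold small payloads.
znodes = {}

def create(path, data=b""):
    """Create a znode; like ZooKeeper, the parent must already exist."""
    parent = path.rsplit("/", 1)[0] or "/"
    if parent != "/" and parent not in znodes:
        raise ValueError(f"parent {parent} does not exist")
    znodes[path] = data

def get_children(path):
    """List the direct children of a znode, sorted by name."""
    prefix = path.rstrip("/") + "/"
    return sorted(p[len(prefix):] for p in znodes
                  if p.startswith(prefix) and "/" not in p[len(prefix):])

create("/app", b"config-root")
create("/app/workers", b"")
create("/app/workers/w1", b"host1")
create("/app/workers/w2", b"host2")
print(get_children("/app/workers"))  # ['w1', 'w2']
```

This tree structure is what makes ZooKeeper convenient for coordination tasks such as group membership: each worker registers a child znode, and others discover the group by listing children.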

YARN

Hadoop 1.0 Limitations

MapReduce Limitations

History of Hadoop 2.0

HDFS 2: Architecture

HDFS 2: Quorum based storage

HDFS 2: High availability

HDFS 2: Federation

YARN Architecture

Classic vs YARN

YARN Apps

YARN multitenancy

YARN Capacity Scheduler

Prerequisites :

Knowledge of any programming language, databases and the Linux operating system. Core Java or Python knowledge is helpful.

Duration & Timings :

Duration – 30 Hours.

Course Fee: $300 (discount offer available)

Training Type: Online Live Interactive Session.

Faculty: Experienced.

Weekend Session – Sat & Sun 9:30 AM to 12:30 PM (EST) – 5 Weeks. September 30, 2017.

Weekday Session – Mon – Thu 8:30 PM to 10:30 PM (EST) – 4 Weeks. October 9, 2017.

Weekend Session – Sat & Sun 9:30 AM to 12:30 PM (EST) – 5 Weeks. October 21, 2017.

Weekday Session – Mon – Thu 8:30 PM to 10:30 PM (EST) – 4 Weeks. October 30, 2017.

Any questions? Please submit an inquiry.


About Learntek

Learntek is a global online training provider for Big Data, Hadoop, Data Analytics and other IT and management courses. We are dedicated to designing, developing and implementing training programs for students, corporate employees and business professionals.

Our job is to make sure your training and learning experience is everything it should be – exciting, enjoyable, stimulating and successful.