Pig Case Study – Complaints Raised by Different Software Companies


Pig Case Study: Problem Statement: Build a script that produces a report listing each company and state along with the number of complaints raised by them. We have a data set of complaints (incidents or trouble tickets) about a software system, raised by different companies. The fields of each record (i.e. the data dictionary) are in the following format: Date received = dtr:chararray, Product = prd:chararray, Sub-product = sprd:chararray, Issue = iss:chararray, Sub-issue = siss:chararray, Company public response =

Apache Hadoop Vs Apache Spark

What is Big Data? What size of data counts as big and qualifies as Big Data? The term is relative. For example, 50 terabytes may be Big Data for a start-up but not for companies like Google and Facebook, because they have the infrastructure to store and process that amount of data. When data

Everything is on Internet! The Internet has a lot of Data! Therefore, everything is Big Data!!

What is Data Science? Data Science is the art and science of extracting actionable insight from raw data. Put simply, Data Science is an umbrella term for the techniques used to extract insights and information from data. Data Science uses automated methods to analyze huge amounts of data and extract knowledge from it, whether the data is structured or unstructured. Data is growing faster than ever before and by the year 2020, about 1.7 megabytes of new information will be

What is Data Analytics?

Big Data Analytics as a buzzword has generated much hype in recent times. It is said to be the answer to any marketer’s woes, the way to generate highly focused, customized advertisements and marketing strategies that give consumers what they want. But what exactly is big data? As the name implies, it is data that is big in volume. Simply put, it is data that is too large to be processed using traditional data processing mechanisms. With the

The 6 Top Hadoop Distributions that You Can Employ for Your Big Data Needs

Apache Hadoop is a software framework for processing big data. Its modular design lets it process large data sets on commodity hardware. Hadoop is available in different distributions, as companies often deliver it as a packaged deal. It uses the Hadoop Distributed File System (HDFS), which allows the use of different platforms and the ability to perform parallel data processing. Here, we discuss the six top Hadoop distributions that

A Brief Discussion of Hadoop Core Components

Apache Hadoop is an open-source big data platform that uses networks of computers to perform complex processing and keeps results available even when a few nodes are down. A few important Hadoop core components govern the way it performs across various cloud-based platforms. The core components are often called modules and are described below: The Distributed File System The first and the most

The Merits of Hadoop as a Data Solution

Hadoop is an amazing programming framework which is quickly gaining prominence. It is sponsored by the Apache Software Foundation and has the support of the Java programming community. Hadoop as a solution does have its merits because it can simplify big data and allow normal consumers to simplify the concept of volumetric data available on the internet. Here are some important details to understand this framework: The Arrival of Hadoop Hadoop as a solution was formulated by top computer experts

How Analytics with R became powerful


It seems that developers really love to debate the different ways of doing the same thing. Seriously, we love it too, but we don’t think we have seen people argue so much in any other industry. There are always people who think one way of doing something is better than another, but rarely does this result in widespread online arguments and debates the way it does for programmers. Among many such debates, one that has heated up a

Why MongoDB is so important for big data


MongoDB often confuses people because they don’t understand what its purpose is. MongoDB doesn’t have a single purpose: it is a good fit in many situations and a poor fit in some others, as is true of every piece of technology. We have different languages and databases because no one technology can be the best fit for every situation. The main thing which makes MongoDB so great, especially when it comes to big data, is its approach

What makes Scala & Spark so powerful?


People who work with Python in Spark often hear others discuss Scala and how good it is. If you are wondering whether you should learn Scala, you are probably asking the same question most people asked before they started using it: Is it really that much better than Python? The answer, you might be surprised to know, is yes – and it is a definite yes. Usually, the experts have some caveats, some exceptions where the alternative still proves useful
