Hadoop - Big Data

  • Date Submitted: 10/19/2014 12:31 AM
  • Flesch-Kincaid Score: 55.8 
  • Words: 2237
We live in the age of big data, where the data volumes we need to work with on a
day-to-day basis have outgrown the storage and processing capabilities of a single
host. Big data brings with it two fundamental challenges: how to store and work
with such large volumes of data, and, more importantly, how to understand that data
and turn it into a competitive advantage.
Hadoop fills a gap in the market by effectively storing substantial amounts of data
and providing computational capabilities over it. It's a distributed system made up
of a distributed filesystem and a framework for parallelizing and executing programs on
a cluster of machines (see figure 1.1). You've most likely come across Hadoop because it
has been adopted by technology giants like Yahoo!, Facebook, and Twitter to address
their big data needs, and it's making inroads across all industrial sectors.
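To make the "distributed filesystem" half of that description concrete: HDFS stores a file as a sequence of large fixed-size blocks (64 MB by default in Hadoop 1.x, 128 MB in 2.x) and keeps several copies of each block (three by default) on different hosts. The Java sketch below only plans such a layout in memory. It is not Hadoop's API; the class, method, and host names are hypothetical, and the simple round-robin placement is an assumption standing in for HDFS's real rack-aware placement policy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BlockSplitSketch {

    // Hypothetical description of one block of a file and the hosts holding its replicas.
    static final class Block {
        final int index;
        final long offset;
        final long length;
        final List<String> replicas;

        Block(int index, long offset, long length, List<String> replicas) {
            this.index = index;
            this.offset = offset;
            this.length = length;
            this.replicas = replicas;
        }
    }

    // Splits a file of fileSize bytes into blockSize chunks and assigns each chunk to
    // `replication` distinct hosts in round-robin order (an assumption, not HDFS's policy).
    static List<Block> plan(long fileSize, long blockSize, List<String> hosts, int replication) {
        if (blockSize <= 0 || replication < 1 || replication > hosts.size()) {
            throw new IllegalArgumentException("invalid block size or replication factor");
        }
        List<Block> blocks = new ArrayList<>();
        int index = 0;
        for (long offset = 0; offset < fileSize; offset += blockSize) {
            long length = Math.min(blockSize, fileSize - offset); // the last block may be short
            List<String> replicas = new ArrayList<>();
            for (int r = 0; r < replication; r++) {
                replicas.add(hosts.get((index + r) % hosts.size()));
            }
            blocks.add(new Block(index, offset, length, replicas));
            index++;
        }
        return blocks;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        List<String> hosts = Arrays.asList("host1", "host2", "host3", "host4");
        // A 300 MB file with 128 MB blocks and three replicas per block.
        for (Block b : plan(300 * mb, 128 * mb, hosts, 3)) {
            System.out.printf("block %d: offset=%dMB length=%dMB replicas=%s%n",
                    b.index, b.offset / mb, b.length / mb, b.replicas);
        }
    }
}
```

With these inputs the file becomes two full 128 MB blocks and one 44 MB block, and no host holds every block, which is what lets computation later run on many machines in parallel.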
This chapter covers
■ Understanding the Hadoop ecosystem
■ Downloading and installing Hadoop
■ Running a MapReduce job
Because you've come to this book to get some practical experience with
Hadoop and Java, I'll start with a brief overview and then show you how to install
Hadoop and run a MapReduce job. By the end of this chapter you'll have received
a basic refresher on the nuts and bolts of Hadoop, which will allow you to move on to
the more challenging aspects of working with Hadoop.
Let’s get started with a detailed overview of Hadoop.
1.1 What is Hadoop?
Hadoop is a platform that provides both distributed storage and computational capabilities.
Hadoop was first conceived to fix a scalability issue that existed in Nutch, an
open source crawler and search engine. At the time, Google had published papers that
described its novel distributed filesystem, the Google File System (GFS), and MapReduce,
a computational framework for parallel processing. The successful implementation
of these papers' concepts in Nutch resulted in its split into two...
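The MapReduce model mentioned above has three steps: a map phase turns each input split into intermediate key/value pairs, a shuffle groups those pairs by key, and a reduce phase combines the values for each key. The Java sketch below walks through those steps for a word count, entirely in memory. It does not use Hadoop's own API (`org.apache.hadoop.mapreduce`); the class and method names are hypothetical, and a thread pool merely stands in for map tasks that Hadoop would run on separate cluster nodes.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class MiniMapReduce {

    // Map phase: turn one input split (here, a line of text) into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String split) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : split.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // Reduce phase: combine all values emitted for a single key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    static Map<String, Integer> run(List<String> splits, int workers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            // Run one map task per split in parallel.
            List<Future<List<Map.Entry<String, Integer>>>> futures = new ArrayList<>();
            for (String split : splits) {
                futures.add(pool.submit(() -> map(split)));
            }
            // Shuffle: group intermediate values by key (TreeMap keeps keys sorted).
            Map<String, List<Integer>> grouped = new TreeMap<>();
            for (Future<List<Map.Entry<String, Integer>>> f : futures) {
                for (Map.Entry<String, Integer> e : f.get()) {
                    grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
                }
            }
            // Reduce: one call per distinct key.
            Map<String, Integer> result = new TreeMap<>();
            for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
                result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> splits = Arrays.asList(
                "Hadoop stores big data",
                "Hadoop processes big data in parallel");
        // Prints {big=2, data=2, hadoop=2, in=1, parallel=1, processes=1, stores=1}
        System.out.println(run(splits, 2));
    }
}
```

The map and reduce functions never share state, so the framework is free to run them wherever the data lives; in real Hadoop the shuffle moves intermediate pairs across the network rather than through an in-memory map.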