
Guess what? We’re wrong.
If you’re a decent Java developer, Hadoop is well within your reach, says Jesse Anderson, who develops curriculum and teaches courses for Cloudera, the leading commercial organization that develops software and provides services around Big Data.
What Is Hadoop? (Hint: It’s Not A Baby Elephant)
For anyone who isn’t familiar with Hadoop, it’s a new open source framework for storing and processing data, lots of data, as in hundreds of terabytes. A terabyte is 1000 to the power of four bytes. Try writing that number out.
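If you’d rather let Java write that number out for you, here is a minimal sketch. The class name TerabyteSketch and the method bytesInTerabytes are made up for this illustration, and it assumes the decimal definition of a terabyte (1000 to the power of four bytes) used above.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not part of Hadoop.
class TerabyteSketch {

    // Assumes the decimal definition: 1 TB = 1000^4 bytes.
    static long bytesInTerabytes(long terabytes) {
        return terabytes * 1000L * 1000L * 1000L * 1000L;
    }

    public static void main(String[] args) {
        // Locale.US pins the digit-grouping separator to a comma.
        System.out.println(String.format(Locale.US, "%,d", bytesInTerabytes(1)));   // 1,000,000,000,000
        System.out.println(String.format(Locale.US, "%,d", bytesInTerabytes(100))); // 100,000,000,000,000
    }
}
```

So "hundreds of terabytes" means byte counts with fifteen digits.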
Hadoop is a Java-based (yes, the kind of Java that many of you already know) data storage and computation framework for large data sets distributed across a cluster of commodity hardware. It’s made up of two main components: MapReduce and HDFS.
HDFS stands for Hadoop Distributed File System; it’s a way of storing Big Data.
MapReduce is a computational pattern in which an application is divided into many small fragments of work, each of which can be executed on any node in the cluster. (A node, for anyone who may not know, is a computer.)
So, let’s say that you want to run an experiment that adds up the face values of every numerical card ever dealt in Las Vegas. You might use MapReduce to do that.
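To make the pattern concrete, here is a rough sketch of the card experiment in plain Java. It imitates the map and reduce steps on one machine and does not use the actual Hadoop API. The class CardSumSketch, its method names, the "7 of hearts" card format, and the reading of "numerical card" as 2 through 10 are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java imitation of the MapReduce pattern; it does not use the Hadoop API.
class CardSumSketch {

    // Map step: emit the face value of a numerical card (assumed 2-10), nothing otherwise.
    static List<Integer> map(String card) {
        List<Integer> emitted = new ArrayList<>();
        String rank = card.split(" ")[0]; // assumed format: "<rank> of <suit>"
        try {
            int value = Integer.parseInt(rank);
            if (value >= 2 && value <= 10) {
                emitted.add(value);
            }
        } catch (NumberFormatException e) {
            // Face cards and aces ("K", "A", ...) are not numerical, so emit nothing.
        }
        return emitted;
    }

    // Reduce step: add up every value the map step emitted.
    static long reduce(List<Integer> values) {
        long sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Each inner list stands in for the slice of data held by one node.
    // On a real cluster the slices would be mapped in parallel; here we loop.
    static long totalFaceValue(List<List<String>> slices) {
        List<Integer> collected = new ArrayList<>();
        for (List<String> slice : slices) {
            for (String card : slice) {
                collected.addAll(map(card));
            }
        }
        return reduce(collected);
    }

    public static void main(String[] args) {
        List<List<String>> slices = Arrays.asList(
                Arrays.asList("7 of hearts", "K of spades", "10 of clubs"),
                Arrays.asList("A of diamonds", "2 of clubs", "9 of hearts"));
        System.out.println(totalFaceValue(slices)); // prints 28 (7 + 10 + 2 + 9)
    }
}
```

The point of the split is that the map step looks at one card at a time and needs nothing else, which is what lets the fragments of work run on any node.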
Where to Begin
Where would you start and how would you do it? Cloudera created a video demonstration for us.
Mind you, the first few minutes of the video are a bit dry, but after that, it gets quite interesting, so stick with it.
Have any questions? We’re all ears. Also, if you’d like to take one of Cloudera’s Big Data courses, you can find information on Cloudera University here.
One more thing: if Big Data training isn’t something that your boss is willing to pay for out of the department’s funds, check your company’s tuition reimbursement plan. Or, of course, you can cough up the bucks yourself, which is exactly what Anderson says a number of students are doing -- they’re investing in their careers.
Title image courtesy of photobank.kiev.ua (Shutterstock)