Enterprises have been collecting data for decades. Now advanced storage and network technologies offer the ability to capture more data than ever.
No two companies are the same, nor do they need big data to solve the same problems. More data on clicks, likes, locations and other types of information are being collected than ever before. But without the right skills to integrate and analyze this information, big data is useless.
IT professionals need specific skills to manage, monitor and operate Hadoop clusters and they need specific expertise to extract real business value from big data. The IT team must train or obtain resources with these big data engineering and analysis skills as well as business talents.
Many Skills Required
Success with big data requires technical skills as well as business insight.
Data mining and analysis
IT professionals need experience with data mining and analysis to help their companies discover useful information, draw conclusions and support sound decision making. This means a background in technologies like Hadoop and MapReduce. Most companies also require an understanding of the popular commercial Hadoop distributions, like Cloudera, Hortonworks or MapR, as well as open source Apache Hadoop. Working knowledge of Hive and Pig is also necessary, as is skill with languages like Java, Ruby and Python. For managing the agile environment, knowledge of automation tools like Puppet and Chef is essential for streamlining processes.
Today’s IT teams require an understanding of data warehousing platforms (Teradata, Greenplum and Netezza, for example) to get data in and out. This skill provides organizations with valuable information through current and historical data for reporting, trending and seasonal sales comparisons. Firsthand knowledge of relational databases is a checklist item, so proficiency with MySQL, MS SQL Server and DB2 is essential. NoSQL databases, such as Cassandra, HBase and MongoDB, are becoming part of the big data stack. They're not interchangeable, so real-world experience with as many as possible should be the goal for engineers looking to hit the ground running. A few reviews of job postings on Dice.com provide a fairly thorough list.
Data collection and transformation
Extracting and transforming data is a regular activity in the life of a big data specialist. Collection is painstaking work that requires precision and patience. Transforms can be hard to specify and evaluate, and poorly formatted data is widespread. Discovering and correcting these issues is challenging but paramount, and automation is almost impossible. For most Fortune 1000 companies, it takes a year or more to extract, load and transform data from legacy systems into Hadoop. For these activities, IT professionals require in-depth SQL expertise as well as data transformation skills, whether through scripting in programming languages like Python or R or through manual editing in tools like Excel. Hands-on knowledge of Extract, Transform, Load (ETL) tools like Informatica, Talend and Pentaho is also a good idea.
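To make the scripting side of this work concrete, here is a minimal Python sketch of one transform step. It is only an illustration: the field names (customer, order_date, amount), the two date layouts and the sample rows are assumptions, not taken from any real legacy system. The idea it shows is to normalize what can be parsed and set aside what cannot, so that a person can review the rejects instead of losing them silently.

```python
from datetime import datetime

# Assumed date layouts that a legacy export might mix together.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def parse_date(text):
    """Return an ISO date string, or None if no known layout matches."""
    cleaned = text.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def parse_amount(text):
    """Strip dollar signs and thousands separators; None if not numeric."""
    cleaned = text.strip().replace("$", "").replace(",", "")
    try:
        return float(cleaned)
    except ValueError:
        return None


def transform(rows):
    """Split rows into cleaned records and rejects that need manual review."""
    clean, rejects = [], []
    for row in rows:
        date = parse_date(row.get("order_date", ""))
        amount = parse_amount(row.get("amount", ""))
        if date is None or amount is None:
            rejects.append(row)
        else:
            clean.append({
                "customer": row["customer"].strip().lower(),
                "order_date": date,
                "amount": amount,
            })
    return clean, rejects


# Hypothetical sample rows, including one that cannot be repaired by script.
rows = [
    {"customer": " Acme ", "order_date": "03/14/2013", "amount": "$1,200.50"},
    {"customer": "Globex", "order_date": "2013-04-01", "amount": "310"},
    {"customer": "Initech", "order_date": "next week", "amount": "n/a"},
]
clean, rejects = transform(rows)
```

Keeping the rejects in their original form reflects the point made above: full automation is rarely possible, so the script handles the predictable cases and hands the rest to a human.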
Cloud infrastructure experience is necessary for most organizations, even if the big data initiative is currently on-premises. It likely won’t stay that way. Cloud offerings are hard to beat for factors like speed, reliability, elasticity and time-to-service, among others. IT pros need to be familiar with Amazon Web Services EC2 and Elastic MapReduce, and they need to know how to build, operate and manage a distributed system at scale. A firm understanding of, and experience with, bringing Hadoop into production is crucial, including expertise in administration, configuration management, testing, performance tuning and monitoring.
Beyond Technical Skills
Administrators and developers must work with line-of-business leaders to identify and understand what’s needed from big data. The administrator is often a mediator and project overseer who accepts the challenge, translating what the business unit wants into language (scripts, processes, etc.) that IT will apply to the infrastructure to return the deliverable (the answer to the challenge).
Consider a hypothetical big data request. The CMO wants to know which customers should receive emails with a complimentary three-month trial subscription. The request is for the top 1,000 customers, ranked on the past 12 months of lead nurturing campaigns, purchases, website visits and specific page clicks.
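Once the underlying data has been integrated, the ranking itself can be small. The Python sketch below is one possible way to express it, not a prescribed method: the weights, the field names and the three sample customers are all invented for illustration, and in practice the marketing team would decide how much each signal should count.

```python
import heapq

# Assumed weights: how much each 12-month signal counts toward the score.
WEIGHTS = {
    "campaign_responses": 3.0,
    "purchases": 5.0,
    "site_visits": 1.0,
    "page_clicks": 0.5,
}


def score(customer):
    """Weighted sum of the customer's activity counts; missing signals count as 0."""
    return sum(weight * customer.get(signal, 0) for signal, weight in WEIGHTS.items())


def top_customers(customers, n=1000):
    """Return the n highest-scoring customers, best first."""
    return heapq.nlargest(n, customers, key=score)


# Hypothetical per-customer totals for the past 12 months.
customers = [
    {"id": "c1", "campaign_responses": 2, "purchases": 1, "site_visits": 10, "page_clicks": 4},
    {"id": "c2", "campaign_responses": 0, "purchases": 5, "site_visits": 3, "page_clicks": 0},
    {"id": "c3", "campaign_responses": 1, "purchases": 0, "site_visits": 2, "page_clicks": 2},
]
ranked = top_customers(customers, n=2)
```

The hard part of the request is rarely this step. It is agreeing on the weights with the business and getting clean, joined activity counts out of the source systems in the first place.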
The big data team must understand the objectives, details and goals of the campaign as well as the quality of the data, complexities of system integration and the process of manipulating those systems. They must then set expectations, provide the marketing department and IT team with a schedule and assign tasks. It is also their job to follow the project through and update the teams and company with milestones and results.
Excellent communication skills, in both technical and non-technical language, are necessary so all audiences understand the challenges, milestones, results and benefits. Team members must be ready and willing to call appropriate meetings at proper intervals and know when and why certain decisions must be made, such as extending deadlines.
It’s an exciting time for IT professionals. Growth potential and opportunities abound. Companies offer great salaries, training, certification and projects that help boost resumes. When considered wisely, the right career path will provide rich experiences.
Organizations face stiff competition. Until they find all the right hires, they can augment IT teams with consultants and add a robust training program for permanent staff. They should offer competitive salaries and benefits, professional development and certification programs to attract talent and, once people are hired, keep them interested, educated, advancing and happy. Skilled staff are among an organization's biggest assets.