The challenges presented by big data are spurring innovations in many areas of technology, not the least of which are storage and data processing. Consider Apache Hadoop, an open source software project aimed squarely at managing large data sets—petabytes in some cases.
The Hadoop platform consists of software that runs on clusters of distributed servers, often in the cloud, which is what gives it the ability to handle big data. The data processing software included with Hadoop is MapReduce, which splits large processing jobs into pieces that run in parallel across servers; the results are then combined into a single output. Both Apache Hadoop and MapReduce are part of the wave that is pushing the envelope of big data opportunities.
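That split-then-combine flow can be sketched in miniature on a single machine. The word-count example below is illustrative only: the function names are invented for this sketch and are not Hadoop's actual API, and the shuffle step that Hadoop performs across the cluster is simulated here with an in-memory dictionary.

```python
from collections import defaultdict

def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle step: group emitted values by key, as the framework
    does between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce step: combine all counts for one word into a single total."""
    return (key, sum(values))

documents = ["big data big insight", "data processing at scale"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts["data"])  # each document contributes one "data", so this prints 2
```

In a real Hadoop job, each map task would run against a different block of data on a different server, and the framework, not the programmer, would handle the grouping and the movement of intermediate results between machines.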
Companies are seemingly intrigued by the potential of analyzing heretofore unwieldy data sets. “Hadoop is really taking off, as organizations have realized there are huge amounts of data that are largely untapped,” says Deirdre Mahon, vice president of marketing at San Francisco-based RainStor Inc., a provider of database software that works with the Hadoop platform and MapReduce.
Birst Inc., a provider of business analytics, also in San Francisco, has recently jumped on the bandwagon, announcing support for Apache Hadoop. According to the company, the support will enable customers to use Birst’s BI solution to access Hadoop-based data in real time or integrate that data with data in other systems such as SAP or Salesforce.com. Information can be delivered to whoever needs it via dashboards, reports, ad hoc queries, and mobile distribution.
According to Mahon, MapReduce presents a learning curve for many, since it requires specialized skills. (RainStor also runs with SQL.) For channel partners interested in getting up to speed with Apache Hadoop and MapReduce, Cloudera Inc., a Palo Alto, Calif.-based provider of a data platform built on Apache Hadoop, offers training and certification.