Data & Analytics

Column: The Death of Big Data and the Emergence of the Multicloud Era

The era of Big Data is coming to an end as the focus shifts from how we collect data to how we process that data in real time. Big Data is now a business asset supporting the next eras of multicloud support, machine learning, and real-time analytics.


By Hyoun Park, Amalgam Insights


The era of Big Data passed away on 5 June 2019 with the announcement of Tom Reilly’s upcoming resignation from Cloudera and the subsequent drop in the company’s market capitalization. Coupled with MapR’s recent announcement that it intends to shut down in late June unless it can find a buyer to continue operations, June 2019 made clear that the initial era of Hadoop-driven Big Data has come to an end. Big Data will be remembered for its role in enabling the beginning of social-media dominance; for fundamentally changing the mindset of enterprises in working with multiple-orders-of-magnitude increases in data volume; and for clarifying the value of analytic data, data quality, and data governance for the ongoing valuation of data as an enterprise asset.

As I give a eulogy of sorts to the era of Big Data, I do want to emphasize that Big Data technologies are not actually dead but that the initial generation of Hadoop-based Big Data has reached a point of maturity where its role in enterprise data is established. Big Data is no longer part of the breathless hype cycle of infinite growth but is now an established technology.

The Birth of Big Data

When the era of Big Data started with the launch of Apache Hadoop in 2006, developers and architects saw this tool as an enabler to process and store multistructured and semistructured data. The fundamental shift in thinking of enterprise data beyond traditional enterprise database assumptions of ACID (atomicity, consistency, isolation, and durability) led to a transformation of data use cases as companies realized that data previously thrown away or kept in static archives could actually provide value to understanding customer behavior; propensity to take action; risk factors; and complex organizational, environmental, and business behaviors. The commercial value of Hadoop started to be established in 2009 with the launch of Cloudera as a commercial distribution, which was quickly followed by MapR, Hortonworks, and EMC Greenplum (now Pivotal HD). Although analysts provided heady projections of Big Data as a potential market of $50 billion or more, Hadoop ended up being challenged through the 2010s as an analytic tool.

Hadoop’s Challenges in the Enterprise World

Although Hadoop was very valuable in supporting large storage and ETL (extract, transform, and load) jobs and in supporting machine-learning tasks through batch processing, it was not optimal for supporting more traditional analytics jobs that businesses and large organizations used to manage day-to-day operations. Tools such as Hive, Dremel, and Spark were used on top of Hadoop to support analytics, but Hadoop never became fast enough to truly replace the data warehouse.

Hadoop also faced challenges from the advances in NoSQL databases and object storage providers in solving aspects of the storage and management challenges that Hadoop was originally designed to support. Over time, the challenges of supporting business continuity on Hadoop and the lack of flexibility in supporting real-time, geospatial, and other emerging analytics use cases made it difficult for Hadoop to evolve beyond batch processing for massive volumes of data.

In addition, over time, businesses started to find that their Big Data challenges were increasingly associated with supporting a wide variety of data sources and quickly adjusting data schemas, queries, definitions, and contexts to reflect the use of new applications, platforms, and cloud infrastructure vendors. To solve this challenge, analytics, integration, and replication had to become more agile and more rapid. This challenge was reflected in the creation of a number of vendors, including:

  • Analytics solutions such as ClearStory Data, Domo, Incorta, Looker, Microsoft Power BI, Qlik, Sisense, Tableau, and ThoughtSpot
  • Data pipeline vendors such as Alooma, Attunity, Alteryx, Fivetran, and Matillion
  • And data integration vendors including Informatica, MuleSoft, SnapLogic, Talend, and TIBCO (which also competes in the analytics space with its Spotfire portfolio)

If it seems like a lot of these companies have been in the spotlight, either from an acquisition or funding perspective, it is no coincidence. Recent examples include, but are not limited to:

  • ThoughtSpot’s $145 million D round in May 2018
  • Sisense’s $80 million E round in September 2018
  • Incorta’s $15 million B round extension in October 2018
  • Fivetran’s $15 million A round in December 2018
  • Looker’s $103 million E round in December 2018
  • TIBCO’s acquisition of Orchestra Networks in December 2018
  • Logi Analytics’ acquisition of Jinfonet in February 2019
  • Google’s acquisition of Alooma in February 2019
  • Qlik’s acquisition of Attunity in February 2019
  • Informatica’s acquisition of AllSight in February 2019
  • TIBCO’s acquisition of SnappyData in March 2019
  • Alteryx’s acquisition of ClearStory Data in April 2019
  • Matillion’s $35 million C round in June 2019
  • Google’s intent to acquire Looker in June 2019
  • Salesforce’s intent to acquire Tableau in June 2019
  • Logi Analytics’ acquisition of Zoomdata in June 2019

The success of these solutions reflects the increasing need for analyst, data, and platform flexibility in improving the contextual analytic value of data across clouds and sources. And there will be more activity in 2019, as a number of these companies are either private-equity-owned or have taken significant venture capital funding and will need to exit soon to help fund future venture capital funds.

With the passing of Big Data, we move forward to tend to the health and care of its progeny: the era of multicloud, the era of machine learning, and the era of real-time and ubiquitous context.

Read the full column here.