2021 Hadoop-to-Cloud Migration Benchmark Report

By Tony Velcich, Jul 20, 2021

More than half of the Hadoop-to-cloud data migration projects happening today are achingly old-school, and as a result, painfully inefficient.

That’s a staggering statistic pointing to lost time and resources, and it’s just one of the insights that came out in the annual 2021 Hadoop-to-Cloud Migration Benchmark Report, which surveyed more than 200 C-level leaders (e.g. CIOs, CTOs), cloud and data architects, and data professionals. The subjects were all well acquainted with Hadoop, either because they are currently using it or they had previously migrated Hadoop data lakes to the cloud.

And the majority of the respondents admitted to relying on “old school” approaches: shipping data by truck to cloud vendors and/or using tools such as DistCp that were not designed for on-premises-to-cloud migration. That means that far too many companies are not taking advantage of technologies that can ease the transition and mitigate risks in moving to the cloud.

These same companies are moving their data to the cloud to increase flexibility and agility. They want to be able to crunch data and develop faster analytics. But there is clearly a disconnect, because despite those goals, they are looking to clunky and inefficient methods of transfer — even as they seek to become more streamlined and modern.

“Manual migration DistCp-based tooling and bulk transfer devices strain resources, add complexity and ultimately increase risks to the business,” said Van Diamandakis, SVP Marketing, WANdisco. “While customers are looking to accelerate time to business insights leveraging cloud-scale analytics, the survey says they are looking at the wrong ways to migrate their petabyte scale data. It’s a train wreck in the making. There is a much better way.”

Many leaders expect to live in a hybrid environment and are planning for multi-cloud data management to deliver business value.

These outdated approaches and manual tools such as bulk transfer devices and DistCp are not designed for modern migrations with large volumes of actively changing data. These manual approaches introduce business disruption or require unnecessary heroics to perform manual reconciliation or custom development to handle data changes that occurred during the data transfer or copy. Such migration techniques jeopardize the success of the migration projects and put the companies’ data and business at risk.
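
To make the reconciliation burden concrete, here is a minimal Python sketch of the kind of check a team would have to script after a one-shot bulk copy. It is illustrative only: the `FileInfo` record, the `reconcile` function, and the sample paths and checksums are hypothetical and are not taken from the report or from any WANdisco or Hadoop tooling. It assumes a listing of path, size, and checksum is available for both the source and the target.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    """Hypothetical per-file record: size in bytes plus a content checksum."""
    size: int
    checksum: str


def reconcile(source, target):
    """Compare a source listing with a target listing (dicts of path -> FileInfo).

    Returns the paths that still need attention after a one-shot copy:
    - missing: created at the source after the copy started
    - changed: present on both sides but with different size or checksum
    - stale:   still on the target although deleted at the source
    """
    missing = sorted(p for p in source if p not in target)
    changed = sorted(p for p in source if p in target and source[p] != target[p])
    stale = sorted(p for p in target if p not in source)
    return {"missing": missing, "changed": changed, "stale": stale}


# Hypothetical listing of the cloud target once the bulk copy has finished.
target = {
    "/data/a.parquet": FileInfo(100, "aa"),
    "/data/b.parquet": FileInfo(200, "bb"),
    "/data/old.parquet": FileInfo(50, "oo"),
}

# Hypothetical listing of the on-premises source taken after the copy,
# by which time applications have kept writing to it.
source_now = {
    "/data/a.parquet": FileInfo(100, "aa"),
    "/data/b.parquet": FileInfo(250, "b2"),
    "/data/c.parquet": FileInfo(300, "cc"),
}

report = reconcile(source_now, target)
print(report)
```

On the sample listings, the report flags one file created after the copy, one file that changed, and one file deleted at the source. With large volumes of actively changing data, a compare-and-recopy pass like this may have to be repeated until the two sides converge, which is the manual effort the paragraph above describes.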

Hadoop-to-cloud migration key findings

In addition to the above, the key findings from the report were:

  1. The next wave of Hadoop data migrations will be even larger. Having learned from the first wave of migrations led by large organizations, the next wave of companies to migrate will benefit both from lessons learned and more mature migration technology.

  2. Migration concerns and business impacts can be solved. Each of the leading causes of business disruption cited by companies that are planning, have completed, or are avoiding migration can be addressed with software designed to handle data changes and maintain security settings, without the costly and risky development of custom code by scarce IT resources.

  3. Top requirements emerge for Hadoop migration software. As organizational technology leaders set the strategy for their Hadoop migrations, the most requested requirements are 1) data migration validation, 2) support for multiple cloud targets, and 3) the ability to handle data changes without requiring operational downtime.

  4. Companies are planning for hybrid and multi-cloud data management. Companies should justify Hadoop data migration software as an independent, modern data management capability that delivers agility and ensures data integrity across an active mix of on-premises and cloud environments.

The Report found that most IT leaders are pursuing cloud migration to lay the foundation for future business value creation. The three top drivers of migration to the cloud were:

  1. data modernization initiatives;

  2. cloud scale analytics; and

  3. adopting scalable cloud storage.

This shows us that despite the use of outdated technologies, the appetite for more agile capabilities is alive and well.

The demand for cloud migration solutions that can move data to the cloud with zero downtime is clear. The challenge now lies in bringing companies into the future and encouraging more widespread adoption of the technology. WANdisco’s LiveData Platform keeps geographically dispersed data at any scale consistent between on-premises and cloud environments, allowing businesses to operate seamlessly in a hybrid or multi-cloud environment, with zero downtime and zero data loss.

The days when companies needed to resign themselves to downtime while migrating their data are over.


Tony Velcich

Tony is an accomplished product management and marketing leader with over 25 years of experience in the software industry. Tony is currently responsible for product marketing at WANdisco, helping to drive go-to-market strategy, content and activities. Tony has a strong background in data management having worked at leading database companies including Oracle, Informix and TimesTen where he led strategy for areas such as big data analytics for the telecommunications industry, sales force automation, as well as sales and customer experience analytics.


