
09 May 2019 · Joel Horwitz, DISCOtecher

Don’t Let Data Gravity Drag You Down

The universe is something that never ceases to surprise us. In all of the years that humans have walked planet earth, there’s always something new that catches humanity off-guard—like the opportunity to see a black hole for the first time, in April 2019.

This was a moment that fascinated the team at WANdisco. It made us sit back and self-reflect. No matter how far we feel we’ve come as a species, time outpaces our humble lifespans. Think about it: the moon landing seems so long ago, yet it happened just 50 years ago, in 1969.

Think of what’s happened since then:

Public internet
Search
Relational database
Data warehouse
Hadoop distributed file system
NoSQL database
Machine learning
Cloud computing
Data science
Mobile technology
Advanced analytics
Artificial intelligence


...The list goes on, and it gets interesting, doesn’t it? Technology is advancing quickly. The World Economic Forum says as much: humankind is navigating a Fourth Industrial Revolution. Distributed computing is one of the cornerstone capabilities powering everything humanity touches.

So what does distributed computing have to do with a black hole? For one, the image data of the black hole was collected at eight different observation stations across the globe. That data was literally airlifted and shipped to MIT and the Max Planck Institute, where it took more than two years to collate with an algorithm!

It’s hard to believe that, with all the innovation of the past 50-odd years, we are still shipping hard drives around the world to synchronize data manually when consensus algorithms like WANdisco’s DConE can collate it automatically. Aside from the considerable effort to find a black hole and collect the data, the challenge of data gravity very nearly stalled this important scientific discovery.
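To make the contrast concrete, here is a minimal sketch of the quorum idea behind consensus-based replication. This is a toy illustration only, not DConE itself (which is proprietary); the node model, the 90% acknowledgment probability, and the function name are all assumptions invented for this example.

```python
import random

def replicate_with_quorum(update, nodes, rng=random.Random(42)):
    """Toy quorum replication: an update counts as committed once a
    majority of nodes acknowledge it, so no single site has to ship
    hard drives to the others to stay in sync."""
    quorum = len(nodes) // 2 + 1  # majority threshold
    acks = 0
    for node in nodes:
        # Simulate an unreliable WAN link: each node acks with 90% probability.
        if rng.random() < 0.9:
            node.append(update)  # node applies the update to its local log
            acks += 1
    return acks >= quorum  # committed only with majority agreement

# Five simulated observation sites, each with an empty local log.
nodes = [[] for _ in range(5)]
committed = replicate_with_quorum("black-hole-image-chunk-1", nodes)
```

With a fixed random seed the run is deterministic; the point is simply that agreement is reached over the network as data arrives, rather than after the fact by physically moving storage.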

What Are the Mechanics of a Black Hole, You Ask?

Glad you asked. Black holes form when massive stars collapse at the end of their life cycles. Here is how NASA described it in August 2018:

“A black hole is a place in space where gravity pulls so much that even light cannot get out. The gravity is so strong because matter has been squeezed into a tiny space. This can happen when a star is dying.

Because no light can get out, people can't see black holes. They are invisible. Space telescopes with special tools can help find black holes. The special tools can see how stars that are very close to black holes act differently than other stars.”

It’s funny how quickly things can change. Now it’s possible to visualize a black hole. In this way, technological innovation is much like space exploration. Discoveries continue to open doors.

Dr. Katie Bouman sees her work, which allowed the world to see a black hole for the first time, in action


Just Like A Black Hole, Data Exerts a Gravitational Pull

Hadoop data environments, often referred to as data lakes, are experiencing a scenario similar to the formation of a black hole. As these once-bright stars begin to collapse, they exert data gravity on everything around them.

“Einstein thought nature would protect us from the formation of black holes,” writes Janna Levin, contributing columnist for Quanta Magazine, about the black hole discovery. “To the contrary, nature makes them in abundance. When a dying star is heavy enough, gravity overcomes matter’s intrinsic resistance and the star collapses catastrophically.”

Every force has an equal and opposite reaction, and the data proliferating from humankind’s recent digital inventions is creating a force of its own. Many organizations don’t realize that they are creating black holes by allowing data to exert gravity on dependent business applications.

Hadoop’s Impact on Data Gravity

Let’s talk about the solar system of big data, as it has evolved over the last 20 years.

Consider how much the Hadoop Ecosystem has evolved since Google wrote its first paper on MapReduce.

The advent of Hadoop happened in tandem with the advent of the modern “app” ecosystem. Data is proliferating as distributed computing gains power. Today, companies are collecting and analyzing more than just clickstream data—we’re collecting more log file data, behavioral data, mobile data, GPS data, and IoT data than ever before. Digital applications are changing the way that machines interact with the world.

Hadoop reduced the cost and increased the feasibility of storing and analyzing big data. It created a set of standards for on-premises data storage with commodity hardware that became the model for other types of distributed file systems, like object storage. Not only that, a thriving ecosystem of hundreds of services and applications sprang up to support this newly minted distributed file system and recreate the many ways in which we had become familiar with using data. For example, querying services like Apache Hive, IBM’s Big SQL, and Cloudera Impala, and runtimes like Apache Spark, YARN, Tez, and others, proliferated in this new galaxy of vendor solar systems.

It’s this thriving Hadoop solar system that has enabled companies like HM Health Solutions to continuously replicate large, fast-changing distributed data and query it with Hive at significantly lower costs. In the case of HM Health Solutions, WANdisco Fusion provided a platform to expand its primary data lake into a secondary data lake without the risk of data inconsistency or unplanned outages as the data gravity started to exert its pull.

You can learn more about how HM Health Solutions is overcoming data gravity with an always-on data lake here.

New Cloud “Solar Systems” Are Emerging

On-premises implementations are not enough to sustain a business through its transition into the future. It’s this need that has given rise to the entire cloud economy.

Without the right infrastructure in place, data has the potential to implode on itself. Data, without velocity and movement between systems, is dead weight. Data gravity could cause your galaxies and solar systems of data to implode.

Is your company well-situated to avoid a data black hole scenario? Do you need a new North Star? Here’s what you need to consider:

  • How your data “lives” within varying systems in your organization.
  • What it will take to get your data into the cloud.
  • What potential barriers stand in the way of a successful cloud migration.


Joel Horwitz, SVP of marketing, shares his insights in a talk he gave at Strata London 2019; check out his slides here.


About the author


Joel Horwitz
At WANdisco, we value our relationships with industry experts and partners and highly value their educational material. This blog series is made up of their opinions and ideas relevant to our followers, and we support them and respect their personal viewpoints.


For more information on WANdisco, visit http://www.wandisco.com
