NEWS Coverage

WANdisco sticks Fusion into Amazon's Snowballs for mega-petabyte data pelt

August 01 2017

Replication tech integrated with data truck - yes an actual truck...

WANdisco is integrating its Fusion product with Amazon's Snowball product, which moves massive amounts of data to its public cloud.

Snowball is the AWS method of transporting large amounts of data to its public cloud; data amounts so large that digital transmission across a wide-area network (WAN) would take weeks or more and cost a fortune. Data is transferred to drives and these are transported to an Amazon data centre, where their contents are read and uploaded to the AWS cloud's storage arrays. Vast datasets, up to 45PB, are transported by a truck, a so-called Snowmobile.

WANdisco (Wide-Area Network Distributed Computing) Fusion is replication technology that can handle transactional data and transmit from multiple sources to a destination while the data set at the sources is still in use.

Essentially, what happens is that distributed Paxos algorithm technology, devised by chief scientist Dr Yeturu Aahlad, is used by the several processors involved to register and agree on the order of updates to the global data set. These updates are given a Global Sequence Number (GSN) and that enables them to be applied in sequence (replayed) at the target data centre.

The system can withstand network outages by saving up the registered data events and then having GSNs calculated and the data sent upstream when the network is back up again.
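The ordering idea can be illustrated with a toy example. The sketch below is not WANdisco's implementation and does not implement Paxos; it assumes the GSNs have already been agreed and only shows how a target could hold back updates that arrive early, for instance after an outage clears, and apply them strictly in sequence. The names `Update`, `Target` and `replay_in_order_demo` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Update:
    gsn: int      # global sequence number already agreed for this update
    key: str      # item being changed, e.g. a record id
    value: str    # new value for that item


@dataclass
class Target:
    """Hypothetical replica that applies updates strictly in GSN order."""
    data: dict = field(default_factory=dict)
    next_gsn: int = 1                             # next sequence number to apply
    pending: dict = field(default_factory=dict)   # gsn -> Update, held until its turn

    def receive(self, update: Update) -> None:
        if update.gsn < self.next_gsn:
            return  # already applied; ignore a duplicate delivery
        self.pending[update.gsn] = update
        # Apply every update whose turn has come, in sequence.
        while self.next_gsn in self.pending:
            u = self.pending.pop(self.next_gsn)
            self.data[u.key] = u.value
            self.next_gsn += 1


def replay_in_order_demo() -> dict:
    target = Target()
    # Updates arrive out of order; GSN 1 turns up last.
    arrivals = [
        Update(2, "record-7", "second"),
        Update(3, "record-9", "x"),
        Update(1, "record-7", "first"),
    ]
    for u in arrivals:
        target.receive(u)
    return target.data
```

Nothing is applied until GSN 1 arrives, after which all three updates are applied in order, so `record-7` ends up holding the value written by GSN 2 rather than the one that happened to arrive last.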

An AWS Snowmobile data transfer can be viewed as a network outage, a fairly prolonged one. With Fusion technology installed at both the Snowmobile source site and the AWS destination site, the Snowmobile data can be uploaded to Amazon, a normal internet network link to the dataset established, and the Fusion technology then used to "replay" the intervening updates from the source site against the AWS-held dataset. This ensures eventual consistency between the source and AWS target datasets.
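As a rough sketch of that catch-up step, assume the bulk copy on the truck reflects every update up to some known sequence number, and that the source keeps a log of sequenced updates. The function name `catch_up`, the `snapshot_gsn` parameter and the log layout are all assumptions made for illustration, not Fusion's actual interface.

```python
def catch_up(snapshot: dict, snapshot_gsn: int, log: list) -> dict:
    """Return the target dataset after loading a bulk copy and replaying later updates.

    snapshot      -- the data that travelled on the physical device
    snapshot_gsn  -- last sequence number already reflected in that copy
    log           -- (gsn, key, value) tuples recorded at the source, GSNs unique
    """
    target = dict(snapshot)
    for gsn, key, value in sorted(log):    # tuples sort by GSN first
        if gsn > snapshot_gsn:             # skip updates the bulk copy already contains
            target[key] = value
    return target


# The bulk copy was taken after GSN 2; GSNs 3 and 4 happened while it was in transit.
shipped = {"a": "1", "b": "1"}
source_log = [(4, "a", "3"), (1, "a", "1"), (3, "b", "2"), (2, "b", "1")]
caught_up = catch_up(shipped, 2, source_log)
```

Only the updates made after the copy was taken are replayed, which is what lets the target converge on the source's current state without re-sending the bulk data.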


Why does this matter? If there are two or more updates to a dataset during a network outage then it may not matter if the updates are to different dataset items, aka database records. But if they are to the same record then they need to be applied in sequence, otherwise a disaster might happen.

Suppose the database record is a business's bank balance and it is $1,000,000. Update one is a deposit of $2,000,000 while update two is a withdrawal of $2,000,000. If they are applied in the wrong sequence then the business could find its balance at -$1,000,000, with the bank doing bad things like suspending the account.
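The arithmetic can be checked with a few lines of Python. This is only an illustration of the example above; the helper `apply_in_order` is hypothetical and records the balance after each update so the intermediate state is visible.

```python
def apply_in_order(balance: int, updates: list) -> list:
    """Return the balance after each update; positive = deposit, negative = withdrawal."""
    history = []
    for amount in updates:
        balance += amount
        history.append(balance)
    return history


# Intended order: deposit first, then withdrawal.
correct = apply_in_order(1_000_000, [+2_000_000, -2_000_000])
# Wrong order: withdrawal first, then deposit.
swapped = apply_in_order(1_000_000, [-2_000_000, +2_000_000])
```

Here `correct` is `[3000000, 1000000]` and `swapped` is `[-1000000, 1000000]`. Both orders finish at the same balance, but the wrong order passes through -$1,000,000 on the way, and that intermediate state is what could trigger the bank's response.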

Guaranteed dataset consistency is a really big deal when you absolutely must have consistency. We understand that WANdisco and Amazon are talking about this technology integration to banking institutions interested in moving data to the cloud. The integration will become a core part of WANdisco Fusion and not a separately branded and charged-for item. ®
