- Originally Posted by Tony Cerqueira on Fri, Nov 27, 2009 @ 05:21 PM
Our CTO’s last blog post (Why Volume CDP and Replication products are so Wasteful) drew a few comments from Volume Replication vendors. Since it is Black Friday, a day with a serious shopping theme, I thought I would answer them here and keep things better organized:
Comment 1: I guess it depends on what the volume contains. And the purpose of doing it in the first place. Certainly replicating or CDPing a system volume doesn’t seem to make much sense unless the reason for it is Disaster Recovery at a remote site. But replicating or CDPing a volume that only contains business critical data could be meaningful, particularly in compliance-heavy environments. Were the donkeys nodding?
Answer 1: There are some noisy applications. In this particular example it was Sophos anti-virus. But the OS can be just as noisy all on its own. OS vendors even call out noisy directories that should be excluded from backups, because restoring them has no value and replicating them has an obvious cost. It is also not unusual for an application to create temporary files with no business value on the same volume as the database. You want to replicate all that too?
With Volume CDP grabbing it all, that extra 40GB-50GB per machine gets expensive. Multiply that by a large number of machines, and the overhead is very large. Plus, the extra CPU, energy, and bandwidth spent sending wasteful and unneeded data is another big cost that adds up quickly, and it keeps growing with every machine in the enterprise. That is the essence of the problem with Volume CDP and Replication. It is indiscriminate by nature and grabs everything.
Kind of like a starving barbarian with a big shopping cart at the grocer’s on double-coupon day: she can’t even resist taking the trash with her.
The donkeys weren’t nodding, but they were chuckling.
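To put a rough number on the overhead described in Answer 1, here is a minimal back-of-the-envelope sketch. The 40GB-50GB per machine range comes from the post; the fleet size of 500 machines is purely a hypothetical figure for illustration, and the conversion uses decimal terabytes (1 TB = 1000 GB).

```python
def wasted_capacity_gb(machines: int, waste_per_machine_gb: float) -> float:
    """Total unneeded data replicated across a fleet, in GB.

    The total grows in direct proportion to the number of machines.
    """
    return machines * waste_per_machine_gb


# Hypothetical fleet size (an assumption, not a figure from the post).
FLEET = 500

# 40-50 GB of unneeded data per machine is the range quoted in the post.
low = wasted_capacity_gb(FLEET, 40)
high = wasted_capacity_gb(FLEET, 50)

print(f"{FLEET} machines: {low / 1000:.0f}-{high / 1000:.0f} TB of unneeded data")
```

That total covers storage only; the CPU, energy, and bandwidth spent moving the same data come on top of it.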
Comment 2: Couple of things that are puzzling to me in your blog are the fact that there were 40GB of wasted capacity in a single server during 1 week? That would certainly not be the norm and if it was there would be other useful conversations to have with a client. As for CDP being intelligent enough to distinguish useful data. Great idea and most enterprise CDP solutions will have this ability now or in the near future. Even more important when considering replication is evaluating solutions that will compare data on the local and remote site and deduplicate before replicating the changes across the wire. We have customer examples that were able to shave 70%+ off replicated data!
Answer 2: We are simply saying that it’s a good idea to avoid sending all that unneeded data, in the name of simple logic, speed and efficiency. The only effective way of combating this is by understanding the data (which is what AIMstor has solved).
I’d be interested in seeing how the Volume replication vendors address this. I suggest that they can’t.
The volume replication argument has generally been that the “customer” ought to reconfigure their system to suit the replication technology’s inability to address data types or data classifications. Have a volume for one thing, another volume for another, etc. While it certainly may make sense to partition your system, the point is that customers shouldn’t be forced to because of the failings of the CDP product. Let customers partition storage based on what makes sense for their application, not around the limitations of the volume CDP product.
The fact is also that CDP shouldn’t just be for the application. Why shouldn’t it be used for the system volume, since it provides a good DR image as well? Or something even more radical: why not provide a hybrid, with periodic transfers of some parts of the system and CDP granularity for other parts? Imagine you have a volume that holds both the OS and the application (typical of smaller setups). You could take periodic images of the OS, but CDP the application data. This minimizes the data transmitted and provides a very nice, granular application restore, backed by a safe set of periodic OS images. You also get a big overhead reduction, plus savings on CPU, energy, bandwidth, etc.
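The hybrid idea can be sketched as a small rule table that decides, per file, whether to skip it, capture it in periodic images, or protect it continuously. This is only an illustration of the concept, not how AIMstor or any other product implements it; the paths, the `Mode` names, and the `protection_mode` function are all hypothetical.

```python
from enum import Enum
from fnmatch import fnmatchcase


class Mode(Enum):
    SKIP = "skip"          # no business value: never replicated
    PERIODIC = "periodic"  # captured in scheduled images (e.g. the OS)
    CDP = "cdp"            # every change captured (application data)


# Hypothetical rules for a single volume holding both OS and application.
# The first matching pattern wins; "*" also matches "/" in fnmatch patterns.
RULES = [
    ("/tmp/*", Mode.SKIP),
    ("/var/cache/*", Mode.SKIP),
    ("/data/db/*", Mode.CDP),
    ("*", Mode.PERIODIC),  # everything else, including the OS
]


def protection_mode(path: str) -> Mode:
    """Return the protection mode for a path using first-match-wins rules."""
    for pattern, mode in RULES:
        if fnmatchcase(path, pattern):
            return mode
    return Mode.PERIODIC  # defensive default; the "*" rule already covers this


for p in ("/tmp/scan.tmp", "/data/db/orders.dat", "/etc/hosts"):
    print(p, "->", protection_mode(p).value)
```

The point of the design is that the decision is made per file or per data class, so the customer does not have to repartition volumes to get different treatment for the OS, the application data, and the junk.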
Bringing up the de-duplication topic is interesting too. Understanding the data you are de-duplicating, as we do, substantially increases de-duplication rates. That is also why Data Domain excels: it distinguishes the data boundaries and doesn’t treat everything as a dumb block. It would be good to know how much of that 70% replicated data savings you mention was just white space elimination, which should never have been transferred in the first place. If a lot of it was, I am puzzled, because that approach, which is typical among Volume-approach vendors, amounts to making a mistake and then congratulating yourself for correcting it later.
And that’s supposed to be a “solution”?