Posts Tagged ‘CDP’

How Underdogs Win: Real-Time versus Batch Data Protection

Friday, March 19th, 2010
  • Originally Posted by Tony Cerqueira on Wed, Dec 02, 2009 @ 06:02 PM

The New Yorker has a great read for anyone considering the strategic aspects of real-time versus batch processes (from databases to running a girls’ basketball team) in its article titled “How David Beats Goliath: When underdogs break the rules”.

In the world of storage and information management and protection, the parallels to current legacy point products are striking. Today’s leading backup products rest entirely on legacy architectures. They are still, by and large, run as batch processes, are not searchable, do not provide real-time differencing, and have no real-time capability to tie into other data movement or data management capabilities. You could say many of the same things about many other tools used within the IT department.

It would be nice to turn a key and make it all real-time, but that won’t happen. Fundamentally, it requires changes in the way systems, physical or virtual machines, are managed, and in how responsibilities are distributed (if they are). Legacy client/server approaches rely completely on outdated, batch-style policy distribution, where connectivity must remain intact to execute their “server -to- slave server -to- media server -to- client” laundry list of batch “stuff to do”. They need a lot of hand-holding in order for things to happen and for policies to be executed. A short list of issues with legacy products:

o- Batch methods require scans, trawls, polls, etc., all of which drag down resources

o- Batch I/O stacks up fast on VMs, and goes medieval on their host systems

o- Data changes can be discerned, but data touches cannot be tracked

o- Data classification, if any, is after the fact, instead of at “point of creation”

o- Compliance is via “batch” time slices, not real world “second-by-second” views

o- Metadata consistency is always a day late and a dollar short

o- Repository data always has a “window” of difference with primary data

o- Deduplication remains after the fact and separate
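A few of the points above (scan cost, and the “window” of difference between repository and primary data) can be illustrated with a toy simulation. This is only a sketch under stated assumptions: `Repository`, `batch_backup`, and `on_write` are hypothetical names invented for illustration and do not describe how AIMstor or any other product works internally.

```python
from dataclasses import dataclass, field


@dataclass
class Repository:
    """Hypothetical backup target that keeps every version it is handed."""
    versions: dict = field(default_factory=dict)  # path -> list of contents
    operations: int = 0                           # files examined or events handled

    def store(self, path, content):
        self.versions.setdefault(path, []).append(content)


def batch_backup(primary, repo, last_seen):
    """Batch style: examine every file and copy those that changed since the last run."""
    for path, content in primary.items():
        repo.operations += 1                      # cost grows with total file count
        if last_seen.get(path) != content:
            repo.store(path, content)
            last_seen[path] = content


def on_write(primary, repo, path, content):
    """Real-time style: the write itself triggers capture, so no scan is needed."""
    primary[path] = content
    repo.operations += 1                          # cost grows with number of changes
    repo.store(path, content)


def simulate(idle_files=1000):
    edits = ["draft 1", "draft 2", "final"]
    baseline = {f"file{i}.dat": "static" for i in range(idle_files)}

    # Batch: edits land on primary during the day; one scan runs at night.
    batch_primary, last_seen, batch_repo = dict(baseline), dict(baseline), Repository()
    for content in edits:
        batch_primary["report.doc"] = content
    batch_backup(batch_primary, batch_repo, last_seen)

    # Real-time: each edit is captured at the moment it is written.
    rt_primary, rt_repo = dict(baseline), Repository()
    for content in edits:
        on_write(rt_primary, rt_repo, "report.doc", content)

    return batch_repo, rt_repo


if __name__ == "__main__":
    batch_repo, rt_repo = simulate()
    print("batch:", batch_repo.operations, batch_repo.versions["report.doc"])
    print("real-time:", rt_repo.operations, rt_repo.versions["report.doc"])
```

In this toy model the batch pass touches all 1,001 files to find one change and keeps only the last draft, while the event-driven path handles three events and keeps all three versions. Real products are far more involved; the sketch only shows the shape of the trade-off.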

If users want to explore the road of real-time, they will need to seek new solutions outside the realm of their current vendor portfolio, because the leading vendors just have too much invested in existing legacy code bases. New architectures, which provide self-managing nodes together with scalable and distributed storage, are the key to deploying more value across the enterprise on a granular, simple and cost-effective basis. . . . Did I mention . . . uhm . . . AIMstor?

Why Volume CDP and Replication products are so Wasteful

Friday, March 19th, 2010
  • Originally Posted by Fabrice Helliker on Tue, Nov 03, 2009 @ 11:24 AM

I’m often bewildered by the prevalence of volume CDP or volume replication products. This is the type of replication that works at either the whole-disk or the partition level. At this level, everything that is replicated is a dumb block. There is no context as to “what” the blocks are . . . so, everything is replicated.

So let’s talk about something fundamental – wasted data transfers, wasted storage, and unnecessary system loads.

First, let me describe a real-world problem. We had AIMstor set up to Backup, Version and CDP an assortment of machines. We’d select the whole machine so that we could perform point-in-time bare metal restores in conjunction with file versioning of user documents. Many of the machines were office systems, although what we observed would have been exactly the same for a file server.

We decided to analyze the weekend traffic. Because it was the weekend and the systems weren’t in use, we really didn’t expect an awful lot of traffic. We know operating systems can generate noise in the way of unwanted, temporary files, and for this test we had turned “off” all of the filtering within AIMstor. Even so, what shocked us was the incredible amount of data that collected over this period and has absolutely zero value.

One system alone generated a staggering 40GB of temporary files. A large amount of this was created by a virus checker. Fortunately, because AIMstor works at a very granular level, this type of waste and noise can be easily filtered out.

Take your average Windows OS and you will find a lot of data written to disk that has no value to the business. The system’s pagefile and prefetch files are constantly being written to. This is before you add virus checkers, user applications like Skype (yes, it writes a lot to disk), Temporary Internet Files, etc.

And this is where volume-level replication is so wasteful. With Volume Replication, everything is transferred and stored. Factor in a CDP system and you are looking at capturing, transferring and storing a lot of unnecessary data.

Consider also that every block transferred is a load on the source system, the network and the storage subsystem. There is an awful lot of energy and resources that go into supporting Volume Replication and Volume CDP products . . . for no good reason.
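To make the difference concrete, here is a small sketch comparing what a block-level replicator would ship with what a file-aware filter would ship for the same set of writes. The exclusion patterns, paths, and sizes are hypothetical examples chosen for illustration; they are not AIMstor’s actual filter configuration or measured data.

```python
from fnmatch import fnmatchcase

# Hypothetical exclusion patterns, for illustration only.
NOISE_PATTERNS = [
    "*/pagefile.sys",
    "*/windows/prefetch/*",
    "*/temp/*",
    "*/temporary internet files/*",
    "*.tmp",
]


def is_noise(path):
    """Return True if the path matches a pattern with no business value."""
    normalized = path.replace("\\", "/").lower()
    return any(fnmatchcase(normalized, pattern) for pattern in NOISE_PATTERNS)


def volume_level_bytes(writes):
    """Block-level replication has no file context, so every write is shipped."""
    return sum(size for _, size in writes)


def file_level_bytes(writes):
    """File-aware capture can drop noise before it is transferred or stored."""
    return sum(size for path, size in writes if not is_noise(path))


# Made-up (path, bytes written) events from an idle office machine.
WRITES = [
    ("C:\\pagefile.sys", 4096),
    ("C:\\Windows\\Prefetch\\SKYPE.EXE-1A2B3C4D.pf", 512),
    ("C:\\Users\\ann\\AppData\\Local\\Temp\\scan0001.tmp", 8192),
    ("C:\\Users\\ann\\Documents\\budget.xls", 256),
]

if __name__ == "__main__":
    print("volume-level bytes:", volume_level_bytes(WRITES))
    print("file-level bytes:  ", file_level_bytes(WRITES))
```

Only the user document survives the filter in this example. A block-level product cannot make that decision, because it never knows which file a block belongs to.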

The Fallacy of Integrated Solution Marketing

Friday, March 19th, 2010
  • Originally Posted by Tony Cerqueira on Wed, Sep 16, 2009 @ 10:30 AM

So . . . Company A (which ships, say, a Backup, a Replication, or a File Archive product) acquires Company X (the creator of, say, a CDP product), and then announces its sincere plans to combine both solutions and deliver huge value to its existing user base. A few weeks later, they have a brand spanking new CDP software box on the website, a new data sheet showing what seems to be tight integration of Company X Product into Company A Product, and a press release that extols the virtues of this newly integrated solution, promising “Single Pane of Glass” yada yada yada. Hey, they might even have gotten the GUI from the CDP product to work a tad bit with the Company A Product (always in a meaningless way, but working together nonetheless).

Don’t be fooled.  The game of product integration, the headaches it creates, and the expenses and risks associated with it are all still there.  Understand this: Integrated solutions are not bad. They are necessary, and they are your only choice in many circumstances.

What is bad is the “Marketing Spin” you get from some vendors, that things are “fully integrated” to a level where the products look and work in a “seamless” fashion.

Don’t they know you can go to hell for lying?

Sure, for the customer, now there is one vendor and one throat to choke, but those solutions are still separate in every way that matters. And sure enough, too many calls to support will soon mean that your most recent investment in the new product from your old supplier eventually turns into shelf-ware. Your investment is lost, and your problem and pain remain.

Stove-piped solutions that are forced together by the sheer will and cost of vendor-provided professional services are lessons in complexity, poor ROI, and overworked IT staff. Sometimes you have no other choice, and must deal with it, regardless of the cost and pain. What gets customers into trouble is their desire to believe vendors who acquire products, and to buy into the claim that they are getting a platform instead of several separate products.

“Hey, wait a minute” you say, “these are big, big companies, with hundreds of engineers.  They will make the solutions work together, and I will get the solution that I want.”

That makes sense, until you look at the track records of all major vendors in the storage/data management space.  After hundreds of acquisitions, billions of dollars spent, thousands of infrastructures uprooted and redone, it is still hard to show tight integration between any of the solutions.

It is, however, easy to show the disparities between them: the incompatible metadata, the separate business processes, the redundant repositories, the conflicting data movers, the contradictory data classification schemes, the incompatible policy schemes, and the archaic mindsets that emanate from legacy architectures that date back 15 to 20 years.

I have no beef with honest vendors who go out and give it their best to integrate with other products, or with multiple products they offer, in order to deliver a solution. We do the same thing at Cofio with some solutions, and of course we have the advantage of a TRULY UNIFIED set of solutions in a single product, AIMstor. What I have a hard time understanding is how some vendors (you know who you are, big and small) can lie so boldly, mislead customers, and claim unified or tight integration where none exists.