Posts Tagged ‘deduplication’

CDP is a Dog -> Unless it’s UNIFIED with Backup

Friday, March 19th, 2010
  • Originally Posted by Tony Cerqueira on Thu, Jan 14, 2010 @ 03:32 PM

It’s true.  CDP is tremendous technology, offering granular point-in-time restore that backups simply cannot do. But CDP (Continuous Data Protection) has severe retention and data management limitations, so backup is absolutely necessary.

But – why do CDP if you cannot get it FULLY UNIFIED with your backup solution??  I don’t mean “integrated”.  Any moron can “integrate” a CDP product with their legacy backup product (and, many have, mind you). You just tell the people in marketing to make the box look the same, and update the user manual.

THE TRUTH: CDP is ONLY worth doing, if it comes FULLY UNIFIED with a next-generation backup solution (optimally, with inherent deduplication). That way, they share the same data mover, the same repository, the same metadata, the same underlying data structure and supporting infrastructure.

I don’t know of any product besides Cofio’s AIMstor that does this. You get the granularity of CDP with the smart retention flexibility of AIMstor’s next-gen Backup, and all the great policy-driven capabilities that come with it. You can also enable Bare Metal Restore from your backup and CDP sets, which are fully single-instanced for huge capacity savings.

More importantly, because AIMstor auto-classifies data, you can SELECT what you want to CDP, what you want to Backup, and what retention you want for very specific types of data, or whole categories of data. Standalone CDP products are kinda, well, dumb. They like to move . . . everything. Optimal? Uhm, not.
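
To make the idea of selective protection concrete, here is a minimal sketch of a policy table that maps a class of file to a protection method and a retention period. It is not AIMstor's policy engine or its syntax; the `Policy` type, the patterns, and the retention values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Policy:
    pattern: str         # glob pattern matched against the lowercased file name
    method: str          # "cdp", "backup", or "skip" (hypothetical labels)
    retention_days: int  # how long protected copies are kept


# Hypothetical rules, checked in order; the first match wins.
POLICIES = [
    Policy("*.tmp", "skip", 0),     # temporary files: do not protect
    Policy("*.docx", "cdp", 90),    # user documents: continuous protection
    Policy("*.xlsx", "cdp", 90),
    Policy("*", "backup", 30),      # everything else: scheduled backup
]


def classify(filename, policies=POLICIES):
    """Return the first policy whose pattern matches the file name."""
    name = filename.lower()
    for policy in policies:
        if fnmatchcase(name, policy.pattern):
            return policy
    raise ValueError(f"no policy matches {filename!r}")


if __name__ == "__main__":
    for name in ["Report.DOCX", "scratch.tmp", "photo.jpg"]:
        p = classify(name)
        print(name, "->", p.method, p.retention_days)
```

The point of the sketch is only that the choice is made per class of data; a product that sees nothing but raw blocks has no file name or type to base such a choice on.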

So what happens if you buy CDP that is NOT unified with your backup solution? Triple the data movers, double the repository setup and capacity usage, double the overhead to servers and clients, double the admin time, double the infrastructure. Plus, you probably can’t select what you really want, so you will just end up wasting even more resources.  Why do it?

The Legacy Backup Bubble (Part II)

Friday, March 19th, 2010
  • Originally Posted by Tony Cerqueira on Tue, Jan 12, 2010 @ 05:14 PM

Legacy Backup is a major market in the data protection space, and is still going strong. Regardless of its inefficiencies, people still buy it, and add onto their existing Legacy Backup environment. However, users are starting to take notice.

User backup forums often point to the failure of Legacy Backup products to deliver any upstream value, to their typical failure rates as a result of server-dependent architectures, and to their terrible storage inefficiency.

In addition, many environmental factors have crept into the woodwork at user sites (business intelligence, eDiscovery needs, compliance requirements, etc.), and now that the paint is off, people are finally getting a look at what’s underneath the hood of Legacy Backup products. It won’t be long.

Deduplication was a key first mover that really made people question the insanity of Legacy Backup. Why create something so inherently inefficient that it required such a huge level of clean-up? (remember, 20X or greater is the typical deduplication cleanup rate).
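
As a rough illustration of where a figure like 20X can come from, the sketch below single-instances fixed-size chunks by content hash and compares the bytes written by the backup jobs with the bytes actually kept. It is a toy model under stated assumptions (fixed 4 KB chunks, SHA-256 fingerprints, the same full backup repeated 20 times), not the method of any particular deduplication product.

```python
import hashlib


def dedup_ratio(backups, chunk_size=4096):
    """Return logical bytes divided by unique stored bytes.

    `backups` is an iterable of bytes objects, one per backup run.
    Each run is split into fixed-size chunks, and a chunk is stored
    only the first time its SHA-256 fingerprint is seen.
    """
    store = {}   # fingerprint -> chunk length
    logical = 0  # total bytes the backup jobs wrote
    for data in backups:
        for offset in range(0, len(data), chunk_size):
            chunk = data[offset:offset + chunk_size]
            logical += len(chunk)
            store.setdefault(hashlib.sha256(chunk).digest(), len(chunk))
    stored = sum(store.values())
    return logical / stored if stored else 1.0


if __name__ == "__main__":
    # Ten distinct 4 KB chunks make up one "full backup".
    full = b"".join(bytes([i]) * 4096 for i in range(10))
    # Twenty unchanged full backups: every run after the first is pure duplication.
    print(dedup_ratio([full] * 20))
```

In this toy case every run after the first adds nothing new, so the ratio equals the number of retained fulls. Real data changes between runs, so real ratios depend on the change rate and the retention schedule.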

Cloud architectures will soon expose even more inadequacies in the Legacy Backup camp, forcing many vendors to accommodate Cloud storage in strange, non-optimal ways.

Virtual machine sprawl has added more headaches to the Legacy Backup camp because of I/O and overhead issues created by Legacy Backup, and multiplied by VMs.

Additionally, users are becoming more reliant on other tools within the market to make up for the lack of flexible recovery capability of Legacy Backup. CDP, Replication, Bare Metal Restore, and others are coming into play in the mid-market.  So are technologies that help manage information: index/search tools, data classification, policy management, and tools that control data for added layers of security or monitoring.

There are many others, but these ones stick out. When things be

The Legacy Backup Bubble (Part I)

Friday, March 19th, 2010
  • Originally Posted by Tony Cerqueira on Tue, Jan 05, 2010 @ 07:10 PM

The terrible inefficiency of Legacy Backup has created new markets and new companies over the past decade in the storage backup space.  Many are fixes applied to Legacy Backup itself; many others are just another form of Legacy Backup that solves some issues for a key market or vertical. Many have been proven to solve real-world problems caused, of course, by Legacy Backup.

So, what is Legacy Backup?  You are probably using it right now in your data center, your remote office, or your SMB, and most certainly, in your enterprise.  It’s a product that protects your data by doing several things based on a schedule, then sends a copy of some processed data to disk or tape. Unfortunately, it batch copies data, creates massive and unnecessary duplication of data, and has no ability to share its repository, its processes, policies, metadata, data movement, or any of its significant infrastructure with other data protection products (like CDP, Replication, Archive, etc.).

The great thing about inefficiency is that it creates need.  And where there is need, there is opportunity. But the reason for the need, it is now being learned, is that Legacy Backup is the problem.  Like any boom or bubble, Legacy Backup will . . . ultimately . . . pop.

Why Volume CDP and Replication products are so Wasteful

Friday, March 19th, 2010
  • Originally Posted by Fabrice Helliker on Tue, Nov 03, 2009 @ 11:24 AM

I’m often bewildered by the prevalence of volume CDP or volume replication products.  This is the type of replication that works at either the whole disk or the partition level.  At this level, everything that is replicated is a dumb block.  There is no context as to “what” the blocks are . . . so, everything is replicated.

So let’s talk about something fundamental – wasted data transfers, wasted storage, and unnecessary system loads.

First let me describe a real-world problem.  We had AIMstor set up to Backup, Version and CDP an assortment of machines.  We’d select the whole machine so that we could perform point-in-time bare metal restores in conjunction with file versioning of user documents.  Many of the machines were office systems, although what we’ve observed would have been exactly the same for a file server.

We decided to analyze the weekend traffic.  Note: because it was the weekend, we really didn’t expect an awful lot of traffic, as the systems weren’t in use.  We know operating systems can generate noise in the form of unwanted, temporary files, but for this test, we turned “off” all of the filtering within AIMstor.  What shocked us was the incredible amount of data collected over this period that has absolutely zero value.

One system alone generated a staggering 40GB of temporary files.  A large amount of this was created by a virus checker.  Fortunately, because AIMstor works at a very granular level, this type of waste and noise can be easily filtered out.

Take your average Windows OS and you will find a lot of data written to disk that has no value to the business.  The system’s pagefile and prefetch files are constantly being written to.  This is before you apply virus checkers or user applications like Skype (yes, it writes a lot to disk), Temporary Internet Files, etc.

And this is where volume-level replication is so wasteful.  With Volume Replication, everything is transferred and stored.  Factor in a CDP system, and you are looking at capturing, transferring and storing a lot of unnecessary data.
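
The difference can be sketched in a few lines. The example below compares the bytes that would be transferred when every write is replicated against the bytes transferred when writes to known noise locations are filtered out first. The exclusion patterns, paths, and sizes are made up for illustration and are not AIMstor's filter rules or our measured figures.

```python
from fnmatch import fnmatchcase

# Hypothetical exclusion patterns for typical Windows noise.
EXCLUDE = [
    "*/pagefile.sys",
    "*/windows/prefetch/*",
    "*/temp/*",
    "*/temporary internet files/*",
]


def is_noise(path, patterns=EXCLUDE):
    """True if the path matches any exclusion pattern (case-insensitive)."""
    normalized = path.replace("\\", "/").lower()
    return any(fnmatchcase(normalized, pattern) for pattern in patterns)


def bytes_to_transfer(writes, filtered):
    """Sum the sizes of writes that would be sent to the repository.

    `writes` is a list of (path, size_in_bytes) tuples. With
    filtered=False every write is sent, which is what a block-level
    product does because it cannot tell one block from another.
    """
    return sum(size for path, size in writes
               if not (filtered and is_noise(path)))


if __name__ == "__main__":
    # Made-up weekend write log for one office machine.
    writes = [
        ("C:\\pagefile.sys", 2_000_000_000),
        ("C:\\Windows\\Prefetch\\APP.EXE-1234.pf", 50_000),
        ("C:\\Users\\bob\\AppData\\Local\\Temp\\scan001.tmp", 500_000_000),
        ("C:\\Users\\bob\\Documents\\plan.docx", 120_000),
    ]
    print("volume-level:", bytes_to_transfer(writes, filtered=False))
    print("file-level, filtered:", bytes_to_transfer(writes, filtered=True))
```

A filter like this needs file names and paths to work with, which is exactly the context a volume-level product does not have.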

Consider also that every block transferred is a load on the source system, the network and the storage subsystem. There is an awful lot of energy and resources that go into supporting Volume Replication and Volume CDP products . . . for no good reason.

The Fallacy of Integrated Solution Marketing

Friday, March 19th, 2010
  • Originally Posted by Tony Cerqueira on Wed, Sep 16, 2009 @ 10:30 AM

So . . . Company A (which ships, say, a Backup, a Replication, or a File Archive product) acquires Company X (the creator of, say, a CDP product), and then announces their sincere plans to combine both solutions and to deliver huge value to their existing user base.  A few weeks later, they have a brand spanking new CDP software box on the website, a new data sheet showing what seems to be tight integration of Company X Product into Company A Product, and a press release that extols the virtues of this newly integrated solution, promising “Single Pane of Glass” yada yada yada.  Hey, they might even have gotten the GUI from the CDP product to work a tad bit with the Company A Product (always in a meaningless way, but working together nonetheless).

Don’t be fooled.  The game of product integration, the headaches it creates, and the expenses and risks associated with it are all still there.  Understand this: Integrated solutions are not bad. They are necessary, and they are your only choice in many circumstances.

What is bad is the “Marketing Spin” you get from some vendors, that things are “fully integrated” to a level where the products look and work in a “seamless” fashion.

Don’t they know you can go to hell for lying?

Sure, for the customer, now there is one vendor and one throat to choke, but those solutions are still separated in every way that matters.  And sure enough, too many calls to support will soon mean that your most recent investment in the new product from your old supplier will eventually turn into shelf-ware. Your investment is lost, and your problem and pain remain.

Stove-piped solutions that are forced together by the sheer will and cost of vendor-provided professional services are lessons in complexity, poor ROI, and overworked IT staff. Sometimes you have no other choice and must deal with it, regardless of the cost and pain. What gets customers into trouble is their desire to believe vendors who acquire products, and to buy into claims that they are getting a platform instead of several separate products.

“Hey, wait a minute” you say, “these are big, big companies, with hundreds of engineers.  They will make the solutions work together, and I will get the solution that I want.”

That makes sense, until you look at the track records of all major vendors in the storage/data management space.  After hundreds of acquisitions, billions of dollars spent, thousands of infrastructures uprooted and redone, it is still hard to show tight integration between any of the solutions.

It is, however, easy to show the disparities between them: the incompatible metadata, the separate business processes, the redundant repositories, the conflicting data movers, the contradictory data classification schemes, the incompatible policy schemes, and the archaic mindsets that emanate from legacy architectures that date back 15 to 20 years.

I have no beef with honest vendors who go out and give it their best to integrate with other products, or with multiple products they offer, in order to deliver a solution.  We do the same thing at Cofio with some solutions, and of course we have the advantage of a TRULY UNIFIED set of solutions in a single product, AIMstor.  What I have a hard time understanding is how some vendors (you know who you are, big and small) can lie so boldly, mislead customers, and claim unified or tight integration where none exists.