Data Deduplication: 5 Things You Need to Know for Optimal Data Storage Management (Post One)

Along with social media, data redundancy is a primary contributor to explosive data growth. Studies estimate that multiple copies of data require organizations to buy, use, and administer two to fifty times more storage than they’d need with data deduplication.

Initially, data deduplication was used to eliminate redundancy in specific cases such as full backups, email attachments, and VMware images. But duplicated data turns out to be far more pervasive than that. Test and development data multiplies across an organization over time. Replication, backup, and data archiving create multiple data copies scattered across the enterprise, and users often copy data to multiple locations for their own convenience.

Organizations now recognize that, far from being a niche technology, deduplication should be an integrated and mandatory element of their overall IT strategies and data storage management solutions.

There are essentially two ways to reduce the cost of your data storage. First, you can move to a lower-cost storage platform, which brings an additional set of problems of its own. Second, you can use data deduplication to reduce both your data growth and your total required storage.

In this first of two posts, we’ll explore the first two data deduplication best practices for optimal data storage management.

1. Consider the Broad Implications of Deduplication

As with disk-to-disk backup or server virtualization, you don’t want to evaluate deduplication as an isolated product or feature. You must consider its broader implications within the context of your entire data management and storage strategy.

For example, deduplication can be performed at the file, block, and byte levels. You’ll have to consider the tradeoffs for each method, which include computational time, accuracy, level of duplication detected, index size, and in some cases, the scalability of the solution.
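To make the tradeoff between granularity and index size concrete, here is a minimal sketch in Python. It compares only whole-file hashing with fixed-size block hashing; byte-level deduplication is not covered. The function names, the 4 KB block size, and the sample data are all assumptions chosen for illustration and do not describe how any particular product works.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def file_level(files):
    """Whole-file dedup: each distinct file is stored once.

    Returns (bytes stored, index entries).
    """
    index = {}
    for data in files:
        index.setdefault(hashlib.sha256(data).digest(), len(data))
    return sum(index.values()), len(index)


def block_level(files, block_size=BLOCK_SIZE):
    """Fixed-size block dedup: each distinct block is stored once.

    Returns (bytes stored, index entries).
    """
    index = {}
    for data in files:
        for start in range(0, len(data), block_size):
            block = data[start:start + block_size]
            index.setdefault(hashlib.sha256(block).digest(), len(block))
    return sum(index.values()), len(index)


# Hypothetical data set: one file, an exact copy of it, and a version
# in which only the last 4 KB block has been changed.
base = b"".join(bytes([i]) * BLOCK_SIZE for i in range(4))
edited = base[:-BLOCK_SIZE] + bytes([9]) * BLOCK_SIZE
files = [base, base, edited]

print("logical bytes:", sum(len(f) for f in files))
print("file level   :", file_level(files))
print("block level  :", block_level(files))
```

In this toy case, file-level dedup catches only the exact copy, while block-level dedup also catches the unchanged blocks of the edited file. The price is more hashing work and more index entries, which is the kind of tradeoff to weigh when comparing methods.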

Also, consider how you can use deduplication to eliminate tape where it makes sense in your environment. That might be remote offices or any locations where your company doesn’t have trained IT personnel.

2. Learn What Data Does Not Dedupe Well

In the simplest terms, data created by humans (documents, transactions, and email, for example) dedupes well in most dedupe systems. Photos, audio, video, imaging, or data created by computers generally don’t dedupe well, so you should store these data sets on non-deduped storage. Learn what data does not dedupe well in your particular environment, and consider not deduping it. In some situations, you might consider a deduplication solution that can selectively skip certain data sets.
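One rough way to learn what dedupes well in your own environment is to measure a sample of each data set before deciding where it should live. The sketch below is a simplified illustration rather than a sizing tool. It assumes fixed 4 KB blocks, and the names dedupe_ratio and should_dedupe and the 1.5 threshold are hypothetical. Random bytes stand in for media-like data that contains no repeated blocks.

```python
import hashlib
import os


def dedupe_ratio(data, block_size=4096):
    """Logical blocks divided by unique blocks.

    A ratio of 1.0 means no duplicate blocks were found in the sample.
    """
    total = 0
    unique = set()
    for start in range(0, len(data), block_size):
        unique.add(hashlib.sha256(data[start:start + block_size]).digest())
        total += 1
    return total / len(unique) if unique else 1.0


def should_dedupe(data, min_ratio=1.5, block_size=4096):
    """Hypothetical policy: dedupe only if the sample clears min_ratio."""
    return dedupe_ratio(data, block_size) >= min_ratio


# Stand-in for document-like data: the same 3-block file copied 8 times.
document = b"".join(bytes([i]) * 4096 for i in range(3))
office_share = document * 8

# Stand-in for media-like data: random bytes with no repeated blocks.
media_share = os.urandom(4096 * 24)

print("office share ratio:", dedupe_ratio(office_share), should_dedupe(office_share))
print("media share ratio :", dedupe_ratio(media_share), should_dedupe(media_share))
```

A data set whose sample ratio stays close to 1.0 is a candidate for non-deduped storage, or for exclusion if your deduplication solution supports selective policies.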

In our next post, we’ll explore the remaining three data deduplication best practices for optimal data storage management. In the meantime, share your own best practices in the comments field below.

About the Author

Dustin Smith, Chief Technologist

Throughout his twenty-five-year career, Dustin Smith has specialized in designing enterprise architectural solutions. As the Chief Technologist at ASG, Dustin uses his advanced understanding of cloud compute models to help customers develop and align their cloud strategies with their business objectives. His master-level engineering knowledge spans storage, systems, and networking.