When organizations today put programs in place for data storage management, they need to account for multiple copies of data, especially if they aren’t following data deduplication best practices. In fact, studies estimate that multiple copies of data require organizations to buy, use, and administer two to fifty times more storage than they’d need without data deduplication.
In our first blog post on data deduplication best practices we discussed the first two: the broad implications of data deduplication and understanding what data does not dedupe well. Here we’ll explore the remaining three things you need to know for optimal data storage management.
3. Don’t Obsess Over Space Reduction Ratios
The length of time that data is retained affects data deduplication ratios: the more retained data a system examines when deduplicating new data, the more likely it is to find duplicates and increase space savings.
While you should closely examine the deduplication ratio when you’re comparing multiple products, try not to overanalyze it once your system is up and running. Rather than performing more frequent full backups just to get a better data deduplication ratio, consider increasing your backup retention period for your on-disk data store. Once you have your first set of backups on disk, each further backup you add to that same deduped system takes up far less space than the first, which makes longer on-disk retention a practical alternative to sending those backups to tape.
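To make the link between retention and ratio concrete, here is a minimal sketch of the arithmetic. It assumes a deliberately simple model that is not taken from any vendor: every full backup has the same logical size, and a fixed fraction of that data is new each time. The function name `dedup_ratio` and the parameters `full_size_gb` and `change_rate` are illustrative only.

```python
def dedup_ratio(num_backups: int, full_size_gb: float, change_rate: float) -> float:
    """Logical-to-physical ratio after `num_backups` retained full backups.

    Assumed model: the first backup is stored in full, and each later
    backup adds only `change_rate` of its size as new unique data.
    """
    if num_backups < 1:
        raise ValueError("num_backups must be at least 1")
    # Logical size: what the backups would occupy without deduplication.
    logical = num_backups * full_size_gb
    # Physical size: the first full plus the changed portion of each later one.
    physical = full_size_gb + (num_backups - 1) * change_rate * full_size_gb
    return logical / physical


if __name__ == "__main__":
    # Hypothetical 1,000 GB full backup with 2% change between backups.
    for retained in (1, 4, 12, 52):
        ratio = dedup_ratio(retained, 1000.0, 0.02)
        print(f"{retained:>2} retained fulls -> {ratio:.1f}:1")
```

Under these assumptions the ratio climbs simply because more backups are retained, not because the system is deduplicating any better. That is one reason a higher ratio on its own says little once the system is in production.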
4. Don’t Use Multiplexing if You’re Backing up to a VTL
If you’re backing up to a virtual tape library (VTL), don’t use multiplexing. Even if your deduplication solution can de-multiplex data, consider turning multiplexing off in your backup software. Often a carryover practice from writing to physical tapes, multiplexing merely wastes computing cycles that could otherwise be used to dedupe your data faster. For example, instead of multiplexing ten backups to two virtual tape drives, create twenty virtual tape drives and turn off multiplexing.
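One way to see why a deduplication system has to spend cycles de-multiplexing is to look at what interleaving does to chunk boundaries. The sketch below is a toy model, not any product's algorithm: it assumes fixed-size chunking with an 8-byte chunk, two hypothetical clients whose data never changes, and a multiplexed stream whose interleave pattern differs from one night to the next. All names (`chunk_hashes`, `multiplex`, `new_chunks`) are made up for illustration.

```python
import hashlib

CHUNK_SIZE = 8  # toy chunk size in bytes; real systems use far larger chunks


def chunk_hashes(stream: bytes) -> set:
    """Return the set of hashes of each fixed-size chunk in a stream."""
    return {
        hashlib.sha256(stream[i:i + CHUNK_SIZE]).hexdigest()
        for i in range(0, len(stream), CHUNK_SIZE)
    }


def multiplex(a: bytes, b: bytes, block: int) -> bytes:
    """Interleave two equal-length streams in blocks of `block` bytes."""
    out = bytearray()
    for i in range(0, len(a), block):
        out += a[i:i + block]
        out += b[i:i + block]
    return bytes(out)


def new_chunks(store: set, stream: bytes) -> int:
    """Add a stream's chunks to the store and return how many were new."""
    hashes = chunk_hashes(stream)
    added = len(hashes - store)
    store |= hashes  # in-place update of the caller's set
    return added


# Two hypothetical clients whose data is identical on both nights.
client_a = bytes(range(0, 64))
client_b = bytes(range(64, 128))

# Separate streams: night 1 fills the store, night 2 repeats the same data.
separate_store = set()
for stream in (client_a, client_b):
    new_chunks(separate_store, stream)
separate_night2 = sum(new_chunks(separate_store, s) for s in (client_a, client_b))

# Multiplexed: the interleave pattern shifts between nights (block 8, then 4).
muxed_store = set()
new_chunks(muxed_store, multiplex(client_a, client_b, 8))
muxed_night2 = new_chunks(muxed_store, multiplex(client_a, client_b, 4))

print("new chunks on night 2, separate streams:", separate_night2)
print("new chunks on night 2, multiplexed:", muxed_night2)
```

In this toy setup the separate streams add no new chunks on the second night, while every chunk of the re-interleaved stream looks new even though neither client's data changed. A system that de-multiplexes first can recover the duplicates, but only by doing extra work that separate virtual drives avoid.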
5. Pilot Multiple Systems Before Selecting Your Solution
Before selecting your deduplication solution, try to pilot several deduplication systems in your environment. While current vendors offer many good solutions and various deduplication approaches, you may also find some products with real limitations. Only by comparing multiple products can you best determine the optimum approach for deduping your data, whether it’s inline, post-process, target-side, client-side, via backup software, etc.
Common challenges of deploying a deduplication solution involve problems related to performance, increased complexity of management, and proliferation of deduplicated data silos. To avoid unnecessary complications, first ensure ease of integration into your existing environment and get customer references in your industry. Take time to understand the vendor’s roadmap, but test everything. Once you’ve selected your data deduplication solution, make sure you follow the best practices suggested by your deduplication solution vendor.
When evaluating deduplication solutions, look for the following essential features:
- Ability to scale without expensive hardware upgrades
- More recovery points with shorter recovery times
- Point-and-click deduplication management
- Built-in reporting of deduplication across vendors, data types, sources and platforms
- Tight integration with all necessary applications to minimize end-user downtime
- Single solution simplicity for ease of deployment and administration
- Ability to rapidly and securely recover business-critical data across all locations, applications, storage media and points-in-time
- D2D2T-optimized for backup performance and reliable data recovery
- Fast, comprehensive search to aid in recovery
- Data integrity and security features
- Built-in disaster recovery capabilities
- Data classification
- Cost-effective and timely eDiscovery
- A common technology platform
- Single point of management
Armed with these five deduplication best practices, you can improve the efficiency of your data storage management.