With the European Union’s General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA), the need for a more robust data management strategy is painfully obvious. We need to get past the idea that every kilobyte of data an organization has ever created should be accessible worldwide, 24/7, with nothing more than a username and password.
So the question for IT now is: do I have to back up all my data to ensure its availability?
According to the Global Databerg Report, only 15% of the data stored by a typical company is business-critical. The report classified another 52% of stored information as “dark data” because its value was unknown, and described the remaining 33% as “redundant, obsolete or trivial.”
As a quick review, a conventional backup solution typically copies data from one storage system to another (hopefully not tape, although tape is what IT organizations have conventionally used) for the purpose of data recovery. You might need to recover data because the primary copy has become unavailable through hardware failure, accidental deletion, or data corruption, or because you need to roll back to an earlier version. Rolling back to a prior version is a very common request, and IT must provide this kind of data curatorship. Lastly, a fire or other data center disaster is another reason to keep backups, but in that case the backup data needs to be stored off-site.
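To make the copy-and-roll-back idea concrete, here is a minimal Python sketch of a versioned file backup. It assumes the backup target is simply another directory (a hypothetical stand-in for a second storage system) and that versions are numbered sequentially; real backup products add scheduling, checksums, catalogs and off-site replication on top of this.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def list_versions(name: str, backup_root: Path) -> list[int]:
    """Return the version numbers stored for a file, oldest first."""
    version_dir = backup_root / name
    if not version_dir.is_dir():
        return []
    return sorted(int(p.name) for p in version_dir.iterdir() if p.name.isdigit())


def backup_file(src: Path, backup_root: Path) -> Path:
    """Copy src to backup_root/<file name>/<version>, never overwriting an older copy."""
    version_dir = backup_root / src.name
    version_dir.mkdir(parents=True, exist_ok=True)
    versions = list_versions(src.name, backup_root)
    next_version = versions[-1] + 1 if versions else 1
    dest = version_dir / str(next_version)
    shutil.copy2(src, dest)  # copy2 also copies metadata such as the modification time
    return dest


def restore_file(name: str, backup_root: Path, target: Path, version: int | None = None) -> Path:
    """Restore a specific version (or the latest one if version is None) to target."""
    versions = list_versions(name, backup_root)
    if not versions:
        raise FileNotFoundError(f"no backups found for {name}")
    chosen = versions[-1] if version is None else version
    if chosen not in versions:
        raise FileNotFoundError(f"version {chosen} of {name} does not exist")
    shutil.copy2(backup_root / name / str(chosen), target)
    return target
```

Calling `backup_file` after each change and `restore_file(..., version=1)` after an accidental overwrite covers both the "lost primary copy" and the "roll back to a prior version" cases described above.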
Primary business data generally requires a comprehensive data protection strategy. Not only must the data be available on another system for immediate restore, it might also need to reside on a system that can take the place of the primary system in every way, including matching its performance and availability; in other words, a failover scenario. Flash storage is now being used for this strategy. I will discuss this in more detail in a future blog.
But what about data that does not require the same level of service as primary business data? This is data that was created and used for a brief period, is no longer in active use, and probably will not be looked at again for a long time, if ever. It accounts for the 33% of data that the Databerg Report calls redundant, obsolete or trivial, plus a large portion of the 52% that is dark data.
Reference data is a perfect example of this type of data. Log files, earlier versions of files that are no longer needed in production, and the results of previous test runs kept for reference are all good examples of data that should be pushed further downstream. The bottom line: we shouldn’t have to spend as much energy securing, backing up and managing “stale” data, so why not put it in a long-term archive?
Object storage is becoming very popular because of this need. Organizations can move this data downstream into a self-healing object storage system that still protects it and ensures its accessibility, just as a traditional backup system would. The differences are reduced complexity and cost, fewer copies of the same data, and the ease of expanding this storage or migrating it to the cloud. You can find more information on object storage in our other blogs.
An object-based storage system that offers data immutability, versioning, high data durability (often quoted as 11 nines, or 99.999999999%) and geographic distribution of data can meet both the storage needs and the backup needs of this data in a single solution. Data stored in such a system therefore probably does not require the traditional multiple layers of backup copies. Plus, this data has already been backed up at least once while it resided on the primary data tier.
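To put “11 nines” in perspective, here is the usual back-of-the-envelope calculation. It assumes the durability figure is an annual, per-object probability and that object losses are independent; the count of N = 10 million objects is purely illustrative.

```latex
\begin{aligned}
p_{\text{loss}} &= 1 - 0.99999999999 = 10^{-11} && \text{(chance a given object is lost in a year)}\\
\mathbb{E}[\text{objects lost per year}] &= N \, p_{\text{loss}} = 10^{7} \times 10^{-11} = 10^{-4}\\
\text{mean time until one object is lost} &\approx \frac{1}{10^{-4}} = 10^{4}\ \text{years}
\end{aligned}
```

Durability of this kind guards against hardware failure rather than against accidental deletion or malicious changes, which is why the immutability and versioning features listed above matter just as much.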
Companies that can identify the data that really doesn’t need to be on primary storage can achieve significant savings by moving it into a self-protecting object storage system that does not need to be backed up regularly. Expect more information about how to implement this type of strategy in a future blog.
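One simple way to start identifying candidates for archiving is to look for files that have not been modified in a long time. The Python sketch below does that for a directory tree. It uses modification time rather than access time because many file systems are mounted with `noatime` or `relatime`, which makes access times unreliable for this purpose; the age threshold is an assumption you would tune to your own retention rules.

```python
from __future__ import annotations

import time
from pathlib import Path

SECONDS_PER_DAY = 86_400


def find_stale_files(root: Path, max_age_days: float, now: float | None = None) -> list[Path]:
    """Return files under root whose last modification is older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    stale = [
        path
        for path in root.rglob("*")
        if path.is_file() and path.stat().st_mtime < cutoff
    ]
    return sorted(stale)


def total_size(paths: list[Path]) -> int:
    """Sum the sizes, in bytes, of the given files."""
    return sum(p.stat().st_size for p in paths)
```

For example, `total_size(find_stale_files(Path("/data/projects"), max_age_days=365))` (a hypothetical path and a one-year threshold) estimates how many bytes could move off primary storage.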
So for a modern-day data protection strategy, who offers an enterprise-grade object store? There are many great object stores on the market, but according to IDC, NetApp is positioned as a leader in the Worldwide Object-Based Storage 2018 Vendor Assessment report. NetApp StorageGRID is available either as an appliance or as a software-only offering, and it provides enterprise-grade attributes such as client connectivity via standard protocols (NFS/CIFS, plus cloud protocols such as S3 and Swift), a global namespace across sites, comprehensive and automated information lifecycle management (ILM) and dynamic policy management, flexible data protection methods (replication and erasure coding), a fault-tolerant architecture, security and audit capabilities, and integration with cloud and archiving solutions.
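The difference between the two data protection methods mentioned above can be illustrated with the simplest possible erasure code: split an object into k data fragments and add one XOR parity fragment, the same idea as RAID 5. This Python sketch is a generic illustration, not StorageGRID’s actual erasure-coding scheme; production systems use codes such as Reed-Solomon that survive the loss of several fragments.

```python
from __future__ import annotations

from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal fragments (zero-padded) and append one XOR parity fragment."""
    frag_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(frag_len * k, b"\0")
    fragments = [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]
    parity = reduce(xor_bytes, fragments)
    return fragments + [parity]


def rebuild(fragments: list[bytes | None]) -> list[bytes]:
    """Recover at most one missing fragment (marked None) by XOR-ing all survivors."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("a single-parity code can rebuild only one lost fragment")
    repaired = list(fragments)
    if missing:
        survivors = [f for f in fragments if f is not None]
        repaired[missing[0]] = reduce(xor_bytes, survivors)
    return repaired


def decode(fragments: list[bytes], original_length: int) -> bytes:
    """Join the data fragments (everything except the trailing parity) and strip padding."""
    k = len(fragments) - 1
    return b"".join(fragments[:k])[:original_length]
```

With k = 4 this layout consumes 5/4 = 1.25 times the object’s size and survives one lost fragment, whereas keeping three full replicas consumes 3 times the size and survives two lost copies. That capacity trade-off is the usual reason to offer both methods.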
StorageGRID Cloud Tiering and Cloud Mirroring support data tiering to S3-compatible public clouds. StorageGRID also supports metadata search, reporting, and visualization across on-premises and cloud deployments through integration with the Elastic Stack, and search is available for policy management through the product’s ILM framework. StorageGRID provides cloud-architected infrastructure for financial and personal data retention compliance as a single integrated resource across public and private clouds. NOTE: NetApp also offers Cloud Volumes for organizations that want to snap their data directly to the cloud. I will write more about NetApp Cloud Volumes in future blogs.
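Automated lifecycle policies boil down to rules that map an object’s metadata to a placement. The Python sketch below shows that idea with a first-match rule list based on object age and an optional tag. The `ObjectInfo` and `TieringRule` types, the rule names and the destinations (“cloud-tier”, “archive”) are all hypothetical; this is not StorageGRID’s ILM rule syntax or API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class ObjectInfo:
    key: str
    last_modified: datetime
    tags: dict[str, str] = field(default_factory=dict)


@dataclass
class TieringRule:
    name: str
    min_age: timedelta  # the object must be at least this old for the rule to apply
    destination: str  # hypothetical placement name
    required_tag: tuple[str, str] | None = None  # optional (key, value) filter


def choose_placement(obj: ObjectInfo, rules: list[TieringRule],
                     default: str = "primary", now: datetime | None = None) -> str:
    """Return the destination of the first matching rule, or default if none match."""
    now = now or datetime.now(timezone.utc)
    age = now - obj.last_modified
    for rule in rules:
        if age < rule.min_age:
            continue
        if rule.required_tag is not None:
            key, value = rule.required_tag
            if obj.tags.get(key) != value:
                continue
        return rule.destination
    return default
```

Evaluating rules in order and stopping at the first match keeps policies predictable: more specific rules (such as “logs older than 90 days go to the cloud tier”) are listed before broad catch-all rules.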
In summary, IT needs to rethink its approach to the age-old backup and recovery solution. A modern-day data protection solution should:
- Back up only data that truly needs revision control (probably less than 20% of primary data)
- Archive all inactive data (~80%) to on-premises or public cloud object storage
- Provide a secondary data tier from the backup target to be used for testing, analytics or data tiering
- Support data deduplication
- Have built-in data lifecycle management
- Provide near-instant RTO (recovery time objective) and at least hourly RPO (recovery point objective)
- Have built-in malware protection and GDPR notification
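The deduplication requirement in the list above can be illustrated with a minimal content-hash store. This Python sketch is generic rather than any particular product’s implementation: it splits data into fixed-size chunks, names each chunk by its SHA-256 digest, and stores each distinct chunk only once, so repeated content across files or backup versions costs no extra space.

```python
from __future__ import annotations

import hashlib


class DedupStore:
    """Fixed-size-chunk deduplicating store: identical chunks are kept only once."""

    def __init__(self, chunk_size: int = 4096) -> None:
        self.chunk_size = chunk_size
        self.chunks: dict[str, bytes] = {}  # SHA-256 hex digest -> chunk bytes
        self.files: dict[str, list[str]] = {}  # file name -> ordered chunk digests

    def put(self, name: str, data: bytes) -> None:
        """Store data under name, reusing any chunks that are already present."""
        digests = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # only the first copy is stored
            digests.append(digest)
        self.files[name] = digests

    def get(self, name: str) -> bytes:
        """Reassemble a file from its chunk digests."""
        return b"".join(self.chunks[d] for d in self.files[name])

    def logical_bytes(self) -> int:
        """Bytes the stored files would occupy without deduplication."""
        return sum(len(self.chunks[d]) for digests in self.files.values() for d in digests)

    def stored_bytes(self) -> int:
        """Bytes actually kept after deduplication."""
        return sum(len(c) for c in self.chunks.values())
```

Fixed-size chunking is the simplest scheme, but inserting a few bytes near the start of a file shifts every later chunk boundary and defeats the matching; many backup products therefore use variable-size, content-defined chunking instead.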