Object Storage – Tackling Big Data Storage and Protection Challenges
According to industry research firm IDC, there will be over 28 billion connected devices by 2020 thanks to the Internet of Things (IoT). Research firm Gartner estimates that in 2016 alone, connected devices will increase by 30 percent. Add the growing use of business applications, social media, video, and other high-bandwidth media, and you have a big data machine that IDC predicts will grow over 26 percent year-over-year. All of this data, much of it unstructured, needs to be stored, protected, and, depending on your industry, backed up, creating IT challenges that are not easily overcome.
The growth of big data has exposed the bottlenecks and failures of traditional storage methods, especially with regard to unstructured data. Traditional RAID storage schemes are based on parity, and if more drives fail simultaneously than the parity can cover (more than two, even with dual-parity RAID 6), data becomes unrecoverable. RAID is suitable for most storage requirements, but with the sheer volumes that big data places on storage needs, this failure scenario starts to become a real possibility.
Object storage is one way to overcome the bottlenecks and failure possibilities that big data exposes with RAID, especially with unstructured data. Using object storage, each item stored is assigned a unique identifier. Unlike hierarchically stored data, object storage enables the number of stored items to grow beyond the limits of traditional storage systems, while still maintaining the integrity and consistency of the data.
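To make the unique identifier idea concrete, here is a minimal sketch of a flat object namespace. It is an in-memory toy, not any vendor's API: the class name `FlatObjectStore`, its `put` and `get` methods, and the use of a random UUID as the identifier are all assumptions made for illustration.

```python
import uuid
from typing import Optional


class FlatObjectStore:
    """Toy in-memory object store: one flat namespace keyed by unique ID."""

    def __init__(self):
        # object ID -> (data bytes, metadata dict); no directories, no hierarchy
        self._objects = {}

    def put(self, data: bytes, metadata: Optional[dict] = None) -> str:
        # Assumption: the store assigns a random UUID as the unique identifier.
        object_id = uuid.uuid4().hex
        self._objects[object_id] = (data, dict(metadata or {}))
        return object_id

    def get(self, object_id: str) -> bytes:
        return self._objects[object_id][0]


store = FlatObjectStore()
photo_id = store.put(b"raw image bytes", {"content-type": "image/jpeg"})
log_id = store.put(b"sensor reading 42")
print(photo_id, store.get(photo_id))
```

Because every object is reached by its identifier alone, adding more objects never requires restructuring a directory tree, which is the property the paragraph above relies on.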
The primary reason to use object storage is data protection. By design, an object store protects every piece of data placed on it, so no separate backup or copy of the data is needed. Objects are intrinsically fully protected.
Here’s how object storage technology protects data.
All object storage systems are multi-node systems. Each “node” is a rack-mounted server filled with internal disks. None of the disks use RAID protection. Each disk is formatted and controlled by the object store as an individual place to store objects. Nodes are by design spread out across multiple data centers to provide protection in case of a complete site failure.
2. Full Object Copy
Some object stores protect an object by making three (3) copies of it and placing the copies on different nodes. Any individual disk failure or complete node failure is a non-event, as there are always two (2) other copies of the object available. A background scan process runs continuously across the object store system to verify the integrity of each object and build a new copy of the object if any data is damaged or missing. Some object stores allow the protection level to be changed to a number of copies other than three.
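The full object copy approach described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not how any particular product works: the `ReplicatedStore` class, the hash-based choice of nodes, and the SHA-256 integrity check are all hypothetical, and the "background scan" is a single pass you call by hand rather than a process that is always running.

```python
import hashlib

COPIES = 3  # default protection level described in the text


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class ReplicatedStore:
    def __init__(self, node_names):
        self.nodes = {name: {} for name in node_names}  # node -> {object_id: bytes}
        self.expected = {}  # object_id -> SHA-256 recorded at write time

    def put(self, object_id: str, data: bytes, copies: int = COPIES) -> None:
        if copies > len(self.nodes):
            raise ValueError("not enough nodes for the requested number of copies")
        # Assumption: pick distinct nodes by ranking a hash of (object, node).
        ranked = sorted(self.nodes, key=lambda name: digest(f"{object_id}:{name}".encode()))
        for name in ranked[:copies]:
            self.nodes[name][object_id] = data
        self.expected[object_id] = digest(data)

    def _healthy_holders(self, object_id: str) -> list:
        want = self.expected[object_id]
        return [name for name, disk in self.nodes.items()
                if object_id in disk and digest(disk[object_id]) == want]

    def get(self, object_id: str) -> bytes:
        holders = self._healthy_holders(object_id)
        if not holders:
            raise KeyError(object_id)
        return self.nodes[holders[0]][object_id]

    def scan_and_repair(self, copies: int = COPIES) -> int:
        """One pass of the integrity scan; returns how many copies were rebuilt."""
        rebuilt = 0
        for object_id in self.expected:
            holders = self._healthy_holders(object_id)
            if not holders:
                continue  # every copy is gone; nothing left to rebuild from
            good_copy = self.nodes[holders[0]][object_id]
            for name, disk in self.nodes.items():
                if name in holders:
                    continue
                disk.pop(object_id, None)  # discard a damaged copy, if any
                if len(holders) < copies:
                    disk[object_id] = good_copy
                    holders.append(name)
                    rebuilt += 1
        return rebuilt


store = ReplicatedStore(["node-a", "node-b", "node-c", "node-d", "node-e"])
store.put("obj-1", b"payload")
holders = [name for name, disk in store.nodes.items() if "obj-1" in disk]
store.nodes[holders[0]].clear()          # simulate a complete node failure
survived = store.get("obj-1")            # still readable from another copy
rebuilt = store.scan_and_repair()        # scan restores the third copy
holders_after = [name for name, disk in store.nodes.items() if "obj-1" in disk]
```

Losing one node leaves two healthy copies, so the read succeeds, and the scan then brings the object back to three copies on whichever nodes are available.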
An erasure code is an advanced mathematical process to protect data. Here’s how the process works:
- An object is run through a math process to increase its size with redundant data from the object itself.
- The inflated object is sliced into multiple pieces.
- The different pieces are scattered across all the nodes (and individual disks within each node) in the system.
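The three steps above can be illustrated with the simplest possible erasure code: a single XOR parity slice. This is a toy stand-in, not the math a production object store uses (those typically rely on stronger codes that survive many lost pieces), and it slices the object first and then adds the redundant piece, which has the same effect as inflating and then slicing. The function names `encode`, `scatter`, and `decode` and the node names are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list:
    """Split data into k equal slices and append one XOR parity slice."""
    slice_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(slice_len * k, b"\0")
    slices = [padded[i * slice_len:(i + 1) * slice_len] for i in range(k)]
    parity = reduce(xor_bytes, slices)  # the redundant data
    return slices + [parity]


def scatter(pieces: list, nodes: list) -> dict:
    """Round-robin placement: piece index -> node name."""
    return {i: nodes[i % len(nodes)] for i in range(len(pieces))}


def decode(pieces: list, original_len: int) -> bytes:
    """Rebuild the data when at most one piece is missing (marked None)."""
    missing = [i for i, piece in enumerate(pieces) if piece is None]
    if len(missing) > 1:
        raise ValueError("single parity can only rebuild one lost piece")
    pieces = list(pieces)
    if missing:
        survivors = [piece for piece in pieces if piece is not None]
        pieces[missing[0]] = reduce(xor_bytes, survivors)
    return b"".join(pieces[:-1])[:original_len]  # drop parity and padding


data = b"unstructured sensor data"
pieces = encode(data, k=4)  # 4 data slices + 1 parity slice
placement = scatter(pieces, ["node-a", "node-b", "node-c", "node-d", "node-e"])
pieces[2] = None  # simulate losing the node that holds piece 2
restored = decode(pieces, len(data))
```

With one parity slice, any four of the five pieces are enough to rebuild the object. Real schemes add more redundant pieces so that several can be lost at once.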
The goal of these steps is to provide a protection scheme that can be described as #/#/#. If you take 5/7/13 (five/seven/thirteen) as an example, that means:
- The object was expanded and sliced into thirteen (13) pieces.
- Only 7 of 13 are needed to have a read-write version of all the data in the object, due to the math used to create the thirteen pieces.
- Only 5 of 13 pieces are needed to have a read-only copy of the data.
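As a quick illustration of how to read a #/#/# scheme, the sketch below maps the number of surviving pieces to the access level described above. The function name `access_level` and its return strings are made up for this example; only the 5/7/13 thresholds come from the text.

```python
def access_level(surviving: int, read_only: int = 5,
                 read_write: int = 7, total: int = 13) -> str:
    """Return what the object supports with `surviving` pieces still reachable."""
    if not 0 <= surviving <= total:
        raise ValueError("surviving pieces must be between 0 and total")
    if surviving >= read_write:
        return "read-write"
    if surviving >= read_only:
        return "read-only"
    return "unavailable"


# Walk through losing pieces one at a time in a 5/7/13 scheme.
for lost in range(0, 10):
    print(f"{lost} pieces lost -> {access_level(13 - lost)}")
```

In other words, a 5/7/13 object stays fully usable with up to six pieces lost and stays readable with up to eight lost.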
By copying the “slices” to as many geographically dispersed nodes as possible, object storage technology provides full protection against any disk, node, or site failure.
Different protection schemes are available that use much larger numbers. (The 5/7/13 scheme example above is very small.)
Again, a background scan process is always running to ensure the integrity of each slice, that the slices are present, and to rebuild missing slices as necessary.
Erasure codes allow full protection while using storage that’s 160% to 180% of the original size of the files, compared with full object copy, which in the normal three-copy mode takes 300% of the original size of the files.
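To put those percentages in capacity terms, here is a small back-of-the-envelope calculation. The 100 TB data set is a hypothetical figure chosen for illustration, and `raw_capacity_tb` is a made-up helper; the percentages are the ones quoted above.

```python
def raw_capacity_tb(logical_tb: float, footprint_percent: float) -> float:
    """Raw disk needed to store `logical_tb` at a given protection footprint."""
    return logical_tb * footprint_percent / 100


logical_tb = 100  # hypothetical amount of file data to protect

three_copy = raw_capacity_tb(logical_tb, 300)    # full object copy, three copies
erasure_low = raw_capacity_tb(logical_tb, 160)   # erasure code, low end
erasure_high = raw_capacity_tb(logical_tb, 180)  # erasure code, high end

print(f"three copies:  {three_copy:.0f} TB raw")
print(f"erasure codes: {erasure_low:.0f} to {erasure_high:.0f} TB raw")
print(f"raw capacity saved: {three_copy - erasure_high:.0f} to {three_copy - erasure_low:.0f} TB")
```

For the same protected data, the erasure-coded layout in this example needs a little over half the raw disk that three full copies would.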
Object storage is designed from the outset to scale to petabyte (PB) sizes and beyond. Multi-site designs are not only preferred but required to provide full protection. The number of sites in a particular object storage system is arbitrary: three (3) sites is considered the smallest implementation, and the number of sites in a single system can range up into the dozens.
As big data continues to challenge IT storage demands, object storage will continue to make inroads in how unstructured data is stored and protected.