The new bevy of storage solutions is rich with flash and cache. From hard disks with NAND flash installed on the disk controller to SSDs logically configured as extended cache for the array, caching is the new (read: old) I/O acceleration method du jour. But is this a fix-all or a fix-some? I would say that this technology will benefit a small number of use cases, but not all. For the rest of us, it adds another layer of complexity that will require a great deal of planning and testing before we realize the benefits of this fast I/O medium.
Caching in the general sense is the placing of a small, higher-speed memory device in front of a large, slower one. The expected behavior is that the smaller, lower-latency device will service a portion of the I/O requests at a higher rate, which then translates into an increase in the overall average I/O transfer rate. Sounds great, right? This performance increase is only realized if the scenario above actually occurs, and that scenario relies on ‘locality of reference’: the accessing of data that is nearby, or the recurring access of the same data. To benefit from a cache I need a good cache ‘hit’ rate. That is, the data I was looking for is in the cache when I wanted it. This happens when I am doing very sequential operations or re-reading the same data; it happens less often when I am doing random I/O operations. In fact, I could even see my performance drop in an environment with excessive caching and very random I/O, because every miss pays for the cache lookup and staging on top of the trip to the slower device. Ouch.
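To put rough numbers on that, here is a minimal sketch of the average latency for a single cache sitting in front of a slower device. The function name, the latencies, and the per-miss overhead are all my own illustrative assumptions, not measurements from any product.

```python
def effective_latency_us(hit_rate, cache_us, backend_us, miss_overhead_us=0.0):
    """Average latency per I/O for one cache in front of a slower device.

    A hit costs cache_us. A miss costs the backend access plus whatever
    time was spent checking (and later populating) the cache.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    miss_us = backend_us + miss_overhead_us
    return hit_rate * cache_us + (1.0 - hit_rate) * miss_us


# Illustrative numbers only (assumed): 100 us for flash, 8000 us for a
# disk access, and 200 us of extra work on every miss.
FLASH_US, DISK_US, MISS_OVERHEAD_US = 100.0, 8000.0, 200.0

good = effective_latency_us(0.90, FLASH_US, DISK_US, MISS_OVERHEAD_US)
poor = effective_latency_us(0.01, FLASH_US, DISK_US, MISS_OVERHEAD_US)
print(f"90% hits: {good:.0f} us, 1% hits: {poor:.0f} us, no cache: {DISK_US:.0f} us")
```

With these made-up inputs, a 90% hit rate averages about 910 us, while a 1% hit rate averages about 8,119 us, which is slower than the 8,000 us I would have had with no cache at all. That is the ‘Ouch’ case.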
Cache and SSD with an Array Controller
Beyond the problem of ensuring good locality of reference, there is the issue of using SSD inside an enterprise array. These arrays have been designed as caching controllers in front of the slow disk drives in the array. The cache in the array controller sits in the data path and can provide a large, multi-gigabyte DRAM cache, which helps satisfy host I/O requests faster. That said, if I am using SSDs in my enterprise array, I should carefully consider the cache page sizes and the processor load that cache de-staging creates. In some cases ‘write-through’ cache mode may provide as good or better performance. There is also the issue of RAID overhead. Using SSD in an enterprise array will typically introduce a RAID parity overhead that may impact both the performance and the lifespan of the SSDs.
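The RAID overhead is easy to underestimate, so here is a rough sketch using the textbook read-modify-write counts for small writes. The table and the function name are my own illustration; a real controller may do better by coalescing cached writes into full stripes.

```python
# Backend operations per small (sub-stripe) host write: (reads, writes).
# These are the commonly quoted textbook counts, not vendor figures.
SMALL_WRITE_COST = {
    "RAID 0": (0, 1),
    "RAID 1/10": (0, 2),
    "RAID 5": (2, 2),   # read old data + old parity, write new data + new parity
    "RAID 6": (3, 3),   # same idea with two parity blocks
}


def backend_load(raid_level, host_write_iops):
    """Translate host small-write IOPS into backend device operations."""
    reads, writes = SMALL_WRITE_COST[raid_level]
    return {
        "backend_reads": reads * host_write_iops,
        "backend_writes": writes * host_write_iops,
        "backend_total": (reads + writes) * host_write_iops,
    }


for level in SMALL_WRITE_COST:
    print(level, backend_load(level, host_write_iops=10_000))
```

The "backend_writes" figure is the one that matters for SSD lifespan: under these assumptions every small host write on RAID 5 becomes two writes to flash, and three on RAID 6.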
SSD Cache Acceleration
SSD as a cache accelerator is slowly being implemented in enterprise RAID arrays. This is just as it sounds: add an SSD and then allow the array to access it as if it were internal DRAM cache, a quick way to increase the overall cache of the array. This sounds like a great solution, but it requires that the cache be managed closely to ensure it's sized appropriately and gets the utilization that justifies its cost.
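One way to reason about sizing before buying anything is to replay an I/O trace through a simulated cache at several sizes. The sketch below assumes a plain LRU policy and a synthetic trace, both my own choices for illustration; a real array's caching algorithm and a real workload will behave differently.

```python
import random
from collections import OrderedDict


def lru_hit_rate(trace, cache_blocks):
    """Replay a list of block numbers through an LRU cache; return the hit rate."""
    cache = OrderedDict()
    hits = 0
    for block in trace:
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict the least recently used
    return hits / len(trace) if trace else 0.0


def synthetic_trace(n_ios, total_blocks, hot_blocks, hot_fraction, seed=1):
    """hot_fraction of I/Os hit the first hot_blocks blocks, the rest land anywhere."""
    rng = random.Random(seed)
    return [
        rng.randrange(hot_blocks) if rng.random() < hot_fraction
        else rng.randrange(total_blocks)
        for _ in range(n_ios)
    ]


# Hypothetical workloads: one with a small hot set, one purely random.
skewed = synthetic_trace(50_000, total_blocks=100_000, hot_blocks=1_000, hot_fraction=0.9)
uniform = synthetic_trace(50_000, total_blocks=100_000, hot_blocks=1_000, hot_fraction=0.0)

for size in (250, 1_000, 4_000, 16_000):
    print(f"{size:>6} blocks: skewed {lru_hit_rate(skewed, size):.1%}, "
          f"uniform {lru_hit_rate(uniform, size):.1%}")
```

The thing to look for is the knee in the curve: past the size of the hot set, extra cache buys very little for the skewed workload, and for the random workload the hit rate stays low until the cache is a large fraction of the whole data set.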
Flash on Disk
Intel's Smart Response Technology allows an SSD to provide a portion of its space to the host’s disk cache and masks the extra SSD drive letter, thus enhancing the performance of the internal commodity SATA disk drive. This technology can offer great enhancement to the desktop, but it may not be suitable for enterprise implementation. In fact, it may be only a ‘flash’ in the pan. As the interface between the NAND flash and the SATA disk will have to be implemented cheaply, we can expect slow throughput. This may be acceptable on a personal computer to service small bursts of I/O, but it would create issues in an array. In addition, there may be risks to data in a RAID column when multiple drives are using onboard flash that may or may not have de-staged its data to the disk at the moment a hard disk fails.
In all of these examples we are seeing a pattern of stacking cache, not the least of which is the filesystem cache within the servers. In many cases it may not be clear which tier of cache has satisfied any given I/O. As such, tuning those cache sizes and reservations can be quite a task. It is safe to say that if the cache stacking is not managed and designed to be predictable, one can experience some very interesting I/O rates that are far inferior to what one would expect from solid state technology.
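To see how a stack of caches can still disappoint, here is a small sketch that walks an I/O down through the tiers. The tier names, hit rates, and latencies are invented for illustration, and the model is simplified: it charges each I/O only the latency of the tier that finally serves it.

```python
def tier_breakdown(tiers, backend_us):
    """tiers: list of (name, local_hit_rate, latency_us), ordered from the
    application outward. local_hit_rate is the share of requests *reaching*
    that tier which it satisfies.

    Returns (share of all I/O served by each tier, mean latency in us).
    """
    reaching = 1.0      # fraction of all I/O that gets this far down the stack
    shares = {}
    mean_us = 0.0
    for name, hit_rate, latency_us in tiers:
        served = reaching * hit_rate
        shares[name] = served
        mean_us += served * latency_us
        reaching -= served
    shares["backend disk"] = reaching
    mean_us += reaching * backend_us
    return shares, mean_us


# Hypothetical stack; every number here is an assumption.
stack = [
    ("filesystem cache", 0.60, 1.0),
    ("array DRAM cache", 0.50, 200.0),
    ("SSD cache", 0.40, 500.0),
]
shares, mean_us = tier_breakdown(stack, backend_us=8000.0)
for name, share in shares.items():
    print(f"{name:>18}: {share:.0%} of I/O")
print(f"mean latency: {mean_us:.1f} us")
```

With these inputs only about 12% of I/O falls through to disk, yet that 12% accounts for 960 of the roughly 1,041 us average. Note also that each lower tier only sees what the tier above missed, so its locality is usually worse than the raw workload would suggest.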
Data Path Management and Emergent Performance
Management of the I/O starts at the application. If we are to add specific hardware to support specific I/O patterns, then we have to tune the entire data path. We will have to plan the sizing of writes into the cache and the sizing of stripes across our SSD devices. Finally, we need to plan for the I/O processing characteristics of the application itself. Failure to do so will lead to inexplicable performance scenarios.
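As one concrete piece of that planning, here is a minimal sketch that checks whether an application's write size lines up with the stripe geometry. The function names and the 4+1 layout in the example are my own assumptions; check your array's documentation for its real stripe unit.

```python
def full_stripe_bytes(stripe_unit_bytes, data_devices):
    """Bytes of user data in one full stripe (parity devices excluded)."""
    return stripe_unit_bytes * data_devices


def classify_write(offset_bytes, length_bytes, stripe_unit_bytes, data_devices):
    """Return 'full-stripe' if the write covers whole stripes exactly,
    otherwise 'partial-stripe' (which implies read-modify-write on parity RAID)."""
    stripe = full_stripe_bytes(stripe_unit_bytes, data_devices)
    aligned = offset_bytes % stripe == 0 and length_bytes % stripe == 0
    return "full-stripe" if aligned and length_bytes > 0 else "partial-stripe"


KIB = 1024
# Hypothetical 4+1 RAID 5 group with a 64 KiB stripe unit: 256 KiB per full stripe.
print(full_stripe_bytes(64 * KIB, 4) // KIB, "KiB per full stripe")
print(classify_write(0, 256 * KIB, 64 * KIB, 4))       # large aligned write
print(classify_write(4 * KIB, 8 * KIB, 64 * KIB, 4))   # small database-style write
```

If most of the application's writes come back as partial-stripe, either the stripe geometry or the application's I/O size is a candidate for tuning.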
Locality of Reference
So in the final analysis, if you don’t have the right I/O, you may find that these SSD and caching technologies don’t benefit you at all, which would be disheartening after investing in this high-performance storage tier. The way to avoid this scenario is to look hard at your data patterns, allocate SSD and cache where they benefit most (highest locality of reference), then use them sparingly and monitor heavily.