I’ve already talked about one of OpenStack’s core building blocks, Cinder, which is OpenStack’s block-based storage feature. Now, I want to take a closer look at Swift, OpenStack’s object-based storage project, and another key component of an OpenStack deployment.
Swift enables the storage and retrieval of data by using a simple API. It stores multiple copies of data according to a configurable replication factor. It also uses something called a consistent hashing ring, which allows the system to remain resilient and scale simply.
If You Liked It Then You Should Have Put a Ring On It
In a Swift ring, the partitions are spread evenly across all of the devices in the cluster. When a ring is created, something called a partition power must be selected. The number of partitions is equal to 2^partition power, and cannot be changed later on. It is a little bit of a commitment, and takes some thought to figure out. The partition power also comes into play when Swift needs to determine where an object is in the ring: it uses that number of bits from the MD5 hash for indexing. When I add capacity to the ring, data moves proportionally to the new storage space. For example, if I had 100 drives and added one more, only about 1/100th of the data would move. This allows me to be flexible when adding to my environment, especially when I’m first building it out and may not have the resources to add large amounts of capacity at any one time.
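To make the partition power a little more concrete, here is a minimal Python sketch of the idea, not Swift's actual code. It assumes the index is taken from the top bits of the first four bytes of the MD5 digest, and it leaves out details a real cluster adds, such as the per-cluster hash prefix and suffix. The names partition_for and moved_fraction, and the example object path, are made up for illustration.

```python
import hashlib
import struct
from fractions import Fraction


def partition_for(path: str, part_power: int) -> int:
    """Map an object path to a partition number in [0, 2**part_power)."""
    if not 1 <= part_power <= 32:
        raise ValueError("part_power must be between 1 and 32 in this sketch")
    digest = hashlib.md5(path.encode("utf-8")).digest()
    # Read the first 4 bytes of the digest as a big-endian unsigned integer.
    top32 = struct.unpack_from(">I", digest)[0]
    # Keep only the top part_power bits of that 32-bit value.
    return top32 >> (32 - part_power)


def moved_fraction(existing_drives: int, added_drives: int) -> Fraction:
    """Share of the data that should land on newly added, equal-size drives."""
    return Fraction(added_drives, existing_drives + added_drives)


if __name__ == "__main__":
    part_power = 10  # 2**10 = 1024 partitions
    print(partition_for("/AUTH_demo/photos/cat.jpg", part_power))
    print(moved_fraction(100, 1))  # the "about 1/100th" from the example above
```

With equally sized drives, going from 100 to 101 means the new drive should end up with 1/101 of the partitions, which is where the "about 1/100th" figure comes from.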
One Ring to Rule Them All?
Not necessarily. Besides the object rings where our objects are stored, there are also rings for account databases and container databases. Swift is driven by storage policies, which are applied to an object ring. I may have different requirements inside of my OpenStack environment. I may have a need for some extremely fast SSDs, but I don’t care that much about their data, so I’ll have a replication factor of 2 perhaps. Or, I may have some critical objects to store on some cheap and deep 6TB SATA disks, and I want 5 copies of those. Each of these would have a different storage policy associated with it, and be assigned to a different container. I could also have multiple containers with the same policy applied to them. Data replication within the ring is ongoing; it checks to ensure we have the required number of copies of data. In the event of something such as a drive failure, replication will realize that it is down a copy, and create a new copy on a device that has not failed.
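The "down a copy, make a new one" behavior can be sketched in a few lines of Python. This is a toy model under my own assumptions, not how Swift's replicator is actually implemented: plan_repairs, the device names, and the simple "first healthy device wins" choice are all hypothetical.

```python
def plan_repairs(holders, failed, all_devices, replicas):
    """Return the devices that should receive a new copy of a partition.

    holders:     devices currently assigned a copy of the partition
    failed:      set of devices known to be down
    all_devices: every device in the (toy) ring, in preference order
    replicas:    replication factor required by the storage policy
    """
    healthy = [d for d in holders if d not in failed]
    needed = max(replicas - len(healthy), 0)
    # Candidates are healthy devices that do not already hold a copy.
    candidates = [d for d in all_devices if d not in failed and d not in healthy]
    return candidates[:needed]


if __name__ == "__main__":
    devices = ["d1", "d2", "d3", "d4", "d5"]
    # Three replicas wanted, one holder has failed: one new copy is needed.
    print(plan_repairs(["d1", "d2", "d3"], {"d2"}, devices, replicas=3))
```

The same function covers both of the policies described above: pass replicas=2 for the fast SSD policy or replicas=5 for the cheap and deep one.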
Do I say Yes or No?
To my current storage environment. That’s a loaded question; it depends on your environment and your business needs. While there are organizations that run Swift clusters on top of storage nodes (commodity servers loaded up with disk), there’s nothing wrong with leveraging your existing storage investments, or an enterprise class storage array. By leveraging NetApp Clustered Data ONTAP, you’re able to use features such as data deduplication and compression. By leveraging a NetApp E-Series device, you can eliminate the need for data replication by using Dynamic Disk Pools, reducing the footprint required. There are pros and cons to every Swift deployment; it is just a matter of backing it up with your business requirements.
This week’s OpenStack summit had a great overview presentation called Swift 101: Technology and Architecture for Beginners, presented by SwiftStack. SwiftStack is a great resource for learning more about Swift, especially getting a good understanding of how the ring works.
Song of the Day – Phantogram – Black Out Days