ZFS in OS X Leopard

Having heard several completely mangled attempts to explain the benefits of ZFS (reliably believed to be a feature in the next version of OS X) in various places (including on This Week in Media a few weeks back), I feel I should probably take a crack at explaining what it is and, more to the point, why it matters to indie filmmakers. Because it does matter, a lot.

This post is background for an upcoming post on building a cheap RAID to hold all those hours of 4K footage you’ll be shooting in a few months.

There’s tons of information about ZFS around the web, but much of it is fairly technical, and as far as I’ve seen there’s almost nothing that explains in concrete terms what makes it different from what we’ve got now, and why it’s so important to IT production workflow.

Basically, ZFS is a much more flexible way of handling data storage than what traditional file systems provide. Traditional file systems work with discrete volumes, which may span one disk, or, with RAID, more than one disk.

With ZFS, instead of having inflexible volumes, you create “storage pools” across multiple physical disks. Pools are flexible; if you create a pool spanning four drives and need to add a fifth drive, you can do that without having to recreate any of the file systems in the pool. You can even remove a drive from a pool, assuming there’s enough space to store all the data without it. With a simple command, ZFS will rearrange your data so the drive you specify is no longer necessary, after which it can be removed.

Drives in the pool can be used for any combination of striping, mirroring, or RAID Z (which is sort of like RAID 5), plus there’s support for hot spares.

With traditional RAID setups, though, if you want to add drives or rearrange things, the whole array has to be backed up and reformatted, or you have to use pricey volume management tools which could take many hours to move your data around, during which time your storage is unavailable. With ZFS, none of this is necessary.

Once a pool is created, file systems can be created and rearranged within that pool extremely easily; it’s nearly as easy as creating or rearranging directories presently is. As many file systems as you’d like to create can share a storage pool.

ZFS also has built-in pervasive checksumming features so it’ll automatically detect if your data gets corrupted (and recover it, if you’ve set things up with some amount of redundancy). And because of its architecture, the data on disk is always in a consistent state, eliminating the need for file system repair utilities and the speed hit associated with journaling.
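To make the checksumming idea concrete, here is a toy sketch in Python. It is not how ZFS is implemented; the function names, the two-copy "mirror", and the use of SHA-256 are all assumptions chosen for illustration. It only shows the principle: keep a checksum separately from the data, verify it on every read, and use a good redundant copy to repair a bad one.

```python
import hashlib


def checksum(block):
    """Hypothetical helper: fingerprint a block of data."""
    return hashlib.sha256(bytes(block)).hexdigest()


def read_with_self_heal(copies, expected):
    """Return data from the first copy that matches the stored checksum,
    and overwrite any corrupted copies with the good data.

    `copies` is a list of bytearrays standing in for mirrored disks.
    """
    good = None
    for copy in copies:
        if checksum(copy) == expected:
            good = bytes(copy)
            break
    if good is None:
        # No redundancy left: corruption is detected but cannot be repaired.
        raise IOError("all copies are corrupt; nothing to recover from")
    for copy in copies:
        if checksum(copy) != expected:
            copy[:] = good  # repair the damaged copy in place
    return good


original = b"frame 0001 of some very expensive footage"
expected = checksum(original)  # stored apart from the data itself

mirror = [bytearray(original), bytearray(original)]
mirror[0][0] ^= 0xFF  # simulate silent corruption on one "disk"

data = read_with_self_heal(mirror, expected)
```

With a single copy, the same check would still notice the damage; it just would have nothing to repair it from, which is why some redundancy matters.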

Right now your options for using ZFS are buying ludicrously expensive Sun gear, or downloading OpenSolaris x86 and trying to piece together, on your own, a system that will work. (I spent some time trying to figure out what hardware would work for an OpenSolaris-based storage server… it’s not easy to find good information.)

Having support in OS X will make things a lot easier. Up next: discussion of how you can leverage ZFS in OS X to get most of the benefits of enterprise-class storage at a fraction of the price.

How not to store data

This blog has been spending a lot of time discussing storage, and will continue to do so. This is, after all, perhaps the largest new workflow challenge with HD, 2K and 4K cameras, for those moving up from the SD world. I’ve already discussed archiving (offline storage) a bit, and I’m planning some posts on online storage. But first, I think a word on how not to store data might be in order.

First off, just to be clear, since there’s some terminology confusion… in the production world, “online” storage doesn’t mean Internet or network storage. It means specifically what we could call “active” storage, almost always hard drives. The drives you actually edit off of, as contrasted with, for instance, tape or DVD backups sitting in your closet, which are “offline” storage.

Onward.

Digital production involves working with large amounts of data. The IT industry has developed a wealth of techniques for managing data in ways that are robust, affordable, and convenient.

Unfortunately, many smaller digital production facilities don’t really have a serious IT guy on staff. Their data management schemes tend to resemble what one would expect if a technique that works well for hobbyists — sticking your footage on external Firewire drives — is scaled up for the entire business. I’ve seen facilities with a dozen or more external drives kicking around, with data haphazardly distributed around on them, and with no set of practices in place to make sure that there are always at least two copies of everything.

You really don’t want to do this. It’s expensive. It wastes a lot of time. And when you’re dealing with RED footage (estimate about one hour per 100 GB with REDCODE RAW 4K), it’s going to get really impractical, because you’ll need multiple external drives per feature.

Far worse, from the perspective of someone with IT experience, who knows how to keep data safe… it’s really just downright scary.

There are far better options. I’ll be posting about some of them this week.

Archiving #2: tape vs. drives

Having basically ruled out optical storage, we’re left with two major archiving options: data tape and hard drives.

The leading high-end data tape format is LTO-3. Drives are available from many vendors, and you can expect to pay upwards of $3500. Tapes are 400 GB, and cost around $55 if you shop around, though you have to buy tapes in 20 packs (over $1000) to get prices like that.

This makes LTO-3 media a lot cheaper than hard drive space; it’s about $0.14/GB, or $0.23 per minute to store REDCODE RAW 4K footage. In contrast, you can get a 500 GB hard drive for around $200, which gives you a price of $0.40/GB or $0.66/minute for that Red footage.

But, of course, you have to take into account that big up-front cost for the tape drive. When does that pay itself off? Let’s do some math.

Assume $480 for a 1 TB hard drive (Hitachi is shipping these in Q1) in an external case (or a bay of a multi-bay external case; much more on these in a later post), and $4000 + $60/tape on the tape side (a price you can get without buying a thousand bucks worth of tape at once).

At these rates, the tape drive pays for itself when you’ve got 12 TB worth of data to store. If you’re storing 24 TB of data, tape is down to $0.32/GB, while hard drive storage is still $0.48/GB. For 40 TB of storage, tape is down to $0.25/GB.
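For anyone who wants to plug in their own prices, the arithmetic can be sketched in a few lines of Python. The figures are the ones assumed in this post ($480 per 1 TB drive, a $4000 tape drive, $60 per 400 GB tape); the function and constant names are made up for the example, and 1 TB is treated as 1000 GB.

```python
import math

TAPE_DRIVE_COST = 4000.0   # one-time cost of the LTO-3 drive
TAPE_COST = 60.0           # per 400 GB tape
TAPE_CAPACITY_GB = 400
DISK_COST_PER_TB = 480.0   # 1 TB hard drive


def disk_cost(tb):
    """Total cost of storing `tb` terabytes on hard drives."""
    return tb * DISK_COST_PER_TB


def tape_cost(tb):
    """Total cost on tape: the drive plus enough whole tapes."""
    tapes = math.ceil(tb * 1000 / TAPE_CAPACITY_GB)
    return TAPE_DRIVE_COST + tapes * TAPE_COST


def break_even_tb():
    """Capacity at which tape and disk cost the same
    (ignoring the rounding up to whole tapes)."""
    tape_media_per_tb = TAPE_COST * 1000 / TAPE_CAPACITY_GB  # $150/TB
    return TAPE_DRIVE_COST / (DISK_COST_PER_TB - tape_media_per_tb)


print(f"break-even: {break_even_tb():.1f} TB")
for tb in (12, 24, 40):
    print(f"{tb} TB: tape ${tape_cost(tb) / (tb * 1000):.2f}/GB, "
          f"disk ${disk_cost(tb) / (tb * 1000):.2f}/GB")
```

The break-even point is just the tape drive's price divided by the per-terabyte savings on media, so a cheaper drive or pricier disks pull it down quickly.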

So, which should you pick? Well, tape is a bit of a hassle vs. hard drive storage (much more on this aspect of archiving in a later post). And 12 TB is a good bit of storage, even for 4K (well, compressed with REDCODE, anyway). It’s enough storage for ~120 hours of footage. If you’re making a documentary or a reality TV program you’ll probably need more storage than that fairly quickly, but that’s enough storage for all the footage comprising ten 100 minute narrative features shot at a 7:1 shooting ratio. It’s probably going to take a while to shoot ten movies, by which time hard drives will probably be more competitive, since hard drive prices tend to drop faster than tape prices.
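Here is a rough Python sketch of that capacity estimate, using this blog's rule of thumb of about 100 GB per hour of REDCODE RAW 4K. The function names are invented for the example.

```python
GB_PER_HOUR = 100  # rule-of-thumb estimate for REDCODE RAW 4K


def hours_of_footage(tb):
    """Hours of footage that fit in `tb` terabytes (1 TB = 1000 GB)."""
    return tb * 1000 / GB_PER_HOUR


def hours_shot(features, runtime_minutes, shooting_ratio):
    """Hours of raw footage shot for a number of features."""
    return features * runtime_minutes * shooting_ratio / 60


capacity = hours_of_footage(12)   # 120.0 hours
needed = hours_shot(10, 100, 7)   # about 116.7 hours
print(f"12 TB holds {capacity:.0f} h; ten features need {needed:.1f} h")
```

Changing the shooting ratio is the quickest way to see how fast a documentary-style shoot would blow past the 12 TB mark.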

Based on these numbers, this one is going to be a tough call for a lot of people.

Archiving #1: optical formats

The RED ONE is clearly going to generate a very large amount of data, even using REDCODE compression. At the 28 MB/s rate quoted for 4K, a minute of footage will be about 1.65 GB.

How do you deal with all of this data? This will be the first in a series of posts discussing archiving; it will address optical storage options. Future posts will address tape and hard drive storage options. The subject of online (working) storage will also be addressed in the future, in another series of posts.

For people with busy schedules, here’s the executive summary. How do you store hundreds of gigabytes of footage cheaply and conveniently on optical discs today? You don’t.

Standard single-layer and dual-layer DVDs can be ruled out immediately. A double layer DVD would only hold a bit over five minutes of footage, which is not remotely practical. But just for the record, storage costs would be about $0.35/minute to store REDCODE 4K footage on double layer DVDs.

Blu-ray and HD DVD have higher capacities. Maybe they’re more plausible? Not at the moment. We can probably write off HD DVD for the same reason we wrote off standard DVDs. A 15 GB single-layer HD DVD disc (the only sort you can presently burn) will only hold about nine minutes of footage, and the drives and media are far more expensive, even per gigabyte. A 15 GB HD DVD disc costs around $18. That works out to about $1.98/minute. Yikes.

Blu-ray is slightly more plausible. Single layer Blu-ray discs hold 25 GB. That’s 15 minutes of footage. Better (people seem to manage numbers in this range with film reels), but still not exactly ideal. A burner will set you back around $900, which is a lot… but a 25 GB blank disc costs around $20, for a cost of about $1.32/minute. This beats out HD DVD, but it’s still quite pricey.

All media prices assume you’re buying in quantity.
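The per-minute figures above come from simple division, sketched here in Python so you can rerun it as prices change. The disc capacities and prices are the ones quoted in this post, the 1.65 GB/minute figure is the REDCODE 4K estimate from the top of the post, and the names are made up for the example.

```python
GB_PER_MINUTE = 1.65  # REDCODE RAW 4K, roughly 28 MB/s

# (capacity in GB, price per blank disc in dollars)
DISCS = {
    "HD DVD (single layer)": (15, 18.0),
    "Blu-ray (single layer)": (25, 20.0),
}


def minutes_per_disc(capacity_gb):
    """Minutes of footage that fit on one disc."""
    return capacity_gb / GB_PER_MINUTE


def cost_per_minute(capacity_gb, price):
    """Media cost to store one minute of footage."""
    return price / minutes_per_disc(capacity_gb)


for name, (capacity_gb, price) in DISCS.items():
    print(f"{name}: {minutes_per_disc(capacity_gb):.1f} min/disc, "
          f"${cost_per_minute(capacity_gb, price):.2f}/min")
```

Note that this counts media only; the price of the burner itself is not included.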

These optical drives also all share another major problem — speed. Even 2x Blu-ray burners — the fastest of the formats — only burn at about 9 MB/second, which means recording a minute of footage takes three minutes. Read performance is similarly slow, making it impossible to play footage back directly from the disc, thus eliminating one of the major advantages optical storage would otherwise have over data tape.
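The speed problem is easy to quantify. This small Python sketch uses the two rates mentioned in this post (28 MB/s for REDCODE 4K, about 9 MB/s for a 2x Blu-ray burner); the function names are hypothetical.

```python
FOOTAGE_RATE_MB_S = 28.0  # REDCODE RAW 4K data rate
BURN_RATE_MB_S = 9.0      # approximate 2x Blu-ray write speed


def burn_minutes_per_footage_minute(footage_rate, burn_rate):
    """How many minutes it takes to burn one minute of footage."""
    return footage_rate / burn_rate


def can_play_in_real_time(read_rate, footage_rate):
    """Playback straight from the disc needs the drive to read
    at least as fast as the footage's data rate."""
    return read_rate >= footage_rate


ratio = burn_minutes_per_footage_minute(FOOTAGE_RATE_MB_S, BURN_RATE_MB_S)
print(f"{ratio:.1f} minutes of burning per minute of footage")
print("real-time playback at 9 MB/s:",
      can_play_in_real_time(9.0, FOOTAGE_RATE_MB_S))
```

A drive would need to sustain roughly three times the 2x rate before playing footage directly from the disc became possible.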

So, while high capacity optical media might seem like the wave of the future, it isn’t practical at this point. Even if all these prices fall by 50% by the time Red starts shipping cameras, it still won’t be very practical.

In the next post in this series, we’ll turn to that stalwart of high-capacity data storage, still going strong in the 21st century: tape.