Comparing ZFS to the 4.4BSD LFS

Remember the 4.4BSD Log Structured Filesystem? I do. I’ve been thinking
about how the two compare recently. Let’s take a look.

First, let’s recap how LFS works. Open your trusty copies of “The
Design and Implementation of the 4.4BSD Operating System” to
chapter 8, section 3, or go look at the Wikipedia entry for LFS, or
read any of the LFS papers referenced therein. Go ahead, take your
time, this blog entry will wait for you.

As you can see, LFS is organized on disk as a sequence of large
segments, each segment consisting of smaller chunks, the whole forming
a log file of sorts. Each small chunk consists of the data and meta-data
blocks that were modified or added at the time the chunk was written.
LFS, like ZFS, never overwrites data (superblocks excepted), so LFS is,
by definition, a copy-on-write filesystem, just like ZFS. In order to
find inodes whose block locations have changed, LFS maintains a mapping
of inode numbers to block addresses in a regular file called the
“ifile.”
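
To make the layout concrete, here is a rough sketch in C of what a
segment summary and an ifile entry boil down to. These are not the real
4.4BSD structures; the field names and widths are made up for
illustration.

    #include <stdint.h>

    typedef uint64_t daddr64_t;      /* on-disk block address */

    /* One entry per block written in a partial segment ("chunk"). */
    struct lfs_block_info {
        uint32_t inode_number;       /* which file the block belongs to */
        int32_t  logical_block;      /* which block of that file */
    };

    /* Header written at the front of each partial segment. */
    struct lfs_segment_summary {
        uint32_t checksum;           /* over the summary and the blocks that follow */
        uint32_t block_count;        /* number of lfs_block_info entries that follow */
        uint64_t sequence;           /* ordering of partial segments in the log */
        /* struct lfs_block_info blocks[block_count]; follows on disk */
    };

    /*
     * The ifile is a regular file whose Nth entry records the current
     * disk address of inode N, since inodes move every time they are
     * rewritten.
     */
    struct ifile_entry {
        uint32_t  version;           /* bumped when the inode number is reused */
        daddr64_t inode_address;     /* where the latest copy of the inode lives */
    };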

That should be enough of a recap. As for ZFS, I assume the reader has
been reading the same ZFS blog entries I have been reading. (By the way,
I’m not a ZFS developer. I only looked at ZFS source code for the first
time two days ago.)

So, let’s compare the two filesystems, starting with generic aspects of
transactional filesystems:

  • LFS writes data and meta-data blocks every time it needs to
    fsync()/sync()
  • Whereas ZFS need only write data blocks and entries in its intent log
    (ZIL)

This is very important. The ZIL is a compression technique that allows
ZFS to safely defer many writes that LFS could not. Most LFS meta-data
writes are very redundant, after all: writing to one block of a file
implies writing new indirect blocks, a new inode, a new data block for
the ifile, new indirect blocks for the ifile, and a new ifile inode —
but all of these writes are easily summarized as “wrote block #X to
replace the Yth block of the file whose inode # is Z.”

Of course, ZFS can’t ward off meta-data block writes forever, but it can
safely defer them with its ZIL, and in the process stands a good chance
of being able to coalesce related ZIL entries.
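
To illustrate the point, here is a hypothetical intent-log record (not
ZFS’s actual ZIL record format) that captures everything the cascade of
LFS meta-data writes above would have to convey:

    #include <stdint.h>

    /* Hypothetical intent-log record summarizing a single write; the
     * deferred meta-data rewrites get batched into a later transaction
     * group instead of being forced out at fsync() time. */
    struct intent_log_write_record {
        uint64_t object_number;      /* which file (dnode) was written */
        uint64_t offset;             /* where in the file */
        uint64_t length;             /* how much */
        uint64_t new_block_address;  /* where the new data block landed */
    };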

  • LFS needs copying garbage collection, a process that involves both
    searching for garbage and relocating the live data that surrounds it
  • Whereas where LFS sees garbage, ZFS sees snapshots and clones

The Wikipedia LFS entry says this about LFS’ lack of snapshots and
clones: “LFS does not allow snapshotting or versioning, even though both
features are trivial to implement in general on log-structured file
systems.” I’m not sure I’d say “trivial,” but it’s certainly easier than
in more traditional filesystems.

  • LFS has an ifile to track inode-number-to-block-address mappings. It
    has to, else how could it COW inodes?
  • ZFS has a meta-dnode; all objects in ZFS are modelled as a “dnode,”
    so dnodes underlie inodes, or rather, “znodes,” and znode numbers are
    dnode numbers. Dnode-number-to-block-address mappings are kept in a
    ZFS filesystem’s object set’s meta-dnode, much as inode-to-block-address
    mappings are kept in the LFS ifile.

It’s worth noting that ZFS uses dnodes for many purposes besides
implementing regular files and directories; some such uses do not
require many of the meta-data items associated with regular files and
directories, such as ctime/mtime/atime, and so on. Thus “everything is a
dnode” is more space efficient than “everything is a file” (in LFS the
“ifile” is a regular file).
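
Here is a rough sketch in C of the two lookup paths, with hypothetical
helper functions standing in for the real on-disk walks; none of these
names exist in LFS or ZFS.

    #include <stdint.h>

    typedef uint64_t daddr64_t;

    struct ifile_entry {
        uint32_t  version;
        daddr64_t inode_address;
    };

    /* Hypothetical helpers standing in for the real on-disk reads. */
    extern void      read_ifile_entry(uint32_t inode_number,
                                      struct ifile_entry *out);
    extern daddr64_t meta_dnode_lookup(uint64_t object_number);

    /* LFS: entry N of the ifile (a regular file) holds inode N's address. */
    daddr64_t
    lfs_inode_address(uint32_t inode_number)
    {
        struct ifile_entry entry;

        read_ifile_entry(inode_number, &entry);
        return entry.inode_address;
    }

    /* ZFS: the object set's meta-dnode maps object (dnode) numbers to dnodes. */
    daddr64_t
    zfs_dnode_address(uint64_t object_number)
    {
        return meta_dnode_lookup(object_number);
    }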

  • LFS does COW (copy-on-write)
  • ZFS does COW too

But:

  • LFS stops there
  • Whereas ZFS uses its COW nature to improve on RAID-5 by avoiding the
    need to read-modify-write, so RAID-Z goes faster than RAID-5 (see the
    sketch after this list). Of course, ZFS also integrates volume
    management.
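
The read-modify-write that ZFS avoids is the classic RAID-5 small-write
penalty. A quick sketch of the parity arithmetic (not ZFS code, and
RAID-Z’s variable-width stripes are more involved than this): updating a
single block in place requires reading the old data and old parity
before writing, whereas a full-stripe write computes parity purely from
the data already in memory.

    #include <stddef.h>
    #include <stdint.h>

    /* RAID-5-style partial-stripe update: P_new = P_old ^ D_old ^ D_new,
     * so the old data and old parity must be read back first. */
    void
    raid5_update_parity(uint8_t *parity, const uint8_t *old_data,
                        const uint8_t *new_data, size_t len)
    {
        for (size_t i = 0; i < len; i++)
            parity[i] ^= old_data[i] ^ new_data[i];
    }

    /* Full-stripe write: parity comes straight from the new data, no reads. */
    void
    full_stripe_parity(uint8_t *parity, const uint8_t *const *data_blocks,
                       size_t nblocks, size_t len)
    {
        for (size_t i = 0; i < len; i++) {
            uint8_t p = 0;
            for (size_t b = 0; b < nblocks; b++)
                p ^= data_blocks[b][i];
            parity[i] = p;
        }
    }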

Besides features associated with transactional filesystems, and besides
volume management, ZFS provides many other features, such as:

  • data and meta-data checksums to protect against bitrot
  • extended attributes
  • NFSv4-style ACLs
  • ease of use
  • and so on

~ by nico on December 7, 2005.
