Comparing ZFS to the 4.4BSD LFS
Remember the 4.4BSD Log Structured Filesystem? I do. I’ve been thinking
about how the two compare recently. Let’s take a look.
First, let’s recap how LFS works. Open your trusty copies of “The
Design and Implementation of the 4.4BSD Operating System” to
chapter 8, section 3, or go look at the Wikipedia
entry for LFS, or read any of the LFS papers referenced therein. Go
ahead, take your time, this blog entry will wait for you.
As you can see, LFS is organized on disk as a sequence of large
segments, each segment consisting of smaller chunks, the whole forming
a log file of sorts. Each small chunk consists of data and meta-data
blocks that have been modified/added at the time that the chunk was
written. LFS, like ZFS, never overwrites data, excepting superblocks, so
LFS does, by definition, copy-on-write, just like ZFS. In order to be
able to find inodes whose block locations have changed, LFS maintains a
mapping of inode numbers to block addresses in a regular file called the
“ifile.”
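To make the recap concrete, here is a minimal sketch of the idea in Python. It is a toy, not LFS: the class name `ToyLog` and its methods are made up for illustration, there are no segments or indirect blocks, and the inode map lives in memory, whereas the real ifile is itself a regular file written to the log.

```python
class ToyLog:
    """Toy append-only log with an inode map (a stand-in for the LFS ifile)."""

    def __init__(self):
        self.log = []    # append-only list of (kind, payload) blocks
        self.imap = {}   # inode number -> log address of the current inode

    def _append(self, kind, payload):
        self.log.append((kind, payload))
        return len(self.log) - 1

    def write(self, ino, block_no, data):
        # Copy the current block pointers (if any); never modify them in place.
        pointers = {}
        if ino in self.imap:
            pointers = dict(self.log[self.imap[ino]][1])
        # The new data block and the new inode both go to the end of the log.
        pointers[block_no] = self._append("data", data)
        self.imap[ino] = self._append("inode", pointers)

    def read(self, ino, block_no):
        pointers = self.log[self.imap[ino]][1]
        return self.log[pointers[block_no]][1]


fs = ToyLog()
fs.write(7, 0, b"hello")
fs.write(7, 0, b"world")   # appends; the old blocks stay where they were
```

After the second write the log holds four blocks. The first two are still intact but no longer reachable through the map, which is exactly the garbage that a cleaner would later have to deal with.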
That should be enough of a recap. As for ZFS, I assume the reader has
been reading the same ZFS blog entries I have been reading. (By the way,
I’m not a ZFS developer. I only looked at ZFS source code for the first
time two days ago.)
So, let’s compare the two filesystems, starting with generic aspects of
transactional filesystems:
- LFS writes data and meta-data blocks every time it needs to
  fsync()/sync()
- Whereas ZFS need only write data blocks and entries in its intent log
  (ZIL)
This is very important. The ZIL is a compression technique that allows
ZFS to safely defer many writes that LFS could not. Most LFS meta-data
writes are very redundant, after all: writing to one block of a file
implies writing new indirect blocks, a new inode, a new data block for
the ifile, new indirect blocks for the ifile, and a new ifile inode —
but all of these writes are easily summarized as “wrote block #
to replace the
Of course, ZFS can’t ward off meta-data block writes forever, but it can
safely defer them with its ZIL, and in the process stands a good chance
of being able to coalesce related ZIL entries.
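Here is a rough sketch of that deferral idea. To be clear, this is not the real ZIL record format or API; `ToyIntentLog` and everything in it are hypothetical names, and the only point is that many short log records can collapse into a few block writes and a single meta-data update per object when the deferred work is finally done.

```python
class ToyIntentLog:
    """Toy intent log: cheap records now, coalesced block writes later."""

    def __init__(self):
        self.records = []          # short records made durable at fsync() time
        self.metadata_writes = 0   # counts deferred meta-data updates

    def log_write(self, obj, block_no, data):
        # Only a small record is appended; no inode or indirect block is touched.
        self.records.append((obj, block_no, data))

    def commit(self):
        """Apply pending records; a later write to the same block wins."""
        latest = {}
        for obj, block_no, data in self.records:
            latest[(obj, block_no)] = data
        touched_objects = {obj for obj, _ in latest}
        self.metadata_writes += len(touched_objects)  # one update per object
        self.records.clear()
        return latest


zil = ToyIntentLog()
for payload in (b"a", b"b", b"c"):
    zil.log_write(1, 0, payload)    # three rewrites of the same block
zil.log_write(1, 1, b"d")
pending = zil.commit()              # two blocks, one meta-data update
```

Four logged writes end up as two block writes and one meta-data update in this toy, instead of a full set of meta-data blocks per fsync().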
- LFS needs copying garbage collection, a process involving both
  searching for garbage and relocating data surrounding garbage
- Whereas where LFS sees garbage, ZFS sees snapshots and clones
The Wikipedia LFS entry says this about LFS’ lack of snapshots and
clones: “LFS does not allow snapshotting or versioning, even though both
features are trivial to implement in general on log-structured file
systems.” I’m not sure I’d say “trivial,” but, certainly easier than in
more traditional filesystems.
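The "garbage versus snapshot" point is easy to see in a toy model. The sketch below is my own illustration with invented names (`ToyCowStore`, `snapshot`, `garbage`); it says nothing about how ZFS actually tracks block lifetimes. Because a copy-on-write store never overwrites a block, a snapshot amounts to keeping the old root around, and a superseded block is garbage only if no retained root still points at it.

```python
class ToyCowStore:
    """Toy copy-on-write store: blocks are only appended, roots never mutated."""

    def __init__(self):
        self.blocks = []      # append-only block storage
        self.root = {}        # live root: name -> block address
        self.snapshots = {}   # label -> retained old root

    def write(self, name, data):
        self.blocks.append(data)
        new_root = dict(self.root)            # copy; the old root is untouched
        new_root[name] = len(self.blocks) - 1
        self.root = new_root

    def snapshot(self, label):
        self.snapshots[label] = self.root     # just keep a reference

    def read(self, name, label=None):
        root = self.root if label is None else self.snapshots[label]
        return self.blocks[root[name]]

    def garbage(self):
        """Addresses that no live root or snapshot can reach."""
        reachable = set(self.root.values())
        for root in self.snapshots.values():
            reachable.update(root.values())
        return set(range(len(self.blocks))) - reachable


store = ToyCowStore()
store.write("a", b"v1")
store.snapshot("monday")
store.write("a", b"v2")   # block 0 is superseded, yet still part of "monday"
```

Without the `snapshot` call, block 0 would show up in `garbage()`; with it, the same block is simply the Monday version of the file.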
- LFS has an ifile to track inode number to block address mappings. It
  has to, else how to COW inodes?
- ZFS has a meta-dnode; all objects in ZFS are modelled as a “dnode” and
  so dnodes underlie inodes, or rather, “znodes,” and znode numbers are
  dnode numbers. Dnode-number-to-block-address mappings are kept in a
  ZFS filesystem’s object set’s meta-dnode, much as in LFS inode-to-block
  address mappings are kept in the LFS ifile.
It’s worth noting that ZFS uses dnodes for many purposes besides
implementing regular files and directories; some such uses do not
require many of the meta-data items associated with regular files and
directories, such as ctime/mtime/atime, and so on. Thus “everything is a
dnode” is more space efficient than “everything is a file” (in LFS the
“ifile” is a regular file).
- LFS does COW (copy-on-write)
- ZFS does COW too
But:
- LFS stops there
- Whereas ZFS uses its COW nature to improve on RAID-5 by avoiding the
  need to read-modify-write, so RAID-Z goes faster than RAID-5. Of
  course, ZFS also integrates volume management.
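The read-modify-write cost comes straight out of the parity arithmetic. The sketch below uses plain XOR parity over a three-block stripe; the function names are mine, and this is the textbook RAID-5 calculation, not ZFS code. Updating one block in place needs the old data and the old parity read back first, while writing a whole new stripe computes parity from the new data alone, which is the situation a copy-on-write filesystem can arrange for itself.

```python
from functools import reduce


def parity(blocks):
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rmw_parity(old_data, new_data, old_parity):
    """In-place small write: old data and old parity must be read first."""
    return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))


stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
old_parity = parity(stripe)

new_block = b"\xaa\xbb"
# Overwrite in place: two extra reads (stripe[1] and old_parity) before writing.
in_place = rmw_parity(stripe[1], new_block, old_parity)
# Full-stripe write: parity comes from the new data alone, no reads needed.
full_stripe = parity([stripe[0], new_block, stripe[2]])
```

Both routes yield the same parity bytes; the difference is only in how much old state has to be fetched from disk to get there.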
Besides features associated with transactional filesystems, and besides
volume management, ZFS provides many other features, such as:
- data and meta-data checksums to protect against bitrot
- extended attributes
- NFSv4-style ACLs
- ease of use
- and so on