Hello all,
We run two Lustre-on-ZFS filesystems with Lustre 2.4.0 and ZFS 0.6.2
(the defaults from the ZFS on Linux repo). One filesystem has roughly
150GB of metadata and the other has 6GB. I've been using ZFS
send/receive functionality to make full/incremental backups of the
metadata on a daily/hourly basis. However, this has become more and more
difficult to do as the size of our metadata has increased.
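For context, the backup scheme is the usual snapshot-then-send/receive pipeline; a minimal sketch of what we do (pool, dataset, and host names here are placeholders, not our actual configuration):

```shell
# Full backup: snapshot the metadata dataset and stream it to the
# backup server. "metapool/meta" and "backup-mds" are placeholder names.
zfs snapshot metapool/meta@daily-1
zfs send metapool/meta@daily-1 | \
    ssh backup-mds zfs receive -F backuppool/meta

# Hourly incremental: send only the delta between the last two snapshots.
zfs snapshot metapool/meta@hourly-1
zfs send -i metapool/meta@daily-1 metapool/meta@hourly-1 | \
    ssh backup-mds zfs receive backuppool/meta
```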
On our filesystem with 150GB of metadata, we are using ZFS RAID 10 with
4x15K 300GB drives. Our backup metadata server has the same
configuration. When I'm using ZFS send/receive to do a full backup, I
can only reach somewhere between 3-5MB/s in throughput over 1Gb
Ethernet. I've used a tool called mbuffer to determine that the send
side is definitely the bottleneck. For our larger metadata
pool, a full backup can take an entire day and decreases performance of
the file system for other uses. CPU wait is quite high on the system
during normal operations and increases substantially during backups.
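In case anyone wants to reproduce the measurement: putting mbuffer on both ends of the pipe shows which side stalls; if the sender-side buffer stays near empty, zfs send itself can't keep up (our case). A sketch, again with placeholder names:

```shell
# Receiver side: listen on a TCP port and feed zfs receive.
mbuffer -s 128k -m 1G -I 9090 | zfs receive -F backuppool/meta

# Sender side: stream the snapshot through mbuffer to the receiver.
# mbuffer prints in/out rates and buffer fill level as it runs.
zfs send metapool/meta@daily-1 | mbuffer -s 128k -m 1G -O backup-mds:9090
```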
Our other, smaller, ZFS metadata partition is a poor test subject as the
loads are far lighter and the partition uses ZFS RAID10 on 4xSSDs. There
I see performance at 10-20MB/s and much friendlier CPU wait times. Even
that is far lower than those drives are capable of, but the smaller
metadata size makes it very manageable for now.
What are other people doing for backing up ZFS metadata partitions? What
kind of performance do you normally see for these backups? If anyone has
found any gotchas regarding performance of these types of ZFS
operations, please let me know!
Thanks,
Andrew
--
Andrew Wagner
Research Systems Administrator
Technical Computing
UW-Space Science and Engineering
AOSS Room 439
608-261-1360