Hello,
I'm new to SPDK and am benchmarking RocksDB with BlobFS on an NVMe SSD.
When I tried to run a fillseq db_bench workload for 500M keys, db_bench
gets stuck after processing some 100M keys. After playing with the db_bench
options I was using, I found out that it only gets stuck when WAL is
enabled. When I disabled WAL, the entire workload completed properly.
With WAL enabled, the issue occurs consistently on each run. Using gdb and
some extra print statements, I saw that this happened when RocksDB calls
sync on one of the MANIFEST files. The sync call waits on a semaphore in
this line,
https://github.com/spdk/spdk/blob/master/lib/blobfs/blobfs.c#L2169
I have not yet properly debugged which is the piece of code that is not
calling sem_post, but it looks to be this section (though I'm not quite
certain about this),
https://github.com/spdk/spdk/blob/master/lib/blobfs/blobfs.c#L1756
Any ideas what is happening here? Is this a known issue or a bug? (Just to
note, the run_tests.sh db_bench test script in the repository runs the
fillseq workload with WAL disabled).
Thanks
Swami