Lustre file system hangs server during really fast I/O?

If you are a proud user of the Lustre file system and you find yourself writing many thousands of very small files in quick succession, you can choke Lustre (seen as of version 2.4.1). It can hang so badly that you cannot kill the process that is writing to Lustre and causing the problem.

Yes, that means that

kill -9 <PID> 

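does not work at all, nor does any other signal you can imagine. The process is most likely blocked inside the kernel waiting on Lustre I/O, where signals are not delivered until the call returns.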
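
If you want to check that the process is stuck rather than just ignoring you, look at its state. The following is a minimal sketch, assuming a Linux server with the usual procps ps; the function name is_uninterruptible is made up for illustration. A state that starts with D means uninterruptible sleep, which is typical of a process blocked on file system I/O.

```shell
#!/bin/sh
# Hypothetical helper (not part of Lustre or procps):
# succeeds if the given PID is in uninterruptible sleep (state D).
is_uninterruptible() {
    pid="$1"
    # "stat=" prints only the state column, without a header.
    # The result is empty if the PID does not exist.
    state=$(ps -o stat= -p "$pid" 2>/dev/null | tr -d ' ')
    case "$state" in
        D*) return 0 ;;   # blocked in the kernel, signals stay pending
        *)  return 1 ;;   # any other state, or no such process
    esac
}

# Usage: is_uninterruptible <PID> && echo "stuck in kernel I/O"
```

If this reports the process as stuck, no signal is likely to help until the I/O it is waiting on completes or fails.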

Previously, I had to reboot the server to kill the damn process that ate my server. It turns out that there is a very simple solution to keep your sanity…

sudo umount -f <Lustre mount point>  

Just forcibly unmount the Lustre mount point. That makes the runaway process crash, which stops it. This is far better than rebooting a server!
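
If you do not remember where Lustre is mounted, the mount table can tell you. The following is a rough sketch, assuming a Linux client where Lustre mounts show up with file system type lustre in /proc/mounts; the function name lustre_mounts and its optional file argument are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the mount point of every Lustre file system.
# Reads /proc/mounts by default; another file can be passed for testing.
lustre_mounts() {
    # Fields in /proc/mounts: device, mount point, type, options, dump, pass
    awk '$3 == "lustre" { print $2 }' "${1:-/proc/mounts}"
}

# Example: force-unmount each one (needs root, so it is left commented out).
# for mp in $(lustre_mounts); do sudo umount -f "$mp"; done
```

Listing the mount points first also lets you confirm that you are about to unmount the right file system.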
