As a system administrator at SingleHop, one common scenario we run into almost every day is a client calling or sending us a ticket saying “Help! My server is slow and we’re not sure why!” Depending on the server OS, there are a lot of different ways to approach troubleshooting this type of issue. This post will concentrate on Linux-based systems.
When troubleshooting performance on most Linux-based servers, one of my favorite tools is lsof, which stands for “list open files.” Sure, there are simpler ways to look at similar information, such as top, which shows the top processes in real time, or ps aux, which shows ALL the processes and their status. However, when the answer isn’t obvious or you need more detail about what exactly is bogging your server down, lsof is a great place to start.
Let me give you an example that I ran into recently while on shift. A client sent in a ticket reporting that his server’s memory usage was spiking and running entirely too high. The server’s I/O wait percentage was also spiking; in short, he was almost out of memory and his hard drive was being used heavily to compensate. I accessed the server via SSH and ran top. In the output, I noted two processes using the majority of the server’s memory; both were zip. In most Linux process listings, the far-left column is the PID, or process ID. For this example, let’s assume a PID of 12345.
At the command line, we run a simple command:
lsof -p 12345
The output is a list of ALL the files being held open by that process. You will often see a lot of log files and system files in there, but somewhere in that haystack is a needle: the output will usually reveal the source of the problem.
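To make that concrete, here is a minimal sketch you can run safely on any Linux box. It inspects the shell’s own PID rather than the 12345 from the example, so substitute the real PID from top or ps aux when you are on an actual incident:

```shell
# Bail out gracefully on boxes without lsof installed.
command -v lsof >/dev/null 2>&1 || { echo "lsof not installed"; exit 0; }

# $$ is the current shell's own PID; on a real incident you would
# substitute the PID you spotted in top or ps aux (12345 above).
lsof -p $$

# Narrow the noise: field 5 of the output is the file TYPE, so this
# keeps only regular files (REG), where big data files tend to appear.
lsof -p $$ | awk '$5 == "REG"'
```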
Back to the example from earlier. In the lsof output, we found that the process was actually zipping up some large SQL dump files from the customer’s eCommerce software. We looked at the software’s log files and identified who had started the process.
Here are a few other things you might have a lot of luck doing with lsof:
Finding and stopping compromised scripts.
If your server is rooted or hacked, it’s pretty common for the attackers to dump scripts inside hidden folders buried deep within your system directories. I’ve used lsof several times to weed out those scripts and stop or delete them. For example, let’s say a rogue perl script is running port scans or being used to DDoS another machine: ps aux | grep perl gets you the PID, and lsof -p PID will show you exactly where the script lives; from there you can delete it and start figuring out how to fix the problem.
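A hedged sketch of that hunt, assuming pgrep is available (perl is the hypothetical process name from the scenario above):

```shell
# Bail out gracefully on boxes without the needed tools.
command -v lsof >/dev/null 2>&1 || { echo "lsof not installed"; exit 0; }
command -v pgrep >/dev/null 2>&1 || { echo "pgrep not installed"; exit 0; }

# pgrep -d, prints matching PIDs as a comma-separated list, which is
# exactly the format lsof -p accepts; -x requires an exact name match.
pids=$(pgrep -d, -x perl)
if [ -n "$pids" ]; then
    # The cwd and txt rows reveal the script's directory and path on disk.
    lsof -p "$pids"
else
    echo "no perl processes running"
fi
```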
Find out what is running on a certain port.
This is very useful when you notice something bizarre in the output of a netstat -na command. There are several ways to do this, but let’s say you want to see what is listening on port 2525 of your server. You could do this:
lsof -Pnl +M -i4 | grep :2525
or a simpler version:
lsof -i :2525
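For reference, here is what the flags in the longer form do, plus a variant (a sketch, still using the example port 2525) that shows only the process actually listening rather than every connection touching the port:

```shell
# Bail out gracefully on boxes without lsof installed.
command -v lsof >/dev/null 2>&1 || { echo "lsof not installed"; exit 0; }

# Flag breakdown for the longer form:
#   -P   show port numbers rather than service names
#   -n   show IP addresses rather than hostnames
#   -l   show numeric user IDs rather than login names
#   +M   include portmapper registrations
#   -i4  restrict to IPv4 network files
lsof -Pnl +M -i4 | grep :2525 || echo "nothing on :2525"

# Only the listener on the port, not every connection to it:
lsof -iTCP:2525 -sTCP:LISTEN || echo "no listener on :2525"
```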
See what files a system user has open.
So, let’s say we want to see which files the user ‘archie’ currently has open on the server:
lsof -u archie
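A few variations on the same flag, sketched with the current user so they run anywhere (‘archie’ is the hypothetical user from the example):

```shell
# Bail out gracefully on boxes without lsof installed.
command -v lsof >/dev/null 2>&1 || { echo "lsof not installed"; exit 0; }

# Files open by a given user; we use the current user so the sketch
# runs anywhere, but on the server you would name 'archie' directly.
lsof -u "$(whoami)" | head -n 10

# The flag negates with a caret: everything NOT opened by root.
lsof -u ^root | head -n 5

# -a ANDs filters together: one user's network connections only.
lsof -a -u "$(whoami)" -i || echo "no network files open"
```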
Finding out what is running when trying to unmount a device.
Ever tried to unmount a CD drive, virtual drive, or even a USB thumb drive, only to get the pesky ‘Device is busy’ error message? lsof can help here too: point it at the device you are trying to unmount, such as /dev/sda3, and it will list the processes still holding files open on it.
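A sketch of that workflow; /dev/sda3 and /mnt/usb are example paths, so substitute your own device or mount point:

```shell
# Bail out gracefully on boxes without lsof installed.
command -v lsof >/dev/null 2>&1 || { echo "lsof not installed"; exit 0; }

# Who is holding the device open? (example path; substitute your own)
lsof /dev/sda3 || echo "nothing has /dev/sda3 open"

# For a mount point, +f -- tells lsof to match every open file on the
# filesystem mounted at that path:
lsof +f -- /mnt/usb || echo "nothing open under /mnt/usb"

# -t prints bare PIDs only -- handy to review before deciding whether
# the offending processes are safe to stop with kill.
lsof -t /dev/sda3 || true
```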
lsof really is that simple and that powerful as a troubleshooting tool. Use it in conjunction with other commands, such as grep, and it can make for some powerful one-liners.