Jul 28, 2013
Andrew Brooks
Stack Trace

I am a firm believer that in the field of information security, tools are overrated.  Like all security professionals, I have a long list of software that I rely on to get work done, but make no mistake: tools provide neither security nor knowledge.  At best, they provide insight and a way to simplify the things that you could do manually.  The moment you find yourself using a tool that’s smarter than you, you’re setting yourself up for failure and disappointment.  In the same vein as knowing exactly what your tools are doing, it’s important to know what your tools aren’t doing, and to bridge that gap by leveraging what you know you don’t know.

To see how this can be done, let’s walk through a quasi case study in which we use what we know we don’t know to know more (makes perfect sense, right?).

In a lot of large environments, it is not at all uncommon to come across a web proxy designed to protect users from malicious websites and harmful downloads.  A common accessory to a web proxy is an antivirus module that sits inline and scans incoming files, whether the user downloaded them willingly or unwillingly, to ensure that no known malicious files make their way to the user’s workstation.  Herein lies the problem: AV only detects things it knows about.  Rather than go on about the failures and shortcomings of AV, suffice it to say that a system which can only detect things it knows about isn’t very effective against new and unknown threats (zero-days).  So we know that we cannot rely on our proxy and AV software to detect unknown threats.  However, our proxy does provide us with log data that we can use to help bridge the gap and surface threats we otherwise wouldn’t know about.

So, let’s use this data to hack up a system that helps us identify anomalous files, and for the sake of keeping this post short and simple, let’s only examine PDF files.  Before going on, note that what we’re hacking together is not a real-time preventative system; it is designed to help you identify and respond to items that may otherwise go unnoticed.  For the sake of simplicity, in this example our system only runs once every 24 hours.

Alright, back on track!  The web proxy is used by all employees, and at the end of a 24-hour period we have a significant amount of log data, but we’re only interested in malicious PDFs that the proxy did not block.  After all, if the proxy blocked a malicious PDF, we can make that a lower-priority investigation, since the file presumably didn’t make it down to the workstation.  What we want to know about are the malicious PDFs that weren’t detected.  To find them, we’ll use a combination of free tools and your own simple, custom scripts suited to your organization’s needs.  That’s not a cop-out; I simply can’t tell you what is and is not important for your environment, so it will be up to you to decide which metrics you use to determine severity.  Anyway, our system works like this:

  1. Dump the log data for a 24-hour period, and identify PDFs that were successfully downloaded.
  2. Using the above information, dump these URLs to a file or database, and retrieve them en masse.  *For example, if we wrote these URLs to the file “badurls.txt,” we could download them in bulk with the simple Linux one-liner: for i in $(cat badurls.txt); do wget "$i"; done
  3. Now that we have downloaded all of the PDFs that were not detected as malicious, we can use tools such as those listed by the well-known Lenny Zeltser to perform some superficial, automated analysis of the files in question.
  4. Write the output from the tools to a text file, and have a batch job e-mail it to you for manual analysis at whatever interval you see fit.
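
To make step 1 a little more concrete, here is a minimal Python sketch of the log filtering.  Proxy log formats vary a lot between products, so the five-field line layout below (timestamp, client IP, HTTP status, proxy action, URL), the ALLOWED/BLOCKED values, and the function name extract_pdf_urls are all assumptions I made up for illustration; adapt the parsing to whatever your proxy actually writes.

```python
from urllib.parse import urlsplit


def extract_pdf_urls(log_lines):
    """Return unique URLs of PDFs that were served (HTTP 200) and not blocked.

    Assumes a hypothetical whitespace-delimited log line:
        <timestamp> <client_ip> <http_status> <action> <url>
    """
    seen = set()
    urls = []
    for line in log_lines:
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank or malformed lines
        _timestamp, _client, status, action, url = fields
        if status != "200" or action != "ALLOWED":
            continue  # only keep downloads that actually reached the user
        # Check the URL path only, so a query string cannot hide the extension.
        if not urlsplit(url).path.lower().endswith(".pdf"):
            continue
        if url not in seen:  # de-duplicate while preserving first-seen order
            seen.add(url)
            urls.append(url)
    return urls


# Made-up sample lines in the assumed format.
sample_log = [
    "2013-07-28T09:15:02 10.0.0.5 200 ALLOWED http://example.com/report.pdf",
    "2013-07-28T09:16:40 10.0.0.7 200 BLOCKED http://example.net/evil.pdf",
    "2013-07-28T09:17:11 10.0.0.9 200 ALLOWED http://example.org/index.html",
    "2013-07-28T09:18:30 10.0.0.5 200 ALLOWED http://example.com/Doc.PDF?id=7",
    "2013-07-28T09:19:05 10.0.0.8 200 ALLOWED http://example.com/report.pdf",
]

for pdf_url in extract_pdf_urls(sample_log):
    print(pdf_url)
```

Matching on the file extension is the crudest possible test and will miss PDFs served from URLs that don’t end in .pdf.  If your proxy logs the response content type, filtering on that is probably the better choice.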

*You may also want to look into URLs that return a 404 when you try to download them, as that could be suspicious, but it’s your call.  This is a fictitious example anyway :)
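
If you would rather keep track of those 404s than have them scroll past in wget output, step 2 can be sketched in Python as well.  This is only a rough illustration using the standard library: the function name fetch_all and its opener parameter are my own inventions (the parameter exists so the function can be exercised without touching the network), and in practice you would want to run this from an isolated analysis box, since the whole point is that these files may be malicious.

```python
import urllib.error
import urllib.request


def fetch_all(urls, opener=urllib.request.urlopen, timeout=30):
    """Fetch each URL and return a (downloaded, failed) pair of dicts.

    downloaded maps URL -> response body (bytes).
    failed maps URL -> HTTP status code (int), or an error string when the
    request failed before any HTTP status was received.
    """
    downloaded = {}
    failed = {}
    for url in urls:
        try:
            with opener(url, timeout=timeout) as response:
                downloaded[url] = response.read()
        except urllib.error.HTTPError as err:
            # 404s and other HTTP errors land here; keep the status code
            # so suspicious "it was there yesterday" URLs can be reviewed.
            failed[url] = err.code
        except OSError as err:
            # DNS failures, refused connections, timeouts, and so on.
            failed[url] = str(err)
    return downloaded, failed
```

Calling fetch_all with the URLs read from badurls.txt gives you the file contents to hand to your analysis tools, plus a dictionary of failures in which any URL mapped to 404 is a candidate for a closer look.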

While it’s simple, I think it gets the point across.  The system proposed above isn’t perfect, and it can very easily still miss a lot of things.  The point, however, is that by using the data we have and looking at all of the unknowns, we can help close the gap.  That, in turn, gives us more visibility not only into our environment but also into our tools, which is always a win-win!

Thanks for reading and be sure to do something cool this week!
