Simon PG Edwards: What is a malware sample?

Malware samples come in many forms. This is not one of them.

After we run an anti-malware test, some security companies request the malware samples that their products failed to recognise.

I'm never quite sure what they mean when they use the word 'sample' because it can mean different things.

Here's a short list of the most common options.

1. Malicious program
2. Hash of a malicious program
3. URL
4. Hash of URL
5. Network capture, aka 'pcap'
6. Web session, aka 'replay'

At Dennis Technology Labs we deal mainly with the sixth option - the web session. In addition to providing web session replays we also make the fifth (pcap) option available in some cases.

As we test all layers of security, from web reputation systems down to file detection scanners, providing samples like this enables security vendors to verify our results and possibly improve their products.

1. Malicious program

Such samples are usually referred to as binaries, executables (aka 'exes') or PE files. In practical terms they are either downloaders or the payloads that downloaders fetch. You would expect to see files with names similar to those below:

0132787483643.exe
foto(4).exe
xyz-britney.scr
winlive.exe

2. Hash of a malicious program

Instead of sending collections of malicious files around the internet, sometimes it's sufficient to simply identify the file using a mathematical hash.

This can be useful because most anti-malware vendors have massive databases containing details of all known (to them) malware and these database records usually include a hash for each file.

You can generate a hash of a file using one of many free utilities, such as MD5sums. To discover the MD5 hash value of a file called test.txt you might type the following command in Windows:

C:\> md5sums test.txt

The output would look something like this:

MD5sums 1.2 freeware for Win9x/ME/NT/2000/XP+
Copyright (C) 2001-2005 Jem Berkes - http://www.pc-tools.net/
Type md5sums -h for help

[Path] / filename    MD5 sum
---------------------------------------------------
[C:\]
test.txt             a0895e00f4d49c355f4f33f69475f963
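
If you would rather not install a separate utility, the same kind of hash can be computed with a few lines of Python. This is only a minimal sketch using the standard hashlib module; the function name file_md5 and the chunk size are my own choices for illustration, not part of any vendor's tooling.

```python
import hashlib


def file_md5(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of the file at `path`.

    The file is read in chunks so that large samples do not have to
    fit in memory. `chunk_size` is an arbitrary illustrative value.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling file_md5("test.txt") returns a 32-character hexadecimal string in the same form as the MD5sums output above. Swapping hashlib.md5 for hashlib.sha256 gives a SHA-256 hash instead.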

3. URL

This could be as simple as www.example.com, or a more detailed (and arguably more useful) example could be www.example.com/dir/bad.exe or www.example.com/dir/script.php.

Some testers download files from the web so that web reputation systems have a chance to protect the system. This is a good idea, although using drive-by download and social-engineering pages is more realistic than a direct download taken out of context, such as typing www.example.com/dir/bad.exe into the browser rather than clicking a link in a malicious email or instant message.

4. Hash of URL

As with files, it is possible to generate hashes of URLs (see '2. Hash of a malicious program' above).
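
One thing to watch is that a hash identifies an exact string, so www.example.com and http://www.example.com/ produce different values. The sketch below hashes a URL with Python's hashlib; the only normalisation it applies is stripping surrounding whitespace. That is an assumption for illustration, as each vendor is likely to have its own normalisation rules. The function name url_md5 is hypothetical.

```python
import hashlib


def url_md5(url):
    """Return the hexadecimal MD5 digest of a URL string.

    Assumption: only leading/trailing whitespace is removed before
    hashing. Scheme, case and trailing slashes are left untouched,
    so two spellings of the same address hash differently.
    """
    normalised = url.strip()
    return hashlib.md5(normalised.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    for candidate in ("www.example.com", "http://www.example.com/"):
        print(candidate, url_md5(candidate))
```

Anyone exchanging URL hashes therefore needs to agree on how the URL is written before it is hashed.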

5. Network capture, aka 'pcap'

Saving all of the network traffic generated during an attack makes it possible to recover downloaded files and to understand why a product may not have been working properly. If the product fails to query its back-end database successfully, for example, the pcap file may contain evidence that helps uncover the issue.

Packet capture files, which is where the name 'pcap' comes from, contain as much or as little information as you choose when you start monitoring the network. We capture every packet, so the files are large but contain everything. Capturing just the packet headers is useful for troubleshooting but you can't pull binaries out of the resulting (tiny) files.
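
To check whether a given capture holds full packets or only truncated ones, you can read the length fields the file stores for each packet. The following is a rough sketch that assumes the classic libpcap file format (a 24-byte file header followed by 16-byte per-packet headers, microsecond timestamps). It does not handle pcapng or the nanosecond variant, and the function name summarize_pcap is my own.

```python
import struct

PCAP_MAGIC = 0xA1B2C3D4  # classic libpcap format, microsecond timestamps


def summarize_pcap(data):
    """Return (snaplen, packet_count, truncated_count) for pcap bytes."""
    if len(data) < 24:
        raise ValueError("too short to be a pcap file")
    # The magic number tells us which byte order the writer used.
    if struct.unpack("<I", data[:4])[0] == PCAP_MAGIC:
        endian = "<"
    elif struct.unpack(">I", data[:4])[0] == PCAP_MAGIC:
        endian = ">"
    else:
        raise ValueError("not a classic pcap file")
    # File header: magic, version major/minor, thiszone, sigfigs,
    # snaplen (maximum bytes saved per packet), link type.
    fields = struct.unpack(endian + "IHHiIII", data[:24])
    snaplen = fields[5]
    offset, packets, truncated = 24, 0, 0
    while offset + 16 <= len(data):
        # Packet header: ts_sec, ts_usec, bytes saved, bytes on the wire.
        _sec, _usec, incl_len, orig_len = struct.unpack(
            endian + "IIII", data[offset:offset + 16]
        )
        packets += 1
        if incl_len < orig_len:
            truncated += 1  # only part of this packet was saved
        offset += 16 + incl_len
    return snaplen, packets, truncated
```

Pass it the contents of a capture, for example summarize_pcap(open("attack.pcap", "rb").read()), where attack.pcap is a placeholder name. A non-zero truncated count suggests that binaries cannot be fully recovered from the file.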

6. Web session, aka 'replay'

It's possible to capture a complete web session using a tool like Fiddler. Such capture files include any exploit code used in a web attack as well as programs that are downloaded.

The benefit for security companies is that they can reproduce exactly the attack we used when testing anti-malware products. You can't do that with just the program itself, or a hash of that program.

To replay a Fiddler capture file you can use a utility like Microsoft's HTTPREPLAY.
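
Before replaying a capture it can help to see which requests it contains. Fiddler's session archive (.saz) files are ZIP archives, so a quick look is possible with Python's zipfile module. The sketch below assumes the usual archive layout, in which each request is stored as raw/N_c.txt and each response as raw/N_s.txt. It does not handle password-protected archives, and the function name request_lines is hypothetical.

```python
import zipfile


def request_lines(saz_file):
    """Return the first line of every request stored in a .saz archive.

    `saz_file` may be a path or a file-like object. Assumption: request
    files live under raw/ and end in _c.txt, as in a standard Fiddler
    session archive.
    """
    lines = []
    with zipfile.ZipFile(saz_file) as archive:
        for name in sorted(archive.namelist()):
            if name.lower().startswith("raw/") and name.endswith("_c.txt"):
                # The request line is everything up to the first CRLF.
                first_line = archive.read(name).split(b"\r\n", 1)[0]
                lines.append(first_line.decode("latin-1"))
    return lines
```

Each returned entry is an HTTP request line, which shows at a glance the URLs visited during the session.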