Wednesday, 30 November 2011

Anti-malware testing: behind the scenes

Anti-malware or anti-virus tests can be very useful. Their results may help customers choose the product that is most suitable, which usually means a balance between the cheapest, easiest to use and most effective.

However, there is a certain amount of suspicion that surrounds anti-virus testing in particular. I hope to address some of the issues here, at least from my own standpoint.

Regardless of who performs a test and how well they perform it, there are fans of certain anti-virus brands who will never accept a test's results if they contradict their personal choice. They may claim, for example, that the test has a biased methodology.

If the published methodology is fair then they will claim that the test was not carried out according to that methodology. They would rather believe that there is a conspiracy than that their favourite product did not perform well.

Here's a real example, paraphrased from one of the popular security discussion forums. The subject is a test that we performed, and the problem that some people had was that it was paid for by an anti-virus vendor:
Person A: This test is biased. The sponsor got a 100 per cent protection score.
Person B: And yet the sponsor's product also generated 10 per cent false positives. They can't be happy with that.
Person C: True. If the results were doctored then the sponsor would have removed these false positive results.
Person A: Ah, but the false positives are there to make the test look unbiased.
It's probably not possible to win over those who are determined to see conspiracy in every PDF that is published.

Unfortunately some vendors also feel that tests can be intentionally biased. At one well-known virus conference earlier this year a speaker painted the entire security testing industry with the same brush, accusing it of being biased and out of date.

In a probably vain effort to address some of the prevalent issues I've come up with a few FAQs that I have to answer almost every time I deal with a vendor for the first time. These will probably be of interest to those who follow the testing scene, although those with severe biases of their own simply won't believe me.

The following applies to sponsored tests that we carry out at Dennis Technology Labs. Some of the reference points, such as the Threat Report chart and the Appendices, refer to regular elements of our public test reports:

Testing FAQs for Dennis Technology Labs


Does the sponsor know what samples are used, before or during the test?
No. We don’t even know what threats will be used until the test starts. Each day we find new ones, so it is impossible for us to give this information before the test starts.

Neither do we disclose this information until the test has concluded. If we did, the sponsor might be able to gain an advantage that would not reflect real life.

Do you share samples with the vendors?
The sponsor is able to download all samples from us after the test is complete. Other vendors may request a subset of the threats that compromised their products in order for them to verify our results.

The same applies to client-side logs, including the network capture files. There is a small administration fee for the provision of this service to vendors that do not support our work financially.

What is a sample?
In our tests a sample is not simply a set of malicious executable files that run on the system. A sample is an entire replay archive that enables researchers to replicate the incident, even if the original infected website is no longer available.

This means that it is possible to reproduce the attack and to determine which layer of protection it was able to bypass.

Replaying the attack should, in most cases, produce the relevant executable files. If not, these are available in the client-side network capture (pcap) file.
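To illustrate what working with a capture file involves, here is a minimal sketch of reading the classic pcap format in Python. This is not our internal tooling, just a generic example of how a researcher might walk the packet records in a capture before extracting payloads such as executable files; the `count_pcap_packets` function is a hypothetical name for illustration.

```python
import io
import struct

def count_pcap_packets(stream):
    """Walk a classic pcap stream and count its packet records.

    A real extraction tool would inspect each packet's bytes (e.g. to
    reassemble an HTTP response carrying an executable); here we only
    demonstrate the file layout: a 24-byte global header followed by
    16-byte per-packet headers, each giving the captured length.
    """
    header = stream.read(24)
    if len(header) < 24:
        raise ValueError("truncated pcap global header")

    # The magic number tells us the byte order of the writer.
    magic = struct.unpack("<I", header[:4])[0]
    if magic == 0xA1B2C3D4:
        endian = "<"
    elif magic == 0xD4C3B2A1:
        endian = ">"
    else:
        raise ValueError("not a classic pcap file")

    count = 0
    while True:
        record = stream.read(16)
        if len(record) < 16:
            break  # end of capture
        ts_sec, ts_usec, incl_len, orig_len = struct.unpack(endian + "IIII", record)
        stream.read(incl_len)  # skip (or, in a real tool, inspect) packet bytes
        count += 1
    return count

# Usage: build a one-packet capture in memory and count it.
global_header = struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)
packet_record = struct.pack("<IIII", 0, 0, 3, 3) + b"abc"
capture = io.BytesIO(global_header + packet_record)
print(count_pcap_packets(capture))  # prints 1
```

Libraries such as scapy or Wireshark's command-line tools would normally be used for this kind of work; the point of the sketch is simply that a pcap file preserves the raw network traffic, so the files delivered during an attack can be recovered from it.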

Does the sponsor have a completely free choice of products?
No, not for tests intended for publication. While the sponsor may specify which products it wants us to compare, we will always advise on this decision and may refuse to include certain products if we feel that a comparison with the others is not fair.

How can a product be 'compromised' when your results clearly show that it detected and deleted the threat?
Our Threat Report chart shows both what the product claims to do and the end result as we determine it. A product may well claim to have protected the system, but the forensic evidence may tell a different tale.

It is quite common in our tests for a product to detect some element of an attack but to fail to prevent another.

Why does Appendix C: Threat Report list more malicious incidents than the total number of samples?
It doesn’t, although it may look like it at first glance. If the final round (e.g. #51) has a higher number than the total number of samples (e.g. 50) then we’ve removed at least one incident from the entire test.

This can happen for a number of reasons. Sometimes the malware crashes, so the product has not defended the system, but neither has the system been compromised. We will either give the product the benefit of the doubt or we’ll remove those rounds and retest with another threat.

We maintain the incident numbers to keep the records consistent with our internal data. No incidents are removed on request by any vendor, including the sponsor.