Wednesday, 31 July 2013

Rating anti-malware protection

Would you rather your anti-virus program stopped threats from running on your PC, or that it let them run for a while and then removed them?

The answer for most people is, I am sure you will agree, that it's better to keep the malware off the system in the first place rather than give it a chance to cause damage or steal information.

Some people think that rating the effectiveness of anti-malware products is simple. Either it detected the virus or it did not, surely?

If only things were this simple.

Static testing

Admittedly this was at least partially true with traditional static testing, in which a scanner would assess a folder of malicious files. It would either detect the bad file or it would not.

Of course, it might mis-classify a threat but ultimately who cares? As long as it detects that the file is bad, it has succeeded.

Real-world testing

When an anti-malware test uses live threats that attempt to infect the target, things change dramatically.

This is because there are different layers of protection involved. While one or more may fail, others can come into play. But, as we will see, it's better to stop the threat in the top layers.

Let's take web threat testing as an example, because that's what we spend a lot of time doing at Dennis Technology Labs. In such tests the target (aka 'victim') computer browses a website that is hosting malicious content. This could be an automatic drive-by attack or some form of social engineering that tricks users into downloading the malware.

When you visit the site while running an anti-malware product a number of things may happen, including the following:
  1. Product blocks the website, denies access to its content and prevents the attack.
  2. Product allows the website but detects and blocks the malicious content.
  3. Product allows the website, including some malicious content. However, it successfully detects and removes the main payload before any damage is done.
  4. Product allows the website, including all malicious content. As malicious files are executed and others are downloaded the product starts detecting and removing some. Ultimately the system is cleaned, although there may be some inactive files left on the hard disk.
  5. Product allows the attack to run fully.

There are, of course, other possibilities, but these are the ones that we see quite frequently. In some cases (2 - 3) the product may be so good that it completely removes any traces of the attack. That is possible, but not usual, when the threat has managed to run on the system. It is always the case when the product blocks the website fully (1).

It's pretty clear that preventing the threat from running on the system is the best possible case. Any interaction between a threat and its target is dangerous. For example, data may be stolen while the threat runs, even if the malicious files are removed from the system later.

Rating the differences

It is for this reason that we allocate different scores depending on the outcome. We give the highest scores to the products that keep the threats off the system completely, give credit to those that remove the threats completely and acknowledge those that disable the threat, even if some traces remain. Obviously we penalise those products that allow the threat to run.

In our reports we use the following terms as shorthand to describe these cases:
  • Defended
  • Neutralised
  • Compromised
We also use the term "with full remediation", which means that no traces of the threat remained on the system.


With a 'defended' result, full remediation is implied, as there were never any traces to be found or removed. In some (rare) cases of 'neutralisation' we may determine that all traces were removed.
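As a hypothetical illustration of how these outcomes might translate into scores (the labels are the ones above, but the numeric weights here are invented for this sketch and are not our actual values):

```python
def protection_score(outcome: str, full_remediation: bool = False) -> int:
    """Hypothetical per-incident score; the weights are illustrative only."""
    if outcome == "defended":
        # A blocked attack leaves no traces, so full remediation is implied.
        return 3
    if outcome == "neutralised":
        # Extra credit when no traces of the threat remain on the system.
        return 2 if full_remediation else 1
    if outcome == "compromised":
        return -5  # the threat ran successfully, so the product is penalised
    raise ValueError(f"unknown outcome: {outcome}")
```

Whatever the exact weights, the ordering is the point: keeping the threat off the system entirely outranks cleaning up after it, which in turn outranks merely disabling it.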

Here is an extract from one of our 2013 reports that explains things in more detail:
These levels of protection have been recorded using three main terms: defended, neutralized, and compromised. A threat that was unable to gain a foothold on the target was defended against; one that was prevented from continuing its activities was neutralized; while a successful threat was considered to have compromised the target.
A defended incident occurs where no malicious activity is observed with the naked eye or third-party monitoring tools following the initial threat introduction. The snapshot report files are used to verify this happy state.
If a threat is observed to run actively on the system, but not beyond the point where an on-demand scan is run, it is considered to have been neutralized.
Comparing the snapshot reports should show that malicious files were created and Registry entries were made after the introduction. However, as long as the ‘scanned’ snapshot report shows that either the files have been removed or the Registry entries have been deleted, the threat has been neutralized.
The target is compromised if malware is observed to run after the on-demand scan. In some cases a product might request a further scan to complete the removal. We considered secondary scans to be acceptable, but continual scan requests may be ignored after no progress is determined.

An edited ‘hosts’ file or altered system file also counts as a compromise.
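The decision procedure described in the extract can be sketched roughly as follows. The boolean parameters are simplified stand-ins for what is actually derived from snapshot comparisons and monitoring tools, and the function name is invented for this sketch:

```python
def classify_incident(ran_before_scan: bool,
                      ran_after_scan: bool,
                      traces_removed_by_scan: bool,
                      system_files_altered: bool) -> str:
    """Rough sketch of the defended/neutralised/compromised decision."""
    if system_files_altered or ran_after_scan:
        # An edited 'hosts' file or other altered system file counts as a
        # compromise, as does malware observed running after the scan.
        return "compromised"
    if not ran_before_scan:
        return "defended"      # no malicious activity ever observed
    if traces_removed_by_scan:
        return "neutralised"   # it ran, but the scan disabled it
    return "compromised"       # it ran and the scan failed to remove it
```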

The full picture

To fully rate an anti-malware product one also needs to factor in how it handles legitimate software. In the worst cases, does it misclassify useful programs as malware? How much responsibility does it pass on to the user (e.g. unhelpful messages that say, "application.exe wants to run. Allow or deny?")?

Dennis Technology Labs uses a Total Accuracy rating to show how effective products are at protecting against threats and handling legitimate software.
To gain a full overview of how effective an anti-malware product is, you need to measure not only its detection rates but also its protection rates. The different possibilities mean that it is rarely as simple as a 'hit' or a 'miss', and product ratings in tests should reflect that.
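One way to picture a combined rating of this kind (a minimal sketch: the idea of merging threat-protection and legitimate-software scores comes from the post above, but this formula is invented and is not the published Total Accuracy calculation):

```python
def combined_rating(protection_points: float, max_protection: float,
                    legitimate_points: float, max_legitimate: float) -> float:
    """Share of all available points earned across both measures.

    Made-up formula for illustration: a product that protects well but
    constantly blocks legitimate software still loses rating points.
    """
    return (protection_points + legitimate_points) / (max_protection + max_legitimate)
```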