Defeating The Perfect Emulator.

Written by Bhunji


Intro.

We virus writers have defeated heuristics, scanning, checksums and TSR
blockers for a long time. Even emulation has been fought; we have won some
battles, but the AVs are leading the war. As code emulators are the AVs'
strongest weapon, having a defence would be very nice.

To detect an emulator you need to use something that behaves differently
when being emulated. This could be uncommon instructions, the function
IsDebuggerPresent or similar. All of these methods have one weakness: The
Perfect Emulator (tm). The Perfect Emulator would cut through them like me
cutting through my victims: fast, elegant and undetectable. The
Perfect Emulator would only differ from the processor in speed.

(Some viruses add a call to the virus inside the infected program instead
of changing the start address. This is called EPO, entry-point obscuring.
It is a really good idea, as an emulator needs to emulate everything to
find a virus like this. That is possible, but today's processors are too
slow to make it a usable technology. The downside is that the call might
never be executed, and that it's a pretty advanced (or buggy, if made
simple) technology.)

Even if The Perfect Emulator will never exist, it's always possible to add
some code to a good emulator after a virus using a new technique has been
found. All your work on the polymorphic engine will then be useless.

But... there is a way to defeat TPE. TPE has one weakness. It assumes stuff :)
This is what the AVs write about generic decryption (Understanding and
Managing Polymorphic Viruses by Carey Nachenberg, Symantec):



Generic decryption assumes:

* The body of a polymorphic virus is encrypted to avoid detection. 
* Once an infected program begins to execute, a polymorphic virus must
  immediately usurp control of the computer to decrypt the virus body.
* A polymorphic virus must decrypt before it can execute normally.



The first two assumptions have already been defeated, the first with
metamorphism and the second with EPO. This text will talk about
defeating the third assumption.

Why does generic decryption need to assume that the polymorphic
virus decrypts itself? It's actually very simple. Instead of creating
a detection routine for every polymorphic engine, the AVs add a search
string that lies inside the encrypted code. The scanner then relies on
the virus to decrypt itself and thereby show it this search string.

A metamorphic virus does not have any static search string, so generic
decryption can't be used against it. A virus using EPO does have a
search string, but the decryptor is "hidden"; to find a virus like
this with emulation one needs to emulate everything. The third
known technique is to use something that the engine isn't able to
emulate. I think this is a stupid idea, as it will not help against
future emulators.

The fourth, very logical, idea is not to decrypt the virus. If the virus
isn't decrypted, how the hell is the scanner supposed to find its search
string? The problem is that the virus won't run if it doesn't decrypt
itself. But what if the virus decrypts itself sometimes, and sometimes
not? Then the generic decryptor will see the search string sometimes,
but then again, sometimes not.

If a virus executes only 50% of the time, only 50% of the infections will
be found by one emulation pass. If a virus executes only 10% of the time,
only 10% of the infections will be found. Not even emulating every program
ten times will find every virus. Compare the virus to a lottery with a 10%
chance of winning: just because you buy ten tickets doesn't mean that you
will win.
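
The lottery comparison can be put into numbers. The following is only a
sketch in Python, assuming that every emulation pass sees a fresh,
independent random value and that the decryptor is reached with the same
probability p on each pass. The function name is made up for this
illustration.

```python
def detection_probability(p: float, passes: int) -> float:
    """Chance that at least one of `passes` independent emulation runs
    reaches the decryptor, if each single run does so with probability p.
    Assumes the runs are independent, which a real emulator may not give."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if passes < 0:
        raise ValueError("passes must be non-negative")
    # Probability that every pass misses is (1 - p) ** passes.
    return 1.0 - (1.0 - p) ** passes


if __name__ == "__main__":
    for passes in (1, 10, 50):
        print(passes, round(detection_probability(0.10, passes), 4))
```

Under these assumptions ten passes at p = 10% catch roughly 65% of the
infections, not all of them, which is the point of the lottery comparison.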

How often do we want the virus to execute? 

If the virus is memory resident it doesn't really matter whether it takes
one or ten program starts before it runs; the spreading will hardly be
affected. In that case we want the virus code to execute as seldom as
possible.

If the virus is a direct infector things are a bit more complex. Do we
want a high infection rate together with a high detection rate, or a
low infection rate and less frequent detection? Well, it's up to you.
Maybe an engine that evolves from fast to slow over time would be a good
idea.

I call this the Guide technique because the execution is guided to either
the program or the virus. It's time for some code examples.


Programstart		dd	OriginalRVA
Virusstart		dd	VirusPolymorphicCodeOffset

mov	eax,RandomNumber
and	eax,100b		; eax = 0 or 4
jmp	[Programstart+eax]

Put this Guide at the program entry point and 50% of the infections will
not be found by one emulation pass. A scanner can still find this code, but
we all know how to defeat scanners, don't we :)

Programstart			dd	OriginalRVA
trash				db	4 dup (?)
VirusPtr			dd	VirusStart

mov	eax,RandomNumber		
and   	eax,1000b		; eax = 0 or 8
jmp	[Programstart+eax]

This example shows that we don't need to have the pointers next to each
other.


Programstart	dd	OriginalRVA
VirusPtr 	dd	VirusStart
...

mov	eax,RandomNumber
and   	eax,111b		 ; eax = 0 - 7
setz	al
jmp	[Programstart+eax*4]

This Guide will only execute the virus one time in eight, making it very
difficult to detect. This code is not as secure as the previous one though,
as an emulator that tries both possible results of a 'setz' will find the
decryption code.


As you can see, a Guide is pretty small, which makes it even harder to scan
for. We need some random numbers though. By using APIs we can get plenty. We
need to patch the import section for this, as it's hard to make a 'GetApi'
function polymorphic in a good way. In DOS it's easier, as programs can
call interrupts, which don't rely on addresses. I have unfortunately not
done much research into random numbers, but I know of five sources that
don't need any API. The first one is fs:[0ch] and the second is fs:[34h].
One of the random bits in fs:[0ch] is bit 4 (counting from 1, mask 1000b).

00000 ... ?000b
	  |
  This bit is random

If we 'and' fs:[0ch] with 8 we get either zero or eight. We could then use
the Guide in example two. 

The random numbers taken from 'fs' are different every time a program
executes. A stealthier technique is to use a random number that only
changes on every boot. The virus will of course spread more slowly, but a
user will not notice anything if he scans during "the wrong boot". Ecx, edx
and esi are random in this way. One of the random bits in these is bit 3
(counting from 1, mask 100b). A very nice bonus when using these is that
the Guide will be very small.

and	ecx,4
jmp	[JumpTable+ecx]



How to defeat a Guide.

Defeating a Guide is not easy. A Guide does have one weak side though:
it is able to return the address of the decryptor. An emulator
together with heuristics is able to decide if a piece of code is a Guide.
It could then run the Guide multiple times, fetching all possible addresses.
This is possible and very easy. Let's look at an example.

mov 	eax,RandomValue
and	eax,100b
jmp	[eax+ProgramStart]

If we emulate this we get.

eax = ?
eax = ? and 4
jmp  [? and 4 + ProgramStart]

If the emulator finds a jump that depends on a random value, it knows it's
a Guide. A normal program never jumps somewhere at random. There is no way
to defeat this. The Perfect Emulator won't find viruses like this though,
because it doesn't know that fs:[0ch] is a randomiser. The Perfect Emulator
needs to be equipped with the addresses and functions that return random
values. If it is, what would it do when it finds a random jump? Let's look
at the following code.

mov	eax,12
add 	eax,?
and	eax,100b
jmp	[eax+ProgramStart]

This is emulated as.

eax = 12
eax = ?+12
eax = (?+12) and 4
jmp  [(?+12) and 4 + ProgramStart]

The emulator needs the whole Guide to be able to tell where the program
is able to jump. Because of this it needs the offset where the guide
begins. It starts parsing backwards. (Parsing backwards isn't possible
but it will use a similar technique)


eax = (?+12) and 4
eax = ?+12
eax = 12

This is where the Guide begins. At this point it replaces the jump with a 
"mov eax" and the ? with [esp+4]. It will then attach a ret at the end of 
the Guide. It has now turned the Guide into a function that returns an
offset instead of jumping to it.

mov	eax,12
add 	eax,[esp+4]		; the former ? is now the function's argument
and	eax,100b
mov	eax,[eax+ProgramStart]	; return the target instead of jumping to it
ret

All it has to do now is to run this code in a loop.

i = 0;				/* i is a 32-bit unsigned int */
do {
	AddValueToList(Guide(i));
} while (++i != 0);		/* stops after 0xffffffff wraps to 0 */

Now it has every possible value the Guide is capable of returning. It will
then start doing regular emulation at those addresses and also find the
virus.
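
Seen from the emulator's side, the step above can be modelled in a few
lines. This is only a sketch in Python, not how any real AV engine is
known to work. The table address and the two target addresses are made up
for the illustration, and the loop only walks a small input range instead
of all 2^32 values, on the assumption that the masked bit is covered by
that range.

```python
MASK32 = 0xFFFFFFFF

# Hypothetical memory contents: a two-entry jump table at PROGRAM_START.
PROGRAM_START = 0x1000
MEMORY = {
    PROGRAM_START + 0: 0x00401000,  # made-up address of the host code
    PROGRAM_START + 4: 0x0040F000,  # made-up address of the decryptor
}


def guide(random_value: int) -> int:
    """The Guide rewritten as a function: the jump became a return value
    and the random source became the argument."""
    eax = 12
    eax = (eax + random_value) & MASK32
    eax &= 0b100
    return MEMORY[PROGRAM_START + eax]


def possible_targets(guide_fn, inputs) -> list[int]:
    """Run the function for every given input and collect the distinct
    addresses it can return."""
    return sorted({guide_fn(value) for value in inputs})


if __name__ == "__main__":
    for target in possible_targets(guide, range(16)):
        print(hex(target))
```

Each address in the resulting list is then a starting point for regular
emulation.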

Even if it's possible to find a Guide, the AVs need to recode their engines.
As they don't give out their source, I don't know if they already record
which registers depend on a random value and which don't. They do
emulate the registers though. That's why it's not possible to hide from
heuristics by moving data in two steps.

mov	ax,0012h
xchg	al,ah

instead of

mov	ax,1200h

This is not possible, which means that they do emulate the registers. The
question is: how well do they emulate them?

There is a more elegant way to find all possible addresses a Guide
returns. Instead of just marking a whole register as random or not random,
they could mark every bit in the register. For 'and ?,100b' they would
then get something like this.

0 = Constant bit
1 = Random bit
 
? = 00000 ... 000100b

Then they just have to run the Guide twice to get all possible offsets.
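
The per-bit marking can be sketched as follows. This is a minimal Python
illustration under my own assumptions, not a description of any existing
engine: all names are made up, only 'and' and 'add' with a constant are
modelled, and the 'add' rule is deliberately conservative (every bit from
the lowest random bit upwards is treated as random, because carries can
spread into it).

```python
from dataclasses import dataclass
from itertools import product

MASK32 = 0xFFFFFFFF


@dataclass(frozen=True)
class TrackedValue:
    known: int   # values of the constant bits (0 where the bit is random)
    random: int  # mask: 1 = random bit, 0 = constant bit


def constant(value: int) -> TrackedValue:
    return TrackedValue(value & MASK32, 0)


def fully_random() -> TrackedValue:
    return TrackedValue(0, MASK32)


def and_const(v: TrackedValue, c: int) -> TrackedValue:
    # A bit cleared by the mask becomes a constant 0.
    return TrackedValue(v.known & c & MASK32, v.random & c & MASK32)


def add_const(v: TrackedValue, c: int) -> TrackedValue:
    if v.random == 0:
        return constant(v.known + c)
    lowest_random_bit = v.random & -v.random
    below = lowest_random_bit - 1          # bits that stay fully known
    return TrackedValue((v.known + c) & below, MASK32 & ~below)


def possible_values(v: TrackedValue, limit: int = 16) -> list[int]:
    bits = [1 << i for i in range(32) if (v.random >> i) & 1]
    if len(bits) > limit:
        raise ValueError("too many random bits to enumerate")
    values = []
    for combo in product((0, 1), repeat=len(bits)):
        value = v.known
        for bit, chosen in zip(bits, combo):
            if chosen:
                value |= bit
        values.append(value)
    return sorted(values)


if __name__ == "__main__":
    eax = and_const(add_const(fully_random(), 12), 0b100)
    print(possible_values(eax))
```

For '(?+12) and 100b' only one bit stays random, so two table offsets are
left to try.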

The second method that can be used to defeat a Guide is a memory
resident program. Instead of just scanning every program before execution,
it should emulate the program for a while and then let it go. This might
slow down execution too much, but it is a possibility.


Anti anti-Guide techniques.

This is fortunately very easy. Let's look at the example again.

mov 	eax,RandomValue
and	eax,100b
jmp	[eax+JumpTable]

What makes this code different from everything else? Right, it jumps
somewhere at random. What is the difference between these two pieces of
code?

ProgramStart:
and	ecx,100b
jmp	[ecx+JumpTable]

ProgramStart:
and	eax,100b
jmp	[eax+JumpTable]

The answer is that the first one is a Guide and the second one isn't.
Eax is always the same (on win9x anyway). As it isn't possible to treat
every jump as a Guide, the AVs need to know that the engine should inspect
the first example more closely but just continue at the second. The only
way to know this is to save all ways of getting a random number in a file.
Defeating a regular Guide in the future might not be so hard: just add the
random number it uses and let the engine do the rest. But... what to do
with a virus that scans the memory for random values out in the "field"?
There must be thousands and thousands of different memory locations
that are random. If the engine doesn't know these random values it won't
be able to determine if it's a Guide and will not find all infections.

Defeating a resident program is even easier: just add EPO too. They
can't emulate everything.


Anti anti anti-Guide techniques.

If you think this is the end, think again :).

How to defeat a virus that collects random numbers "in the field"?

a) Defeating random numbers that change every boot.

The solution is to scan the computer on every boot. This takes lots of
time but it is the only solution (that I can think of).

b) Defeating random numbers that change every execution.

This is a bit trickier but still possible. The solution is AI. When
the AV program finds an infected file (by using regular emulation) it
runs what it believes is the Guide multiple times and takes note of
the differences every time. The engine should then be able to find
where the virus gets the random value and add this to the database. It
is then able to find every virus using this location.
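
The "take notes of the differences" step can be illustrated with plain
differencing, which is a much simpler stand-in for the AI mentioned above.
A minimal sketch in Python, assuming the engine can record, for every run
of the suspected Guide, which locations were read and what value they
held. The function name, the trace format and the sample values are all
made up.

```python
def varying_locations(runs: list[dict[str, int]]) -> list[str]:
    """Return the locations that were read in every run but did not hold
    the same value each time. These are the candidates for the random
    source that should go into the database."""
    if not runs:
        return []
    common = set(runs[0])
    for run in runs[1:]:
        common &= set(run)
    return sorted(
        location
        for location in common
        if len({run[location] for run in runs}) > 1
    )


if __name__ == "__main__":
    # Hypothetical traces of three runs of the same suspected Guide.
    traces = [
        {"fs:[0ch]": 0x1238, "eax at entry": 0x401000},
        {"fs:[0ch]": 0x1230, "eax at entry": 0x401000},
        {"fs:[0ch]": 0x1238, "eax at entry": 0x401000},
    ]
    print(varying_locations(traces))
```

A location that never changes between runs drops out, so only the inputs
that can actually steer the jump remain.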

Anti anti anti anti-Guide techniques.

Actually, this isn't possible. With both techniques above in use, scanning
for viruses is just the same as it is today. Scanning for viruses on every
boot isn't very likely though. But maybe they will scan some files. If
they do, we don't want our virus to be in one of those (it is fucked then,
stupid). The fewer files we infect, the smaller the chance that an
infected file is scanned. Fewer infections = fewer infections though.


If you still think this idea won't work, maybe this comment (also
stolen from Understanding and Managing Polymorphic Viruses) might
change your mind.

"Virus authors might design a polymorphic virus that decrypts half the
 time, for example, yet remains dormant at other times. Anti-virus
 software could not reliably detect such a virus if it does not decrypt
 itself every time the file is loaded into the virtual computer. In this
 case, a hand-coded detection routine will be needed."