Defeating The Perfect Emulator. Written by Bhunji Intro. We virus writers have defeated heuristics, scanning, checksums and TSR blockers for a long time. Even emulation has been fought, we have won some battles but the AV's is leading the war. As code emulators is the AV's strongest weapon, having a defence would be very nice. To detect an emulator you need to use something that differs when being emulated. This could be non common instructions, the function IsDebuggerPresent or similar. All of these methods has one weakness, The Perfect Emulator (tm). The Perfect Emulator would cut through them like me cutting through my victims, fast, elegant and non detectable. The Perfect Emulator would only differ from the processor in speed. (Some viruses adds a call to the virus inside the infected program instead of changing the start address (Called EPO). This is a really good idea as an emulator need to emulate everything to find a virus like this. This is possible but today's processors is to slow to make it an useable technology. The downside is that the call might never be executed and that it's a pretty advanced (or buggy if made simple) technology ) Even if The Perfect Emulator never will exist its always possible to add some code to a good emulator after you have found a virus using a new technique. All your work with the polymorphic engine will then be useless. But... there is a way to defeat TPE. TPE has one weakness. It assumes stuff :) This is what the AV's write about generic decryption. (Understanding and managing polymorphic viruses by Carey Nachenberg, Symantec) Generic decryption assumes: * The body of a polymorphic virus is encrypted to avoid detection. * Once an infected program begins to execute, a polymorphic virus must immediately usurp control of the computer to decrypt the virus body. * A polymorphic virus must decrypt before it can execute normally. The first two points has already been defeated, the first with metamorphic and the second with EPO. This text will talk about defeating the third method. Why does generic decryption need to assume that the polymorphic virus decrypts itself? Its actually very simple . Instead of creating a detection routine for every polymorphic engine they add a search string that is inside the encrypted code. It does then rely on the virus to decrypt itself and because of that show the scanner this search string. A metamorphic virus will not have any static search string so generic decryption cant be used for those. A virus using EPO does have a search string but the decryptor is "hidden", to find a virus like this with emulation one needs to emulate everything. The third known technique is to use something that the engine isn't able to emulate, this is a stupid idea I think, it will help nothing against future emulators. The fourth very logical idea is not to decrypt the virus. If the virus isn't decrypted how the hell is the scanner supposed to find its search string. The problem is that if the virus wont run if it doesn't decrypt itself. But what if the virus decrypts itself sometimes, and sometimes not? Then the generic decryptor will have the search string sometimes but then again, sometimes not. If a virus executes only 50% of the times, only 50% of the viruses will be found with emulation. If a virus only executes only 10% of the times, only 10% of the viruses will be found. Not even emulating every program ten times will find every virus. Compare the virus as a lottery with 10% chance of winning, just because you buy ten lots doesn't mean that you will win. How often do we want the virus to execute? If the virus is memory resident it doesn't really matter if it takes one or ten times, the spreading will hardly be affected. We want the virus code to execute as seldom as possible. If the virus is a direct infector things are a bit more complex. Do we want to have a high infection rate but also a high detection rate or a slow infection rate and not get detected as often? Well, its up to you. Maybe a engine that evolves from fast to slow over time would be a good idea. I call this the Guide technique because the execution is guided to either the program or the virus. Its time for some code examples. Programstart dd OriginalRVA Virusstart dd VirusPolymorphicCodeOffset mov eax,RandomNumber and eax,100b ; eax = 0 or 4 jmp [Programstart+eax] Put this Guide at the program entry point and 50% of the viruses will not be found with emulation. A scanner can find this code though but we all know how to defeat scanners don't we :) Programstart dd OriginalRVA trash db 4 dup (?) VirusPtr dd VirusStart mov eax,RandomNumber and eax,1000b ; eax = 0 or 8 jmp [Programstart+eax] This example showes that we dont need to have the pointers next to each other. Programstart dd OriginalRVA VirusPtr dd VirusStart ... mov eax,RandomNumber and eax,111b ; eax = 0 - 7 setz al jmp [Programstart+eax*4] This Guide will only execute the virus 1/8 times, making it very difficult to detect. This code is not so secure as the previous though as an emulator that tries both possibilities of a 'setz' will find the decryption code. As you can see a Guide is pretty small which makes it even harder to scan. We need som random numbers though. By using API's we can get plenty. We need to patch the import section for this as its hard to polymorph a 'GetApi' function good. In DOS its easier as they can call int's which doesn't rely on addresses. I have unfortunately not made so much research in random numbers but I know of five that doesn't need any API. The first one is fs:[0ch] and the second is fs:[34h]. One of the random bits in fs:[0ch] is bit 4. 00000 ... ?000b | This bit is random If we 'and' fs:[0ch] with 8 we get either zero or eight. We could then use the Guide in example two. The random numbers created with 'fs' is random every time a program executes. A more stealth technique is to have a random number that only changes every boot. The virus will of course spread slower but a user will not notice anything if he scans "the wrong boot". Ecx, edx and esi are random in this way. One of the random bits in these is bit 3. A very nice add-on when using these is that the Guide will be very small. and ecx,4 jmp [JumpTable+ecx] How to defeat a Guide. Defeating a Guide is not easy. A Guide does have one negative side though, it has the possibility to return the address of the decryptor. An emulator together with heuristics is able to decide if a piece of code is a Guide. It could then run the Guide multiple times fetching all possible addresses. This is possible and very easy. Lets look at an example. mov eax,RandomValue and eax,100b jmp [eax+ProgramStart] If we emulate this we get. eax = ? eax = ? and 4 jmp [? and 4 + ProgramStart] If the emulator finds a jump depending on a random value it knows it's a Guide. A program do never jump somewhere on random. There is no way to defeat this. The Perfect Emulator wont find viruses like this though because it doesn't know that fs:[0ch] is a randomiser. The Perfect Emulator needs to be equipped with addresses and functions that returns random values. If it is, what would it do when it finds a random jump. Lets look at the following code. mov eax,12 add eax,? and eax,100b jmp [eax+ProgramStart] This is emulated as. eax = 12 eax = ?+12 eax = (?+12) and 4 jmp [(?+12) and 4 + ProgramStart] The emulator needs the whole Guide to be able to tell where the program is able to jump. Because of this it needs the offset where the guide begins. It starts parsing backwards. (Parsing backwards isn't possible but it will use a similar technique) eax = (?+12) and 4 eax = ?+12 eax = 12 This is where the Guide begins. At this point it replaces the jump with a "mov eax" and the ? with [esp+4]. It will then attach a ret at the end of the Guide. It has now turned the Guide into a function that returns an offset instead of jumping to it. mov eax,12 add eax,[ebp+4] and eax,100b mov eax,[eax+ProgramStart] ret All it has to do now is to run this code in a loop. for (i=0; i<0xffffffff; i++) AddValueToList(Guide(i)); Now it has every possible value the Guide is capable of returning. It will then start doing regular emulation at those addresses and also find the virus. Even if its possible to find a Guide the AV's need to recode their engines. As they don't give out their source I don't know if they already record which registers that depends on a random value and which don't. They do emulate the registers though. That's why it's not possible to hide from heuristics by moving data in two steps. mov ax,0012h xchg al,ah instead of mov ax,1200h This is not possible which means that they do emulate the registers. The question is how good they emulate them? There is a more elegant way to find all possible addresses a Guide returns. Instead of just mark a register as random and not random they could mark every bit in the register. They would then get something like if we use 'and ?,100b'. 0 = Constant bit 1 = Random bit ? = 00000 ... 000100b Then they just have to run the Guide twice to get all possible offsets. The second method that can be used to defeat a Guide is using a memory resident program. Instead of just scanning every program before execution it should start emulating it a while and then let it go. This might slow down execution to much though but it is a possibility. Anti anti-Guide techniques. This is fortunately very easy. Lets look at the example again. mov eax,RandomValue and eax,100b jmp [eax+JumpTable] What makes this code different from everything else? Right, it jumps somewhere on random. What is the difference from these pieces of code. ProgramStart: and ecx,100b jmp [ecx+JumpTable] ProgramStart: and eax,100b jmp [eax+JumpTable] The answer is that the first one is a Guide and the second one isn't. Eax is always the same (on win9x anyway). As it isn't possible to treat every jump as a Guide the AV's need to know that it should inspect the first example closer but continue at the second. The only way to know this is to save all ways to get a random number in a file. Defeating a regular Guide in the future might not be so hard, just add the random number it uses and let the engine do the rest. But... What to do with a virus that scans the memory for random values out in the "field". There must be thousands and thousands of different memory locations that is random. If the engine doesn't know these random values it wont be able to determine if it's a Guide and will not find all infections. Defeating a resident program is even easier, just add EPO too. They cant emulate everything. Anti anti anti-Guide techniques. If you think this is the end, think again :). How to defeat a virus that collects random numbers "in the field"? a) Defeating random numbers that changes every boot. The solution is to scan the computer every boot. This takes lots of time but it is the only solution (that I can think of). b) Defeating random numbers that changes every execution. This is a bit trickier but still possible. The solution is AI. When the AV program finds an infected file (by using regular emulation) it runs what it believes is the Guide multiple times and takes notes of the differences every time. The engine should then be able to find where the virus gets the random value and add this to the database. It is then able to find every virus using this location. Anti anti anti anti-Guide techniques. Actually, this isn't possible. By using both techniques above scanning for virii is just the same as it is today. Scanning for viruses every boot isn't very likely though. But maybe they will scan some files. If they do we don't want our virus to be in one of those (it is fucked then stupid). The less files we infect the less is the chance that an infected file is scanned. Less infections = less infections though. If your still think this idea wont work maybe this comment (also stolen from Understanding and managing polymorphic viruses) might change your mind. "Virus authors might design a polymorphic virus that decrypts half the time, for example, yet remains dormant at other times. Anti-virus software could not reliably detect such a virus if it does not decrypt itself every time the file is loaded into the virtual computer. In this case, a hand-coded detection routine will be needed."