***********************************************
Some ideas to increase detection complexity
by Second Part To Hell
***********************************************

Index:
******

  0) Introduction
  1) Improving tau-obfuscation?
  2) Reverse Engineering vs. Meta-Language in Body
  3) Code Integration -> Code Merging
  4) Overlapping Code for mutations



0) Introduction

Here you'll find a few small ideas and thoughts about making the detection of computer viruses harder. Thanks a lot to herm1t and hh86 for the discussion and for asking the right questions.



1) Improving tau-obfuscation?

The idea of tau-obfuscation is to perform a time-intensive calculation before decrypting/executing the virus code. As a result, realistic AV emulators have to give up, because they can't spend too long scanning a single file. This technique has already been covered by Beaucamps & Filiol[1] and Z0MBiE[2].

A simple example:

    encrypted_code=[ENCRYPTED CODE];
    key=sum(factors(VERY_BIG_INTEGER_NUMBER));
    eval(decrypt(encrypted_code, key));

* First question: What algorithm should be used?

Algorithms such as factorization need a lot of code and could become a source for detection themselves. Z0MBiE used an RSA algorithm, which is smaller than factorization but still big in terms of assembler instructions. Also, since it's asymmetric decryption, it has to carry both the encrypted code and the decryption key.

In MatLab.MicrophoneFever[3] I used built-in complex mathematical functions provided by MatLab, which reduced the code size. The obvious disadvantage of this method is the dependence on mathematical programs.

A simple solution is to use short Random Number Generators such as LCG or XORSHIFT, which can be written in fewer than 10 assembler instructions. With that method, the decryption key could be the n-th random number starting from a given random seed. n can be adjusted so that finding the key takes xxx seconds. To avoid X-Ray attacks, subsequent numbers can be combined to form the whole key.

* Second question: What about observant users?
Imagine the threshold tau is set to one minute. When an infected program is executed, the user would have to wait for one minute. Obviously this will smell fishy. The simplest solution would be to start the decryption engine as its own process with the lowest priority. That way, whenever the CPU is idle, the engine continues to decrypt itself. Advantage: the user won't notice anything, and the emulator would still have to invest a lot of time.

* Third question: After decryption - fully unprotected?

We could use partial decryption of the code:

    Get 1st key with tau-obfuscation
    Decrypt 1st part
    Execute 1st part
    Re-encrypt 1st part

    Get 2nd key with tau-obfuscation
    Decrypt 2nd part
    Execute 2nd part
    Re-encrypt 2nd part

    ...

    Get n-th key with tau-obfuscation
    Decrypt n-th part
    Execute n-th part
    Re-encrypt n-th part

The virus will never be fully decrypted in memory - it never loses its shield.

* Fourth question: Suspicious single loop?

What if antivirus programs mark a short loop that runs for a long time as suspicious? Simple: instead of searching for one key after N loops of an RNG engine, we can search for m keys after N/m loops each, and use each key to encrypt one of the m parts of the virus body.

* Fifth question: Can I use it only for encryption?

We can use this technique for general obfuscation, not just encryption. Examples:

    bignum=BIG_SPECIAL_NUMBER;
    jmpvalue=add(factors(bignum))%pow(2,32);
    jmp dword[jmpvalue]

or

    bignum=BIG_SPECIAL_NUMBER;
    datavalue=add(factors(bignum))%pow(2,32);
    mov dword[eax], datavalue

We see that tau-obfuscation can be fun for us and a pain for them. :)

[1] Philippe Beaucamps & Eric Filiol, "On the possibility of practically obfuscating programs - Towards a unified perspective of code protection", Journal in Computer Virology, April 2007.
[2] Z0MBiE, "'DELAYED CODE' technology (version 1.1)", 2000, http://vxheavens.com/lib/vzo23.html
[3] SPTH, "Matlab.MicrophoneFever2", Valhalla Magazine, July 2011.



2) Reverse Engineering vs. Meta-Language in Body

Metamorphic viruses/worms need the information about their structure coded in a meta-language, so they can work with it later (change it and write it back to native code). One way to get it is by reverse engineering (disassembling) the code.

- -

Biological organisms need the information about their structure coded in a meta-language to work with it later (due to the lack of a "copy function"). They could also use a mechanism that reverse engineers the structures in the cell to get this information. They don't do this, because it's way too complicated. Instead, they store the whole information within the cell in the form of the meta-language (DNA), and can therefore start directly at this step.

For computer viruses, the meta-language structure must not appear in plain text, and simple encryption is vulnerable to statistical attacks. Instead, one could write the zero-form to memory at runtime:

    mov edi, Alloc_memory_for_metalanguage
    mov dword[edi], 'AABBCCDD'
    mov dword[edi+4], 'EEFFGGHH'

Advantage: This writing process is an excellent source for metamorphic mutations. It increases the variability of the organism a lot, and with that, the detection complexity.

We can be funny and add simple encryption to the written memory:

    mov edi, Alloc_memory_for_metalanguage
    mov dword[edi], 'XXYYZZAA'
    mov dword[edi+4], 'BBCCDDEE'
    ...
    for(int i=0; i



3) Code Integration -> Code Merging

Code integration is certainly the most complex infection technique for computer viruses so far. It was first used in ZMist by Z0MBiE for Win32 executables in 2001[4][5], and later in 2007 by herm1t in his Linux.Lacrimae[6][7].
The idea is to fully disassemble the host and the virus, and integrate the virus code into the host code:

***************           #####################
*             *           ##                 ##
*      H      *           ##  jmp Vir1       ##
*             *           ##  Host1:         ##
*      O      *           ##      H          ##
*             *           ##  jmp Host2      ##
*      S      *           ##  Vir3:          ##
*             *           ##      R          ##
*      T      *           ##  jmp Host1      ##
*             *           ##  Host2:         ##
***************           ##      O          ##
                 - - - >  ##  jmp Host3      ##
+++++++++++++++           ##  Vir1:          ##
+             +           ##      V          ##
+      V      +           ##  jmp Vir2       ##
+             +           ##  Host3:         ##
+      I      +           ##      S          ##
+             +           ##  jmp Host4      ##
+      R      +           ##  Vir2:          ##
+             +           ##      I          ##
+++++++++++++++           ##  jmp Vir3       ##
                          ##  Host4:         ##
                          ##      T          ##
                          ##                 ##
                          #####################

This is a successful technique. However, we can try to take it one step further. Instead of just inserting the virus between the host code, we can actually use the host code as virus code, by creating a second code flow. Let's say we want to include a simple

    invoke MessageBox, 0x0, VMSG1, VMSG2, 0x0

into a given host code:

[ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ]
include 'E:\Programme\FASM\INCLUDE\win32ax.inc'

.data
  FileName        db 'info.txt',0
  hCreateFileFile dd 0x0

.code
start:
  push 0x0
  push FILE_ATTRIBUTE_NORMAL
  push OPEN_ALWAYS
  push 0x0
  push 0x0
  push (GENERIC_READ or GENERIC_WRITE)
  push FileName
  stdcall dword[CreateFileA]
  mov dword[hCreateFileFile], eax
  ret
.end start
[ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ]

To get enough instructions that we can use, we can expand the host code:

[ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ]
include 'E:\Programme\FASM\INCLUDE\win32ax.inc'

.data
  FileName        db 'info.txt',0
  hCreateFileFile dd 0x0

.code
start:
  push 0x0
  mov eax, FILE_ATTRIBUTE_NORMAL
  push eax
  push OPEN_ALWAYS
  push 0x0
  push 0x0
  mov eax, (GENERIC_READ or GENERIC_WRITE)
  push eax
  mov eax, FileName
  push eax
  mov eax, CreateFileA
  stdcall dword[eax]
  mov ebx, hCreateFileFile
  mov dword[ebx], eax
  ret
.end start
[ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ]

And now let's merge our MessageBox with this
host code:

[ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ]
include 'E:\Programme\FASM\INCLUDE\win32ax.inc'

.data
  FileName        db 'info.txt',0
  hCreateFileFile dd 0x0
  VMSG1           db 'Hello',0
  VMSG2           db 'VXers!',0

.code
start:
  xor ecx, ecx                    ; Set ZF
  jmp VirInstr1

HostInstr0:
  push 0x0
  mov eax, FILE_ATTRIBUTE_NORMAL
  jnz HostInstr1

VirInstr3:
  add eax, (VMSG1-FileName)
  xor ecx, ecx                    ; Set ZF
  jmp VirInstr4

HostInstr1:
VirInstr6:
  push eax
  jz VirInstr7
  push OPEN_ALWAYS

VirInstr7:
  push 0x0
  jz VirInstr8

VirInstr1:
  push 0x0
  jz VirInstr2
  mov eax, (GENERIC_READ or GENERIC_WRITE)

VirInstr4:
  push eax
  jz VirInstr5

VirInstr2:
  mov eax, FileName
  jz VirInstr3
  push eax
  jnz HostInstr4

VirInstr10:
  inc ecx                         ; Clear ZF
  jmp HostInstr0

HostInstr4:
  mov eax, CreateFileA

VirInstr9:
  stdcall dword[eax]
  jz VirInstr10
  jnz HostInstr2

VirInstr5:
  add eax, (VMSG2-VMSG1)
  xor ecx, ecx                    ; Set ZF
  jmp VirInstr6

HostInstr2:
  mov ebx, hCreateFileFile
  jnz HostInstr3

VirInstr8:
  add eax, (MessageBox-VMSG2)
  xor ecx, ecx                    ; Set ZF
  jmp VirInstr9

HostInstr3:
  mov dword[ebx], eax
  ret
.end start
[ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ]

We use the instructions given by the host code and combine them with conditional jumps. The only instructions that are not merged are some re-adjustments of addresses (MessageBox, VMSG1, VMSG2). In fact this could be done by merging too, but the result would be more complex.

Besides making the code hard to recognize (even for the human eye), it provides a lot of freedom, which can be used to alter the code after every generation:

    which instructions are expanded;
    which registers are used for expansion;
    what the code flow of the virus looks like;
    ...

In my opinion: absolutely worth bringing to reality!
:)

[4] Z0MBiE, "Automated reverse engineering: Mistfall engine", 2000, http://vxheavens.com/lib/vzo21.html
[5] Peter Ferrie & Péter Ször, "Zmist Opportunities", Virus Bulletin, March 2001, http://vxheavens.com/lib/apf47.html
[6] herm1t, "Code integration on Linux: Cooking the PIE", EOF-DR-RRLF, 2008.
[7] Peter Ferrie, "Crimea river", Virus Bulletin, February 2008, http://vxheavens.com/lib/apf12.html



4) Overlapping Code for mutations

Overlapping code consists of code segments that behave differently depending on where execution enters them. For instance:

    00402000 > $ B8 31C04040       MOV EAX,4040C031

What happens if we jump to 00402001?

    00402001 >   31C0              XOR EAX,EAX
    00402003 .   40                INC EAX
    00402004 .   40                INC EAX

This can be used in a vast variety of ways for obfuscation (in 1994, Stormbringer wrote a virus that consists only of jump instructions, using overlapping code[8]) or for code protection[9]. Of course, it can be used in mutation engines too, giving additional variability. Some examples:

Our code:

    00402000 > $ 31C0              XOR EAX,EAX
    00402002 .   40                INC EAX
    00402003 .   40                INC EAX

Overlapped code:

    00402000 > $ 68 11204000       PUSH overlap_.00402011
    00402005 .   68 0C204000       PUSH overlap_.0040200C
    0040200A .   81F7 31C040C3     XOR EDI,C340C031
    00402010 .   C3                RETN
    00402011 .   40                INC EAX

or

    00402000 > $ B8 31C04040       MOV EAX,4040C031
    00402005 .   3D 31C04040       CMP EAX,4040C031
    0040200A .^  74 F5             JE SHORT overlap_.00402001

or

    00402000 > $ EB 02             JMP SHORT overlap_.00402004
    00402002 .   81FE 31C04040     CMP ESI,4040C031

There are over 9000 other ways to write the original instructions using overlapping code. One may consider this when planning the next mutation engine.

[8] Stormbringer, "Jump", 40hex #14, 1994.
[9] Matthias Jacob, Mariusz H. Jakubowski & Ramarathnam Venkatesan, "Towards Integral Binary Execution: Implementing Oblivious Hashing Using Overlapped Instruction Encodings", 2007.



Second Part To Hell
July 2011