Stealth api-based decryptor
kaze / FAT

Stealth api-based decryptor
___________________________

kaze/FAT

http://fat.next-touch.com



Table of Contents

 1. Introduction
 2. The idea
 3. The decryptor
 4. "Stealth" IAT patching
 5. Adding some semantic junk
    5.1. Finding safe APIs
    5.2. Adding random api calls to the decryptor
 6. Decryptor fragmentation
 7. Polymorphism
    7.1. Kpasm
    7.2. The poly rules
 8. Obfuscation through relocations
 9. Results & Conclusion



1. Introduction
_______________


This paper is a (partial and crapy) translation of the original paper written in  french. I take no
responsibility for the brain injuries caused by my awful english. You have been warned.



The main thing i really enjoy in virus writing is  neither  spreading  nor  weird  target  platform
infection, it's just AV detection evading. And  when I  say  stealth, i don't  mean  "kill  any  AV
running on  the  victim's OS", I  mean: not  detected. But  to  be  honest, writing  a  long-enough
undectetd virus begins to be a real challenge. Nowadays, even the most advanced  poly  engines  get
detected in a  few  days. A  few  years  ago, some  little  tricks  like  including  big  loops  in
decryptors, generating a lot of junk or using uncommon opcodes  could  fool  some  of  the  weakest
emulators. But new techniques like code normalization can detect easily such  tricky  polymorphized
decryptors.



I'll try to present here a new approach to evade av detection. Instead of increasing the complexity
of the decryptor, as most of the actual poly engines tend to, we will try to build a decryptor that
looks as common as possible, hopping for the AV to cancel emulation. We will try  to  increase  the
risk of false positive during virus detection. This approach has been implemented in my last virus,
win32.leon, which can be found in the virus section of this emag.




2. The idea
___________


To defeat or slow down actual emulators, I propose a new approach: not a breaking-through  one, but
I never saw it in a working virus. It's based on two hypothesis:



* Current emulators don't emulate APIs. At least, not all the APIs

* If the emulated code looks like a traditionnal app and unemulable APIs calls are being
  analyzed, emulators will be likely to cancel the scanning




Those hypothesis may be a bit too strong. Some av products, like sandboxes, emulate  the  whole  OS
environnement. But those two hypothesis are likely to be the reality  for  desktop  antivirus, that
can't spend more than a bunch of seconds for an executable analyzis.



Usual virus decryptors use to be as complex and as hard to emulate as possible. What  we  will  try
here instead is to build a decryptor that looks like a common  application, an  harmless  chunk  of
code. Our decryptor will only use standard win32 api calls. No xor  nor  uncommon  opcodes. A  very
few number of viruses use APIs in their decryptor, and they're often  used  as  junk  code. If  AVs
don't emulate them, they can still ignore them: no good. And in most of the cases, the API sequence
used by the virus is constant, leaving the possibility to perform behavioural detection.



3. The decryptor
________________


There are several ways to build a 100% api-based decryptor, you just need to find a set of api able
to perform simple encryption. I choosed the CryptoApis for leon, but other api sets should  do  the
job. For example, the BitBlt API with it's xor capability may be a good choice too (even  a  better
one, as the XOR operation is done by the graphic card chipset, not the  cpu). But  simpler, better,
let's look at our candiate :



   csp dd ?
   hash dd ?
   key db 48 dup (?)
   hkey dd ?

   call CryptAcquireContext, offset csp,0,0,PROV_RSA_FULL,CRYPT_VERIFY_CONTEXT
   call CryptCreateHash, csp,CALG_MD5,0,0,offset hash
   call CryptHashData, hash, offset key,4,0
   call CryptDeriveKey, csp,CALG_RC4,hash,48,offset hkey
   call CryptDecrypt, hkey,0,1,0,start_of_virus,size_of_virus,start_of_virus



This simple bit of code will decrypt our virus through the RC4 symmetric  algorithm. More  info  on
the CryptoApis can be found easily on the msdn library. In  fact, win32.leon  has  two  decryptors,
this one and a standard xor decryptor, but the xor decryptor is rarely used, and we'll  focus  here
on the cryptoapis based one. We will first see how to build an api-based  decryptor, and  then  how
to make it as stealth as possible.



4. "Stealth" IAT patching
_________________________


In order to use APIs in the decryptor, we need to  gather  the  used  APIs  adresses, nothing  new.
Usually, in a virus, API adresses are obtained at runtime from a memory scan, by parsing the export
table of the DLLs. As we want our decryptor to look as harmless as possible, looking  for  the  api
adresses in memory at runtime won't be considered. Instead, we will modify  the  infected  host  to
make it import the wanted APis for us . This can be done by patching the host's  IAT  at  infection
time, in this way:



1. Modify the IAT RVA in the host's data directory to point to  some  free  space. For  example, we
   can allocate an extra 4kb space in the last section by resizing the section. Copy  the  original
   host's IAT to this free space

2. Add an extra Image Import Descriptor for the DLL hosting the APIs used in the decrpytor. For the
   CryptoApis, it will be an IID for advapi32.dll. If an IID for advapi32 was already in the host's
   IAT it won't matter: a PE containing two IIDs for the same DLL is still a valid PE.

3. Add to this IID the names (in the First Thunk and Original First Thunk) of the APIs used in  the
   decryptor: "CryptAcquireContext", "CryptCreateHash", etc.

4. Remember the virtual adress of the First Thunk of this IID. It will contain the adresses of  the
   wanted APIs when the infected host is loaded in memory by the  windows  loader. For  example, if
   CryptAcquireContext is the second imported API, the call  CryptAquireContext  in  the  decryptor
   will be replaced by call [adress_of_first_thunk+4].




I won't show here the structure of the Import Image Descriptor, as it has been already explained in
a bunch of papers: if you're lost, go get some documentation about the PE format. Source  code  for
this task can be found in the functions add_iid_decrypteur and deal_imports  of  fusion_imports.asm
in win32.leon source code.



Thanks to IAT patching, we can now call APIs in the decryptor without having to look for the  API's
adresses in memory, that is, just like  a  traditionnal  application. But  this  algorithm  can  be
improved a bit thought: the IID we just added in the IAT is too much  constant. In  fact, it  could
lead to potential signatures for the AVs. To make it a bit stealthier, I added  two  little  tricks
in win32.leon:



* The API strings (referenced by the FirstThunk and the OriginalFirstThunk of the  extra  IID)  are
  stored at random locations in the  host  file. This  is  done  to  avoid  simple  signature  like
  "CryptAcquireContext+CryptCreateHash+…+CryptDecrypt".

* Some random advapi32 API (APIs we don't need in the decryptor) are added to the  extra  IID. This
  done to avoid an heuritic detection: without this trick, our extra IID  would  alway  import  the
  same five APIs in the same order: CryptAcquireContext .. CryptDecrypt. Even if  the  API  strings
  are stored at random locations in the host file, it could be detected through heuristic.



The first algorithm is very simple: the only thing you have to do  is  to  find  (or to make)  some
random located free spaces in the host and to store the API strings there. Those  free  spaces  can
be located in the virus body, but be careful to let enough space between the API strings. And  make
sure that the spaces between two API strings aren't constant bytes, as it could lead to a potential
signature too.






The second trick isn't harder to implement. What you have to do is to keep a list of  advapi32  API
strings (or whatever dll your decryptor uses) inside your virus, and add a  random  set  of  thoses
string at infection time into the extra IID (well, into the FirstChunk  and  OriginalFirstThunk  of
the IID). Don't hesitate to insert a lot of different APIs into the extra IID, but make sure:



* Those APIs are inserted between the APIs you really use in your decryptor.

* The API you add are cross-compatible. It would be a mess if your virus couldn't spread  to  win2k
  because you added an import to an API only aviable on winXP.



Again, I won't show source code for this as this is a trivial task. If you're very curious, take  a
look to the other functions in fusion_imports.asm in win32.leon source  code. The  cross-compatible
advapi32 APIs list is stored in the file  apis_advapi_compat.inc. Of  course  this  list  has  been
computed, I didn't test APIs one by one ;)



5. Adding some semantic junk
____________________________


Now, we have a 100% api-based decryptor that looks pretty harmless. It could be still  detected  in
two ways :



* Through standard signature. I will present the poly engine later.

* Dynamically, through the sequence of api calls in the  decryptor. In  fact, our  decryptor  will
  always perform the same api calls  sequence: CryptAcquireContext, CryptCreateHash … CryptDecrypt.
  No good.



Because we don't want our decryptor to be detected, even dynamically, we will try to introduce some
randomness to the API calls sequence of the virus decryptor. We will insert random "safe API" calls
between our effective api calls in the decryptor. By "safe API", I mean an API  we  can  call  with
random-value parameters and that won't do anything besides returning an error code. For example the
API CloseHandle: if we call CloseHandle(random()) under Windows  XP  or  2000, the  API  will  just
return (99.999% of the time) an  harmless  error  code  into  eax, nothing  more. No  side  effect,
nothing. So the idea is to insert calls  to  such  harmless  APIs  into  our  decryptor, at  random
locations.



5.1. Finding safe APIs
______________________


Before adding safe api calls into our decryptor, we have to find witch win32 API is a  "safe"  one,
witch APIs can we execute without a risk. It can be a painful task if we test them by hand  one  by
one. Would be nice if a program could compute that for us. That's why I wrote a  little  tool  that
tests all APIs on a given OS with random parameters, and write down the APIs that don't  crash, i.e
the safe ones. Again, as this tool is very simple, I won't paste source code. It can  be  found  in
this emag too (safeapisdetector). To be short, what it does is:



1. List all of the APIs in the system dlls (kernel32, user32, advapi32, gdi32, etc.)

2. For each API, find the number of parameters of the API. This is done by calling the API with
   just say 20 parameters on the stack, call the API, and look how many dword have been  popped
   from the stack. This number is the number of parameteres for the API.

3. Call the API many times, each time with different random parameters.

4. If no exception has been thrown and if we still haven't crashed, this API is a safe one.




Then, this tool write the list of (crc of the name of safe api, number of parameters)  into  a .inc
file that can be directly included into a .asm file. I ran it on several  OS  (win XP SP0, SP2, SP2
Pro and 2000) and kept the intersection of all the safe APIs that the tool found  on  each  system.
The result can be found into the file  fake_apis.inc  from  win32.leon  source  code. Here  is  the
beginning of that file:



list_safe_apis:
        db "kernel32.dll",0
        dd 034EEF5CFh, 1 ; AddAtomA
        dd 085870330h, 1 ; AddAtomW
        dd 0E8EE9923h, 3 ; AddConsoleAliasA
        dd 07B5E9926h, 3 ; AddConsoleAliasW
        dd 0E9FA4F67h, 0 ; AllocConsole
        dd 0E35DCCE1h, 0 ; AreFileApisANSI
        ...




For windows XP and 2000, something like 66% of the APIs are safe ones. Under Vista it's about 5-10%
as most of the APIs throw an exception when called with  wrong  parameters. But  I  don't  care, as
win32.leon targets XP/2000 only.



5.2. Adding random api calls to the decryptor
_____________________________________________


We now have a list of cross-compatible safe apis, that we can call from our decryptor, embedded  in
our virus. The next step is to modify the decryptor. We will insert api calls to some of those safe
apis at random locations into the decryptor. This  is  done  at  infection  time, just  before  the
decryptor gets polymorphized. But again, in order to call those APIs in the decryptor, we  have  to
make sure the host import them. In win32.leon, I choosed to use only already-imported  APIs. As  my
safe-API list is a big one (more than 500 cross-compatible safe APIs), we are nearly sure that  the
infected host will import at least two or three of them.



After semantic polymorphism, our api-based decryptor (for a given generation of the virus) may look
like:




   call CreateFiber, random(),random(),random()
   call CloseHandle, random()
   call CryptAcquireContext, offset csp,0,0,PROV_RSA_FULL,CRYPT_VERIFY_CONTEXT
   call TlsAlloc
   call CryptCreateHash, csp,CALG_MD5,0,0,offset hash
   ...



6. Decryptor fragmentation
__________________________


In order to obfuscate a bit more the  virus, Win32.leon's  decryptor  is  fragmented  into  several
chunks of code. Each chunk contains the code for a (fake or not) API call. Those chunks are written
at random locations into the host, the first chunk being located at the entry point of the infected
PE. They overwrite the host data (or code), data that are saved into the virus body. Those data are
of course restored when the virus exits, just before jumping to the infected program.



The location of each chunk is choosed carefully, in order to avoid  the  host  corruption: the  main
PE structures won't be overwritten by any chunk. In leon's source code, the module fragmentation.asm
will list all the important structures of the PE (IAT, EAT, ressources, tls, etc.) and choose N safe
locations for the chunks (where N=number of chunks=number of api calls in the decryptor).







In order to loose a bit the emulator, the jump from chunk #k to chunk #k+1 is a bit  obfuscated. If
the av doesn't know the number of parameters of the api used in chunk #k (and if the api itself  is
not emulated of course), it won't be able to locate chunk #k+1. Lets look at an example:








7. Polymorphism
_______________


Our api-based decryptor may be able to bypass emulation, but it is still vulnerable  to  signature-
based detections. A quick solution is to polymorphize the decryptor code, but  it  should  be  done
carefully. As we want our decryptor to look as harmless  as  possible, standard  engines  won't  be
considered: most of them produce "strange"  code, uncommon  opcodes  and  are  quickly  flagged  as
"suspicious" by AV  emulators. Instead, the  poly  engine  for  such  decryptors  should  focus  on
"standard" code production.



7.1. Kpasm
__________


As i'm a lazy person, I didn't want to build the poly engine for leon entierly in asm. A good  poly
engine is an engine with a lot of obuscation rules, and writing all those rules in asm is a painful
task. Instead, I created a tool to help me in the creation of my poly engine. This  tool, kpasm, is
like a specialized compiler designed to build poly engines. The obfuscation rules are written in  a
high level language that looks like C, with specialized instructions for polymorphism. To be  short,
an obfuscation rule may look like:



mov reg,cst <=> mov reg,0; add reg,cst



This is not Kpasm syntax, but the idea is there: rules are described in a short and elegant manner,
while poly engine implementation is abstracted. Rules may be of course  more  complicated  (use  of
random registers, jumps, loops, memory reads and writes etc.). From those rules, kpasm will produce
the source code (asm source code) of the poly engine that will apply those user-defined obfuscation
rules.







I won't describe kpasm here as it is a complex tool. User guide (in french) can  be  found  on  the
FAT website, as well as binaries and examples. An english version may be  avaible  in  the  future.
Using kpasm permits to build a lot of obfuscation rules very quickly, and that's what is  important
for the kind of decryptor we want to build. For example, the poly engine  of  win32.leon  has  been
built in 2-3 hours (see regles.kpasm in source code).



7.2. The poly rules
___________________


As we want to increase the risk of false positive for AVs, poly rules must be written carefuly. The
idea is to apply a lot of small  obfuscation  rules  that  produce  short  standard  code. Uncommon
operations like push reg / junk reg / pop reg should be avoided, as well as uncommon  opcodes  like
stc, clc etc. The polymorphized decryptor of win32.leon only contains:



* Standard operations: mov, add, sub, lea, cmp, jmp, push, pop, etc.

* Api calls: junk apis and useful ones

* Always false tests followed by jumps to host code. Those tests must not be too much trivial

* Memory accesses: reads, writes and read of previsouly written values. Most of the memory
  access are "useful" for the virus, i.e not junk. A big piece of code without any meaningful
  memory access would look suspicious

* Small junk loops




The decryptor should not be polymorphized too much tho, as unoptimized code always looks suspicious.
In win32.leon, only a few steps of polymorphism are done: not all  opcodes  are  polymorphized  and
polymorphized code is not too big. But  two  decryptors  from  different  generations  are  totally
differents, and that's what matter. All the rules  are  described  in  kpasm  syntax  in  the  file
regles.kpasm. Below is an example of a polimorphised (fake) api chunk of the decryptor. Just infect
some executable with win32.leon to look at more samples.




01008A61 > $ C705 6F3C0801 >MOV DWORD PTR DS:[1083C6F],kazerege.0100>
01008A6B   . FF35 6F3C0801  PUSH DWORD PTR DS:[1083C6F]
01008A71   . 8B35 D33C0801  MOV ESI,DWORD PTR DS:[1083CD3]           ;  kazerege.01069704
01008A77   . 2B1D 733C0801  SUB EBX,DWORD PTR DS:[1083C73]
01008A7D   . 56             PUSH ESI                                 ; /hTemplateFile => 01069704
01008A7E   . FF35 433D0801  PUSH DWORD PTR DS:[1083D43]              ; |Attributes = HIDDEN|SYSTE
01008A84   . BF 239C70DD    MOV EDI,DD709C23                         ; |
01008A89   . BB FD350801    MOV EBX,kazerege.010835FD                ; |
01008A8E   . 2B15 BB3C0801  SUB EDX,DWORD PTR DS:[1083CBB]           ; |
01008A94   . 8BB3 B2060000  MOV ESI,DWORD PTR DS:[EBX+6B2]           ; |
01008A9A   . 57             PUSH EDI                                 ; |Mode => DD709C23
01008A9B   . C705 E73D0801 >MOV DWORD PTR DS:[1083DE7],8BAE08D6      ; |
01008AA5   . BB 093D0801    MOV EBX,kazerege.01083D09                ; |
01008AAA   . 8B53 0A        MOV EDX,DWORD PTR DS:[EBX+A]             ; |
01008AAD   . FF35 E73D0801  PUSH DWORD PTR DS:[1083DE7]              ; |pSecurity = 00E2DA48
01008AB3   . 68 F73848DB    PUSH DB4838F7                            ; |ShareMode = FILE_
01008AB8   . 68 A2A36A41    PUSH 416AA3A2                            ; |Access = GENERIC_WRITE|16
01008ABD   . FF35 5B3D0801  PUSH DWORD PTR DS:[1083D5B]              ; |FileName = 0035DB23 ???
01008AC3   . FF15 B0110001  CALL DWORD PTR DS:[<&KERNEL32.CreateFile>; \CreateFileW
01008AC9   . 813D 233D0801 >CMP DWORD PTR DS:[1083D23],9DD8D33
01008AD3   .^0F84 9D9CFFFF  JE kazerege.01002776
01008AD9   . C3             RETN




As you can see, the polymorphized decryptor chunks don't look  very suspicious: pushs, pops, memory
access, that's all. Of course  all  registers, instructions  and  memory  access  change  from  one
generation to another.



8. Obfuscation through relocations
__________________________________


Our stealth api-based decryptor can be even be more obfuscated, without looking too much suspicious.
A good technique to obfuscate the decryptor (besides polymorphism) without introducing  weird  code
ike xor loops, is the "encryption through relocations"  technique. This  technique  is  a  old  one,
first presented by TCP in 29A#5 and used in his  resur  virus  and  my  sankey  (aka slicer)  virus.
I won't describe the original technique here, as it's not original anymore.



Though, some additional work has to be done in order to use this  old technique  under  winXP/2000.
The main difference between win9x and xp/2000 for the PE relocation process  is  that  the  default
imagebase under xp/2000 is no longer 0x400000 but 0x10000. It implies that in  order  to  use  this
technique, we first have to relocate the infected executable, because the imagebase of the  process
(often 0x400000) differs from the default imagebase (0x10000) under 2k/XP.



A consequence is that this technique will only be possible  for  executable  which  have  a  .reloc
section (~5% of the executables). But 5% is better than nothing, isn't it ? I  won't  describe  the
whole technique here: the idea is the same as the old one, and for the differences under winxp/2000
(relocating the host), just take a look at the file relocation.asm in win32.leon source code  (this
is a trivial task).



This technique allow us to have some chosen dwords of the virus decryptor encrypted  on  the  disk.
Those dwords are decrypted by the  windows  loader  when  applying  relocations : no  need  to  use
additional decryptor code, windows decrypts our code for us. Let's look at an example:







Finally, we have a 100% api-based polymorphic decrytor, fragmented, featuring fake  api  calls  and
sometimes encrypted through the reloc technique. It's likely that actual AVs engine won't  be  able
to emulate the decryptor. The only flaw is see is located  in  the  poly  engine: it  must  feature
enough obfuscation rules to avoid signature-based detections. Actual  poly  engine  may  be  a  bit
"light", but thanks to kpasm, it shouldn't be hard to add a lot of polymorphism rules to the engine.



9. Results & Conclusion
_______________________


The few techniques presented in this paper or not killer ones. But put together, they are likely to
make life hard for the AVs. As leon is a proof of concept, i didn't spend a lot of time on the poly
rules (well, the poly is still better than a lot of  engines). But  even  with  this  reduced  poly
engine, it took more than three weeks for the AVs  (after i sent them samples)  to  detect  it, and
with a bad detection rate (~80% for sophos, less than 20%  for  the  others). After  6  months, the
detection rate is about 95% for sophos and some others, while most of AVs, e.g KAV, stay  near  15%
(source: virustotal). Maybe the reason for those bad detection rates is because  the  virus  hasn't
been spread into the wild, so avers don't care ?

The main goal has been achieved though: the virus decryptor is not emulated (i did some tests  with
well-known malwares embedded into win32.leon: still not detected). Sophos  detection  seems  to  be
signature based (the signature for win32.leon is ~40k big, seems that they detect win32.leon  tanks
to a lot of different decryptor signatures, but I may be wrong as I didn't ask them :). I won't say
it's a win, infact further tests with a slightly improved  poly  engine  should  be  done  to  draw
conclusions.  But  I  still  believe  that  stealth  api-based  decryptor  are  the  solution   for
undetectability in future viruses. Todays AVs engines are good, and what we have left  is  to  play
with the false positive rate.

Finally, I hope that you enjoyed reading this short paper. Greets go to all FAT members, as well as
ex 29a & IKx members for their past work. Thanks to izee too, for taking the time to read  my  poor
english ;) Don't hesitate to mail me comments or questions.




kaze/FAT


Last updated 2008-07-26 20:18:22 Paris, Madrid