*******************************************
Code Mutations via Behaviour Analysis
by Dr. Theodore Spankman
*******************************************
1) Introduction
2) Basics - Register Operators only
2.1) Analyse me: Local Behaviour Table
2.2) Create Random Code
2.3) Compare me: Blackbox-Analyse
2.4) More Freedom: Global Behaviour Table
3) Extentions
3.1) Including the Stack
3.2) Including Memory
4) Possible Extentions
4.1) Hide me: Global Behaviour Table at Runtime
4.2) Permutation - easy as it can be
4.3) Jump somewhere
4.4) Behaviour Flow
4.5) Automatic shrinking/extention/trash
5) Conclusion
1) Introduction
The basic idea is: The file analyses the behaviour of its own code and
compares it with the behaviour of a randomly generated code. If the
behaviour is the same, the original file-code will be substituted by the
new random code.
This article describes how this idea can be realized.
2) Basics - Register Operators only
2.1) Analyse me: Local Behaviour Table
The program's code is splitted into 8byte blocks of code, padded with
NOPs). With that knowlegde we can parse the whole file, and analyse
the behaviour of each 8byte part.
We only use Register operators for the moment. When we run an unknown
code, all that can be changed are the 8 Registers and FLAGS Registers.
We define a table consisting of 8+1 * 4 bytes:
- - - - - - - - - - - - - - - - - -
Local Behaviour Table (LBT):
Each behaviour Block consists of:
Registers: 9*4=36 byte
0x00 dd EAX
0x04 dd EBX
0x08 dd ECX
0x0C dd EDX
0x10 dd EBP
0x14 dd ESI
0x18 dd EDI
0x1C dd FLAGS
0x20 dd ESP
- - - - - - - - - - - - - - - - - -
Before running the unknown code, we set all registers (except of EPS)
to zero. After running the code, we save the change of the registers:
- - - - - - - - - - - - - - - - - -
sub dword[LBT+0x00], eax
sub dword[LBT+0x04], ebx
...
sub dword[LBT+0x20], esp
- - - - - - - - - - - - - - - - - -
(For the flag-register: pushfd/pop Reg32)
With that table, we know the principal behaviour of out code. Thats
all we need for the moment.
2.2) Create Random Code
We want to create an 8 byte random code. The best way is to use a
template system. One template could look like this:
- - - - - - - - - - - - - - - - - -
OPAlImm8:
cmp ebp, (RandomCodeBufferSize-2)
jg EndCRCCycle
mov ebx, 9
xor ecx, ecx
call CreateSpecialRndNumber
mov byte[RandomCode+ebp], 0x04 ; add al, Imm8
cmp dl, 0
je OPAlImm8Cont
...
mov byte[RandomCode+ebp], 0x3C ; cmp al, Imm8
cmp dl, 7
je OPAlImm8Cont
jmp OPAlImm8End
OPAlImm8Cont:
mov byte[RandomCode+ebp+1], dl
add ebp, 0x2
jmp EndCRCCycle
OPAlImm8End:
- - - - - - - - - - - - - - - - - -
One important restriction is the ESP. The Stack Pointer must stay in
the reserved Stack Memory (from [fs:0x04]-[fs:0x08]). Therefore the
RCG must not be the source of MOV, ADD, ... . INC and DEC are usually
OK if you restore the ESP variable after running the unknown code.
2.3) Compare me: Blackbox-Analyse
We could create a Local Behaviour Table for the file code and the
random code now, and compare the tables. Thats all?
No! We have used an initial condition (setting all registers to zero),
thus the LBT is dependent on that. For example these codes would have
the same LBT:
mov eax, 1 inc al
inc bl inc bl
Or this one:
mov eax, ebx xor eax, eax
That means, when we found equivalent LBTs, we have to make further
tests using different initial conditions.
We can change the registers and the flags to random values, and create
new LBTs with that new conditions. If we run that test with always
changing conditions for a big number of times (I used 100 and 1000 for
register only and flags tests, respectivly), and every single LBT of
the programs code and the random code is equal, we found a code with
same behaviour, and can substitute the old one.
2.4) More Freedom: Global Behaviour Table
It is a very strict restriction to the new random code that it has to
have exactly the same behaviour. Usually its not required to have have
the same behaviour in all registers, most are unused anyway.
We could use this additional freedom, and creating a Global Behaviour
Table (GBT) and compile-time, giving the code the information of which
registers have to be the same and which can be different.
Its a good idea to use a macro and symbolic variables for that
(everything else would be insane):
- - - - - - - - - - - - - - - - - -
; Restrictions:
; A=EAX, B=EBX, C=ECX, D=EDX, P=EPB, S=ESI, I=EDI, F=FLAGS
rNoRes EQU 00000000b
rA EQU 00000001b
rB EQU 00000010b
rAB EQU 00000011b
rC EQU 00000100b
...
rABCDPSIF EQU 11111111b
GlobalBehaviourTableList equ
macro cc restr*, instr*, op1, op2
{
; Macro for padding commands to 8byte each and for
; adding the restrictions to the GlobalBehaviourTableList
local StartCommand, EndCommand
StartCommand:
if op2 eq
if op1 eq
instr
else
instr op1
end if
else
instr op1,op2
end if
EndCommand:
times (8-EndCommand+StartCommand): nop ; padding
match any, GlobalBehaviourTableList ; Add further elements to list
\{
GlobalBehaviourTableList equ GlobalBehaviourTableList,restr
\}
match , GlobalBehaviourTableList ; Add first element to list
\{
GlobalBehaviourTableList equ restr
\}
}
- - - - - - - - - - - - - - - - - -
There is no restiction for ESP, as this register always has to be
equivalent in file code and random code.
One simple example:
- - - - - - - - - - - - - - - - - -
cc rNoRes, nop
cc rA, mov, eax, 0x0 ; eax=0
cc rA, inc, eax ; eax=1
cc rA, add, eax, 2 ; eax=3
cc rA, nop ; eax=3
cc rC, mov, ecx, eax ; ecx=3
cc rC, nop ; ecx=3
- - - - - - - - - - - - - - - - - -
will be translated to
- - - - - - - - - - - - - - - - - -
00402000 > $ 90 NOP
00402001 . 90 NOP
00402002 . 90 NOP
00402003 . 90 NOP
00402004 . 90 NOP
00402005 . 90 NOP
00402006 . 90 NOP
00402007 . 90 NOP
00402008 . B8 00000000 MOV EAX,0
0040200D . 90 NOP
0040200E . 90 NOP
0040200F . 90 NOP
00402010 . 40 INC EAX
00402011 . 90 NOP
00402012 . 90 NOP
00402013 . 90 NOP
00402014 . 90 NOP
00402015 . 90 NOP
00402016 . 90 NOP
00402017 . 90 NOP
00402018 . 83C0 02 ADD EAX,2
0040201B . 90 NOP
0040201C . 90 NOP
0040201D . 90 NOP
0040201E . 90 NOP
0040201F . 90 NOP
00402020 . 90 NOP
00402021 . 90 NOP
00402022 . 90 NOP
00402023 . 90 NOP
00402024 . 90 NOP
00402025 . 90 NOP
00402026 . 90 NOP
00402027 . 90 NOP
00402028 . 89C1 MOV ECX,EAX
0040202A . 90 NOP
0040202B . 90 NOP
0040202C . 90 NOP
0040202D . 90 NOP
0040202E . 90 NOP
0040202F . 90 NOP
00402030 . 90 NOP
00402031 . 90 NOP
00402032 . 90 NOP
00402033 . 90 NOP
00402034 . 90 NOP
00402035 . 90 NOP
00402036 . 90 NOP
00402037 . 90 NOP
- - - - - - - - - - - - - - - - - -
After a few minutes, the new code looks like that:
- - - - - - - - - - - - - - - - - -
00402000 > $ 80D5 39 ADC CH,39
00402003 . 90 NOP
00402004 . 92 XCHG EAX,EDX
00402005 . 4B DEC EBX
00402006 . 8BD6 MOV EDX,ESI
00402008 . 45 INC EBP
00402009 . 20C8 AND AL,CL
0040200B . 33C0 XOR EAX,EAX
0040200D . 2BF2 SUB ESI,EDX
0040200F . 4F DEC EDI
00402010 . 88EE MOV DH,CH
00402012 . 39CD CMP EBP,ECX
00402014 . 12F5 ADC DH,CH
00402016 . 40 INC EAX
00402017 . 4F DEC EDI
00402018 . 22FA AND BH,DL
0040201A . 14 02 ADC AL,2
0040201C . 13C8 ADC ECX,EAX
0040201E . 3C 07 CMP AL,7
00402020 . 83ED C6 SUB EBP,-3A
00402023 . 29DE SUB ESI,EBX
00402025 . 83DA D9 SBB EDX,-27
00402028 . 8BD8 MOV EBX,EAX
0040202A . 80E2 37 AND DL,37
0040202D . 23D4 AND EDX,ESP
0040202F . 91 XCHG EAX,ECX
00402030 . 2C 05 SUB AL,5
00402032 . 88CE MOV DH,CL
00402034 . 21EF AND EDI,EBP
00402036 . 4F DEC EDI
00402037 . 90 NOP ; ecx=3
- - - - - - - - - - - - - - - - - -
3) Extentions
3.1) Including the Stack
Including Stack-Commands is quite easy. There are some things to note:
What happens at 'pop Reg32'? Reg32 changes and ESP changes, we already
have all of that informations in the LBT, so nothing to change for POP.
But what is with 'push Reg32/Imm32'? ESP changes, but we dont know what
is on the stack from the above rules. 'PUSH 0x0' would result in the
same LBT as 'PUSH 0x1' does.
Therefore we have to add an additional dword to the LBT, defining the
latest value of the stack (if newESP=oldESP-4 changed):
- - - - - - - - - - - - - - - - - -
Local Behaviour Table (LBT):
...
Stack: 1*4=4 byte
0x44 STACK change (if push is used)
- - - - - - - - - - - - - - - - - -
One restriction to the random code appears, too:
You must not use pop/push as trash (push/pop is ok). As an example,
you have:
- - - - - - - - - - - - - - - - - -
xor eax, eax
ret
- - - - - - - - - - - - - - - - - -
This could be translated to:
- - - - - - - - - - - - - - - - - -
xor eax, eax
pop eax
push 0
ret
- - - - - - - - - - - - - - - - - -
In principal, a legal substituation from the rules above, but fatal.
One can solve that in two ways: Increase the Local Behaviour Table by
8 dwords, and compare the changing of the stack; or you could just
add a byte to note whether the pop occured before the push (this is what
I've used in my example virus).
3.2) Including Memory
It is very tricky to use memory instructions for the random code and
get the behaviour later.
First thing is, that we have to use an Exception Handler. I tried SEH,
but there are problems because the EXCEPTION_REGISTRATION and the
callback function handle has to be on the stack, and this value can be
changed by the random code (for instance "SBB DWORD PTR SS:[ESP+B],EAX",
which actually happened).
As it's the worst thing when the Exception Handler will be destroyed, we
have to use something else: VEH (Vectored Exception Handler). This is
provided by Windows XP+ as AddVectoredExceptionHandler-API.
Second problem: The memory may overwrite any data that is not protected.
That means: Its own code (if its in the .data section), any data created
with Virtual Alloc (and was no Read/Execution-Only-Flag).
We have to use VirtualProtect to any allocated memory.
Third problem: We can not and should not protect the .data section, as
we want new code-substituations for that, too. But the random code can
overwrite some important data in the .data section. A solution is to
mirror the .data memory with READ-ONLY potection (or NOACCESS flag)
After the execution and analyse of the random code, we can restore the
original .data section.
Fourth problem: We have to save the pointer to the DataMirror somewhere.
The register may change, so we have to use the .data section. But this
pointer can be overwritten by random code, then we dont know the place
of the DataMirror anymore, and can not restore the original .data - big
problem!
However, we can use some self-repair technique: We store the pointer to
the DataMirror three times at totally different positions in the .data
section, and after execution use that algorithm:
- - - - - - - - - - - - - - - - - -
mov eax, hDataMirror1;
if (hDataMirror1!=hDataMirror2) { mov eax, hDataMirror3; }
- - - - - - - - - - - - - - - - - -
eax is the pointer to the DataMirror if there is maximum one destroyed
pointer. This should be enough for us (we just have 8 byte of random
code).
Fifth problem: The instructions may rely on the values of the memory,
just the same thing as with initial register values. We have to fill the
memory as well as the stack with random memory, to prevent that
dependence on the initial values.
Sixth problem: The result of the random code (the local behaviour table)
can not be saved in .data (as this one will be restored later) or in the
DataMirror (as this one is used for comparing). So one has to allocate
additional memory for saving the LBT temporarily, and copy it to the
restored .data Section later.
For saving the results I added 2*4 additional dwords to the Local
Behaviour Table:
- - - - - - - - - - - - - - - - - -
Memory: 2*4*4=8*4=32 byte:
0x24 dd MemOffset1
0x28 dword[MemOffset1]
0x2C dd MemOffset2
0x30 dword[MemOffset2]
0x34 dd MemOffset3
0x38 dword[MemOffset3]
0x3C dd MemOffset4
0x40 dword[MemOffset4]
- - - - - - - - - - - - - - - - - -
There is space for saving results of four changes of the random code,
each change uses two dwords, one for the addresse that has been changed,
and one for the actual data that has been changed.
The chance that a valid change appears is very slow if we use totally
random initial values. One could increase the chance by using the
following trick:
All registers get a small random value between 0 and ~200, and one
variable gets the value of the .data-section:
eax=0x33 ebp=0x67
ebx=0x8A esi=.data + 0x12
ecx=0x04 edi=0xAA
edx=0x12 esp=esp
This increases the chance of valid memory changes alot - for instance:
mov dword[esi+ecx], eax
will be valid, which would have not be if the registers were totally
random.
This diagram shows what will be done with the memory blocks while
analysing one unknown code:
Preparation
===========
___________
mirroring | |
----------> | Data |
| | Mirror |
___________ | |
| | | |
| .data | |___________| ___________
| | | |
| | | Random |
| | --------------------> | Data |
|___________| important values | |
| |
|___________|
___________
| |
| Data |
| Mirror |
___________ | |
| | | |
| .data | |___________| ___________
| | | |
| | | Random |
| | <-------------------- | Data |
|___________| randomize .data | |
| |
|___________|
E X E C U T I O N
Comparison
==========
___________
| |
| Data |
| Mirror |
___________ | |
| | | |
| .data | |___________| ___________
| | | |
| | | Random |
| | <<------------------>> | Data |
|___________| compare data | |
| | |
| |___________|
| ___________
| | |
-----------> | temp. LBT |
results |___________|
Restoration
===========
___________
restore | |
---------- | Data |
| | Mirror |
V | |
___________ | |
| | |___________|
| .data |
| |
| | ___________
| | save LBT | |
|___________| <------------------- | temp. LBT |
|___________|
4) Possible extentions
These are some ideas that increase the power of this technique and
are easy to implement - everything without the need of deep code
analyse or full disassembling.
4.1) Hide me: Global Behaviour Table at Runtime
The technique above uses a behaviour-table hardcoded in the file,
which could be a source for detection by lazy avers.
There is no need for that, one could create the GBT at runtime, and
write it to the memory. It's just a little bit tricky as the GBT
is defined implicit for all commands - but some fake-code or hand-
adjusted GBT can solve that problem.
4.2) Permutation - easy as it can be
As the whole code consists of 8byte command blocks, its very easy
to use permutation - there is no need of disassembling as in usual
permutation engines. One can simply copy the different 8byte blocks
to a random place in the file and connect them with Jmp-Operations.
4.3) Jump somewhere
One weakness of this technique may be the handling of jmp/ret/call
commands. One must not execute them due to unexpected behaviour,
therefore I have defined a further restriction in the GBT: rNoEmul.
If this command appears, the corresponding 8byte block will not be
analysed, thus no equivalent codes will be found.
As the code knows where a jump will be (because of that rNoEmul
byte in the GBT), it can include trash or change the instruction
(call -> pop Reg32 + jmp Reg32). When trash is included, it should
restrict the trash to not contain significant operation opcodes
(0xC3 = ret, 0xE8 = call, 0xEB = jmp, 0x7x = jxxx).
4.4) Behaviour Flow
Imagine you have a code like this:
- - - - - - - - - - - - - - - - - -
cc rA, mov, eax, 0
cc rAB, mov, ebx, eax
cc rB, inc, ebx
cc rBC, xor, ecx, ecx
- - - - - - - - - - - - - - - - - -
Command 3 and 4 are independent and could be exchanged. But how can
the code itself find that out?
We could analyse both commands (16 bytes) together in a black-box
test and compare output. If the behaviour of the changed 16 bytes
is the same as the original one, we could use the new one. We can
see immediatly that line 2 and 3 can not be exchanged - exactly the
same result we would get in the blackbox test.
The only thing we have to think of is to adjust the restrictions,
which is straight forward.
In the end, we get some kind of behaviour flow, a very beautiful
result:
- - - - - - - - - - - - - - - - - -
cc rA, mov, eax, 0 cc rA, mov, eax, 0 cc rA, mov, eax, 0
cc rAB, mov, ebx, eax cc rAB, mov, ebx, eax cc rAC, xor, ecx, ecx
cc rB, inc, ebx -> cc rBC, xor, ecx, ecx -> cc rABC, mov, ebx, eax
cc rBC, xor, ecx, ecx cc rBC, inc ebx cc rBC, inc ebx
- - - - - - - - - - - - - - - - - -
4.5) Automatic shrinking/extention/trash
Shrinking and extention is usually a very hard buisness - requires
full code disassembling to some meta-languages. Here it is much
easier, the only thing we have to do is adjusting the jump addresses.
For shrinking, we just use take a 16byte block (instead of the
usual 8 byte block) for analysing, and compare it with a 8 byte
block of random data - very simple!
The extention is the inverse process - we take a 8byte code from
the file code and compare it with a 16byte block of random code.
The random 16byte block should consist of 2 independent 8 byte
blocks, to prevent problems when this code will be analysed alone.
This is very trivial too.
The trash is a similar technique: We get an 8 byte code and analyse
its behaviour with respect to the restrictions of the direct-above
block. Again - very simple.
5) Conclusion
Analysing the behaviour of code and substituate with random instructions
is powerful as well as beautiful.
We should use more the power of unpredictable randomness!
Dr. Theodore Spankman
December 2010
PS1: The initial idea comes from an article released in August 2004:
http://vx.netlux.org/lib/vsp14.html
PS2: Special thanks goes to hh86 for motivation, help and useful ideas
in connection with this research... Merci! :)