Permutation conditions ---------------------- Here i'm trying to define conditions, when it is possible to change order of some consecutive x86 instructions and instruction blocks, i.e. swap them, but keep program working the same. Imagine that we have "ADD EAX, EBX" which is followed by "MOV ECX, EDX", and we know that both instructions are not the destination of some JMP, CALL, or any other code/data/algorithmic reference. In such a case they can be swapped with each other. Now lets examine conditions under which such a trick can be performed for some arbitrary consecutive instructions. 1. ONE instruction Let it consists of: operation code, source object set, destination object set. Object set is a bitset of properties, instruction deal with: common registers: EAX,ECX,EDX,EBX,ESI,EDI,ESP,EBP (8 bits) flags (1 bit) memory reference (1 bit) We will enclose sets into { } brackets, as such full set is {EAX,EBX,ECX,EDX,ESI,EDI,ESP,EBP,flags,memory} and empty set is {}. In binary notation it may look just like a 10-bit number. Memory reference means that at least any byte in memory can be read/written by our instruction. We dont distinguish memory addresses, because of the following: if there are two instructions, first one accessing memory at [eax], second one accessing memory at [ebx], and we dont know if eax/ebx are equal or not (or 4-byte ranges intersects); then, it is just a memory reference for us, and we only know that it could be the same variable. Other properties, such as EIP, segment-, FPU-, MMX-, SSE-registers, are not considered here, since limited application of our theory. We just assume, that no one instruction (except maybe NOP) can be mixed with INT, RET, CALL, JMP, JXX, or FPU/MMX instructions. "Source object set" is a set of objects to be read. "Destination object set" is a set of objects to be written/modified. Any destination object supersedes source object. I mean that in "ADD EAX, EBX" instruction, EBX is source object, and EAX is both source and destination object, but we say that EAX is destination object only, and we never say that EAX is source; however, we always keep in mind that any destination object can be also (indirectly) source object. But, if we have "ADD ECX, ECX" instruction, we MUST include 2nd (src) ECX into source object set. Lets consider some typical instructions: INSTRUCTION DESTINATION OBJECT SET SOURCE OBJECT SET nop mov R1, R2 R1 R2 cmp R1, R2 flags R1, R2 add R1, R1 R1, flags R1 adc R1, R2 R1, flags R2, flags mov R1, [R2+R3] R1 R2, R3, memory add R1, [R2+R3] R1, flags R2, R3, memory sub [R1+R2], R3 flags, memory R1, R2, R3 test [R1+R2], R3 flags R1, R2, R3, memory inc R1 R1, flags lea R1, [R2+R3] R1 R1, R2 xchg R1, [R2] R1, memory R2 push R1 ESP, memory R1 pop R1 ESP, R1 memory cld flags lods ESI, EAX flags, memory rep lods ESI, EAX, ECX, flags flags, memory stos EDI, memory EAX, flags rep stos EDI, ECX, memory, flags EAX, flags rep cmps ESI, EDI, ECX, memory, flags memory 2.1. TWO instructions let they look as CMD1 DST1, SRC1 CMD2 DST2, SRC2 where DST1, SRC1, DST2, SRC2 are source/destination object sets. Any of these sets could be empty. We also assume that both instructions are already filtered/checked, i.e. no one of them is JMP, CALL, RET, etc. So, we have 4 sets, and we have 6 relations between each 2 of these sets. {C(4,2) = 4!/2!*(4-2)! = 6} 2.2. Simple conditions Each of six relations is bitwise AND operation on the two bitsets, returning logical ZERO or NON-ZERO. In other words, each two sets can intersect or not. #define INTERSECT(bitset_X, bitset_Y) ((bitset_X & bitset_Y) != 0) Here is 3 conditions, when it is possible to swap two instructions: (DST1 & DST2) == 0 [1] (DST1 & SRC2) == 0 [2] (DST2 & SRC1) == 0 [3] Other 3 (unused) relations looks as following: (DST1 & SRC1) == whatever [4] (DST2 & SRC2) == whatever [5] (SRC1 & SRC2) == whatever [6] Example: add eax, ebx # cmd1=ADD dst1={EAX,flags} src1={EBX} sub ebx, ecx # cmd2=SUB dst2={EBX,flags} src2={ECX} [1] == FALSE {EAX,flags} intersects {EBX,flags} [2] == TRUE {EAX,flags} doesnt intersects {ECX} [3] == FALSE {EBX,flags} intersects {EBX} as such, these two instructions can not be swapped. 2.3. Advanced conditions There exists some advanced conditions, which requires more analysis. In such a cases, some of [1], [2], [3] conditions can be ignored. For example: (CMD1 == CMD2) && (DST1 == DST2) && [2] && [3] [7] or #define GROUP_1(cmd) ((cmd == ADD) || (cmd == SUB)) GROUP_1(CMD1) && GROUP_2(CMD2) && (DST1 == DST2) && [2] && [3] [8] For example (below), [1] is FALSE, but since [8], the following instructions can be swapped: add eax, ecx sub eax, edx Now, imagine, that we have the following commands: mov [ebp+4], eax mov [ebp+8], ecx Such a situation should be handled in a special way; otherwise our theory will grow to contain lists of variables in the object sets; also this will require checking for intersection of memory ranges & etc. Except that all, we can ignore condition [1], in cases when instructions are followed by code which doesn't uses DST1 and DST2 object sets. But this will require much more analysis, and i think it is not optimal. 3. From single instruction to block of instructions Imagine that we have these instructions (following each other): [a] add eax, ecx [b] adc eax, edx [c] add ebx, ecx [d] adc ebx, edx [e] sub esi, edi We can not swap [a] and [b], and we can not swap [c] and [d]. But we can swap [ab] and [cd]. As such, we need some entity to replace consecutive instructions with. 4. ONE block of N instructions CMD1 DST1, SRC1 CMDi DSTi, SRCi CMDN DSTN, SRCN If we will consider this block as a single instruction, how to calculate summary SUM_DST and SUM_SRC object sets? I.e. sets of objects which are read and written by that code block. Simplest way is to do BITWISE OR for all SRC/DST[i]; it is correct, but then the we will lose some possiblities in the future, as shown in the following example: mov eax, ebx add eax, ecx adc edx, eax here, SUM_SRC, if calculated as SRC1|SRC2|SRC3, is equal to EBX|ECX|EAX, however it is not necessary to include EAX into that set, since it is initialized inside of this block, but not passed into. So, algorithm of calculating SUM_SRC and SUM_DST is the following: SUM_SRC = 0 SUM_DST = 0 for(i = 0; i < N; i++) # in the example shown above (N=3): { # i = 0 1 2 SUM_SRC |= SRC[i] & (~SUM_DST) # SUM_SRC = {EBX} {EBX,ECX} {EBX,ECX} SUM_DST |= DST[i] # SUM_DST = {EAX} {EAX} {EAX,EDX} } 5. TWO BLOCKS Working with blocks is the same as working with single instructions, except that SUM_SRC and SUM_DST should be calculated, as it were shown above. You can swap two blocks, or block and single instruction, it doesn't matter. 6. How to find out object sets for some given instruction? Probably, using something like ADE32 and some additional analysis.