Thoughts About The Use Of Garbage
Instructions In Polymorphism
by Sorcerer
Thoughts About The Use Of Garbage Instructions In Polymorphism
==============================================================
By The Sorcerer

  Most texts on polymorphism suggest that the use of garbage instructions are paramount, in my playing with polymorphism I have come to the conclusion that Garbage is of limited use in protecting a virus from AV software and can in fact do the opposite.
	
  Garbage instructions are instructions that can be added between other instructions in the virus's decryptor.
  The initial aim of this is to increase the number of potential decryptors that the polymorphic engine could produce from the model.
  An example of garbage instructions in use is the following:

	Original Code
	-------------
	MOV	eax,FFh
	ADD eax,ebx

	Code with Garbage Instructions 1
	-----------------------------
	MOV	eax,FFh
	INC	eax
	DEC	eax
	ADD eax,ebx
	
	Code with Garbage Instructions 2
	-----------------------------
	MOV	eax,FFh
	ROR eax,3
	XOR eax,ebx
	XOR eax,ebx
	ROL eax,4
	ADD eax,ebx
	
	As can be seen from the example the garbage instructions have altered the look of the original code without changing the properties of it.

	AV software countered the use of garbage bytes on there own with wildcards in their scan strings.
        An example of how wildcards in scan strings works is :

	Original Scan String
	--------------------
	B8FF 0000 0000 03C3

        Wildcard Scan String
	--------------------
	B8FF 0000 0000 * 03C3

	The first scan string would only match against the original code.
        The second scan string has a wild card in it which can match against anything.
        The second scan string would match the ``Original Code'', the ``Code with Garbage Instructions 1'' and the ``Code with Garbage Instructions 2''.

	Thanks to wild cards in scan strings the use of garbage instructions on their own became ineffective.
        In fact heuristic scanners will get suspicious of a file that contains instructions that look like garbage instructions.
        For example the instructions

	XOR eax,ebx
	XOR eax,ebx

	do not do anything other than to alter eax and then alter it back again.
        This would be spotted by a heuristic as garbage instructions.
        If enough of these are detected then the heuristics would flag the file as containing a potential virus.
        This can be avoided if the routine that generates the garbage instructions produces more complicated garbage instructions that the heuristics do not detect.

	Now we know that heuristics can detect the more simple garbage instructions we need to consider if the increase in the potential number of encodings is worth the risk of being detected by a heuristic.
        Rather than work out the difference in the potential number of encodings between a decryptor with garbage instructions and a decryptor without garbage instructions, we will just look at the potential number of encodings of the the decryptor without garbage instructions.
        We will work this out with the assumption that each instruction has 3 alternate forms (A simplification of the situation but accurate enough for our purposes).
        The following table shows the potential number of encodings in this situation at 10, 20, 30, 40, 50 and 60 instructions in the decoder.

	Number of instructions		Potential number of
	in decoder                      encodings
    	10				59049
    	20				3486784401
    	30                        	~2.005891 * 10 ^ 14
    	40                        	~1.21577 * 10 ^ 19
    	50                        	~7.17898 * 10 ^ 23
    	60                        	~4.23912 * 10 ^ 28

  From this we can see that if we have 10 instructions in our decoder model then we can only produce 59049 difference versions of the encoder, this is not really enough to be a problem for AV companies.
  If we have 20 instructions in our decoder then we can produce over 3 billion different versions of our decoder, which is much more work for the AV companies.
  The key point to take from this table though is that the more instructions in our decoder model the more versions of the decoder can be produced by the polymorphic engine.

  This suggests that rather than using a simple encryption method with garbage instructions we should consider using more complex (and stronger) encryption methods.
  Imagine that instead of using a simple XOR encryption scheme with a regularly increasing key a virus was to use 3DES for the encryption.
  No more garbage instructions to flag up the heuristics and how far into a series of 3DES decodes would an AVs emulator go bearing in mind that 3DES is quite common in commercial software?