Encryption, Scan Strings & You ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Intro: So, you've written the perfect polymorphic engine, have you? You've spent countless hours perfecting it so that utilizes most of the instructions available to the CPU, make the junk instructions look real, create working subprocedures, encrypt in multiple layers, use self modifying code, and to vary the algorithm and registers used for decryption. You've really put a lot of work into this to make it as hard as possible for the anti-virus guys to find you. Unfortunately, you never were that good at encryption, thus you stuck with the basic methods of encrypting (xor, add, sub, etc). Well, guess what. You might as well not have bothered writing the polymorphic engine in the first place! That's right! The anti-virus chaps can scan string you as if your poly engine was not there at all. How can this be? Well let's look at some examples, starting with the most lame and working our way up. ************ SCAN STRINGS ************ NOT/NEG: -======- Some people, which will remain nameless, think that the "not" and "neg" instructions are a form of encryption. So let's see what happens. Virus Body: 01 80 65 2A 31 ... Not "Encrypted" Virus: FE 7F 9A D5 CE ... Since there in no possible variation in the "encrypted" pattern, why should the AVs bother to scan for "01 80 65..." when the "FE 7F 9A..." number sequence is sitting right in front of them? They just will use "FE 7F 9A..." as the scan string. ROL/ROR: -======- Other people like to use ror and a number from 0 to 7 (when working with bytes). There are two ways this can be tackled. Since there are only 8 possible variants to ror/rol, the AVs can add 8 signatures to their data files and claim another 8 viruses that they detect (how do you think we've got so many viruses?). The other method I'll get to later, Of course, other methods are not so easy. This led AVs to develop a way look at the scrambled sections of code and find patterns that do not change. Relax. It's not that difficult. Watch: XOR: -==- Ah, everyones favorite method of encryption. So, let's say you have your virus, and you encrypt it with any old number. Encrypted with: 7A Virus Body: 01 80 65 2A 31 7B ... Xor Encryption: 7B FA 1F 50 4B 09 ... So, what does this mean? We have (where ^ = xor, and K= key): (01^K) (80^K) (65^K) (2A^K) (31^K) (7B^K) ... An interesting phenomenon was noticed. Let's xor the first and second terms of our encrypted virus. We get: (01^K)^(80^K) Now, xors can be done in any order. So we can remove the brackets: = 01^K^80^K Let's rearrange: = 01^80^K^K But since any number xored with itself is 0, K^K=0. Thus, this is: = 01^80 = 81 Oh oh. The Key value is now totally irrelevant! Let's make a series of numbers by taking the first and second, second and third, etc. 81 (80^01) E5 (65^80) 4F (2A^65) 1B (31^2A) 4A (7B^31) So we get: 81 E5 4F 1B 4A This can be used as a scan string. Unclear? Let's go back to the example: (Number Encrypted with: 7A) Virus Body: 01 80 65 2A 31 7B ... Xor Encyption: 7B FA 1F 50 4B 01 ... Scan String: 81 E5 4F 1B 4A ... (notice it's one shorter than the above) So, without knowing the key, let's see if "7B FA..." can be scan stringed. To start xor the first and second numbers: 7B^FA= 81 And the next two: 1F^FA= E5 Next three sets: 50^1F= 4F 4B^50= 1B 01^4B= 4A Scan strings match. Game over. To sum up what happened, the AV guys figured out that you don't need to be looking for specific bytes, but rather a relationship between certain bytes. This relation does not vary with the key used in the encryption. This RELATION is then used as a scan string. That means they go along calculating the xors of consectutive bytes and store it someplace. If they get a match in what they are storing to a known relation of a virus, it's flagged. It's not quite as accurate as a regular scan string, but it's pretty close. So, would upgrading to a 16 or 32 bit key save the XOR method? Afraid not. This is why: (16-bit Number Encrypted with: 7A 21) Virus Body: 01 80 65 2A 31 7B ... Encrypt by: 7A 21 7A 21 7A 21 ... Encrypted: 7B A1 1F 0B 4B 5A ... Now notice that the 1st and 3rd entries are encrypted with the same byte. Thus, a scan string is generated from every second byte of the encrypted virus in the same manner as above (1F^7B= 64, 4B^1F= 54, ...). This causes the area in the virus that is to be a scan string to have to be twice as long. But since a scan string is only like 15-30 bytes (?), that's not all that much. Even a 32-bit encryption key can be easily scan stringed by simply taking the first, fifth, ninth... bytes. Since the first, fifth, ninth... bytes are always going to be encrypted by the same byte value regardless of whether an 8-bit, 16-bit or 32-bit key is used, it is a good bet that the AVs use every forth byte to scan string all xor encrypted sections. Since increasing the key length does not help, maybe xor/adding the previous value of the plaintext together before xoring might help. I call it the "Previous Value XOR". It doesn't work either - watch: Virus Body: 01 80 65 2A 31 7B ... Xor Encyption: (00^01)^K= 01^7A= 7B (01^80)^K= 81^7A= FB (80^65)^K= E5^75= 90 ... The problem here is that the stuff in the brackets is a constant. It does not change, hence for all this algorithm cares, there may as well be an 81 instead of 80 in position 2 of the virus (likewise with the E5 instead of the 65). Therefore, the same principle as above works to scan string this. ADD/SUB: -======- So, does the add/sub method fare any better? Nope. Let's look at an add example. Add Encyption: = (7A+01) (7A+80) (7A+65) (7A+2A) (7A+31) (7A+7B) = 7B FA DF A4 AB F5 ... Now, let's see what happens when you subtract the first and second elements. = (7A+01) - (7A+80) = (7A+01) - 7A - 80 [Carry minus thru brackets] = 7A + 01 - 7A - 80 [Remove useless brackets] = 01 - 80 + 7A - 7A [Can rearrange now] = 01 - 80 [Cancel the two keys - Oh oh] = 81 Yes, that last "subtraction" looks weird, but this math is modulo 100h. That means that once you add past FFh, you start again at 0 and once you subtract past 0, you start again at FFh. If we continue the pattern, we get a scan string of: 81 1B 3B F9 B6 ... And if we test this, we see that indeed it does indeed catch every possible encryption value for add. Sub encryption suffers the same flaw, but instead of subtracting to get the scan string, you add the bytes. So, does a longer key help us here? Not really. This process easily applies to 16-bit and 32-bit keys by reading every 2nd or 4th byte to get the scan strings. This makes it hard to scan for the parts encrypted by the upper half or 3/4 of the key. However, it has the same flaw as the xor - you can always apply this to every 4th bytes, and thus be able to scan all the add/sub encrypted parts with one pass (remember: speed is an issue to scanners - I doubt they can afford to make useless passes of every file they scan). So, with that, there go the 3 most classic methods of simple encryption. Anything else that creates these scan strings? Yup! ROR/ROL: -======- Back to bit rotation. There is another method, that, despite probably being slower and less accurate, is worth mentioning. For each of the bytes, rotate by some number. Choose the number of bits to rotate by to make the largest number. For example: Virus Body: 01 80 65 2A 31 7B ... Since 01h = 00000001b, the largest number by rotation is 10000000b = 80h Similarly: 80 = 10000000b -> 10000000b = 80h 65 = 01100101b -> 11001010b = CAh 2A = 00101010b -> 10101000b = A8h 31 = 00110001b -> 11000100b = C4h 7B = 01111011b -> 11110110b = F6h Thus the scan string is: 80 80 CA A8 C4 F6 ... If you check against any rotation of the virus body, you will see that this method does indeed catch it. Note 1) This scan is more likely to capture other strings as well as this scan string will just as easily flag "20 08 56 2A 4C F6" as being a match. This string could not result from the rotation encryption. This necessitates longer strings and/or checking if the string is really present by other methods. Note 2) This scan is not actually as cumbersome as it may seem. It can be sped up drastically by generating a look up table for each number's maximum attainable value resulting from rotation. Well, that sums up how the most popular methods can be overcome. So, is there a solution? Of course there is! If all cryptography was this easy to play with, PGP would be out of business. ********* SOLUTION? ********* A few simple things that don't scan string easily are as follows: Look Up Table: -============- The classic idea of replacing one thing with another. For example, if you replace all the letter 'A's in a text with 'S's and all 'Z's with 'Y's, etc. To do this, you need a simple look up table: LookUpTable: db 0 db 1 db 2 ... db 0FFh A lot of typing? Lazy bastard! Define this macro then: LUmacro macro cntr = 0 rept 100h db cntr cntr = cntr + 1 endm endm then define your look up table like this: LookUpTable: LUmacro Now with everything so nicely defined, how would you go about using it? Decryption: xor eax, eax ; Clear top of eax mov esi, Source ; Set up the pointers mov edi, Destination mov ecx, SizeToDecrypt ; Size of encrypted part DecryptLoop: lodsb ; Load the encypted byte mov edx, offset LookUpTable ; Point to start of look up table mov al, byte ptr [eax+edx] ; Look up what the byte should be stosb ; Store it loop DecryptLoop ; Next! The encryption process is a little trickier. You must do the reverse process and scan the lookup table for the value, and write out the offset from the lookup table. Encryption: xor eax, eax ; Clear top of eax mov esi, Source ; Set up Source & Dest mov edx, Dest mov ebx, SizeToEncrypt ; Size EncryptLoop: lodsb ; read in value to encrypt mov edi, offset LookUpTable ; edi = LookUp Table mov ecx, 100h ; 256 entries in table repnz scasb ; Find value in table not cl ; Find offset in table(quick and dirty) mov byte ptr [edx], cl ; Store it inc edx ; Point to next dec ebx ; Decrement counter jnz EncryptLoop ; Repeat if more to encrypt Changing the lookup table from generation to generation is fairly easy. There are two possible ways: 1) random, but make sure no entry is there two or more times. 2) randomly start exchanging entries in the look up table. Think of this like shuffling a deck of cards :-) Important Reminders: 1) DO NOT ENCRYPT THE LOOK UP TABLE! Remember, the lookup table changes from run to run, so despite it being in the open, it cannot be scan stringed. 2) Each entry in the look up table has to be distinct and all the possible offsets into the table must be used. In math terms, the relation must be 1-1. Since you're usually going to be using a look up table with 256, this simply means that NONE of the entries in the table is allowed to match with any other entry in the table AND all values 00 through 0FFh must be used, lest decryption is not possible. Previous Value Jumping Encryption: -================================- Remember the section above, where I showed that just xoring in the previous value you encrypted into the next encryption? Well, if you add one little trick to that, it will not be scan stringable. The idea comes from group theory in math. Lets say you have your virus: 10 20 05 FE 87 CA 21 09 As you can see, this "virus" is 8 bytes long. :-) The idea is that you encrypt the current byte with the previously encrypted byte, then jump past a few bytes and repeat the process. Once you get past the end of the virus, you go back and start again. So, in this case, let's say we use the a jump value of 3 and start at the 20. That means you encrypt the 20 using the previous byte (in this case some initial value), and then you "jump" 3. This takes you to the 87. You encrypt the 87 with the 20, and jump 3 more. This takes you to the 09 which you encrypt with the 87. Now you jump 3 more, but you when you get to the end of the file, you keep on counting from the beginning. This takes you to the 05 which you encrypt with the 09. You repeat this process for the number of bytes you wish to encrypt (8 times in this case). There are a few very important things to note about this however: 1) The jump value (the number of bytes to jump - 3 in my example) MUST vary with each encryption, otherwise this suffers the same flaw as the previous value xor. Just think of the jump value as the encryption key. The starting position does not have to vary, but it can. 2) The jump value MUST be relatively prime to the size of the thing you are encrypting. That means that the largest number that evenly divides both the jump value and the size HAS to be 1. If it isn't, then you are not going to cover the entire region you are trying to encrypt (and the example code I give, will probably not work). A good way to ensure this, is to make the size of the part being encrypted a prime number (get the next largest prime from a list then add a few bytes to your code to make it that big - you'll probably use less bytes then if you try to do the division tests). 3) The jump value cannot be 0 (duh!), and it works best if it's less than the size of the region you are encrypting. So, with that out of the way, let's do an example: [Note: I'm using XORs, but it can be anything] The virus: 10 20 05 FE 87 CA 21 09 Jump Value: 3 Initial 'Last Value': 10 Starting position: The 05 So, the 05 would become: 10 (LastValue) xor 05 (Position)= 15 ...and last value would become 05, and we jump to the CA CA-> 05 xor CA = CF (Last Value becomes CA. Goto 10) 10-> CA xor 10 = DA (Last Value becomes 10. Goto FE) FE-> 10 xor FE = EE (Last Value becomes FE. Goto 21) 21-> FE xor 21 = DF (Last Value becomes 21. Goto 20) 20-> 21 xor 20 = 01 (Last Value becomes 20. Goto 87) 87-> 20 xor 87 = A7 (Last Value becomes 87. Goto 09) 09-> 87 xor 09 = 8E (Since we did this 8 times, we are now done.) Thus, the encrypted virus: DA 01 15 EE A7 CF DF 8E Simple, no? Well, anyways, lets see some code: Decryption: mov esi, Source ; Init Source & Dest mov edi, Destination mov ah, InitialValue ; Initialize Previous value mov ecx, SizeToDecrypt ; Init Size mov edx, StartingPosition ; Init Starting Position DecryptLoop: mov al, byte ptr [esi+edx] ; Read byte xor ah, al ; Decrypt with last byte ; Last byte= unencrypted byte mov byte ptr [edi+edx], ah ; Write it out add edx, JumpValue ; Jump by some value cmp edx, SizeToDecrypt ; Is [esi+edx] in bounds? jb short DecryptSkipSub ; If not, sub edx, SizeToDecrypt ; then make it in bounds DecryptSkipSub: loop DecryptLoop Encryption: mov esi, Source ; Init Source & Dest mov edi, Destination mov ah, InitialValue ; Initialize Last Value mov ecx, SizeToEncrypt ; Init Size mov edx, StartingPosition ; Init Starting Position EncryptLoop: mov al, byte ptr [esi+edx] ; Read in byte to Encrypt xor ah, al ; Encrypt with last byte mov byte ptr [edi+edx], ah ; Store encrypted mov ah, al ; Save last byte (unencrypted) add edx, JumpValue ; Jump by some value cmp edx, SizeToEncrypt ; See if esi+edx is in bounds jb short EncryptSkipSub ; If not, sub edx, SizeToEncrypt ; then make it in bounds EncryptSkipSub: loop EncryptLoop There are a lot of variations on the above code. The trickiest part is to make sure that the numbers are relatively prime. I don't feel like writing out the code, so I'll only give an outline (this can all be avoided by SizeToEncrypt being prime): Pick a number X to try for Jump Value between 1 and the size In a loop Y of 2 upto and including X: divide X by Y. If X does not divide Y evenly (remainder not 0) then continue in loop else Divide The Size by Y If this divides evenly, pick a new X and start over If not, continue. If you finish the inner loop for any X, X is usable. Multiple Encryption Instructions: -===============================- This is perhaps one of the simplest. Instead of using just one xor or add, it uses multiple instructions. It is doubtful that AV companies will go to the trouble and expense - not to mention slowing down their scanners - by testing all files with some weird algorithm for a single virus. That means that combinations such as: xor al, SomeValue add al, SomeOtherValue while, theoretically, scan stringable are probably not going to be. That said, there are a few things that must be said: 1) The instruction order for an encryption has to be the reverse of a decryption. For example, if the encryptor is: xor al, SomeNum add al, SomeNum2 Then the Decryptor has to be: sub al, SomeNum2 xor al, SomeNum 2) The keys for all encryption instructions have to be random and independent. 3) Having 2 consecutive instructions of the same type is worthless as instructions of the same type can be effectively combined into one. For example a ror by 1 followed by a rol by 3 is in essence a rol by 2. If you're making a polymorphic engine, then it's not too difficult to expand on this idea. Make the encrypting instructions themselves random. To accomplish this you do something like this: - ... - create encryption/decryption instruction pair - create junk - loop on the previous two until you are satisfied there are enough - ... Creating the encryption and corresponding decryption instruction is easy with an accurate opcode listing. The problem, however, is with Rule #1 above. The instructions to encrypt must be executed in the reverse order than the ones used to decrypt. That said, there is an elegant solution. You create your own little "stack". Just define: Stack db MaxStackSize dup (?) ; 30 for size should be enough EndOfStack equ $ StackPtr dd EndOfStack Before you start, simply "push" a RET onto the stack dec StackPtr mov edi, StackPtr mov byte ptr [edi], 0C3h ; Put on the ret instruction Now, whenever you add a decryption instruction to the decryptor, simply "push" the corresponding encryption instruction onto this stack. For example, if you added an "add cl, 45h" (80 C1 45) to your decryptor, then you add: "sub AL, 45h" (2C 45) to your stack. Please note, since the encryptor is in the encrypted part of the virus, you can make your life easy on yourselves and not have to worry about using different registers for your encryptor. Keep it simple and use AL. So, the code would be something like: ; ... ; write "80 C1 45" (add cl, 45h) to decryptor ; ... sub StackPtr, 2 ; Push two bytes onto "stack" mov edi, StackPtr mov word ptr [edi], 452Ch ; "sub al, 45" in rev. order When you are done creating your encryption instructions, your "stack" might look something like this: add al, 0B2h xor al, 97h ror al, 2 sub al, 45 ret Do I really have tell you that for each byte you are encrypting you should just call this "function"? Remember, your stack pointer is neatly pointing to the first instruction. How convinient. BTW - I fully realize that the code presented is unoptimized, but that's not the aim of this tutorial now, is it? Shuffling: -========- This isn't really encryption, but it does evade scan strings. You can combine this method with just about any other, to add in encryption if you so desire. The problem with shuffling, is not so much how to shuffle the bytes, but rather how to un-shuffle them. To do this, one of many ways is to take one byte at a time and swap it with a byte pointed to by some relation. This relation has to be random from generation to generation. If you don't vary the relation, then the shuffled part will not change from generation to generation allowing use of a simple scan string to pick it up. I recommend using the same idea as the "Previous Value Jumping Encryption" mentioned above. For example: JumpKey: 3 Starting offset: 2 (pointing to the 03) Encrypted Virus (call this array T): 01 02 03 04 05 To decrypt, do: T[0] <-> T[2] = 03 02 01 04 05 T[1] <-> T[2+3] = T[1] <-> T[0] = 02 03 01 04 05 T[2] <-> T[0+3] = T[2] <-> T[3] = 02 03 04 01 05 T[3] <-> T[3+3] = T[3] <-> T[1] = 02 01 04 03 05 T[4] <-> T[1+3] = T[4] <-> T[3] = 02 01 04 05 03 All clear? There is one small caveat however. The shuffling for Encryption and Decryption have to be done in the opposite order, lest it will not un-shuffle right. I hope it's clear from the example code: Decryption: ;---------- mov esi, offset Source ; Place to De-shuffle mov ecx, Size mov edx, StartingPosition ; Same idea as above xor ebx, ebx ; Clear Current position DecryptLoop: mov al, [esi+ebx] ; Grab byte xchg [esi+edx], al ; Exchange it with the other mov [esi+ebx], al ; Finish the change add edx, JumpValue ; Jump Around the file. cmp edx, Size ; If edx is not out of bounds jb SkipDecryptSub ; then ok sub edx, Size ; else make it in bounds SkipDecryptSub: inc ebx ; Shuffle Next Byte with the loop DecryptLoop ; byte at [esi+edx] Encryption: ;---------- mov esi, Source ; First you're probably going mov edi, Destination ; to have to make a copy mov ecx, Size ; (Can't scramble original) pusha ; These regs. we'll need later rep movsb popa mov ebx, StartingPosition ; Which register the first ; register will be swapped with EncryptLoop: sub ebx, JumpValue ; Jump Value same idea as above ; but must go backwards jns SkipEncryptAdd ; Jump if not Negative add ebx, Size ; Make positive SkipEncryptAdd: mov al, [edi+ebx] ; Grab byte at Jump Value xchg [edi+ecx-1], al ; Replace it with current byte ; Remember: we need to shuffle ; backwards. Thus we count ; down (1 less than ecx). mov [edi+ebx], al ; Finish the swap loop EncryptLoop Summary: Well, that's about all to this article. I'm sure there are many more methods than the few I described. Be creative. Combine. The possibilities really are endless. Just don't let me see and more simple XOR/ADD/SUB encryption routines... ---- This article brought to you by your dear friends at Feathered Serpents. Disclaimers: We are not responsible for anything you do from the information in this article. It's YOUR life, YOUR choices. What you choose to do with this information is something YOU have to live with. The information in this article is as accurate as possible, but due to the nature of the subject, there can be no guarantees about correctness.