Lost in XLAT-ion roy g biv / defjam -= defjam =- since 1992 bringing you the viruses of tomorrow today! - celebrating 10 years in 2002 - About the author: Former DOS/Win16 virus writer, author of several virus families, including Ginger (see Coderz #1 zine for terrible buggy example, contact me for better sources ;), and Virus Bulletin 9/95 for a description of what they called Rainbow. Co-author of world's first virus using circular partition trick (Orsam, coded with Prototype in 1993). Designer of world's first XMS swapping virus (John Galt, coded by RT Fishel in 1995, only 30 bytes stub, the rest is swapped out). Author of world's first virus using Thread Local Storage for replication (Shrug, see Virus Bulletin 6/02 for a description, but they call it Chiton), world's first virus using Visual Basic 5/6 language extensions for replication (OU812), world's first Native executable virus (Chthon), world's first virus using process co-operation to prevent termination (Gemini), world's first virus using polymorphic SMTP headers (Junkmail), and world's first viruses that can convert any data files to infectable objects (Pretext). Author of various retrovirus articles (eg see Vlad #7 for the strings that make your code invisible to TBScan). Went to sleep for a number of years. I am awake now. ;) What is xlat encryption? Xlat encryption works by replacing every byte value by another value. It is like having 256 8-bit keys. When the xlat instruction executes, the value in al is used as an index into a table at ebx, and the byte at that location is returned in al. It is equivalent to mov al, byte ptr [ebx + al] This means that there is a 1-to-1 mapping of the index values: for every value of index, there is only one xlat value that is returned. However, the table can be arranged so that one xlat value has many index values that will return it. This means that there is a many-to-1 mapping of the xlat values. It is the many-to-1 mapping that is most interesting to us, but first we look at 1-to-1 mapping. 1-to-1 mapping Let us imagine we have this unencrypted code (?? bytes are the not interesting values): E8 00 00 00 00 ?? ?? E8 ?? ?? 00 00 ?? 00 ?? ?? 00 and that our table uses index 72 to map E8, and index 67 to map 00. Then our encrypted code becomes: 72 67 67 67 67 ?? ?? 72 ?? ?? 67 67 ?? 67 ?? ?? 67 To detect this is easy (but not fast), because the position of the bytes remains the same. Simply take any byte, and see if the same byte appears at all of the same relative offsets in the code, and does not appear at some other offsets. If you repeat this for several different bytes of the code, then you can say that it is likely to be the real thing. In our unencrypted example, E8 appears at offset 0 and 7, and 00 appears at offset 1, 2, 3, 4, 10, 11, 13, 16. In the encrypted code, we find 72 at offset 0, so we see if it also appears at offset 7 (it does). Then we find 67 at offset 1, so we see if it also appears at the other offsets (it does). Finally, we should see if it doesn't appear at some of the offsets, like 5, 6, 8 (it doesn't). So it seems that we have found the code, even though it is encrypted. Additionally, the xlat table must be stored somewhere, so if we can find it, then we can even decrypt this code! How to find the table? It's easy, too. Xlat tables 1-to-1 mapping means that the xlat table contains 256 unique values. It can be found using a table of 256 bits. Take one byte at a time from the file and use it as an index into the bit table. If the bit is set already, then we have a non-unique value so it's not the xlat table. Clear the bits and start over from the current position. If the bit is not set already, then set the bit. When all 256 bits are set, then we have found 256 unique values in a row, so it's probably the xlat table. Many-to-1 mapping We can break both of these algorithms using the many-to-1 mapping. If our code does not use all 256 possible values, then the values that are not used by the code can be used in the xlat table as extra maps to values that are used in the code. Let us imagine that 62 is not used by our code, so we decide to use index 62 to map 00. In our example above, now we have two values that map 00. Then our encrypted code can become: 72 67 62 67 67 ?? ?? 72 ?? ?? 62 67 ?? 67 ?? ?? 62 Now we try again to find the code. We find 72 at offset 0, so we see if it also appears at offset 7 (it does). Then we find 67 at offset 1, so we see if it also appears at the offsets 2, 3, 4, 10, 11, 13, 16. A different value is at 3, so it can't be our code. ;) Let us ignore that and look for the xlat table instead. At index 67 we find a 00, so we set the 0th bit, but at offset 62 we find a 00, and the 0th bit is set already, so it can't be the xlat table. We are done. Greets to friendly people (A-Z): Active - Obleak - Prototype - Ronin - RT Fishel - The Gingerbread Man - Ultras - VirusBuster - Whitehead rgb/defjam dec 2002 iam_rgb@hotmail.com