======================== Metamorphism in practice Published for ======================== 29A e-zine nr.6 --------------- or "How I made MetaPHOR and what I've learnt", by The Mental Driller / 29A Index ----- 0) Basics a) What's metamorphism? b) Structure of a metamorphic code 1) Planning a) Mentalization: we code with macros! b) What we are going to do? i) Simplistic: permutation/substitution ii) "Accordion" model (shrink/expand) iii) Pseudo-assembly language 2) Coding a) Disassembler/depermutator (deobfuscator) b) Shrinker/emulator c) Permutator d) Expander (obfuscator) e) Reassembler 3) Known problems (and solutions) a) Debugging your engine b) API calls c) Memory 4) Future a) Plugins b) Multi-platform cross-infection c) Reassembling to different processors 5) Conclusion Greets ------ My greets go for all 29A members, for Vecna and Z0MBiE for being pioneers in the little explored field of metamorphism and for Vecna again, for advertising me in some points of the structure of the article. 0) Basics --------- a) What's metamorphism? :P Obligated question and answer in an article like this :). Metamorphism is the art of extreme mutation. This means, we mutate everything in the code, not only a possible decryptor. Metamorphism was the natural evolution from polymorphism, which appeared to evade virus scanners. With metamorphism, the difficulty to detect a virus grows exponentially. Then, why aren't there more metamorphic viruses? Simple: they are extremely difficult to make, as I show in this article (not only for tech used, but for the many fux0ring problems we can when we code something like that). Anyway, we'll try to see here that maybe the important thing is to have the correct ideas (something that coders like Vecna, Z0MBiE and others had - hello! :). b) Structure of a metamorphic code A metamorphic virus is like a 49cc motorbike with a spaceshuttle fuel deposit (if you can handle such vision :). In fact, the 90%+ of the code is the metamorphism engine to mutate the little part dedicated to infection, which is somewhat paradoxical. The engine has to be able to mutate itself and the attached code that allows the engine to travel alone (oh). It's far from the little polymorphic engines that made a 8Kb virus to mutate with a single 200 bytes code length engine: this time is just the opposite! That's because we can't make a whole disassembler in 200 bytes (well, maybe Super can ;). Then, the structure is : ********************************************************------- E N G I N E virus And the structure of the engine is (thypically): ***********···········&&&&#############@@@@@@@@ disasm shrinker permut. expander assembler Now in detail: Disassembler: The very start of the engine. The disassembler will decode every instruction to know the length of it, the registers it uses and all the information related with itself. It has also to be able to decode IP changing instructions like JMP and CALL in x86 (or BSR in others, for example). Shrinker: Also called compressor, this part will compress the disassembled code generated in the previous generation (i.e. the code in this generation). This is done to avoid the in-every-generation growth that will render in a many-mega-bytes virus in very few generations, although you use a non expansion technique of mutation. So, this part depends on the type of your metamorphic code, and it's also the most difficult to make: in fact, VERY few viruses has this part. Basically, it compresses in one instruction what the expander coded in many. It can be also an emulator, eliminating do-nothing and redundant instructions and compressing operations into a single one (for example, MOV Reg,1234/ADD Reg,4321 --> MOV Reg,5555). Permutator: A basic part of the metamorphism engine, and the one that many virus authors has coded to make metamorphic viruses, altough it's not metamorphism at all, since you leave the instructions unchanged. It's normally combined with other forms of metamorphism like instruction substitution (XOR EAX,EAX for SUB EAX,EAX, for example, and etc.). It's very simple in its concept, but very powerful since it breaks all the scan strings that can be used to detect the virus. Expander: This part only exists when a shrinker is present (well, there are some viruses that have it, but the code in them grows uncontrollably). This does what the shrinker undoes: it recodes a single instruction to many instructions that perform the same. Assembler: It recodes what we constructed with the expander. It fixes JMPs, CALLs and all that instructions, instruction lengths, changes the registers, etc. Or, if you are using an internal pseudo-assembler, it reassembles that code into the target processor language. OK, these are the normal parts. Anyway, decide what you decide to make, you need a PLANNING!! Don't commit the stupidity (sorry :) of making a hard metamorphic code (light is OK ;) without planning what you want to do. If not, the most probable thing that would happen is that you never finish the code. When coding the MetaPHOR virus I was planning the code for about two months, and I wanted to have clear in my mind what I wanted to do exactly. And, believe me, it helped me alot and I saved a lot of work. So, let's do what I did for planning (just if it helps you to plan your code). 1) Planning ----------- a) Mentalization: we code with macros! This part is very important: forget that you are touching instruction at machine level. What you do is "move this value to this register" or "add the content of this register to this variable". This makes easier the way you look at coding metamorphism (in fact, it's the same than when you code polymorphism). All the instructions and groups of instructions are macros of the operation you really want to do. The objective of this is learning to see the code as a bunch of instructions that doesn't rely on the final code, but in an operation that must be performed to make a bigger operation. b) What we are going to do? i) Simplistic: permutation/substitution If this is the way you want to do metamorphism (or you don't want to complicate your life making a 400 Kb engine) this is your choice. Plan the way you are going to permutate the code (JMP linking, single-instruction shifting and NOP padding, etc.). This must be present always you code your virus, because the whole code must be compatible with the type. For example, if you select NOP padding you must code all the instructions with equal size, padding them with NOPs. Since it's the most simple way of doing meta, it's also the way that requires less planning: just code directly, and after you have finished the first release (and before you test), just pad with NOPs. Other way can be a macro that does: db 10h dup (90h) ; NOPs org $-10h .align 4 Or something similar. Code a macro if you are a normal person, or do it directly if you are crazy ;). Another way is to get the length of every instruction in run-time and then garble them in a buffer. This requires disassembly to fix JMPs, CALLs and Jccs (conditional jumps), so maybe it's not as simple as it can seem. ii) "Accordion" model (shrink/expand) What a name for a technique! XD The power of this kind of metamorphism is the fact that code maybe doesn't permutate, but it's always different (and can be combined perfectly with permutation, which generates "absolute metamorphism"). For this kind of meta you have to decide if you are going to use a mini-emulator or just a disassembly into a pseudo-assembler, which is the solution I made in MetaPHOR. For this, you first define your own assembly language (that can be based on x86 opcodes). Just keep in mind that the more this assembly language seems the x86 opcodes, the more easy will be its handling. So, let's merge the next section... iii) Pseudo-assembly language ...and continue. The MetaPHOR internal pseudo-assembler follows the next rules: a) All the instructions are 16-bytes long (but this can change in the future to handle 64-bits processors, like the Itanium). b) The structure of the instruction is always the same for all them: General structure: 16 bytes per instruction, 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 OP *----- instruction data ----* LM *-pointer-* OP is the opcode of the instruction. Depending on the opcode we use an instruction data structure or other. LM is "Label Mark". Its value is 1 when a label is pointing to this instruction, and can be used for quite things, for example to know if two instructions can be shrinked or not (they can't if the second one has a label over it). It's at +0B in the instruction. The dword at +0C is a pointer that means "last code reference". On disassembly this means the EIP where this instruction is pointing to its original codification, but while we are advancing in the code treatment we store here references to the last situation of the instruction. This helps to make modifications to the table of labels, to recode the displacement instructions (JMP, CALL, etc.) and more. Now the structures that the engine uses in the instructions: Memory_address_struct: +01: First index +02: Second index, bits 7&6 are the multiplicator (00=*1,01=*2, 10=*4,11=*8) +03: DWORD addition to indexes Depending on the opcode (the operation to perform), the following means: - If operation has no operand (NOP, RET, etc). nothing in the instr. data is performed - If operation has one operand: Register operand: +01: Register Memory address: +01: Memory address struct Immediate value: +07: DWORD value, zero extended if it's a byte operation Destiny address (JMP, CALL, etc.) +01: Label to jump to (DWORD) - If operation has two operands: Reg,Imm: +01: Register +07: DWORD immediate value, zero extended if it's a 8-bits op. Reg,Reg: +01: Source register +07: Destiny register Reg,Mem: +01: Memory address struct +07: Destiny register Mem,Reg: +01: Memory address struct +07: Source register Mem,Imm: +01: Memory address struct +07: DWORD immediate value, zero extended if it's a 8-bits op. From this rules, now we use the next pseudo-opcodes: 00: ADD, 08: OR, 20: AND, 28: SUB, 30: XOR, 38: CMP, 40: MOV, 48: TEST Set rules: +00: Reg,Imm +01: Reg,Reg +02: Reg,Mem +03: Mem,Reg +04: Mem,Imm +80: 8 bits operation So, opcode 83 means ADD Mem,Reg using 8-bits operands, and so on. 50: PUSH Reg 51: PUSH Mem 58: POP Reg 59: POP Mem 68: PUSH Imm 70-7F: Conditional jumps E0: NOT Reg E1: NOT Mem E2: NOT Reg8 E3: NOT Mem8 E4: NEG Reg E5: NEG Mem E6: NEG Reg8 E7: NEG Mem8 E8: CALL label E9: JMP label EA: CALL Mem (used for API calls) EB: JMP Mem (used for obfuscation in API calls) EC: CALL Reg (obfuscation of API calls) ED: JMP Reg (idem) F0: SHIFT Reg,Imm F1: SHIFT Mem,Imm F2: SHIFT Reg8,Imm F3: SHIFT Mem8,Imm For all SHIFTs: +07: Byte with the value of rotation/shifting +08: Operation performed: 0: ROL, 8: ROR, 20: SHL, 28: SHR F4: APICALL_BEGIN Special operation meaning PUSH EAX/PUSH ECX/PUSH EDX that avoids the recoding of these registers, always remaining the same. F5: APICALL_END The complementary of APICALL_BEGIN, it means POP EDX/POP ECX/POP EAX F6: APICALL_STORE +01: Memory address struct This always means: MOV [Mem],EAX <-- Avoiding the recoding of EAX F7: SET_WEIGHT +01: Memory address struct +07: Byte with the weight identificator +08: Register 1 in code structure +09: Register 2 in code structure F8: MOVZX Memory address struct is a 8-bits operand, while +07 is a 32 bit reg. FC: LEA FE: RET FF: NOP These are the opcodes we can find when disassembling. Additionally, I have have defined some more opcodes for internal operations: 4F: Exists only While shrinking, and means a MOV Mem,Mem, being: +01: Source memory address struct +07: Pointer to the instruction that holds the destiny memory address struct, which has the format: +00: FF (NOP) +01: Destiny memory address struct +07: Ptr to source memory address holder (the 4F instruction) Since 4F is only a transition opcode, it's free for use after shrinking. There are three more opcodes used only for reassembly: 4E: INC/DEC Register +01: Register +07: 0 if INC, 8 if DEC 4F: INC/DEC memory address +01: Memory address struct +07: 0 if INC, 8 if DEC FD: Literal byte +01: Byte to insert directly into the code Thats the planning I made for the beginning. As you can see, the opcodes are very similar to the x86 ones, so they are quite easy to remember (in fact, I wrote this list only remembering it :). Now, the second part of planning: the shrinker. How are we going to shrink? We'll see them in its section, because I'm not on the mood of writing things twince :P. 2) Coding --------- a) Disassembler/depermutator The entrypoint of the engine is this! Before making anything, we must disassemble, of course. The disassembler is quite easy in concept, but a real drudgery. Since we are trying to do an absolute metamorphic code, we can't use hash tables although we code a routine to mutate that hash, something that can be harder. My solution was the first case, and moreover I combined the depermutator in an implicit way. The theory of decoding is as follows: Given ESI = Entrypoint, we have a memory buffer as big as the code we are going to disassemble. This buffer (stored in the variable PathMarks) allows us to control the already-disassembled code. We have also two more tables: the LabelTable and the FutureLabelTable, also initialized. These two tables have a counter variable for each one, giving the number of elements in every table. We have also the DisassembledCode in EDI, just where we decode the instructions. LabelTable is a table where each element is 2 DWORDs long. The first DWORD stores the real EIP where it points, where the second DWORD stores a pointer to the disassembled code. Then, when we decode a JMP, we set a a pointer to this table as the label. In this way, we can move as we want the internal pointers of the label and all the instructions referencing to that label are automatically updated. FutureLabelTable is a buffer table that only exists on disassembly. It's used to store destinies of JMPs, CALLs, etc. pointing to code that we haven't disassembled yet. Every time we decode an instruction, we look if that address is stored in this table, and then we can complete all the instructions that referenced to that address if it exists there. Given all this, let's see the algorithm: 1) Initialize the PathMarks map (i.e. zeroing it) and the number of labes and future labels. 2) Translate the current EIP (in ESI) directly onto the PathMarks map. - If it exists, store at ESI a JMP to the disassembled instruction (storing also a label pointing to that instruction if it hasn't one already). - If it doesn't exists, mark it as already disassembled and decode the instruction. Now, depending on the instruction, we act: If it's JMP: * If it points to an already decoded address, write a JMP instruction, insert a label to the destiny and get a new EIP at FutureLabelTable. If the label already exists, use that label. * If not, then write a NOP (just in case a label points directly to this JMP) and load a new EIP (in ESI) with the destiny. In this way, we have eliminated a possible permutation JMP. If it's Jcc (conditional jump): * If it points to an already decoded address, write the Jcc and insert a label to the destiny if the label doesn't exist (if not, use the label already inserted in the table). * If the destiny is not disassembled yet, then store it at FutureLabelTable and continue. If it's CALL, act as if it were a Jcc. If it's RET, JMP Reg or JMP [Mem] (a final leaf in the code tree), store the instruction and get a new EIP from FutureLabelTable. When getting a new EIP from the FutureLabelTable, we check if the labels stored here are already decoded. If they are, then we insert the corresponding labels at the LabelTable and eliminate the entry in FutureLabelTable. If not, we get that new EIP (i.e. we load ESI with that new entrypoint), we insert the new label at LabelTable and continue. As you can deduct, the disassembly will end when FutureLabelTable has no more entries, since this means that we came from an end leaf of the code flux. After performing such "emulation", we have: 1) Eliminated the permutation and the permutation jumps (since we have eliminated the JMPs by changing directly the EIP in ESI). 2) Eliminated all the code that can't be reached in any way. 3) Decoded the whole code in our pseudo-assembler. 4) Substituted the labels by pointers to table entries. We have done it! We haven't to code a depermutator or an emulator to detect dead code zones, since we eliminate them implicity. An example of a depermutation performed here would be: CODE PASSES ------ ------ xxx1 1) Decode xxx1 xxx2 2) Decode xxx2 xxx3 3) Decode xxx3 jmp @A 4) Change EIP to @A (don't store label) yyy1 5) Decode xxx7 yyy2 6) Decode xxx8 @B: xxx4 7) Decode xxx9 xxx5 8) Change EIP to @B (don't store label) xxx6 9) Decode xxx4 jmp @C 10) Decode xxx5 yyy3 11) Decode xxx6 yyy4 12) Change EIP to @C (don't store label) @A: xxx7 13) Decode xxx10 xxx8 14) Decode xxx11 xxx9 15) Decode JZ and store @D in FutureLabelTable jmp @B 16) Decode xxx12 @D: xxx13 17) Decode RET, get @D from FutureLabelTable and xxx14 complete the JZ at pass 15 (@D = current EIP) RET 18) Decode xxx13 yyy5 19) Decode xxx14 @C: xxx10 20) Decode RET and get an item from FutureLabelTable. xxx11 Since it's empty, we have decoded everything, so finish. jz @D xxx12 RET The result of the disassembly would be: xxx1 xxx2 xxx3 xxx7 xxx8 xxx9 xxx4 xxx5 xxx6 xxx10 xxx11 jz @D xxx12 RET @D: xxx13 xxx14 RET I think it's more clear now. All garbage and hole-filling code (the yyy? instructions) are implicitly eliminated, so we haven't to look for them later. The problem here is that, once shrinked, the code skeleton (already depermutated here) can be use for detection. Or maybe not, but we want even the skeleton to be different from generation to generation (the spanish expression is "rizar el rizo", or "to loop the loop" in spanglish :). Then, we insert what I call the "3D instructions". What?? Ehrmm... Well, it's not taking Quake and playing, it's just that it pictures exactly what it happens: let's imagine the dimensions where you make metamorphism: the first dimension is the current code (what you are disassembling now). The second dimension is what you are going to code (the result of this metamorphosis). The third dimension is what the next generations will decode. So, first and second dimension are clear, but... what about third? Simple: just code some JMPs as CMP EAX,EAX/JZ @xxx, for example. Do you see the point? The shrinker (as we are going to see in the next part) must be able to compress the "CMP EAX,EAX/JZ @xxx" pair to "JMP @xxx", but it will happen in the next generation, not in this one (and there are some structures that will be compressed completely at more far generations). The only thing we have to take in account is that we can't put garbage instructions after this type of jump, because they'll be decoded also. Let's see the code above (eliminating the yyy? this time) and substituting the first JMP by a CMP/JZ pair: xxx1 Result of the disassembly: xxx2 xxx1 xxx3 xxx2 CMP X,X xxx3 JZ @A CMP X,X @B: xxx4 JZ @A --> This will be compressed to JMP @A on this xxx5 @B: xxx4 generation and then eliminated implicitly xxx6 xxx5 in the disassembly of next generation. jmp @C xxx6 @A: xxx7 xxx10 xxx8 xxx11 xxx9 jz @D jmp @B xxx12 @D: xxx13 RET xxx14 @A: xxx7 RET xxx8 @C: xxx10 xxx9 xxx11 jmp @B jz @D @D: xxx13 xxx12 xxx14 RET RET This means ABSOLUTE metamorphism. The skeleton changes, but not the code algorithm. It's necessary to run some generations to reach the original code, but since other JMPs are mutated in the same way, you never reach it, and you need some generations to eliminate a permutation JMP, while other new ones are inserted. The good thing is that it's not infinite, and we reach a point where this code gets stable, but not in the first generations, where we'll see that our code grows due to these jumps. Other thing that we can do is to use this disassembler to decode some instructions to its easy form: INC Reg --> ADD Reg,1, for example. In this way, we only have to treat one type of instruction, not all the instruction variants in the processor that perform the same operation (although it can be performed by the shrinker, but in this way we save pseudo-opcodes). b) Shrinker/emulator (deobfuscator) At this point, are we going to emulate or compress? The emulation is more advanced and powerful, but implies an outstandingly complex coding, and it has many problems, like the control of the values after a loop. The easier way of doing things, and also the way that gives better ratios in a quality/quantity comparision is compressing by known pairs and triplets of instructions: it's just undoes what the expander part does. The planning must have this part. You have to make a table with all the possible single instructions, pairs/triplets of them, and decide what you are going to compress/expand and what not. The shrinker can be used to eliminate "intelligent garbage": do-nothing code that is inserted as if it were part of the algorithm. Now I show the possible single/pairs/triplets MetaPHOR can compress, and they're just the set of instructions that the expander can generate from a single one. Legend: Reg: A register Mem: A memory address Imm: Immediate When in an instruction is Reg,Reg or something like that, both are the same register. If they are different, I write it as Reg,Reg2 (for example). Transformations over single instructions: XOR Reg,-1 --> NOT Reg XOR Mem,-1 --> NOT Mem MOV Reg,Reg --> NOP SUB Reg,Imm --> ADD Reg,-Imm SUB Mem,Imm --> ADD Mem,-Imm XOR Reg,0 --> MOV Reg,0 XOR Mem,0 --> MOV Mem,0 ADD Reg,0 --> NOP ADD Mem,0 --> NOP OR Reg,0 --> NOP OR Mem,0 --> NOP AND Reg,-1 --> NOP AND Mem,-1 --> NOP AND Reg,0 --> MOV Reg,0 AND Mem,0 --> MOV Mem,0 XOR Reg,Reg --> MOV Reg,0 SUB Reg,Reg --> MOV Reg,0 OR Reg,Reg --> CMP Reg,0 AND Reg,Reg --> CMP Reg,0 TEST Reg,Reg --> CMP Reg,0 LEA Reg,[Imm] --> MOV Reg,Imm LEA Reg,[Reg+Imm] --> ADD Reg,Imm LEA Reg,[Reg2] --> MOV Reg,Reg2 LEA Reg,[Reg+Reg2] --> ADD Reg,Reg2 LEA Reg,[Reg2+Reg2+xxx] --> LEA Reg,[2*Reg2+xxx] MOV Reg,Reg --> NOP MOV Mem,Mem --> NOP (result of a compression of PUSH Mem/POP Mem, with pseudoopcode 4F) The instructions that are eliminated (the ones that mean NOP) are be used as garbage along the executable code. Since every NOP instruction can be expanded (for example, MOV Reg,Reg can be set as PUSH Reg/POP Reg, and every PUSH and POP also can be expanded, and so on) you can't know what's garbage and what's not until you have compressed everything. The pairs of instructions that MetaPHOR can compress are: PUSH Imm / POP Reg --> MOV Reg,Imm PUSH Imm / POP Mem --> MOV Mem,Imm PUSH Reg / POP Reg2 --> MOV Reg2,Reg PUSH Reg / POP Mem --> MOV Mem,Reg PUSH Mem / POP Reg --> MOV Reg,Mem PUSH Mem / POP Mem2 --> MOV Mem2,Mem (codificated with pseudoopcode 4F) MOV Mem,Reg/PUSH Mem --> PUSH Reg POP Mem / MOV Reg,Mem --> POP Reg POP Mem2 / MOV Mem,Mem2 --> POP Mem MOV Mem,Reg / MOV Reg2,Mem --> MOV Reg2,Reg MOV Mem,Imm / PUSH Mem --> PUSH Imm MOV Mem,Imm / OP Reg,Mem --> OP Reg,Imm MOV Reg,Imm / ADD Reg,Reg2 --> LEA Reg,[Reg2+Imm] MOV Reg,Reg2 / ADD Reg,Imm --> LEA Reg,[Reg2+Imm] MOV Reg,Reg2 / ADD Reg,Reg3 --> LEA Reg,[Reg2+Reg3] ADD Reg,Imm / ADD Reg,Reg2 --> LEA Reg,[Reg+Reg2+Imm] ADD Reg,Reg2 / ADD Reg,Imm --> LEA Reg,[Reg+Reg2+Imm] OP Reg,Imm / OP Reg,Imm2 --> OP Reg,(Imm OP Imm2) (must be calculated) OP Mem,Imm / OP Mem,Imm2 --> OP Mem,(Imm OP Imm2) (must be calculated) LEA Reg,[Reg2+Imm] / ADD Reg,Reg3 --> LEA Reg,[Reg2+Reg3+Imm] LEA Reg,[(RegX+)Reg2+Imm] / ADD Reg,Reg2 -> LEA Reg,[(RegX+)2*Reg2+Imm] POP Mem / PUSH Mem --> NOP MOV Mem2,Mem / MOV Mem3,Mem2 --> MOV Mem3,Mem MOV Mem2,Mem / OP Reg,Mem2 --> OP Reg,Mem MOV Mem2,Mem / MOV Mem2,xxx --> MOV Mem2,xxx MOV Mem,Reg / CALL Mem --> CALL Reg MOV Mem,Reg / JMP Mem --> JMP Reg MOV Mem2,Mem / CALL Mem2 --> CALL Mem MOV Mem2,Mem / JMP Mem2 --> JMP Mem MOV Mem,Reg / MOV Mem2,Mem --> MOV Mem2,Reg OP Reg,xxx / MOV Reg,yyy --> MOV Reg,yyy Jcc @xxx / !Jcc @xxx --> JMP @xxx (this applies to (Jcc & 0FEh) with (Jcc | 1) NOT Reg / NEG Reg --> ADD Reg,1 NOT Reg / ADD Reg,1 --> NEG Reg NOT Mem / NEG Mem --> ADD Mem,1 NOT Mem / ADD Mem,1 --> NEG Mem NEG Reg / NOT Reg --> ADD Reg,-1 NEG Reg / ADD Reg,-1 --> NOT Reg NEG Mem / NOT Mem --> ADD Mem,-1 NEG Mem / ADD Mem,-1 --> NOT Mem CMP X,Y / != Jcc (CMP without Jcc) --> NOP TEST X,Y / != Jcc --> NOP POP Mem / JMP Mem --> RET PUSH Reg / RET --> JMP Reg CALL Mem / MOV Mem2,EAX --> CALL Mem / APICALL_STORE Mem2 MOV Reg,Mem / CALL Reg --> CALL Mem XOR Reg,Reg / MOV Reg8,[Mem] --> MOVZX Reg,byte ptr [Mem] MOV Reg,[Mem] / AND Reg,0FFh --> MOVZX Reg,byte ptr [Mem] Maybe there are more, but this set is sufficient, at least for our proposits. What we do know is scan the code for this situations and then we substitute the first instruction by their equivalent and we overwrite with NOP the second, so the instructions are compressed. But there are more: the triplets: MOV Mem,Reg OP Mem,Reg2 MOV Reg,Mem --> OP Reg,Reg2 MOV Mem,Reg OP Mem,Imm MOV Reg,Mem --> OP Reg,Imm MOV Mem,Imm OP Mem,Reg MOV Reg,Mem --> OP Reg,Imm (it can't be SUB) MOV Mem2,Mem OP Mem2,Reg MOV Mem,Mem2 --> OP Mem,Reg MOV Mem2,Mem OP Mem2,Imm MOV Mem,Mem2 --> OP Mem,Imm CMP Reg,Reg JO/JB/JNZ/JA/JS/JNP/JL/JG @xxx != Jcc --> NOP CMP Reg,Reg JNO/JAE/JZ/JBE/JNS/JP/JGE/JLE @xxx != Jcc --> JMP @xxx MOV Mem,Imm CMP/TEST Reg,Mem Jcc @xxx --> CMP/TEST Reg,Imm Jcc @xxx MOV Mem,Reg SUB/CMP Mem,Reg2 Jcc @xxx --> CMP Reg,Reg2 Jcc @xxx MOV Mem,Reg AND/TEST Mem,Reg2 Jcc @xxx --> TEST Reg,Reg2 Jcc @xxx MOV Mem,Reg SUB/CMP Mem,Imm Jcc @xxx --> CMP Reg,Imm Jcc @xxx MOV Mem,Reg AND/TEST Mem,Imm Jcc @xxx --> TEST Reg,Imm Jcc @xxx MOV Mem2,Mem CMP/TEST Reg,Mem2 Jcc @xxx --> CMP/TEST Reg,Mem Jcc @xxx MOV Mem2,Mem AND/TEST Mem2,Reg Jcc @xxx --> TEST Mem,Reg Jcc @xxx MOV Mem2,Mem SUB/CMP Mem2,Reg Jcc @xxx --> CMP Mem,Reg Jcc @xxx MOV Mem2,Mem AND/TEST Mem2,Imm Jcc @xxx --> TEST Mem,Imm Jcc @xxx MOV Mem2,Mem SUB/CMP Mem2,Imm Jcc @xxx --> CMP Mem,Imm Jcc @xxx PUSH EAX PUSH ECX PUSH EDX --> APICALL_BEGIN POP EDX POP ECX POP EAX --> APICALL_END And again, maybe there are more cases, but these are the ones I use. The mechanism we follow is the same as with pairs: we check if the three instructions under our pointer construct a defined triplet, and then we compress it overwriting the two last instructions with two NOPs. Once we have defined the singles, pairs and triplets of instructions, we are going to see the algorithm of compression, since we cannot apply this directly if we expanded them recursively (this means, maybe we coded PUSH Imm/POP Reg, but PUSH Imm can be coded as MOV Mem,Imm/PUSH Mem, so the instructions would be "MOV Mem,Imm/PUSH Mem/POP Reg", and the POP Reg could be expanded also. The algorithm of compression is: CurrentPointer = FirstInstruction @@Loop: if ([CurrentPointer] == MATCHING_SINGLE) { Convert it if (CurrentPointer != FirstInstruction) call DecreasePointer if (CurrentPointer != FirstInstruction) call DecreasePointer if (CurrentPointer != FirstInstruction) call DecreasePointer goto @@Loop } if ([CurrentPointer] == MATCHING_PAIR) { Convert it if (CurrentPointer != FirstInstruction) call DecreasePointer if (CurrentPointer != FirstInstruction) call DecreasePointer if (CurrentPointer != FirstInstruction) call DecreasePointer goto @@Loop } if([CurrentPointer] == MATCHING_TRIPLET) { Convert it if (CurrentPointer != FirstInstruction) call DecreasePointer if (CurrentPointer != FirstInstruction) call DecreasePointer if (CurrentPointer != FirstInstruction) call DecreasePointer goto @@Loop } do (CurrentPointer++) while [CurrentPointer] == NOP if(CurrentPointer != LastInstruction) goto @@Loop DecreasePointer: do (CurrentPointer--) while (([CurrentPointer] == NOP) && ([CurrentPointer.Label == FALSE)) return We don't need to pad with NOPs the last instruction to avoid that garbage from a previous disassembly generate something undesired, since no compression is defined that takes a RET, JMP or an instruction like that as the first instruction of a pair/triplet, and we are SURE - capitalized - that the code finishes with one of this instruction. Also take in account that we ignore NOPs always! Also take care of labelled instructions: we MUST NOT compress instructions if one of them (apart from the first one) has a label pointing to it (that's the reason of having a LABELLED flag embedded into the instruction structure that we defined while planning the engine). A label over it means that a jump, call, etc. points to the instruction, and if we merge it with the one above we are corrupting the code. The memory addresses that we eliminate with the compression are just temporary variables that we reserved in reassembly to hold the values to perform that operation. So, we must be sure that our true variables (the memory addresses where we store important things) must not match any pair or triplet, because if it does it will be eliminated and the engine won't work anymore. With a little care, we don't worry about that memory variables and we don't have to check if they are important variables or only transition ones, which allows us to redefine freely the position of the variables along the code and mix them with the temporary ones. An example of what the shrinker does: Original code: MOV [Var1], ESI * PUSH ESI * MOV EAX,ESI PUSH [Var1] * nop nop POP EAX POP EAX * nop PUSH EBX PUSH EBX PUSH EBX =====> POP [Var2] POP [Var2] POP [Var2] ADD EAX,[Var2] ADD EAX,[Var2] ADD EAX,[Var2] MOV EAX,ESI MOV EAX,ESI * LEA EAX,[ESI+EBX] nop nop nop nop nop nop ====> * MOV [Var2],EBX * ADD EAX,EBX * nop * nop nop nop ADD EAX,[Var2] * nop nop After passing the algorithm above to the pseudo-assembler, we obtain a code with lots of NOPs along the code, but it doesn't matter because, on expansion, we'll ignore this NOPs, performing a true optimization (although that achieved optimization is broken by the expander, but who cares...). c) Permutator Using an internal assembler allows to perform this pass in an easy way, and we haven't to keep all the instructions of the same size and things like that, because the reassembler will calculate that for us. The easiest way of doing a permutation over the code is to define "code frames": we construct a table where we define portions of code, giving an initial and a final offset, in this way: ESI = Initial address of instructions EDI = Address of last instruction Given ESI = 00000000h, EDI = 00000060h while(ESI < EDI) Store ESI ESI += Random(8)+8 Store ESI if((ESI+0F > EDI) Store ESI,EDI break; end if end while Result (for example): DD 00000000h,0000000Ah DD 0000000Ah,00000017h DD 00000017h,00000023h DD 00000023h,00000032h DD 00000032h,0000003Dh DD 0000003Dh,00000049h DD 00000049h,00000052h DD 00000052h,00000060h Now we make a shuffle of the elements of this array. Shuffling is easy (there are lots of algorithms for doing it). After doing it, we obtain: DD 00000032h,0000003Dh DD 00000023h,00000032h DD 0000000Ah,00000017h DD 00000000h,0000000Ah DD 00000017h,00000023h DD 00000052h,00000060h DD 0000003Dh,00000049h DD 00000049h,00000052h While doing it, we keep the track of the first frame, which is the entrypoint of the code. If the first frame is not the entrypoint, we insert a JMP to that frame. Since the destiny of the linking JMPs is not known, we store them into a table for, after copying all the instructions, complete them. So, the first action we make is to insert a JMP for the entrypoint, and then we begin to copy the instructions. The first frame tell us that we must copy from instruction 32h to instruction 3Dh, and then insert a JMP to the next frame, and so on. After performing this, we'll get a permutated code. At this point we only have to finish the JMPs we have stored for later fixing, and all is done. Also try different things here: for example, if you leave the NOPs that the shrinker generated it will derive in a more random code distribution, since the NOPs will be included also in the permutation, but eliminated in the next code processings. d) Expander (obfuscator) The expander is the part that undoes what the shrinker did. It performs *exactly* the reverse operation (in a random way, of course). Just take the defined singles, pairs and triplets and code an alternative for every instruction an alternative (apart from their direct codification). For this, we are going to code it all recursive. For example, when we find a 50h opcode (PUSH Reg) we'll call MakePUSHReg(), which will decide randomly if we code directly the PUSH or we make a workaround using a defined pair or triplet. So, the function decides to make it as MOV Mem,Reg / PUSH Mem, so it calls MakeMOVMemReg() and MakePUSHMem(). But, what a casuallity! The function MakeMOVMemReg() can be called with MakePUSHReg()+MakePOPMem(), which again calls MakePUSHReg(). This can make the code grow too much, so we put a recursivity control, i.e. a variable that increments when a recursive-procedure-using function is called and decreased when it leaves. We check that variable to see if it has increased to a certain number, and if we reach it we code the instruction directly (in this case, 50h+Reg and EIP++). The expander is cooler if we use our internal assembly language also, so we haven't to deal with the final codification. That work will be done by the reassembler (next section). To be sure that what's resulting on the code expansion, we are going to generate the pseudo-assembler as similar to the final code as we can (that's why I have opcodes such 4E and 4F to code INC and DEC in the list of opcodes). So, if we find CMP EAX,0 on the expander, we use it to code OR EAX,EAX or TEST EAX,EAX in pseudoassembler, so the reassembler only has to code the instruction directly, without having to decide anything more than some randomness on opcode generation. The expander should make some more operations, such as: * Register translation We select a new register to translate the register that we were using until now. The easier way is to put the sequence 0,1,2,3,5,6,7 in a list and shuffle it, and then translate every register by that number. In this way the registers are never the same when performing operations, and even on memory access, since if we use ESI->EBP translation (for example) the way these memory references are assembled to x86 are different (or can be different). It's something that we could see in Vecna's Regswap. * Variable re-selection To do memory variables we have to own a memory buffer (reserved with VirtualAlloc, .bss section in the host itself, etc.). If you have the variables we use to store important things at a fixed location, forget it! Put them in a buffer where all them are together. In this way, we only have to detect that a memory address is a variable (looking if it's a memory address of the form [DeltaRegister+12345678h]), store it in a buffer like we did with the code labels (but this time with memory addresses), shuffle them and reassign an address for every variable. Doing this we don't use fixed variables anymore. The only inconvenient is that we must supply some values from other places, such from the decryptor (if we use one). Also I'll take a case for LEAs: better if we use them only for recoding MOV Reg,Value, ADD Reg,Value, MOV Reg,Reg and ADD Reg,Reg. Why? Because if you look a LEA closer, means that it arrived here because it's a shrink of some more instructions, like MOV EAX,EBX / ADD EAX,12345678h. So, if we avoid the coding of the LEA and we disgregate it in simpler operations (randomly), we have an embedded swapper module. An example: MOV EAX,ECX ADD EAX,3 -> (shrink) -> LEA EAX,[ECX+3] -> (expand) -> MOV EAX,3 ADD EAX,ECX (expand) -> MOV EAX,ECX ADD EAX,3 The thing is that if we did the effort of coding a swapper in a part of the engine, we'll see that these cases are contemplated, so we coded things that are redundant in the final result. Anyway, a swap can be performed on expansion, but take care, because many times a swap between two elements like the next ones without controlling EVERYTHING can corrupt the code: MOV EAX,1234 MOV EBX,2345 <-- check if a label is pointing to this! MOV EAX,1234 MOV EBX,[EAX] <-- check if the second instruction uses the elements of the first instruction MOV EAX,[EBX] MOV EBX,1234 <-- check if the first instruction uses the elements of the second instruction MOV EAX,[EBX] MOV [ECX],EDX <-- EBX and ECX has the same value? We can't know this without total emulation And MANY MANY more. My experience is that there are so many factors that the code can crash even with the most little, innocent change that the swapper makes although it seems to be perfectly valid. The only ones we can mutate with security are the ones that are inserted in LEAs and unrolled from them later. Moreover, we avoid the coding of another huge routine :). e) Reassembler The reassembler is the end edge of the engine. This piece of code will generate the instructions the processor can understand. If we followed the philosophy of expanding the code in pseudo-assembler, this is a piece of cake, because we have done this plenty of times when we coded polymorphic engines. Moreover, the vision of seeing your engine finished animates you a lot, believe me :). The reassembler is a kind module that likes to be helped from other ones, such from the expander. We can code it to take literally the pseudoassembler and write what it's expressed there, with no care about what it means. But while we code it, we see that there are some things that aren't as easy as they seem, for example the EIP displacement instructions. How can we code foward JMPs, CALLs and Jccs? We must use (again) a table, and store here all the instructions we must fix when the pseudocode is all assembled. But a real pain can be the foward short JMPs/Jccs. We can do it without it, but it's not fair ;). My solution (very neat) is to look if the instruction points to a maximum length of 11 or 12 instructions foward, so if it's below that mark we can decide if we code it short or long (randomly, ofcoz). Where we haven't any problem is with backwards jumps, because we always know the length. CALLs doesn't need to be decided, but we must resolve them also when they are fowards, because the code isn't assembled yet. The randomness in this part of the engine is made in the jumps (when we can decide if we use short or long ones) and in the opcode reassembly when we have several possibilities for the same opcode. The list that follows shows some instructions that can be done in this way: B0+Reg C6 C0+Reg --> MOV Reg8,Value B8+Reg C7 C0+Reg --> MOV Reg,Value 50+Reg FF F0+Reg --> PUSH Reg 58+Reg 8F C0+Reg --> POP Reg 40+Reg FF C0+Reg --> INC Reg 48+Reg FF C8+Reg --> DEC Reg This is an example. Other ones can be the instructions that can uses EAX (both in EAX-exclusive opcode or using the generic opcode), the opcodes that takes a sign-extended byte-to-dword operand (opcode 83, and using -80 > value < 7F), and much more. All them are only random at opcode level, because all the alternate ways of doing things (the complex movements, and the like) are made with the expander. Hey, it's done! We have a reassembled code! Now it comes the hard part: DEBUGGING. 3) Known problems (and solutions) --------------------------------- a) Debugging your engine Debugging metamorphic code is a hell, and if you never have coded such an engine, you'll notice it. It's fine while you debug the code at first generation because you can see what you did, but the problem comes when you must debug the generated code. The assembler is soooo obfuscated that you maybe get crazy, so the solution is: debug from the beginning! Don't code the shrinker/expander until all the other parts work perfectly! Other thing you must consider is that the debugger uses INT 3 to patch the return from CALLs. If you code the disassembler in a separated function, take care when tracing/stepping over, because: MOV EDI,[VirusEntry] CALL Disassembly MOV EAX,12345678 --> The disassembler will see: MOV EDI,[VirusEntry] CALL Disassembly INT 3 JS @xxx ... You can see that the debugger corrupts the code. This is good, because the code is implicitly anti-debugging :), but it's bad because if you don't notice it you believe you're wrong in some point and lose lots of time with a checking you didn't need. The solution is: 1) Enter the call and trace all the debugging (unacceptable) or 2) Use hardware breakpoints (OOOOOH! clap clap clap clap) So, take care of where you press the "Run to here" key, since it will be OK if you use it after you have runned the disassembler, but it will corrupt the code if you use it before the disassembly and you point to code that will execute after it. Using hardware breakpoints on "fetch instruction" will eliminate the problem, but it's more annoying (more keys to press, etc.). Make it in a macro. b) API calls API calls are a real pain in the ass to detect. Since all the calls haven't the same number of parameters, we don't know at what point the parameters start (but we know that the API ends with the CALL, ah, now we can rest in peace :P). This type of calls are delicate because we can't change the registers around them as we want, since the return value is given always at EAX, and they always modify ECX and EDX. So, in a normal code, we happily set values in registers without using these three, but we can't make it when we are translating registers "a go-go" (or we can play only with registers > EDX, but then we limit the possibilities). We also have to detect the moment when EAX is used to check the return value, and lots of things that comes from the fixed use of these registers, things that collide with our engine. The easy solution is to use a "code mark": they're a sequence of certain instructions with certain operands that doesn't repeat along the code unless the thing they are marking is present. The ones I use are the instructions called APICALL_BEGIN, APICALL_END and APICALL_STORE. APICALL_BEGIN is, simply PUSH EAX/PUSH ECX/PUSH EDX. It's detected by the shrinker, since every instruction of the structure can be expanded as well as with other instructions. The shrinker will detect this sequence of instructions and will change them by the pseudoopcode F4, which is a mark for the expander to code PUSH EAX/PUSH ECX/PUSH EDX. The registers must be always EAX, ECX and EDX. In this way we assure the saving of these ones before an API call. In the same way, APICALL_END is POP EDX/POP ECX/POP EAX. This is also translated by the shrinker, giving a pseudoopcode F5, which is also a mark for the expander to code POP EDX/POP ECX/POP EAX. This signalizes the end of an API call and then the restoration of what it was saved. So, for the registers values it's like if the API call was never performed. APICALL_STORE is another pseudoinstruction. The shrinker will detect it when it gets a MOV [Mem],EAX just after the CALL of the API. This is done to avoid the translating of EAX, so this instruction will always be codificated as MOV [Mem],EAX, regardless the register EAX must be translated to. The memory address is the variable where the return value is stored, and can be retrieved after doing the APICALL_END. The next example illustrates the use of that technique: PUSH EAX PUSH ECX PUSH EDX -------------------> APICALL_BEGIN MOV EAX,[EBP+AddressOfNewDirectory] PUSH EAX CALL DWORD PTR [EBP+RVA_SetCurrentDirectoryA] MOV [EBP+ReturnValue],EAX -------------> APICALL_STORE [ReturnValue] POP EDX POP ECX POP EAX -------------------> APICALL_END MOV EAX,[EBP+ReturnValue] CMP EAX,EDX ; Get the return value and check it JZ @X ... Now change the registers: PUSH EAX PUSH ECX PUSH EDX -------------------> APICALL_BEGIN MOV ESI,[EBX+AddressOfNewDirectory] PUSH ESI CALL DWORD PTR [EBX+SetCurrentDirectoryA] MOV [EBX+ReturnValue],EAX -----------> APICALL_STORE [ReturnValue] POP EDX POP ECX POP EAX -------------------> APICALL_END MOV ESI,[EBX+ReturnValue] CMP ESI,ECX ; Get the return value and check it JZ @X ... For obvious reasons, you must not use EAX, ECX or EDX as the Delta register in the recodification of the virus because, if you use it, the call to the API function will overwrite it and the return value will be wrote anywhere, with a 99% of throwing an exception. We can use ths fact (Delta is not EAX, ECX or EDX) to code the expansion of CALL DWORD PTR [Mem], which we planned to be MOV Reg,Mem/CALL Reg: we can use EAX, ECX or EDX for Reg perfectly and without saving anything, because the register values are going to be destroyed by the API call. For Linux (for the ones that want to make metamorphism under this system) the API call is more complicated, since we pass the parameters in the registers (EAX, EBX, ECX, etc.). Well, we can define a structure in this way: MOV [EBP+Parameter1], XXX MOV [EBP+Parameter2], YYY MOV [EBP+Parameter3], ZZZ PUSH EAX PUSH EBX PUSH ECX PUSH EDX PUSH EBP --------------> LINUX_SYSCALL_BEGIN DeltaReg MOV EAX,[EBP+Parameter1] MOV EBX,[EBP+Parameter2] MOV ECX,[EBP+Parameter3] MOV EDX,[EBP+Parameter4] ---> LINUX_SYSCALL_LOADPARAM INT 80h POP EBP MOV [EBP+ReturnValue],EAX --> LINUX_SYSCALL_STORE DeltaReg,[ReturnValue] POP EDX POP ECX POP EBX POP EAX --------------> LINUX_SYSCALL_END It's larger, but I'm not God :). There's something that also applies to this, although it's not an API call: the SET_WEIGHT pseudoinstruction. The instruction corresponds to the next code structure: PUSH Reg1 MOV Reg1,WEIGHT_IDENT MOV Reg2,xxyyzztt MOV [Mem],Reg2 POP Reg1 This is compressed to SET_WEIGHT [Mem],IDENT,Reg1,Reg2. This instruction is used to transmit the data here from one generation to another. We can only transmit this because we obtained the values before making the total reasm, so we can't use it for substituting parameters passed by stack. The weights expressed here are part of a little genetic algorithm that makes the virus to be "adaptative", since the "natural selection" will make that the survivors have the best options to be in the wild (type of infection, structure of the decryptor, infection rate. etc.). The algorithm is not very advanced, but it does its work. c) Memory Yes, metamorphism by-the-hard-way requires lots of memory to store tables, the disassemblies, the temporal code changes, execution path marks, the local variables, and lots of etc. etc. etc. The using of ESP is a dirty trick that can be useful to store not very big things, but all changes when you realize that you need about 3 or 4 Mb to do a decent job (just see z0mbie's Mistfall: reserves 32Mb!). The only solution is to use VirtualAlloc and reserve the quantity you need. Since we reserve memory, nothing stop us of copying the code to memory, as if we were dinamically generated code. Moreover, for an antivirus emulator it's a copy, not a decryption (if you decrypt while copying). For this, we have to code a polymorphic engine to generate at least a "copier" (just if you want to leave the code unencrypted). The copy to memory allow us to make more nice things: we can copy our code anywhere inside that memory frame, and have our memory variables at the offset we want. Since we use a Delta register, we don't care where the code is! The problem to this is to get our internal offset (just notice that the memory address where we are is dinamically got by the operating system). We can recode the addresses of all buffers and tables while reassembling, and then even the "shape" of the final code will be different! The problem here is that we must supply the address where the code is stored, the data division inside the reserved virtual allocated memory, etc., so we must supply them from the unique thing we must vary as we want: the decryptor/copier, making, for example, pushes of the values and then poping them once in the engine (look the source code of MetaPHOR). 4) Future This section is optional. It expresses ideas that I'm thinking to increase the functionality of the virus MetaPHOR, and that maybe are useful for you to implement. a) Plug-ins That's a thing one day I'll implement in my virus for sure (I hope!). I realized that with this type of metamorphism the plug-ins are very easy to implement (although "easy" doesn't mean "short and quick-coding"). The plug-ins can be implemented with "code marks", in the same manner we signalize an API call. What we do is to make an information header (variables it uses, etc.) and a routine made in pseudoassembler (our assembler). A code mark that marks a plug-in can be "PUSH Reg/MOV Reg,12345678/POP Reg" (a do-nothing code that we don't eliminate with the shrinker). The number passed in that register is, in fact, a plug-in identificator and a version number, (using hi-word and lo-word), which allows us to compare the version of the plug-in currently installed with the version of a new plug-in, and to determine if the plug-in must substitute and old one or must be added to the execution. The mark would be repeated twice (one for begin and once for end), with the value in the last one with the high-bit to 1, so we can substitute the current plug-in by the new one. The shrinker must detect this structure and set two new instructions called "PLUGIN_BEGIN Version" and "PLUGIN_END Version". When the shrinker has performed it's job, and just after the permutator, the plug-in injector comes to work: it searches for new plug-ins anywhere, decrypts them and check their signature (like Vecna does in his Hybris). The plug-ins are not needing to be encrypted, and are also nearly impossible to modify or put one that is not yours if you use a public-private key signature system. Depending on the type of plug-in and the version, we can substitute directly the plug-in entry (if type coincides and the version is greater) overwriting the CALL to the last plug-in. This is a 3D-coding technique that I mentioned nearly at the beginning of the article: the code of the old plug-in remains and it's assembled forming part of the new virus, but when in the next generation the virus get disassembled, that part won't be reached by the disassembler part, so the olg plug-in is eliminated (and meanwhile the code gets garbled and mixed with code that does absolutely nothing :). All the whole virus can be made of plug-ins, or at least have plug-in identificators at all the routines, even at the plug-in downloader. In that case, if one day I make support for Linux (I think it'll be before the plug-ins get finished :) I only have to code the plug-in and put it where it corresponds. Also we should have to have a field to specify the zone where we want to put a new plug-in (just in case we want to insert a plug-in just after the permutator but before the expander, to make something new, or just if we forgot to code a function and we want to put it now). It'll be on the plug-in identificator, and it's simple: just make a unique identifier for each plug-in in the original virus that be quite separated from the one before, for example: Disassembler: 10000001, Shrinker 20000001, etc. (all them with the identificator and the version). If we want a new function between the disassembler and the shrinker, we use the identificator 18000001, and so on. Let's see if one day I do it :). b) Multi-platform cross-infector That's something that I didn't do on the release 1.0 of MetaPHOR because I was short of time to finish it for 29A#6, but it's quite easy (in fact, we have seen the proof of concept in Benny's Winux). We only have to call one API function or another, depending on the system we are running, and have a function to attach our mutated code to a PE or to an ELF. The APIs can be used in the same way (for example, I code a single function called MapFile that returns me a file mapping for Windows using CreateFileMapping and MapViewOfFile, and for Linux using the pertinent int 80h function). c) Reassembling to different processors That's something that I don't know if I'm going to code it before plug-ins (in which case I'll program a plug-in rather than a new whole virus). If you have realized, the pseudoassembler is quite general, so it can be adapted to any processor we want! Only if they use a 32-bits architecture, of course. The 64-bits architecture requires a change in all the management of the internal pseudoassembler, but not very complicated (only I have to extend the instruction length from 16 bytes to 32, which gives me space for the QWORDs and some new fields that won't remain in blank). The fact is that I thought about it and can be done for sure, and the first pass would be use Alpha assembler and the WinNT under Alpha processors. Why? Because I don't have to change NOTHING, only the Disassembler and the Assembler I use, since the infection methods and the algorithm would be exactly the same! (since we use ring-3). The shrinker, permutator, expander, etc. would remain untouched. The only thing we have to do is redefine some instructions and expansions under this assembler, since the TEST instruction, for example, doesn't exist in all the processors (I think it doesn't exist in Itaniums). The very first thing to do is to code a new disassembler (but keeping the current). So, we only have to see if our code is Alpha or x86 to call to one routine or other. After this, since we are using only our internal assembler, we are in a common part (in algorithm, but not in codification) until the reassembler, which will be selected (reassembler for Alpha or for x86) depending on the target processor specified in the PE header. The different processors we can use are quite alot (and then also means a lot of work!): x86, Alpha, Itanium, 680x0, PowerPC, PA-RISC, etc. (all the platforms that can run WinNT or Unix/Linux, or even MacOS). The great thing is that the code is completely changed: it's not a virus with several parts, one for every processor, as it was MrSandman's Esperanto (with respect, Sandy ;), it was another type of code). The whole virus will reassemble to the new processor's assembly language, so an antivirus that wants to intercept the virus for x86 processors have to deal with a possible conversion to Alpha assembler and then, once in an Alpha, reconvert to x86, jumping over the antivirus completely (for example, in an infected server). And since it's metamorphic, they can't apply strings to detect it. Yummy! :) 5) Conclusion Metamorphism: the strongest viric technique ever ideated, ever created. Any more to say? --------------------------------- * The Mental Driller / 29A, written for 29A#6 on 02/02/2002 ---------------------------------