========================
Metamorphism in practice                                Published for
========================                               29A e-zine nr.6
                                                       ---------------
or

"How I made MetaPHOR and what I've learnt",

                                 by

                                    The Mental Driller / 29A


Index
-----

0) Basics
   a) What's metamorphism?
   b) Structure of a metamorphic code

1) Planning
   a) Mentalization: we code with macros!
   b) What we are going to do?
        i) Simplistic: permutation/substitution
        ii) "Accordion" model (shrink/expand)
        iii) Pseudo-assembly language

2) Coding
   a) Disassembler/depermutator (deobfuscator)
   b) Shrinker/emulator
   c) Permutator
   d) Expander (obfuscator)
   e) Reassembler

3) Known problems (and solutions)
   a) Debugging your engine
   b) API calls
   c) Memory

4) Future
   a) Plugins
   b) Multi-platform cross-infection
   c) Reassembling to different processors

5) Conclusion

Greets
------
My greets go for all 29A members, for Vecna and Z0MBiE for being pioneers in
the little explored field of metamorphism and for Vecna again, for advertising
me in some points of the structure of the article.


0) Basics
---------
 a) What's metamorphism?

  :P Obligated question and answer in an article like this :).

  Metamorphism is the art of extreme mutation. This means, we mutate
  everything in the code, not only a possible decryptor. Metamorphism was the
  natural evolution from polymorphism, which appeared to evade virus scanners.
  With metamorphism, the difficulty to detect a virus grows exponentially.

  Then, why aren't there more metamorphic viruses? Simple: they are
  extremely difficult to make, as I show in this article (not only for tech
  used, but for the many fux0ring problems we can when we code something like
  that). Anyway, we'll try to see here that maybe the important thing is to
  have the correct ideas (something that coders like Vecna, Z0MBiE and others
  had - hello! :).


 b) Structure of a metamorphic code

  A metamorphic virus is like a 49cc motorbike with a spaceshuttle fuel
  deposit (if you can handle such vision :). In fact, the 90%+ of the code is
  the metamorphism engine to mutate the little part dedicated to infection,
  which is somewhat paradoxical. The engine has to be able to mutate itself
  and the attached code that allows the engine to travel alone (oh). It's far
  from the little polymorphic engines that made a 8Kb virus to mutate with a
  single 200 bytes code length engine: this time is just the opposite! That's
  because we can't make a whole disassembler in 200 bytes (well, maybe Super
  can ;). Then, the structure is :

      ********************************************************-------
                       E N G I N E                             virus

  And the structure of the engine is (thypically):

      ***********···········&&&&#############@@@@@@@@
        disasm     shrinker permut. expander  assembler

  Now in detail:

  Disassembler:
    The very start of the engine. The disassembler will decode every
    instruction to know the length of it, the registers it uses and all the
    information related with itself. It has also to be able to decode IP
    changing instructions like JMP and CALL in x86 (or BSR in others, for
    example).

  Shrinker:
    Also called compressor, this part will compress the disassembled code
    generated in the previous generation (i.e. the code in this generation).
    This is done to avoid the in-every-generation growth that will render in a
    many-mega-bytes virus in very few generations, although you use a non
    expansion technique of mutation. So, this part depends on the type of your
    metamorphic code, and it's also the most difficult to make: in fact, VERY
    few viruses has this part. Basically, it compresses in one instruction
    what the expander coded in many. It can be also an emulator, eliminating
    do-nothing and redundant instructions and compressing operations into a
    single one (for example, MOV Reg,1234/ADD Reg,4321 --> MOV Reg,5555).

  Permutator:
    A basic part of the metamorphism engine, and the one that many virus
    authors has coded to make metamorphic viruses, altough it's not
    metamorphism at all, since you leave the instructions unchanged. It's
    normally combined with other forms of metamorphism like instruction
    substitution (XOR EAX,EAX for SUB EAX,EAX, for example, and etc.). It's
    very simple in its concept, but very powerful since it breaks all the
    scan strings that can be used to detect the virus.

  Expander:
    This part only exists when a shrinker is present (well, there are some
    viruses that have it, but the code in them grows uncontrollably). This
    does what the shrinker undoes: it recodes a single instruction to many
    instructions that perform the same.

  Assembler:
    It recodes what we constructed with the expander. It fixes JMPs, CALLs and
    all that instructions, instruction lengths, changes the registers, etc.
    Or, if you are using an internal pseudo-assembler, it reassembles that
    code into the target processor language.


  OK, these are the normal parts. Anyway, decide what you decide to make, you
  need a PLANNING!! Don't commit the stupidity (sorry :) of making a hard
  metamorphic code (light is OK ;) without planning what you want to do. If
  not, the most probable thing that would happen is that you never finish the
  code. When coding the MetaPHOR virus I was planning the code for about two
  months, and I wanted to have clear in my mind what I wanted to do exactly.
  And, believe me, it helped me alot and I saved a lot of work. So, let's do
  what I did for planning (just if it helps you to plan your code).


1) Planning
-----------

 a) Mentalization: we code with macros!

  This part is very important: forget that you are touching instruction at
  machine level. What you do is "move this value to this register" or "add the
  content of this register to this variable". This makes easier the way you
  look at coding metamorphism (in fact, it's the same than when you code
  polymorphism). All the instructions and groups of instructions are macros of
  the operation you really want to do. The objective of this is learning to
  see the code as a bunch of instructions that doesn't rely on the final code,
  but in an operation that must be performed to make a bigger operation.

  b) What we are going to do?

    i) Simplistic: permutation/substitution

  If this is the way you want to do metamorphism (or you don't want to
  complicate your life making a 400 Kb engine) this is your choice. Plan the
  way you are going to permutate the code (JMP linking, single-instruction
  shifting and NOP padding, etc.). This must be present always you code your
  virus, because the whole code must be compatible with the type. For
  example, if you select NOP padding you must code all the instructions with
  equal size, padding them with NOPs. Since it's the most simple way of doing
  meta, it's also the way that requires less planning: just code directly,
  and after you have finished the first release (and before you test), just
  pad with NOPs. Other way can be a macro that does:

        db 10h dup (90h) ; NOPs
        org $-10h
        <instruction>
        .align 4

  Or something similar. Code a macro if you are a normal person, or do it
  directly if you are crazy ;).

  Another way is to get the length of every instruction in run-time and then
  garble them in a buffer. This requires disassembly to fix JMPs, CALLs and
  Jccs (conditional jumps), so maybe it's not as simple as it can seem.

    ii) "Accordion" model (shrink/expand)

  What a name for a technique! XD The power of this kind of metamorphism is
  the fact that code maybe doesn't permutate, but it's always different (and
  can be combined perfectly with permutation, which generates "absolute
  metamorphism"). For this kind of meta you have to decide if you are going
  to use a mini-emulator or just a disassembly into a pseudo-assembler, which
  is the solution I made in MetaPHOR.

  For this, you first define your own assembly language (that can be based
  on x86 opcodes).  Just keep in mind that the more this assembly language
  seems the x86 opcodes, the more easy will be its handling. So, let's merge
  the next section...

    iii) Pseudo-assembly language

  ...and continue. The MetaPHOR internal pseudo-assembler follows the next
  rules:

      a) All the instructions are 16-bytes long (but this can change in the
       future to handle 64-bits processors, like the Itanium).

      b) The structure of the instruction is always the same for all them:

        General structure:

        16 bytes per instruction,

        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        OP *----- instruction data ----* LM *-pointer-*

        OP is the opcode of the instruction. Depending on the opcode we use
         an instruction data structure or other.

        LM is "Label Mark". Its value is 1 when a label is pointing to this
         instruction, and can be used for quite things, for example to know
         if two instructions can be shrinked or not (they can't if the second
         one has a label over it). It's at +0B in the instruction.

        The dword at +0C is a pointer that means "last code reference". On
        disassembly this means the EIP where this instruction is pointing to
        its original codification, but while we are advancing in the code
        treatment we store here references to the last situation of the
        instruction. This helps to make modifications to the table of labels,
        to recode the displacement instructions (JMP, CALL, etc.) and more.


       Now the structures that the engine uses in the instructions:

        Memory_address_struct:
           +01: First index
           +02: Second index, bits 7&6 are the multiplicator (00=*1,01=*2,
                                                              10=*4,11=*8)
           +03: DWORD addition to indexes



        Depending on the opcode (the operation to perform), the following
        means:

        - If operation has no operand (NOP, RET, etc). nothing in the instr.
         data is performed

        - If operation has one operand:

           Register operand:
              +01: Register

           Memory address:
              +01: Memory address struct

           Immediate value:
              +07: DWORD value, zero extended if it's a byte operation

           Destiny address (JMP, CALL, etc.)
              +01: Label to jump to (DWORD)

        - If operation has two operands:

           Reg,Imm:
              +01: Register
              +07: DWORD immediate value, zero extended if it's a 8-bits op.

           Reg,Reg:
              +01: Source register
              +07: Destiny register

           Reg,Mem:
              +01: Memory address struct
              +07: Destiny register

           Mem,Reg:
              +01: Memory address struct
              +07: Source register

           Mem,Imm:
              +01: Memory address struct
              +07: DWORD immediate value, zero extended if it's a 8-bits op.


  From this rules, now we use the next pseudo-opcodes:

      00: ADD, 08: OR, 20: AND, 28: SUB, 30: XOR, 38: CMP, 40: MOV, 48: TEST

      Set rules: +00: Reg,Imm
                 +01: Reg,Reg
                 +02: Reg,Mem
                 +03: Mem,Reg
                 +04: Mem,Imm
                 +80: 8 bits operation

      So, opcode 83 means ADD Mem,Reg using 8-bits operands, and so on.
        
      50: PUSH Reg
      51: PUSH Mem
      58: POP Reg
      59: POP Mem
      68: PUSH Imm
      70-7F: Conditional jumps
      E0: NOT Reg
      E1: NOT Mem
      E2: NOT Reg8
      E3: NOT Mem8
      E4: NEG Reg
      E5: NEG Mem
      E6: NEG Reg8
      E7: NEG Mem8
      E8: CALL label
      E9: JMP label
      EA: CALL Mem (used for API calls)
      EB: JMP Mem (used for obfuscation in API calls)
      EC: CALL Reg (obfuscation of API calls)
      ED: JMP Reg (idem)

      F0: SHIFT Reg,Imm
      F1: SHIFT Mem,Imm
      F2: SHIFT Reg8,Imm
      F3: SHIFT Mem8,Imm
          For all SHIFTs:
          +07: Byte with the value of rotation/shifting
          +08: Operation performed: 0: ROL, 8: ROR, 20: SHL, 28: SHR
      F4: APICALL_BEGIN
          Special operation meaning PUSH EAX/PUSH ECX/PUSH EDX that avoids the
          recoding of
          these registers, always remaining the same.
      F5: APICALL_END
          The complementary of APICALL_BEGIN, it means POP EDX/POP ECX/POP EAX
      F6: APICALL_STORE
          +01: Memory address struct
          This always means: MOV [Mem],EAX <-- Avoiding the recoding of EAX
      F7: SET_WEIGHT
          +01: Memory address struct
          +07: Byte with the weight identificator
          +08: Register 1 in code structure
          +09: Register 2 in code structure
      F8: MOVZX
          Memory address struct is a 8-bits operand, while +07 is a 32 bit reg.
      FC: LEA
      FE: RET
      FF: NOP

  These are the opcodes we can find when disassembling. Additionally, I have
  have defined some more opcodes for internal operations:

      4F: Exists only While shrinking, and means a MOV Mem,Mem, being:
          +01: Source memory address struct
          +07: Pointer to the instruction that holds the destiny memory
               address struct, which has the format:
               +00: FF (NOP)
               +01: Destiny memory address struct
               +07: Ptr to source memory address holder (the 4F instruction)

  Since 4F is only a transition opcode, it's free for use after shrinking.
  There are three more opcodes used only for reassembly:

      4E: INC/DEC Register
          +01: Register
          +07: 0 if INC, 8 if DEC
      4F: INC/DEC memory address
          +01: Memory address struct
          +07: 0 if INC, 8 if DEC
      FD: Literal byte
          +01: Byte to insert directly into the code


  Thats the planning I made for the beginning. As you can see, the opcodes
  are very similar to the x86 ones, so they are quite easy to remember (in
  fact, I wrote this list only remembering it :).

  Now, the second part of planning: the shrinker. How are we going to shrink?
  We'll see them in its section, because I'm not on the mood of writing
  things twince :P.



2) Coding
---------

  a) Disassembler/depermutator

  The entrypoint of the engine is this! Before making anything, we must
  disassemble, of course. The disassembler is quite easy in concept, but a
  real drudgery. Since we are trying to do an absolute metamorphic code, we
  can't use hash tables although we code a routine to mutate that hash,
  something that can be harder. My solution was the first case, and moreover
  I combined the depermutator in an implicit way. The theory of decoding is
  as follows:

  Given ESI = Entrypoint, we have a memory buffer as big as the code we are
  going to disassemble. This buffer (stored in the variable PathMarks) allows
  us to control the already-disassembled code. We have also two more tables:
  the LabelTable and the FutureLabelTable, also initialized. These two tables
  have a counter variable for each one, giving the number of elements in
  every table. We have also the DisassembledCode in EDI, just where we decode
  the instructions.

  LabelTable is a table where each element is 2 DWORDs long. The first DWORD
  stores the real EIP where it points, where the second DWORD stores a
  pointer to the disassembled code. Then, when we decode a JMP, we set a
  a pointer to this table as the label. In this way, we can move as we want
  the internal pointers of the label and all the instructions referencing to
  that label are automatically updated.

  FutureLabelTable is a buffer table that only exists on disassembly. It's
  used to store destinies of JMPs, CALLs, etc. pointing to code that we
  haven't disassembled yet. Every time we decode an instruction, we look if
  that address is stored in this table, and then we can complete all the
  instructions that referenced to that address if it exists there.

  Given all this, let's see the algorithm:

    1) Initialize the PathMarks map (i.e. zeroing it) and the number of labes
      and future labels.

    2) Translate the current EIP (in ESI) directly onto the PathMarks map.

        - If it exists, store at ESI a JMP to the disassembled instruction
        (storing also a label pointing to that instruction if it hasn't one
        already).

        - If it doesn't exists, mark it as already disassembled and decode the
         instruction. Now, depending on the instruction, we act:

           If it's JMP:

             * If it points to an already decoded address, write a JMP
              instruction, insert a label to the destiny and get a new EIP at
              FutureLabelTable. If the label already exists, use that label.

             * If not, then write a NOP (just in case a label points directly
              to this JMP) and load a new EIP (in ESI) with the destiny. In
              this way, we have eliminated a possible permutation JMP.

           If it's Jcc (conditional jump):

             * If it points to an already decoded address, write the Jcc and
              insert a label to the destiny if the label doesn't exist (if
              not, use the label already inserted in the table).

             * If the destiny is not disassembled yet, then store it at
              FutureLabelTable and continue.

           If it's CALL, act as if it were a Jcc.

           If it's RET, JMP Reg or JMP [Mem] (a final leaf in the code tree),
           store the instruction and get a new EIP from FutureLabelTable.


  When getting a new EIP from the FutureLabelTable, we check if the labels
  stored here are already decoded. If they are, then we insert the
  corresponding labels at the LabelTable and eliminate the entry in
  FutureLabelTable. If not, we get that new EIP (i.e. we load ESI with that
  new entrypoint), we insert the new label at LabelTable and continue.

  As you can deduct, the disassembly will end when FutureLabelTable has no
  more entries, since this means that we came from an end leaf of the code
  flux. After performing such "emulation", we have:

   1) Eliminated the permutation and the permutation jumps (since we have
     eliminated the JMPs by changing directly the EIP in ESI).
   2) Eliminated all the code that can't be reached in any way.
   3) Decoded the whole code in our pseudo-assembler.
   4) Substituted the labels by pointers to table entries.


  We have done it! We haven't to code a depermutator or an emulator to detect
  dead code zones, since we eliminate them implicity. An example of a
  depermutation performed here would be:

       CODE         PASSES
      ------        ------
        xxx1        1) Decode xxx1
        xxx2        2) Decode xxx2
        xxx3        3) Decode xxx3
        jmp @A      4) Change EIP to @A (don't store label)
        yyy1        5) Decode xxx7
        yyy2        6) Decode xxx8
    @B: xxx4        7) Decode xxx9
        xxx5        8) Change EIP to @B (don't store label)
        xxx6        9) Decode xxx4
        jmp @C     10) Decode xxx5
        yyy3       11) Decode xxx6
        yyy4       12) Change EIP to @C (don't store label)
    @A: xxx7       13) Decode xxx10
        xxx8       14) Decode xxx11
        xxx9       15) Decode JZ and store @D in FutureLabelTable
        jmp @B     16) Decode xxx12
    @D: xxx13      17) Decode RET, get @D from FutureLabelTable and
        xxx14        complete the JZ at pass 15 (@D = current EIP)
        RET        18) Decode xxx13
        yyy5       19) Decode xxx14
    @C: xxx10      20) Decode RET and get an item from FutureLabelTable.
        xxx11        Since it's empty, we have decoded everything, so finish.
        jz @D
        xxx12
        RET

  The result of the disassembly would be:

        xxx1
        xxx2
        xxx3
        xxx7
        xxx8
        xxx9
        xxx4
        xxx5
        xxx6
        xxx10
        xxx11
        jz @D
        xxx12
        RET
  @D:   xxx13
        xxx14
        RET

  I think it's more clear now. All garbage and hole-filling code (the yyy?
  instructions) are implicitly eliminated, so we haven't to look for them
  later.

  The problem here is that, once shrinked, the code skeleton (already
  depermutated here) can be use for detection. Or maybe not, but we want even
  the skeleton to be different from generation to generation (the spanish
  expression is "rizar el rizo", or "to loop the loop" in spanglish :). Then,
  we insert what I call the "3D instructions". What?? Ehrmm... Well, it's not
  taking Quake and playing, it's just that it pictures exactly what it
  happens: let's imagine the dimensions where you make metamorphism: the first
  dimension is the current code (what you are disassembling now). The second
  dimension is what you are going to code (the result of this metamorphosis).
  The third dimension is what the next generations will decode.

  So, first and second dimension are clear, but... what about third? Simple:
  just code some JMPs as CMP EAX,EAX/JZ @xxx, for example. Do you see the
  point? The shrinker (as we are going to see in the next part) must be able
  to compress the "CMP EAX,EAX/JZ @xxx" pair to "JMP @xxx", but it will
  happen in the next generation, not in this one (and there are some
  structures that will be compressed completely at more far generations). The
  only thing we have to take in  account is that we can't put garbage
  instructions after this type of jump, because they'll be decoded also. Let's
  see the code above (eliminating the yyy? this time) and substituting the
  first JMP by a CMP/JZ pair:

          xxx1       Result of the disassembly:
          xxx2          xxx1
          xxx3          xxx2
          CMP X,X       xxx3
          JZ @A         CMP X,X
      @B: xxx4          JZ @A    --> This will be compressed to JMP @A on this
          xxx5      @B: xxx4         generation and then eliminated implicitly
          xxx6          xxx5         in the disassembly of next generation.
          jmp @C        xxx6
      @A: xxx7          xxx10
          xxx8          xxx11
          xxx9          jz @D
          jmp @B        xxx12
      @D: xxx13         RET
          xxx14     @A: xxx7
          RET           xxx8
      @C: xxx10         xxx9
          xxx11         jmp @B
          jz @D     @D: xxx13
          xxx12         xxx14
          RET           RET

  This means ABSOLUTE metamorphism. The skeleton changes, but not the code
  algorithm. It's necessary to run some generations to reach the original
  code, but since other JMPs are mutated in the same way, you never reach it,
  and you need some generations to eliminate a permutation JMP, while other
  new ones are inserted. The good thing is that it's not infinite, and we
  reach a point where this code gets stable, but not in the first generations,
  where we'll see that our code grows due to these jumps.

  Other thing that we can do is to use this disassembler to decode some
  instructions to its easy form: INC Reg --> ADD Reg,1, for example. In this
  way, we only have to treat one type of instruction, not all the instruction
  variants in the processor that perform the same operation (although it can
  be performed by the shrinker, but in this way we save pseudo-opcodes).


  b) Shrinker/emulator (deobfuscator)

  At this point, are we going to emulate or compress? The emulation is more
  advanced and powerful, but implies an outstandingly complex coding, and it
  has many problems, like the control of the values after a loop. The easier
  way of doing things, and also the way that gives better ratios in a
  quality/quantity comparision is compressing by known pairs and triplets of
  instructions: it's just undoes what the expander part does.

  The planning must have this part. You have to make a table with all the
  possible single instructions, pairs/triplets of them, and decide what you
  are going to compress/expand and what not. The shrinker can be used to
  eliminate "intelligent garbage": do-nothing code that is inserted as if it
  were part of the algorithm. Now I show the possible single/pairs/triplets
  MetaPHOR can compress, and they're just the set of instructions that the
  expander can generate from a single one.

  Legend:
   Reg: A register
   Mem: A memory address
   Imm: Immediate

   When in an instruction is Reg,Reg or something like that, both are the
   same register. If they are different, I write it as Reg,Reg2 (for example).

   Transformations over single instructions:

     XOR Reg,-1               --> NOT Reg
     XOR Mem,-1               --> NOT Mem
     MOV Reg,Reg              --> NOP
     SUB Reg,Imm              --> ADD Reg,-Imm
     SUB Mem,Imm              --> ADD Mem,-Imm
     XOR Reg,0                --> MOV Reg,0
     XOR Mem,0                --> MOV Mem,0
     ADD Reg,0                --> NOP
     ADD Mem,0                --> NOP
     OR  Reg,0                --> NOP
     OR  Mem,0                --> NOP
     AND Reg,-1               --> NOP
     AND Mem,-1               --> NOP
     AND Reg,0                --> MOV Reg,0
     AND Mem,0                --> MOV Mem,0
     XOR Reg,Reg              --> MOV Reg,0
     SUB Reg,Reg              --> MOV Reg,0
     OR  Reg,Reg              --> CMP Reg,0
     AND Reg,Reg              --> CMP Reg,0
     TEST Reg,Reg             --> CMP Reg,0
     LEA Reg,[Imm]            --> MOV Reg,Imm
     LEA Reg,[Reg+Imm]        --> ADD Reg,Imm
     LEA Reg,[Reg2]           --> MOV Reg,Reg2
     LEA Reg,[Reg+Reg2]       --> ADD Reg,Reg2
     LEA Reg,[Reg2+Reg2+xxx]  --> LEA Reg,[2*Reg2+xxx]
     MOV Reg,Reg              --> NOP
     MOV Mem,Mem              --> NOP (result of a compression of
                                       PUSH Mem/POP Mem, with pseudoopcode 4F)

  The instructions that are eliminated (the ones that mean NOP) are be used
  as garbage along the executable code. Since every NOP instruction can be
  expanded (for example, MOV Reg,Reg can be set as PUSH Reg/POP Reg, and every
  PUSH and POP also can be expanded, and so on) you can't know what's garbage
  and what's not until you have compressed everything.

  The pairs of instructions that MetaPHOR can compress are:

     PUSH Imm / POP Reg                      --> MOV Reg,Imm
     PUSH Imm / POP Mem                      --> MOV Mem,Imm
     PUSH Reg / POP Reg2                     --> MOV Reg2,Reg
     PUSH Reg / POP Mem                      --> MOV Mem,Reg
     PUSH Mem / POP Reg                      --> MOV Reg,Mem
     PUSH Mem / POP Mem2                     --> MOV Mem2,Mem (codificated
                                                  with pseudoopcode 4F)
     MOV Mem,Reg/PUSH Mem                    --> PUSH Reg
     POP Mem / MOV Reg,Mem                   --> POP Reg
     POP Mem2 / MOV Mem,Mem2                 --> POP Mem
     MOV Mem,Reg / MOV Reg2,Mem              --> MOV Reg2,Reg
     MOV Mem,Imm / PUSH Mem                  --> PUSH Imm
     MOV Mem,Imm / OP Reg,Mem                --> OP Reg,Imm
     MOV Reg,Imm / ADD Reg,Reg2              --> LEA Reg,[Reg2+Imm]
     MOV Reg,Reg2 / ADD Reg,Imm              --> LEA Reg,[Reg2+Imm]
     MOV Reg,Reg2 / ADD Reg,Reg3             --> LEA Reg,[Reg2+Reg3]
     ADD Reg,Imm / ADD Reg,Reg2              --> LEA Reg,[Reg+Reg2+Imm]
     ADD Reg,Reg2 / ADD Reg,Imm              --> LEA Reg,[Reg+Reg2+Imm]
     OP Reg,Imm / OP Reg,Imm2                --> OP Reg,(Imm OP Imm2)
                                                  (must be calculated)
     OP Mem,Imm / OP Mem,Imm2                --> OP Mem,(Imm OP Imm2)
                                                  (must be calculated)
     LEA Reg,[Reg2+Imm] / ADD Reg,Reg3       --> LEA Reg,[Reg2+Reg3+Imm]
     LEA Reg,[(RegX+)Reg2+Imm] / ADD Reg,Reg2 -> LEA Reg,[(RegX+)2*Reg2+Imm]
     POP Mem / PUSH Mem                      --> NOP
     MOV Mem2,Mem / MOV Mem3,Mem2            --> MOV Mem3,Mem
     MOV Mem2,Mem / OP Reg,Mem2              --> OP Reg,Mem
     MOV Mem2,Mem / MOV Mem2,xxx             --> MOV Mem2,xxx
     MOV Mem,Reg / CALL Mem                  --> CALL Reg
     MOV Mem,Reg / JMP Mem                   --> JMP Reg
     MOV Mem2,Mem / CALL Mem2                --> CALL Mem
     MOV Mem2,Mem / JMP Mem2                 --> JMP Mem
     MOV Mem,Reg / MOV Mem2,Mem              --> MOV Mem2,Reg
     OP Reg,xxx / MOV Reg,yyy                --> MOV Reg,yyy
     Jcc @xxx / !Jcc @xxx                    --> JMP @xxx (this applies to
                                                  (Jcc & 0FEh) with (Jcc | 1)
     NOT Reg / NEG Reg                       --> ADD Reg,1
     NOT Reg / ADD Reg,1                     --> NEG Reg
     NOT Mem / NEG Mem                       --> ADD Mem,1
     NOT Mem / ADD Mem,1                     --> NEG Mem
     NEG Reg / NOT Reg                       --> ADD Reg,-1
     NEG Reg / ADD Reg,-1                    --> NOT Reg
     NEG Mem / NOT Mem                       --> ADD Mem,-1
     NEG Mem / ADD Mem,-1                    --> NOT Mem
     CMP X,Y / != Jcc (CMP without Jcc)      --> NOP
     TEST X,Y / != Jcc                       --> NOP
     POP Mem / JMP Mem                       --> RET
     PUSH Reg / RET                          --> JMP Reg
     CALL Mem / MOV Mem2,EAX                 --> CALL Mem / APICALL_STORE Mem2
     MOV Reg,Mem / CALL Reg                  --> CALL Mem
     XOR Reg,Reg / MOV Reg8,[Mem]            --> MOVZX Reg,byte ptr [Mem]
     MOV Reg,[Mem] / AND Reg,0FFh            --> MOVZX Reg,byte ptr [Mem] 


  Maybe there are more, but this set is sufficient, at least for our
  proposits. What we do know is scan the code for this situations and then we
  substitute the first instruction by their equivalent and we overwrite with
  NOP the second, so the instructions are compressed.

  But there are more: the triplets:

     MOV Mem,Reg
     OP Mem,Reg2
     MOV Reg,Mem                     --> OP Reg,Reg2

     MOV Mem,Reg
     OP Mem,Imm
     MOV Reg,Mem                     --> OP Reg,Imm

     MOV Mem,Imm
     OP Mem,Reg
     MOV Reg,Mem                     --> OP Reg,Imm (it can't be SUB)

     MOV Mem2,Mem
     OP Mem2,Reg
     MOV Mem,Mem2                    --> OP Mem,Reg 

     MOV Mem2,Mem
     OP Mem2,Imm
     MOV Mem,Mem2                    --> OP Mem,Imm

     CMP Reg,Reg
     JO/JB/JNZ/JA/JS/JNP/JL/JG @xxx
     != Jcc                          --> NOP

     CMP Reg,Reg
     JNO/JAE/JZ/JBE/JNS/JP/JGE/JLE @xxx
     != Jcc                          --> JMP @xxx

     MOV Mem,Imm
     CMP/TEST Reg,Mem
     Jcc @xxx                        --> CMP/TEST Reg,Imm
                                         Jcc @xxx
     MOV Mem,Reg
     SUB/CMP Mem,Reg2
     Jcc @xxx                        --> CMP Reg,Reg2
                                         Jcc @xxx
     MOV Mem,Reg
     AND/TEST Mem,Reg2
     Jcc @xxx                        --> TEST Reg,Reg2
                                         Jcc @xxx
     MOV Mem,Reg
     SUB/CMP Mem,Imm
     Jcc @xxx                        --> CMP Reg,Imm
                                         Jcc @xxx
     MOV Mem,Reg
     AND/TEST Mem,Imm
     Jcc @xxx                        --> TEST Reg,Imm
                                         Jcc @xxx
     MOV Mem2,Mem
     CMP/TEST Reg,Mem2
     Jcc @xxx                        --> CMP/TEST Reg,Mem
                                         Jcc @xxx
     MOV Mem2,Mem
     AND/TEST Mem2,Reg
     Jcc @xxx                        --> TEST Mem,Reg
                                         Jcc @xxx
     MOV Mem2,Mem
     SUB/CMP Mem2,Reg
     Jcc @xxx                        --> CMP Mem,Reg
                                         Jcc @xxx
     MOV Mem2,Mem
     AND/TEST Mem2,Imm
     Jcc @xxx                        --> TEST Mem,Imm
                                         Jcc @xxx
     MOV Mem2,Mem
     SUB/CMP Mem2,Imm
     Jcc @xxx                        --> CMP Mem,Imm
                                         Jcc @xxx
     PUSH EAX
     PUSH ECX
     PUSH EDX                        --> APICALL_BEGIN

     POP EDX
     POP ECX
     POP EAX                         --> APICALL_END


  And again, maybe there are more cases, but these are the ones I use. The
  mechanism we follow is the same as with pairs: we check if the three
  instructions under our pointer construct a defined triplet, and then we
  compress it overwriting the two last instructions with two NOPs.

  Once we have defined the singles, pairs and triplets of instructions, we are
  going to see the algorithm of compression, since we cannot apply this
  directly if we expanded them recursively (this means, maybe we coded
  PUSH Imm/POP Reg, but PUSH Imm can be coded as MOV Mem,Imm/PUSH Mem, so the
  instructions would be "MOV Mem,Imm/PUSH Mem/POP Reg", and the POP Reg could
  be expanded also. The algorithm of compression is:

          CurrentPointer = FirstInstruction
       @@Loop:
          if ([CurrentPointer] == MATCHING_SINGLE)
          {
             Convert it
             if (CurrentPointer != FirstInstruction) call DecreasePointer
             if (CurrentPointer != FirstInstruction) call DecreasePointer
             if (CurrentPointer != FirstInstruction) call DecreasePointer
             goto @@Loop
          }
          if ([CurrentPointer] == MATCHING_PAIR)
          {
             Convert it
             if (CurrentPointer != FirstInstruction) call DecreasePointer
             if (CurrentPointer != FirstInstruction) call DecreasePointer
             if (CurrentPointer != FirstInstruction) call DecreasePointer
             goto @@Loop
          }
          if([CurrentPointer] == MATCHING_TRIPLET)
          {
             Convert it
             if (CurrentPointer != FirstInstruction) call DecreasePointer
             if (CurrentPointer != FirstInstruction) call DecreasePointer
             if (CurrentPointer != FirstInstruction) call DecreasePointer
             goto @@Loop
          }
          do (CurrentPointer++) while [CurrentPointer] == NOP
          if(CurrentPointer != LastInstruction) goto @@Loop

      DecreasePointer:
          do (CurrentPointer--) while (([CurrentPointer] == NOP) &&
                                       ([CurrentPointer.Label == FALSE))
          return

  We don't need to pad with NOPs the last instruction to avoid that garbage
  from a previous disassembly generate something undesired, since no
  compression is defined that takes a RET, JMP or an instruction like that as
  the first instruction of a pair/triplet, and we are SURE - capitalized -
  that the code finishes with one of this instruction. Also take in account
  that we ignore NOPs always!

  Also take care of labelled instructions: we MUST NOT compress instructions
  if one of them (apart from the first one) has a label pointing to it (that's
  the reason of having a LABELLED flag embedded into the instruction structure
  that we defined while planning the engine). A label over it means that a
  jump, call, etc. points to the instruction, and if we merge it with the
  one above we are corrupting the code.

  The memory addresses that we eliminate with the compression are just
  temporary variables that we reserved in reassembly to hold the values to
  perform that operation. So, we must be sure that our true variables (the
  memory addresses where we store important things) must not match any
  pair or triplet, because if it does it will be eliminated and the engine
  won't work anymore. With a little care, we don't worry about that memory
  variables and we don't have to check if they are important variables or
  only transition ones, which allows us to redefine freely the position of
  the variables along the code and mix them with the temporary ones.


  An example of what the shrinker does:

  Original code:

  MOV [Var1], ESI  * PUSH ESI       * MOV EAX,ESI
  PUSH [Var1]      * nop              nop
  POP EAX            POP EAX        * nop
  PUSH EBX           PUSH EBX         PUSH EBX        =====>
  POP [Var2]         POP [Var2]       POP [Var2]
  ADD EAX,[Var2]     ADD EAX,[Var2]   ADD EAX,[Var2]




             MOV EAX,ESI        MOV EAX,ESI   * LEA EAX,[ESI+EBX]
             nop                nop             nop
             nop                nop             nop
  ====>    * MOV [Var2],EBX   * ADD EAX,EBX   * nop
           * nop                nop             nop
             ADD EAX,[Var2]   * nop             nop



  After passing the algorithm above to the pseudo-assembler, we obtain a
  code with lots of NOPs along the code, but it doesn't matter because, on
  expansion, we'll ignore this NOPs, performing a true optimization (although
  that achieved optimization is broken by the expander, but who cares...).


 c) Permutator

  Using an internal assembler allows to perform this pass in an easy way, and
  we haven't to keep all the instructions of the same size and things like
  that, because the reassembler will calculate that for us.

  The easiest way of doing a permutation over the code is to define "code
  frames": we construct a table where we define portions of code, giving an
  initial and a final offset, in this way:

    ESI = Initial address of instructions
    EDI = Address of last instruction

    Given ESI = 00000000h,
          EDI = 00000060h

    while(ESI < EDI)
       Store ESI
       ESI += Random(8)+8
       Store ESI
       if((ESI+0F > EDI)
          Store ESI,EDI
          break;
       end if
    end while
       
    Result (for example):
          DD    00000000h,0000000Ah
          DD    0000000Ah,00000017h
          DD    00000017h,00000023h
          DD    00000023h,00000032h
          DD    00000032h,0000003Dh
          DD    0000003Dh,00000049h
          DD    00000049h,00000052h
          DD    00000052h,00000060h

  Now we make a shuffle of the elements of this array. Shuffling is easy
  (there are lots of algorithms for doing it). After doing it, we obtain:

          DD    00000032h,0000003Dh
          DD    00000023h,00000032h
          DD    0000000Ah,00000017h
          DD    00000000h,0000000Ah
          DD    00000017h,00000023h
          DD    00000052h,00000060h
          DD    0000003Dh,00000049h
          DD    00000049h,00000052h
          
  While doing it, we keep the track of the first frame, which is the
  entrypoint of the code. If the first frame is not the entrypoint, we insert
  a JMP to that frame. Since the destiny of the linking JMPs is not known,
  we store them into a table for, after copying all the instructions, complete
  them. So, the first action we make is to insert a JMP for the entrypoint,
  and then we begin to copy the instructions. The first frame tell us that
  we must copy from instruction 32h to instruction 3Dh, and then insert a
  JMP to the next frame, and so on.

  After performing this, we'll get a permutated code. At this point we only
  have to finish the JMPs we have stored for later fixing, and all is done.

  Also try different things here: for example, if you leave the NOPs that
  the shrinker generated it will derive in a more random code distribution,
  since the NOPs will be included also in the permutation, but eliminated
  in the next code processings.


 d) Expander (obfuscator)

  The expander is the part that undoes what the shrinker did. It performs
  *exactly* the reverse operation (in a random way, of course). Just take
  the defined singles, pairs and triplets and code an alternative for every
  instruction an alternative (apart from their direct codification).

  For this, we are going to code it all recursive. For example, when we find
  a 50h opcode (PUSH Reg) we'll call MakePUSHReg(), which will decide randomly
  if we code directly the PUSH or we make a workaround using a defined pair
  or triplet. So, the function decides to make it as MOV Mem,Reg / PUSH Mem,
  so it calls MakeMOVMemReg() and MakePUSHMem(). But, what a casuallity! The
  function MakeMOVMemReg() can be called with MakePUSHReg()+MakePOPMem(),
  which again calls MakePUSHReg(). This can make the code grow too much, so
  we put a recursivity control, i.e. a variable that increments when a
  recursive-procedure-using function is called and decreased when it leaves.
  We check that variable to see if it has increased to a certain number, and
  if we reach it we code the instruction directly (in this case, 50h+Reg and
  EIP++).

  The expander is cooler if we use our internal assembly language also, so
  we haven't to deal with the final codification. That work will be done by
  the reassembler (next section). To be sure that what's resulting on the
  code expansion, we are going to generate the pseudo-assembler as similar
  to the final code as we can (that's why I have opcodes such 4E and 4F to
  code INC and DEC in the list of opcodes). So, if we find CMP EAX,0 on the
  expander, we use it to code OR EAX,EAX or TEST EAX,EAX in pseudoassembler,
  so the reassembler only has to code the instruction directly, without
  having to decide anything more than some randomness on opcode generation.

  The expander should make some more operations, such as:

   * Register translation

     We select a new register to translate the register that we were using
    until now. The easier way is to put the sequence 0,1,2,3,5,6,7 in a
    list and shuffle it, and then translate every register by that number.
    In this way the registers are never the same when performing operations,
    and even on memory access, since if we use ESI->EBP translation (for
    example) the way these memory references are assembled to x86 are
    different (or can be different). It's something that we could see in
    Vecna's Regswap.

   * Variable re-selection

     To do memory variables we have to own a memory buffer (reserved with
    VirtualAlloc, .bss section in the host itself, etc.). If you have the
    variables we use to store important things at a fixed location, forget
    it! Put them in a buffer where all them are together. In this way, we
    only have to detect that a memory address is a variable (looking if it's
    a memory address of the form [DeltaRegister+12345678h]), store it in
    a buffer like we did with the code labels (but this time with memory
    addresses), shuffle them and reassign an address for every variable. Doing
    this we don't use fixed variables anymore. The only inconvenient is that
    we must supply some values from other places, such from the decryptor (if
    we use one).

  Also I'll take a case for LEAs: better if we use them only for recoding
  MOV Reg,Value, ADD Reg,Value, MOV Reg,Reg and ADD Reg,Reg. Why? Because
  if you look a LEA closer, means that it arrived here because it's a shrink
  of some more instructions, like MOV EAX,EBX / ADD EAX,12345678h. So, if
  we avoid the coding of the LEA and we disgregate it in simpler operations
  (randomly), we have an embedded swapper module. An example:

     MOV EAX,ECX                                              
     ADD EAX,3   -> (shrink) -> LEA EAX,[ECX+3] -> (expand) -> MOV EAX,3
                                                               ADD EAX,ECX

                                                   (expand) -> MOV EAX,ECX
                                                               ADD EAX,3

  The thing is that if we did the effort of coding a swapper in a part of
  the engine, we'll see that these cases are contemplated, so we coded things
  that are redundant in the final result. Anyway, a swap can be performed
  on expansion, but take care, because many times a swap between two elements
  like the next ones without controlling EVERYTHING can corrupt the code:


     MOV EAX,1234
     MOV EBX,2345  <-- check if a label is pointing to this!

     MOV EAX,1234
     MOV EBX,[EAX] <-- check if the second instruction uses the elements of
                       the first instruction

     MOV EAX,[EBX]
     MOV EBX,1234  <-- check if the first instruction uses the elements of
                       the second instruction

     MOV EAX,[EBX]
     MOV [ECX],EDX <-- EBX and ECX has the same value? We can't know this
                       without total emulation

  And MANY MANY more. My experience is that there are so many factors that
  the code can crash even with the most little, innocent change that the
  swapper makes although it seems to be perfectly valid. The only ones we
  can mutate with security are the ones that are inserted in LEAs and unrolled
  from them later. Moreover, we avoid the coding of another huge routine :).



 e) Reassembler

  The reassembler is the end edge of the engine. This piece of code will
  generate the instructions the processor can understand. If we followed the
  philosophy of expanding the code in pseudo-assembler, this is a piece of
  cake, because we have done this plenty of times when we coded polymorphic
  engines. Moreover, the vision of seeing your engine finished animates you
  a lot, believe me :).

  The reassembler is a kind module that likes to be helped from other ones,
  such from the expander. We can code it to take literally the pseudoassembler
  and write what it's expressed there, with no care about what it means.

  But while we code it, we see that there are some things that aren't as easy
  as they seem, for example the EIP displacement instructions. How can we code
  foward JMPs, CALLs and Jccs? We must use (again) a table, and store here
  all the instructions we must fix when the pseudocode is all assembled.

  But a real pain can be the foward short JMPs/Jccs. We can do it without it,
  but it's not fair ;). My solution (very neat) is to look if the instruction
  points to a maximum length of 11 or 12 instructions foward, so if it's
  below that mark we can decide if we code it short or long (randomly, ofcoz).
  Where we haven't any problem is with backwards jumps, because we always know
  the length. CALLs doesn't need to be decided, but we must resolve them also
  when they are fowards, because the code isn't assembled yet.

  The randomness in this part of the engine is made in the jumps (when we can
  decide if we use short or long ones) and in the opcode reassembly when we
  have several possibilities for the same opcode. The list that follows shows
  some instructions that can be done in this way:

  B0+Reg
  C6 C0+Reg   --> MOV Reg8,Value

  B8+Reg
  C7 C0+Reg   --> MOV Reg,Value

  50+Reg
  FF F0+Reg   --> PUSH Reg

  58+Reg
  8F C0+Reg   --> POP Reg

  40+Reg
  FF C0+Reg   --> INC Reg

  48+Reg
  FF C8+Reg   --> DEC Reg

  This is an example. Other ones can be the instructions that can uses EAX
  (both in EAX-exclusive opcode or using the generic opcode), the opcodes
  that takes a sign-extended byte-to-dword operand (opcode 83, and using
  -80 > value < 7F), and much more. All them are only random at opcode level,
  because all the alternate ways of doing things (the complex movements, and
  the like) are made with the expander.


  Hey, it's done! We have a reassembled code! Now it comes the hard part:
  DEBUGGING.


3) Known problems (and solutions)
---------------------------------

 a) Debugging your engine

  Debugging metamorphic code is a hell, and if you never have coded such an
  engine, you'll notice it. It's fine while you debug the code at first
  generation because you can see what you did, but the problem comes when
  you must debug the generated code. The assembler is soooo obfuscated that
  you maybe get crazy, so the solution is: debug from the beginning! Don't
  code the shrinker/expander until all the other parts work perfectly!

  Other thing you must consider is that the debugger uses INT 3 to patch the
  return from CALLs. If you code the disassembler in a separated function,
  take care when tracing/stepping over, because:

  MOV EDI,[VirusEntry]
  CALL Disassembly
  MOV EAX,12345678  --> The disassembler will see:  MOV EDI,[VirusEntry]
                                                    CALL Disassembly
                                                    INT 3
                                                    JS @xxx
                                                    ...

  You can see that the debugger corrupts the code. This is good, because the
  code is implicitly anti-debugging :), but it's bad because if you don't
  notice it you believe you're wrong in some point and lose lots of time with
  a checking you didn't need. The solution is:

      1) Enter the call and trace all the debugging (unacceptable)
       or
      2) Use hardware breakpoints (OOOOOH! clap clap clap clap)

  So, take care of where you press the "Run to here" key, since it will be OK
  if you use it after you have runned the disassembler, but it will corrupt
  the code if you use it before the disassembly and you point to code that
  will execute after it. Using hardware breakpoints on "fetch instruction"
  will eliminate the problem, but it's more annoying (more keys to press,
  etc.). Make it in a macro.


 b) API calls

  API calls are a real pain in the ass to detect. Since all the calls haven't
  the same number of parameters, we don't know at what point the parameters
  start (but we know that the API ends with the CALL, ah, now we can rest in
  peace :P). This type of calls are delicate because we can't change the
  registers around them as we want, since the return value is given always at
  EAX, and they always modify ECX and EDX. So, in a normal code, we happily
  set values in registers without using these three, but we can't make it when
  we are translating registers "a go-go" (or we can play only with registers
  > EDX, but then we limit the possibilities). We also have to detect the
  moment when EAX is used to check the return value, and lots of things that
  comes from the fixed use of these registers, things that collide with our
  engine.

  The easy solution is to use a "code mark": they're a sequence of certain
  instructions with certain operands that doesn't repeat along the code unless
  the thing they are marking is present. The ones I use are the instructions
  called APICALL_BEGIN, APICALL_END and APICALL_STORE.

  APICALL_BEGIN is, simply PUSH EAX/PUSH ECX/PUSH EDX. It's detected by the
  shrinker, since every instruction of the structure can be expanded as well
  as with other instructions. The shrinker will detect this sequence of
  instructions and will change them by the pseudoopcode F4, which is a mark
  for the expander to code PUSH EAX/PUSH ECX/PUSH EDX. The registers must be
  always EAX, ECX and EDX. In this way we assure the saving of these ones
  before an API call.

  In the same way, APICALL_END is POP EDX/POP ECX/POP EAX. This is also
  translated by the shrinker, giving a pseudoopcode F5, which is also a mark
  for the expander to code POP EDX/POP ECX/POP EAX. This signalizes the end
  of an API call and then the restoration of what it was saved. So, for the
  registers values it's like if the API call was never performed.

  APICALL_STORE is another pseudoinstruction. The shrinker will detect it when
  it gets a MOV [Mem],EAX just after the CALL of the API. This is done to
  avoid the translating of EAX, so this instruction will always be codificated
  as MOV [Mem],EAX, regardless the register EAX must be translated to. The
  memory address is the variable where the return value is stored, and can be
  retrieved after doing the APICALL_END.

  The next example illustrates the use of that technique:

  PUSH EAX
  PUSH ECX
  PUSH EDX            -------------------> APICALL_BEGIN
  MOV EAX,[EBP+AddressOfNewDirectory]
  PUSH EAX
  CALL DWORD PTR [EBP+RVA_SetCurrentDirectoryA]
  MOV [EBP+ReturnValue],EAX -------------> APICALL_STORE [ReturnValue]
  POP EDX
  POP ECX
  POP EAX             -------------------> APICALL_END
  MOV EAX,[EBP+ReturnValue]
  CMP EAX,EDX          ; Get the return value and check it
  JZ @X
  ...

  Now change the registers:

  PUSH EAX
  PUSH ECX
  PUSH EDX            -------------------> APICALL_BEGIN
  MOV ESI,[EBX+AddressOfNewDirectory]
  PUSH ESI
  CALL DWORD PTR [EBX+SetCurrentDirectoryA]
  MOV [EBX+ReturnValue],EAX   -----------> APICALL_STORE [ReturnValue]
  POP EDX
  POP ECX
  POP EAX             -------------------> APICALL_END
  MOV ESI,[EBX+ReturnValue]
  CMP ESI,ECX          ; Get the return value and check it
  JZ @X
  ...


  For obvious reasons, you must not use EAX, ECX or EDX as the Delta register
  in the recodification of the virus because, if you use it, the call to the
  API function will overwrite it and the return value will be wrote anywhere,
  with a 99% of throwing an exception. We can use ths fact (Delta is not EAX,
  ECX or EDX) to code the expansion of CALL DWORD PTR [Mem], which we planned
  to be MOV Reg,Mem/CALL Reg: we can use EAX, ECX or EDX for Reg perfectly
  and without saving anything, because the register values are going to be
  destroyed by the API call.

  For Linux (for the ones that want to make metamorphism under this system)
  the API call is more complicated, since we pass the parameters in the
  registers (EAX, EBX, ECX, etc.). Well, we can define a structure in this
  way:

  MOV [EBP+Parameter1], XXX
  MOV [EBP+Parameter2], YYY
  MOV [EBP+Parameter3], ZZZ
  PUSH EAX
  PUSH EBX
  PUSH ECX
  PUSH EDX
  PUSH EBP       --------------> LINUX_SYSCALL_BEGIN DeltaReg
  MOV EAX,[EBP+Parameter1]
  MOV EBX,[EBP+Parameter2]
  MOV ECX,[EBP+Parameter3]
  MOV EDX,[EBP+Parameter4]  ---> LINUX_SYSCALL_LOADPARAM
  INT 80h
  POP EBP
  MOV [EBP+ReturnValue],EAX  --> LINUX_SYSCALL_STORE DeltaReg,[ReturnValue]
  POP EDX
  POP ECX
  POP EBX
  POP EAX        --------------> LINUX_SYSCALL_END


  It's larger, but I'm not God :).

  There's something that also applies to this, although it's not an API call:
  the SET_WEIGHT pseudoinstruction. The instruction corresponds to the next
  code structure:

  PUSH Reg1
  MOV Reg1,WEIGHT_IDENT
  MOV Reg2,xxyyzztt
  MOV [Mem],Reg2
  POP Reg1

  This is compressed to SET_WEIGHT [Mem],IDENT,Reg1,Reg2. This instruction is
  used to transmit the data here from one generation to another. We can only
  transmit this because we obtained the values before making the total reasm,
  so we can't use it for substituting parameters passed by stack.

  The weights expressed here are part of a little genetic algorithm that makes
  the virus to be "adaptative", since the "natural selection" will make that
  the survivors have the best options to be in the wild (type of infection,
  structure of the decryptor, infection rate. etc.). The algorithm is not very
  advanced, but it does its work.

 c) Memory

  Yes, metamorphism by-the-hard-way requires lots of memory to store tables,
  the disassemblies, the temporal code changes, execution path marks, the
  local variables, and lots of etc. etc. etc. The using of ESP is a dirty
  trick that can be useful to store not very big things, but all changes when
  you realize that you need about 3 or 4 Mb to do a decent job (just see
  z0mbie's Mistfall: reserves 32Mb!). The only solution is to use VirtualAlloc
  and reserve the quantity you need.

  Since we reserve memory, nothing stop us of copying the code to memory, as
  if we were dinamically generated code. Moreover, for an antivirus emulator
  it's a copy, not a decryption (if you decrypt while copying). For this, we
  have to code a polymorphic engine to generate at least a "copier" (just if
  you want to leave the code unencrypted).

  The copy to memory allow us to make more nice things: we can copy our code
  anywhere inside that memory frame, and have our memory variables at the
  offset we want. Since we use a Delta register, we don't care where the code
  is! The problem to this is to get our internal offset (just notice that the
  memory address where we are is dinamically got by the operating system).
  We can recode the addresses of all buffers and tables while reassembling,
  and then even the "shape" of the final code will be different! The problem
  here is that we must supply the address where the code is stored, the
  data division inside the reserved virtual allocated memory, etc., so we
  must supply them from the unique thing we must vary as we want: the
  decryptor/copier, making, for example, pushes of the values and then poping
  them once in the engine (look the source code of MetaPHOR).



4) Future

 This section is optional. It expresses ideas that I'm thinking to increase
 the functionality of the virus MetaPHOR, and that maybe are useful for you
 to implement.

 a) Plug-ins

  That's a thing one day I'll implement in my virus for sure (I hope!). I
  realized that with this type of metamorphism the plug-ins are very easy to
  implement (although "easy" doesn't mean "short and quick-coding"). The
  plug-ins can be implemented with "code marks", in the same manner we
  signalize an API call. What we do is to make an information header
  (variables it uses, etc.) and a routine made in pseudoassembler (our
  assembler). A code mark that marks a plug-in can be
  "PUSH Reg/MOV Reg,12345678/POP Reg" (a do-nothing code that we don't
  eliminate with the shrinker). The number passed in that register is, in
  fact, a plug-in identificator and a version number, (using hi-word and
  lo-word), which allows us to compare the version of the plug-in currently
  installed with the version of a new plug-in, and to determine if the plug-in
  must substitute and old one or must be added to the execution. The mark
  would be repeated twice (one for begin and once for end), with the value in
  the last one with the high-bit to 1, so we can substitute the current
  plug-in by the new one. The shrinker must detect this structure and set two
  new instructions called "PLUGIN_BEGIN Version" and "PLUGIN_END Version".
  When the shrinker has performed it's job, and just after the permutator,
  the plug-in injector comes to work: it searches for new plug-ins anywhere,
  decrypts them and check their signature (like Vecna does in his Hybris).
  The plug-ins are not needing to be encrypted, and are also nearly impossible
  to modify or put one that is not yours if you use a public-private key
  signature system.

  Depending on the type of plug-in and the version, we can substitute directly
  the plug-in entry (if type coincides and the version is greater) overwriting
  the CALL to the last plug-in. This is a 3D-coding technique that I mentioned
  nearly at the beginning of the article: the code of the old plug-in remains
  and it's assembled forming part of the new virus, but when in the next
  generation the virus get disassembled, that part won't be reached by the
  disassembler part, so the olg plug-in is eliminated (and meanwhile the
  code gets garbled and mixed with code that does absolutely nothing :).

  All the whole virus can be made of plug-ins, or at least have plug-in
  identificators at all the routines, even at the plug-in downloader. In that
  case, if one day I make support for Linux (I think it'll be before the
  plug-ins get finished :) I only have to code the plug-in and put it where
  it corresponds.

  Also we should have to have a field to specify the zone where we want to put
  a new plug-in (just in case we want to insert a plug-in just after the
  permutator but before the expander, to make something new, or just if we
  forgot to code a function and we want to put it now). It'll be on the
  plug-in identificator, and it's simple: just make a unique identifier for
  each plug-in in the original virus that be quite separated from the one
  before, for example: Disassembler: 10000001, Shrinker 20000001, etc. (all
  them with the identificator and the version). If we want a new function
  between the disassembler and the shrinker, we use the identificator
  18000001, and so on.

  Let's see if one day I do it :).


 b) Multi-platform cross-infector

  That's something that I didn't do on the release 1.0 of MetaPHOR because
  I was short of time to finish it for 29A#6, but it's quite easy (in fact,
  we have seen the proof of concept in Benny's Winux). We only have to call
  one API function or another, depending on the system we are running, and
  have a function to attach our mutated code to a PE or to an ELF. The APIs
  can be used in the same way (for example, I code a single function called
  MapFile that returns me a file mapping for Windows using CreateFileMapping
  and MapViewOfFile, and for Linux using the pertinent int 80h function).


 c) Reassembling to different processors

  That's something that I don't know if I'm going to code it before plug-ins
  (in which case I'll program a plug-in rather than a new whole virus).

  If you have realized, the pseudoassembler is quite general, so it can be
  adapted to any processor we want! Only if they use a 32-bits architecture,
  of course. The 64-bits architecture requires a change in all the management
  of the internal pseudoassembler, but not very complicated (only I have to
  extend the instruction length from 16 bytes to 32, which gives me space for
  the QWORDs and some new fields that won't remain in blank).

  The fact is that I thought about it and can be done for sure, and the first
  pass would be use Alpha assembler and the WinNT under Alpha processors. Why?
  Because I don't have to change NOTHING, only the Disassembler and the
  Assembler I use, since the infection methods and the algorithm would be
  exactly the same! (since we use ring-3). The shrinker, permutator, expander,
  etc. would remain untouched.

  The only thing we have to do is redefine some instructions and expansions
  under this assembler, since the TEST instruction, for example, doesn't
  exist in all the processors (I think it doesn't exist in Itaniums).

  The very first thing to do is to code a new disassembler (but keeping the
  current). So, we only have to see if our code is Alpha or x86 to call to
  one routine or other. After this, since we are using only our internal
  assembler, we are in a common part (in algorithm, but not in codification)
  until the reassembler, which will be selected (reassembler for Alpha or
  for x86) depending on the target processor specified in the PE header.
  The different processors we can use are quite alot (and then also means
  a lot of work!): x86, Alpha, Itanium, 680x0, PowerPC, PA-RISC, etc. (all
  the platforms that can run WinNT or Unix/Linux, or even MacOS).

  The great thing is that the code is completely changed: it's not a virus
  with several parts, one for every processor, as it was MrSandman's
  Esperanto (with respect, Sandy ;), it was another type of code). The whole
  virus will reassemble to the new processor's assembly language, so an
  antivirus that wants to intercept the virus for x86 processors have to deal
  with a possible conversion to Alpha assembler and then, once in an Alpha,
  reconvert to x86, jumping over the antivirus completely (for example, in
  an infected server). And since it's metamorphic, they can't apply strings
  to detect it. Yummy! :)



5) Conclusion

  Metamorphism: the strongest viric technique ever ideated, ever created.

  Any more to say?


---------------------------------
* The Mental Driller / 29A,

  written for 29A#6 on 02/02/2002
---------------------------------