Back To The Basics
by SPo0ky

I wrote this tutorial because some beginners who read the first two
editions of our magazine told us that they have problems understanding the
basics of assembly. In this tutorial I won't try to teach you any virus
techniques. I'll only try to explain the most basic things of assembly,
like how TASM/TLINK work, registers, memory and interrupts, as simply as
possible. Maybe this sounds boring to you, but if you want to code your own
viruses you have to understand the basics.

Also, this article will not fully teach you assembly! It will only help you
to understand the basics. To really understand assembly you will need at
least a few weeks of good training (programming). I also suggest that you
buy a good book about Assembler and read many, many, many virus source
codes (that's how I learned Assembler).

(Also: if you don't have a bookstore in your town you can use the online
bookstore at http://intertain.com, they have many great books, and they are
fast and cheap!)

1. Why Assembler?

There are pros and cons to using Assembler. Unlike in an HLL (High Level
Language) such as Pascal, Basic or C++, in Assembler you have to tell the
CPU each step it has to execute, which means that writing a big (complex)
Assembler program is very time consuming. That's why big programs are
usually not written completely in Assembler; instead, Assembler parts are
included in HLL programs.

Another con of Assembler is that a program cannot be used on CPU families
other than the one it was written for, because each CPU family has its own
instruction set (we will use the 80x86 instruction set, which is used in
the IBM PC and compatibles). On the other hand, this gives you the
possibility to optimize the code for one specific CPU so that you can use
all of its capabilities. The result is extremely FAST and SMALL code.

2. A simple assembly program

Let's start with a short Assembler program.
You don't have to understand what each instruction does yet; I'll explain
that later. Just type this program into an ASCII editor (e.g. edit.com) and
save it as example.asm.

        .model tiny
        .code
        org 100h
start:
        mov ah,9h
        mov dx,offset message
        int 21h
        mov ax,4c00h
        int 21h
message db 'CodeBreakers Rule! ;-)',10,13,'$'
end start

3. Assembler and Linker (TASM and TLINK)

Before I continue I'll show you how to use TASM and TLINK to turn such a
file into an EXE or COM program. After you have saved the above program
into a file (example.asm) you can type

        TASM EXAMPLE.ASM

This will generate a so-called OBJECT FILE named EXAMPLE.OBJ. Generally
this file contains only information for the linker and a translation of
the above code into binary (machine code).

Example: MOV AH,9h would be translated into 1011 0100 0000 1001
         (B4 09 in hex).

This object file is not executable yet. To make an executable file (COM or
EXE) we have to use a LINKER (TLINK). The linker sticks one or more object
files together into one executable file. To link the example.obj file you
just type:

        TLINK /t EXAMPLE.OBJ

The /t switch tells the linker that it should produce a COM file; if you
leave /t out you will get an EXE file. Anyway, now you should have a ready
to run COM file... Just type EXAMPLE to start it!

4. Registers

Registers are extremely fast memory cells inside the CPU. They are used to
address memory, to hand values to the CPU's instructions, ... generally
they are used to store "values". All registers of the 8086 can store 16
bits (= 2 bytes) of data, and some registers can be split into two 8 bit
(= 1 byte) registers.
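To make the idea of "splitting" a register a bit more concrete, here is a small sketch in the same TASM style as the example program. It uses AX, one of the registers that has two byte halves (AH is the upper byte, AL the lower one). The values are arbitrary and only chosen for illustration, and the program prints nothing; it just loads registers and exits.

```asm
        .model tiny
        .code
        org 100h
start:
        mov ax,1234h        ; AX = 1234h, so AH = 12h and AL = 34h
        mov al,0FFh         ; changes only the low byte:  AX = 12FFh
        mov ah,0            ; changes only the high byte: AX = 00FFh
        mov bx,ax           ; copies the whole word:      BX = 00FFh
        mov ax,4c00h        ; DOS function 4Ch: exit, return code 0
        int 21h
end start
```

If you want to watch the registers change, you could step through the COM file in a debugger such as DEBUG.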
Registers of the 8086 CPU:

+---------+---------+
|   AH    |   AL    |  = AX -> Accumulator Register
|   BH    |   BL    |  = BX -> Base Register
|   CH    |   CL    |  = CX -> Count Register
|   DH    |   DL    |  = DX -> Data Register
+---------+---------+
|        SI         |  -> Source Index
|        DI         |  -> Destination Index
+-------------------+
|        BP         |  -> Base Pointer
|        SP         |  -> Stack Pointer
+-------------------+
|        CS         |  -> Code Segment
|        DS         |  -> Data Segment
|        ES         |  -> Extra Segment
|        SS         |  -> Stack Segment
+-------------------+
|        IP         |  -> Instruction Pointer
+-------------------+
|        F          |  -> Flag Register
+-------------------+

Only the registers AX, BX, CX and DX can be split into two parts: AH, AL,
BH, BL, CH, CL, DH, DL. Each of them has only 8 bits instead of 16!

BTW - 8 bits are called a BYTE
      16 bits are called a WORD

AH, AL, BH, BL, and so on, are called BYTE registers and all others (AX,
BX, CX, DX,...) are called WORD registers.

5. MOV(e)

Assembler (or rather the CPU) provides many instructions which allow you to
manipulate (to change) the data stored in a register. One of the most
important of them is MOV. Look back at our example program... we used MOV
3 times (MOV AH,9 / MOV DX,OFFSET MESSAGE / MOV AX,4C00H).

You can change the data in a register by using:

        MOV <register>, <value>

where <register> is the 16 or 8 bit register you want to change, and
<value> is the data you want to store in the register.

But MOV can't only be used to change the data in registers, it can also be
used to change data stored at a certain location in MEMORY.

6. MEMORY

When you execute our little example program it just displays text... now,
how can the computer know WHICH text it should display? Again, look at the
example program's source code:

        MOV AH, 9
        MOV DX, OFFSET MESSAGE
        INT 21H
        .
        .
MESSAGE DB 'Some text...$'

The first line says WHAT should be done: function 9 in AH is the DOS
function used to display text. The second line says WHERE in memory the
text can be found.
Each byte in memory gets a number (an address), so the CPU knows exactly
which byte it has to read/write. In the second line, "OFFSET MESSAGE"
returns the address where MESSAGE is stored in memory, and MOV stores it in
DX (to display text, MS-DOS requires that the address of the text is stored
in the register DX!).

Some examples:

Let's say this is our memory:

Offset:   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
Data:   | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P |

We want to get the data which is stored at offset 6 into register AH.
Which instruction would be used?

-> MOV AH, [6]

This puts the data at offset 6 (= 'F') into register AH. The '[' and ']'
are very important. If you forget them, the number 6 is put into AH instead
of 'F'! (Depending on the assembler and its mode, a plain constant in
brackets may be taken as a number anyway; writing the segment as well, as
in MOV AH, DS:[6], makes clear that you mean a memory location.)

Remember, AH is an 8 bit register (-> it can store only 1 byte or 8 bits).
What would happen if we used AX (which is a 16 bit register) instead of AH?

-> MOV AX, [6]

AX would become 'GF'. Yes, not 'FG'! On the x86, a word is stored in memory
with its low byte first: the byte at offset 6 ('F') goes into AL and the
byte at offset 7 ('G') goes into AH, so everything you read from memory
into a word register looks turned around. (That's not that important for
you yet, but you should know it anyway...)

7. Interrupts

It took me a very long time to find a (hopefully) good way to explain
interrupts to a newbie! Finally I decided to use a simple example, the
MS-DOS prompt. When you are at the MS-DOS prompt you can enter commands;
after you press RETURN the command gets executed. You could compare the
pressing of the RETURN key with an interrupt.

In assembler you fill the registers with values, then you execute the
interrupt. The interrupt code then evaluates the values you put into the
registers, decides which function it should execute, ... and finally
returns the results (in a register, on the screen, on your hdd,...).

Ex.:    MOV AH,9
        MOV DX,OFFSET MESSAGE
        INT 21H

The first two lines of this example have already been explained above. The
3rd line executes an interrupt, interrupt 21h (hexadecimal).
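The same pattern (fill the registers, then execute the interrupt) works for other services too. As a sketch, here is a second tiny program that uses DOS function 2 of INT 21h, which writes the single character found in DL to the screen. The choice of function and of the letter 'A' is mine, just for illustration; it is not part of the example program above.

```asm
        .model tiny
        .code
        org 100h
start:
        mov ah,2            ; DOS function 2: write one character
        mov dl,'A'          ; DL = the character to write
        int 21h             ; execute the interrupt, DOS prints "A"
        mov ax,4c00h        ; DOS function 4Ch: exit, return code 0
        int 21h
end start
```

Only the values in the registers differ from the text example: AH selects the function, and the other registers carry whatever that function needs.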
Interrupts are numbered from 0 to FFh, and each interrupt provides other
types of 'services'. INT 21H, for example, is the MS-DOS interrupt. It
provides basic DOS functions like input/output of text and file functions
(open, read, write to files). INT 13H is the BIOS disk interrupt; it
provides many disk access functions like reading/writing/formatting of
disk sectors. INT 10H is the video interrupt; this interrupt allows you to
use many functions to make nice graphics :-) It provides functions to
change the video mode, to draw pixels onto the screen, to change the color
of text, and so on...

For a list of all interrupts and their functions download Ralf Brown's
Interrupt List from
http://www.cs.cmu.edu/afs/cs.cmu.edu/user/ralf/pub/WWW/

8. The sample program, step-by-step

.model tiny
-----------
This is not code which will later be executed... it just tells TASM/TLINK
that they should use one segment for the whole program. There are also
.model small, .model huge, etc,... but for small programs (= simple
viruses ;) model tiny is enough.

.code
-----
This tells TASM and TLINK that our executable code begins here. After this
line we can begin to write our main program.

org 100h
--------
All COM files are loaded into memory at offset 100h. ORG tells the
assembler at which offset the code will sit in memory, so that addresses
like OFFSET MESSAGE come out right.

start:
------
This is just a label. It marks the first instruction of the program, and
'end start' at the bottom of the source refers to it.

MOV AH, 9
---------
We want to display text... we will use the DOS interrupt to do so. INT 21h
requires that we put the function number into register AH. So, to tell DOS
that we want to display some text we 'MOV(E)' the number of the function
used to display text (9) into register AH.

MOV DX, OFFSET MESSAGE
----------------------
DOS needs to know where to find the text it should display... If we use
INT 21H function 9 we have to store this location in register DX. To do so
we just get the address of the message with 'OFFSET MESSAGE' and move it
into DX.
INT 21H
-------
Now that we have 'collected' enough information (filled the registers with
many stupid numbers) we can execute the interrupt, which will finally get
DOS to display some text for us. :-)

MOV AX,4C00H
------------
Never forget this line and the INT 21H after it! They are used to exit a
program. If you forget them, the CPU simply goes on and tries to execute
whatever bytes come next (here: the message text), and your program will
crash. DOS uses function 4C (in AH) to exit programs; 00 (in AL) means that
we will not return an "Error Code" (exit without an error).

INT 21H
-------
Now INT 21H will execute the function to exit the program...

message db 'CodeBreakers Rule! ;-)',10,13,'$'
------------------------------------------------
This is the message we WANT to display :-)
'10,13' does the same as pressing return: 10 is the line feed character
and 13 is the carriage return character, so together they put the cursor
at the start of the next line.
'$', this sign doesn't mean 'fast money' :) ... somehow DOS needs to know
where to stop displaying text, Bill G. decided to use '$'.

end start
---------
END marks the end of the source file, and the label after it ('start')
tells TASM where the program begins. Also required by TASM.
btw - This doesn't exit the program! To exit the program you still have to
use function 4C with INT 21h.

Ok, that's all for now... I know that this is a very basic tutorial, and
that it wasn't written very well... but I hope that it answered at least a
few of your questions! If you have any further questions feel free to use
the message board on our homepage at www.codebreakers.org, or email me at
spo0ky@thepentagon.com (spo0ky with zero! :)... Maybe I'll write a FAQ with
specific questions for the 4th edition.

--SPo0ky