Thread Local Storage
The hidden entry point
64bit version (Intel)
roy g biv / 29A

About the author:

Former DOS/Win16 virus writer, author of several virus families, including Ginger (see Coderz #1 zine for terrible buggy example, contact me for better sources ;), and Virus Bulletin 9/95 for a description of what they called Rainbow. Co-author of world's first virus using circular partition trick (Orsam, coded with Prototype in 1993). Designer of world's first XMS swapping virus (John Galt, coded by RT Fishel in 1995, only 30 bytes stub, the rest is swapped out). Author of world's first virus using Thread Local Storage for replication (Shrug, see Virus Bulletin 6/02 for a description, but they call it Chiton), world's first virus using Visual Basic 5/6 language extensions for replication (OU812), world's first Native executable virus (Chthon), world's first virus using process co-operation to prevent termination (Gemini, see Virus Bulletin 9/02 for a description), world's first virus using polymorphic SMTP headers (JunkMail, see Virus Bulletin 11/02 for a description), world's first viruses that can convert any data files to infectable objects (Pretext), world's first 32/64-bit parasitic EPO .NET virus (Croissant), and world's first virus using self-executing HTML (JunkHTMaiL, see Virus Bulletin 7/03 for a description). Author of various retrovirus articles (eg see Vlad #7 for the strings that make your code invisible to TBScan). Went to sleep for a number of years. This is my first virus for Win64. It is the world's first virus for Win64 on Intel Itanium.

What is Thread Local Storage?

This is what Microsoft has to say about it:

"The .tls section provides direct PE/COFF support for static Thread Local Storage (TLS). TLS is a special storage class supported by Windows NT.
To support this programming construct, the PE/COFF .tls section specifies the following information: initialization data, callback routines for per-thread initialization and termination, and the TLS index".

So, Thread Local Storage (TLS) is a Microsoft invention for applications that need to initialise thread data before main execution begins. To do this, there are callback pointers. These functions execute before the code at the main entry point! Clearly, this is a new way for viruses to run and even though AVers know about it, they probably don't support PE+ files because no viruses use it.

One point now: We can ignore the reference to .tls because there is a field in the PE header that points to this structure anywhere in the file.

The callback functions have the same parameters as a DLL entry-point function, except that nothing is returned. The declaration looks like this:

    typedef VOID (NTAPI *PIMAGE_TLS_CALLBACK)
        (PVOID DllHandle, DWORD Reason, PVOID Reserved);

The Reason parameter can take the following values:

    Setting             Value  Description
    DLL_PROCESS_ATTACH  1      New process has started
    DLL_THREAD_ATTACH   2      New thread has been created
    DLL_THREAD_DETACH   3      Thread is about to be terminated
    DLL_PROCESS_DETACH  0      Process is about to terminate

The DLL_PROCESS_ATTACH and DLL_PROCESS_DETACH messages mean that we are called for the host startup (after CreateProcess() but before process entry point) and shutdown (from within ExitProcess()), and the DLL_THREAD_ATTACH and DLL_THREAD_DETACH mean that we are called for thread startup (after CreateThread() but before thread entry point) and shutdown (from within ExitThread()). This happens for EXEs and also DLLs (but only DLLs that are not loaded with LoadLibrary). No need to hook ExitProcess() anymore because we will be called by ExitProcess() automatically.

It is important to know that NTDLL.DLL (not KERNEL32.DLL!) calls the callback functions.
Thus, if you need to call kernel32.dll APIs, you need to call LdrGetDllHandle() to find the kernel32.dll image base. The good thing is that the import table is filled already, so you can use the host imports.

What does TLS look like?

At offset 0xD0 in the PE+ header is the pointer to the TLS directory. According to Microsoft documentation, the TLS directory has the format:

    Offset  Size  Field                 Description
    0x00    8     Raw Data Start VA     Starting address of the TLS template
    0x08    8     Raw Data End VA       Address of last byte of TLS template
    0x10    8     Address of Index      Location to receive the TLS index
    0x18    8     Address of Callbacks  Pointer to array of TLS callbacks
    0x20    4     Size of Zero Fill     Size of unused data in TLS template
    0x24    4     Characteristics       (reserved but not checked)

Notice that the pointers are all virtual addresses (VA), not relative virtual addresses (RVA). This means that if we add a TLS directory, we should also add relocation items to the .reloc section, or simply remove all relocations. The reason for this is that if the file is loaded to a different base address, then Windows will display the message box "The application failed to initialize correctly" and the file will not execute anymore.

What do the TLS fields mean?

The TLS template contains data that are copied whenever a thread is created. These data can also be executable code (after calling VirtualProtect(), since IA64 enforces the executable bit). If the template exists (it is optional and so the fields can be null) then when the application starts, Windows will allocate an array for the TLS pointers and store this pointer at r13:0x58. For each thread that is created, the size of the template is allocated from the local heap, the data are copied to there, the pointer is stored in the array, and the array index is stored in the TLS index field.
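
For inspecting a file, the directory layout can be read with a few lines of Python. This is only a sketch for examining bytes that have already been extracted from an image. It assumes the PE32+ form of the directory (four 8-byte virtual addresses followed by two 4-byte fields, little-endian, 40 bytes in total). The names TlsDirectory64, parse_tls_directory64 and read_callback_array are invented for this example and are not part of any library.

```python
import struct
from typing import List, NamedTuple

# Assumed PE32+ layout: four 8-byte VAs, then two 4-byte fields (40 bytes).
TLS_DIRECTORY64 = struct.Struct("<QQQQII")


class TlsDirectory64(NamedTuple):
    raw_data_start_va: int      # offset 0x00
    raw_data_end_va: int        # offset 0x08
    address_of_index: int       # offset 0x10
    address_of_callbacks: int   # offset 0x18
    size_of_zero_fill: int      # offset 0x20
    characteristics: int        # offset 0x24


def parse_tls_directory64(blob: bytes) -> TlsDirectory64:
    """Unpack a 64-bit TLS directory from the start of blob."""
    if len(blob) < TLS_DIRECTORY64.size:
        raise ValueError("buffer is smaller than a 64-bit TLS directory")
    return TlsDirectory64(*TLS_DIRECTORY64.unpack_from(blob))


def read_callback_array(data: bytes, offset: int = 0) -> List[int]:
    """Collect 8-byte callback VAs until a null entry or the end of data."""
    callbacks = []
    while offset + 8 <= len(data):
        (va,) = struct.unpack_from("<Q", data, offset)
        if va == 0:
            break
        callbacks.append(va)
        offset += 8
    return callbacks


if __name__ == "__main__":
    # Synthetic bytes, not taken from a real file.
    sample = struct.pack("<QQQQII", 0x140003000, 0x140003010,
                         0x140004000, 0x140005000, 0, 0)
    print(parse_tls_directory64(sample))
```

All four pointer fields come back as virtual addresses, so a tool that wants file offsets still has to subtract the image base and map the result through the section table. That step is left out here.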
A thread can get its pointer by this formula:

    qword at (qword at [r13:[0x58]] + (TLS index * 8))

Or some code:

    add     r30 = 0x58, r13
    ld8     r30 = [r30]                 //get pointer to array of TLS pointers
    add     r31 = @ltoff(TLSIndex), gp
    ld8     r31 = [r31]                 //get TLS index pointer
    ld4     r31 = [r31]                 //get TLS index
    shladd  r30 = r31, 3, r30           //get pointer to TLS data pointer
    ld8     r30 = [r30]                 //get pointer to TLS data

then access data at [r30 + offset]

The Address of Callbacks field contains the Virtual Address of a null-terminated array of functions that receive the ATTACH/DETACH messages. It is valid to have no entries in this array. In that case, the field is supposed to point to eight zero bytes, however the actual field can also be null.

How to use TLS?

There are a few simple ways to use TLS to infect a file:

    add a callback pointer to existing array (or create new array)
    alter one of the host callback pointers
    alter the code in one of the callbacks
    create a new TLS directory
    hijack the TLS template and alter some code somewhere in the file

If you want to use the TLS method to infect a file, firstly check if a TLS directory exists already. If it does, then you can pick at random a callback routine pointer and change it to point to your code. If there is no existing TLS directory, then add one by setting correctly the pointers in your own version. The template addresses can be set to null and the index pointer can point to any writable dword (including the Characteristics field because it is not used). The callback pointer will point to the array of callback routine pointers, one of which will be the virus entry point.

When this entry point receives control, the file is loaded fully into memory and the import table is fixed up. This means that we can do anything that we would do normally, like go resident or call API functions and spread to other files.
The main difference is that we are guaranteed to be called at least twice, once on startup and once on shutdown, and twice more for every thread that the host uses. This means that we must be careful to avoid recursion because we will also be called if we use threads in our virus code.

Hijacking the TLS template is a technique that still has not been used. The idea is to make a copy of the TLS template and add the virus code to it. When the process starts (or a thread is created), then the virus code is copied by Windows into the heap. This means that the code is automatically placed into a writable memory space, without any call to malloc or memcopy. The only two things that are required after that are to mark the region as executable, and to transfer control to the code on the heap. That is done by using the TLS index to get the heap pointer. The transfer of control code would look something like this:

this code is in the file:

    fib:    alloc   loc0 = 0, 4, 4, 0
            mov     loc1 = rp
            mov     loc2 = gp
            add     out0 = 0x58, r13
            ld8     out0 = [out0]
            add     out1 = @ltoff(TLSIndex), gp
            ld8     out1 = [out1]
            ld4     out1 = [out1]
            shladd  out0 = out1, 3, out0
            ld8     out0 = [out0]
            mov     out1 = 1
            mov     out2 = PAGE_EXECUTE_READWRITE
            add     out0 = size of original TLS template, out0
            mov     out3 = sp
            mov     loc3 = out0
            br.call.sptk.few rp = VirtualProtect
    fie:    mov     b1 = loc3
            br.cond.sptk.many b1

this code is on the heap:

            mov     r2 = rp                 //get return address
            addl    r2 = fib - fie, r2      //point to first byte of code in file
            //rest of code is here. do not forget to restore host bytes
            mov     ar.pfs = loc0           //restore important registers
            mov     rp = loc1               //restore important registers
            mov     gp = loc2               //restore important registers
            mov     b1 = loc3               //store real return address
            br.cond.sptk.many b1            //return to host

Epilogue:

Now you want to look at my example code and then to make your own examples. There are many possibilities with this technique that make it very interesting. It is easy when you know how. Just use your imagination.

TLSDemo3 has an inserted TLS directory and code that displays a message box. This code runs before the main entry point.

TLSDemo4 has a hijacked TLS template and code that displays a message box. This code jumps from the main entry point to the heap without malloc or memcopy.

Greets to friendly people (A-Z):

Active - Benny - Obleak - Prototype - Ratter - Ronin - RT Fishel - sars - The Gingerbread Man - Ultras - uNdErX - Vecna - VirusBuster - Whitehead

rgb/29A may 2004
iam_rgb@hotmail.com