Insane Reality issue #8 - (c)opyright 1996 Immortal Riot/Genesis - REALITY.005

Article: AV-Articles
Author: Who cares?

% Ripped-Off AV Articles %
__________________________

OK. Many people said they enjoyed the AV articles in IR zine #7, so here are some more. These include:

*The Problems in Creating Goat Files - Read it for anti-bait techniques as well as other interesting stuff.
*Heuristic Anti-Virus Compatibility - material on heuristic scanning and cleaning, as well as AV technique in general.
*Detecting and Eradicating Known and Unknown Viruses - Scanning/cleaning infections generically via integrity checking.
*A Brief History of PC Viruses - interesting, and might give you ideas.

What?? Why should an elite Vx dude like you read AV articles??

- K N O W   Y O U R   E N E M I E S -

- _Sepultura_

=============================================================================

╔════════════════════════════════════════╗
║ The problems in creating goat files.   ║
║             Igor G. Muttik             ║
╚════════════════════════════════════════╝

Abstract

With more than 6000 viruses known for the IBM PC, maintaining and updating a library of virus samples is a difficult task. Parasitic file infectors make up the majority of them, and testing their properties and creating samples takes a lot of effort. To help solve this problem the author has developed a special tool for antivirus researchers that creates bait files (also called sacrificial goats). Theoretical points of bait creation (infectable objects, unusual infection conditions, environmental requirements) are discussed and a detailed description of the GOAT package is given. This paper is an attempt to summarize the problems that appear when weeding suspicious files and replicating viruses. A safe testing environment based on hardware protection of the hard disk drive (HDD) is described. The paper also describes DOS peculiarities that appear when working with long directories.
The possible appearance of viruses targeted against antivirus research environments is discussed.

1. Virus samples

1.1. What is "a virus sample"?

A file-infector virus usually attaches itself to an executable file using an appending or prepending technique. Such viruses are called parasitic infectors. Among antivirus researchers these viruses are usually transferred in "sample form" -- the virus is attached to a do-nothing file of some fixed size (usually divisible by 10**N or 16**N) with simple contents (doing nothing, or printing a short message on the screen). The result of infecting such a goat file is called "a virus sample". We have:

   Virus sample = Virus(Goat file)

or, simply:

   Virus sample = Goat file + Virus

Can we "standardize" a virus sample? Unfortunately not, generally speaking. All polymorphic viruses have zillions of instances and it is impossible to select some "standard" image of such a virus. Oligomorphic and encrypted viruses are difficult to "standardize" too. Even for non-encrypted viruses the problem is not simple -- they usually have some variables stored inside their body (especially resident viruses), and thus their image is variable.

1.2. Types of goat objects

There are many infectable objects in the DOS environment. They include:

1) Files:
   - EXE/COM/OV? executable files (usually started by the user)
   - SYS drivers (called by DOS kernel at startup)
   - BAT files (run on the user request or from AUTOEXEC.BAT)
   - OBJ/LIB/source files (compiled into executables on the user request)
   - DLL/CPL/etc. (NE, LE, etc. - Windows, OS/2 executables)
   - DOC/WK?/etc. (including macro and OLE files)

2) Pointers:
   - MBR partition table (a'la Starship)
   - DBR pointers (IO.SYS/MSDOS.SYS; IBMBIO.COM/IBMDOS.COM)
   - directory entries (ex., DIR-II family)
   - FAT pointers (ex., Necropolis)

3) Startup code:
   - Flash ROM (called by microprocessor after RESET)
   - MBR code (called by ROM BIOS after POST)
   - DBR code on HDD (called by MBR code)
   - DBR code on floppy (called by BIOS)
   - DOS kernel code (called by DBR code)

Each of these objects can be infected and therefore requires preparation of a "goat object". Fortunately, most unusual infection techniques are very rare or have not yet been found at all, so creating bait objects for bizarre viruses is a rare task -- the great majority of known viruses are simple parasitic file infectors. Furthermore, creating a goat BAT file (or source file) is rather easy -- one can use a text editor to make a bait for the virus. To create a goat floppy diskette we can use the standard FORMAT utility.

Antivirus researchers are mostly concerned with the problem of "virus glut" [Skulason]. "Virus glut" means that the number of known viruses is increasing at a rapid rate. The great majority of them are file viruses. So, in most cases, the attention of antivirus researchers is focused on parasitic file infectors. We'll discuss only this type of virus in the rest of the paper.

1.3. Creation of goat files

To try to replicate a virus one has to have a set of goat files. Most antivirus researchers have their own pre-created sets of files, produced from an ASM source or directly with the DEBUG utility. This approach has a drawback -- if a new goat file is required, it has to be created manually. And if we need a lot of files (ex., for testing the detection rate of a polymorphic virus), the process must be repeated many times. Obviously, a dedicated automated tool offers many more options and capabilities. It can even create whole sets of files in one invocation.
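As a rough illustration of what such an automated tool does, the sketch below builds a set of do-nothing COM images in Python. It is not the GOAT package itself: the function names, the zero fill and the GOATnnn.COM name template are assumptions made for this example. Each image starts with the two-byte INT 20h instruction (CD 20), which simply returns to DOS, and is padded to the requested size.

```python
INT_20H = bytes([0xCD, 0x20])  # INT 20h: terminate the program and return to DOS


def make_goat_image(size, fill=0x00):
    """Return a do-nothing COM image of exactly `size` bytes (hypothetical helper)."""
    if size < len(INT_20H):
        raise ValueError("a goat needs at least 2 bytes for INT 20h")
    return INT_20H + bytes([fill]) * (size - len(INT_20H))


def make_goat_set(sizes, stem="GOAT"):
    """Map assumed names GOAT000.COM, GOAT001.COM, ... to goat images."""
    return {
        f"{stem}{index:03d}.COM": make_goat_image(size)
        for index, size in enumerate(sizes)
    }


if __name__ == "__main__":
    # Twenty goats of 1000, 2000, ... 20000 bytes, kept in memory only.
    goats = make_goat_set(range(1000, 20001, 1000))
    for name, image in goats.items():
        print(name, len(image))
```

To put the files on disk, each image can be written with open(name, "wb"). The images are kept in memory here so that the sketch has no side effects.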
It is convenient to use a set of goat files with linearly increasing length (say, 1000, 2000, ...20000). If the virus leaves short victims alone, this will be easily noticeable. And file growth can be calculated by subtracting the original size from the size of the infected file.

2. Infection of a goat file

2.1. "Weeding problem"

From the point of view of an antivirus researcher, every incoming suspicious sample should be classified into one of the following groups (for definitions see the VIRUS-L FAQ [FAQ]):

- innocent file (includes garbage and damaged programs)
- virus (includes germs, droppers, viruses of the first generation)
- trojan
- intended
- joke

This classification problem is usually called "weeding". There are automated and manual methods for weeding a set of files. The following automated tools are used:

- scanners, detecting viruses by name
- heuristic scanner
- TRASHCAN/DUSTBIN, detecting non-viruses, jokes, garbage and intendeds

Manual "weeding" methods are used after the automatic ones:

- visual analysis (ex., presence of "MZ", "PK" identifiers)
- tracing in DEBUG (includes partial on-the-fly disassembling)
- full disassembling

We should take into account that the infected sample may be compressed with one of the EXE-packers (PKLITE, LZEXE, DIET, EXEPACK, COMPACK, PGMPACK, KVETCH, SHRINK, TINYPROG, WWPACK, AXE, IMPLODE, AVPACK, etc.). In such a case the UNP and UUP programs should be used to remove the compression code before manual analysis.

Visual checks of incoming suspicious files are usually made using DEBUG or HIEW (Hackers View) -- a wonderful viewer of executable files. The latter combines the features of a simple ASCII/HEX viewer with a built-in disassembler/assembler (both 86 and 32-bit modes) and a binary file editor. I can heartily recommend this utility to all antivirus researchers.

2.2. Safety problem

Every antivirus researcher faces a problem when he needs to start an infected (or just suspicious) program or trojan horse.
The typical solution is to use a special goat PC (usually old PC/XT/AT). But the malware can easily destroy data on the hard disk of this PC. It can even cause malfunction of the hardware (ex., low-level format IDE disk, if any). It will take significant time and effort to restore your testing environment. The hardware protection of the hard disk of your PC can only be a 100%-reliable solution. To make hardware protection you will need some switch, which selects an operation mode -- "normal"/"protected". 2.2.1. Hardware protection using "Turbo" switch "Turbo" switch is rarely used in the computer operation. The reasons are the following. First, any user will usually select the highest possible speed to minimize response time of the software. Second, most available BIOSes support toggling of turbo mode using the keyboard (for example, AMI BIOS uses [Alt]-[Ctrl]-[+] to set the higher speed and [Alt]-[Ctrl]-[-] to set the lower speed). Therefore, you can easily replace your connection of "Turbo" switch to the motherboard with a simple jumper. Now your "Turbo" switch connector is free for use as hard-disk protection switch. Typically connector of "Turbo" switch has three contacts (and switch shorts two left contacts, or two right ones). The use of this switch to turn on/off the disk protection looks an elegant solution. Now find the jumper on your hard disk controller, which enables its operation (examine controller manual if needed). Most MFM, IDE and SCSI controllers have such a jumper. Remove this "HDD-enable" jumper and substitute it with the connector of "Turbo" switch (connector should replace the jumper on the controller and short the contacts instead of the jumper). Now, after described modification, you can easily turn off HDD simply pressing "Turbo" switch and return it to operation pressing it once more. 
LED indicator (or simple LED) of your PC (which usually shows the current frequency of processor operation) is wired to the turbo switch and reflects its state. You can easily configure the LED indicator to reflect current mode of operation (say, "On"/"FF"). 2.2.2. Software shell for hardware protection To work without HDD you will need some media instead of it. Ideal solution is to use a ramdrive. You have to add the following statement to your CONFIG.SYS -- DEVICE=RAMDRIVE.SYS nnnn (where nnnn stands for the size of ramdrive in kilobytes; you may also need /e switch to use extended memory). Size of ramdrive <2MB is usually not sufficient, so better select 2-4MB. First, copy all software, needed for virus testing (plus suspicious files) to your virtual disk. After your hard disk will be switched off all programs will be inaccessible, so make a good selection (in my case it took around 1MB or more). Now you are ready to disable hard disk. But DOS still thinks that HDD is present. Its internal buffers and cache utilities (if any) still remember the current contents of some portions of your hard disk in the computer memory. The most obvious solution is the elimination of all "notes" about hard disk presence. To simulate the absence of hard disk on the PC, I wrote a special program, which clears INT_41h and INT_46h (pointers to the HDD disk tables), and sets number of available hard disks (BIOS variable at [0:475h]) to zero. To reroute any access from hard disk (ex., drives C:, D:, E:) to the virtual disk, I use DOS' SUBST utility, which replaces drives C:, D: and E: with the virtual disk drive letter (F: in my case). SUBST also clears HDD cache contents. Finally, DOS environment variables (ex., COMSPEC and PATH) should be rewritten to point on the ramdrive objects. 2.3. "Replication problem" The problem of infecting a goat file, having a sample of possible virus is called "replicating". 
Very often one researcher asks the others -- "I have a sample of what I think is a virus, but cannot replicate it. Have you tried? If anybody succeeded in doing this -- send me a sample, please..." And that repeats very frequently. We see that "replication problem" is one of the most common problems. The question is to find correct computer environment and meet all virus infection conditions. Obviously, both problems can be solved with the help of full disassembly of the viral code, but that is not very practical approach, because it takes much time. Usually, suspicious files are simply tested in so-called "goat computer". Only in case of problems (files do not replicate, but look suspicious) they are disassembled and analyzed in deep. We already saw one approach to the replication problem -- to ask for help from other researchers. There are also other options: - trying a lot of different goats - trying a lot of different environments - manual analysis (tracing, debugging, disassembling) to find out all infection conditions (i.e., requirements for the goat and environment). 2.4. Infection conditions To replicate a virus we have to feed him a goat file, which meets virus internal infection conditions. This must be done in the environment, which is appropriate for the current virus. Fortunately, to make viruses more infective, they are usually made to operate in the wide range of environments. On the other hand, sometimes, numerous limitations are implemented to simplify the viral code (ex., Ping-Pong, Vindicator, Yale and Exeheader.Mz1 viruses work only on 88/86 processors; 3APA3A and MIREA.4156 viruses require 16 bit FAT hard disk; AT144 virus requires 286 processor or higher; Green_Caterpillar virus needs CMOS clock; Lovechild virus requires only MS-DOS 3.2; Nightfall virus does not replicate without XMS driver [Brown]; EMMA virus requires presence of EMS [Kaspersky]; etc.). 
In the case of specific requirements, only random environment selection or manual analysis of the virus internals may help to find the correct environment.

Parasitic file infectors can theoretically infect all of the following types of files:

- COM
- EXE
     MZ/ZM (DOS executables)
     NE (Windows, OS/2 16-bit)
     LE, W3 (Windows VxD, Win386)
     LX (OS/2)
     PE (Windows, NT 32-bit)
     MP, P2, P3 (Pharlap DOS extenders)
- SYS/COM (normal DOS drivers)
- SYS/EXE (understood only by DOS 5.0, 6.0)

Besides file type, there are the following infection conditions:

- file size
- filename
- attributes
- file timestamp (date/time of creation/modification)
- file contents

The most common infection condition is file type (COM/EXE) and, second, the size of the victim. Very short files are usually avoided, both because their growth is too noticeable and to avoid infecting do-nothing goat files (like primitive 2-byte INT_20 files).

Most file infectors target simple DOS executables -- COM files and EXE files (with MZ or ZM marker). Some file infectors are capable of infecting DOS drivers of SYS type (ex., SVC.4644, SVC.4661, SVC.4677, Alpha.4000, Astra, Astra_II, Cysta or Comsysexe, Terminator.3275, CCBB, Talon or Daemaen, Ontario, VLAD.Hemlock, Face.2521, etc.). All other executable formats remain virgin land still to be reclaimed by virus writers. For example, only a few Windows viruses are known to date (all of them infect only executables in NE-EXE format).

Speaking of the contents of goat files, we should mention that viruses which check the internals of the victim file are rather rare. I do not mean a selfcheck to avoid multiple infections of the same file. I mean checking of virus-free areas (that is, inspection of the uninfected file). Nevertheless, such viruses exist. Lucretia virus looks for an 0xE8 byte (Intel x86 CALL instruction) in the file and replaces the offset of the call so that it points to the viral body.
Warlock virus avoids all files having a 0Eh byte at the start of program code (this includes all LZEXE-packed programs). Raptor virus does not infect EXE files with SS in the header equal to 07BC, 141D, ...2894 (13 entries). The behavior of Internal.1381 virus depends on the contents of the EXE header too. Moreover, there are the Zerohunter viruses, which look for a series of zeroes in the file (412 bytes for Zerohunter.412 and 415 for Zerohunter.415) and, if it is found, infect the victim by overwriting this block of zeroes. Zerohunter viruses are typical representatives of the class of "cavity viruses" (like Helicopter.777, Grog.Hop, Gorlovka.1022/1024, Russian_Anarchy.2048, Locust.2486, Tony.338, etc.).

There are also viruses of exeheader type -- Dragon, Hobbit, SkidRow, Mike, VVM, Bob, XAM, Mz1, Pure, etc. They infect only EXE files having a long block of zeroes (around 200-300 bytes) in the EXE header (it is 512 bytes by default). They can be regarded as a subclass of cavity viruses.

Many viruses do not infect some programs. They usually avoid the command processor COMMAND.COM and certain antivirus or widely used programs (archivers, command-line shells, etc.). The following reason comes to mind: infection of COMMAND.COM is very noticeable and causes many incompatibilities, so virus writers simply filter out COMMAND.COM to avoid compatibility problems. This approach has a drawback (from the virus writer's point of view), as the infection of COMMAND.COM with a resident virus guarantees that the computer will come up with the virus installed in memory, because COMMAND.COM is always automatically invoked during the boot process. Viruses try to avoid antivirus programs because these normally check their own integrity, so the virus would be detected within a minute.

A more difficult case is a virus that infects only on certain days of the week, or during the first 20 minutes of an hour (like Vienna.644.a does). For example, Kylie virus infects the victim only if the current year is not 1990.
Fumble virus infects only on even dates. The virus called Invisible avoids certain COM files by computing a checksum of the victim's name. Viruses of the Phoenix family (also called Live_after_Death) avoid some file sizes, and about 1/8 of files are left uninfected. Russian Mirror (Beeper) virus infects only every third executed file. Some of these viruses are called "sparse" infectors. Random environment/goat selection may not help in this case, and the viruses have to be traced and/or disassembled.

Many viruses require a JMP instruction at the beginning of the victim file (ex., first versions of Yankee_Doodle, Russian_Tiny.143, Rust.1710, Screen.1014, Leapfrog.516, etc.).

All the exclusions and conditions mentioned above must be taken into account when trying to create goat files suitable for infection, and whenever the virus does not replicate.

2.5. Infection markers as an obstacle for infection

Almost all viruses try to "mark" their victims to avoid multiple infections of the same file, because growth of files beyond some reasonable limit cannot go unnoticed (because of the waste of disk space and the delays caused by reinfections) and may even cause the infected file to hang (ex., COM file >64k).

Viruses use different "infection markers":

- detection of self-presence (check own code; full or partial)
- sequence of bytes (text or binary designator; usually at specific position)
- timestamp (62 seconds, >2000 year, etc.)
- file size (ex., Uruguay-#3, #4)
- attribute (some viruses mark their victims as ReadOnly)

Some viruses use perfectly legal markers -- for example, the seconds value (say, all infected files have 33s) or the file length (say, all infected files' lengths are divisible by 23). If, by coincidence, our goat file carries the "marker" of the virus, it will not be infected. Fortunately, most viruses use specific markers. In fact, viruses have to behave in such a way to be infective.
Therefore, it is usually easy to make an infectable goat file if the first attempt of replication failed because of the coincidence with a legal virus marker. 2.6. Checking of goat files after attack After a try to infect a goat we have to detect possible changes. If we see a file growth (in a directory listing) -- the reason is obvious: longer files are virus children. One additional test is recommended -- to check whether virus child is itself replicating. In some cases (because of the errors in the virus) it is not and, though, must be classified as intended, not a virus. Visual checks after the attack are made just like before the attack -- see 2.1. If the virus has stealth or semi-stealth properties -- the detection of infected samples is somewhat more complex. The best approach is to preserve all goat files, involved in the test and inspect them after clean reboot (copy them to a floppy disk if your HDD is disabled as described in 2.2). More simple, but not that reliable method -- try to remove the virus from the interrupt chains using, say MARK/RELEASE programs by TurboPower Software (MARK should be installed before the first start of the virus, it remembers the whole interrupt table; RELEASE should be started after the attack to restore old interrupt table and remove the virus from the interrupt chain). Unfortunately, this approach might not work if the virus uses tunneling. In principle, we can use an integrity checker to compare test files before and after the virus attack. This generic method can even detect almost all stealth viruses if used in the low-level disk access mode. For example, this mode is available in Russian integrity checker ADInf. 3. "Polymorphics detection rate" 3.1. Huge quantities of goats In the products reviews we frequently read something like that: "... 
the 'Polymorphic' test-set contains a mammoth 4796 infected files" [TOP] or "When tested against the 500 positively replicating Mutation Engine (MtE) samples, all but two were correctly detected as infected" [Jackson]. Why all these tests need so many samples of the same virus? The answer is simple -- because of great variability of polymorphic viruses (more correctly -- because of the variability of the virus decryptor). Any scanner coping with the polymorphics have to decrypt the body of the virus and locate a search-string. Other approach is to try to distinguish the viral decryptor from a normal non-viral code. Both methods can produce both false positives and false negatives. They are, of course, rather rare, but practically (and even theoretically) unavoidable. To find out the misses of the scanner number of tested samples should be very big. That is why almost all comparisons of the scanners are performed using huge quantities of samples. That is, of course, rather time consuming and not very convenient, but unavoidable practice. How can we speedup the tests and preparation of samples? The first idea is to put virus samples on the fast media -- virtual disk looks the ideal selection. But can we enhance DOS' access to the drive? 3.2. DOS slowdown when working with long directories When experimenting with creation of hundreds of files I have noticed a very interesting peculiarity. After creating some number of files in the directory (in my case it was around 700 files) all additional files needed much more time to be created! Obviously, some internal resource of DOS was exhausted. To shed the light on this effect I have run the same task -- creation of 100*N goat files (N=1..10) using GOATS (with zero size increase; i.e., all goats were identical), but varied number of BUFFERS (as written in CONFIG.SYS). Note, that disk cache (SMARTDRV) was not active, because files were created on the virtual disk. 
The collected data is given in the table:

Time needed to create given number of files (in seconds +/-1).

FILES     100   200   300   400   500   600   700   800   900  1000
BUFFERS
  15        6    12    19    28*   40    51    64    80    96   118
  48        6    12    19    27    35    45    55    70*   90   112
  58        6    12    19    26    35    45    55    70    82*  103
  68        6    12    19    26    35    45    55    70    82    98

Note: "*" -- shows number of files, when significant slowdown occurs.

1. We see that the total time depends much on the number of BUFFERS.
2. At some place a significant slowdown always occurs (compare columns to see).
3. The moment of this slowdown depends on the number of BUFFERS.
4. For creation of 1000 files 68 BUFFERS are sufficient.
5. For 48 BUFFERS slowdown occurred at around 720 files.
6. For 58 BUFFERS slowdown occurred at around 870 files.

Thus, the addition of 10 BUFFERS (10*512=5120 bytes) shifts the limit by 870-720=150 files. We can calculate how many bytes are needed per file -- 5120/150=34.1. Surprisingly, it is very close to the directory entry size (32 bytes)! That is additional evidence that the slowdown occurs when there is no more space in BUFFERS to store the current directory (and DOS needs to reload it from disk).

I have also found an interesting fact (previously unknown to me) -- creating files in a fresh directory takes much less time than creating the same number of files in the same directory after 1000 files have been removed from it! The time needed to create 1000 files in a used directory is approximately three times longer than in a fresh directory! That is because DOS scans a directory only until it encounters a zero entry. In a used directory there are no such entries (at least near the beginning) and DOS has to scan the whole list of deleted entries.

Thus, we have to create bait files in a set of fresh directories of moderate size. The same applies to tests of scanners against huge virus collections -- fresh and short directories will be scanned faster.

4. GOAT software package

After discussing some theoretical points, let's turn to the realization of these ideas in the GOAT package [GOAT]. This package is a set of tools for antivirus researchers, which help to create bait files (also called sacrificial goat files or, simply, goat files). The purpose of the programs can be explained using the following table:

You need                                                     Use

Bait file with some special internal structure               GOAT.COM
A series of bait files of different sizes                    GOATS.COM
Files of the same size, but with different contents          GOATSET.BAT
Many identical files to infect them with polymorphic virus   FLOCK.COM

Using GOAT.COM you can manually select the size and the name of a sacrificial goat file and vary its internals to meet the criteria which the virus uses when deciding "to infect or not to infect" the victim file. You can enter the size of a sacrificial goat file in any of the given formats: decimal, hexadecimal or in kilobytes. The size of the victim files can be as small as 2 bytes and as large as many gigabytes (it is stored in a 32-bit variable). GOAT.COM is very flexible -- it can create COM, EXE, SYS(COM) and SYS(EXE) files, with code at the beginning, in the middle, or at the very end of the goat file. Files can be filled with zeroes, NOPs, two types of pattern or even random garbage. You can add a stack segment for the EXE files, vary the header size, and ... many other options are available.

GOATS.COM is intended to create a series of bait files with linearly increasing length. The length increase step is changeable. GOATS.COM has the same flexibility as GOAT.COM.

FLOCK.COM is a creator of up to 1000000 identical files. You can infect them with a polymorphic virus to test its behavior and properties. FLOCK.COM uses the same engine as GOAT.COM and GOATS.COM. Thus, all the flexibility of GOAT.COM is available too.

GOATSET.BAT produces some sort of "a standard set" of files of the same size. These files are different (internal contents or attribute is variable).
GOATSET.BAT needs GOAT.COM for its execution. GOAT.COM should be located in the current directory or be accessible via the PATH environment variable.

A small batch file RUN-ALL.BAT will help you to run (or infect, if you have a resident virus) all generated bait files.

4.1. Synopsis and switches

Usage of the main program -- GOAT.COM looks like this (others are similar):

   GOAT Size [Filename] [/switch] [/switch] ...

   Size     - decimal, hexadecimal, or in kbytes
              (Example: 10000, 3E00h, FF00h, 31k, 512K, 2048k)
   Filename - file to create. If no - makes GOAT000, GOAT001, ...

A short reference of all available switches is given below in alphabetical order:

   /Annnn   - set device Attribute (default=0C853h)
   /B       - place code at bottom of file (default - at start)
   /C[n]    - set selfcheck level (by default equal to 2, the highest)
              (/C means /C0; i.e., no selfchecking at all)
   /Dnnn    - create maximum 'nnn' subdirectories (default=10)
              (recognized only by FLOCK.COM, ignored by GOAT and GOATS)
   /E       - create EXE file (if size > 65280 - done automatically)
   /Fnnn    - create maximum 'nnn' files in a subdirectory (default=500)
              (recognized only by FLOCK.COM, ignored by GOAT and GOATS)
   /H, /?   - Help screen
   /Inn     - use fill byte 'nn' instead of standard zero-fill
              (ex., decimal /i100 or hexadecimal notation /iE5h)
   /J       - remove JMP at code start (default - JMP present)
   /Knnnn   - add 'nnnn' bytes of STACK segment to the bottom of EXE file
              (stack segment is filled with 'STACK' by default)
   /Mnnnn   - place code in the middle of the file exactly at nnnn position
              ('nnnn' is 32-bit value, but see limitations below)
   /N[nnnn] - fill goat file with pseudorandom bytes. The parameter
              (if given) is a random number generator seed. RNG uses
              multiplicative congruential method with 2**32 period
   /O       - do not make long EXE (>256K) with internal overlay structure
   /P       - fill free file space with pattern 00, 01, .. FE, FF, 00, ..
   /R       - make file ReadOnly (default - normal)
   /S       - make short (32 bytes) EXE header (default - 512 bytes)
   /Tnn     - set timestamp seconds field = nn (<63, even: 0, 1Eh, 62, ..)
   /V       - set SS:SP equal to CS:IP
   /W       - make word pattern (0000, 0001, ...FFFF, 0000)
   /X       - suppress signature defined in the INI file using "Motto="
   /Y       - create device driver (SYS file)
   /Z       - make 'ZM' EXE header instead of 'MZ'
   /9       - fill free file space with NOPs (default - with zeroes)

GOAT.COM, GOATS.COM and FLOCK.COM use the same set of command line switches. Most switches are self-explanatory.

The pattern inside the goat file always reflects the current offsets in the file (i.e., it is "anchored" to the absolute location in the file). For example, at the file offset 1A2Bh you will see bytes "2B", "2C", "2D", ... (for byte pattern). Word pattern at the same location will look like this -- "2B", "1A", "2C", "1A", etc. Sometimes pattern filling is very useful.

Switch /Knnnn adds a stack segment at the bottom of the EXE file. The size of the stack segment is limited -- 16 < nnnn < 65536. Obviously, SP always points to the bottom of the stack segment (i.e., SP=nnnn). Small and odd values in the /K switch should be avoided, because they can hang the computer or cause "Exception #13" (a frequent QEMM warning) when SP goes through the stack segment boundary (i.e., half of a word is written at SS:0000 and the other half -- at SS:FFFF).

Switches /Fnnn and /Dnnn are recognized only by FLOCK.COM (GOAT.COM and GOATS.COM simply ignore them). You can specify the desired number of files and subdirectories to create. By default, 10 subdirectories with 500 files in each are created.

4.2. Size limitations

By default GOAT.COM, GOATS.COM and FLOCK.COM produce a sacrificial file of COM type. This applies to any given size which meets the following criterion:

   2 < Size_of_COM < 65280

The magic number 65280 is the maximum size of a COM file, which must fit in one segment (64k=65536) minus the PSP size (256): 65536 - 256 = 65280.
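The size rule can be summarized in a few lines of Python. This is only a sketch of the rule as the text states it, not code taken from the GOAT package: the names are invented for the example, force_exe stands in for switches such as /E, and the lower bound of 2 bytes is an assumption based on the smallest goat size the paper mentions.

```python
SEGMENT_SIZE = 64 * 1024                 # one 8086 segment: 65536 bytes
PSP_SIZE = 256                           # Program Segment Prefix preceding a COM image
MAX_COM_SIZE = SEGMENT_SIZE - PSP_SIZE   # 65280


def default_goat_type(size, force_exe=False):
    """Return "COM" or "EXE" for a requested goat size (hypothetical helper).

    force_exe models an explicit request for an EXE file, as /E would make.
    """
    if size < 2:
        # Assumption: 2 bytes is the smallest goat mentioned in the paper.
        raise ValueError("goat size must be at least 2 bytes")
    if force_exe or size >= MAX_COM_SIZE:
        return "EXE"
    return "COM"


if __name__ == "__main__":
    for size in (2, 1000, 65279, 65280, 100000):
        print(size, default_goat_type(size))
```

Sizes of 65280 bytes and above fall through to EXE because a COM image plus its PSP would no longer fit in a single segment.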
When the code is placed at the bottom of a COM file whose size is around 64K, the code may lie too close to SS:SP (SS=CS for COM files; SP=FFFE) and the program may hang when run, because the stack will likely overwrite the code. Therefore, if the spacing between IP and SP is less than 64 bytes, the goat generation is aborted and the output file is not created (you will see a warning -- "Goat IP will be too close to SP. Abort!").

When the size specified in the command line is greater than or equal to 65280, an EXE file is generated automatically (you do not need to write the /E or /S switch explicitly). Such a file will have a normal 512-byte EXE header at the beginning. When you need to create an EXE file shorter than 65280 bytes, use the /E (or /S, /Z or /Knnnn) command line switch.

4.3. INI file

You may like to put your preferences (signature, switches, filename templates, etc.) into a separate file -- GOAT.INI (common for GOAT.COM, GOATS.COM and FLOCK.COM). Use any text editor to create or modify the INI file. A sample GOAT.INI file is given below:

   GOAT.INI

   Motto="Antivirus test file."  ;all output bait files will carry this string.
   GOATfiles=FPROT               ;files will be FPROT000.COM, FPROT001.COM, ..
                                 ;(default=GOAT)
   GOATSfiles=ESASS              ;files will be ESASS000.COM, ESASS001.COM, ...
                                 ;(default=GOAT)
   FLOCKfiles=S&S                ;files will be S&S000.COM, S&S001.COM, ...
                                 ;(default=GOAT)
   FLOCKdirs=HEAP                ;directories created - HEAP000, HEAP001,
                                 ;HEAP002
                                 ;(default=DIR)
   STACKfill="*MYSTACK"          ;fill stack with '*MYSTACK*MYSTACK*MYSTACK'
                                 ;(default=STACK)
   SYSname="DRIVERXX"            ;this string is inserted into SYS header
                                 ;(default=GOATXXXX)
   Switches=/F200/D50            ;make 50 dirs, 200 files in each. 10000 in
                                 ;total
   Switches=/C1                  ;to turn off registers check and avoid
                                 ;warning "Your PC might be infected..."
   Switches=/iF6h                ;always fill free file space with 0F6h byte
   Switches=/O                   ;never make overlaid EXE files

GOAT.INI may be located in the current directory or in the path of the started program. The first location has priority over the second.
GOAT.INI may not exist. In that case the programs use built-in defaults. Filename and subdirectory templates are limited to 5 symbols, because the programs always add '000' and then start incrementing this number until it becomes '999'. Any string exceeding the limit of 5 symbols will result in the following error message: "Error in the INI file line #nnn"

4.4. Bait file internals

The bait files created with GOAT.COM, GOATS.COM and FLOCK.COM (if they have the same size) are absolutely identical in their internal structure and properties. A created sacrificial goat file contains a small program, which displays its type (COM, EXE or SYS) and its size in hexadecimal and in decimal (only when the goat file is large enough, i.e., the space for the code itself is at least 70 bytes).

A sacrificial goat file consists of two parts: a small portion of code (70 bytes or, if space does not allow, just 2 bytes) and a block of zeroes, NOPs or a pattern of variable size (00..FF, 0000...FFFE or a random pattern). The zeroes (or NOPs or pattern) take all the space of the file that is free from the code. EXE files additionally have an EXE header. The unused part of the EXE header is always filled with zeroes. SYS files additionally have a device header and strategy and interrupt routines.

The output of a sample goat file (the size of the sample was 100 bytes) is the following:

   "Goat file (COM). Size=00000064h/0000000100d bytes."

The file type (COM/EXE/SYS) and the real numbers are inserted into the goat file message at the moment of creation.

4.5. Naming of goats

Usually the GOAT.COM, GOATS.COM and FLOCK.COM programs create output sacrificial files in the following order: GOAT000.COM, GOAT001.COM, GOAT002.COM, etc. The same applies to EXE files: GOAT000.EXE, GOAT001.EXE, GOAT002.EXE, etc. If some file in a row (say GOAT050.COM or GOAT050.EXE) already exists, the next file number is selected automatically (it will be GOAT051.COM or GOAT051.EXE). Thus, we cannot generate both GOAT050.COM and GOAT050.EXE in the same directory.
This rule does not apply to SYS files (e.g., GOAT000.COM and GOAT000.SYS are allowed). This naming strategy is used to give some freedom to companion viruses. Note that definitions given in the INI file may change the default file (and subdirectory) naming.

4.6. Bait device drivers

There are two formats of DOS device drivers -- the old format (a la COM, understood by all DOS versions >2.0) and the new format (a la EXE, introduced in MS-DOS 3.0). Drivers of the old type can only be started from CONFIG.SYS using the DEVICE statement. The entry point is defined in a special SYS header. Drivers of the new (EXE) type can additionally be started as normal executables from the DOS command prompt.

Drivers of the EXE type have two entry points -- one for invocation from CONFIG.SYS/DEVICE (as written in the SYS header, which goes after the EXE header) and the other defined by the CS:IP fields in the EXE header (this one works only when the file is started from the command line). The other advantage of the EXE format driver is that it is not limited to 64K, like the old type of drivers. Such new drivers can exceed 64K, but the pointers to the Strategy and Interrupt routines must fit into the first 64K (they are limited to 16 bits).

To create a device driver (SYS) file use the switch /Y. Goat drivers of the old (COM) style will print the message "Goat file (SYS). Size=..." when DOS requests an initialization of the driver (during CONFIG.SYS processing). Files in the new format (SYS&EXE) will do the same, but will also print this message when run from the DOS command line as a normal EXE file. In both cases the driver file prints the same message. Note that EXE device drivers bear a "(SYS)" designator inside, but are always named as EXE files (to enable starting them from the command line as a normal executable).

The minimal size of the device driver is around 150 bytes (including the SYS header). This limit increases for SYS&EXE files (it should additionally include the size of the EXE header -- 32 bytes for /S; 512 bytes for /E).

5.
"A standard set" of goat files. Let's imagine that we know that we have a sample of the virus (ex., we got the sample from knowledgeable antivirus researcher), but we have no information about properties of the virus. This situation frequently occurs in practice. First, we test it against a set of files of different lengths (say, 1000, 2000, ...10000 bytes). Now we see that the virus infected 8 files (3000, ...10000) and conclude that the virus avoids short victims (<3000). The "standard set" of goat files may help you to find out which files are preferred by the virus (ex.: virus may infect only COM files starting with JMP). Checking "a standard set" after virus attack, you can easily understand which files are infectable. Now we have another question -- does the virus infect all files longer than 3000 bytes regardless of their contents? We have to test the virus against a set of files of fixed size, but different contents. To simplify this task GOAT package has the generator of "a standard set" of baits of given size -- it is called GOATSET.BAT. Yes, this file is really a DOS batch file, issuing a series of calls to GOAT.COM with different parameters. GOATSET.BAT makes COM, EXE and SYS files. Files are filled with zeroes or NOPs (90h), with initial JMP (0E9h) or without it. Some files carry ReadOnly attribute. EXE files are with normal (512 bytes) and short (32 bytes) EXE headers, with MZ and ZM markers. GOATSET.BAT needs only one command line parameter -- size of the files in the set. After invocation 52 files of the same size are generated -- 12 COM, 34 EXE, 2 SYS and 4 SYS&EXE files. GOATSET.BAT also writes a report file GOATSET.LOG and places there a full description of the generated bait files set. Being a BAT file, GOATSET.BAT is fully customizable. It can be easily changed with any text editor. 6. Future threats 6.1. Anti-goat viruses Fortunately, there are only few viruses, that try to avoid infecting goat files. One of them is Sarov.1400. 
It uses a primitive algorithm to avoid victims with many repeated bytes. The corresponding code is:

   0100 8B161C00  MOV   DX,[001C]     ;LOAD RELATIVE OFFSET IN FILE
   0104 33C9      XOR   CX,CX
   0106 D1EA      SHR   DX,1
   0108 B80042    MOV   AX,4200       ;LSEEK TO CHECKED FILE AREA
   010B E80F01    INT   21
   010E BAD804    MOV   DX,04D8       ;BUFFER LOCATION
   0111 B43F      MOV   AH,3F         ;READ 100 BYTES FROM FILE
   0113 B96400    MOV   CX,0064       ;SIZE OF BLOCK TO CHECK
   0116 8BFA      MOV   DI,DX         ;DI -> BUFFER
   0118 CD21      INT   21
   011A 268A05    MOV   AL,ES:[DI]    ;GET FIRST BYTE (ES=DS)
   011D 47        INC   DI            ;SKIP TO NEXT BYTE
   011E F3AE      REPZ  SCASB         ;COMPARE WITH THE FIRST
   0120 7455      JZ    DON'T_INFECT  ;ALL BYTES ARE THE SAME!
   INFECT_THE_FILE:
   ...

Without any doubt, more and more anti-goat viruses will appear in the future. We can also expect the appearance of more viruses which avoid victims placed on a virtual disk, or viruses which do not infect files with certain typical lengths (divisible by 10**N and 16**N). Fortunately, most virus writers have not yet realized that such features are a very strong weapon. I would say they are comparable with polymorphism, because in most cases a full disassembly of the virus will be required, and that takes time. Moreover, such anti-goat tricks are programmed much more easily than any polymorphic engine.

6.2. Armoring tricks, virus/trojan conversion

There are a lot of viruses which try to complicate their investigation. Viruses use anti-tracing techniques: SVC.4644, Ieronim, XPEH (family of viruses), Zherkov (also called Loz), Magnitogorsk, HideNowt, OneHalf.3544, OneHalf.3577, Cornucopia, etc. A wonderful set of anti-tracing capabilities is found in the Compact Polymorphic Engine (CPE 0.11b), which is actually a virus creation tool.

Some viruses, when they detect that they are being traced, switch to the "trojan" mode and try to damage files, floppies and/or hard disks. That looks like the virus writer's revenge for the antivirus researcher's attempt to catch the virus.
Many viruses have such a behavior -- for example, recently found RDA.Fighter.5871/5969/7408 (overwrites random sectors on the HDD) [Daniloff], rather old Maltese Amoeba (destroys 4 sectors on each of the first 30 cylinders of all drives), CLME.Ming.1952 (overwrites 34 first sectors on all drives), DR&ET.1710 (erases 128 first sectors on HDDs), Gambler.288 (destroys first 10 sectors on drive C:), Kotlas (removes original non-infected copy of MBR), SumCMOS.6000 (tries to corrupt HDD). The most nasty idea -- to use destructive capabilities (a'la trojan) if the virus senses the antivirus environment. For example, when virus detected goat files. ============================================================================= ============================================================================= 1 HEURISTIC ANTI-VIRUS TECHNOLOGY Generally speaking, there are two basic methods to detect viruses - specific and generic. Specific virus detection requires the anti-virus program to have some pre-defined information about a specific virus (like a scan string). The anti-virus program must be frequently updated in order to make it detect new viruses as they appear. Generic detection methods however are based on generic characteristics of the virus, so theoretically they are able to detect every virus, including the new and unknown ones. Why is generic detection gaining importance? There are four reasons: 1) The number of viruses increases rapidly. Studies indicate that the total number of viruses doubles roughly every nine months. The amount of work for the virus researcher increases, and the chances that someone will be hit by one of these unrecognizable new viruses increases too. 2) The number of virus mutants increases. Virus source codes are widely spread and many people can't resist the temptation to experiment with them, creating many slightly modified viruses. These modified viruses may or may not be recognized by the anti-virus product. 
Sometimes they are, but unfortunately often they are not. 3) The development of polymorphic viruses. Polymorphic viruses like MTE and TPE are more difficult to detect with virus scanners. It is often months after a polymorphic virus has been discovered before a reliable detection algorithm has been developed. In the meantime many users have an increased chance of being infected by that virus. 4) Viruses directed at a specific organization or company. It is possible for individuals to utilize viruses as weapons. By creating a virus that only works on machines owned by a specific organization or company it is very unlikely that the virus will spread outside of the organization. Thus it is very unlikely that any virus scanner will be able to detect the virus before the payload of the virus does its destructive work and reveals itself. Each of these scenarios demonstrates the fact that virus scanners can not recognize a virus until the virus has been discovered and analyzed by an anti-virus vendor. These same scenarios do not hold true for generic detectors, and therefore many people are becoming more interested in generic anti-virus products. Of the many generic detection methods, heuristic scanning is currently becoming the most important. 2 HEURISTIC SCANNING One of the most time consuming tasks that a virus researcher faces is the examination of files. People often send files to researchers because they believe the files are infected by a new virus. Sometimes these files are indeed infected, sometimes not. Every researcher is able to determine very quickly what is going on by loading the suspected file into a debugger. A few seconds is often enough, and many researchers must have asked themselves: "How can I determine this so quickly"? One time I demonstrated this effect to the audience on an international conference. I showed the first page of the assembly listing of a MTE-infected file, and within about a second, Vesselin Bontchev came with the correct answer. 
How is this possible?

2.1 ARTIFICIAL INTELLIGENCE

One of the many differences between viruses and normal programs is that normal programs typically start by searching the command line for options, clearing the screen, etc. Viruses, however, never search for command line options or clear the screen. Instead they start by searching for other executable files, by writing to the disk, or by decrypting themselves. A researcher who has loaded the suspected file into a debugger can notice this difference at a glance. Heuristic scanning is an attempt to put this experience and knowledge into a virus scanner.

The word 'heuristic' means (according to a Dutch dictionary) 'the self finding' and 'the knowledge to determine something in a methodic way'. A heuristic scanner is a type of automatic debugger or disassembler. The instructions are disassembled and their purposes are determined. If a program starts with the sequence

   MOV AH,5
   INT 13h

which is a disk format instruction for the BIOS, this is highly suspicious, especially if the program does not process any command line options or interact with the user.

2.2 SUSPECTED ABILITIES

In reality, heuristics is much more complicated. The heuristic scanners that I am familiar with are able to detect suspicious instruction sequences, like the ability to format a disk, the ability to search for other executables, the ability to remain resident in memory, the ability to issue non-standard or undocumented system calls, etc. Each of these abilities has a value assigned to it. The values assigned to the various suspicious abilities are dependent on various factors. A disk format routine doesn't appear in many normal programs, but often in viruses. So it gets a high value. The ability to remain resident in memory is found in many normal programs, so despite the fact that it also appears in many viruses it doesn't get a high value. If the total of the values for one program exceeds a predefined threshold, the scanner yells "Virus!".
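The weighted scoring described in this section can be sketched roughly as follows. This is only an illustration in Python: the ability names, the weights and the threshold are invented for the example (the article gives no actual numbers), and the rule that one ability alone never raises the alarm is taken from the article's own description.

```python
# Hypothetical weights: a high value for abilities that are rare in normal
# programs, a low value for abilities that normal programs also have.
WEIGHTS = {
    "format_disk": 8,
    "search_executables": 4,
    "undocumented_call": 3,
    "stay_resident": 2,
}
THRESHOLD = 10  # assumed value, not taken from any real scanner


def total_score(abilities):
    """Sum the weights of the distinct known abilities found in a program."""
    return sum(WEIGHTS[a] for a in set(abilities) if a in WEIGHTS)


def raises_alarm(abilities):
    """Alarm only if two or more abilities together exceed the threshold."""
    found = set(abilities) & set(WEIGHTS)
    return len(found) >= 2 and total_score(found) > THRESHOLD
```

With these made-up numbers a resident program issuing undocumented calls scores 5 and stays silent, while a program that can both format a disk and search for executables scores 12 and is reported.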
A single suspected ability is never enough to trigger the alarm. It is always the combination of the suspected abilities which convince the scanner that the file is a virus.

2.3 HEURISTIC FLAGS

Some scanners set a flag for each suspected ability which has been found in the file being analyzed. This makes it easier to explain to the user what has been found. TbScan for instance recognizes many suspected instruction sequences. Every suspected instruction sequence has a flag assigned to it:

2.4 FLAG DESCRIPTION

   F = Suspicious file access. Might be able to infect a file.
   R = Relocator. Program code will be relocated in a suspicious way.
   A = Suspicious Memory Allocation. The program uses a non-standard way
       to search for, and/or allocate memory.
   N = Wrong name extension. Extension conflicts with program structure.
   S = Contains a routine to search for executable (.COM or .EXE) files.
   # = Found an instruction decryption routine. This is common for viruses
       but also for some protected software.
   E = Flexible Entry-point. The code seems to be designed to be linked on
       any location within an executable file. Common for viruses.
   L = The program traps the loading of software. Might be a virus that
       intercepts program load to infect the software.
   D = Disk write access. The program writes to disk without using DOS.
   M = Memory resident code. This program is designed to stay in memory.
   ! = Invalid opcode (non-8088 instructions) or out-of-range branch.
   T = Incorrect timestamp. Some viruses use this to mark infected files.
   J = Suspicious jump construct. Entry point via chained or indirect
       jumps. This is unusual for normal software but common for viruses.
   ? = Inconsistent exe-header. Might be a virus but can also be a bug.
   G = Garbage instructions. Contains code that seems to have no purpose
       other than encryption or avoiding recognition by virus scanners.
   U = Undocumented interrupt/DOS call. The program might be just tricky
       but can also be a virus using a non-standard way to detect itself.
   Z = EXE/COM determination. The program tries to check whether a file is
       a COM or EXE file. Viruses need to do this to infect a program.
   O = Found code that can be used to overwrite/move a program in memory.
   B = Back to entry point. Contains code to re-start the program after
       modifications at the entry-point are made. Very usual for viruses.
   K = Unusual stack. The program has a suspicious stack or an odd stack.

TbScan would for instance output the following flags:

   VIRUS            HEURISTIC FLAGS
   Jerusalem/PLO    FRLMUZ
   Backfont         FRALDMUZK
   Minsk_Ghost      FELDTGUZB
   Murphy           FSLDMTUZO
   Ninja            FEDMTUZOBK
   Tolbuhin         ASEDMUOB
   Yankee_Doodle    FN#ELMUZB

The more flags that are triggered by a file, the more likely it is that the file is infected by a virus. Normal programs rarely trigger one flag, while at least two flags are required to trigger the alarm. To make it more complicated, not all flags carry the same 'weight'.

3 FALSE POSITIVES

Just like all other generic detection techniques, heuristic scanners sometimes blame innocent programs for being contaminated by a virus. This is called a "False Positive" or "False Alarm". The reason for this is simple. Some programs happen to have several suspected abilities. For instance, the LOADHI.COM file of QEMM has the following suspected abilities (according to an older, yet obsolete version of TbScan):

   A = Suspicious Memory Allocation. The program uses a non-standard way
       to search for, and/or allocate memory.
   M = Memory resident code. This program may be a TSR but also a virus.
   U = Undocumented interrupt/DOS call. The program might be just tricky
       but can also be a virus using a non-standard way to detect itself.
   Z = EXE/COM determination. The program tries to check whether a file is
       a COM or EXE file. Viruses need to do this to infect a program.
   O = Found code that can be used to overwrite/move a program in memory.
All of these abilities are available in LoadHi, and the flags are enough to trigger the heuristic alarm. As LoadHi is supposed to allocate upper memory, load resident programs in memory, move them to upper memory, etc., all these suspected abilities can easily be explained and verified. However, the scanner is not able to know the intended purpose of the program, and as most of these suspected abilities are often found in viruses, it just describes the LoadHi program as "a possible virus". 3.1 HOW SERIOUS IS THE ISSUE OF FALSE ALARMS? If a heuristic scanner pops up with a message saying: "This program is able to format a disk and it stays resident in memory", and the program is a resident disk format utility, is this really a false alarm? Actually, the scanner is right. A resident format utility obviously contains code to format a disk, and it contains code to stay resident in memory. The heuristic scanner is therefore completely right! You could name it a false suspicion, but not a false positive. The only problem here is that the scanner says that it might be a virus. If you think the scanner tells you it has found a virus, it turns out to be a false alarm. However, if you take this information as is, saying 'ok, the facts you reported are true for this program, I can verify this so it is not a virus', I wouldn't count it as a false alarm. The scanner just tells the truth. The main problem here is the person who has to make decisions with the information supplied by the scanner. If it is a novice user, it is a problem. More about that later. 3.2 AVOIDING FALSE POSITIVES Whether we call it a false positive or a false suspicion doesn't matter. We do not like the scanner to yell every time we scan. So we need to avoid this situation. How do we achieve this? 1) Definition of (combinations of) suspicious abilities The scanner does not issue an alarm unless at least two separate suspected program abilities have been found. 
2) Recognition of common program codes Some known compiler codes or run time compression or decryption routines can cause false alarms. These specific compression or decryption codes can be recognized by the scanner to avoid false alarms. 3) Recognition of specific programs Some programs which normally cause a problem (like the LoadHi program used in the example) can be recognized by the heuristic scanner. 4) Assumption that the machine is initially not infected Some heuristic scanners have a 'learn' mode, i.e. they are able to learn that a file causing a false alarm is not a virus. 3.3 DEALING WITH FALSE POSITIVES Some false positives are not easily avoided. So, the user has to deal with a certain amount of false alarms, and must make the final decision as to whether a file is infected or not. Ok, you may say, how do we know whether a suspicious program is a virus or innocent. There is no way to find out, that is what most people believe. Actually there is a way to find out, but this depends on the scanner. The scanner has to explain to the user the reasons why the program is suspect. 'This file might contain a virus' actually doesn't say much to the user. It is always right. Every file MIGHT contain a virus, but MAY also be clean. We actually use a scanner to find out! What is the user supposed to do with this information? However, if the scanner says that some program is able to remain resident in memory and able to format a disk, the user can more easily figure out what is going on. If a word processor gives such an alarm, it is extremely likely that the program carries a virus, because word processors generally are not able to format disks and remain resident in memory. However, if the suspected file is a resident disk formatting utility, then all of the suspected abilities can be explained by the intended purpose of the program. Reason for suspicion: memory resident and disk formatting abilities. 
   PROGRAM                    PROBABLY
   Resident disk formatter    No Virus (innocent)
   Word processor             Malicious (virus)

Both programs cause the same heuristic alarms, but the final conclusion is different. Naturally, it requires an advanced user to draw a conclusion for the question "infected or not?". However, my opinion is that judging the results of any scanner (also conventional scanners) is a task for an advanced user only. If the scanner has a 'learn' mode, i.e. is able to remember which programs cause a false alarm, the initial scan should be performed by an advanced user, but the subsequent scans (when the possible false positives have been eliminated) can be performed by a novice user. This is already common practice in most organizations.

Anyway, it isn't as bad as it seems, as all other detection methods (including signature scanning) are known to cause some false alarms as well. Heuristics however has the advantage that it is able to supply you with enough information to establish for yourself whether a suspected file is likely a virus or not.

4 HOW DOES HEURISTIC SCANNING PERFORM?

Heuristics is a relatively new technique and still under development. It is however gaining importance rapidly. This is not surprising as heuristic scanners are able to detect over 90% of the viruses without using any predefined information like signatures or checksum values. The amount of false positives depends on the scanner, but a figure as low as 0.1% can be reached easily.

TbScan 6.02 used on the large virus collection of Vesselin Bontchev showed the following results:

                       SCANNING 7210
   DETECTION METHOD    FILES            PERCENTAGE
   Conventional        7056             97.86%
   Heuristics          6465             89.67%

A false positive test however is more difficult to perform so there are no independent results available.

5 COMBINATION OF CONVENTIONAL AND HEURISTIC SCANNING

Some people think heuristic scanning is a replacement for conventional scanning. In my opinion it is not.
Heuristic scanning serves a very useful purpose when used in combination with conventional scanning. The results of both scanning methods can be validated by each other, thereby reducing false positives and also false negatives.

Combined result of analysis:

   HEURISTICS    CONVENTIONAL    PROBABILITY
   clean         clean           very probably clean
   clean         virus           might be a false positive
   virus         clean           might be a false negative
   virus         virus           very probably infected

   fn: 10%       fn: 1%          combined false negatives: 0.1%
   fp: 0.1%      fp: 0.001%      combined false positives: 0.000001%

The chance of both the heuristic scanner and the conventional scanner failing is minimal. If both scanning methods give the same result, the result is almost certain. In the few cases where the results don't agree with each other, additional analysis is required.

TbScan 6.02 used on the large virus collection of Vesselin Bontchev showed the following results:

                       SCANNING 7210
   DETECTION METHOD    FILES            PERCENTAGE
   Conventional        7056             97.86%
   Heuristics          6465             89.67%
   Combined            7194             99.78%

6 WHAT CAN BE EXPECTED FROM IT IN THE FUTURE?

-> THE DEVELOPMENT CONTINUES

Most anti-virus developers still do not supply a ready-to-use heuristic analyzer. Those who have heuristics already available are still improving it. It is however unlikely that the detection rate will ever reach 100% without a certain amount of false positives. On the other hand it is unlikely that the amount of false positives will ever reach 0%.

Maybe you wonder why it isn't possible to achieve 100% correct results. There is a large grey area between viruses and non-viruses. Even for humans it is hard to describe what is a virus and what is not. An often used definition of a computer virus is this: "A virus is a program that is able to copy itself". According to this definition the DiskCopy.Com program is a virus...

-> REACTION OF VIRUS WRITERS

An important issue is the effect on virus writers. It is likely that they will try to avoid detection by heuristic scanners.
Until now the goal was to avoid detection by signature scanners, and this was very easy to do, as it was sufficient to modify only a small part of an existing virus. Teenagers with some basic understanding of programming could do so easily. Avoiding heuristic scanners however requires a lot more knowledge, if it is possible at all. Fortunately, this detection-avoiding method of programming makes detection by conventional anti-virus products easier, because it means that the programmer cannot use very tight and straight code. The virus writer will be forced to write more complex viruses.

7 THE PRO'S AND CON'S OF HEURISTIC SCANNING

   ADVANTAGES
      Can detect 'future' viruses
      User is less dependent on product updates

   DISADVANTAGES
      False positives are possible
      Judgement of the result requires some basic knowledge

8 HEURISTIC CLEANING

Before we can discuss heuristic cleaning, it is important to know how a virus infects a program. The basic principle is not difficult. A virus - a program by itself - adds itself to the end of the program. The size of the program increases due to this addition of the viral code. Appending a virus program to another program is however not enough; the virus code should also be executed. To make this happen, the virus overwrites the first bytes of the file with a 'jump' instruction, which makes the processor jump to the viral code. The virus now gains control when the program is invoked, and it will finally pass control to the original program. Since the first bytes of the file are overwritten by the jump instruction, the virus has to 'repair' these bytes first. After that the virus just jumps to the beginning of the original program, and most often this program works as usual.

   [Diagram: ORIGINAL PROGRAM beside INFECTED PROGRAM. The program
    begins at 100; in the infected program the viral code ("Virus!")
    sits at 2487 and ends with "jmp 100".]

To clean an infected program, it is of vital importance to restore the bytes being overwritten by the jump to the virus code.
The virus has to restore these bytes also, so somewhere in the virus code these original bytes are stored. The cleaner searches for those bytes, puts them back in their original location, and truncates the file to the original size.

8.1 HOW DOES A CONVENTIONAL CLEANER WORK?

A conventional cleaner has to know which virus to remove. Suppose your system is infected with the Jerusalem/PLO virus. You invoke your cleaner and it proceeds like this:

"Hey, this file is infected with the Jerusalem/PLO virus. OK, this virus is 1873 bytes in size, and it overwrites the first three bytes of the original program with a jump to itself. The original bytes are located at offset 483 in the viral code. So, I have to take those bytes, copy them to the beginning of the file, and I have to remove 1873 bytes of the file. That's it!"

8.2 SHORTCOMINGS OF CONVENTIONAL CLEANERS

The cleaner has to know the virus it has to remove. It is impossible to remove an unknown virus. The virus should be the same as the virus known to the cleaner. Imagine what would happen if the virus used in the example was modified and is now 1869 bytes in size instead of 1873... the cleaner would remove too much! This is not an exception; it happens quite often since there are so many mutants. For instance, the Jerusalem/PLO family now contains more than 100 mutants!

Many polymorphic viruses have variable lengths and keep the original instructions encrypted. Most conventional cleaners are therefore unable to clean MTE infected programs.

8.3 THE VIRUS WILL REMOVE ITSELF BEFORE ACTUAL EXECUTION

We have seen above how a virus works. The interesting part is that when the virus passes control to the original program it restores the original bytes at the beginning of the program and jumps back to start the program. Every virus is able to repair the original program in order to keep it functional (except for overwriting viruses, but these can't be cleaned anyway).
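The conventional cleaning recipe from 8.1, and the mutant problem from 8.2, can be sketched in a few lines of Python. This is a simplified illustration only: it assumes the viral code sits at the very end of the file and the saved bytes lie at a fixed offset inside it, as in the example. The function names are hypothetical, and the "sample" built here is inert dummy data (zero bytes), not a virus.

```python
def clean_known_infection(infected, virus_size, saved_offset, saved_len=3):
    """Restore the saved start bytes and cut off virus_size bytes at the end."""
    body_start = len(infected) - virus_size
    if body_start < saved_len or saved_offset + saved_len > virus_size:
        raise ValueError("parameters do not fit this file")
    body = infected[body_start:]
    saved = body[saved_offset:saved_offset + saved_len]
    # Original program = saved first bytes + untouched rest of the host.
    return saved + infected[saved_len:body_start]


def make_dummy_sample(original, virus_size, saved_offset):
    """Build inert test data laid out like an appended infection."""
    body = bytearray(virus_size)                        # zero-filled stand-in
    body[saved_offset:saved_offset + 3] = original[:3]  # saved host bytes
    # E9 is the near JMP opcode; its displacement is irrelevant for the test.
    return b"\xE9\x00\x00" + original[3:] + bytes(body)


original = bytes(range(1, 101))
sample = make_dummy_sample(original, 1873, 483)
restored = clean_known_infection(sample, 1873, 483)

# A "mutant" that is 1869 bytes long, cleaned with the 1873-byte recipe.
mutant = make_dummy_sample(original, 1869, 483)
damaged = clean_known_infection(mutant, 1873, 483)
```

Here `restored` equals `original`, while `damaged` comes out four bytes too short and with the wrong first three bytes, which is exactly the failure described in 8.2.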
8.4 LET THE VIRUS DO THE DIRTY WORK

The idea is: why not let the virus do the dirty work? The basic principle of heuristic cleaning is simple. The heuristic cleaner loads the infected file and starts emulating the program code. It uses a combination of disassembly, emulation and sometimes execution to trace the flow of the virus, and to emulate what the virus is normally doing. When the virus restores the original instructions and jumps back to the original program code, the cleaner stops the emulation process, and says 'thank you' to the virus for its cooperation in restoring the original bytes. The now repaired start of the program is copied back to the program file on disk, and the part of the program that gained 'execution' will be chopped off. An additional analysis of the cleaned program file will be performed to be on the safe side.

Note that the cleaner is actually removing the unknown from the unknown. No predefined information about the virus or infected file is necessary.

The process of emulation is just like hitchhiking. The emulator convinces the viral code that it is actually executing, and it hitchhikes to the point where the virus passes control to the original program. However, the actual process is very complicated. As with hitchhiking, many things can go wrong:

-> Driver takes you to the wrong place
   The virus does not intend to execute the original program, but it starts doing completely different things. As the purpose of the emulation is to restore the original program, we never reach our goal.

-> Driver won't let you out
   If the viral code performs an endless loop, the original program will never be restored so the cleaner might wait forever.

-> Driver leaves the car
   A potentially dangerous situation is that the cleaner is too ambitious in its task to emulate everything, and that the virus gets control inside the emulated environment and finally escapes from it.

-> Driver hits a tree and kills you too
   Many viruses are badly programmed.
If they crash inside the emulator, chances are that the emulator crashes too.

Heuristic cleaners are so complicated that there is only one available right now. However, the great potential of heuristic cleaning makes it likely that there will be more heuristic cleaners soon.

8.5 THE PRO'S AND CON'S (OF HITCHHIKING)

   ADVANTAGES
      No need to recognize mutants
      No problems with polymorphic viruses
      Can clean 'future' viruses
      User is less dependent on product updates

   DISADVANTAGES
      No exact copy of the original
      It cleans everything: even clean files!

Being the author of the first heuristic cleaner I have received many reactions to it. Most people were surprised that my cleaner was able to remove MTE viruses before my scanner was even able to recognize them. This is especially interesting as most anti-virus products are still not able to remove MTE infections.

Of course everybody wants to know how many viruses can be removed this way. I can't show a reliable figure, as testing a cleaner is an extremely tedious and time-consuming task. However, a figure of 80% is a rough estimate. Many conventional cleaners do not even come close to this percentage.

8.6 WHAT CAN BE EXPECTED FROM IT IN THE FUTURE?

Heuristic cleaning needs additional improvements. Some viruses use anti-debugger features that also make an emulator fail. It is also still possible that a virus detects that it is being emulated, and it can simply refuse to cooperate. The better the emulator performs, the less likely this is. Major improvements however are more likely to show up after multiple heuristic cleaners are available and some competition occurs.

This text is copyright Dr. Frans Veldman, ESaSS B.V. 1994-1995 and is reproduced with permission. [* Yeah.. look who's caring :P - sep *] Requests for reproduction in part, or whole may be addressed on the Internet to: veldman@esass.iaf.nl, or by telephone in North America to: 1-800-667-8228 x331, Europe: +31 8894 22282.
=============================================================================
=============================================================================

This is the text of the lecture presented in Boston (USA) on 21st September 1995 at the 5th international conference VB-95 (Virus Bulletin - 95), 20-22 September 1995
------------------------------------------------------------

MODERN METHODS OF DETECTING AND ERADICATING KNOWN AND UNKNOWN VIRUSES

Dr. Dmitry Mostovoy
DialogueScience, Inc.
Computing Center of the Russian Academy of Sciences,
40 Vavilova Street, Moscow, 117967, Russia
E-mail: dmost@dials.msk.su

Abstract

Viruses are growing in number from day to day, so it is obvious that soon anti-virus programs like NAV or MSAV will no longer be fully effective. Therefore, we started designing a program that would annihilate not individual infectors, but viruses in general, regardless of whether a virus is known or not, or whether it is old or new. The first outcome of our efforts in this direction, ADinf (Advanced Diskinfoscope), is a forecasting center which alerts the user in advance with great reliability about the intrusion of viruses, even HITHERTO unknown infectors. As distinct from all other data integrity checkers, ADinf inspects a disk by scanning the sectors one by one via direct addressing of BIOS without the assistance of the operating system and brings all vital parts of the hard disk under check. Evading such detection tactics is almost impossible. ADinf alerts the user in time about virus intrusion and restores infected boot sectors.

How can infected files be restored automatically? Our next step was to produce a curing companion to ADinf. The new tool, ADinf Cure Module, deploys a novel strategy. Paradoxically, ninety-seven percent of the viruses in our collection fall into a few standard groups according to their infection method.
New viruses are, as a rule, designed on one of these common infection principles, and therefore ADinf Cure Module will be about 97% efficient in its performance in the future as well. ADinf and ADinf Cure Module are parts of the DialogueScience anti-virus kit - the most popular anti-virus in Russia.

INTEGRITY CHECKING

The basic classes of anti-virus programs are well known. They are scanners/removers, monitors, and vaccines. I would like to discuss the development of programs to which, in my opinion, anti-virus designers pay undeservedly little attention. This class of anti-virus programs is known as "integrity checkers", though the name does not fully characterize the program's policy, which we describe below. This is the only class of purely software means of anti-virus protection which permits the detection of known and unknown viruses with reliability approaching 100% and the eradication of up to 97% of file infectors, even new, hitherto unknown viruses.

The operation of integrity checkers is based on a simple fact: even though it is impossible to know all information about a potentially infinite number of viruses, it is quite possible to store a finite volume of information about each logical drive on the disk and to detect virus infection from the changes that have taken place in files and system areas of the disk.

As already mentioned, the name "integrity checker" does not fully reflect the essence of these programs. Infection techniques are not restricted to a simple modification of the program code. Other paths for infection either already exist or are possible; for example, companion viruses [1]. A disk can be corrupted by restructuring the directory tree, say, by renaming directories and creating new directories, and by other such manipulations. Consequently, to provide reliable protection, integrity checkers must monitor far more parameters than the mere changes in the size and CRC of files that most programs of this class check.
Thus the master boot record (MBR) and the boot sectors of logical drives, the list of bad clusters, the directory tree structure, the free memory size, the CRC of the Int 13h handler in BIOS and even the Hard Disk Parameter Tables must all be under the control of integrity checkers. Changes in the size and CRC of files, creation of new files and directories and removal of old files and directories are obviously objects for strict control. A designer of an integrity checker must be one step ahead of virus designers and block every possible loophole for parasite intrusion.

Despite the large amount of controlled information, an integrity checker must nonetheless be user-friendly, simple to use, and quick in checking disks. It must at the same time be user-customizable as regards the level of messages displayed about changes that have occurred on the disk, and be capable of conducting a preliminary analysis of the changes, particularly suspicious modifications such as

- changes in size and CRC of files without any change in datestamp,
- illegal values of hours, minutes or seconds in the datestamp of infected files (for example, 62 seconds),
- a year greater than the current year (certain viruses mark infected files by increasing the year of creation by 100 years, which cannot be detected visually because the "dir" command only displays the last two figures of the year),
- any changes in files specified in the "stable" list,
- a change in the master boot record or a boot sector,
- appearance of new bad clusters on the disk

and others.

Let us now discuss the main problems faced by designers of "integrity checkers". The first is the evasive ability of viruses based on stealth mechanisms. Integrity checkers that rely on operating system tools in their scanning mission are absolutely helpless against this class of viruses. These viruses have stimulated the development of an integrity checker that checks disks by reading the sectors via direct addressing through BIOS.
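
The preliminary analysis of suspicious modifications listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Entry` record, the `suspicious` function and the wording of the reasons are hypothetical and are not taken from ADinf, and only three of the listed checks are shown.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Entry:
    """Hypothetical per-file record kept in the checker's tables."""
    size: int
    crc: int
    # (year, month, day, hour, minute, second) as read from the directory entry
    stamp: Tuple[int, int, int, int, int, int]


def suspicious(old: Entry, new: Entry, current_year: int) -> List[str]:
    """Return the reasons why the change from `old` to `new` looks suspicious."""
    reasons = []
    # Contents changed, yet the datestamp was left (or put back) as it was.
    if (old.size, old.crc) != (new.size, new.crc) and old.stamp == new.stamp:
        reasons.append("size/CRC changed but datestamp did not")
    year, _month, _day, hour, minute, second = new.stamp
    # Some viruses mark infected files with an impossible time, e.g. 62 seconds.
    if hour > 23 or minute > 59 or second > 59:
        reasons.append("illegal time in datestamp")
    # Others add 100 years, which "dir" hides by printing two digits only.
    if year > current_year:
        reasons.append("year later than current year")
    return reasons
```

A real checker would run such tests over every changed entry and use the result to decide how loudly to warn the user.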
Stealth viruses cannot hide the changes in an infected file's size from such a checker; on the contrary, under such a scanning technique the stealth mechanism betrays the presence of known and hitherto unknown stealth viruses through the discrepancy between the information given out by DOS and the information obtained by reading via BIOS. Such algorithms have been created and successfully detect the appearance of stealth viruses.

Scanning a disk by reading the sectors by direct addressing of BIOS has one more important merit which is often overlooked. If a computer is infected by a so-called "fast infector" [1], i.e., a virus that infects files not only when they are started but also when they are opened, such an integrity checker will not spread the infection to all files on the disk, because it does not call the operating system at all when reading the disk sector by sector and uses its own independent file-access code, so the virus never gets control.

Finally, an integrity checker utilizing direct reading of sectors is twice as fast in checking a disk as any program that relies on operating system tools, because a disk scan algorithm can be created that reads each sector only once and optimizes the head movements.

Disk handling via BIOS has its own hurdles. The foremost problem is compatibility with countless varieties of hardware and software, including disk compressors (Stacker, DoubleSpace), specialized drivers for accessing large disks (Disk Manager), SCSI disk drivers etc. Furthermore, there are many MS-DOS compatible operating systems that have subtle but quite important peculiarities in partitioning logical drives. Integrity checkers must pay due attention to these fine points.

VIRUS REMOVAL TECHNIQUES

Modern integrity checkers are useful not only in detecting infection, but are also capable of removing viruses immediately with the help of the information they retrieve from an uninfected machine at the time of installation.
An integrity checker can kill known viruses as well as viruses which were unknown at the time the integrity checker was created. How is this done?

The methods for removing viruses from the master boot record and boot sectors are obvious. The integrity checker stores images of uninfected boot sectors in its tables and in case of damage can instantly restore them. The only restriction is that the restoration must also be effected via direct addressing of BIOS, and after restoration the system must be rebooted immediately in order to prevent the active virus from reinfecting the disk while it is accessed via INT 13h.

Removal of file viruses is based on a surprising fact, namely, despite the vast number of diverse viruses, there are only a few techniques by which a virus is injected into a file. Here we only briefly outline the file restoration strategy. Figure 1 shows a schematic diagram of a usual EXE file. For each file the integrity checker keeps the header (area 1), the relocation table (area 2) and the code at the entry point (area 4). Strings (area 3 and area 5) are vital because they are the keys to identifying the mutual locations of various areas in an infected file when a virus writes its tail, not at the file end, but at the file beginning or in the file body (after the relocation table or at the entry point). In an infected file, after determining the area that coincides with the imaged areas in the table, the displacement of a block (for example, the block for area 3 begins at the end of area 2 and ends at the beginning of area 4) can be identified by the position of string 3, and the block can thus be moved back to its original location.
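
The restoration idea can be illustrated with a deliberately simplified sketch in Python. It is not ADinf's algorithm: it saves only the first bytes of the file and a single marker string (the text uses several areas and strings), and the names `FileImage`, `take_image` and `restore` are hypothetical. The marker's position in the infected file reveals how far the body was displaced, so the original can be rebuilt without knowing which virus did the damage.

```python
import zlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FileImage:
    length: int         # size of the clean file
    crc: int            # CRC-32 of the clean file
    head: bytes         # saved copy of the first bytes (header / entry code)
    marker_offset: int  # where the marker string sat in the clean file
    marker: bytes       # short string copied from the file body


def take_image(clean: bytes, head_len: int = 32, marker_len: int = 16) -> FileImage:
    """Record what is needed to rebuild `clean` later (done on an uninfected machine)."""
    marker_offset = max(head_len, len(clean) // 2)
    marker = clean[marker_offset:marker_offset + marker_len]
    return FileImage(len(clean), zlib.crc32(clean), clean[:head_len], marker_offset, marker)


def restore(infected: bytes, image: FileImage) -> Optional[bytes]:
    """Rebuild the clean file, or return None if the result cannot be verified."""
    pos = infected.find(image.marker)
    if pos < 0:
        return None                      # marker destroyed: give up
    shift = pos - image.marker_offset    # how far the body was displaced
    body_start = len(image.head) + shift
    if body_start < 0:
        return None
    body = infected[body_start:body_start + image.length - len(image.head)]
    restored = image.head + body         # saved head replaces any patched bytes
    if len(restored) != image.length or zlib.crc32(restored) != image.crc:
        return None                      # never hand back an unverified file
    return restored
```

The final CRC comparison is the important design choice: the sketch either returns a byte-for-byte copy of the original or refuses, which is what allows recovery to proceed without identifying the virus.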
┌───────────────────────────────┐═╗
│          EXE-header           │ ║ 1
├───────────────────────────────┤═╣
│                               │ ║
│       Relocation table        │ ║ 2
├───────────────────────────────┤═╝
│                               │
│             Code              │═╗
│                               │ ║ 3
│                               │═╝
│                               │
│           Entry point ───────>│═╗
│                               │ ║ 4
│                               │═╝
│                               │═╗
│                               │ ║ 5
│                               │═╝
├───────────────────────────────┤═╗
│     Debug information or      │ ║ 6
│           overlays            │═╝
└───────────────────────────────┘
              Fig.1

The image of area 6 takes about 3-4 Kb and is essential in recovering a file corrupted by viruses which damage the debug information and overlays in the course of defective infection.

Thus, a file is recovered by writing the image of its structure, stored in the integrity checker's tables, back over the infected file, reinstating its original state. Consequently, knowing which virus infected the file is not necessary.

Tables containing the information necessary for recovering files take about 200-450 Kb for one logical drive. The table size can be cut down to 90 Kb if a user decides not to save the relocation information, and in most cases this will not have any perceptible influence on the quality of recovery.

CONCLUSION

Integrity checkers undoubtedly do not provide a panacea against computer viruses. Unfortunately, there is no such panacea, nor can there be one. But they are quite reliable protection utilities which must be used jointly with other classes of anti-virus tools. The highlights of integrity checkers described above are all implemented in the ADinf program, the most popular integrity checker in Russia. It is also known in Germany, where it is distributed on CD-ROM as a component of the DialogueScience Anti-Virus Kit. It checks a disk by reading its sectors one by one, directly addressing BIOS, and easily traps active stealth viruses by comparing the information obtained through BIOS and DOS. It instantly restores up to 97% of files corrupted by known and unknown viruses.

REFERENCES

1. Vesselin Bontchev, Possible Virus Attacks Against Integrity Programs And How To Prevent Them, Proc. 2nd Int.
Virus Bulletin Conf., September 1992, pp. 131-141.
2. Mostovoy D. Yu., A Method of Detecting and Eradicating Known and Unknown Viruses, IFIP Transactions, A-43, Security & Control of Information Technology in Society, February, 1994, pp. 109-111.

=============================================================================
=============================================================================

╔═══════════════════════════════╗
║ A brief history of PC viruses ║
║      by Dr Alan Solomon       ║
╚═══════════════════════════════╝

1986-1987 - the prologue

It all started in 1986. Basit and Amjad realised that the boot sector of a floppy diskette contained executable code, and this code is run whenever you start up the computer with a diskette in drive A. They realised that they could replace this code with their own program, that this could be a memory resident program, and that it could install a copy of itself on each floppy diskette that is accessed in any drive. The program copied itself - they called it a virus. But it only infected 360 kb floppy disks. In 1987, the University of Delaware realised that they had this virus, when they started seeing the label " (c) Brain" on floppy diskettes. That's all it did - copy itself, and put a volume label on diskettes. Meanwhile, also in 1986, a programmer called Ralf Burger realised that a file could be made to copy itself, by attaching a copy of itself to other files. He wrote a demonstration of this effect, which he called VIRDEM. He distributed it at the Chaos Computer Club conference that December, where the theme was viruses. VIRDEM would infect any COM file; again the payload was pretty harmless. This attracted so much interest, that he was asked to write a book. Ralf hadn't thought of boot sector viruses like Brain, so his book doesn't even mention them. But by then, someone had started spreading a virus, in Vienna. In 1987, Franz Swoboda became aware that a virus was being spread in a program called Charlie.
He called it the Charlie virus. He made lots of noise about the virus (and got badly bitten as a result). At this point, there are two versions of the story - Burger claims that he got a copy of this virus from Swoboda, but Swoboda denies this. In any case, Burger obtained a copy, and gave it to Berdt Fix, who disassembled it (this was the first time anyone had disassembled a virus). Burger included the disassembly in his book, after patching out a couple of areas to make it less infectious and changing the payload. The normal payload of Vienna is to cause one file in eight to reboot the computer (the virus patches the first five bytes of the code); Burger (or maybe Fix) replaced this reboot code with five spaces. The effect was that patched files hung the computer, instead of rebooting. This isn't really an improvement. Meanwhile, in the US, Fred Cohen had completed his doctoral dissertation, which was on computer viruses. Dr Cohen proved that you cannot write a program that can, with 100% certainty, look at a file and decide whether it is a virus. Of course, no one ever thought that you could, but Cohen made good use of an existing mathematical theorem and earned a doctorate. He also did some experiments; he released a virus on a system, and discovered that it travelled further and faster than anyone had expected. In 1987, Cohen was at Lehigh, as was Ken van Wyk. So was the author of the Lehigh virus. Lehigh was an extremely unsuccessful virus - it never managed to spread outside its home university, because it could only infect COMMAND.COM and did a lot of damage to its host after only four replications. One of the rules of viruses is that a virus that quickly damages its host cannot survive. However, the Lehigh virus got a lot of publicity, and led to van Wyk setting up the Virus-L newsgroup on Usenet. Lehigh was nasty. After four replications, it did an overwrite on the disk, hitting most of the File Allocation Table.
But a virus that only infects COMMAND.COM, isn't very infectious. Meanwhile, in Tel Aviv, Israel (some say in Italy), another programmer was experimenting. His first virus was called Suriv-01 (virus spelled backwards). It was a memory resident virus, but it could infect any COM file, whereas Lehigh could only infect COMMAND.COM. This is a much better infection strategy than the non-TSR strategy used by Vienna, as it leads to files on all drives and all directories being infected. His second virus was called Suriv-02, and that could infect only EXE files, but it was the first EXE infector in the world. His third attempt was called Suriv-03, and it could handle COM and EXE files. His fourth effort escaped into the world, and became known as Jerusalem virus. Every Friday 13th, instead of infecting files that are run, it deletes them. But Friday 13ths are not common, so the virus is pretty inconspicuous, most of the time. It avoids infecting COMMAND.COM, because in those days, many people believed that this was the file to watch (see Lehigh). It looks as if it escaped rather than was released, because it plainly was not ready for release. The author decided to change the way that the virus detected itself in EXE files, and had made part of that change. There is redundant code from the Suriv viruses still in place, and also what looks like debugging code. It was found in the Hebrew University of Jerusalem (hence the name) by Yisrael Radai. While all this was going on, a young student at the University of Wellington, New Zealand, had found a very simple way to create a very effective virus. One time in eight, when booting from an infected floppy, it also displayed the message 'Your PC is now Stoned', hence the name of the virus. The virus itself was just a few hundred bytes long, but because of its self-restraint, and memory-resident replication, it has become the most widespread virus in the world, accounting for over a quarter of outbreaks.
It is very unlikely that Stoned virus will ever become rare. The virus spread rapidly, because of its inconspicuousness (and because in those days, people were keeping a careful eye on COMMAND.COM, because of Lehigh). In Italy, at the University of Turin, a programmer was writing another boot sector virus. This one put a bouncing ball up on the screen, if the disk was accessed exactly on the half hour. It became known as Italian virus, Ping pong, or Bouncing Ball. But this virus had a major defect - it couldn't work on anything except an 8088 or 8086 computer, because it uses an instruction that doesn't work on more advanced chips. As a result, this virus has almost died out (as has Brain, which can only infect 360 kb floppies, and which foolishly announces its presence via the volume label). Back in the US, an American was demonstrating a problem that has continued to dog US virus writers ever since - incompetence. The Lehigh didn't make it outside a small circle - neither did the Yale virus. This was another boot sector virus, but it only copied itself when you booted from an infected floppy, then put another floppy in to continue the boot process. No subsequent diskette was infected, and if the boot-up continued from a hard disk, there was no infection at all. Yale never spread at all widely, either. But also in 1987, a German programmer was writing a very competent virus, the Cascade, so called after the falling letters display that it gave. Cascade used a new idea - most of the virus was encrypted, leaving only a small stub of code in clear for decrypting the rest of the virus. The reason for this was not clear, but it certainly made it more difficult to repair infected files, and it restricted the choice of search string to the first couple of dozen bytes. This idea was later extended by Mark Washburn when he wrote the first polymorphic virus, 1260 (Chameleon). Washburn based Chameleon on a virus that he found in a book - the Vienna, published by Burger. 
Cascade was supposed to look at the Bios, and if it found an IBM copyright, it would refrain from infecting. This part of the code didn't work. The author soon released another version of the virus, 1704 bytes long instead of 1701, in order to correct this bug. But the corrected version had a bug that meant that it still didn't detect IBM Bioses. Of these early viruses, only Stoned, Cascade and Jerusalem are common today, but those three are very common.

1988 - the game begins

1988 was fairly quiet, as far as virus writing went. Mostly, it was the year that anti-virus vendors started appearing, making a fuss about what was at that time only a potential problem, and not selling very much anti-virus software. The vendors were all small companies, selling their software for very low prices (£5 or $10 was common). Some of them were shareware, some were freeware. Occasionally some larger company tried to pop up, but no-one was paying serious cash to solve a potential problem. In some ways, that was a pity, because 1988 was a very virus-friendly year. It gave Stoned, Cascade and Jerusalem a chance to spread undetected, and to establish a pool of infected objects that will ensure that they never become rare. It was in 1988 that IBM realised that it had to take viruses seriously. This was not because of the well-known Christmas tree worm, which was pretty easy to deal with. It was because IBM had an outbreak of Cascade at the Lehulpe site, and found itself in the embarrassing position of having to inform its customers that they might have become infected there. In fact, there was no real problem, but from this point on, IBM took viruses very seriously indeed, and the High Integrity Computing Laboratory in Yorktown was given responsibility for the IBM research effort in this field. 1988 saw a few scattered, sporadic outbreaks of Brain, Italian, Stoned, Cascade and Jerusalem. It also saw the final arguments about whether viruses existed or not.
Peter Norton, in an interview, said that they were an urban legend, like the crocodiles in the New York sewers, and one UK expert claimed that he had a proof that viruses were a figment of the imagination. In 1988, the real virus experts would debate with such people - after that year, real virus experts would simply walk away from anyone who had such absurd beliefs. Each outbreak of a virus was dealt with on a case-by-case basis. One American claimed that he had a fully equipped mobile home for dealing with virus outbreaks (and another one extrapolated to the notion that soon there would be many such mobile units). Existing software was used to detect boot sector viruses (by inspecting the boot sector), and one-off software was written for dealing with outbreaks of Cascade and Jerusalem. In 1988, a virus that is called "Virus-B" was written. This is another virus that doesn't go memory resident, and it is a modification of another virus that deletes files on Friday 13th. When this virus is run, it displays "WARNING!!!! THIS PROGRAM IS INFECTED WITH VIRUS-B! IT WILL INFECT EVERY .COM FILE IN THE CURRENT SUBDIRECTORY!". A virus that is as obvious as that, was clearly not written to spread. It was obviously written as a demonstration virus. Virus researchers are often asked for "harmless viruses" or "viruses for demonstration"; most researchers offer some alternative, such as an overhead foil, or a non-virus program that does a falling letters display. But it looks as if VIRUS-B was written with the intention of giving it away as a demonstration virus - hence the warning. And, indeed, we find that an American company was offering it to "large corporations, universities and research organizations" on a special access basis. At the end of 1988, a few things happened almost at once. The first was a big outbreak of Jerusalem at a large financial institution, which meant that dozens of people were tied up in doing a big clean-up for several days. 
The second was that a company called S&S did the first ever Virus Seminar that actually explained what a virus was and how they worked. The third was Friday 13th. It was clear that we couldn't go out and help everyone with a virus, even if we bought a mobile home and equipped it (with what)? It was also clear that the financial institution, and the academic site, could easily handle a virus outbreak, but they didn't have the tools to do the job. All they needed was a decent virus detector, which was not available. So we wrote one, added some other tools that experience said might be useful, and created the first Anti Virus Toolkit. In 1989, the first Friday 13th was in January. At the end of 1988, it was clear that Jerusalem was in Spain and the UK, at least, and was in academic as well as commercial sites. Because of the destructive payload in the virus, we felt that if we failed to send out some sort of warning, we would be negligent. But the media grabbed the ball and ran with it; the predictability of the trigger day, together with the feature of it being Friday 13th, caught their imagination, and the first virus media circus was under way. On the 13th of January, we had dozens of phone calls, mostly from the media wanting to know if the world had ended yet. But we also had calls from a large corporate site, a small vendor of PC hardware, and a couple of single users. We were invaded by TV cameras in droves, and had to schedule them carefully to avoid them tripping over each other. In the middle of all this, the PC Support person from the infected corporate arrived. The TV people wanted nothing better than a victim to film, but the corporate wanted anonymity. We pretended that he was just one of our staff. Also, at that time, British Rail contacted us - they also had an outbreak of Jerusalem, and they went public on it. Later, they regretted that decision, because for a long time afterwards, their PC Support person was badgered by the media seeking interviews. 
1989 - Datacrime

1989 was the year that things really started to move. The Fu Manchu virus (a modification of Jerusalem) was sent anonymously to a virus researcher in the UK, and the 405 virus (a modification of the overwriting virus in the Burger book) was sent to another UK researcher. A third UK researcher wrote a virus and sent it to another UK researcher - in 1989, the UK was where it was all happening. But not quite all. In 1989, the Bulgarians started getting interested in viruses, and Russia was beginning to awaken. In March of 1989, a minor event happened that was to trigger an avalanche. A new virus was written in Holland. A Dutchman calling himself Fred Vogel (a very common Dutch name) contacted a UK virus researcher, and said that he had found this virus all over his hard disk. He also said that it was called Datacrime, and that he was worried that it would trigger on the 13th of the next month. When the virus was disassembled, it was found that on any day after October 12th, it would trigger a low level format of cylinder zero of the hard disk, which would, on most hard disks, wipe out the File Allocation Table, and leave the user effectively without any data. It would also display the virus's name, Datacrime virus. A straightforward write-up of the effect of this virus was published, but it was another non-memory-resident virus, and so highly unlikely to spread. However, the write-up was reprinted by a magazine, another magazine repeated the story, a third party embellished it a bit, and by June it was becoming an established fact that it would trigger on October 12th (not true, it triggers on any day *after* the 12th, up till December 31st) and that it would low level format the whole hard disk. In America, the press started calling it "Columbus Day virus" (October 12th) and it was suggested that it had been written by Norwegian terrorists, angry at the fact that Eric the Red had discovered America, not Columbus.
Meanwhile, in Holland, the Dutch police were doing one of the things that falls within those things that police are supposed to do - crime prevention. Datacrime virus was obviously a crime, and the way to prevent it was to run a detector for it. So they commissioned a programmer to write a Datacrime detector, and offered it at Dutch police stations for $1. It sold really well. But it gave a number of false alarms, and it had to be recalled, and replaced with version 2. There were long queues outside the Dutch police stations, lots of confusion about whether anyone actually had this virus (hardly anyone did, but the false alarms muddied the waters). If the police take something seriously, it must be serious, right? So in July, large Dutch companies started asking IBM if viruses were a serious threat. Datacrime isn't, but there is a distinct possibility that a company could get Jerusalem, Cascade or Stoned (or Italian, in those days before 8088 computers became a rarity). So what is IBM doing about this threat, they asked? IBM had internal-use-only anti-virus software. They used this to check incoming media, and to make sure that an accident like Lehulpe could never happen again. IBM had a problem - if they didn't offer this software to their customers, they could look very bad if on October 13th a lot of computers went down. The technical people knew that this wouldn't happen, but obviously they knew that someone, somewhere, might have important data on a computer that would get hit by Datacrime. IBM had to make a decision about whether to release their software, and they had a very strict deadline to work to - October the 13th would be too late. In September of 1989, IBM sent out version 1.0 of the IBM scanning software, together with a letter telling their customers what it was, and why they were sending it out.
When you get a letter like that from IBM, and a disk, you would be pretty brave to take no notice, so a lot of large companies scanned a lot of computers, for the first time. Hardly anyone found Datacrime, but there were instances of the usual viruses. October 13th fell on a Friday, so there was a double event - Jerusalem and Datacrime. In the US, Datacrime (Columbus Day) had been hyped out of all proportion for a virus that is as uninfective as this one, and it is highly likely that not a single user had the virus. In Europe (especially in Holland) there might have been a few, but not many. In London, the Royal National Institute for the Blind announced that they'd had a hit, and had lost large amounts of valuable research data, and months of work. We investigated this particular incident, and the truth was that they had a very minor outbreak of Jerusalem, and a few easily-replaced program files had been deleted. Four computers were infected. But the RNIB outbreak has passed into legend as a Great Disaster. Actually, the RNIB took more damage from the invasion of the television and print media than from the virus. By the end of 1989, there were a couple of dozen viruses that we knew about, but we didn't know that in Bulgaria and Russia, big things were brewing.

1990 - the game gets more complex

By 1990, it was no longer a matter of running a couple of dozen search strings down each file. Mark Washburn had taken the Vienna virus, and created the first polymorphic virus from it. We didn't use that word at first, but the idea of his viruses (1260, V2P1, V2P2 and V2P6) was that the whole virus would be variably encrypted, and there would be a decryptor at the start of the virus. But the decryptor could take a very wide number of forms, and in the first few viruses, the longest possible search string was just two bytes long (V2P6 got this down to one byte).
To detect this virus, it was necessary to write an algorithm that would apply logical tests to the file, and decide whether the bytes it was looking at were one of the possible decryptors. One consequence of this, was that some vendors couldn't do this. It isn't easy to write such an algorithm, and many vendors were, by this time, relying on search strings extracted by someone else. The three main sources of search strings were a newsletter called Virus Bulletin, the IBM scanner, and reverse engineering a competitor's product. But you can't detect a polymorphic virus this way (indeed, two years after these viruses were published, many products are still incapable of detecting these viruses). Washburn also published his source code, which is now widely available. At the time, we thought that this would bring out a number of imitators; in practice, no-one seems to be using Washburn's code. However, plenty of virus authors are using his idea. Another consequence of polymorphic viruses, was an increase in the false alarm rate. If you write code to detect something that has as many possibilities as V2P6, then there is a chance that you will flag an innocent file, and that chance is much greater than with the sort of virus that you can find with a 24-byte scan string. A false alarm can be as much hassle to the user as a real virus, as he will put all his anti-virus procedures into action. Also, in 1990, we saw a number of viruses coming out of Bulgaria, especially from someone who called himself "Dark Avenger". The Dark Avenger viruses introduced two new ideas. The first idea was the "Fast infector"; with these viruses, if the virus is in memory, then simply opening a file for reading, triggers the virus infection. The entire hard disk is very soon infected. The second idea in this virus, was that of subtle damage. Dark Avenger-1800 occasionally overwrites a sector on the hard disk.
If this isn't noticed for a period of time, the corrupted files are backed up, and when the backup is restored, the data is still no good. Dark Avenger targets backups, not just data. Other viruses came from the same source, such as the Number-of-the-Beast (stealth in a file virus) and Nomenklatura (with an even nastier payload than Dark Avenger). Also, Dark Avenger was more creative about distributing his viruses. He would upload them to BBSes, infecting shareware anti-virus programs, together with a documentation file that gave reassurance to anyone who checked the file size and checksums. He uploaded his source code also, so that people could learn how to write viruses. In 1990, another event happened in Bulgaria - the first virus exchange BBS. The idea was that if you uploaded a virus, you could download a virus, and if you uploaded a new virus, you were given full access. This, of course, encourages the creation of new viruses, and gets viruses into wider circulation. Also, the VX BBS offered source code, which makes the technology of writing a virus more widely available. In the second half of 1990, The Whale appeared. Whale was a very large, and very complex virus. It didn't do very much; mostly, it crashed the computer when you tried to run it. But it was an exercise in complexity and obfuscation, and it arrived in virus author's hands like a crossword puzzle to be solved. Some virus researchers wasted weeks unravelling Whale, although in practice you could detect it with a couple of dozen search strings, and you didn't really need to do any more, as the thing was too clumsy to work anyway. But because it was so large and complex, it achieved fame. At the end of 1990, the anti-virus people saw that they had to get more organised - they had to be at least as organised as the virus authors. So EICAR (European Institute for Computer Antivirus Research) was born in Hamburg, in December 1990.
This gave a very useful forum for the anti-virus researchers and vendors to meet and exchange ideas (and specimens), and to encourage the authorities to try to prosecute virus authors more vigorously. At the time that EICAR was founded, there were about 150 viruses, and the Bulgarian "Virus factory" was in full swing.

1991 - product launches and polymorphism

In 1991, the virus problem was sufficiently interesting to attract the large marketing companies. Symantec launched Norton Anti Virus in December 1990, and Central Point launched CPAV in April 1991. This was soon followed by Xtree, Fifth Generation and a couple of others. Most of these companies were rebadging other companies' programs (nearly all Israeli). The other big problem of 1991 was "glut". In December 1990, there were about 200-300 viruses; by December 1991 there were 1000 (there may have been even more written that year, because by February, we were counting 1300). Glut means lots of viruses, and this causes a number of unpleasant problems. In every program, there must be various limitations. In particular, a scanner has to store search strings in memory, and under Dos, there is only 640 KB to use (and Dos, the network shell and the program's user interface might take half of that). Another Glut problem is that some scanners slow down in proportion to the number of viruses scanned for. Not many scanners work this way, but it certainly poses a problem for those that do. A third Glut problem comes with the analysis of viruses; this is necessary if you want to detect the virus reliably, to repair it, and if you want to know what it does. If it takes one researcher one day to disassemble one virus, then he can only do 250 per year. If it takes one hour, that figure becomes 2000 per year, but whatever the figure, more viruses means more work. Glut also means a lot of viruses that are similar to each other. This then can lead to mis-identification, and therefore a wrong repair.
Very few scanners attempt a complete virus identification, so this confusion about exactly which virus is being found is very common. Most of these viruses came from Eastern Europe and Russia - the Russian virus production was in full swing. But another major source of new viruses was the virus exchange BBSes. Bulgaria pioneered the VX BBS, but a number of other countries quickly followed. Some shut down not long after they started up, but the Milan "Italian Virus Research Laboratory" was where a virus author called Cracker Jack uploaded his viruses (which were plagiarised versions of the Bulgarian viruses). Germany had Gonorrhea, Sweden had Demoralised Youth, America had Hellpit, UK had Dead On Arrival and Semaj. Some of these have now either closed down or gone underground, but they certainly contributed to the glut problem. With a VX BBS, all a virus author has to do is download some source code, make a few simple changes, then upload a new virus, which gives him access to all the other viruses on the board. 1991 was also the year that polymorphic viruses first made a major impact on users. Washburn had written 1260 and the V2 series long before, but because these were based on Vienna, they weren't infectious enough to spread. But in April of 1991, Tequila burst upon the world like a comet. It was written in Switzerland, and was not intended to spread. But it was stolen from the author by a friend, who planted it on his father's master disks. Father was a shareware vendor, and soon Tequila was very widespread. Tequila used full stealth when it installed itself on the partition sector, and in files it used partial stealth, and was fully polymorphic. A fully polymorphic virus is one for which no search string can be written down, even if you allow the use of wild cards. Tequila was the first polymorphic virus that was widespread. By May, the first few scanners were detecting it, but it was not until September that all the major scanners could detect it reliably.
If you don't detect it reliably, then you miss, say, 1% of infected files. The virus starts another outbreak from these overlooked instances, and has to be put down again, but now there is that old 1%, plus another 1% of files that are infected but not detected. This can continue for as long as the user has patience, until eventually the hard disk contains nothing but files that the scanner cannot detect. The user thinks that, after the virus coming back a number of times, it gradually infected fewer and fewer files, until now he has gotten rid of it completely. In September 1991, Maltese Amoeba spread through Europe - another polymorphic virus. By the end of the year, there were a few dozen polymorphic viruses. Each of these is classified as "difficult", meaning it takes a virus researcher more than a few hours to do everything that needs to be done. Also, most products need some form of hard coding in order to detect the virus, which means program development, which means bugs, debugging, beta testing and quality control. Furthermore, although a normal virus won't slow down most scanners, a polymorphic virus might. It was also in 1991 that Dark Avenger announced the first virus vapourware. He threatened a virus that had 4,000,000,000 different forms. In January 1992, this virus appeared, but it wasn't a virus.

1992 - Michelangelo

January 1992 saw the Self Mutating Engine (MtE) from Dark Avenger. At first, all we saw was a virus that we named Dedicated, but shortly after that, we saw the MtE. This came as an OBJ file, plus the source code for a simple virus, and instructions on how to link the OBJ file to a virus to give you a full polymorphic virus. Immediately, virus researchers set to work on detectors for it. Most companies did this in two stages. In some outfits, stage one was look at it and shudder, stage two was ignore it and hope it goes away.
But at the better R&D sites, stage one was usually a detector that found between 90 and 99% of instances, and was shipped very quickly, and stage two was a detector that found 100%. At first, it was expected that there would be lots and lots of viruses using the MtE, because it was fairly easy to use this to make your virus hard to find. But the virus authors quickly realised that a scanner that detected one MtE virus, would detect all MtE viruses fairly easily. So very few virus authors have taken advantage of the engine (there are about a dozen or two viruses that use it). This was followed by Dark Avenger's Commander Bomber. Before CB, you could very easily predict where in the file the virus would be. Many products take advantage of this predictability to run fast; some only scan the top and tail of the file, and some just scan the one place in the file that the virus must occupy if it is there at all. Bomber transforms this, and so products either have to scan the entire file, or else they have to be more sophisticated about locating the virus. Another virus that came out at about that time, was Starship. Starship is a fully polymorphic virus (to defeat scanners), with a few neat anti-debugging tricks, and it also aims to defeat checksummers with a very simple trick. Checksumming programs aim to detect a virus by the fact that it has to change executable code in order to replicate. Starship only infects files as they are copied from the hard disk to the floppy. So files on the hard disk never change. But the copy on the floppy disk is infected, and if you then copy that onto a new hard disk, and tell the checksummer on the new machine about this new file, the checksummer will happily accept it, and never report any changes. Starship also installs itself on the hard disk, but without changing executable code. It changes the partition data, making a new partition as the boot partition. 
No code is changed, but the new partition contains the virus code, and this is run before it passes control on to the original boot partition. Probably the greatest event of 1992 was the great Michelangelo scare. One of the American anti-virus vendors forecast that five million computers would go down on March the 6th, and many other US vendors climbed on to the bandwagon. PC users went into a purchasing frenzy, as the media whipped up the hype. On March the 6th, between 5,000 and 10,000 machines went down, and naturally the US vendors that had been hyping the problem put this down to their timely and accurate warning. We'll probably never know how many people had Michelangelo, but certainly in the days leading up to March the 6th, a lot of computers were checked for viruses. After March 6th, there were a lot of discredited experts around. The reaction to the Michelangelo hype did a lot of damage to the credibility of people advocating sensible antivirus strategies, and outweighed any possible benefits from the gains in awareness. In August 1992, we saw the first serious virus authoring packages. First the VCL (Virus Creation Laboratory) from Nowhere Man, and then Dark Angel's Phalcon/Skism Mass-Produced Code Generator. These packages made it possible for anyone who could use a computer to write a virus. Within twelve months, dozens of viruses had been created using these tools. Towards the end of 1992, a new virus writing group called ARCV (Association of Really Cruel Viruses) had appeared in England - within a couple of months, the Computer Crime Unit of New Scotland Yard had tracked them down and arrested them. ARCV flourished for about three months, during which they wrote a few dozen viruses and attracted a few members. Another happening of 1992 was the appearance of people selling (or trying to sell) virus collections. To be more precise, these were collections of files, some of which were viruses, and many of which were assorted harmless files.
In America, John Buchanan offered his collection of a few thousand files for $100 per copy, and in Europe, The Virus Clinic offered various options from £25. The Virus Clinic was raided by the Computer Crime Unit; John Buchanan is still offering viruses for sale. Towards the end of 1992, the US Government was offering viruses to people who called the relevant BBS.

1993 - Polymorphics and Engines

Early in 1993, XTREE announced that they were quitting the antivirus business. This was the first time that a major company had given up the struggle. Early in 1993, a new virus writing group appeared, in Holland, called Trident. The main Trident author, Masouf Khafir, wrote a polymorphic engine called the Trident Polymorphic Engine, and released a virus that used it, called GIRAFE. This was followed by updated versions of the TPE. The TPE is much more difficult to detect reliably than the MtE, and very difficult to avoid false alarming on. Khafir also released the first virus that worked according to a principle first described by Fred Cohen. The Cruncher virus was a data compression virus, that automatically added itself to files in order to auto-install on as many computers as possible. Meanwhile, Nowhere Man, of the Nuke group, had been busy. Early in 1993, he released the Nuke Encryption Device (NED). This was another mutator that was more tricky than MtE. A virus called Itshard soon followed. Phalcon/Skism was not to be left out. Dark Angel released DAME (Dark Angel's Multiple Encryptor) in an issue of 40hex; a virus called Trigger uses this. Trident released version 1.4 of TPE (again, this is more complex and difficult than previous versions) and released a virus called Bosnia that uses it. Soon after that, Lucifer Messiah, of Anarkick Systems had taken version 1.4 of the TPE and written a virus POETCODE, using a modified version of this engine (1.4b). Early in 1993, another highly polymorphic virus appeared, called Tremor.
This rocketed to stardom when it got included in a TV broadcast of software (received via a decoder). In the middle of 1993, Trident got a boost when Dark Ray and John Tardy joined the group. Tardy has released a fully polymorphic virus in 444 bytes, and we can expect more difficult things from Trident. The main events of 1993, were the emergence of an increasing number of polymorphic engines, which will make it easier and easier to write viruses that scanners find difficult to detect.

The future

There will be more viruses - that's an easy prediction. How many more is a difficult call, but over the last five years, the number of viruses has been doubling every year or so. This surely must slow down. If we say 1500 viruses in mid-1992, and 3000 in mid-1993, then we could imagine 5000 in mid-1994 and we could expect to reach the 8,000 mark some time in 1995. Or perhaps we are being optimistic? The glut problem will continue, and could get sharply worse. Whenever a group of serious anti-virus researchers meet, we find an empty room, hang "Closed for cleaning" on the door, and frighten each other with "nightmare scenarios". Some of the older nightmare scenarios have already come true, others have not, but remain possibilities. The biggest nightmare for all anti-virus people is glut. There are only about 10-15 first class anti-virus people in the world, and most of the anti-virus companies have just one of these people (some have none). It would be difficult to create more, as the learning curve is very steep. The first time you disassemble something like Jerusalem virus, it takes a week. After you've done a few hundred viruses, you could whip through something as simple as Jerusalem in 15 minutes. The polymorphic viruses will get more numerous. It turns out that they are a much bigger problem than the stealth viruses, because stealth is aimed at checksummers, but polymorphism is aimed at scanners, which is what most people are using.
And each polymorphic virus will be a source of false alarms, and will cause the researchers much more work than the normal viruses. The polymorphic viruses will also continue to get more complex, as virus authors learn the technique, and increasingly try to ensure that their viruses cannot be detected. Scanners will get larger - more code will be needed because more viruses will need hard coding to scan for them. The databases that scanners use will get larger; each new virus needs to be detected, identified and repaired. Loading the databases will take longer, and some programs will have memory shortage problems. As Windows becomes more popular, people will be increasingly reluctant to run scanners under Dos. But if you are running Windows, you have run software on the hard disk, and if one of the things you've run is infected by a virus, you have a virus in memory. If there is a virus in memory, you cannot trust what the computer is saying - it could be a stealth virus. Windows will make antivirus software less secure. The R&D effort needed to keep scanners up-to-date will keep growing. Some companies won't be able to do it, and will decide that scanning is outdated technology, and try to rely on checksumming. Other companies will licence scanners from one of the few companies that still maintains adequate R&D (we've already started seeing some of this). Some companies will decide that the anti-virus business isn't as profitable as they had thought, and will abandon their anti-virus product, and go back to their core business. Users will get a lot more relaxed about viruses. We've long since passed the stage where a virus is regarded as a loathsome disease, to be kept secret. But we're increasingly seeing people who regard a virus on their system with about the same degree of casualness as a bit of fluff on their jacket. Sure, they'll wipe it off, but there's no real need to worry about it happening again.
This is perhaps a bit too relaxed an attitude, but what can you expect if a user keeps on getting hit by viruses, and nothing terrible ever seems to result. Anti-virus products will mature a lot. Those without any kind of decent user interface will have a hard time competing against the pretty ones. Those with a long run time will be rejected in favour of those that run in seconds. Exactly which viruses are detected will have far less emphasis (it is very difficult for users to swallow claims about so many thousands of viruses) than the ease of use of the product, and the amount of impact it has on the usability of the computer. New products will keep arriving, as each company invents the product that makes all previous products obsolete. Sometimes the magic ingredient will be software (AI, neural nets, whatever is the latest buzzword) and sometimes it will be hardware (which can never be infected, except that that isn't the problem). These products will burst on a startled world in a blaze of publicity, and vanish without trace when users find that installing them makes their computer unusable, or else it doesn't find any viruses, or both. But new ones will come along to take their place. Gradually, people will trade up from Dos to whatever takes its place - OS/2, Windows-NT or Unix, and the Dos virus will become as irrelevant as CPM. Except that Dos will still be around 10 or even 20 years from now, and viruses for the new operating system will start to appear as soon as it is worth writing them. Some computers are already being built with ingrained resistance to viruses. Some brands of computer are already immune to boot sector viruses, provided you make a simple choice in the CMOS setup (don't boot from the floppy). Right now, very few users are being told that these computers can be set up that way, but people are gradually finding out for themselves. 
This doesn't solve the virus problem, but anything that makes the world a difficult place for viruses must be a help. The virus problem will be with us for ever. It isn't the dramatic, worldshaking kind of problem that Michelangelo was made out to be; nor is it the fluff-on-your-jacket kind of problem. But as long as people have problems with computers, other people will be offering solutions for those problems. =============================================================================